CN118136235A

CN118136235A - ICU disease progress prediction method, medium and equipment based on dynamic physiological index

Info

Publication number: CN118136235A
Application number: CN202410170315.4A
Authority: CN
Inventors: 何璇; 梁品; 林家震; 王声溢; 卢朝霞; 钱鹏; 朱正龙; 李思良; 李兆丰
Original assignee: Neusoft Hanfeng Medical Technology Co ltd
Current assignee: Neusoft Hanfeng Medical Technology Co ltd
Priority date: 2024-02-06
Filing date: 2024-02-06
Publication date: 2024-06-04

Abstract

The invention relates to an ICU disease progress prediction method based on dynamic physiological indexes, comprising the following steps: acquiring data according to preset screening conditions and cleaning the data to obtain initial data; preprocessing the initial data to obtain training data; determining base models and training them to generate meta-features; training a meta-learner on the meta-features using the original label information of the training data; and inputting test data into the base models to obtain the prediction result of each base model, then inputting these prediction results into the meta-learner to obtain the final prediction result. The method makes use of dynamic data in the ICU and provides an automatic, non-parametric ensemble learning algorithm, built on machine learning algorithms, for estimating a patient's disease progress; it is highly robust and has good prediction performance.

Description

ICU disease progress prediction method, medium and equipment based on dynamic physiological index

Technical Field

The invention relates to the technical field of ICU patient disease progress prediction, in particular to an ICU disease progress prediction method, medium and device based on dynamic physiological indexes.

Background

An intensive care unit (ICU) is a facility that provides a variety of life support systems for critically ill patients or patients recovering from major surgery in order to maintain physiological function. In the ICU, rapid deterioration is common and the outcome may be irreversible, potentially leading to death if the patient is not treated in time. If a patient's disease progress can be represented quantitatively from the patient's current state, this can assist doctors in decision making, improving the patient's probability of survival and reducing the treatment costs caused by excessive medical care.

Currently, researchers have developed a range of disease severity scores to represent patient disease progression, such as the APACHE and SAPS series of scores, which quantify disease progression to a certain degree. Although these scores have been refined many times, their overall accuracy in representing a patient's disease progression remains to be improved. These scores also have limitations: they can be calculated only after data from the first 24 or 48 hours after admission are available, and they are difficult to use for continuous dynamic assessment of how critical a patient's condition is. So far, most hospitals in China still use these scoring systems.

In the ICU, a large amount of complex data is acquired and collected for each patient, such as real-time physiological index data and medical laboratory test results. Machine learning algorithms can use these rich data to predict disease progression for each patient, thereby assisting physicians in making timely and effective treatment decisions. Compared with traditional methods, prediction models of ICU patient disease progress based on machine learning algorithms give more accurate results. For example, in patients with respiratory and circulatory instability, a dynamic model based on the random forest algorithm can predict deterioration 90 minutes before it occurs. Likewise, a model based on the random forest algorithm predicts clinically relevant hypotension events with an accuracy of 92.7%. These cases show that machine learning performs well in clinical prediction. However, although these models are highly accurate, they use only static data as features, and the accuracy of disease progression prediction for patients remains to be further improved.

Disclosure of Invention

(I) technical problem to be solved

In view of the above shortcomings of the prior art, the present invention provides an ICU disease progress prediction method, medium and apparatus based on dynamic physiological indexes, which solve the technical problem that prior-art models using static data as features leave the accuracy of patient disease progress prediction in need of further improvement.

(II) technical scheme

In order to achieve the above purpose, the main technical scheme adopted by the invention comprises the following steps:

In a first aspect, the present invention provides a method for ICU disease progression prediction based on dynamic physiological indicators, comprising:

acquiring data according to preset screening conditions and cleaning the data to obtain initial data;

preprocessing initial data to obtain training data;

determining base models and training the base models to generate meta-features;

training a meta-learner on the meta-features based on the original label information of the training data;

and inputting the test data into the base models to obtain the prediction result of each base model, and inputting the prediction results of the base models into the meta-learner to obtain the final prediction result.

Optionally, the screening conditions include: deleting data of underage patients, deleting data of patients whose ICU entry was a planned admission, and deleting data for which the ICU entered was different.

Optionally, the data cleaning includes the steps of:

acquiring manually selected dynamic physiological index features;

and selecting initial data according to the dynamic physiological index features.

Optionally, the preprocessing the initial data to obtain training data includes:

performing outlier removal and normalization on the continuous initial data to obtain continuous data;

sorting each item of continuous data of each patient and converting it into a histogram;

representing the continuous data by a vector formed by concatenating the histogram with the total data count corresponding to the histogram, thereby obtaining training data;

and digitizing the discrete initial data to obtain training data.

Optionally, the digitizing process includes: one-hot encoding the discrete initial data.

Optionally, the base models include: AdaBoost Classifier, Bagging Classifier, RandomForest Classifier, ExtraTrees Classifier, GradientBoosting Classifier, LightGBM Classifier, CatBoost Classifier and XGBoost Classifier.

Optionally, the original label information includes survival or death.

In a second aspect, the present invention provides an ICU disease progression prediction system based on dynamic physiological indicators, comprising:

the data acquisition module acquires data according to preset screening conditions and performs data cleaning to obtain initial data;

the preprocessing module is used for preprocessing the initial data to obtain training data;

the first training module is used for determining base models and training the base models to generate meta-features;

the second training module trains a meta-learner on the meta-features based on the original label information of the training data;

and the prediction module inputs the test data into the base models to obtain the prediction result of each base model, and inputs the prediction results of the base models into the meta-learner to obtain the final prediction result.

In a third aspect, the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed, implements the ICU disease progression prediction method based on dynamic physiological indicators of any one of the first aspects above.

In a fourth aspect, the present invention provides a storage device comprising a storage medium and a processor, the storage medium storing a computer program which when executed by the processor implements the ICU disease progression prediction method based on dynamic physiological indicators of any one of the first aspects above.

(III) beneficial effects

The beneficial effects of the application are as follows: in the ICU disease progress prediction method based on dynamic physiological indexes, the dynamic physiological index feature data are processed in several different ways in succession, and the disease progress of a patient is then predicted by a prediction algorithm based on super ensemble learning (Super Learner). In super ensemble learning, the prediction errors of the base learners can offset one another and the limitations of any single model can be compensated, so that the error of the overall model is reduced and the performance of model learning and prediction is improved. Compared with the related art, the method makes full use of dynamic data in the ICU and provides an automatic, non-parametric ensemble learning algorithm, built on machine learning algorithms, for estimating the disease progress of a patient; it is highly robust and has good prediction performance.

Drawings

FIG. 1 is a schematic flow chart of an ICU disease progression prediction method based on dynamic physiological indexes according to an embodiment of the present invention;

Fig. 2 is a block diagram of an ICU disease progression prediction system based on dynamic physiological indicators according to an embodiment of the present invention.

[ Reference numerals description ]

200: An ICU disease progression prediction system based on dynamic physiological indicators;

201: a data acquisition module;

202: a preprocessing module;

203: a first training module;

204: a second training module;

205: and a prediction module.

Detailed Description

The invention will be better explained by the following detailed description of the embodiments with reference to the drawings.

The invention provides an ICU disease progress prediction method based on dynamic physiological indexes. The training data source is the MIMIC-III database. MIMIC-III (Medical Information Mart for Intensive Care III) is a large, freely available data set containing health data for more than 40,000 patients, built by the Laboratory for Computational Physiology at the Massachusetts Institute of Technology. Unlike the prediction models and selected features used in the past, the present invention recognizes the importance of dynamic data and makes use of it. First, two data preprocessing methods are provided for large-scale, dynamic, multidimensional, sparse clinical data; next, a prediction algorithm based on super ensemble learning (Super Learner) is presented to predict disease progression in a patient. In super ensemble learning, the prediction errors of the base learners can offset one another and the limitations of any single model can be compensated, so that the error of the overall model is reduced and the performance of model learning and prediction is improved.

While exemplary embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

In a first aspect, referring to fig. 1, the present embodiment provides an ICU disease progression prediction method based on dynamic physiological indexes, including:

S101, acquiring data according to preset screening conditions and cleaning the data to obtain initial data.

Optionally, the data cleaning includes the steps of:

Acquiring dynamic physiological index characteristics selected manually;

Because the MIMIC-III database contains a large amount of medical information, reading data directly from it is too difficult. The MIMIC-III database is therefore stored in database platform software, such as MySQL. Considering that the data in the MIMIC-III database are all recorded row by row, a column-oriented database platform is chosen so that data can be extracted efficiently according to the screening conditions. In this embodiment, the ClickHouse database platform software is selected, and the data in the MIMIC-III database is stored in ClickHouse to facilitate extraction of data for model training.

Data for training the model are extracted, according to the formulated screening conditions, from the ClickHouse database platform storing the MIMIC-III database, and are stored in CSV format.
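The screening and CSV export step can be sketched as follows. This is a minimal illustration only: it works on a small in-memory table rather than on ClickHouse, the column names `age_years` and `admission_type` are hypothetical, the age threshold of 18 is an assumption (the source says only "underage"), and "planned admission" is assumed to correspond to the `ELECTIVE` admission type used in MIMIC-III. The third screening condition is not implemented because the source does not state it clearly.

```python
import io

import pandas as pd

# Hypothetical extract: one row per ICU stay (column names are assumptions).
stays = pd.DataFrame({
    "subject_id": [1, 2, 3, 4],
    "age_years": [67, 15, 54, 80],
    "admission_type": ["EMERGENCY", "EMERGENCY", "ELECTIVE", "URGENT"],
})

# Screening condition 1: drop underage patients (threshold of 18 is assumed).
is_adult = stays["age_years"] >= 18
# Screening condition 2: drop planned admissions (assumed to mean ELECTIVE).
is_unplanned = stays["admission_type"] != "ELECTIVE"

screened = stays[is_adult & is_unplanned]

# Store the screened data in CSV format (written to memory here, not to disk).
buffer = io.StringIO()
screened.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

In practice the same filters would more likely be expressed in the query sent to the database, so that only the screened rows leave the server.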

Due to the large data volume, part of useless data is removed through data cleaning in order to facilitate subsequent processing. The data cleaning process comprises the steps of selecting dynamic physiological index features in combination with actual meanings, selecting patient data for training according to the selected dynamic physiological index features, and the like.

Table 1 shows the finally selected dynamic physiological index features. The features are selected by considering both how commonly each feature is used and how many times it occurs in the data set, yielding more than 100,000 records from 1052 patients.

TABLE 1

S102, preprocessing the initial data to obtain training data.

To filter outliers, all data for each dynamic physiological index feature in Table 1 are sorted in ascending order. The lowest 1% and the highest 1% of the sorted data for each feature (that is, the 0-1 and 99-100 percentile ranges) are considered out-of-range outliers. After the outliers are removed, the data of each physiological feature are normalized. Data normalization is a data preprocessing technique that converts data with different scales and ranges into a unified interval so that the data can be better used in tasks such as data analysis, machine learning and model training. Normalization eliminates dimensional differences in the data, improves the convergence rate of the algorithm, and prevents differences in scale between features from biasing the model.
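The outlier filtering and normalization described above can be sketched as follows. The source does not say which normalization is used, so min-max scaling is an assumption here, and the function name and sample heart-rate values are purely illustrative.

```python
import numpy as np


def remove_outliers_and_normalize(values, lower_pct=1.0, upper_pct=99.0):
    """Drop values below the 1st or above the 99th percentile, then min-max scale."""
    values = np.asarray(values, dtype=float)
    lo, hi = np.percentile(values, [lower_pct, upper_pct])
    kept = values[(values >= lo) & (values <= hi)]
    span = kept.max() - kept.min()
    if span == 0:
        # All remaining values are identical; map them to 0 to avoid dividing by zero.
        return np.zeros_like(kept)
    return (kept - kept.min()) / span


# Illustrative data: plausible heart rates plus two implausible readings.
rng = np.random.default_rng(0)
heart_rate = np.concatenate([rng.normal(80, 10, 998), [0.0, 400.0]])

scaled = remove_outliers_and_normalize(heart_rate)
print(len(heart_rate), len(scaled), scaled.min(), scaled.max())
```

Because the percentiles are computed over all records of a feature, the two implausible readings are discarded before scaling and therefore do not compress the remaining values into a narrow band.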

After the data of each dynamic physiological index feature are normalized to the interval (0, 1), each item of physiological feature data of each patient is represented by a vector of length 21. The specific steps are as follows. A histogram with 20 bins is first used to represent one item of physiological feature data of one patient. A histogram represents the data distribution as a series of vertical bars of varying height. The interval (0, 1) is divided into 20 equal parts, so the histogram has 20 bins, and the height of each bar indicates how many data points fall in the corresponding subinterval. The 20 bar heights and the total number of data points for that item of the patient's physiological feature data are concatenated into a vector of length 21: the first 20 elements are the bar heights, and the last element is the total number of data points.
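A minimal sketch of this histogram representation is given below. It assumes that the bar heights are raw counts (the source does not say whether counts or relative frequencies are used); the function name and the sample values are illustrative.

```python
import numpy as np

N_BINS = 20


def histogram_vector(normalized_values, n_bins=N_BINS):
    """Encode one patient's normalized readings of one feature as a length-21 vector."""
    values = np.asarray(normalized_values, dtype=float)
    # 20 equal-width bins over the normalized range; counts are the bar heights.
    counts, _ = np.histogram(values, bins=n_bins, range=(0.0, 1.0))
    # Append the total number of readings as the 21st element.
    return np.concatenate([counts, [values.size]]).astype(float)


# Illustrative normalized heart-rate readings for one patient.
patient_hr = [0.02, 0.04, 0.51, 0.52, 0.53, 0.97]
vec = histogram_vector(patient_hr)
print(vec)
```

This turns a variable-length, irregularly sampled series into a fixed-length feature vector, which is what allows tabular classifiers to consume the dynamic data.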

Discrete data are digitized, that is, converted to one-hot codes. Race data and ICU type data that require mapping are processed first; for example, ASIAN-SOUTH EAST ASIAN, ASIAN-KOREAN and other Asian categories are mapped to ASIAN. After mapping, all discrete data are one-hot encoded (One-Hot Encoding) to facilitate model processing. One-hot encoding is a method of converting discrete features into binary vectors. For a feature with n different values, one-hot encoding converts it into a binary vector of length n in which exactly one element is 1 and the remaining elements are 0. In this way each value is represented as an independent dimension, which prevents any apparent ordering between different values from influencing the model.
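The mapping and one-hot encoding can be illustrated with a short sketch. The column names, the prefix-based mapping rule, and the category values other than the two ASIAN examples quoted above are assumptions made for illustration.

```python
import pandas as pd


def map_ethnicity(value):
    """Collapse fine-grained labels such as 'ASIAN-KOREAN' into the coarse 'ASIAN'."""
    return "ASIAN" if value.startswith("ASIAN") else value


# Illustrative discrete data (column names and most values are hypothetical).
df = pd.DataFrame({
    "ethnicity": ["ASIAN-KOREAN", "ASIAN-SOUTH EAST ASIAN", "WHITE", "ASIAN"],
    "icu_type": ["MICU", "SICU", "MICU", "CCU"],
})

# Step 1: data mapping to coarser categories.
df["ethnicity"] = df["ethnicity"].map(map_ethnicity)

# Step 2: one-hot encode every discrete column.
encoded = pd.get_dummies(df, columns=["ethnicity", "icu_type"], dtype=int)
print(encoded)
```

Mapping before encoding keeps the number of one-hot columns small, which matters when rare subcategories would otherwise each receive their own nearly empty column.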

S103, determining base models and training the base models to generate meta-features.

For each base model, cross-validation is performed during training to obtain a stable assessment of model performance. In machine learning, it is often necessary to train a model on one portion of the data and to evaluate its performance on the remaining data. The n-fold cross-validation method divides the data set into n subsets, of which n-1 are used to train the model and the remaining one is used to test it; this process is repeated with a different subset selected as the test set each time, and the evaluation results are finally combined to obtain the final evaluation result. In this embodiment n is set to 10. In super ensemble learning, each base classifier generates a prediction result during training, and the prediction results of all base classifiers are combined to obtain new features for each sample, which are called meta-features.
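Generating meta-features with 10-fold cross-validation might look like the sketch below, which uses scikit-learn. Several assumptions apply: the data are synthetic, only the five base models available in scikit-learn are used (LightGBM, CatBoost and XGBoost are omitted to avoid extra dependencies), the hyperparameters are arbitrary, and the "prediction result" of each base model is taken to be its out-of-fold predicted probability for class 1, which the source does not specify.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    BaggingClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for the preprocessed training data.
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

base_models = {
    "adaboost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "bagging": BaggingClassifier(n_estimators=20, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "extra_trees": ExtraTreesClassifier(n_estimators=50, random_state=0),
    "gradient_boosting": GradientBoostingClassifier(n_estimators=50, random_state=0),
}

# n = 10 folds, as in this embodiment.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# One column per base model: the out-of-fold probability of class 1 for each sample.
meta_features = np.column_stack([
    cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    for model in base_models.values()
])
print(meta_features.shape)
```

Using out-of-fold predictions means each sample's meta-features come from models that never saw that sample, which keeps the later meta-learner from simply rewarding the base models that overfit most.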

S104, training a meta-learner on the meta-features based on the original label information of the training data.

Optionally, the original label information includes survival or death.

A new machine learning model, namely the meta-learner, is trained using the generated meta-features and the original label (survival or death) of each sample. In this embodiment the meta-learner is a logistic regression.
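Training the logistic-regression meta-learner can be sketched as follows. The meta-features here are simulated stand-ins for base-model outputs rather than real predictions, and the label encoding (0 for survival, 1 for death) is an assumption, since the source does not state which class is which.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_samples, n_base_models = 200, 8

# Assumed encoding: 0 = survival, 1 = death.
y = rng.integers(0, 2, size=n_samples)

# Simulated meta-features: each column mimics one base model's predicted
# probability of class 1, i.e. a noisy signal correlated with the true label.
noise = rng.normal(0.0, 0.25, size=(n_samples, n_base_models))
meta_features = np.clip(0.2 + 0.6 * y[:, None] + noise, 0.0, 1.0)

# The meta-learner learns how much weight to give each base model.
meta_learner = LogisticRegression()
meta_learner.fit(meta_features, y)

weights = meta_learner.coef_.ravel()
print(weights)
```

The fitted coefficients act as the weights of the combination: a base model whose predictions carry little independent information receives a coefficient close to zero.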

S105, inputting the test data into the base models to obtain the prediction result of each base model, and inputting the prediction results of the base models into the meta-learner to obtain the final prediction result.
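The two-stage prediction in S105 can be sketched with scikit-learn's `StackingClassifier`, which is used here as a convenient stand-in and is not claimed to be the implementation of the invention. The data are synthetic, only three base models are used to keep the example fast, and passing predicted probabilities (rather than class labels) to the meta-learner is an assumption.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the preprocessed data.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

stack = StackingClassifier(
    estimators=[
        ("random_forest", RandomForestClassifier(n_estimators=50, random_state=0)),
        ("extra_trees", ExtraTreesClassifier(n_estimators=50, random_state=0)),
        ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
    ],
    final_estimator=LogisticRegression(),  # the meta-learner
    cv=10,                                 # 10-fold meta-feature generation
    stack_method="predict_proba",
)
stack.fit(X_train, y_train)

# Stage 1: the test data go through the base models to give test meta-features.
test_meta = stack.transform(X_test)
# Stage 2: the meta-learner turns those meta-features into the final prediction.
final_pred = stack.final_estimator_.predict(test_meta)
print(test_meta.shape, final_pred[:10])
```

Calling `stack.predict(X_test)` performs both stages in one step; they are separated here only to mirror the wording of S105.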

Given that the true relationship between the disease progression of ICU patients and the explanatory variables is unknown, the present invention aims to evaluate disease progression with an automated, non-parametric algorithm that does not depend on any assumed underlying relationship, thereby improving the prediction framework. The invention provides an automatic non-parametric ensemble learning model: the base algorithms are ranked by their prediction performance, and the best weighted combination of the candidate algorithms is then constructed to obtain the ensemble algorithm. The proposed ensemble learning model makes no assumptions about the underlying data distribution and reaches an accuracy of 90.43%, so it is better suited to fitting complex data.

In terms of data set selection, the invention uses the MIMIC-III (Medical Information Mart for Intensive Care III) database, a public medical electronic health record (EHR) database containing medical data of many ICU patients; its large volume of static and dynamic data is well suited to the invention.

In terms of database platform software, the ClickHouse database is selected. ClickHouse is a column-oriented database management system (DBMS) for online analytical processing (OLAP). For large-scale queries over very large data (tens of GB or more), MySQL, a traditional row-oriented online transaction processing (OLTP) database management system, performs poorly, whereas OLAP systems handle such queries very efficiently.

In evaluating the ensemble learning performance, the data are first divided into a training set and a validation set. In the present invention, 25% of the data is used as the validation set and the rest as the training set. The ensemble learning model is then fitted on the training set, the prediction accuracy is calculated, and a classification report of the prediction results is printed, which includes evaluation indexes such as precision, recall and F1 value. The final accuracy is 90.43%. Table 2 evaluates the prediction results of each sub-algorithm of the ensemble learning model, and Table 3 evaluates the overall prediction performance of the ensemble learning model.
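The evaluation procedure can be sketched as follows. The data are synthetic, a plain logistic regression stands in for the ensemble model so that the example runs quickly, and the sample count of 920 is chosen only so that the 25% validation set has 230 samples; none of these reproduce the reported results.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (about 75% class 0).
X, y = make_classification(
    n_samples=920, n_features=20, weights=[0.75], random_state=0
)

# 25% of the data is held out as the validation set (stratification is assumed).
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Placeholder for the ensemble learning model.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = model.predict(X_val)

accuracy = accuracy_score(y_val, y_pred)
report = classification_report(y_val, y_pred, digits=2)
print(f"accuracy: {accuracy:.4f}")
print(report)
```

The printed report has the same layout as Table 3: per-class precision, recall, F1-score and support, followed by the accuracy, macro average and weighted average rows.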

TABLE 2

TABLE 3

	precision	recall	F1-score	support
0	0.90	0.99	0.94	176
1	0.94	0.63	0.76	54
accuracy			0.90	230
macro avg	0.92	0.81	0.85	230
weighted avg	0.91	0.90	0.90	230
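As a consistency check on Table 3, the averages can be recomputed from per-class counts. The counts used below (174 of 176 class-0 samples and 34 of 54 class-1 samples predicted correctly) are not reported in the source; they are inferred as the counts consistent with the rounded recalls and the stated 90.43% accuracy, and should be read as a reconstruction.

```python
# Inferred counts (not reported in the source).
tp0, fn0 = 174, 2    # class 0: 176 samples, 174 predicted as class 0
tp1, fn1 = 34, 20    # class 1: 54 samples, 34 predicted as class 1


def prf(tp, fp, fn):
    """Return precision, recall and F1 for one class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# A class-1 sample predicted as class 0 is a false positive for class 0, and vice versa.
p0, r0, f0 = prf(tp0, fp=fn1, fn=fn0)
p1, r1, f1 = prf(tp1, fp=fn0, fn=fn1)

n0, n1 = tp0 + fn0, tp1 + fn1
total = n0 + n1
accuracy = (tp0 + tp1) / total

pairs = ((p0, p1), (r0, r1), (f0, f1))
macro = [(a + b) / 2 for a, b in pairs]                   # unweighted mean over classes
weighted = [(a * n0 + b * n1) / total for a, b in pairs]  # mean weighted by support

print([round(v, 2) for v in (p0, r0, f0)])
print([round(v, 2) for v in (p1, r1, f1)])
print(round(accuracy * 100, 2))
print([round(v, 2) for v in macro])
print([round(v, 2) for v in weighted])
```

Under these inferred counts every rounded value matches Table 3, and the weighted-average recall equals the accuracy, as it always does for this kind of report.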

In a second aspect, as shown in fig. 2, the present embodiment provides an ICU disease progression prediction system 200 based on dynamic physiological indexes, including: a data acquisition module 201, a preprocessing module 202, a first training module 203, a second training module 204, and a prediction module 205. The data acquisition module 201 acquires data according to preset screening conditions and performs data cleaning to obtain initial data. The preprocessing module 202 preprocesses the initial data to obtain training data. The first training module 203 determines base models and trains the base models to generate meta-features. The second training module 204 trains a meta-learner on the meta-features based on the original label information of the training data. The prediction module 205 inputs the test data into the base models to obtain the prediction result of each base model, and inputs the prediction results of the base models into the meta-learner to obtain the final prediction result. Because the system provided in this embodiment is used to implement the steps of the ICU disease progress prediction method based on dynamic physiological indexes provided in the first aspect of the present invention, it has all the technical effects of that method, which are not described again here.

In a third aspect, an embodiment of the present invention provides a computer readable storage medium having stored thereon a computer program, which when executed implements the ICU disease progression prediction method based on dynamic physiological indicators of any one of the first aspects above.

In a fourth aspect, an embodiment of the present invention provides a storage device, including a storage medium and a processor, where the storage medium stores a computer program, where the program when executed by the processor implements the ICU disease progression prediction method based on dynamic physiological indicators according to any one of the first aspect.

It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, the present invention should also include such modifications and variations provided that they come within the scope of the following claims and their equivalents.

While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the invention, and that alterations, modifications, substitutions and variations may be made in the above embodiments by those skilled in the art within the scope of the invention.

Claims

1. An ICU disease progression prediction method based on dynamic physiological indexes, comprising:

preprocessing initial data to obtain training data;

2. The method for predicting ICU disease progression based on dynamic physiological indicators according to claim 1, wherein the screening conditions include: deleting data of underage patients, deleting data of patients whose ICU entry was a planned admission, and deleting data for which the ICU entered was different.

3. The method for ICU disease progression prediction based on dynamic physiological indicators according to claim 1, wherein the data cleansing comprises the steps of:

Acquiring dynamic physiological index characteristics selected manually;

4. The method for predicting ICU disease progression based on dynamic physiological indicators according to claim 1, wherein preprocessing the initial data to obtain training data comprises:

5. The method for ICU disease progression prediction based on dynamic physiological indicators according to claim 4, wherein the digitizing process comprises: one-hot encoding the discrete initial data.

6. The method of dynamic physiological index based ICU disease progression prediction according to claim 1, wherein the base models comprise: AdaBoost Classifier, Bagging Classifier, RandomForest Classifier, ExtraTrees Classifier, GradientBoosting Classifier, LightGBM Classifier, CatBoost Classifier and XGBoost Classifier.

7. The method for ICU disease progression prediction based on dynamic physiological indicators according to claim 1, wherein the original label information comprises survival or death.

8. An ICU disease progression prediction system based on dynamic physiological indicators, comprising:

9. A computer readable storage medium having stored thereon a computer program, wherein the computer program when executed by a processor implements the dynamic physiological indicator based ICU disease progression prediction method of any one of claims 1 to 7.

10. A storage device comprising a storage medium storing a computer program and a processor, wherein the processor, when executing the computer program, implements the dynamic physiological index based ICU disease progression prediction method of any one of claims 1 to 7.