CN111261282A - Sepsis early prediction method based on machine learning - Google Patents
- Publication number
- CN111261282A (application number CN202010068293.2A)
- Authority
- CN
- China
- Prior art keywords
- prediction
- model
- data
- training
- sepsis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/243—Classification techniques relating to the number of classes
- G06F18/24323—Tree-organised classifiers
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Medical Informatics (AREA)
- Public Health (AREA)
- Databases & Information Systems (AREA)
- Pathology (AREA)
- Epidemiology (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a machine-learning-based method for the early prediction of sepsis. First, clinical data recorded within 24 hours after a patient enters the ICU (intensive care unit) are extracted from the electronic medical record. The data comprise demographic variables (such as age and sex), vital-sign variables (such as heart rate and systolic blood pressure) and laboratory measurements (such as creatinine and platelet count). The data are preprocessed and input into an improved deep forest algorithm model; after training and tuning, the model outputs the patient's probability of developing sepsis. The model can also rank the feature variables and output the early-warning factors that most influence early sepsis prediction. Finally, the corresponding variables of a patient to be assessed are entered into the trained model to obtain an early sepsis prediction for that patient. The method can assist doctors in clinical decision making and improve prediction accuracy.
Description
Technical Field
The invention belongs to the field of medical data mining, and particularly relates to a sepsis early-stage prediction method based on machine learning.
Background
Sepsis is a life-threatening condition: a systemic inflammatory response syndrome caused by infection, and one of the main causes of high-risk complications and death among ICU patients. An estimated 30 million people worldwide develop sepsis each year and more than 6 million die of it, so both the cost and the risk of treatment are very high. Because of its morbidity, mortality and treatment cost, sepsis has become a public health problem of global concern. The clinical definition of sepsis has been revised repeatedly, from Sepsis-1.0 to Sepsis-3.0; the latest definition, Sepsis-3, was proposed in 2016 by the European Society of Intensive Care Medicine. Clinical research on the pathogenesis of sepsis has made some progress, but the pathogenesis is complex and involves many variables, and diagnostic accuracy still needs to improve. Studies have shown that early detection and timely antibiotic treatment are critical to improving outcomes, with mortality increasing by 4%-8% for every hour that treatment is delayed. Identifying patients who are likely to develop sepsis as early as possible, treating them promptly, and studying the key factors closely related to the onset of sepsis are therefore of great value for improving prognosis. Most current studies take a medical perspective and rely on statistical analysis and simple logistic regression models; few use machine learning algorithms for the early prediction of sepsis. Current studies indicate that sepsis is a major cause of late death. The first 24 hours after a patient enters the ICU are a critical period during which most changes in the patient's condition occur.
Clinical data from these 24 hours are therefore especially valuable for the early diagnosis of sepsis. In addition, because medical data are private, many published studies are based on data from a single hospital that are difficult to share, so their methods and results cannot readily be reproduced or compared. With the rise of data-driven intelligent medicine, more and more medical staff hope to use machine learning to mine medical data and thereby deepen their understanding of diseases and improve diagnostic efficiency.
Disclosure of Invention
To address the difficulty and low accuracy of clinical sepsis diagnosis for ICU patients in the prior art, the invention provides a machine-learning-based method for the early prediction of sepsis. The method extracts multiple examination variables recorded in a patient's electronic medical record, preprocesses the data, constructs a prediction model using an improved deep forest algorithm, and outputs the probability that the patient has sepsis, so that septic patients can be predicted and identified early. The method achieves high prediction accuracy and also identifies the key factors closely related to the onset of sepsis.
In order to achieve the purpose, the invention adopts the technical scheme that:
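A sepsis early prediction method based on machine learning comprises the following steps:
Step 1: define the diagnosis task and extract, from the patient's electronic medical record or a medical data set, a plurality of prediction feature variables recorded within 24 hours after the patient enters the ICU;
Step 2: because the extracted data contain missing and abnormal values to varying degrees, preprocess the data, including variable screening, missing-value filling, outlier processing and feature extraction;
Step 3: input the preprocessed data into the constructed prediction model, which is built using an improved deep forest algorithm;
Step 4: train the prediction model, find the optimal parameters through training, and keep tuning them until model performance is stable and optimal;
Step 5: after the model is trained, perform early sepsis prediction for a new patient and output the importance metric of each feature variable.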
Further, the prediction feature variables extracted in step 1 include vital-sign variables, laboratory measurements and demographic information.
Further, the data preprocessing in step 2 comprises:
variable screening: a missing-rate threshold is set and variables whose missing rate is too high are removed; low-variance filtering is also applied, i.e. the variance of each variable is calculated, a threshold is set, and variables whose variance is below the threshold are filtered out. Zero-variance features are removed, since a variance of 0 means that the variable's value never changes and the variable gives the model nothing to discriminate on;
missing-value filling: missing values are filled using the MissForest model prediction method; MissForest is a non-parametric missing-value imputation method that uses an algorithm model to predict and fill in missing values;
outlier processing: the 6σ principle is used, where σ denotes the standard deviation of the data and the usual 3σ upper and lower bounds are widened. If an item of a patient's data deviates from the mean of its data set by more than 6 standard deviations, i.e. the value lies outside [U - 6σ, U + 6σ], where U denotes the mean of the data set, it is replaced by the lower or upper bound respectively;
feature extraction: the feature variables are expanded. Because medical scoring systems judge the severity of a patient's condition from maximum and minimum values, each variable is expanded into its maximum, minimum and mean.
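The variable-screening rule can be illustrated with a minimal Python sketch. The function name `screen_variables`, the threshold values and the sample variable names are assumptions made for illustration; the text does not specify them.

```python
from statistics import pvariance


def screen_variables(columns, max_missing_rate=0.5, min_variance=0.0):
    """Drop variables that are mostly missing or (near-)constant.

    columns maps a variable name to its list of values; None marks a
    missing entry. Both thresholds are illustrative assumptions.
    """
    kept = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        missing_rate = 1 - len(observed) / len(values)
        if missing_rate > max_missing_rate:
            continue  # missing rate too high
        if len(observed) < 2 or pvariance(observed) <= min_variance:
            continue  # zero (or too low) variance: nothing to discriminate on
        kept[name] = values
    return kept


data = {
    "heart_rate": [88, 102, None, 95, 110],
    "lactate": [None, None, None, 2.1, None],  # 80% missing
    "icu_flag": [1, 1, 1, 1, 1],               # zero variance
}
selected = screen_variables(data)
print(sorted(selected))  # ['heart_rate']
```

Only `heart_rate` survives: `lactate` exceeds the assumed missing-rate threshold and `icu_flag` never changes.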
Furthermore, in the model construction in step 3, the prediction model is constructed using an improved deep forest algorithm model;
the improved deep forest algorithm model comprises the following specific steps:
Step (1): two tree-based algorithms, random forest and XGBoost, are selected as the base learners of each layer; each layer uses 2 identical random forest models and 2 identical XGBoost models as its four base-learner prediction models. Each model is trained on the data using k-fold cross validation, where k = 10, and outputs a probability vector $X_i$;
Step (2): after k-fold training, the accuracy of each model's predictions on the training data is calculated, normalized, and used as the weight parameter of that model, denoted $w'_i$, where $W_i$ denotes the accuracy of the i-th forest algorithm model's predictions on the training data after 10-fold training and $w'_i$ denotes the weight applied to the prediction probability of the i-th model;
Step (3): the prediction probabilities of the four base learners in each layer are fused by weighting, and the resulting probability vector is denoted $X_{prob}$; the $X_{prob}$ output by each layer is then concatenated with the original features and input into the next layer. $X_{prob}$ is calculated as:
$X_{prob} = \sum_i w'_i X_i$
where $X_i$ denotes the prediction probability vector output by the i-th forest, each layer being trained with k-fold cross validation, k = 10.
Step (4): steps (1) to (3) are repeated on the next layer, and training continues until the training accuracy no longer improves, at which point it stops automatically and the weighted probability of the last layer is output as the prediction result.
Further, in the training and tuning of the model parameters in step 4, each layer of the model uses 2 identical random forests and 2 identical XGBoost models, and each forest contains 100 trees; each layer is trained with 10-fold cross validation, and the data are divided into training and test sets in a 0.8/0.2 ratio; prediction accuracy is used as the evaluation index, and training is terminated when the prediction accuracy of the next layer no longer improves.
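The data partitioning described here can be sketched in plain Python. This is a sketch under assumptions: the text does not say whether the 10 folds are drawn from the 0.8 training portion, so that is assumed, and the helper names and the fixed random seed are hypothetical.

```python
import random


def train_test_split_indices(n, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them 0.8/0.2 into train and test."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = round(n * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists for each of the k folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


train_idx, test_idx = train_test_split_indices(100)
folds = list(k_fold_indices(train_idx, k=10))
print(len(train_idx), len(test_idx), len(folds))  # 80 20 10
```

Each base learner would be fitted once per fold on `train` and scored on `validation`; the held-out `test_idx` samples are not touched during training.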
Further, the importance metric of each feature variable output in step 5 is generated during model training. The feature importance finally output by each layer is $F'_n$, calculated as:
$F'_n = w'_i * F_i$
where n denotes the n-th layer, i denotes the i-th forest algorithm model of the layer, and $F_i$ denotes the feature importances of the i-th forest model averaged over the 10 folds of training;
finally, the feature importance of the last layer is selected and output, giving the importance metric of each feature variable.
Compared with the prior art, the invention has the following beneficial effects:
The invention takes medical data extracted within 24 hours after a patient enters the ICU, preprocesses them by variable screening, missing-value filling, outlier processing and feature extraction, and constructs an improved deep forest algorithm model for the task of early sepsis prediction. Unlike the original deep forest architecture, which directly stacks the probabilities predicted by every forest of the previous layer, the improved model takes a weighted average of the class-distribution probabilities output by the forests, with the weights determined by the accuracy of each trained model. This reduces dimensionality, keeps only the key extracted information as new features for prediction, and greatly speeds up training. In addition, the method places emphasis on scoring feature importance: each layer outputs the contribution of the features in its trained models, and the overall importance scores of all features are then calculated to obtain the early-warning factors that most influence early sepsis prediction. Using machine learning to predict and classify sepsis gives higher accuracy and can help doctors assess a patient's condition, pay close attention to patients who may develop sepsis, intervene in time, and improve the prognosis of ICU patients.
Drawings
FIG. 1 is an overall flow diagram of the present invention;
FIG. 2 is a SOFA score table;
FIG. 3 is a diagram of a cascaded forest structure;
FIG. 4 is a diagram of the improved deep forest structure of the present invention;
FIG. 5 is a schematic representation of the ranking of feature importance metric values.
Detailed Description
To better explain the technical solution, a more detailed description is given below with reference to an embodiment.
A sepsis early prediction method based on machine learning comprises the following steps:
Step 1: Early prediction of sepsis is a predictive classification task, and the first requirement is data. Data are the cornerstone of the research, and each disease problem needs its own supporting data. ICU patients undergo many examinations and their data records are complex, so the required prediction feature variables must be extracted from the patient's electronic medical record or from a medical data set, chiefly the clinical data recorded within 24 hours after entering the ICU;
1.1 predictive feature variables commonly used in medical scoring systems
Clinically, sepsis is diagnosed mainly by assessing the patient's condition with a medical scoring system such as the SOFA score. The variables commonly used in these scoring systems fall into three types: vital-sign variables (e.g. heart rate, pulse, systolic blood pressure, arterial pressure), laboratory measurements (e.g. creatinine, direct bilirubin, serum glucose, lactate) and demographic information (e.g. age, weight, length of stay, admission type). FIG. 2 shows the variables needed for the SOFA score. To compare the performance of machine learning methods with that of traditional clinical diagnosis, other studies often extract and use the variables that appear in medical scoring systems, which makes comparison easier.
Step 2: After the data are extracted and before they are input into the prediction model, they must be preprocessed, including variable screening, missing-value filling, outlier processing and feature extraction, to improve data quality.
The quality of a raw data set is generally not high: it has many problems, cannot be used directly and needs some preprocessing. Real-world data often have many defects, and data exported from a hospital system can be distorted during collection by human error, equipment error or system faults. ICU data are sparse, with a large number of missing values and some outliers. The extracted variables are also measured at different frequencies: some laboratory tests, such as blood culture or white blood cell count, may take 1-5 days to return results; some variables, such as heart rate or respiratory rate, are sampled in real time; and others may be sampled every few hours or once a day. The data therefore need to be preprocessed before prediction, so that each kind of dirty data is handled appropriately and usable data of a consistent standard are obtained. The data preprocessing mainly comprises the following steps:
Variable screening: a missing-rate threshold is set and variables whose missing rate is too high are removed. Low-variance filtering is also applied: the variance of each variable is calculated, a threshold is set, and variables whose variance is below the threshold are filtered out. Zero-variance features are generally removed, since a variance of 0 means that the variable's value never changes and the variable gives the model nothing to discriminate on.
Outlier processing: because abnormal values in medical data are difficult to define, the 6σ principle is used, where σ denotes the standard deviation of the data and the usual 3σ upper and lower bounds are widened. If an item of a patient's data deviates from the mean of its data set by more than 6 standard deviations, i.e. the value lies outside [U - 6σ, U + 6σ], where U denotes the mean of the data set, it is replaced by the lower or upper bound respectively;
Missing-value filling: missing values are filled using the MissForest model prediction method. MissForest is a non-parametric missing-value imputation method that uses an algorithm model to predict and fill in missing values. The algorithm treats the columns with known data as features and the column with missing values as the label. Rows where the label is present form the training set and rows where it is missing form the test set; a random forest is trained, the missing values are predicted and updated, and the procedure then iterates with the filled-in data to predict the remaining missing data.
Feature extraction, i.e. the expansion of variables: common medical scoring systems generally judge the severity of a patient's condition from extreme values, so each variable is expanded into three features: its maximum, minimum and mean.
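Two of the preprocessing steps above, 6σ outlier replacement and max/min/mean feature expansion, can be sketched as follows. The function names and sample values are hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
from statistics import mean, pstdev


def clip_six_sigma(values, n_sigma=6):
    """Replace values outside [U - 6σ, U + 6σ] with the nearest bound."""
    u = mean(values)
    sigma = pstdev(values)  # population standard deviation (assumption)
    lower, upper = u - n_sigma * sigma, u + n_sigma * sigma
    return [min(max(v, lower), upper) for v in values]


def expand_features(series_by_variable):
    """Summarise each variable's 24-hour series as max, min and mean."""
    features = {}
    for name, series in series_by_variable.items():
        features[f"{name}_max"] = max(series)
        features[f"{name}_min"] = min(series)
        features[f"{name}_mean"] = mean(series)
    return features


# One implausible heart-rate reading among 100 values from the data set.
heart_rate_all = [80] * 99 + [1000]
clipped = clip_six_sigma(heart_rate_all)
print(max(clipped) < 1000)  # True: the outlier was pulled back to U + 6σ

# Readings for a single patient within the first 24 hours in the ICU.
patient = {"heart_rate": [88, 102, 95], "sbp": [110, 96, 104]}
features = expand_features(patient)
print(features["heart_rate_max"], features["heart_rate_min"])  # 102 88
```

The bounds are computed per variable over the whole data set, whereas the expansion is applied per patient, turning each time series into three fixed-length features.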
Step 3: After preprocessing, the data are input into the constructed prediction model, which is built using an improved deep forest algorithm. In research on medical data mining with machine learning, different algorithms must be selected or constructed for different disease-diagnosis tasks to achieve the best prediction performance. Many algorithms are available, such as logistic regression, support vector machines and k-nearest neighbors, as are improved variants of these basic algorithms. With the continued development of neural networks, deep learning approaches such as transfer learning and reinforcement learning are also widely used, with outstanding results. The deep forest algorithm is competitive with deep neural networks, has fewer hyper-parameters, trains efficiently and performs well on small data sets. In intelligent computer-aided diagnosis, many studies have obtained high accuracy with deep learning, but because neural networks are hard to interpret and behave like black boxes, doctors place little trust in them. Tree models are highly interpretable, because a prediction can be explained by the splitting path in the decision tree.
3.1 deep forest model
The multi-grained cascade forest algorithm (gcForest), also called the deep forest algorithm, was proposed in 2017 by Professor Zhi-Hua Zhou et al. It is a multi-layer classifier in which each layer is an ensemble of several base classifiers. Like a deep neural network, gcForest learns representations layer by layer. Deep neural networks are widely applied thanks to their strong representation learning, automatic feature transformation and model complexity, but they also suffer from high computational cost, a large number of parameters and a lack of interpretability. gcForest offers a new way of exploring deep algorithms other than neural networks. The gcForest algorithm consists of two parts: the first is multi-grained scanning, which extracts features from the data; the second is the cascade forest, which is trained iteratively to improve the classification result. The two parts can be used together or separately. Whereas the basic unit of a neural network is the neuron, the basic unit of gcForest is a forest, itself an ensemble of decision trees, so a deep forest can also be regarded as an ensemble model. FIG. 3 shows the cascade forest structure of the gcForest model.
3.2 improved deep forest model
gcForest is in fact a framework for deep forest algorithms and needs to be adapted to the specific application scenario. On the basis of the gcForest framework, an improved deep forest algorithm is proposed for sepsis prediction; its structure is shown in FIG. 4. Unlike the original gcForest architecture, which directly stacks the probabilities predicted by every forest of the previous layer, the improved algorithm takes a weighted average of the class-distribution probabilities output by the forests, with the weights determined by the accuracy of each trained model. The weight parameter $w'_i$ of each forest model is obtained by normalizing $W_i$,
where $W_i$ denotes the accuracy of the i-th forest algorithm model's predictions on the training data after 10-fold training;
the prediction probabilities of the four models are then fused by weighting, and the resulting probability vector is denoted $X_{prob}$, calculated as:
$X_{prob} = \sum_i w'_i X_i$
where $X_i$ denotes the prediction probability vector output by the i-th forest, each layer being trained with k-fold cross validation (k = 10).
The final probability vector $X_{prob}$ is concatenated with the original features and input into the next layer until model training is finished, and the weighted probability of the last layer is output as the prediction result.
Taking a weighted average of the probabilities predicted by the forest models reduces dimensionality, keeps only the key extracted information as new features for prediction, and greatly speeds up training. In addition, the feature importance of each layer can be output during training and averaged at the end; the model weight parameter $w'_i$ can likewise be used to assist in this computation.
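The weighted fusion within one layer can be sketched as below. The normalization formula is not reproduced in this text, so dividing each accuracy by the sum of the accuracies is an assumption, chosen so that the weights sum to 1; the function names and sample numbers are hypothetical.

```python
def normalize_weights(accuracies):
    """Turn per-model accuracies W_i into weights w'_i that sum to 1.

    Sum-normalization is an assumption; the source formula is not shown.
    """
    total = sum(accuracies)
    return [a / total for a in accuracies]


def fuse_probabilities(prob_vectors, weights):
    """Weighted sum X_prob = sum_i w'_i * X_i, computed per class."""
    n_classes = len(prob_vectors[0])
    return [
        sum(w * p[c] for w, p in zip(weights, prob_vectors))
        for c in range(n_classes)
    ]


def next_layer_input(original_features, fused):
    """Concatenate the original features with the fused probability vector."""
    return list(original_features) + list(fused)


# Four base learners (2 random forests, 2 XGBoost models), two classes.
accuracies = [0.80, 0.80, 0.90, 0.90]
probs = [[0.7, 0.3], [0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]

weights = normalize_weights(accuracies)
fused = fuse_probabilities(probs, weights)
layer_input = next_layer_input([88.0, 2.1, 65.0], fused)
print(len(layer_input))  # 5: three original features plus two fused probabilities
```

Stacking the four probability vectors instead would add eight values per sample to the next layer's input; fusing them first adds only two, which is the dimensionality reduction the text refers to.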
Step 4: Train the model. Training time varies with the scale of the training data. Model training is in essence a parameter-optimization process: the best hyper-parameters and the parameters of each base model are found through training, and tuning then continues against the evaluation index until model performance is stable and optimal. Each layer of the model uses 2 identical random forests and 2 identical XGBoost models, and each forest contains 100 trees. Each layer is trained with k-fold (k = 10) cross validation, and the data are divided into training and test sets in a 0.8/0.2 ratio. Prediction accuracy is used as the evaluation index, and training is terminated when the prediction accuracy of the next layer no longer improves.
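The layer-by-layer stopping rule can be sketched independently of any particular learner. Here `train_layer` is a hypothetical callable standing in for fitting the four base learners of one layer; the `max_layers` cap and the choice to discard the first non-improving layer are assumptions not stated in the text.

```python
def grow_cascade(train_layer, features, max_layers=20):
    """Add layers until the prediction accuracy stops improving.

    train_layer(inputs) is a hypothetical callable that fits one layer and
    returns (layer, fused_probabilities, accuracy).
    """
    layers, best_accuracy = [], float("-inf")
    inputs = features
    for _ in range(max_layers):
        layer, fused, accuracy = train_layer(inputs)
        if accuracy <= best_accuracy:
            break  # no improvement: drop this layer and stop (assumption)
        layers.append(layer)
        best_accuracy = accuracy
        # The next layer sees the original features plus the fused probabilities.
        inputs = [row + probs for row, probs in zip(features, fused)]
    return layers, best_accuracy


# Stub layer trainer with scripted accuracies, for demonstration only.
scripted = iter([0.70, 0.75, 0.78, 0.78, 0.90])


def fake_train_layer(inputs):
    fused = [[0.5, 0.5] for _ in inputs]
    return "layer", fused, next(scripted)


layers, best = grow_cascade(fake_train_layer, [[1.0, 2.0], [3.0, 4.0]])
print(len(layers), best)  # 3 0.78
```

The fourth scripted layer reaches 0.78 again, so growth stops there and the fifth layer is never trained.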
Step 5: After the model is trained, it is used to predict sepsis for a new patient. The model outputs a predicted probability of disease between 0 and 1; patients with a predicted probability greater than or equal to 0.5 are classified as 1, i.e. considered likely to develop sepsis, whereas a predicted probability below 0.5 is classified as 0, indicating that the patient is not expected to develop sepsis. Once training is complete, the importance metric of each feature variable can also be output. The feature importance finally output by each layer is $F'_n$, calculated as:
$F'_n = w'_i * F_i$
where n denotes the n-th layer, i denotes the i-th forest algorithm model of the layer, and $F_i$ denotes the feature importances of the i-th forest model averaged over the 10 folds of training.
Finally, the feature importance of the last layer is output. As shown in FIG. 5, the abscissa represents the importance metric of a feature and the ordinate the corresponding feature variable; the longer the bar, the larger the value. The feature importances sum to 1.
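The 0.5 decision threshold and the weighted feature importance can be sketched as follows. Summing $w'_i * F_i$ over the four models of a layer is an assumption (the formula as printed shows no explicit sum), adopted because it makes the result sum to 1 when the weights and each model's importances do; the feature names and numbers are hypothetical.

```python
def classify(probability, threshold=0.5):
    """Label 1 (likely sepsis) when the predicted probability is >= 0.5."""
    return 1 if probability >= threshold else 0


def layer_feature_importance(weights, importances):
    """Combine per-model importance vectors F_i using the weights w'_i.

    Summation over the models of the layer is an assumption.
    """
    n_features = len(importances[0])
    return [
        sum(w * f[j] for w, f in zip(weights, importances))
        for j in range(n_features)
    ]


names = ["lactate_max", "heart_rate_mean", "age"]  # hypothetical features
weights = [0.25, 0.25, 0.25, 0.25]
importances = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.6, 0.2, 0.2],
    [0.5, 0.3, 0.2],
]

combined = layer_feature_importance(weights, importances)
ranking = sorted(zip(names, combined), key=lambda pair: pair[1], reverse=True)
print([name for name, _ in ranking])  # ['lactate_max', 'heart_rate_mean', 'age']
print(classify(0.5), classify(0.49))  # 1 0
```

Sorting the combined vector in descending order gives the kind of ranking that FIG. 5 is described as plotting.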
The invention first extracts a patient's prediction variables from an electronic medical record or a medical data set, chiefly three categories of multidimensional information recorded within 24 hours after the patient enters the ICU: vital-sign variables, laboratory measurements and demographic information. The extracted data are then preprocessed, including variable screening, outlier processing, missing-value filling and feature extraction, which completes the cleaning and processing of the data. The processed data are input into the constructed prediction model, which uses an improved deep forest algorithm model. After training and parameter tuning, the model can make predictions for a new patient: the patient's values for the variables used in training are input, and an early sepsis prediction is obtained. The trained model can also rank the feature variables and output the early-warning factors that most influence early sepsis prediction; the higher an early-warning factor's score, the more closely it is related to the onset of sepsis. The technical solution of the invention can assist doctors in making clinical decisions and improve the accuracy of predicting a patient's condition.
The above describes only the preferred embodiments of the invention. It should be noted that those skilled in the art can make various modifications and adaptations without departing from the principles of the invention, and these are also intended to fall within its scope.
Claims (6)
1. A sepsis early prediction method based on machine learning, characterized by comprising the following steps:
step 1: define the diagnosis task and extract, from the patient's electronic medical record or a medical data set, a plurality of prediction feature variables recorded within 24 hours after the patient enters the ICU;
step 2: because the extracted data contain missing and abnormal values to varying degrees, preprocess the data, including variable screening, missing-value filling, outlier processing and feature extraction;
step 3: input the preprocessed data into the constructed prediction model, which is built using an improved deep forest algorithm;
step 4: train the prediction model, find the optimal parameters through training, and keep tuning them until model performance is stable and optimal;
step 5: after the model is trained, perform sepsis prediction for a new patient and output the importance metric of each feature variable.
2. The machine learning-based sepsis early prediction method of claim 1, characterized in that: the prediction feature variables extracted in step 1 comprise vital-sign variables, laboratory measurements and demographic information.
3. The machine learning-based sepsis early prediction method of claim 1, characterized in that: the data preprocessing in step 2 comprises:
variable screening: a missing-rate threshold is set and variables whose missing rate is too high are removed; low-variance filtering is also applied, i.e. the variance of each variable is calculated, a threshold is set, and variables whose variance is below the threshold are filtered out. Zero-variance features are removed, since a variance of 0 means that the variable's value never changes and the variable gives the model nothing to discriminate on;
missing-value filling: missing values are filled using the MissForest model prediction method; MissForest is a non-parametric missing-value imputation method that uses an algorithm model to predict and fill in missing values;
outlier processing: the 6σ principle is used, where σ denotes the standard deviation of the data. If an item of a patient's data deviates from the mean of its data set by more than 6 standard deviations, i.e. the value lies outside [U - 6σ, U + 6σ], where U denotes the mean of the data set, it is replaced by the lower or upper bound respectively;
feature extraction: the feature variables are expanded. Because medical scoring systems judge the severity of a patient's condition from maximum and minimum values, each variable is expanded into its maximum, minimum and mean.
4. The machine learning-based sepsis early prediction method of claim 1, characterized in that: in the model construction of step 3, the prediction model is constructed using an improved deep forest algorithm model;
the improved deep forest algorithm model comprises the following specific steps:
Step (1): selecting two tree-based algorithms, random forest and XGBoost, as the base learners of each layer, wherein each layer uses 2 identical random forest models and 2 identical XGBoost models as its four base learner prediction models; each model is trained on the data using k-fold cross-validation, wherein k = 10, and outputs the probability vector $X_i$ of each forest;
Step (2): after the k-fold training, the accuracy of each model's predictions on the training data is also calculated; after regularized normalization, this accuracy serves as the weight parameter of the model, denoted $w'_i$. The calculation formula is as follows: [formula not reproduced in this text]
wherein $w_i$ denotes the accuracy of the training-data predictions of the i-th forest algorithm model after 10-fold training, and $w'_i$ denotes the weight of the prediction probability of the i-th model;
Step (3): the prediction probabilities of the four base learner prediction models of each layer are fused by weighting, and the resulting output probability vector is denoted $X_{prob}$; the final probability vector $X_{prob}$ output by each layer is then concatenated with the original features and input into the next layer. $X_{prob}$ is calculated as follows:
$$X_{prob} = \sum_{i} w'_i X_i$$
wherein $X_i$ denotes the prediction probability vector output by the i-th forest, each layer being trained on the training data using k-fold cross-validation with k = 10.
Step (4): repeating steps (1) to (3) for the next layer and continuing training until the training accuracy no longer improves, at which point training stops automatically and the weighted probability of the last layer is output as the prediction result.
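Steps (2) to (4) can be sketched in plain Python as follows. Several things here are assumptions rather than part of the claims: the weight formula is not reproduced in the text, so the sketch assumes the simplest normalization, $w'_i = w_i / \sum_j w_j$; `train_layer` is a hypothetical callable standing in for training the four base learners with 10-fold cross-validation; and the sketch returns the output of the last layer that still improved accuracy, which is one possible reading of "the last layer".

```python
def normalize_weights(accuracies):
    """Map accuracies w_i to weights w'_i that sum to 1 (assumed normalization)."""
    total = sum(accuracies)
    return [w / total for w in accuracies]


def fuse_probabilities(prob_vectors, weights):
    """Weighted fusion X_prob = sum_i w'_i * X_i over the base learners of one layer."""
    n_classes = len(prob_vectors[0])
    return [
        sum(w * p[c] for w, p in zip(weights, prob_vectors))
        for c in range(n_classes)
    ]


def grow_cascade(train_layer, features, max_layers=10):
    """Add layers until the accuracy stops improving.

    `train_layer(inputs)` is a hypothetical callable that trains one layer and
    returns (fused probability vector per sample, layer accuracy).
    `max_layers` is an illustrative safety cap, not a value from the claims.
    """
    inputs = features
    best_acc, best_probs = float("-inf"), None
    for _ in range(max_layers):
        probs, acc = train_layer(inputs)
        if acc <= best_acc:
            break  # accuracy no longer improves: stop automatically
        best_acc, best_probs = acc, probs
        # the next layer receives the original features concatenated with X_prob
        inputs = [row + p for row, p in zip(features, probs)]
    return best_probs, best_acc
```

Because the assumed weights sum to 1 and each $X_i$ is a probability vector, the fused vector also sums to 1, so it can be passed to the next layer or reported directly as a class probability.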
5. The machine learning-based sepsis early prediction method of claim 1, characterized in that: in the training and tuning of the model parameters in step 4, each layer of the algorithm model used for training and prediction uses 2 identical random forests and 2 identical XGBoost models, each forest comprising 100 trees; the data in each layer are trained using 10-fold cross-validation, and the data are divided into a training set and a test set in a 0.8/0.2 ratio; prediction accuracy is used as the evaluation index, and training is terminated when the prediction accuracy of the next layer no longer improves.
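The data partitioning described in claim 5 can be sketched as below. The claim gives only the 0.8/0.2 ratio and the use of 10 folds; the random shuffle, the fixed `seed` and the interleaved fold assignment are assumptions chosen for illustration.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffling is an assumption
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=10):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]  # interleaved assignment to folds
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation
```

With 100 samples this gives 80 training and 20 test indices, and each of the 10 folds holds out 8 of the training indices for validation.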
6. The machine learning-based sepsis early prediction method of claim 1, characterized in that: the importance metric of each feature variable output in step 5 is generated during model training; the final feature importance output by each layer is $F'_n$, calculated as follows:
$$F'_n = w'_i \cdot F_i$$
wherein n denotes the n-th layer, i denotes the i-th forest algorithm model of each layer, and $F_i$ denotes the feature importances of the i-th forest model, averaged over the 10 folds of training;
finally, the feature importance of the last layer is selected and output, thereby obtaining the importance metric of each feature variable.
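The layer-level feature importance of claim 6 can be sketched as follows. The formula as printed has a free index i, so this sketch assumes the weighted terms are summed over the four base learners of the layer, mirroring the fusion of the probability vectors; the function names and the feature names in the usage note are hypothetical.

```python
def layer_feature_importance(importances, weights):
    """Combine per-model importance vectors F_i into F'_n using weights w'_i.

    Assumes F'_n = sum_i w'_i * F_i (summation over the layer's base learners).
    """
    n_features = len(importances[0])
    return [
        sum(w * f[j] for w, f in zip(weights, importances))
        for j in range(n_features)
    ]


def rank_features(names, scores):
    """Return (name, score) pairs sorted from most to least important."""
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
```

For example, calling `rank_features(["heart_rate", "lactate", "age"], scores)` on the last layer's combined scores yields the variables in descending order of importance, which is the per-variable metric reported alongside the prediction.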
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010068293.2A CN111261282A (en) | 2020-01-21 | 2020-01-21 | Sepsis early prediction method based on machine learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010068293.2A CN111261282A (en) | 2020-01-21 | 2020-01-21 | Sepsis early prediction method based on machine learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111261282A (en) | 2020-06-09 |
Family
ID=70952505
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010068293.2A Pending CN111261282A (en) | 2020-01-21 | 2020-01-21 | Sepsis early prediction method based on machine learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111261282A (en) |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111696667A (en) * | 2020-06-11 | 2020-09-22 | 吾征智能技术(北京)有限公司 | Common gynecological disease prediction model construction method and prediction system |
CN111897857A (en) * | 2020-08-06 | 2020-11-06 | 暨南大学附属第一医院(广州华侨医院) | ICU (intensive care unit) duration prediction method after aortic dissection cardiac surgery |
CN111951975A (en) * | 2020-08-19 | 2020-11-17 | 哈尔滨工业大学 | Sepsis early warning method based on deep learning model GPT-2 |
CN111951964A (en) * | 2020-07-30 | 2020-11-17 | 山东大学 | Method and system for rapidly detecting novel coronavirus pneumonia |
CN111986814A (en) * | 2020-08-21 | 2020-11-24 | 南通大学 | Modeling method of lupus nephritis prediction model of lupus erythematosus patient |
CN112069820A (en) * | 2020-09-10 | 2020-12-11 | 杭州中奥科技有限公司 | Model training method, model training device and entity extraction method |
CN112331350A (en) * | 2020-10-14 | 2021-02-05 | 华南师范大学 | Method, system and storage medium for predicting early shift into intensive care unit |
CN112365943A (en) * | 2020-10-22 | 2021-02-12 | 杭州未名信科科技有限公司 | Method and device for predicting length of stay of patient, electronic equipment and storage medium |
CN112633601A (en) * | 2020-12-31 | 2021-04-09 | 天津开心生活科技有限公司 | Method, device, equipment and computer medium for predicting disease event occurrence probability |
CN112908480A (en) * | 2021-03-17 | 2021-06-04 | 上海电气集团股份有限公司 | Organ failure early warning method and system, electronic equipment and storage medium |
CN112992368A (en) * | 2021-04-09 | 2021-06-18 | 中山大学附属第三医院(中山大学肝脏病医院) | Prediction model system and recording medium for prognosis of severe spinal cord injury |
CN113017831A (en) * | 2021-02-26 | 2021-06-25 | 上海鹰瞳医疗科技有限公司 | Method and equipment for predicting arch height after artificial lens implantation |
CN113017572A (en) * | 2021-03-17 | 2021-06-25 | 上海交通大学医学院附属瑞金医院 | Severe warning method and device, electronic equipment and storage medium |
CN113057589A (en) * | 2021-03-17 | 2021-07-02 | 上海电气集团股份有限公司 | Method and system for predicting organ failure infection diseases and training prediction model |
CN113057588A (en) * | 2021-03-17 | 2021-07-02 | 上海电气集团股份有限公司 | Disease early warning method, device, equipment and medium |
CN113096814A (en) * | 2021-05-28 | 2021-07-09 | 哈尔滨理工大学 | Alzheimer disease classification prediction method based on multi-classifier fusion |
CN113284615A (en) * | 2021-06-16 | 2021-08-20 | 北京大学人民医院 | XGboost algorithm-based gastrointestinal stromal tumor prediction method and system |
CN113517066A (en) * | 2020-08-03 | 2021-10-19 | 东南大学 | Depression assessment method and system based on candidate gene methylation sequencing and deep learning |
CN113539394A (en) * | 2020-12-31 | 2021-10-22 | 内蒙古卫数数据科技有限公司 | Multi-disease prediction method based on medical inspection data |
CN113593708A (en) * | 2021-07-12 | 2021-11-02 | 杭州电子科技大学 | Sepsis prognosis prediction method based on integrated learning algorithm |
CN113744869A (en) * | 2021-09-07 | 2021-12-03 | 中国医科大学附属盛京医院 | Method for establishing early screening of light chain amyloidosis based on machine learning and application thereof |
CN113871009A (en) * | 2021-09-27 | 2021-12-31 | 山东师范大学 | Sepsis prediction system, storage medium and apparatus in intensive care unit |
CN114420300A (en) * | 2022-01-20 | 2022-04-29 | 北京大学第六医院 | Chinese old cognitive impairment prediction model |
CN114724701A (en) * | 2022-03-11 | 2022-07-08 | 梁娜 | Noninvasive ventilation curative effect prediction system based on superposition integration algorithm and automatic encoder |
WO2022216220A1 (en) * | 2021-04-07 | 2022-10-13 | Biosigns Pte. Ltd. | Method and system for personalized prediction of infection and sepsis |
CN115579147A (en) * | 2022-09-26 | 2023-01-06 | 一选(浙江)医疗科技有限公司 | Sepsis recognition model training method, sepsis early warning method and device |
CN116580847A (en) * | 2023-07-14 | 2023-08-11 | 天津医科大学总医院 | Modeling method and system for prognosis prediction of septic shock |
CN117238510A (en) * | 2023-11-16 | 2023-12-15 | 天津中医药大学第二附属医院 | Sepsis prediction method and system based on deep learning |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106874663A (en) * | 2017-01-26 | 2017-06-20 | 中电科软件信息服务有限公司 | Cardiovascular and cerebrovascular disease Risk Forecast Method and system |
CN109119167A (en) * | 2018-07-11 | 2019-01-01 | 山东师范大学 | Pyemia anticipated mortality system based on integrated model |
CN109872819A (en) * | 2019-01-30 | 2019-06-11 | 杭州脉兴医疗科技有限公司 | A kind of acute kidney injury incidence rate forecasting system based on Intensive Care Therapy detection |
Cited By (37)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111696667A (en) * | 2020-06-11 | 2020-09-22 | 吾征智能技术(北京)有限公司 | Common gynecological disease prediction model construction method and prediction system |
CN111951964A (en) * | 2020-07-30 | 2020-11-17 | 山东大学 | Method and system for rapidly detecting novel coronavirus pneumonia |
CN113517066A (en) * | 2020-08-03 | 2021-10-19 | 东南大学 | Depression assessment method and system based on candidate gene methylation sequencing and deep learning |
CN111897857A (en) * | 2020-08-06 | 2020-11-06 | 暨南大学附属第一医院(广州华侨医院) | ICU (intensive care unit) duration prediction method after aortic dissection cardiac surgery |
CN111951975A (en) * | 2020-08-19 | 2020-11-17 | 哈尔滨工业大学 | Sepsis early warning method based on deep learning model GPT-2 |
CN111951975B (en) * | 2020-08-19 | 2022-03-25 | 哈尔滨工业大学 | Sepsis early warning method based on deep learning model GPT-2 |
CN111986814B (en) * | 2020-08-21 | 2024-01-16 | 南通大学 | Modeling method of lupus nephritis prediction model of lupus erythematosus patient |
CN111986814A (en) * | 2020-08-21 | 2020-11-24 | 南通大学 | Modeling method of lupus nephritis prediction model of lupus erythematosus patient |
CN112069820A (en) * | 2020-09-10 | 2020-12-11 | 杭州中奥科技有限公司 | Model training method, model training device and entity extraction method |
CN112069820B (en) * | 2020-09-10 | 2024-05-24 | 杭州中奥科技有限公司 | Model training method, model training device and entity extraction method |
CN112331350A (en) * | 2020-10-14 | 2021-02-05 | 华南师范大学 | Method, system and storage medium for predicting early shift into intensive care unit |
CN112365943A (en) * | 2020-10-22 | 2021-02-12 | 杭州未名信科科技有限公司 | Method and device for predicting length of stay of patient, electronic equipment and storage medium |
CN112633601A (en) * | 2020-12-31 | 2021-04-09 | 天津开心生活科技有限公司 | Method, device, equipment and computer medium for predicting disease event occurrence probability |
CN113539394A (en) * | 2020-12-31 | 2021-10-22 | 内蒙古卫数数据科技有限公司 | Multi-disease prediction method based on medical inspection data |
CN113017831A (en) * | 2021-02-26 | 2021-06-25 | 上海鹰瞳医疗科技有限公司 | Method and equipment for predicting arch height after artificial lens implantation |
CN112908480A (en) * | 2021-03-17 | 2021-06-04 | 上海电气集团股份有限公司 | Organ failure early warning method and system, electronic equipment and storage medium |
CN113057589A (en) * | 2021-03-17 | 2021-07-02 | 上海电气集团股份有限公司 | Method and system for predicting organ failure infection diseases and training prediction model |
CN113017572A (en) * | 2021-03-17 | 2021-06-25 | 上海交通大学医学院附属瑞金医院 | Severe warning method and device, electronic equipment and storage medium |
CN113057588A (en) * | 2021-03-17 | 2021-07-02 | 上海电气集团股份有限公司 | Disease early warning method, device, equipment and medium |
WO2022216220A1 (en) * | 2021-04-07 | 2022-10-13 | Biosigns Pte. Ltd. | Method and system for personalized prediction of infection and sepsis |
CN112992368B (en) * | 2021-04-09 | 2023-06-20 | 中山大学附属第三医院(中山大学肝脏病医院) | Prediction model system and storage medium for severe spinal cord injury prognosis |
CN112992368A (en) * | 2021-04-09 | 2021-06-18 | 中山大学附属第三医院(中山大学肝脏病医院) | Prediction model system and recording medium for prognosis of severe spinal cord injury |
CN113096814A (en) * | 2021-05-28 | 2021-07-09 | 哈尔滨理工大学 | Alzheimer disease classification prediction method based on multi-classifier fusion |
CN113284615A (en) * | 2021-06-16 | 2021-08-20 | 北京大学人民医院 | XGboost algorithm-based gastrointestinal stromal tumor prediction method and system |
CN113593708A (en) * | 2021-07-12 | 2021-11-02 | 杭州电子科技大学 | Sepsis prognosis prediction method based on integrated learning algorithm |
CN113744869A (en) * | 2021-09-07 | 2021-12-03 | 中国医科大学附属盛京医院 | Method for establishing early screening of light chain amyloidosis based on machine learning and application thereof |
CN113744869B (en) * | 2021-09-07 | 2024-03-26 | 中国医科大学附属盛京医院 | Method for establishing early screening light chain type amyloidosis based on machine learning and application thereof |
CN113871009A (en) * | 2021-09-27 | 2021-12-31 | 山东师范大学 | Sepsis prediction system, storage medium and apparatus in intensive care unit |
CN114420300A (en) * | 2022-01-20 | 2022-04-29 | 北京大学第六医院 | Chinese old cognitive impairment prediction model |
CN114420300B (en) * | 2022-01-20 | 2023-08-04 | 北京大学第六医院 | Chinese senile cognitive impairment prediction model |
CN114724701A (en) * | 2022-03-11 | 2022-07-08 | 梁娜 | Noninvasive ventilation curative effect prediction system based on superposition integration algorithm and automatic encoder |
CN115579147A (en) * | 2022-09-26 | 2023-01-06 | 一选(浙江)医疗科技有限公司 | Sepsis recognition model training method, sepsis early warning method and device |
CN115579147B (en) * | 2022-09-26 | 2024-02-09 | 一选(浙江)医疗科技有限公司 | Sepsis recognition model training method, sepsis early warning method and sepsis early warning device |
CN116580847B (en) * | 2023-07-14 | 2023-11-28 | 天津医科大学总医院 | Method and system for predicting prognosis of septic shock |
CN116580847A (en) * | 2023-07-14 | 2023-08-11 | 天津医科大学总医院 | Modeling method and system for prognosis prediction of septic shock |
CN117238510A (en) * | 2023-11-16 | 2023-12-15 | 天津中医药大学第二附属医院 | Sepsis prediction method and system based on deep learning |
CN117238510B (en) * | 2023-11-16 | 2024-02-06 | 天津中医药大学第二附属医院 | Sepsis prediction method and system based on deep learning |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111261282A (en) | Sepsis early prediction method based on machine learning | |
CN110827993A (en) | Early death risk assessment model establishing method and device based on ensemble learning | |
CN111951975B (en) | Sepsis early warning method based on deep learning model GPT-2 | |
CN112201330B (en) | Medical quality monitoring and evaluating method combining DRGs tool and Bayesian model | |
Zhou et al. | Modeling methodology for early warning of chronic heart failure based on real medical big data | |
CN111081379B (en) | Disease probability decision method and system thereof | |
CN114023441A (en) | Severe AKI early risk assessment model and device based on interpretable machine learning model and development method thereof | |
CN113838577B (en) | Convenient layered old people MODS early death risk assessment model, device and establishment method | |
CN113470816A (en) | Machine learning-based diabetic nephropathy prediction method, system and prediction device | |
Wang et al. | Predictive classification of ICU readmission using weight decay random forest | |
CN113593708A (en) | Sepsis prognosis prediction method based on integrated learning algorithm | |
CN115482932A (en) | Multivariate blood glucose prediction algorithm based on transfer learning and glycosylated hemoglobin | |
Van Steenkiste et al. | Sensor fusion using backward shortcut connections for sleep apnea detection in multi-modal data | |
CN111986814A (en) | Modeling method of lupus nephritis prediction model of lupus erythematosus patient | |
KR102169637B1 (en) | Method for predicting of mortality risk and device for predicting of mortality risk using the same | |
Alghatani et al. | Precision clinical medicine through machine learning: using high and low quantile ranges of vital signs for risk stratification of ICU patients | |
KR102421172B1 (en) | Smart Healthcare Monitoring System and Method for Heart Disease Prediction Based On Ensemble Deep Learning and Feature Fusion | |
US6941288B2 (en) | Online learning method in a decision system | |
CN117116477A (en) | Construction method and system of prostate cancer disease risk prediction model based on random forest and XGBoost | |
CN114464319B (en) | AMS susceptibility assessment system based on slow feature analysis and deep neural network | |
Shahul et al. | Machine Learning Based Analysis of Sepsis | |
Rajmohan et al. | G-Sep: A Deep Learning Algorithm for Detection of Long-Term Sepsis Using Bidirectional Gated Recurrent Unit | |
CN115083616A (en) | Chronic nephropathy subtype mining system based on self-supervision graph clustering | |
Umut et al. | Prediction of sepsis disease by Artificial Neural Networks | |
CN112365992A (en) | Medical examination data identification and analysis method based on NRS-LDA |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||