WO2022083140A1 - Patient length of stay prediction method and apparatus, electronic device, and storage medium

Info

Publication number: WO2022083140A1
Application number: PCT/CN2021/099644
Authority: WO
Inventors: 吴静依; 李鹏飞; 李青; 张路霞
Original assignee: 杭州未名信科科技有限公司; 浙江省北大信息技术高等研究院
Priority date: 2020-10-22
Filing date: 2021-06-11
Publication date: 2022-04-28
Also published as: CN112365943A

Abstract

The present application discloses a method and apparatus for predicting a patient's length of hospital stay, an electronic device, and a storage medium. The method comprises: constructing an ordered multi-class prediction model by cascading a plurality of binary classification base learners in series; training each base learner with a training data set until each base learner meets the performance index requirements, thereby obtaining a trained prediction model; and selecting a sample to be predicted according to preset prediction features and inputting it into the trained prediction model to obtain a prediction result. Because the ordered multi-class prediction model is built by cascading binary classification base learners, the sequential progressive relationship among the categories of the ordered multi-class outcome variable is preserved, and no proportional relationship among the ordered categories is assumed, which better matches the characteristics of real data. Because the data set is split layer by layer, the two categories in the data set used to train each base learner are relatively balanced, which effectively addresses data imbalance among the categories and improves the accuracy of the prediction result.

Description

Method, device, electronic device and storage medium for predicting length of hospital stay of patients

Technical field

The present application relates to the technical field of data processing, and in particular to a method, device, electronic device and storage medium for predicting the length of hospitalization of a patient.

Background art

The length of hospital stay is a key indicator for evaluating the efficiency of medical resource utilization. An intelligent length-of-stay prediction system can help clinicians identify patients at high disease risk and provide timely medical intervention, thereby improving the patients' hospitalization prognosis. It can also help doctors allocate limited medical resources reasonably so as to maximize their utilization efficiency. In addition, it can give patients and their families information about the expected length of stay early in the admission, so that they learn more about the illness and the likely course of hospitalization, which improves patient satisfaction with medical services and reduces doctor-patient conflicts caused by information asymmetry.

Taking kidney disease as an example, chronic kidney disease is a group of common chronic diseases resulting from kidney damage caused by various primary kidney diseases, diabetes and hypertension. China's medical and health system for kidney disease urgently needs an intelligent clinical decision support system to improve medical efficiency and patient prognosis.

Existing predictions of a patient's length of hospitalization generally rely on the clinician's work experience. Because patients' conditions are complex and a doctor's experience is highly subjective, such predictions are difficult, inefficient and inaccurate, and they cannot effectively assist doctors in clinical decision-making or improve medical efficiency.

Because the length of hospital stay in the real world is affected by human factors and fluctuates to some degree, a prediction model that is accurate to the number of days often has a large error. When the prediction of hospitalization length is converted from a numerical prediction problem into an ordered multi-class prediction problem, the differences in patient characteristics between the classification groups become more typical, which can improve the prediction accuracy of the model, and the classification results still provide enough information to support clinical decision-making and consultation with patients. At present, ordered multi-class problems are generally solved with numerical prediction models or unordered multi-class prediction models. Numerical prediction models assume that the categories of the outcome variable follow a proportional relationship, whereas in real-world ordinal multi-class data the categories often do not follow a strict proportional relationship. Unordered multi-class prediction models ignore the progressive relationship between the categories of the ordered multi-class outcome variable, so the performance of the prediction model is often limited to a certain extent. In addition, when the data are imbalanced between the categories of the ordered multi-class outcome variable, an unordered multi-class prediction model produces large prediction errors.

SUMMARY OF THE INVENTION

The purpose of the present application is to provide a method, device, electronic device and storage medium for predicting the length of a patient's hospital stay. In order to provide a basic understanding of some aspects of the disclosed embodiments, a brief summary is given below. This summary is not intended to be an extensive review, nor is it intended to identify key/critical elements or delineate the scope of protection of these embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the detailed description that follows.

According to an aspect of the embodiments of the present application, a method for predicting the length of hospitalization of a patient is provided, including:

Constructing an ordered multi-class prediction model by cascading multiple binary classification base learners in series;

Training each of the base learners with the training data set until each of the base learners meets the performance index requirements, thereby obtaining a trained prediction model;

Selecting the samples to be predicted according to the preset prediction features and inputting them into the trained prediction model to obtain the prediction result.

Further, before using the training data set to train each of the base learners, the prediction method further includes:

Based on the patient's electronic medical record data in the hospital information management system, data cleaning is performed, and training data is extracted to form a training data set.

Further, before selecting the sample to be predicted according to the preset prediction features and inputting it into the trained prediction model, the prediction method further includes:

From the electronic medical record data of the hospital information management system or from the training data set, select the predictive features with high predictive value for the length of stay of the patient;

Combined with expert knowledge, the selected predictive features are supplemented and adjusted to obtain preset predictive features.

Further, performing data cleaning includes:

Eliminate patient data with a high missing rate, remove abnormal data, and randomly fill in missing data.

Further, the binary classification base learner is a gradient boosting decision tree algorithm.

Further, using the training data set to train each of the binary classification base learners until each of them meets the performance index requirements includes:

S1. Input the training data set into the prediction model, and set the initial value m=1; the input format of a single training sample is (x, y); y is the outcome variable containing the ordered M categories, and x represents the set of prediction features of the training sample; M is the number of classification categories of the prediction model;

S2, determine whether m<M; if so, go to step S3; if not, skip to step S7;

S3. Extract the data whose y belongs to the mth category or a higher category (y≥mth category) as the training data subset of the mth base learner;

S4, mark the data of y=mth category in the training data subset with the first training label, and mark the data of y>mth category in the training data subset with the second training label;

S5, based on the training data subset and training labels obtained in the above steps, train the binary classification base learner to obtain the mth base learner;

S6, increment m by 1 and return to step S2;

S7. Output the M-1 base learners that have been trained.

Further, random hyperparameter search combined with five-fold cross-validation is used to optimize the hyperparameters of each base learner, and the F1 score is used as the reference index of model prediction performance during hyperparameter optimization.

Further, the prediction method also includes:

Based on the update of the electronic medical record data in the hospital information management system, the prediction model is updated periodically and synchronously.

According to another aspect of the embodiments of the present application, a device for predicting hospitalization length of a patient is provided, including:

A building module, used for constructing an ordered multi-class prediction model by cascading multiple binary classification base learners in series;

A training module, used for training each of the base learners with the training data set until each of the base learners meets the performance index requirements, thereby obtaining a trained prediction model;

A prediction module, used for selecting samples to be predicted according to the preset prediction features and inputting them into the trained prediction model to obtain prediction results.

According to another aspect of the embodiments of the present application, an electronic device is provided, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the program to implement the above-mentioned method for predicting the length of hospitalization of a patient.

According to another aspect of the embodiments of the present application, there is provided a computer-readable storage medium on which a computer program is stored, and the program is executed by a processor to implement the above-mentioned method for predicting the length of hospitalization of a patient.

The technical solution provided by one aspect of the embodiments of the present application may include the following beneficial effects:

In the method for predicting the length of hospitalization of a patient provided by the embodiment of the present application, an ordered multi-class prediction model is constructed by cascading multiple binary classification base learners in series, and the ordered multi-class prediction task is divided into several layer-by-layer progressive binary classification tasks. Each layer has a base learner, and the information of the sample to be predicted is input into the trained base learners layer by layer to obtain the predicted category. This preserves the sequential progressive relationship between the categories of the ordered multi-class outcome variable and does not assume a proportional relationship between the ordered categories, which is more in line with the characteristics of real data. By splitting the data set layer by layer, the two categories in the data set used to train the base learner of each layer are relatively balanced, which can effectively address data imbalance between multiple categories and improve the accuracy of the prediction results.

Other features and advantages of the present application will be set forth in the description that follows and, in part, will become apparent from the description, may be inferred or unambiguously determined from the description, or may be learned by practicing the embodiments of the present application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description, claims, and drawings.

Description of drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the accompanying drawings required for the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some of the embodiments described in this application. For those of ordinary skill in the art, other drawings can also be obtained based on these drawings without any creative effort.

FIG. 1 shows a flowchart of a method for predicting hospitalization length of a patient according to an embodiment of the present application;

FIG. 2 shows a flowchart of the training process of the base learners in an embodiment of the present application;

FIG. 3 shows a flowchart of selecting a sample to be predicted and inputting a trained prediction model to obtain a prediction result in an embodiment of the present application;

FIG. 4 shows a structural block diagram of an apparatus for predicting hospitalization length of a patient provided by another embodiment of the present application;

FIG. 5 shows a structural block diagram of an electronic device provided by another embodiment of the present application;

FIG. 6 shows a structural block diagram of an intelligent prediction system for the length of stay of a patient with kidney disease provided by another embodiment of the present application.

Detailed description

In order to make the purpose, technical solutions and advantages of the present application clearer, the present application will be further described below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application, but not to limit the present application. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present application.

It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It should also be understood that terms, such as those defined in a general dictionary, should be understood to have meanings consistent with their meanings in the context of the prior art and, unless specifically defined herein, should not be interpreted in an idealized or overly formal sense.

As shown in FIG. 1, an embodiment of the present application provides a method for predicting the length of hospitalization of a patient, including the following steps:

S1. Collect effective modeling data.

In this embodiment, a patient with renal disease is used as an example. Those skilled in the art can understand that the method in this embodiment is not limited to being used for patients with renal disease, but can also be used for predicting the length of hospitalization for patients with other diseases. Based on the electronic medical record data of kidney disease patients in the hospital information management system, after data cleaning, effective modeling data is extracted. Modeling data is the training data used to train the base learner.

Collect the electronic medical record data in the hospital information management system, and screen out the patients with chronic kidney disease based on the diagnostic criteria for chronic kidney disease given by the international KDIGO clinical guidelines for kidney disease. Patient records and feature indicators whose missing rate exceeds 30%, as well as data outliers, are deleted and are not included in the final model construction. The remaining missing values are filled with a random filling algorithm, which keeps the distribution characteristics of the real data after filling. The resulting modeling data constitutes the modeling database.
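
The cleaning step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: records are assumed to be dictionaries with missing values stored as None, the function names `missing_rate` and `clean_records` are hypothetical, and random filling is assumed to mean drawing a replacement from the observed values of the same feature. Outlier removal is omitted because the text does not define the outlier criteria.

```python
import random

MISSING = None  # assumption: missing values are represented as None


def missing_rate(values):
    """Fraction of entries in `values` that are missing."""
    return sum(v is MISSING for v in values) / len(values) if values else 0.0


def clean_records(records, max_missing=0.30, seed=0):
    """Drop sparse patients and features, then randomly fill remaining gaps.

    `records` is a list of dicts sharing the same keys (one dict per patient).
    """
    rng = random.Random(seed)
    # 1) Remove patients whose share of missing fields exceeds the threshold.
    kept = [r for r in records if missing_rate(list(r.values())) <= max_missing]
    if not kept:
        return []
    # 2) Remove feature columns whose missing rate exceeds the threshold.
    features = [f for f in kept[0]
                if missing_rate([r[f] for r in kept]) <= max_missing]
    # 3) Fill each remaining gap with a value drawn at random from the observed
    #    values of the same feature, which roughly preserves its distribution.
    cleaned = [{f: r[f] for f in features} for r in kept]
    for f in features:
        observed = [r[f] for r in cleaned if r[f] is not MISSING]
        for r in cleaned:
            if r[f] is MISSING:
                r[f] = rng.choice(observed)
    return cleaned
```

Sampling from observed values is only one way to keep the filled data close to the real distribution; the 30% threshold is the one stated in this embodiment.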

S2, screening prediction features.

Combined with expert knowledge and feature screening algorithm, a certain number of predictive features with high predictive value and easy to collect in clinical practice are selected from the modeling database to form a feature subset for modeling.

A predictive feature set is extracted from the electronic medical record data in the hospital information management system, wherein the predictive feature set includes: demographic features, kidney disease features, medical treatment features, general disease features, laboratory test index features, etc.

1) Demographic characteristics include: age, gender, marital status, occupation, education level, medical insurance type and other parameter data;

2) The characteristics of kidney disease include: chronic kidney disease stage, primary disease of kidney disease, years of diagnosis of kidney disease and other parameter data;

3) The characteristics of medical treatment include: type of medical institution, number of hospitalizations, admission status, admission route, admission department and other parameter data;

4) General disease characteristics include: the cause of admission, whether there is comorbidity (diabetes, hypertension, tumor, chronic obstructive pulmonary disease, pulmonary infection, cardiovascular disease, cerebrovascular disease, chronic liver disease) and other parameter data;

5) Laboratory test characteristics include: routine blood test, routine urinalysis, urine protein/creatinine, serum creatinine, blood glucose, blood lipids, electrolytes, serum calcium, serum phosphorus, parathyroid hormone and other parameter data.

First, a recursive feature elimination algorithm is used to screen out a subset of predictive features with high predictive value for the length of stay of patients with renal disease; then, in combination with expert knowledge, the selected predictive feature subset is supplemented and adjusted. Combining expert knowledge with a feature screening algorithm helps ensure both the accuracy of the screened features and their feasibility in clinical practice. Feature screening also reduces the complexity of the prediction model, which facilitates clinical use.
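
The two-stage selection described above can be sketched as follows. This is a generic illustration of recursive feature elimination followed by a manual adjustment step; the function names and the `importance_fn` callback are hypothetical, and the text does not state which model supplies the importances or how many features are kept.

```python
def recursive_feature_elimination(features, importance_fn, n_keep, step=1):
    """Repeatedly drop the least important features until n_keep remain.

    importance_fn(selected) is assumed to refit a model on `selected` and
    return a dict mapping each selected feature name to its importance.
    """
    selected = list(features)
    while len(selected) > n_keep:
        importances = importance_fn(selected)
        # Rank from least to most important and drop the weakest `step`.
        ranked = sorted(selected, key=lambda f: importances[f])
        n_drop = min(step, len(selected) - n_keep)
        dropped = set(ranked[:n_drop])
        selected = [f for f in selected if f not in dropped]
    return selected


def apply_expert_adjustment(selected, add=(), remove=()):
    """Supplement and adjust the algorithmic selection with expert input."""
    removed = set(remove)
    adjusted = [f for f in selected if f not in removed]
    adjusted += [f for f in add if f not in adjusted]
    return adjusted
```

In practice `importance_fn` would retrain a model on the remaining features at every round, which is what distinguishes recursive elimination from a single ranking pass.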

S3. Build a prediction model.

An ordered multi-class prediction model is constructed by cascading multiple binary classification base learners in series.

Specifically, the length of hospitalization of patients with renal disease is divided into M categories ordered from low to high, and the prediction feature subset screened in step S2 is used as the input of the prediction model. Using the cascaded layer-by-layer modeling algorithm with the gradient boosting decision tree algorithm as the base learner, a prediction model for the length of stay of patients with kidney disease is constructed. The hyperparameters of each base learner are optimized by random hyperparameter search combined with five-fold cross-validation, with the F1 score as the reference index of model prediction performance during hyperparameter optimization.

S4. Use the training data set to train each of the base learners until each of the base learners meets the performance index requirements, and obtain a trained prediction model.

The basic structure of the cascaded layer-by-layer modeling algorithm in this embodiment adopts a multi-level integrated architecture, which is composed of multiple binary classification base learners connected in series. Each layer trains a base learner respectively. The prediction model contains M-1 base learners. M is the number of classification categories of the prediction model.

The M categories of the outcome variable are arranged in increasing order. For the mth (m=1, 2, ..., M-1) base learner, the training data subset consists of the data whose y belongs to the mth category or a higher category (y≥mth category).

Given a training dataset D, its single training sample input format is (x,y). Among them, y is an outcome variable containing an ordered M classification, and the M categories of the outcome variable are arranged in increasing order to obtain the first category < the second category < ... < mth category < ... < Mth category; x represents the set of predicted features for the training samples.
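
As an illustration of how a length of stay in days could be mapped to the ordered categories 1 to M, the sketch below uses hypothetical cut points. The text does not specify the number of categories or their boundaries, so `CUT_POINTS` and `stay_category` are assumptions made only for this example.

```python
from bisect import bisect_left

# Hypothetical cut points in days; the source does not specify them.
CUT_POINTS = [7, 14, 28]  # yields M = 4 ordered categories


def stay_category(days, cut_points=CUT_POINTS):
    """Map a length of stay in days to an ordered category 1..M."""
    # bisect_left counts the cut points strictly below `days`, so a stay
    # equal to a cut point falls in the lower category.
    return bisect_left(cut_points, days) + 1
```

With these cut points, stays of up to 7 days fall in the first category and stays longer than 28 days fall in the fourth.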

As shown in Figure 2, in some embodiments, the training process of the base learner includes the following steps:

S11, input the training data set D, and set the initial value m=1;

S12, determine whether m<M: if yes, go to step S13; if no, go to step S17;

S13, extract a training data subset: extract the data of y≥mth category as the training data subset of the mth base learner;

S14, assign training labels: in the extracted training data subset, the training label of the data of y=mth category is set to 0, and the training label of the data of y>mth category is set to 1;

S15, train a base learner: based on the training data subset and training labels obtained in the above steps, train a preset binary classification base learner, thereby obtaining the mth base learner;

S16, increment m by 1 (that is, m=m+1) and return to step S12;

S17. Output the M-1 base learners that have been trained.
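
Steps S11 to S17 can be sketched as the loop below. It is a minimal sketch, not the applicant's code: `train_cascade` and the `make_learner` factory are hypothetical names, and `ThresholdLearner` is a toy stand-in on a single numeric feature used only to keep the example self-contained, whereas the embodiment uses a gradient boosting decision tree as the base learner. The toy learner assumes both labels occur in every layer's subset.

```python
class ThresholdLearner:
    """Toy stand-in for a gradient boosting classifier (one numeric feature)."""

    def fit(self, X, labels):
        zeros = [x for x, l in zip(X, labels) if l == 0]
        ones = [x for x, l in zip(X, labels) if l == 1]
        # Put the decision boundary midway between the two class means.
        self.threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
        return self

    def predict_one(self, x):
        return 1 if x > self.threshold else 0


def train_cascade(samples, num_classes, make_learner):
    """Train M-1 binary base learners for an ordered M-class outcome.

    samples: list of (x, y) pairs with y in 1..num_classes.
    make_learner: factory returning an object with fit(X, labels).
    """
    learners = []
    m = 1                                                  # S11
    while m < num_classes:                                 # S12
        subset = [(x, y) for x, y in samples if y >= m]    # S13
        X = [x for x, _ in subset]
        labels = [0 if y == m else 1 for _, y in subset]   # S14
        learner = make_learner()
        learner.fit(X, labels)                             # S15
        learners.append(learner)
        m += 1                                             # S16
    return learners                                        # S17
```

Because layer m only sees samples with y≥mth category, each later layer is trained on a smaller subset than the one before it.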

In this embodiment, random hyperparameter search combined with five-fold cross-validation is used to optimize the hyperparameters of each base learner, and the F1 score is used as the reference index of model prediction performance during hyperparameter optimization.
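
The tuning procedure can be sketched in plain Python as follows. This is an illustrative sketch under assumptions: the function names are hypothetical, `fit_predict` is a placeholder callback for training a base learner and predicting on the held-out fold, F1 is computed for label 1 as the positive class, and the search space and number of iterations are not given in the text.

```python
import random


def f1_binary(y_true, y_pred, positive=1):
    """F1 score for one class; returns 0.0 when there are no true positives."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def random_search(X, y, fit_predict, space, n_iter=10, k=5, seed=0):
    """Return the sampled hyperparameters with the best mean F1 over k folds.

    fit_predict(params, X_train, y_train, X_val) -> predicted labels.
    space: dict mapping each hyperparameter name to a list of candidates.
    """
    rng = random.Random(seed)
    folds = k_fold_indices(len(X), k, seed)
    best_params, best_score = None, -1.0
    for _ in range(n_iter):
        # Draw one random configuration from the search space.
        params = {name: rng.choice(values) for name, values in space.items()}
        scores = []
        for fold in folds:
            held_out = set(fold)
            train = [i for i in range(len(X)) if i not in held_out]
            preds = fit_predict(params, [X[i] for i in train],
                                [y[i] for i in train], [X[i] for i in fold])
            scores.append(f1_binary([y[i] for i in fold], preds))
        mean_score = sum(scores) / k
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

In the cascaded model this search would be run once per layer, on that layer's training data subset and 0/1 labels.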

S5. According to the preset prediction features, select the samples to be predicted and input them into the trained prediction model to obtain a prediction result.

The samples to be predicted are input into the prediction model to obtain the prediction result; in some embodiments, this step also includes a visual display of the prediction result.

The information of newly admitted patients is input into the hospitalization length prediction model to obtain the prediction results, and the prediction results and diagnosis and treatment suggestions are displayed visually.

The information of the new sample to be predicted is input into the trained base learners layer by layer until the predicted category is obtained and output.

In some embodiments, as shown in Figure 3, step S5 specifically includes:

S51, input the information of the sample to be predicted, and set the initial value m=1;

S52, determine whether m is less than M: if so, input the sample information into the trained mth base learner and obtain an output of 0 or 1;

S53. If the output is 0, the final predicted category of the sample is the mth category, and the process skips to step S55; if the output is 1, m is incremented by 1 (that is, m=m+1), and the process goes to step S54;

S54, determine whether m is equal to M: if yes, the final predicted category of the sample is the Mth category, and the process skips to step S55; if not, return to step S52;

S55. Output the final predicted category of the sample.
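
Steps S51 to S55 can be sketched as follows. In this minimal sketch each trained base learner is assumed to be available as a callable that maps a sample to 0 or 1; the name `predict_cascade` and that calling convention are illustrative assumptions.

```python
def predict_cascade(learners, x):
    """Return the predicted category (1..M) for sample x.

    learners: the M-1 trained base learners, in layer order; each is assumed
        to be a callable returning 0 (stop here) or 1 (go to the next layer).
    """
    num_classes = len(learners) + 1
    m = 1                                   # S51
    while m < num_classes:                  # S52
        if learners[m - 1](x) == 0:         # S53: output 0, category is m
            return m                        # S55
        m += 1                              # S53: output 1, move to next layer
    return num_classes                      # S54: m == M, category is M
```

A sample reaches the Mth category only if every one of the M-1 learners outputs 1, which is how the cascade keeps the ordering of the categories.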

S6, automatically update the prediction model.

Based on the update of the data collected by the hospital electronic medical record data management system, the hospitalization duration prediction model is updated synchronously on a regular basis.

Specifically, at the end of each year, the modeling data is refreshed from the system data of the past three years, a new hospitalization length prediction model is constructed according to the method described in step S3, and the new model replaces the previous prediction model, thereby realizing regular synchronous updates of the hospitalization length prediction model.
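
The yearly refresh can be sketched as a rolling window over admission dates. This is a sketch under assumptions: the `admission_date` field, the function names and the `build_model` callback are hypothetical, and "the past three years" is interpreted here as the three calendar years ending with the current one.

```python
from datetime import date


def modeling_window(records, year, years_back=3):
    """Select records admitted in the `years_back` calendar years ending at `year`.

    records: iterable of dicts with a hypothetical 'admission_date' (datetime.date).
    """
    start = date(year - years_back + 1, 1, 1)
    end = date(year, 12, 31)
    return [r for r in records if start <= r["admission_date"] <= end]


def yearly_update(records, year, build_model):
    """Rebuild the model from the latest window; the caller swaps it in."""
    return build_model(modeling_window(records, year))
```

Returning the new model rather than mutating a global one leaves the replacement of the previous model to the caller, for example after a validation check.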

The method for predicting the length of stay of a patient in the embodiment of the present application is based on a cascaded layer-by-layer modeling algorithm for ordered multi-class prediction. It adopts a multi-level integrated architecture formed by cascading a plurality of base learners, and it is suitable for ordered multi-class problems in which the categories do not follow a proportional relationship or in which the data are imbalanced between categories. The method divides the ordered multi-class prediction task into several progressive binary classification tasks; each layer trains one base learner, and the information of a new sample to be predicted is input into the trained base learners layer by layer until its predicted category is obtained and output. The cascaded layer-by-layer modeling algorithm retains the sequential progressive relationship between the categories of the ordered multi-class outcome variable and does not assume a proportional relationship between the ordered categories, which is more in line with the characteristics of real data. In addition, by splitting the data set layer by layer, the two categories in the data set used to train the base learner of each layer are relatively balanced, which can effectively address data imbalance between multiple categories.
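
The balancing effect of the layer-by-layer split can be illustrated with a small calculation. The category sizes below are hypothetical and chosen only for illustration; how balanced each layer actually is depends on the real distribution of lengths of stay.

```python
def layer_label_counts(class_counts):
    """For each cascade layer m, return (n_label_0, n_label_1).

    class_counts[i] is the number of training samples in category i + 1.
    """
    layers = []
    for m in range(len(class_counts) - 1):
        n_zero = class_counts[m]            # samples with y == this category
        n_one = sum(class_counts[m + 1:])   # samples with y > this category
        layers.append((n_zero, n_one))
    return layers


# Hypothetical category sizes, chosen only for illustration.
counts = [500, 300, 150, 50]
print(layer_label_counts(counts))  # [(500, 500), (300, 200), (150, 50)]
```

In this example the largest and smallest categories differ by a factor of 10 in the four-class problem, while the most skewed binary layer has a ratio of 3 to 1.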

As shown in FIG. 4 , another embodiment of the present application provides a device for predicting the length of hospitalization of a patient, including:

A building module 30, configured to construct an ordered multi-class prediction model by cascading a plurality of binary classification base learners in series;

A training module 40, configured to train each of the base learners with the training data set until each of the base learners meets the performance index requirements, thereby obtaining a trained prediction model;

A prediction module 50, configured to select the samples to be predicted according to the preset prediction features and input them into the trained prediction model to obtain the prediction result.

In some embodiments, the prediction device further includes a data extraction module 10, which is used, before the training data set is used to train each base learner, to perform data cleaning based on the patient's electronic medical record data in the hospital information management system and to extract the training data to form a training data set.

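In some embodiments, the prediction device further includes a prediction feature acquisition module 20, which is used, before the samples to be predicted are selected according to the preset prediction features and input into the trained prediction model, to select predictive features with high predictive value for the patient's length of stay from the electronic medical record data of the hospital information management system or from the training data set, and to supplement and adjust the selected predictive features in combination with expert knowledge to obtain the preset prediction features.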

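In some embodiments, the data extraction module 10 includes a cleaning unit for performing data cleaning, and the cleaning unit is specifically used for eliminating patient data with a high missing rate, removing abnormal data, and randomly filling in missing data.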

The binary classification base learner is a gradient boosting decision tree algorithm.

In some embodiments, the training module 40 is specifically used to:

S11. Input the training data set into the prediction model, and set the initial value m=1; the input format of a single training sample is (x, y); y is the outcome variable containing the ordered M categories, and x represents the set of prediction features of the training sample; M is the number of classification categories of the prediction model;

S12, determine whether m<M; if yes, go to step S13; if not, go to step S17;

S13, extract the data of y≥mth category as the training data subset of the mth base learner;

S14, mark the data of y=mth category in the training data subset with a first training label, and mark the data of y>mth category in the training data subset with a second training label;

S15, based on the training data subset and training labels obtained in the above steps, train the binary classification base learner to obtain the mth base learner;

S16, increment m by 1 and return to step S12;

S17. Output the M-1 base learners that have been trained.

In some embodiments, the training module 40 is further configured to use random hyperparameter search combined with five-fold cross-validation to optimize the hyperparameters of each base learner, with the F1 score as the reference index of model prediction performance during hyperparameter optimization.

In some embodiments, the prediction apparatus further includes an update module 60, and the update module 60 is configured to periodically update the prediction model synchronously based on the update of the electronic medical record data in the hospital information management system.

Another embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor executes the program to implement the above-mentioned method for predicting the length of hospitalization of a patient. As shown in FIG. 5, in some embodiments, the electronic device 70 may include a processor 700, a memory 701, a bus 702 and a communication interface 703, where the processor 700, the communication interface 703 and the memory 701 are connected through the bus 702. The memory 701 stores a computer program that can be run on the processor 700, and when the processor 700 runs the computer program, the method for predicting the length of hospitalization of a patient provided by any of the foregoing embodiments of the present application is executed.

Another embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, and the program is executed by a processor to implement the above-mentioned method for predicting the length of hospitalization of a patient.

As shown in FIG. 6 , another embodiment of the present application provides an intelligent prediction system for the length of hospitalization of patients with renal disease, including:

An input module, at least for entering information on newly admitted kidney disease patients;

A prediction module, at least used for predicting the length of hospital stay from the data of the newly admitted patient by means of the prediction model for the length of stay of kidney disease patients that is constructed and trained by the aforementioned method;

A display module, at least used to display the visual prediction results.

Compared with the prior art, the method for predicting the length of stay of a patient in the embodiment of the present application can achieve the following beneficial effects. The cascaded layer-by-layer modeling algorithm for ordered multi-class prediction retains the sequential progressive relationship between categories and does not assume a proportional relationship between the ordered categories, which is more in line with the characteristics of real data. By splitting the data set layer by layer, the data used to train the base learner of each layer is relatively balanced, which can effectively address data imbalance between multiple categories. At the same time, the present disclosure mines the patient data collected by the hospital electronic medical record data management system with the cascaded layer-by-layer modeling algorithm, and uses the gradient boosting decision tree algorithm as the base learner to construct a prediction model and system for patients' hospitalization duration. The system provides a visual display of prediction results for newly admitted patients and updates the intelligent hospitalization duration prediction model synchronously as the data in the hospital electronic medical record data management system is updated. This overcomes the shortcomings of existing subjective predictions based on clinicians' experience and effectively improves the efficiency and accuracy of predicting patients' length of hospitalization, thereby assisting clinical decision-making and the allocation of medical resources, and improving patients' hospitalization prognosis and satisfaction with medical care.

The method, apparatus, electronic device, and computer-readable storage medium provided by the embodiments of the present application are not limited to predicting the length of hospitalization of patients with kidney disease, but can also be widely used to predict the length of hospitalization of patients with other diseases.

It should be noted:

The term "module" is not intended to be limited to a particular physical form. Depending on the specific application, a module may be implemented in hardware, firmware, software, and/or a combination thereof. Furthermore, different modules can share common components or even be implemented by the same components. There may or may not be clear boundaries between different modules.

The algorithms and displays provided herein are not inherently related to any particular computer, virtual appliance, or other device. Various general-purpose devices may also be used with the teachings herein. The structure required to construct such devices is apparent from the above description. Furthermore, this application is not directed to any particular programming language. It should be understood that the content of the application described herein can be implemented using a variety of programming languages, and that any description of a specific language above is intended to disclose the best mode of the application.

Similarly, it is to be understood that in the above description of exemplary embodiments of the application, various features of the application are sometimes grouped together into a single embodiment, figure, or its description. This disclosure, however, should not be construed as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the Detailed Description are hereby expressly incorporated into this Detailed Description, with each claim standing on its own as a separate embodiment of this application.

It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the order indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited to the order shown, and they may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the accompanying drawings may include multiple sub-steps or multiple stages. These sub-steps or stages are not necessarily executed at the same time, but may be executed at different times, and they need not be executed sequentially; they may be performed in turn or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.

The above embodiments express only some implementations of the present application. Their description is relatively specific and detailed, but should not be construed as limiting the patent scope of the present application. It should be noted that those skilled in the art can make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the scope of protection of the present application should be determined by the appended claims.

Claims

1. A method for predicting the length of a patient's hospital stay, characterized by comprising:

constructing an ordered multi-classification prediction model by cascading a plurality of binary classification base learners;

training each of the base learners using a training data set until each of the base learners meets the performance index requirements, to obtain a trained prediction model;

selecting, according to preset prediction features, a sample to be predicted and inputting it into the trained prediction model to obtain a prediction result.
2. The method according to claim 1, characterized in that, before training each of the base learners using the training data set, the method further comprises:

performing data cleaning on the patients' electronic medical record data in the hospital information management system, and extracting training data to form the training data set.
3. The method according to claim 2, wherein, before selecting the sample to be predicted according to the preset prediction features and inputting it into the trained prediction model, the method further comprises:

selecting, from the electronic medical record data of the hospital information management system or from the training data set, prediction features with high predictive value for the patient's length of stay;

supplementing and adjusting the selected prediction features in combination with expert knowledge to obtain the preset prediction features.
4. The method according to claim 2, wherein performing data cleaning comprises:

eliminating patient data with a high missing rate, removing abnormal data, and randomly filling in missing data.
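The three cleaning operations named above could be sketched in Python as follows. This is only an illustrative reading, not the applicant's implementation: the missing-rate threshold, the use of per-feature valid ranges to define "abnormal", the choice to drop a whole record that contains an abnormal value, and filling by sampling from observed values of the same feature are all assumptions, and the feature names are hypothetical.

```python
import random


def clean_records(records, features, valid_ranges, max_missing_rate=0.5, seed=0):
    """Drop sparse records, drop out-of-range records, then randomly fill gaps."""
    rng = random.Random(seed)

    # 1. Eliminate patients whose share of missing features is too high
    #    (the 0.5 threshold is an assumed value).
    kept = []
    for rec in records:
        missing = sum(1 for f in features if rec.get(f) is None)
        if missing / len(features) <= max_missing_rate:
            kept.append(dict(rec))

    # 2. Remove abnormal data: here, any record with a value outside its
    #    assumed valid range is dropped.
    def in_range(rec):
        for f, (low, high) in valid_ranges.items():
            value = rec.get(f)
            if value is not None and not (low <= value <= high):
                return False
        return True

    kept = [rec for rec in kept if in_range(rec)]

    # 3. Randomly fill each missing value with an observed value of the
    #    same feature (one possible meaning of "randomly fill").
    for f in features:
        observed = [rec[f] for rec in kept if rec.get(f) is not None]
        for rec in kept:
            if rec.get(f) is None and observed:
                rec[f] = rng.choice(observed)
    return kept


features = ["age", "creatinine", "albumin"]          # hypothetical features
records = [
    {"age": 54, "creatinine": 1.2, "albumin": 38},
    {"age": 61, "creatinine": None, "albumin": 35},       # one gap, kept
    {"age": None, "creatinine": None, "albumin": None},   # too sparse
    {"age": 430, "creatinine": 1.0, "albumin": 40},       # abnormal age
]
cleaned = clean_records(records, features, valid_ranges={"age": (0, 120)})
print(cleaned)
```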
5. The method according to claim 1, wherein each binary classification base learner uses a gradient boosting decision tree algorithm.
6. The method according to claim 1, wherein training each of the base learners using the training data set until each of the base learners meets the performance index requirements comprises:

S1. Input the training data set into the prediction model and set the initial value m=1; a single training sample has the input format (x, y), where y is the outcome variable with M ordered categories, x represents a set of prediction features, and M is the number of classification categories of the prediction model;

S2. Determine whether m<M; if so, go to step S3; if not, skip to step S7;

S3. Extract the data with y ≥ the m-th category as the training data subset of the m-th base learner;

S4. Mark the data with y = the m-th category in the training data subset with the first training label, and mark the data with y > the m-th category in the training data subset with the second training label;

S5. Based on the training data subset and training labels obtained in the above steps, train a binary classification base learner to obtain the m-th base learner;

S6. Increment m by 1 and return to step S2;

S7. Output the M-1 trained base learners.
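Steps S1 to S7 can be sketched in Python as shown below. Several things here are assumptions made so the sketch runs without third-party packages: `StumpLearner` is a toy threshold classifier standing in for a gradient boosting decision tree, the training labels are encoded as 0 (first label, y = m) and 1 (second label, y > m), and `predict_cascade` applies one plausible prediction rule (stop at the first layer that answers "category m") that the steps above do not themselves spell out.

```python
class StumpLearner:
    """Toy binary learner; a stand-in for a gradient boosting decision tree."""

    def fit(self, xs, labels):
        # Cut halfway between the largest label-0 value and the smallest
        # label-1 value of the first feature (assumes both labels occur).
        neg = [x[0] for x, t in zip(xs, labels) if t == 0]
        pos = [x[0] for x, t in zip(xs, labels) if t == 1]
        self.cut = (max(neg) + min(pos)) / 2
        return self

    def predict_one(self, x):
        return 1 if x[0] > self.cut else 0


def train_cascade(samples, num_classes, make_learner=StumpLearner):
    learners = []
    m = 1                                                    # S1
    while m < num_classes:                                   # S2
        subset = [(x, y) for x, y in samples if y >= m]      # S3
        xs = [x for x, _ in subset]
        labels = [0 if y == m else 1 for _, y in subset]     # S4
        learners.append(make_learner().fit(xs, labels))      # S5
        m += 1                                               # S6
    return learners                                          # S7: M-1 learners


def predict_cascade(learners, x):
    # Assumed rule: the first layer that predicts label 0 decides the category.
    for m, learner in enumerate(learners, start=1):
        if learner.predict_one(x) == 0:
            return m
    return len(learners) + 1


# Hypothetical one-feature samples with M = 4 ordered categories.
samples = [([1.0], 1), ([1.2], 1), ([3.0], 2), ([3.3], 2),
           ([5.0], 3), ([5.4], 3), ([7.0], 4), ([7.5], 4)]
learners = train_cascade(samples, num_classes=4)
print([predict_cascade(learners, [v]) for v in (1.1, 3.1, 6.0, 8.0)])
```

Passing a different `make_learner` factory would let any binary classifier with compatible `fit` and `predict_one` methods take the place of the toy learner.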
7. The method according to claim 6, characterized in that random hyperparameter search combined with five-fold cross-validation is used to optimize the hyperparameters of each base learner, with the F1 score used as the reference index of model prediction performance during hyperparameter optimization.
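A minimal Python sketch of random hyperparameter search with five-fold cross-validation scored by F1 is given below. It is an illustration under stated assumptions, not the applicant's procedure: the model is a toy one-feature threshold classifier with a single hypothetical hyperparameter `weight`, the positive class for F1 is taken to be label 1, and the data, number of trials, and search range are invented. In practice a machine-learning library's search routine would likely be applied to the real base learner.

```python
import random


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def five_fold_indices(n, rng, k=5):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    order = list(range(n))
    rng.shuffle(order)
    return [order[i::k] for i in range(k)]


def fit_cut(xs, ys, weight):
    """Toy training step: place the cut between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return weight * sum(pos) / len(pos) + (1 - weight) * sum(neg) / len(neg)


def random_search(xs, ys, n_trials=20, seed=0):
    rng = random.Random(seed)
    folds = five_fold_indices(len(xs), rng)
    best_weight, best_score = None, -1.0
    for _ in range(n_trials):
        weight = rng.uniform(0.1, 0.9)       # randomly sampled hyperparameter
        scores = []
        for held_out in folds:
            held = set(held_out)
            train_x = [x for i, x in enumerate(xs) if i not in held]
            train_y = [y for i, y in enumerate(ys) if i not in held]
            cut = fit_cut(train_x, train_y, weight)
            preds = [1 if xs[i] > cut else 0 for i in held_out]
            scores.append(f1_score([ys[i] for i in held_out], preds))
        mean_score = sum(scores) / len(scores)   # mean F1 over the five folds
        if mean_score > best_score:
            best_weight, best_score = weight, mean_score
    return best_weight, best_score


# Hypothetical one-feature data: 25 samples per binary label.
data_rng = random.Random(42)
xs = ([data_rng.gauss(2.0, 1.0) for _ in range(25)]
      + [data_rng.gauss(5.0, 1.0) for _ in range(25)])
ys = [0] * 25 + [1] * 25
best_weight, best_score = random_search(xs, ys)
print(f"best weight: {best_weight:.3f}, mean F1: {best_score:.3f}")
```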
8. The method according to claim 1, wherein the method further comprises:

updating the prediction model periodically and synchronously based on updates to the electronic medical record data in the hospital information management system.
9. A device for predicting the length of hospitalization of a patient, comprising:

a building module, used to construct an ordered multi-classification prediction model by cascading a plurality of binary classification base learners;

a training module, used to train each of the base learners using a training data set until each of the base learners meets the performance index requirements, to obtain a trained prediction model;

a prediction module, used to select, according to preset prediction features, a sample to be predicted and input it into the trained prediction model to obtain a prediction result.
10. An electronic device, characterized in that it comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor executing the program to implement the method for predicting the length of hospitalization of a patient according to any one of claims 1-8.
11. A computer-readable storage medium on which a computer program is stored, characterized in that the program is executed by a processor to implement the method for predicting the length of hospitalization of a patient according to any one of claims 1-8.