CN112365943A - Method and device for predicting length of stay of patient, electronic equipment and storage medium - Google Patents


Info

Publication number
CN112365943A
Authority
CN
China
Prior art keywords
training
prediction
data
patient
prediction model
Legal status
Pending
Application number
CN202011136028.XA
Other languages
Chinese (zh)
Inventor
吴静依
李鹏飞
李青
张路霞
Current Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Original Assignee
Advanced Institute of Information Technology AIIT of Peking University
Hangzhou Weiming Information Technology Co Ltd
Application filed by Advanced Institute of Information Technology AIIT of Peking University and Hangzhou Weiming Information Technology Co Ltd
Priority to CN202011136028.XA
Publication of CN112365943A
Priority to PCT/CN2021/099644 (WO2022083140A1)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/2433 Single-class perspective, e.g. one-against-all classification; Novelty detection; Outlier detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/245 Classification techniques relating to the decision surface
    • G06F18/2451 Classification techniques relating to the decision surface linear, e.g. hyperplane
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00 ICT specially adapted for the handling or processing of medical references
    • G16H70/20 ICT specially adapted for the handling or processing of medical references relating to practices or guidelines

Abstract

The application discloses a method, a device, electronic equipment and a storage medium for predicting the length of stay of a patient in hospital. The method comprises the following steps: establishing an ordered multi-classification prediction model by cascading a plurality of binary (two-class) base learners; training each base learner with a training data set until each base learner meets the performance index requirement, to obtain a trained prediction model; and selecting a sample to be predicted according to preset prediction features, inputting the sample into the trained prediction model, and obtaining a prediction result. Because the binary base learners are connected in series to form the ordered multi-classification model, the progressive order among the categories of the ordered outcome variable is preserved, and the ordered categories are not assumed to follow an equal-ratio relationship, which better matches the characteristics of real data. In addition, the data set is split layer by layer, so the two classes in the training data of each layer's base learner are relatively balanced. This effectively addresses data imbalance among multiple categories and improves the accuracy of the prediction results.

Description

Method and device for predicting length of stay of patient, electronic equipment and storage medium
Technical Field
The application relates to the technical field of data processing, and in particular to a method, a device, electronic equipment and a storage medium for predicting a patient's length of stay in hospital.
Background
Length of stay is a key index for evaluating how efficiently medical resources are used. An intelligent length-of-stay prediction system can help clinicians identify patients at higher risk and provide timely medical intervention, improving the patients' hospitalization outcomes; it can help doctors allocate limited medical resources reasonably, maximizing their utilization efficiency; and it can give patients and their families information about the expected length of stay early in the admission, so that they understand more about the condition and the likely hospitalization. This improves patient satisfaction with medical services and reduces doctor-patient conflicts caused by information asymmetry.
Taking kidney disease as an example, chronic kidney disease is a group of common chronic conditions in which the kidneys are damaged by various primary kidney diseases, diabetes, hypertension and other causes. The kidney disease care system in China urgently needs to be combined with intelligent clinical decision support systems to improve medical efficiency and patient outcomes.
Existing predictions of hospitalization duration generally rely on the clinician's working experience. Given the complexity of patients' conditions, such experience is highly subjective: prediction is difficult, analysis is inefficient and accuracy is low, so it cannot effectively support the clinician's decisions or improve medical efficiency.
Because real-world hospitalization duration is affected by human factors and fluctuates to some extent, numerical prediction models that aim to be accurate to the day often have large errors. Converting length-of-stay prediction from a numerical prediction problem into an ordered multi-classification problem makes the differences in patient characteristics between the category groups more distinct, which can improve prediction accuracy, while the classification result still provides enough information for clinical decision support and patient consultation. At present, ordered multi-classification problems are generally solved with either a numerical prediction model or an unordered multi-classification model. The numerical prediction model assumes that the categories of the outcome variable follow a geometric (equal-ratio) relationship, but the categories of real-world ordered data often do not; the unordered multi-classification model ignores the progressive relationship among the categories of the ordered outcome variable altogether, which limits its performance to some extent. Moreover, when the categories of the ordered outcome variable are imbalanced, an unordered multi-classification model can produce larger prediction errors.
Disclosure of Invention
The application aims to provide a method and a device for predicting length of stay of a patient, electronic equipment and a storage medium. The following presents a simplified summary in order to provide a basic understanding of some aspects of the disclosed embodiments. This summary is not an extensive overview and is intended to neither identify key/critical elements nor delineate the scope of such embodiments. Its sole purpose is to present some concepts in a simplified form as a prelude to the more detailed description that is presented later.
According to an aspect of the embodiments of the present application, there is provided a method for predicting a patient's length of stay in hospital, including:
establishing an ordered multi-classification prediction model by cascading a plurality of two-classification base learners;
training each base learner by using a training data set until each base learner meets the performance index requirement to obtain a trained prediction model;
and selecting a sample to be predicted according to preset prediction characteristics, and inputting the sample to be predicted into the trained prediction model to obtain a prediction result.
Further, before training each of the base learners with a training data set, the prediction method further comprises:
performing data cleaning on the electronic medical record data of patients in the hospital information management system, and extracting training data to form the training data set.
Further, before selecting a sample to be predicted according to the preset prediction features and inputting the sample into the trained prediction model, the prediction method further comprises:
screening prediction features with high predictive value for the patient's length of stay from the electronic medical record data of the hospital information management system or from the training data set;
and supplementing and adjusting the screened prediction features with expert knowledge to obtain the preset prediction features.
Further, performing the data cleaning includes:
removing patient data with a high missing rate, removing abnormal data, and filling missing values by random imputation.
Further, the binary base learner is a gradient boosting decision tree (GBDT) algorithm.
Further, training each of the binary base learners with the training data set until each of them meets the performance index requirement includes:
S1, inputting the training data set into the prediction model and setting the initial value m = 1, where a single training sample has the input format (x, y), y is an outcome variable with M ordered categories, x represents the set of prediction features of the training sample, and M is the number of classification categories of the prediction model;
S2, determining whether m is less than M; if yes, going to step S3; if not, jumping to step S7;
S3, extracting the data whose y is greater than or equal to the m-th category as the training data subset of the m-th base learner;
S4, labeling the data in the training data subset whose y is the m-th category with a first training label, and labeling the data whose y is greater than the m-th category with a second training label;
S5, training the binary base learner on the training data subset and the training labels obtained in the above steps to obtain the m-th base learner;
S6, incrementing m by 1 and returning to step S2;
S7, outputting the M-1 trained base learners.
Further, hyper-parameter optimization of each base learner is performed by random hyper-parameter search combined with five-fold cross-validation, and the F1 score is used as the reference index of the model's predictive performance during the optimization.
Further, the prediction method further comprises:
and updating the prediction model periodically and synchronously as the electronic medical record data in the hospital information management system is updated.
According to another aspect of the embodiments of the present application, there is provided an apparatus for predicting length of stay of a patient, including:
the construction module is used for constructing an ordered multi-classification prediction model by cascading a plurality of binary base learners;
the training module is used for training each base learner by utilizing a training data set until each base learner meets the performance index requirement, so as to obtain a trained prediction model;
and the prediction module is used for selecting a sample to be predicted according to preset prediction characteristics and inputting the sample to be predicted into the trained prediction model to obtain a prediction result.
According to another aspect of the embodiments of the present application, there is provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method for predicting length of patient stay as described above.
According to another aspect of the embodiments of the present application, there is provided a computer-readable storage medium having a computer program stored thereon, where the computer program is executed by a processor to implement the method for predicting the length of a patient's stay in a hospital as described above.
The technical solution provided by one aspect of the embodiments of the application can have the following beneficial effects:
in the method for predicting a patient's length of stay provided by the embodiments of the application, a plurality of binary base learners are cascaded in series to construct an ordered multi-classification prediction model. The ordered multi-classification task is divided into several binary classification tasks that progress layer by layer, with one base learner per layer, and the information of a sample to be predicted is input into the trained base learners layer by layer to obtain the predicted category. This preserves the progressive order among the categories of the ordered outcome variable and does not assume that the ordered categories follow an equal-ratio relationship, which better matches the characteristics of real data. In addition, the data set is split layer by layer, so the two classes in the training data of each layer's base learner are relatively balanced; this effectively addresses data imbalance among the categories and improves the accuracy of the prediction results.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be obvious from the description or may be learned by practicing the embodiments of the application. The objectives and other advantages of the application may be realized and attained by the structure particularly pointed out in the written description, the claims and the appended drawings.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 shows a flowchart of a method for predicting a patient's length of stay in hospital according to an embodiment of the present application;
FIG. 2 shows a flowchart of the training process of the base learners in an embodiment of the present application;
FIG. 3 shows a flowchart of selecting a sample to be predicted and inputting it into the trained prediction model to obtain a prediction result in an embodiment of the present application;
FIG. 4 shows a block diagram of a device for predicting a patient's length of stay according to another embodiment of the present application;
FIG. 5 shows a block diagram of an electronic device provided in another embodiment of the present application;
FIG. 6 shows a block diagram of a system for intelligently predicting the length of stay of kidney disease patients according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is further described with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As shown in fig. 1, an embodiment of the present application provides a method for predicting length of stay of a patient, including the following steps:
S1, collecting valid modeling data.
In the present embodiment, kidney disease patients are taken as an example; those skilled in the art will understand that the method of this embodiment is not limited to kidney disease patients and can also be used to predict the hospitalization duration of patients with other diseases. Valid modeling data is extracted by data cleaning from the electronic medical record data of kidney disease patients in the hospital information management system. The modeling data is the training data used to train the base learners.
Electronic medical record data is collected from the hospital information management system, and kidney disease patients are screened out according to the chronic kidney disease diagnostic criteria of the international KDIGO clinical practice guideline. Patient records and feature indicators with a missing rate above 30%, as well as abnormal (outlier) values, are deleted and excluded from the final model construction. Missing values are filled with a random imputation algorithm, which keeps the imputed data consistent with the distribution of the real data. The valid modeling data thus extracted forms the modeling database.
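The cleaning rules above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's implementation. Records are assumed to be dicts of numeric values, with `None` marking a missing entry. Outliers are assumed to be detected with hypothetical per-feature `(low, high)` plausibility bounds and are then treated as missing. "Random filling" is interpreted as drawing a random observed value of the same feature (a hot-deck style imputation), which roughly preserves each feature's empirical distribution. The 30% threshold comes from the text; the function name `clean_records` is hypothetical.

```python
import random


def clean_records(records, features, max_missing=0.30, bounds=None, seed=0):
    """Outlier removal, missing-rate filtering and random imputation (sketch of step S1).

    records: list of dicts mapping feature name -> number or None (missing).
    bounds: optional hypothetical {feature: (low, high)}; values outside are outliers.
    Returns (cleaned_rows, kept_features).
    """
    rng = random.Random(seed)
    bounds = bounds or {}
    # 1) Copy the records, turning out-of-range values (outliers) into missing values.
    rows = []
    for r in records:
        row = {}
        for f in features:
            v = r.get(f)
            low, high = bounds.get(f, (float("-inf"), float("inf")))
            row[f] = v if v is not None and low <= v <= high else None
        rows.append(row)
    if not rows:
        return [], []
    # 2) Drop feature indicators missing in more than max_missing of the records.
    kept = [f for f in features
            if sum(row[f] is None for row in rows) / len(rows) <= max_missing]
    if not kept:
        return [], []
    # 3) Drop patient records missing more than max_missing of the kept features.
    rows = [{f: row[f] for f in kept} for row in rows
            if sum(row[f] is None for f in kept) / len(kept) <= max_missing]
    # 4) Random imputation: fill each gap with a randomly drawn observed value of the
    #    same feature, so each column keeps roughly its empirical distribution.
    observed = {f: [row[f] for row in rows if row[f] is not None] for f in kept}
    for row in rows:
        for f in kept:
            if row[f] is None and observed[f]:  # leave the gap if nothing was observed
                row[f] = rng.choice(observed[f])
    return rows, kept
```

Treating outliers as missing before computing missing rates is one design choice among several; dropping whole records that contain an outlier would be an equally plausible reading of the text.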
S2, screening the prediction features.
Combining expert knowledge with a feature screening algorithm, a number of prediction features that have high predictive value and are easy to obtain in clinical practice are screened from the modeling database to form the feature subset used for modeling.
A prediction feature set is extracted from the electronic medical record data in the hospital information management system, comprising: demographic features, kidney disease features, hospitalization features, general disease features, laboratory test index features, and the like.
1) Demographic features include: age, gender, marital status, occupation, education level, type of medical insurance, etc.;
2) kidney disease features include: chronic kidney disease stage, cause of kidney disease, years since kidney disease diagnosis, and other parameters;
3) hospitalization features include: medical institution type, number of hospitalizations, admission status, admission route, admitting department, and other parameters;
4) general disease features include: reason for admission, whether the patient has comorbidities (diabetes, hypertension, tumor, chronic obstructive pulmonary disease, lung infection, cardiovascular disease, cerebrovascular disease, chronic liver disease), and other parameters;
5) laboratory test index features include: complete blood count, urine protein/creatinine, blood glucose, blood lipids, electrolytes, blood calcium, blood phosphorus, intact parathyroid hormone, and other parameters.
First, a recursive feature elimination algorithm is used to screen a subset of prediction features with high predictive value for the hospitalization duration of kidney disease patients; second, the screened subset is supplemented and adjusted with expert knowledge. Combining expert knowledge with the feature screening algorithm helps ensure both the accuracy of the screened features and their feasibility in clinical practice. Feature screening also reduces the complexity of the prediction model, making it easier to apply clinically.
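A generic recursive feature elimination (RFE) loop is sketched below. It is an illustration, not the application's code. RFE normally refits a model (for example a GBDT) on the remaining features at every round and ranks them by that model's importances. Here, the hypothetical `abs_correlation` scorer (absolute Pearson correlation with the category) stands in so that the example runs without third-party libraries. In practice, scikit-learn's `sklearn.feature_selection.RFE(estimator, n_features_to_select=..., step=...)` wraps the same loop around a fitted estimator.

```python
def recursive_feature_elimination(X, y, features, importance_fn, n_keep, step=1):
    """Re-score the remaining features and drop the weakest `step` of them
    until only `n_keep` remain.

    X: list of dicts (feature -> number); y: list of ordered category labels.
    importance_fn(X, y, feats) -> {feature: score}; higher means more useful.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    remaining = list(features)
    while len(remaining) > n_keep:
        scores = importance_fn(X, y, remaining)  # real RFE refits the model here
        n_drop = min(step, len(remaining) - n_keep)
        weakest = set(sorted(remaining, key=lambda f: scores[f])[:n_drop])
        remaining = [f for f in remaining if f not in weakest]
    return remaining


def abs_correlation(X, y, feats):
    """Hypothetical importance: |Pearson correlation| between each feature and y."""
    n = len(y)
    mean_y = sum(y) / n
    sd_y = sum((v - mean_y) ** 2 for v in y) ** 0.5
    scores = {}
    for f in feats:
        col = [row[f] for row in X]
        mean_x = sum(col) / n
        sd_x = sum((v - mean_x) ** 2 for v in col) ** 0.5
        cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        # A constant feature carries no information, so it scores 0.
        scores[f] = abs(cov) / (sd_x * sd_y) if sd_x and sd_y else 0.0
    return scores
```

The returned list would then be supplemented and adjusted with expert knowledge, as described above.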
S3, constructing the prediction model.
And establishing an ordered multi-classification prediction model by cascading a plurality of two-classification base learners.
Specifically, the hospitalization duration of kidney disease patients is divided, from low to high, into M ordered categories. The prediction feature subset screened in step S2 is used as the input of the prediction model, and the prediction model of the hospitalization duration of kidney disease patients is constructed with the cascaded layer-by-layer modeling algorithm, using the gradient boosting decision tree algorithm as the base learner. Hyper-parameter optimization of each base learner uses random hyper-parameter search with five-fold cross-validation, with the F1 score as the reference index of the model's predictive performance.
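Because the method turns a numeric length of stay into M ordered categories, the first preprocessing step is a binning function. The sketch below assumes hypothetical cut points in days, because the application does not specify the category boundaries. With `[7, 14]`, it yields M = 3 categories: under 7 days, 7 to 13 days, and 14 days or more.

```python
from bisect import bisect_right


def los_to_category(days, cut_points):
    """Map a length of stay (days) to an ordered category 1..M, M = len(cut_points) + 1.

    Category 1 is days < cut_points[0]; each later category starts at the next
    cut point (inclusive). The cut points themselves are hypothetical inputs.
    """
    if any(a >= b for a, b in zip(cut_points, cut_points[1:])):
        raise ValueError("cut_points must be strictly increasing")
    return bisect_right(cut_points, days) + 1
```

For example, `[los_to_category(d, [7, 14]) for d in [2, 8, 21]]` gives `[1, 2, 3]`.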
S4, training each base learner with the training data set until each base learner meets the performance index requirement, to obtain a trained prediction model.
The cascaded layer-by-layer modeling algorithm of this embodiment uses a multi-level ensemble architecture formed by cascading several binary base learners, with one base learner trained per layer, so an ordered M-class prediction model contains M-1 base learners, where M is the number of classification categories of the prediction model.
The M categories of the outcome variable are arranged in ascending order, and for the m-th base learner (m = 1, 2, ..., M-1), the training data subset consists of the data whose y is greater than or equal to the m-th category.
Given a training data set D, each training sample has the input format (x, y), where y is an outcome variable with M ordered categories, arranged in increasing order as category 1 < category 2 < ... < category m < ... < category M, and x represents the set of prediction features of the training sample.
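Writing category m simply as the integer m, and f_m for the m-th trained base learner (notation introduced here for illustration, not used in the original text), the layer-wise training subsets, the training labels and the layer-by-layer decision rule can be summarized as:

```latex
D_m = \{(x, y) \in D : y \ge m\}, \qquad m = 1, 2, \ldots, M-1

z^{(m)} = \begin{cases} 0, & y = m \\ 1, & y > m \end{cases} \qquad \text{for } (x, y) \in D_m

\hat{y}(x) = \min\bigl(\{\, m \in \{1, \ldots, M-1\} : f_m(x) = 0 \,\} \cup \{M\}\bigr)
```

Each subset drops the samples of category m from the one above it, so D_{m+1} = D_m \setminus \{(x, y) : y = m\}, and later layers are trained on progressively fewer samples.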
As shown in fig. 2, in some embodiments, the training process of the base learner includes the steps of:
S11, inputting a training data set D, and setting the initial value m = 1;
S12, determining whether m is less than M: if yes, going to step S13; if not, jumping to step S17;
S13, extracting the training data subset: extracting the data whose y is greater than or equal to the m-th category as the training subset of the m-th base learner;
S14, labeling the data: in the extracted training data subset, setting the training label of data whose y equals the m-th category to 0, and the training label of data whose y is greater than the m-th category to 1;
S15, training a base learner: training a preset binary base learner on the training data subset and the labels obtained in the above steps, thereby obtaining the m-th base learner;
S16, incrementing m by 1 (i.e., m = m + 1) and returning to step S12;
S17, outputting the M-1 trained base learners.
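Steps S11 to S17 can be sketched as a short training loop. This is an illustrative sketch, not the application's implementation. Training data is assumed to be a list of `(x, y)` pairs with integer categories 1..M. `make_learner` is a hypothetical factory for any object whose `fit(X, z)` returns the fitted learner. The application uses GBDT as the base learner; so that the example runs without third-party libraries, the hypothetical `ThresholdStump` (a single feature threshold) stands in for it here.

```python
def train_cascade(data, M, make_learner):
    """Train the M-1 layered binary base learners (steps S11-S17).

    data: list of (x, y) pairs, x a list of feature values, y an int category 1..M.
    make_learner: zero-argument factory returning an object with fit(X, z) -> self.
    """
    learners = []
    m = 1                                             # S11: start at the first layer
    while m < M:                                      # S12: one layer per boundary m | m+1
        subset = [(x, y) for x, y in data if y >= m]  # S13: keep only y >= category m
        if not subset:
            raise ValueError(f"no training samples with y >= {m}")
        X = [x for x, _ in subset]
        z = [0 if y == m else 1 for _, y in subset]   # S14: 0 = "is m", 1 = "beyond m"
        learners.append(make_learner().fit(X, z))     # S15: train the m-th base learner
        m += 1                                        # S16: move to the next layer
    return learners                                   # S17: M-1 trained learners


class ThresholdStump:
    """Hypothetical stand-in for a GBDT: one feature j, one threshold t, predicts x[j] > t."""

    def fit(self, X, z):
        best = None
        for j in range(len(X[0])):
            # Candidate thresholds: every observed value, plus -inf ("always predict 1").
            for t in sorted({row[j] for row in X} | {float("-inf")}):
                correct = sum((row[j] > t) == (label == 1) for row, label in zip(X, z))
                if best is None or correct > best[0]:
                    best = (correct, j, t)
        _, self.j, self.t = best
        return self

    def predict_one(self, x):
        return int(x[self.j] > self.t)
```

With scikit-learn, `make_learner` could instead be `lambda: GradientBoostingClassifier(...)`, and `predict_one(x)` would become `model.predict([x])[0]`. Note that such a classifier may reject a layer whose labels contain only one class, so very sparse upper categories need care.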
Hyper-parameter optimization of each base learner is carried out by random hyper-parameter search combined with five-fold cross-validation, and the F1 score is used as the reference index of the model's predictive performance for the optimization.
S5, selecting a sample to be predicted according to the preset prediction features, and inputting the sample into the trained prediction model to obtain a prediction result.
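The hyper-parameter search can be sketched as below, assuming each base learner is tuned on its own layer's data and 0/1 labels. This is an illustrative, dependency-free version, not the application's code. It samples distinct configurations at random from a finite grid and scores each one with stratified five-fold cross-validation, using the mean per-fold F1 score of the positive class (label 1). It then keeps the best configuration. `FixedThreshold` is a hypothetical one-parameter learner, included only so that the example runs. With scikit-learn, `RandomizedSearchCV(estimator, param_distributions, n_iter=..., scoring="f1", cv=5)` performs a comparable search.

```python
import itertools
import random


def f1(y_true, y_pred):
    """F1 score of the positive class (label 1); 0.0 when there are no true positives."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def stratified_folds(z, k, rng):
    """Deal the indices of each label round-robin into k folds (stratified k-fold)."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(z)):
        idx = [i for i, v in enumerate(z) if v == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def random_search_cv(X, z, make_learner, space, n_iter=10, k=5, seed=0):
    """Randomly sample up to n_iter grid configurations; keep the best mean CV F1."""
    rng = random.Random(seed)
    folds = stratified_folds(z, k, rng)
    names = list(space)
    grid = list(itertools.product(*(space[n] for n in names)))
    best_params, best_score = None, -1.0
    for combo in rng.sample(grid, min(n_iter, len(grid))):
        params = dict(zip(names, combo))
        fold_scores = []
        for test_idx in folds:
            held_out = set(test_idx)
            train_idx = [i for i in range(len(X)) if i not in held_out]
            model = make_learner(**params).fit([X[i] for i in train_idx],
                                               [z[i] for i in train_idx])
            preds = [model.predict_one(X[i]) for i in test_idx]
            fold_scores.append(f1([z[i] for i in test_idx], preds))
        score = sum(fold_scores) / len(fold_scores)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


class FixedThreshold:
    """Hypothetical learner whose only hyper-parameter is the threshold t on x[0]."""

    def __init__(self, t):
        self.t = t

    def fit(self, X, z):
        return self  # nothing to learn; t is the hyper-parameter under search

    def predict_one(self, x):
        return int(x[0] > self.t)
```

Stratifying the folds keeps each layer's class ratio in every fold, which matters here because the per-layer data can still be imbalanced.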
The sample to be predicted is input into the prediction model to obtain a prediction result, and in some embodiments the prediction result is displayed visually.
The information of a newly admitted patient is input into the hospitalization duration prediction model to obtain a prediction result; the prediction result and diagnosis and treatment suggestions are displayed visually, and a visualization of how the patient's prediction features influence the hospitalization duration is provided based on the SHAP algorithm.
The information of the new sample to be predicted is input into the trained base learners layer by layer until a prediction category is obtained and output.
In some embodiments, as shown in fig. 3, step S5 specifically includes:
S51, inputting the information of the sample to be predicted, and setting the initial value m = 1;
S52, determining whether m is less than M: if yes, inputting the sample information into the trained m-th base learner to obtain an output of 0 or 1;
S53, if the output is 0, the final predicted category of the sample is the m-th category, and the process jumps to step S55; if the output is 1, m is incremented by 1 (i.e., m = m + 1) and the process proceeds to step S54;
S54, determining whether m is equal to M: if yes, the final predicted category of the sample is the M-th category, and the process goes to step S55; if not, it returns to step S52;
S55, outputting the final predicted category of the sample.
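The inference steps S51 to S55 reduce to a loop that stops at the first layer that outputs 0. The sketch below is illustrative. It assumes each trained layer exposes a hypothetical `predict_one(x)` method returning 0 or 1. The `Rule` class wraps a plain function so that the example runs without a trained GBDT; the thresholds used with it are made up.

```python
class Rule:
    """Hypothetical pre-trained layer: returns 1 ("longer than category m") when fn(x) is true."""

    def __init__(self, fn):
        self.fn = fn

    def predict_one(self, x):
        return int(bool(self.fn(x)))


def predict_cascade(x, learners):
    """Walk the layers in order (steps S51-S55) and return a category in 1..M."""
    M = len(learners) + 1           # M-1 learners encode M ordered categories
    m = 1                           # S51
    while m < M:                    # S52 / S54: stop once m reaches M
        if learners[m - 1].predict_one(x) == 0:
            return m                # S53: output 0 -> the sample belongs to category m
        m += 1                      # S53: output 1 -> move on to the next layer
    return M                        # S54: every layer said "beyond m" -> category M
```

For example, with `layers = [Rule(lambda x: x[0] > 2), Rule(lambda x: x[0] > 6)]`, the inputs `[1]`, `[5]` and `[9]` map to categories 1, 2 and 3.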
S6, automatically updating the prediction model.
The length-of-stay prediction model is updated periodically and synchronously as the data collected by the hospital electronic medical record data management system is updated.
Specifically, each year the modeling data is refreshed with the system data from the most recent three years, a new length-of-stay prediction model is constructed by the method of step S3, and the updated model replaces the historical prediction model, so that the length-of-stay prediction model is updated periodically and synchronously.
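The yearly refresh can be sketched as a rolling three-year window. This is an illustration under assumptions, not the application's code. Each record is assumed to carry a hypothetical `admit_date` field (a `datetime.date`). `train_fn` is a hypothetical callable that rebuilds the model from the windowed records, for example by running the cleaning, feature screening and cascade training steps described above.

```python
from datetime import date


def window_start(today, years=3):
    """First day of the training window: the same calendar day `years` years earlier."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:  # today is 29 Feb and the target year is not a leap year
        return today.replace(year=today.year - years, day=28)


def refresh_model(records, today, train_fn, years=3):
    """Rebuild the model from admissions in [window_start, today) (sketch of step S6).

    records: dicts with a hypothetical "admit_date" field of type datetime.date.
    train_fn: hypothetical callable that builds a new model from the windowed records.
    """
    start = window_start(today, years)
    window = [r for r in records if start <= r["admit_date"] < today]
    return train_fn(window)  # the caller swaps this in for the historical model
```

Running `refresh_model` once a year (for example from a scheduler) and replacing the deployed model with its result mirrors the replacement policy described above.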
The patient length-of-stay prediction method is based on a cascaded layer-by-layer modeling algorithm for ordered multi-classification prediction. It adopts a multi-level ensemble architecture formed by cascading several base learners, and is suited to prediction problems in which the ordered categories do not follow a geometric (equal-ratio) relationship or the data are imbalanced across categories. In the method provided by the embodiments of the application, the ordered multi-classification task is divided into several binary classification tasks that progress layer by layer; one base learner is trained per layer, and the information of a new sample to be predicted is input into the trained base learners layer by layer until a prediction category is obtained and output. The cascaded layer-by-layer modeling algorithm preserves the progressive order among the categories of the ordered outcome variable and does not assume that the ordered categories follow an equal-ratio relationship, which better matches the characteristics of real data. In addition, because the data set is split layer by layer, the two classes in the training data of each layer's base learner are relatively balanced, which effectively addresses data imbalance among multiple categories.
As shown in fig. 4, another embodiment of the present application provides an apparatus for predicting length of stay of a patient, including:
the construction module 30 is configured to construct an ordered multi-classification prediction model by cascading a plurality of binary base learners;
the training module 40 is configured to train each of the base learners by using a training data set until each of the base learners meets the performance index requirement, so as to obtain a trained prediction model;
and the prediction module 50 is used for selecting a sample to be predicted according to preset prediction characteristics and inputting the sample to be predicted into the trained prediction model to obtain a prediction result.
In some embodiments, the prediction apparatus further includes a data extraction module 10 for performing data cleaning based on electronic medical record data of a patient in the hospital information management system before training each base learner with a training data set, and extracting the training data to form the training data set.
In some embodiments, the prediction apparatus further includes a prediction feature obtaining module 20 configured to perform the following before a sample to be predicted is selected according to the preset prediction features and input into the trained prediction model:
screening, from the electronic medical record data of the hospital information management system or from the training data set, prediction features with high predictive value for a patient's length of stay;
and supplementing and adjusting the screened prediction features in combination with expert knowledge to obtain the preset prediction features.
In some embodiments, the data extraction module 10 includes a cleaning unit for performing data cleaning, and the cleaning unit is specifically configured for:
removing data of patients with a high missing rate, removing abnormal data, and filling the remaining missing values by random imputation.
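The three cleaning steps can be sketched in plain Python. The application gives no thresholds or definitions, so the following are assumptions of this sketch: a record is dropped when more than half of its fields are missing (`max_missing_rate`), "abnormal data" means a value outside a caller-supplied plausible range (`valid_ranges`, hypothetical) and causes the whole record to be dropped, and random imputation draws a replacement from the observed values of the same field.

```python
import random
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[float]]  # one patient; None marks a missing value


def clean_records(records: List[Record],
                  valid_ranges: Dict[str, Tuple[float, float]],
                  max_missing_rate: float = 0.5,
                  seed: int = 0) -> List[Record]:
    """Drop sparse and abnormal records, then randomly impute the remaining gaps."""
    rng = random.Random(seed)
    fields = sorted({f for r in records for f in r})
    if not fields:
        return [dict(r) for r in records]

    # 1. Remove patients whose share of missing fields exceeds max_missing_rate.
    kept = [r for r in records
            if sum(r.get(f) is None for f in fields) / len(fields) <= max_missing_rate]

    # 2. Remove records with any value outside its plausible range (treated as abnormal).
    def is_normal(r: Record) -> bool:
        return all(r.get(f) is None or lo <= r[f] <= hi
                   for f, (lo, hi) in valid_ranges.items())

    kept = [r for r in kept if is_normal(r)]

    # 3. Random imputation: fill each gap with a value drawn from the observed
    #    values of the same field among the kept records (left as None if none exist).
    observed = {f: [r[f] for r in kept if r.get(f) is not None] for f in fields}
    cleaned = []
    for r in kept:
        filled = dict(r)
        for f in fields:
            if filled.get(f) is None and observed[f]:
                filled[f] = rng.choice(observed[f])
        cleaned.append(filled)
    return cleaned
```

The input records are copied rather than modified, so the raw extract from the hospital information management system stays available for auditing.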
In some embodiments, the binary base learner is a gradient boosting decision tree (GBDT) algorithm.
In certain embodiments, the training module 40 is specifically configured to perform:
S11, inputting the training data set into the prediction model and setting the initial value m = 1, wherein a single training sample has the input format (x, y), y is the outcome variable with M ordered categories, x represents the set of prediction features of the training sample, and M is the number of categories of the prediction model;
S12, judging whether m is less than M; if so, going to step S13; if not, jumping to step S17;
S13, extracting the data whose y is greater than or equal to the m-th category as the training data subset of the m-th base learner;
S14, labeling the data in the training data subset whose y is the m-th category with a first training label, and labeling the data in the training data subset whose y is greater than the m-th category with a second training label;
S15, training a binary base learner on the training data subset and the training labels obtained in the above steps to obtain the m-th base learner;
S16, increasing m by 1 and returning to step S12;
and S17, outputting the M-1 trained base learners.
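Steps S11 to S17 can be sketched in dependency-free Python. To keep the example self-contained, a one-feature threshold stump stands in for the gradient boosting decision tree named in the text, and the first and second training labels are encoded as 0 and 1; both are assumptions of this sketch. Any function with the same `fit` signature, for instance a wrapper around a GBDT library, could be passed instead.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]        # (x, y) with y in 1..M
Model = Callable[[Sequence[float]], float]  # returns P(second label | x)
FitFn = Callable[[List[Sequence[float]], List[int]], Model]


def fit_stump(X: List[Sequence[float]], labels: List[int]) -> Model:
    """Hypothetical stand-in for a GBDT: the single-feature threshold with fewest errors."""
    best: Optional[Tuple[int, int, float]] = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            errors = sum((x[j] > t) != bool(lab) for x, lab in zip(X, labels))
            if best is None or errors < best[0]:
                best = (errors, j, t)
    if best is None:
        raise ValueError("samples need at least one feature")
    _, j, t = best
    return lambda x: 1.0 if x[j] > t else 0.0


def train_cascade(data: List[Sample], M: int, fit: FitFn = fit_stump) -> List[Model]:
    """Steps S11 to S17: train M-1 binary base learners layer by layer."""
    learners: List[Model] = []
    m = 1                                                  # S11
    while m < M:                                           # S12
        subset = [(x, y) for x, y in data if y >= m]       # S13
        if not subset:
            raise ValueError(f"no training samples with y >= {m}")
        X = [x for x, _ in subset]
        labels = [0 if y == m else 1 for _, y in subset]   # S14: first label 0, second label 1
        learners.append(fit(X, labels))                    # S15
        m += 1                                             # S16
    return learners                                        # S17
```

Each learner sees only the samples with y >= m, which is the layer-by-layer splitting the text credits with keeping the two classes at each layer relatively balanced.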
In some embodiments, the training module 40 is further specifically configured to optimize the hyperparameters of each base learner by random hyperparameter search combined with five-fold cross-validation, using the F1 score as the reference metric of model predictive performance during hyperparameter optimization.
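The hyperparameter search can be sketched as follows. Assumptions of this sketch: settings are sampled at random without replacement from a finite candidate grid (random search is often run over continuous distributions instead), all candidates share one shuffled five-fold split, the second training label (1) is the positive class for F1, and a fold with no positives and no predicted positives scores 1.0. The `fit` callable and the parameter names given to it are hypothetical; for a GBDT they might be settings such as the number of trees or the learning rate.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple

Params = Dict[str, object]
Predictor = Callable[[Sequence[float]], int]
FitFn = Callable[[List[Sequence[float]], List[int], Params], Predictor]


def f1(y_true: List[int], y_pred: List[int], positive: int = 1) -> float:
    """F1 = 2TP / (2TP + FP + FN); 1.0 when a fold has no positives and none predicted."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


def random_search_cv(X: List[Sequence[float]], y: List[int], fit: FitFn,
                     space: Dict[str, List[object]], n_iter: int = 10,
                     k: int = 5, seed: int = 0) -> Tuple[Params, float]:
    """Randomly sample hyperparameter settings and score each by mean k-fold F1."""
    rng = random.Random(seed)
    names = sorted(space)
    grid = [dict(zip(names, combo))
            for combo in itertools.product(*(space[n] for n in names))]
    candidates = rng.sample(grid, min(n_iter, len(grid)))

    # Shuffle once and split into k folds so every candidate sees the same folds.
    idx = list(range(len(X)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]

    best_params: Params = {}
    best_score = -1.0
    for params in candidates:
        scores = []
        for i, test_idx in enumerate(folds):
            train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
            model = fit([X[j] for j in train_idx], [y[j] for j in train_idx], params)
            scores.append(f1([y[j] for j in test_idx], [model(X[j]) for j in test_idx]))
        mean_f1 = sum(scores) / len(scores)
        if mean_f1 > best_score:
            best_params, best_score = params, mean_f1
    return best_params, best_score
```

In the cascade, this search would be run once per layer, on that layer's training data subset and binary labels.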
In some embodiments, the prediction apparatus further comprises an updating module 60 configured to periodically update the prediction model in step with updates to the electronic medical record data in the hospital information management system.
Another embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method for predicting length of stay of a patient as described above. As shown in fig. 5, in some embodiments, the electronic device 70 may include: the processor 700, the memory 701, the bus 702 and the communication interface 703, wherein the processor 700, the communication interface 703 and the memory 701 are connected through the bus 702; the memory 701 stores a computer program that can be executed on the processor 700, and the processor 700 executes the computer program to perform the method for predicting the length of a patient's stay in a hospital as provided in any of the foregoing embodiments of the present application.
Another embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above-mentioned method for predicting the length of a patient's stay in a hospital.
As shown in fig. 6, another embodiment of the present application provides an intelligent length-of-stay prediction system for kidney disease patients, which includes:
an input module, configured at least to input information on newly admitted kidney disease patients;
a prediction module, configured at least to predict the length of stay of a newly admitted patient from that patient's data, using a kidney disease length-of-stay prediction model constructed and trained by the above method;
and a display module, configured at least to display the prediction results visually.
Compared with the prior art, the patient length-of-stay prediction method provided by the embodiments of the present application can achieve the following beneficial effects. The cascaded layer-by-layer modeling algorithm for ordered multi-class prediction preserves the sequential, progressive relationship among the categories of the ordered multi-class outcome variable and does not assume that the ordered categories stand in an equal-ratio relationship, so it better matches the characteristics of real data; and because the data set is split layer by layer, the two categories in the data set used to train the base learner at each layer are relatively balanced, which effectively mitigates data imbalance among multiple categories. Meanwhile, based on the cascaded layer-by-layer modeling algorithm, the method mines patient data collected by the hospital's electronic medical record management system and, using a gradient boosting decision tree algorithm as the base learner, constructs a patient-oriented length-of-stay prediction model and system. The system displays visual prediction results for newly admitted patients and updates the intelligent length-of-stay prediction model in step with data updates in the electronic medical record management system. This overcomes the shortcoming of existing practice, in which length of stay is estimated subjectively from the clinician's experience, and improves the efficiency and accuracy of length-of-stay prediction, thereby supporting clinical decision-making and the allocation of medical resources and improving patients' hospitalization outcomes and satisfaction with care.
The method, apparatus, electronic device and computer-readable storage medium provided by the embodiments of the present application are not limited to predicting the length of stay of kidney disease patients and can be widely applied to predicting the length of stay of patients with other diseases.
It should be noted that:
the term "module" is not intended to be limited to a particular physical form. Depending on the particular application, a module may be implemented as hardware, firmware, software, and/or combinations thereof. Furthermore, different modules may share common components or even be implemented by the same component. There may or may not be clear boundaries between the various modules.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The above embodiments express only several implementations of the present application; although they are described in specific detail, they should not be construed as limiting the scope of the present application. It should be noted that a person skilled in the art can make several variations and improvements without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of the present application shall be subject to the appended claims.

Claims (11)

1. A method for predicting length of stay of a patient, comprising:
establishing an ordered multi-class prediction model by cascading a plurality of binary base learners;
training each base learner with a training data set until each base learner meets the required performance metrics, to obtain a trained prediction model;
and selecting a sample to be predicted according to preset prediction features, and inputting the sample to be predicted into the trained prediction model to obtain a prediction result.
2. The method of claim 1, wherein prior to said training each of said base learners with a training data set, said prediction method further comprises:
performing data cleaning on patients' electronic medical record data in a hospital information management system, and extracting training data to form the training data set.
3. The method according to claim 2, wherein before the selecting the sample to be predicted according to the preset prediction features and inputting the sample to be predicted into the trained prediction model, the prediction method further comprises:
screening, from the electronic medical record data of the hospital information management system or from the training data set, prediction features with high predictive value for a patient's length of stay;
and supplementing and adjusting the screened prediction features in combination with expert knowledge to obtain the preset prediction features.
4. The method of claim 2, wherein the performing data cleaning comprises:
removing data of patients with a high missing rate, removing abnormal data, and filling missing values by random imputation.
5. The method of claim 1, wherein the binary base learner is a gradient boosting decision tree algorithm.
6. The method of claim 1, wherein training each of the binary base learners with a training data set until each of the binary base learners meets the required performance metrics comprises:
S1, inputting the training data set into the prediction model and setting the initial value m = 1, wherein a single training sample has the input format (x, y), y is an outcome variable with M ordered categories, x represents the set of prediction features of the training sample, and M is the number of categories of the prediction model;
S2, judging whether m is less than M; if so, going to step S3; if not, jumping to step S7;
S3, extracting the data whose y is greater than or equal to the m-th category as the training data subset of the m-th base learner;
S4, labeling the data in the training data subset whose y is the m-th category with a first training label, and labeling the data in the training data subset whose y is greater than the m-th category with a second training label;
S5, training a binary base learner on the training data subset and the training labels obtained in the above steps to obtain the m-th base learner;
S6, increasing m by 1 and returning to step S2;
and S7, outputting the M-1 trained base learners.
7. The method of claim 6, wherein hyperparameter optimization of each base learner is performed by random hyperparameter search combined with five-fold cross-validation, with the F1 score used as the reference metric of model predictive performance for the hyperparameter optimization.
8. The method of claim 1, wherein the prediction method further comprises:
periodically updating the prediction model in step with updates to the electronic medical record data in a hospital information management system.
9. An apparatus for predicting length of stay of a patient, comprising:
a construction module, configured to construct an ordered multi-class prediction model by cascading a plurality of binary base learners;
a training module, configured to train each base learner with a training data set until each base learner meets the required performance metrics, so as to obtain a trained prediction model;
and a prediction module, configured to select a sample to be predicted according to preset prediction features and input the sample to be predicted into the trained prediction model to obtain a prediction result.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the program to implement the method of predicting length of a patient's stay in a hospital as claimed in any one of claims 1 to 8.
11. A computer-readable storage medium, on which a computer program is stored, which program is executable by a processor for implementing a method for predicting the length of a patient's stay in a hospital as claimed in any one of claims 1 to 8.

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011136028.XA CN112365943A (en) 2020-10-22 2020-10-22 Method and device for predicting length of stay of patient, electronic equipment and storage medium
PCT/CN2021/099644 WO2022083140A1 (en) 2020-10-22 2021-06-11 Patient length of stay prediction method and apparatus, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011136028.XA CN112365943A (en) 2020-10-22 2020-10-22 Method and device for predicting length of stay of patient, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112365943A true CN112365943A (en) 2021-02-12

Family

ID=74511555

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011136028.XA Pending CN112365943A (en) 2020-10-22 2020-10-22 Method and device for predicting length of stay of patient, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN112365943A (en)
WO (1) WO2022083140A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113197578A (en) * 2021-05-07 2021-08-03 Tianjin Medical University Schizophrenia classification method and system based on multi-center model
CN113393939A (en) * 2021-04-26 2021-09-14 Shanghai Mijian Information Technology Co., Ltd. Method and system for predicting the number of hospitalization days of intensive care unit patients
WO2022083140A1 * 2020-10-22 2022-04-28 Hangzhou Weiming Information Technology Co., Ltd. Patient length of stay prediction method and apparatus, electronic device, and storage medium
CN117894481A (en) * 2024-03-15 2024-04-16 Changchun University Heart disease prediction method and device using a gradient boosting tree with Bayesian hyperparameter optimization

Families Citing this family (3)

Publication number Priority date Publication date Assignee Title
CN115831300B (en) * 2022-09-29 2023-12-29 Guangzhou KingMed Center for Clinical Laboratory Co., Ltd. Detection method, device, equipment and medium based on patient information
CN116434893B (en) * 2023-06-12 2023-08-29 Zhongcai Bangye (Hangzhou) Intelligent Technology Co., Ltd. Concrete compressive strength prediction model, construction method, medium and electronic equipment
CN117472789B (en) * 2023-12-28 2024-03-12 Chengdu Technological University Software defect prediction model construction method and device based on ensemble learning

Citations (4)

Publication number Priority date Publication date Assignee Title
CN106202883A (en) * 2016-06-28 2016-12-07 Chengdu University of Traditional Chinese Medicine Method for building a disease cloud map based on big data analysis
CN107103198A (en) * 2017-04-26 2017-08-29 Shanghai United Imaging Healthcare Co., Ltd. Medical data processing method, device and equipment
CN108231146A (en) * 2017-12-01 2018-06-29 South China Normal University Deep-learning-based method, system and device for building a medical record model
US20190357797A1 (en) * 2018-05-28 2019-11-28 The Governing Council Of The University Of Toronto System and method for generating visual identity and category reconstruction from electroencephalography (eeg) signals

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
CN107403198B (en) * 2017-07-31 2020-12-22 Guangzhou Tanji Technology Co., Ltd. Official website identification method based on cascade classifier
CN112365943A (en) * 2020-10-22 2021-02-12 Hangzhou Weiming Information Technology Co., Ltd. Method and device for predicting length of stay of patient, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2022083140A1 (en) 2022-04-28

Similar Documents

Publication Publication Date Title
CN112365943A (en) Method and device for predicting length of stay of patient, electronic equipment and storage medium
US11152119B2 (en) Care path analysis and management platform
EP3234823B1 (en) Differential medical diagnosis apparatus adapted in order to determine an optimal sequence of diagnostic tests for identifying a pathology by adopting diagnostic appropriateness criteria
US11170900B2 (en) Method and apparatus for refining similar case search
Vedomske et al. Random forests on ubiquitous data for heart failure 30-day readmissions prediction
JP2012221508A (en) System and computer readable medium for predicting patient outcomes
CN112635011A (en) Disease diagnosis method, disease diagnosis system, and readable storage medium
US20200388358A1 (en) Machine Learning Method for Generating Labels for Fuzzy Outcomes
CN114942947A (en) Follow-up visit data processing method and system based on intelligent medical treatment
EP4012717A1 (en) A pregnancy decision support system and method
CN110021386B (en) Feature extraction method, feature extraction device, equipment and storage medium
WO2021044594A1 (en) Method, system, and apparatus for health status prediction
CN116721699A (en) Intelligent recommendation method based on tumor gene detection result
CN114783587A (en) Intelligent prediction system for severe acute kidney injury
CN114974555A (en) System for predicting risk of severe acute kidney injury after mechanical ventilation
Mansouri et al. A hybrid machine learning approach for early mortality prediction of ICU patients
CN114203306A (en) Medical event prediction model training method, medical event prediction method and device
CN113947278A (en) Hospital specialty decision support system, method and corresponding device and storage medium
Myrzakerimova et al. Development of an automated expert system for diagnosing diseases of internal organs
Almeida et al. A recommender system based on cohorts’ similarity
Reches et al. From phenotyping to genotyping-bioinformatics for the busy clinician
Kondylakis et al. Computerized clinical guidelines: Current status & principles for future research
US20240078266A1 (en) Hierarchical tagging for personalized matching
Feng et al. A Hybrid Data Mining Approach for Generalizing Characteristics of Emergency Department Visits Causing Overcrowding.
US20160140292A1 (en) System and method for sorting a plurality of data records

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination