WO2023085674A1

WO2023085674A1 - Device and method for predicting discharge of inpatient

Info

Publication number: WO2023085674A1
Application number: PCT/KR2022/016885
Authority: WO
Inventors: 김영학; 전태준; 안임진; 강희준; 권한슬; 김윤하; 서혜람; 조하나; 최희정; 김민경; 한지예
Original assignee: 재단법인 아산사회복지재단; 울산대학교 산학협력단
Priority date: 2021-11-11
Filing date: 2022-11-01
Publication date: 2023-05-19
Also published as: KR20230068717A

Abstract

A device for predicting the discharge of an inpatient according to an embodiment may include a processor that acquires, for a patient hospitalized at a first point in time, a first probability score, indicating the probability of the patient being discharged within a target period from the first point in time, by applying a machine-learning model to input data including medical data collected during the hospitalization period of the patient from the hospitalization date to the first point in time, and predicts, on the basis of the acquired first probability score, whether the patient will be discharged within the target period from the first point in time. The processor may, if the patient remains hospitalized at a second point in time later than the first point in time, update the medical data collected during the hospitalization period on the basis of medical data collected from the first point in time to the second point in time, acquire a second probability score, indicating the probability of the patient being discharged within a target period from the second point in time, by applying the machine-learning model to input data including the updated medical data, and predict, on the basis of the acquired second probability score, whether the patient will be discharged within the target period from the second point in time.

Description

Apparatus and method for predicting discharge of inpatients

Hereinafter, technology related to an apparatus and method for predicting discharge of an inpatient is disclosed.

Effective resource management in hospitals can improve the quality of medical services by reducing the labor burden on staff, reducing waiting time for patients awaiting hospitalization, and ensuring optimal treatment time. Efficient hospital operation requires effective bed management, and a patient staying in the hospital longer than the optimal treatment time may interfere with bed management. Estimating the length of a patient's hospital stay can support informed decisions about bed management.

Efficient use of costly and scarce human and physical resources is essential for the efficient operation of hospital processes. Hospitals may be required to improve overall management efficiency while handling a variety of resources, such as staff and staff schedules, beds, and clinical pathways. Hospital resource management may include bed management. In most hospitals today, clinicians manually check a patient's condition and decide whether to continue hospitalization or discharge the patient. Based on that decision, medical personnel and staff can determine bed capacity and schedule patient appointments in the near future. The number of patients hospitalized for chronic and acute diseases such as cardiovascular disease (CVD) is steadily increasing, and insufficient treatment may cause readmission or complications. If a patient is hospitalized longer than the optimal treatment time, effective bed management can be difficult. It is therefore important to estimate the length of a patient's hospital stay accurately and to decide on discharge carefully.

Many studies have focused on the efficient use of hospital resources, and most of them propose algorithms or models to improve bed management. Integer linear programs have been proposed to plan beds and solve the associated optimization problems, and simulated bed occupancy schedules have been described. In addition, Monte Carlo simulations of bed use by surgical patients have been studied to determine ICU capacity. In particular, the expected length of stay (LOS) is one of the pieces of information needed for bed management, and LOS can be predicted from an electronic health record (EHR). Machine learning (ML) based models have been used to predict LOS, prolonged hospitalization, and unscheduled readmission, and to find biomarkers of critical illness. Recently, much research has been conducted on explainable artificial intelligence (XAI); in one XAI study, a model was developed that predicts acute disease and provides both the results and their interpretation. XAI research on medical imaging has been more active than research on EHR data, because the important parts of an image can be visualized directly. To support bed management, an ML-based predictive model can be developed that provides daily discharge probabilities and "individual descriptors" that visualize the features affecting each prediction.

A machine learning (ML) based predictive model can be developed to predict the discharge probability of patients hospitalized with cardiovascular diseases (CVDs). The results of the predictive model can be evaluated, and the key risk factors of each inpatient can be described to support patient-specific treatment. As a result, bed schedules can be managed efficiently and long-term inpatients can be detected in advance, improving the utilization of hospital processes and the quality of medical services.

Patients with chronic and acute diseases, including cardiovascular disease, may have high rates of hospitalization, readmission, and complications. Alternatives such as transfer to another hospital may be used to address delays in treatment or hospitalization, which can cause serious problems. Hospitals must continuously seek fundamental measures to reduce waiting time, and efficient bed management is one of them. Because diseases are diverse, it may be advantageous to implement bed management first for specific departments or diseases (e.g., clustered specific wards) by finding common risk factors, and then to expand it to the hospital level. Developing an ML-based model that predicts the discharge of patients hospitalized with CVD makes it possible to determine available bed capacity in the near future and to discover risk factors. By providing persuasive discharge information, such as each patient's expected discharge date and CVD risk factors, the model can help medical staff who manage beds manually to do so accurately.

Individual descriptors are proposed to evaluate the prediction results and to describe the main risk factors of inpatients for patient-specific care. Even patients who have the same disease and share common variables indicative of that disease may differ in characteristics, medical history, circumstances, and treatment, so the variables unique to each patient may need to be identified and monitored. The results of the ML-based predictive model may include information about a patient's daily discharge probability as well as feature contributions such as feature importance. Each patient's daily discharge probability and the features affecting that patient during the hospital stay can be visualized. Individual descriptors can help medical teams and patients obtain a rational basis for the results of ML-based models, understand the patient's condition in detail, and prepare for treatment in advance. Individual analyses can focus on each patient, and the meaningful features identified can be used in other studies as a basis for identifying in advance the variables that influence hospitalization.

This can help manage bed reservations efficiently and detect long-term inpatients in advance. Bed management may refer to the process of identifying the patients with the highest probability of being discharged, securing the number of available beds, and allocating beds to waiting patients who have reserved hospitalization. Because the process is complex and usually manual, the expected LOS and discharge probabilities returned by the model can assist it by making near-future bed capacity visible. Both patients with a high probability of discharge and patients with a consistently low probability of discharge can be detected; that is, the causes of long-term hospitalization of high-risk patients can be identified, analyzed, and provided to the management team.
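As a rough illustration of the bed-management use described above, the following Python sketch ranks inpatients by their latest discharge probability score, estimates the expected number of discharges within the target period, and flags patients whose score has stayed low. It is not the method of this disclosure: the Inpatient class, the threshold of 0.2, and the five-day window are hypothetical, and summing the scores gives an expected discharge count only if the scores are well calibrated.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Inpatient:
    # Hypothetical record: one discharge probability score per hospital day, oldest first.
    patient_id: str
    daily_scores: List[float] = field(default_factory=list)


def rank_by_discharge_probability(patients: List[Inpatient]) -> List[Inpatient]:
    """Order inpatients by their latest score, most likely to be discharged first."""
    return sorted(patients, key=lambda p: p.daily_scores[-1], reverse=True)


def expected_discharges(patients: List[Inpatient]) -> float:
    """Expected discharges within the target period; meaningful only if scores are calibrated."""
    return sum(p.daily_scores[-1] for p in patients)


def flag_persistently_low(patients: List[Inpatient], threshold: float = 0.2, days: int = 5) -> List[str]:
    """IDs of patients whose last `days` scores are all below `threshold` (hypothetical values)."""
    return [
        p.patient_id
        for p in patients
        if len(p.daily_scores) >= days and all(s < threshold for s in p.daily_scores[-days:])
    ]
```

The ranking supports choosing which beds are likely to free up first, while the persistently-low flag is one simple way to surface candidates for long-term hospitalization review.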

In summary, an ML-based model can be developed to predict whether a patient hospitalized with CVD will be discharged within 3 days. Based on the model, individual descriptors can be proposed, and bed management simulations can be shown, including discharge probabilities and influential features such as demographics, prescribed medications, and treatments. This can help improve the efficient use of hospital resources and the quality of health care.

However, the technical challenges are not limited to those described above, and other technical challenges may exist.

Cohort criteria can be established, and data can be extracted from CardioNet, a manually curated database specializing in CVD. The data can be processed into a suitable data set by re-indexing the dates, integrating current features with historical features from the preceding 3 years, and imputing missing values. ML-based predictive models can be trained and evaluated to find a well-performing model. The probability of discharge within 3 days can then be predicted, and the results can be explained by identifying, quantifying, and visualizing the model's features.

By developing an ML-based predictive model, it is possible to predict, for each day of hospitalization of each CVD patient, the probability of discharge within the following 3 days, and thereby to obtain an individual LOS estimate.

An apparatus for predicting discharge of a patient according to an embodiment may include a processor that, for a patient hospitalized at a first time point, obtains a first probability score that the patient will be discharged from the hospital within a target period from the first time point by applying a machine learning model to input data including medical data collected during the patient's hospitalization period from the hospitalization date to the first time point, and predicts, based on the obtained first probability score, whether the patient will be discharged within the target period from the first time point.

The processor may obtain the first probability score by applying the machine learning model to input data including medical data on one or a combination of two or more of operation, procedure, picture archiving and communication system (PACS), diagnosis, medication, laboratory, and physical information collected for the patient from the hospitalization date to the first time point.

In response to the patient remaining hospitalized at a second time point after the first time point, the processor may update the medical data collected during the hospitalization period based on medical data collected from the first time point to the second time point, obtain a second probability score that the patient will be discharged from the hospital within the target period from the second time point by applying the machine learning model to input data including the updated medical data, and predict, based on the obtained second probability score, whether the patient will be discharged within the target period from the second time point.
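The daily update-and-rescore loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: medical data is represented as a dictionary of cumulative feature values (for example, counts of one-hot-encoded codes), each day's new records are added to it, score_fn is a hypothetical stand-in for the trained machine learning model, and the 0.5 decision threshold is likewise hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Features = Dict[str, float]


def merge_day(cumulative: Features, new_day: Features) -> Features:
    """Return a copy of the cumulative features with one day's new values added (assumed count-like)."""
    merged = dict(cumulative)
    for name, value in new_day.items():
        merged[name] = merged.get(name, 0.0) + value
    return merged


def rolling_predictions(
    daily_records: List[Features],
    score_fn: Callable[[Features], float],
    threshold: float = 0.5,  # hypothetical decision threshold
) -> List[Tuple[int, float, bool]]:
    """Re-score the patient each day on all data collected since the hospitalization date."""
    cumulative: Features = {}
    results = []
    for day, new_day in enumerate(daily_records):
        cumulative = merge_day(cumulative, new_day)  # update hospitalization-period data
        score = score_fn(cumulative)  # probability of discharge within the target period
        results.append((day, score, score >= threshold))
    return results
```

Day 0 plays the role of the first time point, and each later iteration corresponds to a subsequent time point at which the patient is still hospitalized.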

The processor may obtain the first probability score by applying the machine learning model to input data including the medical data collected during the hospitalization period and medical data collected during a predefined period prior to the hospitalization period.

The processor may obtain the first probability score by applying the machine learning model to input data including, together with the medical data collected during the hospitalization period, medical data on one or a combination of two or more of diagnosis, medication, laboratory, physical information, and intensive care unit (ICU) length of stay (LOS) collected for the patient during a predefined period prior to the hospitalization period.

The processor may select one or more features of the collected data as features of the input data of the machine learning model based on the feature importance of each feature in an ad hoc machine learning model trained on all features of the collected data, train the machine learning model based on the selected features, and obtain the first probability score by applying the machine learning model to input data including the selected features.

The processor may select one or more features as an input of the machine learning model by applying a recursive feature elimination with cross validation (RFECV) technique to the features of the input data.
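Recursive feature elimination with cross-validation can be sketched without any library as below, assuming two hypothetical callbacks: importance_fn returns feature importances from a model fit on a given subset (playing the role of the ad hoc model), and cv_score_fn returns a mean cross-validated score such as AUROC. This simplified variant ranks features on the whole subset at each step; scikit-learn's RFECV instead runs the elimination inside each fold before choosing the number of features, so the two can select different subsets.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rfecv(
    features: Sequence[str],
    importance_fn: Callable[[List[str]], Dict[str, float]],
    cv_score_fn: Callable[[List[str]], float],
    min_features: int = 1,
) -> Tuple[List[str], Dict[int, float]]:
    """Simplified recursive feature elimination with cross-validation.

    importance_fn(subset): importances from a model fit on `subset` (hypothetical callback)
    cv_score_fn(subset):   mean cross-validated score for `subset`, e.g. AUROC (hypothetical)
    Returns the best-scoring subset and the score recorded for each subset size.
    """
    current = list(features)
    history: Dict[int, float] = {}
    best_subset, best_score = list(current), float("-inf")
    while True:
        score = cv_score_fn(current)
        history[len(current)] = score
        if score > best_score:  # on ties, the larger subset seen first is kept
            best_subset, best_score = list(current), score
        if len(current) <= min_features:
            break
        importances = importance_fn(current)
        weakest = min(current, key=lambda f: importances[f])
        current.remove(weakest)  # drop the least important feature and repeat
    return best_subset, history
```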

The processor may obtain the first probability score by applying an extreme gradient boosting (XGB) model to the input data.

The processor may select one or more of the features based on feature influence, that is, the portion of the obtained first probability score attributable to each feature of the input data.

The apparatus for predicting discharge of a patient according to an embodiment may further include a display that displays the feature influence of the selected one or more features on the obtained first probability score.

A method for predicting discharge of a patient according to an embodiment includes: obtaining, for a patient hospitalized at a first time point, a first probability score that the patient will be discharged from the hospital within a target period from the first time point by applying a machine learning model to input data including medical data collected during the patient's hospitalization period from the hospitalization date to the first time point; and predicting, based on the obtained first probability score, whether the patient will be discharged within the target period from the first time point.

The obtaining of the first probability score may include obtaining the first probability score by applying the machine learning model to input data including medical data on one or a combination of two or more of operation, procedure, picture archiving and communication system (PACS), diagnosis, medication, laboratory, and physical information collected for the patient from the hospitalization date to the first time point.

The method for predicting discharge of a patient according to an embodiment may further include, in response to the patient remaining hospitalized at a second time point after the first time point: updating the medical data collected during the hospitalization period based on medical data collected from the first time point to the second time point; obtaining a second probability score that the patient will be discharged from the hospital within the target period from the second time point by applying the machine learning model to input data including the updated medical data; and predicting, based on the obtained second probability score, whether the patient will be discharged within the target period from the second time point.

The obtaining of the first probability score may include obtaining the first probability score by applying the machine learning model to input data including the medical data collected during the hospitalization period and medical data collected during a predefined period prior to the hospitalization period.

The obtaining of the first probability score may include obtaining the first probability score by applying the machine learning model to input data including, together with the medical data collected during the hospitalization period, medical data on one or a combination of two or more of diagnosis, medication, laboratory, physical information, and intensive care unit (ICU) length of stay (LOS) collected for the patient during a predefined period prior to the hospitalization period.

The method for predicting discharge of a patient according to an embodiment may further include: selecting one or more features of the collected data as features of the input data of the machine learning model based on the feature importance of each feature in an ad hoc machine learning model trained on all features of the collected data; and training the machine learning model based on the selected features, wherein the obtaining of the first probability score includes obtaining the first probability score by applying the machine learning model to the input data including the selected features.

The selecting of the one or more features as input features of the machine learning model may include selecting one or more features as inputs of the machine learning model by applying a recursive feature elimination with cross-validation (RFECV) technique to the features of the input data.

The obtaining of the first probability score may include obtaining the first probability score by applying an extreme gradient boosting (XGBoost) model to the input data.

The method for predicting discharge of a patient according to an embodiment may further include selecting one or more of the features based on feature influence, that is, the portion of the obtained first probability score attributable to each feature of the input data.

The method for predicting discharge of a patient according to an embodiment may further include displaying the feature influence of the selected one or more features with respect to the obtained first likelihood score.

The method for predicting discharge of a patient according to an embodiment may further include displaying probability scores for a plurality of time points during the hospitalization period.

A computer program according to an embodiment may be combined with hardware and stored in a computer readable recording medium to execute any one of the methods described above.

Five ML-based models can be tested using five-fold cross-validation. XGB (extreme gradient boosting), selected as the final model, can achieve an average area under the receiver operating characteristic curve (AUROC) of 0.865, higher than that of the other models (e.g., logistic regression, random forest, support vector machine, and multilayer perceptron). Feature reduction (e.g., feature selection) can be performed, feature importance can be indicated, and the prediction results can be evaluated. One of the results, an individual descriptor, can provide medical staff and patients with discharge probability scores and daily feature influence scores. To show how these results can be used, bed management can be simulated and visualized.
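The AUROC used to compare the models equals the probability that a randomly chosen positive example (discharged within the target period) receives a higher score than a randomly chosen negative example, with ties counted as one half. The following pure-Python sketch computes it that way and averages it over cross-validation folds; the function names are illustrative, and a library implementation would normally be used in practice.

```python
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a random positive > score of a random negative), ties counting 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_fold_auroc(folds: List[Tuple[Sequence[int], Sequence[float]]]) -> float:
    """Average AUROC over cross-validation folds given as (labels, scores) pairs."""
    return sum(auroc(y, s) for y, s in folds) / len(folds)
```

The pairwise loop is quadratic in the number of examples; a rank-based formula gives the same value more efficiently on large test folds.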

In the present invention, individual descriptors can be proposed based on the developed ML-based predictive model, which provides the discharge probability and the relative contribution of each feature. The apparatus and method according to the present invention may help medical teams and patients identify personal and common CVD risk factors, and help hospital administrators improve the management of beds and other resources.

FIG. 1 shows an operation of an apparatus for predicting a patient's discharge according to an embodiment.

FIG. 2 shows the overall flow of a method for predicting discharge of a patient according to an embodiment.

FIG. 3 shows medical data according to an embodiment.

FIG. 4 illustrates a preprocessing process for obtaining medical data from raw medical data according to an embodiment.

FIG. 5 illustrates labeling of medical data according to an embodiment.

FIG. 6 shows acquisition of a probability score and prediction of discharge by a processor according to an embodiment.

FIG. 7 shows cross-validation performed in training a machine learning model according to an embodiment.

FIG. 8 shows ROC curves comparing the performance of a plurality of machine learning models according to an embodiment.

FIG. 9 illustrates selecting features to be applied to a machine learning model based on feature importance according to an embodiment.

FIG. 10 shows the performance of machine learning models using input data including selected features according to an embodiment.

FIG. 11 shows a waterfall chart expressing feature influence according to an embodiment.

FIG. 12 shows predicted probability scores at multiple time points during a patient's hospital stay according to an embodiment.

FIG. 13 shows a simulated impact on bed management of applying a predictive model and individual descriptors according to an embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for illustrative purposes only, and may be changed and implemented in various forms. Therefore, the form actually implemented is not limited only to the specific embodiments disclosed, and the scope of the present specification includes changes, equivalents, or substitutes included in the technical idea described in the embodiments.

Although terms such as first or second may be used to describe various components, such terms should only be construed for the purpose of distinguishing one component from another. For example, a first element may be termed a second element, and similarly, a second element may be termed a first element.

It should be understood that when an element is referred to as being "connected" to another element, it may be directly connected to the other element, or other elements may be present in between.

Singular expressions include plural expressions unless the context clearly dictates otherwise. In this specification, terms such as "comprise" or "have" are intended to designate that the described feature, number, step, operation, component, part, or combination thereof exists, but one or more other features or numbers, It should be understood that the presence or addition of steps, operations, components, parts, or combinations thereof is not precluded.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art and, unless explicitly defined in this specification, should not be interpreted in an idealized or overly formal sense.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. In the description with reference to the accompanying drawings, the same reference numerals are given to the same components regardless of the figure in which they appear, and duplicate descriptions of them are omitted.

FIG. 1 shows an operation of an apparatus for predicting a patient's discharge according to an embodiment. The apparatus 100 for predicting discharge of a patient may include a processor 110 and a display 120.

The processor 110 may use a machine learning model to obtain a likelihood score (also referred to herein as a probability score or discharge probability) that the patient will be discharged. The likelihood score may represent the likelihood that the patient will be discharged within a target period (e.g., 3 days) from the prediction time point. The processor 110 may predict, based on the obtained score, whether the patient will be discharged within the target period from the prediction time point. The operation of the processor in obtaining a probability score and predicting discharge using a machine learning model is described in detail below with reference to FIG. 6.

The display 120 may display whether the patient is predicted to be discharged. The display 120 may display the probability scores predicted at a plurality of time points as a graph 121 over time. In addition, the display 120 may display, as a waterfall chart 122, the feature influence representing the degree to which each feature affects the likelihood score. The operation of the display 120 is described in detail below with reference to FIGS. 11 to 13.
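A waterfall chart such as the waterfall chart 122 can be built from additive per-feature contributions. The sketch below assumes, hypothetically, that the contributions are expressed in log-odds (as SHAP-style attributions for a logistic XGB model typically are), so that the base value plus all contributions, passed through the sigmoid, gives the displayed probability. The top_k cut-off and the "other features" bar are illustrative choices.

```python
import math
from typing import Dict, List, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def waterfall_steps(
    base_log_odds: float, contributions: Dict[str, float], top_k: int = 5
) -> Tuple[List[Tuple[str, float, float]], float]:
    """Return waterfall bars (label, start, end) in log-odds and the final probability.

    Bars are ordered by absolute contribution; features beyond top_k are merged into one
    "other features" bar so the last bar still ends at base + sum(contributions).
    """
    ordered = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    shown = ordered[:top_k]
    rest = ordered[top_k:]
    if rest:
        shown.append(("other features", sum(v for _, v in rest)))
    steps: List[Tuple[str, float, float]] = []
    running = base_log_odds
    for name, value in shown:
        steps.append((name, running, running + value))
        running += value
    return steps, sigmoid(running)
```

Ordering by absolute contribution puts the features that moved this patient's score the most at the top of the chart, which is what a per-patient explanation usually needs.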

In this specification, a feature (e.g., a medical feature) may represent an individual item and/or category that classifies information related to a patient's medical condition. Medical features may include one or a combination of two or more of operation, procedure, PACS, diagnosis, medication, laboratory, physical, and ICU length-of-stay features. Each of these medical features may include a plurality of sub-features. Individual medical features are described below with reference to FIG. 3.

FIG. 2 shows the overall flow of a method for predicting discharge of a patient according to an embodiment. In data processing, cohort criteria can be established, and an appropriate data set can be created by processing the data. In AI model evaluation, a well-performing model can be found by training and evaluating machine learning-based predictive models (e.g., the machine learning model). In prediction and explanation, the probability of discharge within a target period (e.g., 3 days) can be predicted, and the model's results can be explained by identifying, quantifying, and visualizing features.

FIG. 3 shows medical data according to an embodiment.

Medical data 310 according to an embodiment is data in a format that can be input to a machine learning model, and may be obtained by preprocessing raw medical data. For reference, the medical data 310 itself may be used as the input of the machine learning model or, as described below with reference to FIG. 9, data including only some of the features of the medical data may be used as the input.

The medical data 310 may include first partial medical data including the past medical feature 311 and second partial medical data including the current medical feature 312 .

A past medical feature (e.g., the past feature 311 shown in FIG. 3) may include a feature representing information on the medical care the patient received before the current hospitalization. For example, past medical features may be obtained from raw medical data collected before the patient's hospitalization date.

The first partial medical data (e.g., past partial medical data), collected during a predefined period prior to the hospitalization period, may include medical information corresponding to the past medical features described above. For example, the first partial medical data may include the patient's past medical features (e.g., for the 3 years before the hospitalization date). Illustratively, the first partial medical data may include data corresponding to past medical features for one or a combination of two or more of diagnosis, medication, laboratory, physical information, and ICU length of stay (LOS).

A current medical feature (e.g., the present feature 312 shown in FIG. 3) may include a feature representing information from the hospitalization date onward. For example, current medical features may be obtained from raw medical data collected during the hospitalization.

The second partial medical data (e.g., current partial medical data), collected during the hospitalization period, includes the patient's medical information from the hospitalization date onward and may include medical information corresponding to the current medical features. Illustratively, the second partial medical data may include data corresponding to current medical features for one or a combination of two or more of operation, procedure, PACS, diagnosis, medication, laboratory, and physical information.

The second partial medical data may be updated at regular intervals (e.g., daily) until the patient is discharged. For example, the apparatus for predicting a patient's discharge may collect additional medical information at predetermined intervals. Additional medical information is information collected after the patient's admission and may include, for example, diagnoses made during the hospitalization. The apparatus for predicting discharge of a patient may update the current medical features of the second partial medical data based on this additional medical information.

The apparatus for predicting discharge of a patient may generate the above-described medical data by combining the above-described current partial medical data and past partial medical data.

For reference, the hospitalization period is the continuous period of the current hospitalization and, for a patient who has been hospitalized several times, may indicate the period starting from the most recent hospitalization date. For example, if a patient is admitted on a first date, discharged on a second date, and readmitted on a third date, the hospitalization period between the first and second dates and the hospitalization period after the third date are non-consecutive (e.g., separate) hospitalization periods. If discharge is predicted after the third date, the patient's hospitalization period may indicate the period after the third date, the most recent hospitalization date. The period between the first date and the second date may then be included not in the hospitalization period but in the period prior to it.
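Identifying the current hospitalization period, as in the example above, can be sketched as follows. The sketch assumes admissions are given as hypothetical (admission date, discharge date) pairs, with None for an ongoing stay; the earlier stays it returns fall into the period prior to the hospitalization period.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical format: (admission date, discharge date or None while still hospitalized)
Admission = Tuple[date, Optional[date]]


def current_stay(admissions: List[Admission], as_of: date) -> Tuple[date, List[Admission]]:
    """Return the start of the hospitalization period on `as_of` and the earlier stays."""
    started = sorted((a for a in admissions if a[0] <= as_of), key=lambda a: a[0])
    if not started:
        raise ValueError("no admission on or before as_of")
    start, end = started[-1]  # most recent hospitalization date
    if end is not None and end < as_of:
        raise ValueError("patient is not hospitalized on as_of")
    return start, started[:-1]  # earlier stays belong to the period prior to hospitalization
```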

Raw medical data is data including medical information on patients collected by one or more entities, and may represent the collected medical information before preprocessing is applied.

For example, the raw medical data may be extracted from CardioNet, a manually curated electronic health record (EHR) database specific to CVD. CardioNet may illustratively consist of 572,811 patients who visited Asan Medical Center (AMC) in Seoul for CVD between January 1, 2000 and December 31, 2016. The collection of CardioNet data may be approved by the AMC institutional review board with a waiver of informed consent. CardioNet may contain 27 tables, such as visitation, demographic, diagnosis, medication, and laboratory tables. Most tables in CardioNet share common variables such as the patient identifier (PAID), the patient encounter number (INNO), the visit or hospitalization date (INDT), and the discharge date (OUDT). A KEY column formed by concatenating PAID and INNO can link the visitation table to the other tables, and through KEY, the variables to be analyzed can be extracted from each table.
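The KEY-based linkage can be sketched as follows. The text says only that KEY concatenates PAID and INNO; the underscore separator in make_key is a hypothetical choice (it keeps, for example, PAID "P1" with INNO "23" distinct from PAID "P12" with INNO "3"), and the row dictionaries and the ICD10 field name are illustrative.

```python
from typing import Dict, List


def make_key(paid: str, inno: str) -> str:
    # Hypothetical format: the text says only that KEY concatenates PAID and INNO.
    return f"{paid}_{inno}"


def add_key(rows: List[dict]) -> List[dict]:
    """Add a KEY column to each row in place and return the rows."""
    for row in rows:
        row["KEY"] = make_key(row["PAID"], row["INNO"])
    return rows


def attach_to_visits(visits: List[dict], table: List[dict], name: str) -> Dict[str, dict]:
    """Group the rows of another table under the visit with the same KEY."""
    by_key = {v["KEY"]: {**v, name: []} for v in visits}
    for row in table:
        if row["KEY"] in by_key:  # rows without a matching visit are skipped
            by_key[row["KEY"]][name].append(row)
    return by_key
```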

From the 572,811 patients in CardioNet, 84,251 admission records of 63,261 anonymized patients admitted to Cardiology or Thoracic Surgery can be obtained. To develop a practical, usable model, the focus can be on predicting discharge within a target period (e.g., 3 days) and on detecting long-term patients. Because patients hospitalized for more than 30 days can be managed separately by the Asan Medical Center (AMC), the length of stay can be restricted to between 3 and 30 days.
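The cohort restriction and the per-day discharge labels implied above can be sketched as follows, assuming (hypothetically) that LOS is the difference in calendar days between admission and discharge, that both LOS bounds are inclusive, and that a hospital day is labeled 1 when discharge falls no more than three days after it. The department strings are illustrative.

```python
from datetime import date, timedelta
from typing import List, Tuple

TARGET_DAYS = 3  # target period from the text
MIN_LOS, MAX_LOS = 3, 30  # LOS bounds from the text (inclusive bounds assumed)
DEPARTMENTS = {"Cardiology", "Thoracic Surgery"}


def in_cohort(admit: date, discharge: date, department: str) -> bool:
    los = (discharge - admit).days  # assumption: LOS in calendar days
    return department in DEPARTMENTS and MIN_LOS <= los <= MAX_LOS


def daily_labels(admit: date, discharge: date) -> List[Tuple[date, int]]:
    """Label each hospital day 1 if discharge is at most TARGET_DAYS days later, else 0."""
    labels = []
    day = admit
    while day < discharge:
        labels.append((day, int((discharge - day).days <= TARGET_DAYS)))
        day += timedelta(days=1)
    return labels
```

Each admission thus yields one training example per hospital day, matching the daily prediction described earlier.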

Data extracted from CardioNet may include the following variables for multiple tables:

- Visitation Table: PAID, INNO, KEY, INDT, OUDT, Visit Type, Department, Intensive Care Unit (ICU) Length of Stay (LOS)

- Diagnosis table: International Classification of Diseases (ICD)-10th diagnosis code

- Laboratory table: pathology examination date and code, examination results

- Physical information table: patient's age, height, weight, systolic and diastolic blood pressure, respiratory rate, pulse rate, body mass index, body surface area, measurement date

- Medication table: date and code of prescription

- Procedure table: date and code of order

- Operation table: date and code of surgery or treatment

- Picture Archiving and Communication System (PACS) table: date and code of order

- Transfusion table: date and code of order

For reference, the ICU list is as follows: Acute Care Unit (ACU), Coronary Care Unit (CCU), Cardiac Surgery ICU (CSICU), Medical ICU (MICU), Neonatal ICU (NICU), Neurological ICU (NRICU), Neurosurgical ICU (NSICU), Pediatric ICU (PICU), and Surgical ICU (SICU).

In the raw medical data according to an embodiment, each row of the visit table and of the other tables may contain only one piece of information, which can make it difficult for the ML model to learn from all of the data at once. Thus, the device can obtain the features of a new data set (e.g., the medical data) by performing preprocessing that includes one-hot encoding (OHE) of clinically significant orders and codes. Through preprocessing, the device can access each patient's records aggregated by date.

For reference, as described above with reference to FIG. 3, the diagnosis, medication, examination, and body tables may be used for both past features and current features. For example, for the diagnosis table, information from the hospitalization period (i.e., from the date of hospitalization onward) and information from a predefined period before the hospitalization period (e.g., the 3 years before the date of hospitalization) may be used to generate current features (or current partial medical data) and past features (or past partial medical data), respectively. The surgery, treatment, and PACS tables can be used for current features, and the ICU LOS can be used for past features.

In step 410, the apparatus for predicting discharge of a patient may select the most frequent codes from the raw data. For example, if the prescription code variable in the medication table can take too many values, training and/or inference of a machine learning model that distinguishes every value of that variable may be inefficient. To limit the number of values a code can take to a predetermined number or fewer, the apparatus may select the most frequent codes and map all other codes to a single code (e.g., a code indicating "Other").

For example, for the diagnosis and surgery tables, all ICD-10 codes and surgery codes can be truncated after the third character to obtain 3-character codes, because the characters from the fourth onward may indicate a sub-category of the 3-character code. The codes can then be sorted by frequency in descending order, and the 99 most frequent codes can be selected. The remaining (unselected) codes may be mapped to an "Other" feature.
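Steps 410 and the truncation above can be sketched as follows. This is a minimal illustration, not the apparatus's implementation. It assumes codes arrive as plain strings such as "I21.0". The function name `top_codes` and the example codes are hypothetical.

```python
from collections import Counter


def top_codes(codes, n_keep=99, prefix_len=3, other="Other"):
    """Truncate each code to its first `prefix_len` characters, keep the
    `n_keep` most frequent truncated codes, and map the rest to `other`."""
    truncated = [code[:prefix_len] for code in codes]
    counts = Counter(truncated)
    # most_common() returns codes sorted by frequency in descending order
    kept = {code for code, _ in counts.most_common(n_keep)}
    mapped = [code if code in kept else other for code in truncated]
    return mapped, kept


codes = ["I21.0", "I21.4", "I50.9", "E11.9", "I50.1", "Z95.1"]
mapped, kept = top_codes(codes, n_keep=2)
print(sorted(kept))  # ['I21', 'I50']
print(mapped)        # ['I21', 'I21', 'I50', 'Other', 'I50', 'Other']
```

With `n_keep=99`, each table ends up with at most 100 distinct values: the 99 kept codes plus "Other".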

At step 420, the device may perform one-hot encoding (OHE). For example, for the diagnosis and surgery tables, OHE can be performed on all 100 codes (the 99 selected codes plus "Other"). Features of the form "Z_code", such as "Z_DICD" and "Z_OPCD", can refer to the "Other" feature of each original table.

At step 430, the device may fill values. For example, in the case of diagnosis and surgery tables, a total of 100 codes are obtained for each table, and the date index value may be filled with 1 if there is valid prescribed or ordered data and 0 otherwise.
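Steps 420 and 430 (one-hot encoding and filling 1 or 0 per date index) might look like the sketch below. This is an illustration only. It assumes each table has already been reduced to `(date_index, code)` pairs for one hospitalization. The function name and the "Other" column name are hypothetical; the source names this column "Z_DICD", "Z_OPCD", and so on.

```python
def one_hot_by_date(records, date_indices, vocabulary, other="Other"):
    """Build one 0/1 indicator row per date index.

    records: iterable of (date_index, code) pairs for one hospitalization.
    date_indices: every day of the hospitalization, so days without any
    record still get an all-zero row.
    """
    columns = list(vocabulary) + [other]
    table = {day: dict.fromkeys(columns, 0) for day in date_indices}
    for day, code in records:
        if day in table:  # ignore records outside the hospitalization period
            table[day][code if code in vocabulary else other] = 1
    return table


rows = one_hot_by_date([(1, "I21"), (1, "K35"), (3, "I50")], [1, 2, 3], ["I21", "I50"])
print(rows[1])  # {'I21': 1, 'I50': 0, 'Other': 1}
print(rows[2])  # {'I21': 0, 'I50': 0, 'Other': 0}
```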

At step 440, the device may perform an imputation on missing values.

For example, for features other than the examination, body information, and date-related features, null values may be replaced with 0. The values of most of these features are frequencies (counts), so they are either null or integers.

As another example, to handle missing values of the continuous features of the examination and body tables, the data set may first be split by KEY so that different hospitalizations are not mixed; KEY can refer to one hospitalization of one patient. Within each hospitalization, null values can first be filled in chronological order (e.g., from past to present). The remaining null values can then be filled in reverse chronological order (e.g., from present to past), which handles the case where a value was not measured at the beginning of the admission. Finally, for features that remain unclassified or unmeasured, the remaining null values can be filled with the most frequent value of each feature.
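The imputation order described above can be sketched in plain Python. This is an illustration under stated assumptions, not the apparatus's code:

- Rows are dicts carrying a `KEY`, a sortable `date`, and the feature.
- A missing value is `None`.
- The feature name `hb` and the function name are hypothetical.
- The "most frequent value" is taken after the per-hospitalization fills, which is one reasonable reading of the text.

```python
from collections import Counter, defaultdict


def impute_continuous(rows, feature):
    """Fill missing values (None) of one continuous feature in place.

    1. Split rows by KEY so hospitalizations are never mixed.
    2. Forward-fill within each hospitalization (past -> present).
    3. Backward-fill (present -> past) for values missing at admission.
    4. Fill anything still missing with the feature's most frequent value.
    """
    by_key = defaultdict(list)
    for row in rows:
        by_key[row["KEY"]].append(row)
    for visit in by_key.values():
        visit.sort(key=lambda r: r["date"])
        last = None
        for r in visit:
            if r[feature] is None:
                r[feature] = last
            else:
                last = r[feature]
        nxt = None
        for r in reversed(visit):
            if r[feature] is None:
                r[feature] = nxt
            else:
                nxt = r[feature]
    observed = [r[feature] for r in rows if r[feature] is not None]
    if observed:
        mode = Counter(observed).most_common(1)[0][0]
        for r in rows:
            if r[feature] is None:
                r[feature] = mode
    return rows


rows = [
    {"KEY": "A", "date": 2, "hb": None},
    {"KEY": "A", "date": 1, "hb": None},
    {"KEY": "A", "date": 3, "hb": 12.0},
    {"KEY": "A", "date": 4, "hb": None},
    {"KEY": "B", "date": 1, "hb": None},  # never measured in this stay
    {"KEY": "C", "date": 1, "hb": 13.0},
]
impute_continuous(rows, "hb")
print([r["hb"] for r in rows])  # [12.0, 12.0, 12.0, 12.0, 12.0, 13.0]
```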

Although the above has mainly used the diagnosis and surgery tables as examples, similar preprocessing may be applied to the other tables. For example, as with the diagnosis and surgery tables, the values of the PACS table can be converted into 100 features. As another example, as with the diagnosis table, the 99 most frequent codes and "Other" can be obtained for the medication and treatment tables through OHE, and the corresponding data can be filled. As yet another example, for the examination table, the 60 most frequent examination codes among those performed on more than 50% of all patients may be selected. OHE can be performed on these codes, and the values can be filled with the result of each test. If a patient is tested multiple times in one day, the value can be filled with the average of the results.

However, the preprocessing is not limited to the above, and some steps may be omitted or added for each table.

The apparatus for predicting discharge of a patient according to an embodiment may omit step 410 (selecting frequent codes) for some tables. For example, for the blood transfusion table, all 27 available codes can be used, and the value can be filled with the number of prescriptions per day or per administration, taking into account the severity of each patient's disease. As another example, the body table has 10 codes, all of which can be used.

An apparatus for predicting discharge of a patient according to another embodiment may further perform an additional step together with steps 410 to 440 in a preprocessing process.

Although omitted from FIG. 4, the apparatus for predicting patient discharge may merge and join a plurality of tables. The primary table (e.g., the visit table) of the raw medical data (e.g., CardioNet) according to an embodiment may contain a plurality of main columns (e.g., PAID, INNO, INDT, OUDT) and variables related to the visit, and each row may represent a single admission of a patient. To create a new data set format indexed by date over the period between admission and discharge, the index can be reset. For example, a row with an INDT of 2021.02.01 and an OUDT of 2021.02.10 has an LOS of 10 days, so that one row of the visitation table can be converted into 10 rows with 10 date indices. After all values of the other tables have been preprocessed against the PAID, INNO, and date index, the tables can be merged and concatenated to create a new data set for model training.
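Resetting the index from one row per admission to one row per day can be sketched as follows. This is a minimal illustration. It assumes dates are `datetime.date` objects and that KEY is a plain concatenation of PAID and INNO with no separator. The separator is not specified above, so that format is an assumption, and the field name `date_index` is hypothetical.

```python
from datetime import date, timedelta


def expand_visit(visit):
    """Turn one visit row into one row per day from INDT to OUDT (inclusive)."""
    n_days = (visit["OUDT"] - visit["INDT"]).days + 1
    key = f'{visit["PAID"]}{visit["INNO"]}'  # assumed KEY format: plain concatenation
    return [
        {"PAID": visit["PAID"], "INNO": visit["INNO"], "KEY": key,
         "INDT": visit["INDT"], "date_index": visit["INDT"] + timedelta(days=i)}
        for i in range(n_days)
    ]


visit = {"PAID": "P001", "INNO": "7", "INDT": date(2021, 2, 1), "OUDT": date(2021, 2, 10)}
daily = expand_visit(visit)
print(len(daily))               # 10
print(daily[-1]["date_index"])  # 2021-02-10
```

The other tables, once preprocessed per day, could then be joined onto these rows on (PAID, INNO, date_index).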

The apparatus for predicting a patient's discharge may also remove features or create additional features. For example, after creating the new data set, the apparatus may remove OUDT, because it contains future information. As another example, a total of 10 date-related features may be created so that date and time information can be classified and recognized by type. INDT and the date index can each be split into integer features such as year, month, day, and day of the week (8 features). Two more features can be created: one indicating the LOS up to the date index, obtained by subtracting INDT from the date index, and one indicating whether the date index falls on a public holiday.
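The 10 date-related features can be sketched as below. This is an illustration only. The feature names are hypothetical, and the holiday set is a hypothetical stand-in for whatever public-holiday calendar a hospital would use.

```python
from datetime import date


def date_features(indt, date_index, holidays):
    """Ten date-related features for one row of the day-indexed data set."""
    feats = {}
    for prefix, d in (("INDT", indt), ("DATE", date_index)):
        feats[f"{prefix}_year"] = d.year
        feats[f"{prefix}_month"] = d.month
        feats[f"{prefix}_day"] = d.day
        feats[f"{prefix}_weekday"] = d.weekday()  # Monday = 0 ... Sunday = 6
    feats["LOS_to_date"] = (date_index - indt).days
    feats["is_holiday"] = int(date_index in holidays)
    return feats


holidays = {date(2021, 2, 12)}  # hypothetical holiday calendar
f = date_features(date(2021, 2, 1), date(2021, 2, 12), holidays)
print(len(f), f["DATE_weekday"], f["LOS_to_date"], f["is_holiday"])  # 10 4 11 1
```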

Although the preprocessing for obtaining the current features of the medical data has mainly been described, the preprocessing for obtaining past features from the raw medical data may be performed similarly. For the ML model to learn the data in more depth, the features of the medical data may include, in addition to day-by-day features (e.g., current medical features), past medical features related to the patient's medical history. Since the date index of each hospitalization starts from the hospitalization date (INDT), some past medical features may be obtained from the main information of hospital visit records in the 3 years before the hospitalization date.

For example, the apparatus for predicting a patient's discharge may obtain past medical features from raw medical data collected before the hospitalization. As with current medical features, OHE can be performed on past medical features and their values can be filled. Each past medical feature may be filled with either a summed value or the most recent value. For example, the lengths of stay in each intensive care unit recorded in the visit table may be summed. For each of the 100 diagnosis codes, the past diagnosis records can be summed. For each of the 100 medication codes, the number of prescriptions per day or per administration can be summed. As another example, body information and the most recent examination results within the past 3 years may be used for a total of 70 codes.
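Summing past records per code within the look-back window could look like the sketch below. It is an illustration, not the apparatus's code, and rests on two assumptions:

- The 3-year window is approximated as 3 × 365 days, ignoring leap days.
- Past records arrive as `(record_date, code)` pairs.

The function name is hypothetical.

```python
from datetime import date, timedelta


def past_code_counts(history, indt, vocabulary, lookback_days=3 * 365, other="Other"):
    """Sum past records per code in the look-back window before admission.

    history: iterable of (record_date, code) pairs from earlier visits.
    Only records with start <= record_date < indt are counted.
    """
    start = indt - timedelta(days=lookback_days)
    counts = dict.fromkeys(list(vocabulary) + [other], 0)
    for record_date, code in history:
        if start <= record_date < indt:
            counts[code if code in vocabulary else other] += 1
    return counts


history = [
    (date(2020, 5, 1), "I21"),
    (date(2019, 1, 1), "I21"),
    (date(2017, 1, 1), "I21"),  # older than 3 years: ignored
    (date(2021, 2, 3), "I50"),  # after admission: ignored (future information)
    (date(2020, 1, 1), "E11"),  # not in the vocabulary: counted as Other
]
print(past_code_counts(history, date(2021, 2, 1), ["I21", "I50"]))
# {'I21': 2, 'I50': 0, 'Other': 1}
```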

FIG. 5 illustrates labeling of medical data according to one embodiment.

A supervised learning algorithm for classification may require labels such as True or False to indicate a correct answer. The target criteria for labeling as true are shown in FIG. 5 .

In FIG. 5, Day 1 may be the hospitalization date (INDT), Day N may be the discharge date (OUDT), and each circle may represent one day of the hospitalization period. Day N (e.g., the discharge date) can be excluded from the data set, because it contains information, such as discharge procedures, that could give the answer away to the ML model. Although discharge prediction can be highly accurate within two days of the discharge date, in practice it may be more useful to make the prediction a target period (e.g., 3 days) in advance. Thus, the days from 3 days before to 1 day before the discharge date (OUDT) may be labeled 1 (e.g., true or positive), and the days from the admission date (INDT) to 4 days before the discharge date (OUDT) may be labeled 0 (e.g., false or negative).
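The labeling rule of FIG. 5 can be written in a few lines. In this sketch, days are numbered 1..N with day N as the discharge date, and `horizon` is the target period. The function name is hypothetical.

```python
def label_days(n_days, horizon=3):
    """Label each day of a stay whose discharge day is day n_days.

    Day n_days (the discharge date) is excluded; days within `horizon`
    days before discharge are labeled 1, earlier days 0.
    """
    return {day: int(n_days - day <= horizon) for day in range(1, n_days)}


labels = label_days(10)
print(labels)
# {1: 0, 2: 0, 3: 0, 4: 0, 5: 0, 6: 0, 7: 1, 8: 1, 9: 1}
```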

In the medical data according to an embodiment, the various variables of the original tables (e.g., the raw medical data) may be converted into 10 date-related features, 597 current features, and 279 past features, for a total of 886 features. From 84,251 records of 63,261 inpatients with CVD, medical data of 669,667 rows with these 886 features can be generated, covering diagnosis codes, examination results, body information, medication, treatment, surgery, PACS, and blood transfusion. The patients may have been admitted to cardiology or thoracic surgery, with an LOS of 3 to 30 days. The mean age of the patients may be 61.03 years, with a standard deviation of 13.42 years. The medical data may consist of 38% female rows (e.g., 254,254 rows) and 62% male rows (e.g., 415,413 rows).

At step 610, the processor may obtain medical data by pre-processing the raw medical data.

As described above with reference to FIG. 3, depending on the period in which it was collected, the medical data may include partial medical data collected in the hospitalization period (a1) (e.g., second partial medical data or current partial medical data) and partial medical data collected in a predefined period (a2) before the hospitalization period (e.g., first partial medical data or past partial medical data).

At step 620, the processor may obtain a first likelihood score by applying a machine learning model to the input data.

The input data may include at least a portion of medical data. Medical data may include a plurality of features. The processor may select one or more of the features of the medical data as features of the input data. Selection of features of input data will be described in detail with reference to FIG. 9 below.

The likelihood score may represent the likelihood that the patient will be discharged within a predefined target period from the time point of prediction. For example, the first likelihood score may represent the predicted likelihood of discharge of the patient at the first time point d1, that is, the likelihood that the patient will be discharged within the target period p1 from the first time point.

For reference, the processor may repeat the discharge prediction according to a predefined cycle. According to an embodiment, the discharge prediction cycle may be the same as the medical data update cycle described later. This specification mainly describes the case where both the discharge prediction cycle and the medical data update cycle are one day, but the cycles are not limited thereto. According to another embodiment, the discharge prediction cycle may be longer than the medical data update cycle, or may be a multiple of it.

At step 630, the processor may predict whether the patient will be discharged based on the first likelihood score. Specifically, the prediction based on the first likelihood score may indicate whether the patient will be discharged within the target period from the first time point. For example, the processor can predict whether the patient will be discharged by comparing the first likelihood score with a threshold score.

At step 640, in response to the patient still being hospitalized at the second time point, the processor may update the medical data collected during the hospitalization period. The second time point may be the point at which one medical data update cycle has elapsed from the first time point. However, without being limited thereto, the second time point may be a point at which one or more cycles have elapsed from the first time point. Because of the additional medical information generated between the first time point d1 and the second time point d2, part of the medical data (e.g., the current partial medical data, or the current features of the medical data) may change (e.g., be updated).

At step 650, the processor may obtain a second likelihood score by applying the machine learning model to the updated input data. The second likelihood score may represent the likelihood that the patient will be discharged within the target period from the second time point. To output the second likelihood score, the input data of the machine learning model may include at least part of the medical data collected during the hospitalization period from the hospitalization date up to the second time point.

At step 660, the processor may predict whether the patient will be discharged based on the second likelihood score. Specifically, the prediction based on the second likelihood score may indicate whether the patient will be discharged within the target period p2 from the second time point d2.
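The repeated cycle of steps 620 to 660 can be sketched as a simple loop. This is an illustration only, and it makes three assumptions:

- The update cycle is one day.
- `daily_updates` covers only the days on which the patient is still hospitalized.
- `score_fn` stands in for the trained model.

The function and model names are hypothetical.

```python
def predict_daily(daily_updates, score_fn, threshold=0.5):
    """Repeat the prediction once per update cycle (here, one day).

    daily_updates[t] holds the medical data newly collected on day t.
    score_fn maps all data accumulated so far to a likelihood score.
    """
    accumulated = []
    results = []
    for new_data in daily_updates:
        accumulated.extend(new_data)  # update the hospitalization-period data
        score = score_fn(accumulated)
        results.append((score, score >= threshold))
    return results


def toy_model(data):
    """Hypothetical stand-in for the trained model: more data, higher score."""
    return min(1.0, len(data) / 10)


for score, discharge_soon in predict_daily([[1, 2], [3], [4, 5, 6]], toy_model):
    print(score, discharge_soon)
```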

Training data can be labeled as either discharge or hospitalization: a positive (e.g., 1) label may be set for discharge, and a negative (e.g., 0) label for continued hospitalization. To evaluate and compare model performance, metrics including accuracy, sensitivity (recall of positives, or true positive rate (TPR)), specificity, precision (positive predictive value (PPV)), negative predictive value (NPV), and false positive rate (FPR) may be used. When monitoring model training and validation, the F1-score can be used to account for class imbalance, the receiver operating characteristic (ROC) curve can be used to find the optimal threshold, and the area under the ROC curve (AUROC) can be used for comparison.
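The metrics listed above all follow from the four confusion-matrix counts. The sketch below uses the standard definitions. Returning 0.0 when a denominator is empty is an arbitrary choice made here, and the function name is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = discharge, 0 = stay)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # recall of positives, TPR
    precision = ratio(tp, tp + fp)    # PPV
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "PPV": precision,
        "NPV": ratio(tn, tn + fn),
        "FPR": ratio(fp, fp + tn),
        "F1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


m = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(m["accuracy"], m["specificity"])  # 0.75 0.8
```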

To prevent overfitting of the ML-based model and reduce biased results, stratified 5-fold cross-validation may be performed as shown in FIG. 7. The 63,261 PAIDs can be randomly shuffled and divided into 5 groups of about 12,650 patients each, so that a single patient's records are not split between the training set (e.g., the dotted boxes in FIG. 7) and the test set (e.g., the diagonally hatched boxes in FIG. 7). For fold 1, the first group may be the test set and the remaining groups the training set. Folds 2 through 5 can be generated in the same way, ensuring that the imbalanced classes are divided equally (e.g., in all folds, 62.4% of the true labels are 0 and 37.6% are 1). 25% of each training set can be split off as a validation set for tuning the hyperparameters. As a result, in each fold the data set can be split into approximately 133,000 rows for the test set and 535,000 rows for the training set (including the validation set). The ML-based models can be trained and tested on all five folds.
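A patient-level split like the one above can be sketched as follows. Each patient is assigned to exactly one fold, so their rows never straddle training and test. This sketch does not enforce the label ratio; it would still need to be checked per fold as described above. The 25% validation split is also omitted. The function name and the round-robin assignment after shuffling are illustrative assumptions.

```python
import random


def patient_folds(rows, n_folds=5, seed=0):
    """Split rows into folds by patient (PAID) so that no patient's records
    appear in both the training and the test set of the same fold."""
    patients = sorted({r["PAID"] for r in rows})
    random.Random(seed).shuffle(patients)
    fold_of = {pid: i % n_folds for i, pid in enumerate(patients)}
    folds = []
    for k in range(n_folds):
        test = [r for r in rows if fold_of[r["PAID"]] == k]
        train = [r for r in rows if fold_of[r["PAID"]] != k]
        folds.append((train, test))
    return folds


rows = [{"PAID": f"P{p}", "day": d} for p in range(10) for d in range(3)]
for train, test in patient_folds(rows):
    train_ids = {r["PAID"] for r in train}
    test_ids = {r["PAID"] for r in test}
    print(len(test_ids), train_ids.isdisjoint(test_ids))  # 2 True
```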

The apparatus for predicting a patient's discharge according to an embodiment may experiment with five machine learning models to find the most suitable one. For example, a logistic regression (LR) model can be established as a baseline for performance estimation, and a support vector machine (SVM), random forest (RF), multi-layer perceptron (MLP), and Extreme Gradient Boosting (XGBoost) can be selected as machine learning models for comparison. The apparatus may tune the hyperparameters of each model through random search.

The apparatus for predicting hospital discharge according to an embodiment may select XGB, one of the gradient boosting machine (GBM) models, as the final model. GBM is an ensemble method that combines several weak classifiers (e.g., trees). The main idea of GBM is to focus on mispredicted outcomes. While XGB is being trained, one tree learns the data set and the records it mispredicts are emphasized. The next tree of the same model is then fitted to correct those errors, and this process is repeated.
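The boosting idea can be illustrated with a deliberately small sketch. It is plain first-order gradient boosting on the logistic loss, using single-feature decision stumps. It is not XGBoost's actual algorithm, which also uses second-order gradients, regularization, and deeper multi-feature trees. Each new stump is fitted to the residuals `y - p`, the negative gradient of the logistic loss. These residuals are largest for the records the current ensemble mispredicts. All names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(x, residuals):
    """Least-squares decision stump on a single feature: one split, two leaves."""
    best = None
    for t in sorted(set(x))[:-1]:  # every split that leaves both sides non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda v: lm if v <= t else rm


def gradient_boost(x, y, n_rounds=50, learning_rate=0.3):
    """First-order gradient boosting with logistic loss (needs both classes present)."""
    p0 = sum(y) / len(y)
    base = math.log(p0 / (1 - p0))  # start from the log-odds of the positive rate
    trees = []

    def margin(v):
        return base + learning_rate * sum(tree(v) for tree in trees)

    for _ in range(n_rounds):
        # Negative gradient of the logistic loss: largest where the ensemble errs most.
        residuals = [yi - sigmoid(margin(xi)) for xi, yi in zip(x, y)]
        trees.append(fit_stump(x, residuals))
    return lambda v: sigmoid(margin(v))


x = list(range(1, 11))
y = [0] * 5 + [1] * 5
model = gradient_boost(x, y)
print(round(model(2), 3), round(model(9), 3))
```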

For reference, GBM, as an explainable machine learning model, can quantify the contribution of features to prediction results, for example as feature importance. In particular, XGB can have regularization and performance advantages: it supports parallel processing, uses regularization to prevent overfitting, is widely used for learning from structured data, and can have good predictive performance.

Feature importance lists the features that a tree-based model considers important during training, together with their contribution scores. XGB can be chosen as the final model not only for its high performance but also because its internals, including the decision-making process, are accessible. By accessing the trees, the specific features that contributed to each patient's daily discharge prediction, and their influence, can be explained.

Table 1. AUROC scores of 5-fold cross-validation for each model

Fold	LR	SVM	RF	MLP	XGB	Support [0, 1]
Fold 1	0.826	0.825	0.853	0.833	0.866	[83113, 50188]
Fold 2	0.827	0.826	0.851	0.835	0.868	[83538, 50310]
Fold 3	0.824	0.824	0.850	0.821	0.865	[84192, 50585]
Fold 4	0.824	0.823	0.850	0.831	0.864	[83969, 50460]
Fold 5	0.822	0.821	0.848	0.834	0.863	[82918, 50394]
Mean	0.824	0.824	0.850	0.831	0.865

The five ML-based models can be tested using 5-fold cross-validation, and the AUROC score for each fold is shown in Table 1. The "Support" column of Table 1 indicates the number of rows with each true label [0, 1]. FIG. 8 shows the ROC curves. The area under the curve is the AUROC, which takes a value between 0 and 1; the closer the AUROC is to 1, the better the model performs. XGB achieves the highest and relatively stable score in all folds.
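The AUROC can be computed without plotting the curve. It equals the probability that a randomly chosen positive row scores higher than a randomly chosen negative row, with ties counted as one half. The pairwise loop below is a minimal sketch for small inputs. At the fold sizes above (about 133,000 rows), a sort-based computation would be needed instead. The function name is hypothetical.

```python
def auroc(y_true, scores):
    """AUROC as the probability that a randomly chosen positive is scored
    higher than a randomly chosen negative (ties count as one half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else (0.5 if p == n else 0.0)
    return wins / (len(pos) * len(neg))


print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```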

Table 2. Comparison of the five ML-based models across evaluation metrics

Model	Accuracy	Sensitivity	Specificity	PPV	NPV	AUROC
LR	0.75 (±0)	0.624 (±0.005)	0.828 (±0.004)	0.686 (±0.005)	0.786 (±0.005)	0.824 (±0.002)
SVM	0.75 (±0)	0.624 (±0.005)	0.828 (±0.004)	0.686 (±0.005)	0.784 (±0.005)	0.824 (±0.002)
RF	0.77 (±0)	0.696 (±0.005)	0.818 (±0.004)	0.696 (±0.005)	0.8183 (±0.004)	0.85 (±0.002)
MLP	0.758 (±0.004)	0.642 (±0.017)	0.822 (±0.007)	0.686 (±0.005)	0.792 (±0.007)	0.831 (±0.005)
XGB	0.782 (±0.004)	0.716 (±0.005)	0.824 (±0.005)	0.71 (±0)	0.828 (±0.004)	0.865 (±0.002)

Table 2 compares the evaluation results of the five ML-based models. Each score in Table 2 is the mean (± standard deviation) over the five folds. For specificity, LR and SVM score highest at 0.828, but XGB scores highest on all other metrics. In particular, XGB scores above 0.7 for label 1 (sensitivity 0.716, PPV 0.71) even though the labels in the data set are imbalanced. Accordingly, XGB can be selected as the final model for predicting the discharge probability.

FIG. 9 illustrates selecting features to be applied to the machine learning model based on feature importance according to an embodiment.

Feature importance can indicate how important a given feature of the input data is to the machine learning model. It may be calculated from how much the prediction error increases, compared with the original data, when the value of that feature is replaced with an arbitrary value.

Graph 900 may show the relative feature importance ordered by the XGB gain score, which is the average gain over all splits in which a feature is used. All features used in the machine learning model according to an embodiment may be named as they are at Asan Medical Center (AMC). Apart from the date-related features, the features that most affect the model come from all of the tables. Features of the treatment table may be closely related to clinically important situations; for example, a term marked with (D) may be more likely than others to indicate a serious condition. The other features may also be related to CVD or correspond to primary examinations and prescriptions during hospitalization.

Feature importance indicates only the importance of a feature to the machine learning model as a whole, and can be distinguished from feature influence, described later with reference to FIG. 11. Feature influence is a value representing the degree to which the value of a feature affects one output (e.g., a likelihood score). Because feature importance describes the model and is difficult to interpret for each patient, it may fall short as an individual descriptor of a prediction (e.g., a likelihood score). Depending on the patient's condition, different features can affect the daily discharge probability each time. Therefore, individual descriptors that present, for each patient, the features affecting the daily discharge probability during hospitalization can be proposed; they are described in detail with reference to FIGS. 11 to 13.

The processor according to an embodiment may select features of the input data from among features of the medical data based on feature importance.

At step 910, the processor may calculate feature importance for a temporary machine learning model, that is, a machine learning model trained with data including all features of the preprocessed medical data. Feature importance may indicate the importance of each feature to the machine learning model. For example, feature importance may be calculated from how much the prediction error of the temporary machine learning model increases, compared with the original data, when the value of the feature is replaced with an arbitrary value.

In step 920, the processor may select a feature of the input data of the machine learning model based on the calculated feature importance. For example, the processor may sort the calculated feature importance in descending order and select top features corresponding to a predefined number. For another example, the processor may select features having a feature importance greater than or equal to a predefined threshold feature importance.
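Steps 910 and 920 can be sketched together. In this illustration, "replacing a feature value with an arbitrary value" is implemented as permutation importance, that is, shuffling one column across rows. That choice is an assumption; the text does not fix how the arbitrary value is chosen. Selection then uses either a top-k rank or a threshold. The function names, the toy model, and the error function are hypothetical.

```python
import random


def permutation_importance(predict, X, y, error_fn, seed=0):
    """Increase in error when one feature column is shuffled across rows."""
    rng = random.Random(seed)
    base_error = error_fn(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)  # replace feature j with values from other rows
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        importances.append(error_fn(y, [predict(row) for row in X_perm]) - base_error)
    return importances


def select_features(importances, top_k=None, min_importance=None):
    """Indices of features chosen by top-k rank or by an importance threshold."""
    order = sorted(range(len(importances)), key=lambda j: importances[j], reverse=True)
    if top_k is not None:
        return order[:top_k]
    if min_importance is not None:
        return [j for j in order if importances[j] >= min_importance]
    raise ValueError("set top_k or min_importance")


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


X = [[float(i), float(i % 3)] for i in range(10)]  # feature 1 is ignored by the model
y = [row[0] for row in X]
imp = permutation_importance(lambda row: row[0], X, y, mse)
print(select_features(imp, top_k=1))  # [0]
```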

Through feature selection of the input data, an input format of the machine learning model may be determined as a data format including the selected features.

At step 930, the processor may train the machine learning model based on the selected features. The processor may extract the selected features from the medical data and obtain training data composed of the extracted features. The machine learning model is thus trained on the selected features rather than on all features of the medical data.

At step 940, the processor may obtain a likelihood score by applying the machine learning model to input data consisting of the selected features. According to one embodiment, the input data may be obtained by extracting the selected features from medical data that includes all features for the patient. According to another embodiment, the input data may be obtained by collecting only the selected features for the patient as inputs of the machine learning model.

FIG. 10 shows the performance of machine learning models whose input data includes the selected features, according to one embodiment. Too many features can degrade model performance, so it may be necessary to select an appropriate number of features.

According to one embodiment, recursive feature elimination with cross-validation (RFECV) may be performed. The goal of RFECV may be to identify the optimal number of features by removing the feature with the lowest feature importance one at a time and comparing model performance at each step. RFECV can return the ranks and names of all features. Applying RFECV to the final model, XGB, can identify about 150 features with a rank of 1. For performance comparison, 5-fold cross-validation can be performed on the same data set with the same parameters.
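The elimination loop behind RFECV can be sketched generically. Here `fit_and_score` and `importance_of` are hypothetical callbacks: the first returns a cross-validated score for a feature subset, and the second returns the importances of a model refitted on that subset. scikit-learn provides a full implementation as `RFECV`. This sketch only shows the idea of removing one least-important feature per step and keeping the best-scoring subset.

```python
def recursive_feature_elimination(features, fit_and_score, importance_of):
    """Drop the least important feature one at a time and keep the subset
    with the best cross-validated score (fewer features win ties).

    fit_and_score(subset) -> cross-validated score of a model on `subset`
    importance_of(subset) -> {feature: importance} of a model on `subset`
    """
    current = list(features)
    best_subset, best_score = list(current), fit_and_score(current)
    while len(current) > 1:
        importances = importance_of(current)
        weakest = min(current, key=lambda f: importances[f])
        current = [f for f in current if f != weakest]
        score = fit_and_score(current)
        if score >= best_score:
            best_subset, best_score = list(current), score
    return best_subset, best_score


# Toy stand-ins: only "a" and "b" are useful; each extra feature costs 0.1.
importance = {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1}
useful = {"a", "b"}


def toy_score(subset):
    return len(useful & set(subset)) - 0.1 * len(subset)


subset, best = recursive_feature_elimination(
    ["a", "b", "c", "d"], toy_score, lambda s: {f: importance[f] for f in s})
print(subset)  # ['a', 'b']
```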

The embodiments compared in FIG. 10 may include one based on all 886 features (denoted XGB 886), one based on the 150 features selected by RFECV (denoted XGB RFE 150), and one based on the top 50 features by feature importance of the model trained on those 150 features (denoted XGB RFE & FI 50). In FIG. 10, the numbers in parentheses in the legend may represent the AUROC score of each embodiment.

Table 3. Evaluation of 5-fold cross-validation by number of selected features

Number of features	Accuracy	Sensitivity	Specificity	PPV	NPV	AUROC
886 (ALL)	0.782 (±0.004)	0.716 (±0.005)	0.824 (±0.005)	0.71 (±0)	0.828 (±0.004)	0.865 (±0.0018)
150 (RFE)	0.77 (±0)	0.696 (±0.005)	0.814 (±0.005)	0.694 (±0.005)	0.818 (±0.004)	0.853 (±0.0018)
50 (RFE & FI)	0.76 (±0)	0.67 (±0.006)	0.812 (±0.004)	0.682 (±0.004)	0.802 (±0.004)	0.840 (±0.00096)

Table 3, together with FIG. 10, indicates that the AUROC difference between the model using all features and the models using 150 and 50 features is only about 1 to 2.5 percentage points (0.865 versus 0.853 and 0.840). According to an embodiment, even when the number of features is reduced by 83.1% to 94.4%, the performance drop may be at most 2.5 percentage points. The number of features may be adjusted as appropriate for the characteristics of each hospital or data set.

The predictive model can classify the data as 0 or 1 according to a threshold score. The optimal threshold score may be the score at which the sum of sensitivity and precision is maximized. On the ROC curve, TPR and FPR move together (both increase as the threshold is lowered), whereas sensitivity and precision trade off against each other: reducing false negatives (FN) increases sensitivity, and reducing false positives (FP) increases precision. The threshold score may therefore need to be adjusted appropriately at the decision point of hospital operation.
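The threshold criterion above (maximize sensitivity + precision) can be sketched as an exhaustive search over the observed scores. This is an illustration only. Ties are resolved in favor of the lowest threshold, an arbitrary choice, and the function name is hypothetical.

```python
def best_threshold(y_true, scores):
    """Return the candidate threshold that maximizes sensitivity + precision."""
    best_t, best_sum = None, -1.0
    for t in sorted(set(scores)):
        pred = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(y_true, pred) if y == 0 and p == 1)
        fn = sum(1 for y, p in zip(y_true, pred) if y == 1 and p == 0)
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        if sensitivity + precision > best_sum:
            best_t, best_sum = t, sensitivity + precision
    return best_t, best_sum


t, _ = best_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(t)  # 0.35
```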

For reference, although the optimal threshold score may be adjusted to the hospital's situation, decisions may still be ambiguous when a likelihood score is near the threshold. Additional techniques can be used to reduce this ambiguity. For example, a weighted average can be used to make the results more conservative but more reliable. Rather than using the likelihood score (e.g., probability) returned by the model directly, it may be more useful to weight in the results from before the prediction time, so that the result at the prediction time reflects at least some of the past results. Producing reliable results can be just as important as explaining the model and its internal features.
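One possible form of the weighted average above is a trailing weighted mean of the daily likelihood scores. The weights (0.5, 0.3, 0.2) and the renormalization on the first days of a stay are hypothetical choices for illustration. The text does not specify the weighting scheme.

```python
def smoothed_scores(daily_scores, weights=(0.5, 0.3, 0.2)):
    """Weighted average of each day's score with the preceding days' scores.

    weights[0] applies to the current day, weights[1] to the previous day,
    and so on; the weights are renormalized for the first days of a stay.
    """
    smoothed = []
    for t in range(len(daily_scores)):
        pairs = [(w, daily_scores[t - k]) for k, w in enumerate(weights) if t - k >= 0]
        total = sum(w for w, _ in pairs)
        smoothed.append(sum(w * s for w, s in pairs) / total)
    return smoothed


raw = [0.1, 0.2, 0.9, 0.3, 0.8]  # hypothetical daily likelihood scores
print([round(s, 3) for s in smoothed_scores(raw)])
```

The smoothed score, rather than the raw one, would then be compared with the threshold score.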

Daily discharge probabilities during the hospitalization period and feature influences by date may be presented through individual descriptors for prediction.

An individual explainer can be presented that helps interpret the prediction results of XGB using a waterfall chart. A waterfall chart, also called a bridge or cascade chart, is a type of bar chart. It portrays how a value changes through a sequence of relative (positive or negative) increments between adjacent bars. It can show the gradual progression toward the final discharge probability and the degree of positive or negative influence of each feature.

An individual explainer may indicate the feature influence on the obtained likelihood score. Feature influence may correspond to the score that each feature contributes to the likelihood score, and may be expressed as a contribution factor that quantifies that contribution. By displaying the feature influence on the likelihood score, the display can show which features strongly influenced the likelihood score. It can also visually present the magnitude of each influence to the user.

According to one embodiment, to estimate the values of the individual descriptor, the desired records are predicted with the trained XGB and the contributions of all features are obtained. For example, the contribution may represent a feature influence obtained by aggregating the scores that each feature contributed across all trees. Subsequently, the logistic value of the feature influence, $\sigma(x) = \frac{1}{1 + e^{-x}}$, and the relative values required for the descriptor can be calculated.

The processor may select one or more of the features based on feature influence. It may sort the features in descending order of influence and select the top-ranked features up to a predefined number, and the selected features may be shown on the display. For example, the number of features to display may be set to 15, and the remaining 871 features may be aggregated and shown together as "Other" in the descriptor.
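The contribution-to-waterfall step can be sketched from per-feature contributions in log-odds (margin) units plus a bias term. In XGBoost's Python API, `Booster.predict(..., pred_contribs=True)` returns such contributions, with the bias in the last column. The sketch below takes them as a plain dict. It keeps the `top_k` largest by absolute value, folds the rest into "Other", and walks from the bias through each step, converting with the logistic function. The margin contributions are additive. The per-step probability changes are not, because the logistic function is nonlinear, so they depend on the order of the steps. The feature names and values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def waterfall_steps(contributions, bias, top_k=15):
    """Build waterfall-chart steps from per-feature margin (log-odds) contributions.

    The top_k features by absolute contribution are shown individually and the
    rest are summed into "Other". Each step is (label, contribution,
    probability before, probability after).
    """
    ranked = sorted(contributions, key=lambda f: abs(contributions[f]), reverse=True)
    shown = [(f, contributions[f]) for f in ranked[:top_k]]
    other = sum(contributions[f] for f in ranked[top_k:])
    steps, margin = [], bias  # start from the intercept (bias term)
    for label, value in shown + [("Other", other)]:
        before = sigmoid(margin)
        margin += value
        steps.append((label, value, before, sigmoid(margin)))
    return steps, sigmoid(margin)


contribs = {"a": 2.0, "b": -1.0, "c": 0.5, "d": -0.25}  # hypothetical values
steps, prob = waterfall_steps(contribs, bias=-0.5, top_k=2)
for label, value, before, after in steps:
    print(f"{label:>5} {value:+.2f}  {before:.3f} -> {after:.3f}")
print(f"discharge probability: {prob:.3f}")
```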

In FIG. 11, the x-axis of the plot is the score from 0 to 1. The y-axis lists the contributions, that is, the feature values that affect the likelihood score (e.g., the first likelihood score) at the prediction time (e.g., the first time point). The intercept, shown as the diagonally hatched box at the bottom of the y-axis, can be adjusted to reflect the imbalance in the number of true labels of each class. The discharge probability, shown as the gray box at the top of the y-axis, can represent the likelihood score. The width of each box corresponding to a feature may represent the absolute value of that feature's score, and the actual score may be displayed on the right side of the plot. The absolute values may decrease from bottom to top, indicating that the contribution to the discharge probability also decreases. For reference, the “Others” box can be relatively wide because it is the sum of the scores of about 800 features other than those displayed individually. Dotted boxes represent the scores of features that contributed positively to the discharge probability and shift the score to the right in the graph. Conversely, diagonally hatched boxes represent the scores of negatively contributing features and shift the score to the left.

To recap, the features contributing to the prediction are listed on the y-axis from bottom to top. Dotted boxes extending to the right represent positive contributions, and diagonally hatched boxes extending to the left represent negative contributions.
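The following sketch shows one way the boxes of such a waterfall chart could be computed, assuming contributions are expressed in log-odds. A running total starts at log-odds 0 (score 0.5, an assumption), and the intercept and each feature are added in turn. Each box spans the logistic values before and after that addition. The function name, the starting point, and the example values are illustrative assumptions, not the procedure stated in the description.

```python
import math


def logistic(x: float) -> float:
    """Map a log-odds value to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def waterfall_boxes(intercept: float, contributions: list) -> tuple:
    """Compute waterfall boxes on the 0-1 score axis.

    `contributions` is a list of (label, log_odds_contribution) pairs, already
    ordered for display. Returns (boxes, final_score), where each box is
    (label, start, end, width, sign); sign is "+" for a box that moves the
    score right (dotted) and "-" for one that moves it left (hatched).
    """
    boxes = []
    running = 0.0  # assumption: start at log-odds 0, i.e. score 0.5
    for label, value in [("intercept", intercept)] + list(contributions):
        start = logistic(running)
        running += value
        end = logistic(running)
        sign = "+" if value >= 0 else "-"
        boxes.append((label, start, end, abs(end - start), sign))
    return boxes, logistic(running)


# Hypothetical intercept and feature contributions (log-odds).
boxes, final_score = waterfall_boxes(
    -1.0, [("(D) INFUSION PUMP = 0", 2.0), ("(D) ARTERIAL MONITORING = 0", 0.5)]
)
for label, start, end, width, sign in boxes:
    print(f"{label:30s} {sign} {start:.3f} -> {end:.3f} (width {width:.3f})")
print(f"discharge probability: {final_score:.3f}")
```

Each box starts where the previous one ends, so the last box ends at the final discharge probability.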

Graph 1110 shows the feature influence for a likelihood score of 0.004 obtained on day 7 (Date: 7), and graph 1120 shows the feature influence for a likelihood score of 0.811 obtained on day 12 (Date: 12). In graph 1110, (D) ARTERIAL MONITORING = 1.0 and (D) INFUSION PUMP = 3.0 may negatively affect the likelihood score. In contrast, (D) INFUSION PUMP = 0 in graph 1120 may have a positive effect on the discharge probability. Because arterial monitoring and infusion pumps are prescribed mainly for critically ill patients, both features are zero for most records in the data set. Displaying features together with their values can help clinicians interpret the plots intuitively.

The features shown by an individual explainer may or may not appear in the feature importance plot (illustrated in FIG. 9). This suggests that the features contributing to each individual patient need to be identified, rather than relying only on the features that are important to the overall model.

The display may show, over time, a plurality of likelihood scores predicted for an inpatient. These likelihood scores are predicted at different time points. When displayed in chronological order, they may represent changes in the patient's probability of discharge.

For example, as shown in FIG. 12, the sample data set may be the record of a patient with a PAID of 228,443 and an INNO of 2 who was hospitalized for 13 days and discharged on day 14. A plot of the patient's likelihood scores may appear in FIG. 12. The x-axis of the plot may represent the days of the patient's hospital stay, excluding the day of discharge (day 14). The y-axis may represent the likelihood score (e.g., the discharge probability). The model's optimal threshold score may be 0.39, indicated by the horizontal dotted line. Circles and triangles can represent true labels 1 and 0, respectively, and the size of each circle and triangle can be proportional to the likelihood score. The fill patterns in the figure may represent the results predicted by the model. Dots represent positive predictions (e.g., label 1; predicted discharge), and diagonal hatching represents negative predictions (e.g., label 0; predicted continued hospitalization).

For the sample in FIG. 12, the model can accurately predict discharge within 3 days. However, adjusting the threshold score may change the predictions for days 11 and 12. For example, if the threshold score is increased, label 1 may be predicted only for days 12 and 13. Increasing the threshold score can be useful for reducing false positives (FPs), at the cost of more false negatives (FNs).
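The threshold trade-off can be illustrated with a short sketch. All daily scores below are hypothetical. Two of them reuse the values quoted for FIG. 11 (0.004 on day 7 and 0.811 on day 12), and the true label is 1 on days 11 to 13, because the discharge on day 14 falls within the 3-day target period. The function names are illustrative.

```python
def classify(scores, threshold):
    """Predict label 1 (discharge within the target period) when score >= threshold."""
    return [1 if score >= threshold else 0 for score in scores]


def fp_fn(labels, predictions):
    """Count false positives and false negatives."""
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    return fp, fn


# Hypothetical daily scores for days 1..13; discharge occurs on day 14,
# so with a 3-day target period the true label is 1 on days 11, 12 and 13.
scores = [0.01, 0.01, 0.02, 0.02, 0.03, 0.01, 0.004, 0.05, 0.08, 0.2, 0.45, 0.811, 0.9]
labels = [0] * 10 + [1] * 3

for threshold in (0.15, 0.39, 0.5):
    predictions = classify(scores, threshold)
    predicted_days = [day for day, p in enumerate(predictions, start=1) if p]
    print(threshold, predicted_days, fp_fn(labels, predictions))
```

With these made-up values, a threshold of 0.39 gives no errors. A threshold of 0.5 keeps only days 12 and 13, which produces one false negative. Lowering the threshold to 0.15 adds day 10, which produces one false positive.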

FIG. 13 shows a simulated impact on bed management when the predictive model and individual explainers are applied according to an embodiment. The discharge probabilities of all patients in each ward can be checked every day, and the most important features and feature values affecting each discharge probability score can be identified at a glance. Individual explainers support inferences about prolonged hospitalization as well as discharge, so they can be useful in interpreting both high and low discharge probabilities. Likewise, information such as near-future bed capacity can be derived from each patient's expected discharge date. This future bed information supports efficient use of hospital human and material resources, and can help reduce hospital costs through improved bed management and admission scheduling.
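As a simple illustration of how per-patient likelihood scores might be rolled up into ward-level bed information, the sketch below sums the scores within each ward and adds the result to the currently free beds. Treating a summed score as an expected number of discharges within the target period assumes well-calibrated probabilities. Ignoring new admissions is a further simplification. The function names, ward names, capacities, and scores are hypothetical.

```python
from collections import defaultdict


def expected_discharges_by_ward(patients):
    """Sum likelihood scores per ward.

    `patients` is an iterable of (ward, likelihood_score) pairs. Reading the
    sum as an expected discharge count assumes the scores are calibrated.
    """
    totals = defaultdict(float)
    for ward, score in patients:
        totals[ward] += score
    return dict(totals)


def forecast_free_beds(capacity, occupied, expected):
    """Currently free beds plus expected discharges; new admissions are ignored."""
    return {
        ward: capacity[ward] - occupied[ward] + expected.get(ward, 0.0)
        for ward in capacity
    }


patients = [("Ward A", 0.9), ("Ward A", 0.2), ("Ward B", 0.5)]
expected = expected_discharges_by_ward(patients)
free = forecast_free_beds({"Ward A": 10, "Ward B": 8}, {"Ward A": 10, "Ward B": 7}, expected)
print(expected, free)
```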

Research can be actively pursued both on bed management, which requires hospital processes to be used efficiently, and on biomarker detection for patient-specific treatment. The present invention can propose an ML-based predictive model that identifies the discharge date, as well as risk factors related to discharge and CVD, for better bed management. However, because environmental variables differ from hospital to hospital, an algorithm that can take them into account comprehensively may be required. The present invention can contribute to improving such algorithms and to supporting medical services. The expected benefits of the predictive model are described below.

The model according to the present invention can be extended from ward-level to hospital-level bed management. The present invention can contribute to reducing the labor-intensive work of medical staff and the waiting time of patients.

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices, methods, and components described in the embodiments may include a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may execute an operating system (OS) and software applications running on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described. However, those skilled in the art will understand that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, a processing device may include a plurality of processors, or a processor and a controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these. It may configure a processing device to operate as desired, or may instruct the processing device independently or collectively. Software and/or data may be permanently or temporarily embodied in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to it. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored on computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include:

- magnetic media such as hard disks, floppy disks, and magnetic tape;
- optical media such as CD-ROMs and DVDs;
- magneto-optical media such as floptical disks; and
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine language code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter.

The hardware device described above may be configured to operate as one or a plurality of software modules to perform the operations of the embodiments, and vice versa.

As described above, although the embodiments have been described with reference to limited drawings, those skilled in the art can apply various technical modifications and variations based on the above description. For example, appropriate results can be achieved even if the described techniques are performed in an order different from the described method. They can also be achieved if the components of the described system, structure, device, circuit, and the like are combined or coupled in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims are within the scope of the following claims.

Claims

A device for predicting discharge of a patient, comprising:

a processor that, for a patient hospitalized at a first time point, obtains a first likelihood score that the patient will be discharged within a target period from the first time point by applying a machine learning model to input data including medical data collected during the patient's hospitalization period from the hospitalization date to the first time point, and predicts, based on the obtained first likelihood score, whether the patient will be discharged within the target period from the first time point.
The device according to claim 1, wherein the processor obtains the first likelihood score by applying the machine learning model to input data including medical data relating to one of, or a combination of two or more of, operation, procedure, Picture Archiving and Communication System (PACS), diagnosis, medication, laboratory, and physical data collected for the patient from the hospitalization date to the first time point.
The device according to claim 1, wherein, in response to the patient remaining hospitalized at a second time point after the first time point, the processor:

updates the medical data collected during the hospitalization period based on medical data collected from the first time point to the second time point;

obtains a second likelihood score that the patient will be discharged within the target period from the second time point by applying the machine learning model to input data including the updated medical data; and

predicts, based on the obtained second likelihood score, whether the patient will be discharged within the target period from the second time point.
The device according to claim 1, wherein the processor obtains the first likelihood score by applying the machine learning model to input data including medical data collected during a predefined period before the hospitalization period together with medical data collected during the hospitalization period.
The device according to claim 1, wherein the processor obtains the first likelihood score by applying the machine learning model to input data including, together with medical data collected during the hospitalization period, medical data relating to one of, or a combination of two or more of, diagnosis, medication, laboratory, physical, and intensive care unit (ICU) length of stay (LOS) collected for the patient during a predefined period before the hospitalization period.
The device according to claim 1, wherein the processor:

selects one or more of the features of the collected data as features of the input data of the machine learning model, based on the feature importance of each feature to an ad hoc machine learning model trained on all features of the collected data;

trains the machine learning model based on the selected features; and

obtains the first likelihood score by applying the machine learning model to the input data including the selected features.
The device according to claim 1, wherein the processor selects one or more features as inputs of the machine learning model by applying a recursive feature elimination with cross-validation (RFECV) technique to the features of the input data.
The device according to claim 1, wherein the processor obtains the first likelihood score by applying an extreme gradient boosting (XGB) model to the input data.
The device according to claim 1, wherein the processor selects one or more of the features of the input data based on a feature influence corresponding to the score contributed by each feature to the obtained first likelihood score.
The device according to claim 9, further comprising a display that displays the feature influence of the selected one or more features on the obtained first likelihood score.
The device according to claim 1, further comprising a display that displays likelihood scores for a plurality of time points during the hospitalization period.
A method for predicting discharge of a patient, comprising:

obtaining, for a patient hospitalized at a first time point, a first likelihood score that the patient will be discharged within a target period from the first time point by applying a machine learning model to input data including medical data collected during the patient's hospitalization period from the hospitalization date to the first time point; and

predicting, based on the obtained first likelihood score, whether the patient will be discharged within the target period from the first time point.
The method according to claim 12, wherein obtaining the first likelihood score comprises obtaining the first likelihood score by applying the machine learning model to input data including medical data relating to one of, or a combination of two or more of, operation, procedure, Picture Archiving and Communication System (PACS), diagnosis, medication, laboratory, and physical data collected for the patient from the hospitalization date to the first time point.
The method according to claim 12, further comprising:

updating, in response to the patient remaining hospitalized at a second time point after the first time point, the medical data collected during the hospitalization period based on medical data collected from the first time point to the second time point;

obtaining a second likelihood score that the patient will be discharged within the target period from the second time point by applying the machine learning model to input data including the updated medical data; and

predicting, based on the obtained second likelihood score, whether the patient will be discharged within the target period from the second time point.
The method according to claim 12, wherein obtaining the first likelihood score comprises obtaining the first likelihood score by applying the machine learning model to input data including medical data collected during the hospitalization period and medical data collected during a predefined period before the hospitalization period.
The method according to claim 12, wherein obtaining the first likelihood score comprises obtaining the first likelihood score by applying the machine learning model to input data including, together with medical data collected during the hospitalization period, medical data relating to one of, or a combination of two or more of, diagnosis, medication, laboratory, physical, and intensive care unit (ICU) length of stay (LOS) collected for the patient during a predefined period before the hospitalization period.
The method according to claim 12, further comprising:

selecting one or more of the features of the collected data as features of the input data of the machine learning model, based on the feature importance of each feature to an ad hoc machine learning model trained on all features of the collected data; and

training the machine learning model based on the selected features,

wherein obtaining the first likelihood score comprises obtaining the first likelihood score by applying the machine learning model to the input data including the selected features.
The method according to claim 17, wherein selecting the one or more features as features of the input data of the machine learning model comprises selecting one or more features as inputs of the machine learning model by applying a recursive feature elimination with cross-validation (RFECV) technique to the features of the input data.
The method according to claim 12, wherein obtaining the first likelihood score comprises obtaining the first likelihood score by applying an extreme gradient boosting (XGBoost) model to the input data.
The method according to claim 12, further comprising selecting one or more of the features of the input data based on a feature influence corresponding to the score contributed by each feature to the obtained first likelihood score.
The method according to claim 20, further comprising displaying the feature influence of the selected one or more features on the obtained first likelihood score.
The method according to claim 15, further comprising displaying likelihood scores for a plurality of time points during the hospitalization period.
A computer program stored in a computer-readable recording medium to execute, in combination with hardware, the method of any one of claims 12 to 22.