CN115527678A - Nomogram ICU (intensive care unit) elderly disease risk scoring model and device fusing medical history texts and establishing method thereof - Google Patents

Info

Publication number
CN115527678A
Authority
CN
China
Prior art keywords
risk
model
patient
icu
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211300558.2A
Other languages
Chinese (zh)
Inventor
刘晓莉
张政波
曹德森
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese PLA General Hospital
Original Assignee
Chinese PLA General Hospital
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese PLA General Hospital
Priority to CN202211300558.2A
Publication of CN115527678A

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Abstract

The invention discloses a Nomogram ICU (intensive care unit) elderly disease risk scoring model fusing medical record texts, a device configured with the model, and a method for establishing the model. By using a large-sample data set, an advanced pre-trained deep learning language model from the medical domain, and a transparent way of presenting the prediction model, the invention jointly considers the patient's condition before entering the ICU and on the first day in the ICU, fuses the analysis of unstructured and structured data, and fuses a deep learning method with a machine learning method in one model. The risk scoring system needs only 10 types of easily accessible information as input to quickly obtain the patient's risk of in-hospital death, risk level, and Nomogram. The device helps ICU doctors assess the disease severity and degree of risk of elderly patients more accurately and conveniently, and is easy to popularize and deploy.

Description

Nomogram ICU (intensive care unit) elderly disease risk scoring model and device fusing medical history texts and establishing method thereof
Technical Field
The application relates to medical information decision technology, and in particular to a Nomogram ICU (intensive care unit) elderly disease risk scoring model and device that fuse medical record texts, based on a pre-trained deep learning language model and a machine learning model, and to a method for establishing them.
Background
As life expectancy rises in many countries, the proportion of elderly intensive care unit (ICU) patients is expected to increase sharply in the coming decades. Elderly patients (> 80 years) have become a group of particular concern in the ICU in recent years. Previous studies have shown that age and severity of illness at admission only partially explain the chances of survival of elderly patients, and that clinical conditions prior to ICU admission can affect their prognosis and outcome. Medical personnel usually learn about and assess a patient's baseline and pre-admission status by reading the patient's past medical records and questioning the patient and family members. These documents typically contain important information about the patient's current condition, symptoms, family history, disease history, examinations received (e.g., X-ray, laboratory tests), medications, and the like. Most of this information is presented in an unstructured way; lacking explicit structure, the text can only be interpreted and evaluated by humans, not directly by a computer program. Extracting and refining the clinical information buried in narrative reports as prior knowledge for decision making can therefore help medical professionals obtain key information better and faster, evaluate the patient's condition more comprehensively, and provide higher-quality care. Furthermore, many studies have analyzed factors associated with increased mortality in elderly patients admitted to the ICU, in the hope that these factors can help physicians triage elderly patients and improve care during the ICU stay. However, what medical personnel need more centrally and urgently is a practical prognostic evaluation tool for elderly patients that quantifies disease severity and serves as a guide and basis for decision making, treatment planning, and communication with patients and family members.
Disclosure of Invention
In view of the above, the present application aims to provide a Nomogram ICU elderly disease risk scoring model fusing medical record texts.
The application provides a Nomogram ICU elderly disease risk scoring model fusing medical record texts, which comprises: a data acquisition module, a data processing module, a BERT-type model calculation module, a multivariate logistic regression model and a Nomogram output module;
the data acquisition module is used for acquiring medical record text information from before the patient enters the intensive care unit (ICU) and numerical information collected on the patient's first day in the ICU; the medical record text information is unstructured and comprises chief complaints, family history, current medical history, medication at the time of admission, past medical history, physical examination and social history; the numerical information is structured and comprises GCS score, whether a vasopressor is used, CCI index, whether the patient is on absolute bed rest, whether mechanical ventilation is performed, respiratory rate, whether the admission is an emergency admission, shock index, and whether palliative care is chosen;
the data processing module is used for processing the medical record text information and the numerical information; processing of the medical record text information comprises: converting characters/letters to lower case, removing special characters, sliding-window sentence segmentation, word segmentation, and sentence embedding representation; after processing, the medical record text information is represented as a plurality of vectors of preset length;
the BERT-type model calculation module uses the selected fine-tuned, pre-trained clinical text model to calculate, from the plurality of preset-length encoding vectors of the medical record text information, an assessment of disease severity before entering the ICU, preICU_risk_score;
the multivariate logistic regression model takes preICU_risk_score together with the GCS score, whether a vasopressor is used, CCI index, whether the patient is on absolute bed rest, whether mechanical ventilation is performed, respiratory rate, whether the admission is an emergency admission, shock index, and whether palliative care is chosen as inputs, and calculates the patient's in-hospital death risk probability and risk level;
the Nomogram output module outputs Nomogram and disease severity scores for the patient based on inputs to the multivariate logistic regression model and parameters of the model.
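The mapping from the ten inputs to a probability can be sketched as follows. This is a minimal illustration of a generic multivariate logistic regression, not the fitted model of the application: the function name `in_hospital_death_probability`, the intercept, and the toy coefficients are all hypothetical, since the fitted regression parameters are not listed in this section.

```python
import math


def in_hospital_death_probability(features, coefficients, intercept):
    """Logistic regression: sigmoid of the intercept plus the weighted feature sum."""
    linear = intercept + sum(
        coefficients[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-linear))


# Toy values for illustration only: these are NOT the fitted parameters.
toy_coefficients = {"preICU_risk_score": 2.0, "gcs_score": -0.1, "vasopressor": 0.5}
toy_features = {"preICU_risk_score": 0.4, "gcs_score": 12, "vasopressor": 1}
toy_probability = in_hospital_death_probability(
    toy_features, toy_coefficients, intercept=-1.0
)
```

In the same spirit, the Nomogram output module would present each weighted term as a point contribution, so that a clinician can read off the total and the corresponding probability.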
Preferably, the data acquisition module extracts the patient's chief complaints, family history, current medical history, medications at the time of admission, past medical history, physical examination, and social history from the electronic health record.
Preferably, for English medical records, the BERT-type model calculation module is one of the Bio-Clinical BERT, Clinical-BigBird, Clinical-Longformer and PubMedBERT models; for Chinese medical records, the BERT-type model calculation module is one of the PCL-MedBERT, ChineseBLUE, ChineseEHRBert and MedBERT models.
Preferably, the risk levels comprise low risk, medium risk and high risk;
according to the calculated in-hospital death risk probability, a patient with a probability between 0 and 0.1 is regarded as low risk, a patient with a probability between 0.1 and 0.35 as medium risk, and a patient with a probability between 0.35 and 1 as high risk.
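The risk stratification can be sketched as a small function. This is an illustrative sketch that assumes the cutoffs 0.1 and 0.35 partition the probability range into three consecutive bands; assigning a boundary value to the lower band is an assumption, as is the function name `risk_level`.

```python
def risk_level(probability, low_cutoff=0.1, high_cutoff=0.35):
    """Map an in-hospital death risk probability to low / medium / high.

    Assumption: a probability exactly on a cutoff falls into the lower band.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability <= low_cutoff:
        return "low"
    if probability <= high_cutoff:
        return "medium"
    return "high"
```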
Preferably, the disease severity score is calculated according to the following formula:
Total points = 10*preICU_risk_score - 1.4116*GCS score + 3.5544*vasopressor + 1.769*CCI score + 1.0344*respiratory rate + 2.8915*admission type + 18.2479*shock index + 4.4857*mechanical ventilation + 9.9566*activity status + 9.6055*code status;
wherein Total points is the disease severity score; GCS score is the GCS score; vasopressor indicates whether a vasopressor is used, and is 1 if yes and 0 if no; CCI score is the CCI index; respiratory rate is the respiratory rate; admission type indicates whether the admission is an emergency admission, and is 1 if yes and 0 if no; shock index is the shock index; mechanical ventilation indicates whether mechanical ventilation is performed, and is 1 if yes and 0 if no; activity status indicates whether the patient is on absolute bed rest, and is 1 if yes and 0 if no; code status indicates whether palliative care is chosen, and is 1 if yes and 0 if no.
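The Total points formula can be written directly as code. In this sketch the coefficients are taken from the formula above, while the dictionary keys, the function name `total_points`, and the example patient values are hypothetical names and numbers chosen for illustration.

```python
# Coefficients copied from the Total points formula; key names are illustrative.
POINT_COEFFICIENTS = {
    "preICU_risk_score": 10.0,
    "gcs_score": -1.4116,
    "vasopressor": 3.5544,             # 1 if used, else 0
    "cci_score": 1.769,
    "respiratory_rate": 1.0344,
    "admission_type": 2.8915,          # 1 if emergency admission, else 0
    "shock_index": 18.2479,
    "mechanical_ventilation": 4.4857,  # 1 if ventilated, else 0
    "activity_status": 9.9566,         # 1 if absolute bed rest, else 0
    "code_status": 9.6055,             # 1 if palliative care chosen, else 0
}


def total_points(inputs):
    """Return the disease severity score for one patient's ten inputs."""
    missing = set(POINT_COEFFICIENTS) - set(inputs)
    if missing:
        raise KeyError(f"missing inputs: {sorted(missing)}")
    return sum(coef * inputs[name] for name, coef in POINT_COEFFICIENTS.items())


# Hypothetical patient, for illustration only.
example_patient = {
    "preICU_risk_score": 0.5, "gcs_score": 15, "vasopressor": 1, "cci_score": 5,
    "respiratory_rate": 20, "admission_type": 1, "shock_index": 0.8,
    "mechanical_ventilation": 0, "activity_status": 0, "code_status": 0,
}
example_points = total_points(example_patient)
```

Requiring all ten keys, rather than defaulting missing ones to zero, avoids silently under-scoring a patient whose record is incomplete.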
Preferably, the sentence segmentation is performed with a sliding window.
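The text preprocessing and sliding-window segmentation can be sketched as below. This is a minimal sketch under stated assumptions: the window size, the stride, the set of characters treated as "special", and the whitespace tokenization are hypothetical choices; a BERT-type model would in practice use its own tokenizer.

```python
import re


def preprocess(text):
    """Lower-case the text, replace special characters, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^a-z0-9\s.,]", " ", text)  # assumption: keep letters, digits, '.', ','
    return re.sub(r"\s+", " ", text).strip()


def sliding_windows(tokens, window_size=128, stride=64):
    """Split a token list into overlapping windows that together cover every token."""
    if window_size <= 0 or stride <= 0:
        raise ValueError("window_size and stride must be positive")
    if len(tokens) <= window_size:
        return [tokens]
    windows = []
    for start in range(0, len(tokens), stride):
        windows.append(tokens[start:start + window_size])
        if start + window_size >= len(tokens):
            break  # the last window already reaches the end of the text
    return windows


note_tokens = preprocess("Chief Complaint: Chest PAIN!!").split()
```

Overlapping windows let a long medical record be fed to a model with a fixed input length without cutting a clinical statement at a hard boundary.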
The application also aims to provide a Nomogram ICU (intensive care unit) elderly disease risk scoring device fused with the medical record text, which is realized by a computer and is configured with the Nomogram ICU elderly disease risk scoring model fused with the medical record text.
The application also provides a method for establishing a Nomogram ICU (intensive care unit) elderly disease risk scoring model fused with a case history text, which comprises the following steps:
a data acquisition step: acquiring information for model construction from the patient's electronic health record, the information comprising medical record text information from before the patient enters the intensive care unit (ICU) and numerical information collected on the patient's first day in the ICU; the extracted medical record text information comprises chief complaints, family history, current medical history, medication at the time of admission, past medical history, physical examination and social history; the extracted numerical information comprises basic information, cognitive function, activity tolerance, vital signs, laboratory tests, therapeutic interventions, fluid output (i.e., urine volume), and commonly used clinical scores;
a data processing step: cleaning, processing and feature construction are performed on the unstructured medical record text information and the structured numerical information, and the data set is split for training and evaluating the subsequent models; processing of unstructured data comprises: conversion of characters/letters to lower case, special character removal, sliding-window sentence segmentation, word segmentation and sentence embedding representation; processing of structured data comprises: outlier removal, data alignment, imputation, construction of statistical features and setting of factor variables; data set splitting comprises preparing a development set, an internal validation set and a temporal validation set;
a model development step: obtaining preICU_risk_score with a pre-trained deep learning model; training a machine learning model and selecting important risk factors; the Nomogram ICU elderly disease risk scoring model fusing medical record texts combines preICU_risk_score with the selected important risk factors and is trained as a multivariate logistic regression model to obtain the patient's in-hospital death risk probability, risk level, Nomogram and disease severity score;
a model evaluation step: selecting clinically relevant performance indices under different validation modes and comparing against several baseline models for different scenarios/requirements; the baseline models comprise: a model that obtains preICU_risk_score from medical record texts only, a model built on important risk factors selected from structured data only, and commonly used clinical disease severity scoring systems; the validation modes comprise internal validation and temporal validation; the performance evaluation comprises ROC curves, calibration curves, DCA curves and 7 associated evaluation indices, namely: area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), sensitivity, specificity, F1 score, accuracy and Brier score, with their corresponding 95% confidence intervals.
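Several of the evaluation indices named in the model evaluation step can be computed as sketched below. This is a plain-Python illustration, not the evaluation code of the application: the 0.5 decision threshold, the function names, and the toy labels and probabilities are assumptions; AUPRC and the 95% confidence intervals are omitted for brevity.

```python
def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def threshold_metrics(y_true, y_prob, threshold=0.5):
    """Sensitivity, specificity, accuracy and F1 at a given decision threshold."""
    tp = fp = tn = fn = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if predicted == 1 and label == 1:
            tp += 1
        elif predicted == 1 and label == 0:
            fp += 1
        elif predicted == 0 and label == 0:
            tn += 1
        else:
            fn += 1
    sensitivity = _ratio(tp, tp + fn)
    precision = _ratio(tp, tp + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, len(y_true)),
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auroc(y_true, y_prob):
    """Share of positive/negative pairs ranked correctly (ties count one half)."""
    positives = [p for y, p in zip(y_true, y_prob) if y == 1]
    negatives = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    return wins / (len(positives) * len(negatives))


# Toy data for illustration only.
toy_labels = [1, 0, 1, 0]
toy_probs = [0.9, 0.2, 0.4, 0.6]
toy_metrics = threshold_metrics(toy_labels, toy_probs)
```

The pairwise AUROC shown here is quadratic in the number of patients; for a large cohort a rank-based implementation from a statistics library would be the more practical choice.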
Preferably, the optimal set of risk variables is obtained on the entire training set as follows: (1) univariate logistic regression analysis is used to obtain the odds ratio (OR), 95% confidence interval and P value of each variable, and clinical variables with a P value less than 0.05 are selected; (2) the LASSO regression algorithm is applied with 5-fold cross-validation, and variables are removed under the lambda.1se parameter setting to reach the simplest variable combination; (3) important variables are selected again with a forward-backward stepwise algorithm based on the Akaike information criterion; (4) the selected variables are entered into a multivariate logistic regression model to obtain OR values, 95% CIs, P values and variable coefficients, and important variables are selected; (5) variables with small coefficients or a high missing rate are excluded;
the optimal risk variables obtained comprise GCS score, whether a vasopressor is used, CCI index, whether the patient is on absolute bed rest, whether mechanical ventilation is performed, respiratory rate, whether the admission is an emergency admission, shock index, and whether palliative care is chosen; correlation analysis is performed on the continuous variables in the model to confirm that they meet the requirements of subsequent modeling.
Preferably, in the data processing step, the data processing module processes the unstructured medical record text information and the structured numerical information in parallel, and the processed unstructured and structured data are matched and linked through the patient's unique identification number (ID) so that fusion analysis can be performed; the unstructured data is embedded and represented as a vector of preset length that represents the information contained in the text; for structured data, the clinically worst value of each variable on the first ICU day is calculated; in addition, a frailty index and a geriatric nutritional risk index are constructed from the structured data.
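The first-day aggregation and the ID-based linkage can be sketched as follows. This is an illustrative sketch only: which direction counts as clinically "worst" for each variable is an assumption (here the minimum for the GCS score and the maximum otherwise), and the names `WORST_IS_MIN`, `worst_first_day_values`, `fuse_by_patient_id` and `text_embedding` are hypothetical.

```python
# Assumption: lower is worse for these variables; higher is worse for all others.
WORST_IS_MIN = {"gcs_score"}


def worst_first_day_values(measurements):
    """Reduce (patient_id, variable, value) records to one worst value per variable."""
    worst = {}
    for patient_id, variable, value in measurements:
        patient = worst.setdefault(patient_id, {})
        if variable not in patient:
            patient[variable] = value
        elif variable in WORST_IS_MIN:
            patient[variable] = min(patient[variable], value)
        else:
            patient[variable] = max(patient[variable], value)
    return worst


def fuse_by_patient_id(text_features, structured_features):
    """Inner join of text-derived and structured features on the patient ID."""
    return {
        patient_id: {"text_embedding": embedding, **structured_features[patient_id]}
        for patient_id, embedding in text_features.items()
        if patient_id in structured_features
    }


# Toy records for illustration only.
first_day = [
    (1, "gcs_score", 14), (1, "gcs_score", 9),
    (1, "respiratory_rate", 18), (1, "respiratory_rate", 30),
    (2, "gcs_score", 15),
]
fused = fuse_by_patient_id(
    {1: [0.1, 0.2], 3: [0.3, 0.4]}, worst_first_day_values(first_day)
)
```

An inner join keeps only patients who have both a clinical text record and structured measurements, which matches the requirement that the two data types be matched through the patient ID before fusion analysis.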
Preferably, in the model development step, a pre-trained clinical-domain BERT model is used to perform the downstream task of evaluating the patient's disease severity and in-hospital death risk, simulating a doctor's inquiry into the medical history so as to perceive and evaluate the patient's current condition comprehensively; for English medical records, the pre-trained clinical-domain BERT model is one of the Bio-Clinical BERT, Clinical-BigBird, Clinical-Longformer and PubMedBERT models; for Chinese medical records, the pre-trained clinical-domain BERT model is one of the PCL-MedBERT, ChineseBLUE, ChineseEHRBert and MedBERT models; the pre-trained clinical-domain BERT model is fine-tuned on a large number of medical record texts to evaluate the patient's in-hospital death risk, and its probability output is used as the evaluation index preICU_risk_score, which measures the patient's condition before entering the ICU.
Preferably, in the model development step, the pre-trained clinical-domain BERT model (the deep learning model) and the multivariate logistic regression model (the machine learning model) are fused in modeling, which realizes fusion analysis of unstructured and structured data, jointly considers and quantifies the patient's chronic/historical conditions before entering the ICU and the acute condition on the day of entering the ICU, and yields a scoring model that is interpretable, transparent and consistent with clinical practice.
Preferably, in the model evaluation step, the performance of the obtained risk scoring model is fully evaluated by using a model evaluation module;
the risk scoring model evaluates the prognosis of the patient according to the pre-ICU condition of the patient and the condition information of the first day of ICU entry;
taking into account actual clinical requirements, the patients to whom the risk scoring model applies, and existing clinical evaluation practice, 3 types of usage scenarios are designed to compare the performance of the scoring system with that of the model corresponding to each scenario, namely:
(1) When a patient enters the ICU, the patient's disease severity and short-term prognosis are evaluated from the medical record before ICU entry, i.e., the prognosis is evaluated with the preICU_risk_score obtained by the deep learning model;
(2) On the patient's first day in the ICU, when little information from before ICU entry is available, the disease severity and short-term prognosis are evaluated from the patient information collected and measured that day and the therapeutic interventions received, i.e., the prognosis is evaluated with a multivariate logistic regression (machine learning) model that takes the GCS (Glasgow Coma Scale) score, whether a vasopressor is used, CCI (Charlson Comorbidity Index), whether the patient is on absolute bed rest, whether mechanical ventilation is performed, respiratory rate, whether the admission is an emergency admission, shock index, and whether palliative care is chosen as inputs;
(3) A commonly used clinical disease severity scoring system, such as the SOFA or SAPS II score, is used to evaluate the patient's prognosis;
based on the comparison across these 3 types of usage scenarios, and in combination with the performance indices of clinical concern, the model performance is fully quantified, compared and visualized.
The invention has the technical advantages that:
(1) Aiming at the fact that the clinical condition of the patient before ICU admission affects the prognosis and the outcome of the elderly patient, according to the medical history text record content of the patient in the EHR, the chief complaint, the family history, the current medical history, the medication at the time of admission, the past medical history, the physical examination and the social history are extracted to be used for evaluating the physiological state/the severity of the disease of the patient before ICU admission;
(2) Three novel pre-trained clinical-domain BERT models (Bio-Clinical BERT, Clinical-BigBird and Clinical-Longformer) were fine-tuned to perform the downstream task of characterizing the patient's disease severity from the pre-ICU records, expressed as a risk score;
(3) The risk assessment model was constructed to incorporate more fully the variables related to the characteristics of elderly patients, including frailty, comorbidities, cognitive function (including delirium), nutrition, activity tolerance, and medical history prior to ICU entry. The core information collected on the patient's first day in the ICU, including vital signs, laboratory tests and urine output, and the important therapeutic interventions received by the patient, are also included;
(4) The risk scoring model/system is presented as a Nomogram, which makes it easy for healthcare personnel to understand, learn and use. It can serve as a reference guide in consultations with patients or wards and family members, and facilitates popularization of the score and rapid deployment/embedding into the information systems of other hospitals;
(5) For the first time in the field of disease and health monitoring of elderly patients, the application performs fusion analysis of unstructured medical record texts and structured measurement and procedure record data, realizes fusion modeling of deep learning and machine learning methods, and finally builds an interpretable disease risk scoring system consistent with medical practice;
(6) Comprehensive evaluation indices are adopted: the ROC curve (measuring discrimination), the calibration curve (measuring calibration) and the DCA curve (measuring net clinical benefit) are used to compare the model with the baseline models and commonly used clinical scores, and the performance of the scoring system is consistent and clearly superior to that of the compared models and scores;
(7) The application packages the disease risk scoring system as an easy-to-use device: only the 10 types of information required by the model need to be entered to obtain the patient's Nomogram output, in-hospital death risk probability, and current risk level (low, medium, high).
Drawings
FIG. 1 is a flow chart of a method implementation of the present application;
FIG. 2 is a schematic diagram of an application scenario of the predictive scoring system;
FIG. 3 is a schematic of inclusion and exclusion criteria for a patient;
FIG. 4 is a diagram showing AUROC comparison of clinical knowledge characterization models at different hyper-parameter settings;
FIG. 5 is a schematic diagram of the selection of risk variables using the Lasso regression method;
FIG. 6 is a schematic diagram of a correlation analysis of important risk variables;
FIG. 7 is a disease severity assessment nomogram for elderly ICU patients, combining the deep-learning-derived feature (preICU_risk_score) with clinical variables;
FIG. 8 is a schematic representation of a risk assessment nomogram for a patient;
FIG. 9 is a schematic comparison of the ROC curves of the prediction model, the baseline models and clinical scores on the internal validation set and the temporal validation set;
FIG. 10 is a schematic comparison of the calibration curves of the prediction model, the baseline models and clinical scores on the internal validation set and the temporal validation set;
FIG. 11 is a schematic comparison of the clinical decision curves of the prediction model, the baseline models and clinical scores on the internal validation set and the temporal validation set;
FIG. 12 is a schematic diagram of the Nomogram risk scoring device for early assessment of adverse in-hospital outcomes of elderly ICU patients.
Detailed Description
The present application will be described in detail below with reference to the accompanying drawings.
The invention provides a method for evaluating a patient's disease severity and in-hospital death risk based on a large sample of pre-ICU medical record texts of elderly patients, variables routinely collected on the day of ICU entry, and key therapeutic interventions received. The medical record text is processed, modeled and represented by the data processing and model development modules; specifically, a pre-trained deep learning language model (BERT) is fine-tuned on a downstream task to evaluate and quantify the patient's disease severity before entering the ICU, expressed as preICU_risk_score. This mirrors how a physician assesses disease severity by analyzing the patient's information from before ICU entry, i.e., it simulates the physician's inquiry into the medical history to perceive, evaluate and quantify the patient's current condition more comprehensively. The structured data is processed, modeled and subjected to feature selection by the data processing and model development modules to obtain the important risk factors. The preICU_risk_score, obtained by deep learning through semantic analysis and representation, is then modeled jointly with the key risk factors selected by machine learning in a multivariate logistic regression model, and the prediction model is explained and presented as a Nomogram, which fits the working habits of medical staff.
Model evaluation covers internal validation and temporal validation, and compares the scoring system with 3 types of models/scores (the preICU_risk_score built by a deep learning model that considers only the patient's condition before entering the ICU, a machine learning model built only on the important risk factors selected from the patient's first day in the ICU, and disease severity scoring systems commonly used in clinical practice), focusing on the aspects of greatest clinical concern: discrimination, calibration and net clinical benefit. With this process, an evaluation of the disease severity and short-term prognosis (in-hospital outcome) of an elderly patient is obtained from just 10 types of conveniently entered information, which gives doctors a reference for making or adjusting the treatment plan and avoids the extra workload that manual calculation would place on medical staff. The 10 types of information are: the medical record text from before ICU entry (all 7 types of content, or only part of them, may be entered), GCS score, whether a vasopressor is used, CCI index, respiratory rate, whether the admission type is emergency, shock index, whether mechanical ventilation is performed, whether the patient is on absolute bed rest, and code status.
Since only 10 types of information are needed, 5 of which are yes/no items, the evaluation is very convenient, quick and accurate compared with the clinically common SOFA score and the more complicated Acute Physiology and Chronic Health Evaluation (APACHE) score; it avoids the extra workload of manual calculation for medical staff, and the system is easy to deploy and popularize across different medical institutions (tertiary grade-A / non-tertiary grade-A hospitals) and devices (computer / online calculator / tablet / manual evaluation and recording).
The invention develops and validates a disease severity scoring system for elderly ICU patients based on deep learning and machine learning. In-hospital mortality is the outcome of interest used to assess disease severity. The score was designed to combine the patient's past medical history, a description of the admission status, and common measurements from the first ICU day. Past medical history and descriptions of admission status are usually recorded in clinical texts. The common measurements should include as many risk variables relevant to elderly patients as possible, such as frailty, cognitive function and whether treatment is palliative. Clinical texts, basic patient information, degree of frailty, cognitive function, vital signs, laboratory tests, treatments and urine output were included in the study to construct an ICU (intensive care unit) elderly disease severity scoring system, also called a trustworthy AI agent. Fig. 2 presents an overview of the application scenario of the score.
The invention provides a Nomogram ICU (intensive care unit) old-age risk scoring system and device fusing medical history texts and an establishing method thereof. The specific implementation is shown in fig. 1, and comprises the following steps:
1. The data acquisition module process in the invention is as follows:
This study used electronic medical record data of elderly patients aged 65 years and older collected at the Beth Israel Deaconess Medical Center (BIDMC) from 2001 to 2019, extracted from MIMIC-III and the updated MIMIC-IV (Medical Information Mart for Intensive Care, MIMIC). Patients were excluded if the record was a second or later hospital admission or ICU stay, if the ICU stay lasted less than 24 hours, or if basic measurement records (heart rate, respiratory rate, mean arterial pressure, systolic pressure, Glasgow Coma Score, body temperature, and oxygen saturation) were missing. Patients with no recorded past medical history (i.e., no clinical text records) were further excluded. The screening process of the study cohort is shown in Fig. 3. Patients from 2001 to 2016 were selected as the development set for training and internal validation of the model, and the remaining patients formed a temporal validation set to evaluate how the model would perform in future hospital use.
In view of the characteristics of the elderly patient population and the design goals of early evaluation and ease of clinical use, we collected the text records commonly available before ICU admission and data that are conveniently measured on the first ICU day. The unstructured data are shown in Fig. 1 and include the chief complaint, family history, history of present illness, medications on admission, past medical history, physical examination, and social history. The structured data specifically include basic information (age, body mass index, sex, Charlson comorbidity index, days of hospitalization before ICU admission, admission type), cognitive function (delirium, Glasgow Coma Score), activity tolerance (bedridden, able to sit, able to stand), vital signs (shock index, respiratory rate, heart rate, fraction of inspired oxygen, mean arterial pressure, systolic pressure, oxygen saturation, body temperature), laboratory tests (albumin, alkaline phosphatase, alanine transaminase, anion gap, aspartate transaminase, base excess, bicarbonate, bilirubin, urea nitrogen to creatinine ratio, chloride, creatinine, estimated glomerular filtration rate, blood glucose, hematocrit, hemoglobin, international normalized ratio, lactate, lymphocytes, magnesium, neutrophils, neutrophil to lymphocyte ratio, partial pressure of carbon dioxide, partial pressure of oxygen, oxygenation index during mechanical ventilation, platelets, potassium, prothrombin time, partial thromboplastin time, sodium, and blood cell counts), therapeutic interventions (including mechanical ventilation and code status), and fluid output (urine output). In addition, clinical scores, including SOFA and SAPSII, were calculated for comparison of predictive performance.
2. The data processing module process in the invention is as follows:
Unstructured data and structured data are processed in different ways. The extracted text is converted to lower case, and special characters such as "=", "--", and "\n" are removed. Because sentence lengths are heterogeneous, a sliding window method is used to segment the text, which avoids the information loss caused by overly long sentences during model training. The window size and the sliding step are then tuned to obtain a set of sentences of suitable length. Each sentence is tokenized and each token is mapped to its ID in the vocabulary of the selected language model. Finally, the sentence is embedded as a fixed-length vector that represents the information it contains.
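The text cleaning and sliding-window segmentation described above can be sketched as follows. This is a minimal illustration, not the inventors' implementation: the function names are hypothetical, windows are counted in whitespace-separated words, and the default window of 120 words with a step of 10 words is taken from the settings reported later in this description.

```python
import re


def clean_text(text: str) -> str:
    """Lower-case a note and strip the special characters named in the text."""
    text = text.lower()
    # Remove runs of '=', runs of two or more '-', and literal '/n' markers.
    text = re.sub(r"=+|-{2,}|/n", " ", text)
    # Collapse all whitespace (including real newlines) to single spaces.
    return re.sub(r"\s+", " ", text).strip()


def sliding_windows(text: str, window: int = 120, step: int = 10) -> list[str]:
    """Split text into overlapping word windows so that no segment is too long."""
    words = text.split()
    if not words:
        return []
    if len(words) <= window:
        return [" ".join(words)]
    segments = []
    # Stop once a window has reached the end of the note.
    for start in range(0, len(words) - window + step, step):
        segments.append(" ".join(words[start:start + window]))
    return segments
```

Each returned segment would then be tokenized with the vocabulary of the chosen language model; that step depends on the model and is not shown here.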
For structured data, values outside the physiological limits of each variable are deleted. For the first ICU day, the clinically worst value of each variable is calculated, such as the highest and lowest laboratory values, the lowest and average vital signs, and whether a treatment was given. This makes the data consistent across patients. Missing values are imputed in 3 ways: variables with a missing rate below 30% are imputed with the median of the variable; variables with a missing rate above 30% are imputed with 0, and an additional indicator variable with the suffix "flag", such as lactate_flag, is created to indicate whether a measured value exists; and missing FiO2 values are imputed with 21%, with an indicator variable "FiO2_flag" constructed. The frailty index (FI-LAB) is an index for assessing the risk of death in elderly people and is calculated from 21 routine laboratory values, SBP and DBP. The geriatric nutritional risk index (GNRI) is a nutritional index that assesses the risk of morbidity and mortality in elderly patients and is calculated from albumin, weight and height. Categorical variables are further set as factor variables. After this feature construction, the names finally used for model training and variable selection are presented by type and source, as shown in Table 1.
The processed unstructured and structured data are matched and linked by patient ID to construct the study data set. We randomly drew 80% of the patient data from 2001 to 2016 as the training set and used the remaining 20% as the internal validation set. Patients from 2017 to 2019 were kept separate as the temporal validation set. We calculated the missing rates of the structured numerical variables in the development set and the temporal validation set before feature construction, as shown in Table 2. Table 3 presents a baseline comparison of patients in the development and temporal validation sets; in total, 29474 elderly critically ill patients were analyzed. The total study cohort was divided into a development set with 26473 patients (13.1% mortality) and a temporal validation set with 3001 patients (12.0% mortality).
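The three imputation rules described above can be sketched as follows. This is an illustrative sketch only: the variable names (for example "fio2" and "lactate") and the function name are hypothetical, missing values are represented as None, and a missing rate of exactly 30% is treated as "not above 30%", which the text leaves unspecified.

```python
from statistics import median


def impute(rows, high_missing_threshold=0.30, fio2_default=21.0):
    """Impute missing values (None) in a list of per-patient dicts.

    Rules, following the text: FiO2 is filled with 21% plus a flag; variables
    with a missing rate above the threshold are filled with 0 plus a flag;
    all other variables are filled with the median of the observed values.
    """
    if not rows:
        return []
    variables = sorted({name for row in rows for name in row})
    out = [dict(row) for row in rows]
    for var in variables:
        observed = [row[var] for row in rows if row.get(var) is not None]
        missing_rate = 1 - len(observed) / len(rows)
        if var == "fio2":
            fill, add_flag = fio2_default, True
        elif missing_rate > high_missing_threshold:
            fill, add_flag = 0.0, True
        else:
            fill, add_flag = median(observed), False
        for row in out:
            was_missing = row.get(var) is None
            if add_flag:
                # Flag is 1 when a measured value exists, 0 when it was imputed.
                row[f"{var}_flag"] = 0 if was_missing else 1
            if was_missing:
                row[var] = fill
    return out
```

Keeping the flag alongside the zero-filled value lets a downstream model distinguish "not measured" from a genuine measurement.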
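The data set division described above (a temporal hold-out plus a random 80/20 split of the earlier patients) can be sketched as follows. The field name "admit_year", the function name and the fixed random seed are assumptions made for illustration; the source does not state how the random draw was seeded.

```python
import random


def split_cohort(patients, cutoff_year=2016, train_fraction=0.8, seed=0):
    """Return (training set, internal validation set, temporal validation set)."""
    # Patients up to the cutoff year form the development set.
    development = [p for p in patients if p["admit_year"] <= cutoff_year]
    # Later patients are held out entirely for temporal validation.
    temporal = [p for p in patients if p["admit_year"] > cutoff_year]
    shuffled = development[:]
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:], temporal
```

Splitting by calendar period first ensures that no patient from the later period influences training or hyper-parameter selection.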
TABLE 1 summary of names of included variables
Figure BDA0003904082930000101
Figure BDA0003904082930000111
TABLE 2 Feature missing rates for the development set and the temporal validation set
Figure BDA0003904082930000112
Figure BDA0003904082930000121
Figure BDA0003904082930000131
TABLE 3 Baseline comparison of patients in the development and temporal validation sets
Figure BDA0003904082930000132
Figure BDA0003904082930000141
Figure BDA0003904082930000151
Figure BDA0003904082930000161
3. The model development module process in the invention is as follows:
In this study, we selected three novel pre-trained clinical-domain BERT (Bidirectional Encoder Representations from Transformers) models, Bio-Clinical BERT, Clinical-Bigbird and Clinical-Longformer, to perform our downstream task of characterizing patient disease severity from the medical records written before ICU admission (mimicking how a physician forms a fuller, more rounded assessment of the patient's current condition by taking a medical history). Clinical-Bigbird and Clinical-Longformer were designed to reduce the large memory consumption of full self-attention and to improve the modeling of long-range dependencies in a sequence through a sparse attention mechanism. We fine-tuned the 3 pre-trained models to assess the patient's risk of in-hospital death using the clinical text of 85.7% of the patients in the training set; the remaining 14.3% of patients were used to evaluate predictive performance under different hyper-parameter combinations and to guard against overfitting. Fine-tuning was performed on two parallel GPUs. We segmented the text of each patient with a sliding window of 120 or 240 words and a sliding step of 10 or 20 words, respectively, to form a segmented corpus data set. Based on common fine-tuning practice and some preliminary experiments, we set the number of epochs and the batch size to 2 and 12, respectively, selected the Adam optimizer, varied the learning rate according to get_linear_schedule_with_warmup, and explored initial learning rates between 1e-5 and 5e-5 (in steps of 1e-5). Table 4 details the performance of the three clinical language models under different hyper-parameter settings on the internal and temporal validation sets, and Fig. 4 shows the AUROC with 95% CI for the models.
Considering performance on both the internal and the temporal validation sets, Clinical-Longformer with a window size of 240 and a learning rate of 1e-5 was selected to obtain the preICU_risk_score of each patient.
TABLE 4 Detailed comparison of three deep-learning clinical BERT models
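The learning-rate schedule named above increases the rate linearly during a warm-up phase and then decays it linearly to zero; in the Hugging Face `transformers` library this is provided by `get_linear_schedule_with_warmup`. The sketch below reproduces only the shape of that schedule in plain Python. The numbers of warm-up and total steps are assumptions for illustration, since the source does not report them.

```python
def linear_warmup_decay(step: int, warmup_steps: int, total_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given optimizer step."""
    if step < warmup_steps:
        # Linear ramp from 0 to 1 during warm-up.
        return step / max(1, warmup_steps)
    # Linear decay from 1 to 0 over the remaining steps.
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


def learning_rate(step: int, base_lr: float = 1e-5,
                  warmup_steps: int = 100, total_steps: int = 1000) -> float:
    """Learning rate at a step; warmup_steps and total_steps are assumed values."""
    return base_lr * linear_warmup_decay(step, warmup_steps, total_steps)
```

With a base rate of 1e-5, the selected setting, the rate peaks at 1e-5 at the end of warm-up and falls to zero at the final step.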
Figure BDA0003904082930000162
Figure BDA0003904082930000171
Figure BDA0003904082930000181
Figure BDA0003904082930000191
We obtained the best set of risk variables using the entire training set. We first used univariate logistic regression to obtain the odds ratio (OR), 95% confidence interval (CI) and P value of each variable, and selected clinical variables useful for assessing disease severity in elderly patients. We then applied the LASSO regression algorithm with 5-fold cross-validation to further screen key variables. Next, we selected important variables again using the forward and backward stepwise algorithm based on the Akaike Information Criterion (AIC). Finally, the selected variables were entered into the multivariate LR model to obtain the OR values, 95% CIs, P values and variable coefficients. To keep the risk score general, simple and usable, variables with small coefficients or high missing rates were excluded in the subsequent work. Furthermore, the correlation between continuous variables needs to be checked before modeling the prediction score. In Table 5, univariate logistic regression showed that 10 independent risk factors had no significant association with the risk of in-hospital mortality. The process of finding the best lambda by LASSO regression is shown in Fig. 5. With lambda.1se selected, the LASSO LR model with 5-fold cross-validation removed variables to reach the most parsimonious combination, leaving 25 variables with non-zero coefficients. Table 6 lists the variables selected by the two algorithms, LASSO regression and the forward and backward stepwise algorithm, together with their model coefficients.
So that medical staff can use the score quickly, 9 key variables (the important risk factors) that have an absolute coefficient greater than 0.06 and are easy to obtain were retained: minimum GCS score, vasopressor use (yes/no), CCI index, mean respiratory rate, mechanical ventilation (yes/no), admission type (urgent/non-urgent), code status (whether palliative treatment is selected, yes/no), best activity level (strict bed rest or not), and shock index. For the continuous variables, the results of the correlation analysis also met the requirements for subsequent modeling, as shown in Fig. 6.
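For a yes/no risk factor, the odds ratio and Wald 95% CI reported by univariate logistic regression coincide with those computed directly from the 2x2 table of exposure against outcome. The sketch below shows that special case only; it does not cover continuous variables, and the function and argument names are hypothetical.

```python
import math


def odds_ratio_with_ci(exposed_died, exposed_survived,
                       unexposed_died, unexposed_survived, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table of counts."""
    a, b, c, d = exposed_died, exposed_survived, unexposed_died, unexposed_survived
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((a * d) / (b * c))
    # Standard error of the log odds ratio.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))
```

A confidence interval that excludes 1 corresponds to a P value below 0.05 for that variable.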
TABLE 5 univariate analysis of all variables
Figure BDA0003904082930000192
Figure BDA0003904082930000201
Figure BDA0003904082930000211
Figure BDA0003904082930000221
TABLE 6 selection of Risk variables by Lasso regression model and stepwise algorithm
Figure BDA0003904082930000222
Figure BDA0003904082930000231
Using the deep learning model, we obtained a risk probability from the records made before ICU admission, named preICU_risk_score (i.e., the preICU risk score). If a patient has multiple predictions because the text was split into several segments, the average probability is used as the final result. Combining preICU_risk_score (for ease of calculation and presentation, the probability is multiplied by 10) with the 9 selected key variables, we fitted a multivariate logistic regression model (multivariable model) on the training set to obtain the final prediction model. Based on the fitted model, a nomogram is used to present the model visually. We thereby obtained a disease severity scoring system that allows elderly patients to be assessed early, on the first ICU day. In Table 7, univariate and multivariate analyses showed that all selected variables had a significant effect on the in-hospital outcome of elderly patients (P < 0.001 for all variables except admission type, P = 0.022). The nomogram obtained from the final trained model is shown in Fig. 7. Thresholds for low, medium and high risk levels may be set based on the patient's risk probability (the model output) and the physician's requirements for accuracy and sensitivity. Fig. 8 gives an example of the calculation process and the final risk level of the disease severity score obtained after the 10 dimensions of data are entered for a patient. The disease severity score for an elderly ICU patient was calculated as:
Total points = 10 * preICU_risk_score - 1.4116 * GCS score (min) + 3.5544 * vasopressor (use=1, no use=0) + 1.769 * CCI score + 1.0344 * respiratory rate (bpm) + 2.8915 * admission type (urgent=1, non-urgent=0) + 18.2479 * shock index (bpm/mmHg) + 4.4857 * mechanical ventilation (received=1, not received=0) + 9.9566 * activity status (bed=1, not bed=0) + 9.6055 * code status (received=1, not received=0)
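The formula above can be evaluated as in the following sketch, which also shows the averaging of segment-level predictions into a single preICU_risk_score. The coefficients are those of the formula; the dictionary keys and function names are hypothetical. The sketch stops at the total points, because the mapping from total points to a risk probability is read from the nomogram in Fig. 7 and is not given in the text.

```python
from statistics import mean

# Coefficients copied from the Total points formula; keys are illustrative names.
COEFFICIENTS = {
    "gcs_min": -1.4116,                # minimum GCS score
    "vasopressor": 3.5544,             # 1 if used, else 0
    "cci": 1.769,                      # CCI score
    "respiratory_rate": 1.0344,        # breaths per minute
    "urgent_admission": 2.8915,        # 1 if urgent, else 0
    "shock_index": 18.2479,            # bpm/mmHg
    "mechanical_ventilation": 4.4857,  # 1 if received, else 0
    "bed_rest": 9.9566,                # activity status: 1 if bed, else 0
    "code_status": 9.6055,             # 1 or 0 as defined for the formula
}


def pre_icu_risk_score(segment_probabilities):
    """Average the predicted probabilities over all text segments of one patient."""
    return mean(segment_probabilities)


def total_points(pre_icu_score, features):
    """Evaluate the Total points formula for one patient."""
    points = 10 * pre_icu_score
    for name, coefficient in COEFFICIENTS.items():
        points += coefficient * float(features[name])
    return points
```

Because every term is a fixed coefficient times an input, each variable's contribution to the total can be read off directly, which is the transparency a nomogram is meant to provide.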
TABLE 7 univariate and multivariate analysis of risk factors selected for early assessment of mortality
Figure BDA0003904082930000241
4. The model evaluation module process in the invention is as follows:
Our clinical nomogram model was compared with three types of models: a deep learning model developed from unstructured data (clinical text), namely the preICU_risk_score, which assesses disease severity on entry to the ICU; a multivariate logistic regression model developed from structured data using the 9 key variables selected by the machine learning model; and the SOFA (Sequential Organ Failure Assessment) and SAPSII (Simplified Acute Physiology Score II) scores routinely used in clinical practice. The ROC curve (receiver operating characteristic curve) is used to evaluate the ability of the model to discriminate survivors from non-survivors. The calibration curve uses 500 resamplings to assess the agreement between actual and predicted risk probabilities. The DCA curve (decision curve analysis) evaluates the clinical utility of the model through the net benefit obtained as the probability threshold is varied. All models were evaluated by internal and temporal validation. The comparison metrics include AUROC (area under the receiver operating characteristic curve), AUPRC (area under the precision-recall curve), sensitivity, specificity, F1 score and accuracy, each with its 95% CI (confidence interval), and the Brier score; the 95% CIs were obtained using 500 bootstrap resamplings.
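Two of the metrics named above, AUROC and the Brier score, together with a percentile bootstrap CI over 500 resamplings, can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the evaluation code used for the reported results: the function names are hypothetical, the pairwise AUROC is quadratic in the sample size, and the percentile method is one common way to form a bootstrap interval; the source does not say which variant was used.

```python
import random


def auroc(y_true, y_prob):
    """Chance that a random non-survivor (1) scores above a random survivor (0)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def bootstrap_ci(metric, y_true, y_prob, n_resamples=500, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a metric."""
    if len(set(y_true)) < 2:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples in which AUROC is undefined
        stats.append(metric(ys, [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```

A call such as `bootstrap_ci(auroc, y_true, y_prob)` returns the lower and upper bounds of the interval for the chosen metric.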
We compared four application scenarios/scoring systems: our risk score [named here the Nomogram score (fusion of clinical notes and variables)], the structured data score (clinical variables), the preICU_risk_score (clinical notes), and SAPSII and SOFA (common clinical scores). In Fig. 9 (a) and (b), ROC curves compare the discrimination of the four classes of models on internal and temporal validation; in both, the Nomogram score is significantly better than the other scores. Table 8 shows a detailed comparison of the metrics using 500 bootstrap iterations. For internal validation, the Nomogram score has an AUROC (95% CI) of 0.84 (0.816-0.861), the structured data score 0.798 (0.77-0.821), the preICU risk score 0.767 (0.741-0.794), the SAPSII score 0.764 (0.738-0.792), and the SOFA score only 0.697 (0.667-0.732). For temporal validation, the AUROCs (95% CI) were 0.871 (0.851-0.887), 0.814 (0.791-0.837), 0.806 (0.777-0.831), 0.762 (0.736-0.79), and 0.746 (0.717-0.774), respectively. In Fig. 10 (a) and (b), calibration performance is shown using calibration curves. In both internal and temporal validation, the Nomogram score showed an acceptable Brier score (internal: 1.097, temporal: 1.062). In Fig. 11 (a) and (b), the DCA curves for internal and temporal validation show that the Nomogram score also provides a broader net clinical decision benefit than the other 4 scores.
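The net benefit plotted in a DCA curve can be computed at a single probability threshold as in the sketch below, which uses the standard decision-curve definition (true positives minus false positives weighted by the odds of the threshold, per patient). The function names are hypothetical, and the "treat all" reference line is included because it is the usual comparator in such plots.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk is >= threshold."""
    n = len(y_true)
    true_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


def treat_all_net_benefit(y_true, threshold):
    """Reference strategy: every patient is classified as high risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)
```

Evaluating `net_benefit` over a grid of thresholds for each score, and plotting the results against the treat-all and treat-none (net benefit 0) lines, gives the decision curves being compared.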
TABLE 8 Performance comparison of predictive models to baseline models and clinical scores for internal and temporal validation
Figure BDA0003904082930000251
Figure BDA0003904082930000261
Finally, data acquisition, data processing, model operation and model output are integrated and packaged into a device that is convenient to operate and use, as shown in Fig. 12. That is, the medical history text from before the patient enters the ICU (chief complaint, family history, history of present illness, medications on admission, past medical history, physical examination and social history; part or all of the content may be entered as the situation allows) and the numerical/categorical information from the first ICU day (GCS score, whether vasopressors are used, CCI index, whether the patient is on strict bed rest, whether mechanical ventilation is given, respiratory rate, whether the admission is urgent, shock index, and whether palliative treatment is selected) are entered, and through the internal processing, calculation, analysis and output of the device, a Nomogram calculation chart, the patient's in-hospital death risk and the risk level (low/medium/high) are obtained. The threshold ranges of the risk levels may be set by the doctor according to the usage scenario, or the default ranges of the device model (low: 0-0.1, medium: 0.1-0.35, high: 0.35-1) may be selected. Because it requires few inputs, is convenient to operate and has a transparent calculation process, the Nomogram ICU elderly disease risk assessment system and device fusing medical history text is easy to popularize and use. In addition, to accommodate medical record text entered in Chinese as well as in English and the preferences of different users, two families of scoring systems are provided: for English medical records, scoring systems trained with Bio-Clinical BERT, Clinical-Bigbird, Clinical-Longformer or PubMedBERT as the pre-trained model; and for Chinese medical records, scoring systems trained with PCL-MedBERT, ChineseBLUE, ChineseEHRBert or MedBERT as the pre-trained model.
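The mapping from the in-hospital death risk probability to a risk level can be sketched as follows, using the default ranges stated above and allowing the doctor to override them. The function name is hypothetical, and assigning a probability that falls exactly on a boundary (0.1 or 0.35) to the higher level is an assumption, since the stated ranges share their endpoints.

```python
def risk_level(probability: float,
               low_upper: float = 0.10,
               medium_upper: float = 0.35) -> str:
    """Map a risk probability to 'low', 'medium' or 'high'.

    Defaults follow the device's default ranges; both cut-offs can be
    replaced by thresholds chosen for the usage scenario.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    if probability < low_upper:
        return "low"
    if probability < medium_upper:
        return "medium"
    return "high"
```

For example, `risk_level(0.2)` returns "medium" under the default ranges, while `risk_level(0.2, low_upper=0.05, medium_upper=0.15)` returns "high" under stricter user-chosen thresholds.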
Unless defined otherwise, all technical and/or scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The materials, methods, and examples set forth in this application are illustrative only and not intended to be limiting.
Although the present invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications, and variations will be apparent to those skilled in the art in light of the teachings of this application and yet remain within the scope of this application.

Claims (13)

1. A Nomogram ICU geriatric disease risk scoring model fused with medical history text, comprising: a data acquisition module, a data processing module, a BERT-type model calculation module, a multivariate logistic regression model and a Nomogram output module;
the data acquisition module is used for acquiring medical record text information of a patient from before the patient enters the intensive care unit (ICU) and numerical information acquired on the first day after the patient enters the ICU; the medical record text information is unstructured and comprises chief complaint, family history, history of present illness, medications on admission, past medical history, physical examination and social history; the numerical information is structured and comprises GCS score, whether vasopressors are used, CCI index, whether the patient is on strict bed rest, whether mechanical ventilation is given, respiratory rate, whether the admission is urgent, shock index, and whether palliative treatment is selected;
the data processing module is used for processing the medical record text information and the numerical information; wherein the processing of the medical record text information comprises: converting letters to lower case, removing special characters, sliding segmentation of sentences, word segmentation, and sentence embedding, whereby the medical record text information is represented after processing as a plurality of vectors of preset length;
the BERT-type model calculation module calculates, based on the selected fine-tuned pre-trained clinical text model, a disease severity assessment before entering the ICU, preICU_risk_score, from the plurality of preset-length encoding vectors of the medical record text information;
the multivariate logistic regression model takes as input the preICU_risk_score together with the GCS score, whether vasopressors are used, the CCI index, whether the patient is on strict bed rest, whether mechanical ventilation is given, the respiratory rate, whether the admission is urgent, the shock index, and whether palliative treatment is selected, and calculates the in-hospital death risk probability and the risk level of the patient;
the Nomogram output module outputs the Nomogram and the disease severity score of the patient based on the inputs to the multivariate logistic regression model and the parameters of the model.
2. The Nomogram ICU geriatric disease risk scoring model fused with medical record text as recited in claim 1, wherein:
the data acquisition module extracts the chief complaints, family history, current medical history, medication at the time of admission, past medical history, physical examination and social history of the patients from the electronic health records.
3. The Nomogram ICU geriatric disease risk scoring model fused with medical record text as recited in claim 1, wherein:
for English medical records, the BERT-type model calculation module uses one of the Bio-Clinical BERT, Clinical-Bigbird, Clinical-Longformer and PubMedBERT models; for Chinese medical records, the BERT-type model calculation module uses one of the PCL-MedBERT, ChineseBLUE, ChineseEHRBert and MedBERT models.
4. The Nomogram ICU geriatric disease risk scoring model fused with medical record text as recited in claim 1, wherein:
the risk levels include low risk, medium risk and high risk;
according to the calculated in-hospital death risk probability of the patient, a patient with a probability of 0-0.1 is regarded as low risk, a patient with a probability of 0.1-0.35 is regarded as medium risk, and a patient with a probability of 0.35-1 is regarded as high risk.
5. The Nomogram ICU geriatric disease risk scoring model fused with medical record text as recited in claim 1, wherein: disease severity scoring was performed according to the following calculation:
Total points=10*preICU_risk_score-1.4116*GCS score+3.5544*vasopressor+1.769*CCI score+1.0344*respiratory rate+2.8915*admission type+18.2479*shock index+4.4857*mechanical ventilation+9.9566*activity status+9.6055*code status;
wherein Total points is the disease severity score; GCS score is the GCS score; vasopressor indicates whether vasopressors are used, and is 1 if yes and 0 if no; CCI score is the CCI index; respiratory rate is the respiratory rate; admission type indicates whether the admission is urgent, and is 1 if yes and 0 if no; shock index is the shock index; mechanical ventilation indicates whether mechanical ventilation is given, and is 1 if yes and 0 if no; activity status indicates whether the patient is on strict bed rest, and is 1 if yes and 0 if no; code status indicates whether palliative treatment is selected, and is 1 if yes and 0 if no.
6. The Nomogram ICU geriatric disease risk scoring model fused with medical record text as recited in claim 1, wherein:
the sliding segmentation of sentences is performed using a sliding window method.
7. A Nomogram ICU geriatric risk scoring apparatus fused with a medical history text, implemented by a computer, configured to run the Nomogram ICU geriatric risk scoring model fused with a medical history text as claimed in any one of claims 1 to 6.
8. A method for establishing a Nomogram ICU (intensive care unit) elderly disease risk score model fused with a case history text comprises the following steps:
a data acquisition step; acquiring information for model construction from the electronic health records of patients, wherein the information comprises medical record text information from before the patient enters the intensive care unit (ICU) and numerical information acquired on the first day after the patient enters the ICU; extracting medical record text information including chief complaint, family history, history of present illness, medications on admission, past medical history, physical examination and social history; extracting numerical information including basic information, cognitive function, activity tolerance, vital signs, laboratory tests, therapeutic interventions, fluid output (i.e., urine output), and common clinical scores;
a data processing step; cleaning, processing and feature construction are carried out on the medical record text information (unstructured data) and the numerical information (structured data), and the data set is split for training and evaluating the subsequent models; the processing of unstructured data includes: converting letters to lower case, removing special characters, sliding segmentation of sentences, word segmentation, and sentence embedding; the processing of the structured data comprises: removing abnormal values, aligning data, imputing missing values, constructing statistical features and setting factor variables; the data set splitting comprises preparing a development set, an internal validation set and a temporal validation set;
a model development step; obtaining a preICU_risk_score by using a pre-trained deep learning model; training and selecting important risk factors by using a machine learning model; the Nomogram ICU elderly disease risk scoring model fusing medical history text fuses the preICU_risk_score and the selected important risk factors, and a multivariate logistic regression model is trained to obtain the in-hospital death risk probability, the risk level, the Nomogram and the disease severity score of the patient;
a model evaluation step; selecting clinically relevant performance metrics under different validation modes, and comparing against a plurality of baseline models for different scenarios/requirements; the baseline models include: the preICU_risk_score obtained using only the medical record text, a model built using only structured data (i.e., the selected important risk factors), and the disease severity scoring systems commonly used in clinical practice; the validation modes include: internal validation and temporal validation; the performance evaluation includes ROC curves, calibration curves, DCA curves and 7 associated evaluation metrics, the 7 metrics including: area under the receiver operating characteristic curve, area under the precision-recall curve, sensitivity, specificity, F1 score and accuracy, each with its corresponding 95% confidence interval, and Brier score.
9. The method for establishing the Nomogram ICU senile disease risk scoring model fused with the medical record text according to claim 8, wherein:
obtaining an optimal set of risk variables based on the entire training set: (1) obtaining the odds ratio, 95% confidence interval and P value of each variable by univariate logistic regression analysis, and selecting clinical variables with a P value less than 0.05; (2) applying the LASSO regression algorithm with 5-fold cross-validation and removing variables with the lambda.1se parameter setting so as to reach the most parsimonious combination of variables; (3) selecting important variables again by using a forward and backward stepwise algorithm based on the Akaike information criterion; (4) inputting the selected variables into a multivariate logistic regression model to obtain OR values, 95% CIs, P values and variable coefficients for the selection of important variables; (5) excluding variables with small coefficients or high missing rates;
the optimal risk variables obtained include GCS score, whether vasopressors are used, CCI index, whether the patient is on strict bed rest, whether mechanical ventilation is given, respiratory rate, whether the admission is urgent, shock index, and whether palliative treatment is selected; for the continuous variables in the model, correlation analysis is carried out to confirm whether the requirements of subsequent modeling are met.
10. The method of claim 8, wherein:
in the data processing step, the data processing module is used for processing in parallel the medical record text information (unstructured data) and the numerical record information (structured data), and the processed unstructured and structured data are matched and linked through a unique patient identification number (ID), so that fusion analysis can be performed; the unstructured data are embedded as vectors of a preset length that represent the information contained in a text; for structured data, the clinically worst value of each variable on the first ICU day is calculated; in addition, a frailty index and a geriatric nutritional risk index are constructed from the structured data.
11. The method of claim 8, wherein:
in the model development step, a pre-trained clinical-domain BERT model is adopted to perform the downstream task of evaluating the disease severity of a patient and the risk of in-hospital death, simulating how a doctor forms a comprehensive, rounded assessment of the patient's current condition by taking a medical history; for English medical records, the pre-trained clinical-domain BERT model is one of the Bio-Clinical BERT, Clinical-Bigbird, Clinical-Longformer and PubMedBERT models; for Chinese medical records, the pre-trained clinical-domain BERT model is one of the PCL-MedBERT, ChineseBLUE, ChineseEHRBert and MedBERT models; the pre-trained clinical-domain BERT model is fine-tuned on a large amount of medical record text to evaluate the in-hospital death risk of the patient, and the probability output by the pre-trained clinical-domain BERT model is used as the evaluation index preICU_risk_score for measuring the condition of the patient before entering the ICU.
12. The method of claim 8, wherein:
in the model development step, a BERT model in the pre-trained clinical field serving as a deep learning model and a multivariate logistic regression model serving as a machine learning model are subjected to fusion modeling, and fusion analysis of unstructured data and structured data is realized, so that fusion consideration and quantification of chronic/historical conditions of a patient before entering an ICU and acute conditions of the patient on the same day entering the ICU are realized, and a grading model which accords with clinical behavior habits, can be explained and is transparent is constructed.
13. The method of claim 8, wherein:
a model evaluation step, wherein a model evaluation module is used to fully evaluate the performance of the obtained risk scoring model;
the risk scoring model evaluates the patient's prognosis based on the patient's condition before entering the ICU and the condition information from the first day in the ICU;
in view of actual clinical needs, the patients to whom the risk scoring model applies, and existing clinical evaluation methods, three types of use scenarios are designed to compare the performance of the model against the selected scoring systems in the corresponding scenarios, namely:
(1) When a patient enters the ICU, the patient's disease severity and short-term prognosis are evaluated from the medical record before ICU admission, that is, the patient's prognosis is evaluated by the preICU_risk_score obtained from the deep learning model;
(2) On the first day after a patient enters the ICU, when little pre-ICU information is available, the disease severity and short-term prognosis are evaluated from the patient information collected and measured that day and the therapeutic interventions received, that is, the multivariate logistic regression model serving as the machine learning model takes as input the GCS score, vasopressor use, the CCI index, absolute bed rest, mechanical ventilation, respiratory rate, emergency admission, the shock index and the choice of palliative care, and calculates the patient's in-hospital death risk probability and risk level to evaluate the prognosis;
(3) A commonly used clinical disease severity scoring system, such as the SOFA or SAPS II score, is selected to evaluate the patient's prognosis;
based on the comparison of the three types of use scenarios, and in combination with the performance indexes of clinical concern, the model performance is fully quantified, evaluated, compared and visually presented.
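The three-scenario comparison can be sketched with a single discrimination metric. The claim does not name the metrics, so the use of AUROC here is an assumption; the scenario labels, scores, and outcomes below are invented for illustration only.

```python
from typing import Dict, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (death, survivor) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def compare_scenarios(labels: Sequence[int],
                      scenario_scores: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Compute AUROC for each scenario's risk estimates against the same outcomes."""
    return {name: auroc(labels, s) for name, s in scenario_scores.items()}


if __name__ == "__main__":
    died = [0, 0, 1, 1]  # hypothetical in-hospital outcomes
    scores = {
        "scenario 1: preICU_risk_score": [0.10, 0.40, 0.35, 0.80],
        "scenario 2: first-day logistic regression": [0.20, 0.30, 0.60, 0.70],
        "scenario 3: SOFA": [4, 9, 7, 11],
    }
    for name, value in compare_scenarios(died, scores).items():
        print(f"{name}: AUROC = {value:.2f}")
```

Scores on different scales (probabilities versus SOFA points) can be compared this way because AUROC depends only on ranking; calibration and clinical utility would need separate measures.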
CN202211300558.2A 2022-10-24 2022-10-24 Nomogram ICU (intensive care unit) elderly disease risk scoring model and device fusing medical history texts and establishing method thereof Pending CN115527678A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211300558.2A CN115527678A (en) 2022-10-24 2022-10-24 Nomogram ICU (intensive care unit) elderly disease risk scoring model and device fusing medical history texts and establishing method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211300558.2A CN115527678A (en) 2022-10-24 2022-10-24 Nomogram ICU (intensive care unit) elderly disease risk scoring model and device fusing medical history texts and establishing method thereof

Publications (1)

Publication Number Publication Date
CN115527678A true CN115527678A (en) 2022-12-27

Family

ID=84702832

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211300558.2A Pending CN115527678A (en) 2022-10-24 2022-10-24 Nomogram ICU (intensive care unit) elderly disease risk scoring model and device fusing medical history texts and establishing method thereof

Country Status (1)

Country Link
CN (1) CN115527678A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116206755A (en) * 2023-05-06 2023-06-02 之江实验室 Disease detection and knowledge discovery device based on neural topic model
CN116959715A (en) * 2023-09-18 2023-10-27 之江实验室 Disease prognosis prediction system based on time sequence evolution process explanation
CN117133461A (en) * 2023-10-23 2023-11-28 北京肿瘤医院(北京大学肿瘤医院) Method and related equipment for postoperative short-term death risk assessment of aged lung cancer patient
CN117558460A (en) * 2024-01-11 2024-02-13 卓世未来(天津)科技有限公司 Chronic disease management method and system based on small sample learning and large language model

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116206755A (en) * 2023-05-06 2023-06-02 之江实验室 Disease detection and knowledge discovery device based on neural topic model
CN116206755B (en) * 2023-05-06 2023-08-22 之江实验室 Disease detection and knowledge discovery device based on neural topic model
CN116959715A (en) * 2023-09-18 2023-10-27 之江实验室 Disease prognosis prediction system based on time sequence evolution process explanation
CN116959715B (en) * 2023-09-18 2024-01-09 之江实验室 Disease prognosis prediction system based on time sequence evolution process explanation
CN117133461A (en) * 2023-10-23 2023-11-28 北京肿瘤医院(北京大学肿瘤医院) Method and related equipment for postoperative short-term death risk assessment of aged lung cancer patient
CN117133461B (en) * 2023-10-23 2024-01-30 北京肿瘤医院(北京大学肿瘤医院) Method and device for postoperative short-term death risk assessment of aged lung cancer patient
CN117558460A (en) * 2024-01-11 2024-02-13 卓世未来(天津)科技有限公司 Chronic disease management method and system based on small sample learning and large language model
CN117558460B (en) * 2024-01-11 2024-04-05 卓世未来(天津)科技有限公司 Chronic disease management method and system based on small sample learning and large language model

Similar Documents

Publication Publication Date Title
Desautels et al. Prediction of sepsis in the intensive care unit with minimal electronic health record data: a machine learning approach
CN115527678A (en) Nomogram ICU (intensive care unit) elderly disease risk scoring model and device fusing medical history texts and establishing method thereof
Kipnis et al. Development and validation of an electronic medical record-based alert score for detection of inpatient deterioration outside the ICU
Cramer et al. Predicting the incidence of pressure ulcers in the intensive care unit using machine learning
CN110827993A (en) Early death risk assessment model establishing method and device based on ensemble learning
JP5977898B1 (en) BEHAVIOR PREDICTION DEVICE, BEHAVIOR PREDICTION DEVICE CONTROL METHOD, AND BEHAVIOR PREDICTION DEVICE CONTROL PROGRAM
CN108604465B (en) Prediction of Acute Respiratory Disease Syndrome (ARDS) based on patient physiological responses
Baker et al. Continuous and automatic mortality risk prediction using vital signs in the intensive care unit: a hybrid neural network approach
CN115714022B (en) Neonatal jaundice health management system based on artificial intelligence
CN110046757B (en) Outpatient clinic volume prediction system and prediction method based on LightGBM algorithm
CN111063448A (en) Establishment method, storage system and active early warning system of blood transfusion adverse reaction database
Al-Mualemi et al. A deep learning-based sepsis estimation scheme
Ribeiro et al. A machine learning early warning system: multicenter validation in Brazilian hospitals
Zebin et al. A deep learning approach for length of stay prediction in clinical settings from medical records
KR20210112041A (en) Smart Healthcare Monitoring System and Method for Heart Disease Prediction Based On Ensemble Deep Learning and Feature Fusion
Mansouri et al. Predicting hospital length of stay of neonates admitted to the NICU using data mining techniques
CN114023440A (en) Model and device capable of explaining layered old people MODS early death risk assessment and establishing method thereof
Wu et al. Developing and evaluating a machine-learning-based algorithm to predict the incidence and severity of ARDS with continuous non-invasive parameters from ordinary monitors and ventilators
He et al. A multi-attention collaborative deep learning approach for blood pressure prediction
CN112967803A (en) Early mortality prediction method and system for emergency patients based on integrated model
Nakamura et al. Potential impact of initial clinical data on adjustment of pediatric readmission rates
Shahul et al. Machine Learning Based Analysis of Sepsis
Suneetha et al. Fine tuning bert based approach for cardiovascular disease diagnosis
Vieira et al. A decision support system for ICU readmissions prevention
Cesario et al. Early Identification of Patients at Risk of Sepsis in a Hospital Environment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination