US20240055122A1 - Methods, systems and related aspects for real-time prediction of adverse outcomes using machine learning and high-dimensional clinical data - Google Patents

Publication number
US20240055122A1
Authority
US
United States
Prior art keywords
time
outcome
feature
data values
monitored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US18/257,925
Other languages
English (en)
Inventor
Julie K. SHADE
Ashish DOSHI
Eric Sung
Allison HAYS
Natalia A. Trayanova
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Johns Hopkins University
Original Assignee
Johns Hopkins University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Johns Hopkins University
Priority to US18/257,925
Assigned to THE JOHNS HOPKINS UNIVERSITY: ASSIGNMENT OF ASSIGNORS INTEREST (SEE DOCUMENT FOR DETAILS). Assignors: SHADE, JULIE K., SUNG, ERIC, HAYS, Allison, DOSHI, Ashish, TRAYANOVA, NATALIA A.
Publication of US20240055122A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7235 Details of waveform analysis
    • A61B5/7264 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • A61B5/7267 Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems involving training the classification device
    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/72 Signal processing specially adapted for physiological signals or for diagnostic purposes
    • A61B5/7271 Specific aspects of physiological measurement analysis
    • A61B5/7275 Determining trends in physiological measurement data; Predicting development of a medical condition based on physiological measurements, e.g. determining a risk factor
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • SARS-CoV-2 severe acute respiratory syndrome coronavirus 2
  • CV cardiovascular
  • myocardial infarction
  • thromboembolism
  • heart failure cardiovascular manifestations
  • Clinically overt cardiac injury or cardiomyopathy is reported in 8 to 33% of hospitalized patients and is associated with up to 50% mortality, but imaging studies suggest the true incidence of cardiac involvement in all persons infected with SARS-CoV-2 could be as high as 60%.
  • Thromboembolic events are also frequently reported in severe COVID-19 and are associated with mortality; one study found that 70.1% of non-survivors and 0.6% of survivors met criteria for disseminated intravascular coagulation.
  • thromboembolic complications are more pronounced in acute COVID-19 infection than in other viral illnesses, and include pulmonary embolus and ischemic stroke, which can be fatal and are a significant cause of morbidity even as the infection resolves.
  • Machine learning (ML) techniques are ideal for discovering patterns in high-dimensional biomedical data, especially when little is known about the underlying biophysical processes. ML is thus well-positioned for applications in COVID-19 and indeed has been employed in screening, contact tracing, drug development, and outbreak forecasting. ML approaches have been developed for prognostic assessment of hospitalized patients with COVID-19, including models which predict in-hospital mortality, progression to severe disease, and outcomes related to respiratory function. An ML model was also proposed for prediction of thromboembolic events, but it required that all variables be present for all patients, did not provide dynamic risk updates, and was trained with data from only 76 patients. Thus far, prognostic ML models have relied on clinical data available at a single time-point, and have not accounted for the dynamic and difficult-to-predict course of the disease.
  • etiologic agent (e.g., viral (e.g., COVID-19 and the like), bacterial, fungal, etc.) infections.
  • the present disclosure relates, in certain aspects, to methods, systems, and computer readable media of use in generating models for prognosing adverse outcomes (e.g., adverse cardiovascular (CV) outcomes, such as complications of severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) infections, etc.) for a monitored subject infected with an etiologic agent.
  • the present disclosure provides a method of generating a model for prognosing a cardiovascular (CV) outcome for a monitored subject infected with an etiologic agent at least partially using a computer.
  • the method includes generating, by the computer, a training database that comprises a first set of data values of a first plurality of dynamic and static clinical parameters associated with at least a first plurality of monitored reference subjects infected with the etiologic agent.
  • the method also includes executing, by the computer, at least one variable selection algorithm to select at least a subset of the first plurality of dynamic and static clinical parameters to generate at least a first set of model parameters.
  • the method also includes executing, by the computer, at least one classification algorithm to generate the model for prognosing the CV outcome using at least a subset of the first set of model parameters.
  • the present disclosure provides a method of generating a model for prognosing a cardiovascular (CV) outcome for a monitored subject infected with an etiologic agent at least partially using a computer.
  • the method includes generating, by the computer, a first set of data values of a first plurality of dynamic clinical parameters associated with at least a first plurality of monitored reference subjects infected with the etiologic agent, wherein at least a subset of the first set of data values comprises one or more time-series data values.
  • the method also includes processing, by the computer, at least some of the first set of data values for at least some of the first plurality of monitored reference subjects infected with the etiologic agent using one or more sliding time windows that comprise one or more feature time windows associated with one or more outcome time windows, wherein the feature time windows comprise one or more time series features selected from the group consisting of: a short feature, a long feature, and an exponentially weighted decaying feature to produce at least a first set of processed dynamic features.
  • the method also includes combining, by the computer, at least some of the first set of processed dynamic features with a second set of data values of a first plurality of static clinical parameters associated with at least some of the first plurality of monitored reference subjects infected with the etiologic agent for one or more of the time windows to produce at least a first set of combined features.
  • the method also includes training, by the computer, at least one classifier using at least some of the first set of combined features, thereby generating the model for prognosing the CV outcome for the monitored subject infected with the etiologic agent.
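The pairing of feature time windows with outcome time windows can be illustrated with a minimal Python sketch. It assumes that a prediction time point is generated at every time-step of a monitored stay and that the outcome label for a time point is positive when an event falls inside the outcome window that follows it. The function name `window_labels`, its parameters, and the choice to keep generating time points after an event are hypothetical and are not taken from the disclosure.

```python
from typing import List, Tuple


def window_labels(
    stay_end_h: float,
    event_times_h: List[float],
    step_h: float,
    outcome_h: float,
) -> List[Tuple[float, int]]:
    """Return (t, label) pairs for t = step_h, 2*step_h, ... up to stay_end_h.

    label is 1 if any event time falls in the outcome window (t, t + outcome_h],
    otherwise 0. All times are hours since the start of monitoring.
    """
    rows = []
    n_steps = int(stay_end_h // step_h)
    for k in range(1, n_steps + 1):
        t = k * step_h
        # Outcome window is the interval immediately following time point t.
        label = int(any(t < e <= t + outcome_h for e in event_times_h))
        rows.append((t, label))
    return rows


# Hypothetical 5-hour stay with one event at hour 4.5,
# 1-hour time-step and 2-hour outcome window.
example = window_labels(stay_end_h=5, event_times_h=[4.5], step_h=1, outcome_h=2)
```

Each returned time point would then be joined with the dynamic features computed up to that time and with the subject's static parameters to form one training row.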
  • the present disclosure provides a method of generating a model for prognosing a cardiovascular (CV) outcome for a monitored subject infected with severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) at least partially using a computer.
  • the method includes generating, by the computer, a training database that comprises a first set of data values of a first plurality of dynamic and static clinical parameters associated with at least a first plurality of monitored reference subjects infected with the SARS-CoV-2.
  • the method also includes executing, by the computer, at least one variable selection algorithm to select at least a subset of the first plurality of dynamic and static clinical parameters to generate at least a first set of model parameters.
  • the method also includes executing, by the computer, at least one classification algorithm to generate the model for prognosing the CV outcome using at least a subset of the first set of model parameters.
  • the present disclosure provides a method of generating a model for prognosing a cardiovascular (CV) outcome for a monitored subject infected with severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) at least partially using a computer.
  • the method includes generating, by the computer, a first set of data values of a first plurality of dynamic clinical parameters associated with at least a first plurality of monitored reference subjects infected with the SARS-CoV-2, wherein at least a subset of the first set of data values comprises one or more time-series data values.
  • the method also includes processing, by the computer, at least some of the first set of data values for at least some of the first plurality of monitored reference subjects infected with the SARS-CoV-2 using one or more sliding time windows that comprise one or more feature time windows associated with one or more outcome time windows, wherein the feature time windows comprise one or more time series features selected from the group consisting of: a short feature, a long feature, and an exponentially weighted decaying feature to produce at least a first set of processed dynamic features.
  • the method also includes combining, by the computer, at least some of the first set of processed dynamic features with a second set of data values of a first plurality of static clinical parameters associated with at least some of the first plurality of monitored reference subjects infected with the SARS-CoV-2 for one or more of the time windows to produce at least a first set of combined features.
  • the method also includes training, by the computer, at least one classifier using at least some of the first set of combined features, thereby generating the model for prognosing the CV outcome for the monitored subject infected with the SARS-CoV-2.
  • the plurality of dynamic and static clinical parameters differs between at least two of the reference subjects.
  • one or more of the data values in the first set of data values is absent for one or more of the plurality of reference subjects.
  • the methods include adding one or more additional values to the first set of data values and/or one or more additional dynamic and static clinical parameters to the training database and updating the model for prognosing the CV outcome.
  • the methods include adding a second set of data values of a second plurality of dynamic and static clinical parameters associated with at least a second plurality of reference subjects infected with the SARS-CoV-2 to the training database and updating the model for prognosing the CV outcome.
  • the methods include updating the model for prognosing the CV outcome in substantially real-time.
  • the methods include training the model for prognosing the CV outcome using at least a stochastic gradient descent method.
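One way stochastic gradient descent supports updating a model as new data values arrive is sketched below in plain Python. This is an illustrative logistic-regression learner under stated assumptions, not the classifier used in the disclosure: the class name `SGDLogistic`, the learning rate, and the toy data are all hypothetical.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class SGDLogistic:
    """Logistic regression trained one sample at a time (hypothetical helper)."""

    def __init__(self, n_features: int, lr: float = 0.1) -> None:
        self.w: List[float] = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x: Sequence[float]) -> float:
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def partial_fit(self, X, y, epochs: int = 1, seed: int = 0) -> None:
        """Run SGD over (X, y) without resetting previously learned weights."""
        rng = random.Random(seed)
        order = list(range(len(X)))
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                # Gradient of the log-loss for one sample is (p - y) * x.
                err = self.predict_proba(X[i]) - y[i]
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, X[i])]
                self.b -= self.lr * err


# Toy data: one feature, positive class for larger values.
model = SGDLogistic(n_features=1)
model.partial_fit([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1], epochs=500)
# Later, newly collected rows update the same model in place.
model.partial_fit([[0.5], [2.5]], [0, 1], epochs=50)
```

Because `partial_fit` continues from the current weights, additional reference subjects or data values can be folded in without retraining from scratch.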
  • the first plurality of dynamic and static clinical parameters comprises one or more time-series variables. In certain embodiments, the first plurality of dynamic and static clinical parameters comprises more than about 100 different parameters.
  • the dynamic clinical parameters comprise one or more variables selected from the group consisting of: a dynamic clinical parameter described herein or otherwise known to a person having ordinary skill in the art.
  • the static clinical parameters comprise one or more variables selected from the group consisting of: a static clinical parameter described herein or otherwise known to a person having ordinary skill in the art.
  • the dynamic clinical parameters comprise one or more time series features selected from the group consisting of: a short feature, a long feature, and an exponentially weighted decaying feature.
  • the short feature comprises a selected period of time prior to a given time point.
  • the long feature comprises an entire period of time during which a given reference subject is monitored, wherein corresponding data values are un-weighted.
  • the exponentially weighted decaying feature comprises an entire period of time during which a given reference subject is monitored, wherein corresponding data values are weighted.
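The three time-series features described above can be sketched in Python as follows. The use of the mean as the summary statistic, the half-life parameterization of the exponential decay, and all function and parameter names are assumptions made for illustration; the text here does not specify them.

```python
from typing import List, Optional, Tuple

# One observation: (time in hours since monitoring began, measured value).
Observation = Tuple[float, float]


def short_feature(obs: List[Observation], t: float, window_h: float) -> Optional[float]:
    """Mean of values observed in the selected period (t - window_h, t]."""
    vals = [v for ts, v in obs if t - window_h < ts <= t]
    return sum(vals) / len(vals) if vals else None


def long_feature(obs: List[Observation], t: float) -> Optional[float]:
    """Un-weighted mean of every value observed up to time t."""
    vals = [v for ts, v in obs if ts <= t]
    return sum(vals) / len(vals) if vals else None


def decaying_feature(obs: List[Observation], t: float, half_life_h: float) -> Optional[float]:
    """Weighted mean of every value up to t; a value's weight halves
    for each half_life_h hours of age, so recent values dominate."""
    pairs = [(0.5 ** ((t - ts) / half_life_h), v) for ts, v in obs if ts <= t]
    total = sum(w for w, _ in pairs)
    return sum(w * v for w, v in pairs) / total if pairs else None


# Hypothetical series of three measurements at hours 0, 1 and 2.
series = [(0.0, 10.0), (1.0, 20.0), (2.0, 30.0)]
short_val = short_feature(series, t=2.0, window_h=1.0)
long_val = long_feature(series, t=2.0)
decay_val = decaying_feature(series, t=2.0, half_life_h=1.0)
```

Returning `None` when no observation is available keeps missing measurements explicit, which matters because data values may be absent for some reference subjects.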
  • At least two values in the first set of data values are obtained at different time points from a given monitored reference subject.
  • the methods include pre-processing one or more of the first set of data values in one or more sliding time windows.
  • one or more of the first set of data values of the first plurality of dynamic and static clinical parameters associated with the first plurality of monitored reference subjects infected with the SARS-CoV-2 are obtained when a given reference subject is monitored as an in-patient reference subject.
  • one or more of the first set of data values of the first plurality of dynamic and static clinical parameters associated with the first plurality of monitored reference subjects infected with the SARS-CoV-2 are obtained when a given reference subject is monitored as an out-patient reference subject.
  • the method includes using the model for prognosing the CV outcome to prognose at least one CV outcome of a monitored test subject infected with the SARS-CoV-2 at one or more time points to produce at least one prognosed test subject CV outcome.
  • the method includes determining at least one test risk score for the test subject at the one or more time points, wherein a given test risk score that exceeds a predetermined threshold risk score indicates a probability of the test subject experiencing the CV outcome in a given time window beyond the one or more time points.
  • the method includes determining the test risk score for the test subject in substantially real time.
  • the method includes repeatedly updating the test risk score for the test subject during at least one selected period of time.
  • the method includes integrating the test risk score into an electronic health record (EHR) for the test subject.
  • the method includes administering one or more therapies to the monitored test subject in view of the prognosed test subject CV outcome.
  • the CV outcome comprises one or more outcomes selected from the group consisting of: a CV outcome described herein or otherwise known to a person having ordinary skill in the art.
  • the variable selection algorithm is selected from the group consisting of: a supervised machine learning algorithm, an unsupervised machine learning algorithm, Incremental Association Markov Blanket algorithm, a Grow-Shrink algorithm, and a Semi-Interleaved Hiton-PC algorithm.
  • the classification algorithm is selected from the group consisting of: a random forest model, a classification and regression tree model, a linear discriminant analysis model, a decision tree learning model, a support vector machine, a nearest neighbor model, a logistic regression algorithm, an artificial neural network, a generalized linear model, and a Bayesian model.
  • the present disclosure provides a system, comprising at least one controller that comprises, or is capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: generating a training database that comprises a first set of data values of a first plurality of dynamic and static clinical parameters associated with at least a first plurality of monitored reference subjects infected with an etiologic agent; executing at least one variable selection algorithm to select at least a subset of the first plurality of dynamic and static clinical parameters to generate at least a first set of model parameters; and executing at least one classification algorithm to generate the model for prognosing a cardiovascular (CV) outcome using at least a subset of the first set of model parameters.
  • the present disclosure provides a system, comprising at least one controller that comprises, or is capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: generating a first set of data values of a first plurality of dynamic clinical parameters associated with at least a first plurality of monitored reference subjects infected with an etiologic agent, wherein at least a subset of the first set of data values comprises one or more time-series data values; processing at least some of the first set of data values for at least some of the first plurality of monitored reference subjects infected with the etiologic agent using one or more sliding time windows that comprise one or more feature time windows associated with one or more outcome time windows, wherein the feature time windows comprise one or more time series features selected from the group consisting of: a short feature, a long feature, and an exponentially weighted decaying feature to produce at least a first set of processed dynamic features; combining at least some of the first set of processed dynamic features with
  • the present disclosure provides a system, comprising at least one controller that comprises, or is capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: generating a training database that comprises a first set of data values of a first plurality of dynamic and static clinical parameters associated with at least a first plurality of monitored reference subjects infected with severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2); executing at least one variable selection algorithm to select at least a subset of the first plurality of dynamic and static clinical parameters to generate at least a first set of model parameters; and executing at least one classification algorithm to generate the model for prognosing a cardiovascular (CV) outcome using at least a subset of the first set of model parameters.
  • the present disclosure provides a system, comprising at least one controller that comprises, or is capable of accessing, computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: generating a first set of data values of a first plurality of dynamic clinical parameters associated with at least a first plurality of monitored reference subjects infected with severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), wherein at least a subset of the first set of data values comprises one or more time-series data values; processing at least some of the first set of data values for at least some of the first plurality of monitored reference subjects infected with the SARS-CoV-2 using one or more sliding time windows that comprise one or more feature time windows associated with one or more outcome time windows, wherein the feature time windows comprise one or more time series features selected from the group consisting of: a short feature, a long feature, and an exponentially weighted decaying feature to produce at least a first set of processed dynamic features; combining at least some
  • the present disclosure provides computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: generating a training database that comprises a first set of data values of a first plurality of dynamic and static clinical parameters associated with at least a first plurality of monitored reference subjects infected with an etiologic agent; executing at least one variable selection algorithm to select at least a subset of the first plurality of dynamic and static clinical parameters to generate at least a first set of model parameters; and executing at least one classification algorithm to generate the model for prognosing a cardiovascular (CV) outcome using at least a subset of the first set of model parameters.
  • the present disclosure provides computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: generating a first set of data values of a first plurality of dynamic clinical parameters associated with at least a first plurality of monitored reference subjects infected with an etiologic agent, wherein at least a subset of the first set of data values comprises one or more time-series data values; processing at least some of the first set of data values for at least some of the first plurality of monitored reference subjects infected with the etiologic agent using one or more sliding time windows that comprise one or more feature time windows associated with one or more outcome time windows, wherein the feature time windows comprise one or more time series features selected from the group consisting of: a short feature, a long feature, and an exponentially weighted decaying feature to produce at least a first set of processed dynamic features; combining at least some of the first set of processed dynamic features with a second set of data values of a first plurality of static clinical parameters associated with
  • the present disclosure provides computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: generating a training database that comprises a first set of data values of a first plurality of dynamic and static clinical parameters associated with at least a first plurality of monitored reference subjects infected with severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2); executing at least one variable selection algorithm to select at least a subset of the first plurality of dynamic and static clinical parameters to generate at least a first set of model parameters; and executing at least one classification algorithm to generate the model for prognosing a cardiovascular (CV) outcome using at least a subset of the first set of model parameters.
  • the present disclosure provides computer readable media comprising non-transitory computer executable instructions which, when executed by at least one electronic processor, perform at least: generating a first set of data values of a first plurality of dynamic clinical parameters associated with at least a first plurality of monitored reference subjects infected with severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), wherein at least a subset of the first set of data values comprises one or more time-series data values; processing at least some of the first set of data values for at least some of the first plurality of monitored reference subjects infected with the SARS-CoV-2 using one or more sliding time windows that comprise one or more feature time windows associated with one or more outcome time windows, wherein the feature time windows comprise one or more time series features selected from the group consisting of: a short feature, a long feature, and an exponentially weighted decaying feature to produce at least a first set of processed dynamic features; combining at least some of the first set of processed dynamic features with a second set of data values of a
  • FIG. 1 is a flow chart that schematically depicts exemplary method steps according to some aspects disclosed herein.
  • FIG. 2 is a flow chart that schematically depicts exemplary method steps according to some aspects disclosed herein.
  • FIG. 3 is a schematic diagram of an exemplary system suitable for use with certain aspects disclosed herein.
  • FIG. 4 Schematic Overview of COVID-HEART Study.
  • (A) Time-series clinical data used as input. Data shown here are representative and do not correspond with the risk score shown in (D).
  • (B) Dynamic features pre-processing with sliding time windows. Relative intensity levels within the three feature windows represent the weighting of values at each time; darker colors indicate higher weight.
  • (C) Combined features. For each time window, the processed dynamic features are combined with static features including demographics and comorbidities. Outcome labels are assigned per-window.
  • (D) Continuously-updating risk score. The COVID-HEART predictor provides a risk score (probability) for a given cardiovascular outcome in the K hours following a given time point. Shown is a sample risk score for a patient that experienced an event: green indicates a low risk score; yellow indicates a risk score within a pre-determined range of a threshold value; and red indicates that the patient is at high risk for an event in the following K hours.
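The green, yellow, and red display of the risk score can be illustrated with a short Python sketch. It assumes that "red" means the score is at or above the threshold and that "yellow" means the score lies within a fixed margin below it; the function name `risk_band`, the `margin` parameter, and the example numbers are hypothetical.

```python
def risk_band(score: float, threshold: float, margin: float) -> str:
    """Map a risk score (probability) to a display band.

    Assumed convention: at or above the threshold is high risk ("red"),
    within `margin` below the threshold is "yellow", everything else "green".
    """
    if score >= threshold:
        return "red"
    if score >= threshold - margin:
        return "yellow"
    return "green"


# Hypothetical hourly risk scores for one patient with threshold 0.5.
hourly_scores = [0.10, 0.45, 0.90]
bands = [risk_band(s, threshold=0.5, margin=0.1) for s in hourly_scores]
```

Re-evaluating the band each time a new score is produced gives the continuously updating display described for panel (D).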
  • FIG. 5 Participant flow diagram for retrospective study of COVID-HEART. Inclusion and exclusion criteria were applied separately for prediction of each outcome. The data were then temporally divided into development and test sets as shown.
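A temporal division of this kind can be sketched in Python as below. The sketch assumes that each patient is assigned to the development or test set by comparing an admission date with a single cutoff date; the patient identifiers, dates, cutoff, and the function name `temporal_split` are hypothetical and do not come from the study.

```python
from datetime import date
from typing import Dict, List, Tuple


def temporal_split(
    admissions: Dict[str, date], cutoff: date
) -> Tuple[List[str], List[str]]:
    """Split patient IDs by admission date.

    Patients admitted before `cutoff` go to the development set and
    patients admitted on or after it go to the test set, so the test
    set never contains data that precedes the development data.
    """
    dev = sorted(pid for pid, d in admissions.items() if d < cutoff)
    test = sorted(pid for pid, d in admissions.items() if d >= cutoff)
    return dev, test


# Hypothetical admissions and cutoff date.
admissions = {
    "p1": date(2020, 4, 1),
    "p2": date(2020, 9, 1),
    "p3": date(2020, 6, 15),
}
dev_ids, test_ids = temporal_split(admissions, cutoff=date(2020, 7, 1))
```

Splitting by time rather than at random mimics deployment, where a model trained on earlier patients is applied to later ones.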
  • FIG. 6 The COVID-HEART predictor can accurately predict the risk of cardiac arrest and thromboembolic events in real time.
  • (A) COVID-HEART 5-fold cross-validation performance metrics for the two CV outcomes: cardiac arrest and thromboembolic events. Values shown are the mean [95% confidence interval] for each metric over 20 full iterations of cross-validation.
  • Cardiac arrest predictions presented here are for an outcome window of 2 hours, short-time feature window of 2 hours, and time-step of 1 hour.
  • Thromboembolic event predictions shown here are for an outcome window of 24 hours, short-time feature window of 24 hours, and time-step of 24 hours. The best-performing classifier for prediction of each CV outcome is bolded.
  • (B) COVID-HEART test performance metrics for the temporally divided test set. Characteristics of this set are provided in Supplementary Table 4.
  • (C) COVID-HEART test performance metrics over 20 iterations of repeated temporally divided testing.
  • (D) Risk of cardiac arrest prediction.
  • (E) Risk of thromboembolic event prediction.
  • FIG. 7 Examples of “true positive” predictions for two different patients, one from the cardiac arrest test set and one from the thromboembolic event test set.
  • (A) Clinical time-series inputs (top 7 rows) from which the features with the largest coefficients were derived for prediction of cardiac arrest, and time-series risk score (bottom row) for a patient who experienced cardiac arrest during their hospitalization, and for whom the classifier's prediction was correct prior to the cardiac arrest. The most important features derived from these inputs are listed in Table 2.
  • a new prediction is generated every hour.
  • the x-axis refers to the days of admission relative to midnight on the first full day of admission.
  • the binary risk threshold is 0.0008; the red bar indicates the hour during which the patient experienced cardiac arrest.
  • Units for each predictor are as follows: WBC (cells/mm 3 ), Pulse O 2 saturation (%), Pulse (beats/minutes), Chloride (mEq/L), CRP (mg/L), DBP (mmHg), SBP (mmHg).
  • B Clinical time-series inputs (top 4 rows) from which the selected features were derived for prediction of thromboembolic events, and time-series risk score (bottom row) for a patient who experienced a thromboembolic event during their hospitalization.
  • the most important features derived from these inputs are listed in Table 2.
  • a new prediction is generated every 24 hours.
  • the x-axis refers to the days of admission relative to midnight on the first full day of admission.
  • Dashed line (bottom row) indicates binary risk threshold, determined by the development data; red bar indicates the day on which the patient experienced an imaging-confirmed thromboembolic event.
  • Units for each predictor are as follows: magnesium (mEq/L), D-dimer (nmol/L), WBC (cells/mm 3 ), IG Count (%).
  • WBC white blood cell count
  • CRP c-reactive protein
  • DBP diastolic blood pressure
  • SBP systolic blood pressure
  • IG immature granulocyte
  • FIG. 8 COVID-HEART cross-validation and testing results for outcome windows of different duration in predicting each CV outcome using the optimal classifier.
  • Short feature window is 2 hours for prediction of cardiac arrest and 24 hours for prediction of thromboembolic events. Note comparable validation and test results, which indicates strong generalizability. Results shown are for the full temporally-divided development and validation sets (Supplementary Table 4).
  • FIG. 9 Two examples of “true negative” predictions for two patients, one from the cardiac arrest test set and one from the thromboembolic event test set, using the COVID-HEART predictor.
  • A Clinical time-series inputs (top 7 rows) from which the features with the largest coefficients were derived for prediction of cardiac arrest, and time-series risk score (bottom row) for a patient who experienced cardiac arrest during their hospitalization, and for whom the classifier's prediction was correct prior to the cardiac arrest. The most important features derived from these inputs are listed in Table 2. A new prediction is generated every hour. The risk score is below 0.08% for the entire duration of the patient's admission. The date refers to the days of admission relative to midnight on the first full day of admission.
  • Dashed line (bottom row) indicates binary risk threshold, determined by the development data. Units for each predictor are as follows: WBC (cells/mm 3 ), Pulse O 2 saturation (%), Pulse (beats/minutes), Chloride (mEq/L), CRP (mg/L), DBP (mmHg), SBP (mmHg).
  • the risk score is low for the entire duration of the patient's admission.
  • the x-axis refers to the days of admission relative to midnight on the first full day of admission. Note that for all dynamic clinical data, values are assumed constant until a new measurement is made.
  • the binary risk threshold is 0.0024 and is not visible due to y-axis limits.
  • Units for each predictor are as follows: magnesium (mEq/L), D-dimer (nmol/L), WBC (cells/mm 3 ), IG Count (%).
  • WBC white blood cell count
  • CRP c-reactive protein
  • DBP diastolic blood pressure
  • SBP systolic blood pressure
  • FIG. 10 Investigation of incorrect predictions by the COVID-HEART predictor for two patients, one from the cardiac arrest test set and one from the thromboembolic event test set.
  • A Clinical time-series inputs (top 7 rows) from which the features with the largest coefficients were derived for prediction of cardiac arrest, and time-series risk score (bottom row) for a patient who experienced cardiac arrest during their hospitalization, and for whom the classifier's prediction was correct prior to the cardiac arrest. The most important features derived from these inputs are listed in Table 2. A new prediction is generated every hour. The risk score fluctuates throughout the patient's hospitalization, crossing above the binary risk threshold several times.
  • Dashed line (bottom row) indicates binary risk threshold, determined by the development data; red bar indicates the hour during which the patient experienced cardiac arrest. Units for each predictor are as follows: WBC (cells/mm 3 ), Pulse O 2 saturation (%), Pulse (beats/minutes), Chloride (mEq/L), CRP (mg/L), DBP (mmHg), SBP (mmHg).
  • B Clinical time-series inputs (top 4 rows) from which the selected features were derived for prediction of thromboembolic events, and time-series risk score (bottom row) for a patient who experienced a thromboembolic event during their hospitalization. The most important features derived from these inputs are listed in Table 2. A new prediction is generated every 24 hours.
  • the risk score peaks midway through patient's hospitalization, then hovers around the binary risk threshold until the event.
  • the x-axis refers to the days of admission relative to midnight on the first full day of admission. Note that for all dynamic clinical data, values are assumed constant until a new measurement is made. Dashed line (bottom row) indicates binary risk threshold, determined by the development data; red bar indicates the day on which the patient experienced an imaging-confirmed thromboembolic event.
  • Units for each predictor are as follows: magnesium (mEq/L), D-dimer (nmol/L), WBC (cells/mm 3 ), IG Count (%).
  • WBC white blood cell count
  • CRP c-reactive protein
  • DBP diastolic blood pressure
  • SBP systolic blood pressure
  • FIG. 11 More time windows are predicted positive for patients that eventually experience each outcome than patients who do not. Proportion of time windows predicted positive (risk probability greater than the binary risk threshold determined by the development data) for patients that do (solid line) and do not (dashed line) experience cardiac arrest (top) and thromboembolic events (bottom) in 5-fold patient-based cross-validation and in the separate test set. Results shown are for the full development and validation sets (Supplementary Table 4).
  • “about” or “approximately” or “substantially” as applied to one or more values or elements of interest refers to a value or element that is similar to a stated reference value or element.
  • the term “about” or “approximately” or “substantially” refers to a range of values or elements that falls within 25%, 20%, 19%, 18%, 17%, 16%, 15%, 14%, 13%, 12%, 11%, 10%, 9%, 8%, 7%, 6%, 5%, 4%, 3%, 2%, 1%, or less in either direction (greater than or less than) of the stated reference value or element unless otherwise stated or otherwise evident from the context (except where such number would exceed 100% of a possible value or element).
  • machine learning algorithm generally refers to an algorithm, executed by computer, that automates analytical model building, e.g., for clustering, classification or pattern recognition.
  • Machine learning algorithms may be supervised or unsupervised. Learning algorithms include, for example, artificial neural networks (e.g., back propagation networks), discriminant analyses (e.g., Bayesian classifier or Fisher's analysis), support vector machines, decision trees (e.g., recursive partitioning processes such as CART-classification and regression trees, or random forests), linear classifiers (e.g., multiple linear regression (MLR), partial least squares (PLS) regression, and principal components regression), hierarchical clustering, and cluster analysis.
  • a dataset on which a machine learning algorithm learns can be referred to as “training data.”
  • a model produced using a machine learning algorithm is generally referred to herein as a “machine learning model.”
  • subject refers to an animal, such as a mammalian species (e.g., human) or avian (e.g., bird) species. More specifically, a subject can be a vertebrate, e.g., a mammal such as a mouse, a primate, a simian or a human. Animals include farm animals (e.g., production cattle, dairy cattle, poultry, horses, pigs, and the like), sport animals, and companion animals (e.g., pets or support animals).
  • a subject can be a healthy individual, an individual that has or is suspected of having a disease or pathology or a predisposition to the disease or pathology, or an individual that is in need of therapy or suspected of needing therapy.
  • the terms “individual” or “patient” are intended to be interchangeable with “subject.”
  • a “reference subject” refers to a subject known to have or lack specific properties (e.g., known ocular or other pathology and/or the like).
  • Cardiovascular (CV) manifestations of COVID-19 are of significant clinical concern.
  • Current risk prediction for CV complications in COVID-19 is limited and existing approaches fail to account for the dynamic course of the disease.
  • The COVID-HEART predictor is a novel continuously-updating risk prediction technology to forecast CV complications in hospitalized patients with COVID-19.
  • the risk predictor is trained and tested with retrospective registry data from 2178 patients to predict two outcomes: cardiac arrest and imaging-confirmed thromboembolic events.
  • the COVID-HEART predictor provides tangible clinical decision support in triaging patients and optimizing resource utilization, with its clinical utility extending well beyond COVID-19.
  • FIG. 1 is a flow chart that schematically depicts exemplary method steps of generating a model for prognosing a cardiovascular (CV) outcome for a monitored subject infected with an etiologic agent (e.g., a virus (e.g., SARS-CoV-2), a bacterium, a fungus, or the like).
  • method 100 includes generating a training database that comprises a first set of data values of a first plurality of dynamic and static clinical parameters associated with at least a first plurality of monitored reference subjects infected with the etiologic agent (step 102 ).
  • Method 100 also includes executing at least one variable selection algorithm to select at least a subset of the first plurality of dynamic and static clinical parameters to generate at least a first set of model parameters (step 104 ).
  • method 100 also includes executing at least one classification algorithm to generate the model for prognosing the CV outcome using at least a subset of the first set of model parameters (step 106 ).
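  • The three steps of method 100 (training database, variable selection, classification) can be illustrated with a minimal sketch. The disclosure does not fix the algorithms at this point, so the correlation-based ranking and the plain logistic classifier below are stand-in assumptions, and the parameter names (`low_spo2`, `age`, `noise`) and toy values are hypothetical.

```python
import math
from statistics import mean, pstdev

def select_variables(rows, labels, names, k):
    """Step 104 (assumed technique): keep the k parameters whose values
    have the largest absolute Pearson correlation with the outcome label."""
    def corr(xs, ys):
        sx, sy = pstdev(xs), pstdev(ys)
        if sx == 0 or sy == 0:
            return 0.0
        mx, my = mean(xs), mean(ys)
        cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
        return cov / (sx * sy)
    scores = {n: abs(corr([r[n] for r in rows], labels)) for n in names}
    return sorted(names, key=lambda n: scores[n], reverse=True)[:k]

def train_logistic(rows, labels, names, lr=0.1, epochs=500):
    """Step 106 (assumed technique): logistic classifier fitted by
    stochastic gradient descent on the selected model parameters."""
    w = {n: 0.0 for n in names}
    b = 0.0
    for _ in range(epochs):
        for r, y in zip(rows, labels):
            z = b + sum(w[n] * r[n] for n in names)
            g = 1.0 / (1.0 + math.exp(-z)) - y  # gradient of the log loss
            b -= lr * g
            for n in names:
                w[n] -= lr * g * r[n]
    return w, b

def predict_risk(model, row):
    """Risk probability for one row of (already normalized) parameters."""
    w, b = model
    z = b + sum(w[n] * row[n] for n in w)
    return 1.0 / (1.0 + math.exp(-z))

# Step 102: a toy "training database" (hypothetical, normalized values).
ROWS = [
    {"low_spo2": 1.0, "age": 0.8, "noise": 0.1},
    {"low_spo2": 1.0, "age": 0.6, "noise": 0.9},
    {"low_spo2": 0.0, "age": 0.7, "noise": 0.2},
    {"low_spo2": 0.0, "age": 0.3, "noise": 0.8},
    {"low_spo2": 1.0, "age": 0.9, "noise": 0.5},
    {"low_spo2": 0.0, "age": 0.2, "noise": 0.4},
]
LABELS = [1, 1, 0, 0, 1, 0]

SELECTED = select_variables(ROWS, LABELS, ["low_spo2", "age", "noise"], k=2)
MODEL = train_logistic(ROWS, LABELS, SELECTED)
```

  • In this sketch the uninformative `noise` parameter is dropped by the selection step before the classifier is trained, which mirrors the order of steps 104 and 106.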
  • FIG. 2 is a flow chart that schematically depicts some exemplary method steps of generating a model for prognosing a cardiovascular (CV) outcome for a monitored subject infected with an etiologic agent (e.g., a virus (e.g., SARS-CoV-2), a bacterium, a fungus, or the like).
  • method 200 includes generating a first set of data values of a first plurality of dynamic clinical parameters associated with at least a first plurality of monitored reference subjects infected with the etiologic agent in which at least a subset of the first set of data values comprises one or more time-series data values (step 202 ).
  • Method 200 also includes processing at least some of the first set of data values for at least some of the first plurality of monitored reference subjects infected with the etiologic agent using one or more sliding time windows that comprise one or more feature time windows associated with one or more outcome time windows in which the feature time windows comprise one or more time series features selected from the group consisting of: a short feature, a long feature, and an exponentially weighted decaying feature to produce at least a first set of processed dynamic features (step 204 ).
  • Method 200 also includes combining at least some of the first set of processed dynamic features with a second set of data values of a first plurality of static clinical parameters associated with at least some of the first plurality of monitored reference subjects infected with the etiologic agent for one or more of the time windows to produce at least a first set of combined features (step 206 ).
  • method 200 also includes training at least one classifier using at least some of the first set of combined features, thereby generating the model for prognosing the CV outcome for the monitored subject infected with the etiologic agent (step 208 ).
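  • Steps 204 and 206 of method 200 can be sketched as follows. The disclosure names three kinds of time-series features (short, long, exponentially weighted decaying) but this passage does not specify the summary statistic, so the window mean and the half-life weighting used here are assumptions, as are all function and parameter names.

```python
def window_features(times, values, t, short_h, long_h, half_life_h):
    """Features at prediction time t (hours) from measurements at or before t.

    short / long: mean of the values in the last short_h / long_h hours.
    ewd: exponentially weighted decaying mean, where a measurement that is
         half_life_h hours old gets half the weight of a current one.
    """
    past = [(ti, vi) for ti, vi in zip(times, values) if ti <= t]
    if not past:
        return {"short": None, "long": None, "ewd": None}

    def window_mean(width):
        vals = [v for ti, v in past if ti > t - width]
        return sum(vals) / len(vals) if vals else None

    weights = [0.5 ** ((t - ti) / half_life_h) for ti, _ in past]
    ewd = sum(w * v for w, (_, v) in zip(weights, past)) / sum(weights)
    return {"short": window_mean(short_h), "long": window_mean(long_h), "ewd": ewd}

def sliding_features(times, values, start, stop, step_h, short_h, long_h, half_life_h):
    """Slide the feature windows forward in steps of step_h hours (step 204)."""
    return [
        (t, window_features(times, values, t, short_h, long_h, half_life_h))
        for t in range(start, stop + 1, step_h)
    ]

def combine_features(static, dynamic):
    """Attach static parameters to one window's dynamic features (step 206)."""
    combined = dict(static)
    combined.update(dynamic)
    return combined
```

  • With hourly pulse readings of 90, 92, 94, 96 and 98 at hours 0 to 4, a 2-hour short window at hour 4 averages only the last two readings, while a 24-hour long window averages all five; the decaying feature falls between the long mean and the latest reading.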
  • the present disclosure also provides various deep learning systems and computer program products or machine readable media.
  • the methods described herein are optionally performed or facilitated at least in part using systems, distributed computing hardware and applications (e.g., cloud computing services), electronic communication networks, communication interfaces, computer program products, machine readable media, electronic storage media, software (e.g., machine-executable code or logic instructions) and/or the like.
  • FIG. 3 provides a schematic diagram of an exemplary system suitable for use with implementing at least aspects of the methods disclosed in this application.
  • system 300 includes at least one controller or computer, e.g., server 302 (e.g., a search engine server), which includes processor 304 and memory, storage device, or memory component 306 , and one or more other communication devices 314 , 316 , (e.g., client-side computer terminals, telephones, tablets, laptops, other mobile devices, etc. (e.g., for receiving captured images and/or videos for further analysis, etc.)) positioned remote from camera device 318 , and in communication with the remote server 302 , through electronic communication network 312 , such as the Internet or other internetwork.
  • Communication devices 314 , 316 typically include an electronic display (e.g., an internet enabled computer or the like) in communication with, e.g., server 302 computer over network 312 in which the electronic display comprises a user interface (e.g., a graphical user interface (GUI), a web-based user interface, and/or the like) for displaying results upon implementing the methods described herein.
  • communication networks also encompass the physical transfer of data from one location to another, for example, using a hard drive, thumb drive, or other data storage mechanism.
  • System 300 also includes program product 308 (e.g., related to an ocular pathology model) stored on a computer or machine readable medium, such as, for example, one or more of various types of memory, such as memory 306 of server 302 , that is readable by the server 302 , to facilitate, for example, a guided search application or other executable by one or more other communication devices, such as 314 (schematically shown as a desktop or personal computer).
  • system 300 optionally also includes at least one database server, such as, for example, server 310 associated with an online website having data stored thereon (e.g., entries corresponding to more reference images and/or videos, indexed therapies, etc.) searchable either directly or through search engine server 302 .
  • System 300 optionally also includes one or more other servers positioned remotely from server 302 , each of which are optionally associated with one or more database servers 310 located remotely or located local to each of the other servers.
  • the other servers can beneficially provide service to geographically remote users and enhance geographically distributed operations.
  • memory 306 of the server 302 optionally includes volatile and/or nonvolatile memory including, for example, RAM, ROM, and magnetic or optical disks, among others. It is also understood by those of ordinary skill in the art that although illustrated as a single server, the illustrated configuration of server 302 is given only by way of example and that other types of servers or computers configured according to various other methodologies or architectures can also be used.
  • Server 302 shown schematically in FIG. 3 represents a server or server cluster or server farm and is not limited to any individual physical server. The server site may be deployed as a server farm or server cluster managed by a server hosting provider. The number of servers and their architecture and configuration may be increased based on usage, demand and capacity requirements for the system 300 .
  • network 312 can include an internet, intranet, a telecommunication network, an extranet, or world wide web of a plurality of computers/servers in communication with one or more other computers through a communication network, and/or portions of a local or other area network.
  • exemplary program product or machine readable medium 308 is optionally in the form of microcode, programs, cloud computing format, routines, and/or symbolic languages that provide one or more sets of ordered operations that control the functioning of the hardware and direct its operation.
  • Program product 308 according to an exemplary aspect, also need not reside in its entirety in volatile memory, but can be selectively loaded, as necessary, according to various methodologies as known and understood by those of ordinary skill in the art.
  • computer-readable medium refers to any medium that participates in providing instructions to a processor for execution.
  • computer-readable medium encompasses distribution media, cloud computing formats, intermediate storage media, execution memory of a computer, and any other medium or device capable of storing program product 308 implementing the functionality or processes of various aspects of the present disclosure, for example, for reading by a computer.
  • a “computer-readable medium” or “machine-readable medium” may take many forms, including but not limited to, non-volatile media, volatile media, and transmission media.
  • Non-volatile media includes, for example, optical or magnetic disks.
  • Volatile media includes dynamic memory, such as the main memory of a given system.
  • Transmission media includes coaxial cables, copper wire and fiber optics, including the wires that comprise a bus. Transmission media can also take the form of acoustic or light waves, such as those generated during radio wave and infrared data communications, among others.
  • Exemplary forms of computer-readable media include a floppy disk, a flexible disk, hard disk, magnetic tape, a flash drive, or any other magnetic medium, a CD-ROM, any other optical medium, punch cards, paper tape, any other physical medium with patterns of holes, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave, or any other medium from which a computer can read.
  • Program product 308 is optionally copied from the computer-readable medium to a hard disk or a similar intermediate storage medium.
  • When program product 308, or portions thereof, is to be run, it is optionally loaded from its distribution medium, its intermediate storage medium, or the like into the execution memory of one or more computers, configuring the computer(s) to act in accordance with the functionality or method of various aspects. All such operations are well known to those of ordinary skill in the art of, for example, computer systems.
  • this application provides systems that include one or more processors, and one or more memory components in communication with the processor.
  • the memory component typically includes one or more instructions that, when executed, cause the processor to provide information that causes at least one captured image, EMR, and/or the like to be displayed (e.g., via communication devices 314 , 316 or the like) and/or receive information from other system components and/or from a system user (e.g., via communication devices 314 , 316 , or the like).
  • program product 308 includes non-transitory computer-executable instructions which, when executed by electronic processor 304 perform at least: generating a training database that comprises a first set of data values of a first plurality of dynamic and static clinical parameters associated with at least a first plurality of monitored reference subjects infected with the etiologic agent, executing at least one variable selection algorithm to select at least a subset of the first plurality of dynamic and static clinical parameters to generate at least a first set of model parameters, and executing at least one classification algorithm to generate the model for prognosing the CV outcome using at least a subset of the first set of model parameters.
  • Other exemplary executable instructions that are optionally performed are described further herein.
  • COVID-HEART can accurately predict risk in real time for new patients in the face of rapidly changing clinical treatment guidelines.
  • the predictor is next tested with leave-hospital-out nested cross-validation to assess its performance when training and testing is done with data from different populations.
  • the COVID-HEART predictor was developed and validated in a retrospective cohort study approved by the Johns Hopkins University Institutional Review Board on May 21, 2020 under protocol number IRB00249548: Prediction of Cardiac Dysfunction in COVID-19 Patients Using Machine Learning.
  • Patients were excluded from thromboembolic event prediction if they experienced an imaging-confirmed thromboembolic event or were suspected of experiencing a thromboembolic event immediately prior to admission, which was diagnosed on admission or within 24 hours of admission.
  • For cardiac arrest prediction, patients were excluded if they experienced cardiac arrest with return of spontaneous circulation immediately prior to admission or if the arrest was precipitated by an event not related to disease severity.
  • FIG. 4 presents a schematic of the COVID-HEART continuously-updating risk predictor.
  • the TRIPOD guidelines for development, validation, and presentation of a multivariable prediction model were followed here (Supplementary Table 1).
  • the model uses a selection of features extracted from 127 different clinical data inputs (shown schematically in FIG. 4 A and presented in detail in Supplementary Table 2), some of which are associated with CV complications in COVID-19 and in other severe respiratory illnesses.
  • variables that were directly impacted by a physician's assessment of the patient's condition such as the fraction of inspired oxygen set on a mechanical ventilator, are excluded. Definition of these predictors, how they were measured, and pre-processing steps undertaken prior to dynamic feature extraction are provided in Supplementary Methods.
  • the COVID-HEART predictor was trained to estimate the probability that a patient will experience a particular CV event within a set number of hours (outcome window) after any point during the patient's hospitalization. It used static variables (demographics and comorbidities) and dynamic clinical data collected during time periods of markedly different duration prior to the time point of prediction. Dynamic features were calculated from the processed time-series clinical data inputs as illustrated in FIG. 4 B . Each time-point was assigned a binary outcome label indicating whether the patient experienced the outcome of interest in an “outcome window” following the time-point.
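  • The labeling rule described above, one binary label per time-point indicating whether the outcome falls in the following outcome window, can be sketched as below. Treating the window as the half-open interval (t, t + window] is an assumption; the text does not state how the boundaries are handled, and the function name is hypothetical.

```python
def label_time_points(time_points, event_time, outcome_window_h):
    """Return one binary label per time-point (all times in hours).

    A time-point t is labeled 1 when the event occurs after t and no later
    than t + outcome_window_h; otherwise 0. event_time is None for a
    patient who never experienced the outcome.
    """
    labels = []
    for t in time_points:
        in_window = (
            event_time is not None
            and t < event_time <= t + outcome_window_h
        )
        labels.append(1 if in_window else 0)
    return labels
```

  • For example, with hourly time-points, a 2-hour outcome window, and a cardiac arrest at hour 4.5, only the time-points at hours 3 and 4 receive a positive label.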
  • FIG. 4 C schematically shows an array of processed data for a patient who experienced an adverse CV event.
  • the outcome window for prediction of thromboembolic events was 24 hours as this was the minimum interval in which outcomes could be identified. An outcome window of 2 hours was selected for prediction of cardiac arrest based on practical clinical considerations: this would provide healthcare personnel sufficient time for intervention if indicated. Multiple classifier configurations were investigated for prediction of each outcome, detailed in Supplementary Methods.
  • Eligible patients were divided into development and test sets according to the date of their first admission.
  • the cutoff date was selected such that the development set for each outcome included 70% of eligible patients.
  • Patients in the development set for prediction of cardiac arrest were admitted between Mar. 1, 2020 and Nov. 6, 2020; patients in the test set were admitted between Nov. 7, 2020 and Jan. 8, 2021.
  • the cutoff date for prediction of thromboembolic events was Nov. 5, 2020.
  • Data collection ended on the respective cutoff dates for each set.
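  • The temporal division into development and test sets can be sketched as follows: patients are ordered by the date of their first admission and the cutoff date is chosen so that about 70% of them fall in the development set. The function name and the tie-handling rule (patients admitted on the cutoff date go to the development set) are assumptions.

```python
from datetime import date, timedelta

def temporal_split(first_admission, dev_fraction=0.7):
    """Split patients by first-admission date.

    first_admission maps a patient identifier to a datetime.date.
    Returns (cutoff_date, development_patients, test_patients).
    """
    ordered = sorted(first_admission.items(), key=lambda kv: kv[1])
    n_dev = max(1, round(dev_fraction * len(ordered)))
    cutoff = ordered[n_dev - 1][1]
    # Assumption: everyone admitted on or before the cutoff is development data.
    dev = [p for p, d in ordered if d <= cutoff]
    test = [p for p, d in ordered if d > cutoff]
    return cutoff, dev, test

# Hypothetical cohort: ten patients admitted 30 days apart.
EXAMPLE = {f"p{i}": date(2020, 3, 1) + timedelta(days=30 * i) for i in range(10)}
```

  • Because the split is by date rather than at random, every patient in the test set was admitted after every patient in the development set, which is what lets the test set stand in for future patients.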
  • Classifier development began with five-fold stratified patient-based cross-validation using the development set. We repeated this 20 times for each of the classifier configurations, each time progressively reducing the number of patients used for training and optimization from the full development set by moving the end cutoff date back 1 week (e.g., November 6 th , October 30 th , October 23 rd ). At no point did the reduced training set include any patients from the separate test set.
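  • A minimal sketch of five-fold stratified patient-based cross-validation is given below. "Patient-based" is taken to mean that all time windows from one patient stay in the same fold, and "stratified" that patients who experienced the outcome are spread evenly across folds; the round-robin assignment and the names used are assumptions rather than the disclosed implementation.

```python
import random

def patient_folds(patient_outcome, k=5, seed=0):
    """Assign patients to k folds, balancing outcome-positive patients.

    patient_outcome maps a patient identifier to 1 (experienced the
    outcome at any time) or 0. Folds hold patient identifiers, so every
    time window of a patient lands in that patient's fold.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    i = 0
    for outcome in (1, 0):
        group = sorted(p for p, y in patient_outcome.items() if y == outcome)
        rng.shuffle(group)
        for p in group:
            folds[i % k].append(p)  # round-robin keeps the strata balanced
            i += 1
    return folds

def cv_splits(patient_outcome, k=5, seed=0):
    """Yield (training_patients, held_out_patients) for each fold."""
    for held_out in patient_folds(patient_outcome, k, seed):
        held = set(held_out)
        train = [p for p in patient_outcome if p not in held]
        yield train, held_out
```

  • Splitting by patient rather than by time window avoids placing highly correlated windows from the same hospitalization on both sides of a fold boundary.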
  • Hyperparameters were optimized through cross-validation with a Bayesian hyperparameter search strategy and the optimal classifier configurations were selected based on the aggregated cross-validation area under the receiver operating characteristic curve (AUROC).
  • the optimal classifier configuration was trained on the full development set and used to predict the time-series risk of each event for each patient in the respective temporally divided test set.
  • a binary prediction was also made at each time point using the optimal threshold determined by the development data during training.
  • Model performance was assessed by the following metrics: accuracy, balanced accuracy, sensitivity, specificity, and AUROC.
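  • The listed metrics can be computed from the per-time-point risk scores as sketched below. A time-point is counted as predicted positive when its risk exceeds the binary threshold; the AUROC is computed here with the pairwise (rank) definition, which is one standard way to obtain it and not necessarily the one used in the study. Both outcome classes are assumed to be present.

```python
def binary_metrics(risks, labels, threshold):
    """Accuracy, balanced accuracy, sensitivity and specificity."""
    preds = [1 if r > threshold else 0 for r in risks]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    tn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 0)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / len(labels),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
    }

def auroc(risks, labels):
    """Probability that a random positive scores above a random negative
    (ties count as one half)."""
    pos = [r for r, y in zip(risks, labels) if y == 1]
    neg = [r for r, y in zip(risks, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

  • Balanced accuracy is reported alongside accuracy because positive time windows are rare: a classifier that always predicts "no event" would score a high accuracy but a balanced accuracy of only 0.5.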
  • Leave-hospital-out validation was performed by removing all patients admitted to one of the five hospitals in the study, repeating the model training and optimization process using data from patients admitted to the remaining four hospitals, and testing the optimized model with data from patients admitted to the left-out hospital. If a patient was transferred between hospitals or had multiple admissions to different hospitals, their admission to the left-out hospital was used in testing and the rest of their data were removed from the training data set.
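  • The leave-hospital-out procedure, including the rule for patients seen at more than one hospital, can be sketched as follows. Admissions are represented as hypothetical (patient, hospital) pairs; model training and evaluation are left out.

```python
def leave_hospital_out(admissions):
    """Yield (held_out_hospital, training_admissions, test_admissions).

    admissions is a list of (patient_id, hospital_id) pairs. For each
    held-out hospital, its admissions form the test data, and every other
    admission of a patient who was ever admitted there is removed from
    the training data.
    """
    hospitals = sorted({h for _, h in admissions})
    for held in hospitals:
        test_patients = {p for p, h in admissions if h == held}
        test = [(p, h) for p, h in admissions if h == held]
        train = [(p, h) for p, h in admissions if p not in test_patients]
        yield held, train, test
```

  • Dropping the remaining admissions of transferred or readmitted patients keeps any one patient from contributing to both training and testing within a given iteration.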
  • FIG. 5 shows the flow of patients through the study.
  • Table 1 and Supplementary Table 4 provide demographic and clinical comparisons between patients who did and did not experience each outcome, and between the development and test sets.
  • 18 occurred in the intensive care unit (ICU), three occurred in a non-ICU inpatient unit, four occurred in intermediate care/stepdown, and one occurred in long-term inpatient recovery care.
  • COVID-HEART performance for the two outcomes, in-hospital cardiac arrest and thromboembolic events, is summarized in FIG. 6 .
  • Plots of the aggregated cross-validation area under the receiver operating characteristic curves (AUROC) are shown in FIG. 6 A .
  • Linear models were optimal for prediction of both outcomes, and included all features for prediction of cardiac arrest and short features only for prediction of thromboembolic events.
  • the optimized COVID-HEART predictor achieved AUROCs of 0.918 and 0.771, sensitivities of 0.768 and 0.500, and specificities of 0.903 and 0.879 for the full test set for prediction of cardiac arrest and thromboembolic events, respectively ( FIG. 6 B ).
  • Supplementary Table 5 presents leave-hospital-out cross-validation and testing results.
  • For prediction of cardiac arrest, the mean test AUROC, sensitivity, and specificity for the left-out hospitals were 0.956 (95% CI: 0.936-0.976), 0.885 (95% CI: 0.838-0.933), and 0.887 (95% CI: 0.843-0.932).
  • For prediction of thromboembolic events, the mean test AUROC, sensitivity, and specificity for the left-out hospitals were 0.781 (95% CI: 0.642-0.919), 0.453 (95% CI: 0.147-0.760), and 0.863 (95% CI: 0.822-0.904).
  • FIG. 8 illustrates the COVID-HEART predictor's capability to accurately predict each CV outcome within outcome windows of different durations. This capability may provide significant clinical value in determining the patient's short-term and longer-term risk, thus supporting appropriate intervention and resource allocation.
  • cross-validation and test results are comparable, indicating strong generalizability of the COVID-HEART despite statistically significant differences in demographics and prevalence of comorbidities between the development and test sets (Supplementary Table 4).
  • FIG. 7 and FIG. 9 provide examples of time-series clinical data and resulting risk scores for “true positive” and “true negative” predictions for patients in the test set for each CV outcome.
  • FIG. 10 illustrates two incorrect predictions; these are discussed in Supplementary Results.
  • the interquartile ranges for the median early warning times over 20 iterations of temporally-divided testing were 14-21 hours for cardiac arrest and 12-60 hours for thromboembolic events, although the classifier was trained to predict outcomes within 2 hours for cardiac arrest and 24 hours for thromboembolic events. This could represent a clinically useful “early warning” system.
  • the COVID-HEART predictor was designed to be fully transparent.
  • Table 2 lists up to 20 features with the largest coefficients in the optimal classifier for each of the two CV outcomes. Note that features were normalized prior to classifier training, and that models are not simple logistic regressions, thus interpretation of the coefficients is not straightforward. Many of these features confirm previous observations in cohorts of severely ill COVID-19 patients. For example, lower O 2 saturation is associated with cardiac arrest and multiple coagulation-related lab results are associated with thromboembolic events.
  • The COVID-HEART predictor is a real-time model that can forecast multiple adverse CV events in hospitalized patients with COVID-19.
  • the COVID-HEART predictor is robust to missing data and can be updated each time new data becomes available, representing a continuously evolving warning system for an impending event. It can also predict the likelihood of an adverse event within multiple timeframes (e.g. 2 hours, 8 hours, 24 hours). Although predictions were made at the same time steps for patients in the test set for consistency with the development set, it is possible to apply the model at any arbitrary time during a patient's hospitalization.
  • COVID-HEART is fully transparent and thus identifies dynamic predictive features that have not previously been investigated for prediction of these outcomes in patients with COVID-19; these may suggest avenues for future research and personalized targets for clinical intervention.
  • the COVID-HEART risk prediction approach provides transparency and clinical explainability, including the ability to determine which features are dominant contributors to a patient's risk level at a particular time, which may suggest potential patient-specific targets for clinical intervention.
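  • One simple way to see which features dominate a linear classifier's score for a patient at a given time is to rank the products of each coefficient and the corresponding normalized feature value. This is only an illustrative sketch under that assumption: the optimal models are stated not to be simple logistic regressions, so their coefficients may not be interpretable this directly, and the feature names and numbers in the usage note are hypothetical.

```python
def top_contributors(coefficients, normalized_features, n=3):
    """Rank features by the absolute size of coefficient * value.

    coefficients and normalized_features map feature names to floats;
    a feature missing from normalized_features contributes 0.
    Returns the n largest (name, contribution) pairs.
    """
    contributions = {
        name: coef * normalized_features.get(name, 0.0)
        for name, coef in coefficients.items()
    }
    ranked = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return ranked[:n]
```

  • For instance, with a coefficient of -1.5 on a short-window O 2 saturation feature and a normalized value of -2.0 (well below average), that feature contributes +3.0 to the score and would rank ahead of a pulse feature contributing +0.8.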
  • Prediction models for CV adverse events in patients with COVID-19 have been limited by lack of sufficient data, impractical requirements for use (e.g. that all data be available for all patients or that measurements are taken at the same time relative to time of admission), and overly restrictive inclusion/exclusion criteria that result in idealistic training and testing cohorts not representative of real patient data.
  • Our model is designed to handle real-world data, which may include noise, missing variables, and data collected at different points in a patient's hospitalization.
  • Models for risk prediction in hospitalized patients have typically focused on predicting mortality risk or length of stay for patients in the ICU.
  • Traditional models incorporate variables thought to indicate physiologic instability or end-organ injury (e.g. respiratory rate, serum bilirubin level, serum creatinine, etc.). While these models generally have good discriminative power, they fail to provide specific, actionable information and simply notify healthcare teams that particular patients are at increased mortality risk at some point in their ICU stay.
  • predictive scores are calculated based on the most extreme variable values during the initial 24 hours of the ICU admission, with repeat calculations every 24-72 hours.
  • Although newer models have higher predictive performance compared to traditional models, they are trained to predict the incidence of a particular outcome (e.g. bleeding, renal failure, mortality, etc.) at an indefinite future time. They are not designed to predict the time periods during which patients are at highest risk.
  • prior studies have focused largely on initial diagnosis, mortality, or severity of illness, but none have specifically focused on cardiovascular events, including in-hospital cardiac arrest and thromboembolic events, both clinically important complications with implications for cardiac treatment and monitoring.
  • our model is the first to utilize continuous time series physiologic data as well as laboratory and electrocardiographic data to provide a continuously-updating risk score for an outcome within a particular future time window (e.g. risk of thromboembolic event in the next 24 hours). By providing a risk score for a specific outcome window, our model provides timely, actionable information, allowing the healthcare team to allocate resources and initiate therapies when they are most needed.
  • VTE prophylaxis is one of the treatments most frequently omitted by nursing staff or declined by patients.
  • An analysis of VTE events at our institution over a 72-day period during the Spring 2020 COVID-19 wave demonstrated that 4 out of 11 SARS-CoV-2 positive patients who experienced VTE events had at least one missed dose of VTE prophylaxis. While care providers should ideally strive for 100% compliance with VTE prophylaxis in all eligible patients, the identification of patients at high risk for thromboembolic events may help target these interventions to the patients most in need.
  • identification of high-risk patients would prompt the primary team to assemble specialized staff and equipment, given the high risk of arrest (e.g. calling the anesthesia team for intubation in a high-risk patient, having adequate nursing staff for a possible resuscitation, etc.).
  • a major barrier to clinical adoption of prognostic machine learning models is the lack of appropriate validation on a representative test cohort.
  • the temporally-divided test sets in this study demonstrated the performance of the predictor on a set of patients admitted after the end of data collection for patients in the development set.
  • a prospective cohort would not be expected to have the same composition as the development set; indeed, there were several statistically significant differences in demographics, clinical characteristics, and prevalence of adverse CV events between the development and test sets in this study.
  • the strong test results show that the predictor is robust to changes in clinical treatment guidelines and evolving demographics. We hypothesize that it maintains its accuracy because it considers data which describe the patient's physiologic state, not variables that are directly influenced by physician input such as ventilator settings or medication use. Further, the predictor maintained strong performance in leave-hospital-out validation, which demonstrated its robustness when trained and tested with data from patients from different populations.
  • Additional limitations stem from the use of the JH-CROWN registry. These include the potential for measurement error, inaccurate patient-reported history (e.g. smoking), and missing data. Another potential limitation is confounding by indication, which means that treatments were selected based on clinical indication. While our model did not include treatments or other variables that were directly influenced by clinical indication, some variables in the model were likely indirectly influenced by clinical indication. For example, the pulse oxygen saturation may have been affected by changes in ventilator settings for patients who were receiving mechanical ventilation. There is also a subgroup of patients who had pre-existing DNR/DNI/comfort care orders.
  • the predictor can facilitate practical, meaningful change in patient triage and the allocation of resources by providing real-time risk scores for CV complications occurring commonly in COVID-19 patients.
  • the COVID-HEART predictor can be re-trained to predict additional adverse CV events including myocardial infarction and arrhythmia.
  • the potential utility of the predictor extends well beyond hospitalized COVID-19 patients, as COVID-HEART could be applied to the prediction of CV adverse events post-hospital discharge or in patients with chronic COVID syndrome (“Long COVID”).
  • the ML methodology utilized here could be expanded to use in other clinical scenarios that require screening or early detection, such as risk of hospital readmission, with the goal of improved clinical outcomes through early warnings and resultant opportunity for timely intervention.
  • the COVID-HEART predictor can identify patients at risk for adverse CV events by quantitatively evaluating changes in dozens of clinical variables, enhancing clinical practice by providing data-driven clinical decision support.
  • Clinical implementation of the algorithm would require a one-time engineering investment to convert the model and pre-processing algorithms into Predictive Model Markup Language (PMML).
  • the model could then be fully integrated with an electronic health record system and would require no manual input or time investment by a clinician to calculate or view a patient's risk score and the clinical variables that most influenced the score.
  • Prospective validation would be required to increase clinical confidence in the predictor, and a larger training data set would likely improve accuracy of thromboembolic event prediction.
  • ECG parameters and lab values are reported as the first result value during the patient's admission.
  • Comorbidities are defined according to diagnosis codes in the Elixhauser comorbidity table. Values are reported as mean (standard deviation) unless otherwise indicated. P-values represent comparison between patients that did and did not experience each outcome and were calculated using the two-sample t-test, Fisher's exact test, or chi-squared test as appropriate. This table was generated using the Python package tableone with the Bonferroni correction applied for multiple hypothesis testing.
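As a reminder of what the Bonferroni correction does, the sketch below applies the standard adjustment (multiply each p-value by the number of tests and cap the result at 1). It illustrates the textbook formula only and is not taken from the tableone package; the function name and the example p-values are hypothetical.

```python
from typing import List, Sequence


def bonferroni(p_values: Sequence[float]) -> List[float]:
    """Return Bonferroni-adjusted p-values: min(1, p * number_of_tests)."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three table rows.
print(bonferroni([0.01, 0.04, 0.5]))
```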
  • "Feature" refers to the processed input to the ML algorithm based on the values of each clinical variable during each time window.
  • "Time Duration" refers to the length of time over which clinical data values were considered to calculate each feature. Note that features were normalized during pre-processing, although raw values are shown here, and that values are listed per time window. These are not the only features included in the classifier for prediction of cardiac arrest. P-values were calculated using the two-sample two-sided t-test or chi-squared test as most appropriate. This table was generated using the Python package tableone.
  • Comorbidities, including chronic lung disease and pulmonary circulation disorders, are defined using ICD-10 codes according to the Elixhauser comorbidity definitions.
  • the JH-CROWN COVID-19 registry includes patients of all ages seen, since Jan. 1, 2020, at any Johns Hopkins Medical Institution facility (inpatient, outpatient, in-person, video consult, or lab order) with confirmed COVID-19 or suspected of having COVID-19.
  • the cohort is defined as having a completed laboratory test for COVID-19 (whether positive or negative), having an ICD-10 diagnosis of COVID-19 (recorded at the time of encounter, entered on the problem list, entered as medical history, or appearing as a billing diagnosis), or flagged as a “patient under investigation” for suspected or confirmed COVID-19 infection. Further details are available on the Johns Hopkins Institute for Clinical and Translational Research website.
  • FIG. 2 illustrates the flow of patients through the study.
  • For an admission to be included, patients must have had a laboratory-confirmed SARS-CoV-2 infection within 14 days prior to the date of admission or during the admission.
  • the minimum length of time from admission to discharge or death was 4 hours for cardiac arrest prediction and 72 hours for prediction of thromboembolic events, the difference being necessitated by the time granularity with which each outcome could be identified, discussed in further detail in the following section.
  • Time spent in the emergency department did not count towards the admission duration, but if a patient had clinical data (e.g. laboratory values or vital signs) recorded in the emergency department prior to admission, those values were used to initialize the clinical data inputs at the start of their inpatient admission. Data were censored at the time of outcome or discharge.
  • the primary outcome for each patient was whether they experienced in-hospital cardiac arrest and/or an imaging-confirmed thromboembolic event.
  • In-hospital cardiac arrest included all-cause mortality and cardiac arrest with return of spontaneous circulation. All-cause mortality was defined according to the time of death recorded in the JH-CROWN database. Cardiac arrest with return of spontaneous circulation was defined as documentation in the medical record of a non-perfusing rhythm and subsequent initiation of chest compressions and other resuscitative measures by the health care team. All cardiac arrest events were considered, regardless of the influence of any precipitating events such as patient position change or respiratory decompensation. These were queried by searching for the ICD-10 code ‘I46.X’ within the problem list and encounter diagnosis list. We performed chart review to adjudicate all ICD-10-based cardiac arrest diagnoses according to the above definition. For patients with multiple cardiac arrests, the first outcome was used, and the remainder of their data were censored.
  • Thromboembolic outcomes included pulmonary embolism confirmed on computed tomography (CT) angiography of the chest, non-hemorrhagic stroke confirmed on CT of the head, and deep venous thrombosis confirmed on either vascular ultrasound or CT of the abdomen or pelvis. Findings that were diagnosed or clinically apparent on initial presentation (confirmed on imaging within 24 hours of presentation) were excluded from analysis. For a patient with multiple adverse coagulation outcomes during their hospitalization, the first outcome was used. We note that such a strict outcome definition could mean that some outcomes were missed, especially if a patient's immediate cause of death was a thromboembolic event or if the event was confirmed by point-of-care ultrasound that was not recorded in the imaging procedure list. However, we found that alternative outcome definition methods (such as ICD-10 diagnosis codes) resulted in many “false positive” outcomes upon chart review, so this method was chosen to ensure all thromboembolic events were confirmed with a consistent, objective level of clinical certainty.
  • CT computed tomography
  • Supplementary Table 2 lists all clinical data inputs from which predictors were extracted. Here, we discuss the definition of these predictors, how they were measured, and pre-processing steps undertaken prior to dynamic feature extraction.
  • Demographic inputs included age, gender, weight, height, body mass index, and race.
  • Gender was defined as the patient's legal gender (Male or Female) as listed in the electronic health record (EHR). Race was self-reported and divided into three categories according to the most common values in the JH-CROWN registry: Black, white, and other. The inclusion of race in machine learning models is controversial. However, there is significant evidence that Black patients and other patients of color experience worse outcomes in COVID-19. We were concerned that by not including race, our model may fail to account for a higher baseline risk of adverse outcomes among Black patients in the study cohort's geographic area. Future work, prior to a prospective study, could include a re-analysis of the current results to ensure that the predictions are not systematically less accurate for any demographic group. Comorbidities were defined by mapping ICD-10 codes according to the Elixhauser comorbidity definitions using the hcuppy Python library.
  • SBP systolic blood pressure
  • DBP diastolic blood pressure
  • ECG measurements were extracted from the 12-lead ECG. As with laboratory tests, these measurements were time-stamped at the time the result was received, not the time of the procedure. Parameters (QRS duration, QT interval, etc.) were evaluated by the clinician who interpreted the ECG results.
  • the testing data set was identified and sequestered from the training data prior to model development. Since this was a retrospective study and did not include any data collected prospectively, there was no need for blinded assessment of predictors for patients in the testing set. Patients were assigned to development and test sets after predictors were collected and outcomes were defined.
  • the study size was determined by the number of patients in the JH-CROWN registry who met all inclusion and exclusion criteria for prediction of each outcome.
  • Missing values from the beginning of the patient's hospitalization e.g., if they did not have a measurement for a particular laboratory test until hour 48, or at any point during their hospitalization
  • Missing values following a measurement e.g., if a patient had an ECG at hour 12, then did not have another ECG until hour 48
  • Missing values following a measurement were handled with forward filling; each variable was held constant until a new measurement was made.
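A minimal sketch of forward filling on an hourly grid, assuming each variable has already been re-sampled to one slot per hour with None where nothing was measured. The function name and the example values are hypothetical; leading gaps are left as None because they belong to the other missing-data category.

```python
from typing import List, Optional, Sequence


def forward_fill(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Hold each measurement constant until a new measurement arrives."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for v in values:
        if v is not None:
            last = v  # a new measurement replaces the held value
        filled.append(last)
    return filled


# Hypothetical hourly series with measurements at hours 2 and 5 only.
hourly_qt = [None, None, 410.0, None, None, 432.0, None]
print(forward_fill(hourly_qt))
```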
  • time point indicates a single moment in time.
  • feature window The time window before a time point, during which clinical data are collected and features are extracted.
  • outcome window The time window immediately after a time point, in which the risk of a particular CV outcome is predicted.
  • Positive time windows or “positive time points” are time windows or points for which the patient experienced the CV outcome of interest in the following outcome window.
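The definitions of time point, outcome window, and positive time point can be sketched as a labeling function. This is an illustration under stated assumptions: times are hours since admission, the outcome window is taken as the half-open interval (t, t + window], and the function name is hypothetical. Because data are censored at the time of outcome, callers would normally drop time points after the event rather than label them 0.

```python
from typing import List, Optional


def label_time_points(time_points_h: List[float],
                      event_time_h: Optional[float],
                      outcome_window_h: float) -> List[int]:
    """Return 1 for each time point whose outcome window contains the event."""
    labels = []
    for t in time_points_h:
        positive = (event_time_h is not None
                    and t < event_time_h <= t + outcome_window_h)
        labels.append(1 if positive else 0)
    return labels


# Hourly time points, a 2-hour outcome window, and an event at hour 4.5.
print(label_time_points([0, 1, 2, 3, 4, 5], 4.5, 2.0))
```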
  • “Short features” encompassed a short window of time immediately preceding the time point at which the prediction was to be made. For example, if the feature window length was 2 hours, these features would include the mean, standard deviation, minimum, maximum, and amplitude of first frequency in Fourier space of the variable over the preceding 2 hours. “Long features” included the mean, standard deviation, minimum, and maximum over the patient's entire hospitalization preceding the time point at which the prediction was to be made.
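The short and long features described here can be sketched as follows, assuming the variable has already been re-sampled to evenly spaced values. Several details are assumptions rather than statements from the text: the population standard deviation is used, the "first frequency" is taken to be the first non-constant bin of the discrete Fourier transform, its amplitude is divided by the number of samples, and all function and key names are hypothetical.

```python
import cmath
import statistics
from typing import Dict, Sequence


def first_frequency_amplitude(x: Sequence[float]) -> float:
    """Amplitude of DFT bin k=1, normalized by the number of samples."""
    n = len(x)
    if n < 2:
        return 0.0
    coeff = sum(v * cmath.exp(-2j * cmath.pi * i / n) for i, v in enumerate(x))
    return abs(coeff) / n


def summary(x: Sequence[float], prefix: str) -> Dict[str, float]:
    """Mean, standard deviation, minimum, and maximum of one interval."""
    return {
        f"{prefix}_mean": statistics.fmean(x),
        f"{prefix}_std": statistics.pstdev(x),
        f"{prefix}_min": min(x),
        f"{prefix}_max": max(x),
    }


def short_and_long_features(history: Sequence[float],
                            window_len: int) -> Dict[str, float]:
    """Features over the last `window_len` samples and over the full history."""
    short = list(history)[-window_len:]
    features = summary(short, "short")
    features["short_fft1"] = first_frequency_amplitude(short)
    features.update(summary(history, "long"))
    return features


# Hypothetical hourly oxygen saturation values; feature window of 2 samples.
spo2_history = [96.0, 95.0, 97.0, 94.0, 90.0, 88.0]
print(short_and_long_features(spo2_history, window_len=2))
```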
  • “Exponentially weighted decay features” also encompassed the patient's entire hospitalization preceding the time point at which the prediction was to be made, but the measurements were exponentially weighted according to how recently they were made with more recent measurements weighted more strongly and a half-life of 1 day.
  • Heart rhythm indicators were re-sampled similarly to other dynamic clinical data inputs but were treated discretely. For each window, two variables were recorded for each heart rhythm indicator (Atrial fibrillation, heart block, etc.): a binary indicator of whether the patient experienced that heart rhythm within the window and an integer-valued variable indicating how many times that heart rhythm was noted within the window. It was assumed that if a patient did not have any heart rhythm annotations within a particular hour, they did not experience an abnormal heart rhythm during that window, so missing values were filled in with zero for both the binary indicator variable and integer-valued variable. “Short features” and “long features” were calculated for each heart rhythm indicator but included only the sum (total number of times each was recorded over the interval) and maximum (maximum number of times a rhythm was recorded in a single hour within the interval).
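The discrete treatment of one heart rhythm indicator can be sketched like this, assuming annotations are given as integer hour indices since admission. Hours without annotations become zeros, as described above; the function names and example data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def hourly_rhythm_counts(annotation_hours: Sequence[int],
                         n_hours: int) -> Tuple[List[int], List[int]]:
    """Return (binary flag per hour, annotation count per hour)."""
    counts = Counter(h for h in annotation_hours if 0 <= h < n_hours)
    per_hour = [counts.get(h, 0) for h in range(n_hours)]  # missing -> 0
    flags = [1 if c > 0 else 0 for c in per_hour]
    return flags, per_hour


def rhythm_interval_features(per_hour: Sequence[int],
                             start: int, stop: int) -> Dict[str, int]:
    """Sum and per-hour maximum of the counts over hours [start, stop)."""
    window = list(per_hour)[start:stop]
    return {"sum": sum(window), "max": max(window, default=0)}


# Hypothetical atrial fibrillation annotations: twice in hour 1, once in hour 4.
af_flags, af_counts = hourly_rhythm_counts([1, 1, 4], n_hours=6)
print(af_flags, af_counts, rhythm_interval_features(af_counts, 0, 6))
```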
  • Dynamic features were extracted at each time point during each patient's hospitalization.
  • the time-step between time points at which predictions were made was 1 hour for prediction of cardiac arrest and 24 hours for prediction of thromboembolic events.
  • For thromboembolic events each time window began at midnight; for cardiac arrest, each time window began at the top of the hour, commencing with the first full hour after the patient was admitted as an inpatient.
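The alignment rules for time windows can be sketched with the standard datetime module. The helper names are hypothetical, and whether the final window start near discharge is kept is an assumption (here every start at or before `end` is included).

```python
from datetime import datetime, timedelta
from typing import List


def _aligned_starts(first: datetime, admit: datetime, end: datetime,
                    step: timedelta) -> List[datetime]:
    """Window starts from the first aligned time at or after admission."""
    if first < admit:
        first += step  # skip the partial hour or day containing admission
    starts = []
    t = first
    while t <= end:
        starts.append(t)
        t += step
    return starts


def hourly_window_starts(admit: datetime, end: datetime) -> List[datetime]:
    """Cardiac arrest: windows begin at the top of each full hour."""
    floor = admit.replace(minute=0, second=0, microsecond=0)
    return _aligned_starts(floor, admit, end, timedelta(hours=1))


def daily_window_starts(admit: datetime, end: datetime) -> List[datetime]:
    """Thromboembolic events: windows begin at midnight."""
    floor = admit.replace(hour=0, minute=0, second=0, microsecond=0)
    return _aligned_starts(floor, admit, end, timedelta(days=1))


# Hypothetical inpatient admission at 10:17.
admit_time = datetime(2020, 4, 1, 10, 17)
print(hourly_window_starts(admit_time, datetime(2020, 4, 1, 13, 5)))
print(daily_window_starts(admit_time, datetime(2020, 4, 3, 9, 0)))
```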
  • the difference in time-step was due to the difference in the time granularity of the outcome labels.
  • Although cardiac arrest outcomes could be defined by the minute in which they occurred, and thus it would be appropriate to use a time-step as small as 1 minute, 1 hour was chosen to balance computational costs with the desire to train the classifier with as much data as possible.
  • a time-step of 1 hour resulted in 599,143 time windows for the development set, which produced an accurate, generalizable classifier as demonstrated by the strong cross-validation and testing results for prediction of cardiac arrest.
  • the linear model was chosen as it is highly explainable (not a “black box”), it is efficient to train with hundreds of thousands of time windows, and it can be updated without requiring full re-training.
  • the learning rate of the linear model was set to “optimal” with early stopping and balanced class weight.
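The text does not give a formula for the balanced class weight. One common heuristic, used for example by scikit-learn's class_weight="balanced" option, weights each class by n_samples / (n_classes * class_count), so that rare positive time windows count more during training. The sketch below assumes that heuristic; the function name and label counts are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def balanced_class_weights(labels: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Weight each class inversely to its frequency in the labels."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Hypothetical labels: 98 negative time windows and 2 positive ones.
window_labels = [0] * 98 + [1] * 2
print(balanced_class_weights(window_labels))
```

With these example labels each positive window is weighted 49 times as heavily as a negative one, which is the sense in which errors on positive windows are penalized strongly.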
  • the multi-layer perceptron model is similarly efficient to train and can be updated without full re-training. Although it is more difficult to interpret, we chose to include it to assess whether a non-linear model could better represent the relationships between clinical data inputs. As COVID-19 treatment paradigms change, we expect that model updating would be necessary to retain accuracy among evolving clinical practices.
  • Pre-processing steps included removal of features which were missing for >60% of time windows, mean-value imputation for numerical features that were missing (typically at the beginning of a patient's hospitalization or if a certain laboratory test was never performed for a given patient), and scaling all numerical features to zero mean and unit variance.
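The three pre-processing steps can be sketched on a small column-oriented data set, with None marking a missing value. This is an illustration, not the authors' code: the function name and example columns are hypothetical, and in practice the means and standard deviations would be estimated on the development set and reused unchanged for the test set.

```python
import statistics
from typing import Dict, List, Optional


def preprocess(columns: Dict[str, List[Optional[float]]],
               max_missing: float = 0.60) -> Dict[str, List[float]]:
    """Drop sparse features, impute means, and scale to zero mean, unit variance."""
    out: Dict[str, List[float]] = {}
    for name, col in columns.items():
        observed = [v for v in col if v is not None]
        missing_fraction = 1.0 - len(observed) / len(col)
        if missing_fraction > max_missing or not observed:
            continue  # feature missing for too many time windows
        mean = statistics.fmean(observed)
        imputed = [mean if v is None else v for v in col]
        std = statistics.pstdev(imputed)
        out[name] = [(v - mean) / std if std > 0 else 0.0 for v in imputed]
    return out


# Hypothetical features over four time windows.
raw_columns = {
    "lactate": [1.0, None, 3.0, None],     # 50% missing: kept
    "troponin": [None, None, None, 0.4],   # 75% missing: removed
}
print(preprocess(raw_columns))
```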
  • feature selection was incorporated using a lasso regression model for sparsity. This feature selection method was chosen as it is not biased towards selecting high-cardinality variables over variables with fewer discrete values (e.g., binary comorbidity features), in contrast with other popular feature selection methods such as the random forest algorithm.
  • Hyperparameters for the linear model included the maximum number of features selected, the loss function (hinge, log, modified Huber, Huber, squared hinge), the regularization penalty (L1, L2, and L1L2), the regularization strength, and the L1 ratio for L1L2 regularization. Losses were weighted during training to strongly penalize errors for positive time windows. If the optimal loss function of the linear classifier was not log or modified Huber, the optimized classifier was calibrated after training to provide risk probabilities in addition to binary predictions.
  • Hyperparameters for the multi-layer perceptron classifier included the maximum number of features selected, the number and size of hidden layers, the regularization strength, the learning rate decay schedule (constant, inverse scaling, or adaptive), and the initial learning rate.
  • the optimal models for prediction of each outcome were re-fit using the entire development set and calibrated if necessary. Static and dynamic features were then calculated for patients in the testing set using the same methods as for the development set. The fitted models were used to predict the risk of each CV outcome at each time point for each patient in the testing set. A binary prediction was also made at each time point using the optimal threshold determined by the development data during training. Models were tested using repeated temporal validation and leave-hospital-out validation.
  • VTE venous thromboembolism
  • the temporally divided testing set was sequestered until the end of model development. There were no changes made to the model following testing.
  • Table 1 indicates the number of patients for which each measurement was missing. This does not necessarily mean they never had a measurement for a certain variable. It may mean that they had a recording at a hospital in a different health system prior to being transferred to a hospital in the Johns Hopkins Health System or that data was missing from the JH-CROWN registry. This is an inherent limitation in the use of retrospective registry data, discussed in further detail in Supplementary Methods.
  • the optimal model for prediction of cardiac arrest with a feature window of 2 hours, outcome window of 2 hours, and time step of 1 hour was a linear model with features selected from short, long, and exponentially weighted decaying features.
  • the optimal model for prediction of thromboembolic outcomes with a feature window of 24 hours, outcome window of 24 hours, and time step of 24 hours was a linear model with short features only.
  • the optimal hyperparameters included 9 features selected, log loss, L2 regularization penalty, and regularization strength of 0.307.
  • Table 2 lists the features with largest absolute coefficients in the model for prediction of each outcome along with their values for time windows in the development and test sets. Feature selection was performed using the development set only. The most important features for prediction of cardiac arrest within 2 hours included age, many vital signs, and lab tests that indicate inflammation, cardiac function, and metabolic function. Several of these have previously been noted as predictors of various adverse outcomes in COVID-19. This serves as a “sanity check” that the model is learning reasonable associations between predictors and outcomes, despite its novel real-time nature.
  • the features with largest absolute coefficients for prediction of thromboembolic events within 24 hours were derived from D-dimer, magnesium, white blood cell count, immature granulocytes, and pulmonary circulation disorders. Other variables were also associated with thromboembolic events (Table 1), but only a few features could be included in the model due to the small number of events in the development set.
  • D-dimer suggests the presence of blood clots being degraded by fibrinolysis and is associated with thromboembolic events.
  • Magnesium promotes fibrinolysis and may be given as an anti-coagulant, so the features extracted from magnesium measurements may indirectly reflect physician assessment that the patient is at high risk for thromboembolic events.
  • white blood cell count is often elevated in patients with pulmonary embolism and deep vein thrombosis, which explains why it is predictive of thromboembolic events.
  • the first example predictions are the “true positive” predictions for one patient in the test set for each outcome, as shown in FIG. 4.
  • results show that their risk is very low for the first 17 days of their hospitalization as their white blood cell count trends upward and vital signs fluctuate.
  • their pulse oxygen saturation and pulse decrease, while their chloride and white blood cell count increase. The effects of these changes are reflected in the patient's risk score, which increases very rapidly in the 18 hours leading up to the time at which the patient experienced cardiac arrest.
  • the COVID-HEART predictor is successful in determining when the patient becomes at risk of cardiac arrest in the short-term by considering these and other inputs together.
  • the results demonstrate that the risk score is low for the first 11 days of the patient's hospitalization, then it crosses the binary risk threshold immediately prior to the 12th day when the patient experienced an imaging-confirmed thromboembolic event.
  • the patient's white blood cell count is high, which may raise clinical suspicion for infection.
  • the patient does not have remarkably elevated D-dimer or other traditional risk markers for a thromboembolic event. This highlights the usefulness of the COVID-HEART risk predictor in identifying at-risk patients that may not have raised clinical suspicion for an impending thromboembolic event.
  • FIG. 9 shows an example of a “true negative” prediction for one patient in the test set for each outcome.
  • the cardiac arrest risk score for the patient whose data is shown in FIG. 9A remains below 0.08% for their entire hospitalization. This patient has several drops in pulse oxygen saturation below 90% and isolated drops in systolic and diastolic blood pressures, but the COVID-HEART risk predictor successfully assessed their risk as low. This patient did not experience cardiac arrest and was discharged after 8 days in the hospital.
  • the thromboembolic event risk score for the patient whose data is shown in FIG. 9B remains near 0.1% and below the binary risk threshold for their entire hospitalization. This patient did not experience any imaging-confirmed thromboembolic events and was discharged after 9 days. This also illustrates the COVID-HEART risk predictor's ability to cope with missing clinical data; the patient has no recorded measurements for magnesium during their hospitalization.
  • FIG. 10 shows an example of an incorrect prediction for one patient in the test set for each outcome.
  • the patient whose clinical data is shown in FIG. 10A experienced cardiac arrest on day 10 of their hospitalization.
  • Their risk score for cardiac arrest increased rapidly at the end of the 3rd day, corresponding to drops in pulse oxygen saturation below 80%, an increase in their pulse, and an increase in their white blood cell count.
  • it then decreased and continued fluctuating, remaining mostly above the binary risk threshold, until the time at which they experienced cardiac arrest.
  • Although they were not at the highest risk immediately before they experienced cardiac arrest, their risk score was still above the binary threshold immediately prior to the event.
  • Although this was technically a correct prediction, we focus on the risk score spike on day 3 and the elevated risk score between days 3-10 as examples of false positive predictions.
  • the patient whose clinical data is shown in FIG. 10B experienced an imaging-confirmed thromboembolic event on the 19th day of their hospitalization. Their risk score was low for the first 5 days of their hospitalization, then rose sharply in response to significantly elevated D-dimer. It is possible that the patient experienced a thromboembolic event at this time, but they did not have a thromboembolic event confirmed by imaging until the 19th day, 12 days later. It is also possible that the patient was treated with anti-coagulation therapy in response to the elevated D-dimer, which could have prevented a thromboembolic event during the D-dimer spike. This example highlights the need for further investigation of incorrect predictions and a prospective study of the COVID-HEART predictor in a larger cohort, which would make it possible to identify the timing of thromboembolic events more precisely.
  • FIG. 8 shows the results of varying the outcome window for prediction of each outcome.
  • the outcome window can vary from 1 to 24 hours with little change in AUROC, sensitivity, and specificity.
  • This analysis shows that the COVID-HEART predictor can predict cardiac arrest within multiple outcome window durations, representing a continuous early warning system for cardiac arrest that may be able to determine both the patient's short-term and longer-term risk.
  • FIG. 8 also presents numerical results for all outcome windows for the prediction of thromboembolic events. When the feature window is held constant at 24 hours, the results are similar for prediction of thromboembolic events within 1, 2, 3, and 4 days.
  • Section/Topic Title and abstract Checklist Item Page Title 1 D Identify the study as developing and/or validating a 1 multivariable prediction model, the target population, and the outcome to be predicted.
  • Abstract 2 D Provide a summary of objectives, study design, setting, 2 participants, sample size, predictors, outcome, statistical analysis, results, and conclusions.
  • Background and 3a D explain the medical context (including whether 4-5 objectives diagnostic or prognostic) and rationale for developing or validating the multivariable prediction model, including references to existing models.
  • 3b D Specify the objectives, including whether the study 4-5 describes the development or validation of the model or both.
  • Source of data 4a D Describe the study design or source of data (e.g., 5-6 randomized trial, cohort, or registry data), separately for the development and validation data sets, if applicable. 4b D; Specify the key study dates, including start of accrual; 8 end of accrual; and, if applicable, end of follow-up. Participants 5a D; Specify key elements of the study setting (e.g., primary 5-6 care, secondary care, general population) including number and location of centres. 5b D; Describe eligibility criteria for participants. 5-6 5c D; Give details of treatments received, if relevant. N/A Outcome 6a D; Clearly define the outcome that is predicted by the 35-36 prediction model, including how and when assessed.
  • the study design or source of data e.g., 5-6 randomized trial, cohort, or registry data
  • Participants 5a D Specify key elements of the study setting (e.g., primary 5-6 care, secondary care, general population) including number and location of centres. 5b D; Describe eligibility criteria for participants
  • Predictors 7a D Clearly define all predictors used in developing or 36-38, validating the multivariable prediction model, including 63 how and when they were measured. 7b D; Report any actions to blind assessment of predictors for 38 the outcome and other predictors.
  • Sample size 8 D Explain how the study size was arrived at. 38 Missing data 9 D; Describe how missing data were handled (e.g., complete- 39 case analysis, single imputation, multiple imputation) with details of any imputation method.
  • Statistical analysis methods 10a D Describe how predictors were handled in the analyses.
  • Statistical analysis methods 10 D Specify type of model, all model-building procedures (including any predictor selection), and method for internal validation. [41-42]
  • Statistical analysis methods 10c V For validation, describe how the predictions were calculated. [42-43]
  • Statistical analysis methods 10 D Specify all measures used to assess model performance and, if relevant, to compare multiple models. [8]
  • Statistical analysis methods 10e V Describe any model updating (e.g., recalibration) arising from the validation, if done. [N/A]
  • Development vs. validation 12 V For validation, identify any differences from the development data in setting, eligibility criteria, outcome, and predictors. [44-47]
  • Participants 13a D Describe the flow of participants through the study, including the number of participants with and without the outcome and, if applicable, a summary of the follow-up time. A diagram may be helpful. [26]
  • Participants 13 D; Describe the characteristics of the participants (basic demographics, clinical features, available predictors), including the number of participants with missing data for predictors and outcome. [23-25]
  • Participants 13c V For validation, show a comparison with the development data of the distribution of important variables (demographics, predictors and outcome). [66-88]
  • Model development 14a D Specify the number of participants and outcome events in each analysis. [10]
  • Model development 14 D If done, report the unadjusted association between each candidate predictor and outcome.
  • Model specification 15a D Present the full prediction model to allow predictions for individuals (i.e., all regression coefficients, and model intercept or baseline survival at a given time point). [32-33]
  • Model specification 15 D Explain how to use the prediction model. [43]
  • Model performance 16 D; Report performance measures (with CIs) for the prediction model. [10-12]
  • Model-updating 17 V If done, report the results from any model updating (i.e., model specification, model performance).
  • Limitations 18 D; Discuss any limitations of the study (such as nonrepresentative sample, few events per predictor, missing data). [16-17]
  • Interpretation 19a V For validation, discuss the results with reference to performance in the development data, and any other validation data. [13-16]
  • Interpretation 19 D; Give an overall interpretation of the results, considering objectives, limitations, results from similar studies, and other relevant evidence. [13-16]
  • Implications 20 D Discuss the potential clinical use of the model and implications for future research. [13-16]
  • Supplementary information 21 D Provide information about the availability of supplementary resources, such as study protocol, Web calculator, and data sets. [19-20]
  • Funding 22 D Give the source of funding and the role of the funders for the present study. [1]
  • indicates data missing or illegible when filed
  • Clinical Data Inputs
  • Demographics: age, gender, weight, height, body mass index, race
  • Comorbidities (30): current smoker, history of smoking, chronic pulmonary disease, diabetes mellitus with complications, diabetes mellitus without complications, lymphoma, valvular disease, psychosis, peripheral vascular disorder, pulmonary circulation disorders, hypothyroidism, alcohol abuse, neurological disorders, deficiency anemia, renal failure, liver disease, rheumatoid arthritis/collagen, solid tumor without metastasis, metastatic cancer, drug abuse, depression, HIV/AIDS, hypertension with complications, hypertension without complications, obesity, coagulopathy, peptic ulcer disease, congestive heart failure, paralysis, fluid and electrolyte disorders
  • Vital signs (6): pulse, systolic blood pressure, diastolic blood pressure, respiratory rate, temperature, pulse oxygen saturation
  • Lab tests (60): NT-pro-brain natriuretic peptide, white blood cell count, absolute lymphocyte count, D-dimer, nucleated red
  • Each row contains cross-validation results when patients who were admitted to that hospital at any time during the study are left out of the development set, and testing results for patients admitted to that hospital using the model trained and optimized with the development set. If a patient has a valid admission at multiple hospitals, data from their admission to the left-out hospital is assigned to the test set and their other admissions are excluded from the development set to prevent data leakage.
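The leave-one-hospital-out split described above can be sketched as follows. This is a minimal illustration and not the applicants' implementation: the `Admission` record, its field names, and the function names `leave_one_hospital_out` and `all_splits` are hypothetical. It assumes each admission carries a patient identifier and a hospital label, and it applies the stated rule that a patient seen at the left-out hospital contributes only that hospital's admissions, to the test set.

```python
from typing import Dict, List, NamedTuple, Sequence, Tuple


class Admission(NamedTuple):
    """Hypothetical record: one hospital admission for one patient."""
    patient_id: str
    hospital: str


def leave_one_hospital_out(
    admissions: Sequence[Admission], held_out: str
) -> Tuple[List[int], List[int]]:
    """Return (development_indices, test_indices) for one left-out hospital."""
    # Patients admitted to the left-out hospital at any time during the study.
    held_out_patients = {
        a.patient_id for a in admissions if a.hospital == held_out
    }
    development: List[int] = []
    test: List[int] = []
    for i, a in enumerate(admissions):
        if a.hospital == held_out:
            # Admissions at the left-out hospital form the test set.
            test.append(i)
        elif a.patient_id not in held_out_patients:
            # Only patients never seen at the left-out hospital are developed on.
            development.append(i)
        # Otherwise: an admission elsewhere by a test-set patient is dropped
        # from the development set to prevent data leakage.
    return development, test


def all_splits(
    admissions: Sequence[Admission],
) -> Dict[str, Tuple[List[int], List[int]]]:
    """Build one (development, test) split per hospital."""
    hospitals = sorted({a.hospital for a in admissions})
    return {h: leave_one_hospital_out(admissions, h) for h in hospitals}
```

Cross-validation and hyperparameter optimization would then run inside each development set, with the corresponding test indices used only for the final evaluation of that row.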

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Pathology (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Psychiatry (AREA)
  • Signal Processing (AREA)
  • Physiology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • Veterinary Medicine (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Investigating Or Analysing Biological Materials (AREA)
US18/257,925 2020-12-18 2021-12-17 Methods, systems and related aspects for real-time prediction of adverse outcomes using machine learning and high-dimensional clinical data Pending US20240055122A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US18/257,925 US20240055122A1 (en) 2020-12-18 2021-12-17 Methods, systems and related aspects for real-time prediction of adverse outcomes using machine learning and high-dimensional clinical data

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US202063127867P 2020-12-18 2020-12-18
US18/257,925 US20240055122A1 (en) 2020-12-18 2021-12-17 Methods, systems and related aspects for real-time prediction of adverse outcomes using machine learning and high-dimensional clinical data
PCT/US2021/064106 WO2022133258A1 (fr) 2020-12-18 2021-12-17 Real-time prediction of adverse outcomes using machine learning

Publications (1)

Publication Number Publication Date
US20240055122A1 true US20240055122A1 (en) 2024-02-15

Family

ID=82058507

Family Applications (1)

Application Number Title Priority Date Filing Date
US18/257,925 Pending US20240055122A1 (en) 2020-12-18 2021-12-17 Methods, systems and related aspects for real-time prediction of adverse outcomes using machine learning and high-dimensional clinical data

Country Status (2)

Country Link
US (1) US20240055122A1 (fr)
WO (1) WO2022133258A1 (fr)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024102327A1 (fr) * 2022-11-07 2024-05-16 Humabs Biomed Sa Use of sparse electronic health records to predict a health outcome
CN115969464B (zh) * 2022-12-26 2024-05-10 昆明理工大学 Method and system for predicting thrombolysis effect using piezoelectric impedance based on support vector machine regression

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9149646B2 (en) * 2008-12-31 2015-10-06 Koninklijke Philips N.V. Method and apparatus for controlling a process of injury therapy
US10973470B2 (en) * 2015-07-19 2021-04-13 Sanmina Corporation System and method for screening and prediction of severity of infection
CN110546646A (zh) * 2017-03-24 2019-12-06 帕伊医疗成像有限公司 Method and system for assessing vessel obstruction based on machine learning

Also Published As

Publication number Publication date
WO2022133258A1 (fr) 2022-06-23

Similar Documents

Publication Publication Date Title
Fu et al. Development and validation of early warning score system: A systematic literature review
US20210076960A1 (en) Ecg based future atrial fibrillation predictor systems and methods
Sudharsan et al. Hypoglycemia prediction using machine learning models for patients with type 2 diabetes
Balamuth et al. Comparison of two sepsis recognition methods in a pediatric emergency department
US20170061102A1 (en) Methods and systems for identifying or selecting high value patients
Sax et al. Use of machine learning to develop a risk-stratification tool for emergency department patients with acute heart failure
US11869668B2 (en) Artificial intelligence based cardiac event predictor systems and methods
CN103493054A (zh) Medical information technology system for predicting the development of cardiovascular disease
US20240055122A1 (en) Methods, systems and related aspects for real-time prediction of adverse outcomes using machine learning and high-dimensional clinical data
US20210035693A1 (en) Methods, systems, and apparatuses for predicting the risk of hospitalization
KR20220102634A (ko) Systems and methods for machine learning approaches to management of healthcare populations
Emakhu et al. Acute coronary syndrome prediction in emergency care: A machine learning approach
Salvioni et al. The MECKI score initiative: Development and state of the art
Rahman et al. Using machine learning for early prediction of cardiogenic shock in patients with acute heart failure
Cunningham et al. Cost-utility of an online education platform and diabetes personal health record: analysis over ten years
Sun et al. Effective treatment recommendations for type 2 diabetes management using reinforcement learning: treatment recommendation model development and validation
Tsai et al. Mortality risk prediction of the electrocardiogram as an informative indicator of cardiovascular diseases
US20230245782A1 (en) Artificial Intelligence Based Cardiac Event Predictor Systems and Methods
Shade et al. Real-time prediction of mortality, cardiac arrest, and thromboembolic complications in hospitalized patients with COVID-19
Shade et al. COVID-HEART: development and validation of a multi-variable model for real-time prediction of cardiovascular complications in hospitalized patients with COVID-19
US20140372146A1 (en) Determining a physiologic severity of illness score for patients admitted to an acute care facility
US11810652B1 (en) Computer decision support for determining surgery candidacy in stage four chronic kidney disease
Jawad et al. Development and validation of prognostic machine learning models for short-and long-term mortality among acutely admitted patients based on blood tests
JP2024522121A (ja) Artificial intelligence based cardiac event prediction systems and methods
Zhou Reliable and Practical Machine Learning for Dynamic Healthcare Settings

Legal Events

Date Code Title Description
AS Assignment

Owner name: THE JOHNS HOPKINS UNIVERSITY, MARYLAND

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SHADE, JULIE K.;DOSHI, ASHISH;SUNG, ERIC;AND OTHERS;SIGNING DATES FROM 20210510 TO 20210716;REEL/FRAME:063971/0329

STPP Information on status: patent application and granting procedure in general

Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION