WO2021044594A1

WO2021044594A1 - Method, system, and apparatus for health status prediction

Info

Publication number: WO2021044594A1
Application number: PCT/JP2019/035022
Authority: WO
Inventors: George Chalkidis; Wataru Takeuchi; Shinji Tarumi; Shuntaro Yui
Original assignee: Hitachi, Ltd.
Priority date: 2019-09-05
Filing date: 2019-09-05
Publication date: 2021-03-11

Abstract

The present invention relates to predicting a health status progression for a patient. Predicting the health status progression for the patient may include segmenting a timeline representation of a set of electronic health record data that indicates medical history information for a patient into a set of lookback windows that designate one or more time periods of the timeline representation, determining, for a lookback window of the set of lookback windows, a set of trajectory features that indicate a change of a feature of the set of electronic health record data, assigning, to a subset of trajectory features that achieve a relevance threshold with respect to predicting the health status progression of the patient, a weighting value, and generating, based on the weighting values assigned to the subset of trajectory features, a prediction outcome that indicates an expected health progression of the patient.

Description

[Corrected under Rule 26, 24.10.2019] METHOD, SYSTEM, AND APPARATUS FOR HEALTH STATUS PREDICTION

The present disclosure generally relates to machine learning techniques, and more specifically relates to machine learning techniques configured to make predictions regarding the health status of a patient based on electronic health records.

In the field of cancer disease progression, a typical treatment pattern for a patient consists of several lines of treatment until the end of life is reached. However, in some cases, there comes a point in a patient’s health status trajectory where further aggressive treatment of cancer is unlikely to halt deterioration and extend life any further. Due to the clinical and financial burden of such futile treatments, in some circumstances it is more beneficial to focus on care that improves a patient’s quality of life, such as palliative care, instead of administering futile aggressive treatments that may reduce the quality of life of the patient.

The changes in a patient’s health status over time can be viewed as a trajectory. Accurately estimating the progression of a cancer patient along their own personal health status trajectory is useful for making an informed risk-benefit analysis of the treatment options available to the patient. Currently, for advanced-stage cancer outpatients, no adequate solution exists for identifying a patient’s location on the health status trajectory and predicting deterioration months in advance.

The health status trajectory of advanced-stage cancer patients contains a “tipping point” beyond which aggressive treatments no longer extend a patient’s life but instead decrease quality of life and increase medical costs. Predicting when this tipping point will occur for a particular patient, and modifying treatment plans accordingly, can help reduce medical costs and improve patient quality of life. To accurately predict the tipping point in a patient’s health status trajectory from electronic health record data, it is important to detect trends in time-sequence observational data, such as laboratory test results, and to integrate this information with other data about the patient’s health state, such as present and past diagnoses, treatment information, and the like.

A variety of methods have been proposed for making predictions regarding the health status trajectory of a patient. For instance, US Patent Application 2018/0333106 A1 (Patent Document 1) discloses “Methods and systems for predicting deterioration of a patient's condition within a future time interval based on a time series of values for monitored physiological variables measured from a patient, and in some instances, providing advanced notice to clinicians or caregivers when deterioration is forecasted or modifying treatment for the patient are provided. In particular, deterioration of a patient's condition is based on a Hopf bifurcation model and is predicted using a ratio of deviations for monitored physiological variables. A ratio of deviations relates the standard deviation and root mean square of successive differences for a set of physiological values measured over time. The RoD for one or more variables, such as heart rate, respiratory rate, and blood pressure, may be used to predict the likelihood of the patient's condition deteriorating into an unstable state as what occurs in a Hopf bifurcation.”

[Patent Document 1] US Patent Application 2018/0333106 A1

The invention of Patent Document 1 relates to a technique for predicting the occurrence of a future event indicating probable patient deterioration within a short term (for instance, between two hours and six hours) based on physiological variables collected for a continuously monitored patient (e.g., inpatients).

The technique disclosed in Patent Document 1, however, is directed toward predicting short-term health status changes for hospitalized, heavily monitored end-stage patients with only a few weeks or months of life remaining. Patent Document 1 does not disclose techniques for predicting end of life as far as six to twelve months prior to death. As a result, the technique of Patent Document 1 cannot adequately support long-term end-of-life care planning.

In addition, conventional methods of health status trajectory prediction lack the ability to identify time-dependent trends and motifs in electronic health records (EHR) and to use these trends to predict health status changes. Without identifying trends in electronic health records and combining them with other data available in the EHR, it is difficult to make accurate predictions regarding a patient’s future health status trajectory and to identify in advance the tipping point at which deterioration becomes irreversible.

Further, as described above, the technique of Patent Document 1 relates to predicting near-term health status changes for hospitalized, heavily monitored end-stage patients. Cancer outpatients, however, present themselves to the healthcare system based on their own personal judgment. As a result, when a patient’s EHR history is viewed on a timeline, the data points are sparsely and irregularly sampled. Existing health status trajectory prediction systems have difficulty representing the time-dependent evolution of the trends indicated by such EHR data in a form that can be incorporated into risk prediction models. Consequently, the technique of Patent Document 1, as well as other existing methods, uses static rather than dynamic features for making predictions, and therefore cannot properly distinguish between patients at different points on the health status trajectory.

Accordingly, it is an object of the present disclosure to provide a method, apparatus, and system configured to make accurate long-term predictions regarding a patient’s health status based on trends identified from electronic health records in order to reduce the financial and clinical burden of futile treatments.

One representative example of the present disclosure relates to a method for predicting a health status progression for a patient, the method including preprocessing a set of electronic health record data that indicates medical history information for a patient to construct a timeline representation for the set of electronic health record data, segmenting the timeline representation for the set of electronic health record data into a set of lookback windows that designate one or more time periods of the timeline representation, determining, for a lookback window of the set of lookback windows, a set of trajectory features that indicate a change of a feature of the set of electronic health record data, assigning, to a subset of trajectory features that achieve a relevance threshold with respect to predicting the health status progression of the patient, a set of weighting values, and generating, based on the set of weighting values assigned to the subset of trajectory features, a prediction outcome that indicates an expected health progression of the patient.

According to the present invention, it is possible to provide a method, apparatus, and system configured to make accurate long-term predictions regarding a patient’s health status based on trends identified from electronic health records in order to reduce the financial and clinical burden of futile treatments.

Problems, configurations, and effects other than those described above will be made clear by the following description in the embodiments for carrying out the invention.

FIG. 1 illustrates an example computing infrastructure for executing the embodiments of the present disclosure.

FIG. 2 illustrates an example configuration of a system for health status prediction, according to embodiments.

FIG. 3 illustrates an example flow of a method for health status prediction, according to embodiments.

FIG. 4 illustrates an example of a pre-processing pipeline for constructing a timeline representation for a set of electronic health record data, according to embodiments.

FIG. 5 illustrates an example method of pre-processing to construct a timeline representation for a set of electronic health record data, according to embodiments.

FIG. 6 illustrates an example method for generating a prediction outcome based on trend features that represent quantitative data indicating changes in biological measurements for the patient, according to embodiments.

FIG. 7 illustrates an example functional configuration for generating a prediction outcome based on motif features that represent qualitative data indicating changes in event information associated with a temporal period for the patient, according to embodiments.

FIG. 8 illustrates an example method illustrating the functionality of a unified architecture for making health status change predictions, according to embodiments.

FIG. 9 illustrates an example method illustrating the functionality of a unified architecture for making health status change predictions using custom features, according to embodiments.

Description of Embodiment(s)

The following detailed description provides further details of the Figures and example implementations of the present application. Reference numerals and descriptions of redundant elements between figures are omitted for clarity. Terms used throughout the description are provided as examples and are not intended to be limiting. Example implementations as described herein can be utilized either singularly or in combination and the functionality of the example implementations can be implemented through any desired means.

Aspects of the disclosure relate to addressing the problems in the related art by preprocessing electronic health record data to construct a timeline representation, segmenting the timeline representation into a set of lookback windows that designate one or more time periods of the timeline representation (e.g., 1 month, 2 months, 6 months, 1 year), determining a set of trajectory features that indicate a change of a feature (e.g., BMI fluctuation, disease progression states) within a lookback window of the set of lookback windows, assigning a set of weighting values to a subset of the set of trajectory features that achieve a relevance threshold with respect to predicting the health status progression of the patient, and generating a prediction outcome that indicates an expected health progression of the patient. Examples of trajectory features include absolute change, relative change, and percent change; measures of rate of change, such as velocity and acceleration; and measures of variation and dispersion, such as standard deviation, coefficient of variation, and variance-to-mean ratio.

In embodiments, the prediction outcome indicates a risk assessment for one or more prediction horizons. For instance, such risk estimates may include health status changes such as the probability of a patient being alive at the end of a predetermined prediction horizon (e.g., 6-12 months).
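As an illustration only, a binary training label for such a risk estimate could be derived from a patient's recorded date of death relative to the date on which a prediction is made. The sketch below assumes a hypothetical `alive_at_horizon` helper, an index (prediction) date, and a horizon expressed in days (e.g., 182 days for roughly 6 months). It treats a missing death date as "alive", which ignores censoring (loss to follow-up) for simplicity.

```python
from datetime import date
from typing import Optional


def alive_at_horizon(index_date: date, death_date: Optional[date], horizon_days: int = 182) -> int:
    """Return 1 if the patient is alive at index_date + horizon_days, else 0.

    Hypothetical labelling rule: a missing death_date is treated as "alive",
    which ignores censoring (e.g., patients lost to follow-up).
    """
    if death_date is None:
        return 1
    days_to_death = (death_date - index_date).days
    return int(days_to_death > horizon_days)
```

A model trained on labels of this kind would output the probability that the label is 1, i.e., that the patient is still alive at the end of the horizon.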

In embodiments, the set of trajectory features include a set of trend features that represent quantitative data indicating changes in biological measurements for the patient. The set of trend features may include an absolute difference value of a feature within a set of lookback windows, a relative difference value of a feature within a set of lookback windows, a percent change value of a feature within a set of lookback windows, a rate change value of a feature within a set of lookback windows, a velocity of change of a feature within a set of lookback windows, an acceleration of change of a feature within a set of lookback windows, a measure of variation of a feature within a set of lookback windows, an average deviation of a feature within a set of lookback windows, a standard deviation of a feature within a set of lookback windows, a measure of dispersion of a feature within a set of lookback windows, a coefficient of variation of a feature within a set of lookback windows, or a variance-to-mean ratio of a feature within a set of lookback windows.
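The trend features listed above can be computed per lookback window from the (possibly irregularly spaced) measurements of a single feature. The following Python sketch computes a subset of them. The `trend_features` name, the use of days as the time unit, the population (rather than sample) standard deviation, and the definitions of velocity (overall change per day) and acceleration (change in interval velocity between the first and last measurement intervals) are illustrative assumptions, not definitions taken from the disclosure.

```python
import statistics
from typing import Dict, List, Optional, Tuple


def trend_features(obs: List[Tuple[float, float]]) -> Dict[str, Optional[float]]:
    """Compute illustrative trend features for one feature in one lookback window.

    obs: (day, value) pairs for a single feature (e.g., ALP or BMI); days may be
    irregularly spaced. Returns an empty dict for an empty window.
    """
    if not obs:
        return {}
    obs = sorted(obs)  # chronological order
    days = [d for d, _ in obs]
    vals = [v for _, v in obs]
    first, last = vals[0], vals[-1]
    mean = statistics.mean(vals)
    sd = statistics.pstdev(vals)  # population SD (assumption)
    feats: Dict[str, Optional[float]] = {
        "abs_diff": last - first,
        "pct_change": (last - first) / first * 100.0 if first != 0 else None,
        "std": sd,
        "cv": sd / mean if mean != 0 else None,  # coefficient of variation
        "vmr": statistics.pvariance(vals) / mean if mean != 0 else None,  # variance-to-mean ratio
        "velocity": None,
        "acceleration": None,
    }
    span = days[-1] - days[0]
    if span > 0:
        feats["velocity"] = (last - first) / span  # average change per day
    # Interval velocities, each located at the midpoint of its interval;
    # same-day measurement pairs are skipped to avoid division by zero.
    rates = [((d0 + d1) / 2, (v1 - v0) / (d1 - d0))
             for (d0, v0), (d1, v1) in zip(obs, obs[1:]) if d1 > d0]
    if len(rates) >= 2 and rates[-1][0] > rates[0][0]:
        (t0, r0), (t1, r1) = rates[0], rates[-1]
        feats["acceleration"] = (r1 - r0) / (t1 - t0)  # change in velocity per day
    return feats
```

For example, measurements of 10.0, 12.0, and 16.0 on days 0, 10, and 20 give an absolute difference of 6.0, a percent change of 60%, and a velocity of 0.3 units per day.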

In embodiments, the set of trajectory features includes a set of motif features that represent qualitative data indicating changes in event information associated with a temporal period for the patient. The set of motif features may include a sequence of medical treatments for the patient within a set of lookback windows, a sequence of medical diagnoses for the patient within a set of lookback windows, or a sequence of disease progression states for the patient within a set of lookback windows.
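Motif features can be sketched in a similar way. Below, the ordered sequence of event codes (e.g., treatment or diagnosis names) falling inside one lookback window is extracted, and contiguous n-grams of that sequence are counted as simple motifs. The n-gram representation, the integer day offsets, and the hypothetical helpers `event_sequence` and `motif_counts` are assumptions chosen for illustration; the disclosure does not specify how motifs are encoded.

```python
from collections import Counter
from typing import List, Tuple


def event_sequence(events: List[Tuple[int, str]], start: int, end: int) -> List[str]:
    """Return event codes with start <= day < end, in chronological order.

    Sorting by day only keeps the original order of same-day events.
    """
    ordered = sorted(events, key=lambda e: e[0])
    return [code for day, code in ordered if start <= day < end]


def motif_counts(seq: List[str], n: int = 2) -> Counter:
    """Count contiguous length-n subsequences (n-grams) of seq as motif features."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))
```

With events on days 5, 20, 40, and 70, a window covering days 0 to 59 yields the sequence `chemo_A, chemo_A, chemo_B`, whose bigram motifs are one repeat of the same regimen and one switch between regimens.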

In embodiments, generating the prediction outcome includes identifying, as a first trend segment, a sequence of consecutive lookback windows associated with a weighting value above a predetermined threshold, and predicting, using a machine learning technique to create a forecast based on the sequence of consecutive lookback windows, a second trend segment that indicates an expected continuation of the first trend segment as the prediction outcome. In embodiments, the machine learning technique may include a convolutional neural network.
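Identifying the first trend segment amounts to finding maximal runs of consecutive lookback windows whose weighting value exceeds the threshold. A minimal sketch of that step follows. The `trend_segments` helper and its `min_len` parameter (the shortest run treated as a segment) are hypothetical, and the subsequent forecasting step (e.g., a convolutional neural network) is not shown.

```python
from typing import List, Optional, Tuple


def trend_segments(weights: List[float], threshold: float, min_len: int = 2) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end exclusive, of maximal runs of
    consecutive lookback windows whose weighting value exceeds threshold."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, w in enumerate(weights):
        if w > threshold:
            if start is None:
                start = i  # a new run begins
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None  # the run has ended
    # Close a run that extends to the last window.
    if start is not None and len(weights) - start >= min_len:
        segments.append((start, len(weights)))
    return segments
```

For weights `[0.1, 0.6, 0.7, 0.2, 0.8, 0.9, 0.95]` and a threshold of 0.5, the windows at indices 1 to 2 and 4 to 6 form two candidate trend segments.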

End-of-life (EOL) patients consume a significant amount of healthcare resources each year. As described above, the health status trajectory of advanced-stage cancer patients contains a “tipping point” beyond which aggressive treatments no longer extend a patient’s life but instead decrease quality of life and increase medical costs; this combined harm is called clinical and financial toxicity. In aging societies in particular, it is important to control the share of total available healthcare resources consumed by EOL care.

Reduced quality of life (QOL) is a common problem among EOL patients. In recent years, a new approach to EOL management known as “Palliative Care” (PC) has emerged, which is expected to increase QOL while simultaneously lowering EOL costs. Cancer treatments are not only very costly, but also impose heavy burdens on the patients receiving therapy. In accordance with the “value-based healthcare” paradigm of the Centers for Medicare & Medicaid Services (CMS), high-value care favors PC over aggressive treatments for EOL patients. However, oncologists face the problem of not knowing where a cancer patient is on their disease trajectory, or whether the “tipping point” in the health status trajectory has already been passed. For outpatients, current solutions do not address this prognostic gap. Because survival estimates are often inaccurate and life expectancy tends to be overestimated, risk-benefit analyses of anti-cancer treatments can be severely flawed.

Typically, cancer patients undergo several lines of treatment before reaching their EOL, and the duration of progression-free survival decreases with each line of treatment. Cancer disease progression traverses several stages: early, advanced, and end. Prognostic features change dynamically with disease progression. Current systems in the related art focus on hospitalized, end-stage patients with only weeks or months of life left. Thus, the need for outpatient health status prognosis remains unmet. The present disclosure relates to accurately predicting health status changes well in advance of death in order to optimize EOL care.

Aspects of the disclosure relate to providing an accurate prognosis system for advanced stage cancer patients in an outpatient setting which predicts various risks such as health status changes with machine learning based on electronic health records. These predictions may include the “tipping-point” in a patient’s health status trajectory, and the mortality of the patient (specifically, the likelihood of death within a range of 6 to 12 months). The embodiments described herein are intended to be understood by a person skilled in the art. The description herein does not limit the scope of the present invention.

One aspect of the present disclosure relates to a machine-learning based prognosis system that can be integrated in a hospital or with insurance companies, depending on the desired implementation. The machine-learning based prognosis system according to the present disclosure can also be integrated into clinical workflows. Furthermore, the machine-learning based prognosis system according to the present disclosure can be used by patients and providers to facilitate joint decision support regarding treatment path selection (e.g., aggressive cancer care vs. supportive palliative care).

It should be noted that, although examples of the present disclosure are described in relation to health care status predictions for human patients, the present invention is not limited thereto. For instance, aspects of the present disclosure may be applied for making health status predictions for animals, forecasting hardware component (e.g., storage device) lifetimes, or the like.

Referring now to FIG. 1, FIG. 1 illustrates an exemplary computing infrastructure for executing the embodiments of the present disclosure. In particular, FIG. 1 illustrates an example of a computing node 10. Computing node 10 is only one example of a suitable computing device and is not intended to suggest any limitation as to the scope of use or functionality of embodiments of the disclosure described herein. Regardless, computing node 10 is capable of being implemented and/or performing any of the functionality set forth herein. In embodiments, computing node 10 may be configured to operate as a cloud computing node or server in a distributed computing environment for implementing the functions of the present disclosure described herein.

In computing node 10 there is a computer system/server 12, which is operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well-known computing systems, environments, and/or configurations that may be suitable for use with computer system/server 12 include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems or devices, and the like.

Computer system/server 12 may be described in the general context of computer system executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, and so on that perform particular tasks or implement particular abstract data types. Computer system/server 12 may be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computer system storage media including memory storage devices.

As illustrated in FIG. 1, computer system/server 12 in the computing node 10 is shown in the form of a general-purpose computing device. The components of computer system/server 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including system memory 28 to processor 16.

Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, Enhanced ISA (EISA) bus, Video Electronics Standards Association (VESA) local bus, and Peripheral Component Interconnect (PCI) bus.

Computer system/server 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer system/server 12, and it includes both volatile and non-volatile media, removable and non-removable media.

System memory 28 can include computer system readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. Computer system/server 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, a storage system 34 can be provided for reading from and writing to a non-removable, non-volatile magnetic media (not shown and typically called a "hard drive"). Although not shown, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable, non-volatile optical disk such as a CD-ROM, DVD-ROM or other optical media can be provided. In such instances, each can be connected to bus 18 by one or more data media interfaces. As will be further depicted and described below, memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the disclosure.

Program/utility 40, having a set (at least one) of program modules 42, may be stored in memory 28 by way of example, and not limitation, as well as an operating system, one or more application programs, other program modules, and program data. Each of the operating system, one or more application programs, other program modules, and program data or some combination thereof, may include an implementation of a networking environment. Program modules 42 generally carry out the functions and/or methodologies of embodiments of the disclosure as described herein.

Computer system/server 12 may also communicate with one or more external devices 14 such as a keyboard, a pointing device, a display 24, etc.; one or more devices that enable a user to interact with computer system/server 12; and/or any devices (e.g., network card, modem, etc.) that enable computer system/server 12 to communicate with one or more other computing devices. Such communication can occur via Input/Output (I/O) interfaces 22. Still yet, computer system/server 12 can communicate with one or more networks such as a local area network (LAN), a general wide area network (WAN), and/or a public network (e.g., the Internet) via network adapter 20. As depicted, network adapter 20 communicates with the other components of computer system/server 12 via bus 18. It should be understood that although not shown, other hardware and/or software components could be used in conjunction with computer system/server 12. Examples include, but are not limited to, microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data archival storage systems.

Next, an example configuration of a system for health status predictions according to embodiments of the present disclosure will be described with respect to FIG. 2.

FIG. 2 illustrates an example configuration of a health status prediction system 100, according to embodiments. As illustrated in FIG. 2, the health status prediction system 100 primarily includes a set of electronic health record systems 101, 105, an I/O interface 102, a pre-processing unit 103, a health status prediction server 106, and a plurality of destination facilities 107, 110. The various components of the health status prediction system 100 may be connected by means of a communication network such as a local area network (LAN), the Internet, or the like. In embodiments, the health status prediction system 100 may be configured as a server-client architecture, in which one or more electronic health record systems provide electronic health record data to a central server (e.g., health status prediction server 106) configured to perform the functions of the present invention, and provide prediction outcomes to one or more destination facilities (e.g., clinics, hospitals, personal computing devices, insurance companies) for consideration and analysis.

In general, the electronic health record systems 101, 105 may include one or more data stores of health records for one or more patients. The electronic health record systems 101, 105 may, for instance, store information regarding patient demographics (patient IDs, sex, birth date), patient visits (visit IDs, visit type), laboratory data (e.g., height, weight, white blood cell count, overall health condition) or the like. In embodiments, the electronic health record system 101 may include medical history data prepared in advance for training the machine learning technique of the present disclosure, and the electronic health record system 105 may be configured to collect and store medical information for a patient 104 that will be used as test data for the health status prediction server 106. In some embodiments, the electronic health record systems 101, 105 may be implemented as a cloud-based platform, or may be distributed across multiple physical locations.

In particular, the electronic health record systems 101, 105 may store features relevant to predicting the health status progression of a patient. As known to those skilled in the art, examples of features that may be stored in the electronic health record systems 101, 105 include alkaline phosphatase (ALP), hemoglobin (HGB), lactate dehydrogenase (LDH), white blood cells (WBC), albumin, bilirubin, calcium, creatinine, lymphocyte count, lymphocyte percentage, monocyte count, monocyte percentage, absolute neutrophil count (ANC), platelets, sodium, and body mass index (BMI). Other features related to cancer prognosis may include the Condensed Memorial Symptom Assessment Scale (CMSAS) score, the Eastern Cooperative Oncology Group (ECOG) score, the Karnofsky Performance Scale (KPS) score, and the Modified Early Warning Score (MEWS). As described herein, in order to predict health status changes in the health trajectory of a patient well in advance, aspects of the disclosure relate to modeling the evolution and time dependency of features and events stored in the electronic health record systems, rather than making pointwise predictions based on snapshots in time.

The I/O interface 102 may provide a means for transferring information between the electronic health record system 101 and the pre-processing unit 103. For example, the I/O interface 102 may read information designated for analysis from one or more tables stored in the electronic health record system 101 and transfer it to the pre-processing unit 103. The pre-processing unit 103 may be configured to preprocess the electronic health records received from the electronic health record system 101 in order to construct a timeline representation for the set of electronic health record data.

The health status prediction server 106 may include a software module, hardware component, integrated circuit, or the like configured to implement the functions of the present disclosure. As illustrated in FIG. 2, the health status prediction server 106 may include a segmentation unit 115 configured to segment the timeline representation for the set of electronic health record data into a set of lookback windows that designate one or more time periods of the timeline representation, a determination unit 116 configured to determine, for a lookback window of the set of lookback windows, a set of trajectory features that indicate a change of a feature of the set of electronic health record data, an assignment unit 117 configured to assign a set of weighting values to a subset of trajectory features that achieve a relevance threshold with respect to predicting the health status progression of the patient, and a generation unit 118 configured to generate, based on the set of weighting values assigned to the subset of trajectory features, a prediction outcome that indicates an expected health progression of the patient. In addition, the health status prediction server 106 may include one or more machine learning models (e.g., convolutional neural networks, long short-term memory (LSTM) networks, gradient boosting, decision trees) configured to perform various aspects of the functionality of the above units.
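One way to picture the assignment unit 117 and the generation unit 118 is with a linear model: each trajectory feature carries a learned coefficient as its weighting value, features whose absolute coefficient reaches the relevance threshold are kept, and the kept features are combined through a logistic function into a probability. This is a deliberately simple stand-in (logistic regression) for the models named above, not the disclosed implementation; the function names, feature names, and coefficient values are hypothetical.

```python
import math
from typing import Dict


def select_relevant(coefficients: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep trajectory features whose |coefficient| reaches the relevance threshold;
    the kept coefficients serve as the weighting values."""
    return {name: c for name, c in coefficients.items() if abs(c) >= threshold}


def predict_probability(features: Dict[str, float], weights: Dict[str, float],
                        bias: float = 0.0) -> float:
    """Combine weighted features with a numerically stable logistic function.
    Features missing from the input contribute 0."""
    z = bias + sum(w * features.get(name, 0.0) for name, w in weights.items())
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for large negative z
    return e / (1.0 + e)
```

In a real system the coefficients and the bias would come from a model trained on historical records (such as those held in the electronic health record system 101), and the relevance measure could equally be a tree-based feature importance.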

The destination facilities 107, 110 may include institutions configured to receive the prediction outcome generated by the health status prediction server 106. As examples, the destination facilities 107, 110 may include clinics, hospitals, insurance companies, individuals, research facilities, or the like. As illustrated in FIG. 2, each destination facility 107, 110 may include an output unit 108, 114, respectively. The output units 108, 114 may include computing devices configured to provide the prediction outcome received from the health status prediction server 106 to users (e.g., users 109, 112, 113, 114).

As an example, in embodiments, the health status prediction system 100 may be configured to connect to the electronic health record system 101 of a health institution (e.g., a hospital) via the I/O interface 102. Data necessary for training the machine learning technique of the health status prediction server 106 may be retrieved via the I/O interface 102 and pre-processed by the pre-processing unit 103 to create a machine-learning-ready dataset. When a new patient 104 for whom a health status change prediction is to be made comes into contact with the health institution, medical information for the patient 104 is entered into the electronic health record system 105 and input to the health status prediction server 106, which performs the functions of the present disclosure (described later) to generate a prediction outcome that indicates the “tipping point” in the patient’s health trajectory or a risk estimate for a predetermined prediction horizon (e.g., 6 months).

Subsequently, the prediction outcome is transmitted to the destination facilities 107, 110. For example, at destination facility 107, the output unit 108 may display the prediction outcome (e.g., a tipping point or a risk estimate for a 6-month prediction horizon) to a physician 109 to facilitate a risk-benefit analysis of a variety of treatments. As another example, at destination facility 110, the output unit 114 may display the prediction outcome to a patient 112 and a physician 113 to facilitate prognosis and discussion of treatment path options.

Next, an example flow of a method for health status prediction will be described with respect to FIG. 3.

FIG. 3 illustrates an example flow of a health status prediction method 300, according to embodiments. As described herein, aspects of the health status prediction method 300 are directed toward generating a prediction outcome regarding a patient’s health status based on trajectory features for a set of lookback windows defined for a timeline representation of the patient’s electronic health record data. The health status prediction method 300 may be associated with benefits including improved accuracy of long-term health status predictions, reduced medical costs, and improved quality of life. The steps of the health status prediction method 300 may be carried out by software modules, dedicated hardware units, integrated circuits, or any other suitable means of achieving each respective function. For instance, the steps of the health status prediction method 300 may be carried out by the pre-processing unit 103 together with the segmentation unit 115, the determination unit 116, the assignment unit 117, and the generation unit 118 of the health status prediction server 106 of FIG. 2.

At step 310, the pre-processing unit 103 may preprocess a set of electronic health record data for a patient to construct a timeline representation for the set of electronic health record data. As described herein, the set of electronic health record data may indicate medical history information (e.g., sex, birth date, visit information, BMI, white blood cell count) for the patient. Generally, pre-processing may include transforming, revising, preparing, altering, refining, modifying, or otherwise converting the set of electronic health record data into the timeline representation. For example, the pre-processing unit may aggregate the data for a particular patient, sort it by date, perform data cleaning operations, apply data heuristics to interpolate missing values, and create a timeline representation that illustrates the diagnoses, visits, test results, health status changes, and other medical events for the patient in chronological order. Details of the pre-processing performed herein will be described later. Other methods of preprocessing the set of electronic health record data to construct the timeline representation are also possible.

At step 320, the segmentation unit 115 may segment the timeline representation for the set of electronic health record data into a set of lookback windows that designate one or more time periods of the timeline representation. Generally, segmenting can include splitting, partitioning, breaking down, sectioning, or otherwise dividing the timeline representation into a set of lookback windows. As used herein, a lookback window may refer to a defined time span within the timeline representation. The lookback window may be specified to be any desired length of time. The length of a lookback window is referred to as its size. As examples, in a timeline representation that spans 1 year, lookback windows of 1 day, 1 week, 3 weeks, 1 month, 2 months, 4 months, 6 months, 1 year, or the like may be defined. Defining lookback windows for the timeline representation provides a way to cope with periods of sparse data (e.g., time periods in which a patient has not visited a hospital or clinic, so that no data is available) and facilitates identification of trend patterns in the data.
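The segmentation in step 320 can be pictured with a short Python sketch. It assumes a hypothetical event format of (date, feature name, value) tuples and tiles the timeline with consecutive, non-overlapping windows of one size ending at an index date; the names `segment_timeline`, `index_date`, `window_days` and `n_windows` are illustrative and do not come from the disclosure. Empty windows are kept as empty lists, so sparse periods remain visible instead of silently disappearing.

```python
from datetime import date, timedelta
from typing import List, Tuple

# Hypothetical event format: (event date, feature name, value).
Event = Tuple[date, str, float]

def segment_timeline(events: List[Event], index_date: date,
                     window_days: int, n_windows: int) -> List[List[Event]]:
    """Split the timeline into n_windows consecutive, non-overlapping windows of
    window_days each, ending at index_date. Oldest window first; each window is
    the half-open interval (start, end]."""
    windows: List[List[Event]] = []
    for k in range(n_windows, 0, -1):
        end = index_date - timedelta(days=(k - 1) * window_days)
        start = end - timedelta(days=window_days)
        windows.append(sorted(e for e in events if start < e[0] <= end))
    return windows
```

Calling the function once per window size (e.g., 30, 90 and 180 days) would yield the multi-size segmentation described above.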

In embodiments, the segmentation unit 115 may determine the number and size of the lookback windows based on a set of characteristic factors of the electronic health record data. The set of characteristic factors of the electronic health record data may include, for instance, the period of time covered by the timeline representation, the density of the collected data points, health diagnoses for the patient, a disease progression state, a treatment sequence, the type of the data included in the timeline representation (e.g., BMI data, blood pressure data, white blood cell data), or the like. In embodiments, the segmentation unit 115 may be configured to utilize a machine learning technique trained to automatically assign appropriate numbers and sizes for the set of lookback windows based on the above-described characteristic factors. Alternatively, the segmentation unit 115 may be configured to reference a database that defines recommended numbers and sizes for the set of lookback windows based on the above-described characteristic factors of the electronic health record data. Other methods of segmenting the timeline representation into the set of lookback windows are also possible.
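As one hypothetical version of the database-lookup alternative, the sketch below chooses a window size from the density of data points and then derives how many windows fit in the span covered by the timeline. The density bands and window sizes in `DENSITY_BANDS` are invented for illustration; a deployed system would take them from the reference database or from a trained model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WindowPlan:
    window_days: int  # size of each lookback window
    n_windows: int    # number of consecutive windows

# Hypothetical bands: (maximum data points per month, window size in days).
# Sparser records get wider windows so that each window is likely to hold data.
DENSITY_BANDS = [(1.0, 90), (4.0, 30), (float("inf"), 7)]

def recommend_windows(span_days: int, n_points: int) -> WindowPlan:
    months = max(span_days / 30.0, 1.0)  # treat spans shorter than a month as one month
    density = n_points / months
    window_days = next(size for limit, size in DENSITY_BANDS if density <= limit)
    return WindowPlan(window_days, max(1, span_days // window_days))
```

For a one-year timeline with six data points this returns `WindowPlan(window_days=90, n_windows=4)`, while a densely sampled year falls back to weekly windows.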

At step 330, the determination unit 116 may be configured to determine a set of trajectory features that indicate a change of a feature of the set of electronic health record data. The set of trajectory features may be determined for a lookback window of the set of lookback windows. Generally, determining the set of trajectory features for a lookback window may include computing, ascertaining, deriving, calculating, or otherwise identifying the set of trajectory features for the lookback window. Here, the “feature” of the set of electronic health record data may refer to a particular property or attribute of the set of electronic health record data. For instance, features may include quantitative, measured values for the patient such as BMI, white blood cell count, albumin, cholesterol, or the like, or qualitative assessments of the patient’s health such as disease diagnoses, treatment prescriptions, or the like. Accordingly, as used herein, a trajectory feature may refer to an indication of how a particular feature changes over the time period specified by a particular lookback window. In embodiments, the set of trajectory features may be derived for one or more lookback windows of the set of lookback windows using a machine learning technique (e.g., a convolutional neural network).

A trajectory feature may be categorized as either a trend feature (a mathematical representation of how a numerical, quantitative feature changes over time) or a motif feature (a representation of a temporal relationship between qualitative events with respect to a patient). For instance, the set of trend features may include the following values of a feature within a set of lookback windows: the absolute difference, the relative difference, the percent change, the rate of change, the velocity of change, the acceleration of change, a measure of variation, the average deviation, the standard deviation, the coefficient of variation, the variance-to-mean ratio, a measure of dispersion, or the like. As an example, determining the set of trajectory features may include calculating that the white blood cell count of a cancer patient has a rate of change of +5.2% over a 4-month period.
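The trend features listed above are simple statistics of the measurements inside one window. The following is a minimal Python sketch, assuming each window is given as sorted (day offset, value) pairs with at least two distinct days and a non-zero first value and mean. It uses population (not sample) dispersion statistics, which is a choice made here rather than something the disclosure specifies. The illustrative values reproduce the +5.2% white blood cell example.

```python
import statistics
from typing import Dict, List, Tuple

def trend_features(points: List[Tuple[float, float]]) -> Dict[str, float]:
    """Trend features for one lookback window.

    points: (day offset within the window, measured value), sorted by day.
    Assumes >= 2 points on distinct days, a non-zero first value and a non-zero mean.
    """
    days = [d for d, _ in points]
    values = [v for _, v in points]
    first, last = values[0], values[-1]
    mean = statistics.mean(values)
    std = statistics.pstdev(values)          # population standard deviation
    variance = statistics.pvariance(values)  # population variance
    return {
        "absolute_difference": last - first,
        "percent_change": (last - first) / first * 100.0,
        "velocity_per_day": (last - first) / (days[-1] - days[0]),
        "standard_deviation": std,
        "coefficient_of_variation": std / mean,
        "variance_to_mean_ratio": variance / mean,
    }

wbc = [(0, 5.0), (60, 5.1), (120, 5.26)]  # illustrative counts over about 4 months
print(round(trend_features(wbc)["percent_change"], 1))  # 5.2
```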

In embodiments, motif features may include a sequence of medical treatments for the patient within a set of lookback windows, a sequence of medical diagnoses for the patient within a set of lookback windows, or a sequence of disease progression states for the patient within a set of lookback windows. As an example, determining the set of trajectory features may include analyzing the one or more lookback windows and determining the order in which treatments (or, alternatively, diagnoses or progression states) were performed on a patient in relation to the rate of disease progression. The trajectory features determined here can be correlated with historical medical data for other patients to facilitate generation of a prediction outcome.
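One simple way to represent motif features, sketched below under stated assumptions, is to read off the ordered sequence of event codes inside a window and count its contiguous sub-sequences as candidate motifs. The treatment labels are hypothetical, and collapsing consecutive repeats of the same code is a design choice made here so that repeated cycles of one treatment count as a single step.

```python
from collections import Counter
from datetime import date
from typing import List, Tuple

def event_sequence(events: List[Tuple[date, str]], start: date, end: date) -> List[str]:
    """Ordered event codes inside the window [start, end), consecutive repeats collapsed."""
    in_window = sorted(e for e in events if start <= e[0] < end)
    sequence: List[str] = []
    for _, code in in_window:
        if not sequence or sequence[-1] != code:
            sequence.append(code)
    return sequence

def candidate_motifs(sequence: List[str], length: int = 2) -> Counter:
    """Count every contiguous sub-sequence of the given length as a candidate motif."""
    return Counter(tuple(sequence[i:i + length]) for i in range(len(sequence) - length + 1))
```

In a learned model, such counts (or one-hot encodings of the sequence) would be the input from which relevant motifs are selected, rather than the motifs themselves.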

The types of trajectory features derived for a lookback window may be determined based on the nature of the data. For example, if the number of available data points does not exceed a predetermined threshold (e.g., 10 data points), the standard deviation need not be calculated, and if the data points are substantially flat (e.g., show little to no change), velocity and acceleration need not be calculated. Depending on whether a trajectory feature is a trend feature or a motif feature (e.g., a sequence of medical treatments, medical diagnoses, or disease progression states for the patient within a set of lookback windows), the method of using the machine learning technique to determine the prediction outcome for the patient may differ. Processing flows for each type of trajectory feature will be described later.
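The data-dependent choice of trend features can be sketched as a small rule set. The 10-point threshold comes from the example in the text; the flatness tolerance `FLAT_TOLERANCE` (relative range of the values) is a hypothetical parameter introduced only for this illustration.

```python
import statistics
from typing import List

MIN_POINTS_FOR_DISPERSION = 10  # example threshold from the text
FLAT_TOLERANCE = 0.01           # hypothetical: relative range at or below this counts as flat

def select_trend_features(values: List[float]) -> List[str]:
    """Decide which trend features are worth computing for one window (non-empty input)."""
    selected = ["absolute_difference", "percent_change"]
    if len(values) > MIN_POINTS_FOR_DISPERSION:
        selected.append("standard_deviation")
    scale = max(abs(statistics.mean(values)), 1e-9)  # avoid dividing by a zero mean
    is_flat = (max(values) - min(values)) / scale <= FLAT_TOLERANCE
    if not is_flat and len(values) >= 2:
        selected.append("velocity")
    if not is_flat and len(values) >= 3:
        selected.append("acceleration")
    return selected
```

A flat series of five readings yields only the two difference features, whereas twelve varying readings yield all five.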

At step 340, the assignment unit 117 may be configured to assign a set of weighting values to a subset of trajectory features that achieve a relevance threshold with respect to predicting the health status progression of the patient. Generally, assigning can include designating, computing, allocating, distributing, calculating, specifying, or otherwise determining the set of weighting values for the subset of trajectory features that achieve a relevance threshold with respect to predicting the health status progression of the patient. As used herein, the weighting values may refer to a numerical score (e.g., from 0 to 100, where higher scores indicate a higher degree of impact) that represents the degree of impact a particular trajectory feature is anticipated to have on the result of the prediction outcome.

In embodiments, assigning the set of weighting values may include using a machine learning technique trained on historical medical data to identify those trajectory features from among the set of trajectory features that achieve a relevance threshold with respect to predicting the health status progression of the patient, and subsequently assigning the set of weighting values to the identified subset of trajectory features to quantify the degree of their impact. As used herein, the relevance threshold may refer to a preset boundary condition that assesses the level of pertinence or importance a particular trajectory feature has for predicting the progression of a particular health condition. As an example, trajectory features that indicate a change in values for white blood cell count and lymphocyte count, two factors known to be indicative of cancer progression, may be identified as achieving the relevance threshold for a cancer patient and assigned higher weighting values, whereas trajectory features corresponding to height (e.g., a feature not significantly relevant to cancer progression) may be determined to not achieve the relevance threshold. Other methods of assigning the weighting value to the subset of trajectory features are also possible.
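The relevance filtering and 0-to-100 scoring of step 340 can be sketched as follows, assuming a trained model has already produced a non-negative relevance score for each trajectory feature. The scores, the feature names and the rescaling rule (the most relevant feature receives 100) are hypothetical illustrations, not the scoring used in the disclosure.

```python
from typing import Dict

def assign_weights(relevance: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep features whose relevance reaches the threshold and rescale them to 0-100."""
    kept = {name: s for name, s in relevance.items() if s >= threshold}
    top = max(kept.values(), default=0.0)
    if top <= 0.0:
        return {name: 0.0 for name in kept}  # nothing informative to rank
    return {name: round(100.0 * s / top, 1) for name, s in kept.items()}

scores = {"wbc_velocity": 0.42, "lymphocyte_pct_change": 0.35, "height_abs_diff": 0.01}
print(assign_weights(scores, threshold=0.1))
# {'wbc_velocity': 100.0, 'lymphocyte_pct_change': 83.3}
```

Mirroring the example above, the height-related feature falls below the relevance threshold and receives no weighting value at all.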

At step 350, the generation unit 118 may be configured to generate a prediction outcome that indicates an expected health progression for the patient. The prediction outcome may be determined based on the set of weighting values assigned to the subset of trajectory features. Generally, generating can include producing, developing, computing, deriving, calculating, or otherwise creating the prediction outcome that indicates an expected health progression for the patient. Generating the prediction outcome may include identifying, as a first trend segment, a sequence of consecutive lookback windows associated with a weighting value above a predetermined threshold, and predicting, using a machine learning technique configured to create a forecast based on the sequence of consecutive lookback windows, a second trend segment that indicates an expected continuation of the first trend segment. In this way, by identifying consecutive lookback windows that all satisfy a predetermined weighting threshold, it is possible to ascertain trends in pertinent features of the health status of a patient and extend those trends to make predictions relevant to the future progression of the patient’s health trajectory. As examples, the prediction outcome may be represented as a risk estimate for a series of prediction horizons. Here, a prediction horizon may refer to a length of time for which the prediction is applicable. For example, the expected health progression of the prediction outcome may indicate the probability that the patient will still be alive for prediction horizons of 1 month, 2 months, 6 months, 1 year, or the like. As an additional example, the expected health progression for the patient may include a projected timeline that includes forecasted dates of particular disease progression states, hospice care, the tipping point for the patient, death, recovery, or the like. Other methods of generating the prediction outcome are also possible.
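Step 350 can be illustrated with a sketch that (a) finds the first trend segment as the longest run of consecutive windows whose weighting value exceeds a threshold and (b) extends that run forward. The disclosure uses a machine learning forecaster for (b); ordinary least-squares linear extrapolation is swapped in here purely to keep the example self-contained, and preferring the more recent run on ties is an assumption.

```python
from typing import List, Optional, Tuple

def first_trend_segment(weights: List[float], threshold: float) -> Optional[Tuple[int, int]]:
    """Longest run of consecutive windows whose weight exceeds the threshold, returned
    as inclusive (start, end) indices; ties go to the later (more recent) run."""
    best: Optional[Tuple[int, int]] = None
    run_start: Optional[int] = None
    for i, w in enumerate(list(weights) + [float("-inf")]):  # sentinel closes a final run
        if w > threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if best is None or i - run_start >= best[1] - best[0] + 1:
                best = (run_start, i - 1)
            run_start = None
    return best

def forecast_second_segment(values: List[float], steps: int) -> List[float]:
    """Extend a segment by least-squares linear extrapolation (stand-in for a learned
    forecaster). Requires at least two values."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values)) / sxx
    intercept = y_mean - slope * x_mean
    return [intercept + slope * (n + k) for k in range(steps)]
```

In use, `first_trend_segment` would be applied to the per-window weighting values, and the feature values of the returned windows would then be passed to `forecast_second_segment`.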

Next, an example pre-processing pipeline will be described with reference to FIG. 4.

FIG. 4 illustrates an example of a pre-processing pipeline 400 for constructing a timeline representation for a set of electronic health record data, according to embodiments. As described herein, the health status prediction system of the present disclosure may be configured to preprocess the electronic health data illustrated in FIG. 2 to construct a timeline representation which can be used as an input to the machine learning system.

As illustrated in FIG. 2, the data stored in the electronic health record systems (e.g., electronic health record systems 101, 105) may be stored in different tables. For example, as illustrated in FIG. 4, the electronic health record systems may include a patient demographic data table 401, a visit information table 402, a laboratory data table 403, and the like. Each patient may be assigned a unique patient ID, and encounters with the healthcare system may be tracked using a unique visit ID. These unique keys, in addition to corresponding timestamps, enable aggregation of patient data according to a variety of criteria. For example, the pre-processing unit 404 (which substantially corresponds to the pre-processing unit 103 of FIG. 2) can aggregate patient data on the basis of visit dates, or on the basis of a user-defined time interval (e.g., 1 day, 1 week).

As described herein, the pre-processing unit 404 may perform a variety of data cleaning steps to prepare the electronic health record data for construction of the timeline representation. For instance, the pre-processing unit may check for incorrect electronic health record entries or abnormal readings, and ensure that values are measured in the same units and on compatible scales. In addition, the pre-processing unit 404 may fill in missing data (automatic interpolation) using a variety of heuristics (e.g., clinical heuristics that fill a missing value with the most recent available value in the electronic health record system within a pre-defined time interval).
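The two cleaning steps mentioned above, unit harmonization and the "most recent value within a pre-defined interval" heuristic, might look like this in Python. The conversion table covers only albumin (1 g/dL = 10 g/L) and is purely illustrative; `max_gap_days` stands in for the pre-defined interval, whose actual value the disclosure does not state.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical conversion factors into a canonical unit per test (albumin -> g/dL).
TO_CANONICAL = {("albumin", "g/dL"): 1.0, ("albumin", "g/L"): 0.1}

def harmonize(test: str, value: float, unit: str) -> float:
    """Convert a lab value into the canonical unit for its test (KeyError if unknown)."""
    return value * TO_CANONICAL[(test, unit)]

def fill_forward(series: List[Tuple[date, Optional[float]]],
                 max_gap_days: int) -> List[Tuple[date, Optional[float]]]:
    """Replace a missing value (None) with the most recent observed value, but only
    if that observation is at most max_gap_days old; otherwise leave it missing."""
    filled: List[Tuple[date, Optional[float]]] = []
    last_date: Optional[date] = None
    last_value: Optional[float] = None
    for day, value in series:
        if value is not None:
            last_date, last_value = day, value
            filled.append((day, value))
        elif last_date is not None and (day - last_date).days <= max_gap_days:
            filled.append((day, last_value))
        else:
            filled.append((day, None))
    return filled
```

Leaving stale gaps as `None` rather than carrying values forward indefinitely keeps long periods without data visible to the later segmentation step.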

Next, the pre-processing unit 404 may construct a timeline representation 405 for each patient. As described herein, the timeline representation 405 may be a data structure that illustrates the diagnoses, visits, test results, health status changes, and other medical events for the patient in chronological order. For example, as illustrated in FIG. 4, the timeline representation 405 may illustrate the initial diagnosis, lab test results, metastatic disease diagnosis, hospice care order, and date of death for a patient in chronological order. It should be noted that the timeline representation 405 illustrated in FIG. 4 serves as a timeline for training the machine learning technique (e.g., as it includes events from initial diagnosis until death), whereas actual test data for a living patient may only include electronic health record data for the patient from initial diagnosis until the present date. Other methods of constructing the timeline representation are also possible.

Next, an example pre-processing method will be described with respect to FIG. 5.

FIG. 5 illustrates an example pre-processing method for constructing a timeline representation for a set of electronic health record data, according to embodiments. As illustrated in FIG. 5, at step 510, the pre-processing unit (e.g., the pre-processing unit 103 of FIG. 2 or the pre-processing unit 404 of FIG. 4) aggregates (e.g., gathers, collects) electronic health record data from the electronic health record system 301 for each patient. Next, at step 520, the pre-processing unit groups (e.g., sorts, arranges, organizes) the electronic health record data on a per patient basis according to user defined settings (e.g., based on visit date or the like). At step 530, the pre-processing unit applies heuristics to fill in missing values (e.g., automatic interpolation). At step 540, the pre-processing unit performs clinical classification of laboratory values included in the electronic health record data. Here, the clinical classification may include grouping, sorting, or otherwise categorizing the laboratory values to derive conclusions regarding the nature of the health of the patient (e.g., categorizing a BMI value of less than 18.5 as underweight).
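As a concrete instance of the clinical classification in step 540, the following sketch maps an adult BMI value onto the standard WHO categories (underweight below 18.5, normal from 18.5 to under 25, overweight from 25 to under 30, obese at 30 and above). Other laboratory values would need their own reference ranges, which are not specified here.

```python
def classify_bmi(bmi: float) -> str:
    """Map an adult BMI value (kg/m^2) to a WHO weight category."""
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"
```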

Next, at step 550, the pre-processing unit may construct a timeline representation for the electronic health record data for the patient. This timeline representation may substantially correspond to the timeline representation illustrated in FIG. 4. At step 560, the pre-processing unit 404 may verify whether a timeline representation has been constructed for all the patients listed in the electronic health record system. In the case that a timeline representation has not been constructed for all the patients in the electronic health record system, the pre-processing unit 404 returns to step 510 and repeats the steps for another patient. In the case that a timeline representation has been constructed for all the patients in the electronic health record system, the pre-processing unit 404 outputs the timeline representations and the method completes.
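The per-patient loop of FIG. 5 can be sketched as a grouping step followed by a chronological sort. The record layout `(patient_id, date, event_type, value)` and the optional `clean` hook, which stands in for steps 530 and 540, are assumptions made for this example; looping over every grouped patient plays the role of the completion check in step 560.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, List, Tuple

# Hypothetical flat record layout: (patient_id, date, event_type, value).
Record = Tuple[str, date, str, object]

def _no_cleaning(records: List[Record]) -> List[Record]:
    return records

def build_timelines(records: List[Record],
                    clean: Callable[[List[Record]], List[Record]] = _no_cleaning,
                    ) -> Dict[str, List[Record]]:
    grouped: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:  # steps 510-520: aggregate and group records per patient
        grouped[rec[0]].append(rec)
    timelines: Dict[str, List[Record]] = {}
    for patient_id, recs in grouped.items():  # repeat until every patient is done (560)
        ordered = sorted(recs, key=lambda r: r[1])  # chronological order
        timelines[patient_id] = clean(ordered)      # steps 530-550
    return timelines
```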

As described herein, subsequent to generation of the timeline representation, aspects of the disclosure are directed toward generating a prediction outcome regarding a patient’s health status based on trajectory features for a set of lookback windows defined for the timeline representation of the patient’s electronic health record data. As an example, aspects of the disclosure relate to utilizing various deep learning models, including “trend-aware” convolutional neural networks (CNNs) combined with long short-term memory (LSTM) neural networks, to model time-dependent data from electronic health records and make predictions about future health status changes. For instance, an LSTM may be used for progression modeling of time-sequence events such as disease progression or metastasis progression. Additionally, for trend feature analysis and motif feature analysis, a “trend-aware” CNN may be used that applies a segmentation approach to enhance the data stream with various trajectory and clinical classification features in order to find predictive trends and motifs. Depending on the data type, data streams may be split into different channels for “trend-aware” CNN modeling.

As described above, the method of determining the prediction outcome for a patient may differ depending on whether the identified trajectory features are trend features (e.g., a mathematical representation of how a numerical, quantitative feature changes over time) or motif features (a representation of a temporal relationship between qualitative events with respect to a patient). Accordingly, an example of generating the prediction outcome for a patient based on trend features and an example of generating the prediction outcome for a patient based on motif features will be described with respect to FIG. 6 and FIG. 7, respectively.

FIG. 6 illustrates an example method 600 for generating a prediction outcome based on trend features that represent quantitative data indicating changes in biological measurements for the patient, according to embodiments. As an example, the method 600 may be used to generate a prediction outcome in cases where the trajectory features identified from the set of lookback windows include trend features such as BMI, white blood cell count, albumin, bilirubin, calcium, creatinine, lymphocyte count, lymphocyte percentage, monocyte count, monocyte percentage, absolute neutrophils count (ANC), platelets, sodium, or the like.

First, at step 610, electronic health record data is read from the electronic health record system 601, and the pre-processing described above (e.g., the steps illustrated in the method 500 of FIG. 5) is performed to generate a timeline representation. Next, at step 620, the timeline representation is segmented into a set of lookback windows with different lookback window sizes (e.g., 1 month, 3 months, 6 months, 10 months, 1 year). As described herein, the number and size of the lookback windows may be determined based on characteristic factors (e.g., the period of time covered by the timeline representation, the density of the collected data points, the disease progression state) of the electronic health record data.

Next, at step 630, trajectory features may be determined for one or more lookback windows of the set of lookback windows. As described herein, determining the set of trajectory features for a lookback window may include computing, ascertaining, deriving, calculating, or otherwise identifying the set of trajectory features for the lookback window using a machine learning technique. As described herein, the trajectory features determined at step 630 may include trend features that represent quantitative data indicating changes in biological measurements (e.g., BMI, white blood cell count, albumin, cholesterol) for the patient. As examples, the trend features may include an absolute difference value of a feature (e.g., change in BMI of -1.7), a relative difference value of a feature, a percent change of a feature (e.g., percent change in BMI of -5%), a rate change of a feature, a velocity of change of a feature (e.g., white blood cell count increase of 1.4% per week), an acceleration of change of a feature (e.g., rate of change of cholesterol decreasing by 4.3% per month), a measure of variation of a feature, an average deviation of a feature, a standard deviation of a feature, a coefficient of variation of a feature, a variance-to-mean ratio of a feature, or a measure of dispersion of a feature within a set of lookback windows.

Next, at step 640, a machine learning technique such as a convolutional neural network may identify a subset of trend features that achieve a relevance threshold with respect to predicting the health status progression of the patient. As described herein, the relevance threshold may refer to a preset boundary condition that assesses the level of pertinence or importance a particular trajectory feature has for predicting the progression of a particular health condition. In embodiments, the relevance threshold may be defined based on a correlation between particular trend features and past outcomes illustrated in historical data. For instance, the machine learning technique may be trained using past medical data to identify correlations between particular trend features and particular outcomes (e.g., disease outcomes, diagnoses, death within a particular time span, recovery), and once trained, may apply convolutional feature maps to the set of trend features to identify those trend features that are particularly indicative of health changes in the patient. Other methods of identifying the subset of trend features are also possible.
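To show what applying a convolutional feature map computes, here is a hand-written version of one CNN filter over a per-window series, followed by max pooling. In a trained network the kernel weights are learned from historical data; the kernel `[-1.0, 1.0]` used below is a hypothetical hand-picked filter that responds to increases between consecutive windows. Like most deep learning libraries, the code computes a cross-correlation (the kernel is not flipped).

```python
from typing import List

def conv1d_relu(signal: List[float], kernel: List[float], bias: float = 0.0) -> List[float]:
    """'Valid' 1-D cross-correlation followed by ReLU, as in one CNN feature map."""
    k = len(kernel)
    return [max(0.0, bias + sum(kernel[j] * signal[i + j] for j in range(k)))
            for i in range(len(signal) - k + 1)]

def max_pool(feature_map: List[float]) -> float:
    """Global max pooling: the strongest response anywhere in the series."""
    return max(feature_map) if feature_map else 0.0

rising_filter = [-1.0, 1.0]  # hypothetical filter that fires on window-to-window increases
print(conv1d_relu([0, 1, 2, 3], rising_filter))            # [1.0, 1.0, 1.0]
print(max_pool(conv1d_relu([3, 2, 1, 0], rising_filter)))  # 0.0
```

A rising series activates the filter in every position while a falling one never does, which is the kind of signal a learned filter can associate with particular outcomes.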

Next, at step 650, a set of weighting values may be assigned to the subset of trend features identified in step 640. As described herein, the weighting values may refer to a numerical score (e.g., from 0 to 100, where higher scores indicate a higher degree of impact) that represents the degree of impact a particular trajectory feature is anticipated to have on the result of the prediction outcome. As an example, trend features that indicate a change in values for white blood cell count and lymphocyte count, two factors known to be indicative of cancer progression, may be identified as achieving the relevance threshold for a cancer patient and assigned higher weighting values, whereas trend features corresponding to cholesterol (e.g., a feature not significantly relevant to cancer progression) may be assigned a lower weighting value. In embodiments, assigning the weighting values may be performed by an attention layer of the aforementioned machine learning network.

At step 660, a prediction outcome may be generated based on the set of weighting values assigned to the subset of trend features. Generating the prediction outcome may include identifying, as a first trend segment, a sequence of consecutive lookback windows associated with a weighting value above a predetermined threshold, and predicting, using a machine learning technique configured to create a forecast based on the sequence of consecutive lookback windows, a second trend segment that indicates an expected continuation of the first trend segment. In this way, by identifying consecutive lookback windows that all satisfy a predetermined weighting threshold, it is possible to ascertain trends in pertinent features of the health status of a patient and extend those trends to make predictions relevant to the future progression of the patient’s health trajectory. As examples, the prediction outcome may be represented as a risk estimate for a series of prediction horizons. For example, the prediction outcome may indicate the probability that the patient will still be alive for prediction horizons of 1 month, 2 months, 6 months, 1 year, or the like. Other methods of generating the prediction outcome are also possible.

FIG. 7 illustrates an example functional configuration for generating a prediction outcome based on motif features that represent qualitative data indicating changes in event information associated with a temporal period for the patient, according to embodiments. As an example, the method 700 may be used to generate a prediction outcome in cases where the trajectory features identified from the set of lookback windows include motif features such as a sequence of medical treatments for the patient, a sequence of medical diagnoses (e.g., diagnoses encoded according to the International Statistical Classification of Diseases and Related Health Problems) for the patient within a set of lookback windows, a sequence of disease progression states for the patient within a set of lookback windows, or the like. In the following description, aspects of the method 700 that substantially correspond to aspects of the previously described method 600 will be omitted, and the description will focus on those aspects particular to the method 700.

At step 710, electronic health record data is read from the electronic health record system 701, and the pre-processing described above (e.g., the steps illustrated in the method 500 of FIG. 5) is performed to generate a timeline representation. Next, at step 720, the timeline representation is segmented into a set of lookback windows with different lookback window sizes (e.g., 1 month, 3 months, 6 months, 10 months, 1 year).

At step 730, clinical classification may be performed on the set of trajectory features for each lookback window. Clinical classification may include grouping, sorting, or otherwise categorizing the trajectory features to derive conclusions regarding the nature of the health of the patient (e.g., categorizing a BMI value of less than 18.5 as underweight). Clinical classification may be performed using a standardized clinical encoding system (e.g., the Clinical Care Classification System, or the International Statistical Classification of Diseases and Related Health Problems) to categorize patient conditions into one or more of a number of designated classes. Performing clinical classification may serve as a dimensionality reduction step that significantly reduces the potential motif search space. Furthermore, performing clinical classification may improve the interpretability of the data, since clinically classified features make it easier for physicians to understand the clinical context of identified motif features. Other methods of performing clinical classification are also possible.
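The dimensionality reduction achieved by clinical classification can be illustrated with ICD-10 codes. Truncating a full code to its three-character category merges closely related codes into one symbol. For example, C34.1 and C34.9 both become C34 (malignant neoplasm of bronchus and lung), so the number of distinct motifs to search shrinks. Truncation is only one possible grouping; the disclosure does not prescribe a specific one.

```python
from typing import List

def to_icd10_category(code: str) -> str:
    """Map a full ICD-10 code (e.g. 'C34.1') to its three-character category ('C34')."""
    return code.replace(".", "").strip().upper()[:3]

def reduce_sequence(codes: List[str]) -> List[str]:
    """Apply the category mapping to every code in a diagnosis sequence."""
    return [to_icd10_category(c) for c in codes]

codes = ["C34.1", "C34.9", "C78.0"]
print(len(set(codes)), len(set(reduce_sequence(codes))))  # 3 2
```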

Next, at step 740, a machine learning technique such as a convolutional neural network may identify a subset of motif features that achieve a relevance threshold with respect to predicting the health status progression of the patient. For instance, the machine learning technique may be trained using past medical data to identify correlations between particular motif features and particular outcomes, and once trained, may apply convolutional feature maps to the set of motif features to identify those motif features that are particularly indicative of health changes in the patient. Other methods of identifying the subset of motif features are also possible.

Next, at step 750, a set of weighting values may be assigned to the subset of motif features identified in step 740. As described herein, the weighting values may refer to a numerical score (e.g., from 0 to 100, where higher scores indicate a higher degree of impact) that represents the degree of impact a particular trajectory feature is anticipated to have on the result of the prediction outcome.

At step 760, a prediction outcome may be generated based on the set of weighting values assigned to the subset of motif features. Generating the prediction outcome may include identifying, as a first trend segment, a sequence of consecutive lookback windows associated with a weighting value above a predetermined threshold, and predicting, using a machine learning technique configured to create a forecast based on the sequence of consecutive lookback windows, a second trend segment that indicates an expected continuation of the first trend segment. In this way, by identifying consecutive lookback windows that all satisfy a predetermined weighting threshold, it is possible to ascertain trends in pertinent features of the health status of a patient and extend those trends to make predictions relevant to the future progression of the patient’s health trajectory. As examples, the prediction outcome may be represented as a risk estimate for a series of prediction horizons. For example, the prediction outcome may indicate the probability that the patient will still be alive for prediction horizons of 1 month, 2 months, 6 months, 1 year, or the like. Other methods of generating the prediction outcome are also possible.

FIG. 8 illustrates an example method 800 that demonstrates the functionality of a unified architecture for making health status change predictions, according to embodiments. As described above, in embodiments, aspects of the disclosure relate to generating prediction outcomes using a convolutional neural network configured to identify qualitative motif features as well as a convolutional neural network configured to identify quantitative trend features. In addition, aspects of the disclosure relate to a unified machine learning model 800 that includes both of the above-described convolutional neural network models as well as a long short-term memory (LSTM) neural network architecture. By combining the outputs of several neural networks, each configured to analyze a different type of data, the unified configuration can provide greater flexibility and accuracy in prediction results. In the following description, aspects of the method 800 that substantially correspond to aspects of the previously described methods 600 or 700 will be omitted, and the description will focus on those aspects particular to the method 800.

At step 805, pre-processing is performed on the electronic health record data extracted from the electronic health record system 801 to construct a timeline representation. Subsequently, this data is separated into different data streams based on the content and format of the data. For instance, data that includes the above-described trend features or motif features may be sent to step 807 for trend and motif modeling. In contrast, data that does not include trend features or motif features, but instead includes sequence-dependent features, may be sent to step 830 for sequence modeling.

First, at step 807, trend and motif modeling is performed on the data stream. Next, at step 809, a segmentation technique is applied to split the received timeline representation for the patient into a set of segments with different lookback windows. Subsequently, timeline representations including quantitative trend features such as numerical biological measurements are routed to step 810, and timeline representations including diagnosis or metastasis information are directed to step 820.
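The routing of steps 805 to 809 can be expressed as a small lookup table. Which record types go to which branch, and whether one record may feed several branches (a diagnosis appears both as a motif event and as a sequence-dependent event in this description), are assumptions encoded in the hypothetical `ROUTES` table; unknown record types are simply dropped in this sketch.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical routing table: record type -> processing branches of method 800.
ROUTES: Dict[str, FrozenSet[str]] = {
    "lab_value": frozenset({"810_trend"}),
    "diagnosis": frozenset({"820_motif", "830_sequence"}),
    "metastasis": frozenset({"820_motif", "830_sequence"}),
    "treatment": frozenset({"820_motif"}),
}

def route(records: List[Tuple[str, object]]) -> Dict[str, list]:
    """Send each (record type, payload) pair to every branch listed for its type."""
    streams: Dict[str, list] = defaultdict(list)
    for rtype, payload in records:
        for branch in ROUTES.get(rtype, frozenset()):  # unknown types are dropped
            streams[branch].append(payload)
    return dict(streams)
```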

At step 810, trend features are determined for one or more lookback windows of the timeline representation in order to capture temporal trends in the evolution of numerical values in the patient’s timeline representation. For each of the lookback windows that make up the set of segments, trajectory features are extracted, such as the absolute change of a numerical value between the start and end time points of the lookback window, the velocity of change, the acceleration of change, the coefficient of variation, the variance-to-mean ratio, and other derivatives. Next, at step 812, convolutional feature maps may be applied to the lookback window segments that include the determined trajectory features in order to identify a subset of trend features that achieve a relevance threshold with respect to predicting the health status progression of the patient. At step 814, an attention layer may assign weighting values to the identified trend features. At step 816, a softmax function may be used to normalize the weighting values assigned to the identified trend features.
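The per-window trajectory features named above can be computed roughly as follows for one numeric measurement. This is a sketch under stated assumptions. Velocity is taken as the overall change divided by the elapsed time. Acceleration is the change between the first and last consecutive-sample slopes, placed at interval midpoints. Variance is the population variance. The disclosure does not fix these definitions.

```python
import statistics
from typing import Dict, List, Tuple

def trajectory_features(points: List[Tuple[float, float]]) -> Dict[str, float]:
    """points: (t_days, value) pairs with strictly increasing t, at least three."""
    if len(points) < 3:
        raise ValueError("need at least three measurements in the window")
    ts = [t for t, _ in points]
    vs = [v for _, v in points]
    abs_change = vs[-1] - vs[0]                  # end value minus start value
    velocity = abs_change / (ts[-1] - ts[0])     # average rate of change
    # Slopes between consecutive samples, located at interval midpoints.
    slopes = [(vs[i + 1] - vs[i]) / (ts[i + 1] - ts[i]) for i in range(len(vs) - 1)]
    mids = [(ts[i] + ts[i + 1]) / 2 for i in range(len(ts) - 1)]
    acceleration = (slopes[-1] - slopes[0]) / (mids[-1] - mids[0])
    mean = statistics.fmean(vs)
    var = statistics.pvariance(vs)
    return {
        "absolute_change": abs_change,
        "percent_change": 100.0 * abs_change / vs[0],
        "velocity": velocity,
        "acceleration": acceleration,
        "coefficient_of_variation": var ** 0.5 / mean,
        "variance_to_mean_ratio": var / mean,
    }
```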

At step 820, clinical classification is performed on the segmented timeline representation to facilitate dimensionality reduction and interpretability of the motif features. Next, at step 822, convolutional feature maps may be applied to the lookback window segments that include the motif features in order to identify a subset of motif features that achieve a relevance threshold with respect to predicting the health status progression of the patient. At step 824, an attention layer may assign weighting values to the identified motif features. At step 826, a softmax function may be used to normalize the weighting values assigned to the identified motif features.
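The clinical classification of step 820 and the resulting motifs can be pictured as mapping detailed codes to coarse categories and then counting short ordered category patterns per lookback window. The category table below is a hypothetical stand-in for a curated clinical grouping. Treating motifs as consecutive category pairs (bigrams) is an assumption made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical grouping of ICD-10 three-character prefixes into coarse categories.
CATEGORY_BY_PREFIX = {
    "C34": "lung_primary",   # malignant neoplasm of bronchus and lung
    "C78": "metastasis",     # secondary neoplasm, respiratory and digestive organs
    "C79": "metastasis",     # secondary neoplasm, other sites
    "J18": "pneumonia",
}

def classify(code: str) -> str:
    return CATEGORY_BY_PREFIX.get(code[:3], "other")

def motif_sequence(codes: Iterable[str]) -> Tuple[str, ...]:
    """Collapse a window's time-ordered codes into categories, dropping immediate repeats."""
    seq: List[str] = []
    for cat in map(classify, codes):
        if not seq or seq[-1] != cat:
            seq.append(cat)
    return tuple(seq)

def motif_counts(window_sequences: Iterable[Tuple[str, ...]], n: int = 2) -> Counter:
    """Count length-n category motifs across lookback windows."""
    counts: Counter = Counter()
    for seq in window_sequences:
        for i in range(len(seq) - n + 1):
            counts[seq[i:i + n]] += 1
    return counts
```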

At step 830, sequence modeling is applied to the electronic health record data to model the progression of sequence-dependent events such as diagnoses. Subsequently, at step 832, a machine learning method such as an LSTM neural network may process the data to identify features that achieve a relevance threshold with respect to predicting the health status progression of the patient. At step 834, an attention layer may assign weighting values to the identified features. At step 836, a softmax function may be used to normalize the weighting values assigned to the identified features.
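Steps 832 to 836 can be sketched as a standard LSTM forward pass followed by softmax-normalized attention over its hidden states. This is an untrained toy under stated assumptions: `TinyLSTM`, the random weight initialization, one-hot event inputs, and dot-product attention against a `query` vector are illustrative choices, not the disclosed configuration. It is written in plain Python so it runs without third-party libraries.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores: Vector) -> Vector:
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def matvec(matrix: List[Vector], v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]

class TinyLSTM:
    """Single-layer LSTM with random, untrained weights (illustration only)."""

    def __init__(self, input_size: int, hidden_size: int, seed: int = 0):
        rng = random.Random(seed)

        def mat(rows: int, cols: int) -> List[Vector]:
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden_size = hidden_size
        # One (W, U, b) triple per gate: input (i), forget (f), output (o), candidate (g).
        self.params = {
            gate: (mat(hidden_size, input_size), mat(hidden_size, hidden_size), [0.0] * hidden_size)
            for gate in "ifog"
        }

    def _pre(self, gate: str, x: Vector, h: Vector) -> Vector:
        W, U, b = self.params[gate]
        return [wx + uh + bias for wx, uh, bias in zip(matvec(W, x), matvec(U, h), b)]

    def run(self, xs: List[Vector]) -> List[Vector]:
        """Return the hidden state after each input step."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        states = []
        for x in xs:
            i = [sigmoid(z) for z in self._pre("i", x, h)]
            f = [sigmoid(z) for z in self._pre("f", x, h)]
            o = [sigmoid(z) for z in self._pre("o", x, h)]
            g = [math.tanh(z) for z in self._pre("g", x, h)]
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            states.append(h)
        return states

def attend(states: List[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Dot-product attention scores (cf. step 834) normalized by softmax (cf. step 836)."""
    weights = softmax([sum(a * b for a, b in zip(s, query)) for s in states])
    context = [sum(w * s[k] for w, s in zip(weights, states)) for k in range(len(query))]
    return weights, context
```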

At step 840, the outputs of the softmax functions of the data streams (e.g., the sequence features, motif features, and trend features) may be summed together to generate a prediction outcome regarding the health status progression of the patient. As one example, the prediction outcome may indicate the “tipping point” in a patient’s health status trajectory. As another example, the prediction outcome may indicate a mortality likelihood within a defined time interval, such as 6 months. As a further example, the prediction outcome may be a sequence of survival odds for increasingly distant prediction horizons. As described herein, combining the outputs of several neural networks, each configured to analyze a different type of data, can provide greater flexibility and accuracy in prediction results.
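One plausible reading of step 840 is sketched below. Each stream yields an attention-weighted context vector of the same size. The vectors are summed element-wise, and linear output heads turn the sum into either a single-horizon mortality probability or a survival curve over successive horizons. The equal vector sizes, the logistic heads, and the discrete-hazard survival construction are all assumptions. Survival odds for a horizon follow as `s / (1 - s)` from each curve value `s`.

```python
import math
from typing import List

Vector = List[float]

def fuse(stream_contexts: List[Vector]) -> Vector:
    """Element-wise sum of equally sized per-stream context vectors."""
    return [sum(vals) for vals in zip(*stream_contexts)]

def logistic(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def mortality_risk(fused: Vector, weights: Vector, bias: float) -> float:
    """Probability of death within one fixed horizon (e.g. 6 months)."""
    return logistic(sum(w * x for w, x in zip(weights, fused)) + bias)

def survival_curve(fused: Vector, horizon_weights: List[Vector], horizon_biases: Vector) -> Vector:
    """Survival probability at successive horizons from per-interval hazards.

    Multiplying (1 - hazard) terms keeps the curve non-increasing over the horizons.
    """
    survival, curve = 1.0, []
    for w, b in zip(horizon_weights, horizon_biases):
        hazard = logistic(sum(wi * x for wi, x in zip(w, fused)) + b)
        survival *= 1.0 - hazard
        curve.append(survival)
    return curve
```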

FIG. 9 illustrates an example method 900 illustrating the functionality of a unified architecture for making health status change predictions using custom features, according to embodiments. As described herein, custom features refer to user-defined settings that allow for tracking and monitoring of particular features or contributions of features within electronic health data. As the electronic health record data evolves over time (e.g., new data may be added to the electronic health record system as patients interact with healthcare systems), aspects of the architecture relate to generating predictions at multiple time points to track the progression of trend features and motif features over time. In the following description, aspects of the method 900 that substantially correspond to aspects of the previously described method 800 will be omitted, and the description will focus on those aspects particular to the method 900.

As illustrated in FIG. 9, time points t₀ to t_n in a patient’s timeline are modeled with LSTM units 930 and 940, respectively. For each time point, the data available up to that specific time point can be used to drive predictions.
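Restricting each time point to the data available up to that point amounts to taking a prefix of the timeline per time point. Below is a minimal sketch that assumes integer day offsets and inclusive cut-offs. The `(day, payload)` record shape is hypothetical.

```python
from typing import Any, Dict, List, Tuple

Record = Tuple[int, Any]  # (day offset, payload); a hypothetical simplified record

def snapshots(timeline: List[Record], time_points: List[int]) -> Dict[int, List[Record]]:
    """For each time point t, keep only records observed on or before t,
    so no later information leaks into that time point's prediction."""
    return {t: [rec for rec in timeline if rec[0] <= t] for t in time_points}
```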

At step 905, electronic health record data from the electronic health record system 901 is pre-processed to generate a timeline representation and split into several data streams that are used as inputs to the LSTM 930. One of these data streams may be a timeline representation of the raw data from the EHR system 901. From the raw data, a set of custom features can be created at step 925. Creating custom features may enable physicians to enter clinically relevant feature derivatives into the model. As an example, a clinician who is investigating the relationship between a particular disease treatment and the side effects of a specific drug may define a custom feature that monitors the relationship between the disease treatment and the drug over time. After the custom features are defined, clinical classification may also be performed at step 925, and the results may be input to the LSTM 930.
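A custom feature of the kind described for step 925 can be modeled as a named, user-supplied function over a patient's records. In this hypothetical sketch, the registry, the decorator, the record format, and the codes `DRUG_X` and `SIDE_EFFECT` are illustrative only. The example feature measures the days from the first administration of a drug to the first later side-effect record.

```python
from typing import Callable, Dict, List, Tuple

Record = Tuple[int, str]  # (day, event code); a simplified, hypothetical record format
CUSTOM_FEATURES: Dict[str, Callable[[List[Record]], float]] = {}

def custom_feature(name: str):
    """Decorator that registers a user-defined feature function under `name`."""
    def register(fn: Callable[[List[Record]], float]) -> Callable[[List[Record]], float]:
        CUSTOM_FEATURES[name] = fn
        return fn
    return register

@custom_feature("days_drug_to_side_effect")
def days_drug_to_side_effect(records: List[Record]) -> float:
    """Days from the first 'DRUG_X' record to the first later 'SIDE_EFFECT' record; NaN if absent."""
    starts = [d for d, code in records if code == "DRUG_X"]
    if not starts:
        return float("nan")
    first = min(starts)
    effects = [d for d, code in records if code == "SIDE_EFFECT" and d >= first]
    return float(min(effects) - first) if effects else float("nan")

def compute_custom_features(records: List[Record]) -> Dict[str, float]:
    """Evaluate every registered custom feature on one patient's records."""
    return {name: fn(records) for name, fn in CUSTOM_FEATURES.items()}
```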

At step 907, trend and motif modeling is performed on one of the data streams from step 905. At step 909, segmentation is applied to split the received patient timeline representation into a set of segments with different lookback windows. Subsequently, timeline representations including quantitative trend features such as numerical biological measurements are routed to step 910, and timeline representations including motif features such as diagnosis or metastasis information are directed to step 920. For timeline representations that include trend features, trajectory features are determined at step 910, a subset of trend features that achieve a relevance threshold with respect to predicting the health status progression of the patient is identified at step 912 (e.g., using convolutional feature maps), and the output of these steps is input to the LSTM 930. Similarly, for timeline representations that include motif features, clinical classification is performed at step 920, a subset of motif features that achieve a relevance threshold with respect to predicting the health status progression of the patient is identified, and the output of these steps is input to the LSTM 930.

The raw data, custom features, trend features, and motif features corresponding to the first point in time t₀ are input to the LSTM 930. Subsequently, the process described above is repeated for another point in time, t_n. Similarly, the raw data, custom features, trend features, and motif features for t_n are input to the LSTM 940. Each of the LSTM 930 and the LSTM 940 may generate a prediction result based on the raw data, custom features, trend features, and motif features received for that respective time point.

Subsequently, at step 935, a global attention layer may be used to assign weighting values to the features of each of the received prediction outcomes and to generate a final prediction outcome that aggregates the results of the LSTM 930 and the LSTM 940. By generating prediction outcomes for different time points and using a global attention layer to aggregate these individual predictions into a final result, the time-dependent nature of particular factors can be modeled more accurately, resulting in more accurate prediction outcomes. Accordingly, accurate short-term and long-term predictions regarding changes in the health status of a patient can be made.
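The aggregation at step 935 can be sketched as attention over the per-time-point outputs: each output vector is scored, the scores are normalized with softmax, and the outputs are averaged using those weights. The scoring by dot product with a learned `score_vector` is an assumption, and the values used below are arbitrary illustrations rather than trained parameters.

```python
import math
from typing import List, Tuple

Vector = List[float]

def softmax(xs: Vector) -> Vector:
    m = max(xs)                          # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def global_attention(outputs: List[Vector], score_vector: Vector) -> Tuple[Vector, Vector]:
    """Weight per-time-point outputs (e.g. from LSTM 930 through LSTM 940) and aggregate them."""
    scores = [sum(a * b for a, b in zip(h, score_vector)) for h in outputs]
    weights = softmax(scores)
    dim = len(outputs[0])
    final = [sum(w * h[k] for w, h in zip(weights, outputs)) for k in range(dim)]
    return weights, final
```

With a zero score vector every time point receives equal weight. A score vector aligned with one output shifts almost all of the weight onto that time point.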

The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.

The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.

Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.

These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.

The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.

Embodiments according to this disclosure may be provided to end-users through a cloud-computing infrastructure. Cloud computing generally refers to the provision of scalable computing resources as a service over a network. More formally, cloud computing may be defined as a computing capability that provides an abstraction between the computing resource and its underlying technical architecture (e.g., servers, storage, networks), enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction. Thus, cloud computing allows a user to access virtual computing resources (e.g., storage, data, applications, and even complete virtualized computing systems) in “the cloud,” without regard for the underlying physical systems (or locations of those systems) used to provide the computing resources.

The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.

While the foregoing is directed to exemplary embodiments, other and further embodiments of the invention may be devised without departing from the basic scope thereof, and the scope thereof is determined by the claims that follow. The descriptions of the various embodiments of the present disclosure have been presented for purposes of illustration, but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the various embodiments. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. “Set of,” “group of,” “bunch of,” etc. are intended to include one or more. It will be further understood that the terms “includes” and/or “including,” when used in this specification, specify the presence of the stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. In the previous detailed description of exemplary embodiments of the various embodiments, reference was made to the accompanying drawings (where like numbers represent like elements), which form a part hereof, and in which is shown by way of illustration specific exemplary embodiments in which the various embodiments may be practiced. These embodiments were described in sufficient detail to enable those skilled in the art to practice the embodiments, but other embodiments may be used and logical, mechanical, electrical, and other changes may be made without departing from the scope of the various embodiments. In the previous description, numerous specific details were set forth to provide a thorough understanding of the various embodiments. However, the various embodiments may be practiced without these specific details. In other instances, well-known circuits, structures, and techniques have not been shown in detail in order not to obscure embodiments.

Claims

A method for predicting a health status progression for a patient, the method comprising:
preprocessing a set of electronic health record data that indicates medical history information for a patient to construct a timeline representation for the set of electronic health record data;
segmenting the timeline representation for the set of electronic health record data into a set of lookback windows that designate one or more time periods of the timeline representation;
determining, for a lookback window of the set of lookback windows, a set of trajectory features that indicate a change of a feature of the set of electronic health record data;
assigning, to a subset of trajectory features that achieve a relevance threshold with respect to predicting the health status progression of the patient, a set of weighting values; and
generating, based on the set of weighting values assigned to the subset of trajectory features, a prediction outcome that indicates an expected health progression of the patient.
The method of claim 1, wherein the prediction outcome indicates a risk estimate for one or more prediction horizons.
The method of claim 1, wherein the set of trajectory features include a set of trend features that represent quantitative data indicating changes in biological measurements for the patient.
The method of claim 3, wherein the set of trend features include at least one or more of:
an absolute difference value of a feature within a set of lookback windows;
a relative difference value of a feature within a set of lookback windows;
a percent change value of a feature within a set of lookback windows;
a rate change value of a feature within a set of lookback windows;
a velocity of change of a feature within a set of lookback windows;
an acceleration of change of a feature within a set of lookback windows;
a measure of variation of a feature within a set of lookback windows;
an average deviation of a feature within a set of lookback windows;
a standard deviation of a feature within a set of lookback windows;
a measure of dispersion of a feature within a set of lookback windows;
a coefficient of variation of a feature within a set of lookback windows; or
a variance-to-mean ratio of a feature within a set of lookback windows.
The method of claim 1, wherein the set of trajectory features include a set of motif features that represent qualitative data indicating changes in event information associated with a temporal period for the patient.
The method of claim 5, wherein the set of motif features include at least one or more of:
a sequence of medical treatments for the patient within a set of lookback windows;
a sequence of medical diagnoses for the patient within a set of lookback windows; or
a sequence of disease progression states for the patient within a set of lookback windows.
The method of claim 1, wherein generating the prediction outcome further includes:
identifying, as a first trend segment, a sequence of consecutive lookback windows associated with a weighting value above a predetermined threshold; and
predicting, using a machine learning technique to create a forecast based on the sequence of consecutive lookback windows, a second trend segment that indicates an expected continuation of the first trend segment as the prediction outcome.
The method of claim 7, wherein the machine learning technique includes a convolutional neural network.
The method of claim 7, further comprising
ascertaining, using the convolutional neural network, a number and a size of the set of lookback windows based on a set of characteristic factors that characterize the set of electronic health data.
The method of claim 9, wherein the set of characteristic factors includes at least one or more of:
a time period covered by the timeline representation;
a density of data points;
a health diagnosis for the patient;
a disease progression state;
a treatment sequence; and
a data type.
An apparatus for predicting a health status progression for a patient, the apparatus comprising:
a segmentation unit configured to segment a timeline representation of a set of electronic health record data that indicates medical history information for a patient into a set of lookback windows that designate one or more time periods of the timeline representation;
a determination unit configured to determine, for a lookback window of the set of lookback windows, a set of trajectory features that indicate a change of a feature of the set of electronic health record data;
an assignment unit configured to assign, to a subset of trajectory features that achieve a relevance threshold with respect to predicting the health status progression of the patient, a set of weighting values; and
a generation unit configured to generate, based on the set of weighting values assigned to the subset of trajectory features, a prediction outcome that indicates an expected health progression of the patient.
A system for predicting a health status progression for a patient in a distributed computing environment, the system comprising:
a health status prediction server configured to communicate with one or more electronic health record databases configured to store electronic health record information for a patient and one or more client devices via a communication network, the health status prediction server including:
a segmentation unit configured to:
segment a timeline representation of a set of electronic health record data received from the one or more electronic health record databases into a set of lookback windows that designate one or more time periods of the timeline representation, and
ascertain, using a convolutional neural network, a number and a size of the set of lookback windows based on a set of characteristic factors that characterize the set of electronic health data;
a determination unit configured to determine, for a lookback window of the set of lookback windows, a set of trajectory features that indicate a change of a feature of the set of electronic health record data;
an assignment unit configured to assign, to a subset of trajectory features that achieve a relevance threshold with respect to predicting the health status progression of the patient, a set of weighting values; and
a generation unit configured to:
generate, based on the set of weighting values assigned to the subset of trajectory features, a prediction outcome that indicates an expected health progression of the patient by:
identifying, as a first trend segment, a sequence of consecutive lookback windows associated with a weighting value above a predetermined threshold, and
predicting, using the convolutional neural network to create a forecast based on the sequence of consecutive lookback windows, a second trend segment that indicates an expected continuation of the first trend segment as the prediction outcome; and
transmit the prediction outcome to the one or more client devices via the communication network;
wherein:
the prediction outcome indicates a risk estimate for one or more prediction horizons;
the set of trajectory features include a set of trend features that represent quantitative data indicating changes in biological measurements for the patient;
the set of trend features include at least one or more of:
an absolute difference value of a feature within a set of lookback windows,
a relative difference value of a feature within a set of lookback windows,
a percent change value of a feature within a set of lookback windows,
a rate change value of a feature within a set of lookback windows,
a velocity of change of a feature within a set of lookback windows,
an acceleration of change of a feature within a set of lookback windows,
a measure of variation of a feature within a set of lookback windows,
an average deviation of a feature within a set of lookback windows,
a standard deviation of a feature within a set of lookback windows,
a measure of dispersion of a feature within a set of lookback windows,
a coefficient of variation of a feature within a set of lookback windows, or
a variance-to-mean ratio of a feature within a set of lookback windows;
the set of trajectory features include a set of motif features that represent qualitative data indicating changes in event information associated with a temporal period for the patient;
the set of motif features include at least one or more of:
a sequence of medical treatments for the patient within a set of lookback windows,
a sequence of medical diagnoses for the patient within a set of lookback windows, or
a sequence of disease progression states for the patient within a set of lookback windows; and
the set of characteristic factors includes at least one or more of:
a time period covered by the timeline representation;
a density of data points;
a health diagnosis for the patient;
a disease progression state;
a treatment sequence; and
a data type.
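The trend-segment step recited in the claims involves two operations: finding consecutive lookback windows whose weighting value exceeds a threshold, then forecasting a continuation. Both can be illustrated as follows. This is a hedged sketch. Choosing the longest qualifying run is an assumption. The claims name a convolutional neural network as the forecaster, and the ordinary least-squares line used here is a deliberately simple, plainly swapped-in stand-in for it.

```python
from typing import List, Optional, Tuple

def first_trend_segment(weights: List[float], threshold: float) -> Optional[Tuple[int, int]]:
    """Longest run of consecutive window indices with weight > threshold, as (start, end_exclusive)."""
    best: Optional[Tuple[int, int]] = None
    start: Optional[int] = None
    for i, w in enumerate(weights + [float("-inf")]):  # sentinel closes a trailing run
        if w > threshold and start is None:
            start = i
        elif w <= threshold and start is not None:
            if best is None or i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best

def forecast_continuation(values: List[float], steps: int) -> List[float]:
    """Extend a least-squares line through per-window values by `steps` windows
    (a simple stand-in for the convolutional forecaster named in the claims)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two windows to fit a trend")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values)) / sxx
    intercept = y_mean - slope * x_mean
    return [intercept + slope * (n + k) for k in range(steps)]
```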