US20230376774A1 - Adversarial Cooperative Imitation Learning for Dynamic Treatment - Google Patents
Adversarial Cooperative Imitation Learning for Dynamic Treatment
- Publication number
- US20230376774A1 (application Ser. No. 18/362,166)
- Authority
- US
- United States
- Prior art keywords
- trajectories
- model
- discriminator
- resulted
- negative
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/20—Ensemble learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/088—Non-supervised learning, e.g. competitive learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/046—Forward inferencing; Production systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H20/00—ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/30—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment
Definitions
- the present invention relates to providing medical treatments to patients, and, more particularly, to determining tailored treatments that are adjusted over time according to the changing state of the patients.
- Determining treatments for individual patients has historically been performed by highly skilled doctors, who apply their experience and training to assess the patient's needs and provide a course of treatment.
- However, the fallibility of human judgment can lead to errors.
- a method for responding to changing conditions includes training a model, using a processor, using trajectories that resulted in a positive outcome and trajectories that resulted in a negative outcome. Training is performed using an adversarial discriminator to train the model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome, and using a cooperative discriminator to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome.
- a dynamic response regime is generated using the trained model and environment information. A response to changing environment conditions is performed in accordance with the dynamic response regime.
- a method for treating a patient includes training a model on historical treatment trajectories, including trajectories that resulted in a positive health outcome and trajectories that resulted in a negative health outcome.
- a dynamic treatment regime is generated for a patient using the trained model and patient information. The patient is treated in accordance with the dynamic treatment regime, in a manner that is responsive to changing patient conditions, by triggering one or more medical devices to administer a treatment to the patient.
- a system for treating a patient includes a machine learning model, configured to generate a dynamic response regime using environment information.
- a model trainer is configured to train the machine learning model, including trajectories that resulted in a positive outcome and trajectories that resulted in a negative outcome, by using an adversarial discriminator to train the machine learning model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome, and by using a cooperative discriminator to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome.
- a response interface is configured to trigger a response to changing environment conditions in accordance with the dynamic response regime.
- FIG. 1 is a block diagram showing a patient being monitored and treated by a system that uses a dynamic treatment regime to react to changing patient conditions, in accordance with an embodiment of the present invention
- FIG. 2 is a block/flow diagram of a method for generating and implementing a dynamic treatment regime for a patient, in accordance with an embodiment of the present invention
- FIG. 3 is a block/flow diagram of a method for training a machine learning model to generate dynamic treatment regimes, in accordance with an embodiment of the present invention
- FIG. 4 is pseudo-code for a learning process for a machine learning model to generate dynamic treatment regimes, in accordance with an embodiment of the present invention
- FIG. 5 is a block diagram of a dynamic treatment regime system that generates and implements a dynamic treatment regime, in accordance with an embodiment of the present invention
- FIG. 6 is a diagram of an exemplary neural network structure, in accordance with an embodiment of the present invention.
- FIG. 7 is a diagram of an exemplary neural network structure with weights, in accordance with an embodiment of the present invention.
- Embodiments of the present invention provide a dynamic treatment regime (DTR), a sequence of tailored treatment decisions that specify how treatments should be adjusted through time, in accordance with the dynamic states of patients.
- Rules in the DTR can take input information, such as a patient's medical history, laboratory results, and demographic information, and output recommended treatments to improve the effectiveness of the treatment program.
- the present embodiments can make use of deep reinforcement learning techniques, for example to learn treatment policies from doctors' previous treatment plans.
- the present embodiments do so in such a way as to avoid the compounding errors that can result from supervised methods that are based on behavior cloning and the sparsity of self-defined reward signals in reinforcement learning models.
- Treatment paths are considered that include both positive trajectories, where a positive health outcome was achieved for a patient, and negative trajectories, where a negative health outcome resulted. By using both positive and negative trajectories, productive strategies are learned, and unproductive strategies are avoided.
- the present embodiments use an adversarial cooperative imitation learning (ACIL) model to determine the dynamic treatment regimes that produce positive outcomes, while staying away from negative trajectories.
- Two discriminators can be used, including an adversarial discriminator and a cooperative discriminator.
- the adversarial discriminator minimizes the discrepancies between the output trajectories and the positive trajectories in a set of training data, while the cooperative discriminator distinguishes the negative trajectories from the positive trajectories and the output trajectories.
- Reward signals from the discriminators are used to refine the policy that generates dynamic treatment regimes.
- DTRs are generated in response to specific patient information. These DTRs are then implemented, by providing the specified care and treatment to the patients, responsive to the changing condition for each patient.
- the present embodiments thereby reduce the likelihood of a negative health outcome and provide superior dynamic treatment regimens.
- a patient 102 is shown.
- the patient 102 may, for example, have a medical condition that is being treated.
- One or more sensors 104 monitor information about the patient's condition, and provide the information to patient monitor 106 .
- This information may include vital signs, such as heart rate, blood oxygen saturation, blood pressure, body temperature, and blood sugar levels.
- the information may also include patient activity information, such as movements and location. In each case, the information may be collected by any appropriate sensing device or device(s) 104 .
- the patient monitor 106 may also accept information about the patient that is not sensed directly, for example including the patient's demographic information (e.g., age, medical history, family medical history, etc.) and the patient's own statement of symptoms, for example input by the patient or collected by a medical professional.
- the patient monitor 106 renders the collected information in a format suitable for the DTR system 108 .
- the DTR system 108 includes a set of rules for how treatment should progress, based on updates to the patient's monitored information. As just one example of such a rule, if a patient's blood pressure were to drop below a threshold, the DTR system 108 may indicate an appropriate medical response and adjustment to treatment.
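As an illustrative sketch, a rule of this kind might be expressed as a simple threshold check. The threshold value and response labels below are assumptions for illustration only, not clinical guidance:

```python
def blood_pressure_rule(systolic_bp, threshold=90):
    """Illustrative DTR-style rule: flag a response when systolic blood
    pressure drops below a threshold (values are hypothetical)."""
    if systolic_bp < threshold:
        return "raise_alert_and_adjust_treatment"
    return "continue_current_treatment"
```

In practice, the DTR system would hold many such rules, with their thresholds and responses learned from historical trajectories rather than hand-coded.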
- the DTR system's policies are learned in advance, as described in greater detail below, to incorporate past instances of successful and unsuccessful treatments, thereby providing a set of rules that stay close to successful treatment trajectories, while staying away from unsuccessful treatment trajectories.
- a treatment application system 110 accepts directives from the DTR system 108 and takes an appropriate action.
- the treatment system 110 can output an alert or an instruction for the recommended treatment.
- the treatment recommendation can include an automatic treatment intervention, by way of one or more medical treatment devices 112 .
- the treatment system 110 may cause a treatment device to introduce an appropriate medication to the patient's bloodstream.
- the present embodiments can make rapid adjustments to a patient's treatment, responsive to the patient's changing medical condition. This reduces the reliance on fallible human decision-making and can lead to superior outcomes, particularly in stressful situations, where a decision needs to be made quickly and correctly.
- Block 202 builds a set of training data that includes, for example, records of historical treatment trajectories.
- the historical treatment trajectories may include information about patient condition, information about the timing and type of treatment actions and changes, and information about the treatment's outcome. Treatment trajectories with both positive health outcomes and negative health outcomes are included in the training set.
- the trajectories can be represented as sequences of states and actions (s_0, a_0, s_1, a_1, . . . ) drawn from a policy π.
- each state s_t ∈ S includes collected patient information at a time t.
- each action a_t ∈ A is a K-dimensional binary-valued vector, where the value on each dimension represents the application of a particular medication, dosage, or treatment action.
- Some of the trajectories are associated with policies that result in positive outcomes (π_+), while other trajectories are associated with policies that result in negative outcomes (π_−).
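The trajectory representation described above can be sketched as a simple data structure. The class and field names here are illustrative assumptions, and K is an arbitrary example value:

```python
from dataclasses import dataclass, field
from typing import List

K = 4  # number of possible treatment actions (assumed for illustration)

@dataclass
class Trajectory:
    """A sequence of (state, action) pairs with a recorded outcome."""
    states: List[List[float]] = field(default_factory=list)   # s_0, s_1, ...
    actions: List[List[int]] = field(default_factory=list)    # a_0, a_1, ...
    positive_outcome: bool = False

    def append(self, state, action):
        # each action is a K-dimensional binary-valued vector
        assert len(action) == K and all(a in (0, 1) for a in action)
        self.states.append(state)
        self.actions.append(action)

traj = Trajectory(positive_outcome=True)
traj.append([120.0, 36.6], [1, 0, 0, 0])  # e.g. blood pressure, temperature
traj.append([118.0, 36.7], [0, 1, 0, 0])
```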
- Block 204 uses the training set to train the ACIL model.
- This model may be implemented using machine learning techniques, described in greater detail below.
- the model accepts patient information as an input, and outputs one or more DTR policies for the patient.
- a DTR policy includes one or more rules that are used to adapt treatment to changing patient conditions.
- Block 206 then collects information for a specific patient 102 , as described above.
- the patient information is used as an input to the ACIL model to produce a DTR policy for the specific patient 102 , relating to that patient's treatment needs.
- the output policy can be expressed as π_θ, with a parameter vector θ that represents the particular policy rules.
- Block 210 then applies a recommended treatment to the patient 102, using the collected patient information, following a trajectory τ_θ that is generated by the policy π_θ.
- block 212 updates the patient information, for example with current measurements.
- Block 210 uses this updated information to determine any updated treatments that may be needed, according to the DTR. This process can continue indefinitely, or can be interrupted by a positive or negative health outcome.
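The treat-then-update loop of blocks 210 and 212 can be sketched as follows. The function names and the termination condition are illustrative assumptions:

```python
def run_dtr(policy, read_patient_state, apply_treatment, is_terminal):
    """Sketch of blocks 210-212: apply a treatment recommended by the
    policy, then re-read the patient information and repeat until a
    terminal outcome is reached."""
    state = read_patient_state()
    history = []
    while not is_terminal(state):
        action = policy(state)       # block 210: recommended treatment
        apply_treatment(action)
        history.append((state, action))
        state = read_patient_state() # block 212: updated measurements
    return history
```

In a deployment, `read_patient_state` would wrap the patient monitor 106 and `apply_treatment` the treatment application system 110.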
- block 302 trains the patient model, which serves as an environment simulator.
- the adversarial discriminator, cooperative discriminator, and policy network are then iteratively trained until they converge in blocks 304 , 306 , and 308 .
- Convergence can be determined, for example, by determining that the improvement from one iteration to the next has fallen below a predetermined threshold. Alternatively, processing can stop when a predetermined number of iterations has been reached.
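Both stopping criteria described above can be sketched in a few lines. The tolerance and iteration cap are illustrative defaults:

```python
def has_converged(losses, tol=1e-4, max_iters=500):
    """Stop when the improvement from one iteration to the next falls
    below a threshold, or when an iteration cap is reached."""
    if len(losses) >= max_iters:
        return True
    if len(losses) >= 2 and abs(losses[-2] - losses[-1]) < tol:
        return True
    return False
```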
- the environment can be simulated with generative models, such as variational auto-encoders, for model-based reinforcement learning and trajectory embedding.
- a generative adversarial network can be used instead.
- the variational auto-encoder architecture builds a patient model that transforms a state distribution into an underlying latent space.
- the patient model includes an encoder, which maps the current state and action to a latent distribution z ~ N(μ, σ), and a decoder, which maps the latent z, the current state s_t, and the action a_t into a successor state ŝ_{t+1}.
- the patient model is trained to minimize a reconstruction error between the input state s_{t+1} and a reconstructed state ŝ_{t+1} that is generated by the decoder, under the latent distribution z.
- An objective function for this can be expressed as:

  L_p = w(s_{t+1}, ŝ_{t+1}) + λ D_KL(N(μ, σ) ∥ N(0, I))

- where w is the reconstruction error between the input state and the reconstructed state,
- s_t is a state at time t,
- a_t is an action at time t,
- the variable λ represents a balancing weight between the two kinds of loss, and
- D_KL is the Kullback-Leibler divergence.
- the auto-encoder seeks to “encode” the input information, in this case the “actions” and “states,” and translates them to the latent space.
- this latent space may represent the actions and states as vectors, which can be readily compared to one another.
- the decoder then translates those vectors back to “actions” and “states,” and an error w represents the difference between the output of the decoder and the input to the encoder.
- the parameters of the auto-encoder are then modified to reduce the value of the error. Training continues, with the parameters being modified at each iteration, until the error value reaches a point where no further training is needed. This may be triggered, for example, when the error value falls below a threshold, or when the error value does not change significantly over a number of iterations.
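The patient-model loss just described can be sketched numerically, using the closed-form KL divergence between a diagonal Gaussian N(μ, σ) and the standard normal. The function name and the balancing weight are illustrative assumptions:

```python
import numpy as np

def patient_model_loss(s_next, s_next_hat, mu, sigma, lam=0.5):
    """Sketch of the patient-model objective: squared reconstruction
    error plus a KL term that pulls the latent distribution toward a
    standard normal (lam balances the two kinds of loss)."""
    recon = np.sum((s_next - s_next_hat) ** 2)
    # closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over dimensions
    kl = 0.5 * np.sum(mu ** 2 + sigma ** 2 - np.log(sigma ** 2) - 1.0)
    return recon + lam * kl
```

When the decoder reproduces the successor state exactly and the latent distribution matches the prior, the loss is zero, which is the fixed point the training loop above drives toward.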
- training the adversarial discriminator includes a comparison between the trajectories of positive outcome scenarios and the trajectories generated by a policy network.
- the differences between two policies (e.g., the policy π_θ generated by the ACIL model, and a policy with a positive outcome π_+) can be measured using an occupancy measure ρ_π.
- the occupancy measure can be interpreted as the distribution of state-action pairs that the policy interacts with in the environment.
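The occupancy measure, as the distribution of visited state-action pairs, can be estimated empirically from sampled trajectories. This sketch omits discounting, and its names are illustrative:

```python
from collections import Counter

def occupancy_estimate(trajectories):
    """Empirical estimate of the occupancy measure rho_pi: the relative
    frequency of each (state, action) pair visited under the policy.
    Each trajectory is a (states, actions) pair of aligned lists."""
    counts = Counter()
    total = 0
    for states, actions in trajectories:
        for s, a in zip(states, actions):
            counts[(tuple(s), tuple(a))] += 1
            total += 1
    return {pair: n / total for pair, n in counts.items()}
```

Comparing two such estimates (e.g., for π_θ and π_+) gives a concrete handle on the divergences that the discriminators below optimize.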
- a policy π_θ can be implemented as a multiple-layer perceptron network, where π_θ takes the state of the patient as an input and returns, for example, recommended medications.
- the adversarial discriminator D_a(s, a) can also be implemented as a multiple-layer perceptron network, having a number and dimension of layers that are fine-tuned parameters, which estimates the probability that a state-action pair (s, a) comes from a positive-trajectory policy π_+, rather than a generated policy π_θ.
- the learning of the adversarial discriminator can be expressed as the following objective function:

  min_{π_θ} max_{D_a} E_{ρ_{π_+}}[log D_a(s, a)] + E_{ρ_{π_θ}}[log(1 − D_a(s, a))]

- This objective function is equivalent to minimizing the Jensen-Shannon divergence D_JS between the distributions of state-action pairs ρ_{π_θ} and ρ_{π_+}, which are generated by interacting with the environment using policy π_θ and policy π_+. E_{ρ} represents the expectation over all (s, a) pairs sampled from the corresponding occupancy measure.
- D_a is referred to as an adversarial discriminator, because the goals of optimizing D_a and π_θ are opposite: D_a seeks to minimize the probability assigned to state-action pairs generated by π_θ, while π_θ is selected to maximize the probability of D_a making a mistake.
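The adversarial discriminator's objective can be sketched as a binary cross-entropy over sampled state-action pairs. The function name and input form are assumptions for illustration:

```python
import numpy as np

def adversarial_loss(d_pos, d_gen):
    """Binary cross-entropy form of the adversarial objective.
    d_pos: D_a outputs on expert (positive-outcome) pairs -> ideally near 1.
    d_gen: D_a outputs on pairs generated by pi_theta     -> ideally near 0."""
    return -(np.mean(np.log(d_pos)) + np.mean(np.log(1.0 - d_gen)))
```

A discriminator that separates the two sources well yields a low loss; a confused one (outputs near 0.5 everywhere) yields a high loss, which is the signal the policy exploits.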
- training the cooperative discriminator includes training a model to differentiate the generated trajectories and the positive trajectory policies from the negative trajectory policies.
- the occupancy measure ρ_π can be used again to compare the different policies.
- the objective function for learning the cooperative discriminator D_c can be expressed as:

  max_{D_c} E_{ρ_{π_+} + ρ_{π_θ}}[log D_c(s, a)] + E_{ρ_{π_−}}[log(1 − D_c(s, a))]

- This objective function characterizes the optimal negative log loss of classifying the positive trajectories, generated from π_θ and π_+, against the negative trajectories generated from π_−.
- This is referred to as a cooperative discriminator, because the goals of D_c and π_θ are aligned: both seek to maximize the probability that the data generated by π_θ is classified as positive.
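The cooperative discriminator's objective has the same cross-entropy shape, but with the generated and positive pairs grouped on one side and the negative pairs on the other. Again, names and inputs are illustrative:

```python
import numpy as np

def cooperative_loss(d_posgen, d_neg):
    """Binary cross-entropy form of the cooperative objective.
    d_posgen: D_c outputs on pairs from pi_+ and pi_theta -> ideally near 1.
    d_neg:    D_c outputs on pairs from pi_-              -> ideally near 0."""
    return -(np.mean(np.log(d_posgen)) + np.mean(np.log(1.0 - d_neg)))
```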
- the losses from D_a and D_c can be considered as reward functions that help refine π_θ.
- When the distribution ρ_{π_θ} is different from ρ_{π_−}, the policy receives a large reward from D_c.
- In this case, the loss of π_θ is D_JS(ρ_{π_+} + ρ_{π_θ} ∥ ρ_{π_−}).
- training the policy network seeks to update the policy network π_θ to mimic positive trajectories, while staying away from negative trajectories.
- the network incorporates the reward signals from both D_a and D_c.
- the signal from D_a is used to push π_θ closer to π_+, while the signal from D_c separates π_θ and π_−.
- the loss function can be defined as:

  L(π_θ) = −λ_a E_{π_θ}[log D_a(s, a)] − λ_c E_{π_θ}[log D_c(s, a)] − β H(π_θ)

- where H(π_θ) is the causal entropy of the policy, which encourages diversity in the learned policy,
- β ≥ 0 is a parameter that is used to control the contribution of H(π_θ), and
- the parameters λ_a and λ_c are weights with values between 0 and 1 that balance the two reward signals.
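The combined reward signal that the two discriminators supply to π_θ can be sketched as follows. The weights, epsilon, and function name are assumptions for illustration:

```python
import numpy as np

def policy_reward(d_a, d_c, lam_a=0.5, lam_c=0.5, eps=1e-8):
    """Combined reward for pi_theta: the D_a term pushes generated pairs
    toward the positive trajectories, while the D_c term pushes them away
    from the negative ones. eps guards against log(0)."""
    return lam_a * np.log(d_a + eps) + lam_c * np.log(d_c + eps)
```

A state-action pair that both discriminators score as positive-like receives a high reward, which a policy-gradient update would then reinforce.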
- the adversarial discriminator D_a, the cooperative discriminator D_c, and the policy network π_θ are trained in a three-party min-max game, defined by combining the objectives above: π_θ is updated to minimize its loss L(π_θ), while D_a and D_c are each updated to optimize their respective objective functions.
- λ_a and λ_c are weight parameters that weight the contributions of the adversarial discriminator and the cooperative discriminator.
- the entropy of the policy π_θ encourages policy diversity, and is defined as:

  H(π_θ) = E_{π_θ}[−log π_θ(a | s)]
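The causal entropy can be estimated by Monte Carlo from the probabilities the policy assigned to the actions it actually took along sampled trajectories. This is a minimal sketch with an assumed function name:

```python
import numpy as np

def causal_entropy(action_probs):
    """Monte Carlo estimate of H(pi) = E_pi[-log pi(a|s)], given the
    probabilities pi(a_t|s_t) of the sampled actions."""
    p = np.asarray(action_probs, dtype=float)
    return float(np.mean(-np.log(p)))
```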
- the present embodiments generated policies that substantially outperformed baseline processes for generating treatment trajectories.
- ACIL considers discovering DTRs as a sequential decision-making problem and focuses on the long-term influence of the current action. Additionally, with the use of both positive and negative trajectory examples as training data, ACIL is able to mimic policies that have positive health outcomes, while avoiding mistakes. The result is a superior treatment policy, that responds to changing patient conditions in a manner that maximizes the likelihood of a positive health outcome.
- Embodiments described herein may be entirely hardware, entirely software or including both hardware and software elements.
- the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
- Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
- a computer-usable or computer readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device.
- the medium can be magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
- the medium may include a computer-readable storage medium such as a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk, etc.
- Each computer program may be tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein.
- the inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
- a data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus.
- the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution.
- I/O devices including but not limited to keyboards, displays, pointing devices, etc. may be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
- Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
- the term “hardware processor subsystem” or “hardware processor” can refer to a processor, memory, software or combinations thereof that cooperate to perform one or more specific tasks.
- the hardware processor subsystem can include one or more data processing elements (e.g., logic circuits, processing circuits, instruction execution devices, etc.).
- the one or more data processing elements can be included in a central processing unit, a graphics processing unit, and/or a separate processor- or computing element-based controller (e.g., logic gates, etc.).
- the hardware processor subsystem can include one or more on-board memories (e.g., caches, dedicated memory arrays, read only memory, etc.).
- the hardware processor subsystem can include one or more memories that can be on or off board or that can be dedicated for use by the hardware processor subsystem (e.g., ROM, RAM, basic input/output system (BIOS), etc.).
- the hardware processor subsystem can include and execute one or more software elements.
- the one or more software elements can include an operating system and/or one or more applications and/or specific code to achieve a specified result.
- the hardware processor subsystem can include dedicated, specialized circuitry that performs one or more electronic processing functions to achieve a specified result.
- Such circuitry can include one or more application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and/or programmable logic arrays (PLAs).
- the system 108 can include a hardware processor 502 , and memory 504 that is coupled to the hardware processor 502 .
- a monitor interface 506 provides communications between the DTR system 108 and the patient monitor 106
- a treatment interface 508 provides communications between the DTR system 108 and the treatment application system 110 .
- the interfaces 506 and 508 can each include any appropriate wired or wireless communications protocol and medium.
- the DTR system 108 may be integrated with one or both of the patient monitor 106 and the treatment application system 110 , such that the interfaces 506 and 508 represent internal communications, such as buses.
- one or both of the patient monitor 106 and the treatment application system 110 can be implemented as separate, discrete pieces of hardware, that communicate with the DTR system 108 .
- the DTR system 108 may include one or more functional modules.
- such modules can be implemented as software that is stored in memory 504 and that is executed by hardware processor 502 .
- such modules can be implemented as one or more discrete hardware components, for example implemented as application-specific integrated chips or field programmable gate arrays.
- patient information is received through the monitor interface 506 .
- this information may be received as discrete sensor readings from a variety of sensors 104 .
- this information may be received from the patient monitor 106 as a consolidated vector that represents multiple measurements.
- Some patient information may also be stored in the memory 504 , for example in the form of patient demographic information and medical history.
- the ACIL model 510 uses the collected patient information to generate a treatment trajectory. This trajectory is updated as new patient information is received.
- the treatment interface 508 sends information about the treatment trajectory to the treatment application system 110 , for use with the patient.
- the ACIL model 510 may be implemented with one or more artificial neural networks. These networks are trained, for example in the manner described above, using model trainer 512 .
- The model trainer 512 uses a set of training data, which may be stored in memory 504, and which may include treatment trajectories that resulted in positive health outcomes, as well as treatment trajectories that resulted in negative health outcomes.
- An artificial neural network is an information processing system that is inspired by biological nervous systems, such as the brain.
- the key element of ANNs is the structure of the information processing system, which includes a large number of highly interconnected processing elements (called “neurons”) working in parallel to solve specific problems.
- ANNs are furthermore trained in-use, with learning that involves adjustments to weights that exist between the neurons.
- An ANN is configured for a specific application, such as pattern recognition or data classification, through such a learning process.
- ANNs demonstrate an ability to derive meaning from complicated or imprecise data and can be used to extract patterns and detect trends that are too complex to be detected by humans or other computer-based systems.
- the structure of a neural network is known generally to have input neurons 602 that provide information to one or more "hidden" neurons 604. Connections 608 between the input neurons 602 and the hidden neurons 604 are weighted, and these weighted inputs are then processed by the hidden neurons 604 according to some function in the hidden neurons 604, with weighted connections 608 between the layers. There may be any number of layers of hidden neurons 604, as well as neurons that perform different functions. Different neural network structures also exist, such as a convolutional neural network, a maxout network, etc. Finally, a set of output neurons 606 accepts and processes weighted input from the last set of hidden neurons 604.
- the output is compared to a desired output available from training data.
- the error relative to the training data is then processed in “feed-back” computation, where the hidden neurons 604 and input neurons 602 receive information regarding the error propagating backward from the output neurons 606 .
- weight updates are performed, with the weighted connections 608 being updated to account for the received error.
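The feed-forward, feed-back, and weight-update phases described above can be sketched concretely. The following numpy example is illustrative only and is not part of the described embodiment; the network size, activation function, and learning rate are arbitrary assumptions.

```python
import numpy as np

# Illustrative sketch: one training step for a tiny 2-4-1 network with
# sigmoid hidden units, mirroring the feed-forward, back-propagation, and
# weight-update phases described above.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 2))   # weighted connections, input -> hidden
W2 = rng.normal(size=(1, 4))   # weighted connections, hidden -> output
x = np.array([0.5, -0.2])      # input neurons' values
y = np.array([1.0])            # desired output from the training data

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Feed-forward: weighted inputs are processed by the hidden neurons.
h = sigmoid(W1 @ x)
y_hat = W2 @ h

# Feed-back: the error at the output propagates backward through the weights.
err = y_hat - y
grad_W2 = np.outer(err, h)
grad_h = (W2.T @ err) * h * (1 - h)   # chain rule through the sigmoid
grad_W1 = np.outer(grad_h, x)

# Weight update: connections are adjusted to account for the received error.
lr = 0.1
W2 -= lr * grad_W2
W1 -= lr * grad_W1

loss_before = float(err ** 2)
loss_after = float((W2 @ sigmoid(W1 @ x) - y) ** 2)
```

After the update, the squared error on the same example is smaller, which is the adaptation-to-error behavior the text describes.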
- an ANN architecture 700 is shown. It should be understood that the present architecture is purely exemplary, and that other architectures or types of neural network may be used instead.
- the ANN embodiment described herein is included with the intent of illustrating general principles of neural network computation at a high level of generality and should not be construed as limiting in any way.
- layers of neurons described below and the weights connecting them are described in a general manner and can be replaced by any type of neural network layers with any appropriate degree or type of interconnectivity.
- layers can include convolutional layers, pooling layers, fully connected layers, softmax layers, or any other appropriate type of neural network layer.
- layers can be added or removed as needed and the weights can be omitted for more complicated forms of interconnection.
- a set of input neurons 702 each provide an input signal in parallel to a respective row of weights 704 .
- the weights 704 each have a respective settable value, such that a weight output passes from the weight 704 to a respective hidden neuron 706 to represent the weighted input to the hidden neuron 706 .
- the weights 704 may simply be represented as coefficient values that are multiplied against the relevant signals. The signals from each weight add column-wise and flow to a hidden neuron 706.
- the hidden neurons 706 use the signals from the array of weights 704 to perform some calculation.
- the hidden neurons 706 then output a signal of their own to another array of weights 704 .
- This array performs in the same way, with a column of weights 704 receiving a signal from their respective hidden neuron 706 to produce a weighted signal output that adds row-wise and is provided to the output neuron 708 .
- any number of these stages may be implemented, by interposing additional layers of arrays and hidden neurons 706 . It should also be noted that some neurons may be constant neurons 709 , which provide a constant output to the array. The constant neurons 709 can be present among the input neurons 702 and/or hidden neurons 706 and are only used during feed-forward operation.
- the output neurons 708 provide a signal back across the array of weights 704 .
- the output layer compares the generated network response to training data and computes an error.
- the error signal can be made proportional to the error value.
- a row of weights 704 receives a signal from a respective output neuron 708 in parallel and produces an output which adds column-wise to provide an input to hidden neurons 706 .
- the hidden neurons 706 combine the weighted feedback signal with a derivative of their feed-forward calculation and store an error value before outputting a feedback signal to their respective columns of weights 704. This back propagation travels through the entire network 700 until all hidden neurons 706 and the input neurons 702 have stored an error value.
- the stored error values are used to update the settable values of the weights 704 .
- the weights 704 can be trained to adapt the neural network 700 to errors in its processing. It should be noted that the three modes of operation, feed forward, back propagation, and weight update, do not overlap with one another.
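In the feed-forward direction, the row-and-column arithmetic described above for the weight array is simply a matrix-vector product. A small illustrative numpy sketch (the particular weights and input values are arbitrary assumptions):

```python
import numpy as np

# The "column-wise" sum of weighted signals described above, written two ways.
weights = np.array([[0.2, -0.5],
                    [0.7,  0.1],
                    [-0.3, 0.4]])        # rows: input neurons, cols: hidden neurons
inputs = np.array([1.0, 2.0, -1.0])      # one signal per input neuron

# Each weight multiplies its input signal; each column's products add up to
# feed one hidden neuron. As a matrix-vector product:
hidden_in = inputs @ weights             # shape (2,)

# The same result, written as the explicit per-column summation:
explicit = np.array([sum(inputs[i] * weights[i, j] for i in range(3))
                     for j in range(2)])
```

Both forms give the same weighted inputs to the hidden neurons, which is why hardware weight arrays and software matrix libraries implement the same computation.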
- any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B).
- such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C).
- This may be extended for as many items listed.
Abstract
Methods and systems for responding to changing conditions include training a model, using a processor, using trajectories that resulted in a positive outcome and trajectories that resulted in a negative outcome. Training is performed using an adversarial discriminator to train the model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome, and using a cooperative discriminator to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome. A dynamic response regime is generated using the trained model and environment information. A response to changing environment conditions is performed in accordance with the dynamic response regime.
Description
- This application is a continuing application of U.S. patent application Ser. No. 16/998,228, filed 20 Aug. 2020, which claims the benefit of U.S. Provisional Patent Application Ser. No. 62/893,324, filed on 29 Aug. 2019, both of which are incorporated herein by reference in their entireties.
- The present invention relates to providing medical treatments to patients, and, more particularly, to determining tailored treatments that are adjusted over time according to the changing state of the patients.
- Determining treatments for individual patients has historically been performed by highly skilled doctors, who apply their experience and training to assess the patient's needs and provide a course of treatment. However, the fallibility of human judgment leads to errors. As a result, there is a need to automate the process of medical decision-making, particularly as it applies to the modification of a treatment plan in response to changing patient conditions.
- A method for responding to changing conditions includes training a model, using a processor, using trajectories that resulted in a positive outcome and trajectories that resulted in a negative outcome. Training is performed using an adversarial discriminator to train the model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome, and using a cooperative discriminator to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome. A dynamic response regime is generated using the trained model and environment information. A response to changing environment conditions is performed in accordance with the dynamic response regime.
- A method for treating a patient includes training a model on historical treatment trajectories, including trajectories that resulted in a positive health outcome and trajectories that resulted in a negative health outcome. A dynamic treatment regime is generated for a patient using the trained model and patient information. The patient is treated in accordance with the dynamic treatment regime, in a manner that is responsive to changing patient conditions, by triggering one or more medical devices to administer a treatment to the patient.
- A system for treating a patient includes a machine learning model, configured to generate a dynamic response regime using environment information. A model trainer is configured to train the machine learning model on trajectories that resulted in a positive outcome and trajectories that resulted in a negative outcome, by using an adversarial discriminator to train the machine learning model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome, and by using a cooperative discriminator to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome. A response interface is configured to trigger a response to changing environment conditions in accordance with the dynamic response regime.
- These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
- The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:
- FIG. 1 is a block diagram showing a patient being monitored and treated by a system that uses a dynamic treatment regime to react to changing patient conditions, in accordance with an embodiment of the present invention;
- FIG. 2 is a block/flow diagram of a method for generating and implementing a dynamic treatment regime for a patient, in accordance with an embodiment of the present invention;
- FIG. 3 is a block/flow diagram of a method for training a machine learning model to generate dynamic treatment regimes, in accordance with an embodiment of the present invention;
- FIG. 4 is pseudo-code for a learning process for a machine learning model to generate dynamic treatment regimes, in accordance with an embodiment of the present invention;
- FIG. 5 is a block diagram of a dynamic treatment regime system that generates and implements a dynamic treatment regime, in accordance with an embodiment of the present invention;
- FIG. 6 is a diagram of an exemplary neural network structure, in accordance with an embodiment of the present invention; and
- FIG. 7 is a diagram of an exemplary neural network structure with weights, in accordance with an embodiment of the present invention.
- Embodiments of the present invention provide a dynamic treatment regime (DTR): a sequence of tailored treatment decisions that specify how treatments should be adjusted through time, in accordance with the dynamic states of patients. Rules in the DTR can take input information, such as a patient's medical history, laboratory results, and demographic information, and output recommended treatments to improve the effectiveness of the treatment program.
- The present embodiments can make use of deep reinforcement techniques for machine learning, for example to learn treatment policies from doctors' previous treatment plans. The present embodiments do so in such a way as to avoid the compounding errors that can result from supervised methods that are based on behavior cloning and the sparsity of self-defined reward signals in reinforcement learning models. Treatment paths are considered that include both positive trajectories, where a positive health outcome was achieved for a patient, and negative trajectories, where a negative health outcome resulted. By using both positive and negative trajectories, productive strategies are learned, and unproductive strategies are avoided.
- Toward that end, the present embodiments use an adversarial cooperative imitation learning (ACIL) model to determine the dynamic treatment regimes that produce positive outcomes, while staying away from negative trajectories. Two discriminators can be used, including an adversarial discriminator and a cooperative discriminator. The adversarial discriminator minimizes the discrepancies between the output trajectories and the positive trajectories in a set of training data, while the cooperative discriminator distinguishes the negative trajectories from the positive trajectories and the output trajectories. Reward signals from the discriminators are used to refine the policy that generates dynamic treatment regimes.
- Based on the policies learned by the model, DTRs are generated in response to specific patient information. These DTRs are then implemented, by providing the specified care and treatment to the patients, responsive to the changing condition for each patient. The present embodiments thereby reduce the likelihood of a negative health outcome and provide superior dynamic treatment regimens.
- Referring now to
FIG. 1, an embodiment of the present invention is shown. A patient 102 may, for example, have a medical condition that is being treated. One or more sensors 104 monitor information about the patient's condition, and provide the information to patient monitor 106. This information may include vital information, such as heart rate, blood oxygen saturation, blood pressure, body temperature, and blood sugar levels. The information may also include patient activity information, such as movements and location. In each case, the information may be collected by any appropriate sensing device or devices 104. The patient monitor 106 may also accept information about the patient that is not sensed directly, for example including the patient's demographic information (e.g., age, medical history, family medical history, etc.) and the patient's own statement of symptoms, for example input by the patient or collected by a medical professional.
- The patient monitor 106 renders the collected information in a format suitable for the DTR system 108. The DTR system 108 includes a set of rules for how treatment should progress, based on updates to the patient's monitored information. As just one example of such a rule, if a patient's blood pressure were to drop below a threshold, the DTR system 108 may indicate an appropriate medical response and adjustment to treatment. The DTR system's policies are learned in advance, as described in greater detail below, to incorporate past instances of successful and unsuccessful treatments, thereby providing a set of rules that stays close to successful treatment trajectories, while staying away from unsuccessful ones.
- A treatment application system 110 accepts directives from the DTR system 108 and takes an appropriate action. In some cases, when the treatment recommendation involves the intervention of a medical professional, the treatment system 110 can output an alert or an instruction for the recommended treatment. In other cases, the treatment recommendation can include an automatic treatment intervention, by way of one or more medical treatment devices 112. As just one example of such an automatic treatment, if the DTR system 108 indicates that a patient's dropping blood pressure necessitates a quick pharmaceutical intervention, the treatment system 110 may cause a treatment device 112 to introduce an appropriate medication to the patient's bloodstream.
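A rule of the kind described above, where a monitored value crossing a threshold triggers a treatment adjustment, can be sketched as follows. This is a hypothetical illustration only: the threshold value, the state field names, and the action name are invented for the example and are not taken from this disclosure.

```python
from typing import Optional

def blood_pressure_rule(state: dict) -> Optional[str]:
    """Hypothetical DTR rule: recommend an action when systolic pressure is low."""
    SYSTOLIC_THRESHOLD = 90  # mmHg; illustrative value, not from this disclosure
    if state.get("systolic_bp", SYSTOLIC_THRESHOLD) < SYSTOLIC_THRESHOLD:
        return "administer_vasopressor"  # hypothetical action name
    return None
```

In a deployed system, rules of this shape would be produced by the learned policy described below rather than written by hand.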
- Referring now to
FIG. 2 , a method of treating a patient is shown.Block 202 builds a set of training data that includes, for example, records of historical treatment trajectories. The historical treatment trajectories may include information about patient condition, information about the timing and type of treatment actions and changes, and information about the treatment's outcome. Treatment trajectories with both positive health outcomes and negative health outcomes are included in the training set. - In some embodiments, the trajectories can be represented as sequences of states and actions (s0, a0, s1, a1, . . . ) drawn from a policy π. Thus, each state st∈ includes collected patient information at a time t, and each action at∈ includes a K-dimensional binary-valued vector, where the value on each dimension represents the application of a particular medication, dosage, or treatment action. Some of the trajectories are associated with policies that result in positive outcomes (π+), while other trajectories are associated with policies that result in negative outcomes (π−). The positive trajectories can be expressed as τ+=(s1 +, a1 +, . . . ) and the negative trajectories can be expressed as τ−=(s1 −, a1 −, . . . ).
- Block 204 then uses the training set to train the ACIL model. This model may be implemented using machine learning techniques, described in greater detail below. The model accepts patient information as an input, and outputs one or more DTR policies for the patient. As noted above, a DTR policy includes one or more rules that are used to adapt treatment to changing patient conditions.
-
Block 206 then collects information for aspecific patient 102, as described above. In block 208, the patient information is used as an input to the ACIL model to produce a DTR policy for thespecific patient 102, relating to that patient's treatment needs. The output policy can be expressed as πθ, with a parameter vector θ that represents the particular policy rules.Block 210 then applies a recommended treatment to thepatient 102, using the collected patient information, following a trajectory τθ that is generated by the policy πθ. As time goes on, block 212 updates the patient information, for example with current measurements.Block 210 then uses this updated information to determine any updated treatments that may be needed, according to the DTR. This process can continue indefinitely, or can be interrupted by a positive or negative health outcome. - Referring now to
FIG. 3 , additional information on the training of the ACIL model in block 204 is shown. As an overview, block 302 trains the patient model, which serves as an environment simulator. The adversarial discriminator, cooperative discriminator, and policy network are then iteratively trained until they converge inblocks - In
block 302, the environment can be simulated with generative models, such as variational auto-encoders, for model-based reinforcement learning and trajectory embedding. As an alternative to using a variable auto-encoder, a generative adversarial network can be used instead. The variational auto-encoder architecture builds a patient model that transforms a state distribution into an underlying latent space. The patient model includes an encoder, which maps the current state and action to a latent distribution z˜(μ, σ), and a decoder, which maps latent z and the current state st and action at into a successor state ŝt+1. The patient model is trained to minimize a reconstruction error between the input state st+1 and a reconstructed state ŝt+1 that is generated by the decoder, under the latent distribution z. An objective function for this can be expressed as: -
- min w1,w2: E[ ‖st+1 − ŝt+1‖² ] + α·DKL( N(μ, σ) ∥ N(0, I) )
1 (st, at) is an encoder network that takes the current state st and action at as inputs, using a first parameter w1, and ŝt+1=Dw2 (st, at, z) is the output a decoder network Dw2 with a latent factor z and the current state and action as input, using a second parameter w2. The variable α represents a balancing weight between two kinds of loss, and the function DKL is the Kullback-Liebler divergence. - In general, the auto-encoder seeks to “encode” the input information, in this case the “actions” and “states,” and translates them to the latent space. In some embodiments, this latent space may represent the actions and states as vectors, which can be readily compared to one another. The decoder then translates those vectors back to “actions” and “states,” and an error w represents the difference between the output of the decoder and the input to the encoder. The parameters of the auto-encoder are then modified to reduce the value of the error. Training continues, with the parameters being modified at each iteration, until the error value reaches a point where no further training is needed. This may be triggered, for example, when the error value falls below a threshold, or when the error value does not change significantly over a number of iterations.
- In block 304, training the adversarial discriminator includes a comparison between the trajectories of positive outcome scenarios and the trajectories generated by a policy network. In general, the differences between two policies (e.g., the policy πθ generated by the ACIL model, and a policy with a positive outcome π+) by comparing the trajectories they generate. For a policy π∈Π, the occupancy measure ρπ:×→ can be defined as ρπ(s, a)=π(a|s) Σt=0 TγP(st=s|π), where γ is a discounting factor, T is a maximum time value, and where successor states are drawn from P(s|π). The occupancy measure can be interpreted as the distribution of state-action pairs that the policy interacts with in the environment. A policy πθ can be implemented as a multiple-layer perceptron network, where πθ takes the state of the patient as an input and returns, for example, recommended medications.
- The adversarial discriminator Da(s, a) can also be implemented as a multiple-layer perceptron network, having a number and dimension of layers that are fine-tuned parameters, which estimates the probability that a state-action pair (s, a) comes from a positive trajectory policy π+, rather than a generated policy πθ. The learning of the adversarial discriminator can be expressed as the following objective function:
-
- max Da: E(s,a)∼ρπ+ [ log Da(s, a) ] + E(s,a)∼ρπθ [ log(1 − Da(s, a)) ]
θ and ρπ+ , which are generated by interacting with the environment using policy πθ and policy π+. represents the expectation over all (s, a) pairs sampled from ρπθ . Da is referred to as an adversarial discriminator, because the goals of optimizing Da and πθ are opposite—Da seeks to minimize the probability of the state-action pair generated by πθ, while πθ is selected to maximize the probability of Da making a mistake. - In
block 306, training the cooperative discriminator includes training a model to differentiate the generated trajectories and the positive trajectory policies from the negative trajectory policies. The occupancy measure ρπ can be used again to compare the different policies. The objective function for learning the cooperative discriminator Dc can be expressed as: -
- max Dc: E(s,a)∼ρπ+ [ log Dc(s, a) ] + E(s,a)∼ρπθ [ log Dc(s, a) ] + E(s,a)∼ρπ− [ log(1 − Dc(s, a)) ]
θ is different from ρπ− , it receives a large reward from Dc. With an optimal Dc, the loss of πθ is DJS(ρπ+ +ρπθ ∥ρπ− . - In
block 308, training the policy network seeks to update the policy network πθ to mimic positive trajectories, while staying away from negative trajectories. The network incorporates the reward signals from both Da and Dc. The signal from Da is used to push πθ closer to π+, while the signal Dc separates πθ and π−. The loss function can be defined as: -
- min πθ: ωa·E(s,a)∼ρπθ [ log(1 − Da(s, a)) ] + ωb·E(s,a)∼ρπθ [ log(1 − Dc(s, a)) ] − λH(πθ)
- The adversarial discriminator Da, the cooperative discriminator Dc, and the policy network πθ are trained in a three-party min-max game, which can be defined as:
-
- min πθ max Da,Dc: ωa·( E(s,a)∼ρπ+ [ log Da(s, a) ] + E(s,a)∼ρπθ [ log(1 − Da(s, a)) ] ) + ωb·( E(s,a)∼ρπ++ρπθ [ log Dc(s, a) ] + E(s,a)∼ρπ− [ log(1 − Dc(s, a)) ] ) − λH(πθ)
- When both Da and Dc are optimized, the outcome of the three-party min-max game is equivalent to the following optimization problem:
-
- min πθ: ωa·DJS( ρπθ ∥ ρπ+ ) − ωb·DJS( ρπ+ + ρπθ ∥ ρπ− ) − λH(πθ)
- Referring now to
FIG. 4 , pseudo-code of the learning process for an ACIL model is shown. First the patient model Gw is trained, followed by iterative training of Da, Dc, and πθ. - In tests, the present embodiments generated policies that substantially outperformed baseline processes for generating treatment trajectories. ACIL considers discovering DTRs as a sequential decision-making problem and focuses on the long-term influence of the current action. Additionally, with the use of both positive and negative trajectory examples as training data, ACIL is able to mimic policies that have positive health outcomes, while avoiding mistakes. The result is a superior treatment policy, that responds to changing patient conditions in a manner that maximizes the likelihood of a positive health outcome.
- Embodiments described herein may be entirely hardware, entirely software or including both hardware and software elements. In a preferred embodiment, the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
- Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. A computer-usable or computer readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. The medium may include a computer-readable storage medium such as a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk, etc.
- Each computer program may be tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
- A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution. Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) may be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
- As employed herein, the term “hardware processor subsystem” or “hardware processor” can refer to a processor, memory, software or combinations thereof that cooperate to perform one or more specific tasks. In useful embodiments, the hardware processor subsystem can include one or more data processing elements (e.g., logic circuits, processing circuits, instruction execution devices, etc.). The one or more data processing elements can be included in a central processing unit, a graphics processing unit, and/or a separate processor- or computing element-based controller (e.g., logic gates, etc.). The hardware processor subsystem can include one or more on-board memories (e.g., caches, dedicated memory arrays, read only memory, etc.). In some embodiments, the hardware processor subsystem can include one or more memories that can be on or off board or that can be dedicated for use by the hardware processor subsystem (e.g., ROM, RAM, basic input/output system (BIOS), etc.).
- In some embodiments, the hardware processor subsystem can include and execute one or more software elements. The one or more software elements can include an operating system and/or one or more applications and/or specific code to achieve a specified result.
- In other embodiments, the hardware processor subsystem can include dedicated, specialized circuitry that performs one or more electronic processing functions to achieve a specified result. Such circuitry can include one or more application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), and/or programmable logic arrays (PLAs).
- These and other variations of a hardware processor subsystem are also contemplated in accordance with embodiments of the present invention.
- Referring now to
FIG. 5, additional detail on the DTR system 108 is shown. The system 108 can include a hardware processor 502 and memory 504 that is coupled to the hardware processor 502. A monitor interface 506 provides communications between the DTR system 108 and the patient monitor 106, while a treatment interface 508 provides communications between the DTR system 108 and the treatment application system 110. - It should be understood that the
interfaces 506 and 508 can each include any appropriate wired or wireless communications protocol and medium. In some embodiments, the DTR system 108 may be integrated with one or both of the patient monitor 106 and the treatment application system 110, such that the interfaces 506 and 508 represent internal communications, such as buses. In some embodiments, one or both of the patient monitor 106 and the treatment application system 110 can be implemented as separate, discrete pieces of hardware that communicate with the DTR system 108. - The
DTR system 108 may include one or more functional modules. In some embodiments, such modules can be implemented as software that is stored in memory 504 and that is executed by hardware processor 502. In other embodiments, such modules can be implemented as one or more discrete hardware components, for example as application-specific integrated circuits or field-programmable gate arrays. - During operation, patient information is received through the
monitor interface 506. In some embodiments, this information may be received as discrete sensor readings from a variety of sensors 104. In other embodiments, this information may be received from the patient monitor 106 as a consolidated vector that represents multiple measurements. Some patient information may also be stored in the memory 504, for example in the form of patient demographic information and medical history. - The
ACIL model 510 uses the collected patient information to generate a treatment trajectory. This trajectory is updated as new patient information is received. The treatment interface 508 sends information about the treatment trajectory to the treatment application system 110, for use with the patient. - In some embodiments, the
ACIL model 510 may be implemented with one or more artificial neural networks. These networks are trained, for example in the manner described above, using model trainer 512. The model trainer 512 uses a set of training data, which may be stored in memory 504, and which may include treatment trajectories that resulted in positive health outcomes, as well as treatment trajectories that resulted in negative health outcomes. - An artificial neural network (ANN) is an information processing system that is inspired by biological nervous systems, such as the brain. The key element of ANNs is the structure of the information processing system, which includes a large number of highly interconnected processing elements (called “neurons”) working in parallel to solve specific problems. ANNs are furthermore trained in use, with learning that involves adjustments to the weights that exist between the neurons. An ANN is configured for a specific application, such as pattern recognition or data classification, through such a learning process.
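The training scheme just described, in which one discriminator pulls generated trajectories toward positive-outcome examples while a second discriminator pushes them away from negative-outcome examples, can be sketched in miniature. The following is a hedged illustration only: the linear discriminators, the synthetic trajectory vectors, and the `policy_loss` function are hypothetical stand-ins, not the patented ACIL model or its actual multi-layer perceptrons.

```python
import numpy as np

rng = np.random.default_rng(0)
sig = lambda x: 1.0 / (1.0 + np.exp(-x))

# Hypothetical setup: each "trajectory" is a fixed-length feature vector.
dim = 8
pos = rng.normal(+0.5, 0.5, size=(256, dim))  # positive-outcome trajectories
neg = rng.normal(-0.5, 0.5, size=(256, dim))  # negative-outcome trajectories
gen = rng.normal(0.0, 0.5, size=(256, dim))   # trajectories from the policy

w_adv = np.zeros(dim)   # adversarial discriminator: positive vs. generated
w_coop = np.zeros(dim)  # cooperative discriminator: negative vs. generated

for _ in range(200):  # simple logistic-regression updates per discriminator
    for X, y in ((pos, 1.0), (gen, 0.0)):   # adversarial labels
        w_adv += 0.1 * X.T @ (y - sig(X @ w_adv)) / len(X)
    for X, y in ((neg, 1.0), (gen, 0.0)):   # cooperative labels
        w_coop += 0.1 * X.T @ (y - sig(X @ w_coop)) / len(X)

def policy_loss(trajs):
    # The policy is rewarded for looking "positive-like" to the adversarial
    # discriminator and penalized for looking "negative-like" to the
    # cooperative discriminator.
    return float(np.mean(-np.log(sig(trajs @ w_adv) + 1e-8)
                         - np.log(1.0 - sig(trajs @ w_coop) + 1e-8)))

# Positive-like trajectories incur a lower policy loss than negative-like ones.
print(policy_loss(pos) < policy_loss(neg))  # True
```

Minimizing such a loss drives the policy to imitate positive-outcome trajectories while staying dissimilar to negative-outcome ones, which is the intuition behind combining the two discriminators.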
- Referring now to
FIG. 6, a generalized diagram of a neural network is shown. ANNs demonstrate an ability to derive meaning from complicated or imprecise data and can be used to extract patterns and detect trends that are too complex to be detected by humans or other computer-based systems. The structure of a neural network is known generally to have input neurons 602 that provide information to one or more “hidden” neurons 604. Connections 608 between the input neurons 602 and hidden neurons 604 are weighted, and these weighted inputs are then processed by the hidden neurons 604 according to some function, with weighted connections 608 between the layers. There may be any number of layers of hidden neurons 604, as well as neurons that perform different functions. Different neural network structures also exist, such as convolutional neural networks, maxout networks, etc. Finally, a set of output neurons 606 accepts and processes weighted input from the last set of hidden neurons 604.
input neurons 602 to the output neurons 606. Upon completion of a feed-forward computation, the output is compared to a desired output available from training data. The error relative to the training data is then processed in a “feed-back” computation, where the hidden neurons 604 and input neurons 602 receive information regarding the error propagating backward from the output neurons 606. Once the backward error propagation has been completed, weight updates are performed, with the weighted connections 608 being updated to account for the received error. This represents just one variety of ANN. - Referring now to
FIG. 7, an ANN architecture 700 is shown. It should be understood that the present architecture is purely exemplary, and that other architectures or types of neural network may be used instead. The ANN embodiment described herein is included with the intent of illustrating general principles of neural network computation at a high level of generality and should not be construed as limiting in any way.
- During feed-forward operation, a set of
input neurons 702 each provide an input signal in parallel to a respective row of weights 704. The weights 704 each have a respective settable value, such that a weight output passes from the weight 704 to a respective hidden neuron 706 to represent the weighted input to the hidden neuron 706. In software embodiments, the weights 704 may simply be represented as coefficient values that are multiplied against the relevant signals. The signals from each weight add column-wise and flow to a hidden neuron 706. - The
hidden neurons 706 use the signals from the array of weights 704 to perform some calculation. The hidden neurons 706 then output a signal of their own to another array of weights 704. This array performs in the same way, with each column of weights 704 receiving a signal from its respective hidden neuron 706 to produce a weighted signal output that adds row-wise and is provided to the output neuron 708. - It should be understood that any number of these stages may be implemented, by interposing additional layers of arrays and
hidden neurons 706. It should also be noted that some neurons may be constant neurons 709, which provide a constant output to the array. The constant neurons 709 can be present among the input neurons 702 and/or hidden neurons 706 and are only used during feed-forward operation. - During back propagation, the
output neurons 708 provide a signal back across the array of weights 704. The output layer compares the generated network response to the training data and computes an error. The error signal can be made proportional to the error value. In this example, a row of weights 704 receives a signal from a respective output neuron 708 in parallel and produces an output which adds column-wise to provide an input to the hidden neurons 706. The hidden neurons 706 combine the weighted feedback signal with a derivative of their feed-forward calculation and store an error value before outputting a feedback signal to their respective columns of weights 704. This back propagation travels through the entire network 700 until all hidden neurons 706 and the input neurons 702 have stored an error value. - During weight updates, the stored error values are used to update the settable values of the
weights 704. In this manner, the weights 704 can be trained to adapt the neural network 700 to errors in its processing. It should be noted that the three modes of operation, feed forward, back propagation, and weight update, do not overlap with one another. - Reference in the specification to “one embodiment” or “an embodiment” of the present invention, as well as other variations thereof, means that a particular feature, structure, characteristic, and so forth described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrase “in one embodiment” or “in an embodiment”, as well as any other variations, appearing in various places throughout the specification are not necessarily all referring to the same embodiment. However, it is to be appreciated that features of one or more embodiments can be combined given the teachings of the present invention provided herein.
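The three modes of operation noted above, feed forward, back propagation, and weight update, can be sketched as a toy training loop. This is a minimal sketch under assumed shapes and a hypothetical learning rate, not the patented architecture; the network sizes and synthetic data below are chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(32, 3))                   # input signals
t = (X.sum(axis=1, keepdims=True) > 0) * 1.0   # desired outputs (training data)

W1 = 0.5 * rng.normal(size=(3, 8))  # weights, input neurons -> hidden neurons
W2 = 0.5 * rng.normal(size=(8, 1))  # weights, hidden neurons -> output neuron
lr = 0.1                            # hypothetical learning rate

def loss():
    return float(np.mean((np.tanh(X @ W1) @ W2 - t) ** 2))

before = loss()
for _ in range(200):
    # Mode 1: feed forward, input -> hidden -> output.
    H = np.tanh(X @ W1)
    Y = H @ W2
    # Mode 2: back propagation; each layer stores its error value,
    # combining the feedback signal with the derivative of tanh.
    dY = 2.0 * (Y - t) / len(X)
    dH = (dY @ W2.T) * (1.0 - H ** 2)
    # Mode 3: weight update, applied only after back propagation completes.
    W2 -= lr * H.T @ dY
    W1 -= lr * X.T @ dH
after = loss()
print(after < before)  # the stored errors adapt the weights, reducing loss
```

Note that the three phases do not overlap within an iteration: the feedback quantities `dY` and `dH` are computed from the pre-update weights before either weight array is changed.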
- It is to be appreciated that the use of any of the following “/”, “and/or”, and “at least one of”, for example, in the cases of “A/B”, “A and/or B” and “at least one of A and B”, is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of both options (A and B). As a further example, in the cases of “A, B, and/or C” and “at least one of A, B, and C”, such phrasing is intended to encompass the selection of the first listed option (A) only, or the selection of the second listed option (B) only, or the selection of the third listed option (C) only, or the selection of the first and the second listed options (A and B) only, or the selection of the first and third listed options (A and C) only, or the selection of the second and third listed options (B and C) only, or the selection of all three options (A and B and C). This may be extended for as many items listed.
- The foregoing is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the present invention and that those skilled in the art may implement various modifications without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention. Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.
Claims (17)
1. A method for responding to changing conditions, comprising:
training a model, using a processor, including trajectories that resulted in a positive outcome and trajectories that resulted in a negative outcome, by using an adversarial discriminator to train the model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome, and by using a cooperative discriminator to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome, and including iteratively training the adversarial discriminator, the cooperative discriminator, and the dynamic response regime using a three-party optimization until improvement from one iteration to the next has fallen below a predetermined threshold;
generating a dynamic response regime using the trained model and environment information; and
responding to changing environment conditions in accordance with the dynamic response regime.
2. The method of claim 1 , wherein the historical trajectories include patient treatment trajectories.
3. The method of claim 2 , wherein the positive outcomes are positive patient health outcomes, and the negative outcomes are negative patient health outcomes.
4. The method of claim 2 , wherein the environment information and the environment conditions reflect information about a patient being treated.
5. The method of claim 1 , wherein the adversarial discriminator, the cooperative discriminator, and the dynamic response regime are implemented as multiple-layer perceptrons.
6. The method of claim 1 , wherein training the model comprises training an environment model that encodes environment information as a vector in a latent space.
7. The method of claim 1 , wherein the model is implemented as a variational auto-encoder network.
8. The method of claim 1 , wherein responding to changing environment conditions comprises automatically performing a responsive action to correct a negative condition.
9. A system for responding to changing conditions, comprising:
a machine learning model, configured to generate a dynamic response regime using environment information;
a model trainer, configured to train the machine learning model, including trajectories that resulted in a positive outcome and trajectories that resulted in a negative outcome, by using an adversarial discriminator to train the machine learning model to generate trajectories that are similar to historical trajectories that resulted in a positive outcome, and by using a cooperative discriminator to train the model to generate trajectories that are dissimilar to historical trajectories that resulted in a negative outcome, and to iteratively train the adversarial discriminator, the cooperative discriminator, and the dynamic response regime using a three-party optimization until improvement from one iteration to the next has fallen below a predetermined threshold; and
a response interface, configured to trigger a response to changing environment conditions in accordance with the dynamic response regime.
10. The system of claim 9 , wherein the historical trajectories that resulted in a positive outcome and the historical trajectories that resulted in a negative outcome include patient treatment trajectories.
11. The system of claim 10 , wherein the positive outcomes are positive patient health outcomes, and the negative outcomes are negative patient health outcomes.
12. The system of claim 9 , wherein the environment information and the environment conditions reflect information about a patient being treated.
13. The system of claim 9 , wherein the model trainer is further configured to iteratively train the adversarial discriminator, the cooperative discriminator, and the dynamic response regime using a three-party optimization.
14. The system of claim 9 , wherein the adversarial discriminator, the cooperative discriminator, and the dynamic response regime are implemented as multiple-layer perceptrons in the machine learning model.
15. The system of claim 9 , wherein the model trainer is further configured to train an environment model that encodes the environment information as a vector in a latent space.
16. The system of claim 15 , wherein the environment model is implemented as a variational auto-encoder network in the machine learning model.
17. The system of claim 9 , wherein the response interface is further configured to automatically perform a responsive action to correct a negative condition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US18/362,166 US20230376774A1 (en) | 2019-08-29 | 2023-07-31 | Adversarial Cooperative Imitation Learning for Dynamic Treatment |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962893324P | 2019-08-29 | 2019-08-29 | |
US16/998,228 US11783189B2 (en) | 2019-08-29 | 2020-08-20 | Adversarial cooperative imitation learning for dynamic treatment |
US18/362,166 US20230376774A1 (en) | 2019-08-29 | 2023-07-31 | Adversarial Cooperative Imitation Learning for Dynamic Treatment |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/998,228 Continuation US11783189B2 (en) | 2019-08-29 | 2020-08-20 | Adversarial cooperative imitation learning for dynamic treatment |
Publications (1)
Publication Number | Publication Date |
---|---|
US20230376774A1 true US20230376774A1 (en) | 2023-11-23 |
Family
ID=74679893
Family Applications (4)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/998,228 Active 2042-04-03 US11783189B2 (en) | 2019-08-29 | 2020-08-20 | Adversarial cooperative imitation learning for dynamic treatment |
US18/362,166 Pending US20230376774A1 (en) | 2019-08-29 | 2023-07-31 | Adversarial Cooperative Imitation Learning for Dynamic Treatment |
US18/362,125 Pending US20230376773A1 (en) | 2019-08-29 | 2023-07-31 | Adversarial Cooperative Imitation Learning for Dynamic Treatment |
US18/362,193 Pending US20240005163A1 (en) | 2019-08-29 | 2023-07-31 | Adversarial Cooperative Imitation Learning for Dynamic Treatment |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/998,228 Active 2042-04-03 US11783189B2 (en) | 2019-08-29 | 2020-08-20 | Adversarial cooperative imitation learning for dynamic treatment |
Family Applications After (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US18/362,125 Pending US20230376773A1 (en) | 2019-08-29 | 2023-07-31 | Adversarial Cooperative Imitation Learning for Dynamic Treatment |
US18/362,193 Pending US20240005163A1 (en) | 2019-08-29 | 2023-07-31 | Adversarial Cooperative Imitation Learning for Dynamic Treatment |
Country Status (4)
Country | Link |
---|---|
US (4) | US11783189B2 (en) |
JP (1) | JP7305028B2 (en) |
DE (1) | DE112020004025T5 (en) |
WO (1) | WO2021041185A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024064953A1 (en) * | 2022-09-23 | 2024-03-28 | H. Lee Moffitt Cancer Center And Research Institute, Inc. | Adaptive radiotherapy clinical decision support tool and related methods |
Family Cites Families (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6410289B2 (en) | 2014-03-20 | 2018-10-24 | 日本電気株式会社 | Pharmaceutical adverse event extraction method and apparatus |
CN113421652A (en) * | 2015-06-02 | 2021-09-21 | 推想医疗科技股份有限公司 | Method for analyzing medical data, method for training model and analyzer |
EP3613060A1 (en) * | 2017-04-20 | 2020-02-26 | Koninklijke Philips N.V. | Learning and applying contextual similarities between entities |
US11266355B2 (en) * | 2017-05-19 | 2022-03-08 | Cerner Innovation, Inc. | Early warning system and method for predicting patient deterioration |
WO2019049819A1 (en) * | 2017-09-08 | 2019-03-14 | 日本電気株式会社 | Medical information processing system |
KR101946402B1 (en) * | 2017-10-31 | 2019-02-11 | 고려대학교산학협력단 | Method and system for providing result of prospect of cancer treatment using artificial intelligence |
WO2019086555A1 (en) * | 2017-10-31 | 2019-05-09 | Ge Healthcare Limited | Medical system for diagnosing cognitive disease pathology and/or outcome |
KR20190002059U (en) * | 2018-02-05 | 2019-08-14 | 유정혜 | Genetically customized drug prescription method using web Application |
2020
- 2020-08-20 US US16/998,228 patent/US11783189B2/en active Active
- 2020-08-21 WO PCT/US2020/047332 patent/WO2021041185A1/en active Application Filing
- 2020-08-21 JP JP2022505538A patent/JP7305028B2/en active Active
- 2020-08-21 DE DE112020004025.9T patent/DE112020004025T5/en active Pending
2023
- 2023-07-31 US US18/362,166 patent/US20230376774A1/en active Pending
- 2023-07-31 US US18/362,125 patent/US20230376773A1/en active Pending
- 2023-07-31 US US18/362,193 patent/US20240005163A1/en active Pending
Also Published As
Publication number | Publication date |
---|---|
JP7305028B2 (en) | 2023-07-07 |
US20210065009A1 (en) | 2021-03-04 |
JP2022542283A (en) | 2022-09-30 |
US20230376773A1 (en) | 2023-11-23 |
US20240005163A1 (en) | 2024-01-04 |
US11783189B2 (en) | 2023-10-10 |
DE112020004025T5 (en) | 2022-07-21 |
WO2021041185A1 (en) | 2021-03-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11727279B2 (en) | Method and apparatus for performing anomaly detection using neural network | |
US20200134428A1 (en) | Self-attentive attributed network embedding | |
EP3547226A1 (en) | Cross-modal neural networks for prediction | |
Sumathi et al. | Pre-diagnosis of hypertension using artificial neural network | |
JP7070255B2 (en) | Abnormality discrimination program, abnormality discrimination method and abnormality discrimination device | |
US20230376774A1 (en) | Adversarial Cooperative Imitation Learning for Dynamic Treatment | |
US20190243739A1 (en) | Time series retrieval for analyzing and correcting system status | |
KR20160012537A (en) | Neural network training method and apparatus, data processing apparatus | |
EP4104104A1 (en) | Generative digital twin of complex systems | |
US11606393B2 (en) | Node classification in dynamic networks using graph factorization | |
US20210232918A1 (en) | Node aggregation with graph neural networks | |
US11169865B2 (en) | Anomalous account detection from transaction data | |
JP2004033673A (en) | Unified probability framework for predicting and detecting intracerebral stroke manifestation and multiple therapy device | |
WO2022005626A1 (en) | Partially-observed sequential variational auto encoder | |
Hjerde | Evaluating Deep Q-Learning Techniques for Controlling Type 1 Diabetes | |
Benyó et al. | Artificial intelligence based insulin sensitivity prediction for personalized glycaemic control in intensive care | |
Vijayan et al. | A cerebellum inspired spiking neural network as a multi-model for pattern classification and robotic trajectory prediction | |
KR20240006058A (en) | Systems, methods, and devices for predicting personalized biological states with models generated by meta-learning | |
US20220019892A1 (en) | Dialysis event prediction | |
US20200090025A1 (en) | Performance prediction from communication data | |
Gireesh et al. | Blood glucose prediction algorithms for hypoglycemic and/or hyperglycemic alerts | |
US11544377B2 (en) | Unsupervised graph similarity learning based on stochastic subgraph sampling | |
Othman | Spatial-temporal data modelling and processing for personalised decision support | |
Yang | Deep Learning Model for Detection of Retinal Vessels from Digital Fundus Images-A Survey | |
Guo et al. | Interactive Pattern Discovery in High-Dimensional, Multimodal Data Using Manifolds |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
STPP | Information on status: patent application and granting procedure in general |
Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
AS | Assignment |
Owner name: NEC LABORATORIES AMERICA, INC., NEW JERSEY Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YU, WENCHAO;CHEN, HAIFENG;SIGNING DATES FROM 20200814 TO 20200815;REEL/FRAME:065133/0734 |