CN112270451A - Monitoring and early warning method and system based on reinforcement learning - Google Patents

Monitoring and early warning method and system based on reinforcement learning

Info

Publication number
CN112270451A
Authority
CN
China
Prior art keywords
decision
action
environment
monitoring data
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011217940.8A
Other languages
Chinese (zh)
Other versions
CN112270451B (en)
Inventor
陈芋文
张矩
钟坤华
孙启龙
林小光
刘江
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing Institute of Green and Intelligent Technology of CAS
Original Assignee
Chongqing Institute of Green and Intelligent Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing Institute of Green and Intelligent Technology of CAS filed Critical Chongqing Institute of Green and Intelligent Technology of CAS
Priority to CN202011217940.8A priority Critical patent/CN112270451B/en
Publication of CN112270451A publication Critical patent/CN112270451A/en
Application granted granted Critical
Publication of CN112270451B publication Critical patent/CN112270451B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Strategic Management (AREA)
  • Public Health (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • Databases & Information Systems (AREA)
  • Development Economics (AREA)
  • Primary Health Care (AREA)
  • Game Theory and Decision Science (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a monitoring and early warning method and system based on reinforcement learning, comprising the following steps: predicting the incidence relation between time-series monitoring data and adverse event labels according to the time-series monitoring data input in real time, and creating a decision environment; modeling the agent's decision actions; the agent selecting a decision action according to the time-series monitoring data input at the current moment; the decision environment outputting response information according to the decision action, the response information comprising the environment state and the reward and punishment value of the decision action; inputting the environment state into a pre-constructed deep reinforcement learning framework, and obtaining the action with the highest expected value among all selectable decision actions of the agent as the output of the agent's next action decision; the agent and the decision environment interacting according to the above steps until the end condition is met, and outputting a prediction result. The invention monitors the condition of the target object in real time through reinforcement learning and improves the timeliness of problem handling.

Description

Monitoring and early warning method and system based on reinforcement learning
Technical Field
The invention relates to the field of intelligent medical treatment, in particular to a monitoring and early warning method and system based on reinforcement learning.
Background
Current research mainly predicts critical adverse events in a supervised learning mode, mainly using logistic regression algorithms, random decision tree algorithms, deep neural network algorithms and the like. Supervised learning is usually one-shot and short-sighted, considering only the immediate outcome; its prediction accuracy depends strongly on the data set, the generalization performance of the model is not strong, and the supervised learning mode, particularly deep neural network algorithms, requires a huge labeled data set. The pre-labeling of critical care data sets is a key step that demands high cost and effort, and the labeling of medical data requires a large amount of time from experienced medical specialists, which makes it costly and expensive.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a monitoring and early warning method and system based on reinforcement learning, and mainly solves the problem that adverse event discovery is delayed due to lack of early diagnosis and early warning in the perioperative period.
In order to achieve the above and other objects, the present invention adopts the following technical solutions.
A monitoring and early warning method based on reinforcement learning comprises the following steps:
predicting the incidence relation between the time sequence monitoring data and the adverse event label according to the time sequence monitoring data input in real time, and creating a decision environment;
modeling the agent decision-making action, wherein the decision-making action comprises waiting for the time-sequence monitoring data input of the next time node or outputting a predicted adverse event label;
the intelligent agent selects a decision action according to the time sequence monitoring data input at the current moment; the decision environment outputs response information according to the decision action, wherein the response information comprises environment states and reward and punishment values of the decision action;
inputting the environment state into a pre-constructed deep reinforcement learning framework, and acquiring the action with the highest expected value in all selectable decision actions of the intelligent agent as the output of the next action decision of the intelligent agent;
and interacting the intelligent agent and the decision-making environment according to the steps until an ending condition is met, and outputting a prediction result.
Optionally, the end condition includes completing prediction of all time-series monitored data within the monitored duration or outputting an adverse event tag.
Optionally, the selecting, by the agent, a decision action according to the time-series monitored data input at the current time includes:
setting a selection strategy of the agent, and selecting a decision action according to the selection strategy, wherein the selection strategy comprises the following steps: randomly or according to a preset probability.
Optionally, the decision environment outputs response information according to the decision action, including:
when the decision action is to wait for the time sequence monitoring data of the next time node to be input, a decision environment acquires the time sequence monitoring data of the next time, predicts the incidence relation between the time sequence monitoring data of the next time and the adverse event label and outputs an environment state corresponding to the time sequence monitoring data of the next time;
when the decision action is used for outputting the predicted adverse event label, the decision environment acquires the time sequence monitoring data at the current moment, predicts the incidence relation between the time sequence monitoring data at the current moment and the adverse event label, outputs a reward and punishment value of the decision action, and judges whether the adverse event label predicted by the agent is correct or not according to the reward and punishment value.
Optionally, the method further comprises constructing a reward-penalty utility function, and the decision environment outputs a reward-penalty value of the decision action according to the reward-penalty utility function.
Optionally, the reward penalty utility function comprises:
R(a_t, M_{:t}, l) [the full reward and punishment utility function is given as a formula image in the original publication]
wherein R(a_t, M_{:t}, l) represents the association of the decision action with the corresponding time-series monitoring data; a_t is the decision action; M_{:t} is the subset of time-series monitoring data up to time node t; p is greater than 0 and is the dimension of the time-series monitoring data; the trade-off parameter (shown as a symbol image in the original) balances advance predictability against accuracy; predict label l is the adverse event expected to be predicted; ∪_{k∈L\l} predict label k is a mispredicted adverse event.
Optionally, comprising:
and constructing an evaluation function, evaluating the decision environment, and adjusting the reward and punishment utility function according to an evaluation result.
Optionally, the evaluation function is represented by:
[The evaluation function is given as a formula image in the original publication.]
wherein, C represents an incidence relation prediction model for predicting the time sequence monitoring data and the adverse event label, D' is a test data set, and l is the adverse event label; # denotes the number of data in the set.
A reinforcement learning-based monitoring and early warning system comprises:
the environment modeling module is used for predicting the incidence relation between the time sequence monitoring data and the adverse event label according to the time sequence monitoring data input in real time and creating a decision environment;
the action modeling module is used for modeling the decision action of the intelligent agent, wherein the decision action comprises waiting for the time sequence monitoring data input of the next time node or outputting a predicted adverse event label;
the environment response module is used for selecting decision-making action by the intelligent agent according to the time sequence monitoring data input at the current moment; the decision environment outputs response information according to the decision action, wherein the response information comprises environment states and reward and punishment values of the decision action;
the reinforcement learning module is used for inputting the environment state into a pre-constructed depth reinforcement learning framework, and acquiring the action with the highest expected value in all selectable decision actions of the intelligent agent as the output of the next action decision of the intelligent agent;
and the interactive prediction module is used for interacting the intelligent agent and the decision-making environment according to the steps until the end condition is met and outputting a prediction result.
As described above, the monitoring and early warning method and system based on reinforcement learning of the present invention have the following advantages.
Early warning is carried out on perioperative target objects through a real-time online early warning method, timeliness of problem finding and problem handling is improved, and safety of the target objects is guaranteed.
Drawings
Fig. 1 is a flowchart of a reinforcement learning-based monitoring and early warning method according to an embodiment of the present invention.
Fig. 2 is a schematic interaction flow diagram of a reinforcement learning-based monitoring and early warning method according to an embodiment of the present invention.
FIG. 3 is a schematic diagram of a reinforcement learning process according to an embodiment of the present invention.
Fig. 4 is a block diagram of a monitoring and early warning system based on a reinforcement learning method according to an embodiment of the present invention.
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
Referring to fig. 1, the present invention provides a reinforcement learning-based monitoring and early warning method, which includes steps S01-S05.
In step S01, according to the time-series monitored data input in real time, the association relationship between the time-series monitored data and the adverse event label is predicted, and a decision environment is created:
in an embodiment, a cardiovascular disease patient can be used as a target object, and different monitoring devices are used to monitor multidimensional cardiovascular indicators of the target object respectively, so as to obtain time-series monitoring data including multiple dimensions. Each monitoring device can acquire a group of time-series monitoring data, and the time-series monitoring data acquired by different time nodes can be used for constructing a complete monitoring data set of the target object in the whole perioperative period. Optionally, the time-series monitoring data of a specific time period in the perioperative period may also be used to construct a complete monitoring data set of the patient, and the specific time setting may be adjusted according to the actual application requirements.
In an embodiment, each monitoring device may be respectively docked with the medical system and output the acquired time-series monitoring data to the medical system; the medical system then uses the multi-dimensional time-series monitoring data of the same target object to construct a multi-dimensional monitoring data set. In particular, assume M ∈ R^{p×T} is the monitoring data set of a patient, where there are p monitored variables, i.e., the time-series monitoring data has p dimensions, and the monitoring duration is T. M is the set of all time-series monitoring data within the monitoring duration T, and M_{:t} denotes the set of all time-series monitoring data before time t, which is a subset of the monitoring data set M, with t < T.
The p-dimensional time-series monitoring data corresponding to time node 1, and the expressions of M and M_{:t}, are given as formula images in the original publication.
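The formula images themselves are not recoverable from this text. A plausible reconstruction consistent with the surrounding definitions — an assumption, writing m_t = (m_t^1, m_t^2, ..., m_t^p)^T for the column of p monitored values at time node t — is:

m_1 = (m_1^1, m_1^2, ..., m_1^p)^T
M = [m_1, m_2, ..., m_T] ∈ R^{p×T}
M_{:t} = [m_1, m_2, ..., m_t], t < T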
in one embodiment, a set of labels L for cardiovascular adverse events may be constructed, each label corresponding to an adverse event, and further, a subset M of monitored data for each time node t may be constructed:tCorrelating with cardiovascular adverse event signatures, resulting in each M:tThe status of (2) is shown. In one embodiment, a prediction model for predicting the state of the output time-series monitored data can be obtained based on the correlation between the time-series monitored data of different time nodes and the adverse cardiovascular event. And modeling the decision environment according to the output of the prediction model, and inputting the time sequence monitoring data into the decision environment to predict the corresponding output response information.
In one embodiment, D represents the cardiovascular critical illness data set to be predicted by the prediction model, and C is a prediction model for determining the correspondence between the state of the time-series monitoring data and the label set. The specific forms of D and C are as follows:
D = {(m_i, l_i), i = 1...n | m_i ∈ M, l_i ∈ L}
C: M → L, with m_i ∈ M and l_i ∈ L
[A further expression is given as a formula image in the original publication.]
In an embodiment, an evaluation function may be further constructed to evaluate the prediction accuracy of the prediction model, and specifically, the following formula may be designed to evaluate the performance of the prediction model C, where D' is a test data set and # represents the number of data in the set.
[The evaluation function is given as a formula image in the original publication.]
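The evaluation formula itself is given only as an image. A hedged reconstruction, assuming the function simply measures the fraction of test samples whose label predicted by C matches the true label (consistent with # being a set-counting operator), is:

Acc(C, D') = #{(m, l) ∈ D' : C(m) = l} / #D'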
In step S02, modeling the agent decision-making action, wherein the decision-making action includes waiting for a time-series monitored data input of a next time node or outputting a predicted adverse event label;
in an embodiment, according to the decision environment constructed in step S01, a decision action of the agent in the decision environment is further constructed.
In one embodiment, the time-series monitoring data at the current moment is first input into the intelligent Agent, which makes the action decision. The Agent selects a decision action according to its own selection strategy. Specifically, the selection strategy may be random selection or selection according to a preset probability (e.g., an epsilon-greedy strategy), and may be set according to the actual application requirements, which is not limited herein.
Specifically, the decision action may be expressed as:
a_t ∈ {wait} ∪ {∪_{k∈L} predict label k} [the action set is given as a formula image in the original publication]
wherein wait represents waiting for the time-series monitoring data of the next time node, and ∪_{k∈L} predict label k represents the adverse event label predicted by the Agent.
According to the state of the time-series monitoring data at the current time node and its own policy π_Θ, the Agent selects a decision action a_t, as follows:
a_t = π_Θ(O_t)
[A further expression is given as a formula image in the original publication.]
where O_t = M_{:t}.
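For illustration only, a minimal sketch of this action space and of a preset-probability (epsilon-greedy) selection strategy is given below in Python. The names (LABELS, ACTIONS, EPSILON, select_action) and the value of the exploration probability are assumptions of this sketch; the four label strings reuse the example adverse events listed later in the embodiment.

```python
import random

# Placeholder adverse-event label set L; the four categories follow the example
# given later in the embodiment (heart failure, cardiac arrest, myocardial
# ischemia, arrhythmia).
LABELS = ["heart_failure", "cardiac_arrest", "myocardial_ischemia", "arrhythmia"]

# Action space: wait for the next time node, or predict one of the labels.
ACTIONS = ["wait"] + ["predict:" + k for k in LABELS]

EPSILON = 0.1  # assumed exploration probability for the epsilon-greedy strategy


def select_action(q_values):
    """Select a decision action a_t.

    q_values: dict mapping each action in ACTIONS to its estimated expected
    value for the current observation O_t = M_:t. With probability EPSILON a
    random action is explored; otherwise the action with the highest expected
    value is chosen.
    """
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(q_values, key=q_values.get)
```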
In step S03, the agent selects a decision-making action according to the time-series monitored data input at the current moment; and the decision environment outputs response information according to the decision action, wherein the response information comprises environment states and reward and punishment values of the decision action.
In an embodiment, a reward and punishment utility function may be constructed, and the decision environment outputs the reward and punishment value of the decision action according to this function. The reward and punishment utility function is an important component of Agent training: it encodes the task of the Agent and directly influences the Agent's behavior. Its design must reflect the optimization target; for early prediction of cardiovascular critical events, the goal is for the Agent to recognize the premonitory signs of a critical event as early as possible while ensuring early-warning accuracy, since a high false-alarm rate generates unnecessary workload for medical staff. The reward and punishment utility function maps the Agent's decision behavior on the monitoring data set to the real number space R, i.e., R: A × D → R, learning the association between the decision environment and the behavior corresponding to the time-series monitoring data observed by the Agent; namely r_t = R(a_t, M_{:t}, l).
In one embodiment, the reward and punishment utility function is designed by balancing the accuracy and the advance predictability of the Agent's prediction of disease deterioration based on the labeled monitoring data set (M, l) ∈ D. The specific reward and punishment utility function is as follows:
[The reward and punishment utility function is given as a formula image in the original publication.]
wherein p is greater than 0, and the trade-off parameter (shown as a symbol image in the original) balances advance predictability against accuracy.
With this reward and punishment function, the Agent obtains a positive reward when it correctly predicts a cardiovascular critical adverse event, is punished for a wrong prediction, and receives a corresponding penalty when the early warning is delayed.
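The reward and punishment formula itself is given only as an image. A hedged sketch consistent with the properties stated here and in the later embodiment — a positive reward for a correct prediction, a penalty for a wrong prediction, and a delay penalty that is time-independent when p = 0 and time-dependent when p > 0 — is written below; the exact functional form and the symbol λ for the trade-off parameter are assumptions:

R(a_t, M_{:t}, l) =
  +1,            if a_t = predict label l (correct prediction)
  -1,            if a_t = predict label k, k ∈ L\l (wrong prediction)
  -λ·(t/T)^p,    if a_t = wait (delayed early warning)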
In one embodiment, when the decision action is waiting for the time sequence monitoring data of the next time node to be input, the decision environment acquires the time sequence monitoring data of the next time, predicts the association relationship between the time sequence monitoring data of the next time and the adverse event label, and outputs the environment state corresponding to the time sequence monitoring data of the next time;
when the decision action is outputting the predicted adverse event label, the decision environment acquires the time sequence monitoring data at the current moment, predicts the incidence relation between the time sequence monitoring data at the current moment and the adverse event label, outputs a reward and punishment value of the decision action, and judges whether the adverse event label predicted by the agent is correct or not according to the reward and punishment value.
In step S04, inputting the environment state into a pre-constructed deep reinforcement learning framework, and obtaining the action with the highest expected value in all selectable decision actions of the agent as the output of the next action decision of the agent;
in one embodiment, the cardiovascular critical adverse event prediction is converted into a Markov decision problem, and an optimal decision action is selected to be output based on the relevance of the environment state and the decision action of the reinforcement learning framework learning.
Referring to fig. 3, in an embodiment, deep reinforcement learning combining a temporal convolutional network with the Q-learning method can be used to model the deterioration of the condition of the cardiovascular critical patient, and the model is trained using a deep Q reinforcement learning framework.
Specifically, a monitoring data set unit 05 is constructed. The time-series monitoring data obtained from the interaction between the Agent and the decision environment at each time node is stored in the monitoring data set unit 05, and during training a certain amount of time-series monitoring data is randomly drawn from the monitoring data set unit 05, so as to address the problems of data correlation and non-stationary distribution.
The current temporal convolutional network 04 is used to evaluate the estimated value function for each possible decision action of the Agent under the environment state output by the decision environment. The estimated value function evaluates the expected reward of a decision action over a preset long-term horizon: a high reward and punishment value at the current moment does not mean that the long-term expected reward is also high, so the effectiveness of a decision action can be evaluated comprehensively through the estimated value function. The manner in which the expected value is calculated is not limited herein.
The target value temporal convolutional network 06 is used to evaluate the true value function corresponding to the time-series monitoring data, and the parameters of the current temporal convolutional network 04 are updated by gradient descent according to the error between the true value function and the estimated value function. The parameters of the current temporal convolutional network 04 are copied to the target value temporal convolutional network 06 every N iterations, and the two networks may use the same network structure.
Finally, the decision action with the highest expected value is obtained as the Agent's optimal decision and is output.
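For illustration only, a compact sketch of the training update described above — a current value network and a target value network sharing a temporal convolutional architecture, a monitoring data set unit that is sampled randomly, and a periodic parameter copy every N iterations — is given below in PyTorch. All names (TCNQNet, ReplayStore, train_step, N_SYNC), the layer sizes, the discount factor, and the use of a smooth L1 loss are assumptions of this sketch, not details taken from the patent.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F


class TCNQNet(nn.Module):
    """Temporal convolutional value network: input is the p-dimensional
    monitoring sequence M_:t, output is one expected value per action."""

    def __init__(self, p_dims, n_actions, hidden=32):
        super().__init__()
        self.conv1 = nn.Conv1d(p_dims, hidden, kernel_size=3, padding=2, dilation=2)
        self.conv2 = nn.Conv1d(hidden, hidden, kernel_size=3, padding=4, dilation=4)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, x):            # x: (batch, p_dims, t)
        h = F.relu(self.conv1(x))
        h = F.relu(self.conv2(h))
        h = h.mean(dim=-1)           # pool over the time axis
        return self.head(h)          # (batch, n_actions)


class ReplayStore:
    """Monitoring data set unit: stores (O_t, a_t, r_t, O_t+1, done) transitions
    and is sampled randomly to ease data correlation and non-stationarity."""

    def __init__(self, capacity=10000):
        self.buf = deque(maxlen=capacity)

    def add(self, transition):
        self.buf.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buf, batch_size)


def train_step(current_net, target_net, optimizer, batch, gamma=0.99):
    """One gradient-descent update of the current network toward the target value.
    batch: pre-collated tensors (obs, actions, rewards, next_obs, done)."""
    obs, actions, rewards, next_obs, done = batch
    q = current_net(obs).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * (1.0 - done) * target_net(next_obs).max(dim=1).values
    loss = F.smooth_l1_loss(q, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


N_SYNC = 100  # copy the current-network parameters to the target network every N iterations


def sync_target(current_net, target_net):
    target_net.load_state_dict(current_net.state_dict())
```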
In step S05, the agent interacts with the decision environment according to the above steps until the end condition is satisfied, and a prediction result is output.
Referring to fig. 2, in one embodiment, the prediction Agent02 interacts with the vital signs monitoring data environment (e.g., the vital signs monitoring devices) of the critical patient at each moment to obtain high-dimensional monitored sign data (i.e., the p-dimensional time-series monitoring data).
The prediction model 01 is used to associate the time-series monitoring data with adverse events, obtaining the environment state representation and the reward and punishment value of the agent's decision action.
The environment state is fed back to the Agent02, and the reinforcement learning framework learns the decision actions the agent may take in this state and the expected value corresponding to each decision action; the decision action with the highest expected value is output.
The Agent02 feeds the decision action back to the decision environment 03 to obtain the response information of the decision environment 03, and this interaction process repeats in a loop until the end condition is met, whereupon a prediction result is output.
In one embodiment, the end condition includes completing the prediction of all time series monitored data within the monitored duration or outputting an adverse event tag.
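A minimal sketch of this interaction loop, under the same assumptions as the sketches above (the environment object and its reset()/step() interface stand in for the decision environment 03, and current_net stands in for Agent02's value network; all of these names are hypothetical):

```python
import torch


def run_episode(env, current_net, horizon_T):
    """Interact with the decision environment until the end condition is met:
    either all time-series monitoring data within the monitoring duration T is
    processed, or a predicted adverse event label is output."""
    obs = env.reset()                         # O_1 = M_:1
    for t in range(1, horizon_T + 1):
        with torch.no_grad():
            q_values = current_net(obs.unsqueeze(0)).squeeze(0)
        action = int(q_values.argmax())       # action with the highest expected value
        obs, reward, done = env.step(action)  # decision environment responds
        if done:                              # an adverse event label was output
            return action, reward
    return None, 0.0                          # monitoring duration exhausted without an event
```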
In one embodiment, the early warning message may be initiated when the Agent outputs a predicted adverse event. Optionally, cardiovascular critical adverse events mainly comprise four categories, for example: heart failure, cardiac arrest, myocardial ischemia, and arrhythmia.
Optionally, if a_t = wait, the Agent waits for more data input and continues observing the data sequence in the monitoring data set, where the sequence observed at the next moment is the subset extended by one additional time node: O_{t+1} = M_{:t+1}; if a_t ∈ {∪_{k∈L} predict label k}, the sequence learning ends.
Optionally, when the parameter p is set to 0, the Agent is subject to a time-independent penalty for delayed prediction; when p > 0 the penalty becomes time-dependent, so that the limited subset sequences at the beginning of the monitoring data stream incur a smaller penalty for delayed early warning than receiving more inputs late in the sequence.
Optionally, the behavior of the prediction Agent is evaluated by a penalty function. If the data observed by the prediction Agent cannot identify the adverse cardiovascular event, the monitoring Agent waits to observe more monitored data or directly makes an early warning prompt.
When the Agent gives a prediction and judges that the data of the current time node corresponds to a certain adverse event, the preset early warning information of the medical system is triggered. The early warning information may include a text description, a voice alert, etc. corresponding to the adverse event.
Referring to fig. 4, the embodiment provides a reinforcement learning based monitoring and early warning system for implementing the reinforcement learning based monitoring and early warning method in the foregoing embodiment. Since the technical principle of the system embodiment is similar to that of the method embodiment, repeated description of the same technical details is omitted.
In one embodiment, the reinforcement learning-based monitoring and early warning system includes an environment modeling module 10, an action modeling module 11, an environment response module 12, and a reinforcement learning module 13, where the environment modeling module 10 is configured to assist in performing step S01 described in the foregoing method embodiment; the action modeling module 11 is used to assist in executing step S02 described in the foregoing method embodiments; the environmental response module 12 is used to assist in executing step S03 described in the previous method embodiment; the reinforcement learning module 13 is used to assist in executing step S04 described in the foregoing method embodiment; the environmental response module 12 and the reinforcement learning module 13 are used to assist in executing step S05 described in the foregoing method embodiments.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims (9)

1. A monitoring and early warning method based on reinforcement learning is characterized by comprising the following steps:
predicting the incidence relation between the time sequence monitoring data and the adverse event label according to the time sequence monitoring data input in real time, and creating a decision environment;
modeling the agent decision-making action, wherein the decision-making action comprises waiting for the time-sequence monitoring data input of the next time node or outputting a predicted adverse event label;
the intelligent agent selects a decision action according to the time sequence monitoring data input at the current moment; the decision environment outputs response information according to the decision action, wherein the response information comprises environment states and reward and punishment values of the decision action;
inputting the environment state into a pre-constructed deep reinforcement learning framework, and acquiring the action with the highest expected value in all selectable decision actions of the intelligent agent as the output of the next action decision of the intelligent agent;
and interacting the intelligent agent and the decision-making environment according to the steps until an ending condition is met, and outputting a prediction result.
2. The reinforcement learning-based monitoring and early warning method according to claim 1, wherein the end condition comprises completion of prediction of all time-series monitoring data within a monitoring duration or output of an adverse event label.
3. The reinforcement learning-based monitoring and early warning method according to claim 1, wherein the agent selects a decision-making action according to the time-series monitoring data input at the current moment, and the decision-making action comprises:
setting a selection strategy of the agent, and selecting a decision action according to the selection strategy, wherein the selection strategy comprises the following steps: randomly or according to a preset probability.
4. The reinforcement learning-based monitoring and early warning method according to claim 1, wherein the decision environment outputs response information according to the decision action, and comprises:
when the decision action is to wait for the time sequence monitoring data of the next time node to be input, a decision environment acquires the time sequence monitoring data of the next time, predicts the incidence relation between the time sequence monitoring data of the next time and the adverse event label and outputs an environment state corresponding to the time sequence monitoring data of the next time;
when the decision action is used for outputting the predicted adverse event label, the decision environment acquires the time sequence monitoring data at the current moment, predicts the incidence relation between the time sequence monitoring data at the current moment and the adverse event label, outputs a reward and punishment value of the decision action, and judges whether the adverse event label predicted by the agent is correct or not according to the reward and punishment value.
5. The reinforcement learning-based monitoring and early-warning method according to claim 1, comprising constructing a reward and punishment utility function, wherein the decision environment outputs a reward and punishment value of the decision action according to the reward and punishment utility function.
6. The reinforcement learning-based monitoring and early warning method according to claim 5, wherein the reward and punishment utility function comprises:
R(a_t, M_{:t}, l) [the full reward and punishment utility function is given as a formula image in the original publication]
wherein R(a_t, M_{:t}, l) represents the association of the decision action with the corresponding time-series monitoring data; a_t is the decision action; M_{:t} is the subset of time-series monitoring data up to time node t; p is greater than 0 and is the dimension of the time-series monitoring data; the trade-off parameter (shown as a symbol image in the original) balances advance predictability against accuracy; predict label l is the adverse event expected to be predicted; ∪_{k∈L\l} predict label k is a mispredicted adverse event.
7. The reinforcement learning-based monitoring and early warning method according to claim 5, comprising:
and constructing an evaluation function, evaluating the decision environment, and adjusting the reward and punishment utility function according to an evaluation result.
8. The reinforcement learning-based monitoring and early warning method according to claim 7, wherein the evaluation function is represented as:
[The evaluation function is given as a formula image in the original publication.]
wherein C represents an incidence relation prediction model for predicting the time-series monitoring data and the adverse event label, D' is a test data set, and l is the adverse event label; # denotes the number of data in the set.
9. A guardianship early warning system based on reinforcement learning, characterized by comprising:
the environment modeling module is used for predicting the incidence relation between the time sequence monitoring data and the adverse event label according to the time sequence monitoring data input in real time and creating a decision environment;
the action modeling module is used for modeling the decision action of the intelligent agent, wherein the decision action comprises waiting for the time sequence monitoring data input of the next time node or outputting a predicted adverse event label;
the environment response module is used for selecting decision-making action by the intelligent agent according to the time sequence monitoring data input at the current moment; the decision environment outputs response information according to the decision action, wherein the response information comprises environment states and reward and punishment values of the decision action;
the reinforcement learning module is used for inputting the environment state into a pre-constructed depth reinforcement learning framework, and acquiring the action with the highest expected value in all selectable decision actions of the intelligent agent as the output of the next action decision of the intelligent agent; and interacting the intelligent agent and the decision-making environment according to the steps until an ending condition is met, and outputting a prediction result.
CN202011217940.8A 2020-11-04 2020-11-04 Monitoring and early warning method and system based on reinforcement learning Active CN112270451B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011217940.8A CN112270451B (en) 2020-11-04 2020-11-04 Monitoring and early warning method and system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011217940.8A CN112270451B (en) 2020-11-04 2020-11-04 Monitoring and early warning method and system based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN112270451A true CN112270451A (en) 2021-01-26
CN112270451B CN112270451B (en) 2022-05-24

Family

ID=74344969

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011217940.8A Active CN112270451B (en) 2020-11-04 2020-11-04 Monitoring and early warning method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN112270451B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114418242A (en) * 2022-03-28 2022-04-29 海尔数字科技(青岛)有限公司 Material discharging scheme determination method, device, equipment and readable storage medium

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108600379A (en) * 2018-04-28 2018-09-28 中国科学院软件研究所 A kind of isomery multiple agent Collaborative Decision Making Method based on depth deterministic policy gradient
CN108694465A (en) * 2018-05-16 2018-10-23 南京邮电大学 Urban SOS Simulation Decision optimization method based on the Q study of SVM vector machines
CN109783709A (en) * 2018-12-21 2019-05-21 昆明理工大学 A kind of sort method based on Markovian decision process and k- arest neighbors intensified learning
CN110263979A (en) * 2019-05-29 2019-09-20 阿里巴巴集团控股有限公司 Method and device based on intensified learning model prediction sample label
EP3543918A1 (en) * 2018-03-20 2019-09-25 Flink AI GmbH Reinforcement learning method
CN110826624A (en) * 2019-11-05 2020-02-21 电子科技大学 Time series classification method based on deep reinforcement learning
CN111578940A (en) * 2020-04-24 2020-08-25 哈尔滨工业大学 Indoor monocular navigation method and system based on cross-sensor transfer learning
US20200337648A1 (en) * 2019-04-24 2020-10-29 GE Precision Healthcare LLC Medical machine time-series event data processor
CN111861752A (en) * 2020-07-24 2020-10-30 中山大学 Trend transaction method and system based on reinforcement learning

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3543918A1 (en) * 2018-03-20 2019-09-25 Flink AI GmbH Reinforcement learning method
CN108600379A (en) * 2018-04-28 2018-09-28 中国科学院软件研究所 A kind of isomery multiple agent Collaborative Decision Making Method based on depth deterministic policy gradient
CN108694465A (en) * 2018-05-16 2018-10-23 南京邮电大学 Urban SOS Simulation Decision optimization method based on the Q study of SVM vector machines
CN109783709A (en) * 2018-12-21 2019-05-21 昆明理工大学 A kind of sort method based on Markovian decision process and k- arest neighbors intensified learning
US20200337648A1 (en) * 2019-04-24 2020-10-29 GE Precision Healthcare LLC Medical machine time-series event data processor
CN110263979A (en) * 2019-05-29 2019-09-20 阿里巴巴集团控股有限公司 Method and device based on intensified learning model prediction sample label
CN110826624A (en) * 2019-11-05 2020-02-21 电子科技大学 Time series classification method based on deep reinforcement learning
CN111578940A (en) * 2020-04-24 2020-08-25 哈尔滨工业大学 Indoor monocular navigation method and system based on cross-sensor transfer learning
CN111861752A (en) * 2020-07-24 2020-10-30 中山大学 Trend transaction method and system based on reinforcement learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
QISHENG WANG et al.: "Prioritized Guidance for Efficient Multi-Agent Reinforcement Learning Exploration", Machine Learning *
王寻: "Design and Research of Agent Decision-Making Models in Game Environments Based on Reinforcement Learning", China Excellent Master's Theses Full-text Database, Basic Sciences *
程引: "Design and Application of Time Series Decision Systems Based on Reinforcement Learning", China Doctoral Dissertations Full-text Database, Basic Sciences *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114418242A (en) * 2022-03-28 2022-04-29 海尔数字科技(青岛)有限公司 Material discharging scheme determination method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN112270451B (en) 2022-05-24

Similar Documents

Publication Publication Date Title
EP3620983B1 (en) Computer-implemented method, computer program product and system for data analysis
KR102216689B1 (en) Method and system for visualizing classification result of deep neural network for prediction of disease prognosis through time series medical data
Xiao et al. Learning time series associated event sequences with recurrent point process networks
US9875142B2 (en) System and method for efficient task scheduling in heterogeneous, distributed compute infrastructures via pervasive diagnosis
Biloš et al. Neural flows: Efficient alternative to neural ODEs
Fanti et al. A three-level strategy for the design and performance evaluation of hospital departments
CN109326353B (en) Method and device for predicting disease endpoint event and electronic equipment
Jabbari et al. Discovery of causal models that contain latent variables through Bayesian scoring of independence constraints
Lefebvre Fault diagnosis and prognosis with partially observed stochastic Petri nets
Liu et al. Multi-task learning via adaptation to similar tasks for mortality prediction of diverse rare diseases
CN112270451B (en) Monitoring and early warning method and system based on reinforcement learning
Prats et al. Automatic generation of workload profiles using unsupervised learning pipelines
Guan et al. Structural dominant failure modes searching method based on deep reinforcement learning
Yu et al. MAG: A novel approach for effective anomaly detection in spacecraft telemetry data
Xiang et al. Reliable post-signal fault diagnosis for correlated high-dimensional data streams
US20220318615A1 (en) Time-aligned reconstruction recurrent neural network for multi-variate time-series
Huegle et al. MPCSL-a modular pipeline for causal structure learning
Berkenstadt et al. Queueing inference for process performance analysis with missing life-cycle data
Lee et al. Clinical event time-series modeling with periodic events
KR102203336B1 (en) Method and apparatus for experimental design optimization and hypothesis generation using generative model
De Oliveira et al. An optimization-based process mining approach for explainable classification of timed event logs
CN114329938A (en) System reliability analysis method and device, computer equipment and storage medium
US20220262524A1 (en) Parameter-estimation of predictor model using parallel processing
KR102182807B1 (en) Apparatus of mixed effect composite recurrent neural network and gaussian process and its operation method
Hanamori et al. Real-time monitoring solution to detect symptoms of system anomalies

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant