WO2023202500A1 - Deep reinforcement learning based assistive adjustment system for dry weight of hemodialysis patient - Google Patents


Info

Publication number
WO2023202500A1
Authority
WO
WIPO (PCT)
Prior art keywords
dialysis
dry weight
patient
data
reinforcement learning
Prior art date
Application number
PCT/CN2023/088561
Other languages
French (fr)
Chinese (zh)
Inventor
李劲松
田雨
周天舒
杨子玥
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学
Publication of WO2023202500A1


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H 10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Definitions

  • the invention belongs to the technical fields of medical treatment and machine learning. Specifically, it relates to an auxiliary adjustment system for the dry weight of hemodialysis patients based on deep reinforcement learning.
  • Dry body weight is one of the most fundamental components of any dialysis prescription and is clinically determined as the lowest tolerated post-dialysis weight without adverse intradialytic symptoms or hypotension in the absence of significant fluid overload. Accurate assessment of dry body weight is crucial for the survival prognosis of hemodialysis patients, and inaccurate estimation has a strongly negative impact on patient survival. Overestimating the patient's dry weight leads to chronic fluid overload and may cause harm by inducing edema, pulmonary congestion, hypertension, and vascular and cardiac damage; underestimating the patient's dry weight leads to chronic dehydration, cramps and other dialysis side effects, increases the risk of intradialytic hypotension, and can also lead to loss of residual renal function (RRF).
  • RRF residual renal function
  • bioelectrical impedance analysis is a non-invasive and simple technology to assist in the assessment and determination of dry body weight
  • relative plasma volume (RPV) monitoring has been validated as one of the markers of dry body weight
  • lung ultrasound has emerged as a technique for guiding dry body weight assessment.
  • none of these methods serve as the gold standard for dry body weight assessment.
  • dry body weight often fluctuates due to uncertainty about the patient's nutritional status or underlying disease, necessitating ongoing reassessment.
  • clinicians may not notice changes in these patients in a timely manner, resulting in delayed or even missed dry weight adjustments.
  • Existing studies can only assess a patient's hydration status at a certain point in time to estimate dry body weight, and cannot help clinicians detect potential changes in dry body weight over time.
  • the purpose of the present invention is to address the shortcomings of the existing technology and propose an auxiliary dry weight adjustment system for hemodialysis patients based on deep reinforcement learning to dynamically support clinicians in determining personalized dry weight adjustment plans for hemodialysis patients.
  • an auxiliary adjustment system for the dry weight of hemodialysis patients based on deep reinforcement learning, which includes a data acquisition module, a data processing module, a policy learning module and an auxiliary decision-making module;
  • the data acquisition module is used to collect medical electronic medical record data of hemodialysis patients during the dialysis induction period and dialysis stable period, and input it into the data processing module;
  • the data processing module is used to process the data collected by the data acquisition module, including the construction of the state space and the construction of the action space; a state represents the time-series encoded clinical variables of the patient's dialysis sessions, and an action represents the value by which the current dry weight should be adjusted relative to the dry weight of the previous dialysis session;
  • the policy learning module is used to set the reward function of deep reinforcement learning.
  • the reward function is the immediate reward of each state, composed of a reward for the patient's long-term survival probability and a penalty for the patient's current intradialytic symptoms; deep reinforcement learning is then performed on the state space and action space constructed by the data processing module to obtain the dry weight adjustment strategy;
  • the auxiliary decision-making module is used to visually output the dry weight adjustment strategy to assist doctors in decision-making.
  • for patients in the dialysis induction period, the data collection module collects data at every dialysis session; for patients in the stable period of dialysis, the data collection module collects data once every 4 dialysis sessions.
  • the data of each dialysis session include four types of clinical variables: the intradialytic measured variables of the previous dialysis session, the post-dialysis measured variables of the previous dialysis session, the pre-dialysis measured variables of the current dialysis session, and the patient demographic indicators of the current dialysis session.
  • for patients in the stable period of dialysis, depending on the clinical variable, the clinical variable values collected and recorded by the data collection module are the average or the sum of the corresponding clinical variable values over those 4 dialysis sessions.
  • the data processing module first preprocesses the data collected by the data acquisition module, interpolates missing clinical variable data by multiple imputation, normalizes the clinical variable data with the Min-Max normalization method, and then constructs the state space from the preprocessed data.
  • the data processing module uses a long short-term memory network autoencoder to perform temporal encoding of the preprocessed clinical variable data; the long short-term memory network autoencoder is trained and optimized to minimize the reconstruction loss between the original input and the decoded output.
  • when constructing the action space, the data processing module uses backward interpolation to fill in the physician's recommended dry weight value for each dialysis session, calculates the change in the patient's dry weight in the current dialysis session relative to the previous dialysis session, and discretizes this change.
  • part of the reward function uses a multi-layer perceptron network to predict the probability that the patient dies within one year in the corresponding state, and the reward is set as the negative log odds of this probability; the other part of the reward function is a penalty for side-effect symptoms occurring during dialysis, which varies with the intradialytic symptom and its severity.
  • an experience replay pool is constructed and a deep double Q network is used for deep reinforcement learning. Experience replay refers to saving the reward and state update obtained from each interaction with the environment for later use in updating the target Q value during deep reinforcement learning.
  • doctors can set an evaluation threshold; adjustments below this threshold are evaluated directly by nurses and selectively executed, while adjustments above this threshold are evaluated and selectively executed by physicians, thereby providing decision support for dry weight adjustment.
  • the present invention models the important clinical problem of dry weight assessment as a sequential decision-making problem of dry weight adjustment; it combines clinical knowledge and physician experience to construct a reward function tailored to the dry weight adjustment process, reflecting both the patient's long-term survival reward and a short-term penalty for adverse intradialytic symptoms; it uses a reinforcement learning agent, a deep double Q network with a dueling architecture, to make full use of time-series electronic medical record data and learn the optimal dry weight adjustment strategy; it can reduce the workload of physicians, can take more patient characteristic variables into account when assessing a patient's dry weight, and helps physicians balance short-term and long-term benefits and customize a personalized dry weight adjustment plan for the patient.
  • Figure 1 is a structural block diagram of the dry weight auxiliary adjustment system for hemodialysis patients based on deep reinforcement learning of the present invention.
  • Figure 2 is a schematic diagram of the data reconstruction process in the data acquisition module of the present invention.
  • Figure 3 is a schematic diagram of the modeling process of dry body weight adjustment according to the Markov decision process of the present invention.
  • Figure 4 is an overall architecture diagram of the strategy learning module of the present invention.
  • Reinforcement learning is a popular research direction in artificial intelligence. It is based on an agent that continuously interacts with an environment, with the goal of finding an optimal policy that maximizes the expected cumulative reward.
  • reinforcement learning has been introduced into the healthcare field and plays an increasingly important role in many sequential decision-making problems, such as glycemic control for patients with diabetes, treatment of patients with sepsis, and mechanical ventilation settings.
  • reinforcement learning techniques have not been used to support clinicians in assessing dry body weight in hemodialysis patients.
  • the present invention uses the Markov decision process framework to model the dry weight assessment process as a sequential decision-making process, defines separate state spaces and action spaces for different dialysis periods, and designs a reward scheme that incorporates clinical background knowledge; the present invention constructs a deep double Q network with a dueling architecture (Dueling-DQN) to learn the optimal dry weight adjustment strategy from historical electronic medical record data, thereby providing nephrologists with clinical decision support for dry weight adjustment and assisting physicians in the long-term management of patient weight.
  • Dueling-DQN Deep double Q network
  • the present invention provides a dry weight auxiliary adjustment system for hemodialysis patients based on deep reinforcement learning.
  • the system includes: a data acquisition module for collecting the medical electronic record data of hemodialysis patients; a data processing module for processing the raw data; a policy learning module for the deep reinforcement learning agent; and an auxiliary decision-making module for visual output and interaction with physicians.
  • the processing process of the data collection module is specifically: collecting clinical data of patients from the medical electronic medical record system, including demographics, laboratory values, dialysis parameters, dialysis symptoms and other related clinical characteristics.
  • the present invention limits the collection time window during data collection; that is, the data of each dialysis session are reconstructed.
  • the data for each dialysis session include four categories of clinical variables: intradialytic measured variables of the previous dialysis session, post-dialysis measured variables of the previous dialysis session, pre-dialysis measured variables of the current dialysis session, and patient demographic indicators of the current dialysis session (shown in Figure 2).
  • the present invention separately processes and models the data of the dialysis induction period (the first three months after the start of dialysis) and the stable dialysis period (from three months after the start of dialysis onward). For patients in the dialysis induction period, the present invention collects data at each dialysis session; for patients in the stable dialysis period, the present invention collects data once every 4 dialysis sessions, and the recorded clinical variable values are the mean (e.g., age) or the sum (e.g., number of occurrences of adverse dialysis symptoms) of the corresponding clinical variable values over those 4 sessions.
  • the processing process of the data processing module includes two parts:
  • the dry body weight adjustment process is modeled as a sequential decision-making process.
  • the present invention models and describes this process based on the Markov decision process (MDP).
  • the Markov decision process is described by the tuple (S, A, T, R, π), where S represents the state space, A represents the action space, T represents the transition probability distribution between different states, R represents the reward function, and π represents the policy, i.e. the mapping from the state space to the action space.
  • the agent can observe a state s_t ∈ S and select an action a_t ∈ A according to the policy π; this is the action selection process. Then, the agent receives the reward r_t related to its action selection according to the reward function R; this is the reward response process.
  • the environment changes to the next state s t+1 ⁇ S in response to the agent's action according to the state transition probability distribution T.
  • the state S represents the clinical variables of the patient's dialysis course after time series encoding
  • the action A represents the value by which the current dry weight should be adjusted (increased or decreased) relative to the dry weight of the previous dialysis session. Because the clinical environment is complex and the probability distribution of state transitions is difficult to model accurately, the present invention treats the state transition probability distribution T as unknown. Guided by the reward function R, the agent learns from historical retrospective data in this unknown, complex environment and outputs the best action selection policy π.
  • the Min-Max normalization method was used to normalize the feature matrix to facilitate subsequent learning and optimization of the deep models. Because the dry weight adjustment process is in fact a partially observable Markov decision process (POMDP), that is, the state transition dynamics and reward distribution do not satisfy the Markov property (the current state does not contain all of the information needed to determine the probability distribution of future states), the present invention uses a long short-term memory network autoencoder to perform temporal encoding of the clinical data collected from patients. The long short-term memory network autoencoder is trained and optimized to minimize the reconstruction loss between the original input and the decoded output.
  • POMDP partially observable Markov decision process
  • the encoder and decoder parts are composed of a single-layer long short-term memory network containing 128 units.
  • i: the patient
  • o_it: the clinical observation feature vector collected for the patient's t-th dialysis session
  • t: the dialysis session time
  • s: the state of the Markov process
  • f: the encoder of the trained long short-term memory network.
  • the present invention uses backward interpolation to fill in the physician's recommended dry body weight value for each dialysis session;
  • the present invention calculates the change in the patient's dry weight in this dialysis session compared to the last dialysis session, and performs discretization processing.
  • Discretization refers to limiting the dry weight adjustment range to a certain interval, dividing it into equally spaced adjustment actions, and taking the action whose value is closest to the physician's continuous dry weight adjustment in that dialysis session as the discretized dry weight adjustment action (the change in dry body weight in a dialysis session relative to the previous dialysis session).
  • the present invention constructs specific action spaces for the dialysis induction period (the first three months after the start of dialysis) and the stable dialysis period (from three months after the start of dialysis onward), as shown in Table 1.
  • the processing process of the policy learning module of the deep reinforcement learning agent includes three parts:
  • the core of the policy learning module of the deep reinforcement learning agent of the present invention is a deep double Q network (DDQN with a dueling structure) based on a competition architecture.
  • the deep double Q network (DDQN) and the dueling-architecture Q network (Dueling-DQN) are both improved versions of DQN.
  • the former is an improvement on the DQN training algorithm, and the latter is an improvement on the DQN model structure.
  • the present invention adopts both improvements at the same time.
  • the DQN algorithm is an improvement on the Q-learning algorithm.
  • the Q-learning algorithm uses a Q-table to record the action value in each state; when the state space or action space is large, the required storage space is also large.
  • the core of the DQN algorithm is to replace the Q-table with an artificial neural network q(s, a; ω), s ∈ S, a ∈ A, i.e. the action value function.
  • the input of the action value network is state information, and the output is the value of each action.
  • the agent selects the action based on the value of each action.
  • Experience replay refers to saving the rewards and status updates obtained from each interaction with the environment for subsequent updates of the target Q value, which can disrupt sample correlation, improve sample utilization, and thereby improve the stability of DQN training.
  • Experience replay has two key steps, storage and replay: storage refers to storing each experience in the experience pool in the form of the current state s_t, the action a_t, the immediate reward r_{t+1}, the next state s_{t+1}, and the episode-done flag; replay refers to sampling one or more pieces of experience data from the experience pool according to certain rules.
  • the present invention adopts prioritized experience replay, that is, each experience in the experience pool is assigned a priority, and experiences with higher priority are more likely to be selected when sampling.
  • the priority depends on the gap between the current Q value and the target Q value of each state transition (the temporal difference error, TD-error); a larger TD-error means there is more room to improve the Q network's prediction accuracy on that sample, so the sample needs to be learned more and is given a higher priority.
  • the reward function is the feedback observed from the environment for a given state-action pair.
  • the main goal of the reinforcement learning agent is to maximize the cumulative reward of the state-action pair given the patient's state-action trajectory, so the design of the reward function is crucial to the learning of the reinforcement learning agent.
  • the reward function is set up to respond immediately to each state in the patient's trajectory.
  • the reward consists of two parts: one part reflects the patient's long-term survival probability, r_1(s), and the other part reflects the patient's current intradialytic symptoms, r_2(s).
  • the present invention trains a multi-layer perceptron (MLP) network to predict the probability that the patient dies within one year in this state. The reward is set as the negative log odds of this probability. In general, a state associated with death within one year receives a negative score and a state associated with survival receives a positive score.
  • MLP multi-layer perceptron
  • r_1(s) represents the survival reward part of the reward function
  • g(s) represents the probability, predicted by the multi-layer perceptron, that the patient dies within one year in state s.
  • Penalties vary with the intradialytic symptom and its severity. According to actual clinical manifestations, one point is deducted for fever, disequilibrium syndrome, cerebral hemorrhage and cerebral infarction, while two points are deducted for headache, muscle spasm, abdominal pain, intradialytic hypotension and intradialytic hypertension.
  • This invention trains and optimizes a deep double Q network with a dueling architecture (Dueling DDQN) and refines the dry weight adjustment strategy through repeated trials to maximize the expected overall return of the predicted reward.
  • TD-error time difference error
  • r_max is set to a value of 20 to improve model stability.
  • L(ω) is the final loss function learned by the dueling-architecture deep double Q network of the present invention (an illustrative code sketch of this update step is given at the end of this section)
  • error_TD is the temporal difference error
  • w_imp is the importance-sampling weight of prioritized experience replay
  • Q_main is the main network of the deep double Q network
  • Q_target is the target network of the deep double Q network
  • ω is the parameter of the main network
  • ω′ is the parameter of the target network
  • γ is the discount coefficient, taking a value between 0 and 1; the higher γ is, the more the agent values future rewards over the current moment
  • s represents the state
  • a represents the action
  • r represents the reward
  • E represents the expectation
  • the regularization term coefficient takes a value between 0 and 1
  • r_{t+1} represents the reward of the (t+1)-th dialysis session
  • s_t represents the state of the t-th dialysis session
  • a_t represents the action of the t-th dialysis session.
  • the special design of the reward function in the present invention effectively improves the policy learning efficiency of the deep Q network.
  • the reward function in the present invention is an immediate reward, that is, each state of the trajectory will give a reward to the agent.
  • the survival reward part r_1(s) of the reward function uses a survival predictor to distribute the survival reward, which would otherwise lie only at the end of the patient's trajectory, in advance and dispersedly across every state of the trajectory.
  • the intradialytic side-effect penalty part r_2(s) of the reward function incorporates the patient's immediate feedback on each dialysis session into the reward, imitating the physician's behavior of adjusting the dry weight according to the patient's clinical performance, so that the strategies learned by the agent are expected not only to improve the patient's survival but also to reduce adverse reactions during dialysis, reduce the physical suffering of dialysis patients, and improve the therapeutic effect of dialysis treatment.
  • the reward determines the goal of the agent's action. Therefore, the immediate reward can guide the agent's behavior better and more timely than the delayed reward.
  • the corresponding loss function is easier to learn and optimize, which improves the agent's learning efficiency.
  • the deep Q network learns a value function (Q network) that maps different states and actions to different Q values, so that different dry weight adjustment actions can be selected for different dialysis treatment states based on this mapping, ultimately forming the dry weight adjustment strategy recommended by the agent.
  • the auxiliary decision-making module used for visual output and interaction with doctors is specifically: according to the patient's different dialysis treatment status, the reinforcement learning agent will recommend the optimal dry weight adjustment value for the patient.
  • Physicians can set an assessment threshold (such as 0.2 kg). Adjustments below this threshold will be directly evaluated by nurses and selectively implemented, while adjustments above this threshold will be evaluated and selectively implemented by physicians, providing auxiliary support for physicians' dry weight adjustment decisions.
  • the system will record the agent's recommended value in each dialysis session, whether the physician accepts the agent's recommendation, and the dry weight adjustment actually performed by the physician; it will regularly evaluate the patient's dialysis adequacy and provide feedback to physicians and algorithm engineers in the form of visual charts, so that the model can be updated and optimized later.
  • a specific example of the present invention is as follows:
  • This example uses the electronic medical record data of maintenance hemodialysis patients who received continuous and regular hemodialysis treatment in a tertiary hospital for research.
  • the data of the dialysis induction period and dialysis stable period are divided into three data sets: training set (60%), validation set (20%), and test set (10%).
  • the data of the training set is used to train the deep reinforcement learning agent model
  • the data of the verification set is used to adjust the optimization parameters
  • the test set is used to test the performance of the model.
  • the present invention uses multiple sampling with replacement (bootstrap) to obtain the confidence interval of the performance index.
  • this embodiment adds a random strategy and a K nearest neighbor strategy to compare and evaluate the effectiveness of the model.
  • the K-nearest-neighbor strategy refers to selecting the action to take by a vote of the K most similar states.
  • This invention uses the weighted doubly robust (WDR) estimator, an off-policy evaluation method, to evaluate the value of the different strategies. The results are shown in Table 2 and Table 3.
  • the dry weight adjustment strategy learned by the deep reinforcement learning agent of the present invention achieved the best results compared with the other strategies. It is worth noting that when the strategy learned by the agent of the present invention is applied to the dialysis induction period, compared with the existing clinician strategy, it is expected that the 5-year mortality rate of hemodialysis patients can be reduced by 9.47%, the 3-year mortality rate by 7.99%, the incidence of adverse dialysis reactions by 8.44%, and the coefficient of variation of intradialytic systolic blood pressure by 4.76%, all statistically significant. Therefore, the present invention is expected to realize dynamic and intelligent adjustment of the dry weight of hemodialysis patients, and is expected to significantly improve the dialysis treatment effect and long-term survival of hemodialysis patients.
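The loss described above combines a double-Q target, the prioritized-replay importance-sampling weight w_imp and a regularization term. As an illustration only, and not the patented implementation, the following PyTorch sketch shows one possible form of this update step; q_main and q_target can be any networks that map states to per-action Q values, and the batch layout, the squared-TD-error loss and the hyperparameter values are assumptions.

```python
import torch

def ddqn_update(q_main, q_target, optimizer, batch, gamma=0.99, lam=1e-3):
    """One prioritized double-DQN update step (illustrative sketch only).

    `batch` is assumed to hold tensors:
      's'      [B, state_dim]  current states
      'a'      [B] long        actions taken
      'r'      [B]             immediate rewards
      's_next' [B, state_dim]  next states
      'done'   [B]             1.0 where the trajectory ended, else 0.0
      'w_imp'  [B]             importance-sampling weights from prioritized replay
    Returns per-sample absolute TD-errors, used to refresh replay priorities.
    """
    q_sa = q_main(batch['s']).gather(1, batch['a'].unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Double-Q target: the main network selects the next action,
        # the target network evaluates it.
        a_star = q_main(batch['s_next']).argmax(dim=1, keepdim=True)
        q_next = q_target(batch['s_next']).gather(1, a_star).squeeze(1)
        target = batch['r'] + gamma * (1.0 - batch['done']) * q_next

    td_error = target - q_sa
    # Importance-sampling weights correct the bias of prioritized sampling;
    # an L2 penalty on the main-network parameters acts as the regularization term.
    loss = (batch['w_imp'] * td_error.pow(2)).mean()
    loss = loss + lam * sum(p.pow(2).sum() for p in q_main.parameters())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Periodically the target parameters would be refreshed, e.g.
    # q_target.load_state_dict(q_main.state_dict()).
    return td_error.detach().abs()
```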

Abstract

Disclosed in the present invention is a deep reinforcement learning based assistive adjustment system for the dry weight of a hemodialysis patient. The system comprises a data collection module, a data processing module, a policy learning module and an assistive decision-making module. In the present invention, by using deep reinforcement learning technology, a double deep Q-network (DDQN) with a dueling structure is constructed as the agent, the process of a doctor adjusting the dry weight of a hemodialysis patient is simulated, and a policy for adjusting the dry weight of the hemodialysis patient is learned intelligently. By means of the present invention, the process of adjusting the dry weight of a hemodialysis patient is modeled as a partially observable Markov decision process, respective state spaces and action spaces are defined for different dialysis periods, and a reward function comprising long-term survival rewards and short-term dialysis side-effect penalties is designed; through interactive learning between the agent and the patient state, a dry weight adjustment policy that maximizes the overall reward is obtained, thereby assisting a doctor in the long-term management of the dry weight of a patient.

Description

A deep reinforcement learning based assistive adjustment system for the dry weight of hemodialysis patients

Technical field

The invention belongs to the technical fields of medical treatment and machine learning, and specifically relates to an assistive adjustment system for the dry weight of hemodialysis patients based on deep reinforcement learning.

Background art

The number of patients with end-stage renal disease is increasing significantly worldwide. Owing to the shortage of donor kidneys, most patients rely on hemodialysis treatment to stay alive. Patients with end-stage renal disease have a far higher risk of infection and of cardiovascular and cerebrovascular disease than the normal population, and their survival is far worse than that of the general population; end-stage renal disease has become a huge burden on the health care system. The main goal of hemodialysis is to correct the composition and volume of body fluids through ultrafiltration (UF) to achieve fluid balance, and dry weight is the key indicator for determining the ultrafiltration volume of a hemodialysis session. Dry weight is one of the most fundamental components of any dialysis prescription and is clinically determined as the lowest tolerated post-dialysis weight without adverse intradialytic symptoms or hypotension in the absence of significant fluid overload. Accurate assessment of dry weight is crucial for the survival prognosis of hemodialysis patients, and inaccurate estimation has a strongly negative impact on patient survival. Overestimating the patient's dry weight leads to chronic fluid overload and may cause harm by inducing edema, pulmonary congestion, hypertension, and vascular and cardiac damage; underestimating the patient's dry weight leads to chronic dehydration, cramps and other dialysis side effects, increases the risk of intradialytic hypotension, and can also lead to loss of residual renal function (RRF).

Existing dry weight assessment techniques cannot provide an accurate, dynamic assessment of the dry weight of hemodialysis patients. In clinical practice, physicians generally assess a patient's dry weight from the clinical manifestations before, during and after dialysis, combined with physical examinations over a period of time. This is a trial-and-error approach, carried out by gradually changing the patient's post-dialysis weight and observing the patient's response to dialysis. However, there is evidence that assessing dry weight from traditional physical signs (such as peripheral edema, lung auscultation and blood pressure) is unreliable. New techniques have therefore continued to emerge in recent years. For example, bioelectrical impedance analysis (BIA) is a non-invasive and simple technique that assists in determining dry weight; relative plasma volume (RPV) monitoring has been validated as one marker of dry weight; and lung ultrasound has emerged as a technique for guiding dry weight assessment. However, none of these methods serves as a gold standard for dry weight assessment. In addition, dry weight often fluctuates because of uncertainty in the patient's nutritional status or underlying disease, so continual reassessment is necessary. Because of their heavy daily workload, clinicians may not notice such changes in time, leading to delayed or even missed dry weight adjustments. Existing studies can only assess a patient's hydration status at a single point in time to estimate dry weight, and cannot help clinicians detect potential changes in dry weight over time.

On the other hand, the existing clinical decision-making process for dry weight depends heavily on the experience and effort of the clinician. Because there is no precise standard, the value of dry weight cannot be calculated from a few patient characteristics and must instead be derived from a comprehensive evaluation of many relevant clinical manifestations. In such a data-dense clinical environment, clinicians must therefore review large amounts of patient characteristic data to assess or monitor dry weight, making the dry weight decision-making process complex, time-consuming and labor-intensive. This also ties the effect of hemodialysis treatment closely to the experience and medical knowledge of the attending physician, aggravating the imbalance in the distribution of regional medical resources.
Summary of the invention

The purpose of the present invention is to address the shortcomings of the existing technology and to propose an assistive dry weight adjustment system for hemodialysis patients based on deep reinforcement learning, so as to dynamically support clinicians in determining personalized dry weight adjustment plans for hemodialysis patients.

The object of the present invention is achieved through the following technical solution: a deep reinforcement learning based assistive adjustment system for the dry weight of hemodialysis patients, the system comprising a data acquisition module, a data processing module, a policy learning module and an assistive decision-making module;

the data acquisition module is used to collect the medical electronic record data of hemodialysis patients during the dialysis induction period and the stable dialysis period and to input them into the data processing module;

the data processing module is used to process the data collected by the data acquisition module, including construction of the state space and construction of the action space; a state represents the time-series encoded clinical variables of the patient's dialysis sessions, and an action represents the value by which the current dry weight should be adjusted relative to the dry weight of the previous dialysis session;

the policy learning module is used to set the reward function of the deep reinforcement learning; the reward function is the immediate reward of each state, composed of a reward for the patient's long-term survival probability and a penalty for the patient's current intradialytic symptoms, and deep reinforcement learning is performed on the state space and action space constructed by the data processing module to obtain the dry weight adjustment policy;

the assistive decision-making module is used to visually output the dry weight adjustment policy to assist physicians in decision-making.
Further, for patients in the dialysis induction period, the data acquisition module collects data at every dialysis session; for patients in the stable dialysis period, the data acquisition module collects data once every 4 dialysis sessions.

Further, the data of each dialysis session include four categories of clinical variables: the intradialytic measured variables of the previous dialysis session, the post-dialysis measured variables of the previous dialysis session, the pre-dialysis measured variables of the current dialysis session, and the patient demographic indicators of the current dialysis session.

Further, for patients in the stable dialysis period, depending on the clinical variable, the clinical variable values collected and recorded by the data acquisition module are the average or the sum of the corresponding clinical variable values over those 4 dialysis sessions.

Further, the data processing module first preprocesses the data collected by the data acquisition module, interpolates missing clinical variable data by multiple imputation, normalizes the clinical variable data with the Min-Max normalization method, and then constructs the state space from the preprocessed data.

Further, the data processing module uses a long short-term memory (LSTM) network autoencoder to perform temporal encoding of the preprocessed clinical variable data; the LSTM autoencoder is trained and optimized to minimize the reconstruction loss between the original input and the decoded output, its encoder and decoder each consisting of a single-layer LSTM network with 128 units; to construct the state space, the LSTM autoencoder recurrently encodes the collected clinical variables and outputs, for each dialysis session time of each patient, one state representing the clinical variables.

Further, when constructing the action space, the data processing module uses backward interpolation to fill in the physician's recommended dry weight value for each dialysis session, calculates the change in the patient's dry weight in the current dialysis session relative to the previous dialysis session, and discretizes this change.

Further, in the policy learning module, one part of the reward function uses a multi-layer perceptron network to predict the probability that the patient dies within one year in the corresponding state, and the reward is set as the negative log odds of this probability; the other part of the reward function is a penalty for side-effect symptoms occurring during dialysis, which varies with the intradialytic symptom and its severity.

Further, in the policy learning module, an experience replay pool is constructed and a deep double Q network is used for the deep reinforcement learning; experience replay means saving the reward and state update obtained from each interaction with the environment for later use in updating the target Q value during deep reinforcement learning.

Further, in the assistive decision-making module, the physician can set an evaluation threshold; adjustments below this threshold are evaluated directly by nurses and selectively executed, while adjustments above this threshold are evaluated and selectively executed by the physician, thereby providing decision support for dry weight adjustment.
The beneficial effects of the present invention are as follows. The present invention models the important clinical problem of dry weight assessment as a sequential decision-making problem of dry weight adjustment; it combines clinical knowledge and physician experience to construct a reward function tailored to the dry weight adjustment process, reflecting both the patient's long-term survival reward and a short-term penalty for adverse intradialytic symptoms; it uses a reinforcement learning agent, a deep double Q network with a dueling architecture, to make full use of time-series electronic medical record data and learn the optimal dry weight adjustment policy; it can reduce the workload of physicians, can take more patient characteristic variables into account when assessing a patient's dry weight, and helps physicians balance short-term and long-term benefits and customize a personalized dry weight adjustment plan for the patient. Because the effect of dialysis treatment is highly heterogeneous across patients, patients are likely to benefit from a more personalized and intelligent adjustment plan, thereby improving long-term survival, reducing the incidence of dialysis side effects, and improving the therapeutic effect of dialysis sessions.
Brief description of the drawings

Figure 1 is a structural block diagram of the deep reinforcement learning based assistive dry weight adjustment system for hemodialysis patients of the present invention.

Figure 2 is a schematic diagram of the data reconstruction process in the data acquisition module of the present invention.

Figure 3 is a schematic diagram of the modeling of the dry weight adjustment process as a Markov decision process in the present invention.

Figure 4 is an overall architecture diagram of the policy learning module of the present invention.

Detailed description of embodiments

Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Reinforcement learning is a popular research direction in artificial intelligence. It is based on an agent that continuously interacts with an environment, with the goal of finding an optimal policy that maximizes the expected cumulative reward. In recent years, with the availability of massive electronic medical record data and the development of new machine learning techniques, reinforcement learning has been introduced into healthcare and plays an increasingly important role in many sequential decision-making problems, such as glycemic control for patients with diabetes, treatment of patients with sepsis, and mechanical ventilation settings. However, to date, reinforcement learning techniques have not been used to support clinicians in assessing the dry weight of hemodialysis patients.

The present invention uses the Markov decision process framework to model the dry weight assessment process as a sequential decision-making process, defines separate state spaces and action spaces for different dialysis periods, and designs a reward scheme that incorporates clinical background knowledge; the present invention constructs a deep double Q network with a dueling architecture (Dueling-DQN) to learn the optimal dry weight adjustment policy from historical electronic medical record data, thereby providing nephrologists with clinical decision support for dry weight adjustment and assisting physicians in the long-term management of patient weight.

As shown in Figure 1, the present invention provides a deep reinforcement learning based assistive dry weight adjustment system for hemodialysis patients, which includes: a data acquisition module for collecting the medical electronic record data of hemodialysis patients; a data processing module for processing the raw data; a policy learning module for the deep reinforcement learning agent; and an assistive decision-making module for visual output and interaction with physicians.
The processing of the data acquisition module is as follows: the patient's clinical data are collected from the medical electronic record system, including demographics, laboratory values, dialysis parameters, dialysis symptoms and other related clinical characteristics. Considering that in clinical practice the dry weight is evaluated in each dialysis session after the pre-dialysis variables have been measured and before dialysis is actually started on the machine, the present invention limits the time window of collection, that is, the data of each dialysis session are reconstructed. The data of each dialysis session include four categories of clinical variables: the intradialytic measured variables of the previous dialysis session, the post-dialysis measured variables of the previous dialysis session, the pre-dialysis measured variables of the current dialysis session, and the patient demographic indicators of the current dialysis session (as shown in Figure 2).
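As a concrete illustration of this time-window reconstruction, the following pandas sketch assembles one decision-point record per dialysis session from the previous session's intradialytic and post-dialysis measurements, the current session's pre-dialysis measurements, and the demographic indicators. The table layout and all column names are hypothetical, not taken from the patent.

```python
import pandas as pd

def reconstruct_session_records(sessions: pd.DataFrame) -> pd.DataFrame:
    """Build one record per dialysis session at the moment the dry weight is decided.

    `sessions` is assumed to hold one row per (patient_id, session_idx), with
    hypothetical column groups: pre-dialysis ('pre_*'), intradialytic ('intra_*')
    and post-dialysis ('post_*') measurements, plus demographics ('age', 'sex').
    """
    sessions = sessions.sort_values(['patient_id', 'session_idx'])
    intra_post = [c for c in sessions.columns if c.startswith(('intra_', 'post_'))]
    # Intradialytic and post-dialysis values are only available from the previous session.
    prev = sessions.groupby('patient_id')[intra_post].shift(1)
    return pd.concat(
        [sessions[['patient_id', 'session_idx', 'age', 'sex']],
         sessions.filter(regex='^pre_'),   # measured before this session starts
         prev.add_prefix('prev_')],        # carried over from the previous session
        axis=1,
    )
```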
The present invention processes and models the data of the dialysis induction period (the first three months after the start of dialysis) and the stable dialysis period (from three months after the start of dialysis onward) separately. For patients in the dialysis induction period, data are collected at every dialysis session; for patients in the stable dialysis period, data are collected once every 4 dialysis sessions, and the recorded clinical variable values are the average (e.g., age) or the sum (e.g., number of occurrences of adverse dialysis symptoms) of the corresponding clinical variable values over those 4 sessions.
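For the stable dialysis period, one possible sketch of the 4-session aggregation described above is shown below; which variables are averaged and which are summed, and their names, are illustrative assumptions.

```python
import pandas as pd

# Hypothetical split: level-type variables are averaged, event counts are summed.
MEAN_VARS = ['age', 'pre_systolic_bp', 'post_weight']
SUM_VARS = ['n_adverse_symptoms', 'n_hypotension_events']

def aggregate_stable_period(records: pd.DataFrame) -> pd.DataFrame:
    """Collapse every 4 consecutive sessions of each patient into one record."""
    records = records.sort_values(['patient_id', 'session_idx']).copy()
    records['block'] = records.groupby('patient_id').cumcount() // 4
    agg = {**{v: 'mean' for v in MEAN_VARS}, **{v: 'sum' for v in SUM_VARS}}
    return records.groupby(['patient_id', 'block'], as_index=False).agg(agg)
```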
The processing of the data processing module includes two parts:

1) construction of the state space;

2) construction of the action space.

1) As shown in Figure 3, the dry weight adjustment process is modeled as a sequential decision-making process, which the present invention describes with a Markov decision process (MDP). A Markov decision process is described by the tuple (S, A, T, R, π), where S is the state space, A is the action space, T is the transition probability distribution between different states, R is the reward function, and π is the policy, i.e. the mapping from the state space to the action space. At each time step t, the agent observes a state s_t ∈ S and selects an action a_t ∈ A according to the policy π; this is the action selection process. The agent then receives a reward r_t related to its action selection according to the reward function R; this is the reward response process. Finally, the environment changes to the next state s_{t+1} ∈ S in response to the agent's action, according to the state transition probability distribution T. In the present invention, a state S represents the time-series encoded clinical variables of the patient's dialysis session, and an action A represents the value by which the current dry weight should be adjusted (increased or decreased) relative to the dry weight of the previous dialysis session. Because the clinical environment is complex and the probability distribution of state transitions is difficult to model accurately, the present invention treats the state transition probability distribution T as unknown. Guided by the reward function R, the agent learns from historical retrospective data in this unknown, complex environment and outputs the best action selection policy π.
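Under this MDP description, each patient's sequence of dialysis sessions becomes a trajectory of transitions that the agent learns from retrospectively. A minimal sketch of how such transitions might be represented, assuming the per-session encoded states, discretized actions and immediate rewards have already been computed, is given below.

```python
from typing import List, NamedTuple
import numpy as np

class Transition(NamedTuple):
    s: np.ndarray       # encoded state of session t
    a: int              # discretized dry weight adjustment chosen at session t
    r: float            # immediate reward observed for session t
    s_next: np.ndarray  # encoded state of session t + 1
    done: bool          # True for the last transition of the patient's trajectory

def build_trajectory(states, actions, rewards) -> List[Transition]:
    """states: [T, d] array; actions, rewards: length-T sequences for one patient."""
    T = len(states)
    return [Transition(states[t], int(actions[t]), float(rewards[t]),
                       states[t + 1], t + 1 == T - 1)
            for t in range(T - 1)]
```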
Construction of the state space

Missing clinical variable data are interpolated by multiple imputation, and the feature matrix is normalized with the Min-Max normalization method to facilitate subsequent learning and optimization of the deep models. Because the dry weight adjustment process is in fact a partially observable Markov decision process (POMDP), that is, the state transition dynamics and reward distribution do not satisfy the Markov property (the current state does not contain all of the information needed to determine the probability distribution of future states), the present invention uses a long short-term memory (LSTM) network autoencoder to perform temporal encoding of the clinical data collected from patients. The LSTM autoencoder is trained and optimized to minimize the reconstruction loss between the original input and the decoded output, and its encoder and decoder each consist of a single-layer LSTM network with 128 units. This LSTM autoencoder recurrently encodes the clinical observations collected from the patient and outputs a state s_it for each dialysis session time t of each patient i:

s_it = f(o_i1, o_i2, o_i3, ..., o_it)

where i denotes the patient, o_it denotes the clinical observation feature vector collected for the patient's t-th dialysis session, t denotes the dialysis session time, s denotes the state of the Markov process, and f denotes the encoder of the trained LSTM network.
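As an illustration of the temporal encoder described above, the following PyTorch sketch implements a single-layer, 128-unit LSTM autoencoder trained on the reconstruction loss; using each per-step encoder output as the state s_it, and the training-loop details, are assumptions made for illustration (the input is assumed to be already imputed and Min-Max scaled).

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    """Single-layer LSTM encoder and decoder with 128 hidden units, as described."""
    def __init__(self, n_features: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, num_layers=1, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, num_layers=1, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                   # x: [batch, T, n_features]
        enc_out, _ = self.encoder(x)        # enc_out[:, t - 1] plays the role of s_it
        dec_out, _ = self.decoder(enc_out)  # decode back to the observation space
        return self.out(dec_out), enc_out

def train_step(model, batch, optimizer):
    """One optimization step on the reconstruction loss for an imputed, scaled batch."""
    recon, _ = model(batch)
    loss = nn.functional.mse_loss(recon, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After training, only the encoder is kept: the hidden output obtained after reading sessions 1 to t of patient i is used as the state s_it.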
2) Construction of the action space

Considering that the clinically recommended dry weight value is taken to remain unchanged until the physician writes a new dialysis prescription for the patient, the present invention uses backward interpolation to fill in the physician's recommended dry weight value for each dialysis session; the present invention then calculates the change in the patient's dry weight in the current dialysis session relative to the dry weight at the previous dialysis session and discretizes it.

Discretization means limiting the dry weight adjustment range to a certain interval, dividing it into equally spaced adjustment actions, and taking the action whose value is closest to the physician's continuous dry weight adjustment in that dialysis session as the discretized dry weight adjustment action (the change in dry weight in a dialysis session relative to the previous session).
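A possible sketch of this discretization step is given below; the adjustment range and step size are placeholders, since the actual action grids of the two dialysis periods are those of Table 1, which is not reproduced in this text.

```python
import numpy as np

# Hypothetical action grid: dry weight changes from -1.0 kg to +1.0 kg in 0.1 kg steps.
ACTION_GRID = np.round(np.arange(-1.0, 1.0 + 1e-9, 0.1), 1)

def discretize_adjustment(delta_kg: float) -> int:
    """Map the physician's continuous dry weight change to the nearest action index."""
    delta_kg = float(np.clip(delta_kg, ACTION_GRID[0], ACTION_GRID[-1]))
    return int(np.abs(ACTION_GRID - delta_kg).argmin())

# Example: a recorded change of -0.27 kg maps to the -0.3 kg action.
assert np.isclose(ACTION_GRID[discretize_adjustment(-0.27)], -0.3)
```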
The present invention constructs specific action spaces for the dialysis induction period (the first three months after the start of dialysis) and the stable dialysis period (from three months after the start of dialysis onward), as shown in Table 1.
Table 1. Comparison of dry weight adjustment frequency and action space construction in different dialysis periods
所述深度强化学习智能体的策略学习模块的处理过程包括三部分:The processing process of the policy learning module of the deep reinforcement learning agent includes three parts:
1) Experience replay
2) Learning the reward function
3) Learning the dry weight adjustment policy with a deep Q network
As shown in Figure 4, the core of the policy learning module of the deep reinforcement learning agent of the present invention is a deep double Q network with a dueling structure (Dueling DDQN). The deep double Q network (DDQN) and the dueling-architecture Q network (Dueling-DQN) are both improved versions of DQN: the former improves the DQN training algorithm and the latter improves the DQN model structure, and the present invention adopts both improvements. The DQN algorithm is in turn an improvement on the Q-learning algorithm. Q-learning uses a Q-table to record the value of each action in each state; when the state space or action space is large, the required storage is correspondingly large, and when the state space or action space is continuous, Q-learning cannot be used at all. The core of the DQN algorithm is therefore to replace the Q-table with an artificial neural network q(s, a; ω), s∈S, a∈A, i.e. the action-value function. The input of the action-value network is the state information and its output is the value of each action; the agent selects its action according to these values.
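The following Python (PyTorch) sketch shows one common way to realize a dueling action-value network of the kind described above; it is illustrative only, and the layer sizes and names are assumptions that do not come from the patent.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Dueling architecture: Q(s, a) is decomposed into a state value V(s)
    and action advantages A(s, a), recombined with a mean-centred advantage."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value_stream = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                          nn.Linear(hidden, 1))
        self.advantage_stream = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                              nn.Linear(hidden, n_actions))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        x = self.feature(state)
        value = self.value_stream(x)              # V(s): (batch, 1)
        advantage = self.advantage_stream(x)      # A(s, a): (batch, n_actions)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return value + advantage - advantage.mean(dim=1, keepdim=True)

# The agent picks the dry weight adjustment action with the highest Q value:
# action = q_net(state).argmax(dim=1)
```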
1) Construction of the experience replay pool
Experience replay means saving the reward and state transition obtained from each interaction with the environment for later updates of the target Q value; it breaks the correlation between samples and improves sample utilization, thereby improving the stability of DQN training. Experience replay has two key steps, storage and replay: storage means saving each experience in the experience pool in the form of the current state s_t, the action a_t, the immediate reward r_{t+1}, the next state s_{t+1}, and the episode-termination flag done; replay means sampling one or more pieces of experience data from the pool according to certain rules. The present invention adopts prioritized experience replay, i.e. each experience in the pool is assigned a priority, and experiences with higher priority are more likely to be selected when sampling. The priority depends on the gap between the current Q value and the target Q value of each state transition (the temporal-difference error, TD-error): the larger the TD-error, the more room there is for the Q network's prediction accuracy to improve, so the more the sample needs to be learned, i.e. the higher its priority.
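A minimal sketch of such a prioritized replay buffer is given below, assuming proportional prioritization with an exponent alpha and importance-sampling weights with an exponent beta; these hyperparameter names and the concrete sampling scheme are common choices rather than values taken from the patent.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Stores (s_t, a_t, r_{t+1}, s_{t+1}, done) tuples and samples them with
    probability proportional to (|TD-error| + eps) ** alpha."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities, self.pos = [], [], 0

    def store(self, transition, td_error: float):
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = priority
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights w_imp correct the bias of non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights = weights / weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```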
2) Learning the reward function
The reward function is the feedback observed from the environment for a given state-action pair. The main goal of a reinforcement learning agent is to maximize the cumulative reward of the state-action pairs along a given patient state-action trajectory, so the design of the reward function is crucial to the agent's learning.
It is natural to use patient survival as the trigger for the reward; for example, the agent would receive a negative return when a patient dies and a positive return when the patient survives. However, since the dialysis treatment of a hemodialysis patient may last for years, the patient trajectory can be very long. If the reward responded only to the patient outcome event, it would be very sparse and would hinder the learning and updating process of the reinforcement learning agent.
Therefore, in the present invention, the reward function is set to respond immediately to each state in the patient's trajectory. Specifically, the reward consists of two parts: one part reflects the patient's long-term survival probability, r_1(s), and the other reflects the patient's current intradialytic symptoms, r_2(s). To obtain the survival reward, the present invention trains a multilayer perceptron (MLP) network to predict the probability that the patient dies within one year from the given state. The reward is set to the negative log odds of this probability, i.e.
r_1(s) = -log( g(s) / (1 - g(s)) )
so that, in general, a state in which death within one year is predicted scores negatively and a surviving state scores positively, where r_1(s) denotes the survival-reward part of the reward function and g(s) denotes the probability, predicted by the multilayer perceptron, that the patient dies within one year from state s.
The other part of the reward is a penalty for adverse reactions occurring during dialysis, denoted r_2(s). The penalty varies with the type and severity of the intradialytic symptom. Based on actual clinical presentation, one point is deducted for fever, dialysis disequilibrium syndrome, cerebral hemorrhage and cerebral infarction, while two points are deducted for headache, muscle cramp, abdominal pain, intradialytic hypotension and intradialytic hypertension.
The total reward function r(s) is the sum of the patient survival reward and the intradialytic adverse-reaction penalty:
r(s) = r_1(s) + r_2(s)
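Purely as an illustration, the reward could be computed as in the following Python sketch; the negative-log-odds survival reward, the symptom lists and the per-symptom deductions follow the description above, while the function names, variable names and the symptom encoding are assumptions.

```python
import math

# Per-symptom deductions as described above (assumed string encoding of the penalty r2).
ONE_POINT_SYMPTOMS = {"fever", "disequilibrium_syndrome",
                      "cerebral_hemorrhage", "cerebral_infarction"}
TWO_POINT_SYMPTOMS = {"headache", "muscle_cramp", "abdominal_pain",
                      "intradialytic_hypotension", "intradialytic_hypertension"}

def survival_reward(death_prob_1y: float) -> float:
    """r1(s): negative log odds of the MLP-predicted one-year death probability g(s)."""
    return -math.log(death_prob_1y / (1.0 - death_prob_1y))

def symptom_penalty(observed_symptoms: set) -> float:
    """r2(s): penalty for adverse reactions observed during the dialysis session."""
    penalty = 0.0
    penalty -= 1.0 * len(observed_symptoms & ONE_POINT_SYMPTOMS)
    penalty -= 2.0 * len(observed_symptoms & TWO_POINT_SYMPTOMS)
    return penalty

def total_reward(death_prob_1y: float, observed_symptoms: set) -> float:
    """r(s) = r1(s) + r2(s)."""
    return survival_reward(death_prob_1y) + symptom_penalty(observed_symptoms)

# Example: predicted one-year death probability 0.2, with intradialytic hypotension.
# r1 = -log(0.2/0.8) ≈ 1.39, r2 = -2, total ≈ -0.61
print(total_reward(0.2, {"intradialytic_hypotension"}))
```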
3) Policy learning of the deep Q network
The present invention trains and optimizes a deep double Q network with a dueling architecture (Dueling DDQN), adjusting the dry weight treatment policy through repeated trials so as to maximize the overall return of the predicted reward. The loss function of the Dueling DDQN consists of two parts: first, the temporal-difference error (TD-error), which reflects the gap between the current Q value and the target Q value; second, a regularization term that penalizes output Q values exceeding a reasonable threshold r_max = 20, to improve model stability. The loss function used to train and optimize the dueling deep double Q network of the present invention is:
L(θ) = E[error_TD · w_imp] + λ · max(|Q_main(s_t, a_t; θ)| − r_max, 0)
error_TD = (Q_double-target − Q_main(s_t, a_t; θ))²
Here Q_double-target denotes the double-Q learning target computed from the target network; under the standard double-DQN formulation to which the symbols below refer, it takes the form Q_double-target = r_{t+1} + γ · Q_target(s_{t+1}, argmax_a Q_main(s_{t+1}, a; θ); θ′). In the above, L(θ) is the loss function ultimately learned by the dueling deep double Q network of the present invention, error_TD is the temporal-difference error, and w_imp is the importance-sampling weight from prioritized experience replay; Q_main is the main network of the deep double Q network, Q_target is the target network, θ are the parameters of the main network and θ′ are the parameters of the target network; γ is the discount factor, taking a value between 0 and 1, with a higher γ meaning that the agent pays more attention to future rewards rather than the immediate reward; s denotes the state, a the action, r the reward, E the expectation, and λ the regularization coefficient, taking a value between 0 and 1; r_{t+1} denotes the reward at the (t+1)-th dialysis session, s_t the state of the t-th dialysis session, and a_t the action of the t-th dialysis session.
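The loss described above might be computed roughly as in the following Python (PyTorch) sketch, assuming the DuelingQNetwork and PrioritizedReplayBuffer sketched earlier; the double-Q target is the standard formulation, and the default hyperparameter values and variable names are assumptions.

```python
import torch

def dueling_ddqn_loss(q_main, q_target, batch, is_weights,
                      gamma=0.99, lam=0.1, r_max=20.0):
    """TD loss weighted by importance-sampling weights, plus a penalty on |Q| > r_max."""
    states, actions, rewards, next_states, dones = batch

    # Q_main(s_t, a_t; theta)
    q_sa = q_main(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Double DQN target: action chosen by the main network,
        # evaluated by the target network (parameters theta').
        next_actions = q_main(next_states).argmax(dim=1, keepdim=True)
        next_q = q_target(next_states).gather(1, next_actions).squeeze(1)
        q_double_target = rewards + gamma * next_q * (1.0 - dones)

    td_error = (q_double_target - q_sa) ** 2
    reg = torch.clamp(q_sa.abs() - r_max, min=0.0)       # penalize |Q| above r_max
    loss = (is_weights * td_error).mean() + lam * reg.mean()

    # The per-sample |TD-error| is returned to update the replay priorities.
    return loss, (q_double_target - q_sa).abs().detach()
```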
The particular design of the reward function in the present invention effectively improves the policy learning efficiency of the deep Q network. Unlike the usual delayed survival reward (a reward or penalty granted at the end of the patient's trajectory according to whether the patient survives or dies), the reward function in the present invention is an immediate reward, i.e. every state of the trajectory grants a reward to the agent. The survival part r_1(s) of the reward function uses a survival predictor to bring the survival reward located at the end of the patient's trajectory forward and distribute it over every state of the trajectory. The intradialytic adverse-reaction penalty r_2(s), on the other hand, incorporates the patient's immediate feedback on each dialysis session into the reward, imitating the way physicians adjust dry weight according to the patient's clinical presentation. The policy learned by the agent is therefore expected not only to improve patient survival but also to reduce intradialytic adverse reactions, lessen the physical suffering of dialysis patients, and improve the therapeutic effect of the dialysis course. The reward determines the goal of the agent's actions, so immediate rewards guide the agent's behaviour better and more promptly than delayed rewards; the corresponding loss function is easier to learn and optimize, which improves the agent's learning efficiency.
Finally, the deep Q network learns a value function that maps different states and actions to different Q values; based on this mapping, different dry weight adjustment actions can be selected for the states of different dialysis sessions, ultimately forming the dry weight adjustment policy recommended by the agent.
The auxiliary decision-making module for visual output and interaction with physicians operates as follows: for each of a patient's dialysis-session states, the reinforcement learning agent recommends the optimal dry weight adjustment value. The physician can set an evaluation threshold (e.g. 0.2 kg): adjustments below this threshold are evaluated directly by the nurse and selectively executed, while adjustments above this threshold are evaluated by the physician and selectively executed, providing auxiliary support for the physician's dry weight adjustment decisions. The system records the agent's recommended value for each dialysis session, whether the physician accepted the agent's recommendation, and the dry weight adjustment actually executed by the physician; it regularly evaluates the patient's dialysis adequacy and feeds the results back to physicians and algorithm engineers in the form of visual charts, so that the model can subsequently be updated and optimized.
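For illustration only, the threshold-based routing of recommendations could look like the following Python sketch; the default threshold value, the record fields and the function names are assumptions consistent with the description above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AdjustmentRecord:
    session_id: str
    recommended_change_kg: float           # agent's recommended dry weight adjustment
    reviewer: str                          # "nurse" or "physician"
    accepted: Optional[bool] = None        # filled in after review
    executed_change_kg: Optional[float] = None

def route_recommendation(session_id: str, recommended_change_kg: float,
                         threshold_kg: float = 0.2) -> AdjustmentRecord:
    """Adjustments below the physician-set threshold go to the nurse for review;
    larger adjustments go to the physician."""
    reviewer = "nurse" if abs(recommended_change_kg) < threshold_kg else "physician"
    return AdjustmentRecord(session_id, recommended_change_kg, reviewer)

# Example: a recommended +0.3 kg change is routed to the physician for review.
record = route_recommendation("patient42-session198", 0.3)
print(record.reviewer)   # physician
```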
A specific example of the present invention is as follows:
This embodiment uses for the study the electronic medical record data of maintenance hemodialysis patients receiving continuous, regular hemodialysis treatment at a tertiary hospital. The data for the dialysis induction period and the dialysis stable period are each divided into three data sets: a training set (60%), a validation set (20%), and a test set (10%). The training set is used to train the deep reinforcement learning agent model, the validation set is used to tune the optimization parameters, and the test set is used to test the performance of the model. On the test set, the present invention uses repeated sampling with replacement (bootstrap) to obtain confidence intervals for the performance indicators. In addition to the policy implemented by the physicians and the policy learned by the agent of the present invention, this embodiment adds a random policy and a K-nearest-neighbour policy as comparisons to evaluate the effectiveness of the model, where the K-nearest-neighbour policy selects the action to take by a vote over the K most similar states. The present invention uses the weighted doubly robust (WDR) estimator from off-policy evaluation to evaluate the value of the different policies; the results are shown in Table 2 and Table 3.
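As an illustrative sketch only, the bootstrap confidence intervals mentioned above could be obtained as follows; the number of resamples and the 95% level are assumptions, and the input values are placeholders rather than data from the study.

```python
import numpy as np

def bootstrap_ci(per_patient_values, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean of a per-patient performance metric
    (e.g. estimated policy value), using sampling with replacement."""
    rng = np.random.default_rng(seed)
    values = np.asarray(per_patient_values, dtype=float)
    stats = []
    for _ in range(n_resamples):
        resample = rng.choice(values, size=len(values), replace=True)
        stats.append(resample.mean())
    lower = np.percentile(stats, 100 * (1 - level) / 2)
    upper = np.percentile(stats, 100 * (1 + level) / 2)
    return values.mean(), (lower, upper)

# Example with synthetic numbers:
mean, (lo, hi) = bootstrap_ci([12.1, 9.8, 14.3, 11.0, 13.5, 10.2])
print(f"estimate {mean:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```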
Table 2. Comparison of policy value results of different policies during the dialysis induction period
Table 3. Comparison of policy value results of different policies during the dialysis stable period
The results show that the dry weight adjustment policy learned by the deep reinforcement learning agent of the present invention achieves the best performance compared with the other policies. Notably, when applied to the dialysis induction period, the policy learned by the agent of the present invention is expected, compared with the existing clinician policy, to reduce the 5-year mortality of hemodialysis patients by 9.47%, reduce the 3-year mortality by 7.99%, reduce the incidence of dialysis adverse reactions by 8.44%, and reduce the coefficient of variation of intradialytic systolic blood pressure by 4.76%, with statistical significance. The present invention is therefore expected to achieve dynamic, intelligent adjustment of the dry weight of hemodialysis patients and to significantly improve their dialysis treatment outcomes and long-term survival.
The above embodiments are intended to explain the present invention rather than to limit it; any modification or change made to the present invention within the spirit of the invention and the scope of protection of the claims falls within the scope of protection of the present invention.

Claims (10)

  1. A deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients, characterized in that the system comprises a data acquisition module, a data processing module, a policy learning module and an auxiliary decision-making module;
    the data acquisition module is configured to collect medical electronic medical record data of hemodialysis patients during the dialysis induction period and the dialysis stable period, and to input the data into the data processing module;
    the data processing module is configured to process the data collected by the data acquisition module, including construction of a state space and construction of an action space; a state represents the temporally encoded clinical variables of the patient's dialysis sessions, and an action represents the value by which the current dry weight should be adjusted relative to the dry weight of the previous dialysis session;
    the policy learning module is configured to set the reward function of the deep reinforcement learning, the reward function being an immediate reward for each state, composed of a reward for the patient's long-term survival probability and a penalty for the patient's current intradialytic symptoms, and to perform deep reinforcement learning based on the state space and the action space constructed by the data processing module to obtain a dry weight adjustment policy;
    the auxiliary decision-making module is configured to visually output the dry weight adjustment policy to assist the physician's decision-making.
  2. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 1, characterized in that, for patients in the dialysis induction period, the data acquisition module collects data at every dialysis session; for patients in the dialysis stable period, the data acquisition module collects data once every 4 dialysis sessions.
  3. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 2, characterized in that the data of each dialysis session comprise four types of clinical variables: intradialytic measurement variables of the previous dialysis session, post-dialysis measurement variables of the previous dialysis session, pre-dialysis measurement variables of the current dialysis session, and patient demographic indicators of the current dialysis session.
  4. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 3, characterized in that, for patients in the dialysis stable period, depending on the clinical variable collected, the clinical variable value recorded by the data acquisition module is the mean or the sum of the corresponding clinical variable values over the 4 dialysis sessions.
  5. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 1, characterized in that the data processing module first preprocesses the data collected by the data acquisition module, interpolating missing clinical variable data by multiple imputation and normalizing the clinical variable data with the Min-Max normalization method, and then uses the preprocessed data to construct the state space.
  6. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 5, characterized in that the data processing module uses a long short-term memory (LSTM) autoencoder to perform temporal encoding of the preprocessed clinical variable data; the LSTM autoencoder is trained to minimize the reconstruction loss between the original input and the decoded output, and its encoder and decoder are each composed of a single-layer LSTM with 128 units; the construction of the state space uses the encoder of the LSTM autoencoder to recurrently encode the clinical variables collected from the patient and to output, for each dialysis session time of each patient, a state representing the clinical variables.
  7. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 1, characterized in that, when constructing the action space, the data processing module uses backward interpolation to fill in the physician-recommended dry weight value for each dialysis session, calculates the change of the patient's dry weight in the current dialysis session relative to the dry weight of the previous dialysis session, and discretizes the change.
  8. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 1, characterized in that, in the policy learning module, one part of the reward function uses a multilayer perceptron network to predict the probability that the patient dies within one year from the corresponding state, the reward being set to the negative log odds of this probability; the other part of the reward function is a penalty for adverse-reaction symptoms occurring during dialysis, the penalty varying with the type and severity of the intradialytic symptom.
  9. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 1, characterized in that, in the policy learning module, an experience replay pool is constructed and a deep double Q network is used for deep reinforcement learning; experience replay means saving the reward and state update obtained from each interaction with the environment for later updates of the target Q value during deep reinforcement learning.
  10. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 1, characterized in that, in the auxiliary decision-making module, the physician can set an evaluation threshold; adjustments below this threshold are evaluated directly by the nurse and selectively executed, while adjustments above this threshold are evaluated by the physician and selectively executed, providing auxiliary support for dry weight adjustment decisions.
PCT/CN2023/088561 2022-04-18 2023-04-17 Deep reinforcement learning based assistive adjustment system for dry weight of hemodialysis patient WO2023202500A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210404618.9 2022-04-18
CN202210404618.9A CN114496235B (en) 2022-04-18 2022-04-18 Hemodialysis patient dry weight auxiliary adjusting system based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
WO2023202500A1 true WO2023202500A1 (en) 2023-10-26

Family

ID=81489553

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/088561 WO2023202500A1 (en) 2022-04-18 2023-04-17 Deep reinforcement learning based assistive adjustment system for dry weight of hemodialysis patient

Country Status (2)

Country Link
CN (1) CN114496235B (en)
WO (1) WO2023202500A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114496235B (en) * 2022-04-18 2022-07-19 浙江大学 Hemodialysis patient dry weight auxiliary adjusting system based on deep reinforcement learning
CN114626836B (en) * 2022-05-17 2022-08-05 浙江大学 Multi-agent reinforcement learning-based emergency post-delivery decision-making system and method
CN115019960B (en) * 2022-08-01 2022-11-29 浙江大学 Disease assistant decision-making system based on personalized state space progress model
CN116453706B (en) * 2023-06-14 2023-09-08 之江实验室 Hemodialysis scheme making method and system based on reinforcement learning
CN116779150B (en) * 2023-07-03 2023-12-22 浙江一山智慧医疗研究有限公司 Personalized medical decision method, device and application based on multi-agent interaction
CN117012374B (en) * 2023-10-07 2024-01-26 之江实验室 Medical follow-up system and method integrating event map and deep reinforcement learning

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100105990A1 (en) * 2007-01-17 2010-04-29 Gambro Lundia Ab Method of monitoring hypertensive haemodialysis patients
JP5921011B1 (en) * 2015-09-29 2016-05-24 株式会社トマーレ Dialysis information sharing system and dialysis information sharing method
US20190244712A1 (en) * 2016-07-18 2019-08-08 Fresenius Medical Care Deutschland Gmbh Drug Dosing Recommendation
US20200330668A1 (en) * 2017-12-19 2020-10-22 Fresenius Medical Care Deutschland Gmbh Method And Devices For Determining A Treatment Regimen For Altering The Treatment Parameters When Dialyzing A Patient
CN111971755A (en) * 2018-04-12 2020-11-20 费森尤斯医疗保健控股公司 System and method for determining dialysis patient function to assess parameters and timing of palliative and/or end-of-care
CN112530594A (en) * 2021-02-08 2021-03-19 之江实验室 Hemodialysis complication long-term risk prediction system based on convolution survival network
CN112951419A (en) * 2020-11-11 2021-06-11 复旦大学附属华山医院 Hemodialysis dry weight intelligent assessment device
CN114496235A (en) * 2022-04-18 2022-05-13 浙江大学 Hemodialysis patient dry weight auxiliary adjusting system based on deep reinforcement learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102013008418A1 (en) * 2013-05-17 2014-11-20 Fresenius Medical Care Deutschland Gmbh Apparatus and method for providing treatment parameters for the treatment of a patient
CN105962939A (en) * 2016-06-16 2016-09-28 南昌大学第二附属医院 Uremia patient dry weight assessment instrument
US20210193317A1 (en) * 2019-12-20 2021-06-24 Fresenius Medical Care Holdings, Inc. Real-time intradialytic hypotension prediction
CN113990494B (en) * 2021-12-24 2022-03-25 浙江大学 Tic disorder auxiliary screening system based on video data

Also Published As

Publication number Publication date
CN114496235A (en) 2022-05-13
CN114496235B (en) 2022-07-19

Similar Documents

Publication Publication Date Title
WO2023202500A1 (en) Deep reinforcement learning based assistive adjustment system for dry weight of hemodialysis patient
Yuan et al. The development an artificial intelligence algorithm for early sepsis diagnosis in the intensive care unit
CN104318351A (en) Traditional Chinese medicine health management system and method
US11581083B2 (en) Intra-aortic pressure forecasting
Wei et al. Risk assessment of cardiovascular disease based on SOLSSA-CatBoost model
US20170147773A1 (en) System and method for facilitating health monitoring based on a personalized prediction model
CN116453706B (en) Hemodialysis scheme making method and system based on reinforcement learning
CN115831364A (en) Type 2 diabetes risk layered prediction method based on multi-modal feature fusion
CN110767316A (en) Establishment method of wound blood transfusion prediction model, and method and system for determining blood transfusion volume
CN117079810A (en) Cardiovascular disease unscheduled re-hospitalization risk prediction method
CN115547502A (en) Hemodialysis patient risk prediction device based on time sequence data
Sang et al. Study on survival prediction of patients with heart failure based on support vector machine algorithm
WO2023106960A1 (en) Method for predicting the onset of a medical event in a person's health
CN114783587A (en) Intelligent prediction system for severe acute kidney injury
Dahm et al. Indications for admission to the surgical intensive care unit after radical cystectomy and urinary diversion
Heitz et al. WRSE-a non-parametric weighted-resolution ensemble for predicting individual survival distributions in the ICU
Demchenko et al. The Use of Machine Learning Methods to the Automated Atherosclerosis Diagnostic and Treatment System Development.
TWI795928B (en) System and method for prediction of intradialytic adverse event and computer readable medium thereof
Gu et al. Real-Time Intradialytic Hypotension Prediction Model Using an Improved Equilibrium Optimizer for Feature Selection
Xiao et al. Management and Analysis of Sports Health Level of the Elderly Based on Deep Learning
WO2022202360A1 (en) Information processing device, information processing method, and program
CN118039171A (en) Early prognosis prediction model for acute kidney injury patient and establishment method thereof
Zhao et al. Improving Mortality Risk Prediction Using an LSTM Network Combined With Self-Attention
Boonvisuth Development of an artificial intelligence model for prediction of dry weight in chronic hemodialysis patients and assessment of its accuracy compared to standard bioelectrical impedance analysis
Jung et al. Prediction Model Using Machine Learning for 90-day Mortality of Patients With Sepsis in Intensive Care Unit From MIMIC-IV Dataset

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23791155

Country of ref document: EP

Kind code of ref document: A1