WO2023202500A1 - Deep reinforcement learning based assistive adjustment system for dry weight of hemodialysis patient - Google Patents


Info

Publication number
WO2023202500A1
Authority
WO
WIPO (PCT)
Prior art keywords
dialysis
dry weight
patient
data
reinforcement learning
Prior art date
Application number
PCT/CN2023/088561
Other languages
French (fr)
Chinese (zh)
Inventor
李劲松
田雨
周天舒
杨子玥
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学
Publication of WO2023202500A1


Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H 10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H 50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H 50/30 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for calculating health indices; for individual health risk assessment

Definitions

  • the invention belongs to the technical fields of medical treatment and machine learning. Specifically, it relates to an auxiliary adjustment system for the dry weight of hemodialysis patients based on deep reinforcement learning.
  • Dry body weight is one of the most fundamental components of any dialysis prescription and is clinically determined as the lowest tolerated post-dialysis weight without adverse intradialytic symptoms or hypotension in the absence of significant fluid overload. Accurate assessment of dry body weight is crucial for the survival prognosis of hemodialysis patients, and inaccurate estimation has a strongly negative impact on patient survival. Overestimating the patient's dry weight leads to chronic fluid overload and may cause harm by inducing edema, pulmonary congestion, hypertension, and vascular and cardiac damage; underestimating the patient's dry weight leads to chronic dehydration, cramps and other dialysis side effects, increases the risk of intradialytic hypotension, and can also lead to loss of residual renal function (RRF).
  • RRF residual renal function
  • bioelectrical impedance analysis is a non-invasive and simple technology to assist in the assessment and determination of dry body weight
  • relative plasma volume (RPV) monitoring has been validated as one of the markers of dry body weight
  • lung ultrasound has emerged as a technique for guiding dry body weight assessment.
  • none of these methods serve as the gold standard for dry body weight assessment.
  • dry body weight often fluctuates due to uncertainty about the patient's nutritional status or underlying disease, necessitating ongoing reassessment.
  • clinicians may not notice changes in these patients in a timely manner, resulting in delayed or even missed dry weight adjustments.
  • Existing studies can only assess a patient's hydration status at a certain point in time to estimate dry body weight, and cannot help clinicians detect potential changes in dry body weight over time.
  • the purpose of the present invention is to address the shortcomings of the existing technology and propose an auxiliary dry weight adjustment system for hemodialysis patients based on deep reinforcement learning to dynamically support clinicians in determining personalized dry weight adjustment plans for hemodialysis patients.
  • an auxiliary adjustment system for the dry weight of hemodialysis patients based on deep reinforcement learning, which includes a data acquisition module, a data processing module, a policy learning module and an auxiliary decision-making module;
  • the data acquisition module is used to collect medical electronic medical record data of hemodialysis patients during the dialysis induction period and dialysis stable period, and input it into the data processing module;
  • the data processing module is used to process the data collected by the data acquisition module, including the construction of the state space and the construction of the action space; a state represents the time-series encoded clinical variables of the patient's dialysis sessions, and an action represents the value by which the current dry weight should be adjusted relative to the dry weight of the previous dialysis session;
  • the policy learning module is used to set the reward function of deep reinforcement learning.
  • the reward function is the immediate reward of each state, composed of a reward for the patient's long-term survival probability and a penalty for the patient's current intradialytic symptoms; deep reinforcement learning is then performed on the state space and action space constructed by the data processing module to obtain the dry weight adjustment strategy;
  • the auxiliary decision-making module is used to visually output the dry weight adjustment strategy to assist doctors in decision-making.
  • for patients in the dialysis induction period, the data collection module collects data at every dialysis session; for patients in the stable period of dialysis, the data collection module collects data once every 4 dialysis sessions.
  • the data of each dialysis session include four types of clinical variables: the intradialytic measured variables of the previous dialysis session, the post-dialysis measured variables of the previous dialysis session, the pre-dialysis measured variables of the current dialysis session, and the patient demographic indicators of the current dialysis session.
  • for patients in the stable period of dialysis, depending on the clinical variable, the clinical variable values collected and recorded by the data collection module are the average or the sum of the corresponding clinical variable values over those 4 dialysis sessions.
  • the data processing module first preprocesses the data collected by the data acquisition module, interpolates missing clinical variable data by multiple imputation, normalizes the clinical variable data with the Min-Max normalization method, and then constructs the state space from the preprocessed data.
  • the data processing module uses a long short-term memory network autoencoder to perform temporal encoding of the preprocessed clinical variable data; the long short-term memory network autoencoder is trained and optimized to minimize the reconstruction loss between the original input and the decoded output.
  • when constructing the action space, the data processing module uses backward interpolation to fill in the physician's recommended dry weight value for each dialysis session, calculates the change in the patient's dry weight in the current dialysis session relative to the previous dialysis session, and discretizes this change.
  • part of the reward function uses a multi-layer perceptron network to predict the probability that the patient dies within one year in the corresponding state, and the reward is set as the negative log odds of this probability; the other part of the reward function is a penalty for side-effect symptoms occurring during dialysis, which varies with the intradialytic symptom and its severity.
  • an experience replay pool is constructed and a deep double Q network is used for deep reinforcement learning. Experience replay refers to saving the reward and state update obtained from each interaction with the environment for later use in updating the target Q value during deep reinforcement learning.
  • doctors can set an evaluation threshold; adjustments below this threshold are evaluated directly by nurses and selectively executed, while adjustments above this threshold are evaluated and selectively executed by physicians, thereby providing decision support for dry weight adjustment.
  • the present invention models the important clinical problem of dry weight assessment as a sequential decision-making problem of dry weight adjustment; it combines clinical knowledge and physician experience to construct a reward function tailored to the dry weight adjustment process, reflecting both the patient's long-term survival reward and a short-term penalty for adverse intradialytic symptoms; it uses a reinforcement learning agent, a deep double Q network with a dueling architecture, to make full use of time-series electronic medical record data and learn the optimal dry weight adjustment strategy; it can reduce the workload of physicians, can take more patient characteristic variables into account when assessing a patient's dry weight, and helps physicians balance short-term and long-term benefits and customize a personalized dry weight adjustment plan for the patient.
  • Figure 1 is a structural block diagram of the dry weight auxiliary adjustment system for hemodialysis patients based on deep reinforcement learning of the present invention.
  • Figure 2 is a schematic diagram of the data reconstruction process in the data acquisition module of the present invention.
  • Figure 3 is a schematic diagram of the modeling process of dry body weight adjustment according to the Markov decision process of the present invention.
  • Figure 4 is an overall architecture diagram of the strategy learning module of the present invention.
  • Reinforcement learning is a popular research direction in artificial intelligence. It is based on an agent that continuously interacts with an environment, with the goal of finding an optimal policy that maximizes the expected cumulative reward.
  • reinforcement learning has been introduced into the healthcare field and plays an increasingly important role in many sequential decision-making problems, such as glycemic control for patients with diabetes, treatment of patients with sepsis, and mechanical ventilation settings.
  • reinforcement learning techniques have not been used to support clinicians in assessing dry body weight in hemodialysis patients.
  • the present invention uses the Markov decision process framework to model the dry weight assessment process as a sequential decision-making process, defines separate state spaces and action spaces for different dialysis periods, and designs a reward scheme that incorporates clinical background knowledge; the present invention constructs a deep double Q network with a dueling architecture (Dueling-DQN) to learn the optimal dry weight adjustment strategy from historical electronic medical record data, thereby providing nephrologists with clinical decision support for dry weight adjustment and assisting physicians in the long-term management of patient weight.
  • Dueling-DQN Deep double Q network
  • the present invention provides a dry weight auxiliary adjustment system for hemodialysis patients based on deep reinforcement learning.
  • the system includes: a data acquisition module for collecting the medical electronic record data of hemodialysis patients; a data processing module for processing the raw data; a policy learning module for the deep reinforcement learning agent; and an auxiliary decision-making module for visual output and interaction with physicians.
  • the processing process of the data collection module is specifically: collecting clinical data of patients from the medical electronic medical record system, including demographics, laboratory values, dialysis parameters, dialysis symptoms and other related clinical characteristics.
  • the present invention limits the collection time window during data collection; that is, the data of each dialysis session are reconstructed.
  • the data for each dialysis session include four categories of clinical variables: intradialytic measured variables of the previous dialysis session, post-dialysis measured variables of the previous dialysis session, pre-dialysis measured variables of the current dialysis session, and patient demographic indicators of the current dialysis session (shown in Figure 2).
  • the present invention separately processes and models the data of the dialysis induction period (the first three months after the start of dialysis) and the stable dialysis period (from three months after the start of dialysis onward). For patients in the dialysis induction period, the present invention collects data at each dialysis session; for patients in the stable dialysis period, the present invention collects data once every 4 dialysis sessions, and the recorded clinical variable values are the mean (e.g., age) or the sum (e.g., number of occurrences of adverse dialysis symptoms) of the corresponding clinical variable values over those 4 sessions.
  • the processing process of the data processing module includes two parts:
  • the dry body weight adjustment process is modeled as a sequential decision-making process.
  • the present invention models and describes this process based on the Markov decision process (MDP).
  • the Markov decision process is described by the tuple (S, A, T, R, π), where S represents the state space, A represents the action space, T represents the transition probability distribution between different states, R represents the reward function, and π represents the policy, i.e. the mapping from the state space to the action space.
  • the agent can observe a state s_t ∈ S and select an action a_t ∈ A according to the policy π; this is the action selection process. Then, the agent receives the reward r_t related to its action selection according to the reward function R; this is the reward response process.
  • the environment changes to the next state s t+1 ⁇ S in response to the agent's action according to the state transition probability distribution T.
  • the state S represents the clinical variables of the patient's dialysis course after time series encoding
  • the action A represents the value by which the current dry weight should be adjusted (increased or decreased) relative to the dry weight of the previous dialysis session. Because the clinical environment is complex and the probability distribution of state transitions is difficult to model accurately, the present invention treats the state transition probability distribution T as unknown. Guided by the reward function R, the agent learns from historical retrospective data in this unknown, complex environment and outputs the best action selection policy π.
  • the Min-Max normalization method was used to normalize the feature matrix to facilitate subsequent learning and optimization of the deep models. Because the dry weight adjustment process is in fact a partially observable Markov decision process (POMDP), that is, the state transition dynamics and reward distribution do not satisfy the Markov property (the current state does not contain all of the information needed to determine the probability distribution of future states), the present invention uses a long short-term memory network autoencoder to perform temporal encoding of the clinical data collected from patients. The long short-term memory network autoencoder is trained and optimized to minimize the reconstruction loss between the original input and the decoded output.
  • POMDP partially observable Markov decision process
  • the encoder and decoder parts are composed of a single-layer long short-term memory network containing 128 units.
  • i: the patient
  • o_it: the clinical observation feature vector collected for the patient's t-th dialysis session
  • t: the dialysis session time
  • s: the state of the Markov process
  • f: the encoder of the trained long short-term memory network.
  • the present invention uses backward interpolation to fill in the physician's recommended dry body weight value for each dialysis session;
  • the present invention calculates the change in the patient's dry weight in this dialysis session compared to the last dialysis session, and performs discretization processing.
  • Discretization refers to limiting the dry weight adjustment range to a certain interval, dividing it into equally spaced adjustment actions, and taking the action whose value is closest to the physician's continuous dry weight adjustment in that dialysis session as the discretized dry weight adjustment action (the change in dry body weight in a dialysis session relative to the previous dialysis session).
  • the present invention constructs specific action spaces for the dialysis induction period (the first three months after the start of dialysis) and the stable dialysis period (from three months after the start of dialysis onward), as shown in Table 1.
  • the processing process of the policy learning module of the deep reinforcement learning agent includes three parts:
  • the core of the policy learning module of the deep reinforcement learning agent of the present invention is a deep double Q network (DDQN with a dueling structure) based on a competition architecture.
  • the deep double Q network (DDQN) and the dueling-architecture Q network (Dueling-DQN) are both improved versions of DQN.
  • the former is an improvement on the DQN training algorithm, and the latter is an improvement on the DQN model structure.
  • the present invention adopts both improvements at the same time.
  • the DQN algorithm is an improvement on the Q-learning algorithm.
  • the Q-learning algorithm uses a Q-table to record the action value in each state; when the state space or action space is large, the required storage space is also large.
  • the core of the DQN algorithm is to replace the Q-table with an artificial neural network q(s, a; ω), s ∈ S, a ∈ A, i.e. the action value function.
  • the input of the action value network is state information, and the output is the value of each action.
  • the agent selects the action based on the value of each action.
  • Experience replay refers to saving the rewards and status updates obtained from each interaction with the environment for subsequent updates of the target Q value, which can disrupt sample correlation, improve sample utilization, and thereby improve the stability of DQN training.
  • Experience replay has two key steps, storage and replay: storage refers to storing each experience in the experience pool in the form of the current state s_t, the action a_t, the immediate reward r_{t+1}, the next state s_{t+1}, and the episode-done flag; replay refers to sampling one or more pieces of experience data from the experience pool according to certain rules.
  • the present invention adopts prioritized experience replay, that is, each experience in the experience pool is assigned a priority, and experiences with higher priority are more likely to be selected when sampling.
  • the priority depends on the gap between the current Q value and the target Q value of each state transition (the temporal difference error, TD-error); a larger TD-error means there is more room to improve the Q network's prediction accuracy on that sample, so the sample needs to be learned more and is given a higher priority.
  • the reward function is the feedback observed from the environment for a given state-action pair.
  • the main goal of the reinforcement learning agent is to maximize the cumulative reward of the state-action pair given the patient's state-action trajectory, so the design of the reward function is crucial to the learning of the reinforcement learning agent.
  • the reward function is set up to respond immediately to each state in the patient's trajectory.
  • the reward consists of two parts: one part reflects the patient's long-term survival probability, r_1(s), and the other part reflects the patient's current intradialytic symptoms, r_2(s).
  • the present invention trains a multi-layer perceptron (MLP) network to predict the probability that the patient dies within one year in this state. The reward is set as the negative log odds of this probability. In general, a state associated with death within one year receives a negative score and a state associated with survival receives a positive score.
  • MLP multi-layer perceptron
  • r_1(s) represents the survival reward part of the reward function
  • g(s) represents the probability, predicted by the multi-layer perceptron, that the patient dies within one year in state s.
  • Penalties vary with the intradialytic symptom and its severity. According to actual clinical manifestations, one point is deducted for fever, disequilibrium syndrome, cerebral hemorrhage and cerebral infarction, while two points are deducted for headache, muscle spasm, abdominal pain, intradialytic hypotension and intradialytic hypertension.
  • This invention trains and optimizes a deep double Q network with a dueling architecture (Dueling DDQN) and refines the dry weight adjustment strategy through repeated trials to maximize the expected overall return of the predicted reward.
  • TD-error time difference error
  • r_max is set to a value of 20 to improve model stability.
  • L(ω) is the final loss function learned by the dueling-architecture deep double Q network of the present invention (an illustrative code sketch of this update step is given at the end of this section)
  • error_TD is the temporal difference error
  • w_imp is the importance-sampling weight of prioritized experience replay
  • Q_main is the main network of the deep double Q network
  • Q_target is the target network of the deep double Q network
  • ω is the parameter of the main network
  • ω′ is the parameter of the target network
  • γ is the discount coefficient, taking a value between 0 and 1; the higher γ is, the more the agent values future rewards over the current moment
  • s represents the state
  • a represents the action
  • r represents the reward
  • E represents the expectation
  • the regularization term coefficient takes a value between 0 and 1
  • r_{t+1} represents the reward of the (t+1)-th dialysis session
  • s_t represents the state of the t-th dialysis session
  • a_t represents the action of the t-th dialysis session.
  • the special design of the reward function in the present invention effectively improves the policy learning efficiency of the deep Q network.
  • the reward function in the present invention is an immediate reward, that is, each state of the trajectory will give a reward to the agent.
  • the survival reward part r_1(s) of the reward function uses a survival predictor to distribute the survival reward, which would otherwise lie only at the end of the patient's trajectory, in advance and dispersedly across every state of the trajectory.
  • the intradialytic side-effect penalty part r_2(s) of the reward function incorporates the patient's immediate feedback on each dialysis session into the reward, imitating the physician's behavior of adjusting the dry weight according to the patient's clinical performance, so that the strategies learned by the agent are expected not only to improve the patient's survival but also to reduce adverse reactions during dialysis, reduce the physical suffering of dialysis patients, and improve the therapeutic effect of dialysis treatment.
  • the reward determines the goal of the agent's action. Therefore, the immediate reward can guide the agent's behavior better and more timely than the delayed reward.
  • the corresponding loss function is easier to learn and optimize, which improves the agent's learning efficiency.
  • the deep Q network learns a value function (Q network) that maps different states and actions to different Q values, so that different dry weight adjustment actions can be selected for different dialysis treatment states based on this mapping, ultimately forming the dry weight adjustment strategy recommended by the agent.
  • the auxiliary decision-making module used for visual output and interaction with doctors is specifically: according to the patient's different dialysis treatment status, the reinforcement learning agent will recommend the optimal dry weight adjustment value for the patient.
  • Physicians can set an assessment threshold (such as 0.2 kg). Adjustments below this threshold will be directly evaluated by nurses and selectively implemented, while adjustments above this threshold will be evaluated and selectively implemented by physicians, providing auxiliary support for physicians' dry weight adjustment decisions.
  • the system will record the agent's recommended value in each dialysis session, whether the physician accepts the agent's recommendation, and the dry weight adjustment actually performed by the physician; it will regularly evaluate the patient's dialysis adequacy and provide feedback to physicians and algorithm engineers in the form of visual charts, so that the model can be updated and optimized later.
  • a specific example of the present invention is as follows:
  • This example uses the electronic medical record data of maintenance hemodialysis patients who received continuous and regular hemodialysis treatment in a tertiary hospital for research.
  • the data of the dialysis induction period and dialysis stable period are divided into three data sets: training set (60%), validation set (20%), and test set (10%).
  • the data of the training set is used to train the deep reinforcement learning agent model
  • the data of the verification set is used to adjust the optimization parameters
  • the test set is used to test the performance of the model.
  • the present invention uses multiple sampling with replacement (bootstrap) to obtain the confidence interval of the performance index.
  • this embodiment adds a random strategy and a K nearest neighbor strategy to compare and evaluate the effectiveness of the model.
  • the K-nearest-neighbor strategy refers to selecting the action to take by a vote of the K most similar states.
  • This invention uses the weighted doubly robust (WDR) estimator, an off-policy evaluation method, to evaluate the value of the different strategies. The results are shown in Table 2 and Table 3.
  • the dry weight adjustment strategy learned by the deep reinforcement learning agent of the present invention achieved the best results compared with the other strategies. It is worth noting that when the strategy learned by the agent of the present invention is applied to the dialysis induction period, compared with the existing clinician strategy, it is expected that the 5-year mortality rate of hemodialysis patients can be reduced by 9.47%, the 3-year mortality rate by 7.99%, the incidence of adverse dialysis reactions by 8.44%, and the coefficient of variation of intradialytic systolic blood pressure by 4.76%, all statistically significant. Therefore, the present invention is expected to realize dynamic and intelligent adjustment of the dry weight of hemodialysis patients, and is expected to significantly improve the dialysis treatment effect and long-term survival of hemodialysis patients.
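The loss described above combines a double-Q target, the prioritized-replay importance-sampling weight w_imp and a regularization term. As an illustration only, and not the patented implementation, the following PyTorch sketch shows one possible form of this update step; q_main and q_target can be any networks that map states to per-action Q values, and the batch layout, the squared-TD-error loss and the hyperparameter values are assumptions.

```python
import torch

def ddqn_update(q_main, q_target, optimizer, batch, gamma=0.99, lam=1e-3):
    """One prioritized double-DQN update step (illustrative sketch only).

    `batch` is assumed to hold tensors:
      's'      [B, state_dim]  current states
      'a'      [B] long        actions taken
      'r'      [B]             immediate rewards
      's_next' [B, state_dim]  next states
      'done'   [B]             1.0 where the trajectory ended, else 0.0
      'w_imp'  [B]             importance-sampling weights from prioritized replay
    Returns per-sample absolute TD-errors, used to refresh replay priorities.
    """
    q_sa = q_main(batch['s']).gather(1, batch['a'].unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Double-Q target: the main network selects the next action,
        # the target network evaluates it.
        a_star = q_main(batch['s_next']).argmax(dim=1, keepdim=True)
        q_next = q_target(batch['s_next']).gather(1, a_star).squeeze(1)
        target = batch['r'] + gamma * (1.0 - batch['done']) * q_next

    td_error = target - q_sa
    # Importance-sampling weights correct the bias of prioritized sampling;
    # an L2 penalty on the main-network parameters acts as the regularization term.
    loss = (batch['w_imp'] * td_error.pow(2)).mean()
    loss = loss + lam * sum(p.pow(2).sum() for p in q_main.parameters())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Periodically the target parameters would be refreshed, e.g.
    # q_target.load_state_dict(q_main.state_dict()).
    return td_error.detach().abs()
```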

Abstract

Disclosed in the present invention is a deep reinforcement learning based assistive adjustment system for the dry weight of a hemodialysis patient. The system comprises a data collection module, a data processing module, a policy learning module and an assistive decision-making module. In the present invention, by using deep reinforcement learning technology, a double deep Q-network (DDQN) with a dueling structure is constructed as the agent, the process of a doctor adjusting the dry weight of a hemodialysis patient is simulated, and a policy for adjusting the dry weight of the hemodialysis patient is learned intelligently. By means of the present invention, the process of adjusting the dry weight of a hemodialysis patient is modeled as a partially observable Markov decision process, respective state spaces and action spaces are defined for different dialysis periods, and a reward function comprising long-term survival rewards and short-term dialysis side-effect penalties is designed; through interactive learning between the agent and the patient state, a dry weight adjustment policy that maximizes the overall reward is obtained, thereby assisting a doctor in the long-term management of the dry weight of a patient.

Description

A deep reinforcement learning based assistive adjustment system for the dry weight of hemodialysis patients

Technical field

The invention belongs to the technical fields of medical treatment and machine learning, and specifically relates to an assistive adjustment system for the dry weight of hemodialysis patients based on deep reinforcement learning.

Background art

The number of patients with end-stage renal disease is increasing significantly worldwide. Owing to the shortage of donor kidneys, most patients rely on hemodialysis treatment to stay alive. Patients with end-stage renal disease have a far higher risk of infection and of cardiovascular and cerebrovascular disease than the normal population, and their survival is far worse than that of the general population; end-stage renal disease has become a huge burden on the health care system. The main goal of hemodialysis is to correct the composition and volume of body fluids through ultrafiltration (UF) to achieve fluid balance, and dry weight is the key indicator for determining the ultrafiltration volume of a hemodialysis session. Dry weight is one of the most fundamental components of any dialysis prescription and is clinically determined as the lowest tolerated post-dialysis weight without adverse intradialytic symptoms or hypotension in the absence of significant fluid overload. Accurate assessment of dry weight is crucial for the survival prognosis of hemodialysis patients, and inaccurate estimation has a strongly negative impact on patient survival. Overestimating the patient's dry weight leads to chronic fluid overload and may cause harm by inducing edema, pulmonary congestion, hypertension, and vascular and cardiac damage; underestimating the patient's dry weight leads to chronic dehydration, cramps and other dialysis side effects, increases the risk of intradialytic hypotension, and can also lead to loss of residual renal function (RRF).

Existing dry weight assessment techniques cannot provide an accurate, dynamic assessment of the dry weight of hemodialysis patients. In clinical practice, physicians generally assess a patient's dry weight from the clinical manifestations before, during and after dialysis, combined with physical examinations over a period of time. This is a trial-and-error approach, carried out by gradually changing the patient's post-dialysis weight and observing the patient's response to dialysis. However, there is evidence that assessing dry weight from traditional physical signs (such as peripheral edema, lung auscultation and blood pressure) is unreliable. New techniques have therefore continued to emerge in recent years. For example, bioelectrical impedance analysis (BIA) is a non-invasive and simple technique that assists in determining dry weight; relative plasma volume (RPV) monitoring has been validated as one marker of dry weight; and lung ultrasound has emerged as a technique for guiding dry weight assessment. However, none of these methods serves as a gold standard for dry weight assessment. In addition, dry weight often fluctuates because of uncertainty in the patient's nutritional status or underlying disease, so continual reassessment is necessary. Because of their heavy daily workload, clinicians may not notice such changes in time, leading to delayed or even missed dry weight adjustments. Existing studies can only assess a patient's hydration status at a single point in time to estimate dry weight, and cannot help clinicians detect potential changes in dry weight over time.

On the other hand, the existing clinical decision-making process for dry weight depends heavily on the experience and effort of the clinician. Because there is no precise standard, the value of dry weight cannot be calculated from a few patient characteristics and must instead be derived from a comprehensive evaluation of many relevant clinical manifestations. In such a data-dense clinical environment, clinicians must therefore review large amounts of patient characteristic data to assess or monitor dry weight, making the dry weight decision-making process complex, time-consuming and labor-intensive. This also ties the effect of hemodialysis treatment closely to the experience and medical knowledge of the attending physician, aggravating the imbalance in the distribution of regional medical resources.
Summary of the invention

The purpose of the present invention is to address the shortcomings of the existing technology and to propose an assistive dry weight adjustment system for hemodialysis patients based on deep reinforcement learning, so as to dynamically support clinicians in determining personalized dry weight adjustment plans for hemodialysis patients.

The object of the present invention is achieved through the following technical solution: a deep reinforcement learning based assistive adjustment system for the dry weight of hemodialysis patients, the system comprising a data acquisition module, a data processing module, a policy learning module and an assistive decision-making module;

the data acquisition module is used to collect the medical electronic record data of hemodialysis patients during the dialysis induction period and the stable dialysis period and to input them into the data processing module;

the data processing module is used to process the data collected by the data acquisition module, including construction of the state space and construction of the action space; a state represents the time-series encoded clinical variables of the patient's dialysis sessions, and an action represents the value by which the current dry weight should be adjusted relative to the dry weight of the previous dialysis session;

the policy learning module is used to set the reward function of the deep reinforcement learning; the reward function is the immediate reward of each state, composed of a reward for the patient's long-term survival probability and a penalty for the patient's current intradialytic symptoms, and deep reinforcement learning is performed on the state space and action space constructed by the data processing module to obtain the dry weight adjustment policy;

the assistive decision-making module is used to visually output the dry weight adjustment policy to assist physicians in decision-making.
Further, for patients in the dialysis induction period, the data acquisition module collects data at every dialysis session; for patients in the stable dialysis period, the data acquisition module collects data once every 4 dialysis sessions.

Further, the data of each dialysis session include four categories of clinical variables: the intradialytic measured variables of the previous dialysis session, the post-dialysis measured variables of the previous dialysis session, the pre-dialysis measured variables of the current dialysis session, and the patient demographic indicators of the current dialysis session.

Further, for patients in the stable dialysis period, depending on the clinical variable, the clinical variable values collected and recorded by the data acquisition module are the average or the sum of the corresponding clinical variable values over those 4 dialysis sessions.

Further, the data processing module first preprocesses the data collected by the data acquisition module, interpolates missing clinical variable data by multiple imputation, normalizes the clinical variable data with the Min-Max normalization method, and then constructs the state space from the preprocessed data.

Further, the data processing module uses a long short-term memory (LSTM) network autoencoder to perform temporal encoding of the preprocessed clinical variable data; the LSTM autoencoder is trained and optimized to minimize the reconstruction loss between the original input and the decoded output, its encoder and decoder each consisting of a single-layer LSTM network with 128 units; to construct the state space, the LSTM autoencoder recurrently encodes the collected clinical variables and outputs, for each dialysis session time of each patient, one state representing the clinical variables.

Further, when constructing the action space, the data processing module uses backward interpolation to fill in the physician's recommended dry weight value for each dialysis session, calculates the change in the patient's dry weight in the current dialysis session relative to the previous dialysis session, and discretizes this change.

Further, in the policy learning module, one part of the reward function uses a multi-layer perceptron network to predict the probability that the patient dies within one year in the corresponding state, and the reward is set as the negative log odds of this probability; the other part of the reward function is a penalty for side-effect symptoms occurring during dialysis, which varies with the intradialytic symptom and its severity.

Further, in the policy learning module, an experience replay pool is constructed and a deep double Q network is used for the deep reinforcement learning; experience replay means saving the reward and state update obtained from each interaction with the environment for later use in updating the target Q value during deep reinforcement learning.

Further, in the assistive decision-making module, the physician can set an evaluation threshold; adjustments below this threshold are evaluated directly by nurses and selectively executed, while adjustments above this threshold are evaluated and selectively executed by the physician, thereby providing decision support for dry weight adjustment.
The beneficial effects of the present invention are as follows. The present invention models the important clinical problem of dry weight assessment as a sequential decision-making problem of dry weight adjustment; it combines clinical knowledge and physician experience to construct a reward function tailored to the dry weight adjustment process, reflecting both the patient's long-term survival reward and a short-term penalty for adverse intradialytic symptoms; it uses a reinforcement learning agent, a deep double Q network with a dueling architecture, to make full use of time-series electronic medical record data and learn the optimal dry weight adjustment policy; it can reduce the workload of physicians, can take more patient characteristic variables into account when assessing a patient's dry weight, and helps physicians balance short-term and long-term benefits and customize a personalized dry weight adjustment plan for the patient. Because the effect of dialysis treatment is highly heterogeneous across patients, patients are likely to benefit from a more personalized and intelligent adjustment plan, thereby improving long-term survival, reducing the incidence of dialysis side effects, and improving the therapeutic effect of dialysis sessions.
Brief description of the drawings

Figure 1 is a structural block diagram of the deep reinforcement learning based assistive dry weight adjustment system for hemodialysis patients of the present invention.

Figure 2 is a schematic diagram of the data reconstruction process in the data acquisition module of the present invention.

Figure 3 is a schematic diagram of the modeling of the dry weight adjustment process as a Markov decision process in the present invention.

Figure 4 is an overall architecture diagram of the policy learning module of the present invention.

Detailed description of embodiments

Specific embodiments of the present invention are described in further detail below with reference to the accompanying drawings.
Reinforcement learning is a popular research direction in artificial intelligence. It is based on an agent that continuously interacts with an environment, with the goal of finding an optimal policy that maximizes the expected cumulative reward. In recent years, with the availability of massive electronic medical record data and the development of new machine learning techniques, reinforcement learning has been introduced into healthcare and plays an increasingly important role in many sequential decision-making problems, such as glycemic control for patients with diabetes, treatment of patients with sepsis, and mechanical ventilation settings. However, to date, reinforcement learning techniques have not been used to support clinicians in assessing the dry weight of hemodialysis patients.

The present invention uses the Markov decision process framework to model the dry weight assessment process as a sequential decision-making process, defines separate state spaces and action spaces for different dialysis periods, and designs a reward scheme that incorporates clinical background knowledge; the present invention constructs a deep double Q network with a dueling architecture (Dueling-DQN) to learn the optimal dry weight adjustment policy from historical electronic medical record data, thereby providing nephrologists with clinical decision support for dry weight adjustment and assisting physicians in the long-term management of patient weight.

As shown in Figure 1, the present invention provides a deep reinforcement learning based assistive dry weight adjustment system for hemodialysis patients, which includes: a data acquisition module for collecting the medical electronic record data of hemodialysis patients; a data processing module for processing the raw data; a policy learning module for the deep reinforcement learning agent; and an assistive decision-making module for visual output and interaction with physicians.
The processing of the data acquisition module is as follows: the patient's clinical data are collected from the medical electronic record system, including demographics, laboratory values, dialysis parameters, dialysis symptoms and other related clinical characteristics. Considering that in clinical practice the dry weight is evaluated in each dialysis session after the pre-dialysis variables have been measured and before dialysis is actually started on the machine, the present invention limits the time window of collection, that is, the data of each dialysis session are reconstructed. The data of each dialysis session include four categories of clinical variables: the intradialytic measured variables of the previous dialysis session, the post-dialysis measured variables of the previous dialysis session, the pre-dialysis measured variables of the current dialysis session, and the patient demographic indicators of the current dialysis session (as shown in Figure 2).
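As a concrete illustration of this time-window reconstruction, the following pandas sketch assembles one decision-point record per dialysis session from the previous session's intradialytic and post-dialysis measurements, the current session's pre-dialysis measurements, and the demographic indicators. The table layout and all column names are hypothetical, not taken from the patent.

```python
import pandas as pd

def reconstruct_session_records(sessions: pd.DataFrame) -> pd.DataFrame:
    """Build one record per dialysis session at the moment the dry weight is decided.

    `sessions` is assumed to hold one row per (patient_id, session_idx), with
    hypothetical column groups: pre-dialysis ('pre_*'), intradialytic ('intra_*')
    and post-dialysis ('post_*') measurements, plus demographics ('age', 'sex').
    """
    sessions = sessions.sort_values(['patient_id', 'session_idx'])
    intra_post = [c for c in sessions.columns if c.startswith(('intra_', 'post_'))]
    # Intradialytic and post-dialysis values are only available from the previous session.
    prev = sessions.groupby('patient_id')[intra_post].shift(1)
    return pd.concat(
        [sessions[['patient_id', 'session_idx', 'age', 'sex']],
         sessions.filter(regex='^pre_'),   # measured before this session starts
         prev.add_prefix('prev_')],        # carried over from the previous session
        axis=1,
    )
```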
The present invention processes and models the data of the dialysis induction period (the first three months after the start of dialysis) and the stable dialysis period (from three months after the start of dialysis onward) separately. For patients in the dialysis induction period, data are collected at every dialysis session; for patients in the stable dialysis period, data are collected once every 4 dialysis sessions, and the recorded clinical variable values are the average (e.g., age) or the sum (e.g., number of occurrences of adverse dialysis symptoms) of the corresponding clinical variable values over those 4 sessions.
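For the stable dialysis period, one possible sketch of the 4-session aggregation described above is shown below; which variables are averaged and which are summed, and their names, are illustrative assumptions.

```python
import pandas as pd

# Hypothetical split: level-type variables are averaged, event counts are summed.
MEAN_VARS = ['age', 'pre_systolic_bp', 'post_weight']
SUM_VARS = ['n_adverse_symptoms', 'n_hypotension_events']

def aggregate_stable_period(records: pd.DataFrame) -> pd.DataFrame:
    """Collapse every 4 consecutive sessions of each patient into one record."""
    records = records.sort_values(['patient_id', 'session_idx']).copy()
    records['block'] = records.groupby('patient_id').cumcount() // 4
    agg = {**{v: 'mean' for v in MEAN_VARS}, **{v: 'sum' for v in SUM_VARS}}
    return records.groupby(['patient_id', 'block'], as_index=False).agg(agg)
```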
The processing of the data processing module includes two parts:

1) construction of the state space;

2) construction of the action space.

1) As shown in Figure 3, the dry weight adjustment process is modeled as a sequential decision-making process, which the present invention describes with a Markov decision process (MDP). A Markov decision process is described by the tuple (S, A, T, R, π), where S is the state space, A is the action space, T is the transition probability distribution between different states, R is the reward function, and π is the policy, i.e. the mapping from the state space to the action space. At each time step t, the agent observes a state s_t ∈ S and selects an action a_t ∈ A according to the policy π; this is the action selection process. The agent then receives a reward r_t related to its action selection according to the reward function R; this is the reward response process. Finally, the environment changes to the next state s_{t+1} ∈ S in response to the agent's action, according to the state transition probability distribution T. In the present invention, a state S represents the time-series encoded clinical variables of the patient's dialysis session, and an action A represents the value by which the current dry weight should be adjusted (increased or decreased) relative to the dry weight of the previous dialysis session. Because the clinical environment is complex and the probability distribution of state transitions is difficult to model accurately, the present invention treats the state transition probability distribution T as unknown. Guided by the reward function R, the agent learns from historical retrospective data in this unknown, complex environment and outputs the best action selection policy π.
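Under this MDP description, each patient's sequence of dialysis sessions becomes a trajectory of transitions that the agent learns from retrospectively. A minimal sketch of how such transitions might be represented, assuming the per-session encoded states, discretized actions and immediate rewards have already been computed, is given below.

```python
from typing import List, NamedTuple
import numpy as np

class Transition(NamedTuple):
    s: np.ndarray       # encoded state of session t
    a: int              # discretized dry weight adjustment chosen at session t
    r: float            # immediate reward observed for session t
    s_next: np.ndarray  # encoded state of session t + 1
    done: bool          # True for the last transition of the patient's trajectory

def build_trajectory(states, actions, rewards) -> List[Transition]:
    """states: [T, d] array; actions, rewards: length-T sequences for one patient."""
    T = len(states)
    return [Transition(states[t], int(actions[t]), float(rewards[t]),
                       states[t + 1], t + 1 == T - 1)
            for t in range(T - 1)]
```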
Construction of the state space

Missing clinical variable data are interpolated by multiple imputation, and the feature matrix is normalized with the Min-Max normalization method to facilitate subsequent learning and optimization of the deep models. Because the dry weight adjustment process is in fact a partially observable Markov decision process (POMDP), that is, the state transition dynamics and reward distribution do not satisfy the Markov property (the current state does not contain all of the information needed to determine the probability distribution of future states), the present invention uses a long short-term memory (LSTM) network autoencoder to perform temporal encoding of the clinical data collected from patients. The LSTM autoencoder is trained and optimized to minimize the reconstruction loss between the original input and the decoded output, and its encoder and decoder each consist of a single-layer LSTM network with 128 units. This LSTM autoencoder recurrently encodes the clinical observations collected from the patient and outputs a state s_it for each dialysis session time t of each patient i:

s_it = f(o_i1, o_i2, o_i3, ..., o_it)

where i denotes the patient, o_it denotes the clinical observation feature vector collected for the patient's t-th dialysis session, t denotes the dialysis session time, s denotes the state of the Markov process, and f denotes the encoder of the trained LSTM network.
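As an illustration of the temporal encoder described above, the following PyTorch sketch implements a single-layer, 128-unit LSTM autoencoder trained on the reconstruction loss; using each per-step encoder output as the state s_it, and the training-loop details, are assumptions made for illustration (the input is assumed to be already imputed and Min-Max scaled).

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    """Single-layer LSTM encoder and decoder with 128 hidden units, as described."""
    def __init__(self, n_features: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.LSTM(n_features, hidden, num_layers=1, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, num_layers=1, batch_first=True)
        self.out = nn.Linear(hidden, n_features)

    def forward(self, x):                   # x: [batch, T, n_features]
        enc_out, _ = self.encoder(x)        # enc_out[:, t - 1] plays the role of s_it
        dec_out, _ = self.decoder(enc_out)  # decode back to the observation space
        return self.out(dec_out), enc_out

def train_step(model, batch, optimizer):
    """One optimization step on the reconstruction loss for an imputed, scaled batch."""
    recon, _ = model(batch)
    loss = nn.functional.mse_loss(recon, batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After training, only the encoder is kept: the hidden output obtained after reading sessions 1 to t of patient i is used as the state s_it.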
2) Construction of the action space

Considering that the clinically recommended dry weight value is taken to remain unchanged until the physician writes a new dialysis prescription for the patient, the present invention uses backward interpolation to fill in the physician's recommended dry weight value for each dialysis session; the present invention then calculates the change in the patient's dry weight in the current dialysis session relative to the dry weight at the previous dialysis session and discretizes it.

Discretization means limiting the dry weight adjustment range to a certain interval, dividing it into equally spaced adjustment actions, and taking the action whose value is closest to the physician's continuous dry weight adjustment in that dialysis session as the discretized dry weight adjustment action (the change in dry weight in a dialysis session relative to the previous session).
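A possible sketch of this discretization step is given below; the adjustment range and step size are placeholders, since the actual action grids of the two dialysis periods are those of Table 1, which is not reproduced in this text.

```python
import numpy as np

# Hypothetical action grid: dry weight changes from -1.0 kg to +1.0 kg in 0.1 kg steps.
ACTION_GRID = np.round(np.arange(-1.0, 1.0 + 1e-9, 0.1), 1)

def discretize_adjustment(delta_kg: float) -> int:
    """Map the physician's continuous dry weight change to the nearest action index."""
    delta_kg = float(np.clip(delta_kg, ACTION_GRID[0], ACTION_GRID[-1]))
    return int(np.abs(ACTION_GRID - delta_kg).argmin())

# Example: a recorded change of -0.27 kg maps to the -0.3 kg action.
assert np.isclose(ACTION_GRID[discretize_adjustment(-0.27)], -0.3)
```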
The present invention constructs specific action spaces for the dialysis induction period (the first three months after the start of dialysis) and the stable dialysis period (from three months after the start of dialysis onward), as shown in Table 1.
Table 1. Comparison of dry weight adjustment frequency and action space construction in different dialysis periods
所述深度强化学习智能体的策略学习模块的处理过程包括三部分:The processing process of the policy learning module of the deep reinforcement learning agent includes three parts:
1) Experience replay
2) Learning the reward function
3) Learning the dry weight adjustment policy with a deep Q network
As shown in Figure 4, the core of the policy learning module of the deep reinforcement learning agent of the present invention is a deep double Q network with a dueling structure (Dueling DDQN). The deep double Q network (DDQN) and the dueling-architecture Q network (Dueling-DQN) are both improved versions of DQN: the former improves the DQN training algorithm and the latter improves the DQN model structure, and the present invention adopts both improvements. The DQN algorithm is in turn an improvement on the Q-learning algorithm. Q-learning uses a Q-table to record the value of each action in each state; when the state space or action space is large, the required storage is correspondingly large, and when the state space or action space is continuous, Q-learning cannot be used at all. The core of the DQN algorithm is therefore to replace the Q-table with an artificial neural network q(s, a; ω), s∈S, a∈A, i.e. the action-value function. The input of the action-value network is the state information and its output is the value of each action; the agent selects its action according to these values.
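The following Python (PyTorch) sketch shows one common way to realize a dueling action-value network of the kind described above; it is illustrative only, and the layer sizes and names are assumptions that do not come from the patent.

```python
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Dueling architecture: Q(s, a) is decomposed into a state value V(s)
    and action advantages A(s, a), recombined with a mean-centred advantage."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value_stream = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                          nn.Linear(hidden, 1))
        self.advantage_stream = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(),
                                              nn.Linear(hidden, n_actions))

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        x = self.feature(state)
        value = self.value_stream(x)              # V(s): (batch, 1)
        advantage = self.advantage_stream(x)      # A(s, a): (batch, n_actions)
        # Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)
        return value + advantage - advantage.mean(dim=1, keepdim=True)

# The agent picks the dry weight adjustment action with the highest Q value:
# action = q_net(state).argmax(dim=1)
```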
1) Construction of the experience replay pool
Experience replay means saving the reward and state transition obtained from each interaction with the environment for later updates of the target Q value; it breaks the correlation between samples and improves sample utilization, thereby improving the stability of DQN training. Experience replay has two key steps, storage and replay: storage means saving each experience in the experience pool in the form of the current state s_t, the action a_t, the immediate reward r_{t+1}, the next state s_{t+1}, and the episode-termination flag done; replay means sampling one or more pieces of experience data from the pool according to certain rules. The present invention adopts prioritized experience replay, i.e. each experience in the pool is assigned a priority, and experiences with higher priority are more likely to be selected when sampling. The priority depends on the gap between the current Q value and the target Q value of each state transition (the temporal-difference error, TD-error): the larger the TD-error, the more room there is for the Q network's prediction accuracy to improve, so the more the sample needs to be learned, i.e. the higher its priority.
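A minimal sketch of such a prioritized replay buffer is given below, assuming proportional prioritization with an exponent alpha and importance-sampling weights with an exponent beta; these hyperparameter names and the concrete sampling scheme are common choices rather than values taken from the patent.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Stores (s_t, a_t, r_{t+1}, s_{t+1}, done) tuples and samples them with
    probability proportional to (|TD-error| + eps) ** alpha."""

    def __init__(self, capacity: int, alpha: float = 0.6, eps: float = 1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities, self.pos = [], [], 0

    def store(self, transition, td_error: float):
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = priority
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size: int, beta: float = 0.4):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights w_imp correct the bias of non-uniform sampling.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights = weights / weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha
```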
2) Learning the reward function
The reward function is the feedback observed from the environment for a given state-action pair. The main goal of a reinforcement learning agent is to maximize the cumulative reward of the state-action pairs along a given patient state-action trajectory, so the design of the reward function is crucial to the agent's learning.
It is natural to use patient survival as the trigger for the reward; for example, the agent would receive a negative return when a patient dies and a positive return when the patient survives. However, since the dialysis treatment of a hemodialysis patient may last for years, the patient trajectory can be very long. If the reward responded only to the patient outcome event, it would be very sparse and would hinder the learning and updating process of the reinforcement learning agent.
Therefore, in the present invention, the reward function is set to respond immediately to each state in the patient's trajectory. Specifically, the reward consists of two parts: one part reflects the patient's long-term survival probability, r_1(s), and the other reflects the patient's current intradialytic symptoms, r_2(s). To obtain the survival reward, the present invention trains a multilayer perceptron (MLP) network to predict the probability that the patient dies within one year from the given state. The reward is set to the negative log odds of this probability, i.e.
r_1(s) = -log( g(s) / (1 - g(s)) )
so that, in general, a state in which death within one year is predicted scores negatively and a surviving state scores positively, where r_1(s) denotes the survival-reward part of the reward function and g(s) denotes the probability, predicted by the multilayer perceptron, that the patient dies within one year from state s.
The other part of the reward is a penalty for adverse reactions occurring during dialysis, denoted r_2(s). The penalty varies with the type and severity of the intradialytic symptom. Based on actual clinical presentation, one point is deducted for fever, dialysis disequilibrium syndrome, cerebral hemorrhage and cerebral infarction, while two points are deducted for headache, muscle cramp, abdominal pain, intradialytic hypotension and intradialytic hypertension.
The total reward function r(s) is the sum of the patient survival reward and the intradialytic adverse-reaction penalty:
r(s) = r_1(s) + r_2(s)
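Purely as an illustration, the reward could be computed as in the following Python sketch; the negative-log-odds survival reward, the symptom lists and the per-symptom deductions follow the description above, while the function names, variable names and the symptom encoding are assumptions.

```python
import math

# Per-symptom deductions as described above (assumed string encoding of the penalty r2).
ONE_POINT_SYMPTOMS = {"fever", "disequilibrium_syndrome",
                      "cerebral_hemorrhage", "cerebral_infarction"}
TWO_POINT_SYMPTOMS = {"headache", "muscle_cramp", "abdominal_pain",
                      "intradialytic_hypotension", "intradialytic_hypertension"}

def survival_reward(death_prob_1y: float) -> float:
    """r1(s): negative log odds of the MLP-predicted one-year death probability g(s)."""
    return -math.log(death_prob_1y / (1.0 - death_prob_1y))

def symptom_penalty(observed_symptoms: set) -> float:
    """r2(s): penalty for adverse reactions observed during the dialysis session."""
    penalty = 0.0
    penalty -= 1.0 * len(observed_symptoms & ONE_POINT_SYMPTOMS)
    penalty -= 2.0 * len(observed_symptoms & TWO_POINT_SYMPTOMS)
    return penalty

def total_reward(death_prob_1y: float, observed_symptoms: set) -> float:
    """r(s) = r1(s) + r2(s)."""
    return survival_reward(death_prob_1y) + symptom_penalty(observed_symptoms)

# Example: predicted one-year death probability 0.2, with intradialytic hypotension.
# r1 = -log(0.2/0.8) ≈ 1.39, r2 = -2, total ≈ -0.61
print(total_reward(0.2, {"intradialytic_hypotension"}))
```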
3) Policy learning of the deep Q network
The present invention trains and optimizes a deep double Q network with a dueling architecture (Dueling DDQN), adjusting the dry weight treatment policy through repeated trials so as to maximize the overall return of the predicted reward. The loss function of the Dueling DDQN consists of two parts: first, the temporal-difference error (TD-error), which reflects the gap between the current Q value and the target Q value; second, a regularization term that penalizes output Q values exceeding a reasonable threshold r_max = 20, to improve model stability. The loss function used to train and optimize the dueling deep double Q network of the present invention is:
L(θ) = E[error_TD · w_imp] + λ · max(|Q_main(s_t, a_t; θ)| − r_max, 0)
error_TD = (Q_double-target − Q_main(s_t, a_t; θ))²
Here Q_double-target denotes the double-Q learning target computed from the target network; under the standard double-DQN formulation to which the symbols below refer, it takes the form Q_double-target = r_{t+1} + γ · Q_target(s_{t+1}, argmax_a Q_main(s_{t+1}, a; θ); θ′). In the above, L(θ) is the loss function ultimately learned by the dueling deep double Q network of the present invention, error_TD is the temporal-difference error, and w_imp is the importance-sampling weight from prioritized experience replay; Q_main is the main network of the deep double Q network, Q_target is the target network, θ are the parameters of the main network and θ′ are the parameters of the target network; γ is the discount factor, taking a value between 0 and 1, with a higher γ meaning that the agent pays more attention to future rewards rather than the immediate reward; s denotes the state, a the action, r the reward, E the expectation, and λ the regularization coefficient, taking a value between 0 and 1; r_{t+1} denotes the reward at the (t+1)-th dialysis session, s_t the state of the t-th dialysis session, and a_t the action of the t-th dialysis session.
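The loss described above might be computed roughly as in the following Python (PyTorch) sketch, assuming the DuelingQNetwork and PrioritizedReplayBuffer sketched earlier; the double-Q target is the standard formulation, and the default hyperparameter values and variable names are assumptions.

```python
import torch

def dueling_ddqn_loss(q_main, q_target, batch, is_weights,
                      gamma=0.99, lam=0.1, r_max=20.0):
    """TD loss weighted by importance-sampling weights, plus a penalty on |Q| > r_max."""
    states, actions, rewards, next_states, dones = batch

    # Q_main(s_t, a_t; theta)
    q_sa = q_main(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    with torch.no_grad():
        # Double DQN target: action chosen by the main network,
        # evaluated by the target network (parameters theta').
        next_actions = q_main(next_states).argmax(dim=1, keepdim=True)
        next_q = q_target(next_states).gather(1, next_actions).squeeze(1)
        q_double_target = rewards + gamma * next_q * (1.0 - dones)

    td_error = (q_double_target - q_sa) ** 2
    reg = torch.clamp(q_sa.abs() - r_max, min=0.0)       # penalize |Q| above r_max
    loss = (is_weights * td_error).mean() + lam * reg.mean()

    # The per-sample |TD-error| is returned to update the replay priorities.
    return loss, (q_double_target - q_sa).abs().detach()
```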
The particular design of the reward function in the present invention effectively improves the policy learning efficiency of the deep Q network. Unlike the usual delayed survival reward (a reward or penalty granted at the end of the patient's trajectory according to whether the patient survives or dies), the reward function in the present invention is an immediate reward, i.e. every state of the trajectory grants a reward to the agent. The survival part r_1(s) of the reward function uses a survival predictor to bring the survival reward located at the end of the patient's trajectory forward and distribute it over every state of the trajectory. The intradialytic adverse-reaction penalty r_2(s), on the other hand, incorporates the patient's immediate feedback on each dialysis session into the reward, imitating the way physicians adjust dry weight according to the patient's clinical presentation. The policy learned by the agent is therefore expected not only to improve patient survival but also to reduce intradialytic adverse reactions, lessen the physical suffering of dialysis patients, and improve the therapeutic effect of the dialysis course. The reward determines the goal of the agent's actions, so immediate rewards guide the agent's behaviour better and more promptly than delayed rewards; the corresponding loss function is easier to learn and optimize, which improves the agent's learning efficiency.
Finally, the deep Q network learns a value function that maps different states and actions to different Q values; based on this mapping, different dry weight adjustment actions can be selected for the states of different dialysis sessions, ultimately forming the dry weight adjustment policy recommended by the agent.
The auxiliary decision-making module for visual output and interaction with physicians operates as follows: for each of a patient's dialysis-session states, the reinforcement learning agent recommends the optimal dry weight adjustment value. The physician can set an evaluation threshold (e.g. 0.2 kg): adjustments below this threshold are evaluated directly by the nurse and selectively executed, while adjustments above this threshold are evaluated by the physician and selectively executed, providing auxiliary support for the physician's dry weight adjustment decisions. The system records the agent's recommended value for each dialysis session, whether the physician accepted the agent's recommendation, and the dry weight adjustment actually executed by the physician; it regularly evaluates the patient's dialysis adequacy and feeds the results back to physicians and algorithm engineers in the form of visual charts, so that the model can subsequently be updated and optimized.
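For illustration only, the threshold-based routing of recommendations could look like the following Python sketch; the default threshold value, the record fields and the function names are assumptions consistent with the description above.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AdjustmentRecord:
    session_id: str
    recommended_change_kg: float           # agent's recommended dry weight adjustment
    reviewer: str                          # "nurse" or "physician"
    accepted: Optional[bool] = None        # filled in after review
    executed_change_kg: Optional[float] = None

def route_recommendation(session_id: str, recommended_change_kg: float,
                         threshold_kg: float = 0.2) -> AdjustmentRecord:
    """Adjustments below the physician-set threshold go to the nurse for review;
    larger adjustments go to the physician."""
    reviewer = "nurse" if abs(recommended_change_kg) < threshold_kg else "physician"
    return AdjustmentRecord(session_id, recommended_change_kg, reviewer)

# Example: a recommended +0.3 kg change is routed to the physician for review.
record = route_recommendation("patient42-session198", 0.3)
print(record.reviewer)   # physician
```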
A specific example of the present invention is as follows:
This embodiment uses for the study the electronic medical record data of maintenance hemodialysis patients receiving continuous, regular hemodialysis treatment at a tertiary hospital. The data for the dialysis induction period and the dialysis stable period are each divided into three data sets: a training set (60%), a validation set (20%), and a test set (10%). The training set is used to train the deep reinforcement learning agent model, the validation set is used to tune the optimization parameters, and the test set is used to test the performance of the model. On the test set, the present invention uses repeated sampling with replacement (bootstrap) to obtain confidence intervals for the performance indicators. In addition to the policy implemented by the physicians and the policy learned by the agent of the present invention, this embodiment adds a random policy and a K-nearest-neighbour policy as comparisons to evaluate the effectiveness of the model, where the K-nearest-neighbour policy selects the action to take by a vote over the K most similar states. The present invention uses the weighted doubly robust (WDR) estimator from off-policy evaluation to evaluate the value of the different policies; the results are shown in Table 2 and Table 3.
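As an illustrative sketch only, the bootstrap confidence intervals mentioned above could be obtained as follows; the number of resamples and the 95% level are assumptions, and the input values are placeholders rather than data from the study.

```python
import numpy as np

def bootstrap_ci(per_patient_values, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean of a per-patient performance metric
    (e.g. estimated policy value), using sampling with replacement."""
    rng = np.random.default_rng(seed)
    values = np.asarray(per_patient_values, dtype=float)
    stats = []
    for _ in range(n_resamples):
        resample = rng.choice(values, size=len(values), replace=True)
        stats.append(resample.mean())
    lower = np.percentile(stats, 100 * (1 - level) / 2)
    upper = np.percentile(stats, 100 * (1 + level) / 2)
    return values.mean(), (lower, upper)

# Example with synthetic numbers:
mean, (lo, hi) = bootstrap_ci([12.1, 9.8, 14.3, 11.0, 13.5, 10.2])
print(f"estimate {mean:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```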
Table 2. Comparison of policy value results of different policies during the dialysis induction period
Table 3. Comparison of policy value results of different policies during the dialysis stable period
The results show that the dry weight adjustment policy learned by the deep reinforcement learning agent of the present invention achieves the best performance compared with the other policies. Notably, when applied to the dialysis induction period, the policy learned by the agent of the present invention is expected, compared with the existing clinician policy, to reduce the 5-year mortality of hemodialysis patients by 9.47%, reduce the 3-year mortality by 7.99%, reduce the incidence of dialysis adverse reactions by 8.44%, and reduce the coefficient of variation of intradialytic systolic blood pressure by 4.76%, with statistical significance. The present invention is therefore expected to achieve dynamic, intelligent adjustment of the dry weight of hemodialysis patients and to significantly improve their dialysis treatment outcomes and long-term survival.
The above embodiments are intended to explain the present invention rather than to limit it; any modification or change made to the present invention within the spirit of the invention and the scope of protection of the claims falls within the scope of protection of the present invention.

Claims (10)

  1. A deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients, characterized in that the system comprises a data acquisition module, a data processing module, a policy learning module and an auxiliary decision-making module;
    the data acquisition module is configured to collect medical electronic medical record data of hemodialysis patients during the dialysis induction period and the dialysis stable period, and to input the data into the data processing module;
    the data processing module is configured to process the data collected by the data acquisition module, including construction of a state space and construction of an action space; a state represents the temporally encoded clinical variables of the patient's dialysis sessions, and an action represents the value by which the current dry weight should be adjusted relative to the dry weight of the previous dialysis session;
    the policy learning module is configured to set the reward function of the deep reinforcement learning, the reward function being an immediate reward for each state, composed of a reward for the patient's long-term survival probability and a penalty for the patient's current intradialytic symptoms, and to perform deep reinforcement learning based on the state space and the action space constructed by the data processing module to obtain a dry weight adjustment policy;
    the auxiliary decision-making module is configured to visually output the dry weight adjustment policy to assist the physician's decision-making.
  2. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 1, characterized in that, for patients in the dialysis induction period, the data acquisition module collects data at every dialysis session; for patients in the dialysis stable period, the data acquisition module collects data once every 4 dialysis sessions.
  3. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 2, characterized in that the data of each dialysis session comprise four types of clinical variables: intradialytic measurement variables of the previous dialysis session, post-dialysis measurement variables of the previous dialysis session, pre-dialysis measurement variables of the current dialysis session, and patient demographic indicators of the current dialysis session.
  4. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 3, characterized in that, for patients in the dialysis stable period, depending on the clinical variable collected, the clinical variable value recorded by the data acquisition module is the mean or the sum of the corresponding clinical variable values over the 4 dialysis sessions.
  5. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 1, characterized in that the data processing module first preprocesses the data collected by the data acquisition module, interpolating missing clinical variable data by multiple imputation and normalizing the clinical variable data with the Min-Max normalization method, and then uses the preprocessed data to construct the state space.
  6. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 5, characterized in that the data processing module uses a long short-term memory (LSTM) autoencoder to perform temporal encoding of the preprocessed clinical variable data; the LSTM autoencoder is trained to minimize the reconstruction loss between the original input and the decoded output, and its encoder and decoder are each composed of a single-layer LSTM with 128 units; the construction of the state space uses the encoder of the LSTM autoencoder to recurrently encode the clinical variables collected from the patient and to output, for each dialysis session time of each patient, a state representing the clinical variables.
  7. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 1, characterized in that, when constructing the action space, the data processing module uses backward interpolation to fill in the physician-recommended dry weight value for each dialysis session, calculates the change of the patient's dry weight in the current dialysis session relative to the dry weight of the previous dialysis session, and discretizes the change.
  8. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 1, characterized in that, in the policy learning module, one part of the reward function uses a multilayer perceptron network to predict the probability that the patient dies within one year from the corresponding state, the reward being set to the negative log odds of this probability; the other part of the reward function is a penalty for adverse-reaction symptoms occurring during dialysis, the penalty varying with the type and severity of the intradialytic symptom.
  9. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 1, characterized in that, in the policy learning module, an experience replay pool is constructed and a deep double Q network is used for deep reinforcement learning; experience replay means saving the reward and state update obtained from each interaction with the environment for later updates of the target Q value during deep reinforcement learning.
  10. The deep reinforcement learning based auxiliary dry weight adjustment system for hemodialysis patients according to claim 1, characterized in that, in the auxiliary decision-making module, the physician can set an evaluation threshold; adjustments below this threshold are evaluated directly by the nurse and selectively executed, while adjustments above this threshold are evaluated by the physician and selectively executed, providing auxiliary support for dry weight adjustment decisions.
PCT/CN2023/088561 2022-04-18 2023-04-17 Deep reinforcement learning based assistive adjustment system for dry weight of hemodialysis patient WO2023202500A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210404618.9 2022-04-18
CN202210404618.9A CN114496235B (en) 2022-04-18 2022-04-18 Hemodialysis patient dry weight auxiliary adjusting system based on deep reinforcement learning

Publications (1)

Publication Number Publication Date
WO2023202500A1 true WO2023202500A1 (en) 2023-10-26

Family

ID=81489553

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/088561 WO2023202500A1 (en) 2022-04-18 2023-04-17 Deep reinforcement learning based assistive adjustment system for dry weight of hemodialysis patient

Country Status (2)

Country Link
CN (1) CN114496235B (en)
WO (1) WO2023202500A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114496235B (en) * 2022-04-18 2022-07-19 浙江大学 Hemodialysis patient dry weight auxiliary adjusting system based on deep reinforcement learning
CN114626836B (en) * 2022-05-17 2022-08-05 浙江大学 Multi-agent reinforcement learning-based emergency post-delivery decision-making system and method
CN115019960B (en) * 2022-08-01 2022-11-29 浙江大学 Disease assistant decision-making system based on personalized state space progress model
CN116453706B (en) * 2023-06-14 2023-09-08 之江实验室 Hemodialysis scheme making method and system based on reinforcement learning
CN116779150B (en) * 2023-07-03 2023-12-22 浙江一山智慧医疗研究有限公司 Personalized medical decision method, device and application based on multi-agent interaction
CN117012374B (en) * 2023-10-07 2024-01-26 之江实验室 Medical follow-up system and method integrating event map and deep reinforcement learning

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100105990A1 (en) * 2007-01-17 2010-04-29 Gambro Lundia Ab Method of monitoring hypertensive haemodialysis patients
JP5921011B1 (en) * 2015-09-29 2016-05-24 株式会社トマーレ Dialysis information sharing system and dialysis information sharing method
US20190244712A1 (en) * 2016-07-18 2019-08-08 Fresenius Medical Care Deutschland Gmbh Drug Dosing Recommendation
US20200330668A1 (en) * 2017-12-19 2020-10-22 Fresenius Medical Care Deutschland Gmbh Method And Devices For Determining A Treatment Regimen For Altering The Treatment Parameters When Dialyzing A Patient
CN111971755A (en) * 2018-04-12 2020-11-20 费森尤斯医疗保健控股公司 System and method for determining dialysis patient function to assess parameters and timing of palliative and/or end-of-care
CN112530594A (en) * 2021-02-08 2021-03-19 之江实验室 Hemodialysis complication long-term risk prediction system based on convolution survival network
CN112951419A (en) * 2020-11-11 2021-06-11 复旦大学附属华山医院 Hemodialysis dry weight intelligent assessment device
CN114496235A (en) * 2022-04-18 2022-05-13 浙江大学 Hemodialysis patient dry weight auxiliary adjusting system based on deep reinforcement learning

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102013008418A1 (en) * 2013-05-17 2014-11-20 Fresenius Medical Care Deutschland Gmbh Apparatus and method for providing treatment parameters for the treatment of a patient
CN105962939A (en) * 2016-06-16 2016-09-28 南昌大学第二附属医院 Uremia patient dry weight assessment instrument
US20210193317A1 (en) * 2019-12-20 2021-06-24 Fresenius Medical Care Holdings, Inc. Real-time intradialytic hypotension prediction
CN113990494B (en) * 2021-12-24 2022-03-25 浙江大学 Tic disorder auxiliary screening system based on video data

Also Published As

Publication number Publication date
CN114496235A (en) 2022-05-13
CN114496235B (en) 2022-07-19

Similar Documents

Publication Publication Date Title
WO2023202500A1 (en) Deep reinforcement learning based assistive adjustment system for dry weight of hemodialysis patient
Yuan et al. The development an artificial intelligence algorithm for early sepsis diagnosis in the intensive care unit
CN104318351A (en) Traditional Chinese medicine health management system and method
US11581083B2 (en) Intra-aortic pressure forecasting
Wei et al. Risk assessment of cardiovascular disease based on SOLSSA-CatBoost model
US20170147773A1 (en) System and method for facilitating health monitoring based on a personalized prediction model
CN116453706B (en) Hemodialysis scheme making method and system based on reinforcement learning
CN115831364A (en) Type 2 diabetes risk layered prediction method based on multi-modal feature fusion
CN110767316A (en) Establishment method of wound blood transfusion prediction model, and method and system for determining blood transfusion volume
CN117079810A (en) Cardiovascular disease unscheduled re-hospitalization risk prediction method
CN115547502A (en) Hemodialysis patient risk prediction device based on time sequence data
Sang et al. Study on survival prediction of patients with heart failure based on support vector machine algorithm
WO2023106960A1 (en) Method for predicting the onset of a medical event in a person's health
CN114783587A (en) Intelligent prediction system for severe acute kidney injury
Dahm et al. Indications for admission to the surgical intensive care unit after radical cystectomy and urinary diversion
Heitz et al. WRSE-a non-parametric weighted-resolution ensemble for predicting individual survival distributions in the ICU
Demchenko et al. The Use of Machine Learning Methods to the Automated Atherosclerosis Diagnostic and Treatment System Development.
TWI795928B (en) System and method for prediction of intradialytic adverse event and computer readable medium thereof
Gu et al. Real-Time Intradialytic Hypotension Prediction Model Using an Improved Equilibrium Optimizer for Feature Selection
Xiao et al. Management and Analysis of Sports Health Level of the Elderly Based on Deep Learning
WO2022202360A1 (en) Information processing device, information processing method, and program
CN118039171A (en) Early prognosis prediction model for acute kidney injury patient and establishment method thereof
Zhao et al. Improving Mortality Risk Prediction Using an LSTM Network Combined With Self-Attention
Boonvisuth Development of an artificial intelligence model for prediction of dry weight in chronic hemodialysis patients and assessment of its accuracy compared to standard bioelectrical impedance analysis
Jung et al. Prediction Model Using Machine Learning for 90-day Mortality of Patients With Sepsis in Intensive Care Unit From MIMIC-IV Dataset

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23791155

Country of ref document: EP

Kind code of ref document: A1