CN116453706B - Hemodialysis scheme making method and system based on reinforcement learning - Google Patents

Hemodialysis scheme making method and system based on reinforcement learning

Info

Publication number
CN116453706B
CN116453706B CN202310701530.8A
Authority
CN
China
Prior art keywords
patient
hemodialysis
data
representing
dialysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310701530.8A
Other languages
Chinese (zh)
Other versions
CN116453706A (en)
Inventor
李劲松
高凯戈
池胜强
陈佳
周天舒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202310701530.8A priority Critical patent/CN116453706B/en
Publication of CN116453706A publication Critical patent/CN116453706A/en
Application granted granted Critical
Publication of CN116453706B publication Critical patent/CN116453706B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • External Artificial Organs (AREA)

Abstract

The application discloses a hemodialysis scheme making method based on reinforcement learning, which comprises the following steps: acquiring a dataset comprising patient historical hemodialysis data and patient clinical data; constructing a neural network based on a noise depth Q network structure, wherein the neural network comprises a feature extraction module, an action generation module and a prediction module containing a reward mechanism; training the neural network by adopting the data set to obtain a hemodialysis scheme making model; the historical case data of the patient is input into a hemodialysis plan making model to output a hemodialysis plan decision comprising the dialysis duration and frequency of the patient, and guidance is provided for a doctor to make a treatment plan. The application also provides a hemodialysis scheme making system. The method provided by the application can combine more actual conditions with the needs of patients to give more reasonable and accurate hemodialysis scheme decisions, thereby providing more specific guidance for doctors to make medical schemes.

Description

Hemodialysis scheme making method and system based on reinforcement learning
Technical Field
The application belongs to the technical field of medical health information, and particularly relates to a hemodialysis scheme making method and system based on reinforcement learning.
Background
Hemodialysis is the most commonly used replacement therapy for uremic patients. According to clinical hemodialysis guidelines, most patients receive dialysis approximately 3 times per week for 4 hours each, with a dialysis blood flow of about 200 ml; if the patient still has good residual kidney function, the dialysis frequency can be reduced to twice per week. If the condition of a renal failure patient improves after dialysis for a period of time, the dialysis frequency and duration can be appropriately shortened. On the one hand, for patients undergoing more dialysis treatments per week, data suggest benefits for, e.g., left ventricular mass, blood pressure, and phosphate control, but little impact on physical and cognitive performance; on the other hand, considering the poor dialysis experience of patients and the inconvenience of traveling to the hospital, the number of dialysis sessions can be reduced while the dialysis duration and blood flow are increased. In real medical scenarios, clinicians typically adjust dialysis protocols based on historical dialysis cases, which requires a large amount of manual effort, while dialysis protocols given according to official guidelines are too generic.
Patent document CN109686446A discloses a hemodialysis treatment plan analysis method and system based on double machine learning. The method comprises the following steps: preprocessing the training samples aggregated from hemodialysis centers; filling the preprocessed training samples by the Hot-Deck method, deleting training samples whose variable missing percentage exceeds alpha, and performing variable filling on training samples whose variable missing percentage does not exceed alpha to obtain filled training samples; screening important features of the filled training samples using a Lasso sparse constraint and a random forest, and combining the two screened sets of samples to obtain the final training samples; and loading the final training samples into a multiple linear regression model and iterating continuously to obtain the final prediction model. This method does not consider the time-series characteristics of hemodialysis medical record data; moreover, the prediction model is only a simple multiple linear regression model, which may fit complex data poorly, cannot make dynamic scheme recommendations for a patient, has no method for extending the dialysis scheme, and thus has weak extensibility.
Patent document CN111028913A discloses a hemodialysis treatment regimen aid decision-making method comprising the following steps: preprocessing hemodialysis history medical record data labeled by professional dialysis physicians to obtain a total sample set; establishing a long short-term memory network based on a self-attention mechanism, and assigning an attention weight α_t to each time-series medical record of the input model so as to compute a global feature vector c of the medical record information; inputting the global feature vector c into a multi-task sharing layer to learn scheme labels; assigning a weight λ_n to the loss of each learning task and taking the weighted sum of the losses as the total loss; and updating the network parameters of the model with the Adam optimization algorithm to obtain the final hemodialysis treatment scheme aid decision model. This scheme cannot compute the model when the number of patient features increases, cannot dynamically recommend the dialysis scheme, does not consider the patient's own willingness, has no method for extending the dialysis scheme, and thus has weak extensibility.
Disclosure of Invention
The application aims to provide a hemodialysis scheme making method and system based on reinforcement learning, which can combine more actual conditions with patient needs and give more reasonable and accurate hemodialysis scheme decisions, so as to provide more specific guidance for doctors to make medical schemes.
In order to achieve the first object, the present application provides a hemodialysis solution formulation method based on reinforcement learning, comprising the steps of:
historical case data of a patient is obtained and screened to obtain a dataset comprising historical hemodialysis data of the patient and clinical data of the patient.
The method comprises the steps of constructing a neural network based on a noise depth Q network structure, wherein the neural network comprises a feature extraction module, an action generation module and a prediction module containing a reward mechanism, the feature extraction module comprises a pre-constructed Quasi-RNN encoder, the Quasi-RNN encoder generates patient state data based on a time sequence according to input patient clinical data, the action generation module generates action space parameters containing dialysis duration and dialysis frequency according to patient historical hemodialysis data, and the prediction module analyzes based on the reward mechanism according to the input patient state data and the action space parameters to obtain a corresponding prediction result.
The neural network is trained using the dataset to obtain a hemodialysis scheme making model for providing patient hemodialysis scheme decisions.
The historical case data of the patient is input into a hemodialysis plan making model to output a hemodialysis plan decision comprising the dialysis duration and frequency of the patient, and guidance is provided for a doctor to make a treatment plan.
The method takes into account the time-series characteristics of hemodialysis medical record data and the benefit to the patient's recent and long-term health states, and adds the patient's own dialysis willingness into the reward function to improve the patient's dialysis experience, so that the final treatment scheme is more humanized.
Specifically, the clinical data of the patient are obtained by taking the identity ID of the patient as a distinguishing mark and adopting a time sequence to arrange the age, the weight, the urea nitrogen information, the parathyroid hormone information, the blood creatinine information, the hemoglobin information, the blood calcium information, the blood phosphorus information, the blood sodium information and the dialysis history length of the patient.
Specifically, the expression of the Quasi-RNN encoder is as follows:
$s_t^i = f_{\mathrm{enc}}(x_t^i)$,
where $s_t^i$ denotes the state output for each patient i during each weekly dialysis treatment time t, $x_t^i$ denotes the patient clinical data of the patient at time t, and $f_{\mathrm{enc}}$ denotes the Quasi-RNN patient state encoder.
Specifically, the action space parameters further comprise custom action parameters, which include operating parameters of the dialysis device or/and patient clinical data.
Specifically, the reward mechanism includes a survival reward value predicted based on the BP neural network, an additional reward value based on the physical sensation of the patient, and a patient willingness reward value.
Specifically, the expression of the reward mechanism is as follows:
$r = r_1 + r_2 + r_3$, with $p = f_{\mathrm{BP}}(s)$,
where $p = f_{\mathrm{BP}}(s)$ denotes the survival rate of the patient predicted by the BP neural network, s denotes the normalized matrix of patient clinical data, $r_1$ denotes the survival reward value, $r_2$ denotes the additional reward value, and $r_3$ denotes the patient willingness reward value, whose score ranges from 0 to 5 points.
Specifically, the neural network is further provided with an experience playback pool, and the experience playback pool is used for storing rewarding values obtained by the interaction of the intelligent agent and the environment.
Specifically, during training, parameters of the neural network are updated based on a loss function formed by time difference errors.
The specific expression of the loss function is as follows:
$L(\theta) = \mathbb{E}\left[w_t \, \delta_t^2\right]$, with $\delta_t = r_t + \gamma \max_{a} Q'(s_{t+1}, a; \theta^-, \epsilon) - Q(s_t, a_t; \theta, \epsilon_t)$,
where $\delta_t$ denotes the time difference error, $s_t$ the state of the patient at time t, $a_t$ the dialysis action at time t, $\epsilon_t$ the noise added at time t, $\theta$ the main network parameters, $\theta^-$ the target network parameters, $w_t$ the weight value of prioritized experience replay, $Q$ the main network of the noise depth Q network, $Q'$ the target network of the noise depth Q network, $\gamma$ the discount coefficient taking a value between 0 and 1, and $r_t$ the reward value at time t.
In order to achieve the second object, the application also provides a hemodialysis scheme making system, which is realized by the hemodialysis scheme making method based on reinforcement learning, and comprises a data acquisition module, a data processing module, a strategy learning module and an auxiliary decision-making module.
The data acquisition module is used for acquiring historical hemodialysis data of a patient and clinical data of the patient.
The data processing module generates corresponding patient states according to the input clinical data of the patient.
The strategy learning module is used for constructing a hemodialysis scheme making model containing willingness rewards of patients.
The auxiliary decision-making module is used for inputting the historical case data of the patient into the hemodialysis scheme making model so as to visually output the hemodialysis scheme decision of the patient and provide guidance for a doctor to make a treatment scheme.
Compared with the prior art, the application has the beneficial effects that:
(1) The application generates the patient state from the time series of the patient's clinical data, so that the final hemodialysis scheme making model fits the data better; meanwhile, the patient willingness value is added to address the weariness the patient may develop toward dialysis and to improve the patient's dialysis experience.
(2) The action space is also extensible, and hemodialysis scheme decisions can be generated in a targeted manner according to the patient's condition.
Drawings
Fig. 1 is a flowchart of a hemodialysis solution making method based on reinforcement learning according to the present embodiment;
fig. 2 is a schematic diagram of a neural network framework based on a noise depth Q network structure according to the present embodiment;
fig. 3 is a schematic structural diagram of a hemodialysis solution making system according to the present embodiment.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
As shown in fig. 1, a hemodialysis solution formulation method based on reinforcement learning includes the following steps:
historical case data of a patient is obtained and screened to obtain a dataset comprising historical hemodialysis data of the patient and clinical data of the patient.
The historical case data includes patient basic information and visit information such as patient age, weight, dialysis duration, urea nitrogen, parathyroid hormone, creatinine, hemoglobin, blood calcium, blood phosphorus, and blood sodium.
The patient clinical data is obtained by taking the identity ID of the patient as a distinguishing mark and adopting a time sequence to arrange the age, the weight, the urea nitrogen information, the parathyroid hormone information, the blood creatinine information, the hemoglobin information, the blood calcium information, the blood phosphorus information, the blood sodium information and the dialysis history length of the patient.
The method comprises the steps of constructing a neural network based on a noise depth Q network structure, wherein the neural network comprises a feature extraction module, an action generation module and a prediction module containing a reward mechanism, the feature extraction module comprises a pre-constructed Quasi-RNN encoder, the Quasi-RNN encoder generates patient state data based on a time sequence according to input patient clinical data, the action generation module generates action space parameters containing dialysis duration and dialysis frequency according to patient historical hemodialysis data, and the prediction module analyzes based on the reward mechanism according to the input patient state data and the action space parameters to obtain a corresponding prediction result.
More specifically, as shown in fig. 2, a schematic diagram of the neural network is provided. Patient parameters are adjusted through repeated experiments to maximize the overall return of predicted rewards, and finally a value function is generated whose input is the state information of the patient and whose output is the value of each action.
Before training, the learning parameters of the neural network based on the noise depth Q network need to be determined: the number of output layer nodes is not less than the number of elements in the output action set, the number of input layer nodes is not less than the number of elements in the input state set S, and the input layer, hidden layer, and output layer weight coefficients of the noise depth Q network are initialized.
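As a minimal illustration (not code from the patent), the sketch below shows how a Q network with such input/output dimensions could be laid out in PyTorch; the hidden layer width is an assumption, and the 128-dimensional state and 20-action sizes are taken from the embodiment described later. In the noisy deep Q network, the plain linear layers would be replaced by the noisy linear layers introduced further below.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Sketch of a Q network: input = patient state vector, output = one Q value per action."""
    def __init__(self, state_dim: int = 128, n_actions: int = 20, hidden: int = 256):
        super().__init__()
        # Input layer nodes >= |S| (state dimension), output layer nodes >= |A| (action count).
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)  # shape: (batch, n_actions)
```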
Next, a reward function is defined; the reward function is the feedback from the environment for a state-action pair. In constructing the agent's reward function, the goal is to maximize patient survival: since the ultimate goal is to extend the dialysis patient's survival time as much as possible, the reward earned by the agent is negative if the patient dies and positive if the patient survives.
First, a BP neural network is trained to predict the survival rate of the patient in the current state over the next year.
The BP network here adopts a three-layer structure: an input layer, a hidden layer, and an output layer. The input is the patient's state in the current situation, such as age, weight, dialysis duration, urea nitrogen, parathyroid hormone, creatinine, hemoglobin, blood calcium, blood phosphorus, and blood sodium. These data are normalized and combined into an input matrix.
The number of neurons in the hidden layer is determined from low to high by adopting a trial and error method, and a Sigmoid function is used as an activation function.
The number of training iterations is set, and the BP neural network is obtained after the error converges.
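A minimal sketch of how such a three-layer BP (fully connected) survival predictor could be written in PyTorch is shown below; the 10-feature input follows the list above, while the hidden width and training loop details are assumptions for illustration, not the patent's exact configuration.

```python
import torch
import torch.nn as nn

class SurvivalBP(nn.Module):
    """Three-layer BP network: 10 normalized clinical features -> 1-year survival probability."""
    def __init__(self, n_features: int = 10, hidden: int = 16):
        super().__init__()
        self.hidden = nn.Linear(n_features, hidden)
        self.out = nn.Linear(hidden, 1)
        self.act = nn.Sigmoid()  # Sigmoid activation as described above

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.out(self.act(self.hidden(x))))  # probability in (0, 1)

def train_bp(model: SurvivalBP, x: torch.Tensor, y: torch.Tensor,
             epochs: int = 200, lr: float = 1e-3) -> SurvivalBP:
    """x: (N, 10) normalized features, y: (N, 1) survival labels in {0, 1}."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.BCELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return model
```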
With the trained BP neural network, the application sets the reward value r_1 to be positive when the predicted survival rate is greater than 50%, and the greater the survival rate, the greater the reward value; when the survival rate is less than 50%, the reward value r_1 is negative, and the smaller the survival rate, the smaller the reward value.
Here r_1 denotes the survival reward, and p denotes the probability, predicted by the BP neural network from the patient's state, that the patient survives the next year.
For the additional reward r_2: if the patient has no discomfort symptoms in the current state, r_2 takes its highest value; if mild discomfort symptoms occur, r_2 takes a lower value; if severe dialysis side effects occur, r_2 takes its lowest value.
A patient willingness reward r_3 is also introduced: patients score themselves according to their subjective feelings, for example the degree of dialysis pain and the convenience of getting to the hospital, with a scoring range of 0 to 5 points.
The final total reward function r is the sum of the survival reward value, the additional reward value, and the patient willingness reward value: $r = r_1 + r_2 + r_3$.
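The following sketch illustrates one way this composite reward could be computed in code; the concrete numeric values for the survival and symptom rewards are assumptions (the patent only specifies their signs and ordering), and `survival_model` refers to the BP predictor sketched above.

```python
import torch

def compute_reward(survival_model, state: torch.Tensor,
                   symptom_level: int, willingness_score: float) -> float:
    """Composite reward r = r1 + r2 + r3 (illustrative values only).

    state: normalized clinical feature vector for the patient.
    symptom_level: 0 = no discomfort, 1 = mild discomfort, 2 = severe side effects.
    willingness_score: patient's self-reported score in [0, 5].
    """
    with torch.no_grad():
        p = survival_model(state.unsqueeze(0)).item()  # predicted 1-year survival rate

    r1 = p - 0.5                                  # positive above 50% survival, negative below (assumed scaling)
    r2 = {0: 1.0, 1: 0.0, 2: -1.0}[symptom_level] # assumed symptom-based values
    r3 = willingness_score                        # patient willingness reward, 0 to 5 points
    return r1 + r2 + r3
```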
Next, the behavior policy $\pi$ is defined.
The state value function is defined as
$V^{\pi}(s_t) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \mid s_t\right]$,
where $a_t = \pi(s_t)$ indicates that the action $a_t$ is decided at time t according to the policy $\pi$.
The action value function is defined as
$Q^{\pi}(s_t, a_t) = \mathbb{E}_{\pi}\left[\sum_{k=0}^{\infty} \gamma^{k} r_{t+k} \mid s_t, a_t\right]$.
The behavior strategy of the hemodialysis protocol formulation network is then:
$\pi(s_t) = \arg\max_{a \in A} Q(s_t, a)$.
Meanwhile, an experience replay pool is also constructed; the experience pool stores the reward values obtained after the agent interacts with the environment. Using the replay buffer can reduce the number of interactions with the environment, improve the sample utilization rate, and improve the stability of the noise depth Q network.
The application assigns a priority to each experience in the experience pool according to the difference between the Q values of different states, which is called the time difference error (TD-error). The sample at time t is recorded as the tuple $(s_t, a_t, r_t, s_{t+1})$: by observing the state $s_t$ at the current moment, an action $a_t$ is selected from the output action set by the Q network; after the action is executed in the environment, the reward value $r_t$ and the next state $s_{t+1}$ are obtained, and the new tuple $(s_t, a_t, r_t, s_{t+1})$ is put into the experience pool.
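A compact sketch of a TD-error-prioritized replay pool is given below; it is a generic implementation of the idea described here (priorities proportional to |TD-error|, with importance-sampling weights), not the patent's exact data structure, and the alpha/beta exponents are assumed values.

```python
from collections import namedtuple

import numpy as np

Transition = namedtuple("Transition", "state action reward next_state")

class PrioritizedReplayPool:
    """Experience pool with priorities based on |TD-error| (proportional variant)."""
    def __init__(self, capacity: int = 10000, alpha: float = 0.6, beta: float = 0.4):
        self.capacity, self.alpha, self.beta = capacity, alpha, beta
        self.data, self.priorities = [], []

    def push(self, transition: Transition, td_error: float = 1.0):
        if len(self.data) >= self.capacity:  # drop the oldest experience
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + 1e-6) ** self.alpha)

    def sample(self, batch_size: int):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), size=batch_size, p=probs)
        # Importance-sampling weights w_t used in the loss function below.
        weights = (len(self.data) * probs[idx]) ** (-self.beta)
        weights = weights / weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        for i, err in zip(idx, td_errors):
            self.priorities[i] = (abs(float(err)) + 1e-6) ** self.alpha
```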
The loss function used by the neural network in the training process consists of a time difference error that reflects the difference between the current Q value and the target Q value. At the beginning of each training round, Gaussian noise is added to each parameter of the Q network, changing the current Q value to Q':
For example, the original linear layer is
$y = wx + b$.
After adding noise, this linear layer becomes
$y = (\mu^{w} + \sigma^{w} \odot \epsilon^{w})x + (\mu^{b} + \sigma^{b} \odot \epsilon^{b})$,
where $\epsilon^{w}$ and $\epsilon^{b}$ are noise with mean value 0, and $\mu^{w}$, $\sigma^{w}$, $\mu^{b}$, $\sigma^{b}$ are all learnable parameters.
In particular, for each neuron, the noise of the weight $\epsilon^{w}_{i,j}$ is
$\epsilon^{w}_{i,j} = f(\epsilon_{i})f(\epsilon_{j})$,
and the noise of the bias $\epsilon^{b}_{j}$ is
$\epsilon^{b}_{j} = f(\epsilon_{j})$,
where $f$ can take $f(x) = \operatorname{sgn}(x)\sqrt{|x|}$.
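Below is a sketch of a factorized noisy linear layer implementing the formulas above in PyTorch; the initialization constants follow the common NoisyNet convention and are assumptions rather than values stated in the patent.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    """Linear layer with factorized Gaussian parameter noise (sketch)."""
    def __init__(self, in_features: int, out_features: int, sigma0: float = 0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features))
        self.sigma_w = nn.Parameter(torch.empty(out_features, in_features))
        self.mu_b = nn.Parameter(torch.empty(out_features))
        self.sigma_b = nn.Parameter(torch.empty(out_features))
        self.register_buffer("eps_w", torch.zeros(out_features, in_features))
        self.register_buffer("eps_b", torch.zeros(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.mu_w, -bound, bound)
        nn.init.uniform_(self.mu_b, -bound, bound)
        nn.init.constant_(self.sigma_w, sigma0 * bound)
        nn.init.constant_(self.sigma_b, sigma0 * bound)
        self.reset_noise()

    @staticmethod
    def _f(x: torch.Tensor) -> torch.Tensor:
        return x.sign() * x.abs().sqrt()  # f(x) = sgn(x) * sqrt(|x|)

    def reset_noise(self):
        """Resample factorized noise: eps_w[i, j] = f(eps_i) f(eps_j), eps_b[j] = f(eps_j)."""
        eps_in = self._f(torch.randn(self.in_features))
        eps_out = self._f(torch.randn(self.out_features))
        self.eps_w.copy_(torch.outer(eps_out, eps_in))
        self.eps_b.copy_(eps_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # y = (mu_w + sigma_w ⊙ eps_w) x + (mu_b + sigma_b ⊙ eps_b)
        w = self.mu_w + self.sigma_w * self.eps_w
        b = self.mu_b + self.sigma_b * self.eps_b
        return F.linear(x, w, b)
```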
A batch of samples is randomly sampled from the experience pool, and the loss function is expressed as
$L(\theta) = \mathbb{E}\left[w_t \, \delta_t^2\right]$, with $\delta_t = r_t + \gamma \max_{a} Q'(s_{t+1}, a; \theta^-, \epsilon) - Q(s_t, a_t; \theta, \epsilon_t)$,
where $\delta_t$ denotes the time difference error, $s_t$ the state of the patient at time t, $a_t$ the dialysis action at time t, $\epsilon_t$ the noise added at time t, $\theta$ the main network parameters, $\theta^-$ the target network parameters, $w_t$ the weight value of prioritized experience replay, $Q$ the main network of the noise depth Q network, $Q'$ the target network of the noise depth Q network, $\gamma$ the discount coefficient taking a value between 0 and 1, and $r_t$ the reward value at time t. The strategy obtained by minimizing the loss function is the optimal strategy.
Through the design of the state, action, and noise depth Q network above, the hemodialysis scheme making network can learn a value function Q network that maps different states and actions to different Q values, so that dialysis duration and frequency can be adjusted for different patients at different times according to this mapping, finally forming the dialysis duration decision recommended by the agent. The training process can be repeated every time new patient data is added, so that the obtained value function Q network keeps learning and dynamic recommendation is achieved.
The input of the Quasi-RNN encoder is the patient clinical data, and the output is the patient state data set S in a Markov decision model.
The design process of the Quasi-RNN encoder is as follows:
the hemodialysis history data, which is marked by a professional dialysis physician, is first preprocessed to obtain a total sample set. Sensitive fields in the hemodialysis history data that relate to the personal privacy of the patient, such as the patient's name, phone, home address, etc., need to be erased; characteristic data of the input model including age, weight, dialysis Shi Chang, urea nitrogen, parathyroid hormone, creatinine, hemoglobin, calcium, phosphorus, sodium and blood of the patient, and a set of states of the model output are then determined.
A model based on the Quasi-RNN is then established. The historical case data of dialysis patients is a time series, and older medical records have a smaller influence on determining the current state, while more recent records have a larger influence. The 10 features described above are first processed: with the patient ID as the unique distinguishing identifier, the patient data is ordered in time to form serialized data, feature missing values are replaced with 0, and the seven days of patient clinical data per week are treated as one time step; a convolution window of size 2 is then used in the calculation, i.e., looking at the inputs of the previous two time steps. The data is fed into a single-layer 128-dimensional patient state auto-encoder to learn how to represent the patient state; in this way it internally learns the best lower-dimensional representation of the input. In the Quasi-RNN convolution layer, no iteration is needed in the calculation: all computations are batched into matrix multiplications, which greatly reduces the amount of computation in the recurrent process. Finally, the loss function between the original input and the decoded output is minimized, resulting in a trained patient state auto-encoder.
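The sketch below shows one way a Quasi-RNN-style encoder with a width-2 causal convolution could look in PyTorch; the layer sizes mirror the 128-dimensional state described here, but the exact gating and decoder used in the patent are not specified, so the fo-pooling recurrence is an assumption for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QuasiRNNEncoder(nn.Module):
    """Quasi-RNN encoder: width-2 causal convolution + fo-pooling over weekly time steps."""
    def __init__(self, n_features: int = 10, state_dim: int = 128):
        super().__init__()
        # One convolution produces candidate (z), forget (f) and output (o) gates in a single pass.
        self.conv = nn.Conv1d(n_features, 3 * state_dim, kernel_size=2)
        self.state_dim = state_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """x: (batch, time_steps, n_features) -> states (batch, time_steps, state_dim)."""
        x = x.transpose(1, 2)                 # (batch, n_features, time)
        x = F.pad(x, (1, 0))                  # left-pad so the window sees the previous 2 steps
        z, f, o = self.conv(x).chunk(3, dim=1)
        z, f, o = torch.tanh(z), torch.sigmoid(f), torch.sigmoid(o)
        # fo-pooling: the recurrence is element-wise, so the heavy work stays in the convolution.
        c = torch.zeros(x.size(0), self.state_dim, device=x.device)
        states = []
        for t in range(z.size(2)):
            c = f[:, :, t] * c + (1 - f[:, :, t]) * z[:, :, t]
            states.append(o[:, :, t] * c)
        return torch.stack(states, dim=1)     # s_t^i for each patient and time step
```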
The patient's acquired clinical observations are cyclically encoded using the trained patient state auto-encoder, and a state is output for each patient i during each weekly dialysis treatment time t:
$s_t^i = f_{\mathrm{enc}}(x_t^i)$,
where $s_t^i$ denotes the state output for each patient i during each weekly dialysis treatment time t, $x_t^i$ denotes the features of the patient at time t, and $f_{\mathrm{enc}}$ denotes the patient state auto-encoder trained from the patient data.
Construction of an action space in which the action space parameters are located:
the recommended dialysis duration and frequency of the patient in the stable period are about 3 times per week for 4 hours, and as the patient progresses, the patient may have better disease condition on one hand, and the frequency and duration of the dialysis can be reduced; on the other hand, the patient may be unwilling to dialyze, and go to the hospital, so the frequency can be reduced, the dialysis duration can be increased, and various actual conditions can be combined, so the change value between the dialysis frequency and the dialysis duration is set as the value of the action space, and discretization is performed.
The discretization process limits the adjustment ranges of the dialysis frequency and duration to certain intervals and divides each adjustment interval into different adjustment actions. Based on clinician experience and feedback, the action space construction for dialysis frequency and duration can be summarized as shown in Table 1 below:
TABLE 1
Action dimension | Candidate adjustment values
Dialysis frequency change (times per week) | -3, -2, -1, 0, 1
Dialysis duration change (hours) | 0, 1, 2, 3
In this embodiment, the action space consists of two-tuples of dialysis frequency change and dialysis duration change, where the dialysis frequency change is one of [-3, -2, -1, 0, 1] and the dialysis duration change is one of [0, 1, 2, 3], giving 20 action two-tuples in total: [-3, 0], [-3, 1], …, [1, 3]. For example, if in the initial state the patient receives dialysis 3 times per week for 4 hours each time, then after the agent takes the action [-1, -1], the patient receives dialysis 2 times per week for 3 hours each time.
Only the frequency and duration aspects of the dialysis scheme are illustrated here; if other dimensions are added, such as the dialyzer blood flow at each dialysis, the action space becomes a triple of [Δfrequency, Δduration, Δblood flow], which can be flexibly changed according to the needs of doctors, as in the sketch below.
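As an illustration of this extensible construction (an assumption-level sketch, not the patent's code), the discrete action set can be generated as a Cartesian product of per-dimension adjustment lists:

```python
from itertools import product

# Adjustment values per action dimension (Table 1); extra dimensions can be appended freely.
FREQ_CHANGES = [-3, -2, -1, 0, 1]      # change in dialysis sessions per week
DURATION_CHANGES = [0, 1, 2, 3]        # change in hours per session

def build_action_space(*dimensions):
    """Return the discrete action set as tuples, e.g. 5 x 4 = 20 two-tuples."""
    return list(product(*dimensions))

actions = build_action_space(FREQ_CHANGES, DURATION_CHANGES)
assert len(actions) == 20              # [-3, 0], [-3, 1], ..., [1, 3]

# Extending with a hypothetical blood-flow dimension turns actions into triples.
BLOOD_FLOW_CHANGES = [-50, 0, 50]      # assumed example values
extended = build_action_space(FREQ_CHANGES, DURATION_CHANGES, BLOOD_FLOW_CHANGES)
```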
In this embodiment, after the state $s_t^i$ of patient i at time t is obtained, the output action set A is obtained according to the formulation of the action space above, where the action set A contains 20 actions in total, each action is a two-tuple, and the action of the patient at time t is recorded as $a_t$.
As shown in fig. 3, a hemodialysis solution making system provided for this example is implemented based on the hemodialysis solution making method provided in the foregoing embodiment, and includes:
the data acquisition module is used for acquiring historical hemodialysis data of a patient and clinical data of the patient.
The data processing module generates corresponding patient states according to the input clinical data of the patient.
The strategy learning module is used for constructing a hemodialysis scheme making model containing willingness rewards of patients.
The auxiliary decision-making module is used for inputting the historical case data of the patient into the hemodialysis scheme making model so as to visually output the patient's hemodialysis scheme decision and provide guidance for a doctor to make a treatment scheme. Specifically, the reinforcement learning agent recommends optimal dialysis frequency and duration adjustment values for the patient according to the patient's different dialysis treatment course states. The physician can set an evaluation threshold (e.g., at most 1 session per week or less than 2 hours per dialysis session): adjustments below the threshold are evaluated directly by a nurse and optionally performed, while adjustments above the threshold are evaluated by the physician and optionally performed, thus assisting the physician's dialysis regimen adjustment decisions, as sketched below. The system records the agent's recommended value, whether the physician accepts the agent's advice, and the dialysis regimen adjustment actually performed by the physician; it periodically evaluates the patient's dialysis adequacy and gives the patient's predicted actual survival rate and predicted return, while recording the decision maker's actual choice, thereby further adjusting the model's subsequent reward function. Visualization charts are fed back to doctors and algorithm engineers for subsequent updating and optimization of the model.
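A small sketch of the threshold-based routing described above is shown here; the threshold values and the recommendation format are assumptions for illustration only.

```python
def route_recommendation(delta_freq: int, delta_duration: int,
                         freq_threshold: int = 1, duration_threshold: int = 2) -> str:
    """Decide who reviews a recommended adjustment (nurse vs. physician)."""
    small_change = abs(delta_freq) <= freq_threshold and abs(delta_duration) < duration_threshold
    return "nurse_review" if small_change else "physician_review"

# Example: a [-1, -1] adjustment falls under the threshold and goes to the nurse.
print(route_recommendation(-1, -1))   # -> nurse_review
print(route_recommendation(-3, 3))    # -> physician_review
```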
To better illustrate the effect of the present application, tests were performed based on patient data from a particular hospital.
Medical records of 50,000 dialysis sessions were extracted from the hospital's electronic medical record database, including the patient's age, weight, dialysis duration, urea nitrogen, parathyroid hormone, creatinine, hemoglobin, blood calcium, blood phosphorus, blood sodium, and willingness values. The resulting data is divided into three data sets: a training set (60%), a validation set (20%), and a test set (10%). The patient state encoder has 1 hidden layer with 128 hidden units. A state representation for each patient was generated by the Quasi-RNN neural network; the state representation consists of a 128-dimensional vector.
Before the obtained patient states are input into the hemodialysis scheme making network, the BP neural network is trained to determine the reward value. The BP neural network has 1 hidden layer, the number of hidden layer neurons is determined from low to high by a trial-and-error method, and a Sigmoid function is used as the activation function. The survival reward value is set according to the trained BP neural network:
additional introduction of additional prize values representing current patient physical sensations
Namely:
after setting the reward function, the action function is set, and the action space construction of the dialysis frequency and duration can be summarized as follows in table 2:
TABLE 2
The state set, the reward function, and the action function are then substituted into the Noisy-DQN network. At each time step t, the state $s_t$ and the corresponding action $a_t$ are given according to the established Q function, the reward value $r_t$ is obtained, the resulting tuple is put into the experience pool, and the state is updated. These steps are repeated until the experience pool is full; sampling from the experience pool then begins, and Gaussian noise is added to the Q function at the beginning of each round. The loop continues until the loss function is minimized.
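For illustration, the following sketch ties the pieces above together into one Noisy-DQN update step; the network and replay pool classes are the sketches from earlier sections, actions are assumed to be stored as indices into the 20-element action set, and the hyperparameters (batch size, discount factor) are assumptions rather than values given in the patent.

```python
import torch

def train_step(q_net, target_net, pool, optimizer, batch_size: int = 32, gamma: float = 0.99):
    """One Noisy-DQN update: sample a prioritized batch, compute TD errors, minimize the weighted loss."""
    # In a noisy DQN, the noise in each NoisyLinear layer would be resampled here (reset_noise()).
    batch, idx, weights = pool.sample(batch_size)
    states = torch.stack([b.state for b in batch])
    actions = torch.tensor([b.action for b in batch])
    rewards = torch.tensor([b.reward for b in batch], dtype=torch.float32)
    next_states = torch.stack([b.next_state for b in batch])

    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = rewards + gamma * target_net(next_states).max(dim=1).values
    td_error = target - q_values                      # delta_t in the loss above

    loss = (torch.as_tensor(weights, dtype=torch.float32) * td_error ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    pool.update_priorities(idx, td_error.detach().abs().tolist())
    return loss.item()
```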
The trained Q function is the optimal strategy; the strategy of the application is visually output to the doctor for reference in adjusting the patient's hemodialysis frequency and duration.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof.

Claims (6)

1. A hemodialysis regimen making method based on reinforcement learning, comprising the steps of:
acquiring historical case data of a patient, and screening the historical case data to obtain a dataset comprising historical hemodialysis data of the patient and clinical data of the patient;
the method comprises the steps of constructing a neural network based on a noise depth Q network structure, wherein the neural network comprises a feature extraction module, an action generation module and a prediction module containing a reward mechanism, and the feature extraction module comprises a pre-constructed Quasi-RNN encoder, which is expressed as follows: $s_t^i = f_{\mathrm{enc}}(x_t^i)$, wherein $s_t^i$ denotes the state output for each patient i during each weekly dialysis treatment time t, and $x_t^i$ denotes the patient clinical data of the patient at time t; the Quasi-RNN encoder generates time-series-based patient state data from the input patient clinical data, the action generation module generates action space parameters including dialysis duration, dialysis frequency, and operating parameters of the dialysis equipment or/and patient clinical data from the patient historical hemodialysis data, and the prediction module analyzes the input patient state data and the action space parameters based on the reward mechanism to obtain corresponding prediction results;
the reward mechanism comprises a survival reward value predicted based on the BP neural network, an additional reward value based on the physical sensation of the patient, and a patient willingness reward value, with the specific expression: $r = r_1 + r_2 + r_3$, with $p = f_{\mathrm{BP}}(s)$,
where $p = f_{\mathrm{BP}}(s)$ denotes the survival rate of the patient predicted by the BP neural network, s denotes the normalized matrix of patient clinical data, $r_1$ denotes the survival reward value, $r_2$ denotes the additional reward value, and $r_3$ denotes the patient willingness reward value, whose score ranges from 0 to 5 points;
training the neural network using the dataset to obtain a hemodialysis scheme making model for providing patient hemodialysis scheme decisions;
the historical case data of the patient is input into a hemodialysis plan making model to output a hemodialysis plan decision comprising the dialysis duration and frequency of the patient, and guidance is provided for a doctor to make a treatment plan.
2. The reinforcement learning-based hemodialysis scheme making method of claim 1, wherein the patient clinical data is obtained by arranging the patient's age, weight, urea nitrogen information, parathyroid hormone information, blood creatinine information, hemoglobin information, blood calcium information, blood phosphorus information, blood sodium information, and dialysis history length in a time series, using the patient's identity ID as a distinguishing mark.
3. The reinforcement learning based hemodialysis solution formulation method of claim 1, wherein the neural network is further provided with an experience replay pool for storing rewards values obtained by the agent and environment interactions.
4. The reinforcement learning-based hemodialysis solution formulation method of claim 1, wherein the neural network is updated with parameters based on a loss function composed of time difference errors during training.
5. The reinforcement learning-based hemodialysis scheme making method of claim 4, wherein the expression of the loss function is as follows: $L(\theta) = \mathbb{E}\left[w_t \, \delta_t^2\right]$, with $\delta_t = r_t + \gamma \max_{a} Q'(s_{t+1}, a; \theta^-, \epsilon) - Q(s_t, a_t; \theta, \epsilon_t)$, where $\delta_t$ denotes the time difference error, $s_t$ the state of the patient at time t, $a_t$ the dialysis action at time t, $\epsilon_t$ the noise added at time t, $\theta$ the main network parameters, $\theta^-$ the target network parameters, $w_t$ the weight value of prioritized experience replay, $Q$ the main network of the noise depth Q network, $Q'$ the target network of the noise depth Q network, $\gamma$ the discount coefficient taking a value between 0 and 1, and $r_t$ the reward value at time t.
6. A hemodialysis solution making system, which is characterized by being realized by the reinforcement learning-based hemodialysis solution making method according to any one of claims 1-5, and comprising a data acquisition module, a data processing module, a strategy learning module and an auxiliary decision-making module;
the data acquisition module is used for acquiring historical hemodialysis data of a patient and clinical data of the patient;
the data processing module generates corresponding patient states according to the input clinical data of the patient;
the strategy learning module is used for constructing a hemodialysis scheme making model containing willingness rewards of patients;
the auxiliary decision-making module is used for inputting the historical case data of the patient into the hemodialysis scheme making model so as to visually output the hemodialysis scheme decision of the patient and provide guidance for a doctor to make a treatment scheme.
CN202310701530.8A 2023-06-14 2023-06-14 Hemodialysis scheme making method and system based on reinforcement learning Active CN116453706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310701530.8A CN116453706B (en) 2023-06-14 2023-06-14 Hemodialysis scheme making method and system based on reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310701530.8A CN116453706B (en) 2023-06-14 2023-06-14 Hemodialysis scheme making method and system based on reinforcement learning

Publications (2)

Publication Number Publication Date
CN116453706A CN116453706A (en) 2023-07-18
CN116453706B true CN116453706B (en) 2023-09-08

Family

ID=87132384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310701530.8A Active CN116453706B (en) 2023-06-14 2023-06-14 Hemodialysis scheme making method and system based on reinforcement learning

Country Status (1)

Country Link
CN (1) CN116453706B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117012374B (en) * 2023-10-07 2024-01-26 之江实验室 Medical follow-up system and method integrating event map and deep reinforcement learning
CN117373585A (en) * 2023-10-31 2024-01-09 华脉汇百通信息技术(北京)有限公司 Construction method of hemodialysis model based on artificial intelligence
CN117275661B (en) * 2023-11-23 2024-02-09 太原理工大学 Deep reinforcement learning-based lung cancer patient medication prediction method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110473629A (en) * 2019-08-22 2019-11-19 南京航空航天大学 A kind of kidney dialysis treatment proposal recommending method and system
CN111028913A (en) * 2019-11-29 2020-04-17 北京工业大学 Hemodialysis treatment scheme aid decision-making method
EP3671753A1 (en) * 2018-12-19 2020-06-24 Fresenius Medical Care Deutschland GmbH Dialysis treatment modalities; method and devices
CN114496235A (en) * 2022-04-18 2022-05-13 浙江大学 Hemodialysis patient dry weight auxiliary adjusting system based on deep reinforcement learning
CN115050451A (en) * 2022-08-17 2022-09-13 合肥工业大学 Automatic generation system for clinical sepsis medication scheme

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6926203B2 (en) * 2016-11-04 2021-08-25 ディープマインド テクノロジーズ リミテッド Reinforcement learning with auxiliary tasks
US20220059221A1 (en) * 2020-08-24 2022-02-24 Nvidia Corporation Machine-learning techniques for oxygen therapy prediction using medical imaging data and clinical metadata

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3671753A1 (en) * 2018-12-19 2020-06-24 Fresenius Medical Care Deutschland GmbH Dialysis treatment modalities; method and devices
CN110473629A (en) * 2019-08-22 2019-11-19 南京航空航天大学 A kind of kidney dialysis treatment proposal recommending method and system
CN111028913A (en) * 2019-11-29 2020-04-17 北京工业大学 Hemodialysis treatment scheme aid decision-making method
CN114496235A (en) * 2022-04-18 2022-05-13 浙江大学 Hemodialysis patient dry weight auxiliary adjusting system based on deep reinforcement learning
CN115050451A (en) * 2022-08-17 2022-09-13 合肥工业大学 Automatic generation system for clinical sepsis medication scheme

Also Published As

Publication number Publication date
CN116453706A (en) 2023-07-18

Similar Documents

Publication Publication Date Title
CN116453706B (en) Hemodialysis scheme making method and system based on reinforcement learning
US9370689B2 (en) System and methods for providing dynamic integrated wellness assessment
CN108932974B (en) Method, device, computer equipment and storage medium for allocating doctors for online inquiry
Zhu et al. Enhancing self-management in type 1 diabetes with wearables and deep learning
CN111028913A (en) Hemodialysis treatment scheme aid decision-making method
CN115881306B (en) Networked ICU intelligent medical decision-making method based on federal learning and storage medium
JP6962854B2 (en) Water prescription system and water prescription program
EP3796226A1 (en) Data conversion/symptom scoring
CN111785366A (en) Method and device for determining patient treatment scheme and computer equipment
CN117275661B (en) Deep reinforcement learning-based lung cancer patient medication prediction method and device
CN115985513B (en) Data processing method, device and equipment based on multiple groups of chemical cancer typing
CN115423054B (en) Uncertain training and exciting method and system based on personality characteristics of cognitive disorder patient
CN115253009B (en) Sleep multidimensional intervention method and system
CN113940640B (en) Cardiovascular disease risk control method, system and storage medium
CN112329921B (en) Diuretic dose reasoning equipment based on deep characterization learning and reinforcement learning
CN115862893A (en) Deep learning method, system, equipment and medium applied to medicine recommendation
CN115579153A (en) Inquiry evaluation method, inquiry evaluation device, electronic device, and readable storage medium
Lin et al. Using Bayesian networks for discovering temporal-state transition patterns in hemodialysis
WO2022216618A1 (en) Medical event prediction using a personalized dual-channel combiner network
CN113066531A (en) Risk prediction method and device, computer equipment and storage medium
Kovalchuk et al. A Self-Organizing System for Online Maintenance of a Living Organism
Momo et al. Length of stay prediction for hospital management using domain adaptation
Adawiyah et al. Hospital Length of Stay Prediction based on Patient Examination Using General features
CN113658658A (en) Medication strategy determination method and device, computer equipment and storage medium
Rabbi AN INTELLIGENT SYSTEM APPROACH FOR PREDICTING THE RISK OF HEART FAILURE

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant