WO2021151295A1

WO2021151295A1 - Method, apparatus, computer device, and medium for determining patient treatment plan

Info

Publication number: WO2021151295A1
Application number: PCT/CN2020/118873
Authority: WO
Inventors: 徐卓扬; 赵惟; 左磊; 孙行智; 胡岗
Original assignee: 平安科技（深圳）有限公司
Priority date: 2020-06-29
Filing date: 2020-09-29
Publication date: 2021-08-05
Also published as: CN111785366B; CN111785366A

Abstract

Provided are a method, apparatus, and computer device for determining a patient treatment plan, which address the problem of insufficiently accurate results when a patient treatment plan is generated online. The method comprises: creating, on the basis of a deep Q-network (DQN), a patient grouping model for processing time series data (101); training the patient grouping model with sample data labeled with grouping results until the model satisfies a preset training standard (102); inputting target patient data from a preset time period into the patient grouping model that satisfies the preset training standard to obtain the target group to which the target patient belongs (103); determining a first treatment plan for the target patient on the basis of the population features of the target group (104); extracting the contraindicated drugs of the target patient from the target patient data, and screening out, from the first treatment plan, a second treatment plan containing the contraindicated drugs (105); and obtaining a target treatment plan for the target patient by analyzing the first treatment plan and the second treatment plan (106).

Description

Method, device, computer equipment and medium for determining patient treatment plan

This application claims priority to Chinese patent application No. CN202010602269.2, filed with the Chinese Patent Office on June 29, 2020 and entitled "Methods, devices and computer equipment for determining patient treatment plans", the entire contents of which are incorporated herein by reference.

Technical field

This application relates to the field of digital medicine, and in particular to a method, device, computer equipment, and medium for determining a patient's treatment plan.

Background art

Deep reinforcement learning is a machine learning method. It learns a mapping from environment states to actions: the optimal policy is selected according to the maximum feedback value, the optimal action is searched for under that policy, and the resulting change of state yields a delayed feedback value that is used to update the evaluation function. This loop iterates until the learning condition is met, at which point learning terminates.

With the development of science and technology, deep reinforcement learning has gradually been applied in many fields, and some existing work already uses it for patient diagnosis. The inventors realized that existing methods of applying deep reinforcement learning to patient diagnosis often have the following shortcomings: 1. In a patient diagnosis scenario, it matters which features a diagnostic decision pays most attention to and how much each feature contributes to the outcome, yet current models are difficult to interpret, so this information is not transparent. 2. Current models often take only a single follow-up record of the patient as input, but a single follow-up cannot fully represent the patient's long-term condition, which leads to inaccurate analysis results.

Summary of the invention

According to one aspect of the present application, there is provided a method for determining a patient's treatment plan, the method including:

Creating a patient grouping model for processing time series data based on the deep reinforcement learning DQN;

Training the patient grouping model by using sample data labeled with grouping results, so that the patient grouping model meets a preset training standard;

Inputting target patient data within a preset time period into the patient grouping model that meets the preset training standard, to obtain the target group to which the target patient belongs;

Determining a first treatment plan of the target patient based on the population features of the target group;

Extracting the contraindicated drugs of the target patient according to the target patient data, and screening out, from the first treatment plan, a second treatment plan containing the contraindicated drugs;

Analyzing the first treatment plan and the second treatment plan to obtain the target treatment plan of the target patient.

According to another aspect of the present application, there is provided a device for determining a patient's treatment plan, the device comprising:

A creation module, configured to create a patient grouping model for processing time series data based on the deep reinforcement learning DQN;

A training module, configured to train the patient grouping model by using sample data labeled with grouping results, so that the patient grouping model meets a preset training standard;

An input module, configured to input target patient data within a preset time period into the patient grouping model that meets the preset training standard, to obtain the target group to which the target patient belongs;

A determining module, configured to determine a first treatment plan of the target patient based on the population features of the target group;

An extraction module, configured to extract the contraindicated drugs of the target patient according to the target patient data, and to screen out, from the first treatment plan, a second treatment plan containing the contraindicated drugs;

An analysis module, configured to analyze the first treatment plan and the second treatment plan to obtain the target treatment plan of the target patient.

According to another aspect of the present application, there is provided a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the above method for determining a patient's treatment plan.

According to another aspect of the present application, there is provided a computer device, including a storage medium, a processor, and a computer program stored on the storage medium and executable on the processor, wherein the processor, when executing the program, implements the steps of the above method for determining a patient's treatment plan.

Description of the drawings

The drawings described here are provided for a further understanding of the application and constitute a part of it. The exemplary embodiments of the application and their descriptions are used to explain the application and do not constitute an improper limitation of it. In the drawings:

FIG. 1 shows a schematic flowchart of a method for determining a patient's treatment plan provided by an embodiment of the present application;

FIG. 2 shows a schematic flowchart of another method for determining a patient's treatment plan provided by an embodiment of the present application;

FIG. 3 shows a network structure diagram of a patient grouping model provided by an embodiment of the present application;

FIG. 4 shows a schematic structural diagram of a device for determining a patient's treatment plan provided by an embodiment of the present application;

FIG. 5 shows a schematic structural diagram of another device for determining a patient's treatment plan provided by an embodiment of the present application.

Detailed description

Hereinafter, the present application will be described in detail with reference to the drawings and in conjunction with the embodiments. It should be noted that the embodiments in the application and the features in the embodiments can be combined with each other if there is no conflict.

To address the weak interpretability of feature contributions and the insufficient accuracy of analysis results when deep reinforcement learning is applied to patient diagnosis, an embodiment of the present application provides a method for determining a patient's treatment plan. As shown in FIG. 1, the method includes:

101. Create a patient grouping model for processing time series data based on the deep reinforcement learning DQN.

The purpose of this embodiment is to improve the traditional deep reinforcement learning DQN model: the model is extended into a time series model and an Attention mechanism is added. The improved DQN model is then used to assign patients to groups, so that it can process time series data and make the patient features interpretable.

102. Train the patient grouping model by using the sample data labeled with the grouping results, so that the patient grouping model meets the preset training standard.

In a specific application scenario, grouping decision rules can be set in advance and used to determine the group to which each piece of sample data belongs. The grouping result is then attached to the corresponding sample data as a label and serves as the reference for verification: the output of the patient grouping model for the sample data is compared against the label to determine the training status of the model. If the error between the model output and the labeled result is small, the patient grouping model can be determined to meet the preset training standard.

103. Input the target patient data in the preset time period into a patient grouping model that meets the preset training standard, and obtain the target group to which the target patient belongs.

The preset time period can be set according to actual application requirements. For example, it can be set to the month up to and including the current time, in which case the corresponding target patient data are the one or more follow-up records about the target patient recorded within that period.

In a specific application scenario, when only a single follow-up record of the patient is used as input, that record cannot fully represent the patient's long-term condition, which can easily lead to inaccurate analysis results. Therefore, in this embodiment, in addition to the patient follow-up data at the current moment, all historical patient follow-up data within the preset time period can also be used as input, and the outputs for the individual follow-up records are integrated to determine a final, relatively accurate target grouping result. In addition, the Attention mechanism can be used to explain the contribution, attention coefficient, contribution ratio, and so on of each feature at each time point to the grouping result.

104. Determine a first treatment plan for the target patient based on the characteristics of the population in the target group.

In a specific application scenario, after the target patient has been assigned to a group, patients in that group whose population features are highly similar to those of the target patient can be identified from the population information of the group, and the first treatment plan available to the target patient can then be screened out from the data generated for those patients.

105. Extract the contraindicated drugs of the target patient based on the target patient's data, and screen out the second treatment plan containing the contraindicated drugs from the first treatment plan.

In a specific application scenario, because different patients may have different contraindicated drugs, the contraindicated drugs of the target patient should be extracted first. The first treatment plans that contain those contraindicated drugs are then screened out as the second treatment plan, so that the second treatment plan is not considered when the treatment plan recommendation is finally generated.

106. According to the first treatment plan and the second treatment plan, analyze and obtain the target treatment plan of the target patient.

In a specific application scenario, after the first treatment plan and the second treatment plan are determined, the second treatment plan is excluded from the first treatment plan, and what remains of the first treatment plan is determined as the target treatment plan of the target patient. Because this embodiment takes drug contraindications into account, it helps ensure the safety of the patient's treatment.

With the method for determining a patient treatment plan in this embodiment, an improved network structure of the deep reinforcement learning model DQN is proposed in order to create a patient grouping model for processing time series data, and the patient grouping model is trained with sample data until it meets the preset training standard. The target patient data within the preset time period are then input into the patient grouping model that meets the preset training standard to obtain the target grouping result, and the first treatment plan of the target patient is determined from the population features of the target group. To enhance the safety of diagnosis, the contraindicated drugs of the target patient are also determined from the target patient data, so that the second treatment plan containing the contraindicated drugs can be screened out from the first treatment plan. Finally, the target treatment plan suitable for the target patient is obtained by analyzing the first treatment plan and the second treatment plan. In addition, this application enables digital processing of patient treatment plans and extends the calculation of the expected reward value Q to a time series structure, so that more information can be taken into account; by integrating artificial intelligence and deep learning algorithms, the analysis result becomes more accurate.

Further, as a refinement and expansion of the specific implementation of the foregoing embodiment, in order to fully explain the specific implementation process in this embodiment, another method for determining a patient's treatment plan is provided. As shown in FIG. 2, the method includes:

201. Create a patient grouping model for processing time series data based on deep reinforcement learning DQN.

In a specific application scenario, step 201 may specifically include: splitting the last fully connected layer in the network structure of the deep reinforcement learning DQN into a first fully connected layer, a second recurrent neural network layer, and a third recurrent neural network layer; and constructing the patient grouping model from the DQN with the modified network structure, so that when patient data containing multiple time points are input into the patient grouping model, the first fully connected layer outputs the embedded value of the patient state at each time point, the second recurrent neural network layer outputs the first degree of attention, corresponding to the patient state at each time point, and the third recurrent neural network layer outputs the second degree of attention, corresponding to the grouping result at each time point. The expected reward value of the patient data for each preset group is then calculated from the embedded value, the first degree of attention, and the second degree of attention.

For example, in the network structure of the patient grouping model shown in FIG. 3, the abstract features extracted by the convolutional layer are fed into three branches; that is, the last fully connected layer in the network structure of the deep reinforcement learning DQN is split into the first fully connected layer 1, the second recurrent neural network layer 2, and the third recurrent neural network layer 3. The first fully connected layer 1 outputs the embedded value of the patient state at each time point. The second recurrent neural network layer 2 is the state value function and outputs the first degree of attention, corresponding to the patient state at each time point. The third recurrent neural network layer 3 is the action advantage function and outputs the second degree of attention, corresponding to the grouping result at each time point.

In a specific application scenario, in order to monitor the training status of the patient grouping model while it is trained with the sample data, the group to which each piece of sample data belongs must be labeled in advance. This specifically includes: grouping the sample data according to preset grouping decision rules to obtain the grouping result of each piece of sample data; and labeling the sample data with the grouping result.

The preset grouping decision rules can be set according to actual needs. For example, they can classify patients by personal feature information combined with examination index information: patients whose personal feature information is highly similar and who have the same examination indicators with the same examination results can be assigned to one group.

202. Input the sample data at the current time point and the historical time points into the patient grouping model to obtain, for each piece of sample data, the expected reward value of each of a preset number of groups.

The sample data are time series data containing the current time point and a preset number of historical time points, and can include patient data at the current time and at historical times. The patient data can be personal identity information (such as name, gender, and age), treatment plan information (such as drug combination, medication cycle, and dosage), examination index information (such as blood sugar, blood pressure, electrocardiogram and other examination indicators together with the corresponding results), and so on. The expected reward value is obtained as follows: at each time point, the first degree of attention and the second degree of attention are added to give a first sum, the first sum is multiplied by the embedded value, and the products for the current time point and the historical time points are accumulated.

For example, in the network structure of the patient grouping model shown in FIG. 3, suppose the current patient state ($s_3$) corresponding to the sample data is input into the patient grouping model together with the patient states at two historical time points ($s_1$, $s_2$). Passing through the fully connected layer and the two recurrent neural network layers yields $e$ ($e_1$, $e_2$, $e_3$), output by the first fully connected layer at each time point; $V$ ($V_1$, $V_2$, $V_3$), output by the second recurrent neural network layer at each time point; and $A$ ($A_1$, $A_2$, $A_3$), output by the third recurrent neural network layer at each time point. Within each time step, $V$ is added to $A$, the sum is multiplied element-wise with $e$, and the products are accumulated over the time steps to give the Q value of the current state ($Q_3$). Here $V$ represents the degree of attention on the patient state at each time point, $A$ represents the degree of attention on the grouping result at each time point, and $e$ is the embedded representation of the patient state. The calculation formulas for each layer are:

$$h_{V1}, h_{V2}, h_{V3} = \text{LSTM-V}(s_1, s_2, s_3)$$

$$V_1, V_2, V_3 = (w_V \cdot h_{V1},\; w_V \cdot h_{V2},\; w_V \cdot h_{V3})$$

$$h_{A1}, h_{A2}, h_{A3} = \text{LSTM-A}(s_1, s_2, s_3)$$

$$A_1, A_2, A_3 = (W_A h_{A1},\; W_A h_{A2},\; W_A h_{A3})$$

$$v_1, v_2, v_3 = (W_I s_1,\; W_I s_2,\; W_I s_3)$$

$$e_1, e_2, e_3 = (W_{II} v_1,\; W_{II} v_2,\; W_{II} v_3)$$

$$Q_3 = e_1 \odot (V_1 + A_1) + e_2 \odot (V_2 + A_2) + e_3 \odot (V_3 + A_3)$$

where $s_i$, $h_{Vi}$, $w_V$, $h_{Ai}$, $A_i$, $v_i$, $e_i$ and $Q_3$ are vectors, $V_i$ is a scalar, $W_A$, $W_I$ and $W_{II}$ are matrices, and $\odot$ denotes element-wise multiplication.
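The accumulation of the expected reward value can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation of the application: the function name `expected_reward` and all numeric values are hypothetical, and the layer outputs e, V and A are supplied directly instead of being produced by the fully connected and LSTM layers.

```python
from typing import List, Sequence


def expected_reward(
    e: Sequence[Sequence[float]],
    V: Sequence[float],
    A: Sequence[Sequence[float]],
) -> List[float]:
    """Accumulate Q = sum_i e_i * (V_i + A_i), element-wise over groups.

    e[i][k]: embedded value at time point i for group k
    V[i]:    scalar attention on the patient state at time point i
    A[i][k]: attention on group k at time point i
    """
    if not (len(e) == len(V) == len(A)):
        raise ValueError("e, V and A must cover the same time points")
    n_groups = len(e[0])
    q = [0.0] * n_groups
    for e_i, v_i, a_i in zip(e, V, A):
        for k in range(n_groups):
            # V_i is a scalar, so it is added to every component of A_i.
            q[k] += e_i[k] * (v_i + a_i[k])
    return q


# Hypothetical layer outputs: three time points (s1, s2, s3), two groups.
e = [[1.0, 2.0], [0.5, 1.0], [2.0, 0.0]]
V = [0.5, 1.0, 2.0]
A = [[0.5, 1.5], [1.0, 0.0], [0.0, 1.0]]

q3 = expected_reward(e, V, A)
print(q3)  # [6.0, 5.0]
```

Each component of the result is the expected reward value of one preset group, so the list has one entry per group regardless of how many time points are accumulated.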

It should be noted that this application also incorporates an Attention mechanism to make the patient features interpretable. The model's decision can be interpreted as follows: the contribution of each patient feature at each time point to the final Q value can be derived by forward computation from all the inputs $s_i$.

According to the calculation formula of the expected reward value ($Q_3$):

$$Q_3 = e_1 \odot (V_1 + A_1) + e_2 \odot (V_2 + A_2) + e_3 \odot (V_3 + A_3) = W_{II} W_I s_1 \odot (V_1 + A_1) + W_{II} W_I s_2 \odot (V_2 + A_2) + W_{II} W_I s_3 \odot (V_3 + A_3)$$

It can be seen that the importance of the j-th feature at the i-th time point to the k-th Q value is:

$$w(i, j, k) = (V_i + A_i[k]) \, (W_{II}[k] \cdot W_I[j]) \, s_i[j]$$

where $W_{II}[k]$ is the k-th row of $W_{II}$, $W_I[j]$ is the j-th column of $W_I$, and $(V_i + A_i[k]) \, (W_{II}[k] \cdot W_I[j])$ is the contribution coefficient, which represents the degree of attention.
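The importance formula can be checked numerically: summing w(i, j, k) over all time points i and features j should reproduce the k-th component of the Q value. The sketch below is an illustration only. The helper names and all matrix and vector values are hypothetical, V and A are supplied directly instead of coming from the LSTM layers, and W_II[k] is read as the k-th row of W_II and W_I[j] as the j-th column of W_I.

```python
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Matrix-vector product for plain nested lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def q_value(s, V, A, W_I, W_II) -> List[float]:
    """Q = sum_i (W_II W_I s_i) * (V_i + A_i), element-wise over groups."""
    q = [0.0] * len(W_II)
    for s_i, v_i, a_i in zip(s, V, A):
        e_i = matvec(W_II, matvec(W_I, s_i))
        for k in range(len(q)):
            q[k] += e_i[k] * (v_i + a_i[k])
    return q


def importance(i, j, k, s, V, A, W_I, W_II) -> float:
    """w(i, j, k) = (V_i + A_i[k]) * (row k of W_II . column j of W_I) * s_i[j]."""
    col_j = [row[j] for row in W_I]
    coeff = sum(a * b for a, b in zip(W_II[k], col_j))
    return (V[i] + A[i][k]) * coeff * s[i][j]


# Hypothetical data: two time points, three features, two groups.
s = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]
W_I = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]
W_II = [[1.0, 0.0], [1.0, 1.0]]
V = [1.0, 2.0]
A = [[0.0, 1.0], [1.0, 0.0]]

q = q_value(s, V, A, W_I, W_II)
totals = [
    sum(
        importance(i, j, k, s, V, A, W_I, W_II)
        for i in range(len(s))
        for j in range(len(s[0]))
    )
    for k in range(len(W_II))
]
print(q, totals)  # [7.0, 14.0] [7.0, 14.0]
```

Because the per-feature importances add up to the Q value, each w(i, j, k) can be read as the share of the k-th expected reward value that is attributable to one feature at one time point.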

203. Extract the label group corresponding to the sample data, and determine the first expected reward value corresponding to the output of the label group as the training output result of the patient grouping model.

In a specific application scenario, each piece of sample data corresponds to a unique labeled group. When the sample data are input into the patient grouping model, an expected reward value is obtained for each preset group. To verify the training of the patient grouping model, it is only necessary to extract the first expected reward value, that is, the one output for the labeled group, and to take it as the training output of the patient grouping model.

204. Calculate the mean square error loss between the first expected reward value and the real expected reward value, and if the loss function is determined to reach a convergent state according to the mean square error loss, it is determined that the patient grouping model meets the preset training standard.

Here, the first expected reward value is the expected reward value that the model outputs for the labeled group in the current patient state. The real expected reward value is the sum of the largest expected reward value in the next patient state and the actual reward, which gives the actual expected reward value of the labeled group.

In a specific application scenario, after the first expected reward value is extracted, the mean square error loss between it and the real expected reward value is calculated to determine whether the loss function has converged. When the loss function has converged, the patient grouping model can be determined to meet the preset training standard.
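The loss in steps 203 and 204 can be sketched as follows. This is a minimal illustration with hypothetical function names and values. The discount factor `gamma` is an assumption borrowed from standard DQN training; the text describes the real expected reward value as the actual reward plus the largest expected reward value in the next state, which corresponds to `gamma = 1.0`.

```python
from typing import List, Sequence


def first_expected_rewards(
    q_outputs: Sequence[Sequence[float]], labels: Sequence[int]
) -> List[float]:
    """Pick, for each sample, the Q value output for its labeled group."""
    return [q[label] for q, label in zip(q_outputs, labels)]


def real_expected_rewards(
    rewards: Sequence[float],
    next_q_outputs: Sequence[Sequence[float]],
    gamma: float = 1.0,  # assumption: no discounting is described in the text
) -> List[float]:
    """Actual reward plus the largest Q value in the next patient state."""
    return [r + gamma * max(nq) for r, nq in zip(rewards, next_q_outputs)]


def mse_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean square error between predicted and target values."""
    if len(predicted) != len(target) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


# Hypothetical batch of two samples with two preset groups each.
q_outputs = [[1.0, 3.0], [2.0, 0.5]]       # model output in the current state
labels = [1, 0]                            # index of the labeled group
rewards = [1.0, 0.0]                       # actual rewards
next_q_outputs = [[0.5, 1.0], [2.0, 4.0]]  # model output in the next state

predicted = first_expected_rewards(q_outputs, labels)
target = real_expected_rewards(rewards, next_q_outputs)
loss = mse_loss(predicted, target)
print(predicted, target, loss)  # [3.0, 2.0] [2.0, 4.0] 2.5
```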

205. If it is determined that the loss function has not reached the convergence state, the sample data is used to repeatedly train the patient grouping model, so that the patient grouping model meets the preset training standard.

Correspondingly, if the loss function has not reached the convergence state, the patient grouping model has not yet been trained successfully, and the above training steps should be repeated with the sample data until the patient grouping model meets the preset training standard.
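The text does not specify how convergence of the loss function is detected. One possible criterion, shown here purely as an assumption, is to treat the loss as converged once it has changed by less than a tolerance over several consecutive training rounds. The function name `has_converged` and the parameters `tol` and `patience` are hypothetical.

```python
from typing import Sequence


def has_converged(
    losses: Sequence[float], tol: float = 1e-3, patience: int = 3
) -> bool:
    """Return True if the last `patience` loss changes are all below `tol`.

    Assumed criterion for illustration; the source only states that
    training repeats until the loss function reaches a convergent state.
    """
    if len(losses) < patience + 1:
        return False
    recent = losses[-(patience + 1):]
    return all(abs(a - b) < tol for a, b in zip(recent, recent[1:]))


# Hypothetical loss histories recorded after each training round.
still_training = [1.0, 0.5, 0.2, 0.1]
settled = [1.0, 0.1004, 0.1002, 0.1001, 0.1]

print(has_converged(still_training))  # False
print(has_converged(settled))         # True
```

A training loop would call such a check after each round and stop repeating the training steps once it returns True.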

206. Input the target patient data within a preset time period into a patient grouping model that meets the preset training standard, and obtain the target group to which the target patient belongs.

When the target patient information is time series data, all target patient information at the current time and at historical times needs to be input into the patient grouping model to obtain the grouping result. When the target patient information is not time series data, only the target patient information at the current time is input into the patient grouping model, and the parameter values corresponding to the historical time points in the patient grouping model are set to 0 to obtain the grouping result.

In a specific application scenario, when the target patient information is time series data, step 206 may specifically include: extracting the historical patient follow-up data and the current patient follow-up data of the target patient within the preset time period; inputting the historical and current patient follow-up data into the patient grouping model that meets the preset training standard to obtain the expected reward value of each preset group; and determining the preset group with the largest expected reward value as the target group of the target patient.
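Step 206 can be sketched as two small helpers. This is an illustration under stated assumptions: the names `pad_history` and `select_group`, the group labels and all values are hypothetical. Padding the missing historical time points with all-zero records is one possible reading of setting the values for historical time points to 0.

```python
from typing import Dict, List


def pad_history(follow_ups: List[List[float]], n_points: int) -> List[List[float]]:
    """Left-pad follow-up records with all-zero records up to n_points entries.

    Assumption: a patient with fewer records than the model expects is
    represented by zero vectors at the missing historical time points.
    """
    if not follow_ups:
        raise ValueError("at least the current follow-up record is required")
    if n_points < 1:
        raise ValueError("n_points must be at least 1")
    n_features = len(follow_ups[0])
    missing = max(0, n_points - len(follow_ups))
    return [[0.0] * n_features for _ in range(missing)] + follow_ups[-n_points:]


def select_group(expected_rewards: Dict[str, float]) -> str:
    """Return the preset group with the largest expected reward value."""
    return max(expected_rewards, key=expected_rewards.get)


# A patient with only the current record, for a model that expects 3 points.
padded = pad_history([[1.0, 2.0]], n_points=3)
print(padded)  # [[0.0, 0.0], [0.0, 0.0], [1.0, 2.0]]

# Hypothetical expected reward values output by the trained model.
target_group = select_group({"group_1": 0.2, "group_2": 1.7, "group_3": 0.9})
print(target_group)  # group_2
```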

207. Determine a first treatment plan for the target patient based on the characteristics of the population in the target group.

In a specific application scenario, in order to determine the first treatment plan of the target patient, step 207 may specifically include: screening, according to the target patient data, the target group for first patients whose population-feature similarity to the target patient is greater than a first preset threshold, where the population features include at least condition information and personal information; extracting the treatment plans of the first patients and the score of each treatment plan with respect to its treatment effect, and determining the treatment plans whose score is greater than a second preset threshold as the first treatment plan; or obtaining a preset treatment plan created according to the population features of the target group, and determining the preset treatment plan as the first treatment plan.

The target group contains the data of multiple sample patients. In addition to multi-dimensional feature information such as the sample patients' personal identity information, examination index information, and diagnosis result information, the data can also include score information about the treatment effect and treatment plan information such as drug combination, medication cycle, and dosage. The first preset threshold and the second preset threshold are both values greater than 0 and less than or equal to 1, and their specific values can be set according to the application scenario. It should be noted that the closer the first preset threshold is to 1, the higher the feature similarity between the selected first patients and the target patient; the closer the second preset threshold is to 1, the better the treatment effect reported by patients for the selected first treatment plan.

In a specific application scenario, after the target patient has been assigned to a group, multi-dimensional feature information such as personal identity information, examination index information, and diagnosis result information can be extracted from the target patient's data in advance. First patients whose degree of match with the target patient's feature information is greater than the first preset threshold are then selected from the group, the treatment plans of those first patients whose treatment-effect score is greater than the second preset threshold are extracted, and those treatment plans are determined as the first treatment plan.

For example, suppose that screening the target group according to the target patient data yields four first patients, A, B, C, and D, whose population-feature similarity to the target patient is greater than the first preset threshold. The drug combination of first patient A is a+c+d, that of first patient B is a+c+e, that of first patient C is a+b+c, and that of first patient D is a+c+d. Counting the distinct combinations gives three different treatment plans: a+c+d, a+c+e, and a+b+c. The treatment-effect score of each of the three plans can then be obtained; for example, the score of plan a+c+d is 0.75, the score of plan a+c+e is 0.91, and the score of plan a+b+c is 0.88. If the second preset threshold is set to 0.85, the selected first treatment plan includes a+c+e and a+b+c.
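The two-threshold screening in this example can be sketched as follows. The code is an illustration only: the class and function names, the similarity values, the first preset threshold of 0.8 and the extra patient E are hypothetical. The drug combinations and treatment-effect scores follow the example, with the score 0.91 attached to patient B's combination a+c+e.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Set

Plan = FrozenSet[str]


def plan(text: str) -> Plan:
    """Parse a drug combination such as 'a+c+d' into a set of drugs."""
    return frozenset(text.split("+"))


@dataclass(frozen=True)
class GroupPatient:
    name: str
    similarity: float    # similarity to the target patient (hypothetical)
    treatment: Plan      # drug combination used by this patient
    effect_score: float  # treatment-effect score of that combination


def first_treatment_plans(
    patients: Iterable[GroupPatient],
    similarity_threshold: float,
    score_threshold: float,
) -> Set[Plan]:
    """Keep plans of sufficiently similar patients that score high enough."""
    similar = [p for p in patients if p.similarity > similarity_threshold]
    return {p.treatment for p in similar if p.effect_score > score_threshold}


group = [
    GroupPatient("A", 0.95, plan("a+c+d"), 0.75),
    GroupPatient("B", 0.92, plan("a+c+e"), 0.91),
    GroupPatient("C", 0.90, plan("a+b+c"), 0.88),
    GroupPatient("D", 0.93, plan("a+c+d"), 0.75),
    GroupPatient("E", 0.40, plan("b+f"), 0.99),  # too dissimilar, ignored
]

selected = first_treatment_plans(
    group, similarity_threshold=0.8, score_threshold=0.85
)
print(sorted("+".join(sorted(p)) for p in selected))  # ['a+b+c', 'a+c+e']
```

Storing each plan as a set of drugs makes duplicate combinations such as those of patients A and D collapse into one plan, and it also makes the later check for contraindicated drugs a simple set intersection.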

Correspondingly, as another option in this embodiment, the preset treatment plan of each target group may be determined in advance from the population features of the target group and the physician's diagnosis. For example, if the patients in the target group are children, the physician's diagnosis is disease a, and the commonly used treatment plans include A and B, then treatment plans A and B can be set directly as the preset treatment plans of the target group. When the target patient is determined to belong to that target group, treatment plans A and B can be determined as the first treatment plan of the target patient.

208. Extract the contraindicated drugs of the target patient based on the target patient data, and screen out the second treatment plan containing the contraindicated drugs from the first treatment plan.

In a specific application scenario, in order to obtain the second treatment plan containing drugs contraindicated for the target patient, step 208 may specifically include: determining, according to drug contraindication data, the first contraindicated drugs that the population type of the target patient should not take; determining, from the drug allergy history in the target patient data, the second contraindicated drugs to which the target patient has an allergic reaction; and determining the first treatment plans containing the first contraindicated drugs and/or the second contraindicated drugs as the second treatment plan.

For example, when the target patient is a pregnant woman, the first contraindicated drugs of the target patient are the drugs prohibited for pregnant women; when the target patient is allergic to penicillin, penicillin drugs can be determined as the second contraindicated drugs of the target patient.
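Step 208 can be sketched with plain sets. This is a minimal illustration: the function names, the contraindication table and the drug names `drug_x`, `drug_y` and `drug_z` are hypothetical placeholders, not clinical data. Only the pregnant-woman and penicillin-allergy cases come from the example above.

```python
from typing import Dict, FrozenSet, Iterable, Set

Plan = FrozenSet[str]


def contraindicated_drugs(
    population_types: Iterable[str],
    allergy_history: Iterable[str],
    contraindication_table: Dict[str, Set[str]],
) -> Set[str]:
    """Union of population-type contraindications and known allergies."""
    first: Set[str] = set()
    for population_type in population_types:
        # First contraindicated drugs: unsuitable for the population type.
        first |= contraindication_table.get(population_type, set())
    # Second contraindicated drugs: taken from the drug allergy history.
    second = set(allergy_history)
    return first | second


def second_treatment_plans(
    first_plans: Iterable[Plan], banned: Set[str]
) -> Set[Plan]:
    """First treatment plans that contain at least one contraindicated drug."""
    return {p for p in first_plans if p & banned}


# Hypothetical contraindication data and candidate plans.
table = {"pregnant": {"drug_x"}}
first_plans = {
    frozenset({"drug_x", "drug_y"}),
    frozenset({"penicillin", "drug_z"}),
    frozenset({"drug_y", "drug_z"}),
}

banned = contraindicated_drugs(["pregnant"], ["penicillin"], table)
second_plans = second_treatment_plans(first_plans, banned)
print(sorted(banned))  # ['drug_x', 'penicillin']
```

A plan is flagged as soon as it shares one drug with the contraindicated set, which matches the "and/or" condition in the step.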

209. According to the first treatment plan and the second treatment plan, analyze and obtain the target treatment plan of the target patient.

For this embodiment, in a specific application scenario, step 209 of the embodiment may specifically include: excluding the second treatment plan from the first treatment plan to obtain the target treatment plan.

For example, by excluding the second treatment plan, which is prohibited for the population to which the target patient belongs, from the first treatment plan, which can treat the target patient's disease, the treatment plans that are safe for that population are obtained, and these plans can be used to treat the disease effectively.
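The exclusion in step 209 amounts to a set difference. The sketch below is illustrative only; the function name `target_treatment_plans` and the drug names are hypothetical placeholders.

```python
from typing import FrozenSet, Iterable, Set

Plan = FrozenSet[str]


def target_treatment_plans(
    first_plans: Iterable[Plan], second_plans: Iterable[Plan]
) -> Set[Plan]:
    """Exclude the second treatment plans from the first treatment plans."""
    return set(first_plans) - set(second_plans)


# Hypothetical candidate plans; one of them contains a contraindicated drug.
first_plans = {
    frozenset({"drug_x", "drug_y"}),
    frozenset({"drug_y", "drug_z"}),
}
second_plans = {frozenset({"drug_x", "drug_y"})}

target_plans = target_treatment_plans(first_plans, second_plans)
print([sorted(p) for p in target_plans])  # [['drug_y', 'drug_z']]
```

If every first treatment plan contains a contraindicated drug, the result is empty, so a caller would need to handle the case where no plan can be recommended.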

With the above method for determining a patient treatment plan, an interpretable network structure of the deep reinforcement learning model DQN is proposed in order to create a patient grouping model for processing time series data, and the patient grouping model is trained with sample data until it meets the preset training standard. The target patient data within the preset time period are then input into the patient grouping model that meets the preset training standard to obtain the target grouping result, and the first treatment plan of the target patient is determined from the population features of the target group. To enhance the safety of diagnosis, the contraindicated drugs of the target patient are also determined from the target patient data, so that the second treatment plan containing the contraindicated drugs can be screened out from the first treatment plan. Finally, the target treatment plan suitable for the target patient is obtained by analyzing the first treatment plan and the second treatment plan. In addition, this application enables digital processing of patient treatment plans and extends the calculation of the expected reward value Q to a time series structure, so that more information can be taken into account; by integrating artificial intelligence and deep learning algorithms, the analysis result becomes more accurate. Furthermore, the Attention mechanism added to the calculation of the expected reward value provides a degree of interpretability.

Further, as a specific embodiment of the method shown in FIG. 1 and FIG. 2, an embodiment of the present application provides a device for determining a patient's treatment plan. As shown in FIG. 4, the device includes: a creation module 31, a training module 32, an input module 33, a determination module 34, an extraction module 35, and an analysis module 36.

The creation module 31 can be used to create a patient grouping model for processing time series data based on deep reinforcement learning DQN;

The training module 32 can be used to train the patient clustering model by using the sample data with marked clustering results, so that the patient clustering model meets the preset training standard;

The input module 33 can be used to input target patient data within a preset time period into a patient grouping model that meets the preset training standard, and obtain the target grouping result;

The determining module 34 can be used to determine the first treatment plan of the target patient based on the characteristics of the population in the target group;

The extraction module 35 can be used to extract the contraindicated drugs of the target patient based on the target patient's data, and screen out the second treatment plan containing the contraindicated drugs from the first treatment plan;

The analysis module 36 can be used to analyze and obtain the target treatment plan of the target patient according to the first treatment plan and the second treatment plan.

In a specific application scenario, in order to create a patient grouping model for processing time series data, as shown in FIG. 5, the creation module 31 may specifically include: a splitting unit 311 and a construction unit 312;

The splitting unit 311 can be used to split the last fully connected layer in the network structure of the deep reinforcement learning DQN into a first fully connected layer, a second recurrent neural network layer, and a third recurrent neural network layer;

The construction unit 312 can be used to construct a patient grouping model by using the deep reinforcement learning DQN with the changed network structure, so that when patient data containing multiple time points is input into the patient grouping model, the first fully connected layer outputs the embedded value of the patient state at each time point, the second recurrent neural network layer outputs the first degree of attention corresponding to the patient state at each time point, the third recurrent neural network layer outputs the second degree of attention corresponding to the grouping result at each time point, and the expected reward value of the patient data for each preset group is calculated based on the embedded value, the first degree of attention, and the second degree of attention.
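As an illustrative sketch only, the calculation of the expected reward value from the embedded value and the two degrees of attention may be written as follows. The accumulation rule (sum the two degrees of attention at each time point, multiply by the embedded value, and add up over the time points) follows the description of this embodiment; the data layout, in which the embedded value and the second degree of attention are indexed by time point and group while the first degree of attention is a single number per time point, and the function name `expected_reward_values` are assumptions made for the example.

```python
from typing import List


def expected_reward_values(
    embedded: List[List[float]],         # embedded[t][k]: embedded value at time point t for group k (assumed layout)
    attention_state: List[float],        # attention_state[t]: first degree of attention at time point t
    attention_group: List[List[float]],  # attention_group[t][k]: second degree of attention at time point t for group k
) -> List[float]:
    """Accumulate (first attention + second attention) * embedded value over all time points."""
    num_groups = len(embedded[0])
    q = [0.0] * num_groups
    for e_t, a_t, b_t in zip(embedded, attention_state, attention_group):
        for k in range(num_groups):
            first_sum = a_t + b_t[k]      # first sum of the two degrees of attention
            q[k] += first_sum * e_t[k]    # product with the embedded value, accumulated over time
    return q


# Two time points (one historical, one current) and two preset groups.
q_values = expected_reward_values(
    embedded=[[1.0, 2.0], [3.0, 4.0]],
    attention_state=[0.5, 0.25],
    attention_group=[[0.5, 0.0], [0.75, 0.25]],
)
print(q_values)
```

Because each term of the sum belongs to one time point, the per-term weights can be inspected to see which follow-up visit contributed most to a group's expected reward value, which is the kind of interpretability the attention mechanism is intended to provide.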

Correspondingly, in order to train a patient grouping model that meets the preset training standards, as shown in FIG. 5, the training module 32 may specifically include: a first input unit 321, a first extraction unit 322, a calculation unit 323, and a training unit 324;

The first input unit 321 can be used to input the sample data at the current time point and the historical time points into the patient grouping model to obtain the expected reward value of each sample data for each of a preset number of groups, where the expected reward value is obtained by calculating, for each time point, the first sum of the first degree of attention and the second degree of attention at that time point and the product of the first sum and the embedded value, and then accumulating the products obtained at the current time point and the historical time points;

The first extraction unit 322 may be used to extract the label group corresponding to the sample data, and determine the first expected reward value corresponding to the output of the label group as the training output result of the patient grouping model;

The calculation unit 323 can be used to calculate the mean square error loss between the first expected reward value and the real expected reward value. If it is determined that the loss function reaches the convergence state based on the mean square error loss, it is determined that the patient grouping model meets the preset training standard;

The training unit 324 can be used to repeatedly train the patient clustering model by using the sample data if it is determined that the loss function has not reached the convergence state, so that the patient clustering model meets the preset training standard.
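The training procedure carried out by the above units can be sketched as a loop that computes the mean square error loss and stops once the loss has converged. This is a minimal sketch under stated assumptions: the convergence test (the change in loss between two consecutive passes falling below a tolerance), the tolerance and epoch limit, and the callable `train_step`, which stands in for one pass over the sample data and returns the resulting loss, are all hypothetical and not specified by this embodiment.

```python
from typing import Callable, List, Sequence


def mse_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean square error between first expected reward values and real expected reward values."""
    if len(predicted) != len(target) or len(predicted) == 0:
        raise ValueError("predicted and target must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def train_until_converged(
    train_step: Callable[[], float],  # hypothetical: runs one training pass, returns the MSE loss
    tolerance: float = 1e-4,          # assumed convergence tolerance
    max_epochs: int = 1000,           # assumed safety limit
) -> List[float]:
    """Repeat training until the loss changes by less than `tolerance` between passes."""
    losses: List[float] = []
    for _ in range(max_epochs):
        loss = train_step()
        converged = bool(losses) and abs(losses[-1] - loss) < tolerance
        losses.append(loss)
        if converged:
            break  # the model is treated as meeting the preset training standard
    return losses


# Usage with a stand-in training step that replays a fixed sequence of losses.
replayed = iter([1.0, 0.5, 0.5, 0.1])
history = train_until_converged(lambda: next(replayed), tolerance=1e-6)
print(history)
```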

In a specific application scenario, in order to determine the target group corresponding to the target patient, as shown in FIG. 5, the input module 33 may specifically include: a second extraction unit 331, a second input unit 332, and a first determination unit 333;

The second extraction unit 331 can be used to extract historical patient follow-up data and current patient follow-up data of the target patient within a preset time period;

The second input unit 332 can be used to input historical patient follow-up data and current patient follow-up data into a patient grouping model that meets the preset training standards to obtain the expected reward value corresponding to each preset group;

The first determining unit 333 may be used to determine the preset group with the largest expected reward value as the target group corresponding to the target patient.
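The selection performed by the first determining unit 333 amounts to taking the group with the largest expected reward value. A minimal sketch follows; the group names and the dictionary representation are assumptions made for the example.

```python
from typing import Dict


def select_target_group(expected_rewards: Dict[str, float]) -> str:
    """Return the preset group whose expected reward value is the largest."""
    if not expected_rewards:
        raise ValueError("at least one preset group is required")
    return max(expected_rewards, key=lambda group: expected_rewards[group])


# Hypothetical expected reward values output by the trained model for three preset groups.
target_group = select_target_group({"group_a": 0.2, "group_b": 0.7, "group_c": 0.1})
print(target_group)
```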

In a specific application scenario, in order to determine the first treatment plan of the target patient based on the target grouping result, as shown in FIG. 5, the determining module 34 may specifically include: a screening unit 341 and a second determining unit 342;

The screening unit 341 can be used to screen, in the target group and based on the target patient data, first patients whose similarity in population characteristics to the target patient is greater than a first preset threshold, where the population characteristics include at least medical condition information and personal information;

The second determining unit 342 may be used to extract the treatment plan corresponding to the first patient and the score value of the treatment plan with respect to the treatment effect, and determine the treatment plan with the score value greater than the second preset threshold as the first treatment plan; or

The second determining unit 342 may also be used to obtain a preset treatment plan created according to the characteristics of the target group, and determine the preset treatment plan as the first treatment plan.
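The first alternative of the second determining unit 342 (screening similar patients and keeping their well-scored treatment plans) can be sketched as below. The similarity measure is not defined here, so the example assumes a simple fraction of matching characteristics; the `PatientRecord` type, the field names, and the threshold values are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PatientRecord:
    features: Dict[str, str]  # population characteristics: medical condition and personal information
    plan: str                 # treatment plan used for this patient
    score: float              # score of the plan with respect to treatment effect


def similarity(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Assumed measure: share of characteristics on which both patients agree."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    matches = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return matches / len(keys)


def first_treatment_plans(
    target_features: Dict[str, str],
    group: List[PatientRecord],
    similarity_threshold: float,  # first preset threshold
    score_threshold: float,       # second preset threshold
) -> List[str]:
    """Keep plans of sufficiently similar patients whose score exceeds the score threshold."""
    plans: List[str] = []
    for record in group:
        similar = similarity(target_features, record.features) > similarity_threshold
        if similar and record.score > score_threshold and record.plan not in plans:
            plans.append(record.plan)
    return plans


target = {"condition": "type 2 diabetes", "age_band": "60-69"}
group_members = [
    PatientRecord({"condition": "type 2 diabetes", "age_band": "60-69"}, "plan_a", 0.9),
    PatientRecord({"condition": "type 2 diabetes", "age_band": "40-49"}, "plan_b", 0.95),
    PatientRecord({"condition": "type 2 diabetes", "age_band": "60-69"}, "plan_c", 0.4),
]
candidates = first_treatment_plans(target, group_members, 0.75, 0.8)
print(candidates)
```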

In a specific application scenario, in order to screen out the second treatment plan containing the contraindicated drugs of the target patient from the first treatment plan, as shown in FIG. 5, the extraction module 35 may specifically include: a third determination unit 351;

The third determining unit 351 can be used to determine, according to drug contraindication data, the first contraindicated drug that is unsuitable for the population type to which the target patient belongs;

The third determining unit 351 can also be used to determine, based on the drug allergy history in the target patient's data, the second contraindicated drug to which the target patient has an allergic reaction;

The third determining unit 351 may also be used to determine the first treatment plan that includes the first contraindicated drug and/or the second contraindicated drug as the second treatment plan.
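The three operations of the third determining unit 351 can be illustrated with set operations. This is a sketch only: the contraindication table keyed by population type, the drug and plan names, and both function names are assumptions made for the example.

```python
from typing import Dict, List, Set


def contraindicated_drugs(
    population_types: Set[str],
    contraindication_table: Dict[str, Set[str]],  # population type -> drugs unsuitable for that type
    allergy_history: Set[str],                    # drugs recorded in the patient's allergy history
) -> Set[str]:
    """Union of first contraindicated drugs (population type) and second ones (allergy)."""
    first: Set[str] = set()
    for population_type in population_types:
        first |= contraindication_table.get(population_type, set())
    return first | set(allergy_history)


def second_treatment_plans(first_plans: Dict[str, Set[str]], banned: Set[str]) -> List[str]:
    """Names of first treatment plans that contain at least one contraindicated drug."""
    return [name for name, drugs in first_plans.items() if drugs & banned]


table = {"pregnant": {"drug_x"}, "renal_impairment": {"drug_y"}}
banned = contraindicated_drugs({"renal_impairment"}, table, {"drug_z"})
plans = {"plan_a": {"drug_w"}, "plan_b": {"drug_y", "drug_w"}, "plan_c": {"drug_z"}}
flagged = second_treatment_plans(plans, banned)
print(sorted(banned), flagged)
```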

Correspondingly, in order to analyze and obtain the target treatment plan of the target patient, as shown in FIG. 5, the analysis module 36 may specifically include: a rejection unit 361;

The rejection unit 361 can be used to remove the second treatment plan from the first treatment plan to obtain the target treatment plan.
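The removal performed by the rejection unit 361 is a difference between the two sets of plans. A short sketch, assuming plans are identified by name and that the original order of the first treatment plans should be preserved:

```python
from typing import Iterable, List


def target_treatment_plans(first_plans: Iterable[str], second_plans: Iterable[str]) -> List[str]:
    """Remove every second treatment plan from the first treatment plans, keeping order."""
    excluded = set(second_plans)
    return [plan for plan in first_plans if plan not in excluded]


remaining = target_treatment_plans(["plan_a", "plan_b", "plan_c"], ["plan_b", "plan_c"])
print(remaining)
```

An empty result would indicate that every candidate plan contains a contraindicated drug; how that case is handled is not covered by this sketch.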

It should be noted that, for other corresponding descriptions of the functional units involved in the device for determining a patient treatment plan provided in this embodiment, reference may be made to the corresponding descriptions in FIGS. 1 to 2, and details are not repeated here.

Based on the above-mentioned method shown in FIG. 1 and FIG. 2, correspondingly, an embodiment of the present application also provides a computer-readable storage medium. The computer-readable storage medium may include non-volatile and/or volatile memory. A computer program is stored thereon, and when the program is executed by the processor, the method for determining the patient's treatment plan as shown in FIG. 1 and FIG. 2 is realized.

Based on this understanding, the technical solution of the present application can be embodied in the form of a software product. The software product can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods in each implementation scenario of the present application.

Based on the above methods shown in FIG. 1 and FIG. 2 and the virtual device embodiments shown in FIG. 4 and FIG. 5, in order to achieve the above objectives, an embodiment of the present application also provides a computer device, which may be a personal computer, a server, a network device, etc. The physical device includes a storage medium and a processor; the storage medium is used to store a computer program and may include non-volatile and/or volatile memory; the processor is used to execute the computer program to implement the above method for determining a patient treatment plan shown in FIG. 1 and FIG. 2.

Optionally, the computer device may also include a user interface, a network interface, a camera, a radio frequency (RF) circuit, a sensor, an audio circuit, a Wi-Fi module, and so on. The user interface may include a display screen and an input unit such as a keyboard, and may optionally also include a USB interface, a card reader interface, and the like. The network interface may optionally include a standard wired interface, a wireless interface (such as a Bluetooth interface or a Wi-Fi interface), and so on.

Those skilled in the art can understand that the computer device structure provided in this embodiment does not constitute a limitation on the physical device, which may include more or fewer components, combine certain components, or use a different arrangement of components.

The non-volatile readable storage medium may also include an operating system and a network communication module. The operating system is a program that manages the hardware and software resources of the physical device described above and supports the operation of information processing programs and other software and/or programs. The network communication module is used to implement communication between the components in the non-volatile readable storage medium, as well as communication with other hardware and software in the physical device.

Through the description of the above embodiments, those skilled in the art can understand that this application proposes an interpretable deep reinforcement learning DQN network structure to create a patient grouping model for processing time series data, and then uses sample data to train the patient grouping model so that it meets the preset training standard. The target patient data within the preset time period is then input into the patient grouping model that meets the preset training standard to obtain the target grouping result, and the first treatment plan of the target patient is determined by using the characteristics of the population in the target group. To enhance the safety of diagnosis, the target patient's contraindicated drugs can also be determined based on the target patient data, so that the second treatment plan containing the contraindicated drugs can be screened out from the first treatment plan. Finally, the first treatment plan and the second treatment plan are analyzed to obtain the target treatment plan suitable for the target patient. In addition, this application realizes digital processing of the patient's treatment plan and extends the calculation of the expected reward value Q to a time series structure, which allows more information to be considered; by integrating artificial intelligence and deep learning algorithms, the analysis result is made more accurate. Furthermore, an attention mechanism is added to the calculation of the expected reward value, which achieves a certain degree of interpretability.

Those skilled in the art can understand that the accompanying drawings are only schematic diagrams of preferred implementation scenarios, and that the modules or processes in the accompanying drawings are not necessarily required for implementing this application. Those skilled in the art can also understand that the modules of the device in an implementation scenario can be distributed in that device as described for the implementation scenario, or can be changed accordingly and located in one or more devices different from that of the implementation scenario. The modules of the above implementation scenarios can be combined into one module or further divided into multiple sub-modules.

The serial numbers used above in this application are for description only and do not represent the merits of the implementation scenarios. What has been disclosed above is only a few specific implementation scenarios of this application; this application is not limited to them, and any changes that can be conceived by those skilled in the art shall fall within the protection scope of this application.

Claims

A method for determining a patient's treatment plan, which includes:

Creating a patient clustering model for processing time series data based on deep reinforcement learning DQN;

Training the patient clustering model by using sample data marked with clustering results, so that the patient clustering model meets a preset training standard;

Inputting target patient data within a preset time period into a patient grouping model that meets the preset training standard to obtain the target group to which the target patient belongs;

Determining the first treatment plan of the target patient based on the characteristics of the population in the target group;

Extracting the contraindicated drugs of the target patient according to the target patient data, and selecting a second treatment plan containing the contraindicated drugs from the first treatment plan;

Analyzing and obtaining the target treatment plan of the target patient according to the first treatment plan and the second treatment plan.
The method according to claim 1, wherein the creation of a patient grouping model for processing time series data based on deep reinforcement learning DQN specifically comprises:

Splitting the last fully connected layer in the network structure of the deep reinforcement learning DQN into a first fully connected layer, a second recurrent neural network layer, and a third recurrent neural network layer;

Constructing a patient grouping model by using the deep reinforcement learning DQN with the changed network structure, so that when patient data containing multiple time points is input into the patient grouping model, the first fully connected layer outputs the embedded value of the patient state at each time point, the second recurrent neural network layer outputs the first degree of attention corresponding to the patient state at each time point, the third recurrent neural network layer outputs the second degree of attention corresponding to the grouping result at each time point, and the expected reward value of the patient data for each preset group is calculated based on the embedded value, the first degree of attention, and the second degree of attention.
The method according to claim 2, wherein the sample data is time series data including a current time point and a preset number of historical time points;

The training of the patient clustering model using sample data marked with clustering results so that the patient clustering model meets a preset training standard specifically includes:

Inputting the sample data at the current time point and the historical time points into the patient grouping model to obtain the expected reward value of each sample data for each of a preset number of groups, wherein the expected reward value is obtained by calculating, for each time point, the first sum of the first degree of attention and the second degree of attention at that time point and the product of the first sum and the embedded value, and then accumulating the products obtained at the current time point and the historical time points;

Extracting the label group corresponding to the sample data, and determining the first expected reward value corresponding to the output of the label group as the training output result of the patient grouping model;

Calculate the mean square error loss between the first expected reward value and the real expected reward value, and if it is determined based on the mean square error loss that the loss function reaches a convergence state, it is determined that the patient grouping model meets a preset training standard;

If it is determined that the loss function has not reached the convergence state, the sample data is used to repeatedly train the patient grouping model, so that the patient grouping model meets the preset training standard.
The method according to claim 3, wherein the inputting the target patient data within a preset time period into a patient grouping model that meets the preset training standard to obtain the target group to which the target patient belongs specifically includes:

Extract historical patient follow-up data and current patient follow-up data of the target patient within a preset time period;

Input the historical patient follow-up data and the current patient follow-up data into a patient grouping model that meets the preset training standard, and obtain the expected reward value corresponding to each preset group;

The preset group with the largest expected reward value is determined as the target group corresponding to the target patient.
The method according to claim 4, wherein the determining the first treatment plan of the target patient based on the characteristics of the population in the target group specifically comprises:

Screening, in the target group according to the target patient data, first patients whose population characteristics similarity to the target patient are greater than a first preset threshold, and the population characteristics include at least medical condition information and personal information;

Extracting the treatment plan corresponding to the first patient and the score value of the treatment plan with respect to the treatment effect, and determining the treatment plan with the score value greater than the second preset threshold as the first treatment plan; or

Obtain a preset treatment plan created according to the characteristics of the population of the target group, and determine the preset treatment plan as the first treatment plan.
The method according to claim 5, wherein the extracting the contraindicated drugs of the target patient according to the target patient data, and selecting a second treatment plan containing the contraindicated drugs from the first treatment plan, specifically comprises:

Determining, according to drug contraindication data, a first contraindicated drug that is unsuitable for the population type to which the target patient belongs;

Determining, according to the drug allergy history in the target patient data, a second contraindicated drug to which the target patient has an allergic reaction;

Determining the first treatment plan that includes the first contraindicated drug and/or the second contraindicated drug as the second treatment plan.
The method according to claim 6, wherein the analyzing and obtaining the target treatment plan of the target patient according to the first treatment plan and the second treatment plan specifically comprises:

The second treatment plan is removed from the first treatment plan to obtain the target treatment plan.
A device for determining a patient's treatment plan, which includes:

The creation module is used to create a patient grouping model for processing time series data based on deep reinforcement learning DQN;

A training module, configured to train the patient clustering model by using sample data marked with clustering results, so that the patient clustering model meets a preset training standard;

The input module is used to input target patient data in a preset time period into a patient grouping model that meets the preset training standard, and obtain the target group to which the target patient belongs;

A determining module, configured to determine the first treatment plan of the target patient based on the characteristics of the population in the target group;

An extraction module for extracting contraindicated drugs of the target patient according to the target patient data, and selecting a second treatment plan containing the contraindicated drugs from the first treatment plan;

The analysis module is used to analyze and obtain the target treatment plan of the target patient according to the first treatment plan and the second treatment plan.
A computer-readable storage medium having a computer program stored thereon, wherein the following steps are implemented when the program is executed by a processor:

Creating a patient clustering model for processing time series data based on deep reinforcement learning DQN;

Training the patient clustering model by using sample data marked with clustering results, so that the patient clustering model meets a preset training standard;

Inputting target patient data within a preset time period into a patient grouping model that meets the preset training standard to obtain the target group to which the target patient belongs;

Determining the first treatment plan of the target patient based on the characteristics of the population in the target group;

Extracting the contraindicated drugs of the target patient according to the target patient data, and selecting a second treatment plan containing the contraindicated drugs from the first treatment plan;

Analyzing and obtaining the target treatment plan of the target patient according to the first treatment plan and the second treatment plan.
The computer-readable storage medium according to claim 9, wherein the creation of a patient grouping model for processing time series data based on deep reinforcement learning DQN specifically comprises:

Splitting the last fully connected layer in the network structure of the deep reinforcement learning DQN into a first fully connected layer, a second recurrent neural network layer, and a third recurrent neural network layer;

Constructing a patient grouping model by using the deep reinforcement learning DQN with the changed network structure, so that when patient data containing multiple time points is input into the patient grouping model, the first fully connected layer outputs the embedded value of the patient state at each time point, the second recurrent neural network layer outputs the first degree of attention corresponding to the patient state at each time point, the third recurrent neural network layer outputs the second degree of attention corresponding to the grouping result at each time point, and the expected reward value of the patient data for each preset group is calculated based on the embedded value, the first degree of attention, and the second degree of attention.
The computer-readable storage medium according to claim 10, wherein the sample data is time series data including a current time point and a preset number of historical time points;

The training of the patient clustering model using sample data marked with clustering results so that the patient clustering model meets a preset training standard specifically includes:

Inputting the sample data at the current time point and the historical time points into the patient grouping model to obtain the expected reward value of each sample data for each of a preset number of groups, wherein the expected reward value is obtained by calculating, for each time point, the first sum of the first degree of attention and the second degree of attention at that time point and the product of the first sum and the embedded value, and then accumulating the products obtained at the current time point and the historical time points;

Extracting the label group corresponding to the sample data, and determining the first expected reward value corresponding to the output of the label group as the training output result of the patient grouping model;

Calculate the mean square error loss between the first expected reward value and the real expected reward value, and if it is determined that the loss function reaches a convergent state based on the mean square error loss, it is determined that the patient grouping model meets a preset training standard;

If it is determined that the loss function has not reached the convergence state, the sample data is used to repeatedly train the patient grouping model, so that the patient grouping model meets the preset training standard.
The computer-readable storage medium according to claim 11, wherein the inputting target patient data within a preset time period into a patient grouping model that meets the preset training standard to obtain the target group to which the target patient belongs specifically comprises:

Extract historical patient follow-up data and current patient follow-up data of the target patient within a preset time period;

Input the historical patient follow-up data and the current patient follow-up data into a patient grouping model that meets the preset training standard, and obtain the expected reward value corresponding to each preset group;

The preset group with the largest expected reward value is determined as the target group corresponding to the target patient.
The computer-readable storage medium according to claim 12, wherein the determining the first treatment plan of the target patient based on the characteristics of the population in the target group specifically comprises:

Screening, in the target group according to the target patient data, first patients whose population characteristics similarity to the target patient are greater than a first preset threshold, and the population characteristics include at least medical condition information and personal information;

Extracting the treatment plan corresponding to the first patient and the score value of the treatment plan with respect to the treatment effect, and determining the treatment plan with the score value greater than the second preset threshold as the first treatment plan; or

Obtain a preset treatment plan created according to the characteristics of the population of the target group, and determine the preset treatment plan as the first treatment plan.
A computer device includes a storage medium, a processor, and a computer program stored on the storage medium and running on the processor, wherein the processor implements the following steps when executing the program:

Creating a patient clustering model for processing time series data based on deep reinforcement learning DQN;

Training the patient clustering model by using sample data marked with clustering results, so that the patient clustering model meets a preset training standard;

Inputting target patient data within a preset time period into a patient grouping model that meets the preset training standard to obtain the target group to which the target patient belongs;

Determining the first treatment plan of the target patient based on the characteristics of the population in the target group;

Extracting the contraindicated drugs of the target patient according to the target patient data, and selecting a second treatment plan containing the contraindicated drugs from the first treatment plan;

Analyzing and obtaining the target treatment plan of the target patient according to the first treatment plan and the second treatment plan.
The computer device according to claim 14, wherein the creation of a patient grouping model for processing time series data based on deep reinforcement learning DQN specifically comprises:

Splitting the last fully connected layer in the network structure of the deep reinforcement learning DQN into a first fully connected layer, a second recurrent neural network layer, and a third recurrent neural network layer;

Constructing a patient grouping model by using the deep reinforcement learning DQN with the changed network structure, so that when patient data containing multiple time points is input into the patient grouping model, the first fully connected layer outputs the embedded value of the patient state at each time point, the second recurrent neural network layer outputs the first degree of attention corresponding to the patient state at each time point, the third recurrent neural network layer outputs the second degree of attention corresponding to the grouping result at each time point, and the expected reward value of the patient data for each preset group is calculated based on the embedded value, the first degree of attention, and the second degree of attention.
The computer device according to claim 15, wherein the sample data is time series data including a current time point and a preset number of historical time points;

The training of the patient clustering model using sample data marked with clustering results so that the patient clustering model meets a preset training standard specifically includes:

Inputting the sample data at the current time point and the historical time points into the patient grouping model to obtain the expected reward value of each sample data for each of a preset number of groups, wherein the expected reward value is obtained by calculating, for each time point, the first sum of the first degree of attention and the second degree of attention at that time point and the product of the first sum and the embedded value, and then accumulating the products obtained at the current time point and the historical time points;

Extracting the label group corresponding to the sample data, and determining the first expected reward value corresponding to the output of the label group as the training output result of the patient grouping model;

Calculate the mean square error loss between the first expected reward value and the real expected reward value, and if it is determined that the loss function reaches a convergent state based on the mean square error loss, it is determined that the patient grouping model meets a preset training standard;

If it is determined that the loss function has not reached the convergence state, the sample data is used to repeatedly train the patient grouping model, so that the patient grouping model meets the preset training standard.
The computer device according to claim 16, wherein said inputting target patient data within a preset time period into a patient grouping model that meets said preset training standard to obtain the target group to which the target patient belongs specifically comprises:

Extracting historical patient follow-up data and current patient follow-up data of the target patient within the preset time period;

Inputting the historical patient follow-up data and the current patient follow-up data into the patient grouping model that meets the preset training standard, and obtaining the expected reward value corresponding to each preset group;

Determining the preset group with the largest expected reward value as the target group corresponding to the target patient.
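
By way of illustration only, selecting the target group from the expected reward values output by the model can be sketched as follows. The function name and the dictionary representation of the model output are assumptions of this sketch.

```python
from typing import Dict


def select_target_group(expected_rewards: Dict[str, float]) -> str:
    """Return the preset group whose expected reward value is largest.

    `expected_rewards` maps each preset group identifier to the expected
    reward value produced by the trained model (hypothetical format).
    On ties, the group that appears first in the mapping is returned.
    """
    if not expected_rewards:
        raise ValueError("at least one preset group is required")
    return max(expected_rewards, key=expected_rewards.get)
```

For example, with expected reward values of 0.2, 0.9 and 0.5 for groups "A", "B" and "C", the sketch returns "B".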
18. The computer device according to claim 17, wherein the determining of the first treatment plan of the target patient based on the characteristics of the population in the target group specifically includes:

Screening, in the target group and according to the target patient data, first patients whose population characteristics have a similarity to those of the target patient greater than a first preset threshold, wherein the population characteristics include at least medical condition information and personal information;

Extracting the treatment plans corresponding to the first patients and the score value of each treatment plan with respect to treatment effect, and determining a treatment plan whose score value is greater than a second preset threshold as the first treatment plan; or

Obtaining a preset treatment plan created according to the population characteristics of the target group, and determining the preset treatment plan as the first treatment plan.
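
By way of illustration only, the first alternative above (screening similar patients and keeping well-scored plans) can be sketched as follows. The claim does not fix a similarity measure; the fraction of matching features used here, the `PatientRecord` type, and all field names are assumptions of this sketch.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PatientRecord:
    features: Dict[str, str]  # medical condition and personal information
    plan: str                 # treatment plan the patient received
    score: float              # score value of the plan's treatment effect


def similarity(a: Dict[str, str], b: Dict[str, str]) -> float:
    """Hypothetical measure: share of features on which both patients agree."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    matches = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return matches / len(keys)


def first_treatment_plans(target_features: Dict[str, str],
                          group: List[PatientRecord],
                          sim_threshold: float,
                          score_threshold: float) -> List[str]:
    """Plans of sufficiently similar patients whose score exceeds the threshold."""
    plans: List[str] = []
    for record in group:
        similar = similarity(target_features, record.features) > sim_threshold
        if similar and record.score > score_threshold and record.plan not in plans:
            plans.append(record.plan)
    return plans
```

Both comparisons are strict, matching the "greater than" wording of the thresholds.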
19. The computer device according to claim 18, wherein the extracting of contraindicated drugs of the target patient according to the target patient data, and the filtering out, from the first treatment plan, of a second treatment plan containing the contraindicated drugs, specifically includes:

Determining, according to drug contraindication data, a first contraindicated drug that is unsuitable for the population type to which the target patient belongs;

Determining, according to the drug allergy history in the target patient data, a second contraindicated drug to which the target patient has an allergic reaction;

Determining a first treatment plan that includes the first contraindicated drug and/or the second contraindicated drug as the second treatment plan.
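
By way of illustration only, the determination of contraindicated drugs and of the second treatment plan can be sketched as follows. The table format, the representation of a plan as a tuple of drug names, and the placeholder drug and population names are assumptions of this sketch.

```python
from typing import Dict, Iterable, List, Set, Tuple

Plan = Tuple[str, ...]  # a treatment plan as a tuple of drug names (assumed)


def contraindicated_drugs(population_types: Iterable[str],
                          allergy_history: Iterable[str],
                          contraindication_table: Dict[str, Set[str]]
                          ) -> Tuple[Set[str], Set[str]]:
    """Return (first, second) contraindicated drug sets for the target patient.

    first:  drugs unsuitable for the patient's population types, looked up in
            a hypothetical drug contraindication table.
    second: drugs listed in the patient's drug allergy history.
    """
    first: Set[str] = set()
    for population_type in population_types:
        first |= set(contraindication_table.get(population_type, ()))
    second = set(allergy_history)
    return first, second


def second_treatment_plans(first_plans: List[Plan],
                           first_drugs: Set[str],
                           second_drugs: Set[str]) -> List[Plan]:
    """First treatment plans containing any contraindicated drug."""
    banned = first_drugs | second_drugs
    return [plan for plan in first_plans if banned & set(plan)]
```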
20. The computer device according to claim 19, wherein the analyzing and obtaining of the target treatment plan of the target patient according to the first treatment plan and the second treatment plan specifically includes:

Excluding the second treatment plan from the first treatment plan to obtain the target treatment plan.
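
By way of illustration only, the exclusion step amounts to an order-preserving difference between the two plan lists. The function name and the tuple representation of a plan are assumptions of this sketch.

```python
from typing import List, Tuple

Plan = Tuple[str, ...]  # a treatment plan as a tuple of drug names (assumed)


def target_treatment_plans(first_plans: List[Plan],
                           second_plans: List[Plan]) -> List[Plan]:
    """Remove every second treatment plan from the first treatment plans.

    The original order of the remaining first treatment plans is kept.
    """
    excluded = set(second_plans)
    return [plan for plan in first_plans if plan not in excluded]
```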