CN111785366B

CN111785366B - Patient treatment scheme determination method and device and computer equipment

Info

Publication number: CN111785366B
Application number: CN202010602269.2A
Authority: CN
Inventors: 徐卓扬; 赵惟; 左磊; 孙行智; 胡岗
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2020-06-29
Filing date: 2020-06-29
Publication date: 2023-05-26
Anticipated expiration: 2040-06-29
Also published as: CN111785366A; WO2021151295A1

Abstract

The application discloses a method, an apparatus, and computer equipment for determining a patient treatment scheme. It relates to the field of digital medical treatment and addresses the problem that a patient treatment scheme generated online is inaccurate. The method comprises the following steps: creating, based on the deep reinforcement learning DQN, a patient grouping model for processing time series data; training the patient grouping model with sample data labeled with grouping results so that the model meets a preset training standard; inputting target patient data within a preset time period into the patient grouping model that meets the preset training standard, and obtaining the target group to which the target patient belongs; determining a first treatment scheme for the target patient based on the population characteristics within the target group; extracting the contraindicated drugs of the target patient from the target patient data, and screening out, from the first treatment scheme, a second treatment scheme containing the contraindicated drugs; and analyzing the first treatment scheme and the second treatment scheme to obtain the target treatment scheme of the target patient.

Description

Patient treatment scheme determination method and device and computer equipment

Technical Field

The present disclosure relates to the field of digital medical technology, and in particular, to a method and apparatus for determining a patient treatment plan, and a computer device.

Background

Deep reinforcement learning is a machine learning method that learns a mapping from environment states to actions. The optimal action is selected under the current policy according to the maximum feedback value, the action changes the state and yields a delayed feedback value, the value function is evaluated, and this loop is iterated until the learning condition is met, at which point learning terminates.

With the development of technology, deep reinforcement learning has gradually been applied in many fields, and work already exists that uses it for patient diagnosis. However, these methods often suffer from the following disadvantages: 1. In a patient diagnosis scenario, it matters which features a diagnostic decision relies on and how much each feature contributes to the result, but current models are difficult to interpret, so this information is not transparent. 2. Current models can take only a single follow-up record of a patient as input, and a single follow-up can hardly represent the patient's long-term state, so the analysis result is inaccurate.

Disclosure of Invention

In view of this, the present application provides a method, apparatus and computer device for determining a patient treatment plan, which mainly solves the problems that the interpretability of the feature contribution is weak and the analysis result is not accurate enough when the deep reinforcement learning technique is applied to patient diagnosis.

According to one aspect of the present application, there is provided a method of determining a patient treatment regimen, the method comprising:

creating a patient grouping model for processing time series data based on the deep reinforcement learning DQN;

training the patient grouping model by using sample data marked with the grouping result so that the patient grouping model accords with a preset training standard;

inputting target patient data in a preset time period into a patient grouping model which accords with the preset training standard, and obtaining a target group to which a target patient belongs;

determining a first treatment regimen for the target patient based on the population characteristics within the target group;

extracting a contraindicated drug of the target patient according to the target patient data, and screening a second treatment scheme containing the contraindicated drug from the first treatment scheme;

and analyzing and obtaining the target treatment scheme of the target patient according to the first treatment scheme and the second treatment scheme.

According to another aspect of the present application, there is provided a patient treatment regimen determination apparatus, the apparatus comprising:

a creation module for creating a patient grouping model for processing time series data based on the deep reinforcement learning DQN;

a training module for training the patient grouping model by using sample data marked with the grouping result so that the patient grouping model accords with a preset training standard;

the input module is used for inputting target patient data in a preset time period into a patient grouping model which accords with the preset training standard, and obtaining a target group to which a target patient belongs;

a determination module for determining a first treatment regimen for the target patient based on the population characteristics within the target group;

the extraction module is used for extracting contraindicated medicines of the target patient according to the target patient data and screening out second treatment schemes containing the contraindicated medicines from the first treatment schemes;

and the analysis module is used for analyzing and obtaining the target treatment scheme of the target patient according to the first treatment scheme and the second treatment scheme.

According to another aspect of the present application, there is provided a non-transitory readable storage medium having stored thereon a computer program which when executed by a processor implements the above-described method of determining a patient treatment regimen.

According to yet another aspect of the present application, there is provided a computer device comprising a non-volatile readable storage medium, a processor and a computer program stored on the non-volatile readable storage medium and executable on the processor, the processor implementing the method of determining a patient treatment regimen described above when executing the program.

By means of the above technical scheme, and in contrast to current patient diagnosis approaches, the method, apparatus, and computer device for determining a patient treatment regimen provided by the present application propose an interpretable network structure for the deep reinforcement learning model DQN. With this structure, a patient grouping model for processing time series data is created and then trained with sample data until it meets a preset training standard. Target patient data within a preset time period is input into the patient grouping model that meets the preset training standard to obtain the target grouping result, and a first treatment regimen of the target patient is determined from the population characteristics of the target group. To further improve diagnostic safety, the contraindicated drugs of the target patient may also be determined from the target patient data, so that a second treatment regimen containing the contraindicated drugs can be screened out of the first treatment regimen. Finally, a target treatment regimen suitable for the target patient is obtained by analyzing the first treatment regimen and the second treatment regimen. In addition, the present application digitizes the determination of patient treatment regimens: because the calculation of the expected reward value Q is expanded into a time series structure, more information can be taken into account, and combining artificial intelligence with deep learning algorithms can make the analysis result more accurate. An Attention mechanism is also added to the calculation of the expected reward value, which makes the contribution of patient features interpretable.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the present application. In the drawings:

FIG. 1 illustrates a flow chart of a method of determining a patient treatment regimen provided in an embodiment of the present application;

FIG. 2 illustrates a flow chart of another method of determining a patient treatment regimen provided in an embodiment of the present application;

FIG. 3 illustrates a network architecture diagram of a patient cluster model provided in an embodiment of the present application;

FIG. 4 shows a schematic structural diagram of a patient treatment protocol determination device provided in an embodiment of the present application;

FIG. 5 shows a schematic structural diagram of another patient treatment regimen determination device provided in an embodiment of the present application.

Detailed Description

The present application will be described in detail hereinafter with reference to the accompanying drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments and features of the embodiments in the present application may be combined with each other.

Aiming at the problems that the interpretation of the characteristic contribution is weak and the analysis result is not accurate enough when the deep reinforcement learning technology is applied to the diagnosis of a patient, the embodiment of the application provides a method for determining the treatment scheme of the patient, as shown in fig. 1, the method comprises the following steps:

101. A patient grouping model for processing time series data is created based on the deep reinforcement learning DQN.

In this embodiment, the conventional deep reinforcement learning DQN model is improved by expanding it into a time series model and adding an Attention mechanism. The improved DQN model is then used to group patients, so that it can process time series data and make the contribution of patient features interpretable.

102. Train the patient grouping model by using sample data labeled with the grouping result so that the patient grouping model meets a preset training standard.

In a specific application scenario, a grouping decision rule can be preset, and the group to which each piece of sample data belongs is determined according to this rule. Each piece of sample data is then labeled with its grouping result. The label serves as a reference for checking the result that the patient grouping model outputs for that sample data, and thus for judging the training state of the model: if the error between the model output and the labeled result is sufficiently small, the patient grouping model can be judged to meet the preset training standard.

103. Inputting the target patient data in a preset time period into a patient grouping model which accords with a preset training standard, and obtaining a target group to which the target patient belongs.

The preset time period can be set according to actual application requirements, for example, the preset time period can be set to be in the previous month including the current moment, and the corresponding historical target patient data are one or more pieces of follow-up data about the target patient recorded in the preset time period.

In a specific application scenario, when only a single follow-up record of the patient is taken as input, it can hardly represent the patient's long-term state, which easily makes the analysis result inaccurate. Therefore, in this embodiment, in addition to the patient follow-up data at the current time, all historical patient follow-up data within the preset time period can be input, and a more accurate target grouping result is determined by integrating the outputs for these follow-up records. Furthermore, based on the Attention mechanism, the degree of contribution, attention coefficient, contribution ratio, and the like of each feature at each time point to the grouping result can be explained.

104. A first treatment regimen for the target patient is determined based on the population characteristics within the target group.

In a specific application scenario, after the target patient has been assigned to a group, patients whose population characteristics are highly similar to those of the target patient can be identified from the population information of that group, and the first treatment scheme available to the target patient can then be screened from the treatment schemes already generated for those patients.

105. Extract the contraindicated drugs of the target patient according to the target patient data, and screen out, from the first treatment scheme, the second treatment scheme containing the contraindicated drugs.

In a specific application scenario, different patients may have different contraindicated drugs. The contraindicated drugs of the target patient are therefore extracted first, and the second treatment scheme, i.e., the part of the first treatment scheme that contains a contraindicated drug, is screened out so that it is not considered when the treatment scheme recommendation is finally generated.

106. Analyze the first treatment scheme and the second treatment scheme to obtain the target treatment scheme of the target patient.

In a specific application scenario, after the first treatment scheme and the second treatment scheme are determined, the second treatment scheme is removed from the first treatment scheme, and what remains of the first treatment scheme is determined as the target treatment scheme of the target patient.

By the method for determining a patient treatment plan in this embodiment, a network structure for improving a deep reinforcement learning model DQN is proposed so as to create a patient clustering model for processing time series data, and then the patient clustering model is trained by using sample data so as to reach a preset training standard. Inputting the target patient data in a preset time period into a patient grouping model which accords with a preset training standard, obtaining a target grouping result, and determining a first treatment scheme of the target patient by using crowd characteristics in a target group; further to enhance diagnostic safety, contraindicated drugs for the target patient may also be determined based on the target patient data to screen a second treatment regimen comprising the contraindicated drugs from the first treatment regimen; finally, the first treatment scheme and the second treatment scheme can be utilized to analyze and obtain a target treatment scheme suitable for the target patient. In addition, in the application, the digitization processing of the patient treatment scheme can be realized, the calculation process of the expected reward value Q is expanded into a time sequence structure, more information can be considered, and the analysis result can be more accurate by integrating artificial intelligence and a deep learning algorithm.

Further, as a refinement and extension of the foregoing embodiment, for a complete description of the implementation in this embodiment, another method for determining a patient treatment regimen is provided, as shown in fig. 2, which includes:

201. A patient grouping model for processing time series data is created based on the deep reinforcement learning DQN.

In a specific application scenario, step 201 may specifically include: splitting the last fully connected layer in the network structure of the deep reinforcement learning DQN into a first fully connected layer, a second recurrent neural network layer, and a third recurrent neural network layer; and constructing the patient grouping model from the DQN with the modified network structure, so that when patient data comprising a plurality of time points is input into the patient grouping model, the first fully connected layer outputs the embedded value of the patient state at each time point, the second recurrent neural network layer outputs the first attention, i.e., the attention to the patient state at each time point, the third recurrent neural network layer outputs the second attention, i.e., the attention to the grouping result at each time point, and the expected reward value of the patient data for each preset group is calculated from the embedded values, the first attention, and the second attention.

For example, as shown in the network structure diagram of the patient grouping model in FIG. 3, the abstract features extracted by the convolution layer are split into three branches; that is, the last fully connected layer in the network structure of the deep reinforcement learning DQN is split into a first fully connected layer 1, a second recurrent neural network layer 2, and a third recurrent neural network layer 3. The first fully connected layer 1 outputs the embedded value of the patient state at each time point; the second recurrent neural network layer 2 is a state value function (value function) that outputs the first attention of the patient state at each time point; and the third recurrent neural network layer 3 is an action advantage function (advantage function) that outputs the second attention of the grouping result at each time point.

In a specific application scenario, in order to monitor a training state of a patient clustering model when the patient clustering model is trained by using sample data, it is necessary to perform labeling of a group to which the sample data belongs in advance, specifically including: grouping the sample data according to a preset grouping decision rule to obtain grouping results corresponding to each sample data; sample data is marked based on the clustering result.

The preset grouping decision rule can be set according to actual requirements, for example, the grouping decision rule can be set according to personal characteristic information of patients and is divided by combining with inspection index information. When the grouping is performed, patients with high similarity of personal characteristic information of the patients and the same examination indexes and the same examination results can be divided into one group.
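The labeling step described above can be sketched as follows. This is only an illustration under stated assumptions: the rule used here (age band, sex, and identical examination results) and the field names `age`, `sex`, and `exams` are hypothetical, since the text leaves the concrete grouping decision rule to the application.

```python
def group_key(patient):
    """Hypothetical grouping decision rule: age band, sex, and examination results."""
    age_band = patient["age"] // 10 * 10            # e.g. 47 -> 40
    exams = tuple(sorted(patient["exams"].items()))  # order-independent exam results
    return (age_band, patient["sex"], exams)


def label_samples(samples):
    """Label each sample with a group index; samples with equal keys share a group."""
    key_to_group = {}
    labels = []
    for patient in samples:
        key = group_key(patient)
        if key not in key_to_group:
            key_to_group[key] = len(key_to_group)
        labels.append(key_to_group[key])
    return labels


samples = [
    {"age": 47, "sex": "F", "exams": {"blood_glucose": "high", "blood_pressure": "normal"}},
    {"age": 42, "sex": "F", "exams": {"blood_pressure": "normal", "blood_glucose": "high"}},
    {"age": 65, "sex": "M", "exams": {"blood_glucose": "normal", "blood_pressure": "high"}},
]
labels = label_samples(samples)
print(labels)
```

The first two samples fall into the same group because they agree on all three components of the assumed rule; a real rule could instead use a similarity measure over the personal characteristic information.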

202. Sample data at the current time point and the historical time point are input into a patient grouping model, a preset number of groups are obtained, and expected reward values of the sample data corresponding to the groups are obtained.

The sample data is time series data comprising a current time point and a preset number of historical time points, and may include patient data at the current and historical time points. The patient data may include personal identity information (such as name, gender, and age), treatment scheme information (such as medication combination, medication period, and dosage), examination index information (such as examination indexes of blood sugar, blood pressure, and electrocardiogram together with the corresponding examination results), and the like. The expected reward value is obtained by calculating, for each time point, the first sum of the first attention and the second attention, multiplying the first sum by the embedded value of the same time point, and accumulating these products over the current time point and the historical time points.

For example, as shown in the network structure diagram of the patient grouping model in FIG. 3, if the current patient state ($s_3$) and the patient states at two historical time points ($s_1$, $s_2$) are input, the fully connected layer and the two recurrent neural network layers of the patient grouping model yield, for each time point, the embedded values ($e_1$, $e_2$, $e_3$) output by the first fully connected layer together with V ($V_1$, $V_2$, $V_3$) and A ($A_1$, $A_2$, $A_3$). V and A of the same time step are added, the sum is multiplied element-wise by e, and the products are accumulated over the time steps to calculate the Q value ($Q_3$). Here V represents the degree of attention to the patient state at each time point; A represents the degree of attention to the grouping result at each time point; and e represents the embedded representation of the patient state. The calculation formula of each layer is as follows:

$$h_{V1}, h_{V2}, h_{V3} = \text{LSTM-V}(s_1, s_2, s_3)$$

$$h_{A1}, h_{A2}, h_{A3} = \text{LSTM-A}(s_1, s_2, s_3)$$

$$V_1, V_2, V_3 = (w_V \cdot h_{V1},\ w_V \cdot h_{V2},\ w_V \cdot h_{V3})$$

$$A_1, A_2, A_3 = (W_A h_{A1},\ W_A h_{A2},\ W_A h_{A3})$$

$$v_1, v_2, v_3 = (W_I s_1,\ W_I s_2,\ W_I s_3)$$

$$e_1, e_2, e_3 = (W_{II} v_1,\ W_{II} v_2,\ W_{II} v_3)$$

$$Q_3 = e_1 \odot (V_1 + A_1) + e_2 \odot (V_2 + A_2) + e_3 \odot (V_3 + A_3)$$

where $s_i$, $h_{Vi}$, $w_V$, $h_{Ai}$, $A_i$, $v_i$, $e_i$, and $Q_3$ are vectors, $V_i$ is a scalar, $W_A$, $W_I$, and $W_{II}$ are matrices, and $\odot$ denotes element-wise multiplication.
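To make the accumulation over time points concrete, the following minimal sketch evaluates the formula for the expected reward value on toy numbers. It is not the patented implementation: the two LSTM branches are not modeled, so the attentions `V` and `A` are simply supplied as given values, and all weights and states are made-up illustrative data.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def expected_reward(states, V, A, W_I, W_II):
    """Q = sum over time points i of e_i (element-wise *) (V_i + A_i), e_i = W_II (W_I s_i)."""
    num_groups = len(W_II)
    q = [0.0] * num_groups
    for s_i, V_i, A_i in zip(states, V, A):
        e_i = matvec(W_II, matvec(W_I, s_i))    # embedded value of the patient state
        for k in range(num_groups):
            q[k] += e_i[k] * (V_i + A_i[k])     # scalar V_i is added to every entry of A_i
    return q


# Toy data: 3 time points (s_1, s_2, s_3), 2 features, 2 preset groups.
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [0.5, 1.0, 2.0]                        # first attention, one scalar per time point
A = [[0.5, 0.0], [0.0, 1.0], [1.0, 0.0]]   # second attention, one vector per time point
W_I = [[1.0, 0.0], [0.0, 1.0]]
W_II = [[1.0, 2.0], [3.0, 4.0]]

Q3 = expected_reward(states, V, A, W_I, W_II)
print(Q3)
```

Each entry of `Q3` is the expected reward value of one preset group; in a trained model `V` and `A` would be produced by the two recurrent branches rather than fixed by hand.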

It should be noted that the present application also incorporates an Attention mechanism, so that the contribution of patient features can be interpreted. The model decision can be interpreted as follows: from all inputs $s_i$, the contribution of each patient feature at each time point to the final Q value can be derived by following the forward computation.

According to the calculation formula of the expected reward value ($Q_3$):

$$Q_3 = e_1 \odot (V_1 + A_1) + e_2 \odot (V_2 + A_2) + e_3 \odot (V_3 + A_3)$$

$$= (W_{II} W_I s_1) \odot (V_1 + A_1) + (W_{II} W_I s_2) \odot (V_2 + A_2) + (W_{II} W_I s_3) \odot (V_3 + A_3)$$

it can be seen that the importance of the j-th feature at the i-th time point to the k-th Q value is:

$$w(i, j, k) = (V_i + A_i[k]) \cdot (W_{II}[k] \cdot W_I[j]) \cdot s_i[j]$$

where $W_{II}[k]$ is the k-th row of $W_{II}$, $W_I[j]$ is the j-th column of $W_I$, and $(V_i + A_i[k]) \cdot (W_{II}[k] \cdot W_I[j])$ is the contribution coefficient, i.e., the degree of attention.
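As a check on this decomposition, the sketch below computes w(i, j, k) on toy numbers and sums the contributions for one group. The data are made up for illustration, indices are 0-based in the code, and `W_II[k]` is read as a row while `W_I[j]` is read as a column, which is what expanding the matrix product requires.

```python
def contribution(i, j, k, states, V, A, W_I, W_II):
    """w(i, j, k) = (V_i + A_i[k]) * (row k of W_II . column j of W_I) * s_i[j]."""
    row_k = W_II[k]
    col_j = [row[j] for row in W_I]
    weight = sum(a * b for a, b in zip(row_k, col_j))
    return (V[i] + A[i][k]) * weight * states[i][j]


# Toy data: 3 time points, 2 features, 2 preset groups (illustrative values only).
states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [0.5, 1.0, 2.0]
A = [[0.5, 0.0], [0.0, 1.0], [1.0, 0.0]]
W_I = [[1.0, 0.0], [0.0, 1.0]]
W_II = [[1.0, 2.0], [3.0, 4.0]]

# Contributions of every (time point, feature) pair to the Q value of group 0.
table_k0 = [[contribution(i, j, 0, states, V, A, W_I, W_II) for j in range(2)]
            for i in range(3)]
total_k0 = sum(sum(row) for row in table_k0)
print(table_k0, total_k0)
```

Summing w(i, j, k) over all time points and features gives the k-th entry of the expected reward value, so each entry of the table can be read as the share of that Q value attributable to one feature at one time point.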

203. Extract the labeled group corresponding to the sample data, and determine the first expected reward value output for the labeled group as the training output result of the patient grouping model.

For the embodiment, in a specific application scenario, each sample data corresponds to only one label group, and when the sample data is input into the patient grouping model, the expected reward value corresponding to each preset group is obtained, so that in order to verify the training process of the patient grouping model, only the first expected reward value corresponding to the label group is extracted, and the first expected reward value is determined as the training output result of the patient grouping model.

204. Calculate the mean square error loss between the first expected reward value and the true expected reward value, and if it is judged from the mean square error loss that the loss function has reached a convergence state, determine that the patient grouping model meets the preset training standard.

The first expected reward value is the expected reward value of the labeled group for the current patient state. The true expected reward value is the maximum expected reward value of the next patient state plus the actually obtained reward; the true expected reward value of the corresponding labeled group is calculated in this way.

For the embodiment, in a specific application scenario, after the first expected reward value is extracted, the mean square error loss needs to be calculated according to the first expected reward value and the actual expected reward value, whether the loss function reaches the convergence state is further determined, and when the loss function reaches the convergence state, it can be determined that the patient grouping model meets the preset training standard.
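The loss calculation of step 204 can be sketched as follows. This is a minimal illustration with made-up numbers: the target follows the text (maximum expected reward value of the next state plus the obtained reward, with no discount factor, since the text does not mention one), and the convergence test is a hypothetical criterion chosen for the example because the text does not define one.

```python
def true_expected_reward(next_q_values, reward):
    """Target value: largest expected reward of the next patient state plus the reward."""
    return max(next_q_values) + reward


def mse_loss(predicted, targets):
    """Mean square error between first expected reward values and true values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(predicted)


def has_converged(loss_history, tolerance=1e-4):
    """Hypothetical criterion: the last two losses differ by less than the tolerance."""
    return len(loss_history) >= 2 and abs(loss_history[-1] - loss_history[-2]) < tolerance


# First expected reward values of the labeled group for two samples (toy values).
first_q = [1.0, 2.5]
targets = [
    true_expected_reward([0.5, 1.5], reward=0.5),
    true_expected_reward([2.0, 1.0], reward=1.0),
]
loss = mse_loss(first_q, targets)
print(targets, loss)
```

If `has_converged` returns False for the recorded losses, training would be repeated with the sample data as described in step 205.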

205. If the loss function is judged to not reach the convergence state, the patient grouping model is repeatedly trained by using the sample data, so that the patient grouping model accords with a preset training standard.

Correspondingly, if the loss function is judged to not reach the convergence state, the fact that the patient grouping model is not trained successfully can be determined, and the training steps are repeated by utilizing the sample data so that the patient grouping model meets the preset training standard.

206. Inputting the target patient data in a preset time period into a patient grouping model which accords with a preset training standard, and obtaining a target group to which the target patient belongs.

When the target patient information is time sequence data, all the target patient information at the current moment and the historical moment are required to be input into a patient grouping model, and a grouping result is obtained; when the target patient information is not time sequence data, the current time target patient information is only required to be input into the patient grouping model, and the parameter value corresponding to the historical time point in the patient grouping model is set to be 0, so that the grouping result can be obtained.

For the present embodiment, in a specific application scenario, when the target patient information is time-series data, the embodiment step 206 may specifically include: extracting historical patient follow-up data and current patient follow-up data of a target patient in a preset time period; inputting historical patient follow-up data and current patient follow-up data into a patient grouping model which accords with preset training standards, and obtaining expected reward values corresponding to each preset group; and determining the preset group with the maximum expected reward value as a target group corresponding to the target patient.
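The final selection in step 206 amounts to taking the preset group with the largest expected reward value. A minimal sketch, assuming the expected reward values have already been produced by the model and are passed in as a plain list (the numbers below are illustrative):

```python
def select_target_group(expected_rewards):
    """Return the index of the preset group with the largest expected reward value."""
    return max(range(len(expected_rewards)), key=lambda k: expected_rewards[k])


target_group = select_target_group([12.0, 23.5, 7.25])
print(target_group)
```

When two groups tie, this sketch returns the one with the lower index; the text does not specify a tie-breaking rule.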

207. A first treatment regimen for the target patient is determined based on the population characteristics within the target group.

For the present embodiment, in a specific application scenario, in order to determine the first treatment regimen of the target patient, the embodiment step 207 may specifically include: screening a first patient with crowd characteristic similarity larger than a first preset threshold value corresponding to a target patient in a target group according to target patient data, wherein the crowd characteristic at least comprises illness state information and personal information; extracting a corresponding treatment scheme of the first patient, and determining a treatment scheme with a score value larger than a second preset threshold value as the first treatment scheme, wherein the score value of the treatment scheme is related to the treatment effect; or acquiring a preset treatment scheme established according to the crowd characteristics of the target group, and determining the preset treatment scheme as a first treatment scheme.

The target group contains data of a plurality of sample patients. This data includes feature information of multiple dimensions, such as personal identity information, examination index information, and diagnosis result information, and may also include score information on treatment effect and treatment scheme information such as medication combination, medication period, and dosage. The first preset threshold and the second preset threshold both take values greater than 0 and less than or equal to 1, and the specific values can be set according to the application scenario. The closer the first preset threshold is to 1, the higher the feature similarity between the screened first patient and the target patient; the closer the second preset threshold is to 1, the better the treatment effect reported by patients for the screened first treatment scheme.

In a specific application scenario, after grouping of target patients is completed, characteristic information of multiple dimensions such as personal identity information, inspection index information, diagnosis result information and the like of the target patients can be extracted from target patient information in advance, then a first patient with the characteristic information matching degree larger than a first preset threshold value with the target patients is screened out from the target group, further a treatment scheme with the score value of the corresponding treatment effect of the first patient larger than a second preset threshold value is extracted, and the treatment scheme is determined to be a first treatment scheme.

For example, suppose that screening the target group according to the target patient data yields four first patients, A, B, C, and D, whose population characteristic similarity to the target patient is greater than the first preset threshold. The medication combination of first patient A is a+c+d, that of B is a+c+e, that of C is a+b+c, and that of D is a+c+d. Statistics then give three distinct treatment schemes: a+c+d, a+c+e, and a+b+c. The score values of these three schemes for treatment effect are then obtained, for example 0.75 for a+c+d, 0.91 for a+c+e, and 0.88 for a+b+c. If the second preset threshold is set to 0.85, the screened first treatment schemes are a+c+e and a+b+c.

As another alternative in this embodiment, the preset treatment scheme of each target group may be determined in advance from the population characteristics of the group and the doctors' diagnosis results. For example, when the patients in the target group are children, the doctors' diagnosis result is disease a, and the commonly adopted treatment schemes are A and B, treatment schemes A and B can be directly determined as the preset treatment schemes of the target group; when the target patient is determined to belong to this target group, treatment schemes A and B are determined as the first treatment schemes of the target patient.
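The two-threshold screening of step 207 can be sketched with the four first patients of this example. The similarity measure (share of identical feature values), the feature names, the first preset threshold of 0.7, and the extra patient E are assumptions added for illustration; the medication combinations, the score values, and the second preset threshold of 0.85 follow the example.

```python
def similarity(target_features, other_features):
    """Hypothetical similarity: share of the target's features with identical values."""
    matches = sum(1 for key, value in target_features.items()
                  if other_features.get(key) == value)
    return matches / len(target_features)


def screen_first_regimens(target_features, group, scores, sim_threshold, score_threshold):
    """Keep distinct regimens of sufficiently similar patients whose score is high enough."""
    selected = []
    for patient in group:
        if similarity(target_features, patient["features"]) <= sim_threshold:
            continue  # not a "first patient"
        regimen = patient["regimen"]
        if scores[regimen] > score_threshold and regimen not in selected:
            selected.append(regimen)
    return selected


target = {"age_band": 40, "sex": "F", "diagnosis": "disease_1", "stage": "early"}
group = [
    {"name": "A", "features": dict(target), "regimen": "a+c+d"},
    {"name": "B", "features": dict(target, age_band=50), "regimen": "a+c+e"},
    {"name": "C", "features": dict(target), "regimen": "a+b+c"},
    {"name": "D", "features": dict(target, sex="M"), "regimen": "a+c+d"},
    {"name": "E", "regimen": "a+b+d",
     "features": {"age_band": 70, "sex": "M", "diagnosis": "disease_2", "stage": "early"}},
]
scores = {"a+c+d": 0.75, "a+c+e": 0.91, "a+b+c": 0.88, "a+b+d": 0.95}

first = screen_first_regimens(target, group, scores,
                              sim_threshold=0.7, score_threshold=0.85)
print(first)
```

Patient E is dropped by the similarity threshold even though its regimen has the highest score, and regimen a+c+d is dropped by the score threshold even though two similar patients use it.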

208. And extracting the contraindicated drugs of the target patient according to the target patient data, and screening out the second treatment scheme containing the contraindicated drugs from the first treatment scheme.

For the present embodiment, in a specific application scenario, in order to determine that the second treatment regimen containing the contraindicated drug of the target patient is obtained, embodiment step 208 may specifically include: determining a first tabu medicament which is unsuitable for taking and corresponds to the crowd type of the target patient according to the medication tabu data; determining a second tabu drug of the target patient with anaphylactic reaction according to the drug allergy history in the target patient data; a first treatment regimen comprising a first contraindicated drug and/or a second contraindicated drug is determined as a second treatment regimen.

For example, when the target patient is a pregnant woman, the first contraindicated drugs of the target patient may be the drugs prohibited for pregnant women; when the target patient is allergic to penicillin, penicillin drugs can be determined to be second contraindicated drugs of the target patient.
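
Step 208 can be sketched as a set operation. The following Python fragment is a hedged illustration only: the function names, the lookup table keyed by crowd type, and the label `drug_x` (standing in for an unspecified drug prohibited for pregnant women) are hypothetical and do not come from the application.

```python
from typing import Dict, FrozenSet, List, Set

Regimen = FrozenSet[str]  # assumption: a regimen is a set of drug labels


def contraindicated_drugs(
    crowd_type: str,
    allergy_history: Set[str],
    contraindication_table: Dict[str, Set[str]],
) -> Set[str]:
    """Union of the crowd-type contraindications and the patient's allergies."""
    first = contraindication_table.get(crowd_type, set())  # first contraindicated drugs
    second = allergy_history                                # second contraindicated drugs
    return first | second


def screen_second_regimens(
    first_regimens: List[Regimen], banned: Set[str]
) -> List[Regimen]:
    """First regimens that contain at least one contraindicated drug."""
    return [r for r in first_regimens if r & banned]


# Hypothetical data: a pregnant patient with a penicillin allergy.
first_regimens = [
    frozenset({"a", "penicillin"}),
    frozenset({"a", "drug_x"}),
    frozenset({"a", "c"}),
]
banned = contraindicated_drugs(
    "pregnant", {"penicillin"}, {"pregnant": {"drug_x"}}
)
second_regimens = screen_second_regimens(first_regimens, banned)
```

The "and/or" condition of the step corresponds to the non-empty intersection test: a regimen is flagged if it contains either kind of contraindicated drug.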

209. Analyzing and obtaining a target treatment scheme of the target patient according to the first treatment scheme and the second treatment scheme.

For the present embodiment, in a specific application scenario, step 209 may specifically include: eliminating the second treatment scheme from the first treatment scheme to obtain the target treatment scheme.

For example, eliminating the prohibited second treatment schemes from the first treatment schemes that treat the condition of the target patient leaves the set of treatment schemes that suit the health status of the target patient, and these remaining schemes can be used to treat the condition of the target patient effectively.
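
The elimination in step 209 amounts to a set difference. A minimal Python sketch, assuming each treatment scheme is represented as a set of drug labels and using the hypothetical function name `eliminate`:

```python
from typing import FrozenSet, List

Regimen = FrozenSet[str]  # assumption: a scheme is a set of drug labels


def eliminate(
    first_regimens: List[Regimen], second_regimens: List[Regimen]
) -> List[Regimen]:
    """Remove every second (contraindicated) scheme from the first schemes."""
    excluded = set(second_regimens)
    # Keep the original order of the first schemes for readability.
    return [r for r in first_regimens if r not in excluded]


first = [frozenset({"a", "c", "e"}), frozenset({"a", "b", "c"})]
second = [frozenset({"a", "b", "c"})]
target_regimens = eliminate(first, second)
```

If every first scheme is contraindicated, the result is an empty list; the description does not state how that case is handled, so the sketch simply returns it unchanged.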

With the above method for determining a patient treatment regimen, an interpretable network structure for the deep reinforcement learning model DQN is proposed, a patient grouping model for processing time series data is created from it, and the patient grouping model is then trained with sample data until it meets a preset training standard. Target patient data within a preset time period is input into the patient grouping model that meets the preset training standard to obtain a target grouping result, and a first treatment scheme of the target patient is determined using the crowd characteristics in the target group. To further enhance diagnostic safety, the contraindicated drugs of the target patient can also be determined from the target patient data, so that a second treatment scheme containing the contraindicated drugs is screened out of the first treatment scheme. Finally, the target treatment scheme suitable for the target patient is obtained by analysis from the first treatment scheme and the second treatment scheme. In this application, the patient treatment scheme is processed digitally, and the calculation of the expected reward value Q is expanded into a time-series structure, so that more information is taken into account and, by combining artificial intelligence with a deep learning algorithm, the analysis result can be more accurate. In addition, an attention mechanism is added to the calculation of the expected reward value, which provides a certain degree of interpretability.

Further, as an embodiment of the method shown in fig. 1 and 2, an embodiment of the present application provides a device for determining a treatment plan of a patient, as shown in fig. 4, where the device includes: a creation module 31, a training module 32, an input module 33, a determination module 34, an extraction module 35, an analysis module 36.

A creation module 31 operable to create a patient grouping model for processing time series data based on the deep reinforcement learning DQN;

the training module 32 is configured to train the patient grouping model by using the sample data marked with the grouping result, so that the patient grouping model meets a preset training standard;

the input module 33 is configured to input target patient data in a preset time period into a patient grouping model that meets a preset training standard, and obtain a target grouping result;

the determining module 34 may be configured to determine a first treatment regimen of the target patient based on the crowd characteristics within the target group;

an extraction module 35, configured to extract a contraindicated drug of the target patient according to the target patient data, and screen a second treatment regimen including the contraindicated drug from the first treatment regimen;

the analysis module 36 is operable to analyze the target treatment regimen for the target patient in accordance with the first treatment regimen and the second treatment regimen.

In a specific application scenario, in order to create a patient grouping model for processing time series data, as shown in fig. 5, the creation module 31 may specifically include: splitting unit 311, constructing unit 312;

the splitting unit 311 may be configured to split the last fully connected layer in the network structure corresponding to the deep reinforcement learning DQN into a first fully connected layer, a second recurrent neural network layer, and a third recurrent neural network layer;

the construction unit 312 may be configured to construct a patient clustering model using the deep reinforcement learning DQN with the modified network structure, so that when patient data including a plurality of time points is input to the patient clustering model, an embedded value corresponding to a patient state at each time point is output by the first fully-connected layer, a first attention corresponding to the patient state at each time point is output by the second recurrent neural network layer, a second attention corresponding to the clustering result at each time point is output by the third recurrent neural network layer, and an expected reward value corresponding to each preset group of the patient data is calculated based on the embedded value, the first attention, and the second attention.
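
How the three layer outputs combine into an expected reward value can be illustrated with a small pure-Python sketch. The combination rule (sum the two attentions at each time point, multiply by the embedded value, accumulate over time points) follows the description of the first input unit 321. The shapes are assumptions made here for illustration: the embedded value at each time point is taken to be a vector with one entry per preset group, and both attentions are taken to be scalars per time point. The function name `expected_rewards` is hypothetical.

```python
from typing import List, Sequence


def expected_rewards(
    embedded: Sequence[Sequence[float]],  # embedded[t][g]: first fully connected layer
    first_attention: Sequence[float],     # first_attention[t]: attention on patient state
    second_attention: Sequence[float],    # second_attention[t]: attention on grouping result
) -> List[float]:
    """Accumulate (first + second attention) * embedded value over time points."""
    if not (len(embedded) == len(first_attention) == len(second_attention)):
        raise ValueError("all inputs must cover the same time points")
    if not embedded:
        return []
    num_groups = len(embedded[0])
    q = [0.0] * num_groups
    for e_t, a1, a2 in zip(embedded, first_attention, second_attention):
        weight = a1 + a2  # the "first sum" at this time point
        for g in range(num_groups):
            q[g] += weight * e_t[g]
    return q


# Two time points (one historical, one current) and two preset groups.
q_values = expected_rewards(
    embedded=[[1.0, 2.0], [3.0, 1.0]],
    first_attention=[0.25, 0.5],
    second_attention=[0.25, 0.5],
)
```

Because each time point contributes a separately weighted term, the per-time-point weights can be inspected afterwards, which is the sense in which the attention outputs support interpretability.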

Accordingly, for training to obtain a patient grouping model that meets the preset training criteria, as shown in fig. 5, the training module 32 may specifically include: a first input unit 321, a first extraction unit 322, a calculation unit 323, a training unit 324;

The first input unit 321 is configured to input sample data at the current time point and the historical time points into the patient grouping model, and obtain a preset number of groups and the expected reward value of the sample data for each group, where the expected reward value is obtained as follows: for each time point, the first sum of the first attention and the second attention at that time point is calculated, the first sum is multiplied by the embedded value at that time point, and the resulting products are accumulated over the current time point and the historical time points;

a first extraction unit 322, configured to extract the marked group corresponding to the sample data, and determine the first expected reward value output for the marked group as the training output result of the patient grouping model;

the calculating unit 323 is configured to calculate the mean square error loss between the first expected reward value and the real expected reward value, and, if it is judged from the mean square error loss that the loss function has reached a convergence state, determine that the patient grouping model meets the preset training standard;

the training unit 324 may be configured to repeatedly train the patient classification model by using the sample data if it is determined that the loss function does not reach the convergence state, so that the patient classification model meets a preset training standard.
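
The train-until-convergence logic of units 323 and 324 can be sketched as follows. This is a deliberately tiny stand-in, not the patient grouping model: a one-parameter linear predictor replaces the modified DQN, plain gradient descent replaces whatever optimizer is actually used, and "convergence" is assumed to mean that the mean square error loss changes by less than a tolerance between epochs. The description does not define the convergence criterion, and all names and hyperparameters here are hypothetical.

```python
from typing import List, Sequence, Tuple


def mse_loss(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Mean square error between predicted and real expected reward values."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)


def train_until_converged(
    xs: Sequence[float],
    targets: Sequence[float],
    lr: float = 0.05,
    tol: float = 1e-9,
    max_epochs: int = 10_000,
) -> Tuple[float, float, int]:
    """Repeat training on the same samples until the loss stops changing."""
    w = 0.0                # single weight of the stand-in model
    prev = float("inf")
    loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        preds: List[float] = [w * x for x in xs]
        loss = mse_loss(preds, targets)
        if abs(prev - loss) < tol:   # assumed convergence test
            return w, loss, epoch
        # Gradient of the MSE with respect to w.
        grad = sum(2 * (p - t) * x for p, t, x in zip(preds, targets, xs)) / len(xs)
        w -= lr * grad
        prev = loss
    return w, loss, max_epochs


weight, final_loss, epochs = train_until_converged([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

The `max_epochs` cap is a design choice added here so that the loop terminates even if the loss never stabilizes.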

In a specific application scenario, in order to determine the target group to which the target patient corresponds, as shown in fig. 5, the input module 33 may specifically include: a second extraction unit 331, a second input unit 332, a first determination unit 333;

The second extracting unit 331 may be configured to extract historical patient follow-up data and current patient follow-up data of the target patient in a preset period of time;

the second input unit 332 is configured to input the historical patient follow-up data and the current patient follow-up data into a patient grouping model that meets a preset training standard, and obtain an expected reward value corresponding to each preset group;

the first determining unit 333 may be configured to determine a preset group with the largest expected reward value as a target group corresponding to the target patient.
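
Selecting the target group is an argmax over the expected reward values. A minimal Python sketch, with hypothetical group names and values chosen only for illustration:

```python
from typing import Dict


def select_target_group(expected_rewards: Dict[str, float]) -> str:
    """Return the preset group with the largest expected reward value."""
    if not expected_rewards:
        raise ValueError("no preset groups were scored")
    return max(expected_rewards, key=lambda group: expected_rewards[group])


# Hypothetical model output for one target patient.
rewards = {"group_1": 0.42, "group_2": 0.87, "group_3": 0.15}
target_group = select_target_group(rewards)
```

If two groups tie for the maximum, `max` returns the first one encountered; the description does not specify a tie-breaking rule.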

In a specific application scenario, in order to determine the first treatment regimen of the target patient based on the target grouping result, as shown in fig. 5, the determining module 34 may specifically include: a screening unit 341, a second determining unit 342;

a screening unit 341, configured to screen, according to the target patient data, a first patient whose crowd feature similarity corresponding to the target patient is greater than a first preset threshold, where the crowd feature at least includes illness state information and personal information;

a second determining unit 342, configured to extract the treatment plan corresponding to the first patient and the score value of the treatment plan with respect to the treatment effect, and determine a treatment plan with a score value greater than a second preset threshold as the first treatment plan; or

The second determining unit 342 may be further configured to obtain a preset treatment plan created according to the crowd characteristics of the target group, and determine the preset treatment plan as the first treatment plan.

In a specific application scenario, in order to screen out, from the first treatment regimen, the second treatment regimen containing the contraindicated drugs of the target patient, as shown in fig. 5, the extraction module 35 may specifically include: a third determination unit 351;

a third determining unit 351, configured to determine, according to medication contraindication data, a first contraindicated drug that is unsuitable for the crowd type to which the target patient belongs;

the third determining unit 351 is further configured to determine, according to the drug allergy history in the target patient data, a second contraindicated drug to which the target patient has an allergic reaction;

the third determining unit 351 is further operable to determine a first treatment regimen that contains the first contraindicated drug and/or the second contraindicated drug as the second treatment regimen.

Accordingly, for analysis to obtain a target treatment regimen for a target patient, as shown in fig. 5, the analysis module 36 may specifically include: a rejecting unit 361;

the rejecting unit 361 may be configured to reject the second treatment plan from the first treatment plan to obtain the target treatment plan.

It should be noted that, for other descriptions of the functional units of the device for determining a patient treatment regimen provided in this embodiment, reference may be made to the corresponding descriptions of fig. 1 and fig. 2, which are not repeated here.

Based on the above-described methods shown in fig. 1 and 2, correspondingly, the embodiments of the present application further provide a storage medium having a computer program stored thereon, which when executed by a processor, implements the above-described method of determining a patient treatment regimen as shown in fig. 1 and 2.

Based on such understanding, the technical solution of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.), and includes several instructions for causing a computer device (may be a personal computer, a server, or a network device, etc.) to perform the method of each implementation scenario of the present application.

Based on the methods shown in fig. 1 and fig. 2 and the virtual device embodiments shown in fig. 4 and fig. 5, in order to achieve the above objects, the embodiments of the present application further provide a computer device, which may specifically be a personal computer, a server, a network device, or the like. The physical device includes a storage medium and a processor: the storage medium is used to store a computer program, and the processor is used to execute the computer program to implement the above-described method of determining a patient treatment regimen as shown in fig. 1 and 2.

Optionally, the computer device may also include a user interface, a network interface, a camera, Radio Frequency (RF) circuitry, sensors, audio circuitry, a Wi-Fi module, and the like. The user interface may include a display screen (Display) and an input unit such as a keyboard (Keyboard), and may optionally also include a USB interface, a card reader interface, etc. The network interface may optionally include a standard wired interface, a wireless interface (e.g., a Bluetooth interface or a Wi-Fi interface), etc.

It will be appreciated by those skilled in the art that the computer device structure provided in this embodiment does not constitute a limitation on the physical device, which may include more or fewer components, combine certain components, or use a different arrangement of components.

The non-volatile readable storage medium may also include an operating system, a network communication module, etc. The operating system is a program that manages the hardware and software resources of the physical device described above, and supports the execution of information processing programs and other software and/or programs. The network communication module is used for realizing communication among components in the non-volatile readable storage medium and communication with other hardware and software in the physical device.

From the description of the above embodiments, a person skilled in the art will understand that the present application proposes an interpretable network structure for the deep reinforcement learning model DQN, creates from it a patient grouping model for processing time series data, and then trains the patient grouping model with sample data until it meets a preset training standard. Target patient data within a preset time period is input into the patient grouping model that meets the preset training standard to obtain a target grouping result, and a first treatment scheme of the target patient is determined using the crowd characteristics in the target group. To further enhance diagnostic safety, the contraindicated drugs of the target patient can also be determined from the target patient data, so that a second treatment scheme containing the contraindicated drugs is screened out of the first treatment scheme. Finally, the target treatment scheme suitable for the target patient is obtained by analysis from the first treatment scheme and the second treatment scheme. In this application, the patient treatment scheme is processed digitally, and the calculation of the expected reward value Q is expanded into a time-series structure, so that more information is taken into account and, by combining artificial intelligence with a deep learning algorithm, the analysis result can be more accurate. In addition, an attention mechanism is added to the calculation of the expected reward value, which provides a certain degree of interpretability.

Those skilled in the art will appreciate that the drawings are merely schematic illustrations of one preferred implementation scenario, and that the modules or flows in the drawings are not necessarily required to practice the present application. Those skilled in the art will also appreciate that the modules of an apparatus in an implementation scenario may be distributed in that apparatus as described for the implementation scenario, or may, with corresponding changes, be located in one or more apparatuses different from the one in the implementation scenario. The modules of the implementation scenario may be combined into one module, or may be further split into a plurality of sub-modules.

The sequence numbers of the foregoing implementation scenarios are merely for description, and do not represent advantages or disadvantages of the implementation scenarios. The foregoing disclosure describes only a few specific implementations of the present application, but the present application is not limited thereto, and any variation that a person skilled in the art can conceive shall fall within the protection scope of the present application.

Claims

1. A method of determining a patient treatment regimen, comprising:

analyzing and obtaining a target treatment scheme of the target patient according to the first treatment scheme and the second treatment scheme;

the method for creating the patient grouping model for processing time series data based on the deep reinforcement learning DQN specifically comprises the following steps:

splitting the last fully connected layer in the network structure corresponding to the deep reinforcement learning DQN into a first fully connected layer, a second recurrent neural network layer and a third recurrent neural network layer;

constructing a patient clustering model by using the deep reinforcement learning DQN with a changed network structure, so that when patient data comprising a plurality of time points is input into the patient clustering model, the first full-connection layer outputs embedded values of the patient states corresponding to the time points, the second recurrent neural network layer outputs first attention of the patient states corresponding to the time points, the third recurrent neural network layer outputs second attention of the clustering results corresponding to the time points, and the expected reward values of the patient data corresponding to the preset groups are calculated based on the embedded values, the first attention and the second attention;

wherein the expected reward value is obtained by calculating, for each time point, a first sum of the first attention and the second attention at that time point and the product of the first sum and the embedded value, and then accumulating the products over the current time point and the historical time points.

2. The method of claim 1, wherein the sample data is time series data comprising a current time point and a predetermined number of historical time points;

the training the patient grouping model by using the sample data marked with the grouping result so that the patient grouping model accords with a preset training standard specifically comprises the following steps:

inputting sample data at the current time point and the historical time point into the patient grouping model, and obtaining a preset number of groups and expected reward values of the sample data corresponding to the groups;

extracting a mark group corresponding to the sample data, and determining a first expected reward value output by the mark group as a training output result of the patient grouping model;

calculating the mean square error loss between the first expected reward value and the real expected reward value, and if it is judged from the mean square error loss that the loss function has reached a convergence state, determining that the patient grouping model meets the preset training standard;

and if the loss function does not reach the convergence state, repeatedly training the patient grouping model by using the sample data so as to enable the patient grouping model to meet the preset training standard.

3. The method according to claim 2, wherein the inputting the target patient data within the preset time period into the patient grouping model meeting the preset training standard, and obtaining the target group to which the target patient belongs specifically includes:

extracting historical patient follow-up data and current patient follow-up data of a target patient in a preset time period;

inputting the historical patient follow-up data and the current patient follow-up data into a patient grouping model which accords with the preset training standard, and obtaining expected reward values corresponding to each preset group;

and determining the preset group with the maximum expected reward value as a target group corresponding to the target patient.

4. The method of claim 3, wherein the determining the first treatment regimen for the target patient based on the demographic characteristics within the target group comprises:

screening a first patient with crowd characteristic similarity larger than a first preset threshold value corresponding to the target patient in the target group according to the target patient data, wherein the crowd characteristic at least comprises illness state information and personal information;

extracting the treatment scheme corresponding to the first patient and the score value of the treatment scheme with respect to the treatment effect, and determining a treatment scheme with a score value larger than a second preset threshold value as the first treatment scheme; or

acquiring a preset treatment scheme established according to the crowd characteristics of the target group, and determining the preset treatment scheme as the first treatment scheme.

5. The method of claim 4, wherein the extracting the contraindicated drug of the target patient based on the target patient data and screening the first treatment regimen for a second treatment regimen comprising the contraindicated drug, comprises:

determining, according to medication contraindication data, a first contraindicated drug that is unsuitable for the crowd type to which the target patient belongs;

determining a second contraindicated drug for the target patient having an allergic response based on the drug allergy history in the target patient data;

determining a first treatment regimen comprising the first contraindicated drug and/or the second contraindicated drug as a second treatment regimen.

6. The method of claim 5, wherein the analyzing according to the first treatment regimen and the second treatment regimen results in a target treatment regimen for the target patient, comprising:

eliminating the second treatment scheme from the first treatment scheme to obtain the target treatment scheme.

7. A patient treatment regimen determination apparatus, comprising:

the analysis module is used for analyzing and obtaining a target treatment scheme of the target patient according to the first treatment scheme and the second treatment scheme;

The creation module may specifically include: a splitting unit and a constructing unit;

the splitting unit is configured to split the last fully connected layer in the network structure corresponding to the deep reinforcement learning DQN into a first fully connected layer, a second recurrent neural network layer and a third recurrent neural network layer;

the construction unit is configured to construct a patient clustering model by using the deep reinforcement learning DQN with a modified network structure, so that when patient data including a plurality of time points is input to the patient clustering model, an embedded value corresponding to a patient state at each time point is output by the first fully-connected layer, a first attention degree corresponding to the patient state at each time point is output by the second recurrent neural network layer, a second attention degree corresponding to a clustering result at each time point is output by the third recurrent neural network layer, and an expected reward value corresponding to each preset group of the patient data is calculated based on the embedded value, the first attention degree and the second attention degree;

8. A non-transitory readable storage medium having stored thereon a computer program, wherein the program when executed by a processor implements the method of determining a patient treatment regimen of any one of claims 1 to 6.

9. A computer device comprising a non-volatile readable storage medium, a processor and a computer program stored on the non-volatile readable storage medium and executable on the processor, wherein the processor implements the method of determining a patient treatment regimen of any one of claims 1 to 6 when the program is executed by the processor.