WO2023050668A1

WO2023050668A1 - Clustering model construction method based on causal inference and medical data processing method

Info

Publication number: WO2023050668A1
Application number: PCT/CN2022/074389
Authority: WO
Inventors: 徐卓扬; 孙行智; 赵婷婷; 胡岗
Original assignee: 平安科技（深圳）有限公司
Priority date: 2021-09-30
Filing date: 2022-01-27
Publication date: 2023-04-06
Also published as: CN113782192A

Abstract

A clustering model construction method based on causal inference, comprising: inputting a plurality of sample data of multiple sample patients into a model to be trained, and outputting, by means of said model, a propensity score of each sample patient for the corresponding sample patient clustering result data and multiple sample expected cumulative reward values corresponding to each sample patient; determining, from among the multiple sample expected cumulative reward values, a target sample expected cumulative reward value of each sample patient; and adjusting model parameters in said model on the basis of a preset loss function, the propensity score of each sample patient and the corresponding target sample expected cumulative reward value, so as to obtain a clustering model. The plurality of sample data is used for training by combining the model to be trained with causal inference analysis, which eliminates selection bias in the patient clustering result data, so that model fitting is more reasonable and the application accuracy of the trained model is higher.

Description

Grouping model construction method and medical data processing method based on causal inference

This application claims priority to Chinese patent application No. 202111156355.6, titled "Causal inference-based grouping model construction method and medical data processing method" and filed on September 30, 2021, the entire content of which is incorporated in this application by reference.

Technical field

The embodiment of the present application relates to the technical field of artificial intelligence, and in particular to a method for constructing a grouping model based on causal inference and a method for processing medical data.

Background technique

In the medical field, patient grouping is of great significance to disease diagnosis, disease prediction, and drug treatment. Currently, deep reinforcement learning models are commonly used to segment patient populations. Most deep reinforcement learning models use multi-layer neural networks to capture the correlation dependence between features to estimate "revenue".

However, in actual medical application scenarios, the grouping decision actually made for a patient is highly correlated with certain characteristics. Doctors make targeted grouping decisions for patients based on diagnostic guidelines or clinical experience. This distributional bias in decision-making affects the learning of deep reinforcement learning models. The inventors realized that, during learning and training of existing deep reinforcement learning models, the sample data exhibits a decision-making distribution shift, so that a grouping model trained on such a model makes patient grouping decisions with low accuracy and poor reasonableness.

Contents of the invention

In view of this, the embodiments of the present application provide a method and a system for constructing a grouping model based on causal inference, a computer device, a computer-readable storage medium, and a medical data processing method, which are used to solve the problem that, because of the decision-making distribution bias in the sample data during learning and training of existing deep reinforcement learning models, a grouping model trained on such a model makes patient grouping decisions with low accuracy and poor reasonableness.

The embodiment of the present application solves the above-mentioned technical problems through the following technical solutions:

One aspect of the present application provides a method for constructing a grouping model based on causal inference, including:

Obtain multiple sample data of multiple sample patients, and multiple sample data of each sample patient includes multiple basic data, multiple patient historical follow-up data and sample patient grouping result data;

Inputting the plurality of sample data of the plurality of sample patients into the model to be trained, outputting, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data, and outputting, through the model to be trained, the sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;

From the sample expected cumulative reward value of each sample patient, determine the target sample expected cumulative reward value corresponding to each sample patient; and

Based on the preset loss function, the propensity score of each sample patient and the corresponding target sample expected cumulative reward value, adjusting the model parameters in the model to be trained to optimize the grouping model.

Another aspect of the embodiment of the present application provides a system for constructing a grouping model based on causal inference, including:

The first acquisition module is used to acquire multiple sample data of multiple sample patients, and the multiple sample data of each sample patient includes multiple basic data, multiple patient historical follow-up data and sample patient grouping result data;

The first model processing module is used to input the multiple sample data of the multiple sample patients into the model to be trained, to output, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data, and to output, through the model to be trained, the sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;

The first determining module is used to determine the target sample expected cumulative reward value corresponding to each sample patient from the sample expected cumulative reward value of each sample patient; and

The optimization module is used to adjust the model parameters in the model to be trained based on the preset loss function, the propensity score of each sample patient and the corresponding target sample expected cumulative reward value, so as to optimize the grouping model.

Another aspect of the embodiment of the present application provides a medical data processing method, including:

Acquiring a plurality of patient data of the target patient, the plurality of patient data including a plurality of basic data, a plurality of patient historical follow-up data, and a patient current follow-up data;

Inputting the plurality of basic data, the plurality of patient historical follow-up data and the patient current follow-up data into the above-mentioned grouping model, and outputting, through the grouping model, the expected cumulative reward value of the target patient for each model patient grouping result data;

From the plurality of expected cumulative reward values, determining the largest expected cumulative reward value as a target expected cumulative reward value; and

According to the target expected cumulative reward value, determine the corresponding model patient grouping result data as the target patient grouping result data corresponding to the target patient.
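By way of non-limiting illustration, the last two steps above can be sketched as a simple argmax over the expected cumulative reward values. The function name `select_group` and the example values are hypothetical and are not taken from the application; the grouping model itself is assumed to have already produced one value per model patient grouping result.

```python
def select_group(q_values):
    """Return (group_index, q_value) for the largest expected cumulative reward.

    `q_values` is assumed to hold one expected cumulative reward value per
    model patient grouping result, as output by the grouping model.
    """
    if not q_values:
        raise ValueError("q_values must not be empty")
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return best, q_values[best]


# Hypothetical model output for one target patient and three candidate groups.
q_values = [0.8, 2.3, 1.1]
group, target_q = select_group(q_values)
```

Ties are resolved in favour of the lowest index here; the application does not state a tie-breaking rule.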

Another aspect of the embodiment of the present application provides a medical data processing system, including:

The second acquiring module is used to acquire multiple patient data of the target patient, the multiple patient data including multiple basic data, multiple patient historical follow-up data and patient current follow-up data;

The second model processing module is used to input the plurality of basic data, the plurality of patient historical follow-up data and the patient current follow-up data into the above-mentioned grouping model, and to output, through the grouping model, the expected cumulative reward value of the target patient for each model patient grouping result data;

The second determining module is used to determine the largest expected cumulative reward value as a target expected cumulative reward value from among multiple expected cumulative reward values; and

The third determination module is configured to determine the corresponding model patient grouping result data as the target patient grouping result data corresponding to the target patient according to the target expected cumulative reward value.

In order to achieve the above purpose, an embodiment of the present application further provides a computer device, the computer device includes a memory, a processor, and a computer program stored in the memory and operable on the processor, and the processor performs the following steps when executing the computer program:

In order to achieve the above purpose, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and the computer program can be executed by at least one processor, so that the at least one processor performs the following steps:

In the causal inference-based grouping model construction method, system, computer device, computer-readable storage medium, and medical data processing method provided in the embodiments of the present application, the multiple basic data, the multiple patient historical follow-up data and the sample patient grouping result data of the multiple sample patients are input into the model to be trained, and the model to be trained outputs the propensity score of each sample patient for its corresponding sample patient grouping result data and the sample expected cumulative reward value of each sample patient for each model patient grouping result data; from the sample expected cumulative reward values of each sample patient, the target sample expected cumulative reward value corresponding to each sample patient is determined; and, based on the preset loss function, the propensity score of each sample patient and the corresponding target sample expected cumulative reward value, the model parameters in the model to be trained are adjusted to optimize the grouping model. Training on the multiple sample data by combining the model to be trained with causal inference analysis eliminates the selection bias in the patient grouping result data, so that the model fit is more reasonable and the trained model has a higher application accuracy.

Description of drawings

Fig. 1 is the flow chart of the steps of the grouping model construction method based on causal inference in Embodiment 1 of the present application;

Fig. 2 is the flow chart of the steps of the grouping model construction method based on causal inference in Embodiment 1 of the present application;

Fig. 3 is a flow chart of the steps of the method for constructing a grouping model based on causal inference in Embodiment 1 of the present application;

FIG. 4 is a schematic diagram of program modules of a system for constructing a grouping model based on causal inference in Embodiment 2 of the present application;

FIG. 5 is a flow chart of the steps of the medical data processing method of Embodiment 3 of the present application;

FIG. 6 is a schematic diagram of the program modules of the medical data processing system according to Embodiment 4 of the present application;

FIG. 7 is a schematic diagram of a hardware structure of a computer device according to Embodiment 5 of the present application.

Detailed description

In order to make the purpose, technical solution and advantages of the present application clearer, the present application will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application, not to limit the present application. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the scope of protection of this application.

It should be noted that the descriptions involving "first", "second", etc. in the embodiments of the present application are only for descriptive purposes, and should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with each other, provided that the combination can be realized by those skilled in the art. When a combination of technical solutions is contradictory or cannot be realized, it should be considered that the combination does not exist and is not within the scope of protection claimed by the present application.

In the description of the present application, it should be understood that the numerals before the steps do not indicate the order in which the steps are executed, but are only used to facilitate the description of the present application and to distinguish each step, so they should not be construed as limitations on the present application.

Embodiment one

Please refer to FIG. 1 , which shows a flow chart of the steps of the method for constructing a grouping model based on causal inference according to an embodiment of the present application. It can be understood that the flowchart in this method embodiment is not used to limit the sequence of execution steps. The following is an exemplary description taking computer equipment as the execution subject, as follows:

As shown in Figure 1, the method for constructing a grouping model based on causal inference may include steps S100 to S106, wherein:

In step S100, a plurality of sample data of a plurality of sample patients is acquired, and the plurality of sample data of each sample patient includes a plurality of basic data, a plurality of patient history follow-up data and sample patient grouping result data.

In an exemplary embodiment, the plurality of sample patients may be a plurality of diabetic patients. The historical follow-up data of the diabetic patients are collected in chronological order, and the basic data of one diabetic patient, the data of each of that patient's follow-up visits, and the corresponding sample patient grouping result data together form one sample data. The basic data include, but are not limited to, age, gender, place of work and frequently visited places; the follow-up data include medication history, medical test reports from third-party platforms or medical systems, expert/doctor prescription information, and the like.

In order to improve the training efficiency of the model, the method further includes preprocessing the multiple sample data, which specifically includes using feature engineering to merge features of the multiple basic data of the multiple sample patients and to merge features of the multiple patient historical follow-up data, so as to obtain the training data. For example, through feature engineering, the first feature primitive of each basic data and the second feature primitive of each patient historical follow-up data are obtained; the first feature primitives corresponding to each basic data and the second feature primitives corresponding to each patient historical follow-up data are respectively aggregated based on the sample patient grouping result data; the first similarity between each first feature primitive and the sample patient grouping result data and the second similarity between each second feature primitive and the sample patient grouping result data are calculated; and the basic data whose first similarity is less than a first preset threshold are merged, and the historical follow-up data whose second similarity is less than a second preset threshold are merged, to obtain the training data. For example, if symptoms A and B are not directly related to the grouping of category-A patients, symptoms A and B are merged for category A, which reduces the impact of redundant data on subsequent model training and effectively improves the training efficiency of the model.
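As a rough, non-limiting sketch of the threshold-based merging described above: the application does not specify the similarity measure or the merge operation, so the sketch below assumes the absolute Pearson correlation as the "similarity" and a per-patient average as the "merge". The feature names (`hba1c`, `noise_a`, `noise_b`) and the helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def merge_weak_features(columns, groups, threshold):
    """Merge every feature whose similarity to the grouping is below threshold.

    Assumption: similarity = |Pearson correlation|, merge = per-patient mean.
    `columns` maps a feature name to one value per sample patient.
    """
    kept, weak = {}, []
    for name, values in columns.items():
        if abs(pearson(values, groups)) < threshold:
            weak.append(values)
        else:
            kept[name] = values
    if weak:
        kept["merged"] = [sum(vals) / len(weak) for vals in zip(*weak)]
    return kept


# Four hypothetical sample patients with grouping results 0, 0, 1, 1.
groups = [0, 0, 1, 1]
columns = {
    "hba1c": [5.0, 5.5, 8.0, 9.0],    # strongly related to the grouping
    "noise_a": [1.0, 0.0, 1.0, 0.0],  # unrelated to the grouping
    "noise_b": [0.0, 1.0, 1.0, 0.0],  # unrelated to the grouping
}
training_columns = merge_weak_features(columns, groups, threshold=0.3)
```

In this toy data the two unrelated columns collapse into a single `merged` column, so the model to be trained sees two input features instead of three.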

Step S102, input the multiple sample data of the multiple sample patients into the model to be trained, output, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data, and output, through the model to be trained, the sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data.

In an exemplary embodiment, the model to be trained can be a deep reinforcement learning model (Deep Q Network, DQN model). In this embodiment, the preprocessed sample data input into the deep reinforcement learning model is defined as the state; the multiple model patient grouping result data are defined as the actions; and the reward is defined according to the result information obtained after an action is taken for the sample patient in a given state (the multiple sample data). The action is the one-hot encoding of the patient grouping result data, and the reward includes a long-term reward and a short-term reward. For example, the long-term reward can be defined as: sign(whether there is a complication at the last follow-up)*5; the short-term reward can be defined as: sign(whether the glycated hemoglobin reaches the target at the next follow-up)*1.
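A minimal sketch of the action encoding and reward definition described above follows. The application does not spell out the sign convention or how the two reward terms are combined, so the sketch assumes that `sign` returns +1 for a favourable outcome (glycated hemoglobin on target, no complication) and -1 otherwise, and that the long-term term is added only once the final outcome is known. All function names are hypothetical.

```python
def one_hot(group_index, num_groups):
    """One-hot encode a patient grouping result (the action)."""
    if not 0 <= group_index < num_groups:
        raise ValueError("group_index out of range")
    return [1 if i == group_index else 0 for i in range(num_groups)]


def sign(favourable):
    """Assumed convention: +1 for a favourable outcome, -1 otherwise."""
    return 1 if favourable else -1


def reward(hba1c_on_target_next_visit, complication_free_at_last_visit=None):
    """Short-term reward (weight 1) plus, when known, long-term reward (weight 5).

    Adding the two terms is an assumption of this sketch.
    """
    value = sign(hba1c_on_target_next_visit) * 1
    if complication_free_at_last_visit is not None:
        value += sign(complication_free_at_last_visit) * 5
    return value


action = one_hot(2, 4)             # third of four candidate groups
step_reward = reward(True)         # intermediate follow-up, target reached
final_reward = reward(False, True) # last follow-up, no complication
```

The 5:1 weighting follows the example in the text and makes the long-term outcome dominate the short-term laboratory target.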

As shown in Figure 3, in an exemplary embodiment, in order to eliminate the bias in action selection and further improve the accuracy of model training, the step of inputting the multiple sample data of the multiple sample patients into the model to be trained and outputting, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data may also be implemented through the following operations: step S300, randomly allocating the multiple sample data of the multiple sample patients to obtain a plurality of training sample data and a plurality of control sample data; and step S302, inputting the plurality of training sample data and the plurality of control sample data into the model to be trained, performing logistic regression on the plurality of training sample data and the plurality of control sample data through the model to be trained, and calculating the propensity score of each sample patient for its corresponding sample patient grouping result data.

In this embodiment, random allocation yields a plurality of training sample data of a plurality of first sample patients and a plurality of control sample data of a plurality of second sample patients, and for each first sample patient a second sample patient is determined from the plurality of second sample patients as a control. This can be understood as follows: based on the third similarity between the training sample data of the first sample patient and the control sample data of each second sample patient, one or more second sample patients whose third similarity satisfies a third preset threshold are screened out; from the screened-out second sample patients, those whose sample patient grouping result data are inconsistent with that of the first sample patient are selected for control analysis, so that the model performs causal analysis on randomly allocated sample data. The training sample data of the first sample patient are positive sample data, and the control sample data of the screened-out second sample patients are negative sample data. In an embodiment, the DQN model incorporates the propensity represented in the sample data so that the expected reward output by the model is more accurate.
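The logistic-regression step S302 can be illustrated, under simplifying assumptions, with a stand-alone propensity estimator for two candidate groups. In the application the regression is performed inside the model to be trained; the sketch below is a separate plain-Python logistic regression fitted by batch gradient descent, and the one-feature toy data, learning rate and epoch count are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_propensity(states, actions, lr=0.1, epochs=2000):
    """Fit p(a = 1 | s) by batch gradient descent on the log loss."""
    n_features = len(states[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(states)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for s, a in zip(states, actions):
            err = sigmoid(sum(w * x for w, x in zip(weights, s)) + bias) - a
            for j, x in enumerate(s):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def propensity(state, action, weights, bias):
    """Probability that the observed grouping `action` is chosen in `state`."""
    p1 = sigmoid(sum(w * x for w, x in zip(weights, state)) + bias)
    return p1 if action == 1 else 1.0 - p1


# Toy data: one normalised feature; doctors tend to choose group 1 when it is high.
states = [[0.1], [0.2], [0.8], [0.9]]
actions = [0, 0, 1, 1]
weights, bias = fit_propensity(states, actions)
high = propensity([0.9], 1, weights, bias)
low = propensity([0.1], 1, weights, bias)
```

A propensity close to 0 or 1 indicates that the doctors' choice is almost determined by the state, which is exactly the selection bias the method seeks to account for.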

In an exemplary embodiment, taking the model to be trained as a DQN model as an example, the model to be trained includes an input layer, an output layer, at least four NN layers (hidden layers) and a classification layer. The input layer is used to receive the multiple sample data of the multiple sample patients; the hidden layers are used to analyze and process the multiple sample data; the output layer includes a plurality of output nodes, each of which outputs the score of the model patient grouping result data corresponding to that node; and the classification layer is used to convert the score corresponding to each output node into the sample expected cumulative reward value of each model patient grouping result data. The multiple sample data (states) of the multiple sample patients are input into the input layer of the model to be trained; after processing by two hidden layers, the propensity score g of each sample patient for its sample patient grouping result data and other feature values are output; the other feature values are input into the remaining hidden layers, and the sample expected cumulative reward values Q_0, Q_1, Q_2, ..., Q_n of each sample patient for each model patient grouping result data (action) are output through the output layer. Here, g represents the probability that doctors or experts adopt the corresponding sample patient grouping result data in the given state. Assuming that there are two actions, a_0 and a_1, g can be expressed as p(a_1|s) = 1 - p(a_0|s), where p(a_1|s) represents the probability that the doctor classifies the sample patient as a_1 when the input data is s, and p(a_0|s) represents the probability that the doctor classifies the sample patient as a_0 when the input data is s.
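The branching structure described above can be sketched as a forward pass in which the propensity output is taken after the first two hidden layers and the expected cumulative reward values after the remaining ones. This is only a structural sketch with random, untrained weights: the layer sizes, the ReLU activation, the softmax on the propensity head, and passing the whole hidden vector to the later layers are assumptions, and the classification layer is omitted.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Hypothetical helper: small random weights and zero biases for one dense layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer, relu=True):
    """Fully connected layer, optionally followed by ReLU."""
    weights, bias = layer
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]
    return [max(0.0, v) for v in out] if relu else out


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(state, params):
    """Return (g, q): propensity distribution and one Q value per group."""
    h = dense(state, params["h1"])
    h = dense(h, params["h2"])
    g = softmax(dense(h, params["g_head"], relu=False))  # branch after two hidden layers
    h = dense(h, params["h3"])
    h = dense(h, params["h4"])
    q = dense(h, params["q_out"], relu=False)            # Q_0 ... Q_n
    return g, q


rng = random.Random(0)
n_features, n_hidden, n_groups = 6, 8, 2
params = {
    "h1": init_layer(n_features, n_hidden, rng),
    "h2": init_layer(n_hidden, n_hidden, rng),
    "g_head": init_layer(n_hidden, n_groups, rng),
    "h3": init_layer(n_hidden, n_hidden, rng),
    "h4": init_layer(n_hidden, n_hidden, rng),
    "q_out": init_layer(n_hidden, n_groups, rng),
}
g, q = forward([0.2, 0.5, 0.1, 0.9, 0.3, 0.7], params)
```

With two groups, the softmax output satisfies g[1] = 1 - g[0], matching the relation p(a_1|s) = 1 - p(a_0|s) given in the text.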

Step S104, from the sample expected cumulative reward value of each sample patient, determine the target sample expected cumulative reward value corresponding to each sample patient.

In an exemplary embodiment, the largest of the multiple sample expected cumulative reward values of each sample patient is determined as the target sample expected cumulative reward value corresponding to that sample patient.

Step S106, based on the preset loss function, the propensity score of each sample patient and the corresponding target sample expected cumulative reward value, adjust the model parameters in the model to be trained to optimize the grouping model.

In order to optimize the model to be trained, please refer to Fig. 2. The preset loss function includes a first loss function, a second loss function and a third loss function, and the step of adjusting the model parameters in the model to be trained based on the preset loss function, the propensity score of each sample patient and the corresponding target sample expected cumulative reward value to optimize the grouping model may further include steps S200 to S210, wherein:

Step S200, based on the first loss function and the target sample expected cumulative reward value corresponding to each sample patient, calculate the regression loss value;

Step S202, based on the second loss function and the propensity score of each sample patient, calculate the first loss value corresponding to the propensity score;

Step S204, based on the third loss function and the target sample expected cumulative reward value corresponding to each sample patient, calculate the second loss value corresponding to the target sample expected cumulative reward value;

Step S206, sum the regression loss value, the first loss value and the second loss value to obtain a loss value;

Step S208, modify the model parameters in the model to be trained according to the loss value to obtain a modified model to be trained;

Step S210, perform group training on the multiple sample data of the multiple sample patients through the modified model to be trained, stop training when the model parameters have been modified a preset number of times and the loss value no longer decreases, and mark the current model to be trained as the grouping model.

In this embodiment, after the regression loss value, the first loss value and the second loss value are calculated, the loss value is calculated by the following function:

Loss = Q_loss + λ_1*g_loss + λ_2*reg_loss;

Wherein, Loss represents the loss value, reg_loss represents the regression loss value, g_loss represents the first loss value, Q_loss represents the second loss value, and λ_1 and λ_2 are adjustable hyperparameters of the model to be trained.
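The weighted sum above and the stopping rule of step S210 can be sketched as follows. The default values of `lambda_1` and `lambda_2`, the helper name `should_stop` and its `patience` parameter are assumptions for illustration; the application only states that training stops once a preset number of modifications has been reached and the loss value no longer decreases.

```python
def total_loss(q_loss, g_loss, reg_loss, lambda_1=1.0, lambda_2=1.0):
    """Loss = Q_loss + lambda_1 * g_loss + lambda_2 * reg_loss."""
    return q_loss + lambda_1 * g_loss + lambda_2 * reg_loss


def should_stop(loss_history, min_updates, patience=1):
    """Stop once at least `min_updates` parameter updates have been made and
    the loss has not decreased over the last `patience` updates."""
    if len(loss_history) < max(min_updates, patience + 1):
        return False
    return min(loss_history[-patience:]) >= loss_history[-patience - 1]


loss = total_loss(q_loss=0.5, g_loss=0.2, reg_loss=0.1, lambda_1=2.0, lambda_2=3.0)
still_improving = should_stop([3.0, 2.0, 1.0], min_updates=2)
plateaued = should_stop([3.0, 2.0, 2.0], min_updates=2)
```

Larger values of λ_1 put more weight on fitting the doctors' observed grouping decisions, while larger values of λ_2 penalise overestimation of the Q value more strongly.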

In this embodiment, the model to be trained is repeatedly trained using the following loss functions: the Loss is calculated through the loss functions, the gradient of the Loss is computed, the model parameters are adjusted by backpropagating the Loss with a gradient descent algorithm, and the training is repeated until the Loss no longer decreases, whereupon the grouping model is obtained. During the training process, the sample data is organized into quadruples of the form (s_t, a_t, r, s_{t+1}), where s_t represents the state at time t, a_t represents the doctor's grouping scheme (action) at time t, and r and s_{t+1} respectively represent the reward obtained after taking a_t in s_t and the next state transitioned to. The loss functions are as follows:

(1) The first loss function includes:

reg_loss = Q(s_t, a_tmax) - Q(s_t, a_t);

Among them, reg_loss is the regression loss value, which is used to prevent overestimation of the Q value; Q represents the expected cumulative reward value corresponding to the sample patient; s_t represents the multiple sample data at time t; a_t represents the sample patient grouping result data corresponding to the sample patient at time t; Q(s_t, a_tmax) represents the largest of the multiple sample expected cumulative reward values output by the model to be trained; and Q(s_t, a_t) represents the expected cumulative reward value of the sample patient grouping result data actually determined for the sample patient in state s_t;

(2) The second loss function includes:

g_loss = CrossEntropy(g(s_t), to_one_hot(a_t));

Among them, s_t represents the multiple sample data at time t; a_t represents the sample patient grouping result data corresponding to the sample patient at time t; CrossEntropy represents the cross entropy; g(s_t) represents the output g obtained for the input state s_t; g represents the propensity score and corrects, through causal inference analysis, the bias of the state in the hidden layer (NN) preceding the linear classification layer; to_one_hot(a_t) represents the one-hot encoding of the sample patient grouping result data (action) corresponding to the sample patient at time t; g_loss is the first loss value, which is used to make the output g approximate the sample patient grouping result data;

(3) The third loss function includes:

Q_loss = (Q(s_t, a_t) - (γ + max_a(γ*Q(s_{t+1}, a_{t+1}))))²;

Among them, s_t represents the multiple sample data at time t; a_t represents the sample patient grouping result data corresponding to the sample patient at time t; s_{t+1} represents the multiple sample data at the next time t+1, transitioned to from s_t; Q(s_t, a_t) represents the actual expected cumulative reward value corresponding to the sample patient grouping result data a_t when the sample patient is in the state of the multiple sample data at time t; max_a(γ*Q(s_{t+1}, a_{t+1})) represents the maximum of the sample expected cumulative reward values corresponding to the multiple sample grouping result data a_{t+1} when the sample patient is in the state of the multiple sample data at time t+1; γ represents the discount factor in the model to be trained, which indicates the attenuation ratio applied when the target sample expected cumulative reward value at the next time t+1 is discounted to the target sample expected cumulative reward value at time t; Q_loss is the second loss value.
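For a single quadruple (s_t, a_t, r, s_{t+1}), the three loss terms defined above can be sketched in plain Python as follows. Two points are assumptions of this sketch rather than statements of the application: first, the temporal-difference target is written as r + γ*max_a Q(s_{t+1}, a), i.e. the reward r from the quadruple is used as the additive term, as in a standard DQN target, whereas the printed formula shows γ in that position; second, the propensity output is treated as a probability distribution over groups. Function and variable names are hypothetical.

```python
import math


def reg_loss(q_values, action):
    """Gap between the largest Q value and the Q value of the observed action."""
    return max(q_values) - q_values[action]


def to_one_hot(action, num_groups):
    return [1.0 if i == action else 0.0 for i in range(num_groups)]


def cross_entropy(probs, target, eps=1e-12):
    """Cross entropy between predicted probabilities and a one-hot target."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, target))


def g_loss(g_probs, action):
    """First loss value: pushes g(s_t) towards the doctors' observed grouping."""
    return cross_entropy(g_probs, to_one_hot(action, len(g_probs)))


def q_loss(q_values, action, reward, next_q_values, gamma=0.9):
    """Second loss value: squared one-step temporal-difference error.

    Assumed target: reward + gamma * max(next_q_values).
    """
    target = reward + gamma * max(next_q_values)
    return (q_values[action] - target) ** 2


# One hypothetical transition with two candidate groups.
q_t = [1.0, 2.0]          # Q(s_t, a) for a = 0, 1
q_next = [0.0, 2.0]       # Q(s_{t+1}, a) for a = 0, 1
g_t = [0.25, 0.75]        # propensity output g(s_t)
a_t, r = 1, 1.0

example_reg = reg_loss(q_t, a_t)
example_g = g_loss(g_t, a_t)
example_q = q_loss(q_t, a_t, r, q_next, gamma=0.5)
```

Because γ is non-negative, γ*max_a Q(s_{t+1}, a) equals max_a(γ*Q(s_{t+1}, a)), so the sketch matches the max term of the printed formula. The reg_loss term is always non-negative and is zero exactly when the observed action already has the largest Q value.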

The embodiments of the present application have at least the following beneficial effects:

(1) The deep reinforcement learning model is combined with causal inference analysis to train on the multiple sample data, which decouples the propensity represented in the patient grouping result data and eliminates the selection bias in the patient grouping result data, so that the model fit is more reasonable;

(2) The propensity score, the sample expected cumulative reward values and the loss function together prevent overestimation of each output sample expected cumulative reward value, so that safer patient grouping result data are generated;

(3) Causal inference analysis eliminates bias in the decision-making of deep reinforcement learning, optimizes the long-term cumulative return of decision-making choices, effectively reduces the estimation error caused by selection bias, and improves the accuracy and safety of the grouping model in actual use.

Embodiment two

Please continue to refer to FIG. 4, which shows a schematic diagram of the program modules of the causal inference-based grouping model construction system of the present application. In this embodiment, the grouping model construction system 40 based on causal inference may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to complete the present application and implement the above-mentioned method for constructing a grouping model based on causal inference. The program module referred to in the embodiments of the present application is a series of computer program instruction segments capable of accomplishing specific functions, and is more suitable than the program itself for describing the execution process of the causal inference-based grouping model construction system 40 in the storage medium. The following description specifically introduces the functions of each program module of this embodiment:

The said grouping model construction system 40 based on causal inference includes:

The first acquiring module 400 is configured to acquire multiple sample data of multiple sample patients, and the multiple sample data of each sample patient includes multiple basic data, multiple patient historical follow-up data and sample patient grouping result data;

The first model processing module 402 is configured to input the multiple sample data of the multiple sample patients into the model to be trained, and to output, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data and the sample expected cumulative reward value of each sample patient for each model patient grouping result data, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;

The first determining module 404 is configured to determine the target sample expected cumulative reward value corresponding to each sample patient from the sample expected cumulative reward value of each sample patient; and

The optimization module 406 is used to adjust the model parameters in the model to be trained based on the preset loss function, the propensity score of each sample patient and the corresponding expected cumulative reward value of the target sample, so as to optimize the grouping model.

In an exemplary embodiment, the preset loss function includes a first loss function, a second loss function, and a third loss function; the optimization module 406 is further configured to: calculate a regression loss value based on the first loss function and the target sample expected cumulative reward value corresponding to each sample patient; calculate a first loss value corresponding to the propensity score based on the second loss function and the propensity score of each sample patient; calculate a second loss value corresponding to the target sample expected cumulative reward value based on the third loss function and the target sample expected cumulative reward value corresponding to each sample patient; sum the regression loss value, the first loss value and the second loss value to obtain a loss value; modify the model parameters in the model to be trained according to the loss value to obtain a modified model to be trained; and perform grouping training on the multiple sample data of the multiple sample patients through the modified model to be trained, stop the training when the model parameters have been modified a preset number of times and the loss value no longer decreases, and mark the current model to be trained as the grouping model.
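The stopping rule described above can be illustrated with a minimal Python sketch. It is only one possible reading of the rule, not the implementation of the present application: the names train_until_converged, step_fn, min_updates, patience and hard_cap are hypothetical, step_fn is assumed to perform one parameter modification and return the summed loss value, and hard_cap is an added safety limit that the description does not mention.

```python
def train_until_converged(step_fn, min_updates, patience=1, hard_cap=10_000):
    """Run parameter updates until the stopping rule is met.

    step_fn     -- hypothetical callable: performs one parameter update and
                   returns the summed loss value for that update
    min_updates -- the preset number of modifications
    patience    -- consecutive non-decreasing losses required (assumption)
    hard_cap    -- safety limit on updates (assumption, not in the text)
    """
    best_loss = float("inf")
    stale = 0          # updates in a row without a decrease in loss
    history = []
    for update in range(1, hard_cap + 1):
        loss = step_fn()
        history.append(loss)
        if loss < best_loss:
            best_loss, stale = loss, 0
        else:
            stale += 1
        # Stop only when BOTH conditions hold: the preset number of
        # modifications is reached and the loss no longer decreases.
        if update >= min_updates and stale >= patience:
            break
    return history
```

In this sketch, training continues past the preset number of modifications for as long as the loss value is still decreasing, which matches the conjunction of the two conditions in the description.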

In an exemplary embodiment, the first loss function includes:

reg_loss = Q(s_t, a_{tmax}) - Q(s_t, a_t);

The second loss function includes:

g_loss = CrossEntropy(g(s_t), to_one_hot(a_t));

Wherein, s_t represents the multiple sample data at time t; a_t represents the sample patient grouping result data corresponding to the sample patient at time t; CrossEntropy represents the cross entropy; g represents the propensity score, and g(s_t) represents the output of g when the input state is s_t; to_one_hot(a_t) represents the one-hot encoding of the sample patient grouping result data (the action) corresponding to the sample patient at time t; g_loss is the first loss value, which is used to make the output of g approximate the sample patient grouping result data;

The third loss function includes:

Q_loss = (Q(s_t, a_t) - (γ + max_a(γ * Q(s_{t+1}, a_{t+1}))))^2;
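As an illustration only, the three loss functions and their sum can be sketched in plain Python for a single sample. The helper names and the list-based representation (q_t as the list of Q values over all groups in state s_t, g_t as the list of propensity probabilities, a_t as an integer group index) are assumptions made for this sketch. The third loss follows the formula exactly as written, with γ as the leading term of the target; a conventional Q-learning target would place the immediate reward in that position, which the text does not specify.

```python
import math


def to_one_hot(action, num_actions):
    """One-hot encode an integer group index (hypothetical helper)."""
    return [1.0 if i == action else 0.0 for i in range(num_actions)]


def cross_entropy(probs, target, eps=1e-12):
    """Cross entropy between predicted probabilities and a one-hot target."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, target))


def reg_loss(q_t, a_t):
    """reg_loss = Q(s_t, a_tmax) - Q(s_t, a_t)."""
    return max(q_t) - q_t[a_t]


def g_loss(g_t, a_t):
    """g_loss = CrossEntropy(g(s_t), to_one_hot(a_t))."""
    return cross_entropy(g_t, to_one_hot(a_t, len(g_t)))


def q_loss(q_t, a_t, q_next, gamma):
    """Q_loss = (Q(s_t, a_t) - (gamma + max_a(gamma * Q(s_t+1, a_t+1))))^2,
    implemented literally as the formula is written in the text."""
    target = gamma + max(gamma * q for q in q_next)
    return (q_t[a_t] - target) ** 2


def total_loss(q_t, g_t, a_t, q_next, gamma):
    """Sum of the regression loss, first loss and second loss values."""
    return reg_loss(q_t, a_t) + g_loss(g_t, a_t) + q_loss(q_t, a_t, q_next, gamma)
```

In a deep reinforcement learning setting these quantities would normally be computed on batched tensors and averaged; the scalar form here is chosen only to keep each term readable.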

In an exemplary embodiment, the model to be trained is a deep reinforcement learning model.

In an exemplary embodiment, the first model processing module 402 is further configured to: randomly assign the multiple sample data of the multiple sample patients to obtain multiple training sample data and multiple control sample data; input the multiple training sample data and the multiple control sample data into the model to be trained; and perform logistic regression on the multiple training sample data and the multiple control sample data through the model to be trained, so as to calculate the propensity score of each sample patient for its corresponding sample patient grouping result data.
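The propensity score calculation by logistic regression can be sketched as follows. This is a simplified, binary illustration under stated assumptions, not the model of the present application: each sample is reduced to a numeric feature list, the label is 1 if the patient was assigned to a given group and 0 otherwise, and the function names fit_logistic and propensity, the learning rate and the number of epochs are all hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(features, labels, lr=0.1, epochs=2000):
    """Fit binary logistic regression by batch gradient descent.

    features -- list of numeric feature lists, one per sample patient
    labels   -- 1 if the patient was assigned to the group, else 0 (assumption)
    """
    n = len(features)
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(features, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss w.r.t. the logit
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def propensity(weights, bias, x):
    """Estimated probability that a patient with features x is in the group."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
```

For more than two groups, the same idea extends to a softmax output trained with cross entropy, which is the form the second loss function g_loss takes.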

Embodiment three

Please refer to FIG. 5 , which shows a flow chart of the steps of the medical data processing method according to the embodiment of the present application. It can be understood that the flowchart in this method embodiment is not used to limit the sequence of execution steps. The following is an exemplary description taking computer equipment as the execution subject, as follows:

As shown in Figure 5, the medical data processing method may include steps S500-S506, wherein:

Step S500, acquiring a plurality of patient data of the target patient, the plurality of patient data including a plurality of basic data, a plurality of patient historical follow-up data and patient current follow-up data;

Step S502, inputting the plurality of basic data, the plurality of patient historical follow-up data and the patient current follow-up data into the grouping model described above, and outputting, through the grouping model, the expected cumulative reward value of the target patient for each model patient grouping result data;

Step S504, from the multiple expected cumulative reward values, determine the largest expected cumulative reward value as the target expected cumulative reward value;

Step S506, according to the target expected cumulative reward value, determine the corresponding model patient grouping result data as the target patient grouping result data corresponding to the target patient.
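Steps S502 to S506 can be illustrated with a short Python sketch. It assumes, purely for illustration, that the trained grouping model is available as a callable that returns a dictionary mapping each model patient grouping result to its expected cumulative reward value; the names process_patient and grouping_model and the argument layout are hypothetical.

```python
def process_patient(grouping_model, basic_data, history_followups, current_followup):
    """Select the grouping result with the largest expected cumulative reward.

    grouping_model -- hypothetical callable returning a dict
                      {grouping result: expected cumulative reward value}
    """
    # Step S502: obtain one expected cumulative reward value per grouping result.
    expected_rewards = grouping_model(basic_data, history_followups, current_followup)

    # Step S504: the largest value is the target expected cumulative reward value.
    target_value = max(expected_rewards.values())

    # Step S506: the grouping result that attains it is the target grouping result.
    target_group = max(expected_rewards, key=expected_rewards.get)

    return target_group, target_value
```

If two grouping results share the largest value, this sketch returns the first one encountered; the description does not state a tie-breaking rule.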

Embodiment four

Please continue to refer to FIG. 6, which shows a schematic diagram of the program modules of the medical data processing system of the present application. In this embodiment, the medical data processing system 60 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the present application and realize the above-mentioned medical data processing method. A program module, as referred to in the embodiments of the present application, is a series of computer program instruction segments capable of accomplishing a specific function, and is better suited than the program itself to describing the execution process of the medical data processing system 60 in the storage medium. The following description introduces the functions of each program module of this embodiment:

The medical data processing system 60 includes:

The second acquiring module 600 is configured to acquire multiple patient data of the target patient, the multiple patient data including multiple basic data, multiple patient historical follow-up data, and patient current follow-up data;

The second model processing module 602 is configured to input the multiple basic data, the multiple patient historical follow-up data and the patient current follow-up data into the grouping model according to any one of claims 1-5, and to output, through the grouping model, the expected cumulative reward value of the target patient for each model patient grouping result data;

The second determining module 604 is configured to determine the largest expected cumulative reward value as a target expected cumulative reward value from among multiple expected cumulative reward values;

The third determination module 606 is configured to determine the corresponding model patient grouping result data as the target patient grouping result data corresponding to the target patient according to the target expected cumulative reward value.

Embodiment five

Referring to FIG. 7, it is a schematic diagram of the hardware architecture of a computer device according to Embodiment 5 of the present application. In this embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. The computer device 2 may be a rack server, a blade server, a tower server or a cabinet server (including an independent server, or a server cluster composed of multiple servers) and the like. As shown in FIG. 7, the computer device 2 at least includes, but is not limited to, a memory 21, a processor 22, a network interface 23, and a causal inference-based grouping model construction system 40, which can communicate with each other through a system bus. Wherein:

In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or internal memory of the computer device 2. In other embodiments, the memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card equipped on the computer device 2. Of course, the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device. In this embodiment, the memory 21 is usually used to store the operating system and various application software installed in the computer device 2, such as the program code of the causal inference-based grouping model construction system 40 of the above-mentioned embodiment. In addition, the memory 21 can also be used to temporarily store various types of data that have been output or will be output.

In some embodiments, the processor 22 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips. The processor 22 is generally used to control the overall operation of the computer device 2 . In this embodiment, the processor 22 is used to run the program code stored in the memory 21 or process data, for example, run the causal inference-based grouping model construction system 40, so as to implement the causal inference-based grouping model construction method of the above-mentioned embodiment.

The network interface 23 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 2 and other electronic devices. For example, the network interface 23 is used to connect the computer device 2 with an external terminal through a network, and to establish a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be a wireless or wired network such as an enterprise intranet, the Internet, Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.

It should be pointed out that FIG. 7 only shows the computer device 2 having components 21-23 and the causal inference-based grouping model construction system 40, but it should be understood that not all of the illustrated components are required, and more or fewer components may be implemented instead.

In this embodiment, the causal inference-based grouping model construction system 40 stored in the memory 21 can also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to implement the present application.

For example, FIG. 4 shows a schematic diagram of the program modules of Embodiment 2 of the system 40 for constructing a grouping model based on causal inference. In this embodiment, the causal inference-based grouping model construction system 40 can be divided into the first acquiring module 400, the first model processing module 402, the first determining module 404 and the optimization module 406. A program module, as referred to in this application, is a series of computer program instruction segments capable of accomplishing a specific function, and is better suited than a program to describing the execution process of the causal inference-based grouping model construction system 40 in the computer device 2. The specific functions of the program modules 400-406 have been described in detail in the second embodiment and will not be repeated here.

Embodiment six

This embodiment also provides a computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server, application store, etc., on which a computer program is stored, the corresponding functions being realized when the program is executed by a processor. The computer-readable storage medium may be non-volatile or volatile. The computer-readable storage medium of this embodiment is used to store the system 40 for constructing a grouping model based on causal inference, which, when executed by a processor, realizes the method for constructing a grouping model based on causal inference in the above-mentioned embodiment.

The serial numbers of the above embodiments of the present application are for description only, and do not represent the advantages and disadvantages of the embodiments.

Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.

The above are only preferred embodiments of the present application and are not intended to limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made by using the description and the accompanying drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included in the patent protection scope of the present application.

Claims

A method for constructing a grouping model based on causal inference, including:

Obtain multiple sample data of multiple sample patients, and multiple sample data of each sample patient includes multiple basic data, multiple patient historical follow-up data and sample patient grouping result data;

Input the plurality of sample data of the plurality of sample patients into the model to be trained, output, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data, and output, through the model to be trained, the sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;

From the sample expected cumulative reward value of each sample patient, determine the target sample expected cumulative reward value corresponding to each sample patient; and

Based on the preset loss function, the propensity score of each sample patient and the corresponding target sample expected cumulative reward value, the model parameters in the model to be trained are adjusted to optimize the grouping model.
The method for constructing a grouping model based on causal inference according to claim 1, wherein the preset loss function includes a first loss function, a second loss function and a third loss function;

The adjustment of the model parameters in the model to be trained based on the preset loss function, the propensity score of each sample patient and the corresponding expected cumulative reward value of the target sample to optimize the grouping model includes:

Calculate and obtain a regression loss value based on the first loss function and the expected cumulative reward value of the target sample corresponding to each sample patient;

Based on the second loss function and the propensity score of each sample patient, calculate a first loss value corresponding to the propensity score;

Based on the third loss function and the target sample expected cumulative reward value corresponding to each sample patient, calculate a second loss value corresponding to the target sample expected cumulative reward value;

summing the regression loss value, the first loss value and the second loss value to obtain a loss value;

Modifying the model parameters in the model to be trained according to the loss value to obtain a modified model to be trained; and

Perform grouping training on the multiple sample data of the multiple sample patients through the modified model to be trained, stop the training when the model parameters have been modified a preset number of times and the loss value no longer decreases, and mark the current model to be trained as the grouping model.
The method for constructing a grouping model based on causal inference according to claim 2, wherein the first loss function comprises:

reg_loss = Q(s_t, a_{tmax}) - Q(s_t, a_t);

Wherein, reg_loss is the regression loss value, which is used to prevent overestimation of the Q value; Q represents the expected cumulative reward value corresponding to the sample patient; s_t represents the multiple sample data at time t; a_t represents the sample patient grouping result data corresponding to the sample patient at time t; Q(s_t, a_{tmax}) represents the largest sample expected cumulative reward value among the multiple sample expected cumulative reward values output by the model to be trained; Q(s_t, a_t) represents the actual expected cumulative reward value of the sample patient grouping result data determined for the sample patient in the state s_t;

The second loss function includes:

g_loss = CrossEntropy(g(s_t), to_one_hot(a_t));

Wherein, s_t represents the multiple sample data at time t; a_t represents the sample patient grouping result data corresponding to the sample patient at time t; CrossEntropy represents the cross entropy; g represents the propensity score, and g(s_t) represents the output of g when the input state is s_t; to_one_hot(a_t) represents the one-hot encoding of the sample patient grouping result data (the action) corresponding to the sample patient at time t; g_loss is the first loss value, which is used to make the output of g approximate the sample patient grouping result data;

The third loss function includes:

Q_loss = (Q(s_t, a_t) - (γ + max_a(γ * Q(s_{t+1}, a_{t+1}))))^2;

Wherein, s_t represents the multiple sample data at time t; a_t represents the sample patient grouping result data corresponding to the sample patient at time t; s_{t+1} represents the multiple sample data at the next time t+1 transferred from s_t; Q(s_t, a_t) represents the actual expected cumulative reward value corresponding to the sample patient grouping result data a_t when the sample patient is in the state of the multiple sample data at time t; max_a(γ * Q(s_{t+1}, a_{t+1})) represents the maximum of the sample expected cumulative reward values corresponding to the multiple sample patient grouping result data a_{t+1} when the sample patient is in the state of the multiple sample data at time t+1; γ represents the discount factor in the model to be trained, which is the attenuation ratio applied when the target sample expected cumulative reward value at the next time t+1 is discounted to the target sample expected cumulative reward value at time t; Q_loss is the second loss value.
The method for constructing a grouping model based on causal inference according to claim 3, wherein the model to be trained is a deep reinforcement learning model.
The method for constructing a grouping model based on causal inference according to claim 1, wherein, before the multiple sample data of the multiple sample patients are input into the model to be trained, the method also includes:

Randomly assigning multiple sample data of the multiple sample patients to obtain multiple training sample data and multiple control sample data;

Correspondingly, the input of multiple sample data of the multiple sample patients into the model to be trained, and outputting the propensity score of each sample patient for its corresponding sample patient grouping result data through the model to be trained includes:

Input the plurality of training sample data and the plurality of control sample data into the model to be trained, and perform logistic regression on the plurality of training sample data and the plurality of control sample data through the model to be trained, so as to calculate the propensity score of each sample patient for its corresponding sample patient grouping result data.
A system for constructing a grouping model based on causal inference, including:

The first acquisition module is used to acquire multiple sample data of multiple sample patients, and the multiple sample data of each sample patient includes multiple basic data, multiple patient historical follow-up data and sample patient grouping result data;

The first model processing module is used to input the multiple sample data of the multiple sample patients into the model to be trained, to output, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data, and to output, through the model to be trained, the sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;

The first determining module is used to determine the target sample expected cumulative reward value corresponding to each sample patient from the sample expected cumulative reward value of each sample patient; and

The optimization module is used to adjust the model parameters in the model to be trained based on the preset loss function, the propensity score of each sample patient and the corresponding target sample expected cumulative reward value, so as to optimize the grouping model.
The system for constructing a grouping model based on causal inference according to claim 6, wherein the preset loss function includes a first loss function, a second loss function and a third loss function;

The optimization module is also used for:

Calculate and obtain a regression loss value based on the first loss function and the expected cumulative reward value of the target sample corresponding to each sample patient;

Based on the second loss function and the propensity score of each sample patient, calculate a first loss value corresponding to the propensity score;

Based on the third loss function and the target sample expected cumulative reward value corresponding to each sample patient, calculate a second loss value corresponding to the target sample expected cumulative reward value;

summing the regression loss value, the first loss value and the second loss value to obtain a loss value;

Modifying the model parameters in the model to be trained according to the loss value to obtain a modified model to be trained; and

Perform grouping training on the multiple sample data of the multiple sample patients through the modified model to be trained, stop the training when the model parameters have been modified a preset number of times and the loss value no longer decreases, and mark the current model to be trained as the grouping model.
The grouping model building system based on causal inference according to claim 7, wherein the first loss function comprises:

reg_loss = Q(s_t, a_{tmax}) - Q(s_t, a_t);

Wherein, reg_loss is the regression loss value, which is used to prevent overestimation of the Q value; Q represents the expected cumulative reward value corresponding to the sample patient; s_t represents the multiple sample data at time t; a_t represents the sample patient grouping result data corresponding to the sample patient at time t; Q(s_t, a_{tmax}) represents the largest sample expected cumulative reward value among the multiple sample expected cumulative reward values output by the model to be trained; Q(s_t, a_t) represents the actual expected cumulative reward value of the sample patient grouping result data determined for the sample patient in the state s_t;

The second loss function includes:

g_loss = CrossEntropy(g(s_t), to_one_hot(a_t));

Wherein, s_t represents the multiple sample data at time t; a_t represents the sample patient grouping result data corresponding to the sample patient at time t; CrossEntropy represents the cross entropy; g represents the propensity score, and g(s_t) represents the output of g when the input state is s_t; to_one_hot(a_t) represents the one-hot encoding of the sample patient grouping result data (the action) corresponding to the sample patient at time t; g_loss is the first loss value, which is used to make the output of g approximate the sample patient grouping result data;

The third loss function includes:

Q_loss = (Q(s_t, a_t) - (γ + max_a(γ * Q(s_{t+1}, a_{t+1}))))^2;

Wherein, s_t represents the multiple sample data at time t; a_t represents the sample patient grouping result data corresponding to the sample patient at time t; s_{t+1} represents the multiple sample data at the next time t+1 transferred from s_t; Q(s_t, a_t) represents the actual expected cumulative reward value corresponding to the sample patient grouping result data a_t when the sample patient is in the state of the multiple sample data at time t; max_a(γ * Q(s_{t+1}, a_{t+1})) represents the maximum of the sample expected cumulative reward values corresponding to the multiple sample patient grouping result data a_{t+1} when the sample patient is in the state of the multiple sample data at time t+1; γ represents the discount factor in the model to be trained, which is the attenuation ratio applied when the target sample expected cumulative reward value at the next time t+1 is discounted to the target sample expected cumulative reward value at time t; Q_loss is the second loss value.
The system for constructing a grouping model based on causal inference according to claim 8, wherein the model to be trained is a deep reinforcement learning model.
The system for constructing grouping models based on causal inference according to claim 6, wherein said system also includes: a randomized assignment module;

The randomized allocation module is configured to: perform randomized allocation on multiple sample data of the multiple sample patients to obtain multiple training sample data and multiple control sample data;

Correspondingly, the first model processing module is further configured to: input the multiple training sample data and the multiple control sample data into the model to be trained, and perform logistic regression on the multiple training sample data and the multiple control sample data through the model to be trained, so as to calculate the propensity score of each sample patient for its corresponding sample patient grouping result data.
A method for processing medical data, including:

Acquiring a plurality of patient data of the target patient, the plurality of patient data including a plurality of basic data, a plurality of patient historical follow-up data, and patient current follow-up data;

Inputting the plurality of basic data, the plurality of patient historical follow-up data and the patient current follow-up data into the grouping model according to any one of claims 1 to 5, and outputting, through the grouping model, the expected cumulative reward value of the target patient for each model patient grouping result data;

From the plurality of expected cumulative reward values, determining the largest expected cumulative reward value as a target expected cumulative reward value; and

According to the target expected cumulative reward value, determine the corresponding model patient grouping result data as the target patient grouping result data corresponding to the target patient.
A medical data processing system, including:

The second acquiring module is used to acquire multiple patient data of the target patient, the multiple patient data including multiple basic data, multiple patient historical follow-up data and patient current follow-up data;

The second model processing module is used to input the plurality of basic data, the plurality of patient historical follow-up data and the patient current follow-up data into the grouping model according to any one of claims 1 to 5, and to output, through the grouping model, the expected cumulative reward value of the target patient for each model patient grouping result data;

The second determining module is used to determine the largest expected cumulative reward value as a target expected cumulative reward value from among multiple expected cumulative reward values; and

The third determination module is configured to determine the corresponding model patient grouping result data as the target patient grouping result data corresponding to the target patient according to the target expected cumulative reward value.
A computer device comprising a memory, a processor, and a computer program stored on the memory and operable on the processor, wherein the processor performs the following steps when executing the computer program:

Obtain multiple sample data of multiple sample patients, and multiple sample data of each sample patient includes multiple basic data, multiple patient historical follow-up data and sample patient grouping result data;

Input the plurality of sample data of the plurality of sample patients into the model to be trained, output, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data, and output, through the model to be trained, the sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;

From the sample expected cumulative reward value of each sample patient, determine the target sample expected cumulative reward value corresponding to each sample patient; and

Based on the preset loss function, the propensity score of each sample patient and the corresponding target sample expected cumulative reward value, the model parameters in the model to be trained are adjusted to optimize the grouping model.
The computer device according to claim 13, wherein the preset loss function comprises a first loss function, a second loss function and a third loss function;

When the processor executes the computer program, the following steps are also performed:

Calculate and obtain a regression loss value based on the first loss function and the expected cumulative reward value of the target sample corresponding to each sample patient;

Based on the second loss function and the propensity score of each sample patient, calculate a first loss value corresponding to the propensity score;

Based on the third loss function and the target sample expected cumulative reward value corresponding to each sample patient, calculate a second loss value corresponding to the target sample expected cumulative reward value;

summing the regression loss value, the first loss value and the second loss value to obtain a loss value;

Modifying the model parameters in the model to be trained according to the loss value to obtain a modified model to be trained; and

Perform grouping training on the multiple sample data of the multiple sample patients through the modified model to be trained, stop the training when the model parameters have been modified a preset number of times and the loss value no longer decreases, and mark the current model to be trained as the grouping model.
The computer device of claim 14, wherein the first loss function comprises:

$$\mathrm{reg\_loss} = Q(s_t, a_{t\max}) - Q(s_t, a_t);$$

wherein reg_loss is the regression loss value, which is used to prevent overestimation of the Q value; Q represents the expected cumulative reward value corresponding to the sample patient; $s_t$ represents the plurality of sample data at time t; $a_t$ represents the sample patient grouping result data corresponding to the sample patient at time t; $Q(s_t, a_{t\max})$ represents the largest sample expected cumulative reward value among the multiple sample expected cumulative reward values output by the model to be trained; and $Q(s_t, a_t)$ represents the actual expected cumulative reward value of the sample patient grouping result data determined for the sample patient in the state $s_t$;

The second loss function includes:

$$\mathrm{g\_loss} = \mathrm{CrossEntropy}\bigl(g(s_t),\ \mathrm{to\_one\_hot}(a_t)\bigr);$$

wherein $s_t$ represents the plurality of sample data at time t; $a_t$ represents the sample patient grouping result data corresponding to the sample patient at time t; CrossEntropy represents cross entropy; $g(s_t)$ represents the output of g obtained when the state $s_t$ is input; g represents the propensity score; $\mathrm{to\_one\_hot}(a_t)$ represents the one-hot encoding of the sample patient grouping result data (the action) corresponding to the sample patient at time t; and g_loss is the first loss value, which is used to make the output of g approximate the sample patient grouping result data;

The third loss function includes:

$$\mathrm{Q\_loss} = \Bigl(Q(s_t, a_t) - \bigl(\gamma + \max_a(\gamma \cdot Q(s_{t+1}, a_{t+1}))\bigr)\Bigr)^2;$$

wherein $s_t$ represents the plurality of sample data at time t; $a_t$ represents the sample patient grouping result data corresponding to the sample patient at time t; $s_{t+1}$ represents the plurality of sample data at the next time t+1, to which $s_t$ transfers; $Q(s_t, a_t)$ represents the actual expected cumulative reward value corresponding to the sample patient grouping result data $a_t$ when the sample patient is in the state of the plurality of sample data at time t; $\max_a(\gamma \cdot Q(s_{t+1}, a_{t+1}))$ represents the maximum of the sample expected cumulative reward values corresponding to the multiple sample grouping result data $a_{t+1}$ when the sample patient is in the state of the plurality of sample data at time t+1; γ represents the discount factor in the model to be trained, namely the attenuation ratio applied when the target sample expected cumulative reward value at the next time t+1 is discounted to the target sample expected cumulative reward value corresponding to time t; and Q_loss is the second loss value.
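By way of non-limiting illustration, the three loss functions can be sketched for a single sample patient as follows. The sketch makes several assumptions that are not stated in the application: Q values and propensity outputs are plain lists indexed by group, the natural logarithm is used for cross entropy, and the additive term inside Q_loss is passed in as a parameter. The formula as printed places γ in that position, whereas standard Q-learning places the immediate reward there, so the sketch leaves the choice to the caller. All numeric values are hypothetical.

```python
import math


def to_one_hot(a_t, n_groups):
    """One-hot encode the observed group index a_t."""
    return [1.0 if i == a_t else 0.0 for i in range(n_groups)]


def cross_entropy(probs, one_hot):
    """Cross entropy between predicted probabilities and a one-hot target."""
    return -sum(y * math.log(p) for p, y in zip(probs, one_hot) if y > 0.0)


def reg_loss(q_values, a_t):
    """Gap between the largest Q value and the Q value of the observed group."""
    return max(q_values) - q_values[a_t]


def g_loss(g_probs, a_t):
    """Pushes the propensity output g(s_t) towards the observed group."""
    return cross_entropy(g_probs, to_one_hot(a_t, len(g_probs)))


def q_loss(q_values, a_t, next_q_values, gamma, additive_term):
    """Squared gap between Q(s_t, a_t) and the bootstrapped target.

    additive_term is gamma in the formula as printed (assumption: a caller
    following standard Q-learning would pass the immediate reward instead).
    """
    target = additive_term + max(gamma * q for q in next_q_values)
    return (q_values[a_t] - target) ** 2


# Hypothetical values for one sample patient with three candidate groups.
q_t = [1.0, 2.0, 0.5]       # Q(s_t, a) for each group a
g_t = [0.5, 0.25, 0.25]     # g(s_t), sums to 1
q_next = [1.0, 3.0, 2.0]    # Q(s_{t+1}, a) for each group a
a_t, gamma = 0, 0.9

loss = (reg_loss(q_t, a_t)
        + g_loss(g_t, a_t)
        + q_loss(q_t, a_t, q_next, gamma, additive_term=gamma))
print(round(loss, 4))
```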
The computer device of claim 13, wherein:

Before the multiple sample data of the multiple sample patients are input into the model to be trained, the processor also executes the following steps when executing the computer program:

Randomly assigning multiple sample data of the multiple sample patients to obtain multiple training sample data and multiple control sample data;

Correspondingly, when the processor executes the computer program, the following steps are also performed:

Input the plurality of training sample data and the plurality of control sample data into the model to be trained, perform logistic regression on the plurality of training sample data and the plurality of control sample data through the model to be trained, and calculate the propensity score of each sample patient for its corresponding sample patient grouping result data.
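By way of non-limiting illustration, the random assignment and the logistic regression used to calculate the propensity score can be sketched as follows. The sketch assumes only two candidate groups (labelled 0 and 1), a single numeric feature per sample patient, and a plain batch gradient descent fit. The function names, the even split ratio, the learning rate and the data are hypothetical and do not come from the application.

```python
import math
import random


def random_split(samples, train_fraction=0.5, seed=0):
    """Randomly assign samples to a training set and a control set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, lr=0.5, epochs=1000):
    """Fit a binary logistic regression by batch gradient descent.

    rows is a list of (features, group) pairs with group in {0, 1}.
    """
    n_features = len(rows[0][0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in rows:
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b


def propensity_score(w, b, x, group):
    """Probability the fitted model assigns to the patient's observed group."""
    p1 = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return p1 if group == 1 else 1.0 - p1


# Hypothetical sample patients: ([feature], observed group).
samples = [([0.1], 0), ([0.2], 0), ([0.3], 0), ([0.4], 0),
           ([0.6], 1), ([0.7], 1), ([0.8], 1), ([0.9], 1)]

training, control = random_split(samples)
w, b = fit_logistic(training + control)  # regression over both sets
scores = [propensity_score(w, b, x, group) for x, group in samples]
```

With more than two candidate groups, a softmax output over the groups would play the role that the sigmoid plays here.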
A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program can be executed by at least one processor, so that the at least one processor performs the following steps:

Obtain a plurality of sample data of a plurality of sample patients, wherein the plurality of sample data of each sample patient includes multiple basic data, multiple patient historical follow-up data and sample patient grouping result data;

Input the plurality of sample data of the plurality of sample patients into the model to be trained; output, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data, and the sample expected cumulative reward value of each sample patient for each patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;

From the sample expected cumulative reward value of each sample patient, determine the target sample expected cumulative reward value corresponding to each sample patient; and

Based on the preset loss function, the propensity score of each sample patient and the corresponding target sample expected cumulative reward value, adjust the model parameters in the model to be trained, so as to obtain an optimized grouping model.
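By way of non-limiting illustration, a model that outputs both a propensity score and one sample expected cumulative reward value per patient grouping result can be sketched as two linear output heads over the same input state. This two-head layout, the softmax on the propensity head, the parameter names g_w, g_b, q_w and q_b, and all numeric values are assumptions made for the sketch; the application does not specify the network structure in these claims.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """One output per row of weights: dot(row, x) + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def forward(state, params):
    """Return (g, q): group probabilities and one Q value per group."""
    g = softmax(linear(params["g_w"], params["g_b"], state))
    q = linear(params["q_w"], params["q_b"], state)
    return g, q


# Hypothetical parameters for two input features and two candidate groups.
params = {
    "g_w": [[1.0, 0.0], [0.0, 1.0]], "g_b": [0.0, 0.0],
    "q_w": [[0.2, 0.4], [1.0, -1.0]], "q_b": [0.1, 0.3],
}
state = [1.0, 0.5]   # s_t: features of one sample patient
a_t = 0              # observed sample patient grouping result

g, q = forward(state, params)
propensity = g[a_t]  # probability of the group the patient is actually in
```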
The computer-readable storage medium according to claim 17, wherein the preset loss function includes a first loss function, a second loss function, and a third loss function;

When the processor executes the computer program, the following steps are also performed:

Adjusting the model parameters in the model to be trained based on the preset loss function, the propensity score of each sample patient and the corresponding target sample expected cumulative reward value, so as to obtain an optimized grouping model, includes:

Calculate a regression loss value based on the first loss function and the target sample expected cumulative reward value corresponding to each sample patient;

Based on the second loss function and the propensity score of each sample patient, calculate a first loss value corresponding to the propensity score;

Based on the third loss function and the target sample expected cumulative reward value corresponding to each sample patient, calculate a second loss value corresponding to the target sample expected cumulative reward value;

Sum the regression loss value, the first loss value and the second loss value to obtain a loss value;

Modify the model parameters in the model to be trained according to the loss value to obtain a modified model to be trained; and

Perform grouping training on the plurality of sample data of the plurality of sample patients through the modified model to be trained; stop the training when the number of modifications of the model parameters reaches a preset number and the loss value no longer decreases, and mark the current model to be trained as the grouping model.
The computer-readable storage medium of claim 18, wherein the first loss function comprises:

$$\mathrm{reg\_loss} = Q(s_t, a_{t\max}) - Q(s_t, a_t);$$

wherein reg_loss is the regression loss value, which is used to prevent overestimation of the Q value; Q represents the expected cumulative reward value corresponding to the sample patient; $s_t$ represents the plurality of sample data at time t; $a_t$ represents the sample patient grouping result data corresponding to the sample patient at time t; $Q(s_t, a_{t\max})$ represents the largest sample expected cumulative reward value among the multiple sample expected cumulative reward values output by the model to be trained; and $Q(s_t, a_t)$ represents the actual expected cumulative reward value of the sample patient grouping result data determined for the sample patient in the state $s_t$;

The second loss function includes:

$$\mathrm{g\_loss} = \mathrm{CrossEntropy}\bigl(g(s_t),\ \mathrm{to\_one\_hot}(a_t)\bigr);$$

wherein $s_t$ represents the plurality of sample data at time t; $a_t$ represents the sample patient grouping result data corresponding to the sample patient at time t; CrossEntropy represents cross entropy; $g(s_t)$ represents the output of g obtained when the state $s_t$ is input; g represents the propensity score; $\mathrm{to\_one\_hot}(a_t)$ represents the one-hot encoding of the sample patient grouping result data (the action) corresponding to the sample patient at time t; and g_loss is the first loss value, which is used to make the output of g approximate the sample patient grouping result data;

The third loss function includes:

$$\mathrm{Q\_loss} = \Bigl(Q(s_t, a_t) - \bigl(\gamma + \max_a(\gamma \cdot Q(s_{t+1}, a_{t+1}))\bigr)\Bigr)^2;$$

wherein $s_t$ represents the plurality of sample data at time t; $a_t$ represents the sample patient grouping result data corresponding to the sample patient at time t; $s_{t+1}$ represents the plurality of sample data at the next time t+1, to which $s_t$ transfers; $Q(s_t, a_t)$ represents the actual expected cumulative reward value corresponding to the sample patient grouping result data $a_t$ when the sample patient is in the state of the plurality of sample data at time t; $\max_a(\gamma \cdot Q(s_{t+1}, a_{t+1}))$ represents the maximum of the sample expected cumulative reward values corresponding to the multiple sample grouping result data $a_{t+1}$ when the sample patient is in the state of the plurality of sample data at time t+1; γ represents the discount factor in the model to be trained, namely the attenuation ratio applied when the target sample expected cumulative reward value at the next time t+1 is discounted to the target sample expected cumulative reward value corresponding to time t; and Q_loss is the second loss value.
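By way of non-limiting illustration, the three loss values can be worked through by hand for one sample patient with two candidate groups. The numbers are hypothetical: $Q(s_t, \cdot) = (0.4, 1.2)$, the observed grouping result $a_t$ is the first group, $g(s_t) = (0.8, 0.2)$, $Q(s_{t+1}, \cdot) = (2.0, 1.0)$ and $\gamma = 0.5$. The working assumes the natural logarithm, under which cross entropy against a one-hot target reduces to the negative logarithm of the probability given to the observed group, and it applies the third loss function exactly as printed, with γ as the additive term.

```latex
\begin{align*}
\mathrm{reg\_loss} &= Q(s_t, a_{t\max}) - Q(s_t, a_t) = 1.2 - 0.4 = 0.8 \\
\mathrm{g\_loss}   &= -\ln g(s_t)[a_t] = -\ln 0.8 \approx 0.223 \\
\mathrm{Q\_loss}   &= \bigl(0.4 - (0.5 + \max(0.5 \cdot 2.0,\ 0.5 \cdot 1.0))\bigr)^2
                    = (0.4 - 1.5)^2 = 1.21 \\
\mathrm{loss}      &= 0.8 + 0.223 + 1.21 \approx 2.233
\end{align*}
```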
The computer-readable storage medium of claim 17, wherein:

Before the multiple sample data of the multiple sample patients are input into the model to be trained, the processor also executes the following steps when executing the computer program:

Randomly assigning multiple sample data of the multiple sample patients to obtain multiple training sample data and multiple control sample data;

Correspondingly, when the processor executes the computer program, the following steps are also performed:

Input the plurality of training sample data and the plurality of control sample data into the model to be trained, perform logistic regression on the plurality of training sample data and the plurality of control sample data through the model to be trained, and calculate the propensity score of each sample patient for its corresponding sample patient grouping result data.