WO2023050668A1 - Clustering model construction method based on causal inference and medical data processing method - Google Patents

Clustering model construction method based on causal inference and medical data processing method

Info

Publication number
WO2023050668A1
WO2023050668A1 PCT/CN2022/074389 CN2022074389W WO2023050668A1 WO 2023050668 A1 WO2023050668 A1 WO 2023050668A1 CN 2022074389 W CN2022074389 W CN 2022074389W WO 2023050668 A1 WO2023050668 A1 WO 2023050668A1
Authority
WO
WIPO (PCT)
Prior art keywords
sample
patient
model
data
loss
Prior art date
Application number
PCT/CN2022/074389
Other languages
French (fr)
Chinese (zh)
Inventor
徐卓扬
孙行智
赵婷婷
胡岗
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2023050668A1 publication Critical patent/WO2023050668A1/en

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/041 Abduction
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • The embodiments of the present application relate to the technical field of artificial intelligence, and in particular to a method for constructing a grouping model based on causal inference and a method for processing medical data.
  • The embodiments of the present application provide a method and a system for constructing a grouping model based on causal inference, a computer device, a computer-readable storage medium, and a medical data processing method. They address a problem in how existing deep reinforcement learning models learn from and process data: a grouping model trained from such a model has low accuracy and makes unreasonable patient grouping decisions.
  • One aspect of the present application provides a method for constructing a grouping model based on causal inference, including:
  • acquiring multiple sample data of multiple sample patients, where the multiple sample data of each sample patient include multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data;
  • inputting the multiple sample data of the multiple sample patients into a model to be trained, and outputting, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data and the sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, where the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;
  • determining, from the sample expected cumulative reward values of each sample patient, the target sample expected cumulative reward value corresponding to that sample patient; and
  • adjusting the model parameters in the model to be trained, based on a preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, to obtain the optimized grouping model.
  • Another aspect of the embodiments of the present application provides a system for constructing a grouping model based on causal inference, including:
  • a first acquisition module, used to acquire multiple sample data of multiple sample patients, where the multiple sample data of each sample patient include multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data;
  • a first model processing module, used to input the multiple sample data of the multiple sample patients into the model to be trained, and to output, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data and the sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, where the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;
  • a first determining module, used to determine, from the sample expected cumulative reward values of each sample patient, the target sample expected cumulative reward value corresponding to that sample patient; and
  • an optimization module, used to adjust the model parameters in the model to be trained, based on the preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, to obtain the optimized grouping model.
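  • Another aspect of the embodiments of the present application provides a medical data processing method, including:
  • acquiring multiple patient data of a target patient, the multiple patient data including multiple basic data, multiple patient historical follow-up data, and patient current follow-up data;
  • inputting the multiple basic data, the multiple patient historical follow-up data, and the patient current follow-up data into the above-mentioned grouping model, and outputting, through the grouping model, the expected cumulative reward value of the target patient for each model patient grouping result data;
  • determining, from the multiple expected cumulative reward values, the largest expected cumulative reward value as the target expected cumulative reward value; and
  • determining, according to the target expected cumulative reward value, the corresponding model patient grouping result data as the target patient grouping result data corresponding to the target patient.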
  • Another aspect of the embodiments of the present application provides a medical data processing system, including:
  • a second acquiring module, used to acquire multiple patient data of the target patient, the multiple patient data including multiple basic data, multiple patient historical follow-up data, and patient current follow-up data;
  • a second model processing module, used to input the multiple basic data, the multiple patient historical follow-up data, and the patient current follow-up data into the above-mentioned grouping model, and to output, through the grouping model, the expected cumulative reward value of the target patient for each model patient grouping result data;
  • a second determining module, used to determine, from the multiple expected cumulative reward values, the largest expected cumulative reward value as the target expected cumulative reward value; and
  • a third determining module, used to determine, according to the target expected cumulative reward value, the corresponding model patient grouping result data as the target patient grouping result data corresponding to the target patient.
  • An embodiment of the present application further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor performs the following steps:
  • acquiring multiple sample data of multiple sample patients, where the multiple sample data of each sample patient include multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data;
  • inputting the multiple sample data of the multiple sample patients into a model to be trained, and outputting, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data and the sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, where the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;
  • determining, from the sample expected cumulative reward values of each sample patient, the target sample expected cumulative reward value corresponding to that sample patient; and
  • adjusting the model parameters in the model to be trained, based on a preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, to obtain the optimized grouping model.
  • An embodiment of the present application further provides a computer-readable storage medium in which a computer program is stored. The computer program can be executed by at least one processor, so that the at least one processor performs the following steps:
  • acquiring multiple sample data of multiple sample patients, where the multiple sample data of each sample patient include multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data;
  • inputting the multiple sample data of the multiple sample patients into a model to be trained, and outputting, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data and the sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, where the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;
  • determining, from the sample expected cumulative reward values of each sample patient, the target sample expected cumulative reward value corresponding to that sample patient; and
  • adjusting the model parameters in the model to be trained, based on a preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, to obtain the optimized grouping model.
  • In the causal inference-based grouping model construction method and system, the computer device, the computer-readable storage medium, and the medical data processing method provided by the embodiments of the present application, the multiple basic data, the multiple patient historical follow-up data, and the sample patient grouping result data of the multiple sample patients are input into the model to be trained, which outputs the propensity score of each sample patient for its corresponding sample patient grouping result data and the sample expected cumulative reward value of each sample patient for each model patient grouping result data.
  • The target sample expected cumulative reward value corresponding to each sample patient is determined from that patient's sample expected cumulative reward values, and the model parameters in the model to be trained are adjusted based on the preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, to obtain the optimized grouping model.
  • Training on the multiple sample data by combining the model to be trained with causal inference analysis eliminates the selection bias in the patient grouping result data, so the model fit is more reasonable and the trained model is more accurate in application.
  • FIG. 1 is a flow chart of the steps of the method for constructing a grouping model based on causal inference in Embodiment 1 of the present application;
  • FIG. 2 is a flow chart of the steps of the method for constructing a grouping model based on causal inference in Embodiment 1 of the present application;
  • FIG. 3 is a flow chart of the steps of the method for constructing a grouping model based on causal inference in Embodiment 1 of the present application;
  • FIG. 4 is a schematic diagram of program modules of a system for constructing a grouping model based on causal inference in Embodiment 2 of the present application;
  • FIG. 5 is a flow chart of the steps of the medical data processing method of Embodiment 3 of the present application.
  • FIG. 6 is a schematic diagram of the program modules of the medical data processing system according to Embodiment 4 of the present application.
  • FIG. 7 is a schematic diagram of a hardware structure of a computer device according to Embodiment 5 of the present application.
  • FIG. 1 shows a flow chart of the steps of the method for constructing a grouping model based on causal inference according to an embodiment of the present application. It can be understood that the flow chart in this method embodiment does not limit the order in which the steps are executed.
  • The following exemplary description takes a computer device as the execution subject.
  • The method for constructing a grouping model based on causal inference may include steps S100 to S106, wherein:
  • Step S100: acquire multiple sample data of multiple sample patients, where the multiple sample data of each sample patient include multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data.
  • For example, the multiple sample patients may be multiple diabetic patients.
  • For each diabetic patient, the patient's basic data, the data of each follow-up visit in chronological order, and the corresponding sample patient grouping result data are collected, and each diabetic patient is taken as one sample.
  • The basic data include, but are not limited to, age, gender, place of work, and frequently visited places; the follow-up data include medication history, medical test reports from third-party platforms or medical systems, expert or doctor prescription information, and the like.
  • The method further includes preprocessing the multiple sample data. Specifically, through feature engineering, feature merging is performed on the multiple basic data and on the multiple patient historical follow-up data of the multiple sample patients to obtain the training data. For example, feature engineering yields the first feature primitives of each basic datum and the second feature primitives of each patient historical follow-up datum, and the first feature primitives corresponding to each basic datum are then merged based on the sample patient grouping result data.
  • Step S102: input the multiple sample data of the multiple sample patients into the model to be trained, and output, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data and the sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, where the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data.
  • For example, the model to be trained may be a deep reinforcement learning model (Deep Q Network, DQN).
  • The preprocessed sample data input into the deep reinforcement learning model are defined as the state; the multiple model patient grouping result data are defined as the actions; and the reward is defined according to the outcome information obtained after an action is taken for the sample patient in a given state (the multiple sample data).
  • The action is the one-hot encoding of the patient grouping result data, and the reward includes a long-term reward and a short-term reward.
  • For example, the long-term reward can be defined as sign(whether there is a complication at the last follow-up) * 5, and the short-term reward can be defined as sign(whether glycated hemoglobin reaches the target at the next follow-up) * 1.
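The action encoding and the two example rewards can be sketched as follows. This is a minimal illustration, not the application's implementation: the function names are hypothetical, and the convention that sign(...) returns +1 for a favourable outcome (no complication, target reached) and -1 otherwise is an assumption, since the text does not spell it out.

```python
from typing import List


def sign(favourable: bool) -> int:
    """Assumed convention: +1 for a favourable outcome, -1 otherwise."""
    return 1 if favourable else -1


def one_hot(group_index: int, num_groups: int) -> List[int]:
    """One-hot encode a patient grouping result (the action)."""
    if not 0 <= group_index < num_groups:
        raise ValueError("group_index must be in [0, num_groups)")
    return [1 if i == group_index else 0 for i in range(num_groups)]


def long_term_reward(no_complication_at_last_follow_up: bool) -> int:
    """Long-term reward: sign(complication status at last follow-up) * 5."""
    return sign(no_complication_at_last_follow_up) * 5


def short_term_reward(hba1c_on_target_at_next_follow_up: bool) -> int:
    """Short-term reward: sign(HbA1c target status at next follow-up) * 1."""
    return sign(hba1c_on_target_at_next_follow_up) * 1
```

How the long-term and short-term rewards are combined at each time step is not stated in the text, so the sketch keeps them as separate functions.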
  • Step S300: randomly allocate the multiple sample data of the multiple sample patients to obtain multiple training sample data and multiple control sample data.
  • Step S302: input the multiple training sample data and the multiple control sample data into the model to be trained, perform logistic regression on them through the model to be trained, and calculate the propensity score of each sample patient for its corresponding sample patient grouping result data.
  • The multiple training sample data of multiple first sample patients and the multiple control sample data of multiple second sample patients are randomly assigned, and for each first sample patient a second sample patient is selected from the multiple second sample patients as a control. This can be understood as follows: a third similarity is computed between the training sample data of the first sample patient and the control sample data of each second sample patient; from the multiple third similarities corresponding to each first sample patient, the one or more second sample patients that satisfy a third preset threshold are screened out; and from those, the second sample patients whose sample patient grouping result data are inconsistent with that of the first sample patient are selected for control analysis, so that the model carries out causal analysis on randomly assigned sample data.
  • The multiple training sample data of the first sample patient are positive sample data, and each control sample datum of the screened-out second sample patients is negative sample data.
  • By incorporating the propensity represented by the sample data, the DQN model outputs a more accurate expected reward.
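The propensity score in step S302 is a logistic-regression estimate of the probability that a patient receives the grouping actually recorded. The sketch below fits such a model with plain batch gradient descent for the two-group case. It is an illustration under stated assumptions: the function names, the learning rate, and the epoch count are hypothetical, and the application does not specify how the regression is fitted.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.1,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def propensity(x: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Estimated probability p(a_1 | s) that the patient is assigned to group 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
```

For example, `fit_logistic([[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])` returns parameters under which `propensity([3.0], ...)` is above 0.5 and `propensity([0.0], ...)` is below it.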
  • The model to be trained includes an input layer, an output layer, at least four NN layers (hidden layers), and a classification layer. The input layer receives the multiple sample data of the multiple sample patients; the hidden layers analyze and process the multiple sample data; the output layer includes multiple output nodes, each of which outputs the score of the model patient grouping result data corresponding to that node; and the classification layer converts the score of each output node into the sample expected cumulative reward value of each model patient grouping result data.
  • The multiple sample data (states) of the multiple sample patients are input into the input layer of the model to be trained. After processing by two hidden layers, the propensity score g of each sample patient for its sample patient grouping result data is output together with other feature values. The other feature values are input to the remaining hidden layers, and the output layer outputs the sample expected cumulative reward values $Q_0, Q_1, Q_2, \ldots, Q_n$ of each sample patient for each model patient grouping result data (action).
  • g represents the probability that a doctor or expert chooses the corresponding sample patient grouping result data in the given state.
  • For example, g can be expressed as $p(a_1 \mid s) = 1 - p(a_0 \mid s)$.
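The data flow described above (two shared hidden layers, a propensity output g, then the remaining hidden layers and the Q-value outputs) can be sketched as a forward pass. This is a minimal illustration with untrained random weights: the class name `GroupingNet`, the layer sizes, the ReLU activation, and the softmax used for g are all assumptions, and the separate classification layer mentioned in the text is omitted.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weights, bias)


def dense(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v: List[float]) -> List[float]:
    return [max(0.0, a) for a in v]


def softmax(v: List[float]) -> List[float]:
    m = max(v)
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class GroupingNet:
    """Hypothetical network: shared layers -> g head; remaining layers -> Q head."""

    def __init__(self, n_features: int, n_hidden: int, n_groups: int, seed: int = 0):
        rng = random.Random(seed)
        self.shared = [init_layer(n_features, n_hidden, rng), init_layer(n_hidden, n_hidden, rng)]
        self.g_head = init_layer(n_hidden, n_groups, rng)
        self.rest = [init_layer(n_hidden, n_hidden, rng), init_layer(n_hidden, n_hidden, rng)]
        self.q_head = init_layer(n_hidden, n_groups, rng)

    def forward(self, state: List[float]) -> Tuple[List[float], List[float]]:
        h = state
        for weights, bias in self.shared:        # first two hidden layers
            h = relu(dense(h, weights, bias))
        g = softmax(dense(h, *self.g_head))      # propensity over groupings
        for weights, bias in self.rest:          # remaining hidden layers
            h = relu(dense(h, weights, bias))
        q_values = dense(h, *self.q_head)        # one Q value per grouping
        return g, q_values
```

With two groups, the softmax output reduces to the relation $p(a_1 \mid s) = 1 - p(a_0 \mid s)$ given in the text.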
  • Step S104: from the sample expected cumulative reward values of each sample patient, determine the target sample expected cumulative reward value corresponding to that sample patient.
  • The target sample expected cumulative reward value of a sample patient is the largest of that patient's multiple sample expected cumulative reward values.
  • Step S106: based on the preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, adjust the model parameters in the model to be trained to obtain the optimized grouping model.
  • The preset loss function is $\mathrm{Loss} = \mathrm{Q\_loss} + \lambda_1 \cdot \mathrm{g\_loss} + \lambda_2 \cdot \mathrm{reg\_loss}$;
  • where Loss is the loss value, reg_loss is the regression loss value, g_loss is the first loss value, Q_loss is the second loss value, and $\lambda_1$ and $\lambda_2$ are adjustable hyperparameters of the model to be trained.
  • The model to be trained is trained repeatedly with the preset loss function: the Loss is calculated through the loss function, the gradient of the Loss is computed, and the model parameters are adjusted by backpropagating the Loss with the gradient descent algorithm. Training is repeated until the Loss no longer decreases, at which point the grouping model is obtained.
  • The sample data are organized into quadruples of the form $(s_t, a_t, r, s_{t+1})$, where $s_t$ is the state at time t, $a_t$ is the doctor's grouping scheme (action) at time t, and r and $s_{t+1}$ are, respectively, the reward obtained after taking $a_t$ in $s_t$ and the next state transitioned to.
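A patient's chronological follow-up record can be turned into such quadruples as sketched below. The `Transition` type and `build_transitions` helper are hypothetical names, and the sketch assumes one state per visit, with an action and reward recorded for every visit except the last.

```python
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    state: Sequence[float]       # s_t
    action: int                  # a_t, index of the grouping chosen at time t
    reward: float                # r, obtained after taking a_t in s_t
    next_state: Sequence[float]  # s_{t+1}


def build_transitions(
    states: Sequence[Sequence[float]],
    actions: Sequence[int],
    rewards: Sequence[float],
) -> List[Transition]:
    """Pair each visit with the next one to form (s_t, a_t, r, s_{t+1})."""
    if not (len(states) - 1 == len(actions) == len(rewards)):
        raise ValueError("expected len(states) - 1 == len(actions) == len(rewards)")
    return [
        Transition(states[t], actions[t], rewards[t], states[t + 1])
        for t in range(len(actions))
    ]
```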
  • The loss functions are as follows.
  • The first loss function includes:
  • $\mathrm{reg\_loss} = Q(s_t, a_{t\max}) - Q(s_t, a_t)$;
  • where reg_loss is the regression loss value, used to prevent overestimation of the Q value; Q is the expected cumulative reward value corresponding to the sample patient; $s_t$ is the multiple sample data at time t; $a_t$ is the sample patient grouping result data of the sample patient at time t; $Q(s_t, a_{t\max})$ is the largest of the multiple sample expected cumulative reward values output by the model to be trained; and $Q(s_t, a_t)$ is the expected cumulative reward value of the sample patient grouping result data actually determined for the sample patient in state $s_t$.
  • The second loss function includes:
  • $\mathrm{g\_loss} = \mathrm{CrossEntropy}(g(s_t), \mathrm{to\_one\_hot}(a_t))$;
  • where g_loss is the first loss value.
  • The third loss function includes:
  • $\mathrm{Q\_loss} = \left(Q(s_t, a_t) - \left(r + \max_{a_{t+1}} \left(\gamma \cdot Q(s_{t+1}, a_{t+1})\right)\right)\right)^2$;
  • where r is the reward; $\gamma$ is the discount factor, which indicates the attenuation ratio at which the target sample expected cumulative reward value at the next time t+1 is discounted into the target sample expected cumulative reward value at time t; and Q_loss is the second loss value.
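For a single quadruple, the three loss terms and their weighted combination can be computed as in the sketch below. It is an illustration only: the function and parameter names (`gamma`, `lambda_1`, `lambda_2`) are placeholders chosen here for the discount factor and the two hyperparameters, cross-entropy against a one-hot target is written in its reduced form as the negative log-probability of the recorded action, and terminal states are not handled specially.

```python
import math
from typing import Sequence


def reg_loss(q_values: Sequence[float], action: int) -> float:
    """Q(s_t, a_tmax) - Q(s_t, a_t): gap between the best and the recorded action."""
    return max(q_values) - q_values[action]


def g_loss(g_probs: Sequence[float], action: int, eps: float = 1e-12) -> float:
    """Cross-entropy between g(s_t) and the one-hot encoding of a_t."""
    return -math.log(max(g_probs[action], eps))


def q_loss(
    q_values: Sequence[float],
    action: int,
    reward: float,
    next_q_values: Sequence[float],
    gamma: float,
) -> float:
    """Squared temporal-difference error against r + gamma * max Q(s_{t+1}, a)."""
    target = reward + gamma * max(next_q_values)
    return (q_values[action] - target) ** 2


def total_loss(
    q_values: Sequence[float],
    g_probs: Sequence[float],
    action: int,
    reward: float,
    next_q_values: Sequence[float],
    gamma: float,
    lambda_1: float,
    lambda_2: float,
) -> float:
    """Loss = Q_loss + lambda_1 * g_loss + lambda_2 * reg_loss."""
    return (
        q_loss(q_values, action, reward, next_q_values, gamma)
        + lambda_1 * g_loss(g_probs, action)
        + lambda_2 * reg_loss(q_values, action)
    )
```

In training, these per-sample values would be averaged over a batch before the gradient step; the batching scheme is not described in the text.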
  • The deep reinforcement learning model is thus combined with causal inference analysis to train on the multiple sample data, which decouples the propensity represented in the patient grouping result data and eliminates the selection bias in that data, so the model fit is more reasonable.
  • FIG. 4 shows a schematic diagram of the program modules of the causal inference-based grouping model construction system of the present application.
  • The causal inference-based grouping model construction system 40 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the present application and realize the above-mentioned method for constructing a grouping model based on causal inference.
  • A program module in the embodiments of the present application refers to a series of computer program instruction segments capable of performing specific functions, and is better suited than the program itself to describing the execution of the causal inference-based grouping model construction system 40 in the storage medium. The following description introduces the functions of each program module of this embodiment:
  • The causal inference-based grouping model construction system 40 includes:
  • the first acquiring module 400 is configured to acquire multiple sample data of multiple sample patients, and the multiple sample data of each sample patient includes multiple basic data, multiple patient historical follow-up data and sample patient grouping result data;
  • the first model processing module 402 is configured to input the multiple sample data of the multiple sample patients into the model to be trained, and to output, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data and the sample expected cumulative reward value of each sample patient for each model patient grouping result data, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;
  • the first determining module 404 is configured to determine the target sample expected cumulative reward value corresponding to each sample patient from the sample expected cumulative reward value of each sample patient;
  • the optimization module 406 is used to adjust the model parameters in the model to be trained based on the preset loss function, the propensity score of each sample patient and the corresponding expected cumulative reward value of the target sample, so as to optimize the grouping model.
  • The preset loss function includes a first loss function, a second loss function, and a third loss function. The optimization module 406 is further configured to: calculate the regression loss value based on the first loss function and the target sample expected cumulative reward value corresponding to each sample patient; calculate the first loss value corresponding to the propensity score based on the second loss function and the propensity score of each sample patient; calculate the second loss value corresponding to the target sample expected cumulative reward value based on the third loss function and the target sample expected cumulative reward value corresponding to each sample patient; sum the regression loss value, the first loss value, and the second loss value to obtain a loss value; modify the model parameters in the model to be trained according to the loss value to obtain a modified model to be trained; and perform grouping training on the multiple sample data of the multiple sample patients with the modified model to be trained, stopping the training when the number of parameter modifications reaches the preset number and the loss value no longer decreases, and taking the current model to be trained as the grouping model.
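The stopping rule (a preset number of parameter modifications has been reached and the loss value no longer decreases) can be sketched as a small loop. The helper name `train_until_converged`, the `update_step` callback, and the `max_updates` safety cap are hypothetical additions for illustration; `update_step` stands for one parameter modification that returns the resulting loss value.

```python
from typing import Callable


def train_until_converged(
    update_step: Callable[[], float],
    preset_updates: int,
    max_updates: int = 10_000,
) -> int:
    """Run updates until at least `preset_updates` have been made and the
    loss did not decrease on the latest update. Returns the update count."""
    previous_loss = float("inf")
    for n in range(1, max_updates + 1):
        loss = update_step()  # one modification of the model parameters
        if n >= preset_updates and loss >= previous_loss:
            return n
        previous_loss = loss
    return max_updates  # safety cap (assumption, not from the text)
```

For a loss sequence 5, 4, 3, 3 with `preset_updates=2`, the loop stops after the fourth update, the first one at which the loss fails to decrease.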
  • The first loss function includes:
  • $\mathrm{reg\_loss} = Q(s_t, a_{t\max}) - Q(s_t, a_t)$;
  • where reg_loss is the regression loss value, used to prevent overestimation of the Q value; Q is the expected cumulative reward value corresponding to the sample patient; $s_t$ is the multiple sample data at time t; $a_t$ is the sample patient grouping result data of the sample patient at time t; $Q(s_t, a_{t\max})$ is the largest of the multiple sample expected cumulative reward values output by the model to be trained; and $Q(s_t, a_t)$ is the expected cumulative reward value of the sample patient grouping result data actually determined for the sample patient in state $s_t$.
  • The second loss function includes:
  • $\mathrm{g\_loss} = \mathrm{CrossEntropy}(g(s_t), \mathrm{to\_one\_hot}(a_t))$;
  • where g_loss is the first loss value.
  • The third loss function includes:
  • $\mathrm{Q\_loss} = \left(Q(s_t, a_t) - \left(r + \max_{a_{t+1}} \left(\gamma \cdot Q(s_{t+1}, a_{t+1})\right)\right)\right)^2$;
  • where r is the reward; $\gamma$ is the discount factor, which indicates the attenuation ratio at which the target sample expected cumulative reward value at the next time t+1 is discounted into the target sample expected cumulative reward value at time t; and Q_loss is the second loss value.
  • the model to be trained is a deep reinforcement learning model.
  • The first model processing module 402 is further configured to: randomly allocate the multiple sample data of the multiple sample patients to obtain multiple training sample data and multiple control sample data; input the multiple training sample data and the multiple control sample data into the model to be trained; perform logistic regression on them through the model to be trained; and calculate the propensity score of each sample patient for its corresponding sample patient grouping result data.
  • FIG. 5 shows a flow chart of the steps of the medical data processing method according to an embodiment of the present application. It can be understood that the flow chart in this method embodiment does not limit the order in which the steps are executed.
  • The following exemplary description takes a computer device as the execution subject.
  • The medical data processing method may include steps S500 to S506, wherein:
  • Step S500: acquire multiple patient data of the target patient, the multiple patient data including multiple basic data, multiple patient historical follow-up data, and patient current follow-up data.
  • Step S502: input the multiple basic data, the multiple patient historical follow-up data, and the patient current follow-up data into the grouping model described above, and output, through the grouping model, the expected cumulative reward value of the target patient for each model patient grouping result data.
  • Step S504: from the multiple expected cumulative reward values, determine the largest expected cumulative reward value as the target expected cumulative reward value.
  • Step S506: according to the target expected cumulative reward value, determine the corresponding model patient grouping result data as the target patient grouping result data corresponding to the target patient.
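Steps S504 and S506 amount to taking the arg-max over the expected cumulative reward values returned by the grouping model. A minimal sketch, in which the helper name `select_group` and the group labels are hypothetical:

```python
from typing import Sequence, Tuple


def select_group(
    q_values: Sequence[float],
    group_labels: Sequence[str],
) -> Tuple[str, float]:
    """Return the grouping with the largest expected cumulative reward value,
    together with that value (the target expected cumulative reward value)."""
    if len(q_values) != len(group_labels) or not q_values:
        raise ValueError("q_values and group_labels must be non-empty and aligned")
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return group_labels[best], q_values[best]
```

Ties are resolved in favour of the first grouping with the maximal value; the text does not say how ties should be handled.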
  • FIG. 6 shows a schematic diagram of program modules of the medical data processing system of the present application.
  • the medical data processing system 60 may include or be divided into one or more program modules, and one or more program modules are stored in a storage medium and executed by one or more processors to complete
  • the program module referred to in the embodiment of this application refers to a series of computer program instruction segments capable of completing specific functions, which is more suitable for describing the execution process of the medical data processing system 60 in the storage medium than the program itself.
  • the following description will specifically introduce the functions of each program module of the present embodiment:
  • the medical data processing system includes:
  • the second acquiring module 600 is configured to acquire multiple patient data of the target patient, the multiple patient data including multiple basic data, multiple patient historical follow-up data, and patient current follow-up data;
  • the second model processing module 602 is configured to input the multiple basic data, the multiple patient historical follow-up data and the patient current follow-up data into the grouping model according to any one of claims 1-5, and to output, through the grouping model, the expected cumulative reward value of the target patient for each model patient grouping result data;
  • the second determining module 604 is configured to determine the largest expected cumulative reward value as a target expected cumulative reward value from among multiple expected cumulative reward values;
  • the third determination module 606 is configured to determine the corresponding model patient grouping result data as the target patient grouping result data corresponding to the target patient according to the target expected cumulative reward value.
  • the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions.
  • the computer device 2 may be a rack server, a blade server, a tower server or a cabinet server (including an independent server, or a server cluster composed of multiple servers) and the like.
  • the computer device 2 at least includes, but is not limited to, a memory 21, a processor 22, a network interface 23, and a causal inference-based grouping model construction system 40 that can communicate with each other through a system bus, wherein:
  • the memory 21 includes at least one type of computer-readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or memory of the computer device 2.
  • the memory 21 can also be an external storage device of the computer device 2, such as a plug-in hard disk equipped on the computer device 2, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash memory card (Flash Card), etc.
  • the memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device.
  • the memory 21 is usually used to store the operating system and various application software installed in the computer device 2, such as the program codes of the causal inference-based grouping model construction system 40 of the above-mentioned embodiment.
  • the memory 21 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 22 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 22 is generally used to control the overall operation of the computer device 2 .
  • the processor 22 is used to run the program code stored in the memory 21 or process data, for example, run the causal inference-based grouping model construction system 40, so as to implement the causal inference-based grouping model construction method of the above-mentioned embodiment.
  • the network interface 23 may include a wireless network interface or a wired network interface, and the network interface 23 is generally used to establish a communication connection between the computer device 2 and other electronic devices.
  • the network interface 23 is used to connect the computer device 2 with an external terminal through a network, and establish a data transmission channel and a communication connection between the computer device 2 and an external terminal.
  • the network may be a wireless or wired network such as an enterprise intranet, the Internet, a Global System for Mobile Communications (GSM) network, a Wideband Code Division Multiple Access (WCDMA) network, a 4G network, a 5G network, Bluetooth, or Wi-Fi.
  • FIG. 7 only shows the computer device 2 having components 21-23 and a causal inference-based grouping model construction system 40, but it should be understood that not all of the components shown are required, and more or fewer components may be implemented instead.
  • the causal inference-based grouping model construction system 40 stored in the memory 21 can also be divided into one or more program modules, and the one or more program modules are stored in the memory 21 and executed by one or more processors (processor 22 in this embodiment) to complete the present application.
  • FIG. 4 shows a schematic diagram of the program modules of Embodiment 2 of the system 40 for constructing a grouping model based on causal inference.
  • the causal inference-based grouping model construction system 40 can be divided into the first acquisition module 400, the first model processing module 402, the first determination module 404 and the optimization module 406.
  • the program module referred to in this application refers to a series of computer program instruction segments capable of completing specific functions, which is more suitable than a program to describe the execution process of the causal inference-based grouping model construction system 40 in the computer device 2 .
  • the specific functions of the program modules 400-406 have been described in detail in the second embodiment, and will not be repeated here.
  • This embodiment also provides a computer-readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server, or app store, on which a computer program is stored; the corresponding functions are realized when the program is executed by a processor.
  • the computer-readable storage medium may be non-volatile or volatile.
  • the computer-readable storage medium of this embodiment is used to store the system 40 for constructing a grouping model based on causal inference, and when executed by a processor, realizes the method for constructing a grouping model based on causal inference in the above-mentioned embodiment.

Abstract

A clustering model construction method based on causal inference, comprising: inputting a plurality of sample data of multiple sample patients into a model to be trained, and outputting, by means of said model, a tendency score of each sample patient for corresponding sample patient clustering result data and multiple sample expected cumulative reward values corresponding to each sample patient; determining, from among the multiple sample expected cumulative reward values, a target sample expected cumulative reward value of each sample patient; and adjusting model parameters in said model on the basis of a preset loss function, the tendency score of each sample patient and a corresponding target sample expected cumulative reward value, so as to obtain a clustering model. A plurality of sample data is trained by means of combining a model to be trained with causal inference analysis, eliminating selection deviation for patient clustering result data, so that model fitting is more reasonable, and the application accuracy of a trained model is higher.

Description

Causal inference-based grouping model construction method and medical data processing method
This application claims priority to Chinese patent application No. 202111156355.6, titled "Causal inference-based grouping model construction method and medical data processing method" and filed on September 30, 2021, the entire content of which is incorporated herein by reference.
Technical Field
The embodiments of the present application relate to the technical field of artificial intelligence, and in particular to a causal inference-based grouping model construction method and a medical data processing method.
Background
In the medical field, patient grouping is important for disease diagnosis, disease prediction, and drug treatment. At present, deep reinforcement learning models are commonly used to divide patient populations into groups. Most deep reinforcement learning models use multi-layer neural networks to capture correlations between features in order to estimate the "return".
However, in real medical application scenarios, the grouping decision actually made for a patient is highly correlated with certain features. Doctors select grouping decisions for patients in a targeted manner, based on diagnostic guidelines or clinical experience. This bias in the distribution of the decisions taken affects the learning of deep reinforcement learning models. The inventors realized that, because the sample data used to train existing deep reinforcement learning models contain this shift in the decision distribution, a grouping model trained from such a model makes patient grouping decisions that are inaccurate and unreasonable.
Summary
In view of this, the embodiments of the present application provide a causal inference-based grouping model construction method, system, computer device, and computer-readable storage medium, as well as a medical data processing method, to address the problem that, because the sample data used to train existing deep reinforcement learning models contain a shift in the decision distribution, the grouping models trained from them make patient grouping decisions that are inaccurate and unreasonable.
The embodiments of the present application solve the above technical problem through the following technical solutions:
One aspect of the present application provides a causal inference-based grouping model construction method, including:
acquiring multiple sample data of multiple sample patients, the multiple sample data of each sample patient including multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data;
inputting the multiple sample data of the multiple sample patients into a model to be trained; outputting, through the model to be trained, a propensity score of each sample patient for its corresponding sample patient grouping result data; and outputting, through the model to be trained, a sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;
determining, from the sample expected cumulative reward values of each sample patient, a target sample expected cumulative reward value corresponding to each sample patient; and
adjusting model parameters in the model to be trained based on a preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, so as to obtain the grouping model through optimization.
Another aspect of the embodiments of the present application provides a causal inference-based grouping model construction system, including:
a first acquisition module, configured to acquire multiple sample data of multiple sample patients, the multiple sample data of each sample patient including multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data;
a first model processing module, configured to input the multiple sample data of the multiple sample patients into a model to be trained, to output, through the model to be trained, a propensity score of each sample patient for its corresponding sample patient grouping result data, and to output, through the model to be trained, a sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;
a first determination module, configured to determine, from the sample expected cumulative reward values of each sample patient, a target sample expected cumulative reward value corresponding to each sample patient; and
an optimization module, configured to adjust model parameters in the model to be trained based on a preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, so as to obtain the grouping model through optimization.
Another aspect of the embodiments of the present application provides a medical data processing method, including:
acquiring multiple patient data of a target patient, the multiple patient data including multiple basic data, multiple patient historical follow-up data, and patient current follow-up data;
inputting the multiple basic data, the multiple patient historical follow-up data, and the patient current follow-up data into the grouping model described above, and outputting, through the grouping model, an expected cumulative reward value of the target patient for each model patient grouping result data;
determining, from the multiple expected cumulative reward values, the largest expected cumulative reward value as a target expected cumulative reward value; and
determining, according to the target expected cumulative reward value, the model patient grouping result data to which it corresponds as the target patient grouping result data of the target patient.
Another aspect of the embodiments of the present application provides a medical data processing system, including:
a second acquisition module, configured to acquire multiple patient data of a target patient, the multiple patient data including multiple basic data, multiple patient historical follow-up data, and patient current follow-up data;
a second model processing module, configured to input the multiple basic data, the multiple patient historical follow-up data, and the patient current follow-up data into the grouping model described above, and to output, through the grouping model, an expected cumulative reward value of the target patient for each model patient grouping result data;
a second determination module, configured to determine, from the multiple expected cumulative reward values, the largest expected cumulative reward value as a target expected cumulative reward value; and
a third determination module, configured to determine, according to the target expected cumulative reward value, the model patient grouping result data to which it corresponds as the target patient grouping result data of the target patient.
To achieve the above purpose, the embodiments of the present application further provide a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor performs the following steps when executing the computer program:
acquiring multiple sample data of multiple sample patients, the multiple sample data of each sample patient including multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data;
inputting the multiple sample data of the multiple sample patients into a model to be trained; outputting, through the model to be trained, a propensity score of each sample patient for its corresponding sample patient grouping result data; and outputting, through the model to be trained, a sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;
determining, from the sample expected cumulative reward values of each sample patient, a target sample expected cumulative reward value corresponding to each sample patient; and
adjusting model parameters in the model to be trained based on a preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, so as to obtain the grouping model through optimization.
To achieve the above purpose, the embodiments of the present application further provide a computer-readable storage medium storing a computer program that is executable by at least one processor, so that the at least one processor performs the following steps:
acquiring multiple sample data of multiple sample patients, the multiple sample data of each sample patient including multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data;
inputting the multiple sample data of the multiple sample patients into a model to be trained; outputting, through the model to be trained, a propensity score of each sample patient for its corresponding sample patient grouping result data; and outputting, through the model to be trained, a sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;
determining, from the sample expected cumulative reward values of each sample patient, a target sample expected cumulative reward value corresponding to each sample patient; and
adjusting model parameters in the model to be trained based on a preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, so as to obtain the grouping model through optimization.
In the causal inference-based grouping model construction method, system, computer device, and computer-readable storage medium and the medical data processing method provided by the embodiments of the present application, the multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data of the multiple sample patients are input into the model to be trained; the model to be trained outputs the propensity score of each sample patient for its corresponding sample patient grouping result data and the sample expected cumulative reward value of each sample patient for each model patient grouping result data; the target sample expected cumulative reward value corresponding to each sample patient is determined from that patient's sample expected cumulative reward values; and the model parameters in the model to be trained are adjusted based on the preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, so as to obtain the grouping model through optimization. Training on the multiple sample data by combining the model to be trained with causal inference analysis eliminates the selection bias in the patient grouping result data, so that the model fits more reasonably and the trained model is more accurate in application.
Brief Description of the Drawings
FIG. 1 is a flow chart of the steps of the causal inference-based grouping model construction method of Embodiment 1 of the present application;
FIG. 2 is a flow chart of the steps of the causal inference-based grouping model construction method of Embodiment 1 of the present application;
FIG. 3 is a flow chart of the steps of the causal inference-based grouping model construction method of Embodiment 1 of the present application;
FIG. 4 is a schematic diagram of the program modules of the causal inference-based grouping model construction system of Embodiment 2 of the present application;
FIG. 5 is a flow chart of the steps of the medical data processing method of Embodiment 3 of the present application;
FIG. 6 is a schematic diagram of the program modules of the medical data processing system of Embodiment 4 of the present application;
FIG. 7 is a schematic diagram of the hardware structure of the computer device of Embodiment 5 of the present application.
Detailed Description
To make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present application and are not intended to limit it. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments in the present application, without creative effort, fall within the scope of protection of the present application.
It should be noted that descriptions involving "first", "second", and the like in the embodiments of the present application are for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments may be combined with each other, provided that the combination can be implemented by persons of ordinary skill in the art; where a combination of technical solutions is contradictory or cannot be implemented, that combination should be regarded as not existing and as falling outside the scope of protection claimed by the present application.
In the description of the present application, it should be understood that the numerals preceding the steps do not indicate the order in which the steps are executed; they are used only to facilitate the description of the present application and to distinguish the steps from one another, and therefore should not be construed as limiting the present application.
Embodiment 1
Refer to FIG. 1, which shows a flow chart of the steps of the causal inference-based grouping model construction method according to an embodiment of the present application. It can be understood that the flow chart in this method embodiment does not limit the order in which the steps are executed. The following is an exemplary description with a computer device as the executing entity:
As shown in FIG. 1, the causal inference-based grouping model construction method may include steps S100 to S106, wherein:
Step S100, acquire multiple sample data of multiple sample patients, the multiple sample data of each sample patient including multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data.
In an exemplary embodiment, the multiple sample patients may be multiple diabetic patients. The historical follow-up data of the diabetic patients are collected in chronological order, and the basic data of one diabetic patient, the data of each of that patient's follow-up visits, and the corresponding sample patient grouping result data are taken together as one sample data for that patient. The basic data include, but are not limited to, age, gender, workplace, and frequently visited places; the follow-up data include medication history, medical test reports from third-party platforms or medical systems, and prescription information from experts or doctors.
To improve the training efficiency of the model, the method further includes preprocessing the multiple sample data. Specifically, through feature engineering, features of the multiple basic data of the multiple sample patients are merged, and features of the multiple patient historical follow-up data are merged, to obtain training data. For example, through feature engineering, a first feature primitive of each basic data and a second feature primitive of each patient historical follow-up data are obtained; the first feature primitives corresponding to the basic data are aggregated based on the sample patient grouping result data, and the second feature primitives corresponding to the patient historical follow-up data are likewise aggregated based on the sample patient grouping result data; a first similarity between each first feature primitive and the sample patient grouping result data is calculated, and a second similarity between each second feature primitive and the sample patient grouping result data is calculated; the basic data corresponding to at least one first similarity smaller than a first preset threshold are feature-merged, and the historical follow-up data corresponding to at least one second similarity smaller than a second preset threshold are feature-merged, to obtain the training data. For example, if symptom A and symptom B have no direct relationship with class-a patients, symptom A and symptom B are merged into a class-A symptom, which reduces the influence of redundant data on subsequent model training and effectively improves the training efficiency of the model.
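The threshold-based merging described above can be sketched in Python as follows. The application does not state how the similarity is computed or how merged features are combined, so this sketch assumes the absolute Pearson correlation between a feature column and a numeric grouping label as the similarity, and an element-wise maximum (a logical OR for 0/1 indicators) as the merge; the function and column names are hypothetical.

```python
import math
from typing import Dict, List


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def merge_weak_features(
    columns: Dict[str, List[float]],
    group_labels: List[float],
    threshold: float,
    merged_name: str = "merged_weak",
) -> Dict[str, List[float]]:
    """Keep features similar to the grouping label; merge the rest into one column."""
    kept: Dict[str, List[float]] = {}
    weak: List[List[float]] = []
    for name, values in columns.items():
        # Assumed similarity measure: |Pearson correlation| with the grouping label.
        if abs(pearson(values, group_labels)) < threshold:
            weak.append(values)
        else:
            kept[name] = values
    if weak:
        # Assumed merge rule: element-wise maximum across the weak columns.
        kept[merged_name] = [max(row) for row in zip(*weak)]
    return kept


# Hypothetical binary indicators for four sample patients.
features = {
    "hba1c_high": [1, 1, 0, 0],
    "symptom_a": [1, 0, 1, 0],
    "symptom_b": [0, 1, 0, 1],
}
labels = [1, 1, 0, 0]
print(merge_weak_features(features, labels, threshold=0.3))
```

In this toy input the two symptom columns are uncorrelated with the grouping label, so they collapse into a single column while `hba1c_high` is kept unchanged.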
Step S102, input the multiple sample data of the multiple sample patients into the model to be trained; output, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data; and output, through the model to be trained, the sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data.
In an exemplary embodiment, the model to be trained may be a deep reinforcement learning model (Deep Q Network, DQN). In this embodiment, the preprocessed sample data input into the deep reinforcement learning model are defined as the state; the multiple model patient grouping result data are defined as the actions; and the reward is defined from the outcome information obtained after a sample patient takes an action in a given state (the multiple sample data). The action is the one-hot encoding of the patient grouping result data, and the reward comprises a long-term reward and a short-term reward. For example, the long-term reward may be defined as sign(whether complications occurred at the last follow-up) * 5, and the short-term reward may be defined as sign(whether glycated hemoglobin reaches the target at the next follow-up) * 1.
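The action encoding and the two example rewards can be written out as a short Python sketch. The text gives the weights 5 and 1 but not the sign convention, so the sketch assumes +1 for a favourable outcome (no complication, glycated hemoglobin on target) and -1 otherwise; all function names are hypothetical.

```python
from typing import List

LONG_TERM_WEIGHT = 5   # weight given in the text for the last follow-up outcome
SHORT_TERM_WEIGHT = 1  # weight given in the text for the next follow-up outcome


def sign(favourable: bool) -> int:
    """Assumed convention: +1 for a favourable outcome, -1 otherwise."""
    return 1 if favourable else -1


def one_hot(group_index: int, num_groups: int) -> List[int]:
    """One-hot action vector for the chosen patient grouping."""
    if not 0 <= group_index < num_groups:
        raise ValueError("group_index out of range")
    return [1 if i == group_index else 0 for i in range(num_groups)]


def long_term_reward(complication_at_last_visit: bool) -> int:
    # A complication at the last follow-up is treated as the unfavourable case.
    return sign(not complication_at_last_visit) * LONG_TERM_WEIGHT


def short_term_reward(hba1c_on_target_next_visit: bool) -> int:
    return sign(hba1c_on_target_next_visit) * SHORT_TERM_WEIGHT


print(one_hot(1, 3), long_term_reward(True), short_term_reward(True))
```

How the long-term and short-term rewards are combined along a patient's trajectory is not specified in this passage, so the sketch keeps them as separate functions.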
As shown in FIG. 3, in an exemplary embodiment, to eliminate bias in action selection and further improve the accuracy of model training, the step of inputting the multiple sample data of the multiple sample patients into the model to be trained and outputting, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data may also be implemented as follows. Step S300: randomly allocate the multiple sample data of the multiple sample patients to obtain multiple training sample data and multiple control sample data. Step S302: input the multiple training sample data and the multiple control sample data into the model to be trained, perform logistic regression on them through the model to be trained, and calculate the propensity score of each sample patient for its corresponding sample patient grouping result data.
In this embodiment, random allocation yields multiple training sample data of multiple first sample patients and multiple control sample data of multiple second sample patients. For each first sample patient, the second sample patients used as controls are determined from the multiple second sample patients as follows: a third similarity is computed between the training sample data of the first sample patient and the control sample data of each second sample patient; from the multiple third similarities of each first sample patient, the one or more second sample patients whose third similarity exceeds a third preset threshold are selected; and, among those, the second sample patients whose sample patient grouping result data are inconsistent with those of the first sample patient are used for control analysis, so that the model can perform causal analysis on the randomly allocated sample data. The multiple training sample data of the first sample patient are positive sample data, and the control sample data of the selected second sample patients are negative sample data. In this embodiment, the DQN model incorporates the propensity associated with the sample data representation, which makes the expected reward output by the model more accurate.
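The control-selection rule described above can be sketched as follows. The text does not name the similarity measure, so cosine similarity is used here purely as an assumption. The function names and the candidate tuple layout `(patient_id, features, group)` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    # Assumed stand-in for the "third similarity"; the source does not specify it.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def select_controls(
    first_features: Sequence[float],
    first_group: int,
    candidates: List[Tuple[str, Sequence[float], int]],
    threshold: float,
) -> List[str]:
    # Keep second sample patients that are similar enough (above the threshold)
    # but whose grouping result differs from the first sample patient's.
    return [
        patient_id
        for patient_id, features, group in candidates
        if cosine_similarity(first_features, features) > threshold
        and group != first_group
    ]
```

Similar patients with a different grouping result serve as the comparison cases for the causal analysis.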
In an exemplary embodiment, taking a DQN model as the model to be trained, the model includes an input layer, an output layer, at least four NN layers (hidden layers), and a classification layer. The input layer receives the multiple sample data of the multiple sample patients; the hidden layers parse and process the multiple sample data; the output layer includes multiple output nodes, each of which outputs the score of the model patient grouping result data corresponding to that node; and the classification layer converts the score of each output node into the sample expected cumulative reward value of each model patient grouping result data. The multiple sample data (states) of the multiple sample patients are input into the input layer of the model to be trained. After processing by two hidden layers, the model outputs the propensity score g of each sample patient for its sample patient grouping result data, together with other feature values. The other feature values are passed to the remaining hidden layers, and the output layer outputs the sample expected cumulative reward values $Q_0, Q_1, Q_2, \ldots, Q_n$ of each sample patient for each model patient grouping result data (action).
Here, g represents the probability that a doctor or expert chooses the corresponding sample patient grouping result data in a given state. Assuming there are two actions, $a_0$ and $a_1$, g can be expressed as $p(a_1 \mid s) = 1 - p(a_0 \mid s)$, where $p(a_1 \mid s)$ is the probability that the doctor classifies the sample patient as $a_1$ when the input data is s, and $p(a_0 \mid s)$ is the probability that the doctor classifies the sample patient as $a_0$ when the input data is s.
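The two-output structure described above (a propensity output after two hidden layers, then Q values after the remaining layers) can be sketched as a small forward pass in plain Python. The following are illustrative assumptions, not details from the source: the layer width, the random weights, the ReLU activation, the softmax on the propensity head, and the class name `PropensityDQN`. As a simplification, the full hidden representation is passed on to the later layers instead of a separate set of "other feature values".

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, w: Matrix, b: Vector) -> Vector:
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def softmax(x: Vector) -> Vector:
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Tuple[Matrix, Vector]:
    w = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out


class PropensityDQN:
    def __init__(self, n_features: int, n_actions: int, hidden: int = 8, seed: int = 0):
        rng = random.Random(seed)
        # Two hidden layers before the propensity output g.
        self.shared = [init_layer(n_features, hidden, rng), init_layer(hidden, hidden, rng)]
        self.g_head = init_layer(hidden, n_actions, rng)
        # Two further hidden layers before the Q outputs (four hidden layers in total).
        self.tail = [init_layer(hidden, hidden, rng), init_layer(hidden, hidden, rng)]
        self.q_head = init_layer(hidden, n_actions, rng)

    def forward(self, state: Vector) -> Tuple[Vector, Vector]:
        h = state
        for w, b in self.shared:
            h = relu(dense(h, w, b))
        g = softmax(dense(h, *self.g_head))  # propensity p(a | s) per action
        for w, b in self.tail:
            h = relu(dense(h, w, b))
        q = dense(h, *self.q_head)  # one expected cumulative reward per action
        return g, q
```

With two actions, `g[1]` plays the role of $p(a_1 \mid s)$ and `g[0]` equals `1 - g[1]`, matching the relation given in the text.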
Step S104: from the sample expected cumulative reward values of each sample patient, determine the target sample expected cumulative reward value corresponding to that sample patient.
In an exemplary embodiment, the largest of the multiple sample expected cumulative reward values of each sample patient is taken as the target sample expected cumulative reward value for that sample patient.
Step S106: based on a preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, adjust the model parameters of the model to be trained to obtain the optimized grouping model.
To optimize the model to be trained, referring to FIG. 2, the preset loss function includes a first loss function, a second loss function, and a third loss function. Adjusting the model parameters of the model to be trained, based on the preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, to obtain the optimized grouping model may further include steps S200 to S210:
Step S200: based on the first loss function and the target sample expected cumulative reward value of each sample patient, calculate a regression loss value;
Step S202: based on the second loss function and the propensity score of each sample patient, calculate a first loss value corresponding to the propensity score;
Step S204: based on the third loss function and the target sample expected cumulative reward value of each sample patient, calculate a second loss value corresponding to the target sample expected cumulative reward value;
Step S206: sum the regression loss value, the first loss value, and the second loss value to obtain a loss value;
Step S208: modify the model parameters of the model to be trained according to the loss value to obtain a modified model to be trained;
Step S210: use the modified model to be trained to perform grouping training on the multiple sample data of the multiple sample patients; stop training once the model parameters have been modified a preset number of times and the loss value no longer decreases, and mark the current model to be trained as the grouping model.
In this embodiment, after the regression loss value, the first loss value, and the second loss value are calculated, the loss value is calculated by the following function:
$$\text{Loss} = \text{Q\_loss} + \lambda_1 \cdot \text{g\_loss} + \lambda_2 \cdot \text{reg\_loss}$$
where Loss is the loss value, reg_loss is the regression loss value, g_loss is the first loss value, Q_loss is the second loss value, and $\lambda_1$ and $\lambda_2$ are adjustable hyperparameters of the model to be trained.
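The weighted sum and the stopping rule of step S210 can be sketched as follows. The default values of `lambda_1` and `lambda_2` are placeholders, since the text only says they are adjustable. `should_stop` is one possible reading of "a preset number of modifications has been reached and the loss no longer decreases". Both function names are hypothetical.

```python
from typing import List


def total_loss(q_loss: float, g_loss: float, reg_loss: float,
               lambda_1: float = 1.0, lambda_2: float = 1.0) -> float:
    # Loss = Q_loss + lambda_1 * g_loss + lambda_2 * reg_loss
    return q_loss + lambda_1 * g_loss + lambda_2 * reg_loss


def should_stop(loss_history: List[float], min_updates: int) -> bool:
    # Assumed reading: stop only after at least `min_updates` parameter updates,
    # and only if the latest loss did not drop below the previous one.
    if len(loss_history) < max(min_updates, 2):
        return False
    return loss_history[-1] >= loss_history[-2]
```

In practice a patience window over several updates is often used instead of a single comparison; the source does not specify either.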
In this embodiment, the model to be trained is trained repeatedly with the loss functions below: Loss is computed from the loss functions, its gradient is taken, and the model parameters are adjusted by backpropagating Loss with a gradient descent algorithm. Training is repeated until Loss no longer decreases, which yields the grouping model. During training, the sample data are organized into quadruples of the form $(s_t, a_t, r, s_{t+1})$, where $s_t$ is the state at time t, $a_t$ is the doctor's grouping scheme (action) at time t, and r and $s_{t+1}$ are the reward obtained after taking $a_t$ in $s_t$ and the next state reached. The loss functions are as follows:
(1) The first loss function includes:
$$\text{reg\_loss} = Q(s_t, a_t^{\max}) - Q(s_t, a_t)$$
where reg_loss is the regression loss value, used to prevent overestimation of the Q value; Q is the expected cumulative reward value corresponding to the sample patient; $s_t$ is the multiple sample data at time t; $a_t$ is the sample patient grouping result data of the sample patient at time t; $Q(s_t, a_t^{\max})$ is the largest of the multiple sample expected cumulative reward values output by the model to be trained; and $Q(s_t, a_t)$ is the actual expected cumulative reward value of the sample patient grouping result data determined for the sample patient in state $s_t$;
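As a small illustration, the regression loss for one sample is the gap between the largest Q value the model outputs for the state and the Q value of the grouping the doctor actually chose. The list-based representation and the function name are assumptions for this sketch.

```python
from typing import Sequence


def reg_loss(q_values: Sequence[float], taken_action: int) -> float:
    # Q(s_t, a_t^max) - Q(s_t, a_t): always >= 0, and zero when the recorded
    # grouping already has the highest estimated value.
    return max(q_values) - q_values[taken_action]
```

Because the term is never negative, minimizing it pulls the maximum Q value toward the value of the recorded action, which is how it limits overestimation.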
(2) The second loss function includes:
$$\text{g\_loss} = \mathrm{CrossEntropy}\big(g(s_t),\ \text{to\_one\_hot}(a_t)\big)$$
where $s_t$ is the multiple sample data at time t; $a_t$ is the sample patient grouping result data of the sample patient at time t; CrossEntropy is the cross entropy; $g(s_t)$ is the output of g for the input state $s_t$; g is the propensity score, which, through causal inference analysis, corrects the bias in the distribution of the state representation in the hidden layer (NN) preceding the linear classification layer; $\text{to\_one\_hot}(a_t)$ is the one-hot encoding of the sample patient grouping result data (action) of the sample patient at time t; and g_loss is the first loss value, used to make the output g approximate the sample patient grouping result data;
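A plain-Python sketch of this cross-entropy term for a single sample is given below. It assumes `g_probs` is already a probability vector over the groupings; the clipping constant `eps` is an added numerical safeguard that is not in the source.

```python
import math
from typing import List, Sequence


def to_one_hot(action_index: int, n_actions: int) -> List[int]:
    return [1 if i == action_index else 0 for i in range(n_actions)]


def cross_entropy(probs: Sequence[float], target: Sequence[int], eps: float = 1e-12) -> float:
    # -sum_i target_i * log(probs_i), with clipping to avoid log(0).
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, target))


def g_loss(g_probs: Sequence[float], taken_action: int) -> float:
    return cross_entropy(g_probs, to_one_hot(taken_action, len(g_probs)))
```

With a one-hot target the loss reduces to the negative log of the probability assigned to the doctor's recorded grouping.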
(3) The third loss function includes:
$$\text{Q\_loss} = \Big(Q(s_t, a_t) - \big(r + \max_{a}(\gamma \cdot Q(s_{t+1}, a_{t+1}))\big)\Big)^2$$
where $s_t$ is the multiple sample data at time t; $a_t$ is the sample patient grouping result data of the sample patient at time t; r is the reward obtained after taking $a_t$ in state $s_t$; $s_{t+1}$ is the multiple sample data at the next time t+1, reached from $s_t$; $Q(s_t, a_t)$ is the actual expected cumulative reward value of the sample patient grouping result data $a_t$ in the state given by the sample patient's multiple sample data at time t; $\max_{a}(\gamma \cdot Q(s_{t+1}, a_{t+1}))$ is the maximum of the sample expected cumulative reward values of the multiple sample grouping result data $a_{t+1}$ in the state given by the sample patient's multiple sample data at time t+1; $\gamma$ is the discount factor of the model to be trained, that is, the ratio by which the target sample expected cumulative reward value at time t+1 is discounted to time t; and Q_loss is the second loss value.
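The third loss can be sketched for a single quadruple $(s_t, a_t, r, s_{t+1})$ as below. The sketch assumes the standard DQN temporal-difference target $r + \gamma \max_a Q(s_{t+1}, a)$, with r the reward from the quadruple. The default `gamma` and the function name are illustrative assumptions.

```python
from typing import Sequence


def q_loss(q_t: Sequence[float], taken_action: int, reward: float,
           q_next: Sequence[float], gamma: float = 0.9) -> float:
    # Target: immediate reward plus the discounted best value of the next state.
    target = reward + gamma * max(q_next)
    # Squared error between the value of the recorded action and the target.
    return (q_t[taken_action] - target) ** 2
```

In a full DQN training loop the target is usually computed without gradient (often from a separate target network); the source does not state which variant is used.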
The embodiments of the present application have at least the following beneficial effects:
(1) The deep reinforcement learning model is trained on the multiple sample data in combination with causal inference analysis. Decoupling the propensity from the representation of the patient grouping result data eliminates bias in the selection of patient grouping result data, so the model fits more reasonably;
(2) The propensity score, the sample expected cumulative reward value, and the loss functions together prevent overestimation of the sample expected cumulative reward value at each output, producing safer patient grouping result data;
(3) Causal inference analysis removes bias from the decisions of deep reinforcement learning and optimizes the long-term cumulative return of the chosen decisions. This effectively reduces the estimation error caused by selection bias and improves the accuracy and safety of the grouping model in real-world use.
Embodiment 2
Referring to FIG. 4, a schematic diagram of the program modules of the causal inference-based grouping model construction system of the present application is shown. In this embodiment, the causal inference-based grouping model construction system 40 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to carry out the present application and implement the causal inference-based grouping model construction method described above. A program module, as used in the embodiments of the present application, is a series of computer program instruction segments capable of performing a specific function, and is better suited than the program itself to describing how the causal inference-based grouping model construction system 40 executes in the storage medium. The functions of the program modules of this embodiment are described below.
The causal inference-based grouping model construction system 40 includes:
a first acquisition module 400, configured to acquire multiple sample data of multiple sample patients, where the multiple sample data of each sample patient include multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data;
a first model processing module 402, configured to input the multiple sample data of the multiple sample patients into the model to be trained, and to output, through the model to be trained, the propensity score of each sample patient for its corresponding sample patient grouping result data and the sample expected cumulative reward value of each sample patient for each model patient grouping result data, where the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;
a first determination module 404, configured to determine, from the sample expected cumulative reward values of each sample patient, the target sample expected cumulative reward value corresponding to that sample patient; and
an optimization module 406, configured to adjust the model parameters of the model to be trained based on the preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, to obtain the optimized grouping model.
In an exemplary embodiment, the preset loss function includes a first loss function, a second loss function, and a third loss function, and the optimization module 406 is further configured to: calculate a regression loss value based on the first loss function and the target sample expected cumulative reward value of each sample patient; calculate a first loss value corresponding to the propensity score based on the second loss function and the propensity score of each sample patient; calculate a second loss value corresponding to the target sample expected cumulative reward value based on the third loss function and the target sample expected cumulative reward value of each sample patient; sum the regression loss value, the first loss value, and the second loss value to obtain a loss value; modify the model parameters of the model to be trained according to the loss value to obtain a modified model to be trained; and use the modified model to be trained to perform grouping training on the multiple sample data of the multiple sample patients, stop training once the model parameters have been modified a preset number of times and the loss value no longer decreases, and mark the current model to be trained as the grouping model.
In an exemplary embodiment, the first loss function includes:
$$\text{reg\_loss} = Q(s_t, a_t^{\max}) - Q(s_t, a_t)$$
where reg_loss is the regression loss value, used to prevent overestimation of the Q value; Q is the expected cumulative reward value corresponding to the sample patient; $s_t$ is the multiple sample data at time t; $a_t$ is the sample patient grouping result data of the sample patient at time t; $Q(s_t, a_t^{\max})$ is the largest of the multiple sample expected cumulative reward values output by the model to be trained; and $Q(s_t, a_t)$ is the actual expected cumulative reward value of the sample patient grouping result data determined for the sample patient in state $s_t$;
The second loss function includes:
$$\text{g\_loss} = \mathrm{CrossEntropy}\big(g(s_t),\ \text{to\_one\_hot}(a_t)\big)$$
where $s_t$ is the multiple sample data at time t; $a_t$ is the sample patient grouping result data of the sample patient at time t; CrossEntropy is the cross entropy; $g(s_t)$ is the output of g for the input state $s_t$; g is the propensity score; $\text{to\_one\_hot}(a_t)$ is the one-hot encoding of the sample patient grouping result data (action) of the sample patient at time t; and g_loss is the first loss value, used to make the output g approximate the sample patient grouping result data;
The third loss function includes:
$$\text{Q\_loss} = \Big(Q(s_t, a_t) - \big(r + \max_{a}(\gamma \cdot Q(s_{t+1}, a_{t+1}))\big)\Big)^2$$
where $s_t$ is the multiple sample data at time t; $a_t$ is the sample patient grouping result data of the sample patient at time t; r is the reward obtained after taking $a_t$ in state $s_t$; $s_{t+1}$ is the multiple sample data at the next time t+1, reached from $s_t$; $Q(s_t, a_t)$ is the actual expected cumulative reward value of the sample patient grouping result data $a_t$ in the state given by the sample patient's multiple sample data at time t; $\max_{a}(\gamma \cdot Q(s_{t+1}, a_{t+1}))$ is the maximum of the sample expected cumulative reward values of the multiple sample grouping result data $a_{t+1}$ in the state given by the sample patient's multiple sample data at time t+1; $\gamma$ is the discount factor of the model to be trained, that is, the ratio by which the target sample expected cumulative reward value at time t+1 is discounted to time t; and Q_loss is the second loss value.
In an exemplary embodiment, the model to be trained is a deep reinforcement learning model.
In an exemplary embodiment, the first model processing module 402 is further configured to: randomly allocate the multiple sample data of the multiple sample patients to obtain multiple training sample data and multiple control sample data; and input the multiple training sample data and the multiple control sample data into the model to be trained, perform logistic regression on them through the model to be trained, and calculate the propensity score of each sample patient for its corresponding sample patient grouping result data.
Embodiment 2
Referring to FIG. 5, a flowchart of the steps of the medical data processing method of an embodiment of the present application is shown. It should be understood that the flowchart in this method embodiment does not limit the order in which the steps are executed. The method is described below by way of example with a computer device as the executing entity.
As shown in FIG. 5, the medical data processing method may include steps S500 to S506:
Step S500: acquire multiple patient data of a target patient, where the multiple patient data include multiple basic data, multiple patient historical follow-up data, and the patient's current follow-up data;
Step S502: input the multiple basic data, the multiple patient historical follow-up data, and the patient's current follow-up data into the grouping model described above, and output, through the grouping model, the expected cumulative reward value of the target patient for each model patient grouping result data;
Step S504: from the multiple expected cumulative reward values, determine the largest expected cumulative reward value as the target expected cumulative reward value;
Step S506: according to the target expected cumulative reward value, determine the model patient grouping result data corresponding to it as the target patient grouping result data of the target patient.
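Steps S504 and S506 amount to an argmax over the model's outputs. A minimal sketch follows, assuming the grouping model returns one expected cumulative reward value per candidate grouping. The group labels and the function name `assign_group` are hypothetical.

```python
from typing import Sequence, Tuple


def assign_group(q_values: Sequence[float], group_labels: Sequence[str]) -> Tuple[str, float]:
    # S504: find the largest expected cumulative reward value.
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    # S506: return the grouping that value belongs to.
    return group_labels[best], q_values[best]
```

For example, `assign_group([0.1, 0.7, 0.3], ["A", "B", "C"])` selects grouping "B" because its expected cumulative reward value is the largest.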
Embodiment 3
Referring to FIG. 6, a schematic diagram of the program modules of the medical data processing system of the present application is shown. In this embodiment, the medical data processing system 60 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to carry out the present application and implement the medical data processing method described above. A program module, as used in the embodiments of the present application, is a series of computer program instruction segments capable of performing a specific function, and is better suited than the program itself to describing how the medical data processing system 60 executes in the storage medium. The functions of the program modules of this embodiment are described below.
The medical data processing system includes:
a second acquisition module 600, configured to acquire multiple patient data of a target patient, where the multiple patient data include multiple basic data, multiple patient historical follow-up data, and the patient's current follow-up data;
a second model processing module 602, configured to input the multiple basic data, the multiple patient historical follow-up data, and the patient's current follow-up data into the grouping model according to any one of claims 1 to 5, and to output, through the grouping model, the expected cumulative reward value of the target patient for each model patient grouping result data;
a second determination module 604, configured to determine, from the multiple expected cumulative reward values, the largest expected cumulative reward value as the target expected cumulative reward value; and
a third determination module 606, configured to determine, according to the target expected cumulative reward value, the model patient grouping result data corresponding to it as the target patient grouping result data of the target patient.
Embodiment 4
FIG. 7 is a schematic diagram of the hardware architecture of a computer device according to Embodiment 5 of the present application. In this embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. The computer device 2 may be a rack server, a blade server, a tower server, or a cabinet server (including a standalone server or a server cluster composed of multiple servers). As shown in FIG. 7, the computer device 2 includes at least, but is not limited to, a memory 21, a processor 22, a network interface 23, and the causal inference-based grouping model construction system 40, which are communicatively connected to one another via a system bus. Specifically:
In this embodiment, the memory 21 includes at least one type of computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disc. In some embodiments, the memory 21 may be an internal storage unit of the computer device 2, such as its hard disk or internal memory. In other embodiments, the memory 21 may be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card fitted to the computer device 2. The memory 21 may also include both the internal storage unit of the computer device 2 and its external storage device. In this embodiment, the memory 21 is generally used to store the operating system and the various application software installed on the computer device 2, such as the program code of the causal inference-based grouping model construction system 40 of the above embodiment. The memory 21 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 22 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 22 is generally used to control the overall operation of the computer device 2. In this embodiment, the processor 22 runs the program code stored in the memory 21 or processes data, for example by running the causal inference-based grouping model construction system 40, to implement the causal inference-based grouping model construction method of the above embodiment.
The network interface 23 may include a wireless network interface or a wired network interface, and is generally used to establish a communication connection between the computer device 2 and other electronic devices. For example, the network interface 23 connects the computer device 2 to an external terminal through a network and establishes a data transmission channel and a communication connection between the computer device 2 and the external terminal. The network may be a wireless or wired network such as an intranet, the Internet, the Global System for Mobile Communications (GSM), Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, or Wi-Fi.
It should be noted that FIG. 7 shows only the computer device 2 with components 21-23 and the causal inference-based grouping model construction system 40. Not all of the illustrated components are required; more or fewer components may be implemented instead.
In this embodiment, the causal inference-based grouping model construction system 40 stored in the memory 21 may also be divided into one or more program modules, which are stored in the memory 21 and executed by one or more processors (the processor 22 in this embodiment) to carry out the present application.
For example, FIG. 4 shows the program modules of Embodiment 2 of the causal inference-based grouping model construction system 40. In that embodiment, the system 40 may be divided into a first acquisition module 400, a first model processing module 402, a first determination module 404, and an optimization module 406. A program module, as used in this application, is a series of computer program instruction segments that performs a specific function, and is better suited than the program as a whole to describing how the system 40 executes in the computer device 2. The specific functions of program modules 400-406 are described in detail in Embodiment 2 and are not repeated here.
Embodiment 5
This embodiment further provides a computer-readable storage medium, such as flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, a server, or an app store, on which a computer program is stored; the program implements the corresponding functions when executed by a processor. The computer-readable storage medium may be non-volatile or volatile. The computer-readable storage medium of this embodiment stores the causal inference-based grouping model construction system 40, which, when executed by a processor, implements the causal inference-based grouping model construction method of the above embodiment.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software together with a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the better implementation.
The above are only preferred embodiments of the present application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the description and drawings of the present application, or any direct or indirect use in other related technical fields, likewise falls within the scope of patent protection of the present application.

Claims (20)

  1. A method for constructing a grouping model based on causal inference, comprising:
    acquiring multiple sample data of multiple sample patients, wherein the multiple sample data of each sample patient include multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data;
    inputting the multiple sample data of the multiple sample patients into a model to be trained; outputting, through the model to be trained, a propensity score of each sample patient for the sample patient grouping result data corresponding to that sample patient; and outputting, through the model to be trained, a sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;
    determining, from the sample expected cumulative reward values of each sample patient, a target sample expected cumulative reward value corresponding to that sample patient; and
    adjusting model parameters in the model to be trained based on a preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, so as to obtain the grouping model through optimization.
  2. The method for constructing a grouping model based on causal inference according to claim 1, wherein the preset loss function includes a first loss function, a second loss function, and a third loss function;
    and wherein adjusting the model parameters in the model to be trained based on the preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, so as to obtain the grouping model through optimization, comprises:
    calculating a regression loss value based on the first loss function and the target sample expected cumulative reward value corresponding to each sample patient;
    calculating a first loss value corresponding to the propensity score based on the second loss function and the propensity score of each sample patient;
    calculating a second loss value corresponding to the target sample expected cumulative reward value based on the third loss function and the target sample expected cumulative reward value corresponding to each sample patient;
    summing the regression loss value, the first loss value, and the second loss value to obtain a loss value;
    modifying the model parameters in the model to be trained according to the loss value to obtain a modified model to be trained; and
    performing grouping training on the multiple sample data of the multiple sample patients through the modified model to be trained, stopping the training when the model parameters have been modified a preset number of times and the loss value no longer decreases, and marking the current model to be trained as the grouping model.
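The stopping rule in the final step of this claim can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The names `train`, `step_fn`, `min_updates`, and `patience` are hypothetical: `step_fn` is assumed to perform one parameter update and return the summed loss value, and "no longer decreases" is interpreted here as `patience` consecutive updates without a new minimum.

```python
def train(step_fn, min_updates, patience=1):
    """Run updates until at least `min_updates` have been made and the loss
    has failed to reach a new minimum for `patience` consecutive updates.

    step_fn: hypothetical callable that performs one parameter update and
             returns the summed loss (regression + first + second loss).
    Returns (number of updates performed, lowest loss seen).
    """
    best = float("inf")
    stale = 0      # consecutive updates without improvement
    updates = 0
    while True:
        loss = step_fn()
        updates += 1
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
        # Both conditions must hold: preset update count reached
        # and the loss is no longer decreasing.
        if updates >= min_updates and stale >= patience:
            return updates, best
```

Requiring both conditions avoids stopping on an early plateau before the preset number of modifications has been reached.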
  3. The method for constructing a grouping model based on causal inference according to claim 2, wherein the first loss function comprises:
    $\mathrm{reg\_loss} = Q(s_t, a_{t\max}) - Q(s_t, a_t)$;
    where reg_loss is the regression loss value, used to prevent overestimation of the Q value; $Q$ denotes the expected cumulative reward value corresponding to a sample patient; $s_t$ denotes the multiple sample data at time $t$; $a_t$ denotes the sample patient grouping result data corresponding to the sample patient at time $t$; $Q(s_t, a_{t\max})$ denotes the largest of the sample expected cumulative reward values output by the model to be trained; and $Q(s_t, a_t)$ denotes the actual expected cumulative reward value of the sample patient grouping result data determined for the sample patient in state $s_t$;
    the second loss function comprises:
    $\mathrm{g\_loss} = \mathrm{CrossEntropy}\big(g(s_t), \mathrm{to\_one\_hot}(a_t)\big)$;
    where $s_t$ denotes the multiple sample data at time $t$; $a_t$ denotes the sample patient grouping result data corresponding to the sample patient at time $t$; CrossEntropy denotes cross-entropy; $g(s_t)$ denotes the output of $g$ for input state $s_t$; $g$ denotes the propensity score; $\mathrm{to\_one\_hot}(a_t)$ denotes the one-hot encoding of the sample patient grouping result data (the action) corresponding to the sample patient at time $t$; and g_loss is the first loss value, used to make the output $g$ approximate the sample patient grouping result data;
    the third loss function comprises:
    $\mathrm{Q\_loss} = \big(Q(s_t, a_t) - (\gamma + \max_a(\gamma \cdot Q(s_{t+1}, a_{t+1})))\big)^2$;
    where $s_t$ denotes the multiple sample data at time $t$; $a_t$ denotes the sample patient grouping result data corresponding to the sample patient at time $t$; $s_{t+1}$ denotes the multiple sample data at the next time $t+1$ to which the state transitions; $Q(s_t, a_t)$ denotes the actual expected cumulative reward value corresponding to the sample patient grouping result data $a_t$ with the sample patient in the state given by the multiple sample data at time $t$; $\max_a(\gamma \cdot Q(s_{t+1}, a_{t+1}))$ denotes the maximum of the sample expected cumulative reward values corresponding to the multiple sample grouping result data $a_{t+1}$ with the sample patient in the state given by the multiple sample data at time $t+1$; $\gamma$ denotes the discount factor in the model to be trained, that is, the decay ratio applied when the target sample expected cumulative reward value at the next time $t+1$ is discounted to time $t$; and Q_loss is the second loss value.
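The three loss functions of this claim can be sketched in plain Python for a single sample, as a reading aid rather than the applicant's implementation. All function and parameter names are hypothetical. In particular, `first_term` stands for the leading term of the target in the third loss: the claim as printed writes it as γ, whereas a conventional Q-learning target would use an immediate reward there; the sketch leaves that choice to the caller and does not resolve it.

```python
import math

def reg_loss(q_values, action):
    """Largest predicted Q value minus the Q value of the recorded action."""
    return max(q_values) - q_values[action]

def g_loss(propensities, action, eps=1e-12):
    """Cross-entropy between g(s_t) and the one-hot encoding of the action."""
    one_hot = [1.0 if i == action else 0.0 for i in range(len(propensities))]
    return -sum(t * math.log(p + eps) for t, p in zip(one_hot, propensities))

def q_loss(q_values, action, next_q_values, gamma, first_term):
    """Squared gap between Q(s_t, a_t) and
    first_term + max_a(gamma * Q(s_{t+1}, a)).
    first_term is an assumption: pass gamma for the literal printed
    formula, or an immediate reward for a conventional TD target."""
    target = first_term + max(gamma * q for q in next_q_values)
    return (q_values[action] - target) ** 2

def total_loss(q_values, propensities, action, next_q_values, gamma, first_term):
    """Sum of the regression loss, first loss, and second loss."""
    return (reg_loss(q_values, action)
            + g_loss(propensities, action)
            + q_loss(q_values, action, next_q_values, gamma, first_term))
```

For example, with `q_values = [1.0, 3.0, 2.0]` and recorded action `2`, `reg_loss` is the gap between the best value 3.0 and the recorded action's value 2.0.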
  4. The method for constructing a grouping model based on causal inference according to claim 3, wherein the model to be trained is a deep reinforcement learning model.
  5. The method for constructing a grouping model based on causal inference according to claim 1, wherein, before the multiple sample data of the multiple sample patients are input into the model to be trained, the method further comprises:
    randomly allocating the multiple sample data of the multiple sample patients to obtain multiple training sample data and multiple control sample data;
    and wherein, correspondingly, inputting the multiple sample data of the multiple sample patients into the model to be trained and outputting, through the model to be trained, the propensity score of each sample patient for the sample patient grouping result data corresponding to that sample patient comprises:
    inputting the multiple training sample data and the multiple control sample data into the model to be trained, and performing logistic regression on the multiple training sample data and the multiple control sample data through the model to be trained to calculate the propensity score of each sample patient for the sample patient grouping result data corresponding to that sample patient.
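The idea of obtaining a propensity score by logistic regression can be illustrated with a minimal pure-Python sketch. It is a simplification under stated assumptions, not the claimed model: it assumes only two groups (labels 0 and 1), fits the regression by plain stochastic gradient descent, and uses hypothetical names (`fit_logistic`, `propensity`) and hyperparameters (`lr`, `epochs`).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=500):
    """Fit a binary logistic regression by stochastic gradient descent.
    xs: list of feature vectors; ys: list of 0/1 group labels."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def propensity(w, b, x, y):
    """Estimated probability that a patient with features x falls in the
    group y that was actually recorded for that patient."""
    p1 = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return p1 if y == 1 else 1.0 - p1
```

With more than two groups, a softmax (multinomial) regression would play the same role; that extension is likewise an assumption and is not shown.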
  6. A system for constructing a grouping model based on causal inference, comprising:
    a first acquisition module, configured to acquire multiple sample data of multiple sample patients, wherein the multiple sample data of each sample patient include multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data;
    a first model processing module, configured to input the multiple sample data of the multiple sample patients into a model to be trained, to output, through the model to be trained, a propensity score of each sample patient for the sample patient grouping result data corresponding to that sample patient, and to output, through the model to be trained, a sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;
    a first determination module, configured to determine, from the sample expected cumulative reward values of each sample patient, a target sample expected cumulative reward value corresponding to that sample patient; and
    an optimization module, configured to adjust model parameters in the model to be trained based on a preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, so as to obtain the grouping model through optimization.
  7. The system for constructing a grouping model based on causal inference according to claim 6, wherein the preset loss function includes a first loss function, a second loss function, and a third loss function;
    and wherein the optimization module is further configured to:
    calculate a regression loss value based on the first loss function and the target sample expected cumulative reward value corresponding to each sample patient;
    calculate a first loss value corresponding to the propensity score based on the second loss function and the propensity score of each sample patient;
    calculate a second loss value corresponding to the target sample expected cumulative reward value based on the third loss function and the target sample expected cumulative reward value corresponding to each sample patient;
    sum the regression loss value, the first loss value, and the second loss value to obtain a loss value;
    modify the model parameters in the model to be trained according to the loss value to obtain a modified model to be trained; and
    perform grouping training on the multiple sample data of the multiple sample patients through the modified model to be trained, stop the training when the model parameters have been modified a preset number of times and the loss value no longer decreases, and mark the current model to be trained as the grouping model.
  8. The system for constructing a grouping model based on causal inference according to claim 7, wherein the first loss function comprises:
    $\mathrm{reg\_loss} = Q(s_t, a_{t\max}) - Q(s_t, a_t)$;
    where reg_loss is the regression loss value, used to prevent overestimation of the Q value; $Q$ denotes the expected cumulative reward value corresponding to a sample patient; $s_t$ denotes the multiple sample data at time $t$; $a_t$ denotes the sample patient grouping result data corresponding to the sample patient at time $t$; $Q(s_t, a_{t\max})$ denotes the largest of the sample expected cumulative reward values output by the model to be trained; and $Q(s_t, a_t)$ denotes the actual expected cumulative reward value of the sample patient grouping result data determined for the sample patient in state $s_t$;
    the second loss function comprises:
    $\mathrm{g\_loss} = \mathrm{CrossEntropy}\big(g(s_t), \mathrm{to\_one\_hot}(a_t)\big)$;
    where $s_t$ denotes the multiple sample data at time $t$; $a_t$ denotes the sample patient grouping result data corresponding to the sample patient at time $t$; CrossEntropy denotes cross-entropy; $g(s_t)$ denotes the output of $g$ for input state $s_t$; $g$ denotes the propensity score; $\mathrm{to\_one\_hot}(a_t)$ denotes the one-hot encoding of the sample patient grouping result data (the action) corresponding to the sample patient at time $t$; and g_loss is the first loss value, used to make the output $g$ approximate the sample patient grouping result data;
    the third loss function comprises:
    $\mathrm{Q\_loss} = \big(Q(s_t, a_t) - (\gamma + \max_a(\gamma \cdot Q(s_{t+1}, a_{t+1})))\big)^2$;
    where $s_t$ denotes the multiple sample data at time $t$; $a_t$ denotes the sample patient grouping result data corresponding to the sample patient at time $t$; $s_{t+1}$ denotes the multiple sample data at the next time $t+1$ to which the state transitions; $Q(s_t, a_t)$ denotes the actual expected cumulative reward value corresponding to the sample patient grouping result data $a_t$ with the sample patient in the state given by the multiple sample data at time $t$; $\max_a(\gamma \cdot Q(s_{t+1}, a_{t+1}))$ denotes the maximum of the sample expected cumulative reward values corresponding to the multiple sample grouping result data $a_{t+1}$ with the sample patient in the state given by the multiple sample data at time $t+1$; $\gamma$ denotes the discount factor in the model to be trained, that is, the decay ratio applied when the target sample expected cumulative reward value at the next time $t+1$ is discounted to time $t$; and Q_loss is the second loss value.
  9. The system for constructing a grouping model based on causal inference according to claim 8, wherein the model to be trained is a deep reinforcement learning model.
  10. The system for constructing a grouping model based on causal inference according to claim 6, wherein the system further comprises a randomized allocation module;
    the randomized allocation module is configured to randomly allocate the multiple sample data of the multiple sample patients to obtain multiple training sample data and multiple control sample data;
    and, correspondingly, the first model processing module is further configured to input the multiple training sample data and the multiple control sample data into the model to be trained, and to perform logistic regression on the multiple training sample data and the multiple control sample data through the model to be trained to calculate the propensity score of each sample patient for the sample patient grouping result data corresponding to that sample patient.
  11. A method for processing medical data, comprising:
    acquiring multiple patient data of a target patient, wherein the multiple patient data include multiple basic data, multiple patient historical follow-up data, and patient current follow-up data;
    inputting the multiple basic data, the multiple patient historical follow-up data, and the patient current follow-up data into the grouping model according to any one of claims 1 to 5, and outputting, through the grouping model, an expected cumulative reward value of the target patient for each model patient grouping result data;
    determining, from the multiple expected cumulative reward values, the largest expected cumulative reward value as a target expected cumulative reward value; and
    determining the model patient grouping result data corresponding to the target expected cumulative reward value as the target patient grouping result data corresponding to the target patient.
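The last two steps of this claim amount to picking the grouping result with the largest expected cumulative reward value. A minimal Python sketch follows; the function name `select_group` and the dictionary representation of the model output are assumptions made for illustration only.

```python
def select_group(expected_rewards):
    """Return the grouping result with the largest expected cumulative
    reward value, together with that value.

    expected_rewards: hypothetical mapping from each model patient grouping
                      result (a label) to its expected cumulative reward.
    """
    best_group = max(expected_rewards, key=expected_rewards.get)
    return best_group, expected_rewards[best_group]
```

If two groups tie for the maximum, this sketch returns whichever appears first in the mapping; the claim does not specify a tie-breaking rule.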
  12. A medical data processing system, comprising:
    a second acquisition module, configured to acquire multiple patient data of a target patient, wherein the multiple patient data include multiple basic data, multiple patient historical follow-up data, and patient current follow-up data;
    a second model processing module, configured to input the multiple basic data, the multiple patient historical follow-up data, and the patient current follow-up data into the grouping model according to any one of claims 1 to 5, and to output, through the grouping model, an expected cumulative reward value of the target patient for each model patient grouping result data;
    a second determination module, configured to determine, from the multiple expected cumulative reward values, the largest expected cumulative reward value as a target expected cumulative reward value; and
    a third determination module, configured to determine the model patient grouping result data corresponding to the target expected cumulative reward value as the target patient grouping result data corresponding to the target patient.
  13. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, performs the following steps:
    acquiring multiple sample data of multiple sample patients, wherein the multiple sample data of each sample patient include multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data;
    inputting the multiple sample data of the multiple sample patients into a model to be trained; outputting, through the model to be trained, a propensity score of each sample patient for the sample patient grouping result data corresponding to that sample patient; and outputting, through the model to be trained, a sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;
    determining, from the sample expected cumulative reward values of each sample patient, a target sample expected cumulative reward value corresponding to that sample patient; and
    adjusting model parameters in the model to be trained based on a preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, so as to obtain the grouping model through optimization.
  14. The computer device according to claim 13, wherein the preset loss function includes a first loss function, a second loss function, and a third loss function;
    and wherein the processor, when executing the computer program, further performs the following steps:
    calculating a regression loss value based on the first loss function and the target sample expected cumulative reward value corresponding to each sample patient;
    calculating a first loss value corresponding to the propensity score based on the second loss function and the propensity score of each sample patient;
    calculating a second loss value corresponding to the target sample expected cumulative reward value based on the third loss function and the target sample expected cumulative reward value corresponding to each sample patient;
    summing the regression loss value, the first loss value, and the second loss value to obtain a loss value;
    modifying the model parameters in the model to be trained according to the loss value to obtain a modified model to be trained; and
    performing grouping training on the multiple sample data of the multiple sample patients through the modified model to be trained, stopping the training when the model parameters have been modified a preset number of times and the loss value no longer decreases, and marking the current model to be trained as the grouping model.
  15. The computer device according to claim 14, wherein the first loss function comprises:
    $\mathrm{reg\_loss} = Q(s_t, a_{t\max}) - Q(s_t, a_t)$;
    where reg_loss is the regression loss value, used to prevent overestimation of the Q value; $Q$ denotes the expected cumulative reward value corresponding to a sample patient; $s_t$ denotes the multiple sample data at time $t$; $a_t$ denotes the sample patient grouping result data corresponding to the sample patient at time $t$; $Q(s_t, a_{t\max})$ denotes the largest of the sample expected cumulative reward values output by the model to be trained; and $Q(s_t, a_t)$ denotes the actual expected cumulative reward value of the sample patient grouping result data determined for the sample patient in state $s_t$;
    the second loss function comprises:
    $\mathrm{g\_loss} = \mathrm{CrossEntropy}\big(g(s_t), \mathrm{to\_one\_hot}(a_t)\big)$;
    where $s_t$ denotes the multiple sample data at time $t$; $a_t$ denotes the sample patient grouping result data corresponding to the sample patient at time $t$; CrossEntropy denotes cross-entropy; $g(s_t)$ denotes the output of $g$ for input state $s_t$; $g$ denotes the propensity score; $\mathrm{to\_one\_hot}(a_t)$ denotes the one-hot encoding of the sample patient grouping result data (the action) corresponding to the sample patient at time $t$; and g_loss is the first loss value, used to make the output $g$ approximate the sample patient grouping result data;
    the third loss function comprises:
    $\mathrm{Q\_loss} = \big(Q(s_t, a_t) - (\gamma + \max_a(\gamma \cdot Q(s_{t+1}, a_{t+1})))\big)^2$;
    where $s_t$ denotes the multiple sample data at time $t$; $a_t$ denotes the sample patient grouping result data corresponding to the sample patient at time $t$; $s_{t+1}$ denotes the multiple sample data at the next time $t+1$ to which the state transitions; $Q(s_t, a_t)$ denotes the actual expected cumulative reward value corresponding to the sample patient grouping result data $a_t$ with the sample patient in the state given by the multiple sample data at time $t$; $\max_a(\gamma \cdot Q(s_{t+1}, a_{t+1}))$ denotes the maximum of the sample expected cumulative reward values corresponding to the multiple sample grouping result data $a_{t+1}$ with the sample patient in the state given by the multiple sample data at time $t+1$; $\gamma$ denotes the discount factor in the model to be trained, that is, the decay ratio applied when the target sample expected cumulative reward value at the next time $t+1$ is discounted to time $t$; and Q_loss is the second loss value.
  16. The computer device according to claim 13, wherein:
    before the multiple sample data of the multiple sample patients are input into the model to be trained, the processor, when executing the computer program, further performs the following step:
    randomly allocating the multiple sample data of the multiple sample patients to obtain multiple training sample data and multiple control sample data;
    correspondingly, the processor, when executing the computer program, further performs the following step:
    inputting the multiple training sample data and the multiple control sample data into the model to be trained, and performing logistic regression on the multiple training sample data and the multiple control sample data through the model to be trained, to calculate the propensity score of each sample patient for its corresponding sample patient grouping result data.
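The random allocation and logistic-regression steps of this claim can be sketched as follows. This is only an illustration under simplifying assumptions: two candidate groups (labels 0 and 1), plain stochastic gradient descent, and hypothetical helper names (`random_split`, `fit_logistic`, `propensity`) that are not taken from the application.

```python
import math
import random

def random_split(samples, control_fraction=0.5, seed=0):
    """Randomly allocate samples into (training, control) lists."""
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n_control = int(len(shuffled) * control_fraction)
    return shuffled[n_control:], shuffled[:n_control]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(features, labels, lr=0.1, epochs=500):
    """Fit a binary logistic regression with stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            err = p - y  # gradient of the log-loss w.r.t. the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def propensity(weights, bias, x, observed_group):
    """Estimated probability that a patient with features x falls in the
    group that was actually observed for that patient (0 or 1)."""
    p1 = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return p1 if observed_group == 1 else 1.0 - p1
```

With more than two groups, a multinomial (softmax) regression would play the same role; the binary case is used here only to keep the sketch short.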
  17. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program executable by at least one processor to cause the at least one processor to perform the following steps:
    obtaining multiple sample data of multiple sample patients, wherein the multiple sample data of each sample patient include multiple basic data, multiple patient historical follow-up data, and sample patient grouping result data;
    inputting the multiple sample data of the multiple sample patients into a model to be trained, outputting through the model to be trained a propensity score of each sample patient for its corresponding sample patient grouping result data, and outputting through the model to be trained a sample expected cumulative reward value of each sample patient for each model patient grouping result data in the model to be trained, wherein the propensity score represents the probability that the sample patient corresponds to the sample patient grouping result data;
    determining, from the sample expected cumulative reward values of each sample patient, a target sample expected cumulative reward value corresponding to each sample patient; and
    adjusting model parameters in the model to be trained based on a preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, to obtain an optimized grouping model.
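The step of determining a target value from the per-group outputs can be pictured with a short sketch. It assumes that the target is the largest of the sample expected cumulative reward values, which the claim itself does not state; the function name and the group labels are hypothetical.

```python
def select_target(q_values):
    """Pick the grouping result with the largest expected cumulative reward.

    q_values maps each candidate grouping result to the expected cumulative
    reward the model outputs for one sample patient. Taking the maximum is
    an assumption made for illustration.
    """
    best_group = max(q_values, key=q_values.get)
    return best_group, q_values[best_group]
```

For example, `select_target({"A": 0.2, "B": 0.7, "C": 0.5})` selects group `"B"` together with its value.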
  18. The computer-readable storage medium according to claim 17, wherein the preset loss function includes a first loss function, a second loss function, and a third loss function;
    when executing the computer program, the processor further performs the following steps:
    the adjusting of the model parameters in the model to be trained based on the preset loss function, the propensity score of each sample patient, and the corresponding target sample expected cumulative reward value, to obtain the optimized grouping model, includes:
    calculating a regression loss value based on the first loss function and the target sample expected cumulative reward value corresponding to each sample patient;
    calculating a first loss value corresponding to the propensity score based on the second loss function and the propensity score of each sample patient;
    calculating a second loss value corresponding to the target sample expected cumulative reward value based on the third loss function and the target sample expected cumulative reward value corresponding to each sample patient;
    summing the regression loss value, the first loss value, and the second loss value to obtain a loss value;
    modifying the model parameters in the model to be trained according to the loss value to obtain a modified model to be trained; and
    performing grouping training on the multiple sample data of the multiple sample patients through the modified model to be trained, stopping the training once the model parameters have been modified a preset number of times and the loss value no longer decreases, and marking the current model to be trained as the grouping model.
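The summation of the three loss values and the stopping condition can be sketched as below. The sketch assumes one loss value is recorded per parameter modification and reads "no longer decreases" as "the most recent loss is not lower than the one before it"; the names `total_loss`, `should_stop`, and `patience` are hypothetical.

```python
def total_loss(reg_loss, g_loss, q_loss):
    """Sum of the regression loss, first loss value, and second loss value."""
    return reg_loss + g_loss + q_loss

def should_stop(loss_history, min_updates, patience=1):
    """Return True once at least min_updates parameter modifications have
    been made and the loss did not decrease over the last `patience` of them.

    loss_history holds one total loss per parameter modification, oldest
    first (an assumption about how training is logged).
    """
    if len(loss_history) < max(min_updates, patience + 1):
        return False
    recent = loss_history[-(patience + 1):]
    return all(later >= earlier for earlier, later in zip(recent, recent[1:]))
```

A training loop would append `total_loss(...)` to the history after each modification and break out as soon as `should_stop` returns true.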
  19. The computer-readable storage medium according to claim 18, wherein the first loss function includes:
    $\mathrm{reg\_loss} = Q(s_t, a_{t\max}) - Q(s_t, a_t)$;
    where reg_loss is the regression loss value, used to prevent overestimation of the Q value; Q denotes the expected cumulative reward value corresponding to the sample patient; $s_t$ denotes the multiple sample data at time t; $a_t$ denotes the sample patient grouping result data corresponding to the sample patient at time t; $Q(s_t, a_{t\max})$ denotes the largest of the multiple sample expected cumulative reward values output by the model to be trained; and $Q(s_t, a_t)$ denotes the actual expected cumulative reward value of the sample patient grouping result data determined for the sample patient in state $s_t$;
    the second loss function includes:
    $\mathrm{g\_loss} = \mathrm{CrossEntropy}\left(g(s_t), \mathrm{to\_one\_hot}(a_t)\right)$;
    where $s_t$ denotes the multiple sample data at time t; $a_t$ denotes the sample patient grouping result data corresponding to the sample patient at time t; CrossEntropy denotes the cross-entropy; $g(s_t)$ denotes the output of g for the input state $s_t$, where g is the propensity score; to_one_hot($a_t$) denotes the one-hot encoding of the sample patient grouping result data (the action) corresponding to the sample patient at time t; and g_loss is the first loss value, used to make the output g approximate the sample patient grouping result data;
    the third loss function includes:
    $\mathrm{Q\_loss} = \left(Q(s_t, a_t) - \left(\gamma + \max_a\left(\gamma \cdot Q(s_{t+1}, a_{t+1})\right)\right)\right)^2$;
    where $s_t$ denotes the multiple sample data at time t; $a_t$ denotes the sample patient grouping result data corresponding to the sample patient at time t; $s_{t+1}$ denotes the multiple sample data at the next time t+1, to which $s_t$ transitions; $Q(s_t, a_t)$ denotes the actual expected cumulative reward value corresponding to the sample patient grouping result data $a_t$ when the sample patient is in the state given by the multiple sample data at time t; $\max_a(\gamma \cdot Q(s_{t+1}, a_{t+1}))$ denotes the maximum of the sample expected cumulative reward values corresponding to the multiple sample grouping result data $a_{t+1}$ when the sample patient is in the state given by the multiple sample data at time t+1; $\gamma$ denotes the discount factor in the model to be trained, i.e. the attenuation ratio applied when the target sample expected cumulative reward value at the next time t+1 is discounted to the target sample expected cumulative reward value at time t; and Q_loss is the second loss value.
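The first and second loss functions of this claim can be written out as a small Python sketch. It assumes the model outputs a list of Q values and a list of propensity probabilities indexed by candidate grouping result, and that the observed grouping result is given as an index; all function names are hypothetical.

```python
import math

def reg_loss(q_values, observed_action):
    """Q(s_t, a_tmax) - Q(s_t, a_t): gap between the largest predicted value
    and the value of the grouping result actually observed."""
    return max(q_values) - q_values[observed_action]

def to_one_hot(index, size):
    """One-hot encoding of the observed grouping result."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def cross_entropy(probs, target, eps=1e-12):
    """Cross-entropy between predicted probabilities and a one-hot target.
    eps guards against log(0)."""
    return -sum(t * math.log(p + eps) for p, t in zip(probs, target))

def g_loss(propensities, observed_action):
    """CrossEntropy(g(s_t), to_one_hot(a_t))."""
    target = to_one_hot(observed_action, len(propensities))
    return cross_entropy(propensities, target)
```

The regression loss is zero exactly when the observed grouping result already has the highest predicted value, which is how it penalizes overestimating the alternatives.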
  20. The computer-readable storage medium according to claim 1, wherein:
    before the multiple sample data of the multiple sample patients are input into the model to be trained, the processor, when executing the computer program, further performs the following step:
    randomly allocating the multiple sample data of the multiple sample patients to obtain multiple training sample data and multiple control sample data;
    correspondingly, the processor, when executing the computer program, further performs the following step:
    inputting the multiple training sample data and the multiple control sample data into the model to be trained, and performing logistic regression on the multiple training sample data and the multiple control sample data through the model to be trained, to calculate the propensity score of each sample patient for its corresponding sample patient grouping result data.
PCT/CN2022/074389 2021-09-30 2022-01-27 Clustering model construction method based on causal inference and medical data processing method WO2023050668A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111156355.6 2021-09-30
CN202111156355.6A CN113782192A (en) 2021-09-30 2021-09-30 Grouping model construction method based on causal inference and medical data processing method

Publications (1)

Publication Number Publication Date
WO2023050668A1 (en) 2023-04-06

Family

ID=78854417

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/074389 WO2023050668A1 (en) 2021-09-30 2022-01-27 Clustering model construction method based on causal inference and medical data processing method

Country Status (2)

Country Link
CN (1) CN113782192A (en)
WO (1) WO2023050668A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113782192A (en) * 2021-09-30 2021-12-10 平安科技(深圳)有限公司 Grouping model construction method based on causal inference and medical data processing method

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109858630A (en) * 2019-02-01 2019-06-07 清华大学 Method and apparatus for intensified learning
US20190303765A1 (en) * 2018-03-30 2019-10-03 Visa International Service Association Method, System, and Computer Program Product for Implementing Reinforcement Learning
CN111785366A (en) * 2020-06-29 2020-10-16 平安科技(深圳)有限公司 Method and device for determining patient treatment scheme and computer equipment
CN112115322A (en) * 2020-09-25 2020-12-22 平安科技(深圳)有限公司 User grouping method and device, electronic equipment and storage medium
CN113255735A (en) * 2021-04-29 2021-08-13 平安科技(深圳)有限公司 Method and device for determining medication scheme of patient
CN113270189A (en) * 2021-05-19 2021-08-17 复旦大学附属肿瘤医院 Tumor treatment aid decision-making method based on reinforcement learning
CN113421653A (en) * 2021-06-23 2021-09-21 平安科技(深圳)有限公司 Medical information pushing method and device, storage medium and computer equipment
CN113782192A (en) * 2021-09-30 2021-12-10 平安科技(深圳)有限公司 Grouping model construction method based on causal inference and medical data processing method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111696661A (en) * 2020-05-13 2020-09-22 平安科技(深圳)有限公司 Patient clustering model construction method, patient clustering method and related equipment

Also Published As

Publication number Publication date
CN113782192A (en) 2021-12-10

Similar Documents

Publication Publication Date Title
AU2012245343B2 (en) Predictive modeling
CN112071425B (en) Data processing method and device, computer equipment and storage medium
US11250954B2 (en) Patient readmission prediction tool
US9443002B1 (en) Dynamic data analysis and selection for determining outcomes associated with domain specific probabilistic data sets
US20200082918A1 (en) System and methd of social-behavioral roi calculation and optimization
JP2012221508A (en) System and computer readable medium for predicting patient outcomes
WO2021151327A1 (en) Triage data processing method and apparatus, and device and medium
CN111144658B (en) Medical risk prediction method, device, system, storage medium and electronic equipment
WO2022227198A1 (en) Method and device for determining drug regimen of patient
US20200058408A1 (en) Systems, methods, and apparatus for linking family electronic medical records and prediction of medical conditions and health management
US20220189619A1 (en) Patient Treatment Resource Utilization Predictor
CN111696661A (en) Patient clustering model construction method, patient clustering method and related equipment
Hilbert et al. Using decision trees to manage hospital readmission risk for acute myocardial infarction, heart failure, and pneumonia
WO2023050668A1 (en) Clustering model construction method based on causal inference and medical data processing method
CN112447270A (en) Medication recommendation method, device, equipment and storage medium
US20150019255A1 (en) Systems and methods for primary admissions analysis
CN115295115A (en) Sodium valproate blood concentration prediction method and device based on deep learning
CN114694355B (en) Remote early warning method and system
WO2020188043A1 (en) Method and system to deliver time-driven activity-based-costing in a healthcare setting in an efficient and scalable manner
WO2023178789A1 (en) Disease risk estimation network optimization method and apparatus, medium, and device
CN111368412B (en) Simulation model construction method and device for nursing demand prediction
Katz-Rogozhnikov et al. Toward Comprehensive Attribution of Healthcare Cost Changes
CN113782216B (en) Disabling weight determining method and device, electronic equipment and storage medium
US20230105348A1 (en) System for adaptive hospital discharge
US20220319650A1 (en) Method and System for Providing Information About a State of Health of a Patient

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22874071; Country of ref document: EP; Kind code of ref document: A1)
Kind code of ref document: A1