CN114388095A - Sepsis treatment strategy optimization method, system, computer device and storage medium - Google Patents

Info

Publication number
CN114388095A
Authority
CN
China
Prior art keywords
strategy
sepsis
treatment strategy
health state
medical record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111584274.6A
Other languages
Chinese (zh)
Inventor
余超
刘翔宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sun Yat Sen University
Original Assignee
Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sun Yat Sen University
Priority to CN202111584274.6A
Publication of CN114388095A
Legal status: Pending

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/10 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to drugs or medications, e.g. for ensuring correct administration to patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/047 Probabilistic or stochastic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Public Health (AREA)
  • Biomedical Technology (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Pathology (AREA)
  • Chemical & Material Sciences (AREA)
  • Medicinal Chemistry (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a sepsis treatment strategy optimization method, system, computer device and storage medium. Health state labeled medical record data is first constructed from the acquired medical record data of sepsis patients. A health state transition model of the sepsis patient is then constructed from the health state information and the medication action information, and a first treatment strategy of the sepsis patient is obtained through this model. In parallel, a second treatment strategy of the sepsis patient is obtained from the health state labeled medical record data by a model-free deep reinforcement learning method. The first treatment strategy and the second treatment strategy are fused to obtain a fusion treatment strategy, the expected return of the fusion treatment strategy is evaluated by an off-policy evaluation method, and the fusion treatment strategy is adjusted to obtain an optimal treatment strategy. The method can give reasonable medication treatment strategies for different health states of sepsis patients and provides a reliable reference for sepsis treatment.

Description

Sepsis treatment strategy optimization method, system, computer device and storage medium
Technical Field
The invention relates to the technical field of medical reinforcement learning, and in particular to a sepsis treatment strategy optimization method, system, computer device and storage medium based on strategy fusion.
Background
Sepsis is a systemic inflammatory response syndrome caused by infection. It poses a serious threat to life and is one of the common high-risk complications and main causes of death among ICU patients. The medication given at each stage of sepsis is therefore critical: if reasonable medication strategies can be provided for the different health states a sepsis patient passes through, the mortality of sepsis patients can be effectively reduced.
Existing machine learning work on sepsis is limited to early prediction of the patient's condition and does not address treatment strategies for sepsis patients. Clinical treatment still relies mainly on the accumulated experience and manual decisions of doctors. However, clinical data of sepsis patients is not well shared and the clinical experience of each doctor is limited, so a reasonable and effective medication strategy cannot be guaranteed at every health state stage of the patient.
Disclosure of Invention
The invention aims to provide a sepsis treatment strategy optimization method in which the medical record data of sepsis patients is first labeled with health states through cluster analysis, sepsis treatment strategies are then learned and optimized separately with a model-based and a model-free reinforcement learning method, the treatment strategies obtained by the two methods are fused, and the fusion strategy is evaluated and adjusted by off-policy evaluation to obtain a final optimal treatment strategy.
In order to achieve the above object, it is necessary to provide a sepsis treatment strategy optimization method, system, computer device, and storage medium in response to the above technical problems.
In a first aspect, embodiments of the present invention provide a method for optimizing a sepsis treatment strategy, the method comprising the steps of:
acquiring medical record data of a sepsis patient, and constructing corresponding health state labeling medical record data according to the medical record data; the health state labeling medical record data comprises sign data, inspection data, medication action information and health state information;
according to the health state information and the medication action information of the health state labeling medical record data, a health state transfer model of the sepsis patient is constructed, and a first treatment strategy of the sepsis patient is obtained through the health state transfer model;
obtaining a second treatment strategy of the sepsis patient from the health state labeled medical record data by a model-free deep reinforcement learning method;
and performing strategy fusion on the first treatment strategy and the second treatment strategy to obtain a fusion treatment strategy.
Further, the method further comprises:
and adjusting the fusion treatment strategy by an off-policy evaluation method to obtain an optimal treatment strategy.
Further, the step of constructing corresponding health status labeling medical record data according to the medical record data comprises:
acquiring a sepsis lethal core index according to the medical record data; the medical record data comprises physical sign data, inspection data and medication action information;
combining the sepsis lethal core index with a preset sepsis diagnosis standard index to obtain a sepsis analysis key index;
performing cluster analysis on the key indexes of sepsis analysis to obtain health state classification of sepsis patients;
and carrying out health state information labeling on the medical record data according to the health state classification to obtain the health state labeled medical record data.
Further, the step of obtaining a first treatment strategy for the sepsis patient by the health state transition model comprises:
constructing a medication action reward model according to the health state transition model, the medication action information and the medication action reward value;
and obtaining a first treatment strategy of the sepsis patient through a strategy iterative algorithm according to the health state transition model and the medication action reward model.
Further, the step of obtaining a second treatment strategy of the sepsis patient from the health state labeled medical record data through a model-free deep reinforcement learning method comprises the following steps:
and inputting the physical sign data, the inspection data, the medication action information and the medication action reward value into a Dueling DQN network, and performing strategy prediction by minimizing the temporal-difference error to obtain the second treatment strategy.
Further, the step of performing strategy fusion on the first treatment strategy and the second treatment strategy to obtain a fused treatment strategy comprises:
pre-assigning corresponding first weight vectors and second weight vectors to the first treatment strategy and the second treatment strategy;
and according to the first weight vector and the second weight vector, carrying out weighted summation on the first treatment strategy and the second treatment strategy to obtain the fusion treatment strategy.
Further, the step of adjusting the fusion treatment strategy by the off-policy evaluation method to obtain an optimal treatment strategy includes:
constructing an off-policy evaluation model according to the fusion treatment strategy, and taking the off-policy evaluation model as a fusion strategy optimization objective function;
and optimizing the fusion strategy optimization objective function through a gradient ascent algorithm to obtain the optimal treatment strategy.
In a second aspect, embodiments of the present invention provide a sepsis treatment strategy optimisation system, the system comprising:
the preprocessing module is used for acquiring medical record data of a sepsis patient and constructing corresponding health state labeled medical record data according to the medical record data; the health state labeled medical record data comprises sign data, inspection data, medication action information and health state information;
the first learning module is used for constructing a health state transfer model of the sepsis patient according to the health state information and the medication action information of the health state labeling medical record data, and obtaining a first treatment strategy of the sepsis patient through the health state transfer model;
the second learning module is used for obtaining a second treatment strategy of the sepsis patient from the health state labeled medical record data through a model-free deep reinforcement learning method;
and the strategy fusion module is used for performing strategy fusion on the first treatment strategy and the second treatment strategy to obtain a fusion treatment strategy.
In a third aspect, an embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method when executing the computer program.
In a fourth aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the above method.
The present application provides a sepsis treatment strategy optimization method, system, computer device and storage medium. With the method, health state labeled medical record data is constructed from the acquired medical record data of sepsis patients; a health state transition model of the sepsis patient is constructed from the health state information and the medication action information, and a first treatment strategy of the sepsis patient is obtained through the health state transition model; in parallel, a second treatment strategy of the sepsis patient is obtained from the health state labeled medical record data by a model-free deep reinforcement learning method; the first treatment strategy and the second treatment strategy are fused to obtain a fusion treatment strategy; and the expected return of the fusion treatment strategy is evaluated by an off-policy evaluation method, with the fusion treatment strategy adjusted accordingly to obtain the optimal treatment strategy. Compared with the prior art, the method learns and optimizes the sepsis treatment strategy separately with a model-based and a model-free reinforcement learning method, fuses the treatment strategies obtained by the two methods, and evaluates and adjusts the fusion strategy by off-policy evaluation. By making full use of the high sample utilization of model-based reinforcement learning and the strong asymptotic performance of model-free reinforcement learning, it can provide reasonable medication treatment strategies for different health states of sepsis patients and a reliable reference for sepsis treatment, thereby helping to reduce the mortality of sepsis patients.
Drawings
Fig. 1 is a schematic view of an application scenario of the sepsis treatment strategy optimization method in the embodiment of the invention;
FIG. 2 is a flowchart of sepsis treatment strategy optimization in an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a method for optimizing a sepsis treatment strategy in an embodiment of the present invention;
FIG. 4 is a schematic flow chart of another method for optimizing a sepsis treatment strategy in an embodiment of the present invention;
FIG. 5 is a graph comparing the fusion treatment strategy with the physician strategy, the first treatment strategy and the second treatment strategy in an embodiment of the present invention;
FIG. 6 a, b, c and d respectively show the medication action matrices of the physician strategy, the first treatment strategy, the second treatment strategy and the fusion treatment strategy in the embodiment of the invention;
fig. 7 is a schematic structural diagram of a sepsis treatment strategy optimization system in an embodiment of the present invention;
fig. 8 is an internal structural diagram of a computer device in the embodiment of the present invention.
Detailed Description
In order to make the purpose, technical solution and advantages of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments, and it is obvious that the embodiments described below are part of the embodiments of the present invention, and are used for illustrating the present invention only, but not for limiting the scope of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The sepsis treatment strategy optimization method provided by the invention can be applied to a terminal or a server as shown in figure 1. The terminal can be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices, and the server can be implemented by an independent server or a server cluster formed by a plurality of servers. Based on the sepsis-patient medical record data in the MIMIC public data set, the server can use the sepsis treatment strategy optimization method provided by the invention, following the flow architecture shown in fig. 2, to complete model-based and model-free reinforcement learning for sepsis treatment strategy optimization and finally obtain the optimal treatment strategy through strategy fusion. The optimal treatment strategy can then be applied to other learning tasks on the server or transmitted to the terminal for use by a terminal user.
In one embodiment, as shown in fig. 3, there is provided a sepsis treatment strategy optimization method comprising the steps of:
S11, acquiring medical record data of the sepsis patient, and constructing corresponding health state labeled medical record data according to the medical record data; the health state labeled medical record data comprises sign data, inspection data, medication action information and health state information. The medical record data of the sepsis patients comes from the MIMIC public data set and includes the treatment history of a plurality of sepsis patients, so it can meet the data volume requirement of reinforcement learning and, to a certain extent, supports the accuracy of the subsequent treatment strategy optimization results. However, the existing MIMIC public data set does not label the health state corresponding to each patient's data at different times. To enable better strategy optimization based on the patient's health state, this embodiment first performs cluster analysis on the original medical record data to obtain different health state classifications, and then labels the original medical record data with health state information according to the classification result, obtaining health state labeled medical record data that can be used for strategy analysis and optimization. Specifically, the step of constructing corresponding health state labeled medical record data according to the medical record data includes:
acquiring a sepsis lethal core index according to the medical record data; the medical record data comprises physical sign data, inspection data and medication action information. The sepsis lethal core index can be understood as a data index that has a large influence on patient mortality, screened from the many data indexes in the original medical record data by a supervised or unsupervised learning method. Preferably, the physical sign data and the test data of sepsis patients are input into a decision tree model for training, and the indexes in the top two layers of the trained tree are selected as the core indexes that have a great influence on patient mortality.
Combining the sepsis lethal core index with a preset sepsis diagnosis standard index to obtain a sepsis analysis key index; the preset sepsis diagnostic standard index refers to the data indexes generally used for diagnosing sepsis, including heart rate (HeartRate), white blood cell count (WBC), sequential organ failure score (SOFA) within 24 hours, quick sequential organ failure score (qSOFA), mean blood pressure (MBP), and blood lactate (Lactate). After the sepsis lethal core index is obtained as described above, it is combined with the preset sepsis diagnosis standard index to obtain the sepsis analysis key index used for the subsequent cluster analysis. It should be noted that the sepsis lethal core index and the preset sepsis diagnostic standard index may overlap; when the two are combined, the duplicated indexes are deleted, and the resulting set of non-repeated indexes is used as the final sepsis analysis key index.
Performing cluster analysis on the key indexes of sepsis analysis to obtain health state classification of sepsis patients; the method comprises the steps of setting a plurality of different clustering numbers, selecting the optimal clustering number according to the minimum mean square error principle, wherein each cluster is used for representing a health state of a sepsis patient, namely, a plurality of different health states suitable for classifying medical record data of the sepsis patient are given through clustering analysis.
And carrying out health state information labeling on the medical record data according to the health state classification to obtain the health state labeled medical record data. The health state information marking means that medical record data of a patient at each moment and the corresponding health state are matched and identified, and the association of the health state information and medicine taking action information of a doctor is completed.
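The clustering and labeling steps above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the patent's implementation: the text does not name a clustering algorithm, so k-means is assumed here; the function names `kmeans` and `label_health_states`, the farthest-point initialisation and the candidate cluster counts are all hypothetical. Each record is assumed to be a tuple of already normalised key-index values.

```python
import math


def kmeans(points, k, iters=100):
    """Plain k-means; returns (centers, labels, mean squared error)."""
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))

    def assign(ctrs):
        # Index of the nearest center for every point.
        return [min(range(k), key=lambda j: math.dist(p, ctrs[j])) for p in points]

    for _ in range(iters):
        labels = assign(centers)
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to be empty.
            updated.append(tuple(sum(x) / len(members) for x in zip(*members)) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    labels = assign(centers)
    mse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels)) / len(points)
    return centers, labels, mse


def label_health_states(points, candidate_ks):
    """Pick the candidate cluster count with the lowest MSE and label every record."""
    best_k, best_labels, best_mse = None, None, float("inf")
    for k in candidate_ks:
        _, labels, mse = kmeans(points, k)
        if mse < best_mse:
            best_k, best_labels, best_mse = k, labels, mse
    return best_k, best_labels
```

The returned labels play the role of the health state information attached to each record. Because the mean squared error tends to shrink as the number of clusters grows, the set of candidate cluster counts effectively bounds the result of the minimum-MSE rule.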
S12, constructing a health state transition model of the sepsis patient according to the health state information and the medication action information of the health state labeled medical record data, and obtaining a first treatment strategy of the sepsis patient through the health state transition model; the health state transition model is constructed from the health state information at the current moment, the medication action information corresponding to that health state, and the health state information of the patient at the next moment. Specifically, the step of obtaining a first treatment strategy of the sepsis patient through the health state transition model comprises the following steps:
constructing a medication action reward model according to the health state transition model, the medication action information and the medication action reward value; the medication action reward value can be set manually as required, for example a positive reward when the patient is discharged, a negative reward when the patient dies, and a reward of 0 during intermediate treatment. This embodiment defines the Markov decision process for sepsis treatment as a quintuple $\langle S, A, P, R, \gamma \rangle$, where $S$ represents the health state space of the patient, $A$ represents the physician's medication action space, $P$ represents the transition model of the patient's health state, with $P(s' \mid s, a)$ the probability of transitioning to health state $s'$ after performing medication action $a$ in health state $s$, $R = R(s, a)$ represents the medication action reward value obtained by performing medication action $a$ in health state $s$, and $\gamma$ represents the discount factor. Based on this Markov decision process, a long-term expected return model is defined:
Figure BDA0003426019090000081
A medication action reward model can then be obtained for calculating the reward value that a medication action performed by a physician yields in a given health state of the patient, which may be expressed as:
$$R(s, a) = \sum_{r \in R} r \sum_{s' \in S} P(s' \mid s, a)$$
where $R$ under the summation sign represents the set of all medication action reward values.
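One way to obtain the transition model and the reward model from the labeled records is sketched below. The text does not say how $P(s' \mid s, a)$ and $R(s, a)$ are estimated, so this sketch assumes simple empirical frequencies and mean observed rewards; the function name `estimate_models`, the `(s, a, r, s_next)` tuple layout, and the state and action names in the usage example are hypothetical.

```python
from collections import defaultdict


def estimate_models(transitions):
    """Estimate P(s'|s,a) and R(s,a) from (s, a, r, s_next) tuples.

    Assumption of this sketch: P is the empirical transition frequency and
    R is the mean reward observed after taking action a in state s.
    """
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s_next in transitions:
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r

    P, R = {}, {}
    for sa, next_counts in counts.items():
        total = sum(next_counts.values())
        P[sa] = {s2: c / total for s2, c in next_counts.items()}
        R[sa] = reward_sum[sa] / total
    return P, R


# Hypothetical usage: reward +1 on discharge, -1 on death, 0 otherwise.
records = [
    ("state_0", "dose_low", 0.0, "state_1"),
    ("state_0", "dose_low", 0.0, "state_1"),
    ("state_0", "dose_low", 0.0, "state_0"),
    ("state_1", "dose_high", 1.0, "discharged"),
    ("state_1", "dose_high", -1.0, "dead"),
]
P, R = estimate_models(records)
```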
And obtaining a first treatment strategy of the sepsis patient through a strategy iteration (policy iteration) algorithm according to the health state transition model and the medication action reward model. Learning the first treatment strategy by strategy iteration comprises a strategy evaluation stage, which estimates the value function of the current strategy, and a strategy improvement stage, which improves the strategy by maximizing that value function. The specific procedure is as follows: randomly initialize a strategy $\pi_1$ and define the state value function $V(s) = E\left[\sum_t \gamma^t r_t \mid s_0 = s, \pi_1\right]$, the expected cumulative return obtained by executing strategy $\pi_1$ from health state $s$, where $\gamma$ represents the discount factor; in the strategy evaluation stage, iteratively update the state value function $V(s)$ until convergence: $V(s) \leftarrow \sum_{s', r} P(s' \mid s, a)\,(r + \gamma V(s'))$; in the strategy improvement stage, adjust the strategy $\pi_1$ using the state value function $V(s)$ updated in the strategy evaluation stage: $\pi_1(s, a) \leftarrow \arg\max_a \sum_{s', r} P(s' \mid s, a)\,(r + \gamma V(s'))$. The two stages of strategy evaluation and strategy improvement are repeated until the strategy $\pi_1$ converges and no longer changes, which yields the first treatment strategy for sepsis patients.
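The evaluation and improvement loop described above can be illustrated on a toy problem. The following is a minimal sketch, not the patent's implementation: the function name `policy_iteration`, the dictionary layout of `P` and `R`, the two health states, the two actions and all numbers are hypothetical, the expected reward $R(s, a)$ is used in place of the sum over individual rewards, and the strategy is initialised deterministically (rather than randomly) so the example is reproducible.

```python
def policy_iteration(states, actions, P, R, gamma=0.9, tol=1e-8):
    """Tabular policy iteration.

    P[(s, a)] maps next states to probabilities; R[(s, a)] is the expected reward.
    Next states missing from `states` (e.g. discharge or death) are treated as
    terminal with value 0.
    """
    policy = {s: actions[0] for s in states}  # fixed start instead of a random one
    V = {s: 0.0 for s in states}

    def q(s, a):
        future = sum(p * V.get(s2, 0.0) for s2, p in P.get((s, a), {}).items())
        return R.get((s, a), 0.0) + gamma * future

    while True:
        # Policy evaluation: sweep until the value function stops changing.
        while True:
            delta = 0.0
            for s in states:
                v = q(s, policy[s])
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: act greedily with respect to the evaluated V.
        stable = True
        for s in states:
            best = max(actions, key=lambda a: q(s, a))
            if q(s, best) > q(s, policy[s]) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Hypothetical two-state example.
states = ["mild", "severe"]
actions = ["low", "high"]
P = {
    ("mild", "low"): {"discharged": 0.8, "severe": 0.2},
    ("mild", "high"): {"discharged": 0.5, "severe": 0.5},
    ("severe", "low"): {"dead": 0.6, "mild": 0.4},
    ("severe", "high"): {"dead": 0.2, "mild": 0.8},
}
R = {("mild", "low"): 0.8, ("mild", "high"): 0.5,
     ("severe", "low"): -0.6, ("severe", "high"): -0.2}
pi_1, V = policy_iteration(states, actions, P, R)
```

In this toy setting the loop settles on the low dose in the mild state and the high dose in the severe state.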
The model-based method for obtaining the first treatment strategy has the advantage of higher sample utilization, which gives the strategy better representativeness and accuracy. However, because the patient health state transition model depends only on a limited set of key indexes, the model estimate may be biased. To further improve the accuracy of the strategy, this embodiment also obtains a second treatment strategy $\pi_2$, based on the complete data features in the health state labeled medical record data, by a model-free deep reinforcement learning method through the following steps:
S13, obtaining a second treatment strategy of the sepsis patient from the health state labeled medical record data through a model-free deep reinforcement learning method; the model-free deep reinforcement learning method includes temporal-difference learning, which learns from the current value function estimate by bootstrapping. Existing strategy learning methods based on temporal difference include Q-learning, the DQN (Deep Q-Network) network, the Double DQN network and the Dueling DQN network:
Q-learning, an algorithm based on temporal-difference learning, which estimates the optimal state-action value function through the Bellman equation $Q^*(s, a) = R(s, a) + \gamma \max_{a'} E\left[Q^*(s', a')\right]$, where the state-action value function is defined as $Q(s, a) = E\left[\sum_t \gamma^t r_t \mid s_0 = s, a_0 = a\right]$;
the DQN (Deep Q-Network) network, which combines the Q-learning algorithm with a deep neural network (DNN) that represents the state-action value function, and learns the optimal strategy by minimizing the temporal-difference error; its state-action value function is prone to overestimation, which leads to incorrect predictions and a poor strategy;
the Double DQN network, which addresses this overestimation problem of the DQN network by using two networks with identical structure but different parameter update frequencies for action selection and action evaluation respectively;
the Dueling DQN network, which generalizes learning across actions without changing the basic reinforcement learning procedure: it splits the state-action value function $Q(s, a)$ into a state value function $V(s)$ and an action advantage function $A(s, a) = Q(s, a) - V(s)$, so that the state-action value function is estimated more accurately and a better strategy estimate can be achieved.
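The decomposition used by the Dueling DQN network can be shown with a small numerical sketch. It assumes the two network heads have already produced a state value and a list of advantages for one health state; the function name `dueling_q_values` is hypothetical. Subtracting the mean advantage before recombining follows the commonly published Dueling DQN architecture and is an assumption here, since the text above only states $A(s, a) = Q(s, a) - V(s)$.

```python
def dueling_q_values(state_value, advantages):
    """Recombine V(s) and A(s, a) into Q(s, a) for every action.

    The mean advantage is subtracted so that V and A are identifiable;
    this detail is an assumption taken from the standard architecture.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [state_value + adv - mean_adv for adv in advantages]


# Hypothetical head outputs for one health state with three medication actions.
q_values = dueling_q_values(2.0, [1.0, 0.0, -1.0])
greedy_action = max(range(len(q_values)), key=lambda i: q_values[i])
```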
In principle, any of the above model-free strategy optimization methods based on temporal-difference learning can learn the second treatment strategy from the complete data features of the health state labeled medical record data. To ensure that the second treatment strategy is reasonable and effective, this embodiment preferably uses the Dueling DQN network for model-free reinforcement learning. Specifically, the step of obtaining a second treatment strategy of the sepsis patient from the health state labeled medical record data by a model-free deep reinforcement learning method comprises:
inputting the sign data, the inspection data, the medication action information and the medication action reward value into a Dueling DQN network, and performing strategy prediction by minimizing the temporal-difference error to obtain the second treatment strategy.
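By way of non-limiting illustration, the value decomposition of the Dueling DQN network and the temporal-difference error it minimizes can be sketched as follows. The sketch works on plain lists rather than a neural network. The subtraction of the mean advantage is the aggregation commonly used with dueling architectures and is an assumption here, since the text above only states A(s, a) = Q(s, a) - V(s). All function names are hypothetical.

```python
def dueling_q_values(state_value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + adv - mean_advantage for adv in advantages]


def td_error(q_sa, reward, q_next, gamma=0.99, done=False):
    """Temporal-difference error of one transition (s, a, r, s')."""
    bootstrap = 0.0 if done else gamma * max(q_next)
    return reward + bootstrap - q_sa


def greedy_action(q_values):
    """Index of the medication action with the highest estimated value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Hypothetical outputs of the value stream and the advantage stream for one health state.
q_s = dueling_q_values(1.0, [0.0, 2.0, 1.0])
action = greedy_action(q_s)
delta = td_error(q_s[action], reward=0.5, q_next=[1.0, 0.0], gamma=0.9)
```

In a trained network the squared value of `delta` would be averaged over a batch of transitions and minimized by gradient descent; the greedy action per health state then gives the second treatment strategy.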
S14, performing strategy fusion on the first treatment strategy and the second treatment strategy to obtain a fusion treatment strategy. After the first treatment strategy and the second treatment strategy are obtained by the above steps, and in order to make full use of the advantages of both, the strategy learned with a model and the strategy learned without a model are fused by pre-assigning corresponding weight vectors, yielding the fusion treatment strategy. Specifically, the step of performing strategy fusion on the first treatment strategy and the second treatment strategy to obtain a fusion treatment strategy comprises:
pre-assigning a corresponding first weight vector and second weight vector to the first treatment strategy and the second treatment strategy, wherein the first weight vector and the second weight vector respectively represent the probability of selecting the corresponding strategy in a specific health state S;
and performing a weighted summation of the first treatment strategy and the second treatment strategy according to the first weight vector and the second weight vector to obtain the fusion treatment strategy. The fusion treatment strategy $\pi_{mix}$ can be expressed as:
$\pi_{mix} = w_1 \pi_1 + w_2 \pi_2$
where $w_1$ and $w_2$ denote the first weight vector and the second weight vector, respectively, and $\pi_1$ and $\pi_2$ denote the first treatment strategy and the second treatment strategy, respectively, in a specific health state S.
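By way of non-limiting illustration, the weighted summation of the two strategies can be sketched as follows. The sketch assumes that each strategy is a table of action probabilities indexed by health state, that each weight vector holds one weight per health state, and that the two weights of a state sum to 1 (consistent with their interpretation as selection probabilities). The function name and the example numbers are hypothetical.

```python
def fuse_policies(pi1, pi2, w1, w2):
    """Return pi_mix with pi_mix[s][a] = w1[s] * pi1[s][a] + w2[s] * pi2[s][a]."""
    fused = []
    for s, (row1, row2) in enumerate(zip(pi1, pi2)):
        # Assumption: the per-state weights form a probability distribution.
        if abs(w1[s] + w2[s] - 1.0) > 1e-9:
            raise ValueError(f"weights for health state {s} must sum to 1")
        fused.append([w1[s] * p1 + w2[s] * p2 for p1, p2 in zip(row1, row2)])
    return fused


# Two health states, two medication actions (illustrative values only).
pi_model_based = [[1.0, 0.0], [0.5, 0.5]]
pi_model_free = [[0.0, 1.0], [0.0, 1.0]]
pi_mix = fuse_policies(pi_model_based, pi_model_free, w1=[0.25, 0.5], w2=[0.75, 0.5])
```

With weights that sum to 1 in every state, each row of the fused table remains a valid probability distribution over medication actions.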
To further improve the rationality and accuracy of the fusion treatment strategy, as shown in fig. 4, the sepsis treatment strategy optimization method further evaluates the expected benefit of the fusion treatment strategy in the following step S15 and adjusts the fusion treatment strategy according to the evaluation result until an optimal treatment strategy is obtained.
S15, adjusting the fusion treatment strategy by an off-strategy evaluation (off-policy evaluation) method to obtain an optimal treatment strategy. The off-strategy evaluation method estimates the expected benefit of the fusion treatment strategy. The expected benefit is then maximized by gradient ascent, which repeatedly adjusts the first weight vector and the second weight vector assigned to the first treatment strategy and the second treatment strategy until stable weight vectors are obtained, yielding the optimal treatment strategy. Specifically, the step of adjusting the fusion treatment strategy by the off-strategy evaluation method to obtain an optimal treatment strategy comprises:
constructing an off-strategy evaluation model according to the fusion treatment strategy, and taking the off-strategy evaluation model as a fusion strategy optimization objective function; the fusion strategy optimization objective function is obtained through the following steps:
first, the expected benefit of the fusion treatment strategy $\pi_{mix}$ on each trajectory (patient treatment record) is defined as:
Figure BDA0003426019090000111
where
Figure BDA0003426019090000112
$\rho_t = \pi_{mix}(a_t \mid s_t) / \pi_b(a_t \mid s_t)$
Figure BDA0003426019090000113
where H denotes the length of the trajectory; $\rho_t$ denotes the importance ratio between the fusion strategy under evaluation $\pi_{mix}$ and the physician strategy $\pi_b$ (the medication action information in the original medical record data); $\pi_{mix}(a_t \mid s_t)$ and $\pi_b(a_t \mid s_t)$ denote the probabilities of taking medication action $a_t$ in health state $s_t$ under the fusion treatment strategy $\pi_{mix}$ and the physician strategy $\pi_b$, respectively; $w_t$ denotes the normalization applied to the importance ratio; D denotes the medical record data of the sepsis patients (the historical treatment data set), and |D| denotes the number of trajectories in the medical record data. The expected benefit estimate of the fusion treatment strategy $\pi_{mix}$ is the average of the expected benefit estimates over all trajectories, and is denoted as the off-strategy evaluation model:
Figure BDA0003426019090000114
Because the goal is to maximize the expected benefit of the fusion treatment strategy, the above off-strategy evaluation model is used as the fusion strategy optimization objective function.
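By way of non-limiting illustration, an importance-sampling estimate of the expected benefit can be sketched as follows. The exact formulas of the embodiment are given only as figures, so the per-step weighted importance sampling form used here (cumulative product of the ratios $\rho_t$, normalized at each step by its mean over the trajectories) is an assumption, as are the function name, the discount factor and the example data.

```python
def weighted_importance_sampling(trajectories, pi_mix, pi_b, gamma=0.99):
    """Estimate the expected return of pi_mix from trajectories collected under pi_b.

    trajectories: list of trajectories, each a list of (state, action, reward).
    pi_mix, pi_b: tables with pi[state][action] = probability of the action;
    pi_b must be non-zero for every recorded (state, action) pair.
    """
    # Cumulative importance ratio per trajectory: product of pi_mix / pi_b up to step t.
    cumulative = []
    for trajectory in trajectories:
        rho, row = 1.0, []
        for state, action, _reward in trajectory:
            rho *= pi_mix[state][action] / pi_b[state][action]
            row.append(rho)
        cumulative.append(row)

    # Normalizer per step: mean cumulative ratio over trajectories still running at step t.
    horizon = max(len(row) for row in cumulative)
    normalizer = []
    for t in range(horizon):
        active = [row[t] for row in cumulative if len(row) > t]
        normalizer.append(sum(active) / len(active))

    # Discounted, re-weighted rewards summed per trajectory, then averaged over |D|.
    total = 0.0
    for trajectory, row in zip(trajectories, cumulative):
        for t, (_state, _action, reward) in enumerate(trajectory):
            if normalizer[t] > 0.0:
                total += (gamma ** t) * (row[t] / normalizer[t]) * reward
    return total / len(trajectories)


# Illustrative data: two recorded trajectories and a uniform physician strategy.
records = [[(0, 0, 3.0), (1, 1, 0.0)], [(0, 1, 0.0), (1, 0, 2.0)]]
physician = [[0.5, 0.5], [0.5, 0.5]]
candidate = [[1.0, 0.0], [0.5, 0.5]]
estimate = weighted_importance_sampling(records, candidate, physician, gamma=0.5)
```

When the evaluated strategy equals the physician strategy, every ratio is 1 and the estimate reduces to the average discounted return of the recorded trajectories, which is a convenient sanity check.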
optimizing the fusion strategy optimization objective function by a gradient ascent algorithm to obtain the optimal treatment strategy. Before the optimization, a sigmoid function σ is applied to the fusion strategy optimization objective function to introduce nonlinearity:
Figure BDA0003426019090000121
then, J is maximized with the following gradient ascent algorithm, which adjusts the strategy weights $w_1$ and $w_2$ to obtain the optimal treatment strategy:
Figure BDA0003426019090000122
Figure BDA0003426019090000123
Figure BDA0003426019090000124
Figure BDA0003426019090000125
where α is the learning rate.
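By way of non-limiting illustration, gradient ascent on a sigmoid-wrapped objective can be sketched as follows. The update formulas of the embodiment are given only as figures, so this sketch makes several assumptions: a single scalar weight $w_1$ with $w_2 = 1 - w_1$ instead of per-state weight vectors, a finite-difference gradient instead of an analytic one, clipping of the weight to [0, 1], and a toy stand-in for the off-strategy evaluation estimate. All names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def optimise_weight(value_estimate, w1=0.5, alpha=0.5, steps=500, eps=1e-5):
    """Maximize J(w1) = sigmoid(value_estimate(w1)) by gradient ascent with learning rate alpha."""

    def objective(w):
        return sigmoid(value_estimate(w))

    for _ in range(steps):
        # Central finite difference as a stand-in for the analytic gradient dJ/dw1.
        grad = (objective(w1 + eps) - objective(w1 - eps)) / (2.0 * eps)
        # Ascent step, kept inside [0, 1] so the weights remain probabilities.
        w1 = min(1.0, max(0.0, w1 + alpha * grad))
    return w1, 1.0 - w1


def toy_value(w):
    """Toy expected-benefit curve whose maximum lies at w1 = 0.7 (illustration only)."""
    return -(w - 0.7) ** 2


w1_opt, w2_opt = optimise_weight(toy_value)
```

In practice `value_estimate` would evaluate the fusion treatment strategy built from the current weights against the recorded trajectories, and the loop would stop once the weights no longer change appreciably.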
In the embodiment of the application, the medical record data of sepsis patients are first labeled with health states by cluster analysis. A health state transfer model of the sepsis patient is then constructed from the health state information and the medication action information, and a first treatment strategy of the sepsis patient is obtained through the health state transfer model. At the same time, a Dueling DQN network performs model-free deep reinforcement learning on the health state labeled medical record data by minimizing the temporal-difference error, yielding a second treatment strategy of the sepsis patient. The first treatment strategy and the second treatment strategy are then combined by weighted summation into a fusion treatment strategy, the expected benefit of the fusion treatment strategy is evaluated by the off-strategy evaluation method, and the weights of the fusion treatment strategy are adjusted to obtain an optimal treatment strategy. By combining the advantages of model-based and model-free reinforcement learning, the method can provide reasonable medication strategies for the different health states of sepsis patients and a reliable reference for sepsis treatment, and can thereby effectively reduce the mortality of sepsis patients.
To verify the technical effect of the sepsis treatment strategy optimization method of the present invention, this embodiment compares the different strategies in two respects, expected benefit and medication action matrix, and obtains the results shown in fig. 5 and fig. 6(a-d), respectively. As shown in fig. 5, the fusion treatment strategy performs best: its expected benefit is higher than those of the physician strategy (the medication action information in different health states in the original medical record data), the first treatment strategy (the model-based reinforcement learning strategy) and the second treatment strategy (the model-free reinforcement learning strategy). As shown in fig. 6(a-d), the second treatment strategy is similar to the physician strategy but also explores some higher doses of vasopressin, whereas the first treatment strategy and the fusion treatment strategy differ more from the physician strategy and make use of more drug dose combinations.
It should be noted that, although the steps in the above flowcharts are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated otherwise, execution of the steps is not strictly limited to the order shown and described, and the steps may be performed in other orders.
In one embodiment, as shown in fig. 7, there is provided a sepsis treatment strategy optimization system, the system comprising:
a preprocessing module 1, configured to acquire medical record data of a sepsis patient and construct corresponding health state labeled medical record data from the medical record data; the health state labeled medical record data comprises sign data, inspection data, medication action information and health state information;
a first learning module 2, configured to construct a health state transfer model of the sepsis patient according to the health state information and the medication action information of the health state labeled medical record data, and obtain a first treatment strategy of the sepsis patient through the health state transfer model;
a second learning module 3, configured to obtain a second treatment strategy of the sepsis patient from the health state labeled medical record data by a model-free deep reinforcement learning method;
and a strategy fusion module 4, configured to perform strategy fusion on the first treatment strategy and the second treatment strategy to obtain a fusion treatment strategy.
In addition, the sepsis treatment strategy optimization system further comprises a strategy optimization module, configured to adjust the fusion treatment strategy by the off-strategy evaluation method to obtain an optimal treatment strategy.
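By way of non-limiting illustration, the way the modules hand data to one another can be sketched as follows. This is a structural sketch only: the class name, the field names and the callable signatures are hypothetical, and each module is represented by a placeholder callable rather than by the learning procedures described above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class SepsisStrategySystem:
    preprocess: Callable          # preprocessing module 1
    learn_model_based: Callable   # first learning module 2
    learn_model_free: Callable    # second learning module 3
    fuse: Callable                # strategy fusion module 4
    optimise: Optional[Callable] = None  # optional strategy optimization module

    def run(self, medical_records):
        """Run the modules in order and return the resulting treatment strategy."""
        labeled = self.preprocess(medical_records)
        first_strategy = self.learn_model_based(labeled)
        second_strategy = self.learn_model_free(labeled)
        fused = self.fuse(first_strategy, second_strategy)
        if self.optimise is None:
            return fused
        return self.optimise(fused, labeled)


# Placeholder modules that only demonstrate the data flow.
system = SepsisStrategySystem(
    preprocess=lambda records: [dict(record, health_state=0) for record in records],
    learn_model_based=lambda data: [[1.0, 0.0]],
    learn_model_free=lambda data: [[0.0, 1.0]],
    fuse=lambda p1, p2: [[0.5 * a + 0.5 * b for a, b in zip(r1, r2)] for r1, r2 in zip(p1, p2)],
)
strategy = system.run([{"heart_rate": 90}])
```

Keeping each module behind a callable makes it possible to replace, for example, the second learning module with a different model-free learner without touching the rest of the pipeline.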
For specific limitations of the sepsis treatment strategy optimization system, reference may be made to the above limitations of the sepsis treatment strategy optimization method, which are not repeated here. The modules in the sepsis treatment strategy optimization system can be implemented wholly or partially in software, hardware, or a combination thereof. The modules can be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
Fig. 8 shows an internal structure diagram of a computer device in one embodiment, and the computer device may be specifically a terminal or a server. As shown in fig. 8, the computer apparatus includes a processor, a memory, a network interface, a display, and an input device, which are connected through a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a sepsis treatment strategy optimization method. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those of ordinary skill in the art that the architecture shown in fig. 8 is a block diagram of only the part of the architecture relevant to the present application and does not limit the computer devices to which the present application may be applied; a particular computer device may include more or fewer components than those shown, may combine certain components, or may have a similar arrangement of components.
In one embodiment, a computer device is provided, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the steps of the above method being performed when the computer program is executed by the processor.
In an embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, carries out the steps of the above-mentioned method.
In summary, the sepsis treatment strategy optimization method, system, computer device and storage medium provided by the embodiments of the present invention operate as follows. Different health state categories are obtained from the acquired medical record data of sepsis patients by cluster analysis, and the medical record data are labeled with health states to construct the corresponding health state labeled medical record data. A health state transfer model of the sepsis patient is constructed from the health state information and the medication action information, and a first treatment strategy of the sepsis patient is obtained through the health state transfer model. At the same time, a Dueling DQN network performs strategy prediction on the health state labeled medical record data by a model-free deep reinforcement learning method that minimizes the temporal-difference error, yielding a second treatment strategy of the sepsis patient. The first treatment strategy and the second treatment strategy are then fused into a fusion treatment strategy, which is evaluated and adjusted by off-strategy evaluation. The sepsis treatment strategy is thus learned and optimized with a model-based reinforcement learning method and a model-free reinforcement learning method respectively, and the two learned strategies are fused. By making full use of the high sample efficiency of model-based reinforcement learning and the strong asymptotic performance of model-free reinforcement learning, the scheme can provide reasonable medication strategies for the different health states of sepsis patients and a reliable reference for sepsis treatment, and can thereby effectively reduce the mortality of sepsis patients.
The embodiments in this specification are described in a progressive manner; the same or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, because the system embodiment is substantially similar to the method embodiment, it is described briefly, and the relevant points can be found in the corresponding description of the method embodiment. It should be noted that the technical features of the embodiments may be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the embodiments are described, but any combination should be considered within the scope of the present specification as long as the combined technical features do not contradict one another.
The above embodiments express only some preferred embodiments of the present application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and substitutions without departing from the technical principle of the present invention, and these modifications and substitutions also fall within the protection scope of the present application. Therefore, the protection scope of the present patent shall be subject to the protection scope of the claims.

Claims (10)

1. A method for optimizing a sepsis treatment strategy, the method comprising the steps of:
acquiring medical record data of a sepsis patient, and constructing corresponding health state labeling medical record data according to the medical record data; the health state labeling medical record data comprises sign data, inspection data, medication action information and health state information;
according to the health state information and the medication action information of the health state labeling medical record data, a health state transfer model of the sepsis patient is constructed, and a first treatment strategy of the sepsis patient is obtained through the health state transfer model;
obtaining a second treatment strategy of the sepsis patient from the health state labeling medical record data by a model-free deep reinforcement learning method;
and performing strategy fusion on the first treatment strategy and the second treatment strategy to obtain a fusion treatment strategy.
2. The sepsis treatment strategy optimization method of claim 1, further comprising:
adjusting the fusion treatment strategy by an off-strategy evaluation method to obtain an optimal treatment strategy.
3. The sepsis therapeutic strategy optimization method of claim 1, wherein said step of constructing corresponding health status annotation medical record data based on said medical record data comprises:
acquiring a sepsis lethal core index according to the medical record data; the medical record data comprises physical sign data, inspection data and medication action information;
combining the sepsis lethal core index with a preset sepsis diagnosis standard index to obtain a sepsis analysis key index;
performing cluster analysis on the key indexes of sepsis analysis to obtain health state classification of sepsis patients;
and carrying out health state information labeling on the medical record data according to the health state classification to obtain the health state labeled medical record data.
4. A sepsis treatment strategy optimization method according to claim 1, wherein the step of obtaining a first treatment strategy of the sepsis patient through the health state transfer model comprises:
constructing a medication action reward model according to the health state transfer model, the medication action information and the medication action reward value;
and obtaining a first treatment strategy of the sepsis patient by a strategy iteration algorithm according to the health state transfer model and the medication action reward model.
5. A sepsis treatment strategy optimization method according to claim 1, characterized in that said step of obtaining a second treatment strategy of the sepsis patient from the health state labeling medical record data by a model-free deep reinforcement learning method comprises:
inputting the sign data, the inspection data, the medication action information and the medication action reward value into a Dueling DQN network, and performing strategy prediction by minimizing the temporal-difference error to obtain the second treatment strategy.
6. A sepsis treatment strategy optimisation method as claimed in claim 1 wherein the step of strategy fusing the first and second treatment strategies to obtain a fused treatment strategy comprises:
pre-assigning corresponding first weight vectors and second weight vectors to the first treatment strategy and the second treatment strategy;
and according to the first weight vector and the second weight vector, carrying out weighted summation on the first treatment strategy and the second treatment strategy to obtain the fusion treatment strategy.
7. A sepsis treatment strategy optimization method according to claim 1, wherein the step of adjusting the fusion treatment strategy by an off-strategy evaluation method to obtain an optimal treatment strategy comprises:
constructing an off-strategy evaluation model according to the fusion treatment strategy, and taking the off-strategy evaluation model as a fusion strategy optimization objective function;
and optimizing the fusion strategy optimization objective function by a gradient ascent algorithm to obtain the optimal treatment strategy.
8. A sepsis treatment strategy optimization system, the system comprising:
a preprocessing module, configured to acquire medical record data of a sepsis patient and construct corresponding health state labeling medical record data from the medical record data; the health state labeling medical record data comprises sign data, inspection data, medication action information and health state information;
a first learning module, configured to construct a health state transfer model of the sepsis patient according to the health state information and the medication action information of the health state labeling medical record data, and obtain a first treatment strategy of the sepsis patient through the health state transfer model;
a second learning module, configured to obtain a second treatment strategy of the sepsis patient from the health state labeling medical record data by a model-free deep reinforcement learning method;
and a strategy fusion module, configured to perform strategy fusion on the first treatment strategy and the second treatment strategy to obtain a fusion treatment strategy.
9. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202111584274.6A 2021-12-22 2021-12-22 Sepsis treatment strategy optimization method, system, computer device and storage medium Pending CN114388095A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111584274.6A CN114388095A (en) 2021-12-22 2021-12-22 Sepsis treatment strategy optimization method, system, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111584274.6A CN114388095A (en) 2021-12-22 2021-12-22 Sepsis treatment strategy optimization method, system, computer device and storage medium

Publications (1)

Publication Number Publication Date
CN114388095A true CN114388095A (en) 2022-04-22

Family

ID=81197588

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111584274.6A Pending CN114388095A (en) 2021-12-22 2021-12-22 Sepsis treatment strategy optimization method, system, computer device and storage medium

Country Status (1)

Country Link
CN (1) CN114388095A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115050451A (en) * 2022-08-17 2022-09-13 合肥工业大学 Automatic generation system for clinical sepsis medication scheme
CN115116590A (en) * 2022-06-29 2022-09-27 中国医学科学院基础医学研究所 Deep reinforcement learning method and device, and pulmonary nodule patient follow-up procedure planning method, system, medium and equipment
CN117275661A (en) * 2023-11-23 2023-12-22 太原理工大学 Deep reinforcement learning-based lung cancer patient medication prediction method and device

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115116590A (en) * 2022-06-29 2022-09-27 中国医学科学院基础医学研究所 Deep reinforcement learning method and device, and pulmonary nodule patient follow-up procedure planning method, system, medium and equipment
CN115050451A (en) * 2022-08-17 2022-09-13 合肥工业大学 Automatic generation system for clinical sepsis medication scheme
CN117275661A (en) * 2023-11-23 2023-12-22 太原理工大学 Deep reinforcement learning-based lung cancer patient medication prediction method and device
CN117275661B (en) * 2023-11-23 2024-02-09 太原理工大学 Deep reinforcement learning-based lung cancer patient medication prediction method and device

Similar Documents

Publication Publication Date Title
Singh et al. DMENet: diabetic macular edema diagnosis using hierarchical ensemble of CNNs
CN114388095A (en) Sepsis treatment strategy optimization method, system, computer device and storage medium
Hsu et al. Effective multiple cancer disease diagnosis frameworks for improved healthcare using machine learning
CN104572583B (en) Method and system for data densification
US20220044809A1 (en) Systems and methods for using deep learning to generate acuity scores for critically ill or injured patients
Afsaneh et al. Recent applications of machine learning and deep learning models in the prediction, diagnosis, and management of diabetes: a comprehensive review
Ho et al. The dependence of machine learning on electronic medical record quality
WO2021073255A1 (en) Time series clustering-based medication reminder method and related device
Kadi et al. Systematic mapping study of data mining–based empirical studies in cardiology
Srivastava et al. A rule-based monitoring system for accurate prediction of diabetes: monitoring system for diabetes
Teo et al. Current trends in readmission prediction: an overview of approaches
Sun et al. Personalized vital signs control based on continuous action-space reinforcement learning with supervised experience
Raghu et al. Learning to predict with supporting evidence: Applications to clinical risk prediction
Cheng et al. Combining knowledge extension with convolution neural network for diabetes prediction
Baucum et al. Adapting reinforcement learning treatment policies using limited data to personalize critical care
CN116525116A (en) Real-time risk early warning and monitoring system, equipment and storable medium for cardiogenic shock
Geetha et al. Stacking Ensemble Learning-Based Convolutional Gated Recurrent Neural Network for Diabetes Miletus.
Rahmati et al. Developing prediction models for 30-day readmission after stroke among Medicare beneficiaries
CN116230224A (en) Method and system for predicting adverse events of heart failure based on time sequence model
CN116543853A (en) Drug interaction prediction method, computer device, and medium
Wang et al. Prediction of target range of intact parathyroid hormone in hemodialysis patients with artificial neural network
US20210216894A1 (en) Predicting Rates of Hypoglycemia by a Machine Learning System
Jibril et al. Feature Selection and Parameter Optimization of Support Vector Machine (Svm) and Logistic Regression (Lr) Algorithms Using Particle Swarm Optimization (Pso) In Prediction of Diabetes.
Medina et al. On the early detection of Sepsis in MIMIC-III
Qian et al. Temporal reflected logistic regression for probabilistic heart failure survival score prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination