KR102559883B1

KR102559883B1 - Dosage Strategy Generation Method Using Markov Decision Process

Info

Publication number: KR102559883B1
Application number: KR1020200112353A
Authority: KR
Inventors: 박철진; 이미림; 임광현
Original assignee: 한양대학교 산학협력단; 홍익대학교 산학협력단
Priority date: 2020-09-03
Filing date: 2020-09-03
Publication date: 2023-07-25
Also published as: KR20220030706A

Abstract

A method of generating a dosing strategy using a Markov decision process is disclosed. The disclosed method includes: generating a Markov decision process model including intermediate states, a survival state, and a death state using treatment history data of sample patients for a target disease; determining, using the Markov decision process model, a first medication policy in each of the intermediate states so that the total sum of rewards is maximized; and generating a second medication policy that satisfies a preset constraint using the Markov decision process model and a value function for the first medication policy, wherein the treatment history data includes at least one of gender information, age information, comorbidity information, vital sign information, test information, and drug administration information of the sample patients.

Description

Dosage Strategy Generation Method Using Markov Decision Process

The present invention relates to a data processing method for generating a medication policy, and more particularly to a method of generating a dosing strategy using a Markov decision process.

To increase a patient's chance of survival, a dosing strategy needs to account for both the timing of drug administration and the condition of the individual patient. When dosage is decided by subjective judgment that relies only on the decision maker's experience and knowledge, the result can be large uncertainty about survival as well as anxiety and distrust toward drug treatment, so a more objective means of supporting the decision maker is needed.

A Markov decision process (MDP) can serve as such an objective means. An MDP is a decision model based on a Markov process that derives a policy, that is, a rule for choosing an action in a given state, so as to maximize the total sum of rewards. When the number of decision time points is finite, the MDP defines a transient state, and the policy depends on the time point, so the optimal action may differ from one time point to another even for the same state. When the number of decision time points is infinite, a steady state is defined, and the policy is independent of the time point and always prescribes the same optimal action for the same state.

Related prior art includes Korean Patent Registration No. 10-1860258 (patent document) and "Komorowski, M. et al. The intensive care AI clinician learns optimal treatment strategies for sepsis, Nature Medicine volume 24, pages 1716-1720, 2018" (non-patent document).

An object of the present invention is to use patients' treatment history data to provide a time-specific dosing strategy that, over a specific treatment period, minimizes the patient's risk of death during treatment while maximizing the probability of survival. That is, the present invention aims to provide an optimal dosing strategy for each time point that keeps the patient's risk level controllable even in a Markov decision process in a transient state that has not reached a steady state.

According to one embodiment of the present invention for achieving the above object, there is provided a dosing strategy generation method using a Markov decision process, the method including: generating a Markov decision process model including intermediate states, a survival state, and a death state using treatment history data of sample patients for a target disease, the data including at least one of gender information, age information, comorbidity information, vital sign information, test information, and drug administration information; determining, using the Markov decision process model, a first medication policy in each of the intermediate states so that the total sum of rewards is maximized; and generating a second medication policy that satisfies a preset constraint using the Markov decision process model and a value function for the first medication policy.

According to another embodiment of the present invention for achieving the above object, there is provided a dosing strategy generation method using a Markov decision process, the method including: collecting treatment history data of sample patients for a target disease; generating a Markov decision process model including intermediate states, a survival state, and a death state using the treatment history data, which includes vital sign information and drug administration information; and determining, using the Markov decision process model, a medication policy in each of the transient states so that the total sum of rewards is maximized while a preset constraint is satisfied.

According to an embodiment of the present invention, the method can be applied even when the Markov decision process model is in a time-dependent transient state rather than a time-independent steady state. It can therefore take into account not only the patient's state but also the time of drug administration, and it can generate a dosing strategy that reduces the variation in survival rate caused by the choice of reward function.

FIG. 1 is a diagram for explaining a Markov decision process according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining a dosing strategy generation method using a Markov decision process according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining a method of generating a Markov decision process model according to an embodiment of the present invention.
FIG. 4 is a flowchart for explaining a dosing strategy generation method using a Markov decision process according to another embodiment of the present invention.

Because the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail. This is not intended to limit the present invention to specific embodiments; the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope. Like reference numerals are used for like elements throughout the description of the drawings.

Hereinafter, embodiments according to the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram for explaining a Markov decision process according to an embodiment of the present invention.

A dosing strategy generation method according to an embodiment of the present invention uses a Markov decision process model such as the one shown in FIG. 1.

In the Markov decision process model according to an embodiment of the present invention, when a medication action (A_t) is taken for a patient in the current state (S_t), a reward (r_t(s,a)) is given for that action in that state. The reward is determined by what is observed in the environment, where the observation may be the body's response to the medication action. For example, a positive reward may be given if the body's response improves, and a negative reward may be given if it worsens. After the medication action for the current state, the patient transitions to the next state (S_t+1), where a medication action is chosen again.

In the Markov decision process model this process is repeated, and the medication policy that maximizes the total sum of rewards is determined. Here, a policy is defined as the probability of performing each of the actions selectable in the current state. Several medication actions may be available in the current state, and the medication policy can be regarded as a function that maps the current state to the medication action to be performed. The total sum of rewards is the sum of rewards from the first administration time point to the last.

The dosing strategy generation method according to an embodiment of the present invention generates the Markov decision process model by extracting, from the treatment history data of sample patients who have the target disease, the data corresponding to the intended administration period. Because patient conditions and dosing strategies vary widely, the treatment history data are clustered by similarity so that the model can be generated effectively.

In addition, to provide an optimal dosing strategy that can raise the survival rate in an environment where clustering of the treatment history data makes the patient's state discrete, the method determines a medication policy that not only maximizes the total sum of rewards but also satisfies a preset constraint.

The dosing strategy generation method according to an embodiment of the present invention may be performed on a computing device that includes a processor and a memory, for example a desktop computer, a laptop, a server, or a mobile terminal.

FIG. 2 is a flowchart for explaining a dosing strategy generation method using a Markov decision process according to an embodiment of the present invention.

A computing device according to an embodiment of the present invention collects treatment history data of sample patients for a target disease and uses the collected data to generate a Markov decision process model including intermediate states, a survival state, and a death state (S210). Here, an intermediate state is a state in which the patient is under treatment; as a result of treatment, the patient may transition to one of the final (absorbing) states, namely the survival state or the death state.

The treatment history data include the vital sign information and drug administration information of the sample patients and, depending on the embodiment, may include at least one of gender information, age information, comorbidity information, vital sign information, test information, and drug administration information. The drug administration information includes the type of drug administered and the amount administered, and it corresponds to the medication action.

The computing device then uses the Markov decision process model generated in step S210 to determine a medication policy in each intermediate state that maximizes the total sum of rewards while satisfying a preset constraint (S220).

In one embodiment, the computing device may use the reward function to determine a medication policy under which the constraint can be satisfied. For example, if the constraint is intended to raise the target patient's survival rate by keeping the probability that the patient is in a particular intermediate state at or below a threshold, the computing device may determine a policy that satisfies the constraint by using a reward function that reduces the reward for transitions into that intermediate state.

Clustering the treatment history data, however, compresses the state representation of the Markov decision process model, and with a compressed state representation the variation in patient survival rate across reward functions can become large. In another embodiment, therefore, the computing device may use the constraint itself to generate a dosing strategy that can raise the patient's survival rate.

The computing device may first determine, from the Markov decision process model and without any constraint, a first medication policy in each intermediate state that maximizes the total sum of rewards, and then use the Markov decision process model and the value function for the first medication policy to generate a second medication policy that satisfies the preset constraint.

The computing device determines the first medication policy first, and then determines the second medication policy using the value function for the first policy, so that the second policy stays close to the first. The first medication policy may be the best policy derivable from the Markov decision process model, but given the variety of real-world conditions and environments it is not easy to adopt it as is when establishing a patient's dosing strategy. In an embodiment of the present invention, therefore, the second medication policy is determined using the value function for the first medication policy in order to obtain a policy similar to the first one in an environment that satisfies the constraint.

Each step is described in more detail below.

<Markov Decision Process Model>

FIG. 3 is a diagram for explaining a method of generating a Markov decision process model according to an embodiment of the present invention.

Referring to FIG. 3, in step S210 the computing device according to an embodiment of the present invention clusters the treatment history data to determine the intermediate states for the sample patients (S310). The computing device collects the portion of the treatment history data that falls within the target administration period. The treatment history data are time-series data that may be collected at a preset time interval and may be preprocessed. If the data contain missing values, the computing device may fill them in during preprocessing using, for example, the LOCF (Last Observation Carried Forward) algorithm.
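The LOCF step mentioned above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `locf`, the use of `None` to mark a missing value, and the heart-rate readings are all assumptions made for the example.

```python
from typing import List, Optional


def locf(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill missing values (None) with the most recent observed value.

    Leading missing values have nothing to carry forward, so they stay None.
    """
    filled: List[Optional[float]] = []
    last_seen: Optional[float] = None
    for value in series:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical heart-rate readings sampled at a fixed time interval.
heart_rate = [None, 82.0, None, None, 90.0, None]
print(locf(heart_rate))
```

Values that precede the first observation are left missing here; how to treat them (drop the rows, back-fill, or use a population default) is a separate design choice that the source does not specify.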

As described above, the treatment history data may include gender information, age information, comorbidity information, vital sign information, test information, drug administration information, and the like. These data may be clustered by similarity, and in one embodiment they are clustered with the k-means algorithm. Each cluster corresponds to one intermediate state of the patient, and a distinct index may be assigned to each intermediate state.

For example, if the treatment history data are clustered into 750 clusters, 750 intermediate states are determined, and indices 1 through 750 may be assigned to them. If the treatment history data include vital sign information, the range of vital sign values contained in the cluster that corresponds to an intermediate state can be said to describe the patient's condition in that intermediate state.

The computing device then generates, for each medication action, the transition probabilities from the current intermediate state to the next intermediate state (S320).

From the treatment history data, the sample patients who moved from a given intermediate state to a given next state can be identified, along with the medication action taken in the current intermediate state. For each medication action, the computing device builds a matrix of the frequencies of transitions from the current state to the next state and divides each row of the matrix by that row's sum to obtain the transition probabilities for that action.

For example, if, as in the embodiment above, 750 intermediate states are determined together with the final states representing survival and death, and the treatment history data contain 10 medication actions, ten 752 x 752 matrices are generated.

Unlike the intermediate states, the death state and the survival state are each assigned a transition probability of 1 to themselves, so that they always transition only to themselves.
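The count-and-normalize procedure and the self-loop rule for the final states can be sketched as follows. The function name, the `(state, action, next_state)` triple format, and the toy state indices are assumptions for illustration. Leaving a row at all zeros when an action was never observed in a state is also an assumption of this sketch; the source only describes dividing each row by its sum.

```python
from typing import Dict, List, Tuple


def transition_matrices(
    transitions: List[Tuple[int, int, int]],
    n_states: int,
    n_actions: int,
    absorbing: List[int],
) -> Dict[int, List[List[float]]]:
    """Estimate one row-normalized transition matrix per action.

    `transitions` holds (state, action, next_state) triples observed in the data.
    """
    counts = {
        a: [[0.0] * n_states for _ in range(n_states)] for a in range(n_actions)
    }
    for s, a, s_next in transitions:
        counts[a][s][s_next] += 1.0

    for a in range(n_actions):
        for s in range(n_states):
            if s in absorbing:
                # Final states always transition to themselves with probability 1.
                counts[a][s] = [1.0 if j == s else 0.0 for j in range(n_states)]
                continue
            row_sum = sum(counts[a][s])
            if row_sum > 0:
                counts[a][s] = [c / row_sum for c in counts[a][s]]
            # Rows with no observations stay all-zero: the action was never
            # taken in that state, so no probability can be estimated.
    return counts


# Toy model: states 0 and 1 are intermediate, 2 is survival, 3 is death.
observed = [(0, 0, 1), (0, 0, 1), (0, 0, 2), (0, 0, 3), (1, 1, 2)]
P = transition_matrices(observed, n_states=4, n_actions=2, absorbing=[2, 3])
print(P[0][0])  # transition probabilities out of state 0 under action 0
```

With 750 intermediate states, 2 final states, and 10 actions, the same routine would yield the ten 752 x 752 matrices described in the text.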

The computing device also defines the reward function in step S320. The reward function may be defined so that a positive reward is given for a transition from an intermediate state to the survival state and a negative reward is given for a transition from an intermediate state to the death state. For example, the positive reward may be 100 and the negative reward may be -100, so that a patient who finally moves from the current intermediate state to the survival state receives a reward of 100 from the reward function.

<Determining the First Medication Policy>

In step S220, the computing device according to an embodiment of the present invention uses the Markov decision process model generated in step S210 to determine a first medication policy in each intermediate state so that the total sum of rewards is maximized.

The computing device may determine the first medication policy using any of various algorithms for finding the optimal policy of a Markov decision process model. In one embodiment it uses backward induction, a dynamic programming method.

Using a value function that returns the expected reward for a given medication policy, together with Bellman optimality, the computing device determines the first medication policy that maximizes the total sum of rewards. In other words, the expected total reward given by the value function for the first medication policy is maximized.

In doing so, the computing device determines the first medication policy in each intermediate state using, among the medication actions obtained from the treatment history data, only those actually performed in that intermediate state. That is, a given medication policy considers only the medication actions performed in the current intermediate state rather than every action found in the treatment history data, and it can be regarded as containing the probability of performing each of those actions. Accordingly, the first medication policy for each intermediate state likewise contains the probabilities of performing the medication actions that were performed in that state.
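Backward induction over a finite horizon, restricted to actions that were observed in each state, can be sketched as follows. This is an illustrative sketch under stated assumptions: the reward is indexed as `reward[s][s_next]`, the terminal value is zero, an all-zero transition row marks an action never observed in that state, and the three-state toy model with its probabilities is invented for the example.

```python
from typing import Dict, List, Tuple

Matrix = List[List[float]]


def backward_induction(
    P: Dict[int, Matrix],   # P[a][s][s_next]: transition probabilities per action
    reward: Matrix,         # reward[s][s_next]: reward for moving from s to s_next
    horizon: int,
    gamma: float = 1.0,
) -> Tuple[List[List[float]], List[List[int]]]:
    """Return (values, policy); values[t][s] is the optimal value at time t."""
    n = len(reward)
    values = [[0.0] * n for _ in range(horizon + 1)]  # terminal values are 0
    policy = [[-1] * n for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in range(n):
            best_a, best_q = -1, float("-inf")
            for a, mat in P.items():
                row = mat[s]
                if sum(row) == 0.0:
                    continue  # action never observed in this state
                q = sum(
                    p * (reward[s][j] + gamma * values[t + 1][j])
                    for j, p in enumerate(row)
                )
                if q > best_q:
                    best_a, best_q = a, q
            if best_a >= 0:
                values[t][s] = best_q
                policy[t][s] = best_a
    return values, policy


# Toy model: state 0 is intermediate, 1 is survival, 2 is death.
P = {
    0: [[0.8, 0.15, 0.05], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],  # cautious action
    1: [[0.0, 0.6, 0.4], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],    # aggressive action
}
reward = [[0.0, 100.0, -100.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
values, policy = backward_induction(P, reward, horizon=2)
print(policy[0][0], policy[1][0])
```

In this toy model the chosen action for the same intermediate state differs between the two time points, which is the time dependence of finite-horizon policies that the text describes.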

<Generating the Second Medication Policy>

In step S220, the computing device according to an embodiment of the present invention uses the Markov decision process model generated in step S210 and the value function for the first medication policy to generate a second medication policy that maximizes the total sum of rewards while satisfying a preset constraint.

Here, in one embodiment, the constraint may be that the probability of the target patient being in a risk state among the intermediate states is at or below a first threshold, where a risk state is an intermediate state whose mortality rate is at or above a second threshold. For example, the first threshold may be 0.04 and the second threshold may be 90%. The mortality rate of an intermediate state can be calculated in various ways; in one embodiment it is the proportion of sample patients who died of the target disease among the sample patients who were in that intermediate state. For example, if 50 of 100 sample patients were in a first intermediate state and 10 of those 50 died, the mortality rate of that state is 20%.
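The mortality-rate calculation and the selection of risk states can be sketched as follows. The function names, the set-of-patient-IDs representation, and the second example state are assumptions for illustration; the first state reproduces the 10-of-50 example from the text.

```python
from typing import Dict, List, Set


def mortality_rate(patients: Set[str], deceased: Set[str]) -> float:
    """Share of the patients who visited a state and later died."""
    return len(patients & deceased) / len(patients)


def risk_states(
    visitors: Dict[int, Set[str]],
    deceased: Set[str],
    mortality_threshold: float,
) -> List[int]:
    """Return the states whose observed mortality rate meets the threshold."""
    return [
        state
        for state, patients in sorted(visitors.items())
        if patients and mortality_rate(patients, deceased) >= mortality_threshold
    ]


# State 1: 50 patients visited, 10 of them died (the example in the text).
# State 2: hypothetical state with 10 visitors, 9 of whom died.
visitors = {
    1: {f"p{i}" for i in range(50)},
    2: {f"q{i}" for i in range(10)},
}
deceased = {f"p{i}" for i in range(10)} | {f"q{i}" for i in range(9)}

print(mortality_rate(visitors[1], deceased))
print(risk_states(visitors, deceased, mortality_threshold=0.9))
```

With the second threshold set to 90%, only the hypothetical second state qualifies as a risk state; the first state, at 20% mortality, does not.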

As described above, the computing device could obtain a similar effect by determining the medication policy with a reward function that assigns a negative value to transitions into a risk state, but in that case it can be difficult to keep the probability precisely at or below the threshold. Moreover, when negative rewards are assigned by subjective judgment, the risk during treatment may be reduced, but the goal of final survival can become blurred.

An embodiment of the present invention therefore generates the second medication policy using the constraint rather than the reward function.

The second medication policy obtained under this constraint contains, for each intermediate state, the probabilities of performing the medication actions such that the target patient is unlikely to be in a risk state while the total sum of rewards is still maximized. A policy that contains such probabilities offers a decision maker, such as a member of the medical staff, several options ranked by probability when choosing an appropriate medication action.

Accordingly, the computing device according to an embodiment of the present invention determines, as the second medication policy, a policy that contains a single medication action for each intermediate state, that is, an optimal decision rule. The computing device can thus provide, as the second medication policy, an optimal decision rule that takes the patient's risk into account, and such a policy gives the decision maker not just the single viewpoint of survival but multiple viewpoints that also consider the patient's risk level.

To this end, the computing device according to an embodiment of the present invention may use linear programming to provide a second medication policy that is deterministic while satisfying the constraint, and in one embodiment it determines the second medication policy using [Equation 1]. The computing device may derive the second medication policy, which is the solution of [Equation 1], using a linear programming algorithm based on the simplex method or an interior point method.

Here, the decision variable P_t represents the medication policy for each of the intermediate and final states at time t. In other words, it gives the probability of performing each medication action when a particular state is given at time t; for example, with 752 states in total and 10 medication actions it can be expressed as a 752 x 10 matrix. In expression (1), the state probability term is the probability that the target patient is in each of the intermediate and final states; with 750 intermediate states and 2 final states it can be expressed as a 752 x 1 vector. Following the Markov chain, this term is updated at each time point t between 0 and a preset natural number N. The value function term is the value function for the first medication policy. The expected reward term is the expected reward when the medication policy P_t is given at time t, and the weighting term is a user-specified constant between 0 and 1. The transition term is the transition probability obtained when the medication policy P_t is given. In summary, expression (1) can be interpreted as the average of the immediate reward obtainable at time t plus the total reward obtainable at later time points. Expressions (2) to (6) are the constraints under which expression (1) is maximized.

Expression (2) computes the expected reward once the medication policy P_t is fixed. R_t represents the reward received when action a is taken in state s; with 752 states and 10 medication actions, it can be written as a 752 × 10 matrix. The product operator in expression (2) is the Hadamard product, that is, the element-wise product of two matrices, and the remaining factor is a vector whose entries are all 1.
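As a small illustration of expression (2), the element-wise product of the policy matrix and the reward matrix, multiplied by an all-ones vector, reduces to a row sum. The function name and the toy numbers below are hypothetical; only the operation itself follows the description.

```python
def expected_reward(P, R):
    """Per-state expected reward: row sums of the element-wise product P * R.

    P[s][a]: probability of taking action a in state s
    R[s][a]: reward for taking action a in state s
    """
    return [
        sum(p * q for p, q in zip(p_row, r_row))
        for p_row, r_row in zip(P, R)
    ]


P = [[1.0, 0.0], [0.25, 0.75]]   # 2 states, 2 medication actions
R = [[2.0, 5.0], [4.0, 0.0]]
print(expected_reward(P, R))  # [2.0, 1.0]
```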

Expression (3) computes the transition probability once the medication policy P_t is fixed. It uses the k-th standard basis vector, whose k-th component is 1 and whose other components are 0, and G_k, the transition probability of the Markov decision process for action k.
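One way to read expression (3) is that the basis vector picks out the k-th column of P_t (the probability of choosing action k in each state), and that column weights the rows of G_k. The sketch below follows that reading; the exact form of the equation is not reproduced in this text, so this mixing rule is an assumption and the names are hypothetical.

```python
def policy_transition(P, G):
    """Mix per-action transition matrices by the policy's action probabilities.

    P[s][k]:     probability of taking action k in state s
    G[k][s][s2]: probability of moving from s to s2 under action k
    Returns T with T[s][s2] = sum over k of P[s][k] * G[k][s][s2].
    """
    n_states = len(P)
    T = [[0.0] * n_states for _ in range(n_states)]
    for k, G_k in enumerate(G):
        for s in range(n_states):
            weight = P[s][k]  # k-th column of P, i.e. P times the k-th basis vector
            for s2 in range(n_states):
                T[s][s2] += weight * G_k[s][s2]
    return T


G = [
    [[1.0, 0.0], [0.5, 0.5]],  # action 0
    [[0.0, 1.0], [0.0, 1.0]],  # action 1
]
P = [[0.5, 0.5], [1.0, 0.0]]
print(policy_transition(P, G))  # [[0.5, 0.5], [0.5, 0.5]]
```

Each row of the result sums to 1 whenever each row of `P` and each row of every `G[k]` does.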

Expression (4) is the constraint that controls the probability of being in a risk state. It uses an auxiliary matrix that selects the risk states, and d denotes the first threshold value.
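The risk constraint can be sketched as follows. The text does not say whether expression (4) is applied to the current or the next state distribution, so this sketch assumes the next one; the 0/1 selector row stands in for the auxiliary matrix, and all names and numbers are hypothetical.

```python
def next_distribution(x, T):
    """Propagate the state distribution one step along the Markov chain."""
    n = len(x)
    return [sum(x[s] * T[s][s2] for s in range(n)) for s2 in range(n)]


def risk_probability(x, risk_selector):
    """Total probability mass on the states flagged with 1 in risk_selector."""
    return sum(m * p for m, p in zip(risk_selector, x))


x = [1.0, 0.0, 0.0]                     # patient starts in state 0
T = [[0.5, 0.25, 0.25],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
risk_selector = [0, 1, 0]               # state 1 is the only risk state
d = 0.3                                 # first threshold value

x_next = next_distribution(x, T)
risk = risk_probability(x_next, risk_selector)
print(x_next, risk, risk <= d)  # [0.5, 0.25, 0.25] 0.25 True
```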

Finally, expressions (5) and (6) express the probabilistic definition of the decision policy.
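A plausible reading of expressions (5) and (6), which are not reproduced in this text, is that every entry of the policy matrix is non-negative and every row sums to 1. The check below encodes that assumed reading; the function name and tolerance are hypothetical.

```python
import math


def is_valid_policy(P, tol=1e-9):
    """True if every row of P is a probability distribution over actions."""
    return all(
        all(p >= -tol for p in row) and math.isclose(sum(row), 1.0, abs_tol=tol)
        for row in P
    )


print(is_valid_policy([[0.5, 0.5], [1.0, 0.0]]))  # True
print(is_valid_policy([[0.7, 0.7]]))              # False: row sums to 1.4
print(is_valid_policy([[1.5, -0.5]]))             # False: negative entry
```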

Once the second medication policy has been derived according to an embodiment of the present invention, the decision maker determines the intermediate state that corresponds to the target patient's current condition, looks up the medication action for that intermediate state in the second medication policy, and applies the appropriate medication action to the target patient.

FIG. 4 is a flowchart illustrating a method of generating a dosing strategy using a Markov decision process according to another embodiment of the present invention.

Referring to FIG. 4, the computing device according to an embodiment of the present invention collects and preprocesses treatment history data of sample patients (S410), and then generates a Markov decision process model (S420). Using the model generated in step S420, it determines a first medication policy that maximizes the total sum of rewards (S430), and generates a second medication policy that maximizes the total sum of rewards while satisfying the preset constraint condition (S440).

At this point, the computing device determines whether the second medication policy satisfies the preset constraint condition (S450). If it does, the second medication policy contains a medication action for every intermediate state in the Markov decision process model; if it does not, the medication action for some intermediate states remains undetermined.

If no second medication policy satisfying the preset constraint condition has been determined, that is, if no second medication policy satisfying the constraint condition has been generated for some intermediate states, the computing device modifies the constraint condition (S460). The computing device may modify the constraint condition by adjusting the first threshold value or the second threshold value. It then repeats step S440 using the adjusted first or second threshold value.

If the second medication policy satisfies the preset constraint condition, that is, if the second medication policy has been generated for all intermediate states, the computing device outputs the second medication policy as the final medication policy (S470).
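The loop of steps S440 to S470 can be sketched on a toy model. This is an illustration under several stated assumptions, not the method of the embodiment: it replaces the linear programming solver with brute-force enumeration of deterministic policies, it treats an unsatisfied constraint as "no policy found" rather than a partially determined policy, and it relaxes only the first threshold `d` by a fixed step. All function names, parameters, and numbers are hypothetical.

```python
from itertools import product


def evaluate(policy, x, R, G, v, gamma, risk_states):
    """Return (objective, risk probability) for one action index per state."""
    n = len(x)
    obj, risk = 0.0, 0.0
    for s in range(n):
        a = policy[s]
        future = sum(G[a][s][s2] * v[s2] for s2 in range(n))
        obj += x[s] * (R[s][a] + gamma * future)
        # Probability of landing in a risk state at the next step.
        risk += x[s] * sum(G[a][s][s2] for s2 in risk_states)
    return obj, risk


def best_feasible_policy(x, R, G, v, gamma, risk_states, d):
    """S440 and S450: best deterministic policy with risk <= d, else None."""
    best = None
    for policy in product(range(len(G)), repeat=len(x)):
        obj, risk = evaluate(policy, x, R, G, v, gamma, risk_states)
        if risk <= d + 1e-12 and (best is None or obj > best[0]):
            best = (obj, policy)
    return None if best is None else best[1]


def generate_final_policy(x, R, G, v, gamma, risk_states, d,
                          step=0.05, max_rounds=20):
    for _ in range(max_rounds):
        policy = best_feasible_policy(x, R, G, v, gamma, risk_states, d)
        if policy is not None:
            return policy, d          # S470: output the final policy
        d += step                     # S460: relax the first threshold
    raise RuntimeError("no feasible policy within max_rounds")


# Toy model: state 0 is intermediate, state 1 is a risk state, state 2 is survival.
absorbing_rows = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
G = [
    [[0.0, 0.4, 0.6]] + absorbing_rows,   # action 0: higher reward, higher risk
    [[0.7, 0.2, 0.1]] + absorbing_rows,   # action 1: lower reward, lower risk
]
R = [[0.0, 0.0] for _ in range(3)]
v = [0.0, -1.0, 1.0]
x = [1.0, 0.0, 0.0]

policy, d_used = generate_final_policy(x, R, G, v, 0.9, [1], d=0.1)
print(policy[0], round(d_used, 2))  # 1 0.2
```

In this toy run, neither action meets the initial threshold of 0.1, so the threshold is relaxed twice until the lower-risk action becomes feasible. Adjusting the second threshold would instead change which states count as risk states.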

The technical content described above may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. A hardware device may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

As described above, the present invention has been explained with reference to specific details such as particular components, and to limited embodiments and drawings. These are provided only to aid a general understanding of the invention; the invention is not limited to the above embodiments, and a person of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description. Accordingly, the spirit of the present invention should not be confined to the described embodiments, and the claims below, together with everything equal or equivalent to those claims, fall within the scope of the spirit of the invention.

Claims

A method for generating a dosing strategy using a Markov decision process, performed by a computing device, the method comprising:
generating, using treatment history data of sample patients for a target disease, the data including at least one of gender information, age information, comorbidity information, vital sign information, examination information, and drug administration information, a Markov decision process model including intermediate states, a survival state, and a death state;
determining, using the Markov decision process model, a first medication policy in each of the intermediate states such that the total sum of rewards is maximized; and
generating a second medication policy that satisfies a preset constraint condition, using the Markov decision process model and a value function for the first medication policy,
wherein the step of generating the Markov decision process model comprises:
clustering the treatment history data according to similarity to determine the intermediate states for the sample patients; and
generating, for each medication action, a transition probability from a current intermediate state to a next intermediate state, and defining a reward function,
wherein the constraint condition is that the probability of a target patient being in a risk state among the intermediate states is less than or equal to a first threshold value, and
wherein the risk state is an intermediate state whose mortality rate is greater than or equal to a second threshold value.

The method of claim 1,
wherein the reward function
gives a positive reward on a transition from an intermediate state to the survival state and a negative reward on a transition from an intermediate state to the death state.

The method of claim 1,
wherein the step of determining the first medication policy comprises
determining the first medication policy in each of the intermediate states using, among the medication actions obtained from the treatment history data, the medication actions that were performed in that intermediate state.

delete

The method of claim 1,
wherein the step of generating the second medication policy comprises
generating, for each of the intermediate states, a medication policy that includes an optimal medication action taking the risk of the target patient into account.

The method of claim 5,
wherein the step of generating the second medication policy comprises
adjusting the first threshold value or the second threshold value when a second medication policy satisfying the constraint condition is not generated for some of the intermediate states.

The method of claim 1,
wherein the mortality rate
is determined according to the proportion of patients who died of the target disease among the sample patients in the intermediate state.

A method for generating a dosing strategy using a Markov decision process, performed by a computing device, the method comprising:
collecting treatment history data of sample patients for a target disease;
generating, using the treatment history data including vital sign information and drug administration information, a Markov decision process model including intermediate states, a survival state, and a death state; and
determining, using the Markov decision process model, a medication policy in each of the intermediate states such that the total sum of rewards is maximized while a preset constraint condition is satisfied,
wherein the step of generating the Markov decision process model comprises:
clustering the treatment history data according to similarity to determine the intermediate states for the sample patients; and
generating, for each medication action, a transition probability from a current intermediate state to a next intermediate state, and defining a reward function,
wherein the constraint condition is that the probability of a target patient being in a risk state among the intermediate states is less than or equal to a first threshold value, and
wherein the risk state is an intermediate state whose mortality rate is greater than or equal to a second threshold value.

delete

The method of claim 8,
wherein the step of determining the medication policy comprises
determining, for each of the intermediate states, an optimal medication policy that takes the risk of the target patient into account.