KR20220102737A

KR20220102737A - Method And Computer Program for Predicting Recurrence Risk of Acute Coronary Syndrome

Info

Publication number: KR20220102737A
Application number: KR1020210005028A
Authority: KR
Inventors: 나형철; 아카미사이 오이 솜; 위옹 소완리잊 꽁; 조완섭; 강길원; 방희제
Original assignee: 충북대학교 산학협력단
Priority date: 2021-01-14
Filing date: 2021-01-14
Publication date: 2022-07-21

Abstract

The present disclosure provides a method and a computer program for predicting the recurrence risk of acute coronary syndrome. The method comprises the steps of: collecting sample data on a patient group in which acute coronary syndrome has recurred and on a control group; performing logistic regression analysis to predict the recurrence risk of acute coronary syndrome based on the sample data; and generating, based on the results of the logistic regression analysis, a machine learning-based logistic regression model that predicts a patient's probability of recurrence, and training the model on the sample data. According to the present disclosure, rehabilitation can be adapted to the patient's degree of recurrence risk.

Description

Method And Computer Program for Predicting Recurrence Risk of Acute Coronary Syndrome

The present invention relates to a method and a computer program for predicting the recurrence risk of acute coronary syndrome.

The content described in this section merely provides background information on the present embodiment and does not constitute prior art.

Cardiac rehabilitation is holistic treatment and follow-up care for patients with heart disease. It aims to restore the cardiopulmonary and motor function weakened by heart disease and to prevent recurrence by managing the risk factors for heart disease. Studies have shown that cardiac rehabilitation restores the patient's exercise capacity, reduces recurrence of heart disease, readmission, and the need for repeat procedures, and lowers both cardiac and all-cause mortality. It also reduces the recurrence rate of cardiovascular disease by 15% to 20% and cardiovascular-related morbidity and mortality by 25% to 40%.

However, participation in cardiac rehabilitation is low for several reasons: a lack of awareness of the need for cardiac rehabilitation, poor access to hospitals that offer it, limited insurance coverage, a lack of time to participate, the need for a caregiver to look after the patient, and the difficulty of traveling to the hospital because of physical discomfort. For example, the cardiac rehabilitation participation rate in Korea remains at about 5% of all heart disease patients, and this phenomenon is not limited to patients with heart disease.

To make patients aware of the need for rehabilitation and to improve their perception of it, it is necessary to predict the recurrence risk of a disease, and furthermore the probability of recurrence, and to use the prediction in establishing the patient's rehabilitation plan. Patients and medical staff who see the recurrence risk as a concrete number can work more actively to prevent recurrence and respond in proportion to the recurrence probability.

Accordingly, analyzing whether a disease occurs by performing logistic regression analysis on various disease-related factors has been considered.

Logistic regression analysis is a statistical technique that estimates the causal relationship between a dependent variable and independent variables using a logistic function. Unlike linear regression analysis, logistic regression analysis does not analyze the value of a factor itself; it analyzes the relationship between the dependent and independent variables based on the probability of the value. Logistic regression analysis therefore has the effect of preventing distortion caused by outliers. However, simple logistic regression analysis alone can predict only the odds ratio of each factor or whether the disease will recur, and cannot provide a patient who needs rehabilitation with a specific numerical value for the probability of recurrence.

According to an aspect of the present disclosure, a main object is to provide a method for predicting the recurrence risk of a disease, and a computer program therefor, in which sample data on a patient group in which the disease has recurred and on a control group is collected, logistic regression analysis predicting the recurrence risk of the disease is performed based on the sample data, and a machine learning-based logistic regression model that predicts the recurrence risk probability of the disease is generated based on the result of the logistic regression analysis and trained on the sample data.

According to an aspect of the present disclosure, there is provided a method for predicting the recurrence risk of acute coronary syndrome, the method comprising: collecting, with respect to preset factors, sample data consisting of data on the factors for a patient group in which acute coronary syndrome has recurred and data on the factors for a control group; performing logistic regression analysis to predict the recurrence risk of acute coronary syndrome based on the sample data; and generating, based on the result of the logistic regression analysis, a machine learning-based logistic regression model that predicts the recurrence risk probability of acute coronary syndrome, and training the logistic regression model on the sample data.

According to another aspect of the present disclosure, there is provided the method for predicting the recurrence risk of acute coronary syndrome, wherein the control group consists of a group of people in whom acute coronary syndrome has occurred once.

According to another aspect of the present disclosure, there is provided the above method for predicting the recurrence risk of acute coronary syndrome, wherein the logistic regression model is validated by determining whether its AUC (Area Under Curve) value is greater than or equal to a preset validation level.

According to yet another aspect of the present disclosure, there is provided the method for predicting the recurrence risk of acute coronary syndrome, further comprising predicting the recurrence risk probability of the disease from a patient's individual data on the factors using the trained logistic regression model.

According to an aspect of the present disclosure, sample data on a patient group in which a disease has recurred and on a control group is collected, logistic regression analysis predicting the recurrence risk of the disease is performed based on the sample data, and a machine learning-based logistic regression model that predicts the recurrence risk probability of the disease is generated based on the result of the logistic regression analysis and trained on the sample data, which has the effect of providing the recurrence risk probability of the disease accurately.

According to another aspect of the present disclosure, the recurrence risk probability of a disease is predicted from individual patient data using the trained logistic regression model, and the predicted probability is reflected in the patient's exercise prescription and/or disease management, so that rehabilitation is adapted to the patient's degree of recurrence risk.

Therefore, applying the method for predicting the recurrence risk of a disease according to the various aspects of the present disclosure helps patients better understand their recurrence risk and improves their adherence to exercise prescriptions.

FIG. 1 is a flowchart illustrating a method for predicting the recurrence risk of a disease according to an embodiment of the present disclosure.
FIG. 2 shows the statistical validation result for the predicted recurrence risk probability of acute coronary syndrome according to an embodiment of the present disclosure.

Hereinafter, some embodiments of the present disclosure are described in detail with reference to exemplary drawings. In assigning reference numerals to the components of each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present disclosure, a detailed description of a related known configuration or function is omitted when it is judged that it could obscure the gist of the present disclosure.

In describing the components of the present disclosure, terms such as first and second may be used. These terms serve only to distinguish one component from another, and they do not limit the nature, sequence, or order of the components. Throughout the specification, when a part is said to "include" or "comprise" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. Terms such as "unit" and "module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

The detailed description set forth below in conjunction with the accompanying drawings is intended to describe exemplary embodiments of the present disclosure and is not intended to represent the only embodiments in which the present disclosure may be practiced.

The present disclosure describes the method for predicting the recurrence risk of a disease using acute coronary syndrome as an example, but the method is not necessarily limited to this use. To demonstrate the performance of the logistic regression model according to the present disclosure, the recurrence risk was predicted from data on a patient group in which acute coronary syndrome recurred and on its control group; however, the method can be applied to any field in which prediction of disease recurrence risk is required.

FIG. 1 is a flowchart illustrating a method for predicting the recurrence risk of a disease according to an embodiment of the present disclosure.

Sample data is generated or collected from a patient group in which the disease has recurred and from a control group (S100). The sample data is data on each group and consists of the values or categories of factors that can be used as independent variables in logistic regression analysis. As shown in Table 1, the factors may include, for example, age, gender, procedure code, prescription medicine code, diagnosis code, and treated disease code, and may further include various other variables that can be used to predict the recurrence risk of a disease.

To predict the recurrence risk more accurately, the control group from which data is collected preferably consists of patients in whom the disease has occurred once.

Table 1

| Variable | Description |
|---|---|
| Age | Age |
| Gender | Gender |
| PRO_CODE | Procedure code (e.g., percutaneous coronary intervention) |
| MED_CODE | Prescription drug code (e.g., clopidogrel) |
| CON_CODE | Diagnosis code (e.g., acute myocardial infarction) |
| PRO_REASONCODE | Code of the disease for which the procedure was performed (e.g., angina pectoris) |

The sample data is preprocessed (S102). Preprocessing may include, for example, data scaling, derivation of specific variable values from the data, unification of data formats or types, categorization of data, merging of data, handling of missing values, conversion of data into a form that the analysis tool can process (e.g., converting strings to numbers), and data shuffling.
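The preprocessing operations listed for step S102 can be sketched as follows. This is a minimal illustration using only the Python standard library, not the implementation used in the disclosure: the record layout, the `recurred` label field, the mean imputation, the min-max scaling, and the one-hot encoding are all assumptions chosen for the example, and the sample values are invented.

```python
import random

# Hypothetical raw records; field names follow Table 1, values are invented.
RAW = [
    {"Age": "63", "Gender": "M", "PRO_CODE": "PCI", "recurred": 1},
    {"Age": None, "Gender": "F", "PRO_CODE": "CABG", "recurred": 0},
    {"Age": "71", "Gender": "F", "PRO_CODE": "PCI", "recurred": 0},
]

def preprocess(records, seed=0):
    """Return (feature_names, rows, labels) in a purely numeric form."""
    # String-to-number conversion, with missing ages filled by the mean (assumed policy).
    known = [float(r["Age"]) for r in records if r["Age"] is not None]
    mean_age = sum(known) / len(known)
    ages = [float(r["Age"]) if r["Age"] is not None else mean_age for r in records]

    # Min-max scaling of age to [0, 1] (one possible scaling choice).
    lo, hi = min(ages), max(ages)
    span = (hi - lo) or 1.0

    # Categorization: one-hot encode the procedure code.
    codes = sorted({r["PRO_CODE"] for r in records})
    names = ["age_scaled", "gender_male"] + [f"PRO_CODE={c}" for c in codes]

    rows = []
    for r, age in zip(records, ages):
        row = [(age - lo) / span, 1.0 if r["Gender"] == "M" else 0.0]
        row += [1.0 if r["PRO_CODE"] == c else 0.0 for c in codes]
        rows.append(row)
    labels = [r["recurred"] for r in records]

    # Shuffling with a fixed seed so the result is reproducible.
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    return names, [rows[i] for i in order], [labels[i] for i in order]
```

Calling `preprocess(RAW)` returns rows that a numeric model can consume directly; real claims data would need a similar encoding for every coded factor in Table 1.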

Logistic regression analysis predicting whether the disease will recur is performed using the sample data (S104). The analysis may be performed on all or part of the sample data, and is preferably performed on the training data that will be used to train the logistic regression model in step S108, because the logistic regression model is generated from the result of the logistic regression analysis.

The logistic regression analysis may be performed on all factors of the sample data or on only some of them, and the set of factors analyzed may vary.

As a result of the logistic regression analysis in step S104, a regression coefficient, an odds ratio, a confidence interval, and the like may be calculated for each factor. Here, the odds are the probability that an event occurs relative to the probability that it does not occur, and the odds ratio of a factor is the ratio between the odds after and before a change in the value of that factor. The relationship between a factor and the predicted value can be analyzed based on the odds ratio or on the log-odds ratio obtained by taking its logarithm.
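The link between a regression coefficient, its odds ratio, and a confidence interval can be illustrated with a short sketch. It relies on the standard identity that the odds ratio for a one-unit change in a factor equals the exponential of its coefficient. The Wald-type 95% interval (z = 1.96) and the function names are assumptions for illustration; the disclosure does not state how its intervals were computed.

```python
import math

def odds_ratio_with_ci(coef, std_err, z=1.96):
    """Return (odds_ratio, lower, upper) for one logistic regression coefficient.

    coef     regression coefficient (log-odds change per unit of the factor)
    std_err  standard error of the coefficient
    z        normal quantile for the interval (1.96 gives about 95%; an assumption)
    """
    return (
        math.exp(coef),
        math.exp(coef - z * std_err),
        math.exp(coef + z * std_err),
    )

def ci_includes_one(lower, upper):
    """True when the interval contains 1, i.e. no clear effect of the factor."""
    return lower <= 1.0 <= upper
```

An interval that contains 1 means the factor's effect on the odds cannot be distinguished from no effect at the chosen confidence level.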

A logistic regression model is generated using all or some of the regression coefficients corresponding to the factors of the sample data (S106). The regression coefficients may be the estimates produced by the regression analysis in step S104. Each factor used as an independent variable of the logistic regression model may be selected in consideration of the ratio between the odds before and after a change in the value of that factor.

In another embodiment, when a confidence interval includes 1 or an odds ratio is less than or equal to a preset reference value, the values of the regression coefficients or the constant calculated by the logistic regression analysis may be adjusted, or the sample data used for the analysis may be resampled and the logistic regression analysis performed again, thereby improving the performance of predicting the recurrence risk of the disease.

The logistic regression model of the present disclosure is a machine learning-based model. The machine learning methodology used for the logistic regression model may be, but is not limited to, a decision tree, a random forest, K-NN (K-Nearest Neighbor), or an SVM (Support Vector Machine). Any machine learning methodology that a person of ordinary skill in the art could readily adopt to implement a logistic regression model predicting the recurrence risk of a disease may be used.

Training data and/or test data are extracted from the sample data, and the logistic regression model that predicts the recurrence risk probability of the disease is trained using the training data (S108). As described above, the training data may be extracted from the sample data before step S104.
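Training in step S108 can be sketched with plain batch gradient descent on the log loss. This is only one way to fit a logistic regression model and is an assumption for illustration: the disclosure does not name an optimizer, and the learning rate, epoch count, and function names below are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(weights, bias, x):
    """Predicted probability of recurrence for one feature vector."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log loss."""
    n, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(rows, labels):
            err = predict_proba(weights, bias, x) - y  # gradient of log loss w.r.t. z
            for j, v in enumerate(x):
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias
```

The fitted weights play the role of the regression coefficients, so `math.exp(weight)` gives the corresponding odds ratio for a one-unit change in that feature.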

To validate the logistic regression model trained in step S108, the prediction quality of the model is evaluated using the AUC (Area Under Curve) value (S110). The AUC is the area under the ROC (Receiver Operating Characteristic) curve, which plots the true positive rate on the vertical axis against the false positive rate on the horizontal axis; the AUC value is the ratio of the area under the ROC curve to the total area. The AUC is invariant to the scale of the predicted scores and to the choice of classification threshold or cutoff. The validation in step S110 may be performed, for example, by determining whether the AUC value is greater than or equal to a preset validation level.
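The AUC check of step S110 can be sketched as follows. The function computes the AUC through its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half), which equals the area under the ROC curve. The validation level of 0.8 is a hypothetical default, since the disclosure only speaks of a preset level.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

def passes_validation(labels, scores, min_auc=0.8):
    """True when the AUC reaches the preset validation level (0.8 is an assumption)."""
    return roc_auc(labels, scores) >= min_auc
```

Because only the ordering of the scores matters, rescaling every score by a positive constant leaves the result unchanged, which illustrates the invariance described above.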

Another way to evaluate the prediction quality of the logistic regression model is to assess its goodness of fit, for example by considering whether the model minimizes the log loss obtained from the loss function.
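For reference, the log loss mentioned here is commonly computed as the mean negative log-likelihood of the true labels under the predicted probabilities. The sketch below follows that standard definition; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math

def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; lower values indicate a better fit."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)
```

A model that always predicts 0.5 scores log(2), about 0.693, so a useful model should fall below that value on held-out data.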

To predict the recurrence risk probability from the individual data of a patient who has had the disease, the probability is predicted using the logistic regression model from any one of steps S106 to S108 (S112).
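Step S112 can be sketched as applying the fitted coefficients to one patient's feature vector. The weights, bias, feature values, and the 0.5 decision threshold below are hypothetical; the output string only mimics the "Predict result" format shown in this document, and it is an assumption that the reported probability is the probability of recurrence.

```python
import math

def predict_patient(weights, bias, features, threshold=0.5):
    """Return (label, probability of recurrence) for one patient."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    # Numerically stable logistic function.
    if z >= 0:
        probability = 1.0 / (1.0 + math.exp(-z))
    else:
        e = math.exp(z)
        probability = e / (1.0 + e)
    label = 1 if probability >= threshold else 0
    return label, probability

def format_prediction(label, probability):
    """Render a prediction in the style of the document's sample output."""
    return f"Predict result : {label} with probability of : {probability}"
```

For example, `format_prediction(*predict_patient([2.0, -1.0], 0.5, [1.0, 0.5]))` yields a line beginning with `Predict result : 1`.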

Although FIG. 1 describes the processes as being executed sequentially, this merely illustrates the technical idea of an embodiment of the present disclosure. A person of ordinary skill in the art to which the embodiment pertains could change the order shown in FIG. 1 or execute one or more of the processes in parallel without departing from the essential characteristics of the embodiment, so FIG. 1 is not limited to a time-series order. For example, the data preprocessing of step S102 or the model validation of step S110 may be omitted, and steps S104 and S106 may be performed as a single step so that a fitted logistic regression model is generated at the same time as the logistic regression analysis.

The following describes the prediction of the recurrence risk probability of acute coronary syndrome using the method according to an embodiment of the present disclosure, together with the statistical validation of the predicted probability. The validation result is shown in FIG. 2.

FIG. 2 shows the statistical validation result for the predicted recurrence risk probability of acute coronary syndrome according to an embodiment of the present disclosure.

To predict the recurrence risk of acute coronary syndrome, data on age, gender, procedure code, prescription drug code, diagnosis code, and treated disease code were collected from a patient group in which acute coronary syndrome had recurred and from a control group of subjects in whom acute coronary syndrome (e.g., acute myocardial infarction) had occurred once. The data were obtained from patients' hospital medical records and from external databases (the inpatient dataset of the Health Insurance Review and Assessment Service and Synthea's synthetic patient records). The number of records obtained is shown in Table 2.

Table 2

| Data source | Recurrence patient group | Control group | Total |
|---|---|---|---|
| Hospital medical records | 0 | 4 | 4 |
| Health Insurance Review and Assessment Service | 703 | 5828 | 6531 |
| Synthea | 0 | 778 | 778 |
| Total | 703 | 6610 | 7313 |

In this prediction of the recurrence risk of acute coronary syndrome, 70% of all collected data was used as training data and 30% as test data.
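A 70/30 split of this kind can be sketched as a seeded random partition. The function name, the seed, and the rounding rule are assumptions; the disclosure does not say whether the split was random or stratified by group.

```python
import random

def train_test_split(rows, labels, train_fraction=0.7, seed=42):
    """Randomly partition the samples into training and test sets."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # fixed seed for a reproducible split
    cut = int(round(len(order) * train_fraction))
    train_idx, test_idx = order[:cut], order[cut:]
    return (
        [rows[i] for i in train_idx],
        [labels[i] for i in train_idx],
        [rows[i] for i in test_idx],
        [labels[i] for i in test_idx],
    )
```

With a recurrence group much smaller than the control group, as in Table 2, a stratified split that preserves the class ratio in both partitions may be worth considering.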

Referring to the ROC curve of FIG. 2, the true positive rate of the prediction is higher than the false positive rate, and the AUC value is 0.821. This AUC value verifies that the logistic regression model trained by the recurrence risk prediction method of an embodiment of the present disclosure predicts the recurrence risk probability of acute coronary syndrome with excellent performance.

When the individual data of a patient whose acute coronary syndrome actually recurred is applied to the logistic regression model, the model predicts the probability of recurrence and the probability of non-recurrence as 0.999 (remaining digits omitted) and 0.172 (remaining digits omitted), respectively, as shown in Table 3. This confirms that the logistic regression model also performs well on individual data.

Table 3

Predict result : 1 with probability of : 0.9999999999996334
Predict result : 0 with probability of : 0.17207951877166552

Various implementations of the devices, units, processes, steps, and the like described herein may be realized in digital electronic circuits, integrated circuits, FPGAs (field programmable gate arrays), ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation as one or more computer programs executable on a programmable system. The programmable system includes at least one programmable processor, which may be a special-purpose or general-purpose processor, coupled to receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device. Computer programs (also known as programs, software, software applications, or code) contain instructions for a programmable processor and are stored on a "computer-readable recording medium".

The computer-readable recording medium includes all types of recording devices in which data readable by a computer system is stored. Such a medium may be a non-volatile or non-transitory medium such as a ROM, CD-ROM, magnetic tape, floppy disk, memory card, hard disk, magneto-optical disk, or storage device, and may further include a transitory medium such as a data transmission medium. The computer-readable recording medium may also be distributed over network-connected computer systems so that computer-readable code is stored and executed in a distributed manner.

Various implementations of the systems and techniques described herein may be implemented by a programmable computer. Here, the computer includes a programmable processor, a data storage system (including volatile memory, non-volatile memory, another type of storage system, or a combination thereof), and at least one communication interface. For example, the programmable computer may be one of a server, a network appliance, a set-top box, an embedded device, a computer expansion module, a personal computer, a laptop, a PDA (personal digital assistant), a cloud computing system, or a mobile device.

The above description merely illustrates the technical idea of the present embodiment, and a person of ordinary skill in the art to which the embodiment belongs may make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments are intended to explain, not to limit, the technical idea of the present embodiment, and the scope of that technical idea is not limited by them. The scope of protection of the present embodiment should be construed according to the claims below, and all technical ideas within a scope equivalent thereto should be construed as falling within the scope of rights of the present embodiment.

Claims

1. A method for predicting the recurrence risk of acute coronary syndrome, the method comprising:
collecting, with respect to a preset factor, sample data comprising data on the factor in a patient group with recurrent acute coronary syndrome and data on the factor in a control group;
performing logistic regression analysis, based on the sample data, to predict the recurrence risk of the acute coronary syndrome; and
generating, based on a result of the logistic regression analysis, a machine learning-based logistic regression model that predicts a recurrence risk probability of the acute coronary syndrome, and training the logistic regression model on the sample data.
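
The collecting, analysis, and training steps claimed above can be sketched roughly as follows. This is only a minimal illustration under stated assumptions: the sample data is synthetic, the two factors (`age_scaled`, `has_code`) and all numeric values are hypothetical, and plain batch gradient descent stands in for whatever fitting procedure an actual implementation would use.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Fit [intercept, coef_1, ..., coef_d] by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def make_toy_samples(n=400, seed=0):
    """Synthetic stand-in for the sample data (1 = recurrence group, 0 = control group)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        age_scaled = rng.uniform(-1.0, 1.0)    # hypothetical factor: rescaled age
        has_code = float(rng.random() < 0.3)   # hypothetical factor: a diagnosis code is present
        p = sigmoid(-1.0 + 1.5 * age_scaled + 2.0 * has_code)
        X.append([age_scaled, has_code])
        y.append(1 if rng.random() < p else 0)
    return X, y


X, y = make_toy_samples()
weights = fit_logistic(X, y)
print("intercept and coefficients:", [round(v, 3) for v in weights])
```

The returned list holds the intercept followed by one regression coefficient per factor; with real data, the factor columns would be built from the collected patient-group and control-group records.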

2. The method of claim 1, wherein the control group consists of patients who have experienced the acute coronary syndrome once.

3. The method of claim 1, wherein the factor includes all or some of age, sex, procedure code, prescription drug code, diagnosis code, and treated disease code.

4. The method of claim 1, wherein the result of the logistic regression analysis includes all or some of a regression coefficient and an odds ratio of each factor.
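
In a logistic regression, the odds ratio of a factor is the exponential of its regression coefficient, so the two quantities named above can be derived from one another. The short sketch below shows the conversion; the factor names and coefficient values are hypothetical and are not taken from the disclosure.

```python
import math


def odds_ratios(coefficients: dict) -> dict:
    """Convert regression coefficients (log-odds per unit of each factor) to odds ratios."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}


# Hypothetical coefficients, for illustration only.
example_coefficients = {
    "age_per_10_years": 0.40,
    "procedure_code_present": 0.0,
    "drug_code_present": -0.25,
}
ratios = odds_ratios(example_coefficients)
for name, value in ratios.items():
    print(f"{name}: odds ratio {value:.3f}")
```

A coefficient of zero gives an odds ratio of exactly 1 (no association), a positive coefficient gives a ratio above 1, and a negative coefficient gives a ratio below 1.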

5. The method of claim 4, wherein training the logistic regression model comprises re-performing the logistic regression analysis so that the odds ratio is equal to or greater than a preset reference value, and generating the logistic regression model based on the re-performed result.
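
The claim does not state how the re-performed analysis arrives at odds ratios at or above the reference value. One possible reading, assumed here purely for illustration, is that factors whose odds ratio falls below the reference value are dropped and the regression is fitted again on the remaining factors. The data, the factor names, and the reference value `min_odds_ratio=1.2` below are all hypothetical.

```python
import math
import random


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent; returns [intercept, coef_1, ..., coef_d]."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def refit_until_threshold(X, y, names, min_odds_ratio=1.2):
    """Refit, dropping factors whose odds ratio is below the reference value (assumed reading)."""
    keep = list(range(len(names)))
    while keep:
        X_kept = [[row[j] for j in keep] for row in X]
        w = fit_logistic(X_kept, y)
        ratios = [math.exp(beta) for beta in w[1:]]
        weak = [j for j, r in zip(keep, ratios) if r < min_odds_ratio]
        if not weak:
            return [names[j] for j in keep], w, ratios
        keep = [j for j in keep if j not in weak]
    return [], [], []


def make_toy_samples(n=600, seed=0):
    """Synthetic binary factors; the third one lowers risk, so its odds ratio is below 1."""
    rng = random.Random(seed)
    true_beta = [1.5, 2.0, -1.0]
    X, y = [], []
    for _ in range(n):
        x = [float(rng.random() < 0.5) for _ in true_beta]
        z = -1.5 + sum(b * v for b, v in zip(true_beta, x))
        X.append(x)
        y.append(1 if rng.random() < 1.0 / (1.0 + math.exp(-z)) else 0)
    return X, y


X, y = make_toy_samples()
kept, weights, ratios = refit_until_threshold(X, y, ["factor_a", "factor_b", "factor_c"])
print("factors kept:", kept)
```

Other readings of the claim are possible, for example re-coding a factor or changing its reference category before refitting; the loop above only shows the drop-and-refit variant.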

6. The method of claim 1, wherein the training comprises extracting training data and test data from the sample data, training the logistic regression model with the training data, and verifying the trained logistic regression model with the test data.

7. The method of claim 6, wherein the verification is performed by determining whether an area under the curve (AUC) value is equal to or greater than a preset test level.
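
The split, training, and AUC-based verification described in the two claims above might look like the following sketch. The 70/30 split, the test level of 0.7, and the synthetic data are assumptions made for the example; the disclosure does not fix these values here. The AUC is computed as the fraction of (recurrence, control) pairs in which the recurrence case receives the higher predicted probability.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent; returns [intercept, coef_1, ..., coef_d]."""
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w


def predict_proba(w, x):
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def auc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count as 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Synthetic sample data with two hypothetical factors.
rng = random.Random(0)
X, y = [], []
for _ in range(500):
    x = [rng.uniform(-1.0, 1.0), float(rng.random() < 0.3)]
    X.append(x)
    y.append(1 if rng.random() < sigmoid(-1.0 + 3.0 * x[0] + 2.0 * x[1]) else 0)

# Extract training data and test data (assumed 70/30 split).
indices = list(range(len(X)))
random.Random(1).shuffle(indices)
cut = len(indices) * 7 // 10
train_idx, test_idx = indices[:cut], indices[cut:]

weights = fit_logistic([X[i] for i in train_idx], [y[i] for i in train_idx])

TEST_LEVEL = 0.7  # hypothetical preset test level
auc_value = auc([y[i] for i in test_idx], [predict_proba(weights, X[i]) for i in test_idx])
passed = auc_value >= TEST_LEVEL
print(f"test AUC = {auc_value:.3f}, verification passed: {passed}")
```

Keeping the test records out of the fitting step is what makes the AUC a check on how the model generalizes rather than on how well it memorized the training data.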

8. The method of claim 1, further comprising predicting the recurrence risk probability of the acute coronary syndrome from individual data of a patient regarding the factor, using the trained logistic regression model.
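
Once trained, a logistic regression model turns a patient's factor values into a probability by applying the logistic function to the intercept plus the weighted sum of the factors. The sketch below illustrates this prediction step; the intercept, coefficients, factor names, and patient record are hypothetical placeholders, not learned values from the disclosure.

```python
import math

# Hypothetical learned parameters; real values would come from training on the sample data.
INTERCEPT = -2.0
COEFFICIENTS = {
    "age_decades_over_50": 0.4,
    "has_procedure_code": 0.9,
    "has_drug_code": -0.3,
}


def recurrence_probability(patient: dict) -> float:
    """Return the predicted recurrence probability; factors missing from the record count as 0."""
    z = INTERCEPT + sum(beta * patient.get(name, 0.0) for name, beta in COEFFICIENTS.items())
    return 1.0 / (1.0 + math.exp(-z))


patient = {"age_decades_over_50": 1.5, "has_procedure_code": 1.0, "has_drug_code": 0.0}
risk = recurrence_probability(patient)
print(f"predicted recurrence probability: {risk:.3f}")
```

Treating a missing factor as 0 is a simplification chosen for this example; a real system would need an explicit policy for incomplete patient records.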

9. A computer program stored in a computer-readable recording medium to execute each process included in the method for predicting the recurrence risk of acute coronary syndrome according to any one of claims 1 to 8.