KR102601514B1

KR102601514B1 - System for prediction of early dropping out in outpatients with alcohol use disorders and method thereof

Info

Publication number: KR102601514B1
Application number: KR1020210096253A
Authority: KR
Inventors: 김대진; 최인영; 박소진; 전지원; 박성웅
Original assignee: (주)디지털팜
Priority date: 2021-07-22
Filing date: 2021-07-22
Publication date: 2023-11-14
Also published as: WO2023003315A1; KR20230015009A

Abstract

Embodiments of the present invention relate to a system and method for predicting early discontinuation of outpatient treatment in patients with alcohol use disorder. Embodiments of the present invention may provide a system for predicting early discontinuation of outpatient treatment in patients with alcohol use disorder, the system comprising: a preprocessor that collects data of a plurality of patients with alcohol use disorder, determines a plurality of independent variables to which one or more machine learning algorithms are to be applied, and processes the data of the plurality of patients to generate processed data; a prediction model generation unit that receives the processed data, sets whether each of the plurality of patients discontinued outpatient treatment early as the dependent variable, and generates a prediction model by applying the one or more machine learning algorithms to all or part of the processed data based on the independent variables; a prediction unit that inputs all or part of the processed data into the prediction model to generate a prediction result as to whether the plurality of patients will discontinue outpatient treatment early; and an output unit that outputs the prediction result.

Description

System and method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder {SYSTEM FOR PREDICTION OF EARLY DROPPING OUT IN OUTPATIENTS WITH ALCOHOL USE DISORDERS AND METHOD THEREOF}

The present invention provides a system and method for predicting early discontinuation of outpatient treatment in patients with alcohol use disorder.

Studies have previously been conducted on the factors that affect continued outpatient treatment, with the aim of raising the treatment rate of patients with alcohol use disorder. However, most of these studies were merely prospective studies, and even the retrospective ones focused on traditional methodologies such as regression analysis.

Machine learning is the study of computer algorithms that improve automatically through experience and the use of data. It is also regarded as a part of artificial intelligence. Rather than being explicitly programmed to perform a specific task, a machine learning algorithm builds a model from samples, called training data, in order to make predictions or decisions. Machine learning can be used in a variety of applications, including medicine, speech recognition, and computer vision. For example, Korean Laid-open Patent Publication No. 10-2021-0047149 discloses a method and system for predicting the risk of myocardial infarction recurrence in cardiac rehabilitation patients using machine learning techniques. Specifically, the method includes acquiring clinical records and lifelog data of cardiac rehabilitation patients, generating from that data a dataset to be input into a machine learning-based risk prediction model, and using the model to determine and provide, from the dataset, whether a cardiac rehabilitation patient is at risk of myocardial infarction recurrence or a myocardial infarction recurrence risk index.

Prediction models based on machine learning can perform class classification with high accuracy. In recent psychiatric research, such models have proved useful in the development of decision support systems.

Alcohol use disorder can cause not only physical illness, such as alcohol-induced physical complications and alcoholic dementia, but also social problems such as alcohol-related crimes and accidents, as well as enormous economic losses.

According to the 2016 Epidemiological Survey of Mental Disorders in Korea, the lifetime prevalence of alcohol use disorder, which covers alcohol dependence and alcohol abuse, was 12.2% (18.1% in men, 6.4% in women), the highest among mental disorders.

Alcohol use disorder has a higher relapse rate than other mental disorders. To prevent relapse, it needs to be managed over a long period rather than concluded with a one-off treatment. Consistent treatment can also have a positive effect on treatment outcomes. Continuous follow-up of patients is therefore an important indicator for evaluating the prognosis of alcohol use disorder.

In practice, outpatient treatment retention among patients with alcohol use disorder is quite low. In studies outside Korea, 52% to 75% of patients receiving outpatient treatment for alcohol use disorder are lost to follow-up by the fourth treatment session. According to a Korean study, 91.7% of patients were lost to follow-up within 6 months after discharge. It is therefore important to predict and manage those patients with alcohol use disorder whose follow-up is likely to be discontinued early.

Embodiments of the present invention can predict whether a patient with alcohol use disorder will discontinue outpatient treatment early by calculating the probability of early discontinuation with a prediction model designed through machine learning.

Embodiments of the present invention can support patient management so that patients with alcohol use disorder who are at high risk of early discontinuation of outpatient treatment remain in treatment, and can ultimately contribute to preventing relapse and raising the treatment success rate.

However, the technical problems to be solved by the present embodiments are not limited to those described above, and other technical problems may exist.

In one aspect, embodiments of the present invention provide a system for predicting early discontinuation of outpatient treatment in patients with alcohol use disorder, the system comprising: a preprocessor that collects data of a plurality of patients with alcohol use disorder, determines a plurality of independent variables to which one or more machine learning algorithms are to be applied in order to generate a prediction model of whether the plurality of patients will discontinue outpatient treatment early, and processes the data of the plurality of patients to generate processed data; a prediction model generation unit that receives the processed data, sets whether each of the plurality of patients discontinued outpatient treatment early as the dependent variable, and generates the prediction model by applying the one or more machine learning algorithms to all or part of the processed data based on the independent variables; a prediction unit that inputs all or part of the processed data into the prediction model to generate a prediction result as to whether the plurality of patients will discontinue outpatient treatment early; and an output unit that outputs the prediction result.

In another aspect, embodiments of the present invention provide a method for predicting early discontinuation of outpatient treatment in patients with alcohol use disorder, the method comprising: a data collection step of collecting data of a plurality of patients with alcohol use disorder; an independent variable determination step of determining a plurality of independent variables to which one or more machine learning algorithms are to be applied in order to generate a prediction model of whether the plurality of patients will discontinue outpatient treatment early; a preprocessing step of processing the data of the plurality of patients to generate processed data; a prediction model generation step of receiving the processed data, setting whether each of the plurality of patients discontinued outpatient treatment early as the dependent variable, and generating the prediction model by applying the one or more machine learning algorithms to all or part of the processed data based on the independent variables; a prediction step of inputting all or part of the processed data into the prediction model to generate a prediction result as to whether the plurality of patients will discontinue outpatient treatment early; and an output step of outputting the prediction result.

According to embodiments of the present invention, whether a patient with alcohol use disorder will discontinue outpatient treatment early can be predicted by calculating the probability of early discontinuation with a prediction model designed through machine learning.

According to embodiments of the present invention, patient management can be supported so that patients with alcohol use disorder who are at high risk of early discontinuation of outpatient treatment remain in treatment, which can ultimately contribute to preventing relapse and raising the treatment success rate.

Figure 1 is a schematic diagram of a system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to embodiments of the present invention.
Figure 2 is a flowchart showing a variable determination operation performed by the preprocessor according to embodiments of the present invention.
Figure 3 is a diagram showing the operation of the preprocessor classifying data of alcohol use disorder patients into a training data group and a test data group according to embodiments of the present invention.
Figure 4 is a diagram showing an operation of the preprocessor performing sampling on a specific class according to embodiments of the present invention.
Figure 5 is a diagram illustrating an example of an operation in which the prediction model generation unit according to embodiments of the present invention generates a prediction model by applying one or more machine learning algorithms to the training data group.
Figure 6 is a diagram showing the operation of the prediction model generation unit according to embodiments of the present invention determining a prediction model according to performance evaluation indices.
Figure 7 is a diagram showing AUC, one of the performance evaluation indicators according to embodiments of the present invention.
Figure 8 is a diagram showing a method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to embodiments of the present invention.

Below, embodiments of the present invention are described in detail with reference to the attached drawings so that those of ordinary skill in the art to which the invention pertains can readily practice it. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this covers not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. In addition, when a part is said to "include" a component, this means that, unless specifically stated otherwise, it may further include other components rather than excluding them, and it should be understood that the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof is not excluded in advance.

Terms of degree such as "about" and "substantially", as used throughout the specification, mean at or near the stated value when manufacturing and material tolerances inherent in the stated meaning are presented, and are used to prevent an unscrupulous infringer from unfairly exploiting a disclosure in which exact or absolute values are stated to aid understanding of the present invention. The terms "step of (doing) ~" and "step of ~", as used throughout the specification, do not mean "step for ~".

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.

Figure 1 is a schematic diagram of a system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to embodiments of the present invention.

Referring to Figure 1, the system 100 for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to embodiments of the present invention may include a preprocessor 110, a prediction model generation unit 120, a prediction unit 130, and an output unit 140.

The preprocessor 110 may collect data of a plurality of patients with alcohol use disorder. The preprocessor 110 may determine a plurality of independent variables to which one or more machine learning algorithms will be applied in order to generate a prediction model of whether the patients will discontinue outpatient treatment early. The preprocessor 110 may also process the patient data into processed data suitable for applying the one or more machine learning algorithms.

When collecting the data of patients with alcohol use disorder, the preprocessor 110 may use wired or wireless communication with a server or terminal that stores the data. As an example, the preprocessor 110 may receive medical data of patients with alcohol use disorder from one or more medical institutions. The data of the plurality of patients may be standardized according to a Common Data Model (CDM).

As an example, the data of the plurality of patients may be collected from a Clinical Data Warehouse (CDW). The CDW may de-identify the data, extract it to suit the characteristics of the study, and deliver it to the preprocessor 110.

Meanwhile, to increase the sensitivity of the prediction model, the plurality of patients with alcohol use disorder may be selected from among patients hospitalized for two weeks or longer.

For a patient who has been hospitalized for two weeks or longer on two or more occasions, the admission date may be defined as the date of the first such admission.

Whether a patient with alcohol use disorder maintains continuous outpatient visits may be defined as whether the patient made at least one outpatient visit every month during the 6 months after discharge.

Patients who died within 6 months after discharge may be excluded.
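The cohort rules above can be sketched as a labeling function. This is a minimal illustration, not the patented implementation: the names `index_admission` and `label_patient` are hypothetical, and treating a "month" as a 30-day window counted from the discharge date is an assumption the source does not state.

```python
from datetime import date, timedelta

MIN_STAY_DAYS = 14     # "two weeks or longer", from the text
FOLLOW_UP_MONTHS = 6   # six months after discharge, from the text
MONTH_DAYS = 30        # assumption: one "month" is a 30-day window

def index_admission(admissions):
    """Return the earliest (admit, discharge) pair lasting two weeks or longer, else None."""
    eligible = [a for a in admissions if (a[1] - a[0]).days >= MIN_STAY_DAYS]
    return min(eligible, key=lambda a: a[0]) if eligible else None

def label_patient(admissions, outpatient_visits, death_date=None):
    """Return 'retained', 'dropout', or None when the patient is outside the cohort."""
    index = index_admission(admissions)
    if index is None:
        return None  # never hospitalized for two weeks or longer
    discharge = index[1]
    follow_up_end = discharge + timedelta(days=MONTH_DAYS * FOLLOW_UP_MONTHS)
    if death_date is not None and discharge <= death_date <= follow_up_end:
        return None  # died within six months after discharge: excluded
    for month in range(FOLLOW_UP_MONTHS):
        start = discharge + timedelta(days=MONTH_DAYS * month)
        end = start + timedelta(days=MONTH_DAYS)
        # A month with no outpatient visit breaks continuous follow-up.
        if not any(start < visit <= end for visit in outpatient_visits):
            return "dropout"
    return "retained"

# Hypothetical patient: one short stay, then a qualifying 19-day stay.
admissions = [(date(2020, 1, 1), date(2020, 1, 10)),
              (date(2020, 3, 1), date(2020, 3, 20))]
discharge = date(2020, 3, 20)
monthly_visits = [discharge + timedelta(days=15 + 30 * k) for k in range(6)]
print(label_patient(admissions, monthly_visits))
print(label_patient(admissions, monthly_visits[:3]))
```

A calendar-month definition would change only the window arithmetic inside the loop; the inclusion and exclusion logic stays the same.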

The preprocessor 110 may determine the plurality of independent variables to which one or more machine learning algorithms will be applied in order to generate the prediction model of whether the plurality of patients with alcohol use disorder will discontinue outpatient treatment early.

As an example, the preprocessor 110 may determine the independent variables from among variables including 1) the patient's age, 2) gender, 3) hospitalization period, 4) address, 5) medical department, 6) comorbidities such as diabetes, liver disease, depressive disorder, and anxiety disorder diagnosed within 1 year before hospitalization, 7) whether the patient received outpatient treatment for alcohol use disorder before hospitalization, and 8) whether naltrexone was prescribed.

Meanwhile, to confirm the differences between the outpatient treatment retention group and the early discontinuation group among the plurality of patients, statistical analyses such as the t-test and the chi-square test may be performed on the determined independent variables.

The t-test is a statistical method for verifying whether the difference between the means of two groups is significant. The chi-square test is a statistical method based on the chi-square distribution and is used to test whether observed frequencies differ significantly from expected frequencies.

As an example, the results of the statistical analysis described above may be as shown in Table 1.

| Variable | Category | Retention in outpatient treatment (n=126) | Early discontinuation of outpatient treatment (n=713) | P-value |
|---|---|---|---|---|
| Hospitalization period | | | | 0.406 |
| | 28 days or less | 77 (61.1%) | 437 (61.3%) | |
| | 29-56 days | 25 (19.8%) | 136 (19.1%) | |
| | 57-70 days | 15 (11.9%) | 110 (15.4%) | |
| | 70 days or more | 9 (7.1%) | 30 (4.2%) | |
| Gender | | | | 0.008*** |
| | Male | 91 (72.2%) | 590 (82.7%) | |
| | Female | 35 (27.8%) | 123 (17.3%) | |
| Age | | | | 0.058* |
| | 29 or younger | 9 (7.1%) | 22 (3.1%) | |
| | 30-39 | 22 (17.5%) | 96 (13.5%) | |
| | 40-49 | 29 (23.0%) | 201 (28.2%) | |
| | 50-59 | 30 (23.8%) | 216 (30.3%) | |
| | 60 or older | 36 (28.6%) | 178 (25.0%) | |
| Address | | | | 0.04** |
| | Seoul | 37 (29.4%) | 144 (20.2%) | |
| | Gyeonggi | 75 (59.5%) | 451 (63.3%) | |
| | Other | 14 (11.1%) | 118 (16.5%) | |
| Department | | | | 0.015** |
| | Psychiatry | 111 (88.1%) | 546 (76.6%) | |
| | Gastroenterology | 9 (7.1%) | 104 (14.6%) | |
| | Other | 6 (4.8%) | 63 (8.8%) | |
| Outpatient treatment for alcohol use disorder before hospitalization | | | | 0.000*** |
| | No | 35 (27.8%) | 325 (45.6%) | |
| | Yes | 91 (72.2%) | 388 (54.4%) | |
| Diabetes | | | | 0.087* |
| | No | 109 (86.5%) | 654 (91.7%) | |
| | Yes | 17 (13.5%) | 59 (8.3%) | |
| Liver disease | | | | 0.224 |
| | No | 107 (84.9%) | 569 (79.8%) | |
| | Yes | 19 (15.1%) | 144 (20.2%) | |
| Depressive disorder | | | | 0.006*** |
| | No | 78 (61.9%) | 529 (74.2%) | |
| | Yes | 48 (38.1%) | 184 (25.8%) | |
| Anxiety disorder | | | | 0.053* |
| | No | 104 (82.5%) | 635 (89.1%) | |
| | Yes | 22 (17.5%) | 78 (10.9%) | |
| Naltrexone prescription | | | | 0.000*** |
| | No | 93 (73.8%) | 626 (87.8%) | |
| | Yes | 33 (26.2%) | 87 (12.2%) | |

According to Table 1, at a significance level of 0.05, statistically significant differences between the outpatient treatment retention group and the early discontinuation group can be confirmed for gender, address, medical department, depressive disorder, outpatient treatment for alcohol use disorder before hospitalization, and naltrexone prescription.

As an example, the rate of early discontinuation of outpatient treatment may be higher among men, patients without a comorbid depressive disorder, patients not prescribed naltrexone, patients living outside Seoul, patients admitted to a department other than psychiatry, and patients who were diagnosed with alcohol use disorder before hospitalization but did not receive outpatient treatment.

Through this verification process, differences in patient characteristics between the two groups can be confirmed.
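As an illustration of the chi-square test described above, the gender counts from Table 1 (91 men and 35 women in the retention group, 590 men and 123 women in the early discontinuation group) can be checked with a few lines of standard-library Python. The function name `chi_square_2x2` is hypothetical, and applying Yates' continuity correction is an assumption: the source does not say which variant of the test was used.

```python
from math import erfc, sqrt

def chi_square_2x2(table, yates=True):
    """Chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value). Yates' continuity correction is applied
    when yates=True (an assumption, not something the source specifies).
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            statistic += diff * diff / expected
    # With one degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2.0))
    return statistic, p_value

# Rows: retention group, early discontinuation group; columns: male, female.
gender = [[91, 35], [590, 123]]
statistic, p_value = chi_square_2x2(gender)
print(round(statistic, 2), round(p_value, 3))
```

Under these assumptions the p-value rounds to 0.008, in line with the gender row of Table 1. Variables with more than two categories need the general chi-square distribution with more degrees of freedom, which this sketch does not cover.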

To apply the one or more machine learning algorithms, the preprocessor 110 may process the data of the plurality of patients with alcohol use disorder to generate processed data. The preprocessor 110 may also perform the same processing on the data to be predicted after the prediction model has been generated. This can improve the learning performance of the prediction model.

By removing or correcting missing, abnormal, and duplicate data in the data of the plurality of patients, the preprocessor 110 can secure high-quality processed data and improve the accuracy of the prediction model.

The preprocessor 110 may also generate the processed data by performing operations on the patient data such as merging, splitting, filtering, sampling, generating derived variables, generating dummy variables, rescaling, changing data types, and normalization.

As an example, the preprocessor 110 may correct and sort digital information, consisting of empirically or experimentally derived numbers or characters, and convert it into a simplified format.
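A few of the preprocessing operations listed above (removing duplicate and missing data, generating dummy variables, and rescaling) can be sketched as follows. This is only an illustration under assumptions: the function name `preprocess`, the field names, and the choice of min-max scaling with the first category level as the reference are all hypothetical and are not taken from the source.

```python
def preprocess(records, numeric_fields, categorical_fields):
    """Deduplicate, drop incomplete records, min-max scale, and dummy-encode."""
    # 1. Remove exact duplicate records while preserving order.
    seen, unique = set(), []
    for record in records:
        key = tuple(sorted(record.items()))
        if key not in seen:
            seen.add(key)
            unique.append(record)

    # 2. Drop records with a missing value in any field that is used.
    fields = list(numeric_fields) + list(categorical_fields)
    complete = [r for r in unique if all(r.get(f) is not None for f in fields)]
    if not complete:
        return []

    # 3. Collect the statistics needed for scaling and dummy variables.
    ranges = {f: (min(r[f] for r in complete), max(r[f] for r in complete))
              for f in numeric_fields}
    levels = {f: sorted({r[f] for r in complete}) for f in categorical_fields}

    processed = []
    for r in complete:
        row = {}
        for f in numeric_fields:
            low, high = ranges[f]
            row[f] = (r[f] - low) / (high - low) if high > low else 0.0
        for f in categorical_fields:
            # The first level is the reference category and gets no column.
            for level in levels[f][1:]:
                row[f"{f}={level}"] = int(r[f] == level)
        processed.append(row)
    return processed

# Hypothetical raw records: one duplicate and one record with a missing age.
raw = [
    {"age": 30, "sex": "M"},
    {"age": 30, "sex": "M"},
    {"age": 50, "sex": "F"},
    {"age": None, "sex": "M"},
]
processed = preprocess(raw, ["age"], ["sex"])
print(processed)
```

In practice the scaling ranges and category levels learned on the training data would be stored and reused on the data to be predicted, so that both pass through the same transformation.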

The prediction model generation unit 120 may receive the processed data, set whether each of the plurality of patients with alcohol use disorder discontinued outpatient treatment early as the dependent variable, and generate a prediction model by applying one or more machine learning algorithms to all or part of the processed data based on the independent variables.

Machine learning algorithms can be broadly classified into three types: supervised learning algorithms, unsupervised learning algorithms, and reinforcement learning algorithms.

A supervised learning algorithm is used when there is an intended result. During training, the model adjusts its internal variables so that the input values are mapped to the intended output.

An unsupervised learning algorithm is used when there is no intended result, and can group an input data set into sets of similar type. Unsupervised learning algorithms can be used in data mining.

A reinforcement learning algorithm is used to make decisions about input values, and the decision made for a given input gradually changes according to the success or failure of earlier decisions. The more a reinforcement learning algorithm learns, the better it can predict the outcome for a given input.

Meanwhile, the prediction model generation unit 120 may be implemented as, for example, a workstation server or a cloud server.

The prediction unit 130 may input all or part of the processed data generated by the preprocessor 110 into the prediction model generated by the prediction model generation unit 120 to generate a prediction result as to whether the plurality of patients with alcohol use disorder will discontinue outpatient treatment early.

Using the prediction result generated by the prediction unit 130, the system 100 can predict whether a patient with alcohol use disorder will discontinue outpatient treatment early, and can also identify the variables that influence early discontinuation.

Using the prediction result generated by the prediction unit 130, the system 100 can also support closer management tailored to each patient's characteristics, helping the patient remain in treatment.

The output unit 140 may output the prediction result generated by the prediction unit 130. The output unit 140 may do so by, for example, displaying the result on a screen or printing it with a printer.
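The data flow between the four units described above can be sketched as four functions. This is a toy illustration under stated assumptions: the source does not name a specific algorithm in this passage, so logistic regression trained by gradient descent is used purely as a stand-in, and the function names, variable names, and patient records are all hypothetical.

```python
from math import exp

VARIABLES = ["prior_outpatient", "naltrexone"]  # hypothetical independent variables

def preprocess(patients, variables):
    """Preprocessor: keep complete records and extract features and labels."""
    complete = [p for p in patients
                if all(p.get(v) is not None for v in variables + ["dropout"])]
    rows = [[float(p[v]) for v in variables] for p in complete]
    labels = [p["dropout"] for p in complete]  # dependent variable: 1 = early discontinuation
    return rows, labels

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)

def generate_model(rows, labels, epochs=2000, lr=0.1):
    """Prediction model generation unit: logistic regression (stand-in learner)."""
    weights, bias = [0.0] * len(rows[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for x, y in zip(rows, labels):
            error = _sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += error
            for k, v in enumerate(x):
                grad_w[k] += error * v
        bias -= lr * grad_b / len(rows)
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
    return weights, bias

def predict(model, rows):
    """Prediction unit: probability of early discontinuation for each row."""
    weights, bias = model
    return [_sigmoid(bias + sum(w * v for w, v in zip(weights, x))) for x in rows]

def output(probabilities, threshold=0.5):
    """Output unit: format the prediction results for display or printing."""
    return [f"patient {i}: p(dropout)={p:.2f} -> "
            f"{'dropout' if p >= threshold else 'retained'}"
            for i, p in enumerate(probabilities)]

# Toy records, not real patient data.
patients = [
    {"prior_outpatient": 0, "naltrexone": 0, "dropout": 1},
    {"prior_outpatient": 0, "naltrexone": 0, "dropout": 1},
    {"prior_outpatient": 0, "naltrexone": 1, "dropout": 1},
    {"prior_outpatient": 1, "naltrexone": 0, "dropout": 0},
    {"prior_outpatient": 1, "naltrexone": 1, "dropout": 0},
    {"prior_outpatient": 1, "naltrexone": 1, "dropout": 0},
]
rows, labels = preprocess(patients, VARIABLES)
model = generate_model(rows, labels)
probabilities = predict(model, [[0.0, 0.0], [1.0, 1.0]])
for line in output(probabilities):
    print(line)
```

Keeping the units as separate functions mirrors the description: any learner that returns a model usable by `predict` could replace the stand-in without touching the preprocessing or output steps.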

도 2는 본 발명의 실시예들에 따른 전처리부(110)가 수행하는 변수 결정 동작을 나타내는 흐름도이다.Figure 2 is a flowchart showing a variable determination operation performed by the preprocessor 110 according to embodiments of the present invention.

도 2를 참조하면, 전처리부(110)는 독립 변수들을 결정할 때, 독립 변수들 사이의 다중 공선성 문제를 해결하기 위해, 독립 변수들 사이에 대한 분산 팽창 계수(VIF, Variance Inflation Factors)를 계산하고, 분산 팽창 계수(VIF)가 기 설정된 임계값 이하를 유지하도록 독립 변수들을 결정할 수 있다.Referring to FIG. 2, when determining independent variables, the preprocessor 110 calculates variance inflation factors (VIF) between the independent variables to solve the multicollinearity problem between the independent variables. And, the independent variables can be determined so that the variance inflation factor (VIF) remains below a preset threshold.

다중 공선성 문제란 독립 변수의 일부가 다른 독립 변수의 조합으로 표현될 수 있는 문제를 의미한다. 다중 공선성 문제는 독립 변수들이 서로 독립적이지 않고 상호관계가 강한 경우 발생할 수 있다. 다중 공선성 문제를 해결하기 위한 방법으로, 다른 독립 변수에 의존하는 변수를 없애는 방법이 사용될 수 있으며, 이때 분산 팽창 계수(VIF)가 사용될 수 있다.A multicollinearity problem refers to a problem in which part of an independent variable can be expressed as a combination of other independent variables. Multicollinearity problems can occur when independent variables are not independent from each other and have strong interrelationships. As a way to solve the multicollinearity problem, a method of eliminating variables that depend on other independent variables can be used, and in this case, the variance inflation factor (VIF) can be used.

The variance inflation factor (VIF) indicates how well one independent variable can be linearly regressed on the other independent variables. The VIF of the i-th variable can be obtained through Equation 1 below.

[Equation 1]
$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$

Here, $R_i^2$ is the coefficient of determination obtained by linearly regressing the i-th variable on the other independent variables.

Since $R_i^2$ is less than 1, the more the i-th variable depends on the other independent variables, the closer $R_i^2$ is to 1 and the larger $\mathrm{VIF}_i$ becomes.

전처리부(110)는 결정된 모든 독립 변수들에 대하여 분산 팽창 계수(VIF)값을 계산하여, 모든 분산 팽창 계수(VIF)값이 기 설정된 임계값 이하를 유지될 수 있도록 독립 변수를 결정할 수 있다.The preprocessor 110 may calculate variance inflation factor (VIF) values for all determined independent variables and determine independent variables so that all variance inflation factor (VIF) values are maintained below a preset threshold.

일 예로, 기 설정된 분산 팽창 계수(VIF)의 임계값은 5로 결정할 수 있다. For example, the threshold value of the preset variance inflation factor (VIF) may be determined to be 5.
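As a short worked illustration of what a threshold of 5 implies, assuming the standard definition of the variance inflation factor in terms of the coefficient of determination $R_i^2$ (the value 0.9 below is a hypothetical example, not data from the embodiments):

```latex
\[
\mathrm{VIF}_i = \frac{1}{1 - R_i^2} \le 5
\;\Longleftrightarrow\;
R_i^2 \le 1 - \frac{1}{5} = 0.8,
\qquad
\text{e.g. } R_i^2 = 0.9 \;\Rightarrow\; \mathrm{VIF}_i = \frac{1}{1 - 0.9} = 10 > 5 .
\]
```

Under this reading, a variable is kept only while at most 80% of its variance can be explained by the other independent variables.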

도 2를 참조하면, 전처리부(110)는 독립 변수를 결정할 수 있다(S210).Referring to FIG. 2, the preprocessor 110 may determine an independent variable (S210).

전처리부(110)는 결정된 모든 독립 변수들에 대하여 상술한 방법으로 분산 팽창 계수(VIF)를 계산할 수 있다(S220).The preprocessor 110 may calculate the variance inflation factor (VIF) for all determined independent variables using the method described above (S220).

전처리부(110)는 어느 하나의 독립 변수에 대한 분산 팽창 계수(VIF)가 임계값을 초과하는 경우(S230-Y), 다른 독립 변수를 결정할 수 있다(S240).If the variance inflation factor (VIF) for one independent variable exceeds the threshold (S230-Y), the preprocessor 110 may determine another independent variable (S240).

이때, 전처리부(110)는 다른 독립 변수를 결정한 후, S220 단계로 진입하여 결정된 독립 변수에 대하여 다시 분산 팽창 계수(VIF)를 계산할 수 있다.At this time, after determining another independent variable, the preprocessor 110 may enter step S220 and calculate the variance inflation factor (VIF) again for the determined independent variable.

모든 독립 변수에 대한 분산 팽창 계수(VIF)가 5 이하인 경우(S230-N), 전처리부(110)는 변수 결정 과정을 종료할 수 있다. If the variance inflation factor (VIF) for all independent variables is 5 or less (S230-N), the preprocessor 110 may end the variable determination process.
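The loop of steps S210 to S240 could be sketched as follows. This is only a minimal illustration in plain Python, not the implementation of the embodiments: the helper names (`solve`, `r_squared`, `vif`, `select_variables`), the toy data, and in particular the rule of dropping the variable with the largest VIF at step S240 are assumptions, since the text only states that "another independent variable" is determined.

```python
from typing import Dict, List

THRESHOLD = 5.0  # example threshold value given in the text


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ZeroDivisionError("singular system (perfect collinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(y: List[float], predictors: List[List[float]]) -> float:
    """R^2 of an ordinary least-squares fit of y on the predictors plus an intercept."""
    rows = [[1.0] + [p[k] for p in predictors] for k in range(len(y))]
    dim = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(dim)] for i in range(dim)]
    xty = [sum(r[i] * yk for r, yk in zip(rows, y)) for i in range(dim)]
    beta = solve(xtx, xty)
    mean_y = sum(y) / len(y)
    ss_res = sum((yk - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yk in zip(rows, y))
    ss_tot = sum((yk - mean_y) ** 2 for yk in y)
    return 1.0 - ss_res / ss_tot


def vif(data: Dict[str, List[float]], name: str) -> float:
    """VIF of one variable: 1 / (1 - R^2) from regressing it on all the others."""
    others = [col for key, col in data.items() if key != name]
    if not others:
        return 1.0
    try:
        r2 = r_squared(data[name], others)
    except ZeroDivisionError:
        return float("inf")  # treated as an unbounded VIF in this sketch
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)


def select_variables(data: Dict[str, List[float]], threshold: float = THRESHOLD) -> List[str]:
    selected = dict(data)                                        # S210
    while True:
        vifs = {name: vif(selected, name) for name in selected}  # S220
        worst = max(vifs, key=vifs.get)
        if vifs[worst] <= threshold:                             # S230-N: finish
            return list(selected)
        del selected[worst]                                      # S240 (assumed rule)


# Hypothetical toy data: x2 is almost exactly 2 * x1, x3 is unrelated.
example = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    "x2": [2.1, 3.9, 6.2, 8.0, 9.8, 12.1, 14.0, 16.1],
    "x3": [5.0, 1.0, 4.0, 2.0, 8.0, 3.0, 7.0, 6.0],
}
selected = select_variables(example)
```

In this toy data one of the two nearly collinear variables is removed in the first pass, after which every remaining VIF falls below the threshold and the loop ends.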

도 3은 본 발명의 실시예들에 따른 전처리부(110)가 알코올 사용장애 환자 데이터를 학습 데이터군(Training Data Set)과 시험 데이터군(Test Data Set)으로 분류하는 동작을 나타낸 도면이다.Figure 3 is a diagram showing the operation of the preprocessor 110 classifying alcohol use disorder patient data into a training data set and a test data set according to embodiments of the present invention.

도 3을 참조하면, 전처리부(110)는 복수의 알코올 사용장애 환자의 데이터를 학습 데이터군과 시험 데이터군으로 분류할 수 있다.Referring to FIG. 3, the preprocessor 110 may classify data of a plurality of alcohol use disorder patients into a learning data group and a test data group.

머신 러닝 알고리즘을 이용하여 예측 모델을 생성하기 위해서 전처리부(110)는 복수의 알코올 사용장애 환자의 데이터 중에서 일부를 학습 데이터군(Training Data Set)으로 분류하여, 예측 모델을 학습하는 데 사용할 수 있다. In order to create a prediction model using a machine learning algorithm, the preprocessor 110 classifies some of the data of a plurality of alcohol use disorder patients into a training data set and can be used to learn the prediction model. .

한편, 시험 데이터군(Test Data Set)은 복수의 알코올 사용장애 환자의 데이터를 기초로 생성된 예측 모델을 테스트하기 위해 사용될 수 있다.Meanwhile, the test data set can be used to test a prediction model created based on data from multiple alcohol use disorder patients.

시험 데이터군(Test Data Set)은 통계적으로 유의미한 결과를 도출할 수 있을만큼 크게 설정될 수 있고, 복수의 알코올 사용장애 환자의 데이터 전체일 수 있다. 시험 데이터군은 학습 데이터군과 같은 특징을 가지도록 분류될 수 있다. The test data set can be set large enough to produce statistically significant results, and can be the entire data of multiple alcohol use disorder patients. The test data group can be classified to have the same characteristics as the learning data group.

예측 모델 생성부(120)는 시험 데이터군(Test Data Set)을 이용하여, 예측 모델의 성능을 측정하는데 사용될 성능평가지표를 도출할 수 있다. 예측 모델 생성부(120)는 성능평가지표를 이용하여, 예측 모델의 객관적 성능을 확인하고 서로 다른 예측 모델들 간의 성능을 상호 비교할 수 있다.The prediction model generator 120 may use a test data set to derive a performance evaluation index to be used to measure the performance of the prediction model. The prediction model generator 120 can use performance evaluation indicators to check the objective performance of the prediction model and compare the performance between different prediction models.
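One way to obtain a test data set that has the same characteristics as the training data set is a stratified split, sketched below. The 80:20 ratio, the function name `stratified_split`, the fixed seed, and the label encoding (0 and 1) are assumptions made for illustration; the text does not specify how the split is performed.

```python
import random
from typing import Dict, List, Sequence, Tuple


def stratified_split(
    labels: Sequence[int], test_ratio: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices), keeping the class ratio in both sets."""
    rng = random.Random(seed)
    by_class: Dict[int, List[int]] = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train: List[int] = []
    test: List[int] = []
    for indices in by_class.values():
        shuffled = indices[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_ratio)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return sorted(train), sorted(test)


# Hypothetical labels with an 85:15 class ratio.
labels = [0] * 85 + [1] * 15
train_idx, test_idx = stratified_split(labels)
```

Splitting each class separately keeps the class proportion of the full data in both sets, so performance measured on the test data set reflects the same class mix as the data used for training.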

도 4는 본 발명의 실시예들에 따른 전처리부(110)가 특정 클래스에 오버 샘플링을 수행하는 동작을 나타낸 도면이다.Figure 4 is a diagram showing an operation of the preprocessor 110 performing oversampling on a specific class according to embodiments of the present invention.

도 4를 참조하면, 전처리부(110)는 복수의 알코올 사용장애 환자의 데이터를 가공할 때, 학습 데이터군의 클래스 불균형을 해결하기 위해 특정 클래스에 샘플링 방법을 적용할 수 있다.Referring to FIG. 4, when processing data of multiple alcohol use disorder patients, the preprocessor 110 may apply a sampling method to a specific class to resolve class imbalance in the learning data group.

For example, for the outpatient treatment maintenance class and the outpatient treatment early discontinuation class, which are the classes of the dependent variable in the training data set, the numbers of data items belonging to the two classes may be imbalanced (e.g., 85:15).

클래스 불균형 문제가 있는 데이터를 이용하여 예측 모델이 학습될 경우 편향된 결과가 도출될 수 있고, 알코올 사용장애 환자의 치료 중단 여부를 정확하게 예측하는 것이 어려울 수 있다. 따라서, 이러한 데이터의 클래스 불균형 문제를 해결하기 위해 샘플링 방법이 적용될 수 있다.If a prediction model is learned using data with class imbalance problems, biased results may be derived, and it may be difficult to accurately predict whether patients with alcohol use disorder will discontinue treatment. Therefore, sampling methods can be applied to solve the class imbalance problem of such data.

일 예로, 샘플링 방법은 오버샘플링 방법과 언더샘플링 방법으로 구분될 수 있다.As an example, sampling methods can be divided into oversampling methods and undersampling methods.

언더샘플링 방법은, 다수 클래스의 데이터군을 소수 클래스의 데이터군 수준으로 감소시키는 방식이다. 언더샘플링 방법은 다수의 클래스 데이터를 제거하므로 계산시간이 감소할 수 있고 클래스 오버랩을 감소시킬 수 있다. 다만, 언더샘플링 방법은 학습에 사용되는 전체 데이터의 수를 급격하게 감소시켜 오히려 학습 성능을 떨어트릴 수 있다.The undersampling method is a method of reducing the majority class data group to the level of the minority class data group. The undersampling method removes a large number of class data, which can reduce computation time and reduce class overlap. However, the undersampling method can drastically reduce the total number of data used for learning, thereby deteriorating learning performance.

반면, 오버샘플링 방법은, 소수 클래스의 데이터군을 다수 클레스의 수준으로 증가시켜 학습에 충분한 데이터를 확보하는 것이다. On the other hand, the oversampling method secures sufficient data for learning by increasing the data group of minority classes to the level of majority classes.

For example, oversampling methods include random oversampling, which balances the ratio by simply duplicating existing minority-class data, and SMOTE (Synthetic Minority Over-sampling Technique), which generates new data between a given minority-class data point and its neighboring minority-class data points.

When processing the data, the preprocessor 110 can correct class imbalance using an oversampling or undersampling method and thereby obtain more precise predictions. In embodiments of the present invention, to resolve the imbalance between the outpatient treatment maintenance class and the outpatient treatment early discontinuation class, which are the classes of the dependent variable in the training data set, the preprocessor 110 may apply oversampling to the minority class of the two (e.g., the outpatient treatment maintenance class).

도 4를 참조하면, 일 예로, 전처리부(110)는 소수인 클래스에 대해, 소수인 클래스에 포함된 데이터 중 a, b에 대해 복제된(duplicated) 데이터를 생성할 수 있다.Referring to FIG. 4, as an example, the preprocessor 110 may generate duplicated data for a and b among data included in the minority class for a minority class.
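Random oversampling by duplication, as in the example of FIG. 4, could be sketched as follows. The function name `random_oversample`, the fixed seed, and the toy data are hypothetical; SMOTE would instead synthesize new points and is not shown here.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple


def random_oversample(
    rows: Sequence, labels: Sequence[int], seed: int = 0
) -> Tuple[List, List[int]]:
    """Duplicate randomly chosen minority-class rows until all classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())

    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [row for row, lab in zip(rows, labels) if lab == label]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))  # duplicated copy of an existing row
            out_labels.append(label)
    return out_rows, out_labels


# Hypothetical training data with an 85:15 class ratio.
rows = list(range(100))
labels = [0] * 85 + [1] * 15
new_rows, new_labels = random_oversample(rows, labels)
```

Such duplication would normally be applied to the training data set only, so that the test data set keeps the original class ratio.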

도 5는 본 발명의 실시예들에 따른 예측 모델 생성부(120)가 학습 데이터군에 하나 이상의 머신 러닝 알고리즘을 적용하여 예측 모델을 생성하는 동작을 나타낸 도면이다.Figure 5 is a diagram showing an operation of the prediction model generator 120 according to embodiments of the present invention to generate a prediction model by applying one or more machine learning algorithms to a group of learning data.

도 5를 참조하면, 예측 모델 생성부(120)는 가공 데이터 중 학습 데이터군에 대응하는 부분에, 하나 이상의 머신 러닝 알고리즘을 적용하여 예측 모델을 생성할 수 있다.Referring to FIG. 5, the prediction model generator 120 may generate a prediction model by applying one or more machine learning algorithms to a portion of the processed data corresponding to the learning data group.

이때 하나 이상의 머신 러닝 일고리즘은 로지스틱 회귀(Logistic Regression), 서포트 벡터 머신(SVM, Support Vector Machine), 랜덤 포레스트(Random Forest), 그래디언트 부스팅(Gradient Boosting) 및 에이다부스트(Adaboost)중 하나 이상일 수 있다. At this time, one or more machine learning algorithms may be one or more of Logistic Regression, Support Vector Machine (SVM), Random Forest, Gradient Boosting, and Adaboost. .

로지스틱 회귀(Logistic Regression)는 두개의 값만을 가지는 종속 변수와 독립 변수들 간에 인과관계를 로지스틱 함수를 이용하여 추정하는 통계기법이다. 종속 변수는 이분형(0 또는 1)이고, 독립 변수는 범주형 또는 연속형일 수 있다.Logistic Regression is a statistical technique that uses a logistic function to estimate the causal relationship between a dependent variable that has only two values and an independent variable. The dependent variable is dichotomous (0 or 1), and the independent variable can be categorical or continuous.

로지스틱 회귀모형은 일반화 선형모형의 특수한 형태로 S자 곡선을 그리는 함수모형이다. 로지스틱 회귀 분석 결과, 종속 변수 값이 0.5보다 크면 그 사건이 일어날 것으로 예측하며 0.5보다 작으면 그 사건이 일어나지 않을 것으로 예측된다. The logistic regression model is a special form of the generalized linear model and is a functional model that draws an S-shaped curve. As a result of the logistic regression analysis, if the value of the dependent variable is greater than 0.5, the event is predicted to occur, and if it is less than 0.5, the event is predicted to not occur.
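The S-shaped logistic function and the 0.5 cutoff described above can be illustrated with a short sketch. The coefficient values, the intercept, and the function names are hypothetical and are not taken from the embodiments.

```python
import math
from typing import Sequence, Tuple


def logistic(z: float) -> float:
    """Numerically stable logistic (sigmoid) function, mapping any real z into (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_event(
    coefs: Sequence[float], intercept: float, x: Sequence[float], cutoff: float = 0.5
) -> Tuple[float, bool]:
    """Return the estimated probability and whether the event is predicted to occur."""
    z = intercept + sum(c * v for c, v in zip(coefs, x))
    p = logistic(z)
    return p, p > cutoff


# Hypothetical fitted coefficients and one patient's independent variables.
probability, predicted = predict_event([0.8, -0.5], -0.2, [1.0, 0.4])
```

Here the linear term is about 0.4, so the estimated probability is slightly below 0.6 and the event is predicted to occur.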

서포트 벡터 머신(SVM, Support Vector Machine)은 머신 러닝 학습분야 중 하나로, 패턴인식, 자료분석을 위한 지도 학습 모델이며, 주로 분류와 회기 분석을 위해 사용된다. 서포트 벡터 머신 알고리즘은 두 카테고리 중 어느 하나에 속한 데이터의 집합이 주어졌을 때, 새로운 데이터가 어느 카테고리에 속할지 판단하는 비확률적 이진 선형 분류 모델을 생성할 수 있다.Support Vector Machine (SVM) is one of the machine learning fields. It is a supervised learning model for pattern recognition and data analysis, and is mainly used for classification and regression analysis. The support vector machine algorithm can create a non-probabilistic binary linear classification model that determines which category new data belongs to, given a set of data belonging to one of two categories.

이 경우, 카테고리는 알코올 사용장애 환자의 외래 치료 유지 그룹과 외래 치료 조기 중단 그룹으로 구분될 수 있고, 새로운 데이터가 두 그룹 중 어디에 해당하는지 판단하는데 서포트 벡터 머신이 사용될 수 있다. In this case, the categories can be divided into a group of patients with alcohol use disorder maintaining outpatient treatment and a group of early discontinuation of outpatient treatment, and a support vector machine can be used to determine which of the two groups new data falls into.

랜덤 포레스트(Random Forest)는 회귀 분석 등에 사용되는 앙상블 학습 방법의 일종으로, 훈련 과정에서 구성한 다수의 결정 트리로부터 분류 또는 평균 예측치를 출력함으로써 동작한다.Random Forest is a type of ensemble learning method used in regression analysis, etc., and operates by outputting classification or average prediction values from multiple decision trees constructed during the training process.

앙상블 모델을 이용한 랜덤 포레스트 테스트 과정은, 결정 트리로부터 얻어진 결과를 평균, 곱하기 또는 과반수 투표 방식을 통해 최종 결과를 도출해 낼 수 있다. 이러한 테스트는 병렬적으로 진행될 수 있어 높은 계산 효율성을 얻을 수 있다.The random forest testing process using an ensemble model can derive the final result by averaging, multiplying, or majority voting the results obtained from the decision tree. These tests can be performed in parallel, resulting in high computational efficiency.

그래디언트 부스팅(Gradient Boosting)은 회귀 분석 또는 분류 분석을 수행할 수 있는 머신 러닝 알고리즘이며 머신 러닝 알고리즘의 앙상블 방법론 중 부스팅 계열에 속하는 알고리즘이다.Gradient Boosting is a machine learning algorithm that can perform regression or classification analysis, and is an algorithm that belongs to the boosting family of ensemble methodologies of machine learning algorithms.

부스팅이란, 약한 분류기들을 결합하여 강한 분류기를 만드는 과정이고, 그래디언트 부스팅은 이전 단계의 모델이 예측한 데이터의 오차를 가지고, 이 오차를 0으로 만드는 것을 목표로 새로운 단계의 모델을 만드며, 이러한 모델들을 결합하여 모델을 생성하는 방식의 알고리즘이다.Boosting is the process of creating a strong classifier by combining weak classifiers, and gradient boosting takes the error of the data predicted by the model in the previous step and creates a new step model with the goal of reducing this error to 0. This model It is an algorithm that combines them to create a model.

에이다부스트(Adaboost)는 다른 학습 알고리즘의 결과물들에 가중치를 두어 더하는 방법으로 최종 결과물을 표현하는 머신 러닝 알고리즘이다.Adaboost is a machine learning algorithm that expresses the final result by adding weight to the results of other learning algorithms.

한편, 상술한 머신 러닝 알고리즘은 일 예로서, 본 발명의 실시예들은 이에 한정되지 않는다. Meanwhile, the above-described machine learning algorithm is an example, and embodiments of the present invention are not limited thereto.

예측 모델 생성부(120)는 학습 데이터군(Training Data Set)에 머신 러닝 알고리즘을 적용하여 해당 머신 러닝 알고리즘에 대응되는 예측 모델을 생성할 수 있다.The prediction model generator 120 may apply a machine learning algorithm to a training data set to generate a prediction model corresponding to the machine learning algorithm.

도 6은 본 발명의 실시예들에 따른 예측 모델 생성부(120)가 성능평가지표에 따라 예측 모델을 결정하는 동작을 나타낸 도면이다. Figure 6 is a diagram showing an operation of the prediction model generator 120 determining a prediction model according to a performance evaluation index according to embodiments of the present invention.

도 6을 참조하면, 예측 모델 생성부(120)는 머신 러닝 알고리즘이 복수일 때, 전처리부(110)로부터 수신한 가공 데이터 중 학습 데이터군에 대응하는 부분에 복수의 머신 러닝 알고리즘을 적용하여 생성된 복수의 후보 예측 모델들 각각에 대해, 가공 데이터 중 시험 데이터군에 대응하는 부분을 입력하여 시험 결과를 도출할 수 있다.Referring to FIG. 6, when there are multiple machine learning algorithms, the prediction model generator 120 generates a plurality of machine learning algorithms by applying them to the portion corresponding to the learning data group among the processed data received from the preprocessor 110. For each of the plurality of candidate prediction models, test results can be derived by inputting a portion of the processed data corresponding to the test data group.

그리고 예측 모델 생성부(120)는 도출된 시험 결과를 이용하여 복수의 후보 예측 모델들 각각에 대한 성능평가지표를 계산할 수 있다.And the prediction model generator 120 may calculate a performance evaluation index for each of the plurality of candidate prediction models using the derived test results.

그리고 예측 모델 생성부(120)는 복수의 후보 예측 모델들 중에서 성능평가지표의 수치가 가장 높은 후보 예측 모델을, 예측 모델로 결정할 수 있다.In addition, the prediction model generator 120 may determine the candidate prediction model with the highest performance evaluation index among the plurality of candidate prediction models as the prediction model.

이때, 성능평가지표는 일 예로, 정확도(Accuracy), 민감도(Sensitivity), 특이도(Specificity), AUC(조작특성곡선 아래 면적, Area under the ROC curve) 중 하나일 수 있다.At this time, the performance evaluation index may be, for example, one of Accuracy, Sensitivity, Specificity, and AUC (Area under the ROC curve).

Accuracy is the number of data items whose prediction matches the actual value (TP + TN) divided by the total number of predicted data items (TP + FP + FN + TN), and indicates how closely the predictions match the actual data. Here, accuracy means the proportion of all patients for whom discontinuation or maintenance of outpatient treatment was correctly predicted. TP is the number of data items that the prediction model predicted as positive and that are actually positive; FP is the number that the model predicted as positive but that are actually negative; FN is the number that the model predicted as negative but that are actually positive; and TN is the number that the model predicted as negative and that are actually negative.

Sensitivity, also called recall or hit rate, is the proportion of actual positives (TP + FN) that the prediction model predicts as positive (TP); that is, the proportion of patients who actually discontinued outpatient treatment that the prediction model correctly identified.

Specificity is the proportion of actual negatives (TN + FP) that the prediction model predicts as negative (TN); that is, the proportion of patients who actually maintained outpatient treatment that the prediction model correctly identified.
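The three count-based indicators can be computed directly from the confusion counts, as in the following sketch. It uses the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)) and assumes that label 1 denotes early discontinuation (positive) and label 0 denotes maintenance (negative); the function names and the toy labels are hypothetical.

```python
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(actual, predicted))
    return {
        "TP": sum(1 for a, p in pairs if a == 1 and p == 1),
        "FP": sum(1 for a, p in pairs if a == 0 and p == 1),
        "FN": sum(1 for a, p in pairs if a == 1 and p == 0),
        "TN": sum(1 for a, p in pairs if a == 0 and p == 0),
    }


def accuracy(c: Dict[str, int]) -> float:
    return (c["TP"] + c["TN"]) / (c["TP"] + c["FP"] + c["FN"] + c["TN"])


def sensitivity(c: Dict[str, int]) -> float:
    return c["TP"] / (c["TP"] + c["FN"])  # share of actual positives found


def specificity(c: Dict[str, int]) -> float:
    return c["TN"] / (c["TN"] + c["FP"])  # share of actual negatives found


# Hypothetical test-set labels and predictions for ten patients.
actual = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
counts = confusion_counts(actual, predicted)
```

For these ten hypothetical patients the counts are TP = 3, FP = 1, FN = 1, TN = 5, giving an accuracy of 0.8, a sensitivity of 0.75, and a specificity of 5/6.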

AUC is obtained from the ROC (Receiver Operating Characteristic) curve, which plots the true positive rate against the false positive rate, that is, sensitivity against (1 - specificity).

AUC는 ROC커브의 아래 면적으로, 최대는 1이며, 좋은 예측 모델일수록 1에 가까운 AUC 값을 가진다. AUC is the area under the ROC curve, the maximum is 1, and the better the prediction model, the closer the AUC value is to 1.
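The area under the ROC curve can be computed without drawing the curve, using the known equivalence between AUC and the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties counted as one half). The sketch below relies on that equivalence; the function name `auc`, the scores, and the label encoding (1 = positive) are assumptions for illustration.

```python
from typing import Sequence


def auc(actual: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly by score."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical predicted scores for six patients.
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]
value = auc(actual, scores)
```

In this example eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9; a model that ranks every positive above every negative would reach the maximum of 1.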

예측 모델 생성부(120)는 복수의 후보 예측 모델들에 대해, 상술한 성능평가지표를 사용하여 가장 높은 확률로 알코올 사용장애 환자의 외래 치료 조기 중단 여부를 예측할 수 있는 예측 모델을 선택할 수 있다.The prediction model generator 120 may select a prediction model that can predict whether a patient with alcohol use disorder will discontinue outpatient treatment early with the highest probability using the above-mentioned performance evaluation index among the plurality of candidate prediction models.

일 예로, 예측 모델 생성부(120)는 후보 예측 모델들 각각에 따른 성능평가지표를 계산하여 표 2를 생성할 수 있다.As an example, the prediction model generator 120 may calculate a performance evaluation index for each candidate prediction model and generate Table 2.

| Model | AUC | Accuracy | Sensitivity | Specificity |
| --- | --- | --- | --- | --- |
| Logistic Regression | 0.6914 | 0.6130 | 0.7058 | 0.6026 |
| SVM | 0.6797 | 0.7023 | 0.6470 | 0.7086 |
| Random Forest | 0.6365 | 0.7380 | 0.4705 | 0.7682 |
| Gradient Boosting | 0.6022 | 0.7083 | 0.4117 | 0.7417 |
| Adaboost | 0.7241 | 0.6428 | 0.7647 | 0.6291 |

표 2에 따르면, AUC 또는 민감도(Sensitivity)를 기준으로 하여 예측 모델을 결정하면, 에이다부스트(Adaboost) 알고리즘을 이용한 후보 예측 모델이 예측 모델로 결정될 수 있다.According to Table 2, if the prediction model is determined based on AUC or sensitivity, a candidate prediction model using the Adaboost algorithm may be determined as the prediction model.

반면, 정확도(Accuracy) 또는 특이도(Specificity)를 기준으로 하여 예측 모델을 결정하면, 랜덤 포레스트(Random Forest) 알고리즘을 이용한 후보 예측 모델이 예측 모델로 결정될 수 있다.On the other hand, if the prediction model is determined based on Accuracy or Specificity, a candidate prediction model using the Random Forest algorithm may be determined as the prediction model.
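The selection rule described above, choosing the candidate with the highest value of the chosen performance evaluation index, can be checked against the values of Table 2 with a short sketch. The dictionary layout and the function name `select_model` are illustrative assumptions; the numbers are those listed in Table 2.

```python
from typing import Dict

# Performance evaluation indices of the candidate prediction models (Table 2).
TABLE_2: Dict[str, Dict[str, float]] = {
    "Logistic Regression": {"AUC": 0.6914, "Accuracy": 0.6130, "Sensitivity": 0.7058, "Specificity": 0.6026},
    "SVM": {"AUC": 0.6797, "Accuracy": 0.7023, "Sensitivity": 0.6470, "Specificity": 0.7086},
    "Random Forest": {"AUC": 0.6365, "Accuracy": 0.7380, "Sensitivity": 0.4705, "Specificity": 0.7682},
    "Gradient Boosting": {"AUC": 0.6022, "Accuracy": 0.7083, "Sensitivity": 0.4117, "Specificity": 0.7417},
    "Adaboost": {"AUC": 0.7241, "Accuracy": 0.6428, "Sensitivity": 0.7647, "Specificity": 0.6291},
}


def select_model(table: Dict[str, Dict[str, float]], metric: str) -> str:
    """Return the candidate model with the highest value of the given metric."""
    return max(table, key=lambda model: table[model][metric])


best_by_metric = {metric: select_model(TABLE_2, metric)
                  for metric in ("AUC", "Accuracy", "Sensitivity", "Specificity")}
```

With these values, AUC and sensitivity both select Adaboost, while accuracy and specificity both select Random Forest, which shows how the choice of index changes the selected prediction model.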

도 7은 본 발명의 실시예들에 따른 성능평가지표 중 하나인 AUC를 표시한 도면이다.Figure 7 is a diagram showing AUC, one of the performance evaluation indicators according to embodiments of the present invention.

도 7을 참조하면, 일 예로, 예측 모델 생성부(120)는 AUC를 성능평가지표로 이용하여 예측 모델을 결정할 수 있다.Referring to FIG. 7 , as an example, the prediction model generator 120 may determine a prediction model using AUC as a performance evaluation index.

ROC 커브는, 일 예로, 도 7과 같이 결정될 수 있다. AUC는 ROC커브의 아래 면적을 의미하는 것으로, 예측 모델 생성부(120)는 후보 예측 모델 각각에 대한 ROC커브의 아래 면적을 계산하여 AUC값을 확인할 수 있다. As an example, the ROC curve may be determined as shown in FIG. 7. AUC refers to the area under the ROC curve, and the prediction model generator 120 can check the AUC value by calculating the area under the ROC curve for each candidate prediction model.

Referring to FIG. 7, since the AUC value of Adaboost is the largest, the prediction model generator 120 may select the candidate prediction model using Adaboost as the prediction model.

도 8은 본 발명의 실시예들에 따른 알코올 사용장애 환자의 외래 치료 조기 중단 예측 방법을 나타낸 도면이다.Figure 8 is a diagram showing a method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to embodiments of the present invention.

도 8을 참조하면, 알코올 사용장애 환자의 외래 치료 조기 중단 예측 방법은, 복수의 알코올 사용장애 환자의 데이터를 수집하는 데이터 수집 단계(S810)를 포함할 수 있다. Referring to FIG. 8, the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may include a data collection step (S810) of collecting data on a plurality of patients with alcohol use disorder.

그리고 알코올 사용장애 환자의 외래 치료 조기 중단 예측 방법은, 복수의 알코올 사용장애 환자의 외래 치료 조기 중단 여부에 대한 예측 모델을 생성하기 위한 하나 이상의 머신 러닝 알고리즘을 적용할 복수의 독립 변수들을 결정하는 독립 변수 결정 단계(S820)를 포함할 수 있다.And the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder is an independent variable that determines a plurality of independent variables to which one or more machine learning algorithms will be applied to create a prediction model for early discontinuation of outpatient treatment for patients with alcohol use disorder. A variable determination step (S820) may be included.

그리고 알코올 사용장애 환자의 외래 치료 조기 중단 예측 방법은, 복수의 알코올 사용장애 환자의 데이터를 가공하여 가공 데이터를 생성하는 전처리 단계(S830)를 포함할 수 있다.In addition, the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may include a preprocessing step (S830) of generating processed data by processing data of a plurality of patients with alcohol use disorder.

한편, 전술한 데이터 수집 단계(S810), 독립 변수 결정 단계(S820) 및 전처리 단계(S830)는 전술한 전처리부(110)에 의해 실행될 수 있다.Meanwhile, the above-described data collection step (S810), independent variable determination step (S820), and preprocessing step (S830) may be executed by the above-described preprocessor 110.

그리고 알코올 사용장애 환자의 외래 치료 조기 중단 예측 방법은, 가공 데이터를 수신하고, 복수의 알코올 사용장애 환자의 외래 치료 조기 중단 여부를 종속 변수로 설정하고, 독립 변수들을 바탕으로 하여, 가공 데이터 중 전체 또는 일부에 하나 이상의 머신 러닝 알고리즘을 적용하여 예측 모델을 생성하는 예측 모델 생성 단계(S840)를 포함할 수 있다. 한편, 예측 모델 생성 단계(S840)는 전술한 예측 모델 생성부(120)에 의해 실행될 수 있다.And the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder is to receive processed data, set whether or not early discontinuation of outpatient treatment for multiple alcohol use disorder patients is a dependent variable, and based on the independent variables, determine the total of the processed data. Alternatively, it may include a prediction model generation step (S840) in which a prediction model is generated by applying one or more machine learning algorithms to some parts. Meanwhile, the prediction model creation step (S840) may be executed by the prediction model creation unit 120 described above.

In addition, the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may include a prediction step (S850) of inputting all or part of the processed data into the prediction model to generate prediction results as to whether the plurality of alcohol use disorder patients will discontinue outpatient treatment early. The prediction step (S850) may be performed by the prediction unit 130 described above.

그리고 알코올 사용장애 환자의 외래 치료 조기 중단 예측 방법은, 예측 결과를 출력하는 출력 단계(S860)를 포함할 수 있다. 한편, 출력 단계(S860)는 전술한 출력부(140)에 의해 실행될 수 있다.Additionally, the method for predicting early cessation of outpatient treatment for patients with alcohol use disorder may include an output step (S860) that outputs the prediction result. Meanwhile, the output step (S860) may be executed by the output unit 140 described above.

For example, in the independent variable determination step (S820), when the independent variables are determined, variance inflation factors (VIF) may be calculated for the independent variables to resolve multicollinearity among them, and the independent variables may be determined so that each variance inflation factor remains at or below a preset threshold.

일 예로, 전처리 단계(S830)는, 복수의 알코올 사용장애 환자 데이터를, 학습 데이터군과 시험 데이터군으로 분류하는 단계를 포함할 수 있다.As an example, the preprocessing step (S830) may include classifying a plurality of alcohol use disorder patient data into a learning data group and a test data group.

그리고 전처리 단계(S830)는, 학습 데이터군에 대한 종속 변수인 외래 치료 유지 클래스와 외래 치료 조기 중단 클래스 중에서 소수인 클래스에 오버샘플링을 적용하는 단계를 포함할 수 있다.And the preprocessing step (S830) may include applying oversampling to a minority class among the outpatient treatment maintenance class and the outpatient treatment early discontinuation class, which are dependent variables for the learning data group.

예측 모델 생성 단계(S840)는, 일 예로, 가공 데이터 중 학습 데이터군에 대응하는 부분에, 하나 이상의 머신 러닝 알고리즘을 적용하여 예측 모델을 생성할 수 있다. 이때, 하나 이상의 머신 러닝 알고리즘은, 1) 로지스틱 회귀(Logistic Regression), 2) 서포트 벡터 머신(SVM, Support Vector Machine), 3) 랜덤 포레스트(Random Forest), 4) 그래디언트 부스팅(Gradient Boosting) 및 5) 에이다부스트(Adaboost)중 하나 이상일 수 있다.For example, in the prediction model creation step (S840), a prediction model may be generated by applying one or more machine learning algorithms to a portion of the processed data corresponding to the learning data group. At this time, one or more machine learning algorithms include: 1) Logistic Regression, 2) Support Vector Machine (SVM), 3) Random Forest, 4) Gradient Boosting, and 5 ) It may be one or more of Adaboost.

한편, 예측 모델 생성 단계(S840)는, 일 예로, 1) 머신 러닝 알고리즘이 복수일 때, 가공 데이터 중 학습 데이터군에 대응하는 부분에 복수의 머신 러닝 알고리즘을 적용하여 생성된 복수의 후보 예측 모델들 각각에 대해, 가공 데이터 중 시험 데이터군에 대응하는 부분을 입력하여 시험 결과를 도출하는 단계, 2) 시험 결과를 이용하여 복수의 후보 예측 모델들 각각에 대한 성능평가지표를 계산하는 단계, 3) 복수의 후보 예측 모델들 중에서 성능평가지표의 수치가 가장 높은 후보 예측 모델을, 예측 모델로 결정하는 단계 를 포함할 수 있다.Meanwhile, the prediction model creation step (S840) is, for example, 1) when there are multiple machine learning algorithms, a plurality of candidate prediction models generated by applying a plurality of machine learning algorithms to the portion corresponding to the learning data group among the processed data. For each of these, a step of deriving test results by inputting a portion of the processed data corresponding to the test data group, 2) calculating a performance evaluation index for each of a plurality of candidate prediction models using the test results, 3) ) It may include the step of determining the candidate prediction model with the highest performance evaluation index among the plurality of candidate prediction models as the prediction model.

이때, 성능평가지표는, AUC(Area under the ROC curve)일 수 있다.At this time, the performance evaluation index may be AUC (Area under the ROC curve).

전술한 알코올 사용장애 환자의 외래 치료 조기 중단 예측 시스템(100)은, 프로세서, 메모리, 사용자 입력장치, 프레젠테이션 장치 중 적어도 일부를 포함하는 컴퓨팅 장치에 의해 구현될 수 있다. 메모리는, 프로세서에 의해 실행되면 특정 태스크를 수행할 있도록 코딩되어 있는 컴퓨터-판독가능 소프트웨어, 애플리케이션, 프로그램 모듈, 루틴, 인스트럭션(instructions), 및/또는 데이터 등을 저장하는 매체이다. 프로세서는 메모리에 저장되어 있는 컴퓨터-판독가능 소프트웨어, 애플리케이션, 프로그램 모듈, 루틴, 인스트럭션, 및/또는 데이터 등을 판독하여 실행할 수 있다. 사용자 입력장치는 사용자로 하여금 프로세서에게 특정 태스크를 실행하도록 하는 명령을 입력하거나 특정 태스크의 실행에 필요한 데이터를 입력하도록 하는 수단일 수 있다. 사용자 입력장치는 물리적인 또는 가상적인 키보드나 키패드, 키버튼, 마우스, 조이스틱, 트랙볼, 터치-민감형 입력수단, 또는 마이크로폰 등을 포함할 수 있다. 프레젠테이션 장치는 디스플레이, 프린터, 스피커, 또는 진동장치 등을 포함할 수 있다.The system 100 for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder described above may be implemented by a computing device that includes at least some of a processor, memory, user input device, and presentation device. Memory is a medium that stores computer-readable software, applications, program modules, routines, instructions, and/or data that are coded to perform specific tasks when executed by a processor. The processor may read and execute computer-readable software, applications, program modules, routines, instructions, and/or data stored in memory. A user input device may be a means for allowing a user to input a command that causes the processor to execute a specific task or to input data required to execute a specific task. User input devices may include a physical or virtual keyboard, keypad, key buttons, mouse, joystick, trackball, touch-sensitive input means, or microphone. Presentation devices may include displays, printers, speakers, or vibrating devices.

컴퓨팅 장치는 스마트폰, 태블릿, 랩탑, 데스크탑, 서버, 클라이언트 등의 다양한 장치를 포함할 수 있다. 컴퓨팅 장치는 하나의 단일한 스탠드-얼론 장치일 수도 있고, 통신망을 통해 서로 협력하는 다수의 컴퓨팅 장치들로 이루어진 분산형 환경에서 동작하는 다수의 컴퓨팅 장치를 포함할 수 있다.Computing devices may include a variety of devices such as smartphones, tablets, laptops, desktops, servers, and clients. A computing device may be a single stand-alone device or may include multiple computing devices operating in a distributed environment comprised of multiple computing devices cooperating with each other through a communication network.

In addition, the above-described method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may be executed by a computing device that includes a processor and a memory storing computer-readable software, applications, program modules, routines, instructions, and/or data structures coded so that, when executed by the processor, the method is performed.

상술한 본 실시예들은 다양한 수단을 통해 구현될 수 있다. 예를 들어, 본 실시예들은 하드웨어, 펌웨어(firmware), 소프트웨어 또는 그것들의 결합 등에 의해 구현될 수 있다.The above-described embodiments can be implemented through various means. For example, the present embodiments may be implemented by hardware, firmware, software, or a combination thereof.

In the case of a hardware implementation, the embodiments may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), processors, controllers, microcontrollers, or microprocessors.

For example, the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to the embodiments may be implemented using an artificial intelligence semiconductor device in which the neurons and synapses of a deep neural network are implemented with semiconductor elements. The semiconductor elements may be currently used elements such as SRAM, DRAM, or NAND, next-generation elements such as RRAM, STT MRAM, or PRAM, or a combination thereof.

When the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to the embodiments is implemented using an artificial intelligence semiconductor device, the result (weights) of training the deep learning model in software may be transferred to synapse-mimicking elements arranged in an array, or the training may be performed on the artificial intelligence semiconductor device itself.

In the case of a firmware or software implementation, the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to the embodiments may be implemented in the form of a device, procedure, or function that performs the functions or operations described above. The software code may be stored in a memory unit and run by a processor. The memory unit may be located inside or outside the processor and may exchange data with the processor by various known means.

In addition, terms such as "system", "processor", "controller", "component", "module", "interface", "model", or "unit" used above may generally refer to computer-related entity hardware, a combination of hardware and software, software, or software in execution. For example, the components described above may be, but are not limited to, a process run by a processor, a processor, a controller, a control processor, an object, a thread of execution, a program, and/or a computer. For example, both an application running on a controller or processor and the controller or processor itself may be components. One or more components may reside within a process and/or thread of execution, and the components may be located on a single device (e.g., a system, a computing device, etc.) or distributed across two or more devices.

Meanwhile, another embodiment provides a computer program, stored on a computer recording medium, that performs the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder described above. Yet another embodiment provides a computer-readable recording medium on which a program for implementing that method is recorded.

The program recorded on the recording medium may be read by, installed on, and executed by a computer to carry out the steps described above.

For the computer to read the program recorded on the recording medium and execute the functions implemented by the program, the program may include code written in a computer language such as C, C++, JAVA, or machine language that the computer's processor (CPU) can read through the computer's device interface.

Such code may include functional code related to the functions that define the functions described above, and may also include control code related to the execution procedure needed for the computer's processor to execute those functions in a predetermined order.

Such code may further include memory-reference code indicating the location (address) in the computer's internal or external memory at which additional information or media needed for the computer's processor to execute the functions described above should be referenced.

In addition, when the computer's processor needs to communicate with another remote computer or server to execute the functions described above, the code may further include communication-related code specifying how the processor should communicate with the remote computer or server using the computer's communication module, and what information or media should be transmitted and received during communication.

Examples of a computer-readable recording medium on which the program described above is recorded include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical media storage devices; the medium may also be implemented in the form of a carrier wave (e.g., transmission over the Internet).

The computer-readable recording medium may also be distributed across computer systems connected by a network, so that computer-readable code is stored and executed in a distributed manner.

Furthermore, the functional program for implementing the present invention, together with its related code and code segments, may be readily inferred or modified by programmers in the technical field to which the present invention belongs, taking into account the system environment of the computer that reads the recording medium and executes the program.

The method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may also be implemented in the form of a recording medium containing computer-executable instructions, such as an application or program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and non-volatile media as well as removable and non-removable media. A computer-readable medium may also include any computer storage medium. Computer storage media include volatile and non-volatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may be executed by an application installed by default on a terminal (which may include a program included in a platform or operating system preloaded on the terminal), or by an application (i.e., a program) that the user installs directly on a master terminal through an application providing server such as an application store server or a web server associated with the application or the corresponding service. In this sense, the method described above may be implemented as an application (i.e., a program) installed by default on the terminal or installed directly by the user, and may be recorded on a computer-readable recording medium such as the terminal.

The foregoing description of the present invention is for illustrative purposes, and those of ordinary skill in the art to which the present invention pertains will understand that it can be readily modified into other specific forms without changing the technical idea or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. For example, a component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in a combined form.

The scope of the present invention is defined by the claims below rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

100: System for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder
110: Preprocessing unit
120: Prediction model generation unit
130: Prediction unit
140: Output unit

Claims

A system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, the system comprising:
a preprocessing unit that collects data of a plurality of alcohol use disorder patients, determines a plurality of independent variables to which one or more machine learning algorithms will be applied to generate a prediction model for early discontinuation of outpatient treatment by the plurality of alcohol use disorder patients, and processes the data of the plurality of alcohol use disorder patients to generate processed data;
a prediction model generation unit that receives the processed data, sets whether the plurality of alcohol use disorder patients discontinue outpatient treatment early as a dependent variable, and generates the prediction model by applying the one or more machine learning algorithms to all or part of the processed data based on the independent variables;
a prediction unit that inputs all or part of the processed data into the prediction model to generate a prediction result regarding whether the plurality of alcohol use disorder patients discontinue outpatient treatment early; and
an output unit that outputs the prediction result.

The system according to claim 1,
wherein the preprocessing unit, when determining the independent variables, calculates variance inflation factors (VIF) for the independent variables in order to address multicollinearity among the independent variables, and
determines the independent variables such that the variance inflation factors remain below a preset threshold.
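
By way of non-limiting illustration, the VIF-based variable determination recited above might be sketched as follows. The sketch relies on the standard identity that the VIF of each variable equals the corresponding diagonal entry of the inverse correlation matrix. The variable names, the sample values, the threshold of 10, and the strategy of dropping the highest-VIF variable one at a time are all assumptions made for illustration; the claims only require that the VIF remain below a preset threshold.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of equally long numeric columns (no constant column)."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]

def invert(matrix):
    """Gauss-Jordan matrix inversion with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("variables are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def vif(columns):
    """VIF of variable j is the j-th diagonal entry of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

def select_variables(data, threshold=10.0):
    """Drop the highest-VIF variable until every VIF is below the threshold."""
    kept = dict(data)
    while len(kept) > 1:
        scores = dict(zip(kept, vif(list(kept.values()))))
        worst = max(scores, key=scores.get)
        if scores[worst] < threshold:
            break
        del kept[worst]
    return list(kept)

# Hypothetical candidate variables; audit_score is nearly 2 * drinks_per_week.
data = {
    "age": [34, 51, 42, 29, 60, 47, 38, 55],
    "drinks_per_week": [10, 25, 14, 30, 8, 20, 12, 18],
    "audit_score": [21, 49, 29, 61, 17, 40, 25, 35],
}
selected = select_variables(data)
print(selected)
```

In this example one of the two nearly collinear variables is removed, and the remaining pair has VIFs close to 1.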

The system according to claim 1,
wherein the preprocessing unit classifies the data of the plurality of alcohol use disorder patients into a training data group and a test data group.
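
As a minimal sketch of dividing patient data into a learning (training) data group and a test data group, the following example performs a seeded, stratified split. Stratifying by the dependent variable, the 80/20 ratio, and the field names `patient_id` and `dropout` are assumptions for illustration and are not specified in the claims.

```python
import random

def split_train_test(records, label_key, test_ratio=0.2, seed=0):
    """Split records so that each class keeps roughly the same share in both groups."""
    rng = random.Random(seed)
    by_class = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)
    train, test = [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's list is not reordered
        rng.shuffle(members)
        n_test = max(1, round(len(members) * test_ratio))
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test

# Hypothetical records: 5 early-discontinuation cases (1) and 15 maintenance cases (0).
records = [{"patient_id": i, "dropout": 1 if i < 5 else 0} for i in range(20)]
train_group, test_group = split_train_test(records, "dropout")
print(len(train_group), len(test_group))
```

Fixing the seed keeps the split reproducible, so that candidate models are compared on the same test data group.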

The system according to claim 3,
wherein the preprocessing unit applies oversampling, for the training data group, to the minority class among the outpatient treatment maintenance class and the outpatient treatment early discontinuation class, which are the classes of the dependent variable.
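
The oversampling of the minority class could, for example, be realized by random duplication of minority-class rows, as sketched below. The claims do not name a specific oversampling technique, so random oversampling with replacement is an assumption here, as are the function name and the toy data. Consistent with the claim, the sketch is meant to be applied to the learning (training) data group only.

```python
import random
from collections import Counter

def oversample_minority(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until both classes are equal in size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    minority, n_min = min(counts.items(), key=lambda kv: kv[1])
    n_max = max(counts.values())
    pool = [r for r, y in zip(rows, labels) if y == minority]
    extra = [rng.choice(pool) for _ in range(n_max - n_min)]
    return list(rows) + extra, list(labels) + [minority] * len(extra)

# Hypothetical training group: 8 maintenance cases (0), 2 early-discontinuation cases (1).
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2
new_rows, new_labels = oversample_minority(rows, labels)
print(Counter(new_labels))
```

Leaving the test data group untouched keeps the later performance evaluation representative of the original class distribution.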

The system according to claim 4,
wherein the prediction model generation unit generates the prediction model by applying the one or more machine learning algorithms to the portion of the processed data corresponding to the training data group, and
wherein the one or more machine learning algorithms include one or more of Logistic Regression, Support Vector Machine (SVM), Random Forest, Gradient Boosting, and AdaBoost.
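
To illustrate one of the listed algorithms, the following is a minimal logistic regression trained by batch gradient descent on a learning (training) data group. The feature meanings, learning rate, and epoch count are hypothetical choices for this sketch; the claims do not specify how any of the algorithms is implemented or tuned.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(model, x):
    """Predicted probability of early discontinuation for one patient."""
    w, b = model
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

# Hypothetical scaled features: [missed-visit rate, social-support score].
X_train = [[0.9, 0.1], [0.8, 0.3], [0.7, 0.2], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
y_train = [1, 1, 1, 0, 0, 0]  # 1 = early discontinuation, 0 = maintenance
model = fit_logistic_regression(X_train, y_train)
p_high = predict_proba(model, [0.85, 0.2])
p_low = predict_proba(model, [0.15, 0.85])
print(p_high > 0.5, p_low < 0.5)
```

The returned probability can serve as the score from which a prediction result, and later a performance evaluation index, is derived.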

The system according to claim 5,
wherein, when there are a plurality of machine learning algorithms, the prediction model generation unit:
derives test results by inputting the portion of the processed data corresponding to the test data group into each of a plurality of candidate prediction models generated by applying the plurality of machine learning algorithms to the portion of the processed data corresponding to the training data group;
calculates a performance evaluation index for each of the plurality of candidate prediction models using the test results; and
determines, as the prediction model, the candidate prediction model with the highest performance evaluation index among the plurality of candidate prediction models.

The system according to claim 6,
wherein the performance evaluation index is the AUC (area under the ROC curve).
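
The selection of a prediction model by AUC among several candidates can be sketched as follows. The AUC is computed here through its pairwise-ranking interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half), which equals the area under the ROC curve. The candidate names and their test scores are invented purely for illustration.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores (ties count 0.5)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def select_best_model(candidate_scores, y_test):
    """Return the name of the candidate with the highest test AUC, plus all AUCs."""
    aucs = {name: roc_auc(y_test, s) for name, s in candidate_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs

# Hypothetical test-group labels and scores produced by three candidate models.
y_test = [1, 1, 0, 0, 1, 0]
candidate_scores = {
    "logistic_regression": [0.9, 0.6, 0.4, 0.7, 0.8, 0.2],
    "random_forest": [0.8, 0.7, 0.3, 0.4, 0.9, 0.1],
    "svm": [0.5, 0.4, 0.6, 0.3, 0.7, 0.5],
}
best, aucs = select_best_model(candidate_scores, y_test)
print(best, aucs)
```

The pairwise form is quadratic in the number of test cases, which is acceptable for a sketch; a rank-based computation would be preferable for large test groups.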

A method for predicting early discontinuation of outpatient treatment for a plurality of patients with alcohol use disorder, the method comprising:
a data collection step of collecting data of the plurality of alcohol use disorder patients, an independent variable determination step of determining a plurality of independent variables to which one or more machine learning algorithms will be applied to generate a prediction model for whether the plurality of alcohol use disorder patients discontinue outpatient treatment early, and a preprocessing step of processing the data of the plurality of alcohol use disorder patients to generate processed data, each performed through the preprocessing unit included in the system according to any one of claims 1 to 7;
a prediction model generation step of, through the prediction model generation unit included in the system according to any one of claims 1 to 7, receiving the processed data, setting whether the plurality of alcohol use disorder patients discontinue outpatient treatment early as a dependent variable, and generating the prediction model by applying the one or more machine learning algorithms to all or part of the processed data based on the independent variables;
a prediction step of, through the prediction unit included in the system according to any one of claims 1 to 7, inputting all or part of the processed data into the prediction model to generate a prediction result regarding whether the plurality of alcohol use disorder patients discontinue outpatient treatment early; and
an output step of outputting the prediction result through the output unit included in the system according to any one of claims 1 to 7.

The method according to claim 8,
wherein the independent variable determination step comprises calculating variance inflation factors (VIF) for the independent variables in order to address multicollinearity among the independent variables, and
determining the independent variables such that the variance inflation factors remain below a preset threshold.

The method according to claim 8,
wherein the preprocessing step comprises classifying the data of the plurality of alcohol use disorder patients into a training data group and a test data group.

The method according to claim 10,
wherein the preprocessing step comprises applying oversampling, for the training data group, to the minority class among the outpatient treatment maintenance class and the outpatient treatment early discontinuation class, which are the classes of the dependent variable.

The method according to claim 11,
wherein the prediction model generation step comprises generating the prediction model by applying the one or more machine learning algorithms to the portion of the processed data corresponding to the training data group, and
wherein the one or more machine learning algorithms include one or more of Logistic Regression, Support Vector Machine (SVM), Random Forest, Gradient Boosting, and AdaBoost.

The method according to claim 12,
wherein, when there are a plurality of machine learning algorithms, the prediction model generation step comprises:
deriving test results by inputting the portion of the processed data corresponding to the test data group into each of a plurality of candidate prediction models generated by applying the plurality of machine learning algorithms to the portion of the processed data corresponding to the training data group;
calculating a performance evaluation index for each of the plurality of candidate prediction models using the test results; and
determining, as the prediction model, the candidate prediction model with the highest performance evaluation index among the plurality of candidate prediction models.

The method according to claim 13,
wherein the performance evaluation index is the AUC (area under the ROC curve).