WO2023003315A1

WO2023003315A1 - System and method for predicting early discontinuation of treatment of outpatient with alcohol use disorder

Info

Publication number: WO2023003315A1
Application number: PCT/KR2022/010516
Authority: WO
Inventors: 김대진; 최인영; 박소진; 전지원; 박성웅
Original assignee: (주)디지털팜
Priority date: 2021-07-22
Filing date: 2022-07-19
Publication date: 2023-01-26
Also published as: KR102601514B1; KR20230015009A

Abstract

Embodiments of the present invention pertain to a system and method for predicting early discontinuation of treatment of outpatients with alcohol use disorder. According to embodiments of the present invention, a plurality of independent variables to which a machine learning algorithm will be applied are determined, the machine learning algorithm is applied to generate a prediction model, and processed data is input into the prediction model to generate a prediction result indicating whether outpatients with alcohol use disorder will discontinue treatment early.

Description

System and method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder

The present invention provides a system and method for predicting early discontinuation of outpatient treatment for alcohol use disorder patients.

Conventionally, in order to increase the treatment rate of patients with alcohol use disorder, studies on factors affecting continuous outpatient treatment have been conducted. However, most of these studies were only prospective studies, and even retrospective studies focused on traditional methodologies such as regression analysis.

Machine learning is the study of computer algorithms that automatically improve through experience and the use of data. Machine learning is also considered a part of artificial intelligence. Machine learning algorithms do not explicitly program specific actions; instead, machine learning algorithms can be used to build models to make predictions or decisions based on samples called training data. Machine learning can be used for a variety of applications, including medicine, speech recognition and computer vision.

Predictive models based on machine learning can classify with high accuracy. In recent psychiatric research, predictive models based on machine learning have proven useful in developing decision support systems.

Alcohol use disorder may cause not only physical diseases such as alcohol-induced physical complications and alcohol-related dementia, but also social problems such as alcohol-related crimes and accidents, and enormous economic losses.

According to the 2016 Epidemiologic Survey of Mental Disorders in Korea, the lifetime prevalence of alcohol use disorder, which includes alcohol dependence and abuse, was 12.2% (male 18.1%, female 6.4%), which is the highest prevalence rate compared to other mental disorders.

Alcohol use disorder has a higher relapse rate than other mental disorders. In order to prevent recurrence, it is necessary to be managed over a long period of time, not terminated with a single treatment. In addition, continuous treatment can have a positive effect on the treatment outcome. Therefore, continuous follow-up of patients is an important indicator to evaluate the prognosis of alcohol use disorder.

In practice, the outpatient treatment retention rate of patients with alcohol use disorder is quite low. In foreign countries, 52% to 75% of patients receiving outpatient treatment for alcohol use disorder are lost to follow-up by the fourth treatment session. According to a domestic study, 91.7% of patients discontinued follow-up within 6 months of discharge. It is therefore important to predict and manage those patients with alcohol use disorder whose follow-up is likely to be discontinued prematurely.

Embodiments of the present invention can predict whether an alcohol use disorder patient will discontinue outpatient treatment early by calculating the probability of early withdrawal from outpatient treatment using a predictive model designed through machine learning.

Embodiments of the present invention help in patient management so that patients with alcohol use disorder who have a high risk of early outpatient treatment discontinuation can continue treatment steadily, and ultimately contribute to preventing relapse of patients and increasing the success rate of treatment.

However, the technical problem to be achieved by the present embodiment is not limited to the technical problem as described above, and other technical problems may exist.

In one aspect, embodiments of the present invention provide a system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, the system including: a pre-processing unit that collects data of a plurality of alcohol use disorder patients, determines a plurality of independent variables to which one or more machine learning algorithms are to be applied to generate a predictive model for early discontinuation of outpatient treatment of the plurality of alcohol use disorder patients, and generates processed data by processing the data of the plurality of alcohol use disorder patients; a predictive model generating unit that receives the processed data, sets whether or not the outpatient treatment of the plurality of alcohol use disorder patients is prematurely discontinued as a dependent variable, and applies the one or more machine learning algorithms to all or part of the processed data based on the independent variables to generate the predictive model; a prediction unit that inputs all or part of the processed data into the predictive model to generate a prediction result regarding whether or not the outpatient treatment of the plurality of alcohol use disorder patients is prematurely discontinued; and an output unit that outputs the prediction result.

In another aspect, embodiments of the present invention provide a method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, the method including: a data collection step of collecting data of a plurality of alcohol use disorder patients; an independent variable determination step of determining a plurality of independent variables to which one or more machine learning algorithms are to be applied to generate a predictive model for early discontinuation of outpatient treatment of the plurality of alcohol use disorder patients; a pre-processing step of generating processed data by processing the data of the plurality of alcohol use disorder patients; a predictive model generating step of receiving the processed data, setting whether or not the outpatient treatment of the plurality of alcohol use disorder patients is prematurely discontinued as a dependent variable, and applying the one or more machine learning algorithms to all or part of the processed data based on the independent variables to generate the predictive model; a prediction step of generating a prediction result on whether the plurality of alcohol use disorder patients will prematurely discontinue outpatient treatment by inputting all or part of the processed data into the predictive model; and an output step of outputting the prediction result.

According to embodiments of the present invention, it is possible to predict whether an alcohol use disorder patient will discontinue outpatient treatment early by calculating the probability of early withdrawal from outpatient treatment using a predictive model designed through machine learning.

According to embodiments of the present invention, it is possible to support the management of patients with alcohol use disorder who have a high risk of discontinuing outpatient treatment early so that they can consistently maintain treatment, and ultimately to contribute to preventing relapse and increasing the success rate of treatment.

FIG. 1 is a schematic configuration diagram of a system for predicting early discontinuation of outpatient treatment for alcohol use disorder patients according to embodiments of the present invention.

FIG. 2 is a flowchart illustrating a variable determination operation performed by a preprocessor according to embodiments of the present invention.

FIG. 3 is a diagram illustrating an operation of classifying data of an alcohol use disorder patient into a learning data group and a test data group by a preprocessing unit according to embodiments of the present invention.

FIG. 4 is a diagram illustrating an operation of performing sampling on a specific class by a pre-processor according to embodiments of the present invention.

FIG. 5 is a diagram illustrating an example of an operation of generating a predictive model by applying one or more machine learning algorithms to a training data group by a predictive model generator according to embodiments of the present invention.

FIG. 6 is a diagram illustrating an operation of determining a predictive model according to a performance evaluation index of a predictive model generator according to embodiments of the present invention.

FIG. 7 is a diagram showing AUC, which is one of the performance evaluation indicators according to embodiments of the present invention.

FIG. 8 is a diagram illustrating a method for predicting early discontinuation of outpatient treatment for alcohol use disorder patients according to embodiments of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily practice them. However, the present invention may be implemented in many different forms and is not limited to the embodiments described herein. And in order to clearly explain the present invention in the drawings, parts irrelevant to the description are omitted, and reference numerals are attached to similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case of being "directly connected" but also the case of being "electrically connected" with another element in between. In addition, when a part is said to "include" a certain component, this means that it may further include other components, rather than excluding them, unless otherwise stated, and it should be understood that this does not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

As used throughout the specification, terms of degree such as "about" and "substantially" mean at, or close to, the stated value when manufacturing and material tolerances inherent in the stated meaning are given, and are used to prevent an unscrupulous infringer from unfairly exploiting disclosure in which exact or absolute figures are stated to aid understanding of the present invention. The term "step of (doing)" or "step of" as used throughout the specification of the present invention does not mean "step for".

In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. Further, one unit may be realized using two or more hardware, and two or more units may be realized by one hardware.

Referring to FIG. 1, a system 100 for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to embodiments of the present invention may include a preprocessor 110, a predictive model generator 120, a predictor 130, and an output unit 140.

The pre-processing unit 110 may collect data of a plurality of alcohol use disorder patients. In addition, the preprocessing unit 110 may determine a plurality of independent variables to which one or more machine learning algorithms are applied to generate a predictive model for whether or not a plurality of patients with alcohol use disorder prematurely discontinue outpatient treatment. In addition, the pre-processing unit 110 may generate processed data by processing data of a plurality of alcohol use disorder patients in order to apply one or more machine learning algorithms described above.

When the pre-processing unit 110 collects data of patients with alcohol use disorder, wired/wireless communication with a server or terminal storing the data may be used. For example, the pre-processing unit 110 may receive medical data of alcohol use disorder patients from one or more medical institutions. At this time, data of a plurality of alcohol use disorder patients may be standardized as a common data model (CDM, Common Data Model).

For example, data of a plurality of alcohol use disorder patients may be collected from a Clinical Data Warehouse (CDW). The clinical data warehouse (CDW) may transmit data extracted according to research characteristics to the pre-processing unit 110 through de-identification.

On the other hand, in order to increase the sensitivity of the predictive model, a plurality of alcohol use disorder patients may be selected from patients with a hospitalization period of 2 weeks or more.

In this case, for a patient among the plurality of alcohol use disorder patients who has been hospitalized two or more times for two weeks or longer, the hospitalization date may be defined as the date of the first hospitalization.

Whether or not patients with alcohol use disorder continue to visit the outpatient clinic can be defined as whether or not the patient visits the outpatient clinic at least once a month for 6 months after being discharged from the hospital.

In addition, patients who died within 6 months of discharge can be excluded.
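The outcome definition above can be sketched as a small labeling function. This is only an illustration under stated assumptions: the function name `label_patient`, the label strings, and the approximation of a "month" as a 30-day window counted from the discharge date are all hypothetical choices that do not appear in the specification.

```python
from datetime import date, timedelta

WINDOW_DAYS = 30  # assumption: one "month" is approximated as 30 days
N_WINDOWS = 6     # six months of follow-up after discharge


def label_patient(discharge, visits, death=None):
    """Return 'maintained', 'early_discontinuation', or None if excluded.

    discharge: discharge date; visits: iterable of outpatient visit dates;
    death: date of death, if any (hypothetical field).
    """
    follow_up_end = discharge + timedelta(days=WINDOW_DAYS * N_WINDOWS)
    # Patients who died within the follow-up period are excluded.
    if death is not None and death <= follow_up_end:
        return None
    for k in range(N_WINDOWS):
        start = discharge + timedelta(days=WINDOW_DAYS * k)
        stop = start + timedelta(days=WINDOW_DAYS)
        # A window without any visit counts as early discontinuation.
        if not any(start < v <= stop for v in visits):
            return "early_discontinuation"
    return "maintained"


discharge_day = date(2021, 1, 1)
monthly_visits = [discharge_day + timedelta(days=15 + 30 * k) for k in range(6)]
print(label_patient(discharge_day, monthly_visits))
```

A calendar-month definition would work equally well; the 30-day window is used here only to keep the sketch short.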

The preprocessing unit 110 may determine a plurality of independent variables to which one or more machine learning algorithms for generating a predictive model for early discontinuation of outpatient treatment of a plurality of alcohol use disorder patients will be applied.

For example, the preprocessing unit 110 may determine the independent variables from among variables including 1) the patient's age, 2) gender, 3) hospitalization period, 4) address, 5) medical department, 6) the presence of comorbidities such as diabetes, liver disease, depressive disorder, and anxiety disorder diagnosed within 1 year before hospitalization, 7) whether the patient received outpatient treatment for alcohol use disorder before hospitalization, and 8) whether naltrexone was prescribed.

Meanwhile, statistical analyses such as a t-test and a chi-square test may be performed on the determined independent variables in order to confirm the differences between the outpatient treatment maintenance group and the outpatient treatment early discontinuation group among the plurality of alcohol use disorder patients.

The t-test is a statistical method for verifying whether the difference in average between two groups is significant. The chi-square test is a statistical method based on a chi-square distribution, and is a test method used to test whether an observed frequency is significantly different from an expected frequency.
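As a rough illustration of the chi-square test described above, the following sketch computes Pearson's chi-square statistic for a 2x2 contingency table using only the standard library. The function name `chi_square_2x2` is hypothetical, the example counts are the gender counts from Table 1, and no continuity correction is applied, so the resulting p-value is not claimed to reproduce the value reported in the table.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    table: ((a, b), (c, d)) of observed counts. Returns (statistic, p_value).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Rows: maintained / early discontinuation; columns: male / female.
gender_table = ((91, 35), (590, 123))
statistic, p = chi_square_2x2(gender_table)
print(f"chi-square = {statistic:.2f}, significant at 0.05: {p < 0.05}")
```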

As an example, the results of the statistical analysis described above may be shown in Table 1.

	Maintained outpatient treatment (n=126)	Early discontinuation of outpatient treatment (n=713)	P-value
Hospitalization period			0.406
28 days or less	77 (61.1%)	437 (61.3%)
29-56 days	25 (19.8%)	136 (19.1%)
57-70 days	15 (11.9%)	110 (15.4%)
70 days or more	9 (7.1%)	30 (4.2%)
Gender			0.008*
Male	91 (72.2%)	590 (82.7%)
Female	35 (27.8%)	123 (17.3%)
Age			0.058
29 or younger	9 (7.1%)	22 (3.1%)
30-39	22 (17.5%)	96 (13.5%)
40-49	29 (23.0%)	201 (28.2%)
50-59	30 (23.8%)	216 (30.3%)
60 or older	36 (28.6%)	178 (25.0%)
Address			0.04
Seoul	37 (29.4%)	144 (20.2%)
Gyeonggi	75 (59.5%)	451 (63.3%)
Other	14 (11.1%)	118 (16.5%)
Medical department			0.015
Psychiatry	111 (88.1%)	546 (76.6%)
Gastroenterology	9 (7.1%)	104 (14.6%)
Other	6 (4.8%)	63 (8.8%)
Outpatient treatment for alcohol use disorder before hospitalization			0.000*
No	35 (27.8%)	325 (45.6%)
Yes	91 (72.2%)	388 (54.4%)
Diabetes			0.087
No	109 (86.5%)	654 (91.7%)
Yes	17 (13.5%)	59 (8.3%)
Liver disease			0.224
No	107 (84.9%)	569 (79.8%)
Yes	19 (15.1%)	144 (20.2%)
Depressive disorder			0.006*
No	78 (61.9%)	529 (74.2%)
Yes	48 (38.1%)	184 (25.8%)
Anxiety disorder			0.053
No	104 (82.5%)	635 (89.1%)
Yes	22 (17.5%)	78 (10.9%)
Naltrexone prescribed			0.000*
No	93 (73.8%)	626 (87.8%)
Yes	33 (26.2%)	87 (12.2%)

According to Table 1, at a significance level of 0.05, statistically significant differences between the outpatient treatment maintenance group and the outpatient treatment early discontinuation group among the plurality of alcohol use disorder patients can be identified for gender, address, medical department, depressive disorder, outpatient treatment for alcohol use disorder before hospitalization, and whether naltrexone was prescribed. For example, the rate of early discontinuation of outpatient treatment may be higher among male patients, patients without depressive disorder, patients not prescribed naltrexone, patients living outside Seoul, patients hospitalized in non-psychiatric departments, and patients who had not received outpatient treatment for alcohol use disorder before hospitalization.

Through this verification process, differences in the characteristics of alcohol use disorder patients between groups can be identified.

The pre-processing unit 110 may generate processed data by processing data of a plurality of alcohol use disorder patients in order to apply one or more machine learning algorithms. Meanwhile, the pre-processing unit 110 may perform the above-described process for prediction target data even after the prediction model is generated. Through this, the learning performance of the predictive model can be improved.

The pre-processing unit 110 may improve the accuracy of the predictive model by securing high-quality processed data by removing or correcting missing data, abnormal data, and redundant data among the data of a plurality of alcohol use disorder patients.

In addition, the pre-processing unit 110 may generate the processed data by applying operations such as data combination, segmentation, filtering, sampling, derived variable generation, dummy variable generation, scaling, data type conversion, and normalization to the data of the plurality of alcohol use disorder patients.

For example, the pre-processing unit 110 may convert digital information of numbers or characters derived empirically or experimentally into a simplified form by correcting and arranging them.

The predictive model generation unit 120 may receive the processed data, set whether or not the outpatient treatment of the plurality of alcohol use disorder patients is prematurely discontinued as a dependent variable, and generate a predictive model by applying one or more machine learning algorithms to all or part of the processed data based on the independent variables.

At this time, machine learning algorithms can be largely classified into three types: supervised learning algorithms, unsupervised learning algorithms, and reinforcement learning algorithms.

A supervised learning algorithm is an algorithm that is used when there is an intended result. During learning, a machine learning algorithm model can adjust variables for input values and map them to outputs.

An unsupervised learning algorithm is an algorithm used when there is no intended result, and can classify an input data set into a set of similar types. Unsupervised learning algorithms can be used for data mining.

Reinforcement learning algorithm is an algorithm used when making a decision about an input value. When a decision is made, the decision on the given input value gradually changes according to success/failure. As the reinforcement learning algorithm learns, it may be possible to predict the result of the input.

Meanwhile, the predictive model generating unit 120 may be implemented as, for example, a workstation server or a cloud server.

The prediction unit 130 may input all or part of the processed data generated by the pre-processing unit 110 into the prediction model generated by the predictive model generation unit 120 to generate a prediction result regarding whether the outpatient treatment of the plurality of alcohol use disorder patients is prematurely discontinued.

Using the prediction result generated by the prediction unit 130, the system 100 for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder can predict whether the outpatient treatment of a patient with alcohol use disorder will be prematurely discontinued, and can also identify the variables that influence whether outpatient treatment is prematurely discontinued.

In addition, the system 100 for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder can help patients receive treatment steadily by promoting special management tailored to the characteristics of each patient, using the prediction result generated by the prediction unit 130.

The output unit 140 may output the prediction result generated by the prediction unit 130. At this time, the output unit 140 may output the prediction result by a method such as displaying it on a screen or printing it with a printer.

FIG. 2 is a flowchart illustrating a variable determination operation performed by the preprocessor 110 according to embodiments of the present invention.

Referring to FIG. 2, when determining independent variables, the preprocessor 110 may calculate variance inflation factors (VIFs) for the independent variables in order to address multicollinearity among them. The independent variables may be determined so that no variance inflation factor (VIF) exceeds a predetermined threshold value.

A multicollinearity problem refers to a problem in which some of the independent variables can be expressed as a combination of other independent variables. The multicollinearity problem can occur when independent variables are not independent of each other and have strong interrelationships. As a method for solving the multicollinearity problem, a method of eliminating variables dependent on other independent variables may be used, and in this case, a variance inflation factor (VIF) may be used.

The variance inflation factor (VIF) represents the performance of a linear regression of one independent variable on another. The variance inflation factor (VIF) of the i-th variable can be obtained through Equation 1 below.

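[Equation 1]

$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2}$$

Here, $R_i^2$ is the coefficient of determination of the linear regression of the i-th variable on the other independent variables.

Since $R_i^2$ does not exceed 1, the more strongly the i-th variable depends on the other independent variables, the closer $R_i^2$ is to 1 and the larger $\mathrm{VIF}_i$ becomes.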

The preprocessor 110 may calculate variance inflation factor (VIF) values for all determined independent variables, and determine the independent variables such that no VIF value exceeds a predetermined threshold value.

For example, the predetermined threshold value of the variance inflation factor (VIF) may be set to 5.

Referring to FIG. 2, the pre-processing unit 110 may determine independent variables (S210).

The pre-processing unit 110 may calculate the variance inflation factor (VIF) for all the determined independent variables in the above-described manner (S220).

When the variance inflation factor (VIF) for any one independent variable exceeds the threshold value (S230-Y), the preprocessing unit 110 may determine another independent variable (S240).

At this time, after determining another independent variable, the preprocessor 110 may enter step S220 and calculate a variance inflation factor (VIF) again for the determined independent variable.

When the variance inflation factor (VIF) for all independent variables is 5 or less (S230-N), the preprocessor 110 may end the variable determination process.
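The loop of steps S210 to S240 can be sketched as follows. This is a minimal illustration, not the specification's implementation: it relies on the known identity that, for standardized variables, the VIF of the i-th variable equals the i-th diagonal entry of the inverse correlation matrix, and it assumes that "determining another independent variable" means dropping the variable with the largest VIF. All function names and the example data are hypothetical.

```python
def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long numeric columns."""
    n = len(columns[0])
    centered = []
    for col in columns:
        mean = sum(col) / n
        centered.append([v - mean for v in col])
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_values(columns):
    """VIF_i = 1 / (1 - R_i^2), read off the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


def select_variables(data, threshold=5.0):
    """Repeat S220-S240 until no VIF exceeds the threshold."""
    names = list(data)
    while len(names) > 1:
        vifs = vif_values([data[name] for name in names])
        worst = max(range(len(names)), key=lambda i: vifs[i])
        if vifs[worst] <= threshold:
            break
        names.pop(worst)  # assumption: drop the most collinear variable
    return names


# Hypothetical data: age in months is nearly a multiple of age in years.
data = {
    "age_years": [25, 34, 41, 47, 52, 58, 63, 70],
    "age_months": [303, 406, 497, 560, 625, 693, 758, 839],
    "stay_days": [30, 14, 45, 21, 60, 18, 35, 28],
}
selected = select_variables(data)
print(selected)
```

Exactly collinear columns would make the correlation matrix singular, so a production version would need to guard against a zero pivot.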

FIG. 3 is a diagram illustrating an operation of classifying alcohol use disorder patient data into a training data set and a test data set by the preprocessing unit 110 according to embodiments of the present invention.

Referring to FIG. 3, the preprocessing unit 110 may classify data of a plurality of alcohol use disorder patients into a learning data group and a test data group.

In order to generate a predictive model using a machine learning algorithm, the preprocessing unit 110 may classify some of the data of the plurality of alcohol use disorder patients into a training data set, which can be used to train the predictive model.

Meanwhile, the test data set may be used to test a predictive model generated based on data of a plurality of alcohol use disorder patients.

The test data set may be set large enough to derive statistically significant results, and may include all data of a plurality of alcohol use disorder patients. The test data group may be classified to have the same characteristics as the training data group.
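One common way to keep the test data group similar in composition to the training data group is a stratified split, sketched below. The specification does not state a split ratio or a splitting procedure; the 30% test fraction, the fixed seed, and the function name `stratified_split` are assumptions made only for illustration. The class sizes in the example are the group sizes from Table 1.

```python
import random


def stratified_split(records, label_of, test_fraction=0.3, seed=0):
    """Split records into (train, test), preserving class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for record in records:
        by_class.setdefault(label_of(record), []).append(record)
    train, test = [], []
    for members in by_class.values():
        members = members[:]  # do not reorder the caller's list
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return train, test


# Hypothetical records: (patient_id, label), 126 maintained and 713 discontinued.
records = [(i, "maintained") for i in range(126)]
records += [(i, "early_discontinuation") for i in range(126, 839)]
train, test = stratified_split(records, label_of=lambda r: r[1])
print(len(train), len(test))
```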

The predictive model generating unit 120 may derive a performance evaluation index to be used to measure performance of the predictive model using a test data set. The predictive model generation unit 120 may check the objective performance of the predictive model and compare the performances of different predictive models using the performance evaluation index.

FIG. 4 is a diagram illustrating an operation of performing oversampling on a specific class by the preprocessor 110 according to embodiments of the present invention.

Referring to FIG. 4 , the pre-processing unit 110 may apply a sampling method to a specific class in order to solve the class imbalance of the learning data group when processing data of a plurality of alcohol use disorder patients.

For example, the dependent variable for the learning data group has two classes, the outpatient treatment maintenance class and the outpatient treatment early discontinuation class, and a problem can arise when the numbers of data items in the two classes are disproportionate to each other (e.g., 85:15).

When a predictive model is trained using data with class imbalance problems, biased results may be derived, and it may be difficult to accurately predict whether patients with alcohol use disorder will stop treatment. Accordingly, a sampling method may be applied to solve the class imbalance problem of such data.

For example, sampling methods may be divided into oversampling methods and undersampling methods.

The undersampling method reduces the data group of the majority class to the level of the data group of the minority class. Since the undersampling method removes majority-class data, calculation time can be reduced and class overlap can be reduced. However, the undersampling method drastically reduces the total amount of data used for learning, and may instead degrade learning performance.

On the other hand, the oversampling method secures enough data for learning by increasing the data group of the minority class to the level of the majority class.

For example, oversampling methods include random oversampling, which simply replicates existing minority-class data to match the ratio, and the Synthetic Minority Over-sampling Technique (SMOTE), which generates new data between an arbitrary minority-class data point and its neighboring minority-class data points.

When processing data, the pre-processing unit 110 may correct class imbalance by using an oversampling or undersampling method and thereby derive a more precise prediction. In the embodiments of the present invention, in order to solve the imbalance between the outpatient treatment maintenance class and the outpatient treatment early discontinuation class, which are the dependent variable classes for the learning data group, the pre-processing unit 110 may apply oversampling to the minority class of the two (the outpatient treatment maintenance class).

Referring to FIG. 4 , for example, the pre-processing unit 110 may generate duplicated data for data a and b included in the minority class for a minority class.
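The duplication of minority-class data described above corresponds to random oversampling, which can be sketched as follows. The function name `random_oversample`, the fixed seed, and the 15:85 example are illustrative assumptions; SMOTE would synthesize new points instead of copying existing ones and is not shown here.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [row for row, label in zip(rows, labels) if label == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))  # copy of an existing row
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical training set with a 15:85 class ratio.
rows = list(range(100))
labels = ["maintained"] * 15 + ["early_discontinuation"] * 85
balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling is normally applied to the training data group only, so that duplicated rows do not leak into the test data group.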

FIG. 5 is a diagram illustrating an operation of generating a predictive model by applying one or more machine learning algorithms to a training data group by the predictive model generator 120 according to embodiments of the present invention.

Referring to FIG. 5 , the predictive model generator 120 may generate a predictive model by applying one or more machine learning algorithms to a portion corresponding to a training data group among processed data.

At this time, the one or more machine learning algorithms may be one or more of logistic regression, support vector machine (SVM), random forest, gradient boosting, and AdaBoost.

Logistic regression is a statistical technique for estimating a causal relationship between a dependent variable having only two values and independent variables using a logistic function. The dependent variable is dichotomous (0 or 1), and the independent variable can be categorical or continuous.

The logistic regression model is a special form of generalized linear model and is a functional model that draws an S-shaped curve. As a result of logistic regression analysis, if the value of the dependent variable is greater than 0.5, the event is predicted to occur, and if the value is less than 0.5, the event is predicted not to occur.
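The S-shaped logistic function and the 0.5 decision threshold can be sketched as below. The weights, bias, and feature encoding are hypothetical values chosen only to show the computation; they are not fitted to any data from the specification.

```python
import math


def logistic(z):
    """Logistic (sigmoid) function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_event(features, weights, bias, threshold=0.5):
    """Return (probability, event_predicted) for one patient."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    probability = logistic(z)
    return probability, probability > threshold


# Hypothetical binary features: [male, naltrexone prescribed].
probability, discontinues = predict_event([1, 0], weights=[2.0, -1.0], bias=-0.5)
print(round(probability, 3), discontinues)
```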

A support vector machine (SVM) is a supervised learning model for pattern recognition and data analysis, and is mainly used for classification and regression analysis. Given a set of data items that each belong to one of two categories, the support vector machine algorithm may create a non-probabilistic binary linear classification model that determines which category new data belongs to.

In this case, the category may be divided into an outpatient treatment maintenance group and an outpatient treatment early discontinuation group for patients with alcohol use disorder, and a support vector machine may be used to determine which of the two groups the new data corresponds to.

A random forest is a type of ensemble learning method used in regression analysis, etc., and operates by outputting a classification or average prediction value from a plurality of decision trees constructed in the training process.

The random forest test process using the ensemble model may derive a final result through average, multiplication, or majority voting of the result obtained from the decision tree. These tests can be performed in parallel, resulting in high computational efficiency.
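The aggregation step of an ensemble can be illustrated with two small helpers, one for majority voting over class predictions and one for averaging predicted probabilities. The function names and the example values are hypothetical, and the individual decision trees are assumed to have been trained elsewhere.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]


def average_vote(tree_probabilities):
    """Return the mean of the probabilities predicted by the trees."""
    return sum(tree_probabilities) / len(tree_probabilities)


# Hypothetical outputs of three decision trees for one patient.
print(majority_vote(["early_discontinuation", "maintained", "early_discontinuation"]))
print(average_vote([0.2, 0.6, 0.7]))
```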

Gradient Boosting is a machine learning algorithm that can perform regression analysis or classification analysis, and is an algorithm that belongs to the boosting family of ensemble methodologies of machine learning algorithms.

Boosting is the process of creating a strong classifier by combining weak classifiers. Gradient boosting takes the error of the predictions made by the model in the previous stage, creates a new model with the goal of reducing this error to zero, and builds the final model by combining these models.
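The residual-fitting idea can be sketched with one-dimensional decision stumps and squared error. This is a toy illustration under stated assumptions: the stump learner, the learning rate of 0.5, the 20 rounds, and the example data are all hypothetical, and a classifier for this task would normally use a classification loss rather than squared error.

```python
def fit_stump(xs, residuals):
    """Find the threshold split of xs that best fits the residuals."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_value = sum(left) / len(left)
        right_value = sum(right) / len(right)
        error = (sum((r - left_value) ** 2 for r in left)
                 + sum((r - right_value) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, t, left_value, right_value)
    return best[1:]


def gradient_boost(xs, ys, rounds=20, learning_rate=0.5):
    """Fit stumps to the residuals of the current ensemble, round by round."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        t, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((t, left_value, right_value))
        predictions = [p + learning_rate * (left_value if x <= t else right_value)
                       for x, p in zip(xs, predictions)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)


def mean_squared_error(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
model = gradient_boost(xs, ys)
print(mean_squared_error(model, xs, ys))
```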

Adaboost is a machine learning algorithm that expresses the final result by weighting and adding the results of other learning algorithms.
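The weighted combination can be sketched as follows, using the classic AdaBoost weight formula for binary labels coded as +1 and -1. The three threshold rules and their error rates are hypothetical; in practice they would come from training.

```python
import math

def learner_weight(error_rate):
    """Classic AdaBoost weight: larger for weak learners with lower weighted error."""
    return 0.5 * math.log((1.0 - error_rate) / error_rate)

def adaboost_predict(x, weak_learners, alphas):
    """Return the sign of the weighted sum of the weak learners' +1/-1 votes."""
    score = sum(a * h(x) for h, a in zip(weak_learners, alphas))
    return 1 if score >= 0 else -1

# Hypothetical weak learners (simple threshold rules) and their error rates.
weak_learners = [
    lambda x: 1 if x > 2 else -1,
    lambda x: 1 if x > 5 else -1,
    lambda x: 1 if x > 8 else -1,
]
alphas = [learner_weight(e) for e in (0.1, 0.3, 0.4)]

print(adaboost_predict(4, weak_learners, alphas))
print(adaboost_predict(1, weak_learners, alphas))
```

For x = 4, two of the three weak learners vote -1, but the most accurate learner carries enough weight that the combined result is +1.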

Meanwhile, the above-described machine learning algorithms are examples, and embodiments of the present invention are not limited thereto.

The predictive model generator 120 may generate a predictive model corresponding to the machine learning algorithm by applying a machine learning algorithm to a training data set.

FIG. 6 is a diagram illustrating an operation of determining a predictive model according to performance evaluation indexes by the predictive model generator 120 according to embodiments of the present invention.

Referring to FIG. 6, when there are a plurality of machine learning algorithms, the predictive model generator 120 may generate a plurality of candidate predictive models by applying the plurality of machine learning algorithms to the portion of the processed data received from the preprocessor 110 that corresponds to the learning data group. For each of the plurality of candidate predictive models, a test result may be derived by inputting the portion of the processed data that corresponds to the test data group.

Also, the predictive model generator 120 may calculate a performance evaluation index for each of a plurality of candidate predictive models using the derived test results.

Also, the prediction model generator 120 may determine a candidate prediction model having the highest performance evaluation index value among a plurality of candidate prediction models as the prediction model.

In this case, the performance evaluation index may be, for example, one of accuracy, sensitivity, specificity, and area under the ROC curve (AUC).

Accuracy is the number of data items whose prediction matches the actual outcome (TP + TN) divided by the total number of predicted data items (TP + FP + FN + TN), and is an index of how closely the predictions agree with the actual data. Here, accuracy refers to the proportion of all patients for whom early discontinuation or maintenance of outpatient treatment was predicted correctly. TP is the number of data items that the prediction model predicted to be positive and that are actually positive, FP is the number predicted to be positive but actually negative, FN is the number predicted to be negative but actually positive, and TN is the number predicted to be negative and actually negative.

Sensitivity, also called recall or hit rate, is the ratio of data items that the predictive model predicted to be positive (TP) among those that are actually positive (TP + FN). It means the proportion of actual early-discontinuation patients that the predictive model identified correctly.

Specificity is the ratio of data items that the predictive model predicted to be negative (TN) among those that are actually negative (TN + FP), and means the proportion of actual outpatient treatment maintenance patients that the predictive model identified correctly.
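The three indexes can be computed from the confusion-matrix counts as in the sketch below. It uses the standard definitions, accuracy = (TP + TN) / (TP + FP + FN + TN), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP), and assumes that early discontinuation is coded as 1 (positive) and maintenance as 0 (negative). The example labels are made up.

```python
def confusion_counts(actual, predicted):
    """Count TP, FP, FN, TN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1
        elif p == 1 and a == 0:
            fp += 1
        elif p == 0 and a == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def sensitivity(tp, fn):
    return tp / (tp + fn)

def specificity(tn, fp):
    return tn / (tn + fp)

# Hypothetical test-group labels: 3 actual positives, 5 actual negatives.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(actual, predicted)
print(accuracy(tp, fp, fn, tn), sensitivity(tp, fn), specificity(tn, fp))
```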

AUC can be obtained from the ROC (Receiver Operating Characteristic) curve, which plots the true positive rate (sensitivity) against the false positive rate (1 - specificity).

AUC is the area under the ROC curve, and the maximum is 1, and a good predictive model has an AUC value close to 1.
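One way to compute the area is to sweep the decision threshold from high to low, record the (false positive rate, true positive rate) points, and integrate with the trapezoidal rule. The sketch below assumes each candidate model outputs a numeric score per patient and that both classes are present; the scores and labels are illustrative.

```python
def roc_points(actual, scores):
    """Return (FPR, TPR) points obtained by lowering the decision threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    ranked = sorted(zip(scores, actual), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Handle all samples sharing this score together, so ties give one point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Trapezoidal area under a curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical scores for six patients (1 = early discontinuation).
actual = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(round(auc(roc_points(actual, scores)), 4))
```

In this example eight of the nine positive-negative pairs are ranked correctly, which corresponds to an area of 8/9.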

The predictive model generation unit 120 may select a predictive model capable of predicting whether an alcohol use disorder patient will stop outpatient treatment early with the highest probability using the above-described performance evaluation index for a plurality of candidate predictive models.

For example, the predictive model generation unit 120 may generate Table 2 by calculating a performance evaluation index according to each of the candidate predictive models.

Model	AUC	Accuracy	Sensitivity	Specificity
Logistic Regression	0.6914	0.6130	0.7058	0.6026
SVM	0.6797	0.7023	0.6470	0.7086
Random Forest	0.6365	0.7380	0.4705	0.7682
Gradient Boosting	0.6022	0.7083	0.4117	0.7417
Adaboost	0.7241	0.6428	0.7647	0.6291

According to Table 2, if a predictive model is determined based on AUC or sensitivity, the candidate predictive model using the Adaboost algorithm may be determined as the predictive model. On the other hand, when a predictive model is determined based on accuracy or specificity, the candidate predictive model using the random forest algorithm may be determined as the predictive model.
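The selection rule can be checked directly against the values in Table 2. The sketch below only restates the table as a dictionary and picks the candidate with the highest value of the chosen index; the function and variable names are illustrative.

```python
# Performance evaluation indexes copied from Table 2.
table2 = {
    "Logistic Regression": {"AUC": 0.6914, "Accuracy": 0.6130,
                            "Sensitivity": 0.7058, "Specificity": 0.6026},
    "SVM": {"AUC": 0.6797, "Accuracy": 0.7023,
            "Sensitivity": 0.6470, "Specificity": 0.7086},
    "Random Forest": {"AUC": 0.6365, "Accuracy": 0.7380,
                      "Sensitivity": 0.4705, "Specificity": 0.7682},
    "Gradient Boosting": {"AUC": 0.6022, "Accuracy": 0.7083,
                          "Sensitivity": 0.4117, "Specificity": 0.7417},
    "Adaboost": {"AUC": 0.7241, "Accuracy": 0.6428,
                 "Sensitivity": 0.7647, "Specificity": 0.6291},
}

def select_model(results, index):
    """Return the candidate model with the highest value of the given index."""
    return max(results, key=lambda name: results[name][index])

for index in ("AUC", "Accuracy", "Sensitivity", "Specificity"):
    print(index, "->", select_model(table2, index))
```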

Referring to FIG. 7, for example, the predictive model generation unit 120 may determine a predictive model using AUC as a performance evaluation index.

An ROC curve, for example, may be determined as shown in FIG. 7. AUC means the area under the ROC curve, and the predictive model generation unit 120 may check the AUC value by calculating the area under the ROC curve for each candidate prediction model.

Referring to FIG. 7, since Adaboost has the largest AUC value, the prediction model generator 120 may select a prediction model using Adaboost.

Referring to FIG. 8, the method for predicting early discontinuation of outpatient treatment for alcohol use disorder patients may include a data collection step (S810) of collecting data of a plurality of alcohol use disorder patients.

In addition, the method for predicting early discontinuation of outpatient treatment of patients with alcohol use disorder may include an independent variable determination step (S820) of determining a plurality of independent variables to which one or more machine learning algorithms are to be applied to generate a predictive model for whether the plurality of patients with alcohol use disorder will discontinue outpatient treatment early.

The method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may include a preprocessing step (S830) of generating processed data by processing the data of the plurality of patients with alcohol use disorder.

Meanwhile, the aforementioned data collection step (S810), independent variable determination step (S820), and preprocessing step (S830) may be executed by the aforementioned preprocessor 110.

In addition, the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may include a predictive model generating step (S840) of receiving the processed data, setting whether outpatient treatment of the plurality of alcohol use disorder patients was discontinued early as a dependent variable, and generating a predictive model by applying one or more machine learning algorithms to all or part of the processed data based on the independent variables. Meanwhile, the predictive model generating step (S840) may be executed by the aforementioned predictive model generating unit 120.

The method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may also include a prediction step (S850) of inputting all or part of the processed data into the predictive model to generate a prediction result on whether the plurality of patients with alcohol use disorder will discontinue outpatient treatment early. Meanwhile, the prediction step (S850) may be executed by the prediction unit 130 described above.

Further, the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may include an output step (S860) of outputting the prediction result. Meanwhile, the output step (S860) may be executed by the above-described output unit 140.

In the independent variable determination step (S820), for example, in order to solve the multicollinearity problem between the independent variables, variance inflation factors (VIFs) may be calculated for the independent variables, and the independent variables may be determined so that the variance inflation factor is maintained below a predetermined threshold value.
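A possible way to carry out this screening is sketched below. It relies on the identity that the VIF of the j-th variable equals the j-th diagonal entry of the inverse of the correlation matrix of the independent variables, and it drops the variable with the largest VIF until every VIF is below the threshold. The drop-the-largest strategy, the threshold of 10, and the variable names and values are assumptions for illustration; the embodiments only state that a predetermined threshold is used. Constant or perfectly collinear columns are not handled.

```python
def pearson_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    centered = []
    for col in columns:
        m = sum(col) / len(col)
        centered.append([v - m for v in col])
    norms = [sum(v * v for v in col) ** 0.5 for col in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]

def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vifs(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    if len(columns) == 1:
        return [1.0]
    inverse = invert(pearson_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]

def select_variables(data, threshold=10.0):
    """Drop the variable with the largest VIF until all VIFs are below threshold."""
    kept = dict(data)
    while len(kept) > 1:
        values = vifs(list(kept.values()))
        worst = max(range(len(values)), key=values.__getitem__)
        if values[worst] < threshold:
            break
        del kept[list(kept)[worst]]
    return list(kept)

# Hypothetical independent variables; the two drinking measures are nearly identical.
example = {
    "age": [25, 32, 47, 51, 62, 38],
    "drinks_per_day": [3, 7, 2, 9, 4, 6],
    "drinks_per_day_self_report": [3.1, 6.9, 2.2, 9.1, 3.8, 6.0],
}
selected = select_variables(example, threshold=10.0)
print(selected)
```

With this data, one of the two nearly duplicated drinking measures is removed and the remaining variables have VIFs close to 1.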

For example, the preprocessing step (S830) may include classifying the data of the plurality of alcohol use disorder patients into a learning data group and a test data group.
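A simple random split is sketched below. The 70/30 ratio and the fixed seed are assumptions made for reproducibility of the example; this passage does not specify how the two groups are formed.

```python
import random

def split_learning_test(records, test_fraction=0.3, seed=42):
    """Shuffle a copy of the records and split it into (learning, test) groups."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Ten hypothetical patient identifiers.
patients = list(range(10))
learning_group, test_group = split_learning_test(patients)
print(len(learning_group), len(test_group))
```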

The preprocessing step (S830) may include applying oversampling, for the learning data group, to the minority class among the outpatient treatment maintenance class and the outpatient treatment early discontinuation class, which are the classes of the dependent variable.
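Random oversampling with replacement is one simple way to balance the two classes, sketched below. The choice of random duplication (rather than, for example, a synthetic-sample technique), the record layout, and the label names are assumptions; the passage above only states that oversampling is applied to the minority class of the learning data group.

```python
import random
from collections import Counter

def oversample_minority(records, label_key="label", seed=0):
    """Duplicate randomly chosen minority-class records until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(r[label_key] for r in records)
    majority_size = max(counts.values())
    balanced = list(records)
    for label, count in counts.items():
        pool = [r for r in records if r[label_key] == label]
        balanced.extend(rng.choice(pool) for _ in range(majority_size - count))
    return balanced

# Hypothetical learning data group: 8 maintenance and 2 early-discontinuation patients.
learning = ([{"id": i, "label": "maintenance"} for i in range(8)]
            + [{"id": 100 + i, "label": "early_discontinuation"} for i in range(2)])
balanced = oversample_minority(learning)
print(Counter(r["label"] for r in balanced))
```

Only the learning data group is resampled here; the test data group is left untouched so that the performance evaluation indexes reflect the original class distribution.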

In the predictive model generating step (S840), for example, a predictive model may be generated by applying one or more machine learning algorithms to the portion of the processed data corresponding to the training data group. At this time, the one or more machine learning algorithms may be one or more of 1) Logistic Regression, 2) Support Vector Machine (SVM), 3) Random Forest, 4) Gradient Boosting, and 5) Adaboost.

Meanwhile, when there are a plurality of machine learning algorithms, the predictive model generating step (S840) may include, as an example: 1) deriving a test result by inputting the portion of the processed data corresponding to the test data group into each of a plurality of candidate prediction models generated by applying the plurality of machine learning algorithms to the portion of the processed data corresponding to the training data group; 2) calculating a performance evaluation index for each of the plurality of candidate prediction models using the test result; and 3) determining the candidate prediction model having the highest performance evaluation index among the plurality of candidate prediction models as the prediction model.

In this case, the performance evaluation index may be an area under the ROC curve (AUC).

The aforementioned system 100 for predicting early withdrawal from outpatient treatment for patients with alcohol use disorder may be implemented by a computing device including at least some of a processor, a memory, a user input device, and a presentation device. Memory is a medium that stores computer-readable software, applications, program modules, routines, instructions, and/or data that are coded to perform particular tasks when executed by a processor. A processor may read and execute computer-readable software, applications, program modules, routines, instructions, and/or data stored in memory. The user input device may be a means for allowing a user to input a command to execute a specific task to the processor or input data required for execution of the specific task. The user input device may include a physical or virtual keyboard or keypad, key buttons, mouse, joystick, trackball, touch-sensitive input means, or a microphone. The presentation device may include a display, a printer, a speaker, or a vibrator.

Computing devices may include a variety of devices such as smart phones, tablets, laptops, desktops, servers, and clients. A computing device may be a single stand-alone device or may include multiple computing devices operating in a distributed environment consisting of multiple computing devices cooperating with each other over a communications network.

In addition, the above-described method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may be executed by a computing device having a processor and a memory that stores computer-readable software, applications, program modules, routines, instructions, and/or data structures coded to perform the method when executed by the processor.

The present embodiments described above may be implemented through various means. For example, the present embodiments may be implemented by hardware, firmware, software, or a combination thereof.

In the case of hardware implementation, the present embodiments may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), processors, controllers, microcontrollers, or microprocessors.

For example, a method for predicting early withdrawal from outpatient treatment of a patient with alcohol use disorder according to embodiments may be implemented using an artificial intelligence semiconductor device in which neurons and synapses of a deep neural network are implemented as semiconductor devices. At this time, the semiconductor device may be a currently used semiconductor device such as SRAM, DRAM, or NAND, a next-generation semiconductor device such as RRAM, STT-MRAM, or PRAM, or a combination thereof.

When the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to the embodiments is implemented using an artificial intelligence semiconductor device, the result (weights) of training the deep learning model in software may be transferred to synapse-mimicking devices arranged in an array, or learning may be performed on the artificial intelligence semiconductor device itself.

In the case of implementation by firmware or software, the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to the present embodiments may be implemented in the form of a device, procedure, or function that performs the functions or operations described above. The software codes may be stored in a memory unit and driven by a processor. The memory unit may be located inside or outside the processor and exchange data with the processor by various means known in the art.

Also, the terms "system", "processor", "controller", "component", "module", "interface", "model", or "unit" as described above may generally mean computer-related entity hardware, a combination of hardware and software, software, or running software. For example, the aforementioned components may be, but are not limited to, a process driven by a processor, a processor, a controller, a control processor, an object, a thread of execution, a program, and/or a computer. For example, both an application running on a controller or processor and the controller or processor itself can be components. One or more components may reside within a process and/or thread of execution, and components may reside on one device (e.g., system, computing device, etc.) or may be distributed across two or more devices.

On the other hand, another embodiment provides a computer program stored in a computer recording medium that performs the above-described method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder. In addition, another embodiment provides a computer-readable recording medium recording a program for realizing the above-described method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder.

A program recorded on a recording medium may be read, installed, and executed in a computer to execute the above-described steps.

In this way, in order for the computer to read the program recorded on the recording medium and execute the functions implemented by the program, the above-described program may include code written in a computer language such as C, C++, JAVA, or machine language that can be read by the computer's processor (CPU) through the computer's device interface.

These codes may include functional codes related to functions defining the above-described functions, and may include control codes related to execution procedures necessary for a processor of a computer to execute the above-described functions according to a predetermined procedure.

In addition, these codes may further include memory-reference-related codes indicating which location (address) of the computer's internal or external memory should be referenced for additional information or media necessary for the computer's processor to execute the above-mentioned functions.

In addition, when the computer's processor needs to communicate with any other remote computer or server in order to execute the above-mentioned functions, the codes may further include communication-related codes specifying how to communicate with the remote computer or server using the computer's communication module, what information or media to transmit or receive during communication, and the like.

Computer-readable recording media on which the program as described above is recorded include, for example, ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical media storage devices, and may also include media implemented in the form of a carrier wave (e.g., transmission through the Internet).

In addition, the computer-readable recording medium may be distributed over computer systems connected through a network, so that computer-readable codes can be stored and executed in a distributed manner.

In addition, a functional program for implementing the present invention, and codes and code segments related thereto, may be easily inferred or changed by programmers in the art to which the present invention belongs, in consideration of the system environment of a computer that reads the recording medium and executes the program.

The method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may be implemented in the form of a recording medium including instructions executable by a computer, such as an application or program module executed by a computer. Computer readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media. Also, computer readable media may include all computer storage media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.

A method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may be executed by an application installed by default in a terminal (this may include a program included in a platform or operating system, etc.), or may be executed by an application (that is, a program) directly installed in the master terminal by a user through an application providing server such as a server, an application, or a web server related to the corresponding service. In this sense, the above-described method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may be implemented as an application (i.e., a program) that is installed by default in a terminal or directly installed by a user, and may be recorded in a computer-readable recording medium such as a terminal.

The above description of the present invention is for illustrative purposes, and those skilled in the art will understand that the invention can be easily modified into other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not limiting. For example, components described as a single type may be implemented in a distributed manner, and components described as distributed may also be implemented in a combined form.

The scope of the present invention is indicated by the appended claims rather than the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and equivalent concepts thereof should be construed as being included in the scope of the present invention.

CROSS-REFERENCE TO RELATED APPLICATION

This patent application claims priority under Article 119(a) of the US Patent Act (35 U.S.C. § 119(a)) to Patent Application No. 10-2021-0096253 filed in Korea on July 22, 2021, and all contents thereof are incorporated into this patent application by reference. In addition, if this patent application claims priority for the same reason as above for countries other than the United States, all the contents are incorporated into this patent application by reference.

Claims

a pre-processing unit collecting data of a plurality of alcohol use disorder patients, determining a plurality of independent variables to which one or more machine learning algorithms for generating a predictive model for early discontinuation of outpatient treatment of the plurality of alcohol use disorder patients are to be applied, and generating processed data by processing the data of the plurality of alcohol use disorder patients;

Receiving the processed data, setting whether the plurality of alcohol use disorder patients' outpatient treatment was prematurely discontinued as a dependent variable, and applying the one or more machine learning algorithms to all or part of the processed data based on the independent variables a predictive model generating unit generating the predictive model;

a prediction unit inputting all or part of the processed data into the predictive model to generate a prediction result regarding whether the plurality of alcohol use disorder patients will prematurely discontinue outpatient treatment; and

A system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, comprising: an output unit for outputting the prediction result.
According to claim 1,

The pre-processing unit,

When determining the independent variables, calculate Variance Inflation Factors (VIFs) for the independent variables in order to solve the problem of multicollinearity between the independent variables,

A system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, which determines the independent variables so that the variance inflation factor is maintained below a predetermined threshold value.
According to claim 1,

The pre-processing unit,

A system for predicting early discontinuation of outpatient treatment of alcohol use disorder patients, which classifies the data of the plurality of alcohol use disorder patients into a learning data group and a test data group.
According to claim 3,

The pre-processing unit,

A system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, which, for the learning data group, applies oversampling to the minority class among the outpatient treatment maintenance class and the outpatient treatment early discontinuation class of the dependent variable.
According to claim 4,

The predictive model generating unit,

Generating the predictive model by applying the one or more machine learning algorithms to a portion corresponding to the learning data group among the processed data;

The one or more machine learning algorithms,

A system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, wherein the one or more machine learning algorithms are one or more of Logistic Regression, Support Vector Machine (SVM), Random Forest, Gradient Boosting, and Adaboost.
According to claim 5,

The predictive model generating unit,

When there are a plurality of machine learning algorithms, derives a test result by inputting the portion of the processed data corresponding to the test data group into each of a plurality of candidate prediction models generated by applying the plurality of machine learning algorithms to the portion of the processed data corresponding to the learning data group,

Calculates a performance evaluation index for each of the plurality of candidate prediction models using the test result,

A system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, which determines, as the predictive model, a candidate predictive model having the highest value of the performance evaluation index among the plurality of candidate predictive models.
According to claim 6,

The performance evaluation index is,

A system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, which is AUC (Area under the ROC curve).
In the method of predicting early discontinuation of outpatient treatment of multiple alcohol use disorder patients,

a data collection step of collecting data of the plurality of alcohol use disorder patients;

an independent variable determination step of determining a plurality of independent variables to which one or more machine learning algorithms for generating a predictive model for early discontinuation of outpatient treatment of the plurality of alcohol use disorder patients are to be applied;

a pre-processing step of generating processed data by processing the data of the plurality of alcohol use disorder patients;

Receiving the processed data, setting whether the plurality of alcohol use disorder patients' outpatient treatment was prematurely discontinued as a dependent variable, and applying the one or more machine learning algorithms to all or part of the processed data based on the independent variables a predictive model generating step of generating the predictive model;

a prediction step of inputting all or part of the processed data into the predictive model to generate a prediction result regarding whether the plurality of alcohol use disorder patients will prematurely discontinue outpatient treatment; and

A method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, comprising: an output step of outputting the prediction result.
According to claim 8,

In the step of determining the independent variable,

When determining the independent variables, calculate Variance Inflation Factors (VIFs) for the independent variables in order to solve the multicollinearity problem between the independent variables,

A method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, wherein the independent variables are determined so that the variance inflation factor is maintained below a predetermined threshold value.
According to claim 8,

In the preprocessing step,

A method for predicting early discontinuation of outpatient treatment for alcohol use disorder patients, comprising: classifying the data of the plurality of alcohol use disorder patients into a learning data group and a test data group.
According to claim 10,

In the preprocessing step,

A method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, comprising: applying oversampling, for the learning data group, to the minority class among the outpatient treatment maintenance class and the outpatient treatment early discontinuation class of the dependent variable.
According to claim 11,

The predictive model generation step,

Generating the predictive model by applying the one or more machine learning algorithms to a portion corresponding to the learning data group among the processed data;

The one or more machine learning algorithms,

A method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, wherein the one or more machine learning algorithms are one or more of Logistic Regression, Support Vector Machine (SVM), Random Forest, Gradient Boosting, and Adaboost.
According to claim 12,

The predictive model generation step,

When there are a plurality of machine learning algorithms, deriving a test result by inputting the portion of the processed data corresponding to the test data group into each of a plurality of candidate prediction models generated by applying the plurality of machine learning algorithms to the portion of the processed data corresponding to the learning data group;

calculating a performance evaluation index for each of the plurality of candidate prediction models using the test result; and

A method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, comprising: determining a candidate predictive model having the highest performance evaluation index among the plurality of candidate predictive models as the predictive model.
According to claim 13,

The performance evaluation index is,

A method for predicting early discontinuation of outpatient treatment in patients with alcohol use disorder, which is AUC (Area under the ROC curve).