WO2023003315A1 - System and method for predicting early discontinuation of treatment of outpatient with alcohol use disorder


Publication number
WO2023003315A1
Authority
WO
WIPO (PCT)
Prior art keywords
alcohol use
patients
use disorder
outpatient treatment
predictive model
Application number
PCT/KR2022/010516
Other languages
French (fr)
Korean (ko)
Inventor
김대진
최인영
박소진
전지원
박성웅
Original Assignee
(주)디지털팜
Application filed by (주)디지털팜

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/50 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for simulation or modelling of medical disorders
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/20 Ensemble learning
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H20/00 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance
    • G16H20/70 ICT specially adapted for therapies or health-improving plans, e.g. for handling prescriptions, for steering therapy or for monitoring patient compliance relating to mental therapies, e.g. psychological therapy or autogenous training
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/20 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Definitions

  • the present invention provides a system and method for predicting early discontinuation of outpatient treatment for alcohol use disorder patients.
  • Machine learning is the study of computer algorithms that automatically improve through experience and the use of data. Machine learning is also considered a part of artificial intelligence. Machine learning algorithms are not explicitly programmed with specific actions; instead, they build models that make predictions or decisions based on samples called training data. Machine learning can be used for a variety of applications, including medicine, speech recognition and computer vision.
  • Predictive models based on machine learning can classify with high accuracy.
  • Predictive models based on machine learning have been used effectively in developing decision support systems.
  • Alcohol use disorder may cause not only physical diseases such as alcohol-induced physical complications and alcohol-related dementia, but also social problems such as alcohol-related crimes and accidents, and enormous economic losses.
  • Alcohol use disorder has a higher relapse rate than other mental disorders. To prevent recurrence, it needs to be managed over a long period of time rather than ended after a single treatment. In addition, continuous treatment can have a positive effect on the treatment outcome. Therefore, continuous follow-up of patients is an important indicator for evaluating the prognosis of alcohol use disorder.
  • Embodiments of the present invention can predict whether outpatient treatment will be discontinued early by using a predictive model designed through machine learning to calculate the probability that a patient with alcohol use disorder withdraws early from outpatient treatment.
  • Embodiments of the present invention help with patient management so that patients with alcohol use disorder who are at high risk of early outpatient treatment discontinuation can continue treatment steadily, and ultimately contribute to preventing relapse and increasing the success rate of treatment.
  • In one aspect, embodiments of the present invention provide a system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, the system including:
  • a pre-processing unit that collects data of a plurality of alcohol use disorder patients, determines a plurality of independent variables to which one or more machine learning algorithms for generating a predictive model for early discontinuation of outpatient treatment are to be applied, and generates processed data by processing the data of the plurality of alcohol use disorder patients;
  • a predictive model generating unit that receives the processed data, sets whether the outpatient treatment of the plurality of alcohol use disorder patients is prematurely discontinued as a dependent variable, and applies the one or more machine learning algorithms to all or part of the processed data based on the independent variables to generate the predictive model;
  • a prediction unit that inputs all or part of the processed data into the predictive model to generate a prediction result regarding whether the outpatient treatment of the plurality of alcohol use disorder patients is prematurely discontinued; and
  • an output unit that outputs the prediction result.
  • In another aspect, embodiments of the present invention provide a method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, the method including: a data collection step of collecting data of a plurality of alcohol use disorder patients; an independent variable determination step of determining a plurality of independent variables to which one or more machine learning algorithms for generating a predictive model for early discontinuation of outpatient treatment are to be applied; a pre-processing step of generating processed data by processing the data of the plurality of alcohol use disorder patients; a predictive model generating step of receiving the processed data, setting whether outpatient treatment is discontinued early as a dependent variable, and applying the one or more machine learning algorithms to all or part of the processed data based on the independent variables to generate the predictive model; a prediction step of generating a prediction result on whether the plurality of patients with alcohol use disorder will discontinue outpatient treatment early by inputting all or part of the processed data into the predictive model; and an output step of outputting the prediction result.
  • FIG. 1 is a schematic configuration diagram of a system for predicting early discontinuation of outpatient treatment for alcohol use disorder patients according to embodiments of the present invention.
  • FIG. 2 is a flowchart illustrating a variable determination operation performed by a preprocessor according to embodiments of the present invention.
  • FIG. 3 is a diagram illustrating an operation of classifying data of an alcohol use disorder patient into a learning data group and a test data group by a preprocessing unit according to embodiments of the present invention.
  • FIG. 4 is a diagram illustrating an operation of performing sampling on a specific class by a pre-processor according to embodiments of the present invention.
  • FIG. 5 is a diagram illustrating an example of an operation of generating a predictive model by applying one or more machine learning algorithms to a training data group by a predictive model generator according to embodiments of the present invention.
  • FIG. 6 is a diagram illustrating an operation of determining a predictive model according to a performance evaluation index of a predictive model generator according to embodiments of the present invention.
  • FIG. 7 is a diagram illustrating an ROC curve and the AUC, which is one of the performance evaluation indicators according to embodiments of the present invention.
  • FIG. 8 is a diagram illustrating a method for predicting early discontinuation of outpatient treatment for alcohol use disorder patients according to embodiments of the present invention.
  • The term "step of (doing)" or "step of" as used throughout the specification of the present invention does not mean "step for".
  • A "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. Further, one unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.
  • FIG. 1 is a schematic configuration diagram of a system for predicting early discontinuation of outpatient treatment for alcohol use disorder patients according to embodiments of the present invention.
  • A system 100 for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may include a preprocessor 110, a predictive model generator 120, a predictor 130, and an output unit 140.
  • the pre-processing unit 110 may collect data of a plurality of alcohol use disorder patients. In addition, the preprocessing unit 110 may determine a plurality of independent variables to which one or more machine learning algorithms are applied to generate a predictive model for whether or not a plurality of patients with alcohol use disorder prematurely discontinue outpatient treatment. In addition, the pre-processing unit 110 may generate processed data by processing data of a plurality of alcohol use disorder patients in order to apply one or more machine learning algorithms described above.
  • When the pre-processing unit 110 collects data of patients with alcohol use disorder, wired or wireless communication with a server or terminal storing the data may be used.
  • the pre-processing unit 110 may receive medical data of alcohol use disorder patients from one or more medical institutions.
  • data of a plurality of alcohol use disorder patients may be standardized as a common data model (CDM, Common Data Model).
  • data of a plurality of alcohol use disorder patients may be collected from a Clinical Data Warehouse (CDW).
  • the clinical data warehouse (CDW) may transmit data extracted according to research characteristics to the pre-processing unit 110 through de-identification.
  • a plurality of alcohol use disorder patients may be selected from patients with a hospitalization period of 2 weeks or more.
  • the date of hospitalization for a patient who has been hospitalized two or more times for two weeks or longer among multiple alcohol use disorder patients may be defined based on the first hospitalized date.
  • Whether or not patients with alcohol use disorder continue to visit the outpatient clinic can be defined as whether or not the patient visits the outpatient clinic at least once a month for 6 months after being discharged from the hospital.
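  • The labelling rule above can be sketched in code. This is a minimal illustration, not the patent's implementation: the function names are hypothetical, and treating "a month" as a consecutive 30-day window counted from the discharge date is an assumption.

```python
from datetime import date, timedelta


def continued_outpatient_visits(discharge, visits, months=6, days_per_month=30):
    """Return True if every 30-day window after discharge contains a visit.

    Assumption: a "month" is a consecutive 30-day window counted from the
    discharge date; the source does not say how months are delimited.
    """
    for m in range(months):
        start = discharge + timedelta(days=m * days_per_month)
        end = start + timedelta(days=days_per_month)
        if not any(start < v <= end for v in visits):
            return False
    return True


def discontinued_early(discharge, visits):
    """Dependent variable: True when follow-up was not maintained for 6 months."""
    return not continued_outpatient_visits(discharge, visits)
```

  • A patient who is seen every 28 days for six visits is labelled as maintaining treatment, while a patient whose visits stop after the third month is labelled as discontinuing early.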
  • the preprocessing unit 110 may determine a plurality of independent variables to which one or more machine learning algorithms for generating a predictive model for early discontinuation of outpatient treatment of a plurality of alcohol use disorder patients will be applied.
  • The preprocessing unit 110 may determine the independent variables from among variables including 1) the patient's age, 2) gender, 3) hospitalization period, 4) address, 5) medical department, 6) whether there are comorbidities such as diabetes, liver disease, depressive disorder, and anxiety disorder diagnosed within 1 year before hospitalization, 7) whether outpatient treatment for alcohol use disorder was received before hospitalization, and 8) whether naltrexone was prescribed.
  • the t-test is a statistical method for verifying whether the difference in average between two groups is significant.
  • the chi-square test is a statistical method based on a chi-square distribution, and is a test method used to test whether an observed frequency is significantly different from an expected frequency.
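  • As a rough illustration of the two tests, the test statistics can be computed with the Python standard library alone. The sketch below uses Welch's form of the t statistic (an assumption; the source does not say which variant is used) and the Pearson chi-square statistic for a contingency table. It stops at the statistics; turning them into p-values would additionally require the corresponding distribution functions.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))


def chi_square(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat
```

  • For example, `welch_t` could compare the ages of the two outcome groups, and `chi_square` could be applied to a 2 x 2 table of gender by outcome.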
  • the pre-processing unit 110 may generate processed data by processing data of a plurality of alcohol use disorder patients in order to apply one or more machine learning algorithms. Meanwhile, the pre-processing unit 110 may perform the above-described process for prediction target data even after the prediction model is generated. Through this, the learning performance of the predictive model can be improved.
  • the pre-processing unit 110 may improve the accuracy of the predictive model by securing high-quality processed data by removing or correcting missing data, abnormal data, and redundant data among the data of a plurality of alcohol use disorder patients.
  • The pre-processing unit 110 may obtain the processed data by applying operations such as data combination, segmentation, filtering, sampling, derived variable generation, dummy variable generation, scaling adjustment, data type change, and normalization to the data of a plurality of alcohol use disorder patients.
  • the pre-processing unit 110 may convert digital information of numbers or characters derived empirically or experimentally into a simplified form by correcting and arranging them.
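  • A few of the pre-processing operations named above (removing redundant and missing data, dummy variable generation, and normalization) can be sketched as follows. The record layout and the function name `preprocess` are hypothetical, and dropping incomplete records rather than correcting them is a simplifying assumption.

```python
def preprocess(records, numeric_keys, categorical_keys):
    """Deduplicate, drop incomplete records, min-max scale, and one-hot encode."""
    seen, clean = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        # Skip exact duplicates and records with any missing value.
        if key in seen or any(v is None for v in r.values()):
            continue
        seen.add(key)
        clean.append(r)

    ranges = {k: (min(r[k] for r in clean), max(r[k] for r in clean))
              for k in numeric_keys}
    levels = {k: sorted({r[k] for r in clean}) for k in categorical_keys}

    processed = []
    for r in clean:
        row = {}
        for k in numeric_keys:
            lo, hi = ranges[k]
            # Min-max normalization to the range [0, 1].
            row[k] = (r[k] - lo) / (hi - lo) if hi > lo else 0.0
        for k in categorical_keys:
            for level in levels[k]:
                # One dummy (0/1) variable per category level.
                row[f"{k}={level}"] = 1 if r[k] == level else 0
        processed.append(row)
    return processed
```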
  • The predictive model generation unit 120 may receive the processed data, set whether the outpatient treatment of the plurality of alcohol use disorder patients is prematurely discontinued as the dependent variable, and apply one or more machine learning algorithms to all or part of the processed data based on the independent variables to generate a predictive model.
  • machine learning algorithms can be largely classified into three types: supervised learning algorithms, unsupervised learning algorithms, and reinforcement learning algorithms.
  • a supervised learning algorithm is an algorithm that is used when there is an intended result.
  • a machine learning algorithm model can adjust variables for input values and map them to outputs.
  • An unsupervised learning algorithm is an algorithm used when there is no intended result, and can classify an input data set into a set of similar types. Unsupervised learning algorithms can be used for data mining.
  • Reinforcement learning algorithm is an algorithm used when making a decision about an input value. When a decision is made, the decision on the given input value gradually changes according to success/failure. As the reinforcement learning algorithm learns, it may be possible to predict the result of the input.
  • the predictive model generating unit 120 may be implemented as, for example, a workstation server or a cloud server.
  • The prediction unit 130 may input all or part of the processed data generated by the pre-processing unit 110 into the predictive model generated by the predictive model generation unit 120 to generate a prediction result regarding whether the outpatient treatment of a plurality of alcohol use disorder patients is prematurely discontinued.
  • Using the prediction result generated by the prediction unit 130, the system 100 for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder can predict whether a patient's outpatient treatment will be prematurely discontinued, and can also identify the variables that influence premature discontinuation.
  • Using the prediction result generated by the prediction unit 130, the system 100 can help patients receive treatment steadily by promoting special management tailored to each patient's characteristics.
  • The output unit 140 may output the prediction result generated by the prediction unit 130. The output unit 140 may output the prediction result by, for example, displaying it on a screen or printing it with a printer.
  • FIG. 2 is a flowchart illustrating a variable determination operation performed by the preprocessor 110 according to embodiments of the present invention.
  • To address multicollinearity between independent variables, the preprocessor 110 may calculate the variance inflation factor (VIF) of each independent variable.
  • The independent variables may be determined so that the variance inflation factor (VIF) does not exceed a predetermined threshold value.
  • a multicollinearity problem refers to a problem in which some of the independent variables can be expressed as a combination of other independent variables.
  • the multicollinearity problem can occur when independent variables are not independent of each other and have strong interrelationships.
  • As a method for solving the multicollinearity problem, variables that depend on other independent variables may be eliminated, and in this case the variance inflation factor (VIF) may be used.
  • The variance inflation factor (VIF) of an independent variable reflects how well that variable is explained by a linear regression on the other independent variables.
  • the variance inflation factor (VIF) of the i-th variable can be obtained through Equation 1 below.
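  • The equation itself is not reproduced in this text. Assuming it follows the standard definition, the VIF of the i-th independent variable is computed from the coefficient of determination $R_i^2$ obtained by regressing the i-th independent variable on all the other independent variables:

```latex
\mathrm{VIF}_i = \frac{1}{1 - R_i^2}
```

  • Under this definition, a VIF of 5 corresponds to $R_i^2 = 0.8$, that is, 80% of the variance of the i-th variable is explained by the other independent variables.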
  • The preprocessor 110 may calculate variance inflation factor (VIF) values for all determined independent variables, and determine the independent variables such that no VIF value exceeds a predetermined threshold value.
  • For example, the predetermined threshold value of the variance inflation factor (VIF) may be set to 5.
  • the pre-processing unit 110 may determine independent variables (S210).
  • The pre-processing unit 110 may calculate the variance inflation factor (VIF) for all the determined independent variables in the above-described manner (S220).
  • When the variance inflation factor (VIF) for any one independent variable exceeds the threshold value (S230-Y), the preprocessing unit 110 may determine other independent variables (S240).
  • The preprocessor 110 may then return to step S220 and calculate the variance inflation factor (VIF) again for the newly determined independent variables.
  • When no variance inflation factor (VIF) exceeds the threshold value, the preprocessor 110 may end the variable determination process.
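  • The loop of FIG. 2 can be sketched in pure Python. The sketch relies on the standard identity that the VIF values are the diagonal entries of the inverse of the correlation matrix of the independent variables. How the variables are re-determined in step S240 is not specified in the text; dropping the variable with the largest VIF is an assumption made here for illustration, and all function names are hypothetical.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for a dict of equal-length numeric columns."""
    names = list(columns)
    n = len(columns[names[0]])
    z = {}
    for name in names:
        m, s = mean(columns[name]), pstdev(columns[name])
        z[name] = [(v - m) / s for v in columns[name]]
    return [[sum(a * b for a, b in zip(z[p], z[q])) / n for q in names]
            for p in names]


def invert(matrix):
    """Gauss-Jordan matrix inverse with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vifs(columns):
    """VIF per variable: diagonal of the inverse correlation matrix (S220)."""
    inv = invert(correlation_matrix(columns))
    return {name: inv[i][i] for i, name in enumerate(columns)}


def select_variables(columns, threshold=5.0):
    """Repeat S220-S240 until no VIF exceeds the threshold."""
    kept = dict(columns)
    while len(kept) > 1:
        scores = vifs(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= threshold:   # S230-N: stop
            break
        del kept[worst]                  # S240 (assumed rule: drop the worst)
    return list(kept)
```

  • Exactly collinear or constant columns make the correlation matrix singular or undefined, so a production implementation would need to handle those cases explicitly.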
  • FIG. 3 is a diagram illustrating an operation of classifying alcohol use disorder patient data into a training data set and a test data set by the preprocessing unit 110 according to embodiments of the present invention.
  • the preprocessing unit 110 may classify data of a plurality of alcohol use disorder patients into a learning data group and a test data group.
  • The preprocessing unit 110 may classify some of the data of a plurality of patients with alcohol use disorder into a training data set, which can be used to train the predictive model.
  • The test data set may be used to test a predictive model generated based on data of a plurality of alcohol use disorder patients.
  • the test data set may be set large enough to derive statistically significant results, and may include all data of a plurality of alcohol use disorder patients.
  • the test data group may be classified to have the same characteristics as the training data group.
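  • One common way to give the test data group the same characteristics as the training data group is a stratified split, which preserves the class ratio of the dependent variable in both groups. The sketch below is an illustration under that assumption; the 30% test ratio and the function name are hypothetical and are not taken from the source.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_ratio=0.3, seed=0):
    """Return (train_indices, test_indices) with similar class ratios."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # Take the same fraction of every class for the test group.
        n_test = round(len(indices) * test_ratio)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)
```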
  • the predictive model generating unit 120 may derive a performance evaluation index to be used to measure performance of the predictive model using a test data set.
  • the predictive model generation unit 120 may check the objective performance of the predictive model and compare the performances of different predictive models using the performance evaluation index.
  • FIG. 4 is a diagram illustrating an operation of performing oversampling on a specific class by the preprocessor 110 according to embodiments of the present invention.
  • the pre-processing unit 110 may apply a sampling method to a specific class in order to solve the class imbalance of the learning data group when processing data of a plurality of alcohol use disorder patients.
  • When the numbers of data included in the outpatient treatment maintenance class and the outpatient treatment early discontinuation class are disproportionate to each other (e.g., 85:15), problems can arise.
  • sampling methods may be divided into oversampling methods and undersampling methods.
  • The undersampling method reduces the data group of the majority class to the level of the data group of the minority class. Since the undersampling method removes majority class data, calculation time and class overlap can be reduced. However, the undersampling method drastically reduces the total number of data used for learning and may actually degrade learning performance.
  • the oversampling method secures enough data for learning by increasing the data group of the minority class to the level of the majority class.
  • Oversampling methods include random oversampling, which simply replicates existing minority class data to match the ratio, and the synthetic minority oversampling technique (SMOTE), which generates new data between an arbitrary minority class data point and its neighboring minority class data.
  • the pre-processing unit 110 may correct class imbalance by using an oversampling or undersampling method and derive a more precise prediction.
  • In order to solve the imbalance between the outpatient treatment maintenance class and the outpatient treatment early discontinuation class, which are the classes of the dependent variable for the learning data group, the pre-processing unit 110 may apply oversampling to the minority class of the two (the outpatient treatment maintenance class).
  • For example, the pre-processing unit 110 may generate duplicated data for data a and b included in the minority class.
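  • Random oversampling as described above can be sketched as follows. The function name is hypothetical; the sketch duplicates randomly chosen minority class rows until every class reaches the size of the majority class, and it is intended to be applied to the learning data group only.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows until all classes have equal counts."""
    counts = Counter(labels)
    target = max(counts.values())
    rng = random.Random(seed)

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            # Replicate an existing row of this class, chosen at random.
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels
```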
  • FIG. 5 is a diagram illustrating an operation of generating a predictive model by applying one or more machine learning algorithms to a training data group by the predictive model generator 120 according to embodiments of the present invention.
  • the predictive model generator 120 may generate a predictive model by applying one or more machine learning algorithms to a portion corresponding to a training data group among processed data.
  • The one or more machine learning algorithms may be one or more of logistic regression, support vector machine (SVM), random forest, gradient boosting, and AdaBoost.
  • Logistic regression is a statistical technique that uses a logistic function to estimate the relationship between a dependent variable having only two values and the independent variables.
  • the dependent variable is dichotomous (0 or 1), and the independent variable can be categorical or continuous.
  • The logistic regression model is a special form of the generalized linear model and is a functional model that draws an S-shaped curve. As a result of logistic regression analysis, if the predicted probability for the dependent variable is greater than 0.5, the event is predicted to occur, and if it is less than 0.5, the event is predicted not to occur.
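  • The S-shaped logistic function and the 0.5 decision rule can be illustrated with a short sketch. The weights and bias are assumed to have been fitted elsewhere; the function names and parameters are hypothetical.

```python
from math import exp


def logistic(z):
    """Numerically stable logistic (sigmoid) function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def predict(features, weights, bias, threshold=0.5):
    """Return (probability, label); label 1 means the event is predicted."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = logistic(z)
    return p, int(p > threshold)
```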
  • A Support Vector Machine (SVM) is a machine learning method: a supervised learning model for pattern recognition and data analysis that is mainly used for classification and regression analysis.
  • the support vector machine algorithm may create a non-probabilistic binary linear classification model that determines which category new data belongs to when given a set of data belonging to one of two categories.
  • the category may be divided into an outpatient treatment maintenance group and an outpatient treatment early discontinuation group for patients with alcohol use disorder, and a support vector machine may be used to determine which of the two groups the new data corresponds to.
  • A random forest is a type of ensemble learning method used for classification, regression analysis, etc., and operates by outputting a class or an average prediction value from a plurality of decision trees constructed in the training process.
  • The random forest test process using the ensemble model may derive a final result through averaging, multiplication, or majority voting of the results obtained from the decision trees. These tests can be performed in parallel, resulting in high computational efficiency.
  • Gradient Boosting is a machine learning algorithm that can perform regression analysis or classification analysis, and is an algorithm that belongs to the boosting family of ensemble methodologies of machine learning algorithms.
  • Boosting is the process of creating a strong classifier by combining weak classifiers. Gradient boosting takes the error of the predictions made by the model in the previous stage, creates a new model with the goal of reducing this error to zero, and builds the final model by combining these models.
  • AdaBoost is a machine learning algorithm that expresses the final result as a weighted sum of the results of other learning algorithms.
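  • The weighted combination used by AdaBoost can be sketched as follows, assuming the standard discrete AdaBoost formulation with labels in {-1, +1}, where each weak learner receives the weight 0.5 * ln((1 - error) / error). The function names are hypothetical, and the training of the weak learners themselves is omitted.

```python
from math import log


def learner_weight(error):
    """AdaBoost weight for a weak learner with weighted error rate in (0, 1)."""
    return 0.5 * log((1.0 - error) / error)


def weighted_vote(predictions, weights):
    """Combine weak predictions in {-1, +1} into a final label in {-1, +1}."""
    score = sum(w * p for w, p in zip(weights, predictions))
    return 1 if score >= 0 else -1
```

  • A weak learner that is no better than chance (error 0.5) receives weight 0, so a single accurate learner can outvote several inaccurate ones.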
  • the predictive model generator 120 may generate a predictive model corresponding to the machine learning algorithm by applying a machine learning algorithm to a training data set.
  • FIG. 6 is a diagram illustrating an operation of determining a predictive model according to performance evaluation indexes by the predictive model generator 120 according to embodiments of the present invention.
  • The predictive model generation unit 120 may apply a plurality of machine learning algorithms to the portion of the processed data received from the preprocessor 110 that corresponds to the learning data group, thereby generating a plurality of candidate prediction models. For each of the plurality of candidate prediction models, a test result may be derived by inputting the portion of the processed data that corresponds to the test data group.
  • the predictive model generator 120 may calculate a performance evaluation index for each of a plurality of candidate predictive models using the derived test results.
  • the prediction model generator 120 may determine a candidate prediction model having the highest performance evaluation index value among a plurality of candidate prediction models as the prediction model.
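  • Selecting the candidate with the highest performance evaluation index is a simple maximum over the candidates. In the sketch below, the model names and index values are made-up placeholders for illustration only and are not results from the source.

```python
def select_model(scores, index="auc"):
    """Return the name of the candidate with the highest value of `index`."""
    return max(scores, key=lambda name: scores[name][index])


# Hypothetical index values, for illustration only.
example_scores = {
    "model_a": {"accuracy": 0.70, "auc": 0.65},
    "model_b": {"accuracy": 0.68, "auc": 0.72},
}
best_by_auc = select_model(example_scores, index="auc")
best_by_accuracy = select_model(example_scores, index="accuracy")
```

  • As the example shows, the selected model can differ depending on which performance evaluation index is used.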
  • the performance evaluation index may be, for example, one of accuracy, sensitivity, specificity, and area under the ROC curve (AUC).
  • Accuracy is the number of correctly predicted data (TP + TN) divided by the total number of predicted data (TP + FP + FN + TN), and indicates how closely the predictions match the actual data.
  • Accuracy refers to the proportion of all patients for whom the predictive model correctly predicted whether outpatient treatment would be discontinued early or maintained.
  • TP is the number of data that the prediction model predicted to be positive and that is actually positive.
  • FP is the number of data that the prediction model predicted to be positive but that is actually negative.
  • FN is the number of data that the prediction model predicted to be negative but that is actually positive.
  • TN is the number of data that the prediction model predicted to be negative and that is actually negative.
  • Sensitivity, also called recall or hit rate, is the ratio of data that the predictive model predicted to be positive (TP) among the data that is actually positive (TP + FN). It means the proportion of patients who actually discontinued outpatient treatment early that the predictive model predicted correctly.
  • Specificity is the ratio of data that the predictive model predicted to be negative (TN) among the data that is actually negative (TN + FP). It means the proportion of patients who actually maintained outpatient treatment that the predictive model predicted correctly.
  • AUC can be obtained from the ROC (Receiver Operating Characteristic) curve, which plots the true positive rate (sensitivity) against the false positive rate (1 - specificity).
  • AUC is the area under the ROC curve, and the maximum is 1, and a good predictive model has an AUC value close to 1.
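  • The four performance evaluation indexes can be computed directly from the confusion-matrix counts. The sketch below assumes labels coded as 1 (positive, early discontinuation) and 0 (negative, treatment maintained), uses sensitivity = TP / (TP + FN) and specificity = TN / (TN + FP), and computes AUC through the equivalent rank formulation: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The function names are hypothetical.

```python
def confusion(y_true, y_pred):
    """Return (TP, FP, FN, TN) for labels coded as 1 (positive) and 0."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from hard predictions."""
    tp, fp, fn, tn = confusion(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```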
  • the predictive model generation unit 120 may select a predictive model capable of predicting whether an alcohol use disorder patient will stop outpatient treatment early with the highest probability using the above-described performance evaluation index for a plurality of candidate predictive models.
  • the predictive model generation unit 120 may generate Table 2 by calculating a performance evaluation index according to each of the candidate predictive models.
  • a predictive model using the Adaboost algorithm can be determined as the predictive model.
  • When a predictive model is determined based on accuracy or specificity, a candidate predictive model using a random forest algorithm may be determined as the predictive model.
  • FIG. 7 is a diagram illustrating an ROC curve and the AUC, which is one of the performance evaluation indicators according to embodiments of the present invention.
  • the predictive model generation unit 120 may determine a predictive model using AUC as a performance evaluation index.
  • An ROC curve may, for example, be determined as shown in FIG. 7.
  • AUC means the area under the ROC curve, and the predictive model generation unit 120 may check the AUC value by calculating the area under the ROC curve for each candidate prediction model.
  • the prediction model generator 120 may select a prediction model using Adaboost.
  • FIG. 8 is a diagram illustrating a method for predicting early discontinuation of outpatient treatment for alcohol use disorder patients according to embodiments of the present invention.
  • the method for predicting early discontinuation of outpatient treatment for alcohol use disorder patients may include a data collection step ( S810 ) of collecting data of a plurality of alcohol use disorder patients.
  • the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may include an independent variable determination step (S820) of determining a plurality of independent variables to which one or more machine learning algorithms are to be applied in order to generate a predictive model for whether a plurality of patients with alcohol use disorder will discontinue outpatient treatment early.
  • the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may include a preprocessing step ( S830 ) of generating processed data by processing data of a plurality of patients with alcohol use disorder.
  • the aforementioned data collection step (S810), independent variable determination step (S820), and preprocessing step (S830) may be executed by the aforementioned preprocessor 110.
  • the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may include a predictive model generating step (S840) of receiving the processed data, setting whether each of the plurality of alcohol use disorder patients discontinues outpatient treatment early as a dependent variable, and generating a predictive model by applying one or more machine learning algorithms to all or part of the processed data based on the independent variables.
  • the predictive model generating step (S840) may be executed by the aforementioned predictive model generating unit 120.
  • the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may include a prediction step (S850) of inputting all or part of the processed data into the predictive model to generate a prediction result on whether the plurality of patients with alcohol use disorder will discontinue outpatient treatment early. Meanwhile, the prediction step (S850) may be executed by the predicting unit 130 described above.
  • the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may include an output step ( S860 ) of outputting a prediction result. Meanwhile, the output step (S860) may be executed by the above-described output unit 140.
  • in the independent variable determining step (S820), for example, variance inflation factors (VIFs) may be calculated for the independent variables in order to address multicollinearity between them, and the independent variables may be determined so that each variance inflation factor remains below a predetermined threshold value.
  • the pre-processing step ( S830 ) may include classifying a plurality of alcohol use disorder patient data into a learning data group and a test data group.
  • the pre-processing step (S830) may include applying oversampling, within the learning data group, to whichever of the outpatient treatment maintenance class and the outpatient treatment early discontinuation class (the two classes of the dependent variable) is the minority class.
  • a predictive model may be generated by applying one or more machine learning algorithms to a part corresponding to the training data group among the processed data.
  • the one or more machine learning algorithms may be one or more of 1) Logistic Regression, 2) Support Vector Machine (SVM), 3) Random Forest, 4) Gradient Boosting, and 5) Adaboost.
  • in the predictive model generating step (S840), when there are a plurality of machine learning algorithms, the following may be performed: 1) deriving a test result by inputting the part of the processed data corresponding to the test data group into each of a plurality of candidate prediction models generated by applying the plurality of machine learning algorithms to the part of the processed data corresponding to the training data group, 2) calculating a performance evaluation index for each of the plurality of candidate prediction models using the test result, and 3) determining the candidate prediction model having the highest performance evaluation index among the plurality of candidate prediction models as the prediction model.
  • the performance evaluation index may be an area under the ROC curve (AUC).
  • the aforementioned system 100 for predicting early withdrawal from outpatient treatment for patients with alcohol use disorder may be implemented by a computing device including at least some of a processor, a memory, a user input device, and a presentation device.
  • Memory is a medium that stores computer-readable software, applications, program modules, routines, instructions, and/or data that are coded to perform particular tasks when executed by a processor.
  • a processor may read and execute computer-readable software, applications, program modules, routines, instructions, and/or data stored in memory.
  • the user input device may be a means for allowing a user to input a command to execute a specific task to the processor or input data required for execution of the specific task.
  • the user input device may include a physical or virtual keyboard or keypad, key buttons, mouse, joystick, trackball, touch-sensitive input means, or a microphone.
  • the presentation device may include a display, a printer, a speaker, or a vibrator.
  • Computing devices may include a variety of devices such as smart phones, tablets, laptops, desktops, servers, and clients.
  • a computing device may be a single stand-alone device or may include multiple computing devices operating in a distributed environment consisting of multiple computing devices cooperating with each other over a communications network.
  • the above-described method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may be executed by a computing device having a processor and a memory that stores computer-readable software, applications, program modules, routines, instructions, and/or data structures coded to perform the method when executed by the processor.
  • present embodiments described above may be implemented through various means.
  • the present embodiments may be implemented by hardware, firmware, software, or a combination thereof.
  • the present embodiments may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), processors, controllers, microcontrollers, or microprocessors.
  • a method for predicting early withdrawal from outpatient treatment of a patient with alcohol use disorder may be implemented using an artificial intelligence semiconductor device in which neurons and synapses of a deep neural network are implemented as semiconductor devices.
  • the semiconductor device may be a currently used semiconductor device such as SRAM, DRAM, or NAND, a next-generation semiconductor device such as RRAM, STT-MRAM, or PRAM, or a combination thereof.
  • when the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder is implemented using an artificial intelligence semiconductor device, the result (weights) of training the deep learning model in software may be transferred to synapse-mimicking devices arranged in an array, or learning may be performed on the artificial intelligence semiconductor device itself.
  • the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may be implemented in the form of a device, procedure, or function that performs the functions or operations described above.
  • the software codes may be stored in a memory unit and driven by a processor.
  • the memory unit may be located inside or outside the processor and exchange data with the processor by various means known in the art.
  • terms such as "system" generally refer to computer-related entities, such as hardware or a combination of hardware and software.
  • both an application running on a controller or processor and the controller or processor itself can be a component.
  • One or more components may reside within a process and/or thread of execution, and components may reside on one device (eg, system, computing device, etc.) or may be distributed across two or more devices.
  • another embodiment provides a computer program stored in a computer recording medium that performs the above-described method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder.
  • another embodiment provides a computer-readable recording medium recording a program for realizing the above-described method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder.
  • a program recorded on a recording medium may be read, installed, and executed in a computer to execute the above-described steps.
  • the above-described program may include code written in a computer language, such as C, C++, JAVA, or machine language, that can be read by the computer's processor (CPU) through the computer's device interface.
  • These codes may include functional codes related to functions defining the above-described functions, and may include control codes related to execution procedures necessary for a processor of a computer to execute the above-described functions according to a predetermined procedure.
  • these codes may further include memory-reference-related codes indicating which location (address) of the computer's internal or external memory should be referenced for additional information or media necessary for the computer's processor to execute the above-described functions.
  • when the computer's processor needs to communicate with any other remote computer or server to execute the above-described functions, the code may further include communication-related codes specifying how to communicate with the other computer or server using the computer's communication module, and what information or media to transmit or receive during communication.
  • computer-readable recording media on which the above-described program is recorded include, for example, ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical media storage devices, and may also include media implemented in the form of a carrier wave (e.g., transmission over the Internet).
  • the computer-readable recording medium may be distributed across computer systems connected through a network, so that computer-readable codes can be stored and executed in a distributed manner.
  • a functional program for implementing the present invention, and the codes and code segments related thereto, may be easily inferred or changed by programmers in the art to which the present invention belongs, in consideration of the system environment of the computer that reads the recording medium and executes the program.
  • the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may be implemented in the form of a recording medium including instructions executable by a computer, such as an application or program module executed by a computer.
  • Computer readable media can be any available media that can be accessed by a computer and includes both volatile and nonvolatile media, removable and non-removable media. Also, computer readable media may include all computer storage media. Computer storage media includes both volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data.
  • the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may be executed by an application installed by default in a terminal (which may include a program included in a platform or operating system of the terminal), or by an application (that is, a program) installed directly in the master terminal through an application providing server, such as a server or web server related to the application or the corresponding service.
  • the above-described method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may be implemented as an application (i.e., a program) that is installed by default in a terminal or installed directly by a user, and may be recorded in a computer-readable recording medium of the terminal or the like.
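
The AUC-based selection among candidate predictive models described in the list above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function names `roc_auc` and `select_best_model`, the candidate names, and all labels and scores are hypothetical, and treating early discontinuation as the positive class (1) is an assumption.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed with the trapezoidal rule.

    y_true holds 1 for the positive class and 0 for the negative class;
    scores are predicted probabilities of the positive class.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present to draw an ROC curve")

    # Sweep the decision threshold from the highest score downwards.
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    prev_tpr = prev_fpr = 0.0
    auc = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Tied scores cross the threshold together and form one ROC point.
        while i < len(ranked) and ranked[i][0] == threshold:
            tp += ranked[i][1]
            fp += 1 - ranked[i][1]
            i += 1
        tpr, fpr = tp / pos, fp / neg  # sensitivity, 1 - specificity
        auc += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_tpr, prev_fpr = tpr, fpr
    return auc


def select_best_model(
    y_true: Sequence[int], candidate_scores: Dict[str, Sequence[float]]
) -> Tuple[str, float]:
    """Return the candidate with the highest AUC on the test data group."""
    aucs = {name: roc_auc(y_true, s) for name, s in candidate_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs[best]


# Hypothetical test-set labels (1 = early discontinuation) and model scores.
y_test = [1, 0, 1, 0]
candidates = {
    "candidate_a": [0.9, 0.2, 0.8, 0.1],
    "candidate_b": [0.9, 0.8, 0.7, 0.1],
}
best_name, best_auc = select_best_model(y_test, candidates)
```

In this toy data `candidate_a` ranks every positive above every negative, so it is the one selected; in practice the scores would come from the candidate models applied to the test data group.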

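The variance inflation factor check mentioned for the independent variable determination step (S820) in the list above can be illustrated with a short sketch. It relies on the standard identity that the VIF of each variable is the corresponding diagonal element of the inverse of the predictors' correlation matrix. The variable names, the data, the threshold value, and the strategy of repeatedly dropping the variable with the largest VIF are all assumptions made for illustration; the source only states that the VIFs are kept below a predetermined threshold.

```python
from typing import Dict, List, Sequence


def _correlation_matrix(columns: List[Sequence[float]]) -> List[List[float]]:
    """Pearson correlation matrix of the given columns (no constant columns)."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[v - m for v in c] for c, m in zip(columns, means)]
    norms = [sum(v * v for v in c) ** 0.5 for c in centered]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
         for cj, nj in zip(centered, norms)]
        for ci, ni in zip(centered, norms)
    ]


def _invert(matrix: List[List[float]]) -> List[List[float]]:
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("perfectly collinear variables: VIF is infinite")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(data: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """VIF per variable: diagonal of the inverse correlation matrix."""
    names = list(data)
    inverse = _invert(_correlation_matrix([data[name] for name in names]))
    return {name: inverse[i][i] for i, name in enumerate(names)}


def drop_collinear(data: Dict[str, Sequence[float]], threshold: float) -> List[str]:
    """Drop the variable with the largest VIF until all VIFs are <= threshold."""
    kept = dict(data)
    while len(kept) > 1:
        vifs = variance_inflation_factors(kept)
        worst = max(vifs, key=vifs.get)
        if vifs[worst] <= threshold:
            break
        del kept[worst]
    return list(kept)


# Hypothetical numeric predictors; the two columns have correlation 0.8.
example = {"x1": [1, 2, 3, 4, 5], "x2": [2, 1, 4, 3, 5]}
example_vifs = variance_inflation_factors(example)
```

With a correlation of 0.8 between the two columns, each VIF is 1 / (1 - 0.8^2), roughly 2.78, so both variables survive a threshold of 10 but not a threshold of 2.
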
Abstract

Embodiments of the present invention pertain to a system and method for predicting early discontinuation of treatment of outpatients with alcohol use disorder. According to embodiments of the present invention, a plurality of independent parameters to which a machine learning algorithm will be applied is determined, the machine learning algorithm is applied to produce a prediction model, and processed data is input into the prediction model to generate a prediction result accounting for whether early discontinuation of treatment of outpatients with alcohol use disorder will be performed or not.

Description

System and method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder
The present invention provides a system and method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder.
Studies have previously been conducted on the factors that affect continued outpatient treatment, with the aim of raising the treatment rate of patients with alcohol use disorder. However, most of these studies were merely prospective studies, and even the retrospective ones focused on traditional methodologies such as regression analysis.
Machine learning is the study of computer algorithms that improve automatically through experience and the use of data. Machine learning is also regarded as a part of artificial intelligence. Rather than being explicitly programmed to perform a specific action, a machine learning algorithm can be used to build a model that makes predictions or decisions based on samples called training data. Machine learning can be used in a variety of applications, including medicine, speech recognition, and computer vision.
Predictive models based on machine learning can perform class classification with high accuracy. Recent psychiatric research has made good use of machine-learning-based predictive models in developing decision support systems.
Alcohol use disorder can cause not only physical illnesses, such as alcohol-induced physical complications and alcoholic dementia, but also social problems such as alcohol-related crimes and accidents, as well as enormous economic losses.
According to the 2016 epidemiological survey of mental disorders in Korea, the lifetime prevalence of alcohol use disorder, which includes alcohol dependence and abuse, was 12.2% (18.1% in men, 6.4% in women), the highest among mental disorders.
Alcohol use disorder has a higher relapse rate than other mental disorders. To prevent relapse, it needs to be managed over a long period rather than ended with a single course of treatment. Receiving treatment consistently can also have a positive effect on the treatment outcome. Continuous follow-up of the patient is therefore an important indicator for evaluating the prognosis of alcohol use disorder.
In practice, the outpatient treatment retention rate of patients with alcohol use disorder is quite low. In other countries, 52% to 75% of patients receiving outpatient treatment for alcohol use disorder are lost to follow-up at the fourth treatment. According to a Korean study, 91.7% of patients were lost to follow-up within 6 months of discharge. It is therefore important to predict and manage those patients with alcohol use disorder whose follow-up will be discontinued early.
Embodiments of the present invention can predict whether a patient with alcohol use disorder will discontinue outpatient treatment early by calculating the probability of early discontinuation using a predictive model designed through machine learning.
Embodiments of the present invention can assist patient management so that patients with alcohol use disorder who are at high risk of discontinuing outpatient treatment early can remain in treatment, and can ultimately contribute to preventing relapse and increasing the success rate of treatment.
However, the technical problem to be solved by the present embodiments is not limited to the technical problems described above, and other technical problems may exist.
In one aspect, embodiments of the present invention provide a system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, the system including: a preprocessing unit that collects data of a plurality of alcohol use disorder patients, determines a plurality of independent variables to which one or more machine learning algorithms are to be applied in order to generate a predictive model for whether the plurality of alcohol use disorder patients discontinue outpatient treatment early, and generates processed data by processing the data of the plurality of alcohol use disorder patients; a predictive model generation unit that receives the processed data, sets whether the plurality of alcohol use disorder patients discontinue outpatient treatment early as a dependent variable, and generates a predictive model by applying the one or more machine learning algorithms to all or part of the processed data based on the independent variables; a prediction unit that inputs all or part of the processed data into the predictive model to generate a prediction result regarding whether the plurality of alcohol use disorder patients discontinue outpatient treatment early; and an output unit that outputs the prediction result.
In another aspect, embodiments of the present invention provide a method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, the method including: a data collection step of collecting data of a plurality of alcohol use disorder patients; an independent variable determination step of determining a plurality of independent variables to which one or more machine learning algorithms are to be applied in order to generate a predictive model for whether the plurality of alcohol use disorder patients discontinue outpatient treatment early; a preprocessing step of generating processed data by processing the data of the plurality of alcohol use disorder patients; a predictive model generation step of receiving the processed data, setting whether the plurality of alcohol use disorder patients discontinue outpatient treatment early as a dependent variable, and generating a predictive model by applying the one or more machine learning algorithms to all or part of the processed data based on the independent variables; a prediction step of inputting all or part of the processed data into the predictive model to generate a prediction result regarding whether the plurality of alcohol use disorder patients discontinue outpatient treatment early; and an output step of outputting the prediction result.
According to embodiments of the present invention, whether a patient with alcohol use disorder will discontinue outpatient treatment early can be predicted by calculating the probability of early discontinuation using a predictive model designed through machine learning.
According to embodiments of the present invention, patient management can be assisted so that patients with alcohol use disorder who are at high risk of discontinuing outpatient treatment early can remain in treatment, which can ultimately contribute to preventing relapse and increasing the success rate of treatment.
FIG. 1 is a schematic configuration diagram of a system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to embodiments of the present invention.
FIG. 2 is a flowchart illustrating a variable determination operation performed by a preprocessing unit according to embodiments of the present invention.
FIG. 3 is a diagram illustrating an operation in which a preprocessing unit according to embodiments of the present invention classifies data of alcohol use disorder patients into a learning data group and a test data group.
FIG. 4 is a diagram illustrating an operation in which a preprocessing unit according to embodiments of the present invention performs sampling on a specific class.
FIG. 5 is a diagram illustrating an example of an operation in which a predictive model generation unit according to embodiments of the present invention generates a predictive model by applying one or more machine learning algorithms to a learning data group.
FIG. 6 is a diagram illustrating an operation in which a predictive model generation unit according to embodiments of the present invention determines a predictive model according to a performance evaluation index.
FIG. 7 is a diagram showing AUC, which is one of the performance evaluation indices according to embodiments of the present invention.
FIG. 8 is a diagram illustrating a method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to embodiments of the present invention.
Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention belongs can easily practice them. However, the present invention may be implemented in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to explain the present invention clearly, and similar parts are given reference numerals throughout the specification.
Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element in between. In addition, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise, and it should be understood that this does not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.
Terms of degree such as "about" and "substantially", as used throughout the specification, mean at or near the stated value when manufacturing and material tolerances inherent in the stated meaning are presented, and are used to prevent unscrupulous infringers from unfairly exploiting disclosures in which exact or absolute values are stated to aid understanding of the present invention. The expressions "step of (doing)" and "step of", as used throughout the specification, do not mean "step for".
In this specification, a "unit" includes a unit realized by hardware, a unit realized by software, and a unit realized using both. One unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.
FIG. 1 is a schematic configuration diagram of a system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to embodiments of the present invention.
Referring to FIG. 1, a system 100 for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to embodiments of the present invention may include a preprocessing unit 110, a predictive model generation unit 120, a prediction unit 130, and an output unit 140.
The preprocessing unit 110 may collect data of a plurality of alcohol use disorder patients. The preprocessing unit 110 may determine a plurality of independent variables to which one or more machine learning algorithms are to be applied in order to generate a predictive model for whether the plurality of alcohol use disorder patients discontinue outpatient treatment early. The preprocessing unit 110 may also generate processed data by processing the data of the plurality of alcohol use disorder patients so that the one or more machine learning algorithms can be applied.
When collecting data of alcohol use disorder patients, the preprocessing unit 110 may use wired or wireless communication with a server or terminal that stores the data. For example, the preprocessing unit 110 may receive medical data of alcohol use disorder patients from one or more medical institutions. In this case, the data of the plurality of alcohol use disorder patients may be standardized according to a Common Data Model (CDM).
For example, the data of the plurality of alcohol use disorder patients may be collected from a Clinical Data Warehouse (CDW). The CDW may deliver data that has been de-identified and extracted to suit the characteristics of the study to the preprocessing unit 110.
Meanwhile, to increase the sensitivity of the predictive model, the plurality of alcohol use disorder patients may be selected from among patients hospitalized for 2 weeks or longer.
In this case, for a patient who has been hospitalized for 2 weeks or longer on two or more occasions, the admission date may be defined as the date of the earliest admission.
Whether an alcohol use disorder patient continues outpatient visits may be defined as whether the patient makes at least one outpatient visit every month during the 6 months after discharge.
In addition, patients who died within 6 months of discharge may be excluded.
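The cohort definition above (a hospitalization of at least 2 weeks, the earliest such admission as the reference, at least one visit in each of the 6 months after discharge, and exclusion of patients who died within 6 months) could be turned into a labeling routine along these lines. This is a sketch under stated assumptions: the `Admission` type and `label_patient` function are hypothetical, and a "month" is approximated as a 30-day window, which the source does not specify.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional, Sequence


@dataclass
class Admission:
    admitted: date
    discharged: date


def index_admission(admissions: Sequence[Admission]) -> Optional[Admission]:
    """Earliest admission lasting 2 weeks (14 days) or longer, if any."""
    eligible = [a for a in admissions if (a.discharged - a.admitted).days >= 14]
    return min(eligible, key=lambda a: a.admitted) if eligible else None


def label_patient(
    admissions: Sequence[Admission],
    visit_dates: Sequence[date],
    death_date: Optional[date] = None,
    months: int = 6,
    days_per_month: int = 30,  # assumption: a month is a 30-day window
) -> Optional[bool]:
    """Return True for early discontinuation, False for maintained treatment,
    or None when the patient is excluded from the cohort."""
    index = index_admission(admissions)
    if index is None:
        return None  # no hospitalization of 2 weeks or longer
    follow_up_end = index.discharged + timedelta(days=months * days_per_month)
    if death_date is not None and death_date <= follow_up_end:
        return None  # died within the follow-up period
    for m in range(months):
        start = index.discharged + timedelta(days=m * days_per_month)
        stop = start + timedelta(days=days_per_month)
        if not any(start < v <= stop for v in visit_dates):
            return True  # a month without any outpatient visit
    return False


# Hypothetical patient: one 19-day admission, then one visit per 30-day window.
stay = [Admission(date(2020, 1, 1), date(2020, 1, 20))]
monthly_visits: List[date] = [
    date(2020, 1, 20) + timedelta(days=15 + 30 * m) for m in range(6)
]
label = label_patient(stay, monthly_visits)
```

Returning `None` for excluded patients keeps the exclusion rules separate from the binary dependent variable used for training.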
The preprocessing unit 110 may determine a plurality of independent variables to which one or more machine learning algorithms are to be applied in order to generate a predictive model for whether the plurality of alcohol use disorder patients discontinue outpatient treatment early.
For example, the preprocessing unit 110 may determine the independent variables from among variables including 1) the patient's age, 2) sex, 3) length of hospitalization, 4) address, 5) medical department, 6) the presence of comorbidities diagnosed within 1 year before admission, such as diabetes, liver disease, depressive disorder, and anxiety disorder, 7) whether the patient received outpatient treatment for alcohol use disorder before admission, and 8) whether naltrexone was prescribed.
Meanwhile, statistical analyses such as a t-test and a chi-square test may be performed on the determined independent variables to identify differences between the outpatient treatment maintenance group and the outpatient treatment early discontinuation group among the plurality of alcohol use disorder patients.
The t-test is a statistical method for testing whether the difference between the means of two groups is significant. The chi-square test is a statistical method based on the chi-square distribution, used to test whether observed frequencies differ significantly from expected frequencies.
일 예로, 전술한 통계 분석의 실시 결과는 표 1과 같을 수 있다.As an example, the results of the statistical analysis described above may be shown in Table 1.
| Variable | Maintained outpatient treatment (n=126) | Early discontinuation of outpatient treatment (n=713) | P-value |
|---|---|---|---|
| Length of hospitalization | | | 0.406 |
| 28 days or less | 77 (61.1%) | 437 (61.3%) | |
| 29-56 days | 25 (19.8%) | 136 (19.1%) | |
| 57-70 days | 15 (11.9%) | 110 (15.4%) | |
| 70 days or more | 9 (7.1%) | 30 (4.2%) | |
| Sex | | | 0.008*** |
| Male | 91 (72.2%) | 590 (82.7%) | |
| Female | 35 (27.8%) | 123 (17.3%) | |
| Age | | | 0.058* |
| 29 or younger | 9 (7.1%) | 22 (3.1%) | |
| 30-39 | 22 (17.5%) | 96 (13.5%) | |
| 40-49 | 29 (23.0%) | 201 (28.2%) | |
| 50-59 | 30 (23.8%) | 216 (30.3%) | |
| 60 or older | 36 (28.6%) | 178 (25.0%) | |
| Address | | | 0.04** |
| Seoul | 37 (29.4%) | 144 (20.2%) | |
| Gyeonggi | 75 (59.5%) | 451 (63.3%) | |
| Other | 14 (11.1%) | 118 (16.5%) | |
| Medical department | | | 0.015** |
| Psychiatry | 111 (88.1%) | 546 (76.6%) | |
| Gastroenterology | 9 (7.1%) | 104 (14.6%) | |
| Other | 6 (4.8%) | 63 (8.8%) | |
| Outpatient treatment for alcohol use disorder before hospitalization | | | 0.000*** |
| No | 35 (27.8%) | 325 (45.6%) | |
| Yes | 91 (72.2%) | 388 (54.4%) | |
| Diabetes | | | 0.087* |
| No | 109 (86.5%) | 654 (91.7%) | |
| Yes | 17 (13.5%) | 59 (8.3%) | |
| Liver disease | | | 0.224 |
| No | 107 (84.9%) | 569 (79.8%) | |
| Yes | 19 (15.1%) | 144 (20.2%) | |
| Depressive disorder | | | 0.006*** |
| No | 78 (61.9%) | 529 (74.2%) | |
| Yes | 48 (38.1%) | 184 (25.8%) | |
| Anxiety disorder | | | 0.053* |
| No | 104 (82.5%) | 635 (89.1%) | |
| Yes | 22 (17.5%) | 78 (10.9%) | |
| Naltrexone prescribed | | | 0.000*** |
| No | 93 (73.8%) | 626 (87.8%) | |
| Yes | 33 (26.2%) | 87 (12.2%) | |
According to Table 1, at a significance level of 0.05, statistically significant differences between the outpatient treatment maintenance group and the early discontinuation group may be found for sex, address, medical department, depressive disorder, outpatient treatment for alcohol use disorder before hospitalization, and naltrexone prescription. For example, the rate of early discontinuation of outpatient treatment may be higher among male patients, patients without depressive disorder, patients not prescribed naltrexone, patients living outside Seoul, patients admitted to a department other than psychiatry, and patients who were diagnosed with alcohol use disorder but did not receive outpatient treatment before hospitalization.
Through this verification process, differences in the characteristics of patients with alcohol use disorder between the groups can be identified.
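As an illustration of the chi-square test described above, the following sketch applies it to the sex counts in Table 1. The description does not say which software or which variant of the test was used, so the Yates continuity correction and the helper name `chi_square_2x2` are assumptions made for this example.

```python
import math

def chi_square_2x2(table, yates=True):
    """Chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value) for one degree of freedom.
    The Yates continuity correction is an assumption, not stated in the source.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)
            statistic += diff ** 2 / expected
    # Upper-tail probability of a chi-square variable with 1 degree of freedom
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Sex counts from Table 1: rows = (maintained, discontinued), columns = (male, female)
sex_table = ((91, 35), (590, 123))
statistic, p_value = chi_square_2x2(sex_table)
print(f"chi2 = {statistic:.3f}, p = {p_value:.4f}")
```

Under these assumptions the p-value falls below the 0.05 significance level, in line with the significant difference by sex reported in Table 1.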
To apply the one or more machine learning algorithms, the preprocessing unit 110 may process the data of the plurality of patients with alcohol use disorder to generate processed data. The preprocessing unit 110 may also perform this process on the data to be predicted after the predictive model has been generated. This may improve the learning performance of the predictive model.
By removing or correcting missing data, abnormal data, and duplicate data in the data of the plurality of patients with alcohol use disorder, the preprocessing unit 110 may secure high-quality processed data and improve the accuracy of the predictive model.
The preprocessing unit 110 may also generate the processed data by performing operations on the patient data such as combining, splitting, filtering, sampling, derived-variable generation, dummy-variable generation, scaling, data type conversion, and normalization.
For example, the preprocessing unit 110 may correct and sort empirically or experimentally derived numeric or character data and convert it into a simplified format.
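A few of the preprocessing operations listed above (duplicate removal, missing-data removal, dummy-variable generation, and scaling) could look like the sketch below. The record layout and the field names `age`, `sex`, and `department` are hypothetical and chosen only for illustration.

```python
def preprocess(records):
    """Clean a list of patient dicts and return model-ready feature dicts.

    Field names ("age", "sex", "department") are hypothetical.
    """
    # 1) Remove exact duplicate records while preserving order
    seen, unique = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            unique.append(r)

    # 2) Remove records with missing values
    complete = [r for r in unique if all(v is not None for v in r.values())]
    if not complete:
        return []

    # 3) Dummy variables for the categorical field, min-max scaling for age
    categories = sorted({r["department"] for r in complete})
    ages = [r["age"] for r in complete]
    lo, hi = min(ages), max(ages)

    processed = []
    for r in complete:
        row = {
            "age_scaled": (r["age"] - lo) / (hi - lo) if hi > lo else 0.0,
            "male": 1 if r["sex"] == "M" else 0,
        }
        for c in categories:
            row[f"department_{c}"] = 1 if r["department"] == c else 0
        processed.append(row)
    return processed
```

Dropping incomplete records is only one of the options the text mentions; correcting (imputing) missing values would be an equally valid choice.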
The predictive model generation unit 120 may receive the processed data, set whether each of the plurality of patients with alcohol use disorder discontinues outpatient treatment early as the dependent variable, and generate the predictive model by applying one or more machine learning algorithms to all or part of the processed data based on the independent variables.
Machine learning algorithms may be broadly classified into three types: supervised learning algorithms, unsupervised learning algorithms, and reinforcement learning algorithms.
A supervised learning algorithm is used when there is an intended result. During training, the machine learning model adjusts its parameters so that the input values are mapped to the desired outputs.
An unsupervised learning algorithm is used when there is no intended result, and it can group an input data set into sets of similar type. Unsupervised learning algorithms may be used for data mining.
A reinforcement learning algorithm is used to make decisions about input values. The decision made for a given input gradually changes according to the success or failure of earlier decisions. As a reinforcement learning algorithm continues to learn, it may become able to predict the outcome for an input.
Meanwhile, the predictive model generation unit 120 may be implemented, for example, as a workstation server or a cloud server.
The prediction unit 130 may input all or part of the processed data generated by the preprocessing unit 110 into the predictive model generated by the predictive model generation unit 120, and generate a prediction result as to whether the plurality of patients with alcohol use disorder discontinue outpatient treatment early.
Using the prediction result generated by the prediction unit 130, the system 100 for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may predict whether a patient will discontinue outpatient treatment early, and may also identify the variables that influence early discontinuation.
The system 100 may further use the prediction result generated by the prediction unit 130 to support closer management tailored to the characteristics of each patient, helping the patient continue to receive treatment.
The output unit 140 may output the prediction result generated by the prediction unit 130. The output unit 140 may output the prediction result by, for example, displaying it on a screen or printing it with a printer.
FIG. 2 is a flowchart illustrating a variable determination operation performed by the preprocessing unit 110 according to embodiments of the present invention.
Referring to FIG. 2, when determining the independent variables, the preprocessing unit 110 may address multicollinearity among the independent variables by calculating the variance inflation factor (VIF) of each independent variable and determining the independent variables so that every VIF remains at or below a preset threshold.
Multicollinearity refers to the situation in which some of the independent variables can be expressed as a combination of other independent variables. It can occur when the independent variables are not independent of one another and are strongly correlated. One way to address multicollinearity is to remove variables that depend on other independent variables, and the VIF may be used for this purpose.
The VIF indicates how well one independent variable can be linearly regressed on the other independent variables. The VIF of the i-th variable can be obtained from Equation 1 below.
$$\mathrm{VIF}_i = \frac{1}{1 - R_i^2} \qquad \text{(Equation 1)}$$
Here, $R_i^2$ is the coefficient of determination obtained by linearly regressing the i-th variable on the other independent variables.
Because $R_i^2$ is less than 1, $\mathrm{VIF}_i$ becomes larger when the i-th variable depends on the other independent variables.
The preprocessing unit 110 may calculate the VIF of every determined independent variable and determine the independent variables so that all VIF values remain at or below a preset threshold.
For example, the preset VIF threshold may be set to 5.
Referring to FIG. 2, the preprocessing unit 110 may determine the independent variables (S210).
The preprocessing unit 110 may calculate the VIF of every determined independent variable in the manner described above (S220).
If the VIF of any independent variable exceeds the threshold (S230-Y), the preprocessing unit 110 may determine a different set of independent variables (S240).
After determining the different independent variables, the preprocessing unit 110 may return to step S220 and calculate the VIF again for the newly determined independent variables.
If the VIF of every independent variable is 5 or less (S230-N), the preprocessing unit 110 may end the variable determination process.
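The loop of steps S210 to S240 can be sketched in plain Python as below. The VIF is computed as 1 / (1 - R²) from an ordinary least-squares regression of each variable on the others. The rule for "determining different independent variables" is not specified in the description, so dropping the variable with the largest VIF is an assumption of this sketch, as are all function names.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(y, xs):
    """Coefficient of determination of an OLS regression of y on columns xs (with intercept)."""
    n = len(y)
    cols = [[1.0] * n] + xs
    k = len(cols)
    xtx = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)] for i in range(k)]
    xty = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(beta[j] * cols[j][t] for j in range(k)) for t in range(n)]
    mean = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, pred))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def vif(columns, i):
    """VIF of the i-th column: 1 / (1 - R_i^2)."""
    others = [c for j, c in enumerate(columns) if j != i]
    r2 = r_squared(columns[i], others)
    return float("inf") if r2 >= 1.0 else 1.0 / (1.0 - r2)

def select_by_vif(named_columns, threshold=5.0):
    """Repeat S220-S240 until every VIF is at or below the threshold.

    Assumption: the variable with the largest VIF is dropped at each pass.
    """
    kept = dict(named_columns)
    while len(kept) > 1:
        names = list(kept)
        cols = [kept[n] for n in names]
        vifs = {n: vif(cols, i) for i, n in enumerate(names)}
        worst = max(vifs, key=vifs.get)
        if vifs[worst] <= threshold:
            return kept, vifs
        del kept[worst]
    return kept, {n: 1.0 for n in kept}
```

The sketch assumes no column is constant and no pair is exactly collinear. In practice a numerical library would normally replace the hand-written solver.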
FIG. 3 is a diagram illustrating an operation in which the preprocessing unit 110 according to embodiments of the present invention classifies the data of patients with alcohol use disorder into a training data set and a test data set.
Referring to FIG. 3, the preprocessing unit 110 may classify the data of the plurality of patients with alcohol use disorder into a training data set and a test data set.
To generate a predictive model using a machine learning algorithm, the preprocessing unit 110 may classify part of the data of the plurality of patients with alcohol use disorder as the training data set and use it to train the predictive model.
The test data set, in turn, may be used to test the predictive model generated from the data of the plurality of patients with alcohol use disorder.
The test data set may be made large enough to yield statistically meaningful results, and may be the entire data of the plurality of patients with alcohol use disorder. The test data set may be classified so that it has the same characteristics as the training data set.
The predictive model generation unit 120 may use the test data set to derive performance evaluation metrics for measuring the performance of the predictive model. Using these metrics, the predictive model generation unit 120 may check the objective performance of the predictive model and compare the performance of different predictive models.
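One way to obtain a test data set with the same characteristics as the training data set is a stratified split, sketched below. The 20% test fraction, the fixed seed, and the function name are assumptions; the description does not state a split ratio.

```python
import random

def stratified_split(rows, labels, test_fraction=0.2, seed=0):
    """Split (rows, labels) into training and test sets, preserving class proportions.

    test_fraction=0.2 is an assumed value, not taken from the source.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)

    train_idx, test_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_fraction))
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])

    train = [(rows[i], labels[i]) for i in sorted(train_idx)]
    test = [(rows[i], labels[i]) for i in sorted(test_idx)]
    return train, test
```

With the group sizes from Table 1 (126 maintained, 713 discontinued), each class contributes roughly one fifth of its patients to the test set, so both sets keep about the same class ratio.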
FIG. 4 is a diagram illustrating an operation in which the preprocessing unit 110 according to embodiments of the present invention performs oversampling on a specific class.
Referring to FIG. 4, when processing the data of the plurality of patients with alcohol use disorder, the preprocessing unit 110 may apply a sampling method to a specific class to resolve class imbalance in the training data set.
For example, for the two classes of the dependent variable in the training data set, the outpatient treatment maintenance class and the early discontinuation class, the numbers of data items in the two classes may be imbalanced (e.g., 85:15).
If the predictive model is trained on data with class imbalance, biased results may be produced, and it may be difficult to accurately predict whether a patient with alcohol use disorder will discontinue treatment. A sampling method may therefore be applied to resolve the class imbalance.
For example, sampling methods may be divided into oversampling and undersampling.
Undersampling reduces the data of the majority class to the level of the minority class. Because it removes majority-class data, undersampling can reduce computation time and class overlap. However, it sharply reduces the total amount of data used for training and may therefore degrade learning performance.
Oversampling, in contrast, increases the data of the minority class to the level of the majority class to secure enough data for training.
Oversampling methods include, for example, random oversampling, which simply duplicates existing minority-class data to balance the ratio, and SMOTE (Synthetic Minority Over-sampling Technique), which generates new data between a minority-class data item and its neighboring minority-class data items.
When processing the data, the preprocessing unit 110 may use oversampling or undersampling to correct class imbalance and obtain more precise predictions. In embodiments of the present invention, to resolve the imbalance between the outpatient treatment maintenance class and the early discontinuation class of the dependent variable in the training data set, the preprocessing unit 110 may apply oversampling to the minority class of the two (e.g., the outpatient treatment maintenance class).
Referring to FIG. 4, for example, the preprocessing unit 110 may generate duplicated data for data items a and b included in the minority class.
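The duplication shown in FIG. 4 corresponds to random oversampling, which can be sketched as follows. The function name and the fixed seed are assumptions made for this example; SMOTE would instead synthesize new points and is not shown here.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes are the same size."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)

    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw duplicates (with replacement) only for classes below the target size
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y
```

Oversampling would normally be applied to the training data set only, so that the test data set still reflects the original class ratio.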
FIG. 5 is a diagram illustrating an operation in which the predictive model generation unit 120 according to embodiments of the present invention generates a predictive model by applying one or more machine learning algorithms to the training data set.
Referring to FIG. 5, the predictive model generation unit 120 may generate the predictive model by applying one or more machine learning algorithms to the portion of the processed data that corresponds to the training data set.
The one or more machine learning algorithms may be one or more of logistic regression, support vector machine (SVM), random forest, gradient boosting, and AdaBoost.
Logistic regression is a statistical technique that uses the logistic function to estimate the causal relationship between a dependent variable that takes only two values and the independent variables. The dependent variable is binary (0 or 1), and the independent variables may be categorical or continuous.
The logistic regression model is a special case of the generalized linear model and is a function model that traces an S-shaped curve. In logistic regression, the event is predicted to occur if the value of the dependent variable is greater than 0.5, and predicted not to occur if it is less than 0.5.
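The S-shaped logistic function and the 0.5 decision rule described above can be illustrated with a short sketch. The weights and bias are placeholders that would come from training; their names and values are assumptions.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(features, weights, bias, threshold=0.5):
    """Return (probability, class) for one patient; class 1 means the event is predicted."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    p = logistic(z)
    return p, int(p > threshold)
```

For instance, a linear score of 0 gives a probability of exactly 0.5, positive scores push the prediction toward 1, and negative scores push it toward 0.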
The support vector machine (SVM) is a supervised learning model for pattern recognition and data analysis, used mainly for classification and regression analysis. Given a set of data items each belonging to one of two categories, the SVM algorithm may generate a non-probabilistic binary linear classification model that determines which category a new data item belongs to.
In this case, the categories may be the outpatient treatment maintenance group and the early discontinuation group of patients with alcohol use disorder, and the SVM may be used to determine which of the two groups a new data item belongs to.
Random forest is an ensemble learning method used for regression analysis and the like. It operates by outputting the classification or the mean prediction of a large number of decision trees constructed during training.
In the random forest test process, the final result may be derived by averaging, multiplying, or taking a majority vote over the results obtained from the decision trees. Because the trees can be evaluated in parallel, high computational efficiency can be achieved.
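The aggregation step of a random forest can be sketched as below. Only the combination of per-tree outputs is shown; how the trees are trained is outside this sketch, and the function names are assumptions.

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    return Counter(tree_predictions).most_common(1)[0][0]

def average_probability(tree_probabilities):
    """Return the mean of the per-tree predicted probabilities."""
    return sum(tree_probabilities) / len(tree_probabilities)
```

If three trees predict 1, 0, and 1 for a patient, the majority vote yields class 1.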
Gradient boosting is a machine learning algorithm that can perform regression or classification and belongs to the boosting family of ensemble methods.
Boosting is the process of combining weak classifiers to build a strong classifier. In gradient boosting, each new-stage model is built with the goal of reducing to zero the error of the predictions made by the previous-stage model, and the final model is produced by combining these models.
AdaBoost is a machine learning algorithm that expresses its final output as a weighted sum of the outputs of other learning algorithms.
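The weighted-sum idea behind AdaBoost can be sketched as follows. Only the prediction step is shown; the weak learners and their weights (conventionally called alpha) would come from training. The +1/-1 label encoding and the function name are assumptions of this sketch.

```python
def adaboost_predict(x, weak_learners):
    """Combine weak learners by a weighted sum of their votes.

    weak_learners: list of (alpha, h) pairs, where h(x) returns +1 or -1
    and alpha is the weight assigned to that learner during training.
    """
    score = sum(alpha * h(x) for alpha, h in weak_learners)
    return 1 if score >= 0 else -1
```

A learner with a larger weight can outvote several learners with smaller weights, which is what distinguishes this scheme from a simple majority vote.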
The machine learning algorithms described above are examples, and embodiments of the present invention are not limited to them.
The predictive model generation unit 120 may apply a machine learning algorithm to the training data set to generate a predictive model corresponding to that algorithm.
FIG. 6 is a diagram illustrating an operation in which the predictive model generation unit 120 according to embodiments of the present invention determines the predictive model according to a performance evaluation metric.
Referring to FIG. 6, when there are a plurality of machine learning algorithms, the predictive model generation unit 120 may apply them to the portion of the processed data received from the preprocessing unit 110 that corresponds to the training data set to generate a plurality of candidate predictive models. For each candidate predictive model, it may then input the portion of the processed data that corresponds to the test data set to derive a test result.
The predictive model generation unit 120 may use the derived test results to calculate a performance evaluation metric for each of the candidate predictive models.
The predictive model generation unit 120 may then determine, as the predictive model, the candidate predictive model with the highest value of the performance evaluation metric.
The performance evaluation metric may be, for example, one of accuracy, sensitivity, specificity, and AUC (Area Under the ROC Curve).
Accuracy is the number of cases in which the prediction matches the actual outcome (TP + TN) divided by the total number of predictions (TP + FP + FN + TN), and indicates how closely the predictions agree with the actual data. Here it means the proportion of all patients for whom discontinuation or maintenance of outpatient treatment was predicted correctly. TP is the number of cases predicted positive that are actually positive, FP is the number of cases predicted positive that are actually negative, FN is the number of cases predicted negative that are actually positive, and TN is the number of cases predicted negative that are actually negative.
Sensitivity, also called recall or hit rate, is the proportion of actually positive cases (TP + FN) that the predictive model predicts as positive (TP). Here it means the proportion of patients who actually discontinued outpatient treatment that the predictive model identified correctly.
Specificity is the proportion of actually negative cases (TN + FP) that the predictive model predicts as negative (TN). Here it means the proportion of patients who actually maintained outpatient treatment that the predictive model identified correctly.
AUC is obtained from the ROC (Receiver Operating Characteristic) curve, which plots the true positive rate against the false positive rate, that is, sensitivity against (1 - specificity).
AUC is the area under the ROC curve. Its maximum value is 1, and the better the predictive model, the closer its AUC is to 1.
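The four metrics can be computed as in the sketch below, taking early discontinuation as the positive class (label 1). The AUC is computed through its rank interpretation, the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, which equals the area under the ROC curve. Function names are assumptions.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity from binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "sensitivity": tp / (tp + fn),   # correct among actual positives
        "specificity": tn / (tn + fp),   # correct among actual negatives
    }

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a cohort of a few hundred patients.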
Using the performance evaluation metrics described above, the predictive model generation unit 120 may select, from among the plurality of candidate predictive models, the predictive model that predicts early discontinuation of outpatient treatment by a patient with alcohol use disorder with the highest probability.
For example, the predictive model generation unit 120 may calculate the performance evaluation metrics of each candidate predictive model and generate Table 2.
| Model | AUC | Accuracy | Sensitivity | Specificity |
|---|---|---|---|---|
| Logistic Regression | 0.6914 | 0.6130 | 0.7058 | 0.6026 |
| SVM | 0.6797 | 0.7023 | 0.6470 | 0.7086 |
| Random Forest | 0.6365 | 0.7380 | 0.4705 | 0.7682 |
| Gradient Boosting | 0.6022 | 0.7083 | 0.4117 | 0.7417 |
| Adaboost | 0.7241 | 0.6428 | 0.7647 | 0.6291 |
According to Table 2, if the predictive model is determined on the basis of AUC or sensitivity, the candidate predictive model using the AdaBoost algorithm may be determined as the predictive model. If the predictive model is instead determined on the basis of accuracy or specificity, the candidate predictive model using the random forest algorithm may be determined as the predictive model.
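The selection rule can be sketched directly on the values of Table 2. The dictionary layout and the function name `select_model` are assumptions; the numbers are copied from the table.

```python
# Performance evaluation metrics copied from Table 2
TABLE_2 = {
    "Logistic Regression": {"AUC": 0.6914, "Accuracy": 0.6130, "Sensitivity": 0.7058, "Specificity": 0.6026},
    "SVM":                 {"AUC": 0.6797, "Accuracy": 0.7023, "Sensitivity": 0.6470, "Specificity": 0.7086},
    "Random Forest":       {"AUC": 0.6365, "Accuracy": 0.7380, "Sensitivity": 0.4705, "Specificity": 0.7682},
    "Gradient Boosting":   {"AUC": 0.6022, "Accuracy": 0.7083, "Sensitivity": 0.4117, "Specificity": 0.7417},
    "Adaboost":            {"AUC": 0.7241, "Accuracy": 0.6428, "Sensitivity": 0.7647, "Specificity": 0.6291},
}

def select_model(results, metric):
    """Return the name of the candidate model with the highest value of `metric`."""
    return max(results, key=lambda name: results[name][metric])

for metric in ("AUC", "Accuracy", "Sensitivity", "Specificity"):
    print(metric, "->", select_model(TABLE_2, metric))
```

The choice of metric changes the winner, so the metric should reflect the clinical goal: sensitivity favors catching patients who will discontinue, while specificity favors avoiding false alarms.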
FIG. 7 is a diagram showing AUC, one of the performance evaluation indices according to embodiments of the present invention.
Referring to FIG. 7, as an example, the predictive model generation unit 120 may determine the predictive model using AUC as the performance evaluation index.
The ROC curves may be determined, for example, as shown in FIG. 7. Because AUC is the area under the ROC curve, the predictive model generation unit 120 may obtain the AUC value of each candidate predictive model by calculating the area under its ROC curve.
Referring to FIG. 7, Adaboost has the largest AUC value, so the predictive model generation unit 120 may select the candidate built with Adaboost as the predictive model.
FIG. 8 is a diagram illustrating a method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to embodiments of the present invention.
Referring to FIG. 8, the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may include a data collection step (S810) of collecting data on a plurality of patients with alcohol use disorder.
The method may also include an independent variable determination step (S820) of determining a plurality of independent variables to which one or more machine learning algorithms are applied in order to generate a predictive model of whether the plurality of patients with alcohol use disorder will discontinue outpatient treatment early.
The method may also include a preprocessing step (S830) of processing the data of the plurality of patients with alcohol use disorder to generate processed data.
The data collection step (S810), the independent variable determination step (S820), and the preprocessing step (S830) may be executed by the preprocessing unit 110 described above.
The method may also include a predictive model generation step (S840) of receiving the processed data, setting whether each of the plurality of patients with alcohol use disorder discontinued outpatient treatment early as the dependent variable, and generating the predictive model by applying the one or more machine learning algorithms to all or part of the processed data on the basis of the independent variables. The predictive model generation step (S840) may be executed by the predictive model generation unit 120 described above.
The method may also include a prediction step (S850) of inputting all or part of the processed data into the predictive model to generate a prediction result regarding whether the plurality of patients with alcohol use disorder will discontinue outpatient treatment early. The prediction step (S850) may be executed by the prediction unit 130 described above.
The method may also include an output step (S860) of outputting the prediction result. The output step (S860) may be executed by the output unit 140 described above.
In the independent variable determination step (S820), as an example, variance inflation factors (VIFs) may be calculated for the independent variables in order to address multicollinearity among them, and the independent variables may be determined so that each variance inflation factor remains at or below a predetermined threshold.
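One way such VIF-based screening could work is sketched below. It relies on the standard identity that the VIF of variable j equals the j-th diagonal entry of the inverse of the correlation matrix, and it repeatedly drops the variable with the largest VIF until every remaining VIF is at or below the threshold. The threshold of 10, the variable names, the sample values, and the drop-the-largest strategy are all assumptions for illustration; the specification only states that a predetermined threshold is used.

```python
from statistics import mean


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long numeric columns."""
    centered = []
    for col in columns:
        m = mean(col)
        centered.append([v - m for v in col])
    norms = [sum(v * v for v in col) ** 0.5 for col in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (fails on singular input)."""
    k = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]


def select_variables(data, threshold=10.0):
    """Drop the variable with the largest VIF until all VIFs <= threshold."""
    kept = dict(data)
    while len(kept) > 1:
        values = vif(list(kept.values()))
        worst = max(range(len(values)), key=values.__getitem__)
        if values[worst] <= threshold:
            break
        del kept[list(kept)[worst]]
    return list(kept)


# Hypothetical variables; drinks_per_month is nearly 4 * drinks_per_week.
DATA = {
    "age": [25, 32, 47, 51, 38, 29, 60, 44],
    "drinks_per_week": [10, 14, 7, 21, 3, 18, 12, 9],
    "drinks_per_month": [41, 55, 29, 85, 12, 71, 49, 35],
    "visits": [2, 5, 1, 4, 6, 3, 2, 5],
}
print(select_variables(DATA))
```

Here the two drinking variables are almost perfectly collinear, so one of them is removed and the remaining three variables all fall below the threshold.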
As an example, the preprocessing step (S830) may include dividing the data of the plurality of patients with alcohol use disorder into a training data group and a test data group.
The preprocessing step (S830) may also include applying oversampling to the minority class among the two classes of the dependent variable in the training data group: the outpatient treatment maintenance class and the outpatient treatment early discontinuation class.
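The two preprocessing operations above can be sketched as follows. This passage does not name the oversampling technique, so the sketch assumes plain random oversampling, in which minority-class records are duplicated until the classes are balanced. The 30% test fraction, the `dropout` label key, the function names, and the synthetic records are likewise illustrative assumptions. Only the training group is oversampled, so the test group keeps its original class balance.

```python
import random
from collections import Counter


def split_train_test(records, test_fraction=0.3, seed=0):
    """Shuffle the records and split them into (training group, test group)."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def oversample_minority(train, label_key="dropout", seed=0):
    """Duplicate randomly chosen minority-class records until classes match."""
    rng = random.Random(seed)
    by_class = {}
    for record in train:
        by_class.setdefault(record[label_key], []).append(record)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced


# Synthetic records: 10 early discontinuations (1) and 40 maintained (0).
records = [{"id": i, "dropout": int(i % 5 == 0)} for i in range(50)]
train, test = split_train_test(records)
balanced = oversample_minority(train)
print(Counter(r["dropout"] for r in train))
print(Counter(r["dropout"] for r in balanced))
```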
In the predictive model generation step (S840), as an example, the predictive model may be generated by applying the one or more machine learning algorithms to the portion of the processed data corresponding to the training data group. The one or more machine learning algorithms may be one or more of 1) logistic regression, 2) support vector machine (SVM), 3) random forest, 4) gradient boosting, and 5) Adaboost.
When there are a plurality of machine learning algorithms, the predictive model generation step (S840) may include, as an example: 1) deriving test results by inputting the portion of the processed data corresponding to the test data group into each of a plurality of candidate predictive models generated by applying the plurality of machine learning algorithms to the portion of the processed data corresponding to the training data group; 2) calculating a performance evaluation index for each of the plurality of candidate predictive models using the test results; and 3) determining, as the predictive model, the candidate predictive model with the highest performance evaluation index among the plurality of candidate predictive models.
In this case, the performance evaluation index may be AUC (area under the ROC curve).
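The three-step selection above can be sketched as a small evaluation loop. The candidates here are hypothetical scoring functions standing in for trained models, and the feature names and test records are invented for illustration. AUC is computed in its pairwise form, as the fraction of positive/negative pairs in which the positive case receives the higher score.

```python
def auc_score(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def select_model(candidates, test_features, test_labels):
    """Score each candidate on the test group and keep the highest AUC."""
    results = {
        name: auc_score(test_labels, [model(x) for x in test_features])
        for name, model in candidates.items()
    }
    best = max(results, key=results.get)
    return best, results


# Hypothetical test group and stand-in candidate models.
test_features = [
    {"missed_visits": 3, "noise": 0.2},
    {"missed_visits": 0, "noise": 0.9},
    {"missed_visits": 2, "noise": 0.5},
    {"missed_visits": 1, "noise": 0.1},
]
test_labels = [1, 0, 1, 0]
candidates = {
    "model_a": lambda x: x["missed_visits"],
    "model_b": lambda x: x["noise"],
}
best, results = select_model(candidates, test_features, test_labels)
print(best, results)
```

Because the candidates are scored only on the test data group, which was not used for training, the comparison reflects performance on unseen patients.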
The system 100 for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder described above may be implemented by a computing device that includes at least some of a processor, a memory, a user input device, and a presentation device. The memory is a medium that stores computer-readable software, applications, program modules, routines, instructions, and/or data coded to perform particular tasks when executed by the processor. The processor may read and execute the computer-readable software, applications, program modules, routines, instructions, and/or data stored in the memory. The user input device may be a means by which a user enters a command instructing the processor to execute a particular task, or enters data needed to execute that task. The user input device may include a physical or virtual keyboard or keypad, key buttons, a mouse, a joystick, a trackball, touch-sensitive input means, or a microphone. The presentation device may include a display, a printer, a speaker, or a vibration device.
The computing device may be any of various devices such as a smartphone, tablet, laptop, desktop, server, or client. The computing device may be a single stand-alone device, or may comprise multiple computing devices that cooperate with one another over a communication network in a distributed environment.
In addition, the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder described above may be executed by a computing device that has a processor and a memory storing computer-readable software, applications, program modules, routines, instructions, and/or data structures coded to perform the method when executed by the processor.
The embodiments described above may be implemented through various means. For example, the embodiments may be implemented in hardware, firmware, software, or a combination thereof.
When implemented in hardware, the embodiments may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (Digital Signal Processing Devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), processors, controllers, microcontrollers, or microprocessors.
For example, the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to the embodiments may be implemented using an artificial intelligence semiconductor device in which the neurons and synapses of a deep neural network are implemented as semiconductor elements. The semiconductor elements may be elements in current use, such as SRAM, DRAM, or NAND; next-generation elements such as RRAM, STT-MRAM, or PRAM; or a combination of these.
When the method according to the embodiments is implemented using an artificial intelligence semiconductor device, the result (weights) of training the deep learning model in software may be transferred to synapse-mimicking elements arranged in an array, or training may be carried out on the artificial intelligence semiconductor device itself.
When implemented in firmware or software, the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder according to the embodiments may be implemented in the form of a device, procedure, or function that performs the functions or operations described above. The software code may be stored in a memory unit and run by a processor. The memory unit may be located inside or outside the processor and may exchange data with the processor by various known means.
In addition, terms such as "system", "processor", "controller", "component", "module", "interface", "model", and "unit" used above may generally mean computer-related entity hardware, a combination of hardware and software, software, or software in execution. For example, such a component may be, but is not limited to, a process running on a processor, a processor, a controller, a control processor, an object, a thread of execution, a program, and/or a computer. For example, both an application running on a controller or processor and the controller or processor itself can be a component. One or more components may reside within a process and/or thread of execution, and a component may be located on one device (e.g., a system, a computing device, etc.) or distributed across two or more devices.
Another embodiment provides a computer program, stored on a computer recording medium, that performs the method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder described above. Yet another embodiment provides a computer-readable recording medium on which a program for carrying out the method is recorded.
The program recorded on the recording medium may be read, installed, and executed by a computer to carry out the steps described above.
For the computer to read the program recorded on the recording medium and execute the functions implemented by the program, the program may include code written in a computer language such as C, C++, JAVA, or machine language that the computer's processor (CPU) can read through the computer's device interface.
Such code may include functional code related to the functions that define the operations described above, and may include control code related to the execution procedure that the computer's processor needs in order to carry out those operations in a predetermined order.
Such code may further include memory-reference code indicating the location (address) in the computer's internal or external memory at which additional information or media needed by the processor to carry out those operations should be referenced.
In addition, when the computer's processor needs to communicate with another remote computer or server in order to carry out those operations, the code may further include communication-related code specifying how the processor should communicate with the remote computer or server using the computer's communication module, and what information or media should be transmitted and received during communication.
Examples of a computer-readable recording medium on which such a program is recorded include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical media storage devices, and may also include media implemented in the form of a carrier wave (for example, transmission over the Internet).
The computer-readable recording medium may also be distributed across computer systems connected by a network, so that computer-readable code is stored and executed in a distributed manner.
A functional program for implementing the present invention, and the code and code segments related to it, may be readily inferred or modified by programmers in the technical field to which the present invention belongs, taking into account the system environment of the computer that reads the recording medium and executes the program.
The method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may also be implemented in the form of a recording medium containing computer-executable instructions, such as an application or program module executed by a computer. A computer-readable medium may be any available medium that can be accessed by a computer, and includes volatile and nonvolatile media and removable and non-removable media. A computer-readable medium may also include any computer storage medium. Computer storage media include volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.
The method for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder may be executed by an application installed on a terminal by default (which may include a program included in a platform or operating system preinstalled on the terminal), or by an application (that is, a program) that a user installs directly on the terminal through an application providing server such as an application store server or a web server associated with the application or the corresponding service. In this sense, the method described above may be implemented as an application (that is, a program) installed on a terminal by default or installed directly by a user, and may be recorded on a computer-readable recording medium of the terminal or the like.
The foregoing description of the present invention is provided for illustration, and a person of ordinary skill in the technical field to which the present invention belongs will understand that it can readily be modified into other specific forms without changing the technical idea or essential features of the present invention. The embodiments described above should therefore be understood as illustrative in all respects and not limiting. For example, a component described as a single unit may be implemented in a distributed manner, and components described as distributed may likewise be implemented in combined form.
The scope of the present invention is defined by the claims that follow rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.
CROSS-REFERENCE TO RELATED APPLICATION
This patent application claims priority under 35 U.S.C. § 119(a) to Korean Patent Application No. 10-2021-0096253, filed in Korea on July 22, 2021, the entire contents of which are incorporated into this patent application by reference. In addition, where this patent application claims priority in countries other than the United States on the same basis, the entire contents are likewise incorporated into this patent application by reference.

Claims (14)

  1. A system for predicting early discontinuation of outpatient treatment for patients with alcohol use disorder, the system comprising:
    a preprocessing unit that collects data on a plurality of patients with alcohol use disorder, determines a plurality of independent variables to which one or more machine learning algorithms are applied in order to generate a predictive model of whether the plurality of patients with alcohol use disorder will discontinue outpatient treatment early, and processes the data of the plurality of patients with alcohol use disorder to generate processed data;
    a predictive model generation unit that receives the processed data, sets whether each of the plurality of patients with alcohol use disorder discontinued outpatient treatment early as a dependent variable, and generates the predictive model by applying the one or more machine learning algorithms to all or part of the processed data on the basis of the independent variables;
    a prediction unit that inputs all or part of the processed data into the predictive model to generate a prediction result regarding whether the plurality of patients with alcohol use disorder will discontinue outpatient treatment early; and
    an output unit that outputs the prediction result.
  2. The system of claim 1,
    wherein the preprocessing unit,
    when determining the independent variables, calculates variance inflation factors (VIFs) for the independent variables in order to address multicollinearity among the independent variables, and
    determines the independent variables so that each variance inflation factor remains at or below a predetermined threshold.
  3. The system of claim 1,
    wherein the preprocessing unit
    divides the data of the plurality of patients with alcohol use disorder into a training data group and a test data group.
  4. The system of claim 3,
    wherein the preprocessing unit
    applies oversampling to the minority class among an outpatient treatment maintenance class and an outpatient treatment early discontinuation class, which are the classes of the dependent variable for the training data group.
  5. The system of claim 4,
    wherein the predictive model generation unit
    generates the predictive model by applying the one or more machine learning algorithms to the portion of the processed data corresponding to the training data group, and
    wherein the one or more machine learning algorithms
    are one or more of logistic regression, support vector machine (SVM), random forest, gradient boosting, and Adaboost.
  6. The system of claim 5,
    wherein the predictive model generation unit,
    when there are a plurality of machine learning algorithms, derives test results by inputting the portion of the processed data corresponding to the test data group into each of a plurality of candidate predictive models generated by applying the plurality of machine learning algorithms to the portion of the processed data corresponding to the training data group,
    calculates a performance evaluation index for each of the plurality of candidate predictive models using the test results, and
    determines, as the predictive model, the candidate predictive model with the highest performance evaluation index among the plurality of candidate predictive models.
  7. The system of claim 6,
    wherein the performance evaluation index
    is AUC (area under the ROC curve).
  8. A method for predicting early discontinuation of outpatient treatment for a plurality of patients with alcohol use disorder, the method comprising:
    a data collection step of collecting data on the plurality of patients with alcohol use disorder;
    an independent variable determination step of determining a plurality of independent variables to which one or more machine learning algorithms are applied in order to generate a predictive model of whether the plurality of patients with alcohol use disorder will discontinue outpatient treatment early;
    a preprocessing step of processing the data of the plurality of patients with alcohol use disorder to generate processed data;
    a predictive model generation step of receiving the processed data, setting whether each of the plurality of patients with alcohol use disorder discontinued outpatient treatment early as a dependent variable, and generating the predictive model by applying the one or more machine learning algorithms to all or part of the processed data on the basis of the independent variables;
    a prediction step of inputting all or part of the processed data into the predictive model to generate a prediction result regarding whether the plurality of patients with alcohol use disorder will discontinue outpatient treatment early; and
    an output step of outputting the prediction result.
  9. The method of claim 8,
    wherein the independent variable determination step comprises,
    when determining the independent variables, calculating variance inflation factors (VIFs) for the independent variables in order to address multicollinearity among the independent variables, and
    determining the independent variables so that each variance inflation factor remains at or below a predetermined threshold.
  10. The method of claim 8,
    wherein the preprocessing step comprises
    dividing the data of the plurality of patients with alcohol use disorder into a training data group and a test data group.
  11. The method of claim 10,
    wherein the preprocessing step comprises
    applying oversampling to the minority class among the outpatient treatment maintenance class and the outpatient treatment early discontinuation class, which are the classes of the dependent variable for the training data group.
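Claim 11 applies oversampling to the minority class of the learning (training) data group only. The claim does not name a specific oversampling technique, so the sketch below uses plain random duplication of minority rows until both classes are the same size; that choice, the balancing target, and the function name are assumptions.

```python
import random
from collections import Counter
from typing import Hashable, List, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


def oversample_minority(rows: Sequence[Row],
                        labels: Sequence[Hashable],
                        seed: int = 0) -> Tuple[List[Row], List[Hashable]]:
    """Randomly duplicate minority-class rows until the two classes are balanced.

    Random duplication is an assumed technique; apply it to training data only.
    """
    counts = Counter(labels)
    if len(counts) != 2:
        raise ValueError("expected exactly two classes")
    (_, n_major), (minority, n_minor) = counts.most_common(2)
    rng = random.Random(seed)
    minority_idx = [i for i, y in enumerate(labels) if y == minority]
    extra = [rng.choice(minority_idx) for _ in range(n_major - n_minor)]
    indices = list(range(len(labels))) + extra  # originals first, duplicates after
    return [rows[i] for i in indices], [labels[i] for i in indices]
```

Oversampling only the training group keeps the test group at its original class balance, so the evaluation reflects the real distribution.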
  12. The method of claim 11,
    wherein the predictive model generation step
    generates the predictive model by applying the one or more machine learning algorithms to the portion of the processed data corresponding to the training data group, and
    wherein the one or more machine learning algorithms are
    one or more of logistic regression, support vector machine (SVM), random forest, gradient boosting, and AdaBoost.
  13. The method of claim 12,
    wherein the predictive model generation step comprises:
    when a plurality of machine learning algorithms are used, deriving a test result by inputting the portion of the processed data corresponding to the test data group into each of a plurality of candidate predictive models generated by applying the plurality of machine learning algorithms to the portion of the processed data corresponding to the training data group;
    calculating a performance evaluation index for each of the plurality of candidate predictive models using the test result; and
    determining, as the predictive model, the candidate predictive model having the highest performance evaluation index among the plurality of candidate predictive models.
  14. The method of claim 13,
    wherein the performance evaluation index is
    the area under the ROC curve (AUC).
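The selection rule of claims 13 and 14 (score each candidate predictive model on the test data group, compute its AUC, keep the highest) can be sketched as follows. AUC is computed here through its pairwise-ranking interpretation: the fraction of (positive, negative) pairs in which the positive case receives the higher score, with ties counted as one half. The function names, the label coding (1 for early discontinuation), and the candidate names in the test are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    Assumes y_true is coded 1 for early discontinuation, 0 for maintenance.
    """
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present in the test group")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def select_best_model(y_true: Sequence[int],
                      candidate_scores: Dict[str, Sequence[float]]
                      ) -> Tuple[str, Dict[str, float]]:
    """Return the candidate with the highest test-group AUC, plus all AUCs."""
    aucs = {name: roc_auc(y_true, s) for name, s in candidate_scores.items()}
    best = max(aucs, key=aucs.get)
    return best, aucs
```

The pairwise form is quadratic in the number of test cases, which is acceptable for a sketch; a production system would more likely use a library routine.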
PCT/KR2022/010516 2021-07-22 2022-07-19 System and method for predicting early discontinuation of treatment of outpatient with alcohol use disorder WO2023003315A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR10-2021-0096253 2021-07-22
KR1020210096253A KR102601514B1 (en) 2021-07-22 2021-07-22 System for prediction of early dropping out in outpatients with alcohol use disorders and method thereof

Publications (1)

Publication Number Publication Date
WO2023003315A1 (en) 2023-01-26

Family

ID=84979465

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2022/010516 WO2023003315A1 (en) 2021-07-22 2022-07-19 System and method for predicting early discontinuation of treatment of outpatient with alcohol use disorder

Country Status (2)

Country Link
KR (1) KR102601514B1 (en)
WO (1) WO2023003315A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190008770A (en) * 2017-07-17 2019-01-25 주식회사 헬스맥스 Method for predicting success or failure of health consulting
KR20200022760A (en) * 2018-08-23 2020-03-04 가톨릭대학교 산학협력단 High Risk Drinking Behavior Prevention Service Providing System
KR102200039B1 (en) * 2018-12-13 2021-01-08 연세대학교 산학협력단 Method and apparatus for providing a prediction value for spontaneous ureter stone passage

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
FONSI ELBREDER M., DE SOUZA E SILVA R., CRISTINA PILLON S., LARANJEIRA R.: "Alcohol Dependence: Analysis of Factors Associated with Retention of Patients in Outpatient Treatment", ALCOHOL AND ALCOHOLISM, PERGAMON, OXFORD, GB, vol. 46, no. 1, 1 January 2011 (2011-01-01), GB , pages 74 - 76, XP093026899, ISSN: 0735-0414, DOI: 10.1093/alcalc/agq078 *
J.D. WESTWOOD, S.W. WESTWOOD, L. FELLÄNDER-TSAI, C.M. FIDOPIASTIS, R. S. HALUCK, R.A. ROBB, S. SENGER, AND K. G. VOSBURGH: "STUDIES IN HEALTH TECHNOLOGY AND INFORMATICS", 27 May 2021, I O S PRESS, AMSTERDAM , NL , ISSN: 0926-9630, article EBRAHIMI ALI, WIIL UFFE KOCK, MANSOURVAR MARJAN, NAEMI AMIN, ANDERSEN KJELD, NIELSEN ANETTE SØGAARD: "Deep Neural Network to Identify Patients with Alcohol Use Disorder : Proceedings of MIE 2021", pages: 238 - 242, XP093026903, DOI: 10.3233/SHTI210156 *
JOHANNESSEN DAGNY ADRIAENSSEN, NORDFJÆRN TROND, GEIRDAL AMY ØSTERTUN: "Substance use disorder patients’ expectations on transition from treatment to post-discharge period", SAGE JOURNALS, vol. 37, no. 3, 1 June 2020 (2020-06-01), pages 208 - 226, XP093026910, ISSN: 1455-0725, DOI: 10.1177/1455072520910551 *
KIM SUK-YOUNG, PARK TAESUNG, KIM KWONYOUNG, OH JIHOON, PARK YOONJAE, KIM DAI-JIN: "A Deep Learning Algorithm to Predict Hazardous Drinkers and the Severity of Alcohol-Related Problems Using K-NHANES", FRONTIERS IN PSYCHIATRY, vol. 12, XP093026907, DOI: 10.3389/fpsyt.2021.684406 *
LEE MARY R., SANKAR VIGNESH, HAMMER AARON, KENNEDY WILLIAM G., BARB JENNIFER J., MCQUEEN PHILIP G., LEGGIO LORENZO: "Using Machine Learning to Classify Individuals With Alcohol Use Disorder Based on Treatment Seeking Status", ECLINICAL MEDICINE, vol. 12, 1 July 2019 (2019-07-01), pages 70 - 78, XP093026902, ISSN: 2589-5370, DOI: 10.1016/j.eclinm.2019.05.008 *
PARK SO JIN, LEE SUN JUNG, KIM HYUNGMIN, KIM JAE KWON, CHUN JI-WON, LEE SOO-JUNG, LEE HAE KOOK, KIM DAI JIN, CHOI IN YOUNG: "Machine learning prediction of dropping out of outpatients with alcohol use disorders", PLOS ONE, vol. 16, no. 8, pages e0255626, XP093026913, DOI: 10.1371/journal.pone.0255626 *

Also Published As

Publication number Publication date
KR102601514B1 (en) 2023-11-14
KR20230015009A (en) 2023-01-31

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application. Ref document number: 22846185; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase. Ref country code: DE