KR20190134934A

KR20190134934A - Predictive device for customer churn using Deep Learning and Boosted Decision Trees and method of predicting customer churn using it

Info

Publication number: KR20190134934A
Application number: KR1020180057094A
Authority: KR
Inventors: 이지형; 우상명; 김경태
Original assignee: 성균관대학교산학협력단
Priority date: 2018-05-18
Filing date: 2018-05-18
Publication date: 2019-12-05
Also published as: KR102129962B1

Abstract

The present invention relates to a device for predicting customer churn, configured to predict customer churn from a service provided by a company. According to an embodiment of the present invention, the device for predicting customer churn comprises: an input unit to which customer data is inputted; a pre-processing unit configured to pre-process the customer data inputted to the input unit and classify the customer data into time-based pre-processed data and customer-based pre-processed data; a customer churn prediction unit configured to extract a behavior pattern using the classified time-based pre-processed data and a statistical index of a customer using the classified customer-based pre-processed data, and predict customer churn by using the extracted behavior pattern and statistical index; and an output unit configured to output prediction results of the customer churn prediction unit. According to the present invention, it is possible to predict customer churn with higher accuracy.

Description

Prediction device for customer churn using Deep Learning and Boosted Decision Trees and method of predicting customer churn using it

The present invention relates to a customer churn prediction device that predicts whether a customer will leave a service provided by a company, and to a customer churn prediction method using the same.

Many companies recognize the importance of customer-centric business strategies for maintaining an edge in competitive markets and securing stable revenue. Because a company depends on the revenue generated by its customers, customer relationship management (CRM) makes decisions based on data that reflects customers' actual behavior. For example, data analysis is used to assess a customer's potential value and to predict the likelihood of customer churn.

Customer churn, defined as a customer's tendency to stop doing business with a company, is one of the major challenges facing companies worldwide. The loss of sales caused by customer churn not only incurs opportunity costs but also increases the need to attract new customers. However, the cost of acquiring a new customer is estimated at $300 to $600, which is five to six times higher than the cost of retaining an existing customer.

Therefore, identifying existing customers who are likely to churn and preventing their departure is more cost-effective than attracting new customers. To manage customer churn effectively, it is important to build an accurate customer churn prediction model.

A Study on a Customer Churn Prediction Model for Department Stores Using Data Mining Techniques (Asia Marketing Journal, 2005, Vol. 6, No. 4, author: 윤성준)

An object of the present invention is to provide a customer churn prediction device that can determine with improved accuracy whether a customer will churn, and a customer churn prediction method using the same.

A customer churn prediction device according to an embodiment of the present invention includes:

an input unit to which customer data is input; a pre-processing unit that pre-processes the customer data input to the input unit and classifies it into time-based pre-processed data and customer-based pre-processed data; a customer churn prediction unit that extracts a behavior pattern using the classified time-based pre-processed data, extracts a statistical index of the customer using the classified customer-based pre-processed data, and predicts whether the customer will churn using the extracted behavior pattern and statistical index; and an output unit that outputs the prediction result of the customer churn prediction unit.

In the customer churn prediction device according to an embodiment of the present invention, the pre-processing unit includes a time-based pre-processing module that pre-processes the customer data input to the input unit on a time basis and classifies it as time-based pre-processed data, and a customer-based pre-processing module that pre-processes the customer data input to the input unit on a customer basis and classifies it as customer-based pre-processed data.

In the customer churn prediction device according to an embodiment of the present invention, the time-based pre-processed data may be in a time table format in which customer behaviors are arranged in chronological order, and the customer-based pre-processed data may be in an attribute table format in which a plurality of customer attributes are arranged independently of time.

In the customer churn prediction device according to an embodiment of the present invention, the customer churn prediction unit includes: a behavior pattern extraction model that extracts a behavior pattern using the classified time-based pre-processed data and predicts whether the customer will churn; a statistical index extraction model that extracts a statistical index using the classified customer-based pre-processed data and predicts whether the customer will churn; and an ensemble prediction model that applies a voting technique to the prediction results of the behavior pattern extraction model and the statistical index extraction model and produces the prediction result supported by a majority.

In the customer churn prediction device according to an embodiment of the present invention, the behavior pattern extraction model may be an RNN (Recurrent Neural Network)-based prediction model or a CNN (Convolutional Neural Network)-based prediction model, and the statistical index extraction model may be an RF (Random Forest)-based prediction model or an XGBoost (Extreme Gradient Boosting)-based prediction model.

In the customer churn prediction device according to an embodiment of the present invention, the customer churn prediction unit may consist of at least three of the RNN-, CNN-, RF-, and XGBoost-based prediction models.

In the customer churn prediction device according to an embodiment of the present invention, the ensemble prediction model may consist of the RNN-based prediction model, the RF-based prediction model, and the XGBoost-based prediction model.

A customer churn prediction method according to an embodiment of the present invention is performed by computing means and includes:

a first step of inputting customer data; a second step of pre-processing the input customer data and classifying it into time-based pre-processed data and customer-based pre-processed data; a third step of extracting a behavior pattern using the classified time-based pre-processed data, extracting a statistical index of the customer using the classified customer-based pre-processed data, and predicting whether the customer will churn using the extracted behavior pattern and statistical index; and a fourth step of outputting the prediction result of the customer churn prediction unit.

In the customer churn prediction method according to an embodiment of the present invention, the second step pre-processes the input customer data on a time basis and classifies it as time-based pre-processed data, and pre-processes the input customer data on a customer basis and classifies it as customer-based pre-processed data.

In the customer churn prediction method according to an embodiment of the present invention, the time-based pre-processed data may be in a time table format in which customer behaviors are arranged in chronological order, and the customer-based pre-processed data may be in an attribute table format in which a plurality of customer attributes are arranged independently of time.

In the customer churn prediction method according to an embodiment of the present invention, the third step includes: extracting a behavior pattern using the classified time-based pre-processed data and predicting whether the customer will churn; extracting a statistical index using the classified customer-based pre-processed data and predicting whether the customer will churn; and applying a voting technique to the prediction results to produce the prediction result supported by a majority.

In the customer churn prediction method according to an embodiment of the present invention, the behavior pattern may be extracted using an RNN-based prediction model or a CNN-based prediction model, and the statistical index may be extracted using an RF-based prediction model or an XGBoost-based prediction model.

In the customer churn prediction method according to an embodiment of the present invention, the third step may use at least three of the RNN-, CNN-, RF-, and XGBoost-based prediction models.

In the customer churn prediction method according to an embodiment of the present invention, the process of applying a voting technique to the prediction results to produce the prediction result supported by a majority preferably uses the RNN-based prediction model, the RF-based prediction model, and the XGBoost-based prediction model.

A computer program stored in a recording medium according to an embodiment of the present invention causes a computer to execute:

a first step of inputting customer data; a second step of pre-processing the input customer data and classifying it into time-based pre-processed data and customer-based pre-processed data; a third step of extracting a behavior pattern using the classified time-based pre-processed data, extracting a statistical index of the customer using the classified customer-based pre-processed data, and predicting whether the customer will churn using the extracted behavior pattern and statistical index; and a fourth step of outputting the prediction result of the customer churn prediction unit.

Specific details of other implementations according to various aspects of the present invention are included in the detailed description below.

With the customer churn prediction device according to an embodiment of the present invention, whether a customer will churn can be determined with improved accuracy.

FIG. 1 is a block diagram illustrating a customer churn prediction device according to an embodiment of the present invention.
FIG. 2 is an example of time-based pre-processed data in a time table format in which customer behaviors are arranged in chronological order.
FIG. 3 is an example of customer-based pre-processed data in an attribute table format in which a plurality of customer attributes are arranged.
FIG. 4 is a diagram illustrating an RNN model that extracts customer behavior patterns.
FIG. 5 is a diagram illustrating a CNN model that extracts customer behavior patterns.
FIG. 6 shows the data set used to measure the improved accuracy of the customer churn prediction device according to an embodiment of the present invention.
FIG. 7 is a table showing the accuracy of the results of predicting the data set of FIG. 6 using various prediction models.
FIG. 8 is a table showing the accuracy of each prediction model.
FIG. 9 is a graph showing the importance of attributes.

As the present invention allows for various modifications and numerous embodiments, particular embodiments will be illustrated and described in detail in the detailed description. However, this is not intended to limit the present invention to specific embodiments; it should be understood to include all modifications, equivalents, and substitutes falling within the spirit and technical scope of the present invention.

The terms used in the present invention are used only to describe particular embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In the present invention, terms such as 'include' or 'have' are intended to indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to exclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

The customer churn prediction device of the present invention uses a deep learning-based prediction model that predicts customer churn from customer behavior patterns and a Boosted Decision Tree-based prediction model that utilizes the customer's statistical characteristics. The present invention discloses a data pre-processing technique that enables each model to predict customer churn effectively, and an ensemble model that utilizes both the customer's behavior patterns and the customer's statistical characteristics. Hereinafter, a customer churn prediction device and a customer churn prediction method according to an embodiment of the present invention are described with reference to the drawings.

FIG. 1 is a block diagram illustrating a customer churn prediction device according to an embodiment of the present invention. As shown in FIG. 1, the customer churn prediction device according to an embodiment of the present invention includes an input unit 100, a pre-processing unit 200, a customer churn prediction unit 300, and an output unit 400.

The input unit 100 receives customer data. Customer data refers to the various kinds of data used for customer churn prediction. For example, in the game industry it may be a particular game user's login data and data showing that user's activity history in the game, and in the financial industry it may be the usage history of the various cards a customer uses.

The pre-processing unit 200 pre-processes the customer data input to the input unit 100 and classifies it into time-based pre-processed data and customer-based pre-processed data. The pre-processing unit 200 includes a time-based pre-processing module 210 and a customer-based pre-processing module 220. The time-based pre-processing module 210 pre-processes the customer data input to the input unit 100 on a time basis and classifies it as time-based pre-processed data, and the customer-based pre-processing module 220 pre-processes the customer data input to the input unit 100 on a customer basis and classifies it as customer-based pre-processed data.

The time-based pre-processed data classified by the time-based pre-processing module 210 is data that reflects changes in customer behavior over time.

As shown in FIG. 2, the time-based pre-processed data may be in a time table format in which customer behaviors are arranged in chronological order. In tables (a) and (b) of FIG. 2, “Customer ID” is the customer's ID and “Time” is the time at which the customer performed a specific action. “Attribute_standard” denotes the reference attribute. The reference attribute is a representative attribute that serves as the basis for customer behavior; for example, when a customer plays a game, it may be one of the characters offered in the game. In table (a), character 1 is the reference attribute, and in table (b), character 2 is the reference attribute. Likewise, in the case of cards used by a customer, “Attribute_standard” in tables (a) and (b) of FIG. 2 may be two different types of cards, card A (1) and card B (2).

“Attribute1” and “Attribute2” denote behavior attributes that describe the customer's behavior. For example, in the case of a game, the second line of table (a) represents the amount of game money acquired (Attribute1) and the items obtained (Attribute2) by customer 4175D6K2 while playing with character 1 on March 4, 2017.

Behavior attributes, which can be denoted as “Attribute + number”, can be defined in various ways. For example, they may be the amount of experience points the user's (customer's) character gains each second, the amount of money the character gains each second, the path the character travels each second, or the action the character takes each second (attack, skill activation, and so on).

Because this time-based pre-processed data represents changes in customer behavior over time as they are, it is effective for extracting customer behavior patterns.

As described above, a customer's behavior pattern can be subdivided according to a specific attribute. For example, a customer's behavior pattern in the financial industry can be separated into the payment pattern of each card in use, and a customer's behavior pattern in the game industry can be separated into the behavior pattern of each character the customer owns. By dividing the customer's behavior data according to the reference attribute in this way, more refined customer behavior patterns can be extracted.

The time-based pre-processing module 210 subdivides the customer's behavior data by reference attribute and classifies it as time-ordered, time-based pre-processed data. The time-based pre-processed data classified by the time-based pre-processing module 210 is used to build the behavior pattern extraction model 310.
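The subdivision by reference attribute can be illustrated with a minimal sketch. It assumes a hypothetical row layout of (customer ID, time, Attribute_standard, Attribute1, Attribute2); the function name and the numeric values are invented for illustration, and only the customer ID and date are taken from the example in the text.

```python
from collections import defaultdict

# Hypothetical raw log rows:
# (customer_id, time, attribute_standard, attribute1, attribute2).
# Numeric values are made up for illustration.
rows = [
    ("4175D6K2", "2017-03-05", 1, 80.0, 1.0),
    ("4175D6K2", "2017-03-04", 1, 120.0, 3.0),
    ("4175D6K2", "2017-03-04", 2, 40.0, 0.0),
]

def split_by_reference_attribute(rows):
    """Group rows by (customer, reference attribute) and sort each group by time."""
    tables = defaultdict(list)
    for customer_id, time, ref_attr, *behaviour in rows:
        tables[(customer_id, ref_attr)].append((time, behaviour))
    # ISO-formatted date strings sort chronologically as plain strings.
    return {key: sorted(seq, key=lambda item: item[0]) for key, seq in tables.items()}

time_tables = split_by_reference_attribute(rows)
```

Each resulting sequence corresponds to one time table per reference attribute, such as one per game character or one per card.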

The customer-based pre-processed data classified by the customer-based pre-processing module 220 is data that reflects the customer's statistical characteristics over time.

As shown in FIG. 3, the customer-based pre-processed data may be in an attribute table format in which a plurality of customer attributes are arranged independently of time. The customer-based pre-processed data describes the characteristic attributes of each customer, such as the mean, maximum, and minimum of a customer attribute. For example, when the target attribute is the time a customer spends playing a particular game per day, the customer-based pre-processed data may be the mean, maximum, and minimum of the customer's (game user's) daily play time.

In the table of FIG. 3, “Customer ID” is the customer's ID, “Attribute1_mean” is the mean of attribute 1, “Attribute2_mode” is the maximum/minimum value of attribute 2, and “Attribute3_slope” is the slope of attribute 3. In the table of FIG. 3, “Attribute1_mean”, “Attribute2_mode”, and “Attribute3_slope” are listed for different attributes; when they relate to the same attribute of the same customer ID, “Attribute2_mode” may instead be listed as “Attribute1_mode” and “Attribute3_slope” as “Attribute1_slope”.
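As a rough illustration of such time-independent statistics, the sketch below collapses one customer's time-ordered values into a mean, maximum, minimum, and slope. The text does not specify how the slope is computed; a least-squares slope against the observation index is assumed here, and the function names are hypothetical.

```python
from statistics import mean

def slope(values):
    """Least-squares slope of values against their index 0, 1, 2, ... (an assumption)."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den

def customer_attributes(daily_values):
    """Collapse one customer's time-ordered values into time-independent statistics."""
    return {
        "mean": mean(daily_values),
        "max": max(daily_values),
        "min": min(daily_values),
        "slope": slope(daily_values),
    }

# Hypothetical daily play time in hours for one customer.
stats = customer_attributes([3.0, 2.0, 1.0])
```

A negative slope, as in this made-up series, would indicate play time that is declining over the observation window.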

The customer-based pre-processed data classified by the customer-based pre-processing module 220 is used to build the statistical index extraction model 320.

The customer churn prediction unit 300 predicts whether a customer will churn by using the time-based pre-processed data and the customer-based pre-processed data in a complementary manner. The customer churn prediction unit 300 includes a behavior pattern extraction model 310, a statistical index extraction model 320, and an ensemble prediction model 330.

The behavior pattern extraction model 310 extracts behavior patterns from the time-based pre-processed data and predicts whether the customer will churn, and the statistical index extraction model 320 extracts statistical indices from the customer-based pre-processed data and predicts whether the customer will churn. The ensemble prediction model 330 applies a voting technique to the prediction results of the behavior pattern extraction model 310 and the statistical index extraction model 320 and produces the prediction result supported by a majority.

The behavior pattern extraction model 310 extracts the customer's behavior pattern from time-based pre-processed data such as that shown in FIG. 2 and predicts whether the customer will churn.

The behavior pattern extraction model 310 may be an RNN (Recurrent Neural Network)-based prediction model or a CNN (Convolutional Neural Network)-based prediction model.

An RNN is a deep learning model for learning data that changes over time, such as time-series data. It is an artificial neural network (ANN) constructed by connecting the network at a reference time step (t) to the network at the next time step (t+1).

LSTM was proposed to overcome the vanishing gradient problem that occurs in RNNs. It uses gates to control the amount of information flowing into each node, which slows the dilution of information. An input gate and a forget gate are applied to the cell state to selectively add or remove information. The degree to which the gates open and close controls how much the information in the hidden nodes is diluted, which mitigates the vanishing gradient phenomenon to some extent. LSTM is a type of RNN.
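The gating described above can be sketched as a single LSTM time step on scalars. This follows the commonly used LSTM formulation, which also includes an output gate that the text does not mention; the weight layout and names are hypothetical and not taken from the patent.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, w):
    """One LSTM time step for scalar input, hidden state, and cell state.

    `w` maps each gate name to a hypothetical
    (input weight, recurrent weight, bias) triple.
    """
    def pre(name):
        wx, wh, b = w[name]
        return wx * x + wh * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much new information to admit
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate information to add
    c = f * c_prev + i * g           # selectively remove and add information
    h = o * math.tanh(c)
    return h, c
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state passes through almost unchanged, which is the mechanism that slows the dilution of information across time steps.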

A CNN is a type of multilayer neural network that performs well in pattern recognition and is mainly used to classify and recognize data such as images and text. Its main components are the convolution layer and the pooling layer. Features of the original data are extracted through convolution layers, which extract features of the input, and pooling layers, which shrink the input without losing important information.
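To make the two layer types concrete, the following sketch applies a one-dimensional convolution and max pooling to a short sequence. The activity values and the kernel are invented for illustration; the patent does not specify kernel sizes or pooling windows.

```python
def conv1d(signal, kernel):
    """Valid 1-D convolution (cross-correlation, as in most deep learning libraries)."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def max_pool1d(signal, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size]) for i in range(0, len(signal) - size + 1, size)]

# Hypothetical daily activity counts and a kernel that responds to day-to-day drops.
activity = [5, 5, 4, 1, 0, 0]
features = conv1d(activity, [1, -1])   # [0, 1, 3, 1, 0]
pooled = max_pool1d(features, 2)       # [1, 3]
```

The pooled output is shorter than the input but still retains the largest drop in activity, which is the kind of feature a churn model might rely on.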

As shown in FIGS. 4 and 5, the behavior pattern extraction model 310 builds an RNN model (FIG. 4) or a CNN model (FIG. 5) that extracts customer behavior patterns from the time-based pre-processed data (TD). The behavior patterns extracted by each model are concatenated and fed into an artificial neural network. That is, customer churn is predicted by independently using the customer behavior patterns subdivided by reference attribute.
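The concatenation step can be sketched as follows. A single sigmoid output unit stands in for the artificial neural network, whose actual architecture the text does not describe; the pattern vectors, weights, and function names are all hypothetical.

```python
import math

def concatenate(pattern_vectors):
    """Join the per-reference-attribute pattern vectors into one flat vector."""
    return [value for vector in pattern_vectors for value in vector]

def churn_probability(features, weights, bias):
    """Single sigmoid output unit standing in for the final neural network."""
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical pattern vectors extracted for character 1 and character 2.
patterns = [[0.2, -0.1], [0.7, 0.4]]
merged = concatenate(patterns)                 # [0.2, -0.1, 0.7, 0.4]
p = churn_probability(merged, [0.0] * 4, 0.0)  # 0.5 with all-zero weights
```

Because each reference attribute contributes its own slice of the merged vector, the downstream network can weight the pattern of each character or card separately.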

The behavior pattern extraction model 310 may consist of either the RNN model or the CNN model alone, or of both the RNN model and the CNN model.

The statistical index extraction model 320 extracts the customer's statistical indices from customer-based pre-processed data such as that shown in FIG. 3 and predicts whether the customer will churn.

The statistical index extraction model 320 may consist of at least one of an RF (Random Forest)-based prediction model and an XGBoost (Extreme Gradient Boosting)-based prediction model. More specifically, it is preferable that the RF- and XGBoost-based prediction models together form the statistical index extraction model 320.

RF is a decision tree-based machine learning technique. It is an ensemble technique in which several low-performing (low-accuracy) models are combined to form a high-performing model. To perform classification, each tree “votes” for a particular class, and the input is assigned to the class that receives the most votes. Its advantages are that the importance of independent variables can be calculated and that overfitting can be reduced.

XGBoost is a highly efficient, scalable machine learning technique for tree boosting. It outperforms earlier tree-boosting methods and is used in a wide range of fields. It is a tree ensemble technique, and because it uses parallel processing when constructing trees, it has the advantage of fast execution.

When the statistical index extraction model 320 is built from both RF-based and XGBoost-based prediction models, RF is first used to compute the importance of each attribute in the customer-based preprocessed data. Attribute importance is measured as Mean Decrease Accuracy, which the Random Forest algorithm computes internally when it is run. Based on the computed importances, a predetermined number of the most important attributes is selected, and the RF model and the XGBoost model are built from them. The predetermined number may be, for example, 100.
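The idea behind Mean Decrease Accuracy can be sketched as a permutation test: shuffle one attribute's column, re-measure accuracy, and record how far it drops. This is a simplified illustration under stated assumptions. A real Random Forest computes the measure per tree on out-of-bag samples, whereas the sketch below applies it to an arbitrary `predict` function on a single held-out set; all names and data are hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(1 for r, y in zip(rows, labels) if predict(r) == y)
    return correct / len(rows)

def mean_decrease_accuracy(predict, rows, labels, n_repeats=10, seed=0):
    """Average accuracy drop when each attribute column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            drops.append(baseline - accuracy(predict, permuted, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

# Toy model that only looks at attribute 0; attribute 1 is ignored.
predict = lambda r: 1 if r[0] > 0.5 else 0
rows = [[0.9, 3], [0.8, 1], [0.1, 2], [0.2, 4], [0.7, 5], [0.3, 6]]
labels = [1, 1, 0, 0, 1, 0]
importances = mean_decrease_accuracy(predict, rows, labels)
```

Shuffling an attribute the model never uses leaves accuracy unchanged, so its importance is zero, while shuffling an attribute the model depends on lowers accuracy.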

That is, the RF model and the XGBoost model are not built from every attribute in the customer-based preprocessed data. Instead, the attributes are ranked by the importance computed by the RF model, the top 100 are selected, and the RF model and the XGBoost model are built from the selected attributes.
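Selecting the top-ranked attributes is a sort followed by a cut. The sketch below assumes the importances are already available as a name-to-score mapping; the attribute names and scores are invented for illustration, and `k=3` stands in for the 100 attributes mentioned above.

```python
def select_top_k(importances, k):
    """Return the k attribute names with the highest importance scores."""
    ranked = sorted(importances, key=importances.get, reverse=True)
    return ranked[:k]

def project(row, selected):
    """Keep only the selected attributes of one customer row."""
    return {name: row[name] for name in selected}

# Hypothetical importance scores (not taken from the experiment).
importances = {
    "guild_chat_count": 0.01,
    "exp_change": 0.12,
    "play_time_cv": 0.07,
    "item_conversions_last_week": 0.09,
}
selected = select_top_k(importances, k=3)

row = {"guild_chat_count": 14, "exp_change": -320.0,
       "play_time_cv": 0.8, "item_conversions_last_week": 2}
reduced_row = project(row, selected)
```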

The ensemble prediction model 330 applies a voting technique to the prediction results of the behavior pattern extraction model 310 and the statistical index extraction model 320 and outputs the result predicted by a majority of the models. More specifically, the ensemble prediction model 330 may be built from the RNN model, which is the more accurate of the RNN and CNN models that use the customer's behavior patterns, together with the RF model and the XGBoost model, which use the customer's statistical indices. That is, the ensemble prediction model 330 may consist of an RNN model, an RF model, and an XGBoost model. These models are not built again for the ensemble prediction model 330; the RNN, RF, and XGBoost models already built for the behavior pattern extraction model 310 and the statistical index extraction model 320 are reused.

The ensemble prediction model 330 predicts churn as follows. If at least two of the RNN, RF, and XGBoost models that make up the behavior pattern extraction model 310 and the statistical index extraction model 320 predict “churn”, the ensemble prediction model 330 follows the voting technique and predicts the majority result, “churn”, even if the remaining model predicts “non-churn”.
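The majority rule over the three model outputs can be sketched in a few lines. The dictionary keys and label strings are illustrative choices, not identifiers from the patent.

```python
def majority_vote(predictions):
    """Return "churn" if more than half of the models predict churn."""
    churn_votes = sum(1 for p in predictions.values() if p == "churn")
    return "churn" if churn_votes > len(predictions) / 2 else "non-churn"

# Outputs of the three already-built models for one customer (made-up values).
votes = {"rnn": "churn", "rf": "churn", "xgboost": "non-churn"}
ensemble_result = majority_vote(votes)
```

With three voters and two classes a tie cannot occur, which is one practical reason to combine an odd number of models.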

The output unit 400 outputs the prediction result of the customer churn prediction unit 300 externally through visual or audible means.

Next, a customer churn prediction method according to an embodiment of the present invention is described.

The customer churn prediction method according to an embodiment of the present invention may be performed by a computing means. Here, the computing means is a terminal, such as a desktop computer, notebook, tablet PC, or mobile communication terminal, that has a memory for storing a program and a microprocessor for executing the program to perform computation and control. Such a computing means may perform the customer churn prediction method either by including the components of the customer churn prediction device described above, namely the input unit 100, the preprocessing unit 200, the customer churn prediction unit 300, and the output unit 400, or by running a program, stored on a recording medium, that carries out their functions.

The customer churn prediction method according to an embodiment of the present invention includes: a first step of inputting customer data; a second step of preprocessing the input customer data and classifying it into time-based preprocessed data and customer-based preprocessed data; a third step of extracting behavior patterns from the classified time-based preprocessed data, extracting the customer's statistical indices from the classified customer-based preprocessed data, and predicting whether the customer will churn using the extracted behavior patterns and statistical indices; and a fourth step of outputting the prediction result of the customer churn prediction unit.

The computing means preprocesses the customer data input in the first step on a time basis and classifies it as time-based preprocessed data. It also preprocesses the customer data input in the first step on a customer basis and classifies it as customer-based preprocessed data.

Here, the time-based preprocessed data may take the form of a time table in which the customer's actions are arranged in chronological order. The customer-based preprocessed data may take the form of an attribute table in which a number of customer attributes are arranged independently of time.
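The two views of the same log can be sketched as below. The log schema `(customer, week, hours)` and the three aggregate attributes are assumptions chosen for illustration; the actual tables are the ones shown in the figures.

```python
from collections import defaultdict

def time_based_view(log, n_weeks):
    """Time table: customer -> weekly play hours in chronological order."""
    table = defaultdict(lambda: [0.0] * n_weeks)
    for customer, week, hours in log:
        table[customer][week] += hours
    return dict(table)

def customer_based_view(time_table):
    """Attribute table: customer -> time-independent summary attributes."""
    attrs = {}
    for customer, series in time_table.items():
        attrs[customer] = {
            "total_hours": sum(series),
            "mean_hours": sum(series) / len(series),
            "active_weeks": sum(1 for h in series if h > 0),
        }
    return attrs

# Hypothetical raw log entries: (customer id, week index, hours played).
log = [("c1", 0, 2.0), ("c1", 0, 1.0), ("c1", 2, 4.0), ("c2", 1, 0.5)]
time_table = time_based_view(log, n_weeks=3)
attribute_table = customer_based_view(time_table)
```

The time table preserves ordering and is the natural input for sequence models, while the attribute table collapses each customer to one row, which suits tree-based models.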

The third step, predicting whether the customer will churn, includes: extracting behavior patterns from the classified time-based preprocessed data to predict whether the customer will churn; extracting statistical indices from the classified customer-based preprocessed data to predict whether the customer will churn; and applying a voting technique to the prediction results to output the result predicted by a majority.

Behavior patterns may be extracted using an RNN-based prediction model, a CNN-based prediction model, or both. Statistical indices may be extracted using an RF-based prediction model or an XGBoost-based prediction model.

The third step may use at least three of the RNN, CNN, RF, and XGBoost based prediction models. For the process of applying the voting technique to the prediction results and outputting the majority result, it is preferable to use the RNN-based, RF-based, and XGBoost-based prediction models.

Next, experimental data are used to examine the improved accuracy of the customer churn prediction device according to an embodiment of the present invention.

1. Experimental data

The experiment used log data from the game Blade & Soul. The data consist of 4000 training samples, 3000 validation samples, and 3000 test samples, where each sample is the log data collected for one customer over 8 weeks.

The data were provided by NCsoft and released for the 2017 Game Data Mining Competition (IEEE Computational Intelligence in Games). The composition of the data is shown in the table of FIG. 6.

In FIG. 6, “churn” means that the customer has left. The models are built on the training data, their reliability is assessed on the validation data, and the final accuracy is computed on the test data. Because the numbers of churned and non-churned customers are imbalanced, oversampling was applied to counter the loss of model performance caused by the data imbalance.
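The text does not say which oversampling method was used. The sketch below assumes the simplest variant, random duplication of minority-class samples until the classes are balanced; the function name and toy data are hypothetical.

```python
import random

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels

# Toy training split: four retained customers, one churned customer.
rows = [[1], [2], [3], [4], [5]]
labels = ["stay", "stay", "stay", "stay", "churn"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
```

Oversampling is normally applied to the training split only, so that validation and test accuracy are measured on the original class distribution.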

2. Building the prediction models

The character played by the customer was used as the reference attribute for time-based preprocessing. In the game used in this experiment, one user can own several characters. The character with the longest play time was therefore defined as the main character and the remaining characters as auxiliary characters, so that one customer's behavior pattern was split into two behavior patterns: that of the main character and that of the auxiliary characters.
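The split by reference attribute can be sketched as follows. The log schema `(character, week, hours)` and the character names are assumptions made for illustration.

```python
def split_by_main_character(play_log):
    """Split one customer's log into main-character and auxiliary-character logs.

    The main character is the one with the longest total play time.
    """
    totals = {}
    for character, _, hours in play_log:
        totals[character] = totals.get(character, 0.0) + hours
    main = max(totals, key=totals.get)
    main_log = [entry for entry in play_log if entry[0] == main]
    aux_log = [entry for entry in play_log if entry[0] != main]
    return main, main_log, aux_log

# Hypothetical log for one customer: (character, week index, hours played).
play_log = [
    ("warrior", 0, 5.0),
    ("mage", 0, 1.0),
    ("warrior", 1, 4.0),
    ("archer", 1, 2.0),
]
main, main_log, aux_log = split_by_main_character(play_log)
```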

The RNN prediction model is built using LSTM. It consists of five layers, each made up of 56 LSTM cells.

The CNN prediction model consists of five layers, each made up of a 3×1 convolution and a 3×1 max pooling.
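One such layer can be sketched on a one-dimensional sequence. The text gives only the 3×1 window sizes, so the stride of 1 and absence of padding for the convolution, the non-overlapping pooling windows, and the single-channel input are all assumptions of this sketch; the kernel values are illustrative, not learned.

```python
def conv1d(sequence, kernel):
    """Slide the kernel over the sequence (stride 1, no padding)."""
    k = len(kernel)
    return [
        sum(kernel[i] * sequence[t + i] for i in range(k))
        for t in range(len(sequence) - k + 1)
    ]

def max_pool1d(sequence, size=3):
    """Take the maximum of each non-overlapping window of the given size."""
    return [
        max(sequence[t:t + size])
        for t in range(0, len(sequence) - size + 1, size)
    ]

# Hypothetical activity counts over eight time steps.
sequence = [0, 1, 0, 2, 5, 1, 0, 3]
kernel = [1, 1, 1]  # illustrative weights: a moving sum over three steps

features = conv1d(sequence, kernel)       # [1, 3, 7, 8, 6, 4]
pooled = max_pool1d(features, size=3)     # [7, 8]
```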

Both the RNN and CNN prediction models use Exponential Linear Units (ELU, 1.0) as the activation function. Cross entropy is used as the loss, and Root Mean Square Propagation (RMSProp) as the optimizer. Batch normalization and dropout (0.4) are applied to each layer.
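The activation and loss named above have short closed forms. The sketch reads the value 1.0 as the ELU scale parameter alpha, which is an assumption, and uses the two-class form of cross entropy because the task is churn versus non-churn.

```python
import math

def elu(x, alpha=1.0):
    """Exponential Linear Unit: identity for x > 0, alpha*(e^x - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Cross-entropy loss for one sample with label 0 or 1."""
    p = min(max(p_pred, eps), 1.0 - eps)  # clip to avoid log(0)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

positive = elu(2.0)                    # passes through unchanged
negative = elu(-1.0)                   # saturates toward -alpha
loss = binary_cross_entropy(1, 0.5)    # equals ln 2 for a 50/50 guess
```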

To build the RF and XGBoost prediction models, 1364 attributes were generated, comprising attributes that represent each customer's statistical characteristics (statistical indices) and attributes that reflect changes in the customer's behavior (behavior patterns). To select the attributes used for model building, the Mean Decrease Accuracy of each attribute was computed using RF. Based on the computed importances, 100 attributes were selected, and the RF and XGBoost prediction models were built from them.

3. Experimental results

The accuracy of the prediction models is shown in the table of FIG. 7. Among the prediction models that use a single type of feature, the most accurate is the model that extracts the customer's behavior patterns with an RNN. The second most accurate is the XGBoost prediction model, which uses the customer's statistical characteristics (statistical indices).

The ensemble prediction model built from the RNN model and the XGBoost model achieved the highest accuracy, 76%. In other words, the model that predicts churn using both the customer's behavior patterns and the customer's statistical characteristics (statistical indices) performed best.

In addition, RNN and CNN models were built from data that had not been split by the reference attribute, in order to check whether segmenting customer behavior patterns by the reference attribute has a meaningful effect on churn prediction.

Likewise, RF and XGBoost models were built from data that did not include the behavior-change attributes, in order to check whether those attributes have a meaningful effect on churn prediction. The results are shown in the table of FIG. 8.

The RNN and CNN models that extracted behavior patterns from customer behavior data split by the reference attribute were on average 3.5% more accurate than the models that did not. Segmenting customer behavior patterns by the reference attribute is therefore effective for churn prediction.

The XGBoost and RF models that used attributes reflecting changes in customer behavior were on average 6.5% more accurate than the models that did not. The importance (Mean Decrease Accuracy) of each attribute, computed in order to build the XGBoost and RF models, is shown in FIG. 9.

The most important attribute is the change in the experience points the customer gained each week. The second most important is the number of times the customer converted items during the final week. The third is the coefficient of variation of weekly play time. All three of the top attributes reflect changes in customer behavior. It follows that when predicting churn from a customer's statistical characteristics, it is effective to include attributes that reflect changes in the customer's behavior.
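Two of these behavior-change attributes have simple definitions that can be sketched directly. The exact formulas used in the experiment are not given, so the week-to-week difference and the use of the population standard deviation are assumptions, as are the function names and sample values.

```python
import statistics

def weekly_deltas(values):
    """Week-to-week change of a weekly quantity such as experience gained."""
    return [later - earlier for earlier, later in zip(values, values[1:])]

def coefficient_of_variation(values):
    """Standard deviation divided by the mean; 0.0 if the mean is zero."""
    mean = statistics.mean(values)
    if mean == 0:
        return 0.0
    return statistics.pstdev(values) / mean

# Hypothetical weekly play hours over the 8-week window.
steady_player = [10, 10, 10, 10, 10, 10, 10, 10]
erratic_player = [20, 0, 20, 0, 20, 0, 20, 0]

cv_steady = coefficient_of_variation(steady_player)    # no variation
cv_erratic = coefficient_of_variation(erratic_player)  # large relative variation
exp_deltas = weekly_deltas([100, 80, 50])              # declining weekly experience
```

Both players above have the same mean play time, so a plain average cannot tell them apart, whereas the coefficient of variation can.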

As described above, the customer churn prediction device according to an embodiment of the present invention can determine whether a customer will churn with improved accuracy.

Although an embodiment of the present invention has been described above, a person of ordinary skill in the art may modify and change the present invention in various ways by adding, changing, or deleting components without departing from the spirit of the present invention as set forth in the claims, and such modifications also fall within the scope of the present invention.

100: input unit 200: preprocessing unit
210: time-based preprocessing module 220: customer-based preprocessing module
300: customer churn prediction unit 310: behavior pattern extraction model
320: statistical index extraction model 330: ensemble prediction model
400: output unit

Claims

A customer churn prediction device comprising:
an input unit to which customer data is input;
a preprocessing unit that preprocesses the customer data input to the input unit and classifies it into time-based preprocessed data and customer-based preprocessed data;
a customer churn prediction unit that extracts behavior patterns from the classified time-based preprocessed data, extracts statistical indices of a customer from the classified customer-based preprocessed data, and predicts whether the customer will churn using the extracted behavior patterns and statistical indices; and
an output unit that outputs the prediction result of the customer churn prediction unit.

The customer churn prediction device of claim 1, wherein the preprocessing unit comprises:
a time-based preprocessing module that preprocesses the customer data input to the input unit on a time basis and classifies it as time-based preprocessed data; and
a customer-based preprocessing module that preprocesses the customer data input to the input unit on a customer basis and classifies it as customer-based preprocessed data.

The customer churn prediction device of claim 2,
wherein the time-based preprocessed data is in a time table format in which customer actions are arranged in chronological order, and the customer-based preprocessed data is in an attribute table format in which a plurality of customer attributes are arranged independently of time.

The customer churn prediction device of claim 1, wherein the customer churn prediction unit comprises:
a behavior pattern extraction model that extracts behavior patterns from the classified time-based preprocessed data and predicts whether the customer will churn;
a statistical index extraction model that extracts statistical indices from the classified customer-based preprocessed data and predicts whether the customer will churn; and
an ensemble prediction model that applies a voting technique to the prediction results of the behavior pattern extraction model and the statistical index extraction model and outputs the result predicted by a majority.

The customer churn prediction device of claim 4,
wherein the behavior pattern extraction model is a recurrent neural network (RNN) based prediction model and/or a convolutional neural network (CNN) based prediction model, and
the statistical index extraction model is a Random Forest (RF) based prediction model or an XGBoost (Extreme Gradient Boosting) based prediction model.

The customer churn prediction device of claim 5,
wherein the customer churn prediction unit consists of at least three of the RNN, CNN, RF, and XGBoost based prediction models.

The customer churn prediction device of claim 5,
wherein the ensemble prediction model consists of the RNN-based prediction model, the RF-based prediction model, and the XGBoost-based prediction model.

A customer churn prediction method performed by a computing means, comprising:
a first step of inputting customer data;
a second step of preprocessing the input customer data and classifying it into time-based preprocessed data and customer-based preprocessed data;
a third step of extracting behavior patterns from the classified time-based preprocessed data, extracting statistical indices of a customer from the classified customer-based preprocessed data, and predicting whether the customer will churn using the extracted behavior patterns and statistical indices; and
a fourth step of outputting the prediction result of the customer churn prediction unit.

The customer churn prediction method of claim 8, wherein the second step comprises:
preprocessing the input customer data on a time basis and classifying it as time-based preprocessed data, and preprocessing the input customer data on a customer basis and classifying it as customer-based preprocessed data.

The customer churn prediction method of claim 9,
wherein the time-based preprocessed data is in a time table format in which customer actions are arranged in chronological order, and the customer-based preprocessed data is in an attribute table format in which a plurality of customer attributes are arranged independently of time.

The customer churn prediction method of claim 8, wherein the third step comprises:
extracting behavior patterns from the classified time-based preprocessed data to predict whether the customer will churn; extracting statistical indices from the classified customer-based preprocessed data to predict whether the customer will churn; and applying a voting technique to the prediction results to output the result predicted by a majority.

The customer churn prediction method of claim 11,
wherein the behavior patterns are extracted using an RNN-based prediction model and/or a CNN-based prediction model, and
the statistical indices are extracted using an RF-based or XGBoost-based prediction model.

The customer churn prediction method of claim 12,
wherein the third step uses at least three of the RNN, CNN, RF, and XGBoost based prediction models.

The customer churn prediction method of claim 12,
wherein the process of applying a voting technique to the prediction results to output the result predicted by a majority
uses the RNN-based prediction model, the RF-based prediction model, and the XGBoost-based prediction model.

A computer program stored on a recording medium for causing a computing means to execute:
inputting customer data;
preprocessing the input customer data and classifying it into time-based preprocessed data and customer-based preprocessed data;
extracting behavior patterns from the classified time-based preprocessed data, extracting statistical indices of a customer from the classified customer-based preprocessed data, and predicting whether the customer will churn using the extracted behavior patterns and statistical indices; and
outputting the prediction result of the customer churn prediction unit.