KR102428405B1

KR102428405B1 - System and method for recommendation of customized financial products

Info

Publication number: KR102428405B1
Application number: KR1020200078176A
Authority: KR
Inventors: 이승목; 윤한호; 안홍철; 박주희; 박병욱; 김동우; 최창원
Original assignee: 미래에셋증권 주식회사; 서울대학교산학협력단
Priority date: 2020-06-26
Filing date: 2020-06-26
Publication date: 2022-08-01
Also published as: KR20220000475A

Abstract

A customized financial product recommendation system according to one technical aspect of the present invention includes: a preprocessing unit that preprocesses customer data by performing principal component analysis on it to reduce its dimensionality; a label model unit that generates a label model by applying a multi-label classification model to the customer data preprocessed by the preprocessing unit; and an imbalanced data processing unit that resolves imbalance by performing random sampling on the label model generated by the label model unit.

Description

Customized financial product recommendation system and method {SYSTEM AND METHOD FOR RECOMMENDATION OF CUSTOMIZED FINANCIAL PRODUCTS}

The present invention relates to a customized financial product recommendation system and method.

As financial products with diverse and complex structures continue to appear, various technologies are being developed that use information such as a customer's financial product transaction history to determine the customer's investment propensity and provide financial services accordingly.

One prominent example is customized financial product recommendation technology, which uses a user's transaction data to predict suitable personalized financial products and provide recommendations.

Customized recommendation relies on statistical analysis of customers' financial data, and such analysis has conventionally used a single-label classification model, for reasons such as system complexity. However, because a single-label classification model treats each financial product independently, it cannot reflect cases in which a customer purchases several financial products at the same time.

Such prior art therefore has difficulty reflecting the actual needs of financial consumers, who compare various financial products simultaneously and build portfolios from multiple products.

In other words, there is a demand for customized financial product recommendation technology that can consider several financial products at once, but the conventional technology does not satisfy this demand.

Korean Patent Laid-Open Publication No. 10-2017-0125631

One technical aspect of the present invention is intended to solve the above problems of the prior art, and provides a customized financial product recommendation system and recommendation method that can increase the accuracy of financial product recommendations for customers by using a multi-label classification model, which extends the single-label classification model.

Another technical aspect of the present invention provides a customized financial product recommendation system and recommendation method that can configure system resources efficiently by preprocessing customer data with principal component analysis to reduce its dimensionality.

A further technical aspect of the present invention provides a customized financial product recommendation system and method that can increase prediction accuracy and configure system resources efficiently by performing imbalanced data processing on the label model.

The above objects and various advantages of the present invention will become more apparent to those skilled in the art from the preferred embodiments of the present invention.

One technical aspect of the present invention proposes a customized financial product recommendation system. The system may include: a preprocessing unit that preprocesses customer data by performing principal component analysis on it to reduce its dimensionality; a label model unit that generates a label model by applying a multi-label classification model to the customer data preprocessed by the preprocessing unit; and an imbalanced data processing unit that resolves imbalance by performing random sampling on the label model generated by the label model unit.

In one embodiment, the customer data may include at least one of financial product transaction amount information, financial product transaction count information, deposit and withdrawal amount information, financial product average balance information, customer profile information, customer investment propensity information, and long-term dormancy information.

In one embodiment, the preprocessing unit may set a new coordinate system such that, when the customer data is projected onto a single axis, the axis with the largest variance becomes the first principal component and the axis with the second-largest variance becomes the second principal component, and may linearly transform the customer data using this coordinate system.

In one embodiment, the label model unit may use at least one of a random forest model, a logistic regression model, an AdaBoost model, and an artificial neural network model as the single-label classification model.

In one embodiment, the label model unit may generate a multi-label model using the binary relevance technique, which converts the preprocessed customer data into a plurality of single-label datasets.

In one embodiment, the label model unit may generate a classifier chain that reflects the correlations among labels: it produces a classification result for a preceding label from the preprocessed customer data according to the binary relevance technique, and uses that result as an explanatory variable when predicting the next label.
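The chaining idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described in the patent: the function names `chain_training_sets` and `chain_predict`, the toy data, and the hand-written threshold classifiers are all hypothetical. Each label's classifier sees the original features plus the values of all earlier labels.

```python
from typing import Callable, List, Sequence, Tuple

Features = List[float]
Classifier = Callable[[Features], int]  # hypothetical: maps features to 0 or 1


def chain_training_sets(
    X: Sequence[Sequence[float]], Y: Sequence[Sequence[int]]
) -> List[Tuple[List[Features], List[int]]]:
    """Build one training set per label.

    For label j, the inputs are the original features extended with the
    true values of labels 0..j-1, and the target is label j.
    """
    n_labels = len(Y[0])
    sets = []
    for j in range(n_labels):
        inputs = [list(x) + [float(v) for v in y[:j]] for x, y in zip(X, Y)]
        targets = [y[j] for y in Y]
        sets.append((inputs, targets))
    return sets


def chain_predict(x: Sequence[float], classifiers: Sequence[Classifier]) -> List[int]:
    """Predict labels in order, feeding each prediction to the next classifier."""
    augmented = list(x)
    predictions = []
    for clf in classifiers:
        label = clf(augmented)
        predictions.append(label)
        augmented.append(float(label))  # earlier label becomes a new feature
    return predictions


# Toy data: two features, three product labels (illustrative values only).
X = [[0.2, 1.0], [0.9, 0.1]]
Y = [[1, 0, 1], [0, 1, 1]]
training_sets = chain_training_sets(X, Y)

# Hand-written stand-ins for trained classifiers (assumptions for the demo).
toy_classifiers = [
    lambda f: int(f[0] < 0.5),   # label 0 depends on the first feature
    lambda f: int(f[2] == 0.0),  # label 1 depends on the predicted label 0
    lambda f: 1,                 # label 2 is always predicted
]
result = chain_predict([0.2, 1.0], toy_classifiers)
```

In practice each stand-in would be replaced by a single-label model trained on the corresponding entry of `training_sets`.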

In one embodiment, the label model unit may generate the label model using a probabilistic classifier chain, which sequentially computes the conditional probabilities of the labels and derives their joint probability distribution.
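The relation between the sequential conditional probabilities and the joint distribution is the chain rule of probability. The notation below (feature vector x, labels y_1 through y_m) is introduced here for illustration and does not appear in the source:

```latex
P(y_1, \dots, y_m \mid x) = \prod_{j=1}^{m} P\left(y_j \mid x, y_1, \dots, y_{j-1}\right)
```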

In one embodiment, the imbalanced data processing unit may process imbalanced data by performing a first sampling process, which performs fixed sampling using explanatory variables, and a second sampling process, which performs random sampling.

In one embodiment, the imbalanced data processing unit may use the long-term dormancy information included in the customer data so that, if there has been no transaction in a specific financial product during a preset period, the purchase probability is calculated using information unrelated to financial product transactions, such as the customer profile.

Another technical aspect of the present invention proposes a customized financial product recommendation method. The method includes: preprocessing customer data by performing principal component analysis on it to reduce its dimensionality; generating a label model by applying a multi-label classification model to the preprocessed customer data; and resolving imbalance by performing random sampling on the label model.

In one embodiment, the preprocessing step may include: setting, as a first principal component, the axis along which the variance is largest when the customer data is projected onto a single axis; setting, as a second principal component, the axis along which the variance is second largest; setting a new coordinate system using the first and second principal components; and linearly transforming the customer data using the new coordinate system.

In one embodiment, the step of generating the label model may include at least one of: generating a multi-label model using the binary relevance technique, which converts the preprocessed customer data into a plurality of single-label datasets; generating a classifier chain that reflects the correlations among labels by producing a classification result for a preceding label from the preprocessed customer data according to the binary relevance technique and using that result as an explanatory variable when predicting the next label; and generating the label model using a probabilistic classifier chain, which sequentially computes the conditional probabilities of the labels and derives their joint probability distribution.

In one embodiment, the step of linearly transforming the customer data may include a first sampling step of performing fixed sampling using explanatory variables and a second sampling step of performing random sampling.

In one embodiment, the step of linearly transforming the customer data may include: excluding from training, on the basis of the long-term dormancy information included in the customer data, the data of customers who have had no transactions for a preset period or longer; and, if there has been no transaction in a specific financial product during a preset period, calculating the purchase probability using information unrelated to financial product transactions, such as the customer profile.

The means for solving the problem described above do not list all features of the present invention. The various means for solving the problems addressed by the present invention can be understood in more detail with reference to the specific embodiments in the detailed description below.

According to an embodiment of the present invention, using a multi-label classification model that extends the single-label classification model can increase the accuracy of financial product recommendations for customers.

In addition, according to an embodiment of the present invention, performing principal component analysis on customer data reduces the dimensionality of the data, which mitigates multicollinearity and allows system resources to be configured efficiently.

In addition, according to an embodiment of the present invention, performing imbalanced data processing on the label model increases prediction accuracy and allows system resources to be configured efficiently.

FIG. 1 is a diagram illustrating an application example of a customized financial product recommendation system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating a customized financial product recommendation system according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating a classifier chain performed by the label model unit shown in FIG. 2.
FIG. 4 is a diagram briefly comparing the binary relevance technique and the classifier chain.
FIG. 5 is a diagram illustrating a result of applying an ensemble classifier chain in the label model unit shown in FIG. 2.
FIG. 6 is a graph showing an example of a customer's purchase probability distribution for three financial products.
FIG. 7 is a graph showing the case in which a classifier chain is applied to FIG. 6.
FIG. 8 is a flowchart illustrating a customized financial product recommendation method according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention will be described with reference to the accompanying drawings.

However, the embodiments of the present invention may be modified into various other forms, and the scope of the present invention is not limited to the embodiments described below. The embodiments are provided to explain the present invention more completely to those of ordinary skill in the art.

That is, the objects, features, and advantages described above are explained in detail below with reference to the accompanying drawings, so that a person of ordinary skill in the art to which the present invention pertains can readily practice its technical idea. In describing the present invention, detailed descriptions of related known technology are omitted where they would unnecessarily obscure the gist of the invention. In the drawings, the same reference numerals denote the same or similar components.

As used herein, singular expressions include plural expressions unless the context clearly indicates otherwise. In the present application, terms such as "consists of" or "includes" should not be construed as necessarily including all of the components or steps described in the specification; some of those components or steps may be absent, and additional components or steps may be included.

The description below also covers various components of the system according to the present invention and their subcomponents. These components and subcomponents may be implemented in various forms, such as hardware, software, or a combination of the two. For example, each element may be implemented as an electronic configuration that performs the corresponding function, as software executable on an electronic system, or as a functional element of such software. Alternatively, it may be implemented as an electronic configuration together with corresponding driving software.

Various flowcharts are disclosed to describe the embodiments of the present invention, but they are provided for convenience in describing each step, and the steps are not necessarily performed in the order shown. That is, the steps in a flowchart may be performed simultaneously, in the order shown, or in the reverse order.

Hereinafter, a customized financial product recommendation system and method according to some embodiments of the present invention will be described.

FIG. 1 is a diagram illustrating an application example of a customized financial product recommendation system according to an embodiment of the present invention.

Referring to FIG. 1, the financial product recommendation system 100 may receive customer data from the financial product service server 200 and use that data to provide an individual financial product recommendation service to each customer.

The financial product service server 200 may be a single integrated service server or a collection of several separate financial service servers.

The financial product recommendation system 100 may obtain customer data relating to customers' financial products from the financial product service server 200, preprocess that data, and generate a label model to produce financial product recommendation data for each user.

Hereinafter, the financial product recommendation system 100 for guaranteeing complete sales will be described in more detail with reference to FIGS. 2 to 7.

Referring to FIG. 2, the financial product recommendation system 100 may include a preprocessing unit 110, a label model unit 120, an imbalanced data processing unit 130, and a customer data DB 140.

The customer data DB 140 may store the customer data provided by the financial product service server 200.

In one embodiment, the customer data may include at least one of financial product transaction amount information, financial product transaction count information, deposit and withdrawal amount information, financial product average balance information, customer profile information, customer investment propensity information, and long-term dormancy information.

The financial product transaction amount information is data on the amounts for which a customer bought and sold financial products, and may be aggregated over a fixed period, for example monthly. For example, it may include each customer's purchase and sale amounts over the period for at least one of CMA, domestic stocks, loans, wrap accounts, trusts, deposits, bonds, derivative-linked securities, derivatives, funds, overseas stocks, and other financial products.

The financial product transaction count information is data on the number of times a customer bought and sold financial products, and may be aggregated over a fixed period, for example monthly. For example, it may include each customer's purchase and sale counts over the period for at least one of CMA, domestic stocks, loans, wrap accounts, trusts, deposits, bonds, derivative-linked securities, derivatives, funds, overseas stocks, and other financial products.

The deposit and withdrawal amount information covers the amounts of the customer's cash deposits, securities deposits, cash withdrawals, and securities withdrawals, and may be aggregated over a fixed period, for example monthly.

The financial product average balance information is the average balance of the financial products held by the customer, and may be aggregated over a fixed period, for example monthly. For example, it may include the balance of at least one of CMA, domestic stocks, loans, wrap accounts, trusts, deposits, bonds, derivative-linked securities, derivatives, funds, overseas stocks, and other financial products held by each customer over the period.

The customer profile information is the profile information that distinguishes an individual customer from other customers. For example, it may include at least one of the customer's age, gender, and classification code.

The customer investment propensity information describes the investment propensity of each individual customer and is calculated from the questionnaire items in the customer's investor information confirmation form. The questionnaire yields an investment propensity score, which may include information such as the degree of risk tolerance and the expected rate of return; either the score or an investment propensity classification code may be used. On this basis, the customer's investment propensity (for example, risk preference) can be classified.

The long-term dormancy information is set for customers who have had no transactions for a preset period or longer. It may be set for a customer as a whole or for a specific product of a customer. For example, if a customer has had no transactions for the preset period or longer, that customer may be marked as long-term dormant. As another example, if a customer has had no fund transactions for the preset period or longer, that customer may be marked as long-term dormant with respect to fund products.

The long-term dormancy information can be used to remove imbalanced data, either by excluding the dormant customer's data from training or by resetting the purchase probability.
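The two uses of the dormancy flag might look like the following sketch. The record layout (`CustomerRecord`, `dormant`, `dormant_products`) and the helper names are hypothetical; the source does not specify a schema, a dormancy threshold, or how the profile-based probability is computed.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CustomerRecord:
    """Hypothetical customer record; all field names are assumptions."""
    customer_id: str
    dormant: bool = False  # no transactions at all for the preset period
    dormant_products: Set[str] = field(default_factory=set)  # per-product flags


def training_records(records: List[CustomerRecord]) -> List[CustomerRecord]:
    """Drop customers flagged as dormant overall before model training."""
    return [r for r in records if not r.dormant]


def uses_profile_only(record: CustomerRecord, product: str) -> bool:
    """True when the purchase probability for `product` should be derived from
    non-transaction information (e.g. the profile) because the customer is
    dormant for that product."""
    return product in record.dormant_products


records = [
    CustomerRecord("A"),
    CustomerRecord("B", dormant=True),
    CustomerRecord("C", dormant_products={"fund"}),
]
kept = training_records(records)
```

Here customer B is excluded from training, while customer C stays in the training set but is marked for profile-based scoring on fund products.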

The preprocessing unit 110 preprocesses the customer data using principal component analysis; that is, it may perform principal component analysis on the customer data to reduce its dimensionality.

Here, principal component analysis (PCA) reduces high-dimensional data to low-dimensional data.

For example, the preprocessing unit 110 may use an orthogonal transformation to convert samples in a high-dimensional space, which may be correlated with one another, into samples in a low-dimensional space that are linearly uncorrelated. In this case, the number of principal-component dimensions is less than or equal to the number of dimensions of the original samples.

The preprocessing unit 110 may apply the orthogonal transformation by setting axes according to the magnitude of the variance. For example, the preprocessing unit 110 may set a new coordinate system such that, when the customer data is projected onto a single axis, the axis with the largest variance becomes the first principal component and the axis with the second-largest variance becomes the second principal component, and may linearly transform the customer data using this coordinate system.

Decomposing the samples into the components that best capture their differences in this way enables a variety of applications. The transformation is defined so that the first principal component has the largest variance, and each subsequent principal component has the largest variance possible under the constraint that it is orthogonal to the preceding components. The principal components are orthogonal because they are eigenvectors of the covariance matrix.
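The properties described above (axes ordered by variance, mutually orthogonal, obtained as eigenvectors of the covariance matrix) can be checked with a short NumPy sketch. The function name `pca_eig` and the synthetic data are assumptions made for illustration; real customer features would take the place of `X`.

```python
import numpy as np


def pca_eig(X: np.ndarray, k: int):
    """Project X (n samples x p features) onto its top-k principal axes."""
    Xc = X - X.mean(axis=0)                 # center each feature at zero
    cov = np.cov(Xc, rowvar=False)          # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]       # re-sort by descending variance
    W = eigvecs[:, order[:k]]               # columns are the principal axes
    return Xc @ W, W, eigvals[order[:k]]


rng = np.random.default_rng(0)
# Synthetic stand-in for customer features: 200 samples, 5 correlated columns.
base = rng.normal(size=(200, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 3))])
T, W, variances = pca_eig(X, k=2)
```

The columns of `W` are orthonormal, and the sample variance of each column of `T` equals the corresponding eigenvalue, so the first column carries the most variance.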

As an example, principal component analysis may be performed as follows. Suppose we have n × p data X, and let each row of X be x_i. Assume that the data has mean 0; if it does not, subtract the mean from the data. The first principal component w_1 of X is defined by the following equation.

[Equation 1]
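The body of Equation 1 is missing from this text. Assuming it is the standard variance-maximization definition, which matches the surrounding description (rows x_i of the mean-centered data X and a unit-length direction w), it would read:

```latex
w_1 = \underset{\lVert w \rVert = 1}{\arg\max} \; \sum_{i=1}^{n} \left( x_i \cdot w \right)^2
```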

Now let w_1, w_2, ..., w_{k-1} be the 1st, 2nd, ..., (k-1)-th principal components, and let [expression missing from the source text]; then the k-th principal component w_k is defined by Equation 2 below.

[Equation 2]
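The body of Equation 2 and the intermediate quantity it relies on are both missing from this text. Under the assumption that the standard deflation construction is intended, the data with the first k-1 components removed, written here as \hat{X}_{k-1} (a symbol introduced for this sketch), and the k-th principal component would be:

```latex
\hat{X}_{k-1} = X - \sum_{i=1}^{k-1} X w_i w_i^{\mathsf{T}},
\qquad
w_k = \underset{\lVert w \rVert = 1}{\arg\max} \; \lVert \hat{X}_{k-1} w \rVert^2
```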

If w_1, w_2, ..., w_p are the 1st, 2nd, ..., p-th principal components, then W is the matrix whose columns are the w_i, and the full principal component decomposition of the data X is given by Equation 3 below.

[Equation 3]
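The body of Equation 3 is missing from this text. Assuming the usual form, in which the transformed data T (consistent with the T_k used later in the description) is the projection of X onto the columns of W, it would read:

```latex
T = X W
```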

The singular value decomposition (SVD) of the data X can be written as Equation 4 below.

[Equation 4]
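The body of Equation 4 is missing from this text. Assuming the standard singular value decomposition with the factors U, D, and W named in the description, it would read:

```latex
X = U D W^{\mathsf{T}}
```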

여기서 D는 X의 특잇값이라 불리는 양수 d_i의 대각 행렬이고 U, W는 직교 행렬이다. 여기서 W가 주성분 분석에서 이용한 W와 일치하게 된다.where D is a diagonal matrix of positive d _i called singular values of X and U and W are orthogonal matrices. Here, W coincides with W used in principal component analysis.

Here, if instead of using all p components an appropriate k is selected to form W_k, and T_k is generated through the same transformation, T_k is an n×k matrix and thus has a reduced dimension. In general, k is chosen as the smallest k that retains at least a certain proportion (for example, 90%) of the variance of the data X. This is equivalent to finding the smallest k for which the d_i of the singular value decomposition above satisfy the condition of Equation 5 below.

[Equation 5]
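The selection of k described above can be sketched as follows. The sketch assumes that the condition of Equation 5 is the usual explained-variance criterion, namely that the sum of the first k squared singular values divided by the sum of all squared singular values reaches the chosen ratio. The function name `choose_k` and the example values are hypothetical.

```python
def choose_k(singular_values, ratio=0.9):
    """Return the smallest k whose leading components retain `ratio` of the variance.

    Assumes `singular_values` holds the d_i of the SVD in descending order and
    that the variance share of component i is proportional to d_i squared.
    """
    squared = [d * d for d in singular_values]
    total = sum(squared)
    if total == 0:
        raise ValueError("all singular values are zero")
    cumulative = 0.0
    for k, value in enumerate(squared, start=1):
        cumulative += value
        if cumulative / total >= ratio:
            return k
    return len(squared)


# Hypothetical singular values: the first two components carry about 96% of the variance.
example_k = choose_k([5.0, 3.0, 1.0, 0.5], ratio=0.9)
```

With the returned k, W_k would consist of the first k columns of W, and T_k would be obtained by applying the same transformation to X.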

The label model unit 120 may generate a label model by applying a multi-label classification model to the customer data preprocessed by the preprocessing unit.

The label model unit 120 may generate the label model using the multi-label classification models described below, each of which is built on a single-label classification model.

Here, a single-label classification model is a model used for one-dimensional classification problems. For example, a random forest model, a logistic regression model, an AdaBoost model, or an artificial neural network model may be used, and other kinds of single-label models may also be applied.

Because a single-label model treats different labels as mutually exclusive, it can estimate well when a customer purchases a single product, but estimation is difficult when several products are purchased at the same time.

Therefore, the label model unit 120 uses a multi-label classification model to provide a label model that remains applicable when several products are purchased at the same time.

In an embodiment, the label model unit 120 may generate the label model using the binary relevance (BR) method as the multi-label model. That is, the label model unit 120 may convert the preprocessed customer data into a plurality of single-label data sets.

For example, suppose that financial product purchase prediction data as shown in Table 1 below is given.

[Table 1]
Customer | Domestic stocks | Bonds | Funds | Overseas stocks
1 | O | X | X | O
2 | X | X | O | O
3 | O | X | X | X
4 | X | O | O | X

The binary relevance method assumes that the labels are independent of one another.

Specifically, let L be the set of all labels. For each label l in L, the label model unit 120 may construct and train a binary classification model H_l: X → {1, -1}. That is, the label model unit 120 converts the original data into |L| data sets. In each data set, a sample is marked l if the original sample includes the label l, and -l otherwise.

When the binary relevance method is applied, the financial product purchase prediction data of Table 1 may be converted into Tables 2 to 5 below.

[Table 2]
Customer | Stocks traded | Stocks not traded
1 | O | X
2 | X | O
3 | O | X
4 | X | O

[Table 3]
Customer | Funds traded | Funds not traded
1 | X | O
2 | O | X
3 | X | O
4 | O | X

[Table 4]
Customer | Bonds traded | Bonds not traded
1 | X | O
2 | X | O
3 | X | O
4 | O | X

[Table 5]
Customer | Overseas stocks traded | Overseas stocks not traded
1 | O | X
2 | O | X
3 | X | O
4 | X | O

Under the binary relevance method, a new sample x is assigned the union of the labels output by the |L| classifiers, as in Equation 6 below.

[Equation 6]
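The binary relevance transformation and the union rule described above can be sketched as follows. The data mirrors Table 1; the identifiers (`LABELS`, `binary_relevance_targets`, `combine_predictions`) are hypothetical names chosen for illustration, and the per-label classifiers themselves are left out, so only the data conversion and the final union step are shown.

```python
LABELS = ["domestic_stocks", "bonds", "funds", "overseas_stocks"]

# Table 1 expressed as the set of products each customer traded (O = traded).
TABLE_1 = {
    1: {"domestic_stocks", "overseas_stocks"},
    2: {"funds", "overseas_stocks"},
    3: {"domestic_stocks"},
    4: {"bonds", "funds"},
}


def binary_relevance_targets(label_sets, labels):
    """Split multi-label data into one binary target (+1 / -1) per label."""
    return {
        label: {cust: (1 if label in owned else -1) for cust, owned in label_sets.items()}
        for label in labels
    }


def combine_predictions(per_label_outputs):
    """Union rule: keep every label whose binary classifier output is +1."""
    return {label for label, out in per_label_outputs.items() if out == 1}


targets = binary_relevance_targets(TABLE_1, LABELS)
```

Each entry of `targets` corresponds to one of Tables 2 to 5, for example `targets["bonds"]` to the bond table.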

In an embodiment, the label model unit 120 may generate the label model using classifier chains (CC) as the multi-label model. A classifier chain has the advantage that correlations between labels are reflected in the model.

The label model unit 120 may first generate, for the preprocessed customer data, a classification result for the preceding label according to the binary relevance method, and then use that result as an explanatory variable when predicting the next label. The resulting classifier chain reflects the correlations that exist between labels.

FIG. 3 illustrates the classifier chain performed by the label model unit shown in FIG. 2. As an example, assuming there are five labels in total, the label model unit 120 carries out the training and the prediction shown in FIG. 3.

As FIG. 3 shows, during training the label model unit 120 may apply the classifier chain algorithm by using the given input variable x together with part of the output variable y as inputs to the binary classification algorithm. During prediction, likewise, the given input variable x and the values of y predicted in the preceding steps may be used together as inputs to the binary classifier.
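The training and prediction flow described above can be sketched as follows. This is a minimal illustration, not the implementation of the label model unit: a 1-nearest-neighbour rule stands in for whichever binary classifier is actually used (an assumption), the example uses three labels rather than five, and all names and data are hypothetical.

```python
def nearest_label(train_rows, train_targets, row):
    """1-nearest-neighbour stand-in for any binary classifier (an assumption)."""
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    best = min(range(len(train_rows)), key=lambda i: dist(train_rows[i], row))
    return train_targets[best]


def fit_chain(X, Y):
    """For label j, store training rows made of x plus the true labels y[0..j-1]."""
    n_labels = len(Y[0])
    chain = []
    for j in range(n_labels):
        rows = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
        targets = [y[j] for y in Y]
        chain.append((rows, targets))
    return chain


def predict_chain(chain, x):
    """Predict labels in order, feeding earlier predictions to later links."""
    predicted = []
    for rows, targets in chain:
        predicted.append(nearest_label(rows, targets, list(x) + predicted))
    return predicted


# Hypothetical training data: two input variables and three labels per customer.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]
chain = fit_chain(X, Y)
```

The key point is that link j sees the true earlier labels during training but the predicted earlier labels during prediction.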

FIG. 4 shows a brief comparison of the binary relevance method and the classifier chain. As shown, the classifier chain has the advantage over the binary relevance method that it can represent correlations between labels.

In addition, the label model unit 120 may generate the label model using ensemble classifier chains (ECC) as the multi-label model. An ensemble of classifier chains applies the classifier chain method several times, each time with a randomly determined label prediction order, and then combines the results of all the generated chains.

FIG. 5 illustrates the result of applying ensemble classifier chains in the label model unit 120. In this example, five classifier chains are applied to data with six labels, and their results are combined to produce the prediction. A simple average is used for the ensemble, and each label is predicted as 0 or 1 using 0.5 as the threshold.
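The averaging and thresholding step described above can be sketched as follows. The chain outputs are hypothetical values, not those of FIG. 5, and treating a mean of exactly 0.5 as 1 is an assumption; with five chains of 0/1 outputs the mean never equals 0.5, so the choice does not matter here.

```python
def ensemble_predict(chain_outputs, threshold=0.5):
    """Average per-label outputs across chains, then threshold the mean."""
    n_chains = len(chain_outputs)
    n_labels = len(chain_outputs[0])
    means = [sum(out[j] for out in chain_outputs) / n_chains for j in range(n_labels)]
    predictions = [1 if m >= threshold else 0 for m in means]
    return means, predictions


# Hypothetical 0/1 outputs of five classifier chains for six labels.
chain_outputs = [
    [1, 0, 1, 0, 0, 1],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 1],
    [1, 1, 1, 0, 0, 1],
    [1, 0, 0, 0, 0, 1],
]
means, predictions = ensemble_predict(chain_outputs)
```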

In an embodiment, the label model unit 120 may generate the label model using probabilistic classifier chains (PCC) as the multi-label model. Rather than predicting labels one after another, a probabilistic classifier chain captures the correlations by estimating the joint probability distribution of all labels, which allows more accurate estimation.

FIG. 6 is a graph showing an example of a customer's purchase probability distribution over three financial products, namely which of stocks, funds, and bonds a given customer will purchase. In the graph, the nodes at the second through fourth levels represent, in order, the purchase probabilities for the combinations (stocks), (stocks, funds), and (stocks, funds, bonds), and the lines connecting the nodes represent conditional probabilities. Looking at the probabilities of all product combinations at the fourth level, the probability of purchasing nothing is the highest at 0.324, so this customer is most likely to purchase nothing.

FIG. 7 is a graph showing the case in which a classifier chain is applied to FIG. 6. Because a classifier chain simply follows the branch with the higher conditional probability at each node, applying it yields the prediction shown in FIG. 7 that the customer will purchase all three products. In other words, because a classifier chain only compares probabilities sequentially, its prediction may fail to match the actual most probable outcome in a case such as FIG. 6.

In contrast to the classifier chain, which predicts labels sequentially, the probabilistic classifier chain still follows the philosophy of the binary relevance method but estimates the joint probability distribution of all labels. That is, the label model unit 120 may sequentially calculate the conditional probabilities of the labels and then use Bayes' theorem to compute the joint probability distribution. The label model unit 120 in this case satisfies Equation 7 below.

[Equation 7]
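The difference between following the chain greedily and maximizing the joint probability can be sketched as follows. The sketch assumes that the joint probability is the product of the sequential conditional probabilities, P(stock, fund, bond) = P(stock) · P(fund | stock) · P(bond | stock, fund), with the dependence on the input x omitted. The conditional probabilities are hypothetical values, chosen only so that the probability of purchasing nothing comes out at 0.324 as in the description of FIG. 6; they are not read from the figure.

```python
from itertools import product

# Hypothetical conditional probabilities P(label = 1 | earlier labels).
P_STOCK = 0.6
P_FUND = {1: 0.6, 0: 0.1}  # keyed by stock
P_BOND = {(1, 1): 0.6, (1, 0): 0.5, (0, 1): 0.5, (0, 0): 0.1}  # keyed by (stock, fund)


def bernoulli(p_one, value):
    """Probability of `value` (0 or 1) when P(value = 1) is `p_one`."""
    return p_one if value == 1 else 1.0 - p_one


def joint_probability(stock, fund, bond):
    """Product rule: P(s, f, b) = P(s) * P(f | s) * P(b | s, f)."""
    return (bernoulli(P_STOCK, stock)
            * bernoulli(P_FUND[stock], fund)
            * bernoulli(P_BOND[(stock, fund)], bond))


def greedy_chain():
    """Classifier-chain style: take the more likely branch at each node."""
    stock = 1 if P_STOCK > 0.5 else 0
    fund = 1 if P_FUND[stock] > 0.5 else 0
    bond = 1 if P_BOND[(stock, fund)] > 0.5 else 0
    return (stock, fund, bond)


def most_probable_combination():
    """Probabilistic-chain style: score every combination by its joint probability."""
    return max(product([0, 1], repeat=3), key=lambda c: joint_probability(*c))
```

Under these assumed values the greedy path predicts that all three products are purchased, whereas the combination with the highest joint probability is the one in which nothing is purchased.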

In addition, the label model unit 120 may apply ensemble probabilistic classifier chains (EPCC). When estimating the conditional probabilities of the labels, this approach fits several probabilistic classifier chains, each with a randomly determined label prediction order. Combining their results in an ensemble can further improve the accuracy of the estimated joint probability distribution of the labels.

Meanwhile, since most customers trade only a small number of financial products, the transaction data for each financial product is inherently imbalanced. Accordingly, the imbalance data processing unit 130 may resolve the imbalance by performing random sampling on the label model generated by the label model unit.

In an embodiment, the imbalance data processing unit 130 may resolve the imbalance by undersampling the majority label. The imbalance data processing unit 130 may randomly select samples having the majority label so that their number equals the number of samples having the minority label. Because a multi-label model in effect applies a single-label model several times, the data imbalance can be resolved by applying this undersampling to every label.
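Random undersampling for a single label can be sketched as follows. The data layout (a list of `(features, labels)` pairs), the function name, and the fixed seed are assumptions made for illustration; in the multi-label setting the same routine would be applied once per label.

```python
import random


def undersample(samples, label_index, seed=0):
    """Randomly drop majority-class samples until both classes are the same size."""
    rng = random.Random(seed)
    positives = [s for s in samples if s[1][label_index] == 1]
    negatives = [s for s in samples if s[1][label_index] != 1]
    # The shorter list is the minority class, the longer one the majority class.
    minority, majority = sorted([positives, negatives], key=len)
    kept = rng.sample(majority, len(minority))
    return minority + kept


# Hypothetical data: 10 customers, only 2 of whom traded the product at index 0.
data = [([float(i)], [1 if i < 2 else 0]) for i in range(10)]
balanced = undersample(data, label_index=0)
```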

Alternatively, the imbalance data processing unit 130 may perform oversampling on the label model. To do so, the imbalance data processing unit 130 first randomly draws a sample from the group having the minority label and selects the k nearest samples that also have the minority label. It then adds to the data a random sample lying on a line connecting the drawn sample to its k neighboring samples.
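The oversampling procedure described above resembles the technique commonly known as SMOTE and can be sketched as follows. Picking one of the k neighbours at random for each synthetic sample, the squared Euclidean distance, and all names and data are assumptions made for illustration.

```python
import random


def oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating towards neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority samples to the drawn sample (excluding the sample itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)),
        )[:k]
        neighbour = rng.choice(neighbours)
        t = rng.random()  # position along the segment from base to neighbour
        synthetic.append([a + t * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical minority-label samples with two features each.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = oversample(minority, n_new=5, k=2)
```

Every synthetic point lies on a segment between two existing minority samples, so it stays inside the region those samples already cover.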

In an embodiment, the imbalance data processing unit 130 may perform fixed sampling using an explanatory variable. Some of the information in the customer data may be set as explanatory variables. For example, the imbalance data processing unit 130 may perform fixed sampling using long-term dormancy information among the explanatory variables; that is, long-term dormant customers may be excluded from the training data. This fixed sampling exploits the observation that customers who trade a given financial product tend to keep trading it, whereas customers who do not trade it tend to keep not trading it.

For example, the imbalance data processing unit 130 may exclude from training the data of customers who have had no transactions for a preset period (for example, six months) or longer. When predicting whether a customer will purchase a specific product, if there has been no transaction in that financial product during the preset period, the purchase probability may be calculated using information unrelated to financial product transactions, such as the customer profile.

This approach is effective because the loss of predictive power caused by deleting the data of long-term dormant customers is negligible compared with the gain in accuracy obtained by deleting it.

In an embodiment, the imbalance data processing unit 130 may perform two-stage sampling. The imbalance data processing unit 130 may process the imbalanced data by performing a first sampling process, which performs fixed sampling using an explanatory variable, and a second sampling process, which performs random sampling.
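The two-stage sampling can be sketched as follows, assuming that the fixed sampling excludes customers dormant for six months or longer and that the random sampling is undersampling of the majority class. The record fields (`months_since_last_trade`, `products`), the function name, and the example customers are hypothetical.

```python
import random


def two_stage_sample(customers, product, dormancy_months=6, seed=0):
    """Stage 1: drop long-term dormant customers. Stage 2: random undersampling."""
    rng = random.Random(seed)
    # Stage 1 (fixed sampling): keep only customers who traded within the period.
    active = [c for c in customers if c["months_since_last_trade"] < dormancy_months]
    # Stage 2 (random sampling): balance buyers and non-buyers of the product.
    buyers = [c for c in active if product in c["products"]]
    non_buyers = [c for c in active if product not in c["products"]]
    minority, majority = sorted([buyers, non_buyers], key=len)
    return minority + rng.sample(majority, len(minority))


# Hypothetical customer records.
customers = [
    {"id": 1, "months_since_last_trade": 0, "products": {"funds"}},
    {"id": 2, "months_since_last_trade": 1, "products": set()},
    {"id": 3, "months_since_last_trade": 2, "products": set()},
    {"id": 4, "months_since_last_trade": 7, "products": set()},
    {"id": 5, "months_since_last_trade": 12, "products": {"funds"}},
    {"id": 6, "months_since_last_trade": 3, "products": set()},
    {"id": 7, "months_since_last_trade": 0, "products": {"funds"}},
    {"id": 8, "months_since_last_trade": 5, "products": set()},
]
training_set = two_stage_sample(customers, "funds")
```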

The customized financial product recommendation system according to an embodiment of the present invention has been described above with reference to FIGS. 1 to 7.

Hereinafter, a customized financial product recommendation method according to an embodiment of the present invention will be described with reference to FIG. 8.

Since the customized financial product recommendation method described below is performed on the basis of the customized financial product recommendation system described above, it can be understood more easily with reference to the foregoing description based on FIGS. 1 to 8.

FIG. 8 is a flowchart illustrating a customized financial product recommendation method according to an embodiment of the present invention.

Referring to FIG. 8, the financial product recommendation system 100 may perform preprocessing by performing principal component analysis on the customer data to reduce its dimension (S810).

The financial product recommendation system 100 may generate a label model by applying a multi-label classification model to the preprocessed customer data (S820).

The financial product recommendation system 100 may resolve the imbalance by performing random sampling on the label model (S830).

In an embodiment of step S810, the financial product recommendation system 100 may set, as the first principal component, the axis along which the variance is largest when the customer data is projected onto a single axis, and set, as the second principal component, the axis along which the variance is second largest. The financial product recommendation system 100 may then set a new coordinate system using the first and second principal components and linearly transform the customer data using the new coordinate system.

In an embodiment of step S820, the financial product recommendation system 100 may generate the label model using at least one of the following steps: generating a multi-label model using the binary relevance method, which converts the preprocessed customer data into a plurality of single-label data sets; generating a classifier chain that reflects the correlations between labels by generating, for the preprocessed customer data, a classification result for the preceding label according to the binary relevance method and using that result as an explanatory variable when predicting the next label; and generating a label model using a probabilistic classifier chain that sequentially calculates the conditional probabilities of the labels and computes the joint probability distribution from them.

In an embodiment of step S830, the financial product recommendation system 100 may resolve the imbalance by performing a first sampling process, which performs fixed sampling using an explanatory variable, and a second sampling process, which performs random sampling.

In an embodiment of step S830, the financial product recommendation system 100 may resolve the imbalance by using the long-term dormancy information included in the customer data to exclude from training the data of customers who have had no transactions for a preset period or longer and, where there has been no transaction in a specific financial product during the preset period, by calculating the purchase probability using information unrelated to financial product transactions, such as the customer profile.

The present invention described above is not limited by the foregoing embodiments or the accompanying drawings but only by the claims set forth below. Those of ordinary skill in the art to which the present invention pertains will readily appreciate that its configuration may be variously changed and modified without departing from the technical spirit of the present invention.

10, 11, 12: user terminal
100: financial product recommendation system
110: preprocessing unit
120: label model unit
130: imbalance data processing unit
140: customer data DB
200: financial product service server

Claims

A customized financial product recommendation system comprising:
a preprocessing unit that performs preprocessing by performing principal component analysis on customer data to reduce the dimension of the customer data;
a label model unit that generates a label model by applying a multi-label classification model to the customer data preprocessed by the preprocessing unit; and
an imbalance data processing unit that resolves imbalance by performing random sampling on the label model generated by the label model unit,
wherein the label model unit
generates the label model using a probabilistic classifier chain that calculates the conditional probabilities of labels and calculates a joint probability distribution from the conditional probabilities, and,
when estimating the conditional probabilities of the labels, applies a plurality of probabilistic classifier chains that estimate the conditional probabilities in an arbitrary order, and then calculates the joint probability distribution of the labels using the result of ensembling the results obtained from the respective probabilistic classifier chains.

The system of claim 1, wherein the customer data includes at least one of financial product transaction amount information, financial product transaction count information, deposit and withdrawal amount information, average financial product balance information, customer profile information, customer investment propensity information, and long-term dormancy information.

The system of claim 1, wherein the preprocessing unit sets a new coordinate system in which the axis along which the variance is largest when the customer data is projected onto a single axis is the first principal component and the axis along which the variance is second largest is the second principal component, and linearly transforms the customer data using the new coordinate system.

The system of claim 1, wherein the label model unit uses at least one of a random forest model, a logistic regression model, an AdaBoost model, and an artificial neural network model as a single-label classification model.

The system of claim 1, wherein the label model unit generates a multi-label model using a binary relevance method that converts the preprocessed customer data into a plurality of single-label data sets.

The system of claim 1, wherein the label model unit generates a classifier chain that reflects the correlations between labels by generating, for the preprocessed customer data, a classification result for the preceding label according to the binary relevance method and using that result as an explanatory variable when predicting the next label.

(Deleted)

The system of claim 1, wherein the imbalance data processing unit processes the imbalanced data by performing a first sampling process, which performs fixed sampling using an explanatory variable, and a second sampling process, which performs random sampling.

The system of claim 1, wherein the imbalance data processing unit uses the long-term dormancy information included in the customer data to exclude from training the data of customers who have had no transactions for a preset period or longer and, where there has been no transaction in a specific financial product during the preset period, calculates the purchase probability using information unrelated to financial product transactions, such as the customer profile.

A customized financial product recommendation method comprising:
performing, by a preprocessing unit, preprocessing by performing principal component analysis on customer data to reduce the dimension of the customer data;
generating, by a label model unit, a label model by applying a multi-label classification model to the preprocessed customer data; and
resolving, by an imbalance data processing unit, imbalance by performing random sampling on the label model,
wherein the label model unit
generates the label model using a probabilistic classifier chain that calculates the conditional probabilities of labels and calculates a joint probability distribution from the conditional probabilities, and,
when estimating the conditional probabilities of the labels, applies a plurality of probabilistic classifier chains that estimate the conditional probabilities in an arbitrary order, and then calculates the joint probability distribution of the labels using the result of ensembling the results obtained from the respective probabilistic classifier chains.

The method of claim 10, wherein the performing of the preprocessing by the preprocessing unit comprises:
setting, by the preprocessing unit, as a first principal component the axis along which the variance is largest when the customer data is projected onto a single axis;
setting, by the preprocessing unit, as a second principal component the axis along which the variance is second largest;
setting, by the preprocessing unit, a new coordinate system using the first principal component and the second principal component; and
linearly transforming, by the preprocessing unit, the customer data using the new coordinate system.

The method of claim 10, wherein the generating of the label model by the label model unit comprises at least one of:
generating, by the label model unit, a multi-label model using a binary relevance method that converts the preprocessed customer data into a plurality of single-label data sets; and
generating, by the label model unit, a classifier chain that reflects the correlations between labels by generating, for the preprocessed customer data, a classification result for the preceding label according to the binary relevance method and using that result as an explanatory variable when predicting the next label.

The method of claim 11, wherein the linearly transforming of the customer data by the preprocessing unit comprises:
a first sampling processing step in which the preprocessing unit performs fixed sampling using an explanatory variable; and
a second sampling processing step in which the preprocessing unit performs random sampling.

The method of claim 11, wherein the linearly transforming of the customer data by the preprocessing unit comprises:
excluding from training, by the preprocessing unit, the data of customers who have had no transactions for a preset period or longer, using the long-term dormancy information included in the customer data; and
setting, by the preprocessing unit, the probability of purchasing a financial product to 0 if there has been no transaction in that financial product during the preset period.