KR20220080508A - Feature selection method for deep learning

Info

Publication number: KR20220080508A
Application number: KR1020200169699A
Authority: KR
Inventors: 최대룡
Original assignee: (주)Yh데이타베이스
Priority date: 2020-12-07
Filing date: 2020-12-07
Publication date: 2022-06-14

Abstract

A feature selection method for deep learning training for abnormal financial transaction detection is disclosed. The method includes generating profile data representing each user's transaction pattern based on that user's past transaction data, and selecting some of the items in a detection rule written with the profile data as feature items for deep learning training.

Description

Feature selection method for deep learning for abnormal financial transaction detection

The present invention relates to technology for detecting abnormal financial transactions, and more particularly to technology for generating deep learning training data for abnormal financial transaction detection and performing deep learning with that data.

A fraud detection system (FDS) is a rule-based detection system that collects various information about a payer, builds a pattern from it, catches abnormal payments that deviate from the pattern, and blocks the payment path. In this regard, Korean Registered Patent Publication No. 10-1153968 discloses a financial fraud prevention system. That system manages data on financial fraud cases collected through multiple channels by type and manages data on each user's usual financial transactions per user. It determines whether a financial transaction that a user performs over a communication network constitutes financial fraud, and blocks the transaction if it does.

However, existing systems rely only on relational databases (RDB) and IP tracking, which do not allow real-time analysis. In addition, existing rule-based technology has the disadvantage that it is difficult to respond to rapidly changing fraud methods. To compensate for these shortcomings, approaches that use deep learning to respond preemptively to fraudulent transactions have been proposed, but it is difficult to detect abnormal financial transactions effectively through indiscriminate deep learning training.

Korean Registered Patent Publication No. 10-1153968 (published June 8, 2012)

An object of the present invention is to provide a technical solution capable of performing improved deep learning training for effective detection of abnormal financial transactions.

A feature selection method for deep learning training for abnormal financial transaction detection according to one aspect may include generating profile data representing each user's transaction pattern based on that user's past transaction data, and selecting some of the items in a detection rule written with the profile data as feature items for deep learning training.

In the selecting step, the items that were used for detection, among the items belonging to the detection rule, may be selected as the feature items.

The feature selection method may further include assigning, to each selected feature item, its detection rate as a weight.

The present invention selects, as feature items, the detection items of rules that produced many detections in a rule-based detection system, reflects them in the deep learning training data, and performs deep learning training with that data. The resulting trained model enables a more effective preemptive response to fraudulent transactions.

FIG. 1 is a flowchart of a feature selection method for deep learning training for abnormal financial transaction detection according to an embodiment.
FIG. 2 is an exemplary diagram for reference in the description of FIG. 1.
FIG. 3 is a diagram illustrating a deep learning training process for abnormal financial transaction detection according to an embodiment.

The foregoing and additional aspects of the present invention will become more apparent through the preferred embodiments described with reference to the accompanying drawings. Hereinafter, the present invention is described in detail through these embodiments so that those skilled in the art can readily understand and reproduce it.

FIG. 1 is a flowchart of a feature selection method for deep learning training for abnormal financial transaction detection according to an embodiment, and FIG. 2 is an exemplary diagram for reference in the description of FIG. 1. The method of FIG. 1 is performed by a server system for deep learning training for abnormal financial transaction detection, and may be performed by one or more servers belonging to that server system. Hereinafter, for convenience of description, the operating subject is referred to as the server system. The server system generates profile data representing each user's transaction pattern based on the users' past (financial) transaction data stored in a transaction log database (S100). In S100, the server system may generate the profile data by performing profiling analysis, through big data analysis, on each user's past transaction data. The server system then creates detection rules (scenarios) from the per-user profile data (S200). That is, detection scenarios are written using the profiled items. S200 need not be automatic; it may also be performed with the intervention of an administrator.
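Steps S100 and S200 can be illustrated with a minimal Python sketch. The source does not specify which items are profiled or how a rule is expressed, so the field names (`user`, `country`, `device`, `amount`), the profile contents, and the `rule_new_country` rule below are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical transaction log rows; the field names are assumptions.
transactions = [
    {"user": "u1", "country": "KR", "device": "d1", "amount": 100.0},
    {"user": "u1", "country": "KR", "device": "d1", "amount": 300.0},
    {"user": "u2", "country": "US", "device": "d9", "amount": 50.0},
]


def build_profiles(rows):
    """S100 (sketch): summarize each user's past transactions as profile data."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["user"]].append(row)
    profiles = {}
    for user, items in grouped.items():
        profiles[user] = {
            "countries": {r["country"] for r in items},
            "devices": {r["device"] for r in items},
            "mean_amount": mean(r["amount"] for r in items),
        }
    return profiles


def rule_new_country(profile, tx):
    """S200 (sketch): one example detection rule built from a profiled item."""
    return tx["country"] not in profile["countries"]


profiles = build_profiles(transactions)
# A login from a country that u1 has never used before triggers the rule.
suspicious = rule_new_country(profiles["u1"], {"user": "u1", "country": "FR"})
```

In practice the profiling would run over a much larger log (the description mentions big data analysis), and rules could be authored by an administrator rather than hard-coded.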

The server system selects some of the items in the detection rules written in S200 as feature items for deep learning training (S300). In one embodiment, the server system selects the items that were used for detection as the feature items. That is, the items used for detection in the existing rule-based detection system are selected as feature items. In one embodiment, the server system selects as feature items only those items, among the items used for detection, whose detection rate is equal to or greater than a preset reference rate (for example, 5%). When this selection criterion was applied, it was confirmed that items such as login country, VPN bypass information, departure information, and new device had been used effectively in detecting abnormal financial transactions over the most recent year.
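The threshold-based selection in S300 can be sketched as follows. The source does not define how a detection rate is computed; this sketch assumes it is each item's share of all item-level detections, and the counts and the `login_hour` item are invented for illustration.

```python
# Hypothetical counts of detections in which each rule item was used.
detections_by_item = {
    "login_country": 40,
    "vpn_bypass": 30,
    "departure_info": 16,
    "new_device": 10,
    "login_hour": 4,
}


def select_features(counts, reference_rate=0.05):
    """Keep items whose detection rate is at or above the reference rate.

    Assumption: detection rate = item count / total count over all items.
    """
    total = sum(counts.values())
    rates = {item: n / total for item, n in counts.items()}
    return {item: rate for item, rate in rates.items() if rate >= reference_rate}


selected = select_features(detections_by_item)
# login_hour (4%) falls below the 5% reference rate and is dropped.
```

The returned mapping keeps the rate alongside each item so that it can be reused as a weight in the next step.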

Next, the server system applies, to each of the selected feature items, its detection rate as a weight (S400). That is, the server system may apply the detection rate, expressed as a percentage, directly as the weight of the corresponding feature item; an example is shown in FIG. 2. The weighted feature items are used during deep learning training and enable more efficient training. In other words, by applying appropriate weights to the feature items chosen under the selection criterion described above and using them for training, efficient deep learning becomes possible.
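As a rough illustration of S400, the sketch below uses the percentage detection rate directly as a per-feature weight. The source does not say how the weight enters training; scaling the input value is only one possible reading, and the rates and sample values are hypothetical.

```python
# Hypothetical detection rates (in percent) for the selected feature items.
detection_rate_pct = {
    "login_country": 40.0,
    "vpn_bypass": 30.0,
    "departure_info": 16.0,
    "new_device": 10.0,
}


def weight_features(sample, weights):
    """Scale each selected feature value by its detection-rate weight.

    Items that were not selected (no weight) are dropped; selected items
    missing from the sample default to 0.0.
    """
    return {item: sample.get(item, 0.0) * w for item, w in weights.items()}


# One hypothetical transaction encoded as 0/1 indicator values.
sample = {"login_country": 1.0, "vpn_bypass": 0.0, "new_device": 1.0, "login_hour": 1.0}
weighted = weight_features(sample, detection_rate_pct)
```

Other designs, such as per-feature loss weighting or normalizing the rates to sum to one, would be equally consistent with the text.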

FIG. 3 is a diagram illustrating a deep learning training process for abnormal financial transaction detection according to an embodiment. The server system that performs the process of FIG. 3 may include an analysis server 100, a detection server 200, a simulation server (simulator) 300, and a deep learning server 400. The analysis server 100 uses per-customer transaction logs from a predetermined past period (for example, three years) to generate customer profile data (customer transaction pattern data), which the simulator 300 uses when it generates the data to be used by the deep learning server 400. Taking the customer's transaction information and device information as input, the analysis server analyzes the customer's transaction pattern through profiling and produces the customer profile data. The analysis server 100 may be a server that uses a Hadoop cluster (Hadoop 2.0).

The detection server 200 detects abnormal financial transactions in transaction logs collected in real time, based on predefined rules (patterns) and the deep learning model (prediction function) generated by the deep learning server 400. The detection server 200 may additionally select, as feature items, the items in the detection scenarios that contributed to detecting abnormal financial transactions, and assign to each selected feature item its detection rate as a weight. The simulator 300 collects, in real time, state information on the transaction data accumulating in a database. Here the database may be a multidimensional database (MDB), and MDB data means data within 24 hours of the log time.
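One way the detection server could combine predefined rules with the model's prediction is sketched below. The source does not state how the two signals are combined; the logical OR, the `threshold` parameter, the toy scoring function, and the transaction fields are all assumptions.

```python
def detect(tx, rules, model_score, threshold=0.5):
    """Flag a transaction if any rule fires or the model score is high enough.

    Assumption: rules and the model are combined with a logical OR.
    """
    rule_hit = any(rule(tx) for rule in rules)
    return rule_hit or model_score(tx) >= threshold


# Hypothetical rule: any transaction routed through a VPN is suspicious.
rules = [lambda tx: tx["vpn"]]


def toy_model(tx):
    """Stand-in for the deep learning model (prediction function)."""
    return 0.9 if tx["new_device"] else 0.1


flag_by_rule = detect({"vpn": True, "new_device": False}, rules, toy_model)
flag_by_model = detect({"vpn": False, "new_device": True}, rules, toy_model)
flag_none = detect({"vpn": False, "new_device": False}, rules, toy_model)
```

A real deployment would replace `toy_model` with the model produced by the deep learning server 400.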

The simulator 300 generates deep learning training data by combining the per-user profile data generated by the analysis server 100 with the state information collected in real time. The deep learning server 400 may be a GPGPU (general-purpose computing on GPU) server for the computations of deep learning model training. The deep learning server 400 receives the training data and trains a deep neural network to generate a deep learning model (prediction function), performing the training with the selected feature items.
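The simulator's role of combining profile data with recent state information can be sketched as below. The 24-hour window follows the MDB description in the source; the field names (`log_time`, `vpn_bypass`, `new_device_rate`) and the way the two sources are merged are hypothetical.

```python
from datetime import datetime, timedelta


def build_training_rows(profiles, state_rows, now, feature_items):
    """Merge per-user profile data with state rows from the last 24 hours.

    Returns one numeric feature vector per recent state row, restricted to
    the selected feature items (missing items default to 0.0).
    """
    cutoff = now - timedelta(hours=24)
    rows = []
    for state in state_rows:
        if state["log_time"] < cutoff:
            continue  # outside the 24-hour window
        merged = {**profiles.get(state["user"], {}), **state}
        rows.append([float(merged.get(item, 0.0)) for item in feature_items])
    return rows


# Hypothetical inputs.
profiles = {"u1": {"new_device_rate": 0.2}}
state_rows = [
    {"user": "u1", "log_time": datetime(2020, 12, 7, 10, 0), "vpn_bypass": 1},
    {"user": "u1", "log_time": datetime(2020, 12, 5, 10, 0), "vpn_bypass": 0},
]
training_rows = build_training_rows(
    profiles, state_rows, datetime(2020, 12, 7, 12, 0), ["vpn_bypass", "new_device_rate"]
)
```

The resulting vectors would then be passed to the deep learning server for training; the network architecture is not described in the source, so none is sketched here.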

The present invention has been described above with a focus on its preferred embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the invention may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the present invention is set forth in the claims rather than in the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

100: analysis server 200: detection server
300: simulator 400: deep learning server

Claims

1. A feature selection method for deep learning training for abnormal financial transaction detection, comprising:
generating profile data representing a transaction pattern for each user based on past transaction data for each user; and
selecting some items from a detection rule written with the profile data as feature items for deep learning training.

2. The method of claim 1,
wherein the selecting step selects, as the feature items, the items used for detection among the items belonging to the detection rule.

3. The method of claim 2,
further comprising assigning, to each selected feature item, a corresponding detection rate as a weight.