KR20180076753A

KR20180076753A - System and Method for Anomaly Pattern

Info

Publication number: KR20180076753A
Application number: KR1020160181252A
Authority: KR
Inventors: 서장원
Original assignee: 주식회사 엘렉시
Priority date: 2016-12-28
Filing date: 2016-12-28
Publication date: 2018-07-06
Also published as: KR102031123B1

Abstract

A system for sensing an abnormal pattern according to an embodiment of the present technology includes a data vectorization unit that converts all data into numerical form based on word embedding, a time-series modeling unit that is trained on time-series data vectors or on the data vectors generated within a predetermined time window of those vectors, and a hierarchical behavior knowledge space processing unit that checks, based on the modeling result of the time-series modeling unit, whether an abnormal pattern is detected. The present invention can thereby detect abnormal patterns with a hybrid method.

Description

{System and Method for Anomaly Pattern}

The present invention relates to anomaly detection technology and, more particularly, to an anomaly pattern detection system and method.

An anomaly pattern detection method finds abnormal data, that is, anomalous patterns, in data that is either given in advance or fed online.

Anomaly pattern detection methods fall broadly into two categories: rule-based methodologies and neural-network-based methodologies.

The rule-based methodology is the classical approach: experts analyze the causes of anomalous patterns and write statistical rules (e.g., "pattern xxx appears in more than 90% of the data within one hour") and deterministic rules (e.g., "data YYY appears in a certain data field") that can identify them, and these rules are applied to incoming data to find anomalous patterns.

When an anomalous pattern appears, the rule-based methodology can explain its cause through the rule that was applied, and new rules are easy to add to the system; however, a domain expert is needed to write the rules. Moreover, statistical rules require a certain minimum amount of data before meaningful statistics can be computed, so real-time anomaly detection is impossible. In addition, the noise (errors) that inevitably appears in data produces many exceptions to the rules, which makes the approach difficult to use in practice.

The neural-network-based methodology detects anomalous patterns in data using either unsupervised learning or supervised learning in an autoencoder configuration, in which the input data also serves as the target output.

Because the neural-network-based methodology models the data through training, it needs no time to compute statistics during actual monitoring; it can therefore detect anomalous patterns in near real time and is relatively robust to the noise (errors) that inevitably appears in data. However, when an anomalous pattern appears, experts must analyze the actual data to explain its cause, and adaptive and incremental learning methods, which would feed newly discovered anomalous patterns back into the system in real time for immediate use in detection, have not yet been put into practical use.

The data fed in for anomaly detection can be divided into time-series data, for which the order in which data enter the system matters, and spatial data, which is order-independent. Each data record consists of a set of data fields, and each field is either numerical, meaning the metric (magnitude, distance) of its value is meaningful, or nominal (non-numerical), indicating a category, a name, and so on.

Because the metric of nominal values is meaningless, nominal data is unsuitable as direct input to a numerical neural network.

Nominal data must therefore be converted into numerical data (vectorization). In addition, data that never appeared in the training data (unseen data) may appear during actual monitoring, so the system must be able to handle that case as well.

Embodiments of the present technology can provide a system and method that detect anomalous patterns in a hybrid manner by combining a neural-network-based method with a rule-based method, for example a variant of the Hierarchical Behavior Knowledge Space (HBKS) method.

Embodiments of the present technology can provide a vectorization method based on word embedding, and an anomaly pattern detection system and method that can process unseen data.

An anomaly pattern detection system according to an embodiment of the present technology may include: a data vectorization unit that converts all data into numerical form based on word embedding; a time-series modeling unit that is trained on time-series data vectors, or on the data vectors generated within a predetermined time window of those vectors; and a hierarchical behavior knowledge space processing unit that checks, based on the modeling result of the time-series modeling unit, whether an anomalous pattern is detected.

According to the present technology, anomalous patterns can be detected by a hybrid method that integrates a neural-network-based method and a rule-based method.

The technology can be applied to detecting many kinds of anomalous patterns, such as fraud detection in the financial sector, failure prognosis, DRM (Digital Rights Management) risk monitoring, and network intrusion monitoring.

FIG. 1 is a block diagram of an anomaly pattern detection system according to an embodiment.
FIG. 2 is a diagram explaining the concept of statistic-vector computation according to an embodiment.
FIG. 3 is a flowchart illustrating an anomaly pattern detection method according to an embodiment.

Hereinafter, embodiments of the present technology will be described in more detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of an anomaly pattern detection system according to an embodiment.

Referring to FIG. 1, the anomaly pattern detection system according to an embodiment may include a data vectorization unit (Data vectorizer), a time-series modeling unit (Time-series Modeler), and a Hierarchical Behavioral Knowledge Space (HBKS) processing unit.

1. Data Vectorization Unit

The data vectorization unit (Data vectorizer) may be configured to represent all data numerically by applying word embedding.

The data vectorization unit is described in more detail below.

First, a data-field filtering step may be performed to select, from all data fields, those needed for anomaly detection.

The data fields are divided into a set of fields with nominal (non-numerical) values and a set of fields with numerical values, and each data record is split accordingly into its nominal data and its numerical data.

<Example> (sex, height, weight, marital status) -> (sex, marital status), (height, weight)

Each field in the numerical data field set can then be normalized to a fixed range over the entire data set.

<Example> Normalize the height data to the range 0 to 1
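As a rough illustration of the field split and normalization just described, the Python sketch below separates nominal from numeric fields and min-max scales one numeric field to [0, 1]. The record layout, the field names, and the choice of min-max scaling are assumptions for illustration; the text only says that each numeric field is normalized to a fixed range over the whole data set.

```python
from typing import Dict, List, Sequence, Tuple

def split_fields(record: Dict[str, object], nominal_fields: Sequence[str],
                 numeric_fields: Sequence[str]) -> Tuple[List[object], List[float]]:
    """Split one record into its nominal part and its numeric part."""
    nominal = [record[f] for f in nominal_fields]
    numeric = [float(record[f]) for f in numeric_fields]
    return nominal, numeric

def min_max_normalize(column: Sequence[float], lo: float = 0.0,
                      hi: float = 1.0) -> List[float]:
    """Rescale one numeric field, over the whole data set, into [lo, hi]."""
    c_min, c_max = min(column), max(column)
    if c_max == c_min:                       # constant field: map to the lower bound
        return [lo] * len(column)
    return [lo + (x - c_min) * (hi - lo) / (c_max - c_min) for x in column]

# Hypothetical records following the (sex, height, weight, marital status) example.
records = [
    {"sex": "M", "height": 180.0, "weight": 75.0, "marital": "married"},
    {"sex": "F", "height": 160.0, "weight": 55.0, "marital": "single"},
    {"sex": "F", "height": 170.0, "weight": 60.0, "marital": "married"},
]
NOMINAL, NUMERIC = ["sex", "marital"], ["height", "weight"]
parts = [split_fields(r, NOMINAL, NUMERIC) for r in records]
heights = min_max_normalize([numeric[0] for _, numeric in parts])
print(parts[0])   # (['M', 'married'], [180.0, 75.0])
print(heights)    # [1.0, 0.0, 0.5]
```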

In addition, for the set of nominal data fields, each nominal data instance (one combination of nominal values) can be represented as a one-hot vector.

<Example> 2 x 2 = 4: (1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)

To make unseen data vectorizable, every nominal field value (combination) that does not occur in the actual data can be generated and added to the data set.

<Example> If the data set contains only (1, 0, 0, 0), (0, 1, 0, 0), and (0, 0, 1, 0), add (0, 0, 0, 1).
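The one-hot step, including completion with unseen combinations, might be sketched as follows. It assumes that each one-hot vector indexes a full combination of nominal values (which is how the 2 x 2 = 4 example reads) and enumerates every combination with `itertools.product`, so combinations missing from the data still receive a vector. The field values are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

def build_onehot_table(field_values: Sequence[Sequence[str]]) -> Dict[Tuple[str, ...], List[int]]:
    """Enumerate every combination of nominal field values, seen or not,
    and assign each combination its own one-hot vector."""
    combos = list(product(*field_values))          # e.g. 2 x 2 = 4 combinations
    table = {}
    for i, combo in enumerate(combos):
        vec = [0] * len(combos)
        vec[i] = 1
        table[combo] = vec
    return table

def unseen_combinations(table: Dict[Tuple[str, ...], List[int]],
                        observed: Sequence[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Combinations that never occur in the observed data set."""
    seen = set(observed)
    return [c for c in table if c not in seen]

SEX = ["M", "F"]
MARITAL = ["married", "single"]
table = build_onehot_table([SEX, MARITAL])
observed = [("M", "married"), ("M", "single"), ("F", "married")]
print(unseen_combinations(table, observed))   # [('F', 'single')]
print(table[("F", "single")])                  # [0, 0, 0, 1]
```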

To reflect the time series, word-embedding training can be performed on the nominal data in temporal order. For unseen data, word-embedding training can use the sequence (zero vector, unseen vector, zero vector).
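The patent does not name a particular embedding algorithm. Assuming a skip-gram-style model, the sketch below only prepares the (center, context) training pairs: neighbors come from the time-ordered sequence of nominal instances, and each unseen instance is trained in the context (zero vector, unseen, zero vector). `PAD` is a hypothetical token standing in for the zero vector; the resulting pairs would then be passed to whatever embedding trainer is actually used.

```python
from typing import List, Sequence, Tuple

PAD = "<ZERO>"   # hypothetical token standing in for the zero vector

def skipgram_pairs(sequence: Sequence[str], window: int = 1) -> List[Tuple[str, str]]:
    """(center, context) pairs from a time-ordered token sequence;
    PAD positions serve only as context, never as centers."""
    pairs = []
    for i, center in enumerate(sequence):
        if center == PAD:
            continue
        for j in range(max(0, i - window), min(len(sequence), i + window + 1)):
            if j != i:
                pairs.append((center, sequence[j]))
    return pairs

def training_pairs(observed: Sequence[str], vocabulary: Sequence[str],
                   window: int = 1) -> List[Tuple[str, str]]:
    """Pairs from the observed sequence, plus one (PAD, token, PAD)
    sequence for every vocabulary entry that never appears in the data."""
    pairs = skipgram_pairs(observed, window)
    seen = set(observed)
    for token in vocabulary:
        if token not in seen:
            pairs += skipgram_pairs([PAD, token, PAD], window)
    return pairs

observed = ["A", "B", "A", "C"]     # nominal instances in arrival order
vocab = ["A", "B", "C", "D"]        # "D" is an unseen combination
pairs = training_pairs(observed, vocab)
print(pairs)
```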

As a result, a vector representation of a fixed dimension is obtained for every nominal data field instance. The resulting vectors can then be normalized to the normalization range used for the numerical data.

Next, the numerical data fields and the nominal data fields are concatenated, and the result is defined as the "data vector".
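A minimal sketch of the last two steps follows, assuming per-dimension min-max rescaling of the embeddings into the same [0, 1] range as the numeric fields (the text does not fix the rescaling method). The embedding values are made up.

```python
from typing import Dict, List, Sequence

def rescale_embeddings(emb: Dict[str, Sequence[float]], lo: float = 0.0,
                       hi: float = 1.0) -> Dict[str, List[float]]:
    """Min-max rescale each embedding dimension, across all tokens, into [lo, hi]."""
    dims = len(next(iter(emb.values())))
    mins = [min(v[d] for v in emb.values()) for d in range(dims)]
    maxs = [max(v[d] for v in emb.values()) for d in range(dims)]
    scaled = {}
    for token, v in emb.items():
        row = []
        for d in range(dims):
            span = maxs[d] - mins[d]
            row.append(lo if span == 0 else lo + (v[d] - mins[d]) * (hi - lo) / span)
        scaled[token] = row
    return scaled

def data_vector(numeric: Sequence[float], token: str,
                scaled_emb: Dict[str, List[float]]) -> List[float]:
    """Concatenate the normalized numeric fields with the token's embedding."""
    return list(numeric) + scaled_emb[token]

emb = {"A": [-1.0, 2.0], "B": [1.0, 0.0], "C": [0.0, 1.0]}   # made-up embeddings
scaled = rescale_embeddings(emb)
print(data_vector([1.0, 0.5], "C", scaled))   # [1.0, 0.5, 0.5, 0.5]
```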

If the entire data vector is numerical, all or some of its dimensions (data fields) can be expressed as nominal data and used as the BKS key data for that record. The dimensions can be chosen starting from those with the largest variation between records, for example by principal component analysis (PCA) of the data.

<Example> With value sets ({+, 0, -}, {small, large}, {Yes, No}): (0.7, -2, 1) => (+, small, Yes)
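One way to read this example is a per-dimension discretizer that maps each numeric value to a symbol from that dimension's value set. The thresholds below (a dead band of ±0.1 for the sign, 0 for small/large, 0.5 for Yes/No) are purely hypothetical, and PCA-based dimension selection is omitted.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical per-dimension discretizers; the thresholds are illustrative only.
def sign3(x: float, eps: float = 0.1) -> str:
    return "+" if x > eps else "-" if x < -eps else "0"

def small_large(x: float) -> str:
    return "small" if x < 0 else "large"

def yes_no(x: float) -> str:
    return "Yes" if x > 0.5 else "No"

def bks_key(vec: Sequence[float],
            rules: Sequence[Callable[[float], str]]) -> Tuple[str, ...]:
    """Map a fully numeric data vector to a nominal tuple usable as BKS key data."""
    return tuple(rule(x) for x, rule in zip(vec, rules))

print(bks_key((0.7, -2.0, 1.0), [sign3, small_large, yes_no]))   # ('+', 'small', 'Yes')
```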

2. Time Series Modeling Unit

The time-series modeling unit can be built by applying a neural network model suited to time-series modeling, chosen according to the characteristics of the domain (problem area) and the amount of data.

In one embodiment, the time-series modeling unit may be built with a model trained by unsupervised learning, such as a recurrent neural network or a self-organizing map (SOM), or with an autoencoder model trained by supervised learning.

The input to the time-series modeling unit may be the time series of data vectors obtained from the data set in the vectorization step, or the data vectors generated within a fixed time window (W) of that series.

<Example> With W = 3 and v(t) denoting the vector at time t, the input is v(t), v(t-1), v(t-2). If t < W, the vectors for time steps before the start of the series are zero vectors (the vector with the highest entropy).
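The windowed input can be sketched as below, assuming 1-based time indices and newest-first ordering, which matches the later HBKS example (W = 5, t = 3 giving (V3, V2, V1, 0 vector, 0 vector)). Positions before the start of the series are filled with zero vectors.

```python
from typing import List, Sequence

def window(vectors: Sequence[Sequence[float]], t: int, W: int) -> List[List[float]]:
    """Return (v(t), v(t-1), ..., v(t-W+1)), newest first, using 1-based time
    indices; positions before the start of the series become zero vectors."""
    dim = len(vectors[0])
    out = []
    for k in range(W):
        idx = t - k                                  # 1-based time index
        out.append(list(vectors[idx - 1]) if idx >= 1 else [0.0] * dim)
    return out

series = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]   # v(1), v(2), v(3)
print(window(series, t=3, W=5))
# [[3.0, 3.0], [2.0, 2.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
```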

3. Hierarchical Behavior Knowledge Space (HBKS) Processing Unit

The hierarchical behavior knowledge space processing unit may be configured to store any anomalous pattern that appears while the data is being monitored and to apply it in subsequent monitoring. It may also store the identified cause of each anomalous pattern so that the cause can be explained if the same pattern occurs again.

Before the hierarchical behavior knowledge space processing unit is described, the Behavioral Knowledge Space (BKS) is explained.

A BKS is basically a set of "BKS elements".

BKS = {BKS element}

A BKS element consists of a pair: key data, which plays the role of a database key value, and a statistic vector.

BKS element = {(key data, statistic vector)}

The key data is one of the observed data items, i.e., the input data of data vectorization. The statistic vector may be a two-dimensional vector (number of times the key data has been referenced, number of times the key data has been judged an anomalous pattern).

The significance threshold is a threshold on "the number of times the key data has been referenced"; it means "trust the BKS judgment only if the BKS element has been referenced at least this many times." For example, it may be set as "significance threshold = 30".

The confidence rate concerns "the probability that the key data is actually an anomalous pattern": for a statistic vector (a, b), this probability is expressed as b/a (a ≠ 0), and the confidence rate is the level that this ratio is compared against (e.g., 95%).

A BKS search looks in a given BKS for a BKS element whose key data matches the given data and, if one exists, outputs that element.

BKS creation generates, in a given BKS, a BKS element whose key data is the given data, together with its statistic vector V: for example, V = (1, 1) if the data is an anomalous pattern and V = (1, 0) otherwise.

A BKS element update changes the statistic vector V = (a, b) of the element whose key data is the given data to V = (a+1, b+1) if the data is an anomalous pattern, and to V = (a+1, b) otherwise.
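The BKS definitions above map naturally onto a dictionary from key data to a mutable statistic vector [a, b]. This is a sketch, not the patent's implementation; in particular, letting `update` create a missing element is a convenience assumption, and the class name is hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Tuple

class BKS:
    """Behavior knowledge space: key data -> statistic vector (a, b), where
    a = times the key was referenced and b = times it was judged anomalous."""

    def __init__(self) -> None:
        self.elements: Dict[Hashable, List[int]] = {}

    def search(self, key: Hashable) -> Optional[Tuple[int, int]]:
        """BKS search: the element's (a, b) if the key exists, else None."""
        v = self.elements.get(key)
        return None if v is None else (v[0], v[1])

    def create(self, key: Hashable, anomalous: bool) -> None:
        """BKS creation: V = (1, 1) if anomalous, else (1, 0)."""
        self.elements[key] = [1, 1 if anomalous else 0]

    def update(self, key: Hashable, anomalous: bool) -> None:
        """BKS element update: (a+1, b+1) if anomalous, else (a+1, b)."""
        if key not in self.elements:
            self.create(key, anomalous)
            return
        v = self.elements[key]
        v[0] += 1
        if anomalous:
            v[1] += 1

    def confidence(self, key: Hashable) -> Optional[float]:
        """Estimated anomaly probability b/a for the key, or None."""
        v = self.elements.get(key)
        return None if v is None or v[0] == 0 else v[1] / v[0]

bks = BKS()
key = ("+", "small", "Yes")
bks.update(key, anomalous=True)    # creates V = (1, 1)
bks.update(key, anomalous=False)   # V = (2, 1)
print(bks.search(key), bks.confidence(key))   # (2, 1) 0.5
```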

The HBKS is described below.

An ordered set of W key data items can be mapped one-to-one to a word of length W, with each key data item corresponding to one letter.

<Example> W=3, (KD1, KD2, KD3) => "MAN", KD1 => 'M', KD2 => 'A', KD3 => 'N'

For 1 <= w <= W, a shorter ordered tuple of length w taken from a given tuple corresponds to a substring (specifically, a prefix) of the length-W word.

<Example> MAN => "M", "MA", "MAN"
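Taking the shorter tuples as prefixes, as the "MAN" example does, the key data of every w-BKS can be derived from one length-W key by slicing. The helper name `sub_keys` is hypothetical.

```python
from typing import List, Sequence, Tuple

def sub_keys(key: Sequence) -> List[Tuple]:
    """All prefixes of an ordered key of length W, shortest first:
    the key data of the 1-BKS, 2-BKS, ..., W-BKS."""
    return [tuple(key[:w]) for w in range(1, len(key) + 1)]

print(["".join(k) for k in sub_keys("MAN")])         # ['M', 'MA', 'MAN']
print(sub_keys(("V3", "V2", "V1", "0", "0"))[2])     # ('V3', 'V2', 'V1')
```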

For a given data set, one can consider a BKS whose key data is the concatenation of an ordered tuple of W data items, and likewise a BKS whose key data has length w (1 <= w <= W).

Once the data set and W are fixed as above, a BKS can be defined for each length w (called a "w-BKS"), and the collection of these is defined as the HBKS of the given data set.

The key data of the HBKS are linked in a trie (TRIE) structure.

Each leaf node (i.e., each node of the W-BKS) may have storage for an explanation of why the pattern is anomalous when the corresponding data field appears.

An HBKS search looks in a given HBKS for key data of length W and, if found, outputs the corresponding HBKS element.

HBKS creation generates, in a given HBKS, an HBKS element whose key data is the given data, and also generates the statistic vector V of each t-BKS along the way.

Following the HBKS element update procedure, the t-BKS levels may be updated one step at a time.
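A possible trie layout for the HBKS is sketched below, assuming that the node at depth w holds the statistic vector (a, b) of the w-BKS element whose key is the length-w prefix, and that leaf (W-BKS) nodes can carry an explanation string. Creation and update are merged into one `update` method, which is a simplification; the class and method names are hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Sequence

class Node:
    def __init__(self) -> None:
        self.children: Dict[Hashable, "Node"] = {}
        self.stats = [0, 0]                      # (a, b) of the w-BKS key ending here
        self.explanation: Optional[str] = None   # used on W-length (leaf) keys

class HBKS:
    """Trie over ordered key data: the node at depth w stores the statistic
    vector of the w-BKS element whose key is the length-w prefix."""

    def __init__(self, W: int) -> None:
        self.W = W
        self.root = Node()

    def update(self, key: Sequence[Hashable], anomalous: bool) -> None:
        """Create missing nodes and update (a, b) of every w-BKS on the path."""
        assert len(key) == self.W
        node = self.root
        for item in key:
            node = node.children.setdefault(item, Node())
            node.stats[0] += 1
            if anomalous:
                node.stats[1] += 1

    def path(self, key: Sequence[Hashable]) -> List[Node]:
        """Existing nodes for w = 1..W along the key (stops at the first miss)."""
        nodes, node = [], self.root
        for item in key:
            node = node.children.get(item)
            if node is None:
                break
            nodes.append(node)
        return nodes

    def search(self, key: Sequence[Hashable]) -> Optional[Node]:
        """Leaf (W-BKS) node for a full-length key, or None."""
        p = self.path(key)
        return p[-1] if len(p) == len(key) == self.W else None

h = HBKS(W=3)
h.update("MAN", anomalous=True)
h.update("MAP", anomalous=False)
print([n.stats for n in h.path("MAN")])   # [[2, 1], [2, 1], [1, 1]]
h.search("MAN").explanation = "hypothetical cause text"
```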

The HBKS processing unit is described in more detail below.

An example of defining the HBKS for training data with window length W follows.

When the time window of the time-series data vectors is not fixed, as with a recurrent neural network, W can be set arbitrarily, from 1 up to a large number.

The key data is an ordered tuple corresponding to the W data vectors observed at a given time.

<Example> Assuming W = 5 and t = 3:

The key data of the 5-BKS is (V3, V2, V1, 0 vector, 0 vector)

The key data of the 4-BKS is (V3, V2, V1, 0 vector)

The key data of the 3-BKS is (V3, V2, V1)

The key data of the 2-BKS is (V3, V2)

The key data of the 1-BKS is (V3)

Each key data item has its own statistic vector, which can be computed along the trie (TRIE) structure as shown in FIG. 2.

An example of how the HBKS processing unit learns from training data follows.

After time-series modeling of the training data is complete, a typical anomaly occurrence rate can be set (e.g., a detection rate of 0.1%), and the neural-network anomaly detection threshold (THD) can be set to match it. For example, with 1,000,000 training samples and an anomaly rate of 0.1%, THD can be set so that 1,000 patterns are detected as anomaly candidates.
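Setting THD from a target detection rate amounts to taking a quantile of the model's anomaly scores on the training data. The sketch assumes that a higher score means more anomalous (for example, an autoencoder's reconstruction error) and uses strict exceedance, matching the "THD exceeded" test in FIG. 3; tied scores at the boundary can yield slightly fewer candidates. The scores are synthetic and the function names are hypothetical.

```python
from typing import List, Sequence

def threshold_for_rate(scores: Sequence[float], rate: float) -> float:
    """Return THD such that about `rate` of the training scores exceed it."""
    k = max(1, min(len(scores), round(len(scores) * rate)))   # desired candidates
    ranked = sorted(scores, reverse=True)
    if k == len(ranked):
        return ranked[-1] - 1.0      # every sample exceeds THD
    return ranked[k]                 # (k+1)-th highest: the top k scores exceed it

def candidates(scores: Sequence[float], thd: float) -> List[int]:
    """Indices of training samples flagged as anomaly candidates (score > THD)."""
    return [i for i, s in enumerate(scores) if s > thd]

scores = [float(i) for i in range(1000)]      # stand-in anomaly scores
thd = threshold_for_rate(scores, 0.01)        # 1% of 1,000 -> 10 candidates
print(thd, len(candidates(scores, thd)))      # 989.0 10
```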

HBKS entries can then be set up for the anomaly candidates detected in this way, with the initial setting "not an anomalous pattern".

FIG. 3 is a flowchart illustrating an anomaly pattern detection method according to an embodiment.

Referring to FIG. 3, when the data for time t is acquired, data-field filtering can be performed. Once the preceding data are available, whether the data is an anomalous pattern is judged based on the HBKS search conditions.

If it is an anomalous pattern, this is reported. If not, the data is vectorized and the time-series model is applied. It is then checked whether the anomaly detection threshold (THD) has been exceeded: if so, the process proceeds to the step of reporting that an anomalous pattern has occurred; otherwise, it returns to the initial step.
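The FIG. 3 flow can be outlined as a loop in which each stage is a caller-supplied function. All parameter names are hypothetical, and the HBKS search (including waiting until enough preceding data exist to fill the window) is abstracted into `hbks_verdict`.

```python
from typing import Callable, Iterable, List, Optional

def detect_stream(records: Iterable[dict],
                  filter_fields: Callable[[dict], dict],
                  hbks_verdict: Callable[[dict], Optional[bool]],
                  vectorize: Callable[[dict], List[float]],
                  score: Callable[[List[float]], float],
                  thd: float) -> List[str]:
    """One pass of the FIG. 3 flow per incoming record.

    hbks_verdict returns True when the HBKS search already classifies the
    data as anomalous, and False or None otherwise."""
    labels = []
    for record in records:                   # data for time t arrives
        data = filter_fields(record)         # data-field filtering
        if hbks_verdict(data):               # rule-based (HBKS) check first
            labels.append("anomaly (HBKS)")
            continue
        s = score(vectorize(data))           # vectorize, apply time-series model
        labels.append("anomaly (model)" if s > thd else "normal")
    return labels

labels = detect_stream(
    [{"x": 1.0, "id": 7}, {"x": 5.0, "id": 8}, {"x": 9.0, "id": 9}],
    filter_fields=lambda r: {"x": r["x"]},   # keep only the field needed
    hbks_verdict=lambda d: d["x"] == 9.0,    # pretend the HBKS knows x == 9
    vectorize=lambda d: [d["x"]],
    score=lambda v: v[0],                    # stand-in anomaly score
    thd=4.0,
)
print(labels)   # ['normal', 'anomaly (model)', 'anomaly (HBKS)']
```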

The HBKS search conditions shown in FIG. 3 are described in detail below.

First, the HBKS is searched for the length-W time-series data.

Starting from w = 1 and continuing up to w = W, the time-series vector is searched for in each w-BKS.

If nothing is found, the data passes through unchanged; if an element is found, the following rules can be applied:

- If a in the statistic vector is at least the significance threshold and the anomaly rate (b/a) is at least the confidence rate, output an anomalous pattern; if the final leaf node holds an explanation of the cause, output it as well.

- If a is below the significance threshold and the anomaly rate (b/a) is at least the confidence rate, output an anomaly candidate and report or record that expert review is required.

- If a is below the significance threshold and the anomaly rate (b/a) is below the confidence rate, output a normal-pattern candidate and report or record that expert review is required.

- If a is at least the significance threshold and the anomaly rate (b/a) is below the confidence rate, output a normal-pattern candidate.
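The four cases above reduce to two comparisons. In this sketch a rate exactly equal to the confidence rate is treated as anomalous, because the original wording covers equality in both branches; that tie-breaking, and the function name, are assumptions.

```python
def hbks_decision(a: int, b: int, significance: int, confidence: float) -> str:
    """Four-way decision on a found statistic vector (a, b), with a >= 1."""
    rate = b / a                        # anomaly rate b/a
    significant = a >= significance     # referenced often enough to trust
    if rate >= confidence:
        return "anomaly" if significant else "anomaly candidate (expert review)"
    return "normal candidate" if significant else "normal candidate (expert review)"

print(hbks_decision(40, 39, significance=30, confidence=0.95))   # anomaly
print(hbks_decision(11, 5, significance=30, confidence=0.95))    # normal candidate (expert review)
```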

The HBKS can also be updated through expert review.

For example, for patterns in the HBKS that require expert review, or for patterns that the neural network model detected as anomalous, the HBKS can be searched, created, or updated according to the result of the expert review.

Following the expert's judgment after review, a for the data is raised to at least the significance threshold, and a and b are adjusted by the minimum amount needed for the anomaly rate b/a (or the normal rate (a-b)/a) to reach the confidence rate; the entire HBKS is then modified accordingly.

<Example 1>

W = 5, significance threshold 30, confidence rate 95%, (a, b) = (11, 5), and the expert judges the pattern normal.

The shortfall in the reference count at the leaf node is counted as normal observations, giving (11+19, 5); a is then increased until the normal rate (a-b)/a reaches the confidence rate.

<Example 2>

W = 5, significance threshold 30, confidence rate 95%, (a, b) = (11, 5), and the expert judges the pattern anomalous.

The shortfall in the reference count at the leaf node is counted as anomalous observations, giving (11+19, 5+19); a and b are then increased until the anomaly rate b/a reaches the confidence rate.
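The adjustment in both examples can be reproduced with integer arithmetic, expressing the confidence rate as a whole percentage to avoid floating-point edge cases. Under this reading, Example 1 ends at (100, 5) and Example 2 at (120, 114). Propagating the change to the other w-BKS nodes on the trie path is not shown, and the function name is hypothetical.

```python
from typing import Tuple

def adjust_for_expert(a: int, b: int, significance: int, conf_pct: int,
                      anomalous: bool) -> Tuple[int, int]:
    """Minimally increase (a, b) so the expert's verdict becomes decisive:
    a >= significance and the verdict's rate >= conf_pct percent."""
    if not 0 <= conf_pct < 100:
        raise ValueError("conf_pct must be in [0, 100) for the loops to terminate")
    # Step 1: top up the reference count to the significance threshold,
    # counting the added observations toward the expert's verdict.
    missing = max(0, significance - a)
    a += missing
    if anomalous:
        b += missing
    # Step 2: add observations of the verdict until its rate is high enough.
    if anomalous:
        while 100 * b < conf_pct * a:          # b / a below the confidence rate
            a, b = a + 1, b + 1
    else:
        while 100 * (a - b) < conf_pct * a:    # (a - b) / a below the confidence rate
            a += 1
    return a, b

print(adjust_for_expert(11, 5, 30, 95, anomalous=False))   # (100, 5)
print(adjust_for_expert(11, 5, 30, 95, anomalous=True))    # (120, 114)
```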

Once the entire HBKS has been modified, the cause can be recorded and stored in the corresponding leaf node.

In this way, the present technology vectorizes nominal data by applying word embedding, which effectively reduces the data dimensionality and improves the generalization ability of the modeling result. Vectorizing nominal data with reference to the immediately preceding and following data also lets the vectors reflect the temporal order.

The HBKS also makes it possible to combine rules based on experts' empirical knowledge organically with the results of neural-network training.

When an anomalous pattern appears, its cause can be explained using the applied rule (only if the anomalous pattern is stored in the HBKS; for patterns not stored in the HBKS, the cause must be analyzed as in conventional neural-network methods).

An anomalous pattern identified by an expert (user) can be registered in the system and applied from the very next detection. If there is a policy on detection accuracy (e.g., a false-positive rate of 5% or less), the HBKS can be used to make the system converge statistically to the target value.

Because the technology is also based on a neural network method, it can detect anomalous patterns in near real time and is relatively robust to the noise (errors) that inevitably appears in data.

In addition, because combinations of data values absent from the data set can also be included during vectorization, every combination of data values can be monitored.

Those skilled in the art to which the present invention pertains will understand that the invention may be embodied in other specific forms without changing its technical idea or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims below rather than by the detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

Vectorizer: data vectorization unit
Time-series Modeler: time-series modeling unit
HBKS: hierarchical behavior knowledge space processing unit

Claims

Determining whether there is an abnormal pattern by referring to the statistical data when there is key data corresponding to the received data field; wherein, if there is no key data corresponding to the received data field, the controller is configured to time-series model the received data field to determine whether an abnormal pattern exists.

The abnormal pattern detection system of claim 1,
Wherein the abnormal pattern detection system is configured to vectorize the received data field and then time-series model the received data field.

The abnormal pattern detection system of claim 1,
Wherein the abnormal pattern detection system is configured to perform the time series modeling using any one of a recurrent neural network model, a self-organizing map model, and an auto encoder model.

The abnormal pattern detection system of claim 1,
Wherein the abnormal pattern detection system is configured to update the database when the received data field is determined to be an abnormal pattern.

A rule-based detection processing unit configured to determine, based on rules, whether a received data field is an abnormal pattern;
A vectorization unit for vectorizing the received data field to generate a data vector;
A neural network based sensing processing unit configured to receive the data vector and determine, based on a neural network, whether the pattern is abnormal; and
A database for referring to, generating and updating information on a data field determined as an abnormal pattern by the rule-based detection processor and the neural network-based detection processor;
The abnormal pattern detection system comprising:

The abnormal pattern detection system of claim 5,
Wherein the rule-based detection processing unit is configured to reflect, in the database, information on a data field determined to be an abnormal pattern by the neural network-based detection processing unit.

The abnormal pattern detection system of claim 5,
Wherein the rule-based detection processing unit is configured to include a hierarchical behavior knowledge space processing unit.

The abnormal pattern detection system of claim 5,
Wherein, when a data field is determined to be an abnormal pattern, the rule-based detection processing unit is configured to report that it is an abnormal pattern and to output a description of the cause.

The abnormal pattern detection system of claim 5,
Wherein, when a data field is determined to be an abnormal pattern candidate, the rule-based detection processing unit is configured to report the candidate and to indicate that a determination by an administrator or an expert is required.

The abnormal pattern detection system of claim 5,
Wherein the vectorization unit is configured to convert all of the received data fields into numerical form in time-series order.

The abnormal pattern detection system of claim 5,
Wherein the vectorization unit is configured to generate data vectors for all possible cases of the non-numeric data among the received data fields.

The abnormal pattern detection system of claim 5,
Wherein the vectorization unit is configured to represent all or a part of the received data fields as non-numeric data when all of the received data fields are numeric data.

The abnormal pattern detection system of claim 5,
Wherein the database is configured to store key data and statistic vectors corresponding to data fields determined to be abnormal patterns.

The abnormal pattern detection system of claim 5,
Further comprising a filtering unit that receives and filters data, each item being a set of at least one data field, and provides the data to the rule-based detection processing unit and the vectorization unit.

A rule-based determination step of determining whether the received data field is an abnormal pattern based on a rule;
Generating a data vector by vectorizing the received data field if the result of the determination is not an abnormal pattern;
A neural network-based determination step of receiving the data vector and determining, based on a neural network, whether it is an abnormal pattern; and
Storing information on a data field determined to be an abnormal pattern in the rule-based and neural network-based determination steps;
The abnormal pattern detection method comprising:

The method of claim 15,
Wherein the rule-based determination step is performed by a hierarchical behavior knowledge space method.

The method of claim 15,
Wherein the vectorization unit is configured to convert all of the received data fields into numerical form in time-series order.

The method of claim 15,
Wherein the vectorization unit is configured to generate data vectors for all possible cases of the non-numeric data among the received data fields.

The method of claim 15,
Wherein the vectorization unit is configured to represent all or a part of the received data fields as non-numeric data when all of the received data fields are numeric data.

The method of claim 15,
Wherein the neural network based sensing processing unit is configured to include a time series modeling unit.

The method of claim 15,
Wherein the storing step includes storing key data and a statistic vector corresponding to the data field determined to be an abnormal pattern.