KR20200066428A

KR20200066428A - A unit and method for processing rule based action

Info

Publication number: KR20200066428A
Application number: KR1020180152189A
Authority: KR
Inventors: 박정환; 진홍석; 원지섭; 한혁; 진성일
Original assignee: 주식회사 리얼타임테크
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2020-06-10
Also published as: KR102189127B1

Abstract

An objective of the present invention is to provide a behavior-based rule processing apparatus and a processing method thereof which define logs generated in large volumes as actions and process them in a standardized form, define the continuity of actions as rules to detect abnormal access, are suitable for artificial intelligence learning, and can process the actions at high speed. To achieve the objective, the behavior-based rule processing apparatus according to the present invention includes: a log collection unit which collects logs used for rule processing from a plurality of agents and stores the collected logs; and a real-time rule processing unit which includes a rule block that classifies the contents of a log according to a combination of predetermined values and a condition check block that classifies predetermined metadata values of the log, and which pre-defines, as rules, the actions of the log that are combinations of the rule block and the condition check block, wherein the logs stored in the log collection unit are retrieved by real-time distributed streaming.

Description

A behavior-based rule processing device and a processing method thereof {A UNIT AND METHOD FOR PROCESSING RULE BASED ACTION}

The present invention relates to a behavior-based rule processing apparatus and a processing method thereof, and more particularly, to a behavior-based rule processing apparatus and processing method suitable for artificial intelligence learning and real-time analysis.

Recently, with the rapid growth of hardware and the expansion of services provided over the web, the amount of data used on the Internet has been increasing sharply. Accordingly, along with access by users who use services normally, abnormal access for the purpose of attack is increasing explosively.

Analysis systems that have been used to detect abnormal access judge threats by extracting specific actions, that is, abnormal information found in logs, based on rules specified by an administrator. However, traditional log analysis systems either cannot cope with the explosively growing volume of log data or cannot respond one by one to increasingly sophisticated attack patterns.

To overcome this limit on large-volume processing, most analysis systems choose to raise hardware performance, but limits such as bandwidth and I/O (input/output) latency still remain. Research is also under way on extracting patterns with artificial intelligence models in order to detect sophisticated accesses that are difficult to judge. However, because most rule engines that process existing rules define their rules programmatically, the rules cannot be put into a standardized form that an artificial intelligence model can use.

For example, in a typical analysis step (rule engine), normalized logs are first stored in a storage medium such as a database, and the stored information is then loaded to perform the analysis logic (rule processing). Even if a real-time streaming system is used, this approach is not suitable for real-time processing, because the streaming results must be stored in the database before analysis (rule processing) and the analysis step needs additional processing time for database queries.

In addition, when there are many types of logs, the normalized result items of each log type are assigned to the columns of a separate table and stored in database tables, and the analysis logic then requires time-consuming operations such as JOIN queries over multiple tables. This also makes the approach unsuitable for real-time analysis.

In addition, rules in such a rule processing engine must be defined in the form of queries (SQL queries), which is not suitable for AI learning models used for rule generation and rule refinement. This is because the output of AI learning would be a rule in the form of an SQL query, which is a kind of programming language, so a learning model close to natural language learning would have to be implemented.

Accordingly, the present invention has been proposed to meet the needs described above. Its objective is to provide a behavior-based rule processing apparatus and a processing method thereof that define logs generated in large volumes as actions and process them in a standardized form, define the continuity of actions as rules to detect abnormal access, are suitable for artificial intelligence learning, and can process at high speed.

To achieve the above objective, the behavior-based rule processing apparatus according to the present invention includes: a log collection unit that collects logs used for rule processing from a plurality of agents and stores the collected logs; and a real-time rule processing unit that includes a rule block (Rule Block), which classifies the contents of a log according to a combination of predetermined values, and a condition check block (Condition Block), which classifies predetermined metadata values of the log, and that pre-defines, as rules, the actions of the log that are combinations of the rule block and the condition check block. The logs stored in the log collection unit are retrieved by real-time distributed streaming.

In addition, in the behavior-based rule processing apparatus according to the present invention, the real-time distributed streaming includes a normalization process that performs a first normalization, which classifies the retrieved log according to its data content or delimiters, and a second normalization, which checks from the result of the first normalization whether the contents of the log correspond to the rule block.

In addition, in the behavior-based rule processing apparatus according to the present invention, when the contents of the log correspond to the rule block, the real-time distributed streaming combines the block corresponding to the rule block with the block corresponding to the condition check block to output the action of the log, and performs user mapping for the action (identifying the user who generated the action).

In addition, in the behavior-based rule processing apparatus according to the present invention, the real-time distributed streaming and the rule processing of the real-time rule processing unit are performed in memory.

In addition, in the behavior-based rule processing apparatus according to the present invention, the real-time distributed streaming performs the normalization process in a parallel, distributed manner, and the real-time rule processing unit processes the actions with multi-threading, per user or per action.

In addition, in the behavior-based rule processing apparatus according to the present invention, the real-time rule processing unit keeps a buffer of each user's actions when processing the rules.

In addition, in the behavior-based rule processing apparatus according to the present invention, the real-time rule processing unit checks whether the action of the log matches the contents of the pre-defined rule block and the metadata of the condition check block.

Meanwhile, to achieve the above objective, the behavior-based rule processing method according to the present invention is a method of processing logs so that rule processing is performed in real time by a real-time rule processing unit. The real-time rule processing unit includes a rule block, which classifies the contents of a log collected by a log collection unit and used for rule processing according to a combination of predetermined values, and a condition check block, which classifies predetermined metadata values of the log, and pre-defines, as rules, the actions of the log that are combinations of the rule block and the condition check block. The method includes: a first step (S100) in which real-time distributed streaming retrieves the logs collected by the log collection unit; a second step (S200) of normalizing the logs retrieved from the log collection unit through the normalization process of the real-time distributed streaming; a third step (S300) of mapping the logs normalized by the real-time distributed streaming to the rule block and performing user identification (user mapping) for the logs; and a fourth step (S400) in which the real-time rule processing unit performs rule processing on the mapped logs.

In addition, in the behavior-based rule processing method according to the present invention, the normalization process performs a first normalization, which classifies the retrieved log according to its data content or delimiters, and a second normalization, which checks from the result of the first normalization whether the contents of the log correspond to the rule block.

In addition, in the behavior-based rule processing method according to the present invention, when the contents of the log correspond to the rule block, the real-time distributed streaming combines the block corresponding to the rule block with the block corresponding to the condition check block to output the action of the log, and the real-time rule processing unit checks whether the action of the log matches the contents of the pre-defined rule block and the metadata of the condition check block.

According to the present invention, logs generated in large volumes are defined as actions and processed in a standardized form, the continuity of actions is defined as rules to detect abnormal access, and the processing is suitable for artificial intelligence learning and can be performed at high speed.

FIG. 1 is a system configuration diagram showing the overall configuration of the behavior-based rule processing apparatus according to the present invention.
FIG. 2 is a diagram showing a rule block and an action.
FIG. 3 is a diagram showing rule processing.
FIG. 4 is a flowchart showing the processing flow of the behavior-based rule processing method according to the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

For ease of explanation, the present invention is described using a Java language environment as an example, but it is not limited thereto and is applicable to any language that has an intermediate language, such as C#.

FIG. 1 is a system configuration diagram showing the overall configuration of the behavior-based rule processing apparatus according to the present invention.

Referring to FIG. 1, the behavior-based rule processing apparatus (1000) according to the present invention includes a log collection unit (100), a real-time rule processing unit (200), and real-time distributed streaming (300).

Here, the log collection unit (100) collects logs used for rule processing from a plurality of agents (10). It also stores the collected logs.

The real-time rule processing unit (200) includes a rule block (Rule Block), which classifies the contents of a log according to a combination of predetermined values, and a condition check block (Condition Block), which classifies predetermined metadata values of the log, and pre-defines, as rules, the actions of the log that are combinations of the rule block and the condition check block.

The real-time distributed streaming (300) retrieves the logs stored in the log collection unit (100) in real time.

This will be described in more detail with reference to FIGS. 2 and 3.

FIG. 2 is a diagram showing a rule block and an action, and FIG. 3 is a diagram showing rule processing.

The forms and types of logs vary widely depending on the characteristics of the collection target or the collection agent. Typical collection targets include database systems, web servers, network packets, operating systems, and application programs, and the logs left by these targets can be collected through the plurality of agents (10). Even if a collection target does not leave a log, the agents (10) may generate logs directly. For example, an agent (10) may be embedded in an application that leaves no log and generate the application logs needed for analysis, or it may monitor packets on a network that leaves no log and record them as log files.

The logs collected through the plurality of agents (10) are normalized and sent to the real-time rule processing unit (200). To keep the analysis real-time, streaming processing may be needed between the agents (10) and the real-time rule processing unit (200). Streaming processing enables real-time and distributed processing of large volumes of data (logs), and a streaming infrastructure can be built using the log collection unit (100) and the real-time distributed streaming (300).

For example, the log collection unit (100) is a distributed messaging system for large volumes of log data and acts as a buffer between the log agents and the streaming data processing. The real-time distributed streaming (300) fetches the logs stored in the log collection unit (100) in micro-batches and can normalize the data according to the business logic. The normalized data (logs) are passed to the real-time rule processing unit (200) and used as input to the business logic or the rule processing.
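The buffering and micro-batch retrieval described above can be sketched as follows. This is only an illustration: an in-process `deque` stands in for the distributed messaging system, and the function name `take_micro_batch` and the batch size are assumptions, not part of the disclosure.

```python
from collections import deque


def take_micro_batch(buffer, max_size):
    """Remove and return up to max_size logs from the front of the buffer."""
    batch = []
    while buffer and len(batch) < max_size:
        batch.append(buffer.popleft())
    return batch


# Hypothetical stand-in for the log collection unit (100): agents append raw
# logs, and the streaming side drains them in small batches.
log_buffer = deque(["log-1", "log-2", "log-3"])
first_batch = take_micro_batch(log_buffer, 2)
second_batch = take_micro_batch(log_buffer, 2)
print(first_batch, second_batch)
```

In a real deployment the buffer would be a distributed queue, so that several streaming workers can drain it in parallel.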

The contents of a log are usually strings recorded in a specific format. Taking the most common case, a web log, as an example: a web log is a file that records when, from where, and which pages were visited by users of a website running on a particular web server or web application server (WAS). Inside a web log, this information is recorded as strings in a fixed format.

The example web log in Table 1 contains the IP address of the client that made the request to the server, identity information, the user ID obtained through HTTP authentication, the time at which the server processed the request, the HTTP method and HTTP protocol used by the client, the requested URL, the status code the server sent to the client, and the size of the body the server sent to the client. The whole log string could be parsed during the analysis step, but usually the log is normalized beforehand according to the information needed for analysis, and the result is used in the analysis step. Normalization is generally done according to the data content or delimiters of the log. When the web log shown in Table 1 is normalized according to the delimiter (whitespace) or the meaning of the data, the result can be as shown in Table 2.
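As a rough illustration of this first normalization, the sketch below splits one web log line into lettered fields. Tables 1 and 2 are not reproduced in this text, so the sample line, the regular expression, and the assignment of the letters B, C, and I are assumptions; the letters A, D, E, F, G, and H follow the way the description uses them later.

```python
import re

# Assumed layout (Common Log Format): A=client IP, B=identity, C=user ID,
# D=request time, E=HTTP method, F=request URL, G=HTTP protocol,
# H=status code, I=body size.
WEB_LOG_PATTERN = re.compile(
    r'(?P<A>\S+) (?P<B>\S+) (?P<C>\S+) \[(?P<D>[^\]]+)\] '
    r'"(?P<E>\S+) (?P<F>\S+) (?P<G>[^"]+)" (?P<H>\d{3}) (?P<I>\S+)'
)


def first_normalize(line):
    """Split one raw web log line into named fields, or return None."""
    match = WEB_LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Made-up sample line for illustration only.
sample = ('192.167.0.1 - alice [30/Nov/2018:10:00:00 +0900] '
          '"GET /login.php HTTP/1.1" 200 2326')
fields = first_normalize(sample)
print(fields)
```

Lines that do not fit the expected format yield `None` here; how the apparatus treats such lines is not stated in the description.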

To make rules better suited to real-time analysis (rule processing) and AI learning using data normalized as in Table 2, a "rule block," the building block of a rule, is defined in advance, and every rule used by the rule processing engine is defined using rule blocks together with "condition check blocks" for additional condition checks. If each log is regarded as one action, a rule is ultimately a pattern of such actions. A log may of course signify several actions depending on the criteria, but a web log can be regarded as a user's web page access, and a database log as a user action such as a login or a data query. The rules used for analysis (rule processing) in the real-time rule processing unit (200) are combinations of these actions. Therefore, if all actions needed for rule generation are defined in advance and every log of the behavior-based rule processing apparatus (1000) is matched to one of the defined actions, rule processing reduces to a simple action comparison, and analysis becomes possible without storing the logs in a storage medium such as a database before the analysis (rule processing) step.

Even when the log format is fixed, distinguishing actions by every kind of data content would produce an enormous number of action types. Therefore, the types of actions used in rules must be defined so that actions can be classified, and this action type is the rule block, the building block of a rule mentioned above. Regarding a log as an action means that when the contents of the log have certain specific values, the log is treated as a pre-defined action. The rule block, as the action type, is the criterion for distinguishing actions.

In FIG. 2, the action type meaning "login attempt" is defined as a rule block with the ID "try_login". The web log values checked are E, the HTTP method of the client's request; F, the address of the accessed page; G, the HTTP protocol of the client's request; and H, the server's response code. When E="GET", F="/login.php", G="HTTP/1.1", and H="200", the log can be regarded as a "login attempt" action. The condition values used to identify an action may of course differ from user to user, because on another website the login page address may not be "/login.php", or there may be no login page at all.
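The "try_login" rule block of FIG. 2 can be pictured as an ID plus a set of field values to compare. The sketch below is one possible representation; the class name `RuleBlock` and the dictionary layout are assumptions, while the ID and the E, F, G, H values come from the description.

```python
from dataclasses import dataclass


@dataclass
class RuleBlock:
    """An action type: an ID and the field values that identify it."""
    block_id: str
    conditions: dict

    def matches(self, fields):
        # Every listed field must equal the configured value.
        return all(fields.get(key) == value
                   for key, value in self.conditions.items())


TRY_LOGIN = RuleBlock(
    block_id="try_login",
    conditions={"E": "GET", "F": "/login.php", "G": "HTTP/1.1", "H": "200"},
)

login_fields = {"A": "192.167.0.1", "E": "GET", "F": "/login.php",
                "G": "HTTP/1.1", "H": "200"}
other_fields = dict(login_fields, F="/index.php")
print(TRY_LOGIN.matches(login_fields), TRY_LOGIN.matches(other_fields))
```

Because the conditions are plain data, a site whose login page is not "/login.php" would only need a different `conditions` dictionary, not different code.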

In this way, the action type is determined by comparing only specific values in the log data, all values of the first normalization result are defined as metadata, and the rule block (the action type) together with the metadata is defined as an action. To match logs to actions (rule block + metadata), a second normalization is needed after the first normalization, which is based on the data content and delimiters. The second normalization checks whether the first normalization result of a log corresponds to one of the pre-defined action types (rule blocks). That is, in the behavior-based rule processing apparatus (1000) according to the present invention, the real-time distributed streaming (300) classifies the logs retrieved from the log collection unit (100) according to their data content or delimiters (first normalization). Next, it checks from the result of the first normalization whether the log corresponds to a rule block (second normalization). If it does, the ID of that rule block is combined with the first normalization result values to produce the action, which is the final normalization result.
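A minimal sketch of the second normalization, assuming the first normalization has already produced a dictionary of lettered fields. The function name, the `rule_block`/`meta` keys of the resulting action, and the first-match policy are assumptions made for illustration.

```python
# Registered rule blocks: ID -> field values that identify the action type.
RULE_BLOCKS = {
    "try_login": {"E": "GET", "F": "/login.php", "G": "HTTP/1.1", "H": "200"},
}


def second_normalize(fields, rule_blocks):
    """Return an action (rule block ID + metadata), or None if no block matches."""
    for block_id, conditions in rule_blocks.items():
        if all(fields.get(k) == v for k, v in conditions.items()):
            # All first-normalization values are carried along as metadata.
            return {"rule_block": block_id, "meta": dict(fields)}
    return None


normalized = {"A": "192.167.0.1", "D": "30/Nov/2018:10:00:00 +0900",
              "E": "GET", "F": "/login.php", "G": "HTTP/1.1", "H": "200"}
action = second_normalize(normalized, RULE_BLOCKS)
print(action)
```

The description does not say what happens to logs that match no rule block; returning `None` here is simply a placeholder for that case.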

In FIG. 2, if the E, F, G, and H values satisfy the condition values of the rule block "try_login", the log is classified as the action type (rule block) "try_login", meaning "login attempt". However, the action type alone is not enough to distinguish actions. The A value is needed to know who made the "login attempt", and the D value is needed to know when it was made. Values other than those checked by the rule block are checked in the condition check block mentioned above. Therefore, every rule in the rule processing step consists of a combination of rule blocks and condition check blocks. For example, the rule "the user with IP 192.167.0.1 attempted to log in" consists, as shown in FIG. 3, of the rule block with the ID "try_login" and a condition check block that compares whether the A value is "192.167.0.1". This means that the real-time rule processing unit (200) can automatically find, through log analysis, the case in which the user with IP 192.167.0.1 attempted to log in: the rule block check for "try_login" determines whether the action is a "login attempt", and the condition check block determines whether the user is "192.167.0.1". That is, in the behavior-based rule processing apparatus (1000) according to the present invention, the real-time rule processing unit (200) checks whether the action of the log matches the contents of the pre-defined rule block and the metadata of the condition check block.
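The rule of FIG. 3 can be sketched as a rule block ID plus condition checks on the metadata. The dictionary shapes and the function name `rule_matches` are assumptions made for this illustration; the rule itself ("try_login" with A equal to "192.167.0.1") is the one given in the description.

```python
# Rule: "the user with IP 192.167.0.1 attempted to log in".
LOGIN_FROM_IP_RULE = {
    "rule_block": "try_login",             # rule block check
    "conditions": {"A": "192.167.0.1"},    # condition check block on metadata
}


def rule_matches(rule, action):
    """Compare an action's rule block ID and metadata against a rule."""
    if action["rule_block"] != rule["rule_block"]:
        return False
    return all(action["meta"].get(k) == v
               for k, v in rule["conditions"].items())


hit = {"rule_block": "try_login", "meta": {"A": "192.167.0.1"}}
miss = {"rule_block": "try_login", "meta": {"A": "10.0.0.7"}}
print(rule_matches(LOGIN_FROM_IP_RULE, hit),
      rule_matches(LOGIN_FROM_IP_RULE, miss))
```

Both checks are plain equality comparisons on values already in memory, which is what lets rule processing avoid a database query.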

When rules are composed in this way and the normalization result is defined as the values needed by the rule blocks and condition check blocks that make up the rules, the log data arriving in real time through the real-time distributed streaming (300) can be rule-processed in memory, without being stored in a database, simply by checking the rule block ID and comparing the values of specific items in the metadata obtained from the first normalization. That is, in the behavior-based rule processing apparatus (1000) according to the present invention, the real-time distributed streaming (300) and the rule processing of the real-time rule processing unit (200) are performed in memory.

Thereafter, in the behavior-based rule processing apparatus (1000) according to the present invention, when the contents of the log correspond to a rule block, the real-time distributed streaming (300) combines the block corresponding to the rule block with the block corresponding to the condition check block to output the action of the log, and performs user mapping for the action (identifying the user who generated the action).

In some cases, consecutive action values (rule block ID value + metadata item values) must be checked, as in the rule "the user with IP 192.167.0.1 attempted to log in twice in a row". In the case of the rule "the user with IP 192.167.0.1 made consecutive login attempts within 10 seconds", however, the system must wait 10 seconds after the first log in which that user attempted to log in before it can decide whether the rule is satisfied, because if the user does not attempt to log in again within 10 seconds, the rule is not triggered. Since rule checking sometimes requires looking at the logs that occur after a particular log, a buffer of actions is kept internally for rule processing in such cases. That is, in the behavior-based rule processing apparatus (1000) according to the present invention, the real-time rule processing unit (200) keeps a buffer of each user's actions when processing the rules.
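One way to picture the per-user action buffer is a sliding time window, sketched below for the "consecutive login attempts within 10 seconds" rule. The class name, the use of numeric timestamps in seconds, and the choice to evaluate the rule when each new action arrives (rather than with a timer) are assumptions; the sketch also assumes that each user's actions arrive in time order.

```python
from collections import defaultdict, deque


class WindowRule:
    """Fires when a user produces `count` matching actions within a window."""

    def __init__(self, rule_block, count, window_seconds):
        self.rule_block = rule_block
        self.count = count
        self.window_seconds = window_seconds
        self.buffers = defaultdict(deque)  # user -> timestamps of buffered actions

    def feed(self, user, rule_block, timestamp):
        """Buffer one action; return True if the rule is now satisfied."""
        if rule_block != self.rule_block:
            return False
        buf = self.buffers[user]
        buf.append(timestamp)
        # Drop buffered actions that fell out of the time window.
        while buf and timestamp - buf[0] > self.window_seconds:
            buf.popleft()
        return len(buf) >= self.count


rule = WindowRule("try_login", count=2, window_seconds=10)
first = rule.feed("192.167.0.1", "try_login", 0.0)
second = rule.feed("192.167.0.1", "try_login", 5.0)
print(first, second)
```

Keeping one buffer per user means that one user's login attempts never count toward another user's rule.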

In addition, the real-time distributed streaming (300) can normalize logs in parallel. The real-time distributed streaming (300) performs the normalization, the resulting action values are input to the real-time rule processing unit (200) in parallel, and rule processing performance can be improved if the real-time rule processing unit (200) processes each action with multi-threading. Actions are usually grouped by user, and rules are also created on a per-user basis. If users can be identified in the real-time distributed streaming step or in the analysis (rule processing) step, rule processing can be parallelized per user. That is, in the behavior-based rule processing apparatus (1000) according to the present invention, the real-time distributed streaming (300) performs the normalization process in a parallel, distributed manner, and the real-time rule processing unit (200) processes the actions with multi-threading, per action or per user.
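The per-user parallelism can be sketched by grouping actions by user and handing each group to a worker thread. The function names, the use of the A field as the user key, and the trivial counting worker are assumptions; the sketch shows only the partitioning idea, not the throughput of a real rule engine.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def count_login_attempts(user_actions):
    """Hypothetical per-user rule work: count "try_login" actions."""
    return sum(1 for a in user_actions if a["rule_block"] == "try_login")


def process_by_user(actions, worker, max_workers=4):
    """Group actions by user (field A) and process each group in its own task."""
    groups = defaultdict(list)
    for action in actions:
        groups[action["meta"]["A"]].append(action)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {user: pool.submit(worker, group)
                   for user, group in groups.items()}
        return {user: future.result() for user, future in futures.items()}


actions = [
    {"rule_block": "try_login", "meta": {"A": "192.167.0.1"}},
    {"rule_block": "try_login", "meta": {"A": "10.0.0.7"}},
    {"rule_block": "try_login", "meta": {"A": "192.167.0.1"}},
]
results = process_by_user(actions, count_login_attempts)
print(results)
```

Since each task touches only one user's actions, the workers share no state and need no locking.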

Meanwhile, if analysis (rule processing) were performed after storing the normalized logs in a database, the rules in the rule processing engine would have to be defined as SQL queries. The input of an AI learning model would then be a query, and the output of learning would be a rule in the form of an SQL query, which is a kind of programming language. This is not well suited to an AI that learns rules and rule processing results in order to refine the rules. If rules are instead composed of simple blocks, rule patterns can be used as the input and output (result) of the AI learning model, which makes it easier to introduce AI later. That is, if learning produces a combination of rule blocks already registered in the rule engine, a refinement of an existing rule (an optimal combination of rule blocks) can be generated as the learning result.
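To illustrate why block-based rules lend themselves to learning, the following Python sketch represents a rule as a sequence of registered block IDs that can be converted to and from integer indices. The registry `RULE_BLOCKS`, its block IDs, and the functions `encode`, `decode`, and `matches` are hypothetical; the disclosure does not specify an encoding or a learning model.

```python
# Hypothetical registry of rule blocks already known to the rule engine.
RULE_BLOCKS = ["LOGIN_ATTEMPT", "LOGIN_FAILURE", "FILE_DOWNLOAD", "LOGOUT"]
BLOCK_INDEX = {block_id: i for i, block_id in enumerate(RULE_BLOCKS)}


def encode(rule):
    """Turn a rule (sequence of block IDs) into integer indices for a model."""
    return [BLOCK_INDEX[block_id] for block_id in rule]


def decode(indices):
    """Turn integer indices produced by a model back into a rule."""
    return tuple(RULE_BLOCKS[i] for i in indices)


def matches(rule, actions):
    """True if the rule's blocks occur as a contiguous run in the actions."""
    n = len(rule)
    return any(tuple(actions[i:i + n]) == tuple(rule)
               for i in range(len(actions) - n + 1))
```

Because a rule here is plain data drawn from a fixed vocabulary of blocks, any combination a model outputs can be decoded into a rule the engine already knows how to evaluate, which is not the case for free-form SQL text.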

Next, FIG. 4 is a flowchart showing the processing flow of the action-based rule processing method according to the present invention.

Referring to FIG. 4, the action-based rule processing method according to the present invention processes logs so that rule processing is performed in real time by the real-time rule processing unit 200. The real-time rule processing unit 200 includes a rule block, which classifies the contents of a log collected by the log collection unit 100 and used for rule processing according to a combination of predetermined values, and a condition check block, which classifies predetermined metadata values of the log. It pre-defines, as rules, actions of the log, each of which is a combination of a rule block and a condition check block. The method proceeds in four steps.

In the first step (S100), the real-time distributed streaming 300 loads the logs collected by the log collection unit 100.

In the second step (S200), the logs loaded from the log collection unit 100 are normalized by the normalization process of the real-time distributed streaming 300.

In the third step (S300), the logs normalized by the real-time distributed streaming 300 are mapped to rule blocks, and user identification (user mapping) is performed on the logs.

In the fourth step (S400), the mapped logs are rule-processed by the real-time rule processing unit 200.

Here, the normalization process performs a first normalization, which classifies the loaded logs according to data content or delimiter, and a second normalization, which checks from the result of the first normalization whether the type of the log corresponds to a rule block.
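The two normalization stages might look like the following Python sketch. The `|`-delimited log format, the field names, the mapping `RULE_BLOCK_IDS`, and both function names are assumptions chosen for illustration only.

```python
# Hypothetical mapping from log type to a registered rule block ID.
RULE_BLOCK_IDS = {"login": "LOGIN_ATTEMPT", "download": "FILE_DOWNLOAD"}


def first_normalization(raw_line, delimiter="|"):
    """Split a raw log line into fields by delimiter.

    Raises ValueError if the line has fewer than three fields.
    """
    ip, event, detail = (part.strip() for part in raw_line.split(delimiter, 2))
    return {"ip": ip, "event": event.lower(), "detail": detail}


def second_normalization(record):
    """Return the rule block ID for the log type, or None if no block matches."""
    return RULE_BLOCK_IDS.get(record["event"])
```

A log whose type has no corresponding rule block yields `None` in the second stage and can be skipped before rule processing.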

If the type of the log corresponds to a rule block, the real-time distributed streaming 300 combines the block corresponding to the rule block with the block corresponding to the condition check block and outputs the action of the log, and the real-time rule processing unit 200 checks whether the action of the log matches the content of the predefined rule block and the metadata of the condition check block.
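Taken together, steps S100 to S400 and the final check might be sketched as the small in-memory pipeline below. This is a simplified Python illustration under stated assumptions: the `ip, event` log format, the classes `Action` and `Rule`, and every function name are hypothetical, and a plain list stands in for the log collection unit.

```python
from dataclasses import dataclass


@dataclass
class Action:
    user: str
    rule_block_id: str
    metadata: dict


@dataclass
class Rule:
    rule_block_id: str
    conditions: dict  # metadata values required by the condition check block


def load(store):                     # S100: read logs from the collection unit
    yield from store


def normalize(raw):                  # S200: split the raw line into fields
    ip, event = (part.strip() for part in raw.split(","))
    return {"ip": ip, "event": event}


def map_action(record, blocks):      # S300: rule block mapping and user mapping
    block_id = blocks.get(record["event"])
    if block_id is None:
        return None  # the log type has no corresponding rule block
    return Action(user=record["ip"], rule_block_id=block_id,
                  metadata={"ip": record["ip"]})


def check(action, rule):             # S400: compare block content and metadata
    return (action.rule_block_id == rule.rule_block_id
            and all(action.metadata.get(k) == v
                    for k, v in rule.conditions.items()))


def run(store, blocks, rule):
    hits = []
    for raw in load(store):
        action = map_action(normalize(raw), blocks)
        if action is not None and check(action, rule):
            hits.append(action)
    return hits
```

For example, with `blocks = {"login": "LOGIN_ATTEMPT"}` and `Rule("LOGIN_ATTEMPT", {"ip": "192.167.0.1"})`, only login logs from that IP are returned by `run`.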

The above description merely illustrates the technical idea of the present invention, and those of ordinary skill in the art to which the present invention pertains will be able to make various modifications and variations without departing from its essential characteristics. Accordingly, the embodiments disclosed herein are intended to explain, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the present invention should be construed according to the claims below, and all technical ideas within their equivalent scope should be construed as falling within the scope of rights of the present invention.

10: plurality of agents
100: log collection unit
200: real-time rule processing unit
300: real-time distributed streaming
1000: action-based rule processing apparatus

Claims

An action-based rule processing apparatus comprising:
a log collection unit that collects logs used for rule processing from a plurality of agents and stores the collected logs; and
a real-time rule processing unit that includes a rule block for classifying contents of the log according to a combination of predetermined values and a condition check block for classifying predetermined metadata values of the log, and that pre-defines, as a rule, actions of the log, each being a combination of the rule block and the condition check block,
wherein the log stored in the log collection unit is loaded by real-time distributed streaming.

The apparatus of claim 1,
wherein the real-time distributed streaming performs a normalization process comprising:
a first normalization that classifies the loaded log according to data content or delimiter; and
a second normalization that checks, from the result of the first normalization, whether the content of the log corresponds to the rule block.

The apparatus of claim 2,
wherein, when the content of the log corresponds to the rule block,
the real-time distributed streaming combines a block corresponding to the rule block with a block corresponding to the condition check block to output the action of the log, and performs user mapping (identifying the user who caused the action).

The apparatus of claim 3,
wherein the real-time distributed streaming and the rule processing of the real-time rule processing unit are performed in memory.

The apparatus of claim 3,
wherein the real-time distributed streaming distributes the normalization process in parallel, and
the real-time rule processing unit multi-threads the processing of the actions per user or per action.

The apparatus of claim 3,
wherein the real-time rule processing unit buffers the actions of each user when processing a rule.

The apparatus of claim 3,
wherein the real-time rule processing unit checks whether the action of the log matches the content of the predefined rule block and the metadata of the condition check block.

An action-based rule processing method of processing a log so that rule processing is performed in real time by a real-time rule processing unit that includes a rule block for classifying contents of the log, which is collected by a log collection unit and used for rule processing, according to a combination of predetermined values, and a condition check block for classifying predetermined metadata values of the log, and that pre-defines, as a rule, actions of the log, each being a combination of the rule block and the condition check block, the method comprising:
a first step (S100) in which real-time distributed streaming loads the log collected by the log collection unit;
a second step (S200) of normalizing the log loaded from the log collection unit by a normalization process of the real-time distributed streaming;
a third step (S300) of mapping the log normalized by the real-time distributed streaming to the rule block and performing user identification (user mapping) of the log; and
a fourth step (S400) of rule-processing the mapped log by the real-time rule processing unit.

The method of claim 8,
wherein the normalization process performs:
a first normalization that classifies the loaded log according to data content or delimiter; and
a second normalization that checks, from the result of the first normalization, whether the content of the log corresponds to the rule block.

The method of claim 8,
wherein, when the content of the log corresponds to the rule block,
the real-time distributed streaming combines a block corresponding to the rule block with a block corresponding to the condition check block and outputs the action of the log, and
the real-time rule processing unit checks whether the action of the log matches the content of the predefined rule block and the metadata of the condition check block.