KR102088304B1

KR102088304B1 - Log Data Similar Pattern Matching and Risk Management Method Based on Graph Database

Info

Publication number: KR102088304B1
Application number: KR1020190043353A
Authority: KR
Inventors: 서인덕; 신윤섭
Original assignee: 주식회사 이글루시큐리티
Priority date: 2019-04-12
Filing date: 2019-04-12
Publication date: 2020-03-13

Abstract

The present invention relates to a graph database-based log data similar pattern matching and risk management method. More specifically, the method stores collected log data as graph data, determines subgraphs for frequent patterns, and compares their similarity with comparison reference graph data, so that data with a high threat potential can be found, screened, and managed, enabling efficient security control.

Description

Log Data Similar Pattern Matching and Risk Management Method Based on Graph Database

The present invention relates to a graph database-based log data similar pattern matching and risk management method. More specifically, it relates to a method that stores collected log data as graph data, determines subgraphs for frequent patterns, and compares their similarity with comparison reference graph data, thereby finding, screening, and managing data with a high threat potential and enabling efficient security control.

As the importance and volume of information held in information resources have grown, so has the importance of network security. Security equipment and security systems such as integrated security management systems, threat management systems, and firewalls are used to protect information resources. Persistent and constantly changing cyber intrusion attempts are increasing and generating a large number of security events, so an efficient response to changing attacks is needed.

Meanwhile, the number of users of digital devices has risen sharply, and advances in computing and IoT technology have produced a wide variety of data. Such data is highly relational, and managing it efficiently requires storing the relationships within the data as well. Storing these relationships requires a large amount of storage space, and the computation needed to process relationships between data items also grows. For this reason, highly relational data is expressed as graph data by connecting objects according to their relationships.

Graph data can show various patterns depending on the characteristics of the objects in the data. Among these, certain patterns recur, and such a pattern is called a frequent pattern. In security control, a frequent pattern can be significant because it may indicate that a threat such as hacking is being applied persistently to a particular information resource. There is therefore a need to screen and manage such frequent patterns separately.

Korean Registered Patent Publication No. 10-1764674 discloses a conventional method and apparatus for generating a graph database for infringement resources, which generates and stores a graph database of infringing activity. However, data stored as graph data in this way must be interpreted directly or compared with other graph data. Moreover, in today's big data environment the volume of data is enormous, so the graph data is inevitably large. It is very difficult for a user to compare graph data of such size with other graph data directly, or to run a separate program to do so.

In addition, the conventional invention focuses on continuously storing graph data and places no emphasis on the efficiency and speed of analysis operations using subgraphs or the like. It also discloses no way to find or search for frequent patterns, so it is not suitable for screening and managing them. Accordingly, the industry needs a graph database-based log data similar pattern matching and risk management method that can generate graph data for data related to intrusion threats and screen and manage frequent patterns, while performing its operations efficiently and quickly.

Korean Registered Patent Publication No. 10-1764674 (2017.07.28)

The present invention has been devised to solve the problems described above.

An object of the present invention is to provide a graph database-based log data similar pattern matching and risk management method that includes an inflow graph data generation step of storing collected log data as inflow graph data, the inflow graph data generation step including an individual element designation step of designating each element of the log data as a graph element and a labeling step of assigning a label to at least some of the stored graph elements. The collected log data is thus stored as graph data, subgraphs are determined for frequent patterns, and their similarity with comparison reference graph data is compared, so that data with a high threat potential is found, screened, and managed to enable efficient security control.

Another object of the present invention is to provide a graph database-based log data similar pattern matching and risk management method in which the individual element designation step includes a node generation step of analyzing the collected log data and generating a node when no node has a value corresponding to the value of the first element or the second element, a node designation step of designating the first element and the second element of the log data as a first node and a second node, and an edge designation step of designating the third element of the log data as an edge. By designating the components of the graph data according to the format of the data, data with a high threat potential is found, screened, and managed, enabling efficient security control.

Still another object of the present invention is to provide a graph database-based log data similar pattern matching and risk management method in which the individual element designation step further includes a weight designation step of assigning a weight according to the type of the edge, so that data with a high threat potential is found efficiently according to the threat type of the collected data.

Still another object of the present invention is to provide a graph database-based log data similar pattern matching and risk management method in which the first element is an attacker resource, the second element is a threat target resource, and the edge is a threat method, so that attackers and threat targets are classified from logs converted into a normalized format and data with a high threat potential is found efficiently.

Still another object of the present invention is to provide a graph database-based log data similar pattern matching and risk management method that further includes a subgraph determination step of setting subgraphs for comparing the inflow graph data with the comparison reference graph data, and an attribute comparison step of calculating similarity by pattern matching between the subgraphs, so that data with a high threat potential is found efficiently through similar pattern matching between subgraphs.

Still another object of the present invention is to provide a graph database-based log data similar pattern matching and risk management method in which the subgraph determination step includes a reference node setting step of setting a reference node that serves as the basis of the subgraph, a first edge search step of searching for and designating edges connected to the reference node, and a first connected node search step of searching for first connected nodes connected to the reference node by those edges, so that subgraphs are constructed systematically and data with a high threat potential is found efficiently.

Still another object of the present invention is to provide a graph database-based log data similar pattern matching and risk management method in which the first edge search step searches only for in-edges (incoming edges) among the edges connected to the reference node and designates them as edges of the subgraph, excluding in-edges whose weight is 0, thereby simplifying the similar pattern matching computation and enabling efficient screening of frequent patterns.

Still another object of the present invention is to provide a graph database-based log data similar pattern matching and risk management method in which the subgraph determination step further includes an edge cleanup step of deleting edges whose weight is 0 before the reference node setting step, a second edge search step of, after the first connected node search step, searching only for in-edges with a nonzero weight among the edges connected to the first connected nodes and designating them as edges of the subgraph, and a second connected node search step of designating nodes connected to the first connected nodes by in-edges as second connected nodes, thereby enabling efficient computation and a response to diversifying threats.

Still another object of the present invention is to provide a graph database-based log data similar pattern matching and risk management method in which the attribute comparison step includes a subgraph computation step of computing a similarity factor for each of the inflow subgraph and the comparison reference subgraph, and a similarity return step of comparing the similarity factors and returning the similarity of the subgraphs, the similarity factor being computed from the nodes and edges of each subgraph, so that subgraphs can be compared quantitatively.

Still another object of the present invention is to provide a graph database-based log data similar pattern matching and risk management method in which the similarity return step is based on the difference between the similarity factor of the inflow subgraph and that of the comparison reference subgraph and returns a higher similarity value the smaller the difference is, so that frequent patterns are screened efficiently.

Still another object of the present invention is to provide a graph database-based log data similar pattern matching and risk management method that further includes a risk calculation step of calculating the risk of a subgraph, so that the risk of incoming data is reported effectively.

Still another object of the present invention is to provide a graph database-based log data similar pattern matching and risk management method in which the risk calculation step calculates the risk from the label value of the reference node and the weights of the in-edges connected to the reference node, so that the reported risk varies with the threat target resource and the threat type.

Still another object of the present invention is to provide a graph database-based log data similar pattern matching and risk management method that further includes a screening management step of screening, storing, and managing frequent patterns, so that the screened frequent patterns are managed efficiently and the efficiency of the security control system is improved.

To achieve the objects described above, the present invention is implemented by embodiments having the following configurations.

According to an embodiment of the present invention, the method includes an inflow graph data generation step of storing collected log data as inflow graph data, and the inflow graph data generation step includes an individual element designation step of designating each element of the log data as a graph element and a labeling step of assigning a label to at least some of the stored graph elements.

According to another embodiment of the present invention, the individual element designation step includes a node generation step of analyzing the collected log data and generating a node when no node has a value corresponding to the value of the first element or the second element, a node designation step of designating the first element and the second element of the log data as a first node and a second node, and an edge designation step of designating the third element of the log data as an edge.

According to still another embodiment of the present invention, the individual element designation step further includes a weight designation step of assigning a weight according to the type of the edge.

According to still another embodiment of the present invention, the first element is an attacker resource, the second element is a threat target resource, and the edge is a threat method.
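
The graph construction described in the embodiments above can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `LogGraph`, the weight table `THREAT_WEIGHTS` and its values, the sample addresses, the numeric label, and the choice to give unknown threat methods a weight of 0 are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical weight table: threat method -> weight (illustrative values only).
THREAT_WEIGHTS = {"port_scan": 1, "sql_injection": 3, "heartbeat": 0}


@dataclass
class LogGraph:
    nodes: dict = field(default_factory=dict)  # node value -> label (None until labeled)
    edges: list = field(default_factory=list)  # (attacker, target, method, weight)

    def add_record(self, attacker, target, method):
        # Node generation step: create a node only when no node with this value exists.
        for value in (attacker, target):
            self.nodes.setdefault(value, None)
        # Edge and weight designation: the third element becomes a directed edge
        # from attacker to target, weighted by its type (assumed 0 if unknown).
        weight = THREAT_WEIGHTS.get(method, 0)
        self.edges.append((attacker, target, method, weight))

    def label(self, value, label):
        # Labeling step: attach a label to an existing node.
        if value not in self.nodes:
            raise KeyError(value)
        self.nodes[value] = label


graph = LogGraph()
graph.add_record("203.0.113.5", "10.0.0.8", "port_scan")
graph.add_record("203.0.113.5", "10.0.0.8", "sql_injection")
graph.label("10.0.0.8", 5)  # assumed numeric label for the threat target
```

Because nodes are keyed by value, repeated log records between the same attacker and target add edges without duplicating nodes, which is the behavior the node generation step describes.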

According to still another embodiment of the present invention, the method further includes a subgraph determination step of setting subgraphs for comparing the inflow graph data with the comparison reference graph data, and an attribute comparison step of calculating similarity by pattern matching between the subgraphs.

According to still another embodiment of the present invention, the subgraph determination step includes a reference node setting step of setting a reference node that serves as the basis of the subgraph, a first edge search step of searching for and designating edges connected to the reference node, and a first connected node search step of searching for first connected nodes connected to the reference node by those edges.

According to still another embodiment of the present invention, the first edge search step searches only for in-edges (incoming edges) among the edges connected to the reference node and designates them as edges of the subgraph, excluding in-edges whose weight is 0.

According to still another embodiment of the present invention, the subgraph determination step further includes an edge cleanup step of deleting edges whose weight is 0 before the reference node setting step, a second edge search step of, after the first connected node search step, searching only for in-edges with a nonzero weight among the edges connected to the first connected nodes and designating them as edges of the subgraph, and a second connected node search step of designating nodes connected to the first connected nodes by in-edges as second connected nodes.
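
The subgraph determination steps above amount to a two-hop walk along incoming edges from the reference node. The following Python sketch shows one way this could look. The function name `determine_subgraph`, the edge tuple layout `(source, target, method, weight)`, and the sample data are assumptions for illustration, not details taken from the patent.

```python
def determine_subgraph(edges, reference):
    """Return (nodes, edges) of the subgraph around `reference`.

    `edges` is an iterable of (source, target, method, weight) tuples.
    """
    # Edge cleanup step: drop edges whose weight is 0 before anything else.
    live = [e for e in edges if e[3] != 0]
    # First edge search step: keep only in-edges of the reference node.
    first_edges = [e for e in live if e[1] == reference]
    # First connected node search step: sources of those in-edges.
    first_nodes = {e[0] for e in first_edges}
    # Second edge search step: nonzero in-edges of the first connected nodes.
    second_edges = [e for e in live if e[1] in first_nodes]
    # Second connected node search step: sources of the second-level in-edges.
    second_nodes = {e[0] for e in second_edges}
    nodes = {reference} | first_nodes | second_nodes
    # Remove duplicates (e.g. self-loops) while preserving discovery order.
    sub_edges = list(dict.fromkeys(first_edges + second_edges))
    return nodes, sub_edges


sample_edges = [
    ("a", "t", "scan", 2),
    ("b", "t", "ping", 0),      # removed by the edge cleanup step
    ("c", "a", "exploit", 3),
    ("d", "c", "probe", 1),     # three hops away, so outside the subgraph
]
sub_nodes, sub_edges = determine_subgraph(sample_edges, "t")
```

Restricting the walk to nonzero in-edges keeps the subgraph small, which is the source of the computational saving that the description attributes to these steps.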

According to still another embodiment of the present invention, the attribute comparison step includes a subgraph computation step of computing a similarity factor for each of the inflow subgraph and the comparison reference subgraph, and a similarity return step of comparing the similarity factors and returning the similarity of the subgraphs, the similarity factor being computed from the nodes and edges of each subgraph.

According to still another embodiment of the present invention, the similarity return step is based on the difference between the similarity factor of the inflow subgraph and that of the comparison reference subgraph, and returns a higher similarity value the smaller the difference is.
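
The text above states only that the similarity factor is computed from the nodes and edges of each subgraph and that a smaller difference yields a higher similarity. The Python sketch below fills in both with assumed formulas: the factor is taken to be the node count plus the edge count, and the similarity is taken to be `1 / (1 + difference)`. Neither formula comes from the patent; they are placeholders that satisfy the stated properties.

```python
def similarity_factor(nodes, edges):
    # Assumed factor: number of nodes plus number of edges in the subgraph.
    return len(nodes) + len(edges)


def similarity(inflow, reference):
    """Compare two subgraphs, each given as a (nodes, edges) pair."""
    diff = abs(similarity_factor(*inflow) - similarity_factor(*reference))
    # Assumed mapping: identical factors give 1.0, larger gaps approach 0.
    return 1.0 / (1.0 + diff)


inflow_sub = ({"t", "a", "c"}, [("a", "t"), ("c", "a")])
reference_sub = ({"t", "a"}, [("a", "t")])
same_score = similarity(inflow_sub, inflow_sub)
close_score = similarity(inflow_sub, reference_sub)
```

A count-based factor ignores which nodes and edges are present, so two structurally different subgraphs of the same size would score as identical; a real factor would presumably also reflect labels or weights.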

According to still another embodiment of the present invention, the method further includes a risk calculation step of calculating the risk of a subgraph.

According to still another embodiment of the present invention, the risk calculation step calculates the risk from the label value of the reference node and the weights of the in-edges connected to the reference node.
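
As a rough illustration of the risk calculation step, the Python sketch below combines the two inputs named above. The combining rule (label value multiplied by the sum of in-edge weights) and the function name `subgraph_risk` are assumptions; the text here says only that both inputs are used.

```python
def subgraph_risk(label_value, in_edge_weights):
    # Assumed rule: scale the total weight of the threats reaching the
    # reference node by that node's label value.
    return label_value * sum(in_edge_weights)


# Hypothetical example: a target labeled 5, reached by in-edges weighted 1 and 3.
example_risk = subgraph_risk(5, [1, 3])
```

Under this rule the same set of threat methods produces a higher risk against a target with a higher label value, which matches the stated aim of varying the reported risk with the threat target resource and the threat type.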

According to still another embodiment of the present invention, the method further includes a screening management step of screening, storing, and managing frequent patterns.

Through the embodiments described above and the configurations, combinations, and use relationships described below, the present invention can obtain the following effects.

The present invention includes an inflow graph data generation step of storing collected log data as inflow graph data, the inflow graph data generation step including an individual element designation step of designating each element of the log data as a graph element and a labeling step of assigning a label to at least some of the stored graph elements. The collected log data is thus stored as graph data, subgraphs are determined for frequent patterns, and their similarity with comparison reference graph data is compared, which has the effect of providing a graph database-based log data similar pattern matching and risk management method that finds, screens, and manages data with a high threat potential and enables efficient security control.

In the present invention, the individual element designation step includes a node generation step of analyzing the collected log data and generating a node when no node has a value corresponding to the value of the first element or the second element, a node designation step of designating the first element and the second element of the log data as a first node and a second node, and an edge designation step of designating the third element of the log data as an edge. Designating the components of the graph data according to the format of the data has the effect of providing a graph database-based log data similar pattern matching and risk management method that finds, screens, and manages data with a high threat potential and enables efficient security control.

In the present invention, the individual element designation step further includes a weight designation step of assigning a weight according to the type of the edge, which has the effect of providing a graph database-based log data similar pattern matching and risk management method that efficiently finds data with a high threat potential according to the threat type of the collected data.

In the present invention, the first element is an attacker resource, the second element is a threat target resource, and the edge is a threat method, which has the effect of providing a graph database-based log data similar pattern matching and risk management method that classifies attackers and threat targets from logs converted into a normalized format and efficiently finds data with a high threat potential.

The present invention further includes a subgraph determination step of setting subgraphs for comparing the inflow graph data with the comparison reference graph data, and an attribute comparison step of calculating similarity by pattern matching between the subgraphs, which has the effect of providing a graph database-based log data similar pattern matching and risk management method that efficiently finds data with a high threat potential through similar pattern matching between subgraphs.

In the present invention, the subgraph determination step includes a reference node setting step of setting a reference node that serves as the basis of the subgraph, a first edge search step of searching for and designating edges connected to the reference node, and a first connected node search step of searching for first connected nodes connected to the reference node by those edges, which provides a graph database-based log data similar pattern matching and risk management method that constructs subgraphs systematically and efficiently finds data with a high threat potential.

In the present invention, the first edge search step searches only for in-edges (incoming edges) among the edges connected to the reference node and designates them as edges of the subgraph, excluding in-edges whose weight is 0, which has the effect of providing a graph database-based log data similar pattern matching and risk management method that simplifies the similar pattern matching computation and enables efficient screening of frequent patterns.

In the present invention, the subgraph determination step further includes an edge cleanup step of deleting edges whose weight is 0 before the reference node setting step, a second edge search step of, after the first connected node search step, searching only for in-edges with a nonzero weight among the edges connected to the first connected nodes and designating them as edges of the subgraph, and a second connected node search step of designating nodes connected to the first connected nodes by in-edges as second connected nodes, which has the effect of providing a graph database-based log data similar pattern matching and risk management method that computes efficiently and can respond to diversifying threats.

In the present invention, the attribute comparison step includes a subgraph computation step of computing a similarity factor for each of the inflow subgraph and the comparison reference subgraph, and a similarity return step of comparing the similarity factors and returning the similarity of the subgraphs, the similarity factor being computed from the nodes and edges of each subgraph, which has the effect of providing a graph database-based log data similar pattern matching and risk management method that allows subgraphs to be compared quantitatively.

In the present invention, the similarity return step is based on the difference between the similarity factor of the inflow subgraph and that of the comparison reference subgraph and returns a higher similarity value the smaller the difference is, which has the effect of providing a graph database-based log data similar pattern matching and risk management method that screens frequent patterns efficiently.

The present invention further includes a risk calculation step of calculating the risk of a subgraph, and can thereby provide a graph database-based log data similar pattern matching and risk management method that effectively reports the risk of incoming data.

In the present invention, the risk calculation step calculates the risk from the label value of the reference node and the weights of the in-edges connected to the reference node, which provides a graph database-based log data similar pattern matching and risk management method in which the reported risk varies with the threat target resource and the threat type.

The present invention further includes a screening management step of screening, storing, and managing frequent patterns, which has the effect of providing a graph database-based log data similar pattern matching and risk management method that manages the screened frequent patterns efficiently and improves the efficiency of the security control system.

FIG. 1 is a flowchart of a graph database-based log data similar pattern matching and risk management method according to an embodiment of the present invention.
FIG. 2 is a block diagram of a system that executes the graph database-based log data similar pattern matching and risk management method according to an embodiment of the present invention.
FIG. 3 is a flowchart of the incoming graph data generation step of FIG. 1.
FIG. 4 is a diagram showing an embodiment of the individual element designation step and the labeling step of FIG. 3.
FIG. 5 is a flowchart of the individual element designation step of FIG. 3.
FIG. 6 is a flowchart of the subgraph determination step of FIG. 1.
FIG. 7 is a diagram showing an embodiment of the subgraph determination step of FIG. 6.
FIG. 8 is a flowchart of the attribute comparison step of FIG. 1.
FIG. 9 is a diagram showing an embodiment of the attribute comparison step of FIG. 8.
FIG. 10 is a flowchart of the screening management step of FIG. 1.

Hereinafter, preferred embodiments of the graph database-based log data similar pattern matching and risk management method according to the present invention are described in detail with reference to the accompanying drawings. In describing the present invention, detailed descriptions of well-known functions or configurations are omitted where they would unnecessarily obscure the gist of the invention. Unless otherwise defined, every term in this specification has the general meaning understood by a person of ordinary skill in the art to which the invention belongs; where that general meaning conflicts with the meaning used in this specification, the definition used in this specification prevails.

Throughout the specification, when a part is said to "include" a component, this means that, unless specifically stated otherwise, other components are not excluded and may additionally be included. The present invention is described in detail below through its preferred embodiments, with reference to the accompanying drawings.

In this specification, a threat means a malicious act carried out against an information resource. Because threats come in various types, the threat type and the threat potential associated with each type also vary. An attacker resource is information about the assets used to carry out a malicious act, and may include an IP address, a domain, or malicious code.

In this specification, graph data refers to a database grounded in graph theory, comprising nodes, edges, and labels. A node is an object to be tracked; here, information resources such as attacker resources and threat target resources are plotted as nodes, so a node carries a value that identifies an asset, such as an IP address or a domain. An edge represents a relationship between nodes. When data is transmitted from one node to another, the edge may have a direction. An edge may also have a weight; in this specification, the weight is assigned according to the risk that the threat type entails. A label relates to the information of a node, and assigning the same label to several nodes classifies them into one set. Here, a label may be assigned according to the importance of the asset, or according to the threat level of the incoming data.

FIG. 1 shows a graph database-based log data similar pattern matching and risk management method according to an embodiment of the present invention. Referring to FIG. 1, the method (S1) designates collected log data as graph data, determines subgraphs for frequent patterns, and compares their similarity with comparison reference graph data. In this way it finds, screens, and manages data with a high threat potential and enables efficient security control. The method may include an incoming graph data generation step (S10).

FIG. 2 is a block diagram of a system (1) that carries out the graph database-based log data similar pattern matching and risk management method according to an embodiment of the present invention. In the system, a graph data generation unit (10) may collect and normalize log data and generate incoming graph data.

A graph data analysis unit (30) may generate subgraphs for the incoming graph data and the comparison reference graph data, determine their similarity with a pattern matching technique, and calculate the risk.

FIG. 3 is a flowchart of the incoming graph data generation step (S10) according to an embodiment of the present invention. Referring to FIG. 3, the incoming graph data generation step (S10) converts collected log data into incoming graph data and designates it as such. Because the invention aims to find frequent patterns by converting collected log data into graph data and calculating its similarity to comparison reference graph data, the graph data is generated through a defined sequence of steps. The incoming graph data generation step (S10) may include a preprocessing step (S11), an individual element designation step (S13), and a labeling step (S15).

The preprocessing step (S11) collects logs delivered from the components of the network, converts them into a normalized format, and processes them. In one embodiment of the present invention, logs generated at endpoints and logs generated by security control systems such as SIEM or IPS may be collected. A collected log may include at least one of source IP information, destination IP information, source port information, destination port information, host information, payload information, HTTP referer information, and the number of security events. A normalized log may consist of data with fields such as Time, Type, SentIP, DestIP, and Payload, where Time is the detection time of the event, Type is the type of the event, SentIP is the source IP, DestIP is the destination IP of the event, and Payload is the attack information contained in the log. The information and fields listed for the log and the normalized log are examples; other information or fields needed for security control may also be included.

Collected data may lack some of this information, may contain it more than once, or may have field values of differing types. Accordingly, the preprocessing step (S11) may refine the data while converting it to the normalized format, for example by filling in missing values or removing duplicate values.
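The preprocessing described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the field names `Time`, `Type`, `SentIP`, `DestIP`, and `Payload` come from the text, while the raw key aliases, the default values, and the choice to drop records that lack a source or destination IP are assumptions made for the example.

```python
from typing import Any

# Normalized field names as listed in the text.
FIELDS = ("Time", "Type", "SentIP", "DestIP", "Payload")
# Hypothetical raw-key aliases and fill-in defaults (assumptions).
ALIASES = {"timestamp": "Time", "event_type": "Type",
           "src_ip": "SentIP", "dst_ip": "DestIP", "payload": "Payload"}
DEFAULTS = {"Type": "unknown", "Payload": ""}


def normalize(raw_logs: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Map raw records to the normalized fields, fill gaps, drop duplicates."""
    seen: set[tuple[str, ...]] = set()
    result: list[dict[str, str]] = []
    for raw in raw_logs:
        renamed = {ALIASES.get(key, key): value for key, value in raw.items()}
        record = {f: str(renamed.get(f, DEFAULTS.get(f, ""))) for f in FIELDS}
        # Assumption: a record without both endpoints cannot become an edge.
        if not record["SentIP"] or not record["DestIP"]:
            continue
        key = tuple(record[f] for f in FIELDS)
        if key in seen:  # remove exact duplicates
            continue
        seen.add(key)
        result.append(record)
    return result
```

A real deployment would likely need per-source parsers and type coercion for each field; the sketch only shows the fill-and-deduplicate idea.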

FIG. 5 is a flowchart of the individual element designation step (S13) according to an embodiment of the present invention. Referring to FIG. 5, the individual element designation step (S13) converts the preprocessed log data into incoming graph data. The normalized log data contains the information needed for conversion into the elements of graph data (nodes, edges, labels, and so on), and in this step the log data is matched to those elements and stored as elements of the graph data. The individual element designation step (S13) may include a node creation step (S131), a node designation step (S133), an edge designation step (S135), and a weight designation step (S137).

The node creation step (S131) analyzes the collected log data and, if no node exists whose value corresponds to the value of the first element or the second element, creates a node with that value. Referring to FIG. 4, the normalized log may contain various pieces of information, as described above. The node creation step (S131) reads the value of the first element from the log and checks whether a node with that value already exists. If such a node exists, no new node is created; if not, a new node is created and given the value of the first element of the log data. The same procedure is repeated for the second element of the log data.

In one embodiment of the present invention, the first element may be an attacker resource, that is, a resource that carries out a malicious act against an information resource. As described above, an attacker resource may include an IP address, a domain, or malicious code; preferably, the IP address is used as the first element. The preprocessed log may contain the source IP of the data under the field name SentIP.

In one embodiment of the present invention, the second element may be a threat target resource, that is, the resource targeted when an attacker resource carries out a malicious act. The threat target resource may be an IP address, a domain, or a system, and is not limited to a specific concept. Preferably, the IP address of the target of the malicious act is used as the second element. The preprocessed log may contain the destination IP of the data under the field name DestIP.

The node designation step (S133) designates the first and second elements of the log data as the first node and the second node of the incoming graph data. Among the nodes created in the node creation step (S131) or already present, the node whose value corresponds to the source IP is preferably designated as the first node, and the node whose value corresponds to the destination IP as the second node. The attacker resource and the threat target resource of the data are thereby plotted as the first node and the second node in the graph data.

The edge designation step (S135) designates the third element of the log data as an edge. The log data records the delivery of data from its source to its destination, so this delivery is expressed as an edge connecting the first node to the second node. The edge may be created differently depending on the type of the data; for example, the connection may represent communication, an attack, data dissemination, or a threat. These kinds of relationship are examples only, and an edge may represent various other relationships.

The weight designation step (S137) may assign a weight to an edge according to the relationship the edge represents. The log data may contain the content of a threat directed from the attacker resource to the threat target resource, in which case the threat content and its type may be included in the edge. An edge may also arise from a data transfer that carries no threat, in which case no threat content or type is included in the edge. In the weight designation step (S137), the threat level of the threat content and/or type included in the edge is derived from a stored DB or from a risk calculation by threat type, and is assigned as the weight of the edge. In one embodiment of the present invention, the data transferred from the first element to the second element is analyzed and a weight of type int is assigned according to the threat type; if the transferred data contains no threat, the weight may be 0. In another embodiment, a real number of type double may be assigned as the weight instead of an int.

The process of the individual element designation step (S13) is described with reference to FIGS. 4 and 5. The normalized log is analyzed to extract the first to third elements, and the graph is searched for nodes having the values A and B of the first and second elements. If no node with the value A or B exists, a new node is created, and the nodes with the values A and B are designated as graph elements. An edge from A to B is then created, and the third element is analyzed to indicate the relationship between the nodes. In the embodiment shown in FIG. 4, the connection relationship of C and D may be indicated. Once the relationship between the nodes, that is, the transmitted data, has been analyzed and a weight assigned, the incoming graph data is complete.
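The flow of steps S131 to S137 can be sketched with a small in-memory structure. This is an illustrative sketch only: the `Graph` class, the `THREAT_WEIGHTS` table and its values, and the use of the `Type` field as the third element are assumptions made for the example; the text itself only requires that nodes be created on demand, connected by a directed edge, and weighted by threat type, with 0 for non-threat traffic.

```python
from dataclasses import dataclass, field

# Hypothetical weight table: the text only says that a stored DB or a risk
# calculation maps threat types to weights and that non-threat traffic gets 0.
THREAT_WEIGHTS = {"malware": 5, "scan": 2}


@dataclass
class Graph:
    nodes: dict[str, dict] = field(default_factory=dict)  # node value -> attributes
    edges: list[dict] = field(default_factory=list)

    def get_or_create(self, value: str) -> str:
        # S131: create a node only when no node with this value exists.
        self.nodes.setdefault(value, {"label": None})
        return value

    def add_log(self, log: dict) -> None:
        first = self.get_or_create(log["SentIP"])    # S133: first node (attacker resource)
        second = self.get_or_create(log["DestIP"])   # S133: second node (threat target)
        relation = log.get("Type", "communication")  # S135: third element (assumed field)
        weight = THREAT_WEIGHTS.get(relation, 0)     # S137: 0 when no threat is present
        self.edges.append(
            {"src": first, "dst": second, "relation": relation, "weight": weight}
        )
```

Keying nodes by their value makes the existence check of S131 a dictionary lookup; a graph database would typically provide an equivalent "match or create" operation.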

The labeling step (S15) assigns labels to at least some of the designated graph elements according to their characteristics. Preferably, a label value is assigned to the nodes of the incoming graph data that correspond to the second node. In another embodiment of the present invention, labels may be assigned not only to nodes corresponding to the second node but also to nodes corresponding to the first node. Labels may be assigned by any method that is known or becomes known, but are preferably assigned according to the importance of the threat target resource, and the label value is preferably defined over the real numbers. In the embodiment shown in FIG. 4, the label value Y is assigned to the threat target resource corresponding to the second element, and the label value X to the attacker resource corresponding to the first element.

Once the incoming graph data is complete, the graph data analysis unit (30) finds frequent patterns through similar pattern matching between the incoming graph data and the stored comparison reference graph data. A frequent pattern refers to similar or identical patterns, among the various patterns present in the graph data, that are generated or can be generated repeatedly. Such patterns can represent significant data transfers. Because significant data may include external threats to information resources, frequent patterns need to be found and managed. The process of finding frequent patterns may include a subgraph determination step (S20) and an attribute comparison step (S30).

The subgraph determination step (S20) sets the subgraphs used to compare the incoming graph data with the comparison reference graph data. A subgraph is a graph made up of part of the nodes and edges of a graph; a subgraph H of a graph G satisfies Equation 1 below. The incoming subgraph and the comparison reference subgraph are determined accordingly.

Equation 1: $V(H) \subseteq V(G), \quad E(H) \subseteq E(G)$

Here, V(H) denotes the nodes of subgraph H, V(G) the nodes of graph G, E(H) the edges of subgraph H, and E(G) the edges of graph G.
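Assuming Equation 1 is the usual subgraph condition, that V(H) is a subset of V(G) and E(H) is a subset of E(G), the check can be written directly with set operations. The function name and the edge representation as `(source, destination)` pairs are choices made for this sketch.

```python
Edge = tuple[str, str]  # (source node, destination node)


def is_subgraph(h_nodes: set[str], h_edges: set[Edge],
                g_nodes: set[str], g_edges: set[Edge]) -> bool:
    """Return True if H = (h_nodes, h_edges) is a subgraph of G."""
    nodes_ok = h_nodes <= g_nodes  # V(H) is a subset of V(G)
    edges_ok = h_edges <= g_edges  # E(H) is a subset of E(G)
    # Extra well-formedness check (not part of the stated condition):
    # every edge of H must start and end at a node of H.
    closed = all(src in h_nodes and dst in h_nodes for src, dst in h_edges)
    return nodes_ok and edges_ok and closed
```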

The incoming graph data may have many nodes and edges, so the number of subgraphs that can be generated from it is also large. Frequent patterns could be searched by designating every subgraph one by one and applying similar pattern matching, but the amount and time of computation would then grow exponentially. Designating subgraphs according to a fixed criterion reduces both.

FIG. 6 is a flowchart of the subgraph determination step (S20) according to an embodiment of the present invention. Referring to FIG. 6, the subgraph determination step (S20) may include a reference node setting step (S21), a first edge search step (S23), and a first connected node search step (S25), and may further include a second edge search step (S27) and a second connected node search step (S29). FIG. 7 illustrates incoming graph data constructed according to an embodiment of the present invention. The following description refers to FIG. 7.

The reference node setting step (S21) sets the reference node on which a subgraph is based. Setting a reference node removes the need to compare subgraphs centered on every node: only subgraphs that meet the condition are determined and compared, which reduces computation time and volume. The reference node may be set by various methods and criteria, but in a preferred embodiment of the present invention the reference node setting step (S21) sets, among a plurality of nodes assigned the same label value, the node with the largest number of in-edges as the reference node. An in-edge is an edge, among the directed edges that represent data flow, that points into the node in question; in the present invention, an in-edge is defined for a node when that node is the second element, that is, the destination of the data.

The graph data shown in FIG. 7 is part of the incoming graph data according to an embodiment of the present invention. As the figure shows, the nodes are N1, N2, and V1 to V4, and N1 and N2 are both assigned the label 3. The edges are E1 to E5. The number marked on each edge is the weight according to the threat type.

To set the reference node in the incoming graph data of FIG. 7, it must be decided which of N1 and N2, which share the label 3, becomes the reference node. Here N1, which has more in-edges, becomes the reference node. A larger number of in-edges means that more data flows into the resource, so its relative importance may be greater than that of the resources corresponding to other nodes.

In the reference node setting step (S21) according to another embodiment of the present invention, the node with the largest sum of in-edge weights, rather than the largest number of in-edges, may be set as the reference node. In this case, the in-edge weights of N1 sum to 7 and those of N2 sum to 5, so N1 is set as the reference node. A high sum of in-edge weights means that the resource receives many high-threat attacks, which implies that the resource is more important than other resources.

In the reference node setting step (S21) according to yet another embodiment of the present invention, the node with the largest number of in-edges among a plurality of nodes assigned the same label value is set as the reference node, but in-edges with a weight of 0 may be excluded from the count. An edge with a weight of 0 means that the data transfer from the first element to the second element is not a threat. Because the purpose of the invention is to find and manage, as frequent patterns, transfers of log data that carry a threat type or threat content, edges carrying non-threat data may be excluded from the criterion for setting the reference node.
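The three selection criteria for the reference node (in-edge count, sum of in-edge weights, and count of non-zero-weight in-edges) can be compared in a short sketch. The graph below is loosely modeled on the FIG. 7 example, but the edge endpoints and the individual weights are assumptions, chosen only so that the in-edge weight sums match the text (7 for N1, 5 for N2) and E5 has weight 0. All function names are hypothetical.

```python
# Assumed stand-in for FIG. 7: name -> (source, destination, weight).
LABELS = {"N1": 3, "N2": 3}
EDGES = {
    "E1": ("V1", "N1", 3),
    "E2": ("V2", "N1", 4),
    "E3": ("V3", "V1", 1),
    "E4": ("V4", "N2", 5),
    "E5": ("V4", "N1", 0),
}


def in_edge_weights(node: str) -> list[int]:
    """Weights of all edges pointing into the given node."""
    return [w for _, dst, w in EDGES.values() if dst == node]


def by_count(node: str) -> int:
    return len(in_edge_weights(node))


def by_weight_sum(node: str) -> int:
    return sum(in_edge_weights(node))


def by_nonzero_count(node: str) -> int:
    return sum(1 for w in in_edge_weights(node) if w != 0)


def pick_reference(label: int, score) -> str:
    """Among nodes sharing a label, return the one with the highest score."""
    candidates = [n for n, node_label in LABELS.items() if node_label == label]
    return max(candidates, key=score)
```

With this assumed graph, `pick_reference(3, by_count)`, `pick_reference(3, by_weight_sum)`, and `pick_reference(3, by_nonzero_count)` all select N1, which agrees with the outcome described in the text for each embodiment.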

The subgraph determination step (S20) according to another embodiment of the present invention may include an edge pruning step (S22) before or after the reference node setting step (S21). The edge pruning step (S22) deletes edges with a weight of 0, which reduces computation time and volume. When data containing no threat type or content is transferred, the corresponding edge may be assigned a weight of 0. Deleting such edges removes wasted computation in the reference node setting step (S21) during the frequent pattern search and allows subgraphs to be determined efficiently from the incoming graph data.

Referring to FIG. 7, the edge pruning step (S22) is preferably performed before the reference node setting step (S21). Because it deletes edges with a weight of 0, it removes E5, whose weight is 0, from the incoming graph data. In that case, when the reference node is set, the in-edges of N1 are counted as two rather than three.

The first edge search step (S23) searches for edges connected to the reference node and designates them as elements of the subgraph. Because a subgraph consists of nodes and edges, this step designates the components of the subgraph used for similar pattern matching. A subgraph consisting only of nodes, with no edges designated, can exist, but analyzing it is meaningless because it carries no information about data transfer. In the first edge search step (S23) according to an embodiment of the present invention, any edge connected to the reference node may be designated as an edge of the subgraph, but preferably only the in-edges among them are searched and designated. An ordinary information resource is unlikely to transmit threat data unless there are special circumstances, so out-edges may be excluded from the edges of the subgraph for computational efficiency.

In the first edge search step (S23) according to another embodiment of the present invention, out-edges connected to the reference node may also be designated as edges of the subgraph. In yet another embodiment of the first edge search step (S23), only those out-edges of the reference node whose weight is not 0 may be designated as edges of the subgraph. This covers the case in which the information resource corresponding to the reference node has become a so-called zombie PC and transmits threat data to other information resources. Because this too can threaten other information resources, it needs to be screened and managed as a frequent pattern.

Referring to FIG. 7, the in-edges of the reference node N1 are E1, E2, and E5. All of these in-edges may be designated as edges of the subgraph, but depending on the embodiment, E5 may be removed in the edge pruning step (S22) or may not be designated as an edge of the subgraph.

The first connected node search step (S25) searches for the nodes connected to the reference node by the edges designated in the first edge search step (S23) and designates them as first connected nodes. A subgraph consists of nodes and edges, and an edge connects one node to another, so designating the first connected nodes as nodes of the subgraph builds up the subgraph. In a preferred embodiment of the present invention, the nodes designated as first connected nodes in this step are the attacker resources that transmitted threat data directly to the information resource serving as the reference node. Referring to FIG. 7, the nodes V1 and V2, which transmit threat data to the reference node N1, become the first connected nodes.

As described above, the subgraph determination step (S20) may further include a second edge search step (S27) and a second connection node search step (S29). These steps (S27 and S29) may be performed to find attacker resources that transmitted threat data to the threat target resource indirectly rather than directly. For example, in a DDoS attack a remote PC infects a number of zombie PCs and the infected zombie PCs attack the threat target resource. The second edge search step (S27) and the second connection node search step (S29) may be performed so that such patterns are also managed as frequent patterns.

The second edge search step (S27) searches the edges connected to the first connection nodes. Preferably it searches only in-edges, and among those only in-edges whose weight is not 0, and designates them as second edges of the subgraph.

The second connection node search step (S29) searches for nodes connected to the first connection nodes by an edge, and may designate as second connection nodes only those nodes connected by a second edge of the subgraph. Referring to FIG. 7, E3, whose weight is 1 and therefore not 0, is designated as a second edge, and node V3 is designated as a second connection node.
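The four search steps (S23, S25, S27, S29) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Edge` record, the function name `determine_subgraph`, the weights of E1 and E2, the endpoints of E3, and the source node `V4` of E5 are all hypothetical. Only the node and edge names and the weight of E3 come from the description of FIG. 7.

```python
from collections import namedtuple

# Hypothetical edge record: name, source node, target node, weight.
Edge = namedtuple("Edge", ["name", "src", "dst", "weight"])


def determine_subgraph(edges, reference):
    """Collect first/second edges and connection nodes around `reference`."""
    # First edge search (S23): in-edges of the reference node with non-zero weight.
    first_edges = [e for e in edges if e.dst == reference and e.weight != 0]
    # First connection node search (S25): sources of those edges.
    first_nodes = {e.src for e in first_edges}
    # Second edge search (S27): non-zero in-edges of the first connection nodes.
    second_edges = [e for e in edges if e.dst in first_nodes and e.weight != 0]
    # Second connection node search (S29): sources of the second edges.
    second_nodes = {e.src for e in second_edges} - first_nodes - {reference}
    return {
        "nodes": {reference} | first_nodes | second_nodes,
        "edges": [e.name for e in first_edges + second_edges],
        "first_nodes": first_nodes,
        "second_nodes": second_nodes,
    }


# Edge list loosely modeled on FIG. 7. Assumed: weights of E1 and E2,
# E3 running from V3 to V2, and E5 as a zero-weight in-edge from V4.
edges = [
    Edge("E1", "V1", "N1", 3),
    Edge("E2", "V2", "N1", 4),
    Edge("E3", "V3", "V2", 1),
    Edge("E5", "V4", "N1", 0),
]
sub = determine_subgraph(edges, "N1")
```

Filtering on non-zero weight inside the search has the same effect here as running the edge cleanup step (S22) beforehand, which is why E5 does not appear in the result.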

FIG. 8 is a flowchart illustrating the attribute comparison step (S30) according to an embodiment of the present invention, and FIG. 9 shows an example of subgraphs in the subgraph operation step (S31) according to an embodiment of the present invention. The attribute comparison step (S30) is described below with reference to FIGS. 8 and 9. After subgraphs have been set for the inflow graph data and the comparison reference graph data, the attribute comparison step (S30) compares the subgraphs by similar-pattern matching and returns a similarity. The attribute comparison step (S30) may include a subgraph operation step (S31) and a similarity return step (S33).

The subgraph operation step (S31) compares the inflow subgraph with the comparison reference subgraph and calculates a similarity factor (F). The similarity factor (F) is a measure of how similar the two subgraphs are and is calculated from the nodes and edges of each subgraph.

The similarity factor (F) can be calculated in various ways. Any method that compares two subgraphs and calculates the similarity factor (F) from their nodes and edges falls within the scope of the present invention, but preferably the similarity factor (F) of the subgraph operation step (S31) is calculated through Equation 1 below.

(Equation 1)

Here, the minimum distance between nodes is the minimum number of edges from any one node of the subgraph to another. Referring to FIG. 9, the inflow subgraph consists of nodes N1 and V1 to V3 and edges E1 to E3. The comparison reference subgraph consists of nodes N3, V5, and V6 and edges E5 and E6.

To calculate the similarity factor (F) of the inflow subgraph, the node pairs are (N1, V1), (N1, V2), (N1, V3), (V1, V2), (V1, V3), and (V2, V3), with minimum distances of 1, 1, 2, 2, 3, and 1 respectively. Summing these gives a similarity factor (F) of 10. For the comparison reference subgraph, the node pairs are (N3, V5), (N3, V6), and (V5, V6), with minimum distances of 1, 1, and 2 respectively, so the similarity factor (F) is 4. The similarity factor (F) tends to grow when threats against the reference node are carried out by many attacker resources.
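The worked example is consistent with reading Equation 1 as the sum, over all unordered node pairs, of the minimum number of edges between the two nodes. Under that reading, the calculation can be sketched with a breadth-first search. The function name `similarity_factor` and the edge endpoints are assumptions inferred from the stated distances, and edges are treated as undirected because the example gives (V1, V2) a distance of 2 via N1.

```python
from collections import deque
from itertools import combinations


def similarity_factor(nodes, edges):
    """Sum of minimum hop counts over all unordered node pairs.

    Assumes the subgraph is connected; edge direction is ignored.
    """
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    def hops_from(start):
        # Plain breadth-first search giving hop counts from `start`.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in adj[cur]:
                if nxt not in dist:
                    dist[nxt] = dist[cur] + 1
                    queue.append(nxt)
        return dist

    dist = {n: hops_from(n) for n in nodes}
    return sum(dist[a][b] for a, b in combinations(nodes, 2))


# Inflow subgraph of FIG. 9 (endpoints inferred from the stated distances).
inflow_f = similarity_factor(
    ["N1", "V1", "V2", "V3"],
    [("V1", "N1"), ("V2", "N1"), ("V3", "V2")],
)
# Comparison reference subgraph of FIG. 9.
reference_f = similarity_factor(
    ["N3", "V5", "V6"],
    [("V5", "N3"), ("V6", "N3")],
)
print(inflow_f, reference_f)  # 10 4
```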

In a subgraph operation step (S31) according to another embodiment of the present invention, a similarity factor (F2) is calculated from nodes, edges, and weights. The similarity factor (F) described above takes the minimum distance between nodes to be the minimum number of edges from one node to another, adding 1 for each edge traversed. In this other embodiment, the similarity factor (F2) instead adds the weight of each edge traversed when computing the minimum distance. Applying this to the example of FIG. 9, the similarity factor (F2) of the comparison reference subgraph is 10, the sum of 3, 2, and 5. The similarity factor (F2) in this case also takes into account the content of the threat from each attacker resource.
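A weighted variant can be sketched with an all-pairs shortest-path pass (Floyd-Warshall is used here purely for brevity; the source does not name an algorithm). The function name and the assignment of weight 3 to the V5 edge and weight 2 to the V6 edge are assumptions chosen to reproduce the stated pair distances 3, 2, and 5.

```python
from itertools import combinations


def weighted_similarity_factor(nodes, weighted_edges):
    """Sum of minimum weighted distances over all unordered node pairs.

    Assumes a connected subgraph with non-negative weights; direction ignored.
    """
    inf = float("inf")
    dist = {a: {b: (0 if a == b else inf) for b in nodes} for a in nodes}
    for a, b, w in weighted_edges:
        dist[a][b] = min(dist[a][b], w)
        dist[b][a] = min(dist[b][a], w)
    # Floyd-Warshall relaxation over every intermediate node k.
    for k in nodes:
        for i in nodes:
            for j in nodes:
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return sum(dist[a][b] for a, b in combinations(nodes, 2))


# Comparison reference subgraph of FIG. 9 with assumed weights 3 and 2.
f2 = weighted_similarity_factor(
    ["N3", "V5", "V6"],
    [("V5", "N3", 3), ("V6", "N3", 2)],
)
print(f2)  # 10
```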

In a subgraph operation step (S31) according to yet another embodiment of the present invention, a similarity factor (F3) is obtained by calculating the value of Equation 1 and dividing it by the number of nodes in the subgraph. The similarity factor (F3) of the comparison reference subgraph is then 1.33, obtained by dividing 4, the value from Equation 1, by 3, the number of nodes. This similarity factor (F3) offsets the sharp growth in the similarity factor that an increase in attacker resources would otherwise cause.
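As a worked instance of this normalization, using the FIG. 9 values stated above (F = 10 with four nodes for the inflow subgraph, F = 4 with three nodes for the comparison reference subgraph). The symbol |V| for the node count is notation introduced here for illustration:

```latex
F_3 = \frac{F}{|V|}, \qquad
F_3^{\text{inflow}} = \frac{10}{4} = 2.5, \qquad
F_3^{\text{reference}} = \frac{4}{3} \approx 1.33
```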

The similarity return step (S33) calculates the difference between the similarity factors (F, F2, F3, etc.) of the inflow subgraph and the comparison reference subgraph obtained in the subgraph operation step (S31), and may return a higher similarity value the smaller that difference is. When the difference between the similarity factors is 0, the similarity may take its maximum value. Various conversion formulas, whether already known or yet to become known, may be used in the similarity return step (S33) to convert the difference between similarity factors into a similarity. In one embodiment of the present invention the similarity may be returned as a number, while in another embodiment it may be returned as one of several graded levels according to the similarity factor. The way the similarity is returned is not limited to these examples. Other approaches may be used, such as using the ratio of the similarity factors instead of their difference.
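Since the source leaves the conversion formula open, the following is only one possible sketch. The mapping 1 / (1 + |difference|), the function names, and the level cutoffs are all assumptions chosen so that a difference of 0 yields the maximum similarity and larger differences yield smaller values.

```python
def similarity_from_factors(f_inflow, f_reference):
    """Map a factor difference into (0, 1]; 1.0 when the factors are equal."""
    return 1.0 / (1.0 + abs(f_inflow - f_reference))


def similarity_level(similarity, cutoffs=(0.5, 0.2)):
    """Bucket a numeric similarity into coarse levels (cutoffs are illustrative)."""
    high, medium = cutoffs
    if similarity >= high:
        return "high"
    if similarity >= medium:
        return "medium"
    return "low"


# FIG. 9 factors: F = 10 for the inflow subgraph, F = 4 for the reference.
score = similarity_from_factors(10, 4)   # 1 / (1 + 6)
level = similarity_level(score)          # graded form of the same result
identical = similarity_from_factors(4, 4)
```

The numeric and graded forms correspond to the two return styles mentioned above. A ratio-based mapping could be substituted without changing the surrounding steps.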

Furthermore, the graph database-based log data similar pattern matching and risk management method (S1) according to an embodiment of the present invention may calculate a risk (R) for the inflow subgraph and notify the user of it. Accordingly, the method (S1) may further include a risk calculation step (S40).

In the risk calculation step (S40), the risk (R) is calculated from the nodes and edges of the graph data. The risk (R) can be calculated in various ways, and the various methods of calculating the risk (R) from nodes and edges fall within the scope of the present invention, but preferably the risk (R) of the risk calculation step (S40) is calculated through Equation 2 below.

(Equation 2)

Here, R is the risk and L is the label value assigned to the reference node of the subgraph. Referring to FIG. 9, the risk (R) of the inflow subgraph is calculated as 21 by multiplying the label value 3 by 7, the sum of the weights of the in-edges connected to the reference node. The risk (R) is calculated so as to reflect the importance or vulnerability of the threat target asset corresponding to the reference node, together with the severity of the threat content and the frequency of threats.

In another embodiment of the present invention, the risk (R2) may be obtained by including not only the in-edges directly connected to the reference node but also the indirectly connected in-edges. Referring to FIG. 9, the weight of E3 in the inflow subgraph is also taken into account, so the risk is calculated as 24. The method of calculating the risk (R, R2) is not limited to the methods illustrated. Any method that can take into account the threat type, the importance of the threat target resource, and the like is acceptable. It is self-evident that the calculated risk (R) may be reported to the user by various visual and auditory means.
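Reading Equation 2 as the label value multiplied by the sum of in-edge weights, which is what the worked values 21 and 24 imply, both risk variants can be sketched as below. The function name, the `(src, dst, weight)` tuple layout, and the split of the direct weight total 7 into 3 and 4 are assumptions. Only the label value 3, the total 7, and the implied E3 weight of 1 follow from the text.

```python
def risk(label, edges, reference, include_indirect=False):
    """Label value times the summed weights of in-edges reaching `reference`.

    edges: iterable of (src, dst, weight) tuples.
    include_indirect=True also counts in-edges of the direct senders (R2).
    """
    direct = [e for e in edges if e[1] == reference]
    total = sum(w for _, _, w in direct)
    if include_indirect:
        first_nodes = {src for src, _, _ in direct}
        total += sum(w for _, dst, w in edges if dst in first_nodes)
    return label * total


# Inflow subgraph of FIG. 9: E1 and E2 weights are an assumed split of 7.
inflow_edges = [("V1", "N1", 3), ("V2", "N1", 4), ("V3", "V2", 1)]
r = risk(3, inflow_edges, "N1")                          # 3 * 7
r2 = risk(3, inflow_edges, "N1", include_indirect=True)  # 3 * (7 + 1)
print(r, r2)  # 21 24
```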

Once the similarity and/or the risk have been calculated through the steps above, frequent patterns need to be selected, stored, and managed. A stored frequent pattern can later serve as a comparison reference subgraph when new data arrives. Accordingly, the graph database-based log data similar pattern matching and risk management method (S1) according to the present invention may further include a screening management step (S50).

FIG. 10 is a flowchart illustrating the screening management step (S50) according to an embodiment of the present invention. Referring to FIG. 10, the screening management step (S50) selects frequent patterns and stores and manages them. As described above, a frequent pattern may be an attempted threat against a threat target resource, so it is stored and managed separately. The screening management step (S50) may include a frequent pattern selection step (S51) and a management step (S53) in which the selected patterns are stored.

The frequent pattern selection step (S51) may define inflow graph data and/or subgraphs as frequent patterns according to a criterion. The reference value used as the criterion may be defined as any of various functions or constants, and may be set to different values depending on the purpose of the security control system. Inflow graph data and/or subgraphs whose similarity is at or above the reference value are defined as frequent patterns and managed.

The management step (S53) stores and manages the selected frequent patterns. The selected frequent patterns may be stored in a DB, the frequent patterns stored in the DB may be analyzed, and when new data arrives later they may be retrieved as comparison reference graphs for comparison.
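The selection and storage steps (S51, S53) can be sketched as a threshold filter followed by an append to a store. Everything here is illustrative: the function name, the dictionary layout, the similarity values, and the in-memory list standing in for the DB are assumptions. The threshold may be a constant or a function, reflecting the statement that the reference value can be defined either way.

```python
def select_frequent_patterns(candidates, threshold):
    """Keep candidates whose similarity is at or above the reference value.

    threshold: a constant, or a function taking a candidate and returning one.
    """
    selected = []
    for cand in candidates:
        limit = threshold(cand) if callable(threshold) else threshold
        if cand["similarity"] >= limit:
            selected.append(cand)
    return selected


pattern_store = []  # in-memory stand-in for the DB of frequent patterns

# Hypothetical inflow subgraphs with already computed similarities.
candidates = [
    {"id": "sub-a", "similarity": 0.8},
    {"id": "sub-b", "similarity": 0.3},
]
pattern_store.extend(select_frequent_patterns(candidates, threshold=0.5))
stored_ids = [p["id"] for p in pattern_store]
```

Entries kept in `pattern_store` would later be loaded as comparison reference subgraphs when new log data arrives.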

By following the steps above, the graph database-based log data similar pattern matching and risk management method can be carried out: collected log data is designated as graph data, subgraphs are determined for frequent patterns, and their similarity to the comparison reference graph data is compared, so that data with a high threat potential is found, screened, and managed, enabling efficient security control.

The embodiments of the present invention described above are not implemented only through an apparatus and a method. They may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, or through a recording medium on which that program is recorded, and such an implementation can readily be made by a person skilled in the art to which the present invention pertains from the description of the embodiments above.

The foregoing detailed description illustrates the present invention. It presents and describes preferred embodiments of the present invention, and the present invention can be used in various other combinations, modifications, and environments. That is, changes or modifications are possible within the scope of the inventive concept disclosed in this specification, the scope equivalent to the written disclosure, and/or the scope of skill or knowledge in the art. The embodiments described explain the best mode for implementing the technical idea of the present invention, and the various changes required by specific applications and uses of the present invention are also possible. Accordingly, the foregoing detailed description is not intended to limit the present invention to the disclosed embodiments. The appended claims should also be construed to cover other embodiments.

S1: Graph database-based log data similar pattern matching and risk management method
S10: Inflow graph data generation step
S11: Preprocessing step S13: Individual element designation step
S131: Node generation step S133: Node designation step
S135: Edge designation step S137: Weight designation step
S15: Labeling step
S20: Subgraph determination step
S21: Reference node setting step S22: Edge cleanup step
S23: First edge search step S25: First connection node search step
S27: Second edge search step S29: Second connection node search step
S30: Attribute comparison step
S31: Subgraph operation step S33: Similarity return step
S40: Risk calculation step
S50: Screening management step
S51: Frequent pattern selection step S53: Management step
F: Similarity factor R: Risk

Claims

A graph database-based log data similar pattern matching and risk management method comprising an inflow graph data generation step of storing collected log data as inflow graph data,
wherein the inflow graph data generation step includes an individual element designation step of designating each element of the log data as a graph element, and a labeling step of assigning a label to at least some of the stored graph elements, and
wherein the individual element designation step includes a node generation step of analyzing the collected log data and generating a node when no node exists having a value corresponding to the value of a first element or a second element, a node designation step of designating the first element and the second element of the log data as a first node and a second node, and an edge designation step of designating a third element of the log data as an edge.

delete

The method of claim 1, wherein the individual element designation step further includes a weight designation step of designating a weight according to the type of the edge.

The graph database based log data similar pattern matching and risk management method according to claim 1, wherein the first element is an attacker resource, the second element is a threat target resource, and the edge is a threat method.

A graph database-based log data similar pattern matching and risk management method comprising: an inflow graph data generation step of storing collected log data as inflow graph data; a subgraph determination step of setting a subgraph for each of the inflow graph data and comparison reference graph data so that they can be compared; and an attribute comparison step of calculating a similarity through pattern matching between the subgraphs,
wherein the inflow graph data generation step includes an individual element designation step of designating each element of the log data as a graph element, and a labeling step of assigning a label to at least some of the stored graph elements, and
wherein the subgraph determination step includes a reference node setting step of setting a reference node that serves as the basis of the subgraph, a first edge search step of searching for and designating edges connected to the reference node, and a first connection node search step of searching for first connection nodes connected to the reference node by those edges.

delete

The method of claim 5, wherein the reference node setting step sets, as the reference node, a node having the same number of edges among nodes with the same label.

The method of claim 5, wherein the reference node setting step sets, as the reference node, a node having the same weight among nodes with the same label.

The method of claim 7 or 8, wherein the first edge search step searches only the in-edges among the edges connected to the reference node and designates them as edges of the subgraph, excluding in-edges whose weight is 0.

The method of claim 9, wherein the subgraph determination step further includes:
an edge cleanup step of deleting edges having a weight of 0 before the reference node setting step;
a second edge search step of designating, after the first connection node search step, edges whose weight is not 0 among the edges connected to the first connection nodes as edges of the subgraph; and
a second connection node search step of designating nodes connected by an edge to the first connection nodes as second connection nodes.

The method of claim 5, wherein the attribute comparison step includes a subgraph operation step of calculating a similarity factor for each of the inflow subgraph and the comparison reference subgraph, and a similarity return step of comparing the similarity factors and returning the similarity between the subgraphs, and
wherein the similarity factor is calculated based on the nodes and edges of each subgraph.

The method of claim 11, wherein the similarity factor is calculated through Equation 1 below according to the nodes and edges constituting each subgraph.
(Equation 1)

Here, F is the similarity factor, and the minimum distance between nodes is the minimum number of edges from any one node of the subgraph to another.

The method of claim 12, wherein the similarity return step returns a higher similarity value the smaller the difference between the similarity factor of the inflow subgraph and the similarity factor of the comparison reference subgraph.

The method according to claim 5, further comprising a risk calculation step of calculating a risk of the subgraph.

15. The method of claim 14, wherein the risk calculation step calculates the risk based on the label value of the reference node and the weights of the edges connected to the reference node.

16. The method according to claim 15, wherein the risk is calculated according to Equation 2 below.
(Equation 2)

At this time, R is the risk and L is the label value assigned to the reference node.

The method of claim 5, further comprising a screening management step of selecting, storing, and managing frequent patterns.

The method of claim 17, wherein the screening management step includes a frequent pattern selection step of defining frequent patterns by evaluating the similarity against a criterion defined in advance as a function or a constant, and a management step of storing and managing the inflow subgraphs corresponding to frequent patterns.