KR20200025043A

KR20200025043A - Method and system for security information and event management based on artificial intelligence

Info

Publication number: KR20200025043A
Application number: KR1020180101822A
Authority: KR
Inventors: 신승원; 이승현; 김연근; 강희도
Original assignee: 한국과학기술원
Priority date: 2018-08-29
Filing date: 2018-08-29
Publication date: 2020-03-10
Also published as: KR102225040B1

Abstract

The present invention relates to an artificial intelligence-based integrated log management method and a system thereof. Various security events occurring in an enterprise environment are analyzed to build a knowledge base (KB) for enterprise security, and attacks involving malicious and abnormal behavior can be detected by performing correlation analysis across the entire system based on the built KB.

Description

Integrated log management method and system based on artificial intelligence {METHOD AND SYSTEM FOR SECURITY INFORMATION AND EVENT MANAGEMENT BASED ON ARTIFICIAL INTELLIGENCE}

The present invention relates to an artificial intelligence-based integrated log management method and system, and more particularly, to a technique for providing automated detection of malicious attacks through unstructured data analysis and correlation analysis.

A security information and event management system, or integrated log management system (Security Information and Event Management; SIEM), collects the various security logs generated in an enterprise environment and uses a unified interface to analyze and respond to the security status of the system.

Attacks occurring in enterprise environments are becoming increasingly sophisticated, yet the various attacks intended to incapacitate a system are detected by distributed security equipment deployed by the system administrator (Admin).

Existing SIEM systems lack correlation analysis of the events obtained from different security devices, so they provide very little basis for identifying sophisticated and complex attacks. In addition, because existing SIEM systems analyze only structured data, the collection of features for identifying attack behavior is limited, which makes advanced attacks difficult to detect.

Furthermore, existing SIEM systems have several problems, the biggest of which is that log analysis is very difficult because complex logs are stored as text in a relational database (Relational DB). When complex logs are stored as text, it is difficult to detect what correlations exist between the logs, pattern analysis is limited, and accuracy suffers.

Hereinafter, the problems and limitations of existing SIEM systems are described in detail with reference to FIGS. 1 and 2.

FIG. 1 illustrates an example for explaining a problem of an existing SIEM system.

Referring to FIG. 1, suppose that ransomware has penetrated a network and infected hosts. The log data of the hosts is stored as text in the database of the existing SIEM system. As a result, the existing SIEM system can determine that hosts A, B, and C are infected, but because it is difficult to analyze the correlation between the log data, it cannot detect which host was infected with the ransomware first.

To analyze correlations as a way of overcoming this limitation, products have recently been released that build a relational graph (Relational Graph) from logs based on the network topology and network communication connections, and use it to analyze attacks.

A representative example is the graph solution sold by LINKURIOUS. Existing SIEM systems use this graph solution to obtain a relational graph based on network communication connection information derived from the data, and detect attacks with it.

FIG. 2 illustrates an example of analyzing an attack using an existing graph solution.

More specifically, FIG. 2 illustrates an example of analyzing a UDP storm attack using the graph solution provided by LINKURIOUS.

Referring to FIG. 2, an existing SIEM system using the graph solution detects attacks using only IP and port information. As a result, existing SIEM systems can detect only very simple attacks.

Therefore, a SIEM system is needed that collects and manages log information in a new way, rather than by the methods used in existing SIEM systems: storing logs as text and analyzing attacks by pattern matching, or analyzing attacks through a simple relational graph built from network communication connection information.

Furthermore, having an administrator create rules from the data stored in the system and detect attacks with them is slow, so a SIEM system is needed that automatically analyzes the data and generates rules for a faster response.

An object of the present invention is to provide a technique that analyzes the various security events occurring in an enterprise environment to build a knowledge base (Knowledge Base; KB) for enterprise security, and detects attacks by performing system-wide correlation analysis based on the built knowledge base.

Another object of the present invention is to provide a technique that performs correlation analysis using only the meaningful elements contained in logs, enabling analysis of attacks that existing methods cannot detect.

A method of operating a security information and event management system, or integrated log management system (Security Information and Event Management; SIEM), incorporating artificial intelligence according to an embodiment of the present invention includes: collecting security logs from a plurality of security devices; analyzing the security logs for attack behavior detection using natural language processing (Natural Language Processing; NLP) and semantics; generating a relational graph (Relational Graph) by analyzing correlations between a plurality of features obtained based on the analysis result; and detecting malicious and abnormal behavior by analyzing associations such as attacks or failures based on the relational graph.

In addition, the method of operating the security information and event management system, or integrated log management system (SIEM), incorporating artificial intelligence according to an embodiment of the present invention may further include reporting (Reports) security threat information about newly detected malicious or abnormal behavior according to the detection result.

The collecting of the security logs may include collecting, from the plurality of security devices, the security logs and audit information including at least one of the time at which the log was generated (timestamp), an IP address, and port information.

The analyzing of the security logs may include applying a preprocessing technique to the security logs and then analyzing a graph form for a graph data structure using the natural language processing technique and the semantics.

The analyzing of the security logs may include, using the natural language processing technique, sampling a certain amount of data from the natural language data (corpus) formed by the preprocessed security logs, annotating patterns based on the grammatical or semantic information of the sampled data, and training a machine learning or artificial intelligence algorithm on the patterns to generate a natural language processing model (NLP Model) based on the graph form.

The generating of the relational graph (Relational Graph) may include: deriving, through the natural language processing model, the plurality of features, which are a set of evidence for attack behavior detection, from the security logs collected from the plurality of security devices; and generating and managing the relational graph by analyzing the correlations among the plurality of features through correlation analysis.

The generating and managing of the relational graph may include obtaining a common denominator of the plurality of features, analyzing the associations and correlations between the plurality of features, and deriving the explicit and implicit relationships in the security logs to create the nodes that are the subjects of those relationships, thereby generating the relational graph in which data is expressed as subject nodes (Vertex), relationships between subject nodes (Edge), and properties of the nodes and relationships (Property).

In the process of creating the nodes, the generating and managing of the relational graph may merge duplicate information in the security logs into one, and may make explicit the relationships that cannot be obtained from a single log by additionally deriving them through the correlation analysis.
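The vertex, edge, and property representation described above, with duplicate information merged into a single node, can be sketched as follows. This is only a minimal illustration in Python: the class name `RelationalGraph`, its methods, and the sample hosts and properties are assumptions made for this sketch and do not appear in the specification.

```python
from collections import defaultdict


class RelationalGraph:
    """Toy property graph: vertices, directed edges, and properties on both."""

    def __init__(self):
        self.vertices = {}                 # vertex id -> property dict
        self.edges = defaultdict(dict)     # (src, label, dst) -> property dict

    def add_vertex(self, vid, **props):
        # A vertex seen in several logs is stored once; its properties are merged.
        self.vertices.setdefault(vid, {}).update(props)

    def add_edge(self, src, label, dst, **props):
        # Make sure both endpoints exist before recording the relationship.
        self.add_vertex(src)
        self.add_vertex(dst)
        self.edges[(src, label, dst)].update(props)


graph = RelationalGraph()
# Two hypothetical logs mention the same host; it becomes a single vertex.
graph.add_vertex("10.0.0.5", kind="host")
graph.add_vertex("10.0.0.5", os="linux")
graph.add_edge("10.0.0.5", "connects_to", "10.0.0.9", port=445)

print(len(graph.vertices))
print(graph.vertices["10.0.0.5"])
```

A production system would delegate storage to a graph database, as the following paragraph notes; the in-memory dictionaries here only show the shape of the data.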

The generating and managing of the relational graph may use a graph-based database (Graph DB) to perform the actual construction of the relational graph and efficient data management.

The detecting of malicious and abnormal behavior may include analyzing mutual relationships and associations such as attacks or failures based on the relational graph, and detecting malicious and abnormal behavior according to behavior rules predefined by a user.
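As a rough illustration of applying a user-defined behavior rule to the graph, the sketch below flags sources with repeated failed logins. The specification does not define a rule format, so the edge tuples, the relationship names, the function `repeated_failures`, and the threshold are all assumptions chosen for this example.

```python
from collections import Counter

# Hypothetical edges of a relational graph: (source, relationship, target).
edges = [
    ("10.0.0.7", "login_failure", "mail-server"),
    ("10.0.0.7", "login_failure", "mail-server"),
    ("10.0.0.7", "login_failure", "mail-server"),
    ("10.0.0.8", "login_success", "mail-server"),
]


def repeated_failures(graph_edges, threshold=3):
    """Example user-defined rule: flag sources with many failed logins."""
    counts = Counter(src for src, rel, _ in graph_edges if rel == "login_failure")
    return sorted(src for src, n in counts.items() if n >= threshold)


suspects = repeated_failures(edges)
print(suspects)
```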

The detecting of malicious and abnormal behavior may include performing reputation-based behavior analysis, in which the reputation function for a user's behavior is classified in stages to provide a final conclusion on the malicious and abnormal behavior.

A security information and event management system, or integrated log management system (Security Information and Event Management; SIEM), incorporating artificial intelligence according to an embodiment of the present invention includes: a log collection unit that collects security logs from a plurality of security devices; a log analysis unit that analyzes the security logs for attack behavior detection using natural language processing (Natural Language Processing; NLP) and semantics; a graph generation unit that generates a relational graph (Relational Graph) by analyzing correlations between a plurality of features obtained based on the analysis result; and a detection unit that detects malicious and abnormal behavior by analyzing associations such as attacks or failures based on the relational graph.

In addition, the security information and event management system, or integrated log management system (SIEM), incorporating artificial intelligence according to an embodiment of the present invention may further include a communication unit that reports (Reports) security threat information about newly detected malicious or abnormal behavior according to the detection result.

The graph generation unit may include: a feature derivation unit that derives, through the natural language processing model, the plurality of features, which are a set of evidence for attack behavior detection, from the security logs collected from the plurality of security devices; and a graph management unit that generates and manages the relational graph by analyzing the correlations among the plurality of features through correlation analysis.

According to an embodiment of the present invention, correlation analysis is performed using only the meaningful elements contained in logs, which makes it possible to analyze attacks that existing methods cannot detect.

In addition, according to an embodiment of the present invention, attacks are analyzed through unstructured data analysis and correlation analysis, and know-how about attack detection is accumulated in a logically centralized knowledge base (Knowledge Base; KB), so that automated detection can be provided for attacks that occur later.

FIG. 1 illustrates an example for explaining a problem of an existing SIEM system.
FIG. 2 illustrates an example of analyzing an attack using an existing graph solution.
FIG. 3 is a flowchart of an artificial intelligence-based integrated log management method according to an embodiment of the present invention.
FIG. 4 is a block diagram illustrating the detailed configuration of an artificial intelligence-based integrated log management system according to an embodiment of the present invention.
FIG. 5 is a structural diagram of an artificial intelligence-based integrated log management system according to an embodiment of the present invention.
FIG. 6 illustrates an example of botnet propagation.
FIG. 7 illustrates an example of botnet propagation identified through correlation analysis according to an embodiment of the present invention.
FIG. 8 is a diagram for explaining the nodes and relationships of a graph.
FIG. 9 is a diagram for explaining an example of detecting malicious behavior using reputation-based behavior analysis according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. However, the present invention is not restricted or limited by the embodiments. In addition, like reference numerals in the drawings denote like elements.

In addition, the terminology used in this specification is chosen to properly describe preferred embodiments of the present invention, and may vary depending on the intention of a viewer or operator, or on the conventions of the field to which the present invention belongs. Therefore, the terms should be defined based on the content of this specification as a whole.

FIG. 3 is a flowchart of an artificial intelligence-based integrated log management method according to an embodiment of the present invention, and FIG. 4 is a block diagram illustrating the detailed configuration of an artificial intelligence-based integrated log management system according to an embodiment of the present invention.

Referring to FIGS. 3 and 4, the artificial intelligence-based integrated log management method and system according to an embodiment of the present invention provide automated detection of malicious attacks through unstructured data analysis and correlation analysis.

To this end, the artificial intelligence-based integrated log management system (400) according to an embodiment of the present invention shown in FIG. 4 includes a log collection unit (410), a log analysis unit (420), a graph generation unit (430), and a detection unit (440). In addition, the steps of FIG. 3 (steps 310 to 360) may be performed by the components of the artificial intelligence-based integrated log management system (400) of FIG. 4, namely the log collection unit (410), the log analysis unit (420), the graph generation unit (430), the detection unit (440), the communication unit (450), and the system control unit (470).

Referring to FIG. 3, in step 310, the log collection unit (410) collects security logs from a plurality of security devices.

The log collection unit (410) may collect, from security devices such as an intrusion detection system (Intrusion Detection System; IDS), a firewall, and a middlebox, security logs and audit information including at least one of the time at which the log was generated (timestamp), an IP address, and port information.

Here, the security log may include the time at which the log was generated (timestamp), and a log generated by packet communication may include the IP address and port information of the host. As a result, the artificial intelligence-based integrated log management method and system according to an embodiment of the present invention can determine, from the security logs, the order in which events occurred across the plurality of security devices, and can determine the network connections and data flows, and detect the type of network service, from the IP address and port information of the hosts.
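The use of timestamps to order events, and of IP addresses and ports to recover flows and service types, can be sketched as follows. The record layout, the field names, and the small port table are assumptions made for this illustration; real security devices emit different formats.

```python
from datetime import datetime

# Hypothetical, simplified log records; real device formats differ.
logs = [
    {"device": "firewall", "timestamp": "2018-08-29T10:00:05",
     "src_ip": "10.0.0.5", "dst_ip": "10.0.0.9", "dst_port": 22},
    {"device": "ids", "timestamp": "2018-08-29T10:00:01",
     "src_ip": "10.0.0.7", "dst_ip": "10.0.0.5", "dst_port": 80},
]

# A few well-known port assignments, used to guess the service type.
WELL_KNOWN_PORTS = {22: "ssh", 25: "smtp", 80: "http", 443: "https"}


def order_events(records):
    """Return the records sorted by the time each log was generated."""
    return sorted(records, key=lambda r: datetime.fromisoformat(r["timestamp"]))


def describe_flow(record):
    """Summarise who talked to whom and over which (guessed) service."""
    service = WELL_KNOWN_PORTS.get(record["dst_port"], "unknown")
    return f'{record["src_ip"]} -> {record["dst_ip"]} ({service})'


flows = [describe_flow(r) for r in order_events(logs)]
print(flows)
```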

In addition, the security log may include a description of why the log was generated. This description is written in natural language, and depending on the type of security device its level of detail may be brief (for example, login failure) or more specific (for example, login failure: unknown user name).

In addition, because system logs cover a very wide range, a security log that is a system log may include information that distinguishes areas such as the file system, user programs, kernel programs, and memory access. A file system log may provide information about the file being accessed and about the subject accessing it (for example, Chrome browser writes 127 bytes in /var/log/chrome.log). A program crash log may include information such as the name of the terminated program and the reason (for example, Safari browser crash: not enough memory). A memory access log may include information about the accessing subject or about the memory region.

The information described above is stored in one log database or in multiple logically connected log databases, and after analysis it can be used to return relevant information in response to search queries sent by the administrator (Admin) or to detect attacks. Many other kinds of security devices exist, and the security logs they generate differ greatly, so the artificial intelligence-based integrated log management method and system according to an embodiment of the present invention also perform manual analysis of actual logs in order to maximize the level of information that can be obtained from each log. Here, the log database may be included in the graph-based database (460) shown in FIG. 4.

In step 320, the log analysis unit (420) analyzes the security logs for attack behavior detection using natural language processing (Natural Language Processing; NLP) and semantics.

Automating log analysis requires converting the security logs into a form that a machine can process. Accordingly, the log analysis unit (420) may analyze the security logs using natural language processing techniques and semantics.

Natural language processing (Natural Language Processing; NLP) can convert security logs of differing forms into a unified machine-readable form. A natural language processing technique that lets the system understand natural language in this way basically samples a certain amount of training data from the large body of collected natural language data (corpus), annotates patterns based on the grammatical or semantic information of that data, and trains a machine learning or artificial intelligence algorithm on these patterns to generate a model.

Accordingly, the log analysis unit (420) may use natural language processing techniques to analyze a graph form for a graph data structure, and may generate and train a natural language processing model (NLP Model) based on the graph form.

The natural language processing (NLP) techniques used in the artificial intelligence-based integrated log management method and system of the present invention include part-of-speech tagging, semantic role labeling, and named entity recognition.

Part-of-speech tagging (POS tagging) analyzes the grammatical relationships in a given sentence and then marks the grammatical role played by each word in the sentence. For example, when the error message 'Multiple login failures are detected in a mail server' is collected, as in [Example 1] below, part-of-speech tagging analyzes the sentence to determine which grammatical element each word is. Here, part-of-speech tagging may use a part-of-speech tagging model to identify nouns (NN), verbs (VB), prepositions (IN), adjectives (JJ), and so on.

[Example 1]
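To give a sense of the output that part-of-speech tagging produces for the message above, the following minimal sketch tags it with a hand-written lexicon. The lexicon and the function `pos_tag` are assumptions standing in for a trained tagging model, and the `DT` label for the determiner follows common Penn Treebank usage rather than anything stated in the specification.

```python
# Hypothetical hand-written lexicon standing in for a trained tagging model.
LEXICON = {
    "multiple": "JJ", "login": "NN", "failures": "NN", "are": "VB",
    "detected": "VB", "in": "IN", "a": "DT", "mail": "NN", "server": "NN",
}


def pos_tag(sentence):
    """Tag each whitespace-separated token; unknown words default to NN."""
    return [(tok, LEXICON.get(tok.lower(), "NN")) for tok in sentence.split()]


tagged = pos_tag("Multiple login failures are detected in a mail server")
for token, tag in tagged:
    print(f"{token}/{tag}")
```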

Semantic role labeling (Semantic Role Labeling; SRL) analyzes, based on the grammatical information of the sentence elements, the semantic role that each word or phrase plays in the sentence. For example, as in [Example 2] below, 'login failures' and 'mail server' are both nouns, but they play different roles in conveying the actual meaning. In semantic role labeling, the subject, predicate, and object, which are the most important elements of a sentence, are collectively referred to as SPO, and modifier phrases (AM) that express additional meaning such as place or time can be distinguished alongside the SPO. Because a predicate carries the direction 'what the subject does to the object', the SPO information can be used to form a graph in which the subject and object are nodes (Vertex) and the predicate is the relationship (Edge).

[Example 2]
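The mapping from SPO information to a graph can be sketched as follows, using a file system log line quoted earlier in the description. The roles are labelled by hand for this illustration, and the `SPO` type and the `spo_to_edge` function are hypothetical names, not part of the specification or of any semantic role labelling library.

```python
from typing import NamedTuple, Optional


class SPO(NamedTuple):
    subject: str
    predicate: str
    obj: str
    modifier: Optional[str] = None   # AM: extra place/time information


def spo_to_edge(spo):
    """Subject and object become vertices; the predicate becomes a directed edge."""
    vertices = {spo.subject, spo.obj}
    edge = (spo.subject, spo.predicate, spo.obj)
    props = {"AM": spo.modifier} if spo.modifier else {}
    return vertices, edge, props


# Hand-labelled roles for "Chrome browser writes 127 bytes in /var/log/chrome.log"
# (an assumption, not the output of a real semantic role labeller).
spo = SPO("Chrome browser", "writes", "127 bytes", "/var/log/chrome.log")
vertices, edge, props = spo_to_edge(spo)
print(edge)
print(props)
```

Keeping the modifier phrase as an edge property, rather than as a third vertex, is one possible design choice; the specification only states that modifier phrases are distinguished from the SPO.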

Named entity recognition (Named Entity Recognition; NER) identifies what the entities in a sentence mean. The part-of-speech tagging and semantic role labeling techniques described above reveal the grammatical and semantic information of a sentence, but they do not show what the elements actually refer to. To address this, named entity recognition identifies the information that each element actually conveys.

For example, referring to [Example 3] below, 'login' is a term related to authentication and therefore belongs to the entity group 'Authentication', while 'mail server' is a server located at the edge of the network and therefore belongs to the entity group 'End host'. Accordingly, named entity recognition allows attacks that produce similar patterns to be verified first and thus detected quickly.

[Example 3]
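The grouping of terms into entity groups can be illustrated with a simple dictionary lookup. The two entries come from the description above; the lookup table `ENTITY_GROUPS`, the function `recognize_entities`, and the substring matching are assumptions for this sketch, whereas a real system would use a trained recognition model.

```python
# Hypothetical lookup table mapping surface terms to entity groups.
ENTITY_GROUPS = {
    "login": "Authentication",
    "mail server": "End host",
}


def recognize_entities(message):
    """Return (term, group) pairs for every known term found in the message."""
    text = message.lower()
    # Plain substring matching: enough for a sketch, too loose for real use.
    return [(term, group) for term, group in ENTITY_GROUPS.items() if term in text]


entities = recognize_entities("Multiple login failures are detected in a mail server")
print(entities)
```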

The artificial intelligence-based integrated log management method and system according to an embodiment of the present invention are not limited to the three natural language processing techniques described above and may use various other techniques.

Referring again to FIG. 3, in step 320, the log analysis unit (420) may apply a preprocessing technique to the security logs before using natural language processing (NLP) and semantics.

The preprocessing technique may be a process of removing unnecessary elements from the collected security logs or of handling predefined or device-specific log formats in advance. The log analysis unit (420) may then apply natural language processing (NLP) and semantics to the preprocessed security logs to analyze a graph form that represents the relationship (Edge) values and node (Vertex) values for the graph data structure.
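One way to picture such preprocessing is to strip a known header from each raw line and keep only the free-text message for the later NLP steps. The sketch below assumes a syslog-like layout; that layout, the regular expression, and the function `preprocess` are illustrative assumptions, since the specification does not prescribe a log format.

```python
import re

# Assumed syslog-like layout: "<priority>Mon DD HH:MM:SS host process[pid]: message"
SYSLOG_PREFIX = re.compile(
    r"^(?:<\d+>)?(?P<time>\w{3}\s+\d{1,2}\s[\d:]{8})\s"
    r"(?P<host>\S+)\s(?P<proc>[\w\-/]+)(?:\[\d+\])?:\s*"
)


def preprocess(line):
    """Split a raw line into structured fields and a cleaned free-text message."""
    match = SYSLOG_PREFIX.match(line)
    fields = match.groupdict() if match else {}
    message = line[match.end():] if match else line
    # Collapse repeated whitespace left over from the device's formatting.
    fields["message"] = re.sub(r"\s+", " ", message).strip()
    return fields


record = preprocess(
    "<34>Aug 29 10:00:01 host1 sshd[123]: login   failure: unknown user name"
)
print(record)
```

Lines that do not match the assumed header are passed through with only whitespace normalised, so unfamiliar formats are not silently dropped.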

Furthermore, the log analysis unit (420) may compare the accuracy of the analyzed content against the manual analysis information stored and maintained in the graph-based database (460), thereby defining a graph form that can be merged into more accurate information.

In step 330, the graph generator 430 generates a relational graph by analyzing the correlations between a plurality of features obtained from the analysis result.

Step 330, performed by the graph generator 430, may include a step of deriving, through the NLP model, a plurality of features that form the set of evidence for detecting attack behavior from the security logs collected from the plurality of security devices (or a feature derivation unit, not shown), and a step of generating and managing the relational graph by analyzing the correlations of the features through correlation analysis (or a graph management unit, not shown).

For example, the graph generator 430 may derive a plurality of features from the security logs, analyze their correlations through the common denominators between the features, and derive associations.

Hereinafter, the process of deriving common denominators is described in detail with reference to [Example 4].

[Example 4]

[Example 4] shows logs generated by an intrusion detection system (IDS) and a firewall. In general, each security device follows its own rules (semantics) for generating logs. For example, when a typical network security device generates a log, it includes the source and destination addresses and the port information that identifies each service.

Referring to [Example 4], (1) marks the parts of the logs from the different security devices that follow the same log generation rule. Because each log inherently contains such common denominators, the graph generator 430 can first derive the explicit common denominators shared by the logs.

However, the security devices differ in role and function. The logs they provide therefore differ, and some parts, although specified by the log generation rules, are difficult to relate to one another. In the IDS log of [Example 4], (2) indicates that the attacking host (192.168.0.100) performed an SSH brute-force password guessing attack against the target host (192.168.0.101), while in the firewall log, (2) indicates that the attacking host attempted to access the FTP service of the target host but failed for lack of permission. In addition, (3) is a sentence determined by the security administrator who defines the rules of the intrusion detection system, and it contains detailed information.

The security logs derived from the security devices can be classified into explicit relationships, which are based on the log generation rules, and implicit relationships, which represent hidden information revealed through log analysis by a security administrator. Explicit relationships can be correlated according to the log generation rules, but implicit relationships have the limitation that they are not visible in the logs themselves. To address this, in step 330, the graph generator 430 may use the security administrator's direct log analysis to define in advance the correlations between the security logs of a plurality of systems.
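The distinction between explicit and implicit relationships can be sketched as follows. The two records reuse the hosts and events described for [Example 4], but the field names, the port numbers (the standard ports for SSH and FTP), the message strings, and the analyst-defined rule table `IMPLICIT_RULES` are assumptions made for illustration, since the actual log lines are not reproduced in the text.

```python
# Hypothetical parsed records; field names and values are assumptions.
ids_log = {"device": "IDS", "src": "192.168.0.100", "dst": "192.168.0.101",
           "dst_port": 22, "msg": "SSH brute force password guessing"}
fw_log = {"device": "Firewall", "src": "192.168.0.100", "dst": "192.168.0.101",
          "dst_port": 21, "msg": "FTP access denied"}


def explicit_common(a: dict, b: dict) -> dict:
    """Explicit relationship: fields both records share with equal values."""
    return {key: a[key] for key in a.keys() & b.keys() if a[key] == b[key]}


# Implicit relationships defined in advance by a security administrator.
# The single rule below is an assumed example.
IMPLICIT_RULES = {
    frozenset({"SSH brute force password guessing", "FTP access denied"}):
        "repeated unauthorized access attempts",
}


def implicit_relation(a: dict, b: dict):
    """Look up an analyst-defined relationship for a pair of messages."""
    return IMPLICIT_RULES.get(frozenset({a["msg"], b["msg"]}))
```

Here `explicit_common` recovers the shared source and destination addresses directly from the records, while `implicit_relation` can only return what an analyst has already written down, which mirrors the limitation noted above.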

Hereinafter, the process of generating a relational graph according to the correlations is described in detail with reference to FIG. 8.

FIG. 8 is a diagram illustrating the nodes and relationships of a graph.

In step 330, the graph generator 430 may use graph theory to represent the relational graph efficiently. Referring to FIG. 8, graph theory represents data as subject nodes (vertices) and the relationships between subjects (edges), and each node and relationship additionally carries properties.

In step 330, the graph generator 430 obtains the common denominators of the plurality of features, analyzes the associations and correlations between the features, and derives the explicit and implicit relationships of the security logs to create the nodes that are the subjects of those relationships. Through this process, it may generate a relational graph in which data is represented by subject nodes (vertices), relationships between subject nodes (edges), and properties of the nodes and relationships.

In the process of creating the nodes, the graph generator 430 may merge duplicate information in the security logs into one, and may additionally derive, through correlation analysis, relationships that cannot be obtained from a single log, so as to specify those relationships.
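The vertex, edge, and property model and the merging of duplicate information can be sketched with a small in-memory structure. The class name `RelationalGraph`, its methods, and the property names are hypothetical; the embodiment stores the graph in a graph-based database, which this sketch does not attempt to reproduce.

```python
class RelationalGraph:
    """Minimal property graph: vertices and edges both carry properties."""

    def __init__(self):
        self.vertices = {}  # vertex id -> property dict
        self.edges = {}     # (src, label, dst) -> property dict

    def add_vertex(self, vertex_id, **props):
        # A vertex seen again is merged into the existing one.
        self.vertices.setdefault(vertex_id, {}).update(props)

    def add_edge(self, src, label, dst, **props):
        self.add_vertex(src)
        self.add_vertex(dst)
        self.edges.setdefault((src, label, dst), {}).update(props)


graph = RelationalGraph()
graph.add_vertex("192.168.0.100", role="attacker")
graph.add_vertex("192.168.0.101", role="target")
# Two devices report events between the same pair of hosts; the hosts are
# stored once and each event becomes its own edge.
graph.add_edge("192.168.0.100", "SSH password guessing", "192.168.0.101",
               source="IDS")
graph.add_edge("192.168.0.100", "FTP access denied", "192.168.0.101",
               source="Firewall")
```

Keying vertices by identifier is what collapses the duplicated host information from the two logs into a single node.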

Referring back to FIG. 3, in step 330, the graph generator 430 may use the graph-based database (graph DB) 460 to perform the actual construction of the relational graph and efficient data management, and may store and maintain the generated relational graph in the graph-based database 460.

Based on the graph-based database 460, the artificial intelligence-based integrated log management method and system according to an embodiment of the present invention may provide an interface through which the correlations between data can easily be searched with graph functions in response to a user query.

In step 340, the detector 440 detects malicious and abnormal behavior by analyzing the associations of attacks, failures, and the like based on the relational graph.

The detector 440 analyzes mutual correlations and the associations of attacks, failures, and the like based on the relational graph, and may detect malicious and abnormal behavior according to behavior rules predefined by a user.

For example, based on the results of analyzing the mutual correlations and associations, the detector 440 may identify malicious and abnormal behavior by detecting behavior that resembles an attack pattern given by the predefined behavior rules. In doing so, the detector 440 may use a machine learning-based similarity analysis technique to analyze the similarity between the observed behavior and the predefined behavior, and thereby determine whether the behavior is an attack, such as malicious or abnormal behavior.
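The idea of matching observed behavior against predefined patterns by similarity rather than exact equality can be sketched as below. Jaccard similarity over sets of edge labels is used here as a simple stand-in; the description calls for a machine learning-based similarity technique and does not name a specific measure. The function names, the threshold value, and the example pattern table are assumptions.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def best_match(observed: set, patterns: dict, threshold: float = 0.5):
    """Return (pattern_name, score) for the closest predefined pattern.

    The name is None when no pattern reaches the threshold.
    """
    if not patterns:
        return (None, 0.0)
    score, name = max((jaccard(observed, edges), name)
                      for name, edges in patterns.items())
    return (name, score) if score >= threshold else (None, score)


# Assumed pattern table: edge labels that characterize a known attack.
PATTERNS = {
    "botnet propagation": {"suspicious connection to",
                           "malware propagation to"},
}
observed = {"suspicious connection to", "malware propagation to", "port scan"}
name, score = best_match(observed, PATTERNS)
```

Because the observed set contains one label that the pattern lacks, an exact comparison would fail, whereas the similarity score still clears the assumed threshold.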

Then, in step 350, the system controller 470 may determine, based on the detection result, whether the behavior is a new malicious or abnormal behavior rather than one of the predefined behaviors stored and maintained in the graph-based database 460. If the system controller 470 determines that the behavior is at least one of a new malicious behavior and a new abnormal behavior, then in step 360 the communication unit 450 may report security threat information about the new malicious or abnormal behavior to the administrator (Admin).

Here, the communication unit 450 may communicate with the administrator through a network. The network connects the artificial intelligence-based integrated log management system 400 according to an embodiment of the present invention with the plurality of security devices and the administrator, and may be a closed network such as a local area network (LAN) or a wide area network (WAN), or an open network such as the Internet. Here, the Internet refers to the worldwide open computer network architecture that provides the TCP/IP protocol and the various services above it, namely HTTP (HyperText Transfer Protocol), Telnet, FTP (File Transfer Protocol), DNS (Domain Name System), SMTP (Simple Mail Transfer Protocol), SNMP (Simple Network Management Protocol), NFS (Network File System), and NIS (Network Information Service). In addition, the network is not limited to wired or wireless and may be combined with various wireless network technologies such as LTE (Long Term Evolution) and WiBro.

FIG. 5 is a structural diagram of an artificial intelligence-based integrated log management system according to an embodiment of the present invention.

Referring to FIG. 5, the artificial intelligence-based integrated log management system 500 according to an embodiment of the present invention collects security logs from a plurality of security devices 510, preprocesses the log data, which is unstructured, and then performs log analysis 520, which applies the natural language processing (NLP) technique to extract the security logs provided by the different security devices 510 in a structured form.

The artificial intelligence-based integrated log management system 500 may perform the log analysis 520 using computing resources 521 that include a preset NLP technique and predefined semantics, and derives the features needed for attack detection (feature derivation) from the structured data extracted by the log analysis 520.

The artificial intelligence-based integrated log management system 500 then analyzes (530) the correlations between the features used to detect attacks such as malicious and abnormal behavior. Through this correlation analysis 530, the system 500 generates (graph generation) a relational graph 531 that represents the relationships between the logs and alerts produced independently by each of the distributed security devices 510.

Accordingly, the artificial intelligence-based integrated log management system 500 stores the generated relational graph 531 in the graph-based database and, using the machine learning-based similarity analysis technique, analyzes (relational analysis) the mutual correlations between the stored relational graph 531 and subsequently occurring behavior, as well as the associations of attacks, failures, and the like, to detect (540) the similarity to attacks such as malicious and abnormal behavior. Even when the observed behavior does not exactly match a defined relational graph 531, the similarity analysis technique can raise the attack detection rate by matching 'highly probable behavior' to the predefined relational graph 531.

In the long term, the artificial intelligence-based integrated log management system 500 may gather the opinions of experts (or administrators) to refine the system itself and its detection rules, thereby reducing false positives and raising the detection rate. Accordingly, the system 500 may report (541) the detection results 540 on the similarity to attacks such as malicious and abnormal behavior to an expert (or administrator), and the system rules may be modified and supplemented based on the expert's judgment.

That is, as shown in FIG. 5, the artificial intelligence-based integrated log management system 500 according to an embodiment of the present invention collects various security logs and, based on them, can build a knowledge base (KB) for attack detection.

In addition, the artificial intelligence-based integrated log management system 500 according to an embodiment of the present invention may provide automated attack detection based on predefined rules. Without direct user intervention, it may also provide early detection of and alerts for violating behavior through similarity analysis between a predefined rule (or predefined behavior) and the behavior that actually occurs.

FIG. 6 shows an example of botnet propagation, and FIG. 7 shows an example of botnet propagation identified through correlation analysis according to an embodiment of the present invention.

Referring to FIG. 6, for hosts 610 infected with ransomware, a conventional SIEM system forwards to the administrator 650 information such as the 'suspicious connection' detected by the intrusion detection system (IDS) 620 and the 'malicious payload' detection provided by the intrusion prevention system (IPS) 640. In other words, a conventional SIEM system describes 'facts' but does not analyze how those facts affect overall security or what 'associations' exist among the various security events. As a result, a conventional SIEM system can only support limited responses driven by notifications, such as 'restrict the suspicious connection' or 'block the host performing malicious behavior'.

Referring to FIG. 7, the artificial intelligence-based integrated log management system according to an embodiment of the present invention represents the correlations between nodes using the facts 'suspicious connection to' and 'malware propagation to', and can derive a new result, 'botnet propagation', through relational analysis of that information.

In FIG. 7, the artificial intelligence-based integrated log management system according to an embodiment of the present invention identifies the C&C server 730 from the 'suspicious connection' detected by the intrusion detection system (IDS) 720, and can derive new information from the fact that the hosts 710 accessing the C&C server 730 are distributing malicious code.

That is, the artificial intelligence-based integrated log management system according to an embodiment of the present invention analyzes the correlations among the large number of existing logs to derive the correlations between events, and can ultimately analyze the underlying reason each event was detected. Accordingly, rather than a simple response such as 'block the host performing malicious behavior' or 'restrict the suspicious connection', the system can address the fundamental security problem by blocking at the source the C&C server 730 that the hosts 710 are maliciously trying to reach, or by identifying which of the hosts 710 was the initial point of the attack.
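The reasoning described for FIG. 7 can be sketched as a small rule over graph edges. The host and server names, the function `infer_botnet`, and the specific rule (a node contacted by at least two hosts, at least one of which also spreads malware, is treated as a C&C candidate) are assumptions made for illustration and are not taken from the figures.

```python
from collections import defaultdict

# Edges as (source, relationship, destination); node names are hypothetical.
edges = [
    ("host_a", "suspicious connection to", "server_x"),
    ("host_b", "suspicious connection to", "server_x"),
    ("host_a", "malware propagation to", "host_b"),
]


def infer_botnet(edges, min_hosts=2):
    """Derive 'botnet propagation' findings from individual edge facts."""
    contacts = defaultdict(set)        # candidate server -> contacting hosts
    spreaders, infected = set(), set()
    for src, label, dst in edges:
        if label == "suspicious connection to":
            contacts[dst].add(src)
        elif label == "malware propagation to":
            spreaders.add(src)
            infected.add(dst)
    findings = []
    for server, hosts in sorted(contacts.items()):
        if len(hosts) >= min_hosts and hosts & spreaders:
            # Hosts that spread malware but were not themselves infected
            # by another host are candidate initial points of the attack.
            origins = sorted((hosts & spreaders) - infected)
            findings.append({"c2_candidate": server,
                             "hosts": sorted(hosts),
                             "origin_candidates": origins})
    return findings
```

Neither edge type alone says anything about a botnet; the finding only appears once the two kinds of fact are joined on the hosts they share, which is the point the description makes about correlation.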

FIG. 9 is a diagram illustrating an example of detecting malicious behavior using reputation-based behavior analysis according to an embodiment of the present invention.

The artificial intelligence-based integrated log management system 900 according to an embodiment of the present invention performs reputation-based behavior analysis on the security logs collected from the security device 910, and may classify the reputation of a user's behavior in stages to provide a final conclusion on malicious and abnormal behavior.

System resources are diverse, and each user accesses and uses them differently. This diversity makes generalization harder. For this reason, the artificial intelligence-based integrated log management system 900 according to an embodiment of the present invention may perform reputation-based behavior analysis and provide, in stages, a reputation function for the malicious and benign behavior of each user.

Reputation-based behavior analysis reaches its final conclusion by periodically tracking how benign or malicious a target host is. For example, even if a target host behaves maliciously at a particular moment, if that host has a good reputation, the artificial intelligence-based integrated log management system 900 according to an embodiment of the present invention may adjust the priority of the detected attack, such as malicious or abnormal behavior, before reporting (921) it to the administrator 920. This function can reduce both the overall volume of detection logs and the number of false positives.
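One way to picture periodic reputation tracking is an exponentially weighted score per host that sets the reporting priority. The description does not specify how reputation is computed, so the class `Reputation`, the smoothing factor, the initial score, and the priority thresholds below are all assumptions.

```python
class Reputation:
    """Per-host reputation in [0, 1]; higher means more benign."""

    def __init__(self, alpha: float = 0.2, initial: float = 0.5):
        self.alpha = alpha      # weight given to the newest observation
        self.initial = initial  # score assumed for a host never seen before
        self.scores = {}

    def observe(self, host: str, benign: bool) -> float:
        """Update and return the host's score after one observation."""
        previous = self.scores.get(host, self.initial)
        target = 1.0 if benign else 0.0
        self.scores[host] = (1 - self.alpha) * previous + self.alpha * target
        return self.scores[host]

    def priority(self, host: str) -> str:
        """Map the current score to a reporting priority (assumed bands)."""
        score = self.scores.get(host, self.initial)
        if score >= 0.7:
            return "low"
        if score >= 0.4:
            return "medium"
        return "high"


rep = Reputation()
for _ in range(10):
    rep.observe("trusted_host", benign=True)
rep.observe("trusted_host", benign=False)   # a single malicious event
rep.observe("unknown_host", benign=False)
rep.observe("unknown_host", benign=False)
```

With these assumed parameters, one malicious event from a host with a long benign history is still reported at low priority, while the same kind of event from a host with no history is escalated.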

Furthermore, the administrator 920 may provide behavior rules to the security device 910 and may update the rules in the artificial intelligence-based integrated log management system 900 in real time.

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications executed on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to the execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but those skilled in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied, permanently or temporarily, in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to it. The software may also be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiments may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, those skilled in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a different form from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the claims that follow.

500: integrated log management system
610, 710: host
620, 720: intrusion detection system (IDS)
630, 730: C&C server
640, 740: intrusion prevention system (IPS)
650, 750, 920: administrator (Admin)
910: security device
921: report

Claims

A method of operating a security information and event management (SIEM) system, or integrated log management system, that incorporates artificial intelligence, the method being an AI-based integrated log management method comprising:
collecting security logs from a plurality of security devices;
analyzing the security logs for attack behavior detection using natural language processing (NLP) and semantics;
generating a relational graph by analyzing correlations between a plurality of features obtained from the analysis result; and
detecting malicious and abnormal behavior by analyzing the associations of attacks or failures based on the relational graph.

The method of claim 1, further comprising:
reporting security threat information on newly detected malicious or abnormal behavior according to the detection result.

The method of claim 1,
wherein collecting the security logs comprises
collecting, from the plurality of security devices, the security logs together with audit information including at least one of a timestamp at which each log was generated, an IP address, and port information.

The method of claim 1,
wherein analyzing the security logs comprises
applying a preprocessing technique to the security logs and then analyzing, using the natural language processing technique and the semantics, a graph form for a graph data structure.

The method of claim 4,
wherein analyzing the security logs comprises
sampling a predetermined amount of data from the natural language data (corpus) derived from the preprocessed security logs using the natural language processing technique, annotating patterns based on grammatical or semantic information of the extracted data, and training a machine learning or artificial intelligence algorithm on the patterns to generate an NLP model based on the graph form.

The method of claim 5,
wherein generating the relational graph comprises:
deriving, through the NLP model, the plurality of features, which form a set of evidence for detecting attack behavior, from the security logs collected from the plurality of security devices; and
generating and managing the relational graph by analyzing the correlations of the plurality of features through correlation analysis.

The method of claim 6,
wherein generating and managing the relational graph comprises
generating the relational graph, in which data is represented by subject nodes (vertices), relationships between the subject nodes (edges), and properties of the nodes and relationships, through a process of obtaining common denominators for the plurality of features, analyzing associations and correlations between the plurality of features, and deriving the explicit and implicit relationships of the security logs to create the nodes that are the subjects of the relationships.

The method of claim 7, wherein
generating and managing the relational graph comprises,
in the process of creating the nodes, merging duplicated information in the security log into one and additionally deriving, through the correlation analysis, relationships that cannot be obtained from a single log, thereby specifying the relationships.
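
The vertex, edge, and property structure described in the preceding claims, together with duplicate merging and cross-log relationship derivation, can be sketched as below. This is a minimal in-memory illustration: the `RelationalGraph` class, the record keys (`device`, `user`, `ip`, `host`), and the relation names are all assumptions, and a shared IP address is used as one possible common denominator.

```python
from collections import defaultdict


class RelationalGraph:
    """Hypothetical in-memory property graph: vertices, edges, properties."""

    def __init__(self):
        self.vertices = {}              # vertex id -> property dict
        self.edges = defaultdict(dict)  # (src, relation, dst) -> property dict

    def add_vertex(self, vertex_id, **props):
        # Duplicated information about the same subject merges into one node.
        self.vertices.setdefault(vertex_id, {}).update(props)

    def add_edge(self, src, relation, dst, **props):
        self.add_vertex(src)
        self.add_vertex(dst)
        self.edges[(src, relation, dst)].update(props)


def build_graph(records):
    """Build a graph from dicts with assumed keys 'device', 'user', 'ip', 'host'."""
    graph = RelationalGraph()
    ip_to_users = defaultdict(set)
    ip_to_hosts = defaultdict(set)
    for rec in records:
        ip = rec.get("ip")
        if ip is None:
            continue
        if "user" in rec:
            graph.add_edge(rec["user"], "logged_in_from", ip, source=rec.get("device"))
            ip_to_users[ip].add(rec["user"])
        if "host" in rec:
            graph.add_edge(ip, "connected_to", rec["host"], source=rec.get("device"))
            ip_to_hosts[ip].add(rec["host"])
    # Relationship absent from any single log: user -> host via a shared IP.
    for ip, users in ip_to_users.items():
        for user in users:
            for host in ip_to_hosts.get(ip, ()):
                graph.add_edge(user, "accessed", host, via=ip, derived=True)
    return graph


records = [
    {"device": "vpn", "user": "alice", "ip": "10.0.0.5"},
    {"device": "firewall", "ip": "10.0.0.5", "host": "db-server"},
    {"device": "vpn", "user": "alice", "ip": "10.0.0.5"},  # duplicate entry
]
graph = build_graph(records)
```

In this example the duplicate VPN entry does not create a second node or edge, and the `accessed` edge between `alice` and `db-server` exists only because the two logs were correlated.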

The method of claim 8, wherein
generating and managing the relational graph comprises
using a graph database (Graph DB) for the actual construction of the relational graph and for efficient data management.

The method of claim 1, wherein
detecting the malicious and abnormal behavior comprises
analyzing, based on the relational graph, mutual correlations and associations with attacks or failures, and detecting malicious and abnormal behavior according to behavior rules predefined by a user.
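
One way to picture rule-based detection over the graph is to treat a user-defined behavior rule as an ordered sequence of relations and search for matching paths. This is a sketch under assumptions: the edge tuples, the relation names, the `remote_db_access` rule, and the `find_paths` helper are hypothetical, and the patent does not state how rules are encoded.

```python
from collections import defaultdict


def find_paths(edges, relations):
    """Return every vertex path whose consecutive edges match `relations` in order."""
    out = defaultdict(list)  # (src, relation) -> [dst, ...]
    for src, relation, dst in edges:
        out[(src, relation)].append(dst)
    # Start from every vertex that has an outgoing edge of the first relation.
    starts = sorted({src for src, relation, _ in edges if relation == relations[0]})
    paths = [[start] for start in starts]
    for relation in relations:
        paths = [p + [dst] for p in paths for dst in out.get((p[-1], relation), [])]
    return paths


# Hypothetical graph edges as (source, relation, destination) tuples.
edges = [
    ("mallory", "failed_login", "vpn-gw"),
    ("mallory", "logged_in_from", "203.0.113.7"),
    ("203.0.113.7", "connected_to", "db-server"),
    ("alice", "logged_in_from", "10.0.0.5"),
]

# Hypothetical user-defined behavior rule: a login followed by a connection.
rules = {"remote_db_access": ["logged_in_from", "connected_to"]}

alerts = {name: find_paths(edges, relations) for name, relations in rules.items()}
```

Only the `mallory` path satisfies both steps of the rule, so `alice` produces no alert.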

The method of claim 10, wherein
detecting the malicious and abnormal behavior comprises
performing reputation-based behavior analysis and providing a final conclusion on the malicious and abnormal behavior by classifying the reputation of the user's behavior in stages.
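
Staged reputation classification could, for example, accumulate a score from observed behaviors and map it onto ordered stages. The weights in `BEHAVIOR_WEIGHTS`, the thresholds and labels in `STAGES`, and both helper functions are assumptions for illustration; the patent does not give concrete values or a scoring formula.

```python
# Hypothetical weight per observed behavior; the patent specifies no values.
BEHAVIOR_WEIGHTS = {"failed_login": 1, "port_scan": 3, "privilege_escalation": 5}

# Hypothetical stages, ordered by the minimum score that triggers each one.
STAGES = [(0, "normal"), (3, "suspicious"), (6, "malicious")]


def reputation_score(behaviors):
    """Sum the weights of the observed behaviors; unknown behaviors count as 0."""
    return sum(BEHAVIOR_WEIGHTS.get(b, 0) for b in behaviors)


def classify(behaviors):
    """Return the label of the highest stage whose threshold the score reaches."""
    score = reputation_score(behaviors)
    verdict = STAGES[0][1]
    for threshold, label in STAGES:
        if score >= threshold:
            verdict = label
    return verdict


low = classify(["failed_login", "failed_login"])
mid = classify(["failed_login", "port_scan"])
high = classify(["port_scan", "privilege_escalation"])
```

With these assumed weights, the three examples score 2, 4, and 8 and fall into the normal, suspicious, and malicious stages respectively.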

A computer program stored in a computer-readable recording medium for performing the method of any one of claims 1 to 11.

An artificial intelligence-based integrated log management system, being a security information and event management (Security Information and Event Management; SIEM) system that incorporates artificial intelligence, the system comprising:
a log collector configured to collect security logs from a plurality of security devices;
a log analyzer configured to analyze the security logs for attack behavior detection using natural language processing (Natural Language Processing; NLP) and semantics (Semantic);
a graph generator configured to generate a relational graph (Relational Graph) by analyzing correlations among a plurality of features acquired based on the analysis result; and
a detection unit configured to detect malicious and abnormal behavior by analyzing associations with attacks or failures based on the relational graph.

The system of claim 13,
further comprising a communication unit configured to report security threat information about newly detected malicious or abnormal behavior according to the detection result.

The system of claim 13, wherein
the graph generator comprises:
a feature derivation unit configured to derive the plurality of features, which are a set of grounds for detecting attack behavior, from the security logs collected from the plurality of security devices through the natural language processing model; and
a graph management unit configured to generate and manage the relational graph by analyzing correlations among the plurality of features through correlation analysis.