KR20230073056A

KR20230073056A - Malicious event log automatic analysis device and method

Info

Publication number: KR20230073056A
Application number: KR1020210191721A
Authority: KR
Inventors: 한승철; 방효섭; 고명수; 선동환
Original assignee: 주식회사 엔피코어
Priority date: 2021-11-18
Filing date: 2021-12-29
Publication date: 2023-05-25

Abstract

Disclosed are a method and an apparatus for automatically analyzing malicious event logs, which are endpoint threat events, by applying an artificial intelligence (AI)-based chatbot engine. The automatic analysis method processes event logs by grouping them per process for efficient analysis; generates sentences describing process behavior using natural language processing (NLP) techniques for AI analysis; converts malicious behaviors observed in actual operation into data and uses them as training data for the AI model; implements a transformer-based learning scheme to train on that data; and, to analyze actual malicious behavior, checks the grouped event logs in real time with an automatic event maliciousness analyzer based on BERT and LSTM algorithms, detecting a target of malicious behavior whose similarity is at or above a certain level and automatically judging it to be a threat event.

Description

Malicious event log automatic analysis device and method {MALICIOUS EVENT LOG AUTOMATIC ANALYSIS DEVICE AND METHOD}

The present invention relates to automatic analysis of malicious event logs and, more particularly, to a method and an apparatus for automatically analyzing malicious event logs, which are endpoint threat events, by applying an artificial intelligence (AI)-based chatbot engine.

As information and communication infrastructure continues to expand, security threats of various forms that exploit information technology (IT) infrastructure are increasing rapidly. In recent years, advanced persistent threats (APTs) have frequently led to leaks of key corporate information through malicious code, and access to non-business sites by internal users has weakened work concentration and caused further information leaks, so countermeasures are urgently needed.

Responding to security threats that continue to cause damage requires analyzing vast amounts of malicious code and threat-related data, and analyzing that data effectively calls for the development of a malware learning and classification module that analyzes images through automatic learning.

Meanwhile, overseas and domestic APT solution vendors working on existing malware detection and response technologies each offer their own forms of response solution for new and variant malware, but responding in real time to security threats that are growing in volume and sophistication remains difficult.

Accordingly, developers of endpoint detection and response (EDR) solutions who are researching machine-learning-based malware detection are actively using artificial intelligence (AI) technology to detect some cyber threats that existing signature-based or rule-based solutions fail to detect.

Endpoint incident response solutions are also being actively adopted; these collect information from endpoint devices and use correlation analysis and machine learning to automatically detect and respond to signs of cyber attacks in real time.

In addition, various threat intelligence tools for distributed remote forensics are being developed as a proactive response to malicious behavior on business personal computers (PCs). Various static application security testing (SAST) tools, which can test the security of software applications written in languages such as C/C++, Ruby, and Python, are also under development.

However, to analyze and respond effectively to the malicious code and threat attacks described above, it is most desirable to analyze the malicious code directly inside an agent installed on a PC, terminal, server, or similar device, yet no solution of that form has been proposed so far.

An object of the present invention is to provide an endpoint security solution that uses a chatbot engine to automatically analyze and detect threat events at the endpoint.

Another object of the present invention is to provide an automatic endpoint analysis device and method that build an Elasticsearch database management system (DBMS) as a large-capacity database (DB) for constructing event log datasets, build a learning scheme based on a transformer network and a long short-term memory (LSTM) network for automatic analysis and detection of threat event logs, provide an automated behavior event analysis function that uses text classification on that basis to classify malicious behavior events effectively, and perform malicious event detection in conjunction with endpoint event detection rules.

Still another object of the present invention is to provide an AI-based endpoint detection and response (EDR) solution to which a chatbot engine is applied.

Still another object of the present invention is to provide an automatic endpoint analysis device and method that can analyze malicious event logs effectively by building an AI module and an AI database for storing event logs on an analysis server; installing an agent on a user PC; having the agent analyze events on the user PC by rule matching and transmit the analysis results to the AI database of the analysis server; grouping events using process identifier (process ID) and session information; and generating sentences in the order of process behavior and sending them to the AI database to build a dataset.

Still another object of the present invention is to provide an automatic endpoint analysis device and method that apply AI technology to collect threat event data and event logs, group them by process identifier and session identifier so that they are organized per process, automatically analyze threat processes on an AI basis, automatically judge threat events from that analysis, and provide the analysis results of the transformer network together with a process tree as the basis for detection.

To solve the above technical problem, an automatic malicious event log analysis device according to one aspect of the present invention is an endpoint detection and response (EDR) device for automatic event analysis and detection to which an artificial intelligence (AI)-based chatbot engine is applied. The device includes a processor and a memory, and at least one instruction stored in the memory may configure the processor to: process event logs by grouping them per process for efficient analysis; generate sentences describing process behavior using natural language processing (NLP) techniques for AI analysis; convert malicious behaviors observed in actual operation into data and use them as training data for the AI model; implement a transformer-based learning scheme to train on that data; and, to analyze actual malicious behavior, check the grouped event logs in real time with an automatic event maliciousness analyzer based on bidirectional encoder representations from transformers (BERT) and long short-term memory (LSTM) algorithms, detecting a target of malicious behavior whose similarity is at or above a certain level and automatically judging it to be a threat event.

In one embodiment, the deep-learning-based chatbot engine for event log classification operates by generating and labeling behavior sentences used to learn threat process behavior. It may be designed with an optimal number of layers chosen by performance, starting from a baseline of six encoder and decoder layers in the transformer architecture; may be configured to adjust the learning rate over the course of training to optimize the cost of model training; and may be configured as a model combining a transformer-based BERT model with a dense layer. The model used may be the best one according to an F1-score comparison between traditional recurrent neural network (RNN) models based on LSTM and gated recurrent units (GRU) and transformer-based models.
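The F1-score comparison mentioned above can be sketched as follows. This is a minimal illustration, not the patent's evaluation code: the function names `f1_score` and `pick_best`, the model labels, and the toy predictions are all assumptions made for the example, and label 1 is assumed to mean "malicious".

```python
def f1_score(y_true, y_pred, positive=1):
    """Compute the F1-score for the positive (assumed: malicious) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_best(y_true, predictions):
    """Return the name of the model with the highest F1-score, plus all scores."""
    scores = {name: f1_score(y_true, pred) for name, pred in predictions.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 0, 0, 1]
predictions = {
    "lstm": [1, 0, 0, 1, 1],
    "bert_dense": [1, 1, 0, 0, 0],
}
best, scores = pick_best(y_true, predictions)
print(best, scores)
```

In practice the predictions would come from each trained candidate evaluated on the same held-out set of labeled behavior sentences, so that the scores are directly comparable.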

In one embodiment, the AI module for event analysis may be linked with an EDR system. Here, the AI module may be linked with a cloud-based EDR solution to provide an integrated service.

In one embodiment, the AI module may be managed by an appliance-type management server deployed at each site, or may be served directly over the network from the cloud.

To solve the above technical problem, an automatic malicious event log analysis method according to another aspect of the present invention is a method of analyzing malicious event logs in which an AI module and a database for storing event logs are built on an integrated analysis server, an agent is installed on an endpoint, and the agent performs rule matching on event logs. The method includes: grouping event logs using process identifier (process ID) and session information; generating sentences in the order of process behavior based on the event logs; and automatically analyzing malicious event logs at the endpoint based on a dataset containing the sentences generated in behavior order.
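The grouping and sentence-generation steps can be sketched roughly as below. The source does not specify a log schema or a sentence template, so the field names (`session_id`, `pid`, `ts`, `image`, `action`, `target`), the sample events, and the helper names `group_events` and `to_sentence` are all hypothetical.

```python
from collections import defaultdict


def group_events(events):
    """Group events by (session_id, pid) and order each group by timestamp."""
    groups = defaultdict(list)
    for ev in events:
        groups[(ev["session_id"], ev["pid"])].append(ev)
    return {key: sorted(evs, key=lambda ev: ev["ts"]) for key, evs in groups.items()}


def to_sentence(group):
    """Render one process's events, in behavior order, as a plain-text sentence sequence."""
    return " ".join(f'{ev["image"]} {ev["action"]} {ev["target"]}.' for ev in group)


# Hypothetical event records, for illustration only.
events = [
    {"session_id": 1, "pid": 4120, "ts": 2, "image": "invoice.exe",
     "action": "file_write", "target": "report.docx"},
    {"session_id": 1, "pid": 4120, "ts": 1, "image": "invoice.exe",
     "action": "process_create", "target": "cmd.exe"},
    {"session_id": 1, "pid": 5008, "ts": 3, "image": "notepad.exe",
     "action": "file_read", "target": "notes.txt"},
]

groups = group_events(events)
sentences = {key: to_sentence(evs) for key, evs in groups.items()}
for key, text in sentences.items():
    print(key, text)
```

Keying on the session and the process ID together, rather than the process ID alone, is one way to keep events apart when an operating system reuses a process ID in a later session; whether the patent intends exactly this is an assumption.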

To solve the above technical problem, an automatic malicious event log analysis method according to still another aspect of the present invention includes: processing event logs by grouping them per process for efficient analysis; generating sentences describing process behavior using natural language processing (NLP) techniques for AI analysis; converting malicious behaviors observed in actual operation into data and using them as training data for the AI model; creating a transformer-based learning scheme to train on that data; checking the grouped event logs in real time with an automatic event maliciousness analyzer based on BERT and LSTM algorithms in order to analyze actual malicious behavior; and detecting a target of malicious behavior whose similarity is at or above a certain level and automatically judging it to be a threat event.

To solve the above technical problem, an automatic malicious event log analysis device according to still another aspect of the present invention is a device for analyzing malicious event logs in which an AI module and a database for storing event logs are built on an integrated analysis server, an agent is installed on an endpoint, and the agent performs rule matching on event logs. The device includes a processor and a memory storing at least one instruction executed by the processor, and the at least one instruction configures the processor to perform: grouping event logs using process identifier (process ID) and session information; generating sentences in the order of process behavior based on the event logs; and automatically analyzing malicious event logs at the endpoint based on a dataset containing the sentences generated in behavior order.

To solve the above technical problem, an automatic malicious event log analysis device according to still another aspect of the present invention is an automatic endpoint analysis device for endpoint detection and response (EDR) to which an artificial intelligence (AI)-based chatbot engine is applied. The device includes a processor and a memory, and at least one instruction stored in the memory configures the processor to: process event logs by grouping them per process for efficient analysis; generate sentences describing process behavior using natural language processing (NLP) techniques for AI analysis; convert malicious behaviors observed in actual operation into data and use them as training data for the AI model; implement a transformer-based learning scheme to train on that data; and, to analyze actual malicious behavior, check the grouped event logs in real time with an automatic event maliciousness analyzer based on bidirectional encoder representations from transformers (BERT) and long short-term memory (LSTM) algorithms, detecting a target of malicious behavior whose similarity is at or above a certain level and automatically judging it to be a threat event.

In one embodiment, the transformer model for the transformer algorithm has encoder layers and decoder layers. In the positional encoding of the input-side layer, when the position within the embedding dimension of each word's embedding vector, plus 1, is odd, the model may compute the word position using Equation 1 below, which is a cosine (cos) function, and thereby generate position information for the malicious behavior.

In one embodiment, the transformer model for the transformer algorithm has encoder layers and decoder layers. In the positional encoding of the input-side layer, when the position within the embedding dimension of each word's embedding vector, plus 1, is even, the model may compute the word position using Equation 2 below, which is a sine (sin) function, and thereby generate position information for the malicious behavior.

In one embodiment, the decoder structure containing the decoder layers is trained to predict the word at each time step from the sentence matrix, and may be configured to apply, as the first of the decoder sublayers, a layer that masks words at future time steps so that future words cannot be referenced.
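The masking of future time steps is commonly implemented as a look-ahead (causal) mask applied to the attention scores before the softmax. The sketch below shows that general technique in plain Python; it is an assumption that the patent's decoder follows this standard form, and the helper names `look_ahead_mask`, `apply_mask`, and `softmax` are illustrative.

```python
import math


def look_ahead_mask(n):
    """mask[t][s] is True when position t may attend to position s (s <= t)."""
    return [[s <= t for s in range(n)] for t in range(n)]


def apply_mask(scores, mask, neg=float("-inf")):
    """Replace the scores of disallowed (future) positions with -inf."""
    return [[sc if ok else neg for sc, ok in zip(row, mrow)]
            for row, mrow in zip(scores, mask)]


def softmax(row):
    """Numerically stable softmax; -inf entries receive weight 0."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Uniform raw scores for a 3-token sequence, for illustration only.
scores = [[0.0, 0.0, 0.0] for _ in range(3)]
mask = look_ahead_mask(3)
weights = [softmax(row) for row in apply_mask(scores, mask)]
for row in weights:
    print(row)
```

Each row of `weights` sums to 1 over the current and earlier positions only, so the prediction at time step t cannot draw on words that come after it.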

According to the present invention described above, the device or method for automatic analysis of malicious event logs using an AI-based chatbot engine can be applied as the core engine of an endpoint detection and response (EDR) solution, can serve as the analysis module of any solution that needs to classify and analyze large volumes of logs efficiently and quickly, and can be used as a solution in which AI automatically determines whether a breach has occurred when threat behavior arises, even without a professional analyst.

Furthermore, automatic analysis using AI according to the present invention is not limited to endpoint event log analysis. Applied to a visualization platform for testing malicious code, it can considerably increase the reliability of analysis results for malicious code and malicious behavior, and it can be applied immediately to cloud-based security service products such as security as a service (SECaaS).

The automatic malicious event log analysis device and method of the present invention described above can be applied effectively in the following fields. In APT response, they can be applied to on-premise solutions for responding to advanced persistent threats. In EDR solutions, they can be applied to endpoint event behavior analysis, root cause analysis, and response solutions. In SECaaS, they can be applied to cloud-based APT and ransomware response solutions. In security monitoring, they can be applied to the visualization of endpoint threat information for security monitoring automation.

FIG. 1 is a diagram for explaining the main configuration and operating principle of an automatic event analysis and detection system (hereinafter simply the 'automatic analysis device') to which an artificial intelligence (AI)-based chatbot engine is applied, according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the AI analysis process for breach threats applied to endpoint detection and response (EDR), which can be employed in the automatic analysis device of FIG. 1.
FIG. 3 is a diagram for explaining an AI-based threat behavior detection function applied to EDR, which can be employed in the automatic analysis device of FIG. 1.
FIG. 4 is a schematic illustration of the application of an AI analysis module that can be employed in the automatic analysis device of FIG. 1.
FIG. 5 is a schematic diagram for explaining an AI engine for chatbot-engine-based automatic analysis of threat event logs, which can be employed in the automatic analysis device of FIG. 1.
FIG. 6 is a schematic configuration diagram of an EDR system with an AI module to which a chatbot engine is applied, which can be employed in the automatic analysis device of FIG. 1.
FIG. 7 is a diagram for explaining the interface linking an EDR solution and an AI module, which can be employed in the automatic analysis device of FIG. 1.
FIG. 8 is a schematic block diagram of a configuration applying an AI module for event analysis, which can be employed in the automatic analysis device of FIG. 1.
FIG. 9 is an exemplary diagram for explaining the process of combining session IDs and process IDs to group events by process in an automatic malicious event log analysis method (hereinafter simply the 'automatic analysis method') according to another embodiment of the present invention.
FIG. 10 is an exemplary diagram for explaining a malicious event in which normal behaviors combine into ransomware behavior, in the automatic analysis method of FIG. 9.
FIG. 11 is an exemplary diagram for explaining a malicious event in which normal behaviors combine into network information collection or information transfer, in the automatic analysis method of FIG. 9.
FIG. 12 is an exemplary diagram for explaining a malicious event that does not include terminated as an event type, in the automatic analysis method of FIG. 9.
FIG. 13 is a block diagram of the architecture of a transformer model network that can be employed in the automatic analysis method of FIG. 9.
FIG. 14 is an exemplary diagram for explaining the structure of the transformer model of FIG. 13.
FIG. 15 is an exemplary diagram for explaining a text classification structure using BERT and a dense layer, as another configuration that can be employed in the transformer model of FIG. 13.
FIG. 16 is an exemplary diagram of text classification using the LSTM of FIG. 15, which can be used together with or in place of the transformer model of FIG. 13.
FIG. 17 is a block diagram for explaining the forms of security service provided by the automatic analysis device of FIG. 1.
FIG. 18 is a block diagram for explaining the performance evaluation process for AI-based automatic threat event analysis by the automatic analysis device of FIG. 1.
FIG. 19 is a block diagram for explaining the main configuration of an automatic analysis device according to still another embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope. Like reference numerals are used for like elements in describing each drawing.

Terms such as first, second, A, and B may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, without departing from the scope of the present invention, a first component may be termed a second component, and similarly a second component may be termed a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may be present. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" are intended to specify the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless expressly so defined in this application.

Hereinafter, preferred embodiments of the present invention are described in more detail with reference to the accompanying drawings. To aid overall understanding, the same reference numerals are used for the same components in the drawings, and redundant descriptions of the same components are omitted.

FIG. 1 is a diagram for explaining the schematic configuration and operating principle of a device (hereinafter simply the 'automatic analysis device') that automatically analyzes the event logs of malicious events, which are endpoint threat events, by applying an artificial intelligence (AI)-based chatbot engine according to an embodiment of the present invention.

Referring to FIG. 1, the automatic analysis device 100 includes a first module 110 for grouping, a second module 120 for storage, a third module 130 for dataset construction, a fourth module 140 for analysis, and a fifth module 150 for automatic determination.

The first module 110 groups logs collected from a user's personal computer (PC). The second module 120 quantifies and stores the collected event logs or the grouped logs. The third module 130 builds a dataset from the quantified logs. The fourth module 140 automatically analyzes event logs using a transformer model. The fifth module 150 automatically determines threat events based on the results of the automatic event log analysis.

The transformer model used for the transformer algorithm includes an encoder stack and a decoder stack. In the positional encoding (PE) of the input-side layer, when the position of the embedding dimension within each word's embedding vector plus 1 is odd, the model may be configured to compute the position (pos) of the word using Equation 1 below, which is a cosine (cos) function, thereby generating position information for the malicious behavior.

In Equation 1, D denotes the number of input dimensions of the event analyzer, and i denotes the index of the input dimension.

Likewise, in the positional encoding of the input-side layer, when the position of the embedding dimension within each word's embedding vector plus 1 is even, the transformer model may be configured to compute the position of the word using Equation 2 below, which is a sine (sin) function, thereby generating position information for the malicious behavior.

In Equation 2, D denotes the number of input dimensions of the event analyzer, and i denotes the index of the input dimension.
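Equations 1 and 2 themselves are not reproduced in this text. As a rough illustration only, the sketch below assumes they follow the standard sinusoidal positional encoding used with transformers, in which each pair of dimensions (2i, 2i+1) shares one frequency, the even-indexed dimension uses sine, and the odd-indexed dimension uses cosine. The base 10000, the function name `positional_encoding`, and this parity convention are assumptions, not details confirmed by the description.

```python
import math
from typing import List

def positional_encoding(pos: int, d_model: int, base: float = 10000.0) -> List[float]:
    """Return the sinusoidal position vector for one token position.

    Assumption: dimension index k uses sin when k is even and cos when k is odd;
    dimensions 2i and 2i+1 share the frequency base ** (-2i / d_model).
    """
    vec = []
    for k in range(d_model):
        i = k // 2  # pair index shared by dimensions 2i and 2i+1
        angle = pos / (base ** (2 * i / d_model))
        vec.append(math.sin(angle) if k % 2 == 0 else math.cos(angle))
    return vec

# Position vector for the fourth word (pos = 3) of an action sentence,
# with D = 6 input dimensions (illustrative values).
pe = positional_encoding(3, 6)
```

Each position vector would then be added to the corresponding word embedding before the first encoder layer.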

The automatic analysis device 100 may be connected to an AI analysis web management system 200 through a network. The AI analysis web management system 200 may be installed on an AI analysis server connected to the user's personal computer, and may include a module 210 that performs user management for administrators, a module 220 that monitors collection and analysis status, a module 230 that manages the collection and analysis system, and a module 240 that manages the AI analysis status.

The automatic analysis device 100 described above is an endpoint detection and response (EDR) device for automatic event analysis and detection using an AI-based chatbot engine. It includes a processor and a memory, and at least one instruction stored in the memory causes the processor to: process logs by grouping them per process for efficient analysis of event logs; generate sentences describing process behavior using natural language processing (NLP) techniques for AI analysis; convert malicious behaviors that actually operate into data and use them as training data for AI model training; implement a transformer algorithm-based learning scheme to train on the training data; and, to analyze actual malicious behavior, check the grouped event logs in real time through an automatic event maliciousness analyzer based on the BERT (bidirectional encoder representations from transformers) and LSTM (long short-term memory) algorithms, detecting a malicious behavior target when its similarity is at or above a certain level and automatically judging it to be a threat event.

The deep learning-based chatbot engine for event log classification operates by generating and labeling action sentences so that the chatbot engine can learn threat process behavior. It is designed with the number of layers that performs best, taking six encoder layers and six decoder layers of the transformer architecture as the baseline. It is configured to adjust the learning rate over the course of training in order to optimize the cost incurred in model training, and it may be configured as a model that combines a transformer-based BERT model with a dense layer. The optimal model may be chosen by comparing the F1-scores of traditional recurrent neural network (RNN) models based on LSTM and GRU (gated recurrent unit) with that of the transformer-based model.
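The F1-score comparison mentioned above can be sketched as follows. The labels, the per-model predictions, and the model names are made-up illustrative values, not results from this description; `f1_score` and `pick_best_model` are hypothetical helper names.

```python
from typing import Dict, List

def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    """Binary F1 for the positive class (1 = malicious)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def pick_best_model(y_true: List[int], predictions: Dict[str, List[int]]) -> str:
    """Return the name of the candidate model with the highest F1-score."""
    scores = {name: f1_score(y_true, pred) for name, pred in predictions.items()}
    return max(scores, key=scores.get)

# Hypothetical validation labels and predictions (illustrative only).
labels = [1, 0, 1, 1, 0, 0]
candidates = {
    "lstm": [1, 0, 0, 1, 0, 1],
    "gru": [1, 1, 0, 1, 0, 1],
    "bert_dense": [1, 0, 1, 1, 0, 1],
}
best = pick_best_model(labels, candidates)
```

F1 is a reasonable yardstick here because malicious events are typically rare, so plain accuracy would reward a model that labels everything as normal.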

In addition, the AI module for event analysis can interoperate with an EDR system. Here, the AI module may be linked with a cloud-based EDR solution to provide an integrated service. Such an AI module may be managed by an appliance-type management server deployed at each site, or may be served directly over the network on a cloud basis.

FIG. 2 is a diagram for explaining an intrusion threat AI analysis process applied to endpoint detection and response (EDR) that can be employed in the automatic analysis device of FIG. 1. FIG. 3 is a diagram for explaining an AI-based threat behavior detection function applied to EDR that can be employed in the automatic analysis device of FIG. 1. FIG. 4 is a schematic example of applying an AI analysis module that can be employed in the automatic analysis device of FIG. 1.

Referring to FIG. 2, in the intrusion threat AI analysis process applied to EDR that can be employed in the automatic analysis device, the EDR system or the analysis system may normalize event logs by group and store them in the analysis system AI DB (S21).

Next, the analysis logs may be grouped by process identifier (process ID) and session (S22).

Next, action sentences may be generated from the event logs (S23).

Next, the action sentences may be analyzed using AI (S24).

Next, threat events may be determined automatically based on the analyzed action sentences (S25).

Next, the automatic determination result may be sent to a web server DB, for example an Elastic DB, the analysis and determination results may be transmitted to the distributed system AI DB, and the results may be sent to a display device so that the threat event tree and analysis information can be visualized (S26, S31).

The event logs stored in the analysis system AI DB may also be delivered to an agent installed on the user's personal computer, where the agent can use them to collect intrusion threat information (S27).

Next, the agent may automatically detect threat behavior based on the collected intrusion threat information (S28).

Next, the event logs collected through automatic threat behavior detection may be reported to the analysis system AI DB (S30).

That is, as shown in FIG. 3, the AI-based threat behavior detection method applied to EDR may be configured to perform, after the automatic threat event determination step (S31), an automatic analysis step (S32), a data collection step (S33), a log grouping step (S34), a threat detection step (S35), and a threat event information visualization step (S36).

In other words, as shown in FIG. 4, the method may be configured to perform, after the step of collecting event logs (S41), a step of generating event action sentences (S42), a step of performing feature extraction based on AI model learning by applying rules (S43), and a step of transmitting the result data and automatically determining whether the event is malicious (S44).

As described above with reference to FIGS. 2 and 4, the AI module for automatic threat event analysis and detection using the chatbot engine, which is a kind of automatic analysis device, can interoperate with an EDR system for endpoint security.

To apply AI to an existing EDR system, the AI module may be configured by performing an interface migration so that event log data previously delivered to the existing web database is delivered to a newly built AI analysis database.

In addition, the AI module may include an Elasticsearch DBMS (DB management system), a large-capacity database (DB), for building the event log dataset.

In addition, the AI module may include a transformer algorithm-based learning scheme for automatic analysis and detection of threat event logs.

In addition, the AI module may include a function for automated behavior event analysis that classifies malicious behavior events by text classification, or a component corresponding to that function.

In addition, the AI module may be configured to detect malicious events in conjunction with endpoint event detection rules.

For the configuration described above, the AI module may collect threat event data; collect event logs and group them by process identifier (process ID) and session identifier (session ID); perform AI-based automatic analysis of threat processes on the per-process groups; automatically determine threat events according to the analysis results; and visualize and display the automatic analysis and determination results together with the process tree through a web user interface (Web UI).

Thus, in this embodiment, event logs can be analyzed automatically and their maliciousness determined automatically, and building an automated event analysis system yields the two advantages of high accuracy and rapid analysis.

FIG. 5 is a schematic diagram for explaining an AI engine for chatbot-based automatic analysis of threat event logs that can be employed in the automatic analysis device of FIG. 1.

Referring to FIG. 5, the AI engine that can be employed in the automatic analysis device is a kind of chatbot engine to which EDR is applied. It may perform a primary analysis of events extracted from the endpoint using rules (S51, S52), store the analysis results in the AI analysis server DB (S53, S54), and have the AI module group the stored data into analysis logs by process identifier (process ID) and session identifier (session ID). The rules may include indicators of compromise (IOC) and MITRE ATT&CK (adversarial tactics, techniques, and common knowledge) attack techniques.

In addition, the AI engine may generate event action sentences that reflect the order of events within each event group, store the generated action sentences in the AI DB, and thereby collect data and build a dataset (S56). This dataset can be used to train a transformer-based model or an RNN-family model, yielding a trained analyzer.

With this configuration, incoming event logs are analyzed automatically by the trained analyzer and classification probabilities are computed; threat events are then determined automatically on that basis; the analysis and determination results are stored in the web server DB; and the transformer-based automatic analysis and determination results, together with the order of actions in the process tree, are visualized and provided to the user through a web user interface manager (web UI manager).

In addition, the AI engine groups the behavior event logs collected through the agent using the process identifier (process ID), session, and the like, and converts threat behavior events into action sentences to build a dataset (S57). Using the behavior event dataset built in this way, threat event logs can be detected automatically from the output of a transformer-based analyzer, which is chosen among text classification techniques for its speed and its ability to parse long sentences as a whole (S58, S59).

Automatically detected threat events may be stored in a web server DB, for example an Elastic DB, and used for automatic threat event determination (S59). The analysis and determination results may be stored in the web server DB or the Elastic DB (S60), and the analysis or determination results may be presented through a user interface (UI) (S61).

FIG. 6 is a schematic configuration diagram of an EDR system incorporating an AI module with a chatbot engine that can be employed in the automatic analysis device of FIG. 1. FIG. 7 is a diagram for explaining the interface linking an EDR solution and the AI module that can be employed in the automatic analysis device of FIG. 1.

Referring to FIG. 6, the AI-based EDR system 100 to which the chatbot engine is applied is a kind of automatic analysis device and may include an AI module 160 and an AI database 170 for storing event logs. Through the AI module, an agent installed on a user PC on the network analyzes the events of the user PC by rule matching, and the analysis results are linked with the AI database 170 of the analysis server and transmitted to the user PC on the network, the analysis server, the Elastic DB 260, and the like. A web browser-based user interface 270 is installed on the Elastic DB 260 to allow access from user PCs and other user terminals and to communicate with the user PC.

The main operating principle of the AI module is described below with reference to FIG. 7. Each process described below may correspond to a component that performs the corresponding function.

When an event occurs, the AI module collects event logs from the agent (71). The collected logs are stored in a specific folder. Next, rule matching is performed by a rule module (72), and the result is stored in a specific folder. The collected logs and matching results may be provided to a user PC or the like over the web through a manager (73), or stored in a DB (74). Here, the manager (73) is built on server hardware or in a cloud environment and, like the DB server, can interoperate with the AI module on a virtualized operating system.

Next, the logs stored in the DB (74) are grouped (75), and action sentences (76) may be generated from the grouped logs. The generated action sentences (76) may be stored in the database (74).

In addition, a machine learning model (77) may be trained on the logs and matching results stored in the DB (74), thereby creating a learning model file (78). The action sentences (76) generated earlier may also be used to create the learning model file.

Next, action sentence analysis results (79) may be generated using the learning model file. Threats are then judged automatically (81) based on the generated action sentence analysis results (79) and a threshold setting (80), and the device may be configured to store the result in the manager DB (83) when the judgment result is at or above, or at or below, a predetermined threshold (threshold result, 82).
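The threshold-based judgment described above can be sketched as follows. This is a minimal sketch: the threshold value 0.8, the `Judgement` record, and the in-memory list standing in for the manager DB are assumptions chosen for illustration.

```python
from dataclasses import dataclass

@dataclass
class Judgement:
    group_id: str      # identifier of the event log group (hypothetical format)
    score: float       # malicious probability reported by the analyzer
    is_threat: bool    # True when the score reaches the threshold

def judge_threat(group_id: str, malicious_prob: float, threshold: float = 0.8) -> Judgement:
    """Compare an analyzer score with the configured threshold."""
    if not 0.0 <= malicious_prob <= 1.0:
        raise ValueError("malicious_prob must be a probability in [0, 1]")
    return Judgement(group_id, malicious_prob, malicious_prob >= threshold)

# Stand-in for the manager DB: results on either side of the threshold are kept.
manager_db = []
for gid, prob in [("sid1-pid100", 0.93), ("sid1-pid200", 0.41)]:
    manager_db.append(judge_threat(gid, prob))
```

Keeping the threshold as a parameter mirrors the separate threshold setting in the description, so an operator could trade false positives against missed detections without retraining.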

That is, the analysis results produced by the agent are retrieved and grouped using the process identifier (process ID) and session information, sentences are generated in order of progress and in order of process behavior, and the sentences are then sent to the AI database (AI DB) to build a dataset.

FIG. 8 is a schematic block diagram of a configuration (EDR solution) applying an AI module for event analysis that can be employed in the automatic analysis device of FIG. 1.

Referring to FIG. 8, the EDR system corresponding to the EDR solution may, in a broad sense, include an EDR integrated management server 300 and an agent (100). That is, the EDR system of this embodiment can be built as an AI-based EDR system to which an AI module for event analysis is applied.

The EDR integrated management server (hereinafter simply the 'integrated management server') 300 may include a user interface (UI, 310); an AI-based event analysis module 320; a database management system (DBMS, 330); a plurality of behavior analysis engines 350; a server-family operating system 360 such as Windows Server 2016R2; a platform 370 such as the Windows operating system (Windows OS), ESXi installed on HPE ProLiant servers and the like, cloud computing services such as Amazon Web Services (AWS) and Microsoft Azure, or cloud infrastructure for public institutions such as NHN Toast; and hardware 380 supporting these. The hardware 380 may be cloud-based hardware.

The EDR system may collect event information at the agent 180 of the endpoint 100, transmit it to the integrated management server 300, and store it in the DB management system 330 within the integrated management server 300. The AI-based event analysis module 320 may be configured to retrieve the stored information sequentially, perform AI-based event analysis, and apply the analysis results to the detection rules 182 within the endpoint 100.

The endpoint 100 may include a Windows operating system 102 and PC hardware (H/W) 104, but is not limited thereto, and may instead run another operating system, such as Android, that provides an environment in which the agent 180 can be installed and operated.

The AI-based event analysis module 320 of the integrated management server 300 or the agent 180 of the endpoint 100 according to this embodiment is a device that performs the automatic malicious event log analysis method of this embodiment, and may be configured to perform the steps of: processing logs by grouping them per process for efficient analysis of event logs; generating sentences describing process behavior using natural language processing (NLP) techniques for AI analysis; converting malicious behaviors that actually operate into data and using them as training data for AI model training; creating a transformer algorithm-based learning scheme to train on the training data; checking the grouped event logs in real time through an automatic event maliciousness analyzer based on the BERT and LSTM algorithms in order to analyze actual malicious behavior; and detecting a malicious behavior target when its similarity is at or above a certain level and automatically judging it to be a threat event.

FIG. 9 is an example diagram for explaining the process of combining session IDs and process IDs to group logs per process in an automatic malicious event log analysis method (simply, the automatic analysis method) according to another embodiment of the present invention.

Referring to FIG. 9, the grouping process for automatic malicious event log analysis may include a log collection step (S91) and a grouping step (S92), and may further include a UI display step (S93) in which the grouped data is displayed on a user interface.

In the log collection step (S91), event logs are collected and stored per session identifier (session ID, SID) and process identifier (process ID, PID).

In the grouping step (S92), the collected event logs are grouped by session identifier or by process identifier.

FIG. 10 is an example diagram for explaining a malicious event in which individually normal behaviors combine into ransomware behavior in the automatic analysis method of FIG. 9.

Referring to FIG. 10, a malicious event in which individually normal behaviors combine into ransomware behavior may include the case where startup program registration for maintaining persistence (S101), a desktop wallpaper change for demanding payment (S102), and file renaming for file encryption (S103) are performed together. Such concurrent events may be regarded as a kind of ransomware operation and judged to be malicious behavior (S104).

FIG. 11 is an example diagram for explaining a malicious event in which individually normal behaviors combine into network information collection or information transfer in the automatic analysis method of FIG. 9.

Referring to FIG. 11, a malicious event in which individually normal behaviors combine into network information collection or information transfer may include the case where process execution (S111), file creation or file writing (S112), and network communication (S113) are performed together. Such concurrent events may be classified as suspected malicious events and judged to be suspected malicious behavior (S114).
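The two co-occurrence patterns of FIGS. 10 and 11 can be illustrated with simple set checks over the event types seen in one group. This is only a sketch of why grouping matters, not the trained analyzer; the event type strings and the function name `classify_group` are hypothetical.

```python
from typing import Iterable

# Hypothetical event type names for the behaviors in FIGS. 10 and 11.
RANSOMWARE_PATTERN = frozenset(
    {"startup_registration", "wallpaper_change", "file_rename"}
)
NETWORK_PATTERN = frozenset(
    {"process_execution", "file_write", "network_communication"}
)

def classify_group(event_types: Iterable[str]) -> str:
    """Label one event group by the behaviors that occur together in it."""
    seen = set(event_types)
    if "file_create" in seen:
        # File creation and file writing are treated as the same step (S112).
        seen.add("file_write")
    if RANSOMWARE_PATTERN <= seen:
        return "malicious"
    if NETWORK_PATTERN <= seen:
        return "suspected_malicious"
    return "normal"

verdict = classify_group(["startup_registration", "file_rename", "wallpaper_change"])
```

Each event type on its own returns "normal"; only the combination within a single group changes the verdict, which is the point of analyzing grouped logs rather than single events.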

As described above with reference to FIGS. 9 to 11, the automatic analysis method may use an event log grouping technique to separate logs by process behavior. When a single event log is analyzed in isolation, it is sometimes a threat behavior event, but in most cases it is a normal behavior event, and it is the combination of behaviors individually judged to be normal that carries out the threat. Therefore, events are grouped in chronological order using the parent process ID, the current process ID, and the shared session ID, and malicious events are analyzed on that basis.
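The chronological grouping described above can be sketched as follows. The record fields (`sid`, `pid`, `ppid`, `ts`, `type`) and the function name `group_events` are assumptions for illustration; for brevity the sketch keys each group on (session ID, process ID) and leaves the parent process ID in the records without using it to link child groups to their parents.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def group_events(events: List[dict]) -> Dict[Tuple[int, int], List[dict]]:
    """Group events by (session ID, process ID), each group sorted by timestamp."""
    groups = defaultdict(list)
    for ev in events:
        groups[(ev["sid"], ev["pid"])].append(ev)
    for key in groups:
        groups[key].sort(key=lambda ev: ev["ts"])
    return dict(groups)

# Hypothetical event records as an agent might report them (illustrative only).
events = [
    {"sid": 1, "pid": 100, "ppid": 4, "ts": 3, "type": "file_rename"},
    {"sid": 1, "pid": 100, "ppid": 4, "ts": 1, "type": "startup_registration"},
    {"sid": 1, "pid": 200, "ppid": 100, "ts": 2, "type": "network_communication"},
]
grouped = group_events(events)
```

A fuller version could follow `ppid` links to merge a child process into its parent's group so that a whole process tree is analyzed as one unit.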

FIG. 12 is an example diagram for explaining a malicious event that does not include Terminated as an event type in the automatic analysis method of FIG. 9.

Referring to FIG. 12, the automatic analysis method may use a process of generating process action sentences from event logs for the NLP technique. If a group of event logs contains no Terminated event (S122), the process has not run to completion. Therefore, for event groups in which a Terminated event exists, a sentence is generated from the event type, the changed value, and the process execution path, taking the order of the actions into account, and the sentence is used to automatically analyze malicious event logs.
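The sentence generation step can be sketched as follows. The record fields, the " ; " clause separator, the sample event type names, and the function name `build_action_sentence` are assumptions for illustration; the description does not specify the exact sentence format.

```python
from typing import List, Optional

def build_action_sentence(group: List[dict]) -> Optional[str]:
    """Return one sentence for an event group, or None when the group has
    no Terminated event (the process has not run to completion)."""
    if not any(ev["type"] == "Terminated" for ev in group):
        return None
    clauses = []
    for ev in sorted(group, key=lambda ev: ev["ts"]):
        parts = [ev["type"]]            # event type
        if ev.get("value"):
            parts.append(str(ev["value"]))  # changed value, when present
        parts.append(ev["path"])        # process execution path
        clauses.append(" ".join(parts))
    return " ; ".join(clauses)

# Hypothetical event group (illustrative only).
group = [
    {"ts": 2, "type": "FileRenamed", "value": "a.txt->a.txt.locked", "path": "C:/Temp/x.exe"},
    {"ts": 1, "type": "ProcessCreated", "value": None, "path": "C:/Temp/x.exe"},
    {"ts": 3, "type": "Terminated", "value": None, "path": "C:/Temp/x.exe"},
]
sentence = build_action_sentence(group)
```

The resulting string is what a text classifier would receive as input, so the order of clauses preserves the order in which the process acted.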

In addition, a dataset of action sequence sentences may be built to create the learning model. In this case, the sentences produced by the event action sentence generator may be stored in the DB of the AI analysis server, and the generated event action sentences may be labeled for threat event detection.

FIG. 13 is a block diagram of the architecture of a transformer model network that can be employed in the automatic analysis method of FIG. 9.

Referring to FIG. 13, the automatic analysis device may use a transformer algorithm-based learning scheme for AI analysis.

The transformer algorithm-based learning scheme supports parallel processing, performs very well at capturing the meaning relationships between words, and may consist of six encoder layers and six decoder layers.

For example, the automatic analysis device inputs to the first module the data obtained by applying positional encoding to the embedding of the event log. In the first process (S132) of the first module, the input data is processed by encoder self-attention, which uses multi-head self-attention; the input data and the self-attention result are added and normalized (Add & Norm); a position-wise FFNN (feed-forward neural network) is applied; and its result is added to the normalized data and normalized again before being output.
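The self-attention and Add & Norm steps of the encoder process can be sketched in plain Python as follows. This is a deliberately simplified sketch: it uses a single head with Q = K = V = x, omits the learned projection matrices, the multiple heads, and the feed-forward sublayer, and all names and input values are illustrative assumptions.

```python
import math
from typing import List

Matrix = List[List[float]]

def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x: Matrix) -> Matrix:
    """Single-head scaled dot-product attention with Q = K = V = x."""
    d = len(x[0])
    out = []
    for q in x:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

def layer_norm(row: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize one token vector to zero mean and (near) unit variance."""
    mean = sum(row) / len(row)
    var = sum((v - mean) ** 2 for v in row) / len(row)
    return [(v - mean) / math.sqrt(var + eps) for v in row]

def add_and_norm(x: Matrix, sublayer_out: Matrix) -> Matrix:
    """Residual connection followed by layer normalization (Add & Norm)."""
    return [layer_norm([a + b for a, b in zip(xr, sr)])
            for xr, sr in zip(x, sublayer_out)]

# Three token vectors of dimension 4 (embedding plus positional encoding).
x = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0], [2.0, 2.0, 0.0, 1.0]]
y = add_and_norm(x, self_attention(x))
```

The output keeps the input shape, which is what allows the same sublayer pattern to be stacked across the encoder layers.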

Next, the automatic analysis device may input the result of the first process (S132) to the multi-head self-attention of the second process (S134) of a second module. The second process (S134) may further include, as the stage preceding this multi-head self-attention, masked decoder self-attention with masked multi-head self-attention. The result of the masked decoder self-attention is added to the input data of the module and normalized, and is then input to the multi-head self-attention. Here, this multi-head self-attention may correspond to encoder-decoder attention.

The output of the second process (S134) may be produced through an output stage such as the softmax of a dense neural network.

The relationship between the two steps (S132, S134) of the first and second modules described above may apply equally to any two adjacent layers among the processes performed in the respective layers. That is, the automatic analysis device can perform automatic judgment for each event log group simultaneously and in parallel through the transformer algorithm.

In addition, when attention is performed, the number of heads running in parallel may be set to 8, and a cross-entropy function may be adopted as the loss function to solve the multi-class classification problem. The learning rate used to optimize the loss function may be tuned so that it is gradually reduced as training progresses, making it easier for the loss function to converge.
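The loss and learning-rate settings described above can be illustrated with a short sketch. The text only says that the learning rate is reduced gradually as training progresses; the exponential schedule and the values of `decay_rate` and `decay_steps` below are assumptions chosen for illustration, as are the function names.

```python
import math

def cross_entropy(probs, target_index):
    """Cross-entropy loss for one sample, given the predicted class
    probabilities and the index of the true class."""
    return -math.log(probs[target_index])

def decayed_learning_rate(initial_lr, step, decay_rate=0.96, decay_steps=1000):
    """Exponential decay (an assumed schedule): the learning rate shrinks
    by a factor of decay_rate every decay_steps training steps."""
    return initial_lr * decay_rate ** (step / decay_steps)
```

A confident correct prediction gives a loss near zero, while the loss grows as the probability assigned to the true class falls; the decayed rate takes smaller optimization steps late in training, which is the convergence aid the paragraph describes.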

FIG. 14 is an exemplary diagram for explaining the structure of the transformer model of FIG. 13.

Referring to FIG. 14, the transformer model may consist of an encoder layer (S142) and a decoder layer (S144). In the positional encoding of the input-side layer, position information is added to the embedding vector of each word, and the result is used as the model input.
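As an illustration of adding position information to word embeddings, the sketch below uses the sinusoidal formulation commonly used with transformer models. The exact equations of this document are not reproduced in the text, so the constant 10000 and the assignment of sine to even dimensions and cosine to odd dimensions (0-based) are assumptions; the function names are hypothetical.

```python
import math

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding (assumed standard form):
    even dimension index uses sine, odd dimension index uses cosine."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** ((2 * (i // 2)) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def add_position(embeddings):
    """Element-wise sum of word embeddings and their positional encodings."""
    pe = positional_encoding(len(embeddings), len(embeddings[0]))
    return [[e + p for e, p in zip(er, pr)] for er, pr in zip(embeddings, pe)]
```

Because the encoding depends only on the position and the dimension index, the same word receives a different input vector at different places in an action-sequence sentence.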

The decoder structure may likewise be configured so that the sentence matrix is input after positional encoding, in the same way as the encoder structure. The decoder structure can be trained to predict the word at each time step from the sentence matrix. A layer that masks words at future time steps, so that future words cannot be referenced, may be applied as the first of the decoder sublayers.
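The masking of future words can be sketched as a look-ahead mask applied to attention scores before the softmax. This is a minimal sketch with hypothetical function names; it shows only the masking step, not a full decoder sublayer.

```python
import math

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j,
    i.e. when j is not in the future of i."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, allowed):
    """Softmax over attention scores with disallowed (future) positions
    set to minus infinity, so they receive zero weight."""
    masked = [s if ok else float("-inf") for s, ok in zip(scores, allowed)]
    m = max(masked)
    exps = [math.exp(s - m) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]
```

For the second word of a three-word sentence, `masked_softmax(scores, causal_mask(3)[1])` distributes all attention weight over the first two words and gives the third word a weight of zero.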

FIG. 15 is an exemplary diagram for explaining a text classification structure using BERT and a dense layer, as another configuration that can be employed with the transformer model of FIG. 13.

Referring to FIG. 15, the automatic analysis device may use an automatic event analyzer (S136) based on BERT and LSTM models. The analyzer (S136) can use a BERT model, whose strength is fine-tuning for tasks such as finding the topic of a sentence and classifying it. In that case, a pre-trained BERT model can be used with dense layers (S138) attached whose number of outputs equals the number of labels to be learned in the final classification.
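The role of the dense layer (S138) can be illustrated with a small classification head whose output size equals the number of labels. This sketch does not load BERT; `features` stands in for a sentence representation produced by a pre-trained encoder, and the label names, weights and function names are hypothetical.

```python
import math

# Hypothetical label set; the dense layer has one output per label.
LABELS = ["malicious", "benign", "undetermined"]

def dense_softmax(features, weights, biases):
    """Dense layer followed by softmax. `weights` has one row per label."""
    logits = [sum(f * w for f, w in zip(features, row)) + b
              for row, b in zip(weights, biases)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(features, weights, biases, labels=LABELS):
    """Return the most probable label and its probability."""
    probs = dense_softmax(features, weights, biases)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]
```

During fine-tuning, the head's weights (and optionally the encoder's) would be trained on the labeled event action sentences; here they are simply passed in.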

In addition, an LSTM model handles long input sequences better than a traditional RNN, and because the sequential events of a threat process are likely to form long sentences, using an LSTM model may be preferable to a traditional RNN. Since LSTMs are widely used in natural language processing tasks such as text classification and chatbot systems, the model may be chosen selectively based on a performance comparison with the transformer-based model.

FIG. 16 is an exemplary diagram of text classification using the LSTM of FIG. 15, which can be used together with or in place of the transformer model of FIG. 13.

Referring to FIG. 16, the transformer-based BERT model and a model using LSTM and GRU may be trained on the same dataset as text classification or question-answering models, and their results on the validation and test datasets compared so that the model with the better F1-score is selected.
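The F1-based model selection can be sketched as follows. For brevity the sketch computes a binary F1-score for one positive class, whereas the text describes a multi-class setting; the model names and prediction lists in the usage note are made up for illustration.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1-score: harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def select_model(y_true, predictions_by_model):
    """Score every candidate on the same labels and return the best one."""
    scores = {name: f1_score(y_true, preds)
              for name, preds in predictions_by_model.items()}
    best = max(scores, key=scores.get)
    return best, scores
```

For example, `select_model([1, 1, 0, 0], {"bert": [1, 1, 0, 0], "lstm": [1, 0, 0, 1]})` scores both candidates on the same test labels and returns the name of the higher-scoring one together with all scores.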

For example, the text input "I always really hate foot-and-mouth disease" is passed through an embedding layer (S161) and preprocessed by multiple LSTMs (S162), then learned in a convolution layer (S163), and the automatic analysis result is output through a max pooling layer (S164) and a fully-connected layer (FCL, S165). The automatic analysis result may be set to malicious event (+), not a malicious event (-), or judgment withheld (0).

The computing device for event analysis described above (corresponding to the automatic analysis device), or an AI module constituting at least part of it, may be deployed in conjunction with an EDR system.

It may be implemented as an on-premises EDR system linked with the AI module, in which an appliance-type management server at each site manages and serves the security agents, or as SECaaS, in which the security agents are managed and served from the cloud.

According to this embodiment, the AI module can provide an integrated service in conjunction with a cloud-based EDR solution that already provides a given service. In this case, the configuration interface of the AI module and the analysis result information may be provided through the GUI of the EDR system's management system.

In addition, the intelligent EDR analysis system (hereinafter, the analysis system) may include an endpoint security solution that applies automatic threat event analysis and detection using a chatbot engine. The analysis system is implemented as an agent installed on the user PC for event collection, and the collected events may be stored in the AI DB after a first-stage analysis using YARA rules. The analysis results collected in the AI DB are then grouped into events by process ID and session, the processed events are automatically analyzed by the AI analyzer, and the resulting automatic event judgment may be stored in the web server DB.
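The grouping of collected events by process ID and session can be sketched with a dictionary keyed on both values. The event field names (`session`, `pid`, `time`) are hypothetical, since the text does not specify the log schema.

```python
from collections import defaultdict

def group_events(events):
    """Group event records by (session, process ID) and order each group
    by event time. Field names are assumptions for illustration."""
    groups = defaultdict(list)
    for ev in events:
        groups[(ev["session"], ev["pid"])].append(ev)
    for evs in groups.values():
        evs.sort(key=lambda e: e["time"])
    return dict(groups)
```

Each resulting group is the unit that would be handed to the AI analyzer, so events of unrelated processes or sessions do not end up in the same input.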

In addition, the analysis system may use a web browser-based user interface to visualize and display the analysis results and judgment results.

In addition, as a form of automatic threat event detection, the analysis system can detect threat events automatically through AI-based syntactic analysis of event behavior, and can automatically judge threat events based on the detection results. It may also work with a blacklist for blocking new and variant malware: after a threat process is determined through behavior analysis, the blacklist can be updated through the web user interface. The analysis system may further be configured to detect threat events by applying YARA rules to the events collected from the user PC.

In addition, the analysis system collects process, registry, network, and file information generated on an endpoint such as a user PC, receives the attack tactic, attack technique, and attack method information provided by MITRE ATT&CK, and can provide detection at or below the None, Telemetry, General Behavior, and Tactic levels, which account for a high proportion of detection information.

In addition, the analysis system can automatically analyze event logs through its AI analysis function, determine whether they are malicious and whether a breach has occurred, and display the judgment information. This judgment information includes the attack tactic, attack technique, and attack method information provided by MITRE ATT&CK, and may be stored with the detection level raised to the General Behavior and Tactic levels or higher.

FIG. 17 is a block diagram illustrating the form of a security service provided by the automatic analysis device of FIG. 1.

Referring to FIG. 17, the security service may be implemented as an endpoint threat behavior detection EDR product to which a chatbot engine is applied.

The automatic analysis device for this security service may be deployed with external servers (400) for training data collection, security notification, and backup, and may communicate with other devices over HTTP. The external servers (400) may include a short message service server (410), an update server (420), a VirusTotal server (430), a Bitdefender server (440), a mail server (450), a backup server (460), and the like.

Here, the integrated management server (300) for managing the EDR solution is a kind of manager. It is built on server hardware or in a cloud environment, installed on a virtualized operating system, and may be set up to work together with a DB server.

In addition, the AI-based event analysis system using the automatic analysis device may be deployed on the endpoint (100) or the integrated management server (300), implemented on PC hardware or in a virtual environment in the cloud, and run on an operating system such as Windows Server. The agent (180) for endpoint event collection and threat event response may be configured to run on specific endpoints such as Windows 7 and Windows 10.

In addition, the endpoint threat behavior detection system using the chatbot engine is a kind of AI-based event analysis system that uses the automatic analysis device. It performs rule matching on the event logs generated on the user PC and sends them to the DBMS as user PC event logs; groups the collected user PC event logs into process trees and stores them in the DBMS; generates sentences from the collected process trees in order of action; and uses the generated sentences as an AI model training dataset to create a transformer-based or LSTM-based deep learning AI model.
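The step of turning a process tree into an action-order sentence can be sketched as below. The event schema (`pid`, `ppid`, `process`, `action`, `target`, `time`) and the sentence template are assumptions made for illustration; the text does not fix either.

```python
def descendants(root_pid, events):
    """Collect the root process and every process spawned beneath it,
    using the parent process ID recorded on each event."""
    children = {}
    for ev in events:
        children.setdefault(ev["ppid"], set()).add(ev["pid"])
    seen, stack = set(), [root_pid]
    while stack:
        pid = stack.pop()
        if pid in seen:
            continue
        seen.add(pid)
        stack.extend(children.get(pid, ()))
    return seen

def tree_sentence(root_pid, events):
    """Render the events of one process tree as text, in order of action."""
    pids = descendants(root_pid, events)
    ordered = sorted((e for e in events if e["pid"] in pids),
                     key=lambda e: e["time"])
    return " ".join(f'{e["process"]} {e["action"]} {e["target"]}.' for e in ordered)
```

With made-up events in which `winword.exe` (pid 10) starts `powershell.exe` (pid 11), which then opens a network connection, `tree_sentence(10, events)` yields one sentence per event in time order, while events of unrelated processes are left out. Sentences of this kind would then be labeled and used as training data.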

In addition, the system described above can send the event logs generated on an endpoint (100), such as an actual user PC, to the AI analysis system of the integrated management server (300), run automatic analysis with the generated AI model, and output a similarity score. It can send the automatic threat process analysis result and the automatic judgment result to the DBMS, and visualize the process tree and the AI automatic analysis and judgment results through the web UI.
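How a similarity score might be turned into an automatic judgment can be sketched with a simple threshold rule. The text states only that a threat event is declared when the similarity is at or above a certain level; the threshold value, the margin used for the "judgment withheld" outcome, and the function names are all assumptions.

```python
def judge(similarity, threat_threshold=0.8, review_margin=0.1):
    """Map a similarity score to '+', '0' or '-'.
    '+' : malicious event (similarity at or above the threshold)
    '0' : judgment withheld (assumed band just below the threshold)
    '-' : not a malicious event"""
    if similarity >= threat_threshold:
        return "+"
    if similarity >= threat_threshold - review_margin:
        return "0"
    return "-"

def judge_groups(similarity_by_group, **kwargs):
    """Apply the same rule to every event log group."""
    return {group: judge(score, **kwargs)
            for group, score in similarity_by_group.items()}
```

The per-group results are what would be written to the DBMS and shown next to the process tree in the web UI.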

In addition, the endpoint threat behavior detection system using the chatbot engine described above may have the following module configuration: a module that generates sentences in process tree action order; a training module for transformer network and LSTM-based network models using the TensorFlow framework; an automatic analysis and similarity output module using the transformer network-based and LSTM network-based trained models; libraries for dataset processing; a module that automatically detects threat processes based on the automatic analysis results; and a module that visualizes the analysis results and automatic judgment results through a web user interface.

The libraries may include one or more of scikit-learn, pandas, and NumPy.

According to this embodiment, when behavior information is collected from endpoints and stored on a central storage server, and security threat events are analyzed automatically through AI analysis of the stored behavior information, large volumes of logs can be classified and analyzed quickly and efficiently, and the AI can automatically determine whether a breach has occurred when threat behavior arises, even without a professional analyst.

In addition, AI-based automatic analysis can be applied not only to endpoint event log analysis but also to a visualization platform for testing malware, which can considerably increase the reliability of malware and malicious behavior analysis results, and it has the advantage of being immediately applicable to cloud-based security service products such as SECaaS.

FIG. 18 is a block diagram illustrating a performance evaluation process for the AI-based automatic threat event analysis of the automatic analysis device of FIG. 1.

Referring to FIG. 18, performance evaluation of the AI-based automatic threat event analysis method may be carried out as follows.

First, in the event log collection step (S181), malicious information is collected, following malware execution, from five known malicious-information collection channels such as VirusTotal, VirusSign, and Bitdefender; 200 malicious files per channel and 1,000 normal files held in-house may be used as samples for performance evaluation.

Here, the "virtualization-based manual malicious behavior analysis visualization platform", a kind of automatic analysis device described in this embodiment, can be used to build a testbed for evaluating automatic threat event analysis.

Next, once event logs have been collected following malware execution, threat events are detected automatically through the chatbot engine (S182). The automatic detection results for malware execution on an endpoint or the like may be visualized in the user interface together with the results of rule analysis (S185) (S186). Through the visualized information, a detection performance check and a report can be provided to the user.

Next, the automatic detection results may be passed to the visualization system (S184) through the automatic event analysis process (S183).

Next, the automatic event analysis results can be checked through the EDR GUI of the visualization system (S184) linked with the AI module, and the test results can be measured by checking the testbed UI.

According to the embodiments described above, it is possible to respond effectively to unknown security threats and to new and variant threats that neutralize existing antivirus products. Building an automatic cyber threat response system through IOC-based forensic analysis can greatly reduce the human error and time limitations that come with reliance on manual analysis. Introducing AI-based event analysis allows rules to be generated and distributed through an automated analysis system, so that a self-evolving threat response system can be built efficiently. Sharing the latest threat information with other organizations and systems enables reliable information to be generated and distributed, and shortening the response time to malicious threats helps minimize cyber damage.

FIG. 19 is a block diagram illustrating the main components of an image-based malicious code detection device according to another embodiment of the present invention.

The automatic malicious event log analysis device (1000) of this embodiment (hereinafter, the automatic analysis device) may be installed as at least part of a server-side EDR system or as at least part of an endpoint agent.

Referring to FIG. 19, the automatic analysis device (1000) may include at least one processor (1100), a memory (1200), and a transceiver (1300) that connects to a network and communicates over it. The automatic analysis device (1000) may further include an input interface device (1400), an output interface device (1500), and a storage device (1600). The components of the automatic analysis device (1000) may be connected by a bus (1700) and communicate with one another.

However, the components of the automatic analysis device (1000) may instead be connected through individual interfaces or individual buses centered on the processor (1100), rather than through the common bus (1700). For example, the processor (1100) may be connected to at least one of the memory (1200), the transceiver (1300), the input interface device (1400), the output interface device (1500), and the storage device (1600) through a dedicated interface.

The processor (1100) may execute program commands stored in at least one of the memory (1200) and the storage device (1600). The processor (1100) may be a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which the methods according to embodiments of the present invention are performed.

Each of the memory (1200) and the storage device (1600) may consist of at least one of a volatile storage medium and a non-volatile storage medium. For example, the memory (1200) may consist of at least one of read-only memory (ROM) and random access memory (RAM).

At least one instruction stored in the memory (1200) or the storage device (1600), or loaded on and executed by the processor (1100), may be configured to cause the processor to perform: processing logs by grouping them by process for efficient event log analysis; generating sentences describing process behavior using natural language processing (NLP) techniques for AI analysis; converting actually executed malicious behaviors into data used as training data for the AI model; creating a transformer algorithm-based learning system to train on the training data; checking the grouped event logs in real time with an automatic event maliciousness analyzer based on BERT and LSTM algorithms in order to analyze actual malicious behavior; and detecting a target of malicious behavior whose similarity is at or above a certain level and automatically judging it to be a threat event.

In addition, the operations of the methods according to the embodiments described above can be implemented as computer-readable programs or code on a computer-readable recording medium. A computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. The computer-readable recording medium may also be distributed across computer systems connected by a network, so that the computer-readable program or code is stored and executed in a distributed manner.

In addition, the computer-readable recording medium may include hardware devices specially configured to store and execute program commands, such as ROM, RAM, and flash memory. Program commands may include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like.

Although some aspects of the present invention have been described in the context of a device, they may also represent a description of the corresponding method, where a block or device corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method may also be represented as a corresponding block, item, or device feature. Some or all of the method steps may be performed by (or using) a hardware device such as a microprocessor, a programmable computer, or an electronic circuit. In some embodiments, one or more of the most important method steps may be performed by such a device.

In embodiments, a programmable logic device (for example, a field-programmable gate array) may be used to perform some or all of the functions of the methods described here. In embodiments, the field-programmable gate array may operate together with a microprocessor to perform one of the methods described here. In general, the methods are preferably performed by some hardware device.

Claims

A method of analyzing malicious event logs in which a database for storing an artificial intelligence module and event logs is built on an integrated analysis server, an agent is installed on an endpoint, and rule matching is performed on the event logs in the agent, the endpoint automatic analysis method comprising:
grouping the event logs using process ID and session information;
generating sentences in order of process actions based on the event logs; and
automatically analyzing malicious event logs at the endpoint based on a dataset containing the sentences generated in order of action.

An endpoint automatic analysis method comprising:
processing logs by grouping them by process for efficient analysis of event logs;
generating sentences describing process behavior through natural language processing (NLP) techniques for AI analysis;
converting actually executed malicious behaviors into data and using them as training data for AI model training;
creating a transformer algorithm-based learning system to train on the training data;
checking the grouped event logs in real time through an automatic event maliciousness analyzer based on BERT and LSTM algorithms to analyze actual malicious behavior; and
detecting a target of malicious behavior whose similarity is at or above a certain level and automatically judging it to be a threat event.

A device in which a database for storing an artificial intelligence module and event logs is built on an integrated analysis server, an agent is installed on an endpoint, and malicious event logs are analyzed by performing rule matching on the event logs in the agent, the endpoint automatic analysis device comprising:
a processor; and
a memory storing at least one instruction executed by the processor, wherein the at least one instruction causes the processor to perform:
grouping the event logs using process ID and session information;
generating sentences in order of process actions based on the event logs; and
automatically analyzing malicious event logs at the endpoint based on a dataset containing the sentences generated in order of action.

An endpoint automatic analysis device for endpoint detection and response (EDR) using an artificial intelligence (AI)-based chatbot engine,
comprising a processor and a memory, wherein at least one instruction stored in the memory causes the processor to:
process logs by grouping them by process for efficient analysis of event logs; generate sentences describing process behavior through natural language processing (NLP) techniques for AI analysis; convert actually executed malicious behaviors into data and use them as training data for AI model training; implement a transformer algorithm-based learning system to train on the training data; check the grouped event logs in real time through an automatic event analyzer based on BERT (bidirectional encoder representations from transformers) and LSTM (long short-term memory) algorithms to analyze actual malicious behavior; and detect a target of malicious behavior whose similarity is at or above a certain level and automatically judge it to be a threat event.

The device of claim 4,
wherein the transformer model for the transformer algorithm has an encoder layer and a decoder layer, and, in the positional encoding of the input-side layer, when the sum obtained by adding 1 to the position of the embedding dimension within the embedding vector of each word is odd, the position (pos) of the word is calculated using Equation 1 below, which is a cosine function, to generate position information of the malicious behavior;
[Equation 1]

.

The device of claim 4,
wherein the transformer model for the transformer algorithm has an encoder layer and a decoder layer, and, in the positional encoding of the input-side layer, when the sum obtained by adding 1 to the position of the embedding dimension within the embedding vector of each word is even, the position (pos) of the word is calculated using Equation 2 below, which is a sine function, to generate position information of the malicious behavior;
[Equation 2]

.

The device of claim 5,
wherein the decoder structure including the decoder layer is trained to predict the word at each time step from a sentence matrix, and a layer that masks words at future time steps is applied at the front of the decoder sublayers so that future words cannot be referenced.