KR20230046182A

KR20230046182A - Apparatus, method and computer program for detecting attack on network

Info

Publication number: KR20230046182A
Application number: KR1020210194486A
Authority: KR
Inventors: 윤영; 권성호; 김동우; 김원남; 김현민; 신수철; 오승택; 정하규
Original assignee: (주)너울리; (주)넷코아테크; 한국교육학술정보원
Priority date: 2021-12-31
Filing date: 2021-12-31
Publication date: 2023-04-05
Also published as: KR102651655B1

Abstract

A device for detecting an intrusion attack on a network comprises: a collection unit that collects packet data including headers and payloads; an embedding unit that tokenizes the headers and payloads of the collected packet data and embeds the tokenized headers and payloads; a learning unit that trains an artificial intelligence model based on embedding results; a detection unit that detects an intrusion attack on a target network by using the artificial intelligence model; and an analysis unit that analyzes the detected intrusion attack.

Description

DEVICE, METHOD AND COMPUTER PROGRAM FOR DETECTING ATTACK ON NETWORK

The present invention relates to an apparatus, a method, and a computer program for detecting an intrusion attack on a network.

An explainable artificial intelligence (eXplainable Artificial Intelligence, XAI) model presents the reasons for its judgments in a way that humans can understand. It is contrasted with 'black box' artificial intelligence, in which even the designer of the algorithm cannot explain the reason for a particular judgment. By resolving the uncertainty in the decision-making process of artificial intelligence, an explainable artificial intelligence model can increase trust in the artificial intelligence model.

Research on network security systems using artificial intelligence models is being actively conducted. However, conventional security systems lack techniques for concretely deriving the basis for a discrimination result and an interpretation of that result.

In addition, there has been a need to improve the detection rules of a security system based on the inference results of an artificial intelligence model, and to quantitatively analyze whether the false positive rate is reduced and the true positive rate is improved.

Korean Registered Patent Publication No. 1814368 (registered on December 27, 2017)

The present invention is intended to solve the problems of the prior art described above by collecting packet data including headers and payloads, tokenizing the headers and payloads of the collected packet data, embedding the tokenized headers and payloads, training an artificial intelligence model based on the embedding results, detecting an intrusion attack on a target network using the artificial intelligence model, and analyzing the detected intrusion attack.

It is intended to provide a device, method, and computer program that detect an intrusion attack on a network and analyze the basis for the determination using a transformer-based ensemble model and a self-attention-based explainable artificial intelligence model.

It is intended to provide an intrusion attack detection device, method, and computer program that analyze the correlation and causal relationship between a detected intrusion attack on a network and other intrusion attacks.

However, the technical problems to be solved by the present embodiment are not limited to those described above, and other technical problems may exist.

As a means for achieving the above-described technical problem, an embodiment of the present invention provides a device for detecting an intrusion attack on a network, which may include: a collection unit that collects packet data including headers and payloads; an embedding unit that tokenizes the headers and payloads of the collected packet data and embeds the tokenized headers and payloads; a learning unit that trains an artificial intelligence model based on the embedding results; a detection unit that detects an intrusion attack on a target network using the artificial intelligence model; and an analysis unit that analyzes the detected intrusion attack.

In one embodiment, the embedding unit may derive embedding vectors of the tokenized headers and payloads, and the learning unit may train the artificial intelligence model based on the derived embedding vectors.

In one embodiment, the learning unit may extract feature points by performing a convolution operation on the embedding vectors, and may train the artificial intelligence model based on the extracted feature points.

In one embodiment, the artificial intelligence model may include a first model that detects an intrusion attack based on a transformer-based ensemble model trained on the extracted feature points, and a second model, which is an explainable artificial intelligence model that analyzes the basis for determining the intrusion attack based on a self-attention model that derives an attention weight for each token of the payload.

In one embodiment, the learning unit may select threat information tokens from among the tokens of the payload based on the derived attention weights.

In one embodiment, the learning unit may further derive location information of the threat information token and an attention weight value of the threat information token.

In one embodiment, the learning unit may improve a detection rule of the artificial intelligence model based on the location information of the threat information token and the attention weight value of the threat information token.

In one embodiment, the analysis unit may analyze at least one of a correlation and a causal relationship between the detected intrusion attack and other intrusion attacks.

In one embodiment, the analysis unit may analyze, based on an association rule learning technique, the correlation between the detected intrusion attack and another intrusion attack that occurred in the same time period.

In one embodiment, the analysis unit may analyze, based on a sequence inference technique, the causal relationship between the detected intrusion attack and other intrusion attacks that occurred in the past.

Another embodiment of the present invention provides a method for detecting an intrusion attack on a network, which may include: collecting packet data including headers and payloads; tokenizing the headers and payloads of the collected packet data; embedding the tokenized headers and payloads; training an artificial intelligence model based on the embedding results; detecting an intrusion attack on a target network using the artificial intelligence model; and analyzing the detected intrusion attack.

Yet another embodiment of the present invention provides a computer program stored on a computer-readable recording medium and including a sequence of instructions for detecting an intrusion attack on a network. When executed by a computing device, the computer program may include a sequence of instructions that cause the computing device to collect packet data including headers and payloads, tokenize the headers and payloads of the collected packet data, embed the tokenized headers and payloads, train an artificial intelligence model based on the embedding results, detect an intrusion attack on a target network using the artificial intelligence model, and analyze the detected intrusion attack.

The above-described means for solving the problem are merely illustrative and should not be construed as limiting the present invention. In addition to the exemplary embodiments described above, there may be additional embodiments described in the drawings and the detailed description of the invention.

According to any one of the above-described means for solving the problem, the present invention can collect packet data including headers and payloads, tokenize the headers and payloads of the collected packet data, embed the tokenized headers and payloads, train an artificial intelligence model based on the embedding results, detect an intrusion attack on a target network using the artificial intelligence model, and analyze the detected intrusion attack.

By improving the detection rules of the artificial intelligence model, intrusion attacks on the network can be detected more accurately and efficiently.

The security of the network can be strengthened, and damage caused by intrusion attacks can be prevented.

FIG. 1 is a block diagram of an intrusion attack detection device according to an embodiment of the present invention.
FIG. 2 is an exemplary diagram for explaining a method of training an artificial intelligence model according to an embodiment of the present invention.
FIG. 3 is an exemplary diagram for explaining a method of training an artificial intelligence model according to an embodiment of the present invention.
FIG. 4 is an exemplary diagram for explaining a method of training an artificial intelligence model according to an embodiment of the present invention.
FIG. 5 is an exemplary diagram for explaining an ensemble model according to an embodiment of the present invention.
FIG. 6 is an exemplary diagram for explaining a method of improving a detection rule of an artificial intelligence model according to an embodiment of the present invention.
FIG. 7 is an exemplary diagram for explaining a method of analyzing a causal relationship between a detected intrusion attack and another intrusion attack according to an embodiment of the present invention.
FIG. 8 is a flowchart of a method for detecting an intrusion attack on a network according to an embodiment of the present invention.
FIG. 9 is an exemplary view of an interface screen provided to a user by an intrusion attack detection device according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can easily practice them. However, the present invention may be embodied in many different forms and is not limited to the embodiments described herein. In the drawings, parts irrelevant to the description are omitted in order to clearly describe the present invention, and similar reference numerals are attached to similar parts throughout the specification.

Throughout the specification, when a part is said to be "connected" to another part, this includes not only the case where it is "directly connected" but also the case where it is "electrically connected" with another element interposed therebetween. In addition, when a part is said to "include" a component, this means that, unless otherwise stated, it may further include other components rather than excluding them, and it should be understood as not precluding the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In this specification, a 'unit' includes a unit realized by hardware, a unit realized by software, and a unit realized using both. In addition, one unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware. Meanwhile, a 'unit' is not limited to software or hardware; a 'unit' may be configured to reside in an addressable storage medium or configured to run on one or more processors. Thus, as an example, a 'unit' includes components such as software components, object-oriented software components, class components, and task components, as well as processes, functions, attributes, procedures, subroutines, segments of program code, drivers, firmware, microcode, circuitry, data, databases, data structures, tables, arrays, and variables. The functions provided within components and 'units' may be combined into a smaller number of components and 'units' or further separated into additional components and 'units'. In addition, components and 'units' may be implemented to run on one or more CPUs in a device or a secure multimedia card.

The "network" referred to below means a connection structure in which nodes such as terminals and servers can exchange information with one another, and includes a local area network (LAN), a wide area network (WAN), the Internet (WWW: World Wide Web), wired and wireless data communication networks, telephone networks, and wired and wireless television communication networks. Examples of wireless data communication networks include, but are not limited to, 3G, 4G, 5G, 3GPP (3rd Generation Partnership Project), LTE (Long Term Evolution), WIMAX (World Interoperability for Microwave Access), Wi-Fi, Bluetooth communication, infrared communication, ultrasonic communication, visible light communication (VLC), and LiFi.

In this specification, some of the operations or functions described as being performed by a terminal or device may instead be performed by a server connected to that terminal or device. Likewise, some of the operations or functions described as being performed by a server may be performed by a terminal or device connected to that server.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of an intrusion attack detection device according to an embodiment of the present invention.

The intrusion attack detection device 100 may detect an intrusion attack on a network by, for example, determining whether a combination of a header and a payload included in packet data is malicious.

The intrusion attack detection device 100 may, for example, analyze the header and payload included in the packet data of a detected intrusion attack and use the analysis result to gradually improve the detection performance of the device.

Referring to FIG. 1, the intrusion attack detection device 100 may include a collection unit 110, an embedding unit 120, a learning unit 130, a detection unit 140, and an analysis unit 150.

The collection unit 110 may collect packet data including headers and payloads.

In one embodiment, the collection unit 110 may classify the collected packet data as normal or malicious based on predetermined criteria and label it accordingly. The intrusion attack detection device 100 may use the labeled packet data to train an artificial intelligence model that detects intrusion attacks.

The embedding unit 120 may tokenize the header and payload of the collected packet data. The embedding unit 120 may tokenize the header and payload using, for example, any one of the N-GRAM, BPE, or WPE methods.
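
As an illustration of the N-GRAM option only, the following is a minimal sketch that splits raw header and payload bytes into overlapping byte n-grams. The function names, the hex rendering of tokens, and the `[SEP]` separator token are assumptions made for this example; the specification does not state how tokens are represented.

```python
def byte_ngrams(data: bytes, n: int = 2) -> list[str]:
    """Split raw bytes into overlapping n-grams, each rendered as a hex string."""
    if n <= 0:
        raise ValueError("n must be positive")
    # A sliding window of width n; inputs shorter than n yield no tokens.
    return [data[i:i + n].hex() for i in range(len(data) - n + 1)]


def tokenize_packet(header: bytes, payload: bytes, n: int = 2) -> list[str]:
    """Tokenize header and payload separately, joined by a hypothetical separator."""
    return byte_ngrams(header, n) + ["[SEP]"] + byte_ngrams(payload, n)


tokens = tokenize_packet(b"\x45\x00\x00\x28", b"GET /", n=2)
```

BPE or WPE tokenization would instead learn a subword vocabulary from the collected traffic, which this sketch does not attempt.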

The embedding unit 120 may embed the tokenized header and payload. The embedding unit 120 may derive embedding vectors of the tokenized header and payload.

For example, the embedding unit 120 may embed the tokenized header and payload using SentencePiece. Alternatively, the embedding unit 120 may embed the tokenized header and payload using the Word2vec and FastText techniques.

The learning unit 130 may train an artificial intelligence model based on the embedding result. For example, the learning unit 130 may train the artificial intelligence model using a 1D-CNN technique. Here, the artificial intelligence model may be a model that detects an intrusion attack based on a transformer-based ensemble model trained on the feature points of the header and payload.

The learning unit 130 may, for example, train the artificial intelligence model based on the derived embedding vectors of the header and payload. The learning unit 130 may extract feature points by performing a convolution operation on the embedding vectors. The learning unit 130 may train the artificial intelligence model based on the extracted feature points.

FIGS. 2 to 4 are exemplary diagrams for explaining a method of training an artificial intelligence model according to an embodiment of the present invention.

When tokenization, padding, and embedding are performed on the natural language sentence 'wait for the video and don't rent it', a matrix representing the sentence, as shown in FIG. 2, is derived. In FIG. 2, n may denote the length of the sentence, and k may denote the dimension of the embedding vector.
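
The tokenize, pad, and embed steps that produce the n × k matrix can be sketched as follows. This is an illustration only: whitespace tokenization, the `[PAD]` and `[UNK]` entries, the values n = 9 and k = 5, and the randomly initialized embedding table are all assumptions of the example, standing in for a trained tokenizer and embedding.

```python
import numpy as np


def sentence_matrix(tokens, vocab, table, n):
    """Map tokens to ids, pad or truncate to length n, and look up embeddings."""
    ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens][:n]
    ids += [vocab["[PAD]"]] * (n - len(ids))
    return table[np.array(ids)]  # shape (n, k)


sentence = "wait for the video and don't rent it"
tokens = sentence.split()  # 8 whitespace tokens

# Hypothetical vocabulary built from this one sentence.
vocab = {"[PAD]": 0, "[UNK]": 1}
for t in tokens:
    vocab.setdefault(t, len(vocab))

n, k = 9, 5  # assumed sentence length and embedding dimension
rng = np.random.default_rng(0)
table = rng.normal(size=(len(vocab), k))  # stand-in for trained embeddings
table[vocab["[PAD]"]] = 0.0  # padding rows embed to the zero vector

X = sentence_matrix(tokens, vocab, table, n)
```

Each row of `X` is the embedding of one token, so a kernel that spans the full width k slides only along the token axis.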

In the 1D-CNN technique, the width of the kernel is set equal to the dimension (k) of the embedding vector, while the height of the kernel may be set flexibly. Accordingly, the size of the kernel may refer to the height of the kernel. The kernel size may be set to the value that experimentally yields the best performance through hyperparameter tuning.

FIG. 3 exemplarily illustrates a method of extracting feature points by performing a convolution operation on the embedding vectors when the kernel size, that is, the kernel height, is set to 2.

Referring to FIG. 3, when the kernel size is 2, the convolution operation is performed on 'wait for' in the first step, on 'for the' in the second step, on 'the video' in the third step, and on 'video and' in the fourth step. The convolution operation may be performed over the entire set of embedding vectors in the same manner.
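
The sliding-window steps above can be sketched as follows. This is a minimal illustration assuming a single kernel of height 2 with stride 1, no padding, and no activation function; the function names and the all-ones inputs are hypothetical.

```python
import numpy as np


def conv1d_text(X, kernel, bias=0.0):
    """Slide a (h, k) kernel down an (n, k) sentence matrix, one row per step."""
    n, k = X.shape
    h, kernel_width = kernel.shape
    if kernel_width != k:
        raise ValueError("kernel width must equal the embedding dimension k")
    # Each step multiplies h consecutive token rows by the kernel and sums.
    return np.array([np.sum(X[i:i + h] * kernel) + bias
                     for i in range(n - h + 1)])


def windows(tokens, h):
    """List the token spans that each convolution step covers."""
    return [" ".join(tokens[i:i + h]) for i in range(len(tokens) - h + 1)]


tokens = "wait for the video and don't rent it".split()
X = np.ones((len(tokens), 3))   # placeholder embeddings with k = 3
kernel = np.ones((2, 3))        # kernel size (height) 2, width k
feature_map = conv1d_text(X, kernel)
spans = windows(tokens, 2)
```

With n tokens and kernel height h, the feature map has n - h + 1 entries, one per window.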

In the 1D-CNN technique, pooling may be performed after the convolution operation on the embedding vectors. As an example of pooling, max-pooling may be performed, which takes the largest value from the result vector obtained from each convolution operation.

FIG. 4 exemplarily illustrates extracting feature points by performing a convolution operation on the embedding vectors followed by max-pooling, for a kernel size of 2 and a kernel size of 3, respectively.
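
Convolution followed by max-pooling for kernel sizes 2 and 3 can be sketched as follows, assuming one kernel per size, stride 1, and no activation function. The function name and the toy input are hypothetical.

```python
import numpy as np


def conv_max_pool(X, kernels):
    """Convolve X with each (h, k) kernel, then keep the maximum of each feature map."""
    features = []
    for kernel in kernels:
        h = kernel.shape[0]
        feature_map = np.array([np.sum(X[i:i + h] * kernel)
                                for i in range(X.shape[0] - h + 1)])
        features.append(feature_map.max())  # max-pooling over all positions
    return np.array(features)


X = np.arange(12, dtype=float).reshape(4, 3)  # toy sentence matrix, n = 4, k = 3
kernels = [np.ones((2, 3)), np.ones((3, 3))]  # kernel sizes 2 and 3
features = conv_max_pool(X, kernels)
```

Because each kernel contributes exactly one pooled value, the resulting feature vector has a fixed length regardless of the sentence length n.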

As described with reference to FIGS. 2 to 4, the present invention can improve the training speed and accuracy of the artificial intelligence model by extracting feature points rather than using the sentence itself, and by training the artificial intelligence model based on the extracted feature points.

The learning unit 130 may, for example, determine whether packet data is normal or malicious by applying the feature points extracted by the 1D-CNN technique to the ensemble model. The ensemble model may use not only the payload data of the packet data but also the header information for modeling. In addition, the ensemble model may apply a different model structure to each feature point and use the average of the resulting outputs to make the final determination of normal or malicious.
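
The averaging step of the ensemble can be sketched as follows. It assumes that each member model outputs a maliciousness score between 0 and 1 and that the averaged score is compared with a threshold of 0.5; neither the score format nor the threshold is stated in the specification.

```python
def ensemble_predict(scores, threshold=0.5):
    """Average per-model maliciousness scores and apply a decision threshold."""
    if not scores:
        raise ValueError("at least one model score is required")
    mean_score = sum(scores) / len(scores)
    label = "malicious" if mean_score >= threshold else "normal"
    return label, mean_score


# Hypothetical scores from three member models for one packet.
label, mean_score = ensemble_predict([0.9, 0.7, 0.2])
```

Averaging lets a confident majority outvote a single dissenting member, which is one common reason for combining structurally different models.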

FIG. 5 is an exemplary diagram for explaining the structure of an ensemble model according to an embodiment of the present invention. The intrusion attack detection device 100 may determine whether packet data is normal or malicious using a CMAE model such as the one shown in FIG. 5.

The intrusion attack detection device 100 may determine whether an intrusion incident has occurred based on the result of determining, using the CMAE model, whether the packet data is normal or malicious.

In another embodiment, the intrusion attack detection device 100 may vary the preprocessing method and the model structure according to the characteristics of the data, perform training, and then determine whether packet data is normal or malicious.

For example, the intrusion attack detection device 100 may apply the result determined using the CMAE model to an explainable artificial intelligence (eXplainable Artificial Intelligence, XAI) model and analyze the basis for the determination.

The explainable artificial intelligence model may analyze the basis for determining an intrusion attack based on a self-attention model that derives an attention weight for each token of the payload.

The self-attention model may derive attention weights that indicate which features in the context a particular feature in a sentence refers to.

To obtain attention weights using the self-attention model, three vectors are first created from the embedding of each word, which is the input vector of each encoder: the embedding is multiplied by different matrices to produce the Q, K, and V vectors for each word. In the case of self-attention, the values of Q, K, and V may all be the same.

Next, the attention value is derived by calculating the dot product of Q and K for each word. The attention value determines how much to focus on the other words of the input sentence when encoding the word at a particular position.

Next, the output of the self-attention layer may be generated by multiplying each V vector by its attention value and summing the weighted vectors. Specifically, a softmax function is used to obtain the attention distribution, a probability distribution whose values sum to 1. Each value of the attention distribution is an attention weight. That is, each token has an attention weight, and FIG. 5 is an exemplary diagram visualizing this in matrix form.
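
The Q, K, V computation described above can be sketched as single-head self-attention in NumPy. The projection matrices `Wq`, `Wk`, and `Wv` are random placeholders, and the division by the square root of the key dimension follows the standard Transformer formulation; that scaling is an assumption here, since the text mentions only the dot product and the softmax.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(X, Wq, Wk, Wv):
    """Return the attention output and the (n, n) attention weight matrix."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # per-token query, key, value
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled dot products (assumed scaling)
    weights = softmax(scores, axis=-1)        # each row is an attention distribution
    return weights @ V, weights               # weighted sum of the value vectors


rng = np.random.default_rng(0)
n, k, d = 6, 8, 4                             # tokens, embedding dim, head dim
X = rng.normal(size=(n, k))
Wq, Wk, Wv = (rng.normal(size=(k, d)) for _ in range(3))
output, weights = self_attention(X, Wq, Wk, Wv)
```

Row i of `weights` shows how strongly token i attends to every token in the sequence, which is the matrix that an attention visualization would display.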

The learning unit 130 may select threat information tokens from among the tokens of the payload based on the attention weights derived using the self-attention model. For example, the learning unit 130 may select, from the tokenized header and payload, the five tokens with the highest attention weight values as threat information tokens.

The learning unit 130 may further derive the location information of the threat information token and the attention weight value of the threat information token. By deriving the location information of the threat information token, the learning unit 130 may indicate where in the payload the main part constituting the intrusion attack is located, thereby providing the basis for the determination made by the artificial intelligence model.
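
Selecting the five highest-weighted tokens together with their positions and weights can be sketched as follows. The example assumes that one attention weight per token is already available; how the attention matrix is reduced to a single weight per token is not specified in the text, and the tokens and weights shown are made up for illustration.

```python
import numpy as np


def select_threat_tokens(tokens, weights, top_k=5):
    """Return (position, token, weight) for the top_k highest attention weights."""
    weights = np.asarray(weights, dtype=float)
    if len(tokens) != len(weights):
        raise ValueError("tokens and weights must have the same length")
    # Stable sort on negated weights: descending order, ties keep input order.
    order = np.argsort(-weights, kind="stable")[:top_k]
    return [(int(i), tokens[i], float(weights[i])) for i in order]


# Hypothetical payload tokens and per-token attention weights.
tokens = ["GET", "/", "index", ".php", "?id=", "1", "'", "OR", "1=1", "--"]
weights = [0.02, 0.01, 0.03, 0.05, 0.10, 0.04, 0.20, 0.15, 0.30, 0.10]
threat_tokens = select_threat_tokens(tokens, weights, top_k=5)
```

The returned positions can be used to highlight the corresponding spans of the payload when presenting the basis for a determination.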

학습부(130)는 위협 정보 토큰의 위치 정보 및 위협 정보 토큰의 어텐션 가중치 값에 기초하여 인공지능 모델의 탐지 규칙을 개선할 수 있다. 예를 들어, 학습부(130)는 인공지능 모델이 어텐션 가중치 값이 높은 위협 정보 토큰의 핵심 징후를 판별할 수 있도록 인공지능 모델의 탐지 규칙을 개선할 수 있다.The learning unit 130 may improve detection rules of the artificial intelligence model based on the location information of the threat information token and the attention weight value of the threat information token. For example, the learning unit 130 may improve detection rules of the artificial intelligence model so that the artificial intelligence model can determine a key symptom of a threat information token having a high attention weight value.

도 6은 본 발명의 일 실시예에 따라 인공지능 모델의 탐지 규칙을 개선하는 방법을 설명하기 위한 예시적인 도면이다.6 is an exemplary diagram for explaining a method of improving a detection rule of an artificial intelligence model according to an embodiment of the present invention.

도 6을 참조하면, 침해 공격 탐지 장치(100)는 위협 정보 토큰의 핵심 징후를 판별할 수 있도록 위협 정보 토큰의 위치 정보 및 위협 정보 토큰의 어텐션 가중치 값에 기초하여 인공지능 모델의 탐지 규칙을 개선할 수 있다. Referring to FIG. 6 , the compromise attack detection apparatus 100 improves the detection rule of the artificial intelligence model based on the location information of the threat information token and the attention weight value of the threat information token so as to determine the key symptoms of the threat information token. can do.

For example, the intrusion attack detection device 100 may derive attention weights in units of n bytes and, as shown in FIG. 6, visualize the attention matrix and use it to identify the key indicators of the threat information tokens.
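Deriving attention weights in units of n bytes may be sketched as follows. The sketch assumes that the payload is split into consecutive n-byte units and that an attention matrix over those units is already available; averaging each column of the matrix to score a unit is an assumption made for illustration, not a step stated in the specification. The names `split_into_units` and `attention_received` and all numeric values are hypothetical.

```python
from typing import List


def split_into_units(payload: bytes, n: int) -> List[bytes]:
    """Split a payload into consecutive n-byte units (the last unit may be shorter)."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [payload[i:i + n] for i in range(0, len(payload), n)]


def attention_received(matrix: List[List[float]]) -> List[float]:
    """Average each column: how much attention a unit receives from all units."""
    if not matrix:
        return []
    rows = len(matrix)
    cols = len(matrix[0])
    return [sum(row[j] for row in matrix) / rows for j in range(cols)]


payload = b"id=1 OR 1=1"
units = split_into_units(payload, 4)

# Hypothetical 3x3 attention matrix (each row sums to 1), one row per query unit.
attention = [
    [0.2, 0.5, 0.3],
    [0.1, 0.6, 0.3],
    [0.1, 0.5, 0.4],
]

received = attention_received(attention)
key_index = max(range(len(units)), key=lambda j: received[j])
print(units[key_index], received[key_index])
```

The same matrix could be rendered as a heat map to obtain a visualization of the kind shown in FIG. 6.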

The detection unit 140 may detect an intrusion attack on a target network using the artificial intelligence model. For example, the detection unit 140 may detect an intrusion attack on the target network by monitoring the target network in real time.

The analysis unit 150 may analyze the detected intrusion attack. The analysis unit 150 may analyze at least one of a correlation and a causal relationship between the detected intrusion attack and other intrusion attacks.

For example, the analysis unit 150 may analyze, based on an association rule learning technique, the correlation between the detected intrusion attack and other intrusion attacks occurring in the same time period. As the association rule learning technique, for example, either Apriori or FP-Growth may be used.

The analysis unit 150 may analyze the correlation between two intrusion attacks by quantitatively analyzing at least one of the support, confidence, and lift between the detected intrusion attack and another intrusion attack.

The support may be derived, for example, as the ratio of the number of times the threat determination factors of the two intrusion attacks occurred together to the total number of threat occurrences.

The confidence may be derived, for example, as the ratio of the number of times the threat determination factors of the two intrusion attacks occurred together to the number of occurrences of the threat determination factor of the detected intrusion attack.

The lift may be derived, for example, as the ratio of the confidence to the proportion of the total number of threat occurrences that is accounted for by occurrences of the threat determination factor of the other intrusion attack.
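The three indicators defined above may be computed as in the following minimal sketch. It assumes that past threat occurrences are grouped into time windows, each represented as a set of attack labels; the function name `association_metrics`, the labels `sqli` and `scan`, and the sample data are hypothetical.

```python
from typing import Dict, List, Set


def association_metrics(events: List[Set[str]], a: str, b: str) -> Dict[str, float]:
    """Compute support, confidence, and lift for the rule a -> b.

    a: label of the detected intrusion attack
    b: label of the other intrusion attack
    """
    n = len(events)
    count_a = sum(1 for e in events if a in e)
    count_b = sum(1 for e in events if b in e)
    count_ab = sum(1 for e in events if a in e and b in e)
    if n == 0 or count_a == 0 or count_b == 0:
        raise ValueError("metrics are undefined without occurrences of both attacks")

    support = count_ab / n            # joint occurrences over all occurrences
    confidence = count_ab / count_a   # joint occurrences over occurrences of a
    lift = confidence / (count_b / n) # confidence relative to the base rate of b
    return {"support": support, "confidence": confidence, "lift": lift}


# Hypothetical time windows of observed attack types.
events = [
    {"sqli", "scan"},
    {"sqli", "scan"},
    {"sqli"},
    {"scan"},
    {"xss"},
]

metrics = association_metrics(events, "sqli", "scan")
print(metrics)
```

A lift greater than 1 suggests that the two attacks occur together more often than would be expected if they were independent.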

For example, when a specific type of intrusion attack is recognized, the intrusion attack detection device 100 may analyze past records in which the same type of intrusion attack occurred. The intrusion attack detection device 100 may express the correlation between that type of intrusion attack and other intrusion attacks that occurred concurrently with it as quantitative indicators such as support, confidence, and lift, and may analyze the other intrusion attacks that showed high values for each indicator.

By analyzing the correlation between the detected intrusion attack and other intrusion attacks occurring in the same time period, the intrusion attack detection device 100 may provide a basis for determining which of the threat alert elements are highly likely to develop into an incident.

For example, the analysis unit 150 may analyze, based on a sequence inference technique, the causal relationship between the detected intrusion attack and other intrusion attacks that occurred in the past. The sequence inference technique may analyze the time series of intrusion attack occurrences using, for example, either Prefix-Span or Sequential Bayesian.

By analyzing the causal relationship based on the sequence inference technique, the analysis unit 150 may extract and analyze patterns of newly occurring intrusion attacks.

FIG. 7 is an exemplary diagram for explaining a method of analyzing a causal relationship between a detected intrusion attack and other intrusion attacks that occurred in the past according to an embodiment of the present invention.

Referring to FIG. 7, the intrusion attack detection device 100 may analyze the causal relationship between the detected intrusion attack and other intrusion attacks that occurred in the past by recognizing repetition pattern A and repetition pattern B based on the sequence inference technique.

For example, the intrusion attack detection device 100 may analyze sequences of repeatedly occurring intrusion attacks, focusing on the other intrusion attacks that are highly correlated with the detected intrusion attack. The intrusion attack detection device 100 may recognize and extract partial sequences that are repeated in the analyzed intrusion attack sequences. In this way, it is possible to analyze the causes of other highly correlated intrusion attacks, or to extract patterns of security attacks that unfold over several stages.
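Recognizing repeated partial sequences may be illustrated by the following simplified sketch. It counts contiguous subsequences of a fixed length across several attack histories; this is a deliberately simplified stand-in and is not an implementation of Prefix-Span, which also mines non-contiguous patterns. The function name `repeated_subsequences` and the sample histories are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def repeated_subsequences(
    sequences: List[List[str]], length: int, min_count: int = 2
) -> Dict[Tuple[str, ...], int]:
    """Return contiguous subsequences of the given length that appear in
    at least min_count of the input sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        # Collect each distinct window once per sequence, so the count is
        # the number of sequences that contain the pattern.
        seen = {tuple(seq[i:i + length]) for i in range(len(seq) - length + 1)}
        counts.update(seen)
    return {pattern: c for pattern, c in counts.items() if c >= min_count}


# Hypothetical time-ordered attack histories (one list per incident).
histories = [
    ["scan", "bruteforce", "upload", "exfil"],
    ["scan", "bruteforce", "exfil"],
    ["phish", "scan", "bruteforce", "upload"],
]

patterns = repeated_subsequences(histories, length=2)
for pattern, count in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(" -> ".join(pattern), count)
```

A pattern such as `scan -> bruteforce` recurring across histories corresponds to the kind of multi-stage attack pattern described above.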

As another example, in relation to an incident caused by an intrusion attack, the intrusion attack detection device 100 may collect information on the institution, threat attack IP, threat victim IP, blacklisted IP, attack type, asset, threat attack port, threat victim port, incident victim protocol, detection rule, intent, attacking country, and victim country, and may build a database from the collected information.

The intrusion attack detection device 100 may extract sets of frequently occurring items from the database. For example, sets of sequences that frequently occur together may be extracted.

The intrusion attack detection device 100 may derive the support of the sequences extracted using the database, and may classify each sequence by its cause (institution, threat attack IP, threat victim IP, blacklisted IP, asset, threat attack port, threat victim port, incident victim protocol, detection rule, intent, attacking country, victim country) and its result (attack type).
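Deriving support and separating cause from result may be sketched as follows. The sketch assumes that each incident is stored as a record of field values and treats every non-result field value as a candidate cause of the recorded attack type; the field names, the function name `cause_result_support`, and the sample records are hypothetical and simplified to single-field causes.

```python
from collections import Counter
from typing import Dict, List, Tuple

RESULT_FIELD = "attack_type"  # the field treated as the result

Rule = Tuple[Tuple[str, str], str]  # ((cause field, cause value), result value)


def cause_result_support(records: List[Dict[str, str]], min_support: float) -> Dict[Rule, float]:
    """Return (cause, result) pairs whose support meets min_support."""
    n = len(records)
    if n == 0:
        return {}
    counts: Counter = Counter()
    for record in records:
        result = record[RESULT_FIELD]
        for field, value in record.items():
            if field != RESULT_FIELD:
                counts[((field, value), result)] += 1
    # Support: share of all records in which cause and result occur together.
    return {rule: c / n for rule, c in counts.items() if c / n >= min_support}


# Hypothetical incident records (addresses are documentation-range examples).
records = [
    {"attacker_ip": "203.0.113.7", "victim_port": "22", "attack_type": "bruteforce"},
    {"attacker_ip": "203.0.113.7", "victim_port": "22", "attack_type": "bruteforce"},
    {"attacker_ip": "198.51.100.4", "victim_port": "80", "attack_type": "sqli"},
    {"attacker_ip": "203.0.113.7", "victim_port": "80", "attack_type": "sqli"},
]

rules = cause_result_support(records, min_support=0.5)
for (cause, result), support in rules.items():
    print(cause, "->", result, support)
```

Re-running such a computation at preset intervals over the growing database would keep the classified results current.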

The intrusion attack detection device 100 may update the analysis result at preset time intervals.

By analyzing, based on the sequence inference technique, the causal relationship between the detected intrusion attack and other intrusion attacks that occurred in the past, the intrusion attack detection device 100 may provide a basis for identifying intrusion attack patterns that are highly likely to develop into new incidents. In addition, the result of the causal relationship analysis may be used to provide predictive information on the patterns of newly occurring incidents.

FIG. 8 is a flowchart of a method for detecting an intrusion attack on a network according to an embodiment of the present invention. The method 800 for detecting an intrusion attack on a network shown in FIG. 8, which is performed by the intrusion attack detection device 100, includes steps processed in time series by the intrusion attack detection device 100 according to the embodiment shown in FIG. 1. Therefore, even if omitted below, the description given above also applies to the method for detecting an intrusion attack on a network performed by the intrusion attack detection device 100 according to the embodiment shown in FIG. 1.

In step S810, the intrusion attack detection device 100 may collect packet data including a header and a payload.

In step S820, the intrusion attack detection device 100 may tokenize the header and payload of the collected packet data and embed the tokenized header and payload.

In step S830, the intrusion attack detection device 100 may train an artificial intelligence model based on the embedding result.

In step S840, the intrusion attack detection device 100 may detect an intrusion attack on a target network using the artificial intelligence model.

In step S850, the intrusion attack detection device 100 may analyze the detected intrusion attack.

In the above description, steps S810 to S850 may be further divided into additional steps or combined into fewer steps, depending on the implementation of the present invention. In addition, some steps may be omitted as needed, and the order of the steps may be changed.

FIG. 9 is an exemplary view of an interface screen provided to a user by the intrusion attack detection device 100 according to an embodiment of the present invention.

FIG. 9(a) is a screen that provides the user with the detection result of an intrusion attack on the target network. As shown in FIG. 9(a), the parts of the payload that contributed most to the determination may be highlighted, and the calculated location and importance of each part may be provided in the determination reason field.

FIG. 9(b) is an exemplary view of a screen that provides the result of analyzing the correlation between the detected intrusion attack and other intrusion attacks occurring in the same time period. The result of the correlation analysis may include the correlation between detailed fields that make up the threat information, such as IP address, port, and protocol. For example, the period over which the correlation is analyzed may be set by the user.

FIG. 9(c) is an exemplary view of a screen that provides the result of analyzing the causal relationship between the detected intrusion attack and other intrusion attacks that occurred in the past. The result of the causal relationship analysis may include information on the repeated sequences and information on the cause of the intrusion attack identified through the repeated sequences.

The interface screen provided to the user by the intrusion attack detection device 100 may include a screen for requesting improvement of the detection rules of the artificial intelligence model. The intrusion attack detection device 100 may receive a request to improve a detection rule, perform the improvement, and then verify the result of the improvement. The result of the improvement may include information on the false positive rate and true positive rate when the existing detection rule is used and the false positive rate and true positive rate when the modified detection rule is used.
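Verifying an improved detection rule by comparing rates before and after the change may be sketched as follows. It assumes a labeled validation set and the verdicts produced by the existing and modified rules on that set; the function name `detection_rates` and all sample values are hypothetical.

```python
from typing import List, Tuple


def detection_rates(labels: List[bool], predictions: List[bool]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate).

    labels: True where the sample is actually an intrusion attack
    predictions: True where the detection rule flagged the sample
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    tp = sum(1 for y, p in zip(labels, predictions) if y and p)
    fn = sum(1 for y, p in zip(labels, predictions) if y and not p)
    fp = sum(1 for y, p in zip(labels, predictions) if not y and p)
    tn = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical validation labels and rule verdicts.
labels = [True, True, True, False, False, False, False, False]
old_predictions = [True, True, False, True, True, False, False, False]
new_predictions = [True, True, True, True, False, False, False, False]

old_tpr, old_fpr = detection_rates(labels, old_predictions)
new_tpr, new_fpr = detection_rates(labels, new_predictions)
print("existing rule:", old_tpr, old_fpr)
print("modified rule:", new_tpr, new_fpr)
```

In this illustration the modified rule would be accepted, because its true positive rate is higher and its false positive rate is lower than those of the existing rule.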

The method for detecting an intrusion attack on a network in the intrusion attack detection device described with reference to FIGS. 1 to 9 may also be implemented in the form of a computer program stored in a medium and executed by a computer, or in the form of a recording medium including instructions executable by a computer.

A computer-readable medium may be any available medium that can be accessed by a computer, and includes both volatile and nonvolatile media and both removable and non-removable media. In addition, the computer-readable medium may include a computer storage medium. The computer storage medium includes volatile and nonvolatile, removable and non-removable media implemented by any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data.

The above description of the present invention is provided for illustrative purposes, and those of ordinary skill in the art to which the present invention pertains will understand that it can be easily modified into other specific forms without changing the technical spirit or essential features of the present invention. Therefore, the embodiments described above should be understood as illustrative in all respects and not limiting. For example, each component described as a single unit may be implemented in a distributed manner, and likewise, components described as distributed may be implemented in a combined form.

The scope of the present invention is indicated by the following claims rather than by the detailed description above, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as being included in the scope of the present invention.

100: intrusion attack detection device
110: collection unit
120: embedding unit
130: learning unit
140: detection unit
150: analysis unit

Claims

1. A device for detecting an intrusion attack on a network, the device comprising:
a collection unit that collects packet data including a header and a payload;
an embedding unit that tokenizes the header and payload of the collected packet data and embeds the tokenized header and payload;
a learning unit that trains an artificial intelligence model based on an embedding result;
a detection unit that detects an intrusion attack on a target network using the artificial intelligence model; and
an analysis unit that analyzes the detected intrusion attack.

2. The device of claim 1,
wherein the embedding unit derives an embedding vector of the tokenized header and payload, and
wherein the learning unit trains the artificial intelligence model based on the derived embedding vector.

3. The device of claim 2,
wherein the learning unit extracts feature points by performing a convolution operation on the embedding vector, and trains the artificial intelligence model based on the extracted feature points.

4. The device of claim 3,
wherein the artificial intelligence model comprises a first model that detects an intrusion attack based on a transformer-based ensemble model trained on the extracted feature points, and a second model that is an explainable artificial intelligence model that analyzes the basis for determining an intrusion attack based on a self-attention model that derives an attention weight for each token of the payload.

5. The device of claim 4,
wherein the learning unit selects a threat information token from among the tokens of the payload based on the derived attention weight.

6. The device of claim 5,
wherein the learning unit further derives location information of the threat information token and an attention weight value of the threat information token.

7. The device of claim 6,
wherein the learning unit improves a detection rule of the artificial intelligence model based on the location information of the threat information token and the attention weight value of the threat information token.

8. The device of claim 1,
wherein the analysis unit analyzes at least one of a correlation and a causal relationship between the detected intrusion attack and other intrusion attacks.

9. The device of claim 8,
wherein the analysis unit analyzes, based on an association rule learning technique, the correlation between the detected intrusion attack and other intrusion attacks occurring in the same time period.

10. The device of claim 8,
wherein the analysis unit analyzes, based on a sequence inference technique, the causal relationship between the detected intrusion attack and other intrusion attacks that occurred in the past.

11. A method for detecting an intrusion attack on a network, the method comprising:
collecting packet data including a header and a payload;
tokenizing the header and payload of the collected packet data;
embedding the tokenized header and payload;
training an artificial intelligence model based on an embedding result;
detecting an intrusion attack on a target network using the artificial intelligence model; and
analyzing the detected intrusion attack.

12. The method of claim 11, further comprising:
deriving an embedding vector of the tokenized header and payload,
wherein training the artificial intelligence model comprises training the artificial intelligence model based on the derived embedding vector.

13. The method of claim 12,
wherein training the artificial intelligence model comprises extracting feature points by performing a convolution operation on the embedding vector, and training the artificial intelligence model based on the extracted feature points.

14. The method of claim 13,
wherein the artificial intelligence model comprises a first model that detects an intrusion attack based on a transformer-based ensemble model trained on the extracted feature points, and a second model that is an explainable artificial intelligence model that analyzes the basis for determining an intrusion attack based on a self-attention model that derives an attention weight for each token of the payload.

15. The method of claim 14, further comprising:
selecting a threat information token from among the tokens of the payload based on the derived attention weight.

16. The method of claim 15, further comprising:
deriving location information of the threat information token and an attention weight value of the threat information token.

17. The method of claim 16, further comprising:
improving a detection rule of the artificial intelligence model based on the location information of the threat information token and the attention weight value of the threat information token.

18. The method of claim 11, further comprising:
analyzing, based on an association rule learning technique, a correlation between the detected intrusion attack and other intrusion attacks occurring in the same time period.

19. The method of claim 11, further comprising:
analyzing, based on a sequence inference technique, a causal relationship between the detected intrusion attack and other intrusion attacks that occurred in the past.

20. A computer program stored on a computer-readable recording medium and comprising a sequence of instructions for detecting an intrusion attack on a network,
wherein the computer program, when executed by a computing device, causes the computing device to:
collect packet data including a header and a payload;
tokenize the header and payload of the collected packet data;
embed the tokenized header and payload;
train an artificial intelligence model based on an embedding result;
detect an intrusion attack on a target network using the artificial intelligence model; and
analyze the detected intrusion attack.