KR100628329B1

KR100628329B1 - Generation apparatus and method of detection rules for attack behavior based on information of network session

Info

Publication number: KR100628329B1
Application number: KR1020050069968A
Authority: KR
Inventors: 정일안; 오진태; 장종수
Original assignee: 한국전자통신연구원
Priority date: 2005-07-30
Filing date: 2005-07-30
Publication date: 2006-09-27

Abstract

The present invention relates to a method and apparatus for automatically generating and updating attack behavior detection rules from network session characteristic information. The apparatus includes a network session characteristic information extraction unit and an automatic detection rule generation unit. The extraction unit converts network data, classified by session characteristic information, into an input data format that contains the session characteristic information together with the characteristic of the network data type (normal, attack, or unknown) to which the data belongs. The generation unit applies an extended C4.5 algorithm to the converted data to construct a decision tree, derives the accuracy of the decision tree from the error rate at its leaf nodes, and generates detection rules patterned for each network data type from the characteristic of the network data type, the conditional expressions used to select the nodes with optimal information gain in the decision tree, and the accuracy. Detection rules are thereby generated and updated automatically.

Network session characteristic information, detection rule, decision tree, automatic generation, automatic update

Description

Generation apparatus and method of detection rules for attack behavior based on information of network session

FIG. 1 is an overall configuration diagram of a method for automatically generating and updating attack behavior detection rules based on network session characteristic information.

FIG. 2(a) shows the session-wide statistical information of the network session characteristic information extraction unit shown in FIG. 1.

FIG. 2(b) shows the network protocol connection information of the network session characteristic information extraction unit shown in FIG. 1.

FIG. 2(c) shows the network protocol header information of the network session characteristic information extraction unit shown in FIG. 1.

FIGS. 2(d) and 2(e) show the connection information between two hosts of the network session characteristic information extraction unit shown in FIG. 1.

FIG. 3 is a flowchart of extracting characteristic information from the collected data in the network session characteristic information extraction unit (130) of FIG. 1.

FIG. 4 is a flowchart of automatically generating detection rules in the automatic detection rule generation unit shown in FIG. 1.

FIG. 5 is a flowchart for verifying generated detection rules in the detection rule verification unit within the detection rule verification and update unit shown in FIG. 1.

FIG. 6 is a flowchart for updating the accuracy of generated detection rules in the automatic detection rule update unit within the detection rule verification and update unit shown in FIG. 1.

FIG. 7 is an overall flowchart for automatically generating detection rules.

FIG. 8 compares the effect of the present invention with one of the conventional attack detection methods.

FIG. 9 shows the improvement in detection rule accuracy obtained by converting continuous data into discrete data.

FIGS. 10 and 11 show the results of applying detection rule generation according to an embodiment of the present invention to experimental data.

The present invention relates to a method and apparatus for automatically generating and updating attack behavior detection rules from network session characteristic information. Detection rules for each attack behavior are generated automatically from per-session characteristic information, and the accuracy of the generated rules is updated automatically by computing their detection accuracy. The resulting, more accurate rules are used to detect known attacks, similar attack types, and types that differ from normal behavior.

In general, intrusion detection methods are divided into feature-based rule detection and anomaly detection. The feature-based method analyzes known attack packets, turns them into patterns, and judges an attack by checking whether traffic matches a pattern. Its detection accuracy for known attacks is quite high, but it is limited in detecting modified attack behavior and new, unknown attacks.

Anomaly detection, intended to overcome this drawback, models data generated by normal behavior and detects attacks by their degree of deviation from that normal model. It is used mainly to detect unknown attacks.

Finding intrusion-related features in large volumes of network data mostly relies on analysis by experts. This approach is accurate, but the analysis is costly and it copes poorly with new forms of attack.

Anomaly detection, in contrast, is effective at detecting unknown attacks, but it is hard to characterize and less accurate than the feature-based approach. Moreover, there has been almost no research on using session-level characteristic information to extract features of attack types that do not appear in individual network packets or cannot be clearly distinguished from normal network data flows.

To solve these problems, an object of the present invention is to automatically generate and update attack behavior detection rules from network session characteristic information, and thereby to produce more accurate detection rules.

In a preferred embodiment of the present invention, an apparatus for generating attack behavior detection rules includes: a network session characteristic information extraction unit that converts network data classified by session characteristic information into an input data format containing the session characteristic information and the characteristic of the network data type (normal, attack, or unknown) to which the network data belongs; and an automatic detection rule generation unit that applies a classification algorithm to the converted network data to construct a decision tree, derives the accuracy of the decision tree from the error rate at its leaf nodes, and generates detection rules patterned for each network data type from the characteristic of the network data type, the conditional expressions for selecting the nodes with optimal information gain in the decision tree, and the accuracy.

In a preferred embodiment of the present invention, the apparatus further includes a detection rule verification and update unit. When a previously generated detection rule matches a newly generated detection rule, this unit produces the error rate for each network data type by multiplying the error rate by the error rate computed over that network data type alone, and updates the accuracy of the decision tree based on the resulting per-type error rates.

In another preferred embodiment of the present invention, a method for generating attack behavior detection rules includes: classifying network data by session characteristic information; converting the classified data into an input data format containing the session characteristic information and the characteristic of the network data type (normal, attack, or unknown) to which the network data belongs; applying a classification algorithm to the converted data to construct a decision tree; deriving the accuracy of the decision tree from the error rate at its leaf nodes; and generating detection rules patterned for each network data type from the characteristic of the network data type to which the network data belongs, the conditional expressions for selecting the nodes with optimal information gain in the decision tree, and the accuracy.

In another preferred embodiment of the present invention, the method further includes: determining whether a previously generated detection rule matches a newly generated detection rule; when the rules match, producing the error rate for each network data type by multiplying the error rate by the error rate computed over that network data type alone; and updating the accuracy of the decision tree based on the resulting per-type error rates.

Preferred embodiments of the present invention are described below with reference to the accompanying drawings. Note that the same components are denoted by the same reference numerals and symbols wherever possible, even when they appear in different drawings. In the following description, detailed explanations of related well-known functions or configurations are omitted where they would unnecessarily obscure the subject matter of the invention.

FIG. 1 shows the overall flow of automatically generating and updating attack behavior detection rules based on network session characteristic information according to the present invention. The system includes a network data generation unit (110), a network data collection unit (120), a network session characteristic information extraction unit (130), an automatic detection rule generation unit (140), a detection rule storage database (150), and a detection rule verification and update unit (180).

The network data generation unit (110) generates data. Here, data generation includes receiving data from a live network through a network card, having an administrator artificially generate normal behavior data in a normal network environment that contains no attack behavior, and having an administrator artificially generate attack behavior data using various attack tools.

The network data collection unit (120), implemented in software or in hardware, collects the data produced by the network data generation unit.

Depending on the type of data generated by the network data generation unit (110), the network data collection unit (120) collects normal behavior data as "Normal" data, attack behavior data as "Attack" data, and anything that belongs to neither as "Unknown" data.

When collecting attack behavior data, the administrator labels each set of attack behavior data with the name of the attack and stores it as a separate file. The file name format contains the attack type, the attack name, and other information.

In a preferred embodiment of file naming, a denial-of-service attack type is written as 'dos', and an attack that is a mailbomb against a mail server is written as 'mailbomb'. The other-information field records, for example, the date or day of the week of the test, and the fields are separated by '_'. Generating file names in the form 'dos_mailbomb_w2frii' thus makes it easy to extract characteristic information from the list file that manages the generated data.
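The naming convention above can be illustrated with a short sketch. The function name `parse_capture_name` and the `CaptureLabel` type are hypothetical and are not part of the patent; the only facts taken from the description are the three fields and the '_' separator.

```python
from typing import NamedTuple


class CaptureLabel(NamedTuple):
    attack_type: str  # e.g. "dos"
    attack_name: str  # e.g. "mailbomb"
    extra: str        # other information, e.g. test date or weekday


def parse_capture_name(name: str, sep: str = "_") -> CaptureLabel:
    """Split 'type_name_extra' into its three fields (hypothetical helper)."""
    # Split at most twice so the trailing field may itself contain the separator.
    parts = name.split(sep, 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'type{sep}name{sep}extra', got {name!r}")
    return CaptureLabel(*parts)


label = parse_capture_name("dos_mailbomb_w2frii")
print(label.attack_type, label.attack_name, label.extra)
```

Splitting at most twice is a design choice of this sketch, so that any further separators stay inside the other-information field.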

The network session characteristic information extraction unit (130) classifies the data collected by the network data collection unit (120) by session and extracts characteristic information. This is described in more detail with reference to FIGS. 2 and 3.

The automatic detection rule generation unit (140) generates detection rules that classify the data by attack type. It does so from the characteristic information extracted by the extraction unit (130), using an algorithm that extends C4.5, one of the decision tree algorithms based on information theory. This is described in more detail with reference to FIG. 4.

The detection rule storage database (150) stores the detection rules generated by the automatic detection rule generation unit (140).

The detection rule verification and update unit (180) includes a detection rule verification unit (160) and an automatic detection rule update unit (170). The verification unit (160) verifies the detection rules that were generated by the generation unit (140) and stored in the database (150), and the automatic update unit (170) updates the accuracy of the rules verified by the verification unit (160).

The detection rule verification and update unit (180) determines whether a previously generated detection rule matches a newly generated one, produces the error rate for each network data type by multiplying the error rate by the error rate computed over that network data type alone, and updates the accuracy of the decision tree based on the resulting per-type error rates. This is described in more detail with reference to FIGS. 5 and 6.
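One possible reading of this update can be sketched as follows. The text does not say which quantity "the error rate" refers to. The sketch assumes it is the error rate already stored with the matched rule, and it reuses the definition given later in the description (confidence = 1.0 - error rate). The function name is hypothetical.

```python
def updated_confidence(stored_error_rate: float, type_only_error_rate: float) -> float:
    """Return the confidence after one update (an assumed reading of the text).

    stored_error_rate:    error rate kept with the matched rule (assumption)
    type_only_error_rate: error rate measured over one network data type only
    """
    for rate in (stored_error_rate, type_only_error_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("error rates must lie in [0, 1]")
    # Per-type error rate: the product described in the text.
    per_type_error = stored_error_rate * type_only_error_rate
    # Confidence as defined later in the description: 1.0 - error rate.
    return 1.0 - per_type_error


print(f"{updated_confidence(0.2, 0.5):.3f}")
```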

FIG. 2 shows an embodiment in which the network session characteristic information extraction unit (130) classifies the data collected from the network data collection unit (120) by session and extracts characteristic information.

Depending on the object and viewpoint of the analysis, the extraction unit (130) defines the network session characteristic information in four broad categories: session-wide statistical information (FIG. 2(a)), network protocol connection information (FIG. 2(b)), network protocol header information (FIG. 2(c)), and connection information between two hosts (FIGS. 2(d) and 2(e)).

Data are classified by type as continuous, where values vary continuously within a given range, or discrete, where values are limited to specific values or fixed ranges and are thereby distinguishable.

Session-wide statistical information (FIG. 2(a)) is statistical information computed over all packets in a session. Extracting characteristic information per packet limits the data that can be extracted, whereas extracting it per session is more effective for classifying types.

Network protocol connection information (FIG. 2(b)) is information obtained by analyzing the establishment of TCP/IP connections. Network protocol header information (FIG. 2(c)) is the information in the main fields of interest in the TCP/IP protocol headers. Connection information between two hosts (FIGS. 2(d) and 2(e)) consists of various statistics that take the direction of traffic between the two hosts into account.

The network session characteristic information extraction unit (130) classifies the data collected from the network data collection unit (120) by session, extracts characteristic information (S310), and accumulates the characteristic information for each session as shown in FIG. 2 (S320).

It then determines whether the data from which network characteristic information was extracted for each session is continuous or discrete (S330).

If the extracted data is continuous, it undergoes a characteristic classification test (S340) that searches, based on threshold ranges set by the administrator, for the intervals in which the data is continuous or discrete.

Compared with discrete data, continuous data strongly affects the outcome of an information-theoretic classification algorithm depending on how thresholds are determined. Continuous data therefore undergoes the characteristic classification test so that it can be applied to a classification algorithm whose properties suit discrete data.

If the extracted data is discrete, it is classified by normal and attack type (S370) and then converted (S380) into an input data format such as "session information 1, session information 2, ..., session information N, label" for use by the classification algorithm.

After the characteristic classification test (S340), whether the classification is optimal is judged (S350) by whether the continuous data values can be partitioned into discrete data values consisting of specific values or representative values of fixed ranges (for example, the mean).

If the classification is optimal, the continuous data is converted into discrete data (S360). Otherwise, the continuous data is classified by normal and attack type without conversion (S370) and converted (S380) into an input data format such as "session information 1, session information 2, ..., session information N, label" for use by the classification algorithm.
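The conversion in S360 can be sketched as below, under assumptions. The description only says that continuous values are partitioned by administrator-set threshold ranges and replaced by a representative value such as the mean. The function name, the sample values, and the thresholds are illustrative, and the optimality test of S350 is not modeled because the text does not specify it.

```python
from bisect import bisect_right
from statistics import mean
from typing import Dict, List, Sequence


def discretize(values: Sequence[float], thresholds: Sequence[float]) -> List[float]:
    """Replace each continuous value by the mean of the interval it falls in."""
    cuts = sorted(thresholds)
    bins: Dict[int, List[float]] = {}
    for value in values:
        # bisect_right gives the index of the interval containing the value;
        # a value equal to a threshold is placed in the upper interval.
        bins.setdefault(bisect_right(cuts, value), []).append(value)
    representative = {index: mean(members) for index, members in bins.items()}
    return [representative[bisect_right(cuts, value)] for value in values]


# Hypothetical idle-time samples and hypothetical administrator thresholds.
idle_times = [1.0, 3.0, 40.0, 50.0, 900.0]
print(discretize(idle_times, thresholds=[10.0, 100.0]))
# prints [2.0, 2.0, 45.0, 45.0, 900.0]
```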

Here, the label is a name indicating the characteristic of the network data type to which the network data belongs. Following the convention used for file names in the network data collection unit (120), normal behavior data is labeled 'Normal', and attack behavior data is labeled with the attack name taken from the corresponding file name in the network data collection unit.
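As a minimal sketch of the input data format produced in S380, the record below joins the session features and appends the label as the last field. The helper name and the feature values are hypothetical; only the field order (session information 1 to N, then the label) comes from the description.

```python
from typing import Iterable, Union

Feature = Union[int, float, str]


def to_input_record(session_features: Iterable[Feature], label: str) -> str:
    """Format one session as 'info 1, info 2, ..., info N, label'."""
    fields = [str(value) for value in session_features]
    fields.append(label)  # the label ('Normal' or an attack name) comes last
    return ", ".join(fields)


# Made-up feature values for one session labeled with an attack name.
record = to_input_record([112.0, 988.7, 23], "mailbomb")
print(record)
```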

FIG. 4 is a flowchart in which the automatic detection rule generation unit (140) classifies the session characteristic information input data, converted into the input data format in FIG. 3, by attack type using an algorithm that extends C4.5, one of the decision tree algorithms based on information theory.

C4.5, proposed by J. Ross Quinlan, is an algorithm that constructs a tree automatically by computing the gain obtained from the reduction in entropy. Network data contains a large volume of packets, and continuous and discrete measures coexist within packets; as a classification algorithm, C4.5 is well suited to classifying such data effectively.

In addition, compared with neural networks or Bayesian classifiers, the output of this kind of classification algorithm is easy to convert into statements in SQL, the query language of relational databases, which is an advantage for generating detection rules.
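To illustrate why tree conditions map easily onto SQL, the sketch below joins condition strings into a WHERE clause and runs the query against an in-memory SQLite table. The table name `sessions`, the helper `rule_to_sql`, and the sample rows are assumptions made for the example. The column names are taken from the mailbomb example in this description.

```python
import sqlite3
from typing import Sequence


def rule_to_sql(conditions: Sequence[str], table: str = "sessions") -> str:
    """Join rule conditions into a COUNT query over a hypothetical table.

    The conditions are interpolated as-is, so they must come from a trusted
    rule generator, never from untrusted input.
    """
    return f"SELECT COUNT(*) FROM {table} WHERE " + " AND ".join(conditions)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (HAttl_length REAL, HBidletime_max REAL, HBminseg_size REAL)"
)
# Three made-up sessions; only the first satisfies all three conditions.
conn.executemany(
    "INSERT INTO sessions VALUES (?, ?, ?)",
    [(120, 500.0, 30), (100, 500.0, 30), (150, 20.0, 10)],
)
query = rule_to_sql(
    ["HAttl_length > 112", "HBidletime_max <= 988.7", "HBminseg_size > 23"]
)
matches = conn.execute(query).fetchone()[0]
print(query)
print(matches)
```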

The basic C4.5 is a classification algorithm that J. Ross Quinlan developed on the basis of the ID3 algorithm of 1986. It supports multiway splits, in which a parent node may be split into two or more child nodes, and the resulting tree is easy to interpret.

That is, a discrete attribute can be split into as many branches as it has distinct values, and continuous attributes can also be split. Continuous attributes, however, may not classify as well as discrete ones. Finally, the error rate is examined at the leaf nodes and the tree is pruned to minimize the error rate.

Among the classification algorithms available to the automatic detection rule generation unit, the extended C4.5 algorithm constructs the tree automatically, which simplifies the choice of measures for classification and makes the resulting tree model easy to interpret.

Because it is also easy to see how combinations of the constituent variables affect the target variable, the algorithm is well suited, when applied to attack detection, to identifying the relationship between detection rules and attacks.

The extended C4.5 algorithm adapts the basic C4.5 algorithm to detection rule generation. It must be preceded by the conversion into input data that follows the characteristic classification test in the network session characteristic information extraction unit.

The decision tree is constructed in the same way as in the basic C4.5 algorithm, except that no pruning is performed and the error rate is computed at the leaf nodes. Finally, the constructed decision tree is converted into decision tree rules.

Referring to FIG. 4, the automatic detection rule generation unit (140) uses the extended C4.5 algorithm to generate detection rules automatically as follows.

The frequency of each class in the session characteristic information input data (S380) is computed (S410). A node of the decision tree is constructed from the computed frequencies (S420). The information gain of each attribute is computed (S430), and the attribute with the optimal information gain is selected (S440).
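Steps S410 to S440 can be sketched for discrete attributes as follows. This is a simplified illustration and not the patent's extended C4.5. It uses plain information gain as named in the text, whereas C4.5 as published normalizes the gain by split information (gain ratio). The attribute names `proto` and `size` and the four toy records are invented for the example.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy of the class frequencies (S410)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows: Sequence[Dict[str, str]], labels: Sequence[str], attr: str) -> float:
    """Entropy reduction obtained by splitting on one discrete attribute (S430)."""
    groups: Dict[str, List[str]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_attribute(rows: Sequence[Dict[str, str]], labels: Sequence[str]) -> str:
    """Pick the attribute with the largest information gain (S440)."""
    return max(rows[0], key=lambda attr: information_gain(rows, labels, attr))


# Toy, already discretized session records (hypothetical attributes and values).
rows = [
    {"proto": "tcp", "size": "big"},
    {"proto": "tcp", "size": "small"},
    {"proto": "udp", "size": "big"},
    {"proto": "udp", "size": "small"},
]
labels = ["mailbomb", "Normal", "mailbomb", "Normal"]
print(best_attribute(rows, labels))
```

In this toy data, `size` separates the two classes perfectly and `proto` carries no information, so `size` is selected.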

If the node is not a leaf node, steps S410 through S440 are repeated (S450).

If the node is a leaf node, the error rate of each decision tree is computed to determine the accuracy (S460). As in the basic C4.5 algorithm, the error rate is computed as the proportion of cases in the data set that do not satisfy the generated rule. The accuracy is defined as consisting of support and confidence.

Support indicates how frequently a rule occurs; the larger the support, the more useful the rule. Confidence indicates how trustworthy a rule is; the larger the confidence, the more accurate the rule.

Support is defined as the ratio of the size of the item in question to the size of the entire data set, and confidence is defined as 100% minus the error rate.

That is:

Confidence = 1.0 - error rate

Support = size of the item / size of the entire data set
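The two definitions above translate directly into a small helper. The description does not pin down exactly what counts as "the item", so the sketch takes both sizes as plain parameters. The function name and the input numbers are illustrative assumptions.

```python
from typing import Tuple


def rule_accuracy(error_rate: float, item_size: int, dataset_size: int) -> Tuple[float, float]:
    """Return (support, confidence) following the two definitions in the text."""
    if dataset_size <= 0:
        raise ValueError("dataset_size must be positive")
    confidence = 1.0 - error_rate         # confidence = 1.0 - error rate
    support = item_size / dataset_size    # support = item size / data set size
    return support, confidence


# Made-up inputs: a 2% error rate on an item that spans the whole data set.
support, confidence = rule_accuracy(error_rate=0.02, item_size=50, dataset_size=50)
print(f"{support:.3f}, {confidence:.3f}")  # prints 1.000, 0.980
```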

Once the accuracy has been determined at the leaf nodes (S460), a decision tree whose nodes carry accuracy values is constructed (S470).

As an example, the following shows part of a decision tree constructed for attack behavior (Attack) corresponding to a mailbomb attack and for normal behavior (Normal).

Each line of the decision tree has the form "level-number measure-name condition-operator measure-value : [Normal or attack name (accuracy)]". The bracketed item appears only at leaf nodes, and the accuracy consists of support and confidence as defined above.

L2 HAttl_length > 112 :

L3 HBidletime_max <= 988.7 :

L4 HBminseg_size > 23 : mailbomb (1.000, 0.980)

L4 HBminseg_size <= 23 :

L5 HBidletime_max <= 46.6 : Normal (0.921, 0.752)

각 노드에서 정확도를 갖는 결정트리를 구성한 후(S470) 정상 및 각 공격 유형별로 "규칙번호;공격명;조건식들;정확도(지지도, 신뢰도)" 형식으로 표현된 탐지규칙으로 변환(S480)하여 탐지규칙 저장 데이터베이스(150)에 저장한다. After constructing a decision tree with accuracy at each node (S470), it is converted into a detection rule expressed in the form of "rule number; attack name; conditional expressions; accuracy (support map, reliability)" for each type of attack and detection (S480). The rule storage database 150 stores the data.

위에서 설명한 mailbomb 공격에 해당하는 공격행위(Attack)와 정상행위(Normal)인 경우에 구성된 결정트리의 일부분의 일 실시예를 탐지규칙 형식으로 변환하면 다음과 같이 표현된다.When an embodiment of a part of the decision tree configured in the case of attack and normal corresponding to the mailbomb attack described above is converted into a detection rule format, it is expressed as follows.

1;mailbomb;HAttl_length > 112, HBidletime_max <= 988.7, HBminseg_size > 23;1.000, 0.980
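The conversion from decision-tree lines to detection rules (S480) can be sketched as follows. This is a minimal illustration, not the patented implementation: the regular expression, the function name `tree_to_rules`, and the sequential rule numbering are assumptions made for the example; only the two text formats come from the description above.

```python
import re

# Assumed line shape: "L<level> <measure> <op> <value> :" optionally followed by
# "<label> (<support>, <confidence>)" when the node is a final node.
NODE = re.compile(
    r"^L(\d+)\s+(\S+)\s*(<=|>=|<|>|=)\s*(\S+)\s*:\s*"
    r"(?:(\S+)\s*\(([\d.]+),\s*([\d.]+)\))?\s*$"
)

def tree_to_rules(lines):
    """Turn tree lines into 'number;name;conditions;support, confidence' strings."""
    path = {}   # level number -> condition on the current root-to-node path
    rules = []
    for line in lines:
        m = NODE.match(line.strip())
        if not m:
            continue  # skip blank or unrecognized lines
        level, measure, op, value, label, support, confidence = m.groups()
        level = int(level)
        # A node at this level replaces any condition at the same or deeper levels.
        for deeper in [k for k in path if k >= level]:
            del path[deeper]
        path[level] = f"{measure} {op} {value}"
        if label is not None:  # final node: emit one rule for the whole path
            conds = ", ".join(path[k] for k in sorted(path))
            # Sequential numbering is an assumption of this sketch.
            rules.append(f"{len(rules) + 1};{label};{conds};{support}, {confidence}")
    return rules

tree = [
    "L2 HAttl_length > 112 :",
    "L3 HBidletime_max <= 988.7 :",
    "L4 HBminseg_size > 23 : mailbomb (1.000, 0.980)",
    "L4 HBminseg_size <= 23 :",
    "L5 HBidletime_max <= 46.6 : Normal (0.921, 0.752)",
]
for rule in tree_to_rules(tree):
    print(rule)
```

With the partial tree above, the first emitted rule reproduces the mailbomb rule shown in the text; the second is the Normal rule reached through the `HBminseg_size <= 23` branch.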

FIG. 5 is a detailed configuration diagram of the detection rule verification unit 160 within the detection rule verification and update unit 180 shown in FIG. 1.

When the detection rule verification controller starts verification (S510), normal and attack data are generated by the network data generation unit 110 and collected by the network data collection unit 120. Session characteristic information is extracted from the collected data by the network session characteristic information extractor 130, as in the process of FIG. 3 (S511-S514), and converted into the detection rule format "rule number;attack name;conditional expressions;accuracy (support, confidence)" (S520).

The converted detection rule is then checked, by means of a query, against the detection rules stored in the detection rule storage database 150 (S530). The query searches all detection rules stored in the database for a one-to-one match, comparing every conditional expression, in order, with those contained in the detection rule format of step S480 in FIG. 4.

If all conditional expressions match, the accuracy of the detection rule is updated by the automatic detection rule updater 170 (S540). If one or more conditional expressions do not match, the process returns to step S613 and repeats the subsequent steps.
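The matching step (S530) can be illustrated with a short sketch. It assumes rules are held in memory as parsed dictionaries instead of being queried from the database 150; the helper names `parse_rule` and `find_matching_rule` are hypothetical. Following the description, only the ordered list of conditional expressions is compared.

```python
def parse_rule(text):
    """Split 'number;name;conditions;support, confidence' into its fields."""
    number, name, conditions, accuracy = text.split(";")
    # Normalize whitespace so that spacing differences do not prevent a match.
    conds = [" ".join(c.split()) for c in conditions.split(",")]
    support, confidence = (float(x) for x in accuracy.split(","))
    return {"number": int(number), "name": name.strip(), "conditions": conds,
            "support": support, "confidence": confidence}

def find_matching_rule(candidate, stored_rules):
    """Return the stored rule whose conditions all equal the candidate's, in order."""
    for stored in stored_rules:
        if stored["conditions"] == candidate["conditions"]:
            return stored
    return None  # at least one conditional expression differs in every stored rule

stored_rules = [parse_rule(
    "1;mailbomb;HAttl_length > 112, HBidletime_max <= 988.7, HBminseg_size > 23;1.000, 0.980"
)]
candidate = parse_rule(
    "9;mailbomb;HAttl_length > 112, HBidletime_max <= 988.7, HBminseg_size > 23;0.900, 0.950"
)
match = find_matching_rule(candidate, stored_rules)
print(match["number"] if match else "no match")
```

The candidate's rule number and accuracy values are made up for the example; a match here would lead to the accuracy update of S540.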

FIG. 6 is a detailed configuration diagram of the automatic detection rule update unit 170 within the detection rule verification and update unit 180 shown in FIG. 1.

For the session characteristic information input data and detection rule verified by the detection rule verification unit 160, it is determined whether the data type is normal behavior (Normal), attack behavior (Attack), or unknown behavior (Unknown) (S610).

The data type is determined from the label, which is the last field of the converted input data format of FIG. 3; anything that is neither normal nor attack behavior is treated as unknown behavior (S380).

An error rate is calculated separately for each data type (normal, attack, unknown) (S611, S612, S613).

The error rate reflecting normal behavior is the error rate of the detection rule's accuracy (100% confidence minus the confidence of the rule) multiplied by the error rate measured on the normal behavior data set alone (S611).

The error rate reflecting attack behavior is the error rate of the detection rule's accuracy multiplied by the error rate measured on the attack behavior data set alone (S612).

The error rate reflecting unknown behavior is calculated on the data set containing only data that is neither normal nor attack behavior (S613).

Reflecting the error rate for each data type (normal, attack, unknown) in this way makes the detection rule accuracy more reliable.
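The per-type error rates of S611 to S613 can be written out as a small sketch. The function names and the sample data-set error rates are hypothetical; rates are expressed as fractions in [0, 1] instead of percentages. For the unknown type the description only states which data set is used, so this sketch passes that rate through unchanged, which is an assumption.

```python
def reflected_error_rate(rule_confidence, dataset_error_rate):
    """(1 - rule confidence) x error rate measured on one data type's data set."""
    if not (0.0 <= rule_confidence <= 1.0 and 0.0 <= dataset_error_rate <= 1.0):
        raise ValueError("rates must lie in [0, 1]")
    return (1.0 - rule_confidence) * dataset_error_rate

def error_rates_by_type(rule_confidence, normal_err, attack_err, unknown_err):
    """Collect the three error rates used when updating a rule's accuracy."""
    return {
        "normal": reflected_error_rate(rule_confidence, normal_err),  # S611
        "attack": reflected_error_rate(rule_confidence, attack_err),  # S612
        # S613: assumed to be the plain error rate on the unknown-only data set.
        "unknown": unknown_err,
    }

# Rule confidence 0.980 is taken from the mailbomb rule; the data-set error
# rates below are invented for illustration.
rates = error_rates_by_type(0.980, normal_err=0.10, attack_err=0.05, unknown_err=0.20)
print(rates)
```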

Based on the error rates calculated in steps S611 to S613, the accuracy of the detection rule is calculated using weighting, the perceptron rule, fuzzy theory, or the like (S620). As in the automatic detection rule generation unit 140, the accuracy consists of confidence and support.

For example, with the weighting method, the new accuracy is calculated by adding to the rule's existing accuracy the product of the calculated error rate and the administrator-defined weight for each data type.
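A minimal sketch of the weighting method, under stated assumptions: the accuracy is treated as a single number (the description says it comprises support and confidence, so a real update would presumably apply to each), and the weights are hypothetical. The description does not specify the sign or scale of the weights; negative values are chosen here so that larger error rates lower the accuracy.

```python
def update_accuracy(existing_accuracy, error_rates, weights):
    """existing accuracy + sum over data types of weight[type] * error_rate[type]."""
    adjustment = sum(weights[t] * error_rates[t] for t in error_rates)
    return existing_accuracy + adjustment

# Hypothetical per-type error rates and administrator-defined weights.
error_rates = {"normal": 0.002, "attack": 0.001, "unknown": 0.20}
weights = {"normal": -1.0, "attack": -1.0, "unknown": -0.1}

new_accuracy = update_accuracy(0.980, error_rates, weights)
print(round(new_accuracy, 3))
```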

The accuracy of the rule found in the step where the detection rule verification unit checks, by query, whether the converted detection rule matches the detection rules stored in the detection rule storage database 150 is then updated automatically and stored in the detection rule storage database 150 (S630, S640).

The data generated by the network data generation unit 110 of FIG. 1 is collected by the network data collection unit 120 according to the type of the generated data. The collected network data is then classified by session characteristic information (S700). The classified data is converted into an input data format that includes the session characteristic information and the characteristic of the network data type (normal, attack, or unknown) to which the network data belongs (S710). For details, see the description of the network session characteristic information extraction unit given with reference to FIG. 3.

A decision tree is constructed by applying a classification algorithm to the data converted into the input data format (S720), and the accuracy of the decision tree is generated from the error rate at the final node of the decision tree (S730).

Then, a detection rule patterned for each network data type is generated based on the name, the conditional expression used to select the node with the optimal information gain in the decision tree, and the accuracy (S740). For details, see the description of the automatic detection rule generation unit given with reference to FIG. 4.
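To make "the node with the optimal information gain" concrete, the sketch below computes the plain entropy-based information gain of a threshold condition such as `HBminseg_size <= 23` and picks the best threshold. It is a textbook illustration only, not the extended C4.5 algorithm of the invention; the sample values and labels are invented.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(values, labels, threshold):
    """Entropy reduction obtained by splitting on 'value <= threshold'."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

def best_threshold(values, labels):
    """Candidate thresholds are the distinct observed values except the largest."""
    candidates = sorted(set(values))[:-1]
    return max(candidates, key=lambda t: information_gain(values, labels, t), default=None)

# Invented sample: one measure observed over six sessions.
values = [20, 22, 23, 24, 32, 40]
labels = ["Normal", "Normal", "Normal", "mailbomb", "mailbomb", "mailbomb"]
t = best_threshold(values, labels)
print(t, information_gain(values, labels, t))
```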

It is then determined whether a previously generated detection rule matches the newly generated detection rule (S750). For details, see the description of the detection rule verification unit within the detection rule verification and update unit given with reference to FIG. 5.

If the detection rules match, an error rate for each network data type is generated by multiplying the error rate by the error rate measured on that network data type alone, and the accuracy of the decision tree is updated based on the generated per-type error rates (S760). For details, see the description of the automatic detection rule update unit within the detection rule verification and update unit given with reference to FIG. 6.

FIG. 8 compares the effect of the present invention with one conventional attack detection method. It shows the difference in attack detection rate between the detection rule generation method of the present invention and the LERAD approach of the Florida Institute of Technology.

In FIG. 8, the horizontal axis lists the attack names and the vertical axis shows the detection rate for each attack. The present invention is indicated by the solid arrows, and the detection rate of the cited LERAD is also shown. As the figure shows, the present invention achieves a detection result about 10% better than the LERAD results.

FIGS. 9(a) and 9(b) show the improvement in detection rule accuracy obtained by converting continuous data into discrete data.

FIG. 9(a) shows that, when applied to the DARPA experimental data, the detection rate after conversion to discrete data is higher than with continuous data. As the figure shows, there is almost no improvement for the Tue and Wed data, but the detection rate improves for the remaining data.

FIG. 9(b) shows that the false positive rate is lower for data converted to discrete form than for continuous data. There is no large change for the Mon and Fri data, but the false positive rate decreases for the other data.
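The continuous-to-discrete conversion compared in FIGS. 9(a) and 9(b) can be pictured as assigning each continuous measure value to an interval defined by threshold values. The sketch below is one simple way to do that with the standard library; the edge values and the function name `discretize` are hypothetical and are not taken from the experiments.

```python
from bisect import bisect_right

def discretize(value, edges):
    """Return the index of the interval containing value.

    edges must be sorted ascending; a value equal to an edge falls into the
    interval that starts at that edge.
    """
    return bisect_right(edges, value)

# Hypothetical threshold edges for an idle-time measure:
# index 0: below 50, 1: [50, 500), 2: [500, 1000), 3: 1000 and above.
edges = [50.0, 500.0, 1000.0]
for v in (46.6, 50.0, 988.7, 1200.0):
    print(v, "->", discretize(v, edges))
```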

FIGS. 10 and 11 show the results of applying detection rule generation according to an embodiment of the present invention to experimental data. These are the results on the DARPA experimental data when, before the characteristic information is accumulated for each session (S320) and the data type is determined (S330) in FIG. 3, good measures are selected first from among the various measures.

GRR is defined as follows.

GRR (Good Rule Rate) =

, where α = 0.01

At each step, measures with a good GRR value are selected (G), unnecessary ones are ignored (I), and bad ones are not selected (X). As FIG. 10 shows, when this test is applied to the Wed and Thu data, selecting only good measures at each step yields a higher detection rate and a lower false positive rate.

In FIG. 11, RST denotes the test result and SLT denotes the selection.

The present invention can also be embodied as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes every kind of recording device that stores data readable by a computer system.

Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, as well as implementations in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over computer systems connected by a network, so that the computer-readable code is stored and executed in a distributed manner.

The best embodiments have been disclosed above in the drawings and the specification. Although specific terms are used herein, they are used only to describe the present invention, not to limit their meaning or to restrict the scope of the invention set forth in the claims.

Those of ordinary skill in the art will therefore understand that various modifications and equivalent embodiments are possible. Accordingly, the true technical scope of protection of the present invention should be determined by the technical spirit of the appended claims.

The present invention automatically generates detection rules for each attack type from network-session-based characteristic information, recalculates the detection error rate to select more accurate detection rules, and excludes rules with low accuracy. Detection rules that are effective against known attacks and similar attack types can thus be generated automatically, without expert assistance, allowing a faster response.

Accordingly, the invention generates detection rules that are highly accurate in detecting modified attack behavior and previously unknown attacks, and that can cope with new forms of attack.

In addition, using session-level characteristic information makes it possible to extract features of attack types that do not appear in individual network packets or that cannot be clearly distinguished from normal network data flow.

Claims

An attack behavior detection rule generating device comprising:

A network session characteristic information extraction unit that converts network data classified by session characteristic information into an input data format including the session characteristic information and a characteristic of the network data type, among a normal type, an attack type, and an unknown type, to which the network data belongs; and

A detection rule automatic generation unit that constructs a decision tree by applying a classification algorithm to the network data converted into the input data format, generates an accuracy of the decision tree based on an error rate at a final node of the decision tree, and generates a detection rule patterned for each network data type based on the characteristic of the network data type, a conditional expression for selecting a node having an optimal information gain in the decision tree, and the accuracy.

The device of claim 1,

Further comprising a detection rule verification and update unit that, when a previously generated detection rule and the newly generated detection rule match,

Generates an error rate for each network data type by multiplying the error rate by an error rate measured on that network data type alone, and updates the accuracy of the decision tree based on the generated error rate for each data type.

The device of claim 1, wherein

The network data classified by session characteristic information is divided into a continuous type and a discrete type according to whether it is continuous within a predetermined range,

Continuous-type data is converted into discrete data by partitioning it into discrete data values within a predetermined range based on a predetermined threshold range, and

The network data is classified into normal behavior, attack behavior, and unknown behavior and converted into the input data format of the classification algorithm.

The device of claim 1, wherein the classification algorithm

Calculates a frequency for each class of the network data classified by the session characteristic information,

Constructs nodes of the decision tree based on the calculated frequencies,

Calculates an information gain for each node and selects a node having an optimal information gain, and

Calculates an accuracy of the decision tree based on the error rate at a final node.

The device of claim 1, wherein the session characteristic information

Includes at least one of session total statistics information, network protocol connection information, network protocol header information, and connection information between two hosts.

An attack behavior detection rule generation method comprising:

Classifying network data by session characteristic information;

Converting the classified data into an input data format including the session characteristic information and a characteristic of the network data type, among a normal type, an attack type, and an unknown type, to which the network data belongs;

Constructing a decision tree by applying a classification algorithm to the data converted into the input data format;

Generating an accuracy of the decision tree based on an error rate at the final node of the decision tree; and

Generating a detection rule patterned for each network data type based on the name, a conditional expression for selecting a node having an optimal information gain in the decision tree, and the accuracy.

The method of claim 6, further comprising:

Determining whether a previously generated detection rule matches the newly generated detection rule;

Generating, if the detection rules match, an error rate for each network data type by multiplying the error rate by an error rate measured on that network data type alone; and

Updating the accuracy of the decision tree based on the generated error rates for each data type.

The method of claim 6, further comprising:

Dividing the network data classified by the session characteristic information into a continuous type and a discrete type according to whether it is continuous within a predetermined range;

Converting continuous-type data into discrete data by partitioning it into discrete data values within a predetermined range based on a predetermined threshold range; and

Classifying the network data into normal behavior, attack behavior, and unknown behavior and converting it into the input data format of the classification algorithm.

9. The method of claim 6, wherein the classification algorithm comprises:

Calculating a frequency for each class of the network data classified by the session characteristic information;

Constructing nodes of the decision tree based on the calculated frequencies;

Calculating an information gain for each node and selecting a node having an optimal information gain; and

Calculating an accuracy of the decision tree based on the error rate at a final node.

10. The method of claim 6 or 9, wherein the classification algorithm is

An extended C4.5 algorithm.

The method of claim 6, wherein the session characteristic information

Includes at least one of session total statistics information, network protocol connection information, network protocol header information, and connection information between two hosts.