KR101564518B1

KR101564518B1 - Method and apparatus for automatically creating rule for network traffic detection

Info

Publication number: KR101564518B1
Application number: KR1020140185959A
Authority: KR
Inventors: 김명섭; 심규석; 윤성호
Original assignee: 고려대학교 산학협력단
Priority date: 2014-12-22
Filing date: 2014-12-22
Publication date: 2015-10-29

Abstract

The present invention relates to a method and an apparatus for automatically creating a rule for network traffic detection, which can automatically detect network traffic by using a sequential pattern algorithm. The method for automatically creating a rule according to one embodiment of the present invention may include the steps of: collecting traffic including packets; configuring a flow by grouping a plurality of packets included in the traffic; configuring a sequence by extracting an identifier and a payload from the flow; extracting candidate contents by creating a content set (Lₖ) of length k from combinations of the contents belonging to a content set (Lₖ₋₁) of length k-1 (where k is a natural number of 2 or higher, and the content set (L₁) of length 1 is extracted from the configured sequence); extracting a final content by considering the inclusion relationship among the candidate contents; and creating a rule by using the extracted final content.

Description

METHOD AND APPARATUS FOR AUTOMATICALLY CREATING RULE FOR NETWORK TRAFFIC DETECTION

The present invention relates to a method and apparatus for automatically generating rules for network traffic detection, and more particularly, to a rule generation method and apparatus for network traffic detection that automatically detect network traffic using a sequential pattern algorithm.

The purpose of network management is to make the most of network resources and to protect network equipment from abnormal attacks. To this end, network administrators establish appropriate network policies and apply them to the managed network. Because a network policy is enforced by blocking specific traffic or adjusting bandwidth, it must be preceded by traffic analysis that identifies the source of the traffic.

Traffic analysis identifies the application that generated traffic by using unique characteristics (signatures, rules) that represent the application. The analysis results are used not only for network policy but also in various fields such as capacity planning, network provisioning, traffic engineering, and fault diagnosis. A related prior art document is Korean Patent No. 10-1156008.

Snort is a commonly used traffic detection engine. It is so common in the network traffic field that commercial companies use the Snort engine to develop traffic analysis and detection equipment. Snort uses predefined rules to detect matching packets and performs the action defined in the rule. Rules are applied on a per-packet basis and use a packet's header information (IP address, port, protocol), statistical information (packet size), and payload information (content, pcre, offset).

Conventionally, to generate a rule, the traffic to be analyzed was inspected exhaustively to find common features observed only in that traffic.

However, this approach has the limitations that rule generation takes a long time and that the accuracy of the generated rule varies with the skill of the person extracting it.

Therefore, research is needed on techniques that quickly generate rules for network detection automatically rather than manually.

An object of the present invention is to provide a method and apparatus for automatic rule generation that can automatically generate network traffic detection rules quickly and in batches.

Another object of the present invention is to provide a rule generation method and apparatus for network traffic detection that automatically detect network traffic using a sequential pattern algorithm.

To achieve the above objects, according to an embodiment of the present invention, there is disclosed an automatic rule generation method including: collecting traffic including packets; configuring a flow by grouping a plurality of packets included in the traffic; configuring a sequence by extracting an identifier and a payload from the flow; extracting candidate contents by generating a content set (Lₖ) of length k from combinations of the contents belonging to a content set (Lₖ₋₁) of length k-1, where k is a natural number of 2 or more and the content set (L₁) of length 1 is extracted from the configured sequence; extracting a final content by considering the inclusion relationship among the candidate contents; and generating a rule using the extracted final content.

To achieve the above objects, according to an embodiment of the present invention, there is disclosed an automatic rule generation apparatus including: a collection unit that collects traffic including packets; a flow configuration unit that configures a flow by grouping a plurality of packets included in the traffic; a sequence configuration unit that configures a sequence by extracting an identifier and a payload from the flow; a content extraction unit that extracts candidate contents by generating a content set (Lₖ) of length k from combinations of the contents belonging to a content set (Lₖ₋₁) of length k-1, and extracts a final content by considering the inclusion relationship among the candidate contents, where k is a natural number of 2 or more and the content set (L₁) of length 1 is extracted from the configured sequence; a rule generation unit that generates a rule using the extracted final content; and a control unit that controls the collection unit, the flow configuration unit, the sequence configuration unit, the content extraction unit, and the rule generation unit.

The automatic rule generation method and apparatus for network traffic detection according to an embodiment of the present invention can quickly and automatically generate network traffic detection rules without relying on manual work by an administrator.

According to an embodiment of the present invention, network traffic detection rules can be generated without any pre-processing or post-processing.

FIG. 1 is a block diagram illustrating an automatic rule generation apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating an automatic rule generation method according to an embodiment of the present invention.
FIG. 3 shows a content extraction algorithm according to an embodiment of the present invention.
FIG. 4 shows a candidate content algorithm according to an embodiment of the present invention.
FIG. 5 shows an example of the process of executing the algorithms of FIGS. 3 and 4.
FIG. 6 shows an example of a rule generated using only content information according to an embodiment of the present invention.
FIG. 7 shows an example of a rule generated using content information and additional information according to an embodiment of the present invention.
FIG. 8 shows a content location analysis algorithm according to an embodiment of the present invention.
FIG. 9 shows an example of Snort content rules generated by the automatic rule generation method according to an embodiment of the present invention.

Hereinafter, a method and apparatus for automatically generating rules for network traffic detection according to an embodiment of the present invention will be described with reference to the drawings.

As used herein, singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "consists of" or "includes" should not be construed as necessarily including all of the components or steps described in the specification; some of those components or steps may not be included, or additional components or steps may further be included.

In the following, a Snort content rule is used as the example of a network detection rule. Snort is a commonly used traffic detection engine. It is so common in the network traffic field that commercial companies use the Snort engine to develop traffic analysis and detection equipment. Snort uses predefined rules to detect matching packets and performs the action defined in the rule. Rules are applied on a per-packet basis and use a packet's header information (IP address, port, protocol), statistical information (packet size), and payload information (content, pcre, offset).

FIG. 1 is a block diagram illustrating an automatic rule generation apparatus according to an embodiment of the present invention.

As illustrated, the automatic rule generation apparatus 100 may include a collection unit 110, a flow configuration unit 120, a sequence configuration unit 130, a content extraction unit 140, a rule generation unit 150, a storage unit 160, a transmission unit 170, and a control unit 180.

The collection unit 110 may first collect, for each host, the traffic generated by the application, service, or malicious code to be analyzed. The basic unit of the traffic is the packet.

The flow configuration unit 120 may organize the collected set of packets into flows.

Hereinafter, a flow in the embodiments is a set of packets that share the same 5-tuple (source IP, source port, destination IP, destination port, protocol).

The sequence configuration unit 130 may construct one sequence by combining the packets that have the same transmission direction within a single flow. For example, the sequence configuration unit 130 may construct a sequence by combining only the payloads among the components of the packets.

The sequence may be used as the input to the sequential pattern algorithm for extracting content.

The content extraction unit 140 may extract the content to be applied to a rule by using the sequential pattern algorithm. Starting from candidate contents of length 1 in the input sequences, the sequential pattern algorithm finds candidate contents while increasing the length and finally extracts the contents whose support is at or above a given level. In this specification, the length of a content means its size in bytes.

The rule generation unit 150 may generate a network detection rule using the content extracted by the content extraction unit 140. If only the content is used as a rule, false positives (detecting traffic that is not a detection target) are likely, so the rule generation unit 150 may analyze additional information and describe it in the rule. For example, the additional information may include header information and the location information of the content.

The storage unit 160 may store the generated rules.

The transmission unit 170 may provide the stored rules to a specific server.

The control unit 180 may control the overall operation of the collection unit 110, the flow configuration unit 120, the sequence configuration unit 130, the content extraction unit 140, the rule generation unit 150, the storage unit 160, and the transmission unit 170.

FIG. 2 is a flowchart illustrating an automatic rule generation method according to an embodiment of the present invention.

The collection unit 110 may collect traffic based on the determined detection target (S210). Traffic collection is the first step in rule generation. Detection targets can vary widely according to the purpose of network management, such as applications, services, attacks, and malicious code. When collecting traffic directly on the host that generates it, the collection unit 110 may use a capture tool such as Wireshark or tcpdump; when collecting the traffic of an entire network, it may use a switch's port mirroring function or a tap device.

Equations 1 and 2 below show the form of the collected packets.

PacketSet denotes a set of packets, and a single packet P consists of a host ID, source IP address, source port, Layer 4 protocol, destination IP address, destination port, and payload. In particular, the payload consists of consecutive characters, and the content that this embodiment generates automatically is a substring of the payload.

Because traffic collection for rule generation must capture only the traffic to be detected, collecting traffic on individual hosts is recommended for improving the accuracy of the generated rules.

Meanwhile, the support used in an embodiment of the present invention is calculated on the basis of the hosts that generated the input traffic, so traffic must be collected from at least two hosts. However, in a real collection environment, gathering traffic from several hosts can be very cumbersome or even impossible, so it is also possible to split the traffic collected on a single host across several files and to calculate the support per input file rather than per host. In other words, the basis for calculating the support may change according to the traffic collection environment. A detailed description is given later.

The flow configuration unit 120 may organize the collected set of packets into flows (S220). As in Equations 3 and 4, a flow used in this embodiment is a set of packets that share the same 5-tuple (source IP, source port, destination IP, destination port, protocol).

Meanwhile, according to an embodiment of the present invention, flows whose source and destination sides mirror each other may be combined into a single flow, with the transmission direction (forward or backward) recorded for each packet set. That is, a flow as defined in an embodiment of the present invention may be a bidirectional flow that includes a set of packets sharing the same 5-tuple together with the symmetric set of packets.
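The bidirectional grouping described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Packet` field names and the `build_flows` helper are hypothetical, and treating the direction of the first packet seen as "forward" is an assumption the text does not state.

```python
from typing import NamedTuple


class Packet(NamedTuple):
    """One captured packet; field names are illustrative, not from the patent."""
    host_id: str
    src_ip: str
    src_port: int
    protocol: str
    dst_ip: str
    dst_port: int
    payload: bytes


def build_flows(packets):
    """Group packets into bidirectional flows keyed by the first-seen 5-tuple."""
    flows = {}
    for p in packets:
        src = (p.src_ip, p.src_port)
        dst = (p.dst_ip, p.dst_port)
        forward_key = (p.protocol, src, dst)
        backward_key = (p.protocol, dst, src)
        if forward_key in flows:
            # Same 5-tuple as the packet that opened the flow.
            flows[forward_key]["forward"].append(p)
        elif backward_key in flows:
            # Mirrored 5-tuple: same flow, opposite direction.
            flows[backward_key]["backward"].append(p)
        else:
            # Assumption: the first packet seen defines the forward direction.
            flows[forward_key] = {"forward": [p], "backward": []}
    return flows
```

A request and its reply therefore land in one flow entry, with the request packets under `"forward"` and the reply packets under `"backward"`.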

Packets are organized into flows because, although Snort is applied on a per-packet basis, the nature of the network means that a single message is sometimes split across several packets for transmission (packet fragmentation). Therefore, separating the packets of a single flow by transmission direction and concatenating their payloads makes it possible to see the payload message actually transmitted, without breaks.

The sequence configuration unit 130 may create one sequence by extracting only the payloads of the packets in each of the forward and backward directions of a flow (S230). If the flow consists of bidirectional communication packets, two sequences are generated; if it is unidirectional traffic, one sequence is generated.
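A minimal sketch of this step, assuming a flow is represented as a dictionary with `"forward"` and `"backward"` lists of `(host_id, payload)` pairs in arrival order. The representation and the `build_sequences` name are hypothetical choices made for illustration.

```python
def build_sequences(flow):
    """Return one (host_id, concatenated payload) sequence per non-empty direction."""
    sequences = []
    for direction in ("forward", "backward"):
        packets = flow.get(direction, [])
        if packets:
            # The host ID is carried along for the later support calculation.
            host_id = packets[0][0]
            text = b"".join(payload for _, payload in packets)
            sequences.append((host_id, text))
    return sequences
```

A bidirectional flow yields two sequences and a unidirectional flow yields one, matching the description above.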

Equations 5 and 6 are the expressions that represent a sequence.

As in Equation 5, a SequenceSet consists of multiple sequences (S), and as in Equation 6, one sequence may consist of a host ID and a character string.

The sequence configuration unit 130 may record the host ID in each sequence configured as in Equations 5 and 6. The host ID is used for the support calculation described later. If the support is calculated per file, the sequence configuration unit 130 may record a file ID instead.

The content extraction unit 140 may receive the sequence set and a minimum support as input and extract the contents that satisfy the minimum support (S240).

FIG. 3 shows a content extraction algorithm according to an embodiment of the present invention, and FIG. 4 shows a candidate content algorithm according to an embodiment of the present invention. The algorithms of FIGS. 3 and 4 are sequential pattern algorithms that output, from the input sequence set, the set of contents that satisfy a predefined minimum support.

Equations 7 and 8 below represent the extracted content.

Referring to FIG. 3, the content extraction unit 140 extracts the contents of length 1 from every sequence in the input sequence set and stores them in the length-1 content set L₁ (FIG. 3 algorithm, lines 1-5). Starting from the contents of length 1, the content extraction unit 140 increases the length by 1 at a time, extracts the contents of every length, and stores each in the content set Lₖ for its own length (FIG. 3 algorithm, lines 6-20). However, the content extraction unit 140 deletes from each newly generated set Lₖ every content that does not satisfy the input minimum support (FIG. 3 algorithm, lines 8-17). This is because a content that fails the minimum support not only fails to qualify for extraction itself, but any content that extends it also fails the minimum support.

Equation 9 is the formula for calculating the support.

In the algorithm of FIG. 3, the support of a content is measured, as in Equation 9, as the proportion of hosts that generated the content out of the total number of hosts (FIG. 3 algorithm, lines 9-13). For example, if a particular content is observed in the traffic generated by every host, its support is 1; otherwise its support is less than 1.
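The host-ratio definition of support can be sketched as below. The `(host_id, payload)` sequence representation and the `support` function name are assumptions made for illustration; replacing the host ID with a file ID gives the file-based variant mentioned earlier.

```python
def support(content, sequences):
    """Fraction of distinct hosts with at least one sequence containing content."""
    all_hosts = {host for host, _ in sequences}
    matching_hosts = {host for host, text in sequences if content in text}
    return len(matching_hosts) / len(all_hosts)
```

Several sequences from the same host count once, because the ratio is taken over distinct host IDs rather than over sequences.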

If, in a restricted collection environment, traffic collected on a single host is applied to this algorithm, the host-based support calculation can be changed to a file-based or flow-based one. The contents of the set from which the contents failing the minimum support have been deleted are then used to generate the content set of the next length (FIG. 3 algorithm, line 18).

The method used here is the one described in the algorithm of FIG. 4. The content extraction unit 140 compares the contents of the input set Lₖ₋₁ with one another to generate the contents of the set Lₖ. Two contents of Lₖ₋₁ can be combined into a content of Lₖ only when the length k-2 content obtained by removing the first character of one is identical to the length k-2 content obtained by removing the last character of the other (FIG. 4 algorithm, lines 1-7).

For example, for the contents "abcd" and "bcde" of the length-4 content set, "bcd" ("abcd" without "a") is identical to "bcd" ("bcde" without "e"), so the content extraction unit 140 can generate the content "abcde" of the length-5 content set.
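The join condition in this example can be sketched in a few lines. The `generate_candidates` name is hypothetical, and contents are plain Python strings here purely for readability.

```python
def generate_candidates(prev_level):
    """Join length k-1 contents whose k-2 overlap matches into length k contents."""
    candidates = set()
    for a in prev_level:
        for b in prev_level:
            # a without its first character must equal b without its last one.
            if a[1:] == b[:-1]:
                candidates.add(a + b[-1:])
    return candidates
```

For length-1 inputs both trimmed strings are empty, so every ordered pair is joined; the later support check is what removes the pairs that never occur in the traffic.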

In this way, the length is increased by 1 at a time, and content extraction and deletion of contents below the minimum support are repeated until no new content is extracted. As the last step of extraction, the inclusion relationships among the extracted contents of all lengths are checked, and any content that is included in another content is deleted from the set (FIG. 3 algorithm, line 22). The content extraction unit 140 may pass the final content set to the next step, rule generation.
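Putting the steps together, the level-wise loop might look like the sketch below. It is an illustrative reading of the description, not a transcription of FIG. 3: the `extract_contents` name and the `(host_id, text)` input format are assumptions, and the support and join logic are inlined so that the block runs on its own.

```python
def extract_contents(sequences, min_support):
    """Level-wise extraction of contents that meet min_support (host-based)."""
    hosts = {host for host, _ in sequences}

    def support(content):
        # Fraction of distinct hosts whose traffic contains the content.
        return len({h for h, text in sequences if content in text}) / len(hosts)

    # Length-1 contents taken from every position of every sequence.
    level = {text[i:i + 1] for _, text in sequences for i in range(len(text))}
    level = {c for c in level if support(c) >= min_support}

    found = set()
    while level:
        found |= level
        # Join contents whose k-2 overlap matches, then prune by support.
        candidates = {a + b[-1:] for a in level for b in level if a[1:] == b[:-1]}
        level = {c for c in candidates if support(c) >= min_support}

    # Final step: drop every content that is contained in another content.
    return {c for c in found if not any(c != d and c in d for d in found)}
```

With three hosts whose traffic is `"xabcy"`, `"zabcw"`, and `"qqq"` and a minimum support of 0.6, only `"abc"` survives: its shorter pieces pass the support check but are removed by the inclusion step.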

FIG. 5 shows an example of the process of executing the algorithms of FIGS. 3 and 4.

In the example of FIG. 5, the input is a SequenceSet built from the traffic generated by three hosts and a minimum support of 0.6. The SequenceSet consists of four sequences. Because the total number of hosts is 3, a minimum support of 0.6 means that a content must be observed in the traffic generated by at least two hosts.
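The two-host threshold follows directly from the host-ratio definition of support. In the worked check below, the symbol sup(c) is shorthand introduced here for illustration and is not notation taken from the original equations.

```latex
\mathrm{sup}(c) = \frac{\text{number of hosts whose traffic contains } c}{\text{total number of hosts}},
\qquad
\frac{1}{3} \approx 0.33 < 0.6 \le \frac{2}{3} \approx 0.67
```

With three hosts, a content seen on only one host falls below the 0.6 threshold, while a content seen on two or three hosts passes.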

When the algorithms of FIGS. 3 and 4 are executed, all contents of length 1 are extracted first. The length-1 contents (see FIG. 5) all satisfy the minimum support of 0.6, so they are used to generate the length-2 contents. After the length-2 contents are generated, their support is calculated and those that do not satisfy the minimum support are deleted. The content extraction unit 140 then uses the length-2 contents that satisfy the minimum support to extract the length-3 contents. Because only one length-3 content satisfies the minimum support, the content cannot be lengthened any further, and the content extraction unit 140 ends content extraction. After extraction ends, the contents that are included in another content are deleted from the contentSet, and the remaining content is finally passed to the next step, rule generation.

The rule generation unit 150 may generate a rule using the extracted final content.

If a Snort rule is written using only the extracted final content, false positives are likely. In particular, if the extracted content is too short, the rule may match traffic that is not a detection target.

FIG. 6 shows an example of a rule generated using only content information according to an embodiment of the present invention, and FIG. 7 shows an example of a rule generated using content information and additional information according to an embodiment of the present invention.

As FIGS. 6 and 7 show, a rule that uses only content information differs greatly from one that also uses additional information. The rule of FIG. 6 inspects the entire packet payload for the content. The rule of FIG. 7, in contrast, checks only packets that use the TCP protocol and are sent to destination IP 111.222.333.0/24 on port 80, and tests whether the content lies between the 4th and 20th bytes of the payload.

Therefore, the rule generation unit 150 may analyze additional information such as the content's location information and header information, and generate a rule that includes the analyzed information in addition to the extracted final content (S250). Including the analyzed additional information in the rule not only lowers the likelihood of false positives but also reduces the amount of payload inspection, which has a relatively high processing overhead, so the performance of the traffic detection system can be improved.
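As a rough illustration of combining content with header and location information, the sketch below formats a Snort 2.x-style rule string. Everything in it is an assumption made for illustration: the `build_rule` and `encode_content` helpers, the `alert` action, the message text, the `sid` value, and the 192.0.2.0/24 documentation address used in the usage example are all hypothetical. How the byte positions in this description map onto Snort's own `offset` and `depth` semantics is not stated in the text, so the values are written through unchanged.

```python
def encode_content(data: bytes) -> str:
    """Write printable bytes literally and all other bytes as |HH HH| hex runs."""
    parts, hex_run = [], []
    for byte in data:
        ch = chr(byte)
        if 0x20 <= byte <= 0x7E and ch not in '";\\|':
            if hex_run:
                parts.append("|" + " ".join(hex_run) + "|")
                hex_run = []
            parts.append(ch)
        else:
            hex_run.append(f"{byte:02X}")
    if hex_run:
        parts.append("|" + " ".join(hex_run) + "|")
    return "".join(parts)


def build_rule(content, protocol="tcp", dst_ip="any", dst_port="any",
               offset=None, depth=None, sid=1000001):
    """Format one rule; header and location fields are optional refinements."""
    options = ['msg:"auto-generated content rule"',
               f'content:"{encode_content(content)}"']
    if offset is not None:
        options.append(f"offset:{offset}")
    if depth is not None:
        options.append(f"depth:{depth}")
    options += [f"sid:{sid}", "rev:1"]
    return f"alert {protocol} any any -> {dst_ip} {dst_port} ({'; '.join(options)};)"
```

Calling `build_rule(b"GET", dst_ip="192.0.2.0/24", dst_port=80, offset=4, depth=20)` produces a rule restricted by protocol, destination, and position, whereas `build_rule(b"GET")` produces the content-only form that matches anywhere in any TCP payload.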

To analyze the location information of the extracted content, the rule generation unit 150 may use the packet data produced in the traffic collection step. Because Snort operates on a per-packet basis, the location of the content must be analyzed as a position within the actual packet payload, not within the sequence.

FIG. 8 shows a content location analysis algorithm according to an embodiment of the present invention.

The algorithm of FIG. 8 shows the process of analyzing the location information of a content, given the content and a packetSet. Of the outputs of the algorithm, offset is the minimum byte position at which a match starts when the content is matched against the packets of the packetSet, and depth is the maximum byte position at which a match ends. That is, when the content matches a packet, it matches only between offset and depth within the payload.

Referring to FIG. 8, the rule generation unit 150 first initializes offset to the maximum packet size and depth to 0 (FIG. 8 algorithm, lines 1-2). The rule generation unit 150 then iterates over all packets in the packetSet and adjusts offset and depth. It checks whether the input content matches the packet and, if so, obtains the start byte position of the match and compares it with the current offset. If the start position is smaller than the current offset, the rule generation unit 150 sets offset to that value. Likewise for depth, it obtains the end byte position of the match and, if that position is greater than the current depth, sets depth to that value (FIG. 8 algorithm, lines 4-6).
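The min/max scan described above can be sketched as follows. The `analyze_position` name, the default maximum packet size of 65535, the 0-based start index, and the exclusive end index are all assumptions; the text does not say how byte positions are numbered.

```python
def analyze_position(content, payloads, max_packet_size=65535):
    """Return (offset, depth) over all matches, or None if nothing matches."""
    offset = max_packet_size  # smallest start position seen so far
    depth = 0                 # largest end position seen so far
    for payload in payloads:
        start = payload.find(content)
        while start != -1:
            offset = min(offset, start)
            depth = max(depth, start + len(content))
            # Look for further occurrences in the same payload.
            start = payload.find(content, start + 1)
    if depth == 0:
        return None
    return offset, depth
```

Scanning every occurrence, rather than only the first one in each packet, keeps the resulting window wide enough to cover all positions at which the content was actually observed.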

The rule generation unit 150 adds the finally determined offset and depth values to the content rule.
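The location analysis described above can be sketched in Python as follows. This is a minimal illustration and not the algorithm of FIG. 8 itself: the function name `analyze_location`, the default maximum packet size of 65535, the use of only the first match in each payload, and the choice of an exclusive end position (start plus content length) are all assumptions made for the example.

```python
from typing import Iterable, Optional, Tuple


def analyze_location(
    content: bytes,
    packet_set: Iterable[bytes],
    max_packet_size: int = 65535,  # assumed value for "maximum packet size"
) -> Optional[Tuple[int, int]]:
    """Return (offset, depth) for content over packet_set, or None if no packet matches."""
    offset = max_packet_size  # initialized to the maximum packet size
    depth = 0                 # initialized to 0
    matched = False

    for payload in packet_set:
        # Assumption: only the first occurrence in each payload is considered.
        start = payload.find(content)
        if start == -1:
            continue  # content does not match this packet
        matched = True
        end = start + len(content)  # assumption: exclusive end byte position
        if start < offset:
            offset = start  # keep the smallest match start
        if end > depth:
            depth = end     # keep the largest match end

    return (offset, depth) if matched else None
```

For example, with the payloads `b"GET /index HTTP/1.1"` and `b"xxGET /a"`, the content `b"GET"` starts at bytes 0 and 2 and ends at bytes 3 and 5, so the sketch returns an offset of 0 and a depth of 5.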

To analyze the header information of the extracted content, the rule generation unit 150 may perform a process similar to the location information analysis described above. The rule generation unit 150 iterates over every packet in packetSet and checks whether the content matches the packet. If it matches, the rule generation unit 150 stores the header information of that packet. After all packets have been examined, if the stored header information has a single unique value, the rule generation unit 150 adds that header information to the content rule. For IP addresses, the rule generation unit 150 decreases the CIDR value in the order 32, 24, 16 and repeats until a unique value is extracted. That is, it first tries to find a unique value with a CIDR value of 32 (called a "D class IP" in this description, i.e., the full 32-bit address), and if none is found, it applies a CIDR value of 24 to look for a C class IP. For example, if the destination IPs matched by the content are extracted with CIDR 32 as "111.222.333.1/32" and "111.222.333.2/32", the CIDR value is set to 24 and "111.222.333.0/24" is extracted. If a unique IP cannot be extracted even with CIDR 16, the rule generation unit 150 may extract "any".

For ports as well, the rule generation unit 150 may use the value if a single unique value is extracted, and may extract "any" if several different values are extracted.
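The header generalization described above (trying CIDR 32, then 24, then 16 for IP addresses, and falling back to "any" for IP addresses and ports) might be sketched as follows. The function names `generalize_ip` and `generalize_port` are hypothetical, and the sketch uses Python's standard `ipaddress` module, which the source does not mention. The example addresses in the usage note come from documentation ranges rather than from the source, because the source's illustrative addresses are not valid IPv4 literals.

```python
import ipaddress
from typing import Iterable


def generalize_ip(addresses: Iterable[str]) -> str:
    """Return the narrowest of /32, /24, /16 that covers all addresses, else "any"."""
    addrs = list(addresses)
    if not addrs:
        return "any"  # assumption: no matched packet means no constraint
    for prefix in (32, 24, 16):  # CIDR values tried in decreasing order
        # strict=False masks the host bits, e.g. 192.0.2.1/24 -> 192.0.2.0/24
        networks = {
            ipaddress.ip_network(f"{addr}/{prefix}", strict=False) for addr in addrs
        }
        if len(networks) == 1:  # a single unique value was found
            return str(next(iter(networks)))
    return "any"


def generalize_port(ports: Iterable[int]) -> str:
    """Return the port as a string if it is unique, else "any"."""
    unique = set(ports)
    return str(next(iter(unique))) if len(unique) == 1 else "any"
```

With this sketch, `generalize_ip(["192.0.2.1", "192.0.2.2"])` yields `"192.0.2.0/24"`, and `generalize_port([80, 443])` yields `"any"`.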

When a rule has been generated by adding the additional information described above, such as the location information and header information of the content, the generated rule may be stored in the storage unit 160 and transmitted to an external party (an individual or an organization) (S260).

The generated rules can be used as follows.

First, the generated rules can be distributed regularly to subscribers who have paid a fee. In practice, www.snort.org also provides the latest rules to individuals and organizations that pay a fee. Second, a website providing an automatic rule generation service can be operated. Such a service can automatically generate rules from traffic data uploaded by a user and let the user check the generated results in real time. Finally, the system can be installed at network equipment development companies that wish to use it, together with a maintenance service.

FIG. 9 shows examples of snort content rules generated by the automatic rule generation method according to an embodiment of the present invention.

The illustrated rules were generated by selecting seven representative Internet applications and services and collecting the traffic produced when those applications were used.
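As a rough illustration of how a content, its location information, and its header information might be assembled into a Snort-style rule string, consider the sketch below. It does not reproduce FIG. 9. The function names `encode_content` and `build_rule`, the `alert` action, the `msg` text, and the `sid` value are all assumptions. The offset and depth values are passed through unchanged, so how they map onto Snort's own `offset` and `depth` keywords should be checked against the Snort documentation.

```python
def encode_content(content: bytes) -> str:
    """Render bytes as a Snort content string, hex-encoding unsafe bytes as |XX XX|."""
    out = []
    hex_run = []
    for b in content:
        ch = chr(b)
        # Printable ASCII is kept as-is, except characters special in rule syntax.
        if 0x20 <= b <= 0x7E and ch not in '";\\|:':
            if hex_run:
                out.append("|" + " ".join(hex_run) + "|")
                hex_run = []
            out.append(ch)
        else:
            hex_run.append(f"{b:02X}")
    if hex_run:
        out.append("|" + " ".join(hex_run) + "|")
    return "".join(out)


def build_rule(
    content: bytes,
    offset: int,
    depth: int,
    dst_ip: str = "any",
    dst_port: str = "any",
    protocol: str = "tcp",
    sid: int = 1000001,            # hypothetical rule identifier
    msg: str = "auto-generated",   # hypothetical message text
) -> str:
    """Assemble a Snort-style rule line from content, location, and header information."""
    return (
        f"alert {protocol} any any -> {dst_ip} {dst_port} "
        f'(msg:"{msg}"; content:"{encode_content(content)}"; '
        f"offset:{offset}; depth:{depth}; sid:{sid}; rev:1;)"
    )
```

Calling `build_rule(b"GET ", 0, 4, "192.0.2.0/24", "80")` produces a single rule line whose options carry the content, offset, and depth, while the header part carries the generalized destination IP and port.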

As described above, the method and apparatus for automatically generating rules for network traffic detection according to an embodiment of the present invention can generate network traffic detection rules quickly and automatically, without relying on manual work by an administrator.

The automatic rule generation method described above may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software.

Computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Such a recording medium may also be a transmission medium, such as an optical line, a metal line, or a waveguide, that carries a carrier wave transmitting a signal specifying program instructions, data structures, and the like.

The program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The automatic rule generation method and apparatus described above are not limited to the configurations and methods of the embodiments described above. Rather, all or part of each embodiment may be selectively combined so that various modifications can be made.

100: automatic rule generation apparatus
110: collection unit
120: flow configuration unit
130: sequence configuration unit
140: content extraction unit
150: rule generation unit
160: storage unit
170: transmission unit
180: control unit

Claims

An automatic rule generation method comprising:
collecting traffic including packets;
grouping a plurality of packets included in the traffic to form a flow;
extracting an identifier and a payload from the flow to construct a sequence;
generating a content set of length k (Lₖ) using a combination of contents belonging to a content set of length k-1 (Lₖ₋₁), thereby extracting candidate contents, wherein k is a natural number greater than or equal to 2 and the content set of length 1 (L₁) is extracted from the constructed sequence;
extracting final content by considering the inclusion relation among the candidate contents; and
generating a rule using the extracted final content.

The method of claim 1, wherein the candidate content extracting step comprises deleting, from the content set of length k-1 (Lₖ₋₁), content that does not satisfy a set minimum support.

The method of claim 2, wherein the support is calculated based on any one of a host, a file, and a flow.

The method of claim 2, wherein the candidate content extracting step uses a combination of contents, belonging to the content set of length k-1 (Lₖ₋₁), for which the content of length k-2 obtained by excluding the first character and the content of length k-2 obtained by excluding the last character are the same.

The method of claim 2, wherein the rule generating step uses at least one of header information of a packet matched with the extracted final content and location information of the final content.

An automatic rule generation apparatus comprising:
a collection unit for collecting traffic including packets;
a flow configuration unit for grouping a plurality of packets included in the traffic to form a flow;
a sequence configuration unit for extracting an identifier and a payload from the flow to construct a sequence;
a content extraction unit for generating a content set of length k (Lₖ) using a combination of contents belonging to a content set of length k-1 (Lₖ₋₁) to extract candidate contents, and for extracting final content in consideration of the inclusion relation among the candidate contents, wherein k is a natural number of 2 or more and the content set of length 1 (L₁) is extracted from the constructed sequence;
a rule generation unit for generating a rule using the extracted final content; and
a control unit for controlling the collection unit, the flow configuration unit, the sequence configuration unit, the content extraction unit, and the rule generation unit.

The apparatus of claim 6, wherein the content extraction unit deletes, from the content set of length k-1 (Lₖ₋₁), content that does not satisfy a set minimum support.

The apparatus of claim 7, wherein the support is calculated based on any one of a host, a file, and a flow.

The apparatus of claim 7, wherein the content extraction unit uses a combination of the contents belonging to the content set of length k-1 (Lₖ₋₁).

The apparatus of claim 7, wherein the rule generation unit uses header information of a packet matched with the extracted final content and location information of the final content.