KR20150029483A

KR20150029483A - Apparatus and method for detecting attacks using data mining

Info

Publication number: KR20150029483A
Application number: KR20130108749A
Authority: KR
Inventors: 최경; 채기준; 진흔이; 이실; 김미희
Original assignee: 이화여자대학교 산학협력단
Priority date: 2013-09-10
Filing date: 2013-09-10
Publication date: 2015-03-18
Also published as: KR101535716B1

Abstract

The purpose of the present invention is to provide an apparatus and a method for detecting attacks, which are capable of improving the speed and precision of data mining for detecting an attack state. An apparatus and a method for detecting attacks using data mining are disclosed. The method for detecting attacks comprises the steps of: performing a first data mining using candidate property data; selecting one among a plurality of decision tree algorithms used for the first data mining based on a result of the first data mining; determining key property data among the candidate property data by using the selected decision tree algorithm; and performing a second data mining by using the determined key property data.

Description

APPARATUS AND METHOD FOR DETECTING ATTACKS USING DATA MINING

The present invention relates to an apparatus and method for detecting attacks using data mining. More particularly, it relates to an attack detection apparatus and method that increase the speed and accuracy of data mining for attack state detection by using, as input data, only the key attributes used for attack state detection from among the candidate attribute data.

A Denial of Service (DoS) attack sends a large amount of request data for which no corresponding data exists, causing the network system to search for data corresponding to the request data. If the network system does not know that no corresponding data exists, it must keep consuming resources to search for that data, so an overload may occur and the system may be unable to provide normal service to users.

Korean Patent Laid-Open No. 10-2009-0079627 (published July 22, 2009) discloses a technique for screening harmful data using data mining. However, when all data are compared, an attack may go undetected because of data that has little relation to a denial of service attack.

Therefore, a method for increasing the attack detection rate is needed.

The present invention can provide an attack detection apparatus and method that increase the speed and accuracy of data mining for attack state detection by determining, based on a decision tree algorithm, key attribute data (the key attributes used for attack state detection) from among candidate attribute data, and by performing data mining using the key attribute data as input data.

An attack detection method according to an embodiment of the present invention may include: performing first data mining using candidate attribute data; selecting, based on a result of the first data mining, one decision tree algorithm from among a plurality of decision tree algorithms used for the first data mining; determining key attribute data among the candidate attribute data using the selected decision tree algorithm; and performing second data mining using the key attribute data.

In the attack detection method according to an embodiment of the present invention, the step of determining the key attribute data may determine, as the key attribute data, the candidate attribute data at or above a reference level from among the candidate attribute data corresponding to a plurality of levels in the tree structure.

The attack detection method according to an embodiment of the present invention may further include: selecting data attributes based on attack requirements; and collecting candidate attribute data corresponding to the selected data attributes.

In the attack detection method according to an embodiment of the present invention, the candidate attribute data may include normal-state data collected from a system in a normal state and attack-state data collected from a system in an attack state.

The attack detection method according to an embodiment of the present invention may further include deliberately performing an attack on the system, and the attack state may be the state in which the system is under attack as a result of that step.

In the attack detection method according to an embodiment of the present invention, the step of performing the first data mining may determine whether the system is under attack using the candidate attribute data and a plurality of mining algorithms that include decision tree algorithms, and may determine the attack state detection rate of each mining algorithm based on the determination result.

In the attack detection method according to an embodiment of the present invention, the step of selecting the decision tree algorithm may select the decision tree algorithm with the highest attack state detection rate from among the plurality of decision tree algorithms used for the first data mining.

An attack detection apparatus according to an embodiment of the present invention may include: a first data mining unit that performs first data mining using candidate attribute data; an algorithm selection unit that selects, based on a result of the first data mining, one decision tree algorithm from among a plurality of decision tree algorithms used for the first data mining; a key attribute data determination unit that determines key attribute data among the candidate attribute data using the selected decision tree algorithm; and a second data mining unit that performs second data mining using the key attribute data.

According to an embodiment of the present invention, the speed and accuracy of data mining for attack state detection can be increased by determining, based on a decision tree algorithm, key attribute data (the key attributes used for attack state detection) from among candidate attribute data, and by performing data mining using the key attribute data as input data.

FIG. 1 is a diagram illustrating an attack detection apparatus according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a candidate attribute data collection unit according to an embodiment of the present invention.
FIG. 3 is an example of an attack detection process according to an embodiment of the present invention.
FIG. 4 is an example of a first data mining result according to an embodiment of the present invention.
FIG. 5 is an example of an algorithm selected according to an embodiment of the present invention.
FIG. 6 is an example of the result of second data mining with the key attribute data selected by the algorithm of FIG. 5.
FIG. 7 is another example of an algorithm selected according to an embodiment of the present invention.
FIG. 8 is an example of the result of second data mining with the key attribute data selected by the algorithm of FIG. 7.
FIG. 9 is another example of a first data mining result according to an embodiment of the present invention.
FIG. 10 is an example of key attribute data selected according to an embodiment of the present invention.
FIG. 11 is an example of the result of second data mining with the key attribute data selected in FIG. 10.
FIG. 12 is a flowchart illustrating an attack detection method according to an embodiment of the present invention.
FIG. 13 is a flowchart illustrating a candidate attribute data collection method according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings. The attack detection method according to an embodiment of the present invention may be performed by an attack detection apparatus.

The attack detection apparatus 100 according to an embodiment of the present invention can detect DoS attacks on a network system in a smart grid environment that includes a substation and a local area network (LAN).

FIG. 1 is a diagram illustrating an attack detection apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the attack detection apparatus 100 according to an embodiment of the present invention may include a candidate attribute data collection unit 110, a first data mining unit 120, an algorithm selection unit 130, a key attribute data determination unit 140, and a second data mining unit 150.

The candidate attribute data collection unit 110 may select data attributes based on attack requirements and collect candidate attribute data corresponding to the selected data attributes. The candidate attribute data may include normal-state data collected from the system in a normal state and attack-state data collected from the system in an attack state.

The specific configuration and operation of the candidate attribute data collection unit 110 are described in detail below with reference to FIG. 2.

The first data mining unit 120 may perform the first data mining using the candidate attribute data collected by the candidate attribute data collection unit 110.

Here, the first data mining unit 120 may determine whether the system is under attack using a plurality of mining algorithms and the candidate attribute data, and may determine the attack state detection rate of each mining algorithm based on the determination result.

The mining algorithms may be classified into the Bayes type, Functions type, Lazy type, Meta type, Rules type, Tree type, and Misc type.

For example, a mining algorithm classified as a Bayes classifier is a Bayes-type mining algorithm, and neural networks and support vector machines (SVM) may be Functions-type mining algorithms. In addition, a mining algorithm classified as an instance-based classifier is a Lazy-type mining algorithm, and a combining algorithm may be a Meta-type mining algorithm.

Further, a mining algorithm classified as a rule-based classifier is a Rules-type mining algorithm, a mining algorithm classified as a decision tree classifier is a Tree-type mining algorithm, and a mining algorithm classified as a miscellaneous classifier, which includes HyperPipes and VFI (voting feature intervals), may be a Misc-type mining algorithm.

Here, a decision tree algorithm is a Tree-type mining algorithm classified as a decision tree classifier, and may be an algorithm that represents the candidate attribute data used in the first data mining as a tree structure.

The algorithm selection unit 130 may select, based on the result of the first data mining performed by the first data mining unit 120, one decision tree algorithm from among the plurality of decision tree algorithms used for the first data mining.

Here, the algorithm selection unit 130 may select the decision tree algorithm with the highest attack state detection rate from among the plurality of decision tree algorithms used for the first data mining.

The key attribute data determination unit 140 may determine the key attribute data among the candidate attribute data using the decision tree algorithm selected by the algorithm selection unit 130.

Here, the key attribute data determination unit 140 may determine, as the key attribute data, the candidate attribute data at or above a reference level from among the candidate attribute data corresponding to the plurality of levels in the tree structure produced by the decision tree algorithm. For example, when the reference level is the top level, the key attribute data determination unit 140 may determine the candidate attribute data corresponding to the top level of the tree structure as the key attribute data.
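The level-based selection described above can be sketched as a traversal that keeps every attribute found at or above the reference level. This is only an illustration: the `Node` class, the convention that the top level has depth 0, and the attribute names `attr_a` to `attr_d` are assumptions introduced for the example, not part of the described apparatus.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Attribute tested at this node; None marks a leaf (hypothetical model).
    attribute: Optional[str]
    children: List["Node"] = field(default_factory=list)


def key_attributes(root: Node, reference_depth: int) -> List[str]:
    """Collect attributes on nodes whose depth is <= reference_depth.

    Depth 0 is the top level, so "at or above the reference level"
    becomes "depth not greater than reference_depth".
    """
    found: List[str] = []
    stack = [(root, 0)]
    while stack:
        node, depth = stack.pop()
        if depth > reference_depth:
            continue  # below the reference level: not a key attribute
        if node.attribute is not None and node.attribute not in found:
            found.append(node.attribute)
        # Reverse so children are visited left to right.
        stack.extend((child, depth + 1) for child in reversed(node.children))
    return found


# Hypothetical three-level tree.
tree = Node("attr_a", [Node("attr_b", [Node("attr_d")]), Node("attr_c")])
print(key_attributes(tree, 0))  # ['attr_a']
print(key_attributes(tree, 1))  # ['attr_a', 'attr_b', 'attr_c']
```

With the reference level set to the top level, only the root attribute is kept, which matches the example in the text.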

The process by which the key attribute data determination unit 140 determines the key attribute data is described in detail below with reference to FIG. 5 and FIG. 7.

The second data mining unit 150 may perform the second data mining using the key attribute data determined by the key attribute data determination unit 140.

Here, the mining algorithm that the second data mining unit 150 uses for the second data mining may be the same as the mining algorithm used by the first data mining unit 120. However, the first data mining unit 120 performs data mining using all of the candidate attribute data as input data. The second data mining unit 150, by contrast, performs data mining using as input data only the key attribute data, that is, the key attributes used for attack state detection among the candidate attribute data, so it requires less computation than the first data mining unit 120 while also improving accuracy.

That is, the attack detection apparatus 100 according to an embodiment of the present invention determines key attribute data, the key attributes used for attack state detection, from among the candidate attribute data and performs data mining using the key attribute data as input data, thereby increasing the speed and accuracy of data mining for attack state detection.
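As a rough sketch of how units 120 to 150 fit together, the two passes can be written as one function that runs every mining algorithm on the full data, picks the best tree-type algorithm, and reruns the same algorithms on the reduced data. Everything named here is hypothetical: the `Miner` callable interface, `two_pass_mining`, the toy `threshold_miner` stand-in for a real classifier, and the sample records.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Record = Dict[str, float]
# Hypothetical interface: a miner scores a data set and returns a detection rate (%).
Miner = Callable[[Sequence[Record], Sequence[int]], float]


def project(records: Sequence[Record], attributes: Sequence[str]) -> List[Record]:
    """Keep only the given attributes in every record."""
    return [{a: r[a] for a in attributes} for r in records]


def two_pass_mining(
    records: Sequence[Record],
    labels: Sequence[int],
    miners: Dict[str, Miner],
    tree_miners: Sequence[str],
    key_attributes_of: Callable[[str], List[str]],
) -> Tuple[str, List[str], Dict[str, float], Dict[str, float]]:
    # First data mining: every algorithm sees all candidate attributes.
    first = {name: mine(records, labels) for name, mine in miners.items()}
    # Select the tree-type algorithm with the highest detection rate.
    best_tree = max(tree_miners, key=lambda name: first[name])
    # Key attributes are read from the selected tree (supplied by the caller here).
    keys = key_attributes_of(best_tree)
    # Second data mining: same algorithms, key attributes only.
    reduced = project(records, keys)
    second = {name: mine(reduced, labels) for name, mine in miners.items()}
    return best_tree, keys, first, second


def threshold_miner(attribute: str, cut: float) -> Miner:
    """Toy stand-in for a classifier: predict attack (1) when attribute > cut."""
    def mine(records: Sequence[Record], labels: Sequence[int]) -> float:
        hits = sum((r.get(attribute, 0.0) > cut) == bool(y)
                   for r, y in zip(records, labels))
        return 100.0 * hits / len(labels)
    return mine


# Made-up records with two candidate attributes; label 1 means attack state.
records = [{"a": 1.0, "b": 5.0}, {"a": 9.0, "b": 5.0},
           {"a": 2.0, "b": 6.0}, {"a": 8.0, "b": 4.0}]
labels = [0, 1, 0, 1]
miners = {"tree_a": threshold_miner("a", 5.0), "tree_b": threshold_miner("b", 5.5)}
best, keys, first, second = two_pass_mining(
    records, labels, miners, ["tree_a", "tree_b"], lambda name: ["a"])
print(best, keys, first, second)
```

The toy data only exercises the control flow; it says nothing about the speed or accuracy gain claimed for the second pass.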

FIG. 2 is a diagram illustrating a candidate attribute data collection unit according to an embodiment of the present invention.

Referring to FIG. 2, the candidate attribute data collection unit 110 according to an embodiment of the present invention may include a data attribute selection unit 210, a data collection unit 220, and a system attack unit 230.

The data attribute selection unit 210 may select data attributes based on attack requirements.

For example, a SYN flood attack and a buffer overflow attack work in different ways, so the attribute data that change under each attack may differ.

Accordingly, the data attribute selection unit 210 may analyze the NSM (Network and System Management) requirements of the attack to be detected and select data attributes according to the analysis result.

For example, the data attribute selection unit 210 may analyze the NSM requirements of the SYN flood attack and, according to the analysis result, select data attributes such as those in Table 1.

In addition, according to the NSM requirements of the buffer overflow attack, the data attribute selection unit 210 may further select buffer used (kB) and transmission payload size (bytes) as data attributes.

The data collection unit 220 may collect candidate attribute data corresponding to the data attributes selected by the data attribute selection unit 210. The candidate attribute data may include normal-state data collected from the system in a normal state and attack-state data collected from the system in an attack state.

Specifically, before the system attack unit 230 attacks the system, the data collection unit 220 may collect normal-state data, which is the attribute data corresponding to the data attributes selected by the data attribute selection unit 210.

Further, while the system attack unit 230 is attacking the system, the data collection unit 220 may collect the attribute data corresponding to the data attributes selected by the data attribute selection unit 210, thereby collecting attack-state data, which is the attribute data of the system while under attack.

The system attack unit 230 may deliberately perform attacks on the system. For example, the system attack unit 230 may perform a SYN flood attack and a buffer overflow attack using GOOSE messages.

Here, the system attack unit 230 may perform a preset number of SYN flood attacks and buffer overflow attacks on the system at fixed time intervals. For example, the system attack unit 230 may perform 100 attacks, one per minute. The data collection unit 220 may then collect attack-state data every minute, in step with the times at which the system attack unit 230 attacks. In addition, the data collection unit 220 may collect normal-state data during the one minute after the system attack unit 230 performs an attack and before the next attack is performed. That is, the data collection unit 220 may collect 100 attack-state data items and 100 normal-state data items as candidate attribute data.
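The collection timing in this example can be laid out as a simple schedule: one attack-state sample per attack and one normal-state sample in the gap before the next attack. The sketch below is illustrative only; the function name `collection_schedule`, the start time, and the choice to take the normal-state sample at the midpoint of each gap are assumptions, since the text only says that normal-state data is collected during the minute between attacks.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def collection_schedule(
    start: datetime,
    attacks: int = 100,
    interval: timedelta = timedelta(minutes=1),
) -> List[Tuple[datetime, str]]:
    """Return (time, state) pairs for the samples to collect."""
    schedule: List[Tuple[datetime, str]] = []
    for i in range(attacks):
        attack_time = start + i * interval
        # Attack-state sample taken while the attack is running.
        schedule.append((attack_time, "attack"))
        # Normal-state sample taken between this attack and the next one
        # (midpoint chosen here as an assumption).
        schedule.append((attack_time + interval / 2, "normal"))
    return schedule


schedule = collection_schedule(datetime(2013, 9, 10, 9, 0, 0))
attack_count = sum(1 for _, state in schedule if state == "attack")
normal_count = sum(1 for _, state in schedule if state == "normal")
print(attack_count, normal_count)  # 100 100
```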

In addition, the data attribute selection unit 210 may monitor the attacks performed by the system attack unit 230 and select additional data attributes according to the monitoring result. For example, among the attribute data that change under an attack by the system attack unit 230, there may be attribute data for a data attribute that the data attribute selection unit 210 has not selected. In that case, the data attribute selection unit 210 may additionally select the data attribute of the changing attribute data.

The system on which the system attack unit 230 performs attacks and from which the data collection unit 220 collects candidate attribute data may be a test system. The test system may be a network system in a smart grid environment composed of a substation and a local area network. That is, the test system may be a system similar to the network system on which the attack detection apparatus 100 is to detect attacks.

FIG. 3 is an example of an attack detection process according to an embodiment of the present invention.

In step 310, the data attribute selection unit 210 may select data attributes based on attack requirements. Here, the data attribute selection unit 210 may analyze the NSM (Network and System Management) requirements of the attack to be detected and select data attributes according to the analysis result.

In step 320, the system attack unit 230 may deliberately perform attacks on the system. For example, the system attack unit 230 may perform a SYN flood attack and a buffer overflow attack using GOOSE messages.

Here, the system attack unit 230 may perform a preset number of SYN flood attacks and buffer overflow attacks at fixed time intervals on the attack test bed 300, which is the test system. For example, the system attack unit 230 may perform 100 attacks, one per minute.

The attack test bed 300 may include a substation 302, an administrator desk, I/O devices including interfaces to the outside of the substation, and intelligent sensors and actuators. In addition, the substation 302 may include an intelligent electronic device (IED) that has a certain level of CPU and memory.

In addition, the system attack unit 230 may take the place of the attacker 301 and perform the SYN flood attacks and buffer overflow attacks, through the I/O, on the IED included in the substation 302.

In step 330, the data collection unit 220 may collect candidate attribute data corresponding to the data attributes selected in step 310. Specifically, while no attack is being performed on the system in step 320, the data collection unit 220 may collect normal-state data, which is the attribute data corresponding to the data attributes selected in step 310. While the system attack unit 230 is attacking the system, the data collection unit 220 may collect the attribute data corresponding to the data attributes selected in step 310, thereby collecting attack-state data, which is the attribute data of the system while under attack.

In step 340, the data collection unit 220 may store the normal-state data and attack-state data collected in step 330 in a database. Here, the data collection unit 220 may match each attack-state data item with the normal-state data item collected at the time closest to the time at which that attack-state data item was collected, and store the matched pair in the database.
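The nearest-in-time matching in step 340 can be sketched with a sorted list and a binary search. The representation of a sample as a `(timestamp_in_seconds, record)` tuple, the function name `match_nearest`, and the sample values are assumptions made for this illustration; the text does not specify a storage format.

```python
from bisect import bisect_left
from typing import Dict, List, Sequence, Tuple

Sample = Tuple[float, Dict[str, float]]  # (timestamp in seconds, attribute record)


def match_nearest(
    attack_samples: Sequence[Sample],
    normal_samples: Sequence[Sample],
) -> List[Tuple[Sample, Sample]]:
    """Pair each attack-state sample with the normal-state sample closest in time."""
    if not normal_samples:
        raise ValueError("at least one normal-state sample is required")
    normals = sorted(normal_samples, key=lambda s: s[0])
    times = [t for t, _ in normals]
    pairs: List[Tuple[Sample, Sample]] = []
    for attack in attack_samples:
        i = bisect_left(times, attack[0])
        # Only the neighbours on either side of the insertion point can be nearest.
        candidates = normals[max(i - 1, 0): i + 1]
        nearest = min(candidates, key=lambda s: abs(s[0] - attack[0]))
        pairs.append((attack, nearest))
    return pairs


# Made-up samples: attacks at t=0 s and t=60 s, normal readings at t=25 s and t=70 s.
attacks = [(0.0, {"x": 1.0}), (60.0, {"x": 2.0})]
normals = [(25.0, {"x": 0.1}), (70.0, {"x": 0.2})]
pairs = match_nearest(attacks, normals)
print([(a[0], n[0]) for a, n in pairs])  # [(0.0, 25.0), (60.0, 70.0)]
```

When two normal-state samples are equally close, this sketch keeps the earlier one; the text does not say how such ties should be resolved.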

In step 350, the first data mining unit 120 may perform the first data mining using the candidate attribute data stored in the database in step 340. Here, the first data mining unit 120 may use each of a plurality of mining algorithms to compare the matched normal-state data and attack-state data and determine whether the system is under attack, and may determine the attack state detection rate of each mining algorithm based on the determination results of the mining algorithms.
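The text does not define how the attack state detection rate is computed. One plausible reading, assumed here, is the percentage of samples whose determined state matches the actual state. The sketch below uses that assumption; the function name `detection_rate` and the sample counts are made up for illustration.

```python
from typing import Sequence


def detection_rate(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Percentage of samples whose predicted state equals the actual state.

    Assumption: "detection rate" is treated as plain classification accuracy.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("no samples to evaluate")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Made-up evaluation: 600 samples, one attack-state sample misjudged as normal.
actual = ["attack"] * 300 + ["normal"] * 300
predicted = list(actual)
predicted[0] = "normal"
rate = detection_rate(predicted, actual)
print(f"{rate:.3f}%")  # 99.833%
```

Running the same function once per mining algorithm yields the per-algorithm rates that the next step compares.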

In step 360, the algorithm selection unit 130 may select, based on the result of the first data mining performed in step 350, one decision tree algorithm from among the plurality of decision tree algorithms used for the first data mining. Here, the algorithm selection unit 130 may select the decision tree algorithm with the highest attack state detection rate from among the plurality of decision tree algorithms used for the first data mining.

In step 370, the key attribute data determination unit 140 may select the key attribute data among the candidate attribute data using the decision tree algorithm selected by the algorithm selection unit 130. Here, the key attribute data determination unit 140 may select, as the key attribute data, the candidate attribute data at or above the reference level from among the candidate attribute data corresponding to the plurality of levels in the tree structure produced by the decision tree algorithm.

In step 380, the second data mining unit 150 may perform the second data mining using the key attribute data determined in step 370.

Here, the mining algorithm that the second data mining unit 150 uses for the second data mining may be the same as the mining algorithm used in step 350, while the input data may differ.

Specifically, in step 350 the first data mining unit 120 performs data mining using all of the candidate attribute data as input data, whereas in step 380 the second data mining unit 150 may perform data mining using as input data only the key attribute data, that is, the key attributes used for attack state detection among the candidate attribute data.

FIG. 4 is an example of a first data mining result according to an embodiment of the present invention.

FIG. 4 may be an example of the attack state detection rate of each mining algorithm, as determined by the first data mining unit 120, when the system attack unit 230 has performed a SYN flood attack.

The first data mining unit 120 may determine the attack state detection rate of each mining algorithm by using the plurality of mining algorithms to compare the normal-state data and the attack-state data included in the candidate attribute data.

For example, the first data mining unit 120 may determine the attack state detection rates of 64 mining algorithms, as shown in Table 2. The first data mining unit 120 may also display the attack state detection rates of the 64 mining algorithms as a graph, as in FIG. 4.

Among the mining algorithms of Table 2, the decision tree algorithms with the highest attack state detection rate may be ADTree (alternating decision tree) and LADTree, each at 99.833%.

Therefore, the algorithm selection unit 130 may select at least one of ADTree and LADTree.
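The selection just described can be sketched as follows. This is only an illustration under stated assumptions: the function name `select_best_tree_algorithms`, the `tree.` name prefix used to recognize decision tree algorithms, and every rate other than the 99.833% reported for ADTree and LADTree are hypothetical placeholders, not values from Table 2.

```python
def select_best_tree_algorithms(detection_rates, is_decision_tree):
    """Return the decision tree algorithms that share the highest detection rate."""
    # Only decision tree algorithms are candidates for selection.
    tree_rates = {
        name: rate
        for name, rate in detection_rates.items()
        if is_decision_tree(name)
    }
    if not tree_rates:
        return []
    best = max(tree_rates.values())
    # Several algorithms may tie, so return all of them in a stable order.
    return sorted(name for name, rate in tree_rates.items() if rate == best)


# Only the two 99.833 entries come from the description;
# the remaining entries are made-up placeholders.
rates = {
    "tree.ADTree": 99.833,
    "tree.LADTree": 99.833,
    "tree.J48": 99.5,          # hypothetical value
    "bayes.NaiveBayes": 99.0,  # hypothetical non-tree algorithm, ignored
}

selected = select_best_tree_algorithms(rates, lambda name: name.startswith("tree."))
print(selected)  # ['tree.ADTree', 'tree.LADTree']
```

Returning every tied algorithm mirrors the "at least one of" wording: the caller may then keep one or several of them.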

FIG. 5 is an example of an algorithm selected according to an embodiment of the present invention.

FIG. 5 is an example of ADTree, the decision tree algorithm that the algorithm selection unit 130 selected from among the decision tree algorithms in the mining algorithms corresponding to FIG. 4 and Table 2.

As shown in FIG. 5, the decision tree algorithm ADTree may display the candidate attribute data in a tree structure. Here, ADTree may set traffic count, memory used, time of round trip (s), and average packets size as the upper-level candidate attribute data 510, and average bytes/sec and average packets/sec as the lower-level candidate attribute data 520.

The key attribute data determination unit 140 may then determine both the upper-level candidate attribute data 510 and the lower-level candidate attribute data 520, among the candidate attribute data displayed by ADTree, as the key attribute data.

FIG. 6 is an example of a result of the second data mining performed with the key attribute data selected by the algorithm of FIG. 5.

FIG. 6 and Table 3 may show the result of the second data mining unit 150 running the 64 mining algorithms corresponding to FIG. 4 and Table 2 on the key attribute data selected in FIG. 5.

Comparing Table 2 with Table 3, and FIG. 4 with FIG. 6, shows that the attack state detection rate changes when the input data changes, even though the mining algorithms are the same.

Specifically, the maximum attack state detection rate in Table 2 is 99.833%, whereas Table 3 contains mining algorithms whose attack state detection rate is 100%.

That is, whereas the first data mining unit 120 uses all 16 candidate attribute data as input data, the second data mining unit 150 uses only the 6 candidate attribute data selected by the key attribute data determination unit 140, which speeds up the mining algorithms and can also raise the attack state detection rate.
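A minimal sketch of this input reduction, assuming each measurement is stored as a Python dict keyed by attribute name. The six key attribute names are those listed for FIG. 5; the record values, the `label` field, and the `project` helper are hypothetical and only illustrate the idea of passing fewer attributes to the same mining algorithms.

```python
# Key attributes named in the description of FIG. 5 (levels 510 and 520).
KEY_ATTRIBUTES = [
    "traffic count",
    "memory used",
    "time of round trip (s)",
    "average packets size",
    "average bytes/sec",
    "average packets/sec",
]


def project(records, key_attributes, label_field="label"):
    """Keep only the key attributes and the state label of each record."""
    reduced = []
    for record in records:
        row = {name: record[name] for name in key_attributes}
        row[label_field] = record[label_field]
        reduced.append(row)
    return reduced


# One made-up record; a real record would carry all 16 candidate attributes.
records = [
    {
        "traffic count": 120,
        "memory used": 35.0,
        "time of round trip (s)": 0.02,
        "average packets size": 64,
        "average bytes/sec": 900.0,
        "average packets/sec": 14.0,
        "40-79 packets count": 3,           # not a key attribute here
        "total transmission packets": 200,  # not a key attribute here
        "label": "normal",
    },
]

reduced = project(records, KEY_ATTRIBUTES)
print(sorted(reduced[0]))
```

The reduced records can then be passed to the same set of mining algorithms that processed the full records.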

FIG. 7 is another example of an algorithm selected according to an embodiment of the present invention.

FIG. 7 is an example of LADTree, the decision tree algorithm that the algorithm selection unit 130 selected from among the decision tree algorithms in the mining algorithms corresponding to FIG. 4 and Table 2.

As shown in FIG. 7, the decision tree algorithm LADTree may display the candidate attribute data in a tree structure. Here, LADTree may set traffic count, average packets/s, memory used, 40-79 packets count, time of round trip (s), and 320-639 percent as the upper-level candidate attribute data 710, and average B/s as the lower-level candidate attribute data 720.

The key attribute data determination unit 140 may then determine both the upper-level candidate attribute data 710 and the lower-level candidate attribute data 720, among the candidate attribute data displayed by LADTree, as the key attribute data.

FIG. 8 is an example of a result of the second data mining performed with the key attribute data selected by the algorithm of FIG. 7.

FIG. 8 may show the result of the second data mining unit 150 running the 64 mining algorithms corresponding to FIG. 4 and Table 2 on the key attribute data selected in FIG. 7.

Comparing FIG. 4 with FIG. 8 shows that the attack state detection rate changes when the input data changes, even though the mining algorithms are the same.

Specifically, the maximum attack state detection rate in FIG. 4 is 99.833%, whereas FIG. 8 contains mining algorithms whose attack state detection rate is 100%.

That is, whereas the first data mining unit 120 uses all 16 candidate attribute data as input data, the second data mining unit 150 uses only the 7 candidate attribute data selected by the key attribute data determination unit 140. Skipping the comparison of attribute data that has little bearing on attack state detection can raise the attack state detection rate.

FIG. 9 is another example of a first data mining result according to an embodiment of the present invention.

FIG. 9 may be an example of the attack state detection rate that the first data mining unit 120 determined for each mining algorithm when the system attack unit 230 performed a buffer overflow attack.

For example, when the system attack unit 230 performs a buffer overflow attack, the first data mining unit 120 may determine the attack state detection rates of 70 mining algorithms, as shown in Table 4. The first data mining unit 120 may also display the attack state detection rates of the 70 mining algorithms as a graph such as FIG. 9.

Among the mining algorithms of Table 4, the decision tree algorithms with the highest attack state detection rate, 100%, may be tree.ADTree, tree.RandomTree, tree.REPTree, tree.J48, tree.J48graft, and tree.LADTree.

Therefore, the algorithm selection unit 130 may select at least one of tree.ADTree, tree.RandomTree, tree.REPTree, tree.J48, tree.J48graft, and tree.LADTree.

FIG. 10 is an example of key attribute data selected according to an embodiment of the present invention.

Case 1 of FIG. 10 is an example of the candidate attribute data that the key attribute data determination unit 140 determines as key attribute data when the algorithm selection unit 130 selects ADTree, RandomTree, and REPTree from among the decision tree algorithms in the mining algorithms corresponding to FIG. 9 and Table 4.

Case 2 of FIG. 10 is an example of the candidate attribute data that the key attribute data determination unit 140 determines as key attribute data when the algorithm selection unit 130 selects tree.J48 or tree.J48graft from among the decision tree algorithms in the mining algorithms corresponding to FIG. 9 and Table 4.

Case 3 of FIG. 10 is an example of the candidate attribute data that the key attribute data determination unit 140 determines as key attribute data when the algorithm selection unit 130 selects LADTree from among the decision tree algorithms in the mining algorithms corresponding to FIG. 9 and Table 4. Here, LADTree may set average packets size and total transmission packets as the upper-level candidate attribute data. Accordingly, the key attribute data determination unit 140 may determine average packets size and total transmission packets as the key attribute data.

FIG. 11 is an example of a result of the second data mining performed with the key attribute data selected in FIG. 10.

FIG. 11 and Table 5 may show the result of the second data mining unit 150 running the 71 mining algorithms corresponding to FIG. 9 and Table 4 on the key attribute data selected in FIG. 10.

Comparing Table 4 with Table 5, and FIG. 9 with FIG. 11, shows that the attack state detection rate changes when the input data changes, even though the mining algorithms are the same.

Specifically, 40 mining algorithms in Table 4 have an attack state detection rate of 100%, whereas 57 mining algorithms in Table 5 do, an increase of nearly 1.5 times (57/40 ≈ 1.43).

That is, whereas the first data mining unit 120 uses all of the candidate attribute data as input data, the second data mining unit 150 uses only the candidate attribute data selected by the key attribute data determination unit 140, which can increase the number of mining algorithms whose attack state detection rate is 100%.

FIG. 12 is a flowchart illustrating an attack detection method according to an embodiment of the present invention.

In step 1210, the candidate attribute data collection unit 110 may select data attributes based on the attack requirements and collect candidate attribute data corresponding to the selected data attributes. The candidate attribute data may include normal state data collected from the system in a normal state and attack state data collected from the system in an attack state.

In step 1220, the first data mining unit 120 may perform the first data mining using the candidate attribute data collected in step 1210. Here, the first data mining unit 120 may use each of a plurality of mining algorithms to compare the matched normal state data and attack state data and judge whether the system is under attack, and may determine the attack state detection rate of each mining algorithm based on those judgments.
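The rate computation in step 1220 can be illustrated with a short sketch. It assumes that the attack state detection rate is the percentage of labeled samples whose state an algorithm judges correctly; the description does not give the exact formula, so this definition, the two threshold rules standing in for mining algorithms, and the sample values are all hypothetical.

```python
def detection_rate(classify, samples):
    """Percentage of samples whose predicted state matches the true state."""
    correct = sum(
        1 for features, is_attack in samples if classify(features) == is_attack
    )
    return 100.0 * correct / len(samples)


def first_data_mining(algorithms, samples):
    """Evaluate every algorithm on the same labeled candidate attribute data."""
    return {name: detection_rate(fn, samples) for name, fn in algorithms.items()}


# Made-up labeled samples: (candidate attribute data, True if attack state).
samples = [
    ({"traffic count": 50}, False),
    ({"traffic count": 80}, False),
    ({"traffic count": 300}, True),
    ({"traffic count": 900}, True),
]

# Two toy rules standing in for real mining algorithms.
algorithms = {
    "rule_a": lambda f: f["traffic count"] > 100,
    "rule_b": lambda f: f["traffic count"] > 500,
}

rates = first_data_mining(algorithms, samples)
print(rates)  # {'rule_a': 100.0, 'rule_b': 75.0}
```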

In step 1230, the algorithm selection unit 130 may select one decision tree algorithm from among the plurality of decision tree algorithms used for the first data mining, based on the result of the first data mining performed in step 1220. Here, the algorithm selection unit 130 may select the decision tree algorithm with the highest attack state detection rate among the plurality of decision tree algorithms used for the first data mining.

In step 1240, the key attribute data determination unit 140 may select the key attribute data from among the candidate attribute data using the decision tree algorithm selected in step 1230. Here, the key attribute data determination unit 140 may select as key attribute data the candidate attribute data at or above a reference level, among the candidate attribute data that correspond to a plurality of levels in the tree structure displayed by the decision tree algorithm.
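One way to picture the level-based selection in step 1240 is a breadth-first walk over the tree that stops below the reference level. The sketch below is an assumption-laden illustration: the `Node` class, the convention that level 1 is the top of the tree, and the parent-child links are hypothetical. Only the attribute names and their grouping into an upper and a lower level follow the description of FIG. 5.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    attribute: str
    children: list = field(default_factory=list)


def key_attributes(roots, reference_level):
    """Collect attributes at or above reference_level (level 1 = top of the tree)."""
    selected = []
    queue = deque((node, 1) for node in roots)
    while queue:
        node, level = queue.popleft()
        if level > reference_level:
            continue  # below the reference level: not a key attribute
        if node.attribute not in selected:
            selected.append(node.attribute)
        queue.extend((child, level + 1) for child in node.children)
    return selected


# Hypothetical tree shape; which upper-level node owns which child is assumed.
roots = [
    Node("traffic count", [Node("average bytes/sec")]),
    Node("memory used", [Node("average packets/sec")]),
    Node("time of round trip (s)"),
    Node("average packets size"),
]

print(key_attributes(roots, 1))  # the four upper-level attributes
print(key_attributes(roots, 2))  # upper and lower levels, six attributes
```

With a reference level of 2 the sketch returns six attributes, matching the case in which both levels 510 and 520 are taken as key attribute data.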

In step 1250, the second data mining unit 150 may perform the second data mining using the key attribute data determined in step 1240.

In this case, the second data mining unit 150 uses the same mining algorithms for the second data mining as were used in step 1220; only the input data may differ.

Specifically, in step 1220 the first data mining unit 120 performs data mining using all of the candidate attribute data as input data, whereas in step 1250 the second data mining unit 150 performs data mining using as input data only the key attribute data, that is, the core attributes among the candidate attribute data that are used to detect an attack state.

FIG. 13 is a flowchart illustrating a candidate attribute data collection method according to an embodiment of the present invention. Steps 1310 to 1340 of FIG. 13 may be included in step 1210 of FIG. 12.

In step 1310, the data attribute selection unit 210 may select data attributes based on the attack requirements. Here, the data attribute selection unit 210 may analyze the NSM requirements of the attack to be detected and select data attributes according to the result of the analysis.

In step 1320, the data collection unit 220 may collect candidate attribute data corresponding to the data attributes selected in step 1310. Because the system is not under attack by the system attack unit 230 at this point, the candidate attribute data collected by the data collection unit 220 may be normal state data.

In step 1330, the system attack unit 230 may arbitrarily attack the system. For example, the system attack unit 230 may perform a SYN flood attack and a buffer overflow attack using GOOSE messages. Here, the system attack unit 230 may perform a preset number of SYN flood attacks and buffer overflow attacks at fixed time intervals against an attack test bed, which is a system used for testing.

In step 1340, the data collection unit 220 may collect candidate attribute data corresponding to the data attributes selected in step 1310. Because the system is under attack by the system attack unit 230 as of step 1330, the candidate attribute data collected by the data collection unit 220 may be attack state data.

The data collection unit 220 and the system attack unit 230 may repeat steps 1320 to 1340 until the system attack unit 230 has performed all of the preset number of attacks.
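The repetition of steps 1320 to 1340 can be sketched as a simple loop. Everything here is hypothetical scaffolding: `FakeTestBed` stands in for the attack test bed, its measurements are constants chosen only to tell the two states apart, and stopping the attack before the next round is an assumption that the description does not spell out.

```python
class FakeTestBed:
    """Hypothetical stand-in for the attack test bed."""

    def __init__(self):
        self.under_attack = False

    def launch_attack(self):
        self.under_attack = True

    def stop_attack(self):
        self.under_attack = False

    def measure(self, attributes):
        # Made-up readings: large values while the test bed is under attack.
        value = 1000 if self.under_attack else 10
        return {name: value for name in attributes}


def collect(test_bed, attributes, attack_count):
    """Alternate normal state and attack state measurements."""
    normal_data, attack_data = [], []
    for _ in range(attack_count):
        normal_data.append(test_bed.measure(attributes))  # step 1320
        test_bed.launch_attack()                          # step 1330
        attack_data.append(test_bed.measure(attributes))  # step 1340
        test_bed.stop_attack()  # assumed: return to normal before repeating
    return normal_data, attack_data


normal_data, attack_data = collect(FakeTestBed(), ["traffic count"], 3)
print(len(normal_data), len(attack_data))  # 3 3
```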

The present invention determines, on the basis of a decision tree algorithm, the key attribute data, that is, the core attributes among the candidate attribute data that are used to detect an attack state, and performs data mining with the key attribute data as input data, thereby increasing the speed and accuracy of data mining for attack state detection.

The method according to an embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.

Although the present invention has been described above with reference to limited embodiments and drawings, it is not limited to those embodiments, and a person of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims below and by their equivalents.

110: candidate attribute data collection unit
120: first data mining unit
130: algorithm selection unit
140: key attribute data determination unit
150: second data mining unit

Claims

1. An attack detection method comprising:
performing first data mining using candidate attribute data;
selecting one decision tree algorithm from among a plurality of decision tree algorithms used for the first data mining, based on a result of the first data mining;
determining key attribute data among the candidate attribute data using the selected decision tree algorithm; and
performing second data mining using the key attribute data.

2. The method of claim 1,
wherein the decision tree algorithm represents the candidate attribute data used in the first data mining in a tree structure.

3. The method of claim 2,
wherein determining the key attribute data comprises
determining, as the key attribute data, candidate attribute data at or above a reference level among the candidate attribute data corresponding to a plurality of levels in the tree structure.

4. The method of claim 1, further comprising:
selecting a data attribute based on an attack requirement; and
collecting candidate attribute data corresponding to the selected data attribute.

5. The method of claim 1,
wherein the candidate attribute data includes
normal state data collected from a system in a normal state and attack state data collected from the system in an attack state.

6. The method of claim 5, further comprising
arbitrarily performing an attack on the system,
wherein the attack state is
a state in which the system is under attack as a result of performing the attack.

7. The method of claim 1,
wherein performing the first data mining comprises
judging whether the system is under attack using the candidate attribute data and a plurality of mining algorithms that include the decision tree algorithms, and determining an attack state detection rate of each mining algorithm based on a result of the judgment.

8. The method of claim 7,
wherein selecting the decision tree algorithm comprises
selecting the decision tree algorithm having the highest attack state detection rate among the plurality of decision tree algorithms used for the first data mining.

9. A computer-readable recording medium on which a program for executing the method according to any one of claims 1 to 8 is recorded.

10. An attack detection apparatus comprising:
a first data mining unit that performs first data mining using candidate attribute data;
an algorithm selection unit that selects one decision tree algorithm from among a plurality of decision tree algorithms used for the first data mining, based on a result of the first data mining;
a key attribute data determination unit that determines key attribute data among the candidate attribute data using the selected decision tree algorithm; and
a second data mining unit that performs second data mining using the key attribute data.

11. The apparatus of claim 10,
wherein the decision tree algorithm represents the candidate attribute data used in the first data mining in a tree structure.

12. The apparatus of claim 11,
wherein the key attribute data determination unit
determines, as the key attribute data, candidate attribute data at or above a reference level among the candidate attribute data corresponding to a plurality of levels in the tree structure.

13. The apparatus of claim 10, further comprising:
a data attribute selection unit that selects a data attribute based on an attack requirement; and
a data collection unit that collects candidate attribute data corresponding to the selected data attribute.

14. The apparatus of claim 10,
wherein the candidate attribute data includes
normal state data collected from a system in a normal state and attack state data collected from the system in an attack state.

15. The apparatus of claim 14, further comprising
a system attack unit that arbitrarily attacks the system,
wherein the attack state is
a state in which the system is under attack by the system attack unit.

16. The apparatus of claim 10,
wherein the first data mining unit
judges whether the system is under attack using the candidate attribute data and a plurality of mining algorithms that include the decision tree algorithms, and determines an attack state detection rate of each mining algorithm based on a result of the judgment.

17. The apparatus of claim 16,
wherein the algorithm selection unit
selects the decision tree algorithm having the highest attack state detection rate among the plurality of decision tree algorithms used for the first data mining.