KR101994528B1

KR101994528B1 - Method and Apparatus for Detection of Traffic Flooding Attacks using Time Series Analysis

Info

Publication number: KR101994528B1
Application number: KR1020170110010A
Authority: KR
Inventors: 이성주; 사재원; 김윤빈
Original assignee: 고려대학교 세종산학협력단
Priority date: 2017-08-30
Filing date: 2017-08-30
Publication date: 2019-06-28
Also published as: KR20190023767A

Abstract

A method and apparatus for detecting traffic flooding attacks using a time-series technique are presented. The proposed method includes: converting normal traffic flows and abnormal traffic flows into traffic time-series patterns for the detection of abnormal intrusion attacks, and extracting features of the traffic time-series patterns; learning the extracted features to extract a representative subsequence; calculating the Euclidean distance to the traffic time-series patterns using the extracted representative subsequence; and classifying normal and abnormal traffic flows using the calculated Euclidean distance.

Description

The present invention relates to a method and apparatus for detecting traffic flooding attacks using a time-series technique. More specifically, it relates to a method and apparatus for classifying normal and abnormal traffic flows by converting traffic into time-series data and analyzing it.

With the recent spread of IT, customized services are being provided to users in many fields through a variety of technologies, including predictive model design, and the reliability of personal information, including network security, has become correspondingly important. Intrusion models in network intrusion detection fall broadly into two types. Anomaly-based detection builds a profile of a user's normal patterns and analyzes patterns that deviate from it. Misuse-based detection uses patterns obtained from past intrusions and analyzes patterns that are similar or identical to them.

A DDoS (Distributed Denial of Service) attack, one kind of abnormal intrusion analyzed by anomaly-based intrusion detection, is a traffic flooding attack that remotely creates a large number of zombie PCs and uses them to increase traffic exponentially, thereby paralyzing a server. Because traffic flooding attacks continue to grow in both frequency and scale, an efficient detection technique for such abnormal intrusion attacks is required.

The technical problem addressed by the present invention is to provide a method and apparatus that, for the detection of abnormal intrusion attacks, convert normal and abnormal traffic flows into time-series patterns and apply the shapelets technique to the converted time-series data in order to detect and classify abnormal intrusion attacks.

In one aspect, the traffic flooding attack detection method using a time-series technique proposed in the present invention includes: converting normal traffic flows and abnormal traffic flows into traffic time-series patterns for the detection of abnormal intrusion attacks, and extracting features of the traffic time-series patterns; learning the extracted features to extract a representative subsequence; calculating the Euclidean distance to the traffic time-series patterns using the extracted representative subsequence; and classifying normal and abnormal traffic flows using the calculated Euclidean distance.

In the step of converting the traffic flows into traffic time-series patterns and extracting their features, a predetermined number of top-ranked features are extracted from the plurality of features of the traffic time-series data set using the information gain technique, in order to address the loss of accuracy caused by overfitting and the increase in training time.

To select the predetermined number of top-ranked features from the plurality of features, entropy is calculated as a preprocessing step, and the information gain technique is applied through the difference between the entropy of an upper node and the entropy of a lower node.

In the step of calculating the Euclidean distance to the traffic time-series patterns using the extracted representative subsequence, the Fast-Shapelet technique is applied to reduce the training and test time of the shapelets time-series analysis technique, and a shapelet that maximizes the distance between normal and abnormal traffic flows is generated using the Euclidean distance.

In the step of classifying normal and abnormal traffic flows using the calculated Euclidean distance, binary classification is performed based on a predetermined threshold between the shapelet generated in the distance calculation step and the abnormal traffic flows.

In another aspect, the traffic flooding attack detection apparatus using a time-series technique proposed in the present invention includes: a feature extraction unit that converts normal traffic flows and abnormal traffic flows into traffic time-series patterns for the detection of abnormal intrusion attacks and extracts features of the traffic time-series patterns; a subsequence extraction unit that learns the extracted features to extract a representative subsequence; a Euclidean distance calculation unit that calculates the Euclidean distance to the traffic time-series patterns using the extracted representative subsequence; and a classification unit that classifies normal and abnormal traffic flows using the calculated Euclidean distance.

The feature extraction unit extracts a predetermined number of top-ranked features from the plurality of features of the traffic time-series data set using the information gain technique, in order to address the loss of accuracy caused by overfitting and the increase in training time.

To select the predetermined number of top-ranked features from the plurality of features, the feature extraction unit calculates entropy as a preprocessing step and applies the information gain technique through the difference between the entropy of an upper node and the entropy of a lower node.

The Euclidean distance calculation unit applies the Fast-Shapelet technique to reduce the training and test time of the shapelets time-series analysis technique, and generates a shapelet that maximizes the distance between normal and abnormal traffic flows using the Euclidean distance.

The classification unit performs binary classification based on a predetermined threshold between the shapelet generated in the Euclidean distance calculation and the abnormal traffic flows.

According to embodiments of the present invention, top-ranked features are selected from the plurality of features and the shapelets time-series analysis technique is applied to them, which addresses the loss of accuracy caused by overfitting and the increase in training time. To effectively reduce the training and test time of the shapelets technique, a shapelet that maximizes the distance between normal and abnormal traffic flows is generated using the Euclidean distance, so that normal and abnormal traffic flows can be classified.

FIG. 1 is a flowchart illustrating a method for detecting traffic flooding attacks using a time-series technique according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating a method of extracting features of a traffic time-series pattern according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the shapelets technique according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating a subsequence extraction method according to an embodiment of the present invention.
FIG. 5 is a diagram showing the configuration of an apparatus for detecting traffic flooding attacks using a time-series technique according to an embodiment of the present invention.
FIG. 6 is a graph showing normal traffic, traffic flooding attack traffic, and the result of applying shapelets according to an embodiment of the present invention.

Intrusion detection techniques can be classified by detection method into knowledge-based (misuse) detection and behavior-based (anomaly) detection. Knowledge-based detection defines patterns from the analysis of specific attacks. It has the advantage of a low false positive rate, but it cannot detect new attacks and has a low detection rate. Behavior-based detection flags an intrusion when an abrupt change is observed. It can detect new attacks and can make use of artificial intelligence algorithms, but it has the disadvantage of a high false positive rate.

From another perspective, intrusion detection techniques can be classified by data source into network-packet-based and host-log-file-based detection. Network-packet-based detection monitors all traffic on the network, detects many types of intrusion, and does not degrade server performance. Host-log-file-based detection enables accurate intrusion prevention using a variety of log data and can identify the host that accessed the source code. Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating a method for detecting traffic flooding attacks using a time-series technique according to an embodiment of the present invention.

The proposed method for detecting traffic flooding attacks using a time-series technique includes: a step (110) of converting normal traffic flows and abnormal traffic flows into traffic time-series patterns for the detection of abnormal intrusion attacks and extracting features of the traffic time-series patterns; a step (120) of learning the extracted features to extract a representative subsequence; a step (130) of calculating the Euclidean distance to the traffic time-series patterns using the extracted representative subsequence; and a step (140) of classifying normal and abnormal traffic flows using the calculated Euclidean distance.

In step (110), normal and abnormal traffic flows are converted into traffic time-series patterns for the detection of abnormal intrusion attacks, and features of the traffic time-series patterns are extracted. To address the loss of accuracy caused by overfitting and the increase in training time, a predetermined number of top-ranked features are extracted from the plurality of features of the traffic time-series data set using the information gain technique. To select these features, entropy is calculated as a preprocessing step, and the information gain technique is applied through the difference between the entropy of an upper node and the entropy of a lower node.

FIG. 2 is a diagram illustrating a method of extracting features of a traffic time-series pattern according to an embodiment of the present invention.

The NSL-KDD data set used in an embodiment of the present invention is a refined version of the KDD CUP'99 data set, which was produced as part of a DARPA program. It contains normal traffic packets and abnormal packets of four attack types captured on a real network, and each attack is described by 41 features. The four attack types are DoS, Probe, U2R, and R2L. The packet distribution by attack type is shown in Table 1.

<Table 1>

The data set also provides separate training and test sets. Its features include network header contents such as Protocol type, Service, src_byte, and dst_byte, as well as access information such as the number of host and guest logins. Because the proposed method focuses on traffic flooding attacks analyzed with a shapelets-based time-series technique, it uses the DoS attack traffic data from among the four attack types.

Referring to FIG. 2, FIG. 2(a) compares the total amount of training data for normal packets and DoS packets, and FIG. 2(b) shows the amount of data for the 10 extracted features. In the prior art, 5 to 30 of the 41 features were evaluated in steps of 5, which raises a concern that accuracy decreases because of overfitting. In the present invention, 10 features are therefore extracted in order to reduce training time. This is described in more detail below.

The NSL-KDD training data set used in an embodiment of the present invention consists of 4,756,832 packets. In intrusion detection, newly added attack types must be learned periodically so that new attacks can be detected, so a way to reduce training time is needed. In other words, using the entire data set may actually reduce accuracy through overfitting and increase training time.

To address the loss of accuracy caused by overfitting and the increase in training time, the present invention applies the information gain technique to select the top 10 of the 41 features and then applies the shapelets time-series analysis technique to them.

Information gain measures how much the discriminative power of the data increases when a given feature is selected. It is expressed as the difference between the entropy of the upper node, E(node1), and the entropy of the lower node, E(node2). A larger value means a larger information gain and better discriminative power. Entropy is calculated for each node, and the features are ranked by the difference between the entropy of the upper node and that of the lower node. This can be expressed as: information gain = E(node1) - E(node2).
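The ranking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the entropy of the lower node is taken as the size-weighted average over the groups produced by splitting on each distinct feature value (the usual decision-tree definition, which the text does not spell out), and feature values are treated as categorical, so continuous features would need to be binned first.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """E(upper node) minus the size-weighted entropy of the lower nodes.

    Assumption: one lower node per distinct feature value.
    """
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    lower = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - lower


def top_k_features(rows, labels, k):
    """Return the column indices of the k features with the highest gain."""
    n_features = len(rows[0])
    gains = [(information_gain([row[j] for row in rows], labels), j)
             for j in range(n_features)]
    # Highest gain first; ties are broken by the lower column index.
    gains.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in gains[:k]]


if __name__ == "__main__":
    # Toy records: (protocol, flag). Only the protocol separates the classes.
    rows = [("tcp", 0), ("tcp", 1), ("udp", 0), ("udp", 1)]
    labels = ["normal", "normal", "dos", "dos"]
    print(top_k_features(rows, labels, 1))
```

With the 41-feature data set described in the text, `top_k_features(rows, labels, 10)` would correspond to selecting the top 10 features.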

The 10 features with the highest information gain, which are the features used in the present invention, are listed in Table 2.

<Table 2>

In step (120), the features of the extracted traffic time-series patterns are learned to extract a representative subsequence, and in step (130), the Euclidean distance to the traffic time-series patterns is calculated using the extracted representative subsequence.

The Fast-Shapelet technique is applied to reduce the training and test time of the shapelets time-series analysis technique, and a shapelet that maximizes the distance between normal and abnormal traffic flows is generated using the Euclidean distance.

FIG. 3 is a diagram illustrating the shapelets technique according to an embodiment of the present invention.

Referring to FIG. 3, the shapelets technique is a method for classifying time-series patterns that uses subsequences extracted from the patterns. The time-series patterns are learned to extract a representative subsequence, and the Euclidean distance is calculated using that subsequence. The calculated distance is then compared against a threshold to decide whether the traffic is normal or abnormal.

FIG. 4 is a diagram illustrating a subsequence extraction method according to an embodiment of the present invention.

Referring to FIG. 4, the subsequence extraction method according to an embodiment of the present invention applies a sliding window to the time-series data. The distance between two time series T and R of the same length is calculated as follows.

The relationship between the time-series data and a subsequence is defined by the following expression.

The distance between the time-series data and the extracted subsequence is calculated by the following equation.
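The two distances discussed here can be sketched as follows. The sketch makes two assumptions that the text does not state explicitly: the distance between equal-length series is the ordinary Euclidean distance, and the distance between a longer series and a subsequence is the minimum Euclidean distance over all sliding windows of the subsequence's length, as is usual in the shapelet literature. The function names are hypothetical.

```python
import math


def euclidean_distance(t, r):
    """Euclidean distance between two sequences of the same length."""
    if len(t) != len(r):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(t, r)))


def sliding_windows(series, length):
    """All contiguous subsequences of `series` with the given length."""
    return [series[i:i + length] for i in range(len(series) - length + 1)]


def subsequence_distance(series, shapelet):
    """Smallest distance between the shapelet and any window of the series.

    Raises ValueError if the shapelet is longer than the series,
    because there is then no window to compare against.
    """
    return min(euclidean_distance(window, shapelet)
               for window in sliding_windows(series, len(shapelet)))


if __name__ == "__main__":
    traffic = [1, 2, 3, 9, 9]
    burst = [9, 9]
    print(subsequence_distance(traffic, burst))
```

A series that contains the shapelet exactly has distance zero, which is what makes the distance usable as a feature for the threshold-based classification in step (140).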

In step (140), normal and abnormal traffic flows are classified using the calculated Euclidean distance. Binary classification is performed based on a predetermined threshold between the shapelet generated in the Euclidean distance calculation step and the abnormal traffic flows.

In other words, a shapelet is generated from the normal traffic class and the abnormal traffic class using the Euclidean distance, and binary classification is then performed based on the threshold between the abnormal traffic class and the generated shapelet.

The optimal split point (OSP) separates the distances between the time-series data and the extracted subsequence by means of a threshold, and can be expressed as follows.

Here, d_th denotes the distance threshold.
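One way to picture the distance threshold d_th is the sketch below. It assumes, following common practice in the shapelet literature rather than anything stated in the text, that the threshold is chosen to maximize the information gain of the split of shapelet distances, and that series close to the shapelet are labeled as attacks. The function names, the label strings, and the choice of which side counts as an attack are all hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def optimal_split_point(distances, labels):
    """Return (threshold, gain) for the best split of the distances.

    Candidate thresholds are midpoints between consecutive distinct
    distances. Returns (None, -1.0) if all distances are equal.
    """
    pairs = sorted(zip(distances, labels))
    base = entropy(labels)
    best_threshold, best_gain = None, -1.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal distances
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (len(left) * entropy(left)
                    + len(right) * entropy(right)) / len(pairs)
        gain = base - weighted
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain


def classify(distance, threshold, near_label="attack", far_label="normal"):
    """Binary decision: below the threshold means close to the shapelet."""
    return near_label if distance < threshold else far_label


if __name__ == "__main__":
    distances = [0.1, 0.2, 0.9, 1.1]
    labels = ["attack", "attack", "normal", "normal"]
    d_th, gain = optimal_split_point(distances, labels)
    print(d_th, gain, classify(0.15, d_th))
```

In this toy case the two classes are perfectly separable, so the chosen threshold falls midway between the largest attack distance and the smallest normal distance.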

The shapelet candidates are given by the following expression,

and the complete set of shapelet candidates by the expression after it.
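The idea of a candidate set can be illustrated with a brute-force enumeration. This sketch assumes that every contiguous subsequence within a length range, taken from every training series, is a candidate; the length bounds `min_len` and `max_len` and the function name are hypothetical. It is not the Fast-Shapelet technique named in the text, which exists precisely to avoid evaluating this full set.

```python
def shapelet_candidates(dataset, min_len, max_len):
    """Enumerate every contiguous subsequence with min_len <= length <= max_len.

    A series of length m contributes m - l + 1 candidates of length l.
    """
    candidates = []
    for series in dataset:
        longest = min(max_len, len(series))
        for length in range(min_len, longest + 1):
            for start in range(len(series) - length + 1):
                candidates.append(tuple(series[start:start + length]))
    return candidates


if __name__ == "__main__":
    dataset = [[1, 2, 3], [4, 5]]
    for candidate in shapelet_candidates(dataset, 2, 3):
        print(candidate)
```

Because the number of candidates grows quickly with series length and data set size, reducing the number of features before this step, as the text describes, directly shortens training time.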

A prior-art method for detecting suspected attacks from time-series statistics sets a threshold based on characteristic values of the network time-series data for each user IP and then separates normal from abnormal traffic using that threshold. In contrast, the proposed method does not rely on such a threshold over statistical values. It computes subsequences of the time-series data and uses them to distinguish normal data from abnormal data.

A prior-art traffic pattern analysis and entropy prediction method for detecting malware in a network environment uses the entropy rate to determine the probability that a PAS is infected with malware. In contrast, entropy calculation in the proposed method is a preprocessing step for feature extraction: the information gain is calculated from the difference between the entropy of the upper node and the entropy of the lower node, and it is used to extract the 10 most meaningful features from the entire data set.

A prior-art network anomaly detection apparatus and method determines network anomalies from entropy calculated on the basis of IP addresses or the corresponding port numbers. In contrast, the proposed method uses entropy as a means of feature extraction, and reducing the number of features in this way shortens the execution time.

FIG. 5 is a diagram showing the configuration of an apparatus for detecting traffic flooding attacks using a time-series technique according to an embodiment of the present invention.

The proposed traffic flooding attack detection apparatus (500) using a time-series technique includes a feature extraction unit (510), a subsequence extraction unit (520), a Euclidean distance calculation unit (530), and a classification unit (540).

The feature extraction unit (510) converts normal and abnormal traffic flows into traffic time-series patterns for the detection of abnormal intrusion attacks and extracts features of the traffic time-series patterns. To address the loss of accuracy caused by overfitting and the increase in training time, it extracts a predetermined number of top-ranked features from the plurality of features of the traffic time-series data set using the information gain technique. To select these features, entropy is calculated as a preprocessing step, and the information gain technique is applied through the difference between the entropy of an upper node and the entropy of a lower node.

The NSL-KDD data set used in an embodiment of the present invention is a refined version of the KDD CUP'99 data set, which was produced as part of a DARPA program. It contains normal traffic packets and abnormal packets of four attack types captured on a real network, and each attack is described by 41 features. The four attack types are DoS, Probe, U2R, and R2L.

The subsequence extraction unit (520) learns the features of the extracted traffic time-series patterns to extract a representative subsequence, and the Euclidean distance calculation unit (530) calculates the Euclidean distance to the traffic time-series patterns using the extracted representative subsequence.

Referring to FIG. 3, the shapelets technique is a method for classifying time-series patterns that uses subsequences extracted from the patterns. The time-series patterns are learned to extract a representative subsequence, and the Euclidean distance is calculated using that subsequence. The calculated distance is then compared against a threshold to decide whether the traffic is normal or abnormal.

Referring to FIG. 4, the subsequence extraction method according to an embodiment of the present invention applies a sliding window to the time-series data.

The classification unit (540) classifies normal and abnormal traffic flows using the calculated Euclidean distance. Binary classification is performed based on a predetermined threshold between the shapelet generated in the Euclidean distance calculation and the abnormal traffic flows.

FIG. 6 is a graph showing normal traffic, traffic flooding attack traffic, and the result of applying shapelets according to an embodiment of the present invention.

Referring to FIG. 6, the features are placed along the x-axis and the corresponding packet values are plotted on the y-axis.

The experiments for an embodiment of the present invention were run on an Intel Core i5-4690 at 3.5 GHz with 8 GB of RAM. The data used in the experiments was the NSL-KDD data set, and the classifier used for packet classification was based on shapelets.

Table 3 shows the measures of how well normal and abnormal packets were classified.

<Table 3>

The following measures of classification quality were used: Precision, the proportion of truly normal packets among those that the generated shapelet classified as normal; Recall, the proportion of normal packets that were classified as normal; TPR (True Positive Rate), the proportion of abnormal packets that were correctly classified; and overall classification Accuracy.
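The four measures can be computed from paired actual and predicted labels as in the sketch below. It follows the definitions given in the text, with precision and recall taken for the normal class and the third measure taken as the fraction of abnormal packets classified as abnormal. The function name, the key names, and the label strings are hypothetical, and a ratio with an empty denominator is reported as 0.0 by assumption.

```python
def classification_metrics(actual, predicted, normal="normal"):
    """Precision/recall for the normal class, detection rate for attacks,
    and overall accuracy, from paired actual and predicted labels."""
    pairs = list(zip(actual, predicted))
    normal_as_normal = sum(1 for a, p in pairs if a == normal and p == normal)
    normal_as_attack = sum(1 for a, p in pairs if a == normal and p != normal)
    attack_as_normal = sum(1 for a, p in pairs if a != normal and p == normal)
    attack_as_attack = sum(1 for a, p in pairs if a != normal and p != normal)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "precision": ratio(normal_as_normal, normal_as_normal + attack_as_normal),
        "recall": ratio(normal_as_normal, normal_as_normal + normal_as_attack),
        "attack_detection_rate": ratio(attack_as_attack,
                                       attack_as_attack + attack_as_normal),
        "accuracy": ratio(normal_as_normal + attack_as_attack, len(pairs)),
    }


if __name__ == "__main__":
    actual = ["normal", "normal", "normal", "dos", "dos"]
    predicted = ["normal", "normal", "dos", "dos", "normal"]
    for name, value in classification_metrics(actual, predicted).items():
        print(f"{name}: {value:.3f}")
```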

Table 4 shows the training time measured after reducing the number of features using information gain.

<Table 4>

With the reduced feature set, training was confirmed to run about 25 times faster than with all 41 features.

A security system is an important yardstick for establishing the reliability and validity of a host that supports a service. The present invention proposes the shapelet technique for anomaly-based detection and classification of traffic flooding attacks, and a classification accuracy of about 95% was confirmed. In addition, reducing the number of features was confirmed to improve execution time by a factor of 25.
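The shapelet-based classification can be sketched as follows, under stated assumptions: the shapelet is taken to represent an attack pattern, a flow whose best-matching window lies within a threshold of the shapelet is labeled abnormal, and no normalization is applied. The function names, the label strings, and the threshold are hypothetical, and the sketch does not include the Fast-Shapelet search that finds the shapelet.

```python
import math


def subsequence_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any window
    of the same length in the series."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        dist = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, dist)
    return best


def classify(series, shapelet, threshold):
    """Binary classification by thresholding the shapelet distance.

    Assumption: the shapelet models attack traffic, so a small distance
    means the flow is abnormal.
    """
    if subsequence_distance(series, shapelet) <= threshold:
        return "abnormal"
    return "normal"
```

For instance, with the hypothetical shapelet `[0, 10, 0]` and threshold `2.0`, the series `[1, 1, 0, 10, 0, 1]` contains an exact match and is labeled abnormal, while the flat series `[1, 1, 1, 1, 1, 1]` is labeled normal.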

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the processing device is sometimes described as a single device, but those of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or multiple types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to an embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiments, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include machine code such as that produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described with reference to a limited set of embodiments and drawings, those of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

1. A method for detecting a traffic flooding attack, the method comprising:
converting a normal traffic flow and an abnormal traffic flow into traffic time series patterns and extracting features of the traffic time series patterns in order to detect an abnormal intrusion attack;
extracting a representative subsequence by learning the extracted features of the traffic time series patterns;
calculating a Euclidean distance for the traffic time series patterns using the extracted representative subsequence; and
classifying the normal traffic flow and the abnormal traffic flow using the calculated Euclidean distance,
wherein the step of converting the normal traffic flow and the abnormal traffic flow into the traffic time series patterns and extracting the features of the traffic time series patterns comprises:
focusing on traffic flooding attacks by using a shapelet-based time series analysis technique, and using a DoS attack traffic data set from among a plurality of attack types;
extracting, by using information gain, a predetermined number of top-ranked features from among the plurality of features of the data set of the traffic time series patterns, in order to address the loss of accuracy and the increase in learning time caused by overfitting; and
performing an entropy calculation as a preprocessing step for selecting the predetermined number of top-ranked features, the predetermined number of top-ranked features being extracted using the difference between the entropy of an upper node and the entropy of a lower node.

(Deleted)

4. The method of claim 1,
wherein the step of calculating the Euclidean distance for the traffic time series patterns using the extracted representative subsequence comprises:
applying the Fast-Shapelet technique to reduce the learning and testing time of the shapelet time series analysis technique, and generating a shapelet that maximizes the distance between the normal traffic flow and the abnormal traffic flow by means of the Euclidean distance technique.

5. The method of claim 4,
wherein, in the step of classifying the normal traffic flow and the abnormal traffic flow using the calculated Euclidean distance,
binary classification is performed on the basis of a predetermined threshold between the shapelet generated in the step of calculating the Euclidean distance and the abnormal traffic flow.

6. An apparatus for detecting a traffic flooding attack, the apparatus comprising:
a feature extraction unit that converts a normal traffic flow and an abnormal traffic flow into traffic time series patterns and extracts features of the traffic time series patterns in order to detect an abnormal intrusion attack;
a subsequence extraction unit that extracts a representative subsequence by learning the extracted features of the traffic time series patterns;
a Euclidean distance calculation unit that calculates a Euclidean distance for the traffic time series patterns using the extracted representative subsequence; and
a classification unit that classifies the normal traffic flow and the abnormal traffic flow using the calculated Euclidean distance,
wherein the feature extraction unit:
focuses on traffic flooding attacks by using a shapelet-based time series analysis technique, and uses a DoS attack traffic data set from among a plurality of attack types;
extracts, by using information gain, a predetermined number of top-ranked features from among the plurality of features of the data set of the traffic time series patterns, in order to address the loss of accuracy and the increase in learning time caused by overfitting; and
performs an entropy calculation as a preprocessing step for selecting the predetermined number of top-ranked features, the predetermined number of top-ranked features being extracted using the difference between the entropy of an upper node and the entropy of a lower node.

(Deleted)

9. The apparatus of claim 6,
wherein the Euclidean distance calculation unit
applies the Fast-Shapelet technique to reduce the learning and testing time of the shapelet time series analysis technique, and generates a shapelet that maximizes the distance between the normal traffic flow and the abnormal traffic flow by means of the Euclidean distance technique.

10. The apparatus of claim 9,
wherein the classification unit
performs binary classification on the basis of a predetermined threshold between the shapelet generated by the Euclidean distance calculation unit and the abnormal traffic flow.