KR20140064149A - Apparatus and method for traffic analysis

Info

Publication number: KR20140064149A
Application number: KR1020120131148A
Authority: KR
Inventors: 윤효진; 김종권
Original assignee: 서울대학교산학협력단
Priority date: 2012-11-19
Filing date: 2012-11-19
Publication date: 2014-05-28
Also published as: KR101437008B1

Abstract

The present invention provides an apparatus and a method for analyzing traffic. A traffic analysis apparatus according to one aspect of the present invention comprises: a selector that selects the initial N bytes from the payload of each of a plurality of flow packets; a generating unit that generates numerical flow information (flow statistics) for each flow using features of that flow; and a learning unit that performs machine learning on the selected initial N bytes and the generated numerical flow information and determines a traffic classification criterion from the learning result.

Description

Apparatus and Method for Traffic Analysis

The present invention relates to traffic analysis technology and, more particularly, to a traffic analysis apparatus and method capable of classifying Internet traffic by application program.

Recently, as the Internet has become widespread, Internet users rely not only on traditional Internet services such as e-mail and web services but also on P2P file sharing and multimedia streaming services. Internet traffic is growing rapidly as a result, so traffic monitoring and analysis technology that can accurately characterize large volumes of traffic and manage networks efficiently is becoming increasingly important.

Internet traffic analysis means collecting the traffic of the network under analysis, classifying it by application program, and measuring it quantitatively. Accordingly, a technique for classifying traffic by application program is essential to Internet traffic analysis.

Internet traffic analysis techniques range from the simplest port-based method to methods that use flow information, methods that analyze host behavior, and methods that combine these.

The present invention was conceived against the technical background described above, and its object is to provide a traffic analysis apparatus and method capable of classifying the application program signature of traffic using two kinds of information.

The objects of the present invention are not limited to the object mentioned above, and other objects not mentioned will be clearly understood by those skilled in the art from the following description.

A traffic analysis apparatus according to one aspect of the present invention comprises: a selector that selects the initial N bytes from the payload of each of a plurality of flow packets; a generating unit that generates flow statistics for each of the plurality of flows using the features of that flow; and a learning unit that performs machine learning on the selected initial N bytes and the generated flow statistics and, as a result, determines a traffic classification criterion.

A traffic analysis method performed by a traffic analysis apparatus according to another aspect of the present invention comprises: selecting the initial N bytes from the payload of each of a plurality of flow packets; generating flow statistics for each of the plurality of flows using the features of that flow; and performing machine learning on the selected initial N bytes and the generated flow statistics and, as a result, determining a traffic classification criterion.

According to the present invention, large volumes of Internet traffic can be classified by application program with relatively high accuracy.

FIG. 1 is a block diagram illustrating a traffic analysis apparatus according to an embodiment of the present invention.
FIGS. 2 and 3 are graphs of traffic classification experiment results according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a traffic analysis method according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in a variety of different forms. These embodiments are provided only so that this disclosure is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the invention pertains; the invention is defined only by the scope of the claims. The terminology used herein is for describing the embodiments and is not intended to limit the invention. In this specification, singular forms include plural forms unless the context specifically states otherwise. As used herein, "comprises" and/or "comprising" do not exclude the presence or addition of one or more components, steps, operations, and/or elements other than those stated.

Embodiments of the present invention will now be described in detail with reference to the accompanying drawings. FIG. 1 is a block diagram illustrating a traffic analysis apparatus according to an embodiment of the present invention.

As shown in FIG. 1, a traffic analysis apparatus 10 according to an embodiment of the present invention includes a selector 110, a generating unit 120, a learning unit 130, and a classifying unit 140.

The selector 110 receives the flow packets of Internet traffic and, for each of a plurality of flows, selects the initial N bytes (for example, 32 bytes) from the payload of one of that flow's packets.

Among the packets of each flow, the selector 110 may select the initial N bytes from the payload of the flow signal packet exchanged between the source and the destination at the start of session establishment. That is, the flow signal packet may be the first packet to appear among the packets of the flow.

Internet traffic consists of many flows, and each flow is the group of packets that share the same five-tuple: <source IP, source port, destination IP, destination port, protocol>.
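The five-tuple grouping and the selection of the initial N bytes can be sketched in Python as follows. This is a minimal illustration, not the apparatus itself: the `Packet` record, its field names, and the helper names are hypothetical, and the sketch assumes that packets are already parsed and that each direction of a connection counts as its own flow.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical parsed-packet record; field names are illustrative only.
    timestamp: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str
    payload: bytes


def group_into_flows(packets):
    """Group packets that share the same five-tuple, in arrival order."""
    flows = defaultdict(list)
    for pkt in sorted(packets, key=lambda p: p.timestamp):
        key = (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port, pkt.protocol)
        flows[key].append(pkt)
    return dict(flows)


def select_initial_bytes(flow_packets, n=32):
    """Return the first n payload bytes of the first packet of a flow."""
    return flow_packets[0].payload[:n]
```

Sorting by timestamp before grouping ensures that `flow_packets[0]` is the packet that appears first in the flow, which is the packet the text calls the flow signal packet.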

The generating unit 120 generates flow statistics for each flow using the features of that flow. Here, the key flow features include at least one of inter-packet arrival time, protocol, port, number of packets, average packet size, and the flags of each packet.
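A minimal Python sketch of how such flow statistics might be computed for one flow is given below. The function name, its parameters, and the choice of the mean as the summary of inter-packet arrival times are assumptions made for illustration; the source lists the features but does not specify how they are aggregated, and packet flags are omitted here for brevity.

```python
from statistics import mean


def flow_statistics(timestamps, sizes, protocol, port):
    """Compute a small set of numeric features for a single flow.

    timestamps: packet arrival times in seconds, in ascending order
    sizes:      packet sizes in bytes, in the same order
    protocol:   numeric protocol identifier (assumed already numeric)
    port:       port number used as a feature
    """
    # Gaps between consecutive packets; empty when the flow has one packet.
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    return {
        "protocol": protocol,
        "port": port,
        "packet_count": len(sizes),
        "mean_packet_size": mean(sizes),
        "mean_inter_arrival": mean(gaps) if gaps else 0.0,
    }
```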

The learning unit 130 performs machine learning using the selected initial N bytes and the generated flow statistics as training data. Machine learning here is a technique in which a machine is trained on a data set and the rules or patterns it obtains are used to extract useful information from data that arrives later. In the present invention, machine learning determines a classifier (or classification criterion) for classifying the application program signature of traffic.

The learning unit 130 may learn the initial N bytes and the flow statistics using at least one of a support vector machine (SVM) and a C4.5 decision tree.

Machine learning operates on features, but the payload of a flow's signal packet is not a feature; it is expressed in hexadecimal and cannot be learned directly. Therefore, when the learning unit 130 receives the initial N bytes, it first converts the hexadecimal initial N bytes into numbers and then performs machine learning.

To this end, before machine learning, the learning unit 130 checks whether any characters exist in the payload and, if so, maps (matches) the initial N bytes, in 8-byte units, to numbers from 0 to 255 corresponding to their content.

For example, if N is 16 and the payload content is 4EF0, the learning unit 130 maps 4E, the first 8 bytes of the 16 bytes, to the corresponding number 78, and maps F0, the next 8 bytes, to the corresponding number 240.
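The worked example can be reproduced with a short Python sketch. It assumes the payload arrives as a hexadecimal string and, following the example of 4E becoming 78 and F0 becoming 240, treats each two-digit hexadecimal value as one number in the range 0 to 255. The function name `payload_hex_to_numbers` is illustrative and does not come from the source.

```python
def payload_hex_to_numbers(hex_payload: str) -> list[int]:
    """Map each two-digit hexadecimal value of the payload to an integer 0..255."""
    # bytes.fromhex parses pairs of hex digits; iterating bytes yields integers.
    return list(bytes.fromhex(hex_payload))


print(payload_hex_to_numbers("4EF0"))  # [78, 240]
```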

Whether characters exist in the payload may be checked by the selector 110 or another component and reported to the learning unit 130, or may be checked directly by the learning unit 130.

If no characters exist in the payload, the learning unit 130 maps the initial N bytes, in 8-byte units, to 256 and then uses them for machine learning. Accordingly, in the present invention a flow can be trained on effectively even when the payload of its flow signal packet does not actually exist.
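The handling of an empty payload can be sketched in Python as follows. The value 256 lies outside the 0 to 255 range of real content, so it can act as a marker for "no data". Filling an entirely empty payload with 256 follows the text; padding a payload that is shorter than N with the same value is an assumption of this sketch, and the names `EMPTY_MARKER` and `payload_features` are hypothetical.

```python
EMPTY_MARKER = 256  # value the text assigns when the payload holds no characters


def payload_features(payload: bytes, n: int = 32) -> list[int]:
    """Return exactly n numeric features for the initial n payload bytes."""
    if not payload:
        # No payload at all: every position carries the marker value.
        return [EMPTY_MARKER] * n
    head = list(payload[:n])
    # Assumption (not stated in the source): pad short payloads with the marker
    # so that every flow yields a vector of the same length.
    return head + [EMPTY_MARKER] * (n - len(head))
```

A fixed-length output matters because typical learners expect every training sample to have the same number of features.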

In general, payload-based techniques open a packet and directly inspect the application protocol at the topmost layer (deep packet inspection), so they can classify data very accurately when the unique payload signature (feature) of each application program is known. However, conventional payload techniques have difficulty identifying an application program's signature accurately when packets are transmitted encrypted.

To mitigate this problem, the embodiment of the present invention obtains the traffic classifier (or classification criterion) by machine learning on the payload, so a flow can be trained on effectively even when an application program's packets are transmitted encrypted or the payload is actually empty, which improves the accuracy of traffic classification.

In addition, the embodiment of the present invention classifies application program signatures by machine learning on flow statistics, which are externally observable statistical information, so traffic can be classified effectively even when payload information cannot be used normally.

The performance of the packet analysis technique according to an embodiment of the present invention is examined below with reference to FIGS. 2 and 3. FIGS. 2 and 3 show experimental results of classifying traffic with classifiers obtained by machine learning on the packet and flow features of BitTorrent, CHAT, MAIL, NTP, SSH/SSL, SpamAssassin, and Web applications. The experiments were run separately with a support vector machine (SVM) and a C4.5 decision tree (C4.5).

FIG. 2 is a graph showing the overall accuracy of the traffic classifier determined in experiments that increased the payload range used in the hybrid packet analysis technique according to an embodiment of the present invention.

FIG. 2 shows that the traffic analysis apparatus 10 of the present invention achieves higher overall accuracy when it uses the initial 32 bytes of the payload than when it uses the initial 8 bytes or the initial 16 bytes, and that this effect is most pronounced with the support vector machine. The reason is that using 34 or more bytes of the flow payload during machine learning is enough to reflect the signature (feature) of each application program. The experiment of FIG. 2 also confirms that overall accuracy does not improve further when 64 or more bytes of the payload are used. Therefore, the embodiment of the present invention is described using the example in which the initial 32 bytes of the flow signal payload are used for learning.

FIG. 3 is a graph comparing the overall accuracy of the traffic analysis apparatus according to an embodiment of the present invention when it uses only the payload (Payload), only flow information (Flow statistics), and both payload and flow information together (Hybrid). FIG. 3 is an example that uses the initial 32 bytes of the payload.

FIG. 3 shows that the traffic analysis apparatus according to an embodiment of the present invention obtains better results when it uses both kinds of information (payload and flow information) than when it uses only one of them. This can be seen in the result graphs for both the C4.5 decision tree and the support vector machine algorithm.

As described above, the embodiment of the present invention uses a hybrid technique that combines payload information with flow statistics, and can therefore improve the classification accuracy for large volumes of Internet traffic.
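One plausible way to realize the three configurations compared in FIG. 3 is to build the training vector from the payload numbers, from the flow statistics, or from both concatenated. The Python sketch below assumes simple concatenation for the hybrid case; the source does not state how the two kinds of information are combined, and the function name and `mode` parameter are hypothetical.

```python
def build_feature_vector(payload_numbers, flow_stats, mode="hybrid"):
    """Assemble one training vector for a flow.

    payload_numbers: numbers derived from the initial N payload bytes
    flow_stats:      numeric flow statistics for the same flow
    mode:            "payload", "flow", or "hybrid" (both, concatenated)
    """
    if mode == "payload":
        return list(payload_numbers)
    if mode == "flow":
        return list(flow_stats)
    if mode == "hybrid":
        # Assumption: the hybrid vector is payload features followed by flow features.
        return list(payload_numbers) + list(flow_stats)
    raise ValueError(f"unknown mode: {mode!r}")
```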

A traffic analysis method according to an embodiment of the present invention is described below with reference to FIG. 4. FIG. 4 is a flowchart illustrating a traffic analysis method according to an embodiment of the present invention.

Referring to FIG. 4, the traffic analysis apparatus 10 selects the initial N bytes from the payload of each of a plurality of flow signal packets (S410). Here, the flow signal packet may be the first packet to appear among the packets of the flow, and N may be 32.

Next, the traffic analysis apparatus 10 generates flow statistics for each flow using the features of that flow (S420). Here, the key flow features include at least one of inter-packet arrival time, protocol, port, number of packets, average packet size, and the flags of each packet.

The traffic analysis apparatus 10 performs machine learning on the selected initial N bytes and the generated flow statistics and determines a traffic classification criterion (S430). Before machine learning, the traffic analysis apparatus 10 checks whether any characters exist in the payload and, if so, maps the initial N bytes, in 8-byte units, to numbers from 0 to 255 corresponding to their content. If no characters exist in the payload, the traffic analysis apparatus 10 maps the initial N bytes, in 8-byte units, to 256.

Thereafter, the traffic analysis apparatus 10 classifies the application program signature of incoming Internet traffic according to the determined classification criterion (S440).
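The split between determining a classification criterion (S430) and applying it to later traffic (S440) can be illustrated with a small Python sketch. To stay self-contained, it substitutes a 1-nearest-neighbour rule for the support vector machine and C4.5 decision tree named in the text, so it shows only the train-then-classify structure and not the learners used in the experiments. The function names and the sample vectors are hypothetical.

```python
def train(labeled_vectors):
    """S430 stand-in: 'learning' simply stores the labeled feature vectors.

    labeled_vectors: iterable of (feature_vector, application_label) pairs
    """
    return list(labeled_vectors)


def classify(model, vector):
    """S440: return the label of the stored vector closest to `vector`."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    _, label = min(model, key=lambda sample: squared_distance(sample[0], vector))
    return label


# Illustrative data only: four payload numbers per flow, made-up labels.
model = train([
    ([71, 69, 84, 32], "Web"),
    ([83, 83, 72, 45], "SSH/SSL"),
])
print(classify(model, [71, 69, 84, 33]))  # Web
```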

As described above, the embodiment of the present invention uses both kinds of information (the initial N bytes of the payload and the flow statistics) together, and can therefore classify more accurately than when only one of the two is used.

In addition, because the embodiment of the present invention obtains the traffic classifier (or classification criterion) by machine learning on the payload, a flow can be learned effectively even when an application program's packets are transmitted encrypted or the payload is actually empty, which improves the accuracy of traffic classification.

The configuration of the present invention has been described in detail above with reference to the accompanying drawings, but this is merely an example, and those of ordinary skill in the art to which the invention pertains can of course make various modifications and changes within the scope of the technical idea of the invention. Accordingly, the scope of protection of the present invention should not be limited to the embodiments described above but should be determined by the following claims.

Claims

A traffic analysis apparatus comprising:
a selector for selecting the initial N bytes from the payload of each of a plurality of flow packets;
a generating unit for generating flow statistics for each of the plurality of flows using a feature of each of the plurality of flows; and
a learning unit for performing machine learning on the selected initial N bytes and the generated flow statistics and, as a result, determining a classification criterion for the traffic.

The apparatus according to claim 1, further comprising:
a classifying unit for classifying the application program signature of subsequently input traffic according to the classification criterion.

The apparatus according to claim 1,
wherein the selector selects the initial N bytes from the payload of a flow packet transmitted and received between a source and a destination at the start of session establishment, among the packets constituting each flow.

The apparatus according to claim 1,
wherein the learning unit learns the initial N bytes and the flow statistics using at least one of a support vector machine and a C4.5 decision tree.

The apparatus according to claim 1,
wherein the learning unit, before the machine learning, matches the initial N bytes, in 8-byte units, to any one of the numbers 0 to 255 corresponding to their content.

The apparatus according to claim 1,
wherein the learning unit, if no character exists in the payload, represents the initial N bytes, in 8-byte units, as 256.

A traffic analysis method performed by a traffic analysis apparatus, the method comprising:
selecting the initial N bytes from the payload of each of a plurality of flow packets;
generating flow statistics for each of the plurality of flows using a feature of each of the plurality of flows; and
performing machine learning on the selected initial N bytes and the generated flow statistics and, as a result, determining a classification criterion for the traffic.

The method of claim 7, further comprising, prior to the machine learning step:
matching the initial N bytes, in 8-byte units, to any one of the numbers 0 to 255 corresponding to their content.

The method of claim 7, further comprising, prior to the machine learning step:
if no character exists in the payload, matching the initial N bytes, in 8-byte units, to 256.