KR20190091713A

KR20190091713A - Apparatus and method for classifying network traffic

Info

Publication number: KR20190091713A
Application number: KR1020180010614A
Authority: KR
Inventors: 임완선; 김귀훈; 김민석; 김주봉; 임현교; 한연희; 허주성; 홍용근
Original assignee: Electronics and Telecommunications Research Institute (한국전자통신연구원)
Priority date: 2018-01-29
Filing date: 2018-01-29
Publication date: 2019-08-07

Abstract

A method for classifying network traffic is disclosed. The method comprises: cleansing, at a processor, network traffic data collected via a network interface; converting, at the processor, the cleansed network traffic data into a data set consisting of image data; learning, at the processor, the data set to generate a recurrent neural network (RNN) model; and classifying, at the processor, the type of network traffic of a data flow transmitted and received by network equipment connected to the network, using the generated RNN model.

Description

Network traffic classification apparatus and method {APPARATUS AND METHOD FOR CLASSIFYING NETWORK TRAFFIC}

The present invention relates to a technique for classifying network traffic using machine learning.

As the use of Internet-based applications has grown rapidly, research has been conducted on operating limited network resources efficiently. This requires a method that can accurately classify large volumes of network traffic.

Recently, research has been conducted on classifying network traffic with deep learning, one of the machine learning techniques. Representative deep learning models include the multi-layer neural network (MNN), the convolutional neural network (CNN), and the recurrent neural network (RNN).

The RNN model, one of the deep learning techniques, is used mainly in fields related to natural language processing, such as speech recognition, language modeling, and translation. However, research on classifying network traffic with RNN models remains insufficient.

An object of the present invention is to provide a network traffic classification apparatus and method that classify network traffic using an RNN model.

To achieve the above object, a network traffic classification method according to one aspect of the present invention includes:
- refining, at a processor, network traffic data collected through a network interface;
- converting, at the processor, the refined network traffic data into a data set consisting of image data;
- learning, at the processor, the data set to generate a recurrent neural network (RNN) model; and
- classifying, at the processor, the type of network traffic of a data flow transmitted and received by network equipment connected to the network, using the generated RNN model.

A network traffic classification apparatus according to another aspect of the present invention includes:
- a network interface that connects to a network and collects network traffic data;
- a processor that filters the network traffic data to extract a plurality of payloads, converts the extracted payloads into a data set represented as image patterns, and learns the data set to generate a recurrent neural network (RNN) model; and
- a storage that stores the RNN model generated by the processor.

According to the present invention, network traffic is converted into a data set of simple, easily understood image data. The data set is learned to generate an RNN model, and the RNN model is then used to classify network traffic. This removes the need to express the relationship between flow structure and traffic type as complex classification rules, so network traffic policies and plans can be established easily.

In addition, the RNN model of the present invention is generated by learning a data set that represents network traffic in image form (or as image patterns). Network traffic can therefore be classified based on image-form (image-pattern) flow characteristics that conventional approaches could not capture.
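The conversion of payloads into image data can be illustrated with a minimal sketch. The source does not specify the image layout. This sketch therefore assumes a hypothetical layout in which each of the first N payloads of a flow becomes one row of a grayscale image. Each row is truncated or zero-padded to a fixed length, and its byte values are scaled to [0, 1]. The function name `payloads_to_image` and its parameters are illustrative only.

```python
def payloads_to_image(payloads: list[bytes], n_rows: int, row_len: int) -> list[list[float]]:
    """Stack packet payloads into an n_rows x row_len grayscale 'image' (assumed layout).

    Each row holds one payload's leading bytes, truncated or zero-padded to
    row_len, with every byte scaled from 0..255 to 0.0..1.0. If there are fewer
    than n_rows payloads, the missing rows are all zeros.
    """
    image = []
    for i in range(n_rows):
        raw = payloads[i] if i < len(payloads) else b""
        row = raw[:row_len].ljust(row_len, b"\x00")  # fixed-width row of bytes
        image.append([b / 255.0 for b in row])       # iterating bytes yields ints 0..255
    return image
```

Under this assumed layout, each row can be fed to the RNN as one time step. This matches the sequential input described for the RNN model.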

FIG. 1 is a block diagram showing an entire system including a network traffic classification apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram of a network traffic classification apparatus according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating flow information obtained by parsing a PCAP file according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of a data set generated by a data conversion module according to an embodiment of the present invention.
FIGS. 5 to 8 are flowcharts illustrating a network traffic classification method according to an embodiment of the present invention.

Hereinafter, various embodiments of the present invention are described with reference to the accompanying drawings. The embodiments may be modified in many ways and may take various forms. Specific embodiments are illustrated in the drawings and described in detail. However, this is not intended to limit the embodiments to those specific forms. They should be understood to include all modifications, equivalents, and substitutes within the spirit and scope of the various embodiments of the present invention. In the description of the drawings, similar reference numerals are used for similar elements.

Expressions such as "includes" or "may include" used in various embodiments of the present invention indicate the presence of the disclosed function, operation, or component. They do not exclude one or more additional functions, operations, or components. Further, in various embodiments of the present invention, terms such as "comprise" or "have" specify the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification. These terms should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In various embodiments of the present invention, the expression "or" includes any and all combinations of the words listed together. For example, "A or B" may include A, may include B, or may include both A and B.

Expressions such as "first," "second," "firstly," or "secondly" used in various embodiments of the present invention may modify various elements of those embodiments but do not limit those elements. For example, these expressions do not limit the order and/or importance of the corresponding elements; they may be used to distinguish one element from another. For example, a first user device and a second user device are both user devices but represent different user devices. Likewise, without departing from the scope of the various embodiments of the present invention, a first element may be referred to as a second element, and similarly a second element may be referred to as a first element.

When an element is referred to as being "coupled" or "connected" to another element, it may be directly coupled or connected to the other element, but another element may also be present between them. On the other hand, when an element is referred to as being "directly coupled" or "directly connected" to another element, no other element is present between them.

The terms used in the embodiments of the present invention are used only to describe specific embodiments and are not intended to limit the embodiments of the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meanings as commonly understood by a person of ordinary skill in the art to which the embodiments of the present invention belong.

Terms defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art. Unless explicitly defined in the various embodiments of the present invention, they are not to be interpreted in an idealized or overly formal sense.

Several terms are defined below before the embodiments of the present invention are described.

Flow

A flow is a set of packets observed within a time limit. The packets in the set share multiple attributes, such as the same source, the same destination, the same port, and the same IP protocol.

Total number of flows

The total number of flows is defined as the number of such packet sets. Each set consists of packets that share multiple attributes, such as the same source, the same destination, the same port, and the same IP protocol.
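The two definitions above can be sketched as follows. The sketch assumes each packet has already been decoded into a timestamp and five attributes: source address, destination address, source port, destination port, and IP protocol. The source does not say how the time limit is applied. This sketch assumes an idle timeout: a packet arriving more than `timeout` seconds after the previous packet with the same key starts a new flow. The names `Packet` and `group_flows` and the 60-second default are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Packet:
    ts: float          # capture time in seconds
    src: str           # source address
    dst: str           # destination address
    sport: int         # source port
    dport: int         # destination port
    proto: int         # IP protocol number, e.g. 6 for TCP, 17 for UDP
    payload: bytes = b""

def group_flows(packets, timeout=60.0):
    """Group packets into flows keyed by the 5-tuple (assumed idle-timeout rule).

    A packet joins the open flow with the same key if it arrives within
    `timeout` seconds of that flow's last packet; otherwise a new flow starts.
    """
    flows = []        # each flow is a list of packets
    open_flow = {}    # key -> index of the currently open flow in `flows`
    last_seen = {}    # key -> timestamp of the latest packet in that flow
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        key = (p.src, p.dst, p.sport, p.dport, p.proto)
        if key in open_flow and p.ts - last_seen[key] <= timeout:
            flows[open_flow[key]].append(p)
        else:
            open_flow[key] = len(flows)
            flows.append([p])
        last_seen[key] = p.ts
    return flows
```

The total number of flows is then `len(group_flows(packets))`. Because source and destination are kept in order in the key, the two directions of a conversation count as separate flows in this sketch.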

Recurrent neural network (RNN) model

In an RNN model, input data are fed sequentially into the hidden layer. When one input is entered, it passes through the learning cell. When the next input arrives, the model uses that input together with the previously learned information to produce a new learning result. When all learning steps of the hidden layer are completed, the final output is produced from the final input and the learning result of the immediately preceding step.
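The recurrence described above can be illustrated with a minimal forward pass of a vanilla (Elman) RNN in plain Python. The source does not specify the cell type, layer sizes, or training procedure. Here the weights are assumed to be already learned (for example, by backpropagation through time), and only inference is shown. The hidden state is updated as h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), and class scores are read from the final hidden state. The function names are illustrative.

```python
import math

def rnn_forward(inputs, W_xh, W_hh, b_h, W_hy, b_y):
    """Run a vanilla (Elman) RNN over a sequence and return class scores.

    inputs: list of time steps, each a list of floats (length d_in)
    W_xh: d_h x d_in, W_hh: d_h x d_h, b_h: d_h
    W_hy: d_out x d_h, b_y: d_out
    """
    d_h = len(b_h)
    h = [0.0] * d_h                          # initial hidden state
    for x in inputs:                         # inputs are consumed one step at a time
        h = [
            math.tanh(
                sum(W_xh[i][j] * x[j] for j in range(len(x)))   # current input
                + sum(W_hh[i][k] * h[k] for k in range(d_h))    # previous hidden state
                + b_h[i]
            )
            for i in range(d_h)
        ]
    # Only the final hidden state, which summarizes the whole sequence, is read out.
    return [sum(W_hy[o][i] * h[i] for i in range(d_h)) + b_y[o] for o in range(len(b_y))]

def softmax(scores):
    """Convert scores to probabilities (shifted by the max for numerical stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]
```

A traffic type would be chosen as the class with the highest softmax probability. The previous hidden state feeds into each update, so reordering the same inputs generally changes the output. This is the property that lets the model make use of packet order.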

Packet capture (PCAP) file

A PCAP file is a file in which captured network packets are stored in packet form.

Application service

An application service is any kind of application service that generates network traffic in network communication between terminals based on a well-known standardized protocol. Examples include BitTorrent, FTP, DNS, NTP, RDP (Remote Desktop Protocol), NETBIOS, SSH (Secure Shell), Web, Browser HTTP, and Browser RTMP, but application services are not limited thereto.

FIG. 1 is a block diagram showing an entire system including a network traffic classification apparatus according to an embodiment of the present invention.

Referring to FIG. 1, the entire system includes a network 100, a plurality of terminals 200 and 300, and a network traffic classification apparatus 400.

The network 100 includes a plurality of sub-networks 110 and a plurality of network equipment 130 and 150. Each sub-network 110 may be any kind of network. Examples include wired networks, such as a local area network (LAN), a wide area network (WAN), or a value added network (VAN), and wireless networks, such as a mobile radio communication network or a satellite network.

Each of the network equipment 130 and 150 controls the path of network traffic, that is, a series of packets, generated during communication between the terminals 200 and 300 in the network 100. The network equipment 130 controls the path of packets between the terminal 200 and the sub-network 110, and the network equipment 150 controls the path of packets between the sub-network 110 and the terminal 300. To this end, the network equipment may include any kind of communication device that controls or relays communication between the terminals 200 and 300, such as a hub, a bridge, a switch, or a router.

In addition, although not shown in the drawings, the network 100 as used herein may also include an access point (AP) connected directly or indirectly to the terminals 200 and 300. The access point may comprise, be implemented as, or be known as any of the following, or some other terminology:
- a Node B, an evolved Node B (eNB), a base station (BS), or a radio base station (RBS);
- a radio network controller, a base station controller (BSC), or a base transceiver station (BTS);
- a transceiver function (TF);
- a basic service set (BSS) or an extended service set (ESS).

Each of the terminals 200 and 300 may comprise, be implemented as, or be known as any of the following, or some other terminology:
- a subscriber station or a subscriber unit;
- a mobile station (MS), a remote station, or a remote terminal;
- a user terminal (UT), a user agent, a user device, user equipment, or a user station;
- a thing device.

In some implementations, the terminals 200 and 300 may be any of the following:
- a phone (e.g., a cellular phone or smartphone);
- a computer (e.g., a laptop) or a tablet;
- a portable communication device or a portable computing device (e.g., a personal data assistant);
- an entertainment device (e.g., a music or video device, or a satellite radio);
- a global positioning system (GPS) device;
- any other suitable device configured to communicate via a wireless or wired medium.

The traffic monitoring devices 430 and 450 monitor the network traffic of the network equipment 130 and 150. The traffic monitoring device 430 connects to the network equipment 130 and monitors its network traffic. The traffic monitoring device 450 connects to the network equipment 150 and monitors its network traffic. The traffic monitoring devices 430 and 450 are provided on the carrier side. In the example of FIG. 1, the network equipment 130 and 150 is shown physically separated from the traffic monitoring devices 430 and 450. However, the network equipment 130 and 150 may instead be included in the traffic monitoring devices 430 and 450.

The network traffic classification apparatus 400 classifies flow-based network traffic by applying an RNN model to the network traffic data provided by the network equipment 130 and 150 or the traffic monitoring devices 430 and 450. It is described in detail with reference to FIG. 2.

FIG. 2 is a block diagram of a network traffic classification apparatus according to an embodiment of the present invention.

Referring to FIG. 2, in terms of hardware architecture, the network traffic classification apparatus 500 may be configured to include a processor 510, a storage 520, a network interface 530, and a user interface 540.

The processor 510 is a hardware device that executes the software stored in the storage 520, including the programs that classify flow-based network traffic from network traffic data using an RNN model. The processor 510 may be a custom or general-purpose processor, a central processing unit (CPU), a semiconductor-based microprocessor, a macroprocessor, or, in general, any device for executing software instructions.

The storage 520 stores the software executed by the processor 510. It may include any one of, or a combination of, the following:
- volatile memory elements, such as DRAM, SRAM, and SDRAM;
- nonvolatile memory elements, such as ROM, EPROM, EEPROM, PROM, tape, CD-ROM, and disk.

The software stored in the storage 520 may include one or more separate programs for classifying network traffic. For example, it may include:
- a program for refining network traffic data;
- a program for converting the refined data into a data set;
- a program for learning the data set to generate an RNN model;
- the generated RNN model itself;
- a program for classifying, based on the generated RNN model, the data flows transmitted and received by the traffic monitoring devices 430 and 450;
- an operating system that provides an execution environment for these programs.

Each program may take the form of a source program, an executable program (e.g., object code), a script, or any other entity comprising a set of instructions. Each program may also be transformed by a compiler, an assembler, an interpreter, or the like. Each program may be written in an object-oriented programming language or in a procedural language with routines, subroutines, and/or functions.

The network interface 530 is a hardware device that can connect to any of the following: the network 100 or 110; the network equipment 130 and 150 within the network 100; or the traffic monitoring devices 430 and 450 connected to the network equipment 130 and 150. It collects network traffic data from the network equipment 130 and 150 or the traffic monitoring devices 430 and 450. It then converts the data into a form the processor 510 can process and passes it to the processor 510. The network interface 530 may be, for example, a network interface card (NIC) or a network adapter.

The user interface 540 interfaces the user 10 with the network traffic classification apparatus 500. It may include input devices, such as a keyboard and a mouse, and output devices, such as a printer, a scanner, a speaker, and a display.

The network traffic classification apparatus 500, configured to include the components 510, 520, 530, and 540 described above, may be referred to as an 'electronic device' or a 'computing device'. When the electronic device or computing device operates, the processor 510 communicates with the storage 520. It reads and executes the software for classifying network traffic and for generating the RNN model used in that classification.

As shown in FIG. 2, the processor 510 may be configured to include a plurality of modules 510A to 510E that perform the logical functions of the software stored in the storage 520. In terms of hardware architecture, each module may be implemented as a logic circuit. The modules 510A to 510E included in (or embedded in) the processor 510 are described in detail below.

Control module (510A)

The control module 510A controls and manages the operations of the peripheral modules 510B to 510F and of the components 520, 530, and 540 provided outside the processor 510. For example, the control module 510A may determine the operation order of the peripheral modules 510B to 510F and the components 520, 530, and 540. It may then instruct and control their operations according to the determined order. It may also process data generated by one of the peripheral modules 510B to 510F so that another module can process it.

Data collection module (510B)

The data collection module 510B collects network traffic data through the network interface 530, which is connected to the network 100 or 110. This data is the traffic transmitted and received by the network equipment 130 and 150 or the traffic monitoring devices 430 and 450. In terms of hardware architecture, the data collection module 510B may be implemented as a kind of 'buffer memory' that temporarily stores network traffic data in predetermined units and outputs it in those units. It may also be implemented as a hardware module that includes such a buffer memory.
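The 'buffer memory' behavior can be sketched in software as follows. The source does not give the unit size or say how a final partial unit is handled. This sketch assumes a fixed number of items per unit and a `flush` call that releases any remainder. `UnitBuffer` is a hypothetical name.

```python
class UnitBuffer:
    """Accumulate captured items and release them in fixed-size units (illustrative)."""

    def __init__(self, unit_size: int):
        if unit_size < 1:
            raise ValueError("unit_size must be >= 1")
        self.unit_size = unit_size
        self._pending = []

    def push(self, item):
        """Store one item; return a full unit (a list) when one is ready, else None."""
        self._pending.append(item)
        if len(self._pending) == self.unit_size:
            unit, self._pending = self._pending, []
            return unit
        return None

    def flush(self):
        """Return whatever is left (possibly a short unit) and empty the buffer."""
        unit, self._pending = self._pending, []
        return unit
```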

The network traffic data collected by the data collection module 510B may refer to a packet capture (PCAP) file, or to a file or data that contains one. A PCAP file is a file in which network traffic (packets) has been captured.

A PCAP file is structured into a 'header' field and a 'payload' field. The header may record the following:
- overall information describing the PCAP file;
- the length of the captured packet data;
- the source IP, source port, destination IP, destination port, and protocol;
- the time at which the network traffic (packet) was captured;
- a flow identifier that identifies the flow;
- information that identifies the type of application service.

The payload records the actual packet data.
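For reference, the classic libpcap file format consists of a 24-byte global header followed by a sequence of records. Each record has a 16-byte record header (timestamp seconds, timestamp fraction, captured length, original length) followed by the captured bytes. In that standard format, the record header carries only timing and length fields. Addresses, ports, and protocol are decoded from the captured packet bytes, and a flow identifier or application-service label would have to come from a separate labeling step. The minimal reader below handles both byte orders and both the microsecond and nanosecond variants. It does not handle the newer pcapng format, and `read_pcap` is an illustrative name.

```python
import struct

def read_pcap(data: bytes):
    """Yield (timestamp, original_length, packet_bytes) for each record of a classic pcap file."""
    # Detect byte order and timestamp resolution from the magic number.
    for endian in ("<", ">"):
        magic = struct.unpack(endian + "I", data[:4])[0]
        if magic in (0xA1B2C3D4, 0xA1B23C4D):
            break
    else:
        raise ValueError("not a classic pcap file")
    scale = 1e-6 if magic == 0xA1B2C3D4 else 1e-9   # microseconds vs nanoseconds

    # Skip the 24-byte global header (magic, version, thiszone, sigfigs, snaplen, linktype).
    offset = 24
    while offset + 16 <= len(data):
        ts_sec, ts_frac, incl_len, orig_len = struct.unpack(
            endian + "IIII", data[offset:offset + 16]
        )
        offset += 16
        yield ts_sec + ts_frac * scale, orig_len, data[offset:offset + incl_len]
        offset += incl_len   # incl_len bytes were captured; orig_len may be larger
```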

Data refining module (510C)

The data refining module 510C refines the network traffic data, that is, the PCAP file. To this end, when the network traffic classification apparatus 500 starts operating, the data refining module 510C follows an instruction from the control module 510A. It communicates with the storage 520, where the PCAP-refining program is stored, and reads and executes that program to refine the PCAP file.

Specifically, the data refining module 510C parses the PCAP file collected by the data collection module 510B. Based on the flow identifier and the application-service-type information recorded in the PCAP header, it obtains flow information classified by application service, as shown in FIG. 3. The flow information in FIG. 3 covers application services 20 consisting of BitTorrent, SKYPE, RDP, SSH, and Web. For each application service, the flow information may include:
- the total number of flows (21);
- the total number of packets constituting all flows (22);
- the average number of packets in a flow (23);
- the total packet size (24);
- the average size of all packets in a flow (25);
- the average packet size over all flows (26);
- the minimum packet size over all flows (27);
- the maximum packet size over all flows (28).
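A sketch of how the per-service flow information of FIG. 3 could be computed follows. It assumes flows have already been grouped and labeled by application service, and that each flow is represented by its list of packet sizes in bytes. The source does not define items (25) and (26) precisely. Here, (25) is interpreted as the average number of bytes per flow and (26) as the average size over all packets. The function name and dictionary keys are hypothetical.

```python
def flow_stats(flows_by_service):
    """Summarize flows per application service.

    flows_by_service maps a service name to a list of flows, where each flow
    is a list of packet sizes in bytes.
    """
    stats = {}
    for service, flows in flows_by_service.items():
        sizes = [s for flow in flows for s in flow]   # every packet size of this service
        n_flows, n_pkts, total = len(flows), len(sizes), sum(sizes)
        stats[service] = {
            "total_flows": n_flows,                                         # (21)
            "total_packets": n_pkts,                                        # (22)
            "avg_packets_per_flow": n_pkts / n_flows if n_flows else 0.0,   # (23)
            "total_packet_size": total,                                     # (24)
            "avg_flow_size": total / n_flows if n_flows else 0.0,           # (25), assumed: bytes per flow
            "avg_packet_size": total / n_pkts if n_pkts else 0.0,           # (26)
            "min_packet_size": min(sizes) if sizes else 0,                  # (27)
            "max_packet_size": max(sizes) if sizes else 0,                  # (28)
        }
    return stats
```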

The data refining module 510C then selects application services based on the total number of flows (21) in the flow information. For example, it may select the application services whose total number of flows is equal to or greater than a preset reference number of flows. This ensures that only application services that supply enough training data to generate the RNN model described below are selected. In the example of FIG. 3, with a preset reference number of 30,000 flows, the selected application services are rdp, ssh, and bittorrent.
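The selection rule is a simple threshold test. A minimal sketch, assuming the per-service flow counts are already available as a dictionary; `select_services` is a hypothetical name, and the counts below are purely illustrative, not the values of FIG. 3.

```python
def select_services(flow_counts, min_flows=30_000):
    """Return the services whose flow count is at least the preset reference number.

    `flow_counts` maps a service name to its total number of flows (item 21).
    Insertion order is preserved so the result is deterministic.
    """
    return [app for app, n in flow_counts.items() if n >= min_flows]


counts = {"rdp": 45_000, "ssh": 30_000, "web": 1_200}  # illustrative numbers only
print(select_services(counts))  # prints ['rdp', 'ssh']
```

Using `>=` keeps a service that sits exactly at the threshold, matching "equal to or greater than".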

The data refining module 510C sorts all packets in the flows of the selected application services in the order in which they occurred, extracts the top N packets (N being a natural number of 2 or more), that is, the N earliest packets beginning with the first, and then extracts the payload of each extracted packet. This filters and condenses the PCAP file (network traffic data), completing the data refining process. The N earliest packets within a flow are chosen because the packets generated early in a network communication are the most likely to reveal the structural features of the flow. The number N can be set to different values depending on the design, taking the system load into account.
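Selecting the first N packets of a flow and keeping only their payloads can be sketched as follows. The `Packet` type and `first_n_payloads` helper are hypothetical; the sketch assumes each packet carries a capture timestamp and an already-extracted payload, and it simply returns fewer than N payloads when a flow is shorter than N packets (the source does not say how short flows are handled).

```python
from dataclasses import dataclass


@dataclass
class Packet:
    """Hypothetical packet record: capture time and application payload."""
    timestamp: float
    payload: bytes


def first_n_payloads(flow_packets, n):
    """Sort a flow's packets by time and return the payloads of the first N."""
    if n < 2:
        raise ValueError("N must be a natural number of 2 or more")
    ordered = sorted(flow_packets, key=lambda p: p.timestamp)  # chronological order
    return [p.payload for p in ordered[:n]]                    # earliest N only
```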

Data Conversion Module (510D)

The data conversion module 510D converts the network traffic data refined by the data refining module 510C, that is, the payloads produced by filtering and condensing the network traffic data (or PCAP file), into a data set consisting of image data, based on a data conversion rule that defines the mapping between payloads and gray scale. The image data may be, for example, gray data. The data set may be represented as a matrix whose elements are arranged in rows and columns. The data conversion rule may be stored in the storage 520 and loaded from the storage 520 into the data conversion module 510D in response to a call from the data conversion module 510D or a control command from the control module 510A.

When a payload consists of M bits, the data conversion module 510D segments the M bits into units of a given bit width (for example, 1, 2, 4, or 8 bits) to create a plurality of bit groups, and converts these bit groups according to the data conversion rule to generate a plurality of gray data values.
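The source does not spell out the data conversion rule. One plausible rule, assumed here, treats each k-bit group's unsigned value directly as a gray level (so 4-bit groups give levels 0 to 15, which would be consistent with the small values shown in FIG. 4), with optional rescaling to the common 0 to 255 range. `payload_to_gray` is a hypothetical name.

```python
def payload_to_gray(payload: bytes, bits: int = 4, scale_to_8bit: bool = False):
    """Split a payload into `bits`-wide groups and map each group to a gray level.

    Assumed rule: the group's unsigned integer value is the gray level; with
    `scale_to_8bit` it is stretched linearly onto 0..255.
    """
    if bits not in (1, 2, 4, 8):
        raise ValueError("bits must be 1, 2, 4 or 8")
    levels = (1 << bits) - 1          # largest value one group can hold (also the mask)
    gray = []
    for byte in payload:
        # walk each byte from its most significant group to its least significant
        for shift in range(8 - bits, -1, -bits):
            value = (byte >> shift) & levels
            gray.append(value * 255 // levels if scale_to_8bit else value)
    return gray


print(payload_to_gray(b"\xab\x01"))  # prints [10, 11, 0, 1]
```

An M-bit payload yields M / bits gray values, so smaller bit widths give longer, lower-contrast sequences.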

By building a matrix-form data set whose elements are these gray data values, the data conversion module 510D completes the conversion of the network traffic data refined by the data refining module 510C into a data set of image data. FIG. 4 shows an example data set generated by the data conversion module 510D: a 10x8 matrix whose elements (0, 1, 2, 3, 4, 6, 11, and 13) are gray data values written in decimal. The decimal notation in FIG. 4 is used only for ease of explanation; the actual elements are not stored as decimal numbers.
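Arranging the gray values into a fixed-size matrix such as the 10x8 example of FIG. 4 might look like the sketch below. Filling row by row, truncating surplus values, and zero-padding short payloads are all assumptions; the source only states that the gray data become the matrix elements. `to_matrix` is a hypothetical name.

```python
def to_matrix(gray_values, rows=10, cols=8, pad=0):
    """Place gray values row by row into a rows x cols matrix (list of lists).

    Values beyond rows*cols are dropped; missing cells are filled with `pad`.
    """
    if rows <= 0 or cols <= 0:
        raise ValueError("rows and cols must be positive")
    size = rows * cols
    flat = list(gray_values)[:size]          # truncate to the matrix capacity
    flat += [pad] * (size - len(flat))       # pad short inputs
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


print(to_matrix([0, 1, 2, 3, 4], rows=2, cols=3))  # prints [[0, 1, 2], [3, 4, 0]]
```

A fixed shape matters because the model in the next step expects every sample to have the same number of rows (time steps) and columns (features).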

Learning Module (510E)

The learning module 510E trains on the data set of image data by machine learning to generate an RNN model that predicts the relationship between the data set and the type of network traffic, and either stores the model in the storage 520 or uses it to update a previous RNN model stored in the storage 520.
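The source does not specify the RNN architecture or training procedure. Purely as an assumption for illustration, the sketch below shows a single-layer Elman RNN forward pass that reads the matrix one row per time step and outputs class probabilities through a softmax. The weights are random and training (for example, backpropagation through time in a deep-learning framework) is not shown; `TinyRNNClassifier` is a hypothetical name.

```python
import math
import random


class TinyRNNClassifier:
    """Single-layer Elman RNN classifier (illustrative sketch, untrained)."""

    def __init__(self, input_size, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # small random weights; a real model would learn these
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W_xh = init(hidden_size, input_size)
        self.W_hh = init(hidden_size, hidden_size)
        self.b_h = [0.0] * hidden_size
        self.W_hy = init(num_classes, hidden_size)
        self.b_y = [0.0] * num_classes

    @staticmethod
    def _matvec(W, v):
        return [sum(w * x for w, x in zip(row, v)) for row in W]

    def predict_proba(self, matrix, max_value=15):
        """Run the RNN over the matrix rows and return softmax class probabilities.

        `max_value` is the largest gray level (15 assumes 4-bit groups).
        """
        h = [0.0] * len(self.b_h)
        for row in matrix:                        # one time step per matrix row
            x = [v / max_value for v in row]      # scale gray levels to [0, 1]
            pre = [a + b + c for a, b, c in zip(self._matvec(self.W_xh, x),
                                                self._matvec(self.W_hh, h),
                                                self.b_h)]
            h = [math.tanh(z) for z in pre]       # new hidden state
        logits = [z + b for z, b in zip(self._matvec(self.W_hy, h), self.b_y)]
        top = max(logits)                         # subtract max for numerical stability
        exps = [math.exp(z - top) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]
```

Treating matrix rows as time steps is what lets a recurrent model consume the image-like data set; a convolutional model would instead treat the whole matrix as a 2-D image.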

Classification Module (510F)

The classification module 510F retrieves the RNN model stored in the storage 520 by the learning module 510E, forms the input data of that model from the data flows transmitted and received by the network equipment 130 and 150 or the traffic monitoring devices 430 and 450, and classifies the type of network traffic of each data flow.
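At classification time, the model output can be turned into a traffic type by taking the most probable class. A minimal sketch, assuming the trained model is exposed as a hypothetical `predict_proba(matrix)` callable that returns one probability per label, in the same order as `labels`:

```python
def classify_flow(matrix, predict_proba, labels):
    """Return (traffic_type, probability) for the most probable class."""
    probs = predict_proba(matrix)
    if len(probs) != len(labels):
        raise ValueError("model output size does not match the number of labels")
    best = max(range(len(probs)), key=probs.__getitem__)  # index of the argmax
    return labels[best], probs[best]


# Usage with a stand-in model that always returns fixed probabilities.
def fake_model(matrix):
    return [0.1, 0.7, 0.2]


print(classify_flow([[0] * 8] * 10, fake_model, ["rdp", "ssh", "bittorrent"]))
# prints ('ssh', 0.7)
```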

Based on the classification results of the classification module 510F, the control module 510A may establish, modify, and supplement traffic policies, including management of specific traffic refinement, traffic congestion detection, and traffic congestion control. These policies may instead be established, modified, and supplemented by the classification module 510F.

When the RNN model generated by the learning module 510E is provided to a server on the communication service provider's side (not shown), the provider may use that model to modify and supplement its predetermined traffic policies.

As described above, the network traffic classification apparatus 500 according to an embodiment of the present invention converts network traffic into a data set of simple, easy-to-understand image data, trains on that data set to generate an RNN model, and uses the model to classify network traffic. Because the relationship between flow structure and network traffic no longer has to be encoded as complex rules in order to classify traffic, network traffic policies and plans can be established easily.

In addition, because the RNN model generated by the network traffic classification apparatus 500 is learned from a data set that represents network traffic as images (or image patterns), network traffic can be classified based on image-form (image-pattern) flow features that could not previously be identified.

FIG. 5 is a flowchart illustrating a network traffic classification method according to an embodiment of the present invention. In describing the steps below, content that overlaps with the description of FIGS. 1 to 4 is abbreviated or omitted.

Referring to FIG. 5, first, in step S510, the network interface 530 connected to the network 100 collects network traffic data. The network traffic data may be a PCAP file or data containing one.

Next, in step S520, the processor 510 refines the collected network traffic data.

Next, in step S530, the processor 510 converts the refined network traffic data into a data set consisting of image data. For example, the refined network traffic data may be converted into the data set according to a data conversion rule that defines the mapping between gray scale and the payloads extracted in the refining step.

Next, in step S540, the processor 510 trains on the data set to generate a Recurrent Neural Network (RNN) model.

Next, in step S550, the processor 510 uses the generated RNN model to classify the type of network traffic of the data flows transmitted and received by the network equipment 130 and 150 connected to the network 110, or by the carrier-side traffic monitoring devices 430 and 450 connected to the network equipment 130 and 150.
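Steps S510 to S550 form a straight pipeline in which each stage consumes the previous stage's output. The skeleton below only illustrates that data flow: every stage is a caller-supplied callable with a hypothetical name, and it assumes the live flows have already been prepared as model inputs in the same way as the training data.

```python
def run_pipeline(collect, refine, convert, train, classify, live_flows):
    """Wire steps S510-S550 together; each argument is a caller-supplied callable.

    `live_flows` maps a flow identifier to that flow's model input, assumed to be
    prepared with the same refining and conversion as the training data.
    """
    raw = collect()               # S510: collect traffic via the network interface
    refined = refine(raw)         # S520: select services, keep first-N payloads
    dataset = convert(refined)    # S530: payloads -> gray-scale matrix data set
    model = train(dataset)        # S540: learn the RNN model
    # S550: classify each live data flow with the trained model
    return {flow_id: classify(model, x) for flow_id, x in live_flows.items()}
```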

FIG. 6 is a detailed flowchart of step S520 shown in FIG. 5.

Referring to FIG. 6, to refine the network traffic data in step S520, first, in step S521, the collected network traffic data is parsed to obtain flow information for each application service. The flow information includes the number of flows for each application service; an example is shown in FIG. 3. The number of flows per application service can be obtained, for example, by parsing the network traffic data (that is, the PCAP file), extracting the flow identifiers recorded in the PCAP header together with the information identifying the application service type, and counting the flow identifiers for each application service.

Next, in step S523, the application services whose number of flows is equal to or greater than the preset reference number are selected.

Next, in step S525, a plurality of packets included in the flows of the selected application services are extracted.

Next, in step S527, a plurality of payloads are extracted from these packets, and the extracted payloads are output as the refined network traffic data.

FIG. 7 is a detailed flowchart of step S525 shown in FIG. 6.

Referring to FIG. 7, to extract the plurality of packets in step S525, first, in step S525-1, the packets included in the flows of the selected application services are sorted in chronological order.

Next, in step S525-3, the top N packets among the chronologically sorted packets, which are the most likely to reveal the structural features of the flow, are extracted as the plurality of packets.

FIG. 8 is a detailed flowchart of step S530 shown in FIG. 5.

Referring to FIG. 8, to convert the refined network traffic data into a data set of image data in step S530, first, in step S531, the bits constituting each payload extracted in the refining step are segmented into units of a given bit width to generate a plurality of bit groups.

Next, in step S533, each of the bit groups is converted into gray data based on the data conversion rule.

Next, in step S535, the data set is generated in matrix form with the gray data values as its elements.

Although the present invention has been described above with reference to embodiments, these are merely examples and do not limit the invention. Those of ordinary skill in the art to which the invention pertains will appreciate that various modifications and applications not illustrated above are possible without departing from the essential characteristics of the invention. For example, each component specifically shown in the embodiments may be modified in implementation. Differences relating to such modifications and applications should be construed as falling within the scope of the invention defined by the appended claims.

Claims

A network traffic classification method in a network traffic classification apparatus including a network interface connected to a network and a processor connected to the network interface, the method comprising:
refining, at the processor, network traffic data collected via the network interface;
converting, at the processor, the refined network traffic data into a data set consisting of image data;
learning, at the processor, the data set to generate a Recurrent Neural Network (RNN) model; and
classifying, at the processor, a type of network traffic of a data flow transmitted and received by network equipment connected to the network, using the generated RNN model.

The method of claim 1, wherein refining the network traffic data comprises:
parsing the network traffic data to obtain flow information for each application service, and selecting, based on the obtained flow information, an application service whose number of flows is equal to or greater than a reference number;
extracting a plurality of packets included in the flows of the selected application service;
extracting a plurality of payloads from the plurality of packets; and
outputting the extracted plurality of payloads as the refined network traffic data.

The method of claim 2, wherein extracting the plurality of packets comprises:
arranging the packets included in the flows of the selected application service in chronological order; and
extracting, as the plurality of packets, the top N packets among the chronologically arranged packets, which have a high probability of revealing a structural feature of the flow.

The method of claim 1, wherein the network traffic data is a packet capture (PCAP) file.

The method of claim 1, wherein converting into the data set comprises converting the refined network traffic data into the data set based on a data conversion rule that defines a mapping relationship between gray scale and the payloads extracted in refining the network traffic data.

The method of claim 1, wherein converting into the data set comprises:
generating a plurality of bit groups by segmenting the bits constituting the payloads extracted in refining the network traffic data into units of a specific bit width;
generating a plurality of gray data by converting the plurality of bit groups according to a data conversion rule that defines a mapping relationship between gray scale and payload; and
generating the data set in matrix form with the plurality of gray data as elements.

The method of claim 1, wherein generating the RNN model comprises learning the data set by machine learning to generate the RNN model that predicts a relationship between the data set and a type of network traffic.

A network traffic classification apparatus comprising:
a network interface connected to a network to collect network traffic data;
a processor connected to the network interface, which extracts a plurality of payloads by filtering the network traffic data, converts the extracted payloads into a data set represented by an image pattern, and learns the data set to generate a Recurrent Neural Network (RNN) model; and
a storage that stores the RNN model generated by the processor.

The apparatus of claim 8, wherein the processor classifies the type of network traffic of a data flow transmitted and received by network equipment connected to the network, using the RNN model stored in the storage.

The apparatus of claim 8, wherein the processor parses the network traffic data to obtain flow information for each application service, selects, based on the obtained flow information, an application service whose number of flows is equal to or greater than a reference number, extracts a plurality of packets included in the flows of the selected application service, and extracts the plurality of payloads from the plurality of packets.

The apparatus of claim 10, wherein the processor arranges the plurality of packets included in the flows of the selected application service in chronological order, and extracts, as the plurality of packets, the top N packets among the chronologically arranged packets, which have a high probability of revealing a structural feature of the flow.

The apparatus of claim 8, wherein the network traffic data is a packet capture (PCAP) file.

The apparatus of claim 8, wherein the processor converts the extracted plurality of payloads into the data set represented by an image pattern based on a data conversion rule that defines a mapping relationship between payloads and a gray scale representing the image pattern.

The apparatus of claim 13, wherein the storage stores the data conversion rule.

The apparatus of claim 8, wherein the processor generates a plurality of bit groups by segmenting the bits constituting each of the plurality of payloads into units of a specific bit width, generates a plurality of gray data by converting the plurality of bit groups according to a data conversion rule that defines a mapping relationship between gray scale and payload, and generates the data set in matrix form with the plurality of gray data as elements.

The apparatus of claim 8, wherein the processor learns the data set by machine learning to generate the RNN model that predicts a relationship between the data set and a type of network traffic.