KR102014234B1

KR102014234B1 - Method and Apparatus for automatic analysis for Wireless protocol

Info

Publication number: KR102014234B1
Application number: KR1020190012995A
Authority: KR
Inventors: 심신우; 김태규; 전성구; 윤지원; 방우림; 전영배
Original assignee: 엘아이지넥스원 주식회사; 고려대학교 산학협력단
Priority date: 2019-01-31
Filing date: 2019-01-31
Publication date: 2019-08-26

Abstract

Disclosed are a method for automatically analyzing a wireless protocol and an apparatus therefor. According to an embodiment of the present invention, a wireless protocol analyzer may comprise: a preprocessing unit that extracts bit stream data from a previously collected wireless signal and classifies the sessions in which messages were received based on the bit stream data; a message distance calculation unit that measures the distances between the classified messages and generates a linkage matrix for clustering; and a clustering processing unit that clusters the messages based on the linkage matrix.

Description

Wireless protocol automatic analysis method and device therefor {Method and Apparatus for automatic analysis for Wireless protocol}

The present invention relates to a method for automatically analyzing a wireless protocol and an apparatus therefor.

The content described in this section merely provides background information on embodiments of the present invention and does not constitute prior art.

Conventionally, it is determined whether collected packets conform to a previously defined protocol. If a packet follows a new type of protocol, the user creates a definition file for that protocol, and the protocol analysis result is then output. Such a protocol analysis technique is described in Korean Registered Patent No. 10-1466017.

However, with conventional protocol analysis methods, analysis is difficult when the protocol has not been previously defined or when the user does not know its structure.

Therefore, when packets whose protocol structure cannot be identified must be analyzed, conventional techniques cannot be applied.

In addition, analyzing a wireless protocol requires converting an analog signal into a digital bit stream at the wireless signal collection stage. Because conventional protocol analysis methods focus on wired protocols, they describe no technique for collecting and analyzing wireless signals. Moreover, wireless protocols use bit-level rather than byte-level messages in order to minimize the memory used by a data sequence. Analyzing a wireless protocol therefore requires a technique that analyzes messages at the bit level rather than the byte level.

The main object of the present invention is to provide a method for automatically analyzing a wireless protocol, and an apparatus therefor, that extract bit stream data from a wireless signal and perform clustering by measuring the distances between messages based on the bit stream data.

According to one aspect of the present invention, a wireless protocol analyzer for achieving the above object may include: a preprocessing unit that extracts bit stream data from a previously collected wireless signal and classifies the sessions in which messages were received based on the bit stream data; a message distance calculation unit that measures the distances between the classified messages and generates a linkage matrix for clustering; and a clustering processing unit that clusters the messages based on the linkage matrix.

According to another aspect of the present invention, a wireless protocol analysis method for achieving the above object may include: a preprocessing step of extracting bit stream data from a previously collected wireless signal and classifying the sessions in which messages were received based on the bit stream data; a message distance calculation step of measuring the distances between the classified messages and generating a linkage matrix for clustering; and a clustering processing step of clustering the messages based on the linkage matrix.

As described above, the present invention makes it possible to analyze wireless transmission/reception protocols that use bit streams.

In addition, the present invention enables efficient analysis of wireless protocols whose structure is unknown.

In addition, by automating the heuristic part of protocol analysis in which a user would otherwise have to intervene and change variables, the present invention enables messages to be analyzed efficiently.

In addition, by reducing and automating the parts in which the user intervenes, the present invention reduces user-induced mistakes.

In addition, when optimal variable values are sought, the present invention finds the variables and analyzes the protocol more quickly through automation than a user could by changing the variables manually and testing each change.

FIG. 1 is a block diagram schematically showing an automatic protocol analysis system according to an embodiment of the present invention.
FIG. 2 is a block diagram schematically showing an automatic protocol analyzer according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating an automatic protocol analysis method according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the operation of the preprocessing unit of the automatic protocol analyzer according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating the operation of the message distance calculation unit of the automatic protocol analyzer according to an embodiment of the present invention.
FIGS. 6A and 6B are diagrams illustrating the operation of the message distance calculation unit of the automatic protocol analyzer according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating the operation of the message distance calculation unit of the automatic protocol analyzer according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating the operation of the clustering processing unit of the automatic protocol analyzer according to an embodiment of the present invention.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. In describing the present invention, detailed descriptions of related well-known configurations or functions are omitted where they could obscure the gist of the invention. Although preferred embodiments are described below, the technical idea of the present invention is not limited to them and may be modified and practiced in various ways by those skilled in the art. The method for automatically analyzing a wireless protocol proposed in the present invention, and the apparatus therefor, are described in detail below with reference to the drawings.

The present invention relates to a technique for automatically analyzing the structure of a wireless transmission/reception protocol. A wireless transmission/reception protocol uses bit-level rather than byte-level messages in order to minimize the memory used by a data sequence. Existing wired protocol analyzers require work to analyze data in byte units. To resolve this problem in a wireless protocol analyzer, the present invention classifies messages using only bit-level computation and minimizes manual work.

To analyze a protocol in a wireless environment, the present invention analyzes messages bit by bit and, as a way of automating the heuristic part in which a person would otherwise have to intervene and change variables, takes the message collection time interval into account. Using these techniques, the present invention aims to automatically analyze an undisclosed protocol bit by bit and cluster similar messages together.

The present invention is preferably applied in the military field to automatically analyze wireless protocols, but is not limited thereto. It may also be included in a Wireless Intrusion Prevention System (WIPS) and used as software that distinguishes normal wireless packets from malicious ones and analyzes the malicious packets.

FIG. 1 is a block diagram schematically showing an automatic protocol analysis system according to an embodiment of the present invention.

The automatic protocol analysis system 100 according to this embodiment includes a wireless signal collection device 110 and an automatic protocol analyzer 120. The system 100 of FIG. 1 represents one embodiment; not all blocks shown in FIG. 1 are essential, and in other embodiments some blocks of the system 100 may be added, changed, or removed.

The automatic protocol analysis system 100 is divided into the wireless signal collection device 110, which collects wireless signals, and the automatic protocol analyzer 120, which analyzes the protocol using the bit stream extracted from the collected signals. The system 100 may perform automatic protocol analysis on IEEE 802.11, a representative protocol used in wireless environments, but is not limited thereto.

The wireless signal collection device 110 includes an antenna 112, a demodulator 114, and an analog-to-digital converter 116. The antenna 112 is attached to one side of the wireless signal collection device 110 and receives wireless signals. The antenna 112 may be a military system antenna (VHF, UHF, etc.), but is not limited thereto and may be implemented in various forms as long as it can receive wireless signals.

The demodulator 114 collects the wireless signals in a preset band from among the received wireless signals. The demodulator 114 preferably collects the signals in the preset band using a Software Defined Radio (SDR) such as GNURadio, but is not limited thereto.

The analog-to-digital converter 116 converts the analog wireless signal into a digital bit stream.

The wireless signal collection device 110 according to this embodiment may be implemented using equipment such as an SDR (Software Defined Radio) together with software that modulates and demodulates wireless signals, such as GNURadio.

First, the analog signal collected at the antenna of the SDR equipment is demodulated with GNURadio and converted into a bit stream. Any communication standard or technology may be used for demodulation; the next stage can proceed as long as the wireless signal collection part passes the bit information on to the next block.

The wireless signal collection device 110 may collect wireless signals using a hardware network adapter that includes a communication modulation/demodulation module, or using an RF transceiver such as a USRP (Universal Software Radio Peripheral) whose data can be processed in software, but is not limited thereto.

GNURadio is open-source software available in Unix development environments. When GNURadio is used together with a USRP, the desired modules can be built in software. The code that demodulates the analog signal received at the antenna of the USRP equipment was developed using GNURadio.

The automatic protocol analyzer 120 includes a preprocessing unit 210, a message distance calculation unit 220, and a clustering processing unit 230.

The preprocessing unit 210 extracts the additional data needed for message clustering and classifies the extracted data in packet units.

The message distance calculation unit 220 measures the distances between messages. It may do so by computing the similarity between messages with the Needleman-Wunsch algorithm, the Waterman algorithm, or a similar algorithm.

The clustering processing unit 230 clusters messages with high similarity based on the distances between them. Using the calculated distances, the clustering processing unit 230 applies a hierarchical clustering algorithm such as UPGMA (Unweighted Pair Group Method with Arithmetic Mean) to group together the messages with the highest similarity.
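As a rough illustration of the clustering step, the sketch below implements average-linkage (UPGMA) agglomerative clustering over a precomputed distance matrix in plain Python. The function name `upgma_clusters`, the stopping threshold `max_dist`, and the toy distances are assumptions made for this example and are not taken from the patent.

```python
def upgma_clusters(dist, max_dist):
    """Average-linkage (UPGMA) clustering over a symmetric distance matrix.

    dist     -- list of lists; dist[i][j] is the distance between messages i and j
    max_dist -- hypothetical cut-off: merging stops once the closest pair of
                clusters is farther apart than this value
    Returns a sorted list of clusters, each a sorted list of message indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def avg_distance(a, b):
        # UPGMA: unweighted mean of all pairwise distances between two clusters
        total = sum(dist[i][j] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = avg_distance(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > max_dist:
            break
        merged = sorted(clusters[x] + clusters[y])
        # delete y first (y > x) so that index x stays valid
        del clusters[y]
        del clusters[x]
        clusters.append(merged)
    return sorted(clusters)


# Toy distance matrix for four messages: 0 and 1 are close, 2 and 3 are close.
toy = [
    [0.0, 1.0, 8.0, 9.0],
    [1.0, 0.0, 7.0, 8.0],
    [8.0, 7.0, 0.0, 2.0],
    [9.0, 8.0, 2.0, 0.0],
]
groups = upgma_clusters(toy, max_dist=3.0)
print(groups)
```

A production implementation would more likely record every merge in a linkage matrix and cut the resulting tree afterwards; the threshold-based early stop is used here only to keep the example short.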

The automatic protocol analyzer 120 analyzes physical-layer data packets. It may analyze the protocol using an APRE (Automatic Protocol Reverse Engineering) technique, but is not limited thereto.

Whereas the prior art has mostly dealt with communication in the upper layers of the OSI model, such as HTTP and FTP, the automatic protocol analyzer 120 of the present invention deals mainly with physical-layer data packets. The automatic protocol analyzer 120 is described in detail with reference to FIG. 2.

FIG. 2 is a block diagram schematically showing an automatic protocol analyzer according to an embodiment of the present invention.

The automatic protocol analyzer 120 according to this embodiment includes a preprocessing unit 210, a message distance calculation unit 220, and a clustering processing unit 230. The analyzer 120 of FIG. 2 represents one embodiment; not all blocks shown in FIG. 2 are essential, and in other embodiments some blocks of the analyzer 120 may be added, changed, or removed.

The preprocessing unit 210 refines the data obtained from the wireless signal and extracts the additional data needed for message clustering. The preprocessing unit 210 according to this embodiment includes a bit stream data extraction unit 212, a time information extraction unit 214, and a data session classification unit 216.

The bit stream data extraction unit 212 converts the data obtained from the collected wireless signal into bit stream data so that it can be analyzed bit by bit. The bit stream data extraction unit 212 may also refine or extract preset important data from the bit stream data.
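To make the bit-level representation concrete, here is a small sketch that expands a captured message, given as a hexadecimal string, into a list of bits. The helper name `hex_to_bits` is hypothetical; the sample message is one of the two sequences used in the FIG. 6A example.

```python
def hex_to_bits(hex_message):
    """Expand a hex string into a list of 0/1 integers, most significant bit first."""
    bits = []
    for ch in hex_message:
        nibble = int(ch, 16)
        # each hexadecimal digit carries exactly four bits
        bits.extend((nibble >> shift) & 1 for shift in (3, 2, 1, 0))
    return bits


bits = hex_to_bits("708300D200")
print(len(bits))   # 10 hex digits expand to 40 bits
print(bits[:8])    # bits of the first byte, 0x70
```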

The time information extraction unit 214 extracts and stores the time at which the bit stream data was collected. In other words, it extracts and stores the arrival time of each bit-level message.

The data session classification unit 216 separates sessions based on the extracted time information. Because wireless device communication shows similar timing within each session, the session in which a message was received can be identified automatically from the extracted time information.
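One simple way to picture time-based session separation is to start a new session whenever the gap between consecutive arrival times exceeds a threshold. The sketch below does this. The function name `split_sessions`, the `max_gap` threshold, and the timestamps are illustrative assumptions; the patent text does not specify a fixed cut-off, and it also describes time-interval weighting in the distance calculation as a means of separating sessions.

```python
def split_sessions(arrival_times, max_gap):
    """Group message indices into sessions separated by gaps larger than max_gap.

    arrival_times is assumed to be sorted in ascending order.
    """
    sessions = []
    current = []
    previous = None
    for index, t in enumerate(arrival_times):
        if previous is not None and t - previous > max_gap:
            # the pause is long enough to close the current session
            sessions.append(current)
            current = []
        current.append(index)
        previous = t
    if current:
        sessions.append(current)
    return sessions


times = [0.0, 0.2, 0.5, 10.0, 10.1, 25.0]
sessions = split_sessions(times, max_gap=3.0)
print(sessions)
```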

Temporal relationships are an important consideration when clustering messages. FIG. 4, which shows the times at which messages were received, reveals a repeating pattern in the communication between wireless devices: messages are exchanged in succession, and after a certain period another succession of messages arrives. That is, messages in the same session are more likely to be related to each other, while messages in different sessions are likely to be less related. Accordingly, the message distance calculation unit 220 of the present invention may apply a weight according to the time interval when calculating the distance between messages. The performance improvement obtained by weighting the packet reception time interval is described below in connection with the message distance calculation unit 220.

The message distance calculation unit 220 searches for fitting variables based on the bit stream data and the time information and measures the distances between messages using those variables. It includes a fitting variable search unit 222, a distance calculation unit 224, and a linkage matrix generation unit 226.

The fitting variable search unit 222 searches for the variables used in the message distance calculation so that bit-level protocols can be analyzed. It preferably uses the Monte Carlo technique for this search, but is not limited thereto. The algorithm used for the message distance calculation may be the Needleman-Wunsch algorithm.

Because the essence of the present invention is a technique that makes bit-level protocols analyzable, the fitting variables strongly affect clustering performance. In other words, the message distance result changes depending on how the fitting variables are weighted, so the present invention does not set these values manually but uses the Monte Carlo technique to search for the values that classify messages with the highest accuracy. For example, assuming that the matching award obtained when the sequence characters of two messages match is 10, the fitting variable search unit 222 may search for the fitting variables by randomly sampling the mismatch penalty and the gap penalty from values between 0 and 10.
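The random search described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: `monte_carlo_search` and `toy_accuracy` are hypothetical names, and `toy_accuracy` is a made-up stand-in for what would in practice be a full run of alignment and clustering scored against labelled messages.

```python
import random


def monte_carlo_search(evaluate, trials, seed=0, low=0.0, high=10.0):
    """Randomly sample (mismatch_penalty, gap_penalty) pairs and keep the best.

    evaluate -- callable returning an accuracy for a pair of penalties
    trials   -- number of random samples to draw
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        mismatch = rng.uniform(low, high)
        gap = rng.uniform(low, high)
        score = evaluate(mismatch, gap)
        if score > best_score:
            best_params, best_score = (mismatch, gap), score
    return best_params, best_score


def toy_accuracy(mismatch, gap):
    # Made-up stand-in for a real accuracy measurement; it peaks at (2.0, 6.0).
    return 1.0 - ((mismatch - 2.0) ** 2 + (gap - 6.0) ** 2) / 200.0


params, score = monte_carlo_search(toy_accuracy, trials=2000)
print(params, score)
```

The matching award is held fixed (10 in the text) and only the two penalties are sampled, which keeps the search space two-dimensional.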

As shown in FIG. 5, the fitting variable search unit 222 uses the Monte Carlo technique to find the two values that give the highest accuracy. At the highest point of the graph in FIG. 5, where the mismatch penalty (X) is 0.9 and the gap penalty (Y) is 5.1, the accuracy (Z) reaches its maximum of 90%. The Monte Carlo technique thus makes it possible to find the fitting variables with the highest accuracy automatically.

The distance calculation unit 224 measures the distances between messages based on the bit stream data and the time information obtained by the preprocessing unit 210. The distance calculation unit 224 preferably compares sequences using the Needleman-Wunsch algorithm, one of the sequence distance calculation algorithms, but is not limited thereto.

The following description assumes that the distances between messages are measured with the Needleman-Wunsch algorithm. When distances are measured with this algorithm, several variables must be set in advance in order to calculate the distance between messages, and these variables are set by the fitting variable search unit 222. The detailed procedure is described with reference to FIG. 6A.

As shown in FIG. 6A, the alignment technique based on the Needleman-Wunsch algorithm is described using the two sequences 70832F65BD867AD200 and 708300D200 as an example.

The fitting variable search unit 222 first generates a first matrix (matrix F) whose dimensions correspond to the lengths of the two messages. Let F_{i,j} denote the value of the element in row i and column j of the first matrix, let m1[i] and m2[j] denote the two sequences, and let S(x, y) denote the similarity function. The values of all matrix elements are then defined as follows.

(F_{i,j}: value of the element in row i, column j of the first matrix (matrix F); S: similarity function; m: message (sequence); d: an arbitrary fixed value (constant))
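For reference, the textbook Needleman-Wunsch recurrence, written with the symbols defined above, takes the following form. It is given here as an assumption about the intended definition rather than a reproduction of the patent's equation. In particular, d is written as an additive gap score (negative when it acts as a penalty), and the boundary conditions are the usual ones.

```latex
\begin{aligned}
F_{0,0} &= 0, \qquad F_{i,0} = i \cdot d, \qquad F_{0,j} = j \cdot d, \\
F_{i,j} &= \max\bigl\{\, F_{i-1,j-1} + S(m_1[i],\, m_2[j]),\;\; F_{i-1,j} + d,\;\; F_{i,j-1} + d \,\bigr\}
\end{aligned}
```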

Here, d is set to an arbitrary fixed value, which may be defined as the gap penalty. The gap penalty is the penalty applied when a value is shifted while the two sequences are being aligned. The function S quantifies the similarity at a matched position as a score and may be defined as in [Equation 2].

(S: similarity function; v: matching reward function; e: reward value when the two message values are equal; f: penalty value when the two message values differ)

Here, e acts as a reward when the two message values are equal, and f is the penalty when they differ. The score difference that arises when the two message values differ may be defined as the mismatch penalty.
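A common form for such a scoring function, consistent with the roles of e and f described above, is the piecewise definition below. It is an assumed form rather than a reproduction of [Equation 2]; the role of the matching reward function v named in the legend cannot be recovered from the text and is left out.

```latex
S(x, y) =
\begin{cases}
e, & x = y \\
f, & x \neq y
\end{cases}
```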

To perform an alignment with the Needleman-Wunsch algorithm, fitting variables such as the mismatch penalty and the gap penalty must be set in advance.

In the finally obtained matrix, a path is constructed, as in the hatched portion of FIG. 6A, by starting from the largest value in the matrix and moving to the largest of the left, upper, and diagonal values (the diagonal one in case of a tie). In other words, the alignment with the smallest penalty is found.
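The matrix fill and path construction can be sketched in Python as below. This is the textbook global-alignment variant, with some stated assumptions: the traceback starts at the bottom-right cell (the standard Needleman-Wunsch convention) and prefers the diagonal on ties; penalties are passed as positive numbers and subtracted; and the default penalty values 0.9 and 5.1 are simply borrowed from the FIG. 5 discussion for illustration. The function name `needleman_wunsch` and its parameters are hypothetical.

```python
def needleman_wunsch(a, b, match=10.0, mismatch_penalty=0.9, gap_penalty=5.1):
    """Globally align sequences a and b.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    Penalties are positive numbers that are subtracted from the score.
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0.0] * cols for _ in range(rows)]
    # Boundary cells: aligning a prefix against nothing costs one gap per symbol.
    for i in range(1, rows):
        F[i][0] = F[i - 1][0] - gap_penalty
    for j in range(1, cols):
        F[0][j] = F[0][j - 1] - gap_penalty

    def s(x, y):
        return match if x == y else -mismatch_penalty

    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(
                F[i - 1][j - 1] + s(a[i - 1], b[j - 1]),  # match or mismatch
                F[i - 1][j] - gap_penalty,                # gap inserted in b
                F[i][j - 1] - gap_penalty,                # gap inserted in a
            )

    # Traceback from the bottom-right cell, preferring the diagonal on ties.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and (j == 0 or F[i][j] == F[i - 1][j] - gap_penalty):
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return F[len(a)][len(b)], "".join(reversed(out_a)), "".join(reversed(out_b))


score, top, bottom = needleman_wunsch("70832F65BD867AD200", "708300D200")
print(score)
print(top)
print(bottom)
```

The example treats each hexadecimal digit as one symbol, as in FIG. 6A; the same function works unchanged on bit strings such as "0111" because it only compares elements for equality.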

A description based on an actual example, shown in FIG. 6B, follows.

Referring to FIG. 6B, the first region 610 is defined as the penalty caused by a gap produced during alignment, and the second region 620 is defined as the penalty caused by a mismatch between values. Depending on the scores assigned to these cases, the Needleman-Wunsch algorithm produces differently aligned results.

To improve performance, the present invention may additionally apply a weight based on the time interval.

In other words, the distance calculation unit 224 may apply [Equation 3] to the result obtained with the Needleman-Wunsch algorithm so that the distance increases as the time interval grows. Compared with other conventional techniques, this is the part of the distance calculation unit 224 that matters most for automated analysis: the similarity evaluation between packets is weighted according to the time interval, so that sessions are separated automatically.

(G: Half-Gaussian distribution; t[i], t[j]: times at which the messages were collected; μ: 0 (preset constant))

Here, G follows a half-normal distribution, and its detailed formula is given in [Equation 4].

(G(y): Half-Gaussian distribution; σ: 3 (preset constant))
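For reference, the probability density of a half-normal (Half-Gaussian) distribution with scale σ has the standard form below, and the legend above states the preset value σ = 3. This is the textbook density, shown as a likely reading of [Equation 4] rather than a verbatim reproduction.

```latex
G(y) = \frac{\sqrt{2}}{\sigma \sqrt{\pi}} \exp\!\left( -\frac{y^{2}}{2\sigma^{2}} \right), \qquad y \ge 0
```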

[수학식 4]의 G(y) 함수는 Half-Gaussian 분포를 의미한다. 구체적으로, G(y) 함수는 거리에 따라서 점차 낮아지는 가중치를 부여하여 시퀀스 사이의 거리 계산에 곱한다. 이러한 G(y) 함수에 대한 그래프는 도 7의 (a)에 도시되어 있다. The G (y) function of Equation 4 denotes a Half-Gaussian distribution. Specifically, the G (y) function gives a weight that gradually decreases with distance and multiplies the distance calculation between sequences. The graph for this G (y) function is shown in FIG.

본 발명에 따른 거리 계산부(224)는 거리에 대한 가중치 함수를 계산된 정렬 스코어(Alignment score)에 적용하면, 가까운 시간에 존재하는 패킷끼리 자동적으로 묶이게 된다. 도 7의 (b)는 거리에 대한 가중치 함수를 적용하지 않은 정렬 스코어(Alignment score)를 나타내고, 도 7의 (c)는 거리에 대한 가중치 함수를 적용한 정렬 스코어(Alignment score)를 나타낸다. When the distance calculation unit 224 according to the present invention applies a weighting function for the distance to the calculated alignment score, packets existing in a close time are automatically bundled. 7 (b) shows an alignment score without applying a weight function for distance, and FIG. 7 (c) shows an alignment score with a weight function for distance.

That is, by using the distance weighting function, the distance calculation unit 224 according to the present invention can automatically exclude clustering candidates that are far apart in time, so clusters can be inferred without a manual session-separation step.

Hereinafter, the operation of the message distance calculation unit 220 is described in detail. The message distance calculation unit 220 is assumed here to measure the distance between messages using the Needleman-Wunsch algorithm, but is not necessarily limited thereto.

To cluster similar messages, the message distance calculation unit 220 needs a measure of how similar two messages are. For example, it calculates the distance between messages by combining the Needleman-Wunsch algorithm with a weighting based on the time interval. The Needleman-Wunsch algorithm is one of the representative sequence alignment algorithms.

To compare two messages with different sequences, the message distance calculation unit 220 performs a sequence alignment that matches the two sequences as closely as possible. To compare the two messages quantitatively, it sets a mismatch penalty, a gap penalty, and a matching award, and uses these values to calculate the similarity between the messages.

The mismatch penalty is applied when the two compared values differ. It is set to a negative value so that less similar messages receive a lower score.

When the messages differ in length, gaps may arise during alignment. The gap penalty is applied whenever a gap occurs in the alignment. It is set to a negative value, so each added gap lowers the similarity score.

The matching award is applied when the two compared values are identical, so that messages with more identical values receive a higher similarity score.

The Needleman-Wunsch algorithm consists of two steps: constructing a matrix by comparing the two sequences, and finding the optimal alignment.

To calculate the optimal alignment of two messages (m₁, m₂), the message distance calculation unit 220 measures the similarity bit by bit and constructs a first matrix (matrix F). The size of the first matrix (matrix F) is (length(m₁) + 1) by (length(m₂) + 1).

The first matrix (matrix F) and the second matrix (matrix Ptr), which represents the optimal alignment, are defined as in [Equation 5] to [Equation 8]. Here, g denotes the gap penalty, e the matching award, and f the mismatch penalty.

(F_{i,j}: value of the element in row i, column j of the first matrix (matrix F); S: match scoring function; e: award when the two message values are equal; f: penalty when the two message values differ; g: gap penalty; Ptr_{i,j}: element in row i, column j of the second matrix (matrix Ptr); Diag: move diagonally to the upper left; Left: move left; Up: move up)

The message distance calculation unit 220 can align the two messages using the second matrix (matrix Ptr), which represents the optimal alignment.

Starting from the last element of the second matrix (matrix Ptr), the message distance calculation unit 220 searches for the optimal alignment path by moving recursively: diagonally to the upper left if the element is Diag, to the left if it is Left, and upward if it is Up. The two messages are therefore aligned in reverse order. Along the optimal path, an element value of Diag means that the bits at that position have the same value, Left means that the message written along the left side has a gap at that position, and Up means that the message written along the top has a gap at that position. The alignment ends when this recursive walk reaches the first element of the second matrix (matrix Ptr).
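The matrix fill and traceback described above can be sketched as follows. This is a minimal illustration, not the patented implementation: [Equation 5] to [Equation 8] are not reproduced in this text, so the boundary initialization (F[i][0] = g·i, F[0][j] = g·j), the tie-breaking order (Diag, then Up, then Left), the iterative rather than recursive traceback, and the name `needleman_wunsch` are all assumptions. The default scores are the example values e = 10, f = -1, g = -5.

```python
DIAG, UP, LEFT = "Diag", "Up", "Left"


def needleman_wunsch(m1: str, m2: str, e: int = 10, f: int = -1, g: int = -5):
    """Return (aligned m1, aligned m2, final score) for two bit strings."""
    rows, cols = len(m1) + 1, len(m2) + 1
    F = [[0] * cols for _ in range(rows)]         # score matrix (matrix F)
    Ptr = [[None] * cols for _ in range(rows)]    # move matrix (matrix Ptr)

    # Assumed boundary: a leading run of gaps costs g per position.
    for i in range(1, rows):
        F[i][0], Ptr[i][0] = g * i, UP
    for j in range(1, cols):
        F[0][j], Ptr[0][j] = g * j, LEFT

    for i in range(1, rows):
        for j in range(1, cols):
            s = e if m1[i - 1] == m2[j - 1] else f
            candidates = [
                (F[i - 1][j - 1] + s, DIAG),  # pair the two bits
                (F[i - 1][j] + g, UP),        # gap in m2 (top sequence)
                (F[i][j - 1] + g, LEFT),      # gap in m1 (left sequence)
            ]
            # max() keeps the first maximum, so ties prefer Diag, then Up.
            F[i][j], Ptr[i][j] = max(candidates, key=lambda c: c[0])

    # Traceback from the last element to the first, building the alignment
    # in reverse order.
    a1, a2 = [], []
    i, j = len(m1), len(m2)
    while i > 0 or j > 0:
        move = Ptr[i][j]
        if move == DIAG:
            a1.append(m1[i - 1])
            a2.append(m2[j - 1])
            i, j = i - 1, j - 1
        elif move == UP:
            a1.append(m1[i - 1])
            a2.append("-")
            i -= 1
        else:
            a1.append("-")
            a2.append(m2[j - 1])
            j -= 1
    return "".join(reversed(a1)), "".join(reversed(a2)), F[len(m1)][len(m2)]


aligned1, aligned2, score = needleman_wunsch("10010", "110")
# aligned1 == "10010", aligned2 == "1--10", score == 20
```

In standard Needleman-Wunsch a Diag move can also pair two differing bits when that is cheaper than inserting gaps; on the example pair no such move lies on the optimal path.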

Thereafter, the message distance calculation unit 220 calculates the distance between the two messages from the result (m₁', m₂') of aligning (m₁, m₂) with the Needleman-Wunsch algorithm and from the times (t[m₁], t[m₂]) at which the two messages were collected. The similarity (identity) is defined as the number of bit positions at which m₁' and m₂' have the same value, divided by the total length of the aligned messages.

So that a longer time difference between the collection of two messages yields a lower similarity, the message distance calculation unit 220 may define a probability density function f(x) for a given time weight w, as in [Equation 9].

(f(x): probability density function; w: time weight)

With the probability density function f(x) defined as above, the distance d(m₁, m₂) between two messages (m₁, m₂) is calculated as in [Equation 10].

(d(m₁, m₂): weighted distance between the two messages (m₁, m₂); identity: similarity)

Accordingly, the message distance calculation unit 220 produces a small distance value when two messages are similar and a large distance value when they differ.

For example, given the bit-stream messages m₁ = 10010 and m₂ = 110, the first matrix (matrix F) and the second matrix (matrix Ptr) are constructed as in [Table 1] and [Table 2], respectively. Here, the gap penalty g is assumed to be -5, the matching award e to be 10, and the mismatch penalty f to be -1.

[Table 1] shows the first matrix (matrix F) for m₁ = 10010 and m₂ = 110.

The shaded elements in [Table 2] indicate the optimal alignment path of the two messages.

According to [Table 2], the alignment results of the two messages m₁ and m₂ calculated by the message distance calculation unit 220 are m₁' = 10010 and m₂' = 1--10, respectively. The aligned length is 5 and m₁' and m₂' share 3 bits, so the similarity is identity = 3/5 = 0.6. If the two messages were collected at t[m₁] = 0.1 and t[m₂] = 0.2 and the time weight is w = 10, the distance between them is d(m₁, m₂) = 1.66.
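The identity figure in this example can be checked with a few lines of code. This sketch covers only the identity ratio; the time-weighted distance of [Equation 10] is not reproduced in this text and is deliberately left out. The function name `identity` and the use of "-" as the gap symbol are assumptions.

```python
def identity(a1: str, a2: str, gap: str = "-") -> float:
    """Fraction of aligned positions where both messages carry the same bit."""
    if len(a1) != len(a2) or not a1:
        raise ValueError("aligned messages must be non-empty and equally long")
    same = sum(1 for x, y in zip(a1, a2) if x == y and x != gap)
    return same / len(a1)


example_identity = identity("10010", "1--10")  # 3 shared bits over length 5
```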

When the linkage matrix generation unit 226 has calculated the distances between the messages in each session, the distances between all messages are obtained in matrix form; this is defined as the linkage matrix (matrix D). In other words, the linkage matrix generation unit 226 constructs the linkage matrix by calculating the distance for every possible pair of messages in the collected data.
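A minimal sketch of building such an all-pairs matrix is shown below. The pairwise distance is passed in as a function; `toy_distance` is a hypothetical placeholder and is not the alignment-based, time-weighted distance of the description.

```python
from itertools import combinations


def build_distance_matrix(messages, distance):
    """Symmetric matrix D with D[i][j] = distance(messages[i], messages[j])."""
    n = len(messages)
    D = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):   # every unordered message pair
        D[i][j] = D[j][i] = float(distance(messages[i], messages[j]))
    return D


def toy_distance(a: str, b: str) -> int:
    """Placeholder only: differing leading bits plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


D = build_distance_matrix(["10010", "110", "10011"], toy_distance)
```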

The clustering processing unit 230 performs clustering by applying a hierarchical clustering algorithm to the linkage matrix (matrix D) generated by the message distance calculation unit 220. The clustering processing unit 230 includes a distance clustering unit 232, a dendrogram generation unit 234, and a message clustering unit 236.

The distance clustering unit 232 clusters similar messages using a hierarchical clustering algorithm, based on the linkage matrix (matrix D) of message distances calculated by the message distance calculation unit 220.

The dendrogram generation unit 234 generates a tree diagram showing the order in which the hierarchical clustering algorithm merges the objects. The dendrogram generation unit 234 may present this merge order in the form of a dendrogram.

The message clustering unit 236 sets a cluster level with reference to the generated dendrogram and clusters the messages according to that level.

Hereinafter, the operation of the clustering processing unit 230 is described in detail.

The clustering processing unit 230 clusters similar messages using a hierarchical clustering algorithm. A hierarchical clustering algorithm uses a hierarchical tree model to merge individual objects, level by level, with similar objects or groups. The clustering processing unit 230 may apply the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm to cluster similar messages. UPGMA is a hierarchical clustering algorithm that merges clusters according to the average distance between all pairs of objects drawn from the two clusters.

Given two clusters x and y, let n_a be the number of objects in a cluster a and a_i the i-th object of cluster a. The distance dist(x, y) between the two clusters may then be defined as in [Equation 11].

(x, y: two clusters; dist(x, y): distance between the two clusters x and y; n_x: number of objects in cluster x; n_y: number of objects in cluster y; d(m₁, m₂): distance between the two messages (m₁, m₂))
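[Equation 11] is not reproduced in this text. Under the usual UPGMA definition, which matches the symbols listed above, the cluster distance is the sum of d over all cross-cluster message pairs divided by n_x · n_y. The sketch below assumes that form; the name `upgma_distance` and the representation of clusters as lists of message indices are hypothetical.

```python
def upgma_distance(x, y, D):
    """Average of D[i][j] over every message i in cluster x and j in cluster y."""
    total = sum(D[i][j] for i in x for j in y)
    return total / (len(x) * len(y))


# Example pairwise distances for four messages (illustrative values only).
D_example = [
    [0, 1, 4, 6],
    [1, 0, 2, 8],
    [4, 2, 0, 3],
    [6, 8, 3, 0],
]
avg = upgma_distance([0, 1], [2, 3], D_example)  # (4 + 6 + 2 + 8) / 4
```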

Using the dendrogram, the clustering processing unit 230 can inspect the clusters formed at each stage and the distance at which each was formed. It may set a threshold value on the cluster distance by reference to the dendrogram and determine the final cluster level based on that threshold.
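One way to picture the threshold step is an agglomerative loop that keeps merging the two closest clusters (by average pairwise distance) and stops once the closest pair is farther apart than the threshold. This is a sketch under assumptions: the stopping rule, the function name `cluster_with_threshold`, and the example matrix are illustrative and not taken from the patent.

```python
def cluster_with_threshold(D, threshold):
    """Average-linkage merging that stops at the given distance threshold.

    Returns the final clusters and the merge history (the dendrogram order).
    """
    def avg(x, y):
        return sum(D[i][j] for i in x for j in y) / (len(x) * len(y))

    clusters = [[i] for i in range(len(D))]
    merges = []
    while len(clusters) > 1:
        dist, a, b = min(
            (avg(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if dist > threshold:
            break                      # remaining clusters are too far apart
        merges.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges


D_demo = [
    [0.0, 1.0, 8.0, 8.0],
    [1.0, 0.0, 8.0, 8.0],
    [8.0, 8.0, 0.0, 1.5],
    [8.0, 8.0, 1.5, 0.0],
]
final_clusters, history = cluster_with_threshold(D_demo, threshold=3.0)
```

With the threshold at 3.0 the loop stops with two clusters, {0, 1} and {2, 3}; raising it above 8.0 merges everything into one.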

FIG. 3 is a flowchart illustrating an automatic protocol analysis method according to an embodiment of the present invention.

The wireless signal collection device 110 collects a wireless signal (S310). In step S310, a signal in the desired radio band is collected using a software-defined radio (SDR) platform such as GNU Radio, and the analog signal is converted into a digital bit stream. Step S310 corresponds to the operation performed by the wireless signal collection device 110.

The automatic protocol analyzer 120 preprocesses the collected wireless signal (S320). In step S320, the additional data needed for message clustering is extracted from the collected signal and the signal is divided into packets. Specifically, step S320 extracts time information from the collected bit stream data and classifies data sessions according to time. Step S320 corresponds to the operation performed by the preprocessing unit 210 of the automatic protocol analyzer 120.
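The text does not state the rule used to classify sessions by time. As one simple possibility, the sketch below starts a new session whenever the gap between consecutive collection times exceeds a fixed value. Both the rule and the parameter `max_gap` are assumptions made for illustration, and the timestamps are assumed to be sorted.

```python
def split_sessions(timestamps, max_gap):
    """Group message indices into sessions separated by gaps above max_gap."""
    sessions = []
    current = []
    for idx, t in enumerate(timestamps):
        # A long silence since the previous message closes the current session.
        if current and t - timestamps[idx - 1] > max_gap:
            sessions.append(current)
            current = []
        current.append(idx)
    if current:
        sessions.append(current)
    return sessions


sessions = split_sessions([0.0, 0.1, 0.2, 5.0, 5.1, 12.0], max_gap=1.0)
# sessions == [[0, 1, 2], [3, 4], [5]]
```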

The automatic protocol analyzer 120 calculates the distances between messages based on the preprocessing result (S330). In step S330, the similarity between messages may be calculated using the Needleman-Wunsch algorithm or the Waterman algorithm. Step S330 sets the variables used in the distance calculation, calculates the distance between messages using the Needleman-Wunsch algorithm or the Waterman algorithm, and generates a linkage matrix (matrix D) containing the distances between all collected messages. Step S330 corresponds to the operation performed by the message distance calculation unit 220 of the automatic protocol analyzer 120.

The automatic protocol analyzer 120 clusters similar messages based on the calculated distances between messages (S340). In step S340, similar messages are clustered with a hierarchical clustering algorithm such as UPGMA applied to the calculated distances, a dendrogram is generated from the clustering result, and a threshold on the cluster level is determined to produce the final message clusters. Step S340 corresponds to the operation performed by the clustering processing unit 230 of the automatic protocol analyzer 120.

Although FIG. 3 describes the steps as being executed sequentially, the method is not necessarily limited thereto. In other words, the steps described in FIG. 3 may be reordered, or one or more steps may be executed in parallel, so FIG. 3 is not limited to a time-series order.

The automatic protocol analysis method according to the present embodiment described in FIG. 3 may be implemented as an application (or program) and recorded on a recording medium readable by a terminal device (or computer). Such a recording medium includes every kind of recording device or medium that stores data readable by a computing system.

FIG. 4 is a diagram for explaining the operation of the preprocessing unit of the automatic protocol analyzer according to an embodiment of the present invention.

Referring to FIG. 4, communication between wireless devices follows a repeating pattern: messages are exchanged in a continuous burst, and after a certain time another burst follows. Messages in the same session are therefore more likely to be related to one another, while messages in different sessions are less likely to be related. Accordingly, the message distance calculation unit 220 of the present invention can apply a weight based on the time interval when calculating the distance between messages.

FIG. 5 is a diagram for explaining the operation of the message distance calculation unit of the automatic protocol analyzer according to an embodiment of the present invention.

The message distance calculation unit 220 of the present invention uses a Monte Carlo technique to search for suitable parameter values, rather than manually fixing the weights that classify messages with the highest accuracy. For example, with the matching award obtained when the sequence characters of two messages match fixed at 10, it can search for suitable values by drawing the mismatch penalty and the gap penalty at random from the range 0 to 10.

Referring to FIG. 5, the message distance calculation unit 220 uses the Monte Carlo technique to find the two values that give the highest accuracy. At the highest point of the graph in FIG. 5, the accuracy (Z) reaches its maximum of 90 % when the mismatch penalty (X) is 0.9 and the gap penalty (Y) is 5.1. In this way, the Monte Carlo technique finds the parameter values with the highest accuracy automatically.
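A random search of this kind can be sketched as below. The accuracy measure is passed in as a function; in practice it would run the distance calculation and clustering with the sampled penalties and compare the result with known message types, which is not shown here. `toy_accuracy` is a hypothetical stand-in whose peak is placed at (0.9, 5.1) purely to mirror the figure, and the sampled values are treated as penalty magnitudes, which is also an assumption.

```python
import random


def monte_carlo_search(accuracy, trials=2000, low=0.0, high=10.0, seed=0):
    """Sample (mismatch, gap) uniformly and keep the most accurate pair."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        mismatch = rng.uniform(low, high)
        gap = rng.uniform(low, high)
        score = accuracy(mismatch, gap)
        if best is None or score > best[0]:
            best = (score, mismatch, gap)
    return best


def toy_accuracy(mismatch: float, gap: float) -> float:
    """Hypothetical accuracy surface, not measured data."""
    return 0.90 - 0.01 * ((mismatch - 0.9) ** 2 + (gap - 5.1) ** 2)


best_score, best_mismatch, best_gap = monte_carlo_search(toy_accuracy)
```

Fixing the seed makes the search repeatable; more trials tighten the estimate at the cost of more accuracy evaluations.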

FIG. 8 is a diagram for explaining the operation of the clustering processing unit of the automatic protocol analyzer according to an embodiment of the present invention.

The clustering processing unit 230 can cluster messages that are close to one another in distance. FIG. 8 shows the clustering result for the messages as a dendrogram.

As shown in FIG. 8, the index on the x-axis is the sequence number of each message, and the value on the y-axis is the distance between messages obtained with the Needleman-Wunsch algorithm. The clusters can be seen to form in order of increasing distance.

Any hierarchical clustering algorithm may be used in the clustering processing unit 230; the figure shows the result of clustering with the UPGMA (Unweighted Pair Group Method with Arithmetic Mean) algorithm. From the resulting dendrogram, the clustering processing unit 230 finally selects the clusters judged to lie within a predetermined distance according to the threshold.

Conventional approaches rely on byte-level semantic information and are therefore unsuitable for wireless communication protocols that exchange bit-level messages. The present invention resolves this problem by achieving its performance with bit-level comparison alone.

The above description merely illustrates the technical idea of the embodiments of the present invention, and those of ordinary skill in the art to which the embodiments pertain may make various modifications and variations without departing from the essential characteristics of the embodiments. The embodiments are therefore intended to describe, not to limit, the technical idea of the present invention, and the scope of that technical idea is not limited by these embodiments. The scope of protection of the embodiments should be interpreted according to the claims below, and all technical ideas within an equivalent scope should be interpreted as falling within the scope of rights of the embodiments of the present invention.

100: automatic protocol analysis system
110: wireless signal collection device 112: antenna
114: demodulator 116: analog-to-digital converter
120: automatic protocol analyzer 210: preprocessing unit
220: message distance calculation unit 230: clustering processing unit

Claims

An apparatus for automatically analyzing a wireless protocol, comprising:
a preprocessing unit that extracts bit stream data from a collected wireless signal and classifies a session in which a message is received based on the bit stream data;
a message distance calculation unit that measures distances between the classified messages and generates a linkage matrix for clustering; and
a clustering processing unit that clusters the messages based on the linkage matrix,
wherein the preprocessing unit includes: a bit stream data extraction unit that converts the wireless signal collected through a wireless signal collection device into bit stream data in units of bits; a time information extraction unit that extracts time information on the time at which the bit stream data was collected; and a data session classification unit that automatically classifies the session in which a message is received based on the bit stream data and the time information,
wherein the message distance calculation unit measures bit-by-bit similarity between two messages having different sequences among the messages to generate a first matrix (matrix F) whose size is determined by the length of each message, searches the first matrix for an alignment path according to whether the bits of the two messages are identical to generate a second matrix (matrix Ptr), calculates the similarity between the two messages based on the message length and the number of common bits in the alignment result obtained by aligning the two messages along the alignment path of the second matrix (matrix Ptr), and calculates the distance between the messages by applying to the similarity a time weight based on the time information between the messages, and
wherein the message distance calculation unit generates the linkage matrix by calculating the distance between messages for every pair of messages that can be formed from the messages.

(Deleted)

The apparatus of claim 1,
wherein the message distance calculation unit
performs a sequence alignment to compare two messages having different sequences, sets at least one of a mismatch penalty, a gap penalty, and a matching award in order to compare the similarity of the two messages quantitatively, and calculates the alignment result using at least one of the set mismatch penalty, gap penalty, and matching award.

The apparatus of claim 1,
wherein the message distance calculation unit
measures the similarity for each bit of the messages to generate the first matrix (matrix F), generates the second matrix (matrix Ptr) for optimal alignment by aligning the messages in reverse order according to a predefined rule, and calculates the alignment result based on the first matrix (matrix F) and the second matrix (matrix Ptr).

The apparatus of claim 1,
wherein the message distance calculation unit
sets, as the similarity (identity), the number of bits having the same value in the two messages divided by the total message length of the two messages in the alignment result, and calculates the distance between the messages using the similarity and a probability density function f(x) to which the time information is applied.

The apparatus of claim 6,
wherein the message distance calculation unit
defines the probability density function f(x) by applying a time weight (w) so that the longer the time difference between the collection of the two messages, the lower the similarity value.

The apparatus of claim 1,
wherein the clustering processing unit
performs clustering using a hierarchical clustering algorithm based on the linkage matrix (matrix D), generates a dendrogram indicating the order in which the messages are merged, and finally clusters the messages according to a set cluster level.

A method for automatically analyzing a wireless protocol, comprising:
a preprocessing step of extracting bit stream data from a collected wireless signal and classifying a session in which a message is received based on the bit stream data;
a message distance calculation step of measuring distances between the classified messages to generate a linkage matrix for clustering; and
a clustering processing step of clustering the messages based on the linkage matrix,
wherein the preprocessing step includes: a bit stream data extraction step of converting the wireless signal collected through a wireless signal collection device into bit stream data in units of bits; a time information extraction step of extracting time information on the time at which the bit stream data was collected; and a data session classification step of automatically classifying the session in which a message is received based on the bit stream data and the time information,
wherein the message distance calculation step measures bit-by-bit similarity between two messages having different sequences among the messages to generate a first matrix (matrix F) whose size is determined by the length of each message, searches the first matrix for an alignment path according to whether the bits of the two messages are identical to generate a second matrix (matrix Ptr), calculates the similarity between the two messages based on the message length and the number of common bits in the alignment result obtained by aligning the two messages along the alignment path of the second matrix (matrix Ptr), and calculates the distance between the messages by applying to the similarity a time weight based on the time information between the messages, and
wherein, in the message distance calculation step, the linkage matrix is generated by calculating the distance between messages for every pair of messages that can be formed from the messages.

(Deleted)

The method of claim 9,
wherein the message distance calculation step
measures the similarity for each bit of the messages to generate the first matrix (matrix F), generates the second matrix (matrix Ptr) for optimal alignment by aligning the messages in reverse order according to a predefined rule, and calculates the alignment result based on the first matrix (matrix F) and the second matrix (matrix Ptr).