KR20150091775A

KR20150091775A - Method and System of Network Traffic Analysis for Anomalous Behavior Detection

Info

Publication number: KR20150091775A
Application number: KR1020140012407A
Authority: KR
Inventors: 이종훈; 김익균; 조현숙
Original assignee: 한국전자통신연구원
Priority date: 2014-02-04
Filing date: 2014-02-04
Publication date: 2015-08-12

Abstract

According to an embodiment of the present invention, a system for analyzing network traffic comprises: a data collection unit for collecting network traffic data; a characteristic factor extraction unit for extracting network characteristic factors from the collected data; a cluster generation unit for clustering the extracted characteristic factors; an analysis unit for extracting outliers by analyzing the cluster; and an abnormal host determination unit for determining abnormal hosts from the outliers.

Description

Technical Field

The present invention relates to a method for analyzing network traffic to detect abnormal behavior. More specifically, it relates to a technique that collects network data, extracts from the collected data characteristic factors for recognizing cyber attacks and the like, and periodically learns the data on the extracted characteristic factors in order to detect, among the hosts in a network, hosts suspected of abnormal behavior.

Methods for detecting network intrusion are conventionally divided, according to what they detect, into misuse detection and anomaly detection. Misuse detection performs intrusion detection by applying hundreds or more of previously known attack patterns: experts create signatures or a rule database for the known attack patterns, and the technique detects patterns that match them. Techniques used in misuse detection include, for example, pattern matching, state definition analysis, expert systems, keystroke monitoring, and conditional probability.

Anomaly detection is a technique that, based on statistical data, detects various unknown abnormal behaviors that deviate from the statistics. Because it relies on statistical data, anomaly detection requires a large amount of training data for the learning process that distinguishes normal from abnormal. Anomaly detection methods include statistical methods, feature extraction, combinations of abnormal-behavior measures, predictable pattern generation, and neural network techniques.

An existing rule-based network intrusion detection system defines rules based on known attacks or malicious behavior and recognizes external intrusions by those rules, so it cannot detect unknown new types of attack or malicious behavior disguised as normal behavior. For example, existing security equipment cannot stop an attacker who infiltrates an antivirus program update server, changes the update server's redirection address so that antivirus updates are delivered from a malicious server designated by the attacker, and thereby causes users to download malicious code disguised as a legitimate program.

Meanwhile, recent intelligent cyber security technology aims at integrated security management that uses large volumes of data from diverse sources to organize network and system events into a single interlinked security infrastructure. Unknown cyber attacks, such as APT (Advanced Persistent Threat) attacks evolved from hacking and malicious programs, cause extensive damage not only to individuals but also at the organizational and national level through information leakage, system downtime, and paralysis of sites and networks. Intelligent security technology is therefore a next-generation security information technology that improves security intelligence by analyzing the correlations between data and security events arising in hosts, networks, and so on, and it is essential for protecting the networks, systems, and application services of IT-based critical facilities from targeted cyber attacks.

In general, an IDS (Intrusion Detection System), which detects network intrusions, can recognize attacks such as DDoS, port scans, and attempts to crack a computer when they occur, but it shows its limits in recognizing and defending against recent intelligent targeted cyber attacks that are carried out elaborately over a long latency period. It is therefore necessary, rather than simply blocking a single attack element, to recognize and detect covertly progressing attacks by analyzing the relationships among the various data collected on hosts and the network.

To solve the problems of the prior art described above, an object of the present invention is to provide a method and system that, in order to detect targeted cyber attacks, collect network data using a network data collection device, extract from the collected data characteristic factors for recognizing a targeted cyber attack, analyze over time how the clustering of each extracted characteristic factor changes, and learn periodically, thereby detecting hosts on the network suspected of anomalous behavior and, on that basis, presenting the relationships between each characteristic factor and the abnormal hosts intuitively and visually.

A network traffic analysis system according to an embodiment of the present invention may include: a data collection unit that collects network traffic data; a characteristic factor extraction unit that extracts network characteristic factors from the collected data; a cluster generation unit that performs clustering on the extracted characteristic factors; an analysis unit that analyzes the clusters to extract outlier hosts; and an abnormal host determination unit that determines abnormal hosts from the outlier hosts.

A network traffic analysis method for detecting hosts that perform abnormal behavior, according to an embodiment of the present invention, may include: collecting network traffic data in a network traffic analysis system; extracting network characteristic factors from the collected data; performing clustering on the extracted characteristic factors; analyzing the clusters to extract outlier hosts; and determining abnormal hosts from the outlier hosts.

According to an embodiment of the present invention, network data is collected using a network collection device in order to detect targeted cyber attacks, characteristic factors for recognizing a targeted cyber attack are extracted from the collected data, and the change over time in the clustering of each extracted characteristic factor is analyzed and periodically learned. As a result, hosts in the network suspected of anomalous behavior can be detected through network data analysis, and the relationships between each characteristic factor and the abnormal hosts can be presented intuitively and visually.

In addition, a next-generation intelligent security system needs to detect and block new targeted cyber attacks that proceed covertly over a sustained period rather than following known attack patterns or rules. The method and system according to the present invention for collecting and analyzing network data can be used as a technique and system for detecting abnormal hosts suspected of involvement in a targeted cyber attack.

FIG. 1 is a conceptual diagram of a network analysis system for detecting abnormal behavior according to an embodiment of the present invention.
FIG. 2 shows an exemplary structure of a network analysis system for detecting abnormal behavior according to an embodiment of the present invention.
FIG. 3 shows an exemplary list of network data characteristic factors according to an embodiment of the present invention.
FIG. 4 shows clustering and changes in cluster shape according to an embodiment of the present invention.
FIG. 5 shows an example of extracting an abnormal host list through a cluster analysis process according to an embodiment of the present invention.
FIG. 6 shows a visualization screen of a network analysis system according to an embodiment of the present invention.
FIG. 7 is a flowchart for recognizing abnormal behavior according to an embodiment of the present invention.

Various embodiments of the present invention are described in detail below with reference to the accompanying drawings. Note that the same components are denoted by the same reference numerals wherever possible throughout the drawings. Detailed descriptions of well-known functions and configurations that could obscure the gist of the present invention are omitted. That is, the following description covers only the parts necessary for understanding the operation of the embodiments, and the remaining parts are omitted so as not to obscure the gist of the invention.

An embodiment of the present invention described below includes a network data analysis method and system that collect network data, extract from the collected data characteristic factors for recognizing a targeted cyber attack, and periodically learn the data on the extracted characteristic factors to detect hosts on the network suspected of abnormal behavior. More specifically, it covers the process for analyzing network data, the method for detecting suspicious hosts through characteristic factors, the structure of a network analysis system incorporating this process and method, and the screen layout for presenting the analysis results.

FIG. 1 is a conceptual diagram of a network analysis system for detecting abnormal behavior according to an embodiment of the present invention.

Referring to FIG. 1, the network environment 10 may include an internal network 100 such as a corporate network, a network data analysis system 200, and an external network 300 such as the Internet. The internal network 100, the network data analysis system 200, and the external network 300 may be connected through a switch 110, a switch 310, a router 320, and the like. This configuration is exemplary, and variously modified network environments are possible.

The network data analysis system 200 (hereinafter also called the analysis system 200) may include a network data collection device 210 that, in order to detect malicious behavior originating from outside, is located at the connection point between the external network and the internal network and collects the network traffic generated between them. In the example of FIG. 1, separate devices are shown connected by wire, but a data collection module or device located inside a single system 200 may perform the function of the network data collection device 210. This is described with reference to FIG. 2.

FIG. 2 shows an exemplary structure of a network analysis system for detecting abnormal behavior according to an embodiment of the present invention. The structure shown in FIG. 2 is exemplary, and the analysis system 200 need not have exactly this structure. Some components may be omitted, and functions shown as two or more units may be performed by a single unit.

Referring to FIG. 2, the analysis system 200 may include a data collection module 210, a cluster generation module 220, a model analysis module 230, and an anomaly recognition module 240. It may further include a data store 250 for storing the data obtained by each module.

The data collection module 210 may collect network data and perform preprocessing. The data collection module 210 may correspond to the network data collection device described above. It may include a network data collection unit 211 for collecting data, a network data statistics unit 213 for obtaining statistical information from the collected data, a characteristic factor extraction unit 215 for extracting characteristic factors from the collected data, and a data storage unit 217 for storing data in a big data platform or the like. The types and properties of the characteristic factors are described later with reference to FIG. 3.

The cluster generation module 220 may perform clustering using a clustering algorithm and the defined characteristic factors. The cluster generation module 220 may include an IP layer cluster unit 221 for clustering and learning on the IP layer characteristic factors among the extracted network characteristic factors, a transport layer cluster unit 223 for clustering and learning on the transport layer characteristic factors, and an application layer cluster unit 225 for clustering on the application layer characteristic factors.

The model analysis module 230 may detect suspicious hosts by analyzing the clusters generated by the cluster generation module 220. The model analysis module 230 may include a cluster analysis unit 231 that analyzes the shape of the clusters and detects outliers (or outlier hosts) not belonging to any cluster, a cluster time-series analysis unit 233 that filters the detected outliers by comparing them with the results of previous clustering, an outlier extraction unit 235 that combines the above steps to make the final extraction of outliers, and a behavior analysis unit 237 that analyzes, from stored data, the past behavior history of the hosts corresponding to the extracted outliers.

The anomaly recognition module 240 may determine whether the suspicious hosts extracted by the model analysis module 230 are abnormal and present the result on the system or on a screen supported by the system. The anomaly recognition module 240 may include an abnormal host determination unit 241 that judges whether the behavior of a host classified as suspicious resembles or matches the behavior of a targeted cyber attack, an analysis data presentation unit 243 for displaying the determined abnormal hosts, and an abnormal host tracking management unit 245 for managing the history of abnormal hosts.

The data collection unit 211 for collecting network data may collect data related to network elements from network collection equipment and user hosts. More specifically, when a host communicates, the following may be collected: the sender's IP address and port, the destination's IP address and port, the duration of the data flow (that is, its start and end times), the numbers of received and sent packets, and the sizes of the received and sent data. In addition, the list of TCP flags seen in the flow, the service ID, the application protocol ID, the device ID, the service provider ID, the Observation Domain ID, and the like may be collected.
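The per-flow fields listed above can be pictured as a single record. The following Python sketch is only an illustration: the class name `FlowRecord`, every field name, and the sample values are assumptions, not identifiers taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FlowRecord:
    """One collected flow. All names here are illustrative assumptions."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    start_time: float            # flow start, in seconds
    end_time: float              # flow end, in seconds
    packets_received: int
    packets_sent: int
    bytes_received: int
    bytes_sent: int
    tcp_flags: List[str] = field(default_factory=list)
    service_id: int = 0
    app_protocol_id: int = 0
    device_id: int = 0
    service_provider_id: int = 0
    observation_domain_id: int = 0

    @property
    def duration(self) -> float:
        # Duration is derived from the start and end times of the flow.
        return self.end_time - self.start_time


# Hypothetical sample flow, used only to show how the record is filled in.
flow = FlowRecord("10.0.0.5", 51234, "203.0.113.7", 443,
                  100.0, 101.5, 12, 10, 9000, 1200,
                  tcp_flags=["SYN", "ACK", "FIN"])
```

Keeping the duration as a derived property avoids storing a value that could disagree with the start and end times.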

The data collection unit 211 may also maintain a database of services and signatures and, when a packet involves an unknown service or contains an unknown signature, collect that packet's payload. The raw network data collected is turned into statistics by the network data statistics unit 213, and the network characteristic factor extraction unit 215 can then extract the characteristic factors.

FIG. 3 shows an exemplary list of network data characteristic factors according to an embodiment of the present invention.

Referring to FIG. 3, the characteristic factors for analyzing network data may be divided into four categories: IP layer characteristic factors, transport layer characteristic factors, application layer characteristic factors, and behavior characteristic factors. Each extracted characteristic factor may be clustered in the corresponding cluster unit 221, 223, or 225 and analyzed by the cluster analysis unit 231 and/or the behavior analysis unit 237.

The IP layer characteristic factors may include five factors: TCHC, TCPC, FS, BDIP, and PDIP. More specifically, to analyze IP layer data, the network characteristic factor extraction unit 215 may extract from the collected data the number of hosts connected to within 2 seconds (TCHC), the number of ports connected to within 2 seconds (TCPC), the total number of flows within 2 seconds (FS), the amount of data per destination address (BDIP), and the number of packets per destination address (PDIP).
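As a rough illustration, the five IP layer factors could be computed for one source host over a 2-second window as in the Python sketch below. The tuple layout, the function name `ip_layer_features`, and the choice to count a "port" as a distinct (destination address, port) pair are assumptions; the text does not specify them.

```python
from collections import defaultdict

WINDOW_SECONDS = 2.0  # window length stated in the text


def ip_layer_features(flows, window_start):
    """Compute TCHC, TCPC, FS, BDIP, PDIP for one source host.

    flows: iterable of (start_time, dst_ip, dst_port, n_bytes, n_packets).
    The tuple layout is a hypothetical input format.
    """
    hosts, ports = set(), set()
    flow_count = 0
    bytes_per_dst = defaultdict(int)
    packets_per_dst = defaultdict(int)
    for start, dst_ip, dst_port, n_bytes, n_packets in flows:
        # Only flows that start inside the window are counted.
        if not (window_start <= start < window_start + WINDOW_SECONDS):
            continue
        hosts.add(dst_ip)
        ports.add((dst_ip, dst_port))  # assumption: port counted per destination
        flow_count += 1
        bytes_per_dst[dst_ip] += n_bytes
        packets_per_dst[dst_ip] += n_packets
    return {
        "TCHC": len(hosts),
        "TCPC": len(ports),
        "FS": flow_count,
        "BDIP": dict(bytes_per_dst),
        "PDIP": dict(packets_per_dst),
    }


sample_flows = [
    (0.1, "a", 80, 100, 2),
    (0.5, "a", 443, 50, 1),
    (1.9, "b", 80, 10, 1),
    (2.5, "c", 22, 999, 9),   # falls outside the window starting at 0.0
]
features = ip_layer_features(sample_flows, 0.0)
```

BDIP and PDIP are kept as per-destination maps here; averaging them into scalars before clustering is a separate step.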

The transport layer characteristic factors may include six factors: PTYPE, FBYTE, TFLAG, DIP, DIPP, and TSUB. More specifically, to analyze the transport layer, the network characteristic factor extraction unit 215 may extract and compute the list of protocol types of the received packets (PTYPE), the rate of the amount actually received relative to the sent bytes (src_bytes; source bytes) and received bytes (dst_bytes; destination bytes) (FBYTE), the values of the TCP flags seen in the flow (TFLAG), the distribution of destination IP addresses (DIP), the distribution of connected ports per destination IP address (DIPP), and the total traffic statistics of the subnet containing the destination address (TSUB).
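Two of these factors lend themselves to a short Python sketch. The text does not give an exact formula for FBYTE, so the version below assumes it is the received share of all bytes in the flow; that reading, and the function names, are assumptions. TFLAG is represented here as the relative frequency of each flag.

```python
from collections import Counter


def fbyte_ratio(src_bytes, dst_bytes):
    """Assumed FBYTE: share of received (destination) bytes in the flow total."""
    total = src_bytes + dst_bytes
    return dst_bytes / total if total else 0.0


def flag_distribution(flags):
    """Relative frequency of each TCP flag seen in a flow."""
    counts = Counter(flags)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {flag: n / total for flag, n in counts.items()}


ratio = fbyte_ratio(300, 100)
dist = flag_distribution(["SYN", "SYN", "ACK", "FIN"])
```

A host whose flag distribution is dominated by SYN with few ACK or FIN flags would stand apart from its peers once these values are clustered.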

The application layer characteristic factors may include five factors: SAUP, SADP, FSID, FAID, and FSPID. More specifically, to analyze the application layer, the network characteristic factor extraction unit 215 may extract the user's frequency of service and application use (SAUP), the concentration of services and applications used (SADP), flow statistics per service (FSID), flow statistics per application protocol ID (FAID), and flow statistics per service provider ID (FSPID).

The behavior characteristic factors may include three factors: GF, USID, and USP. More specifically, to analyze user behavior, the network characteristic factor extraction unit 215 may extract geographic information on the connected destinations (GF), information on unknown service IDs (USID), and information on unknown signatures in the payload (USP).

Referring again to FIG. 2, for example, the IP layer cluster unit 221 first computes the average values of the BDIP and PDIP characteristic factors so that, for the characteristic factors extracted for multiple hosts, the number of hosts, the number of ports, the number of flows, and the amount of data and number of packets per destination IP address can be analyzed. It can then perform clustering with a clustering algorithm, using the values of the five characteristic factors (TCHC, TCPC, FS, BDIP, and PDIP) as variables. Clustering generally means iteratively applying an algorithm such as K-Means or the EM algorithm to group a data set into clusters with similar characteristics. Clustering yields the clusters and allows detection of outliers that do not belong to any cluster. The result can also be presented visually, as described later with reference to FIG. 4. The clustering result includes the clusters themselves and, within each cluster, values such as the center point and distances. The clustering process can be run periodically on the resulting clusters to analyze how the cluster shapes change over time, and outliers, that is, data not belonging to any cluster, can be extracted. For example, through such clustering the cluster analysis unit 231, or an IP layer analysis unit included in the cluster analysis unit 231, can extract a host that, compared with other hosts or with the overall average, is connected to an abruptly large number of destination hosts, has an unusual number of flows relative to other hosts, or is exchanging a large volume of packets or data.
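The clustering and outlier step can be sketched in plain Python. The text names K-Means only as one possible algorithm and does not say how a point is judged to lie outside every cluster, so the rule below (distance to the assigned centroid greater than `factor` times the median distance in that cluster) is an assumption, as are the function names, the deterministic initialization, and the two-dimensional toy data.

```python
import math
import statistics


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; initial centroids are simply the first k points."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def find_outliers(points, centroids, labels, factor=3.0):
    """Assumed rule: flag points farther than factor x the cluster's median distance."""
    dists = [math.dist(p, centroids[l]) for p, l in zip(points, labels)]
    outliers = []
    for c in range(len(centroids)):
        member = [d for d, l in zip(dists, labels) if l == c]
        if not member:
            continue
        limit = factor * statistics.median(member)
        outliers += [i for i, (d, l) in enumerate(zip(dists, labels))
                     if l == c and d > limit]
    return sorted(outliers)


# Toy feature vectors for nine hosts; the last one is far from both groups.
hosts = [(1, 1), (1, 2), (2, 1), (2, 2),
         (10, 10), (10, 11), (11, 10), (11, 11),
         (30, 30)]
centroids, labels = kmeans(hosts, k=2)
outlier_indices = find_outliers(hosts, centroids, labels)
```

In a real deployment the five IP layer factors would form the coordinates of each point, and they would normally be scaled first so that byte counts do not dominate host counts.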

In the same way, the transport layer cluster unit 223 can perform clustering on the characteristic factors PTYPE, FBYTE, TFLAG, DIP, DIPP, and TSUB extracted for multiple hosts. Through this clustering, the cluster analysis unit 231, or a transport layer analysis unit included in the cluster analysis unit 231, can detect cases where the difference between sent and received data is unusually large, where the distribution of destination addresses differs from that of other hosts, or where the distribution of SYN, ACK, and FIN among the TCP flags differs markedly.

Likewise, the application layer cluster unit 225 can perform clustering on the characteristic factors SAUP, SADP, FSID, FAID, and FSPID extracted for multiple hosts. Through clustering on applications and services, the cluster analysis unit 231, or an application layer analysis unit included in the cluster analysis unit 231, can find cases where a host's application flows differ from those of other hosts. For example, in the case of an antivirus update, a host whose flow differs from the usual update flow can be identified as a suspicious host.

FIG. 4 shows clustering and changes in cluster shape according to an embodiment of the present invention.

FIG. 4 shows results of clustering by the cluster generation module 220. Result 410 is one example of clustering. In result 410, clustering produces five clusters 401 and a number of outliers 402 that do not belong to any cluster 401. The number of outliers 402 can be adjusted through the clustering algorithm and other settings.

Such clustering may be performed periodically, every T seconds. The rate of change between the previous period's clustering result and the current one is analyzed, and outlier data not belonging to any cluster can be managed by the cluster time-series analysis unit 233 as candidates for anomaly detection. Through analysis of the rate of change in cluster shape, the number of outliers extracted can be adjusted dynamically when outliers increase sharply. In that case, the distance from a cluster's center point to its outer boundary may be increased.
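The text does not define how the rate of change is measured or how the boundary is widened, so the following Python sketch makes two assumptions: the rate of change is taken as the mean distance from each previous centroid to its nearest current centroid, and the boundary radius is multiplied by a fixed step until no more than a chosen number of points lie outside it. All names and the step value are hypothetical.

```python
import math


def movement_rate(prev_centroids, curr_centroids):
    """Assumed measure: mean distance from each old centroid to its nearest new one."""
    return sum(min(math.dist(p, c) for c in curr_centroids)
               for p in prev_centroids) / len(prev_centroids)


def adjust_radius(distances, radius, max_outliers, step=1.5):
    """Widen the cluster boundary until at most max_outliers points lie outside it."""
    if radius <= 0 or step <= 1 or max_outliers < 0:
        raise ValueError("radius must be > 0, step > 1, max_outliers >= 0")
    while sum(d > radius for d in distances) > max_outliers:
        radius *= step
    return radius


rate = movement_rate([(0, 0), (10, 10)], [(0, 1), (10, 10)])
new_radius = adjust_radius([1, 2, 3, 8, 9, 20], radius=2.0, max_outliers=1)
```

Widening the radius in multiplicative steps keeps the loop short even when a burst of traffic pushes many hosts far from their cluster centers.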

As shown in FIG. 4, periodic clustering may produce a result 420A in which the clusters have hardly changed, or a result 420B in which they have changed substantially. The cluster time-series analysis unit 233 can lower the false positive rate by tracking this rate of change. The outlier extraction unit 235 can make the final determination of outliers by weighing the outliers derived from the analysis results of the IP layer cluster unit 221, the transport layer cluster unit 223, the application layer cluster unit 225, and the corresponding cluster analysis unit 231 against the current rate of change in cluster shape.

FIG. 5 illustrates an example of extracting an abnormal host list through a cluster analysis process according to an embodiment of the present invention.

Referring to FIG. 5, for the clusters generated by the clustering algorithm in the IP layer clustering unit 221, the transport layer clustering unit 223, and the application layer clustering unit 225 of the cluster generation module 220, the IP layer cluster analysis unit 501, the transport layer cluster analysis unit 503, and the application layer cluster analysis unit 505 included in the cluster analysis unit 230 can each extract outliers 510.

In a modified embodiment, each clustering unit may be integrated with its corresponding analysis unit. For example, the IP layer clustering unit 221 and the IP layer cluster analysis unit 501 may be integrated into an IP layer analysis unit that performs both clustering and outlier extraction through cluster analysis.

Referring again to FIG. 5, the IP layer cluster analysis unit 501 can classify hosts 2, 5, 8, 12, 18 (H2, H5, H8, H12, H17) as outliers. In the same manner, outlier host candidates 510 can be derived from each of the analysis units 501, 503, and 505. The outlier extraction unit 235 can extract suspect hosts 520 (for example, H5, H8, H11) by combining the results of the analysis units 501, 503, and 505 with the cluster movement rate, which is based on the clustering result of the previous period provided by the cluster time-series analysis unit 233. The behavior analysis unit 237 can determine the final abnormal host set 530 (for example, H5, H8) based on the behavior pattern database 507 and the geographic information database 509. The behavior analysis unit 237 can add the list of hosts at which outliers occurred to the abnormal host set by time period. Although the databases 507 and 509 are shown as separate databases, they may be located together in the data store 250 shown in FIG. 2.
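
The narrowing from per-layer outlier candidates to suspect hosts can be sketched as a simple vote. The specification does not state how the three layers' results are combined, so the voting rule and `min_votes` are assumptions, and the cluster movement rate is left out for brevity. The IP-layer list uses the host labels from the text; the transport-layer and application-layer lists are invented so that the result matches the example H5, H8, H11.

```python
from collections import Counter

def suspect_hosts(candidates_by_layer, min_votes=2):
    """Return hosts flagged as outliers by at least `min_votes` layers.
    Host IDs are assumed to look like 'H<number>' for sorting."""
    votes = Counter(
        host for hosts in candidates_by_layer.values() for host in set(hosts)
    )
    picked = [host for host, n in votes.items() if n >= min_votes]
    return sorted(picked, key=lambda h: int(h[1:]))

layers = {
    "ip": ["H2", "H5", "H8", "H12", "H17"],
    "transport": ["H5", "H8", "H11"],    # hypothetical
    "application": ["H5", "H11"],        # hypothetical
}
print(suspect_hosts(layers))  # ['H5', 'H8', 'H11']
```

A subsequent behavior-analysis step would then reduce this suspect list to the final abnormal host set.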

For example, the behavior analysis unit 237 can query the geographic information database 509 with the connection IP address of each suspect host 520 to determine each suspect host's connection location. Attackers and hackers often connect by way of systems in countries that are hard to trace domestically in order to hide their own IP addresses, so if the connection IP belongs to such a country, the suspect host is likely to be performing abnormal behavior. In this case, if the destination address is one that the current network rarely connects to and is located in a country commonly used by attackers as a relay point, the host can be classified as an abnormal host.
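
The geographic check can be sketched as a lookup followed by two conditions. Everything concrete here is an assumption for illustration: the geographic information database is modeled as a dictionary from CIDR blocks to placeholder country codes ("AA", "BB"), the addresses come from ranges reserved for documentation, and `rare_limit` is a hypothetical threshold for "rarely connected to".

```python
import ipaddress

def lookup_country(ip, geo_db):
    """Return the country code of the first network in geo_db containing ip."""
    addr = ipaddress.ip_address(ip)
    for cidr, country in geo_db.items():
        if addr in ipaddress.ip_network(cidr):
            return country
    return None

def is_geo_suspicious(dest_ip, geo_db, relay_countries, hits_by_country, rare_limit=5):
    """True when the destination is in a country often used as a relay
    and the local network rarely connects to that country."""
    country = lookup_country(dest_ip, geo_db)
    if country is None:
        return False
    rarely_visited = hits_by_country.get(country, 0) < rare_limit
    return country in relay_countries and rarely_visited

geo_db = {"192.0.2.0/24": "AA", "198.51.100.0/24": "BB"}
hits_by_country = {"AA": 900, "BB": 1}
print(is_geo_suspicious("198.51.100.7", geo_db, {"BB"}, hits_by_country))  # True
print(is_geo_suspicious("192.0.2.10", geo_db, {"BB"}, hits_by_country))    # False
```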

As another example, the behavior analysis unit 237 can determine, based on the behavior pattern database 507, whether a host periodically sends keep-alive messages to an external system, and can classify the host as an abnormal host if the external system is located somewhere the current network rarely connects to.
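
Whether a host sends messages to an external system at regular intervals can be estimated from message timestamps. The sketch below is one simple heuristic, assumed for illustration: it treats traffic as periodic when the gaps between consecutive messages vary little relative to their mean. The function name, `min_events`, and `max_jitter` are hypothetical, and the specification does not prescribe this test.

```python
import statistics

def looks_periodic(timestamps, min_events=4, max_jitter=0.1):
    """True when the gaps between consecutive timestamps (in seconds)
    are nearly constant, as expected of keep-alive traffic."""
    if len(timestamps) < min_events:
        return False
    ordered = sorted(timestamps)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    mean_gap = statistics.mean(gaps)
    if mean_gap <= 0:
        return False
    # Coefficient of variation: spread of the gaps relative to their mean.
    return statistics.pstdev(gaps) / mean_gap <= max_jitter

print(looks_periodic([0, 60, 120, 181, 240]))  # True
print(looks_periodic([0, 5, 100, 130, 400]))   # False
```

Combining this result with a check on the external system's location gives the classification rule described above.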

FIG. 6 shows a visualization screen of a network analysis system according to an embodiment of the present invention.

The visualization screen 600 of FIG. 6 may be provided on a screen of the network data analysis system by the analysis data presentation unit 243 of the anomaly recognition module 240. In a modified embodiment, however, the visualization screen 600 may be displayed on, for example, the screen of an administrator's computing system in the enterprise's internal network.

Referring to FIG. 6, the visualization screen 600 may include a feature (or parameter) display panel 601 showing a list of characteristic factors, a host display panel 602 showing the current state of the hosts, and a relation display panel 603 showing the relationship between hosts and characteristic factors.

In the host display panel 602, the IP address or ID of every host on the network may be displayed as a point 604. The panel 602 may be configured as a two-dimensional plane. The larger the X coordinate value 605, the more times the host has been classified as an abnormal host; the larger the Y coordinate value 606, the higher the suspicion score from analysis of the host's past data.

That is, the lower-left area of the host display panel 602, the so-called safe zone 607, represents the fewest classifications as an abnormal host and a low suspicion score from past data analysis, so the hosts judged safest are located there. Conversely, the upper-right area, the so-called most suspicious zone 608, represents a high number of classifications as an abnormal host and a high suspicion score from past data analysis, so the most abnormal hosts are located there.
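
The mapping from a host's two coordinates to a region of the host display area can be sketched as follows. The specification describes the safe zone and the most suspicious zone only qualitatively, so the midpoint split used here, the "intermediate" label, and the function name are assumptions.

```python
def zone(times_flagged, suspicion, max_flagged, max_suspicion):
    """Place a host in a region of the two-dimensional host display area.
    X is the number of abnormal classifications, Y the suspicion score."""
    x = times_flagged / max_flagged if max_flagged else 0.0
    y = suspicion / max_suspicion if max_suspicion else 0.0
    if x < 0.5 and y < 0.5:
        return "safe"              # lower-left area
    if x >= 0.5 and y >= 0.5:
        return "most suspicious"   # upper-right area
    return "intermediate"

print(zone(1, 0.1, max_flagged=10, max_suspicion=1.0))  # safe
print(zone(9, 0.9, max_flagged=10, max_suspicion=1.0))  # most suspicious
```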

In addition, a connection line 609 between a host and a characteristic factor indicates that the host was classified as an abnormal host because of that characteristic factor, and a thicker line indicates that classification as an abnormal host because of that factor occurred more frequently. It is also possible to see a particular host (for example, Host #1) moving from the normal host area to the abnormal host area over time (for example, movement 610).
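
The line thickness can be derived by counting how often each host was flagged because of each characteristic factor. The sketch below assumes a linear scaling between hypothetical `min_width` and `max_width` values; the factor names `pkt_rate` and `dns_fail` are invented for illustration.

```python
from collections import Counter

def edge_widths(flag_events, min_width=1.0, max_width=6.0):
    """Map each (host, factor) pair to a line width that grows with the
    number of times the factor caused the host to be flagged."""
    counts = Counter(flag_events)
    if not counts:
        return {}
    top = max(counts.values())
    span = max_width - min_width
    return {edge: min_width + span * n / top for edge, n in counts.items()}

events = [
    ("Host #1", "pkt_rate"),
    ("Host #1", "pkt_rate"),
    ("Host #1", "dns_fail"),
    ("Host #2", "pkt_rate"),
]
widths = edge_widths(events)
print(widths[("Host #1", "pkt_rate")])  # 6.0
print(widths[("Host #1", "dns_fail")])  # 3.5
```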

FIG. 7 is a flowchart for abnormal behavior recognition according to an embodiment of the present invention. In the following description, explanations that correspond to or duplicate the foregoing are omitted.

Referring to FIG. 7, when network data is collected from the network data collection device in S701, the collected data is stored in S702, statistics on the collected data are computed in S703, and the statistics are stored in the statistical data store in S704. In S705, characteristic factors are extracted from the collected data and the statistical data according to the characteristic factor profile. The characteristic factor profile may be stored in the data store 250 in advance. The stores described in connection with FIG. 7 may be located separately or may be integrated into the data store 250.

In S707, the data from which characteristic factors were extracted is clustered by a clustering algorithm for each set of characteristic factors, and in S708 the clustered result data may be stored in the cluster data store. In S711, the analysis system 200 can analyze the result data after clustering and extract hosts having outlier data not included in any cluster. In this case, the rate of change in cluster shape over time can be analyzed by comparison with the previous cluster shape result (S712), and suspicious hosts can be extracted based on the trend in that rate of change.

In S717, long-term history data and past collected data for the extracted suspect hosts are retrieved, and the data of the suspect hosts within the past time T and the history of past abnormal hosts may be analyzed. In S718, whether a host is abnormal may be determined based on the analysis result. The analysis system 200 can store the determination result in the abnormal host store (or the data store 250) and output it to a screen in S720. This series of processes may be performed periodically at time intervals having a predetermined period (S720).

All embodiments and conditional examples disclosed in this specification are described with the intention of helping a reader of ordinary skill in the art understand the principles and concepts of the present invention, and those skilled in the art will understand that the present invention may be implemented in modified forms without departing from its essential characteristics. Therefore, the disclosed embodiments should be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the scope of equivalents thereof should be construed as being included in the present invention.

Claims

A network traffic analysis system comprising:
a data collection unit for collecting network traffic data;
a characteristic factor extraction unit for extracting network characteristic factors from the collected data;
a cluster generation unit for performing clustering on the extracted characteristic factors;
an analysis unit for analyzing the clusters and extracting outliers; and
an abnormal host determination unit for determining an abnormal host from the outliers.

The system according to claim 1,
wherein the data collection unit is located at a connection point connecting an internal network and an external network.

The system according to claim 1,
wherein the characteristic factors comprise at least one of an IP layer characteristic factor, a transport layer characteristic factor, an application layer characteristic factor, and a behavior characteristic factor.

The system according to claim 1,
wherein the analysis unit extracts, as an outlier host, a host not included in a cluster for the extracted characteristic factors.

The system according to claim 1,
wherein the abnormal host determination unit determines the abnormal host based on a behavior pattern of the outlier host or a geographic database.

The system according to claim 1,
wherein the system periodically performs the abnormal host determination process.

The system according to claim 1,
further comprising an analysis data presentation unit for visually presenting, on a display screen, whether each host on the network is abnormal.

The system according to claim 7,
wherein the display screen comprises:
a characteristic factor display area for displaying a list of characteristic factors;
a host display area indicating whether each host is abnormal; and
a relation display area indicating the relationship between each host and the characteristic factors.

The system according to claim 8,
wherein the host display area is configured as a two-dimensional plane in which one axis represents the number of times each host has been classified as an abnormal host and the other axis represents the result of past analysis for each host.

The system according to claim 1,
wherein the abnormal host determination unit determines the abnormal host based on at least one of a past clustering result and past history data for the extracted outlier host.

A network traffic analysis method for detecting a host performing abnormal behavior, the method comprising:
collecting network traffic data in a network traffic analysis system;
extracting network characteristic factors from the collected data;
performing clustering on the extracted characteristic factors;
analyzing the clusters to extract outliers; and
determining an abnormal host from the outliers.

The method according to claim 11,
wherein the series of steps is performed according to a predetermined period.

The method according to claim 11,
further comprising displaying, on a screen of the system, whether each host on the network is abnormal.

The method according to claim 11,
wherein determining the abnormal host comprises determining the abnormal host based on geographic information obtained from the IP address of the outlier host.

The method according to claim 11,
wherein determining the abnormal host comprises determining the abnormal host based on whether the outlier host periodically transmits a keep-alive message to an external system.