KR20160064710A - Apparatus and method for detecting anomaly intrusion using local deviation factor graph based algorithm

Info

Publication number: KR20160064710A
Application number: KR1020140168643A
Authority: KR
Inventors: 김성열; 은상남; 진치국
Original assignee: 건국대학교 산학협력단
Priority date: 2014-11-28
Filing date: 2014-11-28
Publication date: 2016-06-08
Also published as: KR101693405B1

Abstract

Provided are an apparatus and a method for detecting anomaly intrusions using the LDFGB algorithm, which improve the detection rate and the false positive rate by using the LDFGB algorithm to differentiate the distribution of data nodes and using a local deviation factor to identify outliers. The apparatus comprises: a calculation unit that sets each of a plurality of data items as a node and calculates the distances between the nodes using a Euclidean distance algorithm; a clustering unit that generates clusters by performing a graph-based algorithm on the basis of the inter-node distances calculated by the calculation unit; and a detection unit that detects malicious nodes by performing the LDFGB algorithm on the clusters generated by the clustering unit.

Description

APPARATUS AND METHOD FOR DETECTING ANOMALY INTRUSION USING LOCAL DEVIATION FACTOR GRAPH BASED ALGORITHM

The present invention relates to an apparatus and method for detecting anomaly intrusions using the LDFGB algorithm, and more particularly to an apparatus and method that use the LDFGB algorithm to detect anomaly intrusions on the Internet.

As Internet technology advances, more and more threats arise on the Internet. Accordingly, Internet security technology for guarding against these threats has become an important issue.

Among Internet security technologies, intrusion detection is the most important. Intrusion detection senses anomalous intrusions (attacks) arriving from outside over the Internet, and most Internet security technologies start from intrusion detection.

Various methods have been studied for detecting known and unknown intrusions (attacks) quickly and effectively. In 1987, Denning introduced the first anomaly intrusion detection model capable of detecting known or unknown intrusions (attacks). Since then, a variety of anomaly intrusion detection techniques have been developed, based on machine learning, immunology, data mining, and the like.

The most widely used anomaly intrusion detection technique is data mining. Data mining builds a data model from the data arriving over the Internet and uses that model to detect anomaly intrusions. To detect intrusions successfully, data mining relies on a clustering algorithm.

A clustering algorithm groups nodes that have similar (identical) characteristics. Nodes classified into the same group (i.e., cluster) by the clustering algorithm are more similar to one another than to the nodes of other groups (i.e., clusters).

Among the various clustering algorithms, the one popularly used for anomaly intrusion detection is the K-means algorithm. The K-means algorithm classifies similar data into the same cluster and dissimilar data into different clusters. When the K-means algorithm is used, the user must set the number of clusters, K. Because the user needs basic knowledge of the data to set K, the K-means algorithm is difficult to use when no such knowledge is available.

Other clustering algorithms also have drawbacks. For example, combining simulated annealing with a clustering algorithm requires a large amount of training data, and using a large amount of training data for a clustering algorithm consumes excessive data-processing resources.

Researchers have therefore recently been studying clustering algorithms that avoid excessive resource consumption. One effective class of such algorithms is graph-based clustering. For example, the PBS (predictive block sampling) algorithm measures the similarity of data points on the basis of an approximation function. However, the PBS algorithm has a relatively low detection rate, which makes it difficult to use for anomaly intrusion detection. The LDC (linear discriminant classifiers) algorithm improves the speed of anomaly intrusion detection, but it cannot analyze the distribution of data nodes accurately and comprehensively.

The graph-based cluster algorithm is a type of clustering algorithm commonly used to partition a data set automatically into a plurality of clusters. It controls the clustering result through a cluster-precision parameter. The algorithm wraps each record of the data set as a node, and the nodes are treated as the vertices of a complete undirected graph. The weight of each edge represents the distance between its two nodes and is calculated with the Euclidean distance function.

The graph-based cluster algorithm builds a distance matrix from the distance values calculated with the Euclidean distance function. It then calculates a threshold (δ) from the cluster-precision parameter (α).

The graph-based cluster algorithm traverses the whole graph and classifies nodes that are joined by an edge into the same cluster. It thereby generates a plurality of subgraphs, each of which represents a cluster. The algorithm then handles the outliers using the generated graph.

Graph-based cluster algorithms have been in use for decades, but they have two drawbacks when applied to intrusion detection. First, their clustering precision is low, because they distinguish normal clusters from abnormal clusters by a threshold alone. Second, they discard outliers without providing a suitable way to label them. For these reasons, graph-based cluster algorithms fail to achieve a high detection rate.

Korean Patent Laid-Open Publication No. 10-2004-0012285 (Title: System and method for anomalous-behavior intrusion detection using a hidden Markov model)

The present invention has been proposed to solve the above problems of the prior art. Its object is to provide an apparatus and method for detecting anomaly intrusions using the LDFGB (local deviation factor graph based) algorithm, in which the LDFGB algorithm is used to differentiate the distribution of data nodes and a local deviation factor is used to identify outliers, thereby improving the detection rate and the false positive rate.

To achieve the above object, an anomaly intrusion detection apparatus using the LDFGB algorithm according to an embodiment of the present invention includes: a calculation unit that sets each of a plurality of data items as a node and calculates the distance between each pair of nodes using a Euclidean distance algorithm; a clustering unit that generates clusters by performing a graph-based algorithm on the basis of the inter-node distances calculated by the calculation unit; and a detection unit that detects malicious nodes by performing the LDFGB algorithm on the clusters generated by the clustering unit.

The calculation unit generates a distance matrix from the calculated inter-node distances.

The clustering unit calculates the threshold as the sum of the minimum value of the distance matrix and the product of the cluster precision and the difference between the maximum and minimum values of the distance matrix.

The clustering unit removes every edge larger than the threshold from the graph formed on the basis of the inter-node distances, classifies the nodes included in the resulting traversed graph into the same cluster, and treats nodes not included in the traversed graph as outliers.

The detection unit sorts the clusters generated by the clustering unit in descending order, initializes the normal clusters, suspicious clusters, and abnormal clusters, and compares each cluster with the product of the number of data sets and the percentage of the normal-to-abnormal ratio, thereby classifying the nodes of each cluster into one of the normal cluster (CN), the suspicious cluster (CS), and the abnormal cluster (CA).

The detection unit calculates the node local deviation factor of each target node, classifies the target node with the largest node local deviation factor into the abnormal cluster, and classifies the remaining target nodes into the normal cluster.

The detection unit classifies target nodes assigned to the normal cluster as normal and target nodes assigned to the abnormal cluster as abnormal.

The detection unit calculates the node local deviation factor as the node local deviation rate divided by the node local deviation influence rate.

The detection unit calculates the node local deviation rate using the distance between a node and the center of mass of the node and the number of nodes contained in the circle of radius K centered on the node, and calculates the node local deviation influence rate on the basis of the node local deviation rate and the node K-distance neighborhood.

To achieve the above object, an anomaly intrusion detection method using the LDFGB algorithm according to an embodiment of the present invention includes: calculating, by an anomaly intrusion detection apparatus, inter-node distances using a plurality of data items and a Euclidean distance algorithm; generating, by the anomaly intrusion detection apparatus, clusters by performing a graph-based algorithm on the basis of the calculated inter-node distances; and detecting, by the anomaly intrusion detection apparatus, malicious nodes by performing the LDFGB algorithm on the generated clusters.

In the step of calculating the inter-node distances, the anomaly intrusion detection apparatus generates a distance matrix from the calculated inter-node distances.

The step of generating the clusters includes, each performed by the anomaly intrusion detection apparatus: receiving a cluster precision; generating a graph using the plurality of nodes; calculating a threshold on the basis of the cluster precision and the distance matrix; removing from the generated graph the edges larger than the threshold; classifying the nodes included in the traversed graph, from which the edges have been removed, into the same cluster; and treating nodes not included in the traversed graph as outliers.

In the step of calculating the threshold, the anomaly intrusion detection apparatus calculates the threshold as the sum of the minimum value of the distance matrix and the product of the cluster precision and the difference between the maximum and minimum values of the distance matrix.

The step of detecting the malicious nodes includes, each performed by the anomaly intrusion detection apparatus: sorting the clusters generated in the cluster-generating step in descending order; initializing the normal clusters, suspicious clusters, and abnormal clusters; and comparing each of the sorted clusters with the product of the number of data sets and the percentage of the normal-to-abnormal ratio, thereby classifying the nodes of each cluster into one of the normal cluster (CN), the suspicious cluster (CS), and the abnormal cluster (CA).

The step of detecting the malicious nodes further includes, each performed by the anomaly intrusion detection apparatus: calculating the node local deviation factor of the target nodes that were classified into the suspicious cluster in the classifying step; classifying the target node with the largest calculated node local deviation factor into the abnormal cluster; and classifying the target nodes not classified into the abnormal cluster into the normal cluster.

The step of detecting the malicious nodes further includes, each performed by the anomaly intrusion detection apparatus: classifying the nodes assigned to the normal cluster as normal; and classifying the nodes assigned to the abnormal cluster as abnormal.

In the step of calculating the node local deviation factor, the anomaly intrusion detection apparatus calculates the node local deviation factor as the node local deviation rate divided by the node local deviation influence rate.

The step of calculating the node local deviation factor includes, each performed by the anomaly intrusion detection apparatus: calculating the node local deviation rate using the distance between a node and the center of mass of the node and the number of nodes contained in the circle of radius K centered on the node; and calculating the node local deviation influence rate on the basis of the node local deviation rate and the node K-distance neighborhood.

According to the present invention, the apparatus and method for detecting anomaly intrusions using the LDFGB algorithm use the LDFGB algorithm to differentiate the distribution of data nodes and use a local deviation factor to identify outliers, and can thereby improve the detection rate and the false positive rate of anomaly intrusion detection.

In addition, by using the LDFGB algorithm to differentiate the distribution of data nodes and a local deviation factor to identify outliers, the apparatus and method can detect clusters of various shapes that correspond to anomaly intrusions and can improve the detection rate for both known and unknown intrusions (i.e., attacks).

In addition, the apparatus and method use the graph-based cluster algorithm to obtain an initial partition of the data set according to the cluster-precision parameter and use an outlier detection algorithm to process the result of the graph-based cluster algorithm, and can thereby improve the detection rate and the false positive rate for anomaly intrusions.

In addition, the apparatus and method prevent the generation of excessive clusters by adjusting the cluster precision (α), and can thereby solve the conventional problem of excessive resource consumption in the data processing required for anomaly intrusion detection.

FIG. 1 is a block diagram illustrating an anomaly intrusion detection apparatus using the LDFGB algorithm according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the calculation unit of FIG. 1.
FIG. 3 is a diagram illustrating the clustering unit of FIG. 1.
FIGS. 4 to 6 are diagrams illustrating the detection unit of FIG. 1.
FIG. 7 is a flowchart illustrating an anomaly intrusion detection method using the LDFGB algorithm according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating the step of calculating the inter-node distances in FIG. 7.
FIG. 9 is a diagram illustrating the step of generating the clusters in FIG. 7.
FIG. 10 is a diagram illustrating the step of detecting the malicious nodes in FIG. 7.
FIGS. 11 to 17 are diagrams comparing the performance of the anomaly intrusion detection apparatus and method using the LDFGB algorithm according to an embodiment of the present invention with that of an anomaly intrusion detection technique using the conventional LDC algorithm.

Hereinafter, the most preferred embodiment of the present invention is described with reference to the accompanying drawings, in enough detail that a person of ordinary skill in the art to which the invention pertains can readily practice its technical idea. In assigning reference numerals to the components of each drawing, the same components are given the same numerals wherever possible, even when they appear in different drawings. In describing the present invention, detailed descriptions of related known configurations or functions are omitted where they might obscure the gist of the invention.

An anomaly intrusion detection apparatus using the LDFGB algorithm according to an embodiment of the present invention is described in detail below with reference to the accompanying drawings. FIG. 1 is a block diagram illustrating the apparatus. FIG. 2 is a diagram illustrating the calculation unit of FIG. 1, FIG. 3 is a diagram illustrating the clustering unit of FIG. 1, and FIGS. 4 to 6 are diagrams illustrating the detection unit of FIG. 1.

To achieve a high detection rate, the apparatus and method propose an improved graph-based clustering algorithm that applies an outlier detection method based on the local deviation factor during label processing. The method focuses mainly on classifying data near cluster boundaries more accurately and on increasing the difference between normal clusters and abnormal clusters.

To this end, as shown in FIG. 1, the anomaly intrusion detection apparatus 100 using the LDFGB algorithm includes a calculation unit 120, a clustering unit 140, and a detection unit 160.

The calculation unit 120 receives a plurality of data items that are the targets of intrusion detection and sets each of them as a node. For the nodes thus set, the calculation unit 120 calculates the distance between each pair of nodes using the Euclidean distance algorithm. That is, the calculation unit 120 calculates the inter-node distance d(i,j) using Equation 1 (the Euclidean distance function).

The calculation unit 120 generates a distance matrix (see FIG. 2) from the calculated inter-node distances.
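The distance-matrix step can be sketched as follows. This is a minimal illustration, assuming each record has already been reduced to a tuple of numeric features; the function names and the sample records are hypothetical and do not come from the patent.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(nodes):
    """Symmetric matrix of pairwise distances; the diagonal stays zero."""
    n = len(nodes)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = euclidean_distance(nodes[i], nodes[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix


# Hypothetical records, each already reduced to numeric features.
records = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
dist = distance_matrix(records)
```

Only the upper triangle is computed and then mirrored, since the distance from node i to node j equals the distance from node j to node i.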

The clustering unit 140 generates clusters by performing a graph-based algorithm on the basis of the inter-node distances calculated by the calculation unit 120. The flow of the graph-based algorithm used by the clustering unit 140 is shown in FIG. 3 and is described in detail below.

The clustering unit 140 receives a cluster precision. It generates a graph using the plurality of nodes set by the calculation unit 120. It then calculates the threshold (δ) from the received cluster precision and the distance matrix generated by the calculation unit 120, using Equation 2, that is, δ = dis_min + Cluster Precision × (dis_max − dis_min).

Here, dis_min is the minimum value in the distance matrix and dis_max is the maximum value in the distance matrix. The cluster precision is a value entered by the user.
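As a rough sketch of the threshold calculation: the threshold is the minimum of the distance matrix plus the cluster precision times the spread between maximum and minimum. The function name and the sample matrix are hypothetical, and excluding the zero diagonal when taking the minimum is an assumption, since the text does not say how self-distances are handled.

```python
def cluster_threshold(matrix, cluster_precision):
    """delta = dis_min + cluster_precision * (dis_max - dis_min)."""
    n = len(matrix)
    # Assumption: only off-diagonal entries are considered, so the zero
    # self-distances on the diagonal do not become dis_min.
    values = [matrix[i][j] for i in range(n) for j in range(i + 1, n)]
    dis_min, dis_max = min(values), max(values)
    return dis_min + cluster_precision * (dis_max - dis_min)


# Hypothetical 3-node distance matrix.
example = [
    [0.0, 2.0, 10.0],
    [2.0, 0.0, 6.0],
    [10.0, 6.0, 0.0],
]
delta = cluster_threshold(example, 0.5)
```

With a cluster precision of 0 the threshold collapses to the smallest distance, and with 1 it equals the largest, so the parameter moves the cut between the two extremes.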

The clustering unit 140 removes from the graph formed between the nodes every edge larger than the threshold (δ). It classifies the nodes included in the traversed graph into the same cluster and defines the traversed graph as a subgraph. It treats the remaining nodes as outliers. Once outlier processing has been completed for all nodes, the clustering unit 140 outputs a plurality of clusters (C1, C2, ..., Cn) and the outliers.
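One way to read the edge-removal and traversal step is as a connected-components search on the thresholded graph. The sketch below follows that reading; treating a node left with no edges as an outlier is an assumption about what "not included in the traversed graph" means, and the function name and sample data are hypothetical.

```python
def graph_clusters(matrix, delta):
    """Split nodes into connected clusters after dropping edges > delta."""
    n = len(matrix)
    # Keep only the edges whose weight does not exceed the threshold.
    adjacency = {
        i: [j for j in range(n) if j != i and matrix[i][j] <= delta]
        for i in range(n)
    }
    seen = set()
    clusters, outliers = [], []
    for start in range(n):
        if start in seen:
            continue
        # Depth-first traversal collects one connected subgraph.
        component, stack = [], [start]
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        if len(component) == 1:
            # Assumption: an isolated node is treated as an outlier.
            outliers.append(start)
        else:
            clusters.append(sorted(component))
    return clusters, outliers


# Hypothetical one-dimensional data: two dense groups and one stray point.
positions = [0, 1, 2, 10, 11, 30]
matrix = [[abs(a - b) for b in positions] for a in positions]
clusters, outliers = graph_clusters(matrix, 1.5)
```

In this example the nodes at 0, 1, and 2 form one cluster, the nodes at 10 and 11 form another, and the node at 30 is left without edges.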

To detect malicious nodes by performing the LDFGB algorithm on the clusters generated by the clustering unit 140, the detection unit 160 uses the following definitions.

Definition 1: Outlier

An outlier is defined as an object that lies relatively on the outskirts of its own local region with respect to the density of that region. Here, an outlier denotes a malicious node. For example, as shown in FIG. 4, when nodes cluster together to form dense local regions (C1, C2), the nodes (O1, O2) located relatively on the outskirts of the local regions (C1, C2) are defined as outliers.

Definition 2: K distance of an object p

For a positive integer K, the K distance of node p is denoted k-distance(p). The distance between node p and a node o ∈ D is denoted d(p,o). The K distance of node p satisfies the condition of Equation 3.

For example, when the nodes are distributed as shown in FIG. 5 and K is 2, the distance between node O and node a equals the distance between node O and node c, the distance between node O and node e is less than the distance between node O and node c, and the distances from node O to node b and to node d both exceed the distance between node O and node c. Accordingly, the K distance of node O is the distance between node O and node c, and nodes a, c, and e lie within it.

Definition 3: K-distance neighborhood of an object

Given the K distance of node p as defined above, the node K-distance neighborhood contains every node whose distance from node p is less than the K distance of node p. A node q that satisfies Equation 4 is defined as a k-nearest neighbor of p.
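Definitions 2 and 3 can be illustrated with a short sketch. It assumes the usual reading in which the K distance is the distance to the K-th nearest other node, and it includes nodes lying exactly at the K distance in the neighborhood, which matches the FIG. 5 example where nodes a, c, and e are all counted. Both choices are assumptions, because Equations 3 and 4 are not reproduced in the text. The function names and the distance values are hypothetical.

```python
def k_distance(distances, k):
    """Distance from the reference node to its k-th nearest other node.

    `distances` maps each other node's name to its distance from the
    reference node.
    """
    return sorted(distances.values())[k - 1]


def k_distance_neighborhood(distances, k):
    """Names of all nodes no farther away than the k-distance."""
    radius = k_distance(distances, k)
    # Assumption: nodes exactly at the k-distance are included.
    return sorted(name for name, d in distances.items() if d <= radius)


# Hypothetical distances from node O, arranged like the FIG. 5 example:
# d(O,e) < d(O,a) = d(O,c) < d(O,b) < d(O,d).
from_o = {"a": 2.0, "b": 3.0, "c": 2.0, "d": 4.0, "e": 1.0}
radius = k_distance(from_o, 2)
neighbors = k_distance_neighborhood(from_o, 2)
```

Because of the tie between a and c, the neighborhood for K = 2 holds three nodes rather than two.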

Definition 4: Local deviation rate of an object

For the node local deviation rate, the K distance of node p is given and node p is assumed to be the center of a circle of radius K. All nodes inside this circle are k-distance neighbors of node p. Since node p is the center of mass of this circle, the local deviation rate (LDR) is defined by Equation 5.

Here, dis(p,p') is the distance between node p and the center of mass p' of node p, and |N_k-distance(p)| is the number of nodes in the circle of radius K centered on node p.
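The two quantities named here can be computed as in the sketch below. It deliberately stops short of combining them into an LDR value, because Equation 5 is not reproduced in the text. Taking p' as the mean position of the neighbors of p (excluding p itself) is an assumption, and the function names and coordinates are hypothetical.

```python
import math


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    dims = len(points[0])
    return tuple(sum(pt[d] for pt in points) / len(points) for d in range(dims))


def ldr_ingredients(p, neighbors):
    """Return (dis(p, p'), |N_k-distance(p)|) for node p.

    Assumption: p' is the center of mass of the k-distance neighbors of p.
    """
    p_prime = centroid(neighbors)
    return math.dist(p, p_prime), len(neighbors)


# Hypothetical node and neighbors in two dimensions.
node = (0.0, 0.0)
neighborhood = [(3.0, 0.0), (3.0, 8.0)]
offset, count = ldr_ingredients(node, neighborhood)
```

A node surrounded evenly by its neighbors sits close to their center of mass, whereas a node on the edge of a cluster is pulled away from it, which is the intuition behind using this offset to measure deviation.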

Definition 5: Local deviation influence rate of a node (local deviation influence rate of an object)

Given the K-distance neighborhood N_{k-distance}(p) and the local deviation rate (LDR), the local deviation influence rate (LDIR) of a node is defined as in Equation 6 below.

Definition 6: Local deviation factor of a node (local deviation factor of an object)

Given the local deviation rate (LDR) and the local deviation influence rate (LDIR), the local deviation factor (LDF) of a node is defined as in Equation 7 below. The LDF is used to detect outliers: a high LDF means the node is likely to be an outlier, and a low LDF means the density around the node is high.

The detection unit 160 detects outliers using the LDFGB algorithm, as described below with reference to FIG. 6.

In Step 1, the detection unit 160 sorts the clusters (C1, C2, ..., Cn) generated by the clustering unit 140 in descending order.

In Step 2, the detection unit 160 initializes the normal clusters (CN), the suspicious clusters (CS), and the abnormal clusters (CA) used to partition the clusters. The suspicious clusters are the data set that must be processed in the following steps.

In Step 3, the detection unit 160 compares each cluster with the values obtained by multiplying the number of data sets (M) by the normal/abnormal ratio percentiles (λ1 and λ2, where λ1 + λ2 = 1), and classifies the nodes in each cluster into the normal clusters (CN), the suspicious clusters (CS), or the abnormal clusters (CA). Because the method presupposes that normal behavior occurs far more often than intrusion behavior, the condition λ1 >> λ2 must hold. If this precondition is not satisfied, outlying points are not discarded but are classified into the abnormal clusters.

In Step 4, the detection unit 160 calculates the local deviation factor (LDF) of each target node p using Equations 5 to 7 above. Here, a target node p is a node belonging to a cluster classified as suspicious (CS). The detection unit 160 arranges the target nodes in descending order of LDF, classifies the target node with the largest LDF into the abnormal clusters (CA), and classifies the remaining target nodes into the normal clusters (CN).

In Step 5, the detection unit 160 classifies the nodes in the normal clusters (CN) as normal and the nodes in the abnormal clusters (CA) as abnormal. The detection unit 160 ends detection once all nodes have been classified.

An abnormal intrusion detection method using the LDFGB algorithm according to an embodiment of the present invention is described in detail below with reference to the accompanying drawings. FIG. 7 is a flowchart of the method. FIG. 8 illustrates the step of FIG. 7 that calculates the inter-node distances, FIG. 9 illustrates the step of FIG. 7 that generates the clusters, and FIG. 10 illustrates the step of FIG. 7 that detects malicious nodes.

The abnormal intrusion detection apparatus using the LDFGB algorithm (100; hereinafter, the abnormal intrusion detection apparatus 100) calculates the inter-node distances from the plurality of data subject to intrusion detection using the Euclidean distance function (Equation 1) (S100), as described below with reference to FIG. 8.

The abnormal intrusion detection apparatus 100 receives the plurality of data subject to intrusion detection (S110).

The abnormal intrusion detection apparatus 100 sets each item of the input data as a node (S130).

The abnormal intrusion detection apparatus 100 calculates the distance d(i, j) between each pair of the set nodes using the Euclidean distance function (Equation 1) (S150).

The abnormal intrusion detection apparatus 100 then generates a distance matrix from the calculated inter-node distances (S170).
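Steps S110 to S170 can be sketched as follows. This is a minimal illustration, assuming that Equation 1 (not reproduced in this section) is the standard Euclidean distance and that each data item is a fixed-length numeric tuple; the function name `distance_matrix` is hypothetical.

```python
import math

def distance_matrix(data):
    """Build the symmetric matrix D with D[i][j] = d(i, j) (S150, S170)."""
    n = len(data)
    D = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Standard Euclidean distance between node i and node j.
            d = math.dist(data[i], data[j])
            D[i][j] = D[j][i] = d
    return D

# Each input record becomes one node (S130).
D = distance_matrix([(0, 0), (3, 4), (0, 1)])
print(D[0][1])
```

Only the upper triangle is computed because the Euclidean distance is symmetric; the diagonal stays at zero.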

The abnormal intrusion detection apparatus 100 generates clusters by running a graph-based algorithm on the calculated inter-node distances (S200), as described below with reference to FIG. 9.

The abnormal intrusion detection apparatus 100 receives the cluster precision as input (S210).

The abnormal intrusion detection apparatus 100 generates a graph from the previously set nodes (S220).

The abnormal intrusion detection apparatus 100 calculates a threshold (δ) from the input cluster precision and the previously generated distance matrix (S230). Specifically, the threshold (δ) is calculated from the minimum and maximum values of the distance matrix and the cluster precision.

The abnormal intrusion detection apparatus 100 removes from the graph formed between the nodes every edge larger than the threshold (δ) (S240).

The abnormal intrusion detection apparatus 100 classifies the nodes included in a horizontal graph into the same cluster (S250) and defines the horizontal graph as a subgraph (S260).

The abnormal intrusion detection apparatus 100 treats the remaining nodes (that is, nodes not included in any horizontal graph) as outliers (S270). The abnormal intrusion detection apparatus 100 thus outputs a plurality of clusters (C1, C2, ..., Cn) and the outliers.
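Steps S210 to S270 can be sketched in Python as below. The threshold follows the formula stated in the claims (the minimum of the distance matrix plus the cluster precision times the difference between maximum and minimum). Three points are assumptions rather than statements of the patent: the minimum and maximum are taken over off-diagonal entries only, a "horizontal graph" is interpreted as a connected component that survives edge removal, and a node left with no edges is reported as an outlier. The function name `graph_cluster` is hypothetical.

```python
def graph_cluster(D, alpha):
    """Return (clusters, outliers, delta) for distance matrix D and precision alpha."""
    n = len(D)
    pair = [D[i][j] for i in range(n) for j in range(i + 1, n)]
    d_min, d_max = min(pair), max(pair)
    delta = d_min + alpha * (d_max - d_min)  # threshold (S230)

    parent = list(range(n))  # union-find forest over the nodes

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Keep only edges no larger than delta (S240) and merge their endpoints.
    for i in range(n):
        for j in range(i + 1, n):
            if D[i][j] <= delta:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    clusters = [g for g in groups.values() if len(g) > 1]       # S250, S260
    outliers = [g[0] for g in groups.values() if len(g) == 1]   # S270
    return clusters, outliers, delta

xs = [0, 1, 2, 10, 11, 30]  # hypothetical one-dimensional data
D = [[abs(a - b) for b in xs] for a in xs]
clusters, outliers, delta = graph_cluster(D, 0.05)
print(clusters, outliers)
```

A union-find structure is used here only because it finds connected components without building an explicit adjacency list; any graph traversal would serve equally well.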

The abnormal intrusion detection apparatus 100 detects malicious nodes by running the LDFGB algorithm on the generated clusters (S300), as described below with reference to FIG. 10.

The abnormal intrusion detection apparatus 100 sorts the previously generated clusters in descending order (S310).

The abnormal intrusion detection apparatus 100 initializes the normal clusters (CN), the suspicious clusters (CS), and the abnormal clusters (CA) used to partition the clusters (S320).

For the clusters sorted in descending order, the abnormal intrusion detection apparatus 100 compares each cluster with the values obtained by multiplying the number of data sets (M) by the normal/abnormal ratio percentiles (λ1 and λ2, where λ1 + λ2 = 1), and classifies the nodes in each cluster into one of the normal clusters (CN), the suspicious clusters (CS), and the abnormal clusters (CA) (S330). Because the method presupposes that normal behavior occurs far more often than intrusion behavior, the condition λ1 >> λ2 must hold. If this precondition is not satisfied, outlying points are not discarded but are classified into the abnormal clusters.

The abnormal intrusion detection apparatus 100 calculates the local deviation factor (LDF) of each target node p (S340), using Equations 5 to 7 above. Here, a target node p is a node belonging to a cluster classified as suspicious (CS).

The abnormal intrusion detection apparatus 100 arranges the target nodes in descending order of their calculated LDF (S350).

The abnormal intrusion detection apparatus 100 classifies the target node with the largest LDF into the abnormal clusters (CA) (S360).

The abnormal intrusion detection apparatus 100 classifies the remaining target nodes into the normal clusters (CN) (S370).

The abnormal intrusion detection apparatus 100 classifies the nodes in the normal clusters (CN) as normal (S380).

The abnormal intrusion detection apparatus 100 classifies the nodes in the abnormal clusters (CA) as abnormal (S390). The detection unit 160 ends detection once all nodes have been classified.
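The final labeling steps (S350 to S390) can be sketched as follows. Equations 5 to 7 are not reproduced in this text, so the sketch takes the LDR and LDIR values as given and only applies the relation stated in the claims, namely that the LDF is the LDR divided by the LDIR. The function names, the dictionary-based interface, and the tie-breaking rule (the first node in input order wins when LDF values are equal) are assumptions made for illustration.

```python
def local_deviation_factor(ldr, ldir):
    """LDF as the ratio LDR / LDIR, per the relation stated in the claims."""
    return ldr / ldir

def label_suspicious_nodes(ldf_by_node, normal_nodes=(), abnormal_nodes=()):
    """Return {node: "normal" | "abnormal"} following S350 to S390."""
    cn = set(normal_nodes)    # nodes already in normal clusters (CN)
    ca = set(abnormal_nodes)  # nodes already in abnormal clusters (CA)
    # S350: order the suspicious nodes by LDF, largest first.
    ranked = sorted(ldf_by_node, key=ldf_by_node.get, reverse=True)
    if ranked:
        ca.add(ranked[0])      # S360: largest LDF goes to CA
        cn.update(ranked[1:])  # S370: the rest go to CN
    labels = {node: "normal" for node in cn}           # S380
    labels.update({node: "abnormal" for node in ca})   # S390
    return labels

# Hypothetical LDF values for three suspicious nodes.
labels = label_suspicious_nodes(
    {"p1": 0.4, "p2": 2.7, "p3": 0.9},
    normal_nodes=["n1"],
    abnormal_nodes=["x1"],
)
print(labels["p2"], labels["p1"])
```

As described in the text, only the single node with the largest LDF is moved to the abnormal clusters; a variant that flags every node above a cutoff would be a different design and is not what this sketch shows.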

Experimental results for the abnormal intrusion detection apparatus and method using the LDFGB algorithm according to an embodiment of the present invention are described below with reference to the accompanying drawings. FIGS. 11 to 17 compare the performance of the apparatus and method according to the embodiment with that of a conventional abnormal intrusion detection technique using the LDC algorithm.

To evaluate the performance of the abnormal intrusion detection apparatus and method using the LDFGB algorithm, the results of an experiment on the two-dimensional artificial data set shown in FIG. 11 are described as an example.

FIG. 12 shows data nodes distributed in two different situations. When data nodes with these distributions are the subject of intrusion detection, the conventional LDC algorithm cannot distinguish the two situations. In contrast, the abnormal intrusion detection apparatus and method using the LDFGB algorithm can distinguish the two situations by adjusting the value of K. The apparatus and method using the LDFGB algorithm therefore detect abnormal intrusions better than the conventional LDC algorithm.

To evaluate the algorithm's ability to detect unknown intrusions, 10,000 samples are randomly selected from the training data set, and 2,500 intrusion samples of types different from those in the training data set are randomly selected. FIG. 13 shows the clustering results as the cluster precision (α) is adjusted. As the change in the number of clusters shows, a relatively large cluster precision (α) leads to a small number of clusters.

As a result, too much data is grouped into one large class, and most abnormal intrusions cannot be detected in this situation.

Conversely, classification with a small cluster precision (α) can produce an excessive number of clusters.

The abnormal intrusion detection apparatus and method using the LDFGB algorithm can therefore prevent the generation of excessive clusters by adjusting the cluster precision (α), which resolves the problem of excessive resource consumption in data processing.

FIG. 14 shows the performance of the conventional abnormal intrusion detection technique using the LDC algorithm, and FIG. 15 shows the performance of the abnormal intrusion detection apparatus and method using the LDFGB algorithm according to the embodiment of the present invention. FIGS. 14 and 15 show the detection rate and the false positive rate for intrusion detection on the KDD 99 data set.
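The two metrics can be computed as in the following sketch. The patent text does not spell out their formulas, so the commonly used definitions are assumed here: the detection rate is the fraction of actual intrusions flagged as abnormal, and the false positive rate is the fraction of normal records flagged as abnormal. The function name and the boolean encoding (True for intrusion) are hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_positive_rate); True marks an intrusion."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            tp += 1  # intrusion correctly flagged
        elif truth and not pred:
            fn += 1  # intrusion missed
        elif not truth and pred:
            fp += 1  # normal record wrongly flagged
        else:
            tn += 1  # normal record correctly passed
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate

# Hypothetical labels: 3 intrusions and 5 normal records.
y_true = [True, True, True, False, False, False, False, False]
y_pred = [True, True, False, True, False, False, False, False]
dr, fpr = detection_metrics(y_true, y_pred)
print(round(dr, 3), round(fpr, 3))
```

A higher detection rate and a lower false positive rate both indicate better performance, which is the sense in which the two methods are compared.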

Because the best case for the graph-based algorithm is that all data are divided into nine subsets, the cluster precision (α) is fixed at 0.05. To satisfy the premise of abnormal intrusion detection that normal behavior is far more frequent than intrusion behavior, the parameters λ1 and λ2 are tested in the ranges (0.9, 1.0) and (0.0, 0.1), respectively. Finally, the value of K is varied, and these model configurations are tested on group 3. The experimental results show that the abnormal intrusion detection apparatus and method using the LDFGB algorithm perform best when K = 9, λ1 = 0.95, and λ2 = 0.05.

Referring also to FIGS. 14 and 15, for the same value of K, the abnormal intrusion detection apparatus and method using the LDFGB algorithm outperform the abnormal intrusion detection technique using the LDC algorithm in both detection rate and false positive rate. FIGS. 16 and 17 likewise show that the apparatus and method using the LDFGB algorithm perform better than the conventional technique using the LDC algorithm.

This shows that the parameter K is an important factor in the algorithm's performance. K should not be set too large, because a relatively large K can cause many isolated points to be classified as normal. A relatively small K, on the other hand, leads to the largest number of records having large LDF values, so abnormal data cannot be separated out of the suspicious clusters. Both situations reduce the cluster precision.

As described above, the abnormal intrusion detection apparatus and method using the LDFGB algorithm differentiate the distribution of data nodes using the LDFGB algorithm and identify outliers using the local deviation factor, which improves both the detection rate and the false positive rate of abnormal intrusion detection.

In addition, the abnormal intrusion detection apparatus and method using the LDFGB algorithm prevent the generation of excessive clusters by adjusting the cluster precision (α), which resolves the conventional problem of excessive resource consumption in data processing for abnormal intrusion detection.

Although preferred embodiments of the present invention have been described above, the invention can be modified in various forms, and it is understood that a person of ordinary skill in the art can practice various variations and modifications without departing from the scope of the claims of the present invention.

100: Abnormal intrusion detection apparatus using the LDFGB algorithm
120: Calculation unit
140: Clustering unit
160: Detection unit

Claims

An abnormal intrusion detection apparatus using an LDFGB algorithm, comprising:
a calculation unit that sets a plurality of data as nodes and calculates the distances between the set nodes using a Euclidean distance algorithm;
a clustering unit that generates clusters by performing a graph-based algorithm on the inter-node distances calculated by the calculation unit; and
a detection unit that detects malicious nodes by performing the LDFGB (local deviation factor graph based) algorithm on the clusters generated by the clustering unit.

The apparatus of claim 1,
wherein the calculation unit generates a distance matrix based on the calculated inter-node distances.

The apparatus of claim 2,
wherein the clustering unit calculates, as a threshold, the sum of the minimum value of the distance matrix and the value obtained by multiplying the difference between the maximum and minimum values of the distance matrix by a cluster precision.

The apparatus of claim 3,
wherein the clustering unit classifies into the same cluster the nodes included in a horizontal graph obtained by removing, from the graph formed based on the inter-node distances, all edges larger than the threshold, and treats nodes not included in the horizontal graph as outliers.

The apparatus of claim 1,
wherein the detection unit sorts the clusters generated by the clustering unit in descending order, initializes normal clusters, suspicious clusters, and abnormal clusters, compares each cluster with the values obtained by multiplying the number of data sets by the normal/abnormal ratio percentiles, and classifies the nodes included in each cluster into one of the normal clusters (CN), the suspicious clusters (CS), and the abnormal clusters (CA).

The apparatus of claim 5,
wherein the detection unit calculates the node local deviation factor of each target node, classifies the target node having the largest node local deviation factor into the abnormal clusters, and classifies the remaining target nodes into the normal clusters.

The apparatus of claim 6,
wherein the detection unit classifies target nodes classified into the normal clusters as normal and target nodes classified into the abnormal clusters as abnormal.

The apparatus of claim 6,
wherein the detection unit calculates, as the node local deviation factor, the value obtained by dividing the node local deviation rate by the node local deviation influence rate.

The apparatus of claim 8,
wherein the detection unit calculates the node local deviation rate using the distance between the node and its center of mass and the number of nodes included in the circle centered at the node with radius K, and calculates the node local deviation influence rate based on the node local deviation rate and the K-distance neighborhood of the node.

An abnormal intrusion detection method using an LDFGB algorithm, comprising:
calculating, by an abnormal intrusion detection apparatus, inter-node distances using a plurality of data and a Euclidean distance algorithm;
generating, by the abnormal intrusion detection apparatus, clusters by performing a graph-based algorithm on the calculated inter-node distances; and
detecting, by the abnormal intrusion detection apparatus, malicious nodes by performing the LDFGB algorithm on the generated clusters.

The method of claim 10,
wherein, in calculating the inter-node distances, the abnormal intrusion detection apparatus generates a distance matrix based on the calculated inter-node distances.

The method of claim 11,
wherein generating the clusters comprises:
receiving, by the abnormal intrusion detection apparatus, a cluster precision;
generating, by the abnormal intrusion detection apparatus, a graph using the plurality of nodes;
calculating, by the abnormal intrusion detection apparatus, a threshold based on the cluster precision and the distance matrix;
removing, by the abnormal intrusion detection apparatus, edges larger than the threshold from the generated graph;
classifying, by the abnormal intrusion detection apparatus, the nodes included in a horizontal graph from which the edges have been removed into the same cluster; and
treating, by the abnormal intrusion detection apparatus, nodes not included in the horizontal graph as outliers.

The method of claim 12,
wherein, in calculating the threshold, the abnormal intrusion detection apparatus calculates, as the threshold, the sum of the minimum value of the distance matrix and the value obtained by multiplying the difference between the maximum and minimum values of the distance matrix by the cluster precision.

The method of claim 10,
wherein detecting the malicious nodes comprises:
sorting, by the abnormal intrusion detection apparatus, the generated clusters in descending order;
initializing, by the abnormal intrusion detection apparatus, normal clusters, suspicious clusters, and abnormal clusters; and
comparing, by the abnormal intrusion detection apparatus, each cluster with the values obtained by multiplying the number of data sets by the normal/abnormal ratio percentiles, and classifying the nodes included in each cluster into one of the normal clusters (CN), the suspicious clusters (CS), and the abnormal clusters (CA).

The method of claim 14,
wherein detecting the malicious nodes further comprises:
calculating, by the abnormal intrusion detection apparatus, the node local deviation factor of the target nodes classified into the suspicious clusters in the classifying step;
classifying, by the abnormal intrusion detection apparatus, the target node having the largest calculated node local deviation factor into the abnormal clusters; and
classifying, by the abnormal intrusion detection apparatus, target nodes not classified into the abnormal clusters into the normal clusters.

The method of claim 15,
wherein detecting the malicious nodes further comprises:
classifying, by the abnormal intrusion detection apparatus, nodes classified into the normal clusters as normal; and
classifying, by the abnormal intrusion detection apparatus, nodes classified into the abnormal clusters as abnormal.

The method of claim 14,
wherein, in calculating the node local deviation factor, the abnormal intrusion detection apparatus calculates, as the node local deviation factor, the value obtained by dividing the node local deviation rate by the node local deviation influence rate.

The method of claim 17,
wherein calculating the node local deviation factor comprises:
calculating, by the abnormal intrusion detection apparatus, the node local deviation rate using the distance between the node and its center of mass and the number of nodes included in the circle centered at the node with radius K; and
calculating, by the abnormal intrusion detection apparatus, the node local deviation influence rate based on the node local deviation rate and the K-distance neighborhood of the node.