KR102247181B1

KR102247181B1 - Method and device for generating anomalous behavior detection model using learning data generated based on XAI

Info

Publication number: KR102247181B1
Application number: KR1020200178549A
Authority: KR
Inventors: 장은동; 유이 응우엔링
Original assignee: 주식회사 이글루시큐리티
Priority date: 2020-12-18
Filing date: 2020-12-18
Publication date: 2021-05-03

Abstract

A method for generating an abnormal behavior detection model using learning data generated based on eXplainable Artificial Intelligence (XAI) is performed by a generating device that generates the abnormal behavior detection model using the learning data generated based on XAI, and comprises: a step of providing an abnormality determination range for each reference feature to a reference correction terminal, and modifying the abnormality determination range for each reference feature based on reference correction information received from the reference correction terminal; a step of classifying collected data into normal data and abnormal data based on the modified abnormality determination range for each reference feature; and a step of generating a learning model for abnormal behavior detection based on the classified normal data and abnormal data.

Description

Method and device for generating anomalous behavior detection model using learning data generated based on XAI

The present invention relates to a method and apparatus for generating an abnormal behavior detection model using learning data generated based on eXplainable Artificial Intelligence (XAI).

As the importance and volume of the information held in information resources grow, network security has also become more important. Security equipment and security systems such as integrated security management systems, threat management systems, firewalls, IDS, and IPS are used to protect information resources.

A model is required to detect whether data collected by security equipment and security systems corresponds to abnormal behavior.

To generate an abnormal behavior detection model, supervised learning or unsupervised learning using the data collected by security equipment and security systems may be used.

However, when the abnormal behavior detection model is used to predict whether collected data is abnormal or normal, the prediction result may differ from the criteria for outliers applied in actual security monitoring.

This is because the learning data used for unsupervised learning may contain a large amount of noise, or because the judgment criteria of the customer that requested the abnormal behavior detection model and the judgment criteria of the monitoring operator may change depending on the situation.

As a result, the outliers reported to the customer may include many data items that do not meet the judgment criteria, which can reduce the customer's trust in the company that generates the abnormal behavior detection model and cause excessive consumption of the customer's cost and manpower.

An object of the present invention is to provide a method and apparatus capable of solving the above-described problems.

Another object of the present invention is to generate a primary learning model for abnormal behavior detection through learning using collected data, and to interpret the generated primary learning model with an explainable model to derive the abnormality determination range of the primary learning model.

Another object of the present invention is to modify the abnormality determination range by reflecting the individual judgment criteria of the customer or the monitoring operator, and to generate a secondary learning model for abnormal behavior detection using learning data generated based on the modified abnormality determination range.

The problems to be solved by the present invention are not limited to those mentioned above, and other problems not mentioned will be clearly understood by those skilled in the art from the following description.

To solve the above-described problems, a method for generating an abnormal behavior detection model using learning data generated based on XAI according to an embodiment of the present invention is performed by a generating device that generates the abnormal behavior detection model using learning data generated based on eXplainable Artificial Intelligence (XAI), and comprises: generating a learning model for abnormal behavior detection through supervised or unsupervised learning using collected data of a security device; interpreting the learning model through model-agnostic methods to select, from among a plurality of features included in the collected data, reference features whose contribution to the learning model's abnormal behavior detection determination is greater than or equal to a preset criterion; interpreting the learning model through model-agnostic methods to derive numerical data for each reference feature, and deriving an abnormality determination range for each reference feature using the derived numerical data; providing the abnormality determination range for each reference feature to a reference correction terminal, and modifying the abnormality determination range for each reference feature based on reference correction information received from the reference correction terminal; classifying the collected data into normal data and abnormal data based on the modified abnormality determination range for each reference feature; and generating a learning model for abnormal behavior detection based on the classified normal data and abnormal data.
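The feature-selection step can be illustrated with a minimal sketch. The text names only "model-agnostic methods" and does not say which one is used, so permutation-based contribution is swapped in here purely as an example of such a method. The names `permutation_contribution`, `select_reference_features`, `score_fn`, and `threshold` are hypothetical, and the learning model is assumed to expose a function that maps one record to an anomaly score.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]  # one collected record: feature name -> value


def permutation_contribution(
    score_fn: Callable[[Row], float],
    rows: Sequence[Row],
    feature: str,
    seed: int = 0,
) -> float:
    """Mean absolute change in the anomaly score when one feature is shuffled.

    Assumption: this stands in for whichever model-agnostic method is
    actually used; the source does not specify it.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    rng = random.Random(seed)
    shuffled = [row[feature] for row in rows]
    rng.shuffle(shuffled)
    total = 0.0
    for row, value in zip(rows, shuffled):
        perturbed = dict(row)  # copy so the original record is untouched
        perturbed[feature] = value
        total += abs(score_fn(perturbed) - score_fn(row))
    return total / len(rows)


def select_reference_features(
    score_fn: Callable[[Row], float],
    rows: Sequence[Row],
    features: Sequence[str],
    threshold: float,
    seed: int = 0,
) -> List[str]:
    """Keep features whose contribution is at or above the preset criterion."""
    return [
        name
        for name in features
        if permutation_contribution(score_fn, rows, name, seed) >= threshold
    ]
```

Because the function only calls `score_fn`, it works the same way for a supervised or an unsupervised first-stage model, which is the point of a model-agnostic interpretation.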

In addition, the reference correction information includes designated features selected from the reference features and an abnormality determination range for each designated feature.

In addition, in the step of providing the abnormality determination range for each reference feature to the reference correction terminal and modifying it based on the reference correction information received from the reference correction terminal, the portion of the abnormality determination ranges for each reference feature that corresponds to the designated features is replaced with the abnormality determination range for each designated feature.
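As a minimal sketch of this replacement step, assume each abnormality determination range is stored as a `(lower, upper)` pair keyed by feature name. That representation, the function name `apply_reference_correction`, and the feature names in the usage note are illustrative assumptions and do not come from the source.

```python
from typing import Dict, Tuple

# Assumed representation: values inside [lower, upper] are judged abnormal.
Range = Tuple[float, float]


def apply_reference_correction(
    derived: Dict[str, Range],
    correction: Dict[str, Range],
) -> Dict[str, Range]:
    """Replace the ranges of designated features and leave the rest as derived.

    `correction` plays the role of the reference correction information:
    its keys are the designated features, its values their new ranges.
    """
    unknown = set(correction) - set(derived)
    if unknown:
        # Designated features are selected from the reference features,
        # so a key outside the derived set is treated as an error here.
        raise KeyError(f"not a reference feature: {sorted(unknown)}")
    updated = dict(derived)  # do not mutate the derived ranges
    updated.update(correction)
    return updated
```

For example, if the derived range for a hypothetical `failed_logins` feature is `(5, inf)` and the terminal returns `(10, inf)` for it, only that entry changes and every other reference feature keeps its derived range.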

In addition, the reference correction terminal includes at least one of a terminal of the customer that requested generation of the abnormal behavior detection model and a terminal of a monitoring operator who performs abnormal behavior detection work.

In addition, in the step of generating a learning model for abnormal behavior detection based on the classified normal data and abnormal data, an unsupervised learning model for abnormal behavior detection is generated through unsupervised learning using only the classified normal data.

In addition, in the step of generating a learning model for abnormal behavior detection based on the classified normal data and abnormal data, a supervised learning model for abnormal behavior detection is generated through supervised learning using the classified normal data and abnormal data.

In addition, in the step of generating a learning model for abnormal behavior detection through unsupervised or supervised learning using the collected data of the security device, a plurality of learning models for abnormal behavior detection are generated through supervised or unsupervised learning using a plurality of sets of collected data gathered from a plurality of different security devices.

In addition, the step of interpreting the learning model through model-agnostic methods to select, from among the plurality of features included in the collected data, reference features whose contribution to the learning model's abnormal behavior detection determination is greater than or equal to a preset criterion includes: interpreting the plurality of learning models through model-agnostic methods to select, for each set of collected data, individual features whose contribution to the learning model's abnormal behavior detection determination is greater than or equal to the preset criterion; counting the number of times each individual feature is selected; and selecting the reference features, based on the counted numbers, from all the individual features selected for the plurality of sets of collected data.

In addition, in the step of interpreting the learning model through model-agnostic methods to select, for each set of collected data, individual features whose contribution to the learning model's abnormal behavior detection determination is greater than or equal to the preset criterion, individual features counted for at least a majority of the plurality of sets of collected data are selected as the reference features.
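The counting and majority rule can be sketched as follows. The function name `select_by_majority` and the feature names in the usage note are hypothetical, and "majority" is read here as strictly more than half of the sets of collected data, which is an interpretation rather than something the text spells out.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def select_by_majority(
    individual_features_per_dataset: Sequence[Iterable[str]],
) -> List[str]:
    """Select features chosen as individual features in more than half the sets.

    Each element of the input is the list of individual features selected
    for one set of collected data (for example, one security device).
    """
    counts: Counter = Counter()
    for features in individual_features_per_dataset:
        counts.update(set(features))  # count each feature once per set
    dataset_count = len(individual_features_per_dataset)
    return sorted(
        name for name, count in counts.items() if count > dataset_count / 2
    )
```

With three sets whose individual features are `["src_ip", "bytes_out"]`, `["src_ip", "dst_port"]`, and `["src_ip", "bytes_out", "duration"]`, the counts are 3, 2, 1, and 1, so `src_ip` and `bytes_out` become reference features.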

In addition, in the step of interpreting the learning model through model-agnostic methods to derive numerical data for each reference feature and deriving an abnormality determination range for each reference feature using the derived numerical data, an abnormality determination range for each reference feature is derived for each of the plurality of sets of collected data, and any of these ranges that is affected by the data volume of the collected data is classified as a proportional correction target.

In addition, in the step of providing the abnormality determination range for each reference feature to the reference correction terminal and modifying it based on the reference correction information received from the reference correction terminal, when an abnormality determination range classified as a proportional correction target is modified, the modification reflects a proportional coefficient calculated based on the data volume of the collected data.
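The text does not give the formula for the proportional coefficient. One possible reading, shown below strictly as an assumption, is a simple ratio between the data volume of one set of collected data and the volume for which the corrected range was stated, applied to both bounds of the range. The names `proportional_coefficient`, `scale_range`, and `reference_volume` are hypothetical.

```python
from typing import Tuple

Range = Tuple[float, float]  # assumed (lower, upper) representation


def proportional_coefficient(data_volume: float, reference_volume: float) -> float:
    """Ratio of this data set's volume to the volume the range was stated for.

    Assumption: a plain ratio; the source only says the coefficient is
    calculated from the data volume of the collected data.
    """
    if reference_volume <= 0:
        raise ValueError("reference_volume must be positive")
    return data_volume / reference_volume


def scale_range(bounds: Range, coefficient: float) -> Range:
    """Scale both bounds of a volume-dependent abnormality determination range."""
    lower, upper = bounds
    return (lower * coefficient, upper * coefficient)
```

Under this reading, a count-like range of `(100, 500)` stated for 1,000 records would become `(200, 1000)` for a device that produced 2,000 records, while ranges not marked as proportional correction targets would be left unscaled.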

In addition, an apparatus for generating an abnormal behavior detection model using learning data generated based on XAI according to an embodiment of the present invention includes: a primary learning model generation unit that generates a learning model for abnormal behavior detection through supervised or unsupervised learning using collected data of a security device; an abnormality determination range determination unit that interprets the learning model through model-agnostic methods to select, from among a plurality of features included in the collected data, reference features whose contribution to the learning model's abnormal behavior detection determination is greater than or equal to a preset criterion, derives numerical data for each reference feature, and derives an abnormality determination range for each reference feature using the derived numerical data; an abnormality determination range modification unit that provides the abnormality determination range for each reference feature to a reference correction terminal and modifies the abnormality determination range for each reference feature based on reference correction information received from the reference correction terminal; a learning data generation unit that classifies the collected data into normal data and abnormal data based on the modified abnormality determination range for each reference feature; and a secondary learning model generation unit that generates a learning model for abnormal behavior detection based on the classified normal data and abnormal data.

In addition, other methods and other systems for implementing the present invention, and a computer-readable recording medium storing a computer program for executing the method, may be further provided.

According to an embodiment of the present invention, a learning model for abnormal behavior detection that reflects the individual judgment criteria of the customer or the monitoring operator is generated. As a result, when abnormal behavior is detected using the learning model, false positives that do not match those individual judgment criteria can be reduced.

The effects of the present invention are not limited to those mentioned above, and other effects not mentioned will be clearly understood by those skilled in the art from the following description.

FIG. 1 is a block diagram showing the configuration of an apparatus for generating an abnormal behavior detection model using learning data generated based on XAI according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram showing the process by which collected data is gathered from security devices.
FIGS. 3 and 4 are flowcharts showing the detailed process of a method for generating an abnormal behavior detection model using learning data generated based on XAI according to an embodiment of the present invention.
FIG. 5 is a conceptual diagram showing the operation of step S20 of FIG. 3.
FIG. 6 is a flowchart showing the detailed process of step S30 of FIG. 3.
FIG. 7 is a flowchart showing the detailed process of a method for generating an abnormal behavior detection model using learning data generated based on XAI according to another embodiment of the present invention.
FIG. 8 is a flowchart showing the detailed process of step S20 of FIG. 7.

The advantages and features of the present invention, and the methods of achieving them, will become apparent with reference to the embodiments described below in detail together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and so that the scope of the invention is fully conveyed to those of ordinary skill in the art to which the invention belongs, and the present invention is defined only by the scope of the claims.

The terms used in this specification are for describing the embodiments and are not intended to limit the present invention. In this specification, the singular form also includes the plural form unless otherwise stated. As used in this specification, "comprises" and/or "comprising" do not exclude the presence or addition of one or more elements other than those mentioned. Throughout the specification, the same reference numerals refer to the same elements, and "and/or" includes each of the mentioned elements and every combination of one or more of them. Although "first", "second", and the like are used to describe various elements, these elements are not limited by these terms. These terms are used only to distinguish one element from another. Therefore, a first element mentioned below may also be a second element within the technical idea of the present invention.

Unless otherwise defined, all terms (including technical and scientific terms) used in this specification have the meanings commonly understood by those of ordinary skill in the art to which the present invention belongs. In addition, terms defined in commonly used dictionaries are not to be interpreted ideally or excessively unless they are explicitly and specifically defined.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

1. Description of the abnormal behavior detection model generating apparatus 10 according to an embodiment of the present invention

Referring to FIG. 1, the abnormal behavior detection model generating apparatus 10 according to an embodiment of the present invention generates a primary learning model for abnormal behavior detection through learning using collected data of security devices, and interprets the primary learning model with an explainable model to determine the reference features that affect the primary learning model's outlier determination and the abnormality determination range for each reference feature.

Referring to FIG. 2, assets (DBMS, OS, WAS, network) are connected through the Internet and security devices. The security devices include devices such as a firewall, an Intrusion Detection System (IDS), and an Intrusion Prevention System (IPS), and data satisfying a preset condition may be collected from each device.

Referring again to FIG. 1, the abnormal behavior detection model generating apparatus 10 transmits the abnormality determination range for each reference feature to the reference correction terminal 20, and modifies the abnormality determination range for each reference feature based on the reference correction information received from the reference correction terminal 20.

In an embodiment, the reference correction terminal 20 may be a terminal of the customer that requested generation of the abnormal behavior detection model or a terminal of a monitoring operator who performs monitoring work.

In addition, the abnormal behavior detection model generating apparatus 10 classifies the collected data into normal data and abnormal data based on the modified abnormality determination range for each reference feature, and generates a secondary learning model for abnormal behavior detection based on the classified normal data and abnormal data.
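The classification of collected data can be sketched as a simple range check. The sketch assumes each abnormality determination range is an inclusive `(lower, upper)` interval and that a record counts as abnormal when any reference feature falls inside its interval; the text does not state how several features are combined, so the "any" rule, the function name `split_by_ranges`, and the feature names in the usage note are illustrative assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Record = Dict[str, float]
Range = Tuple[float, float]  # assumed: values in [lower, upper] are abnormal


def split_by_ranges(
    records: Sequence[Record],
    abnormal_ranges: Dict[str, Range],
) -> Tuple[List[Record], List[Record]]:
    """Split collected records into (normal, abnormal) lists."""
    normal: List[Record] = []
    abnormal: List[Record] = []
    for record in records:
        is_abnormal = any(
            name in record and lower <= record[name] <= upper
            for name, (lower, upper) in abnormal_ranges.items()
        )
        (abnormal if is_abnormal else normal).append(record)
    return normal, abnormal
```

With ranges of `(10, inf)` for a hypothetical `failed_logins` feature and `(1_000_000, inf)` for `bytes_out`, a record with 12 failed logins lands in the abnormal list even if its traffic volume is small. The two lists then serve as the learning data for the secondary learning model.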

In an embodiment, the abnormal behavior detection model generating apparatus 10 may generate the secondary learning model through unsupervised learning using only the normal data. When unsupervised learning is performed using only normal data with no attack characteristics, the abnormal behavior detection performance of the resulting model can be improved. Because the model learns only from the generated normal data, the normal range is defined and collected data that falls outside it is detected as abnormal behavior, so the abnormal behavior detection performance of the secondary learning model can be improved.
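The idea of learning a normal range from normal data alone can be shown with a deliberately simple stand-in. The sketch below fits a per-feature mean and standard deviation and flags records that fall more than `k` standard deviations away; it is not one of the algorithms named in the text, and the class name `NormalEnvelope` and the parameter `k` are hypothetical.

```python
import statistics
from typing import Dict, Sequence, Tuple

Record = Dict[str, float]


class NormalEnvelope:
    """Toy unsupervised model fitted only on records classified as normal."""

    def __init__(self, k: float = 3.0) -> None:
        self.k = k
        self.stats: Dict[str, Tuple[float, float]] = {}

    def fit(self, normal_records: Sequence[Record]) -> "NormalEnvelope":
        # Learn the centre and spread of each feature from normal data only.
        for name in normal_records[0]:
            values = [record[name] for record in normal_records]
            self.stats[name] = (statistics.fmean(values), statistics.pstdev(values))
        return self

    def is_abnormal(self, record: Record) -> bool:
        # A record is abnormal if any feature leaves the learned normal range.
        for name, (mean, spread) in self.stats.items():
            spread = spread if spread > 0 else 1e-9  # guard constant features
            if abs(record[name] - mean) > self.k * spread:
                return True
        return False
```

Because no abnormal record influences the fitted statistics, the envelope is not widened by noisy or attack-like samples, which is the benefit the paragraph above describes.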

In addition, in an embodiment, the abnormal behavior detection model generating apparatus 10 may generate the secondary learning model through supervised learning using both the normal data and the abnormal data.

When an abnormal behavior detection model is generated through unsupervised or supervised learning, conventional approaches have the problem that the model's prediction results differ from the criteria for outliers applied in actual security monitoring, because of the large amount of noise in the learning data, the individual judgment criteria of the customer that requested the model, the individual judgment criteria of the monitoring operator, and so on.

According to the present invention, however, the generated abnormal behavior detection model is interpreted with an explainable model to derive abnormality determination criteria, and those criteria are modified based on feedback received from the customer and/or the monitoring operator.

In addition, because the learning data is generated according to the modified abnormality determination criteria, noise is removed. Because the secondary learning model for abnormal behavior detection is generated using that learning data, the prediction results of the secondary learning model can reflect the individual judgment criteria of both the customer and the monitoring operator.

To perform the above-described process, the abnormal behavior detection model generating apparatus 10 includes a primary learning model generation unit 11, a learning model interpretation unit 12, an abnormality determination range determination unit 13, an abnormality determination range modification unit 14, a learning data generation unit 15, and a secondary learning model generation unit 16.

In addition, the abnormal behavior detection model generating apparatus 10 and the reference correction terminal 20 may each include a communication unit for transmitting and receiving information, a control unit for processing information, and a memory (or database) for storing information.

In hardware, the control unit may be implemented using at least one of application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), processors, controllers, micro-controllers, microprocessors, and other electrical units for performing functions.

In software, embodiments such as the procedures and functions described in this specification may be implemented as separate software modules. Each of the software modules may perform one or more of the functions and operations described in this specification. The software code may be implemented as a software application written in an appropriate programming language. The software code may be stored in the memory and executed by the control unit.

The communication unit may be implemented through at least one of a wired communication module, a wireless communication module, and a short-range communication module. The wireless Internet module is a module for wireless Internet access and may be built into or externally attached to each device. Wireless Internet technologies that may be used include WLAN (Wireless LAN, Wi-Fi), WiBro (Wireless Broadband), WiMAX (Worldwide Interoperability for Microwave Access), HSDPA (High Speed Downlink Packet Access), LTE (Long Term Evolution), and LTE-A (Long Term Evolution-Advanced).

The memory may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card-type memory (for example, SD or XD memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, and an optical disk.

Hereinafter, the abnormal behavior detection model generation method S1 performed by the abnormal behavior detection model generating apparatus 10 according to the present embodiment will be described in detail with reference to FIGS. 3 to 8.

2. Description of the abnormal behavior detection model generation method S1 according to an embodiment of the present invention

Referring to FIGS. 3 and 4, the abnormal behavior detection model generation method S1 according to the present embodiment includes: generating a learning model for abnormal behavior detection (S10); interpreting the learning model through model-agnostic methods to select reference features (S20); interpreting the learning model through model-agnostic methods to determine an abnormality determination range for each reference feature (S30); modifying the abnormality determination range for each reference feature based on reference correction information (S40); classifying the collected data into normal data and abnormal data based on the modified abnormality determination range for each reference feature (S50); and generating a learning model for abnormal behavior detection based on the classified normal data and abnormal data (S60).

(1) Description of the step (S10) of generating a learning model for abnormal behavior detection

First, the primary abnormal behavior detection model generation module (111) of the primary learning model generation unit (11) generates a primary learning model for abnormal behavior detection using the data collected from a security device (S10).

The primary learning model may be generated through unsupervised learning or supervised learning.

When unsupervised learning is used, the primary abnormal behavior detection model generation module (111) preprocesses the data collected from the security device to generate learning data, and generates the primary learning model through unsupervised learning using the generated learning data.

In one embodiment, K-means, agglomerative clustering, density-based spatial clustering of applications with noise (DBSCAN), principal component analysis (PCA), and the like may be used as the algorithm for unsupervised learning.

When supervised learning is used, the primary abnormal behavior detection model generation module (111) preprocesses a portion of the data collected from the security device to generate learning data, and then generates the primary learning model through supervised learning using the generated learning data. In one embodiment, the learning data for supervised learning may be generated by assigning an abnormal label (Anomaly) or a normal label (Normal) to a portion of the collected data and then preprocessing the labeled data.

In one embodiment, a support vector machine, regression analysis, a neural network, a convolutional neural network, naive Bayes classification, and the like may be used as the algorithm for supervised learning.

(2) Description of the step (S20) of selecting reference features by interpreting the learning model through model induction

When the primary learning model has been generated, the learning model analysis unit (12) interprets the generated primary learning model through model induction and selects reference features (S20).

Model induction (model-agnostic methods) may be used to interpret the primary learning model.

The model induction methods that may be used include Partial Dependence Plot (PDP), Individual Conditional Expectation (ICE), M Plot, Accumulated Local Effects (ALE) Plot, Feature Interaction, Global Surrogate, Local Surrogate (LIME), and Shapley Values (SHAP).

In one embodiment, Shapley Values (SHAP) may preferably be used as the model induction method.

As the learning model analysis unit (12) interprets the primary learning model through model induction, a contribution to the abnormality determination may be calculated for each of the plurality of features included in the learning data used to train the primary learning model.

Abnormality determination means determining whether collected data is normal or abnormal.

The contribution to the abnormality determination may be calculated in the form of numerical data.
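As a rough illustration of how a per-feature contribution can be expressed as a number, the sketch below computes exact Shapley values for a toy scoring function by averaging each feature's marginal contribution over all feature orderings. The function `toy_score`, its weights, the sample record, and the baseline are hypothetical and are not taken from the patent; in practice a dedicated SHAP library would more likely be applied to the actual primary learning model.

```python
from itertools import permutations


def toy_score(record):
    # Hypothetical stand-in for the primary learning model's anomaly score.
    return (0.0004 * record["repeat_uri_cnt_max"]
            + 0.0002 * record["repeat_pkp_bytes_cnt_tot"])


def shapley_values(model, record, baseline):
    """Exact Shapley values: mean marginal contribution over all feature orderings."""
    names = list(record)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)          # start from the baseline record
        previous = model(current)
        for name in order:
            current[name] = record[name]  # switch one feature to its actual value
            score = model(current)
            totals[name] += score - previous
            previous = score
    return {name: total / len(orderings) for name, total in totals.items()}


# Hypothetical record and baseline (assumed values for illustration only).
record = {"repeat_uri_cnt_max": 1600, "repeat_pkp_bytes_cnt_tot": 1400}
baseline = {"repeat_uri_cnt_max": 100, "repeat_pkp_bytes_cnt_tot": 200}
contributions = shapley_values(toy_score, record, baseline)
```

The contributions sum to the difference between the score of the record and the score of the baseline, which is the property that makes Shapley values usable as an additive explanation. Enumerating all orderings is only feasible for a handful of features.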

When the contributions to the abnormality determination have been calculated, the learning model analysis unit (12) selects, from among the plurality of features, the features whose contribution is greater than or equal to a preset criterion as the reference features.

In one embodiment, the preset criterion may be a level at which a feature can be judged to have had a meaningful influence on the abnormality determination, and a plurality of reference features may be selected. The preset criterion may be a contribution value. Alternatively, the preset criterion may be determined by the rank of the contribution values. For example, the three features with the highest contribution values may be selected as the reference features.

Referring to FIG. 5, the contributions of features such as repeat_uri_cnt_max, repeat_pkp_bytes_cnt_tot, repeat_pkp_bytes_cnt_max, repeat_uri_cnt_tot, pkp_bytes_sum, pkp_bytes_avg, and pkp_bytes_std, as provided by the explainable model, are shown.

In one embodiment, repeat_uri_cnt_max and repeat_pkp_bytes_cnt_tot, whose contribution value (SHAP value) is 0.6 or more, may be selected as the reference features.

In another embodiment, repeat_uri_cnt_max, repeat_pkp_bytes_cnt_tot, repeat_pkp_bytes_cnt_max, and repeat_uri_cnt_tot, the four features with the highest contribution values (SHAP values), may be selected as the reference features.
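The two selection rules described above (a contribution threshold and a top-k ranking) can be sketched as follows. The feature names come from FIG. 5 as described in the text, but the contribution values are hypothetical placeholders chosen only so that the two examples reproduce the selections named above; the helper names are likewise illustrative.

```python
def select_by_threshold(contributions, threshold):
    """Keep every feature whose contribution is at least the threshold."""
    return [name for name, value in contributions.items() if value >= threshold]


def select_top_k(contributions, k):
    """Keep the k features with the highest contribution."""
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    return ranked[:k]


# Hypothetical contribution values (not taken from the patent's figures).
contributions = {
    "repeat_uri_cnt_max": 0.82,
    "repeat_pkp_bytes_cnt_tot": 0.65,
    "repeat_pkp_bytes_cnt_max": 0.41,
    "repeat_uri_cnt_tot": 0.33,
    "pkp_bytes_sum": 0.12,
    "pkp_bytes_avg": 0.08,
    "pkp_bytes_std": 0.05,
}

by_threshold = select_by_threshold(contributions, 0.6)
top_four = select_top_k(contributions, 4)
```

A threshold yields a variable number of reference features, while a ranking always yields exactly k, which may matter when the downstream ranges have to be reviewed manually.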

That is, the learning model analysis unit (12) generates an explainable model that provides the contribution of each feature of the primary learning model, and the reference features may be selected based on the contributions provided by the explainable model.

Referring to FIG. 5, the generated explainable model is illustrated conceptually.

The primary learning model provides an output value indicating whether the input collected data is abnormal or normal, whereas the explainable model provides the contribution of each feature of the collected data together with numerical data for each feature.

That is, when collected data is input into the explainable model, numerical data for each reference feature of the collected data is provided.

In the illustrated embodiment, numerical data (Model output value) is provided for each of the reference features repeat_uri_cnt_max, repeat_pkp_bytes_cnt_tot, repeat_pkp_bytes_cnt_max, and repeat_uri_cnt_tot.

The numerical data for each reference feature provided by the explainable model can be used as the grounds on which the primary learning model judged the data to be abnormal.

(3) Description of the step (S30) of determining the abnormality determination range for each reference feature by interpreting the learning model through model induction

When the explainable model has been generated by interpreting the primary learning model through model induction, the abnormality determination range determination unit (13) determines the abnormality determination range for each reference feature using the generated explainable model (S30).

The abnormality determination range for each reference feature means the range of numerical values of that reference feature that caused the primary learning model to judge the collected data as abnormal. That is, collected data that falls within the abnormality determination range for a reference feature is judged to be abnormal, and collected data that falls outside that range may be judged to be normal.

The abnormality determination for collected data may be decided by a single reference feature or by a plurality of reference features. For example, the collected data may be judged abnormal when it satisfies the abnormality determination range of any one reference feature, or when it satisfies the abnormality determination ranges of all of a plurality of reference features.

Referring to FIG. 6, the detailed process of the step (S30) of determining the abnormality determination range for each reference feature using the explainable model is shown.

First, the abnormality determination range determination unit (13) inputs the collected data into the explainable model and calculates numerical data for the reference features of the collected data (S31).

In one embodiment, the collected data that was used to train the primary learning model is used to calculate the numerical data.

When numerical data has been calculated for each reference feature of the collected data used for training, the abnormality determination range determination unit (13) determines the abnormality determination range for each reference feature based on the calculated numerical data (S32).

In one embodiment, a boundary value for each reference feature is derived from the calculated numerical data, and whether data is abnormal may be determined using the derived boundary values.
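The text does not state how the boundary value is derived from the numerical data. Purely as an assumption for illustration, the sketch below takes the boundary for each reference feature to be the smallest value observed among the records that the primary learning model judged abnormal, so that the range becomes "boundary or more". The field name `primary_label` and the sample records are hypothetical.

```python
def derive_lower_bounds(records, reference_features):
    """Per reference feature, return the smallest value seen among abnormal records.

    Assumption (not from the patent): the abnormality determination range is
    one-sided and starts at the minimum value observed in abnormal records.
    """
    abnormal = [r for r in records if r["primary_label"] == "abnormal"]
    if not abnormal:
        raise ValueError("no abnormal records to derive a boundary from")
    return {f: min(r[f] for r in abnormal) for f in reference_features}


# Hypothetical collected data with the primary learning model's judgment attached.
records = [
    {"repeat_uri_cnt_max": 120, "repeat_pkp_bytes_cnt_tot": 300, "primary_label": "normal"},
    {"repeat_uri_cnt_max": 1500, "repeat_pkp_bytes_cnt_tot": 1350, "primary_label": "abnormal"},
    {"repeat_uri_cnt_max": 1720, "repeat_pkp_bytes_cnt_tot": 1300, "primary_label": "abnormal"},
    {"repeat_uri_cnt_max": 900, "repeat_pkp_bytes_cnt_tot": 800, "primary_label": "normal"},
]

bounds = derive_lower_bounds(records, ["repeat_uri_cnt_max", "repeat_pkp_bytes_cnt_tot"])
```

Other derivations (for example a percentile, or a split point learned by a surrogate decision tree) would fit the description equally well; the choice here is only the simplest one to show.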

Referring to FIG. 4, the reference features for collected data A, gathered from one security device, are determined to be F1, F2, and F3, and the abnormality determination ranges for F1, F2, and F3 are determined to be R1, R2, and R3, respectively. F1, F2, F3, R1, R2, and R3 are symbols assigned arbitrarily for the purpose of explanation, and the reference features and abnormality determination ranges may vary with the type and nature of the data collected from the security device. Referring to FIG. 5, F1, F2, and F3 may correspond to repeat_uri_cnt_max, repeat_pkp_bytes_cnt_tot, and repeat_pkp_bytes_cnt_max, and R1, R2, and R3 may correspond to 1,500 or more, 1,300 or more, and 1,000 or more, respectively.

In one embodiment, collected data may be judged abnormal when it falls within the abnormality determination range of any one of the reference features.

In another embodiment, the abnormality determination criterion may consist of a combination of the abnormality determination ranges of one or more reference features. For example, the collected data may be judged abnormal when repeat_pkt_bytes_cnt_tot is 1,500 or more and repeat_pkp_bytes_cnt_max is 1,300 or more.
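The "any one range" and "combination of ranges" criteria can be sketched as a single check with a mode switch. The lower bounds follow the F1 to F3 and R1 to R3 mapping given for FIG. 5 (1,500, 1,300, and 1,000 or more); the function name, the `mode` parameter, and the sample record are illustrative assumptions.

```python
def is_abnormal(record, ranges, mode="any"):
    """Judge a record against one-sided ranges (value >= lower bound).

    mode="any": abnormal if at least one reference feature is in its range.
    mode="all": abnormal only if every listed reference feature is in its range.
    """
    hits = [record[feature] >= lower for feature, lower in ranges.items()]
    if mode == "any":
        return any(hits)
    if mode == "all":
        return all(hits)
    raise ValueError(f"unknown mode: {mode!r}")


# Lower bounds corresponding to R1, R2, R3 in the description of FIG. 5.
ranges = {
    "repeat_uri_cnt_max": 1500,
    "repeat_pkp_bytes_cnt_tot": 1300,
    "repeat_pkp_bytes_cnt_max": 1000,
}

# Hypothetical record: only the first feature falls inside its range.
record = {
    "repeat_uri_cnt_max": 1600,
    "repeat_pkp_bytes_cnt_tot": 900,
    "repeat_pkp_bytes_cnt_max": 400,
}

flag_any = is_abnormal(record, ranges, mode="any")
flag_all = is_abnormal(record, ranges, mode="all")
```

A combination over a subset of features can be expressed by passing only that subset in `ranges` with `mode="all"`.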

(4) Description of the step (S40) of correcting the abnormality determination range for each reference feature based on reference correction information

When the abnormality determination range for each reference feature has been calculated, the abnormality determination range correction unit (14) transmits the calculated ranges to the reference correction terminal (20).

In one embodiment, the reference correction terminal (20) may be a terminal of the customer that requested generation of the abnormal behavior detection model, or a terminal of a security monitoring officer who performs monitoring tasks.

On receiving the abnormality determination range for each reference feature, the reference correction terminal (20) produces reference correction information describing the corrections to be made to those ranges.

Some of the abnormality determination ranges may differ from the values that are actually treated as abnormal in security monitoring. The reference correction terminal (20) selects, as designated features, the reference features whose abnormality determination range differs from the value actually treated as abnormal in security monitoring, and sets, for each designated feature, the value actually treated as abnormal as its abnormality determination range. That is, the reference correction information includes the designated features selected from among the reference features and the abnormality determination range for each designated feature.

When the reference correction information is received from the reference correction terminal (20), the abnormality determination range correction unit (14) corrects the abnormality determination range for each reference feature based on the received reference correction information.

In the illustrated embodiment, the abnormality determination range of reference feature F1 is corrected from R1 to R1' based on the reference correction information received from the reference correction terminal (20).

For example, when the abnormality determination ranges have been determined as repeat_pkt_bytes_cnt_tot of 1,500 or more and repeat_pkp_bytes_cnt_max of 1,300 or more, the reference correction information may include information on a correction to the reference feature repeat_pkt_bytes_cnt_tot. For instance, the reference feature repeat_pkt_bytes_cnt_tot may be selected as a designated feature that needs correction, and the abnormality determination range for that designated feature may be corrected to 1,400 or more. In this case, the abnormality determination range correction unit (14) receives the reference correction information and may correct the abnormality determination ranges to repeat_pkt_bytes_cnt_tot of 1,400 or more and repeat_pkp_bytes_cnt_max of 1,300 or more.
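The correction in this example amounts to overwriting the range of each designated feature while leaving the other reference features untouched. A minimal sketch, assuming the ranges are stored as one-sided lower bounds in a dictionary and that the reference correction information arrives as a mapping from designated feature to its new lower bound (both representations are assumptions):

```python
def apply_corrections(ranges, corrections):
    """Return a new range table with the designated features overwritten."""
    unknown = set(corrections) - set(ranges)
    if unknown:
        # A designated feature must be one of the existing reference features.
        raise KeyError(f"not a reference feature: {sorted(unknown)}")
    updated = dict(ranges)
    updated.update(corrections)
    return updated


# Ranges determined in step S30, using the values from the example in the text.
ranges = {"repeat_pkt_bytes_cnt_tot": 1500, "repeat_pkp_bytes_cnt_max": 1300}

# Reference correction information: one designated feature and its new range.
corrections = {"repeat_pkt_bytes_cnt_tot": 1400}

corrected = apply_corrections(ranges, corrections)
```

Returning a new dictionary rather than mutating `ranges` keeps the originally derived ranges available for comparison or audit.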

(5) Description of the step (S50) of classifying the collected data into normal data and abnormal data based on the corrected abnormality determination range for each reference feature

When the abnormality determination range for each reference feature has been corrected, the learning data generation unit (15) classifies the collected data based on the corrected ranges.

Collected data that falls within the corrected abnormality determination range for each reference feature is classified as abnormal data, and collected data that does not is classified as normal data.

That is, the abnormality determination range for each reference feature is corrected based on the reference correction information from the reference correction terminal (20), and the collected data is classified based on the corrected ranges.
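The classification step can be sketched as a single pass that splits the collected data into two lists. The corrected lower bounds reuse the 1,400 and 1,300 values from the example in the text; the sample records, the `id` field, and the `require_all` switch are hypothetical.

```python
def classify(records, ranges, require_all=True):
    """Split records into (normal, abnormal) using one-sided ranges (value >= bound)."""
    normal, abnormal = [], []
    for record in records:
        hits = [record[feature] >= lower for feature, lower in ranges.items()]
        flagged = all(hits) if require_all else any(hits)
        (abnormal if flagged else normal).append(record)
    return normal, abnormal


# Corrected ranges from the example in the text.
ranges = {"repeat_pkt_bytes_cnt_tot": 1400, "repeat_pkp_bytes_cnt_max": 1300}

# Hypothetical collected data.
records = [
    {"id": 1, "repeat_pkt_bytes_cnt_tot": 1450, "repeat_pkp_bytes_cnt_max": 1350},
    {"id": 2, "repeat_pkt_bytes_cnt_tot": 1390, "repeat_pkp_bytes_cnt_max": 1500},
    {"id": 3, "repeat_pkt_bytes_cnt_tot": 1600, "repeat_pkp_bytes_cnt_max": 1200},
]

normal, abnormal = classify(records, ranges)
```

Record 1 would have been judged normal under the uncorrected bound of 1,500, which shows how the correction changes the labels that the secondary learning model is later trained on.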

As a result, when new learning data is generated from the classified collected data, the generated learning data can reflect characteristics such as the customer's own judgment criteria and the monitoring officer's own judgment criteria.

Accordingly, when a secondary learning model for abnormal behavior detection is generated by training on the generated learning data, the prediction results of the secondary learning model can reflect characteristics such as the customer's own judgment criteria and the monitoring officer's own judgment criteria.

(6) Description of the step (S60) of generating a learning model for abnormal behavior detection based on the classified normal data and abnormal data

The secondary abnormal behavior detection model generation module (161) of the secondary learning model generation unit (16) generates a secondary learning model using the classified collected data.

In one embodiment, the secondary learning model generation unit (16) may generate learning data using the classified normal data and abnormal data, and then generate a secondary learning model for abnormal behavior detection through supervised learning using the generated learning data.

In another embodiment, the secondary learning model generation unit (16) may generate learning data using only the normal data among the classified collected data, and then generate a secondary learning model for abnormal behavior detection through unsupervised learning using the generated learning data.

3. Description of the abnormal behavior detection model generation method (S1) according to another embodiment of the present invention

Referring to FIG. 7, a plurality of learning models may be generated from a plurality of different sets of collected data. For example, when a plurality of data centers are operated by region and data is collected at each data center, a plurality of learning models corresponding to the respective data centers may be generated.

Accordingly, the primary learning model generation unit (11) generates a plurality of primary learning models through supervised or unsupervised learning using the plurality of different sets of collected data (S10).

Referring to FIG. 8, when the plurality of primary learning models have been generated, the learning model analysis unit (12) interprets the plurality of primary learning models through model induction and selects, for each set of collected data, the individual features whose contribution to the abnormal behavior detection judgment of the corresponding primary learning model is at or above a preset level (S21).

When the individual features have been selected for each set of collected data, the learning model analysis unit (12) counts the number of times each individual feature was selected (S22).

Referring to FIG. 7, individual features F1, F2, and F3 are selected for collected data A and B, and individual features F1, F2, and F4 are selected for collected data C. Accordingly, F1, F2, F3, and F4 are counted 3 times, 3 times, 2 times, and 1 time, respectively.

Referring to FIG. 8, when counting is complete, the learning model analysis unit (12) selects the reference features from among all the individual features selected for the plurality of sets of collected data, based on the counts (S23).

In one embodiment, an individual feature that is counted in a majority of the sets of collected data may be selected as a reference feature.

Referring to FIG. 7, the individual features counted 2 or more times, a majority of the 3 sets of collected data, may be selected as reference features. Accordingly, F1, F2, and F3, counted 3 times, 3 times, and 2 times, are selected as the reference features.
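The counting (S22) and majority selection (S23) steps can be sketched with `collections.Counter`, using the selections described for FIG. 7. Treating "majority" as strictly more than half of the sets of collected data is an assumption; the text only gives the three-set case, where the threshold is 2.

```python
from collections import Counter


def count_features(selected):
    """Count in how many sets of collected data each individual feature was selected."""
    counts = Counter()
    for features in selected.values():
        counts.update(set(features))  # each set of collected data counts a feature once
    return counts


def majority_features(selected):
    """Keep the features selected in more than half of the sets of collected data."""
    counts = count_features(selected)
    needed = len(selected) // 2 + 1   # assumption: strict majority
    return sorted(f for f, c in counts.items() if c >= needed)


# Individual features selected per set of collected data, as described for FIG. 7.
selected = {
    "A": ["F1", "F2", "F3"],
    "B": ["F1", "F2", "F3"],
    "C": ["F1", "F2", "F4"],
}

counts = count_features(selected)
reference_features = majority_features(selected)
```

With these inputs F4 is dropped because it appears in only one of the three sets, matching the outcome described in the text.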

This prevents the reference features used for abnormality determination from diverging across sets of collected data, owing to variation in that data, when detecting a specific abnormal behavior.

When the reference features have been selected, the abnormality determination range correction unit (14) derives the abnormality determination range for each reference feature for each of the plurality of sets of collected data (S30).

The abnormality determination range correction unit (14) may also classify, as proportional correction targets, those abnormality determination ranges that are affected by the volume of the collected data.

The data volume may differ between sets of collected data, and, depending on the type of reference feature, its abnormality determination range may be affected by the data volume. For example, when generating a learning model that detects abrupt changes in data as abnormal behavior, the amount of change judged abnormal in a region with a relatively large volume of collected data must be set differently from the amount of change judged abnormal in a region with a relatively small volume of collected data.

Accordingly, to reflect the data volume of each set of collected data when correcting the abnormality determination ranges, reference features whose abnormality determination range varies with the volume of collected data are classified as proportional correction targets.

When the abnormality determination range for each reference feature has been derived for each of the plurality of sets of collected data and the proportional correction targets have been selected, the abnormality determination range correction unit (14) corrects the abnormality determination ranges based on the reference correction information received from the reference correction terminal (20) (S40).

The abnormality determination range correction unit (14) may provide the reference correction terminal (20) with the abnormality determination ranges for any one of the plurality of sets of collected data, and correct the abnormality determination ranges based on the reference correction information received in response.

When correcting a proportional correction target, the abnormality determination range correction unit (14) may apply a proportionality coefficient calculated from the data volume of the collected data.

Referring to FIG. 7, when F1 is classified as a proportional correction target, its abnormality determination range may be corrected with the proportionality coefficient applied.

For example, when the abnormality determination ranges for collected data C are transmitted to the reference correction terminal (20) and the corresponding reference correction information is received, the abnormality determination range correction unit (14) derives a proportionality coefficient for each set of collected data relative to the data volume of collected data C, and may apply that coefficient when correcting the proportional correction targets.

When the data volume of collected data A is 2 times that of collected data C and the data volume of collected data B is 1.5 times that of collected data C, the proportionality coefficient is derived as 2 for collected data A, 1.5 for collected data B, and 1 for collected data C.

The abnormality determination range correction unit (14) derives a correction value (A), which is the difference between the abnormality determination range of reference feature F1 for collected data C and the corrected abnormality determination range included in the reference correction information, and may correct reference feature F1 for each set of collected data by multiplying the correction value (A) by the corresponding proportionality coefficient.

Accordingly, the abnormality determination ranges of feature F1 for collected data A, collected data B, and collected data C are corrected to R1 + 2 × A, R4 + 1.5 × A, and R7 + 1 × A, respectively.
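The proportional correction can be worked through numerically. The 2 : 1.5 : 1 volume ratio comes from the text; the absolute data volumes, the lower bounds standing in for R1, R4, and R7, and the corrected bound received for collected data C are hypothetical values chosen for illustration, and the ranges are assumed to be one-sided lower bounds.

```python
def proportional_coefficients(volumes, reference):
    """Coefficient of each set of collected data relative to the reference set's volume."""
    return {name: volume / volumes[reference] for name, volume in volumes.items()}


def corrected_bounds(bounds, coefficients, original_ref_bound, corrected_ref_bound):
    """Apply bound + coefficient * A, where A is the correction made on the reference set."""
    correction_value = corrected_ref_bound - original_ref_bound  # correction value (A)
    return {name: bounds[name] + coefficients[name] * correction_value for name in bounds}


# Hypothetical data volumes in the 2 : 1.5 : 1 ratio given in the text.
volumes = {"A": 2000, "B": 1500, "C": 1000}

# Hypothetical F1 lower bounds standing in for R1, R4, and R7.
f1_bounds = {"A": 3000, "B": 2200, "C": 1500}

coefficients = proportional_coefficients(volumes, reference="C")

# Hypothetical correction: the terminal lowers C's bound for F1 from 1500 to 1400.
new_f1_bounds = corrected_bounds(f1_bounds, coefficients, 1500, 1400)
```

Here the correction value is -100, so the bounds for A, B, and C move by -200, -150, and -100 respectively, mirroring R1 + 2 × A, R4 + 1.5 × A, and R7 + 1 × A.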

When the correction is complete, the learning data generation unit (15) classifies each set of collected data into normal data and abnormal data based on the abnormality determination ranges corrected for that set (S50).

When the classification is complete, the secondary learning model generation unit (16) generates a secondary learning model through supervised or unsupervised learning using the normal data and abnormal data classified for each set of collected data (S60).

The method according to the embodiments of the present invention described above may be implemented as a program (or application) and stored in a medium so as to be executed in combination with a server, which is hardware.

The program may include code written in a computer language such as C, C++, JAVA, or machine language that the processor (CPU) of the computer can read through the device interface of the computer, so that the computer reads the program and executes the methods implemented as the program. Such code may include functional code related to the functions that define the operations needed to execute the methods, and may include control code related to the execution procedure needed for the processor of the computer to execute those functions in a predetermined order. Such code may further include memory-reference code indicating the location (address) in the internal or external memory of the computer at which additional information or media needed for the processor to execute the functions should be referenced. In addition, when the processor of the computer needs to communicate with another remote computer or server in order to execute the functions, the code may further include communication-related code specifying how to communicate with that remote computer or server using the communication module of the computer, and what information or media to transmit and receive during communication.

The storage medium refers not to a medium that stores data for a short moment, such as a register, cache, or memory, but to a medium that stores data semi-permanently and can be read by a device. Specifically, examples of the storage medium include, but are not limited to, ROM, RAM, CD-ROM, magnetic tape, floppy disk, and optical data storage devices. That is, the program may be stored on various recording media on various servers that the computer can access, or on various recording media on the user's computer. In addition, the medium may be distributed over computer systems connected through a network, so that computer-readable code is stored in a distributed manner.

The steps of a method or algorithm described in connection with the embodiments of the present invention may be implemented directly in hardware, in a software module executed by hardware, or in a combination of the two. The software module may reside in RAM (Random Access Memory), ROM (Read Only Memory), EPROM (Erasable Programmable ROM), EEPROM (Electrically Erasable Programmable ROM), flash memory, a hard disk, a removable disk, a CD-ROM, or any other form of computer-readable recording medium well known in the art to which the present invention pertains.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the invention may be implemented in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

10: abnormal behavior detection model generation device
111: primary abnormal behavior detection model generation module
12: learning model interpretation unit
13: abnormality determination range determination unit
14: abnormality determination range correction unit
15: learning data generation unit
16: secondary learning model generation unit
161: secondary abnormal behavior detection model generation module
20: reference correction terminal

Claims

A method performed by a generating device that generates an abnormal behavior detection model using learning data generated based on XAI (eXplainable Artificial Intelligence), the method comprising:
generating, by the generating device, a learning model for detecting abnormal behavior through supervised or unsupervised learning using collected data of a security device;
interpreting, by the generating device, the learning model through model-agnostic methods, and selecting, from among a plurality of features included in the collected data, a reference feature whose contribution to the abnormal behavior detection determination of the learning model is greater than or equal to a preset criterion;
deriving numerical data for each reference feature by analyzing the learning model through model-agnostic methods, and deriving an abnormality determination range for each reference feature using the derived numerical data;
providing, by the generating device, the abnormality determination range for each reference feature to a reference correction terminal, and correcting the abnormality determination range for each reference feature based on reference correction information received from the reference correction terminal;
classifying, by the generating device, the collected data into normal data and abnormal data based on the corrected abnormality determination range for each reference feature; and
generating, by the generating device, a learning model for detecting abnormal behavior based on the classified normal data and abnormal data,
wherein the generating of the learning model for detecting abnormal behavior through supervised or unsupervised learning using the collected data of the security device
generates a plurality of learning models for detecting abnormal behavior through supervised or unsupervised learning using a plurality of sets of collected data collected from a plurality of different security devices,
wherein the selecting of the reference feature comprises:
analyzing the plurality of learning models through model-agnostic methods, and selecting, for each set of collected data, individual features whose contribution to the abnormal behavior detection determination of the corresponding learning model is greater than or equal to a preset criterion;
counting the number of times each individual feature is selected; and
selecting, based on the counted number of times, the reference feature from among all individual features selected for the plurality of sets of collected data,
wherein, in the selecting from among the individual features,
individual features counted for more than half of the number of sets of collected data are selected as reference features,
wherein the deriving of the abnormality determination range for each reference feature
derives the abnormality determination range for each reference feature for each of the plurality of sets of collected data, and
classifies, among the abnormality determination ranges for each reference feature, the abnormality determination range of a reference feature that is affected by the amount of collected data as a subject of proportional correction, and
wherein the correcting of the abnormality determination range for each reference feature,
when correcting the abnormality determination range of a reference feature that is a subject of proportional correction, performs the correction by reflecting a proportionality coefficient calculated based on the data amount of the collected data.
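
The counting and majority-selection steps recited in this claim can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the patented implementation: the function name `select_reference_features`, the feature names, and the reading of "more than half" as a strict majority are all hypothetical, and the per-dataset feature lists stand in for the output of whatever model-agnostic interpretation method is actually used.

```python
from collections import Counter


def select_reference_features(selected_per_dataset):
    """Return the features selected for more than half of the datasets.

    `selected_per_dataset` holds one list per set of collected data; each
    list contains the individual features whose contribution met the preset
    criterion for that dataset's learning model (assumed to be computed
    elsewhere by a model-agnostic interpretation step).
    """
    counts = Counter()
    for features in selected_per_dataset:
        # Count each feature at most once per dataset.
        counts.update(set(features))
    # Assumption: "more than half" is read as a strict majority.
    threshold = len(selected_per_dataset) / 2
    return sorted(f for f, n in counts.items() if n > threshold)


# Hypothetical individual features selected for three security devices.
selected = [
    ["login_failures", "bytes_out", "dst_port"],
    ["login_failures", "bytes_out"],
    ["login_failures", "session_count"],
]
reference_features = select_reference_features(selected)
print(reference_features)
```

Here `login_failures` is counted three times and `bytes_out` twice, so both exceed half of the three datasets, while `dst_port` and `session_count` are counted once each and are dropped.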

The method of claim 1,
wherein the reference correction information includes a designated feature selected from among the reference features and an abnormality determination range for each designated feature, and
wherein the providing of the abnormality determination range for each reference feature to the reference correction terminal and the correcting of the abnormality determination range for each reference feature based on the reference correction information received from the reference correction terminal
replaces the portion of the abnormality determination ranges for each reference feature that corresponds to the designated feature with the abnormality determination range for each designated feature.
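
The correction described in this claim amounts to overriding only the ranges of the designated features. The following Python sketch shows one way to express that; the function name `apply_reference_correction`, the dictionary layout, and the numeric ranges are assumptions made for illustration and do not come from the patent.

```python
def apply_reference_correction(ranges, correction):
    """Return a copy of `ranges` with designated features overridden.

    `ranges` maps each reference feature to its derived (low, high)
    abnormality determination range. `correction` maps each designated
    feature to the range supplied by the reference correction terminal.
    Entries for features that are not reference features are ignored,
    since designated features are chosen from the reference features.
    """
    corrected = dict(ranges)
    for feature, new_range in correction.items():
        if feature in corrected:
            corrected[feature] = new_range
    return corrected


# Hypothetical derived ranges and a correction for one designated feature.
derived = {"login_failures": (100.0, 500.0), "bytes_out": (1e6, 1e9)}
correction = {"login_failures": (50.0, 500.0), "unknown_feature": (0.0, 1.0)}
corrected = apply_reference_correction(derived, correction)
```

Ranges of reference features that were not designated, such as `bytes_out` here, are left exactly as derived.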

The method of claim 1,
wherein the reference correction terminal
includes at least one of a terminal of a customer requesting generation of the abnormal behavior detection model and a terminal of a controller performing abnormal behavior detection tasks.

The method of claim 1,
wherein the generating of the learning model for detecting abnormal behavior based on the classified normal data and abnormal data
generates an unsupervised learning model for detecting abnormal behavior through unsupervised learning using only the classified normal data.

The method of claim 1,
wherein the generating of the learning model for detecting abnormal behavior based on the classified normal data and abnormal data
generates a supervised learning model for detecting abnormal behavior through supervised learning using the classified normal data and abnormal data.

(Deleted)

An abnormal behavior detection model generation device using learning data generated based on XAI, the device comprising:
a primary learning model generation unit that generates a learning model for detecting abnormal behavior through supervised or unsupervised learning using collected data of a security device;
an abnormality determination range determination unit that interprets the learning model through model-agnostic methods, selects, from among a plurality of features included in the collected data, a reference feature whose contribution to the abnormal behavior detection determination of the learning model is greater than or equal to a preset criterion, derives numerical data for each reference feature, and derives an abnormality determination range for each reference feature using the derived numerical data;
an abnormality determination range correction unit that provides the abnormality determination range for each reference feature to a reference correction terminal and corrects the abnormality determination range for each reference feature based on reference correction information received from the reference correction terminal;
a learning data generation unit that classifies the collected data into normal data and abnormal data based on the corrected abnormality determination range for each reference feature; and
a secondary learning model generation unit that generates a learning model for detecting abnormal behavior based on the classified normal data and abnormal data,
wherein the primary learning model generation unit
generates a plurality of learning models for detecting abnormal behavior through supervised or unsupervised learning using a plurality of sets of collected data collected from a plurality of different security devices,
wherein the abnormality determination range determination unit
analyzes the plurality of learning models through model-agnostic methods and selects, for each set of collected data, individual features whose contribution to the abnormal behavior detection determination of the corresponding learning model is greater than or equal to a preset criterion,
counts the number of times each individual feature is selected,
selects, as reference features, individual features counted for more than half of the number of sets of collected data, and derives the abnormality determination range for each reference feature for each of the plurality of sets of collected data, and
classifies, among the abnormality determination ranges for each reference feature, the abnormality determination range of a reference feature that is affected by the amount of collected data as a subject of proportional correction, and
wherein the abnormality determination range correction unit,
when correcting the abnormality determination range of a reference feature that is a subject of proportional correction, performs the correction by reflecting a proportionality coefficient calculated based on the data amount of the collected data.
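
The proportional correction and the classification into normal and abnormal data can be sketched together in Python. The claim does not state how the proportionality coefficient is computed or how a record is matched against the ranges, so this sketch assumes that the coefficient is the ratio of two data amounts and that a record is abnormal when any reference feature value falls inside its abnormality determination range. The function names, feature name, and numbers are hypothetical.

```python
import math


def proportional_correction(rng, base_rows, actual_rows):
    """Scale a (low, high) range by an assumed proportionality coefficient.

    Assumption: the coefficient is actual_rows / base_rows, i.e. the ratio
    between the amount of data being classified and the amount of data the
    range was derived from.
    """
    k = actual_rows / base_rows
    low, high = rng
    return (low * k, high * k)


def classify(records, abnormal_ranges):
    """Split records into (normal, abnormal) lists.

    Assumption: a record is abnormal if any reference feature it contains
    has a value inside that feature's abnormality determination range.
    """
    normal, abnormal = [], []
    for record in records:
        is_abnormal = any(
            low <= record[feature] <= high
            for feature, (low, high) in abnormal_ranges.items()
            if feature in record
        )
        (abnormal if is_abnormal else normal).append(record)
    return normal, abnormal


# A count-like feature whose range is assumed to scale with data volume.
ranges = {
    "login_failures": proportional_correction((100.0, math.inf), 10_000, 20_000)
}
records = [{"login_failures": 50.0}, {"login_failures": 250.0}]
normal, abnormal = classify(records, ranges)
```

With twice as much data as the range was derived from, the lower bound doubles from 100 to 200, so the record with 250 login failures is classified as abnormal and the record with 50 as normal. The resulting `normal` and `abnormal` lists correspond to the learning data passed to the secondary learning model generation unit.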

A program stored in a computer-readable recording medium and combined with a computer, which is hardware, to execute the method of any one of claims 1 to 5.