KR20220096802A

KR20220096802A - System And Method for Detecting Abnormal Control Data through Predicting Format of Control Command

Info

Publication number: KR20220096802A
Application number: KR1020200189567A
Authority: KR
Inventors: 김광식; 소현진; 구경호
Original assignee: 주식회사 포스코아이씨티
Priority date: 2020-12-31
Filing date: 2020-12-31
Publication date: 2022-07-07
Also published as: KR102504809B1

Abstract

According to one aspect of the present invention, a system for detecting abnormal control data by predicting the format of a control command includes: a security rule generation unit that generates control data security rules based on the control commands of control data transmitted from a control terminal; and an abnormal control data determination unit that uses the control data security rules to determine whether a target control command of target control data received from the control terminal is abnormal. The control data security rules include: a clustering model that determines the type of the target control command from the target control command; a field format determination model that generates a predicted field format, corresponding to the determined type, in which the fields (the unit control commands constituting the target control command) are delimited, and applies the generated predicted field format to the target control command to generate preprocessed data; and an abnormal control data determination model that receives the preprocessed data and determines whether the target control command is abnormal control data. The present invention can therefore detect abnormal control data for control commands whose format is not disclosed.

Description

System And Method for Detecting Abnormal Control Data through Predicting Format of Control Command

The present invention relates to a system and method for detecting abnormal control data by predicting the format of a control command.

As factory control devices such as programmable logic controllers (PLCs) are increasingly connected online, cyberattacks targeting them are also increasing.

A cyberattack targeting factory control devices can illegally alter the command or value of the control data transmitted to those devices, causing them to malfunction or become inoperable and, in severe cases, damaging them.

Security systems such as intrusion detection systems have been proposed to protect factory control devices from such cyberattacks. A typical intrusion detection system is designed to distinguish normal control data from abnormal control data by determining whether the control data input to a factory control device violates predetermined security rules.

In particular, control data includes control commands for controlling a factory control device, and these control commands have a different format for each factory and process. If the control command formats for each factory and process are disclosed and leaked, they easily become targets of external attacks and weaken security; if the formats are not disclosed, however, abnormal control data containing abnormal control commands cannot be detected.

The present invention is intended to solve the above problem, and one of its technical features is to provide a system and method for detecting abnormal control data that can predict the type of a control command using a natural language processing algorithm.

Another technical feature of the present invention is to provide a system and method for detecting abnormal control data that can predict the format of a control command using a natural language processing algorithm.

A further technical feature of the present invention is to provide an abnormal control data detection system that can generate control data security rules for detecting abnormal control data by learning normal data in which the types and formats of undisclosed control commands have been predicted.

To achieve the above objective, a system for detecting abnormal control data by predicting the format of a control command according to one aspect of the present invention includes: a security rule generation unit that generates control data security rules based on the control commands of control data transmitted from a control terminal; and an abnormal control data determination unit that uses the control data security rules to determine whether a target control command of target control data received from the control terminal is abnormal. The control data security rules include: a clustering model that determines the type of the target control command from the target control command; a field format determination model that generates a predicted field format, corresponding to the determined type of the target control command, in which the fields (the unit control commands constituting the target control command) are delimited, and assigns the generated predicted field format to the target control command to generate preprocessed data; and an abnormal control data determination model that receives the preprocessed data and determines whether the target control command is abnormal control data.

According to the present invention, abnormal control data can be detected even for control commands whose format is not disclosed.

FIG. 1 is a diagram schematically showing a network configuration to which an abnormal control data detection system according to an embodiment of the present invention is applied.
FIG. 2 is a block diagram schematically showing the configuration of the gateway shown in FIG. 1.
FIG. 3 is a block diagram schematically showing the configuration of the detection server shown in FIG. 1.
FIG. 4 is a diagram showing general control command data.
FIG. 5 is a block diagram showing the process of determining abnormal control data according to control data security rules according to an embodiment of the present invention.
FIG. 6 is a flowchart showing the training process of the clustering model shown in FIG. 5.
FIG. 7 is a flowchart showing the training process of the field format determination model shown in FIG. 5.
FIG. 8 is a flowchart showing the training process of the abnormal control data determination model shown in FIG. 5.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

The terms used in this specification should be understood as follows.

A singular expression should be understood to include the plural unless the context clearly indicates otherwise, and terms such as "first" and "second" serve only to distinguish one element from another; the scope of rights should not be limited by these terms.

Terms such as "include" or "have" should be understood as not precluding the presence or addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

The term "at least one" should be understood to include every combination that can be formed from one or more of the associated items. For example, "at least one of a first item, a second item, and a third item" means not only the first item, the second item, or the third item individually, but also every combination that can be formed from two or more of the first, second, and third items.

Hereinafter, an abnormal control data detection system according to an embodiment of the present invention is described in detail with reference to FIG. 1. FIG. 1 is a diagram schematically showing a network configuration to which the abnormal control data detection system according to an embodiment of the present invention is applied. FIG. 2 is a block diagram schematically showing the configuration of the gateway shown in FIG. 1, and FIG. 3 is a block diagram schematically showing the configuration of the detection server shown in FIG. 1.

The control terminal 100 shown in FIG. 1 controls the factory control device 110 and collects the operation data generated by the factory control device 110. The control terminal 100 according to an embodiment of the present invention may include a SCADA (Supervisory Control And Data Acquisition) system or an HMI (Human Machine Interface).

Based on the control data transmitted from the control terminal 100, the factory control device 110 controls the equipment installed at the operation site and transmits the equipment control results, or the operation data obtained from each piece of equipment, to the control terminal 100. The factory control device 110 may include a PLC (Programmable Logic Controller).

The abnormal control data detection system 120 connects the control terminal 100 and the factory control device 110. Specifically, the abnormal control data detection system 120 forwards the control data generated by the control terminal 100 to the factory control device 110, and forwards the operation data generated by the factory control device 110 to the control terminal 100.

FIG. 2 is a block diagram schematically showing the configuration of the gateway shown in FIG. 1. As shown in FIG. 2, the security gateway 130 may include a data blocking unit 210, a data mirroring unit 220, and a bypass unit 230.

When control data is received from the control terminal 100, the data blocking unit 210 determines whether the IP (Internet Protocol) address of the received control data corresponds to a pre-registered IP address. If it does not, the data blocking unit 210 determines that the control data is abnormal control data received through an abnormal path and may block it from being forwarded to the factory control device 110.

The data blocking unit 210 may determine whether the IP address of control data received from the control terminal 100 corresponds to a pre-registered IP address based on a normal path list generated by the detection server 140. Normal IP addresses may be registered in the normal path list.

When control data is blocked, the data blocking unit 210 may notify the user of the blocking result.
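
The check performed by the data blocking unit 210 can be sketched as follows. This is only a minimal illustration, assuming the normal path list is held as a set of IP addresses; the names `NORMAL_PATH_LIST`, `is_registered`, and `handle_control_data`, as well as the sample addresses, are hypothetical and do not appear in the specification.

```python
import ipaddress

# Hypothetical normal path list; in the specification it is generated
# by the detection server 140.
NORMAL_PATH_LIST = {
    ipaddress.ip_address("192.0.2.10"),
    ipaddress.ip_address("192.0.2.11"),
}


def is_registered(source_ip: str, allowed=NORMAL_PATH_LIST) -> bool:
    """Return True if the sender's IP address is in the normal path list."""
    try:
        return ipaddress.ip_address(source_ip) in allowed
    except ValueError:
        # A malformed address cannot match a registered one.
        return False


def handle_control_data(source_ip: str, payload: bytes, notify=print):
    """Pass the payload on if the sender is registered; otherwise block and notify."""
    if is_registered(source_ip):
        return payload  # would be forwarded toward the factory control device 110
    notify(f"blocked control data from unregistered address {source_ip}")
    return None
```

Passing the notification function as a parameter keeps the blocking decision separate from how the user is informed, which the specification leaves open.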

The data mirroring unit 220 forwards control data from the control terminal 100 to the factory control device 110. In doing so, the data mirroring unit 220 duplicates the control data without network delay and also forwards the duplicated control data to the detection server 140. By obtaining control data through this duplication function of the data mirroring unit 220, the detection server 140 can detect abnormal control data.

The bypass unit 230 adjusts the transmission path of control data depending on whether an error has occurred in the security gateway 130. Specifically, when no error has occurred in the security gateway 130, the bypass unit 230 causes the control data to be forwarded to the factory control device 110 through the data mirroring unit 220.

When an error has occurred in the security gateway 130, however, the bypass unit 230 bypasses the data mirroring unit 220 and forwards the control data directly to the factory control device 110. In this case, the control data duplication function is not performed.

The detection server 140 generates control data security rules based on control data and, when target control data to be judged is received, compares the target control data with the control data security rules to determine whether the target control data is abnormal control data.

Hereinafter, the configuration of the detection server 140 according to the present invention is described in detail with reference to FIGS. 3 and 4.

FIG. 3 is a block diagram schematically showing the configuration of the detection server shown in FIG. 1.

The detection server 140 according to an embodiment of the present invention includes a data collection unit 310, a security rule generation unit 320, an abnormal control data determination unit 330, a monitoring unit 340, and a database 350.

The data collection unit 310 extracts and collects at least some of the control data received through the data mirroring unit 220. Control data received through the data mirroring unit 220 while the control terminal 100 and the factory control device 110 are in operation can be assumed to be normal control data; accordingly, the data collection unit 310 may collect, from among the control data, the control data received through the data mirroring unit 220 during the operating period of the control terminal 100 and the factory control device 110.

As shown in FIG. 4, the control data collected by the data collection unit 310 includes control device information, source and destination information, and a value.

The control device information includes SysInfo (System Information), PType (ProtoType), and Name. SysInfo denotes control device protocol information, PType denotes control device protocol type information, and Name denotes control device model information.

The source and destination information includes SMac (Source Mac Address), Sport (Source Port), and SIP (Source Internet Protocol), as well as DMac (Destination Mac Address), DIP (Destination Internet Protocol), and DPort (Destination Port), and may further include VID (Vlan ID), which is network information. SMac is the MAC address from which the control data was sent, Sport is the port from which it was sent, and SIP is the IP address from which it was sent. DMac is the MAC address that is to receive the control data, DIP is the IP address that is to receive it, and DPort is the port that is to receive it. VID is network information and may denote identification information used to distinguish virtual networks from one another when the network is separated into virtual networks.
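
For orientation, the fields listed for FIG. 4 could be held in a record such as the one below. This is a sketch only: the field names follow the identifiers used in the description, but the class name `ControlData`, the Python types, the `Value` field name for the value, and the sample contents are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlData:
    # Control device information
    SysInfo: str  # control device protocol information
    PType: str    # control device protocol type information
    Name: str     # control device model information
    # Source and destination information
    SMac: str     # MAC address that sent the control data
    Sport: int    # port that sent the control data
    SIP: str      # IP address that sent the control data
    DMac: str     # MAC address that is to receive the control data
    DIP: str      # IP address that is to receive the control data
    DPort: int    # port that is to receive the control data
    # Value: the control command, assumed here to be a hexadecimal string
    Value: str
    # Optional network information (VLAN ID)
    VID: Optional[int] = None


# Hypothetical sample record; none of these values come from the specification.
sample = ControlData(
    SysInfo="proto-x", PType="type-1", Name="model-a",
    SMac="00:00:5e:00:53:01", Sport=50000, SIP="192.0.2.10",
    DMac="00:00:5e:00:53:02", DIP="192.0.2.20", DPort=502,
    Value="0a1b2c3d",
)
```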

The value denotes a control command for a factory and process and may consist of a plurality of fields. Each field is a unit control command that controls the factory and process and may consist of hexadecimal data. The types and sizes of the fields constituting a control command may vary by factory and process. For example, control commands for a pre-process, a main process, a post-process, and a common process contain different fields and may therefore have different lengths. The field composition of the control command for each factory and process can be defined as a field format. Because a field format that is disclosed, or stored in the system and leaked, easily becomes the target of an external attack and weakens security, field formats are generally neither disclosed nor stored in the system. Without information on the field format, however, abnormal control data cannot be detected for the control command. Accordingly, the abnormal control data detection system according to the present invention classifies control commands to determine the type of each control command, predicts the field format corresponding to the determined type, and generates control data security rules for detecting abnormal control data based on the predicted field format.

In another embodiment of the present invention, the data collection unit 310 may be omitted, and control data may be input to and stored in the database 350.

The security rule generation unit 320 generates control data security rules by training on the acquired control data with machine learning and deep learning techniques. According to an embodiment of the present invention, the security rule generation unit 320 may generate a clustering model 410 and a field format determination model 420, which preprocess the target control command of target control data input to the control terminal 100, and an abnormal control data determination model 430, which determines abnormal control data. According to an embodiment of the present invention, the clustering model 410, the field format determination model 420, and the abnormal control data determination model 430 may be generated using the control commands of the control data transmitted to the control terminal 100. These three models are described below with reference to FIGS. 5 to 8.

The abnormal control data determination unit 330 preprocesses the target control command of the target control data to generate preprocessed data, and inputs the preprocessed data into the abnormal control data determination model 430 to determine whether the target control data is abnormal.

The abnormal control data determination unit 330 includes a preprocessing unit 331 and a determination unit 332.

The preprocessing unit 331 preprocesses the target control command of the target control data to generate preprocessed data. Specifically, according to an embodiment of the present invention, the preprocessing unit 331 clusters the target control command to determine its type, and uses the field format determination model 420 corresponding to the determined type to generate preprocessed data in which a predicted field format is assigned to the target control command.

The determination unit 332 inputs the preprocessed data into the abnormal control data determination model 430 to determine whether the target control data is abnormal. Specifically, the preprocessed data is input into the abnormal control data determination model 430, which is an autoencoder model, to calculate a reconstruction loss value for the target control command. The reconstruction loss value represents the difference between the data input to the abnormal control data determination model 430 and the data predicted by that model. That is, a larger reconstruction loss value means a larger difference between the data input to the abnormal control data determination model 430 and the predicted data, and the reconstruction loss value of anomalous data is larger than that of normal data. Accordingly, the abnormal control data determination model 430 determines that the input target control command is abnormal when the reconstruction loss value exceeds a predetermined limit.
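
The thresholding step described above can be illustrated as follows. The sketch assumes mean squared error as the reconstruction loss, which the specification does not name, and it replaces the trained autoencoder with a toy stand-in; `reconstruction_loss`, `is_abnormal`, `toy_reconstruct`, and the threshold value are all hypothetical.

```python
from typing import Callable, List, Sequence


def reconstruction_loss(x: Sequence[float], x_hat: Sequence[float]) -> float:
    """Mean squared error between the input and its reconstruction (assumed loss)."""
    if not x or len(x) != len(x_hat):
        raise ValueError("input and reconstruction must be non-empty and equal in length")
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def is_abnormal(x: Sequence[float], reconstruct: Callable, threshold: float) -> bool:
    """Flag the input as abnormal when its loss exceeds the predetermined limit."""
    return reconstruction_loss(x, reconstruct(x)) > threshold


def toy_reconstruct(x: Sequence[float]) -> List[float]:
    """Stand-in for a trained autoencoder: reproduces values only within [0, 1]."""
    return [min(max(v, 0.0), 1.0) for v in x]


print(is_abnormal([0.2, 0.9], toy_reconstruct, threshold=0.01))  # in range: not flagged
print(is_abnormal([0.2, 3.0], toy_reconstruct, threshold=0.01))  # out of range: flagged
```

Because the model is trained only on data assumed to be normal, inputs that resemble the training data reconstruct well and stay under the limit, while unfamiliar inputs reconstruct poorly.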

The monitoring unit 340 detects whether the data collection unit 310, the security rule generation unit 320, the abnormal control data determination unit 330, and the database 350 are operating normally and, if any of them is operating abnormally, shuts down and restarts that component or the entire detection server 140.

The database 350 may store the control data input through the data collection unit 310. The clustering model 410, the field format determination model 420, and the abnormal control data determination model 430 may be generated using the control data stored in the database 350.

Hereinafter, control data security rules according to an embodiment of the present invention are described in detail with reference to FIGS. 5 to 8. FIG. 5 is a block diagram showing the process of determining abnormal control data according to control data security rules according to an embodiment of the present invention, and FIG. 6 is a flowchart showing the training process of the clustering model shown in FIG. 5. FIG. 7 is a flowchart showing the training process of the field format determination model shown in FIG. 5, and FIG. 8 is a flowchart showing the training process of the abnormal control data determination model shown in FIG. 5.

As shown in FIG. 5, the abnormal control data detection system according to an embodiment of the present invention determines whether target control data is abnormal control data using control data security rules 400, which include the clustering model 410, the field format determination model 420, and the abnormal control data determination model 430. The system determines the type of the control command using the clustering model 410, then preprocesses the target control command by assigning it a predicted field format using the field format determination model 420 corresponding to the determined type, thereby generating preprocessed data. The preprocessed data is input into the abnormal control data determination model 430 to determine whether the target control data is abnormal control data.

The clustering model 410 determines the type of a control command. To this end, the clustering model 410 is trained and generated through the process shown in FIG. 6. As described above, the clustering model 410 may be generated using the control commands of the control data transmitted to the control terminal 100.

According to an embodiment of the present invention, the clustering model 410 may be trained on control commands preprocessed according to a natural language processing (NLP) algorithm.

First, the types of control commands and the number of types are predicted based on the lengths of the control commands (S601).

Next, it is determined whether the predicted number of control command types (N) is less than or equal to 4, or greater than 4 (S602).

If the predicted number of control command types (N) is less than or equal to 4, a clustering model that determines the type of each control command based on the predicted types is stored (S607).
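
Steps S601, S602, and the N ≤ 4 branch of S607 can be sketched as below. The specification does not state exactly how types are predicted from length, so this illustration assumes the simplest reading, in which each distinct command length counts as one predicted type; the function names and the constant name are hypothetical.

```python
MAX_TYPES_BY_LENGTH = 4  # threshold compared against N in step S602


def predict_types_by_length(commands):
    """S601 (assumed reading): each distinct command length is one predicted type."""
    return sorted({len(c) for c in commands})


def length_based_model(commands):
    """S602/S607: return a length -> type-index lookup when N <= 4, else None."""
    types = predict_types_by_length(commands)
    if len(types) <= MAX_TYPES_BY_LENGTH:
        return {length: index for index, length in enumerate(types)}
    return None  # N > 4: training continues with vectorization and clustering


model = length_based_model(["0a1b", "ffee", "0a1b2c3d"])
```

In this reading, a `None` result signals that length alone is not treated as sufficient and the NLP-based path is taken instead.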

If the predicted number of control command types (N) is greater than 4, a natural language processing (NLP) algorithm is applied to the control commands (S603). The NLP algorithm here is a method of representing words as vectors and includes, for example, one-hot encoding and word embedding. The abnormal control data detection system according to the present invention treats control commands as a dialogue between systems and preprocesses them by applying the NLP algorithm. The control commands are thereby vectorized.
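
One way to picture the vectorization in S603 is one-hot encoding over byte-sized tokens. The specification does not say how a command is split into "words", so the two-hex-digit tokenization below, the 256-dimensional vocabulary, and the summing of token vectors into one fixed-length vector are assumptions for illustration; all function names are hypothetical.

```python
VOCAB_SIZE = 256  # one slot per possible byte value (assumed vocabulary)


def tokenize(command_hex: str):
    """Split a hexadecimal command into two-character (one-byte) tokens."""
    if len(command_hex) % 2:
        raise ValueError("expected an even number of hex digits")
    return [command_hex[i:i + 2].lower() for i in range(0, len(command_hex), 2)]


def one_hot(token: str):
    """One-hot encode a byte token as a 256-dimensional vector."""
    vec = [0] * VOCAB_SIZE
    vec[int(token, 16)] = 1
    return vec


def vectorize(command_hex: str):
    """Sum the one-hot vectors of all tokens into one fixed-length vector."""
    total = [0] * VOCAB_SIZE
    for token in tokenize(command_hex):
        total = [t + b for t, b in zip(total, one_hot(token))]
    return total
```

Summing the token vectors discards token order but yields vectors of equal length for commands of different lengths, which a distance-based clustering step requires; a learned word embedding would be an alternative.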

The vectorized control commands are then clustered (S604). Specifically, the clustering model 410 receives the vectorized control commands and classifies them into N groups, where N is the predicted number of control command types. Here, the clustering model 410 may be a K-means model.

The result of the classification by the clustering model 410 is checked (S605), and if the result is valid, the trained clustering model 410 is stored (S607). The clustering model 410 may include clustering-related information such as the types of the clustered control commands, the number of control command types (N), the positions of the cluster centers, and the clustering criteria. If the result is not valid, the predicted number of control command types (N) is increased by 1, which increases the number of groups into which the clustering model classifies the commands (S606), and steps S604 to S605 are repeated to regenerate the clustering model 410.
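
The loop formed by steps S604 to S606 may be sketched as follows. The specification does not define how the validity of a clustering result is checked in S605, so the sketch substitutes a hypothetical criterion (every point lies within a squared-distance tolerance of its cluster center). The plain K-means routine, its seeding with the first k points, and all names are likewise assumptions made for illustration, and the toy example starts from a small N for brevity.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
            for p in points]


def kmeans(points, k, iterations=20):
    """Plain Lloyd iterations; the first k points seed the centers (assumption)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign(points, centers)


def is_valid(points, centers, labels, tolerance):
    """Hypothetical S605 check: every point is close to its own center."""
    return all(squared_distance(p, centers[label]) <= tolerance
               for p, label in zip(points, labels))


def fit_clustering_model(points, n_start, tolerance):
    """Repeat S604 and S605, increasing N by 1 (S606) until the result is valid."""
    for n in range(n_start, len(points) + 1):
        centers, labels = kmeans(points, n)
        if is_valid(points, centers, labels, tolerance):
            return {"n_types": n, "centers": centers, "labels": labels}
    raise ValueError("no valid clustering found")


points = [(0, 0), (10, 10), (20, 0), (0, 1), (10, 11), (20, 1)]
model = fit_clustering_model(points, n_start=2, tolerance=1.0)
```

With these toy points, N = 2 fails the assumed validity check and the loop settles on N = 3. The returned dictionary mirrors the clustering-related information the model is said to hold (number of types and center positions).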

The field format determination model 420 preprocesses the target control command into the predicted field format corresponding to the type of the target control command determined by the clustering model 410. Specifically, according to an embodiment of the present invention, the field format determination model 420 inserts white space, that is, blank characters serving as field delimiters, into the target control command at fixed-length intervals according to the predicted field format, and then preprocesses the target control command with the inserted white space according to a natural language processing (NLP) algorithm to generate preprocessed data. To this end, the field format determination model 420 is generated through the process shown in FIG. 7.

First, the control commands are clustered using the clustering model 410 (S701). Specifically, as described above, each control command is classified using the clustering model 410 to determine its type.

According to an embodiment of the present invention, a separate field format determination model 420 is generated for each type of control command determined by the clustering model 410. For example, when the clustering model 410 classifies the control commands into five types, five field format determination models 420 are generated, one for each type.

Next, white space is inserted into each control command at fixed-length intervals as a field delimiter (S702). That is, each control command is converted into a field-delimited control command whose fields have a fixed length. The white space may be inserted at fixed intervals of 1 byte (char type), 2 bytes (WORD/INT, int/short type), 4 bytes (DWORD/DINT/REAL, int/float type), 8 bytes (float/double type), or 16 bytes (string), corresponding to the sizes of the data types used in control commands. For example, as shown in FIG. 4, for a control command "003403FA01343CAF0167DE..." of control data, white space is inserted as a field delimiter at fixed 1-byte intervals, yielding "00 34 03 FA 01 34 3C AF 01 67 DE ...". Each part of the control command delimited by white space can then be assumed to correspond to a field.
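
The white space insertion of step S702 may be illustrated by the following minimal Python sketch. It assumes that a control command is given as a hex string with two characters per byte, as in the FIG. 4 example; the function name `insert_field_delimiters` is hypothetical.

```python
def insert_field_delimiters(command_hex, interval_bytes):
    """Insert a space after every `interval_bytes` bytes of a hex-string command."""
    width = 2 * interval_bytes  # two hex characters encode one byte
    fields = [command_hex[i:i + width] for i in range(0, len(command_hex), width)]
    return " ".join(fields)


command = "003403FA01343CAF0167DE"
delimited_1 = insert_field_delimiters(command, 1)  # "00 34 03 FA 01 34 3C AF 01 67 DE"
delimited_2 = insert_field_delimiters(command, 2)  # "0034 03FA 0134 3CAF 0167 DE"
```

A trailing field shorter than the interval is kept as it is in this sketch; the specification does not say how such a remainder is handled.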

Next, abnormal data is generated from the field-delimited control commands (S703). Specifically, abnormal data is generated by reordering, deleting, adding, or changing the values of fields in a control command whose fields are delimited by white space. In this way, order-violation abnormal data, truncation-violation abnormal data, specified-length-violation abnormal data, and fixed-value-violation abnormal data can be generated for the control data.
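
One possible way of producing the four kinds of abnormal data in step S703 is sketched below. The pairing of each field operation with a violation type follows the order in which they are listed above and is an interpretation; which fields are altered, the filler value "FF", and the function name are hypothetical choices.

```python
def make_abnormal_samples(delimited_command, filler="FF"):
    """Derive one abnormal sample per violation type from a field-delimited command."""
    fields = delimited_command.split()
    if len(fields) < 2:
        raise ValueError("at least two fields are needed")

    reordered = fields[:]
    reordered[0], reordered[1] = reordered[1], reordered[0]  # change field order
    truncated = fields[:-1]                                  # delete a field
    extended = fields + [filler]                             # add a field
    altered = [filler] + fields[1:]                          # change a field value

    return {
        "order_violation": " ".join(reordered),
        "truncation_violation": " ".join(truncated),
        "length_violation": " ".join(extended),
        "fixed_value_violation": " ".join(altered),
    }


samples = make_abnormal_samples("00 34 03 FA")
```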

Next, a natural language processing (NLP) algorithm is applied to the field-delimited control commands and the abnormal data (S704). Specifically, applying the NLP algorithm vectorizes both the field-delimited control commands and the abnormal data.

Next, the field format determination model 420, which is an autoencoder model, is trained using the vectorized field-delimited control commands and the vectorized abnormal data (S705). Specifically, the vectorized field-delimited control commands and abnormal data are input to the field format determination model 420, which calculates a reconstruction loss value for each. The reconstruction loss value indicates the difference between the data input to the field format determination model 420 and the data predicted by the model. That is, a larger reconstruction loss value means a larger difference between the input data and the predicted data, and the reconstruction loss value of abnormal data is larger than that of normal data.
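
The reconstruction loss may be illustrated as follows. The specification does not name a particular loss function, so mean squared error is assumed here; the reconstructed vector stands in for the output of a trained autoencoder and its values are invented purely for illustration.

```python
def reconstruction_loss(original, reconstructed):
    """Mean squared error between an input vector and its reconstruction (assumed metric)."""
    if len(original) != len(reconstructed):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)


# Hypothetical autoencoder output, close to the pattern of normal commands.
reconstructed = [0.9, 0.1, 1.0, 0.8]

normal_loss = reconstruction_loss([1, 0, 1, 1], reconstructed)
abnormal_loss = reconstruction_loss([0, 1, 1, 1], reconstructed)
```

Because the model reproduces the normal pattern well, the abnormal input, whose first two components are swapped, yields the larger loss.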

Next, whether to stop training the field format determination model 420 is decided using the reconstruction loss values that the model calculated for the vectorized field-delimited control commands and the vectorized abnormal data (S706). The reconstruction loss value arising from the difference between abnormal data and normal data depends on the interval of the white space inserted into the field-delimited control commands: the closer the interval is to the optimal field format, the larger the difference between abnormal data and normal data, and the larger the reconstruction loss value. Accordingly, the current reconstruction loss value is compared with the previously calculated one, and when the rate of increase is equal to or greater than a predetermined limit, the current field format is judged to be the optimal field format, and training of the field format determination model 420, which has learned that format, may be stopped.

In addition, for training of the field format determination model 420 to be stopped, the reconstruction loss value of the field-delimited control commands must be smaller than a predetermined limit, and the reconstruction loss value of each item of vectorized abnormal data must be larger than the overall reconstruction loss value of the field-delimited control commands.
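
The stopping conditions of step S706 may be combined as in the sketch below. It assumes that the rate of increase is the relative change with respect to the previous loss value and that no decision to stop is made when no previous value exists; the parameter names and threshold values are hypothetical.

```python
def should_stop_training(prev_loss, curr_loss, normal_loss, abnormal_losses,
                         rate_limit, normal_limit):
    """Return True when all stopping conditions described for S706 hold."""
    if prev_loss is None or prev_loss <= 0:
        return False  # assumption: a rate cannot be computed without a previous loss
    rate_of_increase = (curr_loss - prev_loss) / prev_loss
    return (rate_of_increase >= rate_limit            # loss rose sharply enough
            and normal_loss < normal_limit            # normal commands reconstruct well
            and all(loss > normal_loss for loss in abnormal_losses))


stop = should_stop_training(prev_loss=0.10, curr_loss=0.30, normal_loss=0.02,
                            abnormal_losses=[0.25, 0.40],
                            rate_limit=1.0, normal_limit=0.05)
```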

When training of the field format determination model 420 is stopped, the trained field format determination model 420 is stored (S708).

If training of the field format determination model 420 is not stopped, the interval at which white space is inserted into the control commands is increased (S707). As described above, white space may be inserted into a control command at intervals of 1 byte (char type), 2 bytes (WORD/INT, int/short type), 4 bytes (DWORD/DINT/REAL, int/float type), 8 bytes (float/double type), or 16 bytes (string). Steps S702 to S706 are first performed with white space inserted at 1-byte intervals, and if training is not stopped, the insertion interval is doubled. That is, the interval is doubled and steps S702 to S706 are repeated, regenerating the field format determination model 420, until training is stopped.
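
The interval search of steps S702 to S707 may be sketched as a simple doubling loop. The callable `train_and_check` is a hypothetical stand-in for steps S702 to S706 at a given interval and is assumed to return the trained model together with the stop decision. The specification does not say what happens if no interval up to 16 bytes satisfies the stopping conditions, so the sketch simply reports that no format was found.

```python
def search_field_interval(train_and_check, start=1, max_interval=16):
    """Try intervals 1, 2, 4, 8, 16 bytes until training can be stopped."""
    interval = start
    while interval <= max_interval:
        model, stop = train_and_check(interval)  # stands in for S702 to S706
        if stop:
            return interval, model               # S708: keep this model
        interval *= 2                            # S707: double the interval
    return None, None                            # assumption: nothing suitable found


# Toy stand-in that "stops" at a 4-byte interval and records what was tried.
tried = []


def fake_train_and_check(interval):
    tried.append(interval)
    return f"model@{interval}", interval == 4


best_interval, best_model = search_field_interval(fake_train_and_check)
```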

The abnormal control data determination model 430 determines whether target control data input through the control terminal 100 is abnormal control data. To this end, the abnormal control data determination model 430 is generated through the process shown in FIG. 8.

First, the control commands are clustered by the clustering model 410 described above to determine the type of each control command (S801).

Next, the field format determination model corresponding to the type determined in step S801 inserts white space into the control command, and the result is vectorized by a natural language processing (NLP) algorithm, yielding preprocessed data in which the predicted field format has been applied to the control command.

Next, the abnormal control data determination model 430, which is an autoencoder model, is trained using the preprocessed data to which the predicted field format has been applied (S803). According to an embodiment of the present invention, the abnormal control data determination model 430 is trained on normal data. Specifically, control commands of normal data are preprocessed by the field format determination model 420 corresponding to their type, so that the predicted field format is applied and the commands are vectorized, and the abnormal control data determination model 430 is trained on the resulting preprocessed data.

The trained abnormal control data determination model 430 is then stored (S804).

Referring again to FIG. 5, the abnormal control data detection system according to the present invention can determine whether target control data is abnormal using the clustering model 410, the field format determination model 420, and the abnormal control data determination model 430 generated through the processes described above.

The abnormal control data detection system according to the present invention clusters the target control command using the clustering model 410 to determine its type, applies a field format to the target control command using the field format determination model 420 corresponding to the determined type, and vectorizes the result with a natural language processing (NLP) algorithm to generate preprocessed data. The system then inputs the preprocessed data into the abnormal control data determination model 430 to calculate a reconstruction loss value for the target control command. Here, the reconstruction loss value indicates the difference between the data input to the abnormal control data determination model 430 and the data predicted by that model. That is, a larger reconstruction loss value means a larger difference between the data input to the abnormal control data determination model 430 and the predicted data, and the reconstruction loss value of abnormal data is larger than that of normal data. Accordingly, the abnormal control data determination model 430 determines that the input target control data is abnormal when the reconstruction loss value exceeds a predetermined limit.
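
The detection flow described above may be summarized by the following sketch, in which each model is passed in as a callable. All stand-ins are toy assumptions: the type is taken from the command length, the preprocessing merely splits the command into bytes, and the "loss" is the fraction of fields that differ from a fixed template instead of the output of a trained autoencoder.

```python
def detect_abnormal(command, classify, preprocessors, loss_model, limit):
    """Classify, preprocess with the matching field format, then threshold the loss."""
    command_type = classify(command)                     # clustering model 410
    preprocessed = preprocessors[command_type](command)  # field format model 420
    loss = loss_model(preprocessed)                      # determination model 430
    return loss > limit, loss


def split_bytes(command):
    """Toy preprocessing: one field per byte of a hex-string command."""
    return [command[i:i + 2] for i in range(0, len(command), 2)]


TEMPLATE = ["00", "34", "03", "FA"]  # hypothetical pattern of a normal command


def toy_loss(fields):
    """Toy stand-in for a reconstruction loss: share of fields that deviate."""
    return sum(a != b for a, b in zip(fields, TEMPLATE)) / len(TEMPLATE)


preprocessors = {8: split_bytes}  # one preprocessor per command type (here: length 8)

normal_flag, normal_loss = detect_abnormal("003403FA", len, preprocessors, toy_loss, 0.25)
abnormal_flag, abnormal_loss = detect_abnormal("FF3403FB", len, preprocessors, toy_loss, 0.25)
```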

Those skilled in the art to which the present invention pertains will understand that the present invention described above may be embodied in other specific forms without changing its technical spirit or essential characteristics.

Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive. The scope of the present invention is defined by the following claims rather than by the above detailed description, and all changes or modifications derived from the meaning and scope of the claims and their equivalents should be construed as falling within the scope of the present invention.

All of the methods and procedures disclosed herein may be implemented, at least in part, using one or more computer programs or components. Such components may be provided as a series of computer instructions on any conventional computer-readable or machine-readable medium, including volatile and non-volatile memory such as RAM, ROM, flash memory, magnetic or optical disks, optical memory, or other storage media. The instructions may be provided as software or firmware and may be implemented, in whole or in part, in hardware components such as ASICs, FPGAs, DSPs, or any other similar devices. The instructions may be configured to be executed by one or more processors or other hardware components, which, when executing the series of computer instructions, perform or enable the performance of all or part of the methods and procedures disclosed herein.

100: control terminal 110: factory control device
120: abnormal control data detection system
130: security gateway 140: detection server

Claims

1. An abnormal control data detection system comprising:
a security rule generation unit that generates a control data security rule based on control commands of control data transmitted from a control terminal; and
an abnormal control data determination unit that determines, using the control data security rule, whether a target control command of target control data received from the control terminal is abnormal,
wherein the control data security rule includes:
a clustering model that determines a type of the target control command using the target control command;
a field format determination model that generates a predicted field format, which corresponds to the determined type of the target control command and in which fields, each being a unit control command constituting the target control command, are delimited, and that applies the generated predicted field format to the target control command to generate preprocessed data; and
an abnormal control data determination model that receives the preprocessed data and determines whether the target control command is abnormal control data.

2. The abnormal control data detection system of claim 1,
wherein the security rule generation unit
generates the clustering model by vectorizing the control commands using a natural language processing (NLP) algorithm and clustering the vectorized control commands.

3. The abnormal control data detection system of claim 1,
wherein the security rule generation unit
generates the clustering model through a process of determining the type of each control command based on the length of the control command, or through a process of determining the type of the control command according to a K-means clustering algorithm.

4. The abnormal control data detection system of claim 1,
wherein the security rule generation unit
predicts the types of the control commands and the number of the types based on the length of the control commands,
generates, when the predicted number of types is less than or equal to 4, the clustering model that determines the type of each control command based on the predicted types, and
generates, when the predicted number of types is greater than 4, the clustering model that determines the type of each control command according to a K-means clustering algorithm.

5. The abnormal control data detection system of claim 1,
wherein the security rule generation unit
predicts the types of the control commands and the number of the types based on the length of the control commands,
generates, when the predicted number of types is less than or equal to 4, the clustering model that determines the type of each control command based on the predicted types, and
when the predicted number of types is greater than 4, vectorizes the control commands using a natural language processing (NLP) algorithm and generates the clustering model that determines the type of each control command according to a K-means clustering algorithm.

6. The abnormal control data detection system of claim 1,
wherein the security rule generation unit
generates a field-delimited control command, in which fields that are the unit control commands are delimited, by inserting white space, which is blank space, into the control command at fixed intervals, and trains the field format determination model with the generated field-delimited control command, and
wherein the trained field format determination model generates the predicted field format.

7. The abnormal control data detection system of claim 1,
wherein the security rule generation unit
generates a field-delimited control command, in which fields that are the unit control commands are delimited, by inserting white space, which is blank space, into the control command at fixed intervals,
generates abnormal data using the field-delimited control command,
vectorizes the field-delimited control command and the abnormal data through a natural language processing (NLP) algorithm, and
trains the field format determination model by inputting the vectorized field-delimited control command and the vectorized abnormal data into the field format determination model, and
wherein the trained field format determination model generates the predicted field format.

8. The abnormal control data detection system of claim 7,
wherein the security rule generation unit
inputs the vectorized field-delimited control command and the vectorized abnormal data into the field format determination model to calculate a reconstruction loss value,
when the rate of increase of the calculated reconstruction loss value is greater than or equal to a certain limit, stops training of the field format determination model and stores the trained field format determination model in the abnormal control data determination unit, and
when the rate of increase of the calculated reconstruction loss value is less than the certain limit, regenerates the field format determination model by doubling the white space insertion interval of the field-delimited control command.

9. The abnormal control data detection system of claim 7,
wherein the security rule generation unit
inputs the vectorized field-delimited control command and the vectorized abnormal data into the field format determination model to calculate a reconstruction loss value, and
when the rate of increase of the calculated reconstruction loss value is less than a certain limit, doubles the interval at which the white space is inserted into the field-delimited control command and regenerates the field format determination model.

10. The abnormal control data detection system of claim 1,
wherein the security rule generation unit
calculates the type of a control command using the clustering model,
generates preprocessed data by applying the predicted field format using the field format determination model corresponding to the type of the control command, and
trains the abnormal control data determination model, which is an autoencoder model, using the preprocessed data.

11. The abnormal control data detection system of claim 1,
wherein the abnormal control data determination model receives the preprocessed data and calculates a reconstruction loss value, and
the abnormal control data determination unit determines that the target control data is abnormal when the calculated reconstruction loss value is equal to or greater than a predetermined limit.

12. An abnormal control data detection method comprising:
generating a clustering model based on control commands of control data transmitted from a control terminal;
generating a field format determination model using the clustering model based on the control commands;
generating an abnormal control data determination model using the clustering model and the field format determination model based on the control commands; and
determining whether a target control command of target control data received from the control terminal is abnormal control data using the clustering model, the field format determination model, and the abnormal control data determination model.

13. The method of claim 12,
wherein generating the clustering model comprises:
predicting the types of the control commands and the number of the types based on the length of the control commands;
generating, when the predicted number of types is less than or equal to 4, the clustering model that determines the type of each control command based on the predicted types;
vectorizing the control commands using a natural language processing (NLP) algorithm when the predicted number of types is greater than 4; and
generating, when the predicted number of types is greater than 4, the clustering model that determines the type of each vectorized control command according to a K-means clustering algorithm.

14. The method of claim 12,
wherein generating the field format determination model comprises:
generating a field-delimited control command, in which fields that are unit control commands are delimited, by inserting white space, which is blank space, into the control command at fixed intervals;
generating abnormal data using the field-delimited control command;
vectorizing the field-delimited control command and the abnormal data through a natural language processing (NLP) algorithm;
training the field format determination model with the vectorized field-delimited control command and the vectorized abnormal data, and calculating a reconstruction loss value for the vectorized field-delimited control command and the vectorized abnormal data;
stopping training of the field format determination model when the rate of increase of the reconstruction loss value is greater than or equal to a certain limit; and
regenerating the field format determination model by doubling the white space insertion interval of the field-delimited control command when the rate of increase of the reconstruction loss value is less than the certain limit,
wherein the type of the control command is calculated by the clustering model, and a field format determination model corresponding to the calculated type of the control command is generated, and
wherein the generated field format determination model generates a predicted field format corresponding to the type of the control command.

15. The method of claim 12,
wherein generating the abnormal control data determination model comprises:
calculating the type of a control command using the clustering model;
generating preprocessed data by applying a predicted field format using the field format determination model corresponding to the calculated type of the control command; and
training the abnormal control data determination model corresponding to the type of the control command using the preprocessed data.

16. The method of claim 12,
wherein determining whether the target control command of the target control data received from the control terminal is abnormal control data comprises:
calculating the type of the target control command;
generating preprocessed data by applying a predicted field format using the field format determination model corresponding to the calculated type of the control command; and
determining whether the control command is abnormal by inputting the preprocessed data into the abnormal control data determination model corresponding to the calculated type of the control command.