KR20230067360A

KR20230067360A - Apparatus for anomaly detection based on clustering and method therefor

Info

Publication number: KR20230067360A
Application number: KR1020210153391A
Authority: KR
Inventors: 박영현; 정성욱
Original assignee: 에스케이플래닛 주식회사
Priority date: 2021-11-09
Filing date: 2021-11-09
Publication date: 2023-05-16

Abstract

A clustering-based anomaly detection method according to the present invention includes: generating, by a cluster unit, a data cluster by clustering accumulated data each time a preset first number of input data items has accumulated; inputting, by a detection unit, the data cluster into a detection network trained to simulate the data cluster; when the detection network reconstructs the data cluster to generate a simulated data cluster that simulates the data cluster, determining, by the detection unit, whether the difference between the simulated data cluster and the input data cluster is greater than or equal to a preset threshold; and, if the difference is greater than or equal to the preset threshold, determining that the input data cluster contains an anomaly.

Description

Apparatus for anomaly detection based on clustering and method therefor

The present invention relates to anomaly detection technology and, more particularly, to an apparatus for clustering-based anomaly detection and a method therefor.

Machine learning and deep learning techniques are increasingly being adopted to improve work efficiency. As a typical example, manufacturers adopt these techniques to speed up and automate good/defective determinations for their products. Conventional methods, however, are limited in that a labeled data set is required to build the machine learning model that performs the pass/fail determination. Even when an anomaly detection model is to be built, data in the normal category must be found and labeled. Given the volume and speed of production, manually labeling every product through total inspection can be a heavy burden for a manufacturer, so a fundamental solution is required.

Korean Registered Patent No. 2297232 (registered on August 27, 2021)

An object of the present invention is to provide an apparatus for clustering-based anomaly detection and a method therefor.

A method for anomaly detection according to an embodiment of the present invention includes: generating, by a cluster unit, a data cluster by clustering accumulated data each time a preset first number of input data items has accumulated; inputting, by a detection unit, the data cluster into a detection network trained to simulate the data cluster; when the detection network reconstructs the data cluster to generate a simulated data cluster that simulates the data cluster, determining, by the detection unit, whether the difference between the simulated data cluster and the input data cluster is greater than or equal to a preset threshold; and, if the difference is greater than or equal to the preset threshold, determining that the input data cluster contains an anomaly.

If the determination shows that the difference is less than the preset threshold, the method further includes: erasing, by the cluster unit, a preset second number of data items from the data cluster; generating, by the cluster unit, a new data cluster by accumulating newly input data into the data cluster from which the preset second number of data items has been erased; and detecting, by the detection unit, whether the newly generated data cluster is anomalous using the detection network.

Before the step of generating the data cluster, the method further includes: accumulating, by the cluster unit, the preset first number of data items and clustering the accumulated data to generate a training data cluster; inputting, by a learning unit, the training data cluster into an untrained detection network; generating, by the detection network, a simulated training data cluster that simulates the training data cluster by reconstructing the training data cluster; and performing, by the learning unit, an optimization that updates the parameters of the detection network so that the difference between the simulated training data cluster and the input training data cluster is minimized.
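The training step above (reconstruct the training data cluster, then update parameters to shrink the reconstruction difference) can be sketched as follows. This is only an illustration under stated assumptions: the source does not fix a network architecture, loss, or optimizer, so the sketch uses a tiny linear auto-encoder with a single latent value, a mean squared reconstruction loss, and full-batch gradient descent. The names `forward`, `mean_loss`, `train`, and `training_cluster`, the learning rate, and the epoch count are all hypothetical.

```python
def forward(e, d, x):
    """Encode x to one latent value z, then decode z back to the input dimension."""
    z = sum(ei * xi for ei, xi in zip(e, x))
    return z, [dk * z for dk in d]

def mean_loss(e, d, data):
    """Mean squared reconstruction difference over all samples in the cluster."""
    total = 0.0
    for x in data:
        _, x_hat = forward(e, d, x)
        total += sum((h - v) ** 2 for h, v in zip(x_hat, x))
    return total / len(data)

def train(data, e, d, lr=0.05, epochs=300):
    """Full-batch gradient descent on the reconstruction loss (assumed optimizer)."""
    e, d = list(e), list(d)
    n = len(data)
    for _ in range(epochs):
        grad_e = [0.0] * len(e)
        grad_d = [0.0] * len(d)
        for x in data:
            z, x_hat = forward(e, d, x)
            residual = [h - v for h, v in zip(x_hat, x)]
            # Gradient of the squared error with respect to the latent value z.
            dz = 2.0 * sum(r * dk for r, dk in zip(residual, d))
            for k, r in enumerate(residual):
                grad_d[k] += 2.0 * r * z / n
            for i, xi in enumerate(x):
                grad_e[i] += dz * xi / n
        e = [ei - lr * g for ei, g in zip(e, grad_e)]
        d = [dk - lr * g for dk, g in zip(d, grad_d)]
    return e, d

# Hypothetical unlabeled training cluster: two correlated measurements per item.
training_cluster = [[a, 2.0 * a] for a in (-1.0, -0.5, 0.5, 1.0)]
e0, d0 = [0.5, 0.5], [0.5, 0.5]
initial_loss = mean_loss(e0, d0, training_cluster)
e1, d1 = train(training_cluster, e0, d0)
final_loss = mean_loss(e1, d1, training_cluster)
print(initial_loss, final_loss)
```

No labels appear anywhere in the loop: the only training signal is how well each cluster is reproduced, which matches the label-free premise of the method.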

The data included in the training data cluster includes normal data and abnormal data. In addition, the ratio of the number of abnormal data items to the number of normal data items in the training data cluster is less than a preset ratio (ab).

An apparatus for anomaly detection according to an embodiment of the present invention includes: a cluster unit that generates a data cluster by clustering accumulated data each time a preset first number of input data items has accumulated; and a detection unit that inputs the data cluster into a detection network trained to simulate the data cluster and, when the detection network reconstructs the data cluster to generate a simulated data cluster that simulates the data cluster, determines whether the difference between the simulated data cluster and the input data cluster is greater than or equal to a preset threshold and, if the difference is greater than or equal to the preset threshold, determines that the input data cluster contains an anomaly.

If the determination shows that the difference is less than the preset threshold, the cluster unit erases a preset second number of data items from the data cluster and generates a new data cluster by accumulating newly input data into the data cluster from which the preset second number of data items has been erased, and the detection unit detects whether the newly generated data cluster is anomalous using the detection network.

The apparatus further includes a learning unit. When the cluster unit accumulates the preset first number of data items and clusters the accumulated data to generate a training data cluster, the learning unit inputs the training data cluster into an untrained detection network and, when the detection network reconstructs the training data cluster to generate a simulated training data cluster that simulates the training data cluster, performs an optimization that updates the parameters of the detection network so that the difference between the simulated training data cluster and the input training data cluster is minimized.

The data included in the training data cluster includes normal data and abnormal data. The ratio of the number of abnormal data items to the number of normal data items in the training data cluster is less than a preset ratio (ab).

According to the present invention, a learning model capable of detecting anomalies, that is, a detection network, can be created from a very large amount of data without labeling. The time, effort, and cost of creating the learning model can therefore be reduced. In addition, detecting anomalies with such a detection network can improve traceability.

FIG. 1 is a diagram illustrating the configuration of an apparatus for clustering-based anomaly detection according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the detailed configuration of the apparatus for clustering-based anomaly detection according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the configuration of a detection network for clustering-based anomaly detection according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a method for training the detection network for clustering-based anomaly detection according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating clustering data used to train the detection network for clustering-based anomaly detection according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating a method for clustering-based anomaly detection according to an embodiment of the present invention.

To clarify the features and advantages of the means by which the present invention solves the problem, the invention is described in more detail below with reference to specific embodiments shown in the accompanying drawings.

In the following description and the accompanying drawings, detailed descriptions of well-known functions or configurations that could obscure the gist of the present invention are omitted. Note also that, throughout the drawings, the same components are denoted by the same reference numerals wherever possible.

The terms and words used in the following description and drawings should not be construed as limited to their ordinary or dictionary meanings. They should be interpreted with meanings and concepts consistent with the technical idea of the present invention, based on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiment of the present invention and do not represent all of its technical ideas, so it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing.

Terms that include ordinal numbers, such as first and second, are used to describe various components and serve only to distinguish one component from another, not to limit the components. For example, a second component may be named a first component and, similarly, a first component may be named a second component, without departing from the scope of the present invention.

When a component is described as being "connected" or "coupled" to another component, this means that it is, or can be, logically or physically connected or coupled. In other words, a component may be directly connected or coupled to another component, but another component may also exist in between, and the connection or coupling may be indirect.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. Terms such as "include" or "have" in this specification specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood as not excluding in advance the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Terms such as "... unit", "...er", and "module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

In the context of describing the present invention (particularly in the context of the following claims), "a" or "an", "one", "the", and similar terms may be used to cover both the singular and the plural, unless otherwise indicated in this specification or clearly contradicted by context.

Embodiments within the scope of the present invention include computer-readable media that hold or carry computer-executable instructions or data structures stored on them. Such computer-readable media may be any available media accessible by a general-purpose or special-purpose computer system. By way of example and not limitation, such computer-readable media may include physical storage media such as RAM, ROM, EPROM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store or carry desired program code means in the form of computer-executable instructions, computer-readable instructions, or data structures and that can be accessed by a general-purpose or special-purpose computer system.

The present invention may be applied in network computing environments with various types of computer system configurations, including personal computers, laptop computers, handheld devices, multiprocessor systems, microprocessor-based or programmable consumer electronics, network PCs, minicomputers, mainframe computers, mobile phones, PDAs, pagers, and the like. The present invention may also be practiced in distributed system environments in which local and remote computer systems, linked through a network by wired data links, wireless data links, or a combination of the two, both perform tasks. In a distributed system environment, program modules may be located in both local and remote memory storage devices.

First, an apparatus for clustering-based anomaly detection according to an embodiment of the present invention is described. FIG. 1 is a diagram illustrating the configuration of the apparatus. FIG. 2 is a diagram illustrating the detailed configuration of the apparatus. FIG. 3 is a diagram illustrating the configuration of a detection network for clustering-based anomaly detection according to an embodiment of the present invention.

Referring to FIG. 1, the apparatus 10 for clustering-based anomaly detection according to an embodiment of the present invention (hereinafter, the "anomaly detection apparatus") includes a data collection unit 11, an input unit 12, a display unit 13, a storage unit 14, and a control unit 15.

The data collection unit 11 collects data. Here, the data may be measurement values obtained by measuring, with a given sensor, products made in a production facility such as a smart factory, in order to determine whether each product is good or defective. Such data may be, for example, length, weight, shape, resistance to bending or impact, resistance to pressure, elasticity, and the like. The data collection unit 11 stores the collected data in the storage unit 14 under the control of the control unit 15.

The input unit 12 receives the user's key operations for controlling the anomaly detection apparatus 10, generates an input signal, and passes it to the control unit 15. The input unit 12 may include any one of a power key for turning power on and off, numeric keys, and direction keys, and may be formed as predetermined function keys on one surface of the anomaly detection apparatus 10. When the display unit 13 is a touch screen, the functions of the various keys of the input unit 12 may be performed on the display unit 13, and when all functions can be performed with the touch screen alone, the input unit 12 may be omitted.

The display unit 13 displays screens and can visually present to the user the menus of the anomaly detection apparatus 10, input data, function setting information, and various other information. The display unit 13 may be formed of a liquid crystal display (LCD), organic light-emitting diodes (OLED), active-matrix organic light-emitting diodes (AMOLED), or the like. The display unit 13 may also be implemented as a touch screen, in which case it includes a touch sensor that detects the user's touch input. The touch sensor may be a touch-sensing sensor of the capacitive overlay, pressure, resistive overlay, or infrared beam type, or may be a pressure sensor. Besides these sensors, any kind of sensor device capable of sensing the contact or pressure of an object may be used as the touch sensor of the present invention. The touch sensor may detect the user's touch input, generate a detection signal containing input coordinates that indicate the touched position, and transmit the signal to the control unit 15.

The storage unit 14 stores the programs and data required for the operation of the anomaly detection apparatus 10 and may be divided into a program area and a data area. The program area may store a program that controls the overall operation of the anomaly detection apparatus 10, an operating system (OS) that boots the apparatus, application programs, and the like. The data area stores data generated through the use and operation of the anomaly detection apparatus 10; in particular, it may store the data collected by the data collection unit 11. The various data stored in the storage unit 14 may be deleted, changed, or added to.

The control unit 15 may control the overall operation of the anomaly detection apparatus 10 and the signal flow between its internal blocks, and may perform a data processing function. The control unit 15 also basically controls the various functions of the anomaly detection apparatus 10. Examples of the control unit 15 include a central processing unit (CPU) and a digital signal processor (DSP).

Referring to FIG. 2, the control unit 15 includes a cluster unit 100, a learning unit 200, and a detection unit 300.

The cluster unit 100 generates a data cluster by clustering data. The cluster unit 100 generates the data cluster from a preset first number (N) of data items. When the anomaly detection process for the generated data cluster has finished, the cluster unit 100 erases a preset second number (M) of data items from the existing data cluster in first-in, first-out (FIFO) order, and then generates a new data cluster from the first number (N) of data items, consisting of the data that was not erased from the existing data cluster together with newly added data.
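The windowing behaviour of the cluster unit (hold N items, erase the M oldest in FIFO order, refill to N) can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name `ClusterWindow` and its methods `push` and `advance` are hypothetical, and the clustering step itself is left out because the source does not name a clustering algorithm.

```python
from collections import deque

class ClusterWindow:
    """Hypothetical helper: holds at most n samples and evicts the m oldest on request."""

    def __init__(self, n, m):
        if not 0 < m <= n:
            raise ValueError("require 0 < m <= n")
        self.n, self.m = n, m
        self.buffer = deque()

    def push(self, sample):
        """Accumulate one sample; return the window as a list once n samples are held, else None."""
        if len(self.buffer) >= self.n:
            raise RuntimeError("window is full; call advance() first")
        self.buffer.append(sample)
        return list(self.buffer) if len(self.buffer) == self.n else None

    def advance(self):
        """Erase the m oldest samples (FIFO) so that m new samples can be accumulated."""
        for _ in range(self.m):
            self.buffer.popleft()

# With N = 4 and M = 2, consecutive windows overlap by N - M = 2 samples.
window = ClusterWindow(n=4, m=2)
clusters = []
for x in range(8):
    cluster = window.push(x)
    if cluster is not None:
        clusters.append(cluster)
        window.advance()
print(clusters)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

Choosing M smaller than N makes successive data clusters overlap, so each new cluster differs from the previous one by only M items.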

The learning unit 200 trains a detection network (DN) that simulates a data cluster to generate a simulated data cluster. The detection network (DN) is basically a kind of artificial neural network (ANN). According to one embodiment, the detection network (DN) may be a graph neural network (GNN) model. Such a detection network (DN) may be a graph auto-encoder model that compresses its input to a lower dimension and then restores it to the original higher dimension. An attention mechanism may optionally be added to the detection network (DN) to form a graph attentional auto-encoder model. According to another embodiment, the detection network (DN) may be a model other than a GNN that includes a generative network, such as an auto-encoder or a generative adversarial network (GAN).

FIG. 3 shows a detection network (DN) according to one embodiment. This detection network (DN) compresses its input to a lower dimension and then restores it to the original higher dimension. As shown, the detection network (DN) includes an encoder and a decoder. The encoder generates a latent cluster vector by performing, on the data cluster, a plurality of operations to which the weights between a plurality of layers are applied. The decoder generates a simulated data cluster by performing, on the latent cluster vector, a plurality of operations to which the weights between a plurality of layers are applied.
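The encoder and decoder structure can be illustrated with a small forward pass. The sketch below is an assumption-laden stand-in: it uses plain fully connected layers with `tanh` activations rather than the graph auto-encoder of the embodiment, it treats a data cluster as a flat feature vector, and the layer sizes (6, 4, 2) and the helper names `dense`, `make_layer`, and `run` are hypothetical.

```python
import math
import random

def dense(x, weights, bias):
    """One layer: y_j = tanh(sum_i x_i * weights[i][j] + bias[j])."""
    return [
        math.tanh(sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j])
        for j in range(len(bias))
    ]

def make_layer(n_in, n_out, rng):
    """Random weights (n_in rows, n_out columns) and a zero bias vector."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_in)]
    return weights, [0.0] * n_out

def run(layers, x):
    """Apply each layer's weighted operation in turn."""
    for weights, bias in layers:
        x = dense(x, weights, bias)
    return x

rng = random.Random(0)
encoder = [make_layer(6, 4, rng), make_layer(4, 2, rng)]  # compress 6 -> 4 -> 2
decoder = [make_layer(2, 4, rng), make_layer(4, 6, rng)]  # restore 2 -> 4 -> 6

sample = [0.2, -0.1, 0.4, 0.0, 0.3, -0.2]   # flattened stand-in for a data cluster
latent = run(encoder, sample)               # plays the role of the latent cluster vector
reconstruction = run(decoder, latent)       # plays the role of the simulated data cluster
print(len(latent), len(reconstruction))     # 2 6
```

Because the output layer here uses `tanh`, reconstructed values stay within (-1, 1), so inputs would need to be scaled into that range for this particular sketch.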

The detection unit 300 uses the detection network (DN) to detect whether a data cluster is anomalous. The detection unit 300 inputs the data cluster into the detection network (DN) and causes the detection network (DN) to reconstruct the data cluster, generating a simulated data cluster that simulates it. The detection unit 300 then determines whether the difference between the simulated data cluster and the data cluster is greater than or equal to a preset threshold. If the difference is greater than or equal to the threshold, the detection unit 300 determines that the data cluster contains an anomaly; if the difference is less than the threshold, it determines that the data cluster contains no anomaly.
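The decision rule of the detection unit reduces to a comparison of a reconstruction difference against a threshold. The sketch below assumes the difference is measured as a mean squared error, which the source does not specify; the function names `reconstruction_error` and `is_anomalous`, the example values, and the threshold of 0.5 are hypothetical.

```python
def reconstruction_error(cluster, simulated):
    """Mean squared difference between two equally shaped clusters (lists of feature lists)."""
    total, count = 0.0, 0
    for row, sim_row in zip(cluster, simulated):
        for a, b in zip(row, sim_row):
            total += (a - b) ** 2
            count += 1
    return total / count

def is_anomalous(cluster, simulated, threshold):
    """Anomalous when the difference is greater than or equal to the threshold."""
    return reconstruction_error(cluster, simulated) >= threshold

cluster = [[1.0, 2.0], [3.0, 4.0]]
close_copy = [[1.0, 2.0], [3.0, 4.5]]   # well-reconstructed cluster
poor_copy = [[0.0, 0.0], [0.0, 0.0]]    # badly reconstructed cluster

print(reconstruction_error(cluster, close_copy))      # 0.0625
print(is_anomalous(cluster, close_copy, threshold=0.5))  # False
print(is_anomalous(cluster, poor_copy, threshold=0.5))   # True
```

Note that the comparison is inclusive (`>=`), matching the "greater than or equal to the preset threshold" wording of the description.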

Next, a method for clustering-based anomaly detection according to an embodiment of the present invention is described. According to the embodiment, a detection network (DN) is used for anomaly detection, so the method for training this detection network (DN) is described first. FIG. 4 is a flowchart illustrating a method for training the detection network for clustering-based anomaly detection according to an embodiment of the present invention. FIG. 5 is a diagram illustrating clustering data used to train the detection network.

The embodiment of FIG. 4 assumes that the data collection unit 11 has already collected data and stored it in the storage unit 14, the data being measurement values obtained by measuring, with a given sensor, products made in a production facility such as a smart factory in order to determine whether each product is good or defective. Here, the data may be, for example, length, weight, shape, resistance to bending or impact, resistance to pressure, elasticity, and the like.

Referring to FIG. 4, in step S110 the learning unit 200 initializes the parameters of the detection network (DN), that is, the weights (w). A Xavier initializer may be used for the initialization.
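As an illustration of step S110, the sketch below draws one weight matrix with the uniform variant of Xavier (Glorot) initialization, where each weight is sampled from U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out)). The source names only "Xavier initializer" and does not say whether the uniform or normal variant is meant, so the uniform choice, the function name `xavier_uniform`, and the layer size of 8 by 4 are assumptions.

```python
import math
import random

def xavier_uniform(fan_in, fan_out, rng):
    """Sample a fan_in x fan_out weight matrix from U(-limit, limit)."""
    limit = math.sqrt(6.0 / (fan_in + fan_out))
    return [[rng.uniform(-limit, limit) for _ in range(fan_out)] for _ in range(fan_in)]

rng = random.Random(42)          # fixed seed so the sketch is repeatable
w = xavier_uniform(8, 4, rng)    # hypothetical layer: 8 inputs, 4 outputs
limit = math.sqrt(6.0 / (8 + 4))
print(len(w), len(w[0]), round(limit, 4))  # 8 4 0.7071
```

Scaling the range by the layer's fan-in and fan-out keeps the variance of activations roughly constant from layer to layer at the start of training.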

다음으로, 클러스터부(100)는 S120 단계에서 저장부(200)에 저장된 데이터를 순차로 추출하여 누적한다. 이어서, 클러스터부(100)는 S130 단계에서 기 설정된 제1 개수(N)의 데이터가 누적되어 추출되었는지 여부를 확인한다. Next, the cluster unit 100 sequentially extracts and accumulates the data stored in the storage unit 200 in step S120. Subsequently, the cluster unit 100 checks whether or not the data of the first number N previously set has been accumulated and extracted in step S130.

If the check in step S130 shows that fewer than the preset first number (N) of data items have been accumulated, steps S120 and S130 are repeated. If, on the other hand, the check in step S130 shows that the preset first number (N) of data items has been extracted and accumulated, the accumulated N data items are clustered in step S140 to form a training data cluster.

FIG. 5 shows examples of data clusters used as such training data clusters. The three data clusters shown in (A) of FIG. 5 consist only of normal data, and the data cluster shown in (B) of FIG. 5 is one in which abnormal data are mixed in. The data in a training data cluster are given no information about whether they are normal during training. In other words, the data collected by the data collection unit 11 include both normal data, whose measured values indicate a good product, and abnormal data, whose measured values indicate a defective product. However, the number of abnormal data items in a training data cluster is less than a preset ratio (ab) of the number of normal data items in that cluster. Here, the preset ratio (ab) denotes a level at which the normal data are so overwhelmingly numerous that the abnormal data can be ignored. That is, because the normal data vastly outnumber the abnormal data, all of the data are used for training without distinguishing normal from abnormal. Unsupervised learning is possible because, during training, reducing the loss on the normal data yields a greater gain than reducing the loss on the small amount of abnormal data.

Next, in step S150, the learning unit 200 inputs the training data cluster into the detection network (DN). Then, in step S160, the detection network (DN) reconstructs the training data cluster through a plurality of operations to which the weights between a plurality of layers are applied, thereby generating a simulated training data cluster that simulates the training data cluster.

According to one embodiment, the encoder of the detection network (DN) performs, on the training data cluster, a plurality of operations to which the weights between a plurality of layers are applied, to generate a training latent cluster vector. The decoder of the detection network (DN) then performs, on the training latent cluster vector, a plurality of operations to which the weights between a plurality of layers are applied, to generate the simulated training data cluster.

Then, in step S170, the learning unit 200 calculates a loss representing the difference between the simulated training data cluster and the training data cluster, and performs optimization that updates the weights (w) of the detection network (DN) through a backpropagation algorithm so that the loss is minimized.
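Steps S150 to S170 can be illustrated with a very small encoder/decoder written in plain Python. This is a sketch under stated assumptions, not the network of the embodiment: the text does not fix the number of layers, the activation function, the loss, or the learning rate, so a single tanh hidden layer, a mean squared error loss, plain gradient descent, and all names (`forward`, `mse`, `train_step`, `init`) are choices made here for illustration. The cluster is treated as a flat list of measured values.

```python
import math
import random


def init(rows, cols, rng, scale=0.1):
    """Small random weight matrix (a simple stand-in for any initializer)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def forward(params, x):
    """Encoder: z = tanh(W1 x + b1). Decoder: x_hat = W2 z + b2."""
    w1, b1, w2, b2 = params
    z = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(w1, b1)]
    x_hat = [sum(w * zi for w, zi in zip(row, z)) + b for row, b in zip(w2, b2)]
    return z, x_hat


def mse(x, x_hat):
    """Mean squared difference between a cluster and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def train_step(params, x, lr=0.05):
    """One backpropagation update; returns the loss measured before the update."""
    w1, b1, w2, b2 = params
    z, x_hat = forward(params, x)
    loss = mse(x, x_hat)
    # Gradient of the loss with respect to the decoder output.
    g_out = [2.0 * (xh - xi) / len(x) for xh, xi in zip(x_hat, x)]
    # Propagate back to the latent vector, then through the tanh activation.
    g_z = [sum(w2[i][j] * g_out[i] for i in range(len(w2))) for j in range(len(z))]
    g_pre = [g * (1.0 - zj * zj) for g, zj in zip(g_z, z)]
    # Gradient-descent updates for the decoder and then the encoder.
    for i, g in enumerate(g_out):
        for j, zj in enumerate(z):
            w2[i][j] -= lr * g * zj
        b2[i] -= lr * g
    for j, g in enumerate(g_pre):
        for k, xk in enumerate(x):
            w1[j][k] -= lr * g * xk
        b1[j] -= lr * g
    return loss


rng = random.Random(0)
d, h = 6, 2  # hypothetical sizes: 6 measured values per cluster, latent size 2
params = [init(h, d, rng), [0.0] * h, init(d, h, rng), [0.0] * d]
cluster = [0.9, 1.1, 1.0, 0.95, 1.05, 1.0]  # made-up measurements
losses = [train_step(params, cluster) for _ in range(200)]
```

Here the latent vector `z` plays the role of the training latent cluster vector and `x_hat` the role of the simulated training data cluster. Repeating `train_step` on the same cluster lowers the reconstruction loss over time.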

Next, in step S180, the learning unit 200 determines whether the loss calculated above is less than a preset target value. If the loss is equal to or greater than the target value, the process proceeds to step S190. In step S190, the cluster unit 100 erases a preset second number (M) of data items from the training data cluster. Here, the cluster unit 100 erases the M earliest-extracted data items in the training data cluster in a first-in, first-out (FIFO) manner, following the order in which the data were extracted in step S120. Steps S120 to S180 are then repeated. Conversely, if the loss is less than the target value, the process proceeds to step S200 and training ends. This means that the training of steps S120 to S170 is repeated using a plurality of mutually different training data clusters.
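The outer training loop of steps S120 to S200 can be sketched as a sliding window over the stored data. The sketch assumes that training continues until the loss falls below the target value, that M is no larger than N, and that a cluster is simply the ordered group of N accumulated items. The names `train_over_windows`, `train_step`, `max_rounds`, and `fake_step` are hypothetical, and `train_step` stands in for whatever routine updates the detection network and returns its loss.

```python
from collections import deque


def train_over_windows(stored, n, m, train_step, target, max_rounds=1000):
    """Train on successive windows of n items; returns (rounds_run, last_loss)."""
    source = iter(stored)
    window = deque()
    loss = None
    for rounds in range(1, max_rounds + 1):
        # S120/S130: extract items until the window holds n of them.
        while len(window) < n:
            try:
                window.append(next(source))
            except StopIteration:
                return rounds - 1, loss  # stored data exhausted
        # S140 to S170: treat the window as one training cluster and update.
        loss = train_step(list(window))
        # S180: stop once the loss has fallen below the target value.
        if loss < target:
            return rounds, loss
        # S190: erase the m oldest items (FIFO) before refilling.
        for _ in range(m):
            window.popleft()
    return max_rounds, loss


seen = []


def fake_step(cluster):
    """Stand-in for a real optimization step; pretends the loss keeps shrinking."""
    seen.append(cluster)
    return 1.0 / len(seen)


rounds, loss = train_over_windows(range(20), n=5, m=2,
                                  train_step=fake_step, target=0.3)
```

With N = 5 and M = 2, consecutive training clusters overlap in three items, so each round sees a different but related cluster, which matches the statement that training is repeated over a plurality of different training data clusters.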

Once training of the detection network (DN) is complete as described above, the detection network (DN) is provided to the detection unit 300, and the detection unit 300 can use it to determine whether data are anomalous. This method is described next. FIG. 6 is a flowchart illustrating a method for anomaly detection based on clustering according to an embodiment of the present invention.

Referring to FIG. 6, in step S210 the cluster unit 100 continuously receives, through the data collection unit 11, data that are measured values obtained by measuring produced products through a predetermined sensor in order to judge each product as good or defective. In step S220, the cluster unit 100 checks whether the number of input data items has reached the preset first number (N).

If the check in step S220 shows that fewer than the preset first number (N) of data items have been accumulated, steps S210 and S220 are repeated. If, on the other hand, the check in step S220 shows that the preset first number (N) of data items has been accumulated, the cluster unit 100 clusters the accumulated N data items in step S230 to form a data cluster.

Each time a data cluster is formed, the detection unit 300 inputs the data cluster into the detection network (DN) in step S240. Then, in step S250, the detection network (DN) reconstructs the data cluster through a plurality of operations to which the weights between a plurality of layers are applied, thereby generating a simulated data cluster that simulates the data cluster.

According to one embodiment, the encoder of the detection network (DN) performs, on the data cluster, a plurality of operations to which the weights between a plurality of layers are applied, to generate a latent cluster vector. The decoder of the detection network (DN) then performs, on the latent cluster vector, a plurality of operations to which the weights between a plurality of layers are applied, to generate the simulated data cluster.

Then, in step S260, the detection unit 300 calculates a loss representing the difference between the simulated data cluster and the data cluster. Next, in step S270, the detection unit 300 determines whether the loss is equal to or greater than a preset threshold.

If the determination in step S270 shows that the loss is less than the preset threshold, the process proceeds to step S280. In step S280, the cluster unit 100 erases a preset second number (M) of data items from the data cluster. Here, the cluster unit 100 erases the M earliest-input data items in the data cluster in a first-in, first-out (FIFO) manner, following the order in which the data were input in step S210. The process then returns to step S210. Accordingly, the cluster unit 100 continues to receive data through the data collection unit 11 in step S210 and, in step S220, checks whether the number of data items, counting both the data not erased in step S280 and the newly input data, has reached the preset first number (N). Once the unerased data and the newly accumulated data together reach the preset first number (N), the cluster unit 100 clusters these N data items in step S230 to form a data cluster, and the subsequent process described above (S230 to S270) is repeated.

Conversely, if the determination in step S270 shows that the loss is equal to or greater than the preset threshold, the process proceeds to step S290. In step S290, the detection unit 300 determines that the data cluster contains an anomaly. This means that the data cluster contains a non-negligible number of abnormal data items.
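The detection flow of steps S210 to S290 can be sketched in the same way. Several points are assumptions made for illustration: each data item is a single number, the loss is a mean squared error, and the function returns at the first anomalous cluster because the text does not state what follows step S290. The names `first_anomalous_cluster` and `reconstruct` are hypothetical, and `reconstruct` stands in for the trained detection network.

```python
from collections import deque


def first_anomalous_cluster(stream, n, m, reconstruct, threshold):
    """Return the first window of n items whose reconstruction loss reaches
    `threshold`, or None if the stream ends without one."""
    window = deque()
    for sample in stream:
        window.append(sample)           # S210: keep receiving data
        if len(window) < n:             # S220: wait until n items are held
            continue
        cluster = list(window)          # S230: form the data cluster
        rebuilt = reconstruct(cluster)  # S240/S250: simulated data cluster
        # S260: loss between the data cluster and its reconstruction.
        loss = sum((a - b) ** 2 for a, b in zip(cluster, rebuilt)) / n
        if loss >= threshold:           # S270 then S290: anomaly found
            return cluster
        for _ in range(m):              # S280: erase the m oldest items (FIFO)
            window.popleft()
    return None


def nominal(cluster):
    """Stand-in network that always reproduces the nominal value 1.0."""
    return [1.0] * len(cluster)


readings = [1.0, 1.01, 0.99, 1.0, 1.02, 0.98, 1.0, 3.0, 1.0, 1.0]  # made-up data
flagged = first_anomalous_cluster(readings, n=4, m=2,
                                  reconstruct=nominal, threshold=0.5)
```

In this made-up stream, the first two windows reconstruct almost perfectly, while the window containing the reading 3.0 produces a loss of about 1.0 and is flagged.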

Manufacturers need to analyze production conditions as part of their efforts to improve yield. With a model that classifies production conditions as leading to good or defective products, production can be halted and the equipment or environment serviced as soon as conditions that produce defects arise. Building such a model, however, requires the burdensome work of inspecting and labeling every production condition. The present invention, by contrast, forms clusters from data generated during production (production conditions, production results, and so on) and automatically detects anomalies in each cluster. The anomaly detection method is based on judging a cluster to be anomalous when it contains production data that fall outside the normal range. Beyond manufacturing, the same effect can be obtained in a variety of situations where traceability is low, for example because the volume of data is too large.

In addition, although this specification contains details of many specific implementations, these should not be construed as limiting the scope of any invention or of what may be claimed, but rather as descriptions of features that may be specific to particular embodiments of a particular invention. Certain features described in this specification in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features described in the context of a single embodiment may also be implemented in multiple embodiments, either separately or in any suitable subcombination. Furthermore, although features may be described as acting in particular combinations and may even be initially claimed as such, one or more features from a claimed combination may in some cases be removed from that combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.

Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, in order to achieve desirable results. In certain cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.

Particular embodiments of the subject matter described in this specification have been described. Other embodiments fall within the scope of the following claims. For example, the operations recited in the claims can be performed in a different order and still achieve desirable results. As one example, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing may be advantageous.

This description presents the best mode of the present invention and provides examples to explain the invention and to enable those skilled in the art to make and use it. The specification so written does not limit the invention to the specific terms presented. Accordingly, although the present invention has been described in detail with reference to the above examples, those skilled in the art may make alterations, changes, and modifications to these examples without departing from the scope of the invention.

Therefore, the scope of the present invention should be determined not by the described embodiments but by the claims.

The present invention relates to an apparatus for anomaly detection based on clustering and a method therefor. According to the present invention, a learning model capable of detecting anomalies, that is, a detection network, can be created from a very large amount of data without labeling. The time, effort, and cost of creating the learning model can therefore be reduced. In addition, detecting anomalies with this detection network can improve traceability. Accordingly, the present invention is industrially applicable, since it not only has sufficient potential for marketing or sale but can also clearly be carried out in practice.

10: anomaly detection device 11: data collection unit
12: input unit 13: display unit
14: storage unit 15: control unit
100: cluster unit 200: learning unit
300: detection unit

Claims

A method for anomaly detection, comprising:
generating, by a cluster unit, a data cluster by clustering accumulated data each time a preset first number of input data items has been accumulated;
inputting, by a detection unit, the data cluster into a detection network trained to simulate the data cluster;
when the detection network reconstructs the data cluster to generate a simulated data cluster that simulates the data cluster, determining, by the detection unit, whether a difference between the simulated data cluster and the input data cluster is equal to or greater than a preset threshold; and
determining that the input data cluster contains an anomaly when, as a result of the determination, the difference is equal to or greater than the preset threshold.

The method of claim 1, further comprising,
when the difference is less than the preset threshold as a result of the determination:
erasing, by the cluster unit, a preset second number of data items from the data cluster;
generating, by the cluster unit, a new data cluster by accumulating newly input data into the data cluster from which the second number of data items has been erased; and
detecting, by the detection unit, whether the newly generated data cluster is anomalous using the detection network.

The method of claim 1, further comprising,
before the generating of the data cluster:
accumulating, by the cluster unit, the preset first number of data items and clustering the accumulated data to generate a training data cluster;
inputting, by a learning unit, the training data cluster into an untrained detection network;
generating, by the detection network, a simulated training data cluster that simulates the training data cluster by reconstructing the training data cluster; and
performing, by the learning unit, optimization that updates parameters of the detection network so that a difference between the simulated training data cluster and the input training data cluster is minimized.

The method of claim 3, wherein
the data included in the training data cluster includes normal data and abnormal data, and
the number of abnormal data items included in the training data cluster is less than a preset ratio (ab) of the number of normal data items included in the training data cluster.

A device for anomaly detection, comprising:
a cluster unit configured to generate a data cluster by clustering accumulated data each time a first number of input data items has been accumulated; and
a detection unit configured to input the data cluster into a detection network trained to simulate the data cluster, to determine, when the detection network reconstructs the data cluster to generate a simulated data cluster that simulates the data cluster, whether a difference between the simulated data cluster and the input data cluster is equal to or greater than a preset threshold, and to determine that the input data cluster contains an anomaly when, as a result of the determination, the difference is equal to or greater than the preset threshold.

The device of claim 5, wherein
the cluster unit,
when the difference is less than the preset threshold as a result of the determination, erases a preset second number of data items from the data cluster, and
generates a new data cluster by accumulating newly input data into the data cluster from which the preset second number of data items has been erased, and
the detection unit detects whether the newly generated data cluster is anomalous using the detection network.

The device of claim 5, further comprising
a learning unit configured to:
when the cluster unit accumulates a preset first number of data items and clusters the accumulated data to generate a training data cluster, input the training data cluster into an untrained detection network; and
when the detection network reconstructs the training data cluster to generate a simulated training data cluster that simulates the training data cluster, perform optimization that updates parameters of the detection network so that a difference between the simulated training data cluster and the input training data cluster is minimized.

The device of claim 7, wherein
the data included in the training data cluster includes normal data and abnormal data, and
the number of abnormal data items included in the training data cluster is less than a preset ratio (ab) of the number of normal data items included in the training data cluster.