KR20210114163A

KR20210114163A - Method for detecting anomaly using segmenting video image frames, and apparatus for the same

Info

Publication number: KR20210114163A
Application number: KR1020200029466A
Authority: KR
Inventors: 이승익; 무함마드
Original assignee: 한국전자통신연구원
Priority date: 2020-03-10
Filing date: 2020-03-10
Publication date: 2021-09-23

Abstract

According to an embodiment, the present invention relates to technology for detecting anomalies in monitored video images using artificial intelligence. An anomaly detection method using segmenting of video image frames comprises: generating segments by grouping a selected number of frames from a video image input; training a neural network based on features of an intermediate layer of the neural network that takes the segments as input; and inputting the segments to the trained neural network to detect anomalies on a per-segment basis.

Description

Anomaly detection method using segmenting of video image frame and apparatus therefor

The present invention relates to a technology for detecting abnormal situations in monitored video images using artificial intelligence.

A video-based anomaly detection system detects abnormal situations from video images captured by a fixed camera such as a CCTV camera, or by a system equipped with a mobile camera such as a mobile robot. Recently, artificial intelligence has been used to detect such situations: to distinguish normal from abnormal situations, the input data is labeled as normal or abnormal, and the labeled data is fed to an artificial intelligence model, which is trained to tell the two apart.

That is, conventional AI-based video anomaly detection systems have mainly relied on supervised learning, in which a label (a flag indicating whether a given frame is normal or abnormal) is provided for each frame of the input video and is used as the training signal.

However, such supervised methods require defining in advance what constitutes an anomaly, so anomalies that were not defined in advance cannot be detected. A more serious practical problem is that a person must watch the video and label every frame as normal or abnormal, which requires enormous effort and cost.

Korean Patent Application Publication No. 10-2019-0100085 discloses a robot capable of detecting a dangerous situation using artificial intelligence, and a method of operating the same. That prior patent detects dangerous situations using a speech recognition model and an image recognition model, and trains the image recognition model to minimize a cost function corresponding to the difference between the target feature vector output by the model and the labeled current situation. Like the conventional techniques discussed above, it therefore requires the current situation to be labeled frame by frame.

Accordingly, there is a need for a technology that solves these problems of the prior art and reduces the effort and cost of the labeling required to train the artificial intelligence of an anomaly detection system.

An object of the present invention is to provide an anomaly detection method using segmenting of video image frames, and an apparatus therefor, and thereby to provide an anomaly detection system that can detect anomalies at the frame level using only video-level labeling information, that is, information indicating only whether a video contains an abnormal situation.

Another object of the present invention is to provide a method for training an artificial neural network used for anomaly detection with abnormal video images or normal video images.

An anomaly detection method using segmenting of video image frames according to an embodiment includes: generating segments by grouping a selected number of frames from a video image input; training a neural network based on features of an intermediate layer of the neural network that takes the segments as input; and inputting the segments to the trained neural network to detect anomalies on a per-segment basis.

Training the neural network may include training with an abnormal video image that includes at least one abnormal frame.

Training with the abnormal video image may include: generating two clusters based on the intermediate-layer features of the neural network for the segments; selecting, as a normal cluster, whichever of the two clusters contains more segments that the neural network predicted to be normal; and selecting the remaining cluster as an abnormal cluster.

Training with the abnormal video image may further include: labeling all segments belonging to the normal cluster as normal; labeling all segments belonging to the abnormal cluster as abnormal; and setting these labels as the ground-truth set for the neural network's predictions.

Training with the abnormal video image may further include: calculating loss function 1 by comparing the final output values of the neural network for the segments with the ground-truth set for the neural network's predictions; calculating loss function 2 by comparing the normal cluster with the abnormal cluster; and calculating a total loss function by combining loss function 1 and loss function 2.

Loss function 2 may be inversely proportional to the distance between the center of the normal cluster and the center of the abnormal cluster.

Training the neural network may include training with a normal video image that includes no abnormal frames.

Training with the normal video image may include: generating two clusters based on the intermediate-layer features of the neural network for the segments; selecting one of the two clusters as a normal-1 cluster; and selecting the other cluster as a normal-2 cluster.

Training with the normal video image may further include: calculating loss function 1 by comparing the final output values of the neural network for the segments with a ground-truth set indicating normal; calculating loss function 2 by comparing the normal-1 cluster with the normal-2 cluster; and calculating a total loss function by combining loss function 1 and loss function 2.

Loss function 2 may be proportional to the distance between the center of the normal-1 cluster and the center of the normal-2 cluster.

An anomaly detection apparatus using segmenting of video image frames according to an embodiment includes: a segment generation unit that generates segments by grouping a selected number of frames from a video image input; a neural network learning unit that trains a neural network based on features of an intermediate layer of the neural network that takes the segments as input; and an anomaly detection unit that inputs the segments to the trained neural network to detect anomalies on a per-segment basis.

The neural network learning unit may include an abnormal video learning unit that trains with an abnormal video image including at least one abnormal frame.

The abnormal video learning unit may include an abnormal cluster generation unit that generates two clusters based on the intermediate-layer features of the neural network for the segments. The abnormal cluster generation unit may select, as a normal cluster, whichever of the two clusters contains more segments that the neural network predicted to be normal, and may select the remaining cluster as an abnormal cluster.

The abnormal video learning unit may further include a label generation unit that generates labels for the segments. The label generation unit may label all segments belonging to the normal cluster as normal, label all segments belonging to the abnormal cluster as abnormal, and set these labels as the ground-truth set for the neural network's predictions.

The abnormal video learning unit may further include: an abnormal loss function 1 calculation unit that calculates loss function 1 by comparing the final output values of the neural network for the segments with the ground-truth set for the neural network's predictions; an abnormal loss function 2 calculation unit that calculates loss function 2 by comparing the normal cluster with the abnormal cluster; and an abnormal total loss function calculation unit that calculates a total loss function by combining loss function 1 and loss function 2.

Loss function 2 may be inversely proportional to the distance between the center of the normal cluster and the center of the abnormal cluster.

The neural network learning unit may include a normal video learning unit that trains with a normal video image including no abnormal frames.

The normal video learning unit may include a normal cluster generation unit that generates two clusters based on the intermediate-layer features of the neural network for the segments. The normal cluster generation unit may select one of the two clusters as a normal-1 cluster and the other as a normal-2 cluster.

The normal video learning unit may further include: a normal loss function 1 calculation unit that calculates loss function 1 by comparing the final output values of the neural network for the segments with a ground-truth set indicating normal; a normal loss function 2 calculation unit that calculates loss function 2 by comparing the normal-1 cluster with the normal-2 cluster; and a normal total loss function calculation unit that calculates a total loss function by combining loss function 1 and loss function 2.

According to the present invention, by providing an anomaly detection method using segmenting of video image frames and an apparatus therefor, it is possible to provide an anomaly detection system that can detect anomalies at the frame level using only video-level labeling information, that is, information indicating only whether a video contains an abnormal situation.

It is also possible to provide a method for training an artificial neural network used for anomaly detection with abnormal video images or normal video images.

FIG. 1 is a block diagram illustrating an example of an anomaly detection apparatus using segmenting of video image frames according to an embodiment.
FIG. 2 is a block diagram illustrating an example of the neural network learning unit 130 shown in FIG. 1.
FIG. 3 is a diagram illustrating an example of the abnormal video learning unit 220 shown in FIG. 2.
FIG. 4 is a diagram illustrating an example of the normal video learning unit 230 shown in FIG. 2.
FIG. 5 is a diagram illustrating an example use of the abnormal video learning unit 220 shown in FIG. 2.
FIG. 6 is a diagram illustrating an example use of the normal video learning unit 230 shown in FIG. 2.
FIG. 7 is a flowchart illustrating an example of an anomaly detection method using segmenting of video image frames according to an embodiment.
FIG. 8 is a flowchart illustrating an example of the neural network training (S730) shown in FIG. 7 according to an embodiment.
FIG. 9 is a flowchart illustrating another example of the neural network training (S730) shown in FIG. 7 according to an embodiment.
FIG. 10 is a diagram illustrating the configuration of a computer system according to an embodiment.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only to make the disclosure complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the invention pertains, and the invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

Although terms such as "first" and "second" are used to describe various elements, the elements are not limited by these terms, which are used only to distinguish one element from another. Accordingly, a first element mentioned below may be a second element within the technical spirit of the present invention.

The terminology used herein is for describing the embodiments and is not intended to limit the present invention. In this specification, the singular includes the plural unless specifically stated otherwise. As used herein, "comprises" or "comprising" means that the stated element or step does not exclude the presence or addition of one or more other elements or steps.

Unless otherwise defined, all terms used herein have the meaning commonly understood by those of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries are not to be interpreted in an idealized or excessive manner unless expressly so defined.

Hereinafter, an anomaly detection apparatus using segmenting of video image frames, and a method therefor, according to embodiments are described in detail with reference to FIGS. 1 to 10.

FIG. 1 is a block diagram illustrating an example of an anomaly detection apparatus using segmenting of video image frames according to an embodiment.

Referring to FIG. 1, the anomaly detection apparatus 100 using segmenting of video image frames includes a segment generation unit 120, a neural network learning unit 130, and an anomaly detection unit 140. The apparatus receives a video input image 110 as input and outputs a per-segment anomaly detection result 150.

The segment generation unit 120 receives the video input image 110 and groups a selected number of frames into each segment. When the selected number (the number of frames that make up one segment) is 1, the anomaly detection apparatus determines normal or abnormal at the frame level; when it is greater than 1, the apparatus makes that determination at the segment level.
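The grouping of frames into segments can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `make_segments`, the use of non-overlapping consecutive windows, and the choice to drop leftover frames are all assumptions, since the text only says that a selected number of frames is grouped into each segment.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_segments(frames: Sequence[T], frames_per_segment: int) -> List[List[T]]:
    """Group consecutive frames into non-overlapping segments.

    Assumption: trailing frames that do not fill a whole segment are dropped;
    the source does not say how leftovers are handled.
    """
    if frames_per_segment < 1:
        raise ValueError("frames_per_segment must be >= 1")
    n_full = len(frames) // frames_per_segment
    return [
        list(frames[i * frames_per_segment:(i + 1) * frames_per_segment])
        for i in range(n_full)
    ]
```

With `frames_per_segment=1` every segment holds a single frame, which corresponds to the frame-level case described above.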

The neural network learning unit 130 receives the segments generated by the segment generation unit 120 and trains the neural network based on features of an intermediate layer of the neural network.

The neural network learning unit 130 may train with an abnormal video image that includes at least one abnormal frame.

To train with the abnormal video image, the neural network learning unit 130 may first generate two groups, that is, two clusters, based on the intermediate-layer features of the neural network for the segments.

For the anomaly detection apparatus to train the neural network, each segment needs an abnormal or normal label corresponding to its true status. In the present invention, however, only video-level labeling is available, that is, only whether the input video contains an abnormal situation is known, so it is not known which individual segments contain an abnormal situation and which are normal. To overcome this, the present invention proposes clustering the segments using the intermediate-layer features of the neural network. Here, clustering means grouping segments with similar features together based on those features. The neural network learning unit 130 thus divides the segments into two groups, that is, two clusters, labels each cluster, and uses the result as the ground-truth set.

Of the two clusters, the neural network learning unit 130 may select as the normal cluster the one containing more segments that the neural network predicted to be normal, and may select the remaining cluster as the abnormal cluster. In other words, the cluster in which the segments predicted as normal by the neural network are more numerous becomes the normal cluster, and the other becomes the abnormal cluster.

The neural network learning unit 130 may then label all segments belonging to the normal cluster as normal and all segments belonging to the abnormal cluster as abnormal. The labels generated in this way may be set as the ground-truth set for the neural network's predictions.

In this way, the anomaly detection apparatus can generate a per-segment ground-truth set for the neural network's predictions and train the neural network with it.
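The clustering and pseudo-labeling steps described above can be sketched as follows. The text does not name a clustering algorithm, so plain 2-means with a deterministic initialization is an assumption, as are the helper names `two_means` and `pseudo_labels`, the 0/1 label encoding, and the tie-breaking rule. Intermediate-layer features are represented here as plain lists of floats.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _mean(points: List[Vector]) -> List[float]:
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def two_means(features: List[Vector], iters: int = 20) -> Tuple[List[int], List[List[float]]]:
    """Plain 2-means clustering (assumed algorithm); returns (assignments, centers)."""
    # Deterministic init: the first point and the point farthest from it.
    c0 = list(features[0])
    c1 = list(max(features, key=lambda f: math.dist(f, c0)))
    centers = [c0, c1]
    assign = [0] * len(features)
    for _ in range(iters):
        assign = [
            0 if math.dist(f, centers[0]) <= math.dist(f, centers[1]) else 1
            for f in features
        ]
        for k in (0, 1):
            members = [f for f, a in zip(features, assign) if a == k]
            if members:  # keep the old center if a cluster becomes empty
                centers[k] = _mean(members)
    return assign, centers


def pseudo_labels(features: List[Vector], predicted_normal: List[bool]) -> List[int]:
    """Return one label per segment: 0 = normal, 1 = abnormal."""
    assign, _ = two_means(features)
    # Count how many predicted-normal segments fall in each cluster.
    normal_votes = [0, 0]
    for a, is_normal in zip(assign, predicted_normal):
        if is_normal:
            normal_votes[a] += 1
    # Assumption: on a tie, cluster 0 is taken as the normal cluster.
    normal_cluster = 0 if normal_votes[0] >= normal_votes[1] else 1
    return [0 if a == normal_cluster else 1 for a in assign]
```

Note that every segment in the normal cluster is labeled normal even if the network itself predicted it as abnormal; the cluster membership, not the individual prediction, decides the label.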

The neural network learning unit 130 may calculate loss function 1 by comparing the final output values of the neural network for the segments with the ground-truth set for the neural network's predictions. It may also calculate loss function 2 by comparing the normal cluster with the abnormal cluster, and may calculate a total loss function by combining loss function 1 and loss function 2. Loss function 2 may be inversely proportional to the distance between the center of the normal cluster and the center of the abnormal cluster.

Loss function 1 is obtained by comparing the final value (the output, or predicted value) of the neural network for each segment with the ground-truth set, and may be defined as in Equation 1 below.

[Equation 1] (equation not reproduced in this text)

The terms of Equation 1 are: loss function 1; the number of segments in one video input image; the output of the neural network for the i-th segment; and the cluster (or ground-truth label) to which the i-th segment belongs.
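The body of Equation 1 does not survive in this text. As an illustration only, one form consistent with the listed terms is a mean squared error between each segment's output and its cluster label. The symbol names $L_1$, $n$, $\hat{y}_i$, and $c_i$, and the choice of squared error, are assumptions and not taken from the source.

```latex
L_1 = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - c_i \right)^2
```

Here $n$ is the number of segments in one input video, $\hat{y}_i$ is the network output for the $i$-th segment, and $c_i$ is the label of the cluster to which that segment belongs (for example, 0 for normal and 1 for abnormal).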

Loss function 2 is obtained by comparing the two clusters. Because one cluster represents normal segments and the other represents abnormal segments, the two clusters should be as far apart as possible. Loss function 2 may therefore be calculated from the distance between the two clusters, as in Equation 2 below.

[Equation 2] (equation not reproduced in this text)

The terms of Equation 2 are: loss function 2; the center of the first cluster; and the center of the second cluster.
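The body of Equation 2 is likewise missing here. Since the text states that loss function 2 may be inversely proportional to the distance between the two cluster centers, one possible form is the reciprocal of the Euclidean distance between them. The symbols $L_2$, $\mu_1$, and $\mu_2$, and the choice of the Euclidean norm, are assumptions.

```latex
L_2 = \frac{1}{\lVert \mu_1 - \mu_2 \rVert_2}
```

Minimizing this quantity pushes the center $\mu_1$ of the first cluster and the center $\mu_2$ of the second cluster apart, which matches the stated goal of separating the normal and abnormal clusters.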

Because the loss can thus take both segments and clusters into account, the total loss function for training with an abnormal video may be defined as in Equation 3 below.

[Equation 3] (equation not reproduced in this text)

The terms of Equation 3 are: the total loss function; a real value between 0 and 1; loss function 1; and loss function 2.
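The body of Equation 3 is not available in this text. Given that its terms are the total loss, a real value between 0 and 1, loss function 1, and loss function 2, one plausible reading is a weighted combination of the two losses. The convex-combination form and the symbols $L$, $\lambda$, $L_1$, and $L_2$ below are assumptions; the source may weight the terms differently.

```latex
L = \lambda \, L_1 + (1 - \lambda) \, L_2, \qquad 0 \le \lambda \le 1
```

Under this reading, $\lambda$ close to 1 emphasizes the per-segment prediction error and $\lambda$ close to 0 emphasizes the cluster-distance term.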

The loss functions given by the equations above are only examples and may be implemented differently depending on the user's choice and the application. For example, loss function 1 may be a cross-entropy loss, and loss function 2 may be an average of the distances over all elements belonging to the clusters.
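The two alternatives mentioned above can be sketched as follows. The function names are hypothetical. Reading "an average of the distances over all elements" as the mean pairwise Euclidean distance between the elements of the two clusters is an assumption; the text does not spell out how that average enters loss function 2.

```python
import math
from typing import Sequence


def binary_cross_entropy(outputs: Sequence[float], labels: Sequence[int],
                         eps: float = 1e-12) -> float:
    """Mean binary cross-entropy between outputs in [0, 1] and 0/1 labels."""
    total = 0.0
    for p, y in zip(outputs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(outputs)


def mean_pairwise_distance(cluster_a: Sequence[Sequence[float]],
                           cluster_b: Sequence[Sequence[float]]) -> float:
    """Average Euclidean distance over all (a, b) pairs across two clusters."""
    dists = [math.dist(a, b) for a in cluster_a for b in cluster_b]
    return sum(dists) / len(dists)
```

The pairwise average could stand in for the center-to-center distance wherever loss function 2 uses a cluster distance, at a cost that grows with the product of the cluster sizes.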

The neural network learning unit 130 may also train with a normal video image that includes no abnormal frames.

To train with the normal video image, the neural network learning unit 130 may first generate two clusters based on the intermediate-layer features of the neural network for the segments.

As described above, training the neural network requires an abnormal or normal label for each segment. The neural network learning unit 130 therefore divides the segments into two groups, that is, two clusters, based on the intermediate-layer features of the neural network, labels each cluster, and uses the result as the ground-truth set.

The neural network learning unit 130 may select one of the two clusters as the normal-1 cluster and the other as the normal-2 cluster. Because all segments are known to be normal, both clusters are normal clusters, and the per-segment ground-truth set for the neural network's predictions can simply be set to normal for every segment. The neural network can then be trained with this per-segment ground-truth set.

The neural network learning unit 130 may calculate loss function 1 by comparing the final output values of the neural network for the segments with a ground-truth set indicating normal. It may also calculate loss function 2 by comparing the normal-1 cluster with the normal-2 cluster, and may calculate a total loss function by combining loss function 1 and loss function 2. Loss function 2 may be proportional to the distance between the center of the normal-1 cluster and the center of the normal-2 cluster.

Loss function 1 is obtained by comparing the final value (the output, or predicted value) of the neural network for each segment with the ground-truth set, and may be defined as in Equation 1 above.

Loss function 2 is again obtained by comparing the two clusters. Because both clusters represent normal segments, the two clusters should be as close together as possible. Loss function 2 may therefore be calculated from the distance between the two clusters, as in Equation 4 below.

[Equation 4] (equation not reproduced in this text)

The terms of Equation 4 are: loss function 2; the center of the first cluster; and the center of the second cluster.
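A loss for the normal-video case can be sketched as follows, under stated assumptions: loss function 1 is taken as the mean squared error against an all-normal target (encoded as 0), loss function 2 is the Euclidean distance between the two cluster centers (so that it is proportional to that distance), and the two are combined with a hypothetical weight `lam` between 0 and 1. None of these specific forms is confirmed by the text, and the names `centroid` and `normal_video_loss` are illustrative.

```python
import math
from typing import List, Sequence


def centroid(points: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of feature vectors."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def normal_video_loss(outputs: Sequence[float],
                      cluster_a: Sequence[Sequence[float]],
                      cluster_b: Sequence[Sequence[float]],
                      lam: float = 0.5) -> float:
    """Total loss for a normal video (assumed forms, see lead-in)."""
    # Loss 1: every segment's target is "normal" (0), so the error is the output itself.
    loss1 = sum(o ** 2 for o in outputs) / len(outputs)
    # Loss 2: grows with the distance between the two normal clusters' centers.
    loss2 = math.dist(centroid(cluster_a), centroid(cluster_b))
    # Assumed weighting of the two terms.
    return lam * loss1 + (1.0 - lam) * loss2
```

Minimizing the second term pulls the two normal clusters together, the opposite of the abnormal-video case, where the clusters are pushed apart.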

이상과 같이 손실함수에 세그먼트와 클러스터의 두 가지 요소를 고려할 수 있으므로, 이상 비디오를 가지고 학습하는 경우의 전체 손실 함수는 상기 수학식 3과 같이 정의될 수 있다.As described above, since two elements, a segment and a cluster, can be considered in the loss function, the overall loss function in the case of learning with an abnormal video can be defined as in Equation 3 above.

The neural network learning unit 130 may build a collection of many abnormal video images and normal video images, and may repeat the operation of learning from an abnormal video image and the operation of learning from a normal video image until the results converge. Clustering is performed each time a video image is learned, and the clustering result may change as learning progresses even for the same video image. When learning from an abnormal video image, which cluster is normal and which is abnormal is not defined in advance; the cluster containing more segments that the neural network's output judges to be normal is selected as the normal cluster.

The anomaly detection unit 130 inputs the segments generated by the segment generation unit 120 into the neural network trained by the neural network learning unit, determines whether each segment is abnormal or normal, and outputs a segment-level anomaly detection result 150.
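The segment-level decision can be sketched as follows. This is only an illustration: it assumes the trained neural network emits one abnormality score in [0, 1] per segment and that a fixed threshold separates abnormal from normal. Neither the score format nor the threshold value of 0.5 is stated in the text, and `detect_segments` is a hypothetical name.

```python
import numpy as np

def detect_segments(abnormal_scores, threshold=0.5):
    """Return a per-segment verdict from per-segment abnormality scores."""
    scores = np.asarray(abnormal_scores, dtype=float)
    # Assumption: a score at or above the threshold means "abnormal".
    return ["abnormal" if s >= threshold else "normal" for s in scores]

# Scores for three consecutive segments of one video.
result = detect_segments([0.1, 0.9, 0.5])
```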

FIG. 2 is a block diagram illustrating an example of the neural network learning unit 130 shown in FIG. 1.

Referring to FIG. 2, the neural network learning unit 130 includes a neural network 210 and may include an abnormal video learning unit 220 or a normal video learning unit 230.

The neural network 210 is an artificial neural network that receives the segments generated by the segment generation unit as input and determines whether each is abnormal or normal. The neural network 210 may be trained through the abnormal video learning unit 220 or the normal video learning unit 230, each of which may learn based on the intermediate-layer features of the neural network. In other words, the neural network is trained through its interaction with the abnormal video learning unit or the normal video learning unit.

The neural network learning unit 130 may include the abnormal video learning unit 220, which performs the operation of learning from an abnormal video image containing at least one abnormal frame.

To perform the operation of learning from an abnormal video image, the abnormal video learning unit 220 may generate two clusters based on the intermediate-layer features of the neural network for the segments.

For the abnormal video learning unit 220 to train the neural network, each segment needs an abnormal or normal label against which the determination result can be compared. In the present invention, however, labeling exists only at the video level: all that is known is whether the input video image contains an abnormal situation, not which segments contain the abnormal situation and which are normal. To overcome this, the present invention proposes clustering the segments using the intermediate-layer features of the neural network. Clustering here means grouping segments with similar features together based on those features. The abnormal video learning unit 220 thus divides the segments into two groups, that is, two clusters, labels the segments cluster by cluster, and uses the result as the correct-answer set.
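The text does not name a clustering algorithm. As one possible reading, the sketch below splits per-segment intermediate-layer feature vectors into two clusters with a small k-means loop (k = 2). The choice of k-means, the deterministic initialization, the iteration count, and the name `two_means` are all assumptions made for illustration.

```python
import numpy as np

def two_means(features, n_iter=20):
    """Split feature vectors into two clusters with a basic k-means loop."""
    features = np.asarray(features, dtype=float)
    # Assumed initialization: the first point and the point farthest from it.
    first = features[0]
    far = features[np.argmax(np.linalg.norm(features - first, axis=1))]
    centers = np.stack([first, far])
    for _ in range(n_iter):
        # Distance of every segment feature to each of the two centers.
        dists = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        assignments = dists.argmin(axis=1)
        for k in range(2):
            members = features[assignments == k]
            if len(members) > 0:
                centers[k] = members.mean(axis=0)
    return assignments, centers

# Toy intermediate-layer features for four segments.
assignments, centers = two_means([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
```

Because the features come from the network being trained, rerunning this step after each update can give a different split for the same video, which matches the remark that the clustering result may change as learning progresses.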

The abnormal video learning unit 220 may select, of the two clusters, the one containing more segments that the neural network predicted to be normal as the normal cluster, and may select the remaining cluster as the abnormal cluster. That is, the per-segment predictions of the neural network are counted within each cluster; the cluster in which more predicted-normal segments fall becomes the normal cluster, and the other becomes the abnormal cluster.
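The selection rule above amounts to a count per cluster. The sketch below assumes cluster ids 0 and 1 and a boolean per-segment prediction. The tie-breaking rule (cluster 0 wins) and the name `pick_normal_cluster` are assumptions, since the text does not say what happens when the counts are equal.

```python
import numpy as np

def pick_normal_cluster(assignments, predicted_normal):
    """Return (normal_cluster_id, abnormal_cluster_id) for two clusters."""
    assignments = np.asarray(assignments)
    predicted_normal = np.asarray(predicted_normal, dtype=bool)
    # Count the segments predicted normal that fall in each cluster.
    counts = [int(np.sum(predicted_normal & (assignments == k))) for k in range(2)]
    # Assumption: on a tie, cluster 0 is taken as the normal cluster.
    normal_id = 0 if counts[0] >= counts[1] else 1
    return normal_id, 1 - normal_id

# Five segments: cluster 0 holds one predicted-normal segment, cluster 1 holds two.
normal_id, abnormal_id = pick_normal_cluster(
    [0, 0, 1, 1, 1], [True, False, True, True, False]
)
```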

The abnormal video learning unit 220 may then label all segments belonging to the normal cluster as normal and all segments belonging to the abnormal cluster as abnormal. The labels generated in this way may be set as the correct-answer set for the neural network's predictions.

In this manner, the abnormal video learning unit 220 generates a segment-level correct-answer set for the neural network's predictions and can train the neural network with it.
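Turning cluster membership into a segment-level correct-answer set can be sketched in a few lines. The numeric label encoding (0 for normal, 1 for abnormal) and the name `make_answer_set` are assumptions for the example.

```python
import numpy as np

NORMAL, ABNORMAL = 0, 1  # assumed label encoding

def make_answer_set(assignments, normal_cluster_id):
    """Label every segment by the cluster it belongs to."""
    assignments = np.asarray(assignments)
    # Segments in the normal cluster get NORMAL; all others get ABNORMAL.
    return np.where(assignments == normal_cluster_id, NORMAL, ABNORMAL)

# Cluster 1 was selected as the normal cluster for these five segments.
answer_set = make_answer_set([0, 0, 1, 1, 1], normal_cluster_id=1)
```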

The abnormal video learning unit 220 may calculate loss function 1 by comparing the final values of the neural network, with the segments as input, against the correct-answer set for the neural network's predictions. It may also calculate loss function 2 by comparing the normal cluster with the abnormal cluster, and may calculate the total loss function by combining loss function 1 and loss function 2. Loss function 2 may be inversely proportional to the distance between the center of the normal cluster and the center of the abnormal cluster.

Loss function 1 is obtained by comparing the final value (the output, or predicted value) of the neural network for each segment with the correct-answer set, and may be defined as in Equation 1 above. Loss function 2 is obtained by comparing the two clusters. Because one cluster represents normal and the other abnormal, the two clusters should preferably be as far apart as possible. Loss function 2 may therefore be calculated from the distance between the two clusters, as in Equation 2 above. Because the loss function can thus take both segments and clusters into account, the total loss function when learning from an abnormal video may be defined as in Equation 3 above.
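The sketch below shows one way the two terms could fit together for an abnormal video. It is not a transcription of Equations 1 to 3. It assumes a binary cross-entropy for loss function 1, the reciprocal of the Euclidean distance between centers for loss function 2, and a weighted sum for the total. The function names, the `weight` parameter, and the small `eps` constant are hypothetical.

```python
import numpy as np

def segment_loss(pred_abnormal, labels, eps=1e-7):
    """Loss function 1 (assumed form): binary cross-entropy against the answer set."""
    p = np.clip(np.asarray(pred_abnormal, dtype=float), eps, 1 - eps)
    y = np.asarray(labels, dtype=float)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def abnormal_cluster_loss(center_normal, center_abnormal, eps=1e-7):
    """Loss function 2 (assumed form): shrinks as the two centers move apart."""
    diff = np.asarray(center_normal, dtype=float) - np.asarray(center_abnormal, dtype=float)
    return float(1.0 / (np.linalg.norm(diff) + eps))

def total_loss(loss_1, loss_2, weight=1.0):
    """Total loss (assumed form): weighted sum of the two terms."""
    return loss_1 + weight * loss_2

loss_1 = segment_loss([0.5, 0.5], [0, 1])
loss_2 = abnormal_cluster_loss([0.0, 0.0], [3.0, 4.0])
loss = total_loss(loss_1, loss_2)
```

With the reciprocal form, doubling the distance between the centers halves loss function 2, which is the inverse proportionality described above.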

The loss functions given by the equations above are merely examples and may be implemented differently depending on the user's choice and the application. For example, one term may be a cross-entropy loss, and another may be the average of the distances over all elements belonging to a cluster.

In addition, the neural network learning unit 130 may include the normal video learning unit 230, which performs the operation of learning from a normal video image containing no abnormal frames.

To perform the operation of learning from a normal video image, the normal video learning unit 230 may first generate two clusters based on the intermediate-layer features of the neural network for the segments.

As described above, training the neural network requires an abnormal or normal label for each segment against which the determination result can be compared. The normal video learning unit 230 therefore divides the segments into two groups, that is, two clusters, based on the intermediate-layer features of the neural network, labels the segments cluster by cluster, and uses the result as the correct-answer set.

The normal video learning unit 230 may select one of the two clusters as the normal-1 cluster and the other as the normal-2 cluster. Because all segments are known to be normal, both clusters are normal clusters, and the segment-level correct-answer set for the neural network's predictions is simply set to normal for every segment. The neural network can then be trained with this segment-level correct-answer set.

The normal video learning unit 230 may calculate loss function 1 by comparing the final values of the neural network, with the segments as input, against the correct-answer set indicating normal. It may also calculate loss function 2 by comparing the normal-1 cluster with the normal-2 cluster, and may calculate the total loss function by combining loss function 1 and loss function 2. Loss function 2 may be proportional to the distance between the center of the normal-1 cluster and the center of the normal-2 cluster.

Loss function 1 is obtained by comparing the final value (the output, or predicted value) of the neural network for each segment with the correct-answer set, and may be defined as in Equation 1 above. Loss function 2 is likewise obtained by comparing the two clusters. Because both clusters represent normal, the distance between them should preferably be as small as possible. Loss function 2 may therefore be calculated from the distance between the two clusters, as in Equation 4 above. Because the loss function can thus take both segments and clusters into account, the total loss function when learning from an abnormal video may be defined as in Equation 3 above.

The neural network learning unit 130 may build a collection of many abnormal video images and normal video images, and may repeat the operation of learning from an abnormal video image and the operation of learning from a normal video image until the results converge. Clustering is performed each time a video image is learned, and the clustering result may change as learning progresses even for the same video image. When learning from an abnormal video image, which cluster is normal and which is abnormal is not defined in advance; the cluster containing more segments that the neural network's output judges to be normal is selected as the normal cluster.
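The repetition until convergence can be sketched as an outer loop around a per-video training step. The text fixes neither the order in which videos are visited nor the convergence test, so the sketch assumes one pass over all abnormal videos followed by one pass over all normal videos per epoch, and stops when the summed loss changes by less than a tolerance. `train_step` is a hypothetical callback standing in for clustering, labeling, loss computation, and the weight update for one video.

```python
def train_until_converged(abnormal_videos, normal_videos, train_step,
                          tol=1e-4, max_epochs=100):
    """Alternate abnormal-video and normal-video learning; return epochs run."""
    previous = None
    for epoch in range(1, max_epochs + 1):
        epoch_loss = 0.0
        for video in abnormal_videos:
            epoch_loss += train_step(video, is_abnormal=True)
        for video in normal_videos:
            epoch_loss += train_step(video, is_abnormal=False)
        # Assumed convergence test: the epoch loss has stopped changing.
        if previous is not None and abs(previous - epoch_loss) < tol:
            return epoch
        previous = epoch_loss
    return max_epochs

# Stand-in training step (hypothetical): the loss simply halves on every call.
state = {"loss": 1.0}

def fake_train_step(video, is_abnormal):
    state["loss"] *= 0.5
    return state["loss"]

epochs_run = train_until_converged(["abnormal_0"], ["normal_0"],
                                   fake_train_step, tol=1e-3)
```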

FIG. 3 is a diagram illustrating an example of the abnormal video learning unit 220 shown in FIG. 2.

Referring to FIG. 3, the abnormal video learning unit 220 may include an abnormal cluster generation unit 310, a label generation unit 320, an abnormal loss function 1 calculation unit 330, an abnormal loss function 2 calculation unit 340, and an abnormal total loss function calculation unit 350.

The abnormal cluster generation unit 310 may generate two clusters based on the intermediate-layer features of the neural network 210 for the segments.

The abnormal cluster generation unit 310 may select, of the two clusters, the one containing more segments that the neural network predicted to be normal as the normal cluster, and may select the remaining cluster as the abnormal cluster. That is, the per-segment predictions of the neural network are counted within each cluster; the cluster in which more predicted-normal segments fall becomes the normal cluster, and the other becomes the abnormal cluster.

The label generation unit 320 may label all segments belonging to the normal cluster as normal and all segments belonging to the abnormal cluster as abnormal. The labels generated in this way may be set as the correct-answer set for the neural network's predictions.

The abnormal loss function 1 calculation unit 330 may calculate loss function 1 by comparing the final values of the neural network, with the segments as input, against the correct-answer set for the neural network's predictions generated by the label generation unit 320. The abnormal loss function 2 calculation unit 340 may calculate loss function 2 by comparing the normal cluster and the abnormal cluster generated by the abnormal cluster generation unit 310. The abnormal total loss function calculation unit 350 may calculate the total loss function by combining loss function 1 calculated by the abnormal loss function 1 calculation unit 330 and loss function 2 calculated by the abnormal loss function 2 calculation unit 340. Loss function 2 may be inversely proportional to the distance between the center of the normal cluster and the center of the abnormal cluster.

For example, one loss term may be a cross-entropy loss, and another may be the average of the distances over all elements belonging to a cluster.

The abnormal video learning unit 220 may build a collection of many abnormal video images and may repeat the operation of learning from an abnormal video image until the results converge. Clustering is performed each time an abnormal video image is learned, and the clustering result may change as learning progresses even for the same video image. Which cluster is normal and which is abnormal is not defined in advance; the cluster containing more segments that the neural network's output judges to be normal is selected as the normal cluster.

FIG. 4 is a diagram illustrating an example of the normal video learning unit 230 shown in FIG. 2.

Referring to FIG. 4, the normal video learning unit 230 may include a normal cluster generation unit 410, a normal loss function 1 calculation unit 420, a normal loss function 2 calculation unit 430, and a normal total loss function calculation unit 440.

The normal cluster generation unit 410 may generate two clusters based on the intermediate-layer features of the neural network 210 for the segments.

Training the neural network requires an abnormal or normal label for each segment against which the determination result can be compared. The normal video learning unit 230 therefore divides the segments into two groups, that is, two clusters, based on the intermediate-layer features of the neural network, labels the segments cluster by cluster, and uses the result as the correct-answer set.

The normal cluster generation unit 410 may select one of the two clusters as the normal-1 cluster and the other as the normal-2 cluster. Because all segments are known to be normal, both clusters are normal clusters, and the segment-level correct-answer set for the neural network's predictions is simply set to normal for every segment. The neural network can then be trained with this segment-level correct-answer set.

The normal loss function 1 calculation unit 420 may calculate loss function 1 by comparing the final values of the neural network, with the segments as input, against the correct-answer set indicating normal. The normal loss function 2 calculation unit 430 may calculate loss function 2 by comparing the normal-1 cluster with the normal-2 cluster. The normal total loss function calculation unit 440 may calculate the total loss function by combining loss function 1 calculated by the normal loss function 1 calculation unit 420 and loss function 2 calculated by the normal loss function 2 calculation unit 430. Loss function 2 may be proportional to the distance between the center of the normal-1 cluster and the center of the normal-2 cluster.

The normal video learning unit 230 may build a collection of many normal video images and may repeat the operation of learning from a normal video image until the results converge. Clustering is performed each time a video image is learned, and the clustering result may change as learning progresses even for the same video image.

FIG. 5 is a diagram illustrating an example use of the abnormal video learning unit 220 shown in FIG. 2.

Referring to FIG. 5, an abnormal video image 510 containing abnormal frames is input to the segment generation unit. In the abnormal video image 510, each rectangle represents a frame of the video image, and the rectangles shown in a dark color represent abnormal frames.

The segment generation unit then groups the frames of the abnormal video image, the selected number at a time, to generate segments 520.

The segments are then input to the neural network 530, and the abnormal cluster generation unit generates two clusters 550 based on the intermediate-layer features of the neural network. Of the two clusters, the one containing more segments that the neural network predicted to be normal may be selected as the normal cluster 560, and the remaining cluster as the abnormal cluster 570.

The label generation unit may then label the segments belonging to the normal cluster as normal and the segments belonging to the abnormal cluster as abnormal. The labels generated in this way may be set as the correct-answer set for the neural network's predictions.

In this way, the abnormal video learning unit generates a segment-level correct-answer set for the neural network's predictions and can train the neural network with it.

The abnormal loss function 1 calculation unit may calculate loss function 1 (580) by comparing the final values 540 of the neural network, with the segments as input, against the correct-answer set 550 for the neural network's predictions generated by the label generation unit. The abnormal loss function 2 calculation unit may calculate loss function 2 (590) by comparing the normal cluster 560 and the abnormal cluster 570 generated by the abnormal cluster generation unit. Here, loss function 2 may be inversely proportional to the distance between the center of the normal cluster and the center of the abnormal cluster. The abnormal total loss function calculation unit may then calculate the total loss function by combining loss function 1 and loss function 2.

For example, one loss term may be a cross-entropy loss, and another may be the average of the distances over all elements belonging to a cluster.

FIG. 6 is a diagram illustrating an example use of the normal video learning unit 230 shown in FIG. 2.

Referring to FIG. 6, a normal video image 610 containing no abnormal frames is input to the segment generation unit. In the normal video image 610, each rectangle is a frame of the video image, and all of them represent normal frames.

The segment generation unit then groups the frames of the input video image, the selected number at a time, to generate segments 620.

The segments are then input to the neural network 630, and the normal cluster generation unit generates two clusters 660 based on the intermediate-layer features of the neural network. One of the two clusters may be selected as the normal-1 cluster 670 and the other as the normal-2 cluster 675. Because all segments are known to be normal, both clusters are normal clusters, and the segment-level correct-answer set 650 for the neural network's predictions is simply set to normal for every segment. The neural network can then be trained with this segment-level correct-answer set.

The normal loss function 1 calculation unit may calculate loss function 1 (680) by comparing the final values 640 of the neural network, with the segments as input, against the correct-answer set 650 indicating normal. The normal loss function 2 calculation unit may calculate loss function 2 (690) by comparing the normal-1 cluster 670 with the normal-2 cluster 675. Here, loss function 2 may be proportional to the distance between the center of the normal-1 cluster and the center of the normal-2 cluster. The normal total loss function calculation unit may then calculate the total loss function by combining loss function 1 and loss function 2.

FIG. 7 is an operation flowchart illustrating an example of an anomaly detection method using segmenting of a video image frame according to an embodiment.

Referring to FIG. 7, a video image is input to the segment generation unit of the anomaly detection apparatus using segmenting of a video image frame (S710).

The segment generation unit generates segments by grouping the selected number of frames from the input video image (S720). When the selected number (the number of frames constituting one segment) is 1, the anomaly detection apparatus is a system that determines abnormal or normal at the frame level; when it is greater than 1, the apparatus is a system that determines abnormal or normal at the segment level.
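Step S720 can be sketched as fixed-size grouping of consecutive frames. The sketch assumes non-overlapping segments and drops a trailing group that has fewer frames than the selected number; the text specifies neither choice, and `make_segments` is a hypothetical name.

```python
def make_segments(frames, frames_per_segment):
    """Group consecutive frames into non-overlapping fixed-size segments."""
    if frames_per_segment < 1:
        raise ValueError("frames_per_segment must be at least 1")
    # Assumption: an incomplete trailing group is discarded.
    last_start = len(frames) - frames_per_segment
    return [frames[i:i + frames_per_segment]
            for i in range(0, last_start + 1, frames_per_segment)]

frames = list(range(7))                    # stand-ins for 7 video frames
segment_level = make_segments(frames, 3)   # 3 frames per segment
frame_level = make_segments(frames, 1)     # 1 frame per segment
```

With one frame per segment, each segment is a single frame, which corresponds to the frame-level case described above.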

The neural network learning unit trains the neural network based on the intermediate-layer features of the neural network with the segments as input (S730). If training of the neural network has already been completed, the neural network learning step (S730) may be omitted.

First, the neural network learning unit may perform the operation of learning from an abnormal video image containing at least one abnormal frame.

To perform the operation of learning from an abnormal video image, the neural network learning unit may generate two clusters based on the intermediate-layer features of the neural network for the segments.

The neural network learning unit may select, of the two clusters, the one containing more segments that the neural network predicted to be normal as the normal cluster, and may select the remaining cluster as the abnormal cluster. That is, the per-segment predictions of the neural network are counted within each cluster; the cluster in which more predicted-normal segments fall becomes the normal cluster, and the other becomes the abnormal cluster.

The neural network learning unit may label all segments belonging to the normal cluster as normal and all segments belonging to the abnormal cluster as abnormal. The labels generated in this way may be set as the correct-answer set for the neural network's predictions.

The neural network learning unit may calculate loss function 1 by comparing the final values of the neural network, with the segments as input, against the correct-answer set for the neural network's predictions. It may also calculate loss function 2 by comparing the normal cluster with the abnormal cluster, and may calculate the total loss function by combining loss function 1 and loss function 2. Loss function 2 may be inversely proportional to the distance between the center of the normal cluster and the center of the abnormal cluster.

In addition, the neural network learning unit may perform learning from a normal video image that does not include any abnormal frames.

First, to learn from a normal video image, the neural network learning unit may generate two clusters based on the features of an intermediate layer of the neural network for the segments.

The neural network learning unit may select one of the two clusters as the normal 1 cluster and the other as the normal 2 cluster. That is, because all segments are known to be normal, both clusters are normal clusters, and the correct-answer set for the segment-level predictions of the neural network can likewise be set entirely to normal. Neural network learning can then proceed using this correct-answer set for the segment-level predictions.

The neural network learning unit may then calculate loss function 1 as in Equation 1 above by comparing the final output values of the neural network, obtained with the segments as input, against the correct-answer set indicating normal. It may also calculate loss function 2 by comparing the normal 1 cluster with the normal 2 cluster, and may then calculate the total loss function by combining loss function 1 and loss function 2. Loss function 2 may be proportional to the distance between the center of the normal 1 cluster and the center of the normal 2 cluster.

Loss function 1 can be obtained by comparing the neural network's predicted value for each segment with the correct-answer set, and may be defined as in Equation 1 above. Loss function 2 can be obtained by comparing the two clusters. Because both clusters represent normal segments, the distance between them should be as small as possible. Loss function 2 can therefore be calculated from the distance between the two clusters, as in Equation 4 above. Since the loss function can thus take both segment-level and cluster-level factors into account, the total loss function can be defined as in Equation 3 above, as in the case of learning with an abnormal video.
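As a non-limiting illustration, the cluster term for the normal-video case might be computed as in the following Python sketch. Using the plain Euclidean distance between the two cluster centers as loss function 2 is an assumption, as are the function names and the 0 = normal label encoding; the exact form of Equation 4 is not reproduced here.

```python
import numpy as np

def normal_targets(num_segments):
    """Correct-answer set for a normal video: every segment is normal (0)."""
    return np.zeros(num_segments, dtype=int)

def loss2_normal(features, cluster_ids):
    """Assumed form of loss function 2 for normal video: the distance
    between the centers of the normal 1 and normal 2 clusters.
    Both clusters must be non-empty."""
    features = np.asarray(features, dtype=float)
    cluster_ids = np.asarray(cluster_ids)
    center_1 = features[cluster_ids == 0].mean(axis=0)
    center_2 = features[cluster_ids == 1].mean(axis=0)
    # Closer centers give a smaller loss, pulling the two normal clusters together.
    return float(np.linalg.norm(center_1 - center_2))
```

This is the mirror image of the abnormal-video case: there the cluster term rewards separation, whereas here it penalizes it.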

The anomaly detection unit inputs the segments generated by the segment generation unit into the neural network trained by the neural network learning unit to detect anomalies segment by segment (S740), and a segment-level anomaly detection result is generated (S750).
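As a non-limiting illustration, segment-level detection might look like the following Python sketch. Grouping consecutive, non-overlapping frames into segments, dropping an incomplete trailing segment, and thresholding a per-segment score at 0.5 are all assumptions of the sketch; `score_fn` is a hypothetical stand-in for the trained neural network.

```python
import numpy as np

def make_segments(frames, frames_per_segment):
    """Group consecutive frames into fixed-size, non-overlapping segments.
    Frames left over at the end are dropped (assumption)."""
    frames = np.asarray(frames)
    n = len(frames) // frames_per_segment
    usable = frames[: n * frames_per_segment]
    return usable.reshape(n, frames_per_segment, *frames.shape[1:])

def detect(segments, score_fn, threshold=0.5):
    """Return one boolean per segment: True where the anomaly score
    produced by score_fn exceeds the threshold."""
    scores = np.array([float(score_fn(s)) for s in segments])
    return scores > threshold
```

The output has one entry per segment rather than per frame, matching the segment-level detection result described above.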

FIG. 8 is an operation flowchart illustrating an example of the neural network learning (S730) shown in FIG. 7, according to an embodiment.

With reference to FIG. 8, a method by which the neural network learning unit trains the neural network when an abnormal video image is received as input will be described.

First, the segments generated by the segment generation unit are input to the neural network (S810).

The abnormal cluster generation unit of the abnormal video learning unit generates two clusters based on the features of an intermediate layer of the neural network for the segments (S820). At this time, of the two clusters, the cluster containing more segments that the neural network predicted as normal may be selected as the normal cluster, and the remaining cluster may be selected as the abnormal cluster. In other words, of the two clusters, the one in which more of the segments predicted as normal by the neural network are distributed is selected as the normal cluster, and the other is selected as the abnormal cluster.
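As a non-limiting illustration, generating two clusters from intermediate-layer features might be sketched in Python as follows. The embodiment does not name a clustering algorithm; k-means with k = 2 and the deterministic initialization used here are assumptions, and the function names are hypothetical.

```python
import numpy as np

def _assign(x, centers):
    """Index of the nearest center for each feature vector."""
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def two_means(features, n_iter=20):
    """Split feature vectors (one row per segment) into two clusters
    using a basic k-means loop with k = 2."""
    x = np.asarray(features, dtype=float)
    # Initialization (assumption): the first point and the point farthest from it.
    first = x[0]
    farthest = x[np.argmax(np.linalg.norm(x - first, axis=1))]
    centers = np.stack([first, farthest])
    for _ in range(n_iter):
        ids = _assign(x, centers)
        for k in (0, 1):
            # Keep the previous center if a cluster ends up empty.
            if np.any(ids == k):
                centers[k] = x[ids == k].mean(axis=0)
    return _assign(x, centers), centers
```

The returned cluster indices can then be compared with the network's per-segment predictions to decide which cluster is treated as normal.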

The label generation unit of the abnormal video learning unit labels all segments belonging to the normal cluster as normal and all segments belonging to the abnormal cluster as abnormal (S830). The labels generated in this way may be set as the correct-answer set for the prediction results of the neural network.

The abnormal loss function 1 calculation unit of the abnormal video learning unit calculates loss function 1 as in Equation 1 above by comparing the final output values of the neural network, obtained with the segments as input, against the correct-answer set for the prediction results of the neural network (S840). The abnormal loss function 2 calculation unit of the abnormal video learning unit calculates loss function 2 as in Equation 2 above by comparing the normal cluster with the abnormal cluster (S850). The abnormal total loss function calculation unit of the abnormal video learning unit then calculates the total loss function as in Equation 3 above by combining loss function 1 and loss function 2 (S860). Loss function 2 may be inversely proportional to the distance between the center of the normal cluster and the center of the abnormal cluster.

FIG. 9 is an operation flowchart illustrating another example of the neural network learning (S730) shown in FIG. 7, according to an embodiment.

With reference to FIG. 9, a method by which the neural network learning unit trains the neural network when a normal video image is received as input will be described.

First, the segments generated by the segment generation unit are input to the neural network (S910).

The normal cluster generation unit of the normal video learning unit generates two clusters based on the features of an intermediate layer of the neural network for the segments (S920). At this time, one of the two clusters may be selected as the normal 1 cluster and the other as the normal 2 cluster. That is, because all segments are known to be normal, both clusters are normal clusters, and the correct-answer set for the segment-level predictions of the neural network can likewise be set entirely to normal. Neural network learning can then proceed using this correct-answer set for the segment-level predictions.

The normal loss function 1 calculation unit of the normal video learning unit calculates loss function 1 as in Equation 1 above by comparing the final output values of the neural network, obtained with the segments as input, against the correct-answer set indicating normal (S930). The normal loss function 2 calculation unit of the normal video learning unit calculates loss function 2 as in Equation 4 above by comparing the normal 1 cluster with the normal 2 cluster (S940). The normal total loss function calculation unit of the normal video learning unit then calculates the total loss function as in Equation 3 above by combining loss function 1 and loss function 2 (S950). Loss function 2 may be proportional to the distance between the center of the normal 1 cluster and the center of the normal 2 cluster.

FIG. 10 is a diagram showing the configuration of a computer system according to an embodiment.

The anomaly detection apparatus using segmenting of video image frames according to an embodiment may be implemented in a computer system 1000, such as a computer-readable recording medium.

The computer system 1000 may include one or more processors 1010, a memory 1030, a user interface input device 1040, a user interface output device 1050, and storage 1060, which communicate with one another via a bus 1020. The computer system 1000 may further include a network interface 1070 connected to a network 1080. The processor 1010 may be a central processing unit or a semiconductor device that executes programs or processing instructions stored in the memory 1030 or the storage 1060. The memory 1030 and the storage 1060 may be storage media including at least one of a volatile medium, a non-volatile medium, a removable medium, a non-removable medium, a communication medium, and an information delivery medium. For example, the memory 1030 may include a ROM 1031 or a RAM 1032.

According to the embodiments described above, by providing an anomaly detection method using segmenting of video image frames and an apparatus therefor, it is possible to provide an anomaly detection system that can detect anomalies at the frame level using only video-level labeling information, that is, information indicating whether a video contains an abnormal situation.

It is also possible to provide a method for training an artificial intelligence neural network to be used for anomaly detection, using an abnormal video image or a normal video image.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be practiced in other specific forms without changing its technical spirit or essential features. The embodiments described above should therefore be understood as illustrative in all respects and not restrictive.

100: anomaly detection apparatus using segmenting of video image frames
110: video input image
120: segment generation unit
130: neural network learning unit
140: anomaly detection unit
150: segment-level anomaly detection result

Claims

1. An anomaly detection method using segmenting of video image frames, the method comprising:
generating segments by combining a selected number of frames from a video image input;
learning a neural network based on features of an intermediate layer of the neural network to which the segments are input; and
inputting the segments into the learned neural network to detect anomalies in units of the segments.

2. The method of claim 1,
wherein the step of learning the neural network comprises
learning from an abnormal video image that includes at least one abnormal frame.

3. The method of claim 2,
wherein the step of learning from the abnormal video image comprises:
generating two clusters based on features of the intermediate layer of the neural network for the segments;
selecting, of the two clusters, the cluster having the larger number of segments predicted as normal by the neural network as a normal cluster; and
selecting the remaining cluster, which is not selected as the normal cluster, as an abnormal cluster.

4. The method of claim 3,
wherein the step of learning from the abnormal video image further comprises:
labeling all segments belonging to the normal cluster as normal;
labeling all segments belonging to the abnormal cluster as abnormal; and
setting the labels as a correct-answer set for the prediction results of the neural network.

5. The method of claim 4,
wherein the step of learning from the abnormal video image further comprises:
calculating a loss function 1 by comparing final values of the neural network to which the segments are input with the correct-answer set for the prediction results of the neural network;
calculating a loss function 2 by comparing the normal cluster with the abnormal cluster; and
calculating a total loss function by combining the loss function 1 and the loss function 2.

6. The method of claim 5,
wherein the loss function 2 is inversely proportional to a distance between the center of the normal cluster and the center of the abnormal cluster.

7. The method of claim 1,
wherein the step of learning the neural network comprises
learning from a normal video image that does not include any abnormal frames.

8. The method of claim 7,
wherein the step of learning from the normal video image comprises:
generating two clusters based on features of the intermediate layer of the neural network for the segments;
selecting one of the two clusters as a normal 1 cluster; and
selecting the other of the two clusters as a normal 2 cluster.

9. The method of claim 8,
wherein the step of learning from the normal video image further comprises:
calculating a loss function 1 by comparing final values of the neural network to which the segments are input with a correct-answer set indicating normal;
calculating a loss function 2 by comparing the normal 1 cluster with the normal 2 cluster; and
calculating a total loss function by combining the loss function 1 and the loss function 2.

10. The method of claim 9,
wherein the loss function 2 is proportional to a distance between the center of the normal 1 cluster and the center of the normal 2 cluster.

11. An anomaly detection apparatus using segmenting of video image frames, the apparatus comprising:
a segment generation unit that generates segments by combining a selected number of frames from a video image input;
a neural network learning unit that learns a neural network based on features of an intermediate layer of the neural network to which the segments are input; and
an anomaly detection unit that inputs the segments into the learned neural network and detects anomalies in units of the segments.

12. The apparatus of claim 11,
wherein the neural network learning unit comprises
an abnormal video learning unit that learns from an abnormal video image that includes at least one abnormal frame.

13. The apparatus of claim 12,
wherein the abnormal video learning unit comprises
an abnormal cluster generation unit that generates two clusters based on features of the intermediate layer of the neural network for the segments,
wherein the abnormal cluster generation unit
selects, of the two clusters, the cluster having the larger number of segments predicted as normal by the neural network as a normal cluster, and
selects the remaining cluster, which is not selected as the normal cluster, as an abnormal cluster.

14. The apparatus of claim 13,
wherein the abnormal video learning unit
further comprises a labeling generation unit that generates labels for the segments,
wherein the labeling generation unit
generates the labels of all segments belonging to the normal cluster as normal,
generates the labels of all segments belonging to the abnormal cluster as abnormal, and
sets the labels as a correct-answer set for the prediction results of the neural network.

15. The apparatus of claim 14,
wherein the abnormal video learning unit further comprises:
an abnormal loss function 1 calculation unit that calculates a loss function 1 by comparing final values of the neural network to which the segments are input with the correct-answer set for the prediction results of the neural network;
an abnormal loss function 2 calculation unit that calculates a loss function 2 by comparing the normal cluster with the abnormal cluster; and
an abnormal total loss function calculation unit that calculates a total loss function by combining the loss function 1 and the loss function 2.

16. The apparatus of claim 15,
wherein the loss function 2 is inversely proportional to a distance between the center of the normal cluster and the center of the abnormal cluster.

17. The apparatus of claim 11,
wherein the neural network learning unit comprises
a normal video learning unit that learns from a normal video image that does not include any abnormal frames.

18. The apparatus of claim 17,
wherein the normal video learning unit comprises
a normal cluster generation unit that generates two clusters based on features of the intermediate layer of the neural network for the segments,
wherein the normal cluster generation unit
selects one of the two clusters as a normal 1 cluster, and
selects the other of the two clusters as a normal 2 cluster.

19. The apparatus of claim 18,
wherein the normal video learning unit further comprises:
a normal loss function 1 calculation unit that calculates a loss function 1 by comparing final values of the neural network to which the segments are input with a correct-answer set indicating normal;
a normal loss function 2 calculation unit that calculates a loss function 2 by comparing the normal 1 cluster with the normal 2 cluster; and
a normal total loss function calculation unit that calculates a total loss function by combining the loss function 1 and the loss function 2.

20. The apparatus of claim 19,
wherein the loss function 2 is proportional to a distance between the center of the normal 1 cluster and the center of the normal 2 cluster.