KR102570126B1

KR102570126B1 - Method and System for Generating Video Synopsis Based on Abnormal Object Detection

Info

Publication number: KR102570126B1
Application number: KR1020210097834A
Authority: KR
Inventors: 김영갑; 잉글 팔라시
Original assignee: 세종대학교산학협력단 (Sejong University Industry-Academic Cooperation Foundation)
Priority date: 2021-07-26
Filing date: 2021-07-26
Publication date: 2023-08-22
Also published as: KR20230016394A

Abstract

The invention provides a method and apparatus that can be applied to a multi-view video surveillance system composed of multiple cameras, that can quickly detect dangerous situations even under various congested conditions, and that can generate a video synopsis in real time. The video synopsis generation method of the present invention receives a plurality of videos and extracts, from the plurality of videos, abnormal frames containing a predetermined object in order to generate a video synopsis. The method includes: for each of the plurality of videos, extracting the abnormal frames from the input video frames; generating a summary video by combining the abnormal frames; and generating the video synopsis by combining the summary videos of the plurality of videos. The step of extracting the abnormal frames includes identifying blurry frames among the input video frames according to a predetermined criterion. Accordingly, blurry frames are not included in the summary video or in the video synopsis.

Description

Method and apparatus for generating video synopsis based on abnormal object detection {Method and System for Generating Video Synopsis Based on Abnormal Object Detection}

The present invention relates to a video surveillance system and, more particularly, to a method and apparatus for generating a synopsis from video acquired by a video surveillance camera.

As the number of video surveillance cameras has grown, the amount of video data has increased excessively. This makes the analysis, extraction, and storage of video data difficult and requires more storage space and more analysts. A video synopsis summarizes a long video into a short one: it reduces the amount of video data and allows hours of footage to be reviewed in just a few minutes, which alleviates the storage problem and greatly reduces the cost and labor required for video analysis.

However, existing video synopsis generation methods handle the synopsis of video acquired by a single camera and are therefore unsuitable for a video surveillance system composed of multiple cameras. In addition, because existing methods process every video frame and generate synopses that include every moving object, the preprocessing burden is heavy and real-time performance is poor. In congested situations in particular, the burden of processing meaningless object data degrades performance significantly.

The present invention addresses these problems. Its technical objective is to provide a method and apparatus that can be applied to a multi-view video surveillance system composed of multiple cameras and that can quickly detect dangerous situations, even under various congested conditions, to generate a video synopsis in real time.

According to one aspect of the present invention for achieving the above technical objective, there is provided a method of generating a video synopsis by receiving a plurality of videos and extracting, from the plurality of videos, abnormal frames containing a predetermined object. The video synopsis generation method includes: for each of the plurality of videos, extracting the abnormal frames from input video frames; generating a summary video by combining the abnormal frames; and generating the video synopsis by combining the summary videos of the plurality of videos. The step of extracting the abnormal frames includes identifying blurry frames among the input video frames according to a predetermined criterion. Accordingly, blurry frames are not included in the summary video or in the video synopsis.

The step of identifying blurry frames may include computing a Laplacian variance by convolving the input video frame with a predetermined kernel. In this case, blurry frames may be identified by comparing the Laplacian variance with a predetermined threshold.
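For illustration, the blur check can be sketched in Python with NumPy. This is a minimal sketch, not the patented implementation. It assumes a 3×3 four-neighbour Laplacian kernel and a threshold of 100 (the text specifies only "a predetermined kernel" and "a predetermined threshold"). It also assumes the frame has already been converted to a 2-D grayscale array.

```python
import numpy as np

# Assumed kernel: the 4-neighbour Laplacian. The source only says "a predetermined kernel".
LAPLACIAN_KERNEL = np.array([[0, 1, 0],
                             [1, -4, 1],
                             [0, 1, 0]], dtype=np.float64)


def laplacian_variance(gray: np.ndarray) -> float:
    """Convolve a 2-D grayscale frame with the Laplacian kernel ('valid' region) and return the variance."""
    g = gray.astype(np.float64)
    h, w = g.shape
    out = np.zeros((h - 2, w - 2))
    # The kernel is symmetric, so this shifted-sum correlation equals convolution.
    for dy in range(3):
        for dx in range(3):
            out += LAPLACIAN_KERNEL[dy, dx] * g[dy:dy + h - 2, dx:dx + w - 2]
    return float(out.var())


def is_blurry(gray: np.ndarray, threshold: float = 100.0) -> bool:
    """Flag a frame as blurry when its Laplacian variance is below the (assumed) threshold."""
    return laplacian_variance(gray) < threshold
```

A sharp frame has strong edges, so its Laplacian response varies widely. A blurry frame has weak edges and therefore a low variance, which is why frames below the threshold are treated as blurry.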

In the step of extracting the abnormal frames, the abnormal frames may be extracted by a predetermined artificial neural network model.

The artificial neural network model may be a YOLO model.

The step of extracting the abnormal frames may include converting the color system of the input video frame into the CIELUV color space. In this case, the artificial neural network may extract the abnormal frames based on the color-space-converted video frames.

The step of extracting the abnormal frames may include: extracting the predetermined object from the input video frame and calculating an abnormality score indicating the probability that the predetermined object is actually an abnormal object; and determining a frame abnormality score for the input video frame based on the abnormality score of the object.

The summary video may be generated by excluding two kinds of input video frames: frames judged to be normal on the basis of the frame abnormality score, and frames identified as blurry. In this case, the step of generating the summary video may include: determining a cross entropy for the input video frames; and determining the frames to be included in the summary video according to the cross entropy.
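The text does not say how the cross entropy is computed. The sketch below adopts one plausible reading, clearly as an assumption: it uses the binary cross-entropy between a frame's abnormality score p and the target label "abnormal" (1), i.e. -log(p). Blurry frames and frames whose score falls below a normal/abnormal threshold are dropped first. The function name and both thresholds are hypothetical.

```python
import math
from typing import List


def select_summary_frames(frame_scores: List[float],
                          blurry: List[bool],
                          normal_threshold: float = 0.5,
                          ce_threshold: float = 0.5) -> List[int]:
    """Return indices of frames kept for a per-camera summary video (hypothetical helper).

    Assumptions, not stated in the source: a frame is 'normal' when its frame
    abnormality score is below normal_threshold, and its cross entropy is the
    binary cross-entropy against the label 'abnormal' (1), i.e. -log(p).
    """
    eps = 1e-7
    kept = []
    for i, (p, blur) in enumerate(zip(frame_scores, blurry)):
        if blur or p < normal_threshold:
            continue  # exclude blurry frames and frames judged normal
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        cross_entropy = -math.log(p)
        if cross_entropy <= ce_threshold:
            kept.append(i)
    return kept
```

Under this reading, the cross-entropy cut is equivalent to a stricter probability cut (-log p ≤ 0.5 means p ≥ e^-0.5 ≈ 0.61). The actual criterion in the patent may differ.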

The predetermined object may include at least one of a gun and a knife.

According to another aspect of the present invention, there is provided a video synopsis generation apparatus that receives a plurality of videos and extracts, from the plurality of videos, abnormal frames containing a predetermined object to generate a video synopsis. The apparatus includes a memory that stores program instructions, and a processor that is communicatively connected to the memory and executes the program instructions stored in the memory. When executed by the processor, the program instructions cause the processor to perform: for each of the plurality of videos, extracting the abnormal frames from input video frames; generating a summary video by combining the abnormal frames; and generating the video synopsis by combining the summary videos of the plurality of videos. The instructions for extracting the abnormal frames may include instructions for identifying blurry frames among the input video frames according to a predetermined criterion. Accordingly, blurry frames are not included in the summary video or in the video synopsis.

The instructions for identifying blurry frames may include instructions for computing a Laplacian variance by convolving the input video frame with a predetermined kernel, so that blurry frames can be identified by comparing the Laplacian variance with a predetermined threshold.

The instructions for extracting the abnormal frames may extract the abnormal frames using a predetermined artificial neural network model. The artificial neural network model may be a YOLO model.

The instructions for extracting the abnormal frames may include instructions for converting the color system of the input video frame into the CIELUV color space. In this case, the artificial neural network may extract the abnormal frames based on the color-space-converted video frames.

The instructions for extracting the abnormal frames may include instructions for: extracting the predetermined object from the input video frame and calculating an abnormality score indicating the probability that the predetermined object is actually an abnormal object; and determining a frame abnormality score for the input video frame based on the abnormality score of the object.

The summary video may be generated by excluding two kinds of input video frames: frames judged to be normal on the basis of the frame abnormality score, and frames identified as blurry. In this case, the instructions for generating the summary video may include instructions for: determining a cross entropy for the input video frames; and determining the frames to be included in the summary video according to the cross entropy.

According to an embodiment of the present invention, dangerous situations can be detected quickly, even under various congested conditions, and a video synopsis can be generated in real time. In particular, the synopsis generation method and apparatus according to the present invention can be applied to a multi-view video surveillance system composed of multiple cameras.

FIG. 1 is a schematic block diagram showing the overall configuration of a video surveillance system according to an embodiment of the present invention.
FIG. 2 is a flowchart illustrating a process of generating a video synopsis according to an embodiment of the present invention.
FIG. 3 is a block diagram of a frame selector module according to an embodiment of the present invention.
FIG. 4 shows examples of colors represented in the CIELUV color space.
FIG. 5 shows frame abnormality scores visualized in color for video frames input from a plurality of cameras.
FIG. 6 shows differences in frame sharpness according to the Laplacian variance.
FIG. 7 shows an example of pseudo-code for a process of generating an abnormal summary video.
FIG. 8 shows an example of generating a video synopsis for stored multi-view video.
FIG. 9 shows an example of generating a video synopsis for live camera video.
FIG. 10 is a block diagram of an apparatus for generating a video synopsis according to an embodiment of the present invention.
FIG. 11 is a table summarizing information on the test videos used to train the object detection model in the experiments of the present invention.
FIG. 12 is a table summarizing the average precision of the object detection model according to the present invention and of other state-of-the-art detection models in experiments detecting guns and knives.
FIG. 12 is a table summarizing quantitative evaluation metrics for the detection accuracy of the object detection model according to the present invention and of other models.
FIG. 13 is a graph of the quantitative evaluation metrics summarized in FIG. 12.
FIG. 14 is a graph comparing the precision of summary-video generation between the video synopsis generation method according to the present invention and other algorithms.
FIG. 15 is a table comparing the video synopsis generation results of the method according to the present invention with those of other algorithms.

Since the present invention may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the present invention to the specific embodiments. It should be understood to include all modifications, equivalents, and substitutes falling within the spirit and scope of the present invention. Like reference numerals are used for like elements throughout the description of the drawings.

Terms such as first, second, A, and B may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, a first component may be termed a second component, and similarly a second component may be termed a first component, without departing from the scope of the present invention. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

When a component is referred to as being "connected" or "coupled" to another component, it may be directly connected or coupled to that other component, but it should be understood that other components may be present in between. In contrast, when a component is referred to as being "directly connected" or "directly coupled" to another component, it should be understood that no other component is present in between.

The terms used in this application are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this application, terms such as "include" or "have" indicate the presence of the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification. They should be understood not to preclude in advance the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art. Unless explicitly defined in this application, they should not be interpreted in an idealized or overly formal sense.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. To facilitate overall understanding, the same reference numerals are used for the same components in the drawings, and duplicate descriptions of the same components are omitted.

FIG. 1 is a schematic block diagram showing the overall configuration of a video surveillance system according to an embodiment of the present invention. The illustrated video surveillance system includes a plurality of surveillance cameras 10a to 10n and a monitoring and control server 20. The cameras are distributed over a monitored area to acquire surveillance video, and each can connect to the server. In one embodiment, the surveillance cameras 10a to 10n are connected to the monitoring and control server 20 through an IP network based on a wired and/or wireless network. As an example of a wired connection, the surveillance cameras 10a to 10n may also be connected to the monitoring and control server 20 through a USB cable.

Each surveillance camera 10a to 10n can capture the surrounding area at an appropriate zoom magnification while panning and tilting in response to control signals from the monitoring and control server 20. The monitoring and control server 20 can detect moving objects or abnormal objects in the video from each surveillance camera 10a to 10n and control the pan, tilt, and zoom of the cameras 10a to 10n as needed. The monitoring and control server 20 can also construct a video synopsis by extracting video frames containing abnormal objects from the video acquired by the surveillance cameras 10a to 10n. In this specification, including the claims, an "abnormal object" refers to an object that can harm others or cause a dangerous situation, such as a gun, a knife, or a person holding a gun or knife. However, the types of abnormal objects are not limited thereto. For example, in another embodiment of the present invention, an abnormal object is...

The monitoring and control server 20 can present the video synopsis to an operator through a display device and can also store it in a storage device. As described later, the monitoring and control server 20 includes at least one processor. The server performs functions such as detecting abnormal objects, generating the video synopsis, displaying video, and storing video when the at least one processor executes program code.

The monitoring and control server 20 may be implemented, for example, within a cloud server. In most camera networks, the video feed generated by each camera is stored on a cloud server, which wastes resources and bandwidth. To solve this problem, the monitoring and control server 20 of the present invention may be implemented directly on an edge/cloud computing device. As a result, the monitoring and control server 20 may store only the synopsis video rather than the entire original video acquired by each surveillance camera 10a to 10n.

FIG. 2 is a flowchart illustrating a process of generating a video synopsis according to an embodiment of the present invention. The process includes five steps:
- training an object detection model for detecting abnormal situations (step 100);
- checking the connection of a plurality of cameras (step 110);
- detecting and extracting abnormal situations using the trained object detection model (step 120);
- generating a summary of the abnormal frames (step 130);
- generating an abnormal-situation video synopsis for the plurality of cameras (step 140).
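The online part of this flow (steps 120 to 140) can be outlined in Python. This is a hypothetical skeleton, not the patent's code. The abnormality and blur predicates are passed in as callables standing in for the trained detector (step 100) and the blur check. The videos are assumed to be keyed by the camera serial numbers assigned in step 110. Ordering the synopsis by serial number is also an assumption.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple


def summarize_camera(frames: Sequence[Any],
                     is_abnormal: Callable[[Any], bool],
                     is_blurry: Callable[[Any], bool]) -> List[int]:
    """Steps 120-130: indices of frames that are sharp and judged abnormal, in time order."""
    return [i for i, frame in enumerate(frames)
            if not is_blurry(frame) and is_abnormal(frame)]


def build_synopsis(videos: Dict[int, Sequence[Any]],
                   is_abnormal: Callable[[Any], bool],
                   is_blurry: Callable[[Any], bool]) -> List[Tuple[int, int]]:
    """Step 140: concatenate per-camera summaries into one synopsis.

    Each entry is (camera serial number, frame index). Ordering cameras by
    serial number is an assumption made for this sketch.
    """
    synopsis: List[Tuple[int, int]] = []
    for serial in sorted(videos):
        for idx in summarize_camera(videos[serial], is_abnormal, is_blurry):
            synopsis.append((serial, idx))
    return synopsis
```

For example, with per-frame scores standing in for frames, `build_synopsis({2: [0.9, 0.1, 0.8], 1: [0.2, 0.7]}, lambda s: s > 0.5, lambda s: False)` keeps frame 1 of camera 1 and frames 0 and 2 of camera 2.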

As mentioned above, the present invention determines whether a video frame represents an abnormal situation by detecting whether the frame contains an abnormal object such as a gun or a knife. In an exemplary embodiment of the present invention, YOLOv3 is used as the object detection model for detecting abnormal objects. It is implemented on Keras, an open-source neural network library written in Python. The object detection model according to this embodiment can therefore be called the "Keras-based abnormal frame extraction YOLO model" (KB-AF-E-YM). However, the object detection model of the present invention is not limited to YOLOv3. Other YOLO models such as YOLOv2 or YOLOv5 may be used, as may models other than YOLO.

Step 100 of FIG. 2 is an offline preprocessing step in which the object detection model is trained. The model may be trained so that, given an input frame, it detects an abnormal object in the frame. It also outputs, as accurately as possible, an abnormality score (abnormal score) indicating the probability that the object is actually an abnormal object.

Training the model requires a large amount of computation on a very large data set. In one embodiment, a larger custom data set was built from several sources: the ImageNet data set, the Open-Image data set, the Internet Movie Firearms Database (IMFD), and images provided by Gettyimages. Data augmentation techniques such as scaling, flipping, and shearing were then applied to these images to enlarge the data, yielding a total of 432,427 training images. After each image was annotated, YOLOv3 weights were pre-trained using a Keras-based framework. This produced updated weights with which the object detection model can be reconstructed.
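For detection data, augmentation must transform the bounding-box annotations together with the pixels. A minimal NumPy sketch of two of the named operations follows, as an illustration only. It covers horizontal flipping and nearest-neighbour scaling; shearing is omitted for brevity. Boxes are assumed to be stored as (x_min, y_min, x_max, y_max) pixel coordinates, and the function names are hypothetical.

```python
import numpy as np


def hflip_with_boxes(image: np.ndarray, boxes: np.ndarray):
    """Flip an H x W x C image left-right and mirror its (x_min, y_min, x_max, y_max) boxes."""
    w = image.shape[1]
    flipped = image[:, ::-1].copy()
    out = boxes.astype(np.float64).copy()
    out[:, 0] = w - boxes[:, 2]  # new x_min comes from old x_max
    out[:, 2] = w - boxes[:, 0]  # new x_max comes from old x_min
    return flipped, out


def scale_with_boxes(image: np.ndarray, boxes: np.ndarray, factor: float):
    """Nearest-neighbour rescale of the image; box coordinates scale by the same ratios."""
    h, w = image.shape[:2]
    new_h = max(1, int(round(h * factor)))
    new_w = max(1, int(round(w * factor)))
    sx, sy = new_w / w, new_h / h
    rows = np.minimum((np.arange(new_h) / sy).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / sx).astype(int), w - 1)
    scaled = image[rows][:, cols]
    out = boxes.astype(np.float64).copy()
    out[:, [0, 2]] *= sx
    out[:, [1, 3]] *= sy
    return scaled, out
```

In practice a library augmentation pipeline would typically be used. The point of the sketch is only that every geometric transform applied to an image must be applied to its annotations as well.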

In step 110, the server that generates the video synopsis, for example the monitoring and control server 20, may check the connection state of the plurality of cameras 10a to 10n. Checking the connection state may mean establishing new connections to the cameras 10a to 10n or checking the list of connected cameras. It may also mean selecting, from among the already connected cameras 10a to 10n, the cameras whose original video will be used to generate the video synopsis. The cameras may be connected to the monitoring and control server 20 over a network such as a wired or wireless LAN, or through a USB cable. Each camera may be assigned a serial number to be used when the video synopsis is generated.

In step 120, abnormal situations are detected and extracted using the object detection model trained in step 100. FIG. 3 is a block diagram of the frame selector module used to detect and extract abnormal situations. The frame selector module 122 includes a frame extraction unit 124, the trained object detection model 126 (KB-AF-E-YM), and a frame selection unit 128.

The frame extraction unit 124 may convert the color system of the video frames input from the cameras 10a to 10n into the CIELUV color space. Using absolute difference values in the CIELUV color space, it extracts frames whose local luminance level differs from that of the previous frame by more than a certain amount. As a preprocessing operation, the frame extraction unit 124 computes, for each received input video frame, the absolute difference from the previous frame in the CIELUV color space. This is done per pixel or per predetermined image block. It then extracts frames for which the local or global sum of the absolute differences is large.
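A minimal NumPy sketch of this block-wise change test follows. It assumes the frames have already been converted to CIELUV as H × W × 3 float arrays. It uses the "local" variant described above: a frame is selected when its largest per-block sum of absolute differences exceeds a threshold. The 16-pixel block size, the threshold of 1000, the function names, and the rule that the first frame is never selected are all assumptions.

```python
import numpy as np


def block_abs_diff(prev_luv: np.ndarray, cur_luv: np.ndarray, block: int = 16) -> np.ndarray:
    """Sum of absolute CIELUV differences per block x block tile (partial edge tiles are dropped)."""
    diff = np.abs(cur_luv.astype(np.float64) - prev_luv.astype(np.float64)).sum(axis=2)
    h, w = diff.shape
    hb, wb = h // block, w // block
    tiles = diff[:hb * block, :wb * block].reshape(hb, block, wb, block)
    return tiles.sum(axis=(1, 3))


def select_changed_frames(luv_frames, block: int = 16, threshold: float = 1000.0):
    """Return indices of frames whose largest per-block change vs. the previous frame exceeds threshold."""
    selected = []
    for i in range(1, len(luv_frames)):
        if block_abs_diff(luv_frames[i - 1], luv_frames[i], block).max() > threshold:
            selected.append(i)
    return selected
```

Taking the maximum over blocks favors small, localized changes (for example, a knife entering one region of the scene) over a global sum, which can be dominated by scene-wide lighting shifts. The global variant would use `.sum()` instead of `.max()`.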

FIG. 4 shows examples of colors represented in the CIELUV color space. In the CIELUV color space, the luminance L takes values in the range 0 to 100. The u and v coordinates, which take values of roughly -100 to 100, indicate position along the green/red and blue/yellow axes. CIELUV is a color system that represents luminance in finer detail by taking the characteristics of human visual perception into account. Converting to it improves the feature quality of the video, allowing abnormal objects to be detected better. It also makes measuring minimum color contrast efficient, which is a considerable advantage when detecting small objects.
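The color-space conversion itself can be written out explicitly. The NumPy sketch below uses the standard sRGB (D65) to XYZ to CIELUV formulas, assuming input pixels are sRGB values in [0, 1]. The patent does not specify the conversion path. In practice a library routine such as OpenCV's `cvtColor` with a Luv conversion code would normally be used instead.

```python
import numpy as np

# sRGB (D65) -> CIE XYZ matrix and D65 reference white.
_M = np.array([[0.4124564, 0.3575761, 0.1804375],
               [0.2126729, 0.7151522, 0.0721750],
               [0.0193339, 0.1191920, 0.9503041]])
_WHITE = np.array([0.95047, 1.0, 1.08883])


def _uv_prime(x, y, z):
    """Chromaticity coordinates u', v'; defined as 0 where X + 15Y + 3Z == 0 (pure black)."""
    denom = x + 15.0 * y + 3.0 * z
    safe = np.where(denom == 0, 1.0, denom)
    u = np.where(denom == 0, 0.0, 4.0 * x / safe)
    v = np.where(denom == 0, 0.0, 9.0 * y / safe)
    return u, v


def srgb_to_luv(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 sRGB image with values in [0, 1] to CIELUV (L*, u*, v*)."""
    c = rgb.astype(np.float64)
    lin = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)  # undo sRGB gamma
    xyz = lin @ _M.T
    x, y, z = xyz[..., 0], xyz[..., 1], xyz[..., 2]
    yr = y / _WHITE[1]
    L = np.where(yr > (6 / 29) ** 3, 116.0 * np.cbrt(yr) - 16.0, (29 / 3) ** 3 * yr)
    u_p, v_p = _uv_prime(x, y, z)
    un, vn = _uv_prime(*_WHITE)
    return np.stack([L, 13.0 * L * (u_p - un), 13.0 * L * (v_p - vn)], axis=-1)
```

White maps to approximately (100, 0, 0) and black to (0, 0, 0). Note that for saturated colors u* and v* can fall somewhat outside ±100; pure sRGB red, for example, has u* above 100.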

According to this embodiment, only frames whose amount of change exceeds a certain criterion are input to the object detection model 126. The object detection model 126 therefore searches for abnormal objects only in frames with large changes. In a modified embodiment, however, the frame extraction unit 124 may be omitted and all video frames may be supplied to the object detection model 126. Alternatively, only one of the two functions of the frame extraction unit 124 (color space conversion or frame extraction) may be omitted.

The object detection model 126 receives image frames from the frame extractor 124 and detects abnormal objects in them. As mentioned above, this detection may be performed by a YOLO model named KB-AF-E-YM. The YOLO model extracts a feature map through a series of convolutional layers, then uses a fully connected layer to determine bounding boxes and predict class scores. Specifically, it divides each frame into an S×S grid and computes object class scores for every grid cell to determine whether the cell contains an abnormal object. Merging neighboring grid cells and adjusting bounding box positions raises the confidence of each cell's prediction, which improves both object recognition accuracy and the reliability of the class scores.
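The grid assignment described above follows the usual YOLO convention, in which the cell containing a box's center is responsible for predicting that box and the center is encoded as an offset within the cell. The sketch below shows only this bookkeeping; the description does not detail the internals of KB-AF-E-YM, and the function names are hypothetical.

```python
from typing import Tuple


def responsible_cell(cx: float, cy: float, img_w: int, img_h: int, s: int) -> Tuple[int, int]:
    """(row, col) of the S x S grid cell containing the box center (cx, cy), given in pixels."""
    col = min(int(cx / img_w * s), s - 1)  # clamp so a center on the right edge stays in the grid
    row = min(int(cy / img_h * s), s - 1)  # clamp so a center on the bottom edge stays in the grid
    return row, col


def cell_offsets(cx: float, cy: float, img_w: int, img_h: int, s: int) -> Tuple[float, float]:
    """Box center expressed as (x, y) offsets in [0, 1] relative to its responsible cell."""
    row, col = responsible_cell(cx, cy, img_w, img_h, s)
    return cx / img_w * s - col, cy / img_h * s - row
```

For example, with S = 7 on a 448 x 448 frame, a box centered at pixel (100, 300) is assigned to row 4, column 1.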

The object detection model 126 may output the detected abnormal objects together with their object class scores. In one embodiment, an abnormal object is represented by the position and size of its bounding box. The object class score may be an abnormal score indicating the probability that the object really is an abnormal object. The abnormal score takes a value between 0 and 1 and may be attached to the image frame or bounding box as an annotation; a value closer to 1 indicates a higher probability of abnormality.

A frame's abnormal score may be computed from the abnormal scores of its objects; for example, the highest abnormal score among the objects detected in a frame may be taken as the frame's abnormal score. Each frame is then classified as normal or abnormal according to its frame abnormality score α. If α = 0, the frame being processed contains no object of interest and is labeled a normal frame (NF); if α = 1, an object of interest is present and the frame is labeled an abnormal frame (ANF). In this way, KB-AF-E-YM selects objects in each frame and annotates the frame. The annotated frame information is passed to the next stage as a sequence of matrices (MA). The annotation contains information about the abnormal objects together with the frame abnormality score and/or whether the frame is normal or abnormal.
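The max rule and the NF/ANF labeling above can be sketched as a small annotation record. The class and field names are hypothetical and the bounding box layout is an assumption; the 0.5 cut-off used here to turn the continuous score into the binary label is borrowed from the summary-generation example later in this description rather than stated at this point.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Detection:
    bbox: Tuple[int, int, int, int]  # (x, y, width, height) in pixels; assumed layout
    label: str                       # e.g. "gun" or "knife"
    score: float                     # abnormal score in [0, 1]


@dataclass
class AnnotatedFrame:
    index: int
    detections: List[Detection] = field(default_factory=list)

    @property
    def frame_score(self) -> float:
        """Frame abnormality score: the highest object score, or 0.0 if nothing was detected."""
        return max((d.score for d in self.detections), default=0.0)

    def status(self, threshold: float = 0.5) -> str:
        """'ANF' (abnormal frame) if the frame score reaches the threshold, else 'NF' (normal frame)."""
        return "ANF" if self.frame_score >= threshold else "NF"
```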

FIG. 5 visualizes, in color, the frame abnormality scores of image frames received from multiple cameras. FIG. 5(a) shows the sequential frame series and its attribute, namely the frame abnormality score, for six frames from each of eight cameras. In the figure, purple marks abnormal frames and all other colors mark normal frames. A downward arrow indicates an abnormal frame and an upward arrow indicates a normal frame.

The frame selector 128 discards blurry frames among those received from the object detection model 126 and keeps only sharp ones, so that a smooth synopsis can be built from high-quality frames alone. In one embodiment, the frame selector 128 applies a Laplacian kernel to each image frame and computes the Laplacian variance, which is then used to distinguish blurry frames from sharp ones. Equation 1 shows an example of a Laplacian kernel in the form of a 3x3 matrix. The frame selector 128 computes the Laplacian variance by performing a convolution while sliding the kernel over the frame horizontally and vertically by a fixed stride.
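A minimal sketch of the Laplacian variance measure, in plain Python on a grayscale image given as nested lists. Equation 1 is not reproduced in this text, so the common 4-neighbor kernel below is an assumption, as are the function name and the choice of no padding. OpenCV users often compute a closely related value with `cv2.Laplacian(gray, cv2.CV_64F).var()`, which uses the same 3x3 kernel by default but also filters the border pixels.

```python
from typing import Sequence

# Common 4-neighbor 3x3 Laplacian. Equation 1 is not reproduced here, so this exact kernel is an assumption.
LAPLACIAN_3X3 = ((0, 1, 0), (1, -4, 1), (0, 1, 0))


def laplacian_variance(gray: Sequence[Sequence[float]], kernel=LAPLACIAN_3X3,
                       stride: int = 1) -> float:
    """Slide the kernel over a grayscale image (no padding) and return the variance of the responses.

    The kernel is symmetric, so correlation and convolution give the same result here.
    """
    k = len(kernel)
    height, width = len(gray), len(gray[0])
    if height < k or width < k:
        raise ValueError("image is smaller than the kernel")
    responses = []
    for y in range(0, height - k + 1, stride):
        for x in range(0, width - k + 1, stride):
            responses.append(sum(kernel[i][j] * gray[y + i][x + j]
                                 for i in range(k) for j in range(k)))
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)
```

A flat image has no edges and yields a variance of 0, while an image full of sharp transitions yields a very large variance; blur suppresses those transitions and pulls the value down.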

If the Laplacian variance is below a given threshold, the frame is regarded as blurry and may be discarded or flagged as such in its annotation; if it is above the threshold, the frame is regarded as sharp. FIG. 6 shows how frame sharpness varies with the Laplacian variance. In the inventors' experiments, frames with a Laplacian variance between 0 and 159 had the worst image quality, frames between 160 and 349 had average quality, and frames at 350 or above had excellent quality. The threshold must therefore be chosen so that poor-quality frames are rejected without discarding abnormal frames, which carry the important information. The appropriate value depends on the application; in this embodiment, the Laplacian variance threshold was set to 160.
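The thresholding rule and the three reported quality bands can be written down directly. This sketch treats the integer bands as half-open intervals, [0, 160), [160, 350), and 350 and above, so that fractional variances such as 159.5 land in a definite band, and it treats a value exactly at the threshold as sharp; both interpretations and the function names are assumptions.

```python
BLUR_THRESHOLD = 160.0  # threshold used in this embodiment


def is_blurry(laplacian_var: float, threshold: float = BLUR_THRESHOLD) -> bool:
    """Below the threshold counts as blurry; a value exactly at the threshold is treated as sharp (an assumption)."""
    return laplacian_var < threshold


def quality_band(laplacian_var: float) -> str:
    """Map a Laplacian variance onto the three quality bands reported by the inventors."""
    if laplacian_var < 160.0:
        return "worst"      # reported band (0, 159)
    if laplacian_var < 350.0:
        return "average"    # reported band (160, 349)
    return "excellent"      # reported band: 350 and above
```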

Referring again to FIG. 2, in step 130, abnormal frames are collected from among the image frames output by the frame selector module 122 to produce an abnormal summary video. Step 130 is performed separately for each input video from the surveillance cameras 10a to 10n, so that a summary is produced for every camera.

In one embodiment, the abnormal summary video may be produced by selecting abnormal frames based on their frame abnormality scores, excluding frames whose image quality falls below the criterion, and concatenating the remaining frames. For example, frames with a frame abnormality score below 0.5 may be excluded so that the summary consists only of frames scoring 0.5 or higher.
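For one camera, step 130 then reduces to a filter that preserves temporal order. In this sketch the two cut-offs are the example values given in this description (0.5 for the frame abnormality score and 160 for the Laplacian variance); the record type and function name are hypothetical, and encoding the kept frames back into a video file is left out.

```python
from typing import List, NamedTuple, Sequence


class ScoredFrame(NamedTuple):
    index: int             # position in the camera's input video
    abnormal_score: float  # frame abnormality score in [0, 1]
    laplacian_var: float   # sharpness measure from the frame selector


def build_summary(frames: Sequence[ScoredFrame], score_threshold: float = 0.5,
                  blur_threshold: float = 160.0) -> List[ScoredFrame]:
    """Keep frames that are abnormal and sharp, in their original temporal order."""
    return [f for f in frames
            if f.abnormal_score >= score_threshold and f.laplacian_var >= blur_threshold]
```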

FIG. 7 shows an example of pseudo-code for producing the abnormal summary video. The object detection model 126, i.e., KB-AF-E-YM, computes the abnormal scores of the objects in a frame and, from them, determines the frame's abnormal score; an attention model carries this out concurrently in the background, which makes real-time synopsis generation possible.

In step 140 of FIG. 2, the abnormal summaries of the individual camera videos are stitched together in the order of the sequence numbers assigned in advance to each camera. That is, as shown in FIGS. 8 and 9, the summarized abnormal frames of each camera video are concatenated one after another. FIG. 8 shows an example of video synopsis generation for stored multi-view video, and FIG. 9 shows an example for live camera video.
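The ordering rule of step 140 amounts to concatenating the per-camera summaries sorted by camera sequence number. A minimal sketch, assuming each summary is already a list of frames keyed by its camera's sequence number; tagging every frame with its camera number is an illustrative addition, and the function name is hypothetical. The jitter and ghosting countermeasures discussed next are not modeled.

```python
from typing import Dict, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def build_synopsis(summaries: Dict[int, Sequence[Frame]]) -> List[Tuple[int, Frame]]:
    """Concatenate per-camera summaries in ascending camera sequence number.

    Each output item is tagged with its camera sequence number so the source view stays traceable.
    """
    synopsis: List[Tuple[int, Frame]] = []
    for camera_seq in sorted(summaries):
        synopsis.extend((camera_seq, frame) for frame in summaries[camera_seq])
    return synopsis
```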

Jitter, ghosting, or shaking may appear in the video while summaries are generated for each camera in step 130, or while the per-camera summaries are stitched into a synopsis in step 140. To address this, stitching is preferably carried out as a pipeline that computes pixel feature points and then registers and blends the images.
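Of the pipeline steps named above, blending is the easiest to illustrate in isolation. The sketch below shows linear (feather) blending of two grayscale images that are assumed to be already registered, so that the last `overlap` columns of one cover the same scene as the first `overlap` columns of the other. Feature-point detection and registration are omitted, the description does not say which blending method is used, and linear feathering is simply one common choice; the function name is hypothetical.

```python
from typing import List, Sequence


def feather_blend(left: Sequence[Sequence[float]], right: Sequence[Sequence[float]],
                  overlap: int) -> List[List[float]]:
    """Join two row-aligned grayscale images whose last/first `overlap` columns show the same region.

    Inside the overlap the weight of the right image ramps linearly from 0 towards 1,
    which hides the seam instead of switching abruptly from one image to the other.
    """
    if len(left) != len(right):
        raise ValueError("images must have the same number of rows")
    blended = []
    for lrow, rrow in zip(left, right):
        start = len(lrow) - overlap        # first overlapping column in the left image
        row = list(lrow[:start])
        for j in range(overlap):
            w = (j + 1) / (overlap + 1)    # right-image weight, strictly between 0 and 1
            row.append((1.0 - w) * lrow[start + j] + w * rrow[j])
        row.extend(rrow[overlap:])
        blended.append(row)
    return blended
```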

FIG. 10 is a block diagram of an apparatus for generating a video synopsis according to an embodiment of the present invention. The apparatus may include at least one processor 220, a memory 240, and a storage device 260. As mentioned above, the apparatus may be implemented by the monitoring and control server 20 of the video surveillance system shown in FIG. 1. In that case, the monitoring and control server 20 receives the surveillance video captured by each of the surveillance cameras 10a to 10n, detects abnormal frames containing abnormal objects in that video, and generates the video synopsis.

The processor 220 may execute program instructions stored in the memory 240 and/or the storage device 260. The processor 220 may be implemented by at least one central processing unit (CPU) or graphics processing unit (GPU), or by any other processor capable of performing the method according to the present invention.

The memory 240 may include volatile memory such as RAM (Random Access Memory) and non-volatile memory such as ROM (Read Only Memory). The memory 240 may load program instructions stored in the storage device 260 and provide them to the processor 220 for execution.

The storage device 260 is a recording medium suitable for storing program instructions and data. Examples include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM (Compact Disk Read Only Memory) and DVD (Digital Video Disk); magneto-optical media such as floptical disks; and semiconductor memory such as flash memory, EPROM (Erasable Programmable ROM), or SSDs built on these technologies.

The storage device 260 stores the program instructions. In particular, the program instructions may include a video synopsis generation program that implements the video synopsis generation method according to the present invention. This program contains the instructions needed to perform: extracting the abnormal frames from the input image frames of each of the plurality of videos; combining the abnormal frames into a summary video; and combining the summary videos of all the videos into the video synopsis. The instructions for extracting the abnormal frames may include instructions for distinguishing blurry frames among the input image frames according to a predetermined criterion, so that blurry frames are excluded from both the summary videos and the video synopsis. Once loaded into the memory 240 under the control of the processor 220, these program instructions may be executed by the processor 220 to implement the method according to the present invention.

Experiments were conducted to evaluate the performance of the video synopsis generation method according to the present invention, and the results were compared with other recent studies. All evaluations used an AMD Ryzen 5 3500X 6-core processor clocked at 3.59 GHz, 16 GB of RAM, a Gigabyte NVIDIA GeForce RTX 2060 graphics card, and a Logitech C920 Pro HD webcam.

The object detection model 126, i.e., KB-AF-E-YM, was trained with TensorFlow, with additional conversion performed through the Keras API. Images from the ImageNet, IMFD, and OpenImage datasets were used to train KB-AF-E-YM for object detection.

Video synopsis performance was tested on videos taken from YouTube, IMFD, and live cameras. Of the six videos, videos 1, 5, and 6 came from YouTube and contained 2, 1, and 2 abnormal objects, respectively. Video 2 came from IMFD and contained 3 abnormal objects. Videos 3 and 4 were captured from a live camera, and each contained 1 abnormal object. Details of each video are summarized in the table of FIG. 11.

FIG. 12 summarizes the average precision of the object detection model according to the present invention and of other recent detection models in an experiment detecting guns and knives. In this two-class experiment, the KB-AF-E-YM model according to the present invention reached a mean average precision (mAP) of 96.54%, outperforming the other recent detection models.

The KB-AF-E-YM model according to the present invention was also compared with three existing algorithms, the Gaussian Mixture Model (GMM) [R1], LOB-STERBGS [R2], and the background subtraction method [R3], in terms of precision, recall, F1, similarity, and PCC. The results are summarized in the table of FIG. 13 and plotted in the graph of FIG. 14. The KB-AF-E-YM model performed best overall on every metric.

In FIGS. 13 and 14, precision is the proportion of items the model classified as positive that are actually positive. Recall is the proportion of actual positives that the model predicted as positive. F1 is the harmonic mean of precision and recall. Similarity is the number of correctly detected abnormal objects divided by the size of the union of actual and predicted abnormal objects, i.e., TruePositive/(TruePositive+FalseNegative+FalsePositive).
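The four definitions can be checked with a few lines of code. A sketch with a hypothetical function name; PCC is not defined in this description, so it is omitted. The similarity defined above is the Jaccard index of the predicted and actual positive sets.

```python
from typing import Dict


def detection_metrics(tp: int, fp: int, fn: int) -> Dict[str, float]:
    """Precision, recall, F1 and similarity from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Similarity (Jaccard index): TP / (TP + FN + FP).
    similarity = tp / (tp + fn + fp) if tp + fn + fp else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "similarity": similarity}
```

With 6 true positives, no false positives, and 6 false negatives, precision is 1.0 and recall 0.5, giving F1 = 2/3 (an arithmetic mean would give 0.75) and similarity = 0.5.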

The precision of the summary videos was also compared with that of other techniques. Video summarization results generally vary with environmental conditions such as changing illumination and congestion; in this comparison, accuracy was measured by the ability to discard non-useful information, using the same evaluation metrics as above. A comprehensive comparison with other techniques is shown in FIG. 14. The synopsis generation method of the present invention performed no worse than the other techniques, but because it summarizes only abnormal frames, the difference in precision was not significant.

Finally, the synopses produced by the method of the present invention were compared with those produced by other algorithms.

To reflect realistic indoor and outdoor threat scenarios, six online and offline videos, each longer than 3,500 frames, were selected. The method of the present invention was compared with the Time Equidistant Algorithm (TEA) [T1], the Constant Equidistant Algorithm (CEA) [T2], and SeDiM [T3]. FIG. 15 is a table comparing the video synopses produced by the method of the present invention with those produced by these algorithms. The experiments showed that CEA outperformed TEA, although the synopses produced by both showed more distortion than those of SeDiM. The method of the present invention performed comparatively better than T1, T2, and T3.

Although the baseline algorithms generate synopses for a single camera view, whereas the method of the present invention generates a combined synopsis for multi-view cameras, the method of the present invention outperformed the latest synopsis generation techniques in recall, F1, and similarity. Its summarization time was also shorter than that of the other methods, allowing video to be analyzed more quickly.

As noted above, the apparatus and method according to embodiments of the present invention can be implemented as program instructions or code that are stored on a computer-readable recording medium and read and executed by a computer. A computer-readable recording medium includes any type of recording device that stores data readable by a computer system. Computer-readable recording media may also be distributed over networked computer systems so that computer-readable programs or code are stored and executed in a distributed manner.

The computer-readable recording medium may include hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Program instructions may include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like.

Although some aspects of the present invention have been described in the context of an apparatus, they may also represent a description of the corresponding method, where a block or apparatus corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method may also be represented as a corresponding block, item, or feature of a corresponding apparatus. Some or all of the method steps may be performed by, or using, a hardware device such as a microprocessor, a programmable computer, or an electronic circuit. In some embodiments, one or more of the most important method steps may be performed by such a device.

In embodiments, a programmable logic device (e.g., a field-programmable gate array) may be used to perform some or all of the functions of the methods described herein. In embodiments, a field-programmable gate array may operate together with a microprocessor to perform one of the methods described herein. In general, the methods are preferably performed by a hardware device.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the present invention can be variously modified and changed without departing from the spirit and scope of the present invention as set forth in the following claims.

Claims

A method of receiving a plurality of videos obtained through a plurality of cameras, respectively, extracting abnormal frames containing a predetermined abnormal object from the plurality of videos, and generating an image synopsis including the abnormal frames, the method comprising:
extracting the abnormal frames from input image frames of each of the plurality of videos, and generating a summary image including the extracted abnormal frames; and
generating the video synopsis in the form of a single frame sequence including the abnormal frames contained in the summary image of each of the plurality of videos;
wherein extracting the abnormal frames and generating the summary image comprises:
extracting the predetermined object from the input image frame and calculating an abnormality score representing a probability that the extracted object is actually the abnormal object;
determining a frame abnormality score for the input image frame based on the abnormality score of the extracted object; and
distinguishing blurry frames among the input image frames based on the frame abnormality score; thereby preventing frames determined to be normal frames and frames classified as blurry frames from being included in the summary image and the image synopsis.

The method according to claim 1, wherein distinguishing the blurry frames comprises
calculating a Laplacian variance by convolving the input image frame with a predetermined kernel, and the blurry frames are distinguished by comparing the Laplacian variance with a predetermined threshold.

The method of claim 1, wherein, in extracting the abnormal frames, the abnormal frames are extracted using a predetermined artificial neural network model.

The method of claim 3, wherein the artificial neural network model is a YOLO model.

The method according to claim 3, wherein extracting the abnormal frames further comprises
converting the color system of the input image frame into a CIELUV color space, and the artificial neural network model extracts the abnormal frames based on the color-space-converted image frame.

delete

The method according to claim 1, wherein generating the summary image further comprises:
determining a cross entropy for the input image frame; and
determining frames to be included in the summary image according to the cross entropy.

The method of claim 1, wherein the predetermined object includes at least one of a gun and a knife.

An apparatus for receiving a plurality of videos obtained through a plurality of cameras, respectively, extracting abnormal frames containing a predetermined abnormal object from the plurality of videos, and generating an image synopsis including the abnormal frames, the apparatus comprising:
a memory storing program instructions; and a processor communicatively connected to the memory and executing the program instructions stored in the memory;
wherein the program instructions, when executed by the processor, cause the processor to perform:
an operation of extracting the abnormal frames from input image frames of each of the plurality of videos and generating a summary image including the extracted abnormal frames; and
an operation of generating the video synopsis in the form of a single frame sequence including the abnormal frames contained in the summary image of each of the plurality of videos;
wherein the instructions for extracting the abnormal frames and generating the summary image include instructions for performing:
an operation of extracting the predetermined object from the input image frame and calculating an abnormality score representing a probability that the extracted object is actually the abnormal object;
an operation of determining a frame abnormality score for the input image frame based on the abnormality score of the extracted object; and
an operation of distinguishing blurry frames among the input image frames based on the frame abnormality score; thereby preventing frames determined to be normal frames and frames classified as blurry frames from being included in the summary image and the image synopsis.

The apparatus according to claim 10, wherein the instructions for distinguishing the blurry frames
include instructions for calculating a Laplacian variance by convolving the input image frame with a predetermined kernel, and the blurry frames are distinguished by comparing the Laplacian variance with a predetermined threshold.

The apparatus of claim 10, wherein the instructions for extracting the abnormal frames extract the abnormal frames using a predetermined artificial neural network model.

The apparatus of claim 12, wherein the artificial neural network model is a YOLO model.

The apparatus according to claim 12, wherein the instructions for extracting the abnormal frames further include
instructions for converting the color system of the input image frame into a CIELUV color space, and the artificial neural network model extracts the abnormal frames based on the color-space-converted image frame.

delete

The apparatus according to claim 10, wherein the instructions for generating the summary image further include instructions for performing:
determining a cross entropy for the input image frame; and
determining frames to be included in the summary image according to the cross entropy.

The apparatus of claim 10, wherein the predetermined object includes at least one of a gun and a knife.