KR20220129728A

KR20220129728A - Algorithm for keyframe extraction from video

Info

Publication number: KR20220129728A
Application number: KR1020210034356A
Authority: KR
Inventors: 정용화; 박대희; 유승현; 손승욱; 안한세
Original assignee: 고려대학교 세종산학협력단
Priority date: 2021-03-17
Filing date: 2021-03-17
Publication date: 2022-09-26
Also published as: KR102496462B1

Abstract

Disclosed are a method and device for extracting keyframes from a video for object detection. The object detection method of the present invention comprises the steps of: obtaining an image by photographing a plurality of objects with a camera; obtaining, from the obtained image, detection boxes for detecting the objects through an object detection unit; receiving the set of obtained boxes and, through a keyframe extraction unit, obtaining and storing a matrix (S) representing the average size of the detection boxes at each pixel and a matrix (D) representing the motion positions of the detection boxes at each pixel; and correcting the ratio of the detection boxes according to the shooting angle using the matrix (S) and the matrix (D), and determining whether the objects have moved according to the corrected ratio in order to select keyframes. Objects can therefore be detected and tracked by extracting only the essential keyframe images.

Description

Algorithm for keyframe extraction from video

The present invention relates to an algorithm for extracting keyframes from a video, which excludes frames with similar pixels from the experimental data when accuracy is measured with a convolutional neural network (CNN)-based object detector.

Given the shortage of workers in pig houses (in Korea, one worker manages 2,000 pigs on average) and the high mortality rate of pigs (in Korea, about 5 million pigs die each year), there is a growing need for pig house monitoring that applies information technology to manage individual pigs closely. However, even with steadily improving CNN-based object detection technology, pigs in a crowded pig house cannot always be detected accurately, for reasons such as occlusion between pigs.

Advances in convolutional neural network technology make it possible to monitor pigs in a pig house through object detection. One prior-art object detection method for pig monitoring is YOLOv4, which obtains bounding boxes corresponding to the objects detected in an image acquired from a camera. With YOLOv4, an object detector that offers a good balance between speed and accuracy, the sizes of the currently detected boxes are taken from the object information output after detection, OpenCV is used to first store the average box size at each pixel, and keyframes are then determined from these stored values.

Although YOLOv4 performs well among detectors in terms of speed and accuracy, it is difficult to install expensive equipment in settings such as pig farms, where equipment must be installed for every camera and replaced frequently because of heavy ammonia emissions.

On the other hand, tests on a low-cost embedded board showed that detection does not run in real time (30 FPS), so it is difficult to run detection on every frame. A method of extracting only the key frames from the video is therefore required.

The technical problem addressed by the present invention is to provide a keyframe extraction algorithm that detects and tracks objects by extracting only the essential keyframe images rather than running detection on every frame. This is possible because, when it is difficult to detect every frame in real time and the objects do not actually move much, the object sizes do not change even if tracking and detection are skipped.

In one aspect, the object detection method proposed by the present invention includes: acquiring an image by photographing a plurality of objects with a camera; obtaining, from the acquired image, detection boxes for detecting the plurality of objects through an object detection unit; receiving the set of obtained boxes and, through a keyframe extraction unit, obtaining and storing a matrix (S) representing the average size of the detection boxes at each pixel and a matrix (D) representing the motion positions of the detection boxes at each pixel; and correcting the detection box ratio according to the shooting angle using the matrix (S) and the matrix (D), determining whether the objects have moved according to the corrected detection box ratio, and selecting keyframes accordingly.

The step of receiving the set of obtained boxes and obtaining and storing the matrix (S) and the matrix (D) through the keyframe extraction unit includes: receiving the set of obtained boxes and obtaining, through the keyframe extraction unit, the matrix (S) representing the average size of the detection boxes at each pixel; calculating a motion estimate for each pixel using the matrix (S); and obtaining the matrix (D) representing the motion positions of the detection boxes at each pixel by storing, at each pixel, a value indicating whether motion occurred according to the calculated motion estimate.

In the step of obtaining the matrix (S), the average of the size of a detection box in a first frame and the size of the corresponding detection box in a second frame following the first frame is calculated, and this average is set as the average detection box size for the pixel at the center point of the detection box.

The step of correcting the detection box ratio according to the shooting angle using the matrix (S) and the matrix (D), determining whether the objects have moved according to the corrected ratio, and selecting keyframes includes: correcting the detection box ratio using the matrix (S) to account for differences in detection box size caused by the shooting angle of the captured image; determining the values of the matrix (D) at each pixel according to the corrected detection box ratio; and selecting a frame as a keyframe when the number of pixels that the matrix (D) marks as motion positions is at or above a predetermined proportion of all image pixels.

According to an embodiment of the present invention, the difference between the motion estimate of a first frame, calculated using the corrected detection box ratio, and the motion estimate of the second frame following the first frame is compared with a predetermined threshold (th_pixeldiff) to determine the values of the matrix (D) representing the motion positions of the detection boxes at each pixel.

In another aspect, the object detection device proposed by the present invention includes: an image collection unit that acquires an image by photographing a plurality of objects with a camera; an object detection unit that obtains, from the acquired image, detection boxes for detecting the plurality of objects; and a keyframe extraction unit that receives the set of obtained boxes, obtains and stores a matrix (S) representing the average size of the detection boxes at each pixel and a matrix (D) representing the motion positions of the detection boxes at each pixel, corrects the detection box ratio according to the shooting angle using the matrix (S) and the matrix (D), determines whether the objects have moved according to the corrected detection box ratio, and selects keyframes accordingly.

According to embodiments of the present invention, objects can be detected and tracked by extracting only the essential keyframe images, because when it is difficult to detect every frame in real time and the objects do not actually move much, the object sizes do not change even if tracking and detection are skipped. With the proposed keyframe extraction algorithm, the method can be applied to a detector that runs at only about 4 to 15 frames per second rather than 30 frames per second.

FIG. 1 is a flowchart illustrating a method of extracting keyframes from a video according to an embodiment of the present invention.
FIG. 2 is a diagram showing cameras installed for image acquisition according to an embodiment of the present invention.
FIG. 3 is a diagram showing views of a captured image according to the shooting angle, according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a process of obtaining a matrix (S) representing the average size of the detection boxes at each pixel and a matrix (D) representing the motion positions of the detection boxes at each pixel, according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating the matrix (S) representing the average size of the detection boxes at each pixel according to an embodiment of the present invention.
FIG. 6 is a diagram showing the matrix (D) representing the motion positions of the detection boxes at each pixel according to an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a keyframe selection process according to an embodiment of the present invention.
FIG. 8 is a flowchart showing a specific example of the method of extracting keyframes from a video according to an embodiment of the present invention.
FIG. 9 is a diagram showing the configuration of a device for extracting keyframes from a video according to an embodiment of the present invention.
FIG. 10 is a diagram showing keyframes selected by determining whether motion occurred, according to an embodiment of the present invention.
FIG. 11 is a diagram showing keyframes selected by determining whether motion occurred, according to another embodiment of the present invention.
FIG. 12 is a diagram showing keyframes selected by determining whether motion occurred, according to another embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart illustrating a method of extracting keyframes from a video according to an embodiment of the present invention.

The proposed method of extracting keyframes from a video includes: acquiring an image by photographing a plurality of objects with a camera (110); obtaining, from the acquired image, detection boxes for detecting the plurality of objects through an object detection unit (120); receiving the set of obtained boxes and, through a keyframe extraction unit, obtaining and storing a matrix (S) representing the average size of the detection boxes at each pixel and a matrix (D) representing the motion positions of the detection boxes at each pixel (130); and correcting the detection box ratio according to the shooting angle using the matrix (S) and the matrix (D), determining whether the objects have moved according to the corrected detection box ratio, and selecting keyframes accordingly (140).

In step 110, an image is acquired by photographing a plurality of objects with a camera. For example, the objects may be photographed with a top-down view camera, which shoots from directly above, or with a tilted-down view camera.

To describe the proposed method and device for extracting keyframes from a video for object detection in detail, they are applied here, as an example, to detecting objects (that is, pigs) in a pig house. Object detection in a pig house is only one embodiment, and the invention is not limited to it; the proposed method and device may be applied to object detection in a variety of fields.

With recent advances in convolutional neural network (CNN) technology, deep learning techniques are being applied to a variety of image processing applications, but applying them in practice is difficult because maintaining an environment that sustains 30 FPS means frequently replacing expensive equipment. To extract frames adaptively in diverse environments, it is important to set appropriate parameters.

According to embodiments of the present invention, objects can be detected and tracked by extracting only the essential keyframe images, because when it is difficult to detect every frame in real time and the objects do not actually move much, the object sizes do not change even if tracking and detection are skipped. With the proposed keyframe extraction algorithm, the method can be applied to a detector that runs at only about 4 to 15 frames per second rather than 30 frames per second.

FIG. 2 is a diagram showing cameras installed for image acquisition according to an embodiment of the present invention.

FIG. 2(a) shows an example installation of a top-down view camera according to an embodiment of the present invention. FIG. 2(b) shows an example installation of a tilted-down view camera according to an embodiment of the present invention. Depending on the shooting angle of the installed camera, the size of the detection boxes in the captured image differs from pixel to pixel.

FIG. 3 is a diagram showing views of a captured image according to the shooting angle, according to an embodiment of the present invention.

FIG. 3(a) shows an example of an image captured with a top-down view camera according to an embodiment of the present invention. FIG. 3(b) shows an example of an image captured with a tilted-down view camera according to an embodiment of the present invention.

As shown in FIGS. 3(a) and 3(b), the size of the detection boxes in the captured image differs from pixel to pixel depending on the shooting angle of the installed camera. For example, FIG. 3(b) shows that objects in the lower part of the image and objects in the upper part differ in size. Because of this difference in ratio, the size of the detection box that detects an object also varies with the ratio. In the present invention, objects can be detected and tracked by extracting the essential keyframe images using detection boxes that reflect this difference in ratio.

In step 120, detection boxes for detecting the plurality of objects are obtained from the acquired image through the object detection unit.

In step 130, the set of obtained boxes is received and, through the keyframe extraction unit, a matrix (S) representing the average size of the detection boxes at each pixel and a matrix (D) representing the motion positions of the detection boxes at each pixel are obtained and stored.

In step 130, the set of obtained boxes is first received, and the matrix (S) representing the average size of the detection boxes at each pixel is obtained through the keyframe extraction unit.

For the matrix (S), the average of the size of a detection box in a first frame and the size of the corresponding detection box in the second frame following the first frame is calculated, and this average is set as the average detection box size for the pixel at the center point of the detection box. The average detection box size for each pixel is stored in this way as the matrix (S).

Next, a motion estimate for each pixel is calculated using the matrix (S). The matrix (D) representing the motion positions of the detection boxes at each pixel is obtained by storing, at each pixel, a value indicating whether motion occurred according to the calculated motion estimate. Step 130 is described in more detail with reference to FIG. 4.

FIG. 4 is a flowchart illustrating a process of obtaining a matrix (S) representing the average size of the detection boxes at each pixel and a matrix (D) representing the motion positions of the detection boxes at each pixel, according to an embodiment of the present invention.

According to an embodiment of the present invention, the step of receiving the set of obtained boxes and obtaining and storing the matrix (S) and the matrix (D) through the keyframe extraction unit includes: receiving the set of obtained boxes and obtaining, through the keyframe extraction unit, the matrix (S) representing the average size of the detection boxes at each pixel (410); calculating a motion estimate for each pixel using the matrix (S) (420); and obtaining the matrix (D) representing the motion positions of the detection boxes at each pixel by storing, at each pixel, a value indicating whether motion occurred according to the calculated motion estimate (430).

In step 410, the set of obtained boxes is received, and the matrix (S) representing the average size of the detection boxes at each pixel is obtained through the keyframe extraction unit.

The average of the size of a detection box in a first frame and the size of the corresponding detection box in the second frame following the first frame is calculated, and this average is set as the average detection box size for the pixel at the center point of the detection box.

For example, if the detection box in the first frame has a size of 4 and the corresponding detection box in the second frame has a size of 3, the average of the two sizes, 3.5, is set as the average detection box size for the pixel at the center point of the detection box. Through this process, the matrix (S) representing the average detection box size at each pixel of the frame is stored.
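The averaging step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the source does not define "size", so box area is assumed here; boxes are assumed to be (x, y, width, height) tuples with a top-left origin; and the center of the second-frame box is assumed to be the target pixel. The names box_area and update_size_matrix are hypothetical.

```python
def box_area(box):
    """Area of a box given as (x, y, width, height).

    Treating 'size' as area is an assumption; the source does not define it.
    """
    _, _, w, h = box
    return w * h


def update_size_matrix(S, box_first, box_second):
    """Store the mean size of two corresponding boxes at the box center pixel.

    S is a list of rows indexed as S[row][col]. Using the center of the
    second-frame box as the target pixel is an assumption.
    """
    x, y, w, h = box_second
    cx = int(x + w / 2)  # column of the center pixel
    cy = int(y + h / 2)  # row of the center pixel
    S[cy][cx] = (box_area(box_first) + box_area(box_second)) / 2
    return S


# Worked example from the text: sizes 4 and 3 average to 3.5.
S = [[0.0] * 4 for _ in range(4)]
update_size_matrix(S, (0, 0, 2, 2), (0, 0, 3, 1))
print(S[0][1])  # the center of the 3x1 box falls on row 0, column 1
```

Only the center pixel is written in this sketch; how the values spread to the other pixels of the box, as FIG. 5 suggests, is not specified in the text and is left out.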

FIG. 5 is a diagram illustrating the matrix (S) representing the average size of the detection boxes at each pixel according to an embodiment of the present invention.

FIG. 5 shows a detection box 510 that has detected an arbitrary object, within the matrix (S) representing the average size of the detection boxes at each pixel. Each pixel within the detection box 510 shows a value representing the average detection box size. For convenience of explanation, each horizontal row is assumed to hold the same value, but in practice the values may vary.

In step 420, a motion estimate for each pixel is calculated using the matrix (S) representing the average size of the detection boxes at each pixel.

According to an embodiment of the present invention, a motion estimate for determining whether an object has moved can be obtained for the detection box 510.

In step 430, the matrix (D) representing the motion positions of the detection boxes at each pixel is obtained by storing, at each pixel, a value indicating whether motion occurred according to the calculated motion estimate.

FIG. 6 is a diagram showing the matrix (D) representing the motion positions of the detection boxes at each pixel according to an embodiment of the present invention.

In the matrix (D) representing the motion positions of the detection boxes at each pixel, a 1 is stored at a pixel of the detection box 610 when the motion estimate calculated using the matrix (S) is at or above a predetermined criterion, and a 0 is stored when it is below that criterion.

According to an embodiment of the present invention, the matrix (D) serves to mark the result boxes produced when YOLOv4 detects objects. As in FIG. 6, if there is an object and a detection box 610 that detected it, the pixels of the current frame are compared with the pixels of the next frame, and where the difference is at or above a predetermined value the corresponding pixel is set to 1.
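The way the matrix (D) is filled can be illustrated as follows. This is a hedged sketch rather than the method as claimed: the per-pixel estimates are assumed to be given as two same-sized grids (lists of rows), the box is assumed to be an (x, y, width, height) tuple, and the function name motion_matrix and the sample values are hypothetical.

```python
def motion_matrix(est_first, est_second, box, threshold):
    """Return D with 1 at box pixels whose estimate changed by >= threshold.

    est_first and est_second are same-sized grids of per-pixel estimates
    for two consecutive frames; pixels outside the box stay 0.
    """
    rows, cols = len(est_first), len(est_first[0])
    D = [[0] * cols for _ in range(rows)]
    x, y, w, h = box
    # Clamp the box to the grid so partially visible boxes are handled.
    for r in range(max(y, 0), min(y + h, rows)):
        for c in range(max(x, 0), min(x + w, cols)):
            if abs(est_second[r][c] - est_first[r][c]) >= threshold:
                D[r][c] = 1
    return D


# Illustrative values only; they do not come from the source.
est_first = [[0, 0, 0], [0, 10, 10], [0, 10, 10]]
est_second = [[0, 0, 0], [0, 20, 12], [0, 10, 30]]
D = motion_matrix(est_first, est_second, box=(1, 1, 2, 2), threshold=7.5)
print(D)
```

In this example two of the four box pixels change by at least the threshold, so only those two entries of D are set to 1.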

In step 140, the detection box ratio is corrected according to the shooting angle using the matrix (S) and the matrix (D), whether the objects have moved is determined according to the corrected detection box ratio, and keyframes are selected accordingly. Step 140 is described in more detail with reference to FIG. 7.

FIG. 7 is a flowchart illustrating a keyframe selection process according to an embodiment of the present invention.

According to an embodiment of the present invention, the step of correcting the detection box ratio according to the shooting angle using the matrix (S) and the matrix (D), determining whether the objects have moved according to the corrected ratio, and selecting keyframes includes: correcting the detection box ratio using the matrix (S) to account for differences in detection box size caused by the shooting angle of the captured image (710); determining the values of the matrix (D) at each pixel according to the corrected detection box ratio (720); and selecting a frame as a keyframe when the number of pixels that the matrix (D) marks as motion positions is at or above a predetermined proportion of all image pixels (730).
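The final selection test (730) amounts to comparing the share of motion pixels with a fixed proportion. The sketch below assumes D is a grid of 0s and 1s covering the whole image; the function name is_keyframe and the proportions 0.2 and 0.3 are hypothetical, since the text only speaks of a predetermined criterion.

```python
def is_keyframe(D, min_moving_fraction):
    """Return True when the share of motion pixels in D reaches the criterion.

    D is a grid (list of rows) of 0/1 values covering the whole image;
    min_moving_fraction is a hypothetical stand-in for the predetermined
    proportion of all image pixels.
    """
    total = sum(len(row) for row in D)
    moving = sum(1 for row in D for value in row if value == 1)
    return total > 0 and moving / total >= min_moving_fraction


# 4 of 16 pixels are marked as motion positions, i.e. a share of 0.25.
D = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(is_keyframe(D, 0.2), is_keyframe(D, 0.3))
```

With a criterion of 0.2 the frame would be selected, and with 0.3 it would be skipped, which shows how strongly this parameter controls the number of keyframes.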

In step 710, the detection box ratio is corrected using the matrix (S) representing the average size of the detection boxes at each pixel, to account for differences in detection box size caused by the shooting angle of the captured image.

예를 들어, 틸트-다운 뷰(tilted-down view) 카메라로 획득된 영상에서는 도 3(b)에 도시된 바와 같이, 카메라와의 거리에 따라 객체 크기가 상이하기 때문에 촬영된 영상의 촬영 각도에 따른 탐지 박스의 크기의 차이에 따라 각 픽셀에 탐지 박스의 평균 크기를 나타내는 행렬(S)을 이용하여 탐지 박스의 비율을 수정한다. For example, in an image acquired with a tilted-down view camera, as shown in FIG. 3( b ), since the object size varies according to the distance from the camera, the angle of the captured image is According to the difference in the size of the detection box, the ratio of the detection box is corrected by using a matrix S indicating the average size of the detection box in each pixel.

The motion estimate (N_estimate) corrected to reflect the ratio of the detection boxes is as follows:

N_estimate = N_estimate-1 * (4/7 + 4/8)

Here, N_estimate is the motion estimate corrected to reflect the ratio of the detection boxes, and N_estimate-1 is the motion estimate before correction.

To explain the above formula with reference to FIG. 5: the term 4/7 corresponds to the four pixels in the upper row of detection box 510, each having an average detection box size of 7, and the term 4/8 corresponds to the four pixels in the lower row of detection box 510, each having an average detection box size of 8.
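The arithmetic of this example can be checked with a short sketch. It assumes, as a reading of the FIG. 5 example rather than something the text states outright, that the correction factor is the sum of 1/S over the pixels covered by a detection box. The function name `correction_factor`, the 2 x 4 pixel layout, and the uncorrected estimate of 14 are illustrative only.

```python
from fractions import Fraction


def correction_factor(avg_sizes):
    """Sum 1/S over every pixel covered by a detection box.

    avg_sizes: rows of per-pixel average box sizes taken from matrix S.
    Assumption: the factor is a sum of reciprocals, as in 4/7 + 4/8.
    """
    return sum(Fraction(1, s) for row in avg_sizes for s in row)


# Detection box 510 in the FIG. 5 example:
# four upper-row pixels with S = 7, four lower-row pixels with S = 8.
box_510 = [[7, 7, 7, 7],
           [8, 8, 8, 8]]

factor = correction_factor(box_510)   # 4/7 + 4/8 = 15/14
n_before = 14                         # hypothetical uncorrected estimate
n_after = n_before * factor           # 14 * 15/14 = 15
print(factor, n_after)
```

Exact fractions are used so that the factor stays readable as 15/14; a floating-point version would work equally well in practice.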

After the ratio of the detection boxes (N_estimate) is corrected in this way, in step 720 the values of the matrix (D) representing the movement position of the detection boxes at each pixel are determined according to the corrected ratio.

The difference between the motion estimate of a first frame, calculated using the corrected ratio of the detection boxes, and the motion estimate of a second frame following the first frame is compared with a predetermined threshold (th_pixeldiff) to determine the values of the matrix (D) representing the movement position of the detection boxes at each pixel. In this embodiment, the predetermined threshold (th_pixeldiff) was set to 7.5. This value was obtained from an experiment showing that a person viewing images with the naked eye can recognize a difference between images when the difference is 7.5% or more. This is only an example; the threshold is not limited to this value and may be set to various other values.

N_estimate = N_estimate-1 * (4/7 + 4/8)

If N_estimate > th_pixeldiff, and the number of pixels for which N_estimate > th_pixeldiff is greater than th_keyframe, the frame may be selected as a keyframe.

Here, th_pixeldiff is the reference value for judging that a pixel has changed: a pixel is judged to be different when the value obtained by subtracting the pixel value of the current frame from that of the next frame is greater than th_pixeldiff. th_keyframe is the reference value for judging whether the number of pixels found to differ under th_pixeldiff is equal to or greater than a predetermined number.

When the number of pixels judged to differ in this way is equal to or greater than the predetermined value (th_keyframe), the frame may be selected as a keyframe.

According to an embodiment of the present invention, the captured image is input to an object detection unit using YOLOv4, and the matrix (D) representing the movement position of the detection boxes is determined as follows:

If |I_keyframe - I_t| > th_pixeldiff, then D = 1; otherwise, D = 0.

Let I_t be the pixel value at pixel (x, y) of the current frame, and let I_keyframe be the pixel value at the same pixel (x, y) of the keyframe. If the difference between the two values is greater than th_pixeldiff, the object is judged to have moved and the value D = 1 may be stored.
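The rule for D can be sketched in a few lines. This is a minimal illustration, assuming grayscale frames stored as equally sized nested lists; the function name `motion_matrix` and the sample pixel values are hypothetical, and the default threshold of 7.5 is simply the value given for this embodiment (the text does not fix the pixel scale it applies to).

```python
def motion_matrix(keyframe, current, th_pixeldiff=7.5):
    """Return D with D[y][x] = 1 where |I_keyframe - I_t| > th_pixeldiff, else 0."""
    return [
        [1 if abs(k - c) > th_pixeldiff else 0 for k, c in zip(k_row, c_row)]
        for k_row, c_row in zip(keyframe, current)
    ]


# Hypothetical 2 x 3 grayscale frames.
keyframe = [[100, 100, 100],
            [100, 100, 100]]
current = [[100, 110, 104],
           [90, 100, 107.5]]

D = motion_matrix(keyframe, current)
print(D)  # [[0, 1, 0], [1, 0, 0]]
```

Note that the comparison is strict, so a difference of exactly 7.5 (the last pixel above) is not counted as motion.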

In step 730, the frame is selected as a keyframe when the number of pixels indicating the movement position of objects according to the matrix (D) meets or exceeds a predetermined criterion relative to all image pixels.

That is, when the pixels marked as moving in the matrix (D) for the frame amount to at least a predetermined ratio (for example, 7.5%) of the total image size (in other words, all width x height pixels), the frame may be selected as a keyframe.
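The selection test in step 730 amounts to counting the pixels with D = 1 and comparing that count with a fraction of the image size. The sketch below assumes D is a nested list of 0/1 values; the name `is_keyframe` and the 4 x 10 sample matrices are illustrative, and the 0.075 default mirrors the 7.5% example above.

```python
def is_keyframe(D, ratio=0.075):
    """True when the pixels flagged as moving reach `ratio` of all pixels."""
    total = sum(len(row) for row in D)   # width x height
    moved = sum(sum(row) for row in D)   # number of pixels with D = 1
    return total > 0 and moved >= ratio * total


def sample_matrix(ones):
    """Build a 4 x 10 matrix whose first `ones` entries are 1 (test data only)."""
    flat = [1] * ones + [0] * (40 - ones)
    return [flat[i * 10:(i + 1) * 10] for i in range(4)]


print(is_keyframe(sample_matrix(2)))  # 2/40 = 5%  -> False
print(is_keyframe(sample_matrix(4)))  # 4/40 = 10% -> True
```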

FIG. 8 is a flowchart illustrating a specific example of a method for extracting keyframes from a video according to an embodiment of the present invention.

In the proposed method for extracting keyframes from a video, an image is first acquired by photographing a plurality of objects with a camera (810), and an object is detected from the acquired image through the object detection unit (820). A detection box for the detected object is then obtained (831). The obtained box is received, and the matrix (S) representing the average size of the detection boxes at each pixel and the matrix (D) representing the movement position of the detection boxes at each pixel are obtained through the keyframe extraction unit (832). It is then determined whether there is another object to be detected in the frame (833); if there are a plurality of objects, the above process (820 to 832) is repeated for all objects.

After object detection has been repeated for the frame, an image for motion determination is received (840). An object is detected from the input image through the object detection unit (850).

The ratio of the detection boxes is corrected using the matrix (S) representing the average size of the detection boxes at each pixel, to account for differences in detection box size caused by the shooting angle of the captured image. For example, in an image acquired with a tilted-down view camera, as shown in FIG. 3(b), object size varies with distance from the camera, so the ratio of the detection boxes (N_estimate) is corrected using the matrix (S).

After the ratio of the detection boxes (N_estimate) is corrected in this way, the values of the matrix (D) representing the movement position of the detection boxes at each pixel are determined according to the corrected ratio.

The predetermined threshold (th_pixeldiff) for the difference between the motion estimate of the first frame, calculated using the corrected ratio of the detection boxes, and the motion estimate of the second frame following the first frame is adjusted (861). The values of the matrix (D) representing the movement position of the detection boxes at each pixel are then determined against the predetermined threshold (th_pixeldiff) (862).

If |I_keyframe - I_t| > th_pixeldiff, D is set to 1 (864); otherwise, D is set to 0 (863).

Thereafter, it is determined whether the number of pixels indicating the movement position of objects according to the matrix (D) meets or exceeds a predetermined criterion relative to all image pixels (871).

For example, when the pixels marked as moving in the matrix (D) for the frame amount to at least a predetermined ratio (for example, 7.5%) of the total image size (in other words, all width x height pixels), the frame may be selected as a keyframe (872). Otherwise, the next object to be detected in the frame is determined, and the process (850 to 871) is repeated for that object.
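The thresholding and selection stages of this flow can be put together as a loop over frames. The sketch below is a simplification under stated assumptions: it leaves out object detection (YOLOv4) and the S-based ratio correction entirely, treats the first frame as the initial keyframe, and compares each later frame against the most recent keyframe. The name `select_keyframes` and the tiny 2 x 2 frames are hypothetical.

```python
def select_keyframes(frames, th_pixeldiff=7.5, th_ratio=0.075):
    """Return the indices of frames chosen as keyframes.

    frames: list of grayscale frames, each a nested list of equal shape.
    Assumption: the first frame serves as the initial keyframe.
    """
    if not frames:
        return []
    selected = [0]
    key = frames[0]
    for t in range(1, len(frames)):
        cur = frames[t]
        # Count pixels where |I_keyframe - I_t| > th_pixeldiff (D = 1).
        moved = sum(
            1
            for k_row, c_row in zip(key, cur)
            for k, c in zip(k_row, c_row)
            if abs(k - c) > th_pixeldiff
        )
        total = sum(len(row) for row in cur)
        # Select the frame when enough of the image has changed.
        if moved >= th_ratio * total:
            selected.append(t)
            key = cur
    return selected


frames = [
    [[0, 0], [0, 0]],    # initial keyframe
    [[5, 0], [0, 0]],    # change of 5 is below the pixel threshold
    [[20, 0], [0, 0]],   # one of four pixels changed -> selected
    [[20, 0], [0, 0]],   # identical to the new keyframe -> skipped
]
print(select_keyframes(frames))  # [0, 2]
```

Updating the reference to the newly selected keyframe keeps near-duplicate frames from being chosen repeatedly, which is what allows detection to run on a reduced frame rate.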

FIG. 9 is a diagram illustrating the configuration of an apparatus for extracting keyframes from a video according to an embodiment of the present invention.

The proposed apparatus (900) for extracting keyframes from a video includes an image collection unit (910), an object detection unit (920), and a keyframe extraction unit (930). The image collection unit (910), the object detection unit (920), and the keyframe extraction unit (930) may be configured to perform steps 110 to 140 of FIG. 1.

The image collection unit (910) acquires an image by photographing a plurality of objects with a camera. For example, the plurality of objects may be photographed with a top-down view camera, which shoots from above looking down, or with a tilted-down view camera.

The object detection unit (920) obtains detection boxes for detecting the plurality of objects from the acquired image.

The keyframe extraction unit (930) receives the set of obtained boxes, and obtains and stores the matrix (S) representing the average size of the detection boxes at each pixel and the matrix (D) representing the movement position of the detection boxes at each pixel. It then corrects the ratio of the detection boxes according to the shooting angle using the matrix (S) and the matrix (D), determines whether an object has moved according to the corrected detection box ratio, and selects a keyframe.

The keyframe extraction unit (930) first receives the set of obtained boxes and obtains the matrix (S) representing the average size of the detection boxes at each pixel.

It then calculates a motion estimate for each pixel using the matrix (S). By storing, at each pixel, a value indicating whether motion occurred according to the calculated motion estimate, it obtains the matrix (D) representing the movement position of the detection boxes at each pixel.
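One way the matrix (S) might be filled is sketched below. It follows the averaging recited in the claims (the mean of a box's size in a first frame and in the following frame, stored at the pixel of the box center) but adds several assumptions of its own: boxes are `(x, y, w, h)` tuples with a top-left origin, "size" means area, boxes at the same list index belong to the same object, and the center is taken from the first frame's box. The name `update_size_matrix` is hypothetical.

```python
def update_size_matrix(S, boxes_first, boxes_second):
    """Store, at each box's center pixel, the mean box size over two frames.

    S: nested list indexed as S[y][x]; modified in place and returned.
    boxes_*: lists of (x, y, w, h); pairing by index is an assumption.
    """
    for (x1, y1, w1, h1), (_, _, w2, h2) in zip(boxes_first, boxes_second):
        mean_size = (w1 * h1 + w2 * h2) / 2   # size taken as area (assumption)
        cx = x1 + w1 // 2                     # center pixel of the first-frame box
        cy = y1 + h1 // 2
        S[cy][cx] = mean_size
    return S


# A 4-row by 6-column image with one tracked box.
S = [[0.0] * 6 for _ in range(4)]
update_size_matrix(S, [(0, 0, 4, 2)], [(1, 0, 4, 3)])
print(S[1][2])  # (4*2 + 4*3) / 2 = 10.0
```

A fuller version would also need a way to match boxes between frames and to fill in pixels that never coincide with a box center; the source text shown here does not specify either.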

The keyframe extraction unit (930) corrects the ratio of the detection boxes according to the shooting angle using the matrix (S) representing the average size of the detection boxes at each pixel and the matrix (D) representing the movement position of the detection boxes at each pixel, determines whether an object has moved according to the corrected detection box ratio, and selects a keyframe.

FIG. 10 is a diagram illustrating keyframes selected by determining whether motion occurred, according to an embodiment of the present invention.

FIG. 11 is a diagram illustrating keyframes selected by determining whether motion occurred, according to another embodiment of the present invention.

FIG. 12 is a diagram illustrating keyframes selected by determining whether motion occurred, according to yet another embodiment of the present invention.

Referring to FIGS. 10 to 12, it can be seen in (a) and (b) of each figure that the frames containing objects whose motion was detected were selected as keyframes. By extracting these core keyframe images and using them to detect and track objects, detection is possible with only about 4 to 15 frames per second rather than 30 frames per second.

The device described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the devices and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable array (FPA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

The software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. The software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. The software and data may be stored on one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to a limited set of embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations based on the above description. For example, appropriate results may be achieved even if the described techniques are performed in a different order from the described method, and/or the described components of a system, structure, device, circuit, and the like are coupled or combined in a different form from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents to the claims also fall within the scope of the claims that follow.

<References>

[1] S. Matthews, et al., "Early Detection of Health and Welfare Compromises through Automated Detection of Behavioural Changes in Pigs," The Veterinary Journal, Vol. 217, pp. 43-51, 2016.

[2] L. Liu, et al., "Deep Learning for Generic Object Detection: A Survey," International Journal of Computer Vision, Vol. 128, pp. 261-318, 2020.

[3] A. Bochkovskiy, C. Wang, and H. Liao, "YOLOv4: Optimal Speed and Accuracy of Object Detection," arXiv preprint arXiv:2004.10934, 2020.

[4] T. Lin, et al., "Microsoft COCO: Common Objects in Context," In Proceedings of the European Conference on Computer Vision (ECCV), 2014.

[5] I. Blayvas, A. Bruckstein, and R. Kimmel, "Efficient Computation of Adaptive Threshold Surfaces for Image Binarization," Pattern Recognition, Vol. 18, No. 1, pp. 89-101, 2006.

Claims

1. An object detection method comprising:
acquiring an image by photographing a plurality of objects with a camera;
obtaining, through an object detection unit, detection boxes for detecting the plurality of objects from the acquired image;
receiving the set of obtained boxes and, through a keyframe extraction unit, obtaining and storing a matrix (S) representing the average size of the detection boxes at each pixel and a matrix (D) representing the movement position of the detection boxes at each pixel; and
correcting the ratio of the detection boxes according to the shooting angle using the matrix (S) and the matrix (D), determining whether an object has moved according to the corrected detection box ratio, and selecting a keyframe.

2. The method of claim 1,
wherein the step of receiving the set of obtained boxes and obtaining and storing, through the keyframe extraction unit, the matrix (S) representing the average size of the detection boxes at each pixel and the matrix (D) representing the movement position of the detection boxes at each pixel comprises:
receiving the set of obtained boxes and obtaining, through the keyframe extraction unit, the matrix (S) representing the average size of the detection boxes at each pixel;
calculating a motion estimate for each pixel using the matrix (S); and
obtaining the matrix (D) representing the movement position of the detection boxes at each pixel by storing, at each pixel, a value indicating whether motion occurred according to the calculated motion estimate.

3. The method of claim 2,
wherein the step of receiving the set of obtained boxes and obtaining, through the keyframe extraction unit, the matrix (S) representing the average size of the detection boxes at each pixel comprises
calculating the average of the size of a detection box in a first frame and the size of the corresponding detection box in a second frame following the first frame, and setting it as the average detection box size for the pixel corresponding to the center point of the detection box.

4. The method of claim 1,
wherein the step of correcting the ratio of the detection boxes according to the shooting angle using the matrix (S) and the matrix (D), determining whether an object has moved according to the corrected detection box ratio, and selecting a keyframe comprises:
correcting the ratio of the detection boxes using the matrix (S) representing the average size of the detection boxes at each pixel, to account for differences in detection box size caused by the shooting angle of the captured image;
determining the values of the matrix (D) representing the movement position of the detection boxes at each pixel according to the corrected ratio of the detection boxes; and
selecting the frame as a keyframe when the number of pixels indicating the movement position of objects according to the matrix (D) meets or exceeds a predetermined criterion relative to all image pixels.

5. The method of claim 4,
wherein the difference between the motion estimate of a first frame, calculated using the corrected ratio of the detection boxes, and the motion estimate of a second frame following the first frame is compared with a predetermined threshold (th_pixeldiff) to determine the values of the matrix (D) representing the movement position of the detection boxes at each pixel.

6. An object detection device comprising:
an image collection unit that acquires an image by photographing a plurality of objects with a camera;
an object detection unit that obtains detection boxes for detecting the plurality of objects from the acquired image; and
a keyframe extraction unit that receives the set of obtained boxes, obtains and stores a matrix (S) representing the average size of the detection boxes at each pixel and a matrix (D) representing the movement position of the detection boxes at each pixel, corrects the ratio of the detection boxes according to the shooting angle using the matrix (S) and the matrix (D), determines whether an object has moved according to the corrected detection box ratio, and selects a keyframe.

7. The device of claim 6,
wherein the keyframe extraction unit
receives the set of obtained boxes and obtains the matrix (S) representing the average size of the detection boxes at each pixel,
calculates a motion estimate for each pixel using the matrix (S), and
obtains the matrix (D) representing the movement position of the detection boxes at each pixel by storing, at each pixel, a value indicating whether motion occurred according to the calculated motion estimate.

8. The device of claim 7,
wherein the keyframe extraction unit
calculates the average of the size of a detection box in a first frame and the size of the corresponding detection box in a second frame following the first frame, and sets it as the average detection box size for the pixel corresponding to the center point of the detection box.

9. The device of claim 6,
wherein the keyframe extraction unit
corrects the ratio of the detection boxes using the matrix (S) representing the average size of the detection boxes at each pixel, to account for differences in detection box size caused by the shooting angle of the captured image,
determines the values of the matrix (D) representing the movement position of the detection boxes at each pixel according to the corrected ratio of the detection boxes, and
selects the frame as a keyframe when the number of pixels indicating the movement position of objects according to the matrix (D) meets or exceeds a predetermined criterion relative to all image pixels.

10. The device of claim 9,
wherein the keyframe extraction unit
compares the difference between the motion estimate of a first frame, calculated using the corrected ratio of the detection boxes, and the motion estimate of a second frame following the first frame with a predetermined threshold (th_pixeldiff) to determine the values of the matrix (D) representing the movement position of the detection boxes at each pixel.