KR102139582B1

KR102139582B1 - Apparatus for CCTV Video Analytics Based on Multiple ROIs and an Object Detection DCNN and Driving Method Thereof

Info

Publication number: KR102139582B1
Application number: KR1020190160358A
Authority: KR
Inventors: 장정훈
Original assignee: 주식회사 인텔리빅스
Priority date: 2019-12-05
Filing date: 2019-12-05
Publication date: 2020-07-29

Abstract

The present invention relates to a CCTV video analysis apparatus based on multiple regions of interest (ROIs) and an object detection DCNN, and to a driving method thereof. According to an embodiment of the present invention, the apparatus comprises: a storage unit for storing setting information on a first ROI and a second ROI, each set in a region of the photographing area of a photographing apparatus where objects of interest appear; and a control unit that, on the basis of the stored setting information, extracts a first object image of the first ROI and a second object image of the second ROI from a captured image received from the photographing apparatus, performs DCNN-based object detection on the extracted first and second object images, merges the objects detected in the first ROI and the second ROI, and then performs object tracking and event detection. Accordingly, distant objects can be detected more reliably.

Description

{Apparatus for CCTV Video Analytics Based on Multiple ROIs and an Object Detection DCNN and Driving Method Thereof}

The present invention relates to a CCTV video analysis apparatus based on multiple ROIs and an object detection DCNN, and to a method of driving the apparatus. More specifically, it relates to a CCTV video analysis apparatus that detects objects of interest in, for example, a CCTV image by applying an object detection DCNN (Deep Convolutional Neural Network) around multiple regions of interest (ROIs), and that performs object tracking and detection of events of interest on that basis, and to a method of driving the apparatus.

The CCTV video analysis apparatus, which forms the core of an intelligent CCTV video surveillance system, detects and tracks objects of interest in CCTV camera video and, on that basis, automatically detects abnormal situations such as "intrusion into a prohibited area" and raises an alarm. One of the core technologies in implementing such an apparatus is "object-of-interest detection".

Various methods for detecting objects of interest in images have long been studied in computer vision, and as deep learning advanced from the mid-2010s, DCNN-based object detection techniques were published. These techniques became popular because their detection performance far exceeds that of earlier object detection techniques. Representative object detection DCNN models include Faster R-CNN, SSD (Single Shot MultiBox Detector), and YOLO (You Only Look Once).

When an object detection DCNN trained with such a model is actually used, the input resolution (input size) of the DCNN is fixed. To perform object detection on a given image, the input image must therefore be scaled (usually reduced) to match the DCNN input resolution before it is fed to the DCNN. This often causes problems when detecting distant objects (objects that appear small in the image) in an input image whose resolution is higher than the input resolution of the object detection DCNN, because while the high-resolution image is reduced to the DCNN input size, distant objects often become too small for the DCNN to detect.
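To make the scale loss concrete, the following minimal sketch computes how large an object appears after the whole frame is resized to the detector input. The frame size (1920×1080), the DCNN input size (416×416), and the pedestrian size are illustrative assumptions, not values taken from this document, and the function name is hypothetical.

```python
def scaled_object_size(obj_wh, src_wh, dst_wh):
    """Return the (width, height) in pixels of an object after the image
    that contains it is resized from src_wh to dst_wh (plain stretch,
    no aspect-ratio padding)."""
    sx = dst_wh[0] / src_wh[0]
    sy = dst_wh[1] / src_wh[1]
    return obj_wh[0] * sx, obj_wh[1] * sy


# Assumed numbers: a distant pedestrian of 24x60 px in a Full HD frame,
# and a detector whose input is fixed at 416x416.
full_frame = scaled_object_size((24, 60), (1920, 1080), (416, 416))
print(full_frame)  # width shrinks to roughly 5 px
```

Under these assumptions the pedestrian is only about five pixels wide at the detector input, which illustrates why such objects tend to be missed.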

For most object detection DCNN models, detection performance (especially for distant objects) tends to improve as the input resolution of the DCNN increases. However, the size and computational cost of the DCNN grow rapidly with its input resolution, so it is impractical to simply keep increasing the input resolution in order to improve distant-object detection.

Korean Registered Patent Publication No. 10-1942808 (2019.01.22)
Korean Registered Patent Publication No. 10-1955919 (2019.03.04)
Korean Patent Laid-Open Publication No. 10-2018-0107930 (2018.10.04)
Korean Patent Laid-Open Publication No. 10-2018-0065856 (2018.06.18)
Korean Patent Laid-Open Publication No. 10-2016-0096460 (2016.08.16)

An object of an embodiment of the present invention is to provide a CCTV video analysis apparatus that detects objects of interest in, for example, a CCTV image by applying an object detection DCNN around multiple regions of interest (ROIs) and performs object tracking and detection of events of interest on that basis, and a method of driving the apparatus.

Another object of an embodiment of the present invention is to provide a CCTV video analysis apparatus that uses an object detection DCNN with a low input resolution to effectively detect objects of interest (especially distant objects) in high-resolution CCTV images and performs object tracking and event detection on that basis, and a method of driving the apparatus.

A CCTV video analysis apparatus based on multiple ROIs and an object detection DCNN according to an embodiment of the present invention includes: a storage unit that stores setting information on a first region of interest (ROI) and a second ROI, each set in a region of the photographing area of a photographing apparatus where objects of interest appear; and a control unit that, on the basis of the stored setting information, extracts a first object image of the first ROI and a second object image of the second ROI from a captured image received from the photographing apparatus, performs DCNN-based object detection on the extracted first and second object images, merges the objects of the first ROI and the second ROI detected by the object detection, and then performs object tracking and event detection.
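The control flow described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the patented implementation: the frame is a plain list of pixel rows, ROIs are given as (x, y, width, height), the `detect` and `merge` callables are hypothetical stand-ins for the DCNN detector and the merging step, and resolution conversion is omitted so that detections are in crop pixel coordinates.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Roi = Tuple[int, int, int, int]          # (x, y, width, height) in frame pixels


def analyze_frame(frame, rois: Dict[str, Roi],
                  detect: Callable[[list], List[Box]],
                  merge: Callable[[List[Box]], List[Box]]) -> List[Box]:
    """Crop each stored ROI, run the detector on the crop, shift the boxes
    back into frame coordinates, and merge the per-ROI results so that
    tracking and event detection can work on a single per-frame list."""
    all_boxes: List[Box] = []
    for x, y, w, h in rois.values():
        crop = [row[x:x + w] for row in frame[y:y + h]]
        for bx1, by1, bx2, by2 in detect(crop):
            all_boxes.append((bx1 + x, by1 + y, bx2 + x, by2 + y))
    return merge(all_boxes)


# Usage with stand-ins: a fake detector that always reports one box
# in the top-left corner of whatever crop it receives.
frame = [[0] * 8 for _ in range(8)]
rois = {"ROI-1": (0, 0, 4, 4), "ROI-2": (2, 2, 4, 4)}
boxes = analyze_frame(frame, rois, lambda crop: [(0, 0, 2, 2)], lambda b: b)
print(boxes)
```

Passing the detector and the merge step as parameters keeps the sketch independent of any particular DCNN model or merging rule.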

When the first ROI and the second ROI overlap, the control unit may merge the same object in the overlapping region into a single object on a per-frame basis of the captured image.

The control unit may generate a new bounding box by merging, on a per-frame basis, the first bounding box and the second bounding box respectively set for the same object in the overlapping region.
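One possible way to merge two bounding boxes for the same object is sketched below. The text does not state how "the same object" is decided or how the new box is formed, so both choices here are assumptions: two boxes are treated as the same object when their intersection over union (IoU) reaches a threshold, and the merged box is the smallest box enclosing both. Boxes are (x1, y1, x2, y2) in frame coordinates.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def merge_boxes(a, b):
    """Smallest box that encloses both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_if_same_object(a, b, iou_threshold=0.5):
    """Return one merged box if the two boxes overlap enough (assumed rule),
    otherwise return both boxes unchanged."""
    if iou(a, b) >= iou_threshold:
        return [merge_boxes(a, b)]
    return [a, b]


# Two detections of the same pedestrian coming from overlapping ROIs.
print(merge_if_same_object((0, 0, 10, 10), (1, 1, 11, 11)))
```

Averaging the two boxes or keeping the higher-confidence one would be equally plausible alternatives to the enclosing box used here.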

When performing the DCNN-based object detection after resolution conversion of the first object image and the second object image, the control unit may use an object detector with little loss for distant objects.

The control unit includes a region of interest (ROI) image processing unit for the resolution conversion, and the converted resolution may match the input image resolution of the DCNN of the object detector.
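The resolution conversion can be illustrated with a dependency-free sketch that crops an ROI and resamples it to the detector input size. Nearest-neighbour sampling and the list-of-rows frame representation are assumptions chosen for brevity; the document does not name an interpolation method, and a real system would more likely use an image library.

```python
def crop_and_resize(frame, roi, out_w, out_h):
    """Crop roi = (x, y, w, h) from frame (a list of pixel rows) and resize
    the crop to out_w x out_h using nearest-neighbour sampling."""
    x, y, w, h = roi
    out = []
    for j in range(out_h):
        src_row = frame[y + (j * h) // out_h]      # source row for output row j
        out.append([src_row[x + (i * w) // out_w]  # source column for output column i
                    for i in range(out_w)])
    return out


# Tiny example: pixel value encodes its position as 10*row + column.
frame = [[10 * r + c for c in range(8)] for r in range(6)]
roi_image = crop_and_resize(frame, (2, 1, 4, 4), 2, 2)
print(roi_image)
```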

The control unit may determine the set size of the first ROI and the second ROI according to the resolution of the object detector in use.
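The document does not give a rule for deriving the ROI size from the detector resolution. As one hypothetical rule, the sketch below picks the largest ROI for which the smallest expected object still spans a minimum number of pixels after the ROI is reduced to the detector input. All parameter names and numbers are assumptions for illustration.

```python
def max_roi_size(input_wh, min_detectable_px, smallest_object_px, frame_wh):
    """Largest ROI (width, height) such that an object of smallest_object_px
    in the frame is still at least min_detectable_px after the ROI is
    resized to the detector input input_wh. Clamped to the frame size."""
    allowed_shrink = smallest_object_px / min_detectable_px
    w = min(frame_wh[0], int(input_wh[0] * allowed_shrink))
    h = min(frame_wh[1], int(input_wh[1] * allowed_shrink))
    return w, h


# Assumed: 416x416 detector that needs objects of at least 12 px,
# smallest pedestrian about 20 px in a 1920x1080 frame.
print(max_roi_size((416, 416), 12, 20, (1920, 1080)))
```

With a rule of this kind, switching to a detector with a larger input resolution directly permits larger ROIs, which matches the dependency described in the text.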

In addition, a method of driving a CCTV video analysis apparatus based on multiple ROIs and an object detection DCNN according to an embodiment of the present invention includes: storing, in a storage unit, setting information on a first region of interest (ROI) and a second ROI, each set in a region of the photographing area of a photographing apparatus where objects of interest appear; and, by a control unit, extracting a first object image of the first ROI and a second object image of the second ROI from a captured image received from the photographing apparatus on the basis of the stored setting information, performing DCNN-based object detection on the extracted first and second object images, merging the objects of the first ROI and the second ROI detected by the object detection, and then performing object tracking and event detection.

In the performing step, when the first ROI and the second ROI overlap, the same object in the overlapping region may be merged into a single object on a per-frame basis of the captured image.

In the performing step, a new bounding box may be generated by merging, on a per-frame basis, the first bounding box and the second bounding box respectively set for the same object in the overlapping region.

In the performing step, when the DCNN-based object detection is performed after resolution conversion of the first object image and the second object image, an object detector with little loss for distant objects may be used.

The resolution conversion is performed by a region of interest (ROI) image processing unit, and the converted resolution may match the input image resolution of the DCNN of the object detector.

The driving method may further include determining the set size of the first ROI and the second ROI according to the resolution of the object detector in use.

According to an embodiment of the present invention, when object detection is performed on an input CCTV image with an object detection DCNN, it is performed not on the entire input image but on the partial images (ROI images) corresponding to the set ROIs. Converting an ROI image into a DCNN input image loses far less detail of distant objects than converting the entire image does, so distant objects can be detected better than when detection is run on the entire image.
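The size of this benefit can be estimated with a short calculation. Resizing an image of width W to a detector input of width W_in scales objects by W_in / W, so the ratio between the ROI case and the full-frame case is simply W_frame / W_roi, independent of the detector input size. The sketch below evaluates that ratio per axis; the frame and ROI sizes are assumed example values and the function name is hypothetical.

```python
def linear_gain(frame_wh, roi_wh):
    """Per-axis factor by which an object appears larger at the detector
    input when the ROI, rather than the full frame, is resized to it."""
    return frame_wh[0] / roi_wh[0], frame_wh[1] / roi_wh[1]


# Assumed: 1920x1080 frame, 640x360 ROI around the distant area.
gain = linear_gain((1920, 1080), (640, 360))
print(gain)  # objects appear three times larger per axis
```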

FIG. 1 is a diagram showing a DCNN-based video control system according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating a detailed structure of the image analysis apparatus of FIG. 1;
FIG. 3 is a block diagram illustrating another detailed structure of the image analysis apparatus of FIG. 1;
FIG. 4 is a diagram showing an example of setting ROIs and an event detection area;
FIG. 5 is a diagram showing the ROI images corresponding to ROI-1 and ROI-2 of FIG. 4;
FIG. 6 is an exemplary diagram in which the DCNN-based object detection unit has detected the bounding boxes of pedestrian objects within ROI-1 from the ROI-1 image of FIG. 5;
FIG. 7 is a diagram for explaining the correspondence, for the same object, between bounding box coordinates in the input video frame and bounding box coordinates in the normalized ROI image;
FIG. 8 is a diagram showing an example in which the object bounding boxes detected in ROI-1 and ROI-2 of FIG. 4 are integrated with respect to the input video frame; and
FIG. 9 is a flowchart of a driving process of the image analysis apparatus according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a diagram showing a DCNN-based video control system according to an embodiment of the present invention.

As shown in FIG. 1, the DCNN-based video control system 90 according to an embodiment of the present invention includes some or all of a photographing apparatus 100, a communication network 110, an image analysis apparatus 120, a control device 130, and a manager device 140.

Here, "includes some or all of" means, for example, that the DCNN-based video control system 90 may be configured with some components such as the image analysis apparatus 120 omitted, or that some or all of the components of the image analysis apparatus 120 may be integrated into a network device (e.g., a wireless switching device) of the communication network 110. To aid a full understanding of the invention, the system is described as including all of them.

The photographing apparatus 100 is a surveillance camera such as a general CCTV (Closed Circuit Television) camera or an IP (Internet Protocol) camera. It may be a fixed camera or a PTZ (Pan-Tilt-Zoom) camera capable of pan, tilt, and zoom operations. The photographing apparatus 100 may be installed in various places to provide captured images for social safety, crime prevention, social issues, and public surveillance. For example, it may be installed in public places such as subways or bus stops to monitor incidents and accidents, or on bridge railings and in remote places to monitor acts of violence. Surveillance may also be carried out through CCTV installed in daycare centers and similar facilities.

The communication network 110 includes both wired and wireless communication networks. For example, a wired or wireless Internet network may be used as, or linked with, the communication network 110. Here, the wired network includes an Internet network such as a cable network or the public switched telephone network (PSTN), and the wireless network includes CDMA, WCDMA, GSM, EPC (Evolved Packet Core), LTE (Long Term Evolution), and WiBro networks. The communication network 110 according to an embodiment of the present invention is not limited to these and may also be, for example, a cloud computing network in a cloud computing environment or a 5G network. When the communication network 110 is a wired network, an access point in the communication network 110 may connect to an exchange of a telephone office; when it is a wireless network, the access point may process data by connecting to an SGSN or a GGSN (Gateway GPRS Support Node) operated by a carrier, or by connecting to various relay equipment such as a BTS (Base Transceiver Station), NodeB, or e-NodeB.

The communication network 110 may include an access point. The access point here includes small base stations, such as femto or pico base stations, which are commonly installed inside buildings. Femto and pico base stations are distinguished, within the classification of small base stations, by the maximum number of devices such as the photographing apparatus 100 that can connect to them. The access point may include a short-range communication module for short-range communication, such as ZigBee or Wi-Fi, with the photographing apparatus 100 and similar devices. The access point may use TCP/IP or RTSP (Real-Time Streaming Protocol) for wireless communication. Besides Wi-Fi, short-range communication may follow various standards such as Bluetooth, ZigBee, infrared, RF (Radio Frequency) such as UHF (Ultra High Frequency) and VHF (Very High Frequency), and ultra-wideband (UWB) communication. The access point extracts the location of a data packet, determines the best communication path for the extracted location, and forwards the data packet along that path to the next device, for example the image analysis apparatus 120 or the control device 130. The access point can share multiple lines in a typical network environment and includes, for example, routers, repeaters, and relays.

According to an embodiment of the present invention, the communication network 110 may include an internal computer network such as an intranet, through which the image analysis apparatus 120, the control device 130, and the manager device 140 may interwork with one another. Because the components may be arranged around the communication network 110 in various ways, the embodiment of the present invention is not limited to any one configuration.

The image analysis apparatus 120 performs image analysis based on an object detection DCNN, using multiple regions of interest (ROIs), on the captured image received from the photographing apparatus 100. Above all, the image analysis apparatus 120 according to an embodiment of the present invention uses an object detection DCNN with a low input resolution to effectively detect objects of interest (especially distant objects) in high-resolution CCTV images, and performs object tracking and event detection on that basis. Here, "low input resolution" may be understood as roughly below average among the object detectors currently on the market, but the embodiment of the present invention is not limited to that notion. When the objects of interest do not appear all over the input CCTV image but only in specific areas, a user such as a control operator sets ROIs on the areas where the objects of interest appear. When performing object detection on the input CCTV image with the object detection DCNN, the image analysis apparatus 120 does so not on the entire input image but on the partial images (ROI images) corresponding to the set ROIs. Converting an ROI image into a DCNN input image loses far less detail of distant objects than converting the entire image does, so distant objects can be detected better than when detection is run on the entire image.

The image analysis apparatus 120 also converts the bounding box coordinates of objects detected in an ROI image into bounding box coordinates relative to the original input image. Object tracking and event detection are performed using the object bounding box information expressed relative to the original input image. Coordinate values may be calculated in pixels; that is, (0, 0) may be the top-left pixel of the unit frame image. That is an absolute coordinate scheme, but a relative coordinate scheme that takes the center of the unit frame image as (0, 0) may equally be used. When two or more ROIs are set, the object detection DCNN is applied to each ROI to obtain object bounding boxes. When the set ROIs overlap, bounding boxes for the same object may be obtained from more than one ROI. In that case, the overlapping bounding boxes are merged into one.
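The coordinate conversion described above can be sketched as follows, assuming an axis-aligned ROI given as (x, y, width, height) in frame pixels and a normalized ROI image whose size equals the detector input. The function names are hypothetical. The second helper shows the alternative coordinate scheme with (0, 0) at the frame center.

```python
def roi_box_to_frame(box, roi, input_wh):
    """Map a box (u1, v1, u2, v2) detected in the normalized ROI image of
    size input_wh back to pixel coordinates of the original frame."""
    x0, y0, w, h = roi
    sx = w / input_wh[0]   # frame pixels per ROI-image pixel, horizontally
    sy = h / input_wh[1]   # frame pixels per ROI-image pixel, vertically
    u1, v1, u2, v2 = box
    return (x0 + u1 * sx, y0 + v1 * sy, x0 + u2 * sx, y0 + v2 * sy)


def to_centered(point, frame_wh):
    """Convert a top-left-origin pixel coordinate into a coordinate system
    whose origin (0, 0) is the center of the frame."""
    return (point[0] - frame_wh[0] / 2, point[1] - frame_wh[1] / 2)


# Assumed: ROI at (100, 200) of size 832x416, detector input 416x416.
print(roi_box_to_frame((10, 20, 30, 60), (100, 200, 832, 416), (416, 416)))
print(to_centered((960, 540), (1920, 1080)))
```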

As described above, when, for example, a control operator has set a first ROI and a second ROI for the surveillance area, the image analysis apparatus 120 according to an embodiment of the present invention extracts from the received captured image the object images corresponding to those ROIs on the basis of the setting information, and converts the resolution of each extracted object image to match the input image resolution of the object detector. Performing deep learning inference, for example with a DCNN, in this way helps prevent the loss of distant objects during resolution conversion. In other words, even if a low-resolution object detector is used for reasons of cost or processing speed, ROIs of a correspondingly suitable size are set, so that distant objects can still be detected well after the resolution of the object images for those ROIs is converted. When the photographing apparatus 100 is a zoom camera, the size of the ROI in the captured image changes with zoom-in and zoom-out, and the resolution conversion can be performed accordingly.

Because the image analysis apparatus 120 subdivides the surveillance area (the part of the full field of view of the photographing apparatus 100 in which objects of interest appear) into ROIs, object tracking may not be straightforward. Object tracking has to be performed on a per-frame (picture) basis, so after object detection the two object images, or the detected objects, are merged with respect to the unit frame, and the merged result is used to track objects across the whole region of interest and detect events. Identical objects duplicated in the merging process are treated as a single object. Likewise, when bounding boxes are set for object tracking, the boxes for the same object are combined into one bounding box during merging. In short, to detect distant objects reliably, the region of interest is divided, DCNN-based object detection is performed, and the divided regions are then merged again for object tracking and event detection. Treating duplicated objects as one object in this process corrects the object errors introduced by the division.

For example, the image analysis apparatus 120 analyzes the captured image of the photographing apparatus 100: it extracts objects, tracks the movement of the extracted objects, and, when an event (e.g., a traffic accident) occurs for a tracked object, reports it to the control device 130 so that intelligent monitoring can take place. For instance, multiple ROIs may be set for a pedestrian crossing, object detection performed on them, the results merged again for object tracking, and events detected; when it is determined that a traffic accident has occurred at the crossing, the event information can be provided to the control device.
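As a simple illustration of event detection on merged tracking results, the sketch below flags tracked objects whose foot point (bottom center of the bounding box) lies inside a rectangular event detection area. The rectangular zone, the foot-point rule, and the data layout are all assumptions; the document does not specify how events such as intrusions or accidents are decided.

```python
def foot_point(box):
    """Bottom-center point of an (x1, y1, x2, y2) bounding box."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, y2)


def detect_zone_events(tracks, zone):
    """tracks maps a track id to its current box in frame coordinates;
    zone is an (x1, y1, x2, y2) event detection area. Returns the sorted
    ids of tracks whose foot point falls inside the zone."""
    zx1, zy1, zx2, zy2 = zone
    hits = []
    for track_id, box in tracks.items():
        fx, fy = foot_point(box)
        if zx1 <= fx <= zx2 and zy1 <= fy <= zy2:
            hits.append(track_id)
    return sorted(hits)


tracks = {1: (0, 0, 10, 20), 2: (100, 100, 120, 150)}
print(detect_zone_events(tracks, (90, 140, 200, 200)))
```

The returned track ids stand in for the event information that would be reported to the control device.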

The control device 130 includes a control server, control monitors, and the like operated by a control center such as that of a local government. It may receive the captured images of the photographing apparatus 100 via the communication network 110 and enable selective or intelligent monitoring based on the analysis results of the image analysis apparatus 120. For example, a control monitor may present a plurality of screens, each showing the captured image of one of a plurality of photographing apparatuses 100. On this display, only the captured images of the photographing apparatuses 100 for which the image analysis apparatus 120 has provided event occurrence information may be shown.

The manager device 140 may be any of various devices used by a control operator, such as a desktop computer, laptop computer, tablet PC, or smartphone. Through the manager device 140, the control operator may access the image analysis apparatus 120 or the control device 130 and set a plurality of ROIs within the surveillance area captured by the photographing apparatus 100. The plurality of ROIs correspond to areas of the overall surveillance area where objects of interest appear, such as a pedestrian crossing. Whereas conventionally a single region of interest might have been set over the entire appearance area, in the embodiment of the present invention it is preferable to set a plurality of ROIs in consideration of, for example, the performance of the DCNN-based object detector. The maximum size of an ROI may therefore be set as a default. If the performance of the object detector changes, the maximum ROI size may be determined accordingly. This can be decided algorithmically in software and may vary freely with the intention of the system designer, so the embodiment of the present invention is not limited to any one form.

FIG. 2 is a block diagram illustrating the detailed structure of the image analysis device of FIG. 1.

As shown in FIG. 2, the image analysis device 120 of FIG. 1 according to an embodiment of the present invention includes some or all of a communication interface unit 200, a control unit 210, a multi-ROI image processing unit 220, and a storage unit 230.

Here, "includes some or all of" means that the image analysis device 120 may be configured with some components, such as the storage unit 230, omitted, or that the multi-ROI image processing unit 220 may be integrated into another component such as the control unit 210. To aid a full understanding of the invention, the description below assumes that all of the components are included.

The communication interface unit 200 communicates with the photographing device 100 and the control device 130 via the communication network 110 of FIG. 1. In doing so, the communication interface unit 200 may perform operations such as modulation/demodulation, encoding/decoding, multiplexing/demultiplexing, and scaling to convert resolution. As these are well known to those skilled in the art, further description is omitted.

When a captured image is provided by the photographing device 100, the communication interface unit 200 passes it to the control unit 210. It also receives from the control unit 210 the event occurrence information produced as the image analysis result of the multi-ROI image processing unit 220 and transmits it to the control device 130.

The control unit 210 is responsible for overall control of the communication interface unit 200, the multi-ROI image processing unit 220, and the storage unit 230 of FIG. 2. When a captured image of the photographing device 100 is received through the communication interface unit 200, the control unit 210 provides it to the multi-ROI image processing unit 220. In this process, the image data of the captured image may be temporarily stored in the storage unit 230 before being provided to the multi-ROI image processing unit 220.

In addition, the control unit 210 controls the communication interface unit 200 to provide the analysis result of the multi-ROI image processing unit 220 to the control device 130. For example, when analysis of the captured image from the photographing device 100 of a specific channel among the plurality of photographing devices 100 indicates that an event has occurred, the control unit 210 receives this from the multi-ROI image processing unit 220 in the form of event information and notifies the control device 130. As a result, the control operator can learn at the control device 130 that an event such as an incident or accident has occurred in the surveillance area monitored by that photographing device 100.

Furthermore, the control unit 210 may perform operations for setting multiple ROIs according to an embodiment of the present invention. For example, when the control operator uses the manager device 140 to set multiple ROIs in the areas of a specific photographing device 100's surveillance area where objects of interest appear, the control unit 210 may take part in that operation. The multiple ROIs may be set in various ways according to the performance of the DCNN-based object detector used in the multi-ROI image processing unit 220. Typically, the number of object detectors may be determined by the number of ROIs, and the set size of each region of interest may be determined by the performance of the object detector.

When the control operator sets multiple ROIs for a specific surveillance area, for example through the manager device 140, the control unit 210 may store the corresponding information in the storage unit 230 and provide it when requested by the multi-ROI image processing unit 220. Alternatively, the information may be stored in, for example, a software registry inside the program of the multi-ROI image processing unit 220 and used during image analysis.

The multi-ROI image processing unit 220 operates so that objects of interest, in particular distant objects, are detected effectively when the captured image received from the photographing device 100 is analyzed, more precisely during DCNN-based object detection and detection of events (of interest) for the detected objects. To this end, the multi-ROI image processing unit 220 has the multiple ROIs set in consideration of, for example, the input image resolution of the DCNN-based object detector. This presupposes that an ROI image for each region of interest is extracted from the captured image and that its resolution is converted within a specified range. Because resolution conversion can prevent distant objects from being detected properly, and object detection is easy only when a resolution conversion of fixed performance is applied, distant objects are detected effectively by applying multiple ROIs, rather than relying on the resolution conversion factor, to match the input image resolution of the object detector.

Accordingly, the multi-ROI image processing unit 220 extracts, from the input captured image, the image of the region of interest corresponding to each ROI based on the multi-ROI setting information set, for example, by the control operator, converts the resolution of each extracted image, and provides the resolution-converted ROI image to an object detector to detect objects. Here, object detection means determining whether candidate objects of various kinds, such as people and things, are genuine objects, and a deep-learning, that is DCNN-based, object detector is used to increase accuracy. Since the area actually monitored by the control operator corresponds to the whole of the multiple ROIs, the multi-ROI image processing unit 220 merges the objects detected by the respective object detectors. Merging is performed on a per-frame basis. In this process, the same object found in mutually overlapping regions is treated as one object. When bounding boxes are set for object tracking, the bounding boxes of the same object are likewise merged into a new bounding box. Object tracking is then performed on the picture based on the objects merged per frame, and the occurrence of an event may be detected in the process. An event may be determined by comparison against rule-based settings of the relationships between objects or between an object and a thing, or by deep-learning-based analysis, so the embodiment is not limited to either form. The difference is that the rule-based approach does not require big data, whereas deep learning uses it.

As described above, the multi-ROI image processing unit 220 detects objects using multiple ROIs and merges them again to determine whether an event has occurred, which allows it to economize on designated resources or on the cost of object detectors, or to increase the data processing speed.

The storage unit 230 may store, and subsequently output, information or data processed under the control of the control unit 210. Here, information refers to control signals such as requests and responses, whereas data refers to content such as the image data of a captured image; because the two terms are sometimes used interchangeably, the embodiment of the present invention is not limited by this distinction. For example, when a control operator or the like sets an ROI, the storage unit 230 may store the corresponding setting information.

Meanwhile, the control unit 210 of FIG. 2 may include a CPU and a memory, which may be formed as a single chip. The CPU includes a control circuit, an arithmetic logic unit (ALU), an instruction interpreter, a registry, and the like, and the memory may include RAM. The control circuit performs control operations, the ALU performs operations on binary bit information, the instruction interpreter translates between high-level language and machine language, and the registry is involved in software data storage. With this configuration, at initial startup of the image analysis device 120 or when otherwise needed, the program or algorithm stored in the multi-ROI image processing unit 220 may be loaded into the memory and executed from there, which can increase the processing speed.

FIG. 3 is a block diagram illustrating another detailed structure of the image analysis device of FIG. 1 (or the multi-ROI image processing unit of FIG. 2). FIG. 4 shows an example of setting ROIs and an event detection area. FIG. 5 shows the ROI images corresponding to ROI-1 and ROI-2 of FIG. 4. FIG. 6 shows an example in which the DCNN-based object detection unit detects the bounding boxes of pedestrian objects in the ROI-1 image of FIG. 5. FIG. 7 illustrates the correspondence between the bounding box coordinates of an object in the input video frame and its bounding box coordinates in the normalized ROI image. FIG. 8 shows an example in which the object bounding boxes detected in ROI-1 and ROI-2 of FIG. 4 are integrated with respect to the input video frame.

As shown in FIG. 3, the image analysis device 120' according to another embodiment of the present invention includes some or all of a video frame acquisition unit 300, an ROI image processing unit 310, an object bounding box integration unit 320, an object tracking unit 330, and an event detection unit 340. The ROI image processing unit 310 may in turn include some or all of an ROI image extraction unit 311, an ROI image pre-processing unit 312, a DCNN-based object detection unit 313, and an object bounding box coordinate conversion unit 314.

Here, "includes some or all of" means that the above components may be implemented in hardware (H/W), software (S/W), or a combination thereof, with some components omitted or some components integrated into others.

Assume that, when operating the CCTV image analysis device 120' of FIG. 3, the user sets at least one rectangular ROI for detecting objects of interest and at least one polygonal event detection area for detecting events of interest. FIG. 4 shows an example in which two ROIs (rectangles drawn with solid lines) and one event detection area (a polygon drawn with a dotted line) are set near a crosswalk in order to detect a "pedestrian entering the crosswalk" event in a CCTV image.

With the ROIs and the event detection area set as in the example of FIG. 4, the CCTV image analysis device 120' of FIG. 3 operates as follows.

In FIG. 3, the video frame acquisition unit 300 continuously acquires video frames from a video source, such as a CCTV camera, connected to the CCTV image analysis device 120'. For example, when the video source is an IP camera, the video frame acquisition unit 300 continuously receives and decodes the encoded video stream from the IP camera to obtain video frames in a YUV or RGB pixel format.

The ROI image extraction unit 311 extracts, from the input video frame acquired by the video frame acquisition unit 300, the image corresponding to each ROI preset by the user, for example a control operator (hereinafter, the ROI image). Parts (a) and (b) of FIG. 5 show the ROI images corresponding to ROI-1 and ROI-2 of FIG. 4. The ROI image pre-processing unit 312 of FIG. 3 processes each ROI image obtained by the ROI image extraction unit 311 to generate the normalized ROI image data used as input to the DCNN-based object detection unit 313. A normalized ROI image has the following properties. First, its resolution matches the input resolution of the object detection DCNN included in the DCNN-based object detection unit 313. Second, its data typically consists of three channels, R, G, and B.
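As a minimal sketch of the extraction and normalization steps described above, the following Python code crops a rectangular ROI from a frame and resizes it to a detector input size. The function names, the `(x, y, w, h)` ROI layout, the nearest-neighbour resampling, and the tiny test frame are all assumptions made for illustration; the source does not specify the resampling method, and a practical system would likely use an image-processing library instead.

```python
def extract_roi(frame, roi):
    """Crop an ROI from a frame stored as rows of (R, G, B) pixels.

    roi is a hypothetical (x, y, w, h) rectangle in frame pixels.
    """
    x, y, w, h = roi
    return [row[x:x + w] for row in frame[y:y + h]]


def normalize_roi(roi_image, net_w, net_h):
    """Resize an ROI image to net_w x net_h by nearest-neighbour sampling.

    The output resolution is meant to equal the detector's input resolution.
    """
    h, w = len(roi_image), len(roi_image[0])
    return [
        [roi_image[r * h // net_h][c * w // net_w] for c in range(net_w)]
        for r in range(net_h)
    ]


# Tiny hypothetical frame: 6 rows x 8 columns, pixel value encodes (row, col, 0).
frame = [[(r, c, 0) for c in range(8)] for r in range(6)]
roi_image = extract_roi(frame, (2, 1, 4, 4))   # 4 x 4 crop starting at x=2, y=1
net_input = normalize_roi(roi_image, 2, 2)     # pretend the detector takes 2 x 2
```

Each configured ROI would go through the same two calls, yielding one normalized image per ROI for the detector.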

The DCNN-based object detection unit 313 of FIG. 3 uses the object detection DCNN to obtain object detection results from the normalized ROI image. Each result includes the bounding box coordinates and the object type (or object class) of a detected object. FIG. 6 shows an example of the pedestrian detection result (FIG. 6(b)) obtained by the DCNN-based object detection unit 313 from a normalized ROI image (FIG. 6(a)).

Various object detection DCNN models have been published to date, notably Faster R-CNN, YOLO (You Only Look Once), and SSD (Single Shot MultiBox Detector). The DCNN-based object detection unit 313 performs object detection using an object detection DCNN obtained by training such a model.

The object bounding box coordinate conversion unit 314 of FIG. 3 converts the object bounding box coordinates obtained by the DCNN-based object detection unit 313, which are relative to the normalized ROI image, into coordinates relative to the input video frame. As shown in FIG. 7, <Equation 1> expresses the relationship between the bounding box coordinates 721 of a given object relative to the normalized ROI image 720 and the bounding box coordinates 711 of the same object relative to the input video frame 700, in terms of the ROI coordinates 710 relative to the input video frame 700 and the resolution of the normalized ROI image 720 (that is, the input resolution of the object detection DCNN).
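The coordinate conversion described here can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the patent's own notation: the ROI is assumed to be given as `(rx, ry, rw, rh)` (top-left corner and size in frame pixels), the detector input resolution as `(nw, nh)`, and a bounding box as `(x1, y1, x2, y2)`. A box is mapped back by scaling it with the ratio of the ROI size to the detector input size and then offsetting it by the ROI origin.

```python
def to_frame_coords(box, roi, net_size):
    """Map a box from normalized-ROI-image coordinates to frame coordinates.

    box      -- (x1, y1, x2, y2) in the normalized ROI image (assumed layout)
    roi      -- (rx, ry, rw, rh) of the ROI in the input frame (assumed layout)
    net_size -- (nw, nh), the detector input resolution
    """
    rx, ry, rw, rh = roi
    nw, nh = net_size
    x1, y1, x2, y2 = box
    sx, sy = rw / nw, rh / nh          # scale factors undoing the resize
    return (rx + x1 * sx, ry + y1 * sy, rx + x2 * sx, ry + y2 * sy)


# Hypothetical numbers: an 800 x 400 ROI at (100, 50) resized to 400 x 200.
frame_box = to_frame_coords((10, 20, 30, 60), (100, 50, 800, 400), (400, 200))
```

With these hypothetical values both scale factors are 2, so the box lands at (120, 90, 160, 170) in frame coordinates.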

Meanwhile, the object bounding box integration unit 320 integrates the object bounding boxes obtained from each ROI with respect to the input video frame. If two or more of the set ROIs overlap, an object located in the overlapping area may yield two or more bounding boxes for the same object, one from each of those ROIs. The object bounding box integration unit 320 therefore checks for overlap between bounding boxes obtained from different ROIs and merges overlapping boxes into one. Overlapping bounding boxes are merged by computing a new bounding box that encloses all of them. FIG. 8 shows an example in which the object bounding box integration unit 320 of FIG. 3 integrates the bounding boxes obtained from ROI-1 (FIG. 8(a)) and ROI-2 (FIG. 8(b)) of FIG. 4 with respect to the input video frame (FIG. 8(c)).
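The merge rule above can be illustrated with a short Python sketch for the two-ROI case. The function names and the `(x1, y1, x2, y2)` box layout are assumptions, and the sketch is deliberately simplified: each ROI-2 box is merged into the first ROI-1 box it overlaps, without the transitive merging a fuller implementation might need.

```python
def overlaps(a, b):
    """True if boxes a and b, given as (x1, y1, x2, y2), share any area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def enclose(a, b):
    """Smallest box that contains both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def merge_two_rois(boxes_roi1, boxes_roi2):
    """Integrate boxes from two ROIs, all already in frame coordinates."""
    merged = list(boxes_roi1)
    for b in boxes_roi2:
        for i in range(len(boxes_roi1)):      # compare only against ROI-1 boxes
            if overlaps(merged[i], b):
                merged[i] = enclose(merged[i], b)
                break
        else:
            merged.append(b)                  # no overlap: keep as its own object
    return merged


# Hypothetical detections: one pedestrian seen by both ROIs, two seen by one each.
result = merge_two_rois(
    [(0, 0, 10, 10), (50, 50, 60, 60)],
    [(5, 5, 15, 15), (100, 100, 110, 110)],
)
```

Because the rule merges on any overlap, two distinct objects standing close together in the overlapping area would also be merged; the sketch follows the stated rule literally.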

The object tracking unit 330 performs multiple object tracking using the object bounding box data obtained from the object bounding box integration unit 320.

Various image-based multiple object tracking methods have been published to date. Among them, methods that track multiple objects using the bounding boxes of objects detected in the input video frame by a separate object detector are called "tracking-by-detection" methods. A representative tracking-by-detection algorithm is SORT (Simple Online and Realtime Tracking). The object tracking unit 330 according to an embodiment of the present invention performs object tracking using such a tracking-by-detection multiple object tracking algorithm.
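To make the tracking-by-detection idea concrete, the following Python sketch associates each frame's detections with existing tracks by greedy intersection-over-union (IoU) matching. It is not SORT: SORT additionally predicts track motion with a Kalman filter and solves the assignment with the Hungarian algorithm. The class name, the threshold value, and the policy of dropping unmatched tracks immediately are assumptions chosen to keep the example short.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class GreedyIouTracker:
    """Toy tracker: each detection takes the unmatched track with the best IoU."""

    def __init__(self, iou_threshold=0.3):
        self.iou_threshold = iou_threshold
        self.tracks = {}       # track id -> box from the previous frame
        self.next_id = 0

    def update(self, boxes):
        """Assign track ids to this frame's boxes; returns {track_id: box}."""
        assigned = {}
        unmatched = dict(self.tracks)
        for box in boxes:
            best_id, best_iou = None, self.iou_threshold
            for tid, tbox in unmatched.items():
                score = iou(box, tbox)
                if score >= best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:            # nothing close enough: start a new track
                best_id = self.next_id
                self.next_id += 1
            else:
                del unmatched[best_id]
            assigned[best_id] = box
        self.tracks = assigned             # tracks with no match are dropped
        return assigned


tracker = GreedyIouTracker()
frame1 = tracker.update([(0, 0, 10, 10), (100, 100, 110, 110)])
frame2 = tracker.update([(101, 100, 111, 110), (1, 0, 11, 10)])
```

In the second frame the detections arrive in a different order but keep their track ids, because each one overlaps its previous-frame box far more than the other track's box.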

The event detection unit 340 detects events caused by tracked objects within the designated event detection area. Various kinds of events may be defined; the simplest example is the event that "an object of a designated type is present in the designated event detection area."
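The simplest event mentioned above, an object of a designated type being present in a polygonal event detection area, could be checked as in the following Python sketch. The ray-casting point-in-polygon test is a standard technique; using the bottom-centre of the bounding box as the object's position, and the `{track_id: (class_name, box)}` data layout, are assumptions not stated in the source.

```python
def point_in_polygon(pt, polygon):
    """Ray-casting test: True if pt lies inside the polygon (list of (x, y))."""
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):                       # edge crosses the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def detect_presence_events(tracks, polygon, wanted_class):
    """Return ids of tracked objects of wanted_class located inside polygon.

    tracks -- {track_id: (class_name, (x1, y1, x2, y2))}, a hypothetical layout
    """
    events = []
    for tid, (cls, box) in tracks.items():
        foot = ((box[0] + box[2]) / 2, box[3])         # bottom-centre of the box
        if cls == wanted_class and point_in_polygon(foot, polygon):
            events.append(tid)
    return events


# Hypothetical event area and tracked objects.
area = [(0, 0), (100, 0), (100, 100), (0, 100)]
tracks = {
    1: ("person", (40, 20, 60, 80)),     # inside the area
    2: ("person", (200, 20, 220, 80)),   # outside the area
    3: ("car", (10, 10, 30, 30)),        # inside, but not the designated type
}
triggered = detect_presence_events(tracks, area, "person")
```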

FIG. 9 is a flowchart of the driving process of the image analysis device according to an embodiment of the present invention.

For convenience of description, referring to FIG. 9 together with FIG. 1, the image analysis device 120 according to an embodiment of the present invention stores setting information for a first region of interest and a second region of interest, each set in an area of the photographing area where objects of interest appear (S900). The ROIs that are set may be determined by the number of object detectors. The number of ROIs is preferably kept equal to the number of object detectors and preferably does not exceed that maximum. In addition, the extent, that is the set size, of each region of interest may be determined by the performance of the object detector.

Above all, what could be set as a single region of interest is instead set as multiple ROIs because this bears on, for example, the data processing speed of DCNN-based object detection. It also addresses the problem that, when the image extracted from a region of interest must unavoidably be resolution-converted to match the input image resolution of the object detector, distant objects may not be detected properly as a result.
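A small worked calculation shows why downscaling hurts distant objects. All numbers below are hypothetical and chosen only for illustration (a 1080-line frame, a detector input 416 pixels high, and a distant pedestrian 24 pixels tall); none of them come from the source.

```python
def apparent_height(obj_h, src_h, net_h):
    """Height in pixels of an object after a src_h-high image is resized to net_h."""
    return obj_h * net_h / src_h


# Whole 1080-line frame squeezed into a 416-line detector input.
full_frame = apparent_height(24, 1080, 416)

# ROI whose height already equals the detector input: no shrinkage.
roi_crop = apparent_height(24, 416, 416)
```

Under these assumptions the pedestrian shrinks to roughly 9 pixels when the whole frame is resized but keeps all 24 pixels when a detector-sized ROI is used, which is the motivation for sizing each ROI to the detector's input resolution.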

Next, based on the stored setting information, the image analysis device 120 extracts a first object image of the first region of interest and a second object image of the second region of interest from the captured image received from the photographing device 100, performs DCNN-based object detection on the extracted first and second object images, merges the objects detected in the first and second regions of interest, and then performs object tracking and event detection (S910).

As mentioned above, the image analysis device 120 performs a merge operation across the multiple ROIs; for example, it integrates the object bounding boxes obtained from each ROI with respect to the input video frame. If two or more of the set ROIs overlap, an object in the overlapping area may yield two or more bounding boxes for the same object from those ROIs. The device checks for overlap between bounding boxes obtained from different ROIs and merges overlapping boxes into one by computing a new bounding box that encloses all of them. Because the device or system holds the coordinate values of each box internally, overlap can be checked by comparing the two sets of coordinates.

Beyond the above, the image analysis device 120 of FIG. 1 may perform various other operations; as these have been described in sufficient detail above, that description applies here as well.

Although all components constituting the embodiments of the present invention have been described as being combined into one or operating in combination, the present invention is not necessarily limited to such embodiments. That is, within the scope of the object of the present invention, one or more of the components may be selectively combined and operated. In addition, although each component may be implemented as an independent piece of hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. The codes and code segments constituting such a computer program can be readily inferred by those skilled in the art. The computer program may be stored in non-transitory computer-readable media and read and executed by a computer, thereby implementing an embodiment of the present invention.

Here, a non-transitory readable recording medium means a medium that stores data semi-permanently and can be read by a device, as opposed to a medium that stores data only briefly, such as a register, cache, or memory. Specifically, the programs described above may be stored and provided on a non-transitory readable recording medium such as a CD, DVD, hard disk, Blu-ray disc, USB device, memory card, or ROM.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described. Various modifications may be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed, and such modifications should not be understood separately from the technical idea or outlook of the present invention.

100: photographing device 110: communication network
120, 120': image analysis device 130: control device
140: manager device 200: communication interface unit
210: control unit 220: multi-ROI image processing unit
230: storage unit 300: video frame acquisition unit
310: ROI image processing unit 320: object bounding box integration unit
330: object tracking unit 340: event detection unit

Claims

A CCTV image analysis device based on multiple ROIs and an object detection DCNN, comprising:
a storage unit for storing setting information of a first region of interest (ROI) and a second region of interest, each set in a region where objects of interest appear within the photographing area of a photographing device; and
a control unit for extracting a first object image of the first region of interest and a second object image of the second region of interest from a captured image received from the photographing device, performing DCNN (Deep Convolutional Neural Network)-based object detection on the extracted first object image and second object image, and performing object tracking and event detection after merging the objects of the first region of interest and the second region of interest detected by the object detection,
wherein, when the first region of interest and the second region of interest overlap, the control unit merges the same object in the overlapping region into one object on a per-frame basis of the captured image,
wherein the control unit generates a new bounding box by merging a first bounding box and a second bounding box respectively detected for the same object in the overlapping region on a per-frame basis, and
wherein the control unit determines the set sizes of the first region of interest and the second region of interest according to the resolution of the object detector used.
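The merging of overlapping-region detections recited in the claim above can be illustrated with a minimal Python sketch. The claim only states that two bounding boxes detected for the same object are merged into a new bounding box per frame; the matching criterion (same label and intersection-over-union above a threshold), the union-box rule, and every name below (Box, iou, merge_roi_detections, iou_threshold) are assumptions made for illustration. Boxes are assumed to be already expressed in full-frame coordinates.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in full-frame pixel coordinates (hypothetical type)."""
    x1: float
    y1: float
    x2: float
    y2: float
    label: str
    score: float


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes; 0.0 when they do not overlap."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def merge_pair(a: Box, b: Box) -> Box:
    """Assumed merge rule: smallest box enclosing both, keeping the higher score."""
    return Box(min(a.x1, b.x1), min(a.y1, b.y1),
               max(a.x2, b.x2), max(a.y2, b.y2),
               a.label, max(a.score, b.score))


def merge_roi_detections(first: List[Box], second: List[Box],
                         iou_threshold: float = 0.5) -> List[Box]:
    """Merge the detections of two ROIs for a single frame.

    Boxes with the same label whose IoU reaches the threshold are treated as
    the same object and replaced by one merged box; all others pass through.
    """
    merged: List[Box] = []
    unused = list(second)
    for a in first:
        best, best_iou = None, iou_threshold
        for b in unused:
            if b.label != a.label:
                continue
            value = iou(a, b)
            if value >= best_iou:
                best, best_iou = b, value
        if best is None:
            merged.append(a)
        else:
            unused.remove(best)
            merged.append(merge_pair(a, best))
    merged.extend(unused)  # detections seen only in the second ROI
    return merged
```

The merged list is what a downstream tracker would consume, so that an object standing in the overlap of the two regions is tracked once rather than twice.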

(deleted)

The device of claim 1,
wherein, when the DCNN-based object detection is performed after resolution conversion of the first object image and the second object image, the control unit uses an object detector with less loss for distant objects.

The device of claim 4,
wherein the control unit includes a region of interest (ROI) image processing unit for the resolution conversion, and the converted resolution matches the resolution of the input image of the DCNN of the object detector.
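The resolution conversion described in the claim above implies a coordinate mapping: once an ROI crop is resized to the detector's input resolution, boxes reported by the detector are in input-image pixels and must be scaled back and offset to the frame before detections from different ROIs can be compared. The following Python sketch shows only that arithmetic. The Roi type, the function names, and the example sizes are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


@dataclass(frozen=True)
class Roi:
    """ROI rectangle in frame pixels: top-left corner plus size (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int


def scale_factors(roi: Roi, input_size: Tuple[int, int]) -> Tuple[float, float]:
    """Horizontal and vertical scale applied when the ROI crop is resized
    to the detector input size given as (width, height)."""
    in_w, in_h = input_size
    return in_w / roi.width, in_h / roi.height


def to_frame_coords(box: Box, roi: Roi, input_size: Tuple[int, int]) -> Box:
    """Map a box from detector-input pixels back to full-frame pixels."""
    sx, sy = scale_factors(roi, input_size)
    x1, y1, x2, y2 = box
    return (roi.x + x1 / sx, roi.y + y1 / sy,
            roi.x + x2 / sx, roi.y + y2 / sy)
```

For example, with an assumed 832x832 ROI whose top-left corner is at (1000, 200) and an assumed 416x416 detector input, the scale factor is 0.5 in both directions, so a detector box of (100, 50, 200, 150) corresponds to (1200, 300, 1400, 500) in the frame.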

(deleted)

A method of driving a CCTV image analysis device based on multiple ROIs and an object detection DCNN, the method comprising:
storing setting information of a first region of interest (ROI) and a second region of interest, each set in a region where objects of interest appear within the photographing area of a photographing device;
a performing step in which a control unit extracts, based on the stored setting information, a first object image of the first region of interest and a second object image of the second region of interest from a captured image received from the photographing device, performs DCNN-based object detection on the extracted first object image and second object image, and performs object tracking and event detection after merging the objects of the first region of interest and the second region of interest detected by the object detection; and
determining the set sizes of the first region of interest and the second region of interest according to the resolution of the object detector used,
wherein the performing step comprises:
when the first region of interest and the second region of interest overlap, merging the same object in the overlapping region into one object on a per-frame basis of the captured image; and
generating a new bounding box by merging a first bounding box and a second bounding box respectively detected for the same object in the overlapping region on a per-frame basis.
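The claim above states that the ROI sizes are determined according to the resolution of the object detector, without giving a formula. One conceivable rule, shown here purely as an assumption, is to cap the ROI side so that the smallest object of interest still spans a minimum number of pixels after the ROI is resized to the detector input. The function name and all three parameters are hypothetical.

```python
def max_roi_side(detector_input_side: int,
                 min_object_px: int,
                 min_detectable_px: int) -> int:
    """Largest square ROI side (in frame pixels) under an assumed sizing rule.

    Resizing an ROI of side S to the detector input scales objects by
    detector_input_side / S. Requiring
        min_object_px * detector_input_side / S >= min_detectable_px
    gives S <= detector_input_side * min_object_px / min_detectable_px.
    """
    if min(detector_input_side, min_object_px, min_detectable_px) <= 0:
        raise ValueError("all sizes must be positive")
    return (detector_input_side * min_object_px) // min_detectable_px
```

Under this rule, a detector with an assumed 416-pixel input that needs objects of at least 10 pixels, watching for objects that appear 20 pixels tall in the frame, would allow ROIs of up to 832 pixels per side. Smaller distant objects would call for a smaller ROI placed over the far part of the scene.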

(deleted)

The method of claim 7,
wherein the performing step uses an object detector with less loss for distant objects when performing the DCNN-based object detection after resolution conversion of the first object image and the second object image.

The method of claim 10,
wherein the resolution conversion is performed by an ROI image processing unit, and the converted resolution matches the resolution of the input image of the DCNN of the object detector.

(deleted)