KR102127276B1

KR102127276B1 - The System and Method for Panoramic Video Surveillance with Multiple High-Resolution Video Cameras

Info

Publication number: KR102127276B1
Application number: KR1020180159267A
Authority: KR
Inventors: 장정훈; 이형주
Original assignee: IntelliVIX Co., Ltd. (주식회사 인텔리빅스)
Priority date: 2018-12-11
Filing date: 2018-12-11
Publication date: 2020-06-26

Abstract

According to one embodiment of the present invention, a panoramic video surveillance method is executed in a video surveillance apparatus connected to multiple high-resolution video cameras. The method comprises: receiving camera videos from the multiple high-resolution video cameras; generating and displaying a low-resolution panoramic video from those camera videos based on a low-resolution panoramic video generation table; and, when a specific position or a specific moving object is designated in the low-resolution panoramic video, generating and displaying a high-resolution digital PTZ video of the designated region by using an ultrahigh-resolution panoramic video generation table. The present invention thereby achieves three things. For panoramic display and analysis, it generates and uses a low-resolution panoramic video instead of an ultrahigh-resolution one. For a high-resolution view of a subject of interest, it generates and uses only the video of the region containing that subject (a high-resolution digital PTZ video) instead of the whole ultrahigh-resolution panoramic video. Together, these make it possible to build an efficient high-resolution panoramic video surveillance system.

Description

Panoramic Video Surveillance with Multiple High-Resolution Video Cameras

The present invention relates to a panoramic video surveillance system and method using a plurality of high-resolution cameras. More specifically, it relates to a video surveillance device with the following capabilities:

- it detects and tracks objects of interest in video received from a plurality of high-resolution cameras;
- it detects designated events on that basis and raises alarms;
- it performs robust and efficient video analysis using an object image recognition DCNN (Deep Convolutional Neural Network).

In general, a video surveillance system installs a camera at a specific location and displays the video captured by that camera at another location, so that the specific location can be monitored remotely.

Among video surveillance systems, a panoramic-video-based surveillance system provides a wide-angle view of 180 or 360 degrees. Its advantage is that a single video can monitor a large area at once. In addition, when moving objects are tracked automatically to automate surveillance, a panoramic video allows objects moving across a large area to be tracked without interruption.

In general, methods of acquiring panoramic video fall into two categories:

- attaching a fisheye lens or a reflector (mirror) to a single high-resolution camera to obtain a 180-degree or 360-degree wide-angle view;
- stitching together the images from a plurality of ordinary cameras arranged radially to obtain a wide-angle panoramic video.

Recently, the "single camera + fisheye lens" method has been widely used in video surveillance systems because it acquires a panoramic video with a relatively simple configuration. However, this method has two problems. First, the overall resolution of the panoramic video is limited by the resolution of the single camera. Second, image distortion grows toward the edge of the lens. For these reasons, this method is mainly used for short-range video surveillance.

In contrast, acquiring a panoramic video with a plurality of ordinary cameras requires a more complex camera configuration and panorama generation process than the "single camera + fisheye lens" method. However, it overcomes the problems of that method, so it can be used for long-range as well as short-range video surveillance.

In general, for a fixed camera angle of view, a higher image resolution yields more detailed information about a distant subject. Therefore, high-resolution cameras are mainly used in wide-area/long-range surveillance systems.

In a panoramic video surveillance system based on a plurality of ordinary cameras, high per-camera resolution means the stitched panoramic video has a very high resolution. For example, suppose five ordinary cameras are arranged horizontally and each produces Full-HD (1920x1080) video. Assuming 15% overlap between neighboring camera images, the maximum resolution of the resulting panoramic video reaches about 8448x1080. Generating such an ultra-high-resolution panoramic video in real time (i.e., at 15 FPS or more) on a general-purpose PC without dedicated hardware requires enormous CPU resources, so real-time processing is very difficult in practice. A panoramic video surveillance system based on a plurality of high-resolution cameras is therefore difficult to operate in the conventional manner.
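The 8448x1080 figure follows from a simple overlap model: the first camera contributes its full width, and each further camera adds only its non-overlapping part. The short Python sketch below reproduces that arithmetic. It also shows the pixel rate implied by the 15 FPS real-time target. The quarter-scale panorama in the last line is a hypothetical comparison point, not a value from the source.

```python
def panorama_width(num_cams: int, cam_width: int, overlap: float) -> int:
    """Width of a horizontal panorama stitched from num_cams equal cameras.

    The first camera contributes its full width; every further camera adds
    only the part that does not overlap its neighbour.
    """
    return round(cam_width + (num_cams - 1) * cam_width * (1 - overlap))


width = panorama_width(5, 1920, 0.15)  # five Full-HD cameras, 15% overlap
height = 1080
pixels_per_frame = width * height
pixels_per_second = pixels_per_frame * 15  # at the 15 FPS real-time target
# Hypothetical quarter-scale (per axis) panorama, for comparison only
low_res_pixels = (width // 4) * (height // 4)

print(width)                               # 8448
print(pixels_per_frame)                    # 9123840
print(pixels_per_second)                   # 136857600
print(pixels_per_frame // low_res_pixels)  # 16
```

Roughly 137 million stitched pixels per second must be produced before any analysis starts. Scaling each axis by 1/4 cuts the per-frame work by a factor of 16.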

Meanwhile, the number of CCTV cameras installed by government offices and enterprises for security and safety has grown explosively in recent years. However, the number of personnel monitoring CCTV video falls far short of the number of installed cameras. To address this problem, intelligent CCTV video surveillance systems are being actively introduced. At the core of such a system is a CCTV video analysis device. It receives video from CCTV cameras, detects and tracks moving objects, and on that basis automatically detects abnormal situations such as "intrusion into a restricted area" and raises an alarm. Operators then need to check only the CCTV video for which an alarm has been raised, instead of constantly watching many (mostly uneventful) feeds. This lets them monitor a large number of CCTV cameras effectively.
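An "intrusion into a restricted area" rule can be evaluated on the output of an object tracker with a point-in-polygon test. The Python sketch below is a minimal illustration under stated assumptions, not the device's actual rule engine. It assumes tracks are given as axis-aligned boxes `(x, y, w, h)`. It also assumes the bottom-center of the box (the object's "foot point") is the point tested against the zone. The function names are hypothetical. The polygon test is the standard even-odd ray-casting method.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # x, y, width, height


def point_in_polygon(p: Point, polygon: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray going right from p."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y (so y1 != y2)
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def intrusion_alarm(track_boxes: Dict[int, Box], zone: List[Point]) -> List[int]:
    """Return ids of tracks whose foot point lies inside the restricted zone."""
    alarms = []
    for track_id, (x, y, w, h) in track_boxes.items():
        foot = (x + w / 2, y + h)  # bottom-center of the bounding box
        if point_in_polygon(foot, zone):
            alarms.append(track_id)
    return alarms


zone = [(100, 100), (300, 100), (300, 300), (100, 300)]
tracks = {1: (150, 150, 20, 60), 2: (400, 50, 20, 60)}
print(intrusion_alarm(tracks, zone))  # [1]
```

Testing the foot point rather than the box center reduces alarms caused by tall objects that merely overlap the zone in the image plane. That choice is an assumption of this sketch.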

However, most existing CCTV video analysis devices use motion-based object detection algorithms. Besides detecting actual objects of interest (typically people and vehicles), they therefore frequently produce false detections from various causes, for example:

- branches swaying in the wind;
- rippling water;
- moving shadows;
- sudden lighting changes;
- flickering lights;
- snow or rain.

These false detections in turn cause frequent false alarms, which make efficient monitoring impossible.

Since the mid-2000s, computer vision researchers have also developed object detection techniques based on learning object shape, as an alternative to motion-based detection. These techniques extract and learn shape features from many training images of a specific object type (e.g., pedestrians). They then detect objects by searching the image for regions whose shape features resemble the learned ones. Representative techniques include Viola-Jones, HOG, ICF, ACF, and DPM. However, their limited detection performance and high processing load made them difficult to apply to commercial CCTV video analysis devices.

Then, in 2012, Professor G. Hinton's team at the University of Toronto, Canada, won the ILSVRC (ImageNet Large Scale Visual Recognition Challenge) with a DCNN called AlexNet. Its margin over existing image recognition algorithms was overwhelming. Deep learning then began to attract attention in computer vision, and many attempts to solve computer vision problems with deep learning have followed.

DCNN-based object detection techniques began to appear in 2014. They provide detection performance far exceeding that of earlier object detection techniques. Representative techniques include Fast/Faster R-CNN, R-FCN, SSD, and YOLO.

However, several limitations still hinder applying DCNN-based object detection to commercial CCTV video analysis devices. A representative one is the very high hardware cost of processing video in real time with a DCNN-based object detector. A typical DCNN-based detector requires a considerable amount of computation even for a single video frame. Real-time processing typically requires running object detection on at least 7 frames per second. Meeting that rate with such a detector on an ordinary CPU is very difficult. A GPU capable of massively parallel computation is therefore essential for real-time video processing with a DCNN-based object detector. Even with a GPU, a single video analysis device struggles to process multiple video streams in real time at the same time unless the GPU is an expensive, high-performance model.

Patent literature:
Korean Patent Publication No. 10-2006-0044129 (2000.05.16)
Korean Patent Publication No. 10-2005-0045845 (2005.05.17)
Korean Patent Publication No. 10-2005-0046822 (2005.05.18)
Korean Patent Publication No. 10-2009-0121102 (2009.10.26)
Korean Patent Publication No. 10-2018-0072561 (2018.06.29)
Korean Patent Publication No. 10-2018-0107930 (2018.10.04)
Korean Registered Patent No. 10-1850286 (2018.04.13)
Korean Registered Patent No. 10-1040049 (2011.06.02)
Korean Registered Patent No. 10-1173853 (2012.08.08)
Korean Registered Patent No. 10-1178539 (2012.08.24)
Korean Registered Patent No. 10-1748121 (2017.06.12)
Korean Registered Patent No. 10-1789690 (2017.10.18)
Korean Registered Patent No. 10-1808587 (2017.12.07)

Non-patent literature:
"Pedestrian Detection: An Evaluation of the State of the Art", IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, Issue 4, Apr. 2012
"Efficient Processing of Deep Neural Networks: A Tutorial and Survey", Proceedings of the IEEE, Vol. 105, Issue 12, Dec. 2017
"Object Detection with Deep Learning: A Review", arXiv.org, arXiv:1807.05511 [cs.CV], Jul. 2018

An object of embodiments of the present invention is to provide a video surveillance method, and an apparatus executing it, for building an efficient high-resolution panoramic video surveillance system with a plurality of high-resolution cameras. For displaying and analyzing the panoramic video, a low-resolution panoramic video is generated and used instead of an ultra-high-resolution one. For displaying a high-resolution view of a subject of interest, only the video of the region corresponding to that subject (a high-resolution digital PTZ video) is generated and used, instead of the whole ultra-high-resolution panoramic video.

Another object of embodiments of the present invention is to provide a panoramic video surveillance system and method using a plurality of high-resolution cameras. The system takes the form of a video surveillance device that detects and tracks objects of interest in video received from the cameras, detects designated events on that basis, and raises alarms. It performs robust and efficient video analysis using an object image recognition DCNN.

The problems to be solved by the present invention are not limited to those mentioned above; other problems not mentioned will be clearly understood by those skilled in the art from the following description.

In one embodiment, a video surveillance method is executed in a video surveillance apparatus connected to a plurality of high-resolution cameras. The cameras are arranged radially and installed so that their fields of view partially overlap. The method includes:

- receiving high-resolution camera video from the plurality of high-resolution cameras;
- generating and displaying a low-resolution panoramic video from the high-resolution camera video, based on a low-resolution panoramic video generation table;
- when a specific location or a specific moving object is designated in the low-resolution panoramic video, generating and displaying a high-resolution digital PTZ video of the designated region by using an ultra-high-resolution panoramic video generation table.

In one embodiment, a video surveillance apparatus is connected to a plurality of high-resolution cameras that are arranged radially and installed so that their fields of view partially overlap. The apparatus includes:

- a video input unit that receives high-resolution camera video from the plurality of high-resolution cameras;
- a low-resolution panoramic video generation unit that generates a low-resolution panoramic video from the high-resolution camera video, based on a low-resolution panoramic video generation table;
- a high-resolution digital PTZ video generation unit that, when a specific location or a specific moving object is designated in the low-resolution panoramic video, generates a high-resolution digital PTZ video of the designated region by using an ultra-high-resolution panoramic video generation table.
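One way to picture the two generation tables is as precomputed lookup tables. For every output pixel, the table stores which camera and which source pixel to copy, so rendering reduces to a single gather. The NumPy sketch below illustrates that idea under loud assumptions. It is not the patent's table construction, and every function name in it is hypothetical:

- Camera images are treated as already rectified and offset purely horizontally, with no warping or blending.
- The low-resolution table is obtained by striding the full table (nearest-neighbour sampling).
- The digital PTZ view renders only a slice of the full table, scaled up from a region chosen in low-resolution coordinates.

```python
import numpy as np


def build_full_table(num_cams, cam_w, cam_h, overlap):
    """Hypothetical ultra-high-resolution table: (camera, src_x, src_y) per pixel.

    Assumes rectified cameras offset purely horizontally (no warping).
    """
    step = int(round(cam_w * (1 - overlap)))     # non-overlapping width per camera
    pano_w = cam_w + step * (num_cams - 1)
    xs = np.arange(pano_w)
    cam = np.minimum(xs // step, num_cams - 1)   # camera supplying each column
    src_x = xs - cam * step
    ys = np.arange(cam_h)
    cam_tbl = np.broadcast_to(cam, (cam_h, pano_w))
    sx_tbl = np.broadcast_to(src_x, (cam_h, pano_w))
    sy_tbl = np.broadcast_to(ys[:, None], (cam_h, pano_w))
    return cam_tbl, sx_tbl, sy_tbl


def low_res_table(full_table, stride):
    """Low-resolution table: every stride-th entry of the full table."""
    return tuple(t[::stride, ::stride] for t in full_table)


def render(frames, table):
    """Gather output pixels; frames has shape (num_cams, cam_h, cam_w, channels)."""
    cam_tbl, sx_tbl, sy_tbl = table
    return frames[cam_tbl, sy_tbl, sx_tbl]


def digital_ptz(frames, full_table, roi_low, stride):
    """Render only a region of interest at full resolution.

    roi_low = (x0, y0, x1, y1) in low-resolution panorama coordinates.
    """
    x0, y0, x1, y1 = (v * stride for v in roi_low)
    sub = tuple(t[y0:y1, x0:x1] for t in full_table)
    return render(frames, sub)


# Three 8x4 "cameras", each filled with its own index, 25% overlap.
frames = np.stack([np.full((4, 8, 3), i, dtype=np.uint8) for i in range(3)])
full = build_full_table(3, 8, 4, 0.25)        # panorama width 8 + 6 + 6 = 20
pano_low = render(frames, low_res_table(full, 2))
ptz = digital_ptz(frames, full, (3, 0, 5, 2), 2)
print(pano_low.shape, ptz.shape)  # (2, 10, 3) (4, 4, 3)
```

Neither the low-resolution panorama nor the PTZ view ever materializes the full-resolution panorama. Only the pixels actually displayed or analyzed are gathered from the camera frames.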

In addition, an embodiment of the present invention provides a panoramic video surveillance system using a plurality of high-resolution cameras. The system performs video analysis using a DCNN (Deep Convolutional Neural Network) that provides recognition results for objects in an image, and includes two units.

- A video conversion unit. It receives, from the high-resolution cameras, first video data consisting of a plurality of first video frames having a first pixel format, a first resolution, and a first frame rate. It converts this data into second video data having a second resolution lower than the first resolution, a second frame rate lower than the first frame rate, and a second pixel format different from the first pixel format.
- A motion-based video analysis unit. It performs motion-based detection and tracking of a plurality of moving objects in the video frames of the converted second video data, and extracts an image of each tracked object. It classifies the tracked objects using the recognition results obtained by inputting the extracted object images into the DCNN. It identifies, among the classified tracked objects, user non-interest objects that fall outside designated criteria, and removes them from the set of tracked objects. Finally, it detects events based on designated rules and the tracking information of the remaining user-interest objects.

The video conversion unit may include:

- a video frame acquisition unit that receives the plurality of first video frames input at the first frame rate;
- a video frame subsampling unit that samples fewer video frames than were input, to generate a plurality of second video frames;
- a video frame scaling unit that converts the generated second video frames to the second resolution;
- a pixel format conversion unit that converts the second video frames, now at the second resolution, from the first pixel format into the second pixel format, and provides the resulting second video data to the motion-based video analysis unit.
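The subsampling and scaling steps can be sketched in a few lines of NumPy. The sketch makes two assumptions not found in the source: uniform frame decimation via an accumulator, and box-filter (area-average) downscaling by an integer factor. Both are common choices, not necessarily the ones used in the embodiment. The function names are hypothetical, and the pixel format conversion step is omitted here.

```python
import numpy as np


def subsample_frames(frames, in_fps, out_fps):
    """Keep about out_fps of every in_fps frames, spread evenly (decimation)."""
    if out_fps >= in_fps:
        return list(frames)
    kept, acc = [], 0.0
    for f in frames:
        acc += out_fps
        if acc >= in_fps:      # emit one frame each time the budget fills up
            acc -= in_fps
            kept.append(f)
    return kept


def downscale(frame, factor):
    """Box-filter downscale by an integer factor (crops any remainder)."""
    h, w = frame.shape[:2]
    h2, w2 = h // factor, w // factor
    f = frame[:h2 * factor, :w2 * factor].astype(np.float32)
    # group pixels into factor x factor blocks and average each block
    f = f.reshape(h2, factor, w2, factor, *frame.shape[2:]).mean(axis=(1, 3))
    return np.rint(f).astype(frame.dtype)


# 30 grayscale 8x8 frames at 30 FPS, frame i filled with value i
frames = [np.full((8, 8), i, dtype=np.uint8) for i in range(30)]
kept = subsample_frames(frames, in_fps=30, out_fps=10)
small = [downscale(f, 4) for f in kept]
print(len(kept), small[0].shape, int(small[0][0, 0]))  # 10 (2, 2) 2
```

Decimating before scaling means frames that will be dropped are never resized, which matches the order of units described above.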

The motion-based video analysis unit may include the following.

- A motion region detection unit. It generates a difference image from the second video data and a learned background image, then removes noise from the difference image to detect the motion regions of the plurality of moving objects.
- An object tracking unit. It detects and tracks the plurality of moving objects using the detected motion regions and the second video data.
- A tracked object classification unit. It classifies the tracked objects based on two recognition results. The first is obtained by applying to the DCNN the tracked-object images extracted from one of the second video frames. The second is obtained by applying to the DCNN the tracked-object images extracted from at least one other second video frame, separated from that frame by a predetermined time interval.
- A non-interest tracked object removal unit. It identifies, among the classified tracked objects, user non-interest objects that fall outside designated criteria, and removes them from the set of objects tracked by the object tracking unit.
- An interest-tracked-object-based event detection unit. It detects events in which the tracking information of the user-interest objects remaining after the removal satisfies a designated rule.
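The motion region detection step (difference image against a learned background, then noise removal) can be sketched with plain NumPy. The background model here is a running average, and the noise removal is a 3x3 binary morphological opening (erosion followed by dilation). Both are assumed, commonly used choices rather than the embodiment's documented algorithm. The threshold value and function names are likewise hypothetical. Grouping the resulting mask into per-object regions for the tracker is not shown.

```python
import numpy as np


def update_background(bg, frame, alpha=0.05):
    """Running-average background model (one possible 'learned background')."""
    return (1 - alpha) * bg + alpha * frame.astype(np.float32)


def _shifts(mask):
    """The 9 shifted copies of mask covering a 3x3 neighbourhood (zero-padded)."""
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            yield p[dy:dy + h, dx:dx + w]


def erode(mask):
    return np.logical_and.reduce(list(_shifts(mask)))


def dilate(mask):
    return np.logical_or.reduce(list(_shifts(mask)))


def motion_mask(bg, frame, thresh=25):
    """Difference image -> threshold -> 3x3 opening to remove speckle noise."""
    diff = np.abs(frame.astype(np.float32) - bg)
    mask = diff > thresh
    return dilate(erode(mask))


bg = np.zeros((10, 10), dtype=np.float32)
frame = np.zeros((10, 10), dtype=np.uint8)
frame[3:7, 3:7] = 200   # a moving object
frame[0, 9] = 200       # an isolated noise pixel
mask = motion_mask(bg, frame)
bg = update_background(bg, frame)  # the background keeps adapting over time
print(int(mask.sum()))  # 16
```

The opening keeps the 4x4 object intact but removes the single-pixel speck. Slow scene changes are absorbed into the background through `alpha`.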

The tracked object classification unit may convert the first and second recognition results into scores and accumulate them, then determine the class with the highest accumulated score as the class of the tracked object.
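The score accumulation rule can be sketched as a per-track tally. The sketch assumes each DCNN recognition result is given as a mapping from class name to score (for example, a softmax probability). The class names and the `TrackClassAccumulator` helper are hypothetical.

```python
from collections import defaultdict


class TrackClassAccumulator:
    """Accumulate per-class DCNN scores over several frames of one track."""

    def __init__(self):
        self.scores = defaultdict(float)

    def add(self, recognition):
        # recognition: mapping class name -> score for one frame
        for cls, score in recognition.items():
            self.scores[cls] += score

    def decided_class(self):
        # class with the highest accumulated score, or None if nothing seen yet
        return max(self.scores, key=self.scores.get) if self.scores else None


acc = TrackClassAccumulator()
acc.add({"person": 0.4, "vehicle": 0.5, "background": 0.1})  # first frame
acc.add({"person": 0.8, "vehicle": 0.1, "background": 0.1})  # a later frame
print(acc.decided_class())  # person
```

A single ambiguous frame (here, the first one favours "vehicle") is outvoted once later frames are accumulated.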

The non-interest tracked object removal unit may identify, among objects of the determined class, user non-interest tracked objects that fall outside the designated criteria, and remove them from the set of tracked objects.
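The source does not say what the "designated criteria" are. The sketch below therefore assumes, purely for illustration, that a user specifies the classes of interest and an allowed bounding-box height range for each class. Any track outside those criteria is dropped. `Track`, `CRITERIA`, and the helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    cls: str       # class decided from the accumulated DCNN scores
    height: float  # bounding-box height in pixels


# Hypothetical user criteria: classes of interest and allowed box heights.
CRITERIA = {"person": (20, 400), "vehicle": (15, 600)}


def is_of_interest(track, criteria=CRITERIA):
    if track.cls not in criteria:
        return False
    lo, hi = criteria[track.cls]
    return lo <= track.height <= hi


def remove_non_interest(tracks, criteria=CRITERIA):
    return [t for t in tracks if is_of_interest(t, criteria)]


tracks = [Track(1, "person", 80), Track(2, "background", 50), Track(3, "person", 10)]
print([t.track_id for t in remove_non_interest(tracks)])  # [1]
```

Track 2 is dropped because its class is not of interest. Track 3 is dropped because its size is implausible for a person under the assumed criteria.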

The video conversion unit may convert the first pixel format into the second pixel format. The first pixel format is a YUV format consisting of a luminance signal (Y), a blue-difference chroma signal (U), and a red-difference chroma signal (V). The second pixel format is an RGB (Red-Green-Blue) or grayscale pixel format.
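A YUV-to-RGB/grayscale conversion can be sketched as follows. The coefficients are the full-range BT.601 (JFIF) ones. Real cameras may instead deliver limited-range BT.601 or BT.709 video, which needs different constants, so treat this as an assumption. Grayscale output simply reuses the luma plane. Camera streams are usually chroma-subsampled (4:2:0), so a nearest-neighbour chroma upsampler is included. The function names are hypothetical.

```python
import numpy as np


def yuv_to_rgb(y, u, v):
    """Full-range BT.601 (JFIF) YUV -> RGB for same-size planes (4:4:4)."""
    y = y.astype(np.float32)
    u = u.astype(np.float32) - 128.0   # blue-difference chroma
    v = v.astype(np.float32) - 128.0   # red-difference chroma
    r = y + 1.402 * v
    g = y - 0.344136 * u - 0.714136 * v
    b = y + 1.772 * u
    rgb = np.stack([r, g, b], axis=-1)
    return np.clip(np.rint(rgb), 0, 255).astype(np.uint8)


def yuv_to_gray(y):
    """Grayscale output: the luma plane already is the gray image."""
    return y.copy()


def upsample_420(plane):
    """Nearest-neighbour 2x upsampling of a 4:2:0 chroma plane to full size."""
    return plane.repeat(2, axis=0).repeat(2, axis=1)


y = np.array([[128, 76]], dtype=np.uint8)   # neutral gray, then (almost) pure red
u = np.array([[128, 85]], dtype=np.uint8)
v = np.array([[128, 255]], dtype=np.uint8)
rgb = yuv_to_rgb(y, u, v)
gray = yuv_to_gray(y)
print(rgb.tolist())  # [[[128, 128, 128], [254, 0, 0]]]
```

For grayscale analysis the conversion is essentially free, because no chroma arithmetic is needed.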

In addition, an embodiment of the present invention provides a panoramic video surveillance method using a plurality of high-resolution cameras. It is the video surveillance method of a panoramic video surveillance system that performs video analysis using a DCNN providing recognition results for objects in an image. The method includes two steps.

- Converting, by the system's video conversion unit. The unit receives from the high-resolution cameras first video data consisting of a plurality of first video frames having a first pixel format, a first resolution, and a first frame rate. It converts this data into second video data having a second resolution lower than the first resolution, a second frame rate lower than the first frame rate, and a second pixel format different from the first pixel format.
- Detecting events, by the system's motion-based video analysis unit. The unit performs motion-based detection and tracking of a plurality of moving objects in the video frames of the converted second video data, and extracts an image of each tracked object. It classifies the tracked objects using the recognition results obtained by inputting the extracted object images into the DCNN. It identifies, among the classified tracked objects, user non-interest objects that fall outside designated criteria, and removes them from the set of tracked objects. It then detects events based on designated rules and the tracking information of the remaining user-interest objects.

The video conversion unit includes a video frame acquisition unit, a video frame subsampling unit, a video frame scaling unit, and a pixel format conversion unit. The converting into the second video data may include:

- receiving, by the video frame acquisition unit, the plurality of first video frames input at the first frame rate;
- sampling, by the video frame subsampling unit, fewer video frames than were input, to generate a plurality of second video frames;
- converting, by the video frame scaling unit, the generated second video frames to the second resolution;
- converting, by the pixel format conversion unit, the second video frames at the second resolution from the first pixel format into the second pixel format, and providing the resulting second video data to the motion-based video analysis unit.

The motion-based video analysis unit includes a motion region detection unit, an object tracking unit, a tracked object classification unit, a non-interest tracked object removal unit, and an interest-tracked-object-based event detection unit. The detecting of the event may include the following steps.

- Detecting motion regions, by the motion region detection unit. A difference image is generated from the second video data and a learned background image, and noise is removed from it to detect the motion regions of the plurality of moving objects.
- Detecting and tracking, by the object tracking unit, the plurality of moving objects using the detected motion regions and the second video data.
- Classifying, by the tracked object classification unit, the tracked objects based on two recognition results. The first is obtained by applying to the DCNN the tracked-object images extracted from one of the second video frames. The second is obtained by applying to the DCNN the tracked-object images extracted from at least one other second video frame, separated from that frame by a predetermined time interval.
- Removing non-interest objects, by the non-interest tracked object removal unit. User non-interest objects that fall outside designated criteria are identified among the classified tracked objects and removed from the set of objects tracked by the object tracking unit.
- Detecting events, by the interest-tracked-object-based event detection unit, in which the tracking information of the user-interest objects remaining after the removal satisfies a designated rule.

In classifying the plurality of tracking objects, the first and second recognition results may each be converted into scores and accumulated, and the type with the highest accumulated score may be confirmed as the type of the tracking object.
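The score accumulation described above can be sketched as a per-type sum over the recognition results of several frames. This assumes, hypothetically, that each DCNN result is a mapping from type name to score (for example a softmax output over the person, vehicle, and unidentified classes of FIG. 11).

```python
from collections import defaultdict


def classify_track(recognitions):
    """Accumulate per-type scores over recognition results taken from frames
    a fixed interval apart, and return the type with the highest total.

    recognitions: list of dicts {type_name: score}, one per sampled frame.
    """
    totals = defaultdict(float)
    for scores in recognitions:
        for obj_type, score in scores.items():
            totals[obj_type] += score
    return max(totals, key=totals.get)
```

Accumulating over several frames means a single poor crop (for example, a partially occluded person) does not decide the track's type on its own.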

In removing objects from the plurality of tracking objects, non-interest tracking objects of the confirmed types that deviate from the specified criterion may be identified and removed from the plurality of tracking objects.
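The text does not say what the "specified criterion" is. As a hypothetical example only, the sketch below treats an object as of interest if its confirmed type is in a configured set and its bounding-box height lies within a per-type range; the `Track` fields and the `CRITERIA` values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    obj_type: str  # type confirmed by score accumulation
    width: int
    height: int


# Hypothetical criteria: types of interest and allowed box height (pixels) per type.
CRITERIA = {"person": (20, 400), "vehicle": (15, 600)}


def is_of_interest(track, criteria=CRITERIA):
    """True if the track's type is of interest and its size is plausible."""
    if track.obj_type not in criteria:
        return False
    lo, hi = criteria[track.obj_type]
    return lo <= track.height <= hi


def remove_uninterested(tracks, criteria=CRITERIA):
    """Keep only tracks that satisfy the criterion; the rest stop being tracked."""
    return [t for t in tracks if is_of_interest(t, criteria)]
```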

In converting to the second image data, the first pixel format, a YUV format consisting of a luminance signal (Y) and two color-difference signals (U and V), may be converted into the second pixel format, which is an RGB or grayscale pixel format.
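The patent does not specify which YUV variant the cameras produce. The per-pixel sketch below assumes full-range ITU-R BT.601 (JPEG-style) coefficients with U and V offset by 128; the grayscale conversion simply keeps the luminance channel.

```python
def clamp(v):
    """Round and clip to the 8-bit range."""
    return max(0, min(255, int(round(v))))


def yuv_to_rgb(y, u, v):
    """Full-range BT.601 YUV -> RGB for one pixel (assumed variant)."""
    d, e = u - 128, v - 128  # blue- and red-difference components
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp(r), clamp(g), clamp(b)


def yuv_to_gray(y, u, v):
    """Grayscale keeps only the luminance signal."""
    return y
```

Camera output is often chroma-subsampled (for example 4:2:0), so a full frame converter would also upsample U and V before applying these formulas; that step is omitted here.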

Details of other embodiments are included in the detailed description and the accompanying drawings.

Advantages and/or features of the present invention, and methods of achieving them, will become apparent with reference to the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms. These embodiments are provided only so that the disclosure of the present invention is complete and fully conveys the scope of the invention to those of ordinary skill in the art to which the present invention pertains; the present invention is defined only by the scope of the claims. Like reference numerals refer to like components throughout the specification.

According to an embodiment of the present invention, a low-resolution panoramic image is generated and displayed from the high-resolution images received from a plurality of radially arranged high-resolution image capturing devices, and when real-time image analysis detects an event, a digital PTZ image of the corresponding location is provided from the input high-resolution images. As a result, a high-resolution panoramic video surveillance system can be built with fewer CPU resources.

In addition, according to an embodiment of the present invention, the shortcomings of CCTV image analysis apparatuses that rely on conventional motion-based object detection algorithms are remedied with DCNN-based image recognition. Because DCNN-based image analysis becomes possible even on devices with an ordinary CPU, the cost of purchasing expensive equipment can be saved.

Furthermore, using deep-learning-based DCNN image recognition increases the accuracy of object detection, so that alarms issued upon event detection become more accurate.

FIG. 1 is an overall configuration diagram illustrating a panoramic video surveillance system according to an embodiment of the present invention;
FIG. 2 is a block diagram illustrating a panoramic video surveillance apparatus according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating an embodiment of a panoramic video surveillance method according to the present invention;
FIG. 4 is a flowchart illustrating another embodiment of a panoramic video surveillance method according to the present invention;
FIG. 5 is a reference diagram illustrating a process in which the panoramic video surveillance apparatus generates a low-resolution panoramic image according to an embodiment of the present invention;
FIG. 6 is a reference diagram illustrating a process in which the panoramic video surveillance apparatus generates a high-resolution digital PTZ image according to an embodiment of the present invention;
FIG. 7 is a block diagram of an object image recognition DCNN-based CCTV image analysis apparatus according to another embodiment of the present invention;
FIG. 8 is an exemplary view showing the processing and results of a motion region detection unit according to an embodiment of the present invention;
FIG. 9 is an exemplary view showing the processing and results of an object tracking unit according to an embodiment of the present invention;
FIG. 10 is a view illustrating the processing of a tracking object classification unit according to an embodiment of the present invention;
FIG. 11 is an exemplary view of sample images for training an object image recognition DCNN that classifies a given image as a person, a vehicle, or an unidentified class;
FIG. 12 is an exemplary view of tracking objects classified by the tracking object classification unit according to an embodiment of the present invention;
FIG. 13 is an exemplary view showing a processing result of a non-interest tracking object removal unit according to an embodiment of the present invention;
FIG. 14 is an exemplary view showing a processing result of an interest tracking object-based event detection unit according to an embodiment of the present invention;
FIG. 15 is a block diagram of an object image recognition DCNN-based CCTV image analysis apparatus for simultaneously analyzing the video of N CCTV cameras in real time according to an embodiment of the present invention;
FIG. 16 is a block diagram of an object image recognition DCNN-based CCTV video analysis system for performing large-scale video analysis at a CCTV control center or the like according to an embodiment of the present invention;
FIG. 17 is a block diagram showing the structure of an image analysis apparatus according to another embodiment of the present invention; and
FIG. 18 is a flowchart illustrating the operating process of an image analysis apparatus according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is an overall configuration diagram illustrating a panoramic video surveillance system according to an embodiment of the present invention.

Referring to FIG. 1, the high-resolution panoramic video surveillance system includes a plurality of radially arranged image capturing devices (or a plurality of high-resolution cameras) 100 and a video surveillance apparatus 200.

The radially arranged image capturing devices 100 are installed so that their capture areas partially overlap; they capture the surrounding scene and provide high-resolution video to the video surveillance apparatus 200.

Here, the image capturing devices 100 are devices fixed in a radial arrangement that capture their respective areas to acquire high-resolution video, for example high-resolution megapixel cameras.

When high-resolution video is received from the image capturing devices 100, the video surveillance apparatus 200 generates and displays a low-resolution panoramic image based on the low-resolution panoramic image generation table.

The video surveillance apparatus 200 generates and displays a low-resolution panoramic image and performs object detection/tracking and event detection through real-time video analysis. When a designated event is detected (for example, an "intrusion" event, meaning that a moving object appears within a designated area of the panoramic image), it generates and displays a high-resolution digital PTZ image corresponding to the object or to the location where the event occurred.

As described above, the present invention generates and displays a low-resolution panoramic image from the high-resolution video received from a plurality of radially arranged image capturing devices and, when real-time video analysis detects an event, provides a high-resolution digital PTZ image of the corresponding location. An ultra-high-resolution panoramic video surveillance system can thus be built and operated efficiently with fewer CPU resources. The video surveillance apparatus is described in more detail below with reference to FIG. 2.

FIG. 2 is a block diagram illustrating a video surveillance apparatus according to an embodiment of the present invention.

Referring to FIG. 2, the video surveillance apparatus 200 includes an image input unit 210, an image decoding unit 211, an image storage unit 212, an image search/playback unit 213, a low-resolution panoramic image generation unit 214, a low-resolution panoramic image generation table 215, an object tracking unit 216, a manual digital PTZ control unit 217, an automatic digital PTZ control unit 218, a high-resolution digital PTZ image generation unit 219, an ultra-high-resolution panoramic image generation table 220, and an image display unit 221.

The image input unit 210 receives compressed high-resolution video from the image capturing devices 100, which are arranged radially with partially overlapping capture areas, and provides it to both the image decoding unit 211 and the image storage unit 212.

The image decoding unit 211 decodes the high-resolution video received from the image input unit 210 and provides the decoded video to the low-resolution panoramic image generation unit 214 or the high-resolution digital PTZ image generation unit 219.

The image storage unit 212 receives the video stream from the image input unit 210 and stores it as is.

When a search for video meeting a specific condition is requested, the image search/playback unit 213 reads the matching video data from the image storage unit 212 and provides it to the image decoding unit 211. The condition includes at least one of: a condition based on event detection information (for example, detected loitering, virtual fence, or reverse movement events), a condition based on object detection information (object attributes such as a specific color or size), and a condition based on the date to be searched.
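How the stored video is indexed is not described. As a hypothetical sketch only, the search conditions above can be modelled as optional filters over per-segment index records; the `Clip` record and its fields are illustrative assumptions, not part of the patent.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Clip:  # hypothetical index record for one stored video segment
    camera_id: int
    day: date
    events: set = field(default_factory=set)         # e.g. {"loitering"}
    object_colors: set = field(default_factory=set)  # e.g. {"red"}


def search(clips, event=None, color=None, day=None):
    """Return the clips matching every condition that is given (None = ignore)."""
    result = []
    for c in clips:
        if event is not None and event not in c.events:
            continue
        if color is not None and color not in c.object_colors:
            continue
        if day is not None and c.day != day:
            continue
        result.append(c)
    return result
```

The matching clips would then be read back from storage and passed to the decoder, as the paragraph above describes.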

The low-resolution panoramic image generation unit 214 generates a low-resolution panoramic image using the low-resolution panoramic image generation table 215, which holds the information needed to generate the low-resolution panorama. It provides the low-resolution panoramic image to the object tracking unit 216 and the manual digital PTZ control unit 217, and to the image display unit 221 for display.

More specifically, the low-resolution panoramic image generation unit 214 generates the low-resolution panoramic image using the image warping and blending information stored in the low-resolution panoramic image generation table 215.

When the object tracking unit 216 receives the low-resolution panoramic image from the low-resolution panoramic image generation unit 214, it detects the configured events through object detection and tracking.

The object tracking unit 216 provides the position and size of each object, frame by frame, to the automatic digital PTZ control unit 218. The object detection and tracking method itself is outside the scope of the present invention and is omitted here; however, the related content of FIGS. 7 to 18 described later may be applied. The manual digital PTZ control unit 217 converts the position coordinates designated by the user in the low-resolution panoramic image into position coordinates in the high-resolution panoramic image and provides them to the high-resolution digital PTZ image generation unit 219.

The automatic digital PTZ control unit 218 converts the coordinates of a detection made by the object tracking unit 216 in the low-resolution panoramic image into position coordinates in the high-resolution panoramic image and provides them to the high-resolution digital PTZ image generation unit 219.
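Both PTZ control units map a point from the low-resolution panorama to the high-resolution panorama. Since the two panoramas cover the same scene at different sizes, a per-axis scale is a natural way to do this; the sketch below assumes exactly that, and adds a hypothetical helper that places a fixed-size PTZ window around the mapped point while keeping it inside the panorama. The window-placement rule is an illustrative assumption.

```python
def low_to_high(x, y, low_size, high_size):
    """Map a pixel coordinate in the low-resolution panorama to the
    high-resolution panorama by scaling each axis independently."""
    (lw, lh), (hw, hh) = low_size, high_size
    return x * hw / lw, y * hh / lh


def ptz_region(cx, cy, out_w, out_h, high_size):
    """Hypothetical helper: an out_w x out_h window centred on (cx, cy),
    shifted so that it stays inside the high-resolution panorama.
    Assumes out_w <= panorama width and out_h <= panorama height."""
    hw, hh = high_size
    x0 = int(round(cx - out_w / 2))
    y0 = int(round(cy - out_h / 2))
    x0 = max(0, min(x0, hw - out_w))
    y0 = max(0, min(y0, hh - out_h))
    return x0, y0, out_w, out_h


# Example with the sizes used later in this description.
cx, cy = low_to_high(960, 123, (1920, 246), (8448, 1080))  # centre of the low-res view
region = ptz_region(cx, cy, 1280, 720, (8448, 1080))        # (x0, y0, w, h)
```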

The high-resolution digital PTZ image generation unit 219 generates a high-resolution digital PTZ image corresponding to the received position coordinates in the high-resolution panoramic image and provides it to the image display unit 221. Rather than generating the entire high-resolution panoramic image, it uses only the part of the ultra-high-resolution panoramic image generation table 220 that corresponds to those coordinates.

The image display unit 221 displays the video received from the low-resolution panoramic image generation unit 214 or the high-resolution digital PTZ image generation unit 219.

FIG. 3 is a flowchart illustrating an embodiment of a video surveillance method according to the present invention.

Referring to FIG. 3, the video surveillance apparatus 200 receives high-resolution video through the image input unit 210 from a plurality of high-resolution cameras arranged radially with partially overlapping capture areas (step S310). The received video is usually encoded (compressed) with a codec such as H.264.

The video surveillance apparatus 200 stores the high-resolution video received from each camera through the image storage unit 212 (step S320). The video is stored in the encoded form in which it was received from the camera.

The video surveillance apparatus 200 decodes the video data received from each camera (step S330).

The video surveillance apparatus 200 generates a low-resolution panoramic image from the decoded per-camera video data through the low-resolution panoramic image generation unit 214 and displays it through the image display unit 221. The low-resolution panoramic image generation unit 214 generates the low-resolution panoramic image from the high-resolution camera images using the image warping and blending information stored in the low-resolution panoramic image generation table 215.

When a specific location or a specific moving object is designated in the low-resolution panoramic image (step S350), the video surveillance apparatus 200 obtains, through the manual digital PTZ control unit 217 or the automatic digital PTZ control unit 218, the region information of the high-resolution digital PTZ image corresponding to the designated location or to the location of the designated moving object (step S360).

In one embodiment of step S350, the user may designate a location corresponding to a region of interest in the low-resolution panoramic image being displayed.

In another embodiment of step S350, the user may designate an object of interest among the moving objects being tracked by the object tracking unit 216.

In yet another embodiment of step S350, the location of an event detected by the object tracking unit 216, or the object that caused the event, may be automatically designated as the location or object of interest.

Using the region information of the high-resolution digital PTZ image obtained in step S360, the video surveillance apparatus 200 generates a high-resolution digital PTZ image through the high-resolution digital PTZ image generation unit 219 and displays it on the screen through the image display unit 221. The high-resolution digital PTZ image generation unit 219 generates this image from the high-resolution camera images obtained from the image decoding unit 211, using the image warping and blending information stored in the ultra-high-resolution panoramic image generation table 220.

FIG. 4 is a flowchart illustrating another embodiment of a video surveillance method according to the present invention.

Referring to FIG. 4, the user requests playback of video recorded by the video surveillance apparatus 200 (step S410). The playback start position may be specified in various ways: for example, the user may directly designate the playback start time, or, when the user selects a specific event among previously occurred events, the start time of the selected event may be used.

The video surveillance apparatus 200 retrieves the recorded video data of each camera from the image storage unit 212 through the image search/playback unit 213 (step S420).

The video surveillance apparatus 200 decodes the recorded video data of each camera through the image decoding unit 211 (step S430).

The video surveillance apparatus 200 generates a low-resolution panoramic image from the decoded per-camera video data through the low-resolution panoramic image generation unit 214 and displays it on the screen through the image display unit 221 (step S440).

To view high-resolution video of a region of interest or a moving object of interest, the user designates the location of the region of interest, or the moving object of interest, in the low-resolution panoramic image being displayed (step S450).

When the user designates the location of a region of interest, the video surveillance apparatus 200 obtains the region information of the corresponding high-resolution digital PTZ image through the manual digital PTZ control unit 217; when the user designates a moving object of interest, it obtains this information through the automatic digital PTZ control unit 218 (step S460).

The video surveillance apparatus 200 generates a high-resolution digital PTZ image through the high-resolution digital PTZ image generation unit 219 from the decoded per-camera video data obtained in step S430 and the region information obtained in step S460, and displays it on the screen through the image display unit 221.

FIG. 5 is a view illustrating a process in which the video surveillance apparatus 200 generates a low-resolution panoramic image according to an embodiment of the present invention.

Referring to FIGS. 2 and 5, the low-resolution panoramic image generation unit 214 of the video surveillance apparatus 200 generates, from the high-resolution images captured by the cameras at the same time, a panoramic image of the same size (resolution) as the low-resolution panoramic image generation table, using the image warping and blending information stored in that table 215.

More specifically, the brightness value of a given pixel of the panoramic image (its R/G/B values in the case of a color image) is obtained as a weighted sum of the brightness values of the corresponding pixels in the camera images. For example, suppose that among the N camera images obtained from N cameras, the images containing a pixel that corresponds to pixel coordinates (x, y) of the panoramic image Ip are I₁ and I₂ (see FIG. 5). The brightness value Ip(x, y) of the panoramic image at (x, y) is then obtained by [Equation 1] below.

In [Equation 1], a(x, y) is the blending coefficient at pixel coordinates (x, y) of the panoramic image and takes a value between 0 and 1. The pixel coordinates (x₁, y₁) and (x₂, y₂) are the coordinates in camera images I₁ and I₂ that correspond to pixel coordinates (x, y) of the panoramic image; that is, when the panoramic image Ip is generated, pixel (x₁, y₁) of I₁ and pixel (x₂, y₂) of I₂ are warped to pixel (x, y) of Ip.
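The body of [Equation 1] is missing from this copy of the text. A reconstruction consistent with the definitions above (a weighted sum of two camera pixels using a single blending coefficient a(x, y) between 0 and 1) is shown below; which camera image is weighted by a(x, y) rather than by 1 − a(x, y) is an assumption, since the original equation is not available.

```latex
I_p(x, y) = a(x, y)\, I_1(x_1, y_1) + \bigl(1 - a(x, y)\bigr)\, I_2(x_2, y_2),
\qquad 0 \le a(x, y) \le 1
```

Under this form, a pixel seen by only one camera corresponds to a(x, y) = 1, and a(x, y) varies smoothly across the overlap band so that the seam between I₁ and I₂ is blended.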

The panoramic image generation table stores, for each pixel of the panoramic image to be generated, the corresponding pixel coordinates in the camera images and the blending coefficient. Therefore, given N camera images acquired simultaneously from N cameras, all or any part of the panoramic image can easily be computed with the table. The table can be obtained by camera calibration; the specific method is outside the scope of this patent and is not described.
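The table lookup above can be sketched as follows. The entry layout is a hypothetical choice: each panorama pixel stores `(c1, x1, y1, c2, x2, y2, a)`, with `c2 = None` where only one camera covers the pixel, and the blend is assumed to be a·I₁(x₁, y₁) + (1 − a)·I₂(x₂, y₂). The optional `region` argument computes only a sub-rectangle, which is the property the digital PTZ generation relies on.

```python
def render(table, cams, region=None):
    """Compute the panorama (or only region=(x0, y0, w, h) of it) from a
    per-pixel generation table.

    table[y][x] = (c1, x1, y1, c2, x2, y2, a)  # hypothetical layout
    cams[c][y][x] = brightness of camera c at (x, y)
    """
    if region is None:
        region = (0, 0, len(table[0]), len(table))
    x0, y0, w, h = region
    out = []
    for y in range(y0, y0 + h):
        row = []
        for x in range(x0, x0 + w):
            c1, x1, y1, c2, x2, y2, a = table[y][x]
            v = a * cams[c1][y1][x1]  # warped pixel from the first camera
            if c2 is not None:        # overlap: blend in the second camera
                v += (1 - a) * cams[c2][y2][x2]
            row.append(v)
        out.append(row)
    return out


# Two 1x3 cameras overlapping by one column, giving a 1x5 panorama.
cams = [[[10, 20, 30]], [[40, 50, 60]]]
table = [[
    (0, 0, 0, None, 0, 0, 1.0),
    (0, 1, 0, None, 0, 0, 1.0),
    (0, 2, 0, 1, 0, 0, 0.5),  # overlap pixel: half of each camera
    (1, 1, 0, None, 0, 0, 1.0),
    (1, 2, 0, None, 0, 0, 1.0),
]]
full = render(table, cams)             # [[10.0, 20.0, 35.0, 50.0, 60.0]]
part = render(table, cams, (2, 0, 2, 1))  # [[35.0, 50.0]]
```

Because every output pixel depends only on its own table entry, the cost of `render` is proportional to the number of pixels requested, not to the size of the full panorama.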

The size of the low-resolution panoramic image generation table is determined by the resolution of the panoramic image to be shown on screen or used for object tracking. For example, with five horizontally arranged Full-HD (1920x1080) cameras, the resulting panoramic image reaches up to 8448x1080 (assuming 15% overlap between neighboring camera images). A typical display (monitor), however, has a resolution of only about 1920x1080, far below this. When the panorama is displayed as a single strip across the screen, a display panorama of 1920x246 is therefore sufficient, so in this example a low-resolution panoramic image generation table of 1920x246 is enough.
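The numbers in this example can be checked with a short calculation. The sketch assumes that "15% overlap" means each additional camera adds 85% of a frame width, and that the display height is the aspect-preserving height rounded up to a whole pixel.

```python
def panorama_width(n_cams, cam_w, overlap):
    """Width of a single-row panorama: the first camera contributes its full
    width, each further camera only its non-overlapping part."""
    return round(cam_w + (n_cams - 1) * cam_w * (1 - overlap))


def display_height(display_w, pano_w, pano_h):
    """Height that preserves the panorama's aspect ratio at display_w, rounded up."""
    return -(-display_w * pano_h // pano_w)  # integer ceiling division


w = panorama_width(5, 1920, 0.15)  # 1920 + 4 * 1632 = 8448
h = display_height(1920, w, 1080)  # ceil(245.45...) = 246
```

The low-resolution table (1920x246) thus has only about 5% as many entries as the ultra-high-resolution one (8448x1080), which is where the CPU saving of the display path comes from.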

FIG. 6 is a view illustrating a process in which the video surveillance apparatus 200 generates a high-resolution digital PTZ image according to an embodiment of the present invention.

Referring to FIGS. 2 and 6, the high-resolution digital PTZ image generation unit 219 of the video surveillance apparatus 200 generates a high-resolution digital PTZ image from the high-resolution images captured by the cameras at the same time, using the region information obtained from the manual digital PTZ control unit 217 or the automatic digital PTZ control unit 218 and the image warping and blending information stored in the ultra-high-resolution panoramic image generation table 220.

The ultra-high-resolution panoramic image generation table 220 has the same structure as the low-resolution panoramic image generation table 215, and its size equals the maximum resolution of the panoramic image that can be generated from the original camera images.

The high-resolution digital PTZ image is generated by performing image warping and blending only within the corresponding region of the ultra-high-resolution panoramic image generation table, and the size (resolution) of the digital PTZ image can be set arbitrarily by the user.

FIG. 7 is a block diagram of an object image recognition DCNN-based CCTV image analysis apparatus according to another embodiment of the present invention.

The image analysis apparatus 690 of FIG. 7 may be applied to a panoramic video surveillance system using a plurality of high-resolution cameras as in FIG. 1; it may therefore replace the video surveillance apparatus 200 of FIG. 1 or be combined with some of its components. For example, the image analysis apparatus 690 of FIG. 7 may operate in conjunction with a plurality of high-resolution cameras 100_1 to 100_N as in FIG. 1 and, for this purpose, may include some or all of the configuration or operation of the video surveillance apparatus 200. For convenience, however, the following detailed description assumes only that the video of the high-resolution cameras 100_1 to 100_N is available.

As shown in FIG. 7, the object image recognition DCNN-based CCTV image analysis apparatus 690 according to an embodiment of the present invention may include some or all of the following: an image conversion unit (or image acquisition unit) 721, a motion-based image analysis unit 722, and an object image recognition DCNN (unit) 708. The apparatus 690 is also referred to as the image analysis apparatus or the video surveillance apparatus.

Here, "including some or all" means either of two things. The CCTV image analysis apparatus 690 may be configured with some components, such as the object image recognition DCNN 708, omitted. Alternatively, some components, such as the motion-based image analysis unit 722, may be integrated into other components, such as the image conversion unit 721. To aid a full understanding of the invention, the apparatus is described as including all of them.

The image conversion unit 721 acquires the image data to be used by the motion-based image analysis unit 722. For example, it may acquire image data provided by CCTV cameras or by a plurality of high-resolution cameras via a telecommunications carrier's network. It may also acquire image data provided by a server, such as one operated by a government office. The motion-based image analysis unit 722 receives the image data from the image conversion unit 721 and performs motion- and DCNN-based image analysis.

Here, "image data" may refer to an image signal, but it may also refer to video data. An image signal typically includes a video signal and an audio signal, and it may further include additional information (e.g., encoding information, subtitle information). The video signal represents the video data. Because image processing is concerned mainly with the video signal, those skilled in the art tend to use "image data" and "video data" interchangeably. The embodiments of the present invention are therefore not limited to either strict meaning, and the two terms may be used in nearly the same sense.

The image data consists of a series of unit frame images, that is, a series of video frames. For example, a frame rate of 60 FPS (frames per second) means that 60 still images (unit frame images) are transmitted or received per second. The terms video frame, frame video, and unit frame are therefore used in nearly the same sense. The choice of term may differ slightly depending on whether the frame or the data is meant. During communication, the unit frame images form serial data, so the image data can also be regarded as the pixel values of each unit frame image arranged serially. Because those skilled in the art use these terms in various overlapping ways, the embodiments are not limited to any particular meaning. Where a distinction is needed, it is made explicitly.

The image conversion unit 721 may include some or all of the following: a video frame acquisition unit 701, a video frame subsampling unit 702, a video frame scaling unit 703, and a pixel format conversion unit 704. The motion-based image analysis unit 722 may include some or all of the following: a motion region detection unit 705, an object tracking unit 706, a tracked object classification unit 707, a non-interest tracked object removal unit 709, and an interest tracked object-based event detection unit 710. "Including some or all" has the same meaning as above.

The video frame acquisition unit 701 continuously acquires video frame data from a video providing device connected to the image analysis apparatus 690, such as a CCTV camera or a video streaming server. For example, when the video providing device is an IP camera, the video frame acquisition unit 701 receives the encoded video stream from the camera and decodes it. It thereby continuously obtains video frame data (or frame video data) in a YUV pixel format, for example YV12.

YUV represents color with three components: the luminance Y and two color-difference components, U (blue minus luminance) and V (red minus luminance). The eye is more sensitive to errors in luminance than in color, so Y is coded with more samples, and hence more bits, than U and V. In the common 4:2:2 scheme (YUV422), which averages 16 bits per pixel, U and V are sampled at half the horizontal resolution of Y. The YV12 format mentioned above is a 4:2:0 scheme, in which chroma is subsampled both horizontally and vertically. Because the color components are derived from blue and red, the format is also called YCbCr, after Chroma Blue and Chroma Red.
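To make the YV12 layout concrete, the following sketch splits a raw YV12 frame buffer into its three planes. It assumes a tightly packed buffer (no row padding) and even frame dimensions. Real decoders frequently pad rows to an aligned stride, which this sketch does not handle.

```python
from typing import Tuple


def split_yv12(buf: bytes, width: int, height: int) -> Tuple[bytes, bytes, bytes]:
    """Split a YV12 frame buffer into its Y, U and V planes.

    YV12 is planar 4:2:0: a full-resolution Y plane, then the V (Cr) plane,
    then the U (Cb) plane, each chroma plane being (width/2) x (height/2).
    Assumes even width/height and no row padding (stride == width).
    """
    y_size = width * height
    c_size = (width // 2) * (height // 2)
    if len(buf) != y_size + 2 * c_size:
        raise ValueError("buffer size does not match a tightly packed YV12 frame")
    y = buf[:y_size]
    v = buf[y_size:y_size + c_size]   # V comes first in YV12 ...
    u = buf[y_size + c_size:]         # ... then U (the I420 layout swaps these)
    return y, u, v
```

For a 1920x1080 frame, this gives a 2,073,600-byte Y plane and two 518,400-byte chroma planes, which averages 12 bits per pixel.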

The video frame subsampling unit 702 selects, from the frames obtained by the video frame acquisition unit 701, the frames to be used for image analysis. For example, suppose the video input to the image analysis apparatus 690 has a frame rate of 30 FPS and the analysis frame rate is set to 6 FPS. In that case, the video frame subsampling unit 702 takes one frame out of every five frames of the input video.
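The frame subsampling step can be sketched as follows. This is an illustrative assumption about how a fixed analysis rate might be derived from the input rate, not the patent's specified method. The accumulator form is chosen so that rates that do not divide evenly are also handled.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def subsample_frames(frames: Iterable[T], input_fps: float,
                     analysis_fps: float) -> Iterator[T]:
    """Yield roughly analysis_fps frames per second out of an input_fps stream.

    An accumulator keeps non-integer ratios (e.g. 25 -> 6 FPS) evenly spaced;
    for 30 -> 6 FPS it keeps exactly one frame in every five.
    """
    if analysis_fps >= input_fps:
        yield from frames
        return
    step = input_fps / analysis_fps   # input frames per analysed frame
    next_pick = 0.0
    for index, frame in enumerate(frames):
        if index >= next_pick:
            yield frame
            next_pick += step
```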

Subsampled frames (a plurality of second video frames) are used for analysis instead of every input frame. Processing every frame greatly increases the load on the image analysis apparatus, while the benefit (e.g., better object tracking performance) is usually small. For tracking pedestrians or vehicles in typical CCTV video, 6 to 10 FPS is sufficient.

The video frame scaling unit 703 scales the video frames obtained by the video frame subsampling unit 702 to a specified size (resolution). Typically, it reduces a high-resolution input image to a lower resolution. For example, the video frames obtained by the video frame acquisition unit 701 may have a resolution of 1920x1080, while the resolution used by the image analysis unit 722 is set to 640x480. In that case, the video frame scaling unit 703 generates a 640x480 image from the 1920x1080 input through pixel subsampling, linear interpolation, and similar techniques.
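A minimal bilinear downscaler illustrating the scaling step is shown below. The patent mentions pixel subsampling together with linear interpolation but gives no details. This sketch uses plain bilinear interpolation on a grayscale image stored as nested lists, an assumption made so the example is self-contained. A production system would use an optimized image library.

```python
from typing import List

Image = List[List[float]]


def resize_bilinear(src: Image, out_w: int, out_h: int) -> Image:
    """Resize a grayscale image with bilinear interpolation."""
    in_h, in_w = len(src), len(src[0])
    sx, sy = in_w / out_w, in_h / out_h
    out: Image = []
    for oy in range(out_h):
        # Centre-aligned mapping of the output row into source coordinates.
        fy = min(max((oy + 0.5) * sy - 0.5, 0.0), in_h - 1)
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        wy = fy - y0
        row = []
        for ox in range(out_w):
            fx = min(max((ox + 0.5) * sx - 0.5, 0.0), in_w - 1)
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            wx = fx - x0
            # Interpolate horizontally on two rows, then vertically between them.
            top = src[y0][x0] * (1 - wx) + src[y0][x1] * wx
            bottom = src[y1][x0] * (1 - wx) + src[y1][x1] * wx
            row.append(top * (1 - wy) + bottom * wy)
        out.append(row)
    return out
```

The 1920x1080 to 640x480 example in the text is not a uniform scale. The horizontal factor is 3 and the vertical factor is 2.25, so the aspect ratio changes from 16:9 to 4:3.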

A reduced low-resolution image is used for analysis instead of the original high-resolution image. Processing the high-resolution image as-is greatly increases the load on the image analysis apparatus, while the benefit (e.g., better object detection performance) is usually small. In long-range surveillance, detecting small, distant objects may require processing the high-resolution input without reduction. In typical short-range surveillance, however, downscaling the input has little effect on the object detection rate and substantially reduces the processing load.

The pixel format conversion unit 704 converts the YUV-format image scaled by the video frame scaling unit 703 into a pixel format that the image analysis unit 722 can use, namely an RGB or grayscale pixel format.
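The conversion to RGB or grayscale can be sketched per pixel as shown below. The patent does not state which YUV variant the cameras deliver. The coefficients used here are the full-range BT.601 (JFIF) ones, which is an assumption. Limited-range or BT.709 video would need different constants.

```python
from typing import Tuple


def _clamp(v: float) -> int:
    """Round to the nearest integer and clip to the 8-bit range."""
    return max(0, min(255, int(round(v))))


def yuv_to_rgb(y: int, u: int, v: int) -> Tuple[int, int, int]:
    """Convert one full-range BT.601 (JFIF-style) YCbCr sample to 8-bit RGB.

    u is Cb (blue difference) and v is Cr (red difference), both offset by 128.
    """
    cb, cr = u - 128, v - 128
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return _clamp(r), _clamp(g), _clamp(b)


def yuv_to_gray(y: int, u: int, v: int) -> int:
    """Grayscale needs no arithmetic: it is simply the luma sample."""
    return y
```

Grayscale conversion is the cheaper option. The Y plane already is the grayscale image, so the chroma planes can simply be ignored.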

The motion region detection unit 705 first obtains a basic motion region from the difference between the input image and a periodically learned background image. It then detects the final motion region by applying various methods for removing noise motion pixels, together with morphology filtering. FIG. 8 illustrates the processing of the motion region detection unit 705 and its results:

- an input image 801;
- a learned background image 802;
- a basic motion region detection result 803, obtained from the difference between the input image and the background image;
- a final motion region detection result 804, obtained by removing noise foreground pixels and applying morphology filtering.
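The background-difference and morphology steps can be illustrated with the following minimal sketch. The patent does not specify how the background is learned, what threshold is used, or which structuring element is applied. The running-average update, the threshold of 25, and the 3x3 opening used here are illustrative assumptions.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[bool]]


def update_background(bg: Image, frame: Image, alpha: float = 0.05) -> Image:
    """Running-average background model, refreshed as frames arrive."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(br, fr)]
            for br, fr in zip(bg, frame)]


def foreground_mask(bg: Image, frame: Image, thresh: float = 25.0) -> Mask:
    """Basic motion pixels: |input - background| above a threshold."""
    return [[abs(f - b) > thresh for b, f in zip(br, fr)]
            for br, fr in zip(bg, frame)]


def _morph(mask: Mask, erode: bool) -> Mask:
    """3x3 erosion (all neighbours set) or dilation (any neighbour set)."""
    h, w = len(mask), len(mask[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            # 3x3 neighbourhood, clipped at the image border
            hood = [mask[j][i]
                    for j in range(max(0, y - 1), min(h, y + 2))
                    for i in range(max(0, x - 1), min(w, x + 2))]
            row.append(all(hood) if erode else any(hood))
        out.append(row)
    return out


def open_mask(mask: Mask) -> Mask:
    """Morphological opening (erode, then dilate) removes isolated noise pixels."""
    return _morph(_morph(mask, erode=True), erode=False)
```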

The object tracking unit 706 performs multi-object tracking using the input image 801 and the final motion region detection result 804 of FIG. 8. Specifically, the object tracking unit 706 performs the following tasks:

- detecting new objects;
- tracking objects between frames by matching;
- updating the template image and bounding box coordinates of each tracked object;
- managing the tracked object list;
- managing the trajectory of each tracked object.

FIG. 9 illustrates the processing of the object tracking unit 706 of FIG. 7 and its results. It shows template images 801 of several tracked objects and an object detection/tracking result 802. In the object detection/tracking result 802 of FIG. 9, many meaningless objects are detected in addition to the person, who is the object of interest. Examples include objects detected from the motion of tree branches swaying in the wind.
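Frame-to-frame matching can be sketched with a greedy intersection-over-union (IoU) association, shown below. This technique is swapped in for illustration only. The patent describes matching with template images, which this sketch does not implement. The sketch also omits template updates and the removal of tracks that disappear. The class name `SimpleTracker` and the `min_iou` threshold are hypothetical.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


class SimpleTracker:
    """Greedy IoU association of per-frame motion boxes to existing tracks."""

    def __init__(self, min_iou: float = 0.3):
        self.min_iou = min_iou
        self.next_id = 0
        self.tracks: Dict[int, Box] = {}              # track list: id -> latest box
        self.trajectories: Dict[int, List[Box]] = {}  # per-track trajectory

    def update(self, detections: List[Box]) -> Dict[int, Box]:
        unmatched = list(detections)
        # Score every (track, detection) pair; best overlaps are matched first.
        pairs = sorted(((iou(box, det), tid, det)
                        for tid, box in self.tracks.items() for det in detections),
                       key=lambda p: p[0], reverse=True)
        used_tracks = set()
        for score, tid, det in pairs:
            if score < self.min_iou:
                break
            if tid in used_tracks or det not in unmatched:
                continue
            used_tracks.add(tid)
            unmatched.remove(det)
            self.tracks[tid] = det                 # bounding-box update
            self.trajectories[tid].append(det)
        for det in unmatched:                      # unmatched box -> new object
            self.tracks[self.next_id] = det
            self.trajectories[self.next_id] = [det]
            self.next_id += 1
        return dict(self.tracks)
```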

The tracked object classification unit 707 classifies the objects being tracked using the object image recognition DCNN 708. This unit is described in more detail below with reference to FIGS. 10 to 12.

The non-interest tracked object removal unit 709 removes tracked objects that are classified into a class of non-interest from the tracked object list of the object tracking unit 706.

The interest tracked object-based event detection unit 710 uses the tracking information of objects belonging to classes of interest to detect events that satisfy specified rules.
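An intrusion rule like the one illustrated in FIG. 14 can be sketched as follows. The sketch combines the removal of non-interest classes with a zone check. Several details are assumptions made for illustration, because the patent does not specify how its rules are evaluated: using the bottom centre of the bounding box as the object's position, the ray-casting point-in-polygon test, and the function names.

```python
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def point_in_polygon(p: Point, poly: Sequence[Point]) -> bool:
    """Ray-casting test: count edge crossings of a horizontal ray from p."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def intrusion_events(tracks: Dict[int, Tuple[str, Box]], zone: Sequence[Point],
                     classes_of_interest: Set[str]) -> List[int]:
    """Return ids of tracked objects of interest whose foot point lies in the zone."""
    events = []
    for tid, (label, (x1, y1, x2, y2)) in tracks.items():
        if label not in classes_of_interest:  # non-interest classes are dropped
            continue
        foot = ((x1 + x2) / 2.0, y2)          # bottom centre of the bounding box
        if point_in_polygon(foot, zone):
            events.append(tid)
    return events
```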

FIG. 10 is a diagram explaining the processing performed by the tracked object classification unit of FIG. 7.

For convenience, refer to FIG. 10 together with FIG. 7. As in the example of FIG. 10, the object image recognition DCNN 708 of FIG. 7 is a deep neural network that receives a size-normalized object image as input and recognizes it as one of three classes: “person”, “vehicle”, or “unknown”. For example, when an object image is input, the object image recognition DCNN 708 may compare it with previously stored images (e.g., sample images or image data) and provide a recognition result.

Accordingly, the tracked object classification unit 707 of FIG. 7 processes each object being tracked as follows. It obtains from the input image the image patch (or object image) corresponding to the object's bounding box, normalizes the size of the patch, and attempts recognition through the object image recognition DCNN 708.

For more accurate classification, the tracked object classification unit 707 attempts DCNN recognition periodically, several times, rather than only once. As example 1001 of FIG. 10 shows, the recognition result obtained at any single moment may be unstable, depending on how well the object's bounding box was detected.

Specifically, as shown in FIG. 10, the tracked object classification unit 707 handles a given tracked object as follows. Every specified period P, it obtains the object's image (image patch) from the input image and attempts DCNN recognition. It accumulates the recognition results per class. The class with the highest accumulated score as of the current frame becomes the class of that tracked object. For example, in FIG. 10, seven unit frames are input at regular time intervals. Image patches of the same object are obtained from the first, third, fifth, and seventh frames, so the DCNN is applied periodically, at a fixed time interval.

A period P of 0.5 to 1 second is usually appropriate. The smaller P is, the greater the processing load on the image analysis apparatus 690. The larger P is, the more classification performance may degrade. When obtaining an object's image patch, it is preferable to take it from the original-resolution image, that is, the video frame obtained by the video frame subsampling unit 702 of FIG. 7. A patch cut from the downscaled image may be degraded in quality and therefore hard to recognize.
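The following sketch shows how a patch could be cut from the original-resolution frame even though tracking runs on the downscaled image. The bounding box is first mapped back by the scale factors, the patch is cropped, and it is then resized to a fixed size. The 64x64 output size, nearest-neighbour resizing, and the function name are hypothetical; the patent does not state the DCNN's input size.

```python
from typing import List, Tuple

Image = List[List[int]]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in analysis-image pixels


def crop_normalized_patch(original: Image, box: Box, analysis_size: Tuple[int, int],
                          out_size: Tuple[int, int] = (64, 64)) -> Image:
    """Cut the tracked object's patch from the full-resolution frame and resize it.

    The box comes from the downscaled analysis image, so it is first mapped back
    to original-frame coordinates. The patch is then resized (nearest neighbour)
    to the fixed classifier input size assumed here.
    """
    orig_h, orig_w = len(original), len(original[0])
    an_w, an_h = analysis_size
    sx, sy = orig_w / an_w, orig_h / an_h
    # Clamp so the crop is at least 1x1 and stays inside the frame.
    x1 = min(orig_w - 1, max(0, int(box[0] * sx)))
    y1 = min(orig_h - 1, max(0, int(box[1] * sy)))
    x2 = min(orig_w, max(x1 + 1, int(round(box[2] * sx))))
    y2 = min(orig_h, max(y1 + 1, int(round(box[3] * sy))))
    pw, ph = x2 - x1, y2 - y1
    out_w, out_h = out_size
    return [[original[y1 + (oy * ph) // out_h][x1 + (ox * pw) // out_w]
             for ox in range(out_w)]
            for oy in range(out_h)]
```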

The number of periodic DCNN recognition attempts may be limited to a total of N. This limit prevents the apparatus from needlessly continuing recognition attempts for objects that are tracked for a long time.
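The periodic recognition scheme can be sketched as shown below, with per-class score accumulation, a period P, and an attempt limit N. Several details are assumptions: the class name, summing per-class scores (e.g., softmax outputs) as the accumulated quantity, and the default values. The patent states only that results are accumulated per class and that the highest-scoring class is chosen.

```python
from collections import defaultdict
from typing import Callable, Dict, Optional


class TrackClassifier:
    """Accumulates per-class DCNN scores for one tracked object.

    Recognition is attempted every `period` seconds (P), at most `max_attempts`
    times (N). The current label is the class with the highest accumulated score.
    """

    def __init__(self, period: float = 0.5, max_attempts: int = 10):
        self.period = period
        self.max_attempts = max_attempts
        self.attempts = 0
        self.last_time: Optional[float] = None
        self.scores: Dict[str, float] = defaultdict(float)

    def maybe_classify(self, now: float,
                       recognize: Callable[[], Dict[str, float]]) -> None:
        if self.attempts >= self.max_attempts:
            return  # stop spending DCNN calls on long-lived tracks
        if self.last_time is not None and now - self.last_time < self.period:
            return  # not yet time for the next attempt
        for cls, score in recognize().items():  # e.g. softmax output of the DCNN
            self.scores[cls] += score
        self.attempts += 1
        self.last_time = now

    @property
    def label(self) -> str:
        if not self.scores:
            return "unknown"
        return max(self.scores, key=self.scores.get)
```

Suppose frames arrive every 0.25 s and P = 0.5 s. Recognition then runs on the 1st, 3rd, 5th, and 7th frames, as in the FIG. 10 example, and a single unstable early result is outweighed by the later ones.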

FIG. 11 shows sample images for training an object image recognition DCNN that classifies a given image into one of three classes: person (a), vehicle (b), and unknown (c).

For convenience, refer to FIG. 11 together with FIG. 7. Note that, as in the example of FIG. 11, the training data for the “person” class in particular includes more than full-body images. It also includes partial images of a person and images containing two or more people. The reason is that the person regions detected by the motion-based image analysis unit 722 of FIG. 7 frequently contain only part of a person, or two or more people, rather than a single full body.

Various DCNN models proposed to date can serve as the model of the object image recognition DCNN 708 of FIG. 7. Representative examples include AlexNet, VGGNet, GoogLeNet, and ResNet, which achieved top results in the ILSVRC (ImageNet Large Scale Visual Recognition Challenge). When choosing the DCNN model to deploy, however, considering recognition performance alone is inadequate. There is generally a trade-off between a DCNN's processing time (inference time) and size (e.g., number of parameters) on one hand and its recognition performance on the other. Recently published DCNN models such as SqueezeNet and MobileNet offer recognition performance similar to earlier models while drastically reducing processing time and parameter count.
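The parameter savings behind models such as MobileNet can be checked with simple arithmetic. MobileNet replaces a standard k x k convolution with a depthwise k x k convolution followed by a 1 x 1 pointwise convolution. The helper functions below count the weights of each. Their names are illustrative, and biases are ignored.

```python
def conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k: int, c_in: int, c_out: int) -> int:
    """k x k depthwise conv (one filter per input channel) + 1 x 1 pointwise conv."""
    return k * k * c_in + c_in * c_out
```

For a 3x3 layer with 256 input and 256 output channels, the standard convolution has 589,824 weights and the separable version has 67,840, roughly 8.7 times fewer.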

FIG. 12 shows an example of tracked objects classified by the tracked object classification unit of FIG. 7. FIG. 13 shows the processing results of the non-interest tracked object removal unit of FIG. 7. FIG. 14 shows the processing results of the interest tracked object-based event detection unit of FIG. 7.

As shown in FIG. 12, all the falsely detected objects other than the actual person are correctly classified as “Unknown”.

FIG. 13 shows the result produced by the non-interest tracked object removal unit 709 from the classification result of FIG. 12 when the “unknown” class is designated as a class of non-interest.

FIG. 14 shows an example of detecting an event in which a “person” object intrudes into a designated area.

FIG. 15 is a block diagram of an object image recognition DCNN-based CCTV image analysis apparatus according to an embodiment of the present invention. The apparatus analyzes the video of N CCTV cameras simultaneously in real time.

As shown in FIG. 15, the CCTV image analysis apparatus 1490 according to an embodiment of the present invention includes some or all of the following: N video channel processing units 1501 for processing N video streams simultaneously, an object image recognition processing unit 1502, and the object image recognition DCNN 708. "Including some or all" has the same meaning as above.

Each video channel processing unit 1501 may include some or all of the image conversion unit 721 and the motion-based image analysis unit 722 of FIG. 7. The N video channel processing units 1501 and the object image recognition processing unit 1502 typically run on the CPU of the image analysis apparatus 1490. The object image recognition DCNN 708, by contrast, may run on a GPU capable of massively parallel computation, for high-speed operation.

The N video channel processing units 1501 do not each use their own object image recognition DCNN. Instead, they “share” a single object image recognition DCNN 708. Running one DCNN typically requires a large amount of system memory. If each of the N video channel processing units 1501 loaded its own DCNN into memory, memory could run short as N increases.

To classify tracked objects in the manner described with reference to FIG. 10, each motion-based image analysis unit 722 in the N video channel processing units 1501 periodically sends a tracked object's normalized image patch to the object image recognition processing unit 1502, together with a recognition request message. The recognition request messages are stored in order in the request message queue of the object image recognition processing unit 1502. As soon as the object image recognition processing unit 1502 finds a recognition request message in the queue, it removes the message from the queue and processes it. That is, it inputs the normalized object image patch received with the message into the object image recognition DCNN 708 and obtains the object image recognition result. It then returns the result to the motion-based image analysis unit 722 that made the request. That motion-based image analysis unit 722 uses the result to classify the object as shown in FIG. 10.
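The shared-DCNN arrangement with a request message queue can be sketched with Python's standard `queue` and `threading` modules. In this sketch, a single worker thread owns the model, and each channel supplies its own reply queue so that results return to the requester. The class and method names, the request tuple layout, and the `model` callable are hypothetical stand-ins for the object image recognition processing unit 1502 and DCNN 708.

```python
import queue
import threading
from typing import Callable, Dict

Patch = bytes  # normalized object image patch (encoding is an assumption)


class RecognitionService:
    """One shared classifier served from a FIFO request queue."""

    def __init__(self, model: Callable[[Patch], Dict[str, float]]):
        self.model = model
        self.requests: "queue.Queue" = queue.Queue()
        self.worker = threading.Thread(target=self._run, daemon=True)
        self.worker.start()

    def submit(self, channel_id: int, track_id: int, patch: Patch,
               reply: "queue.Queue") -> None:
        """Called by a video channel: enqueue one recognition request."""
        self.requests.put((channel_id, track_id, patch, reply))

    def _run(self) -> None:
        while True:
            # Blocks until a request arrives, then processes requests in order.
            channel_id, track_id, patch, reply = self.requests.get()
            try:
                reply.put((channel_id, track_id, self.model(patch)))
            finally:
                self.requests.task_done()
```

Only the worker thread ever calls `model`, so a single copy of the network sits in memory regardless of the number of channels. This is the memory argument made above.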

FIG. 16 is a block diagram of an object image recognition DCNN-based CCTV image analysis system according to an embodiment of the present invention. The system performs large-scale image analysis in a CCTV control center or a similar facility.

A large-scale image analysis system could be built simply by combining many of the image analysis apparatuses 1490 shown in FIG. 15. However, the high-performance GPUs used in control centers are usually very expensive, so installing a GPU in every image analysis apparatus is costly.

To solve this problem, the embodiment shown in FIG. 16 configures the image analysis system 1590 to divide the work. Many image analysis servers 1601 without GPUs perform ordinary video channel processing. A small number of object image recognition servers 1602 equipped with high-performance GPUs perform DCNN-based object image recognition. The image analysis servers 1601 communicate with the object image recognition servers 1602 over a high-speed network.

To classify tracked objects in the manner described with reference to FIG. 10, the video channel processing unit of each image analysis server 1601 periodically sends a tracked object's normalized image patch to an object image recognition server 1602, together with a recognition request message. The patch data is compressed in a format such as JPEG before transmission. The recognition request messages are stored in order in the request message queue of the object image recognition processing unit 1603 of the object image recognition server 1602. As soon as the object image recognition processing unit 1603 finds a recognition request message in the queue, it removes the message from the queue and processes it. That is, it decodes the compressed patch data received with the message, inputs the result into the object image recognition DCNN 708, and obtains the object image recognition result. The object image recognition processing unit 1603 then returns the result to the video channel processing unit of the image analysis server 1601 that made the request. That video channel processing unit uses the result to classify the object as shown in FIG. 10.
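A recognition request travelling between the servers could be framed as shown below. The header layout and function names are hypothetical. The Python standard library has no JPEG codec, so `zlib` is used here as a lossless stand-in for the JPEG compression the patent describes.

```python
import struct
import zlib
from typing import Tuple

# Hypothetical header: channel id, track id, patch width, patch height, payload length
HEADER = struct.Struct("!IIHHI")


def encode_request(channel_id: int, track_id: int, width: int, height: int,
                   gray_pixels: bytes) -> bytes:
    """Pack one recognition request; zlib stands in for the JPEG codec here."""
    payload = zlib.compress(gray_pixels)
    return HEADER.pack(channel_id, track_id, width, height, len(payload)) + payload


def decode_request(message: bytes) -> Tuple[int, int, int, int, bytes]:
    """Inverse of encode_request, as run on the recognition server before the DCNN."""
    channel_id, track_id, width, height, size = HEADER.unpack_from(message)
    payload = message[HEADER.size:HEADER.size + size]
    pixels = zlib.decompress(payload)
    if len(pixels) != width * height:
        raise ValueError("decoded patch size does not match header")
    return channel_id, track_id, width, height, pixels
```

In practice, a lossy codec such as JPEG usually compresses natural-image patches much more than zlib does, which reduces network traffic between the servers.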

FIG. 17 is a block diagram showing the structure of an image analysis apparatus according to yet another embodiment of the present invention.

As shown in FIG. 17, the image analysis apparatus 1700 according to yet another embodiment of the present invention includes some or all of the following: a communication interface unit 1710, a control unit 1720, a DCNN-based image analysis unit 1730, and a storage unit 1740.

Here, "including some or all" means either of two things. The image analysis apparatus 1700 may be configured with some components, such as the communication interface unit 1710 or the storage unit 1740, omitted. Alternatively, some components, such as the DCNN-based image analysis unit 1730, may be integrated into other components, such as the control unit 1720. To aid a full understanding of the invention, the apparatus is described as including all of them.

The communication interface unit 1710 communicates with CCTV cameras via a telecommunications carrier's network. In the process, it may perform operations such as modulation and demodulation.

The control unit 1720 is responsible for the overall control of the other units that make up the image analysis apparatus 1700 of FIG. 17: the communication interface unit 1710, the DCNN-based image analysis unit 1730, and the storage unit 1740. For example, the control unit 1720 may pass CCTV footage received through the communication interface unit 1710 to the DCNN-based image analysis unit 1730.

The DCNN-based image analysis unit 1730 may perform the image analysis operations of the embodiments described above with reference to FIGS. 7 to 16. Typically, it converts the frame rate of the received frame images or randomly selects unit frame images; the embodiments refer to this as subsampling. It also lowers the resolution of the unit frame images obtained through subsampling, and it may convert the format of the received frame images.

In addition, the DCNN-based image analysis unit 1730 extracts object image patches from the subsampled, format-converted frame images and recognizes them with the DCNN to classify objects accurately. In the process, it removes (filters out) tracked objects of non-interest. As a result, the DCNN-based image analysis unit 1730 detects events only for tracked objects of interest, and it may output an alarm based on a detected event.

Apart from the above, the communication interface unit 1710, the control unit 1720, the DCNN-based image analysis unit 1730, and the storage unit 1740 of FIG. 17 have already been described in sufficient detail, so that description applies here.

Meanwhile, as another embodiment of the present invention, the control unit 1720 of FIG. 17 may include a CPU and a memory. The CPU may include a control circuit, an arithmetic logic unit (ALU), an instruction decoder, and registers, and the memory may include RAM. Here, the CPU may be the CPU shown in FIG. 15. The control circuit handles control operations, the ALU performs bit operations, and the instruction decoder interprets machine code to determine which instruction it represents. The registers temporarily hold data. The control unit 1720 may typically be implemented as a single chip. Accordingly, when the image analysis apparatus 1700 starts operating, the CPU may copy the program stored in the DCNN-based image analysis unit 1730 of FIG. 17, load it into memory, and execute it, which increases the CPU's processing speed.

FIG. 18 is a flowchart illustrating the operation of an image analysis apparatus according to an embodiment of the present invention.

Referring to FIG. 18 together with FIG. 17 for convenience of description, the image analysis apparatus 1700 according to an embodiment of the present invention receives images from a video source (e.g., CCTV or a plurality of high-resolution cameras), that is, a plurality of video frame images. It then converts them into a format that reduces the image processing load (S1800). Here, "format" should not be read as the first format described earlier. It refers more broadly to a video form that differs from the received video. To this end, the image analysis apparatus 1700 preferably lowers the resolution below that of the input images and reduces the number of images to be processed. It also preferably converts the images into video in an RGB or grayscale pixel format. The purpose is to make analysis of CCTV footage possible even in a general-purpose CPU environment, so that monitoring personnel can carry out surveillance based on it.
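The conversion to an RGB or grayscale pixel format can be illustrated for a YUV input, which is the first pixel format named in the claims. The sketch assumes 8-bit, full-range BT.601 (JFIF) coefficients with chroma offset by 128. A real camera stream may use BT.709 or limited-range values, which need different constants. Grayscale output simply keeps the luminance (Y) plane. The function names are hypothetical.

```python
from typing import List, Tuple


def clamp8(v: float) -> int:
    """Round and clamp a value to the 8-bit range [0, 255]."""
    return max(0, min(255, int(round(v))))


def yuv_to_rgb(y: int, u: int, v: int) -> Tuple[int, int, int]:
    """Full-range BT.601 (JFIF) YUV -> RGB for one pixel; U and V are offset by 128."""
    d, e = u - 128, v - 128  # blue-difference and red-difference chroma
    r = y + 1.402 * e
    g = y - 0.344136 * d - 0.714136 * e
    b = y + 1.772 * d
    return clamp8(r), clamp8(g), clamp8(b)


def yuv_to_gray(y_plane: List[List[int]]) -> List[List[int]]:
    """Grayscale needs only the luminance plane, so the chroma planes are dropped."""
    return [row[:] for row in y_plane]
```

Dropping U and V for grayscale also cuts the data the analysis stage has to touch, which matches the load-reduction goal stated above.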

Next, the image analysis apparatus 1700 extracts object images of a plurality of objects for motion tracking from the format-converted video frame images. It applies the extracted object images to a DCNN that performs object image recognition and classifies the plurality of tracking objects based on the results. It then removes user non-interest objects that fall outside a designated criterion from the classified tracking objects. Finally, it detects events based on designated rules and the tracking information of the remaining user interest objects (S1810).

For example, the DCNN may store sample images and use them to return a recognition result for an input object image. The application results are also accumulated through learning to produce an accumulated result, and the class with the highest accumulated result is finally confirmed as the type of the tracking object. For example, suppose the accumulated result is 2 points for the person class and 1 point for the unidentified class. The object is then confirmed as a person within that frame section.
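The accumulation rule in the example above (2 points for person versus 1 for unidentified) can be sketched as a per-track score tally. This sketch assumes that each DCNN call yields a `(track_id, label, score)` tuple, and the helper names are hypothetical. `Counter.most_common` orders equal counts by first occurrence, so ties go to the label seen first.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def accumulate_scores(results: Iterable[Tuple[int, str, float]]) -> Dict[int, Counter]:
    """Sum per-class scores for each track over a designated frame section."""
    totals: Dict[int, Counter] = {}
    for track_id, label, score in results:
        totals.setdefault(track_id, Counter())[label] += score
    return totals


def decide_types(totals: Dict[int, Counter]) -> Dict[int, str]:
    """Pick the class with the highest accumulated score for each track."""
    return {tid: counts.most_common(1)[0][0] for tid, counts in totals.items()}
```

With three frames labelled person, unidentified, person for track 7, the tally is person 2 and unidentified 1, so the track is confirmed as a person.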

Accordingly, a specific object classified as unidentified, for example, is excluded from the set of tracking objects based on its object image. The bounding box drawn around that tracking object on the screen can therefore be removed.
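Excluding tracks classified as unidentified, together with their on-screen bounding boxes, amounts to filtering the track table. In this sketch, the non-interest label set defaults to `{"unidentified"}`, which is an assumption. Tracks without a decided type are kept until a decision is available. That is a design choice, not something the text specifies. The function name is hypothetical.

```python
from typing import Dict, FrozenSet, Tuple

BBox = Tuple[int, int, int, int]  # assumed convention: (x, y, width, height)


def remove_non_interest(tracks: Dict[int, BBox], types: Dict[int, str],
                        non_interest: FrozenSet[str] = frozenset({"unidentified"})) -> Dict[int, BBox]:
    """Drop tracks whose decided type is a non-interest label.

    The returned dict holds only the boxes that should still be drawn and analysed.
    Tracks with no decided type yet are kept.
    """
    return {tid: box for tid, box in tracks.items() if types.get(tid) not in non_interest}
```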

In this way, only the objects of interest to the user are tracked, and events whose tracking information satisfies the designated rules are detected.
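The text leaves the designated rules open. As a hypothetical example, the sketch below checks a zone-intrusion rule: an event fires when the tracked position of an interest object stays inside a rectangle for a minimum number of consecutive subsampled frames. `ZoneIntrusionRule` and `detect_events` are illustrative names and are not part of the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class ZoneIntrusionRule:
    """Hypothetical rule: an object stays inside a rectangle for `min_frames` consecutive frames."""
    x0: float
    y0: float
    x1: float
    y1: float
    min_frames: int

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] <= self.x1 and self.y0 <= p[1] <= self.y1


def detect_events(trajectories: Dict[int, List[Point]], rule: ZoneIntrusionRule) -> List[int]:
    """Return the ids of interest objects whose trajectory satisfies the rule."""
    hits: List[int] = []
    for track_id, points in trajectories.items():
        run = 0  # consecutive frames spent inside the zone
        for p in points:
            run = run + 1 if rule.contains(p) else 0
            if run >= rule.min_frames:
                hits.append(track_id)
                break
    return hits
```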

Various other operations are possible. For details, refer to the description given above.

Meanwhile, the present invention is not necessarily limited to these embodiments merely because all of their components have been described as combined into one or as operating in combination. That is, within the scope of the object of the present invention, all of the components may be selectively combined into one or more units and operated. In addition, each component may be implemented as independent hardware. Alternatively, some or all of the components may be selectively combined and implemented as a computer program whose program modules perform some or all of the combined functions on one or more pieces of hardware. Those skilled in the art can readily infer the codes and code segments that make up such a computer program. Such a computer program may be stored on a non-transitory computer-readable medium and read and executed by a computer, thereby implementing an embodiment of the present invention.

Here, a non-transitory readable recording medium is a medium that stores data semi-permanently and can be read by a device. This excludes media that store data only briefly, such as registers, caches, and memory. Specifically, the programs described above may be stored and provided on a non-transitory readable recording medium such as a CD, DVD, hard disk, Blu-ray disc, USB drive, memory card, or ROM.

Preferred embodiments of the present invention have been illustrated and described above, but the present invention is not limited to these specific embodiments. Those of ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the invention as claimed. Such modifications should not be understood separately from the technical idea or scope of the present invention.

100: video recording device
200: video surveillance device
210: video input unit
211: video decoding unit
212: image storage unit
213: image search/playback unit
214: low-resolution panoramic image generation unit
216, 706: object tracking unit
217: manual digital PTZ control unit
218: automatic digital PTZ control unit
221: image display unit
701: video frame acquisition unit
702: video frame subsampling unit
703: video frame scaling unit
704: pixel format conversion unit
705: motion region detection unit
707: tracking object classification unit
708: object image recognition DCNN
709: non-interest tracking object removal unit
710: interest tracking object-based event detection unit
721: image conversion unit
722: motion-based image analysis unit
770, 1490, 1700: image analysis apparatus
1501: video channel processing unit
1502, 1603: object image recognition processing unit
1590: image analysis system
1601: image analysis server
1602: object image recognition server
1710: communication interface unit
1720: control unit
1730: DCNN-based image analysis unit
1740: storage unit

Claims

A panoramic image surveillance system using a plurality of high-resolution cameras, the system performing image analysis with a DCNN (Deep Convolutional Neural Network) that provides recognition results for objects in an image, wherein
the panoramic image surveillance system receives high-resolution images from the plurality of high-resolution cameras, generates and displays a low-resolution panoramic image from the high-resolution images based on a low-resolution panoramic image generation table, and, when an event occurs while the low-resolution panoramic image is being displayed, displays a high-resolution digital PTZ image using an ultra-high-resolution panoramic image generation table, and comprises:
an image conversion unit that receives, from the high-resolution cameras, first image data composed of a plurality of first video frames having a first pixel format, a first resolution, and a first frame rate, and converts the first image data into second image data by converting the first resolution, the first frame rate, and the first pixel format into, respectively, a second resolution lower than the first resolution, a second frame rate lower than the first frame rate, and a second pixel format different from the first pixel format; and
a motion-based image analysis unit that:
- detects and tracks a plurality of moving objects based on motion in the video frames of the converted second image data;
- extracts an image of each tracked object;
- classifies each of the plurality of tracking objects into one of a plurality of designated types using the recognition results obtained by inputting the extracted object images into the DCNN;
- identifies user non-interest objects that fall outside a designated criterion among the classified tracking objects and removes them from the set of tracking objects; and
- detects an event based on designated rules and the tracking information of the remaining user interest objects,
wherein the image conversion unit comprises a video frame subsampling unit that generates a plurality of second video frames by sampling fewer video frames than the plurality of input first video frames,
wherein the motion-based image analysis unit accumulates the recognition results for the plurality of tracking objects over a designated frame section, removes from the set of tracking objects, as a user non-interest object, an object of a designated type whose accumulated result is low, and comprises:
a tracking object classification unit that classifies the plurality of tracking objects based on a first recognition result and a second recognition result, the first recognition result being obtained by applying to the DCNN the object images of the tracking objects extracted from one of the second video frames, and the second recognition result being obtained by applying to the DCNN the object images of the tracking objects extracted from at least one other second video frame input a predetermined time interval after that frame; and
a non-interest tracking object removal unit that identifies user non-interest objects falling outside a designated criterion among the classified tracking objects and removes them from the tracking targets of the object tracking unit,
wherein the tracking object classification unit converts the first recognition result and the second recognition result into scores, accumulates the scores, and determines the class with the highest accumulated score as the type of the tracking object.
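Claim 1 relies on precomputed panoramic image generation tables. One possible reading is sketched below under stated assumptions. The table records, for every output pixel, which camera and which source pixel to copy. It is built once and reused for every frame. A coarse table over the whole panorama gives the low-resolution view. A full-resolution table built only for a selected region gives a digital PTZ view. The camera geometry here (cameras tiled left to right with no overlap, warping, or blending) and all names are hypothetical. Real stitching would derive the mapping from camera calibration.

```python
from typing import Callable, List, Tuple

Frame = List[List[int]]
Entry = Tuple[int, int, int]  # (camera index, source x, source y)
Table = List[List[Entry]]
Mapper = Callable[[float, float], Entry]


def build_table(out_w: int, out_h: int, u0: float, v0: float,
                u1: float, v1: float, mapper: Mapper) -> Table:
    """Precompute the source pixel for each output pixel.

    (u0, v0)-(u1, v1) is the covered region in normalized panorama coordinates [0, 1].
    """
    table: Table = []
    for j in range(out_h):
        v = v0 + (v1 - v0) * (j + 0.5) / out_h  # sample at output pixel centers
        table.append([mapper(u0 + (u1 - u0) * (i + 0.5) / out_w, v) for i in range(out_w)])
    return table


def render(table: Table, frames: List[Frame]) -> Frame:
    """Produce an output image by pure lookups into the current camera frames."""
    return [[frames[c][y][x] for (c, x, y) in row] for row in table]


def side_by_side_mapper(n_cams: int, cam_w: int, cam_h: int) -> Mapper:
    """Hypothetical geometry: equal cameras tiled left to right, no overlap or warp."""
    def mapper(u: float, v: float) -> Entry:
        gx = min(int(u * n_cams * cam_w), n_cams * cam_w - 1)
        y = min(int(v * cam_h), cam_h - 1)
        return gx // cam_w, gx % cam_w, y
    return mapper
```

In this reading, the low-resolution table covers (0, 0)-(1, 1) with a small output size. The digital PTZ table covers only the selected region, at an output size equal to that region's native pixel count, so the full ultra-high-resolution panorama is never materialized.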

The panoramic image surveillance system using a plurality of high-resolution cameras of claim 1, wherein the image conversion unit further comprises:
a video frame acquisition unit that receives the plurality of first video frames input at the first frame rate;
a video frame scaling unit that converts the generated plurality of second video frames to the second resolution; and
a pixel format conversion unit that converts the first pixel format of the second video frames converted to the second resolution into the second pixel format and provides the resulting second image data to the motion-based image analysis unit.

The panoramic image surveillance system using a plurality of high-resolution cameras of claim 1, wherein the motion-based image analysis unit further comprises:
a motion region detection unit that generates a difference image from the second image data and a learned background image, and removes motion noise from the difference image to detect motion regions of the plurality of moving objects;
an object tracking unit that detects and tracks the plurality of moving objects using the detected motion regions and the second image data; and
an interest tracking object-based event detection unit that detects an event in which the tracking information of the user interest objects remaining after the removal satisfies a designated rule.
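The motion region detection in claim 3 consists of a difference against a learned background followed by noise removal, and it can be sketched as follows. The claim does not specify the background model or the noise filter. This sketch assumes an exponential running average for the background. For noise removal, it drops foreground pixels that have fewer than `min_neighbors` foreground neighbors; a morphological opening would be a common alternative. The thresholds and function names are illustrative.

```python
from typing import List

Frame = List[List[float]]
Mask = List[List[int]]


def update_background(bg: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Learn the background as an exponential running average of past frames."""
    return [[(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
            for b_row, f_row in zip(bg, frame)]


def motion_mask(bg: Frame, frame: Frame, threshold: float = 25.0,
                min_neighbors: int = 2) -> Mask:
    """Threshold |frame - background|, then drop isolated foreground pixels as noise."""
    h, w = len(frame), len(frame[0])
    diff = [[1 if abs(frame[y][x] - bg[y][x]) > threshold else 0 for x in range(w)]
            for y in range(h)]
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not diff[y][x]:
                continue
            # count foreground pixels among the (up to) 8 neighbors
            n = sum(diff[yy][xx]
                    for yy in range(max(0, y - 1), min(h, y + 2))
                    for xx in range(max(0, x - 1), min(w, x + 2))
                    if (yy, xx) != (y, x))
            out[y][x] = 1 if n >= min_neighbors else 0
    return out
```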

delete

The panoramic image surveillance system using a plurality of high-resolution cameras of claim 1, wherein the non-interest tracking object removal unit identifies, among the objects of the determined type, user non-interest tracking objects that fall outside the designated criterion and removes them.

The panoramic image surveillance system using a plurality of high-resolution cameras of claim 1, wherein the image conversion unit converts the first pixel format into the second pixel format. The first pixel format consists of a luminance signal (Y), the difference between the blue component and the luminance signal (U), and the difference between the red component and the luminance signal (V). The second pixel format is an RGB (Red-Green-Blue) or grayscale pixel format.

A video surveillance method of a panoramic image surveillance system using a plurality of high-resolution cameras, the system performing image analysis with a DCNN that provides recognition results for objects in an image, wherein
in the video surveillance method, the system receives high-resolution images from the plurality of high-resolution cameras, generates and displays a low-resolution panoramic image from the high-resolution images based on a low-resolution panoramic image generation table, and, when an event occurs while the low-resolution panoramic image is being displayed, displays a high-resolution digital PTZ image using an ultra-high-resolution panoramic image generation table, and the method comprises:
converting, by an image conversion unit of the panoramic image surveillance system, first image data into second image data, the first image data being received from the high-resolution cameras and composed of a plurality of first video frames having a first pixel format, a first resolution, and a first frame rate, by converting the first resolution, the first frame rate, and the first pixel format into, respectively, a second resolution lower than the first resolution, a second frame rate lower than the first frame rate, and a second pixel format different from the first pixel format; and
detecting an event by a motion-based image analysis unit of the panoramic image surveillance system, including:
- detecting and tracking a plurality of moving objects based on motion in the video frames of the second image data converted by the image conversion unit;
- extracting an image of each tracked object;
- classifying each of the plurality of tracking objects into one of a plurality of designated types using the recognition results obtained by inputting the extracted object images into the DCNN;
- identifying user non-interest objects that fall outside a designated criterion among the classified tracking objects and removing them from the set of tracking objects; and
- detecting the event based on designated rules and the tracking information of the remaining user interest objects,
wherein the converting into the second image data comprises generating, by a video frame subsampling unit of the image conversion unit, a plurality of second video frames by sampling fewer video frames than the plurality of input first video frames,
wherein the detecting of the event comprises accumulating, by the motion-based image analysis unit, the recognition results for the plurality of tracking objects over a designated frame section, removing an object of an unspecified type from the set of tracking objects as a user non-interest object based on the accumulated result, and further comprises:
classifying, by a tracking object classification unit of the motion-based image analysis unit, the plurality of tracking objects based on a first recognition result and a second recognition result, the first recognition result being obtained by applying to the DCNN the object images of the tracking objects extracted from one of the second video frames, and the second recognition result being obtained by applying to the DCNN the object images of the tracking objects extracted from at least one other second video frame input a predetermined time interval after that frame; and
identifying, by a non-interest tracking object removal unit of the motion-based image analysis unit, user non-interest objects falling outside a designated criterion among the classified tracking objects, and removing them from the tracking targets of the object tracking unit,
wherein the classifying of the plurality of tracking objects comprises converting the first recognition result and the second recognition result into scores, accumulating the scores, and determining the type with the highest accumulated score as the type of the tracking object.

The video surveillance method of claim 7, wherein the image conversion unit further includes a video frame acquisition unit, a video frame scaling unit, and a pixel format conversion unit, and the converting into the second image data further comprises:
receiving, by the video frame acquisition unit, the plurality of first video frames input at the first frame rate;
converting, by the video frame scaling unit, the generated plurality of second video frames to the second resolution; and
converting, by the pixel format conversion unit, the first pixel format of the second video frames converted to the second resolution into the second pixel format, and providing the resulting second image data to the motion-based image analysis unit.

The video surveillance method of claim 7, wherein the motion-based image analysis unit further includes a motion region detection unit, an object tracking unit, and an interest tracking object-based event detection unit, and the detecting of the event further comprises:
generating, by the motion region detection unit, a difference image from the second image data and a learned background image, and removing noise from the difference image to detect motion regions of the plurality of moving objects;
detecting and tracking, by the object tracking unit, the plurality of moving objects using the detected motion regions and the second image data; and
detecting, by the interest tracking object-based event detection unit, an event in which the tracking information of the user interest objects remaining after the removal satisfies a designated rule.

delete

The video surveillance method of claim 7, wherein the removing from the set of tracking objects comprises identifying, among the objects of the determined type, user non-interest tracking objects that fall outside the designated criterion and removing them from the set of tracking objects.

The video surveillance method of claim 7, wherein the converting into the second image data comprises converting the first pixel format into the second pixel format. The first pixel format consists of a luminance signal (Y), the difference between the blue component and the luminance signal (U), and the difference between the red component and the luminance signal (V). The second pixel format is an RGB or grayscale pixel format.