KR102550117B1

KR102550117B1 - Method and System for Video Encoding Based on Object Detection Tracking

Info

Publication number: KR102550117B1
Application number: KR1020220155446A
Authority: KR
Inventors: 최준호; 김자현
Original assignee: 주식회사 인텔리빅스
Priority date: 2022-11-18
Filing date: 2022-11-18
Publication date: 2023-06-30

Abstract

The present invention relates to a method and a system for video encoding based on object detection and tracking. A system for video encoding and transmission based on object detection and tracking according to one embodiment comprises an image analysis device and an external device. When a captured image from a camera is compressed and transmitted to another device, the image analysis device detects and tracks regions of the captured image in which objects need to be identified, and compresses each tracked object differently depending on whether it is an object of interest, so that the other device can restore objects of interest at a high resolution equal to or higher than a reference resolution. The external device operates as the other device: it receives the object data of the tracked objects provided by the image analysis device, restores the objects of interest at high resolution, and uses the restored high-resolution objects of interest to identify and recognize the tracked objects. Accordingly, the quality of the video can be enhanced.

Description

Video encoding method based on object detection tracking, and system thereof {Method and System for Video Encoding Based on Object Detection Tracking}

The present invention relates to a video encoding and transmission method based on object detection and tracking, and to a system therefor. More particularly, it relates to a method and system in which objects are extracted using, for example, a deep learning-based object detection and tracking system mounted on an artificial intelligence (AI) camera; each detected object is encoded at high resolution by a separate video encoder; and the result is combined with video analysis metadata and transmitted. A monitoring system or the like receives this data and outputs each object at high resolution, and a secondary analysis server uses it to identify and recognize objects.

Functions that analyze video inside a camera to detect and track objects, and that generate events based on the results, are in use. Earlier systems used motion-based object detection, but deep learning-based object detection has recently become the norm. Deep learning-based object detection began with simply identifying the type of an object, such as a person or a vehicle, and has since come to commonly include object attribute analysis and object identification. Attribute analysis refers to classifying an object, for example by recognizing clearly distinguishable clothing of a pedestrian or by distinguishing gender. Object identification refers to verifying or identifying whether a person is the same individual based on facial features or whole-body features.

Among recent CCTV cameras, IP cameras have become common, and because network resources and storage capacity are limited, the video is compressed by an encoder. The problem is that improving analysis performance requires receiving high-resolution, high-quality video, whereas real deployments often set the compression ratio very high. This degrades image quality considerably and makes it difficult to obtain the expected performance.

Techniques for acquiring face images using an IP camera have previously been disclosed. In those techniques, however, although the result of face detection and feature extraction is transmitted together with metadata, the metadata contains neither coordinate information for reconstructing the image on screen nor a function for selecting image quality by object type.

Korean Registered Patent Publication No. 10-1051389 (2011.07.18)
Korean Registered Patent Publication No. 10-2091088 (2020.03.13)
Korean Registered Patent Publication No. 10-1399785 (2014.05.20)

An object of embodiments of the present invention is to provide a video encoding and transmission method based on object detection and tracking, and a system therefor, in which objects are extracted using, for example, a deep learning-based object detection and tracking system mounted on an artificial intelligence (AI) camera, each detected object is encoded at high resolution by a separate video encoder, and the result is combined with video analysis metadata and transmitted, so that a monitoring system or the like can receive it and output each object at high resolution and a secondary analysis server can use it to identify and recognize objects.

A video encoding and transmission system based on object detection and tracking according to an embodiment of the present invention includes: an image analysis device that, when a captured image from a camera is compressed and transmitted to another device, detects and tracks regions of the captured image in which objects need to be identified, and compresses each tracked object differently depending on whether it is an object of interest so that the other device can restore the object of interest at a high resolution equal to or higher than a reference resolution; and an external device that operates as the other device, receives the object data of the tracked objects provided by the image analysis device, restores the object of interest at the high resolution, and identifies and recognizes the tracked object using the restored high-resolution object of interest.

The image analysis device may be provided in the camera, and the external device may include a device that is located at a remote site reachable via a communication network and that performs at least one of monitoring, data storage, search, and image analysis.

The image analysis device may include a multi-object encoding unit that compresses objects of interest and objects of non-interest differently.

The image analysis device may compress person objects and vehicle objects in the captured image, as the objects of interest, such that they can be restored at the high resolution.

The image analysis device may transmit metadata of the tracked object as the object data. So that the location of the tracked object can be determined, it may generate map coordinates for the tracked object on pre-stored map data relating to the place the camera captures, and include those map coordinates in the transmitted metadata.
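The generation of map coordinates described above can be illustrated with a minimal sketch. The specification does not state how image positions are converted to map coordinates; the sketch assumes a planar homography calibrated in advance for the camera's location, and the names `pixel_to_map`, `TrackMetadata`, and `build_metadata` are hypothetical.

```python
from dataclasses import dataclass


def pixel_to_map(h, x, y):
    """Project image point (x, y) to map coordinates.

    h is a 3x3 homography (nested tuples). Using a homography is an
    assumption of this sketch; the specification does not name a method.
    """
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return (u / w, v / w)


@dataclass
class TrackMetadata:
    """Hypothetical metadata record for one tracked object."""
    track_id: int
    object_type: str
    bbox: tuple    # (x, y, width, height) in image pixels
    map_xy: tuple  # position on the pre-stored map


def build_metadata(track_id, object_type, bbox, h):
    """Attach map coordinates to a tracked object's metadata."""
    x, y, width, height = bbox
    # Use the bottom-center of the box as the ground contact point.
    foot = (x + width / 2, y + height)
    return TrackMetadata(track_id, object_type, bbox, pixel_to_map(h, *foot))


# Example homography: a pure 0.5x scaling from pixels to map units.
H = ((0.5, 0.0, 0.0), (0.0, 0.5, 0.0), (0.0, 0.0, 1.0))
record = build_metadata(7, "person", (100, 50, 20, 40), H)
```

Taking the bottom-center of the bounding box as the reference point is also a design assumption; it approximates where a pedestrian or vehicle touches the ground plane.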

A video encoding and transmission method based on object detection and tracking according to an embodiment of the present invention includes: a step in which an image analysis device, when a captured image from a camera is compressed and transmitted to another device, detects and tracks regions of the captured image in which objects need to be identified, and compresses each tracked object differently depending on whether it is an object of interest so that the other device can restore the object of interest at a high resolution equal to or higher than a reference resolution; and a step in which an external device operating as the other device receives the object data of the tracked objects provided by the image analysis device, restores the object of interest at the high resolution, and identifies and recognizes the tracked object using the restored high-resolution object of interest.

The multi-object encoding unit included in the image analysis device may compress objects of interest and objects of non-interest differently.

In the step of compressing differently, person objects and vehicle objects in the captured image may be compressed, as the objects of interest, such that they can be restored at the high resolution.

The method may further include a step in which the image analysis device transmits metadata of the tracked object as the object data and, so that the location of the tracked object can be determined, generates map coordinates for the tracked object on pre-stored map data relating to the place the camera captures and includes those map coordinates in the transmitted metadata.

According to embodiments of the present invention, it is possible to solve the conventional problem that, in environments with very low bandwidth, video quality degrades and in particular fast-moving objects such as pedestrians and vehicles suffer severe blocking artifacts, making object identification difficult.

In addition, according to embodiments of the present invention, an edge device detects and tracks the regions in which objects need to be identified, and only those regions are compressed at high quality and transmitted as video, which improves object identification capability.

Embodiments of the present invention crop each object of interest using the object detection and tracking system mounted on the AI camera, generate video and still images from the crops, and encode them per object of interest, thereby securing video quality for the objects of interest.

Furthermore, acquiring high-quality images makes it possible to identify or recognize people and vehicles, and the images or video clips obtained at high resolution can serve as input data for secondary video analysis.

FIG. 1 is a diagram of a video encoding system based on edge-based object detection and tracking according to an embodiment of the present invention, together with the associated monitoring, storage, search, and analysis devices.
FIG. 2 is a diagram showing the detailed operation of the multi-object detection and tracking unit.
FIGS. 3 and 4 are diagrams showing the detailed operation of the multi-object encoding unit.
FIG. 5 is a diagram showing the detailed operation of the multi-object video stream unit.
FIG. 6 is a diagram schematically illustrating the detailed operation of FIG. 5.
FIG. 7 is a diagram showing a video encoding system based on object detection and tracking according to another embodiment of the present invention.
FIG. 8 is a block diagram illustrating the detailed structure of the edge-type image analysis device of FIG. 1 or the edge device of FIG. 7.
FIG. 9 is a flowchart showing a video encoding and transmission method based on object detection and tracking according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a diagram of a video encoding system based on edge-based object detection and tracking according to an embodiment of the present invention, together with the associated monitoring, storage, search, and analysis devices.

As shown in FIG. 1, the video encoding system based on edge-based object detection and tracking according to an embodiment of the present invention, together with the associated monitoring, storage, search, and analysis devices (hereinafter, the video encoding system based on object detection and tracking) 90, includes some or all of an edge-type image analysis device 100, a communication network (not shown), and an image processing system 110.

Here, "includes some or all of" means, for example, that the video encoding system 90 based on object detection and tracking may be configured with some components (devices) of the image processing system 110 omitted, or that some devices or components of the image processing system 110 may be integrated into a network device (for example, a wireless switching device) that forms part of the communication network. To aid a full understanding of the invention, the description below assumes that all of them are included.

The edge-type image analysis device 100 may be mounted in a camera that is installed at an arbitrary place, such as a public place or a crime-prone area, and that captures that place. Alternatively, it may be installed and operated adjacent to the camera, in which case it is preferable that no separate compression be performed between the camera and the device. The camera may be a CCTV camera, an IP camera, or the like, and may include an artificial intelligence (AI) camera in which a deep learning-based object detection and tracking system is implemented. The edge-type image analysis device 100 can operate in conjunction with various types of cameras, and in effect receives the captured image at high resolution, that is, at a resolution corresponding to the resolution of the camera. For this reason, the edge-type image analysis device 100 is preferably mounted inside the camera and operated in connection with it. Because the captured image then undergoes no intermediate processing (for example, encoding and decoding), the device can receive an image corresponding to the camera's resolution.

The resolution of a camera depends on the performance of its lens, so a high-resolution image can be obtained when the color filter or the like provided with the lens has high resolution. In the embodiments of the present invention, however, a resolution equal to or higher than a reference resolution is defined as high resolution, and anything below it is low resolution. The reference resolution can be regarded as the minimum resolution needed to implement the technique according to the embodiments.

The edge-type image analysis device 100 may perform image analysis on the captured image received from the camera. For example, when the camera generates 60 video frames per second, the edge-type image analysis device 100 may analyze the frames and track objects in the image as they change over time. Broadly, an image consists of a background image and the objects within it. An object may be a person or a vehicle, and various other kinds of things, including animals and trees, may also be objects. The types of objects that appear against the background vary with the place where the camera is installed and operated. In the embodiments of the present invention, objects are detected in the image and the detected objects are tracked over time; in the course of this, metadata of the tracked objects may be generated, and the metadata may include location information of the tracked objects.

The edge-type image analysis device 100 may also be equipped with an artificial intelligence deep learning program to increase the accuracy of object detection and classification. For example, when objects overlap, it may be difficult to recognize a particular object. In that case, detection of the object may be deferred, and the type of the deferred object may be determined based on information from the next video frame. By learning this process through deep learning, objects can be detected and classified.
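The deferral of overlapping objects described above can be illustrated as follows. This is only a sketch under stated assumptions: the specification does not say how overlap is measured, so intersection over union (IoU) with a hypothetical threshold is used here, and `classify_or_defer` is an illustrative name rather than a unit of the disclosed device.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def classify_or_defer(boxes, labels, overlap_threshold=0.3):
    """Return one label per box, or None where the decision is deferred.

    A box that overlaps any other box by more than overlap_threshold
    (an assumed value) is left undecided so that a later frame can be
    used to determine its type.
    """
    decided = []
    for i, box in enumerate(boxes):
        overlapped = any(
            iou(box, other) > overlap_threshold
            for j, other in enumerate(boxes)
            if j != i
        )
        decided.append(None if overlapped else labels[i])
    return decided


boxes = [(0, 0, 10, 10), (5, 0, 15, 10), (100, 100, 110, 110)]
labels = ["person", "person", "vehicle"]
result = classify_or_defer(boxes, labels)
```

In this example the first two boxes overlap with an IoU of one third, so both are deferred, while the isolated third box keeps its label.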

When analyzing an image, the edge-type image analysis device 100 according to an embodiment of the present invention may distinguish objects of interest from objects of non-interest. An object of interest may be a person or a vehicle, and objects of non-interest may include every object other than the objects of interest. Accordingly, when the edge-type image analysis device 100 compresses and transmits the captured image, or more precisely when it transmits the result of analyzing the captured image, it may compress objects of interest and objects of non-interest differently. For example, objects of non-interest may be compressed according to a compression ratio or compression method preset in the edge-type image analysis device 100, or more precisely according to the capability of its encoder, whereas objects of interest may be transmitted as compressed video data with a lower compression ratio than objects of non-interest, so that a remote device can restore them at high resolution. The stronger the compression (encoding), the more video data is discarded, and the greater the compression loss.

Therefore, in the embodiments of the present invention, objects of non-interest (or the background image) and objects of interest are compressed differently, and it is also entirely possible to transmit the data of objects of interest without compressing it at all. Because this can be implemented in various ways according to the intent of the system designer, the embodiments of the present invention are not limited to any one form.
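As a minimal sketch of the differential compression described above, the following selects encoding parameters per object type. The quality scale, the default values, and the names `EncodeParams` and `select_encode_params` are assumptions made for illustration; the specification leaves the concrete compression ratios and methods to the system designer.

```python
from dataclasses import dataclass

# Object types treated as objects of interest in the described embodiment.
INTEREST_TYPES = {"person", "vehicle"}


@dataclass(frozen=True)
class EncodeParams:
    quality: int    # hypothetical 1-100 scale; higher means less compression
    lossless: bool  # True when the object data is sent uncompressed


def select_encode_params(object_type,
                         background_quality=30,
                         interest_quality=90,
                         send_interest_uncompressed=False):
    """Pick encoding parameters for one tracked object or the background."""
    if object_type in INTEREST_TYPES:
        if send_interest_uncompressed:
            # Option mentioned in the text: skip compression entirely.
            return EncodeParams(quality=100, lossless=True)
        return EncodeParams(quality=interest_quality, lossless=False)
    # Objects of non-interest and the background use the preset setting.
    return EncodeParams(quality=background_quality, lossless=False)


person = select_encode_params("person")
tree = select_encode_params("tree")
raw_vehicle = select_encode_params("vehicle", send_interest_uncompressed=True)
```

Keeping the decision in one function means the set of interest types and the quality levels can be changed without touching the encoder itself.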

In addition, when the edge-type image analysis device 100 generates and transmits, for example, the background image of the captured image or metadata about the tracked objects in the background image, it may also transmit compression information for the objects of interest that are compressed differently (provided, for example, for decoding) and location information of the objects of interest within the background image, so that the objects of interest can be restored on that basis. The location information here may be a position within the captured image, or it may be map coordinates on map data of the place being captured.
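One way to carry the compression information and location alongside the encoded object data is sketched below. The record layout (a 4-byte length prefix, a JSON header, then the payload) and all field names are hypothetical; the specification does not define a wire format.

```python
import json


def pack_object_record(track_id, object_type, bbox, codec, quality, payload):
    """Serialize one object's metadata and encoded bytes into a record.

    Assumed layout: 4-byte big-endian header length, JSON header, payload.
    """
    header = {
        "track_id": track_id,
        "type": object_type,
        "bbox": list(bbox),       # position within the captured image
        "codec": codec,           # compression information for decoding
        "quality": quality,
        "payload_len": len(payload),
    }
    header_bytes = json.dumps(header).encode("utf-8")
    return len(header_bytes).to_bytes(4, "big") + header_bytes + payload


def unpack_object_record(record):
    """Recover the header dictionary and the encoded payload."""
    header_len = int.from_bytes(record[:4], "big")
    header = json.loads(record[4:4 + header_len].decode("utf-8"))
    start = 4 + header_len
    payload = record[start:start + header["payload_len"]]
    return header, payload


encoded = b"\x00\x01\x02\x03"  # stand-in for real encoder output
record = pack_object_record(7, "person", (100, 50, 20, 40), "h264", 90, encoded)
header, payload = unpack_object_record(record)
```

A receiver can use `bbox` to place the decoded object back on the background and `codec` to choose the matching decoder.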

As a result, the edge-type image analysis device 100 crops each object of interest using the object detection and tracking system mounted on the AI camera, generates video and still images from the crops, and encodes them per object of interest, so that high-resolution video quality is obtained when the generated video and still images are restored. Acquiring high-quality images makes it correspondingly easier to identify or recognize people and vehicles. The images or video clips obtained at high resolution can also be used as input data for secondary video analysis, which increases its accuracy.

In addition, the embodiments of the present invention can solve the existing problem that, in environments with very low bandwidth, video quality degrades and in particular fast-moving objects such as pedestrians and vehicles suffer severe blocking artifacts, making object identification difficult. The edge device detects and tracks the regions in which objects need to be identified, and only those regions are compressed at high quality and transmitted as video, which improves object identification capability.

The image processing system 110 according to an embodiment of the present invention includes a system that can monitor live the multi-object video stream provided by the edge-type image analysis device 100, a system that stores, plays back, and searches the multi-object video stream, and an analysis system that performs secondary analysis based on the multi-object video stream.

A monitoring system such as the monitoring device 111 receives the multi-object video stream provided by the edge-type image analysis device 100. The monitoring system may output a background image and render the per-object videos on top of it, or it may render the per-object videos on top of the real-time video stream. When outputting multiple objects, the monitoring system may select and output only objects of interest, for example based on the type of the object of interest, its attribute information, or event information about an incident or accident.
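The output method described above, in which per-object video is drawn over a background and optionally filtered by object type, can be sketched as follows. Frames are modeled as plain nested lists of pixel values to keep the example self-contained; the patch dictionary layout and the function name `composite` are assumptions for illustration only.

```python
def composite(background, patches, wanted_types=None):
    """Draw decoded object patches onto a copy of the background frame.

    background: 2D list of pixel values.
    patches: list of dicts with keys "type", "x", "y", "pixels", where
             (x, y) is the top-left position in the original image.
    wanted_types: optional set of object types to display; None shows all.
    """
    frame = [row[:] for row in background]  # leave the background untouched
    height, width = len(frame), len(frame[0])
    for patch in patches:
        if wanted_types is not None and patch["type"] not in wanted_types:
            continue
        for dy, row in enumerate(patch["pixels"]):
            for dx, value in enumerate(row):
                y, x = patch["y"] + dy, patch["x"] + dx
                if 0 <= y < height and 0 <= x < width:  # clip at frame edges
                    frame[y][x] = value
    return frame


background = [[0] * 4 for _ in range(4)]
patches = [
    {"type": "person", "x": 1, "y": 1, "pixels": [[1, 1], [1, 1]]},
    {"type": "tree", "x": 3, "y": 0, "pixels": [[2]]},
]
only_people = composite(background, patches, wanted_types={"person"})
everything = composite(background, patches)
```

The same placement step applies to playback, where each per-object video is moved back to its original position before output.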

The storage system of the image processing system 110 receives and stores the multi-object video stream provided by the edge-type image analysis device 100. A storage system such as the storage/playback/search device 112 stores the background image and the per-object video streams together with the real-time video. A playback and search system such as the storage/playback/search device 112 includes a function that moves each per-object encoded video to its original position for output, so that the result is displayed in the same way as the real-time video stream, and may also select and output only the video of objects of interest. During playback, the playback system may offer the choice of using either the real-time video stream or a learned background image as the background.

A secondary analysis system such as the secondary analysis device 113 performs secondary analysis using the object detection and tracking information received from the edge-type image analysis device 100, which acts as the primary analysis server. It can also refer to the per-object video encoded at high quality to achieve higher performance. For example, it may perform analysis to detect events related to incidents or accidents involving people or vehicles.

The image processing system 110 operates in conjunction with the video encoding system of FIG. 1, that is, the edge-type image analysis device 100, and may include various types of devices. Typically, as shown in FIG. 1, it may include server devices for providing various video services, such as the monitoring device 111, the storage/playback/search device 112, and the secondary analysis device 113. The monitoring device 111 may be installed and operated in a control center of a government agency or the like. The storage/playback/search device 112 may include devices such as various servers that provide video playback or search services. The secondary analysis device 113 may include a device that performs a new analysis operation based on the image analysis result provided by the edge-type image analysis device 100. For example, when the monitoring device 111 of a control center needs to learn of incidents and accidents occurring at a given place from the captured images of a camera installed there, the secondary analysis device 113 may determine events such as incidents and accidents based on the image analysis result provided by the edge-type image analysis device 100 and notify the monitoring device 111.

To summarize, the monitoring device 111 is installed in a control center or the like so that control personnel can detect events such as incidents and accidents occurring at the place where the camera is installed. The monitoring device 111 may therefore include a computer used by control personnel. The storage/playback/search device 112 stores, for example, the video frame data of the video provided by the edge-type image analysis device 100, and may also store audio data with it. Accordingly, when the monitoring device 111 wants to review the video from a specific date and place, the video can be retrieved through search and played back. For example, when a crime or traffic accident occurs at a specific place and a police station or the like requests the video of that place, a corresponding service can be provided.

Furthermore, the secondary analysis device 113 performs a new image analysis operation based on the result of the primary image analysis carried out on the images captured by the camera. In this process, it may also use the video stored in the storage/playback/search device 112 to perform the secondary analysis; as described above, a representative operation is determining events such as incidents and accidents. The secondary analysis device 113 may also store a deep learning program for determining such events. It may build up big data and determine incidents and accidents by learning from that data, and applying a deep learning program may increase the accuracy of the determination.

In addition to the above, the edge-type image analysis device 100 and the image processing system 110 of FIG. 1 may perform various operations; further details are described below with reference to FIGS. 2 to 5.

FIG. 2 is a diagram showing the detailed operation of the multi-object detection and tracking unit, FIGS. 3 and 4 are diagrams showing the detailed operation of the multi-object encoding unit, FIG. 5 is a diagram showing the detailed operation of the multi-object video stream unit, and FIG. 6 is a diagram schematically illustrating the detailed operation of FIG. 5.

For convenience of explanation, referring to FIGS. 2 to 6 together with FIG. 1, the edge-type image analysis device 100 according to an embodiment of the present invention includes, as shown in FIG. 1, some or all of an image input unit 101, a background image generation unit 102, a multi-object detection and tracking unit 103, a multi-object encoding unit 104, and a multi-object video stream unit 105. Here, "includes some or all" has essentially the same meaning as described earlier, so that description applies.

The image input unit 101 receives an image within the edge-type image analysis device 100 so that the device can perform the video encoding and transmission method based on object detection and tracking. That is, the edge-type image analysis device 100 may be regarded as including either a device that receives images from an external camera or an image input system that captures images inside the edge equipment itself, as in an IP camera.

The background image generation unit 102 generates a background image from the captured image provided by the image input unit 101. That is, the edge-type image analysis device 100 includes the background image generation unit 102, which generates the background image that serves as the background of the per-object streams.

The multi-object detection and tracking unit 103 analyzes the input image received through the image input unit 101 to detect and track multiple objects simultaneously. That is, the edge-type image analysis device 100 includes the multi-object detection and tracking unit 103, which detects objects in the input image and tracks the detected objects across a series of frames. During tracking, the tracking information may be generated in the form of direction vector information, scalar information, or the like. It may also be generated in the form of map coordinates on map data.
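As a minimal sketch of tracking information expressed as a direction vector plus a scalar, the following Python code derives both from a sequence of object positions. The function name `tracking_info` and the dictionary keys are hypothetical, and the use of a unit direction vector with a Euclidean distance is an assumption; the description does not specify how these values are computed.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def tracking_info(track: List[Point]) -> List[Dict[str, object]]:
    """Return one record per pair of consecutive positions.

    Each record holds a unit direction vector and a scalar distance
    (hypothetical representation, not taken from the description).
    """
    info = []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        dist = math.hypot(dx, dy)
        # A stationary object has no meaningful direction; use a zero vector.
        direction = (dx / dist, dy / dist) if dist > 0 else (0.0, 0.0)
        info.append({"direction": direction, "distance": dist})
    return info
```

Calling `tracking_info([(0, 0), (3, 4)])` yields a single record whose distance is 5 and whose direction points along (0.6, 0.8).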

FIG. 2 shows the detailed operation of the multi-object detection and tracking unit. The multi-object detection and tracking unit 103 receives real-time images from the image input unit 101 and detects multiple objects (S200, S210). That is, it can detect objects of interest, such as people and vehicles, against the background image, as well as other non-interest objects. Object detection may be performed by an object detector, and a DCNN-based object detector may be used to increase detection accuracy. Alternatively, the detector may learn from various types of images of people and vehicles and then detect and classify objects on that basis. The multi-object detection and tracking unit 103 also tracks the detected objects, which can be regarded as tracking their motion over time. For example, if an object detected in the (n-1)th video frame and an object in the nth video frame have the same attributes (e.g., shape or color), the two are treated as the same object, and its motion can be tracked from the respective positions. The multi-object detection and tracking unit 103 may then generate metadata for the multiple objects. The metadata carries various information about the tracked objects, whether objects of interest or non-interest objects, for example the shape, type, and color of an object. The metadata may also include the position information of the object, or compression information related to the compression of an object of interest.
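The frame-to-frame association and metadata generation described above can be sketched as follows. This is a simplified illustration, not the detector or tracker of the embodiment: the class `SimpleTracker`, the `Detection` fields, the distance threshold, and the set `INTEREST_TYPES` are all hypothetical, and greedy nearest-neighbor matching on identical attributes is an assumed stand-in for whatever matching the actual system uses.

```python
import math
from dataclasses import dataclass
from itertools import count
from typing import Dict, List, Tuple

# Assumption: people and vehicles are the preset objects of interest.
INTEREST_TYPES = {"person", "vehicle"}


@dataclass
class Detection:
    label: str                    # object type, e.g. "person"
    color: str                    # a simple appearance attribute
    center: Tuple[float, float]   # position in the frame


class SimpleTracker:
    """Greedy tracker: same attributes and nearest position mean same object."""

    def __init__(self, max_distance: float = 50.0):
        self.max_distance = max_distance
        self.tracks: Dict[int, Detection] = {}
        self._ids = count(1)

    def update(self, detections: List[Detection]) -> List[dict]:
        unmatched = dict(self.tracks)   # tracks from frame n-1 not yet claimed
        current: Dict[int, Detection] = {}
        metadata = []
        for det in detections:
            best_id, best_dist = None, self.max_distance
            for tid, prev in unmatched.items():
                if (prev.label, prev.color) != (det.label, det.color):
                    continue            # attributes differ: not the same object
                d = math.dist(prev.center, det.center)
                if d <= best_dist:
                    best_id, best_dist = tid, d
            if best_id is None:
                best_id = next(self._ids)   # no match: start a new track
            else:
                del unmatched[best_id]
            current[best_id] = det
            metadata.append({
                "id": best_id,
                "type": det.label,
                "color": det.color,
                "position": det.center,
                "of_interest": det.label in INTEREST_TYPES,
            })
        self.tracks = current
        return metadata
```

Feeding two consecutive frames to `update` keeps the same ID for an object whose type and color are unchanged and whose position moved by less than `max_distance`, while a newly appearing object receives a fresh ID.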

The multi-object encoding unit 104 is the part that generates video per tracked object. The edge-type image analysis device 100 may include a multi-object video generation unit, that is, the multi-object encoding unit, which generates a video stream for each object based on the input image and the multi-object tracking information.

The multi-object encoding unit 104 may compress, that is, encode, the video data of the various types of objects provided by the multi-object detection and tracking unit 103. In doing so, it may distinguish objects of interest, such as people and vehicles, from other non-interest objects, compress them with different compression methods, and output video data, or more precisely object data. In addition to applying different compression depending on the object type, more precisely on whether the object is an object of interest, the multi-object encoding unit 104 according to an embodiment of the present invention may also generate object metadata. The multi-object detection and tracking unit 103 may generate the metadata itself, but it may instead provide only the position of each object in the captured image, that is, its coordinate information; in an embodiment of the present invention, image processing may take the form of generating and providing map coordinates on the map data of a given area. Accordingly, besides performing different compression operations depending on whether the target is an object of interest (or a non-interest object, a background image, and so on), the multi-object encoding unit 104 may perform operations for generating metadata. In this respect, the multi-object encoding unit 104 may be called a multi-object video generation unit, or the multi-object encoding unit 104 may be included within a multi-object video generation unit.

The multi-object encoding unit 104 may include a first encoder for compressing, that is, encoding, non-interest objects and a second encoder for compressing objects of interest. In an embodiment of the present invention, however, it is also possible to compress only the non-interest objects according to a preset compression method and to transmit the objects of interest without compression. A currently used standardized technique may be applied as the preset compression method, which may consist of intra coding, inter coding, or a combination of the two. The second encoder, by contrast, may encode objects of interest in a manner different from the first encoder, and the encoding is preferably performed so as to increase the resolution when the object of interest is restored. Since the compression rate indicates the degree of compression, the second encoder may have a lower compression rate than the first encoder. The essential point of the embodiment is that objects of interest and non-interest objects are compressed differently, and various compression methods may be used to that end. In addition, since objects of interest such as people or vehicles are classified based on preset information, the object detector may be configured as a deep learning-based detector so that learning improves the accuracy of detecting objects of interest and non-interest objects.
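To illustrate the idea of two encoders with different compression rates, the sketch below uses plain scalar quantization of 8-bit pixel values as a toy stand-in for a real video codec. The step sizes, function names, and packet layout are hypothetical; the description does not prescribe a particular codec or parameter. A coarse step plays the role of the first encoder (non-interest objects) and a fine step the role of the second encoder (objects of interest), with the step carried in the packet as compression information so that the receiver can decode accordingly.

```python
from typing import Dict, List

# Hypothetical quantization steps: larger step = stronger (lossier) compression.
STEP_NON_INTEREST = 32   # "first encoder": coarse
STEP_INTEREST = 2        # "second encoder": fine, lower compression rate


def quantize(pixels: List[int], step: int) -> List[int]:
    """Lossy 'encode': keep only the quantization bin index of each pixel."""
    return [p // step for p in pixels]


def dequantize(indices: List[int], step: int) -> List[int]:
    """'Decode': map each index to the centre of its bin, clipped to 255."""
    return [min(255, i * step + step // 2) for i in indices]


def encode_object(pixels: List[int], of_interest: bool) -> Dict[str, object]:
    """Choose the encoder by object class and record the step in the packet."""
    step = STEP_INTEREST if of_interest else STEP_NON_INTEREST
    return {"step": step, "data": quantize(pixels, step)}


def decode_object(packet: Dict[str, object]) -> List[int]:
    """Restore pixels using the compression information carried in the packet."""
    return dequantize(packet["data"], packet["step"])


def max_error(original: List[int], restored: List[int]) -> int:
    """Largest absolute pixel difference between original and restored data."""
    return max(abs(a - b) for a, b in zip(original, restored))
```

With the same input pixels, the object-of-interest path restores values much closer to the original than the non-interest path, which is the property the embodiment relies on for later identification and recognition.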

FIGS. 3 and 4 show the detailed operation of the multi-object encoding unit 104 of FIG. 1. When an image is input through the image input unit 101, the multi-object encoding unit 104 may generate a background image and update the generated background image (S300, S310), because the background changes over time. The multi-object encoding unit 104 may also encode the background image as video (S320). The background image may be encoded with a preset encoding method, that is, with a standardized encoder. The multi-object encoding unit 104 may further perform per-object encoding: it receives the real-time image and the multi-object tracking information and generates an image for each object (S400, S410, S420). Each object image may be classified as an object of interest or a non-interest object. Video encoding is then performed per object ID (S430), and in this process objects of interest and non-interest objects may be compressed differently. Alternatively, only the background image may be compressed with an existing standardized encoding technique while the objects are transmitted using a different compression method; the embodiments of the present invention are therefore not limited to any one form.
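The background update (S310) and per-object image generation (S420) can be sketched as below. The description does not state which background model is used, so the exponential running average here is an assumed, commonly used substitute, and the names `update_background`, `crop_object`, and the `(x, y, width, height)` box layout are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]   # grayscale image as rows of pixel values


def update_background(background: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Blend the new frame into the background with weight alpha.

    A small alpha makes the background adapt slowly, so briefly
    passing objects leave little trace in it.
    """
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def crop_object(frame: Frame, box: Tuple[int, int, int, int]) -> Frame:
    """Cut the object image out of the frame; box is (x, y, width, height)."""
    x, y, w, h = box
    return [row[x:x + w] for row in frame[y:y + h]]
```

In such a scheme, each tracked object's bounding box from the tracking information would be passed to `crop_object` on every frame, and the resulting image sequence would be encoded per object ID.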

The multi-object video stream unit 105 of FIG. 1 may correspond to a kind of communication unit. It collects the background image, the multi-object tracking information, and the per-object video and transmits them by streaming. The multi-object video stream unit 105 may perform a data synthesis operation, place the resulting data in a designated data packet, and provide it to the image processing system 110. The edge-type image analysis device 100 also includes a real-time video stream unit that encodes the entire input image in the usual manner and provides it as a video stream.

FIG. 5 is a diagram showing the detailed operation of the multi-object video stream unit of FIG. 1, and FIG. 6 is a diagram schematically illustrating the detailed operation of FIG. 5. As shown in FIG. 5, the multi-object video stream unit 105 of FIG. 1 according to an embodiment of the present invention receives the metadata information of the multiple objects, the encoded video information for each object, and the encoded background image information, and synchronizes them into a multi-information stream (S500 to S530). For example, when person and vehicle objects are detected within one unit video frame as in (a) of FIG. 6 and are transmitted as streams, they can be time-synchronized and transmitted simultaneously as in (b) of FIG. 6 (S540).
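One way to picture the synchronization of steps S500 to S540 is to group the three inputs by timestamp into a single packet per time instant. The following sketch assumes that every item carries a timestamp and that objects are keyed by an object ID; the function name `synchronize` and the packet layout are hypothetical and are not the packet format of the embodiment.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def synchronize(
    background: List[Tuple[int, Any]],
    object_videos: List[Tuple[int, int, Any]],
    metadata: List[Tuple[int, int, Any]],
) -> List[Dict[str, Any]]:
    """Merge three streams into timestamp-ordered packets.

    background:    (timestamp, encoded background data)
    object_videos: (timestamp, object_id, encoded object data)
    metadata:      (timestamp, object_id, metadata record)
    """
    packets: Dict[int, Dict[str, Any]] = defaultdict(
        lambda: {"background": None, "objects": {}, "metadata": {}}
    )
    for ts, payload in background:
        packets[ts]["background"] = payload
    for ts, oid, payload in object_videos:
        packets[ts]["objects"][oid] = payload
    for ts, oid, meta in metadata:
        packets[ts]["metadata"][oid] = meta
    # Emit one packet per timestamp so all data for a frame travels together.
    return [{"timestamp": ts, **packets[ts]} for ts in sorted(packets)]
```

A receiver can then decode the background and each object from the same packet and composite them for the same instant, using the metadata to place each object.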

FIG. 7 is a diagram illustrating a video encoding system based on object detection and tracking according to another embodiment of the present invention.

As shown in FIG. 7, a video encoding system 190 based on object detection and tracking according to another embodiment of the present invention may include some or all of an edge device 700, a communication network 710, and an image processing device 720.

Here, "including some or all" means, for example, that the video encoding system 190 based on object detection and tracking may be configured with some components such as the communication network 710 omitted, or that some components such as the image processing device 720 may be integrated into a network device (e.g., a wireless switching device) constituting the communication network 710. To aid a full understanding of the invention, the system is described as including all of the components.

The edge device 700 and the image processing device 720 of FIG. 7 do not differ significantly from the edge-type image analysis device 100 and the image processing system 110 of FIG. 1. The description given for FIG. 1 therefore applies to them.

However, the image processing device 720 of FIG. 7 shows that the monitoring device 111, the storage/playback/search device 112, and the secondary analysis device 113 of FIG. 1 can be integrated into a single device. The image processing device 720 of FIG. 7 may therefore include functional blocks such as a monitoring unit, a storage/playback/search unit, and a secondary analysis unit and perform the same operations as in FIG. 1. The configuration of FIG. 1 may process data faster than that of FIG. 7, but it may cost more because it uses multiple pieces of equipment. Since such configurations may vary according to the intention of the system designer, the embodiments of the present invention are not limited to any one form.

The communication network 710 includes both wired and wireless communication networks. For example, a wired or wireless Internet network may be used as, or interworked with, the communication network 710. Here, the wired network includes Internet networks such as a cable network or the public switched telephone network (PSTN), and the wireless communication network includes CDMA, WCDMA, GSM, EPC (Evolved Packet Core), LTE (Long Term Evolution), WiBro networks, and the like. The communication network 710 according to an embodiment of the present invention is not limited to these and may also be used as an access network of a next-generation mobile communication system to be implemented in the future, for example a cloud computing network in a cloud computing environment or a 5G network. If the communication network 710 is a wired network, an access point within it may connect to a telephone exchange or the like; if it is a wireless network, the access point may connect to an SGSN or GGSN (Gateway GPRS Support Node) operated by the carrier to process data, or to various relays such as a BTS (Base Transceiver Station), NodeB, or e-NodeB.

The communication network 710 may include an access point. Access points include small base stations such as femto or pico base stations, which are commonly installed inside buildings. Femto and pico base stations are distinguished, within the classification of small base stations, by the maximum number of devices such as the edge device 700 that can connect to them. The access point may include a short-range communication module for short-range communication with the edge device 700 and the like, such as ZigBee or Wi-Fi. The access point may use TCP/IP or RTSP (Real-Time Streaming Protocol) for wireless communication. Besides Wi-Fi, short-range communication may be performed according to various standards, including Bluetooth, ZigBee, infrared (IrDA), RF (radio frequency) such as UHF (ultra high frequency) and VHF (very high frequency), and ultra-wideband (UWB). Accordingly, the access point may extract the location of a data packet, designate the best communication path for the extracted location, and forward the data packet along that path to the next device, for example the image processing device 720. An access point may share multiple lines in a typical network environment and includes, for example, a router, a repeater, and a relay.

FIG. 8 is a block diagram illustrating the detailed structure of the edge-type image analysis device of FIG. 1 or the edge device of FIG. 7.

As shown in FIG. 8, the edge device 700 or the edge-type image analysis device 100 according to an embodiment of the present invention (hereinafter described as the edge device) includes some or all of a communication interface unit 800, a control unit 810, an edge-based image analysis unit 820, and a storage unit 830.

Here, "includes some or all" means, for example, that the edge device 700 may be configured with some components such as the storage unit 830 omitted, or that some components such as the edge-based image analysis unit 820 may be integrated into another component such as the control unit 810. To aid a full understanding of the invention, the device is described as including all of the components.

The communication interface unit 800 may receive images captured by a camera that is installed at a given location and photographs that location. In practice, the edge device 700 of FIG. 7 may itself be mounted inside the camera, in which case it can operate in conjunction with the lens unit of the camera. The communication interface unit 800 may also transmit the images captured by the camera to the image processing device 720 via the communication network 710 of FIG. 7.

Under the control of the control unit 810, the communication interface unit 800 may transmit the captured images of the edge device 700 as a stream. In doing so, according to an embodiment of the present invention, it may transmit video data in which objects of interest and non-interest objects are compressed by different methods, or data in which the background image and the objects within the image are compressed by different methods. This allows, for example, the image processing device 720 of FIG. 7 to restore objects such as people and vehicles at high resolution when it restores the compressed video data it receives. Because the metadata is transmitted together with information on the compression of each tracked object and information such as its position within the background image, the image processing device 720 can restore the image on that basis according to a preset communication protocol (e.g., the encoding/decoding scheme of the tracked objects).

The control unit 810 is responsible for the overall control of the communication interface unit 800, the edge-based image analysis unit 820, and the storage unit 830 of FIG. 8. When images captured by the camera are received through the communication interface unit 800, the control unit 810 may temporarily store them in the storage unit 830. The control unit 810 also provides the captured images to the edge-based image analysis unit 820 for image analysis and may control the communication interface unit 800 to transmit the object data generated in this process, compressed by different methods, together with the image data of the background image.

The edge-based image analysis unit 820 separates the background image and the objects in the images captured by the camera and may then compress the background image with the existing encoder, that is, the encoder installed when the product was shipped, according to its capabilities. The compression may be performed in various ways using intra coding, inter coding, or a combination of the two. For the separated objects, by contrast, the edge-based image analysis unit 820 may perform no compression or may use a lower compression rate than for the background image, so that on restoration the objects are restored at a higher resolution than the background image. For transmission, a separate structure containing the pixel values of the object-related video data may be formed within the data packet. In other words, it is preferable to implement the technique according to an embodiment of the present invention without changing the standardized data packet format.

In addition, the edge-based image analysis unit 820 according to an embodiment of the present invention may further separate the objects into non-interest objects and objects of interest and compress them differently. For example, since people and vehicles may be the main objects of interest when a traffic control center monitors video, such objects of interest are left uncompressed or compressed at a lower compression rate so that they can be restored at high resolution. An embodiment of the present invention may therefore include heterogeneous encoders for this operation.

Furthermore, the edge-based image analysis unit 820 may combine the encoded background image, the encoded video information for each object, and the metadata information of the multiple objects, synchronize them as a multi-information stream, and generate and output data packets for transmitting the combined data as a stream. The data packets may then be transmitted to the image processing device 720 of FIG. 7 through the communication interface unit 800 under the control of the control unit 810. According to an embodiment of the present invention, the image processing device 720 may include a separate decoder for restoring, that is, decoding, objects of interest and may perform the restoration through it. Since encoding and decoding constitute a kind of agreement, that is, a communication protocol, when an encoding module according to an embodiment of the present invention is installed in the edge device 700, a corresponding decoding module may be installed in the image processing device 720.

The storage unit 830 may temporarily store various types of data processed under the control of the control unit 810. The control unit 810 may temporarily store the images captured by the camera in the storage unit 830 and then retrieve them and provide them to the edge-based image analysis unit 820.

In addition to the above, the communication interface unit 800, the control unit 810, the edge-based image analysis unit 820, and the storage unit 830 of FIG. 8 may perform various operations; since the details have been sufficiently described above, that description applies here.

The communication interface unit 800, the control unit 810, the edge-based image analysis unit 820, and the storage unit 830 of FIG. 8 according to an embodiment of the present invention are configured as hardware modules physically separated from one another, and each module may internally store and execute software for performing the operations described above. However, the software is a set of software modules, and each module may equally be implemented in hardware, so the configuration is not limited to software or to hardware. For example, the storage unit 830 may be hardware such as storage or memory, but it may equally store information in software as a repository, so it is not limited to the above.

Meanwhile, as another embodiment of the present invention, the control unit 810 may include a CPU and a memory, and may be formed as a single chip. The CPU includes a control circuit, an arithmetic logic unit (ALU), an instruction interpretation unit, a registry, and the like, and the memory may include RAM. The control circuit performs control operations, the ALU performs operations on binary bit information, and the instruction interpretation unit, which may include an interpreter or a compiler, converts high-level language into machine language and machine language into high-level language; the registry may be involved in software-based data storage. With this configuration, for example, at the start of operation of the edge-type image analysis device 100 or the edge device 700, the program stored in the edge-based image analysis unit 720 is copied, loaded into the memory, that is, the RAM, and then executed, which can quickly increase the data processing speed. A deep learning model may be loaded into GPU memory rather than RAM and executed with GPU acceleration.

FIG. 9 is a flowchart illustrating a video encoding and transmission method based on object detection and tracking according to an embodiment of the present invention.

For convenience of explanation, referring to FIG. 9 together with FIG. 1, when the edge-type image analysis device 100 according to an embodiment of the present invention compresses an image captured by a camera and transmits it to another device such as a server, it detects and tracks the regions of the captured image in which objects need to be identified, and compresses each tracked object differently depending on whether or not it is an object of interest, so that the other device can restore the object of interest at a resolution higher than the reference resolution (S900). In an embodiment of the present invention, a separate encoder may be mounted and used in addition to the encoder already installed in the device. The background image may be compressed by the encoder installed in the device by default. Objects in the background image may be compressed at a lower compression ratio, or lower degree of compression, than the background image, so that the objects can be restored at high resolution.
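The region-dependent compression of step S900 can be illustrated with a minimal sketch. Everything named here is hypothetical: the quality values, the `TrackedObject` and `EncodeJob` types, the `plan_encoding` function, and the assignment of regions to the built-in or separate encoder are assumptions for illustration, not values or interfaces given in this specification. The sketch only shows the ordering described above, in which objects of interest receive the least compression and the background the most.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical quality levels (higher means less compression).
# The specification gives no numeric values; these are placeholders.
BACKGROUND_QUALITY = 30
NON_INTEREST_QUALITY = 50
INTEREST_QUALITY = 90

# Assumed labels for objects of interest (the claims name persons and vehicles).
INTEREST_LABELS = {"person", "vehicle"}


@dataclass
class TrackedObject:
    track_id: int
    label: str
    box: Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class EncodeJob:
    target: str   # "background" or "object:<track_id>"
    encoder: str  # "built_in" or "separate" (assumed arrangement)
    quality: int


def plan_encoding(objects: List[TrackedObject]) -> List[EncodeJob]:
    """Return one encoding job for the background and one per tracked object."""
    jobs = [EncodeJob("background", "built_in", BACKGROUND_QUALITY)]
    for obj in objects:
        of_interest = obj.label in INTEREST_LABELS
        quality = INTEREST_QUALITY if of_interest else NON_INTEREST_QUALITY
        jobs.append(EncodeJob(f"object:{obj.track_id}", "separate", quality))
    return jobs
```

In this sketch a tracked person would be planned at quality 90, a tracked non-interest object at 50, and the background at 30; an actual device would map such levels onto whatever parameters its encoders expose.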

In an embodiment of the present invention, the background image and non-interest objects may be compressed by the encoder already mounted in the device, while only the objects of interest are compressed at a different compression ratio so that they can be restored at high resolution. Furthermore, objects of interest or non-interest objects may be processed without compression, that is, without encoding at all, so embodiments of the present invention are not particularly limited to any one form. For example, when traffic on the communication network 710 shown in FIG. 7 is heavy, objects may also be compressed, whereas in a low-traffic communication environment compression of objects may be skipped. Since the system may be operated with an appropriate choice for the communication environment, embodiments of the present invention are not particularly limited to any one form.
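The traffic-dependent choice described above can be sketched as a simple policy function. The specification does not state how traffic is measured or where the boundary lies, so the utilisation measure, the threshold value, and the names `ObjectMode` and `choose_object_mode` are all assumptions made for illustration.

```python
from enum import Enum


class ObjectMode(Enum):
    RAW = "raw"                # send object regions without encoding
    COMPRESSED = "compressed"  # encode object regions before sending


# Hypothetical threshold: link utilisation at or above which objects are compressed.
UTILISATION_THRESHOLD = 0.7


def choose_object_mode(used_bps: float, capacity_bps: float) -> ObjectMode:
    """Pick how object regions are sent, based on current link utilisation."""
    if capacity_bps <= 0:
        raise ValueError("capacity_bps must be positive")
    utilisation = used_bps / capacity_bps
    if utilisation >= UTILISATION_THRESHOLD:
        return ObjectMode.COMPRESSED
    return ObjectMode.RAW
```

A deployment could equally use latency or packet loss as the traffic signal; the point of the sketch is only that the compression decision for objects is made at run time from the state of the communication network.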

In addition, an external device such as the image processing system 110 of FIG. 1 receives the object data (in the form of metadata) of the tracked object provided by the image analysis device 100, restores the object of interest at high resolution, and identifies and recognizes the tracked object using the restored high-resolution object of interest (S910). For example, this process allows a person or a vehicle to be clearly identified, which can in turn increase monitoring efficiency at a control center or the like.
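As a rough illustration of the metadata exchange in step S910, the sketch below packs per-object metadata on the sending side and, on the receiving side, selects the tracked objects that should go on to high-resolution restoration and recognition. The JSON layout, the field names (`frame_id`, `track_id`, `label`, `of_interest`), and both function names are hypothetical; the specification does not define a wire format.

```python
import json
from typing import Dict, List


def pack_object_metadata(frame_id: int, objects: List[Dict]) -> str:
    """Sender side: serialise per-object metadata for one frame (assumed JSON layout)."""
    return json.dumps({"frame_id": frame_id, "objects": objects})


def select_objects_for_recognition(message: str) -> List[int]:
    """Receiver side: return track IDs flagged as objects of interest."""
    payload = json.loads(message)
    return [obj["track_id"] for obj in payload["objects"] if obj.get("of_interest")]


if __name__ == "__main__":
    message = pack_object_metadata(7, [
        {"track_id": 1, "label": "person", "of_interest": True},
        {"track_id": 2, "label": "tree", "of_interest": False},
    ])
    print(select_objects_for_recognition(message))
```

The actual restoration and recognition steps are outside this sketch; it only shows how the metadata could tell the external device which object streams to treat as objects of interest.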

In addition to the above, the edge-type image analysis device 100 and the image processing system 110 of FIG. 1 may perform various other operations. Since the details have been sufficiently described above, those descriptions apply here as well.

Meanwhile, although all the components constituting an embodiment of the present invention have been described as being combined into one or operating in combination, the present invention is not necessarily limited to such an embodiment. That is, within the scope of the object of the present invention, one or more of the components may be selectively combined and operated. In addition, although each of the components may be implemented as a single independent piece of hardware, some or all of the components may be selectively combined and implemented as a computer program having program modules that perform some or all of the combined functions on one or more pieces of hardware. The codes and code segments constituting the computer program can be readily inferred by those skilled in the art. Such a computer program may be stored in non-transitory computer-readable media and read and executed by a computer, thereby implementing an embodiment of the present invention.

Here, a non-transitory readable recording medium means a medium that stores data semi-permanently and can be read by a device, rather than a medium that stores data for a short time, such as a register, cache, or memory. Specifically, the programs described above may be stored and provided in a non-transitory readable recording medium such as a CD, DVD, hard disk, Blu-ray disc, USB, memory card, or ROM.

Although preferred embodiments of the present invention have been shown and described above, the present invention is not limited to the specific embodiments described, and various modifications may of course be made by those of ordinary skill in the art to which the invention pertains without departing from the gist of the invention as claimed in the claims; such modifications should not be understood separately from the technical idea or outlook of the present invention.

100: edge-type image analysis device 101: image input unit
102: background image generation unit 103: multi-object detection and tracking unit
104: multi-object encoding unit 105: multi-object video stream unit
110: image processing system 111: monitoring device
112: storage/playback/search device 113: secondary analysis device

Claims

A video encoding system based on object detection and tracking, comprising:
an image analysis device that, when compressing an image captured by a camera and transmitting it to another device, detects and tracks a region in the captured image where an object needs to be identified, and compresses the tracked object differently depending on whether it is an object of interest, so that the other device restores the object of interest at a high resolution higher than a reference resolution; and
an external device that operates as the other device, receives object data of the tracked object provided by the image analysis device, restores the object of interest at the high resolution, and identifies and recognizes the tracked object using the restored high-resolution object of interest,
wherein the image analysis device is provided in the camera, and the external device includes a device provided at a remote location via a communication network to perform at least one of monitoring, data storage, search, and image analysis,
wherein, after separating the background image and the objects from the captured image, the image analysis device compresses the background image according to the performance of the pre-mounted encoder, and either does not compress the separated objects or compresses them at a lower compression ratio than the background image,
wherein the image analysis device includes a multi-object encoding unit that, when compressing the separated objects, compresses objects of interest and non-interest objects differently, and
wherein the image analysis device transmits metadata of the tracked object as the object data, generates map coordinates for the tracked object on pre-stored map data associated with the shooting location of the camera to determine the location of the tracked object, and transmits the determined location by including it in the metadata.

delete

The video encoding system based on object detection and tracking according to claim 1,
wherein the image analysis device compresses human objects and vehicle objects in the captured image as the objects of interest so that they are restored at the high resolution.

delete

A video encoding method based on object detection and tracking, comprising:
detecting and tracking, by an image analysis device that compresses an image captured by a camera and transmits it to another device, a region in the captured image where an object needs to be identified, and compressing the tracked object differently depending on whether it is an object of interest, so that the other device restores the object of interest at a high resolution higher than a reference resolution; and
receiving, by an external device operating as the other device, object data of the tracked object provided by the image analysis device, restoring the object of interest at the high resolution, and identifying and recognizing the tracked object using the restored high-resolution object of interest,
wherein the image analysis device includes an edge device provided in the camera, and the external device includes a device provided at a remote location via a communication network to perform at least one of monitoring, data storage, search, and image analysis,
wherein, after separating the background image and the objects from the captured image, the image analysis device compresses the background image according to the performance of the pre-mounted encoder, and either does not compress the separated objects or compresses them at a lower compression ratio than the background image,
wherein a multi-object encoding unit constituting the image analysis device compresses objects of interest and non-interest objects differently, and
wherein the method further comprises transmitting, by the image analysis device, metadata of the tracked object as the object data, the image analysis device generating map coordinates for the tracked object on pre-stored map data associated with the shooting location of the camera to determine the location of the tracked object and including the determined location in the metadata.

delete

The video encoding method based on object detection and tracking according to claim 6,
wherein the compressing differently comprises compressing human objects and vehicle objects in the captured image as the objects of interest so that they are restored at the high resolution.

delete