KR102061915B1

KR102061915B1 - syntax-based method of providing object classification for compressed video

Info

Publication number: KR102061915B1
Application number: KR1020180147014A
Authority: KR
Inventors: 이현우; 정승훈; 이성진
Original assignee: 이노뎁 주식회사
Priority date: 2018-11-26
Filing date: 2018-11-26
Publication date: 2020-01-02

Abstract

The present invention relates to a technique for efficiently performing object classification on compressed video such as H.264 AVC and H.265 HEVC. Conventional techniques recognize the presence of an object and classify its type by applying complex image processing to the compressed video generated by a CCTV camera. In contrast, the present invention uses syntax information (e.g., motion vectors and coding types) obtained by parsing the compressed video data to extract the areas of the image in which significant movement exists, that is, moving object regions. It then trains a deep neural network on the motion vector patterns of those regions and uses the network for object classification. Object classification can therefore be performed on CCTV compressed video without complex processing such as decoding, downscale resizing, difference image acquisition, and image analysis. In addition, because object classification requires roughly 1/10 of the computation of the conventional technique, the number of channels a video analysis server can accommodate can be increased approximately tenfold.

Description

Syntax-based method of providing object classification for compressed video

The present invention relates generally to a technique for efficiently performing object classification on compressed video such as H.264 AVC and H.265 HEVC.

More specifically, the present invention relates to a technique that, for compressed video generated by a CCTV camera for example, does not recognize the presence of an object and classify its type through complex image processing as in the prior art. Instead, it uses syntax information (e.g., motion vectors and coding types) obtained by parsing the compressed video data to extract regions of the image in which some meaningful motion exists, that is, moving object regions, trains a deep neural network on the motion vector patterns of those regions, and then uses the network for object classification.

In recent years it has become common to build video surveillance systems that use CCTV for crime prevention and for securing evidence after the fact. Many CCTV cameras are installed in each area, and the video they generate is displayed on monitors and stored in storage devices. When monitoring personnel spot a crime or accident, they respond immediately and, where necessary, search the stored video to secure evidence after the fact.

In reality, however, the number of monitoring personnel is far too small relative to the number of installed CCTV cameras. For video surveillance to be effective with such limited staff, simply displaying CCTV video on monitor screens is not enough. It is desirable to detect the movement of objects in each CCTV video and add some indication to the corresponding area in real time so that the movement is readily noticed. Monitoring personnel then do not need to watch the entire CCTV video with uniform attention; they can focus on the parts where objects are moving.

In particular, if the video surveillance system can not only detect object movement in CCTV video automatically but also classify the object (e.g., person, car, animal, flag), the efficiency of video surveillance can be improved further. Specific types of objects can be monitored selectively, and evidence can be retrieved quickly after the fact from the large volume of CCTV video data held in storage.

Recently installed CCTV cameras are high-resolution (e.g., Full HD) and high-frame-rate (e.g., 24 frames per second) products, so complex, high-compression-ratio video compression technologies such as H.264 AVC and H.265 HEVC are adopted to limit the burden on network bandwidth and storage space. The CCTV camera device provides compressed video generated by encoding the captured video according to the video compression technology, and the party using the CCTV video decodes the compressed video in reverse according to the same technical standard. Consequently, to recognize an object in CCTV video to which video compression has been applied and to classify what kind of object it is (for example, a person, a car, an animal, a flag, or leaves), the conventional approach had to decode the compressed video to obtain the playback video, that is, the original decompressed video, and then process the images for content analysis.

Hereinafter, the process of performing object classification on CCTV compressed video in the prior art is described with reference to FIG. 1 and FIG. 2.

FIG. 1 is a block diagram showing the general configuration of a video decoding apparatus according to the H.264 AVC technical standard. Referring to FIG. 1, a video decoding apparatus according to H.264 AVC comprises a parser (11), an entropy decoder (12), an inverse transformer (13), a motion vector calculator (14), a predictor (15), and a deblocking filter (16). These hardware modules process the data of the compressed video in sequence, decompressing it and restoring the original video data. The parser (11) parses the motion vector and coding type for each coding unit of the compressed video. Such a coding unit is generally an image block such as a macroblock or a sub-block.

FIG. 2 is a flowchart showing the process by which an existing video analysis solution performs object classification on CCTV video. Referring to FIG. 2, the prior art decodes the compressed video according to H.264 AVC, H.265 HEVC, or the like (S10), and downscales the frame images of the playback video to a small size, for example about 320x240 (S20). The downscale resizing is done to reduce, at least somewhat, the processing burden of the subsequent image analysis. Difference images (differentials) are then obtained from the resized frame images, and moving objects are extracted through image analysis (S30). Finally, the moving objects are recognized through analysis of the image content of a series of frame images, thereby achieving object classification (S40).

To extract and classify moving objects, the prior art performs compressed video decoding, downscale resizing, and image analysis. These are highly complex processes, so the number of channels a single video analysis server can process simultaneously in a conventional video surveillance system is considerably limited. At present, a high-performance video analysis server can typically cover at most 16 CCTV channels. Because many CCTV cameras are installed, a video surveillance system has required many video analysis servers, which has caused problems of increased cost and difficulty in securing physical space.

An object of the present invention is generally to provide a technique for efficiently performing object classification on compressed video such as H.264 AVC and H.265 HEVC.

In particular, an object of the present invention is to provide a technique that, for compressed video generated by a CCTV camera for example, does not recognize the presence of an object and classify its type through complex image processing as in the prior art. Instead, it uses syntax information (e.g., motion vectors and coding types) obtained by parsing the compressed video data to extract regions of the image in which some meaningful motion exists, that is, moving object regions, trains a deep neural network on the motion vector patterns of those regions, and then uses the network for object classification.

To achieve the above object, a syntax-based object classification method for compressed video according to the present invention may comprise: a first step of parsing the bitstream of the compressed video to obtain a motion vector and a coding type for each coding unit; a second step of obtaining, for each of a plurality of image blocks constituting the compressed video, a motion vector accumulation value over a preset period of time; a third step of comparing the motion vector accumulation value of each of the plurality of image blocks with a preset first threshold; a fourth step of marking image blocks whose motion vector accumulation value exceeds the first threshold as moving object regions; a fifth step of collecting a motion vector pattern for each moving object region; a sixth step of performing deep neural network learning using the motion vector patterns of the moving object regions as a training data set; and a seventh step of inputting the motion vector pattern of a specific moving object region set as an object classification target into the deep neural network to obtain an object classification result.

Here, the motion vector pattern of a moving object region may comprise a two-dimensional array of the direction components and magnitude components of the motion vectors obtained in the first step for the plurality of image blocks belonging to that moving object region.

Here, the sixth step may comprise: receiving an object classification reference for the moving object region; and performing deep neural network learning by inputting, for the moving object region, the combination of the motion vector pattern and the object classification reference into the deep neural network.
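The pairing of motion vector patterns with object classification references can be sketched as follows. This is only an illustration under stated assumptions: the function names, the fixed 8x8 input size, the zero-padding, and the class list are hypothetical and do not come from the specification, which does not say how patterns of different sizes are presented to the network.

```python
def to_fixed_size(pattern, height, width):
    """Crop or zero-pad a 2-D pattern of (direction, magnitude) cells.

    A fixed input size is an assumption made for this sketch.
    """
    fixed = [[(0.0, 0.0)] * width for _ in range(height)]
    for r, row in enumerate(pattern[:height]):
        for c, cell in enumerate(row[:width]):
            fixed[r][c] = cell
    return fixed


def build_training_set(patterns, references, classes, height=8, width=8):
    """Pair each region's pattern with its object classification reference.

    `patterns` maps a region ID to its 2-D motion vector pattern and
    `references` maps a region ID to a class name (hypothetical layout).
    Regions without a reference are skipped.
    """
    samples = []
    for region_id, pattern in patterns.items():
        if region_id not in references:
            continue
        fixed = to_fixed_size(pattern, height, width)
        # Flatten to [direction, magnitude, direction, magnitude, ...].
        features = [value for row in fixed for cell in row for value in cell]
        samples.append((features, classes.index(references[region_id])))
    return samples


classes = ["person", "car", "animal", "flag"]
patterns = {
    "001": [[(0.1, 2.0)]],
    "002": [[(1.0, 3.0), (0.5, 1.0)]],
}
references = {"001": "person"}  # region 002 has no reference yet
samples = build_training_set(patterns, references, classes, height=2, width=2)
```

Each resulting sample is a (feature vector, class index) pair that could be fed to whatever deep neural network implementation is chosen.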

In addition, the syntax-based object classification method for compressed video according to the present invention may further comprise: a step (a) of identifying a plurality of image blocks adjacent to and around a moving object region (hereinafter, 'neighbor blocks'); a step (b) of comparing the motion vector values obtained in the first step for the plurality of neighbor blocks with a preset second threshold; a step (c) of additionally marking as a moving object region any neighbor block whose motion vector value exceeds the second threshold in the comparison of step (b); a step (d) of additionally marking as a moving object region any neighbor block whose coding type is intra picture; and a step (e) of performing interpolation on the plurality of moving object regions to additionally mark as a moving object region any unmarked image blocks, up to a preset number, that are surrounded by moving object regions.

In addition, the syntax-based object classification method for compressed video according to the present invention may further comprise: a step (a) of identifying a plurality of image blocks adjacent to and around a moving object region (hereinafter, 'neighbor blocks'); a step (b) of comparing the motion vector accumulation values of the plurality of neighbor blocks with a second threshold preset to a value smaller than the first threshold; a step (c) of additionally marking as a moving object region any neighbor block whose motion vector accumulation value exceeds the second threshold in the comparison of step (b); a step (d) of additionally marking as a moving object region any neighbor block whose coding type is intra picture; and a step (e) of performing interpolation on the plurality of moving object regions to additionally mark as a moving object region any unmarked image blocks, up to a preset number, that are surrounded by moving object regions.

Meanwhile, a computer program according to the present invention is stored on a medium in order to execute, in combination with hardware, the syntax-based object classification method for compressed video described above.

According to the present invention, object classification can be performed effectively on CCTV compressed video without complex processing such as decoding, downscale resizing, difference image acquisition, and image analysis. In particular, object classification can be performed with about 1/10 of the computation of the prior art, so the number of channels a video analysis server can accommodate can be increased by roughly ten times or more.

FIG. 1 is a block diagram showing the general configuration of a video decoding apparatus.
FIG. 2 is a flowchart showing the process of performing object classification on CCTV compressed video in the prior art.
FIG. 3 is a flowchart showing the overall process of performing syntax-based object classification on compressed video according to the present invention.
FIG. 4 is a diagram showing the concept of classifying an object from the motion vector pattern of a moving object region through a deep neural network in the present invention.
FIG. 5 is a flowchart showing an implementation example of the process of detecting effective motion regions in compressed video in the present invention.
FIG. 6 is a diagram showing an example of the result of applying the effective motion region detection process to CCTV compressed video.
FIG. 7 is a flowchart showing an implementation example of the process of detecting the boundary region of a moving object region in the present invention.
FIG. 8 is a diagram showing an example of the result of applying the boundary region detection process to the CCTV image of FIG. 6.
FIG. 9 is a diagram showing an example of the result of cleaning up the moving object regions of the CCTV image of FIG. 8 through interpolation.
FIG. 10 is a diagram showing motion vector patterns superimposed on the moving object regions detected in a CCTV image.
FIG. 11 is a diagram showing only the motion vector patterns of the moving object regions in FIG. 10.
FIG. 12 to FIG. 14 are enlarged views of the motion vector patterns of the three moving object regions in FIG. 11.

Hereinafter, the present invention is described in detail with reference to the drawings.

FIG. 3 is a flowchart showing the overall process of performing syntax-based object classification on compressed video according to the present invention, and FIG. 4 is a diagram showing the concept of classifying an object from the motion vector pattern of a moving object region through a deep neural network in the present invention.

The object classification process according to the present invention may be performed by a video analysis server in a system that handles many compressed videos, for example a CCTV video surveillance system or a CCTV video analysis system. The process uses a deep neural network, which may be implemented inside the video analysis server; alternatively, a deep neural network implemented on an external cloud server may be used through an open API (Open Application Program Interface).

To perform object classification, the parts of the compressed video that appear to be moving objects must first be identified. The present invention parses the bitstream of the compressed video, without needing to decode it, and quickly extracts moving object regions from the syntax information of each image block. The image block may be a macroblock, a sub-block, or a combination of these, and the syntax information is preferably the motion vector and the coding type. As can be seen in the images attached to this specification, a moving object region obtained in this way does not accurately reflect the outline of the moving object in the image, but it has the advantages of fast processing and high reliability.

However, the present invention differs conceptually from the prior art in that it does not extract the moving object itself but rather a cluster of image blocks presumed to contain a moving object. Because no image analysis has been performed, object classification must be carried out without any information about the image content.

Accordingly, the present invention trains a deep neural network using the motion vector patterns formed in moving object regions as a training data set. After the deep neural network has been trained on many training data sets, it is applied to object classification. That is, at the classification stage, the motion vector pattern of, for example, a specific moving object region designated by monitoring personnel, or of each of the many moving object regions detected in CCTV video, is input to the deep neural network, and the type of object in that moving object region is classified by examining the network's output.

Meanwhile, according to the present invention, moving object regions can be extracted and object classification performed without decoding the compressed video. However, the scope of the present invention is not limited to apparatus or software that refrains from decoding the compressed video.

Hereinafter, the concept of the process of performing object classification on compressed video according to the present invention is described with reference to FIG. 3.

Step (S100): First, effective motion, that is, motion substantial enough to be regarded as meaningful, is detected in the compressed video on the basis of its motion vectors, and the image area in which effective motion is detected is set as a moving object region.

To this end, the motion vector and coding type of each coding unit of the compressed video are parsed according to a video compression standard such as H.264 AVC or H.265 HEVC. The size of a coding unit generally ranges from about 64x64 pixels down to 4x4 pixels and may be set variously at the designer's discretion.

For each image block, motion vectors are accumulated over a preset period of time (e.g., 500 msec), and the resulting motion vector accumulation value is checked against a preset first threshold (e.g., 20). If an image block exceeds the threshold, effective motion is deemed to have been found in that block, and it is marked as a moving object region. Conversely, even if motion vectors occur, an image block whose accumulation value over the period does not exceed the first threshold is presumed to show negligible image change and is ignored.
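The accumulate-and-threshold check of (S100) might look like the following minimal sketch. It assumes the motion vectors have already been parsed into one dictionary per frame, keyed by block position, and that accumulation means summing vector magnitudes; the specification does not fix either detail, and all names here are hypothetical.

```python
import math
from collections import defaultdict


def accumulate_motion(frames):
    """Sum motion vector magnitudes per block over a window of frames.

    `frames` is a list of dicts mapping (row, col) -> (dx, dy). This layout
    is an assumption for illustration, not a real parser's output format.
    """
    totals = defaultdict(float)
    for frame in frames:
        for block, (dx, dy) in frame.items():
            totals[block] += math.hypot(dx, dy)
    return dict(totals)


def mark_effective_motion(totals, first_threshold=20.0):
    """Return the blocks whose accumulated value exceeds the first threshold."""
    return {block for block, value in totals.items() if value > first_threshold}


# The frames falling inside one accumulation window (e.g., 500 msec).
window = [
    {(0, 0): (3, 4), (0, 1): (1, 0)},
    {(0, 0): (6, 8), (0, 1): (0, 1)},
    {(0, 0): (6, 8), (1, 1): (0, 0)},
]
totals = accumulate_motion(window)
marked = mark_effective_motion(totals)
```

In this window only block (0, 0) accumulates enough motion to be marked; the small vectors at (0, 1) are ignored as negligible change.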

Step (S200): For the moving object regions detected in (S100), the surrounding area is examined on the basis of motion vectors and coding types to extend the regions out to roughly where their boundaries lie. Through this process, the moving object regions detected in (S100) as fragmented image blocks are connected to one another to form meaningful clusters.

In (S100), image blocks were selected by a strict criterion, so that image blocks that almost certainly correspond to a moving object in the compressed video were detected and marked as moving object regions. In (S200), the other image blocks located around the image blocks marked in (S100) are examined. For convenience, these are referred to in this specification as 'neighbor blocks'. Whether a neighbor block belongs to a moving object region is judged by a criterion that is relaxed relative to the one applied in (S100).

Macroblocks and sub-blocks in compressed video are very small. Therefore, in video of people, cars, bicycles, animals, and the like, such as CCTV footage, a moving object is unlikely by its nature to appear in only one image block and is expected to span several. In other words, an image block near a block in which a moving object has been captured is expected to be more likely to contain the moving object than a block that is not. Reflecting this idea, (S200) judges whether the neighbor blocks around a moving object region belong to the region according to a relatively relaxed criterion.

Preferably, each neighbor block is examined, and if the motion vector value detected in the current frame is equal to or greater than a preset second threshold (e.g., 0), or if the coding type is intra picture, that image block is also marked as a moving object region. In another embodiment, if the motion vector accumulation value calculated in (S100) for the neighbor block is equal to or greater than a second threshold (e.g., 5), or if the coding type is intra picture, that image block may also be marked as a moving object region. It is logical for the second threshold to be set lower than the first threshold.
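The relaxed neighbor test of (S200) can be sketched as below. Several points are assumptions of this sketch rather than statements of the specification: neighbor blocks are taken to be the eight surrounding blocks, only a single expansion pass is made, and the comparison is strict ("exceeds", as in the summary of steps) so that a second threshold of 0 excludes motionless blocks, whereas the paragraph above words it as "equal to or greater than". All names are hypothetical.

```python
def neighbors(block, rows, cols):
    """Yield the up-to-eight blocks surrounding `block` inside the grid."""
    r, c = block
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if (dr, dc) != (0, 0) and 0 <= nr < rows and 0 <= nc < cols:
                yield (nr, nc)


def expand_boundary(marked, mv_magnitude, intra_blocks, rows, cols,
                    second_threshold=0.0):
    """Add neighbor blocks that pass the relaxed test to the marked set.

    `mv_magnitude` maps a block to its current-frame motion vector magnitude
    and `intra_blocks` is the set of intra-coded blocks (assumed inputs).
    """
    added = set()
    for block in marked:
        for nb in neighbors(block, rows, cols):
            if nb in marked:
                continue
            moving = mv_magnitude.get(nb, 0.0) > second_threshold
            if moving or nb in intra_blocks:
                added.add(nb)
    return set(marked) | added


marked = {(1, 1)}
mv_magnitude = {(1, 2): 0.5, (0, 0): 0.0, (3, 3): 9.0}
intra_blocks = {(2, 1)}
expanded = expand_boundary(marked, mv_magnitude, intra_blocks, rows=4, cols=4)
```

Block (3, 3) has large motion but is not adjacent to a marked block, so it is left alone; (1, 2) joins because of its small motion and (2, 1) because it is intra-coded.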

Conceptually, an image block that shows some degree of motion near a moving object region in which effective motion has been found is likely to belong to the same cluster as that region, so it is marked as a moving object region. In the case of an intra picture, no motion vector exists, so it is impossible to judge from motion vectors whether the block belongs to a moving object region. An intra picture block located adjacent to an image block already detected as a moving object region is therefore presumed to form one cluster with the already-extracted region. The reason is that the cost of including one image block that is not actually part of a moving object is small, whereas the cost of a fragmented moving object region is large.

Step (S300): Interpolation is applied to the moving object regions detected in (S100) and (S200) to clean up fragmentation of the regions. Because the preceding steps judged each image block individually, a single moving object (e.g., a person) may in practice be split into several moving object regions by unmarked image blocks scattered within it. Accordingly, if one or a small number of unmarked image blocks are surrounded by a plurality of image blocks marked as moving object regions, they are additionally marked as a moving object region. In this way a moving object region that has been split into several pieces can be merged into one. The effect of this interpolation is clear from a comparison of FIG. 8 and FIG. 9.
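One possible reading of the interpolation in (S300) is hole filling, sketched below. The specification does not define "surrounded" precisely; this sketch assumes an unmarked group counts as surrounded when it is 4-connected, does not touch the edge of the block grid, and contains no more than a preset number of blocks. The function name and the `max_hole` parameter are hypothetical.

```python
def fill_small_holes(marked, rows, cols, max_hole=2):
    """Mark small groups of unmarked blocks enclosed by marked blocks."""
    marked = set(marked)
    filled = set(marked)
    seen = set()
    for r in range(rows):
        for c in range(cols):
            start = (r, c)
            if start in marked or start in seen:
                continue
            # Flood-fill one 4-connected group of unmarked blocks.
            stack = [start]
            seen.add(start)
            component = []
            touches_border = False
            while stack:
                cr, cc = stack.pop()
                component.append((cr, cc))
                if cr in (0, rows - 1) or cc in (0, cols - 1):
                    touches_border = True
                for nb in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    nr, nc = nb
                    inside = 0 <= nr < rows and 0 <= nc < cols
                    if inside and nb not in marked and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
            # Only enclosed groups no larger than max_hole are filled.
            if not touches_border and len(component) <= max_hole:
                filled.update(component)
    return filled


# A 3x3 ring of marked blocks with an unmarked centre, inside a 5x5 grid.
ring = {(r, c) for r in range(1, 4) for c in range(1, 4)} - {(2, 2)}
result = fill_small_holes(ring, rows=5, cols=5)
```

The unmarked centre block is absorbed into the region, while the unmarked blocks outside the ring are untouched because they reach the grid edge.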

Through steps (S100) to (S300), one or more moving object regions (regions of moving object) are obtained from the compressed image. A moving object region obtained in this way is shown in blue in [FIG. 9]; it is a mass formed by many mutually connected image blocks that were marked as belonging to a moving object region during the preceding steps. In each of steps (S100) to (S300), membership in a moving object region is decided and marked per image block, but in the end the mass formed by these blocks is what is treated as a moving object region. Such a moving object region is a portion of the image that, based on the syntax information of the compressed image, is presumed to contain one or more moving objects. For software processing, it is preferable to assign a unique identifier (Unique ID) to each moving object region and manage the regions by that identifier. Referring to [FIG. 10], three moving object regions were detected from the CCTV compressed image, and they were assigned the Unique IDs 001, 002, and 003, respectively.
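The grouping of marked image blocks into regions with unique IDs can be sketched as connected-component labeling. This is only an illustrative sketch: the function name `label_regions`, the use of 4-connectivity, and the breadth-first traversal are assumptions, since the text does not specify how connected blocks are grouped.

```python
from collections import deque

def label_regions(marked):
    """Group 4-connected marked blocks into regions and assign IDs 1, 2, 3, ...

    `marked` is a 2D list of booleans, one entry per image block.
    Returns a 2D list of ints; 0 means "not in any moving object region".
    The 4-connectivity rule is an assumption, not taken from the patent text.
    """
    rows, cols = len(marked), len(marked[0])
    labels = [[0] * cols for _ in range(rows)]
    next_id = 0
    for r in range(rows):
        for c in range(cols):
            if not marked[r][c] or labels[r][c]:
                continue
            # Start a new region and flood it breadth-first.
            next_id += 1
            labels[r][c] = next_id
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and marked[ny][nx] and not labels[ny][nx]):
                        labels[ny][nx] = next_id
                        queue.append((ny, nx))
    return labels

grid = [
    [True,  True,  False, False],
    [False, True,  False, True],
    [False, False, False, True],
]
labels = label_regions(grid)
```

In this toy grid the three blocks on the left form one region and the two blocks on the right form another, so two IDs are issued.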

Step (S400): For each moving object region obtained through the above process, a motion vector pattern (pattern of motion vectors) is collected as shown in [FIG. 10]. A moving object region consists of a mass of image blocks, and its motion vector pattern can be implemented as a two-dimensional array in which the direction component and the magnitude component of the motion vector obtained in step (S200) are arranged for each image block of the region. [FIG. 10] shows motion vector patterns superimposed on the moving object regions detected in a CCTV image, [FIG. 11] shows only the motion vector patterns of the moving object regions of [FIG. 10], and [FIG. 12] to [FIG. 14] are enlarged views of the motion vector patterns of the three moving object regions (Unique ID = 001, 002, 003) in [FIG. 11].
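One way to picture the two-dimensional array of step (S400) is the following sketch. The function name `motion_vector_pattern`, the bounding-box layout, the `(magnitude, direction in degrees)` cell format, and the use of `None` for blocks outside the region are all assumptions made for illustration; the text only states that direction and magnitude are arranged per image block.

```python
import math

def motion_vector_pattern(labels, vectors, region_id):
    """Build a 2D array of (magnitude, direction_deg) for one region.

    `labels[r][c]` is the region ID of each image block (0 = none) and
    `vectors[r][c]` is that block's motion vector (dx, dy).
    The array covers the region's bounding box; cells outside the region
    hold None (an illustrative convention, not from the patent text).
    """
    cells = [(r, c)
             for r, row in enumerate(labels)
             for c, value in enumerate(row) if value == region_id]
    if not cells:
        return []
    r0 = min(r for r, _ in cells)
    r1 = max(r for r, _ in cells)
    c0 = min(c for _, c in cells)
    c1 = max(c for _, c in cells)
    pattern = [[None] * (c1 - c0 + 1) for _ in range(r1 - r0 + 1)]
    for r, c in cells:
        dx, dy = vectors[r][c]
        magnitude = math.hypot(dx, dy)
        direction = math.degrees(math.atan2(dy, dx))
        pattern[r - r0][c - c0] = (magnitude, direction)
    return pattern

labels = [[0, 1],
          [1, 1]]
vectors = [[(0, 0), (3, 4)],
           [(1, 0), (0, 2)]]
pattern = motion_vector_pattern(labels, vectors, region_id=1)
```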

To convey the concept of a motion vector pattern intuitively, these figures display, for each image block, the direction component and the magnitude component of the motion vector obtained in the corresponding video frame. Comparing the motion vector patterns of the three moving object regions in [FIG. 12] to [FIG. 14] reveals considerable differences in the magnitude of the motion vectors, the uniformity of their directions, and their spatial arrangement. These differences are related to the shape and movement characteristics of the moving object contained in each moving object region.

Steps (S500, S600): The operation mode of the deep neural network is determined; consider first the case where it is the training mode. In this case, the motion vector patterns of the many moving object regions obtained in step (S400) are set as the training dataset and the deep neural network is trained. Learning methods for deep neural networks include supervised learning and unsupervised learning. In CCTV video surveillance, the object classes that the network is expected to output are usually clearly defined, so supervised learning is preferable.

With supervised learning, step (S600) may consist of receiving, for each moving object region, an object classification reference, that is, guide information on what kind of object the region contains (e.g., person, car, animal, flag), and then inputting the combination of motion vector pattern and object classification reference for each moving object region into the deep neural network to train it.
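Pairing each motion vector pattern with its object classification reference might look like the sketch below. It only assembles the training pairs and does not train a network. The fixed 2x2 input size, the zero padding, the flattened `[magnitude, direction, ...]` layout, and the names `to_feature_vector` and `build_training_set` are assumptions; the class names come from the examples in the text.

```python
def to_feature_vector(pattern, height, width):
    """Flatten a pattern into a fixed-length list [mag, dir, mag, dir, ...].

    Cells that are missing or None are zero-padded (an assumed convention).
    """
    features = []
    for r in range(height):
        for c in range(width):
            inside = r < len(pattern) and c < len(pattern[r])
            cell = pattern[r][c] if inside else None
            magnitude, direction = cell if cell is not None else (0.0, 0.0)
            features.extend([magnitude, direction])
    return features

def build_training_set(patterns, references, classes, height=2, width=2):
    """Return a list of (feature_vector, class_index) training pairs."""
    samples = []
    for region_id, pattern in patterns.items():
        label = classes.index(references[region_id])
        samples.append((to_feature_vector(pattern, height, width), label))
    return samples

CLASSES = ["person", "car", "animal", "flag"]
patterns = {
    1: [[(5.0, 0.0), None]],
    2: [[(1.0, 90.0)], [(1.5, 85.0)]],
}
references = {1: "car", 2: "person"}   # object classification references
training_set = build_training_set(patterns, references, CLASSES)
```

The resulting pairs could then be fed to whatever deep learning framework is used for the actual training.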

Steps (S500, S700): Next, consider the case where the operation mode is determined to be the mode in which the deep neural network is applied. In this case, the motion vector pattern of a specific moving object region set as the classification target is input to the deep neural network, and the object classification result for that region is obtained from the network's output. One or more moving object regions designated by a monitoring operator may be set as classification targets, or all moving object regions identified in the CCTV compressed image may be set as classification targets one after another.

For the moving object region to be classified, the motion vector pattern obtained in step (S400) is input to the deep neural network, and the value the network outputs in response represents the object classification result for that region. For example, if the deep neural network outputs '001', this means it has determined that the region contains a 'car'. In this way, the present invention can classify objects without decoding the compressed image and analyzing the decoded pictures.
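Turning a network output into a classification result could be sketched as follows. Only the mapping of '001' to 'car' comes from the text; the other codes, the score-vector output format, and the name `classify` are hypothetical.

```python
# Only "001" -> "car" is stated in the text; the other entries are hypothetical.
CLASS_CODES = {"000": "person", "001": "car", "002": "animal", "003": "flag"}

def classify(scores):
    """Pick the highest-scoring class and return (code, label).

    `scores` is assumed to be one score per class, in code order.
    """
    best = max(range(len(scores)), key=lambda i: scores[i])
    code = f"{best:03d}"
    return code, CLASS_CODES[code]

result = classify([0.10, 0.70, 0.15, 0.05])
```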

Step (S800): The combination of each moving object region and its object classification result is provided to the video surveillance device, which can use the classification results in various ways. For example, a different color can be assigned to each object class when displaying regions on the CCTV monitor screen. In addition, when searching video, for example to secure clues for solving a crime, restricting the search to a specific class narrows the population to be searched within the large volume of CCTV footage held in storage, which speeds up the search.

[FIG. 5] is a flowchart showing an implementation example of the process of detecting effective movement areas from a compressed image in the present invention, and [FIG. 6] shows an example of the result of applying this process to a CCTV compressed image. The process of [FIG. 5] corresponds to step (S100) in [FIG. 3].

Step (S110): First, the coding units of the compressed image are parsed to obtain motion vectors and coding types. Referring to [FIG. 1], a video decoding device performs syntax analysis (header parsing) and motion vector computation on the compressed image stream in accordance with a video compression standard such as H.264 AVC or H.265 HEVC. Through this process, the motion vector and coding type of each coding unit of the compressed image are parsed out.

Step (S120): For each of the plurality of image blocks constituting the compressed image, a motion vector cumulative value over a preset time (e.g., 500 ms) is obtained.

This step is intended to detect, from the compressed image, effective movements that are genuinely worth attention, such as a moving car, a running person, or a crowd fighting. Swaying leaves, briefly appearing ghosts, and shadows that shift slightly with reflected light do move, but they are practically meaningless objects and should not be detected.

To this end, motion vectors are accumulated per unit of one or more image blocks over a preset period (e.g., 500 ms) to obtain a motion vector cumulative value. Here, the term image block covers both macroblocks and sub-blocks.
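The accumulation over a time window can be sketched as below. The text does not say how vectors are combined, so summing per-frame magnitudes is an assumption, as are the input format (timestamped dictionaries keyed by block position) and the name `accumulate_motion`.

```python
import math

def accumulate_motion(frames, window_ms=500):
    """Sum motion vector magnitudes per block over the most recent window.

    `frames` is a time-ordered list of (timestamp_ms, {(row, col): (dx, dy)}).
    Summing magnitudes (rather than the vectors themselves) is an assumption.
    """
    if not frames:
        return {}
    latest = frames[-1][0]
    totals = {}
    for timestamp, vectors in frames:
        if latest - timestamp >= window_ms:
            continue  # frame is older than the accumulation window
        for block, (dx, dy) in vectors.items():
            totals[block] = totals.get(block, 0.0) + math.hypot(dx, dy)
    return totals

frames = [
    (0,    {(0, 0): (3, 4)}),                  # outside the 500 ms window
    (600,  {(0, 0): (3, 4), (0, 1): (1, 0)}),
    (800,  {(0, 0): (6, 8)}),
    (1000, {(0, 1): (0, 2)}),
]
totals = accumulate_motion(frames)
```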

Steps (S130, S140): The motion vector cumulative value of each of the plurality of image blocks is compared with a preset first threshold (e.g., 20), and image blocks whose motion vector cumulative value exceeds the first threshold are marked as a moving object region.

If an image block with a motion vector cumulative value above this level is found, the block is considered to contain significant movement, that is, effective movement, and is marked as a moving object region. The aim is to single out movements that merit a monitoring operator's attention in a video surveillance system, for example a person running. Conversely, even if motion vectors occur, when the cumulative value over the period is too small to exceed the first threshold, the change in the image is presumed to be slight and is ignored in the detection step.
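A minimal sketch of the first-threshold comparison, assuming the cumulative values are held in a 2D grid with one entry per image block. The example threshold of 20 comes from the text; the name `mark_effective_motion` and the grid layout are illustrative.

```python
def mark_effective_motion(accumulated, first_threshold=20):
    """Mark blocks whose cumulative motion exceeds the first threshold."""
    return [[value > first_threshold for value in row] for row in accumulated]

accumulated = [
    [0, 25, 30],
    [3, 20, 0],
]
marked = mark_effective_motion(accumulated)
```

Note that a block whose cumulative value equals the threshold (20 here) is not marked, following the "exceeds" wording of steps (S130, S140).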

[FIG. 6] is an example that visualizes the result of detecting effective movement areas from a CCTV compressed image through the process of [FIG. 5]. In [FIG. 6], image blocks with a motion vector cumulative value at or above the first threshold are marked as moving object regions and shown in red. In [FIG. 6], sidewalk blocks, roads, and shadowed areas are not shown as moving object regions, whereas walking people and moving cars are.

[FIG. 7] is a flowchart showing an implementation example of the process of detecting the boundary area of a moving object region in the present invention, and [FIG. 8] shows an example of the result of additionally applying the boundary area detection process of [FIG. 7] to the CCTV image of [FIG. 6]. The process of [FIG. 7] corresponds to step (S200) in [FIG. 3].

A closer look at [FIG. 6] shows that the image blocks belonging to moving objects are not fully marked; only some of them are. For a walking person or a moving car, only part of the object's image blocks are marked rather than the whole object. In many cases several moving object regions are also formed for a single moving object; the cars, for example, each contain several moving object regions. This means that the criterion adopted in (S100) is very useful for filtering out ordinary areas but is quite strict. It is therefore necessary to examine the image blocks surrounding the moving object regions marked in (S100) and, where they satisfy a certain criterion, to mark them as moving object regions as well, thereby detecting the boundary of each moving object region.

Step (S210): First, the image blocks adjacent to the image blocks marked as a moving object region in (S100) are identified. In this specification they are called 'neighboring blocks'. These neighboring blocks were not marked as a moving object region in (S100); the process of [FIG. 7] examines them more closely to determine whether any of them should be included in the boundary of a moving object region.

Steps (S220, S230): The motion vector value of each neighboring block is compared with a preset second threshold, and neighboring blocks whose motion vector value exceeds the second threshold are marked as a moving object region. If an image block is adjacent to a moving object region in which effective movement has been recognized, and some movement is also found in the block itself, then given the nature of the footage (e.g., CCTV video) the block most likely belongs to the same mass as that moving object region. Such neighboring blocks are therefore also marked as a moving object region.

In a first embodiment, each neighboring block is examined, and if the motion vector value detected in the current frame is greater than or equal to a preset second threshold (e.g., 0), that image block is also marked as a moving object region.

In a second embodiment, if the motion vector cumulative value calculated for a neighboring block in (S100) is greater than or equal to a preset second threshold (e.g., 5), that image block may also be marked as a moving object region. In this case, the second threshold should be set to a smaller value than the first threshold.
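The second embodiment can be sketched as a single pass over the blocks next to already marked ones. The 8-neighborhood definition of "adjacent", the single non-cascading pass, and the name `expand_to_neighbors` are assumptions; the threshold of 5 and the "greater than or equal" comparison follow the second embodiment as described.

```python
def expand_to_neighbors(marked, accumulated, second_threshold=5):
    """Also mark unmarked blocks that touch a marked block and have a
    cumulative motion value >= second_threshold.

    Adjacency uses the 8 surrounding blocks (an assumption). Only blocks
    marked before this pass count as seeds, so the growth does not cascade.
    """
    rows, cols = len(marked), len(marked[0])
    result = [row[:] for row in marked]
    for r in range(rows):
        for c in range(cols):
            if marked[r][c] or accumulated[r][c] < second_threshold:
                continue
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = r + dy, c + dx
                    if ((dy or dx) and 0 <= ny < rows and 0 <= nx < cols
                            and marked[ny][nx]):
                        result[r][c] = True
    return result

marked = [[False, True, False],
          [False, False, False]]
accumulated = [[6, 25, 2],
               [5, 0, 9]]
expanded = expand_to_neighbors(marked, accumulated)
```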

Step (S240): In addition, neighboring blocks whose coding type is intra picture are marked as a moving object region. For an intra picture no motion vector exists, so it is impossible to decide from motion vectors whether the neighboring block belongs to a moving object region. An intra picture located adjacent to an image block already detected as a moving object region is preferably presumed to form a single mass with the previously extracted moving object region. The reason is that wrongly including one non-moving image block in a moving object region costs little, whereas fragmenting a moving object region costs a great deal.
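Step (S240) can be sketched in the same style. The string labels "intra" and "inter", the 8-neighborhood adjacency, and the name `mark_intra_neighbors` are assumptions made for illustration.

```python
def mark_intra_neighbors(marked, coding_types):
    """Also mark intra-coded blocks that touch an already marked block.

    `coding_types[r][c]` is "intra" or "inter" (illustrative labels).
    Adjacency uses the 8 surrounding blocks, which is an assumption.
    """
    rows, cols = len(marked), len(marked[0])
    result = [row[:] for row in marked]
    for r in range(rows):
        for c in range(cols):
            if marked[r][c] or coding_types[r][c] != "intra":
                continue
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    ny, nx = r + dy, c + dx
                    if ((dy or dx) and 0 <= ny < rows and 0 <= nx < cols
                            and marked[ny][nx]):
                        result[r][c] = True
    return result

marked = [[True, False, False],
          [False, False, False]]
coding_types = [["inter", "intra", "intra"],
                ["inter", "inter", "intra"]]
with_intra = mark_intra_neighbors(marked, coding_types)
```

Only the intra block directly next to the marked block is added; intra blocks farther away are left unmarked.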

[FIG. 8] visualizes the result of applying the process up to and including boundary area detection to a CCTV compressed image; the image blocks marked as moving object regions through the above process are shown in blue. In [FIG. 8], the blue moving object regions extend further around the regions shown in red in [FIG. 6], and a comparison with the actual CCTV footage shows that they now cover the moving objects entirely.

[FIG. 9] shows an example of the result of cleaning up the moving object regions of the CCTV image of [FIG. 8] through interpolation.

Step (S300) applies interpolation to the moving object regions detected in (S100) and (S200) to clean up their fragmentation. In [FIG. 8], unmarked image blocks can be seen in between the moving object regions shown in blue. When unmarked image blocks remain in between like this, the surrounding pieces may be treated as many separate moving objects. If moving object regions are fragmented in this way, the result of step (S400) may become inaccurate, and the larger number of moving object regions also complicates the processing in step (S400).

Accordingly, in the present invention, if one or a few unmarked image blocks are surrounded by a plurality of image blocks marked as a moving object region, they are marked as a moving object region; this is called interpolation. Comparing [FIG. 9] with [FIG. 8] shows that all the unmarked image blocks that lay between moving object regions have been marked as moving object regions. As a result, areas that move together as a mass are bundled and handled as a single moving object.
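The interpolation can be sketched as filling small enclosed holes. Interpreting "surrounded" as "an unmarked 4-connected group that does not touch the picture edge", the limit `max_hole=2`, and the name `interpolate` are assumptions; the text only says that one or a few enclosed unmarked blocks are marked.

```python
from collections import deque

def interpolate(marked, max_hole=2):
    """Mark small groups of unmarked blocks that are enclosed by marked blocks.

    A group counts as enclosed when none of its blocks lies on the grid edge
    (an assumed reading of "surrounded"); groups larger than `max_hole`
    are left untouched.
    """
    rows, cols = len(marked), len(marked[0])
    result = [row[:] for row in marked]
    seen = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if marked[r][c] or seen[r][c]:
                continue
            # Collect one 4-connected group of unmarked blocks.
            group, enclosed = [], True
            seen[r][c] = True
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                group.append((y, x))
                if y in (0, rows - 1) or x in (0, cols - 1):
                    enclosed = False
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not marked[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if enclosed and len(group) <= max_hole:
                for y, x in group:
                    result[y][x] = True
    return result

grid = [
    [True, True,  True,  True, False],
    [True, False, False, True, False],
    [True, True,  True,  True, False],
]
filled = interpolate(grid)
```

The two-block hole in the middle is filled, while the unmarked column on the right touches the picture edge and is left as background.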

Comparing [FIG. 6], [FIG. 8], and [FIG. 9] shows that, through the boundary area detection and interpolation processes, the moving object regions come to reflect the actual scene properly. Judging by the masses marked in red in [FIG. 6], the scene would be treated as if many very small objects were moving on screen, which does not match reality. Judging by the masses marked in blue in [FIG. 9], by contrast, the scene is treated as containing a few moving objects of some size, which closely reflects the actual scene.

The present invention can also be implemented as computer-readable code on a computer-readable non-volatile recording medium. Such non-volatile recording media include various types of storage devices, for example hard disks, SSDs, CD-ROMs, NAS, magnetic tape, web disks, and cloud disks, and the code may also be stored in distributed form across multiple networked storage devices and executed from there. The present invention may also be implemented as a computer program stored in a medium in combination with hardware to execute a specific procedure.

Claims

A syntax-based object classification method for a compressed image, comprising:
A first step of parsing the bitstream of the compressed image to obtain a motion vector and a coding type for each coding unit;
A second step of obtaining a motion vector cumulative value over a preset time for each of a plurality of image blocks constituting the compressed image;
A third step of comparing the motion vector cumulative value of each of the plurality of image blocks with a preset first threshold;
A fourth step of marking each image block whose motion vector cumulative value exceeds the first threshold as a moving object region;
A fifth step of setting each cluster of mutually connected image blocks marked as a moving object region in the compressed image as a moving object region extracted from the compressed image;
A fifth step of setting a motion vector pattern for each moving object region by acquiring, for the plurality of image blocks belonging to that region, a two-dimensional array in which the direction component and the magnitude component of the motion vector obtained in the first step are arranged according to the position of each image block;
A sixth step of collecting the motion vector patterns of a plurality of moving object regions;
A sixth step of receiving, for each motion vector pattern, an object classification reference for the corresponding moving object region;
Training a deep neural network prepared in advance by setting the combinations of motion vector pattern and object classification reference for the plurality of moving object regions as a training dataset and inputting them into the deep neural network;
A seventh step of acquiring the motion vector pattern of a specific moving object region to be classified (hereinafter, the 'classification target motion vector pattern'); and
Inputting the classification target motion vector pattern into the deep neural network and obtaining, from the output of the deep neural network, an object classification result for the specific moving object region to be classified.

delete

The syntax-based object classification method for a compressed image according to claim 1, further comprising, between the fourth step and the fifth step:
a) identifying a plurality of image blocks adjacent to the moving object region (hereinafter, 'neighboring blocks');
b) comparing the motion vector value obtained in the first step for each of the plurality of neighboring blocks with a preset second threshold; and
c) additionally marking as a moving object region each neighboring block whose motion vector value exceeds the second threshold according to the comparison in step b).

The syntax-based object classification method for a compressed image according to claim 1, further comprising, between the fourth step and the fifth step:
a) identifying a plurality of image blocks adjacent to the moving object region (hereinafter, 'neighboring blocks');
b) comparing the motion vector cumulative value of each of the plurality of neighboring blocks with a second threshold preset to a value smaller than the first threshold; and
c) additionally marking as a moving object region each neighboring block whose motion vector cumulative value exceeds the second threshold according to the comparison in step b).

The syntax-based object classification method for a compressed image according to claim 4 or 5, further comprising, after step c):
d) additionally marking as a moving object region each neighboring block, among the plurality of neighboring blocks, whose coding type is intra picture.

The syntax-based object classification method for a compressed image according to claim 6, further comprising, after step d):
Performing interpolation on the plurality of moving object regions to additionally mark as a moving object region up to a preset number of unmarked image blocks surrounded by a moving object region.

A computer program stored in a medium and combined with hardware to execute the syntax-based object classification method for a compressed image according to any one of claims 1, 4 and 5.