KR102594803B1

KR102594803B1 - syntax-based method of searching RE-ID in compressed video

Info

Publication number: KR102594803B1
Application number: KR1020220173243A
Authority: KR
Inventors: 이현우; 박준석; 이성진
Original assignee: 이노뎁 주식회사
Priority date: 2022-12-13
Filing date: 2022-12-13
Publication date: 2023-10-30

Abstract

The present invention generally relates to a technology by which a video analysis device efficiently performs same-person (RE-ID) search on compressed video (e.g., CCTV footage). In particular, the invention uses the syntax information of the compressed video to select the image frames in which a moving object is detected, performs playback video analysis on those frames, extracts object characteristic information (RE-ID Feature) from a representative thumbnail (best-shot image) of each moving object, and builds a database from it. When a target image for same-person (RE-ID) search is then given, the object characteristic information of the target image is looked up in the database, which enables high-speed same-person search. According to the invention, a best-shot thumbnail of an object is obtained on the basis of syntax information produced by bitstream parsing of the compressed video, without performing, as in the prior art, highly complex and resource-intensive processes such as compressed video decoding, downscale resizing, and playback video analysis, and same-person (RE-ID) search is performed on the basis of that best-shot thumbnail, so the processing efficiency of same-person search is improved.

Description

Syntax-based method of searching RE-ID in compressed video

The present invention generally relates to a technology by which a video analysis device efficiently performs same-person (RE-ID) search on compressed video (e.g., CCTV footage).

In particular, the present invention relates to a technology that uses the syntax information of the compressed video to select the image frames in which a moving object is detected, performs playback video analysis on those frames, extracts object characteristic information (RE-ID Feature) from a representative thumbnail (best-shot image) of each moving object, and builds a database from it. When a target image for same-person (RE-ID) search is then given, the object characteristic information of the target image is looked up in the database, which enables high-speed same-person search.

Recently, it has become common to build video control systems that use CCTV for crime prevention, accident prevention, and securing after-the-fact evidence. Technologies for analyzing CCTV footage are being actively researched in order to raise the monitoring efficiency of such systems.

[Figure 1] is a general configuration diagram of a video control system.

CCTV cameras (10) are installed at many distributed locations and provide their footage to the video control device (20) in real time. A CCTV camera (10) typically generates video at high resolution (e.g., Full HD) and a high frame rate (e.g., 24 frames per second), compresses it with H.264 AVC, H.265 HEVC, or the like, and transmits it.

The video control device (20) presents the footage from the CCTV cameras (10) to control personnel for real-time monitoring and stores it in the storage device (30) for later review. The video control device (20) also forwards the CCTV footage to the video analysis device (40) and instructs it to perform video analysis for a specific monitoring purpose.

The video analysis device (40) operates in cooperation with the video control device (20), analyzes the footage, and provides various information (e.g., presence of objects, occurrence of unusual situations, object tracking) to the video control device (20). [Figure 2] is a general configuration diagram of the video decoding performed by the video analysis device (40) in the case of H.264 AVC. Decoding compressed video involves a syntax parser (41), an entropy decoder (42), an inverse transformer (43), a motion vector calculator (44), a predictor (45), and a deblocking filter (46); together these modules decompress the CCTV footage and restore the original video data. Here, the syntax parser (41) parses the motion vector and coding type for each coding unit of the compressed video. Such a coding unit is generally an image block such as a macroblock or a sub-block.

Object tracking is an important function in CCTV video control. It detects objects (e.g., people, cars) in CCTV video and tracks the detected objects through video analysis. To do so, the video analysis device (40) must perform object detection, selection of the object to be tracked, confirmation of object identity, and object tracking by analyzing the CCTV video. Confirming object identity in turn involves predicting the position and size of the tracked object as frames advance, extracting and comparing object features, detecting movement of the tracked object's feature points (e.g., Meanshift, Optical flow, Camshift), and comparing identity.

Same-person (RE-ID) search has recently been discussed as a special form of object tracking. In the video control field, same-person (RE-ID) search means that, when a target image is submitted as a query against CCTV footage, the corresponding object (the same object) is found in that footage or in other CCTV footage. It is useful for searching for missing persons or tracking persons of interest.

[Figure 3] is a general flowchart of same-person search in compressed video.

Referring to [Figure 3], the compressed video is decoded according to a video compression standard (e.g., H.264 AVC, H.265 HEVC) to obtain playback video (S10), and the frame images that make up the playback video are downscaled (down-scale resizing) to small images, for example about 320x240 pixels (S20). Difference images (differentials) are then computed for the resized frame images and objects are identified through playback video analysis (S30), after which object characteristic information (RE-ID Feature) is extracted for the objects identified in the series of frame images and same-object analysis and search are performed (S40).

However, same-person search performed in this way requires compressed video decoding, downscale resizing, and playback video analysis, all of which are highly complex and resource-intensive processes. As a result, the number of channels that a single video analysis device can process simultaneously is quite limited in conventional video control systems. At present, even a high-performance video analysis device (40) can process at most 10 channels of RE-ID simultaneously. As more and more CCTV cameras are installed for crime and accident prevention, many video analysis devices are required, which raises costs and makes it difficult to secure physical space.

Republic of Korea Patent No. 10-2374776, "Target object re-identification system and method based on CCTV location information and object movement information"
Republic of Korea Patent No. 10-2416620, "Method and system for analyzing local facility use behavior based on object tracking using CCTV"
Republic of Korea Patent No. 10-1519235, "Wireless sensor network-based smart security CCTV system and its control method"
Republic of Korea Patent No. 10-2470837, "Spatial analysis method using CCTV image range in 3D space"
Republic of Korea Patent No. 10-2315371, "Smart CCTV control and alarm system"

An object of the present invention is, in general, to provide a technology by which a video analysis device efficiently performs same-person (RE-ID) search on compressed video (e.g., CCTV footage).

In particular, an object of the present invention is to provide a technology that uses the syntax information of the compressed video to select the image frames in which a moving object is detected, performs playback video analysis on those frames, extracts object characteristic information (RE-ID Feature) from a representative thumbnail (best-shot image) of each moving object, and builds a database from it, so that when a target image for same-person (RE-ID) search is given, the object characteristic information of the target image is looked up in the database and high-speed same-person search is achieved.

The problems to be solved by the present invention are not limited to the above, and other problems can be understood from the description in this specification.

To achieve the above objects, the present invention proposes a method by which a video analysis device processes same-person search on compressed video on the basis of syntax.

The syntax-based same-person search method for compressed video according to the present invention includes: a first step of parsing the bitstream of the compressed video to extract syntax information; a second step of detecting a moving object in the series of image frames constituting the compressed video on the basis of the syntax information; a third step of performing video analysis on the image frames in which the moving object was detected to obtain thumbnails of the moving object; a fourth step of tracking the moving object in the compressed video to obtain a best-shot image of the moving object; a fifth step of extracting object characteristic information (RE-ID Feature) from the best-shot image of the moving object; a sixth step of managing the object characteristic information of the moving object in a database; a seventh step of receiving a target image for same-person (RE-ID) search; and an eighth step of looking up the object characteristic information of the target image in the database to perform same-person search for the target image.

In the present invention, the fourth step is preferably configured so that, while the moving object is tracked in the compressed video, the thumbnail in which the moving object has the largest area among the thumbnails of that moving object is set as the best-shot image.

The syntax-based same-person search method for compressed video according to the present invention may further include a step, performed between the second and third steps, of skipping playback video analysis for an image frame in which a moving object was detected and returning to the first step when that image frame is a P frame located within a preset number of frames from the end of the GOP.

In the present invention, the syntax information may include a motion vector and a coding type for each coding unit.

In the present invention, the second step may include: calculating, for the series of image frames constituting the compressed video, a motion vector accumulation value over a specific time for each image block; marking as a moving object area each image block whose motion vector accumulation value exceeds a preset first threshold (hereinafter, an image block marked as a moving object area is called a 'marked image block' and an image block not so marked is called an 'unmarked image block'); identifying one or more image blocks adjacent to a marked image block (hereinafter, 'neighbor blocks'); marking as a moving object area each neighbor block whose coding type is intra picture; marking as a moving object area each neighbor block whose motion vector value exceeds a preset second threshold; marking as a moving object area a group of no more than a preset number of unmarked image blocks surrounded by marked image blocks; and setting as a moving object each cluster of mutually connected marked image blocks in the series of image frames constituting the compressed video.
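The later sub-steps listed above (neighbor marking, filling of enclosed unmarked blocks, and grouping of connected marked blocks) can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function names, the use of 4-connectivity, the single expansion pass, and the rule that an unmarked group touching the frame border does not count as "surrounded" are all assumptions made for the example. Inputs are per-block grids that are assumed to have been produced already by bitstream parsing.

```python
from collections import deque

def neighbors4(r, c, rows, cols):
    """Yield the in-bounds 4-connected neighbors of block (r, c)."""
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield nr, nc

def expand_boundary(marked, mv_mag, is_intra, second_threshold):
    """Mark neighbor blocks that are intra-coded or exceed the second threshold."""
    rows, cols = len(marked), len(marked[0])
    out = [row[:] for row in marked]
    for r in range(rows):
        for c in range(cols):
            if not marked[r][c]:
                continue
            for nr, nc in neighbors4(r, c, rows, cols):
                if is_intra[nr][nc] or mv_mag[nr][nc] > second_threshold:
                    out[nr][nc] = True
    return out

def components(grid, value):
    """Return 4-connected groups of cells equal to `value` as lists of (row, col)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    groups = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != value or seen[r][c]:
                continue
            queue, group = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                cr, cc = queue.popleft()
                group.append((cr, cc))
                for nr, nc in neighbors4(cr, cc, rows, cols):
                    if grid[nr][nc] == value and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            groups.append(group)
    return groups

def fill_holes(marked, max_hole_blocks):
    """Mark small unmarked groups that are fully enclosed by marked blocks."""
    rows, cols = len(marked), len(marked[0])
    out = [row[:] for row in marked]
    for group in components(marked, False):
        # Assumption: a group that reaches the frame border is not "surrounded".
        on_border = any(r in (0, rows - 1) or c in (0, cols - 1) for r, c in group)
        if not on_border and len(group) <= max_hole_blocks:
            for r, c in group:
                out[r][c] = True
    return out

def detect_moving_objects(marked, mv_mag, is_intra, second_threshold, max_hole_blocks):
    """Return each cluster of connected marked blocks as one moving object."""
    grown = expand_boundary(marked, mv_mag, is_intra, second_threshold)
    filled = fill_holes(grown, max_hole_blocks)
    return components(filled, True)
```

Each returned cluster is a list of block coordinates; a bounding box in pixels could be derived from it by multiplying by the block size, which the sketch leaves out.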

Meanwhile, a computer program according to the present invention is stored in a non-volatile storage medium in order to cause a computer to execute the syntax-based same-person (RE-ID) search method for compressed video described above.

According to the present invention, a best-shot thumbnail of an object is obtained on the basis of syntax information produced by bitstream parsing of the compressed video, without performing, as in the prior art, highly complex and resource-intensive processes such as compressed video decoding, downscale resizing, and playback video analysis, and same-person (RE-ID) search is performed on the basis of that best-shot thumbnail, so the processing efficiency of same-person search is improved.

[Figure 1] is a general configuration diagram of a video control system.
[Figure 2] is a general configuration diagram of the video decoding performed by a video analysis device.
[Figure 3] is a general flowchart of same-person search in compressed video.
[Figure 4] is a flowchart of the syntax-based same-person search process for compressed video according to the present invention.
[Figure 5] is a conceptual diagram of decoding skip in the present invention.
[Figure 6] is an example of obtaining a best-shot image in the present invention.
[Figure 7] is a flowchart of syntax-based detection of a moving object from compressed video in the present invention.
[Figure 8] is an example of detecting valid motion areas in CCTV compressed video.
[Figure 9] is an example of detecting boundary areas for the video image of [Figure 8].
[Figure 10] is an example of applying interpolation to the video image of [Figure 9].

The present invention is described in detail below with reference to the drawings.

In describing the present invention, detailed description of parts that overlap with the prior art may be omitted.

[Figure 4] is a flowchart of the syntax-based same-person search process for compressed video according to the present invention. This process can be performed by a computer device, for example the video analysis device (40) in the video control system of [Figure 1], and the compressed video is video data compressed with, for example, H.264 AVC or H.265 HEVC.

Steps (S1000, S1100): First, the video analysis device (40) parses the bitstream of the compressed video and extracts syntax information. The syntax information may include a motion vector and a coding type for each coding unit of the compressed video as defined by the video compression standard. The size of a coding unit generally ranges from about 64x64 pixels down to 4x4 pixels and can be set in various ways at the designer's discretion.

The video analysis device (40) then detects moving objects in the series of image frames constituting the compressed video on the basis of the syntax information. The process of detecting moving objects from syntax information is described later with reference to [Figure 7] to [Figure 10].

Step (S1200): Next, the video analysis device (40) performs video analysis on the image frames in which a moving object was detected on the basis of syntax information, and thereby obtains thumbnails of the moving object. Among the series of image frames constituting the compressed video, image frames in which no moving object was detected in (S1100) are skipped; preferably, compressed video decoding, downscale resizing, and playback video analysis are performed only on the image frames in which a moving object was detected in (S1100), yielding thumbnails of the moving object as shown in [Figure 6].

At this point, video analysis is performed on the image frame in which a moving object was detected in order to determine whether a valid object is present. That is, whether an object exists in the image frame is determined through the decoding, downscale resizing, difference image acquisition, and playback video analysis discussed with reference to [Figure 2]. An object extracted through video analysis in this way is called a 'valid object' in this specification. Depending on the embodiment, the moving object detected in (S1100) may be used as the valid object as is.

In one embodiment, the video analysis of (S1200) may be performed only on the moving object portion detected in (S1100) (the blue area in [Figure 10]). In another embodiment, the video analysis of (S1200) may be performed on the entire image frame; that is, if even a very small moving object is found on the basis of syntax information, the whole image frame is examined closely through video analysis. The former is expected to be faster.

The present invention can be configured to skip the video analysis process for image frames in which no moving object was detected in (S1100), on the judgment that they contain no important information. For such frames, the video analysis process (decoding, downscale resizing, difference image acquisition, and playback video analysis) is not performed; only bitstream parsing and a small amount of computation are carried out, so few computing resources are used.
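The gating described above can be illustrated with a short sketch in which the expensive analysis is invoked only for frames that pass the cheap syntax-based check. The names `gate_frames`, `has_moving_object`, and `analyze` are hypothetical; the two callables stand in for the syntax-based detection of (S1100) and the decode-and-analyze work of (S1200), neither of which is implemented here.

```python
def gate_frames(frame_syntax, has_moving_object, analyze):
    """Run `analyze` only on frames whose syntax information shows a moving object.

    frame_syntax: per-frame syntax information, in stream order.
    has_moving_object: cheap predicate over one frame's syntax information.
    analyze: expensive per-frame analysis (decode, resize, detect), called by index.
    Returns the analysis results and the number of frames that were skipped.
    """
    analyzed, skipped = [], 0
    for index, syntax in enumerate(frame_syntax):
        if has_moving_object(syntax):
            analyzed.append((index, analyze(index)))
        else:
            skipped += 1  # only parsing was done for this frame
    return analyzed, skipped
```

The skipped count gives a rough sense of how much decoding work is avoided on footage with little motion.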

Meanwhile, when the image frame is a P frame located within a preset number of frames from the end of the GOP, video analysis for that image frame may be skipped and the process may proceed to (S1000).

[Figure 5] illustrates the series of image frames constituting compressed video and the concept of decoding skip in the present invention. An I (Intra) frame can be decoded on its own, whereas a P (Predicted, or Inter) frame can be decoded only after the video from the most recent I frame up to the immediately preceding P frame has been decoded. A GOP (Group of Pictures) is a bundle of one I frame (key frame) and P frames, and is the minimum unit for completely decoding compressed video.

Video analysis generally does not examine every frame but only about 3 to 4 frames per second. Suppose the input video runs at 30 frames per second and one frame in every 10 is analyzed. Within the current GOP(t), video analysis is then performed on the I frame and on every 10th P frame after it. In that case, the P frames located within the last 9 positions of the GOP, namely the 21st to 29th P frames, do not need to be analyzed before the I frame of the next GOP(t+1) is analyzed. This analysis-skip operation also saves computing resources. Depending on the implementation, B frames may be mixed in among the P frames of [Figure 5].
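The arithmetic in this example can be checked with a small sketch. It assumes a GOP of 30 frames (position 0 is the I frame, positions 1 to 29 are P frames) and an analysis interval of 10 frames; the function names and the GOP length are illustrative assumptions, not values fixed by the specification.

```python
def analysis_schedule(gop_length, interval):
    """Positions within one GOP (0 = I frame) that are scheduled for analysis."""
    return list(range(0, gop_length, interval))

def skippable_tail(gop_length, interval):
    """P frame positions after the last analyzed frame of the GOP."""
    last_analyzed = analysis_schedule(gop_length, interval)[-1]
    return list(range(last_analyzed + 1, gop_length))

def should_skip(position, gop_length, tail_count):
    """True for a P frame within the last `tail_count` positions of the GOP."""
    return position > 0 and position >= gop_length - tail_count

print(analysis_schedule(30, 10))  # [0, 10, 20]
print(skippable_tail(30, 10))     # [21, 22, 23, 24, 25, 26, 27, 28, 29]
```

With these numbers, `should_skip(position, 30, 9)` is true exactly for positions 21 through 29, which matches the 21st to 29th P frames named in the text.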

Step (S1300): Next, the video analysis device (40) tracks the moving object in the compressed video and obtains a best-shot image of the moving object. Once a moving object has been confirmed through video analysis, video analysis continues until that object disappears from the video, which yields multiple thumbnails for a single moving object. Among these thumbnails, the one judged best according to preset criteria is set as the best-shot image for that moving object.

[Figure 6] is an example of obtaining a best-shot image in the present invention. Object tracking is performed for a given moving object through feature comparison, prediction of its next position, or the like, and multiple thumbnails are obtained from the time the moving object appears in the video until it disappears. After the moving object has disappeared from the video, the thumbnail in which the moving object has the largest area among its thumbnails can be set as the best-shot image.
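The largest-area rule can be sketched in a few lines. The `Thumbnail` record and its fields are hypothetical, and the sketch approximates the area of the moving object by the area of its thumbnail bounding box, which is an assumption; the specification does not say how the area is measured.

```python
from typing import NamedTuple

class Thumbnail(NamedTuple):
    frame_index: int  # frame from which the thumbnail was cropped
    width: int        # bounding-box width in pixels
    height: int       # bounding-box height in pixels

    @property
    def area(self):
        return self.width * self.height

def best_shot(thumbnails):
    """Return the thumbnail with the largest area for one tracked object."""
    return max(thumbnails, key=lambda t: t.area)

track = [Thumbnail(10, 40, 90), Thumbnail(20, 64, 150), Thumbnail(30, 50, 120)]
print(best_shot(track).frame_index)  # 20
```

Selection is deferred until the track ends, mirroring the text, so the function takes the complete list of thumbnails for one object.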

Steps (S1400, S1500): Next, the video analysis device (40) extracts object characteristic information (RE-ID Feature) from the best-shot image of the moving object and manages that object characteristic information in a database. Because object characteristic information (RE-ID Feature) is an already known technique in the field of same-person (RE-ID) search, a detailed description is omitted. Many implementations of same-person (RE-ID) search use machine learning (deep learning), and in the present invention the object characteristic information extracted from the best-shot image of a moving object can be used for machine learning (deep learning).

Steps (S1600, S1700): The video analysis device (40) then receives, from outside, a target image for same-person (RE-ID) search. For example, an image of a missing person is provided in order to search for that person. The video analysis device (40) looks up the object characteristic information of the target image (e.g., the missing person's image) in the database and performs same-person search for the target image. Same-person search based on object characteristic information is itself a known technique, so a detailed description is omitted.
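As a rough illustration of the database lookup, the sketch below stores one feature vector per best-shot image and ranks stored entries by cosine similarity to the target's feature vector. The specification does not name a similarity measure or a storage layout, so cosine similarity, the `FeatureDB` class, and the toy three-dimensional vectors are all assumptions; a real system would obtain the vectors from a RE-ID feature extractor, which is not shown.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class FeatureDB:
    """In-memory stand-in for the object characteristic information database."""

    def __init__(self):
        self._rows = []  # (object_id, camera_id, feature vector)

    def add(self, object_id, camera_id, feature):
        self._rows.append((object_id, camera_id, list(feature)))

    def query(self, target_feature, top_k=3, min_score=0.0):
        """Return up to top_k (score, object_id, camera_id), best match first."""
        scored = [(cosine_similarity(target_feature, f), oid, cam)
                  for oid, cam, f in self._rows]
        scored = [row for row in scored if row[0] >= min_score]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:top_k]

db = FeatureDB()
db.add("a", "cam1", [1.0, 0.0, 0.0])
db.add("b", "cam2", [0.0, 1.0, 0.0])
db.add("c", "cam3", [0.9, 0.1, 0.0])
matches = db.query([1.0, 0.0, 0.0], top_k=2)
print([object_id for _, object_id, _ in matches])  # ['a', 'c']
```

Because the query touches only stored vectors, no video is decoded at search time, which is the source of the speed-up the text describes.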

In the above description, (S1000) to (S1500) form the process that builds a database of object characteristic information from compressed video, and (S1600) and (S1700) form the process that uses this database to handle same-person search for a specific target image. If these two processes are implemented as separate tasks or threads and run concurrently, real-time same-person search can be achieved in a CCTV video control system.
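One way to picture the two processes is as two worker functions that share a lock-protected store, as in the sketch below. Everything here is an assumption made for illustration: the `SharedFeatureStore` class, the worker names, and the caller-supplied `extract_feature` and `match` callables are hypothetical, and a production system would more likely use a real database than an in-process dictionary.

```python
import threading

class SharedFeatureStore:
    """Thread-safe map from object id to feature, shared by both processes."""

    def __init__(self):
        self._lock = threading.Lock()
        self._features = {}

    def put(self, object_id, feature):
        with self._lock:
            self._features[object_id] = feature

    def snapshot(self):
        """Return a copy so that searching never iterates a dict being modified."""
        with self._lock:
            return dict(self._features)

def indexing_worker(store, best_shots, extract_feature):
    """Database-building side, corresponding to (S1000) to (S1500)."""
    for object_id, image in best_shots:
        store.put(object_id, extract_feature(image))

def search_worker(store, target_feature, match, results):
    """Search side, corresponding to (S1600) and (S1700)."""
    for object_id, feature in store.snapshot().items():
        if match(target_feature, feature):
            results.append(object_id)
```

In use, each worker would be passed to `threading.Thread(target=...)`; a search started while indexing is still running simply sees whatever has been indexed up to that moment.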

[Figure 7] is a flowchart of syntax-based detection of a moving object from compressed video in the present invention. The process of [Figure 7] corresponds to (S1100) in [Figure 4].

In the present invention, parts of the compressed video that appear to be moving objects are identified on the basis of syntax information and set as object unit areas. In other words, image clusters presumed to contain moving objects are extracted from syntax information alone, with no knowledge of the video content. To this end, the bitstream of the compressed video is parsed and moving object areas are quickly extracted from the syntax information of each image block. The image block may be a macroblock, a sub-block, or a combination of these, and the syntax information is preferably the motion vector and the coding type. As can be seen in [Figure 8] to [Figure 10], the moving object areas obtained in this way do not precisely follow the outlines of the moving objects in the video, but the approach has the advantage of being fast while remaining highly reliable.

Steps (S100, S110): First, for the series of image frames constituting the compressed video, a motion vector accumulation value over a specific time is calculated for each video block, and each video block whose motion vector accumulation value exceeds a preset first threshold is marked as a moving object area. This step detects, on the basis of motion vectors, effective motion in the compressed video that is substantial enough to be meaningful, and marks the part of the image where such effective motion was detected as a moving object area.

Motion vectors for the coding units of the compressed video can be obtained through bitstream parsing. For each video block, the motion vector values of that block are accumulated over a preset period (e.g., 500 msec). For compressed video at 30 frames per second, this means summing the motion vectors that occur in each video block over 15 frames (i.e., 500 msec). It is then checked whether the resulting motion vector accumulation value exceeds a preset first threshold (e.g., 20). If such a video block is found, effective motion is considered to have been found in that block, and it is marked as a moving object area. Conversely, even if motion vectors occurred in a video block, if the accumulated value over the period does not exceed the first threshold, the image change is presumed to be negligible and the block is not marked as a moving object area.
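The accumulation and first-threshold test of (S100) and (S110) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, each frame is assumed to be a 2-D grid holding one scalar motion vector magnitude per video block (the description does not say how a vector is reduced to a scalar), and the example values 500 msec and 20 are the ones given in the text.

```python
from typing import Sequence, Set, Tuple

Block = Tuple[int, int]  # (row, column) index of a video block


def window_length(fps: float, window_msec: float) -> int:
    """Number of frames covered by the accumulation window."""
    return max(1, round(fps * window_msec / 1000.0))


def mark_by_accumulated_motion(
    mv_frames: Sequence[Sequence[Sequence[float]]],
    first_threshold: float,
) -> Set[Block]:
    """Sum per-block motion magnitudes over the window and mark strong blocks.

    mv_frames holds one grid per frame; the caller is expected to pass
    exactly the frames that fall inside the accumulation window.
    """
    rows, cols = len(mv_frames[0]), len(mv_frames[0][0])
    accumulated = [[0.0] * cols for _ in range(rows)]
    for frame in mv_frames:
        for r in range(rows):
            for c in range(cols):
                accumulated[r][c] += frame[r][c]
    # A block is marked only if its accumulated value exceeds the threshold.
    return {
        (r, c)
        for r in range(rows)
        for c in range(cols)
        if accumulated[r][c] > first_threshold
    }


# 30 fps with a 500 msec window gives 15 frames.
n_frames = window_length(30, 500)
# One row of two blocks: the left block moves by 2.0 per frame, the right by 1.0.
frames = [[[2.0, 1.0]] for _ in range(n_frames)]
marked = mark_by_accumulated_motion(frames, first_threshold=20)
```

In this toy input the left block accumulates 30 and is marked, while the right block accumulates 15 and is left unmarked.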

[FIG. 8] shows an example of detecting effective motion areas in CCTV compressed video by (S100) and (S110). In [FIG. 8], video blocks whose motion vector accumulation value is at or above the first threshold are marked as moving object areas and shown in red. In [FIG. 8], sidewalk blocks, roads, and shadowed parts are not shown as moving object areas, whereas walking people and moving cars are. In this specification, for convenience, a video block marked as a moving object area is called a 'marked image block', and a video block not marked as a moving object area is called an 'unmarked image block'.

Steps (S120 to S140): Next, one or more video blocks adjacent to a marked image block (hereinafter, 'neighboring blocks') are identified, and a neighboring block is marked as a moving object area if it has an intra picture coding type or a motion vector value exceeding a preset second threshold. This step examines the surroundings of the moving object areas detected in (S100) and thereby extends them to approximately where their boundaries lie. Through this step, the moving object areas detected in (S100) as fragmented video blocks are connected to one another to form meaningful clusters.

In (S100) above, video blocks were selected under a strict criterion, so that only blocks that clearly appear to correspond to a moving object in the compressed video were detected and marked as moving object areas. In (S120), the unmarked image blocks located around the marked image blocks extracted in (S100) are examined. For convenience, these are called 'neighboring blocks'. The neighboring blocks were not marked as moving object areas by (S100); in (S120) to (S140) they are examined more closely to check whether any of them should be included in a moving object area.

In compressed video, macroblocks and sub-blocks are very small. Therefore, in footage of people, cars, animals, and the like, such as CCTV video, a moving object is unlikely to appear in only one video block and is expected to span several video blocks. In other words, a video block near a block in which a moving object was captured is expected to be relatively more likely to contain the moving object than a block that is not. Reflecting this idea, (S120) judges whether the unmarked image blocks around a marked image block belong to a moving object area under a relatively relaxed criterion.

Preferably, each neighboring block is examined, and if the motion vector value detected in the current frame is equal to or greater than a preset second threshold (e.g., 0), or if the coding type is intra picture, that video block is also marked as a moving object area. In another embodiment, a neighboring block may also be marked as a moving object area if the motion vector accumulation value calculated for it earlier in (S100) is equal to or greater than the second threshold (e.g., 5), or if its coding type is intra picture. In either case, it is logical to set the second threshold to a smaller value than the first threshold.
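The relaxed neighbor test of (S120) to (S140) might look roughly like the sketch below. Several points are assumptions rather than statements of the patent: the function name is hypothetical, adjacency is taken as the 8 surrounding blocks, only a single expansion pass is performed, and the comparison follows the "exceeds" wording of the step summary (a strict `>`), since the description also uses "equal to or greater than" in places.

```python
from typing import Sequence, Set, Tuple

Block = Tuple[int, int]  # (row, column) index of a video block


def expand_to_neighbors(
    marked: Set[Block],
    mv_grid: Sequence[Sequence[float]],
    intra_grid: Sequence[Sequence[bool]],
    second_threshold: float,
) -> Set[Block]:
    """Mark neighbors of marked blocks that are intra-coded or show motion.

    mv_grid holds the current-frame motion magnitude per block (assumed
    scalar); intra_grid is True where the block's coding type is intra.
    """
    rows, cols = len(mv_grid), len(mv_grid[0])
    result = set(marked)
    for r, c in marked:
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr == 0 and dc == 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if (nr, nc) in marked:
                    continue  # already a marked image block
                # Relaxed criterion: intra coding type OR motion above threshold.
                if intra_grid[nr][nc] or mv_grid[nr][nc] > second_threshold:
                    result.add((nr, nc))
    return result


# 3x3 example: the center block is marked, the block to its right has some
# motion, and the block below it is intra-coded.
mv = [[0.0, 0.0, 0.0], [0.0, 9.0, 3.0], [0.0, 0.0, 0.0]]
intra = [[False, False, False], [False, False, False], [False, True, False]]
expanded = expand_to_neighbors({(1, 1)}, mv, intra, second_threshold=0)
```

Repeating the pass until no new block is added would grow the area further; whether that is desirable depends on how aggressively the boundary should be extended.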

Conceptually, if a video block near a moving object area, where effective motion was found, itself shows some degree of motion, it is likely to be part of the same cluster as that moving object area, and so it is marked as a moving object area. In addition, an intra picture has no motion vector, so it is impossible to judge from motion vectors whether it belongs to a moving object area. Accordingly, an intra picture located adjacent to a video block already detected as a moving object area is presumed to form one cluster with the previously extracted moving object area. This is because the loss incurred when a single video block that is not part of a moving object is included in a moving object area is small, whereas the loss incurred when a moving object area is fragmented is large.

Looking again at [FIG. 8], the video blocks corresponding to moving objects are not fully marked; only some of them are. That is, for a walking person or a moving car, only part of the object's video blocks is marked rather than the whole object. There are also many cases in which several moving object areas are formed for a single moving object. This means that the criterion adopted in (S100) is very useful for filtering out ordinary areas but is quite strict. A further process is therefore needed that examines the video blocks around the moving object areas marked in (S100) and additionally marks those that satisfy a certain criterion, thereby detecting the boundary of each moving object area.

[FIG. 9] shows an example of the result of applying boundary area detection to the CCTV image of [FIG. 8]; the video blocks marked as moving object areas through the above process are shown in blue. Comparing [FIG. 8] with [FIG. 9], the blue moving object areas extend further around the areas shown in red in [FIG. 8], to the point that they cover the moving objects entirely when compared with the actual CCTV footage.

Step (S150): Next, unmarked image blocks that are surrounded by marked image blocks and number no more than a preset count are marked as a moving object area. This step applies interpolation to the moving object areas detected in (S100) to (S140) in order to clean up their fragmentation.

Because (S100) to (S140) judge on a block-by-block basis whether a block belongs to a moving object area, some video blocks in the middle of what is actually a single moving object (e.g., a person) may remain unmarked, splitting it into several moving object areas. When unmarked image blocks are interspersed in this way, the pieces may be treated as multiple separate moving objects, and identification of the moving object may become inaccurate. Accordingly, if one or a few unmarked image blocks are surrounded by a plurality of marked image blocks, they are additionally marked as a moving object area; this is called interpolation. In this way, a moving object area that has been split into several pieces can be merged into one.
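One way to realize the interpolation of (S150) is sketched below, under stated assumptions: the function name is hypothetical, "surrounded" is interpreted as a 4-connected group of unmarked blocks that does not touch the frame border, and `max_hole_size` stands for the preset count mentioned in the text.

```python
from collections import deque
from typing import Set, Tuple

Block = Tuple[int, int]  # (row, column) index of a video block


def fill_enclosed_holes(
    marked: Set[Block], rows: int, cols: int, max_hole_size: int
) -> Set[Block]:
    """Mark small groups of unmarked blocks fully enclosed by marked blocks."""
    result = set(marked)
    seen: Set[Block] = set()
    for r in range(rows):
        for c in range(cols):
            if (r, c) in marked or (r, c) in seen:
                continue
            # Collect one 4-connected group of unmarked blocks.
            group, touches_border = [], False
            queue = deque([(r, c)])
            seen.add((r, c))
            while queue:
                cr, cc = queue.popleft()
                group.append((cr, cc))
                if cr in (0, rows - 1) or cc in (0, cols - 1):
                    touches_border = True
                for nr, nc in ((cr - 1, cc), (cr + 1, cc), (cr, cc - 1), (cr, cc + 1)):
                    if (
                        0 <= nr < rows
                        and 0 <= nc < cols
                        and (nr, nc) not in marked
                        and (nr, nc) not in seen
                    ):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            # A group that never reaches the border is enclosed by marked blocks.
            if not touches_border and len(group) <= max_hole_size:
                result.update(group)
    return result


# 5x5 grid with a 3x3 ring of marked blocks around an unmarked center block.
ring = {(r, c) for r in range(1, 4) for c in range(1, 4)} - {(2, 2)}
filled = fill_enclosed_holes(ring, rows=5, cols=5, max_hole_size=1)
```

Holes that touch the frame border are left alone in this sketch, since they are not enclosed on all sides.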

[FIG. 10] shows an example of applying interpolation to the image of [FIG. 9]. Comparing [FIG. 8] to [FIG. 10], the moving object areas come to reflect the actual scene more faithfully as they pass through the boundary area detection and interpolation processes. Judging by the clusters marked in red in [FIG. 8], the image would be treated as containing many very small moving objects, which does not match reality. Judging instead by the clusters marked in blue in [FIG. 10], the image is treated as containing a few moving objects of a certain size, which reflects the actual scene more closely.

Step (S160): Next, in the series of image frames constituting the compressed video, each cluster formed by marked image blocks that are connected to one another is set as a moving object. One or more moving object areas have been obtained from the compressed video through (S100) to (S150). In each of steps (S100) to (S150), blocks were judged and marked individually as to whether they belong to a moving object area, but in the end each cluster of connected video blocks is treated as a region of moving object. As shown in [FIG. 10], a moving object area is a cluster made up of many video blocks.
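Grouping connected marked image blocks into moving objects, as in (S160), amounts to connected-component labeling. The following is a minimal sketch for a single frame; the function names are hypothetical, and 8-connectivity is an assumption, since the description only says that the blocks are connected to one another.

```python
from typing import List, Set, Tuple

Block = Tuple[int, int]  # (row, column) index of a video block


def group_into_regions(marked: Set[Block]) -> List[Set[Block]]:
    """Split the marked blocks into clusters of mutually connected blocks."""
    remaining = set(marked)
    regions: List[Set[Block]] = []
    while remaining:
        seed = remaining.pop()
        region, stack = {seed}, [seed]
        while stack:
            r, c = stack.pop()
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    neighbor = (r + dr, c + dc)
                    if neighbor in remaining:  # unvisited marked neighbor
                        remaining.remove(neighbor)
                        region.add(neighbor)
                        stack.append(neighbor)
        regions.append(region)
    return regions


def bounding_box(region: Set[Block]) -> Tuple[int, int, int, int]:
    """(top row, left column, bottom row, right column) in block units."""
    rs = [r for r, _ in region]
    cs = [c for _, c in region]
    return min(rs), min(cs), max(rs), max(cs)


# Three touching blocks form one object; the distant block forms another.
regions = group_into_regions({(0, 0), (0, 1), (1, 1), (4, 4)})
largest = max(regions, key=len)
```

A bounding box in block units can be scaled by the block size to give the pixel area from which a thumbnail would later be cropped.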

Meanwhile, the present invention can be implemented in the form of computer-readable code on a computer-readable non-volatile recording medium. Such non-volatile recording media include various kinds of storage devices, for example hard disks, SSDs, CD-ROMs, NAS, magnetic tapes, web disks, and cloud disks. The present invention may also be implemented in the form of a computer program stored on a medium in order to execute a specific procedure in combination with hardware.

10: CCTV camera
20: Video control device
30: Storage device
40: Video analysis device

Claims

A syntax-based same-person (RE-ID) search method for compressed video, performed by a video analysis device 40 in a CCTV video control system that processes a large amount of compressed video, the method comprising:
A first step of extracting syntax information including a motion vector and a coding type for coding units according to a video compression standard by parsing the bitstreams of a plurality of compressed videos provided from CCTV cameras 10;
A second step of detecting a moving object, without performing compressed video decoding, downscale resizing, difference image acquisition, and playback image analysis, by identifying parts that appear to be moving objects in the series of image frames constituting the compressed video on the basis of the syntax information including the extracted motion vector and coding type and setting them as object unit areas;
A 3a step of classifying the series of image frames constituting the plurality of compressed videos into image frames in which no moving object was detected in the second step (hereinafter referred to as 'first image frames') and image frames in which a moving object was detected (hereinafter referred to as 'second image frames');
A 3b step of skipping image analysis by compressed video decoding, downscale resizing, difference image acquisition, and playback image analysis for the first image frames among the series of image frames constituting the plurality of compressed videos;
A 3c step of skipping image analysis by compressed video decoding, downscale resizing, difference image acquisition, and playback image analysis for any second image frame, among the series of image frames constituting the plurality of compressed videos, that is a P frame located within a preset number of frames from the rear end of the GOP (Group of Pictures);
A 3d step of performing image analysis by compressed video decoding, downscale resizing, difference image acquisition, and playback image analysis only on the remaining second image frames among the series of image frames constituting the plurality of compressed videos;
A 3e step of acquiring a thumbnail of the moving object detected in the second step through the image analysis of the 3d step;
A 4a step of obtaining a plurality of thumbnails of the moving object by performing object tracking on the compressed video until the moving object detected in the second step disappears from the image;
A 4b step of acquiring a best shot image of the moving object from among the obtained plurality of thumbnails;
A fifth step of extracting object characteristic information (RE-ID Feature) from the best shot image of the moving object;
A sixth step of managing the object characteristic information of the best shot image of the moving object in a database;
A seventh step of receiving a target image for same-person (RE-ID) search; and
An eighth step of querying the database with the object characteristic information of the target image and performing a same-person search for the target image based on the object characteristic information;
wherein the first to sixth steps are implemented as a first task or thread, the seventh and eighth steps are implemented as a second task or thread, and the first and second tasks or threads are executed simultaneously, so that real-time same-person search is performed in the CCTV video control system.

In claim 1,
The 4b step is configured to set the thumbnail with the largest area of the moving object among the obtained plurality of thumbnails as the best shot image.

delete

In the syntax-based same-person search method for compressed video of claim 1,
The second step comprises:
Calculating a motion vector accumulation value over a specific time for each video block for the series of image frames constituting the compressed video;
Marking, as a moving object area, each image block among the plurality of image blocks whose motion vector accumulation value exceeds a preset first threshold (hereinafter, an image block marked as a moving object area is referred to as a 'marked image block', and an image block not marked as a moving object area is referred to as an 'unmarked image block');
Identifying one or more image blocks adjacent to a marked image block (hereinafter referred to as 'neighboring blocks');
Marking a neighboring block with an intra picture coding type as a moving object area;
Marking a neighboring block with a motion vector value exceeding a preset second threshold as a moving object area;
Marking a preset number or fewer of unmarked image blocks surrounded by marked image blocks as a moving object area; and
Setting, as a moving object, a cluster of marked image blocks that are connected to one another in the series of image frames constituting the compressed video.

A computer program stored in a storage medium to execute a syntax-based same-person search method for compressed video according to any one of claims 1, 2, and 4 on a computer.