KR20190023389A

KR20190023389A - Multi-Class Multi-Object Tracking Method using Changing Point Detection

Info

Publication number: KR20190023389A
Application number: KR1020170109109A
Authority: KR
Inventors: 이필규
Original assignee: 인하대학교 산학협력단 (Inha University Industry-Academic Cooperation Foundation)
Priority date: 2017-08-29
Filing date: 2017-08-29
Publication date: 2019-03-08
Also published as: KR102465960B1

Abstract

The present invention relates to a multi-class multi-object tracking method capable of tracking an unlimited number of object classes. According to the present invention, the multi-class multi-object tracking method comprises: a step of calculating an observation likelihood for the presence and absence of an object included in an input image sequence; a step of setting a plurality of tracking trajectories; a step of detecting a change point, that is, a time point at which the data of the segments on a tracking trajectory changes by a threshold value or more; and a step of determining final tracking segments.

Description

[0001] The present invention relates to a multi-object tracking method. More particularly, it relates to a multi-class multi-object tracking method that can track an unlimited number of object classes by combining change point detection with detection-based multi-object tracking.

Multi-object tracking is emerging as a technology applicable to many real-world domains, such as video security, gesture recognition, robot vision, and human-robot interaction. A key challenge in multi-object tracking is trajectory drift. This drift is caused by appearance changes due to noise, illumination changes, pose, cluttered backgrounds, interactions, occlusion by other objects, and camera motion. Most multi-object tracking methods suffer performance degradation because the number of objects changes from frame to frame, and this effect is more pronounced against complex backgrounds. In addition, most multi-object tracking methods focus on a limited number of categories, mainly pedestrians and cars. Multi-object tracking over an unlimited number of object classes has rarely been studied because of its very high complexity and computational cost.

A Bayesian filter consists of a motion model and an observation model that together estimate the posterior probability. One family of Bayesian-filter-based object tracking methods is based on Markov chain Monte Carlo (MCMC). MCMC can handle interactions between multiple objects and the diverse motions of objects. However, most MCMC-based methods assume that the number of objects does not change over time, which is unsuitable for real-world applications. To overcome this restrictive assumption of conventional MCMC-based tracking, reversible jump Markov chain Monte Carlo (RJMCMC) was proposed. RJMCMC handles moves such as update, replacement, birth, and death to accommodate a number of objects that changes over time. It starts a new track by initializing a new object, or ends the current track by removing an object.

Multi-object tracking approaches based on Markov chain Monte Carlo have thus been somewhat successful. Nevertheless, they require a very high computational cost because of the high-dimensional state space. Changes in object appearance, interactions, occlusion, and a changing number of objects also cause heavy computational overhead.

For example, Sakaino (H. Sakaino, "Video-based tracking, learning, and recognition method for multiple moving objects," IEEE Transactions on Circuits and Systems for Video Technology, vol. 23, pp. 1661-1674, 2013) reduced the computational overhead of Markov chain Monte Carlo (MCMC) sampling. It did so by separating the part of the motion model that decides object birth and death, so that the iterative loop of the Markov chain decides only update and replacement moves.

If birth and death were decided inside the MCMC chain, the sampler would have to change dimension. Once these decisions are moved out of the iterative loop, the Markov chain no longer changes dimension and can reach a stable state with less computation. However, simply separating the birth and death decisions cannot efficiently handle the complex situations that arise in multi-object tracking. In most cases, tracking drift is caused by changes in object appearance.

SUMMARY OF THE INVENTION Accordingly, the present invention has been made to solve the problems described above. An object of the present invention is to provide a multi-class multi-object tracking method that can track an unlimited number of object classes by combining change point detection with detection-based multi-object tracking.

In the present invention, the change point detection model detects singular points or abrupt changes caused by drift or occlusion, based on the spatial and temporal characteristics of the track states. A convolutional neural network (CNN)-based object detector and a Lucas-Kanade tracker-based motion detector are applied to calculate the likelihoods of different object classes in the proposed regions. Experiments on the recently introduced ImageNet VID and MOT benchmark datasets gave very encouraging results, comparable to those of state-of-the-art multi-object tracking techniques.

In summary, to achieve the above object, a multi-class multi-object tracking method for an input image sequence according to one aspect of the present invention comprises: calculating an observation likelihood for the presence and absence of an object included in the input image sequence; determining, based on the calculated likelihood, segments for positions where an object is likely to be present, and setting a plurality of tracking trajectories connecting the segments; detecting a change point, that is, a time point at which the data of the segments on a tracking trajectory changes by a threshold value or more; and performing a forward-backward check on the segments of a tracking trajectory containing the change point, and combining the segments on the tracking trajectory based on the check result to determine final tracking segments.

In the determining step, the segments may be classified into confident segments and drifted, unstable segments using a score derived from the change point detection. The segments on the tracking trajectory may then be combined by retaining the confident segments and excluding the unstable segments.
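The retain-or-exclude rule above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The `Segment` container, its scalar `change_score` field, and the fixed `threshold` are all hypothetical. The sketch assumes each segment has already received one change-point score, where a higher value means stronger evidence of drift.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """Hypothetical tracking segment: inclusive frame range plus a change-point score."""
    start: int
    end: int
    change_score: float  # assumption: higher means stronger evidence of drift


def combine_confident_segments(segments: List[Segment], threshold: float) -> List[Segment]:
    """Keep confident segments (score below threshold), drop drifted/unstable ones,
    and return the survivors in temporal order as the combined trajectory."""
    confident = [s for s in segments if s.change_score < threshold]
    return sorted(confident, key=lambda s: s.start)


if __name__ == "__main__":
    track = [Segment(20, 29, 0.2), Segment(0, 9, 0.1), Segment(10, 19, 0.9)]
    kept = combine_confident_segments(track, threshold=0.5)
    print([(s.start, s.end) for s in kept])  # [(0, 9), (20, 29)]
```

A single hard threshold is the simplest possible decision rule. A real system might instead rank segments or combine the change-point score with other evidence.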

In the determining step, a confidence level may be estimated as the forward-backward error of the segments on the tracking trajectory. This confidence is used to decide which segments are retained as confident and which are excluded as unstable. It may be computed from the Euclidean distance between two trajectories: the trajectory linking the segments obtained by forward tracking over a predetermined period, and the reverse-order trajectory obtained by backward tracking.
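A minimal sketch of the forward-backward error described above follows. It assumes each trajectory is a list of (x, y) box centres, one per frame, covering the same period. Two choices here are illustrative and not taken from the source: averaging the per-frame distances, and the exponential mapping in `confidence`. The `scale` parameter is likewise hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def forward_backward_error(forward: List[Point], backward: List[Point]) -> float:
    """Mean Euclidean distance between a forward-tracked trajectory and the
    backward-tracked trajectory re-ordered into forward time.

    forward[k]  : position at frame t+k when tracking forward from frame t
    backward[k] : position at frame t+T-k when tracking backward from frame t+T
    """
    if len(forward) != len(backward) or not forward:
        raise ValueError("trajectories must be non-empty and of equal length")
    backward_in_forward_time = backward[::-1]
    dists = [math.dist(p, q) for p, q in zip(forward, backward_in_forward_time)]
    return sum(dists) / len(dists)


def confidence(fb_error: float, scale: float = 10.0) -> float:
    """Hypothetical mapping of an FB error to (0, 1]: smaller error, higher confidence."""
    return math.exp(-fb_error / scale)


if __name__ == "__main__":
    fwd = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
    bwd = [(2.0, 0.0), (1.0, 0.0), (0.0, 3.0)]  # backward pass ends 3 px away from the start
    err = forward_backward_error(fwd, bwd)
    print(err, confidence(err))
```

A consistent track returns to its starting point when tracked backward, so its error is near zero. A drifted track does not.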

According to another aspect of the present invention, a recording medium storing computer-readable code for multi-class multi-object tracking of an input image sequence performs the following functions: calculating an observation likelihood for the presence and absence of an object included in the input image sequence; determining, based on the calculated likelihood, segments for positions where an object is likely to be present, and setting a plurality of tracking trajectories connecting the segments; detecting a change point, that is, a time point at which the data of the segments on a tracking trajectory changes by a threshold value or more; and performing a forward-backward check on the segments of a tracking trajectory containing the change point, and combining the segments on the tracking trajectory based on the check result to determine final tracking segments.

With the multi-class multi-object tracking method according to the present invention, multiple objects of unlimited classes, whose appearance varies widely, can be tracked by formulating the likelihood of ideally moving proposed regions. Conventional techniques estimate likelihoods only for limited types of objects such as pedestrians and cars. In contrast, an efficient convolutional neural network (CNN)-based multi-class object detector can be applied here to calculate the likelihoods of multiple object classes.

또한, 정적인 관측과 동적인 관측을 기반으로 한 변화점 검출 알고리즘이 추적의 실패를 평가한다. 추적 세그먼트가 표현하는 변화 없는 시계열 내의 급작스런 변화점을 검출함으로써, 다중 물체 추적의 드리프트를 조사할 수 있다.In addition, a change point detection algorithm based on static observations and dynamic observations evaluates the failure of tracking. The drift of multiple object tracking can be investigated by detecting a sudden change point in the time series without change represented by the tracking segment.

BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings are included as part of the detailed description to aid understanding of the present invention. They provide embodiments of the invention and, together with the detailed description, explain its technical idea.
FIG. 1 shows the main concept of a framework for multi-class multi-object tracking according to an embodiment of the present invention.
FIG. 2 is a flowchart for explaining change point detection according to an embodiment of the present invention.
FIGS. 3A and 3B show examples of change points detected from segments of the MOT16-02 and MOT16-09 sequences by applying the MCMOT of the present invention.
FIG. 4 shows examples of MCMOT tracking results on validation sequences of the ImageNet VID dataset, obtained by applying the MCMOT of the present invention.
FIG. 5 shows an example visualizing MCMOT tracking results on test image sequences, obtained by applying the MCMOT of the present invention.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings. In the drawings, the same components are denoted by the same reference symbols wherever possible. Detailed descriptions of known functions and/or configurations are omitted. The following description focuses on the parts necessary for understanding the operation according to various embodiments. Descriptions of elements that may obscure the gist of the description are omitted. Some elements of the drawings may be exaggerated, omitted, or illustrated schematically. The size of each component does not entirely reflect its actual size, so the contents described herein are not limited by the relative sizes or spacings of the components drawn in the drawings.

In describing the embodiments of the present invention, detailed descriptions of related known technologies are omitted when they may unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of users or operators. Their definitions should therefore be based on the contents of this entire specification. The terms used in the detailed description are intended only to describe embodiments of the invention and should in no way be limiting. Unless clearly used otherwise, a singular expression includes the plural. In this description, expressions such as "comprising" or "including" indicate certain features, numbers, steps, operations, elements, or parts or combinations thereof. They should not be construed to preclude the presence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations thereof.

Terms such as "first" and "second" may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another.

First, the present invention proposes robust multi-class multi-object tracking that combines object detection responses with a change point detection algorithm. Advances in deep-learning-based object detection, such as Faster R-CNN and ResNet, have made it practical to detect objects of an unlimited number of classes. The object detection ensemble is implemented with a Faster R-CNN-based object detector and a motion detector based on the Lucas-Kanade tracking algorithm. The motion dynamics model of the Bayesian filter is decomposed into entity transitions and motion moves. Entity transitions are modeled as birth and death events, and a deep-learning-based algorithm computes the posterior probability. Drift, one of the most troublesome problems in multi-object tracking, is addressed by the change point detection algorithm. Assuming smooth motion dynamics, abrupt changes in observations associated with illumination, cluttered backgrounds, pose, scale, and so on can be handled by change point detection. The main contributions of the present invention are as follows.

That is, with the multi-class multi-object tracking method according to the present invention, multiple objects of unlimited classes, whose appearance varies widely, can be tracked by formulating the likelihood of ideally moving proposed regions. Conventional techniques estimate likelihoods only for limited types of objects such as pedestrians and cars. In contrast, an efficient convolutional neural network-based multi-class object detector can be applied here to calculate the likelihoods of multiple object classes. In addition, a change point detection algorithm based on static and dynamic observations evaluates tracking failures. Drift in multi-object tracking can be examined by detecting abrupt change points in the otherwise unchanging time series represented by a tracking segment.

<Multi-Object Tracking>

Recent multi-object tracking research has concentrated on the tracking-by-detection paradigm, which performs data association to link object detections across consecutive frames. Offline multi-object tracking methods use information from frames available in advance. They improve tracking through hierarchical tracklet association, network flow, and global trajectory optimization, but they require heavy computation. Online methods, by contrast, use only information from past and current frames, so they require less computation. Online methods are better suited to real-world problems. However, they drift easily because of noise, illumination changes, pose, camera angle, shadows, changes in the number of objects, and abrupt changes. A dynamically changing number of objects is hard to handle and occurs frequently in crowds and densely clustered objects. Most multi-object tracking methods rely on observations built from different kinds of features, which makes them vulnerable to drift. Under such nonstationarity and nonlinearity, probabilistic tracking methods such as Kalman filters or particle filters are known to outperform deterministic tracking methods.

<Convolutional Neural Network-Based Object Detection>

In recent years, convolutional neural networks (CNNs) have achieved remarkable results in computer vision. A particularly notable line of work is R-CNN (Region-based Convolutional Neural Networks). R-CNN proposed converting a CNN-based object classifier into a CNN-based object detector through a region-of-interest-based approach. SPPNet (Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition) and Fast R-CNN extended R-CNN by pooling convolutional feature vectors from the convolutional feature map. More recently, Region Proposal Networks (RPN) were proposed to generate regions of interest inside the R-CNN framework. These region-based CNN frameworks outperform almost all earlier work by a large margin. Despite this success, only a few multi-object tracking algorithms have adopted CNN-based representations. Earlier work includes multi-object tracking that combines a simple tracking algorithm with a CNN-based framework, validated on the ImageNet VID dataset, as well as CNN-based object detectors used in the MOT Challenge.

In the present invention, the observation model is built by applying this region-based convolutional neural network paradigm.

<Overview of Multi-Class Multi-Object Tracking>

The present invention proposes MCMOT (Multi-Class Multi-Object Tracking unit), an efficient multi-class multi-object tracker that effectively handles object birth, object death, occlusion, interaction, and drift. MCMOT can fail because of erroneous computations in the posterior probability, the interaction model, the entity model, or the motion model. The goal of MCMOT is therefore to stop a drifting track as early as possible when drift occurs, recover from the false detection, and then resume tracking.

In the multi-class multi-object tracker (MCMOT) according to an embodiment of the present invention, the components that realize each function may be implemented in one of three ways: by hardware such as a semiconductor processor, by software such as an application program, or by a combination of hardware and software.

In particular, the functions used for data processing in the MCMOT according to an embodiment of the present invention may be implemented as computer-readable code on a recording medium readable by a device such as a computer. Such a recording medium, combined with a computer or similar device, may be used to input, output, and display the data or information needed to perform the functions. A computer-readable recording medium includes every kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, floppy disks, optical data storage devices, hard disks, and removable storage devices. The computer-readable recording medium may also be distributed over computer systems connected through a network (e.g., the Internet or a mobile communication network), so that the computer-readable code is stored and executed in a distributed manner over the network.

FIG. 1 shows the main concept of a framework for multi-class multi-object tracking according to an embodiment of the present invention. As shown in FIG. 1, the framework processes an input image sequence in four main steps: (a) likelihood calculation based on the observation model, (b) tracking segment generation, (c) change point detection, and (d) trajectory combination. This allows an unlimited number of object classes to be tracked. Segment drift is effectively controlled by forward-backward validation combined with change point detection.
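The four-stage flow of FIG. 1 can be summarized as a pipeline skeleton. This is only an orchestration sketch. The function `run_mcmot` and its four stage arguments are hypothetical callables supplied by the caller, and nothing here reproduces the internals of any stage.

```python
from typing import Any, Callable, List, Sequence


def run_mcmot(
    frames: Sequence[Any],
    compute_likelihoods: Callable[[Sequence[Any]], Any],
    build_segments: Callable[[Any], List[Any]],
    detect_change_points: Callable[[List[Any]], List[Any]],
    combine_trajectories: Callable[[List[Any], List[Any]], List[Any]],
) -> List[Any]:
    """Hypothetical skeleton: run stages (a)-(d) of FIG. 1 in order."""
    likelihoods = compute_likelihoods(frames)              # (a) observation-model likelihoods
    segments = build_segments(likelihoods)                 # (b) tracking segment generation
    change_points = detect_change_points(segments)         # (c) change point detection
    return combine_trajectories(segments, change_points)   # (d) forward-backward check and combination


if __name__ == "__main__":
    # Toy stages, only to show the data flow between steps.
    result = run_mcmot(
        [0, 1, 2],
        compute_likelihoods=lambda fr: [f * 0.5 for f in fr],
        build_segments=lambda lk: [lk],
        detect_change_points=lambda segs: [],
        combine_trajectories=lambda segs, cps: segs,
    )
    print(result)  # [[0.0, 0.5, 1.0]]
```

Passing the stages in as callables keeps the ordering explicit while leaving each stage free to be swapped, for example to test a different change point detector.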

In the MCMOT of the present invention, objects are represented by proposed regions tracked by the tracking algorithm. Based on the likelihood calculation, the tracking algorithm divides a tracking trajectory into several segments wherever an interaction or occlusion may have occurred. Because tracker decisions are error-prone, the observed and localized bounding boxes within a segment are compared, and abrupt changes are monitored through change point detection. The motion-based tracking component adaptively applies a Lucas-Kanade tracker to predict the region of the next tracking point from the current tracking point. The present invention uses a multi-class object detector based on deep features as both the global object detector and the local object detector. The number of object categories can be extended according to the capability of the object detector.
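The motion-based prediction step can be illustrated with a single least-squares Lucas-Kanade update. This is a simplified sketch with several assumptions: grayscale float frames, a pure translation of the patch inside the current box, and one pyramid level with one iteration. A practical tracker would more likely use a pyramidal, iterative, feature-point LK such as OpenCV's `calcOpticalFlowPyrLK`. The helper names `lucas_kanade_shift` and `predict_next_box` are hypothetical.

```python
import numpy as np


def lucas_kanade_shift(prev: np.ndarray, curr: np.ndarray, box: tuple) -> tuple:
    """One least-squares Lucas-Kanade step: estimate the (dx, dy) translation of the
    patch inside box = (x0, y0, x1, y1) between two grayscale frames."""
    x0, y0, x1, y1 = box
    i0 = prev[y0:y1, x0:x1].astype(float)
    i1 = curr[y0:y1, x0:x1].astype(float)
    iy, ix = np.gradient(i0)          # spatial gradients (axis 0 = y, axis 1 = x)
    it = i1 - i0                      # temporal difference
    a = np.stack([ix.ravel(), iy.ravel()], axis=1)
    # Brightness constancy: ix*dx + iy*dy = -it, solved in the least-squares sense.
    (dx, dy), *_ = np.linalg.lstsq(a, -it.ravel(), rcond=None)
    return float(dx), float(dy)


def predict_next_box(prev: np.ndarray, curr: np.ndarray, box: tuple) -> tuple:
    """Shift the current box by the estimated motion to predict the next tracking region."""
    dx, dy = lucas_kanade_shift(prev, curr, box)
    x0, y0, x1, y1 = box
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)


if __name__ == "__main__":
    yy, xx = np.mgrid[0:40, 0:40]

    def blob(cx: float, cy: float) -> np.ndarray:
        return np.exp(-((xx - cx) ** 2 + (yy - cy) ** 2) / (2 * 4.0 ** 2))

    prev, curr = blob(20.0, 20.0), blob(20.5, 20.0)   # blob moves 0.5 px to the right
    print(lucas_kanade_shift(prev, curr, (0, 0, 40, 40)))  # should be close to (0.5, 0.0)
```

The single linearized step is accurate only for small, sub-pixel to few-pixel motions. Handling larger motion is why pyramidal and iterative variants exist.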

The operation of the multi-class multi-object tracker (MCMOT) according to an embodiment of the present invention is described in detail below.

<Multi-Class Multi-Object Tracking>

The MCMOT of the present invention adopts a data-driven approach to handle three levels of events: object-level events (object birth and death), inter-object events (interaction and mutual occlusion), and track-level events (track creation, update, and termination). Drift that may result from observation failures is handled by change point detection based on an anomaly detection method.

Tracking segments are defined using birth and death detection. Only visible objects are tracked, and when occlusion occurs, the overall tracking trajectory is split into several tracking segments. If an object becomes ambiguous because of occlusion or noise, the corresponding tracking segment is terminated (associated with object death). When the same object reappears near the region where the segment ended, the tracker resumes tracking (associated with object birth). In this way, tracking segments are built up successively.
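The segment lifecycle above can be sketched by thresholding a per-frame visibility signal for one object. The sketch assumes such a signal exists, for example a presence probability from the observation model. A segment starts when the signal rises above the threshold (birth) and ends when it falls below (death). The function name and threshold are hypothetical. Re-linking a reappearing object to its earlier segment is left out of this sketch.

```python
from typing import List, Optional, Sequence, Tuple


def split_into_segments(presence: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) frame ranges in which the object is treated as visible.
    Assumption: visibility is decided by comparing a per-frame presence score to `threshold`."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for t, p in enumerate(presence):
        if p >= threshold and start is None:
            start = t                          # object (re)appears: segment birth
        elif p < threshold and start is not None:
            segments.append((start, t - 1))    # occluded or ambiguous: segment death
            start = None
    if start is not None:
        segments.append((start, len(presence) - 1))
    return segments


if __name__ == "__main__":
    # Visible for two frames, occluded for two, then visible again.
    print(split_into_segments([0.9, 0.8, 0.2, 0.1, 0.7, 0.9], threshold=0.5))  # [(0, 1), (4, 5)]
```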

<Observation Model>

In step "(a) likelihood calculation based on the observation model" of FIG. 1, the observation model (observation likelihood) is defined as follows. The observation likelihood of a tracked object must evaluate both its object class and its exact position. MCMOT ensembles the detections of detectors with different characteristics to accurately calculate the observation likelihood for the presence and absence of objects in the input image sequence. Because the dimension of the scene state can change, the measurement is defined as the ratio of the likelihoods of the presence and absence of object id at time t, as in [Equation 1]. Since the likelihood of absence cannot be measured directly, the present invention adopts a softmax probability model, as in [Equation 1].

[Equation 1]

Here, o_{id,t} denotes the presence of object id (identification) at time t, and ø_{id,t} denotes the absence of object id at time t. λ_e is the weight of object detector e, and f is the softmax function.

In [Equation 1], an additional term (its symbol is missing from this text) indicates the case in which an object is detected at time t.
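The image of [Equation 1] is missing, so its exact form cannot be reproduced here. The sketch below is illustrative only. It assumes each detector e returns one presence score and one absence score for object id. It also assumes the weighted sums (weights λ_e) are compared through a two-way softmax f. Under that assumption the presence/absence likelihood ratio reduces to the exponential of the difference of the weighted sums. The functions `weighted_sum`, `presence_probability`, and `likelihood_ratio` and the score format are hypothetical.

```python
import math


def weighted_sum(scores, weights):
    """Combine per-detector scores with the ensemble weights lambda_e (assumed form)."""
    if len(scores) != len(weights):
        raise ValueError("one weight per detector is required")
    return sum(w * s for w, s in zip(weights, scores))


def presence_probability(presence, absence, weights):
    """Two-way softmax over weighted presence and absence evidence (hypothetical)."""
    z_p = weighted_sum(presence, weights)
    z_a = weighted_sum(absence, weights)
    m = max(z_p, z_a)  # subtract the max for numerical stability
    e_p, e_a = math.exp(z_p - m), math.exp(z_a - m)
    return e_p / (e_p + e_a)


def likelihood_ratio(presence, absence, weights):
    """P(present) / P(absent); under a two-way softmax this equals exp(z_p - z_a)."""
    return math.exp(weighted_sum(presence, weights) - weighted_sum(absence, weights))


# Hypothetical scores from the global (GT), color (CT) and motion (MT) detectors.
weights = [0.5, 0.25, 0.25]
p = presence_probability([0.9, 0.6, 0.8], [0.1, 0.4, 0.2], weights)
r = likelihood_ratio([0.9, 0.6, 0.8], [0.1, 0.4, 0.2], weights)
```

Computing the ratio directly from the weighted sums avoids dividing by 1 - p when p is close to 1.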

Because each detector has its own strengths and weaknesses, this ensemble approach is expected to be robust to sporadic noise. The present invention uses an object detector ensemble with three members: a global object detector (GT), a color detector (CT), and a motion detector (MT). The global object detector (GT) is a detector based on deep features. The color detector (CT) computes a similarity score between the observed appearance model and the reference object. It does so through the Bhattacharyya distance between the RGB (Red, Green, Blue) or HSV (Hue, Saturation, Value) color histograms of the bounding box. The motion detector (MT) decides whether an object is present using a motion detector based on the Lucas-Kanade tracker, which detects whether motion exists in the scene.
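The color detector (CT) step can be illustrated with a small sketch. It assumes a joint RGB histogram with a few bins per channel and uses the Hellinger-style form of the Bhattacharyya distance, sqrt(1 - BC), where BC is the Bhattacharyya coefficient of the normalized histograms. The bin count, the function names, and the choice of 1 - distance as the similarity score are assumptions, not details given in the text.

```python
import math


def color_histogram(pixels, bins=4):
    """Joint RGB histogram; each 0-255 channel is quantized into `bins` levels."""
    hist = [0] * (bins ** 3)
    for r, g, b in pixels:
        idx = ((r * bins // 256) * bins + (g * bins // 256)) * bins + (b * bins // 256)
        hist[idx] += 1
    return hist


def bhattacharyya_distance(h1, h2):
    """0 for identical color distributions, 1 for disjoint ones."""
    s1, s2 = sum(h1), sum(h2)
    if len(h1) != len(h2) or s1 == 0 or s2 == 0:
        raise ValueError("histograms must be non-empty and of equal length")
    # Bhattacharyya coefficient of the two normalized histograms.
    bc = sum(math.sqrt((a / s1) * (b / s2)) for a, b in zip(h1, h2))
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp tiny negative rounding errors


def color_similarity(observed_pixels, reference_pixels, bins=4):
    """Hypothetical CT response: 1 means identical color distributions."""
    return 1.0 - bhattacharyya_distance(color_histogram(observed_pixels, bins),
                                        color_histogram(reference_pixels, bins))


red = [(255, 0, 0)] * 10
blue = [(0, 0, 255)] * 10
score = color_similarity(red + blue, red)  # partial overlap gives a score between 0 and 1
```

An HSV histogram would be built the same way after converting each pixel. Another common definition of the distance is -ln(BC), which is unbounded, so the [0, 1] form is used here for a score.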

<Track segment generation>

Step "(b) Track segment generation" of FIG. 1 of the present invention is described next. Based on the likelihood calculation above, the tracking algorithm determines segments at positions where an object is likely to exist. In doing so it takes into account whether interaction or occlusion may have occurred. It then sets up a plurality of tracking trajectories that connect those segments.

MCMOT is modeled as a tracking problem that determines the ideal scene particles from a given image sequence. It can be viewed as repeatedly reallocating objects from the current scene state to the next scene state in three steps:

1. Birth and death assignments are made in the entity state transition step.
2. Immediate track segments are built in the data-driven Markov chain Monte Carlo (MCMC) sampling step, assuming that the appearance and position of a track segment change smoothly.
3. Tracking drift is detected with a change point detection algorithm, so that drift can be prevented.

A change point is a time point at which the characteristics of the data change abruptly, and it can be expected to be a likely starting point of drift. Data-driven MCMC sampling and entity state transition are discussed in more detail below.

<Data-driven Markov chain Monte Carlo sampling>

In Markov chain Monte Carlo sampling, the proposal density function is one of the most important factors. It strongly affects how the Markov chain converges to a stable distribution, and it has been confirmed in practice to strongly affect tracking performance. The proposal density must be computable, and it must be possible to sample efficiently from it in proportion to the desired target distribution.

The present invention can apply a "one object at a time" strategy, in which the state of only one object is modified per move. This strategy is based on the papers "Khan, Z., Balch, T., Dellaert, F.: MCMC-based particle filtering for tracking a variable number of interacting targets. TPAMI 27 (2005) 1805-1819" and "Kalal, Z., Mikolajczyk, K., Matas, J.: Forward-backward error: Automatic detection of tracking failures. In: ICPR. (2010)".

Given a particle x_t in the image at time t, the probability density function of x' given x_t is used to propose the particles for the next time step (t+1). As in the formulation of the papers above, MCMOT assumes that the proposal density for MCMC sampling follows a pure motion model. Given a scene particle, that is, a set of particles or object states x_t, a candidate scene particle x_t' is determined by randomly selecting an object presence o_{id,t}. The state x_t' that depends on o_{id,t} is then determined under a uniform probability distribution assumption.

In the present invention, a data-driven proposal density is adopted to obtain a better Markov chain acceptance rate, following "Zhao, T., Nevatia, R., Wu, B.: Segmentation and tracking of multiple humans in crowded environments. TPAMI 30 (2008) 1198-1211". In MCMOT, the new object state o'_{id,t} is expressed as in [Equation 2].
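The "one object at a time" move under a pure motion model can be sketched as follows. The assumptions are that a scene particle is a mapping from object id to an (x, y) state and that "uniform" means a displacement drawn uniformly from a square of half-width `step`. The function name and `step` are hypothetical.

```python
import random


def propose_scene(scene, step, rng):
    """One-object-at-a-time proposal under a pure motion model (sketch).

    scene: dict mapping object id -> (x, y), i.e. the scene particle x_t.
    Returns the selected id and a candidate scene particle x_t'.
    """
    candidate = dict(scene)                # copy so that x_t itself is not modified
    oid = rng.choice(sorted(candidate))    # pick one object presence at random
    x, y = candidate[oid]
    # Only the selected object moves; its displacement is uniform in [-step, step].
    candidate[oid] = (x + rng.uniform(-step, step), y + rng.uniform(-step, step))
    return oid, candidate


rng = random.Random(0)
oid, cand = propose_scene({1: (10.0, 20.0), 2: (50.0, 60.0)}, step=2.0, rng=rng)
```

Changing a single object per move keeps each proposal low-dimensional, which is the motivation for the strategy cited above.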

[Equation 2]

Here, λ₁ + λ₂ = 1. In [Equation 2], the first term takes the form of the motion model, and the second term represents the detector ensemble using all detections close to object id. The posterior probability for time step t-1 is assumed to be represented by N (a natural number) samples (scene particles). When the observations at the current time t are given, the current likelihood is computed by MCMC sampling with N samples. The first B samples are burn-in samples: they are used initially and then discarded so that the chain converges to its stationary distribution. More practical considerations for MCMC can be found in "Gilks, W. R., Richardson, S., Spiegelhalter, D. J.: Introducing Markov chain Monte Carlo. Markov chain Monte Carlo in practice 1 (1996)".
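The sampling step can be sketched as a Metropolis-Hastings chain with a data-driven mixture proposal. With probability λ₁ an object moves under a Gaussian motion model, and with probability λ₂ = 1 - λ₁ it jumps near one of its nearby detections. The first B of the N samples are discarded as burn-in. Because [Equation 2] is not reproduced here, this mixture form is an assumption, as are:

- the Gaussian kernels and their widths;
- a target that factorizes over objects, passed in as a per-object `log_target`;
- all function names.

The mixture density is evaluated explicitly so that the Hastings correction stays valid for the non-symmetric proposal.

```python
import math
import random


def gauss2(p, mean, sigma):
    """Isotropic 2-D Gaussian density N(p; mean, sigma^2 I)."""
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    return math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma)) / (2 * math.pi * sigma * sigma)


def sample_move(old, dets, lam1, sig_m, sig_d, rng):
    """With probability lam1 use the motion term, otherwise jump near a detection."""
    if dets and rng.random() >= lam1:
        dx, dy = rng.choice(dets)
        return (rng.gauss(dx, sig_d), rng.gauss(dy, sig_d))
    return (rng.gauss(old[0], sig_m), rng.gauss(old[1], sig_m))


def move_density(new, old, dets, lam1, sig_m, sig_d):
    """Density q(new | old) of sample_move, needed for the Hastings correction."""
    motion = gauss2(new, old, sig_m)
    if not dets:
        return motion
    data = sum(gauss2(new, d, sig_d) for d in dets) / len(dets)
    return lam1 * motion + (1 - lam1) * data


def dd_mcmc(scene, log_target, detections, n_samples, burn_in,
            lam1=0.5, sig_m=2.0, sig_d=1.0, seed=0):
    """Return the n_samples - burn_in scene particles kept after burn-in."""
    rng = random.Random(seed)
    current = dict(scene)
    kept = []
    for i in range(n_samples):
        oid = rng.choice(sorted(current))             # one object at a time
        old, dets = current[oid], detections.get(oid, [])
        new = sample_move(old, dets, lam1, sig_m, sig_d, rng)
        q_fwd = move_density(new, old, dets, lam1, sig_m, sig_d)
        q_bwd = move_density(old, new, dets, lam1, sig_m, sig_d)
        log_a = (log_target(oid, new) - log_target(oid, old)
                 + math.log(max(q_bwd, 1e-300)) - math.log(max(q_fwd, 1e-300)))
        if rng.random() < math.exp(min(0.0, log_a)):
            current[oid] = new                        # accept the move
        if i >= burn_in:                              # drop the first B samples
            kept.append(dict(current))
    return kept


def log_target(oid, pos):
    # Hypothetical per-object posterior: object 1 is near (10, 10), unit variance.
    return -((pos[0] - 10) ** 2 + (pos[1] - 10) ** 2) / 2


samples = dd_mcmc({1: (0.0, 0.0)}, log_target, {1: [(10.0, 10.0)]},
                  n_samples=5000, burn_in=500)
```

Starting far from the target at (0, 0), the detection-driven jumps let the chain reach the high-posterior region in a few moves. This illustrates why a data-driven proposal can raise the acceptance rate compared with a pure motion model.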

<Determination of entity state transition>

Entity state transitions can be predicted from two binomial probabilities, the birth state probability and the death state probability, according to a temporal model between time steps t-1 and t. For the birth state of object id at position (x, y), v = 1 denotes a state affirming birth and v = 0 denotes a state denying birth. Likewise, for the death state, v = 1 denotes a state affirming death and v = 0 denotes a state denying death. The entity state posterior probability at time t, P_ES(o_{id,t} | o_{id,t-1}), can be expressed for each case as P_b, P_d, P_a, or P_ø, as in [Equation 3].

[Equation 3]

If a new object id is observed, according to the observation likelihood, at position (x, y) at time t but was not observed at time t-1, its birth state is set to 1; otherwise, it is set to 0. If object id is not observed at position (x, y) at time t but was observed at time t-1, its death state is set to 1; otherwise, it is set to 0.
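The two indicator rules above can be written directly. This sketch assumes that "observed" has already been decided from the observation likelihood and that objects are keyed by id only. The (x, y) position is folded into that decision, so the function and its set-based inputs are a simplification.

```python
def entity_states(seen_prev, seen_now):
    """Birth/death indicators for object ids between time t-1 and time t.

    seen_prev, seen_now: sets of ids whose observation likelihood indicates
    presence at time t-1 and at time t, respectively.
    """
    ids = seen_prev | seen_now
    birth = {oid: int(oid in seen_now and oid not in seen_prev) for oid in ids}
    death = {oid: int(oid in seen_prev and oid not in seen_now) for oid in ids}
    return birth, death


birth, death = entity_states({1, 2}, {2, 3})
# birth == {1: 0, 2: 0, 3: 1}; death == {1: 1, 2: 0, 3: 0}
```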

<Change point detection>

Step "(c) Change point detection" of FIG. 1 of the present invention is described with reference to FIG. 2.

FIG. 2 is a flowchart illustrating change point detection according to an embodiment of the present invention.

Once the tracking trajectory has been divided into a plurality of segments based on the likelihood calculation above, MCMOT first detects a change point in the corresponding track segment (S100), as shown in FIG. 2. A change point is a time point at which the data of the segments on the tracking trajectory change abruptly by more than a threshold value. Such a point is likely to be where drift of the tracking trajectory begins.

A change point detection (CPD) score can be computed with a change point detection algorithm (S200). If a high CPD score is detected at the change point, the forward-backward error (FB error) is checked for the segments on the tracking trajectory that contain the change point (S300). The FB error is the basis for deciding whether a segment has drifted. If the CPD score at the change point is low, or if the FB error is low, the segment is kept as a confident segment (S500). If the FB error is high, the segment is excluded as a drifted, unstable segment (S600). Combining change point detection with a forward-backward check of each segment's drift probability lets the segments on the trajectory be merged efficiently, so the tracking trajectory is determined from the final track segments. For the FB error check, see "Kalal, Z., Mikolajczyk, K., Matas, J.: Forward-backward error: Automatic detection of tracking failures. In: ICPR. (2010)".
In short, the change point score is used to classify each segment as either a confident segment or a drifted, unstable segment. The confident segments are kept, the unstable ones are excluded, and the remaining segments on the tracking trajectory are combined, so the final track segments can be determined effectively.
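The decision flow of FIG. 2 can be sketched as a small classifier. The CPD threshold of 0.3 is the value reported in the experimental settings. The FB threshold of 2.0 and all names are hypothetical. The FB error is passed as a callable so that the backward check runs only when a change point is found, matching the selective check described above.

```python
from typing import Callable, Dict, List, Tuple


def classify_segment(cpd_score: float, fb_error: Callable[[], float],
                     cpd_threshold: float = 0.3, fb_threshold: float = 2.0) -> str:
    """Classify one track segment following the S200-S600 flow (sketch)."""
    if cpd_score <= cpd_threshold:       # no change point: keep (S500)
        return "confident"
    if fb_error() <= fb_threshold:       # change point, but low FB error (S300 -> S500)
        return "confident"
    return "unstable"                    # drifted segment: exclude (S600)


def final_segments(segments: Dict[int, Tuple[float, Callable[[], float]]]) -> List[int]:
    """Keep the ids of confident segments; segments maps id -> (cpd_score, fb_error)."""
    return [sid for sid, (score, fb) in segments.items()
            if classify_segment(score, fb) == "confident"]


kept = final_segments({1: (0.1, lambda: 9.0), 2: (0.9, lambda: 9.0), 3: (0.9, lambda: 1.0)})
# kept == [1, 3]
```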

MCMOT may fail to track an object when the object is confused with a cluttered background, so MCMOT must decide whether to terminate or continue tracking. The drift problem of MCMOT is examined by detecting abrupt change points in a fixed time series that represents a track segment. A high response indicates high uncertainty, that is, a high probability of drift. The probability of tracking drift is therefore determined with the change point detection method. Here, a two-stage time series learning algorithm can be applied, similar to that described in "Takeuchi, J. I., Yamanishi, K.: A unifying framework for detecting outliers and change points from time series. IEEE Transactions on Knowledge and Data Engineering 18 (2006) 482-492". To reduce outliers in the time series, the second-stage time series is constructed from the average response of the first-stage time series. Drift can be prevented by following the procedure shown in FIG. 2.
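The two-stage idea can be illustrated with a simplified sketch. Each value of a likelihood series is first scored by its negative log-likelihood under an online Gaussian model with exponential forgetting. These first-stage scores are smoothed with a moving average, and a second model scores the smoothed series, giving the change point score. This is a zeroth-order stand-in for the autoregressive models of Takeuchi and Yamanishi, not their exact algorithm. The discount rate, window, and names are assumptions. Its scores are on a negative-log-likelihood scale, so the threshold of 0.3 quoted in the experiments would apply only after a normalization that the text does not describe.

```python
import math


class DiscountedGaussian:
    """Online Gaussian with exponential forgetting (discount rate r)."""

    def __init__(self, r=0.05, init_var=1.0, min_var=1e-6):
        self.r, self.var, self.min_var = r, init_var, min_var
        self.mean = None

    def score(self, x):
        """Negative log-likelihood of x under the current model, then update."""
        if self.mean is None:            # the first value only initializes the mean
            self.mean = x
            return 0.0
        var = max(self.var, self.min_var)
        nll = 0.5 * math.log(2 * math.pi * var) + (x - self.mean) ** 2 / (2 * var)
        self.mean = (1 - self.r) * self.mean + self.r * x
        self.var = (1 - self.r) * self.var + self.r * (x - self.mean) ** 2
        return nll


def moving_average(values, window):
    """Trailing moving average (shorter windows at the start of the series)."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def change_point_scores(series, window=5, r=0.05):
    """Stage 1 outlier scores -> averaged -> stage 2 scores -> averaged."""
    stage1 = DiscountedGaussian(r)
    smoothed = moving_average([stage1.score(x) for x in series], window)
    stage2 = DiscountedGaussian(r)
    return moving_average([stage2.score(s) for s in smoothed], window)


# Hypothetical observation likelihood of a segment that drifts at frame 60.
likelihoods = [0.9] * 60 + [0.2] * 20
scores = change_point_scores(likelihoods)
```

Averaging the first-stage scores before the second stage suppresses isolated outliers, so only a sustained shift in the likelihood produces a high change point score.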

For example, when a high change point detection response is found in a track segment, the confidence (Conf) of the segments on the tracking trajectory can be estimated by tracking the target segment backward during the FB error check, as follows. The resulting confidence (Conf) can be used to decide which segments are kept as confident segments and which are excluded as drifted, unstable segments.

Given an image sequence, a forward trajectory τ_t can be predicted, representing the connection of the segments on the tracking trajectory from image frame 1 to frame t. τ_t^r is the corresponding backward trajectory, represented by connecting the same segments in reverse order. Backward tracking, which follows an arbitrary tracking path, is expected to be similar to a correct forward track. The confidence of a track segment is defined as the Euclidean distance between the forward and backward track segments. The distance function can be expressed as the Euclidean distance over the valid tracking trajectories (between the start point and the end point) of the two segments. For example, even if the FB error exceeds the threshold, a segment may become a confident segment if this Euclidean distance (Conf) is small and the FB error is below a predetermined value.
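The forward-backward comparison can be sketched with point trajectories. The assumptions are that a segment is a list of (x, y) positions per frame and that the backward track is stored in the order it was tracked (frame t back to frame 1), so it is reversed before frame-by-frame alignment. `fb_error` follows Kalal et al. in measuring the discrepancy at the start point. `segment_confidence_distance` averages the distance over the whole valid trajectory as one possible reading of the distance function; both names are hypothetical.

```python
import math


def fb_distances(forward, backward):
    """Per-frame Euclidean distances between a forward track and its backward track.

    forward:  positions tracked from frame 1 to frame t.
    backward: positions tracked from frame t back to frame 1, in that order.
    """
    if len(forward) != len(backward) or not forward:
        raise ValueError("tracks must be non-empty and of equal length")
    return [math.dist(p, q) for p, q in zip(forward, reversed(backward))]


def fb_error(forward, backward):
    """Distance between the forward start point and the end of the backward track."""
    return fb_distances(forward, backward)[0]


def segment_confidence_distance(forward, backward):
    """Mean distance over the whole trajectory; smaller means more confident."""
    d = fb_distances(forward, backward)
    return sum(d) / len(d)


forward = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
backward = [(2.0, 0.0), (1.0, 0.0), (0.0, 3.0)]  # ends 3 px away from the start
# fb_error(forward, backward) == 3.0; segment_confidence_distance(...) == 1.0
```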

The proposed MCMOT algorithm of the present invention can be summarized as follows.

The MCMOT experiments of the present invention are described in more detail below. Performance is demonstrated by comparison with state-of-the-art methods on challenging image sequences.

<Experimental settings>

To construct the global object detector and the local object detector, the present invention uses two publicly available networks, both pre-trained on the ImageNet object classification dataset. The first is the 16-layer VGG-Net [Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: arXiv preprint arXiv:1409.1556. (2014)]. The second is ResNet [He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. arXiv preprint arXiv:1512.03385. (2015)].

The initial model is fine-tuned on the ImageNet Challenge object detection dataset (ImageNet DET) for 280,000 iterations with a learning rate of 0.001. This is followed by 70,000 further iterations with the learning rate lowered to 1/10 of that value. To generate region proposals, an RPN is applied, which produces fast and accurate region proposals by sharing convolutional features [Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: NIPS. (2015)]. After the initial model is built, it is fine-tuned on each dataset in a similar manner, yielding a model adapted to each domain.

For change point detection, the two-stage time series learning algorithm is used for its low computational cost and high detection accuracy [Takeuchi, J. I., Yamanishi, K.: A unifying framework for detecting outliers and change points from time series. IEEE Transactions on Knowledge and Data Engineering 18 (2006) 482-492]. A point is regarded as a change point when its change point score is higher than the change point score threshold, which is empirically set to 0.3.
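The training schedule and the change point rule above can be written as two small helpers. The numbers (learning rate 0.001, 280,000 iterations, then 70,000 iterations at 1/10 of the rate, threshold 0.3) come from the text. The function names and the step-schedule form are illustrative assumptions about how such a schedule is usually expressed.

```python
def learning_rate(iteration, base_lr=0.001, step=280_000, total=350_000, gamma=0.1):
    """Step schedule: base_lr for the first 280k iterations, then base_lr / 10 for 70k."""
    if not 0 <= iteration < total:
        raise ValueError("iteration outside the 350,000-iteration schedule")
    return base_lr if iteration < step else base_lr * gamma


CPD_THRESHOLD = 0.3  # empirically chosen change point score threshold


def is_change_point(score, threshold=CPD_THRESHOLD):
    """A frame counts as a change point only if its score is above the threshold."""
    return score > threshold
```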

<Experimental datasets>

Existing multi-object tracking benchmark datasets cover only two or three classes. In the present invention, the performance of MCMOT is evaluated on two benchmarks. The first is ImageNet VID, which has 30 object classes [Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... Berg, A. C.: ImageNet large scale visual recognition challenge. IJCV 115 (2015) 211-252]. The second is the recent multi-object tracking benchmark MOT 2016 [Magee, D. R.: Tracking multiple vehicles using foreground, background and motion models. Image and Vision Computing 22 (2004) 143-155]. MCMOT is compared with state-of-the-art methods on both ImageNet VID and MOT Benchmark 2016.

The algorithm proposed in the present invention is validated on the ImageNet object detection from video (VID) dataset. ImageNet VID is a dataset for evaluating object detection performance on video data. It can nevertheless be used to evaluate multi-class multi-object tracking, because its video sequences were recorded in the real world with moving cameras. In addition, the number of objects in its 30 categories varies across scenes and over time. The objects in a scene are captured from different viewpoints and are subject to varying degrees of occlusion. For easy comparison with other state-of-the-art methods, MCMOT on this dataset is measured primarily with the mean average precision (mAP) used in the ImageNet Challenge [Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., ... Berg, A. C.: ImageNet large scale visual recognition challenge. IJCV 115 (2015) 211-252].
The ImageNet VID dataset used in the present invention is the initially released version, which consists of three splits: training, validation, and test.

The multi-object tracking framework proposed in the present invention is also evaluated on the MOT Benchmark [Milan, A., Leal-Taixe, L., Reid, I., Roth, S., Schindler, K.: MOT16: A Benchmark for Multi-Object Tracking. arXiv preprint arXiv:1603.00831. (2016)]. The MOT Benchmark is a recent multi-object tracking benchmark. It consists of several new, challenging sequences together with video sequences widely used in the multi-object tracking community. MOT Benchmark 2016 contains data captured with static and moving cameras in a total of 14 constrained environments. All sequences contain only pedestrians, and they cover a variety of settings, such as different viewpoints and different weather conditions. Therefore, tracking algorithms tailored to a specific scenario or scene cannot perform well. In the present invention, performance is evaluated with the CLEAR MOT tracking metrics included in the MOT Benchmark Development Kit [Bernardin, K., Stiefelhagen, R.: Evaluating multiple object tracking performance: the CLEAR MOT metrics. EURASIP Journal on Image and Video Processing 2008 (2008)].

FIGS. 3A and 3B show examples of change points detected by the MCMOT of the present invention in segments of the MOT16-02 and MOT16-09 sequences. The lower part of each figure plots the observation likelihood and the detected change points for the corresponding image frames. A high change point response indicates high uncertainty and a high probability of drift. The figures show that the proposed method can detect drift effectively in challenging environments.

To validate the change point detection component of MCMOT, two sequences, MOT16-02 (FIG. 3A) and MOT16-09 (FIG. 3B), were selected from the MOT 2016 training data. A point was accepted as a change point when its change point score was higher than the change point threshold of 0.3. As FIGS. 3A and 3B show, a low likelihood or an abrupt change in likelihood is an important cue for detecting possible change points.

In the tracking results for the MOT16-02 sequence (FIG. 3A), a motion-blurred person, whose upper body alone is visible, moves through the scene, and the likelihood remains unstable up to image frame #438. In this case the region proposals in the area where drift occurs are unstable, so strong fluctuations of the likelihood are observed. The change point detection algorithm of the present invention detects these fluctuations and yields a high change point score at image frame #440. A similar phenomenon is observed in the MOT16-09 sequence (FIG. 3B). These observations indicate that the change point detection method implicitly handles the possibility of tracking drift.

Table 1 below shows the effect of each component on the ImageNet VID validation set.

[Table 1]

The official ImageNet Challenge test server is reserved primarily for the annual competition and limits the number of evaluations. For this practical reason, the ImageNet VID evaluation in the present invention is carried out on the validation set instead of the test set [Kang, K., Ouyang, W., Li, H., Wang, X.: Object detection from video tubelets with convolutional neural networks. In: arXiv preprint arXiv:1604.04053. (2016)]. In the ImageNet VID training and validation experiments, all training and test images are rescaled so that the shorter side of the image is 600 pixels long. This scale was chosen because of GPU (Graphics Processing Unit) memory limits when fine-tuning VGG16 or ResNet.
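The shorter-side rescaling rule can be expressed in one helper. The targets come from the text: 600 pixels for ImageNet VID here, and 800 pixels for the MOT 2016 experiments described later. The rounding to whole pixels and the function name are assumptions.

```python
def rescale_shorter_side(width, height, target):
    """Return (new_width, new_height, scale) so that the shorter side equals target."""
    if width <= 0 or height <= 0:
        raise ValueError("image size must be positive")
    scale = target / min(width, height)
    return round(width * scale), round(height * scale), scale


vid_size = rescale_shorter_side(1280, 720, 600)[:2]   # (1067, 600)
mot_size = rescale_shorter_side(1920, 1080, 800)[:2]  # (1422, 800)
```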

Table 1 shows the contribution of each component of MCMOT. The variants are denoted MCMOT with the change point algorithm (MCMOT CPD), and MCMOT with the change point algorithm plus forward-backward verification (MCMOT CPD FB). Segments whose average observation score was lower than 0.3 were filtered out before evaluation. As Table 1 shows, applying MCMOT CPD raises the mAP to 71.1%, a substantial improvement of 9.8% over the baseline detection performance. Adding forward-backward verification yields an overall mAP of 74.5% on the ImageNet VID validation set.
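The pre-evaluation filter can be sketched as follows. The assumption is that each segment carries its list of per-frame observation scores. The 0.3 cut-off is from the text, while the data layout and name are hypothetical. Empty segments are dropped as well.

```python
from statistics import fmean


def filter_segments(segments, min_mean_score=0.3):
    """Keep segments whose mean per-frame observation score is at least the cut-off.

    segments: dict mapping a segment id to its list of observation scores.
    """
    return {sid: scores for sid, scores in segments.items()
            if scores and fmean(scores) >= min_mean_score}


kept = filter_segments({1: [0.2, 0.3], 2: [0.5, 0.6], 3: []})
# kept == {2: [0.5, 0.6]}
```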

Table 2 below compares tracking performance with state-of-the-art algorithms on the ImageNet VID validation set.

[Table 2]

Table 2 summarizes the accuracy of MCMOT evaluated on 281 validation video sequences and compares it with state-of-the-art algorithms. The proposed MCMOT achieves an overall mAP of 74.5%, which is higher than that of the state-of-the-art T-CNN [Kang, K., Li, H., Yan, J., Zeng, X., Yang, B., Xiao, T., Ouyang, W.: T-CNN: Tubelets with convolutional neural networks for object detection from videos. In: arXiv preprint arXiv:1604.02532. (2016)]. This result is attributed mainly to the MCMOT approach, which builds highly accurate segments through change point detection.

FIG. 4 shows examples of MCMOT tracking results on validation sequences of the ImageNet VID dataset. Each object bounding box is assigned an identifier number. The predicted class of each object and the confidence score of its segment are maintained and can be displayed on the image together with the identifier. These figures are best viewed zoomed in on a digital device.

As FIG. 4 shows, MCMOT tracks an unlimited variety of classes with high accuracy.

To evaluate MCMOT, the proposed approach is compared with other state-of-the-art algorithms on the MOT Challenge 2016 benchmark. For the MOT 2016 experiments, all training and test images are rescaled so that the shorter side of the image is 800 pixels long. This value is larger than the one used above because pedestrian bounding boxes are, on average, smaller than the objects in ImageNet VID. Tracking run time was measured excluding object detection time.

Table 3 below summarizes the tracking performance of MCMOT on the test video sequences of the MOT Benchmark 2016 and compares it with other state-of-the-art methods (results as of July 14, 2016). The ↑ mark indicates that a higher score is better, and the ↓ mark indicates that a lower score is better.
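MOTA is the headline accuracy metric in Table 3. The sketch below assumes the standard CLEAR MOT definition used by the MOT16 benchmark: per-frame misses (FN), false positives (FP) and identity switches (IDSW) are summed and normalized by the total number of ground-truth objects. The function name is illustrative. Because identity switches enter the error count, a higher IDS also lowers MOTA.

```python
from typing import Sequence


def mota(false_negatives: Sequence[int], false_positives: Sequence[int],
         id_switches: Sequence[int], ground_truth: Sequence[int]) -> float:
    """CLEAR MOT accuracy: 1 - sum_t(FN_t + FP_t + IDSW_t) / sum_t(GT_t).

    Each argument holds one count per frame. The result is at most 1
    and can be negative when errors outnumber ground-truth objects.
    """
    total_gt = sum(ground_truth)
    if total_gt == 0:
        raise ValueError("MOTA is undefined without ground-truth objects")
    errors = sum(false_negatives) + sum(false_positives) + sum(id_switches)
    return 1.0 - errors / total_gt


print(mota([1, 2], [0, 1], [0, 1], [10, 10]))  # 0.75
```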

[Table 3]

FIG. 5 visualizes example MCMOT tracking results on the MOT Benchmark 2016 test sequences. One frame was sampled every 100 frames. The color of each bounding box represents the identifier of the target, so the figure is best viewed in color.

As Table 3 shows, MCMOT outperforms previously published state-of-the-art methods on MOTA (Multi-Object Tracking Accuracy), the comprehensive performance measure. It also achieves a substantially smaller number of mostly lost targets (ML). Besides performing well on most evaluation criteria, the proposed method is also faster than the other trackers in tracker speed, measured in frames per second (Hz). The speed gain comes from three design choices that keep multi-object tracking efficient:
- a concise Markov chain Monte Carlo (MCMC)-based tracking structure;
- entity state transitions;
- selective forward-backward error checking, triggered only by change point detection.

However, MCMOT shows more identity switches (IDS) and higher fragmentation (FRAG) than state-of-the-art algorithms, because identifier mapping between tracking segments is insufficient. More importantly, MCMOT achieved state-of-the-art performance on two different datasets. This indicates that the method is broadly applicable to general-purpose multi-class multi-object tracking across an unlimited number of classes and a wide range of situations.
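The paragraph above credits selective forward-backward checking for part of the speed gain, and claim 3 says the forward-backward error uses the Euclidean distance between a forward-tracked trajectory and a backward-tracked trajectory. The sketch below follows that description under several assumptions. The source does not say how per-frame distances are aggregated, so the sketch averages them (the classic formulation by Kalal et al. uses only the start-frame distance). The 5-pixel threshold and all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def forward_backward_error(forward: Sequence[Point], backward: Sequence[Point]) -> float:
    """Mean Euclidean distance between a forward and a backward trajectory.

    forward[i]  -- position at frame t + i, tracked forwards from frame t
    backward[j] -- position at frame t + k - j, tracked backwards from frame t + k
    (Averaging over frames is an assumption; the source does not specify it.)
    """
    if not forward or len(forward) != len(backward):
        raise ValueError("trajectories must be non-empty and of equal length")
    aligned = list(reversed(backward))  # re-index the backward pass by forward time
    distances = [math.dist(f, b) for f, b in zip(forward, aligned)]
    return sum(distances) / len(distances)


def split_by_fb_error(segments, max_error: float = 5.0):
    """Partition (forward, backward) trajectory pairs into confident segments
    (kept) and unstable segments (excluded); the threshold is hypothetical."""
    confident, unstable = [], []
    for fwd, bwd in segments:
        if forward_backward_error(fwd, bwd) <= max_error:
            confident.append((fwd, bwd))
        else:
            unstable.append((fwd, bwd))
    return confident, unstable
```

A segment whose backward pass retraces the forward pass exactly has zero error. A drifted segment returns to a different place and is excluded once its error exceeds `max_error`.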

As described above, the multi-class multi-object tracking device (MCMOT) according to an embodiment of the present invention showed better overall performance than state-of-the-art algorithms on ImageNet VID and the MOT Benchmark 2016. MCMOT forms associations among objects of an unlimited number of classes based on object detection results. Its change point detection model detects sudden changes or anomalies caused by tracking drift. The likelihood is calculated from an ensemble of a Lucas-Kanade-based motion detector and a convolutional neural network (CNN)-based object detector.
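This excerpt names the two ensemble members but not the rule that fuses them. As a purely hypothetical illustration, the sketch below combines the CNN detector's class score with the overlap (IoU) between the box propagated by Lucas-Kanade motion and the detected box, using a weighted sum. The weight, the IoU-based agreement term and the function names are assumptions, not the patent's formula.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ensemble_likelihood(detector_score: float, motion_box: Box, detection_box: Box,
                        weight: float = 0.5) -> float:
    """Hypothetical fusion: a weighted sum of the CNN detector's class score and
    the agreement (IoU) between the Lucas-Kanade-propagated box and the detection."""
    return weight * detector_score + (1.0 - weight) * iou(motion_box, detection_box)
```

A confident detection that agrees with the motion prediction scores high. A detection far from where the motion model expects the object is pulled down even if the detector is confident.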

Thus, the multi-class multi-object tracking device (MCMOT) of the present invention can track multiple objects of an unlimited number of classes, whose appearance varies widely, by formulating the likelihood of ideally moving proposal regions. Existing techniques estimate the likelihood only for limited object types such as pedestrians or cars. In contrast, MCMOT can apply an efficient CNN-based multi-class object detector to compute the likelihood for multiple object classes. In addition, a change point detection algorithm based on static and dynamic observations evaluates tracking failures. Drift in multi-object tracking can be identified by detecting sudden change points in the otherwise stationary time series that a tracking segment represents.
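This excerpt does not give the exact form of the static and dynamic observations behind the change point model. As a stand-in, the sketch below applies a simple windowed mean-shift test to a per-frame track score, such as the likelihood along a segment. It flags frames where the data change by a threshold or more, which is the criterion in claim 1, and then cuts the track into segments at those frames. This is a named substitute technique, not the patent's algorithm, and the window size, threshold and function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def detect_change_points(series: Sequence[float], window: int = 3,
                         threshold: float = 0.3) -> List[int]:
    """Return frame indices where the mean of the next `window` values differs
    from the mean of the previous `window` values by `threshold` or more.
    Runs of consecutive flagged frames are reduced to their strongest frame."""
    points: List[int] = []
    run: List[Tuple[float, int]] = []  # (score, frame) for the current flagged run
    for t in range(window, len(series) - window + 1):
        before = sum(series[t - window:t]) / window
        after = sum(series[t:t + window]) / window
        score = abs(after - before)
        if score >= threshold:
            run.append((score, t))
        elif run:
            points.append(max(run)[1])
            run = []
    if run:
        points.append(max(run)[1])
    return points


def split_at_change_points(length: int, points: Sequence[int]) -> List[Tuple[int, int]]:
    """Cut frames [0, length) into half-open (start, end) segments at each change point."""
    bounds = [0, *points, length]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


scores = [0.9] * 4 + [0.2] * 4          # track score collapses at frame 4 (drift)
cps = detect_change_points(scores)       # [4]
print(split_at_change_points(len(scores), cps))  # [(0, 4), (4, 8)]
```

Each resulting segment could then be checked, for example with a forward-backward test, and kept or discarded as claim 2 describes.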

As described above, the present invention has been described with reference to specific matters such as particular components, limited embodiments and drawings. These are provided only to aid a general understanding of the invention, and the invention is not limited to the above embodiments. Those of ordinary skill in the art to which the invention pertains will appreciate that various modifications and variations are possible without departing from its essential characteristics. Therefore, the spirit of the invention should not be limited to the described embodiments. Not only the claims below but also all technical ideas equal or equivalent to those claims should be construed as falling within the scope of the invention.

Claims

A multi-class multi-object tracking method for an input video sequence, the method comprising:
calculating an observation likelihood of the presence and absence of an object included in the input image sequence;
determining, based on the calculated likelihood, segments for positions where an object is likely to be present, and setting a plurality of trajectories connecting the segments;
detecting a change point, being a time point at which the data of the segments on a trajectory change by a threshold value or more; and
performing a forward-backward check on the segments on the trajectory containing the change point, and combining the segments on the trajectory based on the result of the check to determine final tracking segments.

The method according to claim 1,
wherein the determining step comprises: classifying the segments into confidence segments and drifted unstable segments using a score for detecting the change point; and retaining the confidence segments and excluding the unstable segments.

3. The method of claim 2,
wherein, to retain the confidence segments and exclude the unstable segments, a confidence level is estimated as a forward-backward error for the segments on the corresponding trajectory, the forward-backward error being calculated using the Euclidean distance between a forward trajectory based on forward tracking and a backward trajectory based on backward tracking.

Computer-readable code for multi-class multi-object tracking of an input video sequence, the code providing:
a function of calculating an observation likelihood of the presence and absence of an object included in the input image sequence;
a function of determining, based on the calculated likelihood, segments for positions where an object is likely to be present, and setting a plurality of trajectories connecting the segments;
a function of detecting a change point, being a time point at which the data of the segments on a trajectory change by a threshold value or more; and
a function of performing a forward-backward check on the segments on the trajectory containing the change point, and combining the segments on the trajectory based on the result of the check to determine final tracking segments.