KR102388335B1

KR102388335B1 - Multiple object tracking using siamese random forest

Info

Publication number: KR102388335B1
Application number: KR1020200094016A
Authority: KR
Inventors: 고병철; 이지미
Original assignee: 계명대학교 산학협력단
Priority date: 2020-07-28
Filing date: 2020-07-28
Publication date: 2022-04-19
Also published as: KR20220014209A

Abstract

The present invention relates to a method for tracking multiple objects using a Siamese random forest. More specifically, the method comprises: (a) training, using training data, a Siamese random forest (hereinafter, SiameseRF) that combines a random forest (RF) classifier with a Siamese structure; and (b) tracking objects using the trained SiameseRF. In step (a), the SiameseRF, which consists of two RFs that share the rules constituting their trees, is trained using training data that includes images each labeled as anchor, positive, or negative. A first pair {anchor, positive} and a second pair {anchor, negative} from the training data are input to the two RFs, respectively, and the SiameseRF is trained in the direction that increases the similarity of the first pair and increases the difference of the second pair.
The present invention also relates to an apparatus for tracking multiple objects using a Siamese random forest. More specifically, the apparatus comprises: a learning module that trains, using training data, a SiameseRF combining an RF classifier with a Siamese structure; and an object tracking module that tracks objects using the trained SiameseRF. The learning module trains the SiameseRF, which consists of two RFs that share the rules constituting their trees, using training data that includes images each labeled as anchor, positive, or negative. A first pair {anchor, positive} and a second pair {anchor, negative} from the training data are input to the two RFs, respectively, and the SiameseRF is trained in the direction that increases the similarity of the first pair and increases the difference of the second pair.
According to the method and apparatus proposed in the present invention, a Siamese random forest structure is obtained by combining a random forest classifier with a Siamese structure, and the Siamese random forest is trained on the first pair {anchor, positive} and the second pair {anchor, negative} so that the similarity of the first pair and the difference of the second pair both increase. As a result, training and classification can be performed at high speed, and robust tracking performance can be achieved despite camera motion and complex pedestrian appearance.

Description

Multiple object tracking method and apparatus using Siamese random forest {MULTIPLE OBJECT TRACKING USING SIAMESE RANDOM FOREST}

The present invention relates to a method and apparatus for tracking multiple objects, and more particularly, to a method and apparatus for tracking multiple objects using a Siamese random forest.

Multiple object tracking (MOT) is essential to a variety of applications such as video surveillance, autonomous driving, human-computer interfaces (HCI), and augmented reality (AR). Recently, deep neural networks (DNNs) have been replacing conventional tracking techniques in many online and offline MOT systems.

However, because offline tracking must take every frame into account to monitor trajectories, it is not suitable for real-time object monitoring or similar applications. For online MOT, methods based on the Kalman filter or the particle filter were mainly used, but DNN-based MOT systems have recently achieved notable results.

In particular, long-term appearance models that use features extracted from DNNs, Deep Association Matching, and quadruplet convolutional neural networks show excellent tracking performance. However, because the network structure is complex and the tracking paths of objects over multiple frames must be analyzed, these models are difficult to apply to online tracking.

In general, both offline and online MOT use the tracking-by-detection (TBD) paradigm, so tracking performance depends to some extent on detection performance. Regardless of how good the detection method is, tracking performance can degrade significantly when objects are missed or detected inaccurately because of object or camera shake. Various data association methods have therefore been proposed to compensate for detection inaccuracy in MOT.

In particular, real-time tracking in MOT is closely tied to the efficiency of data association. Object tracking based on Siamese convolutional neural networks (Siamese CNNs) has received considerable attention for real-time tracking. A Siamese CNN applies the same network to the detected object and the tracked object and computes their similarity from the difference between the output feature values. A Siamese CNN therefore does not need to maintain separate network structures and has the advantage of fast tracking.

Although the Siamese structure provides excellent matching performance between objects, the shared network used for similarity matching has a complex structure, so it still involves many hyperparameters and tracks slowly. Siamese CNN-based MOT methods are therefore of limited use for real-time tracking in real environments or on low-cost systems.

Meanwhile, as prior art related to the present invention, Korean Patent Laid-Open Publication No. 10-2020-0040665 (title of invention: System and method for detecting POI change using a convolutional neural network; publication date: April 20, 2020) and others have been disclosed.

The present invention has been proposed to solve the above problems of previously proposed methods. Its object is to provide a method and apparatus for tracking multiple objects using a Siamese random forest, in which a Siamese random forest structure is obtained by combining a random forest classifier with a Siamese structure, and the Siamese random forest is trained on a first pair {anchor, positive} and a second pair {anchor, negative} from the training data so that the similarity of the first pair and the difference of the second pair both increase. As a result, training and classification can be performed at high speed, and robust tracking performance can be achieved despite camera motion and complex pedestrian appearance.

To achieve the above object, a method for tracking multiple objects using a Siamese random forest according to a feature of the present invention is

a method for tracking multiple objects, comprising:

(a) training, using training data, a Siamese random forest (hereinafter, SiameseRF) that combines a random forest (RF) classifier with a Siamese structure; and

(b) tracking objects using the trained SiameseRF,

wherein, in step (a),

the SiameseRF, which consists of two RFs that share the rules constituting their trees, is trained using training data that includes images each labeled as anchor, positive, or negative, a first pair {anchor, positive} and a second pair {anchor, negative} from the training data being input to the two RFs, respectively, so that the SiameseRF is trained in the direction that increases the similarity of the first pair and increases the difference of the second pair.

Preferably, the two RFs

may take as their respective input vectors the distance vector of the first pair (hereinafter, AP distance vector) and the distance vector of the second pair (hereinafter, AN distance vector).

More preferably, the AP distance vector and the AN distance vector

may each be a difference between condensed appearance features (CAFs) generated from the feature maps of objects detected by a neural-network-based object detector.

Preferably, step (a) may include:

(1) extracting feature vectors from the training data; and

(2) training the SiameseRF using the extracted feature vectors.

More preferably, step (1) may include:

(1-1) extracting partial feature maps from the first layer and the second layer of a neural network for object detection;

(1-2) applying global average pooling (GAP) to each of the extracted partial feature maps to generate two condensed features; and

(1-3) concatenating the two condensed features to generate a final condensed appearance feature (CAF).

More preferably, in step (2),

the SiameseRF may be trained using K-fold cross-validation.

Even more preferably, step (2) includes:

(2-1) taking K-1 folds selected from the training data as the training set and the remaining fold as the validation set;

(2-2) estimating, using the CAFs, the distance vector of the first pair (AP distance vector) and the distance vector of the second pair (AN distance vector) for the samples in the training set;

(2-3) inputting the sample pairs of the training set to the two RFs that share rules;

(2-4) training the two RFs using the AP distance vectors and AN distance vectors estimated for the input sample pairs;

(2-5) after training on the K-1 folds, validating by applying the AP distance vectors and AN distance vectors estimated for all sample pairs in the validation set to the RFs trained in step (2-4); and

(2-6) storing the trained RF structure and the total loss,

wherein steps (2-1) to (2-6) may be repeated until each of the K folds has been used as the validation set.

Still more preferably, after step (2-6), the method may further include:

(2-7) once training on all K folds is complete, selecting the RF with the smallest total loss as the final SiameseRF.

Still more preferably, in step (2),

the rules that make up the shared RF may be updated depending on whether the K-fold cross-validation has converged.
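Steps (2-1) to (2-7) can be sketched as a generic K-fold selection loop. This is a minimal illustration, not the patented implementation: `k_fold_select`, `train_fn`, and `loss_fn` are hypothetical names, the interleaved fold split is an assumption, and the actual training of the two rule-sharing RFs is left to whatever `train_fn` the caller supplies.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

S = TypeVar("S")  # one training sample, e.g. an (anchor, positive, negative) triple
M = TypeVar("M")  # a trained model, e.g. the shared RF structure


def k_fold_select(
    samples: Sequence[S],
    k: int,
    train_fn: Callable[[List[S]], M],        # hypothetical: trains on K-1 folds
    loss_fn: Callable[[M, List[S]], float],  # hypothetical: total loss on a fold
) -> Tuple[M, float]:
    """Train once per fold and return the model with the smallest total loss."""
    if not 2 <= k <= len(samples):
        raise ValueError("k must be between 2 and the number of samples")

    # Assumed split: sample i goes to fold i % k.
    folds = [list(samples[i::k]) for i in range(k)]

    results: List[Tuple[M, float]] = []
    for held_out in range(k):
        # (2-1) K-1 folds form the training set, the remaining fold validates.
        train_set = [s for i, fold in enumerate(folds) if i != held_out for s in fold]
        val_set = folds[held_out]
        # (2-2) to (2-4) are delegated to train_fn.
        model = train_fn(train_set)
        # (2-5) validate on the held-out fold.
        total_loss = loss_fn(model, val_set)
        # (2-6) store the trained model together with its total loss.
        results.append((model, total_loss))

    # (2-7) keep the model whose total loss is smallest.
    return min(results, key=lambda r: r[1])
```

Passing the trainer and the loss as callables keeps the loop independent of how the forest itself is built, which the text above does not specify at this level of detail.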

Preferably, in step (b),

when an object is detected, the object may be tracked by measuring the association between the detected object and a tracked object using the trained SiameseRF.

More preferably, in step (b),

the condensed appearance features of the detected object and the tracked object may be input to the trained SiameseRF, and the similarity probability of the two objects output by the SiameseRF may be used as the appearance score for measuring association.

Even more preferably, in step (b),

whether the detected object and the tracked object match may be determined using an association-matching cost function computed as a weighted sum of the inverse similarity probability of the SiameseRF, the aspect ratio of the two objects (A_ratio), and the L1 center distance between the two objects.
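The weighted-sum cost described above can be sketched as follows. The text names the three terms but not their exact form, so several details here are assumptions made only for illustration: the inverse similarity probability is taken as `1 - p`, the aspect-ratio term as the absolute difference of the two width/height ratios, and the weights `w_app`, `w_ratio`, `w_dist` are hypothetical parameters with arbitrary defaults.

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Axis-aligned bounding box given by its center and size."""
    cx: float
    cy: float
    w: float
    h: float


def aspect_ratio_term(a: Box, b: Box) -> float:
    # Assumption: A_ratio is modeled as the gap between the two w/h ratios.
    return abs(a.w / a.h - b.w / b.h)


def l1_center_distance(a: Box, b: Box) -> float:
    # L1 (Manhattan) distance between the two box centers.
    return abs(a.cx - b.cx) + abs(a.cy - b.cy)


def association_cost(
    similarity: float,      # similarity probability output by the SiameseRF
    detection: Box,
    track: Box,
    w_app: float = 1.0,     # hypothetical weights, not taken from the source
    w_ratio: float = 1.0,
    w_dist: float = 1.0,
) -> float:
    """Weighted sum of appearance, aspect-ratio, and center-distance terms."""
    inverse_similarity = 1.0 - similarity  # assumption: inverse means 1 - p
    return (
        w_app * inverse_similarity
        + w_ratio * aspect_ratio_term(detection, track)
        + w_dist * l1_center_distance(detection, track)
    )
```

A lower cost indicates a better match, so a detection would typically be assigned to the track that minimizes this value, subject to a threshold.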

Still more preferably, in step (b),

if the detected object matches the tracked object, the state of the tracked object is updated with the detected object; if no detected object matches the tracked object for a predetermined number of frames, the tracked object is deleted; and if the detected object does not match any tracked object, the detected object is assigned as a potential tracked object, and if the potential tracked object matches detected objects in a predetermined number of frames, it is assigned as a new tracked object.
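The track update, deletion, and promotion rules above can be sketched as a small per-frame bookkeeping function. This is an illustrative sketch under stated assumptions: `Track`, `update_tracks`, `max_misses`, and `min_hits` are hypothetical names, the matching itself is assumed to have been done already, and potential tracks are assumed to be dropped by the same miss rule as confirmed ones, which the text does not state.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Track:
    track_id: int
    state: Any               # most recent matched detection
    confirmed: bool = False  # False while the track is only a potential track
    hits: int = 1            # number of frames with a matching detection
    misses: int = 0          # consecutive frames without a match


def update_tracks(
    tracks: List[Track],
    matches: Dict[int, Any],           # track_id -> matched detection
    unmatched_detections: List[Any],
    next_id: int,
    max_misses: int = 3,               # hypothetical frame thresholds
    min_hits: int = 3,
) -> Tuple[List[Track], int]:
    """Apply one frame of matching results and return (tracks, next_id)."""
    survivors: List[Track] = []
    for track in tracks:
        if track.track_id in matches:
            # Matched: update the track state with the detection.
            track.state = matches[track.track_id]
            track.hits += 1
            track.misses = 0
            # A potential track becomes a real track after enough matches.
            if not track.confirmed and track.hits >= min_hits:
                track.confirmed = True
            survivors.append(track)
        else:
            # Unmatched: delete once the miss count reaches the limit.
            track.misses += 1
            if track.misses < max_misses:
                survivors.append(track)

    # Detections that matched nothing start new potential tracks.
    for detection in unmatched_detections:
        survivors.append(Track(track_id=next_id, state=detection))
        next_id += 1
    return survivors, next_id
```

Calling this once per frame with the output of the association step keeps short-lived false detections from becoming tracks while tolerating brief missed detections.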

To achieve the above object, an apparatus for tracking multiple objects using a Siamese random forest according to a feature of the present invention is

an apparatus for tracking multiple objects, comprising:

a learning module that trains, using training data, a Siamese random forest (hereinafter, SiameseRF) that combines a random forest (RF) classifier with a Siamese structure; and

an object tracking module that tracks objects using the trained SiameseRF,

wherein the learning module,

Preferably, the two RFs

According to the method and apparatus for tracking multiple objects using a Siamese random forest proposed in the present invention, a Siamese random forest structure is obtained by combining a random forest classifier with a Siamese structure, and the Siamese random forest is trained on a first pair {anchor, positive} and a second pair {anchor, negative} from the training data so that the similarity of the first pair and the difference of the second pair both increase. As a result, training and classification can be performed at high speed, and robust tracking performance can be achieved despite camera motion and complex pedestrian appearance.

FIG. 1 is a diagram showing the configuration of an apparatus for tracking multiple objects using a Siamese random forest according to an embodiment of the present invention.
FIG. 2 is a diagram showing the flow of a method for tracking multiple objects using a Siamese random forest according to an embodiment of the present invention.
FIG. 3 is a diagram showing the training process of the SiameseRF in the method and apparatus for tracking multiple objects using a Siamese random forest according to an embodiment of the present invention.
FIG. 4 is a diagram showing the detailed flow of step S100 in the method for tracking multiple objects using a Siamese random forest according to an embodiment of the present invention.
FIG. 5 is a diagram showing the detailed flow of step S110 in the method for tracking multiple objects using a Siamese random forest according to an embodiment of the present invention.
FIG. 6 is a diagram showing the detailed flow of step S120 in the method for tracking multiple objects using a Siamese random forest according to an embodiment of the present invention.
FIG. 7 is a diagram showing experimental results for verifying the performance of the method and apparatus for tracking multiple objects using a Siamese random forest according to an embodiment of the present invention.
FIGS. 8 to 10 are diagrams showing parts of demonstration videos to which the method and apparatus for tracking multiple objects using a Siamese random forest according to an embodiment of the present invention were applied.

Hereinafter, preferred embodiments are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily practice it. In describing the preferred embodiments, detailed descriptions of related known functions or configurations are omitted where they would unnecessarily obscure the gist of the present invention. The same reference numerals are used throughout the drawings for parts with similar functions and operations.

In addition, throughout the specification, when a part is said to be 'connected' to another part, this includes not only the case where it is 'directly connected' but also the case where it is 'indirectly connected' with another element in between. Furthermore, 'including' a component means that other components may be further included, not excluded, unless specifically stated otherwise.

FIG. 1 is a diagram showing the configuration of an apparatus for tracking multiple objects using a Siamese random forest according to an embodiment of the present invention. As shown in FIG. 1, the apparatus may include a learning module 100 that trains, using training data, a Siamese random forest (hereinafter, SiameseRF) combining a random forest (RF) classifier with a Siamese structure, and an object tracking module 200 that tracks objects using the trained SiameseRF.

That is, the present invention proposes a Siamese random forest framework that combines a highly accurate random forest with a Siamese structure capable of fast training and classification.

Meanwhile, the method for tracking multiple objects using a Siamese random forest according to a feature of the present invention may be implemented as software recorded on hardware that includes a memory and a processor. For example, the method may be stored and implemented on a personal computer, notebook computer, server computer, PDA, smartphone, tablet PC, autonomous vehicle, or the like. In the following, for convenience of description, the entity performing each step may be omitted.

FIG. 2 is a diagram showing the flow of a method for tracking multiple objects using a Siamese random forest according to an embodiment of the present invention. As shown in FIG. 2, the method may include a step (S100) of training, using training data, a SiameseRF that combines an RF classifier with a Siamese structure, and a step (S200) of tracking objects using the trained SiameseRF.

In step S100, the learning module 100 may train, using training data, a Siamese random forest (hereinafter, SiameseRF) that combines a random forest (RF) classifier with a Siamese structure.

FIG. 3 is a diagram showing the training process of the SiameseRF in the method and apparatus according to an embodiment of the present invention. As shown in FIG. 3, in step S100, the SiameseRF, which consists of two RFs that share the rules constituting their trees, may be trained using training data that includes images each labeled as anchor, positive, or negative. A first pair {anchor, positive} and a second pair {anchor, negative} from the training data are input to the two RFs, respectively, so that the SiameseRF is trained in the direction that increases the similarity of the first pair and increases the difference of the second pair.

More specifically, the two RFs may take as their respective input vectors the distance vector of the first pair (hereinafter, AP distance vector) and the distance vector of the second pair (hereinafter, AN distance vector). The AP distance vector and the AN distance vector may each be a difference between condensed appearance features (CAFs) generated from the feature maps of objects detected by a neural-network-based object detector.
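The AP and AN distance vectors can be illustrated with a short sketch. The text says only that each is a difference between two CAFs; taking the element-wise absolute difference is an assumption here, as are the function names and the 1/0 labels attached to the two pairs.

```python
from typing import List, Sequence, Tuple


def distance_vector(caf_a: Sequence[float], caf_b: Sequence[float]) -> List[float]:
    """Element-wise absolute difference of two CAFs (assumed form of the distance)."""
    if len(caf_a) != len(caf_b):
        raise ValueError("CAFs must have the same length")
    return [abs(a - b) for a, b in zip(caf_a, caf_b)]


def make_training_pairs(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
) -> Tuple[Tuple[List[float], int], Tuple[List[float], int]]:
    """Build the two inputs for the rule-sharing RFs from one labeled triple."""
    ap = distance_vector(anchor, positive)  # first pair {anchor, positive}
    an = distance_vector(anchor, negative)  # second pair {anchor, negative}
    # Assumed labels: 1 marks "same object", 0 marks "different object".
    return (ap, 1), (an, 0)
```

For example, `make_training_pairs([1.0, 2.0, 3.0], [1.5, 2.0, 2.0], [4.0, 0.0, 3.0])` yields an AP vector with small entries and an AN vector with larger ones, which is the separation the training is meant to reinforce.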

FIG. 4 is a diagram showing the detailed flow of step S100 in the method according to an embodiment of the present invention. As shown in FIG. 4, step S100 may include a step (S110) of extracting feature vectors from the training data and a step (S120) of training the SiameseRF using the extracted feature vectors.

In step S110, feature vectors may be extracted from the training data. When an object is detected, it must be input to the SiameseRF to measure its similarity to existing tracked objects. The most basic yet important step in measuring similarity is the feature extraction of step S110. An RF performs very well on tabular data, but its performance degrades on unstructured data such as images and video. Therefore, as a preprocessing step for the SiameseRF, a feature extraction step that can effectively distinguish objects must be applied.

FIG. 5 is a diagram showing the detailed flow of step S110 in the method according to an embodiment of the present invention. As shown in FIG. 5, step S110 may include a step (S111) of extracting partial feature maps from the first layer and the second layer of a neural network for object detection, a step (S112) of applying GAP to each extracted partial feature map to generate two condensed features, and a step (S113) of concatenating the two condensed features to generate a CAF.

In step S111, partial feature maps may be extracted from the first layer and the second layer of a neural network for object detection. In the present invention, to reduce the computation time for feature extraction, feature maps obtained from different layers of Darknet-53, the backbone network of YOLOv3, may be used. More specifically, the partial feature maps corresponding to the bounding box (bbox) may be extracted from the output feature maps of the first and second layers of Darknet.

In step S112, global average pooling (GAP) may be applied to each of the extracted partial feature maps to generate two condensed features. GAP is used to reduce the spatial dimensions of a 3D tensor and has the advantage of minimizing overfitting by reducing the total number of model parameters. More specifically, applying GAP to each partial feature map yields two 1×1×C feature vectors, which are referred to as condensed features.

In step S113, the two condensed features may be concatenated to generate the final condensed appearance feature (CAF). That is, the CAF is obtained by concatenating the condensed features generated in step S112. When the SiameseRF is trained in step S120, described in detail below, the CAF difference of the first pair {anchor, positive} and the CAF difference of the second pair {anchor, negative} may be used as input vectors.
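Steps S112 and S113 can be sketched in plain Python as follows. This is a minimal illustration that assumes each partial feature map has already been cropped to the bounding box and is stored channel-first as nested lists of shape C×H×W; the function names are hypothetical, and a real pipeline would operate on the detector's tensors instead.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # assumed layout: [channel][row][column]


def global_average_pool(feature_map: FeatureMap) -> List[float]:
    """Reduce a C x H x W feature map to C values, one spatial mean per channel."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


def condensed_appearance_feature(
    layer1_map: FeatureMap, layer2_map: FeatureMap
) -> List[float]:
    """S112: pool each partial feature map; S113: concatenate the two results."""
    condensed_1 = global_average_pool(layer1_map)
    condensed_2 = global_average_pool(layer2_map)
    return condensed_1 + condensed_2  # length C1 + C2


# One channel from the first layer, two channels from the second layer.
layer1 = [[[1.0, 3.0], [5.0, 7.0]]]
layer2 = [[[0.0, 2.0]], [[10.0, 10.0]]]
caf = condensed_appearance_feature(layer1, layer2)  # [4.0, 1.0, 10.0]
```

Because GAP removes the spatial dimensions, the CAF length depends only on the channel counts of the two layers, not on the size of the bounding box.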

In step S120, the SiameseRF may be trained using the extracted feature vectors. During training, an initial RF consisting of L ensemble trees is generated first. The two RFs receive the first {anchor, positive} pair and the second {anchor, negative} pair as their respective inputs, but they share the same structure. The shared RF is therefore trained so that the similarity of the first pair increases and the difference of the second pair increases. Unlike a Siamese CNN, the two RFs do not share weights, but they can share the rules that make up the trees.

More specifically, in step S120, the distance vector of the first pair (hereinafter, AP distance vector) and the distance vector of the second pair (hereinafter, AN distance vector) are used as the input vectors of the two RFs, respectively, and the SiameseRF may be trained using K-fold cross-validation. Here, the AP and AN distance vectors may be differences between the condensed appearance features (CAFs) generated from the feature maps of objects detected by a neural-network-based object detector.

That is, the CAF differences (AP_i, AN_i), which represent the appearance of each image, may be input to each RF as features. Such a vector is a distance vector only when it satisfies Equation 1 below.

[Equation 1]

Here, d is the L2 distance function, a_i ∈ anchor, p_i ∈ positive, n_i ∈ negative, and m is the number of samples in each of the anchor, positive, and negative labels.
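Equation 1 itself is not reproduced in this text, so the following sketch rests on an assumption: the distance vector of a pair is taken to be the element-wise absolute difference of the two CAF vectors, and d is the ordinary L2 distance. The helper names `l2_distance`, `caf_difference`, and `pair_inputs` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def l2_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """L2 (Euclidean) distance d between two CAF vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def caf_difference(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Element-wise absolute difference (assumed form of the distance vector)."""
    return [abs(a - b) for a, b in zip(u, v)]


def pair_inputs(
    anchors: Sequence[Sequence[float]],
    positives: Sequence[Sequence[float]],
    negatives: Sequence[Sequence[float]],
) -> Tuple[List[List[float]], List[List[float]]]:
    """Build AP_i and AN_i for i = 1..m from index-aligned a_i, p_i, n_i."""
    ap = [caf_difference(a, p) for a, p in zip(anchors, positives)]
    an = [caf_difference(a, n) for a, n in zip(anchors, negatives)]
    return ap, an
```

Under this assumption each RF receives a vector with the same length as the CAF, so no extra feature engineering is needed between feature extraction and forest training.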

Meanwhile, in step S120, K-fold cross-validation is adopted for the training process so that the training of the shared RF can be repeated, which increases the accuracy of the model. K-fold cross-validation makes it possible to determine the optimal number of rules and parameters while reducing the risk of overfitting.

FIG. 6 is a diagram illustrating the detailed flow of step S120 in the multiple object tracking method using a Siamese random forest according to an embodiment of the present invention. As shown in FIG. 6, step S120 may include: using K-1 folds selected from the training data as the training set and the remaining fold as the validation set (S121); estimating, using the CAFs, the AP and AN distance vectors of the samples in the training set (S122); inputting the sample pairs of the training set into the two rule-sharing RFs (S123); training the two RFs using the AP and AN distance vectors estimated for the input sample pairs (S124); after training on the K-1 folds, validating by applying the AP and AN distance vectors estimated for all sample pairs in the validation set to the trained RF (S125); storing the trained RF structure and the total loss (S126); and, once training has been completed for all K folds, selecting the RF with the smallest total loss as the final SiameseRF (S127).

In step S121, K-1 folds selected from the training data may be used as the training set, and the remaining fold as the validation set.

In step S122, the distance vector of the first pair (AP distance vector) and the distance vector of the second pair (AN distance vector) may be estimated for the samples in the training set using the CAFs. That is, AP_i and AN_i are computed for use as input vectors.

In step S123, the sample pairs of the training set may be input into the two rule-sharing RFs. In step S120, the rules of the shared RF may be updated depending on whether the K-fold cross-validation has converged.

In step S124, the two RFs may be trained using the AP and AN distance vectors estimated for the input sample pairs. That is, the RF consisting of L trees is trained on the AP and AN distance vectors.

In step S125, after training on the K-1 folds, validation may be performed by applying the AP and AN distance vectors estimated for all sample pairs in the validation set to the RF trained in step S124.

That is, after training on the training set, AP_i and AN_i are obtained from the first {anchor, positive} pairs and the second {anchor, negative} pairs of the validation set. The triplet loss L may then be computed according to Equation 2 below by applying the AP and AN distance vectors to the trained RF. This process may be performed for all n pairs in the validation set.

[Equation 2]

Here, α is the margin.
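Equation 2 is not reproduced in this text. As a point of reference only, the sketch below uses the widely used hinge form of the triplet loss with margin α, which is an assumption about what the equation looks like. The inputs `score_ap` and `score_an` stand for whatever scalar dissimilarity the trained RF yields for the AP and AN distance vectors; both names are hypothetical.

```python
def triplet_loss(score_ap: float, score_an: float, margin: float) -> float:
    """Standard hinge-style triplet loss (assumed form, not taken from the source).

    The loss is zero once the anchor-negative score exceeds the
    anchor-positive score by at least the margin.
    """
    return max(0.0, score_ap - score_an + margin)
```

In this form a validation triplet contributes nothing when the negative is already pushed away far enough, so only the triplets that the forest still confuses add to the loss.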

In step S126, the trained RF structure and the total loss J according to Equation 3 below may be stored. Steps S121 to S126 are repeated until every one of the K folds has been used as the validation set.

[Equation 3]

In step S127, once training has been completed for all K folds, the RF with the smallest total loss may be selected as the final SiameseRF.
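The loop over steps S121 to S127 can be sketched generically. This is an illustration under assumptions, not the patented procedure: `train_fn` and `loss_fn` are hypothetical callbacks standing in for RF training and for the per-sample validation loss, the total loss is assumed to be the sum of the per-sample losses, and folds are formed by simple striding.

```python
from typing import Any, Callable, Iterator, List, Sequence, Tuple


def k_fold_splits(samples: Sequence[Any], k: int) -> Iterator[Tuple[List[Any], List[Any]]]:
    """Yield (training_set, validation_set); each fold is held out exactly once."""
    folds = [list(samples[i::k]) for i in range(k)]
    for held_out in range(k):
        train = [s for i, fold in enumerate(folds) if i != held_out for s in fold]
        yield train, folds[held_out]                         # S121


def select_best_model(
    samples: Sequence[Any],
    k: int,
    train_fn: Callable[[List[Any]], Any],
    loss_fn: Callable[[Any, Any], float],
) -> Tuple[float, Any]:
    """Train one candidate per fold and return (total_loss, model) of the best."""
    results = []
    for train, validation in k_fold_splits(samples, k):
        model = train_fn(train)                              # S122 to S124
        total = sum(loss_fn(model, s) for s in validation)   # S125
        results.append((total, model))                       # S126
    return min(results, key=lambda r: r[0])                  # S127
```

For example, with a toy "model" that is just the mean of its training samples, `select_best_model([0.0, 1.0, 2.0, 3.0, 4.0, 10.0], 3, lambda t: sum(t) / len(t), lambda m, s: abs(m - s))` keeps the candidate trained without the second fold, because its validation loss is the smallest of the three.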

In step S200, the object tracking module 200 may track objects using the trained SiameseRF. That is, when an object is detected, the association between the detected object and a tracked object is measured using the trained SiameseRF in order to track the object. More specifically, the condensed appearance features (CAFs) of the detected object and the tracked object are input to the trained SiameseRF, and the similarity probability of the two objects, which is the output of the SiameseRF, is used as the appearance score for measuring the association.

In step S200, whether the detected object matches the tracked object may be determined using an association-matching cost function computed as the weighted sum of the inverse similarity probability of the SiameseRF, the aspect ratio of the two objects, and the L1-center distance between the two objects. That is, in every frame, detected objects are assigned to tracked objects on the basis of the Hungarian method and three measures: the inverse probability value of the SiameseRF, the aspect ratio (A_ratio), and the L1-centered distance (Dis). Finally, to compute the cost function c of association matching, the three distance measures are combined as a weighted sum, as shown in Equation 5 below.

[Equation 5]

Here, α, β, and γ are weights set to 0.6, 0.2, and 0.2, respectively; these values were fixed in advance on the basis of several experiments.
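The weighted sum and the per-frame assignment can be illustrated as follows. Several points are assumptions: the inverse probability is taken to be 1 minus the similarity probability, the aspect-ratio and center-distance terms are assumed to be precomputed and normalized, and the assignment is done by brute force over permutations as a small stand-in for the Hungarian method named in the text. The function names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

ALPHA, BETA, GAMMA = 0.6, 0.2, 0.2  # weights stated in the text


def association_cost(similarity: float, aspect_term: float, center_dist: float) -> float:
    """Weighted sum of the three measures for one (track, detection) pair.

    Assumption: the "inverse probability" is 1 - similarity, and the other
    two terms are already expressed as dissimilarities.
    """
    return ALPHA * (1.0 - similarity) + BETA * aspect_term + GAMMA * center_dist


def assign(cost: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Minimum-cost one-to-one assignment for a small square cost matrix.

    Brute force over permutations; it gives the same optimum as the
    Hungarian method but is only practical for a handful of objects.
    """
    n = len(cost)
    best = min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )
    return list(enumerate(best))  # (track index, detection index) pairs
```

Because the appearance term carries a weight of 0.6, a confident SiameseRF match dominates the cost, while the two geometric terms mainly break ties between visually similar candidates.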

In step S200, if a detected object matches a tracked object, the state of the tracked object is updated with the detected object; if no detected object matches a tracked object for a predetermined number of frames (τ frames), the tracked object is deleted; and if a detected object does not match any tracked object, it is assigned as a potential tracked object, which is then assigned as a new tracked object if it matches detected objects over a predetermined number of frames. Otherwise, it is regarded as a false detection and the potential tracked object may be removed.
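The update, delete, and promote rules above can be written as a small per-frame routine. This is a sketch under assumptions rather than the patented logic: a potential track is assumed to need `confirm_after` consecutive matched frames to be promoted and to be dropped on its first unmatched frame, and the names `Track` and `update_tracks` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Track:
    track_id: int
    state: tuple            # last matched detection, e.g. a bounding box
    confirmed: bool = False  # False while the track is only "potential"
    hits: int = 1            # consecutive frames with a matching detection
    misses: int = 0          # consecutive frames without a matching detection


def update_tracks(
    tracks: List[Track],
    matches: Dict[int, tuple],
    unmatched_detections: List[tuple],
    tau: int,
    confirm_after: int,
    next_id: int,
) -> Tuple[List[Track], int]:
    """Apply one frame of the rules; matches maps track_id -> matched detection."""
    kept = []
    for t in tracks:
        if t.track_id in matches:
            t.state = matches[t.track_id]    # update state with the detection
            t.hits += 1
            t.misses = 0
            if not t.confirmed and t.hits >= confirm_after:
                t.confirmed = True           # potential track becomes a new track
            kept.append(t)
        elif t.confirmed:
            t.misses += 1
            if t.misses < tau:               # deleted after tau unmatched frames
                kept.append(t)
        # an unmatched potential track is treated as a false detection and dropped
    for det in unmatched_detections:
        kept.append(Track(next_id, det))     # new potential track
        next_id += 1
    return kept, next_id
```

Calling `update_tracks` once per frame with the output of the association step keeps confirmed tracks alive through short occlusions of fewer than τ frames while discarding one-off false detections.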

Experimental Example 1

To evaluate the performance of the proposed multiple object tracking method and apparatus using a Siamese random forest, object tracking was performed on the MOTS Challenge Workshop 2020 benchmark video sequences, which were captured by stereo cameras and contain multiple objects in various environments. More specifically, on the provided high-precision detection set, the present invention achieved fairly accurate tracking performance as measured by sMOTSA: 1) 60% for MOTS20 pedestrians, 2) 71.4% for KITTI cars, and 3) 60.9% for KITTI pedestrians. In terms of speed, it ran at an average of 8.2 fps on the MOTS20 dataset and 12.4 fps on the KITTI dataset.

Experimental Example 2

To evaluate the performance of the proposed multiple object tracking method and apparatus using a Siamese random forest, tracking results were measured on the MOT16 data. For the experiment, the same image sequences as in the MOTS Challenge 2020 were used. YOLOv3 was used as the detector, and the provided MOT16 training data was used to train the SiameseRF.

FIG. 7 is a diagram illustrating experimental results for verifying the performance of the multiple object tracking method and apparatus using a Siamese random forest according to an embodiment of the present invention. As FIG. 7 shows, the comparative results on the MOT16 test dataset indicate that SiameseRF (“Ours”) is relatively faster than other MOT algorithms with similar results. In addition, compared with recent online MOT methods, the proposed SiameseRF shows excellent overall performance.

FIGS. 8 to 10 are diagrams illustrating parts of a demonstration video to which the multiple object tracking method and apparatus using a Siamese random forest according to an embodiment of the present invention were applied. They show object tracking results obtained by applying SiameseRF to video captured by a moving camera, with FIGS. 8, 9, and 10 in chronological order. As shown in FIGS. 8 to 10, SiameseRF accurately tracks objects in real time in video captured by a moving camera.

Because SiameseRF uses K-fold cross-validation rather than backpropagation, it trains quickly and can generate optimal tree rules, and because the rules of the trees constituting the RF are shared between the RFs, memory requirements during testing are reduced. It was therefore confirmed experimentally that the proposed multiple object tracking method and apparatus using a Siamese random forest can be used for online tracking in embedded systems with limited system resources.

As described above, the proposed multiple object tracking method and apparatus combine a random forest classifier with a Siamese structure to form a Siamese random forest, input images of a first {anchor, positive} pair and a second {anchor, negative} pair from the training data, and train the Siamese random forest so that the similarity of the first pair increases and the difference of the second pair increases. As a result, training and classification can be performed at high speed, and robust tracking performance is obtained despite camera motion and complex pedestrian shapes.

The present invention described above may be modified and applied in various ways by those of ordinary skill in the art to which it pertains, and the scope of its technical idea should be defined by the claims below.

100: learning module
200: object tracking module
S100: training a SiameseRF that combines an RF classifier and a Siamese structure using training data
S110: extracting feature vectors from the training data
S111: extracting partial feature maps from the first and second layers of the neural network for object detection
S112: generating two condensed features by applying GAP to each of the extracted partial feature maps
S113: generating the CAF by concatenating the two condensed features
S120: training the SiameseRF using the extracted feature vectors
S121: using K-1 folds selected from the training data as the training set and the remaining fold as the validation set
S122: estimating the AP and AN distance vectors of the samples in the training set using the CAFs
S123: inputting the sample pairs of the training set into the two rule-sharing RFs
S124: training the two RFs using the AP and AN distance vectors estimated for the input sample pairs
S125: after training on the K-1 folds, validating by applying the AP and AN distance vectors estimated for all sample pairs in the validation set to the trained RF
S126: storing the trained RF structure and the total loss
S127: once all K folds have been trained, selecting the RF with the smallest total loss as the final SiameseRF
S200: tracking objects using the trained SiameseRF

Claims

A method for tracking multiple objects, comprising:
(a) training a Siamese random forest (hereinafter, SiameseRF), which combines a random forest (RF) classifier with a Siamese structure, using training data; and
(b) tracking an object using the trained SiameseRF,
wherein, in step (a),
the SiameseRF, which consists of two RFs that share the rules constituting their trees, is trained using training data including images each labeled as anchor, positive, or negative, such that a first pair of {anchor, positive} and a second pair of {anchor, negative} from the training data are respectively input to the two RFs, and the SiameseRF is trained in a direction in which the similarity of the first pair increases and the difference of the second pair increases.

The method of claim 1, wherein the two RFs
use the distance vector of the first pair (hereinafter, AP distance vector) and the distance vector of the second pair (hereinafter, AN distance vector) as their respective input vectors.

The method of claim 2, wherein the AP distance vector and the AN distance vector
are differences between condensed appearance features (CAFs) generated from feature maps of objects detected by a neural-network-based object detection apparatus.

The method of claim 1, wherein step (a) comprises:
(1) extracting feature vectors from the training data; and
(2) training the SiameseRF using the extracted feature vectors.

The method of claim 4, wherein step (1) comprises:
(1-1) extracting partial feature maps from a first layer and a second layer of a neural network for object detection;
(1-2) generating two condensed features by applying global average pooling (GAP) to each of the extracted partial feature maps; and
(1-3) generating a final condensed appearance feature (CAF) by concatenating the two generated condensed features.

The method of claim 4, wherein, in step (2),
the SiameseRF is trained using K-fold cross-validation.

The method of claim 6, wherein step (2) comprises:
(2-1) using K-1 folds selected from the training data as a training set and the remaining fold as a validation set;
(2-2) estimating, using the CAFs, the distance vector of the first pair (hereinafter, AP distance vector) and the distance vector of the second pair (hereinafter, AN distance vector) for the samples included in the training set;
(2-3) inputting sample pairs of the training set into the two rule-sharing RFs;
(2-4) training the two RFs using the AP distance vector and the AN distance vector estimated for the input sample pairs;
(2-5) after training on the K-1 folds, validating by applying the AP distance vector and the AN distance vector estimated for all sample pairs in the validation set to the RF trained in step (2-4); and
(2-6) storing the trained RF structure and a total loss,
wherein steps (2-1) to (2-6) are repeated until all K folds have been used as the validation set.

The method of claim 7, further comprising, after step (2-6):
(2-7) once training has been completed for all K folds, determining the RF having the smallest total loss as the final SiameseRF.

The method of claim 7, wherein, in step (2),
the rules of the shared RF are updated depending on whether the K-fold cross-validation has converged.

The method of claim 1, wherein, in step (b),
when an object is detected, the object is tracked by measuring the association between the detected object and a tracked object using the trained SiameseRF.

The method of claim 10, wherein, in step (b),
the condensed appearance features of the detected object and the tracked object are input to the trained SiameseRF, and the similarity probability of the two objects, which is the output of the SiameseRF, is used as an appearance score for measuring the association.

The method of claim 11, wherein, in step (b),
whether the detected object matches the tracked object is determined using an association-matching cost function computed as a weighted sum of the inverse similarity probability value of the SiameseRF, the aspect ratio of the two objects (A_ratio), and the L1-center distance between the two objects.

The method of claim 12, wherein, in step (b),
if the detected object matches the tracked object, the state of the tracked object is updated with the detected object; if no detected object matches the tracked object for a predetermined number of frames, the tracked object is deleted; and if the detected object does not match the tracked object, the detected object is assigned as a potential tracked object, and the potential tracked object is assigned as a new tracked object.

A multiple object tracking apparatus, comprising:
a learning module 100 that trains a Siamese random forest (hereinafter, SiameseRF), which combines a random forest (RF) classifier with a Siamese structure, using training data; and
an object tracking module 200 that tracks an object using the trained SiameseRF,
wherein the learning module 100
trains the SiameseRF, which consists of two RFs that share the rules constituting their trees, using training data including images each labeled as anchor, positive, or negative, such that a first pair of {anchor, positive} and a second pair of {anchor, negative} from the training data are respectively input to the two RFs, and the SiameseRF is trained in a direction in which the similarity of the first pair increases and the difference of the second pair increases.

The apparatus of claim 14, wherein the two RFs
use the distance vector of the first pair (hereinafter, AP distance vector) and the distance vector of the second pair (hereinafter, AN distance vector) as their respective input vectors.

The apparatus of claim 15, wherein the AP distance vector and the AN distance vector
are differences between condensed appearance features (CAFs) generated from feature maps of objects detected by a neural-network-based object detection apparatus.