KR102339727B1 - Robust visual object tracking based on global and local search with confidence estimation

Info

Publication number: KR102339727B1
Application number: KR1020200009987A
Authority: KR
Inventors: 조근식; 방양
Original assignee: 인하대학교 산학협력단
Priority date: 2020-01-28
Filing date: 2020-01-28
Publication date: 2021-12-15
Also published as: KR20210096473A

Abstract

A global and local search method performed by a tracking system according to an embodiment may include: constructing a region proposal network (RPN) in a multi-scale target recognition detector; performing global and local searches by cooperatively combining a discriminative correlation filter (DCF)-based tracking model with the multi-scale target recognition detector; and estimating the location of a target object through the performed global and local searches.

Description

ROBUST VISUAL OBJECT TRACKING BASED ON GLOBAL AND LOCAL SEARCH WITH CONFIDENCE ESTIMATION

The following description relates to object tracking technology.

Visual object tracking is an open and challenging problem: an online tracker must be able to follow a target object over long periods, even in complex scenarios such as target movement and background occlusion. The discriminative correlation filter (DCF) has shown excellent performance in short-term target tracking thanks to its circular dense sampling mechanism and fast computation via the discrete Fourier transform. However, a DCF tracker tends to drift away from the target when the target undergoes sharp deformation, fast motion, or background occlusion. Because the tracker searches for the target only in a local region around its location in the previous frame, this can lead to incorrect model updates. There is no recovery mechanism for re-identifying and re-locating the target.

The dominant paradigm in modern object detection is based on two-stage detectors, which consistently achieve the highest accuracy on standard benchmarks such as PASCAL VOC 2007/2012 and on the more recent and challenging COCO dataset. The Fast R-CNN network takes a full image and a set of object proposals as input. The network first forward-propagates the input image to produce a CNN feature map, and then applies a region of interest (ROI) pooling layer to extract a fixed-length feature vector from the feature map for each object proposal. Each feature vector is then fed into several fully connected layers for final object classification and bounding box regression. For faster training and testing and higher accuracy, Faster R-CNN introduced the Region Proposal Network (RPN) to propose candidate object bounding boxes. For faster inference, it shares convolutional features across object proposal generation, object classification, and bounding box regression.

Accordingly, the potential weaknesses of existing state-of-the-art correlation filter (CF)-based tracking methods need to be examined. Because of the simple synthetic circular sampling mechanism, the tracker tends to drift in extreme cases such as sharp deformation, fast motion, or background occlusion of the target. Once tracker drift occurs, there is no recovery mechanism for re-identifying and re-locating the target, which results in tracking failure.

A global and local search technique is proposed that jointly applies a discriminative correlation filter (DCF)-based tracking model together with a novel target recognition detector.

A global and local search method performed by a tracking system may include: constructing a region proposal network (RPN) in a multi-scale target recognition detector; performing global and local searches by cooperatively combining a DCF-based tracking model with the multi-scale target recognition detector; and estimating the location of a target object through the performed global and local searches.

The estimating of the location of the target object may include: introducing an enhanced peak-to-sidelobe ratio (EPSR) for measuring the reliability of the tracking model from the tracking result of the current frame; determining, based on the EPSR, whether to perform single tracking or joint tracking; and updating the tracking model in a computationally efficient manner.

The estimating of the location of the target object may include: generating, by the multi-scale target recognition detector, similar object candidates subject to spatial and scale constraints; and re-ranking the object candidates for object re-identification, using the DCF-based tracking model to distinguish the foreground from background interference.

The estimating of the location of the target object may include: selecting object proposals whose detection score is at or above a preset value, so that object recall is high enough to ensure that the tracked object is detected by the multi-scale target recognition detector; applying a spatial constraint by computing the relative distance between the center point of each selected object proposal and the center point of the previously estimated tracked object; and applying a scale constraint by means of the intersection-over-union (IoU) function between each selected object proposal and the estimated tracking output.

The estimating of the location of the target object may include: generating target object candidates by applying the spatial constraint and the scale constraint, the target object candidates being candidate bounding boxes that are highly likely to be the tracked target object in the new frame; and estimating the final target object by computing the correlation confidence of each candidate bounding box.

A tracking system for global and local searches may include: a network construction unit that constructs a region proposal network (RPN) in a multi-scale target recognition detector; a search execution unit that performs global and local searches by cooperatively combining a DCF-based tracking model with the multi-scale target recognition detector; and a location estimation unit that estimates the location of a target object through the performed global and local searches.

This can effectively help avoid the drift problem in extreme scenarios and can significantly improve tracking robustness without sacrificing execution time.

In future work, more efficient model update methodologies can be explored and the collaboration mechanism can be optimized to further improve performance.

FIG. 1 is a block diagram for explaining the configuration of a tracking system according to an embodiment.
FIG. 2 is a diagram illustrating a multi-scale region proposal network according to an embodiment.
FIG. 3 is a diagram for explaining the overall process of a global and local search algorithm according to an embodiment.
FIG. 4 is an example showing ground-truth bounding boxes, nine anchor boxes generated by Algorithm 1, and the final object regions regressed from the anchor boxes, according to an embodiment.
FIG. 5 is a diagram for explaining the spatial constraint and the scale constraint according to an embodiment.
FIG. 6 is a diagram illustrating enhanced peak-to-sidelobe ratio (EPSR) values and DCF detection scores according to an embodiment.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings.

The tracking model proposed in the embodiment performs a local search while tracking confidence is high. When the proposed tracking system detects instability or confidence fluctuation in the tracking model, it runs the target recognition detector to re-identify and re-locate the target through a global search over the entire frame. In addition, an enhanced peak-to-sidelobe ratio (EPSR) is designed for confidence estimation to indicate the degree of instability and fluctuation. Accordingly, the local tracking model and the target recognition detector can be jointly applied to the state estimation of the final target and to the model update process. This not only avoids model corruption from bad updates but also prevents the drift problem in long-term tracking.

FIG. 1 is a block diagram for explaining the configuration of a tracking system according to an embodiment.

The tracking system 100 performs global and local searches, and may include a network construction unit 110, a search execution unit 120, and a location estimation unit 130.

The network construction unit 110 may construct a region proposal network in the multi-scale target recognition detector.

The search execution unit 120 may perform global and local searches by cooperatively combining the DCF-based tracking model with the multi-scale target recognition detector.

The location estimation unit 130 may estimate the location of the target object through the performed global and local searches. The location estimation unit 130 may introduce an enhanced peak-to-sidelobe ratio (EPSR) for measuring the reliability of the tracking model from the tracking result of the current frame, determine, based on the EPSR, whether to perform single tracking or joint tracking, and update the tracking model in a computationally efficient manner. The location estimation unit 130 may have the multi-scale target recognition detector generate similar object candidates subject to spatial and scale constraints, and may re-rank the object candidates for object re-identification, using the DCF-based tracking model to distinguish the foreground from background interference. The location estimation unit 130 may select object proposals whose detection score is at or above a preset value, so that object recall is high enough to ensure that the tracked object is detected by the multi-scale target recognition detector; apply a spatial constraint by computing the relative distance between the center point of each selected object proposal and the center point of the previously estimated tracked object; and apply a scale constraint by means of the intersection-over-union (IoU) function between each selected object proposal and the estimated tracking output. By applying the spatial constraint and the scale constraint, the location estimation unit 130 may generate target object candidates, which are candidate bounding boxes that are highly likely to be the tracked target object in the new frame, and may estimate the final target object by computing the correlation confidence of each candidate bounding box.
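The three filters described above (detection score, center distance, and IoU against the tracker output) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names, the box format `(x1, y1, x2, y2)`, the use of absolute pixel distance for the spatial constraint, and all three threshold values are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def center(box):
    """Center point (cx, cy) of a box given as (x1, y1, x2, y2)."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def filter_proposals(proposals, prev_box, tracker_box,
                     min_score=0.5, max_dist=50.0, min_iou=0.3):
    """Keep proposals that pass the score, spatial, and scale constraints.

    proposals: list of (box, detection_score) pairs.
    prev_box: previously estimated target box (spatial constraint).
    tracker_box: current tracker output (scale constraint via IoU).
    All thresholds are hypothetical placeholder values.
    """
    px, py = center(prev_box)
    kept = []
    for box, score in proposals:
        if score < min_score:          # detection-score filter
            continue
        cx, cy = center(box)
        if math.hypot(cx - px, cy - py) > max_dist:  # spatial constraint
            continue
        if iou(box, tracker_box) < min_iou:          # scale constraint
            continue
        kept.append((box, score))
    return kept
```

The surviving candidates would then be re-ranked by the correlation confidence of the DCF model, which is not modeled in this sketch.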

FIG. 3 is a diagram for explaining the overall process of a global and local search algorithm according to an embodiment.

The tracking system according to the embodiment may build a multi-scale RPN-based target recognition detector for the tracking domain. Instead of using a single RPN on top of the last convolutional feature map, one additional region proposal network may be added on top of a pass-through layer that follows the second convolutional layer. In this way, the region proposal networks can learn both higher-spatial-resolution information and rich discriminative information, and can therefore localize the target object better.

The multi-scale RPN-based target recognition detector can exploit both the fine-grained features of a shallow convolutional layer and the semantic features of a deep convolutional layer. Shallow-layer features are well suited to accurate object localization, while deep-layer features efficiently distinguish the target object from the background.

The tracking system according to the embodiment may provide a global and local search algorithm that cooperatively combines the DCF-based tracking model with the proposed target recognition detector for long-term object tracking. In addition, an enhanced peak-to-sidelobe ratio (EPSR) may be designed as a model confidence estimator that indicates how reliable the tracking model is. For computational efficiency, the tracking algorithm switches between local-only search and combined global and local search, as indicated by the EPSR value.

The tracking system according to the embodiment may establish a collaborative model update mechanism carried out within the global and local search process. When fluctuation in tracking confidence or model instability is detected, the most likely target candidates proposed by the target recognition detector may be used to update the DCF model, instead of the self-supervised update approach used by other DCF-based tracking algorithms. This can prove important for maintaining model purity and tracking robustness.

The visual tracking operation based on global and local search in the tracking system according to the embodiment is now described. FIG. 3 illustrates the overall process of the proposed global and local search algorithm: the current frame is fed to the multi-scale region proposal network to generate object proposals, and a local search region is cropped from the current frame to extract Color Name and HOG features as the representation for the DCF-based tracking model. After the constraints are applied, the tracking algorithm can perform object re-identification and model update in a collaborative manner.

FIG. 2 is a diagram illustrating a multi-scale region proposal network according to an embodiment.

The network may consist of five convolutional (conv) layers, one pass-through layer following the activation of the second convolutional layer, and two region proposal layers, RPN 1 and RPN 2. RPN 1 is built on top of the pass-through identity-mapping layer, and RPN 2 is built on top of the convolutional feature map of the fifth convolutional layer. RPN 1 generates region proposals from a fine-grained feature map; fine-grained features help localize small objects. RPN 2 generates region proposals from a coarse feature map, which provides general information. After RPN 1 and RPN 2, a region of interest pooling layer is applied on top of each ROI location. Two fully connected layers are then used as a more precise bounding box regressor. The input to the network is the full frame at a single scale, resized so that its shorter side is 600 pixels. FIG. 2 shows an example with an input of fixed size 1000 × 600 pixels.
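The resizing rule stated above (shorter side scaled to 600 pixels) can be sketched as a small helper. The function name and the rounding choice are assumptions made for illustration; the text does not say whether the longer side is capped, so no cap is applied here.

```python
def resize_shorter_side(width, height, target=600):
    """Return (new_width, new_height, scale) so the shorter side equals `target`.

    The aspect ratio is preserved; dimensions are rounded to whole pixels.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scale = target / min(width, height)
    return round(width * scale), round(height * scale), scale
```

For a 1000 × 600 frame, as in the FIG. 2 example, the scale is 1.0 and the frame is left unchanged.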

The region proposal network generates region proposals regressed from anchor boxes. To provide better priors for the anchor boxes, and to make it easier for the region proposal network to converge to better predictions of tight bounding boxes, the ground-truth bounding boxes in the training data are first clustered into m clusters (m being a natural number) using the standard K-means clustering algorithm. The m cluster centers are then selected as the anchor boxes. The training set used in the embodiment contains M ground-truth bounding boxes (M = 19,780), which have different scales and aspect ratios. These bounding boxes must therefore be clustered into N clusters in terms of scale and aspect ratio.

If b_i denotes the i-th bounding box and c_j the center of the j-th cluster, the distance metric is defined as [formula missing]. Here, [formula missing], j [formula missing], and [formula missing] is a threshold. The N centers may be initialized by randomly selecting N boxes [formula missing] from all ground-truth bounding boxes. The anchor box clustering algorithm is given in Algorithm 1 below and generates anchor boxes with m scales and aspect ratios.
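The clustering step can be sketched in a few lines. Because the distance formula is missing from this text, the sketch assumes the commonly used metric d(b, c) = 1 - IoU(b, c) computed on box width and height only; that choice, the function names, the fixed random seed, and the exact-equality stopping rule are all assumptions and may differ from Algorithm 1.

```python
import random


def iou_wh(a, b):
    """IoU of two boxes given as (width, height), aligned at a common corner."""
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def kmeans_anchors(boxes, n_clusters, iterations=100, seed=0):
    """Cluster (width, height) boxes into `n_clusters` anchor boxes.

    Centers are initialized by randomly sampling ground-truth boxes,
    then refined with standard K-means using the assumed 1 - IoU distance.
    """
    rng = random.Random(seed)
    centers = rng.sample(boxes, n_clusters)
    for _ in range(iterations):
        clusters = [[] for _ in range(n_clusters)]
        for box in boxes:
            # Assign each box to the nearest center under 1 - IoU.
            nearest = min(range(n_clusters),
                          key=lambda k: 1.0 - iou_wh(box, centers[k]))
            clusters[nearest].append(box)
        new_centers = []
        for members, old in zip(clusters, centers):
            if members:
                mean_w = sum(w for w, _ in members) / len(members)
                mean_h = sum(h for _, h in members) / len(members)
                new_centers.append((mean_w, mean_h))
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:       # converged
            break
        centers = new_centers
    return centers
```

With nine clusters, this kind of procedure would yield nine anchor shapes, each being the mean width and height of one cluster of ground-truth boxes.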

In FIG. 4, the left image shows the 19,780 object ground-truth bounding boxes from the training set, and the middle image shows the nine anchor boxes generated by Algorithm 1, each of which represents one cluster center. The right image shows the final object regions generated by the multi-scale region proposal network.

Table 1 shows the sizes of the anchor boxes. The first row is the height and the second row is the width of each anchor box.

Table 1: [table missing]

The multi-domain learning mechanism of the tracking system according to the embodiment is now described.

The tracking system may design K branches at the end of the network as classification layers. Each branch corresponds to one specific video domain, and each domain represents one specific training sequence. During network training, the standard stochastic gradient descent (SGD) method is applied; at the k-th iteration (k being a natural number), only the k-th branch of the classifier is used to update the network, until the network converges or a predefined maximum number of iterations is reached. Specifically, the multi-scale region proposal network is trained by minimizing an objective function with a multi-task loss. The multi-task loss function is defined as follows.

Equation 1: [formula missing]

Here, p* is the label indicating whether an anchor box contains an object: p* = 1 when IoU(anchor box, ground-truth box) ≥ 0.7, and p* = 0 when IoU ≤ 0.3. A hyperparameter controls the relative importance of the classification loss and the box regression loss; because the embodiment focuses more on accurate bounding box regression, it is set to 20 during training in this example. N_cls and N_reg denote the mini-batch size and the number of anchor locations, respectively. The regression loss is activated only for anchors with p* = 1. t_i and t_i* are the predicted coordinate parameters and the ground truth associated with a positive anchor, respectively.
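The quantities defined here (anchor labels p*, normalizers N_cls and N_reg, a balancing hyperparameter, and a regression term that is active only for positive anchors) match the multi-task loss used to train the region proposal network in Faster R-CNN. As a hedged reconstruction, assuming the embodiment uses that standard form and writing the balancing hyperparameter as λ (a symbol chosen here, since it is not legible in this text), Equation 1 would read:

```latex
L(\{p_i\}, \{t_i\})
  = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*)
  + \lambda \, \frac{1}{N_{reg}} \sum_i p_i^* \, L_{reg}(t_i, t_i^*),
  \qquad \lambda = 20
```

Under this form, the factor p_i^* is what restricts the regression term to positive anchors, and a larger λ shifts the training emphasis toward bounding box regression.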

To refine the final bounding box, the method presented in non-patent document 1 (R. Girshick, Fast R-CNN, in: IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440-1448) is followed. Because there is no need to predict the object class during tracking, the loss function focuses only on bounding box regression, which simplifies the learning process. The loss function is as follows.

Equation 2: [formula missing]

Here, [formula missing].

The scale-invariant translation and log-space height/width shift of the predicted bounding box relative to the object proposal, t_i, and of the ground truth relative to the object proposal, t_i* (i = x, y, w, h), are defined as follows.

Equation 3: [formula missing]

Here, x, x* and x_p refer to the prediction, the ground truth, and the region proposal, respectively, and x_p is the output of the region proposal network.
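The description of a scale-invariant translation and a log-space height/width shift matches the bounding box parameterization used in the R-CNN family. The sketch below assumes that standard form, with boxes given as (center_x, center_y, width, height); the function names are hypothetical and the exact expressions of Equation 3 may differ.

```python
import math


def encode_box(box, proposal):
    """Regression targets (t_x, t_y, t_w, t_h) of `box` relative to `proposal`.

    Translation is normalized by the proposal size (scale-invariant);
    width and height shifts are expressed in log space.
    """
    x, y, w, h = box
    xp, yp, wp, hp = proposal
    return ((x - xp) / wp, (y - yp) / hp, math.log(w / wp), math.log(h / hp))


def decode_box(t, proposal):
    """Inverse of encode_box: recover (x, y, w, h) from the targets."""
    tx, ty, tw, th = t
    xp, yp, wp, hp = proposal
    return (xp + tx * wp, yp + ty * hp, wp * math.exp(tw), hp * math.exp(th))
```

Applying `encode_box` to the predicted box gives t_i, and applying it to the ground-truth box gives t_i*; both are measured against the same region proposal.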

In the embodiment, the maximum number of iterations is 100K, the learning rate is 10^-4 for the convolutional layers and 10^-3 for the fully connected layers, and the momentum and weight decay during training are set to 0.9 and 5.0 × 10^-4, respectively. At test time, the domain-specific layers are replaced with a new classifier layer for the test sequence.

The tracking system according to the embodiment may provide a DCF-based tracker with model confidence estimation. The discriminative correlation filter-based tracking model is described first, followed by a new method for model confidence estimation.

First, the discriminative correlation filter-based tracking model is described. Color features are used to represent the appearance of the object, since they have been shown to give excellent results in object recognition, object detection, and pose estimation. A correlation filter operator is applied to obtain a strong classifier. To update the tracking model, all target appearance patches [formula missing] extracted from frame 1 to frame t are treated as training samples for updating the classifier filter and the appearance model. Specifically, x_i is a single image patch of size M × N centered on the target in frame i, and all of its cyclic shifts [formula missing], [formula missing] are used as training examples for the classifier. They are labeled with a Gaussian function y, so that [formula missing] is the label of [formula missing]. The filter f is learned by minimizing the following objective function.

Equation 4: [formula missing]

Here, [formula missing] is the mapping of the Color Name (CN) features into the Hilbert space induced by the kernel [formula missing], with the inner product defined as [formula missing]. The solution can be expanded as a linear combination of the inputs, [formula missing], and [formula missing] is a regularization parameter. This cost function is minimized by the following.

Equation 5: [formula missing]

Here, [formula missing] denotes the discrete Fourier transform operator. The classifier filter is learned in the Fourier domain as [formula missing], and the Gaussian label is likewise transformed into the Fourier domain as [formula missing]. The Fourier-transformed kernel output is defined as [formula missing], where [formula missing]. The confidence score of the target object to be tracked in the current frame t is computed as the classifier response to a single patch z of size M × N.

Equation 6: [formula missing]

Here, [formula missing] and [formula missing], where [formula missing] denotes the model of the target appearance learned over multiple previous frames. The target position in the new frame is estimated by maximizing the detection score [formula missing].
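The expressions for Equations 4 to 6 are not legible in this text. The surrounding description (cyclic shifts of an M × N patch, Gaussian labels y, a kernel-induced Hilbert space, a regularization parameter, and learning in the Fourier domain) matches the kernelized correlation filter used in Danelljan et al., Adaptive color attributes for real-time visual tracking (CVPR 2014). As a hedged sketch, assuming the embodiment follows that single-frame formulation, and with the symbols φ, κ, λ, a, A, U_x, U_z and x̂ chosen here rather than taken from the patent, the training cost, its closed-form solution, and the detection score can be written as:

```latex
\begin{aligned}
\epsilon &= \sum_{m,n} \bigl| \langle \phi(x_{m,n}), f \rangle - y(m,n) \bigr|^2
            + \lambda \langle f, f \rangle, \\
f &= \sum_{m,n} a(m,n)\, \phi(x_{m,n}), \qquad
A = \mathcal{F}\{a\} = \frac{Y}{U_x + \lambda}, \qquad
Y = \mathcal{F}\{y\}, \quad U_x = \mathcal{F}\{u_x\}, \quad
u_x(m,n) = \kappa(x_{m,n}, x), \\
\hat{y} &= \mathcal{F}^{-1}\{ A \, U_z \}, \qquad
U_z = \mathcal{F}\{u_z\}, \quad u_z(m,n) = \kappa(z_{m,n}, \hat{x}).
\end{aligned}
```

In this form all products and divisions are element-wise in the Fourier domain, which is where the speed of DCF trackers comes from: training and detection each cost a few fast Fourier transforms of an M × N array.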

When the target undergoes occlusion, deformation, or sudden movement, the target model must be adaptively updated to match the current state of the target while also retaining earlier target characteristics. Accordingly, the appearance model of the target object is updated with a learning rate [formula missing], and the update scheme of non-patent document 2 (M. Danelljan, F. Khan, M. Felsberg, J. Weijer, Adaptive color attributes for real-time visual tracking, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014, pp. 1090-1097) is adopted to avoid storing all previous target appearances and to optimize tracking speed. [formula missing] is regarded as the quotient of [formula missing] (the numerator) and [formula missing] (the denominator), and [formula missing] is updated by updating [formula missing] and [formula missing] separately, as follows.

Equation 7: [formula missing]

Here, [formula missing], [formula missing], and the estimated target state [formula missing] is used to maintain model stability and tracking robustness.
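The idea of updating the numerator and denominator of the filter separately can be sketched as a pair of running averages. The sketch assumes the update form of the cited Danelljan et al. paper, in which each part is linearly interpolated with a learning rate; the names `gamma` and `lam`, and the representation of Fourier-domain arrays as flat lists of numbers, are assumptions for illustration only.

```python
def interpolate(previous, current, gamma):
    """Element-wise (1 - gamma) * previous + gamma * current."""
    return [(1.0 - gamma) * p + gamma * c for p, c in zip(previous, current)]


def update_filter(num_prev, den_prev, Y, U_x, gamma, lam):
    """Update numerator and denominator separately, then divide.

    Y and U_x are the Fourier-domain label and kernel output for the
    current frame, flattened to lists (real or complex values).
    Returns (numerator, denominator, filter_coefficients).
    """
    num_new = [y * u for y, u in zip(Y, U_x)]      # current-frame numerator
    den_new = [u * (u + lam) for u in U_x]         # current-frame denominator
    num = interpolate(num_prev, num_new, gamma)
    den = interpolate(den_prev, den_new, gamma)
    coeffs = [n / d for n, d in zip(num, den)]     # assumes non-zero denominators
    return num, den, coeffs
```

Keeping only the two running averages means no earlier target appearances need to be stored, which is the stated reason for adopting this scheme.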

The model confidence mechanism of the tracking system according to the embodiment is now described. Most existing online-learning trackers update the appearance model and the correlation filter at every frame, which carries a high risk of model corruption. In particular, when the target object undergoes dramatic appearance change, background occlusion, or fast motion, these factors lead to erroneous updates. An erroneously updated model can cause the tracker to lose the target object in subsequent frames.

In the embodiment, an enhanced peak-to-sidelobe ratio (EPSR), shown in FIG. 6, is proposed as a confidence measure for deciding whether the tracking result of the current frame is sufficiently reliable. A predefined EPSR threshold expresses the scale factor of the EPSR value of the current frame relative to the initial EPSR value of the first frame of each video; in the embodiment it is set to 0.8 by way of example. If the confidence value is greater than the predefined threshold, the tracking output is taken as the final target position in the new frame. If the confidence value is not greater than the predefined threshold, object re-detection is performed with the proposed target-aware detector, and a re-ranking algorithm is then applied to reorder the target object candidates by their true-positive probability. At each frame, the baseline DCF tracker estimates a discrete confidence score for each sampling patch, and ideally the M×N sampling patches of a frame should yield a single sharp peak.
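The threshold test described above can be sketched as follows. This is one reading of the text, in which the current EPSR value is divided by the first-frame EPSR value and compared with 0.8; the function names and the returned mode labels are hypothetical.

```python
def is_reliable(epsr_current, epsr_initial, ratio_threshold=0.8):
    """Return True when the current confidence is high enough to trust.

    Assumption: the threshold applies to the ratio between the current
    EPSR value and the EPSR value measured on the first frame.
    """
    if epsr_initial <= 0:
        raise ValueError("initial EPSR must be positive")
    return epsr_current / epsr_initial > ratio_threshold


def choose_mode(epsr_current, epsr_initial, ratio_threshold=0.8):
    # "local": accept the DCF output as the target position.
    # "redetect": run the detector and re-rank its candidates.
    if is_reliable(epsr_current, epsr_initial, ratio_threshold):
        return "local"
    return "redetect"
```

Normalizing by the first-frame value lets a single threshold serve videos whose absolute confidence levels differ.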

However, when the confidence map contains two or more peaks, the tracking result is not convincing as an actual detection. Accordingly, a confidence detection mechanism called the enhanced peak-to-sidelobe ratio is explored in order to handle indications of background occlusion or tracking failure, prevent bad updates, and re-identify the target object with the target-aware detector proposed in the embodiment. The confidence measure (EPSR) may be defined as follows.

Equation 8:

Here, S_max and S_max2 are the maximum (primary peak) value and the secondary peak value, respectively, and the remaining two terms are the mean and the covariance of the confidence scores S_m,n. PVPR denotes the primary-to-secondary peak ratio and indicates the degree to which the primary peak suppresses the secondary peak; the larger the PVPR value, the more accurate the localization. PSR is the peak-to-sidelobe ratio and indicates the confidence that the target object lies exactly at the peak position.
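The two ingredients named above can be computed from a confidence map as in the sketch below. It uses the commonly cited PSR form (peak minus mean, divided by the standard deviation) and takes the secondary peak to be the largest score outside a small window around the primary peak. Both choices, along with `exclude_radius`, are assumptions; how Equation 8 combines the two values into EPSR is not reproduced here.

```python
import math


def peak_statistics(score_map, exclude_radius=1):
    """Return PSR and primary-to-secondary peak ratio of a 2-D score map."""
    rows, cols = len(score_map), len(score_map[0])
    cells = [(score_map[m][n], m, n)
             for m in range(rows) for n in range(cols)]

    # Primary peak and its location.
    s_max, pm, pn = max(cells)

    # Mean and standard deviation over the whole map.
    values = [v for v, _, _ in cells]
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

    # Secondary peak: best score outside a window around the primary peak.
    outside = [v for v, m, n in cells
               if abs(m - pm) > exclude_radius or abs(n - pn) > exclude_radius]
    s_max2 = max(outside, default=0.0)

    psr = (s_max - mean) / std if std > 0 else float("inf")
    peak_ratio = s_max / s_max2 if s_max2 > 0 else float("inf")
    return {"s_max": s_max, "s_max2": s_max2,
            "psr": psr, "peak_ratio": peak_ratio}
```

A map with one dominant peak gives large values for both quantities, while a second strong peak lowers the peak ratio even when the PSR stays high.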

The target localization and model update operations of the tracking system according to the embodiment are described next. A robust tracker must be computationally efficient and must have discriminative power. The tracking system according to the embodiment is modeled as a tracking-by-detection mechanism that exploits the high object recall of the proposed multi-scale region proposal network, and the tracker must distinguish the target from distractors among all object candidates on the basis of the spatial state, the scale state, and the unique pattern of the target object. The tracking process can be divided into two stages: spatial and scale constraints, and object proposal re-ranking.

The tracking system may apply the first stage (spatial and scale constraints). The original multi-scale RPN-based target-aware detector D_ta generates a pool of bounding boxes. Each bounding box is assigned a detection score that indicates the category to which the object belongs and expresses its targetness. A lower-bound threshold is used for each criterion j, where the criteria are the targetness probability, the scale constraint criterion, and the spatial constraint criterion; each criterion has its own lower-bound threshold.

Here, to ensure that the tracked object is detected by the target-aware detector, object proposals whose detection score exceeds a preset lower-bound threshold are selected so as to favor high recall. Based on the assumption that the tracked object in a subsequent frame should lie near its previous position, referred to as motion smoothness, a spatial constraint is applied by calculating the relative distance between the center point of an object proposal and the center point of the previously estimated tracked object. The scale change of the tracked object across consecutive frames should stay within a certain range to satisfy scale consistency; this scale constraint is defined by the intersection-over-union (IoU) function between the object proposal and the estimated tracking output. Object proposals that satisfy both the spatial and the scale constraints are retained as highly similar object candidates.

Equation 9:

Here, the predefined lower-bound thresholds of the spatial constraint and the scale constraint are set to 0.60 and 0.50, respectively. FIG. 5 illustrates the spatial and scale constraint process.
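The first-stage filtering can be sketched as below. The IoU computation is standard, but the spatial score is an assumption: it is taken here as one minus the center distance divided by the diagonal of the previous box, so that larger values mean closer proposals and a lower-bound threshold of 0.60 is meaningful. The box layout `(x, y, w, h)`, the function names, and the `score_min` argument (the detection-score threshold value is not given in the text) are likewise illustrative.

```python
import math


def iou(a, b):
    """Intersection over union of two (x, y, w, h) boxes."""
    ix = max(0.0, min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def center_closeness(box, prev):
    """Assumed spatial score in [0, 1]; 1 means identical centers."""
    dx = (box[0] + box[2] / 2) - (prev[0] + prev[2] / 2)
    dy = (box[1] + box[3] / 2) - (prev[1] + prev[3] / 2)
    diag = math.hypot(prev[2], prev[3])
    if diag == 0:
        return 0.0
    return max(0.0, 1.0 - math.hypot(dx, dy) / diag)


def filter_proposals(proposals, prev_box, score_min,
                     spatial_min=0.60, scale_min=0.50):
    """Keep (box, score) proposals that pass all three lower bounds."""
    return [(box, score) for box, score in proposals
            if score > score_min
            and center_closeness(box, prev_box) >= spatial_min
            and iou(box, prev_box) >= scale_min]
```

For example, with a previous box of `(10, 10, 20, 20)`, a proposal shifted by two pixels passes both constraints, while a proposal with no overlap is rejected by the IoU test regardless of its detection score.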

The tracking system may perform the second stage (target localization and re-identification). After the spatial and scale constraint process described above, the target object candidates generated by global-search-based detection are retained, and each element of the candidate set is one of the top-K candidate bounding boxes most likely to be the tracked target object in the new frame. Each candidate then serves as an instant sample to establish a search region R_t^k for recomputing its correlation score against the appearance model, which gives the correlation confidence of each candidate. The candidate with the maximum correlation score is estimated to be the final tracked target object, as follows.

Equation 10:
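The selection step amounts to taking the candidate with the highest recomputed correlation score. A minimal sketch follows; `correlation_score` stands in for the DCF response computed on each candidate's search region and is a hypothetical callable supplied by the caller.

```python
def select_target(candidates, correlation_score):
    """Return (best_candidate, best_score) among the retained candidates.

    correlation_score is a caller-supplied function mapping a candidate
    box to its correlation confidence (hypothetical placeholder).
    """
    if not candidates:
        raise ValueError("no candidates to rank")
    scored = [(correlation_score(c), c) for c in candidates]
    best_score, best = max(scored, key=lambda pair: pair[0])
    return best, best_score
```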

For model stability and tracking robustness, the tracking model is updated with weighted instant object candidates, where the weight is an exponential function of the correlation score. This update mechanism avoids model corruption and helps the tracking algorithm avoid the drift problem. The new target appearance model may be updated as follows.

Equation 11:
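A score-weighted update of this kind can be sketched as follows. The text states only that the weight is an exponential function of the correlation score; the specific form `exp(score - 1)`, the linear blend, and the `base_rate` default are assumptions chosen for illustration and are not taken from Equation 11.

```python
import math


def weighted_update(model, sample, score, base_rate=0.02):
    """Blend a new sample into the appearance model, scaled by confidence.

    Assumed weight: exp(score - 1), which equals 1 for a score of 1.0
    and shrinks as the correlation score drops.
    """
    weight = math.exp(score - 1.0)
    rate = min(1.0, base_rate * weight)
    return [(1 - rate) * m + rate * s for m, s in zip(model, sample)]
```

Under this assumption a low-confidence candidate changes the model less than a high-confidence one, which is the behavior the paragraph above relies on to limit model corruption.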

The global and local search-based tracking algorithm proposed in the embodiment is presented in Algorithm 2.
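Algorithm 2 itself is not reproduced in this text. The per-frame control flow described in the preceding paragraphs can nevertheless be outlined as below, under the assumption that local DCF tracking is tried first and global re-detection is used only when the confidence test fails. Every callable and every key of the `state` dictionary is a hypothetical placeholder.

```python
def track_frame(frame, state, local_track, detect, constrain, rescore, update,
                ratio_threshold=0.8):
    """Run one frame of a local-then-global tracking loop (sketch).

    Caller-supplied placeholders:
      local_track(frame, state) -> (box, confidence)
      detect(frame)             -> list of candidate boxes
      constrain(boxes, prev)    -> boxes passing spatial/scale checks
      rescore(frame, state, b)  -> correlation score of candidate b
      update(state, frame, box) -> None, refreshes the tracking model
    """
    box, confidence = local_track(frame, state)
    if confidence / state["initial_confidence"] > ratio_threshold:
        # Local search is trusted: accept it and update the model.
        update(state, frame, box)
        state["box"] = box
        return box, "local"

    # Local search is not trusted: fall back to global re-detection.
    candidates = constrain(detect(frame), state["box"])
    if not candidates:
        # Nothing plausible found: keep the previous box, skip the update.
        return state["box"], "lost"
    best = max(candidates, key=lambda b: rescore(frame, state, b))
    update(state, frame, best)
    state["box"] = best
    return best, "global"
```

Skipping the update when no candidate survives is a design choice of this sketch, made so that an unreliable frame cannot alter the model.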

The apparatus described above may be implemented as hardware components, software components, and/or a combination of hardware and software components. For example, the apparatus and components described in the embodiments may be implemented using one or more general-purpose or special-purpose computers, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. The processing device may run an operating system (OS) and one or more software applications that execute on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will recognize that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, computer storage medium, or device in order to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored in one or more computer-readable recording media.

The method according to the embodiment may be implemented in the form of program instructions that can be executed by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in the art of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although the embodiments have been described above with reference to limited embodiments and drawings, a person of ordinary skill in the art can make various modifications and variations from the above description. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or if the described components of the system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

A global and local search method performed by a tracking system, comprising:
constructing a region proposal network in a multi-scale-based target-aware detector;
performing global and local searches by cooperatively combining a discriminative correlation filter (DCF)-based tracking model and the multi-scale-based target-aware detector; and
estimating the location of a target object through the performed global and local searches,
wherein the estimating of the location of the target object comprises:
proposing an enhanced peak-to-sidelobe ratio (EPSR) for measuring the reliability of the tracking model according to the tracking result of the current frame, determining whether single tracking or joint tracking is performed based on the EPSR, and updating the tracking model for computational efficiency;
generating, by the multi-scale-based target-aware detector, similar object candidates under spatial and scale constraints, distinguishing foreground from background interference through the DCF-based tracking model, and re-ranking the object candidates for object re-identification;
selecting, for a high object recall that ensures that the tracked object is detected by the multi-scale-based target-aware detector, object proposals having a detection score higher than a preset value;
applying a spatial constraint by calculating the relative distance between the center point of a selected object proposal and the center point of the previously estimated tracked object; and
applying a scale constraint.

(Deleted)

The global and local search method of claim 1,
wherein the estimating of the location of the target object comprises:
generating target object candidates by applying the spatial constraint and the scale constraint, indicating, within the generated target object candidates, candidate bounding boxes likely to be the tracked target object in a new frame, and estimating the final target object by calculating the correlation confidence of each candidate bounding box.

A tracking system for global and local searches, comprising:
a network construction unit configured to construct a region proposal network in a multi-scale-based target-aware detector;
a search performing unit configured to perform global and local searches by cooperatively combining a discriminative correlation filter (DCF)-based tracking model and the multi-scale-based target-aware detector; and
a location estimation unit configured to estimate the location of a target object through the performed global and local searches,
wherein the location estimation unit is configured to:
propose an enhanced peak-to-sidelobe ratio (EPSR) for measuring the reliability of the tracking model according to the tracking result of the current frame, determine whether single tracking or joint tracking is performed based on the EPSR, and update the tracking model for computational efficiency;
generate, by the multi-scale-based target-aware detector, similar object candidates under spatial and scale constraints, distinguish foreground from background interference through the DCF-based tracking model, and re-rank the object candidates for object re-identification;
select, for a high object recall that ensures that the tracked object is detected by the multi-scale-based target-aware detector, object proposals having a detection score higher than a preset value;
apply a spatial constraint by calculating the relative distance between the center point of a selected object proposal and the center point of the previously estimated tracked object; and
apply a scale constraint.