KR20210092672A

KR20210092672A - Method and apparatus for tracking target

Info

Publication number: KR20210092672A
Application number: KR1020200179773A
Authority: KR
Inventors: 수 징타오; 유 지아키안; 유병인; 최창규; 이현정; 탄 항카이; 한재준; 왕 퀴앙; 첸 이웨이
Original assignee: 삼성전자주식회사
Priority date: 2020-01-16
Filing date: 2020-12-21
Publication date: 2021-07-26
Also published as: CN113129332A

Abstract

Disclosed are a method and an apparatus for tracking a target. According to one embodiment, the apparatus for tracking the target includes at least one processor. The processor obtains a first depth feature from a target region image, and obtains a second depth feature from a search region image. The processor obtains a global response diagram between the first depth feature and the second depth feature. The processor obtains temporary bounding box information based on the global response diagram. The processor updates the second depth feature based on the temporary bounding box information to obtain the updated second depth feature. The processor obtains a plurality of local feature blocks based on the first depth feature. The processor obtains a local response diagram based on the local feature blocks and the updated second depth feature. The processor obtains output bounding box information based on the local response diagram.

Description

Target tracking method and apparatus {METHOD AND APPARATUS FOR TRACKING TARGET}

The following description relates to a technique for tracking a target in an image, and more specifically to a technique for tracking the target in stages.

Visual object tracking is an important field of computer vision. Given the first frame image of a video sequence and a bounding box specified in it, visual object tracking continuously predicts the bounding box of the target in subsequent frame images. The target may be an object or a part of an object.

According to an embodiment, a target tracking method may include: obtaining a first depth feature from a target area image and obtaining a second depth feature from a search area image; obtaining a global response diagram between the first depth feature and the second depth feature; obtaining temporary bounding box information based on the global response diagram; obtaining an updated second depth feature by updating the second depth feature based on the temporary bounding box information; obtaining a plurality of local feature blocks based on the first depth feature; obtaining a local response diagram based on the plurality of local feature blocks and the updated second depth feature; and obtaining output bounding box information based on the local response diagram.

The obtaining of the plurality of local feature blocks may include obtaining the plurality of local feature blocks by dividing the first depth feature or a third depth feature additionally extracted from the first depth feature. The obtaining of the local response diagram may include obtaining the local response diagram based on a correlation between the plurality of local feature blocks and either the second depth feature or a fourth depth feature additionally extracted from the updated second depth feature.

The obtaining of the local response diagram based on the correlation may include: obtaining a plurality of local sub-response diagrams based on the correlation between each of the plurality of local feature blocks and the second depth feature or the fourth depth feature; and synthesizing the plurality of local sub-response diagrams to obtain the local response diagram.

The synthesizing of the plurality of local sub-response diagrams to obtain the local response diagram may include: classifying each of the plurality of local feature blocks as a target feature block or a background feature block; and synthesizing the local sub-response diagrams based on the classification result to obtain the local response diagram.

The classifying may include classifying each of the plurality of local feature blocks as a target feature block or a background feature block based on an overlap ratio between the temporary bounding box and that local feature block.

The output bounding box information may include a coordinate offset between the center position coordinates of the temporary bounding box, which are included in the temporary bounding box information, and the center position coordinates of the output bounding box, and a size offset between the size of the output bounding box and a preset size. In the obtaining of the output bounding box information, when the sum of the absolute values of the coordinate offset is greater than a threshold, the temporary bounding box information may be output as the output bounding box information. When the sum of the absolute values of the coordinate offset is less than or equal to the threshold, the result of adding the coordinate offset to the center position of the temporary bounding box and the result of adding the size offset to the size of the temporary bounding box may be output as the output bounding box information.

The obtaining of the temporary bounding box information may include outputting the coordinates having the greatest correlation in the global response diagram of the current frame as the center position coordinates of the temporary bounding box of the current frame, and outputting the size of the output bounding box estimated in the previous frame as the size of the temporary bounding box of the current frame.

In the obtaining of the plurality of local feature blocks by dividing the first depth feature or the third depth feature additionally extracted from the first depth feature, the first depth feature or the third depth feature may be divided so that the plurality of local feature blocks do not overlap, divided so that the plurality of local feature blocks overlap, or divided based on a preset block distribution.

A computer program according to an embodiment may be combined with hardware and stored in a computer-readable recording medium in order to execute the method.

According to an embodiment, a target tracking apparatus may include at least one processor. The processor may obtain a first depth feature from a target area image and obtain a second depth feature from a search area image. The processor may obtain a global response diagram between the first depth feature and the second depth feature. The processor may obtain temporary bounding box information based on the global response diagram. The processor may obtain an updated second depth feature by updating the second depth feature based on the temporary bounding box information. The processor may obtain a plurality of local feature blocks based on the first depth feature. The processor may obtain a local response diagram based on the plurality of local feature blocks and the updated second depth feature. The processor may obtain output bounding box information based on the local response diagram.

FIG. 1 is a flowchart illustrating the overall operation of a target tracking method according to an embodiment.
FIG. 2 is a flowchart illustrating the operations of a target tracking method according to an embodiment.
FIG. 3 is a flowchart illustrating a target tracking method according to an embodiment in more detail.
FIG. 4 is an exemplary diagram of a global correlation operation according to an embodiment.
FIG. 5 is an exemplary diagram of the first stage of a target tracking method according to an embodiment.
FIG. 6 is an exemplary diagram of a block partitioning method according to an embodiment.
FIG. 7 is an exemplary diagram of a block correlation operation according to an embodiment.
FIG. 8 is an exemplary diagram of interference suppression and response diagram fusion according to an embodiment.
FIG. 9 is an exemplary diagram of self-adaptive prediction according to an embodiment.
FIG. 10 is an exemplary diagram of the second stage operation of a target tracking method according to an embodiment.
FIG. 11 is an exemplary diagram of network training according to an embodiment.
FIG. 12 compares the effects of the global correlation method, the block correlation method, and the block correlation method combined with interference suppression according to an embodiment.
FIG. 13 is a diagram illustrating the configuration of an apparatus for estimating a target according to an embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for purposes of illustration only, and the embodiments may be modified and implemented in various forms. Accordingly, the actual implementation is not limited to the specific embodiments disclosed, and the scope of the present specification includes changes, equivalents, or substitutes included in the technical idea described in the embodiments.

Although terms such as first or second may be used to describe various components, these terms should be interpreted only for the purpose of distinguishing one component from another. For example, a first component may be termed a second component, and similarly, a second component may also be termed a first component.

When a component is referred to as being "connected to" another component, it may be directly connected or coupled to the other component, but it should be understood that another component may exist in between.

The singular expression includes the plural expression unless the context clearly dictates otherwise. In this specification, terms such as "comprise" or "have" are intended to designate that the described feature, number, step, operation, component, part, or combination thereof exists, and should be understood not to preclude in advance the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art. Terms such as those defined in a commonly used dictionary should be interpreted as having a meaning consistent with their meaning in the context of the related art, and should not be interpreted in an idealized or excessively formal sense unless explicitly so defined in the present specification.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. In the description with reference to the accompanying drawings, the same components are assigned the same reference numerals regardless of the drawing, and overlapping descriptions thereof will be omitted.

FIG. 1 is a flowchart illustrating the overall operation of a target tracking method according to an embodiment.

According to an embodiment, the target tracking apparatus may track a target across successive images using two stages. The target tracking apparatus may determine a target area image (101) and a search area image (103). In the first stage (105), the target tracking apparatus may perform a coarse prediction (107) of the area of the search area image (103) that matches the target area image (101). Temporary bounding box information may be obtained as the result of the coarse prediction (107). In the second stage (109), the target tracking apparatus may perform an accurate prediction (111) using the temporary bounding box information. Output bounding box information may be output as the result of the accurate prediction (111).

The target tracking apparatus performs target tracking in two stages and may track the target using both block correlation and global correlation. By applying block correlation, the target tracking apparatus may extract information for adjusting the temporary bounding box information. In this way, the target tracking apparatus can achieve high-accuracy target tracking with fewer resources. Because it uses few resources, the target tracking apparatus can perform accurate and stable real-time tracking even in a mobile environment.

Hereinafter, an input image refers to an image input to the target tracking apparatus. The input image may include successive frames, but is not limited thereto. A target refers to the object tracked in the input image. A target area image refers to a reference image that represents the target, and may be referred to as a template image. A search area image refers to the image that serves as the area in which the target is searched for.

The first depth feature refers to a feature extracted from the target area image. The second depth feature refers to a feature extracted from the search area image. Global correlation refers to a correlation operation between the entire first depth feature and the second depth feature. The global response diagram refers to the result of the global correlation. The third depth feature refers to a depth feature extracted by applying an additional convolution operation to the first depth feature, and the fourth depth feature refers to a depth feature extracted by applying an additional convolution operation to the second depth feature.

A local feature block refers to a result of dividing the feature of the target area image. Here, the feature of the target area image may include the first depth feature or the third depth feature derived from the first depth feature. A local feature block need not be rectangular and may take various shapes. Local correlation refers to a correlation operation between each local feature block and the second depth feature or the updated second depth feature. The local response diagram refers to the result of the local correlation.

The temporary bounding box information refers to information about the temporary bounding box produced by the first stage, and the output bounding box information refers to information about the output bounding box produced by the second stage. Each piece of bounding box information may include position information and size information of the target within the search area.

FIG. 2 is a flowchart illustrating the operations of a target tracking method according to an embodiment.

In FIG. 2, steps 201 to 205 correspond to the first stage, and steps 207 to 213 correspond to the second stage. The division into a first stage and a second stage is for convenience of description, and the stages need not be distinguished.

According to an embodiment, in step 201, the target tracking apparatus may obtain a first depth feature from the target area image and obtain a second depth feature from the search area image.

For example, the target tracking apparatus may acquire successive frames. Using a first neural network, the target tracking apparatus may set a partial area of the first frame image among the successive frames as the target area image. For example, the first neural network may be a Siamese convolutional network, but is not limited thereto. The target tracking apparatus may extract the first depth feature from the target area image. The target tracking apparatus may set the current frame as the search area image and extract the second depth feature from the search area image. Here, the target area image may be obtained by clipping the first frame image according to a manually set initial bounding box or the output bounding box for a previous frame, but is not limited thereto. The first depth feature is a global feature of the target area image, and the second depth feature is a global feature of the search area image.

In step 203, the target tracking apparatus may obtain a global response diagram between the first depth feature and the second depth feature. For example, the target tracking apparatus may obtain the global response diagram by computing the global correlation between the first depth feature and the second depth feature.

Applying a correlation operation to two images yields a response diagram Y that represents the similarity between them. A larger similarity value indicates a higher similarity between the corresponding area of the search area image Z and the target area image X. For example, the correlation operation may be performed according to Equation 1.

In Equation 1, h and w denote the size of the image X, and i, j, u, and v denote image coordinates.
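Equation 1 itself is not reproduced in this text. One common form that is consistent with the symbol definitions above is Y(i, j) = sum over u in [0, h) and v in [0, w) of X(u, v) * Z(i + u, j + v); this form is an assumption, not a quotation from the specification. The following minimal sketch evaluates that sum with NumPy for single-channel arrays. The function name `cross_correlate` and the toy data are hypothetical.

```python
import numpy as np

def cross_correlate(X, Z):
    """Slide the h x w template X over the search image Z (no padding).

    Returns the response Y, where Y[i, j] is the sum of the elementwise
    product of X with the h x w window of Z whose top-left corner is (i, j).
    """
    h, w = X.shape
    H, W = Z.shape
    Y = np.zeros((H - h + 1, W - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = np.sum(X * Z[i:i + h, j:j + w])
    return Y

# Toy example: a 2 x 2 bright patch hidden in a 6 x 6 search image.
Z = np.zeros((6, 6))
Z[2:4, 3:5] = 1.0
X = np.ones((2, 2))
Y = cross_correlate(X, Z)
peak = np.unravel_index(np.argmax(Y), Y.shape)  # position of highest similarity
```

In this toy case the response peaks where the template fully covers the bright patch, which matches the statement that larger values indicate higher similarity.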

In step 205, the target tracking apparatus may obtain temporary bounding box information based on the global response diagram. The target tracking apparatus may output the coordinates having the greatest correlation in the global response diagram of the current frame as the center position coordinates of the temporary bounding box of the current frame, and may output the size of the output bounding box estimated in the previous frame as the size of the temporary bounding box of the current frame.
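Step 205 can be sketched as an argmax over the global response diagram combined with the previous frame's box size. This is only an illustration: the function name `temporary_box` and the dictionary layout are hypothetical, and the center is returned in response-diagram coordinates (mapping it back to image pixels, for example through the network stride, is not described in this passage and is omitted).

```python
import numpy as np

def temporary_box(global_response, prev_size):
    """Pick the peak of the global response as the temporary box center.

    global_response: 2-D array (rows = y, columns = x).
    prev_size: (width, height) of the output box estimated in the previous frame.
    """
    row, col = np.unravel_index(np.argmax(global_response), global_response.shape)
    return {"center": (int(col), int(row)), "size": tuple(prev_size)}

# Toy response diagram with a single clear peak at row 1, column 4.
response = np.zeros((5, 5))
response[1, 4] = 0.9
box = temporary_box(response, prev_size=(32, 48))
```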

In step 207, the target tracking apparatus may obtain an updated second depth feature by updating the second depth feature based on the temporary bounding box information. For example, the target tracking apparatus may clip the search area image according to the temporary bounding box to obtain a search area image covering a reduced area. The target tracking apparatus may update the second depth feature by extracting a depth feature from the reduced search area image.
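The clipping in step 207 can be illustrated as a crop of the search area image around the temporary bounding box. This is a sketch under assumptions: the `margin` factor that enlarges the crop relative to the box is hypothetical and not taken from the specification, and the subsequent feature extraction by the network is not shown.

```python
import numpy as np

def clip_search_region(image, center, size, margin=1.5):
    """Crop `image` (rows = y, columns = x) around a temporary bounding box.

    center: (cx, cy) in pixels; size: (w, h) in pixels.
    margin: hypothetical enlargement factor applied to the box size.
    Returns the crop and the (x0, y0) offset of its top-left corner.
    """
    cx, cy = center
    w, h = size
    half_w, half_h = margin * w / 2, margin * h / 2
    x0 = max(int(round(cx - half_w)), 0)
    y0 = max(int(round(cy - half_h)), 0)
    x1 = min(int(round(cx + half_w)), image.shape[1])
    y1 = min(int(round(cy + half_h)), image.shape[0])
    return image[y0:y1, x0:x1], (x0, y0)

search_image = np.zeros((100, 100))
crop, origin = clip_search_region(search_image, center=(50, 40), size=(20, 10), margin=2.0)
```

The returned offset lets positions found in the reduced region be mapped back to the original search area image.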

In step 209, the target tracking apparatus may obtain a plurality of local feature blocks based on the first depth feature. The target tracking apparatus may obtain the plurality of local feature blocks by dividing the first depth feature or a third depth feature additionally extracted from the first depth feature. The target tracking apparatus may additionally extract the third depth feature by inputting the first depth feature into a second neural network.

The target tracking apparatus may divide the first depth feature or the third depth feature into the plurality of local feature blocks. The target tracking apparatus may divide the first depth feature or the third depth feature so that the local feature blocks do not overlap, so that the local feature blocks overlap, or based on a preset block distribution.

Here, the preset block distribution may be a manually specified distribution or a distribution derived by a trained neural network. The manually specified distribution may include a Gaussian distribution. A neural network that outputs a distribution may be trained to find optimized values for the parameters of a specific distribution, such as the mean and variance of a Gaussian distribution.
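The non-overlapping and overlapping partitions described above can be sketched with a single sliding-window helper: a stride equal to the block size gives non-overlapping blocks, and a smaller stride gives overlapping ones. The function name `split_into_blocks`, the square block shape, and the (channels, height, width) layout are assumptions made for illustration; partitioning according to a preset or learned distribution is not shown.

```python
import numpy as np

def split_into_blocks(feature, block, stride):
    """Split a (C, H, W) feature into square local feature blocks.

    Returns a list of ((top, left), block_array) pairs, where block_array
    has shape (C, block, block).
    """
    _, H, W = feature.shape
    blocks = []
    for top in range(0, H - block + 1, stride):
        for left in range(0, W - block + 1, stride):
            blocks.append(((top, left), feature[:, top:top + block, left:left + block]))
    return blocks

feature = np.arange(2 * 6 * 6, dtype=float).reshape(2, 6, 6)
non_overlapping = split_into_blocks(feature, block=2, stride=2)  # stride == block size
overlapping = split_into_blocks(feature, block=2, stride=1)      # stride < block size
```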

In step 211, the target tracking apparatus may obtain a local response diagram based on the plurality of local feature blocks and the updated second depth feature. The target tracking apparatus may obtain the local response diagram based on a correlation between the plurality of local feature blocks and either the second depth feature or a fourth depth feature additionally extracted from the updated second depth feature. The target tracking apparatus may additionally extract the fourth depth feature by inputting the second depth feature into the second neural network.

The target tracking apparatus may obtain a plurality of local sub-response diagrams based on the correlation between each of the plurality of local feature blocks and the second depth feature or the fourth depth feature.
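Computing one sub-response diagram per local feature block can be sketched as follows, assuming that the correlation sums over channels as well as spatial positions and that no padding is used. The names `correlate_features` and `local_sub_responses` are hypothetical.

```python
import numpy as np

def correlate_features(block, search):
    """Correlate a (C, h, w) block with a (C, H, W) search feature.

    Returns a (H - h + 1, W - w + 1) response, summing over channels.
    """
    _, h, w = block.shape
    _, H, W = search.shape
    out = np.empty((H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(block * search[:, i:i + h, j:j + w])
    return out

def local_sub_responses(blocks, search):
    """One sub-response diagram per local feature block."""
    return [correlate_features(b, search) for b in blocks]

# Toy data: a single active location in the search feature.
search = np.zeros((1, 4, 4))
search[0, 1, 2] = 1.0
blocks = [np.ones((1, 2, 2)), np.zeros((1, 2, 2))]
subs = local_sub_responses(blocks, search)
```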

The target tracking apparatus may obtain the local response diagram by synthesizing the plurality of local sub-response diagrams. The target tracking apparatus may classify each of the plurality of local feature blocks as a target feature block or a background feature block, based on the overlap ratio between the temporary bounding box and that local feature block. The target tracking apparatus may then synthesize the local sub-response diagrams based on the classification result to obtain the local response diagram.
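The classification and synthesis can be illustrated as below. Several details are assumptions rather than statements from the specification: the overlap ratio is taken as the fraction of a block's area that lies inside the temporary bounding box (both in the same coordinate frame), the 0.5 cut-off is hypothetical, and the fusion rule (mean of target sub-responses minus mean of background sub-responses) is just one plausible way to let the classification suppress background interference.

```python
import numpy as np

def overlap_ratio(block_box, temp_box):
    """Fraction of the block's area inside temp_box; boxes are (x0, y0, x1, y1)."""
    ix0, iy0 = max(block_box[0], temp_box[0]), max(block_box[1], temp_box[1])
    ix1, iy1 = min(block_box[2], temp_box[2]), min(block_box[3], temp_box[3])
    inter = max(ix1 - ix0, 0) * max(iy1 - iy0, 0)
    area = (block_box[2] - block_box[0]) * (block_box[3] - block_box[1])
    return inter / area

def fuse(sub_responses, block_boxes, temp_box, ratio_threshold=0.5):
    """Classify blocks as target or background, then fuse their sub-responses."""
    is_target = np.array([overlap_ratio(b, temp_box) >= ratio_threshold
                          for b in block_boxes])
    subs = np.asarray(sub_responses, dtype=float)
    fused = np.zeros(subs.shape[1:])
    if is_target.any():
        fused += subs[is_target].mean(axis=0)      # reinforce target evidence
    if (~is_target).any():
        fused -= subs[~is_target].mean(axis=0)     # suppress background evidence
    return fused, is_target

block_boxes = [(0, 0, 2, 2), (2, 0, 4, 2), (4, 0, 6, 2)]
temp_box = (0, 0, 3, 2)
sub_responses = [np.full((2, 2), 1.0), np.full((2, 2), 3.0), np.full((2, 2), 0.5)]
fused, is_target = fuse(sub_responses, block_boxes, temp_box)
```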

In step 213, the target tracking apparatus may obtain output bounding box information based on the local response diagram. The output bounding box information may include a coordinate offset between the center position coordinates of the temporary bounding box, which are included in the temporary bounding box information, and the center position coordinates of the output bounding box, and a size offset between the size of the output bounding box and a preset size. When the sum of the absolute values of the coordinate offset is greater than a threshold, the target tracking apparatus may output the temporary bounding box information as the output bounding box information. When the sum of the absolute values of the coordinate offset is less than or equal to the threshold, the target tracking apparatus may output, as the output bounding box information, the result of adding the coordinate offset to the center position of the temporary bounding box and the result of adding the size offset to the size of the temporary bounding box.
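The threshold rule in step 213 can be written directly as a small function. The function name `select_output_box`, the tuple layouts, and the numeric values in the example are hypothetical; only the decision rule follows the text above.

```python
def select_output_box(temp_center, temp_size, coord_offset, size_offset, threshold):
    """Apply the predicted offsets only when the center shift is small enough.

    temp_center: (cx, cy); temp_size: (w, h);
    coord_offset: (dx, dy); size_offset: (dw, dh).
    """
    dx, dy = coord_offset
    if abs(dx) + abs(dy) > threshold:
        # Offset too large: fall back to the temporary bounding box.
        return temp_center, temp_size
    cx, cy = temp_center
    w, h = temp_size
    dw, dh = size_offset
    return (cx + dx, cy + dy), (w + dw, h + dh)

refined = select_output_box((50, 40), (20, 10), (1.5, -0.5), (2, 1), threshold=4)
fallback = select_output_box((50, 40), (20, 10), (6, 3), (2, 1), threshold=4)
```

In the first call the offsets sum to 2 in absolute value, so the refined box is returned; in the second they sum to 9, so the temporary box is kept unchanged.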

FIG. 3 is a flowchart illustrating a target tracking method according to an embodiment in more detail.

As shown in FIG. 3, the target tracking method includes two stages. In the first stage (105), the coarse prediction (107) is performed, and the target area image (101) and the search area image (103) may be determined. The target tracking apparatus may extract the global features (311, 312) of the target area image (101) and the search area image (103), respectively, and compute the global correlation (313) of the extracted global features (311, 312) to obtain the global response diagram. The target tracking apparatus may obtain the temporary bounding box information based on the global response diagram.

In the second stage, the accurate prediction (111) is performed. The target tracking apparatus may divide the feature of the target area image to obtain a plurality of local feature blocks (321, 322), and may compute the block correlation (323) between the local feature blocks (321, 322) and the updated feature of the search area image to obtain the local response diagram. The target tracking apparatus may output the output bounding box information according to the local response diagram.

도 4는 일 실시예에 따른 글로벌 상관 연산에 대한 예시도이다.4 is an exemplary diagram of a global correlation operation according to an embodiment.

Referring to FIG. 4, the target tracking apparatus may perform a correlation operation between the first depth feature F_T 401 and the second depth feature F_St 402. The target tracking apparatus may slide the first depth feature F_T 401 over the second depth feature F_St 402 and compute the correlation at each position. Through this global correlation, the target tracking apparatus may obtain the global response diagram 403.
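As a non-limiting illustration, the sliding correlation described with reference to FIG. 4 may be sketched as follows. The sketch assumes single-channel two-dimensional features, a stride of one, and no padding; the function name `global_correlation` and the toy values are hypothetical and are not taken from the embodiment, in which the depth features would typically have multiple channels.

```python
def global_correlation(f_t, f_s):
    """Slide the template feature f_t over the search feature f_s.

    Both inputs are 2-D lists of numbers. Entry (i, j) of the returned
    response map is the sum of elementwise products at that offset.
    Only positions where f_t fits entirely inside f_s are evaluated.
    """
    th, tw = len(f_t), len(f_t[0])
    sh, sw = len(f_s), len(f_s[0])
    response = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            acc = 0.0
            for u in range(th):
                for v in range(tw):
                    acc += f_t[u][v] * f_s[i + u][j + v]
            row.append(acc)
        response.append(row)
    return response


# Toy example (assumed values): the diagonal pattern of the template
# matches the bottom-right corner of the search feature best.
f_t = [[1.0, 0.0],
       [0.0, 1.0]]
f_s = [[0.0, 0.0, 0.0],
       [0.0, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
response = global_correlation(f_t, f_s)
```

In this toy case the largest response lies at the offset where the template pattern is located, which is the property the first stage relies on.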

FIG. 5 illustrates an example of the first stage of the target tracking method according to an embodiment.

제1 스테이지(105)에서, 타겟 추적 장치는 임시 바운딩 박스를 출력할 수 있다. 이를 위하여 타겟 추적 장치는 특징 추출, 글로벌 상관(313) 및 특징 클리핑(505)를 수행할 수 있다. In the first stage 105 , the target tracking device may output a temporary bounding box. To this end, the target tracking apparatus may perform feature extraction, global correlation 313 , and feature clipping 505 .

The target tracking apparatus may extract features from the target area image 101 and the search area image 103 using a first neural network. The first neural network may include a convolutional neural network φ. The target tracking apparatus may input the target area image Z 101 to the convolutional neural network φ to output the first depth feature φ(Z). The target tracking apparatus may input the search area image X 103 to the convolutional neural network φ to output the second depth feature φ(X). For example, the convolutional neural network φ may be a Siamese convolutional network, and the parameters of the two branches in FIG. 5 may be shared so that the input images are mapped to the same feature space.

The target tracking apparatus may perform the global correlation 313 on the extracted first depth feature φ(Z) and second depth feature φ(X) to output a global response diagram f 501. Based on the global response diagram f 501, the target tracking apparatus may output temporary bounding box information P1 503, which is the first-stage prediction result. The target tracking apparatus may obtain the global response diagram between the first depth feature φ(Z) and the second depth feature φ(X) using Equation 2.

타겟 추적 장치는 글로벌 응답 다이어그램에서 가장 큰 값을 가지는 위치를 임시 바운딩 박스의 위치 정보로서 출력할 수 있다. 타겟 추적 장치는 이전 프레임의 출력 바운딩 박스의 크기를 임시 바운딩 박스의 크기 정보로서 출력할 수 있다. The target tracking device may output the position having the largest value in the global response diagram as position information of the temporary bounding box. The target tracking apparatus may output the size of the output bounding box of the previous frame as size information of the temporary bounding box.
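As a non-limiting illustration, the selection of the temporary bounding box may be sketched as follows. The sketch assumes that coordinates on the response diagram are used directly as the box center; a mapping from response-diagram coordinates back to image coordinates is not described here and is omitted. The function name `temporary_bounding_box` and the sample values are hypothetical.

```python
def temporary_bounding_box(response, prev_size):
    """Return (x1, y1, w1, h1) for the first-stage prediction.

    (x1, y1) is the column and row of the largest value in the 2-D
    response map; (w1, h1) is copied from the previous frame's output box.
    """
    best_row, best_col = 0, 0
    for r, row in enumerate(response):
        for c, value in enumerate(row):
            if value > response[best_row][best_col]:
                best_row, best_col = r, c
    w_prev, h_prev = prev_size
    return (best_col, best_row, w_prev, h_prev)


# Assumed toy response map and previous-frame box size.
p1 = temporary_bounding_box([[0.1, 0.3, 0.2],
                             [0.0, 0.9, 0.4]], prev_size=(40, 60))
```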

The target tracking apparatus may perform feature clipping 505 based on the first-stage prediction result P1 503. The target tracking apparatus may update the second depth feature by extracting a depth feature from the search area image clipped by the feature clipping 505. As a result, in the first stage 105, the target tracking apparatus may output the updated second depth feature φ(X') and the first depth feature φ(Z). The temporary bounding box information P1 503 may be expressed as P1 = (x1, y1, w1, h1), where x1 and y1 are the horizontal and vertical coordinates of the center position of the first-stage temporary bounding box, and w1 and h1 are the width and height of the temporary bounding box.

The target tracking apparatus may clip the search area image X according to the center position and size of the temporary bounding box to obtain a smaller search area image X'. The target tracking apparatus may extract the depth feature of the search area image X' to obtain the updated second depth feature φ(X').
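As a non-limiting illustration, clipping the search area image X to the smaller image X' may be sketched as follows. The sketch assumes that the crop coincides exactly with the temporary bounding box and is truncated at the image border; any margin around the box is not specified here. The function name `clip_search_region` and the sample image are hypothetical.

```python
def clip_search_region(image, box):
    """Crop a 2-D list `image` to the box (x, y, w, h).

    (x, y) is the box center in pixel coordinates and (w, h) its width
    and height. The crop is truncated where it would leave the image.
    """
    x, y, w, h = box
    height, width = len(image), len(image[0])
    left = max(0, x - w // 2)
    top = max(0, y - h // 2)
    right = min(width, left + w)
    bottom = min(height, top + h)
    return [row[left:right] for row in image[top:bottom]]


# Assumed 5 x 6 image whose pixel value encodes its row and column.
image = [[10 * r + c for c in range(6)] for r in range(5)]
x_prime = clip_search_region(image, (3, 2, 2, 2))
```

The depth feature of `x_prime` would then be extracted to obtain the updated second depth feature.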

FIG. 6 illustrates an example of block division methods according to an embodiment.

Referring to FIG. 6, various division methods 600 that may be performed by the target tracking apparatus are shown. The target tracking apparatus may divide the first depth feature 401 or a third depth feature (not shown) into a plurality of local feature blocks. In non-overlapping division 601, the target tracking apparatus may divide the first depth feature 401 or the third depth feature so that the local feature blocks do not overlap. In overlapping division 603, the target tracking apparatus may divide the first depth feature 401 or the third depth feature so that the local feature blocks overlap. In division 605 based on a predetermined block distribution, the target tracking apparatus may divide the first depth feature 401 or the third depth feature according to a preset block distribution.
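As a non-limiting illustration, the non-overlapping and overlapping divisions may be sketched with a single routine in which the stride controls the overlap. The sketch assumes square blocks on a single-channel feature; the function name `split_into_blocks` and the block and stride values are hypothetical, and division based on a predetermined block distribution is not covered.

```python
def split_into_blocks(feature, block, stride):
    """Divide a 2-D list `feature` into square blocks.

    Returns a list of (top, left, patch) tuples. With stride == block
    the blocks do not overlap; with stride < block they overlap.
    """
    h, w = len(feature), len(feature[0])
    blocks = []
    for top in range(0, h - block + 1, stride):
        for left in range(0, w - block + 1, stride):
            patch = [row[left:left + block] for row in feature[top:top + block]]
            blocks.append((top, left, patch))
    return blocks


# Assumed 4 x 4 feature with distinct values.
feature = [[r * 4 + c for c in range(4)] for r in range(4)]
non_overlapping = split_into_blocks(feature, block=2, stride=2)
overlapping = split_into_blocks(feature, block=2, stride=1)
```

For this 4 x 4 feature, the non-overlapping division yields four blocks and the overlapping division yields nine.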

FIG. 7 illustrates an example of a block correlation operation according to an embodiment.

도 7은 설명의 편의를 위해 제1 깊이 특징(401)을 가정하여 도시되었으나, 제3 깊이 특징을 대상으로 블록 상관 연산이 수행될 수도 있다. 도 7을 참조하면, 제1 깊이 특징(401)은 영역 특징 분할이 수행될 수 있다. 제1 깊이 특징(401)은 복수의 로컬 특징 블록으로 분할될 수 있다. Although FIG. 7 is illustrated assuming a first depth feature 401 for convenience of description, a block correlation operation may be performed with respect to the third depth feature. Referring to FIG. 7 , a region feature segmentation may be performed on the first depth feature 401 . The first depth feature 401 may be divided into a plurality of local feature blocks.

The target tracking apparatus may compute the correlation between each local feature block and the second depth feature 402. As a result of the computation, a plurality of local sub-response diagrams (701, 702, 703, 704, 705, 706, 707, 708, 709) may be output. The target tracking apparatus may synthesize the plurality of local sub-response diagrams (701, 702, 703, 704, 705, 706, 707, 708, 709) to obtain a local response diagram 711.

예를 들어, 타겟 추적 장치는 복수의 로컬 특징 블록 중 각 로컬 특징 블록을 타겟 특징 블록 또는 배경 특징 블록으로 분류할 수 있다. 타겟 추적 장치는 타겟 특징 블록에 대응하는 로컬 서브 응답 다이어그램을 획득하고, 배경 특징 블록에 대응하는 로컬 서브 응답 다이어그램을 획득할 수 있다. 타겟 추적 장치는 모든 로컬 서브 응답 다이어그램을 합성하여 로컬 응답 다이어그램(711)을 출력할 수 있다. For example, the target tracking apparatus may classify each local feature block among a plurality of local feature blocks as a target feature block or a background feature block. The target tracking device may obtain a local sub-response diagram corresponding to the target feature block, and obtain a local sub-response diagram corresponding to the background feature block. The target tracking device may output a local response diagram 711 by synthesizing all local sub-response diagrams.

타겟 영역 이미지에는 타겟 영역 이외에도 배경 영역이 존재하고, 배경 영역의 특징은 타겟 추적의 안정성과 정확성에 영향을 미칠 수 있다. 타겟 추적 장치는 도 7과 같은 방식을 통해 타겟 추적의 안정성과 정확성을 높일 수 있다. 타겟 추적 장치는 복수의 로컬 특징 블록을 타겟 특징 블록과 배경 특징 블록으로 분류한 후 합성함으로써 배경에 의한 간섭을 효과적으로 줄일 수 있다.In the target area image, a background area exists in addition to the target area, and the characteristics of the background area may affect stability and accuracy of target tracking. The target tracking apparatus may increase the stability and accuracy of the target tracking through the method shown in FIG. 7 . The target tracking apparatus can effectively reduce background interference by classifying and synthesizing a plurality of local feature blocks into a target feature block and a background feature block.

FIG. 8 illustrates an example of interference suppression and response diagram fusion according to an embodiment.

The target tracking apparatus may obtain a local response diagram by synthesizing a plurality of local sub-response diagrams. The target tracking apparatus may classify each of the plurality of local feature blocks as a target feature block or a background feature block. Specifically, the target tracking apparatus may classify the plurality of local feature blocks (801, 802, 803, 804, 805, 806, 807, 808, 809) as target feature blocks or background feature blocks based on the overlap ratio between the temporary bounding box 801 and each of the local feature blocks.

The target tracking apparatus may classify each local feature block (801, 802, 803, 804, 805, 806, 807, 808, 809) as a target feature block or a background feature block according to the proportion of that local feature block that lies in its overlap with the temporary bounding box 801. For example, with respect to the corrected temporary bounding box 801 on the target area image, a local feature block may be classified as a target feature block when p% or more of its area lies within the temporary bounding box, and as a background feature block when the portion overlapping the temporary bounding box 801 is less than p%. Here, p may be a predetermined threshold.
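As a non-limiting illustration, the classification by overlap ratio may be sketched as follows. The sketch assumes axis-aligned rectangles given as (left, top, right, bottom) corners rather than center and size, and measures the overlap relative to the area of each block; the function names and the value p = 50 are hypothetical.

```python
def overlap_fraction(block, box):
    """Fraction of the block's area that lies inside the box.

    Both rectangles are (left, top, right, bottom) tuples.
    """
    bl, bt, br, bb = block
    xl, xt, xr, xb = box
    inter_w = max(0, min(br, xr) - max(bl, xl))
    inter_h = max(0, min(bb, xb) - max(bt, xt))
    block_area = (br - bl) * (bb - bt)
    return inter_w * inter_h / block_area


def classify_blocks(blocks, box, p):
    """Label each block 'target' if at least p percent of it is in the box."""
    return ["target" if 100.0 * overlap_fraction(b, box) >= p else "background"
            for b in blocks]


# Assumed blocks and temporary bounding box on a 4 x 4 grid.
blocks = [(0, 0, 2, 2), (2, 0, 4, 2), (2, 2, 4, 4)]
labels = classify_blocks(blocks, box=(1, 1, 4, 4), p=50)
```

Here the first block has a quarter of its area inside the box and is labeled as background, while the other two reach the threshold and are labeled as target.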

타겟 추적 장치는 수학식 3을 이용하여 타겟 특징 블록에 대응하는 로컬 서브 응답 다이어그램과 배경 특징 블록에 대응하는 로컬 서브 응답 다이어그램을 합성하여 로컬 응답 다이어그램을 얻을 수 있다. The target tracking apparatus may obtain a local response diagram by synthesizing the local sub-response diagram corresponding to the target feature block and the local sub-response diagram corresponding to the background feature block using Equation (3).

In Equation 3, S is the local response diagram, So is a local sub-response diagram corresponding to a target feature block, Sb is a local sub-response diagram corresponding to a background feature block, no is the number of target feature blocks, and nb is the number of background feature blocks.
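Equation 3 itself is not reproduced in this text. As a non-limiting sketch only, the fusion may be illustrated under the assumption that the sub-response diagrams of the no target feature blocks are averaged and the average of the sub-response diagrams of the nb background feature blocks is subtracted. This form is merely one that is consistent with the symbols listed above and with the stated aim of suppressing background interference; it is an assumption, not the confirmed Equation 3. The function names are hypothetical.

```python
def mean_map(maps):
    """Elementwise mean of a non-empty list of equally sized 2-D lists."""
    n = len(maps)
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) / n for c in range(cols)]
            for r in range(rows)]


def fuse(target_maps, background_maps):
    """Assumed fusion: mean of target maps minus mean of background maps.

    If there are no background maps, the mean of the target maps is
    returned unchanged. At least one target map is required.
    """
    s_o = mean_map(target_maps)
    if not background_maps:
        return s_o
    s_b = mean_map(background_maps)
    return [[a - b for a, b in zip(row_o, row_b)]
            for row_o, row_b in zip(s_o, s_b)]


# Assumed toy sub-response diagrams of size 1 x 2.
s = fuse(target_maps=[[[1.0, 3.0]], [[3.0, 5.0]]],
         background_maps=[[[1.0, 1.0]]])
```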

타겟 추적 장치는 로컬 응답 다이어그램을 기초로 출력 바운딩 박스를 출력할 수 있다. 타겟 추적 장치는 로컬 응답 다이어그램에 따라 임시 바운딩 박스(801)의 위치 오프셋 및 크기 오프셋을 예측할 수 있다. 타겟 추적 장치는 예측된 위치 오프셋 및 크기 오프셋에 따라 출력 바운딩 박스를 출력할 수 있다. The target tracking device may output an output bounding box based on the local response diagram. The target tracking device may predict the position offset and the size offset of the temporary bounding box 801 according to the local response diagram. The target tracking device may output an output bounding box according to the predicted position offset and size offset.

예를 들어, 타겟 추적 장치는 제3 뉴럴 네트워크를 이용하여 로컬 응답 다이어그램을 처리하고, 출력 바운딩 박스의 위치 오프셋과 크기 오프셋을 예측할 수 있다. 제3 뉴럴 네트워크는 위에서 언급한 제1 뉴럴 네트워크 및 제2 뉴럴 네트워크와는 다를 수 있다. 여기서, 출력 바운딩 박스의 예측 결과는 타겟 대상 바운딩 박스의 위치 정보와 크기 정보를 포함할 수 있다. 이하에서, 로컬 응답 다이어그램을 기초로 출력 바운딩 박스를 출력하는 과정은 자가 적응 예측 과정으로 지칭될 수 있다. For example, the target tracking device may process a local response diagram using a third neural network, and predict a position offset and a size offset of an output bounding box. The third neural network may be different from the above-mentioned first neural network and second neural network. Here, the prediction result of the output bounding box may include location information and size information of the target bounding box. Hereinafter, the process of outputting the output bounding box based on the local response diagram may be referred to as a self-adaptive prediction process.

FIG. 9 illustrates an example of self-adaptive prediction according to an embodiment.

The target tracking apparatus may process the local response diagram S 901 using convolutional neural networks 902 and 903. The target tracking apparatus may output an offset D 905 through offset prediction 904. The target tracking apparatus may perform self-adaptive prediction 907 based on the first-stage prediction result P1 906 and the offset D 905. A second-stage prediction result P2 908 may be output as the result of the self-adaptive prediction 907.

The target tracking apparatus may predict an offset D = (dx, dy, dw, dh), which includes a position offset and a size offset. For example, the position offset may be the coordinate offset between the center position coordinates of the second-stage output bounding box and the center position coordinates of the first-stage temporary bounding box, and the size offset may be the size offset between the second-stage output bounding box and a predetermined bounding box.

타겟 추적 장치는 예측된 위치 오프셋과 크기 오프셋에 따라 제2 스테이지의 출력 바운딩 박스의 예측 결과를 얻을 수 있다. 좌표 오프셋의 절대값의 합이 미리 설정된 임계값보다 큰 경우, 타겟 추적 장치는 제1 스테이지의 임시 바운딩 박스의 예측 결과를 제2 스테이지의 출력 바운딩 박스의 예측 결과로서 출력할 수 있다. The target tracking apparatus may obtain the prediction result of the output bounding box of the second stage according to the predicted position offset and the size offset. When the sum of the absolute values of the coordinate offsets is greater than a preset threshold, the target tracking apparatus may output the prediction result of the temporary bounding box of the first stage as the prediction result of the output bounding box of the second stage.

When the sum of the absolute values of the coordinate offsets is less than or equal to the preset threshold, the target tracking apparatus may obtain the prediction result of the second-stage output bounding box by adding the predicted position offset to the center position of the first-stage temporary bounding box and adding the predicted size offset to the size of the predetermined bounding box.

예를 들어, 제1 스테이지의 임시 바운딩 박스의 예측 결과가 P1=(x1, y1, w1, h1)이고, 미리 지정된 바운딩 박스의 크기가 (w0, h0)(너비가 w0, 높이가 h0)인 경우, 제2 스테이지의 출력 바운딩 박스의 예측 결과는 P2=(x1+dx, y1+dy, w0+dw, h0+dh)일 수 있다. For example, the prediction result of the temporary bounding box of the first stage is P1 = (x1, y1, w1, h1), and the size of the predefined bounding box is (w0, h0) (width w0, height h0) In this case, the prediction result of the output bounding box of the second stage may be P2=(x1+dx, y1+dy, w0+dw, h0+dh).
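As a non-limiting illustration, the self-adaptive prediction rule may be sketched as follows. The function name `self_adaptive_prediction`, the threshold value, and the sample numbers are hypothetical; only the decision rule and the form P2 = (x1+dx, y1+dy, w0+dw, h0+dh) follow the description above.

```python
def self_adaptive_prediction(p1, offset, preset_size, threshold):
    """Combine the first-stage box with the predicted offset.

    p1          -- (x1, y1, w1, h1), first-stage temporary bounding box
    offset      -- (dx, dy, dw, dh), predicted position and size offsets
    preset_size -- (w0, h0), size of the predetermined bounding box
    threshold   -- limit on |dx| + |dy| above which the offset is rejected
    """
    x1, y1, w1, h1 = p1
    dx, dy, dw, dh = offset
    w0, h0 = preset_size
    if abs(dx) + abs(dy) > threshold:
        # Offset judged unreliable: keep the first-stage result.
        return p1
    return (x1 + dx, y1 + dy, w0 + dw, h0 + dh)


# Assumed sample values.
p1 = (100, 80, 40, 60)
refined = self_adaptive_prediction(p1, (2, -3, 1, -2),
                                   preset_size=(38, 62), threshold=10)
rejected = self_adaptive_prediction(p1, (9, -8, 1, -2),
                                    preset_size=(38, 62), threshold=10)
```

In the first call the coordinate offsets sum to 5 in absolute value, so the offset is applied; in the second they sum to 17, so the first-stage box is returned unchanged.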

FIG. 10 illustrates an example of the second-stage operation of the target tracking method according to an embodiment.

Through the first stage, the target tracking apparatus may obtain the first depth feature φ(Z) and the updated second depth feature φ(X'). The target tracking apparatus may input the first depth feature φ(Z) and the updated second depth feature φ(X') to a convolutional network 1003. The target tracking apparatus may perform block correlation 1004. The target tracking apparatus may perform interference suppression and fuse the local sub-response diagrams. The target tracking apparatus may obtain a local sub-response diagram 1006. The target tracking apparatus may perform self-adaptive prediction 1007. The target tracking apparatus may output the second-stage prediction result P2.

FIG. 11 illustrates an example of network training according to an embodiment.

타겟 추적 장치는 캐스케이드(cascade) 네트워크(제1 뉴럴 네트워크, 제2 뉴럴 네트워크 및 제3 뉴럴 네트워크를 포함)를 사용하여 타겟 대상을 추적할 수 있다. 캐스케이드 네트워크는 다중 감독 신호를 사용하여 학습될 수 있다. 여기서, 다중 감독 신호는 글로벌 응답 다이어그램, 로컬 응답 다이어그램 및 타겟 바운딩 박스를 포함한다. The target tracking device may track the target object using a cascade network (including a first neural network, a second neural network, and a third neural network). Cascade networks can be trained using multiple supervisory signals. Here, the multi-supervision signal includes a global response diagram, a local response diagram, and a target bounding box.

이하의 훈련(training) 과정에서 수행되는 단계는 추론(inference) 과정에서 일부 유사하게 적용될 수 있다. 다중 감독 신호는 손실 함수의 손실값을 최적화하기 위해 사용될 수 있다. 다중 감독 신호는 반복적인 순환 학습을 통해 네트워크의 매개 변수를 학습하기 위해 사용될 수 있다. The steps performed in the following training process may be similarly applied in some way in the inference process. Multiple supervisory signals can be used to optimize the loss value of the loss function. Multi-supervised signals can be used to learn the parameters of the network through iterative cyclic learning.

다중 감독 신호를 사용한 학습 과정에서, 학습 장치는 템플릿 이미지(1101) 및 검색 영역 이미지(1102)에 대해 제1 스테이지 추적(1101)을 진행하여 글로벌 응답 다이어그램(1103)을 획득할 수 있다. 글로벌 응답 다이어그램에서, 중심 위치로부터의 거리가 특정 임계값 미만인 경우는 +1로 설정되고, 중심 위치로부터의 거리가 특정 임계값보다 큰 경우는 -1로 설정될 수 있다. 학습 장치는 글로벌 응답 다이어그램(1103)을 기초로 대략적인 예측 박스(1105)를 출력할 수 있다. 학습 장치는 제1 스테이지 추적(1102) 결과와 클리핑된 대략적인 예측 박스(1105)를 기초로 공유 특징(1104)를 출력할 수 있다.In the learning process using the multiple supervision signals, the learning apparatus may perform the first stage tracking 1101 on the template image 1101 and the search area image 1102 to obtain the global response diagram 1103 . In the global response diagram, it may be set to +1 when the distance from the central position is less than a specific threshold, and set to -1 when the distance from the central position is greater than a specific threshold. The learning device may output a coarse prediction box 1105 based on the global response diagram 1103 . The learning device may output the shared feature 1104 based on the first stage tracking 1102 result and the clipped coarse prediction box 1105 .
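As a non-limiting illustration, the supervision signal for the global response diagram may be sketched as follows. The sketch assumes a Euclidean distance measured in response-diagram cells and assigns -1 to positions whose distance equals the threshold, a case the description leaves open. The function name `global_response_label` and the sample sizes are hypothetical.

```python
import math


def global_response_label(height, width, center, radius):
    """Build a label map with +1 near the center and -1 elsewhere.

    center is (row, column); positions whose Euclidean distance to the
    center is strictly below radius are labeled +1, all others -1.
    """
    cy, cx = center
    return [[1 if math.hypot(r - cy, c - cx) < radius else -1
             for c in range(width)]
            for r in range(height)]


# Assumed 3 x 3 response diagram with the target at its center.
label = global_response_label(3, 3, center=(1, 1), radius=1.2)
```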

The learning apparatus may proceed with second-stage tracking 1108. The learning apparatus may obtain a segmentation result for the target in the search area image, either from a segmentation algorithm or manually. The learning apparatus may apply a distance transform to the segmentation result and numerically normalize the resulting distance transform map to obtain the supervision signal for the local response diagram 1109.

학습 장치는 자가 적응 예측(1106)을 통해 정확한 예측 결과를 획득할 수 있다. 학습 과정에서, 글로벌 응답 다이어그램, 로컬 응답 다이어그램(1109) 및 타겟 바운딩 박스는 감독 신호로 사용될 수 있고, 반복적인 순환 학습을 통해 손실 함수를 최적화하여 캐스케이드 네트워크의 매개 변수를 학습시킬 수 있다. The learning apparatus may obtain an accurate prediction result through the self-adaptive prediction 1106 . In the learning process, the global response diagram, the local response diagram 1109 and the target bounding box can be used as supervisory signals, and through iterative cyclic learning, the loss function can be optimized to learn the parameters of the cascade network.

예를 들어, 학습 장치는 동일한 비디오 시퀀스에서 추출된 이미지 쌍(템플릿 이미지 및 검색 영역 이미지를 포함)을 제1 스테이지의 뉴럴 네트워크에 입력하여 임시 바운딩 박스 정보를 획득할 수 있다. 학습 장치는 바이너리 크로스 엔트로피(Binary Cross Entropy)를 사용하여 글로벌 응답 다이어그램의 예측값과 실제값 사이의 손실(Loss0)을 계산할 수 있다. For example, the learning apparatus may acquire temporary bounding box information by inputting an image pair (including a template image and a search region image) extracted from the same video sequence into the neural network of the first stage. The learning apparatus may calculate the loss (Loss0) between the predicted value and the actual value of the global response diagram using binary cross entropy.

Based on the coarse prediction result, the learning apparatus may input the shared feature 1104 of the first stage to the neural network of the second stage to obtain the local response diagram 1109. The learning apparatus may obtain the accurate prediction result of the second stage based on the local response diagram. The learning apparatus may use the Kullback-Leibler (KL) divergence to measure the loss (Loss1) between the predicted and actual values of the local response diagram. The learning apparatus may use the L1 distance to measure the loss (Loss2) between the accurately predicted bounding box and the actual box. The learning apparatus may learn the parameters of the neural network by optimizing the loss Loss = Loss0 + a1*Loss1 + a2*Loss2, where a1 and a2 are the weights of the respective losses.
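As a non-limiting illustration, the combined loss may be sketched as follows. The sketch makes several assumptions that are not stated in the description: the global response labels are mapped to {0, 1} and the predictions are probabilities for the binary cross entropy, the KL divergence is taken from the actual distribution to the predicted one, and the small constant `eps` is added for numerical safety. All function names, the weights, and the sample values are hypothetical.

```python
import math

EPS = 1e-12  # assumed constant to avoid log(0)


def binary_cross_entropy(pred, target):
    """Mean binary cross entropy; pred in (0, 1), target in {0, 1}."""
    total = 0.0
    for p, t in zip(pred, target):
        total += t * math.log(p + EPS) + (1 - t) * math.log(1 - p + EPS)
    return -total / len(pred)


def kl_divergence(target, pred):
    """KL(target || pred) for two discrete distributions."""
    return sum(t * math.log((t + EPS) / (p + EPS))
               for t, p in zip(target, pred) if t > 0)


def l1_distance(box_pred, box_true):
    """Sum of absolute differences between two (x, y, w, h) boxes."""
    return sum(abs(a - b) for a, b in zip(box_pred, box_true))


def total_loss(loss0, loss1, loss2, a1, a2):
    """Loss = Loss0 + a1*Loss1 + a2*Loss2."""
    return loss0 + a1 * loss1 + a2 * loss2


# Assumed sample values for one training pair.
loss0 = binary_cross_entropy([0.9, 0.2], [1, 0])
loss1 = kl_divergence([0.5, 0.5], [0.5, 0.5])
loss2 = l1_distance((102, 77, 39, 60), (100, 80, 40, 60))
loss = total_loss(loss0, loss1, loss2, a1=1.0, a2=0.1)
```

In an actual training setup the three terms would be computed on tensors and minimized by gradient descent; the plain-Python version only shows how the terms are combined.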

FIG. 12 compares the effects of the global correlation method, the block correlation method, and the block correlation method combined with interference suppression according to an embodiment.

Referring to FIG. 12, a global correlation response diagram 1201, a block correlation response diagram 1202, and a result 1203 of applying interference suppression to the block correlation response diagram are shown. By performing interference suppression on the basis of block correlation, the target tracking apparatus may effectively extract detailed information about the target and further improve tracking accuracy.

FIG. 13 is a diagram illustrating the configuration of a target tracking apparatus according to an embodiment.

일 실시예에 따르면, 타겟 추적 장치는 적어도 하나의 프로세서를 포함한다. According to one embodiment, the target tracking device comprises at least one processor.

프로세서는 타겟 영역 이미지로부터 제1 깊이 특징을 획득하고, 검색 영역 이미지로부터 제2 깊이 특징을 획득할 수 있다. 프로세서는 제1 깊이 특징 및 제2 깊이 특징 간의 글로벌 응답 다이어그램을 획득할 수 있다. 프로세서는 글로벌 응답 다이어그램을 기초로 임시 바운딩 박스 정보를 획득할 수 있다. 프로세서는 임시 바운딩 박스 정보를 기초로 제2 깊이 특징을 갱신하여 갱신된 제2 깊이 특징을 획득할 수 있다. 프로세서는 제1 깊이 특징을 기초로 복수의 로컬 특징 블록을 획득할 수 있다. 프로세서는 복수의 로컬 특징 블록 및 갱신된 제2 깊이 특징을 기초로 로컬 응답 다이어그램을 획득할 수 있다. 프로세서는 로컬 응답 다이어그램을 기초로 출력 바운딩 박스 정보를 획득할 수 있다.The processor may obtain a first depth feature from the target area image, and obtain a second depth feature from the search area image. The processor may obtain a global response diagram between the first depth feature and the second depth feature. The processor may obtain temporary bounding box information based on the global response diagram. The processor may acquire the updated second depth feature by updating the second depth feature based on the temporary bounding box information. The processor may obtain a plurality of local feature blocks based on the first depth feature. The processor may obtain a local response diagram based on the plurality of local feature blocks and the updated second depth feature. The processor may obtain the output bounding box information based on the local response diagram.

The target tracking method and the target tracking apparatus according to an embodiment have been described above with reference to FIGS. 1 to 13. It should be understood that each operation of the apparatus shown in FIG. 13 may be implemented in software, hardware, firmware, or any combination thereof to perform a specific function. For example, the operations of the target tracking apparatus may be implemented by a dedicated integrated circuit, pure software code, or a module combining software and hardware. For example, the target tracking apparatus may be, but is not limited to, a personal computer, a tablet device, a personal digital assistant, a smartphone, a web application, or another device capable of executing program instructions.

The embodiments described above may be implemented by hardware components, software components, and/or a combination of hardware and software components. For example, the apparatuses, methods, and components described in the embodiments may be implemented using a general-purpose or special-purpose computer, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, a single processing device is sometimes described, but a person of ordinary skill in the art will appreciate that a processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, so as to be interpreted by the processing device or to provide instructions or data to the processing device. Software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored in a computer-readable recording medium.

The method according to an embodiment may be implemented in the form of program instructions executable by various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described above with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on them. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described system, structure, apparatus, circuit, and the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

1. A target tracking method comprising:
obtaining a first depth feature from a target area image and obtaining a second depth feature from a search area image;
obtaining a global response diagram between the first depth feature and the second depth feature;
obtaining temporary bounding box information based on the global response diagram;
obtaining an updated second depth feature by updating the second depth feature based on the temporary bounding box information;
obtaining a plurality of local feature blocks based on the first depth feature;
obtaining a local response diagram based on the plurality of local feature blocks and the updated second depth feature; and
obtaining output bounding box information based on the local response diagram.

2. The method of claim 1,
wherein the obtaining of the plurality of local feature blocks comprises:
obtaining the plurality of local feature blocks by dividing the first depth feature or a third depth feature additionally extracted from the first depth feature, and
wherein the obtaining of the local response diagram comprises:
obtaining the local response diagram based on a correlation between the plurality of local feature blocks and either the second depth feature or a fourth depth feature additionally extracted from the updated second depth feature.

3. The method of claim 2,
wherein the obtaining of the local response diagram based on the correlation comprises:
obtaining a plurality of local sub-response diagrams based on the correlation between each of the plurality of local feature blocks and the second depth feature or the fourth depth feature; and
synthesizing the plurality of local sub-response diagrams to obtain the local response diagram.
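
The correlation and synthesis steps of claim 3 can be illustrated with a minimal sketch. It assumes single-channel features stored as nested lists and a weighted sum as the synthesis rule; the names `cross_correlate` and `synthesize` and the weighting scheme are hypothetical and are not taken from the disclosure. A real tracker would typically also align each sub-response according to its block's position before summing, which is omitted here.

```python
from typing import List

Grid = List[List[float]]


def cross_correlate(search: Grid, block: Grid) -> Grid:
    """Slide one local feature block over the search feature (valid mode)."""
    sh, sw = len(search), len(search[0])
    bh, bw = len(block), len(block[0])
    out: Grid = []
    for y in range(sh - bh + 1):
        row = []
        for x in range(sw - bw + 1):
            # Sum of element-wise products inside the current window.
            row.append(sum(search[y + i][x + j] * block[i][j]
                           for i in range(bh) for j in range(bw)))
        out.append(row)
    return out


def synthesize(sub_responses: List[Grid], weights: List[float]) -> Grid:
    """Combine equally sized sub-response diagrams by a weighted sum
    (an assumed synthesis rule, chosen only for illustration)."""
    h, w = len(sub_responses[0]), len(sub_responses[0][0])
    return [[sum(wt * r[y][x] for r, wt in zip(sub_responses, weights))
             for x in range(w)] for y in range(h)]
```

Each local feature block yields one sub-response diagram from `cross_correlate`, and `synthesize` merges them into a single local response diagram.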

4. The method of claim 3,
wherein the synthesizing of the plurality of local sub-response diagrams to obtain the local response diagram comprises:
classifying each of the plurality of local feature blocks as a target feature block or a background feature block; and
synthesizing the plurality of local sub-response diagrams based on a result of the classifying to obtain the local response diagram.

5. The method of claim 4,
wherein the classifying comprises:
classifying each of the plurality of local feature blocks as a target feature block or a background feature block based on an overlap ratio between the temporary bounding box and the local feature block.
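
The overlap-based classification of claim 5 might look like the following sketch. The claim does not define how the overlap ratio is computed or which threshold separates the two classes, so the ratio used here (intersection area divided by block area), the default threshold of 0.5, and the names `overlap_ratio` and `classify_block` are all assumptions made for illustration.

```python
from typing import Tuple

# Axis-aligned box as (x_min, y_min, x_max, y_max); blocks are assumed
# to have positive area.
Box = Tuple[float, float, float, float]


def overlap_ratio(block: Box, bbox: Box) -> float:
    """Fraction of the block's area covered by the temporary bounding box."""
    iw = min(block[2], bbox[2]) - max(block[0], bbox[0])
    ih = min(block[3], bbox[3]) - max(block[1], bbox[1])
    if iw <= 0 or ih <= 0:
        return 0.0  # no intersection
    block_area = (block[2] - block[0]) * (block[3] - block[1])
    return (iw * ih) / block_area


def classify_block(block: Box, bbox: Box, threshold: float = 0.5) -> str:
    """Label a block 'target' when it overlaps the box enough (assumed rule)."""
    return "target" if overlap_ratio(block, bbox) >= threshold else "background"
```

The resulting labels could then be used, for example, to weight or exclude sub-response diagrams during synthesis.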

6. The method of claim 3,
wherein the output bounding box information includes a coordinate offset between center position coordinates of the temporary bounding box, which are included in the temporary bounding box information, and center position coordinates of the output bounding box, and a size offset between a size of the output bounding box and a preset size, and
wherein the obtaining of the output bounding box information comprises:
when a sum of absolute values of the coordinate offset is greater than a threshold value, outputting the temporary bounding box information as the output bounding box information; and
when the sum of the absolute values of the coordinate offset is less than or equal to the threshold value, outputting, as the output bounding box information, a result of adding the coordinate offset to the center position of the temporary bounding box and a result of adding the size offset to the size of the temporary bounding box.
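
The two branches of claim 6 reduce to a small decision rule, sketched below. The function name `refine_box` and the tuple layout (center as `(x, y)`, size as `(width, height)`) are hypothetical; the claim only specifies the comparison against a threshold and the two additions.

```python
from typing import Tuple

Pair = Tuple[float, float]


def refine_box(temp_center: Pair, temp_size: Pair,
               coord_offset: Pair, size_offset: Pair,
               threshold: float) -> Tuple[Pair, Pair]:
    """Apply the predicted offsets only when the coordinate offset is small."""
    # Sum of absolute values of the coordinate offset components.
    if abs(coord_offset[0]) + abs(coord_offset[1]) > threshold:
        # Offset too large: keep the temporary bounding box unchanged.
        return temp_center, temp_size
    center = (temp_center[0] + coord_offset[0], temp_center[1] + coord_offset[1])
    size = (temp_size[0] + size_offset[0], temp_size[1] + size_offset[1])
    return center, size
```

Falling back to the temporary bounding box when the offset is large keeps an unreliable local refinement from moving the result far from the globally estimated position.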

7. The method of claim 6,
wherein the obtaining of the temporary bounding box information comprises:
outputting coordinates having the greatest correlation in the global response diagram of a current frame as the center position coordinates of the temporary bounding box of the current frame, and outputting the size of the output bounding box estimated in a previous frame as the size of the temporary bounding box of the current frame.
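
A minimal sketch of claim 7 follows, assuming the global response diagram is a non-empty nested list indexed as `response[y][x]`. The name `temporary_box` is hypothetical, and the mapping from response-diagram coordinates back to image coordinates (which depends on the network stride) is deliberately left out.

```python
from typing import List, Tuple


def temporary_box(global_response: List[List[float]],
                  prev_size: Tuple[float, float]
                  ) -> Tuple[Tuple[int, int], Tuple[float, float]]:
    """Return (center, size): the peak of the response and the previous size."""
    best_value = global_response[0][0]
    best_xy = (0, 0)
    for y, row in enumerate(global_response):
        for x, value in enumerate(row):
            if value > best_value:  # the first maximum wins on ties
                best_value = value
                best_xy = (x, y)
    # The size is carried over from the previous frame's output bounding box.
    return best_xy, prev_size
```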

8. The method of claim 2,
wherein the obtaining of the plurality of local feature blocks by dividing the first depth feature or the third depth feature additionally extracted from the first depth feature comprises:
dividing the first depth feature or the third depth feature such that the plurality of local feature blocks do not overlap one another, such that the plurality of local feature blocks overlap one another, or based on a preset block distribution.
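
The non-overlapping and overlapping divisions of claim 8 can both be expressed with a sliding window, as in the sketch below. Square blocks, a single-channel feature stored as a nested list, and the parameter names `block` and `stride` are assumptions for illustration; division based on a preset block distribution is not covered.

```python
from typing import List

Grid = List[List[float]]


def split_blocks(feature: Grid, block: int, stride: int) -> List[Grid]:
    """Cut square blocks out of a feature map.

    stride == block gives non-overlapping blocks;
    stride < block gives overlapping blocks.
    """
    h, w = len(feature), len(feature[0])
    blocks: List[Grid] = []
    for y in range(0, h - block + 1, stride):
        for x in range(0, w - block + 1, stride):
            blocks.append([row[x:x + block] for row in feature[y:y + block]])
    return blocks
```

For a 4x4 feature, `split_blocks(feature, 2, 2)` produces four disjoint 2x2 blocks, while `split_blocks(feature, 2, 1)` produces nine blocks that share rows and columns with their neighbours.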

9. A computer program stored in a computer-readable recording medium, in combination with hardware, to execute the method of any one of claims 1 to 8.

10. A target tracking apparatus comprising:
at least one processor,
wherein the processor is configured to:
obtain a first depth feature from a target area image and obtain a second depth feature from a search area image;
obtain a global response diagram between the first depth feature and the second depth feature;
obtain temporary bounding box information based on the global response diagram;
obtain an updated second depth feature by updating the second depth feature based on the temporary bounding box information;
obtain a plurality of local feature blocks based on the first depth feature;
obtain a local response diagram based on the plurality of local feature blocks and the updated second depth feature; and
obtain output bounding box information based on the local response diagram.