KR20240045055A - Method and electronic device for object tracking

Info

Publication number: KR20240045055A
Application number: KR1020230000901A
Authority: KR
Inventors: 주재용
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2022-09-29
Filing date: 2023-01-03
Publication date: 2024-04-05

Abstract

The present disclosure provides a method of tracking at least one object. A method of tracking at least one object according to an embodiment of the present disclosure may include: obtaining an image; extracting, from the image, a feature map for performing a plurality of tasks related to object tracking; extracting, using the extracted feature map, location information indicating the location of the at least one object, an identification feature that enables the at least one object to be identified, and an angle of the body direction in which the body of the at least one object faces; and tracking the at least one object based on the location information, the identification feature, and the angle of the body direction of the at least one object.

Description

METHOD AND ELECTRONIC DEVICE FOR OBJECT TRACKING

Embodiments of the present disclosure relate to a method of tracking an object, an electronic device for tracking an object, and a computer-readable recording medium on which a program for performing the method of tracking an object on a computer is recorded.

In the field of computer vision, object tracking technology is being actively researched. Object tracking is a technology that detects at least one object in a video or image sequence acquired by an electronic device and simultaneously tracks the path of each detected object. Tracking objects simultaneously requires detecting at least one object and matching the detected objects with the objects that were already being tracked.

During object tracking, occlusion may occur, or an object may leave the camera's field of view and later reappear. Types of occlusion include occlusion in which one part of an object hides another part of the same object, occlusion between objects, and occlusion caused by structures in the background.

According to one aspect of the present disclosure, a method of tracking at least one object is provided. The method includes obtaining an image. The method includes extracting, from the image, a feature map for performing a plurality of tasks related to object tracking. The method includes extracting, using the extracted feature map, location information indicating the location of the at least one object, an identification feature that enables the at least one object to be identified, and an angle of the body direction in which the body of the at least one object faces. The method includes tracking the at least one object based on the location information, the identification feature, and the angle of the body direction of the at least one object.

According to one aspect of the present disclosure, an electronic device for tracking at least one object is provided. The electronic device includes a communication interface, a memory storing at least one instruction, and at least one processor that executes the at least one instruction stored in the memory. By executing the at least one instruction, the at least one processor may obtain an image. By executing the at least one instruction, the at least one processor may extract, from the image, a feature map for performing a plurality of tasks related to object tracking. By executing the at least one instruction, the at least one processor may extract, using the extracted feature map, location information indicating the location of the at least one object, an identification feature that enables the at least one object to be identified, and an angle of the body direction in which the body of the at least one object faces. By executing the at least one instruction, the at least one processor may track the at least one object based on the location information, the identification feature, and the angle of the body direction of the at least one object.

According to one aspect of the present disclosure, a computer-readable recording medium is provided on which a program is recorded for performing, on a computer, any one of the methods of tracking at least one object described above and below.

FIG. 1 is a diagram illustrating a method by which an electronic device tracks an object using tracking information of the object, according to an embodiment of the present disclosure.
FIG. 2 is a flowchart illustrating a method by which an electronic device tracks an object, according to an embodiment of the present disclosure.
FIG. 3 is a flowchart illustrating a process of tracking an object using an object tracking model, according to an embodiment of the present disclosure.
FIG. 4 is an exemplary diagram illustrating training image data for an object tracking model, according to an embodiment of the present disclosure.
FIG. 5A is a diagram illustrating a method of extracting location information, an identification feature, and a body direction angle of an object using the respective heads of an object tracking model, according to an embodiment of the present disclosure.
FIG. 5B is a diagram illustrating a method of extracting location information of an object using the detection head of an object tracking model, according to an embodiment of the present disclosure.
FIG. 6 is a diagram illustrating object detection information and object tracking information, according to an embodiment of the present disclosure.
FIG. 7 is a flowchart illustrating a data association process, according to an embodiment of the present disclosure.
FIG. 8A is a flowchart illustrating a first data association process, according to an embodiment of the present disclosure.
FIG. 8B is an exemplary diagram illustrating the first data association process, according to an embodiment of the present disclosure.
FIG. 9 is a flowchart illustrating a second data association process, according to an embodiment of the present disclosure.
FIG. 10 is a flowchart illustrating a third data association process, according to an embodiment of the present disclosure.
FIG. 11A is a flowchart illustrating a tracking information management process, according to an embodiment of the present disclosure.
FIG. 11B is a diagram illustrating a process of updating tracking information of an object, according to an embodiment of the present disclosure.
FIG. 12A is a flowchart illustrating a process of updating the tracking state of an object, according to an embodiment of the present disclosure.
FIG. 12B is a flowchart illustrating a process of updating the tracking state of an object, according to an embodiment of the present disclosure.
FIG. 13 is a block diagram illustrating the configuration of an electronic device, according to an embodiment of the present disclosure.
FIG. 14 is a block diagram illustrating the configuration of an electronic device, according to an embodiment of the present disclosure.

In the present disclosure, the expression "at least one of a, b, or c" may refer to "a", "b", "c", "a and b", "a and c", "b and c", "all of a, b, and c", or variations thereof.

The terms used in the present disclosure are, as far as possible, general terms in wide current use, selected in consideration of their functions in the present disclosure. However, these terms may vary depending on the intention of those skilled in the art, legal precedent, the emergence of new technologies, and the like. In certain cases, a term has been arbitrarily selected by the applicant, in which case its meaning is described in detail in the relevant part of the description. Therefore, the terms used in the present disclosure should be defined based on the meaning of each term and the overall content of the present disclosure, rather than on the name of the term alone.

Singular expressions may include plural expressions unless the context clearly indicates otherwise. Terms used herein, including technical or scientific terms, may have the same meaning as commonly understood by a person of ordinary skill in the technical field described in this specification. In addition, terms including ordinal numbers, such as "first" or "second", may be used in this specification to describe various components, but the components should not be limited by these terms. These terms are used only to distinguish one component from another.

Throughout the specification, when a part is said to "include" a component, this means that, unless specifically stated otherwise, the part may further include other components rather than excluding them. In addition, terms such as "unit" and "module" used in the specification refer to a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Functions related to artificial intelligence according to the present disclosure are performed by a processor and a memory. The processor may consist of one or more processors. The one or more processors may be a general-purpose processor such as a CPU, an AP, or a DSP (Digital Signal Processor), a graphics-dedicated processor such as a GPU or a VPU (Vision Processing Unit), or an artificial-intelligence-dedicated processor such as an NPU. The one or more processors control the processing of input data according to predefined operation rules or an artificial intelligence model stored in the memory. Alternatively, when the one or more processors are artificial-intelligence-dedicated processors, they may be designed with a hardware structure specialized for processing a specific artificial intelligence model.

The predefined operation rules or the artificial intelligence model are characterized by being created through learning. Here, being created through learning means that a basic artificial intelligence model is trained by a learning algorithm on a large amount of training data, so that predefined operation rules or an artificial intelligence model configured to perform a desired characteristic (or purpose) is created. Such learning may be performed on the device itself on which the artificial intelligence according to the present disclosure runs, or through a separate server and/or system. Examples of learning algorithms include supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning, but are not limited to these examples.

An artificial intelligence model may be composed of a plurality of neural network layers. Each of the neural network layers has a plurality of weight values and performs its neural network operation by computing on the output of the previous layer and its weights. The weights of the neural network layers may be optimized by the training of the artificial intelligence model. For example, the weights may be updated so that the loss value or cost value obtained from the artificial intelligence model during training is reduced or minimized. An artificial neural network may include a deep neural network (DNN), for example a CNN (Convolutional Neural Network), a DNN (Deep Neural Network), an RNN (Recurrent Neural Network), an RBM (Restricted Boltzmann Machine), a DBN (Deep Belief Network), a BRDNN (Bidirectional Recurrent Deep Neural Network), or Deep Q-Networks, but is not limited to these examples.

Embodiments of the present disclosure are described below in detail with reference to the accompanying drawings so that a person of ordinary skill in the art to which the present invention pertains can readily practice them. However, the present disclosure may be implemented in many different forms and is not limited to the embodiments described herein. To describe the present disclosure clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification. In addition, the reference numerals used in each drawing serve only to describe that drawing; different reference numerals used in different drawings are not intended to indicate different elements.

Hereinafter, the present disclosure is described in detail with reference to the accompanying drawings.

FIG. 1 is a diagram illustrating a method by which an electronic device tracks an object using tracking information of the object, according to an embodiment of the present disclosure.

Referring to FIG. 1, an electronic device 2000 according to an embodiment of the present disclosure may track at least one object using tracking information of the object. In one embodiment, tracking an object means that the electronic device 2000 acquires images in real time, detects the object in the acquired images, and tracks the path of the object. For example, the electronic device 2000 may acquire images in real time and detect the object in every frame to track its path. The acquisition time of each image may be recorded in the electronic device 2000 on a per-frame basis. For example, each image may include a timestamp, and the electronic device 2000 may record the acquisition time of the image based on the timestamp. According to one embodiment, when a plurality of objects are detected in the images acquired in real time, the electronic device 2000 may track the path of each object simultaneously. In this case, the electronic device 2000 may match the detected objects with the objects already being tracked and update the tracking information of the objects.

In one embodiment, the tracking information of an object may include, but is not limited to, the object class, the ID of the object, the path of the object, the identification feature list of the object, and the last matching time of the object. In one embodiment, the last matching time of an object may mean the acquisition time of the image in which the object was last matched with one of the objects being tracked. The items included in the tracking information are described in detail later with reference to FIG. 6. The electronic device 2000 may store a tracking database in the memory. The tracking database according to one embodiment may include the tracking information of at least one object being tracked by the electronic device 2000.
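As a non-limiting illustration, the tracking information and tracking database described above could be laid out as follows. This is a minimal sketch, not the disclosed implementation: the names `TrackInfo` and `tracking_db`, the bounding-box tuple layout, and the choice to store the path as a list of boxes are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Assumed box layout for illustration: (x, y, width, height) in pixels.
BBox = Tuple[float, float, float, float]


@dataclass
class TrackInfo:
    """Hypothetical record holding the tracking information of one object."""
    object_class: str                      # e.g. "person", "vehicle", "animal"
    object_id: int                         # ID assigned to the tracked object
    path: List[BBox] = field(default_factory=list)  # assumed path representation
    # Identification feature list: the latest feature plus one slot per
    # representative body direction angle (slots may be missing).
    latest_feature: Optional[List[float]] = None
    features_by_angle: Dict[int, List[float]] = field(default_factory=dict)
    last_matched_at: Optional[float] = None  # acquisition time of last matched image


# Tracking database keyed by object ID, kept in memory.
tracking_db: Dict[int, TrackInfo] = {}
tracking_db[5] = TrackInfo(object_class="person", object_id=5)
```

Keying the database by object ID keeps the lookup of an existing track simple once a detection has been matched to that ID.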

An object according to one embodiment may be an object whose identification features differ depending on its body direction. For example, object classes may include, but are not limited to, people, vehicles, and animals. Taking the case where the object class is a person as an example, the front, side, and back views of a person are each distinct, and accordingly the identification features for the front, side, and back views may also be distinct.

In one embodiment, an identification feature means a feature vector that can distinguish individual objects within the same class. For example, the electronic device 2000 may detect objects in images acquired in real time and extract identification features for the detected objects. In this case, the identification features that the electronic device 2000 extracts from the images for the same object may have high similarity. For example, identification features of the same object may have a cosine similarity to one another that is close to 1. In contrast, identification features of different objects may have low similarity.
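For illustration, the cosine similarity between two identification features can be computed as in the sketch below. The function name and the example vectors are assumptions for the example; the disclosure does not fix the dimensionality of the feature vector.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between feature vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


# Vectors pointing in the same direction score close to 1;
# orthogonal vectors score 0.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Because the measure depends only on direction, two features of the same object that differ in magnitude still compare as highly similar.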

Referring to FIG. 1, the electronic device 2000 may detect a first object 110 whose ID is 5. The electronic device 2000 according to one embodiment may identify the first object 110 using the identification feature list 130 of the first object 110, which is included in the tracking information of the first object 110. In one embodiment, the identification feature list 130 means a list that includes the latest identification feature of the object and at least one identification feature corresponding to each representative angle of the body direction. In one embodiment, representative angles mean angles spaced at a predetermined interval, set so that the body direction angle can be extracted by classification when an object is detected. For example, when the electronic device 2000 classifies the body direction angle into 10 classes, the representative angles of the body direction may be set at intervals of 36 degrees. Hereinafter, for convenience of description, the case in which the electronic device 2000 classifies the body direction angle into four classes, with the representative angles set to 0, 90, 180, and 270 degrees, is used as an example.
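The relationship between the number of classes and the representative angles can be sketched as follows. The disclosure obtains the angle by classification; the helper `nearest_representative_angle`, which snaps a continuous angle to the closest representative angle, is only an assumed illustration of how the classes partition the circle, and both function names are hypothetical.

```python
from typing import List


def representative_angles(num_classes: int) -> List[float]:
    """Evenly spaced representative angles in degrees, starting at 0."""
    step = 360.0 / num_classes
    return [i * step for i in range(num_classes)]


def nearest_representative_angle(angle_deg: float, num_classes: int) -> float:
    """Snap an angle in degrees to the closest representative angle."""
    step = 360.0 / num_classes
    # The modulo on the index wraps angles near 360 degrees back to 0.
    index = round((angle_deg % 360.0) / step) % num_classes
    return index * step


four_classes = representative_angles(4)    # 0, 90, 180, 270 degrees
ten_classes = representative_angles(10)    # spaced 36 degrees apart
```

With four classes, an angle of 350 degrees maps to the 0-degree class rather than to 270 degrees, because the classes wrap around the circle.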

For example, the identification feature list may include the identification features corresponding to all of the representative angles, or only those corresponding to some of them. If, while the first object 110 is being tracked by the electronic device 2000, the first object 110 has not been detected with a body direction angle of 90 degrees, the identification feature list 130 of the first object 110 may not include an identification feature corresponding to 90 degrees. However, if the first object 110 is later detected with a body direction angle of 90 degrees, the electronic device 2000 may use the extracted identification feature to update the identification feature corresponding to 90 degrees in the identification feature list 130 of the first object 110.

In one embodiment, when detecting the first object 110, the electronic device 2000 may detect the body direction angle of the first object 110. For example, for the first object 110 shown in FIG. 1, the body direction angle may be detected as 180 degrees. The electronic device 2000 according to one embodiment may use the extracted identification feature to update, in the identification feature list 130 of the first object 110, both the latest identification feature and the identification feature corresponding to the detected body direction angle of the first object 110. For example, when the body direction angle of the first object 110 is detected as 180 degrees, the electronic device 2000 may use the extracted identification feature to update the latest identification feature and the identification feature corresponding to 180 degrees in the identification feature list 130 of the first object 110.
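The update rule described above, in which one extracted feature overwrites both the latest slot and the slot of the detected angle, can be sketched as follows. The class name `IdentificationFeatureList` and its attributes are hypothetical, and overwriting the stored feature outright is an assumption; this passage does not specify whether the stored and new features are blended.

```python
from typing import Dict, List, Optional, Sequence


class IdentificationFeatureList:
    """Hypothetical container: latest feature plus one slot per representative angle."""

    def __init__(self, angles: Sequence[int] = (0, 90, 180, 270)) -> None:
        self.latest: Optional[List[float]] = None
        # A slot stays None until the object is detected at that angle.
        self.by_angle: Dict[int, Optional[List[float]]] = {a: None for a in angles}

    def update(self, feature: Sequence[float], angle: int) -> None:
        """Store the feature as the latest one and in the slot for the given angle."""
        if angle not in self.by_angle:
            raise ValueError(f"{angle} is not a representative angle")
        self.latest = list(feature)
        self.by_angle[angle] = list(feature)


features = IdentificationFeatureList()
features.update([0.1, 0.9], angle=180)  # object detected facing 180 degrees
```

After this call the 90-degree slot is still empty, matching the case in which the object has not yet been observed at that angle.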

According to one embodiment, the first object 110 may disappear from the screen after being detected by the electronic device 2000 in an acquired image. After disappearing from the screen, the first object 110 may reappear on the screen after a certain period. When the first object 110 reappears on the screen, its body direction angle may differ from the body direction angle at the time it was last detected. When the body direction of the first object 110 changes, the appearance of the first object 110 may change. If the body direction of the first object 110 is not taken into account, the changed appearance may cause low similarity between the identification feature of the first object 110 extracted from the acquired image and the identification feature of the first object 110 from the time it was last detected, even though they belong to the same object, which may make object tracking difficult.

The electronic device 2000 according to one embodiment may detect a second object 120. In this case, the electronic device 2000 may detect the body direction angle of the second object 120. For example, for the second object 120 shown in FIG. 1, the body direction angle may be detected as 0 degrees. The electronic device 2000 may compare the identification feature extracted for the second object 120 with the identification features corresponding to 0 degrees in the identification feature lists included in the tracking information of each of the objects being tracked. If the identification feature extracted for the second object 120 is most similar to the identification feature corresponding to 0 degrees in the identification feature list 130 of the first object 110, the electronic device 2000 may determine that the second object 120 is the first object 110 and update the tracking information of the first object 110. For example, the electronic device 2000 may use the identification feature extracted from the acquired image to update the latest identification feature and the identification feature corresponding to 0 degrees in the identification feature list 130 of the first object 110.
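The comparison described above can be sketched as a search over the tracked objects for the stored feature, at the detected angle, that is most similar to the newly extracted feature. This is a simplified sketch under stated assumptions: the function name `match_by_angle`, the dictionary layout of `tracks`, and the `min_similarity` parameter are hypothetical, and cosine similarity is used as the similarity measure.

```python
import math
from typing import Dict, List, Optional, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def match_by_angle(
    feature: Sequence[float],
    angle: int,
    tracks: Dict[int, Dict[int, List[float]]],
    min_similarity: float = 0.0,  # assumed cut-off, not taken from the disclosure
) -> Optional[int]:
    """Return the ID of the track whose feature at `angle` is most similar."""
    best_id, best_sim = None, min_similarity
    for track_id, by_angle in tracks.items():
        stored = by_angle.get(angle)
        if stored is None:
            continue  # this track has never been seen at the detected angle
        sim = cosine_similarity(feature, stored)
        if sim > best_sim:
            best_id, best_sim = track_id, sim
    return best_id


# Track 5 has features for 0 and 180 degrees; track 7 only for 0 degrees.
tracks = {5: {0: [1.0, 0.0], 180: [0.0, 1.0]}, 7: {0: [0.0, 1.0]}}
matched_id = match_by_angle([0.9, 0.1], angle=0, tracks=tracks)
```

Here the detection facing 0 degrees is compared only with the 0-degree features of tracks 5 and 7, and track 5 is selected because its stored feature is the closer of the two.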

As described above, storing identification features for each object according to body direction angle, and comparing identification features that correspond to the same body direction angle during tracking, may provide various effects, including a higher identification rate for objects. Hereinafter, a method by which the electronic device 2000 tracks an object is described in detail.

FIG. 2 is a flowchart illustrating a method by which an electronic device tracks an object, according to an embodiment of the present disclosure.

In step S210, the electronic device 2000 may acquire an image. The electronic device 2000 according to one embodiment may capture images using a camera included in the electronic device 2000 and acquire the images in real time. The camera may be installed inside the electronic device 2000 or connected to it externally. In addition, the electronic device 2000 according to one embodiment may acquire images in real time from an external device through a communication interface included in the electronic device 2000. For example, the electronic device 2000 may receive images acquired by an external camera in real time through the communication interface.

According to one embodiment, the acquired image may include at least one object to be tracked. The acquired image may also include objects of different classes.

In step S220, the electronic device 2000 may extract a feature map from the image. The feature map according to an embodiment may be a feature map for performing a plurality of tasks related to object tracking. According to an embodiment, the electronic device 2000 may extract the feature map from the image through a backbone network of an object tracking model. In an embodiment, the object tracking model is a multi-task artificial intelligence model for tracking objects and may include a backbone network, a detection head, an identification head, and a body direction head. The backbone network and each of the heads may include at least one layer. The object tracking model is described in detail below with reference to FIG. 3.

In step S230, the electronic device 2000 may use the extracted feature map to extract location information of at least one object, an identification feature of the at least one object, and a body direction angle of the at least one object. In an embodiment, the location information of an object may refer to a bounding box, that is, a rectangle representing the position and size of the object within the acquired image. According to an embodiment, the location information, the identification feature, and the body direction angle of the at least one object may be extracted by the detection head, the identification head, and the body direction head of the object tracking model, respectively.

According to an embodiment, the identification feature and the body direction angle of the at least one object may be extracted based on the location information of the at least one object. A method of extracting the identification feature and the body direction angle based on the location information is described in detail below with reference to FIG. 5A.

In step S240, the electronic device 2000 may track the at least one object based on the extracted location information, identification feature, and body direction angle of the at least one object. According to an embodiment, tracking the at least one object based on these extracted values may include a data association process and a tracking information management process.

In an embodiment, data association refers to a process in which the electronic device 2000 compares each detected object with the objects that were already being tracked and matches it with one of them. In an embodiment, two objects being matched means that the electronic device 2000 determines the two objects to be the same object. The electronic device 2000 may perform the data association process based on the extracted location information, identification feature, and body direction angle of the at least one object.
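As an illustration of the matching idea, the following is a minimal sketch that compares a detected object's identification feature only against features stored for the same body direction angle. It is not the disclosed data association process, which also uses location information and is described with reference to later figures. The cosine similarity measure, the `threshold` value, the `tracks` dictionary layout, and the function names are all assumptions made for this example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0

def match_detection(det_feature, det_angle, tracks, threshold=0.5):
    """Return the id of the best-matching tracked object, or None.

    tracks: {track_id: {angle: stored identification feature}} (hypothetical layout)
    Only features stored for the same body direction angle are compared.
    """
    best_id, best_sim = None, threshold
    for track_id, features_by_angle in tracks.items():
        stored = features_by_angle.get(det_angle)
        if stored is None:
            continue  # nothing stored for this angle, so no comparison
        sim = cosine_similarity(det_feature, stored)
        if sim > best_sim:
            best_id, best_sim = track_id, sim
    return best_id

tracks = {
    1: {0: [1.0, 0.0, 0.0], 90: [0.0, 1.0, 0.0]},
    2: {0: [0.0, 0.0, 1.0]},
}
matched = match_detection([0.9, 0.1, 0.0], 0, tracks)
```

Here the detection at 0 degrees matches track 1. A detection whose angle has no stored feature in any track returns `None`, which a fuller implementation might handle with a fallback comparison.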

According to an embodiment, in the data association process the electronic device 2000 may perform a first data association process or a second data association process depending on whether the tracked object being compared with a detected object is in an active tracking state, and may perform a third data association process when the two objects do not match each other. In an embodiment, an object in the active tracking state is a tracked object that was matched within a preset period before the image acquisition time; that is, an object for which the period from its last matching time to the image acquisition time is within the preset period. Conversely, an object in an inactive tracking state is a tracked object that was not matched within the preset period before the image acquisition time. The data association process is described in detail below with reference to FIGS. 7 to 10.

According to an embodiment, the electronic device 2000 may perform the tracking information management process after the data association process. The electronic device 2000 may update the identification feature list and the last matching time included in the tracking information, or may start tracking a new object. In addition, based on the last matching time of an object, the electronic device 2000 may determine whether the object is in the active tracking state, update the object to the inactive tracking state, or end tracking of the object. The tracking information management process is described in detail below with reference to FIGS. 11 and 12.
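The state update based on the last matching time can be sketched as follows. This is only an illustration under stated assumptions: time is counted in frames, and both `inactive_after` and `terminate_after` are hypothetical values (the text specifies a preset period for the active state but gives no concrete numbers or termination rule in this section). The dictionary layout and function name are likewise invented for the example.

```python
def update_track_states(tracks, now, inactive_after=30, terminate_after=300):
    """Update or drop tracks based on the time since each track's last match.

    tracks: {track_id: {"last_match": frame_index, "state": "active" | "inactive"}}
    Both periods are hypothetical values measured in frames.
    """
    for track_id in list(tracks):
        elapsed = now - tracks[track_id]["last_match"]
        if elapsed > terminate_after:
            del tracks[track_id]  # end tracking of this object
        elif elapsed > inactive_after:
            tracks[track_id]["state"] = "inactive"
        else:
            tracks[track_id]["state"] = "active"
    return tracks

tracks = {
    1: {"last_match": 100, "state": "active"},
    2: {"last_match": 60, "state": "active"},
    3: {"last_match": -300, "state": "inactive"},
}
update_track_states(tracks, now=100)
```

After the call, track 1 stays active, track 2 becomes inactive, and track 3 is removed.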

FIG. 3 is a flowchart illustrating a process of tracking an object using an object tracking model according to an embodiment of the present disclosure.

Referring to FIG. 3, the electronic device 2000 may track an object using the object tracking model 320.

In step S310, the electronic device 2000 may input the acquired image into the object tracking model 320. The electronic device 2000 may store the object tracking model 320 in memory and use it, or may use the object tracking model 320 stored in an external server. When the electronic device 2000 uses the object tracking model 320 stored in the external server, the electronic device 2000 may transmit the acquired image to the external server through the communication interface. The external server may input the image received from the electronic device 2000 into the object tracking model 320 and output data including the location information, identification feature, and body direction angle of at least one object. The electronic device 2000 may receive the output data of the object tracking model 320 from the external server. Hereinafter, for convenience of description, it is assumed that the electronic device 2000 stores the object tracking model 320 in memory and uses the stored model.

The object tracking model 320 according to an embodiment may include a backbone network 322, a detection head 324, an identification head 326, and a body direction head 328. The object tracking model 320 is a multi-task artificial intelligence model for object tracking and may perform an object location detection task, an object identification task, and an object body direction detection task in parallel. The backbone network 322 and each of the heads may include at least one layer.

According to an embodiment, the backbone network 322 may extract a feature map from the image. The backbone network 322 may have a CNN structure including a plurality of layers. The extracted feature map may be used for the object location detection task, the object identification task, and the object body direction detection task. That is, the extracted feature map may be the input data of the detection head 324, the identification head 326, and the body direction head 328. Because each head of the object tracking model 320 takes the feature map rather than the image as its input data, the accuracy and processing speed of the task performed in each head may be improved. The backbone network 322 may split the image into a plurality of channels and downsample it through convolution layers having a plurality of filters and pooling layers, so that the result is suitable for use in each task. For example, the backbone network 322 may extract the feature map by downsampling the image to 1/4. In this case, when the size of the image is (W, H), the size of the feature map may be (W/4, H/4).

According to an embodiment, the detection head 324 may perform the object location detection task. That is, the detection head 324 may extract the location information of at least one object from the extracted feature map. The identification head 326 may perform the object identification task. That is, the identification head 326 may extract the identification feature of the at least one object from the extracted feature map. The body direction head 328 may perform the object body direction detection task. That is, the body direction head 328 may extract the body direction angle of the at least one object from the extracted feature map.

According to an embodiment, the detection head 324 may include a heatmap head, a bounding box size head, and a center offset head. The heatmap head may extract, from the feature map, a heatmap indicating the center position of at least one object. The heatmap has a heatmap score for each of its pixels and may have a peak heatmap score at the center position of the at least one object on the feature map. For example, for a feature map of a given size, the size of the output data of the heatmap head may be defined so that its last dimension corresponds to the heatmap score.

According to an embodiment, the bounding box size head may extract, from the feature map, the x-axis length and the y-axis length of the bounding box corresponding to each pixel of the feature map. In an embodiment, the bounding box size refers to both the x-axis length and the y-axis length of the bounding box. Here, the x-axis length and y-axis length corresponding to the center position of the at least one object on the feature map may be the x-axis length and y-axis length of the bounding box of that object. For example, for a feature map of a given size, the size of the output data of the bounding box size head may be defined so that the elements of its last dimension correspond to the x-axis length and the y-axis length of the bounding box, respectively.

According to an embodiment, the center offset head may extract, from the feature map, a center offset corresponding to each pixel of the feature map. In an embodiment, the center offset may be an offset used to obtain the center position of the at least one object on the image. Here, the center offset corresponding to the center position of the at least one object on the feature map may be the center offset of that object. For example, for a feature map of a given size, the size of the output data of the center offset head may be defined so that the elements of its last dimension correspond to the center offset along the x-axis and the center offset along the y-axis, respectively.
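To make the roles of the three detection sub-heads concrete, here is a hedged sketch of how a bounding box might be recovered at one detected center. The text does not state the units of the size and offset outputs, so this example assumes the size is in image pixels, the offset is in feature-map units, and the feature map is downsampled by a `stride` of 4. The function name and the nested-list layout are hypothetical.

```python
def decode_box(center_xy, size_map, offset_map, stride=4):
    """Decode one bounding box (x1, y1, x2, y2) in image coordinates.

    center_xy: integer (cx, cy) peak position on the feature map
    size_map[cy][cx]   -> (w, h) bounding box size, assumed in image pixels
    offset_map[cy][cx] -> (dx, dy) center offset, assumed in feature-map units
    """
    cx, cy = center_xy
    w, h = size_map[cy][cx]
    dx, dy = offset_map[cy][cx]
    # Refine the integer center with the offset, then scale to the image.
    img_cx = (cx + dx) * stride
    img_cy = (cy + dy) * stride
    return (img_cx - w / 2, img_cy - h / 2, img_cx + w / 2, img_cy + h / 2)

# 2 x 3 feature map (rows x columns) with one object centered at (cx=1, cy=1).
size_map = [[(0, 0)] * 3, [(0, 0), (8, 4), (0, 0)]]
offset_map = [[(0, 0)] * 3, [(0, 0), (0.5, 0.25), (0, 0)]]
box = decode_box((1, 1), size_map, offset_map)
```

With these inputs the refined center lands at (6.0, 5.0) on the image, so the decoded box is (2.0, 3.0, 10.0, 7.0).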

According to an embodiment, the identification head 326 may include a convolution layer having a plurality of filters. For the feature map that is the input data of the identification head 326, the last dimension of the output data of the identification head 326 may be the size of the identification embedding vector corresponding to each pixel of the feature map. For example, when the identification feature of an object is extracted using 128 criteria, the identification head 326 may include a convolution layer having 128 filters, and in this case the identification embedding vector for each pixel has 128 elements. The identification head 326 may extract identification embedding vectors. In an embodiment, an identification embedding vector is data included in the output data of the identification head 326 and refers to the identification feature corresponding to a given pixel of the feature map. That is, in an embodiment, the terms identification embedding vector and identification feature may be used interchangeably. Here, the identification embedding vector corresponding to the center position of the at least one object on the feature map may be the identification feature of that object.

According to an embodiment, the body direction head 328 may include a convolution layer having as many filters as there are representative angles. For the feature map that is the input data of the body direction head 328, the last dimension of the output data of the body direction head 328 may be the size of the body direction embedding vector corresponding to each pixel of the feature map. For example, when the representative angles are 0, 90, 180, and 270 degrees, the body direction head 328 may include a convolution layer having four filters, each filter corresponding to one representative angle, and in this case the body direction embedding vector for each pixel has four elements. The body direction head 328 may extract body direction embedding vectors. In an embodiment, a body direction embedding vector is data included in the output data of the body direction head 328 and corresponds to a given pixel of the feature map. Here, the body direction embedding vector corresponding to the center position of the at least one object on the feature map may be the body direction embedding vector of that object. The body direction embedding vector of an object consists of elements corresponding to the respective representative angles, and the value of each element may be a score for the corresponding representative angle.
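One simple way to turn a body direction embedding vector into a single angle is to pick the representative angle with the highest score. The sketch below assumes that rule; this section defers the actual extraction method to FIG. 5A, so the rule, the function name, and the constant are illustrative only.

```python
REPRESENTATIVE_ANGLES = [0, 90, 180, 270]  # degrees, as in the example in the text

def body_direction_angle(scores, angles=REPRESENTATIVE_ANGLES):
    """Pick the representative angle whose score is highest (assumed rule)."""
    if len(scores) != len(angles):
        raise ValueError("one score is expected per representative angle")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return angles[best_index]

angle = body_direction_angle([0.1, 0.2, 1.7, 0.3])
```

In this example the third element has the largest score, so the selected angle is 180 degrees.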

According to an embodiment, when the object tracking model 320 tracks objects of a plurality of classes, the output data of the detection head 324, the identification head 326, and the body direction head 328 may have more elements in the last-dimension vector, or may have an additional dimension. For example, when the acquired image contains objects of C classes, the size of the output data of the heatmap head included in the detection head 324 may reflect the C classes, and the heatmap head may extract a heatmap corresponding to each class. Likewise, the output data of the identification head 326 may be sized so that the identification head 326 extracts, for each class, identification embedding vectors corresponding to each pixel of the feature map. The output data of the body direction head 328 may be sized so that the body direction head 328 extracts, for each class, body direction embedding vectors corresponding to each pixel of the feature map.

When there are a plurality of object classes, the electronic device 2000 according to an embodiment may identify the class of an object based on the heatmap scores of the per-class heatmaps extracted by the heatmap head. For example, the heatmap head may extract, for each class, a heatmap that has peak heatmap scores only at the center positions of objects belonging to that class. The electronic device 2000 may determine at least one object belonging to each class based on the pixels having a peak heatmap score in the heatmap corresponding to that class, and may determine the center position of each object on the feature map.
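The per-class peak search can be sketched as follows. This is an illustrative implementation, not the disclosed one: the 3 x 3 neighborhood test and the `threshold` value are assumptions, and the nested-list layout `heatmaps[c][y][x]` and the function name are chosen only for the example.

```python
def find_peaks(heatmaps, threshold=0.5):
    """Return (class_index, x, y, score) for each local maximum.

    heatmaps[c][y][x] is the heatmap score of class c at feature-map pixel (x, y).
    A pixel counts as a peak if its score is at least `threshold` and no
    pixel in its 3 x 3 neighborhood scores higher.
    """
    peaks = []
    for c, heatmap in enumerate(heatmaps):
        height, width = len(heatmap), len(heatmap[0])
        for y in range(height):
            for x in range(width):
                score = heatmap[y][x]
                if score < threshold:
                    continue
                neighbors = [
                    heatmap[ny][nx]
                    for ny in range(max(0, y - 1), min(height, y + 2))
                    for nx in range(max(0, x - 1), min(width, x + 2))
                ]
                if score >= max(neighbors):
                    peaks.append((c, x, y, score))
    return peaks

heatmaps = [
    [[0.1, 0.2, 0.1], [0.2, 0.9, 0.2], [0.1, 0.2, 0.1]],  # class 0, e.g. person
    [[0.8, 0.1, 0.0], [0.1, 0.0, 0.0], [0.0, 0.0, 0.0]],  # class 1, e.g. vehicle
]
peaks = find_peaks(heatmaps)
```

Each returned tuple carries the class index together with the center position, so the class of an object follows directly from which heatmap produced the peak.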

Based on the center position on the feature map of the at least one object belonging to each class, the electronic device 2000 according to an embodiment may extract the identification feature and the body direction angle of each object from the identification embedding vectors and body direction embedding vectors that the identification head 326 and the body direction head 328 extract for the same class. Hereinafter, for convenience of description, it is assumed that the objects included in the acquired image belong to a single class.

In step S330, the electronic device 2000 may perform an output merging process. That is, the electronic device 2000 may merge the output data of the detection head 324, the identification head 326, and the body direction head 328. The electronic device 2000 may extract the location information of at least one object from the output of the detection head 324 and, based on the extracted location information, extract the identification feature and the body direction angle of the at least one object from the identification embedding vectors and body direction embedding vectors output by the identification head 326 and the body direction head 328.

The output data of each head and the output merging process are described in detail below with reference to FIG. 5A.

In step S340, the electronic device 2000 may perform the data association process. That is, based on the location information, identification feature, and body direction angle of the at least one object, the electronic device 2000 may match each detected object with one of the objects that were already being tracked.

In step S350, the electronic device 2000 may perform the tracking information management process. That is, the electronic device 2000 may update the identification feature list and the last matching time included in the tracking information, or may start tracking a new object. In addition, based on the time at which an object was last detected, the electronic device 2000 may determine whether the object is in the active tracking state, update the object to the inactive tracking state, or end tracking of the object.

FIG. 4 is an exemplary diagram illustrating training image data for an object tracking model according to an embodiment of the present disclosure.

The object tracking model according to an embodiment may be trained with a training image data set that includes various annotations. The object tracking model may be trained by the electronic device 2000 or by an external server. When the object tracking model is trained by the electronic device 2000, the electronic device 2000 may receive the training image data set from an external database through the communication interface. When the object tracking model is trained by the external server, the object tracking model may be stored in the external server, or the electronic device 2000 may receive the object tracking model from the external server through the communication interface and store it. Hereinafter, for convenience of description, it is assumed that the object tracking model is trained by the electronic device 2000.

According to an embodiment, the training image data used to train the object tracking model may include annotations for the class, ID, bounding box information, segmentation information, and body direction angle of each object. However, the training image data is not limited thereto and may include further annotations.

According to an embodiment, the class of an object may indicate the type of the object. For example, the class of an object may be a person, a vehicle, an animal, or the like. The bounding box information of an object may include the x and y coordinates of each vertex of the bounding box. Segmentation refers to separating the individual objects within an image. The segmentation information of an object includes x and y coordinates of the boundary of the object, sampled at regular intervals. The body direction angle of an object may be labeled as the representative angle closest to the actual body direction angle of the object. The training image data set may include at least one object corresponding to each representative angle.
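Labeling an object with the closest representative angle can be sketched as below. The text does not say how closeness is measured, so this example assumes circular distance in degrees, which makes 350 degrees closer to 0 than to 270. The function name is hypothetical, and ties resolve to whichever representative angle is listed first.

```python
def nearest_representative_angle(angle, representatives=(0, 90, 180, 270)):
    """Return the representative angle closest to `angle` (degrees)."""
    def circular_distance(a, b):
        diff = abs(a - b) % 360
        return min(diff, 360 - diff)  # wrap around at 360 degrees
    return min(representatives, key=lambda r: circular_distance(angle, r))

labels = [nearest_representative_angle(a) for a in (10, 100, 200, 350)]
```

The four sample angles are labeled 0, 90, 180, and 0 degrees respectively.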

According to an embodiment, the center position of an object on the feature map may be obtained from the bounding box coordinates of the object in the training image data. For example, when the feature map is extracted by downsampling the image to 1/4, the center position of the object on the training image data may be obtained from the bounding box coordinates of the object, and the center position of the object on the feature map may be obtained by scaling that center position to the feature map and expressing it in integer coordinates. This allows the center position of the object on the feature map to be represented by integer x and y coordinates on the feature map.

Likewise, the bounding box size of an object may be obtained from the bounding box coordinates of the object in the training image data, as the x-axis length and the y-axis length spanned by those coordinates. The center offset of an object may be obtained from the bounding box coordinates of the object in the training image data and the center position of the object on the feature map. For example, when the feature map is extracted by downsampling the image to 1/4, the center offset of the object may be determined from the bounding box coordinates of the object and the integer center position on the feature map. This is to obtain the exact center position of the object on the image.
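The relationship between the bounding box, the integer center on the feature map, and the center offset can be illustrated with a short sketch. The corner notation (x1, y1, x2, y2), the use of floor for the integer conversion, and the offset being expressed in feature-map units are all assumptions for this example, since the original expressions are not available in this text. The `stride` of 4 follows the 1/4 downsampling example.

```python
import math

def make_targets(box, stride=4):
    """Compute per-object training targets from a box (x1, y1, x2, y2)."""
    x1, y1, x2, y2 = box
    # Center of the bounding box on the training image.
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Integer center on the feature map (floor is an assumption).
    fx, fy = math.floor(cx / stride), math.floor(cy / stride)
    # Fractional part lost by the integer conversion.
    offset = (cx / stride - fx, cy / stride - fy)
    size = (x2 - x1, y2 - y1)
    return {"center": (fx, fy), "offset": offset, "size": size}

targets = make_targets((10, 20, 32, 50))
```

For this box the image center is (21, 35), the feature-map center is (5, 8), the offset is (0.25, 0.75), and the size is (22, 30). Adding the offset back to the integer center and multiplying by the stride recovers the exact image center.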

According to an embodiment, the identification feature of an object may be determined based on the segmentation information of the object, the color combination of the object, the bounding box information of the object, and the like in the training image data. The electronic device 2000 may train the object tracking model so that, among the identification features extracted by the identification head, identification features of the same object have high similarity and identification features of different objects have low similarity.

According to an embodiment, the body direction angle of an object may be determined based on the segmentation information of the object, the bounding box information of the object, and the like in the training image data. For example, the appearance of an object may differ for each representative angle. The electronic device 2000 may train the object tracking model so that the body direction head predicts the body direction angle of an object based on the appearance of the object that distinguishes each representative angle.

According to an embodiment, the electronic device 2000 may calculate a loss value for each of the heatmap, the bounding box size, the center offset, the identification feature, and the body direction angle of an object, and may train the object tracking model so that the total loss value is minimized. For the heatmap, the bounding box size, and the center offset of an object, the electronic device 2000 may set the loss so that the loss value becomes smaller as the difference between the ground truth and the predicted value becomes smaller.

According to one embodiment, for the identification feature of an object, a fully-connected layer and a softmax function may be used for training. The loss may be set so that, when the predicted identification feature is input to the fully-connected layer, the loss value decreases as the probability assigned to the same object increases. For the body direction angle of an object, the loss may be set so that the loss value decreases as the element of the object's body direction embedding vector that corresponds to the representative angle matching the object's body direction angle increases.
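The loss behavior described above for the identification feature and the body direction angle can be illustrated with a softmax cross-entropy loss. This is only a minimal sketch: the disclosure does not name the exact loss function, so the use of cross-entropy for both heads, the function names, and the example logits are assumptions made for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target_index):
    """Negative log-probability of the target class (assumed loss form)."""
    return -math.log(softmax(logits)[target_index])

# Hypothetical output of the fully-connected layer over 3 training identities;
# the ground-truth identity is index 0.
id_loss = cross_entropy([2.0, 0.5, -1.0], target_index=0)

# Hypothetical 4-element body direction embedding; the ground-truth
# representative angle corresponds to index 1.
dir_loss = cross_entropy([0.1, 3.0, 0.2, 0.0], target_index=1)
```

With this form, raising the score of the correct identity or of the correct representative angle lowers the corresponding loss, which matches the behavior described in the text.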

According to one embodiment, the training image data set may include objects belonging to different classes. For example, the training image data set may include people and vehicles as objects to be tracked. The object tracking model may be trained so that the heatmap head, the identification head, and the body direction head respectively extract a heatmap corresponding to each class, an identification embedding vector corresponding to each pixel of the feature map for each class, and a body direction embedding vector corresponding to each pixel of the feature map for each class. For example, the heatmap head may include a layer having as many filters as the number of classes, and each filter may be trained to extract the heatmap corresponding to one class. The electronic device 2000 may set the loss so that the loss value decreases as the heatmap corresponding to each class has a larger heatmap score at the center position of at least one object belonging to that class.

FIG. 5A is a diagram illustrating a method of extracting the location information, identification feature, and body direction angle of an object using each head of an object tracking model according to an embodiment of the present disclosure.

Referring to FIG. 5A, the electronic device 2000 may merge the output data of the heads of the object tracking model to extract the location information, identification feature, and body direction angle of an object. FIG. 5A illustrates an example in which three objects are detected in the image, but more or fewer objects may be detected.

According to one embodiment, the detection head 324 of the object tracking model may consist of a heatmap head, a bounding box size head, and a center offset head. Each head included in the detection head 324 may take as input the feature map extracted by the backbone network of the object tracking model. The heatmap head, the bounding box size head, and the center offset head may extract a heatmap 510, bounding box size data 520, and center offset data 530, respectively. The heatmap 510 may have the same size as the feature map and may hold a heatmap score for each of its pixels. In addition, the heatmap 510 may have a peak heatmap score at the center position of at least one object on the feature map. The heatmap head may include a max pooling layer for finding the peak heatmap scores of the heatmap 510. According to one embodiment, after applying max pooling, the heatmap head may extract the top k pixels (for example, 100) among the peak heatmap scores in descending order of heatmap score. Here, k may be a preset value.
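The peak extraction described above can be sketched as follows. This is an illustrative example only: the 3x3 pooling window with stride 1, the plain nested-list heatmap, and the function name `extract_peaks` are assumptions not stated in the disclosure.

```python
def extract_peaks(heatmap, k=100):
    """Return up to k (score, row, col) local maxima, highest score first."""
    rows, cols = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            # 3x3 max pooling with stride 1 (assumed window), clipped at borders
            pooled = max(
                heatmap[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
            )
            # A pixel is kept as a peak when it equals its pooled value
            if heatmap[r][c] == pooled:
                peaks.append((heatmap[r][c], r, c))
    # Keep the k highest-scoring peaks
    peaks.sort(key=lambda p: p[0], reverse=True)
    return peaks[:k]

example = [
    [0.1, 0.2, 0.1],
    [0.2, 0.9, 0.2],
    [0.1, 0.2, 0.1],
]
peaks = extract_peaks(example)
```

Note that in a perfectly flat region every pixel equals its pooled value, so low-scoring peaks are expected to be discarded later by the top-k selection and a score threshold.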

According to one embodiment, the bounding box size and center offset of at least one object may be extracted based on the center position of the at least one object on the feature map. In the bounding box size data 520 and the center offset data 530, the bounding box size and center offset at the center position of the at least one object on the feature map may be taken as the bounding box size and center offset of the at least one object.

According to one embodiment, the electronic device 2000 may extract the location information of at least one object based on the center position of the at least one object on the feature map, the bounding box size of the at least one object, and the center offset of the at least one object. The method of extracting the location information of at least one object is described in detail with reference to FIG. 5B.

According to one embodiment, the identification head 326 of the object tracking model may extract identification embedding vectors 540. Each pixel of the feature map may have a corresponding identification embedding vector. For example, when the identification head 326 consists of a convolution layer having 128 filters, the identification embedding vector may be a vector having 128 elements. The electronic device 2000 may extract the identification feature of at least one object based on the center position of the at least one object on the feature map, obtained from the heatmap 510 produced by the heatmap head of the detection head 324. The electronic device 2000 may determine the identification embedding vector at the center position of the at least one object on the feature map as the identification feature of the at least one object.

According to one embodiment, the body direction head 328 of the object tracking model may extract body direction embedding vectors 550. Each pixel of the feature map may have a corresponding body direction embedding vector. For example, when the body direction head 328 consists of a convolution layer having four filters, the body direction embedding vector may be a vector having four elements. The electronic device 2000 may extract the body direction embedding vector of at least one object based on the center position of the at least one object on the feature map. The electronic device 2000 may determine the body direction embedding vector at the center position of the at least one object on the feature map as the body direction embedding vector of the at least one object.

The electronic device 2000 according to one embodiment may extract the body direction angle of at least one object based on the element having the maximum value among the elements of the body direction embedding vector of the at least one object. The electronic device 2000 may determine the representative angle corresponding to the element having the maximum value as the body direction angle of the at least one object.
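Selecting the representative angle from the body direction embedding vector amounts to an argmax. The following minimal sketch assumes four representative angles of 0, 90, 180, and 270 degrees, mapped to the vector elements in that order; the ordering and the function name are illustrative assumptions.

```python
# Assumed mapping from embedding element index to representative angle (degrees)
REPRESENTATIVE_ANGLES = (0, 90, 180, 270)

def body_direction_angle(embedding, angles=REPRESENTATIVE_ANGLES):
    """Return the representative angle whose element has the maximum value."""
    if len(embedding) != len(angles):
        raise ValueError("embedding length must match the number of angles")
    best_index = max(range(len(embedding)), key=lambda i: embedding[i])
    return angles[best_index]

angle = body_direction_angle([0.10, 0.70, 0.15, 0.05])
```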

FIG. 5B is a diagram illustrating a method of extracting the location information of an object using the detection head of an object tracking model according to an embodiment of the present disclosure.

Referring to FIG. 5B, the electronic device 2000 according to one embodiment may extract the location information 560 of an object based on the output data of the heatmap head, the bounding box size head, and the center offset head included in the detection head.

The electronic device 2000 according to one embodiment may extract the heatmap 510 through the heatmap head and obtain, from the extracted heatmap 510, the center position 570 of the object on the feature map. The electronic device 2000 may also extract the bounding box size data 520 through the bounding box size head, and may determine the x-axis length and y-axis length of the bounding box at the center position 570 on the feature map as the bounding box size 560 of the object. The electronic device 2000 may further extract the center offset data 530 through the center offset head, and may determine the center offset at the center position 570 on the feature map as the center offset 580 of the object. The electronic device 2000 may then determine the center position 590 of the object on the image based on the center position 570 on the feature map and the center offset 580 of the object.

The electronic device 2000 may determine the location information of the object based on the center position 590 of the object on the image and the bounding box size 560 of the object. The electronic device 2000 may determine, as the location information of the object, a bounding box that is centered on the center position 590 on the image and whose x-axis and y-axis lengths are those given by the bounding box size 560.
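The decoding of a bounding box from the center position, center offset, and bounding box size can be sketched as below. The disclosure does not specify how feature-map coordinates map to image coordinates, so the downsampling `stride` of 4, the convention that the offset is expressed in feature-map units, and the convention that the size is expressed in image pixels are all assumptions made for illustration.

```python
def decode_box(center_xy, offset_xy, size_wh, stride=4):
    """Return (x1, y1, x2, y2) on the image for one detected object.

    center_xy: integer peak position on the feature map (x, y)
    offset_xy: sub-pixel center offset at that position (assumed feature-map units)
    size_wh:   bounding box width and height (assumed image pixels)
    stride:    assumed ratio between image and feature-map resolution
    """
    # Center position on the image from the feature-map center plus the offset
    cx = (center_xy[0] + offset_xy[0]) * stride
    cy = (center_xy[1] + offset_xy[1]) * stride
    w, h = size_wh
    # Box centered on (cx, cy) with the predicted x-axis and y-axis lengths
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

box = decode_box((10, 20), (0.5, 0.25), (8, 16))
```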

According to one embodiment, the electronic device 2000 may perform non-maximum suppression (NMS) on the bounding boxes determined based on the center position 590 of each object on the image and the bounding box size 560 of each object. Among the top k pixels extracted by heatmap score, the electronic device 2000 may remove the bounding boxes corresponding to pixels whose heatmap score is less than or equal to a preset reference heatmap score (for example, 0.3).
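The removal of low-scoring bounding boxes described above can be sketched as a simple filter. The dictionary layout of a detection and the key name `score` are hypothetical; only the rule of discarding scores at or below the reference value (0.3 in the example) comes from the text.

```python
def filter_by_score(detections, reference_score=0.3):
    """Keep detections whose heatmap score is strictly above the reference."""
    return [d for d in detections if d["score"] > reference_score]

candidates = [
    {"box": (38.0, 73.0, 46.0, 89.0), "score": 0.9},
    {"box": (10.0, 10.0, 20.0, 30.0), "score": 0.3},  # removed: equal to reference
    {"box": (50.0, 50.0, 60.0, 70.0), "score": 0.1},  # removed: below reference
]
kept = filter_by_score(candidates)
```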

FIG. 6 is a diagram illustrating object detection information and object tracking information according to an embodiment of the present disclosure.

In one embodiment, the object detection information 610 is information obtained when the electronic device 2000 detects an object in an acquired image, and may include the output data of the object tracking model. The object detection information 610 according to one embodiment may include, but is not limited to, the object class, the bounding box of the object, the heatmap score, the identification feature of the object, and the body direction angle of the object.

In one embodiment, the object tracking information 620 is information about an object being tracked by the electronic device 2000, and may include, but is not limited to, the object class, the object ID, the object path, the identification feature list of the object, and the last matching time of the object. For example, the object tracking information 620 may further include the tracking state of the object. According to one embodiment, the last matching time of the object may be stored in units of frames.
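One possible in-memory layout for the object tracking information is sketched below. The class name `Track`, its field names, and the choice to store the identification feature list as a mapping from representative angle to feature vector are assumptions for illustration; the disclosure lists only the kinds of information that may be kept.

```python
from dataclasses import dataclass, field

@dataclass
class Track:
    object_class: str
    object_id: int
    path: list = field(default_factory=list)               # bounding boxes over time
    features_by_angle: dict = field(default_factory=dict)  # angle -> identification feature
    latest_feature: list = field(default_factory=list)     # feature at the last match
    last_matched_frame: int = 0                             # stored in units of frames

    def update(self, box, feature, angle, frame):
        """Record a new match for this tracked object."""
        self.path.append(box)
        self.features_by_angle[angle] = feature
        self.latest_feature = feature
        self.last_matched_frame = frame

track = Track(object_class="person", object_id=1)
track.update(box=(0, 0, 10, 20), feature=[0.1, 0.9], angle=90, frame=5)
```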

FIG. 7 is a flowchart illustrating a data association process according to an embodiment of the present disclosure.

In step S710, the electronic device 2000 according to one embodiment may compare at least one detected object with the objects already being tracked. The electronic device 2000 may compare each detected object with all tracked objects by forming pairs, each consisting of one detected object and one tracked object, and comparing the two objects in each pair. After comparing each detected object with all tracked objects, the electronic device 2000 may match it with the tracked object having the highest similarity.

In step S720, the electronic device 2000 may determine whether each of the tracked objects is in an inactive tracking state. For example, the electronic device 2000 may acquire images in real time and detect at least one object in the images. If, as of the image acquisition time, a tracked object p has not been matched for 30 frames or more since its last matching time, the electronic device 2000 may determine that object p is in the inactive tracking state. Conversely, if the image acquisition time is within 30 frames of the last matching time of object p, the electronic device 2000 may determine that object p is in the active tracking state. The electronic device 2000 may perform the first data association (S730) when comparing a detected object with an object in the inactive tracking state, and may perform the second data association (S740) when comparing a detected object with an object in the active tracking state.
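The active/inactive decision of step S720 can be sketched as a comparison of frame indices. The 30-frame value comes from the example in the text; treating a gap of exactly 30 frames as inactive is an assumption, since the text does not settle that boundary, and the function name is hypothetical.

```python
def is_inactive(current_frame, last_matched_frame, max_gap=30):
    """Return True when the track has gone unmatched for max_gap frames or more."""
    return current_frame - last_matched_frame >= max_gap

def choose_association(current_frame, last_matched_frame):
    """Pick the data association step used for a tracked object."""
    if is_inactive(current_frame, last_matched_frame):
        return "S730"  # first data association (uses body direction angle)
    return "S740"      # second data association (identification feature only)

step = choose_association(current_frame=130, last_matched_frame=100)
```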

In step S730, when comparing at least one detected object with an object in the inactive tracking state, the electronic device 2000 may perform the first data association process. For example, when a detected object q and a tracked object p are the same object and object p is in the inactive tracking state, the body direction angles of object q and object p may differ. For example, if object p leaves the screen after its last matching time and reappears 30 frames or more later, the body direction angle of the reappearing object p may have changed. Because the body direction angle has changed, the identification features of object q and object p may have low similarity even though they are the same object. Accordingly, in the first data association process the electronic device 2000 may consider the body direction angle of the object in addition to its identification feature. The first data association process is described in detail with reference to FIGS. 8A and 8B.

In step S740, when comparing at least one detected object with an object in the active tracking state, the electronic device 2000 may perform the second data association process. For example, when a detected object q and a tracked object p are the same object and object p is in the active tracking state, the body direction angles of object q and object p may be similar. For example, when object p is matched and tracked in every frame of images acquired in real time, the movement of object p is continuous, so its body direction angle is unlikely to change significantly. In this case, the electronic device 2000 may consider only the identification feature of the object in the second data association process.

In step S750, the electronic device 2000 may determine, for each detected object, whether the same object exists among the tracked objects. According to one embodiment, the electronic device 2000 may determine that two objects are the same object when the similarity between their identification features, computed by the first or second data association, is greater than or equal to a predetermined reference value. The electronic device 2000 may perform the third data association (S760) on any detected object that is not matched with a tracked object. For example, when the similarities between the identification feature of a detected object q and those of all tracked objects, computed by the first or second data association, are all below the predetermined reference value, the electronic device 2000 may perform the third data association (S760) on object q.

In step S760, the electronic device 2000 may perform the third data association process on any detected object that is not matched with a tracked object. The electronic device 2000 may determine whether the same object as each such detected object exists among the tracked objects, based on the intersection over union (IoU) between the bounding box (bbox) of the detected object and the bounding box of each tracked object. The IoU between two bounding boxes is a measure of how much they overlap, and equals the area of their intersection divided by the area of their union. For example, when the IoU between the bounding boxes of a detected object q and a tracked object p is greater than or equal to a predetermined reference value, the electronic device 2000 may determine that object q and object p are the same object. Alternatively, the electronic device 2000 may select the tracked objects whose bounding box IoU with the detected object q is greater than or equal to the predetermined reference value, and may determine that the selected object with the largest IoU is the same object as object q.
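The IoU-based matching of step S760 can be sketched as follows. Boxes are assumed to be given as `(x1, y1, x2, y2)` corner coordinates, and the reference value of 0.5 and the function names are hypothetical; the disclosure only refers to a predetermined reference value.

```python
def iou(box_a, box_b):
    """Intersection area divided by union area of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_by_iou(detected_box, tracked_boxes, reference_value=0.5):
    """Return the ID of the tracked box with the largest IoU at or above the
    reference value, or None when no tracked box qualifies."""
    best_id, best_iou = None, reference_value
    for track_id, box in tracked_boxes.items():
        value = iou(detected_box, box)
        if value >= best_iou:
            best_id, best_iou = track_id, value
    return best_id

matched = match_by_iou((0, 0, 2, 2), {"p1": (0, 0, 2, 2), "p2": (1, 1, 3, 3)})
```

A detected object for which `match_by_iou` returns `None` has no counterpart among the tracked objects under this criterion.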

According to one embodiment, the electronic device 2000 may perform the first or second data association process between the at least one detected object and all tracked objects and, for each detected object, select the tracked objects whose identification feature similarity is greater than or equal to a predetermined reference value. For each detected object, the electronic device 2000 may determine that the selected tracked object with the highest similarity is the same object as that detected object.
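The selection described above (keep candidates at or above a reference value, then take the most similar one) can be sketched as below. The reference value of 0.6, the mapping from track ID to similarity, and the function name are hypothetical.

```python
def best_match(similarities, reference_value=0.6):
    """Return the track ID with the highest similarity among those at or above
    the reference value, or None when no candidate qualifies."""
    selected = {tid: s for tid, s in similarities.items() if s >= reference_value}
    if not selected:
        return None
    return max(selected, key=selected.get)

# Similarities between one detected object and three tracked objects
result = best_match({"p1": 0.42, "p2": 0.81, "p3": 0.77})
```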

FIG. 8A is a flowchart illustrating a first data association process according to an embodiment of the present disclosure.

In step S810, the electronic device 2000 according to one embodiment may compare at least one detected object with those tracked objects that are in the inactive tracking state. Since step S810 corresponds to step S710, a detailed description is omitted.

In step S820, for each pair of a detected object and a tracked object, the electronic device 2000 may determine whether the identification feature list of the tracked object contains an identification feature corresponding to the body direction angle of the detected object. For example, when the electronic device 2000 compares a detected object q with a tracked object p, it may determine whether the identification feature list of object p contains an identification feature of object p corresponding to the body direction angle of object q. The body direction angle of object q may be one of the representative angles. For example, if object p has never been detected at that body direction angle while being tracked, the identification feature list of object p may not contain the corresponding identification feature. If the corresponding identification feature does not exist, the electronic device 2000 may determine the representative angle closest to the body direction angle of object q (S830); if it exists, the electronic device 2000 may calculate the similarity between the identification feature of object q at time t and that identification feature of object p (S850). Here, t denotes the image acquisition time.

In step S830, for each pair of a detected object and a tracked object in which the identification feature list of the tracked object contains no identification feature corresponding to the body direction angle of the detected object, the electronic device 2000 may determine the representative angle closest to the body direction angle of the detected object. For example, when the identification feature list of a tracked object p contains no identification feature corresponding to the body direction angle of a detected object q, the electronic device 2000 may determine the representative angle closest to the body direction angle of object q. For example, when the representative angles are 0, 90, 180, and 270 degrees and the body direction angle of object q is 90 degrees, the closest representative angle may be 0 degrees or 180 degrees. The electronic device 2000 may determine whether identification features corresponding to 0 degrees and 180 degrees exist, and may select the representative angle for which an identification feature exists as the closest representative angle.

If identification features exist for both representative angles, the identification feature of the detected object may be compared with one of the two or with both. Hereinafter, for convenience of explanation, comparison with one of the two identification features is described as an example. If no identification feature exists for either representative angle, the electronic device 2000 may check, starting from the representative angle closest to the body direction angle of the detected object, whether an identification feature corresponding to each representative angle exists, and may select the first representative angle for which an identification feature exists as the closest representative angle.
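The search for the closest representative angle that has a stored identification feature can be sketched as below. The mapping from representative angle to identification feature and the function names are assumptions; when two stored angles are equally close, this sketch simply returns whichever appears first in the mapping, which corresponds to the "compare with one of the two" option in the text.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def nearest_stored_angle(query_angle, features_by_angle):
    """Return the stored representative angle closest to query_angle,
    or None when no identification feature is stored at all."""
    if not features_by_angle:
        return None
    return min(features_by_angle, key=lambda a: angular_distance(a, query_angle))

# Object p has features stored for 180 and 270 degrees only; object q faces 90 degrees.
stored = {180: [0.2, 0.8], 270: [0.6, 0.4]}
chosen = nearest_stored_angle(90, stored)
```

If a feature is already stored for the query angle itself, the distance is zero and that angle is returned, so the same helper also covers the case checked in step S820.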

In step S840, for each pair of a detected object and a tracked object, the electronic device 2000 may calculate the similarity between the identification feature of the detected object and the identification feature, in the identification feature list of the tracked object, that corresponds to the representative angle closest to the body direction angle of the detected object. For example, when the electronic device 2000 compares a detected object q with a tracked object p, it may calculate the similarity between the identification feature of object q and the identification feature of object p corresponding to the closest representative angle. For example, the electronic device 2000 may calculate the cosine similarity between the two identification features.

In step S850, for each pair of a detected object and a tracked object in which the identification feature list of the tracked object contains an identification feature corresponding to the body direction angle of the detected object, the electronic device 2000 may calculate the similarity between that identification feature and the identification feature of the detected object. For example, when comparing a detected object q with a tracked object p, the electronic device 2000 may calculate the similarity between the identification feature of object q and the identification feature of object p corresponding to the body direction angle of object q. For example, the electronic device 2000 may calculate the cosine similarity between the two identification features.
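The cosine similarity used to compare two identification features can be sketched as follows. The function name and the choice to return 0.0 for a zero-length vector are illustrative assumptions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumed convention for a zero vector
    return dot / (norm_u * norm_v)

# Identification feature of detected object q versus a stored feature of object p
similarity = cosine_similarity([1.0, 2.0, 0.0], [2.0, 4.0, 0.0])
```

Because the measure depends only on direction, two features that differ only in scale are treated as identical.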

FIG. 8B is an exemplary diagram illustrating a first data association process according to an embodiment of the present disclosure.

The electronic device 2000 according to one embodiment may compare a detected object q 810 with a tracked object p in the first data association process. Here, the body direction angle of the detected object q 810 may be 90 degrees, and the identification feature list 820 of object p may contain no identification feature corresponding to 90 degrees. In this case, the electronic device 2000 may determine the representative angle closest to 90 degrees to be 0 degrees or 180 degrees, and may calculate the similarity between the identification feature corresponding to 0 degrees or 180 degrees and the identification feature of object q 810. According to one embodiment, the electronic device 2000 may calculate the similarity between the identification feature of object q 810 and each of the identification features corresponding to 0 degrees and 180 degrees.

FIG. 9 is a flowchart illustrating a second data association process according to an embodiment of the present disclosure.

In step S910, the electronic device 2000 according to an embodiment may compare the at least one detected object with those tracked objects that are in an active tracking state. Since step S910 corresponds to step S710, a detailed description is omitted.

In step S920, the electronic device 2000 may calculate the similarity between the identification feature of the at least one detected object and the latest identification feature of each of the tracked objects. For example, when comparing a detected object q with a tracked object p, the electronic device 2000 may calculate the similarity between the identification feature of object q and the latest identification feature of object p. The latest identification feature of object p may be the identification feature of object p at the time object p was last matched. For example, the electronic device 2000 may calculate the cosine similarity between the two identification features.
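As an illustration of the cosine similarity mentioned in step S920, the following sketch compares two feature vectors. The function name `cosine_similarity`, the plain-list representation of a feature, and the handling of zero vectors are assumptions made for this example and do not come from the disclosure.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as dissimilar
    return dot / (norm_u * norm_v)


# Hypothetical features: detected object q and the latest feature of object p.
feature_q = [0.2, 0.9, 0.1]
latest_feature_p = [0.25, 0.85, 0.05]
print(cosine_similarity(feature_q, latest_feature_p))
```

A value close to 1 indicates that the two identification features point in nearly the same direction; any matching threshold would be a separate design choice.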

FIG. 10 is a flowchart illustrating a third data association process according to an embodiment of the present disclosure.

In step S1010, the electronic device 2000 according to an embodiment may compare the at least one detected object with the tracked objects.

In step S1020, the electronic device 2000 may calculate the intersection over union (IoU) between the bounding box (bbox) of the at least one detected object and the bounding box of each of the tracked objects. For example, when comparing a detected object q with a tracked object p, the electronic device 2000 may calculate the IoU between the bounding boxes of object q and object p. Since step S1020 is the same as step S760, a detailed description is omitted.
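The IoU calculation in step S1020 can be sketched as below. The corner-based box layout `(x_min, y_min, x_max, y_max)` and the name `iou` are assumptions made for this example; the disclosure does not fix a box representation in this passage.

```python
from typing import Tuple

# Assumed layout: (x_min, y_min, x_max, y_max) in image pixels.
Box = Tuple[float, float, float, float]


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned bounding boxes."""
    ix_min = max(box_a[0], box_b[0])
    iy_min = max(box_a[1], box_b[1])
    ix_max = min(box_a[2], box_b[2])
    iy_max = min(box_a[3], box_b[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0.0 else 0.0


# Hypothetical boxes for detected object q and tracked object p.
print(iou((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```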

FIG. 11A is a flowchart illustrating a tracking information management process according to an embodiment of the present disclosure.

Referring to FIG. 11A, the electronic device 2000 according to an embodiment may update the tracking information of the tracked objects or may start tracking the at least one detected object.

In step S1110, the electronic device 2000 may compare the at least one detected object with the tracked objects. Since step S1110 corresponds to step S710, a detailed description is omitted.

In step S1120, the electronic device 2000 may perform a data association process between the at least one detected object and the tracked objects. Since the data association process has been described with reference to FIGS. 7 to 10, a detailed description is omitted.

In step S1130, the electronic device 2000 may determine, for each of the at least one detected object, whether the same object exists among the tracked objects. For example, when the electronic device 2000 compares a detected object q with a tracked object p in the data association process, it may determine whether object q and object p are the same object. If object q and object p are matched in the data association process, the electronic device 2000 may update the tracking information of object p (S1140). The electronic device 2000 may determine whether an object matching object q exists not only for object p but for all of the tracked objects. If no object matches object q, the electronic device 2000 may start tracking object q (S1150).

In step S1140, the electronic device 2000 may update the tracking information of those tracked objects that match the at least one detected object. For example, if a detected object q and a tracked object p are matched in the data association process, the electronic device 2000 may update the tracking information of object p based on the detection information of object q. The process of updating the tracking information of object p is described in detail with reference to FIG. 11B.

In step S1150, the electronic device 2000 may start tracking any detected object that matches none of the tracked objects. For example, if the detected object q matches none of the tracked objects, the electronic device 2000 may start tracking object q. For example, the electronic device 2000 may store tracking information about object q in a tracking database. The electronic device 2000 may set the ID of object q to q and start tracking the path of object q. The electronic device 2000 may add the extracted identification feature of object q to the identification feature list of object q, both as the latest identification feature of object q and as the identification feature corresponding to the body direction angle of object q. The electronic device 2000 may set the image acquisition time t as the last matching time of object q.

FIG. 11B is a diagram illustrating a process of updating the tracking information of an object according to an embodiment of the present disclosure.

Referring to FIG. 11B, when object q (1110) and object p are matched, the electronic device 2000 may update the tracking information of object p based on the detection information of object q (1110). For example, the identification feature list 1120 of object p before the update may not contain an identification feature corresponding to 90 degrees. If the body direction angle of object q (1110) is 90 degrees and object q (1110) and object p are matched, the electronic device 2000 may update both the latest identification feature of object p and the identification feature corresponding to 90 degrees with the identification feature of object q (1110). Accordingly, the updated identification feature list 1130 of object p may contain an identification feature for every representative angle.

The electronic device 2000 may update not only the identification feature list 1120 of object p but also the other information included in the tracking information of object p. The electronic device 2000 may update the path of object p based on the detected location information of object q (1110). In addition, the electronic device 2000 may update the last matching time of object p to the image acquisition time t, which is the time at which object q (1110) was detected.
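The bookkeeping of steps S1140 and S1150 can be sketched as follows. This is a hedged illustration only: the `Detection` and `Track` classes, their field names, the use of a frame index for the time t, and storing the path as a list of boxes are all assumptions introduced for this example rather than structures defined in the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Feature = List[float]
Box = Tuple[float, float, float, float]


@dataclass
class Detection:
    box: Box          # detected location information (assumed box layout)
    feature: Feature  # extracted identification feature
    body_angle: int   # representative body direction angle, in degrees


@dataclass
class Track:
    track_id: int
    latest_feature: Feature
    last_matched: int  # image acquisition time t of the last match (frame index)
    features_by_angle: Dict[int, Feature] = field(default_factory=dict)
    path: List[Box] = field(default_factory=list)


def start_track(track_id: int, det: Detection, t: int) -> Track:
    """S1150: the feature is stored both as latest and under its body angle."""
    return Track(track_id, det.feature, t, {det.body_angle: det.feature}, [det.box])


def update_track(track: Track, det: Detection, t: int) -> None:
    """S1140: overwrite the latest and angle-specific features, extend the path."""
    track.latest_feature = det.feature
    track.features_by_angle[det.body_angle] = det.feature
    track.path.append(det.box)
    track.last_matched = t


p = start_track(1, Detection((0.0, 0.0, 1.0, 1.0), [1.0, 0.0], 0), t=5)
update_track(p, Detection((1.0, 0.0, 2.0, 1.0), [0.0, 1.0], 90), t=6)
print(sorted(p.features_by_angle), p.last_matched)
```

After the update, the hypothetical track holds features for both 0 and 90 degrees, which corresponds to the list 1120 gaining the previously missing 90-degree entry.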

FIG. 12A is a flowchart illustrating a process of updating the tracking state of an object according to an embodiment of the present disclosure.

The electronic device 2000 according to an embodiment may update the tracking state of the tracked objects during the tracking information management process. Based on how long an object has gone unmatched since its last matching time, the electronic device 2000 may update the object to an inactive tracking state or may end tracking of the object.

In step S1210, the electronic device 2000 may set those tracked objects that are in an active tracking state as targets of the tracking state update.

In step S1220, the electronic device 2000 may determine, relative to the image acquisition time, whether a first period has elapsed since each of the tracked objects was last matched. That is, the electronic device 2000 may determine whether each of the tracked objects has gone unmatched for the first period since its last matching time. The first period may be a preset period that serves as the threshold between the active tracking state and the inactive tracking state of an object. For example, the electronic device 2000 may set the first period to 30 frames. The electronic device 2000 may update the tracking state of tracked objects for which the first period has elapsed since the last matching time to the inactive tracking state (S1230). For tracked objects for which the first period has not yet elapsed since the last matching time, the electronic device 2000 may leave the tracking state unchanged.

In step S1230, the electronic device 2000 may update, relative to the image acquisition time, the tracking state of tracked objects for which the first period has elapsed since they were last matched to the inactive tracking state. For example, when the first period is set to 30 frames, the electronic device 2000 may update tracked objects that have gone unmatched for 30 frames or more since their last matching time to the inactive tracking state.

FIG. 12B is a flowchart illustrating a process of updating the tracking state of an object according to an embodiment of the present disclosure.

In step S1240, the electronic device 2000 may set those tracked objects that are in an inactive tracking state as targets of the tracking state update.

In step S1250, the electronic device 2000 may determine, relative to the image acquisition time, whether a second period has elapsed since each of the tracked objects was last matched. That is, the electronic device 2000 may determine whether each of the tracked objects has gone unmatched for the second period since its last matching time. The second period is a preset period that serves as the threshold between the inactive tracking state and the end of tracking, and may be longer than the first period. For example, the electronic device 2000 may set the second period to 50 frames. The electronic device 2000 may end tracking of tracked objects for which the second period has elapsed since the last matching time (S1260). The electronic device 2000 may continue tracking objects for which the second period has not yet elapsed since the last matching time.

In step S1260, the electronic device 2000 may end, relative to the image acquisition time, tracking of those tracked objects for which the second period has elapsed since they were last matched. Once the electronic device 2000 ends tracking of an object, that object may no longer be compared with objects detected in subsequently acquired images.
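The two flowcharts of FIGS. 12A and 12B can be read as a small state machine, sketched below. The enum `TrackState`, the function `next_state`, and the use of frame indices for time are assumptions made for illustration; the values 30 and 50 are the example periods given in the text. As in the flowcharts, an active object is only checked against the first period and an inactive object only against the second period.

```python
from enum import Enum


class TrackState(Enum):
    ACTIVE = "active"
    INACTIVE = "inactive"
    TERMINATED = "terminated"


FIRST_PERIOD = 30   # frames, example value for the first period
SECOND_PERIOD = 50  # frames, example value for the second period


def next_state(state: TrackState, last_matched: int, t: int,
               first_period: int = FIRST_PERIOD,
               second_period: int = SECOND_PERIOD) -> TrackState:
    """Return the tracking state at image acquisition time t (frame index)."""
    unmatched = t - last_matched
    if state is TrackState.ACTIVE and unmatched >= first_period:
        return TrackState.INACTIVE      # S1230
    if state is TrackState.INACTIVE and unmatched >= second_period:
        return TrackState.TERMINATED    # S1260
    return state                        # otherwise the state is left unchanged


print(next_state(TrackState.ACTIVE, last_matched=100, t=130))
print(next_state(TrackState.INACTIVE, last_matched=100, t=150))
```

Terminated tracks would simply be dropped from the set of comparison candidates for later images.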

FIG. 13 is a block diagram illustrating the configuration of the electronic device 2000 according to an embodiment of the present disclosure.

Referring to FIG. 13, the electronic device 2000 according to an embodiment may include at least one processor 2100, a memory 2200, and a communication interface 2300.

The at least one processor 2100 may control the operations or functions performed by the electronic device 2000 by executing instructions or programmed software modules stored in the memory 2200. The at least one processor 2100 may consist of hardware components that perform arithmetic, logic, and input/output operations and signal processing.

The at least one processor 2100 may consist of, for example, at least one of a central processing unit (CPU), a microprocessor, a graphics processing unit (GPU), application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), an application processor, a neural processing unit (NPU), or a dedicated artificial intelligence processor designed with a hardware structure specialized for processing artificial intelligence models, but is not limited thereto. Each processor constituting the at least one processor 2100 may be a dedicated processor for performing a given function.

By executing one or more instructions stored in the memory 2200, the at least one processor 2100 may control the overall operations by which the electronic device 2000 tracks at least one object using an object tracking model.

The memory 2200 may store instructions, data structures, and program code readable by the at least one processor 2100. The operations performed by the at least one processor 2100 may be implemented by executing the instructions or code of a program stored in the memory 2200.

The memory 2200 may include flash memory type, hard disk type, multimedia card micro type, and card type memory (for example, SD or XD memory). The memory 2200 may include non-volatile memory including at least one of read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, or an optical disk, and volatile memory such as random access memory (RAM) or static random access memory (SRAM).

The memory 2200 according to an embodiment may store a tracking database so that the electronic device 2000 can track at least one object.

The communication interface 2300 may perform wired or wireless communication with other devices or networks. The communication interface 2300 may include a communication circuit or communication module that supports at least one of various wired and wireless communication methods. For example, the communication interface 2300 may perform data communication between the electronic device 2000 and other devices using at least one of data communication methods including wired LAN, wireless LAN, Wi-Fi, Bluetooth, ZigBee, Wi-Fi Direct (WFD), infrared communication (IrDA, Infrared Data Association), Bluetooth Low Energy (BLE), Near Field Communication (NFC), WiBro (Wireless Broadband Internet), WiMAX (Worldwide Interoperability for Microwave Access), Shared Wireless Access Protocol (SWAP), WiGig (Wireless Gigabit Alliance), and RF communication.

The communication interface 2300 according to an embodiment may transmit an image acquired through the camera 2500 to an external server in which the object tracking model is stored. The external server may input the received image into the object tracking model and produce output data. The communication interface 2300 may receive the output data from the external server. In addition, the communication interface 2300 may receive, from an external device, the object tracking model or a training image data set used to train the object tracking model.

FIG. 14 is a block diagram illustrating the configuration of an electronic device according to an embodiment of the present disclosure.

In addition to the components shown in FIG. 13, the electronic device 2000 may further include a display 2400 and a camera 2500.

Descriptions of the at least one processor 2100 and the memory 2200 that overlap with those given for FIG. 13 are omitted below. The memory 2200 may store the tracking database and the object tracking model. The at least one processor 2100 may load the object tracking model from the memory 2200 and execute it.

The display 2400 includes an output interface that provides information or images, and may further include an input interface that receives input. The output interface may include a display panel and a controller that controls the display panel, and may be implemented in various forms, such as an organic light-emitting diode (OLED) display, an active-matrix organic light-emitting diode (AM-OLED) display, or a liquid crystal display (LCD). The input interface may receive various types of input from a user and may include at least one of a touch panel, a keypad, or a pen recognition panel. The display 2400 may be provided as a touch screen in which a display panel and a touch panel are combined, and may be implemented to be flexible or foldable. The display 2400 according to an embodiment may output the paths of the tracked objects in real time.

The camera 2500 is a hardware module that acquires images. The camera 2500 may capture still images or video. The camera 2500 may include at least one camera module and, depending on the specifications of the electronic device 2000, may support functions such as macro, depth, telephoto, wide-angle, and ultra-wide-angle.

The at least one processor 2100 according to an embodiment of the present disclosure may acquire an image by executing at least one instruction stored in the memory 2200. By executing at least one instruction stored in the memory 2200, the at least one processor 2100 may extract, from the image, a feature map for performing a plurality of tasks related to object tracking. By executing at least one instruction stored in the memory 2200, the at least one processor 2100 may use the extracted feature map to extract location information indicating the location of at least one object, an identification feature that allows the at least one object to be identified, and the angle of the body direction in which the body of the at least one object faces. By executing at least one instruction stored in the memory 2200, the at least one processor 2100 may track the at least one object based on the location information, the identification feature, and the body direction angle of the at least one object.

By executing at least one instruction stored in the memory 2200, the at least one processor 2100 according to an embodiment may use the object tracking model to extract the feature map and to extract the location information, the identification feature, and the body direction information of the at least one object. The object tracking model is a multi-task artificial intelligence model that performs a plurality of tasks related to object tracking, and may include a backbone network that extracts the feature map, a detection head that extracts the location information of the at least one object, an identification head that extracts the identification feature of the at least one object, and a body direction head that extracts the body direction angle of the at least one object. The backbone network and each of the heads may consist of at least one layer.

By executing at least one instruction stored in the memory 2200, the at least one processor 2100 according to an embodiment may identify the class of the at least one object.

By executing at least one instruction stored in the memory 2200, the at least one processor 2100 according to an embodiment may use the feature map to extract a heatmap indicating the center position of the at least one object on the feature map, the bounding box size of the at least one object, and a center offset for obtaining the center position of the at least one object on the image. By executing at least one instruction stored in the memory 2200, the at least one processor 2100 may obtain, from the heatmap, the center position of the at least one object on the feature map. By executing at least one instruction stored in the memory 2200, the at least one processor 2100 may extract the location information of the at least one object based on the center position of the at least one object on the feature map, the bounding box size of the at least one object, and the center offset of the at least one object. The heatmap has a heatmap score for each of its pixels and may have a peak heatmap score at the center position of the at least one object on the feature map.
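One way to picture the decoding described above is the sketch below. It is an illustration under several assumptions that the passage does not specify: peaks are taken as cells that reach a score threshold and are the maximum of their 3x3 neighborhood, the center offset is expressed in feature-map cells, the box size is expressed in image pixels, and `stride` is a hypothetical downsampling factor between the image and the feature map. The names `find_peaks` and `decode_box` are likewise introduced here.

```python
from typing import List, Tuple

Grid = List[List[float]]


def find_peaks(heatmap: Grid, threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) cells that reach the threshold and are 3x3 local maxima."""
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for r in range(h):
        for c in range(w):
            score = heatmap[r][c]
            if score < threshold:
                continue
            neighbourhood = [heatmap[rr][cc]
                             for rr in range(max(0, r - 1), min(h, r + 2))
                             for cc in range(max(0, c - 1), min(w, c + 2))]
            if score >= max(neighbourhood):
                peaks.append((r, c))
    return peaks


def decode_box(center: Tuple[int, int], offset: Tuple[float, float],
               size: Tuple[float, float], stride: int) -> Tuple[float, float, float, float]:
    """Map a feature-map center plus offset to an image-space box (x_min, y_min, x_max, y_max)."""
    row, col = center
    dx, dy = offset
    width, height = size
    cx = (col + dx) * stride  # center on the image, x
    cy = (row + dy) * stride  # center on the image, y
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)


heatmap = [[0.1, 0.1, 0.1],
           [0.1, 0.9, 0.1],
           [0.1, 0.1, 0.1]]
for peak in find_peaks(heatmap, threshold=0.5):
    print(peak, decode_box(peak, offset=(0.5, 0.25), size=(8.0, 4.0), stride=4))
```

Cells with equal scores on a plateau would all be reported by this simple check; a production decoder would need a tie-breaking rule.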

By executing at least one instruction stored in the memory 2200, the at least one processor 2100 according to an embodiment may extract identification features using the feature map. By executing at least one instruction stored in the memory 2200, the at least one processor 2100 may extract the identification feature of the at least one object from the identification features based on the center position of the at least one object on the feature map.

By executing at least one instruction, the at least one processor 2100 according to an embodiment may extract body direction embedding vectors using the feature map. By executing at least one instruction stored in the memory 2200, the at least one processor 2100 may extract the body direction embedding vector of the at least one object from the body direction embedding vectors based on the center position of the at least one object on the feature map. By executing at least one instruction stored in the memory 2200, the at least one processor 2100 may extract the body direction angle of the at least one object based on the element having the maximum value among the elements of the body direction embedding vector of the at least one object. The body direction embedding vector consists of elements corresponding to the respective representative angles of the body direction, and the value of each element may be a score for the corresponding representative angle.
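Selecting the body direction angle from the embedding vector amounts to an argmax over per-angle scores, as in the sketch below. The set of representative angles `(0, 90, 180, 270)` and the function name `body_angle` are assumptions made for this example; the passage only states that each element scores one representative angle.

```python
from typing import Sequence

# Assumed representative angles, one per element of the embedding vector.
REPRESENTATIVE_ANGLES = (0, 90, 180, 270)


def body_angle(embedding: Sequence[float],
               angles: Sequence[int] = REPRESENTATIVE_ANGLES) -> int:
    """Return the representative angle whose score is the largest."""
    if len(embedding) != len(angles):
        raise ValueError("one score is expected per representative angle")
    best_index = max(range(len(embedding)), key=lambda i: embedding[i])
    return angles[best_index]


# Hypothetical scores for 0, 90, 180, and 270 degrees.
print(body_angle([0.10, 0.70, 0.15, 0.05]))
```

If two elements share the maximum score, this sketch returns the first one; the disclosure does not state how such ties are handled.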

By executing at least one instruction, the at least one processor 2100 according to an embodiment may compare the identification feature of the at least one object with the identification feature that corresponds to the body direction angle of the at least one object and is included in the identification feature list of each of the tracked objects. By executing at least one instruction stored in the memory 2200, when the identification feature list contains no identification feature corresponding to the body direction angle of the at least one object, the at least one processor 2100 may compare the identification feature of the at least one object with the identification feature corresponding to the representative angle closest to the body direction angle of the at least one object. By executing at least one instruction stored in the memory 2200, the at least one processor 2100 may track the at least one object based on the comparison result. The identification feature list may include the latest identification feature and at least one identification feature corresponding to each representative angle of the body direction.

According to an embodiment, the at least one processor 2100 may execute at least one instruction so that, when the at least one object is matched with at least one of the objects being tracked, the latest identification feature and the identification feature corresponding to the angle of the body direction of the at least one object, both included in the identification feature list of the at least one object, are updated with the identification feature of the at least one object.

According to an embodiment, the at least one processor 2100 may execute at least one instruction to compare the at least one object with the objects being tracked in one of two ways. When comparing with a tracked object that was matched within a preset period from the time of image acquisition, the comparison may be based on the location information and the identification feature of the at least one object. When comparing with a tracked object that was not matched within the preset period from the time of image acquisition, the comparison may be based on the location information, the identification feature, and the angle of the body direction of the at least one object. The at least one processor 2100 may track the at least one object based on the comparison result.

A method of tracking at least one object according to an embodiment of the present disclosure may include acquiring an image. The method may include extracting, from the image, a feature map for performing a plurality of tasks related to object tracking. The method may include extracting, using the extracted feature map, location information indicating the location of the at least one object, an identification feature that enables the at least one object to be identified, and the angle of the body direction in which the body of the at least one object faces. The method may include tracking the at least one object based on the location information, the identification feature, and the angle of the body direction of the at least one object.

According to an embodiment, in the method of tracking at least one object, the extracting of the feature map and the extracting of the location information, the identification feature, and the angle of the body direction of the at least one object may be performed through an object tracking model. The object tracking model is a multi-task artificial intelligence model that performs a plurality of tasks related to object tracking, and may include a backbone network that extracts the feature map, a detection head that extracts the location information of the at least one object, an identification head that extracts the identification feature of the at least one object, and a body direction head that extracts the angle of the body direction of the at least one object. Each of the backbone network and the heads may consist of at least one layer.

According to an embodiment, in the method of tracking at least one object, the object tracking model may be trained with an image dataset that includes an annotation of the class indicating the type of an object, an annotation of the ID of the object, an annotation of bounding box information indicating the position and size of the object as a rectangle, an annotation of segmentation information indicating the boundary of the object, and an annotation of the angle of the body direction in which the body of the object faces.
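As an illustration only, one training annotation carrying the five kinds of labels listed above could be represented as follows. The field names, the (x, y, width, height) box layout, the polygon encoding of the segmentation, and the use of degrees are all assumptions made for this sketch; the disclosure does not specify a storage format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ObjectAnnotation:
    """Hypothetical record for one annotated object in a training image."""
    class_name: str                                   # class indicating the type of object
    object_id: int                                    # ID of the object
    bounding_box: Tuple[float, float, float, float]   # assumed layout: (x, y, width, height)
    segmentation: List[Tuple[float, float]]           # assumed encoding: boundary polygon vertices
    body_direction_deg: float                         # body direction angle, assumed in degrees

    def validate(self) -> None:
        """Reject obviously malformed annotations."""
        _, _, width, height = self.bounding_box
        if width <= 0 or height <= 0:
            raise ValueError("bounding box must have positive size")
        if not 0 <= self.body_direction_deg < 360:
            raise ValueError("body direction angle must lie in [0, 360)")


# Hypothetical example values, not taken from any real dataset.
example = ObjectAnnotation(
    class_name="person",
    object_id=7,
    bounding_box=(10.0, 20.0, 30.0, 60.0),
    segmentation=[(10.0, 20.0), (40.0, 20.0), (40.0, 80.0), (10.0, 80.0)],
    body_direction_deg=90.0,
)
example.validate()
```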

According to an embodiment, in the method of tracking at least one object, the extracting of the location information of the at least one object using the extracted feature map may include identifying the class of the at least one object.

According to an embodiment, in the method of tracking at least one object, the extracting of the location information of the at least one object using the feature map may include extracting, using the feature map, a heatmap indicating the center position of the at least one object on the feature map, a bounding box size of the at least one object, and a center offset for obtaining the center position of the at least one object on the image. The extracting of the location information may include obtaining, from the heatmap, the center position of the at least one object on the feature map. The extracting of the location information may include extracting the location information of the at least one object based on the center position of the at least one object on the feature map, the bounding box size of the at least one object, and the center offset of the at least one object. The heatmap has a heatmap score corresponding to each of its pixels, and may have a peak heatmap score at the center position of the at least one object on the feature map.
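The decoding described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name, the array layouts, the `stride` between feature map and image, the `score_threshold`, the 3x3 neighbourhood used to detect peaks, and the assumption that the box size is expressed in image pixels are all hypothetical choices.

```python
import numpy as np


def decode_locations(heatmap, box_size, center_offset, stride=4, score_threshold=0.5):
    """Turn detection-head outputs into boxes in image coordinates.

    heatmap:       (H, W) heatmap scores
    box_size:      (2, H, W) assumed as (width, height) in image pixels
    center_offset: (2, H, W) assumed as (dx, dy) in feature-map cells
    """
    height, width = heatmap.shape
    detections = []
    for y in range(height):
        for x in range(width):
            score = heatmap[y, x]
            if score < score_threshold:
                continue
            # Keep the cell only if it is a maximum of its 3x3 neighbourhood.
            neighbourhood = heatmap[max(0, y - 1):y + 2, max(0, x - 1):x + 2]
            if score < neighbourhood.max():
                continue
            # Center on the image = (feature-map center + offset) * stride.
            cx = (x + center_offset[0, y, x]) * stride
            cy = (y + center_offset[1, y, x]) * stride
            bw, bh = box_size[0, y, x], box_size[1, y, x]
            detections.append({
                "center_xy": (x, y),  # center position on the feature map
                "box": (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2),
                "score": float(score),
            })
    return detections
```

The feature-map center `center_xy` is returned alongside the box because the identification feature and the body direction embedding vector are later read out at that same position.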

According to an embodiment, in the method of tracking at least one object, the extracting of the identification feature of the at least one object using the feature map may include extracting identification features using the feature map. The extracting of the identification feature of the at least one object may include extracting the identification feature of the at least one object from the identification features, based on the center position of the at least one object on the feature map.

According to an embodiment, in the method of tracking at least one object, the extracting of the angle of the body direction of the at least one object using the feature map may include extracting body direction embedding vectors using the feature map. The extracting of the angle of the body direction may include extracting the body direction embedding vector of the at least one object from the body direction embedding vectors, based on the center position of the at least one object on the feature map. The extracting of the angle of the body direction may include extracting the angle of the body direction of the at least one object based on the element having the maximum value among the elements of the body direction embedding vector of the at least one object. The body direction embedding vector consists of elements corresponding to the respective representative angles of the body direction, and the value of each element may be a score for the corresponding representative angle.
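A minimal sketch of this readout is given below. It assumes, purely for illustration, eight representative angles spaced 45 degrees apart and a dense output of shape (K, H, W) holding one K-element embedding vector per feature-map cell; neither the number of representative angles nor the tensor layout is stated in this passage.

```python
import numpy as np

# Assumed set of representative angles in degrees (hypothetical choice).
REPRESENTATIVE_ANGLES = (0, 45, 90, 135, 180, 225, 270, 315)


def body_direction_angle(direction_map, center_xy, representative_angles=REPRESENTATIVE_ANGLES):
    """Return the representative angle with the highest score at an object's center.

    direction_map: (K, H, W) body direction embedding vectors, one per cell
    center_xy:     (x, y) center position of the object on the feature map
    """
    x, y = center_xy
    scores = direction_map[:, y, x]  # the object's body direction embedding vector
    if scores.shape[0] != len(representative_angles):
        raise ValueError("one score per representative angle is expected")
    return representative_angles[int(np.argmax(scores))]
```

Because the result is always one of the representative angles, it can serve directly as a key into a per-angle store of identification features.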

According to an embodiment, in the method of tracking at least one object, the tracking of the at least one object based on the location information, the identification feature, and the angle of the body direction of the at least one object may include determining whether an identification feature corresponding to the angle of the body direction of the at least one object exists in the identification feature list of each of the objects being tracked.

The tracking of the at least one object based on the location information, the identification feature, and the angle of the body direction of the at least one object may include: when the identification feature list contains an identification feature corresponding to the angle of the body direction of the at least one object, comparing that identification feature with the identification feature of the at least one object; and when the identification feature list contains no identification feature corresponding to the angle of the body direction of the at least one object, comparing the identification feature corresponding to the representative angle closest to the angle of the body direction of the at least one object with the identification feature of the at least one object.

The tracking of the at least one object based on the location information, the identification feature, and the angle of the body direction of the at least one object may include tracking the at least one object based on the comparison result. The identification feature list may include the latest identification feature of the object and at least one identification feature corresponding to each representative angle of the body direction.
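The lookup with its nearest-angle fallback can be sketched as follows. The dictionary keyed by representative angle, the circular distance measure, and the use of cosine similarity as the comparison are assumptions for this sketch; the text only states that the features are compared.

```python
import numpy as np


def angular_distance(a, b):
    """Smallest absolute difference between two angles in degrees, wrapping at 360."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def select_reference_feature(features_by_angle, angle):
    """Pick the stored feature to compare a detection against.

    features_by_angle: dict mapping a representative angle to the identification
                       feature stored for that angle (hypothetical structure)
    angle:             body direction angle of the detected object
    """
    if not features_by_angle:
        return None  # nothing stored yet; handling this case is left to the caller
    if angle in features_by_angle:
        return features_by_angle[angle]
    nearest = min(features_by_angle, key=lambda stored: angular_distance(stored, angle))
    return features_by_angle[nearest]


def cosine_similarity(u, v):
    """Assumed comparison measure between two identification features."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))
```

The wrap-around in `angular_distance` matters near 0 degrees: a detection facing 315 degrees is closer to a feature stored for 0 degrees than to one stored for 180 degrees.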

A method of tracking at least one object according to an embodiment may include, when the at least one object is matched with at least one of the objects being tracked, updating the latest identification feature and the identification feature corresponding to the angle of the body direction of the at least one object, both included in the identification feature list of the at least one object, with the identification feature of the at least one object.
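The update step amounts to two writes into the identification feature list, as in the sketch below. The `FeatureList` class and its field names are hypothetical; the passage only requires that the list hold a latest feature and per-angle features.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class FeatureList:
    """Hypothetical identification feature list of one tracked object."""
    latest: Optional[Any] = None                             # latest identification feature
    by_angle: Dict[int, Any] = field(default_factory=dict)   # one feature per representative angle

    def update(self, feature, angle):
        """After a successful match, overwrite the latest and the angle-specific entry."""
        self.latest = feature
        self.by_angle[angle] = feature
```

Entries for other representative angles are left untouched, so appearance seen from earlier viewpoints stays available for later comparisons.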

According to an embodiment, in the method of tracking at least one object, the tracking of the at least one object based on the location information, the identification feature, and the angle of the body direction of the at least one object may include: when comparing with a tracked object that was matched within a preset period from the time of image acquisition, comparing the at least one object with that object based on the location information and the identification feature of the at least one object; and when comparing with a tracked object that was not matched within the preset period from the time of image acquisition, comparing the at least one object with that object based on the location information, the identification feature, and the angle of the body direction of the at least one object.

The tracking of the at least one object based on the location information, the identification feature, and the angle of the body direction of the at least one object may include tracking the at least one object based on the comparison result.
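One way to read the two comparison modes is as a single cost function that switches its reference feature depending on how recently the tracked object was matched. The sketch below is an assumption-laden illustration: the frame-count threshold `max_gap`, the dictionary layout of `track` and `detection`, the weighted sum of a center distance and a cosine distance, and the fallback to the latest feature when no per-angle feature is stored are all hypothetical.

```python
import numpy as np


def cosine_distance(u, v):
    """1 - cosine similarity; assumed appearance distance."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return 1.0 - float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))


def circular_gap(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def comparison_cost(track, detection, frame_index, max_gap=30, location_weight=0.01):
    """Cost of assigning a detection to a tracked object (lower is better)."""
    location_term = location_weight * float(
        np.linalg.norm(np.subtract(track["center"], detection["center"]))
    )
    recently_matched = frame_index - track["last_matched_frame"] <= max_gap
    by_angle = track["by_angle"]
    if recently_matched or not by_angle:
        # Recently matched: location and identification feature only.
        reference = track["latest"]
    else:
        # Not recently matched: also use the body direction angle to choose
        # the stored feature (exact angle if present, else the nearest one).
        nearest = min(by_angle, key=lambda a: circular_gap(a, detection["angle"]))
        reference = by_angle[nearest]
    return cosine_distance(reference, detection["feature"]) + location_term
```

The resulting costs could then be passed to any assignment procedure; which one is used is outside the scope of this sketch.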

A machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory storage medium" only means that the medium is a tangible device and does not include a signal (e.g., an electromagnetic wave); the term does not distinguish between data being stored semi-permanently and data being stored temporarily in the storage medium. For example, a "non-transitory storage medium" may include a buffer in which data is temporarily stored.

According to an embodiment, a method according to the various embodiments disclosed in this document may be provided as part of a computer program product. The computer program product may be traded as a commodity between a seller and a buyer. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., compact disc read only memory (CD-ROM)), or may be distributed online (e.g., downloaded or uploaded) through an application store or directly between two user devices (e.g., smartphones). In the case of online distribution, at least part of the computer program product (e.g., a downloadable app) may be at least temporarily stored in, or temporarily generated on, a machine-readable storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.

Claims

A method of tracking at least one object, the method comprising:
acquiring an image;
extracting, from the image, a feature map for performing a plurality of tasks related to object tracking;
extracting, using the extracted feature map, location information indicating a location of the at least one object, an identification feature enabling identification of the at least one object, and an angle of a body direction in which a body of the at least one object faces; and
tracking the at least one object based on the location information, the identification feature, and the angle of the body direction of the at least one object.

The method of claim 1,
wherein the extracting of the feature map and the extracting of the location information, the identification feature, and the angle of the body direction of the at least one object are performed through an object tracking model,
wherein the object tracking model is a multi-task artificial intelligence model that performs a plurality of tasks related to object tracking, and includes a backbone network for extracting the feature map, a detection head for extracting the location information of the at least one object, an identification head for extracting the identification feature of the at least one object, and a body direction head for extracting the angle of the body direction of the at least one object, and
wherein each of the backbone network and the heads consists of at least one layer.

The method of any one of claims 1 and 2,
wherein the object tracking model is trained with an image dataset that includes an annotation of a class indicating a type of an object, an annotation of an ID of the object, an annotation of bounding box information indicating a position and a size of the object as a rectangle, an annotation of segmentation information indicating a boundary of the object, and an annotation of an angle of a body direction in which a body of the object faces.

The method of any one of claims 1 to 3,
wherein the extracting of the location information of the at least one object using the extracted feature map comprises
identifying a class of the at least one object.

The method of any one of claims 1 to 4,
wherein the extracting of the location information of the at least one object using the feature map comprises:
extracting, using the feature map, a heatmap indicating a center position of the at least one object on the feature map, a bounding box size of the at least one object, and a center offset for obtaining a center position of the at least one object on the image;
obtaining, from the heatmap, the center position of the at least one object on the feature map; and
extracting the location information of the at least one object based on the center position of the at least one object on the feature map, the bounding box size of the at least one object, and the center offset of the at least one object,
wherein the heatmap has a heatmap score corresponding to each pixel of the heatmap, and has a peak heatmap score at the center position of the at least one object on the feature map.

The method of any one of claims 1 to 5,
wherein the extracting of the identification feature of the at least one object using the feature map comprises:
extracting identification features using the feature map; and
extracting the identification feature of the at least one object from the identification features, based on a center position of the at least one object on the feature map.

The method of any one of claims 1 to 6,
wherein the extracting of the angle of the body direction of the at least one object using the feature map comprises:
extracting body direction embedding vectors using the feature map;
extracting a body direction embedding vector of the at least one object from the body direction embedding vectors, based on a center position of the at least one object on the feature map; and
extracting the angle of the body direction of the at least one object based on an element having a maximum value among elements of the body direction embedding vector of the at least one object,
wherein the body direction embedding vector consists of the elements corresponding to respective representative angles of the body direction, and a value of each of the elements is a score for the corresponding representative angle of the body direction.

The method of any one of claims 1 to 7,
wherein the tracking of the at least one object based on the location information of the at least one object, the identification feature of the at least one object, and the angle of the body direction of the at least one object comprises:
determining whether an identification feature corresponding to the angle of the body direction of the at least one object exists in an identification feature list of each of the objects being tracked;
when the identification feature list includes an identification feature corresponding to the angle of the body direction of the at least one object, comparing the identification feature that is included in the identification feature list and corresponds to the angle of the body direction of the at least one object with the identification feature of the at least one object;
when the identification feature list includes no identification feature corresponding to the angle of the body direction of the at least one object, comparing the identification feature that is included in the identification feature list and corresponds to the representative angle closest to the angle of the body direction of the at least one object with the identification feature of the at least one object; and
tracking the at least one object based on a result of the comparing,
wherein the identification feature list includes a latest identification feature of the object and at least one identification feature corresponding to each representative angle of the body direction.

The method of any one of claims 1 to 8,
further comprising,
when the at least one object is matched with at least one of the objects being tracked, updating the latest identification feature and the identification feature corresponding to the angle of the body direction of the at least one object, both included in the identification feature list of the at least one object, with the identification feature of the at least one object.

The method of any one of claims 1 to 9,
wherein the tracking of the at least one object based on the location information of the at least one object, the identification feature of the at least one object, and the angle of the body direction of the at least one object comprises:
when comparing with an object, among the objects being tracked, that was matched within a preset period from the time of image acquisition, comparing the at least one object with that object based on the location information of the at least one object and the identification feature of the at least one object;
when comparing with an object, among the objects being tracked, that was not matched within the preset period from the time of image acquisition, comparing the at least one object with that object based on the location information of the at least one object, the identification feature of the at least one object, and the angle of the body direction of the at least one object; and
tracking the at least one object based on a result of the comparing.

An electronic device (2000) for tracking at least one object, the electronic device (2000) comprising:
a communication interface (2300);
a memory (2200) storing at least one instruction; and
at least one processor (2100) configured to execute the at least one instruction stored in the memory (2200),
wherein the at least one processor (2100) is configured to, by executing the at least one instruction:
acquire an image;
extract, from the image, a feature map for performing a plurality of tasks related to object tracking;
extract, using the extracted feature map, location information indicating a location of the at least one object, an identification feature enabling identification of the at least one object, and an angle of a body direction in which a body of the at least one object faces; and
track the at least one object based on the location information, the identification feature, and the angle of the body direction of the at least one object.

The electronic device (2000) of claim 11,
wherein the operation of extracting the feature map and the operation of extracting the location information, the identification feature, and the body direction information of the at least one object are performed through an object tracking model,
wherein the object tracking model is a multi-task artificial intelligence model that performs a plurality of tasks related to object tracking, and includes a backbone network for extracting the feature map, a detection head for extracting the location information of the at least one object, an identification head for extracting the identification feature of the at least one object, and a body direction head for extracting the angle of the body direction of the at least one object, and
wherein each of the backbone network and the heads consists of at least one layer.

The electronic device (2000) of any one of claims 11 and 12,
wherein the object tracking model is trained with an image dataset that includes an annotation of a class indicating a type of an object, an annotation of an ID of the object, an annotation of bounding box information indicating a position and a size of the object as a rectangle, an annotation of segmentation information indicating a boundary of the object, and an annotation of an angle of a body direction in which a body of the object faces.

The electronic device (2000) of any one of claims 11 to 13,
wherein the at least one processor (2100) is configured to, by executing the at least one instruction,
identify a class of the at least one object.

The electronic device (2000) of any one of claims 11 to 14,
wherein the at least one processor (2100) is configured to, by executing the at least one instruction:
extract, using the feature map, a heatmap indicating a center position of the at least one object on the feature map, a bounding box size of the at least one object, and a center offset for obtaining a center position of the at least one object on the image;
obtain, from the heatmap, the center position of the at least one object on the feature map; and
extract the location information of the at least one object based on the center position of the at least one object on the feature map, the bounding box size of the at least one object, and the center offset of the at least one object,
wherein the heatmap has a heatmap score corresponding to each pixel of the heatmap, and has a peak heatmap score at the center position of the at least one object on the feature map.

The electronic device (2000) of any one of claims 11 to 15,
wherein the at least one processor (2100) is configured to, by executing the at least one instruction:
extract identification features using the feature map; and
extract the identification feature of the at least one object from the identification features, based on a center position of the at least one object on the feature map.

The electronic device (2000) of any one of claims 11 to 16,
wherein the at least one processor (2100) is configured to, by executing the at least one instruction:
extract body direction embedding vectors using the feature map;
extract a body direction embedding vector of the at least one object from the body direction embedding vectors, based on a center position of the at least one object on the feature map; and
extract the angle of the body direction of the at least one object based on an element having a maximum value among elements of the body direction embedding vector of the at least one object,
wherein the body direction embedding vector consists of the elements corresponding to respective representative angles of the body direction, and a value of each of the elements is a score for the corresponding representative angle of the body direction.

The electronic device (2000) of any one of claims 11 to 17,
wherein the at least one processor (2100) is configured to, by executing the at least one instruction:
determine whether an identification feature corresponding to the angle of the body direction of the at least one object exists in an identification feature list of each of the objects being tracked;
when the identification feature list includes an identification feature corresponding to the angle of the body direction of the at least one object, compare the identification feature that is included in the identification feature list and corresponds to the angle of the body direction of the at least one object with the identification feature of the at least one object;
when the identification feature list includes no identification feature corresponding to the angle of the body direction of the at least one object, compare the identification feature corresponding to the representative angle closest to the angle of the body direction of the at least one object with the identification feature of the at least one object; and
track the at least one object based on a result of the comparison,
wherein the identification feature list includes a latest identification feature and at least one identification feature corresponding to each representative angle of the body direction.

The electronic device (2000) of any one of claims 11 to 18,
wherein the at least one processor (2100) is configured to, by executing the at least one instruction,
when the at least one object is matched with at least one of the objects being tracked, update the latest identification feature and the identification feature corresponding to the angle of the body direction of the at least one object, both included in the identification feature list of the at least one object, with the identification feature of the at least one object.

A computer-readable recording medium having recorded thereon a program for performing, on a computer, the method of any one of claims 1 to 10.