KR20220079428A

KR20220079428A - Method and apparatus for detecting object in video

Info

Publication number: KR20220079428A
Application number: KR1020210140401A
Authority: KR
Inventors: 징타오 수; 이웨이 첸; 박창범; 이현정; 유병인; 한재준; 치앙 왕; 지아치안 유
Original assignee: 삼성전자주식회사
Priority date: 2020-12-04
Filing date: 2021-10-20
Publication date: 2022-06-13
Also published as: CN114596515A

Abstract

Disclosed are a method and an apparatus for detecting a target object, applicable to fields such as artificial intelligence, object tracking, object detection, and image processing. An object is detected from a frame image of a video comprising a plurality of frame images, based on a target template set that includes one or more target templates. Using the target template set improves the accuracy of object detection and tracking.

Description

METHOD AND APPARATUS FOR DETECTING OBJECT IN VIDEO

The disclosure below relates to a method and apparatus for tracking and detecting an object in a video.

Visual object tracking is an important research area in computer vision. Object tracking is generally used to track moving objects such as people, animals, aircraft, and cars.

When a moving object is tracked, the target object to be tracked may be indicated in the first frame (also called the initial frame) of a video. An object tracking algorithm then tracks the target object continuously through the subsequent frames of the video and provides the location of the target object within each frame.

For object tracking, template information associated with the target object is extracted based on the target object indicated in the first frame. Within a search region of each subsequent video frame, the degree of matching between the template information and each of a plurality of different candidate regions is calculated, and the best-matching candidate region is determined to be the location of the target object.

According to an embodiment, a method is provided for detecting an object from a frame image of a video comprising a plurality of frame images. The method includes detecting the object from the frame image based on a target template set, and outputting information regarding the detected object. The target template set includes one or more target templates. Each of the one or more target templates includes information on the object in a respective frame image that, among the frame images preceding the frame image in the video, was determined to include the object.

According to an embodiment, the target template set may further include an initial target template. The initial target template may include information on the object in a frame image that a user has determined, among the frame images of the video, to include the object. Alternatively, the initial target template may include a separate image that is independent of the video and includes the object.

According to an embodiment, detecting the object from the frame image may include: determining a target fusion feature (integrated target feature) by integrating the image features of the image regions corresponding to each of the target templates included in the target template set; obtaining, based on the target fusion feature, one or more target candidate areas determined to include the object within the frame image; and determining one target area from the one or more target candidate areas.

According to an embodiment, obtaining the one or more target candidate areas may include: determining a plurality of search areas within the frame image; extracting the image features of each of the plurality of search areas to obtain a search area feature for each; calculating a correlation between the search area feature of each of the plurality of search areas and the target fusion feature; and determining, based on the correlations, the one or more target candidate areas from among the plurality of search areas.
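The fusion and correlation steps described above can be sketched as follows. This is a minimal illustration only: the disclosure does not specify how features are fused or how correlation is computed, so element-wise averaging, cosine similarity, and the names `fuse`, `cosine`, `select_candidates`, and `top_k` are all assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity, used here as an assumed correlation measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def fuse(features):
    """Fuse template features by element-wise averaging (an assumption)."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

def select_candidates(search_features, fused, top_k=2):
    """Return indices of the top_k search areas most correlated with the fused feature."""
    scored = sorted(
        ((cosine(f, fused), i) for i, f in enumerate(search_features)),
        reverse=True,
    )
    return [i for _, i in scored[:top_k]]

# Hypothetical 2-D features for two target templates and three search areas.
template_features = [[1.0, 0.0], [0.8, 0.2]]
fused = fuse(template_features)
search_features = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
candidates = select_candidates(search_features, fused)
```

In this toy input, search areas 1 and 2 correlate most strongly with the fused feature and become the target candidate areas; a real system would use learned feature maps rather than hand-written vectors.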

According to an embodiment, the method may further include updating the target template set. Updating the target template set may include calculating a similarity between the target fusion feature of the target template set and the target area, and, when the similarity is less than a threshold, adding the target area to the target template set as a target template.

According to an embodiment, detecting the object from the frame image may include obtaining one or more target candidate areas determined to include the object within the frame image, and determining one target area from the one or more target candidate areas. The method may further include updating the target template set when the target area satisfies a predetermined condition.

According to an embodiment, updating the target template set may include calculating a similarity between the target area and each of the target templates in the target template set, and, when all of the similarities are less than a threshold, adding the target area to the target template set as a target template.
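The update rule described above (add the target area only when it is dissimilar to every stored template) might look like the sketch below. Templates are represented as plain feature vectors, and cosine similarity, the threshold value 0.9, and the name `maybe_add_template` are assumptions chosen for illustration, not details given in the disclosure.

```python
import math

def cosine(a, b):
    """Cosine similarity, used here as an assumed similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def maybe_add_template(template_set, target_feature, threshold=0.9):
    """Add target_feature only if it is below the threshold against ALL stored templates."""
    if all(cosine(t, target_feature) < threshold for t in template_set):
        template_set.append(target_feature)
        return True
    return False

templates = [[1.0, 0.0]]
# Nearly identical to the stored template, so it is rejected.
added_similar = maybe_add_template(templates, [0.99, 0.05])
# Clearly different appearance, so it is stored as a new template.
added_novel = maybe_add_template(templates, [0.5, 0.8])
```

Rejecting near-duplicates keeps the set small, which limits the number of comparisons needed on later frames.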

According to an embodiment, detecting the object from the frame image may include detecting the object from the frame image based on the target template set and an interference template set. The interference template set includes one or more interference templates. Each of the one or more interference templates includes information on an interference object that interfered with detection of the object in the frame images preceding the frame image in the video.

According to an embodiment, detecting the object from the frame image may include obtaining, based on the target template set and the interference template set, one or more target candidate areas determined to include the object within the frame image, and determining one target area from the one or more target candidate areas.

According to an embodiment, determining one target area from the one or more target candidate areas may include: for each of the one or more target candidate areas, calculating a matching degree between the target candidate area and each target template of the target template set; for each of the one or more target candidate areas, calculating a matching degree between the target candidate area and each interference template of the interference template set; calculating a target matching degree for each target candidate area based on its matching degrees with the target templates and its matching degrees with the interference templates; and determining one target area from among the one or more target candidate areas based on the target matching degree of each target candidate area.

According to an embodiment, calculating the target matching degree of each target candidate area may include calculating it based on the mean or median of the matching degrees between the target candidate area and each target template of the target template set, and/or the mean or median of the matching degrees between the target candidate area and each interference template of the interference template set.
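One way to combine the two groups of matching degrees is sketched below. The disclosure states only that the target matching degree is based on the mean or median of each group; subtracting the interference statistic from the target statistic is an assumption made here for illustration, as are the names `target_matching_degree` and `reduce` and the example scores.

```python
from statistics import mean, median

def target_matching_degree(target_scores, interference_scores, reduce=mean):
    """Combine per-template matching degrees into one score for a candidate.

    reduce is either statistics.mean or statistics.median.
    The subtraction is an assumed combination rule, not taken from the source.
    """
    target_part = reduce(target_scores)
    interference_part = reduce(interference_scores) if interference_scores else 0.0
    return target_part - interference_part

# Hypothetical matching degrees for two target candidate areas.
candidates = {
    "A": {"target": [0.9, 0.8, 0.7], "interference": [0.2, 0.4]},
    "B": {"target": [0.85, 0.9, 0.8], "interference": [0.9, 0.7]},
}
degrees = {
    name: target_matching_degree(s["target"], s["interference"])
    for name, s in candidates.items()
}
best = max(degrees, key=degrees.get)
```

Candidate B matches the target templates slightly better than A, but it also matches the interference templates strongly, so under this rule A is selected as the target area. Passing `reduce=median` switches to the median variant.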

According to an embodiment, obtaining the one or more target candidate areas may include: determining a target fusion feature (integrated target feature) based on the target template set; determining an interference fusion feature (integrated interference feature) based on the interference template set; and obtaining the one or more target candidate areas from the frame image based on the target fusion feature and the interference fusion feature.

According to an embodiment, determining the target fusion feature based on the target template set may include determining the target fusion feature by integrating the image features of the image regions corresponding to each of the target templates included in the target template set. Determining the interference fusion feature based on the interference template set may include determining the interference fusion feature by integrating the image features of the image regions corresponding to each of the interference templates included in the interference template set.

According to an embodiment, determining the target fusion feature based on the target template set may include determining the target fusion feature based on all target templates included in the target template set. Determining the interference fusion feature based on the interference template set may include determining the interference fusion feature based on all interference templates included in the interference template set.

According to an embodiment, the method may further include updating the interference template set. In one example, updating the interference template set may include adding some or all of the target candidate areas other than the target area to the interference template set as interference templates. In another example, updating the interference template set may include calculating a similarity between the interference fusion feature of the interference template set and each of some or all of the other target candidate areas, and adding to the interference template set, as an interference template, each such target candidate area whose similarity is less than a threshold.
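The second update example above (store only those non-target candidates that are dissimilar to the interference fusion feature) can be sketched as follows. Averaging as the fusion operation, cosine similarity, the threshold 0.9, and the name `update_interference_set` are assumptions for illustration; the disclosure does not fix any of them.

```python
import math

def cosine(a, b):
    """Cosine similarity, used here as an assumed similarity measure."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def fuse(features):
    """Element-wise average of feature vectors (an assumed fusion operation)."""
    n = len(features)
    return [sum(column) / n for column in zip(*features)]

def update_interference_set(interference_set, candidates, target_index, threshold=0.9):
    """Add non-target candidates that are dissimilar to the fused interference feature."""
    fused = fuse(interference_set) if interference_set else None
    added = []
    for i, feature in enumerate(candidates):
        if i == target_index:
            continue  # the chosen target area is never an interference template
        if fused is None or cosine(feature, fused) < threshold:
            added.append(feature)
    interference_set.extend(added)
    return added

interference_set = [[0.0, 1.0]]
candidates = [[1.0, 0.0], [0.1, 1.0], [0.7, 0.7]]  # index 0 is the target area
added = update_interference_set(interference_set, candidates, target_index=0)
```

Here the candidate `[0.1, 1.0]` is nearly identical to the stored interference template and is skipped, while `[0.7, 0.7]` is added as a new interference template.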

According to the embodiments, because one or more target templates are included in the target template set, the target template set holds richer information, and detection of the target object need not rely on the first frame image alone. That is, the target template set records the various feature information that the target object may exhibit, so it describes the target object more comprehensively, which improves the accuracy of target object detection.

FIG. 1 is a diagram for describing an object tracking method according to an embodiment.
FIG. 2 is a diagram for describing an object tracking method according to an embodiment.
FIG. 3 is a flowchart of an object tracking method according to an embodiment.
FIG. 4 is a diagram for describing memory matching in an object tracking method according to an embodiment.
FIG. 5 is a diagram for describing a light-weight object detection method according to an embodiment.
FIG. 6 is a diagram for describing a light-weight object detection method according to another embodiment.
FIG. 7 is a diagram for describing a memory update according to an embodiment.
FIG. 8 is a block diagram of an object detection apparatus according to an embodiment.
FIG. 9 is a block diagram of an electronic device for object detection according to an embodiment.

Specific structural or functional descriptions of the embodiments are disclosed for purposes of illustration only, and the embodiments may be modified and implemented in various forms. Accordingly, actual implementations are not limited to the specific embodiments disclosed, and the scope of the present disclosure includes modifications, equivalents, and substitutes that fall within the technical idea described through the embodiments.

Although terms such as "first" and "second" may be used to describe various components, these terms should be interpreted only as distinguishing one component from another. For example, a "first component" may be termed a "second component", and similarly a "second component" may be termed a "first component".

When a component is referred to as being "connected" to another component, it may be directly connected or coupled to that other component, but it should be understood that intervening components may also be present.

Singular expressions include plural expressions unless the context clearly indicates otherwise. In the present disclosure, terms such as "comprise" or "have" are intended to specify the presence of the described features, numbers, steps, operations, components, parts, or combinations thereof, and should be understood as not precluding the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the relevant art. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and are not to be interpreted in an idealized or overly formal sense unless expressly so defined in the present disclosure.

Hereinafter, embodiments are described in detail with reference to the accompanying drawings. In the description with reference to the drawings, like components are given like reference numerals regardless of the figure number, and redundant descriptions of them are omitted.

When a target object is tracked, its feature information may change under the influence of illumination changes, scale changes, background interference, movement out of the field of view, in-plane rotation, low resolution, fast motion, motion blur, occlusion, and the like. This can reduce the accuracy of object identification and/or object tracking. The embodiments disclosed herein provide object detection methods that can address at least one of these problems. The methods may be executed by various electronic devices, such as terminal devices or servers.

In an embodiment, a method is provided for detecting and/or tracking an object from a frame image of a video comprising a plurality of frame images.

The method detects and/or tracks the object from the frame image based on a target template set, and outputs information regarding the detected object.

The method may be used to track and detect a target object in a video, or to detect a target object from an arbitrary frame image, but is not limited to these uses.

The source of the video to which the embodiments can be applied is not limited. For example, the video may be a movie, or a video captured by a visible light camera, such as CCTV footage, surveillance footage, or a live video of a sporting event.

The object detection method according to an embodiment may be applied to visual object tracking. Visual object tracking is an important computer vision research topic with many practical applications in everyday life. For example, object tracking/detection according to an embodiment may be used to find a target object (for example, a suspect, a missing child, a missing elderly person, or a missing animal) in surveillance footage, or to locate a target player in a live video of a sporting event. It may also be used to determine the route a vehicle has traveled from surveillance footage, or to track a small aircraft in flight. As another example, it may be used on a smartphone to assist with camera focusing, video effect generation, moving object analysis, and the like. The scenario to which object tracking/detection according to the embodiments is applied may be determined according to actual needs and is not limited to these examples.

When applied to video tracking and/or detection, the target template set includes one or more target templates. Each of the one or more target templates corresponds to one of the frame images preceding the current frame image in the video. Each of the one or more target templates includes information on the object in a respective frame image that, among those previous frame images, was determined to include the target object. The information on the object may be the image region in which the target object is located.

The target object may be understood as an object to be tracked and/or detected, such as a moving person, animal, or vehicle (a car, an aircraft, and so on). The target object to be tracked and/or detected may be selected according to actual needs, and no limitation is imposed here. Optionally, the object to be tracked in the video (that is, the target object) may be determined by a user's indication, or by an image of an already known object. For example, a user may determine the target object by indicating an object in the video frame in which the target object first appears, and the image region selected by the user may be used as the initial target template of the target object. Alternatively, the initial target template may be an image of an already known object. For example, when searching for a specific person, a photograph of that person may be used as the initial target template.

The target template set is stored in storage such as a memory or a hard disk. The storage that holds the target template set is referred to here as the "target memory". A target template included in the target template set may include the image region corresponding to the detection result of the target object in an already processed frame image of the video (that is, the region in which the target object is located in that frame image). For example, the target template may be the image region of the frame image in which the target object is located; in other words, the image region containing the target object may itself be used as the target template. Depending on the embodiment, the target template stored in the target template set may instead be feature information of that image region. For example, the target template may be image feature information, extracted from the image, that represents the image region; in other words, the feature information extracted from the image region containing the target object may be used as the target template. A target template included in the target template set may also be the location information of the target object detected in each frame image.
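The alternative template representations described above (an image region, its feature information, or location information) could be held in a structure like the following sketch. The class and field names (`TargetTemplate`, `TargetMemory`, `box`, `patch`, `feature`, `all_templates`) are hypothetical and chosen only for illustration; the disclosure does not define a data layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class TargetTemplate:
    """One target template; any combination of the optional fields may be populated."""
    frame_index: Optional[int] = None                  # None for an image independent of the video
    box: Optional[Tuple[int, int, int, int]] = None    # location as (x, y, width, height)
    patch: Optional[List[List[int]]] = None            # cropped image region (pixel rows)
    feature: Optional[List[float]] = None              # feature information of the region

@dataclass
class TargetMemory:
    """Storage for the target template set ("target memory")."""
    initial: Optional[TargetTemplate] = None
    templates: List[TargetTemplate] = field(default_factory=list)

    def all_templates(self) -> List[TargetTemplate]:
        """Return the initial template (if any) followed by the later templates."""
        head = [self.initial] if self.initial is not None else []
        return head + self.templates

# Initial template from a user-selected region, plus one template from a later frame.
memory = TargetMemory(initial=TargetTemplate(frame_index=0, box=(10, 20, 64, 64)))
memory.templates.append(TargetTemplate(frame_index=7, feature=[0.9, 0.1]))
```

Keeping the initial template separate from later ones is a design choice of this sketch; it makes it easy to retain the user-provided template even if later templates are pruned.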

The target template set may also include an initial target template. The initial target template may be information on the target object in a frame image that a user has determined, among the frame images of the video, to include the target object. For example, the user may determine the initial target template from the frame image in which the target object first appears in the video. In that frame image, the user may indicate the target object to be tracked or designate an image region that includes the target object. The designated image region may be used as the initial target template, or the feature information of that image region may be used as the initial target template.

In an embodiment, the initial target template may be obtained from a separate image that includes the target object and is independent of the video in which the target object is to be tracked. For example, when a specific person is to be tracked and/or identified, a photograph of that person, or image feature information of the photograph, may be used as the initial target template.

In addition to the initial target template, the target template set may further include target templates corresponding to other frame images in which the target object has already been detected. For example, it may further include a target template corresponding to at least one of the frame images that precede, in the video, the frame image in which the target object is to be detected/tracked.

For example, when the current frame image (the subject frame image) is the second frame image, the previous frame image is the first frame image. If the first frame image is determined to include the target object, the target template set may include information on the image region containing the target object in the first frame image (a first target template).

When the current frame image is the third frame image, the previous frame images are the first frame image and the second frame image. If the first and second frame images are determined to include the target object, the target template set may include information on the image region containing the target object (the target area) in the first frame image (a first target template) and information on the image region containing the target object in the second frame image (a second target template).

In an embodiment, for the subject frame image, the target templates corresponding to all of its previous frame images may all be included in the target template set.

Depending on the embodiment, the target template set may include only some target templates rather than the target templates corresponding to every previous frame image determined to include the target object. For example, for the subject frame image, only those target templates that satisfy a specific condition, among the templates corresponding to its previous frame images, may be included in the target template set. The specific condition may be configured according to the embodiment.

For example, whether a new target template is included in the target template set may be determined according to its similarity, matching degree, or the like with the target templates already stored in the set. For example, if the similarity or matching degree is less than a threshold value, the new target template may be included in the target template set. In this way, a target template that resembles the target templates already in the set (one whose similarity is higher than the threshold value) is not added. Not adding target templates similar to the existing target templates effectively reduces the amount of subsequent computation.
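The similarity gate described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes templates are stored as feature vectors, uses cosine similarity as the similarity measure, compares the new template against every stored template, and uses a hypothetical threshold of 0.9. None of these choices are specified in the source.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def maybe_add_template(template_set, new_template, threshold=0.9):
    """Append new_template only if it is not too similar to any stored one.

    The threshold value and the "compare against every stored template"
    rule are assumptions made for this sketch.
    Returns True when the template was added.
    """
    for stored in template_set:
        if cosine_similarity(stored, new_template) >= threshold:
            return False  # a near-duplicate already exists, so skip it
    template_set.append(new_template)
    return True

templates = [[1.0, 0.0]]                                  # initial target template
added_dup = maybe_add_template(templates, [0.99, 0.05])   # nearly identical
added_new = maybe_add_template(templates, [0.0, 1.0])     # clearly different
```

Here the near-duplicate is rejected and the dissimilar template is stored, so the set grows only when a template contributes new appearance information.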

The target template set used to detect the target object may differ from one subject frame image to another. For example, the target template set used to detect the target object in the 50th frame image may differ from the set used for the 200th frame image. Likewise, the number of target templates included in the set that is used may differ.

When the target object is tracked and/or detected in a video, detection of the target object is performed on the subject frame image using the target template set, and a detection result of the target object is obtained. The detection result may be the image area of the subject frame image determined to include the target object, image feature information of that image area, or position information of the target object in the subject frame image. The detection result may further include a confidence for these items. The confidence may be used when processing the detection result, and its value may lie in the range [0, 1].

According to an embodiment, the target object is detected and/or tracked in a frame image using the target template set, and a detection result of the target object in the frame image is obtained. Because the target template set includes not only the initial target template but also target templates corresponding to the image regions in which the target object was located in previous frame images, the set carries richer information. That is, the diversity of the target templates in the set enriches the information available for detecting the target object, and the accuracy of the detection result improves accordingly.

According to an embodiment, detecting the object from the frame image includes detecting the object from the frame image based on the target template set and an interference template set.

The interference template set includes one or more interference templates. Each of the one or more interference templates includes information about an interference object that interfered with the detection of the object in the frame images preceding the subject frame image of the video.

The interference object may also be called an interfering object, that is, an object that can interfere with the detection of the target object. An interference object may be an object, a texture (a non-object), or a shape that is visually similar to the target object being tracked in the frame images of the video.

The interference template set is stored in a storage device such as a memory or a hard disk. The storage device that stores the interference template set is referred to herein as the "interference memory". An interference template included in the interference template set may include an image region containing an interference object that interfered with detection of the target object in the frame images of the video (that is, the region of the frame image in which the interference object is located). For example, an interference template may be the image region of a frame image in which an interference object corresponding to the target object is located; in other words, the image region containing the interference object may itself be used as the interference template. Alternatively, an interference template may include an image region of the frame image other than the image region determined to include the target object (the target area). Depending on the embodiment, an interference template stored in the interference template set may instead be feature information of the image region containing the interference object. For example, an interference template may be image feature information that is extracted from the image and represents the image region; that is, feature information extracted from the image region containing the interference object in each frame image may be used as the interference template. An interference template may also be position information of the interference object detected in the frame image.

In an embodiment, the interference template set includes interference templates corresponding to the target object. Each interference template is an image region in which an interference object of the target object is located in a frame image preceding the subject frame image.

For example, when the subject frame image is the third frame image, the previous frame images include the first frame image and the second frame image. When the target object is selected by the user in the first frame image, the initial target template is the image region of the first frame image selected by the user. An interference object may also be selected from the first frame image by the user. In this case, information about the image region corresponding to the selected interference object may be added to the interference template set. Optionally, detection of interference objects corresponding to the target object may be omitted for the first frame image. That is, no interference object is detected from the first frame.

When the target object is detected in the second frame image, a plurality of target candidate regions are obtained. Some or all of the target candidate regions (image regions) other than the target area, which is the candidate region determined to include the target object, may be determined to be image regions in which interference objects of the target object are located. These regions may be added to the interference template set as interference templates.

When the target object is detected in the third frame image, the target templates in the target template set (the target template obtained from the first frame and the target template obtained from the second frame) and the interference templates in the interference template set (the interference templates obtained from the second frame image) may be used. In addition, when the target object is detected in the third frame image, further interference templates may be obtained from the third frame image in the same way.

In an embodiment, both kinds of information, the target template set and the interference template set, may be considered when detecting the target object from a frame image. That is, a target object that has a high similarity to the target template set and a low similarity to the interference template set may be detected.
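One way to picture "high similarity to the target template set, low similarity to the interference template set" is a score that rewards the first and penalizes the second. The sketch below is only an assumption for illustration: the source does not give a scoring formula, and the dot-product similarity, the use of the best-matching template in each set, and the `weight` parameter are all hypothetical.

```python
def dot(a, b):
    """Dot product used here as a simple similarity between feature vectors."""
    return sum(x * y for x, y in zip(a, b))

def candidate_score(candidate, target_templates, interference_templates, weight=1.0):
    """Score a candidate region feature against both template sets.

    Higher is better: similarity to the closest target template minus a
    weighted similarity to the closest interference template.
    """
    target_sim = max(dot(candidate, t) for t in target_templates)
    # default=0.0 covers the case where no interference template exists yet
    interference_sim = max(
        (dot(candidate, d) for d in interference_templates), default=0.0
    )
    return target_sim - weight * interference_sim

target_templates = [[1.0, 0.0]]
interference_templates = [[0.0, 1.0]]
candidates = [[0.9, 0.1], [0.6, 0.8]]

scores = [
    candidate_score(c, target_templates, interference_templates) for c in candidates
]
best_index = max(range(len(candidates)), key=lambda i: scores[i])
```

The second candidate resembles the target moderately but resembles the interference object more, so the first candidate is preferred.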

The information about an interference object may be, for example, the image area of a previous frame image determined to contain the interference object, image feature information of that image area, or position information of the interference object in the previous frame image.

A target object in a video undergoes various changes caused by illumination changes, rotation of the target object, fast movement, motion blur, occlusion, deformation, background interference, and the like. According to embodiments, the target memory (or target template set) remembers these changes of the target object, so that the target object can be identified and tracked more accurately.

Further, according to embodiments, the interference memory (or interference template set) remembers image features of the interference objects that interfere with detection of the target object, so that the influence of interference objects on detection of the target object can be eliminated or reduced.

Detecting the target object by integrating the target memory (or target template set) with the interference memory (or interference template set) comprehensively reflects the various features of the target object and also eliminates or reduces the negative influence of interference objects, which can greatly improve the accuracy of identifying and/or tracking the target object.

In an embodiment, detecting the object from the frame image includes obtaining one or more target candidate areas determined to include the object within the frame image, and determining one target area from the one or more target candidate areas.

A target candidate area indicates a possible position of the target object in the frame image and may also be called a candidate area, an anchor region, an anchor box, or an anchor. Target candidate areas may vary in position and size.

In an embodiment, when the target object is detected in a frame image, one or more target candidate areas may be obtained from the frame image using the target template set.

In an embodiment, obtaining the one or more target candidate areas may include determining a plurality of search areas within the frame image, extracting image features of each of the plurality of search areas to obtain a search area feature, calculating a correlation between the search area feature of each search area and the target fusion feature, and determining the one or more target candidate areas among the plurality of search areas based on the correlation.

For example, a plurality of search area boxes may be obtained from the subject frame image. For example, the search area boxes may be set around the area determined to include the target object in the previous frame. According to an embodiment, the images of all of these search area boxes may serve as target candidate areas. According to another embodiment, the image of each search area box may be matched one by one against each target template in the target template set. Alternatively, the image of each search area box may be matched against an integration template obtained by fusing all target templates in the target template set. After the matching, one or more search areas with a high matching degree are selected as target candidate areas. A target area is then selected from among the target candidate areas, and the detection result of the target object is obtained based on the target area.
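Setting search area boxes around the previous target area could look like the sketch below. The source does not say how the boxes are laid out, so the grid of centre shifts, the `shifts` and `scales` parameters, the `(cx, cy, w, h)` box convention, and the clamping to the frame are all assumptions. The sketch also assumes each box is smaller than the frame.

```python
def search_boxes(prev_box, frame_w, frame_h, shifts=(-0.5, 0.0, 0.5), scales=(1.0,)):
    """Generate search area boxes around the previous target area.

    prev_box is (cx, cy, w, h) in pixels, with (cx, cy) the box centre.
    Each returned box uses the same convention.
    """
    cx, cy, w, h = prev_box
    boxes = []
    for s in scales:
        bw, bh = w * s, h * s
        for dx in shifts:
            for dy in shifts:
                # Shift the centre by a fraction of the previous box size,
                # then clamp so the whole box stays inside the frame.
                ncx = min(max(cx + dx * w, bw / 2), frame_w - bw / 2)
                ncy = min(max(cy + dy * h, bh / 2), frame_h - bh / 2)
                boxes.append((ncx, ncy, bw, bh))
    return boxes

boxes = search_boxes((100, 80, 40, 60), frame_w=640, frame_h=480)
```

With three shifts per axis and one scale this yields nine boxes, the middle one coinciding with the previous target area. Adding scales to the tuple would also cover changes in apparent object size.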

It will be understood that the embodiments may be implemented by a pre-trained neural network model.

FIG. 1 illustrates a neural network model implementing an object tracking method according to an embodiment.

In FIG. 1, a video containing a sports scene with athletes is given as an example. The target object to be tracked in the video is a specific athlete (who may be called the "target player"). The current frame image 110 is the frame image of the video in which the target player is to be detected. In the embodiment of FIG. 1, when the user spots the target object (the target player) during playback of the video, the user indicates the target object in that frame image. Information on the part of that frame image that contains the target object is then used as the initial target template 120.

The memory pool 130 includes a target memory 131. The target memory 131 is a storage space that stores the target template set. The memory pool may be implemented as a memory, a hard disk drive (HDD), a flash memory, or the like, but is not limited thereto. The target template set stores the initial target template 120 and the target templates corresponding to the frame images preceding the current frame image 110.

In an embodiment, to detect the target object in the current frame image 110, the current frame image 110 is input to a backbone network 140. The backbone network 140 is a network that extracts image features using a neural network; MobileNet, ResNet, Xception Network, or the like may be used, but the backbone network is not limited thereto. The backbone network 140 receives the current frame image 110 and outputs an image feature 145 corresponding to the current frame image 110. For example, the backbone network 140 may output a W x H x C feature map as the image feature 145 of the current frame image 110, where W and H are the width and height of the feature map, respectively, and C is the number of channels of the feature map (that is, the number of features).

According to another embodiment, a plurality of search areas are determined within the current frame image 110. A search area is a partial image region within the current frame image 110. Each of the plurality of search areas is input to the backbone network 140, and the backbone network 140 outputs a search area feature 145 for each of the plurality of search areas.

When the target templates stored in the target template set are images, the target templates of the target template set stored in the target memory 131 are input to a backbone network 150 in order to obtain one or more target candidate areas. The backbone network 150 is a network that extracts image features using a neural network; MobileNet, ResNet, Xception Network, or the like may be used, but the backbone network is not limited thereto. The backbone network 150 may be the same as or different from the backbone network 140. The backbone network 150 receives the target templates and outputs the image features corresponding to the target templates to a recall network 163.

According to another embodiment, the target templates stored in the target template set may be the features of the corresponding images. In this case, the image features corresponding to each target template need not be extracted through the backbone network 150, so the target templates are input directly to the recall network 163.

According to an embodiment, the processor 160 may include a recall network 163, a correlation calculator 166, and an anchor processor 169. The recall network 163, the correlation calculator 166, and the anchor processor 169 are conceptual divisions by function and need not be implemented as physically separate components. For example, they may all be different functions performed by the processor 160, distinguished here only for convenience of description.

The recall network 163 fuses the image features of the image regions corresponding to the respective target templates in the target template set and outputs a target fusion feature (integrated target feature). The recall network 163 is a network that fuses input image features using a neural network; a feature fusion convolutional neural network (FFCNN) or the like may be used, but the recall network is not limited thereto.

The correlation calculator 166 calculates a correlation between the image feature 145 of the current frame image 110 and the target fusion feature. Alternatively, the correlation calculator 166 calculates a correlation between the search area feature of each of the plurality of search areas in the current frame image 110 and the target fusion feature. Based on the calculated correlations, one or more target candidate areas (also called "candidate anchors") are determined among the plurality of search areas.
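The fuse-then-correlate flow can be illustrated with a deliberately simplified stand-in. In the source the fusion is performed by a trained recall network (for example an FFCNN); the sketch below instead assumes an element-wise mean of template feature vectors, a dot product as the correlation, and a hypothetical `k` for how many search areas become candidates.

```python
def fuse_templates(template_features):
    """Fuse template feature vectors into one target fusion feature.

    Element-wise mean is an assumption standing in for the learned
    recall network described in the text.
    """
    n = len(template_features)
    return [sum(values) / n for values in zip(*template_features)]

def correlation(feature, fused):
    """Dot-product correlation between a search area feature and the fused feature."""
    return sum(x * y for x, y in zip(feature, fused))

def top_k_candidates(search_features, fused, k=2):
    """Return indices of the k search areas with the highest correlation."""
    scored = [(correlation(f, fused), i) for i, f in enumerate(search_features)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

fused = fuse_templates([[1.0, 0.0], [0.0, 1.0]])
search_features = [[1.0, 1.0], [0.0, 0.2], [0.5, 0.0]]
candidate_indices = top_k_candidates(search_features, fused, k=2)
```

The search areas at indices 0 and 2 correlate most strongly with the fused feature and would be passed on as candidate anchors.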

The correlation produced by the correlation calculator 166 is input to the anchor processor 169. The anchor processor 169 may output a confidence and/or a regression result for each of the one or more target candidate areas (candidate anchors). In an embodiment, the regression result may include a regression position (x, y, w, h). Here, x and y may be the horizontal and vertical coordinates of a particular point of the corresponding target candidate area, for example any predetermined point such as its center point or the vertex of its upper-left corner, and w and h may be the width and height of the target candidate area.

According to the confidences and/or regression results of the target candidate areas, one target candidate area (the final anchor) 170 is selected from among the plurality of target candidate areas (candidate anchors).

Based on the confidence and/or regression result of the selected target candidate area (final anchor) 170, the position of the target object in the current frame image 110 and the corresponding detection confidence are obtained. The information about the detected target object is then output.
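Selecting the final anchor and turning its regression position into a box might be sketched as follows. The source allows selection by confidence and/or regression results without giving a rule, so picking the highest confidence is an assumption, as is the choice of (x, y) as the center point and the dictionary layout of a candidate.

```python
def select_final_anchor(candidates):
    """Pick the candidate anchor with the highest confidence.

    Each candidate is a dict with 'confidence' in [0, 1] and 'box' as
    (x, y, w, h); this layout is hypothetical.
    """
    return max(candidates, key=lambda c: c["confidence"])

def center_to_corners(box):
    """Convert (x, y, w, h), with (x, y) the centre, to (left, top, right, bottom)."""
    x, y, w, h = box
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)

candidates = [
    {"confidence": 0.4, "box": (10, 10, 4, 4)},
    {"confidence": 0.9, "box": (50, 40, 20, 10)},
]
final_anchor = select_final_anchor(candidates)
target_position = center_to_corners(final_anchor["box"])
detection_confidence = final_anchor["confidence"]
```

If (x, y) were instead the upper-left vertex, which the text also permits, the conversion would simply be (x, y, x + w, y + h).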

The target memory 131 is also updated according to the tracking/detection result for the current frame image 110. For example, the detected target candidate area (final anchor) 170 is added to the target memory as a new target template. The target memory 131 is updated in this way, and the updated target memory 131 is used to detect and/or track the target object in the next frame image.

According to an embodiment, detecting the target object from the frame image includes detecting the target object from the frame image based on the target template set and an interference template set. The interference template set includes one or more interference templates. Each of the one or more interference templates includes information about an interference object that interfered with the detection of the object in the frame images preceding the subject frame image of the video.

According to an embodiment, detecting the target object from the frame image includes obtaining one or more target candidate areas based on the target template set, determining one target area from the one or more target candidate areas based on the interference template set, and detecting the target object based on the target area.

Optionally, when the target object is detected from the current frame image, one or more target candidate areas corresponding to the current frame image may be obtained according to both the target template set and the interference template set. The specific process is as follows.

Image features are extracted, through the backbone network, from the target template set and the interference template set in the memory pool. The recall network obtains an integration template by fusing the extracted image features. When the target object is detected in the current frame image, several regions of the current frame image are searched, yielding several search area boxes. Each search area box is then matched against the integration template, and one or more target candidate areas with a high matching degree are obtained. Finally, one target area is selected from among the one or more target candidate areas, and the detection result of the target object for the current frame image is obtained based on the selected target area.

Optionally, in target tracking with the neural network model, an interference memory may be used in addition to the target memory. The specific process is as follows.

Step 1: a first frame image is selected from the video sequence.

Step 2: the user indicates the target in the first frame image. According to the indication, image features of the image region in which the target object is located in the first frame image are extracted. The initial target template is obtained from the extracted image features and added to the target template set. The interference template set is empty.

Step 3: the second frame image of the video sequence is selected. Based on the initial target template in the target template set, the target position of the target object in the second frame image is predicted, and a prediction result is output.

단계 4, 예측 결과를 기반으로, 제2 프레임 이미지에서 타겟 객체의 타겟 위치, 예측 신뢰도 등 추적 정보(tracking information)가 출력된다.Step 4, based on the prediction result, tracking information such as the target position of the target object and the prediction reliability in the second frame image is output.

단계 5, 제2 프레임 이미지의 예측 결과가 메모리 갱신 조건(memory update condition)을 만족하는지 여부가 판단된다. 만족하는 경우, 타겟 템플릿 세트는 갱신되고, 그렇지 않은 경우 갱신되지 않는다. 예를 들어, 만족하는 경우, 제2 프레임 이미지에 대응하는 타겟 템플릿이 타겟 템플릿 세트에 추가될 수 있다.Step 5, it is determined whether the prediction result of the second frame image satisfies a memory update condition. If satisfied, the target template set is updated, otherwise it is not updated. For example, if satisfied, a target template corresponding to the second frame image may be added to the target template set.

단계 6, 비디오 시퀀스의 제3 프레임 이미지가 선택된다. 이때, 타겟 템플릿 세트 및 간섭 템플릿 세트에 따라, 메모리 풀로부터 타겟 템플릿 세트 및 간섭 템플릿 세트가 로드(load)된다.Step 6, the third frame image of the video sequence is selected. At this time, the target template set and the interference template set are loaded from the memory pool according to the target template set and the interference template set.

단계 7, 타겟 템플릿 세트 및 간섭 템플릿 세트에 기반하여, 후속 프레임 이미지(subsequent frame image)에서 타겟 객체의 타겟 위치가 예측되고, 예측 결과가 출력된다. 그 다음 후속 프레임에서도 동일한 방법으로 진행된다.Step 7, based on the target template set and the interference template set, a target position of a target object in a subsequent frame image is predicted, and a prediction result is output. Then, the next frame proceeds in the same way.
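As an illustration only, steps 1 to 7 can be sketched in Python as follows. The names `MemoryPool`, `track`, `predict`, and `update_threshold` are hypothetical and do not appear in the description. The confidence threshold is an assumed form of the memory update condition, which the text does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class MemoryPool:
    # Target template set and interference template set (both hypothetical containers).
    target_templates: List[object] = field(default_factory=list)
    interference_templates: List[object] = field(default_factory=list)


def track(frames: List[object],
          initial_template: object,
          predict: Callable[[object, MemoryPool], Tuple[object, float, object]],
          update_threshold: float = 0.8):
    """Run the per-frame loop of steps 1 to 7.

    `predict(frame, pool)` is a hypothetical callable that returns
    (target_position, confidence, template_for_this_frame).
    """
    # Step 2: the initial template enters the target set; the interference set is empty.
    pool = MemoryPool(target_templates=[initial_template])
    results = []
    # Steps 3, 6, 7: every frame after the first is processed the same way.
    for frame in frames[1:]:
        position, confidence, template = predict(frame, pool)
        # Step 4: output tracking information.
        results.append((position, confidence))
        # Step 5: assumed update condition (confidence threshold).
        if confidence >= update_threshold:
            pool.target_templates.append(template)
    return results, pool
```

In this sketch only the target template set is updated; updating the interference template set would follow the same pattern.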

To describe the above method in more detail, reference is made to the example shown in FIG. 2.

FIG. 2 illustrates a neural network model implementing an object tracking method according to an embodiment.

In FIG. 2, the target object to be tracked is a specific athlete (who may be referred to as the "target player"). The current frame image 210 is the frame image of the video in which the target player is to be detected. In the embodiment of FIG. 2, when the user finds the target object (the target player) during playback of the video, the user indicates the target object in that frame image. Information of the part of that frame image that contains the target object is then used as the initial target template 220.

The memory pool 230 includes a target memory 231 and an interference memory 232. The target memory 231 is a storage space that stores a target template set. The interference memory 232 is a storage space that stores an interference template set.

타겟 템플릿 세트는, 현재 프레임 이미지(210) 이전의 프레임 이미지들에 대응하는 타겟 템플릿들을 저장한다. 타겟 템플릿은, 비디오의 프레임 이미지들에서 타겟 객체에 관한 정보이다. 예를 들어, 타겟 템플릿은, 타겟 객체를 포함하는 이미지 영역일 수 있으며, 또는 상기 이미지 영역의 특징 정보(feature information)일 수 있다. 타겟 템플릿 세트는, 초기 타겟 템플릿(220)를 더 저장할 수 있다.The target template set stores target templates corresponding to frame images before the current frame image 210 . A target template is information about a target object in frame images of a video. For example, the target template may be an image area including a target object or feature information of the image area. The target template set may further store the initial target template 220 .

간섭 템플릿 세트는, 현재 프레임 이미지(210) 이전의 프레임 이미지들에 대응하는 간섭 템플릿들을 저장한다. 간섭 템플릿은, 비디오의 프레임 이미지들에서 타겟 객체의 검출을 방해한 간섭 객체에 관한 정보이다. 예를 들어, 간섭 템플릿은, 간섭 객체를 포함하는 이미지 영역일 수 있으며, 또는 상기 이미지 영역의 특징 정보(feature information)일 수 있다.The interference template set stores interference templates corresponding to frame images before the current frame image 210 . The interference template is information about an interference object that has prevented detection of a target object in frame images of a video. For example, the interference template may be an image region including an interference object, or may be feature information of the image region.

In an embodiment, to detect the target object in the current frame image 210, all or part of the current frame image 210 (a search area) is input to the backbone network 240. The backbone network 240 may use a neural network to extract a search area feature, which is the image feature corresponding to the search area. For example, the backbone network 240 may output a feature map of size W x H x C as the search area feature 245 corresponding to the search area of the current frame image 210. Here, W and H denote the width and height of the feature map, respectively, and C denotes the number of channels of the feature map (that is, the number of features).

하나 이상의 타겟 후보 영역을 획득하기 위해, 메모리 풀(230)의 타겟 메모리(231)에 저장된 타겟 템플릿 세트 및 간섭 메모리(232)에 저장된 간섭 템플릿 세트는 백본 네트워크(250)에 입력된다. 백본 네트워크(250)는, 타겟 템플릿 세트의 각 타겟 템플릿에 대응하는 이미지 특징을 추출하고, 간섭 템플릿 세트의 각 간섭 템플릿에 대응하는 이미지 특징을 추출한다. 이 이미지 특징들은 리콜 네트워크(263)로 입력된다.To obtain one or more target candidate regions, the target template set stored in the target memory 231 of the memory pool 230 and the interference template set stored in the interference memory 232 are input to the backbone network 250 . The backbone network 250 extracts image features corresponding to each target template in the target template set, and extracts image features corresponding to each interference template in the interference template set. These image features are input to the recall network 263 .

According to an embodiment, the processor 260 may include a recall network 263, a correlation calculator 266, and an anchor processor 269. The recall network 263, the correlation calculator 266, and the anchor processor 269 are a conceptual division by function and need not be implemented as physically separate components. For example, all three may be different functions executed by the processor 260; they are distinguished only for convenience of description.

리콜 네트워크(263)는, 타겟 템플릿 세트에 포함된 타겟 템플릿들의 각각에 대응하는 이미지 영역의 이미지 특징(image feature)을 융합(integrate)하여 타겟 융합 특징(integrated target feature)을 출력한다. 또한, 리콜 네트워크(263)는, 간섭 템플릿 세트에 포함된 간섭 템플릿들의 각각에 대응하는 이미지 영역의 이미지 특징(image feature)을 융합(integrate)하여 간섭 융합 특징(integrated interference feature)을 출력한다. 리콜 네트워크(263)는, 신경망을 이용하여 입력된 이미지 특징들을 융합하는 네트워크로서, FFCNN(feature fusion convolutional neural network) 등이 사용될 수 있으나, 이에 제한되지 않는다.The recall network 263 integrates an image feature of an image region corresponding to each of the target templates included in the target template set, and outputs an integrated target feature. Also, the recall network 263 outputs an integrated interference feature by integrating an image feature of an image region corresponding to each of the interference templates included in the interference template set. The recall network 263 is a network that fuses image features input using a neural network, and a feature fusion convolutional neural network (FFCNN) may be used, but is not limited thereto.

일 실시예에 따르면, 타겟 템플릿 세트에 포함된 타겟 템플릿들의 각각에 대응하는 이미지 영역의 이미지 특징(image feature)을 융합(integrate)하여 타겟 융합 특징(integrated target feature)이 결정된다. 그리고, 상기 타겟 융합 특징에 기초하여, 상기 프레임 이미지 내에서 상기 객체를 포함하는 것으로 판단되는 하나 이상의 타겟 후보 영역(target candidate area)가 획득(obtain)된다. 상기 하나 이상의 타겟 후보 영역으로부터 하나의 타겟 영역(target area)가 결정된다.According to an embodiment, an integrated target feature is determined by integrating an image feature of an image region corresponding to each of the target templates included in the target template set. Then, based on the target fusion characteristic, one or more target candidate areas determined to include the object in the frame image are obtained. One target area is determined from the one or more target candidate areas.

According to an embodiment, the correlation calculator 266 calculates the correlation between the target feature kernel (for example, the integrated target feature) and each search area feature 245, where each search area feature corresponds to one of a plurality of search areas of the current frame image 210. Based on the correlation, one or more candidate anchors (target candidate areas) are determined among the search areas. For example, the K candidate anchors with the largest correlation values are selected and recorded as the Top-K candidate anchors (target candidate areas).

According to an embodiment, the correlation calculator 266 also calculates the correlation between the interference feature kernel (for example, the integrated interference feature) and each search area feature 245 corresponding to each of the search areas. Based on this correlation, the one candidate anchor (target candidate area) with the lowest correlation (or matching degree) among the search areas is selected and recorded as the Bottom-1 candidate anchor.

상관도 계산부(correlation calculater)(266)에 의하여 생성된 상관도는 앵커 처리기(anchor processor)(269)에 입력된다. 앵커 처리기(269)는 하나 이상의 타겟 후보 영역들(후보 앵커들)의 각각에 대한 신뢰도(confidence) 및/또는 회귀 결과(regression results)를 출력할 수 있다. 일 실시예에서, 회귀 결과는 회귀 위치(regression position) (x, y, w, h)를 포함할 수 있다. 여기서, x와 y는 대응하는 타겟 후보 영역의 특정 점(particular point)의 가로 좌표 및 세로 좌표를 나타낼 수 있다. 이에 기초하여, 현재 프레임 이미지(210)에서 타겟 객체의 위치 및 신뢰도가 결정될 수 있다.The correlation generated by the correlation calculator 266 is input to the anchor processor 269 . The anchor processor 269 may output confidence and/or regression results for each of one or more target candidate regions (candidate anchors). In one embodiment, the regression result may include a regression position (x, y, w, h). Here, x and y may represent abscissa and ordinate of a specific point of a corresponding target candidate area. Based on this, the position and reliability of the target object in the current frame image 210 may be determined.

Based on the confidences and/or regression results of the target candidate areas, one target candidate area (the final anchor) 270 is selected from among the plurality of target candidate areas (candidate anchors).

The target memory 231 and the interference memory 232 may also be updated according to the tracking result. For example, the detected target candidate area (final anchor) 270 may be added to the target memory 231 as a new target template. In addition, some or all of the target candidate areas (candidate anchors) other than the finally selected target candidate area (final anchor) 270 may be added to the interference memory 232. In an embodiment, the candidate anchor with the lowest matching degree to the final anchor 270 may be added to the interference memory 232.
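A minimal sketch of this memory update, under stated assumptions, is given below. The function name `update_memories` and its parameters are hypothetical. The sketch follows the variant in which the final anchor is added to the target memory and the single candidate with the lowest matching degree to the final anchor is added to the interference memory.

```python
from typing import List, Sequence


def update_memories(target_memory: List[object],
                    interference_memory: List[object],
                    candidates: Sequence[object],
                    final_index: int,
                    match_to_final: Sequence[float]) -> None:
    """Update both memories in place after one frame.

    candidates      : templates of the target candidate areas (candidate anchors)
    final_index     : index of the finally selected candidate (final anchor)
    match_to_final  : matching degree of each candidate with the final anchor
    """
    # The final anchor becomes a new target template.
    target_memory.append(candidates[final_index])

    # Among the remaining candidates, the one least similar to the final
    # anchor becomes a new interference template.
    others = [i for i in range(len(candidates)) if i != final_index]
    if others:
        worst = min(others, key=lambda i: match_to_final[i])
        interference_memory.append(candidates[worst])
```

When only one candidate exists, the interference memory is left unchanged in this sketch.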

도 3은 도 2에 따른 객체 추적 방법의 흐름도이다. 도 3을 참조하여, 도 2에 따른 객체 추적 방법이 설명된다.3 is a flowchart of an object tracking method according to FIG. 2 . Referring to FIG. 3 , an object tracking method according to FIG. 2 is described.

단계(310)에서, 타겟 메모리(231) 및 간섭 메모리(232)로부터, 각각, 타겟 템플릿 세트 및 간섭 템플릿 세트가 판독된다.In step 310, the target template set and the interference template set are read from the target memory 231 and the interference memory 232, respectively.

In step 320, the recall network 263 integrates the image features of the image regions corresponding to each of the target templates in the target template set and outputs an integrated target feature (also called a "target feature kernel") 330. The recall network 263 likewise integrates the image features of the image regions corresponding to each of the interference templates in the interference template set and outputs an integrated interference feature (also called an "interference feature kernel") 340.

In step 350, the correlation calculator 266 calculates the correlation between the integrated target feature (target feature kernel) and each search area feature 245 corresponding to each of the plurality of search areas of the current frame image 210. The correlation calculator 266 also calculates the correlation between the integrated interference feature (interference feature kernel) and each search area feature 245 corresponding to each of the search areas.

단계(360)에서, 단계(350)의 상관도 계산 결과에 따라, 검색 영역들 중 하나 이상의 타겟 후보 영역이 결정된다.In operation 360 , one or more target candidate regions among the search regions are determined according to the correlation calculation result of operation 350 .

단계(370)에서, 하나 이상의 타겟 후보 영역의 신뢰도(confidences) 및/또는 회귀 결과(regression results)에 따라, 하나 이상의 타겟 후보 영역(후보 앵커) 중 하나의 타겟 후보 영역(최종 앵커(final anchor))(270)이 선택된다. 즉, 현재 프레임 이미지(210)에서 타겟 객체를 포함하는 것으로 판단되는 하나의 타겟 영역(target area)이 결정된다. 이는, 현재 프레임 이미지(210)에서 타겟 객체의 위치를 결정하는 것과 동일한 동작으로 이해된다.In step 370, one target candidate region (final anchor) of one or more target candidate regions (candidate anchors) according to regression results and/or confidences of the one or more target candidate regions. ) (270) is selected. That is, one target area determined to include the target object in the current frame image 210 is determined. This is understood as the same operation as determining the position of the target object in the current frame image 210 .

단계(380)에서, 검출 결과에 따라, 타겟 메모리(231) 및 간섭 메모리(232)가 갱신된다.In step 380 , the target memory 231 and the interference memory 232 are updated according to the detection result.

도 4는 일 실시예에 따른 객체 추적 방법의 메모리 매칭을 설명하기 위한 도면이다.4 is a diagram for explaining memory matching of an object tracking method according to an embodiment.

도 4에 도시된 바와 같이, 메모리 풀(430)은 타겟 메모리(431)와 간섭 메모리(432)를 포함한다. 타겟 메모리(431)는 타겟 템플릿 세트(target template set)를 저장한다. 간섭 메모리(432)는 간섭 템플릿 세트(interference template set)를 저장한다. 메모리 풀(430)의 타겟 메모리(431)에 저장된 타겟 템플릿 세트 및 간섭 메모리(432)에 저장된 간섭 템플릿 세트는 백본 네트워크(440)에 입력된다.As shown in FIG. 4 , the memory pool 430 includes a target memory 431 and an interfering memory 432 . The target memory 431 stores a target template set. The interference memory 432 stores an interference template set. The target template set stored in the target memory 431 of the memory pool 430 and the interference template set stored in the interference memory 432 are input to the backbone network 440 .

백본 네트워크(440)는, 타겟 템플릿 세트의 각 타겟 템플릿에 대응하는 이미지 특징을 추출한다. 백본 네트워크(440)는, 간섭 템플릿 세트의 각 간섭 템플릿에 대응하는 이미지 특징을 추출한다.The backbone network 440 extracts image features corresponding to each target template in the target template set. The backbone network 440 extracts image features corresponding to each interference template in the interference template set.

The recall network 450 integrates (451) the image features corresponding to the target templates received from the backbone network 440 to generate an integrated target feature 453. The recall network 450 also integrates (455) the image features corresponding to the interference templates received from the backbone network 440 to generate an integrated interference feature 457.

The correlation calculator 460 calculates the correlation between the integrated target feature 453 and each search area feature 445 corresponding to each of a plurality of search areas of the current frame image. The correlation calculator 460 also calculates the correlation between the integrated interference feature 457 and each search area feature 445 corresponding to each of the plurality of search areas.

The correlation generated by the correlation calculator 460 is input to the anchor processor 470. The anchor processor 470 may output one or more anchors based on the matching degree, for example by using a neural network. Based on the correlation between the integrated target feature 453 and each search area feature 445 output by the correlation calculator 460, the anchor processor 470 may select K anchors (search areas). For example, the K anchors (search areas) with the highest matching degree, as derived from the correlation, may be selected. K is an integer greater than or equal to 1 and may be chosen according to the embodiment; for example, K may be 3. The selected K anchors are stored as the Top-K anchors 475, which serve as the target candidate areas.

According to an embodiment, the correlation calculator 480 calculates the correlation between the integrated interference feature 457 and each of the Top-K anchors 475 (or the feature information corresponding to each of the Top-K anchors 475). The correlation calculator 480 thereby obtains the matching degree between each of the target candidate areas 475 and the integrated interference feature 457. Among the target candidate areas 475, the one target candidate area (candidate anchor) with the lowest matching degree to the integrated interference feature 457 is selected as the Bottom-1 anchor. The selected Bottom-1 anchor is determined to be the target area 490 of the target object. The target area corresponds to the position in the current frame image at which the target object is determined to be located.
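The two-stage selection described for FIG. 4 (Top-K by target correlation, then Bottom-1 by interference correlation) can be sketched as follows. The function name `select_target_area` and the use of one scalar correlation value per search area are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def select_target_area(target_corr: Sequence[float],
                       interference_corr: Sequence[float],
                       k: int = 3) -> Tuple[List[int], int]:
    """Return (indices of the Top-K anchors, index of the Bottom-1 anchor).

    target_corr[i]       : correlation of search area i with the integrated target feature
    interference_corr[i] : correlation of search area i with the integrated interference feature
    """
    # Stage 1: keep the K search areas that best match the target feature.
    ranked = sorted(range(len(target_corr)),
                    key=lambda i: target_corr[i], reverse=True)
    top_k = ranked[:k]

    # Stage 2: among those, pick the one that least matches the interference feature.
    bottom_1 = min(top_k, key=lambda i: interference_corr[i])
    return top_k, bottom_1
```

With K set to 1, the second stage has no effect and the result reduces to picking the best target match.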

선택된 타겟 영역(490)의 신뢰도(confidence) 및/또는 회귀 결과(regression result)에 기초하여, 현재 프레임 이미지에서의 타겟 객체의 위치에 대한 신뢰도(detection confidence)가 계산된다.Based on the confidence of the selected target area 490 and/or the regression result, a detection confidence for the position of the target object in the current frame image is calculated.

It will be understood that the configurations shown in FIGS. 1, 2, and 4 are examples given for the purpose of description, and the embodiments are not limited to these configurations. For example, the target templates stored in the target memory and the interference templates stored in the interference memory of the memory pools of FIGS. 1, 2, and 4 are illustrative and may be updated according to subsequent frames of the video. The embodiment of FIG. 1 differs from the embodiment of FIG. 2 in that the embodiment of FIG. 1 uses the target memory (or target templates) to detect the target object, whereas the embodiment of FIG. 2 uses both the target memory (or target templates) and the interference memory (or interference templates) to detect the target object.

실시예들을 통해, 타겟 메모리 및/또는 간섭 메모리에 기초하여 타겟 객체가 검출될 수 있다. 이러한 실시예들을 사용하면, 타겟 객체가 이동할 때, 타겟 객체의 다양한 변화를 보다 포괄적으로 기록하고, 검출 정확도를 효과적으로 향상시킬 수 있다.In embodiments, a target object may be detected based on a target memory and/or an interfering memory. Using these embodiments, when the target object moves, it is possible to more comprehensively record various changes of the target object, and effectively improve the detection accuracy.

일 실시예에 따르면, 하나 이상의 타겟 후보 영역으로부터 하나의 타겟 영역을 결정하는 단계는, 상기 하나 이상의 타겟 후보 영역의 각각에 대해, 상기 타겟 후보 영역의 각각과 상기 타겟 템플릿 세트의 각 타겟 템플릿과의 매칭 정도(matching degree)를 계산하는 단계, 상기 하나 이상의 타겟 후보 영역의 각각에 대해, 상기 타겟 후보 영역의 각각과 상기 간섭 템플릿 세트의 각 간섭 템플릿과의 매칭 정도(matching degree)를 계산하는 단계, 상기 타겟 후보 영역의 각각과 상기 타겟 템플릿 세트의 각 타겟 템플릿과의 매칭 정도 및 상기 타겟 후보 영역의 각각과 상기 간섭 템플릿 세트의 각 간섭 템플릿과의 매칭 정도에 기초하여, 상기 타겟 후보 영역의 각각의 타겟 매칭 정도를 계산하는 단계, 및 상기 타겟 후보 영역의 각각의 타겟 매칭 정도에 기초하여, 상기 하나 이상의 타겟 후보 영역 중 하나의 타겟 영역을 결정하는 단계를 포함한다.According to an embodiment, the determining of one target region from the one or more target candidate regions comprises: for each of the one or more target candidate regions, each of the target candidate regions and each target template of the set of target templates calculating a matching degree, for each of the one or more target candidate areas, calculating a matching degree between each of the target candidate areas and each interference template in the set of interference templates; Based on the matching degree of each of the target candidate areas with each target template of the target template set and the matching degree of each of the target candidate areas with each interference template of the interference template set, each of the target candidate areas calculating a target matching degree, and determining one of the one or more target candidate areas based on the target matching degree of each of the target candidate areas.

In this embodiment, the matching degree between each target candidate area and each target template is obtained and may be recorded as a first matching degree. For any target candidate area, the matching degree between that target candidate area and each interference template in the interference template set is calculated, which yields a plurality of matching degrees between the target candidate area and the interference templates. The average (or median) of these matching degrees is taken as the matching degree between that target candidate area and the interference templates, and may be recorded as a second matching degree. A first target matching degree may then be calculated by subtracting the second matching degree from the corresponding first matching degree, one to one. The target candidate area with the largest first target matching degree is identified as the target area.
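The subtraction of matching degrees can be illustrated with the short sketch below. The name `pick_by_target_matching` is hypothetical. Averaging the per-template values on the target side is an assumption, since the text only states the average (or median) for the interference side. Treating an empty interference set as a second matching degree of 0 is also an assumption.

```python
from statistics import mean
from typing import List, Sequence, Tuple


def pick_by_target_matching(target_matching: Sequence[Sequence[float]],
                            interference_matching: Sequence[Sequence[float]]
                            ) -> Tuple[int, List[float]]:
    """Return (index of the chosen candidate, first target matching degree per candidate).

    target_matching[c]       : matching degrees of candidate c with each target template
    interference_matching[c] : matching degrees of candidate c with each interference template
    """
    scores = []
    for per_target, per_interference in zip(target_matching, interference_matching):
        first = mean(per_target)                                      # first matching degree
        second = mean(per_interference) if per_interference else 0.0  # second matching degree
        scores.append(first - second)                                 # first target matching degree
    best = max(range(len(scores)), key=lambda c: scores[c])
    return best, scores
```

Replacing `mean` with `statistics.median` gives the median variant mentioned in the text.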

According to an embodiment, a target candidate area that has a high matching degree with the target memory and a low matching degree with the interference memory may be identified as the target area. The selected object is therefore as close as possible to the target templates of the currently tracked target object while being least affected by the interference templates. This increases the accuracy of target object detection.

In an embodiment, the detection result of the target object may be obtained by using each target template of the target template set and each interference template of the interference template set. Specifically, one or more target candidate areas may be obtained by matching each target template of the target template set and each interference template of the interference template set, one to one, against each of a plurality of search area boxes, each of which corresponds to a portion of the frame image. One target area corresponding to the target object may then be selected from the one or more target candidate areas, and the detection result of the target object may be obtained based on that target area.

상기는 예시일 뿐, 본 실시예는 이에 한정되지 않는다.The above is only an example, and the present embodiment is not limited thereto.

도 5는 일 실시예에 따른 간소화된 객체 검출 방법(light-weight object detection method)를 설명하기 위한 도면이다.FIG. 5 is a diagram for describing a light-weight object detection method according to an exemplary embodiment.

As shown in FIG. 5, a memory pool 530 includes a target memory 531 and an interference memory 532. The target memory 531 stores a target template set. The interference memory 532 stores an interference template set. In this embodiment, each target template in the target template set is a feature vector of the corresponding image region; FIG. 5 labels the feature vector of the i-th target template in the target template set stored in the target memory 531. Likewise, each interference template in the interference template set is a feature vector of the corresponding image region; FIG. 5 labels the feature vector of the i-th interference template in the interference template set stored in the interference memory.

도 5의 실시예에서 제1 프레임 이미지에 대응하는 타겟 템플릿은 타겟 메모리(531)에 저장되지 않고, 별도로 입력될 수 있다. 제1 프레임 이미지로부터 추출된 템플릿 이미지(520)를 백본 네트워크(550) 및 조정 레이어(adjustment layer)(553)를 통하여 처리함으로써, 초기 타겟 템플릿(initial target template)(556)이 얻어진다.In the embodiment of FIG. 5 , the target template corresponding to the first frame image is not stored in the target memory 531 and may be separately input. By processing the template image 520 extracted from the first frame image through the backbone network 550 and the adjustment layer 553 , an initial target template 556 is obtained.

일 실시예에 따르면, 제1 프레임 이미지로부터 추출된 템플릿 이미지(520)는, 비디오의 프레임 이미지들 중 타겟 객체를 포함하는 것으로 사용자에 의하여 결정된 검색 영역(이미지 영역)의 이미지일 수 있다. 또 다른 실시예에 따르면, 제1 프레임 이미지는 타겟 객체를 포함하는 이미지로서, 상기 비디오와 독립(independent)된 별도의 이미지(separate image)일 수 있다. 타겟 템플릿은, 앞에서 설명한 바와 같이, 대응하는 이미지의 이미지 특성(image feature)을 나타낸다(represent).According to an embodiment, the template image 520 extracted from the first frame image may be an image of a search area (image area) determined by a user to include a target object among frame images of a video. According to another embodiment, the first frame image is an image including a target object, and may be a separate image independent of the video. The target template, as described above, represents the image features of the corresponding image.

현재 프레임 이미지(510)로부터 복수 개의 검색 영역이 선택된다. 복수 개의 검색 영역은 현재 프레임 이미지(510)를 복수 개로 분할한 영역들일 수 있다. 복수 개의 검색 영역들은 서로 일부 영역이 중복될 수 있다. 이전 프레임에서 타겟 객체의 타겟 영역이 결정된 경우, 이전 프레임의 타겟 영역의 위치에 기반하여, 현재 프레임에서 복수 개의 검색 영역들을 결정될 수 있다. 복수 개의 검색 영역들의 각각을 백본 네트워크(540) 및 조정 레이어(adjustment layer)(543)를 통하여 처리함으로써, 검색 영역에 대응하는 검색 영역 특징(546)이 얻어진다.A plurality of search areas are selected from the current frame image 510 . The plurality of search areas may be areas obtained by dividing the current frame image 510 into a plurality of pieces. A plurality of search areas may partially overlap each other. When the target area of the target object is determined in the previous frame, a plurality of search areas may be determined in the current frame based on the location of the target area of the previous frame. By processing each of the plurality of search areas through the backbone network 540 and an adjustment layer 543 , a search area feature 546 corresponding to the search area is obtained.

The backbone networks 540 and 550 are networks that extract image features using a neural network; MobileNet, ResNet, Xception Network, or the like may be used, but the embodiments are not limited thereto. In FIG. 5, the feature information output by the backbone networks is adjusted by the adjustment layers 553 and 543, although this adjustment may be omitted depending on the embodiment. The adjustment layers 553 and 543 may extract only part of the feature information output by the backbone networks or may adjust the values of the feature information.

The correlation calculator 560 calculates the correlation between the search area feature 546 corresponding to each search area and the initial target template 556. For example, the correlation calculator 560 may calculate the correlation using a depthwise correlation method.
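
As a non-limiting illustration, a depthwise correlation may be sketched as follows, assuming that the search area feature and the target template are feature maps with the same number of channels and that each channel of the template is slid over the matching channel of the search area feature without padding. The function name `depthwise_correlation` and the toy values are hypothetical and are not taken from the disclosure.

```python
from typing import List

# A feature map is indexed as [channel][row][column].
Feature = List[List[List[float]]]


def depthwise_correlation(search: Feature, template: Feature) -> Feature:
    """Cross-correlate each channel of `search` with the same channel of `template`.

    Sliding window, stride 1, no padding. The output has shape
    [C][Hs - Ht + 1][Ws - Wt + 1].
    """
    if len(search) != len(template):
        raise ValueError("search and template must have the same channel count")
    out: Feature = []
    for s_ch, t_ch in zip(search, template):
        hs, ws = len(s_ch), len(s_ch[0])
        ht, wt = len(t_ch), len(t_ch[0])
        channel_out = []
        for y in range(hs - ht + 1):
            row = []
            for x in range(ws - wt + 1):
                acc = 0.0
                for dy in range(ht):
                    for dx in range(wt):
                        acc += s_ch[y + dy][x + dx] * t_ch[dy][dx]
                row.append(acc)
            channel_out.append(row)
        out.append(channel_out)
    return out


# Toy example: one channel, a 3x3 search area feature and a 2x2 template.
search = [[[1.0, 2.0, 3.0],
           [4.0, 5.0, 6.0],
           [7.0, 8.0, 9.0]]]
template = [[[1.0, 0.0],
             [0.0, 1.0]]]
response = depthwise_correlation(search, template)
```

Because the channels are correlated independently, the response keeps one map per channel rather than collapsing all channels into a single map.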

The correlation generated by the correlation calculator 560 is input to the anchor processor 570. Based on the matching degree derived from the correlation, the anchor processor 570 selects the K anchors (search areas) with the highest matching degrees as the Top-K anchors 575. Each selected anchor (search area) is identified as a target candidate area.
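
As a non-limiting illustration, the Top-K selection may be sketched as follows, assuming that each anchor has already been reduced to a single matching degree. The function name `select_top_k` and the sample values are hypothetical.

```python
import heapq
from typing import List, Sequence


def select_top_k(matching_degrees: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k anchors with the highest matching degree, best first."""
    return heapq.nlargest(k, range(len(matching_degrees)),
                          key=lambda i: matching_degrees[i])


# One matching degree per anchor (search area).
matching_degrees = [0.12, 0.87, 0.45, 0.91, 0.30]
top_k = select_top_k(matching_degrees, k=3)  # indices of the target candidate areas
```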

For each of the K target candidate areas 575, a score is calculated according to Equation 1, for example by a computing function 580.

$$\mathrm{score}(z_{f_k}) = \sum_{i=1}^{m1} w_{t_i}\,\mathrm{dist}(z_{f_k}, z_{t_i}) - \sum_{i=1}^{m2} w_{d_i}\,\mathrm{dist}(z_{f_k}, z_{d_i}) \quad \text{(Equation 1)}$$

Here, $z_{t_i}$ denotes the feature vector of the i-th target template in the target template set stored in the target memory. $z_{d_i}$ denotes the feature vector of the i-th interference template in the interference template set stored in the interference memory. $z_{f_k}$ denotes the feature vector of the k-th target candidate area (anchor) among the K target candidate areas (Top-K anchors). $\mathrm{dist}(x, y)$ denotes the matching value obtained by calculating the correlation between $x$ and $y$. $w_{t_i}$ denotes the weight of the i-th target template in the target template set stored in the target memory. $w_{d_i}$ denotes the weight of the i-th interference template in the interference template set stored in the interference memory. m1 is the number of target templates in the target template set, and m2 is the number of interference templates in the interference template set.

Then, the target candidate area with the largest score calculated by Equation 1 is selected as the target area (final anchor) 590. This can be expressed by Equation 2.

$$z_f^{*} = \arg\max_{z_{f_k}} \mathrm{score}(z_{f_k}) \quad \text{(Equation 2)}$$

Through Equations 1 and 2, the target candidate area that is most similar to the target templates in the target template set and least similar to the interference templates in the interference template set is selected as the target area (final anchor).
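
As a non-limiting illustration, the selection described by Equations 1 and 2 may be sketched as follows, assuming that the score is a weighted sum of matching values against the target templates minus a weighted sum of matching values against the interference templates. Cosine similarity stands in for the correlation-based matching value, which is an assumption, and the names `dist`, `score`, and `select_target_area` are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dist(x: Vector, y: Vector) -> float:
    """Matching value between two feature vectors (cosine similarity, an assumption)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0.0 else 0.0


def score(candidate: Vector,
          targets: Sequence[Vector], target_weights: Sequence[float],
          interferers: Sequence[Vector], interferer_weights: Sequence[float]) -> float:
    """Weighted match to the target templates minus weighted match to the interference templates."""
    target_term = sum(w * dist(candidate, t) for w, t in zip(target_weights, targets))
    interference_term = sum(w * dist(candidate, d)
                            for w, d in zip(interferer_weights, interferers))
    return target_term - interference_term


def select_target_area(candidates: Sequence[Vector],
                       targets: Sequence[Vector], target_weights: Sequence[float],
                       interferers: Sequence[Vector],
                       interferer_weights: Sequence[float]) -> int:
    """Return the index of the candidate with the largest score."""
    scores = [score(c, targets, target_weights, interferers, interferer_weights)
              for c in candidates]
    return max(range(len(scores)), key=lambda k: scores[k])


targets = [[1.0, 0.0], [0.9, 0.1]]
target_weights = [0.6, 0.4]          # sums to 1, initial template weighted highest
interferers = [[0.0, 1.0]]
interferer_weights = [1.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

scores = [score(c, targets, target_weights, interferers, interferer_weights)
          for c in candidates]
best = select_target_area(candidates, targets, target_weights,
                          interferers, interferer_weights)
```

In this toy input, the second candidate coincides with the interference template and therefore receives the lowest score, while the first candidate coincides with the initial target template and is selected.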

The number of target templates (m1) and the number of interference templates (m2) may differ depending on the current frame image in which the target object is to be identified.

According to an embodiment, the weights of all target templates in the target template set may sum to 1, and the weight of the initial target template may be set to be the highest. However, the method of setting the weights of the target templates is not limited thereto, and the weights may be set in various ways.

According to an embodiment, the weights of all interference templates in the interference template set may sum to 1, and the weight of each interference template may be set according to the distance between the image area in which that interference template is located and the image area in which the target object is located. However, the method of setting the weights of the interference templates is not limited thereto, and the weights may be set in various ways.

According to an embodiment, after the image feature corresponding to the first frame image (the initial target template) is extracted, the initial target template may be stored in the target memory 531. Thereafter, the initial target template may be loaded from the target memory 531 and used.

FIG. 6 is a diagram for describing a lightweight object detection method according to another embodiment.

As shown in FIG. 6, the memory pool 630 includes only a target memory 631 and does not include an interference memory. The target memory 631 stores a target template set. In this embodiment, each target template in the target template set is the feature vector of the corresponding image area. That is, in FIG. 6, $z_{t_i}$ denotes the feature vector of the i-th target template in the target template set stored in the target memory 631.

A plurality of search areas is selected from the current frame image 610. Each search area is processed through the backbone network 640 and the adjustment layer 643 to obtain a search area feature 646 corresponding to that search area.

The initial image 620 including the target object is processed through the backbone network 650 and the adjustment layer 653 to obtain an image feature 656 corresponding to the initial image 620. This image feature 656 is used as the initial target template.

The correlation calculator 660 calculates the correlation between the search area feature 646 corresponding to each search area and the initial target template 656. The correlation generated by the correlation calculator 660 is input to the anchor processor 670. Based on the correlation, the anchor processor 670 selects the K anchors (search areas) with the highest correlation as the Top-K anchors 675. Each selected anchor (search area) is identified as a target candidate area.

For each of the K target candidate areas 675, a score is calculated according to Equation 3, for example by a computing function 680.

$$\mathrm{score}(z_{f_k}) = \sum_{i=1}^{m1} w_{t_i}\,\mathrm{dist}(z_{f_k}, z_{t_i}) \quad \text{(Equation 3)}$$

Here, $\mathrm{dist}(x, y)$ denotes the matching value obtained by calculating the correlation between $x$ and $y$, where $z_{f_k}$ is the feature vector of the k-th target candidate area and $z_{t_i}$ is the feature vector of the i-th target template. $w_{t_i}$ denotes the weight of the i-th target template in the target template set stored in the target memory. m1 is the number of target templates in the target template set.

Then, the target candidate area with the largest score calculated by Equation 3 (the Top-1 anchor) is selected as the target area (final anchor) 690. This can be expressed by Equation 4.

$$z_f^{*} = \arg\max_{z_{f_k}} \mathrm{score}(z_{f_k}) \quad \text{(Equation 4)}$$

According to an embodiment, unlike the embodiments described above, the initial target template may be replaced with an integrated template obtained from all or some of the target templates in the target template set.

A method according to an embodiment may determine an integrated target feature by integrating the image features of the image areas corresponding to the respective target templates in the target template set, and may obtain, based on the integrated target feature, one or more target candidate areas determined to include the object in the frame image. One target area is then determined from the one or more target candidate areas.

Obtaining the one or more target candidate areas may include determining a plurality of search areas in the frame image, extracting the image feature of each of the search areas to obtain a search area feature, calculating the correlation between the search area feature of each search area and the integrated target feature, and determining the one or more target candidate areas among the search areas based on the correlation.
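
As a non-limiting illustration, the steps above may be sketched as follows, assuming that the target templates are feature vectors of equal length, that integration is an element-wise mean, and that the correlation is an inner product. The disclosure does not fix any of these choices, and the names `fuse_templates`, `correlation`, and `target_candidate_areas` are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def fuse_templates(templates: Sequence[Vector]) -> List[float]:
    """Integrate target templates into one feature by element-wise averaging (an assumption)."""
    if not templates:
        raise ValueError("at least one target template is required")
    return [sum(column) / len(templates) for column in zip(*templates)]


def correlation(a: Vector, b: Vector) -> float:
    """Inner product used as a stand-in for the correlation (an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def target_candidate_areas(search_area_features: Sequence[Vector],
                           integrated_target_feature: Vector,
                           k: int) -> List[int]:
    """Return the indices of the k search areas most correlated with the integrated feature."""
    order = sorted(range(len(search_area_features)),
                   key=lambda i: correlation(search_area_features[i],
                                             integrated_target_feature),
                   reverse=True)
    return order[:k]


target_templates = [[1.0, 0.0], [0.8, 0.2]]
integrated = fuse_templates(target_templates)
search_area_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
candidate_indices = target_candidate_areas(search_area_features, integrated, k=2)
```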

In an embodiment, the following method may be used to determine the target area from the one or more target candidate areas. For each target candidate area, the matching degree with each target template in the target template set is calculated, yielding a plurality of matching degrees. The average of these matching degrees may be taken as the matching degree between that target candidate area and the target template set. Finally, the target candidate area with the highest matching degree is taken as the final target area.

According to an embodiment, the method may further include updating the target template set when the final target area satisfies a predetermined condition.

Updating the target template set may include calculating the similarity between the integrated target feature of the target template set and the final target area, and, when the similarity is less than a threshold, adding the final target area to the target template set as a target template.

Alternatively, updating the target template set may include calculating the similarity between the final target area and each of the target templates in the target template set, and, when all of the similarities are less than a threshold, adding the final target area to the target template set as a target template.

Optionally, the similarity between the final target area and each target template in the target template set may be calculated. When all of the calculated similarities are less than or equal to a set threshold (for example, 0.9), a target template corresponding to the final target area may be added to the target template set. Alternatively, when an integrated similarity derived from the calculated similarities (for example, their average) is less than or equal to a set threshold (for example, 0.9), a target template corresponding to the final target area may be added to the target template set.
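
As a non-limiting illustration, the two update conditions may be sketched as follows, assuming that templates are feature vectors and that cosine similarity is used as the similarity measure, which the disclosure does not specify. The names `should_add` and `update_template_set` are hypothetical, and treating an empty template set as always accepting a new template is likewise an assumption.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(x: Vector, y: Vector) -> float:
    """Similarity between two feature vectors (an assumed measure)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm > 0.0 else 0.0


def should_add(similarities: Sequence[float], threshold: float = 0.9,
               mode: str = "all") -> bool:
    """Decide whether a new template is dissimilar enough to be added.

    mode="all":  every individual similarity must be <= threshold.
    mode="mean": the average similarity must be <= threshold.
    """
    if not similarities:
        return True  # assumption: an empty set accepts any template
    if mode == "all":
        return all(s <= threshold for s in similarities)
    if mode == "mean":
        return sum(similarities) / len(similarities) <= threshold
    raise ValueError("mode must be 'all' or 'mean'")


def update_template_set(template_set: List[Vector], final_target: Vector,
                        threshold: float = 0.9, mode: str = "all") -> bool:
    """Append `final_target` to `template_set` if the condition holds; report whether it was added."""
    similarities = [cosine_similarity(final_target, t) for t in template_set]
    if should_add(similarities, threshold, mode):
        template_set.append(final_target)
        return True
    return False


template_set = [[1.0, 0.0]]
added_new_view = update_template_set(template_set, [1.0, 1.0])     # dissimilar, added
added_near_copy = update_template_set(template_set, [1.0, 0.05])   # near-duplicate, skipped
```

The two modes can disagree: a candidate that closely matches one stored template but differs from the others is rejected under `"all"` yet may be accepted under `"mean"`.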

Accordingly, when the final target area is similar to the target templates already in the target template set, it is not added to the set, because a final target area similar to the existing target templates does not significantly improve the performance of the set. In addition, because updating allows the target template set to include the latest features of the target object, the integrity of the target template set is improved.

According to an embodiment, the method may further include updating the interference template set.

In an embodiment, some or all of the target candidate areas other than the final target area may be added to the interference template set as interference templates. For example, an image area (or the image feature of that image area) corresponding to an anchor other than the Bottom-1 anchor (the final anchor) among the Top-K anchors may be selected as an interference template.

According to an alternative embodiment, for some or all of the target candidate areas other than the final target area, the similarity with the integrated interference feature of the interference template set is calculated. A target candidate area whose calculated similarity is less than a threshold may be added to the interference template set as an interference template. Accordingly, when a candidate interference area selected for addition is similar to the interference templates already in the interference template set, it is not added to the set, because a candidate interference area similar to the existing interference templates does not significantly improve the performance of the set.

According to an embodiment, the target area in which the target object is located in the frame image and candidate interference areas (image areas) in which possible interference objects for the target object are located are obtained. For each candidate interference area, a first similarity between the candidate interference area and the target area and a second similarity between the candidate interference area and each interference template in the interference template set are determined. Whether to update the interference template set is determined based on the first similarity and the second similarity of each candidate interference area.

According to an embodiment, for the image area (candidate interference area) in which any possible interference object is located, a first similarity between that image area and the target area may be determined (the first similarity may represent the distance between the image area and the target area), and a second similarity between that image area and each interference template in the interference template set may be determined (the second similarity may represent the similarity between the image area and each interference template). Then, based on the first similarity and the second similarity, the candidate interference areas to be included in the interference template set may be determined. For example, a candidate interference area with a large first similarity and a small second similarity may be selected for inclusion in the interference template set.

Hereinafter, updating the target memory and/or the interference memory is described with reference to the drawings.

FIG. 7 is a diagram for describing a memory update according to an embodiment.

As shown in FIG. 7, the finally selected Bottom-1 anchor 790 is information on the image area in which the target object is determined to be located in the current frame image. That is, the Bottom-1 anchor 790 is the target area (final anchor). The Bottom-1 anchor 790 may be the image area itself or a feature map representing the image feature of that image area, but is not limited thereto.

The detection result of the target object (for example, the Bottom-1 anchor 790) is compared with the target templates of the target template set stored in the target memory 731, and it is determined whether the similarity between the detection result and the target template set is less than a threshold. In one example, the similarity between the target area 790 and each of the target templates in the target template set is calculated. In another example, the similarity between the integrated target feature of the target template set and the target area is calculated. In other words, it is determined how similar the detected target area 790 is to the target templates already stored in the target memory 731. When the similarity is less than the threshold (for example, 90%), the target template set stored in the target memory 731 is updated based on the detected target area 790. For example, the detected target area 790 is added to the target template set as a new target template. When the similarity is greater than or equal to the threshold (for example, 90%), the target memory 731 is not updated.

Among the Top-K candidate anchors 765, excluding the Bottom-1 anchor 790 selected as the target area, the Bottom-K2 candidate anchors 795 are selected as candidate interference anchors. To find the Bottom-1 anchor 790 among the Top-K candidate anchors 765, a matching degree is calculated for each of the Top-K candidate anchors 765 using the target template set and the interference template set. The anchor with the highest matching degree among the Top-K candidate anchors 765 is selected as the Bottom-1 anchor 790. The K2 anchors with the lowest matching degrees among the Top-K candidate anchors 765 are determined to be the Bottom-K2 candidate anchors 795.

The distance between each of the candidate interference anchors 795 and the position of the target object (target area) detected in the current frame image is calculated, and it is determined whether that distance is greater than a threshold. When the distance is greater than the threshold, the candidate interference anchor may be included in the interference objects 740. When the distance is less than the threshold, the candidate interference anchor is regarded as not being an interference object, and the information on that candidate interference anchor is discarded.
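
As a non-limiting illustration, the selection of the Bottom-K2 candidate anchors and the distance check may be sketched as follows, assuming that each Top-K anchor has a matching degree and a center coordinate and that the distance is the Euclidean distance between centers. The function names, the coordinates, and the threshold of 30 are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def bottom_k2_candidates(matching_degrees: Sequence[float], target_index: int,
                         k2: int) -> List[int]:
    """Return the k2 anchors with the lowest matching degree, excluding the target anchor."""
    others = [i for i in range(len(matching_degrees)) if i != target_index]
    return heapq.nsmallest(k2, others, key=lambda i: matching_degrees[i])


def filter_by_distance(centers: Sequence[Point], target_center: Point,
                       candidate_indices: Sequence[int],
                       distance_threshold: float) -> List[int]:
    """Keep the candidate anchors whose distance from the target exceeds the threshold."""
    kept = []
    for i in candidate_indices:
        if math.dist(centers[i], target_center) > distance_threshold:
            kept.append(i)  # treated as an interference object
        # otherwise the candidate is discarded
    return kept


matching_degrees = [0.9, 0.2, 0.5, 0.1]              # one per Top-K anchor
centers = [(50.0, 50.0), (52.0, 51.0), (120.0, 80.0), (200.0, 40.0)]
target_index = max(range(len(matching_degrees)), key=lambda i: matching_degrees[i])

candidates = bottom_k2_candidates(matching_degrees, target_index, k2=2)
interference_objects = filter_by_distance(centers, centers[target_index],
                                          candidates, distance_threshold=30.0)
```

In this toy input, anchor 1 has a low matching degree but lies almost on top of the target, so it is discarded, whereas anchor 3 is far from the target and is kept as an interference object.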

The selected interference object 740 is compared with the interference templates of the interference template set stored in the interference memory 732, and it is determined whether the similarity between the interference object 740 and the interference template set is less than a threshold. In one example, the similarity between the interference object 740 and each of the interference templates in the interference template set is calculated. In another example, the similarity between the integrated interference feature of the interference template set and the interference object 740 is calculated. In other words, it is determined how similar the detected interference object 740 is to the interference templates already stored in the interference memory 732. When the similarity is less than the threshold (for example, 90%), the interference template set stored in the interference memory 732 is updated based on the detected interference object 740. For example, the detected interference object 740 is added to the interference template set as a new interference template. When the similarity is greater than or equal to the threshold (for example, 90%), the interference memory 732 is not updated.

According to an embodiment, the target template set and the interference template set may be maintained and updated according to the detection result of the target object. That is, by keeping the target memory and the interference memory up to date, richer and more diverse information is provided for the next detection of the target object, and detection accuracy can be improved.

Table 1 shows the success rate and accuracy of object detection/tracking for three cases: neither the target memory nor the interference memory is used, that is, neither the target template set nor the interference template set is used (case 1); only the target memory is used, that is, only the target template set is used (case 2); and both the target memory and the interference memory are used, that is, both the target template set and the interference template set are used (case 3).

As shown in Table 1, both case 2 (only the target memory is used) and case 3 (both the target memory and the interference memory are used) improve the success rate and accuracy compared with case 1 (neither memory is used). In addition, case 3 improves the accuracy compared with case 2.

FIG. 8 is a block diagram of an object detection apparatus according to an embodiment.

The object detection apparatus 800 may be implemented as an electronic device including a memory and a processor. The object detection apparatus 800 may include a target object processing module 850 that detects an object from a frame image of a video comprising a plurality of frame images.

The object detection apparatus 800 may detect the object from the frame image based on a target template set and output information regarding the detected object.

The object detection apparatus 800 may determine an integrated target feature by integrating the image features of the image areas corresponding to the respective target templates in the target template set, and may obtain, based on the integrated target feature, one or more target candidate areas determined to include the object in the frame image. The object detection apparatus 800 may determine one target area from the one or more target candidate areas.

The object detection apparatus 800 may determine a plurality of search areas in the frame image and extract the image feature of each of the search areas to obtain a search area feature. The object detection apparatus 800 may calculate the correlation between the search area feature of each search area and the integrated target feature, and may determine the one or more target candidate areas among the search areas based on the correlation.

The object detection apparatus 800 may calculate the similarity between the integrated target feature of the target template set and the target area. When the similarity is less than a threshold, the object detection apparatus 800 may update the target template set by adding the target area to the target template set as a target template.

The object detection apparatus 800 may calculate the similarity between the detected target area and each of the target templates in the target template set. When all of the similarities are less than a threshold, the object detection apparatus 800 may update the target template set by adding the target area to the target template set as a target template.

The object detection apparatus 800 may obtain, based on the target template set and the interference template set, one or more target candidate areas determined to include the object in the frame image. The object detection apparatus 800 may determine one target area from the one or more target candidate areas.

For each of the one or more target candidate areas, the object detection apparatus 800 may calculate a matching degree between the target candidate area and each target template of the target template set. For each of the one or more target candidate areas, the object detection apparatus 800 may also calculate a matching degree between the target candidate area and each interference template of the interference template set. Based on these two groups of matching degrees, the object detection apparatus 800 may calculate a target matching degree for each target candidate area. The object detection apparatus 800 may then determine one target area among the one or more target candidate areas based on the target matching degree of each target candidate area. Here, when calculating the target matching degree of each target candidate area, the object detection apparatus 800 may use the average or median of the matching degrees between the target candidate area and each target template of the target template set, and/or the average or median of the matching degrees between the target candidate area and each interference template of the interference template set.
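The selection described above can be illustrated with a minimal Python sketch. The description states only that the target matching degree is based on the average or median of the two groups of matching degrees; the subtraction used here to combine them, the rule of selecting the highest score, and the names `target_matching_degree` and `select_target_area` are assumptions made for illustration, not the disclosed implementation.

```python
from statistics import mean, median


def target_matching_degree(target_scores, interference_scores, reducer=mean):
    """Combine per-template matching degrees into one score for a candidate area.

    Assumption: the score is the aggregated match against the target templates
    minus the aggregated match against the interference templates. The source
    only says the score is "based on" these aggregated values.
    """
    target_part = reducer(target_scores)
    # An empty interference template set contributes nothing to the score.
    interference_part = reducer(interference_scores) if interference_scores else 0.0
    return target_part - interference_part


def select_target_area(candidates, reducer=mean):
    """Return (index of best candidate, all scores).

    `candidates` is a list of (target_scores, interference_scores) pairs,
    one pair per target candidate area. Picking the maximum is an assumption.
    """
    scores = [target_matching_degree(t, i, reducer) for t, i in candidates]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


# Hypothetical matching degrees for two candidate areas.
candidates = [
    ([0.9, 0.8, 0.7], [0.2, 0.4]),   # resembles the target, not the distractors
    ([0.85, 0.9, 0.8], [0.9, 0.8]),  # resembles the target and a distractor
]
best_index, scores = select_target_area(candidates)
median_index, _ = select_target_area(candidates, reducer=median)
```

Under these assumed values, the second candidate matches the target templates slightly better but also matches the interference templates strongly, so the first candidate is selected. Passing `reducer=median` switches the aggregation from the average to the median.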

The object detection apparatus 800 may determine a target fusion feature (integrated target feature) based on the target template set, and may determine an interference fusion feature (integrated interference feature) based on the interference template set. The object detection apparatus 800 may obtain the one or more target candidate areas from the frame image based on the target fusion feature and the interference fusion feature. For example, the object detection apparatus 800 may calculate a similarity between the interference fusion feature of the interference template set and each of some or all of the other target candidate areas, and may add to the interference template set, as an interference template, any of those target candidate areas whose similarity is less than a threshold.
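The interference template update in the example above can be sketched as follows. Several choices are assumptions not stated in this passage: templates are represented as plain feature vectors, fusion is done by element-wise averaging, and similarity is cosine similarity. The names `fuse`, `cosine_similarity`, and `update_interference_set` are hypothetical.

```python
import math


def fuse(features):
    """Fuse feature vectors by element-wise averaging (an assumed fusion rule).

    Expects a non-empty list of equal-length vectors.
    """
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def cosine_similarity(a, b):
    """Cosine similarity between two vectors (an assumed similarity measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def update_interference_set(interference_set, other_candidates, threshold):
    """Add candidates whose similarity to the fused interference feature is
    below `threshold`, so that only sufficiently new distractors are stored.

    The fused feature is computed once, before any template is added.
    Returns the list of candidates that were added.
    """
    fused = fuse(interference_set)
    added = [c for c in other_candidates if cosine_similarity(fused, c) < threshold]
    interference_set.extend(added)
    return added


# Hypothetical two-dimensional features.
interference_set = [[1.0, 0.0], [0.8, 0.2]]
other_candidates = [[1.0, 0.1], [0.0, 1.0]]
added = update_interference_set(interference_set, other_candidates, threshold=0.9)
```

With these values, the first candidate is nearly identical to the fused interference feature and is skipped, while the second is dissimilar and is added. Adding only low-similarity candidates keeps the set from filling up with near-duplicate templates.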

FIG. 9 is a block diagram of an electronic device for object detection according to an embodiment.

The electronic device 900 according to an embodiment may include a processor 910, a network interface 940, and a storage device 950. The electronic device 900 may further include a user interface 930 and at least one communication bus 920. The communication bus 920 provides connection and communication among these components. The user interface 930 may include a display and a keyboard and, optionally, may further include a standard wired/wireless interface. The network interface 940 may include a standard wired interface and/or a wireless interface (e.g., a Wi-Fi interface). The storage device 950 may include high-speed RAM and may also include non-volatile memory, such as at least one disk memory or flash memory. The storage device 950 may also be a removable storage device that can be attached to and detached from the electronic device 900. The storage device 950 may store an operating system, a network communication module, a user interface module, a device control application program, and the like. The network interface 940 may provide a network communication function. The user interface 930 provides an input interface for a user. The processor 910 may be used to invoke the device control application program stored in the storage device 950.

In some implementations, the electronic device 900 may execute the methods described with reference to FIGS. 1 to 7 through its built-in function modules. For details, refer to the implementations described above; they are not repeated here.

Depending on the embodiment, at least one of the plurality of modules may be implemented through an AI model. AI-related functions may be performed by a non-volatile memory, a volatile memory, and a processor. The processor may include one or more processors. The one or more processors may be general-purpose processors such as a central processing unit (CPU) or an application processor (AP), graphics-only processors such as a graphics processing unit (GPU) or a visual processing unit (VPU), and/or AI-dedicated processors such as a neural processing unit (NPU).

The one or more processors control the processing of input data according to predefined operating rules or an artificial intelligence (AI) model stored in the non-volatile memory and the volatile memory. The predefined operating rules or the AI model are provided through training or learning. Here, being provided through learning means that a learning algorithm is applied to a plurality of training data to obtain predefined operating rules or an AI model having desired characteristics. Learning may be performed on the device itself on which the AI according to the embodiment runs, and/or may be implemented by a separate server or system.

An AI model may include a plurality of neural network layers. Each layer has a plurality of weight values, and the calculation of one layer is performed using the calculation result of the previous layer and the plurality of weights of the current layer. Examples of neural networks include, but are not limited to, a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), a generative adversarial network (GAN), and a deep Q-network.
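The layer-by-layer calculation described above can be illustrated with a minimal sketch of a fully connected network. The bias terms, the ReLU activation, and the names `dense_layer` and `forward` are assumptions added for illustration; the description mentions only weights and the previous layer's result.

```python
def dense_layer(previous_output, weights, biases):
    """Compute one layer from the previous layer's output and this layer's weights.

    Each output unit is a weighted sum of the previous outputs plus a bias,
    passed through ReLU (the bias and the activation are assumptions).
    """
    return [
        max(0.0, sum(w * x for w, x in zip(row, previous_output)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(inputs, layers):
    """Run the input through each layer in order."""
    output = inputs
    for weights, biases in layers:
        output = dense_layer(output, weights, biases)
    return output


# Hypothetical two-layer network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.5]),
]
result = forward([2.0, 1.0], layers)
```

In this example the hidden layer produces `[1.0, 1.5]` from the input, and the output layer then combines those two values with its own weights and bias.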

A learning algorithm is a method of training a predetermined target device (e.g., a robot) using a plurality of training data so as to enable, allow, or control the target device to make a determination or prediction. Examples of learning algorithms include, but are not limited to, supervised learning, unsupervised learning, semi-supervised learning, and reinforcement learning.

The AI model may take image data as its input data and obtain a detection result for the target object in any one frame image of a video. The AI model may be obtained through training. Here, "obtained through training" means that a basic AI model is trained on a plurality of training data by a training algorithm to obtain a predefined operating rule or an AI model configured to perform a desired feature (or purpose). The AI model may include a plurality of neural network layers. Each of the plurality of neural network layers includes a plurality of weight values, and the neural network calculation is performed by a calculation between the calculation result of the previous layer and the plurality of weight values.

The embodiments described above may be applied to visual understanding. Visual understanding is a technology for recognizing and processing things as human vision does, and includes, for example, object recognition, object tracking, image retrieval, human recognition, scene recognition, 3D reconstruction/positioning, and image enhancement.

The embodiments described above may be implemented as hardware components, software components, and/or a combination of hardware components and software components. For example, the apparatuses, methods, and components described in the embodiments may be implemented using a general-purpose computer or a special-purpose computer, such as a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions. A processing device may run an operating system (OS) and software applications that run on the operating system. The processing device may also access, store, manipulate, process, and generate data in response to execution of the software. For ease of understanding, the description sometimes refers to a single processing device, but a person of ordinary skill in the art will appreciate that the processing device may include a plurality of processing elements and/or a plurality of types of processing elements. For example, the processing device may include a plurality of processors, or one processor and one controller. Other processing configurations, such as parallel processors, are also possible.

Software may include a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may instruct the processing device independently or collectively. Software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, in order to be interpreted by the processing device or to provide instructions or data to the processing device. The software may be distributed over networked computer systems and stored or executed in a distributed manner. Software and data may be stored in a computer-readable recording medium.

The method according to the embodiment may be implemented in the form of program instructions executable through various computer means and recorded in a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the embodiment, or may be known and available to those skilled in computer software. Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tapes; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Although the embodiments have been described with reference to a limited set of drawings, a person of ordinary skill in the art may apply various technical modifications and variations based on them. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or the described components of a system, structure, apparatus, circuit, or the like are coupled or combined in a form different from the described method, or are replaced or substituted by other components or equivalents.

Therefore, other implementations, other embodiments, and equivalents of the claims also fall within the scope of the following claims.

Claims

1. A method of detecting an object from a frame image of a video including a plurality of frame images, the method comprising:
detecting the object from the frame image based on a target template set; and
outputting information about the detected object,
wherein the target template set includes one or more target templates, and
wherein each of the one or more target templates includes information of the object in a respective frame image, among frame images of the video preceding the frame image, that is determined to include the object.

2. The method of claim 1,
wherein the target template set further includes an initial target template, and
wherein the initial target template includes:
information of the object in a frame image, among the frame images of the video, determined by a user to include the object; or
a separate image that is independent of the video and includes the object.

3. The method of claim 1,
wherein detecting the object from the frame image comprises:
determining a target fusion feature by fusing image features of an image area corresponding to each of the target templates included in the target template set;
obtaining, based on the target fusion feature, one or more target candidate areas in the frame image that are determined to include the object; and
determining one target area from the one or more target candidate areas.

4. The method of claim 3,
wherein obtaining the one or more target candidate areas comprises:
determining a plurality of search areas within the frame image;
obtaining a search area feature of each of the plurality of search areas by extracting image features of that search area;
calculating a correlation between the search area feature of each of the plurality of search areas and the target fusion feature; and
determining the one or more target candidate areas among the plurality of search areas based on the correlation.

5. The method of claim 3, further comprising:
updating the target template set,
wherein updating the target template set comprises:
calculating a similarity between the target fusion feature of the target template set and the target area; and
adding the target area to the target template set as a target template when the similarity is less than a threshold.

6. The method of claim 3,
wherein detecting the object from the frame image comprises:
obtaining one or more target candidate areas in the frame image that are determined to include the object; and
determining one target area from the one or more target candidate areas,
and wherein the method further comprises:
updating the target template set when the target area satisfies a predetermined condition.

7. The method of claim 6,
wherein updating the target template set comprises:
calculating a similarity between the target area and each of all target templates of the target template set; and
adding the target area to the target template set as a target template when all of the similarities are less than a threshold.

8. The method of claim 1,
wherein detecting the object from the frame image comprises:
detecting the object from the frame image based on the target template set and an interference template set,
wherein the interference template set includes one or more interference templates, and
wherein each of the one or more interference templates includes information about an interfering object that interfered with detection of the object in frame images of the video preceding the frame image.

9. The method of claim 8,
wherein detecting the object from the frame image comprises:
obtaining, based on the target template set and the interference template set, one or more target candidate areas in the frame image that are determined to include the object; and
determining one target area from the one or more target candidate areas.

10. The method of claim 9,
wherein determining one target area from the one or more target candidate areas comprises:
calculating, for each of the one or more target candidate areas, a matching degree between the target candidate area and each target template of the target template set;
calculating, for each of the one or more target candidate areas, a matching degree between the target candidate area and each interference template of the interference template set;
calculating a target matching degree of each of the target candidate areas based on the matching degree between the target candidate area and each target template of the target template set and the matching degree between the target candidate area and each interference template of the interference template set; and
determining one target area among the one or more target candidate areas based on the target matching degree of each of the target candidate areas.

11. The method of claim 10,
wherein calculating the target matching degree of each of the target candidate areas comprises:
calculating the target matching degree of each of the target candidate areas based on
an average or median of the matching degrees between the target candidate area and each target template of the target template set, or
an average or median of the matching degrees between the target candidate area and each interference template of the interference template set.

12. The method of claim 9,
wherein obtaining the one or more target candidate areas comprises:
determining a target fusion feature based on the target template set;
determining an interference fusion feature based on the interference template set; and
obtaining the one or more target candidate areas from the frame image based on the target fusion feature and the interference fusion feature.

13. The method of claim 12,
wherein determining the target fusion feature based on the target template set comprises:
determining the target fusion feature by fusing image features of an image area corresponding to each of the target templates included in the target template set,
and wherein determining the interference fusion feature based on the interference template set comprises:
determining the interference fusion feature by fusing image features of an image area corresponding to each of the interference templates included in the interference template set.

14. The method of claim 12,
wherein determining the target fusion feature based on the target template set comprises:
determining the target fusion feature based on all target templates included in the target template set,
and wherein determining the interference fusion feature based on the interference template set comprises:
determining the interference fusion feature based on all interference templates included in the interference template set.

15. The method of claim 12, further comprising:
updating the interference template set,
wherein updating the interference template set comprises:
adding, to the interference template set as interference templates, some or all of the target candidate areas other than the target area among the one or more target candidate areas.

16. The method of claim 12, further comprising:
updating the interference template set,
wherein updating the interference template set comprises:
calculating a similarity between the interference fusion feature of the interference template set and each of some or all of other target candidate areas; and
adding, to the interference template set as an interference template, a target candidate area whose similarity is less than a threshold among the some or all of the other target candidate areas.

17. An electronic device comprising:
a memory and a processor,
wherein the memory stores a computer program, and
wherein the processor is configured to perform the method of claim 1 when executing the computer program.

18. A non-transitory computer-readable recording medium storing a computer program that, when executed by a processor, causes the processor to perform the method of claim 1.