KR101882743B1

KR101882743B1 - Efficient object detection method using convolutional neural network-based hierarchical feature modeling

Info

Publication number: KR101882743B1
Application number: KR1020170049015A
Authority: KR
Inventors: 이필규
Original assignee: 인하대학교 산학협력단
Priority date: 2017-04-17
Filing date: 2017-04-17
Publication date: 2018-08-30

Abstract

The present invention relates to an efficient object detection method that presents a hierarchical deep feature-based training framework with generalization capability, using a hierarchical feature model (HFM) and a hierarchical classifier ensemble (HCE). It is based on two observations: the performance of many object detectors is degraded by the ambiguity of inter-class differences as well as by large intra-class variation, and deep features extracted from visual objects exhibit strong hierarchical clustering properties.

Description

[0001] The present invention relates to an object detection method using convolutional neural network-based hierarchical feature modeling.

The present invention relates to an object detection method, and more particularly, to an efficient object detection method using convolutional neural network-based hierarchical feature modeling.

In the field of computer vision, object detection, which consists of object classification and localization, is becoming increasingly difficult. Object detection is a very complex process because of ambiguity in appearance between classes and deformations caused by large intra-category variation. Many studies have addressed this performance degradation by dividing the training samples into several components and training the components independently. Decomposing the training data set can mitigate local deformations and variations within a class. Some early pioneering studies investigated clustering approaches for training data in terms of object scale, pose, aspect ratio, and component labels. However, although performance could be improved further by taking inter-class ambiguity into account, most of these approaches consider only intra-class variations and do not investigate inter-class ambiguity. Some of the progress in detection performance is based on more general sub-category models within semantic object categories.

For example, Gu et al., in "Proceedings of the IEEE European Conference Computer Vision, pp. 445-458 (2012)", partitioned samples into components using annotated keypoints and masks, and Aghazadeh et al., in "Proceedings of the IEEE European Conference Computer Vision, pp. 115-128 (2012)", used a similarity graph representing intra-class information to separate the data into spectral clusters. Ruan et al., in "Signal Process. Lett. IEEE 22(2), 244-248 (2015)", investigated weakly supervised multi-component model training for sub-categories.

Although many studies use sub-category structures to improve the accuracy of object detection, most sub-categories are built solely on intra-class similarity information. However, many objects are confusing because of inter-class ambiguity. Valuable inter-class information can be used to resolve confusing sub-categories of samples across classes. Recently, Dong et al., in "IEEE Trans. Circuits Syst. Video Technol. 25(8), 1322-1334 (2015)", proposed a sub-category mining approach for exploring intra-class diversity. However, its performance is much lower than that of recently proposed deep feature-based object detection methods such as the fast region-based convolutional network (fast R-CNN) and spatial pyramid pooling (SPP) in convolutional networks.

Accordingly, the present invention has been made to solve the above-mentioned problems. An object of the present invention is to provide an efficient object detection method that presents a hierarchical deep feature-based training framework with generalization capability, using a hierarchical feature model (HFM) and a hierarchical classifier ensemble (HCE), based on the observations that the performance of many object detectors is degraded by the ambiguity of inter-class differences as well as by large intra-class variation, and that deep features extracted from visual objects exhibit strong hierarchical clustering properties.

First, to summarize the features of the present invention, an object detection method according to one aspect of the present invention for achieving the above object, which uses hierarchical feature modeling of semantic objects included in an image in an object detection apparatus, comprises: constructing, according to features of a region of interest of the image, a hierarchical feature model including a root node that forms a semantic object category space containing the semantic objects, one or more super-category nodes that are children of the root node, correspond to each semantic object, and are to be trained using a training data set, one or more augmented category nodes that are children of each super-category node and are to be trained using a training data set, and one or more sub-category nodes that are children of each augmented category node and are to be trained using a training data set; learning multi-class classifiers according to an SVM (Support Vector Machine) ensemble algorithm for the super-category nodes, the augmented category nodes, and the sub-category nodes to calculate a confidence score in each category; and collecting the confidence scores of each category in order to detect a semantic object included in the image.

Calculating the confidence score comprises: projecting the binary SVM classifiers to pseudo-probabilities for the region of interest in each category; and calculating the confidence score in each category by computing a multi-class margin in that category from the pseudo-probabilities and then computing a normalized multi-class margin from the multi-class margin.
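The two steps above can be illustrated with a minimal Python sketch. The exact equations are not reproduced in this text, so the sketch rests on its own assumptions, none of which are taken from the patent: the pseudo-probability is a logistic (Platt-style) mapping of each binary SVM decision value with hypothetical parameters `a` and `b`; the multi-class margin of a node is its pseudo-probability minus the largest pseudo-probability among the other nodes; and the normalized margin rescales that value from [-1, 1] to [0, 1]. All function names are illustrative.

```python
import math


def pseudo_probability(decision_value, a=-1.0, b=0.0):
    """Map a binary SVM decision value into (0, 1) with a logistic function.

    a and b are hypothetical calibration parameters (an assumption,
    not values given in the patent).
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


def confidence_scores(decision_values):
    """Return one normalized-margin confidence score per category node."""
    probs = [pseudo_probability(v) for v in decision_values]
    scores = []
    for k, p in enumerate(probs):
        others = [q for j, q in enumerate(probs) if j != k]
        # Assumed multi-class margin: own probability minus best competitor.
        margin = p - max(others) if others else p
        # Assumed normalization: shift the margin from [-1, 1] to [0, 1].
        scores.append((margin + 1.0) / 2.0)
    return scores


# Decision values of three one-against-all SVMs for one region of interest.
scores = confidence_scores([2.0, -1.0, 0.5])
best = max(range(len(scores)), key=scores.__getitem__)
```

With these assumptions, the node with the largest decision value also receives the largest confidence score, and every score stays within [0, 1], which makes scores from different levels of the hierarchy comparable.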

For the region of interest r, the one or more super-category nodes are projected to the pseudo-probability P(h|r), and a confidence score CS_h^(k)(r) for the k-th super-category node is calculated according to the following equation, so that a confidence score is obtained for each node in the super category:

[equation not reproduced in this text]

Here, the symbols of the equation (likewise not reproduced) denote, in order: the pseudo-probability for the k-th super-category node h^(k); the multi-class margin for h^(k); and the normalized multi-class margin for h^(k). |Ω_H| is the maximum number corresponding to h^(k) in the super-category space Ω_H.

For the region of interest r, the one or more augmented category nodes are projected to the pseudo-probability P(h|r), and a confidence score CS_m^(k')(r) for the k'-th augmented category node is calculated according to the following equation, so that a confidence score is obtained for each node in the augmented category:

[equation not reproduced in this text]

Here, the symbols of the equation (likewise not reproduced) denote, in order: the pseudo-probability for the k'-th augmented category node m^(k'); the multi-class margin for m^(k'); and the normalized multi-class margin for m^(k'). |Ω_M| is the maximum number corresponding to m^(k') in the augmented category space Ω_M.

For the region of interest r, the one or more sub-category nodes are projected to the pseudo-probability P(h|r), and a confidence score CS_l^(k")(r) for the k"-th sub-category node is calculated according to the following equation, so that a confidence score is obtained for each node in the sub-category:

[equation not reproduced in this text]

Here, the symbols of the equation (likewise not reproduced) denote, in order: the pseudo-probability for the k"-th sub-category node l^(k"); the multi-class margin for l^(k"); and the normalized multi-class margin for l^(k"). |Ω_L| is the maximum number corresponding to l^(k") in the sub-category space Ω_L.

According to another aspect of the present invention, a recording medium embodied in computer-readable code includes: a function of constructing, according to features of a region of interest of an image, a hierarchical feature model including a root node that forms a semantic object category space containing semantic objects, one or more super-category nodes that are children of the root node, correspond to each semantic object, and are to be trained using a training data set, one or more augmented category nodes that are children of each super-category node and are to be trained using a training data set, and one or more sub-category nodes that are children of each augmented category node and are to be trained using a training data set; a function of learning multi-class classifiers according to an SVM ensemble algorithm for the super-category nodes, the augmented category nodes, and the sub-category nodes to calculate a confidence score in each category; and a function of collecting the confidence scores of each category in order to detect a semantic object included in the image. The recording medium is characterized in that it is for performing, in an object detection apparatus, object detection using hierarchical feature modeling of the semantic objects included in the image.

According to the object detection method using convolutional neural network-based hierarchical feature modeling of the present invention, the concept of augmented object categories makes it possible to address inter-class ambiguity and intra-class variation, especially in large-scale object detection. In addition, because a restricted set of categories is assigned to each region of interest (ROI) instead of the full set of semantic categories, as is done in state-of-the-art techniques such as SPP and fast R-CNN, the method of the present invention can reduce computational overhead.

In addition, the hierarchical feature model (HFM), combined with a hierarchical classifier ensemble (HCE) that exploits the clustering quality of the deep feature hierarchy, can be used more effectively than flat feature models and sub-category-based feature models.

Furthermore, many complex data samples can be correctly clustered into sub-categories by using inter-class information, and overall detection accuracy can be improved by solving simpler sub-problems.

FIG. 1 shows the concept of the augmented categories of the HFM of the present invention.
FIG. 2 is a diagram for explaining an object detection system using convolutional neural network-based hierarchical feature modeling (CNN-HFM) according to an embodiment of the present invention.
FIG. 3 is a flowchart for explaining an object detection method of the object detection system according to an embodiment of the present invention.

Hereinafter, the present invention will be described in detail with reference to the accompanying drawings. In the drawings, the same components are denoted by the same reference numerals wherever possible. Detailed descriptions of well-known functions and/or configurations are omitted. The following description focuses on the parts necessary for understanding the operation according to various embodiments, and descriptions of elements that may obscure the gist of the description are omitted. Some components in the drawings may be exaggerated, omitted, or schematically illustrated. The size of each component does not entirely reflect its actual size, and the content described herein is therefore not limited by the relative sizes or spacings of the components drawn in the drawings.

The present invention presents a hierarchical deep feature-based training framework with generalization capability, in place of existing algorithm-centric detection approaches. It is motivated by the following observations: (1) the performance of many object detectors is degraded by the ambiguity of inter-class differences as well as by large intra-class variation, and (2) deep features extracted from visual objects exhibit strong hierarchical clustering properties.

The present invention proposes a new object detection method using a hierarchical feature model (HFM) and a hierarchical classifier ensemble (HCE). The technique features a general and flexible feature structure in terms of super-, augmented, and sub-categories. Here, an augmented category is a partition of a semantic object category that takes into account the effect of the super categories, obtained using a latent topic model (LTM). Thus, each augmented category corresponds to one semantic object category.

FIG. 1 shows the concept of the augmented categories of the HFM of the present invention, in comparison with the original semantic objects. For each semantic object category, the training samples are partitioned into unsupervised super categories. The augmented categories are determined by partitioning the semantic object categories on the basis of the super categories. Therefore, each augmented category corresponds to one semantic object category.

For example, the person category can be divided into three augmented categories: sitting person (augmented category 2), standing person (augmented category 3), and riding person (augmented category 4). In large-scale object detection, classifiers built using augmented classes can outperform classifiers built using semantic object categories because category-similarity errors are effectively reduced.

The method of the present invention is the first complete end-to-end approach that interactively builds a hierarchical feature structure and a classifier ensemble to explore generalization capability in object classification and localization. At each node of the HFM data hierarchy, an existing multi-stage classification ensemble is adopted, but adaptively. Instead of using a flat linear SVM (Support Vector Machine) for all object classes, the HCE, a hierarchical SVM ensemble, is used for inter-class and intra-class decisions.

First, the HCE uses one-against-all SVMs to compute, for each augmented category label, the confidence of a single class prediction made by a binary SVM classifier. Next, the multi-class confidence score for object detection (the confidence score of each category) is computed by combining multiple detectors. Each HCE tree is then trained on a different decision path of the HFM and is used to compute the overall confidence score of a test image so as to minimize detection error. Finally, in the detection stage, HCE-based object detection is performed by aggregating the confidence score(s) of each region proposal derived by the HFM. The confidence scores of the categories are aggregated according to weights or the like and, after processing such as ridge regression and non-maximum suppression, can be used for object detection decisions such as whether the image contains a target object.
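The non-maximum suppression step mentioned above can be sketched as follows. This is the common greedy variant written in plain Python, not code taken from the patent; the box format `(x1, y1, x2, y2)` and the overlap threshold of 0.5 are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score and keep a box only if
    it does not overlap an already kept box by more than the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping proposals and one separate proposal.
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.82 and is suppressed, so only the first and third proposals survive.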

The main features of the present invention are summarized as follows.

1) The concept of augmented object categories can address inter-class ambiguity and intra-class variation, especially in large-scale object detection. Because a restricted set of categories is assigned to each region of interest (ROI) instead of the full set of semantic categories, as is done in state-of-the-art techniques such as SPP and fast R-CNN, the method of the present invention can reduce computational overhead.

2) The hierarchical feature model (HFM), combined with a hierarchical classifier ensemble (HCE) that exploits the clustering quality of the deep feature hierarchy, can be used more effectively than flat feature models and sub-category-based feature models.

3) Many complex data samples can be correctly clustered into sub-categories by using inter-class information, and overall detection accuracy can be improved by solving simpler sub-problems.

FIG. 2 is a diagram for explaining an object detection system (apparatus) using convolutional neural network-based hierarchical feature modeling (CNN-HFM) according to an embodiment of the present invention. Referring to FIG. 2, the object detection system according to an embodiment of the present invention includes a hierarchical feature model unit that builds the hierarchical feature model (HFM), and a hierarchical classifier ensemble (HCE) that trains multi-class classifiers at each node of the HFM using an SVM ensemble algorithm to compute multi-class confidence scores for object detection.

The components of the object detection system according to an embodiment of the present invention may be implemented in hardware such as a semiconductor processor, in software such as an application program, or in a combination of the two. The functions used in the object detection system according to an embodiment of the present invention can also be implemented as computer-readable code on a recording medium readable by a device such as a computer, and the combination of such a recording medium with a device such as a computer can be configured to input, output, and display the data or information needed to perform the functions. Computer-readable recording media include all kinds of recording devices that store data readable by a computer system. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, optical data storage devices, hard disks, and removable storage devices.

In general, classification performance degrades as the number of object classes increases. Deploying a flat SVM with a fully connected neural network is successful when the number of object classes is small or moderate, but detection performance degrades as the number of object categories grows. Data imbalance is a common phenomenon caused by the increase in noise and variation that contaminate real-world image data. The proposed HFM-based object detection method aims to provide robust object detection with generalization capability. Improving accuracy on the basis of the HCE and HFM in a data-driven manner is novel. The HFM, which is the core of the proposed object detection method, is a three-level cluster tree consisting of super-category, augmented object category, and sub-category feature models. The HFM uses the feature information of unsupervised super categories and semantic sub-categories to recognize object categories. ROIs can be extracted using EdgeBoxes, a region proposal algorithm. In the learning stage, the category hierarchy can be discovered by using a latent topic model (LTM) with features extracted from a pre-trained CNN. The HFM is fine-tuned to the hierarchical categories and is built at an inter-class level and an intra-class level. The HCE is built by training multi-class classifiers at each node of the HFM using an SVM ensemble algorithm. A region compensation model can be built using a hierarchical ridge regression algorithm.

In the detection stage, a pool of augmented object categories is predicted for each ROI generated by the region proposal algorithm, in the form of HCE subtree hypotheses. Object detection is then performed on the basis of these hypotheses, followed by ridge regression and non-maximum suppression, similarly to Girshick's method. The scores of the HCE subtree are combined across the super category, augmented category, and sub-category for the hypothesized ROI. Finally, non-maximum suppression is applied as post-processing using the combined scores and location information, and the object category is determined.
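As a rough illustration of how scores from the three levels might be combined for one hypothesized ROI, the following Python sketch takes a weighted sum along each root-to-leaf path and picks the best path. The weighted sum, the equal weights, and the example score values are all assumptions made for illustration; the text above does not specify the combination rule.

```python
def combine_path_scores(cs_h, cs_m, cs_l, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine super-, augmented- and sub-category confidence scores
    along one path of the hierarchy (assumed rule: weighted sum)."""
    w_h, w_m, w_l = weights
    return w_h * cs_h + w_m * cs_m + w_l * cs_l


# Hypothetical path hypotheses for one ROI:
# (semantic label, CS_h, CS_m, CS_l), with made-up score values.
hypotheses = [
    ("person", 0.8, 0.7, 0.9),
    ("horse", 0.8, 0.4, 0.5),
    ("bicycle", 0.3, 0.6, 0.6),
]

label, score = max(
    ((lab, combine_path_scores(h, m, l)) for lab, h, m, l in hypotheses),
    key=lambda pair: pair[1],
)
```

Here "person" and "horse" share the same super-category score, and the augmented and sub-category scores decide between them, which is the role the lower levels of the hierarchy are meant to play.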

<Hierarchical Feature Model Unit (HFM)>

First, the function of the hierarchical feature model unit (HFM) is described in detail.

The region proposal algorithm EdgeBoxes is used to find ROIs in an image, and the ROI features are generated using a 16-layer CNN. Using the normalized ROI features, a deep feature hierarchy, the HFM, is built with three levels associated with the semantic objects of the root node: the H level of super categories (inter-class), the M level of augmented categories (augmented classes), and the L level of sub-categories (intra-class). The root node of the HFM has, as children at the H level, one or more super-category nodes corresponding to each semantic object. Each super-category node has, as children at the M level, one or more augmented object category nodes, which are the original or partitioned semantic object classes according to inter-class characteristics. Each augmented object category node has one or more sub-category nodes as children at the L level.
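The three-level tree described above can be modeled with a small data structure. The Python sketch below is only an illustration: the class name `HFMNode`, its fields, and the example node names are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HFMNode:
    """One node of the hierarchy; level is "root", "H", "M", or "L"."""
    name: str
    level: str
    semantic_label: Optional[str] = None  # set for M- and L-level nodes
    children: List["HFMNode"] = field(default_factory=list)

    def add(self, child):
        """Attach a child node and return it, so calls can be chained."""
        self.children.append(child)
        return child


def count_by_level(node, counts=None):
    """Count the nodes at each level by walking the tree depth-first."""
    counts = {} if counts is None else counts
    counts[node.level] = counts.get(node.level, 0) + 1
    for child in node.children:
        count_by_level(child, counts)
    return counts


# Hypothetical example: one super category holding two augmented
# categories of the semantic class "person", each with sub-categories.
root = HFMNode("root", "root")
h1 = root.add(HFMNode("super-1", "H"))
m_sit = h1.add(HFMNode("sitting person", "M", "person"))
m_ride = h1.add(HFMNode("riding person", "M", "person"))
m_sit.add(HFMNode("sitting person / sub-1", "L", "person"))
m_sit.add(HFMNode("sitting person / sub-2", "L", "person"))
m_ride.add(HFMNode("riding person / sub-1", "L", "person"))

counts = count_by_level(root)
```

Storing the semantic label on the M- and L-level nodes reflects the statement that each augmented category corresponds to exactly one semantic object category, so a decision at any lower node can be mapped back to a semantic class.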

The HCE is built by training a multi-category classifier, which is a set of one-versus-all SVMs, at each node of the HFM. Girshick's flat feature structure can be regarded as a special case of the HFM that has only the root and the full set of semantic object categories at the M level, with no augmented categories. This flat HFM structure is called a flat feature model (FFM). The HFM is also a generalization of sub-category-based approaches.

<Latent Topic Model (LTM) for Category Hierarchy>

An unsupervised approach for learning data-driven hierarchical categories is introduced. In the unsupervised learning stage, super categories are built using a latent topic model (LTM). As a mixture model, the LTM provides a new way of representing latent mixture components for grouping data, which is advantageous for learning a layered structure. In the LTM, an ROI is represented as a combination of latent topics. These learned topics correspond to the super categories used to build the category hierarchy. Features extracted from a deep convolutional neural network (CNN) are used for ROI representation. More specifically, the CNN model is fine-tuned on the training set, starting from a pre-trained CNN model. A fixed-length feature vector is then extracted for each ROI from the last fully connected layer. The extracted features are encoded, quantized, and scaled. Given that each ROI is represented as a vector, the goal is to learn the super categories by applying the mixture model to the represented ROIs. Specifically, the LTM represents each ROI as a combination of K topics. Each topic corresponds to one super category or to several super categories. A generative process similar to that of the original LTM can be used here.

<Hierarchical Feature Model (HFM)>

계층적 특징 모델부는 이미지에 포함된 의미 객체들을 포함하는 의미 객체 공간 Ω과 관련된 훈련 데이터 세트 D의 ROI의 특징을 사용하여 계층적 특징 모델 (HFM)을 구축한다. HFM은 루트 노드, 수퍼 카테고리 노드, 증강 카테고리 노드 및 서브 카테고리 노드로 구성된다. HFM의 루트는 카테고리 공간 Ω을 가진 전체 피쳐 세트 D와 연관되어 있으며, 자식으로서의 수퍼 카테고리 노드에 연결된다. 다중 수퍼 카테고리를 갖는 의미 객체는 클래스 간 특성에 따라 여러 개의 증강된 카테고리로 분할된다. 증강된 카테고리의 개념은 원래의 의미 객체 클래스의 클래스 내 레벨에서의 변이뿐만 아니라 클래스 간 레벨에서의 모호함의 영향을 줄이기 위해 도입되었으며 다중 레이블 카테고리 및 폐색된(occluded) 의미 객체 카테고리를 나타내도록 증강될 수 있다. 수퍼 카테고리 노드 h는 훈련 데이터 세트 D_h와 연관되며, 이는 D의 서브 세트이며, 복수의 증강된 객체 노드를 자식으로 갖는다. LTM 분석은 하나의 의미 객체 카테고리가 여러 수퍼 카테고리에 속하는 것을 허용한다. 이는 서로 다른 객체가 비슷한 모양이나 특성을 가진 파트를 공유 할 수 있기 때문이다. M 레벨에서, 다수의 수퍼 카테고리에 속하는 의미 객체 카테고리는 다수의 증강 카테고리로 분할된다. 증강 카테고리 노드 m은 훈련 데이터 세트 D_h로부터 분할된 D_m으로 표시되는 훈련 데이터 세트를 갖는다. 각 증강된 객체 카테고리의 트레이닝 세트는 LTM 알고리즘을 사용하여 하위 레벨의 서브 카테고리로 더 분할되어 클래스 내 변동의 영향을 최소화할 수 있다. 서브 카테고리 노드 l의 훈련 데이터 세트는 D_l로 표시되며, 이 데이터는 증강된 훈련 데이터 세트 D_m에서 분할된다.The hierarchical feature modeling unit builds a hierarchical feature model (HFM) using the features of the ROI of the training data set D associated with the semantic object space? Containing the semantic objects contained in the image. The HFM consists of a root node, a super category node, an enhancement category node and a subcategory node. The root of the HFM is associated with the entire feature set D with category space? And is connected to the super category node as a child. Semantic objects with multiple super categories are divided into several enhancement categories according to their inter-class characteristics. The concept of an augmented category has been introduced to reduce the influence of ambiguity at the inter-class level as well as variations in the level of the class of the original semantic object class and is augmented to indicate the multi-label category and the occluded semantic object category . The super-category node h is associated with the training data set D _h , which is a subset of D and has a plurality of augmented object nodes as its children. 
LTM analysis allows one semantic object category to belong to several super categories. This is because different objects can share parts with similar shapes or characteristics. At the M level, semantic object categories belonging to a plurality of super categories are divided into a plurality of enhancement categories. The enhancement category node m has a training data set denoted by D _m divided from the training data set D _h . The training set of each augmented object category can be further subdivided into lower level subcategories using the LTM algorithm to minimize the impact of intra-class variation. The training data set of subcategory node l is denoted D _l , and this data is divided in the augmented training data set D _m .
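The four-level structure described above (root, super category, augmented category, sub-category, each holding a subset of its parent's training data) can be pictured with a small tree type. The following is only an illustrative sketch: the class name `HFMNode`, the field names, and the toy sample indices are hypothetical and do not come from the patent, and the LTM step that actually produces the partitions is not modeled.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class HFMNode:
    """One node of the hierarchy; `level` is 'root', 'H', 'M', or 'L'."""
    name: str
    level: str
    sample_ids: List[int] = field(default_factory=list)  # indices into D
    children: List["HFMNode"] = field(default_factory=list)

    def add_child(self, child: "HFMNode") -> "HFMNode":
        # A child's training set must be carved out of its parent's set.
        if not set(child.sample_ids) <= set(self.sample_ids):
            raise ValueError(f"{child.name} is not a subset of {self.name}")
        self.children.append(child)
        return child


def depth(node: HFMNode) -> int:
    """Number of levels from `node` down to its deepest descendant."""
    return 1 + max((depth(c) for c in node.children), default=0)


# Toy data: six ROI feature indices forming D (hypothetical values).
root = HFMNode("root", "root", [0, 1, 2, 3, 4, 5])
h1 = root.add_child(HFMNode("h1", "H", [0, 1, 2, 3]))    # D_h for h1
h2 = root.add_child(HFMNode("h2", "H", [4, 5]))          # D_h for h2
# The same semantic class split into two augmented categories.
m1 = h1.add_child(HFMNode("cat@h1", "M", [0, 1, 2]))     # D_m
m2 = h2.add_child(HFMNode("cat@h2", "M", [4]))           # D_m
l1 = m1.add_child(HFMNode("cat@h1/sub0", "L", [0, 1]))   # D_l
```

The subset check in `add_child` mirrors the statement that D_h is a subset of D, D_m is partitioned from D_h, and D_l is partitioned from D_m.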

<Hierarchical Classifier Ensemble (HCE)>

The hierarchical classifier ensemble (HCE) uses an SVM ensemble algorithm to train multiple classifiers at each node of the HFM and produces multi-level confidence scores for object detection.

The node confidence function consists of a multi-class classifier built from the set of binary classifiers of the individual child nodes. A confidence score is needed that maintains a linear relationship with prediction accuracy and raises prediction reliability by taking several considerations into account.

As shown in FIG. 2, let Ω_H, Ω_M, and Ω_L denote the super-category space, the augmented-category space, and the sub-category space, respectively. The root has one or more super-category nodes as children (|Ω_H| being the maximum number, corresponding to h^(k)); a super-category node h has one or more augmented object category nodes (|Ω_M| being the maximum number, corresponding to m^(k')); and an augmented-category node m has one or more sub-category nodes (|Ω_L| being the maximum number, corresponding to l^(k'')).

First, at the root, an SVM (support vector machine) ensemble is created that computes the confidence scores of an ROI (region of interest) used when moving to a super-category node. |Ω_H| binary SVM classifiers (φ_1, φ_2, ..., φ_|Ω_H|) are trained at the root node using D_h, and the ROI uses them to determine its super-category node. Because the predictions estimated by binary SVMs are unreliable when used directly for multi-class classification, the confidence function described below is introduced.

For a given ROI r, the linear SVM (classifier) φ_h for each super-category node h is projected to a pseudo-probability P(y = h | r), as in [Equation 1].

[Equation 1]

Here, the parameters α and β are determined by logistic regression as in [Equation 1a]. The parameters α', β' and α'', β'' used below can be determined in the same way.

[Equation 1a]

Here, w is the weight for the ROI samples (a separate w_i may be used for each sample) and y is the corresponding label. For a given ROI r, multi-class prediction at the root node begins by determining the top-scoring first super-category node prediction h^(1)(r) for ROI r using the pseudo-probability P(y = h | r), as in [Equation 2a].

[Equation 2a]

The multi-class margin ζ_h^(1)(r) is defined as in [Equation 2b].

[Equation 2b]

The normalized multi-class margin φ_h^(1)(r) is computed with a sigmoid function, based on the relationship between the pseudo-probability P(y = h | r) and the multi-class margin ζ_h^(1)(r), as in [Equation 2c].

[Equation 2c]

Here, the parameters A, B, and C are predetermined coefficients that can be determined by empirical fitting. The same applies to the parameters A', B', C' and A'', B'', C'' used below.

Accordingly, the confidence function CS_h^(1)(r) for the prediction h^(1)(r) of the first super-category node can be computed as in [Equation 2d].

[Equation 2d]

Likewise, the confidence function CS_h^(k)(r) for the prediction h^(k)(r) of the k-th super-category node h^(k) is computed as in [Equation 2e] and can be used to compute confidence scores for all super-category nodes.

[Equation 2e]

Here, the relationship in [Equation 2f] follows from [Equation 2a], [Equation 2b], and [Equation 2c]. It uses the prediction h^(k), the multi-class margin ζ_h^(k)(r), and the normalized multi-class margin φ_h^(k)(r) for the k-th super-category node among the plurality of super-category nodes.

[Equation 2f]
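The first two steps of this derivation, projecting each binary SVM score to a pseudo-probability with parameters fitted by logistic regression and then ranking the child nodes, can be sketched in a few lines. The equation bodies are not reproduced in this text, so the sketch assumes a Platt-style sigmoid of the form 1 / (1 + exp(α·s + β)) and an ordinary weighted log-loss. Both are assumptions rather than the patent's exact formulas, and the margin, normalization, and coefficients A, B, C are deliberately left out. All function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pseudo_probability(score: float, alpha: float, beta: float) -> float:
    """ASSUMED Platt-style mapping of a raw SVM score into (0, 1)."""
    z = alpha * score + beta
    # Numerically stable evaluation of 1 / (1 + exp(z)).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def fit_alpha_beta(scores: Sequence[float], labels: Sequence[int],
                   weights: Sequence[float], lr: float = 0.1,
                   steps: int = 2000) -> Tuple[float, float]:
    """Weighted logistic regression on one binary SVM's scores.

    labels are 1 for ROIs of this child node and 0 otherwise;
    weights play the role of the per-sample weights w_i.
    """
    alpha, beta = -1.0, 0.0
    total = sum(weights)
    for _ in range(steps):
        g_a = g_b = 0.0
        for s, y, w in zip(scores, labels, weights):
            p = pseudo_probability(s, alpha, beta)
            # Gradient of the weighted negative log-likelihood
            # with respect to alpha and beta.
            g_a += w * (y - p) * s
            g_b += w * (y - p)
        alpha -= lr * g_a / total
        beta -= lr * g_b / total
    return alpha, beta


def rank_children(raw_scores: Dict[str, float],
                  params: Dict[str, Tuple[float, float]]
                  ) -> List[Tuple[str, float]]:
    """Child nodes sorted by pseudo-probability, best first.

    Entry k-1 of the result plays the role of the k-th prediction.
    """
    probs = {name: pseudo_probability(s, *params[name])
             for name, s in raw_scores.items()}
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
```

With positive samples scoring higher than negative ones, the fitted α comes out negative under this sign convention, so larger SVM scores map to larger pseudo-probabilities.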

A super-category node has an augmented object category set Ω_M that is much smaller than the total number N of semantic object categories. An SVM ensemble is built for each super-category node and computes the confidence score for each augmented object category. An SVM ensemble is likewise built at each augmented object category.

|Ω_M| binary SVM classifiers (φ'_1, φ'_2, ..., φ'_|Ω_M|) are trained at the super-category node using the training data set D_m to determine the best node(s) in the augmented-category space Ω_M. The linear SVM (classifier) φ'_m for augmented category m is projected to the pseudo-probability P(y = m^(k') | r) defined as in [Equation 3a] (see [Equation 1]).

[Equation 3a]

Accordingly, the k'-th confidence value CS_m^(k')(r), computed at the super-category node for each augmented category among the plurality of augmented-category nodes, is obtained as in [Equation 3b] and can be used to compute confidence scores for all augmented-category nodes (see [Equation 2e]).

[Equation 3b]

Here, the prediction m^(k'), the multi-class margin ζ_m^(k')(r), and the normalized multi-class margin φ_m^(k')(r) for the k'-th augmented-category node are used.

Similarly, |Ω_L| binary SVM classifiers (φ''_1, φ''_2, ..., φ''_|Ω_L|) are trained at the augmented-category node using the training data set D_l to determine the best node(s) in the sub-category space Ω_L. The linear SVM (classifier) φ''_l for sub-category l is projected to the pseudo-probability P(y = l^(k'') | r) defined as in [Equation 4]. Accordingly, the k''-th confidence value CS_l^(k'')(r), computed at the augmented-category node for each sub-category among the plurality of sub-category nodes, is obtained as in [Equation 4] and can be used to compute confidence scores for all sub-category nodes.

[Equation 4]

Here, the prediction l^(k''), the multi-class margin ζ_l^(k'')(r), and the normalized multi-class margin φ_l^(k'')(r) for the k''-th sub-category node are used.

The object detection method of the object detection system according to an embodiment of the present invention is summarized below with reference to the flowchart of FIG. 3.

FIG. 3 is a flowchart illustrating the object detection method of the object detection system according to an embodiment of the present invention.

Referring to FIG. 3, in the object detection method using hierarchical feature modeling of the semantic objects included in an image in the object detection system (apparatus) of the present invention, the hierarchical feature modeling unit first constructs a hierarchical feature model (HFM) according to the features of the regions of interest (ROIs) of the target image (S100). The HFM comprises a root node forming the semantic object category space that contains the semantic objects; one or more super-category nodes that are children of the root node, correspond to each semantic object, and are to be trained using the training data set D_h; one or more augmented-category nodes that are children of each super-category node and are to be trained using the training data set D_m; and one or more sub-category nodes that are children of each augmented-category node and are to be trained using the training data set D_l.

As described above, the hierarchical classifier ensemble (HCE) trains multiple classifiers according to the SVM ensemble algorithm for the super-category nodes, the augmented-category nodes, and the sub-category nodes, and computes the confidence score in each category (S200). To compute the confidence score in each category, the HCE projects the binary SVM classifiers for the region of interest to pseudo-probabilities in each category, computes the multi-class margin in each category from those pseudo-probabilities, computes the normalized multi-class margin from the multi-class margin, and from these computes the confidence score in each category. That is, for an ROI r, the HCE projects the SVM classifiers for each category node to the pseudo-probability P(h | r) and computes the confidence score CS for the k-th, k'-th, or k''-th node of each category as in [Equation 2e], [Equation 2f], [Equation 3b], and [Equation 4], thereby obtaining a confidence score for each node in each category.

An object detector then collects the confidence scores of each category to detect the semantic objects included in the target image (S300). This HCE-based object detection is guided by the HFM and is carried out by aggregating and analyzing the confidence scores collected for each region proposal. The confidence scores of each category are aggregated, for example according to weights, and then passed through processing such as ridge regression and non-maximum suppression before being used for detection decisions such as whether the image contains a target object.
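The non-maximum suppression step named above can be sketched with the standard greedy procedure. This is the plain textbook variant, not the weighted variant the patent refers to elsewhere, and the box format, function names, and the 0.3 overlap threshold are hypothetical choices made for the illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float],
        iou_threshold: float = 0.3) -> List[int]:
    """Greedy NMS: visit boxes from highest to lowest confidence and keep
    a box only if it does not overlap an already kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapping proposals and one separate proposal.
proposals = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
confidences = [0.9, 0.8, 0.7]
kept = nms(proposals, confidences)
```

Here `scores` would be the aggregated confidence scores of the region proposals; how those are aggregated across the hierarchy is not modeled.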

<Experiments>

The object detection system of the present invention, based on the hierarchical feature model (HFM) with the hierarchical classifier ensemble (HCE), was evaluated on the PASCAL VOC 2007 and PASCAL VOC 2012 detection tasks. Each data set contains thousands of images of real-world scenes, and the goal is to predict the bounding boxes of all objects in an image. A predicted bounding box is counted as a true positive if it overlaps the ground truth by 50% or more. The 16-layer VGG-Net was used as the baseline network of the system. Among state-of-the-art region proposal algorithms, Edge Boxes was adopted because it provides fast and accurate region proposals. In the first experiment, on PASCAL VOC 2007, the detection system was trained on the given training data set. The second experiment was evaluated on PASCAL VOC 2007 with knowledge transfer learning for the HFM. Finally, the method of the present invention is compared with state-of-the-art methods on the PASCAL VOC 2012 public leaderboard.

The experimental results on the PASCAL VOC 2007 data set are shown in Table 1. The methods are distinguished as the flat feature model (FFM), the HFM with the augmented-category level (M-level HFM), and the HFM with the sub-category level (L-level HFM). All experiments in Table 1 were trained on the VOC 2007 training data set and evaluated on the test set by mean Average Precision (mAP) using the standard PASCAL evaluation tool [27]. Table 1 also compares the detection performance with the results of state-of-the-art detectors.

The FFM is a special type of HFM that has no augmented categories, only the root and the full set of semantic object categories. The FFM is used as the feature extractor in the experiments. The public VGG16 CNN architecture was chosen as the baseline, following its training protocol. To build the FFM, the ImageNet pre-trained CNN was fine-tuned on the data (D, Ω) for 50K iterations at a learning rate of 0.001. After 50K iterations, the learning rate was reduced by a factor of 10 and fine-tuning continued for another 20K iterations. During FFM fine-tuning, only the weights from conv4_1 to fc7 were fine-tuned, while the weights from conv1_1 to conv3_3 were kept fixed.
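The fine-tuning protocol above amounts to a two-stage step schedule plus a split between frozen and trainable layers. A framework-independent sketch follows. The function names are hypothetical, the layer list uses the conventional VGG16 layer names, and reading the schedule as 50K iterations followed by a further 20K (70K in total) is an interpretation of the text.

```python
# VGG16 layer names in forward order (fc8 omitted: the text stops at fc7).
VGG16_LAYERS = [
    "conv1_1", "conv1_2", "conv2_1", "conv2_2",
    "conv3_1", "conv3_2", "conv3_3",
    "conv4_1", "conv4_2", "conv4_3",
    "conv5_1", "conv5_2", "conv5_3",
    "fc6", "fc7",
]


def is_trainable(layer: str, first_trainable: str = "conv4_1") -> bool:
    """True if `layer` is fine-tuned, False if its weights stay frozen."""
    return VGG16_LAYERS.index(layer) >= VGG16_LAYERS.index(first_trainable)


def learning_rate(iteration: int, base_lr: float = 0.001,
                  drop_at: int = 50_000, total: int = 70_000,
                  factor: float = 0.1) -> float:
    """Step schedule: base_lr until drop_at, then base_lr * factor."""
    if not 0 <= iteration < total:
        raise ValueError("iteration is outside the schedule")
    return base_lr if iteration < drop_at else base_lr * factor


frozen = [name for name in VGG16_LAYERS if not is_trainable(name)]
```

In an actual training framework the same split would typically be expressed by disabling gradient updates for the layers in `frozen`.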

<M-level HFM>

The M-level HFM is built on the VOC 2007 training data set without sub-categories. To build the category hierarchy, K topics were set in the LTM to find a K-dimensional super-category distribution θ^K for each ROI. In this experiment K was set to 5, selected by grid search over {1, 2, ..., 9}. A disjoint H level of the HFM is prone to overfitting because of data sparsity, and the HFM hierarchy can mislead some candidates. To overcome the overfitting and misleading problems, super categories were instead admitted for each ROI r by determining the super categories i that satisfy a threshold condition on θ^K. The threshold T_θ was empirically set to 0.3. To build the HFM, post-hoc SVM training was implemented with hard negative mining so that training could use a very large data set. The HFM is constructed as described in Sect. 3.2. After the HCE is trained, the feasible HCE subtrees compete with one another in post-processing to localize the final object positions. In post-processing, hierarchical ridge regression was applied together with the weighted non-maximum suppression described in the supplementary material. Table 1 shows that applying the HFM yields a notable improvement of 2.9 percentage points over the FFM, reaching 70.8%.
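The super-category admission step described above can be sketched as a simple thresholding of the per-ROI distribution. The exact inequality is not reproduced in this text, so the rule used here (admit every super category i whose weight exceeds T_θ) is an assumption, and the function name and example distribution are hypothetical.

```python
from typing import List, Sequence


def admitted_super_categories(theta: Sequence[float],
                              t_theta: float = 0.3) -> List[int]:
    """Indices i of super categories whose weight exceeds the threshold.

    ASSUMPTION: the admission rule is theta[i] > t_theta; the source
    only names the threshold T_theta and its value 0.3.
    """
    return [i for i, weight in enumerate(theta) if weight > t_theta]


# One ROI with K = 5 super categories; it is admitted to two of them,
# so it contributes to the training sets of both super-category nodes.
theta_r = [0.45, 0.35, 0.10, 0.05, 0.05]
admitted = admitted_super_categories(theta_r)
```

Allowing more than one admitted super category per ROI is what lets a single semantic class appear under several super-category nodes as separate augmented categories.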

<L-level HFM>

The L-level HFM was built by also considering the sub-category level. After the M-level HFM was built, the sub-categories were discovered by the LTM. Each augmented object category node at the M level therefore has sub-category children at the L level. The training procedure is the same as for the M-level HFM. The L-level HFM improves on the FFM by 4.4 percentage points. As shown in Table 1, an overall mAP of 72.3% was achieved on the VOC 2007 data set, higher than state-of-the-art methods such as Fast R-CNN at 66.9%.

The performance of the model for cross-domain transfer learning was evaluated using Microsoft Common Objects in Context (COCO) 2014 and PASCAL VOC 2007 + VOC 2012 (VOC+) as the two domains. First, the following two baselines were considered without transfer learning.

<VOC+>

To build FFM_VOC+, the ImageNet pre-trained CNN was fine-tuned on the data (D^VOC+, Ω^VOC+) for 50K iterations at a learning rate of 0.001, after which the learning rate was reduced by a factor of 10 for 20K iterations. After FFM_VOC+ was fine-tuned, the HFM was built on VOC+ using the category hierarchy obtained from the LTM. All parameters were fixed as described above, and the VOC+ baseline achieved 75.6% on the VOC 2007 test set, as reported in Table 2. Compared with Fast R-CNN, the performance gain became more pronounced when training on the additional data (from 3.1% to 4.8%). This result is mainly attributable to the HFM approach of building a hierarchical structure and to its capacity being strengthened on larger data sets.

The COCO data set differs from VOC in its number of classes, but VOC can be regarded as a subset of COCO, so all 80 classes were used to train the COCO baseline. First, FFM_COCO was built by fine-tuning the ImageNet pre-trained CNN on the data (D^COCO, Ω^COCO) for 200K iterations at a learning rate of 0.001. The learning rate was then reduced by a factor of 10 for 80K iterations. The same procedure as described for VOC+ was followed. The COCO baseline achieved an mAP of 73.9% in Table 2, which is lower than the VOC+ baseline because of the domain difference.

<COCO→VOC+>

The effect of knowledge transfer learning on the HFM was verified. Instead of using the same domain to build the category hierarchy, an external domain was used as the source of transferred knowledge. First, FFM_COCO was used as in the COCO baseline. Second, the category hierarchy was constructed using the LTM and FFM_COCO. The VOC category hierarchy was then obtained by transferring appearance from the COCO data set. Finally, FFM_COCO was fine-tuned on the data (D^VOC+, Ω^VOC+) to build the HFM. Fine-tuning ran for 50K iterations at a learning rate of 0.001, after which the learning rate was reduced by a factor of 10 for 20K iterations. The experimental parameters are as described above. Table 2 shows that HFM-based knowledge transfer learning outperformed both the COCO and VOC+ baselines, reaching 80.4%.

In this experiment, the detection performance on the VOC 2012 test set was evaluated. For the final results on the VOC 2012 data set, the CNN was fine-tuned on the COCO train set, and domain adaptation was performed on the VOC 2012 train set following the same procedure as described above.

Table 3 compares the HFM with entries on the VOC 2012 leaderboard that use VGG16 as the baseline and additional training data. Even without additional domain data, the HFM is among the best-performing detection methods at 71.0%. After fine-tuning through the domain adaptation approach and building the L-level HFM, the HFM achieved a state-of-the-art 77.5% mAP on the VOC 2012 test set.

The present invention thus presents a new data-driven hierarchical object detection framework. The framework outperforms state-of-the-art results on the PASCAL VOC 2007 and VOC 2012 data sets. Deep features were partitioned by building the hierarchical deep feature model (HFM) with the LTM algorithm, and classifiers were assembled at each node of the HFM to form the HCE.

As described above, the object detection method using convolutional neural network-based hierarchical feature modeling in the object detection system according to the present invention can, through the concept of augmented object categories, address inter-class ambiguity and intra-class variation, particularly in large-scale object detection. Because a region of interest (ROI) is assigned a restricted set of categories rather than the full set of semantic categories, as is done in state-of-the-art techniques such as SPP and Fast R-CNN, the method of the present invention can reduce computational overhead. In addition, the hierarchical feature model (HFM), combined with the hierarchical classifier ensemble (HCE) that exploits the clustering quality of the deep feature hierarchy, can be used more effectively than flat feature models and sub-category-based feature models. Furthermore, many complex data samples can be correctly clustered into sub-categories by exploiting inter-class information, and overall detection accuracy can be improved by solving simpler sub-problems.

Although the present invention has been described above with reference to specific details such as particular components, limited embodiments, and drawings, these are provided only to aid a more general understanding of the invention. The present invention is not limited to the above embodiments, and those of ordinary skill in the art to which the invention pertains can make various modifications and variations without departing from its essential characteristics. Therefore, the spirit of the present invention should not be limited to the described embodiments, and the claims below, together with all technical ideas equal or equivalent to those claims, should be construed as falling within the scope of the present invention.

Inter-class ambiguity
Intra-class variations
Semantic object categories
Sub categories
Latent Topic Model (LTM)
Deep feature-based
Hierarchical Feature Model (HFM)
Hierarchical Classifier Ensemble (HCE)
Hierarchical ridge regression algorithm

Claims

A method for detecting an object using hierarchical feature modeling of semantic objects included in an image in an object detecting apparatus, the method comprising:
constructing a hierarchical feature model comprising, according to features of a region of interest of the image, a root node forming a semantic object category space including the semantic objects, one or more super-category nodes that are children of the root node, correspond to each semantic object, and are to be trained using a training data set, one or more augmented-category nodes that are children of each super-category node and are to be trained using a training data set, and one or more sub-category nodes that are children of each augmented-category node and are to be trained using a training data set;
training multiple classifiers according to an SVM (support vector machine) ensemble algorithm for the super-category nodes, the augmented-category nodes, and the sub-category nodes, and calculating a confidence score in each category; and
collecting the confidence scores of each category to detect the semantic objects included in the image.

The method according to claim 1,
wherein calculating the confidence score comprises:
projecting binary SVM classifiers for the region of interest to a pseudo-probability in each category; and
calculating a multi-class margin in each category based on the pseudo-probability in that category, calculating a normalized multi-class margin from the multi-class margin, and calculating the confidence score in each category.

The method of claim 2,
wherein, for the region of interest r, SVM classifiers for the one or more super-category nodes are projected to a pseudo-probability P(h | r), and a confidence score CS_h^(k)(r) for the k-th super-category node is calculated according to the following equation to obtain confidence scores for the one or more super-category nodes,

where A, B, and C are predetermined coefficients,

P(y = h^(k) | r) is the pseudo-probability for the k-th super-category node h^(k),

ζ_h^(k)(r) is the multi-class margin for h^(k),

φ_h^(k)(r) is the normalized multi-class margin for h^(k), and |Ω_H| is the maximum number corresponding to h^(k) in the super-category space Ω_H.

The method of claim 2,
wherein, for the region of interest r, SVM classifiers for the one or more augmented-category nodes are projected to a pseudo-probability P(m | r), and a confidence score CS_m^(k')(r) for the k'-th augmented-category node is calculated according to the following equation to obtain confidence scores for the one or more augmented-category nodes,

where A', B', and C' are predetermined coefficients,

P(y = m^(k') | r) is the pseudo-probability for the k'-th augmented-category node m^(k'),

ζ_m^(k')(r) is the multi-class margin for m^(k'),

φ_m^(k')(r) is the normalized multi-class margin for m^(k'), and |Ω_M| is the maximum number corresponding to m^(k') in the augmented-category space Ω_M.

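The method of claim 2,
wherein, for the region of interest r, SVM classifiers for the one or more sub-category nodes are projected to a pseudo-probability P(l | r), and a confidence score CS_l^(k'')(r) for the k''-th sub-category node is calculated according to the following equation to obtain confidence scores for the one or more sub-category nodes,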

where A'', B'', and C'' are predetermined coefficients,

P(y = l^(k'') | r) is the pseudo-probability for the k''-th sub-category node l^(k''),

ζ_l^(k'')(r) is the multi-class margin for l^(k''),

φ_l^(k'')(r) is the normalized multi-class margin for l^(k''), and |Ω_L| is the maximum number corresponding to l^(k'') in the sub-category space Ω_L.

A computer-readable recording medium storing computer-readable code for performing object detection using hierarchical feature modeling of semantic objects included in an image in an object detection apparatus, the code implementing:
a function of constructing a hierarchical feature model comprising, according to features of a region of interest of the image, a root node forming a semantic object category space including the semantic objects, one or more super-category nodes that are children of the root node, correspond to each semantic object, and are to be trained using a training data set, one or more augmented-category nodes that are children of each super-category node and are to be trained using a training data set, and one or more sub-category nodes that are children of each augmented-category node and are to be trained using a training data set;
a function of training multiple classifiers according to an SVM ensemble algorithm for the super-category nodes, the augmented-category nodes, and the sub-category nodes, and calculating a confidence score in each category; and
a function of collecting the confidence scores of each category to detect the semantic objects included in the image.