KR20200071808A

KR20200071808A - Learning method of object detector, computer readable medium and apparatus for performing the method

Info

Publication number: KR20200071808A
Application number: KR1020180151958A
Authority: KR
Inventors: 고성제; 엄광현; 조성진; 국형근; 김승욱
Original assignee: 고려대학교 산학협력단
Priority date: 2018-11-30
Filing date: 2018-11-30
Publication date: 2020-06-22
Also published as: KR102204565B1

Abstract

Disclosed are a learning method for an object detector, and a recording medium and an apparatus for performing the same. The learning method comprises the steps of: matching at least one feature map used for object detection in an image with at least one ground truth (GT) box set on an object region in the image, so that each feature map learns its matched GT box; generating at least one additional GT box for each GT box by enlarging or reducing that GT box; separating each feature map to generate at least one sub-feature map per feature map; and matching the sub-feature maps with the additional GT boxes according to box size, so that each sub-feature map additionally learns its matched additional GT box.

Description

Learning method of object detector, and recording medium and apparatus for performing the same {LEARNING METHOD OF OBJECT DETECTOR, COMPUTER READABLE MEDIUM AND APPARATUS FOR PERFORMING THE METHOD}

The present invention relates to a learning method for an object detector, and a recording medium and an apparatus for performing the same, and more particularly to a learning method for an object detector that specifies the position of an object in an image and classifies the object whose position has been specified, and a recording medium and an apparatus for performing the same.

Image pyramid-based techniques are known for detecting objects in an image. Such a technique rescales the input image to various sizes and applies each rescaled image to a detector that accepts a fixed-size input, thereby detecting objects from the image. Because the size of the objects in the image changes with the size of the input image, the detector can detect objects of various sizes.
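
The rescaling idea behind an image pyramid can be illustrated with a minimal sketch. The function names, the scale set, the window size, and the tolerance below are illustrative assumptions and do not appear in the specification; the sketch only shows how an object that is too large for a fixed-size detector window at the original scale comes within range at a smaller scale.

```python
def pyramid_sizes(width, height, scales):
    """Return the (width, height) of each rescaled copy of the input image."""
    return [(round(width * s), round(height * s)) for s in scales]


def scales_that_fit(object_size, window_size, scales, tolerance=0.25):
    """Return the scales at which the object roughly fills the fixed detector window.

    The tolerance is a hypothetical parameter: the object is considered a fit
    when its rescaled size is within tolerance * window_size of the window.
    """
    fits = []
    for s in scales:
        scaled = object_size * s
        if abs(scaled - window_size) <= tolerance * window_size:
            fits.append(s)
    return fits


scales = [1.0, 0.5, 0.25]
print(pyramid_sizes(640, 480, scales))
print(scales_that_fit(object_size=240, window_size=64, scales=scales))
```

Running the detector once per pyramid level is what makes this approach costly, which is the contrast the following paragraphs draw with the SSD.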

Recently, the SSD (Single Shot MultiBox Detector), which is based on convolutional neural networks (CNNs), has drawn attention as a real-time object detector. The SSD extracts feature maps using a convolutional neural network and uses the feature maps for object detection in an image, so that, compared with existing image pyramid-based object detection techniques, it can detect objects of various sizes in real time.

However, each feature layer formed by the feature maps of the SSD learns only information about objects of its assigned size, so the degree of abstraction differs between feature layers, and the degree of abstraction for some of the objects assigned to a low-resolution feature layer may be weak, which can somewhat reduce the accuracy of object detection.

The present invention provides a learning method for an object detector that supports, in each feature layer of the SSD, additional learning of information about other objects enlarged or reduced to the assigned size, in addition to information about objects of the assigned size, and a recording medium and an apparatus for performing the same.

A learning method for an object detector according to the present invention includes: matching at least one feature map used for object detection in an image with at least one GT (Ground Truth) box set on an object region in the image according to box size, and learning the matched GT box in each of the at least one feature map; generating at least one additional GT box for each of the at least one GT box by enlarging or reducing each of the at least one GT box; generating at least one sub-feature map for each of the at least one feature map by separating the at least one feature map; and matching the at least one sub-feature map with the at least one additional GT box according to box size, and additionally learning the matched additional GT box in each of the at least one sub-feature map.

Meanwhile, the generating of the at least one additional GT box for each of the at least one GT box may include generating, for each of the at least one GT box, at least one additional GT box that has the same center coordinates as the GT box but is enlarged or reduced in size.

In addition, the generating of the at least one sub-feature map for each of the at least one feature map may include separating each of the at least one feature map into the same number of sub-feature maps as the number of additional GT boxes generated for each of the at least one GT box.

In addition, the matching of the at least one sub-feature map with the at least one additional GT box and the additional learning may include: comparing the box sizes of the at least one additional GT box and the at least one GT box to match a GT box to each of the at least one additional GT box; and matching each of the at least one additional GT box with a sub-feature map separated from the feature map that learned the GT box matched to that additional GT box.

In addition, the matching of the at least one sub-feature map with the at least one additional GT box and the additional learning may include learning the matched additional GT box in each of the at least one sub-feature map using a convolutional neural network (CNN).

In addition, a computer-readable recording medium may be provided on which a computer program for performing the learning method of the object detector is recorded.

Meanwhile, a learning apparatus for an object detector according to the present invention includes: a GT box matching unit that matches at least one feature map used for object detection in an image with at least one GT (Ground Truth) box set on an object region in the image according to box size, so that each of the at least one feature map can learn the at least one GT box; an additional GT box generating unit that generates at least one additional GT box for each of the at least one GT box by enlarging or reducing each of the at least one GT box; a feature map separation unit that generates at least one sub-feature map for each of the at least one feature map by separating the at least one feature map; and an additional GT box matching unit that matches the at least one sub-feature map with the at least one additional GT box according to box size, so that each of the at least one sub-feature map can additionally learn the at least one additional GT box.

Meanwhile, the additional GT box generating unit may generate, for each of the at least one GT box, at least one additional GT box that has the same center coordinates as the GT box but is enlarged or reduced in size.

In addition, the feature map separation unit may separate each of the at least one feature map into the same number of sub-feature maps as the number of additional GT boxes generated for each of the at least one GT box.

In addition, the additional GT box matching unit may compare the box sizes of the at least one additional GT box and the at least one GT box to match a GT box to each of the at least one additional GT box, and may match each of the at least one additional GT box with a sub-feature map separated from the feature map that learned the GT box matched to that additional GT box.

According to the present invention, every feature map of the SSD can be supported in learning information about objects of various sizes in an image, which can improve the detection performance of the SSD. In addition, the invention may be applied to core fields of the Fourth Industrial Revolution that require detection of objects of various sizes in an image, such as autonomous driving, intelligent surveillance systems, and smart factories.

FIG. 1 is a diagram for explaining an object detection method in an SSD (Single Shot MultiBox Detector) to which a learning apparatus for an object detector according to the present invention is applied.
FIG. 2 is a block diagram of a learning apparatus for an object detector according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining the matching of feature maps and GT boxes in the GT box matching unit shown in FIG. 2.
FIG. 4 is a diagram for explaining the matching of sub-feature maps and additional GT boxes in the additional GT box matching unit shown in FIG. 2.
FIG. 5 is a flowchart of a learning method for an object detector according to an embodiment of the present invention.
FIG. 6 is a diagram showing the result of detecting objects in an image with a conventional SSD and the result of detecting objects with an SSD to which the learning method for an object detector according to the present invention is applied.

Advantages and features of the present invention, and methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and may be implemented in various different forms; the embodiments are provided only so that the disclosure of the present invention is complete and so that those of ordinary skill in the art to which the present invention pertains are fully informed of the scope of the invention, and the present invention is defined only by the scope of the claims. Like reference numerals refer to like elements throughout the specification.

The terms used herein are for describing the embodiments and are not intended to limit the present invention. In this specification, the singular form also includes the plural form unless specifically stated otherwise. As used herein, "comprises" and/or "comprising" does not exclude the presence or addition of one or more components, steps, and operations other than those mentioned.

FIG. 1 is a diagram for explaining an object detection method in an SSD (Single Shot MultiBox Detector) to which a learning apparatus for an object detector according to the present invention is applied.

Referring to FIG. 1, the SSD may detect an object in an image I using feature maps F₁, F₂, and F₃.

The SSD is a real-time detector based on a convolutional neural network (CNN), and is widely used because, compared with conventional image pyramid-based detectors, it can detect objects of various sizes in real time.

Detecting objects of various sizes is important in object detection, and the SSD can detect objects of various sizes in an image using feature maps that each have a different resolution. For example, a large object may be detected using a low-resolution feature map, and a small object may be detected using a high-resolution feature map. Here, a feature map learns knowledge about a GT (Ground Truth) box set on an object in the image using a convolutional neural network, and can then be used for object detection in an image.

The learning apparatus for an object detector according to the present invention may be applied to such an SSD to support the learning of the feature maps used for object detection in an image. The learning apparatus is described in detail below with reference to FIG. 2 and the subsequent figures.

FIG. 2 is a block diagram of a learning apparatus for an object detector according to an embodiment of the present invention.

Referring to FIG. 2, the learning apparatus 1 for an object detector according to an embodiment of the present invention may include a GT box matching unit 10, an additional GT box generating unit 30, a feature map separation unit 50, and an additional GT box matching unit 70.

As described above, the learning apparatus 1 may be applied to an SSD to support the GT box learning of the feature maps used for object detection in an image.

The learning apparatus 1 may constitute some of the modules of the SSD, or may constitute a separate module connected to the SSD.

Software (an application) for supporting the learning of feature maps may be installed and executed on the learning apparatus 1, and the GT box matching unit 10, the additional GT box generating unit 30, the feature map separation unit 50, and the additional GT box matching unit 70 may be controlled by that software.

The GT box matching unit 10, the additional GT box generating unit 30, the feature map separation unit 50, and the additional GT box matching unit 70 may be formed as an integrated module or as one or more modules. Alternatively, each component may be formed as a separate module.

The learning apparatus 1 may be mobile or fixed. The learning apparatus 1 may take the form of a computer, a server, or an engine, and may be referred to by other terms such as device, apparatus, terminal, UE (user equipment), MS (mobile station), MT (mobile terminal), UT (user terminal), SS (subscriber station), wireless device, PDA (personal digital assistant), wireless modem, or handheld device.

Each component of the learning apparatus 1 shown in FIG. 2 is described in detail below.

The GT box matching unit 10 may match at least one feature map used for object detection in an image with at least one GT box set on an object region in the image.

The GT box matching unit 10 may support the learning of feature maps that proceeds in the SSD according to the conventional scheme. The SSD assigns to each of the at least one feature map a GT box set on an object region in the image, and has each feature map learn information about its assigned GT box. Here, a GT box contains information about an object in the image, and its size may be assigned according to the size of the object. That is, each of the at least one feature map may store knowledge about the object contained in the GT box it has learned, and the SSD can detect objects in an image using these feature maps.

To allow each of the at least one feature map to learn at least one GT box in this way, the GT box matching unit 10 may match the at least one feature map with the at least one GT box according to box size. This is described with reference to FIG. 3.

FIG. 3 is a diagram for explaining the matching of feature maps and GT boxes in the GT box matching unit shown in FIG. 2.

Referring to FIG. 3, the SSD may form a feature layer of a first feature map F₁ and a second feature map F₂. Here, the first feature map F₁ may have a higher resolution than the second feature map F₂.

The SSD may detect the dog and cow objects contained in the image. A first GT box g₁ may be set on the dog region in the image, and a second GT box g₂ may be set on the cow region in the image. Here, the first GT box g₁ may be assigned a smaller size than the second GT box g₂.

The GT box matching unit 10 may match the first feature map F₁ with the first GT box g₁, and the second feature map F₂ with the second GT box g₂. That is, the object contained in the first GT box g₁ is assigned to be detected by the first feature map F₁, and the object contained in the second GT box g₂ is assigned to be detected by the second feature map F₂. To this end, the first feature map F₁ may learn the first GT box g₁ and store knowledge about the dog, and the second feature map F₂ may learn the second GT box g₂ and store knowledge about the cow.

For example, the GT box matching unit 10 may sort the at least one feature map forming the feature layer of the SSD in descending order of resolution, sort the at least one GT box set on object regions in the image in ascending order of size, and match the feature maps with the GT boxes in the sorted order.
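
The sort-and-pair example above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the patented implementation: the function name `match_by_rank`, the tuple layouts, the use of box area as the measure of "size", and the numeric resolutions and box dimensions are all hypothetical, and the sketch assumes the same number of feature maps and GT boxes.

```python
def match_by_rank(feature_maps, gt_boxes):
    """Pair feature maps with GT boxes by rank.

    feature_maps: list of (name, resolution) tuples.
    gt_boxes: list of (name, width, height) tuples.
    Feature maps are sorted by resolution, highest first; GT boxes are sorted
    by area, smallest first; the two sorted lists are then paired in order.
    """
    maps_sorted = sorted(feature_maps, key=lambda m: m[1], reverse=True)
    boxes_sorted = sorted(gt_boxes, key=lambda b: b[1] * b[2])
    return {box[0]: fmap[0] for fmap, box in zip(maps_sorted, boxes_sorted)}


# Illustrative values only: F1 has the higher resolution, g1 (dog) is the smaller box.
feature_maps = [("F2", 19), ("F1", 38)]
gt_boxes = [("g2_cow", 200, 150), ("g1_dog", 60, 50)]
print(match_by_rank(feature_maps, gt_boxes))
```

With these inputs the small dog box is paired with the high-resolution map F1 and the large cow box with the low-resolution map F2, which mirrors the FIG. 3 example.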

Meanwhile, in the SSD according to the conventional scheme, each feature map stores only information about the GT boxes pre-assigned to it according to GT box size, so the degree of abstraction (semantic level) differs between feature maps, and feature maps that form a low feature level have a low degree of abstraction. In this embodiment, therefore, every feature map of the SSD is made to learn information about objects of various sizes in the image, which can improve detection performance.

To this end, the additional GT box generating unit 30 may generate additional GT boxes in addition to the GT boxes set on object regions in the image.

The additional GT box generating unit 30 may generate at least one additional GT box for each of the at least one GT box by enlarging or reducing each of the at least one GT box set on an object region in the image.

The additional GT box generating unit 30 may generate, for each of the at least one GT box, at least one additional GT box that has the same center coordinates as the GT box but is enlarged or reduced in size. A detailed description is given later with reference to FIG. 4.

That is, whereas the conventional SSD scheme sets only one GT box on a given object region in an image, the present invention can set on that object region, in addition to the GT box of the conventional SSD scheme, additional GT boxes whose size differs from that of the GT box. Knowledge about a given object in the image can thus be obtained from a plurality of GT boxes of different sizes.

In addition, the feature map separation unit 50 may separate the at least one feature map to generate at least one sub-feature map for each of the at least one feature map.

The feature map separation unit 50 may separate each of the at least one feature map into the same number of sub-feature maps as the number of additional GT boxes generated for each of the at least one GT box. A detailed description is given later with reference to FIG. 4.

The additional GT box matching unit 70 may match the at least one sub-feature map with the at least one additional GT box.

To allow each of the at least one sub-feature map to learn at least one additional GT box, the additional GT box matching unit 70 may match the at least one sub-feature map with the at least one additional GT box according to box size. This is described with reference to FIG. 4.

FIG. 4 is a diagram for explaining the matching of sub-feature maps and additional GT boxes in the additional GT box matching unit shown in FIG. 2.

Referring to FIG. 4, the additional GT box generating unit 30 may enlarge the first GT box g₁ set on the dog region in the image to generate a first additional GT box g₁¹, and may reduce the second GT box g₂ set on the cow region in the image to generate a second additional GT box g₂⁴. Here, the first additional GT box g₁¹ may contain information about a reduced dog in the image, and the second additional GT box g₂⁴ may contain information about a partial region of an enlarged cow in the image.

In this way, the additional GT box generating unit 30 may generate an additional GT box by enlarging or reducing the size of an existing GT box. That is, an additional GT box may be a box set on a partial region of the existing GT box, or a box set on a region that contains the existing GT box.

Although FIG. 4 illustrates the additional GT box generating unit 30 generating one additional GT box from each of the first GT box g₁ and the second GT box g₂, this is not limiting, and more than one additional GT box may of course be generated.

When the additional GT box generating unit 30 generates K additional GT boxes from each of the first GT box g₁ and the second GT box g₂, the set of additional GT boxes generated from the first GT box g₁ may be denoted {g₁¹, …, g₁ᴷ}, and the set of additional GT boxes generated from the second GT box g₂ may be denoted {g₂¹, …, g₂ᴷ}. Here, the additional GT boxes g₁ᵏ and g₂ᵏ have the same center coordinates as the first GT box g₁ and the second GT box g₂, respectively.

If the size of the first GT box g₁ is H × W, the size of the additional GT box g₁ᵏ may be sₖH × sₖW. Here, s = {s₁, …, s_K} is the set of K scale parameters.
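
The scaling rule above (same center, size sₖH × sₖW) can be sketched in a few lines. The box layout `(cx, cy, w, h)`, the function name, and the scale values are assumptions made for illustration; the specification does not fix a box representation or particular scale parameters.

```python
def make_additional_boxes(box, scales):
    """Generate one additional GT box per scale parameter.

    box: (cx, cy, w, h), a center-format box (an assumed representation).
    scales: iterable of scale parameters s_k; s_k > 1 enlarges, s_k < 1 reduces.
    Every generated box keeps the center (cx, cy) of the original box.
    """
    cx, cy, w, h = box
    return [(cx, cy, s * w, s * h) for s in scales]


# Hypothetical GT box and scale set with K = 2.
g1 = (100.0, 80.0, 40.0, 30.0)
additional = make_additional_boxes(g1, [0.5, 2.0])
for k, extra in enumerate(additional, start=1):
    print(k, extra)
```

A scale below 1 yields a box covering part of the original region, and a scale above 1 yields a box that contains it, matching the two cases described for FIG. 4.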

The feature map separation unit 50 may separate each of the first feature map F₁ and the second feature map F₂ forming the feature layer of the SSD.

Although FIG. 4 illustrates the first feature map F₁ and the second feature map F₂ each being separated into four sub-feature maps, this is not limiting; the feature map separation unit 50 may separate each of the first feature map F₁ and the second feature map F₂ into the same number of parts as the number K of additional GT boxes generated per GT box, generating K sub-feature maps from each.

When the feature map separation unit 50 generates K sub-feature maps from each of the first feature map F₁ and the second feature map F₂, the set of sub-feature maps generated from the first feature map F₁ may be denoted {f₁¹, …, f₁ᴷ}, and the set of sub-feature maps generated from the second feature map F₂ may be denoted {f₂¹, …, f₂ᴷ}.
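
The specification states that each feature map is separated into K sub-feature maps but does not say along which dimension. The sketch below assumes, purely for illustration, an even split along the channel dimension, with a feature map modeled as a plain list of channels; the function name and the channel count are hypothetical.

```python
def split_channels(feature_map, k):
    """Split a feature map into k sub-feature maps of equal channel count.

    feature_map: list of C channels (each channel can be any object).
    Assumption: the separation is channel-wise and C is divisible by k.
    """
    c = len(feature_map)
    if c % k != 0:
        raise ValueError("channel count must be divisible by k")
    step = c // k
    return [feature_map[i * step:(i + 1) * step] for i in range(k)]


# Hypothetical feature map F1 with 8 channels, separated into K = 4 sub-feature maps.
F1 = [f"channel_{i}" for i in range(8)]
sub_maps = split_channels(F1, 4)
for k, sub in enumerate(sub_maps, start=1):
    print(f"f1^{k}:", sub)
```

Under this assumption the sub-feature maps share the spatial resolution of their parent feature map, so each can be matched to additional GT boxes of the size that the parent map handles.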

The additional GT box matching unit 70 may assign each additional GT box (g_n^k), according to its box size, to be learned in a sub-feature map (f₁^k) generated from the first feature map (F₁) or in a sub-feature map (f₂^k) generated from the second feature map (F₂).

That is, the additional GT box matching unit 70 may match the first additional GT box (g₁¹) to the sub-feature map (f₂¹) separated from the second feature map (F₂), and may match the second additional GT box (g₂⁴) to the sub-feature map (f₁⁴) separated from the first feature map (F₁).

Accordingly, the object contained in the first additional GT box (g₁¹) is assigned to be detected by the second feature map (F₂) as well, and the object contained in the second additional GT box (g₂⁴) is assigned to be detected by the first feature map (F₁) as well. To this end, the sub-feature map (f₂¹) separated from the second feature map (F₂) learns the first additional GT box (g₁¹) and thereby stores knowledge of the reduced dog in the image, and the sub-feature map (f₁⁴) separated from the first feature map (F₁) learns the second additional GT box (g₂⁴) and thereby stores knowledge of a partial region of the enlarged cow in the image.

For example, the additional GT box matching unit 70 may compare the box sizes of the additional GT boxes and the GT boxes to match a GT box to each additional GT box. The additional GT box matching unit 70 may then match each additional GT box to a sub-feature map separated from the feature map that learned the GT box matched to that additional GT box.

That is, the first additional GT box (g₁¹) may be matched to the second GT box (g₂), which has a similar box size, and the second additional GT box (g₂⁴) may be matched to the first GT box (g₁), which has a similar box size. The first additional GT box (g₁¹) is then matched to the sub-feature map (f₂¹) separated from the second feature map (F₂), which learned the second GT box (g₂), and the second additional GT box (g₂⁴) is matched to the sub-feature map (f₁⁴) separated from the first feature map (F₁), which learned the first GT box (g₁).
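The two-stage matching described above (additional GT box to the most similar GT box, then to a sub-feature map of the feature map that learned that GT box) can be sketched as follows. The text does not define how box sizes are compared; using the absolute difference in box area is an assumption of this sketch, and the function name, the data layout, and the example sizes are hypothetical.

```python
from typing import Dict, List, Tuple

Size = Tuple[float, float]  # (width, height)


def match_additional_boxes(
    gt_sizes: List[Size],
    gt_to_map: List[int],
    additional: List[Tuple[int, int, Size]],
) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Assign each additional GT box g_n^k to a sub-feature map f_m^k.

    gt_sizes[i]  : size of the i-th GT box
    gt_to_map[i] : index m of the feature map F_m that learned the i-th GT box
    additional   : entries (n, k, size) describing additional GT box g_n^k

    Assumption: "similar box size" is measured by absolute area difference.
    """
    assignment = {}
    for n, k, (w, h) in additional:
        nearest = min(
            range(len(gt_sizes)),
            key=lambda i: abs(gt_sizes[i][0] * gt_sizes[i][1] - w * h),
        )
        # The additional box goes to the k-th sub-feature map of F_m.
        assignment[(n, k)] = (gt_to_map[nearest], k)
    return assignment


# g1 (large, learned by F1) and g2 (small, learned by F2), illustrative sizes.
gt_sizes = [(200.0, 160.0), (50.0, 40.0)]
gt_to_map = [1, 2]
# g1^1 is g1 reduced; g2^4 is g2 enlarged.
additional = [(1, 1, (50.0, 40.0)), (2, 4, (200.0, 160.0))]
result = match_additional_boxes(gt_sizes, gt_to_map, additional)
```

With these example values, g₁¹ is assigned to f₂¹ and g₂⁴ to f₁⁴, which mirrors the example in the text.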

In this case, each feature map learns additional GT boxes whose sizes are similar to the size of the GT boxes initially assigned to it according to its resolution. That is, each feature map may store knowledge of enlarged or reduced objects contained in regions of similar size in the image.

As described above, the learning device 1 of the object detector according to an embodiment of the present invention separates the GT boxes and the feature maps so that every feature map of the SSD can learn information about every object in the image, which can improve the detection performance of the SSD.

Hereinafter, a learning method of an object detector according to an embodiment of the present invention will be described with reference to FIG. 5.

FIG. 5 is a flowchart of a learning method of an object detector according to an embodiment of the present invention.

The learning method of the object detector according to an embodiment of the present invention may be performed in substantially the same configuration as the learning device 1 of the object detector shown in FIG. 2. Therefore, components identical to those of the learning device 1 shown in FIG. 2 are given the same reference numerals, and repeated descriptions are omitted.

Referring to FIG. 5, the GT box matching unit 10 may match the feature maps with the GT boxes so that the GT boxes are learned in the feature maps (S100).

The GT box matching unit 10 may match at least one feature map used by the SSD for object detection in an image with at least one GT box set on an object region in the image, according to box size, so that each matched GT box is learned in the at least one feature map.

The additional GT box generating unit 30 may generate additional GT boxes from the GT boxes (S200).

For each of the at least one GT box, the additional GT box generating unit 30 may generate at least one additional GT box that has the same center coordinates as the GT box but an enlarged or reduced size.

The feature map separation unit 50 may generate sub-feature maps from the feature maps (S300).

The feature map separation unit 50 may separate each of the at least one feature map into the same number of parts as the number of additional GT boxes generated for each of the at least one GT box, thereby generating sub-feature maps for each of the at least one feature map.

The additional GT box matching unit 70 may match the sub-feature maps with the additional GT boxes so that the additional GT boxes are learned in the sub-feature maps (S400).

The additional GT box matching unit 70 may match at least one sub-feature map with at least one additional GT box according to box size, so that each matched additional GT box is further learned in the at least one sub-feature map.

For example, the additional GT box matching unit 70 may compare the box sizes of the at least one additional GT box and the at least one GT box to match a GT box to each of the at least one additional GT box. The additional GT box matching unit 70 may then match each of the at least one additional GT box to a sub-feature map separated from the feature map that learned the GT box matched to that additional GT box.

Accordingly, each feature map can store knowledge of the objects, and of the enlarged or reduced objects, contained in regions of similar size in the image. That is, information about an object in the image can be learned by the feature maps at various sizes. An SSD that detects objects using such feature maps can therefore be expected to show improved detection performance.

Meanwhile, in the step of matching the feature maps with the GT boxes so that the GT boxes are learned in the feature maps (S100) and the step of matching the sub-feature maps with the additional GT boxes so that the additional GT boxes are learned in the sub-feature maps (S400), each feature map or sub-feature map may learn its matched GT boxes or additional GT boxes using a convolutional neural network (CNN).

Here, the convolutional neural network (CNN) may use the multibox loss function, which is widely used for training such networks. If the set of GT boxes is denoted G and the prediction for G is denoted x, the multibox loss function can be written as L(x, G). The multibox loss function may consist of a softmax loss on the confidence of the object classification result and a smooth L1 loss on the object location estimate.
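To make the two components concrete, the following NumPy sketch computes a softmax cross-entropy term and a smooth L1 term and combines them. It is a simplified illustration under stated assumptions, not the loss used by the inventors: it assumes predictions have already been matched to targets, it omits default-box matching and hard negative mining, and the balancing factor `alpha` and the normalization by the number of matched boxes are assumptions borrowed from common SSD practice.

```python
import numpy as np


def smooth_l1(pred: np.ndarray, target: np.ndarray) -> float:
    """Smooth L1 loss summed over all elements (0.5*d^2 if |d| < 1, else |d| - 0.5)."""
    diff = np.abs(pred - target)
    return float(np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum())


def softmax_cross_entropy(logits: np.ndarray, labels: np.ndarray) -> float:
    """Softmax cross-entropy summed over boxes; logits is (N, classes), labels is (N,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].sum())


def multibox_loss(logits, labels, loc_pred, loc_target, alpha: float = 1.0) -> float:
    """Simplified L(x, G): (confidence loss + alpha * localization loss) / N."""
    n = max(len(labels), 1)  # assumed normalization by number of matched boxes
    conf = softmax_cross_entropy(logits, labels)
    loc = smooth_l1(loc_pred, loc_target)
    return (conf + alpha * loc) / n


# Tiny usage example with one matched box and two classes.
example = multibox_loss(
    np.zeros((1, 2)), np.array([0]),
    np.array([[0.5, 2.0, 0.0, 0.0]]), np.zeros((1, 4)),
)
```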

The feature map (F_n) and the sub-feature map (f_n^k) may pass through their respective convolution layers to output the final results x_org and x_k, and may learn the matched GT boxes or additional GT boxes using the final loss function given in Equation 1 below.

In Equation 1, G_org and G^k denote the set of GT boxes and the set of additional GT boxes whose size is changed by s_k, respectively, and the weight may be set to, for example, 0.25.
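For orientation only, one form that is consistent with the symbols described here is a weighted sum of the multibox loss on the original branch and the multibox losses on the K sub-feature-map branches. The expression below is an assumed sketch, not the patent's Equation 1: the symbol λ for the weight, the name L_final, and the placement of the weight on the additional terms are all assumptions.

```latex
% Assumed form; \lambda and L_{\mathrm{final}} are hypothetical names.
L_{\mathrm{final}} = L\left(x_{org}, G_{org}\right)
  + \lambda \sum_{k=1}^{K} L\left(x_{k}, G^{k}\right),
\qquad \text{e.g. } \lambda = 0.25
```

Under this reading, the original GT boxes dominate the objective and the additional GT boxes act as an auxiliary training signal.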

Hereinafter, the advantageous effects of an SSD to which the learning method of the object detector according to the present invention is applied will be described.

FIG. 6 shows the results of detecting objects in images with a conventional SSD and with an SSD to which the learning method of the object detector according to the present invention is applied.

FIG. 6(a) shows example images of detection results from a conventional SSD. Each feature layer of the SSD learns only information about objects of its assigned size, so the degree of abstraction differs between feature layers, and the abstraction of some objects assigned to the low-resolution feature layers is weak.

FIG. 6(b) shows example images of detection results from an SSD to which the learning method of the object detector according to the present invention is applied. Each feature layer of the SSD learns not only information about objects of its assigned size but also information about other objects enlarged or reduced to that size, so the degree of abstraction is the same across feature layers and the abstraction of the objects in the image is increased.

Comparing FIG. 6(a) with FIG. 6(b), the conventional SSD fails to detect some objects, whereas the SSD to which the learning method of the object detector according to the present invention is applied accurately detects more objects.

As such, the learning method of the object detector according to the present invention has the advantageous effect of improving the detection performance of the SSD.

The learning method of the object detector of the present invention may be implemented in the form of program instructions executable by various computer components and recorded on a computer-readable recording medium. The computer-readable recording medium may include program instructions, data files, data structures, and the like, alone or in combination.

The program instructions recorded on the computer-readable recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the computer software field.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware device may be configured to operate as one or more software modules to perform the processing according to the present invention, and vice versa.

Although embodiments of the present invention have been described above with reference to the accompanying drawings, those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be embodied in other specific forms without changing its technical spirit or essential features. Therefore, the embodiments described above should be understood as illustrative in all respects and not restrictive.

1: learning device of the object detector
10: GT box matching unit
30: additional GT box generating unit
50: feature map separation unit
70: additional GT box matching unit

Claims

A learning method of an object detector, comprising:
matching at least one feature map used for object detection in an image with at least one ground truth (GT) box set on an object region in the image according to box size, so that each matched GT box is learned in the at least one feature map;
generating at least one additional GT box for each of the at least one GT box by enlarging or reducing the at least one GT box;
separating the at least one feature map to generate at least one sub-feature map for each of the at least one feature map; and
matching the at least one sub-feature map with the at least one additional GT box according to box size, so that each matched additional GT box is further learned in the at least one sub-feature map.

The method of claim 1,
wherein generating at least one additional GT box for each of the at least one GT box by enlarging or reducing the at least one GT box
comprises generating, for each of the at least one GT box, at least one additional GT box that has the same center coordinates as the GT box but an enlarged or reduced size.

The method of claim 1,
wherein separating the at least one feature map to generate at least one sub-feature map for each of the at least one feature map
comprises separating each of the at least one feature map into the same number of parts as the number of additional GT boxes generated for each of the at least one GT box.

The method of claim 1,
wherein matching the at least one sub-feature map with the at least one additional GT box according to box size, so that each matched additional GT box is further learned in the at least one sub-feature map, comprises:
comparing the box sizes of the at least one additional GT box and the at least one GT box to match a GT box to each of the at least one additional GT box; and
matching each of the at least one additional GT box with a sub-feature map separated from the feature map that learned the GT box matched to that additional GT box.

The method of claim 1,
wherein matching the at least one sub-feature map with the at least one additional GT box according to box size, so that each matched additional GT box is further learned in the at least one sub-feature map,
comprises learning each matched additional GT box in the at least one sub-feature map using a convolutional neural network (CNN).

A computer-readable recording medium on which a computer program for performing the learning method of an object detector according to any one of claims 1 to 5 is recorded.

A learning device of an object detector, comprising:
a GT box matching unit that matches at least one feature map used for object detection in an image with at least one ground truth (GT) box set on an object region in the image according to box size, so that the at least one feature map learns the at least one GT box;
an additional GT box generating unit that generates at least one additional GT box for each of the at least one GT box by enlarging or reducing the at least one GT box;
a feature map separation unit that separates the at least one feature map to generate at least one sub-feature map for each of the at least one feature map; and
an additional GT box matching unit that matches the at least one sub-feature map with the at least one additional GT box according to box size, so that the at least one additional GT box is further learned in each of the at least one sub-feature map.

The learning device of claim 7,
wherein the additional GT box generating unit
generates, for each of the at least one GT box, at least one additional GT box that has the same center coordinates as the GT box but an enlarged or reduced size.

The learning device of claim 7,
wherein the feature map separation unit
separates each of the at least one feature map into the same number of parts as the number of additional GT boxes generated for each of the at least one GT box.

The learning device of claim 7,
wherein the additional GT box matching unit
compares the box sizes of the at least one additional GT box and the at least one GT box to match a GT box to each of the at least one additional GT box, and matches each of the at least one additional GT box with a sub-feature map separated from the feature map that learned the GT box matched to that additional GT box.