KR102429272B1

KR102429272B1 - Object detection apparatus based on deep learning and method therefor

Info

Publication number: KR102429272B1
Application number: KR1020200072281A
Authority: KR
Inventors: 김형준; 김지훈; 오창석; 류우섭; 차진혁; 김문현
Original assignee: 주식회사 베이리스
Priority date: 2020-06-15
Filing date: 2020-06-15
Publication date: 2022-08-04
Also published as: KR20210155142A

Abstract

An object detection method based on deep learning comprises the steps of: (a) extracting a feature map from an input image; and (b) receiving the feature map extracted in step (a) and, using a region proposal network, calculating for each anchor included in the region proposal network whether the anchor contains an object, the region of the image in which the object appears, and the actual physical position value of the object.

Description

Object detection device and method based on deep learning

The present invention relates to an object detection apparatus based on deep learning and a method therefor.

In deep learning, the field of algorithms that detect objects is called object detection. Particularly well-known algorithms in this field are Faster R-CNN and derivative algorithms based on Faster R-CNN. Many attempts have been made to classify items stacked in real-life settings with Faster R-CNN and its derivatives, but limited accuracy and various other constraints make it difficult to turn them into actual products.

Although the conventional Faster R-CNN technique achieves high accuracy, the main reason it is not applied to detecting actual products is the gap between the datasets used to measure accuracy in the literature and the way real products are arranged. In particular, an environment such as a refrigerator packed with items is rarely seen in the datasets used for existing deep-learning-based object detection. In other words, research specialized in detecting large numbers of items is hard to find.

For example, the widely used COCO dataset consists of images of many different shapes and backgrounds taken in everyday environments rather than in constrained product environments, so an image containing about 10 to 20 objects already counts as one with many objects.

FIG. 1 shows an example image of a display shelf.

As can be seen from FIG. 1, in environments such as refrigerators and ordinary market shelves it is typical for 50 to 100 items to be stacked on a single shelf. In other words, the environment underlying the data is very different between a shelf and the COCO dataset. Because the performance of deep learning depends directly on the data it is trained on, existing algorithms are inevitably limited in such a different environment.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks (arXiv:1506.01497v3 [cs.CV], submitted 2015.06.04, revised 2016.06.06).

The present invention aims to solve the technical problems described above. Its object is to provide a deep-learning-based object detection apparatus and method that use Faster R-CNN but add the actual physical location information of each object, thereby not only increasing accuracy but also reducing the amount of computation.

The object detection method based on deep learning of the present invention comprises the steps of: (a) extracting a feature map from an input image; (b) receiving the feature map extracted in step (a) and, using a region proposal network, calculating for each anchor included in the region proposal network whether the anchor contains an object, the region of the image in which the object appears, and the actual physical position value of the object; (c) using the region proposal network, selecting anchors such that exactly one anchor corresponds to each actual physical position value calculated in step (b); and (d) using the one anchor selected in step (c), identifying the type of object located at the corresponding physical position value.

In addition, the region proposal network is trained using a loss function, and the loss function includes a third loss term related to the actual physical position value of the object included in the image.

The third loss term preferably uses binary cross entropy.

Specifically, when the storage surface on which the multiple objects in the image can be placed is represented as a grid of a rows and b columns, the actual physical position value of an object is the row and column of the grid cell it occupies.

The deep-learning-based object detection apparatus of the present invention extracts a feature map from an input image, receives the extracted feature map and, using a region proposal network, calculates for each anchor included in the region proposal network whether the anchor contains an object, the region of the image in which the object appears, and the actual physical position value of the object.

In addition, the object detection apparatus of the present invention preferably selects anchors such that exactly one anchor corresponds to each calculated actual physical position value.

The object detection apparatus then uses the selected anchor to identify the type of object located at the corresponding physical position value.

Specifically, the region proposal network is preferably trained using a loss function that includes a third loss term related to the actual physical position value of the object included in the image.

In addition, the third loss term uses binary cross entropy.

In addition, when the storage surface on which the multiple objects in the image can be placed is represented as a grid of a rows and b columns, the actual physical position value of an object is the row and column of the grid cell it occupies.

According to the deep-learning-based object detection apparatus and method of the present invention, using Faster R-CNN while adding the actual physical location information of objects can both increase accuracy and reduce the amount of computation.

FIG. 1 is an example image of a display shelf.
FIG. 2 is a block diagram of a deep-learning-based object detection apparatus according to a preferred embodiment of the present invention.
FIG. 3 is a diagram explaining the actual physical position value of an object.
FIG. 4 is a diagram explaining how objects are detected with the conventional Faster R-CNN.
FIG. 5 is a flowchart of a deep-learning-based object detection method according to a preferred embodiment of the present invention.

Hereinafter, a deep-learning-based object detection apparatus and method according to embodiments of the present invention will be described in detail with reference to the accompanying drawings. The following embodiments merely illustrate the present invention and do not restrict or limit its scope. Anything that a person skilled in the art can readily infer from the detailed description and embodiments is construed as falling within the scope of the present invention.

First, FIG. 2 shows a block diagram of the deep-learning-based object detection apparatus 100 according to a preferred embodiment of the present invention.

As can be seen from FIG. 2, the deep-learning-based object detection apparatus 100 according to a preferred embodiment of the present invention includes a feature map extractor 10, a calculator 20, and an object specifier 30. The apparatus 100 may be implemented as an electronic device including a processor, and the feature map extractor 10, calculator 20, and object specifier 30 may each be implemented using at least part of that processor.

The object detection apparatus 100 of the present invention is built by adapting Faster R-CNN.

The feature map extractor 10 extracts feature maps from the input image using a convolutional neural network.

This convolutional neural network is generally called the backbone, and it is usually obtained through transfer learning. Faster R-CNN typically uses ResNet or ResNeXt as the backbone.

The calculator 20 uses a region proposal network (RPN). Specifically, the calculator 20 receives the feature map output by the feature map extractor 10 and calculates, for each anchor included in the region proposal network, whether the anchor contains an object, the region of the image in which the object appears, and the actual physical position value of the object.

An anchor is a region of predefined shape in which an object is likely to exist.

Examples of objects in the present invention include items such as milk and beverages displayed in a refrigerator, on a display shelf, and the like.

Unlike the conventional Faster R-CNN, the calculator 20 of the present invention additionally calculates the actual physical position value of each object included in the image.

For reference, whether an anchor contains an object corresponds to the class in the conventional Faster R-CNN, and the region of the image in which the object appears corresponds to the bounding box (Bbox) in the conventional Faster R-CNN.

That is, the region proposal network is used to find regions where objects are likely to be, and it incorporates the concept of anchors. Each anchor represents a region where an object is likely to be; multiple anchors are selected and each of them is trained. The output obtained from each anchor consists of a class (a value indicating whether or not the anchor contains an object) and the object's region information, called a bounding box (center x coordinate, center y coordinate, width, and height in the image).
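For readers unfamiliar with anchors, the sketch below shows how a Faster R-CNN-style RPN typically lays them out: one anchor per feature-map cell, scale, and aspect ratio, each stored in the (center x, center y, width, height) form described above. The stride of 16 and the three scales and ratios are illustrative assumptions, not values taken from this patent.

```python
from itertools import product


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as (cx, cy, w, h) tuples in image pixels.

    One anchor is placed per (feature cell, scale, ratio) combination,
    centered on the image location that the feature cell maps to.
    The stride, scales, and ratios are illustrative assumptions.
    """
    anchors = []
    for i, j in product(range(feat_h), range(feat_w)):
        cx = (j + 0.5) * stride  # center of the cell in image coordinates
        cy = (i + 0.5) * stride
        for s, r in product(scales, ratios):
            # keep the area at s * s while making height / width equal to r
            w = s / r ** 0.5
            h = s * r ** 0.5
            anchors.append((cx, cy, w, h))
    return anchors
```

A 2×3 feature map therefore yields 2 × 3 × 9 = 54 anchors; in the invention each of them would additionally carry an a×b physical-position output.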

In addition, in the present invention each anchor also outputs the actual physical position value of the object. That is, the present invention uses not only the bounding box, which is the object's position in the image, but also a coordinate value representing the object's physical position on a storage surface such as an actual display shelf.

FIG. 3 is a diagram explaining the actual physical position value of an object.

As can be seen from FIG. 3, in the present invention the storage surface of a refrigerator, display shelf, or the like on which objects are actually placed is partitioned into a grid. That is, when the storage surface on which the multiple objects in the image can be placed is represented as a grid of a rows and b columns, the actual physical position value of an object is the row and column of the grid cell it occupies. For reference, in FIG. 3 both a and b are 8.
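The patent does not say how ground-truth grid labels are produced. One plausible way, sketched below under the assumption that each object's center is known in surface coordinates measured from the top-left corner, is to scale the point onto the a×b grid. Rows and columns are 1-based here so that the milk example reads as (2, 3); that indexing convention is also an assumption.

```python
def grid_cell(x, y, surface_w, surface_d, a=8, b=8):
    """Map a point on the storage surface to a 1-based (row, column) cell.

    x runs along the width (columns) and y along the depth (rows), both
    measured from the surface's top-left corner in the same unit as
    surface_w and surface_d. This coordinate convention is an assumption.
    """
    if not (0 <= x <= surface_w and 0 <= y <= surface_d):
        raise ValueError("point lies outside the storage surface")
    # min() keeps points lying exactly on the far edge inside the last cell
    row = min(int(y / surface_d * a), a - 1) + 1
    col = min(int(x / surface_w * b), b - 1) + 1
    return row, col
```

For example, on a hypothetical 80 cm × 80 cm shelf split 8×8, an item centered at x = 25 cm, y = 15 cm falls in row 2, column 3.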

In other words, the actual physical position information of an object can be represented as an a×b map (a rows by b columns).

For example, in FIG. 3 the milk has the physical position value of row 2, column 3. For reference, the present invention assumes that each object lies entirely within a single row and a single column.

As in the conventional Faster R-CNN, the region proposal network is trained using a loss function. However, as expressed in [Equation 1], the loss function of the present invention includes not only a first loss term (L₁) related to whether the anchor contains an object and a second loss term (L₂) related to the region of the image in which the object appears, but also a third loss term (L₃) related to the actual physical position value of the object.
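The image of [Equation 1] is not reproduced in this text. Assuming it combines the three terms additively, as the description of L₁, L₂, and L₃ suggests, it would take roughly the following form; the balancing weights λ₂ and λ₃ are hypothetical (setting both to 1 gives a plain sum), and the authors' actual weighting is not recoverable here.

```latex
L = L_1 + \lambda_2 L_2 + \lambda_3 L_3
```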

That is, because the present invention additionally obtains the actual physical position of the object as an output of each anchor in the region proposal network, the loss corresponding to that physical position must be calculated separately, and training proceeds by adjusting the weights involved in computing the physical position value. This is the third loss term (L₃).

The actual physical position information of an object is represented as an a×b map. Since an object can occupy only one physical position, training should drive the map cell corresponding to the object's physical position toward '1' and all other cells toward '0'. That is, in FIG. 3 the milk's map has the value '1' at row 2, column 3, i.e., (2, 3), and '0' everywhere else.
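The ground-truth map described above is simply a one-hot a×b grid. A minimal sketch, assuming 1-based row and column indices so that the milk example is written as (2, 3):

```python
def heatmap_target(row, col, a=8, b=8):
    """Build the a x b ground-truth map for one object: 1 at its cell, 0 elsewhere.

    row and col are 1-based (an assumed convention matching the (2, 3) example).
    """
    target = [[0.0] * b for _ in range(a)]
    target[row - 1][col - 1] = 1.0
    return target
```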

A map of this form is called a heatmap, and since it is a binary map consisting of '0's and '1's, its loss is calculated with binary cross entropy loss.

That is, the third loss term (L₃) uses binary cross entropy.

Specifically, the third loss term (L₃) can be expressed as in [Equation 2].
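The image of [Equation 2] is missing from this text. Based on the symbols defined for it (N, k, p, q, h, y), a plausible form is standard binary cross entropy summed over the a×b cells and averaged over the N anchors; the exact normalization the authors used is an assumption here.

```latex
L_3 = -\frac{1}{N}\sum_{k=1}^{N}\sum_{p=1}^{a}\sum_{q=1}^{b}
      \Big[\, y_{k,p,q}\log h_{k,p,q} + \big(1 - y_{k,p,q}\big)\log\big(1 - h_{k,p,q}\big) \Big]
```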

In [Equation 2], N is the number of anchors used in one training step, k indexes each anchor, p and q index the physical position value (row and column), h is the predicted heatmap, and y indicates whether an object is present at that physical position in the ground truth.
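A plain-Python sketch of the third loss term, assuming per-cell binary cross entropy summed over the a×b map and averaged over the N anchors (the normalization is an assumption). In practice a framework loss such as PyTorch's BCE would be used; this version only makes the arithmetic explicit.

```python
import math


def physical_position_loss(heatmaps, targets, eps=1e-7):
    """Binary cross entropy between predicted and ground-truth a x b maps.

    heatmaps[k][p][q] is the predicted probability h that anchor k's object
    sits in cell (p, q); targets[k][p][q] is the 0/1 ground truth y. The loss
    is summed over cells and averaged over the N anchors (assumed normalization).
    """
    n = len(heatmaps)
    total = 0.0
    for h_map, y_map in zip(heatmaps, targets):
        for h_row, y_row in zip(h_map, y_map):
            for h, y in zip(h_row, y_row):
                h = min(max(h, eps), 1.0 - eps)  # clamp to avoid log(0)
                total += -(y * math.log(h) + (1.0 - y) * math.log(1.0 - h))
    return total / n
```

A perfect prediction gives a loss near zero, while a flat 0.5 prediction over a 2×2 map costs 4·ln 2 ≈ 2.77 per anchor.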

The calculated loss is used to adjust each weight through backpropagation in Faster R-CNN.

For each calculated actual physical position value of an object, the calculator 20 preferably selects and outputs exactly one corresponding anchor. Specifically, the calculator 20 selects the anchor with the highest class score. Accordingly, the calculator 20 preferably outputs the anchor's physical position value and bounding box information to the object specifier 30 only when the anchor is a positive anchor.
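The selection rule above can be sketched as a single pass that keeps, per grid cell, the positive anchor with the highest class score. The dictionary layout (with 'cell' taken as the argmax of the anchor's a×b heatmap) and the 0.5 positive threshold are illustrative assumptions, not details from the patent.

```python
def select_one_anchor_per_cell(anchors, score_threshold=0.5):
    """Keep, for every physical grid cell, only the highest-scoring positive anchor.

    Each anchor is a dict with 'score' (class score), 'cell' ((row, col),
    the argmax of its a x b heatmap) and 'bbox' ((cx, cy, w, h)).
    The dict layout and the 0.5 positive threshold are illustrative assumptions.
    """
    best = {}
    for anchor in anchors:
        if anchor["score"] < score_threshold:  # not a positive anchor
            continue
        cell = anchor["cell"]
        if cell not in best or anchor["score"] > best[cell]["score"]:
            best[cell] = anchor
    # one (cell -> bbox) proposal per occupied cell, passed on for classification
    return {cell: a["bbox"] for cell, a in best.items()}
```

Because grouping is by grid cell rather than by box overlap, no pairwise IoU computation is needed at this stage.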

The object specifier 30 uses the one anchor selected by the calculator 20 to identify the type of object located at the corresponding physical position value. That is, the object specifier 30 classifies the object. Specifically, the object specifier 30 receives the anchor's physical position value and bounding box information from the calculator 20 as a proposal only when the anchor is a positive anchor.

For example, in the case of FIG. 3, the object specifier 30 identifies the object located at row 2, column 3 as 'milk'.

Specifically, the object specifier 30 receives the feature map output by the feature map extractor 10 and the one anchor selected by the calculator 20, performs RoI pooling, and classifies the object for the single anchor selected for each physical position value. The object specifier 30 also adjusts the bounding box produced by the calculator 20 (the region of the image in which the object appears), making that region more accurate.
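RoI pooling, named above, turns a variable-sized region of the feature map into a fixed-size grid so that one classifier can handle every proposal. The sketch below max-pools a single-channel map given as nested lists; bin edges use integer division, a simplification of the quantization in the original Faster R-CNN, and the 2×2 output size is an arbitrary example.

```python
def roi_max_pool(feature, roi, out_h=2, out_w=2):
    """Max-pool one region of a 2-D feature map into a fixed out_h x out_w grid.

    feature is a list of rows; roi is (x0, y0, x1, y1) in feature-map cells,
    inclusive of x0/y0 and exclusive of x1/y1. Bin edges use integer division,
    a simplification of the quantization in Faster R-CNN.
    """
    x0, y0, x1, y1 = roi
    h, w = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        ys = y0 + i * h // out_h
        ye = max(y0 + (i + 1) * h // out_h, ys + 1)  # every bin covers >= 1 row
        row = []
        for j in range(out_w):
            xs = x0 + j * w // out_w
            xe = max(x0 + (j + 1) * w // out_w, xs + 1)  # and >= 1 column
            row.append(max(feature[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        pooled.append(row)
    return pooled
```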

For reference, this object identification corresponds to classification in the conventional Faster R-CNN.

The object specifier 30 must be trained separately from the calculator 20.

FIG. 4 is a diagram explaining how objects are detected with the conventional Faster R-CNN.

As can be seen from FIG. 4, the conventional Faster R-CNN must identify overlapping detections at its final stage and resolve them using non-maximum suppression (NMS).

That is, the conventional Faster R-CNN takes every anchor likely to contain an object (positive anchor), classifies each of them via RoI pooling, and checks the object detected for each anchor. Because there are more anchors than objects actually placed, the same object is often detected several times at slightly different locations (overlapping detections). Only the best box among these should be kept, and the method used for this is called NMS (non-maximum suppression). Specifically, NMS compares the scores of the redundant detections, keeps only the highest-scoring one, and removes all the others.
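For comparison, the conventional post-processing step is typically standard greedy NMS, sketched below. Boxes are given in corner form (x0, y0, x1, y1), and the 0.5 IoU threshold is a common default chosen here for illustration. Note that every kept box must be compared against all remaining boxes, which is the pairwise work the grid-based selection avoids.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x0, y0, x1, y1)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the boxes that are kept."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest remaining score
        keep.append(best)
        # drop every remaining box that overlaps the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep
```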

In the object detection apparatus 100 of the present invention, by contrast, the calculator 20 checks the actual physical position values of the anchors and, for anchors that belong to the same object, keeps only the one with the highest class score and removes all the others before output. Because the object specifier 30 therefore receives only one anchor per physical position value, the number of anchors to be classified drops to about 1/10, and post-processing such as NMS is no longer needed, which simplifies the pipeline.

FIG. 5 is a flowchart of a deep-learning-based object detection method according to a preferred embodiment of the present invention.

Because the deep-learning-based object detection method according to a preferred embodiment of the present invention uses the object detection apparatus 100 described above, it naturally includes all features of the apparatus 100 even where they are not described again.

In addition, the deep-learning-based object detection method according to a preferred embodiment of the present invention may be implemented as a computer program executed by a processor.

As can be seen from FIG. 5, the deep-learning-based object detection method according to a preferred embodiment of the present invention includes: extracting a feature map from an input image (S10); receiving the feature map output in step S10 and, using a region proposal network (RPN), calculating for each anchor included in the region proposal network whether the anchor contains an object, the region of the image in which the object appears, and the actual physical position value of the object (S20); using the region proposal network, selecting exactly one anchor for each actual physical position value calculated in step S20 (S30); and identifying, using the one anchor selected in step S30, the type of object located at the corresponding physical position value (S40).
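The control flow of steps S10 to S40 can be summarized as a small driver that takes each stage as a caller-supplied function. The function and parameter names below are hypothetical placeholders for the feature map extractor, RPN, selection, and classification stages; the sketch shows only how data moves between the steps, not how any stage is implemented.

```python
def detect_objects(image, backbone, rpn, select, classify):
    """Run steps S10-S40 with caller-supplied stage functions (all hypothetical).

    backbone(image)          -> feature map                                (S10)
    rpn(features)            -> per-anchor class score, bbox, grid cell    (S20)
    select(anchors)          -> {cell: bbox}, one anchor per grid cell     (S30)
    classify(features, bbox) -> object type such as "milk"                 (S40)
    """
    features = backbone(image)              # S10: extract the feature map
    anchors = rpn(features)                 # S20: score, box, physical cell per anchor
    proposals = select(anchors)             # S30: one anchor per physical position
    return {cell: classify(features, bbox)  # S40: identify the object type per cell
            for cell, bbox in proposals.items()}
```

With stub stages, a single anchor in cell (2, 3) classified as "milk" yields the result {(2, 3): "milk"}.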

In addition, the deep-learning-based object detection method according to a preferred embodiment of the present invention preferably further includes adjusting the region of the image in which the object appears, as calculated in step S20.

The region proposal network is trained using a loss function, which preferably includes a third loss term related to the actual physical position value of the object included in the image.

Specifically, the third loss term uses binary cross entropy.

In addition, when the storage surface on which the multiple objects in the image can be placed is represented as a grid of a rows and b columns, the actual physical position value of an object means the row and column of the grid cell it occupies.

As described above, the deep-learning-based object detection apparatus 100 and method of the present invention use Faster R-CNN while adding the actual physical location information of objects, which can both increase accuracy and reduce the amount of computation.

100: object detection apparatus
10: feature map extractor
20: calculator
30: object specifier

Claims

An object detection method based on deep learning, comprising:
(a) extracting a feature map from an input image;
(b) receiving the feature map extracted in step (a) and, using a region proposal network, calculating for each anchor included in the region proposal network whether the anchor contains an object, the region of the image in which the object appears, and the actual physical position value of the object; and
(c) using the region proposal network, selecting anchors such that exactly one anchor corresponds to each actual physical position value calculated in step (b),
wherein the region proposal network is trained using a loss function, and
the loss function includes a third loss term related to the actual physical position value of the object included in the image.

(deleted)

According to claim 1,
The object detection method further comprising:
(d) identifying, using the anchor selected in step (c), the type of the object located at the corresponding physical position value.

(deleted)

According to claim 1,
wherein the third loss term
uses binary cross entropy.

According to claim 1,
wherein the actual physical position value of the object
is the row and column information of a grid when the storage surface on which the multiple objects included in the image can be placed is represented as a grid of a rows and b columns.

An object detection apparatus based on deep learning,
wherein the object detection apparatus:
extracts a feature map from an input image;
receives the extracted feature map and, using a region proposal network, calculates for each anchor included in the region proposal network whether the anchor contains an object, the region of the image in which the object appears, and the actual physical position value of the object; and
selects anchors such that exactly one anchor corresponds to each calculated actual physical position value,
wherein the region proposal network is trained using a loss function, and
the loss function includes a third loss term related to the actual physical position value of the object included in the image.

(deleted)

According to claim 7,
wherein the object detection apparatus
identifies, using the selected anchor, the type of the object located at the corresponding physical position value.

(deleted)

According to claim 7,
wherein the third loss term
uses binary cross entropy.

According to claim 7,
wherein the actual physical position value of the object
is the row and column information of a grid when the storage surface on which the multiple objects included in the image can be placed is represented as a grid of a rows and b columns.