KR102340354B1 - Object Detection Method Using Multiscale Convolution Filter

Info

Publication number: KR102340354B1
Application number: KR1020200000825A
Authority: KR
Inventors: 최현철; 김민성
Original assignee: 영남대학교 산학협력단
Priority date: 2020-01-03
Filing date: 2020-01-03
Publication date: 2021-12-16
Also published as: KR20210087737A

Abstract

The present invention discloses a method of detecting objects by applying a plurality of convolution filters of different sizes. According to the present invention, convolution is performed at detection time with a plurality of convolution filters of different preset sizes, so that objects can be detected accurately in a single pass regardless of their size.

Description

Object Detection Method Using Multiscale Convolution Filter

The present invention relates to an object detection method, and more particularly to a method of detecting objects by applying a plurality of convolution filters of different sizes in the Region Proposal Network (RPN) of a Faster R-CNN (Region-based Convolutional Neural Network).

Recently developed detection and recognition techniques based on deep neural networks are very useful for digitizing ancient documents. As the first step of digitization, character detection and localization is critical to the subsequent tasks of character recognition and translation. It is also a difficult task, because ancient documents contain characters of widely varying sizes and the writing is often in poor condition.

Chinese characters in particular are composed of several geometric components that can each be mistaken for an independent character, so the exact region of interest (RoI) of each character must be localized. Faster R-CNN, a conventionally used detection system, consists of a CNN-based feature extractor and an RPN, and the RPN consists of a 3×3 convolution filter, a confidence score network, and a bounding box prediction network.

However, an RPN that uses a single 3×3 convolution filter, as in Faster R-CNN, is limited in its ability to localize characters of various sizes. For example, as shown in FIG. 1(a), when localizing a character larger than the 3×3 convolution filter, the conventional RPN sees only part of the character. As shown in FIG. 1(b), when localizing a small character, the conventional RPN sees the background as well as the character region. As a result, an RPN that uses a conventional single-size convolution filter cannot localize the exact region of a character.

Inaccurate localization, such as omitting part of a character, can in turn greatly degrade the subsequent character recognition and translation steps. An object detection method that improves on the conventional RPN is therefore needed to localize detected characters accurately, that is, to find accurate bounding box coordinates.

S. Ren, K. He, R. Girshick and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks", Neural Information Processing Systems (NIPS), Montreal, Canada, December 2015, pp. 91-99

An object of the present invention is to provide an object detection method capable of accurately detecting objects regardless of their size.

To achieve this object, an object detection method according to one embodiment of the present invention includes: receiving a first feature map extracted from a target image in which objects are to be detected; forming a second feature map by convolving the input first feature map with convolution filters of preset sizes; and detecting the objects based on the second feature map. The step of forming the second feature map is characterized in that a plurality of convolution filters having different preset sizes are applied to the first feature map.

Here, in the step of forming the second feature map, when the preset size of a convolution filter is N × N, N is an odd number (N is in pixels).

Further, in the step of forming the second feature map, when the largest of the objects has a size of L × L and one pixel of the first feature map corresponds to M pixels of the target image, the largest of the convolution filter sizes is K × K, where K is less than or equal to L/M (L and K are in pixels).

Further, in the step of forming the second feature map, when the smallest of the objects has a size of P × P and one pixel of the first feature map corresponds to M pixels of the target image, the smallest of the convolution filter sizes is Q × Q, where Q is greater than or equal to P/M (P and Q are in pixels).

An object detection method according to another embodiment of the present invention includes:
a) receiving a first feature map extracted from a target image in which objects are to be detected;
b) forming a second feature map by convolving the input first feature map with a plurality of convolution filters having different preset sizes;
c) detecting the objects based on the second feature map;
d) measuring the detection rate and detection accuracy of all objects detected in the target image;
e) comparing the detection rate and detection accuracy of all detected objects with a preset target detection rate and a preset target detection accuracy, respectively; and
f) if the comparison shows that the detection rate and detection accuracy of all detected objects meet or exceed the target detection rate and target detection accuracy, ending the detection; and if the detection rate and detection accuracy of any one or more of the detected objects fall below the targets, determining the size of the object whose detection rate and detection accuracy differ most from the targets, adding a convolution filter whose size corresponds to that object size, and repeating steps b) to e).
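The repeat-until-target loop of steps a) to f) can be sketched as follows. This is a minimal illustration, not the patented implementation: the `evaluate` callback, the way the shortfalls in detection rate and accuracy are summed into one number, the `max_rounds` guard, and the helper `odd_kernel_for` are all assumptions introduced here.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical type: object size in pixels -> (detection_rate, detection_accuracy).
Metrics = Dict[int, Tuple[float, float]]


def odd_kernel_for(object_size: int, stride: int) -> int:
    """Largest odd kernel size not exceeding object_size / stride (minimum 1)."""
    k = max(1, object_size // stride)
    return k if k % 2 == 1 else max(1, k - 1)


def refine_kernel_sizes(
    kernel_sizes: List[int],
    evaluate: Callable[[List[int]], Metrics],
    target_rate: float,
    target_accuracy: float,
    stride: int,
    max_rounds: int = 10,
) -> List[int]:
    """Repeat steps b) to e), adding one filter size per round as in step f)."""
    sizes = sorted(set(kernel_sizes))
    for _ in range(max_rounds):
        metrics = evaluate(sizes)  # steps b) to d), supplied by the caller
        if not metrics:
            break
        # Step e): shortfall of each object size relative to both targets
        # (summing the two shortfalls is an assumption of this sketch).
        shortfall = {
            size: max(target_rate - rate, 0.0) + max(target_accuracy - acc, 0.0)
            for size, (rate, acc) in metrics.items()
        }
        worst_size = max(shortfall, key=shortfall.get)
        if shortfall[worst_size] == 0.0:
            break  # every object size meets both targets
        new_kernel = odd_kernel_for(worst_size, stride)
        if new_kernel in sizes:
            break  # nothing new to add, so stop instead of looping forever
        sizes = sorted(sizes + [new_kernel])
    return sizes
```

A caller would pass an `evaluate` function that runs detection with the given filter sizes and reports per-size metrics; the loop then grows the filter set one size at a time.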

Here, in step b), when the preset size of a convolution filter is N × N, N is an odd number (N is in pixels).

Further, in step b), when the largest of the objects has a size of L × L and one pixel of the first feature map corresponds to M pixels of the target image, the largest of the convolution filter sizes is K × K, where K is less than or equal to L/M (L and K are in pixels).

Further, in step b), when the smallest of the objects has a size of P × P and one pixel of the first feature map corresponds to M pixels of the target image, the smallest of the convolution filter sizes is Q × Q, where Q is greater than or equal to P/M (P and Q are in pixels).

The present invention also provides a computer-readable recording medium on which a program for implementing the object detection method is recorded.

According to the present invention, convolution is performed at detection time with a plurality of convolution filters of different preset sizes, so that objects can be detected accurately in a single pass regardless of their size.

FIGS. 1A and 1B are diagrams illustrating the problems of a conventional object detection algorithm.
FIG. 2 is a conceptual diagram of a conventional object detection algorithm.
FIG. 3 is a conceptual diagram of an object detection system in which an object detection method according to an embodiment of the present invention is performed.
FIG. 4 is a flowchart of an object detection method according to an embodiment of the present invention.
FIG. 5 shows the cumulative distribution function (CDF) of the intersection over union (IoU) for bounding boxes with a confidence score of 0.5 in Comparative Example 1, Comparative Example 2, and Example 1.
FIGS. 6A and 6B show the character detection accuracy, expressed as IoU, in Comparative Example 1, Comparative Example 2, and Example 1.
FIG. 7 shows the detection rate (true positive rate) as a function of character size in Comparative Example 1, Comparative Example 2, and Example 1.

Hereinafter, the object detection method according to the present invention is described in detail with reference to the accompanying drawings. The drawings are provided only as examples so that the technical idea of the invention can be fully conveyed to those skilled in the art; the invention is not limited to the drawings presented below and may be embodied in other forms. Detailed descriptions of well-known functions and configurations that could unnecessarily obscure the gist of the invention are omitted.

In addition, terms such as '... unit' and 'module' used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

FIG. 2 is a conceptual diagram of a conventional object detection algorithm.

The conventional RPN 20 receives as input a feature map extracted from the target image and convolves it in a sliding-window manner with a convolution filter 21 of a single size (typically 3×3, in pixels). The feature map produced by the convolution, in which object candidate regions are proposed, is then fed to a coordinate prediction network (bounding box prediction network) and a confidence score network to detect objects.

As described above, because the conventional RPN 20 searches for objects using only the single-size convolution filter 21, it cannot localize the exact region of each object when the target image contains objects of various sizes, large and small.

FIG. 3 is a conceptual diagram of an object detection system in which an object detection method according to an embodiment of the present invention is performed, and FIG. 4 is a flowchart of the object detection method performed by the RPN of that system.

Referring first to FIG. 3, an object detection system 1000 for implementing an object detection method according to an embodiment of the present invention includes a first feature map generation module 100 and an RPN 200.

The first feature map generation module 100 extracts the first feature map from the target image in which objects are to be detected, by applying convolution and pooling to the target image. A known network such as ResNet34, VGG, or GoogLeNet may be used as the first feature map generation module 100.

As shown in FIG. 4, the RPN 200 includes a convolution layer made up of a plurality of convolution filters, a coordinate prediction network, and a confidence score network.

As noted above, the object detection method according to an embodiment of the present invention is performed by the RPN 200.

A coordinate loss and a score loss are used to train the RPN 200. The coordinate loss (L_coordinate) is the normalized L2 distance between the predicted bounding box coordinates (x, y, w, h) and the ground truth bounding box coordinates (xt, yt, wt, ht), and is calculated by Equation 1 below.

[Equation 1]

The score loss (L_score) is the binary cross entropy between the predicted confidence score p and the ground truth value pt (0 or 1), and is calculated by Equation 2 below.

[Equation 2]

The final objective function for training the RPN 200, L_total (Equation 3 below), is calculated as the weighted sum of L_coordinate and L_score.

[Equation 3]
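Equations 1 to 3 are not reproduced in this text, so the following is only a sketch of losses consistent with the description above. Dividing the coordinate differences by the ground truth width and height is an assumed normalization, and the weights `w_coord` and `w_score` are hypothetical names for the weights of the weighted sum.

```python
import math


def coordinate_loss(pred, target):
    """L2 distance between two (x, y, w, h) boxes.

    Assumption: each difference is normalized by the ground truth
    width or height; the source only says the distance is normalized.
    """
    x, y, w, h = pred
    xt, yt, wt, ht = target
    return math.sqrt(
        ((x - xt) / wt) ** 2
        + ((y - yt) / ht) ** 2
        + ((w - wt) / wt) ** 2
        + ((h - ht) / ht) ** 2
    )


def score_loss(p, pt, eps=1e-7):
    """Binary cross entropy between confidence p and ground truth pt (0 or 1)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(pt * math.log(p) + (1 - pt) * math.log(1.0 - p))


def total_loss(pred, target, p, pt, w_coord=1.0, w_score=1.0):
    """Weighted sum of the coordinate loss and the score loss."""
    return w_coord * coordinate_loss(pred, target) + w_score * score_loss(p, pt)
```

For a box that matches its ground truth exactly, the coordinate term vanishes and only the cross entropy on the confidence score remains.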

Hereinafter, an object detection method according to an embodiment of the present invention is described step by step.

First, the RPN 200 receives the first feature map that the first feature map generation module 100 extracted from the target image in which objects are to be detected (S110).

Here, the target image may be an image file of an ancient document containing Chinese characters of various sizes.

The first feature map is generated by the first feature map generation module 100 convolving and pooling the target image. As mentioned above, ResNet34, VGG, or the like may be used as the first feature map generation module 100.

Next, the RPN 200 forms a second feature map by convolving the input first feature map with convolution filters of preset sizes (S120).

In the present invention, the convolution in the RPN 200 is not performed with only a single 3×3 convolution filter as in the prior art. Instead, a plurality of convolution filters of different sizes, that is, multi-scale convolution filters, are applied.

More specifically, the multi-scale convolution filters pass over the first feature map in a sliding-window manner and form a plurality of second feature maps in which object candidate regions are proposed. The second feature maps are concatenated along the channel axis and, in the subsequent step, become the input to the confidence score network and the coordinate prediction network.
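The sliding-window convolution and channel-axis concatenation described above can be illustrated with a small pure-Python sketch. It is a simplification under stated assumptions: the feature map has a single channel, the kernels are fixed rather than learned, zero padding keeps the output the same size as the input, and the function names are hypothetical.

```python
def conv2d_same(feature, kernel):
    """Slide an odd-sized square kernel over a 2D map with zero padding.

    As in most deep learning frameworks, this is a cross-correlation:
    the kernel is not flipped.
    """
    n = len(kernel)
    pad = n // 2
    rows, cols = len(feature), len(feature[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for i in range(n):
                for j in range(n):
                    rr, cc = r + i - pad, c + j - pad
                    if 0 <= rr < rows and 0 <= cc < cols:
                        acc += feature[rr][cc] * kernel[i][j]
            out[r][c] = acc
    return out


def multiscale_features(feature, kernels):
    """Apply every kernel and stack the results along a leading channel axis."""
    return [conv2d_same(feature, k) for k in kernels]


def ones(n):
    """An n x n kernel of ones, used here only as a stand-in for learned weights."""
    return [[1.0] * n for _ in range(n)]


feature_map = [[1.0] * 4 for _ in range(4)]
stacked = multiscale_features(feature_map, [ones(1), ones(3), ones(5), ones(7)])
# stacked has one channel per filter size, each the same height and width
# as feature_map, ready to be consumed by the two prediction heads.
```

Because every filter uses "same" padding, the outputs share one spatial grid, which is what makes concatenation along the channel axis possible.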

In other words, as shown in FIG. 3, multi-scale convolution filters of various sizes such as 1×1, 3×3, 5×5, and 7×7 are used, so that the boundary region of each object can be found more accurately even when a single target image contains objects of various sizes.

In this step (S120), when the preset size of a convolution filter is N × N, N may be an odd number. A filter of even size has no center point and requires asymmetric padding.
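The padding argument can be made concrete with a short sketch. Assuming stride 1 and an output the same size as the input, a filter of size N needs N - 1 padding pixels in total along each axis; the helper name `same_padding` is hypothetical.

```python
def same_padding(kernel_size):
    """Return (pad_before, pad_after) that preserves the map size at stride 1."""
    total = kernel_size - 1
    before = total // 2
    return before, total - before


for n in (1, 3, 4, 7):
    before, after = same_padding(n)
    kind = "symmetric" if before == after else "asymmetric"
    print(f"{n}x{n}: pad {before} before, {after} after ({kind})")
```

Odd sizes split the padding evenly on both sides, while an even size such as 4 needs one more pixel on one side than the other.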

Also in this step (S120), when the largest of the objects has a size of L × L and one pixel of the first feature map corresponds to M pixels of the target image, the largest of the convolution filters has a size of K × K, where K is preferably less than or equal to L/M.

In the same way, when the smallest of the objects has a size of P × P and one pixel of the first feature map corresponds to M pixels of the target image, the smallest of the convolution filters has a size of Q × Q, where Q is preferably greater than or equal to P/M.

Here, the sizes denoted by L, M, N, K, O, P, Q, and so on are of course in pixels.

By applying convolution filters of various sizes while setting the upper and lower limits of the filter size from the object sizes and the number of target image pixels per pixel of the first feature map, objects can be detected efficiently.
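The bounds K ≤ L/M and Q ≥ P/M, combined with the odd-size condition, can be turned into a small helper. The source states only the bounds; enumerating every odd size between them is an assumption of this sketch, and the function and parameter names are hypothetical.

```python
import math


def kernel_size_range(largest_obj, smallest_obj, stride):
    """Odd filter sizes N with smallest_obj / stride <= N <= largest_obj / stride.

    largest_obj  -- L, side of the largest object in image pixels
    smallest_obj -- P, side of the smallest object in image pixels
    stride       -- M, image pixels per pixel of the first feature map
    """
    upper = math.floor(largest_obj / stride)            # K <= L / M
    first = max(1, math.ceil(smallest_obj / stride))    # Q >= P / M
    if first % 2 == 0:
        first += 1  # keep sizes odd so each filter has a center pixel
    return list(range(first, upper + 1, 2))


# With 16 to 112 pixel characters and M = 16, as in the experiments
# reported later in this document:
print(kernel_size_range(112, 16, 16))
```

With those numbers the helper yields the sizes 1, 3, 5, and 7, which matches the filter set shown in FIG. 3.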

Finally, the RPN 200 detects objects based on the second feature map (S130). It outputs a confidence score and bounding box coordinates for each object candidate region proposed while generating the second feature map, and detects objects on that basis.

More specifically, the confidence score is the output of the confidence score network and is given as the probability that an object is present. The bounding box coordinates are the output of the coordinate prediction network and are given as (x, y, w, h), where (x, y) is the center position of the object and the width (w) and height (h) are its size.
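A minimal sketch of how such outputs might be consumed is shown below. The conversion from center format to corner format follows directly from the definition of (x, y, w, h); keeping candidates whose score is at least 0.5 is an assumption based on the detection threshold mentioned in the experiments, and the function names are hypothetical.

```python
def to_corners(box):
    """Convert a center-format box (x, y, w, h) to (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def select_detections(boxes, scores, threshold=0.5):
    """Keep candidate boxes whose confidence score reaches the threshold."""
    return [to_corners(b) for b, s in zip(boxes, scores) if s >= threshold]


candidates = [(10, 10, 4, 6), (0, 0, 2, 2)]
confidences = [0.9, 0.2]
print(select_detections(candidates, confidences))
```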

The following describes the specific experiments performed to verify the effect of the object detection method according to an embodiment of the present invention, and their results. In these experiments, ancient documents containing Chinese characters were used as target images. For testing, the documents were produced by compositing handwritten Chinese characters onto ancient-style background images.

ResNet34 pre-trained on the ImageNet dataset was used as the first feature map generation module 100. ResNet34 is known to have higher classification performance than VGG while using fewer parameters.

The largest Chinese character in the produced documents is 112×112 pixels, and one pixel of the first feature map corresponds to 16 pixels of the document. Since 112/16 = 7, the maximum convolution filter size is 7.

The first experiment analyzed the results of using multi-scale convolution filters on 100 test images.

Table 1 below shows the analysis results of the first experiment. The detection rate (true positive rate), defined as the ratio of detected characters to all characters, was used as the performance index.

The object detection method using a single-size (3×3) convolution filter shows the lowest performance (0.57); adding a 1×1 convolution filter raises the detection rate to 0.66, an improvement of 9 percentage points. Adding larger convolution filters maintains or slightly improves detection performance, and the detection rate saturates once filters up to the maximum size of 7×7 are used.
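The detection rate and the size of the reported gain can be checked with a few lines of arithmetic. The two rates 0.57 and 0.66 come from the text above; the helper names and the character counts in the usage line are illustrative assumptions.

```python
def detection_rate(detected, total):
    """True positive rate: detected characters divided by all characters."""
    if total <= 0:
        raise ValueError("total must be positive")
    return detected / total


def improvement(baseline, improved):
    """Return the gain in percentage points and as a relative percentage."""
    points = (improved - baseline) * 100
    relative = (improved - baseline) / baseline * 100
    return points, relative


points, relative = improvement(0.57, 0.66)
print(f"{points:.1f} percentage points, {relative:.1f}% relative")
print(detection_rate(57, 100))  # hypothetical counts giving the 0.57 baseline
```

The 9% figure is therefore an absolute difference in detection rate; relative to the 0.57 baseline the gain is larger.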

[Table 1]

The second experiment compared object detection accuracy across three different detection methods: Comparative Example 1, Comparative Example 2, and Example 1.

Comparative Example 1 uses an RPN containing only a single-size convolution filter; Comparative Example 2 uses an RPN containing only a single-size convolution filter together with a regression network (regressor); and Example 1 uses an RPN containing the multi-scale convolution filters of the present invention.

The regression network of Comparative Example 2 outputs, for a given input, the difference (Δx, Δy, Δw, Δh) between the RPN's bounding box coordinates (x, y, w, h) and the ground truth bounding box coordinates (xt, yt, wt, ht). The regression network is trained to minimize the intersection over union (IoU) loss (see Equation 4 below) between the updated bounding box coordinates (xr = x + Δx, yr = y + Δy, wr = w + Δw, hr = h + Δh) and the ground truth bounding box coordinates.

[Equation 4]

Here, IoU is the area of the intersection of two regions divided by the area of their union. It is used as one of the metrics for evaluating the accuracy of predicted bounding boxes in object detection: the IoU between a predicted bounding box and the ground truth bounding box is regarded as the detection accuracy of that box. The present invention also uses IoU as its detection accuracy metric.
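The IoU definition above translates directly into code. The sketch below assumes boxes in the center format (x, y, w, h) used elsewhere in this document; the function name `iou` and the example boxes are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two center-format boxes (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Corner coordinates of each box.
    ax1, ay1, ax2, ay2 = ax - aw / 2, ay - ah / 2, ax + aw / 2, ay + ah / 2
    bx1, by1, bx2, by2 = bx - bw / 2, by - bh / 2, bx + bw / 2, by + bh / 2
    # Overlap along each axis, clipped at zero when the boxes do not touch.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = (0, 0, 2, 2)
predicted = (1, 0, 2, 2)
print(iou(predicted, ground_truth))

# Adding a correction of the kind a regression network would output:
delta = (-1, 0, 0, 0)
refined = tuple(v + d for v, d in zip(predicted, delta))
print(iou(refined, ground_truth))
```

In this toy case the predicted box overlaps half of the ground truth box, and the corrected box coincides with it.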

In this experiment, the IoU between the bounding box coordinates detected by Comparative Example 1, Comparative Example 2, and Example 1 and the ground truth bounding box coordinates was measured on 500 test documents.

FIG. 5 shows the cumulative distribution function (CDF) of the intersection over union (IoU) for bounding box coordinates with a confidence score of 0.5 in Comparative Example 1, Comparative Example 2, and Example 1.
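For readers unfamiliar with the plot type, an empirical CDF over a set of IoU values can be computed as sketched below. The IoU values are made up for illustration and are not the measurements behind FIG. 5; the function name is hypothetical.

```python
def empirical_cdf(values, threshold):
    """Fraction of values that are less than or equal to threshold."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(v <= threshold for v in values) / len(values)


# Illustrative IoU values only, not data from the experiments.
ious = [0.5, 0.7, 0.8, 0.9]
for t in (0.6, 0.8, 1.0):
    print(t, empirical_cdf(ious, t))
```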

Here, p = 0.5 is the character detection threshold. As FIG. 5 shows, adding a regression network to the single-scale RPN gives a higher CDF value at IoU 0.6 than the single-scale RPN alone, which indicates that it is effective in increasing the IoU of the detected bounding boxes.

The object detection method using the multi-scale convolution filters of Example 1 shows the steepest CDF increase at IoU 0.8, and has larger density values at high IoU than Comparative Examples 1 and 2.

This means that the object detection method of the present invention has higher character detection performance than the conventional known methods.

FIGS. 6A and 6B show the character detection accuracy, expressed as IoU, in Comparative Example 1, Comparative Example 2, and Example 1.

The numbers indicate the IoU of the bounding box drawn in the same color, and the red box is the ground truth box.

As can be seen in FIG. 6(a), the object detection method using the multi-scale convolution filters of Example 1 (green box) covers almost the entire boundary region of the character and has a large IoU (0.82). In contrast, the method using a single-scale convolution filter of Comparative Example 1 (blue box) and the method using the regression network of Comparative Example 2 (orange box) miss part of the character's boundary and have relatively small IoUs (0.68 and 0.69).

In addition, when two characters overlap as in FIG. 6(b), Example 1 (green boxes) detects the two characters separately with very large IoUs (0.93 and 0.87), whereas Comparative Example 1 (blue box) and Comparative Example 2 (orange box) fail to detect the characters properly.

The present invention is more effective than the other methods because it uses convolution filters of various sizes: the large filters capture the characteristics of the whole region (see FIG. 6a), and the small filters avoid the small areas where characters overlap (see FIG. 6b).

In the third experiment, the same Comparative Example 1, Comparative Example 2, and Example 1 as in the second experiment were applied to 100 test images per character size, and the object detection rates (true positive rates) were compared.

FIG. 7 shows the detection rate as a function of character size in Comparative Example 1, Comparative Example 2, and Example 1.

As can be seen in FIG. 7, because a 3×3 convolution filter corresponds to 48 pixels of the target image, Comparative Example 1, Comparative Example 2, and Example 1 have the same detection rate for characters 48 pixels in size.

However, for relatively small characters (16 or 32 pixels), Comparative Examples 1 and 2, which use only the 3x3 convolution filter corresponding to 48 pixels of the target image, miss characters, and their detection rates drop. In contrast, Example 1 uses an RPN that includes a 1x1 convolution filter, so its detection rate is much higher than those of Comparative Examples 1 and 2.

Likewise, for relatively large characters (112 pixels), Example 1 uses an RPN that includes a 7x7 convolution filter capable of covering the whole area of the character, so its detection rate is higher than those of Comparative Examples 1 and 2.
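The pixel figures in this experiment are consistent with a simple rule: an N x N filter on the feature map spans roughly N times M pixels of the target image, where M is the number of image pixels per feature-map pixel. The sketch below works through that arithmetic. The value M = 16 is inferred here from the statement that a 3x3 filter corresponds to 48 pixels; the name `covered_pixels` is hypothetical.

```python
def covered_pixels(filter_size: int, stride: int) -> int:
    """Approximate side length, in image pixels, spanned by an N x N filter."""
    return filter_size * stride


# Inferred, not stated directly in the text: 48 image pixels / 3 filter taps.
STRIDE = 48 // 3

# Side length covered by each filter size mentioned in the experiment.
coverage = {n: covered_pixels(n, STRIDE) for n in (1, 3, 7)}
print(coverage)  # {1: 16, 3: 48, 7: 112}
```

Under this reading, the 1x1 filter matches the 16-pixel characters and the 7x7 filter matches the 112-pixel characters, which are the sizes where Example 1 gains the most.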

Table 2 below shows the detection rate for each character size. The present invention achieves much higher performance for very small or very large characters (16, 32, or 112 pixels). On average, the object detection method of the present invention has a detection rate 3% higher than the method using an RPN with a single-scale convolution filter, and 1.4% higher than the method using an RPN with a single-scale convolution filter together with a regression network.

[Table 2]

Hereinafter, an object detection method according to another embodiment of the present invention is described step by step.

The object detection method according to another embodiment of the present invention is likewise performed by the regional proposal network. It is the same as the object detection method according to the one embodiment of the present invention in that it includes step a (S210) of receiving a first feature map that the first feature map generation module has extracted from a target image in which an object is to be detected, step b (S220) of forming a second feature map by performing convolution on the input first feature map with a plurality of convolution filters having different preset sizes, and step c (S230) of detecting the object based on the second feature map.
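Step b can be pictured with a small, dependency-free sketch that applies several odd-sized filters to one feature map. Several things here are assumptions made only for illustration: the single-channel list-of-lists layout, zero padding that keeps the output the same size, placeholder averaging weights in place of learned weights, and stacking the per-filter outputs as separate channels. The names `conv2d_same` and `multiscale_features` are hypothetical.

```python
from typing import List, Sequence

Grid = List[List[float]]


def conv2d_same(feature: Grid, kernel: Grid) -> Grid:
    """Zero-padded, same-size 2D filtering with an odd N x N kernel."""
    n = len(kernel)
    assert n % 2 == 1 and all(len(row) == n for row in kernel)
    pad = n // 2
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(n):
                for dx in range(n):
                    yy, xx = y + dy - pad, x + dx - pad
                    if 0 <= yy < h and 0 <= xx < w:  # outside counts as zero
                        acc += feature[yy][xx] * kernel[dy][dx]
            out[y][x] = acc
    return out


def multiscale_features(feature: Grid, sizes: Sequence[int] = (1, 3, 7)) -> List[Grid]:
    """Apply one filter per size and return the outputs as stacked channels."""
    maps = []
    for n in sizes:
        # Placeholder averaging weights; a trained network would learn these.
        kernel = [[1.0 / (n * n)] * n for _ in range(n)]
        maps.append(conv2d_same(feature, kernel))
    return maps
```

Because every filter size is odd and the padding is symmetric, all outputs share the spatial size of the first feature map, so they can be combined position by position.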

Here, however, the method further includes, after step c (S230) of detecting the object, step d (S240) of measuring the detection rate and detection accuracy of all objects detected from the target image, and step e (S250) of comparing the detection rate and the detection accuracy of all the objects with a target detection rate and a target detection accuracy, respectively.

Then, if the comparison shows that the detection rate and detection accuracy of all objects detected from the target image are equal to or greater than the target detection rate and the target detection accuracy, detection of the object is terminated. If the comparison shows that the detection rate and detection accuracy of one or more of the detected objects are below the target detection rate and the target detection accuracy, the size of the object whose detection rate and detection accuracy differ most from the targets is calculated. A convolution filter having a size corresponding to the calculated object size is then added to the plurality of convolution filters, and steps b through e are repeated.

That is, the object detection method according to another embodiment of the present invention adds convolution filters until the target detection rate and target detection accuracy are satisfied for all objects in the target image, so that only the convolution filters needed for object detection are selectively applied.

For example, the preset convolution filters of different sizes in step b may be 1x1 and 7x7. The method proceeds through step e, comparing the detection rate and detection accuracy of all detected objects with the target detection rate and target detection accuracy, respectively. If the convolution filter size corresponding to the size of the object whose detection rate and detection accuracy differ most from the targets is 3x3, a 3x3 convolution filter is added and steps b through e are performed once more.

Then, in step e, the detection rate and detection accuracy of all detected objects are again compared with the target detection rate and target detection accuracy, a convolution filter of the required size is added, and steps b through e are performed once more. This is repeated until the detection rate and detection accuracy of all detected objects are equal to or greater than the target detection rate and target detection accuracy, at which point object detection is terminated.
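The loop over steps b through e can be sketched as follows. This is an illustration under stated assumptions, not the claimed procedure itself: `evaluate` is a hypothetical callback that runs detection with the given filter sizes and returns, per object size in pixels, a pair (detection rate, detection accuracy); the "largest difference" is taken as the sum of the two shortfalls; the object size is mapped to a filter size by dividing by the stride M and rounding up to an odd integer; and a round limit is added so the sketch always terminates.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed layout: object size (px) -> (detection rate, detection accuracy).
Metrics = Dict[int, Tuple[float, float]]


def to_odd_size(value: float) -> int:
    """Round up to an odd integer of at least 1 (assumed mapping rule)."""
    n = max(1, math.ceil(value))
    return n if n % 2 == 1 else n + 1


def select_filters(initial_sizes: Sequence[int],
                   evaluate: Callable[[List[int]], Metrics],
                   target_rate: float, target_acc: float,
                   stride: int, max_rounds: int = 10) -> List[int]:
    sizes = sorted(set(initial_sizes))
    for _ in range(max_rounds):
        metrics = evaluate(sizes)  # steps b to d: detect and measure
        if not metrics:
            break
        # Step e: shortfall of each object size against both targets.
        shortfall = {
            obj: max(0.0, target_rate - rate) + max(0.0, target_acc - acc)
            for obj, (rate, acc) in metrics.items()
        }
        worst = max(shortfall, key=shortfall.get)
        if shortfall[worst] == 0.0:
            break  # every object size meets both targets
        new_size = to_odd_size(worst / stride)
        if new_size in sizes:
            break  # nothing new to add; stop rather than loop forever
        sizes = sorted(sizes + [new_size])
    return sizes


def toy_evaluate(sizes: List[int]) -> Metrics:
    """Made-up metrics: 48 px objects do well only once a 3x3 filter exists."""
    ok = 3 in sizes
    return {16: (0.95, 0.90),
            48: (0.95, 0.90) if ok else (0.60, 0.70),
            112: (0.96, 0.90)}


print(select_filters([1, 7], toy_evaluate, 0.9, 0.85, stride=16))  # [1, 3, 7]
```

Starting from the 1x1 and 7x7 filters, the toy run adds only the 3x3 filter, mirroring the example in the text; the numbers returned by `toy_evaluate` are invented for the demonstration.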

As in the object detection method according to the one embodiment of the present invention, in step b, when the preset size of a convolution filter is N x N, N may be an odd number.

Also, in step b, when the size of the largest object is L x L and one pixel of the first feature map corresponds to M pixels of the target image, the size of the largest of the plurality of convolution filters, that is, the maximum convolution filter size, is K x K, where K is L/M or less.

In the same way, when the size of the smallest object is P x P and one pixel of the first feature map corresponds to M pixels of the target image, the size of the smallest of the plurality of convolution filters, that is, the minimum convolution filter size, is Q x Q, where Q is P/M or more.

Of course, here too, the sizes L, M, N, K, O, P, Q, and so on are expressed in pixels.
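Taken together, the three constraints above (N odd, K at most L/M, Q at least P/M) bound the set of candidate filter sizes. A minimal sketch that enumerates them is shown here; the function name `odd_filter_sizes` and the example values (16-pixel and 112-pixel objects with M = 16) are assumptions chosen to match the experiment described earlier.

```python
import math
from typing import List


def odd_filter_sizes(smallest_obj: int, largest_obj: int, stride: int) -> List[int]:
    """Odd N with P/M <= N <= L/M, where P, L are object sizes and M the stride."""
    q_min = max(1, math.ceil(smallest_obj / stride))  # Q is P/M or more
    k_max = math.floor(largest_obj / stride)          # K is L/M or less
    return [n for n in range(q_min, k_max + 1) if n % 2 == 1]


print(odd_filter_sizes(16, 112, 16))  # [1, 3, 5, 7]
```

With these example values the smallest and largest candidates are 1x1 and 7x7, the same pair the text uses as the starting filters.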

According to the object detection method of this other embodiment of the present invention, objects are first detected using only the minimum-size convolution filter (for example, 1x1) and the maximum-size convolution filter (for example, 7x7). Convolution filters corresponding to the size of the object that deviates most from the target detection rate and detection accuracy are then added one at a time, so that only the convolution filters needed to detect objects in the current target image are selected and used.

In other words, according to the present invention, the minimum number and sizes of convolution filters needed to satisfy the target detection accuracy and detection rate can be applied, so objects can be detected more efficiently.

Meanwhile, the object detection method according to the present invention described above may be implemented as program instructions executable by various computer means and recorded on a computer-readable recording medium. The program instructions recorded on the recording medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software.

Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that a computer can execute using an interpreter or the like. A hardware device may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The embodiments given above to describe the present invention are merely examples in which the present invention is embodied, and they may be combined in various forms to realize the gist of the present invention. Accordingly, the present invention is not limited to the embodiments described above; as claimed in the following claims, the technical features of the present invention extend to the range within which anyone with ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the present invention.

1000: object detection system
100: first feature map generation module
200: RPN

Claims

receiving a first feature map extracted from a target image for detecting an object;
forming a second feature map by performing convolution on the input first feature map by a convolution filter having a preset size; and
detecting the object based on the second feature map;
wherein, in the forming of the second feature map, a plurality of convolution filters having different preset sizes are applied to the first feature map, and
wherein, in the forming of the second feature map, when the size of the largest object among the objects is L x L and one pixel of the first feature map corresponds to M pixels of the target image, the maximum size of the plurality of convolution filters is K x K, and K is L/M or less.

(Deleted)

The method of claim 1,
wherein, in the forming of the second feature map, when the size of the smallest object among the objects is P x P and one pixel of the first feature map corresponds to M pixels of the target image, the minimum size of the plurality of convolution filters is Q x Q, and Q is P/M or more. (Here, the units of P and Q are pixels.)

a) receiving a first feature map extracted from a target image for detecting an object;
b) forming a second feature map by performing convolution on the input first feature map by a plurality of convolution filters having different preset sizes;
c) detecting the object based on the second feature map;
d) measuring a detection rate and detection accuracy of all objects detected from the target image;
e) comparing the detection rates and detection accuracies of all the detected objects with preset target detection rates and target detection accuracies, respectively; and
f) if, as a result of the comparison, the detection rates and detection accuracies of all the detected objects are equal to or greater than the target detection rate and the target detection accuracy, terminating the detection of the object; and if, as a result of the comparison, the detection rate and detection accuracy of any one or more of the detected objects are less than the target detection rate and the target detection accuracy, calculating the size of the object whose detection rate and detection accuracy differ most from the target detection rate and the target detection accuracy, adding a convolution filter having a size corresponding to the calculated object size, and repeating steps b) to e).

The method of claim 5,
wherein, in step b), when the preset size of the convolution filter is N x N, N is an odd number. (Here, the unit of N is pixels.)

The method of claim 5,
wherein, in step b), when the size of the largest object among the objects is L x L and one pixel of the first feature map corresponds to M pixels of the target image, the maximum size of the plurality of convolution filters is K x K, and K is L/M or less. (Here, the units of L and K are pixels.)

The method of claim 5,
wherein, in step b), when the size of the smallest object among the objects is P x P and one pixel of the first feature map corresponds to M pixels of the target image, the minimum size of the plurality of convolution filters is Q x Q, and Q is P/M or more. (Here, the units of P and Q are pixels.)

A computer-readable recording medium in which a program for implementing the object detection method according to any one of claims 1 to 8 is recorded.