KR102650701B1

KR102650701B1 - Improvement of Detection Rate for Small Objects Using Pre-processing Network

Info

Publication number: KR102650701B1
Application number: KR1020210151354A
Authority: KR
Inventors: 최광남; 이두희; 차기순; 이크발 에테샴; 송현철
Original assignee: 중앙대학교 산학협력단; 남서울대학교 산학협력단
Priority date: 2021-11-05
Filing date: 2021-11-05
Publication date: 2024-03-25
Also published as: KR20230065610A

Abstract

The present invention relates to a system and method for improving the detection rate of small objects in an image using a preprocessing network. More specifically, it relates to a system and method that not only detect objects in an image through a machine learning procedure but can also detect small and unclear objects in real-world scenarios. The system comprises: an image collection unit that acquires an original image including a detection target object; a sampling preprocessing unit that samples the original image; a multi-channel conversion unit that converts the image output by the sampling preprocessing unit into a channel image using a network of convolutional layers; and an object detection unit that detects the detection target object from the channel image.

Description

System and method for improving the detection rate of small objects in an image using a preprocessing network {Improvement of Detection Rate for Small Objects Using Pre-processing Network}

The present invention relates to a system and method for improving the detection rate of small objects in an image using a preprocessing network, and more specifically to such a system and method in which a preprocessing network built on a convolutional network can detect small and unclear objects in real-world scenarios.

In recent years, machine learning has found applications not in a single field but across all areas of science. Machine learning is a branch of artificial intelligence (AI) that teaches computers to think in a way similar to humans, learning and improving from past experience. It works by exploring data and identifying patterns, and involves minimal human intervention.

Machine learning is also applied to detecting small objects in images, in what is called machine vision. Object detection is one of the most important applications of machine vision: it is a computer technology, related to computer vision and image processing, that detects instances of semantic objects of a particular class (e.g., humans, buildings, or cars) in digital images and videos. It means not only recognizing the objects in an image but also reporting their classes and coordinates. For example, if a person is present in an image, the trained network finds the person and extracts the person's location in the image.

Object detection has developed in various ways, and one-stage detectors and two-stage detectors are the common types of network. One-stage detectors include YOLO and RetinaNet; two-stage detectors include R-CNN and Fast R-CNN. A one-stage detector performs classification and localization on the image simultaneously. Because it classifies and localizes using the entire input image, it is relatively fast but has the disadvantage of being less accurate. A two-stage detector, by contrast, performs classification and localization sequentially. It relies on a method called region proposal, an algorithm that finds regions of interest in the image where the sought object is likely to be.

The two-stage detector uses the region proposal method to find regions of interest and uses them as input to the network. This yields high accuracy but has the disadvantage of low speed.

Registered Patent Publication (KR) No. 10-1896357 (2018.09.03.)
Laid-Open Patent Publication (KR) No. 10-2021-0119672 (2021.10.06.)

The present invention has been made in view of the above problems. It aims to provide a system and method for improving the detection rate of small objects in an image using a preprocessing network: a new convolutional neural network that preprocesses the input image so that small, densely packed objects in its regions of interest are detected with high accuracy and a better detection rate.

According to embodiments of the present invention, a system for improving the detection rate of small objects in an image using a preprocessing network includes: an image collection unit that acquires an original image including a detection target object; a sampling preprocessing unit that samples the original image; a multi-channel conversion unit that converts the image output by the sampling preprocessing unit into a channel image using a network of convolutional layers; and an object detection unit that detects the detection target object from the channel image.

In embodiments of the present invention, the network of convolutional layers is based on the RetinaNet algorithm. The network learns the imbalance among the plurality of classes that make up the error image samples and the original image, finds the differences, and thereby distinguishes the error image samples from the original image.

In embodiments of the present invention, the sampling preprocessing unit includes: an upsampling unit that upsamples the original image to twice its size and passes the upsampled image to the convolutional layer to obtain a feature map of the detection target object; and a downsampling unit that downsamples the original image to half its size and passes the downsampled image to the convolutional layer to obtain a feature map of the detection target object.

In embodiments of the present invention, the sampling preprocessing unit further includes a vector conversion unit that stacks eight pieces of shape data from the feature map of the detection target object, derived from the upsampling unit or the downsampling unit, into a single column and converts them into a sequential vector.

In embodiments of the present invention, the multi-channel conversion unit further includes: a one-channel image generation unit that generates a one-channel image based on the vector values produced by the vector conversion unit; and a multi-channel image generation unit that combines the one-channel image with the original image and passes the result through a layer of the network to generate a four-channel image.

In embodiments of the present invention, the convolutional layer converts a three-channel image into a multi-channel image by adding pixel positions in the layer, and then converts the multi-channel image into three channels whose coordinate values differ from those of the original three channels.

In embodiments of the present invention, the multi-channel conversion unit further includes a channel conversion image generation unit that, using the convolutional layer, converts the four-channel image into a three-channel image representing different coordinate values.

In embodiments of the present invention, the object detection unit detects, from the channel image output by the multi-channel conversion unit, detection target objects that are 32x32 pixels or smaller.

According to further embodiments of the present invention, a method for detecting small objects in an image using a preprocessing network includes: acquiring an original image including a detection target object; sampling and preprocessing the original image; converting the sampled and preprocessed image into a channel image using a network of convolutional layers; and detecting the detection target object from the channel image.

In embodiments of the present invention, the step of sampling and preprocessing the original image includes: upsampling the original image to twice its size and passing the upsampled image to the convolutional layer to obtain a feature map of the detection target object; and downsampling the original image to half its size and passing the downsampled image to the convolutional layer to obtain a feature map of the detection target object.

In embodiments of the present invention, the step of sampling and preprocessing the original image further includes stacking eight pieces of shape data from the upsampled or downsampled feature map of the detection target object into a single column and converting them into a sequential vector.

In embodiments of the present invention, the step of converting the sampled and preprocessed image into a channel image using the network of convolutional layers further includes: generating a one-channel image based on the values converted into the sequential vector; and combining the one-channel image with the original image and passing the result through a layer of the network to generate a four-channel image.

In embodiments of the present invention, the step of converting the sampled and preprocessed image into a channel image using the network of convolutional layers further includes converting, using the convolutional layer, the four-channel image into a three-channel image representing different coordinate values.

The system and method for improving the detection rate of small objects in an image using a preprocessing network, as described above, have the following effects.

First, the convolutional preprocessing network achieves high accuracy in detecting small, densely packed objects.

Second, the convolutional preprocessing network achieves a better detection rate for small, densely packed objects than conventional techniques.

Third, the final image generated by the convolutional preprocessing network has a higher resolution than the original image.

Fourth, the convolutional preprocessing network can improve the average recall for small object detection.

Figure 1 is a configuration diagram of the present invention.
Figure 2 is a schematic diagram of a sampling preprocessing unit and a multi-channel conversion unit according to an embodiment of the present invention.
Figure 3 is a schematic diagram of a sampling preprocessing unit according to an embodiment of the present invention.
Figure 4 is a configuration diagram of a multi-channel conversion unit according to an embodiment of the present invention.
Figure 5 is a schematic diagram of a multi-channel conversion unit according to an embodiment of the present invention.
Figure 6 is a flowchart of the present invention.
Figures 7 and 8 compare object detection results in an image obtained with a conventional preprocessing network against small object detection results in an image according to an embodiment of the present invention.

A system and method for improving the detection rate of small objects in an image using a preprocessing network according to embodiments of the present invention are described in detail below with reference to the accompanying drawings. Because the present invention can be modified in various ways and can take various forms, specific embodiments are illustrated in the drawings and described in detail in the text. This is not intended to limit the present invention to a particular disclosed form; the invention should be understood to include all modifications, equivalents, and substitutes that fall within its spirit and technical scope. Similar reference numerals are used for similar components throughout the drawings. In the accompanying drawings, the dimensions of structures are shown enlarged for clarity or reduced to convey the schematic configuration.

Terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only to distinguish one component from another. For example, a first component may be named a second component, and similarly a second component may be named a first component, without departing from the scope of the present invention. Unless otherwise defined, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention pertains. Terms defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the related art, and should not be interpreted in an idealized or excessively formal sense unless expressly so defined in this application.

The present invention has been made in view of the above problems. It aims to provide a system and method for improving the detection rate of small objects in an image using a preprocessing network: a new convolutional neural network that preprocesses the input image so that small, densely packed objects in its regions of interest are detected with high accuracy and a better detection rate.

Figure 1 is a configuration diagram of the present invention.

Referring to Figure 1, the system for improving the detection rate of small objects in an image using a preprocessing network consists of an image collection unit (100), a sampling preprocessing unit (200), a multi-channel conversion unit (300), and an object detection unit (400).

The image collection unit (100) acquires an original image including a detection target object; the sampling preprocessing unit (200) samples the original image; and the multi-channel conversion unit (300) converts the image output by the sampling preprocessing unit into a channel image using a network of convolutional layers. The object detection unit (400) detects, from the channel image output by the multi-channel conversion unit (300), detection target objects of small size, that is, 32x32 pixels or smaller.
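As an illustration of the 32x32-pixel size criterion, the following minimal sketch checks whether a bounding box counts as small. The `Box` type and `is_small` helper are hypothetical names introduced here, and reading the criterion as "both sides at most 32 pixels" is an assumption; the description does not say how size is measured, and some benchmarks threshold on box area instead.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Axis-aligned bounding box in pixel coordinates (hypothetical helper type)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


# The description targets objects of 32x32 pixels or smaller.
SMALL_SIDE = 32


def is_small(box: Box, side: int = SMALL_SIDE) -> bool:
    """Return True when both the width and the height fit within `side` pixels.

    Assumption: "32x32 or smaller" is read as a limit on each side.
    """
    width = box.x_max - box.x_min
    height = box.y_max - box.y_min
    return width <= side and height <= side
```

A detector's output boxes could be filtered with `is_small` to evaluate recall on small objects separately from larger ones.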

Figure 2 is a schematic diagram of a sampling preprocessing unit and a multi-channel conversion unit according to an embodiment of the present invention. Figure 2 shows the functional flow of the upsampling unit (205), the downsampling unit (210), and the vector conversion unit (215), which are components of the sampling preprocessing unit (200), and the functional flow of the one-channel image generation unit (305), the multi-channel image generation unit (310), and the channel conversion image generation unit (315), which are components of the multi-channel conversion unit (300).

Figure 3 is a schematic diagram of a sampling preprocessing unit according to an embodiment of the present invention.

Referring to Figure 3, the upsampling unit (205) upsamples the original image to twice its size and passes the upsampled image to the convolutional layer to obtain a feature map of the detection target object. The downsampling unit (210) downsamples the original image to half its size and passes the downsampled image to the convolutional layer to obtain a feature map of the detection target object. The vector conversion unit (215) stacks eight pieces of shape data from the feature map of the detection target object, derived from the upsampling unit (205) or the downsampling unit (210), into a single column and converts them into a sequential vector.
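The sampling and stacking steps can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the description does not name the interpolation method, so nearest-neighbour upsampling and 2x2 average pooling are stand-ins, and the "eight pieces of shape data" are treated here as eight feature-map arrays. The function names are hypothetical.

```python
from typing import Sequence

import numpy as np


def upsample_2x(image: np.ndarray) -> np.ndarray:
    """Nearest-neighbour upsampling (assumed method): (H, W, C) -> (2H, 2W, C)."""
    return image.repeat(2, axis=0).repeat(2, axis=1)


def downsample_half(image: np.ndarray) -> np.ndarray:
    """2x2 average pooling (assumed method): (H, W, C) -> (H//2, W//2, C).

    A trailing odd row or column is dropped before pooling.
    """
    h, w, c = image.shape
    h2, w2 = h // 2, w // 2
    cropped = image[: h2 * 2, : w2 * 2].astype(np.float64)
    return cropped.reshape(h2, 2, w2, 2, c).mean(axis=(1, 3))


def stack_to_vector(feature_maps: Sequence[np.ndarray]) -> np.ndarray:
    """Flatten each feature map and concatenate them into one sequential vector."""
    return np.concatenate([fm.ravel() for fm in feature_maps])
```

With these choices, downsampling an upsampled image recovers the original pixel values, which makes the pair easy to sanity-check.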

Here, the network of convolutional layers is based on the RetinaNet algorithm. The network learns the imbalance among the plurality of classes that make up the error image samples and the original image, finds the differences, and thereby distinguishes the error image samples from the original image. In more detail, RetinaNet uses ResNet as its backbone and a feature pyramid network (FPN) to learn feature maps (feature maps of the detection target object) extracted at multiple scales. The final extracted feature maps are then fed to the classification and bounding-box (BBox) regression subnetworks, which are trained for object detection using the focal loss (FL). The focal loss used in RetinaNet is the first loss function, and it greatly improves classification accuracy over existing approaches. Specifically, the focal loss assigns more weight to hard, easily misclassified examples (i.e., backgrounds with noisy textures, partial objects, or the object of interest). The focal loss is an improved version of the cross-entropy loss (CE) that attempts to handle the class imbalance problem, and the bounding-box (BBox) regression is trained using an IoU loss.
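To make the relationship between focal loss and cross-entropy concrete, here is a minimal NumPy sketch of the binary focal loss, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t). The function name is hypothetical, the defaults alpha = 0.25 and gamma = 2 follow the commonly cited RetinaNet settings rather than anything stated in this description, and averaging over all elements is a simplification.

```python
import numpy as np


def focal_loss(probs, targets, alpha: float = 0.25, gamma: float = 2.0,
               eps: float = 1e-7) -> float:
    """Binary focal loss averaged over all elements.

    probs:   predicted foreground probabilities in (0, 1)
    targets: 1 for foreground, 0 for background
    alpha, gamma: assumed defaults, not taken from this description
    """
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    targets = np.asarray(targets, dtype=np.float64)
    # p_t is the probability the model assigns to the true class.
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    # (1 - p_t) ** gamma shrinks the loss of well-classified (easy) examples.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

Setting `gamma=0` reduces the expression to alpha-weighted cross-entropy, which is the sense in which focal loss extends cross-entropy: the modulating factor only changes how much each example contributes.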

Figure 4 is a configuration diagram of a multi-channel conversion unit according to an embodiment of the present invention.

Referring to Figure 4, the multi-channel conversion unit (300) includes a one-channel image generation unit (305), a multi-channel image generation unit (310), and a channel conversion image generation unit (315).

The one-channel image generation unit (305) generates a one-channel image based on the vector values produced by the vector conversion unit. The multi-channel image generation unit (310) combines the one-channel image with the original image and passes the result through a layer of the network to generate a four-channel image. The channel conversion image generation unit (315) then uses the convolutional layer to convert the four-channel image into a three-channel image representing different coordinate values.
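The channel bookkeeping described here (3 + 1 channels combined into 4, then reduced to 3) can be sketched as below. This is an assumption-laden illustration: the description does not specify the layer that reduces four channels to three, so a 1x1 convolution, written as a per-pixel matrix product, stands in for it, and `fuse_channels`, `weights`, and `bias` are hypothetical names.

```python
from typing import Optional, Tuple

import numpy as np


def fuse_channels(original_rgb: np.ndarray, extra_channel: np.ndarray,
                  weights: np.ndarray,
                  bias: Optional[np.ndarray] = None) -> Tuple[np.ndarray, np.ndarray]:
    """Combine an (H, W, 3) image with an (H, W) channel, then project to 3 channels.

    weights has shape (4, 3) and plays the role of a 1x1 convolution kernel
    (an assumed stand-in for the convolutional layer in the description).
    Returns the intermediate four-channel image and the three-channel result.
    """
    if original_rgb.shape[:2] != extra_channel.shape:
        raise ValueError("extra_channel must match the image height and width")
    four = np.concatenate(
        [original_rgb.astype(np.float64), extra_channel[..., None]], axis=-1
    )
    three = four @ weights  # per-pixel linear map: (H, W, 4) @ (4, 3) -> (H, W, 3)
    if bias is not None:
        three = three + bias
    return four, three
```

In a trained network `weights` would be learned; passing the first three columns of a 4x4 identity matrix simply returns the original image, which is a convenient check of the shapes.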

Figure 5 is a schematic diagram of a multi-channel conversion unit according to an embodiment of the present invention.

Referring to Figure 5, the convolutional layer converts a three-channel image into a multi-channel image by adding pixel positions in the layer, and then converts the multi-channel image into three channels whose coordinate values differ from those of the original three channels. In more detail, the CoordConv layer retains the convolution's properties of few parameters and efficient computation, while allowing the network to learn to keep or discard translation invariance as the task being learned requires. The CoordConv layer also builds pixel position values in the shape of the original image and appends them to the original image, which changes the number of image channels. That is, adding pixel positions to the layer turns the three-channel image into a five-channel image, after which a convolutional layer is applied to convert it back to a three-channel image. This yields higher accuracy than conventional convolutional networks in simple classification and object detection.
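The step that turns a three-channel image into a five-channel image can be sketched as follows. This is a minimal illustration in the spirit of CoordConv: the function name is hypothetical, and scaling the coordinates to the range [-1, 1] is an assumption, since the description only says that pixel position values are appended.

```python
import numpy as np


def add_coord_channels(image: np.ndarray) -> np.ndarray:
    """Append x and y pixel-position channels: (H, W, C) -> (H, W, C + 2).

    Coordinates are scaled to [-1, 1] (an assumed normalization).
    """
    h, w, _ = image.shape
    ys = np.linspace(-1.0, 1.0, h)
    xs = np.linspace(-1.0, 1.0, w)
    # With indexing="ij", yy varies along rows and xx along columns; both are (H, W).
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.concatenate(
        [image.astype(np.float64), xx[..., None], yy[..., None]], axis=-1
    )
```

A subsequent convolution over the five channels can then use or ignore the two position channels, which is how the layer lets the network decide how much translation invariance to keep.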

Figure 6 is a flowchart of the present invention.

Referring to Figure 6, the method for detecting small objects in an image using a preprocessing network includes: acquiring an original image including a detection target object (S601); sampling and preprocessing the original image (S602); converting the sampled and preprocessed image into a channel image using a network of convolutional layers (S603); and detecting the detection target object from the channel image (S604).
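The four steps S601 to S604 can be read as a simple composition of stages. The sketch below only shows that control flow; the function and parameter names are hypothetical, and each stage is passed in as a callable because the description does not fix a concrete implementation.

```python
from typing import Any, Callable


def detect_small_objects(
    acquire: Callable[[], Any],
    preprocess: Callable[[Any], Any],
    to_channel_image: Callable[[Any, Any], Any],
    detect: Callable[[Any], Any],
) -> Any:
    """Run the four method steps in order and return the detections."""
    original = acquire()                                  # S601: acquire the original image
    sampled = preprocess(original)                        # S602: sample and preprocess
    channel_image = to_channel_image(original, sampled)   # S603: convert to a channel image
    return detect(channel_image)                          # S604: detect the target objects
```

The conversion stage receives both the original image and the preprocessed result because the description combines the generated one-channel image with the original image.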

In more detail, the step of sampling and preprocessing the original image (S602) upsamples the original image to twice its size and passes the upsampled image to the convolutional layer to obtain a feature map of the detection target object, and downsamples the original image to half its size and passes the downsampled image to the convolutional layer to obtain a feature map of the detection target object. Eight pieces of shape data from the upsampled or downsampled feature map of the detection target object are then stacked into a single column and converted into a sequential vector. Next, the step of converting the sampled and preprocessed image into a channel image using the network of convolutional layers (S603) generates a one-channel image based on the values converted into the sequential vector, combines the one-channel image with the original image, and passes the result through a layer of the network to generate a four-channel image. Finally, the convolutional layer converts the four-channel image into a three-channel image representing different coordinate values.

Figures 7 and 8 compare object detection results in an image obtained with a conventional preprocessing network against small object detection results in an image according to an embodiment of the present invention. In Figures 7 and 8, the image on the left is the result of a conventional object detection method using a preprocessing network, and the image on the right is the result of the system for improving the detection rate of small objects in an image according to an embodiment of the present invention, so that the small objects detected (green bounding boxes) can be compared.

The system and method for improving the detection rate of small objects in an image using a preprocessing network, as described above, have the following effects. First, the convolutional preprocessing network achieves high accuracy in detecting small, densely packed objects. Second, the convolutional preprocessing network achieves a better detection rate for small, densely packed objects than conventional techniques. Third, the final image generated by the convolutional preprocessing network has a higher resolution than the original image. Fourth, the convolutional preprocessing network can improve the average recall for small object detection.

Although the foregoing detailed description refers to preferred embodiments of the present invention, those skilled in the art, or those having ordinary knowledge in the relevant technical field, will understand that the present invention may be modified and changed in various ways without departing from the spirit and technical scope of the invention as set forth in the claims below.

100: Image collection unit 200: Sampling preprocessing unit
205: Upsampling unit 210: Downsampling unit
215: Vector conversion unit 300: Multi-channel conversion unit
305: One-channel image generation unit 310: Multi-channel image generation unit
315: Channel conversion image generation unit 400: Object detection unit

Claims

A system for improving the detection rate of small objects in an image using a preprocessing network, the system comprising:
an image collection unit that acquires an original image including a detection target object;
a sampling preprocessing unit that samples the original image;
a multi-channel conversion unit that converts the image derived from the sampling preprocessing unit into a channel image based on a network of convolutional layers; and
an object detection unit that detects the detection target object from the channel image,
wherein the sampling preprocessing unit comprises:
an upsampling unit that upsamples the original image to twice its size and passes the upsampled image to the convolutional layer to obtain a feature map of the detection target object; and
a downsampling unit that downsamples the original image to half its size and passes the downsampled image to the convolutional layer to obtain a feature map of the detection target object.

The system of claim 1,
wherein the network of convolutional layers is based on the RetinaNet algorithm, and
wherein the network of convolutional layers learns the imbalance between an error image sample and the plurality of classes constituting the original image to find differences, thereby distinguishing the error image sample from the original image.

(Deleted)

The system of claim 1,
wherein the sampling preprocessing unit further comprises a vector conversion unit that stacks eight shape data of the feature map of the detection target object, derived from the upsampling unit or the downsampling unit, into a single column and converts them into a sequential vector.

The system of claim 4,
wherein the multi-channel conversion unit comprises:
a one-channel image generation unit that generates a one-channel image based on the vector values converted by the vector conversion unit; and
a multi-channel image generation unit that combines the one-channel image with the original image and passes the result through a layer of the network to generate a four-channel image.

The system of claim 1,
wherein, when a three-channel image is converted into a multi-channel image by adding pixel positions in the layer, the convolutional layer converts the multi-channel image into a three-channel image having coordinate values different from those of the original three channels.

The system of claim 5,
wherein the multi-channel conversion unit further comprises a channel conversion image generation unit that, based on the convolutional layer, converts the four-channel image into a three-channel image representing different coordinate values.

The system of claim 1,
wherein the object detection unit detects the detection target object having a size of 32x32 pixels or less from the channel image derived from the multi-channel conversion unit.

A method for improving the detection rate of small objects in an image using a preprocessing network, the method comprising:
obtaining an original image including a detection target object;
sampling and preprocessing the original image;
converting the sampled and preprocessed image into a channel image based on a network of convolutional layers; and
detecting the detection target object from the channel image,
wherein the sampling and preprocessing of the original image comprises:
upsampling the original image to twice its size and passing the upsampled image to the convolutional layer to obtain a feature map of the detection target object; and
downsampling the original image to half its size and passing the downsampled image to the convolutional layer to obtain a feature map of the detection target object.

(Deleted)

The method of claim 9,
wherein the sampling and preprocessing of the original image further comprises stacking eight shape data of the upsampled or downsampled feature map of the detection target object into a single column and converting them into a sequential vector.

The method of claim 11,
wherein the converting of the sampled and preprocessed image into a channel image based on the network of convolutional layers comprises:
generating a one-channel image based on the values converted into the sequential vector; and
combining the one-channel image with the original image and passing the result through a layer of the network to generate a four-channel image.

The method of claim 12,
wherein the converting of the sampled and preprocessed image into a channel image based on the network of convolutional layers further comprises converting, based on the convolutional layer, the four-channel image into a three-channel image representing different coordinate values.