KR102545772B1 - Learning apparatus and method for object detection and apparatus and method for object detection

Info

Publication number: KR102545772B1
Application number: KR1020210024797A
Authority: KR
Inventors: 김성호; 송익환
Original assignee: 영남대학교 산학협력단
Priority date: 2021-02-24
Filing date: 2021-02-24
Publication date: 2023-06-20
Also published as: KR20220120961A

Abstract

A learning apparatus and method for object detection are disclosed. A learning apparatus for object detection according to an embodiment of the present invention is based on a generative adversarial network (GAN). The generative adversarial network includes a generator that generates an object estimation image from an input image, and a discriminator that compares the object estimation image generated by the generator with a preset ground-truth image, determines from the comparison whether an input image is the ground-truth image or the generated object estimation image, and feeds the determination result back to the generator.

Description

Learning Apparatus and Method for Object Detection and Object Detection Apparatus and Method

Embodiments of the present invention relate to object detection technology.

Object detection and recognition performance is critical in an electro-optical tracking system. With the recent trend toward unmanned and automated operation, the object detection and recognition functions of electro-optical tracking systems are increasingly implemented with deep learning.

However, even though the image resolution of electro-optical tracking systems has improved, a deep-learning-based electro-optical tracking system cannot detect small objects below a certain size, because the internal layer arrangement of the convolutional neural network (CNN) in the deep learning model can be expanded only to a limited extent. If the internal layer arrangement of the CNN is expanded so that such small objects can be detected, the amount of data the system must process becomes so large that objects can no longer be detected in real time.

Therefore, an object detection technology capable of detecting small objects is needed.

Korean Registered Patent Publication No. 10-2085035 (2020.02.28)

Embodiments of the present invention are intended to provide an object detection apparatus capable of detecting small objects in a thermal image.

According to an exemplary embodiment of the present invention, there is provided a learning apparatus for object detection based on a generative adversarial network (GAN), wherein the generative adversarial network includes: a generator that generates an object estimation image from an input image; and a discriminator that compares the object estimation image generated by the generator with a preset ground-truth image, determines according to the comparison result whether an input image is the ground-truth image or the generated object estimation image, and feeds the determination result back to the generator.

The generator may include: an input unit that extracts an input feature map from the input image; a feature map generation unit that is formed in a grid structure and outputs a final feature map reflecting local information and global information based on the extracted input feature map; and an output unit that generates the object estimation image from the output final feature map.

The feature map generation unit may include: a plurality of dense modules connected in the row direction of the grid structure to form the layers of the grid structure; and at least one upsampling module and at least one downsampling module connected in the column direction of the grid structure to connect the layers of the grid structure.

Each layer may output a feature map of local information that includes global information according to its depth.

The dense module may split an input feature map along the channel dimension into a first part feature map and a second part feature map, pass the second part feature map through a dense block and a transition block, and then combine the result with the first part feature map to output a feature map of local information.

The feature map generation unit may further include a fusion block that fuses the feature maps output from the dense modules, the upsampling module, and the downsampling module.

The fusion block may add the output feature maps together and fuse them using a masking technique that multiplies the result by a filter that sets feature values other than those of the object to 0, in order to emphasize the features corresponding to the object.

The discriminator may include: a feature extraction module that extracts feature maps for the ground-truth image and the object estimation image; a subsampling module that compresses the feature maps extracted by the feature extraction module; and a classification module that classifies the input as real or fake with a softmax function, based on the feature maps compressed by the subsampling module.

The subsampling module may compress the extracted feature map by performing max pooling and average pooling on it separately, and the classification module may add the max-pooled and average-pooled feature maps.

According to another exemplary embodiment of the present invention, there is provided an object detection apparatus based on a generative adversarial network (GAN), wherein the generative adversarial network extracts an input feature map from an input image, uses a grid structure to output a final feature map reflecting local information and global information based on the extracted input feature map, and generates an object estimation image from the output final feature map.

The generative adversarial network may include: a plurality of dense modules connected in the row direction of the grid structure to form the layers of the grid structure; and at least one upsampling module and at least one downsampling module connected in the column direction of the grid structure to connect the layers of the grid structure.

The generative adversarial network may further include a fusion block that fuses the feature maps output from the dense modules, the upsampling module, and the downsampling module.

According to embodiments of the present invention, small objects can be detected in a thermal image.

In addition, according to embodiments of the present invention, small objects can be detected quickly and detection accuracy can be improved.

FIG. 1 is a block diagram of a learning apparatus for object detection according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the generator of the learning apparatus for object detection according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the grid structure formed in the feature map generation unit of the generator of the learning apparatus for object detection according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating the dense module of the generator of the learning apparatus for object detection according to an embodiment of the present invention.
FIG. 5 is a diagram illustrating the first part feature map in the learning apparatus for object detection according to an embodiment of the present invention.
FIG. 6 is a configuration diagram illustrating the discriminator of the learning apparatus for object detection according to an embodiment of the present invention.
FIG. 7 is a block diagram of an object detection apparatus according to an embodiment of the present invention.
FIG. 8 is a diagram illustrating the object detection performance of the object detection apparatus according to an embodiment of the present invention.
FIG. 9 is a block diagram illustrating a computing environment including a computing device suitable for use in exemplary embodiments.

Hereinafter, specific embodiments of the present invention are described with reference to the drawings. The following detailed description is provided to aid a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, it is only an example, and the present invention is not limited thereto.

In describing the embodiments of the present invention, a detailed description of known technology related to the present invention is omitted where it could unnecessarily obscure the gist of the invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention or custom of a user or operator. Their definitions should therefore be based on the content of this specification as a whole. The terminology used in the detailed description is only for describing embodiments of the present invention and should in no way be limiting. Unless clearly used otherwise, expressions in the singular include the plural. In this description, expressions such as "comprising" or "including" are intended to indicate certain features, numbers, steps, operations, elements, or parts or combinations thereof, and should not be construed as excluding the presence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations thereof beyond those described.

FIG. 1 is a block diagram of a learning apparatus 100 for object detection according to an embodiment of the present invention.

As shown in FIG. 1, the learning apparatus 100 for object detection according to an embodiment of the present invention may estimate an object from an input image using a generative adversarial network (GAN).

Meanwhile, the generative adversarial network (GAN) according to an embodiment of the present invention may consist of two networks, a generator and a discriminator. The generator acts as a generative model: it learns from given data and generates similar data from it. The discriminator is a kind of classifier that receives data and distinguishes whether it was produced by the generator or is real data. The generator thus aims to generate data similar to the real data, while the discriminator aims to tell generated data apart from real data. The two networks are therefore said to be in a minimax relationship.
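For reference, a minimax relationship of this kind is commonly written as the standard GAN objective. The formula below is that textbook form adapted to an image-to-image setting, not a loss stated in this specification; the symbols x (input image), y (ground-truth image), G (generator) and D (discriminator) are notation assumed here for illustration.

```latex
\min_{G} \max_{D} V(D, G) =
  \mathbb{E}_{y}\left[\log D(y)\right]
  + \mathbb{E}_{x}\left[\log\left(1 - D(G(x))\right)\right]
```

Under this form, D is trained to raise V by scoring ground-truth images near 1 and generated images near 0, while G is trained to lower V by producing images that D scores as real.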

The learning apparatus 100 for object detection may include a generator 200 and a discriminator 300.

The generator 200 may generate an object estimation image from an input image. The input image is an image containing the object to be detected, for example an object of a predefined class such as a ship, a person, a vehicle, or another means of transportation. The input image may be acquired by a camera, for example a military or shipboard forward-looking infrared (FLIR) thermal camera, an electro-optical/infrared (EO/IR) thermal camera, an infrared search and track (IRST) system, or a security surveillance camera (e.g., CCTV, TOD).

FIG. 2 is a diagram illustrating the generator of the learning apparatus for object detection according to an embodiment of the present invention.

As shown in FIG. 2, the generator 200 may include a feature map generation unit 220 that generates a final feature map for the object by reflecting local information and global information from the input image. In FIG. 2, the feature map generation unit 220 is drawn rotated 90 degrees to the left. That is, in the grid structure of FIG. 2, the horizontal direction may represent the rows (width) of the grid structure and the vertical direction may represent its columns (height).

As shown in FIG. 3, the generator 200 may output a feature map that reflects local information and global information by using a grid structure. Here, local information may mean information corresponding to a specific object, and global information may mean information corresponding to the surrounding environment or background in which that object is located. That is, the generator 200 may generate the object estimation image using global context information as well as local context information.

In addition, the generator 200 may further include an input unit 210 and an output unit 230. The input unit 210 may be connected before the feature map generation unit 220 and output an input feature map from the input image. The output unit 230 may be connected after the feature map generation unit 220, receive the final feature map output from it, and generate the object estimation image. For example, the input unit 210 and the output unit 230 may each include a convolution layer with a kernel size of 3 and a stride of 1. In this case, by passing the input image through its convolution layer, the input unit 210 may determine the capacity of the entire feature information flow of the input image, and by passing the final feature map through its convolution layer, the output unit 230 may gather all the feature information of the final feature map output from the feature map generation unit 220 and make the final decision.

The feature map generation unit 220 is formed in a grid structure and may include a plurality of dense modules 221, downsampling modules 222, and upsampling modules 223.

The plurality of dense modules 221 may be connected in the row (width) direction of the grid structure to form a layer of the grid structure. The dense modules 221 may output, from the input feature map, feature maps that express local information about the object. As shown in FIG. 2, the dense modules 221 may include, for example, CDB-L (cross-stage partial dense block with last-fusion) and CDDB-L (cross-stage partial dense dilation block with last-fusion) modules.

Referring to FIG. 4, the dense module 221 may split an input feature map along the channel dimension into a first part feature map (X₀) and a second part feature map (X₁). The dense module 221 may pass the first part feature map (X₀) through unchanged and pass the second part feature map (X₁) through a dense block 221a and a transition block 221b. The dense module 221 may then concatenate the first part feature map (X₀) with the processed second part feature map (X₁) and output the result. Because the dense module 221 performs convolution on only one part of the feature map (X₁) and then concatenates the result with the remaining part (X₀), the amount of convolution computation can be reduced.
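The split, process, and concatenate flow of the dense module can be sketched in a few lines of plain Python. This is a minimal illustration under assumptions, not the implementation in the specification: a feature map is modelled as a list of channels, the `ratio` parameter and the `transform` callback (standing in for the dense block and transition block) are hypothetical names, and the even split is an assumed default because the specification does not state a split ratio.

```python
from typing import Callable, List, Tuple

# Hypothetical layout: a feature map is a list of channels,
# each channel being a flat list of values.
FeatureMap = List[List[float]]


def split_channels(x: FeatureMap, ratio: float = 0.5) -> Tuple[FeatureMap, FeatureMap]:
    """Split along the channel axis into X0 (first part) and X1 (second part).

    `ratio` is an assumed parameter; the source does not give a split ratio.
    """
    cut = int(len(x) * ratio)
    return x[:cut], x[cut:]


def dense_module(x: FeatureMap,
                 transform: Callable[[FeatureMap], FeatureMap],
                 ratio: float = 0.5) -> FeatureMap:
    """X0 bypasses the computation; only X1 is transformed, then both are concatenated."""
    x0, x1 = split_channels(x, ratio)
    y1 = transform(x1)   # stands in for dense block 221a + transition block 221b
    return x0 + y1       # channel-wise concatenation


def double(part: FeatureMap) -> FeatureMap:
    """Toy stand-in for the convolutional path: doubles every value."""
    return [[2.0 * v for v in channel] for channel in part]


x = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
out = dense_module(x, double)
print(out)  # the first two channels are unchanged, the last two are doubled
```

Only the second half of the channels goes through `transform`, which is where the saving in convolution work comes from.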

Here, the dense block 221a may pass the previous feature map through a convolution layer to output a new feature map, and concatenate the previous feature map with the new one. Because each dense block 221a concatenates a new feature map onto the previous one, information flows smoothly among all the dense blocks 221a of the dense module 221, which addresses the loss of feature information and prevents information about small objects from being distorted. The transition block 221b may reduce the number of channels of the second part feature map (X₁), which grows as it passes through the plurality of dense blocks 221a. For example, the transition block 221b may include a batch normalization layer, a 1x1 convolution layer, and a 2x2 average pooling layer.
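The channel bookkeeping behind this paragraph can be illustrated with a small sketch. The names `growth_rate`, `num_blocks`, and `compression` are borrowed from common DenseNet terminology and are assumptions here; the specification gives no values for them.

```python
from typing import List


def dense_block_channels(in_channels: int, growth_rate: int, num_blocks: int) -> List[int]:
    """Channel count after each dense block.

    Each block concatenates `growth_rate` new channels onto everything
    it received, so the count grows linearly with the number of blocks.
    `growth_rate` and `num_blocks` are assumed, illustrative parameters.
    """
    counts = [in_channels]
    for _ in range(num_blocks):
        counts.append(counts[-1] + growth_rate)
    return counts


def transition_channels(channels: int, compression: float = 0.5) -> int:
    """Channel count after a transition block that uses a 1x1 convolution
    to shrink the channels (the `compression` factor is an assumption)."""
    return int(channels * compression)


growth = dense_block_channels(in_channels=17, growth_rate=8, num_blocks=4)
print(growth)                           # [17, 25, 33, 41, 49]
print(transition_channels(growth[-1]))  # 24
```

The linear growth is why a transition block is needed after a stack of dense blocks: without it the channel count, and with it the cost of later layers, keeps increasing.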

Meanwhile, depending on the configuration of the dense modules 221, the first part feature map (X₀) flows through the feature map generation unit 220 as shown in FIG. 5. The first part feature map (X₀) may yield a feature map at each downsampling module 222 it passes through. The operation of the downsampling modules 222 can be expressed as follows.

(Here, {a_{1,1}, a_{1,2}, …, a_{h-1,w/2}} are the feature maps at the row (w) and column (h) coordinates of the grid structure, and D() may be a downsampling function.)

In addition, each feature map that has passed through the downsampling modules 222 may yield a feature map at each upsampling module 223 it passes through. The operation of the upsampling modules 223 can be expressed as follows.

(Here, {b_{h-1,w/2+1}, b_{h-1,w/2+2}, …, b_{1,w}} are the feature maps at the row (w) and column (h) coordinates of the grid structure, and U() may be an upsampling function.)

That is, the feature map output after the first part feature map (X₀) passes through the feature map generation unit 220 according to the configuration of the dense modules 221 can be expressed by the following equation.

(Here, Y₀ may be the output feature map and X₀ may be the first part feature map.)

Accordingly, because the dense modules 221 in the feature map generation unit 220 let the first part feature map (X₀) skip the convolution path, information flows smoothly between the modules that make up the feature map generation unit 220, which addresses the loss of feature information and prevents information about small objects from being distorted.

Meanwhile, although the dense module 221 is illustrated as passing the previous feature map through a convolution layer, it is not limited thereto and may pass the previous feature map through a dilated convolution layer to output a new feature map. Here, a dilated convolution layer is a convolution layer with a spacing K between kernel elements. The feature map generation unit 220 may include a preset number of dilation dense modules according to the number of rows of the grid structure.
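To make the effect of the kernel spacing concrete, the sketch below implements a one-dimensional dilated convolution in plain Python. It is an illustration only: it follows the convention of common deep learning frameworks, in which a `dilation` of 1 is an ordinary convolution (computed as cross-correlation, without padding), and how that parameter maps onto the spacing K of the specification is an assumption.

```python
from typing import List


def effective_kernel_size(kernel_size: int, dilation: int) -> int:
    """Span of input positions covered by a dilated kernel."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal: List[float], kernel: List[float], dilation: int = 1) -> List[float]:
    """Valid (unpadded) 1-D cross-correlation with gaps between kernel taps."""
    span = effective_kernel_size(len(kernel), dilation)
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
kernel = [1.0, 1.0, 1.0]
print(effective_kernel_size(3, 2))                # 5
print(dilated_conv1d(signal, kernel, dilation=1)) # [6.0, 9.0, 12.0, 15.0]
print(dilated_conv1d(signal, kernel, dilation=2)) # [9.0, 12.0]
```

With the same three weights, a dilation of 2 covers five input positions instead of three, so the receptive field widens without adding parameters.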

Referring again to FIG. 2, each layer formed in the grid structure may output a feature map of local information that, according to the number of channels (that is, the depth), expresses global information about the surrounding environment together with local information about the object. Each layer of the grid structure may be formed of a plurality of dense modules 221 connected in series in the row direction. The layers formed in the grid structure may be connected in the column (height) direction of the grid structure by at least one downsampling module 222 and at least one upsampling module 223. That is, through the downsampling modules 222 and upsampling modules 223, global information is extracted at each layer according to the number of channels (the depth) and reflected in the local-information feature maps output from each layer. For example, if the first layer has N channels, the second layer may have 2¹ x N channels, the third layer 2² x N channels, and so on, so that the k-th layer has 2^(k-1) x N channels. In this case, if the feature map output from the first layer is 128x128x34, the feature map output from the second layer may be 64x64x68, from the third layer 32x32x136, from the fourth layer 16x16x272, from the fifth layer 8x8x544, and from the sixth layer 4x4x1088. Here, consistent with these sizes, the downsampling module 222 may double the number of channels of its input feature map and halve its width and height. For example, the downsampling module 222 may include a 3x3 convolution layer, a batch normalization layer, an activation layer, and a 1x1 convolution layer, with Mish as the activation function. The upsampling module 223 may halve the number of channels of its input feature map and double its width and height. For example, the upsampling module 223 may include a 3x3 convolution layer, a batch normalization layer, an activation layer, and a 1x1 convolution layer.
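The layer sizes in the example follow a simple pattern: moving one layer deeper halves the width and height and doubles the channel count. The short sketch below reproduces that arithmetic; the function name `layer_shapes` is hypothetical and the code only computes shapes, it does not model the modules themselves.

```python
from typing import List, Tuple


def layer_shapes(height: int, width: int, channels: int,
                 num_layers: int) -> List[Tuple[int, int, int]]:
    """Shapes (height, width, channels) of successive grid layers, assuming
    each step down halves the spatial size and doubles the channels."""
    shapes = []
    for _ in range(num_layers):
        shapes.append((height, width, channels))
        height, width, channels = height // 2, width // 2, channels * 2
    return shapes


for depth, shape in enumerate(layer_shapes(128, 128, 34, 6), start=1):
    print(f"layer {depth}: {shape[0]}x{shape[1]}x{shape[2]}")
# layer 1: 128x128x34
# layer 2: 64x64x68
# layer 3: 32x32x136
# layer 4: 16x16x272
# layer 5: 8x8x544
# layer 6: 4x4x1088
```

Under this pattern the k-th layer carries 2^(k-1) times the channels of the first layer, which is why the deepest layer here has 32 x 34 = 1088 channels.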

In addition, the feature maps output from the dense modules 221, the downsampling modules 222, and the upsampling modules 223 may be fused by a fusion block 224. The fusion block 224 may add the output feature maps together and fuse them using a masking technique that multiplies the result by a filter that sets feature values other than those of the object to 0, in order to emphasize the features corresponding to the object. Fusing the feature maps with the fusion block 224 thus highlights the important features (the object) and removes the others, which enables clearer learning.
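The add-then-mask fusion can be sketched as follows. This is a minimal illustration under assumptions: the feature maps are flattened lists that already share one shape, and the binary `mask` is supplied by the caller, since the specification does not say how the filter is produced.

```python
from typing import List, Sequence


def fuse(feature_maps: Sequence[Sequence[float]], mask: Sequence[float]) -> List[float]:
    """Element-wise sum of same-shaped feature maps, multiplied by a mask.

    Positions where `mask` is 0 (assumed to be non-object positions) are
    zeroed, so only object features survive the fusion.
    """
    summed = [sum(values) for values in zip(*feature_maps)]
    return [s * m for s, m in zip(summed, mask)]


from_dense = [1.0, 2.0, 3.0]
from_resampling = [10.0, 20.0, 30.0]
object_mask = [0.0, 1.0, 0.0]  # hypothetical: only the middle position belongs to the object
print(fuse([from_dense, from_resampling], object_mask))  # [0.0, 22.0, 0.0]
```

In a real network the inputs would first have to be brought to a common resolution and channel count, a step this sketch leaves out.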

The discriminator 300 may learn from the ground-truth image and the object estimation image generated by the generator 200, and determine from what it has learned whether an input image is a ground-truth image or a generated image. Here, the ground-truth image is the image against which the image generated by the generator 200 is compared, and may be an input image (an image captured by an imaging device such as a camera) on which the actual location of the object is marked. As the discriminator 300 feeds its determination result back to the generator 200, the images generated by the generator 200 may become increasingly similar to the real ones.

Referring to FIG. 6, the discriminator 300 may include a feature extraction module 310, a subsampling module 320, and a classification module 330.

The feature extraction module 310 may extract feature maps for the ground-truth image and the object estimation image. The feature extraction module 310 may include a convolution layer, a batch normalization layer, and an activation layer, and may repeatedly perform the process of generating feature maps for the ground-truth image and the object estimation image through these layers. Here, the convolution layer in the feature extraction module 310 may have a kernel size of 4, a stride of 2, and a padding of 1. Leaky ReLU (leaky rectified linear unit), which can pass negative values, may be used as the activation function, enabling more accurate object recognition. For example, performing the feature map extraction process (convolution layer, batch normalization layer, and activation layer) once on an input image (128×128×2) may produce a 64×64×24 feature map, and performing it four times in total may finally produce an 8×8×768 feature map.
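The spatial sizes in this example can be checked with the usual convolution output-size formula, floor((size + 2 x padding - kernel) / stride) + 1. The sketch below applies it with the stated kernel, stride, and padding; it covers the width and height only and does not attempt to model the channel counts.

```python
from typing import List


def conv_output_size(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output length of a convolution along one spatial axis."""
    return (size + 2 * padding - kernel) // stride + 1


def repeated_sizes(size: int, passes: int,
                   kernel: int = 4, stride: int = 2, padding: int = 1) -> List[int]:
    """Spatial size before the first pass and after each successive pass."""
    sizes = [size]
    for _ in range(passes):
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


print(conv_output_size(128, kernel=4, stride=2, padding=1))  # 64
print(repeated_sizes(128, passes=4))                         # [128, 64, 32, 16, 8]
```

With a kernel of 4, a stride of 2, and a padding of 1, every pass halves the spatial size exactly, so four passes take a 128×128 input down to 8×8.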

The subsampling module 320 may compress the feature map generated by the feature extraction module 310. The subsampling module 320 may include a pooling layer, a fully connected layer, and a spectral normalization layer, and may perform max pooling and average pooling separately. For example, the subsampling module 320 may generate a 1×1×2 feature map from the 8×8×768 feature map generated by the feature extraction module 310.
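As a rough illustration of the two pooling branches, the sketch below applies max pooling and average pooling over the whole spatial extent of a small feature map. Treating the pooling as global (one value per channel) is an assumption, as are the names `global_max_pool` and `global_avg_pool`; the fully connected layer and the spectral normalization layer are omitted.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [row][col][channel]


def global_max_pool(fmap: FeatureMap) -> List[float]:
    """Largest value of each channel over all spatial positions."""
    channels = len(fmap[0][0])
    return [max(px[c] for row in fmap for px in row) for c in range(channels)]


def global_avg_pool(fmap: FeatureMap) -> List[float]:
    """Mean value of each channel over all spatial positions."""
    channels = len(fmap[0][0])
    count = len(fmap) * len(fmap[0])
    return [sum(px[c] for row in fmap for px in row) / count for c in range(channels)]


# Toy 2x2 map with 2 channels (the example in the text is 8x8x768).
toy = [
    [[1.0, -2.0], [3.0, 0.0]],
    [[5.0, 4.0], [7.0, 2.0]],
]
print(global_max_pool(toy))  # [7.0, 4.0]
print(global_avg_pool(toy))  # [4.0, 1.0]
```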

The classification module 330 may add together the feature maps delivered from the subsampling module 320 (the results of max pooling and of average pooling, respectively) and use a softmax function to classify the input as real (measured) or fake (generated).
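The addition and softmax step can be sketched as follows. This is an illustration under assumptions, not the patented implementation: each branch is taken to deliver a two-element vector, index 0 is taken to mean "real" and index 1 "fake", and the function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(max_branch: List[float], avg_branch: List[float]) -> str:
    """Add the two pooled branches element-wise, then pick the likelier class."""
    summed = [a + b for a, b in zip(max_branch, avg_branch)]
    probs = softmax(summed)
    # Assumption: index 0 is the "real" class, index 1 the "fake" class.
    return "real" if probs[0] >= probs[1] else "fake"


print(classify([1.5, 0.2], [0.5, 0.3]))  # real
```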

When, through the learning process described above, the image generated by the generator 200 becomes sufficiently similar to the measured image, the discriminator 300 can no longer distinguish whether an input image is the measured image or the generated object estimation image. When the GAN reaches this state, the learning process ends, and the generator 200 thereafter generates an object estimation image from each input image.
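For orientation, the stopping state described here matches the equilibrium of the standard GAN minimax objective. This passage does not state which loss the embodiment uses, so the formula below is the textbook form, given only as an assumption. The symbols are also assumptions: G is the generator 200, D is the discriminator 300, x is an input image, and y is a measured image.

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{y}\big[\log D(y)\big]
+ \mathbb{E}_{x}\big[\log\big(1 - D(G(x))\big)\big]
```

At the equilibrium of this objective, D outputs 1/2 for measured and generated images alike, which corresponds to the discriminator being unable to tell the two apart.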

FIG. 7 is a block diagram illustrating an object detection apparatus according to an embodiment of the present invention. Components corresponding to those of the embodiments described with reference to FIGS. 1 to 6 perform the same or similar functions as described there, so a more detailed description of them is omitted.

As shown in FIG. 7, an object detection apparatus 800 according to an embodiment of the present invention may include a generative adversarial network 810. In this embodiment, the generative adversarial network 810 may be in a state in which learning has been completed.

The generative adversarial network 810 may extract an input feature map from an input image, generate a final feature map by reflecting local information and global information in the extracted input feature map, and generate an object estimation image from the generated final feature map.

FIG. 8 is a diagram for explaining the object estimation performance of an object detection apparatus according to an embodiment of the present invention.

FIG. 8 compares the performance of the object detection technique proposed in the embodiment with that of conventional object detection algorithms. In the figure, detector performance is expressed as F-area. As FIG. 8 shows, the proposed technique performs better than the conventional algorithms. Here, F-area is obtained by multiplying the F1-Score by the average precision. The F1-Score is a well-known metric, so a detailed description is omitted; briefly, it is a score computed from precision and recall, which are metrics for evaluating predictions. Because F-area multiplies the F1-Score, which is measured at a fixed threshold, by the average precision, which evaluates the detector's potential performance, it is a metric that also captures the detector's potential performance.
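The F-area computation can be sketched in a few lines of Python. The counts and the average precision value below are made-up illustrative numbers, not results from FIG. 8, and the function names are hypothetical; the average precision is taken as a given input rather than computed from a precision-recall curve.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1-Score: harmonic mean of precision and recall at one fixed threshold."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def f_area(f1: float, average_precision: float) -> float:
    """F-area as defined in the text: F1-Score multiplied by average precision."""
    return f1 * average_precision


# Illustrative counts only: 8 true positives, 2 false positives, 2 false negatives.
f1 = f1_score(tp=8, fp=2, fn=2)    # precision = recall = 0.8
print(round(f1, 3))                # 0.8
print(round(f_area(f1, 0.9), 3))   # 0.72
```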

Therefore, according to embodiments of the present invention, small objects can be detected at high speed and detection accuracy can be increased.

FIG. 9 is a block diagram illustrating a computing environment 10 that includes a computing device suitable for use in exemplary embodiments. In the illustrated embodiment, each component may have functions and capabilities different from those described below, and additional components beyond those described below may be included.

The illustrated computing environment 10 includes a computing device 12. In one embodiment, the computing device 12 may be a device for performing object detection according to an embodiment of the present invention.

The computing device 12 includes at least one processor 14, a computer-readable storage medium 16, and a communication bus 18. The processor 14 may cause the computing device 12 to operate according to the exemplary embodiments mentioned above. For example, the processor 14 may execute one or more programs stored in the computer-readable storage medium 16. The one or more programs may include one or more computer-executable instructions, which, when executed by the processor 14, may be configured to cause the computing device 12 to perform operations according to the exemplary embodiments.

The computer-readable storage medium 16 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. A program 20 stored in the computer-readable storage medium 16 includes a set of instructions executable by the processor 14. In one embodiment, the computer-readable storage medium 16 may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, any other form of storage medium that can be accessed by the computing device 12 and can store desired information, or a suitable combination thereof.

The communication bus 18 interconnects the various other components of the computing device 12, including the processor 14 and the computer-readable storage medium 16.

The computing device 12 may also include one or more input/output interfaces 22, which provide an interface for one or more input/output devices 24, and one or more network communication interfaces 26. The input/output interface 22 and the network communication interface 26 are connected to the communication bus 18. The input/output device 24 may be connected to other components of the computing device 12 through the input/output interface 22. Exemplary input/output devices 24 may include input devices such as a pointing device (a mouse, a trackpad, or the like), a keyboard, a touch input device (a touchpad, a touchscreen, or the like), a voice or sound input device, various types of sensor devices, and/or an imaging device, and/or output devices such as a display device, a printer, a speaker, and/or a network card. An exemplary input/output device 24 may be included inside the computing device 12 as a component of the computing device 12, or may be connected to the computing device 12 as a separate device distinct from the computing device 12.

Although representative embodiments of the present invention have been described in detail above, those of ordinary skill in the art to which the present invention pertains will understand that various modifications can be made to the embodiments described above without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims set forth below and their equivalents.

10: Computing environment
12: Computing device
14: Processor
16: Computer-readable storage medium
18: Communication bus
20: Program
22: Input/output interface
24: Input/output device
26: Network communication interface

Claims

A learning device for object detection based on a generative adversarial network (GAN),
wherein the generative adversarial network comprises:
a generator that generates an object estimation image from an input image; and
a discriminator that compares the object estimation image generated by the generator with a preset measured image, determines, according to the comparison result, whether an input image is the measured image or the generated object estimation image, and feeds the determination result back to the generator,
wherein the generator comprises:
an input unit that extracts an input feature map from the input image;
a feature map generator that is formed in a grid structure and outputs a final feature map by reflecting local information and global information based on the extracted input feature map; and
an output unit that generates the object estimation image from the output final feature map,
wherein the feature map generator comprises:
a plurality of dense modules connected in a row direction of the grid structure to form layers of the grid structure; and
at least one upsampling module and one downsampling module connected in a column direction of the grid structure to connect the layers of the grid structure, and
wherein the dense module
divides an input feature map, according to its channels, into a first partial feature map and a second partial feature map, passes the second partial feature map through a dense block and a transition block, and then combines the result with the first partial feature map to output a feature map of local information.

(Deleted)

The learning device of claim 1,
wherein each of the layers outputs a feature map of local information that includes global information according to depth.

(Deleted)

The learning device of claim 1,
wherein the feature map generator
further comprises a fusion block that fuses the respective feature maps output from the dense module, the upsampling module, and the downsampling module.

The learning device of claim 6,
wherein the fusion block
adds the respective output feature maps and fuses them using a masking technique that, in order to emphasize features corresponding to the object, multiplies feature values other than those of the object by a filter that produces 0.

The learning device of claim 1,
wherein the discriminator comprises:
a feature extraction module that extracts feature maps for the measured image and the object estimation image;
a subsampling module that compresses the feature maps extracted by the feature extraction module; and
a classification module that classifies, using a softmax function, whether the input is real or fake based on the feature maps compressed by the subsampling module.

The learning device of claim 8,
wherein the subsampling module
compresses the extracted feature map by performing max pooling and average pooling on it separately, and
wherein the classification module
adds the feature maps obtained by the max pooling and the average pooling.

A learning method for object detection performed in a computing device comprising one or more processors and a memory storing one or more programs executed by the one or more processors, the method comprising:
generating an object estimation image from an input image in a generative adversarial network (GAN);
comparing the generated object estimation image with a preset measured image, and determining, according to the comparison result, whether an input image is the measured image or the generated object estimation image; and
feeding the determination result back to the generating of the object estimation image,
wherein the generating of the object estimation image comprises:
extracting an input feature map from the input image;
outputting a final feature map by reflecting local information and global information based on the input feature map using a grid structure; and
generating the object estimation image from the output final feature map, and
wherein the outputting of the final feature map comprises:
dividing an input feature map, according to its channels, into a first partial feature map and a second partial feature map; and
passing the second partial feature map through a dense block and a transition block and then combining the result with the first partial feature map to output a feature map of local information.

(Deleted)

An object detection device based on a generative adversarial network (GAN),
wherein the generative adversarial network
extracts an input feature map from an input image, outputs a final feature map by reflecting local information and global information based on the extracted input feature map using a grid structure, and generates an object estimation image from the output final feature map,
wherein the generative adversarial network comprises:
a plurality of dense modules connected in a row direction of the grid structure to form layers of the grid structure; and
at least one upsampling module and one downsampling module connected in a column direction of the grid structure to connect the layers of the grid structure, and
wherein the dense module
divides an input feature map, according to its channels, into a first partial feature map and a second partial feature map, passes the second partial feature map through a dense block and a transition block, and then combines the result with the first partial feature map to output a feature map of local information.

(Deleted)

The object detection device of claim 12,
wherein each of the layers outputs a feature map of local information that includes global information according to depth.

(Deleted)

The object detection device of claim 12,
wherein the generative adversarial network
further comprises a fusion block that fuses the respective feature maps output from the dense module, the upsampling module, and the downsampling module.

The object detection device of claim 16,
wherein the fusion block
adds the respective output feature maps and fuses them using a masking technique that, in order to emphasize features corresponding to the object, multiplies feature values other than those of the object by a filter that produces 0.

An object detection method performed in a computing device comprising one or more processors and a memory storing one or more programs executed by the one or more processors, the method comprising:
extracting an input feature map from an input image in a generative adversarial network (GAN);
outputting, in the generative adversarial network, a final feature map by reflecting local information and global information based on the input feature map using a grid structure; and
generating, in the generative adversarial network, an object estimation image from the output final feature map,
wherein the outputting of the final feature map comprises:
dividing an input feature map, according to its channels, into a first partial feature map and a second partial feature map; and
passing the second partial feature map through a dense block and a transition block and then combining the result with the first partial feature map to output a feature map of local information.