KR20210081852A

KR20210081852A - Apparatus and method for training object detection model

Info

Publication number: KR20210081852A
Application number: KR1020190174201A
Authority: KR
Inventors: 김성호; 류준환
Original assignee: 영남대학교 산학협력단
Priority date: 2019-12-24
Filing date: 2019-12-24
Publication date: 2021-07-02
Also published as: KR102427884B1

Abstract

Disclosed are a device and method for training an object detection model. The device for training the object detection model according to one embodiment comprises: an image acquisition part that acquires a first image and a second image for a specific space generated using different image sensors; and a model training part that generates fusion information by exchanging some of the information in a feature map generated from the first image and some of the information in a feature map generated from the second image with each other, and trains an object detection model to detect one or more objects existing in the specific space based on the fusion information. Therefore, the present invention is capable of effectively training a model that detects objects.

Description

Object detection model training apparatus and method {APPARATUS AND METHOD FOR TRAINING OBJECT DETECTION MODEL}

The disclosed embodiments relate to techniques for training an object detection model.

With advances in big data processing, object detection has become important in a wide variety of fields. Recent research has focused mainly on object detection based on deep learning rather than on traditional machine learning techniques.

Because object detection is used chiefly in security and traffic applications, detecting objects accurately at night as well as during the day has become an important goal. To this end, various methodologies have been proposed for fusing data obtained from a charge-coupled device (CCD) camera and an infrared (IR) camera.

However, the previously proposed methodologies attempt data fusion based solely on concatenation in one part of the network structure, and therefore fail to fuse the CCD camera data and the IR camera data properly.

Korean Patent Laid-Open Publication No. 10-2019-0119261 (published October 22, 2019)

The disclosed embodiments are intended to effectively train a model that detects objects.

An apparatus for training an object detection model according to an embodiment includes an image acquisition unit that acquires a first image and a second image of a specific space generated using different image sensors, and a model training unit that generates fusion information by exchanging some of the information in a feature map generated from the first image with some of the information in a feature map generated from the second image, and trains an object detection model to detect one or more objects present in the specific space based on the fusion information.

The object detection model may include a first network that includes a plurality of feature extraction layers having a plurality of scales and receives the first image, and a second network that has the same structure as the first network and receives the second image. The model training unit may arbitrarily match at least one of the feature maps output from the plurality of feature extraction layers in the first network with at least one of the feature maps output from the plurality of feature extraction layers in the second network, and exchange some of the information in the matched feature maps.

The object detection model may further include a detection unit that generates one or more object detection boxes corresponding to the one or more objects based on the fusion information and calculates a confidence score for each of the one or more object detection boxes.

Each of the first network and the second network may further include a plurality of concatenation layers and one or more down-scale layers. The plurality of feature extraction layers may be arranged consecutively for each scale, a concatenation layer may be arranged after the plurality of feature extraction layers for each scale, and a down-scale layer may be arranged after each concatenation layer except the concatenation layer of the minimum scale.

The detection unit may generate the one or more object detection boxes based on each result of concatenating, scale by scale, the plurality of feature maps output from the concatenation layers in the first network and the plurality of feature maps output from the concatenation layers in the second network, and may calculate the confidence score based on the one or more object detection boxes and a preset ground-truth box.

The detection unit may remove overlapping object detection boxes from among the one or more object detection boxes by using a Non-Maximum Suppression (NMS) algorithm based on the confidence score of each of the one or more object detection boxes.

The model training unit may match a first feature map, among the feature maps output from the plurality of feature extraction layers in the first network, with a second feature map that has the same scale as the first feature map, among the feature maps output from the plurality of feature extraction layers in the second network.

The model training unit may exchange the element values of at least some pixels in the first feature map with the element values of the pixels at the corresponding positions in the second feature map.

When the at least some pixels and the pixels at the corresponding positions have been determined, the model training unit may exchange their element values based on a preset probability.

A method for training an object detection model according to an embodiment includes: acquiring a first image and a second image of a specific space generated using different image sensors; generating fusion information by exchanging some of the information in a feature map generated from the first image with some of the information in a feature map generated from the second image; and training an object detection model to detect one or more objects present in the specific space based on the fusion information.

The object detection model may include a first network that includes a plurality of feature extraction layers having a plurality of scales, and a second network that has the same structure as the first network. Generating the fusion information may include: inputting the first image to the first network; inputting the second image to the second network; and arbitrarily matching at least one of the feature maps output from the plurality of feature extraction layers in the first network with at least one of the feature maps output from the plurality of feature extraction layers in the second network, and exchanging some of the information in the matched feature maps.

The first network and the second network may further include a plurality of concatenation layers and one or more down-scale layers. The plurality of feature extraction layers may be arranged consecutively for each scale, a concatenation layer may be arranged after the plurality of feature extraction layers for each scale, and a down-scale layer may be arranged after each concatenation layer except the concatenation layer of the minimum scale.

The detection unit may generate the one or more object detection boxes based on each result of concatenating, scale by scale, the plurality of feature maps output from the concatenation layers in the first network and the plurality of feature maps output from the concatenation layers in the second network, and may calculate the confidence score based on the one or more object detection boxes and a preset ground-truth box.

The detection unit may remove overlapping object detection boxes from among the one or more object detection boxes by using a non-maximum suppression algorithm based on the confidence score of each of the one or more object detection boxes.

The exchanging may include matching a first feature map, among the feature maps output from the plurality of feature extraction layers in the first network, with a second feature map that has the same scale as the first feature map, among the feature maps output from the plurality of feature extraction layers in the second network.

The exchanging may be performed by exchanging the element values of at least some pixels in the first feature map with the element values of the pixels at the corresponding positions in the second feature map.

When the at least some pixels and the pixels at the corresponding positions have been determined, the exchanging may be performed based on a preset probability.

According to the disclosed embodiments, the performance of an object detection model can be improved by fusing the information of two images generated using different image sensors.

In addition, according to the disclosed embodiments, image information is fused without adding any hyper-parameters, so that no additional memory or computation may be required when detecting objects.

FIG. 1 is a block diagram illustrating an apparatus for training an object detection model according to an embodiment.
FIG. 2 is a block diagram illustrating an apparatus for executing an object detection model according to an embodiment.
FIG. 3 is a block diagram illustrating an object detection model according to an embodiment.
FIG. 4 is a diagram illustrating the object detection model according to an embodiment in detail.
FIG. 5 is a flowchart illustrating a method for training an object detection model according to an embodiment.
FIG. 6 is a flowchart illustrating a method of generating fusion information according to an embodiment.
FIG. 7 is a flowchart illustrating a method of executing an object detection model according to an embodiment.
FIG. 8 is a diagram comparing a conventional object detection result with an object detection result according to an embodiment.
FIG. 9 is a block diagram illustrating a computing environment including a computing device suitable for use in exemplary embodiments.

Hereinafter, specific embodiments will be described with reference to the drawings. The following detailed description is provided to aid a comprehensive understanding of the methods, apparatuses, and/or systems described herein. However, it is merely an example, and the disclosed embodiments are not limited thereto.

In describing the embodiments, a detailed description of related known technology is omitted where it is judged that it could unnecessarily obscure the gist of the disclosed embodiments. The terms used below are defined in consideration of their functions in the disclosed embodiments and may vary according to the intention or custom of users and operators. Their definitions should therefore be based on the content of this specification as a whole. The terminology used in the detailed description is only for describing the embodiments and should in no way be limiting. Unless clearly used otherwise, a singular expression includes the plural meaning. In this description, expressions such as "including" or "having" indicate certain features, numbers, steps, operations, elements, or parts or combinations thereof, and should not be construed to exclude the presence or possibility of one or more other features, numbers, steps, operations, elements, or parts or combinations thereof beyond those described.

Hereinafter, 'object detection' refers to a computer vision technique that detects an object contained in an image by displaying, on the image, a box that surrounds the object. Here, 'object' may narrowly mean a person contained in the image and may broadly mean a car, person, animal, plant, or the like contained in the image, but is not necessarily limited thereto.

FIG. 1 is a block diagram illustrating an apparatus 100 for training an object detection model according to an embodiment. Referring to FIG. 1, the apparatus 100 for training an object detection model according to an embodiment includes an image acquisition unit 110 and a model training unit 120.

The image acquisition unit 110 acquires a first image and a second image of a specific space generated using different image sensors.

Hereinafter, an 'image sensor' may be a means for directly photographing a specific space or a means for converting an already captured image into a specific format.

For example, the 'image sensor' may be any one of a charge-coupled device (CCD) camera, an infrared (IR) camera, a grayscale image converter, and a radiometric temperature image converter, but is not necessarily limited thereto.

According to an embodiment, when the first image is an image generated by a CCD camera, the second image may be an image generated by an IR camera.

According to another embodiment, when the first image is a grayscale image, the second image may be a radiometric temperature image.

The model training unit 120 generates fusion information by exchanging some of the information in a feature map generated from the first image with some of the information in a feature map generated from the second image, and trains the object detection model 300 to detect one or more objects present in the specific space based on the fusion information.

According to an embodiment, when the object detection model 300 includes a first network 310 and a second network 320, the model training unit 120 may arbitrarily match at least one of the feature maps output from the plurality of feature extraction layers in the first network 310 with at least one of the feature maps output from the plurality of feature extraction layers in the second network 320, and exchange some of the information in the matched feature maps.

According to an embodiment, the model training unit 120 may match a first feature map, among the feature maps output from the plurality of feature extraction layers in the first network 310, with a second feature map that has the same scale as the first feature map, among the feature maps output from the plurality of feature extraction layers in the second network 320.

The model training unit 120 may then perform fusion by exchanging the element values of at least some pixels in the first feature map with the element values of the pixels at the corresponding positions in the second feature map.

Here, once the pixels in the first feature map whose element values are to be exchanged and the pixels at the corresponding positions in the second feature map have been determined, the model training unit 120 may exchange their element values based on a preset probability.

In more detail, once a pair of pixels to be exchanged has been determined, the model training unit 120 may actually exchange the element values of the pair only when a preset probability condition is satisfied.

For example, the model training unit 120 may set the probability of performing the exchange within a range of 30 percent to 50 percent, inclusive.

According to an embodiment, the model training unit 120 may exchange the element values of a pair of pixels according to Equation 1 below.

[Equation 1]

$$
\begin{aligned}
\mathrm{tmp} &\leftarrow F^{s}_{i}(x, y) \\
F^{s}_{i}(x, y) &\leftarrow F^{s'}_{i'}(x, y) \\
F^{s'}_{i'}(x, y) &\leftarrow \mathrm{tmp}
\end{aligned}
$$

The equation image and its symbols did not survive in this text; the form above is reconstructed from the definitions that follow, with $F$ and $\mathrm{tmp}$ as stand-in symbol names.

Here, $s$ is a variable designating a scale of the first network 310, $s'$ is a variable designating the scale of the second network 320 that corresponds to $s$, $x$ is the horizontal coordinate of a pixel on the feature map, $y$ is the vertical coordinate of a pixel on the feature map, $i$ is the index of a feature map within scale $s$, increasing sequentially by 1, $i'$ is a random index of a feature map within scale $s'$, $F^{s}_{i}(x, y)$ is the element value of the pixel at position $(x, y)$ of the $i$-th feature map at scale $s$, $F^{s'}_{i'}(x, y)$ is the element value of the pixel at position $(x, y)$ of the $i'$-th feature map at scale $s'$, and $\mathrm{tmp}$ is a temporary variable used for the value exchange.

For example, referring to FIG. 4, s takes one of the values 1, 2, and 3, and designates one of three classes into which the plurality of convolution layers are grouped by scale. When the three convolution layers within one scale in which element values may be exchanged have index values 1, 2, and 3, i is a variable that increases from 1 to 3 in steps of 1, and i' is a variable that takes an arbitrary integer value from 1 to 3.
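The exchange described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `exchange_features` and `exchange_scale` are hypothetical, each feature map is a 2-D list of numbers, every pixel position is treated as a candidate pair, indices are zero-based rather than one-based, and the default probability of 0.4 is simply a value inside the 30 to 50 percent range mentioned in the text.

```python
import random


def exchange_features(map_a, map_b, swap_prob=0.4, rng=None):
    """Swap element values at matching (y, x) positions of two 2-D feature maps.

    Each position is swapped independently with probability `swap_prob`.
    Both maps are modified in place; the number of swapped positions is returned.
    """
    if rng is None:
        rng = random.Random()
    same_size = len(map_a) == len(map_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(map_a, map_b)
    )
    if not same_size:
        raise ValueError("feature maps must have the same spatial size")
    swapped = 0
    for y in range(len(map_a)):
        for x in range(len(map_a[y])):
            if rng.random() < swap_prob:  # preset probability condition
                tmp = map_a[y][x]         # temporary variable for the swap
                map_a[y][x] = map_b[y][x]
                map_b[y][x] = tmp
                swapped += 1
    return swapped


def exchange_scale(maps_a, maps_b, swap_prob=0.4, rng=None):
    """Within one scale, pair map i of network A with a random map i' of network B."""
    if rng is None:
        rng = random.Random()
    pairs = []
    for i, map_a in enumerate(maps_a):
        i_prime = rng.randrange(len(maps_b))  # random index i' in the same scale
        exchange_features(map_a, maps_b[i_prime], swap_prob, rng)
        pairs.append((i, i_prime))
    return pairs


# Usage: three 4x4 maps per network within a single scale.
ccd_maps = [[[0.0] * 4 for _ in range(4)] for _ in range(3)]
ir_maps = [[[1.0] * 4 for _ in range(4)] for _ in range(3)]
matched = exchange_scale(ccd_maps, ir_maps, swap_prob=0.4, rng=random.Random(0))
```

Because a swap only moves values between the two maps, the pair of values at any position is preserved; only which network holds which value changes.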

FIG. 2 is a block diagram illustrating an apparatus 200 for executing an object detection model according to an embodiment. Referring to FIG. 2, the apparatus 200 for executing an object detection model according to an embodiment includes an image acquisition unit 210 and a model execution unit 220.

The image acquisition unit 210 acquires a first image and a second image of a specific space generated using different image sensors.

The model execution unit 220 uses the object detection model 300 previously trained by the apparatus 100 for training an object detection model to detect one or more objects present in the specific space based on the first image and the second image.

Specifically, the model execution unit 220 uses the object detection model 300 to generate an output image in which, with either the first image or the second image as the background, an object detection box is displayed on each of the one or more objects present in that image.

When the model execution unit 220 uses the trained object detection model 300, no information is exchanged between the feature maps output from the feature extraction layers in the first network 310 and the feature maps output from the feature extraction layers in the second network 320. That is, the exchange operation described with reference to FIG. 1 is performed only while the object detection model 300 is being trained.

According to an embodiment, the model execution unit 220 may be set in advance so that one of the first image and the second image becomes the background of the output image.

For example, when the first image is captured by a CCD camera and the second image is captured by an IR camera, the model execution unit 220 may set the first image as the background of the output image in consideration of visibility.

FIG. 3 is a block diagram illustrating an object detection model 300 according to an embodiment.

Referring to FIG. 3, the object detection model 300 according to an embodiment includes a first network 310, a second network 320, and a detection unit 330.

The first network 310 includes a plurality of feature extraction layers having a plurality of scales and receives the first image.

According to an embodiment, the first network 310 may include a convolutional neural network (CNN) structure.

Specifically, the first network 310 may include a Single-Shot multibox Detector (SSD) structure.

For example, the first network 310 may combine part of a ResNet structure with part of an SSD structure, but is not necessarily limited thereto, and a VGGNet structure may be used instead of the ResNet structure.

According to an embodiment, the plurality of feature extraction layers may be arranged consecutively for each scale.

Specifically, each feature extraction layer may be a convolution layer that outputs a feature map. The stride of each feature extraction layer may be 1.

According to an embodiment, the first network 310 may further include a plurality of concatenation layers and one or more down-scale layers.

A concatenation layer may be arranged after the plurality of feature extraction layers for each scale.

Specifically, each concatenation layer may be a convolution layer that outputs a feature map, and the feature maps output from the concatenation layers may be concatenated scale by scale in the detection unit 330. The stride of each concatenation layer may be the same as the stride of each feature extraction layer.

A down-scale layer may be arranged after each of the concatenation layers except the concatenation layer of the minimum scale.

Specifically, each down-scale layer may be a convolution layer that outputs a feature map, and the stride of each down-scale layer is greater than the stride of the feature extraction layers of the same scale. For example, when the stride of a feature extraction layer is 1, the stride of the down-scale layer of the same scale may be 2.
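To make the effect of the stride values concrete, the following sketch traces the spatial size of the feature maps through one network. It is an illustration only: the 3x3 kernel, the padding of 1, the three feature extraction layers per scale, and the 64-pixel input are assumptions that the text does not state, and `conv_output_size` and `concat_layer_sizes` are hypothetical helper names. Only the stride values (1 for feature extraction and concatenation layers, 2 for down-scale layers) and the layer ordering come from the description.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution: floor((size + 2*padding - kernel) / stride) + 1."""
    return (size + 2 * padding - kernel) // stride + 1


def concat_layer_sizes(input_size, num_scales=3, layers_per_scale=3):
    """Return the feature-map size at the concatenation layer of each scale."""
    sizes = []
    size = input_size
    for scale in range(1, num_scales + 1):
        for _ in range(layers_per_scale):
            size = conv_output_size(size, stride=1)  # feature extraction layers
        size = conv_output_size(size, stride=1)      # concatenation layer
        sizes.append(size)
        if scale < num_scales:
            # A down-scale layer follows every concatenation layer
            # except the one at the minimum (last) scale.
            size = conv_output_size(size, stride=2)
    return sizes


print(concat_layer_sizes(64))  # [64, 32, 16]
```

Under these assumptions the stride-1 layers keep the spatial size unchanged within a scale, and each stride-2 down-scale layer halves it before the next scale begins.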

제2 네트워크(320)는 제1 네트워크(310)와 동일한 구조로 구성되며, 제2 이미지를 입력받는다.The second network 320 has the same structure as the first network 310 and receives a second image.

검출부(330)는 융합 정보에 기초하여 하나 이상의 객체에 대응되는 하나 이상의 객체 검출 박스를 생성하고 하나 이상의 객체 검출 박스 각각에 대한 신뢰도 점수를 산출할 수 있다.The detection unit 330 may generate one or more object detection boxes corresponding to one or more objects based on the fusion information and calculate a confidence score for each of the one or more object detection boxes.

일 실시예에 따르면, 검출부(330)는 제1 네트워크(310) 내에 포함된 복수의 연접 레이어에서 출력되는 복수의 특징 지도 및 제2 네트워크(320) 내에 포함된 복수의 연접 레이어에서 출력되는 복수의 특징 지도를 스케일 별로 연접한 결과 각각에 기초하여 하나 이상의 객체 검출 박스를 생성하고, 생성된 하나 이상의 객체 검출 박스 및 기 설정된 정답(ground-truth, GT) 박스에 기초하여 신뢰도 점수를 산출할 수 있다.According to an embodiment, the detector 330 includes a plurality of feature maps output from a plurality of concatenated layers included in the first network 310 and a plurality of feature maps output from a plurality of concatenated layers included in the second network 320 . One or more object detection boxes are generated based on each result of concatenating the feature map for each scale, and a reliability score can be calculated based on the generated one or more object detection boxes and a preset correct answer (ground-truth, GT) box. .

Specifically, the detection unit 330 may generate fused feature maps by concatenating, at each scale, the feature map output from the first network 310 with the feature map output from the second network 320, and may generate the object detection boxes based on the element values of the pixels in the fused feature maps.
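The per-scale concatenation can be sketched as follows. This is a minimal illustration, not the disclosed implementation: feature maps are modeled as nested Python lists (channels x height x width), concatenation is assumed to be along the channel axis, and the channel counts and spatial sizes are made up for the example.

```python
def concat_per_scale(maps_a, maps_b):
    """Concatenate same-scale feature maps along the channel axis.

    Each feature map is a list of channels; each channel is a 2-D list (H x W).
    """
    fused = []
    for fa, fb in zip(maps_a, maps_b):
        # Maps paired at one scale must share the same spatial size.
        if len(fa[0]) != len(fb[0]) or len(fa[0][0]) != len(fb[0][0]):
            raise ValueError("feature maps at the same scale must match in H and W")
        fused.append(fa + fb)  # channel counts add up
    return fused


def zeros(channels, height, width):
    """Helper that builds an all-zero feature map (illustrative only)."""
    return [[[0.0] * width for _ in range(height)] for _ in range(channels)]


# Hypothetical outputs of the concatenation layers of each network at three scales.
net1_maps = [zeros(2, 8, 8), zeros(4, 4, 4), zeros(8, 2, 2)]
net2_maps = [zeros(2, 8, 8), zeros(4, 4, 4), zeros(8, 2, 2)]

fused = concat_per_scale(net1_maps, net2_maps)
print([len(f) for f in fused])  # [4, 8, 16]
```

Each fused map keeps the spatial size of its scale and carries the channels of both networks, so the detection stage sees information from both sensors at every scale.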

According to an embodiment, the detection unit 330 may calculate the Intersection over Union (IoU) between each generated object detection box and the preset ground-truth boxes to distinguish positive samples from negative samples.
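A minimal sketch of the IoU computation and of the positive/negative split is given below. The (x1, y1, x2, y2) box format and the 0.5 labeling threshold are assumptions for illustration; the description does not state the threshold used to separate positive from negative samples.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_samples(boxes, gt_boxes, threshold=0.5):
    """Return True (positive) or False (negative) for each detection box.

    A box is positive when its best IoU with any ground-truth box reaches
    the threshold. The threshold value is an assumption.
    """
    return [
        max((iou(box, gt) for gt in gt_boxes), default=0.0) >= threshold
        for box in boxes
    ]


gt_boxes = [(0, 0, 10, 10)]
boxes = [(0, 0, 10, 10), (5, 5, 15, 15), (20, 20, 30, 30)]
print(label_samples(boxes, gt_boxes))  # [True, False, False]
```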

According to an embodiment, the detection unit 330 may remove overlapping object detection boxes from among the one or more object detection boxes by using a Non-Maximum Suppression (NMS) algorithm based on the confidence score of each of the one or more object detection boxes.

Specifically, the detection unit 330 sorts the object detection boxes in descending order of confidence score, calculates the IoU between the box with the highest confidence score and each of the other boxes, and removes every other box whose IoU is 0.5 or more. The detection unit 330 then repeats the same process between the box with the second-highest confidence score among the remaining boxes and the boxes that remain after it, and continues to repeat this process for the boxes that are left.
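The procedure described above corresponds to greedy NMS, which can be sketched as follows. The 0.5 IoU threshold comes from the description; the (x1, y1, x2, y2) box format, the function names, and the sample boxes are assumptions for the example.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return the indices of the boxes that are kept."""
    # Sort box indices by confidence score, highest first.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring box still under consideration
        keep.append(best)
        # Drop every remaining box whose IoU with the kept box is 0.5 or more.
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]
```

In this example the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the third box does not overlap and is kept.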

FIG. 4 is a diagram illustrating the object detection model 300 in detail according to an embodiment.

Referring to FIG. 4, a grayscale image and a radiometric temperature image of a specific space, generated by different image sensors, are respectively input to two networks of identical structure arranged in parallel. The network structure at the top, in which a plurality of layers are arranged, represents the first network 310, and the identical network structure at the bottom represents the second network 320.

According to an embodiment, the images input to the two networks may be images randomly selected from a prepared image data set, or images obtained by processing such images with the respective different image sensors.

The two identically structured networks each include part of a pre-trained ResNet structure at the front end. Following it, each network includes a plurality of convolutional layers arranged across three scales.

Specifically, the plurality of convolutional layers are classified into feature extraction layers, concatenation layers, and downscale layers. At each scale, the layers at the front are feature extraction layers and the layer in the middle is a concatenation layer. At the first and second scales, but not at the last scale, the layer at the end is a downscale layer.

The exchange between pixels of the feature maps output from the feature extraction layers is indicated by the arrows in the center of the drawing, and the net-like tangle of arrows indicates that the indices of the feature maps are selected at random. For convenience, the configuration in which this exchange between pixels of feature maps is performed is referred to as Partially Random-Wired (PRW).

Referring to the dotted box on the right, which shows the function of the detection unit 330, the feature maps output from the concatenation layers are concatenated scale by scale, and based on the concatenated results the detection unit 330 generates object detection boxes (box) and confidence scores (score). The detection unit 330 then applies the NMS algorithm based on the object detection boxes and the confidence scores.

FIG. 5 is a flowchart illustrating a method of training an object detection model according to an embodiment. The method shown in FIG. 5 may be performed, for example, by the object detection model training apparatus 100 described above.

Referring to FIG. 5, the object detection model training apparatus 100 first acquires a first image and a second image of a specific space, generated using different image sensors (510).

Next, the object detection model training apparatus 100 generates fusion information by exchanging part of the information in a feature map generated from the first image with part of the information in a feature map generated from the second image (520).

Next, the object detection model training apparatus 100 trains the object detection model 300 to detect one or more objects present in the specific space based on the fusion information (530).

Although the illustrated flowchart describes the method as a plurality of steps, at least some of the steps may be performed in a different order, performed together in combination with other steps, omitted, divided into sub-steps, or performed with one or more additional steps that are not shown.

FIG. 6 is a flowchart illustrating a method of generating fusion information according to an embodiment. The method shown in FIG. 6 may be performed, for example, by the object detection model training apparatus 100 described above.

Referring to FIG. 6, the object detection model training apparatus 100 first inputs the first image to the first network 310 (610).

Next, the object detection model training apparatus 100 inputs the second image to the second network 320 (620).

Next, the object detection model training apparatus 100 randomly matches at least one of the feature maps output from the respective feature extraction layers included in the first network 310 with at least one of the feature maps output from the respective feature extraction layers included in the second network 320, and exchanges part of the information in each matched feature map (630).

According to an embodiment, in performing step 630, the object detection model training apparatus 100 may match a first feature map, from among the feature maps output from the respective feature extraction layers included in the first network 310, with a second feature map that has the same scale as the first feature map, from among the feature maps output from the respective feature extraction layers included in the second network 320.
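One possible reading of the same-scale random matching is sketched below. It assumes that both networks have the same number of feature extraction layers at each scale and that every feature map of the first network is paired with a randomly chosen feature map of the second network at that scale. The function name, the layer counts, and the pairing policy are all assumptions; the description states only that matching is random and, in this embodiment, restricted to the same scale.

```python
import random


def match_same_scale(layers_per_scale, rng):
    """Return (scale, index_in_net1, index_in_net2) triples.

    layers_per_scale[s] is the assumed number of feature extraction layers
    at scale s, identical for both networks.
    """
    pairs = []
    for scale, count in enumerate(layers_per_scale):
        net2_indices = list(range(count))
        rng.shuffle(net2_indices)  # random choice of partner within the scale
        pairs.extend((scale, i, j) for i, j in enumerate(net2_indices))
    return pairs


# Hypothetical configuration: three scales, three feature extraction layers each.
pairs = match_same_scale([3, 3, 3], random.Random(0))
for scale, i, j in pairs:
    print(f"scale {scale}: net1 map {i} <-> net2 map {j}")
```

Because partners are drawn only from the same scale, matched feature maps always have identical spatial sizes, which is what makes a position-wise exchange well defined.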

According to an embodiment, step 630 may be performed by exchanging the element values of at least some of the pixels in the first feature map with the element values of the pixels at the corresponding positions in the second feature map.

According to an embodiment, once the at least some pixels and the pixels at the corresponding positions have been determined, step 630 may be performed based on a preset probability.
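The position-wise exchange can be sketched as follows, with single-channel feature maps modeled as 2-D lists. Applying the preset probability independently to each selected position is an assumption made for this example; the description says only that the exchange is performed based on a preset probability once the pixels are determined.

```python
import random


def exchange_pixels(fmap_a, fmap_b, positions, prob, rng):
    """Swap element values at the given (row, col) positions.

    Each selected position is swapped with probability `prob` (assumed
    per-position). The inputs are left unmodified; new maps are returned.
    """
    a = [row[:] for row in fmap_a]
    b = [row[:] for row in fmap_b]
    for r, c in positions:
        if rng.random() < prob:
            a[r][c], b[r][c] = b[r][c], a[r][c]
    return a, b


# Hypothetical 2x2 feature maps from the two networks at the same scale.
map_1 = [[1, 2], [3, 4]]
map_2 = [[5, 6], [7, 8]]
selected = [(0, 0), (1, 1)]

new_1, new_2 = exchange_pixels(map_1, map_2, selected, prob=1.0, rng=random.Random(0))
print(new_1)  # [[5, 2], [3, 8]]
print(new_2)  # [[1, 6], [7, 4]]
```

With a probability of 1.0 every selected position is swapped, and with 0.0 none is, so the probability controls how much information crosses between the two networks.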

FIG. 7 is a flowchart illustrating a method of executing an object detection model according to an embodiment. The method shown in FIG. 7 may be performed, for example, by the object detection model execution apparatus 200 described above.

Referring to FIG. 7, the object detection model execution apparatus 200 first acquires a first image and a second image of a specific space, generated using different image sensors (710).

Next, the object detection model execution apparatus 200 detects one or more objects present in the specific space based on the first image and the second image, using the object detection model 300 previously trained by the object detection model training apparatus 100 (720).

FIG. 8 is a diagram comparing conventional object detection results with object detection results according to an embodiment. Specifically, (a) and (c) of FIG. 8 show object detection results from a conventional object detection method (baseline), and (b) and (d) of FIG. 8 show object detection results (proposed) from the object detection model 300 trained by the object detection model training method described above.

Referring to (a) through (d) of FIG. 8, the first and second rows are images of a roadside captured in the daytime, and they show that the trained object detection model 300 additionally detects objects (people) that the conventional method fails to detect. The third and last rows are images of a roadside captured at night, and they show that the trained object detection model 300 likewise additionally detects objects (people) that the conventional method fails to detect.

FIG. 9 is a block diagram illustrating a computing environment 10 that includes a computing device suitable for use in exemplary embodiments. In the illustrated embodiment, each component may have functions and capabilities different from those described below, and additional components beyond those described below may be included.

The illustrated computing environment 10 includes a computing device 12. In one embodiment, the computing device 12 may be the object detection model training apparatus 100. The computing device 12 may also be the object detection model execution apparatus 200.

The computing device 12 includes at least one processor 14, a computer-readable storage medium 16, and a communication bus 18. The processor 14 may cause the computing device 12 to operate according to the exemplary embodiments described above. For example, the processor 14 may execute one or more programs stored in the computer-readable storage medium 16. The one or more programs may include one or more computer-executable instructions which, when executed by the processor 14, may be configured to cause the computing device 12 to perform operations according to the exemplary embodiments.

The computer-readable storage medium 16 is configured to store computer-executable instructions or program code, program data, and/or other suitable forms of information. A program 20 stored in the computer-readable storage medium 16 includes a set of instructions executable by the processor 14. In one embodiment, the computer-readable storage medium 16 may be memory (volatile memory such as random access memory, non-volatile memory, or a suitable combination thereof), one or more magnetic disk storage devices, optical disk storage devices, flash memory devices, any other form of storage medium that can be accessed by the computing device 12 and can store the desired information, or a suitable combination thereof.

The communication bus 18 interconnects the various other components of the computing device 12, including the processor 14 and the computer-readable storage medium 16.

The computing device 12 may also include one or more input/output interfaces 22, which provide an interface for one or more input/output devices 24, and one or more network communication interfaces 26. The input/output interface 22 and the network communication interface 26 are connected to the communication bus 18. The input/output device 24 may be connected to the other components of the computing device 12 through the input/output interface 22. Exemplary input/output devices 24 may include input devices such as a pointing device (a mouse, a trackpad, or the like), a keyboard, a touch input device (a touchpad, a touchscreen, or the like), a voice or sound input device, various types of sensor devices, and/or imaging devices, and/or output devices such as a display device, a printer, a speaker, and/or a network card. An exemplary input/output device 24 may be included inside the computing device 12 as one of its components, or may be connected to the computing device 12 as a separate device distinct from the computing device 12.

Meanwhile, embodiments of the present invention may include a program for performing the methods described in this specification on a computer, and a computer-readable recording medium containing the program. The computer-readable recording medium may include program instructions, local data files, local data structures, and the like, alone or in combination. The medium may be specially designed and configured for the present invention, or may be of a kind commonly available in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical recording media such as CD-ROMs and DVDs; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of the program may include not only machine code such as that produced by a compiler, but also high-level language code that can be executed by a computer using an interpreter or the like.

Although representative embodiments of the present invention have been described in detail above, those of ordinary skill in the art to which the present invention pertains will understand that various modifications can be made to the above-described embodiments without departing from the scope of the present invention. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the following claims and their equivalents.

10: computing environment
12: computing device
14: processor
16: computer-readable storage medium
18: communication bus
20: program
22: input/output interface
24: input/output device
26: network communication interface
100: object detection model training apparatus
110: image acquisition unit
120: model training unit
200: object detection model execution apparatus
210: image acquisition unit
220: model execution unit
300: object detection model
310: first network
320: second network
330: detection unit

Claims

1. An apparatus for training an object detection model, the apparatus comprising:
an image acquisition unit configured to acquire a first image and a second image of a specific space, generated using different image sensors; and
a model training unit configured to generate fusion information by exchanging part of the information in a feature map generated from the first image with part of the information in a feature map generated from the second image, and to train an object detection model to detect one or more objects present in the specific space based on the fusion information.

2. The apparatus of claim 1,
wherein the object detection model comprises:
a first network that includes a plurality of feature extraction layers having a plurality of scales and receives the first image as input; and
a second network that has the same structure as the first network and receives the second image as input, and
wherein the model training unit randomly matches at least one of the feature maps output from the respective feature extraction layers included in the first network with at least one of the feature maps output from the respective feature extraction layers included in the second network, and exchanges part of the information in each matched feature map.

3. The apparatus of claim 2,
wherein the object detection model further comprises a detection unit configured to generate one or more object detection boxes corresponding to the one or more objects based on the fusion information and to calculate a confidence score for each of the one or more object detection boxes.

4. The apparatus of claim 3,
wherein each of the first network and the second network further comprises a plurality of concatenation layers and one or more downscale layers,
the plurality of feature extraction layers are arranged consecutively at each scale,
a concatenation layer is disposed after the plurality of feature extraction layers at each scale, and
a downscale layer is disposed after each of the plurality of concatenation layers other than the concatenation layer of the smallest scale.

5. The apparatus of claim 4,
wherein the detection unit generates the one or more object detection boxes based on each result of concatenating, scale by scale, a plurality of feature maps output from the plurality of concatenation layers included in the first network with a plurality of feature maps output from the plurality of concatenation layers included in the second network, and calculates the confidence score based on the one or more object detection boxes and a preset ground-truth box.

6. The apparatus of claim 4,
wherein the detection unit removes overlapping object detection boxes from among the one or more object detection boxes by using a Non-Maximum Suppression (NMS) algorithm based on the confidence score of each of the one or more object detection boxes.

7. The apparatus of claim 2,
wherein the model training unit matches a first feature map, from among the feature maps output from the respective feature extraction layers included in the first network, with a second feature map that has the same scale as the first feature map, from among the feature maps output from the respective feature extraction layers included in the second network.

8. The apparatus of claim 7,
wherein the model training unit exchanges the element values of at least some of the pixels in the first feature map with the element values of the pixels at the corresponding positions in the second feature map.

9. The apparatus of claim 8,
wherein, when the at least some pixels and the pixels at the corresponding positions have been determined, the model training unit exchanges the element values of the at least some pixels with the element values of the pixels at the corresponding positions based on a preset probability.

10. A method of training an object detection model, the method comprising:
acquiring a first image and a second image of a specific space, generated using different image sensors;
generating fusion information by exchanging part of the information in a feature map generated from the first image with part of the information in a feature map generated from the second image; and
training an object detection model to detect one or more objects present in the specific space based on the fusion information.

11. The method of claim 10,
wherein the object detection model comprises:
a first network that includes a plurality of feature extraction layers having a plurality of scales; and
a second network that has the same structure as the first network, and
wherein the generating of the fusion information comprises:
inputting the first image to the first network;
inputting the second image to the second network; and
randomly matching at least one of the feature maps output from the respective feature extraction layers included in the first network with at least one of the feature maps output from the respective feature extraction layers included in the second network, and exchanging part of the information in each matched feature map.

12. The method of claim 11,
wherein the object detection model further comprises a detection unit configured to generate one or more object detection boxes corresponding to the one or more objects based on the fusion information and to calculate a confidence score for each of the one or more object detection boxes.

13. The method of claim 12,
wherein each of the first network and the second network further comprises a plurality of concatenation layers and one or more downscale layers,
the plurality of feature extraction layers are arranged consecutively at each scale,
a concatenation layer is disposed after the plurality of feature extraction layers at each scale, and
a downscale layer is disposed after each of the plurality of concatenation layers other than the concatenation layer of the smallest scale.

14. The method of claim 13,
wherein the detection unit generates the one or more object detection boxes based on each result of concatenating, scale by scale, a plurality of feature maps output from the plurality of concatenation layers included in the first network with a plurality of feature maps output from the plurality of concatenation layers included in the second network, and calculates the confidence score based on the one or more object detection boxes and a preset ground-truth box.

15. The method of claim 13,
wherein the detection unit removes overlapping object detection boxes from among the one or more object detection boxes by using a Non-Maximum Suppression (NMS) algorithm based on the confidence score of each of the one or more object detection boxes.

16. The method of claim 11,
wherein the exchanging comprises matching a first feature map, from among the feature maps output from the respective feature extraction layers included in the first network, with a second feature map that has the same scale as the first feature map, from among the feature maps output from the respective feature extraction layers included in the second network.

17. The method of claim 16,
wherein the exchanging comprises exchanging the element values of at least some of the pixels in the first feature map with the element values of the pixels at the corresponding positions in the second feature map.

18. The method of claim 17,
wherein, when the at least some pixels and the pixels at the corresponding positions have been determined, the exchanging is performed based on a preset probability.