KR102431419B1

KR102431419B1 - Single-shot adaptive fusion method and apparatus for robust multispectral object detection

Info

Publication number: KR102431419B1
Application number: KR1020200028451A
Authority: KR
Inventors: 최유경; 김지원; 조원; 남현호; 황순민; 김남일; 노치원
Original assignee: 세종대학교산학협력단; 포테닛 주식회사
Priority date: 2020-03-06
Filing date: 2020-03-06
Publication date: 2022-08-11
Also published as: KR20210112869A

Abstract

Disclosed are a single-shot-based adaptive fusion method for robust object detection and an apparatus therefor. The single-shot-based adaptive fusion method for robust object detection includes: (a) obtaining a first feature map and a second feature map from a first image and a second image, respectively; (b) generating a fusion parameter using a result of correlation analysis between the first feature map and the second feature map; and (c) fusing other feature maps using the fusion parameter.

Description

Single-shot adaptive fusion method and apparatus for robust multispectral object detection

The present invention relates to a single-shot-based adaptive fusion method and apparatus for robust object detection.

With the great success of deep neural networks (DNNs), the application of computer vision algorithms to everyday life has attracted considerable attention. Among the various visual recognition tasks, pedestrian detection has been studied extensively because of applications such as autonomous vehicles. However, conventional approaches focus on covering scenes under normal conditions, so the recognition rate drops significantly in unexpected situations that arise while a vehicle is actually being driven (for example, sensor malfunction, sensor blackout, or contamination by foreign matter such as small flying insects).

An object of the present invention is to provide a single-shot-based adaptive fusion method and apparatus for robust object detection.

Another object of the present invention is to provide a single-shot-based adaptive fusion method and apparatus capable of robust pedestrian (object) detection even in abnormal situations such as sensor malfunction, sensor blackout, and contamination by foreign matter such as small flying insects.

A further object of the present invention is to provide a single-shot-based adaptive fusion method and apparatus for robust object detection capable of applying fusion parameters adaptively according to scene characteristics.

According to one aspect of the present invention, a single-shot-based adaptive fusion method for robust object detection is provided.

According to an embodiment of the present invention, there may be provided a single-shot-based adaptive fusion method for robust object detection, the method including: (a) obtaining a first feature map and a second feature map from a first image and a second image, respectively; (b) generating a fusion parameter using a result of correlation analysis between the first feature map and the second feature map; and (c) fusing the first image and the second image using the fusion parameter.

The first image is one of a visible light image and a thermal image, and the second image is the other of the visible light image and the thermal image.

Step (b) may include: deriving a correlation coefficient between the first feature map and the second feature map; and deriving a correlation between the first feature map and the second feature map using the derived correlation coefficient, concatenating it at another feature map, and applying the concatenated feature map to a trained parameter prediction model to generate a dynamic fusion parameter.

The parameter prediction model is trained using training data sets for three cases. The training data sets for the three cases may consist of RGB image and thermal image pairs for: a case in which both the RGB image and the thermal image are normal; a case in which the RGB image is normal and the thermal image is in a sensor blackout state; and a case in which the RGB image is in a sensor blackout state and the thermal image is normal.

Steps (a) to (b) are performed for each scene.

The first feature map and the second feature map are the lowest-level feature maps among the multiple feature maps of a single shot multibox detector (SSD). This choice is based on the fact that features at this level respond strongly to context information and that RGB images and thermal images share this context information.

According to another aspect of the present invention, a single-shot-based adaptive fusion apparatus for robust object detection is provided.

According to an embodiment of the present invention, there may be provided a single-shot-based adaptive fusion apparatus for robust object detection, the apparatus including: a feature map extractor that obtains a first feature map and a second feature map from heterogeneous images, respectively; and a parameter generator that generates a fusion parameter using a result of correlation analysis between the first feature map and the second feature map.

The heterogeneous images are a visible light image and a thermal image.

The parameter generator may include: a correlation analysis unit that derives a correlation coefficient between the first feature map and the second feature map; and a parameter prediction model that concatenates the first feature map and the second feature map using the derived correlation coefficient and generates a fusion parameter from the concatenated feature map.

The parameter prediction model includes a plurality of convolution blocks, an average pooling layer, and a plurality of fully connected layers, and is trained using training data sets for three cases. The training data sets for the three cases may consist of RGB image and thermal image pairs for: a case in which both the RGB image and the thermal image are normal; a case in which the RGB image is normal and the thermal image is in a sensor blackout state; and a case in which the RGB image is in a sensor blackout state and the thermal image is normal.

By providing the single-shot-based adaptive fusion method and apparatus for robust object detection according to an embodiment of the present invention, robust pedestrian (object) detection is possible not only in normal situations but also in abnormal situations such as sensor malfunction, sensor blackout, and contamination by foreign matter such as small flying insects.

In addition, the present invention has the advantage of enabling robust object detection by applying fusion parameters adaptively according to scene characteristics.

FIG. 1 is a block diagram schematically illustrating the configuration of a single-shot-based adaptive fusion apparatus for robust object detection according to an embodiment of the present invention.
FIG. 2 is a diagram illustrating the detailed structure of a parameter generator according to an embodiment of the present invention.
FIG. 3 is a diagram for explaining correlation analysis according to an embodiment of the present invention.
FIG. 4 is a diagram comparing pedestrian detection results under sensor defects between the conventional art and an embodiment of the present invention.
FIG. 5 is a graph comparing multispectral pedestrian detection performance between the conventional art and an embodiment of the present invention.
FIG. 6 is a table comparing detection results for various failure conditions of heterogeneous sensors between the conventional art and an embodiment of the present invention.
FIG. 7 is a flowchart illustrating a single-shot-based adaptive fusion method for robust object detection according to an embodiment of the present invention.

As used herein, a singular expression includes the plural unless the context clearly indicates otherwise. In this specification, terms such as "consisting of" or "comprising" should not be construed as necessarily including all of the components or steps described in the specification; some of those components or steps may be omitted, or additional components or steps may be included. In addition, terms such as "...unit" and "module" used in the specification denote a unit that processes at least one function or operation, which may be implemented in hardware, in software, or in a combination of hardware and software.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram schematically illustrating the configuration of a single-shot-based adaptive fusion apparatus for robust object detection according to an embodiment of the present invention; FIG. 2 is a diagram illustrating the detailed structure of a parameter generator according to an embodiment of the present invention; FIG. 3 is a diagram for explaining correlation analysis according to an embodiment of the present invention; FIG. 4 is a diagram comparing pedestrian detection results under sensor defects between the conventional art and an embodiment of the present invention; FIG. 5 is a graph comparing multispectral pedestrian detection performance between the conventional art and an embodiment of the present invention; and FIG. 6 is a table comparing detection results for various failure conditions of heterogeneous sensors between the conventional art and an embodiment of the present invention.

Referring to FIG. 1, a single-shot-based adaptive fusion apparatus 100 for robust object detection according to an embodiment of the present invention includes an input unit 110, a feature map extractor 115, a parameter generator 120, a fusion unit 125, a memory 130, and a processor 135.

The input unit 110 is a means for receiving heterogeneous images. For example, the heterogeneous images may be an RGB image (visible light image) and a thermal image.

That is, the single-shot-based adaptive fusion apparatus 100 for robust object detection described below receives a visible light image and a thermal image and, when fusing them, generates a fusion parameter based on the correlation between the feature maps of the visible light image and the thermal image, and then adaptively varies the weights applied to the visible light image and the thermal image according to the fusion parameter. The image correlation is obtained from the low-level feature maps of the visible light image and the thermal image and is used to fuse other feature maps. This is based on the fact that low-level feature maps contain context information and that RGB images and thermal images share this context information. This will be understood more clearly from the following description.

The feature map extractor 115 extracts a feature map for each of the RGB image and the thermal image.

The feature map extractor 115 may extract the feature maps of the RGB image and the thermal image using SSD-based technology. For ease of understanding and explanation, the feature map of the RGB image is referred to as the first feature map, and the feature map of the thermal image is referred to as the second feature map.

For example, the feature map extractor 115 may extract each feature map by applying convolution to the RGB image and the thermal image. The description of this embodiment assumes that the feature maps of the RGB image and the thermal image are extracted by Conv 1-2. Low-level feature maps are extracted because they contain context information such as corners and edges, which can be exploited.

Accordingly, in an embodiment of the present invention, the low-level feature maps of the RGB image and the thermal image can be compared to determine whether the situation is normal. That is, in a normal situation (when neither the RGB camera nor the thermal camera is faulty), comparing low-level feature maps that contain relatively rich context information about edges, junctions, and the like shows that the low-level feature maps of the RGB image and the thermal image are highly correlated.

This property can be used to determine whether the heterogeneous input images (that is, the RGB image and the thermal image) were input under normal conditions.

The parameter generator 120 may calculate the correlation between the feature maps of the RGB image and the thermal image (the first feature map and the second feature map) and then use it to generate a fusion parameter. That is, the parameter generator 120 includes a deep-learning-based parameter prediction model, as shown in FIG. 2, and may generate the fusion parameter using that model. This will be understood more clearly from the following description.

The detailed configuration of the parameter generator 120 is as shown in FIG. 2. The parameter generator 120 calculates the correlation between the first feature map and the second feature map of the RGB image and the thermal image. To this end, as shown in FIG. 2, the parameter generator 120 derives a correlation coefficient through correlation analysis of the average pooling results of the first feature map and the second feature map. Since the method of deriving a correlation coefficient is itself obvious to those skilled in the art, a separate description thereof is omitted.
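For illustration only, the correlation analysis on average-pooled feature maps might be sketched as follows. The specification does not name a particular correlation coefficient, so the use of the Pearson coefficient, the non-overlapping pooling window, the function names `average_pool` and `pearson`, and the toy 4x4 feature maps are all assumptions made for this sketch. Plain Python lists are used to keep the example dependency-free; a practical implementation would operate on tensors.

```python
import math

def average_pool(fmap, k):
    """Non-overlapping k x k average pooling over a 2-D feature map (list of rows)."""
    rows, cols = len(fmap), len(fmap[0])
    pooled = []
    for r in range(0, rows - k + 1, k):
        pooled_row = []
        for c in range(0, cols - k + 1, k):
            window = [fmap[r + i][c + j] for i in range(k) for j in range(k)]
            pooled_row.append(sum(window) / (k * k))
        pooled.append(pooled_row)
    return pooled

def pearson(a, b):
    """Pearson correlation of two equally sized 2-D maps.

    Returns 0.0 when either map is constant (e.g. a blacked-out sensor),
    a convention chosen for this sketch rather than taken from the source.
    """
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

# Hypothetical low-level feature maps sharing the same spatial structure.
rgb_feat = [[0, 0, 4, 4], [0, 0, 4, 4], [2, 2, 6, 6], [2, 2, 6, 6]]
thermal_feat = [[1, 1, 9, 9], [1, 1, 9, 9], [5, 5, 13, 13], [5, 5, 13, 13]]
blackout_feat = [[0.0] * 4 for _ in range(4)]  # stand-in for a disabled sensor

normal_corr = pearson(average_pool(rgb_feat, 2), average_pool(thermal_feat, 2))
blackout_corr = pearson(average_pool(rgb_feat, 2), average_pool(blackout_feat, 2))
```

In this toy case the two healthy maps share structure, so `normal_corr` is high, while the blacked-out map carries no structure and `blackout_corr` falls to the fallback value.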

That is, the low-level feature maps of an RGB image and a thermal image input under normal conditions are highly correlated. In situations such as a sensor malfunction, however, the correlation between the first feature map and the second feature map is inevitably low.

For example, assume that the RGB camera is cracked. In this case, the images input through the RGB camera and the thermal camera are as shown in FIG. 3, and the RGB image input through the RGB camera exhibits low correlation because of the crack.

Reflecting this property, the parameter generator 120 may generate a fusion parameter using the result of correlation analysis between the first feature map and the second feature map obtained by the feature map extractor 115.

In more detail, the parameter generator 120 derives a correlation coefficient between the first feature map and the second feature map of the RGB image and the thermal image. Since the method of deriving a correlation coefficient is itself obvious to those skilled in the art, a description thereof is omitted.

The parameter generator 120 concatenates the first feature map and the second feature map using the derived correlation coefficient and passes the result through a plurality of convolution blocks to generate a fusion parameter specific to the scene. FIG. 2 shows the parameter generator 120 as including three convolution blocks, an adaptive average pooling layer, and two fully connected layers. This is only an example; the number of convolution blocks and other details may of course differ.
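A reduced sketch of the tail of such a parameter prediction model is given here for illustration. It is not the model of FIG. 2: the convolution blocks are omitted, the correlation coefficient is assumed to be appended as an extra constant channel during concatenation, the output is assumed to be two softmax-normalized weights, and the layer weights `fc1` and `fc2` are hand-picked placeholders rather than learned values. All function names are hypothetical.

```python
import math

def global_avg_pool(channels):
    """Reduce each 2-D channel to its mean, giving one value per channel."""
    return [sum(v for row in ch for v in row) / (len(ch) * len(ch[0])) for ch in channels]

def linear(x, weights, bias):
    """Fully connected layer; weights holds one row per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def relu(x):
    return [max(0.0, v) for v in x]

def softmax(x):
    m = max(x)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]

def predict_fusion_params(rgb_channels, thermal_channels, corr, fc1, fc2):
    """Concatenate channels, pool globally, and apply two fully connected layers."""
    h, w = len(rgb_channels[0]), len(rgb_channels[0][0])
    corr_channel = [[corr] * w for _ in range(h)]  # assumption: correlation as a channel
    stacked = rgb_channels + thermal_channels + [corr_channel]  # channel-wise concatenation
    pooled = global_avg_pool(stacked)
    hidden = relu(linear(pooled, *fc1))
    return softmax(linear(hidden, *fc2))

# Placeholder inputs: a blacked-out RGB channel and an active thermal channel.
rgb_channels = [[[0.0, 0.0], [0.0, 0.0]]]
thermal_channels = [[[1.0, 3.0], [2.0, 2.0]]]
fc1 = ([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]], [0.0, 0.0])  # (weights, bias), not learned
fc2 = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])            # (weights, bias), not learned

w_rgb, w_thermal = predict_fusion_params(rgb_channels, thermal_channels, 0.0, fc1, fc2)
```

With these placeholder weights the thermal branch receives the larger weight, which mirrors the intended behavior when the RGB sensor is blacked out. In a trained model this behavior would have to emerge from the learned parameters.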

According to an embodiment of the present invention, the parameter generator 120 has been trained using RGB images and thermal images. This process is briefly described here.

The parameter generator 120 is trained with training data sets for three cases. Each training data set consists of RGB image and thermal image pairs: a first case in which both the RGB image and the thermal image are normal (RGB image (On), thermal image (On)); a second case in which the RGB image is normal and the thermal image is abnormal and not input (RGB image (On), thermal image (Off)); and a third case in which the RGB image is abnormal and not input and the thermal image is normal (RGB image (Off), thermal image (On)).
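One way the three cases might be constructed from a single aligned image pair is sketched here. The specification does not say how the "Off" condition is produced, so modelling a blackout as an all-zero image is an assumption, as are the function names `blackout` and `make_training_cases`, the case labels, and the use of small single-channel images.

```python
def blackout(image):
    """Return an all-zero image of the same size, standing in for a disabled sensor."""
    return [[0 for _ in row] for row in image]

def make_training_cases(rgb, thermal):
    """Expand one aligned RGB/thermal pair into the three training cases."""
    return [
        {"case": "rgb_on_thermal_on", "rgb": rgb, "thermal": thermal},
        {"case": "rgb_on_thermal_off", "rgb": rgb, "thermal": blackout(thermal)},
        {"case": "rgb_off_thermal_on", "rgb": blackout(rgb), "thermal": thermal},
    ]

# Hypothetical 2x2 single-channel images.
rgb_image = [[10, 20], [30, 40]]
thermal_image = [[5, 6], [7, 8]]

cases = make_training_cases(rgb_image, thermal_image)
```

Each source pair then contributes one sample per case, so the model sees the same scene with both sensors active and with each sensor disabled in turn.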

The fusion unit 125 fuses other feature maps using the generated fusion parameter.

That is, according to an embodiment of the present invention, a scene-specific fusion parameter is adaptively generated from the result of correlation analysis between the low-level feature values of the RGB image and the thermal image of each scene, and is then used to fuse other feature maps. For example, the fusion may be performed at the Conv 4-3 layer.
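The fusion step could be sketched as follows. The specification states that the weights for the two modalities vary with the fusion parameter but does not give the fusion operator, so the element-wise weighted sum used here is an assumption, as are the function name `fuse` and the toy feature maps.

```python
def fuse(rgb_feat, thermal_feat, w_rgb, w_thermal):
    """Element-wise weighted sum of two equally sized 2-D feature maps."""
    return [
        [w_rgb * r + w_thermal * t for r, t in zip(rgb_row, thermal_row)]
        for rgb_row, thermal_row in zip(rgb_feat, thermal_feat)
    ]

# Hypothetical higher-level feature maps for one scene.
rgb_map = [[0, 0], [0, 0]]      # RGB sensor blacked out
thermal_map = [[2, 4], [6, 8]]  # thermal sensor healthy

# Weights as a parameter prediction model might output them for this scene.
fused = fuse(rgb_map, thermal_map, 0.0, 1.0)
```

When the weight of the failed modality is driven toward zero, the fused map is dominated by the healthy modality, which is the adaptive behavior the description attributes to the fusion parameter.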

By adaptively generating scene-specific fusion parameters from the result of correlation analysis between the low-level feature values of the RGB image and the thermal image of each scene, different weights can be applied when fusing the heterogeneous images depending on unexpected errors arising at the input stage (for example, sensor error, sensor blackout, cracks, or dust). This has the advantage of enabling robust object (for example, pedestrian) detection even when a sensor is defective.

The memory 130 is a means for storing various instructions (program code) for the single-shot-based adaptive fusion method for robust object detection according to an embodiment of the present invention, data derived in the course of the method, and the like.

The processor 135 is a means for controlling the internal components of the single-shot-based adaptive fusion apparatus 100 for robust object detection according to an embodiment of the present invention (for example, the input unit 110, the feature map extractor 115, the parameter generator 120, the fusion unit 125, and the memory 130).

FIG. 4 is a diagram comparing pedestrian detection results under sensor defects between the conventional art and an embodiment of the present invention. It compares detection results in abnormal situations caused by the sensor defects shown in (a) of FIG. 4, such as sensor blackout, cracks, insects, and dust.

In (b) of FIG. 4, reference numeral 410 denotes the pedestrian detection result of ordinary static fusion of heterogeneous images, 420 denotes the pedestrian detection result of fusion of heterogeneous images according to an embodiment of the present invention, and 430 denotes the pedestrian detection result of fusion of heterogeneous images according to the conventional MSDS.

As shown in FIG. 4, the method according to an embodiment of the present invention achieves far better pedestrian detection performance than the conventional methods even when a sensor is defective.

FIG. 5 is a graph comparing multispectral pedestrian detection performance between the conventional art and an embodiment of the present invention.

As shown in (a) of FIG. 5, compared with the conventional staticFusion, the present invention improves accuracy under normal conditions, with the reported figure changing from 19.55% to 10.36%. In addition, as shown in (b) of the figure, the present invention is computationally efficient and enables faster, accurate inference compared with the conventional methods. That is, as shown in (b) of FIG. 4, the present invention detects pedestrians robustly under various sensor contamination conditions at more than three times the speed of the conventional MSDS.

FIG. 6 is a table comparing detection results for various failure conditions of heterogeneous sensors between the conventional art and an embodiment of the present invention. As shown in FIG. 6, the present invention shows superior performance compared with the conventional art even when the RGB camera or the thermal camera is defective.

FIG. 7 is a flowchart illustrating a single-shot-based adaptive fusion method for robust object detection according to an embodiment of the present invention.

In step 710, the adaptive fusion apparatus 100 extracts a low-level feature map (for example, the first feature map and the second feature map) for each of the RGB image and the thermal image.

In step 715, the adaptive fusion apparatus 100 performs correlation analysis on the first feature map and the second feature map to derive a correlation coefficient.

In step 720, the adaptive fusion apparatus 100 concatenates the first feature map and the second feature map using the derived correlation coefficient.

In step 725, the adaptive fusion apparatus 100 applies the concatenated feature map to the parameter prediction model to generate a fusion parameter.

Since the operation of the parameter prediction model is the same as that described with reference to FIG. 2, a separate description thereof is omitted.

In step 730, the adaptive fusion apparatus 100 fuses other feature maps using the generated fusion parameter.

The apparatus and method according to embodiments of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the computer-readable medium may be specially designed and configured for the present invention, or may be known and available to those skilled in the field of computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules in order to perform the operations of the present invention, and vice versa.

The present invention has been described above with a focus on its embodiments. Those of ordinary skill in the art to which the present invention pertains will understand that the present invention may be implemented in modified forms without departing from its essential characteristics. The disclosed embodiments should therefore be considered in an illustrative rather than a restrictive sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the equivalent scope should be construed as being included in the present invention.

100: adaptive fusion apparatus
110: input unit
115: feature map extractor
120: parameter generator
125: fusion unit
130: memory
135: processor

Claims

A single-shot-based adaptive fusion method for robust object detection, comprising:
(a) obtaining a first feature map and a second feature map from a first image and a second image, respectively;
(b) generating a fusion parameter using a result of correlation analysis between the first feature map and the second feature map; and
(c) fusing other feature maps using the fusion parameter,
wherein step (b) comprises:
deriving a correlation coefficient between the first feature map and the second feature map; and
concatenating the first feature map and the second feature map using the derived correlation coefficient, and applying the concatenated feature map to a trained parameter prediction model to generate the fusion parameter.

The method of claim 1,
wherein the first image is one of a visible light image and a thermal image, and
the second image is the other of the visible light image and the thermal image.

delete

The method of claim 2,
wherein the parameter prediction model is trained using training data sets for three cases, and
the training data sets for the three cases consist of visible light image and thermal image pairs for: a case in which both the visible light image and the thermal image are normal; a case in which the visible light image is normal and the thermal image is in a sensor blackout state; and a case in which the visible light image is in a sensor blackout state and the thermal image is normal.

The method of claim 1,
wherein, through steps (a) to (b),
the fusion is performed by applying different weights to the first image and the second image according to the fusion parameter.

The method of claim 1,
wherein the first feature map and the second feature map are the lowest-level feature maps among the multiple feature maps of a single shot multibox detector (SSD).

A computer-readable recording medium product on which program code for performing the method according to any one of claims 1, 2, and 4 to 6 is recorded.

A single-shot-based adaptive fusion apparatus for robust object detection, comprising:
a feature map extractor that obtains a first feature map and a second feature map from heterogeneous images, respectively;
a parameter generator that generates a fusion parameter using a result of correlation analysis between the first feature map and the second feature map; and
a fusion unit that fuses other feature maps using the fusion parameter,
wherein the parameter generator comprises:
a correlation analysis unit that derives a correlation coefficient between the first feature map and the second feature map; and
a parameter prediction model that concatenates the first feature map and the second feature map using the derived correlation coefficient and generates the fusion parameter from the concatenated feature map.

The apparatus of claim 8,
wherein the heterogeneous images are a visible light image and a thermal image.

delete

The apparatus of claim 9,
wherein the parameter prediction model includes a plurality of convolution blocks, an average pooling layer, and a plurality of fully connected layers, and is trained using training data sets for three cases, and
the training data sets for the three cases consist of visible light image and thermal image pairs for: a case in which both the visible light image and the thermal image are normal; a case in which the visible light image is normal and the thermal image is in a sensor blackout state; and a case in which the visible light image is in a sensor blackout state and the thermal image is normal.