KR20230088213A

KR20230088213A - Deep learning decomposition-based multi-exposure image fusion method and apparatus

Info

Publication number: KR20230088213A
Application number: KR1020220043140A
Authority: KR
Inventors: 김종옥; 김종한
Original assignee: 고려대학교 산학협력단
Priority date: 2021-12-10
Filing date: 2022-04-07
Publication date: 2023-06-19
Also published as: KR102610330B1

Abstract

A multi-exposure image fusion method and apparatus based on deep learning decomposition are disclosed. The method includes: (a) applying each multi-exposure image to a deep learning model to decompose it, at the feature level, into a common component and an individual (residual) component; (b) fusing the individual components with one another and fusing the common components with one another; and (c) generating a reconstructed image by adding the fused individual component and the fused common component.

Description

Deep learning decomposition-based multi-exposure image fusion method and apparatus

The present invention relates to a multi-exposure image fusion method based on deep learning decomposition, and an apparatus therefor.

The dynamic range visible to the human eye is much wider than that of commercial camera sensors. For natural scenes, an image captured at a single exposure level often fails to achieve satisfactory quality in terms of dynamic range, and the low dynamic range of the imaging sensor reduces scene visibility.

The low dynamic range of an image sensor reduces scene visibility in terms of image detail and contrast. To address this dynamic range problem, Mertens et al. studied multi-exposure image fusion (MEF). MEF is a high dynamic range (HDR) imaging technique that merges multiple low dynamic range (LDR) images with different exposure levels into a single high-quality image. In LDR images, visibility is strongly affected by non-uniform lighting and by the camera exposure level: details in bright regions are lost through overexposure, while details in dark regions are lost through underexposure. Many non-deep-learning studies have addressed these problems, and MEF performance has improved dramatically.

Nevertheless, conventional methods can produce severe visual unnaturalness (loss of detail or color distortion) under unfavorable conditions, such as scenes that are too bright or too dark.

An object of the present invention is to provide a multi-exposure image fusion method based on deep learning decomposition, and an apparatus therefor.

Another object of the present invention is to provide such a method and apparatus that improve fusion performance by decomposing images into components within a deep learning network, identifying the characteristics of each component, and fusing each component in a manner suited to those characteristics.

Another object of the present invention is to provide such a method and apparatus that solve the problems of detail loss and halo artifacts by separating components within a deep learning network and applying a fusion technique suited to each component.

Another object of the present invention is to provide such a method and apparatus that improve color restoration by performing fusion in the RGB domain rather than in the Y (luminance) domain.

According to one aspect of the present invention, a multi-exposure image fusion method based on deep learning decomposition is provided.

According to an embodiment of the present invention, there may be provided a multi-exposure image fusion method based on deep learning decomposition, comprising: (a) applying each multi-exposure image to a deep learning model to decompose it, at the feature level, into a common component and an individual (residual) component; (b) fusing the individual components and fusing the common components; and (c) generating a reconstructed image by adding the fused individual component and the fused common component.

The deep learning model may be trained with an equalization loss so that the multi-exposure images, acquired under different exposure conditions for the same scene, all have the same common component.

The weights of the deep learning model may be learned with a visualization loss so that each decomposed common component and the corresponding individual component combine into a single image identical to the corresponding multi-exposure image.

In step (b), the individual components may be fused using spatial attention weight maps.

The deep learning model may be trained to minimize a reconstruction loss based on the difference between the multi-exposure images and the output image reconstructed from the fused common component and the fused individual component.

According to another aspect of the present invention, a multi-exposure image fusion apparatus based on deep learning decomposition is provided.

According to an embodiment of the present invention, there may be provided a multi-exposure image fusion apparatus based on deep learning decomposition, comprising: a decomposition unit that applies each multi-exposure image to a deep learning model to decompose it, at the feature level, into a common component and an individual (residual) component; a fusion unit that fuses the individual components and fuses the common components; and a reconstruction unit that generates a reconstructed output image from the fused individual component and the fused common component, wherein the deep learning model is trained to minimize the equalization loss between the common components obtained by decomposing the multi-exposure images.

The multi-exposure image fusion method and apparatus based on deep learning decomposition according to an embodiment of the present invention can improve fusion performance by decomposing images into components within a deep learning network, identifying the characteristics of each component, and fusing each component according to those characteristics.

The present invention also has the advantage of solving the problems of detail loss and halo artifacts by separating components within a deep learning network and applying a fusion technique suited to each component.

The present invention also has the advantage of improving color restoration by performing fusion in the RGB domain rather than in the Y domain.

FIG. 1 is a flowchart illustrating a multi-exposure image fusion method based on deep learning decomposition according to an embodiment of the present invention.
FIG. 2 is a diagram showing a deep learning network architecture according to an embodiment of the present invention.
FIG. 3 is a diagram showing the detailed structure of a decomposition module according to an embodiment of the present invention.
FIG. 4 is a diagram for explaining visualization and the loss functions according to an embodiment of the present invention.
FIG. 5 is a diagram showing the detailed structure of a fusion module according to an embodiment of the present invention.
FIG. 6 is a diagram for explaining a multi-exposure image fusion method based on deep learning decomposition according to an embodiment of the present invention.
FIG. 7 is a block diagram schematically showing the internal configuration of a multi-exposure image fusion apparatus based on deep learning decomposition according to an embodiment of the present invention.
FIGS. 8 to 10 are diagrams comparing multi-exposure image fusion results of conventional methods with those of an embodiment of the present invention.

As used herein, singular expressions include plural expressions unless the context clearly dictates otherwise. In this specification, terms such as "consisting of" or "comprising" should not be construed as necessarily including all of the components or steps described; some components or steps may be omitted, and additional components or steps may be included. In addition, terms such as "...unit" and "module" refer to a unit that performs at least one function or operation, which may be implemented in hardware, in software, or in a combination of the two.

Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart showing a multi-exposure image fusion method based on deep learning decomposition; FIG. 2 shows the deep learning network architecture; FIG. 3 shows the detailed structure of the decomposition module; FIG. 4 illustrates visualization and the loss functions; FIG. 5 shows the detailed structure of the fusion module; and FIG. 6 illustrates the multi-exposure image fusion method based on deep learning decomposition, each according to an embodiment of the present invention.

In step 110, the multi-exposure image fusion apparatus 100 based on deep learning decomposition receives a plurality of multi-exposure images.

Here, the multi-exposure images may be images of the same scene acquired (captured) under different exposure conditions.

For example, as shown in FIG. 2, the multi-exposure images may include an underexposed image and an overexposed image. Because the multi-exposure images are acquired for the same scene at different exposure levels, they may differ from one another in brightness, contrast, and visibility depending on the exposure time (see FIG. 2).

In step 115, the apparatus 100 applies the plurality of multi-exposure images to the deep learning model and decomposes each of them, at the feature level, into a common component and an individual (residual) component.

FIG. 2 shows the overall architecture of the deep learning model.

When decomposing the multi-exposure images into common and individual components at the feature level, the deep learning model may take into account an equalization loss over the decomposed common components.

This is described in more detail below.

Multi-exposure images are images of the same scene acquired with different exposure times: their structural information is the same, whereas the remaining individual information may differ. That is, the overall structural components of the scene, such as edges, are common components that do not change when the exposure time changes.

In view of this, an embodiment of the present invention applies the multi-exposure images to a deep learning model to decompose each of them into a common component and an individual component at the feature level.

FIG. 3 shows the detailed structure of the decomposition block of the deep learning model. Referring to FIG. 3, the decomposition block decomposes each of the multi-exposure images into a common component and an individual component at the feature level.

Because the common components of the multi-exposure images are nearly identical to one another, they do not need to be fused elaborately; they can be fused through several convolutional layers that reinforce structural information (e.g., edges). The individual components, in contrast, contribute substantially to the quality of the fused image, for example to detail restoration and to whether halo artifacts appear.

Therefore, to ensure an accurate decomposition into common and individual components, the deep learning model can be trained under the constraint given in Equation 1.

This constraint is the equalization loss: the common components of all input images (i.e., the multi-exposure images) must be identical. The equalization loss is expressed as Equation 1.

Here, Cᵢ denotes the common component of the i-th input image Iᵢ, and n denotes the number of multi-exposure images. The loss function minimizes the differences between the common components of the multi-exposure images, so that the images have similar common components. That is, the deep learning model can be trained to minimize the equalization loss.
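Equation 1 itself is not reproduced in this text. As a minimal sketch under stated assumptions, the following Python code treats each common component as a flattened list of feature values and takes the equalization loss to be the mean squared difference averaged over all pairs of exposures; both the pairwise form and the name `equalization_loss` are hypothetical, not the patent's exact formula.

```python
from itertools import combinations
from typing import List


def equalization_loss(common_components: List[List[float]]) -> float:
    """Average pairwise mean squared difference between common components.

    Hypothetical stand-in for Equation 1: each component is a flattened
    feature map, and every pair of exposures is compared.
    """
    if len(common_components) < 2:
        return 0.0  # nothing to equalize with a single exposure
    total = 0.0
    pairs = 0
    for a, b in combinations(common_components, 2):
        if len(a) != len(b):
            raise ValueError("feature maps must have the same size")
        total += sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
        pairs += 1
    return total / pairs
```

For three single-value components 0, 1, and 2, the pairwise squared differences are 1, 4, and 1, so the sketch returns 2.0; identical common components give 0.0, which is the state the constraint pushes the decomposition toward.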

The common components (Cᵢ) and the individual components (Rᵢ) can be visualized as shown in FIG. 4. Because feature-level common and individual components are very difficult to inspect visually, the decomposed components are mapped to RGB channels for visualization.

When a decomposed common component and its individual component are combined, they must form a single image identical to the input image (i.e., the multi-exposure image). The reconstruction from the decomposed common and individual feature components can be measured with the mean squared error (MSE).

Each common component and each individual component passes through a mapping block, which may include a plurality of convolutional layers. The weights of the convolutional layers in the mapping blocks may be shared across the features.
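A minimal sketch of weight sharing between the mapping blocks, assuming (hypothetically) that a mapping block is a single per-pixel linear layer, i.e. a 1x1 convolution, from feature channels to three RGB channels; the patent's block may contain several convolutional layers and activations. Feature maps are lists of pixels, each pixel a list of channel values, and `make_mapping_block` is a hypothetical helper.

```python
from typing import Callable, List

Pixel = List[float]
FeatureMap = List[Pixel]


def make_mapping_block(
    weights: List[List[float]], bias: List[float]
) -> Callable[[FeatureMap], FeatureMap]:
    """Build a per-pixel (1x1) linear map from feature channels to output channels.

    Hypothetical stand-in for the mapping block: one layer, no activation.
    """
    if len(weights) != len(bias):
        raise ValueError("one bias value is needed per output channel")

    def mapping(feature_map: FeatureMap) -> FeatureMap:
        # Each output channel is a weighted sum of the pixel's input channels plus a bias.
        return [
            [sum(w * v for w, v in zip(row, pixel)) + b for row, b in zip(weights, bias)]
            for pixel in feature_map
        ]

    return mapping


# One block, one set of weights, applied to both component types.
shared_block = make_mapping_block([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0])
common_vis = shared_block([[1.0, 2.0]])      # two-channel common feature, one pixel
residual_vis = shared_block([[0.5, -0.5]])   # two-channel individual feature, one pixel
```

Because `shared_block` is created once and applied to both inputs, any training update to its weights changes the visualization of both the common and the individual features, which is one reading of the weights being shared.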

FIG. 4 schematically shows feature decomposition, visualization, and the associated loss functions, which are described in more detail below with reference to FIG. 4.

For example, assume a first exposure image (I₁) and a second exposure image (I₂), as shown in FIG. 4.

The first exposure image (I₁) may be decomposed by the deep learning model, at the feature level, into a first common component (C₁,vis) and a first individual component (R₁,vis). Likewise, the second exposure image (I₂) may be decomposed into a second common component (C₂,vis) and a second individual component (R₂,vis).

Because the first exposure image (I₁) and the second exposure image (I₂) capture the same scene and differ only in exposure conditions, the structural information (e.g., edge information) contained in the first common component (C₁,vis) and the second common component (C₂,vis) decomposed at the feature level may be the same.

Therefore, the deep learning model can be trained to decompose the first exposure image (I₁) and the second exposure image (I₂) so that the equalization loss between the first common component (C₁,vis) and the second common component (C₂,vis) is minimized.

In addition, the weights of the mapping block may be adjusted so that the image reconstructed from the first common component (C₁,vis) and the first individual component (R₁,vis) is identical to the first exposure image.

That is, the deep learning model can adjust its weights to minimize the visualization loss, which is expressed as Equation 2.

Here, the reconstructed image Îᵢ is obtained by combining the common component Cᵢ and the individual component Rᵢ of the input image Iᵢ; that is, Iᵢ denotes the input image, Îᵢ the reconstructed image, Cᵢ the common component, and Rᵢ the individual component.
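Equation 2 is not reproduced here either. The sketch below assumes, hypothetically, that the visualized common and individual components are combined by element-wise addition and that the visualization loss is the MSE (named in the text above) between each input image and that combination, averaged over exposures. Images are flattened lists of values; `mse` and `visualization_loss` are illustrative names.

```python
from typing import List, Sequence


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two equally sized flattened images."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def visualization_loss(
    inputs: List[List[float]],
    commons_vis: List[List[float]],
    residuals_vis: List[List[float]],
) -> float:
    """Average over exposures of MSE(input, common_vis + residual_vis)."""
    losses = []
    for image, c_vis, r_vis in zip(inputs, commons_vis, residuals_vis):
        # Hypothetical additive combination of the two visualized components.
        reconstructed = [c + r for c, r in zip(c_vis, r_vis)]
        losses.append(mse(image, reconstructed))
    return sum(losses) / len(losses)
```

A perfect decomposition (common plus individual equal to the input) gives a loss of 0.0; if both components are zero for an input `[1.0, 2.0]`, the loss is (1 + 4) / 2 = 2.5.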

In summary, the deep learning model is trained to minimize the equalization loss and the visualization loss, and the trained model decomposes the multi-exposure images into common and individual components.

In step 120, the apparatus 100 fuses the common components and fuses the individual components.

For example, the apparatus 100 may fuse the individual components by applying spatial attention weight maps. Each input image contributes differently, depending on how useful its information is for HDR reconstruction. The deep learning model therefore learns to generate a pixel-wise spatial attention weight map, and the map for each individual component can be learned separately to achieve high-quality fusion.

FIG. 5 shows the process of fusing the individual components. As shown in FIG. 5, each individual component passes through a convolutional layer to generate its spatial attention weight map.
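The text states that the attention maps are produced by convolutional layers but not how they are normalized. A common choice in attention-based fusion, assumed here rather than taken from the patent, is a per-pixel softmax across exposures so that the weights at every pixel sum to 1. The inputs stand in for the (hypothetical) convolution outputs, one flattened score map per exposure.

```python
import math
from typing import List


def spatial_attention_weights(score_maps: List[List[float]]) -> List[List[float]]:
    """Normalize per-exposure score maps into per-pixel weights that sum to 1.

    score_maps[i][p] is a hypothetical convolution output for exposure i at pixel p.
    """
    n_pixels = len(score_maps[0])
    weights = [[0.0] * n_pixels for _ in score_maps]
    for p in range(n_pixels):
        scores = [m[p] for m in score_maps]
        peak = max(scores)  # subtract the maximum for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        for i, e in enumerate(exps):
            weights[i][p] = e / total
    return weights
```

Equal scores give equal weights (0.5 each for two exposures), while a higher score at a pixel shifts weight toward that exposure there, which is how one input can dominate in regions where its information is more useful.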

The result of fusing the individual components can be expressed as Equation 3.

Here, each individual feature component is weighted by its spatial attention weight map. Fusing the individual components with spatial attention weight maps in this way has the advantage of avoiding halo artifacts.
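Since Equation 3 is not reproduced, the following sketch assumes the most direct reading of the description: the fused individual component is the per-pixel sum of each exposure's individual component multiplied by its spatial attention weight. Feature maps are flattened lists, and `fuse_individual` is a hypothetical name.

```python
from typing import List


def fuse_individual(weight_maps: List[List[float]], residuals: List[List[float]]) -> List[float]:
    """Per-pixel attention-weighted sum of the individual (residual) components."""
    n_pixels = len(residuals[0])
    if any(len(w) != n_pixels for w in weight_maps) or any(len(r) != n_pixels for r in residuals):
        raise ValueError("all weight maps and residuals must have the same size")
    fused = [0.0] * n_pixels
    for w, r in zip(weight_maps, residuals):
        for p in range(n_pixels):
            fused[p] += w[p] * r[p]
    return fused
```

With weights `[0.25, 1.0]` and `[0.75, 0.0]` for two exposures whose residuals are `[4.0, 2.0]` and `[8.0, 6.0]`, the sketch returns `[7.0, 2.0]`: the first pixel blends both exposures, while the second takes only the first exposure's value.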

Because the common components are constrained to be identical during decomposition, they are decomposed very similarly to one another. Therefore, the common components of all inputs (i.e., the multi-exposure images) can be fused simply by passing them through a few connected convolutional layers.
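One way to fuse nearly identical common components "through a few convolutional layers" is to concatenate them along the channel axis and apply a convolution. The sketch below assumes, hypothetically, channel concatenation followed by a single 1x1 convolution; the patent does not specify the layer count or kernel size. `commons[i][p]` is the channel vector of pixel p in exposure i's common component.

```python
from typing import List


def fuse_common(
    commons: List[List[List[float]]], weights: List[List[float]], bias: List[float]
) -> List[List[float]]:
    """Concatenate common components along channels, then apply a 1x1 convolution."""
    n_pixels = len(commons[0])
    fused = []
    for p in range(n_pixels):
        # Channel concatenation: exposure 0's channels, then exposure 1's, and so on.
        stacked = [v for component in commons for v in component[p]]
        if any(len(row) != len(stacked) for row in weights):
            raise ValueError("each weight row must match the concatenated channel count")
        fused.append([sum(w * v for w, v in zip(row, stacked)) + b for row, b in zip(weights, bias)])
    return fused
```

With equal weights of 0.5 on two single-channel exposures holding 1.0 and 3.0, the output is their average, 2.0; a learned convolution can do more than average, for example emphasizing edge structure, which is the role the text assigns to these layers.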

In step 125, the apparatus 100 reconstructs the output image using the fused common component and the fused individual component.
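According to the abstract, the reconstructed image is generated by adding the fused individual component and the fused common component. The sketch below shows only that element-wise addition on flattened feature maps; any decoder layers that might map the sum to an RGB image are not described in this text and are omitted. `reconstruct_output` is a hypothetical name.

```python
from typing import List


def reconstruct_output(fused_common: List[float], fused_individual: List[float]) -> List[float]:
    """Element-wise sum of the fused common and fused individual components."""
    if len(fused_common) != len(fused_individual):
        raise ValueError("fused components must have the same size")
    return [c + r for c, r in zip(fused_common, fused_individual)]
```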

A reconstruction loss can be calculated from the ground-truth image and the reconstructed image.

It is expressed as Equation 4.

Here, Mout denotes the reconstructed image and GT denotes the ground-truth image. SSIM (structural similarity index measure) is a function that evaluates differences in visual quality; since SSIM is well known, a detailed description is omitted. The deep learning model is ultimately trained with a total training loss (L) that combines the equalization loss, the visualization loss, and the reconstruction loss, as expressed in Equation 5.
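Equation 4 is not reproduced, but it is stated to use SSIM. As a simplified sketch, the code below computes SSIM over the whole image as a single window with the usual stabilizing constants (K1 = 0.01, K2 = 0.03) and assumes the reconstruction loss is 1 - SSIM(Mout, GT). Standard SSIM implementations average over local (typically Gaussian) windows instead, and the actual Equation 4 may combine SSIM with other terms; both simplifications are assumptions.

```python
from typing import Sequence


def global_ssim(x: Sequence[float], y: Sequence[float], data_range: float = 1.0) -> float:
    """SSIM computed over the whole image as one window (simplified)."""
    if len(x) != len(y) or not x:
        raise ValueError("inputs must be non-empty and the same length")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def reconstruction_loss(m_out: Sequence[float], gt: Sequence[float]) -> float:
    """Hypothetical SSIM-based reconstruction loss: 1 - SSIM(Mout, GT)."""
    return 1.0 - global_ssim(m_out, gt)
```

Identical images give a loss of (numerically) zero, while an image whose structure is reversed relative to the ground truth has negative SSIM and therefore a loss above 1.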

Here, the two coefficients in Equation 5 are weights for the corresponding loss terms.
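The text names three losses but only two weights, so one term is presumably unweighted. The sketch below assumes, hypothetically, that the reconstruction loss carries no weight and that the two coefficients scale the equalization and visualization losses; the weight values are placeholders, not values from the patent.

```python
def total_training_loss(
    l_eq: float, l_vis: float, l_rec: float, w_eq: float = 1.0, w_vis: float = 1.0
) -> float:
    """Weighted sum of the three losses; which terms carry the weights is an assumption."""
    return l_rec + w_eq * l_eq + w_vis * l_vis
```

For example, with losses 1.0 (equalization), 2.0 (visualization), and 3.0 (reconstruction) and weights 0.5 and 0.25, the total is 3.0 + 0.5 + 0.5 = 4.0.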

In summary, according to an embodiment of the present invention, the multi-exposure images are decomposed by the deep learning model into common and individual components at the feature level and then fused at the feature level. By fusing the individual components with spatial attention weight maps, the multi-exposure image fusion method can restore detail in both bright and dark regions, reduce halo artifacts, and reproduce natural colors.

FIG. 6 illustrates the process of decomposing four multi-exposure images into common and individual components through the deep learning model, fusing the decomposed common components (C₁,vis to C₄,vis), fusing the decomposed individual components (R₁,vis to R₄,vis), and generating the reconstructed output image (Mout) from the fused common component and the fused individual component.

FIG. 7 is a block diagram schematically showing the internal configuration of the multi-exposure image fusion apparatus based on deep learning decomposition according to an embodiment of the present invention, and FIGS. 8 to 10 compare multi-exposure image fusion results of conventional methods with those of an embodiment of the present invention.

Referring to FIG. 7, the multi-exposure image fusion apparatus 100 based on deep learning decomposition according to an embodiment of the present invention includes a decomposition unit 710, a visualization unit 720, a fusion unit 730, a reconstruction unit 740, a memory 750, and a processor 760.

The decomposition unit 710 applies the multi-exposure images to the deep learning model to decompose each of them into a common component and an individual component at the feature level.

The visualization unit 720 combines a common component and an individual component and visualizes them as a single image identical to the input image (multi-exposure image).

As described above, the deep learning model can be trained to minimize the equalization loss and the visualization loss. Since this is the same as described with reference to FIG. 1, a redundant description is omitted.

The fusion unit 730 fuses the common components and fuses the individual components. The fusion unit 730 may fuse the common components through a plurality of connected convolutional layers, and may fuse the individual components by applying spatial attention weight maps.

The reconstruction unit 740 reconstructs the output image using the fused common component and the fused individual component. The deep learning model can be trained to decompose the images into common and individual components so that the reconstruction loss is minimized. As described above, in learning to decompose the multi-exposure images into common and individual components, the deep learning model can be trained to minimize a loss that combines the equalization loss, the visualization loss, and the reconstruction loss.

The memory 750 stores instructions for performing the multi-exposure image fusion method based on deep learning decomposition according to an embodiment of the present invention.

The processor 760 controls the internal components of the multi-exposure image fusion apparatus 100 based on deep learning decomposition according to an embodiment of the present invention (e.g., the decomposition unit 710, the visualization unit 720, the fusion unit 730, the reconstruction unit 740, and the memory 750).

FIG. 8 shows the results of fusing multi-exposure images with conventional methods and with an embodiment of the present invention. As shown in (a) to (h) of FIG. 8, the conventional methods have difficulty restoring detail in dark regions, whereas the fusion method according to an embodiment of the present invention (i) restores dark-region detail well.

FIG. 9 shows fusion results of conventional methods and of an embodiment of the present invention for a different input image. The red boxes in FIG. 9 mark detailed regions saturated by excessive light; as shown in (a) to (h), the conventional methods perform poorly at restoring detail in these regions. In contrast, the fusion method according to an embodiment of the present invention (i) produces stable, natural colors and the best detail restoration.

FIG. 10 compares the fusion results and weight maps of a conventional method and of an embodiment of the present invention for yet another input image.

As shown in (a) of FIG. 10, the conventional method produces unnatural boundaries. The weight maps explain why: the weight distributions of the four inputs fluctuate sharply and have higher contrast, so the weights are biased toward particular inputs, and this bias produces the unnatural boundaries.

In contrast, the present invention yields a smoother weight distribution than the conventional method. As a result, halo artifacts are further reduced in the fusion result according to an embodiment of the present invention.
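FIG. 10 makes this comparison visually. As a hedged numerical illustration of the difference between sharply fluctuating and smooth weight maps, the sketch below scores a stack of K per-pixel weight maps by their mean squared spatial gradient. The metric, the function name `mean_squared_gradient`, and the two toy maps are assumptions chosen for illustration; they are not taken from the patent and say nothing about its measured results.

```python
import numpy as np


def mean_squared_gradient(weight_maps: np.ndarray) -> float:
    """Smoothness score for K weight maps of shape (K, H, W).

    Sums the mean squared vertical and horizontal differences.
    Lower values mean smoother maps with fewer abrupt transitions.
    (Illustrative metric, not defined in the patent.)
    """
    dy = np.diff(weight_maps, axis=1)  # vertical neighbor differences
    dx = np.diff(weight_maps, axis=2)  # horizontal neighbor differences
    return float((dy ** 2).mean() + (dx ** 2).mean())


H, W = 32, 32
x = np.linspace(0.0, 1.0, W)

# Hypothetical two-input case: a hard switch at the image midline ...
sharp = np.zeros((2, H, W))
sharp[0, :, : W // 2] = 1.0
sharp[1] = 1.0 - sharp[0]  # weights sum to 1 at every pixel

# ... versus a linear ramp that hands weight over gradually.
smooth = np.empty((2, H, W))
smooth[0] = np.tile(1.0 - x, (H, 1))
smooth[1] = 1.0 - smooth[0]

print(mean_squared_gradient(sharp))   # 1/31, about 0.0323
print(mean_squared_gradient(smooth))  # 1/961, about 0.00104
```

Squaring the differences matters here: a hard switch and a linear ramp have the same total (L1) variation across each row, but the squared gradient penalizes the single abrupt jump far more, which matches the intuition that abrupt weight transitions produce visible boundaries.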

The apparatus and method according to embodiments of the present invention may be implemented in the form of program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. Examples of program instructions include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

The present invention has been described above with reference to its embodiments. Those skilled in the art to which the present invention pertains will understand that the present invention may be implemented in modified forms without departing from its essential characteristics. Therefore, the disclosed embodiments should be considered in an illustrative rather than a limiting sense. The scope of the present invention is defined by the claims rather than by the foregoing description, and all differences within the scope of equivalents should be construed as being included in the present invention.

100: deep learning-based multi-exposure image fusion apparatus
710: decomposition unit
720: visualization unit
730: fusion unit
740: reconstruction unit
750: memory
760: processor

Claims

A deep learning decomposition-based multi-exposure image fusion method, comprising:
(a) decomposing each multi-exposure image into a common component and a residual component at a feature level by applying a deep learning model;
(b) fusing the respective residual components and fusing the respective common components; and
(c) generating a reconstructed image by adding the fused residual component and the fused common component.
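The three steps of claim 1 can be sketched end to end. This is a toy under explicit assumptions: the learned decomposer is replaced by a hypothetical stand-in (`decompose`) that takes the per-pixel mean over exposures as the common component, and the residuals are fused with a hand-crafted max-absolute rule rather than the learned fusion described in the patent. Only the structure mirrors the claim: decompose, fuse each component type separately, then add.

```python
import numpy as np


def decompose(images: np.ndarray):
    """Hypothetical stand-in for the learned decomposer (step (a)).

    images: (K, H, W) exposures of the same scene.
    The common component is the per-pixel mean over exposures, so it is
    identical for every k by construction; each residual is the remainder.
    """
    common = np.broadcast_to(images.mean(axis=0), images.shape)
    residual = images - common
    return common, residual


def fuse(common: np.ndarray, residual: np.ndarray):
    """Step (b): fuse common and residual components separately."""
    # The toy common components are identical, so averaging loses nothing.
    fused_common = common.mean(axis=0)
    # Max-absolute selection keeps the strongest detail at each pixel
    # (a classic hand-crafted rule swapped in for the learned fusion).
    idx = np.abs(residual).argmax(axis=0)
    fused_residual = np.take_along_axis(residual, idx[None], axis=0)[0]
    return fused_common, fused_residual


def fuse_exposures(images: np.ndarray) -> np.ndarray:
    common, residual = decompose(images)                   # step (a)
    fused_common, fused_residual = fuse(common, residual)  # step (b)
    return fused_common + fused_residual                   # step (c)


rng = np.random.default_rng(0)
stack = rng.random((3, 4, 4))   # three toy "exposures"
print(fuse_exposures(stack).shape)  # (4, 4)
```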

The method according to claim 1,
wherein the deep learning model is trained in consideration of an equalization loss so that the multi-exposure images obtained under different exposure conditions for the same scene have the same common component.
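One way to express "the same common component" as a training term is to penalize how far each exposure's common component deviates from their average. The mean-squared form and the function name `equalization_loss` below are assumptions; this section of the patent names the equalization loss but does not give its formula.

```python
import numpy as np


def equalization_loss(common: np.ndarray) -> float:
    """Mean squared deviation of K common components from their average.

    common: array of shape (K, C, H, W) holding the common feature maps
    of K exposures of the same scene. The value is zero exactly when all
    K common components are identical. (Assumed form, for illustration.)
    """
    center = common.mean(axis=0, keepdims=True)
    return float(((common - center) ** 2).mean())


# Two exposures whose common features disagree everywhere by 1.0.
c = np.stack([np.zeros((1, 2, 2)), np.ones((1, 2, 2))])
print(equalization_loss(c))  # 0.25
```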

The method according to claim 2,
wherein weights of the deep learning model are learned in consideration of a visualization loss so that each decomposed common component and the corresponding residual component combine into a single image identical to the corresponding multi-exposure image.
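The visualization loss can be sketched as a per-exposure check: recombine each exposure's common and residual components and compare the result with that exposure. The sketch assumes recombination by addition (mirroring step (c) of claim 1), an L1 distance, and a hypothetical `decoder` callable standing in for whatever network maps features back to an image; it defaults to the identity so the example runs without a trained model.

```python
import numpy as np


def visualization_loss(images, common, residual, decoder=lambda f: f) -> float:
    """L1 distance between each exposure and its recombined components.

    images, common, residual: arrays of shape (K, H, W).
    decoder: hypothetical feature-to-image mapping (identity by default).
    """
    recombined = decoder(common + residual)  # assumed additive recombination
    return float(np.abs(recombined - images).mean())


rng = np.random.default_rng(1)
imgs = rng.random((3, 4, 4))
common = np.broadcast_to(imgs.mean(axis=0), imgs.shape)
residual = imgs - common
print(visualization_loss(imgs, common, residual))  # ~0 for an exact split
```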

The method according to claim 1,
wherein, in step (b), the residual components are fused using a spatial attention weight map.
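A spatial attention weight map assigns each exposure a weight at every pixel. A common way to build such a map, assumed here rather than taken from the patent, is a softmax over the exposure axis of unnormalized scores, so that the K weights at each pixel are positive and sum to 1; the fused residual is then the per-pixel weighted sum. The score array `logits` is hypothetical and would come from a learned network in practice.

```python
import numpy as np


def spatial_attention_fuse(residual: np.ndarray, logits: np.ndarray):
    """Fuse K residual maps with per-pixel attention weights.

    residual: (K, H, W) residual components.
    logits:   (K, H, W) unnormalized attention scores (hypothetical input).
    Returns the fused (H, W) residual and the (K, H, W) weight map.
    """
    # Softmax over the exposure axis; subtracting the max avoids overflow.
    shifted = logits - logits.max(axis=0, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * residual).sum(axis=0), weights


res = np.stack([np.zeros((2, 2)), np.ones((2, 2))])
fused, w = spatial_attention_fuse(res, np.zeros((2, 2, 2)))
print(fused)  # equal scores give the plain average: all 0.5
```

Because the weights vary smoothly whenever the scores do, this kind of normalization tends to avoid the hard per-pixel switching that FIG. 10 associates with unnatural boundaries.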

The method according to claim 3,
wherein the deep learning model is trained to minimize a reconstruction loss based on the difference between the multi-exposure images and the output image reconstructed from the fused common component and the fused residual component.
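Because claim 5 depends on claim 3, which in turn depends on claim 2, a model covered by claim 5 is trained with the reconstruction, visualization, and equalization losses together. The sketch below assumes a mean-squared reconstruction term measured against every exposure and a simple weighted sum of the three terms; the weights `lam_eq` and `lam_vis` and both formulas are hypothetical, since the patent text here gives neither.

```python
import numpy as np


def reconstruction_loss(output: np.ndarray, images: np.ndarray) -> float:
    """Mean squared difference between the fused output (H, W) and each
    of the K exposures (K, H, W). Assumed form, for illustration only."""
    return float(((images - output[None]) ** 2).mean())


def total_loss(l_rec: float, l_eq: float, l_vis: float,
               lam_eq: float = 1.0, lam_vis: float = 1.0) -> float:
    """Weighted sum of the three training terms (hypothetical weights)."""
    return l_rec + lam_eq * l_eq + lam_vis * l_vis


imgs = np.stack([np.zeros((2, 2)), np.ones((2, 2))])
out = np.full((2, 2), 0.5)
l_rec = reconstruction_loss(out, imgs)
print(l_rec)                          # 0.25
print(total_loss(l_rec, 0.0, 0.0))    # 0.25
```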

A computer-readable recording medium on which program code for performing the method according to any one of claims 1 to 5 is recorded.

A deep learning decomposition-based multi-exposure image fusion device, comprising:
a decomposition unit that applies each multi-exposure image to a deep learning model to decompose it into a common component and a residual component at a feature level;
a fusion unit that fuses the respective residual components and fuses the respective common components; and
a reconstruction unit that generates a reconstructed output image using the fused residual component and the fused common component,
wherein the deep learning model is trained to minimize an equalization loss between the common components decomposed from the respective multi-exposure images.