KR20230102134A

KR20230102134A - Real-time image fusion apparatus and method for remote sensing based on deep learning

Info

Publication number: KR20230102134A
Application number: KR1020210192023A
Authority: KR
Inventors: 전광길
Original assignee: 인천대학교 산학협력단
Priority date: 2021-12-30
Filing date: 2021-12-30
Publication date: 2023-07-07
Also published as: KR102556028B1

Abstract

A real-time image fusion apparatus for remote sensing based on deep learning according to an embodiment of the present invention includes: a shallow feature extraction unit that extracts shallow features from an upsampled low-resolution multispectral image and from a panchromatic image, respectively; an information pool that generates a raw information feature based on the panchromatic image, the upsampled low-resolution multispectral image, the shallow features of the upsampled low-resolution multispectral image, and the shallow features of the panchromatic image; a first multi-scale feature extraction sub-network that generates small-scale, middle-scale, and large-scale features of the upsampled low-resolution multispectral image based on the shallow features of the upsampled low-resolution multispectral image; a second multi-scale feature extraction sub-network that generates small-scale, middle-scale, and large-scale features of the panchromatic image based on the shallow features of the panchromatic image; a multi-scale feature fusion unit that fuses the raw information feature, the small-scale, middle-scale, and large-scale features of the upsampled low-resolution multispectral image, and the small-scale, middle-scale, and large-scale features of the panchromatic image; and a reconstruction unit that reconstructs a high-resolution multispectral image based on the fused feature and the upsampled low-resolution multispectral image.

Description

Real-time image fusion apparatus and method for remote sensing based on deep learning {REAL-TIME IMAGE FUSION APPARATUS AND METHOD FOR REMOTE SENSING BASED ON DEEP LEARNING}

The present invention relates to a real-time image fusion apparatus and method for remote sensing based on deep learning.

Remote sensing is a fundamental tool for understanding the Earth and supporting human-Earth communication. Multispectral (MS) imaging, an important branch of remote sensing technology, is widely applied in real-time monitoring and surveillance. However, because of sensor and bandwidth limitations, it is particularly difficult to obtain multispectral images with high spatial resolution from a single sensor.

To address this problem, optical remote sensing satellites generally carry multiple sensors and simultaneously acquire a multi-band low-resolution multispectral (LRMS) image and a single-band high-resolution (HR) panchromatic image. Pan-sharpening is a technique that reconstructs a high-resolution multispectral image by fusing the complementary information of the low-resolution multispectral image and the high-resolution panchromatic image; the spatial resolution of the result is the same as that of the panchromatic image.

In the real-time remote sensing community, tasks such as image classification, image segmentation, and image detection are booming. These tasks generally prefer high-resolution remote sensing images as input, because the higher the resolution of the input image, the more information it provides for subsequent processing and analysis. Conversely, low-resolution images tend to hinder these tasks, because the tasks are sensitive to object structure and to the relationships between pixels, both of which are limited in low-resolution images. As shown in FIG. 1, remote sensing image fusion (pan-sharpening) is very important for remote sensing image processing: it provides images with high resolution in both the spectral and spatial domains and thereby increases the accuracy of subsequent tasks.

Over the past decades, researchers have paid considerable attention to pan-sharpening and have proposed a variety of meaningful methods. Existing pan-sharpening methods can be divided into two categories: traditional algorithms and deep learning-based methods. The traditional algorithms can be further classified into three branches: component substitution (CS) approaches, multi-resolution analysis (MRA) approaches, and model-based optimization approaches.

CS approaches generally transform the multispectral image into a suitable domain, where the spatial component of the multispectral image is replaced with the high-resolution panchromatic image. The pan-sharpened image is then obtained through the corresponding inverse transform. Intensity-hue-saturation (IHS), principal component analysis (PCA), the Brovey transform (BT), and the Gram-Schmidt transform are commonly used in CS. These methods are simple and fast and tend to take spatial information directly from the panchromatic image, but they suffer from spectral distortion. MRA methods mainly use the discrete wavelet transform (DWT), the Laplacian pyramid (LP), or the à trous wavelet transform (ATWT) to decompose the panchromatic image, extract high-frequency information, and inject it into the LRMS image. These methods can achieve better spectral consistency but may introduce more severe spatial distortion.
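As a concrete instance of component substitution, the Brovey transform rescales every multispectral band by the ratio of the panchromatic value to an intensity estimate. The sketch below applies it to a single pixel. Using the unweighted band mean as the intensity, and the names `brovey_pixel` and `eps`, are illustrative assumptions rather than details taken from this document.

```python
def brovey_pixel(ms_bands, pan, eps=1e-12):
    """Brovey-style pan-sharpening of one pixel.

    ms_bands: upsampled multispectral values for this pixel, one per band.
    pan:      panchromatic value at the same pixel.
    eps:      small constant guarding against division by zero (assumption).
    """
    # Intensity estimate: plain mean of the bands (illustrative choice).
    intensity = sum(ms_bands) / len(ms_bands)
    # Every band is scaled by the same gain, so band ratios are preserved.
    gain = pan / (intensity + eps)
    return [band * gain for band in ms_bands]


sharpened = brovey_pixel([0.2, 0.4, 0.6], pan=0.8)
```

After the call, the band ratios of `sharpened` match those of the input while its mean equals the panchromatic value. When the panchromatic sensor's response differs from the assumed intensity, this substitution shifts the band values, which is one way to picture the spectral distortion mentioned above.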

With the recent significant progress of deep learning, convolutional neural networks (CNNs) have attracted much attention in the pan-sharpening community and have achieved notable improvements in both performance and efficiency. Since both pan-sharpening and single image super-resolution (SISR) aim to reconstruct an enhanced high-resolution (HR) image, pan-sharpening is to some extent a super-resolution-related task. The main difference is that the former extracts spatial detail mainly from the high-resolution panchromatic image, while the latter extracts detail from the corresponding low-resolution (LR) image. Accordingly, many existing deep learning-based pan-sharpening approaches have been inspired by SISR methods.

In practical applications, real-time and effective pan-sharpening methods are generally preferred. With the rapid development of graphics processing units (GPUs), most deep learning-based pan-sharpening methods can meet real-time requirements. However, there remains a need for a real-time image fusion apparatus and method for remote sensing that can obtain sharper images while reducing spectral distortion under real-time conditions.

KR 10-1132272 B1

A problem to be solved by the present invention is to provide a real-time image fusion apparatus and method for remote sensing based on deep learning that can make full use of the hierarchical complementary features of the panchromatic image and the multispectral image extracted by a weight-sharing encoder-decoder structure, and that enables a larger receptive field so that multi-scale complementary context information of the multispectral image and the corresponding panchromatic image can be captured effectively.

Another problem to be solved by the present invention is to provide a real-time image fusion apparatus and method for remote sensing based on deep learning that integrates a coarse-to-fine strategy into the network and acquires multi-scale features step by step, so that features at all scales can be extracted richly.

Still another problem to be solved by the present invention is to provide a real-time image fusion apparatus and method for remote sensing based on deep learning that builds an information pool to preserve basic information for subsequent feature fusion, which not only improves the representation ability of the network but can also help gradient back-propagation.

Still another problem to be solved by the present invention is to provide a real-time image fusion apparatus and method for remote sensing based on deep learning that can significantly reduce spectral distortion while obtaining sharp high-resolution images.

To solve the above problems, a real-time image fusion apparatus for remote sensing based on deep learning according to an embodiment of the present invention includes:

a shallow feature extraction unit that extracts shallow features from an upsampled low-resolution multispectral image and from a panchromatic image, respectively;

an information pool that generates a raw information feature based on the panchromatic image, the upsampled low-resolution multispectral image, the shallow features of the upsampled low-resolution multispectral image, and the shallow features of the panchromatic image;

a first multi-scale feature extraction sub-network that generates small-scale, middle-scale, and large-scale features of the upsampled low-resolution multispectral image based on the shallow features of the upsampled low-resolution multispectral image;

a second multi-scale feature extraction sub-network that generates small-scale, middle-scale, and large-scale features of the panchromatic image based on the shallow features of the panchromatic image;

a multi-scale feature fusion unit that fuses the raw information feature, the small-scale, middle-scale, and large-scale features of the upsampled low-resolution multispectral image, and the small-scale, middle-scale, and large-scale features of the panchromatic image; and

a reconstruction unit that reconstructs a high-resolution multispectral image based on the fused feature and the upsampled low-resolution multispectral image.

In the real-time image fusion apparatus for remote sensing based on deep learning according to an embodiment of the present invention, the shallow feature extraction unit may include:

a first convolution layer that performs a convolution operation on the upsampled low-resolution multispectral image and outputs the shallow features of the upsampled low-resolution multispectral image; and

a second convolution layer that performs a convolution operation on the panchromatic image and outputs the shallow features of the panchromatic image.

In addition, in the real-time image fusion apparatus for remote sensing based on deep learning according to an embodiment of the present invention, the information pool may include:

a concatenation unit that concatenates the panchromatic image, the upsampled low-resolution multispectral image, the shallow features of the upsampled low-resolution multispectral image, and the shallow features of the panchromatic image; and

a third convolution layer that performs a convolution operation on the output of the concatenation unit and outputs the raw information feature.
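The information pool described above concatenates four tensors and then applies one convolution. The following sketch only tracks tensor shapes, written as (channels, height, width), to show how the channel counts add up. The band count, image size, feature width, and the helper names `concat_channels` and `conv_shape` are hypothetical; the text does not specify them.

```python
def concat_channels(*shapes):
    """Shape after concatenating (channels, height, width) tensors along channels."""
    spatial_sizes = {shape[1:] for shape in shapes}
    if len(spatial_sizes) != 1:
        raise ValueError("all inputs must share the same height and width")
    return (sum(shape[0] for shape in shapes),) + shapes[0][1:]


def conv_shape(shape, out_channels):
    """Shape after a stride-1 convolution that preserves height and width."""
    return (out_channels,) + shape[1:]


# Hypothetical sizes: 4-band MS image, 1-band PAN image, 32 feature channels.
H, W = 256, 256
pan = (1, H, W)
ms_up = (4, H, W)                # LRMS image already upsampled to the PAN size
f_ms = conv_shape(ms_up, 32)     # shallow features from the first convolution layer
f_pan = conv_shape(pan, 32)      # shallow features from the second convolution layer

pooled = concat_channels(pan, ms_up, f_ms, f_pan)
raw_info = conv_shape(pooled, 32)  # third convolution layer
```

With these sizes, `pooled` has 1 + 4 + 32 + 32 = 69 channels. The concatenation is only defined because the multispectral image has been upsampled to the panchromatic image's spatial size beforehand.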

In addition, in the real-time image fusion apparatus for remote sensing based on deep learning according to an embodiment of the present invention, the first multi-scale feature extraction sub-network may include:

a first encoder that lowers the spatial resolution of the shallow features of the upsampled low-resolution multispectral image and increases the number of feature channels of the feature maps; and

a first decoder that generates the small-scale, middle-scale, and large-scale features of the upsampled low-resolution multispectral image based on the output of the first encoder,

and the second multi-scale feature extraction sub-network may include:

a second encoder that lowers the spatial resolution of the shallow features of the panchromatic image and increases the number of feature channels of the feature maps; and

a second decoder that generates the small-scale, middle-scale, and large-scale features of the panchromatic image based on the output of the second encoder.
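To make the encoder's trade between spatial resolution and channel count concrete, the sketch below lists the feature-map shapes an encoder of this kind could produce. The factor of 2 per stage, the two stages, the 32-channel input, and the name `encoder_shapes` are all assumptions for illustration; the text only states that resolution goes down and channels go up.

```python
def encoder_shapes(shape, stages=2, factor=2):
    """List (channels, height, width) after each hypothetical encoder stage."""
    channels, height, width = shape
    shapes = [shape]
    for _ in range(stages):
        # Each stage shrinks the spatial size and widens the channel dimension.
        channels = channels * factor
        height, width = height // factor, width // factor
        shapes.append((channels, height, width))
    return shapes


levels = encoder_shapes((32, 256, 256))
```

Under these assumptions `levels` holds three resolutions, which a decoder could then revisit from coarse to fine to emit features at three scales.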

In addition, in the real-time image fusion apparatus for remote sensing based on deep learning according to an embodiment of the present invention, the first multi-scale feature extraction sub-network and the second multi-scale feature extraction sub-network may share the weights used in their convolution layers.
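Weight sharing between the two sub-networks can be pictured as both paths calling the same layer object, so that one set of parameters processes both the multispectral and the panchromatic features. The toy layer below is a 1x1 convolution applied to a single pixel's channel vector; the class name `PointwiseConv` and the weight values are hypothetical.

```python
class PointwiseConv:
    """Toy 1x1 convolution acting on one pixel's channel vector."""

    def __init__(self, weights):
        # weights[o][i] maps input channel i to output channel o.
        self.weights = weights

    def __call__(self, pixel):
        return [sum(w * x for w, x in zip(row, pixel)) for row in self.weights]


shared_layer = PointwiseConv([[1.0, 0.0], [0.5, 0.5]])

# Both sub-networks hold a reference to the same layer, hence the same weights.
ms_subnetwork_layer = shared_layer
pan_subnetwork_layer = shared_layer

ms_out = ms_subnetwork_layer([2.0, 4.0])
pan_out = pan_subnetwork_layer([1.0, 3.0])
```

One inference, not stated in the text, is that sharing weights requires both inputs to have the same number of channels, which is something the separate first and second convolution layers of the shallow feature extraction unit could provide.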

In addition, in the real-time image fusion apparatus for remote sensing based on deep learning according to an embodiment of the present invention, the multi-scale feature fusion unit may include:

a small-scale feature fusion unit that fuses the small-scale features of the upsampled low-resolution multispectral image with the small-scale features of the panchromatic image and outputs a fused small-scale feature;

a middle-scale feature fusion unit that fuses the middle-scale features of the upsampled low-resolution multispectral image with the middle-scale features of the panchromatic image and outputs a fused middle-scale feature; and

a comprehensive feature fusion unit that fuses the fused small-scale feature, the fused middle-scale feature, the large-scale features of the upsampled low-resolution multispectral image, the large-scale features of the panchromatic image, and the raw information feature, and outputs a final fused feature.
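The three fusion units above can be summarized compactly. In the notation below, which is introduced here for illustration and does not appear in the source, $M_s, M_m, M_l$ are the small-, middle-, and large-scale features of the upsampled multispectral image, $P_s, P_m, P_l$ are those of the panchromatic image, $R$ is the raw information feature, and $\mathcal{F}_s$, $\mathcal{F}_m$, $\mathcal{F}_c$ stand for the small-scale, middle-scale, and comprehensive fusion units. The internal form of each operator is not specified in this passage.

```latex
\begin{aligned}
F_{s} &= \mathcal{F}_{s}\left(M_{s},\, P_{s}\right), \\
F_{m} &= \mathcal{F}_{m}\left(M_{m},\, P_{m}\right), \\
F     &= \mathcal{F}_{c}\left(F_{s},\, F_{m},\, M_{l},\, P_{l},\, R\right).
\end{aligned}
```

Read this way, only the small and middle scales are fused pairwise first; the large-scale features and the raw information feature enter directly at the final, comprehensive stage.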

To solve the above problems, a real-time image fusion method for remote sensing based on deep learning according to an embodiment of the present invention includes:

(A) extracting, by a shallow feature extraction unit, shallow features from an upsampled low-resolution multispectral image and from a panchromatic image, respectively;

(B) generating, by an information pool, a raw information feature based on the panchromatic image, the upsampled low-resolution multispectral image, the shallow features of the upsampled low-resolution multispectral image, and the shallow features of the panchromatic image;

(C) generating, by a first multi-scale feature extraction sub-network, small-scale, middle-scale, and large-scale features of the upsampled low-resolution multispectral image based on the shallow features of the upsampled low-resolution multispectral image;

(D) generating, by a second multi-scale feature extraction sub-network, small-scale, middle-scale, and large-scale features of the panchromatic image based on the shallow features of the panchromatic image;

(E) fusing, by a multi-scale feature fusion unit, the raw information feature, the small-scale, middle-scale, and large-scale features of the upsampled low-resolution multispectral image, and the small-scale, middle-scale, and large-scale features of the panchromatic image; and

(F) reconstructing, by a reconstruction unit, a high-resolution multispectral image based on the fused feature and the upsampled low-resolution multispectral image.
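Steps (A) through (F) define a fixed dataflow, which the sketch below wires together with the individual stages passed in as callables. The function name `fuse_pipeline`, the dictionary keys, and the string-building toy stages are hypothetical; the toy stages only trace which inputs each step consumes and perform no image processing.

```python
def fuse_pipeline(ms_up, pan, stages):
    """Run steps (A)-(F) in order; `stages` maps stage names to callables."""
    # (A) shallow features of each input
    f_ms = stages["shallow_ms"](ms_up)
    f_pan = stages["shallow_pan"](pan)
    # (B) raw information feature from both images and both shallow features
    raw = stages["info_pool"](pan, ms_up, f_ms, f_pan)
    # (C), (D) small-, middle-, and large-scale features per path
    ms_scales = stages["multiscale_ms"](f_ms)
    pan_scales = stages["multiscale_pan"](f_pan)
    # (E) multi-scale fusion
    fused = stages["fusion"](raw, ms_scales, pan_scales)
    # (F) reconstruction also receives the upsampled MS image
    return stages["reconstruct"](fused, ms_up)


def three_scales(feature):
    """Toy multi-scale extractor that labels three scales of one feature."""
    return tuple(f"{name}({feature})" for name in ("small", "middle", "large"))


toy_stages = {
    "shallow_ms": lambda m: f"Sm({m})",
    "shallow_pan": lambda p: f"Sp({p})",
    "info_pool": lambda p, m, fm, fp: f"IP({p},{m},{fm},{fp})",
    "multiscale_ms": three_scales,
    "multiscale_pan": three_scales,
    "fusion": lambda raw, ms, pn: f"F({raw};{','.join(ms)};{','.join(pn)})",
    "reconstruct": lambda fused, m: f"R({fused},{m})",
}

trace = fuse_pipeline("M", "P", toy_stages)
```

The resulting `trace` string nests the stage names in call order, which makes it easy to confirm that the upsampled multispectral image is used three times: in (A), in (B), and again in (F).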

In the real-time image fusion method for remote sensing based on deep learning according to an embodiment of the present invention, step (A) may include:

performing a convolution operation on the upsampled low-resolution multispectral image and outputting the shallow features of the upsampled low-resolution multispectral image; and

performing a convolution operation on the panchromatic image and outputting the shallow features of the panchromatic image.

In addition, in the real-time image fusion method for remote sensing based on deep learning according to an embodiment of the present invention, step (B) may include:

(B-1) concatenating the panchromatic image, the upsampled low-resolution multispectral image, the shallow features of the upsampled low-resolution multispectral image, and the shallow features of the panchromatic image; and

(B-2) performing a convolution operation on the output of step (B-1) and outputting the raw information feature.

In addition, in the real-time image fusion method for remote sensing based on deep learning according to an embodiment of the present invention, step (C) may include:

(C-1) lowering the spatial resolution of the shallow features of the upsampled low-resolution multispectral image and increasing the number of feature channels of the feature maps; and

(C-2) generating the small-scale, middle-scale, and large-scale features of the upsampled low-resolution multispectral image based on the output of step (C-1),

and step (D) may include:

(D-1) lowering the spatial resolution of the shallow features of the panchromatic image and increasing the number of feature channels of the feature maps; and

(D-2) generating the small-scale, middle-scale, and large-scale features of the panchromatic image based on the output of step (D-1).

In addition, in the real-time image fusion method for remote sensing based on deep learning according to an embodiment of the present invention, step (C) and step (D) may share the weights used in their convolution layers.

In addition, in the real-time image fusion method for remote sensing based on deep learning according to an embodiment of the present invention, step (E) may include:

fusing the small-scale features of the upsampled low-resolution multispectral image with the small-scale features of the panchromatic image and outputting a fused small-scale feature;

fusing the middle-scale features of the upsampled low-resolution multispectral image with the middle-scale features of the panchromatic image and outputting a fused middle-scale feature; and

fusing the fused small-scale feature, the fused middle-scale feature, the large-scale features of the upsampled low-resolution multispectral image, the large-scale features of the panchromatic image, and the raw information feature, and outputting a final fused feature.

According to the real-time image fusion apparatus and method for remote sensing based on deep learning according to an embodiment of the present invention, the hierarchical complementary features of the panchromatic image and the multispectral image extracted by the weight-sharing encoder-decoder structure can be fully exploited, and a larger receptive field is made possible, so that multi-scale complementary context information of the multispectral image and the corresponding panchromatic image can be captured effectively.

According to the real-time image fusion apparatus and method for remote sensing based on deep learning according to an embodiment of the present invention, a coarse-to-fine strategy is integrated into the network and multi-scale features are acquired step by step, so that features at all scales can be extracted richly.

According to the real-time image fusion apparatus and method for remote sensing based on deep learning according to an embodiment of the present invention, an information pool is built to preserve basic information for subsequent feature fusion, which not only improves the representation ability of the network but can also help gradient back-propagation.

According to the real-time image fusion apparatus and method for remote sensing based on deep learning according to an embodiment of the present invention, spectral distortion can be significantly reduced while sharp high-resolution images are obtained.

FIG. 1 is a diagram illustrating the important role of pan-sharpening in the remote sensing community.
FIG. 2 is a diagram illustrating a real-time image fusion apparatus for remote sensing based on deep learning according to an embodiment of the present invention.
FIG. 3 is a diagram illustrating the structure of the first multi-scale feature extraction sub-network of the multispectral path.
FIG. 4 is a diagram illustrating the structure of the multi-scale feature fusion unit.
FIG. 5 is a flowchart of a real-time image fusion method for remote sensing based on deep learning according to an embodiment of the present invention.
FIG. 6A is a diagram illustrating the loss curves of the ablation experiment corresponding to Table 4.
FIG. 6B is a diagram illustrating the loss curves of single-scale fusion and multi-scale fusion corresponding to Table 5.
FIG. 7 is a diagram illustrating the visual results of different pan-sharpening methods on the QuickBird dataset.
FIG. 8 is a diagram illustrating the visual results of different pan-sharpening methods on the IKONOS dataset.
FIG. 9 is a diagram illustrating the visual results of different pan-sharpening methods on the Pleiades dataset.

The objects, specific advantages, and novel features of the present invention will become more apparent from the following detailed description and preferred embodiments taken in conjunction with the accompanying drawings.

Before that, the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. Rather, on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way, they should be interpreted with meanings and concepts consistent with the technical idea of the present invention.

In assigning reference numerals to the components of each drawing in this specification, it should be noted that the same components are given the same numerals wherever possible, even when they appear in different drawings.

In addition, terms such as "first", "second", "one side", and "other side" are used to distinguish one component from another, and the components are not limited by these terms.

Hereinafter, in describing the present invention, detailed descriptions of related known technologies that may unnecessarily obscure the gist of the present invention are omitted.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

Real-time monitoring and surveillance play an important role in remote sensing, where multispectral (MS) images with high spatial resolution are widely required for better analysis. However, because of sensor and bandwidth limitations, high-resolution multispectral images cannot be obtained directly. As an essential way to mitigate this problem, pan-sharpening aims to reconstruct a high-resolution multispectral image by fusing the complementary information of a low-resolution multispectral image and a high-resolution panchromatic (PAN) image.

Most previous deep learning-based methods can meet real-time requirements with the help of a graphics processing unit (GPU). However, they do not fully exploit the beneficial hierarchical information, which leaves considerable room for performance improvement.

To achieve more effective performance while meeting the requirements of real-time implementation, the present invention proposes a multi-scale fusion network that makes full use of the hierarchical complementary features of panchromatic and multispectral images. Specifically, an encoder-decoder structure and a coarse-to-fine strategy are introduced to effectively extract the multi-scale features of the panchromatic image and the multispectral image separately.

Meanwhile, an information pool is adopted to preserve raw information. A multi-scale feature fusion unit is then applied to fuse the multi-scale features from the decoder with the raw information from the information pool. Finally, the fused features are used to reconstruct the high-resolution multispectral image. Extensive experiments show that the real-time image fusion apparatus and method for remote sensing based on deep learning according to the present invention achieve favorable performance compared with other methods in terms of quantitative metrics and visual quality. In addition, the execution-time results indicate that the apparatus and method can achieve real-time performance.

Specifically, a weight-sharing encoder-decoder structure and a coarse-to-fine strategy are introduced to effectively extract the multi-scale features of the panchromatic image and the multispectral image, respectively. The encoder-decoder structure gives the network a larger receptive field, which is useful for capturing the multi-scale complementary contextual information of the multispectral image and the corresponding panchromatic image.

In the decoder stage, the real-time image fusion apparatus and method for remote sensing based on deep learning according to an embodiment of the present invention incorporate a coarse-to-fine strategy that decodes the multi-scale features progressively, obtaining finer details step by step. In this way, not only small-scale features but also middle-scale and large-scale features are richly extracted, which improves the subsequent fusion. An information pool is also adopted to preserve raw information.

In this information pool, the panchromatic image, the upsampled multispectral image, and the shallow features (features obtained through a single convolutional layer) are concatenated and fed to a convolutional layer to preserve raw information. The intuitive motivation for the information pool is feature reuse and a memory mechanism: shallow features that contain raw spatial and spectral information help improve pan-sharpening performance.

Meanwhile, the skip connection from the information pool to the subsequent feature fusion also benefits gradient backpropagation. A multi-scale feature fusion module is then applied to fuse the multi-scale features from the decoder with the raw information from the information pool. Finally, the fused features are used to reconstruct the high-resolution multispectral image through several residual blocks. It is worth noting that global residual learning on the upsampled multispectral image is adopted to preserve spectral information. Experimental results show the superiority of the proposed multi-scale fusion network (MSFN).

The following describes the framework of the multi-scale fusion network (MSFN) proposed in the real-time image fusion apparatus and method for remote sensing based on deep learning according to an embodiment of the present invention, including the overall architecture and its mathematical formulation. The two special modules are then described in detail, followed by the real-time implementation. Finally, the loss function is introduced.

Network framework

As shown in FIG. 2, the real-time image fusion apparatus 200 for remote sensing based on deep learning according to an embodiment of the present invention consists of five main parts: a shallow feature extraction unit 202 (SFE: Shallow Feature Extraction), a multi-scale feature extraction network 206 (MSFENet: Multi-Scale Feature Extraction Net), an information pool 204 (IP: Information Pool), a multi-scale feature fusion unit 208 (MSFF: Multi-Scale Feature Fusion), and a reconstruction unit 210.

The shallow feature extraction unit 202 extracts shallow features from the upsampled low-resolution multispectral image 212 and from the panchromatic image, respectively. The information pool 204 generates raw information features based on the panchromatic image, the upsampled low-resolution multispectral image, the shallow features of the upsampled low-resolution multispectral image, and the shallow features of the panchromatic image. The first multi-scale feature extraction sub-network 222 generates small-scale, middle-scale, and large-scale features of the upsampled low-resolution multispectral image based on its shallow features. The second multi-scale feature extraction sub-network 224 generates small-scale, middle-scale, and large-scale features of the panchromatic image based on its shallow features. The multi-scale feature fusion unit 226 fuses the raw information features, the small-scale, middle-scale, and large-scale features of the upsampled low-resolution multispectral image, and the small-scale, middle-scale, and large-scale features of the panchromatic image. The reconstruction unit 210 reconstructs a high-resolution multispectral image based on the fused features and the upsampled low-resolution multispectral image.

The low-resolution multispectral image and the panchromatic image are the inputs of the MSFN, and the reconstructed high-resolution multispectral image is its output. In the SFE part, the two inputs are processed separately: only a single convolutional layer is applied to the panchromatic image and to the upsampled low-resolution multispectral image to extract their respective shallow features.

Here, separate convolution operations are used for the panchromatic path and the multispectral path, and the low-resolution multispectral image is upsampled by a bicubic upsampling operation. The two sets of shallow features are then fed to the multi-scale feature extraction network 206 (MSFENet) to further extract multi-scale features, and to the information pool 204 (IP) to preserve raw information.

The multi-scale feature extraction network 206 (MSFENet) uses two sub-networks with the same architecture and shared weights, the first multi-scale feature extraction sub-network 222 and the second multi-scale feature extraction sub-network 224, to capture the multi-scale information of the multispectral image and the panchromatic image, respectively.

Each sub-network combines an encoder-decoder structure with a coarse-to-fine strategy for effective multi-scale feature extraction. Details of the multi-scale feature extraction network 206 (MSFENet) are described later. Each sub-network takes the shallow features from the shallow feature extraction unit 202 (SFE) as input and produces three outputs.

Here, the sub-network for the panchromatic path and the sub-network for the multispectral path are each treated as a composite function, and the three outputs of each are the small-scale, middle-scale, and large-scale features, respectively. In the present invention, scale refers to the resolution of a feature map. Small-scale features are low-resolution feature maps containing coarse information, and large-scale features are high-resolution feature maps containing refined information. These multi-scale features are passed to the multi-scale feature fusion unit 226 (MSFF) for complementary information fusion.

The information pool 204 (IP) is a simple but useful module for preserving raw information. The intuitive motivation for the information pool 204 is feature reuse and a memory mechanism: shallow features that contain raw spatial and spectral information help improve pan-sharpening performance. As shown in FIG. 2, the information pool 204 concatenates the panchromatic image, the upsampled multispectral image, and the two sets of shallow features as its input, and processes this input with a single convolutional layer to generate output features that carry the raw information.

Here, the output of the information pool represents the raw information, which is sent through a skip connection to the multi-scale feature fusion unit 226 (MSFF) for subsequent feature fusion. This skip connection therefore also benefits gradient backpropagation. The information pool 204 (IP) consists of a concatenation operation followed by a convolutional layer, and the multispectral image is upsampled by a bicubic upsampling operation.

As a result, the multi-scale features from the multi-scale feature extraction network 206 (MSFENet) and the raw information from the information pool 204 (IP) are used in the multi-scale feature fusion unit 226 (MSFF) for complementary information fusion, which produces the fused features.

Here, the result is the fused feature produced by the multi-scale feature fusion unit 226 (MSFF). Details of the multi-scale feature fusion unit 226 (MSFF) are described later.

The fused feature is then used as the input of the reconstruction unit 210. The reconstruction unit 210 consists of K residual blocks 228, a single convolutional layer 230 at the tail, and a first adder 232. For a fair comparison, K is set to 12 so that the number of parameters is roughly the same as that of TFNet. Residual blocks were first proposed in ResNet and have played a dominant role in a variety of computer vision tasks.

Regarding the adoption of residual blocks, Ledig et al. proposed SRResNet and SRGAN for single image super-resolution (SISR). However, batch normalization (BN) in the original residual block alters the diversity of feature values and is therefore not well suited to image restoration tasks. Lim et al. accordingly removed the BN layers from the residual block and obtained satisfactory performance. The same residual block as in Lim et al. is therefore applied here. Details of the residual block are shown in FIG. 2. The residual image I_res is generated by the residual blocks 228 and the single convolutional layer 230.

Here, H_re() denotes the composite function of the reconstruction unit 210. The final pan-sharpened image is therefore obtained through global residual learning with the upsampled multispectral image.

Here, the upsampling is a bicubic upsampling operation.
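The data flow described above can be summarized compactly as follows. This is only a sketch under assumed notation: apart from $I_{res}$ and $H_{re}$, which appear in the description, every symbol here ($I_{ms}$, $I_{pan}$, $\uparrow$ for bicubic upsampling, $f_{pan}$, $f_{ms}$, $f_{IP}$, $H_{ms}$, $H_{pan}$, $H_{MSFF}$, the feature names $F$, and $\hat{I}$ for the pan-sharpened output) is hypothetical and chosen for illustration, and square brackets denote concatenation.

```latex
\begin{aligned}
F^{0}_{pan} &= f_{pan}(I_{pan}), \qquad F^{0}_{ms} = f_{ms}(\uparrow I_{ms}) \\
F_{raw} &= f_{IP}\big([\,I_{pan},\ \uparrow I_{ms},\ F^{0}_{pan},\ F^{0}_{ms}\,]\big) \\
(F^{S}_{x},\ F^{M}_{x},\ F^{L}_{x}) &= H_{x}(F^{0}_{x}), \qquad x \in \{ms,\ pan\} \\
F_{fused} &= H_{MSFF}\big(F_{raw},\ F^{S}_{ms},\ F^{M}_{ms},\ F^{L}_{ms},\ F^{S}_{pan},\ F^{M}_{pan},\ F^{L}_{pan}\big) \\
I_{res} &= H_{re}(F_{fused}) \\
\hat{I} &= I_{res} + \uparrow I_{ms}
\end{aligned}
```

Because the two sub-networks share their weights, $H_{ms}$ and $H_{pan}$ would be the same function applied to different inputs. The last line corresponds to the global residual learning performed by the first adder 232.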

Table 1 shows the detailed parameter settings of the real-time image fusion apparatus (MSFN) for remote sensing based on deep learning according to an embodiment of the present invention.

Multi-scale feature extraction network 206

The multi-scale feature extraction network 206 (MSFENet: Multi-Scale Feature Extraction Net) uses two sub-networks with the same architecture and shared weights, the first multi-scale feature extraction sub-network 222 and the second multi-scale feature extraction sub-network 224, to capture the multi-scale information of the multispectral image and the panchromatic image, respectively.

As shown in FIG. 3, each of the sub-networks 222 and 224 adopts an encoder-decoder structure. Because the two sub-networks are exactly the same, the first multi-scale feature extraction sub-network 222 of the multispectral path is taken as the example here.

The input is the shallow feature of the multispectral image, and the corresponding outputs are its small-scale, middle-scale, and large-scale features. At the encoder stage, a convolutional layer is first used to extract low-level features.

This first layer is a convolutional layer with a stride of 1. Its output is used both for a skip connection and for further encoding. A convolutional layer with a stride of 2 is then applied to halve the spatial resolution and double the number of feature channels of the input feature maps.

The output of this stride-2 convolution is likewise used both for a skip connection and for further encoding. Subsequently, a second downsampling step is performed by applying another convolutional layer with a stride of 2.

This third convolution is similar to the second: it halves the spatial size and doubles the number of feature channels of its input. Its output is also used for a skip connection and for further encoding.
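The three encoder convolutions described above can be traced as a shape calculation. The following is a minimal sketch, assuming 3×3 kernels with padding 1 and a base width of 32 channels; these values, and the function names, are illustrative assumptions and are not taken from the description, which only specifies the strides and the halving and doubling behavior.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Standard output-size formula for a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def encoder_shapes(in_shape: Shape, base_channels: int = 32) -> List[Shape]:
    """Trace feature-map shapes through the three encoder convolutions.

    Assumed (hypothetical) settings: 3x3 kernels, padding 1, base_channels=32.
    """
    _, h, w = in_shape
    channels = base_channels
    shapes: List[Shape] = []

    # Stage 1: stride 1 keeps the spatial size.
    h, w = conv_out(h, 3, 1, 1), conv_out(w, 3, 1, 1)
    shapes.append((channels, h, w))

    # Stages 2 and 3: stride 2 halves the spatial size and doubles the channels.
    for _ in range(2):
        channels *= 2
        h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
        shapes.append((channels, h, w))
    return shapes
```

Each of the three returned shapes corresponds to one encoder output that is kept for a skip connection.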

The decoder consists mainly of three parts: a small-scale feature extraction module 310 (SSFEM), a middle-scale feature extraction module 314 (MSFEM), and a large-scale feature extraction module 318 (LSFEM).

The small-scale feature extraction module 310 (SSFEM) consists of four residual blocks 328 and a convolutional layer 330 at the tail. The number of filters equals the number of feature channels of the feature fed into the module. In addition, an encoder feature is added to the output of the last convolutional layer as local residual learning, which helps retain earlier information and benefits information flow.

Here, the four residual blocks in the small-scale feature extraction module 310 (SSFEM) each produce an intermediate feature, the last convolutional layer 330 is applied to the final one, and the result is the extracted small-scale feature.

In the middle-scale feature extraction module 312 (MSFEM), the input is first processed by a deconvolution layer 332 with a stride of 2, which produces feature maps with twice the spatial size and half the feature channels of the input. Two residual blocks 334 and a convolutional layer 336 at the tail are then applied. In addition, an encoder feature is added to the output of the middle-scale feature extraction module 312 (MSFEM) through a skip connection.

Here, the two residual blocks 334 in the middle-scale feature extraction module 312 (MSFEM) each produce an intermediate feature. The module begins with the deconvolution layer 332 and ends with the convolutional layer 336, and its result is the extracted middle-scale feature.

The large-scale feature extraction module 318 (LSFEM) consists of a deconvolution layer 338 with a stride of 2 at the head, a residual block 340 in the middle, and a convolutional layer 342 at the tail. The input of the LSFEM is first processed by the deconvolution layer 338, which yields feature maps with twice the spatial resolution and half the channels of the input.

These feature maps are then fed to the residual block 340 and the convolutional layer 342 to produce an output. In addition, to make use of residual learning, an encoder feature is added to this output through a skip connection. The large-scale feature is obtained in this way.

Here, the initial deconvolution layer 338 of the large-scale feature extraction module 318 (LSFEM) produces an intermediate feature, which is then processed by the middle residual block 340 and the last convolutional layer 342.

In the decoder stage, a coarse-to-fine strategy is incorporated so that the multi-scale features are decoded progressively and finer details are obtained step by step. Specifically, four residual blocks are used for the small-scale features, two residual blocks for the middle-scale features, and one residual block for the large-scale features. In this way, not only small-scale features but also middle-scale and large-scale features are richly extracted, which improves the subsequent fusion. It is worth noting that the residual block counts of 4, 2, and 1 are kept fixed for feature extraction at the different scales.
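The coarse-to-fine decoder layout can be written down as a small configuration and traced for shapes. This is a minimal sketch, not the patented implementation: the residual block counts (4, 2, 1) and the stride-2 deconvolutions come from the description, while the class name, the function name, and the example input shape are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DecoderStage:
    name: str
    upsample: int         # stride of the leading deconvolution (1 means none)
    residual_blocks: int  # number of residual blocks in the stage


# Block counts 4 / 2 / 1 follow the coarse-to-fine strategy in the text.
DECODER: List[DecoderStage] = [
    DecoderStage("SSFEM", upsample=1, residual_blocks=4),
    DecoderStage("MSFEM", upsample=2, residual_blocks=2),
    DecoderStage("LSFEM", upsample=2, residual_blocks=1),
]


def decoder_shapes(channels: int, height: int, width: int) -> List[Tuple[str, int, int, int]]:
    """Trace (name, channels, height, width) of each stage's output.

    A stride-2 deconvolution doubles the spatial size and halves the channels;
    residual blocks and the tail convolution are assumed to preserve the shape.
    """
    shapes = []
    for stage in DECODER:
        channels //= stage.upsample
        height *= stage.upsample
        width *= stage.upsample
        shapes.append((stage.name, channels, height, width))
    return shapes
```

Starting from a hypothetical deepest encoder feature of 128 channels at 16×16, the three stages would yield small-, middle-, and large-scale features at 16×16, 32×32, and 64×64.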

In this way, the three multi-scale outputs of the first multi-scale feature extraction sub-network 222, namely the small-scale, middle-scale, and large-scale features of the multispectral image, are obtained. Likewise, the second multi-scale feature extraction sub-network 224 of the panchromatic path produces the small-scale, middle-scale, and large-scale features of the panchromatic image. These six outputs are subsequently provided to the multi-scale feature fusion unit 208, 226 (MSFF) for further complementary information fusion.

Multi-scale feature fusion unit 208, 226

The multi-scale features from the multi-scale feature extraction network 206 (MSFENet) and the raw information from the information pool 204 (IP) are processed in the multi-scale feature fusion unit 208, 226 (MSFF: multi-scale feature fusion) for complementary information fusion.

As shown in FIG. 4, the multi-scale feature fusion unit 208, 226 (MSFF) consists mainly of three parts: a small-scale features fusion block 400 (SSFFB), a middle-scale features fusion block 402 (MSFFB), and a comprehensive features fusion block 404 (CFFB).

The small-scale features fusion block 400 (SSFFB) consists of a convolutional layer 408 with a kernel size of 1 (kernel_size=1) and a deconvolution layer 410 with a stride of 4. The small-scale features of the two paths are concatenated by the concatenation unit 406 and used as the input of the convolutional layer 408. The small-scale features are thus fused in low-resolution (LR) space, which greatly reduces the computational burden. Finally, the fused feature is upsampled by a scale factor of ×4 by the deconvolution layer 410 for further fusion.

Here, the block applies a concatenation operation, then the 1×1 convolutional layer 408 that compresses the concatenated features, and then the deconvolution layer 410 with a stride of 4. The result is the fused small-scale feature.

Likewise, the middle-scale features fusion block 402 (MSFFB) takes the concatenation of the middle-scale features of the two paths as its input and uses a 1×1 convolutional layer 414 to fuse them in medium-resolution (MR) space. The fused feature is subsequently upsampled by a deconvolution layer 416 with a stride of 2 for further fusion.

Here, the block applies a concatenation operation, then the 1×1 convolutional layer 414, and then the deconvolution layer 416 with a stride of 2. The result is the fused middle-scale feature.

Finally, the features are concatenated and fed to the comprehensive features fusion block 404 (CFFB), a 1×1 convolutional layer that fuses the comprehensive information, that is, the multi-scale complementary information and the raw information, to produce the final fused feature.

Here, the comprehensive features fusion block 404 (CFFB) consists of a concatenation operation followed by the 1×1 convolutional layer 420.
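The fusion pipeline of the MSFF (concatenate, compress with a 1×1 convolution, then upsample with a deconvolution) can be illustrated as a shape calculation. This is a minimal sketch under assumptions: the helper names, the output width of 32 channels, and the example shapes are hypothetical, and the set of features passed to the final 1×1 convolution is chosen only for illustration.

```python
from typing import Sequence, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def concat(shapes: Sequence[Shape]) -> Shape:
    """Channel-wise concatenation; all inputs must share one spatial size."""
    spatial = {(h, w) for _, h, w in shapes}
    if len(spatial) != 1:
        raise ValueError("concatenation needs equal spatial sizes")
    h, w = spatial.pop()
    return (sum(c for c, _, _ in shapes), h, w)


def conv1x1(shape: Shape, out_channels: int) -> Shape:
    """A 1x1 convolution changes only the channel count."""
    _, h, w = shape
    return (out_channels, h, w)


def deconv(shape: Shape, out_channels: int, stride: int) -> Shape:
    """A deconvolution with the given stride multiplies the spatial size."""
    _, h, w = shape
    return (out_channels, h * stride, w * stride)


def fuse_scale(ms: Shape, pan: Shape, out_channels: int, stride: int) -> Shape:
    """Concat -> 1x1 Conv -> Deconv, as in the SSFFB (stride 4) and MSFFB (stride 2)."""
    fused = conv1x1(concat([ms, pan]), out_channels)
    return deconv(fused, out_channels, stride)
```

With hypothetical small-scale features of shape (128, 16, 16) and middle-scale features of shape (64, 32, 32) on each path, both `fuse_scale` calls return features at the full 64×64 resolution, so they can be concatenated with other full-resolution features before the final 1×1 convolution.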

Real-time implementation

The number of parameters and the computational complexity are two essential factors for practical applications and real-time processing. The present invention therefore takes two measures for real-time implementation. On the one hand, the weights of the two encoder-decoders, that is, the first multi-scale feature extraction sub-network 222 and the second multi-scale feature extraction sub-network 224 of the multi-scale feature extraction network 206 (MSFENet), are shared to reduce the number of parameters. On the other hand, to reduce computation, small-scale features are fused in low-resolution (LR) space and middle-scale features are fused in medium-resolution (MR) space.

Weight-sharing encoder-decoder structure

The multi-scale feature extraction network 206 (MSFENet) uses two sub-networks with the same encoder-decoder architecture to capture the multi-scale complementary contextual information of the multispectral image and the corresponding panchromatic image. The encoder-decoder architecture enables a larger receptive field, which helps effective multi-scale feature extraction. However, it also introduces a large number of parameters. To exploit the encoder-decoder architecture for favorable performance while keeping the network at a moderate size, the two encoder-decoders share the same weights, which halves the number of parameters of the multi-scale feature extraction network 206 (MSFENet).
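The halving of parameters through weight sharing can be checked with a simple count. This is a minimal sketch: the layer list `EXAMPLE_LAYERS` is a hypothetical stand-in for one sub-network, not the configuration of Table 1, and the function names are illustrative.

```python
from typing import Sequence, Tuple

ConvSpec = Tuple[int, int, int]  # (in_channels, out_channels, kernel_size)


def conv_params(in_ch: int, out_ch: int, kernel: int) -> int:
    """Weights plus biases of one 2-D convolution layer."""
    return in_ch * out_ch * kernel * kernel + out_ch


def msfenet_params(layers: Sequence[ConvSpec], shared: bool) -> int:
    """Parameters of two identical sub-networks, with or without weight sharing."""
    per_subnetwork = sum(conv_params(*spec) for spec in layers)
    return per_subnetwork if shared else 2 * per_subnetwork


# Hypothetical layer list used only to illustrate the count.
EXAMPLE_LAYERS = [(32, 32, 3), (32, 64, 3), (64, 128, 3)]
```

For any layer list, `msfenet_params(layers, shared=True)` is exactly half of `msfenet_params(layers, shared=False)`, because a shared sub-network stores one set of weights that is applied to both the multispectral and the panchromatic path.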

Fusion in low-resolution space for small-scale features and in medium-resolution space for middle-scale features

There are two main ways to obtain fused features from the complementary small-scale features or middle-scale features of the multispectral and panchromatic paths. The first adopts the pipeline shown in FIG. 3: concatenation (Concat), convolutional layer (Conv), and deconvolution layer (Deconv). The second adopts a different pipeline: deconvolution layer (Deconv), concatenation (Concat), and convolutional layer (Conv).

The former fuses complementary features in low-resolution (LR) or medium-resolution (MR) space, whereas the latter fuses them in high-resolution (HR) space. Because computational cost grows with spatial resolution, and feature maps in LR or MR space have lower spatial resolution than those in HR space, the former clearly has lower computational complexity.
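The cost difference between the two pipelines can be estimated by counting multiply-accumulate operations (MACs). This is a rough sketch under assumptions: biases are ignored, the deconvolution kernel size is a free parameter, all layers keep the same channel width `c`, and the function names are hypothetical.

```python
def conv1x1_macs(in_ch: int, out_ch: int, h: int, w: int) -> int:
    """MACs of a 1x1 convolution evaluated on an h x w feature map."""
    return in_ch * out_ch * h * w


def deconv_macs(in_ch: int, out_ch: int, kernel: int, h_in: int, w_in: int) -> int:
    """MACs of a transposed convolution: every input position is multiplied
    with a kernel x kernel filter for each (in, out) channel pair."""
    return in_ch * out_ch * kernel * kernel * h_in * w_in


def fuse_then_upsample(c: int, h: int, w: int, kernel: int) -> int:
    """Concat -> Conv -> Deconv: fuse at low resolution, upsample once."""
    return conv1x1_macs(2 * c, c, h, w) + deconv_macs(c, c, kernel, h, w)


def upsample_then_fuse(c: int, h: int, w: int, scale: int, kernel: int) -> int:
    """Deconv -> Concat -> Conv: upsample both branches, fuse at high resolution."""
    upsampling = 2 * deconv_macs(c, c, kernel, h, w)
    fusion = conv1x1_macs(2 * c, c, h * scale, w * scale)
    return upsampling + fusion
```

Under these assumptions the second pipeline pays twice for upsampling and runs the 1×1 fusion on a feature map that is `scale * scale` times larger, so `fuse_then_upsample` is always the cheaper of the two for the same inputs.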

Loss function

In addition to the network architecture, the loss function is another important factor affecting pan-sharpening performance. The L₂ loss function is commonly used in existing SISR and pan-sharpening tasks. However, the L₂ loss function is more likely to fall into a local minimum, which produces unexpected artifacts. The present invention therefore adopts the L₁ loss function to optimize the network; it better preserves luminance and color and converges faster than the L₂ loss function.

The training set is denoted as a set of image triplets. The first two elements of each triplet are the low-resolution multispectral image and the panchromatic image, respectively, and the third element is the corresponding ground-truth high-resolution multispectral image. Therefore, the loss function of the multi-scale fusion network according to the present invention can be expressed as follows.

Here, N denotes the number (size) of training samples, and the two remaining symbols denote, respectively, the function and the parameters of the real-time image fusion apparatus for remote sensing based on deep learning (MSFN) according to an embodiment of the present invention.
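For reference, an L₁ training loss over N samples is commonly written in the form below. The symbols are assumed here for illustration and may differ from the notation of the original formula: $x_{ms}^{(i)}$ and $x_{pan}^{(i)}$ stand for the i-th low-resolution multispectral and panchromatic inputs, $y^{(i)}$ for the ground-truth high-resolution multispectral image, $F$ for the network function, and $\Theta$ for its parameters. The $1/N$ averaging is likewise an assumption.

```latex
\mathcal{L}(\Theta) = \frac{1}{N} \sum_{i=1}^{N}
  \left\| F\!\left(x_{ms}^{(i)}, x_{pan}^{(i)}; \Theta\right) - y^{(i)} \right\|_{1}
```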

Experimental results

Datasets

To demonstrate the superiority of the real-time image fusion apparatus and method for remote sensing based on deep learning according to an embodiment of the present invention, experiments are conducted on four datasets: QuickBird, IKONOS, Spot-6, and Pleiades. The spectral wavelength and spatial resolution characteristics of these satellites are summarized in Table 2. Because high-resolution multispectral images do not actually exist in pan-sharpening datasets, Wald's protocol is followed for the experimental simulation. Specifically, both the multispectral image and the panchromatic image are downsampled by a scale factor of 4. The downsampled multispectral and panchromatic images can then be used as network inputs, and the original multispectral image can be used as the reference for network training and method evaluation.
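A minimal sketch of this reduced-resolution simulation is shown below. It assumes single-band images stored as nested lists and uses plain block averaging as the downsampling filter. Both are simplifications chosen for illustration; the specification does not state which filter is used, and the function names are hypothetical.

```python
def downsample(image, factor=4):
    """Shrink a 2-D list by averaging non-overlapping factor x factor blocks."""
    rows, cols = len(image), len(image[0])
    out = []
    for r in range(0, rows - rows % factor, factor):
        out_row = []
        for c in range(0, cols - cols % factor, factor):
            block = [image[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            out_row.append(sum(block) / len(block))
        out.append(out_row)
    return out


def make_training_triplet(ms_band, pan, factor=4):
    """Downsample both inputs; the original MS band becomes the reference."""
    return {
        "lrms": downsample(ms_band, factor),
        "pan_lr": downsample(pan, factor),
        "reference": ms_band,
    }


# Hypothetical sizes: an 8x8 MS band and a 32x32 PAN image (4x finer).
ms = [[float(r + c) for c in range(8)] for r in range(8)]
pan = [[float(r * c) for c in range(32)] for r in range(32)]
triplet = make_training_triplet(ms, pan)
print(len(triplet["lrms"]), len(triplet["pan_lr"]), len(triplet["reference"]))
```

After downsampling, the panchromatic input has the same size as the original multispectral band, so the original band can serve directly as the training target.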

Implementation details

The four datasets mentioned above are used in separate experiments. For each dataset, upsampled low-resolution multispectral (LRMS) / panchromatic / high-resolution multispectral (HRMS) image pairs are first extracted at a size of 256×256 and then split 90%/10% for training and testing. Data augmentation, including random cropping, rotation, and flipping, is applied to each training set. In the training phase, the mini-batch size is set to 8, and each patch pair is randomly cropped to 64×64 from a training image pair. Each experiment is trained for 1000 epochs. Adam is used to optimize the network. The initial learning rate is set to 0.0001 and is reduced by a factor of 0.5 every 200 epochs. All deep learning-based experiments are implemented in PyTorch on a desktop with a GTX1080Ti GPU. The experiments with the traditional methods are run in MATLAB R2018a on a 4.2 GHz Intel i7-7700K CPU.
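The learning-rate schedule and the random patch cropping described above can be sketched in plain Python as follows. The function names are hypothetical, and the sketch only reproduces the stated numbers (initial rate 0.0001, halved every 200 epochs, 64×64 patches from 256×256 images); it is not the training code of the embodiment.

```python
import random


def learning_rate(epoch, base_lr=1e-4, step=200, gamma=0.5):
    """Step decay: multiply the rate by gamma once every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


def random_crop_origin(image_size=256, patch_size=64, rng=random):
    """Pick the top-left corner of a patch that fits inside the image."""
    top = rng.randrange(image_size - patch_size + 1)
    left = rng.randrange(image_size - patch_size + 1)
    return top, left


# Learning rate at the start of each 200-epoch stage of a 1000-epoch run.
schedule = [learning_rate(e) for e in range(0, 1000, 200)]
print(schedule)

rng = random.Random(0)  # fixed seed so the example is repeatable
print(random_crop_origin(rng=rng))
```

In PyTorch, the same decay is usually obtained with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=200, gamma=0.5)`; whether the embodiment uses that scheduler is not stated.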

Ablation studies

To verify the effectiveness of the information pool (IP), the coarse-to-fine (CTF) strategy, and multi-scale fusion (MSF), the performance of five networks is analyzed: a baseline (without IP, CTF, and MSF), MSFN without IP, MSFN without CTF, MSFN without MSF, and the proposed MSFN. All ablation experiments are performed on the QuickBird dataset. The results of the ablation experiments and the corresponding loss curves are shown in Table 3 and FIG. 6a, respectively.

Information pool 204 (IP)

The information pool 204 (IP) is a simple module that helps preserve the original information and also benefits gradient back-propagation. The intuition behind it is feature reuse and a memory mechanism: shallow features that contain raw spatial and spectral information are likely to improve pan-sharpening performance. As shown in Table 3, the results in the second and last rows demonstrate the effectiveness of the information pool 204 (IP).

Coarse-to-fine (CTF) strategy

For the decoder stage of MSFENet, the present invention incorporates a coarse-to-fine strategy that progressively decodes the multi-scale features to obtain finer details step by step. Specifically, four residual blocks are used for the small-scale features, two residual blocks for the middle-scale features, and one residual block for the large-scale features. In this way, not only the small-scale features but also the middle-scale and large-scale features are extracted richly, which improves the subsequent fusion. To demonstrate the effectiveness of CTF, the residual blocks are removed from MSFEM and LSFEM, and all seven residual blocks are merged into SSFEM. As shown in Table 3, the network without CTF performs worse than MSFN, which indicates the advantage of CTF.

Multi-scale fusion (MSF)

MSF is essential for fully exploiting the hierarchical complementary features of the panchromatic and multispectral images. For the ablation study of MSF, single-scale fusion is applied in place of MSF. Specifically, only the large-scale features of the upsampled low-resolution multispectral image and the panchromatic image are fused to obtain complementary information, which produces the result in the fourth row of Table 3. For a further comparison between single-scale fusion and multi-scale fusion, two additional single-scale fusion experiments were performed, one with small-scale fusion and one with middle-scale fusion. Small-scale fusion means fusing only the small-scale features of the two images, and middle-scale fusion means fusing only their middle-scale features. The comparison results are shown in Table 4, and the corresponding loss curves are shown in FIG. 6b. Multi-scale fusion clearly outperforms single-scale fusion, which demonstrates the effectiveness of MSF. Overall, IP, CTF, and MSF all contribute to better pan-sharpening performance.

Comparison with other methods

Quantitative evaluation and visual performance

To confirm the effectiveness of the MSFN according to the present invention, it is compared with four traditional methods and several deep learning-based methods: Indusion, MMP, AWLP, GS, PNN, DiCNN1, PanNet, ECNN, DRPNN, BIEDN, and TFNet. For the quantitative evaluation, six widely used metrics are adopted: the correlation coefficient (CC), the relative dimensionless global error in synthesis (ERGAS), the four-band extension of the universal image quality index (Q₄), the spectral angle mapper (SAM), the relative average spectral error (RASE), and the root mean square error (RMSE). The objective quantitative results on the four datasets mentioned above are listed in Tables 5, 6, 7, and 8, respectively. For each dataset, the final result is computed by averaging the results over the corresponding test images.
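Three of these metrics are simple enough to sketch directly. The code below uses the textbook definitions of RMSE, CC, and SAM on flat lists of pixel values. The input layout, the choice of degrees for SAM, and the function names are assumptions made for illustration, since the specification does not give the formulas.

```python
import math


def rmse(pred, ref):
    """Root mean square error between two equally long value sequences."""
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(pred))


def cc(pred, ref):
    """Pearson correlation coefficient between two value sequences."""
    n = len(pred)
    mean_p, mean_r = sum(pred) / n, sum(ref) / n
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(pred, ref))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_r = sum((r - mean_r) ** 2 for r in ref)
    return cov / math.sqrt(var_p * var_r)


def sam_degrees(pred_pixels, ref_pixels):
    """Mean spectral angle in degrees; each pixel is a sequence of band values."""
    angles = []
    for p, r in zip(pred_pixels, ref_pixels):
        dot = sum(a * b for a, b in zip(p, r))
        norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in r))
        cos = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
        angles.append(math.degrees(math.acos(cos)))
    return sum(angles) / len(angles)


print(rmse([0.0, 0.0], [3.0, 4.0]))
print(cc([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(sam_degrees([(1.0, 0.0)], [(0.0, 1.0)]))
```

Lower is better for RMSE and SAM, and a CC closer to 1 is better.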

As these tables show, the MSFN according to the present invention achieves the best performance among all compared pan-sharpening methods on every objective evaluation metric. These quantitative results demonstrate the superiority of the MSFN of the present invention, which succeeds in recovering spatial details while also preserving spectral information.

In addition to the objective results, several visual comparisons are provided. FIGS. 7 to 9 show these subjective results for QuickBird, IKONOS, and Pleiades, respectively. The first three bands of the multispectral image are selected for the visual comparison. The corresponding residual images are also provided to highlight the differences for easier visual inspection. A residual image is the difference between the pan-sharpened image and the reference image, so the smaller the difference, the better the reconstruction. In the enlarged region of FIG. 7, it is easy to see that the traditional methods tend to obtain clear spatial information but suffer from spectral distortion, whereas most deep learning-based methods tend to produce smoother spatial details. Only the method according to the present invention achieves satisfactory preservation in both the spectral and spatial domains, as is evident from the three black dots in the enlarged region. The residual images further show that the method according to the present invention has the least spatial and spectral distortion relative to the reference image.

As can be seen in FIG. 8, traditional methods such as Indusion, AWLP, and GS noticeably enhance spatial details at the cost of introducing serious spectral distortion, especially in the green land and the canyons. The same phenomenon appears in some deep learning-based methods such as PNN, DiCNN1, ECNN, and DRPNN. Conversely, MMP and PanNet show less spectral distortion but recover smoother spatial details.

In contrast, TFNet and the method according to the present invention achieve satisfactory spectral and spatial preservation. On closer inspection, the method according to the present invention obtains sharper shapes of the canyons and green land while reducing spectral distortion, as can be seen in the enlarged region of FIG. 8 and the corresponding residual images.

As can be seen in FIG. 9, the traditional methods improve spatial details impressively, but spectral distortion is noticeable at the same time, especially for AWLP and GS. The deep learning-based methods are more likely to achieve satisfactory reconstructions. However, according to the residual images, PNN, DiCNN1, PanNet, and ECNN tend to produce artifacts in regions with complex texture. In contrast, DRPNN, TFNet, and the method according to the present invention produce better results. On closer inspection, the method according to the present invention produces the sharpest straight contours of the drainage channel without spectral distortion and is the most similar to the reference image. These comparisons show the superiority of the MSFN according to the present invention. In summary, the method according to the present invention achieves superior performance in terms of both quantitative metrics and visual quality.

Nonparametric tests

In addition to the quantitative metrics and visual quality, several nonparametric statistical tests, which have proven powerful for comparing different methods, are performed. Three widely applied nonparametric tests, namely Friedman, Aligned Friedman, and Quade, are used in this experiment to test statistical significance. For each test, the average rank of every method is computed from its performance on the different datasets in terms of the CC metric.

The results of the Friedman, Aligned Friedman, and Quade nonparametric tests are listed in Tables 9, 10, and 11, respectively. The smaller the rank, the better the method. As these tables show, the MSFN according to the present invention achieves the best performance among all compared pan-sharpening methods. All three nonparametric tests show the superiority of the method according to the present invention.
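The average-rank computation behind a Friedman-style comparison can be sketched as follows. The method names and CC scores are made up for illustration and are not results from the tables. The sketch covers only the plain per-dataset ranking used by the Friedman test: Aligned Friedman and Quade rank differently, and no test statistic is computed here.

```python
def average_ranks(scores):
    """scores: {method: [score per dataset]}, higher is better.

    Returns {method: mean rank over datasets}; rank 1 is best and tied
    methods share the mean of the positions they occupy.
    """
    methods = list(scores)
    n_datasets = len(next(iter(scores.values())))
    totals = {m: 0.0 for m in methods}
    for d in range(n_datasets):
        ordered = sorted(methods, key=lambda m: scores[m][d], reverse=True)
        i = 0
        while i < len(ordered):
            j = i
            while (j + 1 < len(ordered)
                   and scores[ordered[j + 1]][d] == scores[ordered[i]][d]):
                j += 1
            shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
            for m in ordered[i:j + 1]:
                totals[m] += shared
            i = j + 1
    return {m: totals[m] / n_datasets for m in methods}


# Hypothetical CC values on two datasets (illustrative only).
example = {"A": [0.9, 0.8], "B": [0.7, 0.9], "C": [0.5, 0.5]}
ranks = average_ranks(example)
print(ranks)
```

A method that is best on every dataset would receive an average rank of exactly 1.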

Real-time evaluation

For the runtime evaluation, pairs of upsampled LRMS and panchromatic images of size 1024×1024 are fed directly into the trained models to generate pan-sharpened images. The results of each method on the different datasets are shown in Table 12. They indicate that the method according to the present invention generates one image in 0.0068 seconds and thus achieves real-time performance. It is worth noting that the method according to the present invention takes more time to reconstruct a pan-sharpened image than the other deep learning-based methods except BIEDN. This is reasonable, because the aim of the method is to improve performance as much as possible under the constraint of real-time applications. As mentioned above, MSFN achieves the best performance among all compared pan-sharpening methods. The method according to the present invention can therefore meet the requirements of real-time implementation while achieving more effective performance.
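As a quick check of what 0.0068 seconds per image means in terms of throughput, the arithmetic can be written out as below. The 30 frames-per-second budget is a hypothetical threshold chosen for illustration; the specification does not define a numeric real-time criterion.

```python
def frames_per_second(seconds_per_image):
    """Convert a per-image latency into images processed per second."""
    return 1.0 / seconds_per_image


fps = frames_per_second(0.0068)

# Hypothetical real-time budget (assumption, not from the specification).
ASSUMED_BUDGET_FPS = 30.0
meets_budget = fps >= ASSUMED_BUDGET_FPS

print(round(fps, 1), meets_budget)
```

At the reported latency the method processes roughly 147 images of size 1024×1024 per second, well above the assumed budget.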

Conclusion and future work

To meet the requirements of real-time implementation while achieving more effective performance, the present invention proposes a multi-scale fusion network (MSFN) for the pan-sharpening task that can make full use of the hierarchical complementary features of panchromatic and multispectral images. In the proposed MSFN, an encoder-decoder structure and a coarse-to-fine strategy are introduced to effectively and separately extract the multi-scale features of the panchromatic image and the multispectral image. In addition, an information pool is adopted to preserve the raw information, which also benefits gradient back-propagation.

Furthermore, a multi-scale fusion module is applied to fuse the multi-scale features from the decoders and the information pool, making full use of the hierarchical complementary spatial and spectral information. Extensive experiments show the superiority of the method of the present invention as a pan-sharpening method.

Although the present invention has been described in detail through specific embodiments, these serve only to explain the present invention concretely. The present invention is not limited to them, and it is clear that modifications and improvements can be made by those of ordinary skill in the art within the technical spirit of the present invention.

All simple modifications and changes of the present invention fall within the scope of the present invention, and the specific scope of protection of the present invention will be made clear by the appended claims.

200: real-time image fusion apparatus for remote sensing based on deep learning
202: shallow feature extraction unit
204: information pool
206: multi-scale feature extraction network
208: multi-scale feature fusion unit
210: reconstruction unit
212: upsampled low-resolution multispectral image
214, 216, 220, 230, 234, 238: convolution layer
218: concatenation unit
222: first multi-scale feature extraction sub-network
224: second multi-scale feature extraction sub-network
226: multi-scale feature fusion unit
228: residual block
232, 240: adder
236: ReLU
300: encoder
302: decoder
304, 306, 308, 330, 336, 342: convolution layer
310, 322: small-scale feature extraction unit
312, 316, 320: adder
314, 324: middle-scale feature extraction unit
318, 326: large-scale feature extraction unit
328, 334, 340: residual block
332, 338: deconvolution layer
400: small-scale feature fusion unit
402: middle-scale feature fusion unit
404: comprehensive feature fusion unit
406, 412, 418: concatenation unit
408, 414, 420: convolution layer
410, 416: deconvolution layer

Claims

A real-time image fusion apparatus for remote sensing based on deep learning, comprising:
a shallow feature extraction unit that extracts shallow features from each of an upsampled low-resolution multispectral image and a panchromatic image;
an information pool that generates a raw information feature based on the panchromatic image, the upsampled low-resolution multispectral image, the shallow features of the upsampled low-resolution multispectral image, and the shallow features of the panchromatic image;
a first multi-scale feature extraction sub-network that generates small-scale features, middle-scale features, and large-scale features of the upsampled low-resolution multispectral image based on the shallow features of the upsampled low-resolution multispectral image;
a second multi-scale feature extraction sub-network that generates small-scale features, middle-scale features, and large-scale features of the panchromatic image based on the shallow features of the panchromatic image;
a multi-scale feature fusion unit that fuses the raw information feature, the small-scale, middle-scale, and large-scale features of the upsampled low-resolution multispectral image, and the small-scale, middle-scale, and large-scale features of the panchromatic image; and
a reconstruction unit that reconstructs a high-resolution multispectral image based on the fused features and the upsampled low-resolution multispectral image.

The apparatus of claim 1, wherein the shallow feature extraction unit comprises:
a first convolution layer that performs a convolution operation on the upsampled low-resolution multispectral image and outputs the shallow features of the upsampled low-resolution multispectral image; and
a second convolution layer that performs a convolution operation on the panchromatic image and outputs the shallow features of the panchromatic image.

The apparatus of claim 1, wherein the information pool comprises:
a concatenation unit that concatenates the panchromatic image, the upsampled low-resolution multispectral image, the shallow features of the upsampled low-resolution multispectral image, and the shallow features of the panchromatic image; and
a third convolution layer that performs a convolution operation on the output of the concatenation unit and outputs the raw information feature.

The apparatus of claim 1, wherein the first multi-scale feature extraction sub-network comprises:
a first encoder that lowers the spatial resolution of the shallow features of the upsampled low-resolution multispectral image and increases the number of feature channels of the feature maps; and
a first decoder that generates the small-scale features, middle-scale features, and large-scale features of the upsampled low-resolution multispectral image based on an output of the first encoder, and
wherein the second multi-scale feature extraction sub-network comprises:
a second encoder that lowers the spatial resolution of the shallow features of the panchromatic image and increases the number of feature channels of the feature maps; and
a second decoder that generates the small-scale features, middle-scale features, and large-scale features of the panchromatic image based on an output of the second encoder.

The apparatus of claim 4, wherein the first multi-scale feature extraction sub-network and the second multi-scale feature extraction sub-network share the weights used in their convolution layers.

The apparatus of claim 1, wherein the multi-scale feature fusion unit comprises:
a small-scale feature fusion unit that fuses the small-scale features of the upsampled low-resolution multispectral image with the small-scale features of the panchromatic image and outputs a fused small-scale feature;
a middle-scale feature fusion unit that fuses the middle-scale features of the upsampled low-resolution multispectral image with the middle-scale features of the panchromatic image and outputs a fused middle-scale feature; and
a comprehensive feature fusion unit that fuses the fused small-scale feature, the fused middle-scale feature, the large-scale features of the upsampled low-resolution multispectral image, the large-scale features of the panchromatic image, and the raw information feature, and outputs a final fused feature.

A real-time image fusion method for remote sensing based on deep learning, comprising:
(A) extracting, by a shallow feature extraction unit, shallow features from each of an upsampled low-resolution multispectral image and a panchromatic image;
(B) generating, by an information pool, a raw information feature based on the panchromatic image, the upsampled low-resolution multispectral image, the shallow features of the upsampled low-resolution multispectral image, and the shallow features of the panchromatic image;
(C) generating, by a first multi-scale feature extraction sub-network, small-scale features, middle-scale features, and large-scale features of the upsampled low-resolution multispectral image based on the shallow features of the upsampled low-resolution multispectral image;
(D) generating, by a second multi-scale feature extraction sub-network, small-scale features, middle-scale features, and large-scale features of the panchromatic image based on the shallow features of the panchromatic image;
(E) fusing, by a multi-scale feature fusion unit, the raw information feature, the small-scale, middle-scale, and large-scale features of the upsampled low-resolution multispectral image, and the small-scale, middle-scale, and large-scale features of the panchromatic image; and
(F) reconstructing, by a reconstruction unit, a high-resolution multispectral image based on the fused features and the upsampled low-resolution multispectral image.

The method of claim 7, wherein step (A) comprises:
performing a convolution operation on the upsampled low-resolution multispectral image to output the shallow features of the upsampled low-resolution multispectral image; and
performing a convolution operation on the panchromatic image to output the shallow features of the panchromatic image.

The method of claim 7, wherein step (B) comprises:
(B-1) concatenating the panchromatic image, the upsampled low-resolution multispectral image, the shallow features of the upsampled low-resolution multispectral image, and the shallow features of the panchromatic image; and
(B-2) performing a convolution operation on the output of step (B-1) to output the raw information feature.

The method of claim 7, wherein step (C) comprises:
(C-1) lowering the spatial resolution of the shallow features of the upsampled low-resolution multispectral image and increasing the number of feature channels of the feature maps; and
(C-2) generating the small-scale features, middle-scale features, and large-scale features of the upsampled low-resolution multispectral image based on the output of step (C-1), and
wherein step (D) comprises:
(D-1) lowering the spatial resolution of the shallow features of the panchromatic image and increasing the number of feature channels of the feature maps; and
(D-2) generating the small-scale features, middle-scale features, and large-scale features of the panchromatic image based on the output of step (D-1).

The method of claim 10, wherein step (C) and step (D) share the weights used in the convolution layers.

The method of claim 7, wherein step (E) comprises:
fusing the small-scale features of the upsampled low-resolution multispectral image with the small-scale features of the panchromatic image and outputting a fused small-scale feature;
fusing the middle-scale features of the upsampled low-resolution multispectral image with the middle-scale features of the panchromatic image and outputting a fused middle-scale feature; and
fusing the fused small-scale feature, the fused middle-scale feature, the large-scale features of the upsampled low-resolution multispectral image, the large-scale features of the panchromatic image, and the raw information feature, and outputting a final fused feature.