CN113191325A - Image fusion method, system and application thereof - Google Patents

Image fusion method, system and application thereof Download PDF

Info

Publication number
CN113191325A
CN113191325A CN202110567685.8A
Authority
CN
China
Prior art keywords
image
information
spatial
extracting
multispectral
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110567685.8A
Other languages
Chinese (zh)
Other versions
CN113191325B (en)
Inventor
钟锡武
钱静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to CN202110567685.8A priority Critical patent/CN113191325B/en
Publication of CN113191325A publication Critical patent/CN113191325A/en
Application granted granted Critical
Publication of CN113191325B publication Critical patent/CN113191325B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/13Satellite images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06F18/2135Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on approximation criteria, e.g. principal component analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/194Terrestrial scenes using hyperspectral data, i.e. more or other wavelengths than RGB
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Remote Sensing (AREA)
  • Astronomy & Astrophysics (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The present application belongs to the field of image processing technology, and in particular, relates to an image fusion method, system and application thereof. The application provides an image fusion method, which comprises the steps of extracting first high-pass information of a multispectral image to obtain a first multispectral image, and extracting second high-pass information of a full-color image to obtain a first full-color image; extracting first spatial information of the first multispectral image and extracting second spatial information of the first panchromatic image; fusing the first spatial information and the second spatial information to obtain spatial features; and reconstructing the spatial features to obtain a high spatial resolution image, and simultaneously directly transmitting the multispectral image and the panchromatic image to the high resolution image after the spatial features are reconstructed, thereby improving the spectral resolution of the fused image.

Description

Image fusion method, system and application thereof
Technical Field
The present application belongs to the field of image processing technology, and in particular, relates to an image fusion method, system and application thereof.
Background
With current remote sensing system designs, both spectral and spatial resolution often cannot be maintained at high levels simultaneously. The images acquired by different sensors differ in geometrical characteristics, spectral resolution and spatial resolution. Some sensors acquire rich spectral information of a scene but lack sufficient spatial information, such as multispectral images (MS). On the other hand, some sensors are good at capturing spatial information but cannot capture reliable spectral information, such as a panchromatic image (PAN). High spatial resolution images provide subtle geometric features, while high spectral resolution images provide rich spectral information that can be used to identify and analyze targets. In order to take full advantage of the information provided by multispectral and panchromatic images, it is common to fuse a low-resolution multispectral image with a high-resolution panchromatic image of the same scene to produce an image with a more detailed spatial and spectral structure, i.e., pansharpening.
Remote sensing image pansharpening has developed to date into a variety of technologies and algorithms. Pansharpening often serves as the basis for other remote sensing applications (such as semantic segmentation and classification of remote sensing images) and is therefore particularly important in remote sensing image processing. Widely used methods include those based on principal component analysis, wavelet transform, convolutional neural networks, and generative adversarial networks. Although many methods have been developed, none is optimal, because they tend to make inefficient use of the spatial and spectral information of the MS and PAN images. Existing fusion algorithms often assume that spatial information resides only in the PAN image and spectral information only in the MS image, thereby ignoring the spatial information present in the MS image and the spectral information that may exist in the PAN image, which results in varying degrees of loss of spectral and spatial information. Meanwhile, existing deep learning methods fuse features by simply stacking feature maps, an operation that merely provides a fixed linear aggregation of feature maps, with no indication of whether such a combination is suitable for a specific object.
Disclosure of Invention
1. Technical problem to be solved
Existing deep learning methods use a simple feature-map stack for feature fusion; such an operation only provides a fixed linear aggregation of feature maps, and whether this combination is suitable for a specific object is completely unknown. Meanwhile, existing methods often ignore the spatial information present in the MS image and the spectral information that may exist in the PAN image, causing a certain loss of spectral and spatial information in the fused image. To address these problems, the present application provides an image fusion method, a system and applications thereof.
2. Technical scheme
In order to achieve the above object, the present application provides an image fusion method, including the steps of: Step 1: extracting first high-pass information of the multispectral image to obtain a first multispectral image, and extracting second high-pass information of the full-color image to obtain a first full-color image; Step 2: extracting first spatial information of the first multispectral image and extracting second spatial information of the first panchromatic image; Step 3: fusing the first spatial information and the second spatial information to obtain spatial features; Step 4: reconstructing the spatial features to obtain a high spatial resolution image, and simultaneously directly transmitting the multispectral image and the panchromatic image to the high resolution image after the spatial features are reconstructed, thereby improving the spectral resolution of the fused image.
Another embodiment provided by the present application is: the extracting of the first high-pass information of the multispectral image comprises up-sampling the input multispectral image to make the multispectral image and the panchromatic image have the same size, and then extracting the first high-pass information of the up-sampled multispectral image by adopting high-pass filtering; the extracting the second high-pass information of the full-color image includes extracting the second high-pass information of the full-color image using high-pass filtering.
Another embodiment provided by the present application is: the first high-pass information extracting first low-pass information of the up-sampled multispectral image by using mean filtering, and then subtracting the first low-pass information from the up-sampled multispectral image; the second high-pass information extracts second low-pass information of the panchromatic image by employing mean filtering, and then subtracts the second low-pass information from the up-sampled multispectral image.
Another embodiment provided by the present application is: the first spatial information is extracted by adopting a convolutional neural network, and the second spatial information is extracted by adopting the convolutional neural network.
Another embodiment provided by the present application is: the reconstructing the spatial feature comprises reconstructing the spatial feature by adopting a U-Net network; and transmitting the up-sampling multispectral image and the panchromatic image to a spatial reconstruction image through spectral mapping by adopting long jump connection to obtain an image with high spatial resolution and high spectral resolution.
The application also provides an image fusion system, which comprises a feature extraction module, an attention feature fusion module and an image reconstruction module which are sequentially connected; the characteristic extraction module is used for acquiring high-pass information of an original image and then extracting image characteristics to obtain a characteristic diagram; the attention feature fusion module is used for fusing the feature map; and the image reconstruction module is used for reconstructing a high-spatial resolution image from the fused image.
Another embodiment provided by the present application is: the image reconstruction module comprises a long jump connection submodule, and the long jump connection submodule is used for transmitting the image spectrum information to the space for reconstruction and then fusing the image spectrum information with the image with the reconstructed space information.
Another embodiment provided by the present application is: the system is trained with an ℓ1 loss function, the ℓ1 loss function being

L(θ) = (1/N) Σ_{i=1}^{N} || f_θ(X_PAN^(i), X_MS^(i)) − Y^(i) ||_1

where N is the number of training samples in the mini-batch, X_PAN^(i) and X_MS^(i) are the PAN image and the low-resolution MS image, Y^(i) is the corresponding high-resolution MS image, and θ denotes the parameters of the Attention_FPNet network.
Another embodiment provided by the present application is: the attention feature fusion module computes

Z = M(X1 ⊎ X2) ⊗ X1 + (1 − M(X1 ⊎ X2)) ⊗ X2

where X1 and X2 denote the two input features, Z ∈ R^(C×H×W) denotes the fused feature, M(X1 ⊎ X2) denotes the weights produced by the channel attention module M and consists of real numbers between 0 and 1, 1 − M(X1 ⊎ X2) corresponds to the dashed line in Fig. 2 and likewise consists of real numbers between 0 and 1, ⊎ denotes broadcast addition, and ⊗ denotes element-wise multiplication.
The application also provides an application of the image fusion method, and the image fusion method is applied to the remote sensing image super-resolution reconstruction problem.
3. Advantageous effects
Compared with the prior art, the image fusion method, the image fusion system and the application thereof have the beneficial effects that:
the image fusion method provided by the application adopts a double-branch fusion network based on Attention feature fusion to solve the pancharapening problem, and is named as Attention _ FPNet.
The image fusion method provided by the application reconstructs the spatial information of the image in the high-pass filtering domain, and more fully considers the spatial information in the multispectral and panchromatic images. Meanwhile, the input panchromatic image and the up-sampled multispectral image are directly transmitted to the image after the space is reconstructed through a long jump connection, the spectral information of the panchromatic image and the multispectral image is considered, the spectral resolution of the fused image is improved, and the possible loss of spatial information caused by the deepening of a network is supplemented. Meanwhile, the attention feature fusion method is used, the relation among different feature graphs is fully considered, and the fusion quality is improved.
The image fusion method of the application exploits the feature extraction capability of a powerful convolutional network, which introduces little spectral distortion, together with the efficient fusion performance of the attention mechanism. A dual-branch fusion network based on attention feature fusion is used: in order to make fuller use of the spatial information in the MS and PAN images, the spatial information of the high-pass-filtered MS and PAN images is fused and the spatial information of the fused image is then reconstructed; at the same time, an attention feature fusion module replaces the common channel-stacking method, taking the relationships among different channels into account and thereby improving the quality of feature fusion.
According to the image fusion method, in order to obtain a fused image with higher spectral resolution, the spectral information in both the MS image and the PAN image is considered: a long jump connection directly transmits the input PAN image and the up-sampled MS image to the fused image after spatial reconstruction, thereby reducing the loss of spectral information.
In the image fusion method provided by the application, some spatial information is inevitably lost as the network becomes deeper, and the long jump connection also serves to supplement this spatial information. With the method and system of the application, a multispectral image with higher resolution can be obtained.
According to the image fusion method, the attention feature fusion method is used for replacing a simple channel stacking method used in the past to fuse the feature map, the weight among different channels is considered, and the quality of feature fusion is improved.
According to the image fusion method, the spatial resolution of the image is reconstructed in a high-pass filtering domain instead of an image domain, the spatial information in the MS image and the spatial information in the PAN image are more fully considered, the spatial resolution of the fused image can be improved, and meanwhile, the spectral information of the MS image and the spectral information of the PAN image are more fully utilized by using a long jump connection.
The image fusion system provided by the application uses an ℓ1 loss function, rather than the widely used ℓ2 loss function, to optimize the network.
Drawings
FIG. 1 is a schematic diagram of an Attention _ FPNet of the present application;
FIG. 2 is a schematic view of an attention feature fusion module of the present application;
FIG. 3 is a detailed structural diagram of Attention _ FPNet of the present application;
fig. 4 is a schematic view of a first effect of the present application;
FIG. 5 is a schematic diagram of a second effect of the present application;
fig. 6 is a schematic diagram of a third effect of the present application.
Detailed Description
Hereinafter, specific embodiments of the present application will be described in detail with reference to the accompanying drawings, and it will be apparent to those skilled in the art from this detailed description that the present application can be practiced. Features from different embodiments may be combined to yield new embodiments, or certain features may be substituted for certain embodiments to yield yet further preferred embodiments, without departing from the principles of the present application.
Remote sensing image panchromatic sharpening, i.e., pansharpening, enhances a multispectral remote sensing image with the panchromatic band: the observation process of the panchromatic band and of the multiband image is simulated in combination with the sensor characteristics, and the expected value of the high-resolution multispectral image is estimated using prior knowledge. The method automatically aligns the panchromatic band data with the multispectral band data, successfully preserves the spectral information, increases the spatial resolution and enriches the ground information.
In recent years, many different pansharpening methods have been proposed. These methods can be broadly classified into the following five categories: component substitution (CS), multiresolution analysis (MRA), hybrid approaches (combining CS and MRA), model-based approaches, and deep learning-based approaches.
(1) Component substitution methods: the CS method transforms the MS image into another color space using a reversible transformation that separates the spatial and spectral information of the MS image, and replaces the separated spatial information of the MS image with the spatial information of the PAN image after histogram matching. Finally, the MS image with the replaced spatial information is converted back to the original color space by the inverse transformation. IHS (Intensity-Hue-Saturation), Principal Component Analysis (PCA), Brovey Transform (BT) and Gram-Schmidt (GS) based transformations are the best known CS methods.
(2) Multi-resolution analysis: the MRA method decomposes each original image into a series of images of different resolutions using multiscale methods such as Laplacian pyramid decomposition, wavelet transform, contourlet transform and curvelet transform, performs fusion on the images at the different resolutions, and finally applies the inverse transformation to obtain the fused image.
(3) Hybrid methods: hybrid methods combine the advantages of the CS and MRA methods.
(4) Model-based methods: model-based methods work in reverse. They first assume a degradation process from the high-resolution MS image to the low-resolution MS image and the high-resolution PAN image, describe this degradation with an optimization model, and then recover the high-resolution image by inverting the degradation.
(5) Deep learning-based methods: deep learning-based methods can obtain ideal fusion performance by relying on the feature extraction capability of a powerful convolutional network with little spectral distortion. In 2016, Giuseppe et al. improved on the single-image super-resolution reconstruction algorithm SRCNN and proposed the first three-layer deep-learning network structure for solving the pansharpening problem. The input MS and PAN images are first stacked along the channel dimension and then fed into the three-layer network to reconstruct the image, producing a multispectral image with high spatial resolution. This idea was widely adopted afterwards, and many deep-learning-based pansharpening network structures were subsequently developed.
Referring to fig. 1 to 6, the present application provides an image fusion method, including the steps of: Step 1: extracting first high-pass information of the multispectral image to obtain a first multispectral image, and extracting second high-pass information of the full-color image to obtain a first full-color image; Step 2: extracting first spatial information of the first multispectral image and extracting second spatial information of the first panchromatic image; Step 3: fusing the first spatial information and the second spatial information to obtain spatial features; Step 4: reconstructing the spatial features to obtain a high spatial resolution image, and simultaneously directly transmitting the multispectral image and the panchromatic image to the high resolution image after the spatial features are reconstructed, thereby improving the spectral resolution of the fused image.
Further, the extracting the first high-pass information of the multispectral image comprises up-sampling the input multispectral image to make the multispectral image and the panchromatic image have the same size, and then extracting the first high-pass information of the up-sampled multispectral image by adopting high-pass filtering; the extracting the second high-pass information of the full-color image includes extracting the second high-pass information of the full-color image using high-pass filtering.
Further, the first high-pass information is obtained by extracting first low-pass information of the up-sampled multispectral image using mean filtering and then subtracting the first low-pass information from the up-sampled multispectral image; the second high-pass information is obtained by extracting second low-pass information of the panchromatic image using mean filtering and then subtracting the second low-pass information from the panchromatic image.
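By way of illustration, the following is a minimal sketch (in Python/PyTorch, not part of the application itself) of this high-pass extraction step; the 5 × 5 mean-filter window and the bicubic interpolation mode are assumptions, since the application does not fix them here.

import torch
import torch.nn.functional as F

def high_pass(img: torch.Tensor, kernel_size: int = 5) -> torch.Tensor:
    # Subtract a mean-filtered (low-pass) copy from the image; img has shape (N, C, H, W).
    low_pass = F.avg_pool2d(img, kernel_size, stride=1,
                            padding=kernel_size // 2, count_include_pad=False)
    return img - low_pass

def prepare_inputs(ms: torch.Tensor, pan: torch.Tensor):
    # Up-sample the MS image to the PAN size, then take the high-pass of both.
    ms_up = F.interpolate(ms, size=pan.shape[-2:], mode='bicubic', align_corners=False)
    return high_pass(ms_up), high_pass(pan), ms_up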
Further, the first spatial information is extracted by using a convolutional neural network, and the second spatial information is extracted by using a convolutional neural network.
Further, the reconstructing the spatial feature comprises reconstructing the spatial feature by using a U-Net network; and transmitting the up-sampling multispectral image and the panchromatic image to a spatial reconstruction image through spectral mapping by adopting long jump connection to obtain an image with high spatial resolution and high spectral resolution.
The application also provides an image fusion system, which comprises a feature extraction module, an attention feature fusion module and an image reconstruction module which are sequentially connected; the characteristic extraction module is used for acquiring high-pass information of an original image and then extracting image characteristics to obtain a characteristic diagram; the attention feature fusion module is used for fusing the feature map; and the image reconstruction module is used for reconstructing a high-spatial resolution image from the fused image.
Feature extraction module
The MS image is first up-sampled to the same image size as the PAN image. To obtain the high-pass information of each image, the application subtracts the low-pass information obtained with a mean (averaging) filter from the original image. Thereafter, two sub-networks extract features from the high-pass-filtered MS image and PAN image, respectively. The two sub-networks have similar structures but different weights: one sub-network takes the 4-band image as input, and the other takes the single-band image as input. Each sub-network contains three successive convolutional layers, each followed by a Rectified Linear Unit (ReLU).
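For illustration, a minimal sketch of the two feature-extraction branches described above follows; the channel width of 64 and the 3 × 3 kernel size are assumptions, since the application only specifies three convolutional layers each followed by a ReLU.

import torch.nn as nn

def make_branch(in_channels: int, width: int = 64) -> nn.Sequential:
    # Three successive convolutional layers, each followed by a ReLU (widths assumed).
    return nn.Sequential(
        nn.Conv2d(in_channels, width, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(width, width, kernel_size=3, padding=1), nn.ReLU(inplace=True),
    )

ms_branch = make_branch(in_channels=4)   # branch taking the 4-band high-pass MS image
pan_branch = make_branch(in_channels=1)  # branch taking the single-band high-pass PAN image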
Attention feature fusion module
After the feature extraction module, two feature maps are obtained that explicitly represent the spatial information of the MS image and of the PAN image, respectively. In order to fully utilize the spatial information of the MS and PAN images, the extracted feature maps must be fused. However, conventional deep learning methods fuse feature maps by directly stacking them, which only provides a fixed linear aggregation of the feature maps, does not consider the relationships existing between different feature maps, and gives no indication of whether such a combination is suitable for a specific object. To this end, the present application replaces the channel-stacking method used in prior methods with Attention Feature Fusion (AFF) [14], whose structure is shown in Fig. 2. The AFF can be expressed as:
Z = M(X1 ⊎ X2) ⊗ X1 + (1 − M(X1 ⊎ X2)) ⊗ X2

where X1 and X2 denote the two input features, Z ∈ R^(C×H×W) denotes the fused feature, M(X1 ⊎ X2) denotes the weights produced by the channel attention module M and consists of real numbers between 0 and 1, 1 − M(X1 ⊎ X2) corresponds to the dashed line in Fig. 2 and likewise consists of real numbers between 0 and 1, ⊎ denotes broadcast addition, and ⊗ denotes element-wise multiplication.
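A minimal sketch of this fusion rule is given below; the channel attention M is implemented here as a simple global-average-pooling attention with a reduction ratio of 4, which is a simplification of, and an assumed stand-in for, the multi-scale channel attention of the cited AFF work [14].

import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    # Z = M(X1 + X2) * X1 + (1 - M(X1 + X2)) * X2, with a simple channel-attention M.
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.attention = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                                   # global context
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),                                              # weights in (0, 1)
        )

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        w = self.attention(x1 + x2)      # broadcast addition, then attention weights
        return w * x1 + (1.0 - w) * x2   # element-wise weighted fusion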
Image reconstruction module
The implementation of the two preceding modules completes the fusion of the spatial information of the MS and PAN images; an image with high spatial resolution must then be reconstructed from the fused features. The application first down-samples the features. It does not use the max pooling or average pooling adopted by most convolutional neural networks to obtain scale- and rotation-invariant features, because detail information is very important in pansharpening; therefore, throughout the network, down-sampling is performed with convolution kernels of stride 2 rather than with simple pooling operations. After two down-samplings, feature maps at two different scales are obtained, with spatial sizes of 1/2 × 1/2 and 1/4 × 1/4 of the input features, respectively. Two deconvolutions are then used for up-sampling, progressively generating feature maps at 1/2 × 1/2 and 1 × 1 of the input scale.
Since the features extracted by the deep convolutional layers map the semantic and abstract information of the image, it is difficult to recover the detailed texture of the image from them alone. To restore realistic details, inspired by U-Net [39], the feature maps generated before each down-sampling are copied and concatenated with the corresponding feature maps after up-sampling, injecting the detail information lost during down-sampling. The last layer outputs the required high-resolution MS image. The detailed structure of the deep learning model used in the application is shown in Fig. 3.
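A minimal sketch of such a reconstruction sub-network follows; the channel widths (64/128/256) are illustrative assumptions, while the stride-2 convolutions, transposed-convolution up-sampling and skip concatenations follow the description above.

import torch
import torch.nn as nn

class ReconstructionNet(nn.Module):
    # U-Net-style reconstruction: stride-2 convolutions for down-sampling,
    # transposed convolutions for up-sampling, skip concatenations for detail recovery.
    def __init__(self, in_channels: int = 64, out_bands: int = 4):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_channels, 128, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.down2 = nn.Sequential(nn.Conv2d(128, 256, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.up1 = nn.Sequential(nn.ConvTranspose2d(256, 128, 2, stride=2), nn.ReLU(inplace=True))
        self.up2 = nn.Sequential(nn.ConvTranspose2d(256, in_channels, 2, stride=2), nn.ReLU(inplace=True))
        self.out = nn.Conv2d(2 * in_channels, out_bands, 3, padding=1)

    def forward(self, fused: torch.Tensor) -> torch.Tensor:
        d1 = self.down1(fused)                          # 1/2 x 1/2 scale
        d2 = self.down2(d1)                             # 1/4 x 1/4 scale
        u1 = self.up1(d2)                               # back to 1/2 x 1/2
        u2 = self.up2(torch.cat([u1, d1], dim=1))       # skip connection from d1
        return self.out(torch.cat([u2, fused], dim=1))  # skip connection from the fused input

The long jump connection described next would then add the up-sampled MS image and the PAN image to this output to restore the spectral information.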
The pansharpening task is to obtain a multispectral image with both high spatial resolution and high spectral resolution. Conventional methods usually extract the spectral information of the MS image with some feature extraction method, but this causes loss of spectral information in the MS image and ignores the spectral information that may exist in the PAN image. The application therefore uses a long jump connection to transmit the input MS and PAN images directly to the output of the spatial reconstruction, where they are fused with the image whose spatial information has been reconstructed.
Furthermore, the image reconstruction module comprises a long jump connection submodule, and the long jump connection submodule is used for transmitting the image spectrum information to the space reconstruction module and then fusing the image spectrum information with the image with the reconstructed space information.
Further, the system is trained with an ℓ1 loss function, the ℓ1 loss function being

L(θ) = (1/N) Σ_{i=1}^{N} || f_θ(X_PAN^(i), X_MS^(i)) − Y^(i) ||_1

where N is the number of training samples in the mini-batch, X_PAN^(i) and X_MS^(i) are the PAN image and the low-resolution MS image, Y^(i) is the corresponding high-resolution MS image, and θ denotes the parameters of the Attention_FPNet network.
Loss function
In addition to the network structure, the loss function is another important factor affecting the quality of the reconstructed image. Previous image reconstruction tasks have used the ℓ2 norm as the loss function, but the generated images tend to be blurred. Therefore, this application uses the ℓ1 norm as the loss function to train the network.
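A minimal sketch of this ℓ1 training objective is given below; the callable attention_fpnet(pan, ms_lr) is a hypothetical stand-in for the network described above, and F.l1_loss computes the per-element mean absolute error, a common normalization of the ℓ1 objective.

import torch
import torch.nn.functional as F

def l1_training_loss(attention_fpnet, pan: torch.Tensor, ms_lr: torch.Tensor,
                     y_reference: torch.Tensor) -> torch.Tensor:
    # Mean absolute error between the fused output and the reference high-resolution MS image.
    prediction = attention_fpnet(pan, ms_lr)
    return F.l1_loss(prediction, y_reference)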
Further, the attention feature fusion module computes

Z = M(X1 ⊎ X2) ⊗ X1 + (1 − M(X1 ⊎ X2)) ⊗ X2

where X1 and X2 denote the two input features, Z ∈ R^(C×H×W) denotes the fused feature, M(X1 ⊎ X2) denotes the weights produced by the channel attention module M and consists of real numbers between 0 and 1, 1 − M(X1 ⊎ X2) corresponds to the dashed line in Fig. 2 and likewise consists of real numbers between 0 and 1, ⊎ denotes broadcast addition, and ⊗ denotes element-wise multiplication.
The application also provides an application of the image fusion method: the image fusion method is applied to the remote sensing image pansharpening problem.
Precision inspection and evaluation
The present application compares the method used with several widely used techniques, including: PCA, IHS, wavelet, MTF _ GLP _ HPM, GSA, CNMF, PNN, PanNet, ResTFNet.
Tables 1-3 show quantitative indicators on the three satellite datasets Pleiades, SPOT-6 and Gaofen-2. Figs. 4-6 show qualitative results on the three satellite datasets. From Tables 1-3, Attention_FPNet achieves the best performance on most indicators. On the Pleiades dataset in particular, it achieves the best performance on all indicators. On the SPOT-6 and Gaofen-2 datasets it achieves the best performance on all indicators except QNR, where it ranks 4th and 2nd, respectively.
As can be seen in Fig. 4, all methods except the Wavelet and PNN algorithms produce visually pleasing pansharpened images. Images generated by the Wavelet method show severe blurring and artifacts, and the PNN method also exhibits a blurring effect. The IHS method, while of good visual quality, shows significant spectral distortion. In Fig. 5, all methods except Wavelet and CNMF achieve a good visual effect: Wavelet again shows severe blurring and artifacts, while CNMF loses a considerable amount of spatial detail. On the Pleiades dataset, as in Fig. 6, the Wavelet and PNN methods again appear blurred. The Attention_FPNet algorithm of the application does better in terms of spectral preservation and also produces richer spatial details.
Table 1 quantitative evaluation on SPOT-6 dataset. And (4) sorting according to results, wherein the first four names are respectively marked as (1), (2), (3) and (4).
Table 2 quantitative evaluation on Pleiades dataset. And (4) sorting according to results, wherein the first four names are respectively marked as (1), (2), (3) and (4).
Table 3 quantitative evaluation on Gaofen-2 dataset. And (4) sorting according to results, wherein the first four names are respectively marked as (1), (2), (3) and (4).
Accuracy evaluation conclusion: based on the above experimental analysis, the method of the application is superior to the other commonly used methods in both spectral and spatial indicators and in visual effect on the three satellite datasets. This shows that the method of the application is effective for solving the remote sensing image pansharpening problem.
Experiments were carried out on three satellite datasets, Pleiades, SPOT-6 and Gaofen-2, and the results show that the Attention_FPNet used in the application is superior to other existing common techniques in spectral and spatial information reconstruction. The experiments demonstrate that the dual-branch fusion network based on attention feature fusion is feasible for the pansharpening task.
Eight widely used indicators were used to quantitatively evaluate the performance of the proposed method and the comparative method.
The peak signal-to-noise ratio (PSNR), based on the mean square error (MSE), reflects the quality of the fused reconstructed image through the ratio of the maximum peak value of the reconstructed image to the mean square error between the two images. PSNR is defined as

PSNR = 10 · log10(MAX_I^2 / MSE)

where MAX_I is the maximum possible pixel value of the image. The higher the PSNR between two images, the less the reconstructed image is distorted with respect to the high-resolution image. MSE is defined as

MSE = (1/(m·n)) Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} [I(i, j) − K(i, j)]^2

where I and K are two images of size m × n, one of which is a noisy approximation of the other.
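For illustration, a minimal NumPy sketch of PSNR as defined above (the maximum pixel value of 255 is an assumption for 8-bit data):

import numpy as np

def psnr(reference: np.ndarray, fused: np.ndarray, max_value: float = 255.0) -> float:
    # PSNR = 10 * log10(MAX^2 / MSE).
    mse = np.mean((reference.astype(np.float64) - fused.astype(np.float64)) ** 2)
    return float(10.0 * np.log10(max_value ** 2 / mse))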
Structural similarity (SSIM) measures the overall fusion quality by computing the mean, variance and covariance of the fused image and the reference image. The SSIM measure consists of three comparison modules: luminance, contrast and structure. Given two images X and Y of size M × N, let their means, variances and covariance be μ_X, μ_Y, σ_X^2, σ_Y^2 and σ_XY. The luminance, contrast and structure comparison functions are defined respectively as

l(X, Y) = (2 μ_X μ_Y + C1) / (μ_X^2 + μ_Y^2 + C1)
c(X, Y) = (2 σ_X σ_Y + C2) / (σ_X^2 + σ_Y^2 + C2)
s(X, Y) = (σ_XY + C3) / (σ_X σ_Y + C3)

The three components are combined into the SSIM index, defined as

SSIM(X, Y) = [l(X, Y)]^α [c(X, Y)]^β [s(X, Y)]^γ (6)

The closer the SSIM value is to 1, the higher the similarity between the two images.
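A minimal sketch of this global-statistics SSIM follows; the stabilizing constants C1 = (0.01L)^2, C2 = (0.03L)^2 and C3 = C2/2 and the exponents α = β = γ = 1 are common choices that the application does not specify, and are therefore assumptions.

import numpy as np

def ssim_global(x: np.ndarray, y: np.ndarray, data_range: float = 255.0) -> float:
    # Global (whole-image) SSIM from the luminance, contrast and structure terms above.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    c3 = c2 / 2.0
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    sigma_x, sigma_y = x.std(), y.std()
    sigma_xy = ((x - mu_x) * (y - mu_y)).mean()
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)
    structure = (sigma_xy + c3) / (sigma_x * sigma_y + c3)
    return float(luminance * contrast * structure)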
The global integrated error index (ERGAS) mainly evaluates the spectral quality of all fused bands within the spectral range, taking the overall spectral variation into account. It is defined as

ERGAS = 100 · (h/l) · sqrt( (1/N) Σ_{i=1}^{N} (RMSE(B_i) / M_i)^2 )

where h is the resolution of the high-resolution image, l is the resolution of the low-resolution image, N is the number of bands, B_i is the i-th band of the fused multispectral image, and M_i is the mean radiance of the i-th band of the multispectral image. The smaller the value, the better the spectral quality of the fused image over the spectral range.
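A minimal NumPy sketch of ERGAS for band-first arrays follows; the ratio argument corresponds to the h/l term of the formula above, and its value depends on the sensor pair, so it is left to the caller rather than fixed here.

import numpy as np

def ergas(reference: np.ndarray, fused: np.ndarray, ratio: float) -> float:
    # reference, fused: arrays shaped (bands, H, W); ratio: the h/l term of the formula above.
    reference = reference.astype(np.float64)
    fused = fused.astype(np.float64)
    bands = reference.shape[0]
    total = 0.0
    for b in range(bands):
        rmse = np.sqrt(np.mean((reference[b] - fused[b]) ** 2))
        total += (rmse / reference[b].mean()) ** 2
    return float(100.0 * ratio * np.sqrt(total / bands))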
Spectral angle mapping (SAM) evaluates the spectral quality by computing the angle between corresponding pixels of the fused image and the reference image. It is defined as

SAM(I_a, J_a) = arccos( ⟨I_a, J_a⟩ / (‖I_a‖ · ‖J_a‖) )

where I_a and J_a are the pixel (spectral) vectors of the fused image and the reference image at pixel a. For an ideal fused image, the SAM value should be 0.
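For illustration, a minimal sketch of SAM averaged over all pixels (in radians), for band-first arrays:

import numpy as np

def sam(reference: np.ndarray, fused: np.ndarray) -> float:
    # Mean spectral angle between per-pixel spectral vectors; inputs shaped (bands, H, W).
    ref = reference.reshape(reference.shape[0], -1).astype(np.float64)
    fus = fused.reshape(fused.shape[0], -1).astype(np.float64)
    dot = np.sum(ref * fus, axis=0)
    norms = np.linalg.norm(ref, axis=0) * np.linalg.norm(fus, axis=0)
    angles = np.arccos(np.clip(dot / (norms + 1e-12), -1.0, 1.0))
    return float(np.mean(angles))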
The spatial correlation coefficient (SCC) estimates the similarity of the spatial details of the fused image and the reference image: high-frequency information is extracted from both images with a high-pass filter, and the correlation coefficient (CC) between the high-frequency components is computed [48]. Here a high-boost Laplacian filter

F = [ −1 −1 −1 ; −1 8 −1 ; −1 −1 −1 ]

is used to obtain the high-frequency information. The higher the sCC, the more spatial information from the PAN image is injected during the fusion process. The sCC is computed between the fused image and the reference image, and the final sCC is the average over all bands of the MS image. The correlation coefficient is calculated as

CC(X, Y) = Σ_{i=1}^{w} Σ_{j=1}^{h} (X_{ij} − μ_X)(Y_{ij} − μ_Y) / sqrt( Σ_{i=1}^{w} Σ_{j=1}^{h} (X_{ij} − μ_X)^2 · Σ_{i=1}^{w} Σ_{j=1}^{h} (Y_{ij} − μ_Y)^2 )

where X is the fused image, Y is the reference image, w and h are the width and height of the images, and μ denotes the image mean.
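A minimal sketch of the sCC computation follows; scipy.ndimage.convolve is used here for the Laplacian high-pass filtering, which is an implementation choice rather than part of the application.

import numpy as np
from scipy.ndimage import convolve

LAPLACIAN = np.array([[-1.0, -1.0, -1.0],
                      [-1.0,  8.0, -1.0],
                      [-1.0, -1.0, -1.0]])

def correlation_coefficient(x: np.ndarray, y: np.ndarray) -> float:
    # Pearson correlation coefficient between two equally sized images.
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.sum(xc * yc) / (np.sqrt(np.sum(xc ** 2) * np.sum(yc ** 2)) + 1e-12))

def scc(reference: np.ndarray, fused: np.ndarray) -> float:
    # CC between Laplacian high-pass responses, averaged over all bands; inputs shaped (bands, H, W).
    values = []
    for b in range(reference.shape[0]):
        hp_ref = convolve(reference[b].astype(np.float64), LAPLACIAN)
        hp_fus = convolve(fused[b].astype(np.float64), LAPLACIAN)
        values.append(correlation_coefficient(hp_ref, hp_fus))
    return float(np.mean(values))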
The index Q measures image distortion as a combination of three factors: loss of correlation, luminance distortion and contrast distortion. It is defined as

Q = (σ_{Z1 Z2} / (σ_{Z1} σ_{Z2})) · (2 μ_{Z1} μ_{Z2} / (μ_{Z1}^2 + μ_{Z2}^2)) · (2 σ_{Z1} σ_{Z2} / (σ_{Z1}^2 + σ_{Z2}^2))

where Z1 and Z2 denote the b-th band of the fused image and of the reference image, respectively. A Q value of 1 indicates the best fidelity with respect to the reference.
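A minimal per-band sketch of the Q index defined above:

import numpy as np

def q_index(z1: np.ndarray, z2: np.ndarray) -> float:
    # Product of the correlation, luminance and contrast terms for one band pair.
    z1 = z1.astype(np.float64)
    z2 = z2.astype(np.float64)
    mu1, mu2 = z1.mean(), z2.mean()
    var1, var2 = z1.var(), z2.var()
    cov = ((z1 - mu1) * (z2 - mu2)).mean()
    return float(4.0 * cov * mu1 * mu2 / ((var1 + var2) * (mu1 ** 2 + mu2 ** 2) + 1e-12))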
QNR is a no-reference image quality evaluation method composed of a spectral distortion index D_λ and a spatial distortion index D_S. Let the L-band low-resolution MS image be I_LRMS, the generated high-resolution MS image be I_HRMS, the single-band PAN image be I_PAN, and its degraded counterpart be I_LPAN. Then

D_λ = ( (1/(L(L−1))) Σ_{l=1}^{L} Σ_{r=1, r≠l}^{L} | Q(I_HRMS,l, I_HRMS,r) − Q(I_LRMS,l, I_LRMS,r) |^p )^{1/p}

D_S = ( (1/L) Σ_{l=1}^{L} | Q(I_HRMS,l, I_PAN) − Q(I_LRMS,l, I_LPAN) |^q )^{1/q}

QNR = (1 − D_λ)^α (1 − D_S)^β

The ideal value of QNR is 1, indicating better quality of the fused image.
Because existing deep learning methods perform feature fusion by simple channel stacking, with no indication of whether the combination is suitable for a specific object, the application uses the attention feature fusion method, which fully considers the relationships among different feature maps and improves the fusion quality.
Although the present application has been described above with reference to specific embodiments, those skilled in the art will recognize that many changes may be made in the configuration and details of the present application within the principles and scope of the present application. The scope of protection of the application is determined by the appended claims, and all changes that come within the meaning and range of equivalency of the technical features are intended to be embraced therein.

Claims (10)

1. An image fusion method, characterized by: the method comprises the following steps:
Step 1: extracting first high-pass information of the multispectral image to obtain a first multispectral image, and extracting second high-pass information of the full-color image to obtain a first full-color image;
Step 2: extracting first spatial information of the first multispectral image and extracting second spatial information of the first panchromatic image;
Step 3: fusing the first spatial information and the second spatial information to obtain spatial features;
Step 4: reconstructing the spatial features to obtain a high spatial resolution image, and simultaneously directly transmitting the multispectral image and the panchromatic image to the high resolution image after the spatial features are reconstructed, thereby improving the spectral resolution of the fused image.
2. The image fusion method of claim 1, characterized in that: the extracting of the first high-pass information of the multispectral image comprises up-sampling the input multispectral image to make the multispectral image and the panchromatic image have the same size, and then extracting the first high-pass information of the up-sampled multispectral image by adopting high-pass filtering; the extracting the second high-pass information of the full-color image includes extracting the second high-pass information of the full-color image using high-pass filtering.
3. The image fusion method of claim 2, characterized in that: the first high-pass information is obtained by extracting first low-pass information of the up-sampled multispectral image using mean filtering and then subtracting the first low-pass information from the up-sampled multispectral image; the second high-pass information is obtained by extracting second low-pass information of the panchromatic image using mean filtering and then subtracting the second low-pass information from the panchromatic image.
4. The image fusion method of claim 1, characterized in that: the first spatial information is extracted by adopting a convolutional neural network, and the second spatial information is extracted by adopting the convolutional neural network.
5. The image fusion method of claim 1, characterized in that: the reconstructing the spatial feature comprises reconstructing the spatial feature by adopting a U-Net network; and transmitting the up-sampling multispectral image and the panchromatic image to a spatial reconstruction image through spectral mapping by adopting long jump connection to obtain an image with high spatial resolution and high spectral resolution.
6. An image fusion system, characterized by: the system comprises a feature extraction module, an attention feature fusion module and an image reconstruction module which are sequentially connected;
the characteristic extraction module is used for acquiring high-pass information of an original image and then extracting image characteristics to obtain a characteristic diagram;
the attention feature fusion module is used for fusing the feature map;
and the image reconstruction module is used for reconstructing a high-spatial resolution image from the fused image.
7. The image fusion system of claim 6, wherein: the image reconstruction module comprises a long jump connection submodule, and the long jump connection submodule is used for transmitting the image spectrum information to the space for reconstruction and then fusing the image spectrum information with the image with the reconstructed space information.
8. The image fusion system of claim 6, wherein: the system is trained with an ℓ1 loss function, the ℓ1 loss function being

L(θ) = (1/N) Σ_{i=1}^{N} || f_θ(X_PAN^(i), X_MS^(i)) − Y^(i) ||_1

where N is the number of training samples in the mini-batch, X_PAN^(i) and X_MS^(i) are the PAN image and the low-resolution MS image, Y^(i) is the corresponding high-resolution MS image, and θ denotes the parameters of the Attention_FPNet network.
9. The image fusion system of claim 6, wherein: the attention feature fusion module computes

Z = M(X1 ⊎ X2) ⊗ X1 + (1 − M(X1 ⊎ X2)) ⊗ X2

where X1 and X2 denote the two input features, Z ∈ R^(C×H×W) denotes the fused feature, M(X1 ⊎ X2) denotes the weights produced by the channel attention module M and consists of real numbers between 0 and 1, 1 − M(X1 ⊎ X2) corresponds to the dashed line in Fig. 2 and likewise consists of real numbers between 0 and 1, ⊎ denotes broadcast addition, and ⊗ denotes element-wise multiplication.
10. An application of an image fusion method, characterized in that: the image fusion method of any one of claims 1-5 is applied to the remote sensing image pansharpening problem.
CN202110567685.8A 2021-05-24 2021-05-24 Image fusion method, system and application thereof Active CN113191325B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110567685.8A CN113191325B (en) 2021-05-24 2021-05-24 Image fusion method, system and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110567685.8A CN113191325B (en) 2021-05-24 2021-05-24 Image fusion method, system and application thereof

Publications (2)

Publication Number Publication Date
CN113191325A true CN113191325A (en) 2021-07-30
CN113191325B CN113191325B (en) 2023-12-12

Family

ID=76985682

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110567685.8A Active CN113191325B (en) 2021-05-24 2021-05-24 Image fusion method, system and application thereof

Country Status (1)

Country Link
CN (1) CN113191325B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537247A (en) * 2021-08-13 2021-10-22 重庆大学 Data enhancement method for converter transformer vibration signal
CN114429424A (en) * 2022-04-01 2022-05-03 中国石油大学(华东) Remote sensing image super-resolution reconstruction method applicable to uncertain degradation mode
CN114511470A (en) * 2022-04-06 2022-05-17 中国科学院深圳先进技术研究院 Attention mechanism-based double-branch panchromatic sharpening method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106508048B (en) * 2011-12-05 2014-08-27 中国科学院自动化研究所 A kind of similar scale image interfusion method based on multiple dimensioned primitive form
CN109886870A (en) * 2018-12-29 2019-06-14 西北大学 Remote sensing image fusion method based on binary channels neural network
US20190287216A1 (en) * 2018-03-19 2019-09-19 Mitsubishi Electric Research Laboratories, Inc. Systems and Methods for Multi-Spectral Image Super-Resolution
CN111539900A (en) * 2020-04-24 2020-08-14 河南大学 IHS remote sensing image fusion method based on guided filtering
KR102160687B1 (en) * 2019-05-21 2020-09-29 인천대학교 산학협력단 Aviation image fusion method
CN112465733A (en) * 2020-08-31 2021-03-09 长沙理工大学 Remote sensing image fusion method, device, medium and equipment based on semi-supervised learning
CN112488978A (en) * 2021-02-05 2021-03-12 湖南大学 Multi-spectral image fusion imaging method and system based on fuzzy kernel estimation
CN112819737A (en) * 2021-01-13 2021-05-18 西北大学 Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution
CN114511470A (en) * 2022-04-06 2022-05-17 中国科学院深圳先进技术研究院 Attention mechanism-based double-branch panchromatic sharpening method

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106508048B (en) * 2011-12-05 2014-08-27 中国科学院自动化研究所 A kind of similar scale image interfusion method based on multiple dimensioned primitive form
US20190287216A1 (en) * 2018-03-19 2019-09-19 Mitsubishi Electric Research Laboratories, Inc. Systems and Methods for Multi-Spectral Image Super-Resolution
CN109886870A (en) * 2018-12-29 2019-06-14 西北大学 Remote sensing image fusion method based on binary channels neural network
KR102160687B1 (en) * 2019-05-21 2020-09-29 인천대학교 산학협력단 Aviation image fusion method
CN111539900A (en) * 2020-04-24 2020-08-14 河南大学 IHS remote sensing image fusion method based on guided filtering
CN112465733A (en) * 2020-08-31 2021-03-09 长沙理工大学 Remote sensing image fusion method, device, medium and equipment based on semi-supervised learning
CN112819737A (en) * 2021-01-13 2021-05-18 西北大学 Remote sensing image fusion method of multi-scale attention depth convolution network based on 3D convolution
CN112488978A (en) * 2021-02-05 2021-03-12 湖南大学 Multi-spectral image fusion imaging method and system based on fuzzy kernel estimation
CN114511470A (en) * 2022-04-06 2022-05-17 中国科学院深圳先进技术研究院 Attention mechanism-based double-branch panchromatic sharpening method

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
JUNFENG YANG 等: "PanNet: A deep network architecture for pan-sharpening", 《THE IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION》, pages 5449 - 5457 *
XIWU ZHONG 等: "Attention_FPNet: Two-Branch Remote Sensing Image Pansharpening Network Based on Attention Feature Fusion", 《IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING》, vol. 14, pages 11879 - 11891, XP011891498, DOI: 10.1109/JSTARS.2021.3126645 *
ZHU Chao: "Research on Fusion Algorithms for Multispectral and Panchromatic Remote Sensing Images", China Master's Theses Full-text Database, Information Science and Technology, No. 11, pages 028-158 *
XIAO Liang; LIU Pengfei; LI Heng: "Progress and Challenges of Multi-source Spatial-Spectral Remote Sensing Image Fusion Methods", Journal of Image and Graphics, No. 05, pages 5-17 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113537247A (en) * 2021-08-13 2021-10-22 重庆大学 Data enhancement method for converter transformer vibration signal
CN114429424A (en) * 2022-04-01 2022-05-03 中国石油大学(华东) Remote sensing image super-resolution reconstruction method applicable to uncertain degradation mode
CN114511470A (en) * 2022-04-06 2022-05-17 中国科学院深圳先进技术研究院 Attention mechanism-based double-branch panchromatic sharpening method
CN114511470B (en) * 2022-04-06 2022-07-08 中国科学院深圳先进技术研究院 Attention mechanism-based double-branch panchromatic sharpening method

Also Published As

Publication number Publication date
CN113191325B (en) 2023-12-12

Similar Documents

Publication Publication Date Title
Blum et al. An Overview of Image Fusion
CN110533620B (en) Hyperspectral and full-color image fusion method based on AAE extraction spatial features
CN112507997B (en) Face super-resolution system based on multi-scale convolution and receptive field feature fusion
Xie et al. Hyperspectral image super-resolution using deep feature matrix factorization
CN113191325B (en) Image fusion method, system and application thereof
CN111127374B Pan-sharpening method based on multi-scale dense network
Zhang et al. One-two-one networks for compression artifacts reduction in remote sensing
CN110415199B (en) Multispectral remote sensing image fusion method and device based on residual learning
CN110544212B (en) Convolutional neural network hyperspectral image sharpening method based on hierarchical feature fusion
Patel et al. Super-resolution of hyperspectral images: Use of optimum wavelet filter coefficients and sparsity regularization
CN111696043A (en) Hyperspectral image super-resolution reconstruction algorithm of three-dimensional FSRCNN
CN112785480B (en) Image splicing tampering detection method based on frequency domain transformation and residual error feedback module
CN113763299A (en) Panchromatic and multispectral image fusion method and device and application thereof
CN114266957A (en) Hyperspectral image super-resolution restoration method based on multi-degradation mode data augmentation
Fan et al. Global sensing and measurements reuse for image compressed sensing
CN108335265B (en) Rapid image super-resolution reconstruction method and device based on sample learning
CN114511470B (en) Attention mechanism-based double-branch panchromatic sharpening method
CN115100075A (en) Hyperspectral panchromatic sharpening method based on spectral constraint and residual error attention network
CN115861749A (en) Remote sensing image fusion method based on window cross attention
Daithankar et al. Analysis of the wavelet domain filtering approach for video super-resolution
CN109785253B (en) Panchromatic sharpening post-processing method based on enhanced back projection
CN114638761A (en) Hyperspectral image panchromatic sharpening method, device and medium
CN115131258A (en) Hyperspectral, multispectral and panchromatic image fusion method based on sparse tensor prior
Li et al. Pansharpening via subpixel convolutional residual network
CN113284067A (en) Hyperspectral panchromatic sharpening method based on depth detail injection network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant