CN116563187A - Multispectral image fusion based on graph neural network - Google Patents
Multispectral image fusion based on graph neural network
- Publication number
- CN116563187A (Application CN202310573695.1A)
- Authority
- CN
- China
- Prior art keywords
- image
- multispectral
- graph
- features
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
- G06N3/042—Knowledge-based neural networks; Logical representations of neural networks
- G06N3/0442—Recurrent networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/048—Activation functions
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
- G06V10/40—Extraction of image or video features
- G06V10/82—Image or video recognition or understanding using pattern recognition or machine learning using neural networks
- G06T2207/10032—Satellite or aerial image; Remote sensing
- G06T2207/10041—Panchromatic image
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/20221—Image fusion; Image merging
- Y02A40/10—Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture
Abstract
The invention belongs to the field of image fusion and discloses a multispectral image fusion method based on a graph neural network, which comprises the following steps. A multispectral image and a panchromatic image are first acquired. Pixel features are extracted from the multispectral image with a convolutional network; after dimension reduction and feature extraction, the graph structures of the multispectral image in three dimensions are extracted by graph embedding and fused to obtain a heterogeneous graph of the multispectral image, and space-time graph convolution is applied to the heterogeneous graph to extract the spatial features of the graph data. The acquired pixel features and spatial features are then aggregated through a gating mechanism, which outputs feature weights; the weights are used to obtain a multispectral feature map that finally fuses the spatial and pixel features. The feature map of the panchromatic image, obtained through the same convolutional network, is fused with this multispectral feature map through an attention mechanism to obtain the fused multispectral image, which has higher resolution.
Description
Technical Field
The invention relates to the field of image fusion, in particular to a multispectral image fusion method based on a graph neural network.
Background
Image fusion technology combines and processes image data of the same scene acquired by different sensors to generate a new, high-quality image. With the rapid development of satellite sensor technology, multispectral images are widely used in fields such as military systems and environmental analysis. However, owing to the limitations of satellite sensors, only panchromatic images with high spatial resolution but low spectral resolution, or multispectral images with rich spectral information but low spatial resolution, can be acquired. To obtain multispectral images with high spatial resolution, remote sensing image fusion, which fuses multispectral and panchromatic images, has become a research hotspot.
Existing remote sensing image fusion techniques fall mainly into four categories: component substitution methods, multi-resolution analysis methods, model-based methods, and deep-learning-based methods. Component substitution decomposes a multispectral image into several components and replaces some of them with the spatial component of a panchromatic image; because the component separation is imperfect, however, some spectral information of the multispectral image may be lost. Multi-resolution analysis injects the high-frequency information of the panchromatic image into the multispectral image in a transform domain; it preserves spectral information better but sometimes introduces spatial distortion. Model-based methods achieve fusion by building an optimization model with prior constraints, but their high computational cost and the difficulty of choosing optimal manual parameters limit their practical use.
Although convolutional neural networks are now used to learn the spectral and spatial details of remote sensing images and achieve good fusion performance, they do not consider the correlation between the multispectral and panchromatic feature maps and ignore possible complementary features between the two modalities; the feature interaction between the multispectral and panchromatic images is weak, the extracted features lack precision, and the quality of the resulting image is low. Designing a network that explores the cross-modal correlation between panchromatic and multispectral images, better transfers the spatial texture details of the panchromatic image into the multispectral image, and obtains multispectral images with rich texture information and minimal spectral distortion is therefore an important problem.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art and provide a multispectral image fusion method based on a graph neural network; the method achieves effective fusion of multispectral and panchromatic images, with strong interaction between the fused features and high image quality.
The technical scheme for solving the technical problems is as follows:
The multispectral image fusion method based on the graph neural network comprises the following steps:
(S1) acquiring a multispectral image and a corresponding panchromatic image over a period of time;
(S2) extracting pixel features from the multispectral image and the panchromatic image using a shared encoder network;
(S3) carrying out dimension reduction and feature extraction on the multispectral image, then extracting and fusing the graph structures of the multispectral image in three dimensions by means of graph embedding to obtain a heterogeneous graph of the multi-source features;
(S4) performing feature extraction on the obtained heterogeneous graph using space-time graph convolution to obtain the spatial features of the graph data;
(S5) aggregating the acquired pixel features and spatial features through a gating mechanism, outputting feature weights, and using the weights to obtain a multispectral feature map that finally fuses the spatial features and the pixel features;
(S6) fusing the obtained feature map of the panchromatic image with the multispectral feature map, in which the spatial and pixel features are fused, through an attention mechanism;
(S7) passing the fused feature map through a decoder to obtain the fused multispectral image.
Preferably, in step (S1), the multispectral camera is an imaging camera capable of collecting 3 or more spectral bands simultaneously.
Preferably, in step (S2), the encoder network structure has two branches. The upper network extracts the shallow features of the image and consists of 4 convolution layers with 3×3 kernels, each layer except the last followed by a ReLU activation function. The lower network extracts the deep features of the image: the input first passes through a 1×1 convolution layer and then 4 convolution layers with 3×3 kernels; this convolution module adopts a Nest (nested dense) connection mode, which retains more information and yields the deep features. Finally, the feature maps obtained by the upper and lower networks are concatenated.
Preferably, in step (S3), the three feature graphs of the graph structure are obtained as follows: a physical feature graph of the spectral data is extracted from the dimension-reduced spectral data combined with the infrared spectral features; superpixel neighbor-node information is determined by a linear iterative clustering method, the edge connections between nodes are constructed according to the spatial connectivity of the superpixels, and a spatial feature graph is extracted; and, combining the spectral-feature similarity of the target, sampling and recombining across different spectral-band dimensions yields the spectral feature distribution of the target, with the GNN effectively representing the spectral data residing on a smooth manifold.
Preferably, in step (S3), the heterogeneous graph is obtained by connecting the three obtained feature graphs, whose node types differ, with a graph autoencoder, and a self-attention-based graph pooling method yields the heterogeneous graph fusing the multi-source features; the autoencoder includes but is not limited to a graph convolutional autoencoder, a variational graph convolutional autoencoder, and an adversarially regularized graph autoencoder.
Preferably, in step (S4), the time dimension and the spatial dimension of the space-time graph convolution are extracted by different methods: the network extracting the time dimension includes but is not limited to an RNN, GRU, LSTM, TCN, or Transformer, and the network extracting the spatial dimension includes but is not limited to a GCN, GAT, or a GCN combined with a GAT.
Preferably, in step (S5), the fused feature map is obtained by first aggregating the two feature maps using two fully connected networks connected to each other; the aggregated features then pass through an activation function that limits the value to between 0 and 1, where the value expresses how much information may pass through the gate (0 means no information is allowed through, 1 means all information is allowed through). The gate value gives the weight of the output features, and multiplying this weight with the pixel features yields the feature map that finally fuses the space-time features and the pixel features.
Preferably, in step (S6), the attention mechanism is a combination of a spatial attention mechanism and a channel attention mechanism, and the pixel features of the panchromatic image are fused with the multispectral image features in which the spatial and pixel features are fused.
preferably, in the step (S7), the decoder performs upsampling by 4 DB modules, where each DB module is composed of a 3×3 convolution and a 1×1 convolution, each DB module adopts a dense connection mode, and finally outputs a fused multispectral image by using 2 3×3 convolution layers.
Compared with the prior art, the invention provides a multispectral image fusion method based on a graph neural network, which has the following beneficial effects:
1. Multispectral images contain rich spectral information, while panchromatic images provide higher spatial resolution. The graph-neural-network fusion method makes full use of both, effectively combining the information of the multispectral and panchromatic images; the advantages of the two are thus fully exploited, improving the quality of the image and its detail-restoration capability.
2. The graph neural network learns on graph-structured data and has good global feature-learning capability. Fusing the multispectral and panchromatic images through the graph neural network therefore preserves the spectral information and spatial consistency, which is important for tasks that require maintaining the consistency of object boundaries and colors in the image.
3. Multispectral and panchromatic images often contain a large amount of redundant information, particularly in the spectral and spatial dimensions. The graph-neural-network fusion method can effectively reduce this redundancy and extract the most representative and information-rich features, improving the efficiency of image processing and analysis and reducing the cost of data storage and transmission.
Drawings
Fig. 1 is a flow chart of the multispectral image fusion method based on the graph neural network.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
Referring to fig. 1, the multispectral image fusion method based on the graph neural network comprises the following steps:
(S1) acquiring a multispectral image and a corresponding panchromatic image over a period of time;
(S2) extracting pixel features from the multispectral image and the panchromatic image using a shared encoder network;
(S3) carrying out dimension reduction and feature extraction on the multispectral image, then extracting and fusing the graph structures of the multispectral image in three dimensions by means of graph embedding to obtain a heterogeneous graph of the multi-source features;
(S4) performing feature extraction on the obtained heterogeneous graph using space-time graph convolution to obtain the spatial features of the graph data;
(S5) aggregating the acquired pixel features and spatial features through a gating mechanism, outputting feature weights, and using the weights to obtain a multispectral feature map that finally fuses the spatial features and the pixel features;
(S6) fusing the obtained feature map of the panchromatic image with the multispectral feature map, in which the spatial and pixel features are fused, through an attention mechanism;
(S7) passing the fused feature map through a decoder to obtain the fused multispectral image.
Referring to fig. 1, in step (S1), each acquired multispectral image is composed of three spectral bands; 1000 multispectral and panchromatic image pairs are acquired, with a pixel size of 256×256.
Referring to fig. 1, in step (S2), the encoder network consists of two branches. One is an upper network for extracting the shallow features of the image, composed of 4 convolution layers with 3×3 kernels and stride 1, each followed by a ReLU activation function. The other first passes through a convolution layer with a 1×1 kernel and then 4 convolution layers with 3×3 kernels and stride 1; this convolution module adopts a Nest connection mode, which retains more information and yields the deep features. Finally, the feature maps obtained by the upper and lower networks are concatenated.
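As a non-limiting illustration, the two-branch encoder of this embodiment can be sketched in PyTorch as follows; the channel widths and the exact form of the nested dense ("Nest") connections are not fixed by the text, so the values used here are assumptions.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Two-branch encoder: shallow 3x3 stack plus a nested-dense deep branch, then concat."""
    def __init__(self, in_ch=4, width=32):
        super().__init__()
        # Upper branch: four 3x3 convs, stride 1; ReLU after every layer except the last.
        self.shallow = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, width, 3, padding=1),
        )
        # Lower branch: a 1x1 conv, then four 3x3 convs; each conv sees the
        # concatenation of all earlier outputs (assumed DenseNet-style nesting).
        self.entry = nn.Conv2d(in_ch, width, 1)
        self.deep = nn.ModuleList(
            nn.Conv2d(width * (i + 1), width, 3, padding=1) for i in range(4)
        )

    def forward(self, x):
        s = self.shallow(x)
        feats = [self.entry(x)]
        for conv in self.deep:
            feats.append(torch.relu(conv(torch.cat(feats, dim=1))))
        return torch.cat([s, feats[-1]], dim=1)  # concat of shallow and deep features

# enc = SharedEncoder(); enc(torch.randn(1, 4, 256, 256)).shape -> (1, 64, 256, 256)
```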
Referring to fig. 1, in step (S3), the dimension-reduction and feature-extraction method fuses the spectral and spatial information using an augmented vector:

$$x = (u, v, b_1, b_2, \ldots, b_B) = (x_1, x_2, \ldots, x_{B+2})^T \qquad (1)$$

where $(u, v)$ are the coordinates of a pixel on the image and $(b_1, b_2, \ldots, b_B)$ is its array of band values.

The augmented vectors are used as training data and normalized. For any $x_i$, same-class classification is performed in a supervised manner, and a k-nearest-neighbour computation constructs a local neighborhood of the pixel; the similarity of the spatial and spectral information within the local neighborhood is classified, and the feature dimension is reduced through manifold learning. Local-neighborhood or neighborhood embedding of a spatial-spectral polynomial completes the weight assignment for the spectral-feature similarity of different pixels within the local neighborhood, and finally binary matrix multiplication is combined to establish a low-dimensional nonlinear explicit mapping of the multispectral data.
In this embodiment, images of 4 bands are acquired, so B = 4.
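A small sketch of the augmented-vector construction of equation (1); the (H, W, B) array layout is an assumption.

```python
import numpy as np

def augmented_vectors(ms_image: np.ndarray) -> np.ndarray:
    """Build x = (u, v, b_1, ..., b_B)^T for every pixel of an (H, W, B) multispectral image."""
    H, W, B = ms_image.shape  # in this embodiment B = 4
    u, v = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # One row per pixel, each of length B + 2: spatial coordinates plus band values.
    return np.concatenate(
        [u.reshape(-1, 1), v.reshape(-1, 1), ms_image.reshape(-1, B)], axis=1
    ).astype(np.float32)

# augmented_vectors(np.random.rand(256, 256, 4)).shape -> (65536, 6)
```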
Referring to fig. 1, in step (S3), the heterogeneous graph is obtained by connecting the three feature graphs, which have different node and edge types, through a network structure based on a graph autoencoder. Specifically: each given graph is analyzed, the node feature vectors of the different graphs are compared by cosine similarity, and the nodes with high similarity across the three graphs are retained. The three processed graphs are then passed through a graph convolutional network to obtain a representation $z_i$ of each node, and the following formula is used:

$$\hat{A}_{ij} = \sigma\left(z_i^{T} z_j\right) \qquad (2)$$

where $\hat{A}_{ij}$ is the predicted link probability between nodes $(i, j)$ and $\sigma$ is the Sigmoid activation function. Node pairs with probability greater than 0.8 are linked and pairs with probability smaller than 0.2 are left unconnected, which yields a new graph linking the three graphs.
In addition, the extracted new graph undergoes one graph neural convolution: the GCN learns a feature representation of each node $v \in V$ by aggregating the features of its neighbor nodes. For each node $v$, a self-attention mechanism computes an attention score $z$; the most important nodes are then selected with top-k, and the pooling ratio $k$ determines the number of retained nodes, where $k$ is set to 0.5. The attention-based mask graph obtained in this way is multiplied with the corresponding nodes of the originally input graph structure of fused heterogeneous information, giving the final output graph, i.e., the heterogeneous graph fusing the multi-source features.
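The link prediction of equation (2) and the self-attention top-k pooling can be sketched as follows with a dense adjacency matrix; the single GCN layer, the score projection, and the handling of node pairs whose probability falls between 0.2 and 0.8 (kept as in the original graph here) are assumptions.

```python
import torch

def gcn_layer(x, adj, weight):
    """One GCN propagation: D^{-1/2} (A + I) D^{-1/2} X W."""
    a = adj + torch.eye(adj.size(0))
    d = a.sum(1).rsqrt().diag()
    return d @ a @ d @ x @ weight

def link_new_graph(x, adj, weight):
    """Equation (2): A_hat = sigmoid(Z Z^T); link pairs > 0.8, drop pairs < 0.2."""
    z = torch.relu(gcn_layer(x, adj, weight))
    a_hat = torch.sigmoid(z @ z.t())
    linked = (a_hat > 0.8).float()
    undecided = ((a_hat >= 0.2) & (a_hat <= 0.8)).float() * adj  # keep existing edges (assumption)
    return torch.clamp(linked + undecided, max=1.0), z

def topk_pool(z, adj, score_proj, ratio=0.5):
    """Self-attention pooling: score nodes, keep the top ratio*N, mask kept features."""
    scores = torch.tanh(z @ score_proj).squeeze(-1)  # attention score per node
    k = max(1, int(ratio * z.size(0)))
    idx = scores.topk(k).indices
    return z[idx] * scores[idx].unsqueeze(-1), adj[idx][:, idx]

# x: (N, F) node features, adj: (N, N); W: (F, H), p: (H, 1)
# adj2, z = link_new_graph(x, adj, W); z_pooled, adj_pooled = topk_pool(z, adj2, p)
```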
Referring to fig. 1, in step (S4), the network used by the space-time graph convolution for the time dimension is a TC module, and the network for the spatial dimension is a GCN combined with a GAT. The TC module, which extracts the time dimension, consists of two dilated inception layers: one branch is processed by a tanh activation function and acts as a filter on the input, while the other branch is processed by a Sigmoid activation function and controls how much information the filter may pass to the next module. After the time module, spatial features are extracted by a GCN layer, information is then transferred between nodes through a GAT graph-attention layer to capture the dependencies between nodes, and the space-time features are finally obtained.
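A minimal sketch of the gated temporal convolution and a single-head graph-attention layer; the dilation rate, channel counts, and the reduction of the dilated inception layers to single dilated convolutions are simplifying assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TCModule(nn.Module):
    """Gated temporal conv on (B, C, N, T) inputs: tanh(filter) * sigmoid(gate)."""
    def __init__(self, ch=32, dilation=2):
        super().__init__()
        self.filt = nn.Conv2d(ch, ch, (1, 3), dilation=(1, dilation), padding=(0, dilation))
        self.gate = nn.Conv2d(ch, ch, (1, 3), dilation=(1, dilation), padding=(0, dilation))

    def forward(self, x):
        # The tanh branch filters the input; the sigmoid branch controls how much passes on.
        return torch.tanh(self.filt(x)) * torch.sigmoid(self.gate(x))

class GATLayer(nn.Module):
    """Single-head graph attention; the adjacency is assumed to include self-loops."""
    def __init__(self, ch):
        super().__init__()
        self.proj = nn.Linear(ch, ch)
        self.att = nn.Linear(2 * ch, 1)

    def forward(self, h, adj):                      # h: (N, C), adj: (N, N)
        h = self.proj(h)
        n = h.size(0)
        pair = torch.cat([h.unsqueeze(1).expand(n, n, -1),
                          h.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(self.att(pair)).squeeze(-1)
        e = e.masked_fill(adj == 0, float("-inf"))  # attend only along graph edges
        return torch.softmax(e, dim=-1) @ h         # dependency-weighted neighbour aggregation
```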
Referring to FIG. 1, in step (S5), the gating mechanism first fuses the multispectral space-time features $f_R$ and the pixel features $f_P$:

$$f = g\left(f_R, f_P\right) \qquad (3)$$

Here $g(\cdot)$ is implemented with two fully connected networks connected to each other, with the hyperbolic tangent as the activation function. Next, the fused feature $f$ is used as the gating mechanism: the aggregated feature is passed through a Sigmoid activation function that limits it to $[0, 1]$, where the value expresses how much information may pass through the gate, 0 meaning that no information is allowed through and 1 meaning that all information is allowed through. In this network the gating mechanism controls the importance of each pixel: 0 means the current pixel is of no use at all to the image-recognition decision, and 1 means the current pixel is of the utmost importance. The final output can thus be expressed as

$$f_{fusion} = \sigma(f) \odot f_P \qquad (4)$$

The gate values are multiplied element-wise with the features to be weighted, giving the final feature vector $f_{fusion}$, i.e., the multispectral feature map that combines both the space-time information and the pixel information.
Referring to fig. 1, in step (S6), the attention mechanism combines a channel attention mechanism and a spatial attention mechanism: feature learning is carried out on the panchromatic-image features and the multispectral features $f_{fusion}$ in the channel dimension and the spatial dimension respectively, and the importance of each channel and of each spatial region is obtained, giving the channel-attention feature map $f_{ca}$ and the spatial-attention feature map $f_{sa}$. The features of the two mechanisms are merged as

$$F_{\phi} = (f_{sa} \times 0.4 + f_{ca} \times 0.6) \times 0.5 \qquad (5)$$

Because the multispectral image consists of several wave bands, more attention is paid to feature learning in the channel dimension, and the feature map $F_{\phi}$ is finally obtained.
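A sketch of the attention combination of equation (5); SE-style channel attention and a CBAM-style spatial attention are used as stand-ins, since the text does not fix the internal form of the two attention blocks.

```python
import torch
import torch.nn as nn

class ChannelSpatialFusion(nn.Module):
    """Equation (5): F_phi = (f_sa * 0.4 + f_ca * 0.6) * 0.5 on concatenated PAN/MS features."""
    def __init__(self, ch=64, r=4):
        super().__init__()
        self.ca = nn.Sequential(                    # channel attention: importance of each channel
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch // r, 1), nn.ReLU(),
            nn.Conv2d(ch // r, ch, 1), nn.Sigmoid(),
        )
        self.sa = nn.Sequential(                    # spatial attention: importance of each region
            nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid(),
        )

    def forward(self, f_pan, f_fusion):
        x = torch.cat([f_pan, f_fusion], dim=1)
        f_ca = x * self.ca(x)
        pooled = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        f_sa = x * self.sa(pooled)
        return (f_sa * 0.4 + f_ca * 0.6) * 0.5      # channel attention weighted more heavily

# F_phi = ChannelSpatialFusion()(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))
```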
Referring to fig. 1, in step (S7), the decoder upsamples through 4 DB modules, where each DB module consists of a 3×3 convolution and a 1×1 convolution with stride 1, each followed by a ReLU activation; the DB modules adopt a dense connection mode, and 2 convolution layers with 3×3 kernels follow at the end. The obtained feature map $F_{\phi}$ is fed into this network, finally giving the fused high-resolution multispectral image.
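A sketch of the decoder; the channel widths, the output band count, and the placement of the 4× upsampling are assumptions.

```python
import torch
import torch.nn as nn

class DBModule(nn.Module):
    """One DB module: 3x3 conv then 1x1 conv, stride 1, each followed by ReLU."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(out_ch, out_ch, 1), nn.ReLU(),
        )

    def forward(self, x):
        return self.body(x)

class Decoder(nn.Module):
    def __init__(self, in_ch=64, width=32, out_bands=4):
        super().__init__()
        # Dense connections: each DB module sees the concatenation of all earlier outputs.
        self.dbs = nn.ModuleList(DBModule(in_ch + i * width, width) for i in range(4))
        self.up = nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False)
        self.out = nn.Sequential(                   # two final 3x3 convolution layers
            nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
            nn.Conv2d(width, out_bands, 3, padding=1),
        )

    def forward(self, f_phi):
        feats = [f_phi]
        for db in self.dbs:
            feats.append(db(torch.cat(feats, dim=1)))
        return self.out(self.up(feats[-1]))         # fused high-resolution multispectral image

# Decoder()(torch.randn(1, 64, 64, 64)).shape -> (1, 4, 256, 256)
```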
The foregoing is illustrative of the present invention and is not to be construed as limiting it; various changes, modifications, substitutions, combinations, and simplifications made without departing from the spirit and principles of the invention are intended to fall within its scope.
Claims (9)
1. A multispectral image fusion method based on a graph neural network, characterized by comprising the following steps:
(S1) acquiring a multispectral image and a corresponding panchromatic image over a period of time;
(S2) extracting pixel features from the multispectral image and the panchromatic image using a shared encoder network;
(S3) carrying out dimension reduction and feature extraction on the multispectral image, then extracting and fusing the graph structures of the multispectral image in three dimensions by means of graph embedding to obtain a heterogeneous graph of the multi-source features;
(S4) performing feature extraction on the obtained heterogeneous graph using space-time graph convolution to obtain the spatial features of the graph data;
(S5) aggregating the acquired pixel features and spatial features through a gating mechanism, outputting feature weights, and using the weights to obtain a multispectral feature map that finally fuses the spatial features and the pixel features;
(S6) fusing the obtained feature map of the panchromatic image with the multispectral feature map, in which the spatial and pixel features are fused, through an attention mechanism;
(S7) passing the fused feature map through a decoder to obtain the fused multispectral image.
2. The method of claim 1, wherein in step (S1), the multispectral image is captured by a multispectral camera capable of capturing 3 or more spectral bands simultaneously.
3. The method of claim 1, wherein in step (S2), the encoder network structure comprises two branches: an upper network for extracting the shallow features of the image, consisting of 4 convolution layers with 3×3 kernels, each layer except the last followed by a ReLU activation function; and a lower network for extracting the deep features of the image, which first passes through a 1×1 convolution layer and then 4 convolution layers with 3×3 kernels, the convolution module adopting a Nest connection mode that retains more information and yields the deep features; finally, the feature maps obtained by the upper and lower networks are concatenated along the feature dimension.
4. The method of claim 1, wherein in step (S3), the three feature graphs of the graph structure are obtained as follows: a physical feature graph of the spectral data is extracted from the dimension-reduced spectral data combined with the infrared spectral features; superpixel neighbor-node information is determined by a linear iterative clustering method, the edge connections between nodes are constructed according to the spatial connectivity of the superpixels, and a spatial feature graph is extracted; and, combining the spectral-feature similarity of the target, sampling and recombining across different spectral-band dimensions yields the spectral feature distribution of the target, with the graph neural network effectively representing the spectral data residing on a smooth manifold.
5. The method of claim 1, wherein in step (S3), the heterogeneous graph is obtained by connecting the three obtained feature graphs, whose node types differ, with a graph autoencoder, and a self-attention-based graph pooling method yields the heterogeneous graph fusing the multi-source features, wherein the autoencoder includes but is not limited to a graph convolutional autoencoder, a variational graph convolutional autoencoder, and an adversarially regularized graph autoencoder.
6. The method of claim 1, wherein in step (S4), the time dimension and the spatial dimension of the space-time graph convolution are extracted by different methods, wherein the network extracting the time dimension includes but is not limited to an RNN, GRU, LSTM, TCN, or Transformer, and the network extracting the spatial dimension includes but is not limited to a GCN, GAT, or a GCN combined with a GAT.
7. The method of claim 1, wherein in step (S5), the fused feature map is obtained by first aggregating the two feature maps using two fully connected networks connected to each other; the aggregated features then pass through an activation function that limits the value to between 0 and 1, where the value expresses how much information may pass through the gate (0 means no information is allowed through, 1 means all information is allowed through); the gate value gives the weight of the output features, and multiplying this weight with the pixel features yields the feature map that finally fuses the space-time features and the pixel features.
8. The method of claim 1, wherein in step (S6), the attention mechanism is a combination of a spatial attention mechanism and a channel attention mechanism, and the pixel features of the panchromatic image are fused with the multispectral image features in which the spatial and pixel features are fused.
9. The method of claim 1, wherein in step (S7), the decoder upsamples through 4 DB modules, each DB module consisting of a 3×3 convolution and a 1×1 convolution and adopting a DenseNet-style dense connection mode, and finally outputs the fused multispectral image using 2 convolution layers with 3×3 kernels.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202310573695.1A | 2023-05-22 | 2023-05-22 | Multispectral image fusion based on graph neural network

Publications (1)

Publication Number | Publication Date
---|---
CN116563187A | 2023-08-08

Family

ID=87501614

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202310573695.1A | Multispectral image fusion based on graph neural network | 2023-05-22 | 2023-05-22

- 2023-05-22: Application filed (CN202310573695.1A); patent status: Pending

Cited By (2)

Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN117455970A * | 2023-12-22 | 2024-01-26 | Shandong University of Science and Technology | Airborne laser sounding and multispectral satellite image registration method based on feature fusion
CN117455970B | 2023-12-22 | 2024-05-10 | Shandong University of Science and Technology | Airborne laser sounding and multispectral satellite image registration method based on feature fusion
Similar Documents

Publication | Title
---|---
CN109272010B | Multi-scale remote sensing image fusion method based on convolutional neural network
CN110415199B | Multispectral remote sensing image fusion method and device based on residual learning
CN110544212B | Convolutional neural network hyperspectral image sharpening method based on hierarchical feature fusion
CN111275618A | Depth map super-resolution reconstruction network construction method based on double-branch perception
CN111178316B | High-resolution remote sensing image land coverage classification method
CN109859110B | Hyperspectral image panchromatic sharpening method based on spectrum dimension control convolutional neural network
CN116071243B | Infrared image super-resolution reconstruction method based on edge enhancement
CN109064405A | Multi-scale image super-resolution method based on dual path network
CN114119444B | Multi-source remote sensing image fusion method based on deep neural network
CN109509160A | Hierarchical remote sensing image fusion method utilizing layer-by-layer iteration super-resolution
Turnes et al. | Atrous cGAN for SAR to optical image translation
CN112967178B | Image conversion method, device, equipment and storage medium
CN113240683B | Attention mechanism-based lightweight semantic segmentation model construction method
CN113887645B | Remote sensing image fusion classification method based on joint attention twin network
CN116740419A | Target detection method based on graph regulation network
CN116563187A | Multispectral image fusion based on graph neural network
CN115660955A | Super-resolution reconstruction model, method, equipment and storage medium for efficient multi-attention feature fusion
CN116645579A | Feature fusion method based on heterogeneous graph attention mechanism
CN114782298A | Infrared and visible light image fusion method with regional attention
CN115601236A | Remote sensing image super-resolution reconstruction method based on characteristic information distillation network
CN117576483B | Multisource data fusion ground object classification method based on multiscale convolution self-encoder
CN116343058A | Global collaborative fusion-based multispectral and panchromatic satellite image earth surface classification method
CN118134779A | Infrared and visible light image fusion method based on multi-scale reconstruction Transformer and multi-dimensional attention
CN118334365A | Novel RGB-D image saliency target detection method
CN118196629A | Remote sensing image vegetation extraction method and device
Legal Events

Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination