CN113327304A - Hyperspectral image saliency map generation method based on end-to-end neural network - Google Patents
- Publication number
- CN113327304A (application number CN202110593767.XA)
- Authority
- CN
- China
- Prior art keywords
- neural network
- saliency map
- convolution
- hyperspectral image
- layer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T11/00—2D [Two Dimensional] image generation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
- G06T9/002—Image coding using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
- G06T2207/10036—Multispectral image; Hyperspectral image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
Abstract
The invention provides a hyperspectral image saliency map generation method based on an end-to-end neural network, comprising the following steps. S1: preprocess the image; S2: construct an end-to-end neural network model and extract spatial-spectral features; S3: train the neural network. The end-to-end neural network model is a W²-shaped (dual-U) convolutional neural network comprising a left coding branch, a right coding branch and a middle decoding branch, namely a spatial coding module, a spectral coding module and a decoding module, together with a result prediction module for generating the predicted saliency map. Through the constructed end-to-end neural network model, the invention extracts the deep spatial-spectral features of the image and directly generates the predicted saliency map, saving computing resources and improving the robustness of the features.
Description
Technical Field
The invention relates to the technical field of image processing, in particular to a hyperspectral image saliency map generation method based on an end-to-end neural network.
Background
The hyperspectral image is composed of tens or hundreds of contiguous narrow-band images and can simultaneously capture the spatial and spectral information of a target scene, which is why it is called a data cube. With the development of hyperspectral imaging technology, hyperspectral imagers can acquire data with ever higher spatial and spectral resolution. Hyperspectral images have already been applied effectively in many fields, such as remote sensing of ground objects, precision agriculture, medical diagnosis, and target detection.
A saliency map models the human visual attention mechanism, describing the salient objects or regions in a real scene that attract the human eye, also referred to as "regions of interest". Saliency detection simulates the visual attention mechanism algorithmically, extracting the salient regions of an image to generate a saliency map. Conventional methods mainly compute local or global contrast from low-level features such as color and texture to obtain the saliency of a region. In recent years, neural network models have been studied intensively in computer vision for extracting deep image features, on the basis of which saliency detection can generate saliency maps of better quality.
The hyperspectral image contains abundant spatial and spectral information, but the spectral data have high dimensionality and correlation and are difficult to process. Most existing hyperspectral saliency map generation methods therefore rely on shallow spectral features and cannot fully exploit the spatial-spectral information of the hyperspectral image. Existing deep-feature methods generally first extract features with a neural network and then generate a saliency map through a separate saliency detection step; this two-stage process consumes more computing resources and makes such methods complex and insufficiently convenient and flexible in network training and practical use.
Disclosure of Invention
The invention provides a hyperspectral image saliency map generation method based on an end-to-end neural network. A hyperspectral image is first preprocessed so that the normalized input image is better suited to the model. The processed image data is then fed into the end-to-end neural network model to extract spatial-spectral features, obtaining the deep spatial-spectral features of the hyperspectral image and directly generating a predicted saliency map. The neural network model is trained by calculating the loss between the predicted saliency map and the ground-truth saliency map.
The invention provides a hyperspectral image saliency map generation method based on an end-to-end neural network, which comprises the following steps:
step S1: image preprocessing, namely preprocessing the initial hyperspectral image to obtain processed image data and inputting the processed image data into an end-to-end neural network;
step S2: extracting deep spatial-spectral features, namely constructing an end-to-end neural network model, inputting the preprocessed hyperspectral image data into the neural network model, extracting spatial features and spectral features and fusing them, predicting results and fusing the predicted results to obtain the final predicted saliency map;
step S3: training the neural network model, namely constructing a training data set, performing data expansion on the hyperspectral image data and inputting it into the network model, calculating the loss between the predicted saliency map and the ground-truth saliency map with a loss function, and optimizing the parameters to train the neural network.
Further, in step S1, the preprocessing of the image calculates the mean and variance of the sampled initial hyperspectral image data and normalizes it with them to obtain the processed hyperspectral image data.
Further, in step S2, the end-to-end neural network model includes a spatial coding module, a spectral coding module, a decoding module, and a result prediction, where the spatial coding module and the spectral coding module are respectively connected to the decoding module, the spatial coding module is used to code spatial features, the spectral coding module is used to code spectral features, and the decoding module is used to output a prediction saliency map by fusing input spatial features and spectral features, and input the prediction saliency map into the result prediction module.
Furthermore, the convolution layers of the convolution blocks in the spatial coding module and the decoding module share the same structure: each convolution layer comprises a conv3 × 3 convolution, a batch normalization layer bn and an activation function relu; the upper convolution layers are connected through downsampling layers and the lower convolution layers through upsampling layers, so that each convolution block forms a U-shaped structure as a whole.
Further, the spatial coding module comprises 6 convolution blocks with depths L of 7, 6, 5, 4 and 4 respectively, the convolution blocks being connected through a max-pooling layer maxpool.
Further, the decoding module comprises 5 convolution blocks with depths L of 7, 6, 5, 4 and 4 respectively, the convolution blocks being connected through an upsampling layer upsample or a max-pooling layer maxpool.
Furthermore, the spectral coding module comprises 6 convolution blocks, each of depth 4, the convolution blocks being connected through an average-pooling layer avgpool.
Furthermore, each convolution layer of the convolution blocks in the spectral coding module comprises a conv1 × 1 convolution layer and a batch normalization layer, the convolution layers in a convolution block are connected with each other through an activation function relu, and the convolution block forms a U-shaped structure as a whole.
Further, the result prediction module receives the output of each convolution block in the decoding module through a conv3 × 3 convolution and an activation function sigmoid to obtain a predicted saliency map for each block, and fuses the predicted saliency maps through a conv1 × 1 convolution and an activation function sigmoid to output the final predicted saliency map.
Further, in step S3, the hyperspectral images of the data set used for neural network training have size 1024 × 768, and the training data are expanded by horizontal flipping with 50% probability or downsampling with 25% probability, yielding hyperspectral images of size 512 × 384 as the input training set.
Further, in step S3, the loss function is a binary cross-entropy loss function, and during training of the neural network the model parameters are optimized through a back propagation algorithm according to the calculated loss.
The invention has the following beneficial effects:
1. An end-to-end dual-branch neural network model is adopted to extract deep features of the hyperspectral image and directly generate a predicted saliency map; the network is trained by calculating the loss between the generated predicted saliency map and the ground-truth saliency map to obtain the final network model, which saves computing resources, reduces time consumption, and improves flexibility in neural network training and practical use.
2. The constructed end-to-end neural network model structure integrates a spatial coding module, a spectral coding module and a decoding module, fully extracts deep spatial spectral features of a hyperspectral image, improves the robustness of the features, enables the quality of the generated saliency map to be higher, and improves the accuracy of a final result.
3. The input and output of each convolution block in the decoding module are connected through upsampling layers, which enlarges the influence of the input image pixels within the receptive field of the parameters and helps ensure the accuracy of the output.
Drawings
FIG. 1 is a schematic flow diagram of the generation method of the present invention;
FIG. 2 is a schematic diagram of the overall structure of the end-to-end neural network of the present invention;
FIG. 3 is a schematic structural diagram of a 4-layer deep convolution block of the end-to-end neural network of the present invention;
the left side in fig. 3 is a 4-layer depth convolution block structure corresponding to the spatial coding module and the decoding module; the right side is a 4-layer depth convolution block structure corresponding to the spectrum coding module;
S1 to S6 respectively denote the predicted saliency maps output by the corresponding layers of the decoding module.
Detailed Description
In the following description, technical solutions in the embodiments of the present invention are clearly and completely described, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
An embodiment 1 of the present invention provides a hyperspectral image saliency map generation method based on an end-to-end neural network, and as shown in fig. 1, the method includes the following steps:
step S1: image preprocessing, namely preprocessing an initial hyperspectral image to obtain processed image data, and inputting the processed image data into an end-to-end neural network, wherein the specific process is as follows:
in this embodiment, the initial hyperspectral image data is X0∈N512×384×81The size is 512X 384, the spectrum dimension is 81, then the obtained hyperspectral image is preprocessed and artificially marked, the preprocessing calculates the mean value and the variance of the hyperspectral image data, and the whole normalization calculation is carried out to obtain the processed hyperspectral image data X1∈RW×H×LThe continuity of the spectral data in the hyperspectral image is ensured, and the specific formula is as follows:
wherein μ (X)0) Is the mean, σ (X), of the raw hyperspectral image data0) Is the variance of the raw hyperspectral image data.
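As a hedged illustration, the whole-cube normalization above can be sketched in a few lines of numpy (the function name `preprocess`, the `eps` guard, and the small array size are illustrative stand-ins, not from the patent):

```python
import numpy as np

def preprocess(X0: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Global mean/std normalization of a hyperspectral cube, as a sketch
    of the preprocessing step: statistics are taken over the whole cube."""
    mu = X0.mean()
    sigma = X0.std()
    return (X0 - mu) / (sigma + eps)

# A small random cube standing in for a 512 x 384 x 81 hyperspectral image
rng = np.random.default_rng(0)
X0 = rng.uniform(0, 255, size=(8, 6, 81))
X1 = preprocess(X0)
print(X1.mean(), X1.std())  # approximately 0 and 1
```

After normalization the cube has (approximately) zero mean and unit variance, which matches the stated purpose of making the input better suited to the model.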
Step S2: constructing an end-to-end neural network model, inputting the hyperspectral image data obtained through preprocessing into the neural network model, extracting spatial features and spectral features, fusing the spatial features and the spectral features, predicting the output result, and fusing the predicted result to obtain a final predicted saliency map;
as shown in fig. 2, the end-to-end neural network model structure in this embodiment is a double branch W2The convolutional neural network comprises a space coding module and a spectrum coding moduleThe device comprises a module, a decoding module and a result prediction module;
the spatial coding module and the spectral coding module are respectively connected with the decoding module to integrally form a U-shaped structure, the spatial coding module is used for coding spatial characteristics, the spectral coding module is used for coding spectral characteristics, and the decoding module is used for fusing input spatial characteristics and spectral characteristics to output a prediction saliency map and inputting the prediction saliency map into the result prediction module;
the spatial coding module comprises 6 convolution blocks with depths L of 7, 6, 5, 4 and 4 respectively, the convolution blocks being connected through a max-pooling layer maxpool;
the decoding module comprises 5 convolution blocks with depths L of 7, 6, 5, 4 and 4 respectively, the convolution blocks being connected through an upsampling layer or a max-pooling layer maxpool;
the spectral coding module comprises 6 convolution blocks, each of depth 4, the convolution blocks being connected through an average-pooling layer avgpool.
Each convolution block of the spatial coding module and of the spectral coding module is connected to the input of the convolution block at the corresponding depth of the decoding module, yielding six prediction results at different scales.
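The max-pooling and average-pooling connections between blocks each halve the spatial resolution. A minimal numpy sketch of both operations (the helper `pool2x2` is hypothetical, used only to illustrate the two pooling variants):

```python
import numpy as np

def pool2x2(x: np.ndarray, mode: str = "max") -> np.ndarray:
    """2x2 spatial pooling over an (H, W, C) feature map (H, W even).

    mode="max" mimics maxpool (spatial branch / decoder connections);
    mode="avg" mimics avgpool (spectral branch connections).
    """
    H, W, C = x.shape
    # Split each spatial axis into (blocks, 2) and reduce over the 2s
    blocks = x.reshape(H // 2, 2, W // 2, 2, C)
    if mode == "max":
        return blocks.max(axis=(1, 3))
    return blocks.mean(axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4, 1)
print(pool2x2(x, "max")[..., 0])  # [[ 5.  7.] [13. 15.]]
print(pool2x2(x, "avg")[..., 0])  # [[ 2.5  4.5] [10.5 12.5]]
```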
As shown in fig. 3, a convolution block with a depth of 4 is illustrated;
In this embodiment, the convolution layers of the convolution blocks in the spatial coding module and the decoding module share the same structure: each convolution layer comprises a conv3 × 3 convolution, a batch normalization layer bn and an activation function relu. The image data first passes through a convolution layer with 3 × 3 kernels and a batch normalization layer bn, and then enters the 4-layer-deep convolution block; the output of each convolution layer is also fed to the convolution layer of the same depth. Within a block of depth n, the first n−1 convolution layers of the upper (encoding) half are connected through downsampling layers and the first n−1 convolution layers of the lower (decoding) half are connected through upsampling layers, so that the convolution block forms a U-shaped structure as a whole; finally the outputs of the layers are superimposed and passed through a relu activation function.
Each convolution layer of the convolution blocks in the spectral coding module comprises a conv1 × 1 convolution layer and a batch normalization layer, and the input and output of each convolution layer in a block are connected through an activation function relu. As before, the output of each convolution layer is fed to the convolution layer of the same depth, and the layers of different depths are connected in sequence to form a U-shaped structure.
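Since a conv1 × 1 layer mixes only the spectral channels at each pixel, it is equivalent to a per-pixel linear map over the channel axis. A sketch under that reading (the weights here are random stand-ins, not trained parameters; `conv1x1` is a hypothetical helper name):

```python
import numpy as np

def conv1x1(x: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """A 1x1 convolution over an (H, W, C_in) map: a per-pixel linear mix
    of the spectral channels, as used in the spectral coding branch."""
    # weight: (C_in, C_out), bias: (C_out,); matmul acts on the channel axis
    return x @ weight + bias

rng = np.random.default_rng(1)
x = rng.normal(size=(4, 3, 81))          # stand-in spectral feature map
w = rng.normal(size=(81, 64)) * 0.1
b = np.zeros(64)
y = np.maximum(conv1x1(x, w, b), 0.0)    # relu activation between layers
print(y.shape)                           # (4, 3, 64)
```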
The preprocessed hyperspectral image data X1 is fed into the end-to-end neural network, and deep spatial-spectral features F ∈ R^(512×384×64) are extracted through the spatial branch and the spectral branch, where 64 is the feature dimension; F is then input into the result prediction module.
The result prediction module receives the deep spatial-spectral features output by each convolution block in the decoding module through a conv3 × 3 convolution and a sigmoid activation function, generating a predicted saliency map for each layer; these maps are fused through a conv1 × 1 convolution and a sigmoid activation function to output the final predicted saliency map S ∈ R^(512×384). The fusion can also output the final result by superimposing and averaging the maps.
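The superimpose-and-average fusion variant mentioned above can be sketched as follows (`fuse_predictions` is a hypothetical helper; the optional `weights` argument only emulates a learned conv1 × 1 fusion over the six side outputs):

```python
import numpy as np

def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))

def fuse_predictions(side_maps, weights=None) -> np.ndarray:
    """Fuse per-depth side-output logits S1..S6 into one saliency map.

    weights=None gives the simple superimpose-and-average variant;
    a weight vector emulates a learned 1x1-convolution fusion.
    """
    stack = np.stack(side_maps, axis=0)            # (6, H, W) logits
    if weights is None:
        return sigmoid(stack.mean(axis=0))
    w = np.asarray(weights, dtype=float).reshape(-1, 1, 1)
    return sigmoid((w * stack).sum(axis=0))

# Six constant stand-in logit maps at increasing "confidence"
maps = [np.full((2, 2), v) for v in (-2.0, -1.0, 0.0, 1.0, 2.0, 3.0)]
fused = fuse_predictions(maps)
print(fused[0, 0])  # sigmoid of the mean logit, 0.5
```

The output always lies in (0.0, 1.0), matching the value range the patent states for the predicted saliency map.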
Step S3: training a neural network model, constructing a training data set, obtaining a truth-value saliency map by labeling an original hyperspectral image, calculating and predicting the loss of the saliency map and the truth-value saliency map by using a loss function, and optimizing parameters to train the neural network;
in this embodiment, hyperspectral images of size 1024 × 768 serve as the training set; the training images are horizontally flipped with 50% probability or downsampled with 25% probability to obtain hyperspectral images of size 512 × 384 as input;
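One plausible reading of this data expansion, sketched in numpy: the flip is applied with 50% probability and every cube is then decimated 2x to the 512 × 384 input size (whether the downsampling itself is probabilistic is ambiguous in the text, so this sketch makes it unconditional; `augment` is a hypothetical helper):

```python
import numpy as np

def augment(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Training-set expansion sketch: horizontal flip with 50% probability,
    then naive 2x spatial decimation (1024 x 768 -> 512 x 384)."""
    if rng.random() < 0.5:
        x = x[:, ::-1, :]      # flip along the width axis
    return x[::2, ::2, :]      # keep every second row and column

rng = np.random.default_rng(0)
cube = np.zeros((768, 1024, 81))   # stand-in 1024 x 768, 81-band cube (H, W, L)
out = augment(cube, rng)
print(out.shape)                   # (384, 512, 81)
```

A real pipeline would use proper low-pass filtering before decimation; simple striding is used here only to keep the sketch self-contained.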
the loss between the predicted saliency map output by the model and the ground-truth saliency map G ∈ N^(512×384) is calculated, and the parameters of the neural network model are optimized according to this loss through back propagation with the Adam algorithm;
here the ground-truth saliency map G ∈ N^(W×H) takes the value 0 or 1, with 0 denoting background and 1 denoting foreground, while the predicted saliency map takes values in (0.0, 1.0). The regression of saliency values is therefore treated in neural network training as a pixel-by-pixel binary classification of background versus foreground, and the loss is calculated with the binary cross-entropy loss function.
In this embodiment, the losses between the predicted saliency maps at the six different network depths and the ground-truth saliency map, together with the loss between the finally fused saliency map and the ground-truth saliency map, are calculated to train the network. The loss is calculated as:

L(G, S) = −(1 / (W·H)) · Σ_{x=1..W} Σ_{y=1..H} [ G(x, y)·log S(x, y) + (1 − G(x, y))·log(1 − S(x, y)) ]

where G denotes the ground-truth saliency map, S denotes the predicted saliency map, W and H denote the size of the saliency map, and x and y denote pixel coordinates.
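The pixel-wise binary cross entropy described above can be computed directly in numpy (`bce_loss`, the clipping `eps`, and the 2 × 2 example maps are illustrative stand-ins):

```python
import numpy as np

def bce_loss(S: np.ndarray, G: np.ndarray, eps: float = 1e-7) -> float:
    """Binary cross entropy between a predicted map S in (0, 1) and a
    ground-truth map G in {0, 1}, averaged over the W x H pixels."""
    S = np.clip(S, eps, 1.0 - eps)   # guard against log(0)
    return float(-(G * np.log(S) + (1.0 - G) * np.log(1.0 - S)).mean())

G = np.array([[1.0, 0.0], [1.0, 0.0]])
S = np.array([[0.9, 0.1], [0.8, 0.3]])
print(bce_loss(S, G))  # small loss: predictions mostly agree with G
```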
In this embodiment, the neural network model is trained for 20 epochs with a batch size of 2, an initial learning rate of 0.001, and a learning-rate decay coefficient of 0.95 per epoch.
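The stated per-epoch decay coefficient corresponds to a simple exponential learning-rate schedule, which can be sketched in one line:

```python
# Exponential schedule implied by the hyperparameters above:
# lr_e = 0.001 * 0.95**e for epochs e = 0..19
lrs = [0.001 * 0.95 ** epoch for epoch in range(20)]
print(lrs[0], lrs[-1])
```

By the final epoch the learning rate has decayed to roughly 38% of its initial value.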
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification and any novel method or process steps or any novel combination of features disclosed.
Claims (10)
1. A hyperspectral image saliency map generation method based on an end-to-end neural network is characterized by comprising the following steps:
step S1: image preprocessing, namely preprocessing the initial hyperspectral image to obtain processed image data and inputting the processed image data into an end-to-end neural network;
step S2: extracting deep spatial-spectral features, namely constructing an end-to-end neural network model, inputting the preprocessed hyperspectral image data into the neural network model, extracting spatial features and spectral features and fusing them, predicting results and fusing the predicted results to obtain the final predicted saliency map;
step S3: training the neural network model, namely constructing a training data set, performing data expansion on the hyperspectral image data and inputting it into the network model, calculating the loss between the predicted saliency map and the ground-truth saliency map with a loss function, and optimizing the parameters to train the neural network.
2. The end-to-end neural network-based hyperspectral image saliency map generation method according to claim 1, wherein in step S1, the preprocessing of the image is to calculate the mean and variance of the sampled initial hyperspectral image data and normalize the mean and variance to obtain the processed hyperspectral data.
3. The end-to-end neural network-based hyperspectral image saliency map generation method according to claim 1, wherein in step S2, the end-to-end neural network model comprises a spatial coding module, a spectral coding module, a decoding module and a result prediction module, the spatial coding module and the spectral coding module are respectively connected with the decoding module, the spatial coding module is used for coding spatial features, the spectral coding module is used for coding spectral features, and the decoding module is used for fusing input spatial features and spectral features to output a predicted saliency map and inputting the predicted saliency map into the result prediction module.
4. The end-to-end neural network-based hyperspectral image saliency map generation method according to claim 3, wherein the convolution layers of the convolution blocks in the spatial coding module and the decoding module share the same structure, each convolution layer comprising a conv3 × 3 convolution, a batch normalization layer bn and an activation function relu, the upper convolution layers being connected through a downsampling layer downsample and the lower convolution layers through an upsampling layer upsample, the convolution blocks integrally forming a U-shaped structure.
5. The end-to-end neural network-based hyperspectral image saliency map generation method according to claim 4, wherein the spatial coding module comprises 6 convolution blocks with depths L of 7, 6, 5, 4 and 4 respectively, the convolution blocks being connected through a max-pooling layer maxpool;
the decoding module comprises 5 convolution blocks with depths L of 7, 6, 5, 4 and 4 respectively, the convolution blocks being connected through an upsampling layer upsample or a max-pooling layer maxpool.
6. The end-to-end neural network-based hyperspectral image saliency map generation method according to claim 4, wherein the spectral coding module comprises 6 convolution blocks, each of depth 4, the convolution blocks being connected by an average pooling layer avgpool.
7. The end-to-end neural network-based hyperspectral image saliency map generation method according to claim 6, wherein each convolution layer of the convolution blocks in the spectral coding module comprises a conv1 × 1 convolution layer and a batch normalization layer, the input and output of each convolution layer in the convolution block being connected through an activation function relu, the convolution blocks integrally forming a U-shaped structure.
8. The end-to-end neural network-based hyperspectral image saliency map generation method according to claim 4, wherein the result prediction module receives the output of each convolution block in the decoding module through a conv3 × 3 convolution and an activation function sigmoid to obtain a predicted saliency map for each block, and fuses the predicted saliency maps through a conv1 × 1 convolution and an activation function sigmoid to output the final predicted saliency map.
9. The end-to-end neural network-based hyperspectral image saliency map generation method according to claim 1, wherein in step S3, the hyperspectral images of the data set used for neural network training have size 1024 × 768, and the training data are expanded by horizontal flipping with 50% probability or downsampling with 25% probability, yielding hyperspectral images of size 512 × 384 as the input training set.
10. The end-to-end neural network-based hyperspectral image saliency map generation method according to claim 1, wherein in step S3, the loss function is a binary cross-entropy loss function, and during training of the neural network the model parameters are optimized through a back propagation algorithm according to the calculated loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110593767.XA CN113327304A (en) | 2021-05-28 | 2021-05-28 | Hyperspectral image saliency map generation method based on end-to-end neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110593767.XA CN113327304A (en) | 2021-05-28 | 2021-05-28 | Hyperspectral image saliency map generation method based on end-to-end neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113327304A true CN113327304A (en) | 2021-08-31 |
Family
ID=77422282
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110593767.XA Pending CN113327304A (en) | 2021-05-28 | 2021-05-28 | Hyperspectral image saliency map generation method based on end-to-end neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113327304A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114387258A (en) * | 2022-01-14 | 2022-04-22 | 北京理工大学重庆创新中心 | Hyperspectral image reconstruction method based on regional dynamic depth expansion neural network |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103729848A (en) * | 2013-12-28 | 2014-04-16 | 北京工业大学 | Hyperspectral remote sensing image small target detection method based on spectrum saliency |
US20150055824A1 (en) * | 2012-04-30 | 2015-02-26 | Nikon Corporation | Method of detecting a main subject in an image |
CN108090447A (en) * | 2017-12-19 | 2018-05-29 | 青岛理工大学 | Hyperspectral image classification method and device under double branch's deep structures |
CN109146831A (en) * | 2018-08-01 | 2019-01-04 | 武汉大学 | Remote sensing image fusion method and system based on double branch deep learning networks |
CN109191426A (en) * | 2018-07-24 | 2019-01-11 | 江南大学 | Planar image saliency detection method
CN109871830A (en) * | 2019-03-15 | 2019-06-11 | 中国人民解放军国防科技大学 | Spatial-spectral fusion hyperspectral image classification method based on three-dimensional depth residual error network |
CN111160478A (en) * | 2019-12-31 | 2020-05-15 | 北京理工大学重庆创新中心 | Hyperspectral target significance detection method based on deep learning |
CN111667489A (en) * | 2020-04-30 | 2020-09-15 | 华东师范大学 | Cancer hyperspectral image segmentation method and system based on double-branch attention deep learning |
CN112183360A (en) * | 2020-09-29 | 2021-01-05 | 上海交通大学 | Lightweight semantic segmentation method for high-resolution remote sensing image |
Non-Patent Citations (3)
Title |
---|
CHEN HUANG et al.: "Salient object detection on hyperspectral images in wireless network using CNN and saliency optimization", *Ad Hoc Networks* * |
XIANGYU LIU et al.: "Remote sensing image fusion based on two-stream fusion network", *Information Fusion* * |
XUEBIN QIN et al.: "U2-Net: Going deeper with nested U-structure for salient object detection", *Pattern Recognition* * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114387258A (en) * | 2022-01-14 | 2022-04-22 | 北京理工大学重庆创新中心 | Hyperspectral image reconstruction method based on regional dynamic depth expansion neural network |
CN114387258B (en) * | 2022-01-14 | 2024-03-22 | 北京理工大学重庆创新中心 | Hyperspectral image reconstruction method based on regional dynamic depth expansion neural network |
Similar Documents
Publication | Title |
---|---|
CN111325794B (en) | Visual simultaneous localization and map construction method based on depth convolution self-encoder |
CN111047548B (en) | Attitude transformation data processing method and device, computer equipment and storage medium |
CN111259945B (en) | Binocular parallax estimation method introducing attention map |
Pang et al. | Visual haze removal by a unified generative adversarial network |
CN111582483B (en) | Unsupervised learning optical flow estimation method based on space and channel combined attention mechanism |
CN111931787A (en) | RGBD saliency detection method based on feature aggregation |
CN112132023A (en) | Crowd counting method based on multi-scale context enhanced network |
CN111598998A (en) | Three-dimensional virtual model reconstruction method and device, computer equipment and storage medium |
CN112419242A (en) | No-reference image quality evaluation method based on self-attention mechanism GAN network |
Wang et al. | VoPiFNet: Voxel-Pixel Fusion Network for Multi-Class 3D Object Detection |
CN115359372A (en) | Unmanned aerial vehicle video moving object detection method based on optical flow network |
CN114049434A (en) | 3D modeling method and system based on full convolution neural network |
CN115115685A (en) | Monocular image depth estimation algorithm based on self-attention neural network |
CN110335299A (en) | Monocular depth estimation system implementation method based on adversarial network |
Zhang et al. | Unsupervised depth estimation from monocular videos with hybrid geometric-refined loss and contextual attention |
CN117391938B (en) | Infrared image super-resolution reconstruction method, system, equipment and terminal |
Wang et al. | Underwater self-supervised monocular depth estimation and its application in image enhancement |
Babu et al. | An efficient image dahazing using Googlenet based convolution neural networks |
CN113327304A (en) | Hyperspectral image saliency map generation method based on end-to-end neural network |
Gonzalez-Sabbagh et al. | DGD-cGAN: A dual generator for image dewatering and restoration |
Liu et al. | A video drowning detection device based on underwater computer vision |
Huang et al. | Underwater image enhancement via LBP-based attention residual network |
Liu et al. | SI-SA GAN: A generative adversarial network combined with spatial information and self-attention for removing thin cloud in optical remote sensing images |
CN116631064A (en) | 3D human body posture estimation method based on complementary enhancement of key points and grid vertexes |
CN115170985B (en) | Remote sensing image semantic segmentation network and segmentation method based on threshold attention |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20210831 |