CN113554720A - Multispectral image compression method and system based on multidirectional convolutional neural network - Google Patents


Info

Publication number
CN113554720A
Authority
CN
China
Prior art keywords
network
module
image
spectral
multispectral image
Prior art date
Legal status
Pending
Application number
CN202110829480.2A
Other languages
Chinese (zh)
Inventor
孔繁锵
张宁
胡可迪
曹童波
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202110829480.2A
Publication of CN113554720A
Legal status: Pending

Classifications

    • G06T 9/002 — Image coding using neural networks
    • G06F 18/2321 — Non-hierarchical clustering techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06N 3/045 — Neural network architectures: combinations of networks
    • G06N 3/048 — Neural network architectures: activation functions
    • G06N 3/084 — Learning methods: backpropagation, e.g. using gradient descent


Abstract

The invention discloses a multispectral image compression method and system based on a multidirectional convolutional neural network. The system comprises a forward coding network, a quantization module, an entropy coding module, an entropy decoding module, an inverse quantization module and a reverse decoding network. The method comprises the following steps: constructing a multispectral image compression network and training it to obtain an optimal compression network model; feeding the multispectral image to be compressed into the network, extracting inter-spectral and spatial features of the image by multidirectional convolution, reducing the feature-map size by down-sampling after dimension reduction and fusion, removing data redundancy through quantization, and obtaining a compressed code stream for transmission and storage through lossless entropy coding; entropy-decoding and inverse-quantizing the received compressed code stream to obtain the spectral-spatial feature data of the multispectral image, which is then input into the reverse decoding network to obtain the reconstructed multispectral image. The invention can realize multi-code-rate compression of multispectral images and effectively improve their compression performance.

Description

Multispectral image compression method and system based on multidirectional convolutional neural network
Technical Field
The invention relates to the technical field of image processing and deep learning, in particular to a multispectral image compression method and system based on a multidirectional convolutional neural network.
Background
With the rapid development of multispectral imaging technology, abundant spectral-spatial features are encoded into a number of narrow, continuous spectral bands, producing multispectral images that reflect the characteristics of a scene better than ordinary visible-light images. Multispectral images contain rich inter-spectral and spatial information and have been widely applied in environmental monitoring, crop condition assessment, military reconnaissance, target monitoring and other fields. However, precisely because of the information they contain, the data volume of a multispectral image is dramatically larger than that of a visible-light image. Such a huge amount of data places great pressure on the transmission, storage and application of the images, especially when channel capacity is limited, so the data must be compressed effectively.
For the compression of visible-light images, only spatial correlation needs to be considered in most cases, so many effective conventional compression methods, such as JPEG2000 and 3D-SPIHT, have emerged. These algorithms typically transform in the spatial dimension to eliminate spatial redundancy and achieve higher compression ratios. JPEG2000 and other conventional methods can be applied to multispectral image compression with good results, but given the characteristics of multispectral images, compression algorithms that extract spectral and spatial features simultaneously need to be studied to better retain the information they contain.
Image data can be compressed because various kinds of redundancy exist in the data. The redundancy of image data falls mainly into: spatial redundancy, due to correlation between adjacent pixels; temporal redundancy, due to correlation between adjacent frames in an image sequence; and spectral redundancy, resulting from correlation between different color channels or spectral bands. In the imaging of a multispectral image, because the band interval is small, strong correlation exists between band data, i.e. inter-spectral redundancy; the single image of each band is equivalent to a two-dimensional static image, i.e. it has spatial redundancy. However, most existing multispectral image compression algorithms focus on removing spatial redundancy and ignore inter-spectral redundancy.
Traditional multispectral image compression algorithms fall mainly into three categories: (1) algorithms based on predictive coding; (2) algorithms based on vector-quantization coding; (3) algorithms based on transform coding. All three have obvious disadvantages. Predictive-coding algorithms can achieve lossless compression but only at low compression ratios, and the quality of the predictor design is the main factor determining their compression performance. Vector-quantization algorithms have high computational complexity, and their compression performance is directly tied to codebook size: the larger the codebook, the better the performance, but an oversized codebook greatly increases computational complexity. Transform-coding algorithms suffer from blocking artifacts and edge Gibbs effects at high compression ratios, seriously degrading compression performance. Although traditional multispectral compression methods can achieve good results, they cannot fully exploit the rich spectral-spatial features of multispectral images.
In recent years, deep learning has developed rapidly and is increasingly applied in image processing, so combining it with image compression is a natural direction of research. Deep learning is combined with compression mainly by exploiting its many parameters and learning capacity: frameworks such as convolutional or recurrent neural networks are assembled into different network structures to extract image data features. Its advantage is the ability to extract deep information from an image while retaining the essential characteristics of objects. Applied to image compression, this can effectively remedy the incomplete spectral-spatial feature extraction of traditional compression techniques. Convolutional neural networks are the most widely used today, but ordinary convolutional networks usually ignore the inter-spectral information of multispectral images, causing considerable information loss.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a multispectral image compression method and system based on a multidirectional convolutional neural network that realize multi-code-rate compression of multispectral images and effectively improve their compression performance.
In order to solve the technical problem, the invention provides a multispectral image compression method based on a multidirectional convolutional neural network, which comprises the following steps:
(1) constructing a multispectral image compression network, training the multispectral image compression network, and optimizing network parameters to obtain an optimal multispectral image compression network model;
(2) feeding the multispectral image to be compressed into the multispectral image compression network, extracting the spectral and spatial features of the image from different directions by multidirectional convolution, performing dimension reduction and fusion, reducing the feature-map size with down-sampling, removing data redundancy through the quantization module, and losslessly entropy-coding the quantized feature data to obtain a compressed code stream for transmission and storage;
(3) entropy-decoding and inverse-quantizing the received compressed code stream to obtain the spectral-spatial feature data of the multispectral image, then inputting it into the reverse decoding network, restoring the feature-map size by up-sampling, restoring the corresponding inter-spectral and spatial features through the inter-spectral/spatial multidirectional convolution modules in the decoding network, and obtaining the reconstructed multispectral image after dimension reduction and fusion; different network models are obtained by training with varied parameters, realizing multi-code-rate compression of the multispectral image.
Preferably, in step (1), a linear rectification function (ReLU) is used in the multispectral image compression network, with the expression:
ReLU(x_i) = max(0, x_i)
where x_i is the data of the i-th channel. The function maps the input into two segments: input values less than 0 are mapped to 0, and input values greater than 0 are passed through unchanged. Its derivative shows that no gradient is lost for active units during the backward computation.
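As a minimal plain-Python illustration of the two-segment mapping and its gradient (a sketch, not code from the patent):

```python
def relu(x):
    """Linear rectification: negative inputs map to 0, positive pass through."""
    return max(0.0, x)

def relu_grad(x):
    """Derivative: 1 for x > 0, else 0, so gradients of active units are preserved."""
    return 1.0 if x > 0 else 0.0

print([relu(v) for v in (-2.0, -0.5, 0.0, 1.5)])  # [0.0, 0.0, 0.0, 1.5]
print([relu_grad(v) for v in (-2.0, 1.5)])        # [0.0, 1.0]
```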
Preferably, in step (2), the quantization function is approximated by the following formula:
X_q = round[(2^q − 1) × X_s]
where X_s is the intermediate feature data and q is the quantization level. The approximated quantization function rounds the data in the forward pass; in the backward pass it skips the quantization layer and passes the gradient directly to the previous layer.
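The quantization/inverse-quantization pair above can be sketched in plain Python (the q value and inputs are illustrative; the patent applies this elementwise to Sigmoid-bounded feature maps):

```python
def quantize(xs, q):
    """X_q = round((2**q - 1) * X_s): map features in [0, 1] to integer levels."""
    return [round((2 ** q - 1) * x) for x in xs]

def dequantize(xq, q):
    """Inverse quantization X_q / (2**q - 1), recovering values in [0, 1]."""
    return [x / (2 ** q - 1) for x in xq]

xs = [0.0, 0.13, 0.5, 0.87, 1.0]
xq = quantize(xs, q=4)          # q = 4 gives 16 levels, 0..15
xr = dequantize(xq, q=4)
print(xq)                        # [0, 2, 8, 13, 15]
# rounding error per sample is bounded by half a quantization step
print(max(abs(a - b) for a, b in zip(xs, xr)))
```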
Preferably, in step (3), there are two ways to implement multi-code-rate compression of the multispectral image:
(a) fix the penalty weight λ in rate-distortion optimization and change the number of neurons in the intermediate convolutional layers; training then yields compression networks with different code rates, where fewer neurons give a lower code rate;
(b) fix the number of neurons in the intermediate convolutional layers and change the value of the penalty weight λ in rate-distortion optimization; training then yields compression networks with different code rates, where a larger λ gives a lower code rate.
Correspondingly, the multispectral image compression system based on the multidirectional convolutional neural network comprises: a forward coding network, a quantization module, an entropy coding module, an entropy decoding module, an inverse quantization module and a reverse decoding network. The forward coding network comprises an inter-spectral/spatial multidirectional convolution extraction module, an extracted-feature fusion module and a down-sampling module; the reverse decoding network comprises an up-sampling module, an inter-spectral/spatial multidirectional convolution recovery module and a recovered-feature fusion module.
The inter-spectral/spatial multidirectional convolution extraction module extracts the inter-spectral and spatial features of the multispectral image by convolution from different directions. The extracted-feature fusion module fuses the extracted spatial features pixel by pixel and reduces their dimension, then concatenates them with the dimension-reduced inter-spectral features to obtain a feature map. The down-sampling module reduces the feature-map size. The quantization and entropy coding modules quantize and entropy-code the feature map to obtain a compressed code stream for transmission and storage. The entropy decoding and inverse quantization modules entropy-decode and inverse-quantize the compressed code stream to obtain the feature data of the multispectral image. The up-sampling module restores the feature-map size. The inter-spectral/spatial multidirectional convolution recovery module restores the corresponding features of the feature map. Finally, the recovered-feature fusion module fuses the spatial features pixel by pixel, reduces their dimension, and concatenates them with the equally dimension-reduced inter-spectral features to obtain the reconstructed multispectral image.
Preferably, the forward coding network comprises 8 inter-spectrum direction convolution modules, 8 spatial direction 1 convolution modules, 8 spatial direction 2 convolution modules and 3 down-sampling modules; the downsampling operation uses convolution layers with a step size of 2 and a convolution kernel size of 4 x 4.
Preferably, the multispectral image compression network further comprises a rate-distortion optimization module. The rate-distortion optimization module introduces the entropy of the intermediate feature data into the loss-function calculation of the compression network; training continuously optimizes the distribution of the spectral and spatial feature data so that it becomes more compact. The loss function is expressed as:
L = L_d + λ·L_r
where L_d is the distortion loss, measured by the mean square error (MSE); λ is a penalty weight used to explicitly control the code rate; and L_r is the entropy approximation of the quantized feature-map data, reflecting how concentrated the distribution of the intermediate feature data is.
L_d is calculated as:
L_d = (1 / (N·H·W·C)) · Σ (I − Î)²
where I denotes the input image, Î denotes the image restored by the compression network, N denotes the batch size, and H, W, C denote the height, width and number of channels of the input image, respectively.
L_r is calculated as:
L_r = −E[log2 P_q]
where P_q is the probability of each quantized value, obtained by integrating the density over the corresponding quantization interval:
P_q(x_q) = ∫_{x_q − 1/2}^{x_q + 1/2} P_d(x) dx
and P_d(x) is the probability density function of the intermediate data.
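A minimal sketch of this rate-distortion loss, using a histogram of quantized symbols as a stand-in for the density-based entropy term (plain Python with illustrative data, not the patent's implementation):

```python
import math

def rd_loss(original, restored, quantized, lam):
    """L = L_d + lam * L_r: MSE distortion plus a histogram entropy estimate
    of the quantized feature data, approximating -E[log2 P_q]."""
    n = len(original)
    ld = sum((a - b) ** 2 for a, b in zip(original, restored)) / n
    # empirical probability of each quantized symbol
    counts = {}
    for s in quantized:
        counts[s] = counts.get(s, 0) + 1
    total = len(quantized)
    lr = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return ld + lam * lr

orig = [0.1, 0.4, 0.8, 0.9]
rest = [0.12, 0.38, 0.81, 0.88]
quant = [2, 6, 13, 14]
print(rd_loss(orig, rest, quant, lam=0.01))
```

A larger λ penalizes high-entropy (spread-out) quantized data more strongly, which is the mechanism behind the multi-code-rate training described in the claims.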
The invention has the following beneficial effects. (1) The network model uses the inter-spectral/spatial multidirectional convolution extraction module to extract the corresponding features independently from different directions; the extracted-feature fusion module first fuses only the spatial-direction features pixel by pixel and then concatenates the inter-spectral and spatial parts, so the spectral-spatial features remain complete while the inter-spectral and spatial features are fully separated, and no new parameters are introduced. During training, the data can be fed directly into the network and the result obtained at the output, removing intermediate independent learning steps and greatly improving learning efficiency. (2) The inter-spectral/spatial multidirectional convolution modules are cascaded and placed in parallel, forming the forward coding network and reverse decoding network together with the feature fusion modules, which improves training efficiency; meanwhile the inter-spectral/spatial convolution units contain shortcut connections, following the residual network design, which speeds up training and avoids gradient explosion, network degradation and other problems of deep networks. (3) Rate-distortion optimization is added to the loss function, making the data distribution of the inter-spectral and spatial features more compact and keeping the trade-off between code rate and distortion, so the reconstructed image is closer to the original while more complete inter-spectral feature information is retained.
Drawings
FIG. 1 is a schematic diagram of the compression system of the present invention.
FIG. 2 is a schematic diagram of the structure of the inter-spectrum direction convolution unit according to the present invention.
FIG. 3 is a schematic diagram of a spatial direction convolution unit structure according to the present invention.
Fig. 4 is a schematic diagram of a forward coding network structure according to the present invention.
Fig. 5 is a schematic diagram of a reverse decoding network according to the present invention.
FIG. 6(a) is the average PSNR of the 7-band test set at different code rates according to the present invention.
FIG. 6(b) is the average PSNR graph of the 8-band test set at different code rates according to the present invention.
FIG. 7 is a 7-band restored image display diagram according to the present invention.
FIG. 8 is a diagram of an 8-band restored image according to the present invention.
FIG. 9(a) is a graph of similarity of 7-band test spectrum according to the present invention.
FIG. 9(b) is a graph of similarity of 8-band test spectrum according to the present invention.
Detailed Description
As shown in fig. 1, a multispectral image compression system based on a multidirectional convolutional neural network comprises: a forward coding network, a quantization module, an entropy coding module, an entropy decoding module, an inverse quantization module and a reverse decoding network. The forward coding network comprises an inter-spectral/spatial multidirectional convolution extraction module, an extracted-feature fusion module and a down-sampling module; the reverse decoding network comprises an up-sampling module, an inter-spectral/spatial multidirectional convolution recovery module and a recovered-feature fusion module. The inter-spectral/spatial multidirectional convolution extraction module extracts the inter-spectral and spatial features of the multispectral image by convolution from different directions. The extracted-feature fusion module fuses the extracted spatial features pixel by pixel and reduces their dimension, then concatenates them with the dimension-reduced inter-spectral features to obtain a feature map. The down-sampling module reduces the feature-map size. The quantization and entropy coding modules quantize and entropy-code the feature map to obtain a compressed code stream for transmission and storage. The entropy decoding and inverse quantization modules entropy-decode and inverse-quantize the compressed code stream to obtain the feature data of the multispectral image. The up-sampling module restores the feature-map size. The inter-spectral/spatial multidirectional convolution recovery module restores the corresponding features of the feature map. Finally, the recovered-feature fusion module fuses the spatial features pixel by pixel, reduces their dimension, and concatenates them with the equally dimension-reduced inter-spectral features to obtain the reconstructed multispectral image.
The invention provides a multispectral image compression method based on a multidirectional convolutional neural network, which specifically comprises the following steps:
step S1: and constructing a multispectral image compression network, inputting training set data to train the network so as to optimize network parameters until an optimal multispectral image compression network model is obtained.
Step S11: several inter-spectral/spatial convolution units oriented in different directions are cascaded and stacked as basic building blocks to form the corresponding inter-spectral/spatial multidirectional convolution extraction (or recovery) modules; down-sampling (or up-sampling) is added to form the forward coding network and reverse decoding network respectively; and the rate-distortion optimization, quantization, entropy coding, entropy decoding and inverse quantization modules are added to form an end-to-end multispectral image compression network.
Step S12: the structures of the inter-spectral convolution unit and the spatial convolution unit are shown in fig. 2 and fig. 3 respectively, where Conv denotes convolution and ReLU denotes the linear rectification function. Shortcut connections learn residuals to accelerate training. The inter-spectral convolution unit extracts features with convolution kernels of size 1 × 1 × 3, computing only along the inter-spectral dimension to extract complete inter-spectral feature data, with a nonlinear ReLU activation added to learn nonlinear information; the spatial convolution unit extracts features with convolution kernels of size 3 × 3 × 1. Several inter-spectral/spatial convolution units are stacked and cascaded to form the corresponding inter-spectral/spatial multidirectional convolution extraction module; after feature extraction, fusion and down-sampling together form the forward coding network.
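The shape bookkeeping behind the 1 × 1 × 3 and 3 × 3 × 1 kernels can be checked with the standard convolution output-size formula (a sketch under an assumed H × W × C = 64 × 64 × 8 input; not code from the patent):

```python
def conv_out(size, kernel, stride, pad):
    """Standard convolution output-size formula."""
    return (size + 2 * pad - kernel) // stride + 1

H, W, C = 64, 64, 8
# Inter-spectral unit: 1x1x3 kernel, stride 1, padding (0,0,1) -> shape preserved,
# mixing happens only along the spectral axis.
spectral = tuple(conv_out(s, k, 1, p)
                 for s, k, p in zip((H, W, C), (1, 1, 3), (0, 0, 1)))
# Spatial unit: 3x3x1 kernel, stride 1, padding (1,1,0) -> shape preserved,
# mixing happens only in the two spatial axes.
spatial = tuple(conv_out(s, k, 1, p)
                for s, k, p in zip((H, W, C), (3, 3, 1), (1, 1, 0)))
print(spectral, spatial)   # (64, 64, 8) (64, 64, 8)
```

Both units leave the tensor shape unchanged, which is what allows them to be stacked, short-circuited with residual connections, and fused pixel by pixel afterwards.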
Step S13: the forward coding network is mainly formed by cascading and stacking convolution layers, linear rectification units, the multidirectional convolution network, the feature fusion network and down-sampling layers; its function is to extract complete inter-spectral and spatial features of the multispectral image.
Step S14: the quantization layer and entropy coding follow the forward coding network; their function is to quantize and entropy-code the inter-spectral/spatial fused features extracted by the forward coding network. Quantization reduces redundant information in the data but is also the main source of distortion in image compression; entropy coding losslessly encodes the quantized data into a compressed binary code stream, further removing statistical redundancy between data. The quantization layer, entropy coding and forward coding network together form the compression model of the network.
Step S15: the inverse quantization, entropy decoding and reverse decoding network correspond one-to-one to the quantization, entropy coding and forward coding network; the whole network is symmetric and together forms a complete compression network structure.
Step S16: the whole compression network model is trained end to end: the preprocessed multispectral images are fed directly into the network model to start training, in batch mode — several images are read at a time — to improve training efficiency. The image data passes through the compression model to the decompression model; the data obtained at the output is compared with the original data to compute the distortion error, the network parameters are optimized by minimizing the loss function, and they are updated by backpropagation until optimal.
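The end-to-end loop described above can be caricatured with a single scalar parameter and an analytic gradient standing in for backpropagation (a toy sketch, not the patent's training code):

```python
def train(data, lr=0.1, epochs=50, batch=4):
    """Toy end-to-end loop: one scalar 'network' parameter w is fit by
    minimizing squared reconstruction error over mini-batches; the
    analytic gradient stands in for backpropagation."""
    w = 0.0
    for _ in range(epochs):
        for i in range(0, len(data), batch):
            xs = data[i:i + batch]
            # loss = mean (w*x - x)^2 ; d loss/dw = mean 2*(w*x - x)*x
            grad = sum(2 * (w * x - x) * x for x in xs) / len(xs)
            w -= lr * grad
    return w

# the ideal 'reconstruction' here is the identity, so w should approach 1
print(round(train([0.2, 0.5, 0.8, 1.0, 0.3, 0.6, 0.9, 0.4]), 3))  # ~1.0
```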
Step S2: the multispectral image to be compressed is fed into the multispectral image compression network; the inter-spectral/spatial multidirectional convolution extraction module extracts its spectral and spatial features separately, the extracted-feature fusion module produces the fused spectral-spatial features, down-sampling reduces the feature-map size, the quantization layer removes data redundancy, and lossless entropy coding of the quantized intermediate feature data yields a compressed code stream for transmission and storage.
Step S21: the forward coding network extracts feature data from the input multispectral image, preserving its inter-spectral and spatial information, which helps reconstruct a high-quality image. Its structure is shown in fig. 4, where Conv denotes a convolutional layer and the parameters in parentheses after Conv denote the number of input channels, number of output channels, convolution kernel size, stride and padding, respectively. The down-sampling operation is implemented with stride-2 convolution, whose bracketed parameters denote kernel size, stride and padding. ReLU denotes the rectified linear unit; the inter-spectral and spatial convolution units have the structures shown in fig. 2 and fig. 3. The forward coding network proceeds as follows:
(1) Multispectral data of size H × W × C is input to the inter-spectral and spatial multidirectional convolution modules in parallel. The inter-spectral convolution module uses kernels of size 1 × 1 × 3, stride (1,1,1) and padding (0,0,1), where the padding of 1 lies along the inter-spectral dimension, and all input and output channels are 1. The spatial convolution module uses kernels of size 3 × 3 × 1, stride (1,1,1) and padding (1,1,0), where the padding of 1 lies along the spatial dimensions; the first convolution layer takes 7 or 8 input channels, determined by the number of bands of the input multispectral image, and outputs 56 channels, the middle units have 56 input and output channels, and the last convolution layer maps 56 channels back to 7 or 8. A ReLU activation adds a nonlinear relation between the layers of the neural network.
(2) After the multidirectional convolution modules extract features, the spatial features from the different directions are first added and fused pixel by pixel and reduced in dimension by a convolution with 7 or 8 input channels (determined by the number of bands of the input image), 4 output channels, kernel size 1 × 1, stride 1 and padding 0. They are then concatenated with the inter-spectral features, reduced by a convolution with the same 7 or 8 input channels, 3 or 4 output channels (for 7 or 8 bands respectively), kernel size 1 × 1, stride 1 and padding 0, giving the fused spectral-spatial features. These pass through one convolution layer (kernel 3 × 3, stride 1, padding 1) and a ReLU activation, then a down-sampling convolution (kernel 4 × 4, stride 2, padding 1), yielding 64 feature maps of size H/2 × W/2. Repeating the same process twice more yields 64 feature maps of size H/8 × W/8, reducing the spatial resolution of the image to 1/64.
(3) Finally, a convolution layer with 64 input channels, 48 output channels, kernel size 3 × 3, stride 1 and padding 1 extracts 48 intermediate feature maps of size H/8 × W/8, and a Sigmoid function maps the network output into [0, 1].
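The size chain H → H/2 → H/8 produced by the three stride-2, 4 × 4 down-sampling convolutions, and the resulting latent size, can be verified with the output-size formula (the 512 × 512, 8-band, 16-bit input and q = 4 here are illustrative assumptions, not the patent's figures):

```python
def conv_out(size, kernel=4, stride=2, pad=1):
    """Output size of the 4x4, stride-2, padding-1 down-sampling convolution."""
    return (size + 2 * pad - kernel) // stride + 1

H = W = 512
size = (H, W)
for _ in range(3):                      # three down-sampling layers
    size = tuple(conv_out(s) for s in size)
print(size)                             # (64, 64) == (H/8, W/8)

# latent: 48 feature maps of H/8 x W/8 at q quantization bits, versus an
# 8-band 16-bit input -- a rough upper bound before entropy coding
q = 4
latent_bits = 48 * (H // 8) * (W // 8) * q
input_bits = 8 * H * W * 16
print(input_bits / latent_bits)
```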
Step S22: the quantization layer and the entropy coding layer mainly remove data redundancy; the encoded and quantized data form the compressed code stream. Because the derivative of the quantization function is discontinuous, it cannot be differentiated directly within the network, and the gradient would vanish during backpropagation and block the updating of the network parameters; the quantization function therefore needs to be approximated, as follows:
Xq = round[(2^q − 1) × Xs]
wherein Xs is the intermediate feature data obtained from the feature-extraction convolution layers after the Sigmoid function, and q is the quantization level.
With the approximated quantization function, the data are rounded during forward propagation, while during backpropagation the quantization layer is skipped and the gradient is passed directly to the previous layer. The quantized intermediate feature data Xq are then encoded into a binary code stream with ZPAQ lossless entropy coding. Entropy decoding restores the code stream to the quantized intermediate feature data Xq, after which the inverse-quantized intermediate feature data Xq/(2^q − 1) are input into the reverse decoding network.
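A minimal PyTorch sketch of this straight-through approximation — round during the forward pass, identity gradient during backpropagation — together with the matching inverse quantization:

```python
import torch


class RoundSTE(torch.autograd.Function):
    """round() with a straight-through gradient: the backward pass skips
    the quantization step and hands the incoming gradient directly to the
    previous layer (the true derivative of round() is zero a.e.)."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out  # pass the gradient through unchanged


def quantize(xs, q=8):
    # X_q = round[(2^q - 1) * X_s], with X_s in [0, 1] from the Sigmoid
    return RoundSTE.apply((2 ** q - 1) * xs)


def dequantize(xq, q=8):
    # inverse quantization before the reverse decoding network
    return xq / (2 ** q - 1)
```

Gradients still flow through the scaling factor (2^q − 1); only the non-differentiable rounding is bypassed.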
Step S3: entropy decoding and inverse quantization are carried out on the received code stream to obtain multispectral image spectral space characteristic data, the multispectral image spectral space characteristic data are sent to a reverse decoding network to reconstruct the multispectral image, and different compression network models are trained by changing parameters to realize the multi-code rate compression of the multispectral image.
Step S31: the reverse decoding network reconstructs the intermediate feature maps into a multispectral image; its structure, shown in fig. 5, is symmetric to that of the forward coding network. Conv denotes a convolution layer, and the parameters in parentheses after Conv give the number of input channels, the number of output channels, the convolution kernel size, the stride and the padding respectively. ReLU denotes a linear rectification unit; the inter-spectral and spatial convolution units have the structures shown in figs. 2 and 3 respectively; PixelShuffle denotes an upsampling function used to restore the feature-map size. The specific decoding process of the reverse decoding network is as follows:
(1) firstly, 48 feature map data with the size of H/8 xW/8 are subjected to convolution layers with the convolution kernel size of 3 x 3, the step size of 1 and the padding size of 1 to obtain 64 feature maps with the size of H/8 xW/8;
(2) a PixelShuffle layer yields 16 feature maps of size H/4 × W/4; a convolution layer with kernel size 3 × 3, stride 1 and padding 1 followed by a ReLU activation then yields 64 feature maps of size H/4 × W/4; repeating this upsampling-and-convolution process twice more yields 64 feature maps of size H × W;
(3) the data are then fed separately into the inter-spectral direction convolution module and the spatial direction convolution modules. The inter-spectral module uses convolution layers with kernel size 1 × 1 × 3, stride (1,1,1) and padding (0,0,1) (the 1 in the padding falls on the spectral dimension), with input and output channels both 1. In the spatial modules the kernel size is 3 × 3 × 1, the stride is (1,1,1) and the padding is (1,1,0) (the 1s in the padding fall on the spatial dimensions); the first convolution layer has 7 or 8 input channels, determined by the number of bands of the input image, and 56 output channels; the middle units have 56 input and output channels; and the last convolution layer has 56 input channels and 7 or 8 output channels. The whole process mirrors the forward coding network, performing the equivalent of a deconvolution on the feature-map data;
(4) finally, the outputs of the spatial direction convolution modules are added and reduced in dimension, then concatenated with the dimension-reduced inter-spectral features, and the reconstruction yields a restored image of the same size as the original.
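The decoder head and one upsampling stage from steps (1) and (2) can be sketched in PyTorch. This is a minimal sketch with layer widths taken from the text; weight initialization and the later multidirectional units of step (3) are omitted:

```python
import torch
import torch.nn as nn

# decoder head: 48 intermediate feature maps -> 64 maps at the same H/8 x W/8 size
decoder_head = nn.Conv2d(48, 64, kernel_size=3, stride=1, padding=1)


class UpsampleBlock(nn.Module):
    """One decoder stage: PixelShuffle(2) rearranges (N, 64, H, W) into
    (N, 16, 2H, 2W), then a 3x3 convolution restores 64 feature maps."""
    def __init__(self):
        super().__init__()
        self.shuffle = nn.PixelShuffle(2)
        self.conv = nn.Conv2d(16, 64, kernel_size=3, stride=1, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.conv(self.shuffle(x)))
```

Applying decoder_head followed by three UpsampleBlocks takes the 48 maps of size H/8 × W/8 back to 64 maps of size H × W, mirroring the three downsampling steps of the encoder.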
Step S32: to further optimize the performance of the multispectral compression network, the code rate should be reduced while preserving the quality of the recovered image as far as possible, so a balance must be struck between code rate and image-quality loss. To obtain models at different code rates, the loss function adopted here is:
L=Ld+λLr
wherein Ld is the distortion loss (Distortion Loss), measured by the mean-square error (MSE); λ is a penalty weight used to explicitly control the code rate, i.e. the larger its value, the smaller the code rate; Lr is the entropy approximation of the quantized feature-map data, which reflects how concentrated the distribution of the intermediate feature data is;
Ld is calculated as:

Ld = (1/(N·H·W·C)) Σ (I(x, y, z) − Î(x, y, z))²

summed over the batch and over all positions (x, y, z), where H, W and C denote the height, width and number of channels of the image, N denotes the batch size, I(x, y, z) denotes the pixel value at spatial position (x, y, z) of the input image, and Î(x, y, z) denotes the pixel value at the same position of the image restored by the compression network.
Lr is calculated as:

Lr = −E[ log2 Pq ]

wherein Pq is the probability of each quantized symbol, obtained by integrating pd(x), the probability density function of the intermediate data, over the corresponding quantization interval. Introducing the entropy of the intermediate feature data into the loss function lets the network continuously optimize and concentrate the distribution of the intermediate feature data during learning, improving the compression performance of the network.
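The rate-distortion loss L = Ld + λLr can be sketched in NumPy. The entropy term here is estimated from an empirical histogram of the quantized symbols — an illustrative stand-in for the differentiable entropy approximation the network actually trains with:

```python
import numpy as np


def distortion_loss(img, rec):
    """L_d: mean-squared error over batch, height, width and channels."""
    return np.mean((img - rec) ** 2)


def rate_loss(xq):
    """L_r = -E[log2 P_q]: empirical entropy (in bits per symbol) of the
    quantized feature-map symbols, estimated from their histogram."""
    _, counts = np.unique(xq, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))


def total_loss(img, rec, xq, lam=0.01):
    # L = L_d + lambda * L_r  (a larger lambda drives the rate down)
    return distortion_loss(img, rec) + lam * rate_loss(xq)
```

A flat symbol histogram gives the maximum entropy; as training concentrates the intermediate feature distribution, rate_loss (and with it the code rate) falls.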
Step S33: the goal of the multispectral image compression network is to make the recovered image as close as possible to the input image and to retain the information of the original image. In the network, the optimal network parameters are obtained by learning and training so as to minimize the loss function, expressed by the formula:
θ* = arg minθ L( x, Re(Qu(xm(θ))) )
wherein,

xm(θ) = Se(θ1, x) + Sa1(θ2, x) + Sa2(θ3, x)

θ = (θ1, θ2, θ3)

where x denotes the input image, θ denotes the parameters of the network, and θ1, θ2, θ3 are the parameters of the convolutional network in each direction; Se(), Sa1() and Sa2() denote the inter-spectral direction, spatial direction 1 and spatial direction 2 convolutional networks respectively, Qu() denotes quantization coding, and Re() denotes the decoding and reconstruction network. The specific process of network parameter optimization is as follows:
When network training starts, the network parameters are initialized randomly; the input passes through the forward coding network and quantization and enters the decoding network; the decoding end reconstructs an image from these parameters; the pixel values of the reconstructed image and of the original image are fed into the loss function, the error between them is calculated, and the parameters are updated by the back-propagation algorithm until the loss function reaches its minimum. The input of Lr is the unquantized output of the coding network; adding the information entropy to the loss function makes the distribution of the image data more compact and effectively reduces the code rate.
The effect of the present invention will be further described with reference to simulation experiments.
The hardware test platform of the invention is: GPU (NVIDIA GeForce GTX 2080TI), memory (32 GB), hard disk (Samsung SM871 2.5-inch 7 mm 256 GB solid-state drive).
The software platform is as follows: windows 764 bit operating system, pytorch1.2.0, matlab.
The training and test sets of the multispectral image compression network are derived from multispectral images of the Landsat8 and WorldView-3 satellites, containing 7 and 8 spectral bands respectively. To prevent the network from overfitting, the training set includes multispectral images taken across different seasons, weather conditions and terrain types, so that it contains rich and varied features. The images are cut into 128 × 128 blocks: 80000 blocks of 7-band data (Landsat8) and 17000 blocks of 8-band data (WorldView-3) are used as training sets. The test sets are selected by the same standard: the 7-band test set contains 17 images of size 512 × 512 and the 8-band test set contains 14 images of size 512 × 512. The training and test sets share no images, i.e. the multispectral image data selected for the test sets take no part in training.
The Adam optimizer is used for network training with an initial learning rate of 0.0001, which lets the network converge quickly and produces a pre-trained model; once convergence slows, the learning rate is reduced to 0.00001 and training continues to obtain the required model. For the rate-distortion optimization, following an easy-to-hard learning strategy, the weight λ on Lr is set to 0 in the initial stage; after the network has sufficiently converged, λ is increased gradually so that the distribution of the intermediate feature-map data becomes progressively more concentrated, which markedly improves the compression performance.
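This staged schedule can be sketched as a small helper. The epoch threshold and the λ ramp values are illustrative assumptions — the text only specifies the two learning rates and that λ starts at 0 and grows after convergence:

```python
def lr_and_lambda(epoch, pretrain_epochs=50):
    """Easy-to-hard schedule sketched from the text (pretrain_epochs and the
    lambda ramp are assumed values): pretrain with lr 1e-4 and the rate term
    off (lambda = 0); afterwards drop the lr to 1e-5 and ramp lambda up so
    the intermediate feature distribution is gradually concentrated."""
    if epoch < pretrain_epochs:
        return 1e-4, 0.0
    lam = min(0.01, 0.001 * (epoch - pretrain_epochs + 1))  # capped ramp
    return 1e-5, lam
```

The returned pair would be applied each epoch, e.g. by setting the Adam optimizer's learning rate and using λ in the loss L = Ld + λLr.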
The method is applied to compress multispectral images at different code rates: changing the penalty weight λ in the trained model changes the image compression ratio. The recovery of test images at 8 different code rates is simulated and compared with the existing JPEG2000 and 3D-SPIHT methods, and spectral similarity is introduced as a reference for checking how well the spectral information is preserved before and after compression.
FIGS. 6(a) and 6(b) compare the average PSNR of the present invention with JPEG2000 and 3D-SPIHT on the two data sets at different code rates; FIG. 7 compares the recovery of four selected 7-band multispectral images under the present invention and the two algorithms JPEG2000 and 3D-SPIHT; FIG. 8 shows the same comparison for four selected 8-band multispectral images; FIGS. 9(a) and 9(b) compare the spectral-similarity curves of the present invention with JPEG2000 and 3D-SPIHT on the two data sets.
The peak signal-to-noise ratio (PSNR) is the most widely used objective evaluation index for traditional image compression algorithms; it mainly measures the pixel change before and after compression and is calculated as:
PSNR = 10 × log10( (2^n − 1)² / MSE )

MSE = (1/(H·W·C)) Σ (I(x, y, z) − Î(x, y, z))²

wherein n denotes the bit depth of the image, H, W and C denote the height, width and number of channels of the image, I(x, y, z) denotes the pixel value at spatial position (x, y, z) of the input image, and Î(x, y, z) denotes the pixel value at the same position of the image restored by the compression network. As can be seen from fig. 6, the PSNR between the reconstructed and original images rises as the code rate increases, and at the same code rate the average PSNR of the present invention is higher than that of JPEG2000 and 3D-SPIHT; comparing figs. 6(a) and 6(b) further shows that the advantage of the present invention becomes more obvious as the number of bands increases. Taken over the 8 code rates compared, the PSNR of the present invention is on average about 3-4 dB higher than JPEG2000 and 1-2 dB higher than 3D-SPIHT on the 7-band data set, and 6-8 dB higher than JPEG2000 and 3-5 dB higher than 3D-SPIHT on the 8-band data set; the PSNR advantage of the present invention is most significant at code rates of 0.3 to 0.4.
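The PSNR computation described here translates directly into a short NumPy sketch (the default bit depth of 8 is an assumption):

```python
import numpy as np


def psnr(img, rec, n_bits=8):
    """Peak signal-to-noise ratio in dB between an n-bit image and its
    reconstruction; identical images give infinity."""
    mse = np.mean((img.astype(np.float64) - rec.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    peak = (2 ** n_bits - 1) ** 2  # (2^n - 1)^2, the squared peak value
    return 10 * np.log10(peak / mse)
```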
Figs. 7 and 8 compare the visual quality of the images reconstructed by the present invention, JPEG2000 and 3D-SPIHT on the 7-band and 8-band data sets against the originals, demonstrating the recovery quality more intuitively. The four comparison images show that the reconstruction quality of the present invention is clearly better than that of JPEG2000 and 3D-SPIHT: many texture details in the images recovered by the latter two algorithms are blurred, and the JPEG2000 reconstructions in particular show obvious blocking artifacts, while the images recovered by the present invention are sharper and preserve texture details more completely.
FIGS. 9(a) and 9(b) show the spectral-similarity curves on the two data sets: the spectral-similarity curve of the multispectral images restored by the present invention always stays below those of JPEG2000 and 3D-SPIHT, and the average spectral similarity of the multispectral image compression network is smaller than that of JPEG2000 and 3D-SPIHT at every code rate. The smaller the spectral-similarity value, the more alike the spectral curves, i.e. the closer the restored spectrum is to the original.
TABLE 1 average spectral similarity of 7 band test data sets at different code rates
[Table 1 is rendered as an image in the original publication.]
TABLE 2 average spectral similarity of 8 band test data sets at different code rates
[Table 2 is rendered as an image in the original publication.]
Table 1 gives the average spectral similarity of the 7-band test data set at different code rates, and Table 2 that of the 8-band test data set. The overall experimental data show that the algorithm outperforms JPEG2000 and 3D-SPIHT on the spectral-similarity index, i.e. it preserves the inter-spectral information more effectively and more completely.

Claims (7)

1. A multispectral image compression method based on a multidirectional convolutional neural network is characterized by comprising the following steps:
(1) constructing a multispectral image compression network, training the multispectral image compression network, and optimizing network parameters to obtain an optimal multispectral image compression network model;
(2) sending a multispectral image to be compressed into a multispectral image compression network, respectively extracting spectral features and spatial features of the image from different directions through multidirectional convolution, reducing the dimension and fusing, reducing the size of the feature image by using down-sampling, removing data redundancy through a quantization module, and performing lossless entropy coding on quantized feature data to obtain a compressed code stream for transmission and storage;
(3) entropy decoding and inverse quantization are carried out on the received compressed code stream to obtain multispectral image spectral space characteristic data, then the multispectral image spectral space characteristic data is input into a reverse decoding network, the characteristic image size is restored through up-sampling, corresponding spectral and spatial characteristics are restored through the reverse decoding network, reconstructed multispectral images are obtained after dimension reduction and fusion are carried out, different network models are obtained through parameter changing training, and multi-code rate compression of the multispectral images is achieved.
2. The multi-directional convolutional neural network-based multi-spectral image compression method of claim 1, wherein in step (1), a linear rectification function is used in the multi-spectral image compression network, and the expression of the linear rectification function is as follows:
ReLU(xi)=max(0,xi)
wherein xi is the data of the i-th channel; the function maps the input into two segments: an input value less than 0 is mapped to 0, while an input value greater than 0 is passed through unchanged, and its derivative shows that no gradient is lost in the backward computation.
3. The multi-directional convolutional neural network-based multi-spectral image compression method of claim 1, wherein in step (2), the quantization function is approximated by the following formula:
Xq=round[(2q-1)×Xs]
wherein Xs is the intermediate feature data and q is the quantization level; the approximated quantization function rounds the data during forward propagation, skips the quantization layer during backpropagation, and passes the gradient directly to the previous layer.
4. The multi-direction convolutional neural network-based multi-spectral image compression method of claim 1, wherein in step (3), there are two ways to achieve multi-rate compression of the multi-spectral image:
(a) fixing the penalty weight λ in rate-distortion optimization, and training compression networks at different code rates by changing the number of neurons in the middle convolutional layer, wherein the fewer the neurons, the smaller the resulting code rate;
(b) and fixing the number of neurons in the middle convolutional layer, and training to obtain compression networks with different code rates by changing the value of the penalty weight lambda in rate distortion optimization, wherein the larger the lambda is, the smaller the obtained code rate is.
5. The system for implementing the multi-directional convolutional neural network-based multi-spectral image compression method according to claim 1, comprising: a forward coding network, a quantization module, an entropy coding module, an entropy decoding module, an inverse quantization module and a reverse decoding network; the forward coding network comprises an inter-spectral/spatial multidirectional convolution extraction module, an extracted-feature fusion module and a down-sampling module; the reverse decoding network comprises an up-sampling module, an inter-spectral/spatial multidirectional convolution recovery module and a recovered-feature fusion module;
the inter-spectrum/space multi-direction convolution extraction module is used for convolution extracting the inter-spectrum characteristics and the space characteristics of the multi-spectrum image from different directions respectively; the extracted feature fusion module performs pixel-by-pixel fusion and dimensionality reduction on the extracted spatial features, and then the extracted spatial features are connected in parallel with the inter-spectrum features subjected to dimensionality reduction to obtain a feature map; a down-sampling module reduces the size of the feature map; the quantization module and the entropy coding module quantize and entropy code the characteristic diagram to obtain a compressed code stream for transmission and storage; the entropy decoding module and the inverse quantization module carry out entropy decoding and inverse quantization on the compressed code stream to obtain characteristic data of the multispectral image; the up-sampling module recovers the size of the characteristic graph; the spectrum/space multidirectional convolution recovery module recovers corresponding characteristics of the characteristic diagram; and the recovery feature fusion module fuses the spatial features pixel by pixel, reduces the dimension and then connects the spatial features in parallel with the inter-spectral features subjected to the same dimension reduction to obtain a reconstructed multi-spectral image.
6. The multi-directional convolutional neural network-based multispectral image compression system of claim 5, wherein the forward coding network comprises 8 inter-spectral direction convolution modules, 8 spatial direction 1 convolution modules, 8 spatial direction 2 convolution modules, and 3 down-sampling modules; the downsampling operation uses convolution layers with a step size of 2 and a convolution kernel size of 4 x 4.
7. The multi-directional convolutional neural network based multispectral image compression system of claim 5, further comprising a rate-distortion optimization module; the rate distortion optimization module introduces the entropy value of the intermediate characteristic data into the loss function calculation of the compression network, and the distribution of the spectral characteristic data and the spatial characteristic data is continuously optimized through training so that the rate distortion optimization module is more compact, and the loss function is expressed as:
L=Ld+λLr
wherein Ld is the distortion loss, expressed using the mean square error (MSE); λ is a penalty weight used to explicitly control the code rate; Lr is the entropy approximation of the quantized feature-map data, reflecting how concentrated the distribution of the intermediate feature data is;
Ld is calculated as:

Ld = (1/(N·H·W·C)) Σ (I(x, y, z) − Î(x, y, z))²

wherein I denotes the input image, Î denotes the image restored by the compression network, N denotes the batch data size, and H, W, C denote the height, width and number of channels of the input image respectively;
Lr is calculated as:

Lr = −E[ log2 Pq ]

wherein Pq is the probability of each quantized symbol, obtained by integrating Pd(x), the probability density function of the intermediate data, over the corresponding quantization interval.
CN202110829480.2A 2021-07-22 2021-07-22 Multispectral image compression method and system based on multidirectional convolutional neural network Pending CN113554720A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110829480.2A CN113554720A (en) 2021-07-22 2021-07-22 Multispectral image compression method and system based on multidirectional convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110829480.2A CN113554720A (en) 2021-07-22 2021-07-22 Multispectral image compression method and system based on multidirectional convolutional neural network

Publications (1)

Publication Number Publication Date
CN113554720A true CN113554720A (en) 2021-10-26

Family

ID=78104047

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110829480.2A Pending CN113554720A (en) 2021-07-22 2021-07-22 Multispectral image compression method and system based on multidirectional convolutional neural network

Country Status (1)

Country Link
CN (1) CN113554720A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114201118A (en) * 2022-02-15 2022-03-18 北京中科开迪软件有限公司 Storage method and system based on optical disk library
CN114549926A (en) * 2022-01-24 2022-05-27 北京百度网讯科技有限公司 Target detection and target detection model training method and device
CN114584789A (en) * 2022-04-28 2022-06-03 南通裕荣电子商务有限公司 Data compression transmission method based on image frequency spectrum
CN114581544A (en) * 2022-05-09 2022-06-03 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image compression method, computer device and computer storage medium
CN114615507A (en) * 2022-05-11 2022-06-10 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image coding method, decoding method and related device
CN115052148A (en) * 2022-07-21 2022-09-13 南昌工程学院 Image compression algorithm based on model segmentation compression self-encoder
CN115866253A (en) * 2023-02-27 2023-03-28 鹏城实验室 Self-modulation-based inter-channel transformation method, device, terminal and medium
CN115994983A (en) * 2023-03-24 2023-04-21 湖南大学 Medical hyperspectral reconstruction method based on snapshot type coded imaging system
WO2023241188A1 (en) * 2022-06-13 2023-12-21 北华航天工业学院 Data compression method for quantitative remote sensing application of unmanned aerial vehicle

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7224845B1 (en) * 2002-02-28 2007-05-29 Bae Systems Information And Electric Systems Integration Inc. Bijection mapping for compression/denoising of multi-frame images
CN111754592A (en) * 2020-03-31 2020-10-09 南京航空航天大学 End-to-end multispectral remote sensing image compression method based on characteristic channel information
CN112734867A (en) * 2020-12-17 2021-04-30 南京航空航天大学 Multispectral image compression method and system based on space spectrum feature separation and extraction



Similar Documents

Publication Publication Date Title
CN113554720A (en) Multispectral image compression method and system based on multidirectional convolutional neural network
CN112734867B (en) Multispectral image compression method and multispectral image compression system based on spatial spectrum feature separation and extraction
CN112203093B (en) Signal processing method based on deep neural network
CN110751597B (en) Video super-resolution method based on coding damage repair
CN110971901B (en) Processing method, device and equipment of convolutional neural network and storage medium
CN109903351B (en) Image compression method based on combination of convolutional neural network and traditional coding
CN111754592A (en) End-to-end multispectral remote sensing image compression method based on characteristic channel information
CN114449276B (en) Super prior side information compensation image compression method based on learning
CN113822954B (en) Deep learning image coding method for man-machine cooperative scene under resource constraint
CN115131675A (en) Remote sensing image compression method and system based on reference image texture migration
Hu et al. An adaptive two-layer light field compression scheme using GNN-based reconstruction
CN111340901A (en) Compression method of power transmission network picture in complex environment based on generating type countermeasure network
CN115955563A (en) Satellite-ground combined multispectral remote sensing image compression method and system
CN112188217B (en) JPEG compressed image decompression effect removing method combining DCT domain and pixel domain learning
JP2003188733A (en) Encoding method and arrangement
CN115604485A (en) Video image decoding method and device
CN117441186A (en) Image decoding and processing method, device and equipment
CN113362239A (en) Deep learning image restoration method based on feature interaction
CN115460415B (en) Video compression method for man-machine hybrid vision
WO2023197784A1 (en) Image processing method and apparatus, device, storage medium, and program product
CN116418990A (en) Method for enhancing compressed video quality based on neural network
WO2023225808A1 (en) Learned image compress ion and decompression using long and short attention module
CN111080729A (en) Method and system for constructing training picture compression network based on Attention mechanism
CN115512199A (en) Image compression model based on graph attention and asymmetric convolution network
Kumar et al. Vector quantization with codebook and index compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination