CN111009018A - Image dimensionality reduction and reconstruction method based on deep neural network - Google Patents

Image dimensionality reduction and reconstruction method based on deep neural network

Info

Publication number
CN111009018A
CN111009018A (application CN201911347254.XA; published as CN 111009018 A)
Authority
CN
China
Prior art keywords
image
network
dimensionality reduction
neural network
deep neural
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911347254.XA
Other languages
Chinese (zh)
Inventor
侯兴松 (Hou Xingsong)
康越 (Kang Yue)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Tianbiyou Technology Co ltd
Original Assignee
Suzhou Tianbiyou Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Tianbiyou Technology Co ltd filed Critical Suzhou Tianbiyou Technology Co ltd
Priority to CN201911347254.XA priority Critical patent/CN111009018A/en
Publication of CN111009018A publication Critical patent/CN111009018A/en
Withdrawn legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00: Image coding
    • G06T9/001: Model-based coding, e.g. wire frame
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00: Image coding
    • G06T9/002: Image coding using neural networks

Abstract

The invention discloses an image dimensionality reduction and reconstruction method based on a deep neural network: the encoding end reduces the dimensionality of the image, shrinking the bit stream produced by image compression and saving bandwidth, while the decoding end reconstructs the image. The method combines the discrete wavelet transform with deep learning to exploit their joint potential for dimensionality reduction and reconstruction performance. The transform, quantization and entropy-coding modules are trained together so that they influence one another and are jointly optimized, driving the encoder as close to optimal performance as possible. Beyond using a neural network to realize dimensionality reduction and reconstruction, the method embeds discrete wavelet transforms inside the network. To provide a more accurate codeword distribution during rate-distortion optimization, a context network is combined with a hyperprior network; the hyperprior network corrects the context network's predictions, and together they produce more accurate mean and standard-deviation parameters.

Description

Image dimensionality reduction and reconstruction method based on deep neural network
Technical Field
The invention relates to the field of image reconstruction, in particular to a method for reducing dimension and reconstructing an image based on a deep neural network.
Background
An image is a faithful, vivid description of an objective object: a comparatively intuitive representation that carries the relevant information about what it depicts.
With the development of the information age, the volume of image data keeps growing while network bandwidth remains limited, so reducing the bandwidth needed to transmit an image by reducing its dimensionality is particularly important. Image data can be reduced in dimension because it contains redundancy, mainly: spatial redundancy, from correlation between adjacent pixels; temporal redundancy, from correlation between different frames of an image sequence; and spectral redundancy, from correlation between bands. The goal of image dimensionality reduction is to cut the number of bits required to represent an image by removing these data redundancies.
The wavelet transform is an active research direction for image dimensionality reduction and reconstruction, and the invention combines the discrete wavelet transform with deep learning to exploit their joint potential for dimensionality reduction and reconstruction performance.
Disclosure of Invention
The invention aims to provide an image dimension reduction and reconstruction method based on a deep neural network.
To this end, the invention designs an image dimensionality reduction and reconstruction method based on a deep neural network, which constructs a dimensionality reduction and reconstruction network framework comprising an encoding end and a decoding end. The encoding end reduces the dimensionality of the image, shrinking the bit stream produced by image compression and saving bandwidth; the decoding end reconstructs the image. The method comprises the following steps:
S1: at the encoding end, the image to be encoded is input into a convolutional neural network containing a discrete wavelet transform (DWT) to obtain a low-resolution image y that preserves structural information;
S2: the low-resolution image y is quantized to obtain a codeword ŷ; the codeword is processed by an entropy coder to obtain an entropy-coded code stream file;
the quantized codewords are probability-modelled with a Gaussian mixture model to control the code rate; context and prior information are introduced, and the hyperprior network learns the mean and standard deviation of the probability distribution based on context;
S3: at the decoding end, the decoded codeword is input into a convolutional network containing an integer wavelet transform (IWT) to obtain a reconstruction of the original image; the integer wavelet transform is the inverse of the discrete wavelet transform.
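The DWT in step S1 and its exact inverse IWT in step S3 can be illustrated with a one-dimensional integer Haar lifting step. The patent does not name the wavelet family or lifting scheme at this level, so Haar is an assumption; the point is only that the inverse lifting recovers the input exactly despite integer rounding.

```python
# Illustrative 1-D integer Haar lifting: the inverse transform (IWT) exactly
# undoes the forward transform (DWT), even though both use integer rounding.

def haar_lift_forward(x):
    """Split x (even length) into approximation s and detail d via lifting."""
    even, odd = x[0::2], x[1::2]
    d = [o - e for e, o in zip(even, odd)]          # predict step
    s = [e + (di >> 1) for e, di in zip(even, d)]   # integer update step
    return s, d

def haar_lift_inverse(s, d):
    """Reverse the lifting steps to recover the original samples exactly."""
    even = [si - (di >> 1) for si, di in zip(s, d)]
    odd = [e + di for e, di in zip(even, d)]
    x = []
    for e, o in zip(even, odd):
        x += [e, o]
    return x
```

Because each lifting step adds back exactly what it subtracted, perfect reconstruction holds for any integer input.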
Further, in step S1, the input image (range [0,255]) is normalized to [-1.0, +1.0], and then passed through a convolution-DWT-convolution transform to obtain the transformed feature codeword y.
Further, in step S1, the activation function applied after each convolution layer is GDN;
the forward GDN transform is:

    u_i(m,n) = w_i(m,n) / ( β_i + Σ_j γ_ij · w_j(m,n)² )^(1/2)

where i and j are channel indices; w_i(m,n) is the feature value of the i-th channel at spatial position (m,n); β_i and γ_ij are learnable parameters of the GDN transform; and u_i(m,n) is the feature obtained by applying GDN to w_i(m,n).
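A minimal numpy sketch of this divisive-normalization activation; β and γ stand in for parameters that would be learned during training:

```python
import numpy as np

# Sketch of the GDN activation: each channel is divided by a learned
# combination of the squared activations of all channels at the same position.

def gdn(w, beta, gamma):
    """w: (C, H, W) features; beta: (C,); gamma: (C, C).
    Computes u_i(m,n) = w_i(m,n) / sqrt(beta_i + sum_j gamma_ij * w_j(m,n)^2)."""
    C, H, W = w.shape
    sq = (w ** 2).reshape(C, -1)                  # squared features, (C, H*W)
    denom = np.sqrt(beta[:, None] + gamma @ sq)   # per-position normalizer
    return (w.reshape(C, -1) / denom).reshape(C, H, W)
```

With β = 1 and γ = 0 the transform reduces to the identity, which is a convenient sanity check.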
Further, in step S2, a hyperprior network and a context model are combined to learn the mean and standard deviation of the probability distribution;
the hyperprior network sacrifices a small number of extra codewords to supply parameter information to the entropy coder for the transformed codewords, further removing redundancy between codewords; the context model predicts the entropy coder's probability-model parameters from already-decoded codewords, thereby saving codewords; combining the two supplies parameter information to the entropy codec more efficiently.
Further, in step S3, the input is passed through a deconvolution-IWT-deconvolution transform to obtain a decoded version of the original image, whose range is then normalized back to [0,255] to give the final decoded image.
Further, in step S3, the activation function applied after each deconvolution layer is IGDN;
the IGDN transform is:

    u_i(m,n) = w_i(m,n) · ( β_i + Σ_j γ_ij · w_j(m,n)² )^(1/2)

where i and j are channel indices; w_i(m,n) is the feature value of the i-th channel at spatial position (m,n); β_i and γ_ij are learnable parameters of the IGDN transform; and u_i(m,n) is the feature obtained by applying IGDN to w_i(m,n).
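A numpy sketch of the IGDN activation just defined. Note that a single IGDN pass with parameters shared from a GDN layer is only an approximate inverse; in practice the decoder's β and γ are learned independently. The parameter values below are illustrative:

```python
import numpy as np

# Sketch of the IGDN activation: the multiplicative counterpart of GDN,
# rescaling each channel by the learned normalization pool.

def igdn(u, beta, gamma):
    """u: (C, H, W); beta: (C,); gamma: (C, C).
    Computes w_i(m,n) = u_i(m,n) * sqrt(beta_i + sum_j gamma_ij * u_j(m,n)^2)."""
    C, H, W = u.shape
    sq = (u ** 2).reshape(C, -1)                  # squared features, (C, H*W)
    scale = np.sqrt(beta[:, None] + gamma @ sq)   # per-position multiplier
    return (u.reshape(C, -1) * scale).reshape(C, H, W)
```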
Furthermore, the invention performs effective rate-distortion optimization;
the parameters of the image dimensionality reduction and reconstruction network are obtained by network training. During training, effective rate-distortion optimization requires estimating the code rate of the codewords, so that the encoder's rate can be controlled. The training loss is therefore set to:
    L = λ·D + R

where λ controls the model's code rate; D is the mean square error (MSE) between the original image x and the reconstructed image x̂; and R is the code rate, computed as:

    R = E[ -log₂ p(ŷ | ẑ) ] + E[ -log₂ p(ẑ) ]

The distribution parameters of p(ŷ | ẑ) are obtained by combining the hyperprior network and the context network; its mean and standard deviation are denoted μ and δ. The prior information ẑ is constrained to follow a Gaussian distribution with zero mean and a learnable standard deviation.
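A numerical sketch of this loss, assuming a univariate discretized Gaussian for each codeword; here μ and δ stand in for the outputs of the parameter estimation network, and λ is the rate weight:

```python
import numpy as np
from math import erf, sqrt, log2

# Sketch of L = lambda*D + R: D is the MSE distortion and R is the summed
# self-information of the quantized codewords under a Gaussian integrated
# over each quantization bin [y - 0.5, y + 0.5).

def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def rate_bits(y_hat, mu, sigma):
    """Sum of -log2 p(y_hat) under the discretized Gaussian model."""
    bits = 0.0
    for y, m, s in zip(y_hat, mu, sigma):
        p = gaussian_cdf(y + 0.5, m, s) - gaussian_cdf(y - 0.5, m, s)
        bits += -log2(max(p, 1e-12))  # clamp avoids log of zero
    return bits

def rd_loss(x, x_rec, y_hat, mu, sigma, lam):
    d = float(np.mean((np.asarray(x) - np.asarray(x_rec)) ** 2))  # distortion D
    r = rate_bits(y_hat, mu, sigma)                               # rate R
    return lam * d + r
```

A zero-mean codeword under a unit Gaussian costs roughly 1.4 bits, and a perfect reconstruction makes the loss equal to the rate alone.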
Compared with the prior art, the invention has the following advantages and characteristics:
conventional image downscaling and reconstruction algorithms, such as JPEG, JPEG2000, BPG, which use fixed transforms, i.e. discrete cosine transform and discrete wavelet transform, in combination with quantization and entropy coders, reduce the spatial redundancy of the image. These conventional image encoders mainly optimize each module inside the encoder, such as transform, quantization, and entropy coding, respectively. Aiming at the end-to-end mode of the deep neural network, the invention combines the templates of transformation, quantization and entropy coding together, so that the functional modules are mutually influenced, and the coding steps of transformation, quantization and the like are jointly optimized, so that the performance of the coder is optimal as far as possible.
The invention not only uses a neural network to realize dimensionality reduction and reconstruction of the image, but also embeds the discrete wavelet transform inside that network.
To provide a more accurate codeword distribution during rate-distortion optimization, the invention combines a context network with a hyperprior network; the hyperprior network corrects the context network's predictions, and together they produce more accurate mean and standard-deviation parameters.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a diagram of a masked convolution for implementing a context network.
Detailed Description
The embodiments of the present invention are described below with reference to the accompanying drawings. The following examples only illustrate the technical solutions of the invention more clearly and do not limit its protection scope.
Referring to FIG. 1, the invention constructs an image dimensionality reduction and reconstruction network framework based on a deep neural network, comprising: an encoder, a decoder, quantization, a hyperprior encoder, a hyperprior decoder, a context network, an entropy-coder parameter estimation network, an entropy encoder and an entropy decoder.
Based on the network framework, the invention provides an image dimension reduction and reconstruction method, which comprises the following steps:
S1: at the encoding end, the image to be encoded is input into a convolutional neural network containing a discrete wavelet transform (DWT) to obtain a low-resolution image y that preserves structural information;
S2: the low-resolution image y is quantized to obtain a codeword ŷ; the codeword is processed by an entropy coder to obtain an entropy-coded code stream file;
the quantized codewords are probability-modelled with a Gaussian mixture model to control the code rate; context and prior information are introduced, and the hyperprior network learns the mean and standard deviation of the probability distribution based on context;
S3: at the decoding end, the decoded codeword is input into a convolutional network containing an integer wavelet transform (IWT) to obtain a reconstruction of the original image; the integer wavelet transform is the inverse of the discrete wavelet transform.
The specific steps are as follows:
1) the image to be encoded is preprocessed, normalizing its range to [-1, +1], to obtain the preprocessed image x;
2) x is fed into the encoder to obtain the encoder output y, as follows:
2.1) x passes through a convolution layer with kernel size 5x5, 128 channels and stride 2; the result is fed to the first GDN layer;
2.2) after the first GDN, the output is sent through a DWT layer and then the second GDN;
2.3) the output enters a second convolution layer with kernel size 5x5, 512 channels and stride 2, followed by the third GDN;
2.4) the output is sent to the final convolution layer, kernel size 5x5, 192 channels, stride 2, yielding the unquantized coded codeword y;
3) y is fed into the encoder of the hyperprior network to obtain the hyperprior codeword z;
this encoding comprises three convolution layers, each with kernel size 5x5, 128 channels and stride 2;
4) the codeword z is quantized by rounding to obtain the quantized codeword ẑ;
5) ẑ is written into the code stream file according to its entropy model, assuming ẑ obeys a Gaussian distribution with zero mean and a learnable standard deviation;
6) ẑ is sent to the decoder of the hyperprior network to obtain the parameter Φ;
this decoding comprises three deconvolution layers, the last of which has kernel size 5x5, 192 channels and stride 2;
7) the feature codeword y is quantized by rounding, and the quantized codeword ŷ is input into the context network to obtain the parameter information Θ;
the context model is realized by masked convolution: when the masked convolution layer generates the value at a point, the values to its right and below are masked out, and the current value is computed only from the values above and to the left, as shown in FIG. 2, where the grey position is the value being generated;
8) Φ and Θ are concatenated along the channel dimension and input into the entropy-coder parameter estimation network to obtain the estimated μ and δ;
9) ŷ is entropy coded under a Gaussian mixture probability model with mean μ and standard deviation δ and written into the code stream file; once writing completes, one image encoding is finished;
10) decoding mirrors encoding: the code stream file is read, and ẑ is decoded according to the entropy probability model of the codeword; ẑ is then sent to the decoder of the hyperprior network to obtain the parameter Φ; and the already-decoded information of ŷ is input into the context network to obtain the parameter Θ;
11) Φ and Θ are input into the entropy-coder parameter estimation network to obtain the probability-model parameters μ and δ of ŷ; μ and δ are fed to the entropy decoder in writing order, successively decoding the quantized feature codeword ŷ;
12) after entropy decoding finishes, ŷ is sent into the decoder network, yielding a decoded image whose range is normalized to [0,255] to give the final decoded image; specifically:
12.1) the quantized codeword ŷ is fed into the first deconvolution layer, kernel size 5x5, 192 channels, stride 2; the result is passed through the first IGDN;
12.2) the output continues through the second deconvolution layer and then the second IGDN;
12.3) the output of the second IGDN is sent into an IWT layer;
12.4) the result passes through one final deconvolution layer, kernel size 5x5, 1 channel, stride 2, producing the decoded image, whose range is normalized to [0,255] to obtain the final decoded image.
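The masked convolution used by the context model in step 7) (and FIG. 2) can be sketched as follows; the kernel size and values are illustrative, not taken from the patent:

```python
import numpy as np

# Sketch of the masked convolution: kernel entries at and after the centre
# (in raster order) are zeroed, so the prediction for a pixel uses only
# already-decoded neighbours above and to the left.

def causal_mask(k):
    """k x k mask with 1 at positions strictly before the centre in raster order."""
    m = np.zeros((k, k))
    c = k // 2
    m[:c, :] = 1      # all rows above the centre row
    m[c, :c] = 1      # same row, strictly left of the centre
    return m

def masked_conv_at(img, kernel, i, j):
    """Apply the masked kernel at (i, j); out-of-bounds neighbours count as 0."""
    k = kernel.shape[0]
    c = k // 2
    mk = kernel * causal_mask(k)
    acc = 0.0
    for di in range(k):
        for dj in range(k):
            r, s = i + di - c, j + dj - c
            if 0 <= r < img.shape[0] and 0 <= s < img.shape[1]:
                acc += mk[di, dj] * img[r, s]
    return acc
```

Because the mask zeroes the centre and everything after it, not-yet-decoded pixel values, however large, cannot influence the prediction.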
Unlike a conventional image encoder, the encoder and decoder parameters here are obtained by network training. During training, effective rate-distortion optimization requires estimating the code rate of the codewords. Existing rate-estimation schemes fall into two classes: one directly constrains the number of codewords, common in autoencoders whose output codewords are binary; the other assumes a distribution over the codewords and estimates the rate from their self-information, using this estimate as the approximate rate during training. The deep-learning entropy probability model in this invention is Gaussian.
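One training-time detail worth sketching: hard rounding has zero gradient almost everywhere, so learned codecs commonly substitute additive uniform noise during training (the U(-0.5, 0.5) proxy also appearing in claim 3) and switch back to true rounding when writing the bitstream. A minimal sketch:

```python
import random

# Quantization proxy for training vs. hard quantization for inference.
# During training, rounding is replaced by noise uniform on [-0.5, 0.5],
# which keeps the pipeline differentiable in a real network.

def quantize_train(y, rng=random):
    """Training proxy: add uniform noise in [-0.5, 0.5] to each codeword."""
    return [v + rng.uniform(-0.5, 0.5) for v in y]

def quantize_infer(y):
    """Inference: hard rounding actually used when writing the bitstream."""
    return [round(v) for v in y]
```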
The foregoing is only a preferred embodiment of the invention. Those skilled in the art can make various modifications and refinements without departing from the technical principle of the invention, and such modifications and refinements also fall within its protection scope.

Claims (10)

1. An image dimensionality reduction and reconstruction method based on a deep neural network, characterized in that an image dimensionality reduction and reconstruction network framework is constructed, comprising an encoding end and a decoding end; the encoding end reduces the dimensionality of the image, shrinking the bit stream produced by image compression and saving bandwidth; the decoding end reconstructs the image; the method comprises the following steps:
S1: at the encoding end, the image to be encoded is input into a convolutional neural network containing a discrete wavelet transform (DWT) to obtain a low-resolution image y that preserves structural information;
S2: the low-resolution image y is quantized to obtain a codeword ŷ; the codeword is processed by an entropy coder to obtain an entropy-coded code stream file;
the quantized codewords are probability-modelled with a Gaussian mixture model to control the code rate; context and prior information are introduced, and the hyperprior network learns the mean and standard deviation of the probability distribution based on context;
S3: at the decoding end, the decoded codeword is input into a convolutional network containing an integer wavelet transform (IWT) to obtain a reconstruction of the original image; the integer wavelet transform is the inverse of the discrete wavelet transform.
2. The deep neural network-based image dimensionality reduction and reconstruction method according to claim 1, wherein the step S1 includes the following specific steps:
1.1) the pixel-value range of the image to be encoded is normalized to [-1.0, 1.0], giving the preprocessed image x;
1.2) x undergoes one convolution layer that extracts relevant feature information, giving output x1:
x1 = F(x * w + b), where F is the activation function, x is the input, w is the weight, b is the bias, * denotes convolution, and x1 is the output;
1.3) x1 undergoes a DWT transform, giving output x2;
1.4) x2 undergoes n further convolution operations, giving the low-resolution image y of x.
3. The deep neural network-based image dimensionality reduction and reconstruction method according to claim 1, wherein the step S2 includes the following specific steps:
2.1) the low-resolution image y is input into the hyperprior network to obtain the output Φ;
2.2) y is quantized by rounding to give ŷ, approximated during training as
ŷ = y + U(-0.5, 0.5), where U(-0.5, 0.5) is the uniform distribution on the interval [-0.5, 0.5];
2.3) ŷ is input into the context network to obtain the output Θ;
when the context network generates the value at a point, the values to its right and below are masked, and the current value is computed only from the values above and to the left;
2.4) Φ and Θ are input together into the entropy-coder parameter estimation network and, after multilayer convolution, yield the parameters of the mixed Gaussian distribution, namely the mean μ and standard deviation δ;
2.5) the entropy encoder codes ŷ according to μ and δ to obtain the compressed bit stream.
4. The deep neural network-based image dimensionality reduction and reconstruction method according to claim 1, wherein the step S3 includes the following specific steps:
3.1) the entropy decoder decodes the compressed bit stream according to μ and δ to obtain ŷ;
3.2) ŷ undergoes m deconvolution operations to obtain a feature map x̃ whose channel count is a multiple of 4:
x̃ = Q(ŷ * w + b), where Q is the activation function, w is the weight and b is the bias;
3.3) x̃ undergoes an IWT transform to obtain the multi-channel result x̄;
3.4) x̄ undergoes a convolution with one output channel to obtain the decoded image of the original image x, whose pixel-value range is normalized to [0,255], giving the final decoded image x̂.
5. The deep neural network-based image dimensionality reduction and reconstruction method of claim 2, wherein:
the encoding end combines DWT and convolution operation;
in the step 1.2), the activation function is GDN;
in step 1.4), n is 2.
6. The deep neural network-based image dimensionality reduction and reconstruction method of claim 3, wherein:
in step 2.1), the hyperprior network sacrifices a small number of extra codewords to provide additional parameter information to the entropy encoder, further removing redundancy between codewords;
in step 2.3), the context network is realized by a masked convolution operation.
7. The deep neural network-based image dimensionality reduction and reconstruction method of claim 6, wherein the structure of the hyperprior network is based on convolution operations.
8. The deep neural network-based image dimensionality reduction and reconstruction method of claim 4, wherein:
the decoding end combines IWT and convolution operation;
in the step 3.2), the activation function is IGDN; m is 2.
9. The deep neural network-based image dimensionality reduction and reconstruction method according to claim 1, wherein, unlike a conventional image encoder structure, the encoder and decoder parameters in the image dimensionality reduction and reconstruction network are obtained by network training; and during training, for effective rate-distortion optimization, the code rate of the codewords is also estimated so that the encoder's rate can be controlled.
10. The image dimensionality reduction and reconstruction method based on the deep neural network of claim 1, wherein the hyperprior network and the context network are complementary when estimating the code rate of the codewords; adding context information to the hyperprior network causes no potential rate loss, and introducing the hyperprior information into the context network eliminates a certain amount of uncertainty.
CN201911347254.XA 2019-12-24 2019-12-24 Image dimensionality reduction and reconstruction method based on deep neural network Withdrawn CN111009018A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911347254.XA CN111009018A (en) 2019-12-24 2019-12-24 Image dimensionality reduction and reconstruction method based on deep neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911347254.XA CN111009018A (en) 2019-12-24 2019-12-24 Image dimensionality reduction and reconstruction method based on deep neural network

Publications (1)

Publication Number Publication Date
CN111009018A true CN111009018A (en) 2020-04-14

Family

ID=70117755

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911347254.XA Withdrawn CN111009018A (en) 2019-12-24 2019-12-24 Image dimensionality reduction and reconstruction method based on deep neural network

Country Status (1)

Country Link
CN (1) CN111009018A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112019865A (en) * 2020-07-26 2020-12-01 杭州皮克皮克科技有限公司 Cross-platform entropy coding method and decoding method for deep learning coding
CN112702600A (en) * 2020-12-29 2021-04-23 南京大学 Image coding and decoding neural network layered fixed-point method
CN113067832A (en) * 2021-03-29 2021-07-02 郑州铁路职业技术学院 Communication data encryption method based on block chain and artificial intelligence
CN113079377A (en) * 2021-04-01 2021-07-06 中国科学技术大学 Training method for depth image/video compression network
CN113537456A (en) * 2021-06-15 2021-10-22 北京大学 Depth feature compression method
CN113747163A (en) * 2021-08-17 2021-12-03 上海交通大学 Image coding and decoding method and compression method based on context reorganization modeling
CN113949867A (en) * 2020-07-16 2022-01-18 武汉Tcl集团工业研究院有限公司 Image processing method and device
CN114079771A (en) * 2020-08-14 2022-02-22 华为技术有限公司 Image coding and decoding method and device based on wavelet transformation
CN114866782A (en) * 2022-03-21 2022-08-05 上海工程技术大学 Video image processing method based on depth dimension-variable code rate control
WO2023124148A1 (en) * 2021-12-27 2023-07-06 上海商汤智能科技有限公司 Data processing method and apparatus, electronic device and storage medium
WO2023221590A1 (en) * 2022-05-16 2023-11-23 华为技术有限公司 Encoding method, decoding method, and electronic device
CN117336494A (en) * 2023-12-01 2024-01-02 湖南大学 Dual-path remote sensing image compression method based on frequency domain characteristics

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113949867B (en) * 2020-07-16 2023-06-20 武汉Tcl集团工业研究院有限公司 Image processing method and device
CN113949867A (en) * 2020-07-16 2022-01-18 武汉Tcl集团工业研究院有限公司 Image processing method and device
CN112019865A (en) * 2020-07-26 2020-12-01 杭州皮克皮克科技有限公司 Cross-platform entropy coding method and decoding method for deep learning coding
CN114079771A (en) * 2020-08-14 2022-02-22 华为技术有限公司 Image coding and decoding method and device based on wavelet transformation
CN112702600A (en) * 2020-12-29 2021-04-23 南京大学 Image coding and decoding neural network layered fixed-point method
CN113067832A (en) * 2021-03-29 2021-07-02 郑州铁路职业技术学院 Communication data encryption method based on block chain and artificial intelligence
CN113067832B (en) * 2021-03-29 2022-01-21 郑州铁路职业技术学院 Communication data encryption method based on block chain and artificial intelligence
CN113079377A (en) * 2021-04-01 2021-07-06 中国科学技术大学 Training method for depth image/video compression network
CN113537456A (en) * 2021-06-15 2021-10-22 北京大学 Depth feature compression method
CN113537456B (en) * 2021-06-15 2023-10-17 北京大学 Depth feature compression method
CN113747163A (en) * 2021-08-17 2021-12-03 上海交通大学 Image coding and decoding method and compression method based on context reorganization modeling
CN113747163B (en) * 2021-08-17 2023-09-26 上海交通大学 Image coding and decoding method and compression method based on context recombination modeling
WO2023124148A1 (en) * 2021-12-27 2023-07-06 上海商汤智能科技有限公司 Data processing method and apparatus, electronic device and storage medium
CN114866782A (en) * 2022-03-21 2022-08-05 上海工程技术大学 Video image processing method based on depth dimension-variable code rate control
WO2023221590A1 (en) * 2022-05-16 2023-11-23 华为技术有限公司 Encoding method, decoding method, and electronic device
CN117336494A (en) * 2023-12-01 2024-01-02 湖南大学 Dual-path remote sensing image compression method based on frequency domain characteristics
CN117336494B (en) * 2023-12-01 2024-03-12 湖南大学 Dual-path remote sensing image compression method based on frequency domain characteristics

Similar Documents

Publication Publication Date Title
CN111009018A (en) Image dimensionality reduction and reconstruction method based on deep neural network
Cheng et al. Learned image compression with discretized gaussian mixture likelihoods and attention modules
Minnen et al. Joint autoregressive and hierarchical priors for learned image compression
US6704718B2 (en) System and method for trainable nonlinear prediction of transform coefficients in data compression
CN111641832B (en) Encoding method, decoding method, device, electronic device and storage medium
CN103329522B (en) For the method using dictionary encoding video
Conoscenti et al. Constant SNR, rate control, and entropy coding for predictive lossy hyperspectral image compression
CN108174218B (en) Video coding and decoding system based on learning
JP7356513B2 (en) Method and apparatus for compressing neural network parameters
CN111314709A (en) Video compression based on machine learning
CN113747163B (en) Image coding and decoding method and compression method based on context recombination modeling
CN111246206A (en) Optical flow information compression method and device based on self-encoder
Li et al. Multiple description coding based on convolutional auto-encoder
CN114449276B (en) Super prior side information compensation image compression method based on learning
CN115988215A (en) Variable bit rate image compression method, system, device, terminal and storage medium
CN113438481B (en) Training method, image encoding method, image decoding method and device
CN111343458B (en) Sparse gray image coding and decoding method and system based on reconstructed residual
US9948928B2 (en) Method and apparatus for encoding an image
CN1848960B (en) Residual coding in compliance with a video standard using non-standardized vector quantization coder
CN112437300B (en) Distributed video coding method based on self-adaptive interval overlapping factor
US9501717B1 (en) Method and system for coding signals using distributed coding and non-monotonic quantization
EP3180862B1 (en) Method for coding pulse vectors using statistical properties
EP3180863B1 (en) Method for coding pulse vectors using statistical properties
Kumar et al. An optimized block estimation based image compression and decompression algorithm
Vasuki et al. Image compression using lifting and vector quantization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20200414)