CN110956671A - Image compression method based on multi-scale feature coding - Google Patents

Image compression method based on multi-scale feature coding

Info

Publication number
CN110956671A
CN110956671A
Authority
CN
China
Prior art keywords
image
resolution
layer
convolutional
size
Prior art date
Legal status
Granted
Application number
CN201911290877.8A
Other languages
Chinese (zh)
Other versions
CN110956671B (en)
Inventor
吴庆波
吴晨豪
李宏亮
孟凡满
许林峰
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201911290877.8A
Publication of CN110956671A
Application granted
Publication of CN110956671B
Active (current legal status)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 Image coding
    • G06T 3/00 Geometric image transformations in the plane of the image
    • G06T 3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T 3/4046 Scaling of whole images or parts thereof using neural networks
    • G06T 3/4053 Scaling of whole images or parts thereof based on super-resolution, i.e. the output image resolution being higher than the sensor resolution

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses an image compression method based on multi-scale feature coding. A selection vector is obtained by averaging the absolute values of the gradient maps of the image features over a training set, and this selection vector guides the features of different channels to select their coding resolution. At the decoding end, the features coded at low resolution are restored by a super-resolution network and finally recombined with the features coded at high resolution into a complete feature map, which is mapped back to the original image. The invention processes image features differentially according to their characteristics: features that are easy to recover from context information are transmitted at low resolution, saving bit rate, while complex, fine features are transmitted at high resolution, reducing the degree of loss.

Description

Image compression method based on multi-scale feature coding
Technical Field
The invention belongs to the technical field of image compression, and particularly relates to a design of an image compression method based on multi-scale feature coding.
Background
At present, many deep-learning-based methods have appeared in the field of image compression. For example, the feature extraction capability of a convolutional neural network is used to map an image to a feature space, the resulting features are quantized and entropy coded, and at the decoding end the code stream is entropy decoded and the features are mapped back to the original image with transposed convolutions. However, the features of different channels differ in complexity, and processing them identically wastes a large amount of bit rate on smooth features while damaging the fineness of complex features.
Disclosure of Invention
The invention aims to provide an image compression method based on multi-scale feature coding that processes image features differentially according to their characteristics: features that are easy to recover from context information are transmitted at low resolution, saving bit rate, while complex, fine features are transmitted at high resolution, reducing the degree of loss.
The technical scheme of the invention is as follows: an image compression method based on multi-scale feature coding comprises the following steps (a minimal end-to-end sketch follows this list):
S1: perform feature extraction on the input image to obtain image features.
S2: select channels according to the image features to obtain a high-resolution feature channel and a low-resolution feature channel.
S3: encode and decode the image features in the high-resolution feature channel and in the low-resolution feature channel respectively, obtaining first high-resolution image features and low-resolution image features.
S4: input the low-resolution image features into a super-resolution network for recovery, obtaining second high-resolution image features.
S5: synthesize the output image from the first high-resolution image features and the second high-resolution image features.
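For orientation, the following toy walk-through shows only the data flow of steps S1 to S5 in PyTorch. Every component is a deliberately crude stand-in (one random convolution instead of the learned GDN analysis transform, bilinear upsampling instead of the super-resolution network, 8 feature channels, and no hyperprior model or arithmetic coding), so it is an illustrative assumption rather than the claimed method:

```python
import torch
import torch.nn.functional as F

def compress_decompress(image: torch.Tensor) -> torch.Tensor:
    """Toy end-to-end walk-through of steps S1-S5 (data flow only)."""
    # S1: feature extraction (stand-in: one 5x5 stride-2 convolution instead of four GDN layers).
    weight = torch.randn(8, 3, 5, 5) * 0.1
    features = F.conv2d(image, weight, stride=2, padding=2)          # (N, 8, H/2, W/2)

    # S2: channel selection by mean absolute gradient (complex half vs. smooth half).
    grad = features[..., 1:] - features[..., :-1]                    # crude horizontal gradient
    complexity = grad.abs().mean(dim=(0, 2, 3))
    order = torch.argsort(complexity, descending=True)
    hi_idx, lo_idx = order[:4], order[4:]

    # S3: "encode/decode": high-resolution channels are kept at full resolution,
    # low-resolution channels are downsampled by 2; quantization/entropy coding omitted.
    first_hi = torch.round(features[:, hi_idx])
    lo = torch.round(F.avg_pool2d(features[:, lo_idx], 2))

    # S4: super-resolution recovery of the low-resolution channels (stand-in: bilinear upsampling).
    second_hi = F.interpolate(lo, scale_factor=2, mode="bilinear", align_corners=False)

    # S5: recombine both groups into a complete feature map and map it back to an image
    # (stand-in: one transposed convolution instead of four IGDN layers).
    full = torch.empty_like(features)
    full[:, hi_idx], full[:, lo_idx] = first_hi, second_hi
    weight_t = torch.randn(8, 3, 5, 5) * 0.1
    return F.conv_transpose2d(full, weight_t, stride=2, padding=2, output_padding=1)

print(compress_decompress(torch.rand(1, 3, 64, 64)).shape)           # torch.Size([1, 3, 64, 64])
```

The remainder of the disclosure replaces each stand-in with the learned component it abbreviates.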
Further, step S1 is specifically: at the image encoding end, perform feature extraction on the input image through 4 sequentially connected downsampling convolutional layers to obtain the image features; the convolution kernel size of each downsampling convolutional layer is 5 × 5, the stride is 2, and the activation function is a GDN function.
Further, step S2 includes the following substeps:
and S21, extracting the characteristic spectrum of the image characteristic, and calculating by using a Sobel gradient operator to obtain the gradient spectrum.
And S22, averaging the absolute values of the gradient spectrum of each characteristic channel to obtain a one-dimensional vector for describing the complexity of the characteristic channel.
And S23, setting the channel corresponding to the half one-dimensional vector with larger complexity as a high-resolution characteristic channel, and setting the channel corresponding to the half one-dimensional vector with smaller complexity as a low-resolution characteristic channel, wherein the channel corresponding to the half one-dimensional vector with larger complexity is set as a 1.
Further, the specific method for encoding and decoding the image features in the high-resolution feature channel in step S3 is as follows:
and A1, quantizing the image features in the high-resolution feature channel.
And A2, estimating the probability distribution of the quantized image features through a hyper-pilot network.
And A3, performing arithmetic coding on the quantized image features according to the probability distribution to obtain a binary code stream.
And A4, performing arithmetic decoding on the binary code stream according to the probability distribution to obtain a first high-resolution image characteristic.
The specific method for encoding and decoding the image features in the low-resolution feature channel in step S3 is as follows:
and B1, down-sampling the image features in the low-resolution feature channel.
And B2, quantizing the image features after down sampling.
And B3, estimating the probability distribution of the quantized image features through a hyper-pilot network.
And B4, performing arithmetic coding on the quantized image features according to the probability distribution to obtain a binary code stream.
And B5, performing arithmetic decoding on the binary code stream according to the probability distribution to obtain the low-resolution image characteristics.
Further, step A1 is specifically: during training, the quantization result is approximated by adding uniform noise to the image features in the high-resolution feature channel; during testing, the image features in the high-resolution feature channel are quantized by rounding.
Step B2 is specifically: during training, the quantization result is approximated by adding uniform noise to the downsampled image features; during testing, the downsampled image features are quantized by rounding.
Further, in step B1, the image features in the low-resolution feature channel are downsampled by a downsampling convolutional layer; the convolution kernel size of the downsampling convolutional layer is 5 × 5, the stride is 2, and the activation function is a GDN function.
Further, the hyperprior network extracts variances from the image features as side information, and the side information is encoded and decoded with a fixed probability distribution.
The encoding end of the hyperprior network comprises three sequentially connected convolutional layers; the convolution kernel size of each convolutional layer is 5 × 5, the stride is 2, and the activation function is a ReLU function.
The decoding end of the hyperprior network comprises three sequentially connected transposed convolutional layers; the convolution kernel size of each transposed convolutional layer is 5 × 5, the stride is 2, and the activation function is a ReLU function.
Further, the super-resolution network in step S4 comprises a first convolutional layer, a GDN function, a second convolutional layer, a first concatenation layer, a third convolutional layer, a GDN function, a fourth convolutional layer, a GDN function, a first residual block, a second residual block, a fifth convolutional layer, a GDN function, a second concatenation layer, and a transposed convolutional layer, connected in sequence.
The first high-resolution image features are fed to the input end of the first convolutional layer, the low-resolution image features are fed to the input end of the first concatenation layer, the input end of the second concatenation layer is also connected with the output end of the first concatenation layer, and the output end of the transposed convolutional layer outputs the second high-resolution image features.
The number of filters of the first convolutional layer is 192, the convolution kernel size is 3 × 3, and the sampling factor is 1.
The number of filters of the second convolutional layer is 96, the convolution kernel size is 1 × 1, and it performs 2× downsampling.
The number of filters of the third convolutional layer is 384, the convolution kernel size is 3 × 3, and the sampling factor is 1.
The number of filters of the fourth convolutional layer is 192, the convolution kernel size is 3 × 3, and the sampling factor is 1.
The number of filters of the fifth convolutional layer is 384, the convolution kernel size is 1 × 1, and the sampling factor is 1.
The number of filters of the transposed convolutional layer is 96, the convolution kernel size is 3 × 3, and it performs 2× upsampling.
The first residual block and the second residual block have the same structure: each comprises a sixth convolutional layer, a GDN function, a seventh convolutional layer, and an adder connected in sequence; the input end of the sixth convolutional layer serves as the input end of the residual block and is also connected with an input end of the adder, and the output end of the adder serves as the output end of the residual block.
The number of filters of the sixth convolutional layer is 192, the convolution kernel size is 3 × 3, and the sampling factor is 1.
The number of filters of the seventh convolutional layer is 192, the convolution kernel size is 3 × 3, and the sampling factor is 1.
Further, during training the super-resolution network in step S4 is mainly constrained by a rate-distortion loss, with a super-resolution loss as an auxiliary constraint.
The rate-distortion loss is calculated as:
L = R + λD
wherein L represents the rate-distortion loss, R represents the bit rate, D represents the distortion, and λ is a weight.
The super-resolution loss is the mean square error between the image features of the two feature channels at the image encoding end, taken before downsampling, and the corresponding image features after super-resolution at the image decoding end.
Further, in step S5, the first high-resolution image features and the second high-resolution image features are synthesized into the output image through 4 transposed convolutional layers; each transposed convolutional layer has a 5 × 5 convolution kernel, and the activation function is an IGDN function, the inverse of the GDN function.
The beneficial effects of the invention are as follows:
(1) The invention encodes the image features of different channels at different resolutions, so that the bit rate allocated to each group of features matches its degree of fineness.
(2) The invention processes the low-resolution-coded features at the decoding end through a super-resolution network, making full use of the capability of neural networks in image restoration; lost information is inferred from the context, which reduces the degree of image loss.
Drawings
Fig. 1 is a flowchart of an image compression method based on multi-scale feature coding according to an embodiment of the present invention.
Fig. 2 is a schematic diagram of a super-resolution network structure according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It is to be understood that the embodiments shown and described in the drawings are merely exemplary and are intended to illustrate the principles and spirit of the invention, not to limit the scope of the invention.
The embodiment of the invention provides an image compression method based on multi-scale feature coding. As shown in Fig. 1, the method comprises the following steps S1 to S5.
S1: perform feature extraction on the input image to obtain image features.
In the embodiment of the invention, the image encoding end performs feature extraction on the input image through 4 sequentially connected downsampling convolutional layers to obtain the image features. Each downsampling convolutional layer has a 5 × 5 convolution kernel, a stride of 2, and a GDN (Generalized Divisive Normalization) activation function.
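A minimal PyTorch-style sketch of such a four-layer analysis transform is given below; the output channel count of 192 and the simplified GDN implementation (without the positivity reparameterization a production implementation would use) are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleGDN(nn.Module):
    """Simplified GDN: y_c = x_c / sqrt(beta_c + sum_k gamma_{c,k} * x_k^2).
    Real implementations constrain beta and gamma to remain positive during training."""
    def __init__(self, channels: int):
        super().__init__()
        self.beta = nn.Parameter(torch.ones(channels))
        self.gamma = nn.Parameter(0.1 * torch.eye(channels))

    def forward(self, x):
        # gamma acts as a 1x1 convolution over the squared activations.
        norm = F.conv2d(x * x, self.gamma.view(*self.gamma.shape, 1, 1), self.beta)
        return x * torch.rsqrt(norm)

class AnalysisTransform(nn.Module):
    """Step S1: four 5x5, stride-2 downsampling convolutions, each followed by GDN."""
    def __init__(self, in_ch: int = 3, num_features: int = 192):
        super().__init__()
        layers, ch = [], in_ch
        for _ in range(4):
            layers += [nn.Conv2d(ch, num_features, kernel_size=5, stride=2, padding=2),
                       SimpleGDN(num_features)]
            ch = num_features
        self.net = nn.Sequential(*layers)

    def forward(self, img):
        return self.net(img)                 # 16x total spatial downsampling

features = AnalysisTransform()(torch.rand(1, 3, 256, 256))
print(features.shape)                        # torch.Size([1, 192, 16, 16])
```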
S2: select channels according to the image features to obtain a high-resolution feature channel and a low-resolution feature channel.
Step S2 includes the following substeps S21 to S23 (a sketch of the channel selection follows the list):
S21: extract the feature map of the image features and compute its gradient map with a Sobel gradient operator.
S22: average the absolute values of the gradient map of each feature channel to obtain a one-dimensional vector describing the complexity of the feature channels. In the embodiment of the invention, the larger a value in the one-dimensional vector, the higher the complexity of the corresponding feature channel.
S23: set the channels corresponding to the half of the one-dimensional vector with larger values (higher complexity) as the high-resolution feature channel, and the channels corresponding to the half with smaller values (lower complexity) as the low-resolution feature channel.
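A sketch of the channel selection under these substeps, assuming PyTorch feature maps. Note that the patent computes the selection vector over a training set, whereas this simplified version works on a single batch, and combining the two Sobel directions by summing their absolute responses is an assumption:

```python
import torch
import torch.nn.functional as F

def select_channels(features: torch.Tensor):
    """Steps S21-S23: split feature channels into high- and low-resolution groups
    according to the mean absolute value of their Sobel gradient maps."""
    c = features.shape[1]
    # Depthwise 3x3 Sobel filtering, one kernel copy per channel (S21).
    sobel_x = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
    sobel_y = sobel_x.transpose(2, 3)
    gx = F.conv2d(features, sobel_x.repeat(c, 1, 1, 1), padding=1, groups=c)
    gy = F.conv2d(features, sobel_y.repeat(c, 1, 1, 1), padding=1, groups=c)
    # S22: mean absolute gradient per channel gives the one-dimensional complexity vector.
    complexity = (gx.abs() + gy.abs()).mean(dim=(0, 2, 3))          # shape (C,)
    # S23: the more complex half is coded at high resolution, the smoother half at low resolution.
    order = torch.argsort(complexity, descending=True)
    return order[: c // 2], order[c // 2:]

# high_idx, low_idx = select_channels(features)
```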
S3: encode and decode the image features in the high-resolution feature channel and in the low-resolution feature channel respectively, obtaining first high-resolution image features and low-resolution image features.
The image features in the high-resolution feature channel are encoded and decoded as follows:
A1: quantize the image features in the high-resolution feature channel.
In the embodiment of the invention, quantization cannot be back-propagated through, so an alternative is used during training: the quantization result is approximated by adding uniform noise to the image features in the high-resolution feature channel; during testing, the image features in the high-resolution feature channel are quantized by rounding.
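A minimal sketch of this train/test quantizer, assuming PyTorch tensors:

```python
import torch

def quantize(y: torch.Tensor, training: bool) -> torch.Tensor:
    """Steps A1 / B2: additive uniform noise in [-0.5, 0.5) approximates rounding during
    training (so gradients can flow); actual rounding is used at test time."""
    if training:
        return y + torch.empty_like(y).uniform_(-0.5, 0.5)
    return torch.round(y)
```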
A2: estimate the probability distribution of the quantized image features through a hyperprior network.
A3: arithmetically encode the quantized image features according to the probability distribution to obtain a binary code stream.
A4: arithmetically decode the binary code stream according to the probability distribution to obtain the first high-resolution image features.
The image features in the low-resolution feature channel are encoded and decoded as follows:
B1: downsample the image features in the low-resolution feature channel.
In the embodiment of the invention, the image features in the low-resolution feature channel are downsampled through a downsampling convolutional layer with a 5 × 5 convolution kernel, a stride of 2, and a GDN activation function.
B2: quantize the downsampled image features.
In the embodiment of the invention, quantization cannot be back-propagated through, so during training the quantization result is approximated by adding uniform noise to the downsampled image features, and during testing the downsampled image features are quantized by rounding (the same quantizer sketched above).
B3: estimate the probability distribution of the quantized image features through the hyperprior network.
B4: arithmetically encode the quantized image features according to the probability distribution to obtain a binary code stream.
B5: arithmetically decode the binary code stream according to the probability distribution to obtain the low-resolution image features.
In the embodiment of the invention, the coding and decoding part uses arithmetic coding, which requires a probability distribution shared by the encoder and the decoder. Here the distribution is modeled as a zero-mean Gaussian mixture model, and a hyperprior network extracts the variance from the features as side information. The arithmetic encoders of the two feature-channel branches encode the features into binary code streams according to the same probability distribution parameters, and the arithmetic decoders decode them back to the original features. The side information of the hyperprior network is encoded and decoded with a fixed probability distribution.
The encoding end of the hyperprior network comprises three sequentially connected convolutional layers; each convolutional layer has a 5 × 5 convolution kernel, a stride of 2, and a ReLU activation function. The decoding end of the hyperprior network comprises three sequentially connected transposed convolutional layers corresponding to the convolutional layers at the encoding end; each transposed convolutional layer has a 5 × 5 convolution kernel, a stride of 2, and a ReLU activation function.
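A sketch of such a hyperprior network is shown below. The channel counts, the use of the decoder output as the scale (standard deviation) of a single zero-mean Gaussian per element, and the small epsilon keeping the scale positive are illustrative assumptions; the description above only fixes the layer shapes, activations, and the role of the side information:

```python
import torch
import torch.nn as nn

class Hyperprior(nn.Module):
    """Encoder: three 5x5, stride-2 convolutions with ReLU; decoder: three matching
    transposed convolutions with ReLU. Maps the image features to side information z
    and predicts the scale of the zero-mean Gaussian used by the arithmetic coder."""
    def __init__(self, ch: int = 192):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(ch, ch, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(ch, ch, 5, stride=2, padding=2), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(ch, ch, 5, stride=2, padding=2, output_padding=1), nn.ReLU(),
        )

    def forward(self, y: torch.Tensor):
        z = self.encoder(y)              # side information, coded with a fixed distribution
        sigma = self.decoder(z) + 1e-6   # predicted Gaussian scale for each element of y
        return z, sigma
```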
S4: input the low-resolution image features into a super-resolution network for recovery, obtaining second high-resolution image features.
As shown in Fig. 2, in the embodiment of the invention the super-resolution network comprises a first convolutional layer, a GDN function, a second convolutional layer, a first concatenation layer, a third convolutional layer, a GDN function, a fourth convolutional layer, a GDN function, a first residual block, a second residual block, a fifth convolutional layer, a GDN function, a second concatenation layer, and a transposed convolutional layer, connected in sequence (a sketch of this network follows the layer list below).
Since there is a certain correlation between the low-resolution image features and the first high-resolution image features, the first high-resolution image features are also used as an input when super-resolving the low-resolution image features. In the embodiment of the invention, the first high-resolution image features are fed to the input end of the first convolutional layer, the low-resolution image features are fed to the input end of the first concatenation layer, the input end of the second concatenation layer is also connected with the output end of the first concatenation layer, and the output end of the transposed convolutional layer outputs the second high-resolution image features.
The number of filters of the first convolutional layer is 192, the convolution kernel size is 3 × 3, and the sampling factor is 1.
The number of filters of the second convolutional layer is 96, the convolution kernel size is 1 × 1, and it performs 2× downsampling.
The number of filters of the third convolutional layer is 384, the convolution kernel size is 3 × 3, and the sampling factor is 1.
The number of filters of the fourth convolutional layer is 192, the convolution kernel size is 3 × 3, and the sampling factor is 1.
The number of filters of the fifth convolutional layer is 384, the convolution kernel size is 1 × 1, and the sampling factor is 1.
The number of filters of the transposed convolutional layer is 96, the convolution kernel size is 3 × 3, and it performs 2× upsampling.
The first residual block and the second residual block have the same structure: each comprises a sixth convolutional layer, a GDN function, a seventh convolutional layer, and an adder connected in sequence; the input end of the sixth convolutional layer serves as the input end of the residual block and is also connected with an input end of the adder, and the output end of the adder serves as the output end of the residual block.
The number of filters of the sixth convolutional layer is 192, the convolution kernel size is 3 × 3, and the sampling factor is 1.
The number of filters of the seventh convolutional layer is 192, the convolution kernel size is 3 × 3, and the sampling factor is 1.
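A PyTorch-style sketch of this super-resolution network is given below. It reuses the SimpleGDN module from the analysis-transform sketch in step S1; the assumption that the 192 encoder channels are split into two groups of 96 (which fixes the input channel counts and the concatenation widths) is illustrative and not stated explicitly in the text:

```python
import torch
import torch.nn as nn
# SimpleGDN is the simplified GDN module defined in the analysis-transform sketch (step S1).

class ResidualBlock(nn.Module):
    """Conv -> GDN -> Conv with a skip connection (the first/second residual blocks)."""
    def __init__(self, ch: int = 192):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1), SimpleGDN(ch),
                                  nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class SuperResolutionNet(nn.Module):
    """Step S4 network; layer widths follow the description above."""
    def __init__(self):
        super().__init__()
        self.head = nn.Sequential(                       # processes the first high-resolution features
            nn.Conv2d(96, 192, 3, padding=1), SimpleGDN(192),
            nn.Conv2d(192, 96, 1, stride=2),             # second conv layer: 1x1, 2x downsampling
        )
        self.body = nn.Sequential(                       # after the first concatenation layer
            nn.Conv2d(96 + 96, 384, 3, padding=1), SimpleGDN(384),
            nn.Conv2d(384, 192, 3, padding=1), SimpleGDN(192),
            ResidualBlock(192), ResidualBlock(192),
            nn.Conv2d(192, 384, 1), SimpleGDN(384),
        )
        self.tail = nn.ConvTranspose2d(384 + 192, 96, 3, stride=2,
                                       padding=1, output_padding=1)   # 2x upsampling

    def forward(self, first_high_res, low_res):
        h = self.head(first_high_res)                    # down to the low-resolution grid
        cat1 = torch.cat([h, low_res], dim=1)            # first concatenation layer
        cat2 = torch.cat([self.body(cat1), cat1], dim=1) # second concatenation layer (skip from cat1)
        return self.tail(cat2)                           # second high-resolution features

# second_high_res = SuperResolutionNet()(first_high_res, low_res)
```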
In the embodiment of the invention, the super-resolution network is trained mainly under a rate-distortion loss, with a super-resolution loss as an auxiliary constraint.
The rate-distortion loss is a weighted combination of the bit rate and the distortion:
L = R + λD
wherein L is the rate-distortion loss; R is the bit rate, taken directly as the information entropy under the current probability distribution after entropy coding; D is the distortion, measured in the embodiment of the invention as the mean square error (MSE) between the original image and the decoded image; and λ is a weight, set manually in the embodiment of the invention, whose value controls the compression ratio of the image.
The super-resolution loss is the mean square error between the image features of the two feature channels at the image encoding end, taken before downsampling, and the corresponding image features after super-resolution at the image decoding end.
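A sketch of both training losses, assuming the quantized features y_hat, the Gaussian scales sigma from the hyperprior sketch, the original and decoded images x and x_hat, and the encoder-side features before downsampling are available; the λ value and the way the rate is estimated (integrating the Gaussian density over each quantization bin) are illustrative choices:

```python
import torch
import torch.nn.functional as F

def rate_distortion_loss(y_hat, sigma, x, x_hat, lam=0.01):
    """L = R + lambda * D: R is the information entropy (in bits) of the quantized
    features under the zero-mean Gaussian model, D is the MSE between the original
    and decoded images, and lam trades off bit rate against distortion."""
    gaussian = torch.distributions.Normal(0.0, sigma)
    # Probability mass of each quantized value over its bin [y_hat - 0.5, y_hat + 0.5].
    p = gaussian.cdf(y_hat + 0.5) - gaussian.cdf(y_hat - 0.5)
    rate = -torch.log2(p.clamp(min=1e-9)).sum() / x.shape[0]        # bits per image
    distortion = F.mse_loss(x_hat, x)
    return rate + lam * distortion

def super_resolution_loss(features_before_downsampling, super_resolved_features):
    """Auxiliary constraint: MSE between the encoder-side features (before downsampling)
    and the decoder-side super-resolved features."""
    return F.mse_loss(super_resolved_features, features_before_downsampling)
```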
S5: synthesize the output image from the first high-resolution image features and the second high-resolution image features.
In the embodiment of the invention, the first high-resolution image features and the second high-resolution image features are synthesized into the output image through 4 transposed convolutional layers; each transposed convolutional layer has a 5 × 5 convolution kernel, and the activation function is IGDN, the inverse of the GDN function.
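A sketch of such a synthesis transform follows. The description above fixes the number of layers (4), the 5 × 5 kernels, and the IGDN activation; the stride of 2 per layer (mirroring the analysis transform), the channel counts, and the simplified IGDN implementation are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleIGDN(nn.Module):
    """Simplified inverse GDN: y_c = x_c * sqrt(beta_c + sum_k gamma_{c,k} * x_k^2)."""
    def __init__(self, channels: int):
        super().__init__()
        self.beta = nn.Parameter(torch.ones(channels))
        self.gamma = nn.Parameter(0.1 * torch.eye(channels))

    def forward(self, x):
        norm = F.conv2d(x * x, self.gamma.view(*self.gamma.shape, 1, 1), self.beta)
        return x * torch.sqrt(norm)

class SynthesisTransform(nn.Module):
    """Step S5: four 5x5 transposed convolutions with IGDN activations map the
    recombined feature map back to an RGB image."""
    def __init__(self, num_features: int = 192, out_ch: int = 3):
        super().__init__()
        chs = [num_features, num_features, num_features, num_features, out_ch]
        layers = []
        for i in range(4):
            layers += [nn.ConvTranspose2d(chs[i], chs[i + 1], kernel_size=5,
                                          stride=2, padding=2, output_padding=1),
                       SimpleIGDN(chs[i + 1])]
        self.net = nn.Sequential(*layers)

    def forward(self, recombined_features):
        # recombined_features: the first and second high-resolution features reassembled
        # into a complete feature map (channels restored to their original order).
        return self.net(recombined_features)

# reconstructed_image = SynthesisTransform()(full_feature_map)
```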
It will be appreciated by those of ordinary skill in the art that the embodiments described herein are intended to help the reader understand the principles of the invention, and that the scope of the invention is not limited to the specifically described embodiments and examples. Those skilled in the art can make various other specific changes and combinations based on the teachings of the present invention without departing from its spirit, and these changes and combinations remain within the scope of the invention.

Claims (10)

1. An image compression method based on multi-scale feature coding is characterized by comprising the following steps:
S1, performing feature extraction on the input image to obtain image features;
S2, selecting channels according to the image features to obtain a high-resolution feature channel and a low-resolution feature channel;
S3, respectively encoding and decoding the image features in the high-resolution feature channel and the image features in the low-resolution feature channel to obtain a first high-resolution image feature and a low-resolution image feature;
S4, inputting the low-resolution image feature into a super-resolution network for recovery to obtain a second high-resolution image feature;
and S5, carrying out image synthesis on the first high-resolution image feature and the second high-resolution image feature to obtain an output image.
2. The image compression method according to claim 1, wherein the step S1 is specifically: performing feature extraction on an input image through 4 sequentially connected downsampling convolutional layers at an image encoding end to obtain the image features; the convolution kernel size of each downsampling convolutional layer is 5 × 5, the stride is 2, and the activation function is a GDN function.
3. The image compression method according to claim 1, wherein the step S2 includes the following substeps:
S21, extracting a feature map of the image features, and calculating a gradient map of the image features with a Sobel gradient operator;
S22, averaging the absolute values of the gradient map of each feature channel to obtain a one-dimensional vector describing the complexity of the feature channels;
and S23, setting the channels corresponding to the half of the one-dimensional vector with larger values as the high-resolution feature channel, and setting the channels corresponding to the half with smaller values as the low-resolution feature channel.
4. The image compression method according to claim 1, wherein the specific method for encoding and decoding the image features in the high-resolution feature channel in step S3 is as follows:
A1, quantizing the image features in the high-resolution feature channel;
A2, estimating a probability distribution of the quantized image features through a hyperprior network;
A3, performing arithmetic coding on the quantized image features according to the probability distribution to obtain a binary code stream;
A4, performing arithmetic decoding on the binary code stream according to the probability distribution to obtain the first high-resolution image feature;
the specific method for encoding and decoding the image features in the low-resolution feature channel in step S3 is as follows:
B1, downsampling the image features in the low-resolution feature channel;
B2, quantizing the downsampled image features;
B3, estimating a probability distribution of the quantized image features through the hyperprior network;
B4, performing arithmetic coding on the quantized image features according to the probability distribution to obtain a binary code stream;
and B5, performing arithmetic decoding on the binary code stream according to the probability distribution to obtain the low-resolution image feature.
5. The image compression method according to claim 4, wherein the step A1 is specifically: during training, approximating the quantization result by adding uniform noise to the image features in the high-resolution feature channel; during testing, quantizing the image features in the high-resolution feature channel by rounding;
the step B2 is specifically: during training, approximating the quantization result by adding uniform noise to the downsampled image features; during testing, quantizing the downsampled image features by rounding.
6. The image compression method according to claim 4, wherein in step B1, the image features in the low-resolution feature channel are downsampled by a downsampling convolutional layer, the convolution kernel size of the downsampling convolutional layer is 5 × 5, the stride is 2, and the activation function is a GDN function.
7. The image compression method according to claim 4, wherein the hyperprior network extracts variances from the image features as side information, and the side information is encoded and decoded with a fixed probability distribution;
the encoding end of the hyperprior network comprises three sequentially connected convolutional layers, the convolution kernel size of each convolutional layer is 5 × 5, the stride is 2, and the activation function is a ReLU function;
the decoding end of the hyperprior network comprises three sequentially connected transposed convolutional layers, the convolution kernel size of each transposed convolutional layer is 5 × 5, the stride is 2, and the activation function is a ReLU function.
8. The image compression method according to claim 1, wherein the super-resolution network in step S4 comprises a first convolutional layer, a GDN function, a second convolutional layer, a first concatenation layer, a third convolutional layer, a GDN function, a fourth convolutional layer, a GDN function, a first residual block, a second residual block, a fifth convolutional layer, a GDN function, a second concatenation layer, and a transposed convolutional layer, which are connected in sequence;
the input end of the first convolutional layer receives the first high-resolution image feature, the input end of the first concatenation layer receives the low-resolution image feature, the input end of the second concatenation layer is also connected with the output end of the first concatenation layer, and the output end of the transposed convolutional layer outputs the second high-resolution image feature;
the number of filters of the first convolutional layer is 192, the convolution kernel size is 3 × 3, and the sampling factor is 1;
the number of filters of the second convolutional layer is 96, the convolution kernel size is 1 × 1, and it performs 2× downsampling;
the number of filters of the third convolutional layer is 384, the convolution kernel size is 3 × 3, and the sampling factor is 1;
the number of filters of the fourth convolutional layer is 192, the convolution kernel size is 3 × 3, and the sampling factor is 1;
the number of filters of the fifth convolutional layer is 384, the convolution kernel size is 1 × 1, and the sampling factor is 1;
the number of filters of the transposed convolutional layer is 96, the convolution kernel size is 3 × 3, and it performs 2× upsampling;
the first residual block and the second residual block have the same structure: each comprises a sixth convolutional layer, a GDN function, a seventh convolutional layer, and an adder connected in sequence, wherein the input end of the sixth convolutional layer serves as the input end of the residual block and is connected with an input end of the adder, and the output end of the adder serves as the output end of the residual block;
the number of filters of the sixth convolutional layer is 192, the convolution kernel size is 3 × 3, and the sampling factor is 1;
the number of filters of the seventh convolutional layer is 192, the convolution kernel size is 3 × 3, and the sampling factor is 1.
9. The image compression method according to claim 8, wherein during training the super-resolution network in step S4 is mainly constrained by a rate-distortion loss, with a super-resolution loss as an auxiliary constraint;
the rate-distortion loss is calculated as:
L = R + λD
wherein L represents the rate-distortion loss, R represents the bit rate, D represents the distortion, and λ is a weight;
the super-resolution loss is the mean square error between the image features of the two feature channels at the image encoding end, taken before downsampling, and the corresponding image features after super-resolution at the image decoding end.
10. The image compression method according to claim 1, wherein in step S5, the first high-resolution image feature and the second high-resolution image feature are synthesized into the output image through 4 transposed convolutional layers, the convolution kernel size of each transposed convolutional layer is 5 × 5, and the activation function is an IGDN function, which is the inverse of the GDN function.
CN201911290877.8A 2019-12-12 2019-12-12 Image compression method based on multi-scale feature coding Active CN110956671B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911290877.8A CN110956671B (en) 2019-12-12 2019-12-12 Image compression method based on multi-scale feature coding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911290877.8A CN110956671B (en) 2019-12-12 2019-12-12 Image compression method based on multi-scale feature coding

Publications (2)

Publication Number Publication Date
CN110956671A (en) 2020-04-03
CN110956671B CN110956671B (en) 2022-08-02

Family

ID=69981810

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911290877.8A Active CN110956671B (en) 2019-12-12 2019-12-12 Image compression method based on multi-scale feature coding

Country Status (1)

Country Link
CN (1) CN110956671B (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105392009A (en) * 2015-11-27 2016-03-09 四川大学 Low bit rate image coding method based on block self-adaptive sampling and super-resolution reconstruction
CN107018422A (en) * 2017-04-27 2017-08-04 四川大学 Still image compression method based on depth convolutional neural networks
WO2019145767A1 (en) * 2018-01-25 2019-08-01 King Abdullah University Of Science And Technology Deep-learning based structure reconstruction method and apparatus
CN109146788A (en) * 2018-08-16 2019-01-04 广州视源电子科技股份有限公司 Super-resolution image reconstruction method and device based on deep learning
CN109741256A (en) * 2018-12-13 2019-05-10 西安电子科技大学 Image super-resolution rebuilding method based on rarefaction representation and deep learning
CN110087092A (en) * 2019-03-11 2019-08-02 西安电子科技大学 Low bit-rate video decoding method based on image reconstruction convolutional neural networks
CN109996071A (en) * 2019-03-27 2019-07-09 上海交通大学 Variable bit rate image coding, decoding system and method based on deep learning

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021258529A1 (en) * 2020-06-22 2021-12-30 北京大学深圳研究生院 Image resolution reduction and restoration method, device, and readable storage medium
CN114363624A (en) * 2020-10-13 2022-04-15 北京大学 Sensitivity-based code rate allocation characteristic compression method
CN114363624B (en) * 2020-10-13 2023-03-31 北京大学 Sensitivity-based code rate allocation characteristic compression method
CN112149652A (en) * 2020-11-27 2020-12-29 南京理工大学 Space-spectrum joint depth convolution network method for lossy compression of hyperspectral image
CN113014927A (en) * 2021-03-02 2021-06-22 三星(中国)半导体有限公司 Image compression method and image compression device
CN113014927B (en) * 2021-03-02 2024-01-09 三星(中国)半导体有限公司 Image compression method and image compression device
CN113393377A (en) * 2021-05-18 2021-09-14 电子科技大学 Single-frame image super-resolution method based on video coding
CN113393377B (en) * 2021-05-18 2022-02-01 电子科技大学 Single-frame image super-resolution method based on video coding
CN115866252A (en) * 2023-02-09 2023-03-28 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image compression method, device, equipment and storage medium
CN115866252B (en) * 2023-02-09 2023-05-02 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image compression method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN110956671B (en) 2022-08-02

Similar Documents

Publication Publication Date Title
CN110956671B (en) Image compression method based on multi-scale feature coding
Cheng et al. Learned image compression with discretized gaussian mixture likelihoods and attention modules
Gao et al. Neural image compression via attentional multi-scale back projection and frequency decomposition
US8223837B2 (en) Learning-based image compression
CN110290387B (en) Image compression method based on generative model
WO2020237646A1 (en) Image processing method and device, and computer-readable storage medium
CN108737823B (en) Image coding method and device and decoding method and device based on super-resolution technology
CN113259676A (en) Image compression method and device based on deep learning
CN103607591A (en) Image compression method combining super-resolution reconstruction
WO2020238439A1 (en) Video quality-of-service enhancement method under restricted bandwidth of wireless ad hoc network
CN112149652A (en) Space-spectrum joint depth convolution network method for lossy compression of hyperspectral image
CN113079378B (en) Image processing method and device and electronic equipment
US8737753B2 (en) Image restoration by vector quantization utilizing visual patterns
CN115131675A (en) Remote sensing image compression method and system based on reference image texture migration
Akbari et al. Learned multi-resolution variable-rate image compression with octave-based residual blocks
CN112702600B (en) Image coding and decoding neural network layered fixed-point method
CN111080729B (en) Training picture compression network construction method and system based on Attention mechanism
CN115866252B (en) Image compression method, device, equipment and storage medium
CN115776571B (en) Image compression method, device, equipment and storage medium
Tan et al. Image compression algorithms based on super-resolution reconstruction technology
CN116630448A (en) Image compression method based on neural data dependent transformation of window attention
CN110730347A (en) Image compression method and device and electronic equipment
CN115294222A (en) Image encoding method, image processing method, terminal, and medium
CN115361555A (en) Image encoding method, image encoding device, and computer storage medium
CN115150628A (en) Coarse-to-fine depth video coding method with super-prior guiding mode prediction

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant