CN113315970A - Image compression method, image decoding method, intelligent terminal and storage medium

Info

Publication number: CN113315970A
Application number: CN202010121098.1A
Authority: CN
Inventors: 陈巍
Original assignee: Wuhan TCL Group Industrial Research Institute Co Ltd
Current assignee: Wuhan TCL Group Industrial Research Institute Co Ltd
Priority date: 2020-02-26
Filing date: 2020-02-26
Publication date: 2021-08-27
Anticipated expiration: 2040-02-26
Also published as: CN113315970B

Abstract

The invention discloses an image compression method, an image decoding method, an intelligent terminal and a storage medium. For compression, a target image is acquired and encoded to obtain a plurality of feature maps; the feature maps are cluster-quantized to obtain quantized feature map data; and probability estimation by a probability estimation network, followed by arithmetic coding, is applied to the feature map data to obtain binary data, the binary data being the compressed image data of the target image. For decoding, the binary data is acquired; the cluster-quantized feature maps are recovered from the binary data through the probability estimation network and arithmetic decoding; and a reconstructed decoded image corresponding to the target image is output. By optimizing the multi-scale self-coding network and the probability estimation network jointly, the invention allows the probability estimation network to better estimate the probabilities of data compressed by the lossy model, thereby achieving a better image processing effect.

Description

Image compression method, image decoding method, intelligent terminal and storage medium

Technical Field

The present invention relates to the field of image processing technologies, and in particular, to an image compression method, an image decoding method, an intelligent terminal, and a storage medium.

Background

An image compression model based on a deep convolutional self-coding (autoencoder) network learns the distribution of image data through the convolutional self-coding network. Because images are taken to follow a Gaussian distribution, the model can be fitted to the distribution of the image information through parameter learning, and the whole image compression model is learnable end to end.

Image compression models based on deep convolutional self-coding networks are widely used because convolutional networks represent abstract image features well. To compress the feature data further, entropy coding is generally applied to the output of the image compression model. Entropy coding is a coding method that, following the entropy principle, loses no information during coding; it is the only lossless compression stage of the pipeline.

At present, in multi-scale image compression methods, the lossy compression model (the multi-scale compression model) and the probability estimation network used for entropy coding must be trained and optimized separately. The probability estimation network is a network model that learns and estimates the occurrence probability of each pixel value of the quantized feature maps; it is needed because entropy coding requires the probability of each object to be encoded. In practice the multi-scale compression model is trained and optimized first; once trained, it is fixed while the probability estimation network is trained and optimized. When the multi-scale compression model and the probability estimation network are optimized jointly, the decoded image becomes abnormal or the size of the compressed file doubles, because the original fixed-value quantization prevents the two models from each learning its parameters in the correct direction during joint training.

Accordingly, the prior art is yet to be improved and developed.

Disclosure of Invention

The invention mainly aims to provide an image compression method, an image decoding method, an intelligent terminal and a storage medium, and aims to solve the problem that the decoded image is abnormal or the size of a compressed file is doubled when a multi-scale compression model and a probability estimation network are jointly optimized in the prior art.

To achieve the above object, the present invention provides an image compression method, comprising the steps of:

acquiring a target image, and encoding the target image to obtain a plurality of feature maps;

clustering and quantizing the plurality of feature maps to obtain quantized feature map data;

and performing probability estimation and arithmetic coding on the feature map data through a probability estimation network to obtain binary data, wherein the binary data is image compression data of the target image.

Optionally, in the image compression method, obtaining the plurality of feature maps specifically includes:

sequentially performing a preset-multiple downsampling operation, a convolution operation, normalization and nonlinear transformation, and a channel segmentation operation on the target image to obtain four feature maps.

Optionally, in the image compression method, sequentially performing the preset-multiple downsampling operation, the convolution operation, the normalization and nonlinear transformation, and the channel segmentation operation on the target image to obtain four feature maps specifically includes:

sequentially carrying out preset multiple downsampling operation, first convolution operation, normalization and nonlinear transformation and second convolution operation on the target image;

sequentially carrying out preset multiple downsampling operation, normalization and nonlinear transformation, third convolution operation and first channel segmentation operation on the image subjected to the second convolution operation, and outputting a first feature map;

sequentially carrying out preset multiple downsampling operation, normalization and nonlinear transformation, fourth convolution operation and second channel segmentation operation on the image subjected to the first channel segmentation operation, and outputting a second feature map;

sequentially carrying out preset multiple downsampling operation, normalization and nonlinear transformation, fifth convolution operation and third channel segmentation operation on the image subjected to the second channel segmentation operation, and outputting a third feature map;

and sequentially carrying out preset multiple downsampling operation, normalization and nonlinear transformation and sixth convolution operation on the image subjected to the third channel segmentation operation, and outputting a fourth feature map.

Optionally, in the image compression method, after the image subjected to the third channel segmentation operation has been sequentially subjected to the preset-multiple downsampling operation, the normalization and nonlinear transformation, and the sixth convolution operation to output the fourth feature map, the method further includes:

performing a first preset-multiple downsampling operation, a second preset-multiple downsampling operation and a third preset-multiple downsampling operation on the first feature map, the second feature map and the third feature map respectively, so that the scales of the first, second and third feature maps are the same as that of the fourth feature map and the feature maps can be merged in the channel dimension.

Optionally, the image compression method, wherein the preset-multiple downsampling operation is a 2-time downsampling operation, and the 2-time downsampling operation is used for reducing the size of the image by half;

the convolution kernel size in the first convolution operation is 3 x 3, the number of output channels is 128, the step size is 1, and the pixel filling is 1;

the convolution kernel size in the second convolution operation is 3 x 3, the number of output channels is 64, the step size is 1, and the pixel filling is 1;

the convolution kernel size in the third convolution operation is 3 × 3, the output channel number is 128+ the channel number of the first feature map, the step size is 1, and the pixel filling is 1;

the convolution kernel size in the fourth convolution operation is 3 × 3, the output channel number is 256+ the channel number of the second feature map, the step size is 1, and the pixel filling is 1;

the convolution kernel in the fifth convolution operation is 3 × 3, the number of output channels is 512+ the number of channels of the third feature map, the step size is 1, and the pixel filling is 1;

the convolution kernel size in the sixth convolution operation is 3 × 3, the number of output channels is the number of channels of the fourth feature map, the step size is 1, and the pixel fill is 1.

Optionally, in the image compression method, the first channel segmentation operation splits a tensor whose number of channels is 128 + the number of channels of the first feature map into two tensors, one with 128 channels and one with the number of channels of the first feature map;

the second channel segmentation operation splits a tensor whose number of channels is 256 + the number of channels of the second feature map into two tensors, one with 256 channels and one with the number of channels of the second feature map;

the third channel segmentation operation splits a tensor whose number of channels is 512 + the number of channels of the third feature map into two tensors, one with 512 channels and one with the number of channels of the third feature map.

Optionally, the image compression method, wherein the performing cluster quantization processing on the plurality of feature maps specifically includes:

acquiring a plurality of feature maps;

given cluster quantization center points $C = \{c_1, c_2, \ldots, c_L\}$, calculating the distance between each point of input_x and each quantization center point, and taking the center point closest to the point as its quantized value, i.e. $Q(\mathrm{input\_x}_i) := \arg\min_j (\mathrm{input\_x}_i - c_j)$, where $i$ denotes the i-th data of input_x, $j$ denotes the j-th quantization center, $j \in [1, L]$, and $L$ is the number of cluster quantization center points;

during training, soft quantization is performed in the way of

Wherein σ is a hyperparameter;

switching from soft quantization to hard quantization and rounding, $\mathrm{stop\_gradient}\big(Q(\mathrm{input\_x}_i) - \mathrm{soft\_Q}(\mathrm{input\_x}_i)\big) + \mathrm{soft\_Q}(\mathrm{input\_x}_i)$;

and finally applying rounding, round(x), and outputting the quantized feature map data.

In addition, to achieve the above object, the present invention provides an image decoding method including the steps of:

acquiring binary data, wherein the binary data is image compression data of a target image;

and obtaining a plurality of clustering quantized feature maps by the binary data through a probability estimation network and arithmetic decoding, and outputting a reconstructed decoded image corresponding to the target image.

Optionally, in the image decoding method, the plurality of feature maps include: a first feature map, a second feature map, a third feature map and a fourth feature map.

Optionally, in the image decoding method, obtaining the plurality of cluster-quantized feature maps from the binary data through the probability estimation network and arithmetic decoding, and outputting the reconstructed decoded image, specifically includes:

sequentially performing a seventh convolution operation, normalization and nonlinear transformation, and a first preset-multiple upsampling operation on the fourth feature map;

sequentially performing a first channel merging operation, an eighth convolution operation, normalization and nonlinear transformation, and a second preset-multiple upsampling operation on the image subjected to the first preset-multiple upsampling operation and the third feature map;

sequentially performing a second channel merging operation, a ninth convolution operation, normalization and nonlinear transformation, and a third preset-multiple upsampling operation on the image subjected to the second preset-multiple upsampling operation and the second feature map;

sequentially performing a third channel merging operation, a tenth convolution operation, normalization and nonlinear transformation, and a fourth preset-multiple upsampling operation on the image subjected to the third preset-multiple upsampling operation and the first feature map;

and sequentially performing an eleventh convolution operation, normalization and nonlinear transformation, a twelfth convolution operation and a fifth preset-multiple upsampling operation on the image subjected to the fourth preset-multiple upsampling operation, and outputting the reconstructed decoded image.

Optionally, in the image decoding method, a convolution kernel size in the seventh convolution operation is 3 × 3, the number of output channels is 2048, a step size is 1, and pixel filling is 1;

the convolution kernel size in the eighth convolution operation is 3 x 3, the number of output channels is 1024, the step size is 1, and the pixel filling is 1;

the size of a convolution kernel in the ninth convolution operation is 3 x 3, the number of output channels is 512, the step length is 1, and the pixel filling is 1;

the convolution kernel size in the tenth convolution operation is 3 × 3, the number of output channels is 256, the step size is 1, and the pixel filling is 1;

the size of a convolution kernel in the eleventh convolution operation is 3 × 3, the number of output channels is 128, the step size is 1, and the pixel filling is 1;

the size of the convolution kernel in the twelfth convolution operation is 3 × 3, the number of output channels is 12, the step size is 1, and the pixel fill is 1.
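The decoder channel counts listed above can be checked with a small shape trace. This is only a sketch under stated assumptions: the input image is 512 × 512 (the size used as an example elsewhere in this description), each 2x upsampling is taken to be the depth2space rearrangement described in the detailed description (channels divided by 4, width and height doubled), and the channel counts chosen for C1 to C4 as well as all function names are hypothetical.

```python
def conv_shape(shape, out_channels):
    """3x3 convolution, stride 1, padding 1: only the channel count changes."""
    _, h, w = shape
    return (out_channels, h, w)

def depth2space_shape(shape, r=2):
    """2x upsampling by rearrangement: channels / r^2, height and width * r."""
    c, h, w = shape
    return (c // (r * r), h * r, w * r)

def decoder_output_shape(c1, c2, c3, c4):
    """Trace (C, H, W) shapes through the decoder, starting from C4."""
    x = depth2space_shape(conv_shape(c4, 2048))        # seventh convolution
    for skip, out_channels in ((c3, 1024), (c2, 512), (c1, 256)):
        assert skip[1:] == x[1:], "merged feature maps must share H and W"
        x = (x[0] + skip[0],) + x[1:]                  # channel merging
        x = depth2space_shape(conv_shape(x, out_channels))
    x = conv_shape(x, 128)                             # eleventh convolution
    x = conv_shape(x, 12)                              # twelfth convolution
    return depth2space_shape(x)                        # fifth upsampling

# Hypothetical feature maps with 4 channels each, for a 512 x 512 input.
shape = decoder_output_shape(
    c1=(4, 128, 128), c2=(4, 64, 64), c3=(4, 32, 32), c4=(4, 16, 16)
)
print(shape)  # (3, 512, 512)
```

Under these assumptions the twelve output channels of the twelfth convolution become exactly the three colour channels of the reconstructed image after the last upsampling.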

In addition, to achieve the above object, the present invention further provides an intelligent terminal, wherein the intelligent terminal includes: a memory, a processor and an image compression program or an image decoding program stored on the memory and executable on the processor, the image compression program implementing the steps of the image compression method as described above when executed by the processor or the image decoding program implementing the steps of the image decoding method as described above when executed by the processor.

Further, to achieve the above object, the present invention provides a storage medium storing an image compression program or an image decoding program, the image compression program implementing the steps of the image compression method described above when executed by a processor or the image decoding program implementing the steps of the image decoding method described above when executed by a processor.

In the present invention, the image compression method includes: acquiring a target image, and encoding the target image to obtain a plurality of feature maps; performing cluster quantization on the plurality of feature maps to obtain quantized feature map data; and performing probability estimation and arithmetic coding on the feature map data through a probability estimation network to obtain binary data, the binary data being the compressed image data of the target image. The image decoding method includes: acquiring the binary data; and recovering the cluster-quantized feature maps from the binary data through the probability estimation network and arithmetic decoding, and outputting a reconstructed decoded image corresponding to the target image. By optimizing the multi-scale self-coding network and the probability estimation network jointly, the invention allows the probability estimation network to better estimate the probabilities of data compressed by the lossy model, thereby achieving a better image processing effect.

Drawings

FIG. 1 is a flow chart of a preferred embodiment of the image compression method of the present invention;

FIG. 2 is a flow chart of a preferred embodiment of the image decoding method of the present invention;

FIG. 3 is a schematic diagram of the whole process of image compression and image decoding in the image compression method and the image decoding method of the present invention;

FIG. 4 is a schematic diagram of the operating environment of an intelligent terminal according to a preferred embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

As shown in fig. 1, the image compression method according to the preferred embodiment of the present invention includes the following steps:

Step S11: acquiring a target image, and encoding the target image to obtain a plurality of feature maps.

Specifically, the target image is sequentially subjected to a preset-multiple (preferably 2x in the present invention) downsampling operation, a convolution operation, normalization and nonlinear transformation, and a channel segmentation operation to obtain four feature maps. As shown in FIG. 3, the four feature maps are a first feature map (C1), a second feature map (C2), a third feature map (C3) and a fourth feature map (C4). The four feature maps differ in scale, where scale refers to the width and height of a feature map: feature maps produced by different E_blocks have different widths and heights and, because each E_block downsamples by a factor of 2, the widths and heights of the feature maps produced by two adjacent E_blocks differ by a factor of 2.

E_block refers to a module of the encoding process. It contains a 2x downsampling operation (space2depth 2 ↓), a convolution operation (conv3 × 3/number of channels), normalization and nonlinear transformation (BN + Relu), and a channel segmentation operation (split); some E_blocks do not contain the split operation.

Downsampling (space2depth ↓, space to channel) reduces the image. In general, the main purposes of reducing an image (also called subsampling or downsampling) are to fit the image to the size of a display area and to generate a thumbnail of the image. For example, if the target image size is 512 × 512, the image size after 2x downsampling is 256 × 256. In the present invention, space2depth 2 ↓ halves the spatial size (the width and height of the feature map) by extracting data in a fixed pattern and rearranging the extracted data into the channel dimension of the feature map. The total amount of data is unchanged by the transformation: because the width and height are both halved, the number of feature maps (channels) after rearrangement is 4 times the number before the transformation.
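The rearrangement can be illustrated with a short NumPy sketch. The text does not specify the exact extraction pattern, so the ordering used here (each 2 × 2 spatial block spread over 4 consecutive channels) is an assumption, and the function name `space2depth` is simply borrowed from the label used in FIG. 3.

```python
import numpy as np

def space2depth(x, r=2):
    """Rearrange (N, C, H, W) into (N, C*r*r, H/r, W/r) without losing data."""
    n, c, h, w = x.shape
    assert h % r == 0 and w % r == 0, "height and width must be divisible by r"
    x = x.reshape(n, c, h // r, r, w // r, r)  # split H and W into r x r blocks
    x = x.transpose(0, 1, 3, 5, 2, 4)          # move the in-block offsets next to C
    return x.reshape(n, c * r * r, h // r, w // r)

image = np.arange(1 * 3 * 512 * 512, dtype=np.float32).reshape(1, 3, 512, 512)
out = space2depth(image)
print(out.shape)               # (1, 12, 256, 256)
print(out.size == image.size)  # True
```

Width and height are halved and the channel count is multiplied by 4, so every input value is still present in the output.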

Convolution operation (conv): in its traditional mathematical definition, convolution consists of element-wise multiplication of matrices followed by summation. The convolution operation is used to extract features from the image; different convolution kernels and different calculation modes yield different feature maps.

Normalization and nonlinear transformation (BN + Relu): normalization refers to the batch normalization (BN) operation, which normalizes one batch of data during model training. The nonlinear transformation is a nonlinear operation for which the Relu function is used here, i.e. y = Relu(x). The Relu function is an activation function that runs on the neurons of an artificial neural network and maps the input of a neuron to its output; the activation function is introduced to increase the nonlinearity of the neural network model.

Channel segmentation operation (split): the split() function divides data along the channel dimension. During model processing the data is four-dimensional in NCHW layout, where N is the batch size mentioned above, C is the number of channels, and H and W are the height and width of the feature map; the split operation is performed along dimension C. As shown in FIG. 3, all of these operations act on feature maps. Take the first E_block as an example. Its input is the target image, which can be regarded as a 3-channel feature map. If the batch size is 1, that is, images are processed one at a time, and the width and height of the original image are both 512, the dimensions of the feature map are (1, 3, 512, 512). After space2depth 2 ↓ the feature map has 12 channels and the data dimensions are (1, 12, 256, 256). The convolution operation follows; because the number of convolution kernels is 128, the data dimensions become (1, 128, 256, 256). The subsequent normalization and nonlinear operation does not change the data dimensions, which remain (1, 128, 256, 256). The first E_block has no split operation, so no channel segmentation is performed. For a subsequent E_block that does have a split operation, as shown in FIG. 3, the data (1, 128 + C1, H, W) obtained after convolution is split along the channel dimension into two feature maps, one with data dimensions (1, 128, H, W) and the other with data dimensions (1, C1, H, W). All E_blocks process feature maps in this way. Here "+" in "128 + C1" means "plus" and "C1" denotes the number of channels of the first feature map; every "+" in the present invention means "plus".
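As a minimal illustration of the split along the channel dimension, the following NumPy sketch divides a (1, 128 + C1, H, W) tensor into its two parts. The value chosen for the number of C1 channels and the small spatial size are hypothetical and only keep the example light.

```python
import numpy as np

c1_channels = 8  # hypothetical number of channels of the first feature map
features = np.random.rand(1, 128 + c1_channels, 16, 16)  # (N, 128 + C1, H, W)

# Split at channel index 128 along axis 1 (the C axis of NCHW).
main, c1 = np.split(features, [128], axis=1)
print(main.shape)  # (1, 128, 16, 16)
print(c1.shape)    # (1, 8, 16, 16)
```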

Further, as shown in fig. 3, the target image is input (the target image is an RGB three-channel image, channels are also referred to as feature maps), and the target image is sequentially subjected to a first 2-fold downsampling operation (space2depth 2 ↓, with a scale parameter of 2), a first convolution operation (conv3 × 3/128), a normalization and nonlinear transformation (BN + Relu), and a second convolution operation (conv3 × 3/64); sequentially carrying out a second 2-time down-sampling operation (space2depth 2 ↓), a normalization and nonlinear transformation (BN + Relu), a third convolution operation (conv3 x 3/128+ C1) and a first channel segmentation operation (split) on the image subjected to the second convolution operation, and outputting a first feature map (C1); sequentially carrying out a third 2-time down-sampling operation (space2depth 2 ↓), a normalization and nonlinear transformation (BN + Relu), a fourth convolution operation (conv3 x 3/256+ C2) and a second channel segmentation operation (split) on the image subjected to the first channel segmentation operation, and outputting a second feature map (C2); sequentially carrying out fourth 2-time down-sampling operation (space2depth 2 ↓), normalization and nonlinear transformation (BN + Relu), fifth convolution operation (conv3 x 3/512+ C3) and third channel segmentation operation (split) on the image subjected to the second channel segmentation operation, and outputting a third feature map (C3); and sequentially carrying out fifth 2-time down-sampling operation (space2depth 2 ↓), normalization and nonlinear transformation (BN + Relu) and sixth convolution operation (conv3 x 3/C4) on the image subjected to the third channel segmentation operation, and outputting a fourth feature map (C4).
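The sequence of operations above can be followed as a trace of (C, H, W) shapes. The sketch below assumes a 512 × 512 RGB input, assumes that after each split the 128-, 256- or 512-channel part is the one that continues to the next E_block, and uses hypothetical channel counts for C1 to C4; the function names are illustrative only.

```python
def space2depth_shape(shape, r=2):
    """2x downsampling by rearrangement: channels * r^2, height and width / r."""
    c, h, w = shape
    return (c * r * r, h // r, w // r)

def conv_shape(shape, out_channels):
    """3x3 convolution, stride 1, padding 1: only the channel count changes."""
    _, h, w = shape
    return (out_channels, h, w)

def encoder_shapes(image_shape, c1, c2, c3, c4):
    """Return the shapes of the feature maps C1..C4 for the given channel counts."""
    x = space2depth_shape(image_shape)  # first E_block (no split)
    x = conv_shape(x, 128)              # first convolution
    x = conv_shape(x, 64)               # second convolution
    outputs = {}
    for name, main, extra in (("C1", 128, c1), ("C2", 256, c2), ("C3", 512, c3)):
        x = space2depth_shape(x)
        x = conv_shape(x, main + extra)  # third, fourth and fifth convolutions
        _, h, w = x
        outputs[name] = (extra, h, w)    # split: side output
        x = (main, h, w)                 # split: part passed to the next block
    x = space2depth_shape(x)
    outputs["C4"] = conv_shape(x, c4)    # sixth convolution
    return outputs

shapes = encoder_shapes((3, 512, 512), c1=4, c2=4, c3=4, c4=4)
print(shapes)
# {'C1': (4, 128, 128), 'C2': (4, 64, 64), 'C3': (4, 32, 32), 'C4': (4, 16, 16)}
```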

Specifically, the convolution kernel size in the first convolution operation (conv3 × 3/128) is 3 × 3, the number of output channels is 128, the step size is 1, and the pixel filling is 1; the convolution kernel size in the second convolution operation (conv3 × 3/64) is 3 × 3, the number of output channels is 64, the step size is 1, and the pixel filling is 1; the convolution kernel size in the third convolution operation (conv3 × 3/128 + C1) is 3 × 3, the number of output channels is 128 + the number of channels of the first feature map, the step size is 1, and the pixel filling is 1; the convolution kernel size in the fourth convolution operation (conv3 × 3/256 + C2) is 3 × 3, the number of output channels is 256 + the number of channels of the second feature map, the step size is 1, and the pixel filling is 1; the convolution kernel size in the fifth convolution operation (conv3 × 3/512 + C3) is 3 × 3, the number of output channels is 512 + the number of channels of the third feature map, the step size is 1, and the pixel filling is 1; the convolution kernel size in the sixth convolution operation (conv3 × 3/C4) is 3 × 3, the number of output channels is the number of channels of the fourth feature map, the step size is 1, and the pixel filling is 1.

The step size (stride) is the number of pixels by which the convolution kernel slides when the convolution operation processes the feature map. For example, a step size of 1 means that after the convolution kernel has processed the current pixel region of the feature map, it slides by one pixel to process the next region. The pixel filling (padding) specifies whether, and by how many pixels, the feature map is padded on its top, bottom, left and right sides during the convolution operation.
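The effect of these settings on the feature map size follows the standard output-size formula for convolution, sketched below. The helper name `conv_output_size` is hypothetical; the formula itself is the usual one for a convolution without dilation.

```python
def conv_output_size(size, kernel=3, stride=1, padding=1):
    """Output width (or height) of a convolution applied to an input of `size`."""
    return (size + 2 * padding - kernel) // stride + 1

print(conv_output_size(256))             # 256
print(conv_output_size(256, padding=0))  # 254
```

With a 3 × 3 kernel, a step size of 1 and a pixel filling of 1, the convolution leaves the width and height unchanged, so only the downsampling operations change the spatial size in this model.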

Specifically, the first channel segmentation operation splits a tensor whose number of channels is 128 + the number of channels of the first feature map (C1) into two tensors, one with 128 channels and one with the number of channels of the first feature map (C1); the second channel segmentation operation splits a tensor whose number of channels is 256 + the number of channels of the second feature map (C2) into two tensors, one with 256 channels and one with the number of channels of the second feature map (C2); the third channel segmentation operation splits a tensor whose number of channels is 512 + the number of channels of the third feature map (C3) into two tensors, one with 512 channels and one with the number of channels of the third feature map (C3).

Tensor is a term of art in deep learning that refers to a multidimensional array; for example, the feature maps above are tensors, expressed in the four dimensions NCHW.

Further, the first feature map (C1), the second feature map (C2) and the third feature map (C3) are subjected to a first preset-multiple (preferably 8x) downsampling operation, a second preset-multiple (preferably 4x) downsampling operation and a third preset-multiple (preferably 2x) downsampling operation respectively, so that the scales of the first feature map (C1), the second feature map (C2) and the third feature map (C3) are the same as that of the fourth feature map (C4) and they can be merged with the fourth feature map (C4) in the channel dimension.
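A possible reading of this merging step is sketched below with NumPy. This passage does not name the downsampling operator, so the sketch assumes the same lossless space2depth rearrangement used inside the E_blocks, applied with factors 8, 4 and 2; the channel counts of C1 to C4 and the spatial sizes (those of a 512 × 512 input) are hypothetical.

```python
import numpy as np

def space2depth(x, r):
    """Rearrange (N, C, H, W) into (N, C*r*r, H/r, W/r) without losing data."""
    n, c, h, w = x.shape
    x = x.reshape(n, c, h // r, r, w // r, r)
    x = x.transpose(0, 1, 3, 5, 2, 4)
    return x.reshape(n, c * r * r, h // r, w // r)

# Hypothetical feature maps with 1, 2, 3 and 4 channels.
c1 = np.zeros((1, 1, 128, 128))
c2 = np.zeros((1, 2, 64, 64))
c3 = np.zeros((1, 3, 32, 32))
c4 = np.zeros((1, 4, 16, 16))

# Bring C1, C2 and C3 to the 16 x 16 scale of C4, then merge along channels.
merged = np.concatenate(
    [space2depth(c1, 8), space2depth(c2, 4), space2depth(c3, 2), c4], axis=1
)
print(merged.shape)  # (1, 112, 16, 16)
```

With this assumption the merged tensor holds 64·1 + 16·2 + 4·3 + 4 = 112 channels, and no value of any feature map is discarded.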

Step S12: performing cluster quantization on the plurality of feature maps to obtain quantized feature map data.

First, the data generated in the encoding stage (that is, the feature maps output by the last E_block processing) must be quantized to achieve data compression. For an input input_x, the original multi-scale model quantizes as follows:

(1) batch normalization (BN) operation;

(2) clip[0, u], where clip denotes clipping, followed by mapping to N, i.e.

where N and u are both hyperparameters that must be set manually; N denotes the quantized data range, for example N = 7 denotes quantization to [0, 6]; u has no special meaning beyond narrowing the data range of the feature map and is set, for example, to 4;

(3) a soft quantization function is then applied

so that error back-propagation is guaranteed, where α is a hyperparameter, set for example to 0.5;

(4) and finally rounding, round(x), is applied.

However, this quantization method prevents the multi-scale model and the probability estimation network (for example a PixelCNN model; Parallel Multiscale PixelCNN is a multi-scale neural network that can generate a plurality of pixel values in parallel) from being jointly optimized during network training. This hinders the probability estimation network from modeling the probabilities of the quantized data well, so that better compression cannot be achieved. The present invention therefore adopts a new quantization method based on the idea of clustering, and solves the problem that the two networks cannot be jointly optimized by means of cluster quantization.

The cluster quantization method adopted by the present invention is as follows:

(1) given cluster quantization center points $C = \{c_1, c_2, \ldots, c_L\}$, the distance between each point of input_x and each quantization center point is calculated, and the center point closest to the point is taken as its quantized value, i.e. $Q(\mathrm{input\_x}_i) := \arg\min_j (\mathrm{input\_x}_i - c_j)$, where $i$ denotes the i-th data of input_x, $j$ denotes the j-th quantization center, $j \in [1, L]$, and $L$ is the number of cluster quantization center points;

(2) during training, soft quantization is also required so that errors can be back-propagated; it is performed as

where σ is a hyperparameter, set for example to 1.0;

(3) the soft quantization is then switched to hard quantization and rounding by means of $\mathrm{stop\_gradient}\big(Q(\mathrm{input\_x}_i) - \mathrm{soft\_Q}(\mathrm{input\_x}_i)\big)$. This expression means that no gradient is tracked for the result of $Q(\mathrm{input\_x}_i) - \mathrm{soft\_Q}(\mathrm{input\_x}_i)$, i.e. this part is not differentiated; it is the mathematical expression of the corresponding processing step in the program code;

the expression $\mathrm{stop\_gradient}\big(Q(\mathrm{input\_x}_i) - \mathrm{soft\_Q}(\mathrm{input\_x}_i)\big) + \mathrm{soft\_Q}(\mathrm{input\_x}_i)$ means that in forward propagation the quantization is computed as $Q(\mathrm{input\_x}_i)$, whereas in the backward gradient calculation it is computed as $\mathrm{soft\_Q}(\mathrm{input\_x}_i)$, which ensures that the quantization step remains differentiable in the chain-rule derivation of the model;

(4) finally, rounding, round(x), is applied and the quantized feature map data is output.
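The forward behaviour of steps (1) to (3) can be sketched in plain Python. The soft quantization formula is not reproduced in this text, so the sketch assumes a softmax-weighted average of the center points with weights proportional to exp(-σ·|input_x_i - c_j|), a form commonly used for soft-to-hard quantization; this choice, the use of the absolute difference as the distance, and all function names are assumptions rather than the formula of the invention. Plain Python has no automatic differentiation, so the stop_gradient behaviour is indicated only in a comment.

```python
import math

def hard_quantize(x, centers):
    """Q: replace each value by its nearest center point."""
    return [min(centers, key=lambda c: abs(v - c)) for v in x]

def soft_quantize(x, centers, sigma=1.0):
    """soft_Q (assumed form): softmax-weighted average of the center points."""
    result = []
    for v in x:
        weights = [math.exp(-sigma * abs(v - c)) for c in centers]
        total = sum(weights)
        result.append(sum(w * c for w, c in zip(weights, centers)) / total)
    return result

def straight_through(x, centers, sigma=1.0):
    """Forward value of stop_gradient(Q - soft_Q) + soft_Q.

    In an automatic-differentiation framework the bracketed difference would be
    wrapped in its stop-gradient operator, so that gradients flow only through
    soft_Q while the forward value equals Q.
    """
    hard = hard_quantize(x, centers)
    soft = soft_quantize(x, centers, sigma)
    return [(h - s) + s for h, s in zip(hard, soft)]

centers = [0.0, 1.0, 2.0, 3.0]
data = [0.2, 1.6, 2.9]
print(hard_quantize(data, centers))  # [0.0, 2.0, 3.0]
print(soft_quantize(data, centers))
print(straight_through(data, centers))
```

The values printed by `straight_through` match the hard-quantized values up to floating-point rounding, which is the forward-pass behaviour described in step (3).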

Step S13: performing probability estimation and arithmetic coding on the feature map data through a probability estimation network to obtain binary data, the binary data being the compressed image data of the target image.
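Why the quality of the probability estimate matters can be seen from the ideal code length of arithmetic coding, which is close to the sum of -log2 p over the encoded symbols. The sketch below uses a fixed, hypothetical probability table for simplicity; in the method described here the probabilities come from the probability estimation network, and the function name is illustrative only.

```python
import math

def ideal_code_length_bits(symbols, prob):
    """Sum of -log2 p(s): the length an ideal arithmetic coder approaches."""
    return sum(-math.log2(prob[s]) for s in symbols)

symbols = [0, 0, 1, 2]                         # hypothetical quantized values
uniform = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
fitted = {0: 0.5, 1: 0.25, 2: 0.25}            # closer to the actual frequencies

print(ideal_code_length_bits(symbols, uniform))  # 8.0
print(ideal_code_length_bits(symbols, fitted))   # 6.0
```

The better the estimated probabilities match the quantized data, the fewer bits the arithmetic coder needs.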

A self-coding network consists of a downsampling encoding network and an upsampling decoding network, and the encoding and decoding processes are generally symmetrical in structure. A multi-scale self-coding network builds on the self-coding network: from the feature maps of different scales extracted during encoding, it retains, according to a hyperparameter, the feature maps of the corresponding scales and feeds them into the decoding network.

Further, the present invention also discloses an image decoding method, which in the preferred embodiment of the present invention, as shown in fig. 2, includes the following steps:

Step S21, binary data is acquired, the binary data being the image compression data of the target image.

In the present invention, the image decoding method and the aforementioned image compression method are mutually corresponding processes, and the binary data is obtained after the image compression is completed, that is, the binary data is image compression data of the target image, and therefore, the image decoding method needs to acquire the binary data first.

Step S22, the binary data is passed through a probability estimation network and arithmetic decoding to obtain a plurality of cluster-quantized feature maps, and a reconstructed decoded image corresponding to the target image is output.

Specifically, after the feature maps are cluster-quantized, a probability estimation network performs probability estimation on them. (Probability estimation serves entropy coding: entropy coding can encode data or symbols only if their probabilities are known.) The probability estimation network adopts a parallel PixelCNN model. After probability estimation, the quantized feature map data is converted into binary data by conventional arithmetic coding and stored. The binary data is then decoded by combining the probability estimation network with arithmetic decoding to obtain a quantized first feature map (C1), a quantized second feature map (C2), a quantized third feature map (C3) and a quantized fourth feature map (C4).
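The following Python sketch illustrates why the encoder and decoder must share the same symbol probabilities. It is a toy arithmetic coder using exact fractions, under stated assumptions: a fixed probability table stands in for the PixelCNN probability estimation network, the symbols are hypothetical quantization-center indices, and the code is returned as a single fraction rather than a packed bit stream.

```python
from fractions import Fraction
import math

def cumulative(probs):
    """Assign each symbol its [low, high) slice of the unit interval."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges

def encode(symbols, probs):
    """Narrow the interval once per symbol; return a point inside the final interval."""
    ranges = cumulative(probs)
    low, width = Fraction(0), Fraction(1)
    for s in symbols:
        s_low, s_high = ranges[s]
        low, width = low + width * s_low, width * (s_high - s_low)
    return low + width / 2

def decode(code, count, probs):
    """Recover `count` symbols by repeatedly locating the code in a slice and rescaling."""
    ranges = cumulative(probs)
    out = []
    for _ in range(count):
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= code < s_high:
                out.append(sym)
                code = (code - s_low) / (s_high - s_low)
                break
    return out

# Hypothetical probabilities for three quantization-center indices.
probs = {0: Fraction(1, 2), 1: Fraction(1, 4), 2: Fraction(1, 4)}
symbols = [0, 1, 0, 2, 0, 0]

code = encode(symbols, probs)
decoded = decode(code, len(symbols), probs)

# Ideal code length in bits: the sum of -log2(p) over the coded symbols.
ideal_bits = -sum(math.log2(probs[s]) for s in symbols)
```

Decoding succeeds only because `decode` rebuilds exactly the intervals that `encode` used. The better the estimated probabilities match the data, the smaller `ideal_bits` becomes.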

That is, the obtained C1, C2, C3 and C4 are sequentially processed by D_block modules, as shown in fig. 3, to finally obtain the reconstructed decoded image. D_block is a module at the decoding end and comprises a convolution operation (conv3 × 3/channel number), normalization and nonlinear transformation (BN + relu), 2-fold upsampling (depth2space) and a channel merging operation (concat; some modules have no channel merging operation). For the convolution, normalization and nonlinear transformation, see the description of E_block at the encoding end. What differs from E_block is the channel merging operation and the upsampling, which are the inverses of the channel segmentation and downsampling in E_block. The channel merging operation merges feature maps in the channel dimension, i.e. the channel dimensions are added: for example, if the data dimensions of feature map A are (N, 32, H, W) and those of feature map B are (N, 64, H, W), then merging A and B in the channel dimension forms a new feature map C whose data dimensions are (N, 96, H, W). The depth2space upsampling operation is the inverse of space2depth: using the same value selection and permutation rule as space2depth, it rearranges data from the channel dimension into the width and height dimensions; because the upsampling is 2-fold, the width and height each become 2 times the original and the number of channels becomes 1/4 of the original. Likewise, the object operated on by the D_block module is a feature map, whose data is a four-dimensional tensor (N, C, H, W).
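The channel merging and depth2space operations can be sketched in plain Python on nested lists. This is an illustration under stated assumptions, not the patented code: the batch dimension N is dropped so tensors are (C, H, W), the function names are hypothetical, and the channel-to-pixel ordering is one common convention chosen here because this text does not spell out the exact permutation rule.

```python
def shape(x):
    """Return (C, H, W) of a nested-list feature map."""
    return len(x), len(x[0]), len(x[0][0])

def concat_channels(a, b):
    """Channel merging: stack the channels of two maps with equal H and W."""
    return a + b

def depth_to_space(x, r=2):
    """Rearrange (C*r*r, H, W) into (C, H*r, W*r)."""
    c_in, h, w = shape(x)
    c_out = c_in // (r * r)
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_out)]
    for c in range(c_out):
        for i in range(h * r):
            for j in range(w * r):
                # The source channel encodes the sub-pixel offset (i % r, j % r).
                src = c * r * r + (i % r) * r + (j % r)
                out[c][i][j] = x[src][i // r][j // r]
    return out

def space_to_depth(x, r=2):
    """Inverse rearrangement: (C, H, W) into (C*r*r, H/r, W/r)."""
    c_in, h, w = shape(x)
    out = [[[0] * (w // r) for _ in range(h // r)] for _ in range(c_in * r * r)]
    for c in range(c_in):
        for i in range(h):
            for j in range(w):
                dst = c * r * r + (i % r) * r + (j % r)
                out[dst][i // r][j // r] = x[c][i][j]
    return out

# A 4-channel 2x3 map with distinct values so the permutation can be checked.
x = [[[c * 100 + i * 10 + j for j in range(3)] for i in range(2)] for c in range(4)]
up = depth_to_space(x)          # 1 channel, 4x6
restored = space_to_depth(up)   # back to 4 channels, 2x3

# Channel merging of a 32-channel and a 64-channel map gives 96 channels.
a = [[[0.0] * 2 for _ in range(2)] for _ in range(32)]
b = [[[1.0] * 2 for _ in range(2)] for _ in range(64)]
merged = concat_channels(a, b)
```

Applying `space_to_depth` to the result of `depth_to_space` returns the original map, which is the reciprocal relationship described above.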

Further, as shown in fig. 3, a seventh convolution operation (conv3 × 3/2048), normalization and nonlinear transformation (BN + Relu), and a first 2-fold upsampling operation (depth2space 2×) are sequentially performed on the fourth feature map (C4). In the present invention, the preset-multiple upsampling operation is preferably a 2-fold upsampling operation, which is the opposite of the downsampling operation, i.e. it enlarges the image. A first channel merging operation (concat), an eighth convolution operation (conv3 × 3/1024), normalization and nonlinear transformation (BN + Relu) and a second 2-fold upsampling operation (depth2space 2×) are sequentially performed on the image resulting from the first 2-fold upsampling operation and the third feature map (C3). A second channel merging operation (concat), a ninth convolution operation (conv3 × 3/512), normalization and nonlinear transformation (BN + Relu) and a third 2-fold upsampling operation (depth2space 2×) are sequentially performed on the image resulting from the second 2-fold upsampling operation and the second feature map (C2). A third channel merging operation (concat), a tenth convolution operation (conv3 × 3/256), normalization and nonlinear transformation (BN + Relu) and a fourth 2-fold upsampling operation (depth2space 2×) are sequentially performed on the image resulting from the third 2-fold upsampling operation and the first feature map (C1). Finally, an eleventh convolution operation (conv3 × 3/128), normalization and nonlinear transformation (BN + Relu), a twelfth convolution operation (conv3 × 3/12), and a fifth 2-fold upsampling operation (depth2space 2×) are sequentially performed on the image resulting from the fourth 2-fold upsampling operation, and a reconstructed decoded image (i.e., a 3-channel RGB image) is output.

Wherein the convolution kernel size in the seventh convolution operation (conv3 × 3/2048) is 3 × 3, the number of output channels is 2048, the step size is 1, and the pixel fill is 1; the convolution kernel size in the eighth convolution operation (conv3 × 3/1024) is 3 × 3, the number of output channels is 1024, the step size is 1, and the pixel fill is 1; the convolution kernel size in the ninth convolution operation (conv3 × 3/512) is 3 × 3, the number of output channels is 512, the step size is 1, and the pixel fill is 1; the convolution kernel size in the tenth convolution operation (conv3 × 3/256) is 3 × 3, the number of output channels is 256, the step size is 1, and the pixel fill is 1; the convolution kernel size in the eleventh convolution operation (conv3 × 3/128) is 3 × 3, the number of output channels is 128, the step size is 1, and the pixel fill is 1; the convolution kernel size in the twelfth convolution operation (conv3 × 3/12) is 3 × 3, the number of output channels is 12, the step size is 1, and the pixel fill is 1.

The channel merging operation (concat) merges tensors in the channel dimension; for example, it merges two tensors with channel numbers of 512 and C4 in the channel dimension to obtain a tensor with channel number 512 + C4. By analogy, after all the D_block processing, a 3-channel RGB image is obtained.
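The channel and size bookkeeping of the decoding end can be traced with a short Python sketch. It tracks shapes only and performs no convolution. The convolution widths (2048, 1024, 512, 256, 128, 12) and the kernel, stride and padding values come from the description above. The feature map channel counts and the C4 spatial size are hypothetical inputs, and the function names are assumptions made for illustration.

```python
def conv_out_size(size, kernel=3, stride=1, padding=1):
    """Spatial size after a convolution (standard output-size formula)."""
    return (size + 2 * padding - kernel) // stride + 1

def depth2space_shape(channels, size, factor=2):
    """depth2space divides channels by factor**2 and multiplies the size by factor."""
    return channels // (factor * factor), size * factor

def trace_decoder(c4_size, feature_channels):
    """Trace (conv input channels, output channels, output size) per stage.

    feature_channels holds the channel counts of (C4, C3, C2, C1); hypothetical values.
    """
    conv_widths = [2048, 1024, 512, 256]
    size, carried = c4_size, 0
    stages = []
    for width, skip in zip(conv_widths, feature_channels):
        conv_in = carried + skip      # concat; nothing is carried into the first stage
        size = conv_out_size(size)    # 3x3, stride 1, padding 1 keeps the size
        carried, size = depth2space_shape(width, size)
        stages.append((conv_in, carried, size))
    # Final stage: conv3x3/128, conv3x3/12, then the fifth depth2space.
    size = conv_out_size(conv_out_size(size))
    channels, size = depth2space_shape(12, size)
    stages.append((carried, channels, size))
    return stages

stages = trace_decoder(c4_size=8, feature_channels=(32, 32, 32, 32))
```

With these example inputs the last stage yields 3 channels at 32 times the spatial size of C4, consistent with the five 2-fold upsampling operations and the 3-channel RGB output.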

Further, as shown in fig. 4, based on the above image compression method, the present invention also provides an intelligent terminal, which includes a processor 10, a memory 20 and a display 30. Fig. 4 shows only some of the components of the smart terminal, but it should be understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.

The memory 20 may in some embodiments be an internal storage unit of the intelligent terminal, such as a hard disk or internal memory of the intelligent terminal. In other embodiments the memory 20 may be an external storage device of the intelligent terminal, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash memory card (Flash Card) provided on the intelligent terminal. Further, the memory 20 may include both an internal storage unit and an external storage device of the intelligent terminal. The memory 20 is used for storing application software installed on the intelligent terminal and various data, such as the program code installed on the intelligent terminal. The memory 20 may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 20 stores an image compression program or an image decoding program 40, which can be executed by the processor 10 to implement the image compression method or the image decoding method of the present application.

The processor 10 may be, in some embodiments, a Central Processing Unit (CPU), a microprocessor or other data Processing chip, and is configured to execute program codes stored in the memory 20 or process data, such as executing the image compression method or the image decoding method.

The display 30 may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like in some embodiments. The display 30 is used for displaying information at the intelligent terminal and for displaying a visual user interface. The components 10-30 of the intelligent terminal communicate with each other via a system bus.

In one embodiment, when the processor 10 executes the image compression program 40 in the memory 20, the following steps are implemented:

The obtaining of the plurality of feature maps specifically includes:

Sequentially performing the preset-multiple downsampling operation, convolution operation, normalization and nonlinear transformation, and channel segmentation operation on the target image to obtain four feature maps specifically includes:

After the image subjected to the third channel segmentation operation is sequentially subjected to a preset-multiple downsampling operation, normalization and nonlinear transformation, and a sixth convolution operation, and the fourth feature map is output, the method further includes:

The preset multiple downsampling operation is 2 times of downsampling operation, and the 2 times of downsampling operation is used for reducing the size of the image by half;

The first channel segmentation operation divides a tensor whose channel number is 128 plus the channel number of the first feature map into two tensors, one with 128 channels and one with the channel number of the first feature map;

The clustering and quantizing the plurality of feature maps specifically includes:

acquiring a plurality of feature maps;

during training, soft quantization is performed in the way of

Wherein σ is a hyperparameter;

round(x) is applied, and the quantized feature map data is output.
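The first channel segmentation operation mentioned above can be sketched in Python on a nested-list tensor without the batch dimension. This is a sketch under assumptions: the channel count of the first feature map (32 here) is hypothetical, the function names are illustrative, and which part of the split becomes the first feature map is not stated in this text, so the ordering below is an assumption.

```python
def split_channels(feature_map, keep):
    """Split a (C, H, W) nested-list tensor into its first `keep` channels and the rest."""
    if not 0 < keep < len(feature_map):
        raise ValueError("keep must be between 1 and C - 1")
    return feature_map[:keep], feature_map[keep:]

def make_map(channels, height, width, fill=0.0):
    """Build a constant (C, H, W) nested-list tensor for demonstration."""
    return [[[fill] * width for _ in range(height)] for _ in range(channels)]

c1_channels = 32                               # hypothetical channel count of C1
merged = make_map(128 + c1_channels, 4, 4)     # tensor before the split
trunk, first_feature_map = split_channels(merged, keep=128)
```

Concatenating `trunk` and `first_feature_map` along the channel axis restores `merged`, which mirrors the channel merging operation used at the decoding end.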

Or in another embodiment, when the processor 10 executes the image decoding program 40 in the memory 20, the following steps are implemented:

The plurality of feature maps include: a first feature map, a second feature map, a third feature map and a fourth feature map.

Obtaining the plurality of cluster-quantized feature maps from the binary data through the probability estimation network and arithmetic decoding, and outputting the reconstructed decoded image, specifically includes:

The convolution kernel size in the seventh convolution operation is 3 × 3, the number of output channels is 2048, the step size is 1, and the pixel filling is 1;

The present invention also provides a storage medium, wherein the storage medium stores an image compression program, and the image compression program realizes the steps of the image compression method as described above when executed by a processor.

In summary, the present invention provides an image compression method, an image decoding method, an intelligent terminal and a storage medium, where the image compression method includes: acquiring a target image, and encoding the target image to acquire a plurality of characteristic maps; clustering and quantizing the plurality of feature maps to obtain quantized feature map data; and performing probability estimation and arithmetic coding on the feature map data through a probability estimation network to obtain binary data, wherein the binary data is image compression data of the target image. The image decoding method includes: acquiring binary data, wherein the binary data is image compression data of the target image; and obtaining a plurality of clustering quantized feature maps by the binary data through a probability estimation network and arithmetic decoding, and outputting a reconstructed decoded image. The invention carries out image compression and decoding by combining the multi-scale self-coding network and the probability estimation network for synchronous optimization, and the probability estimation network can better carry out probability estimation on data compressed by a lossy model, thereby achieving better image processing effect.

Of course, it will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware (such as a processor, a controller, etc.), and the program may be stored in a computer readable storage medium, and when executed, the program may include the processes of the above method embodiments. The storage medium may be a memory, a magnetic disk, an optical disk, etc.

It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims

1. An image compression method, characterized in that it comprises the steps of:

2. The image compression method according to claim 1, wherein the obtaining a plurality of feature maps specifically comprises:

3. The image compression method according to claim 2, wherein the sequentially performing down-sampling operation, convolution operation, normalization and nonlinear transformation, and channel segmentation operation by a preset multiple on the target image to obtain four feature maps specifically comprises:

4. The image compression method according to claim 3, wherein the image after the third channel segmentation operation is sequentially subjected to a preset-multiple downsampling operation, a normalization and nonlinear transformation, and a sixth convolution operation, and a fourth feature map is output, and then the method further comprises:

5. The image compression method according to claim 3, wherein the preset-multiple down-sampling operation is a 2-time down-sampling operation, and the 2-time down-sampling operation is used to reduce the size of the image by half;

6. The image compression method according to claim 3, wherein the first channel segmentation operation divides a tensor whose channel number is 128 plus the channel number of the first feature map into two tensors, one with 128 channels and one with the channel number of the first feature map;

7. The image compression method according to claim 4, wherein the clustering quantization processing of the plurality of feature maps specifically includes:

acquiring a plurality of feature maps;

during training, soft quantization is performed in the way of

Wherein σ is a hyperparameter;

round(x) is applied, and the quantized feature map data is output.

8. An image decoding method, characterized by comprising the steps of:

9. The image decoding method according to claim 8, wherein the plurality of feature maps include: a first feature map, a second feature map, a third feature map and a fourth feature map.

10. The image decoding method according to claim 9, wherein the obtaining of the clustering-quantized multiple feature maps from the binary data by a probability estimation network and arithmetic decoding and outputting of the reconstructed decoded image specifically include:

11. The image decoding method according to claim 10, wherein the convolution kernel size in the seventh convolution operation is 3 x 3, the number of output channels is 2048, the step size is 1, and the pixel fill is 1;

12. An intelligent terminal, characterized in that the intelligent terminal includes: a memory, a processor, and an image compression program or an image decoding program stored on the memory and executable on the processor, the image compression program implementing the steps of the image compression method according to any one of claims 1 to 7 when executed by the processor, or the image decoding program implementing the steps of the image decoding method according to any one of claims 8 to 11 when executed by the processor.

13. A storage medium, characterized in that it stores an image compression program or an image decoding program, said image compression program implementing the steps of the image compression method according to any one of claims 1 to 7 when executed by a processor or said image decoding program implementing the steps of the image decoding method according to any one of claims 8 to 11 when executed by a processor.