CN114581544A - Image compression method, computer device and computer storage medium


Info

Publication number
CN114581544A
CN114581544A (application CN202210496064.XA)
Authority
CN
China
Prior art keywords
target
image
sparse
feature map
neural network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210496064.XA
Other languages
Chinese (zh)
Inventor
梁永生
尹珊至
鲍有能
李超
谭文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202210496064.XA
Publication of CN114581544A
Legal status: Pending


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 - Image coding
    • G06T 9/002 - Image coding using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04N - PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 1/00 - Scanning, transmission or reproduction of documents or the like, e.g. facsimile transmission; Details thereof
    • H04N 1/46 - Colour picture communication systems
    • H04N 1/64 - Systems for the transmission or the storage of the colour picture signal; Details therefor, e.g. coding or decoding means therefor
    • H04N 1/648 - Transmitting or storing the primary (additive or subtractive) colour signals; Compression thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of this application disclose an image compression method, a computer device, and a computer storage medium. A code rate control module at the encoding end maps a weight factor into a code rate control vector and multiplies that vector by a sparse feature map to obtain a latent feature representation at the specified code rate. A quantization unit quantizes the floating-point values of this latent representation into integers, and a lossless coding module entropy-codes the integer latent representation into a binary code stream. As a result, only one target neural network image compression model needs to be trained: adjusting the weight factor alone is enough to change the compression ratio, code rate, and reconstruction quality of the image arbitrarily. Multiple image compression models therefore need neither be trained nor deployed on the computer device, which greatly reduces the hardware storage overhead of the user equipment. Because the sparse feature map contains far fewer nonzero elements, the computation of subsequent modules is reduced and computing resources are saved.

Description

Image compression method, computer device, and computer storage medium
Technical Field
Embodiments of this application relate to the field of image processing, and in particular to an image compression method, a computer device, and a computer storage medium.
Background
Image compression is an important technique in signal processing and computer vision. Its goal is to preserve the reconstruction quality of the transmitted image as much as possible while reducing the number of binary bits required to transmit and store digital images. In recent years, many deep-learning-based neural network image compression methods have achieved better performance than traditional image compression methods such as JPEG and BPG. However, several problems must still be solved before neural network image compression can be deployed in practical application scenarios.
A neural network image compression method compresses an image with a pre-trained neural network image compression model: the image to be compressed is fed into the model, and the model outputs the fully compressed image. However, a pre-trained model has a fixed compression ratio, and for the same input image it produces an output with a fixed code rate and reconstruction quality. This does not match practical requirements: image receivers demand diverse code rates and reconstruction qualities, and the compression ratio must adapt to continuously changing network transmission bandwidth to facilitate image transmission. A fixed compression ratio, output code rate, and reconstruction quality therefore cannot meet these diverse real-world requirements.
To obtain images with different compression ratios, code rates, and reconstruction qualities, one approach is to train several different neural network image compression models, each corresponding to one combination of compression ratio, code rate, and reconstruction quality. This approach, however, requires training a large number of models, which both increases the hardware storage overhead of the user equipment and demands substantial labor, so it is neither economical nor practical.
Disclosure of Invention
Embodiments of this application provide an image compression method, a computer device, and a computer storage medium for compressing an image so that its compression ratio, code rate, and reconstruction quality can be adjusted arbitrarily.
A first aspect of the embodiments of this application provides an image compression method, applied to a computer device, comprising the following steps:
acquiring a target image to be compressed;
the method comprises the steps of obtaining a pre-trained target neural network image compression model, wherein the target neural network image compression model is obtained by training a plurality of groups of image training samples through a deep learning algorithm, and comprises at least one sparse compression unit, a coding end code rate control module connected with the at least one sparse compression unit, a quantization unit connected with the coding end code rate control module and a lossless coding module connected with the quantization unit;
inputting the target image into the target neural network image compression model so that at least one sparse compression unit of the target neural network image compression model performs feature extraction on the target image to obtain a sparse feature map of the target image,
the coding end code rate control module maps a preset weight factor into a code rate control vector; the code rate control vector is multiplied by the sparse feature map to obtain a latent feature representation at the specified code rate; the quantization unit quantizes the floating-point numbers in that latent feature representation into integers to obtain an integer latent feature representation; and the lossless coding module entropy-codes the integer latent feature representation to obtain a binary code stream.
Preferably, each sparse compression unit comprises a feature extraction module and a complexity control module;
inputting the target image into the target neural network image compression model, so that at least one sparse compression unit of the target neural network image compression model performs feature extraction on the target image to obtain a sparse feature map of the target image, and the method comprises the following steps:
inputting the target image into the target neural network image compression model, so that the feature extraction module performs feature extraction on the target image to obtain a first target feature map of the target image, and the complexity control module performs sparse operation on the first target feature map to obtain the sparse feature map.
Preferably, inputting the target image into the target neural network image compression model so that the complexity control module performs a sparse operation on the first target feature map includes:
inputting the target image into the target neural network image compression model, so that the complexity control module extracts intra-channel importance information of each channel of the first target feature map to obtain an initial vector with dimension equal to the number of channels of the first target feature map, extracts correlation among the channels of the first target feature map according to the initial vector to obtain an importance vector with dimension equal to the number of channels of the first target feature map, performs dot product operation on the importance vector and a preset threshold control vector to obtain a threshold vector with dimension equal to the number of channels of the first target feature map, and,
comparing the element value of each channel of the first target feature map with the threshold value of the channel specified by the threshold value vector, and if the element value of a channel is greater than the threshold value of the channel specified by the threshold value vector, determining that the binary mask of the position corresponding to the element value of the channel is 1; if the element value of the channel is smaller than the threshold value of the channel specified by the threshold value vector, determining that the binary mask of the position corresponding to the element value of the channel is 0, and obtaining the binary mask corresponding to each channel; and respectively performing dot product on the element values of the channels and the binary masks corresponding to the element values to obtain the sparse feature map.
A second aspect of embodiments of the present application provides a computer device, including:
an acquisition unit configured to acquire a target image to be compressed;
the acquisition unit is further used for acquiring a pre-trained target neural network image compression model, the target neural network image compression model is obtained by training a plurality of groups of image training samples through a deep learning algorithm, and the target neural network image compression model comprises at least one sparse compression unit, a coding end code rate control module connected with the at least one sparse compression unit, a quantization unit connected with the coding end code rate control module and a lossless coding module connected with the quantization unit;
a compression unit, configured to input the target image into the target neural network image compression model, so that at least one sparse compression unit of the target neural network image compression model performs feature extraction on the target image to obtain a sparse feature map of the target image, and,
the coding end code rate control module maps a preset weight factor into a code rate control vector; the code rate control vector is multiplied by the sparse feature map to obtain a latent feature representation at the specified code rate; the quantization unit quantizes the floating-point numbers in that latent feature representation into integers to obtain an integer latent feature representation; and the lossless coding module entropy-codes the integer latent feature representation to obtain a binary code stream.
A third aspect of embodiments of the present application provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the method of the foregoing first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer storage medium having instructions stored therein, which when executed on a computer, cause the computer to perform the method of the first aspect.
According to the technical scheme, the embodiment of the application has the following advantages:
The code rate control module at the encoding end of the target neural network image compression model maps the weight factor into a code rate control vector and multiplies it by the sparse feature map to obtain a latent feature representation at the specified code rate. The quantization unit quantizes the floating-point values of this latent representation into integers to obtain an integer latent feature representation, and the lossless coding module entropy-codes the integer latent representation into a binary code stream. Consequently, only one target neural network image compression model needs to be trained: when different compression ratios, code rates, and image reconstruction qualities are required, adjusting the weight factor alone changes them, achieving arbitrary adjustment of the compression ratio, code rate, and reconstruction quality of the image. There is no need to train multiple image compression models or to deploy multiple models on the computer device, which greatly reduces both the hardware storage overhead of the user equipment and the labor required of personnel.
Meanwhile, the sparse operation of the complexity control module greatly reduces the element values of the first target feature map, so the sparse feature map contains far fewer nonzero elements than the first target feature map. This simplifies the feature-map data, reduces the computation of subsequent modules, and makes it easier to deploy the target neural network image compression model of this embodiment on computer devices with limited computing power. Moreover, the sparsity of the feature map is related to characteristics of the input image such as its texture, content, and feature regions, so the model's use of computing resources adapts to the image: the less the texture of the input image varies, the sparser the feature map and the fewer computing resources are used, which saves computing resources and avoids waste.
Drawings
FIG. 1 is a schematic flow chart of an image compression method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart illustrating an image compression method according to an embodiment of the present application;
fig. 3 is a schematic view of a scenario in which the complexity control module performs sparse operation on the first target feature map in the embodiment of the present application;
FIG. 4 is a schematic structural diagram of a cascaded sparse compression unit in an embodiment of the present application;
FIG. 5 is a schematic diagram of a cascaded sparse decompression unit according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer device in an embodiment of the present application;
fig. 7 is a schematic structural diagram of another computer device in the embodiment of the present application.
Detailed Description
Embodiments of this application provide an image compression method, a computer device, and a computer storage medium for compressing an image so that its compression ratio, code rate, and reconstruction quality can be adjusted arbitrarily.
The following describes an image compression method in the embodiment of the present application:
referring to fig. 1, an embodiment of an image compression method in the embodiment of the present application includes:
101. acquiring a target image to be compressed;
the method of the embodiment can be applied to any computer equipment with certain computing capability and data processing capability, and when the computer equipment is a terminal, the computer equipment can be terminal equipment such as a Personal Computer (PC) and a desktop computer; when the computer equipment is a server, the computer equipment can be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, and can also be a cloud server for providing basic cloud computing services such as a cloud database, cloud computing, a big data and artificial intelligence platform and the like.
102. Obtaining a pre-trained target neural network image compression model, wherein the target neural network image compression model is trained on multiple groups of image training samples with a deep learning algorithm and comprises at least one sparse compression unit, a coding end code rate control module connected to the at least one sparse compression unit, a quantization unit connected to the coding end code rate control module, and a lossless coding module connected to the quantization unit;
In this embodiment, the target neural network image compression model may be obtained by training on multiple groups of image training samples with a deep learning algorithm. The sparse compression unit extracts features from the target image to obtain its sparse feature map; the coding end code rate control module controls the compression ratio, code rate, and reconstruction quality of the target image according to an arbitrarily set weight factor; and the lossless coding module compresses and encodes the target image. Each constituent unit and module of the target neural network image compression model may be built from neural network structures, for example input layers, convolutional layers, activation functions, pooling layers, and fully connected layers. By training an image compression model composed of such neural network structures with a deep learning algorithm, a neural network model capable of performing image compression is obtained.
103. Inputting the target image into the target neural network image compression model, so that at least one sparse compression unit of the model performs feature extraction on the target image to obtain its sparse feature map; the coding end code rate control module maps a preset weight factor into a code rate control vector and multiplies it by the sparse feature map to obtain a latent feature representation at the specified code rate; the quantization unit quantizes the floating-point numbers in that latent feature representation into integers to obtain an integer latent feature representation; and the lossless coding module entropy-codes the integer latent feature representation to obtain a binary code stream;
To control and arbitrarily adjust image compression indicators such as the compression ratio, code rate, and reconstruction quality of the target image, this embodiment adds a code rate control module at the encoding end of the image compression model. Specifically, the computer device inputs the target image into the target neural network image compression model. At least one sparse compression unit extracts features from the target image to obtain a sparse feature map, which is passed to the coding end code rate control module. That module maps the preset weight factor into a code rate control vector and multiplies it by the sparse feature map to obtain a latent feature representation at the specified code rate, which is passed to the quantization unit. The quantization unit quantizes the floating-point values of the latent representation into integers to obtain an integer latent feature representation, and the lossless coding module entropy-codes it into a binary code stream. This binary code stream is the result of compressing the target image; the computer device may store it or transmit it to other computer devices.
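The encoding pipeline just described can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the scalar-to-vector rate-control mapping, the toy feature map, and the bit-string stand-in for entropy coding are all assumptions.

```python
# Minimal sketch of the encoding pipeline described above. The per-channel
# rate-control mapping, the toy feature map, and the bit-string stand-in
# for entropy coding are illustrative assumptions, not the patent's
# actual network definitions.

def rate_control_vector(weight_factor, num_channels):
    # The patent maps the weight factor with a learned module; here we
    # simply broadcast it to one scale per channel.
    return [weight_factor] * num_channels

def encode(sparse_feature_map, weight_factor):
    g = rate_control_vector(weight_factor, len(sparse_feature_map))
    # Multiply the code rate control vector by the sparse feature map
    # channel-wise to get the latent representation at the chosen rate.
    latent = [[gc * v for v in ch] for gc, ch in zip(g, sparse_feature_map)]
    # Quantize: round each floating-point value to the nearest integer.
    q = [[round(v) for v in ch] for ch in latent]
    # Stand-in for lossless entropy coding: serialize integers to bits.
    stream = "".join(format(v & 0xFF, "08b") for ch in q for v in ch)
    return q, stream

fmap = [[0.0, 1.6], [0.0, 0.7]]          # toy 2-channel sparse feature map
q, stream = encode(fmap, weight_factor=2.0)
print(q)   # [[0, 3], [0, 1]]
```

Raising the weight factor scales the latent representation up, so more quantization levels survive rounding and the code rate increases; lowering it has the opposite effect.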
Because the weight factor fed to the coding end code rate control module can be controlled and adjusted arbitrarily according to the actual image compression requirement, and because the weight factor is mapped into a code rate control vector, the code rate control vector can likewise be adjusted arbitrarily. The compression ratio, code rate, and image reconstruction quality are then controlled through this vector during compression, achieving arbitrary adjustment and control of each image compression indicator.
The lossless coding module entropy-codes the integer latent feature representation using a probability distribution estimated by an entropy model, which may be, for example, a Gaussian distribution model or another probability distribution model.
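As a hedged illustration of how a Gaussian entropy model prices quantized symbols, the sketch below computes the ideal code length of an integer symbol from the probability mass a Gaussian assigns to the symbol's unit-wide bin; the parameters mu and sigma are made-up values here, since in the actual model they are estimated by the trained entropy model.

```python
import math

# Hedged sketch: an ideal entropy coder spends about -log2(p) bits on a
# symbol, where p is the probability mass the Gaussian entropy model
# assigns to the integer's unit-wide bin [q - 0.5, q + 0.5].
# mu and sigma are made-up here; the model would estimate them.

def gaussian_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def symbol_bits(q, mu, sigma):
    p = gaussian_cdf(q + 0.5, mu, sigma) - gaussian_cdf(q - 0.5, mu, sigma)
    return -math.log2(p)

cheap = symbol_bits(0, mu=0.0, sigma=1.0)    # symbol near the mean
costly = symbol_bits(4, mu=0.0, sigma=1.0)   # rare outlier symbol
print(cheap, costly)
```

A symbol near the predicted mean costs only a bit or so, while an outlier costs an order of magnitude more, which is why a well-fitted entropy model shortens the binary code stream.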
In this embodiment, the code rate control module at the encoding end of the target neural network image compression model maps the weight factor into a code rate control vector and multiplies it by the sparse feature map to obtain a latent feature representation at the specified code rate. The quantization unit quantizes the floating-point values into integers to obtain an integer latent feature representation, and the lossless coding module entropy-codes it into a binary code stream. Therefore only one target neural network image compression model needs to be trained: when different compression ratios, code rates, and image reconstruction qualities are required, adjusting the weight factor alone changes them, achieving arbitrary adjustment without training multiple image compression models or deploying multiple models on the computer device. This greatly reduces the hardware storage overhead of the user equipment and the labor required of personnel.
The embodiments of the present application will be described in further detail below on the basis of the aforementioned embodiment shown in fig. 1. Referring to fig. 2, another embodiment of the image compression method in the embodiment of the present application includes:
201. acquiring a target image to be compressed;
the target image may be a pre-stored image or an image input when a user needs to compress a certain image, and is not limited herein.
202. Obtaining a pre-trained target neural network image compression model, wherein the target neural network image compression model is trained on multiple groups of image training samples with a deep learning algorithm and comprises at least one sparse compression unit, a coding end code rate control module connected to the at least one sparse compression unit, a quantization unit connected to the coding end code rate control module, and a lossless coding module connected to the quantization unit;
In this embodiment, the modules of the target neural network image compression model may be composed of neural network structures, for example convolutional layers, fully connected layers, pooling layers, and output layers. When a module includes a convolutional or fully connected layer, it can perform convolution operations; when it includes a pooling layer, it can perform pooling operations.
Unlike a traditional image compression model, the target neural network image compression model of this embodiment further includes a code rate control module at the encoding end, which may be composed of neural network structures such as convolutional, pooling, or fully connected layers. This module adjusts the code rate and compression ratio of the image according to the preset weight factor, and thereby adjusts the reconstruction quality of the image.
203. Inputting the target image into the target neural network image compression model, so that at least one sparse compression unit of the model performs feature extraction on the target image to obtain its sparse feature map; the coding end code rate control module maps a preset weight factor into a code rate control vector and multiplies it by the sparse feature map to obtain a latent feature representation at the specified code rate; the quantization unit quantizes the floating-point numbers in that latent feature representation into integers to obtain an integer latent feature representation; and the lossless coding module entropy-codes the integer latent feature representation to obtain a binary code stream;
in this embodiment, each sparse compression unit includes a feature extraction module and a complexity control module, after a target image is input to the target neural network image compression model, the feature extraction module of the sparse compression unit performs feature extraction on the target image to obtain a first target feature map of the target image, and the complexity control module performs sparse operation on the first target feature map to obtain a sparse feature map.
In a preferred embodiment, the feature extraction module includes a down-sampling module and a nonlinear feature enhancement module. The down-sampling module down-samples the target image and expands its number of channels to obtain a first initial feature map of the target image; the nonlinear feature enhancement module then extracts the spatial and channel features of the first initial feature map and enhances it according to those features to obtain the first target feature map.
In another preferred embodiment, the feature extraction module may include only the down-sampling module. In that case, after the down-sampling module down-samples the target image and expands its number of channels to obtain the first initial feature map, the first initial feature map is fed directly to the complexity control module as the first target feature map, without processing by a nonlinear feature enhancement module, and the complexity control module performs the sparse operation on it.
Compared with leaving the first initial feature map unprocessed, processing it with the nonlinear feature enhancement module strengthens the feature map's ability to characterize the image, optimizes the probability-distribution characteristics of the image, and improves compression performance.
Specifically, the down-sampling module down-samples the target image in the spatial dimension to reduce its spatial size. Down-sampling may be implemented by pooling, convolution, decimation, and the like, and the down-sampling factor may be an integer multiple such as 2, 4, 6, or 8.
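A minimal sketch of the spatial down-sampling step, here a 2x average pooling on a single channel; in the actual model this step would typically be a strided convolution that also expands the channel count, so the function below is illustrative only.

```python
# Illustrative sketch of spatial down-sampling by a factor of 2 via
# average pooling on one channel. Each non-overlapping 2x2 block is
# replaced by its mean, halving both spatial dimensions.

def avg_pool_2x(channel):
    h, w = len(channel), len(channel[0])
    return [[(channel[i][j] + channel[i][j + 1] +
              channel[i + 1][j] + channel[i + 1][j + 1]) / 4.0
             for j in range(0, w, 2)]
            for i in range(0, h, 2)]

x = [[1.0, 3.0, 5.0, 7.0],
     [1.0, 3.0, 5.0, 7.0],
     [2.0, 2.0, 6.0, 6.0],
     [2.0, 2.0, 6.0, 6.0]]
print(avg_pool_2x(x))   # [[2.0, 6.0], [2.0, 6.0]]
```

Applying the function twice would give a down-sampling factor of 4, matching the integer multiples mentioned above.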
The nonlinear feature enhancement module may process the first initial feature map by extracting its spatial and channel features and applying nonlinear and/or linear operations to the feature map and these features to obtain the first target feature map. Linear operations include, but are not limited to, simple operations such as addition, subtraction, multiplication, and division, and combined operations such as affine transforms and convolutions; nonlinear operations include, but are not limited to, operations based on functions such as the tangent, Sigmoid, Softplus, softmax, and ReLU functions.
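The combination of a channel statistic, a Sigmoid nonlinearity, and linear rescaling can be sketched as a simple channel gate. The patent does not pin down these exact computations; the `enhance` function below is an assumed squeeze-and-excitation-style illustration.

```python
import math

# Assumed illustration of the nonlinear enhancement idea: derive a channel
# statistic, pass it through the Sigmoid nonlinearity, and linearly rescale
# the channel with the resulting gate. The patent's exact operations are
# not specified; this is only one plausible combination.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def enhance(feature_map):
    out = []
    for channel in feature_map:
        mean = sum(channel) / len(channel)        # channel feature
        gate = sigmoid(mean)                      # nonlinear operation
        out.append([gate * v for v in channel])   # linear rescaling
    return out

fmap = [[1.0, 2.0], [-1.0, -2.0]]
enhanced = enhance(fmap)
```

Channels with large positive statistics are passed through almost unchanged (gate near 1), while channels with small or negative statistics are attenuated (gate near 0), which is one way such a module can emphasize informative channels.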
In a preferred mode of this embodiment, the sparse operation performed by the complexity control module on the first target feature map may proceed as follows. The complexity control module extracts intra-channel importance information from each channel of the first target feature map to obtain an initial vector whose dimension equals the number of channels; extracts the correlation among the channels from the initial vector to obtain an importance vector of the same dimension; and multiplies the importance vector element-wise with a preset threshold control vector to obtain a threshold vector, again of the same dimension. Each element value of each channel of the first target feature map is then compared with that channel's threshold as specified by the threshold vector: if the element value is greater than the threshold, the binary mask at the corresponding position is 1; if it is smaller, the binary mask at that position is 0. Finally, the binary mask is multiplied element-wise with the first target feature map to obtain the sparse feature map.
Fig. 3 illustrates a scene in which the complexity control module performs the sparse operation on the first target feature map. As shown in the figure, the first target feature map contains four channels A, B, C and D, each of width and height 2; the element values of channel A, for example, are 0.1, 0.2, 0.3 and 0.4. The complexity control module extracts the intra-channel importance information of each channel by average pooling, yielding an initial vector v whose dimension equals the number of channels, namely the 4-dimensional vector (0.25, 0.65, 0.65, 0.25). It then extracts the correlation among the channels from v: a 1-dimensional convolution is applied to v, with a convolution kernel of size 3 and value [1, 0, 1] and zero padding at both ends of v, and the result of the convolution is activated with the Sigmoid function σ to obtain the importance vector V_d, whose dimension equals the number of channels of the first target feature map.
The importance vector V_d is multiplied element-wise with the preset threshold control vector V_ad to obtain the threshold vector th, whose dimension again equals the number of channels. Each element value of each channel is then compared with that channel's threshold as given by th. If an element value exceeds the threshold, the binary mask at the corresponding position is 1: for example, if th specifies a threshold of 0.329 for channel A, the element value 0.4 of channel A exceeds it, so the mask at that position is 1. If an element value is below the threshold, the mask at the corresponding position is 0: the element value 0.3 of channel A is below the threshold 0.329, so the mask at that position is 0. Proceeding in the same way yields the binary mask of every channel. Finally, the element values of each channel are multiplied element-wise with their binary masks; for channel A this gives 0, 0, 0 and 0.4, and repeating this for every channel produces the sparse feature map.
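The sparse operation of fig. 3 can be sketched end to end as follows (a hypothetical illustration, not the patent's implementation; the function name, the plain-list feature-map representation, and the threshold control value of 0.5 per channel are assumptions, the last chosen because it reproduces the 0.329 threshold of the example):

```python
import math

def sparse_operation(feature_map, threshold_control):
    # feature_map: list of channels, each a flat list of element values
    # threshold_control: preset threshold control vector V_ad, one value per channel
    # Step 1: intra-channel importance by average pooling -> initial vector v
    v = [sum(ch) / len(ch) for ch in feature_map]
    # Step 2: inter-channel correlation: 1-D convolution with kernel [1, 0, 1]
    # and zero padding at both ends of v
    padded = [0.0] + v + [0.0]
    conv = [padded[i - 1] + padded[i + 1] for i in range(1, len(v) + 1)]
    # Step 3: Sigmoid activation -> importance vector V_d
    v_d = [1.0 / (1.0 + math.exp(-x)) for x in conv]
    # Step 4: element-wise product with V_ad -> threshold vector th
    th = [a * b for a, b in zip(v_d, threshold_control)]
    # Step 5: binary mask (1 where an element exceeds its channel threshold,
    # 0 otherwise), multiplied element-wise with the feature map
    sparse = [[x if x > t else 0.0 for x in ch] for ch, t in zip(feature_map, th)]
    return th, sparse
```

With channel A = [0.1, 0.2, 0.3, 0.4] and initial vector (0.25, 0.65, 0.65, 0.25), the convolution gives 0.65 for channel A and σ(0.65) ≈ 0.657, so a threshold control value of 0.5 yields the threshold 0.5 × 0.657 ≈ 0.329 quoted above, and the masked channel A becomes 0, 0, 0, 0.4.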
Here the Sigmoid activation function σ may be σ(x) = 1 / (1 + e^(−x)), where x is a value of the initial vector v after the 1-dimensional convolution.
Besides average pooling, the intra-channel importance information may also be extracted by a convolution operation that reduces the spatial dimension of each channel of the first target feature map to 1 × 1. When extracting the correlation among the channels from the initial vector v, the input initial vector v and the output importance vector V_d must have equal dimensions; implementations include, but are not limited to, convolution, fully connected layers, and the like.
As the scenario in fig. 3 shows, the element values of the first target feature map are greatly reduced after the sparse operation of the complexity control module, so the number of non-zero elements in the sparse feature map is much smaller than in the first target feature map. This shrinks the data volume of the feature map and the computation of subsequent modules, which makes the target neural network image compression model of this embodiment easy to deploy on computer equipment with limited computing capability. Moreover, the sparsity of the feature map depends on characteristics of the input image such as its texture, content and feature regions, so the model's use of computing resources adapts to the image: for example, the less the texture of the input image varies, the sparser the feature map and the fewer computing resources are used, which saves computing resources and avoids waste.
In a preferred embodiment of this embodiment, the at least one sparse compression unit of the target neural network image compression model is arranged in sequence, with each sparse compression unit connected to its neighbors. For example, as shown in fig. 4, when each sparse compression unit includes a down-sampling module, a nonlinear feature enhancement module, and a complexity control module, the units are connected head to tail: the complexity control module of the 1st sparse compression unit is connected to the down-sampling module of the 2nd, the complexity control module of the 2nd to the down-sampling module of the 3rd, and so on, forming cascaded sparse compression units. In this cascade, the first sparse compression unit performs feature extraction on the target image to obtain a sparse feature map of the target image, and every subsequent unit performs feature extraction on the output of the previous unit: the second unit on the output of the first, the third on the output of the second, and so on, until the last sparse compression unit extracts features from the output of its predecessor to obtain the final sparse feature map.
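The cascading described above amounts to simple function composition; a minimal sketch (hypothetical names, assuming each sparse compression unit is a callable mapping a feature map to a sparser one):

```python
def run_cascade(units, target_image):
    # units: cascaded sparse compression units in order; the first consumes
    # the target image, each later unit consumes its predecessor's output
    x = target_image
    for unit in units:
        x = unit(x)
    return x
```

The same composition applies to the decoder-side cascade of sparse decompression units described later.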
204. Acquire the binary code stream and input it into the target neural network image compression model, so that the lossless decoding module of the target neural network image compression model entropy-decodes the binary code stream to obtain the integer potential feature representation of the specified code rate; the decoding-end code rate control module maps the preset weight factor into a code rate control vector and multiplies it by the integer potential feature representation to obtain the potential feature representation of the specified code rate; and at least one sparse decompression unit decompresses the potential feature representation of the specified code rate to obtain a reconstructed image corresponding to the target image;
in a preferred embodiment of this embodiment, the target neural network image compression model further includes a lossless decoding module, a decoding-end code rate control module connected to the lossless decoding module, and at least one sparse decompression unit connected to the decoding-end code rate control module. The computer equipment can decode and decompress the binary code stream based on the modules to obtain a reconstructed image corresponding to the target image.
The entropy model used by the lossless decoding module for entropy decoding the binary code stream is the same as the entropy model used by the lossless encoding module for entropy encoding the integer potential feature representation.
In a preferred embodiment of this embodiment, each sparse decompression unit includes an upsampling module, a nonlinear feature enhancement module, and a complexity control module. When a sparse decompression unit decompresses the potential feature representation of the specified code rate, the upsampling module performs feature recovery on it and transforms its number of channels to obtain a second initial feature map; the nonlinear feature enhancement module extracts the spatial features and channel features of the second initial feature map and strengthens it accordingly to obtain a second target feature map; and the complexity control module performs the sparse operation on the second target feature map to obtain the reconstructed image.
In a preferred embodiment of this embodiment, the sparse operation performed by the complexity control module on the second target feature map may proceed as follows. The complexity control module extracts intra-channel importance information from each channel of the second target feature map to obtain an initial vector whose dimension equals the number of channels; extracts the correlation among the channels from the initial vector to obtain an importance vector of the same dimension; and multiplies the importance vector element-wise with a preset threshold control vector to obtain a threshold vector, again of the same dimension. Each element value of each channel of the second target feature map is then compared with that channel's threshold as specified by the threshold vector: if the element value is greater than the threshold, the binary mask at the corresponding position is 1; if it is smaller, the mask at that position is 0, yielding the binary mask of each channel. Finally, the element values of each channel are multiplied element-wise with their binary masks to obtain the reconstructed image.
The step of performing the sparse operation on the second target feature map is similar to the step of performing the sparse operation on the first target feature map by the complexity control module, and is not repeated here.
Similar to the cascading of multiple sparse compression units, multiple sparse decompression units may also form a cascade. Specifically, the at least one sparse decompression unit of the target neural network image compression model is arranged in sequence, with each sparse decompression unit connected to its neighbors. For example, as shown in fig. 5, when each sparse decompression unit includes an upsampling module, a nonlinear feature enhancement module, and a complexity control module, the units are connected in series: the complexity control module of the 1st sparse decompression unit is connected to the upsampling module of the 2nd, the complexity control module of the 2nd to the upsampling module of the 3rd, and so on, forming cascaded sparse decompression units. In this cascade, the first sparse decompression unit decompresses the potential feature representation of the specified code rate to obtain a sparse feature map, and every subsequent unit decompresses the output of the previous unit: the second unit decompresses the output of the first, the third the output of the second, and so on, until the last sparse decompression unit decompresses the output of its predecessor to obtain the reconstructed image corresponding to the target image.
In this embodiment, the number of the cascaded sparse compression units may be determined according to the size of the input image, the size of the required potential feature representation, the complexity limitation of the target neural network image compression model, and other factors. Similarly, the number of the cascaded sparse decompression units may also be determined according to the above factors.
In a preferred embodiment of this embodiment, the training step of the target neural network image compression model may include:
the method comprises the steps of obtaining a plurality of image training samples, obtaining an initial neural network image compression model, inputting the plurality of image training samples into the initial neural network image compression model, obtaining a reconstructed image corresponding to the image training samples output by the initial neural network image compression model, and stopping training when the relation between the image training samples and the reconstructed image corresponding to the image training samples meets a convergence condition to obtain a target neural network image compression model.
Specifically, the deep learning procedure by which the initial neural network image compression model learns from the image training samples may be: first, disable the encoding-end and decoding-end code rate control modules and input the plurality of image training samples into the initial neural network image compression model to obtain the reconstructed images it outputs for those samples; when the first loss function meets its convergence condition, enable the encoding-end and decoding-end code rate control modules, input the image training samples into the model again to obtain its reconstructed images, and stop training when the second loss function meets its convergence condition, yielding the target neural network image compression model. Both the first loss function and the second loss function are functions of the network sparsity and of the reconstruction quality of the reconstructed images corresponding to the image training samples.
For example, the first loss function may be L = R + λ·d(x, x̂) + γ·s, where d(x, x̂) is the distortion between the input image x and the output image x̂, s is the measure of network sparsity, R is the code rate, λ is the weight factor, and γ is the sparsity weight.
The second loss function may be L = Σ_{λ∈Λ} [R_λ + λ·d(x, x̂_λ) + γ·s_λ], where Λ is the set of available λ values and R_λ, x̂_λ and s_λ are the code rate, reconstructed image and sparsity measure obtained under weight factor λ.
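The first loss function can be written out directly; in the sketch below (hypothetical, not the patent's implementation) mean-square error stands in for the distortion d and the share of zeros in the binary mask for the sparsity measure s, both of which the text lists as permissible choices:

```python
def first_loss(rate, x, x_hat, mask, lam, gamma):
    # L = R + lambda * d(x, x_hat) + gamma * s
    d = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)  # MSE distortion
    s = mask.count(0) / len(mask)  # proportion of 0 values in the binary mask
    return rate + lam * d + gamma * s
```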
Methods for optimizing the loss functions with a deep learning algorithm include, but are not limited to, stochastic gradient descent, Newton's method, the conjugate gradient method, and the like, and deep learning frameworks usable for this step include, but are not limited to, TensorFlow, PyTorch, MindSpore, and the like. Measures of reconstruction quality include, but are not limited to, mean square error, multi-scale structural similarity, and the like, and measures of network sparsity include, but are not limited to, the proportion of 0 values in the binary mask produced by the complexity control module during the sparse operation on the first target feature map.
In this embodiment, the encoding-end code rate control module maps the preset weight factor to a code rate control vector; its implementation includes, but is not limited to, convolution, pooling, fully connected layer operations, and the like. Likewise, the decoding-end code rate control module maps the preset weight factor to a code rate control vector, with the same range of implementations.
The quantization unit quantizes the floating-point numbers in the potential feature representation of the specified code rate into integers; quantization is the process of approximating the continuous values of a signal by a finite set of discrete values, and may be implemented by uniform quantization, non-uniform quantization, scalar quantization, vector quantization, and the like.
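As one illustration of the uniform scalar option, the sketch below (hypothetical; the function names and the step size are assumptions) maps each floating-point value to the nearest multiple of a quantization step, expressed as an integer index, and recovers the discrete approximation on the way back:

```python
def uniform_quantize(values, step=1.0):
    # uniform scalar quantization: approximate each continuous value by the
    # nearest multiple of `step`, returned as an integer index
    return [int(round(v / step)) for v in values]

def dequantize(indices, step=1.0):
    # recover the discrete approximation of the original values
    return [i * step for i in indices]
```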
The entropy coding performed by the lossless coding module and the entropy decoding performed by the lossless decoding module may use, but are not limited to, Huffman coding, arithmetic coding, range coding, and the like.
The upsampling module performs feature recovery on the potential feature representation of the specified code rate and transforms its number of channels; implementations include, but are not limited to, interpolation, convolution, sub-pixel layers, and the like, and the upsampling factor may be an integer such as 2, 4, 6, or 8.
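Mirroring the earlier downsampling example, a minimal sketch of the interpolation option (hypothetical, not the patent's implementation; nearest-neighbour 2× upsampling of one channel):

```python
def upsample_2x_nearest(plane):
    # nearest-neighbour interpolation: each element becomes a 2x2 block,
    # doubling both spatial dimensions of the channel
    out = []
    for row in plane:
        wide = [v for v in row for _ in (0, 1)]  # duplicate each value horizontally
        out.append(wide)
        out.append(list(wide))                   # duplicate the row vertically
    return out
```

A sub-pixel layer would instead trade channels for spatial resolution, rearranging C·r² channels into an r× larger plane.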
The above describes the image compression method in the embodiment of the present application, and referring to fig. 6, the following describes the computer device in the embodiment of the present application, where an embodiment of the computer device in the embodiment of the present application includes:
an acquisition unit 601 configured to acquire a target image to be compressed;
the obtaining unit 601 is further configured to obtain a pre-trained target neural network image compression model, where the target neural network image compression model is obtained by training multiple groups of image training samples through a deep learning algorithm, and the target neural network image compression model includes at least one sparse compression unit, a coding end code rate control module connected to the at least one sparse compression unit, a quantization unit connected to the coding end code rate control module, and a lossless coding module connected to the quantization unit;
the compression unit 602 is configured to input a target image into a target neural network image compression model, so that at least one sparse compression unit of the target neural network image compression model performs feature extraction on the target image to obtain a sparse feature map of the target image, the code rate control module at a coding end maps a preset weight factor into a code rate control vector, multiplies the code rate control vector by the sparse feature map to obtain a potential feature representation of a specified code rate, the quantization unit quantizes floating points in the potential feature representation of the specified code rate into integers to obtain an integer potential feature representation, and the lossless coding module performs entropy coding on the integer potential feature representation to obtain a binary code stream.
In a preferred implementation manner of this embodiment, each sparse compression unit includes a feature extraction module and a complexity control module;
the compression unit 602 is specifically configured to input the target image into a target neural network image compression model, so that the feature extraction module performs feature extraction on the target image to obtain a first target feature map of the target image, and the complexity control module performs sparse operation on the first target feature map to obtain a sparse feature map.
In a preferred embodiment of this embodiment, the compression unit 602 is specifically configured to input the target image into the target neural network image compression model, so that the complexity control module extracts intra-channel importance information from each channel of the first target feature map to obtain an initial vector whose dimension equals the number of channels; extracts the correlation among the channels from the initial vector to obtain an importance vector of the same dimension; and multiplies the importance vector element-wise with a preset threshold control vector to obtain a threshold vector, again of the same dimension. Each element value of each channel of the first target feature map is then compared with that channel's threshold as specified by the threshold vector: if the element value is greater than the threshold, the binary mask at the corresponding position is 1; if it is smaller, the mask at that position is 0, yielding the binary mask of each channel. Finally, the element values of each channel are multiplied element-wise with their binary masks to obtain the sparse feature map.
In a preferred implementation manner of this embodiment, the feature extraction module includes a down-sampling module and a nonlinear feature enhancement module;
the compressing unit 602 is specifically configured to input the target image into the target neural network image compression model, so that the down-sampling module performs down-sampling on the target image and expands the number of channels of the target image to obtain a first initial feature map of the target image, and the nonlinear feature enhancing module extracts a spatial feature and a channel feature of the first initial feature map, and enhances the first initial feature map according to the spatial feature and the channel feature to obtain a first target feature map.
In a preferred embodiment of this embodiment, at least one sparse compression unit is arranged in sequence and each sparse compression unit is connected to an adjacent sparse compression unit;
the compression unit 602 is specifically configured to input a target image into a target neural network image compression model, so that a first sparse compression unit performs feature extraction on the target image to obtain a sparse feature map of the target image, and each sparse compression unit behind the first sparse compression unit performs feature extraction on the output of a previous sparse compression unit to obtain a sparse feature map until a last sparse compression unit performs feature extraction on the output of the previous sparse compression unit to obtain a sparse feature map.
In a preferred implementation manner of this embodiment, the target neural network image compression model further includes a lossless decoding module, a decoding-end code rate control module connected to the lossless decoding module, and at least one sparse decompression unit connected to the decoding-end code rate control module;
the computer device further includes:
a decompression unit 603, configured to acquire the binary code stream and input it into the target neural network image compression model, so that the lossless decoding module of the model entropy-decodes the binary code stream to obtain the integer potential feature representation of the specified code rate; the decoding-end code rate control module maps the preset weight factor into a code rate control vector and multiplies it by the integer potential feature representation to obtain the potential feature representation of the specified code rate; and at least one sparse decompression unit decompresses the potential feature representation of the specified code rate to obtain a reconstructed image corresponding to the target image.
In a preferred implementation manner of this embodiment, each sparse decompression unit includes an upsampling module, a nonlinear feature enhancement module, and a complexity control module;
the decompression unit 603 is specifically configured to input the binary code stream into the target neural network image compression model, so that the upsampling module performs feature recovery on the potential feature representation at the specified code rate and transforms the number of channels of the potential feature representation at the specified code rate to obtain a second initial feature map, the nonlinear feature enhancement module extracts spatial features and channel features of the second initial feature map, and enhances the second initial feature map according to the spatial features and the channel features of the second initial feature map to obtain a second target feature map, and the complexity control module performs sparse operation on the second target feature map to obtain a reconstructed image.
In a preferred embodiment of this embodiment, the decompression unit 603 is specifically configured to input the binary code stream into the target neural network image compression model, so that the complexity control module extracts intra-channel importance information from each channel of the second target feature map to obtain an initial vector whose dimension equals the number of channels; extracts the correlation among the channels from the initial vector to obtain an importance vector of the same dimension; and multiplies the importance vector element-wise with a preset threshold control vector to obtain a threshold vector, again of the same dimension. Each element value of each channel of the second target feature map is then compared with that channel's threshold as specified by the threshold vector: if the element value is greater than the threshold, the binary mask at the corresponding position is 1; if it is smaller, the mask at that position is 0, yielding the binary mask of each channel. Finally, the element values of each channel are multiplied element-wise with their binary masks to obtain the reconstructed image.
In a preferred implementation manner of this embodiment, the computer device further includes:
a training unit 604, configured to perform a training step of the target neural network image compression model, where the training step includes: obtaining a plurality of image training samples; obtaining an initial neural network image compression model; inputting a plurality of image training samples into an initial neural network image compression model to obtain a reconstructed image corresponding to the image training sample output by the initial neural network image compression model, and stopping training when the relationship between the image training sample and the reconstructed image corresponding to the image training sample meets a convergence condition to obtain a target neural network image compression model.
In a preferred embodiment of this embodiment, the training unit 604 is specifically configured to turn off the code rate control module at the encoding end and the code rate control module at the decoding end; inputting a plurality of image training samples into an initial neural network image compression model to obtain a reconstructed image corresponding to the image training samples output by the initial neural network image compression model; when the first loss function meets the convergence condition, a code rate control module at the encoding end and a code rate control module at the decoding end are opened; inputting a plurality of image training samples into an initial neural network image compression model to obtain a reconstructed image corresponding to the image training sample output by the initial neural network image compression model, and stopping training when a second loss function meets a convergence condition to obtain a target neural network image compression model; the first loss function and the second loss function are both functions related to network sparsity and reconstruction quality of a reconstructed image corresponding to the image training sample.
In this embodiment, operations performed by each unit in the computer device are similar to those described in the embodiments shown in fig. 1 to fig. 2, and are not described again here.
In this embodiment, the encoding-end code rate control module in the target neural network image compression model can map a weight factor into a code rate control vector and multiply it by the sparse feature map to obtain a potential feature representation of a specified code rate; the quantization unit quantizes the floating-point numbers in that representation into integers to obtain an integer potential feature representation; and the lossless coding module entropy-codes the integer potential feature representation to obtain a binary code stream. Consequently, only one target neural network image compression model needs to be trained: when different compression ratios, code rates and image reconstruction qualities are required, they can be obtained simply by adjusting the weight factor. This realizes arbitrary adjustment of the compression ratio, code rate and reconstruction quality without training or deploying multiple image compression models, which greatly reduces the hardware storage overhead of the user equipment and the labor of personnel.
Meanwhile, since the sparse operation of the complexity control module zeroes out a large fraction of the element values of the first target feature map, the sparse feature map contains far fewer nonzero elements than the first target feature map. This simplifies the feature-map data volume, reduces the computation required by subsequent modules, and allows the target neural network image compression model of this embodiment to be deployed conveniently on computer devices with limited computing capability. Moreover, the sparsity of the feature map depends on characteristics of the input image such as its texture, content, and feature regions, so the model's use of computing resources adapts to the image. For example, the less the texture of the input image varies, the sparser the feature map and the fewer computing resources are used, which saves resources and avoids waste.
Referring to fig. 7, a computer device in an embodiment of the present application is described below. One embodiment of the computer device includes:
the computer device 700 may include one or more Central Processing Units (CPUs) 701 and a memory 705, where the memory 705 stores one or more applications or data.
The memory 705 may be volatile or persistent storage. The program stored in the memory 705 may include one or more modules, each of which may include a series of instruction operations on the computer device. Further, the central processing unit 701 may be configured to communicate with the memory 705 and to execute the series of instruction operations in the memory 705 on the computer device 700.
The computer device 700 may also include one or more power supplies 702, one or more wired or wireless network interfaces 703, one or more input/output interfaces 704, and/or one or more operating systems, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, etc.
The central processing unit 701 may perform the operations performed by the computer device in the embodiments shown in fig. 1 to fig. 2, which are not described herein again.
An embodiment of the present application further provides a computer storage medium. In one embodiment, the computer storage medium stores instructions that, when executed on a computer, cause the computer to perform the operations performed by the computer device in the embodiments shown in fig. 1 to fig. 2.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on this understanding, the essence of the technical solution of the present application, or the part contributing over the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

Claims (12)

1. An image compression method applied to a computer device, the method comprising:
acquiring a target image to be compressed;
the method comprises the steps of obtaining a pre-trained target neural network image compression model, wherein the target neural network image compression model is obtained by training a plurality of groups of image training samples through a deep learning algorithm, and comprises at least one sparse compression unit, a coding end code rate control module connected with the at least one sparse compression unit, a quantization unit connected with the coding end code rate control module and a lossless coding module connected with the quantization unit;
inputting the target image into the target neural network image compression model so that at least one sparse compression unit of the target neural network image compression model performs feature extraction on the target image to obtain a sparse feature map of the target image,
the coding end code rate control module maps a preset weight factor to a code rate control vector, and multiplies the code rate control vector with the sparse feature map to obtain a latent feature representation at the specified code rate; the quantization unit quantizes floating-point values in the latent feature representation at the specified code rate to integers to obtain an integer latent feature representation; and the lossless coding module performs entropy coding on the integer latent feature representation to obtain a binary code stream.
2. The method according to claim 1, wherein each of the sparse compression units comprises a feature extraction module and a complexity control module;
inputting the target image into the target neural network image compression model, so that at least one sparse compression unit of the target neural network image compression model performs feature extraction on the target image to obtain a sparse feature map of the target image, and the method comprises the following steps:
inputting the target image into the target neural network image compression model, so that the feature extraction module performs feature extraction on the target image to obtain a first target feature map of the target image, and the complexity control module performs sparse operation on the first target feature map to obtain the sparse feature map.
3. The method of claim 2, wherein inputting the target image into the target neural network image compression model to cause the complexity control module to perform a sparse operation on the first target feature map comprises:
inputting the target image into the target neural network image compression model, so that the complexity control module extracts intra-channel importance information of each channel of the first target feature map to obtain an initial vector with dimension equal to the number of channels of the first target feature map, extracts correlation among the channels of the first target feature map according to the initial vector to obtain an importance vector with dimension equal to the number of channels of the first target feature map, performs dot product operation on the importance vector and a preset threshold control vector to obtain a threshold vector with dimension equal to the number of channels of the first target feature map, and,
comparing the element value of each channel of the first target feature map with the threshold value of the channel specified by the threshold value vector, and if the element value of a channel is greater than the threshold value of the channel specified by the threshold value vector, determining that the binary mask of the position corresponding to the element value of the channel is 1; if the element value of the channel is smaller than the threshold value of the channel specified by the threshold value vector, determining that the binary mask of the position corresponding to the element value of the channel is 0, and obtaining the binary mask corresponding to each channel; and respectively performing dot product on the element values of each channel and the binary masks corresponding to the element values to obtain the sparse feature map.
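Purely as an illustration (not part of the claim), the channel-wise thresholding and masking described above can be sketched in NumPy. The two learned importance-extraction steps are replaced here by simple placeholder computations (channel means and a softmax), which are assumptions, not the claimed networks:

```python
import numpy as np

rng = np.random.default_rng(2)
feature_map = rng.normal(size=(4, 6, 6))   # first target feature map (C, H, W)

# Placeholder for the learned intra-channel importance extraction:
# mean absolute activation per channel -> initial vector, shape (C,).
initial_vector = np.abs(feature_map).mean(axis=(1, 2))

# Placeholder for the learned cross-channel step: a softmax over the
# initial vector -> importance vector, shape (C,).
importance = np.exp(initial_vector) / np.exp(initial_vector).sum()

# Element-wise product with a preset threshold control vector gives
# one threshold per channel.
threshold_control = np.full(4, 2.0)
thresholds = importance * threshold_control

# Binary mask: 1 where an element exceeds its channel's threshold,
# 0 otherwise; the mask then sparsifies the feature map.
mask = (feature_map > thresholds[:, None, None]).astype(feature_map.dtype)
sparse_feature_map = feature_map * mask
```

Because each channel gets its own threshold, channels the importance networks deem informative keep more elements, while unimportant channels are pruned harder.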
4. The method of claim 2, wherein the feature extraction module comprises a down-sampling module and a non-linear feature enhancement module;
inputting the target image into the target neural network image compression model, so that the feature extraction module performs feature extraction on the target image to obtain a first target feature map of the target image, and the method comprises the following steps:
inputting the target image into the target neural network image compression model, so that the down-sampling module performs down-sampling on the target image and expands the number of channels of the target image to obtain a first initial feature map of the target image, and the nonlinear feature enhancement module extracts the spatial features and the channel features of the first initial feature map and enhances the first initial feature map according to the spatial features and the channel features to obtain the first target feature map.
5. The method according to claim 2, wherein the at least one sparse compression unit is arranged in sequence, and each sparse compression unit is connected to its adjacent sparse compression units;
inputting the target image into the target neural network image compression model, so that the feature extraction module performs feature extraction on the target image to obtain a first target feature map of the target image, and the complexity control module performs sparse operation on the first target feature map to obtain the sparse feature map, wherein the sparse feature map comprises:
and inputting the target image into the target neural network image compression model, so that a first sparse compression unit performs feature extraction on the target image to obtain a sparse feature map, each sparse compression unit after the first performs feature extraction on the output of the preceding sparse compression unit to obtain a sparse feature map, and the last sparse compression unit performs feature extraction on the output of its preceding sparse compression unit to obtain the sparse feature map of the target image.
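As a minimal sketch of this cascade, each unit can be modelled as a function consuming its predecessor's sparse output; the hypothetical `sparse_compress_unit` below stands in for one unit's feature extraction plus complexity control and is not the claimed implementation:

```python
import numpy as np

def sparse_compress_unit(x, seed):
    # Stand-in for one unit: a fixed random projection (feature
    # extraction) followed by zeroing small activations (complexity
    # control), yielding a sparse feature map.
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(x.shape[-1], x.shape[-1]))
    y = x @ w
    return np.where(np.abs(y) > 1.0, y, 0.0)

x = np.random.default_rng(3).normal(size=(2, 8))
for seed in (0, 1, 2):                 # three units arranged in sequence
    x = sparse_compress_unit(x, seed)  # each consumes the previous output
sparse_feature_map = x                 # output of the last unit
```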
6. The method of any one of claims 1 to 5, wherein the target neural network image compression model further comprises a lossless decoding module, a decoding end code rate control module connected to the lossless decoding module, and at least one sparse decompression unit connected to the decoding end code rate control module;
the method further comprises the following steps:
obtaining the binary code stream, and inputting the binary code stream into the target neural network image compression model, so that a lossless decoding module of the target neural network image compression model performs entropy decoding on the binary code stream to obtain an integer latent feature representation at a specified code rate, and,
the decoding end code rate control module maps a preset weight factor to a code rate control vector, and multiplies the code rate control vector with the integer latent feature representation at the specified code rate to obtain a latent feature representation at the specified code rate, and the at least one sparse decompression unit decompresses the latent feature representation at the specified code rate to obtain a reconstructed image corresponding to the target image.
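For illustration only: if the decoder-side multiplication is the inverse of the encoder-side scaling (an assumption; the claim does not fix the exact mapping), it can be sketched as follows, with `decoding_rate_control` an illustrative name:

```python
import numpy as np

def decoding_rate_control(latent_int, weight_factor):
    # Assumed inverse of the encoder-side scaling: multiplying by the
    # reciprocal of the weight factor undoes the channel-wise rate
    # scaling applied before quantization.
    v = np.full(latent_int.shape[0], 1.0 / weight_factor)
    return latent_int * v[:, None, None]

# Entropy decoding would yield an integer latent representation like this.
latent_int = np.array([[[2, -1], [0, 3]]], dtype=np.int32)   # (1, 2, 2)
latent = decoding_rate_control(latent_int, 0.5)
```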
7. The method of claim 6, wherein each of the sparse decompression units comprises an upsampling module, a nonlinear feature enhancement module, and a complexity control module;
inputting the binary code stream into the target neural network image compression model to enable the at least one sparse decompression unit to decompress the latent feature representation at the specified code rate, including:
inputting the binary code stream into the target neural network image compression model, so that the up-sampling module performs feature recovery on the latent feature representation at the specified code rate and transforms its number of channels to obtain a second initial feature map, and,
the nonlinear feature enhancement module extracts the spatial features and the channel features of the second initial feature map, strengthens the second initial feature map according to the spatial features and the channel features of the second initial feature map to obtain a second target feature map, and performs sparse operation on the second target feature map by the complexity control module to obtain the reconstructed image.
8. The method of claim 7, wherein inputting the binary code stream into the target neural network image compression model to cause the complexity control module to perform a sparse operation on the second target feature map comprises:
inputting the binary code stream into the target neural network image compression model, so that the complexity control module extracts intra-channel importance information of each channel of the second target feature map to obtain an initial vector with dimension equal to the number of channels of the second target feature map, extracts correlation among the channels of the second target feature map according to the initial vector to obtain an importance vector with dimension equal to the number of channels of the second target feature map, performs dot product operation on the importance vector and a preset threshold control vector to obtain a threshold vector with dimension equal to the number of channels of the second target feature map, and,
comparing the element value of each channel of the second target feature map with the threshold value of the channel specified by the threshold vector, and if the element value of a channel is greater than the threshold value of the channel specified by the threshold vector, determining that the binary mask of the position corresponding to the element value of the channel is 1; if the element value of the channel is smaller than the threshold value of the channel specified by the threshold value vector, determining that the binary mask of the position corresponding to the element value of the channel is 0, and obtaining the binary mask corresponding to each channel; and respectively carrying out dot product on the element value of each channel and the binary mask corresponding to the element value to obtain the reconstructed image.
9. The method of claim 6, wherein the step of training the target neural network image compression model comprises:
obtaining a plurality of image training samples;
obtaining an initial neural network image compression model;
inputting a plurality of image training samples into the initial neural network image compression model to obtain a reconstructed image corresponding to the image training sample output by the initial neural network image compression model, and stopping training when the relationship between the image training samples and the reconstructed image corresponding to the image training samples meets a convergence condition to obtain the target neural network image compression model.
10. The method according to claim 9, wherein inputting a plurality of image training samples into the initial neural network image compression model to obtain a reconstructed image corresponding to the image training samples output by the initial neural network image compression model, includes:
closing the code rate control module at the encoding end and the code rate control module at the decoding end;
inputting a plurality of image training samples into the initial neural network image compression model to obtain a reconstructed image corresponding to the image training samples output by the initial neural network image compression model;
opening the code rate control module at the encoding end and the code rate control module at the decoding end when the first loss function meets the convergence condition;
inputting a plurality of image training samples into the initial neural network image compression model to obtain a reconstructed image corresponding to the image training samples output by the initial neural network image compression model, and stopping training when a second loss function meets a convergence condition to obtain the target neural network image compression model;
wherein the first loss function and the second loss function are both functions related to the network sparsity and the reconstruction quality of the reconstructed image corresponding to the image training sample.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method according to any one of claims 1 to 10 when executing the computer program.
12. A computer storage medium having stored therein instructions that, when executed on a computer, cause the computer to perform the method of any one of claims 1 to 10.
CN202210496064.XA 2022-05-09 2022-05-09 Image compression method, computer device and computer storage medium Pending CN114581544A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210496064.XA CN114581544A (en) 2022-05-09 2022-05-09 Image compression method, computer device and computer storage medium

Publications (1)

Publication Number Publication Date
CN114581544A (en) 2022-06-03

Family

ID=81769203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210496064.XA Pending CN114581544A (en) 2022-05-09 2022-05-09 Image compression method, computer device and computer storage medium

Country Status (1)

Country Link
CN (1) CN114581544A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6104441A (en) * 1998-04-29 2000-08-15 Hewlett Packard Company System for editing compressed image sequences
US20080240583A1 (en) * 2007-03-30 2008-10-02 Jones Paul W Method for controlling the amount of compressed data
CN104025561A (en) * 2012-11-23 2014-09-03 华为技术有限公司 Image compression method and image processing apparatus
WO2020157754A1 (en) * 2019-01-30 2020-08-06 Technology Innovation Momentum Fund (Israel) Limited Partnership System and method for reconstruction of compressed signal data using artificial neural networking
US10785485B1 (en) * 2016-04-29 2020-09-22 Matrox Graphics Inc. Adaptive bit rate control for image compression
CN112203089A (en) * 2020-12-03 2021-01-08 中国科学院自动化研究所 Image compression method, system and device based on code rate control of sparse coding
CN113379858A (en) * 2021-05-31 2021-09-10 超级视线科技有限公司 Image compression method and device based on deep learning
CN113554720A (en) * 2021-07-22 2021-10-26 南京航空航天大学 Multispectral image compression method and system based on multidirectional convolutional neural network


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Shanzhi Yin et al., "Universal Efficient Variable-Rate Neural Image Compression", arXiv *
Kong Fanqiang et al., "End-to-end multispectral image compression method based on convolutional neural network", Chinese Journal of Lasers *
Jiang Yuan et al., "Research on laser spectral image compression with deep learning networks", Laser Journal *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115115721B (en) * 2022-07-26 2024-03-15 北京大学深圳研究生院 Pruning method and device for neural network image compression model
CN115115721A (en) * 2022-07-26 2022-09-27 北京大学深圳研究生院 Pruning method and device for neural network image compression model
CN116112694B (en) * 2022-12-09 2023-12-15 无锡天宸嘉航科技有限公司 Video data coding method and system applied to model training
CN116112694A (en) * 2022-12-09 2023-05-12 无锡天宸嘉航科技有限公司 Video data coding method and system applied to model training
CN115914630A (en) * 2023-01-06 2023-04-04 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image compression method, device, equipment and storage medium
CN115834890B (en) * 2023-02-08 2023-04-28 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image compression method, device, equipment and storage medium
CN115834890A (en) * 2023-02-08 2023-03-21 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image compression method, device, equipment and storage medium
CN115866252A (en) * 2023-02-09 2023-03-28 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image compression method, device, equipment and storage medium
CN115866252B (en) * 2023-02-09 2023-05-02 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image compression method, device, equipment and storage medium
CN115776571B (en) * 2023-02-10 2023-04-28 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image compression method, device, equipment and storage medium
CN115776571A (en) * 2023-02-10 2023-03-10 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Image compression method, device, equipment and storage medium
CN115866253A (en) * 2023-02-27 2023-03-28 鹏城实验室 Self-modulation-based inter-channel transformation method, device, terminal and medium
CN116260969A (en) * 2023-05-15 2023-06-13 鹏城实验室 Self-adaptive channel progressive coding and decoding method, device, terminal and medium
CN116260969B (en) * 2023-05-15 2023-08-18 鹏城实验室 Self-adaptive channel progressive coding and decoding method, device, terminal and medium

Similar Documents

Publication Publication Date Title
CN114581544A (en) Image compression method, computer device and computer storage medium
CN111787323B (en) Variable bit rate generation type compression method based on counterstudy
US10834415B2 (en) Devices for compression/decompression, system, chip, and electronic device
CN113259665B (en) Image processing method and related equipment
CN111641832B (en) Encoding method, decoding method, device, electronic device and storage medium
KR102299958B1 (en) Systems and methods for image compression at multiple, different bitrates
CN110753225A (en) Video compression method and device and terminal equipment
CN113313774A (en) Image processing method, image processing device, electronic equipment and storage medium
WO2023207836A1 (en) Image encoding method and apparatus, and image decompression method and apparatus
Otair et al. Improved near-lossless technique using the Huffman coding for enhancing the quality of image compression
CN114071141A (en) Image processing method and equipment
CN111696026A (en) Reversible gray scale map algorithm and computing device based on L0 regular term
CN114708343A (en) Three-dimensional point cloud coding and decoding method, compression method and device based on map dictionary learning
CN113450421B (en) Unmanned aerial vehicle reconnaissance image compression and decompression method based on enhanced deep learning
CN114422784A (en) Unmanned aerial vehicle multispectral remote sensing image compression method based on convolutional neural network
Zafari et al. Attention-based generative neural image compression on solar dynamics observatory
CN113962882A (en) JPEG image compression artifact eliminating method based on controllable pyramid wavelet network
CN111479286A (en) Data processing method for reducing communication flow of edge computing system
Zhuang et al. A robustness and low bit-rate image compression network for underwater acoustic communication
CN113554719B (en) Image encoding method, decoding method, storage medium and terminal equipment
CN113038134B (en) Picture processing method, intelligent terminal and storage medium
CN116260969B (en) Self-adaptive channel progressive coding and decoding method, device, terminal and medium
CN111565317A (en) Image compression method, coding and decoding network training method and device and electronic equipment
CN114998457B (en) Image compression method, image decompression method, related device and readable storage medium
CN117376486A (en) Picture compression method, device, equipment and medium based on artificial intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220603