CN117915107B - Image compression system, image compression method, storage medium and chip - Google Patents


Info

Publication number
CN117915107B
CN117915107B (application CN202410318128.6A)
Authority
CN
China
Prior art keywords: potential representation, network, target image, image compression, image
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410318128.6A
Other languages: Chinese (zh)
Other versions: CN117915107A (en)
Inventor
赵旭
李晓雷
甘杰
李德建
宋波
刘素伊
谢海燕
Current Assignee
Beijing Smartchip Microelectronics Technology Co Ltd
Original Assignee
Beijing Smartchip Microelectronics Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Smartchip Microelectronics Technology Co Ltd filed Critical Beijing Smartchip Microelectronics Technology Co Ltd
Priority to CN202410318128.6A priority Critical patent/CN117915107B/en
Publication of CN117915107A publication Critical patent/CN117915107A/en
Application granted granted Critical
Publication of CN117915107B publication Critical patent/CN117915107B/en

Landscapes

  • Compression Of Band Width Or Redundancy In Fax (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention relates to the technical field of communications and discloses an image compression system, an image compression method, a storage medium, and a chip. The image compression system includes: a nonlinear transformation network comprising a feature extraction network, which in turn comprises a shift convolution layer for extracting local features of a target image and an attention mechanism for extracting potential representation features from the local features, the query vector in the attention mechanism being equal to the key vector; an encoder for encoding the potential representation features to obtain a corresponding code stream; a decoder for decoding the code stream to recover the potential representation features; and a nonlinear inverse transformation network for decompressing the potential representation features to obtain a reconstructed image corresponding to the target image, the nonlinear inverse transformation network being of a symmetrical structure with the nonlinear transformation network. The invention can greatly improve image compression speed while maintaining the same compression performance.

Description

Image compression system, image compression method, storage medium and chip
Technical Field
The present invention relates to the field of communications technologies, and in particular, to an image compression system, an image compression method, a storage medium, and a chip.
Background
5G (fifth-generation mobile communication technology) has ushered in a new era of the interconnection of everything; at the same time, as an important cornerstone of the fourth industrial revolution, the industrial Internet of Things is the most important application scenario of 5G. At present, 5G + industrial vision has become one of the most widely applied industrial fields, used in scenes such as security monitoring, quality inspection, and automatic sorting in power inspection, electronics manufacturing, semiconductors, rail transit, photovoltaics, automobile manufacturing, and other fields. Efficient image compression therefore plays a vital role in Internet of Things services (real-time image and video interaction, etc.). Traditional image coding standards (such as JPEG and JPEG 2000) are widely used, but the market still desires better image coding methods that consume less bandwidth while achieving higher reconstruction quality.
At present, the best-performing image coding methods based on deep learning work within the variational autoencoder framework; however, the end-to-end image coding methods with the best performance still have high model complexity and long decoding times, and cannot be deployed well on hardware.
Disclosure of Invention
The invention aims to provide an image compression system, an image compression method, a storage medium, and a chip, which can greatly improve image compression speed (for example, by a factor of 3-4) while maintaining the same compression performance as the best-performing existing neural-network image compression algorithms. In addition, because the end-to-end image compression method does not require manually designed parameters, it has better iteration capability and extensibility than traditional image coding methods.
In order to achieve the above object, a first aspect of the present invention provides an image compression system including: a nonlinear transformation network comprising a feature extraction network, wherein the feature extraction network comprises: a shift convolution layer for extracting local features of a target image, and an attention mechanism for extracting potential representation features of the target image from the local features, wherein a query vector in the attention mechanism is equal to a key vector; a first encoder for encoding potential representation features of the target image to obtain a corresponding first code stream; a first decoder for decoding the first code stream to obtain a first potential representation feature corresponding to the first code stream; and a nonlinear inverse transformation network for decompressing the first potential representation feature to obtain a reconstructed image corresponding to the target image, wherein the nonlinear inverse transformation network and the nonlinear transformation network are in a symmetrical structure.
Preferably, the attention mechanism comprises: two 1×1 convolutional layers; and/or two batch normalization layers.
Preferably, the image compression system further comprises: the super prior transformation network is used for compressing the potential representation characteristics of the target image so as to acquire side information of the potential representation characteristics of the target image; the second encoder is used for encoding the side information to obtain a corresponding second code stream; a second decoder for decoding the second code stream to obtain a second potential representation feature corresponding to the second code stream; the super-prior inverse transformation network is used for decoding the second potential representation feature to acquire a variance parameter and a first mean parameter of the potential representation feature of the target image; and a context model for predicting a second mean parameter of the potential representation feature of the target image from the potential representation feature of the target image, and reconstructing a gaussian distribution model of the potential representation feature of the target image from the first mean parameter, the second mean parameter and the variance parameter, the first encoder for encoding the potential representation feature of the target image comprising: encoding potential representation features of the target image according to the gaussian distribution model, and the first decoder for decoding the first code stream comprises: and decoding the first code stream according to the Gaussian distribution model.
Preferably, the super prior transformation network comprises: a plurality of convolution layers and a plurality of the feature extraction networks, and the super prior inverse transformation network and the super prior transformation network are of a symmetrical structure.
Preferably, the image compression system further comprises: a first quantizer located between the super prior transformation network and the second encoder.
Preferably, there are a plurality of the feature extraction networks, and correspondingly the nonlinear transformation network further includes: a plurality of convolution layers alternating with the plurality of feature extraction networks.
Preferably, the shift convolution layer includes: the first shifted convolutional layer and the second shifted convolutional layer, the feature extraction network further comprising: a first residual structure connecting an input of the first shifted convolutional layer with an output of the second shifted convolutional layer; and/or a second residual structure for connecting the input and the output of the attention mechanism.
Preferably, the feature extraction network further comprises: an active layer located between the first shifted convolutional layer and the second shifted convolutional layer.
Preferably, the image compression system further comprises: a second quantizer located between the nonlinear transformation network and the first encoder.
Through the above technical scheme, the invention creatively provides a nonlinear transformation network for acquiring potential representation features; a first encoder for encoding the potential representation features of the target image into a corresponding first code stream; a first decoder for decoding the first code stream into first potential representation features; and a nonlinear inverse transformation network for decompressing the first potential representation features into a reconstructed image corresponding to the target image. A shift convolution layer and an attention mechanism are provided in the nonlinear transformation network, and the query vector in the attention mechanism is equal to the key vector. Compared with the best-performing existing neural-network image compression algorithms, the invention can greatly improve image compression speed (for example, by a factor of 3-4) while maintaining the same compression performance; and because the end-to-end image compression method does not require manually designed parameters, it has better iteration capability and extensibility than traditional image coding methods.
A second aspect of the present invention provides an image compression method, the image compression method comprising: the following operations are performed by the nonlinear transformation network: extracting local features of the target image, and extracting potential representation features of the target image according to the local features, wherein the nonlinear transformation network comprises a feature extraction network comprising: shifting a convolutional layer and an attention mechanism, wherein a query vector in the attention mechanism is equal to a key vector; encoding potential representation features of the target image by a first encoder to obtain a corresponding first code stream; decoding the first code stream by a first decoder to obtain a first potential representation feature corresponding to the first code stream; and decompressing the first potential representation feature through a nonlinear inverse transformation network to obtain a reconstructed image corresponding to the target image, wherein the nonlinear inverse transformation network and the nonlinear transformation network are in a symmetrical structure.
Preferably, the attention mechanism comprises: two 1×1 convolutional layers; and/or two batch normalization layers.
Preferably, the image compression method further includes: compressing the potential representation features of the target image through a super prior transformation network to obtain side information of the potential representation features of the target image; encoding the side information through a second encoder to obtain a corresponding second code stream; decoding, by a second decoder, the second code stream to obtain a second potential representation feature corresponding to the second code stream; decoding the second potential representation feature through a super-prior inverse transformation network to obtain a variance parameter and a first mean parameter of the potential representation feature of the target image; and predicting, by a context model, a second mean parameter of the potential representation feature of the target image from the potential representation feature of the target image, and reconstructing a gaussian distribution model of the potential representation feature of the target image from the first mean parameter, the second mean parameter, and the variance parameter, the encoding, by a first encoder, the potential representation feature of the target image including: encoding potential representation features of the target image according to the gaussian distribution model, and decoding the first code stream by a first decoder comprises: and decoding the first code stream according to the Gaussian distribution model.
Preferably, the super prior transformation network comprises: a plurality of convolution layers and a plurality of the feature extraction networks, and the super prior inverse transformation network and the super prior transformation network are of a symmetrical structure.
Preferably, there are a plurality of the feature extraction networks, and correspondingly the nonlinear transformation network further includes: a plurality of convolution layers alternating with the plurality of feature extraction networks.
Preferably, the shift convolution layer includes: the first shifted convolutional layer and the second shifted convolutional layer, the feature extraction network further comprising: a first residual structure connecting an input of the first shifted convolutional layer with an output of the second shifted convolutional layer; and/or a second residual structure for connecting the input and the output of the attention mechanism.
Preferably, the feature extraction network further comprises: an active layer located between the first shifted convolutional layer and the second shifted convolutional layer.
Specific details and benefits of the image compression method provided in the present invention can be found in the above description of the image compression system, and are not repeated here.
A third aspect of the invention provides a chip comprising the image compression system.
A fourth aspect of the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image compression method.
Additional features and advantages of the invention will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain, without limitation, the embodiments of the invention. In the drawings:
Fig. 1 is a schematic diagram of an image compression system according to an embodiment of the present invention;
FIG. 2A is a schematic diagram of an image compression system according to an embodiment of the present invention;
FIG. 2B is a schematic diagram of an image compression system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a lightweight feature extraction network (RLAB) according to an embodiment of the invention;
FIG. 4 is a schematic diagram of a lightweight feature extraction network (RLAB) according to an embodiment of the invention;
FIG. 5 is a schematic diagram of shift convolution provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of a lightweight attention mechanism provided by an embodiment of the present invention; and
Fig. 7 is a flowchart of an image compression method according to an embodiment of the present invention.
Detailed Description
The following describes specific embodiments of the present invention in detail with reference to the drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the invention, are not intended to limit the invention.
Fig. 1 is a schematic diagram of an image compression system according to an embodiment of the present invention. As shown in fig. 1, the image compression system may include: a nonlinear transformation network 10, a first encoder 20, a first decoder 30, and a nonlinear inverse transformation network 40.
Each of the above modules is explained and illustrated below.
As shown in fig. 2A, the nonlinear transformation network 10 may include a feature extraction network 100. As shown in fig. 3, the feature extraction network 100 may include: a shift convolution layer 110 for extracting local features of a target image, and an attention mechanism 120 for extracting potential representation features of the target image from the local features.
Wherein the query vector (q) in the attention mechanism 120 is equal to the key vector (k).
The three parameters of the attention mechanism 120, namely the query vector (q), the key vector (k), and the value vector (v), are three mappings of the input x, normally requiring three 1×1 convolutions and three batch normalizations. Setting q = k in an attention operation removes the parameters and computation of one 1×1 convolution. When n RLAB modules are stacked, the parameters of n 1×1 convolutions are saved, so the nonlinear transformation/inverse transformation network (composed of multiple RLABs containing the attention mechanism 120 of this embodiment) can greatly increase the speed of compression/decompression.
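As a rough illustration of the saving from q = k (the channel width C = 192 is a hypothetical value; biases and the batch-normalization parameters are ignored), the projection parameter count can be compared as follows:

```python
# Rough illustration of the parameter saving from setting q = k: the q/k/v
# projections are 1x1 convolutions, and sharing one removes C*C weights per
# attention block. The channel width C = 192 is hypothetical.
def qkv_params(channels: int, share_qk: bool) -> int:
    # A 1x1 convolution mapping C channels to C channels has C*C weights.
    n_projections = 2 if share_qk else 3   # q = k shares one projection
    return n_projections * channels * channels

C = 192
standard = qkv_params(C, share_qk=False)   # separate q, k, v
shared = qkv_params(C, share_qk=True)      # q = k
assert standard - shared == C * C          # one 1x1 convolution saved per block
```

Stacking n RLABs multiplies this saving by n, which is where the claimed speedup of the transform networks comes from.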
The lightweight attention mechanism divides the feature-map channels into three groups for feature extraction, with a fixed attention window size for each group. After information is exchanged among windows by shifting, q = k is set when the attention parameters, namely the query vector (q), key vector (k), and value vector (v), are trained and computed within each attention window; this significantly reduces network complexity without degrading model performance. Accordingly, the attention mechanism 120 may be referred to as a lightweight attention mechanism, and the feature extraction network 100 built from it is also referred to as a lightweight feature extraction unit (RLAB).
The attention mechanism 120 may include: two 1×1 convolutional layers; and/or two batch normalization layers.
As shown in fig. 6, the lightweight attention mechanism uses batch normalization (BN) instead of the layer normalization (LN) commonly used by attention models. Experiments show that using BN in the attention mechanism makes model training more stable; moreover, BN can be fused with the convolutional layers at the inference stage, further improving inference speed at no extra computational cost. In addition, the parameter matrices q, k, and v in the attention window are not all computed separately; instead, q = k. Experiments show that these measures are very effective in an end-to-end image compression network.
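The fusion of BN into an adjacent 1×1 convolution mentioned above can be sketched as follows; this is a generic inference-time identity rather than code from the patent, and all shapes and values are random placeholders:

```python
import numpy as np

# Sketch of folding batch normalization (BN) into a preceding 1x1 convolution
# at inference time. A 1x1 conv over C channels is a per-pixel matrix multiply,
# so BN's per-channel affine transform can be absorbed into the conv weights
# and bias once, offline, removing BN's runtime cost.
rng = np.random.default_rng(0)
C = 4
W = rng.normal(size=(C, C))                           # 1x1 conv weights (out_ch, in_ch)
b = rng.normal(size=C)                                # conv bias
gamma, beta = rng.normal(size=C), rng.normal(size=C)  # BN scale / shift
mean, var = rng.normal(size=C), rng.uniform(1, 2, C)  # BN running statistics
eps = 1e-5
x = rng.normal(size=C)                                # one pixel's channel vector

# Reference: convolution followed by BN.
y_bn = gamma * ((W @ x + b) - mean) / np.sqrt(var + eps) + beta

# Folded: scale the rows of W and adjust the bias, once, offline.
s = gamma / np.sqrt(var + eps)
W_f = s[:, None] * W
b_f = s * (b - mean) + beta
y_fold = W_f @ x + b_f

assert np.allclose(y_bn, y_fold)
```

Because the folded form is exactly equivalent, the inference graph contains one matrix multiply where the training graph had a convolution plus a normalization.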
After feature mapping, the feature map passes through two 1×1 convolutions and BN respectively; the parameter matrices are then reshaped, and the attention-weighted feature map is obtained through matrix multiplication (the reshaping and matrix multiplication are not inventive points of the invention and can be implemented in any existing manner). Before the attention operation, the pixels of each window undergo a clockwise shift operation to exchange global information.
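A minimal single-window sketch of attention with a shared query/key projection (q = k): window partitioning, channel grouping, and the shift step are omitted, and all dimensions and weights are hypothetical.

```python
import numpy as np

# Minimal single-window attention with a shared query/key projection (q = k).
# With q = k the score matrix q @ k.T becomes symmetric, and one of the three
# 1x1-convolution projections disappears.
def light_attention(x, Wqk, Wv):
    # x: (N, C) tokens of one window; Wqk, Wv: (C, C) 1x1-conv weight matrices.
    qk = x @ Wqk                                  # one projection serves as both q and k
    v = x @ Wv
    scores = qk @ qk.T / np.sqrt(x.shape[1])      # symmetric score matrix
    scores -= scores.max(axis=1, keepdims=True)   # numerically stable softmax
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn @ v                               # attention-weighted features

rng = np.random.default_rng(1)
x = rng.normal(size=(16, 8))                      # 16 window pixels, 8 channels
out = light_attention(x, rng.normal(size=(8, 8)), rng.normal(size=(8, 8)))
assert out.shape == (16, 8)
```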
As shown in fig. 4, the shift convolution layer 110 includes: a first shifted convolutional layer 111 and a second shifted convolutional layer 112.
The feature extraction network 100 may further include: a first residual structure 130 for connecting an input of the first shifted convolutional layer 111 with an output of the second shifted convolutional layer 112; and/or a second residual structure 140 for connecting the input and output of the attention mechanism 120, as shown in fig. 4.
As shown in fig. 5, the shift convolution divides the channels into 5 groups, applies a corresponding shift operation to each group, and then uses a 1×1 convolution to aggregate information across channels. The shift convolution thus attains the receptive field of a 3×3 convolution at the computational complexity of a 1×1 convolution. The residual connections allow the network depth to increase while avoiding the vanishing-gradient problem that would otherwise prevent effective learning; for example, the first residual structure combines the original input information with the output of the second shift convolution, yielding more comprehensive feature information. The lightweight attention mechanism then flexibly captures the relation between global features (potential representation features) and local features and extracts key information; with a deeper network, the residual connections form a multi-hop attention mechanism, so that the model can take previously attended features into account when computing the next attention.
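The shift convolution described above can be sketched as follows. The exact shift directions are assumptions for illustration, and `np.roll` wraps at the borders where a real implementation would typically zero-pad:

```python
import numpy as np

# Sketch of shift convolution: channels are split into 5 groups, four groups
# are shifted one pixel in one of four directions, the fifth stays put, and a
# 1x1 convolution then mixes channels, giving a 3x3-like receptive field at
# roughly 1x1-convolution cost.
def shift_conv(x, W):
    # x: (C, H, W) feature map; W: (C_out, C) 1x1-conv weight matrix.
    C = x.shape[0]
    g = C // 5
    y = x.copy()
    y[0 * g:1 * g] = np.roll(x[0 * g:1 * g], 1, axis=1)    # shift down
    y[1 * g:2 * g] = np.roll(x[1 * g:2 * g], -1, axis=1)   # shift up
    y[2 * g:3 * g] = np.roll(x[2 * g:3 * g], 1, axis=2)    # shift right
    y[3 * g:4 * g] = np.roll(x[3 * g:4 * g], -1, axis=2)   # shift left
    # remaining channels stay unshifted; the 1x1 conv is a channel-wise matmul
    return np.einsum('oc,chw->ohw', W, y)

x = np.random.default_rng(2).normal(size=(10, 8, 8))
out = shift_conv(x, np.eye(10))
assert out.shape == (10, 8, 8)
```

With the identity as the 1×1 weight matrix, the output is just the shifted input, which makes the two-step structure (shift, then channel mixing) easy to inspect.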
Preferably, the feature extraction network 100 may further include: an active layer 150 is located between the first shifted convolutional layer 111 and the second shifted convolutional layer 112, as shown in fig. 4.
The activation layer consists of an activation function (ReLU) and mainly performs a nonlinear transformation on the features. Through this nonlinear mapping, the network maps the features into a high-dimensional nonlinear space, enhancing the network's expressive capacity.
Adding a ReLU activation layer between the two shift convolution operations, together with the residual structure, allows the network to learn image features better.
In particular, there may be a plurality of the feature extraction networks 100. As shown in fig. 2A, the nonlinear transformation network 10 may further include: a plurality of convolution layers 101, said plurality of convolution layers 101 alternating with said plurality of feature extraction networks 100.
The plurality of convolution layers 101 may be the same or different. As shown in fig. 2B, the nonlinear transformation network 10 comprises Conv(5,2) and RLAB, then Conv(3,2) and RLAB. The main function of this network of convolution layers and feature extraction networks (i.e., RLABs) is feature extraction and downsampling: redundancy is reduced, and the lower dimensionality of the potential representation features encourages the network to learn compact features. It should be noted that the number of each component in the nonlinear transformation network 10 may be set according to the actual situation and is not limited to the specific numbers shown in fig. 2B.
For example, with x as the input image, features are initially extracted and downsampled by Conv(5,2). That is, the strided convolution reduces the image size, lowering the complexity of subsequent operations; the reduced dimensionality of the potential representation features helps the network learn compact features. The RLAB then further extracts detailed image information while ignoring irrelevant information, learns better feature representations, and, being highly parallelizable, runs faster. After several such modules, the output latent variable y (the learned feature map) is a compact feature map representing the input image, learned by the model under the guidance of the loss function.
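How the strided convolutions shrink the feature map can be checked with the standard output-size formula; the padding values below are assumptions chosen for "same"-style halving, since the patent does not state them:

```python
# Sanity check of how the strided convolutions Conv(5,2) and Conv(3,2)
# downsample the feature map, using the standard output-size formula
# out = floor((size + 2*pad - kernel) / stride) + 1.
def conv_out(size: int, kernel: int, stride: int, pad: int) -> int:
    return (size + 2 * pad - kernel) // stride + 1

H = 256                      # hypothetical input height
H = conv_out(H, 5, 2, 2)     # after Conv(5,2): 128
H = conv_out(H, 3, 2, 1)     # after Conv(3,2): 64
assert H == 64
```

Each stride-2 stage halves both spatial dimensions, which is the downsampling that keeps the latent representation compact.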
Furthermore, the image compression system may further include: a second quantizer 2 located between the nonlinear transformation network 10 and the first encoder 20, as shown in fig. 2A.
The second quantizer 2 (Q) turns the continuous latent variable y into discrete values ŷ, concentrating the distribution and facilitating subsequent entropy coding. The continuous y would require more storage bits, whereas ŷ requires fewer.
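A minimal sketch of the quantizer: rounding maps the continuous latent y to a discrete ŷ that the entropy coder can compress. (As a side note not taken from this patent, learned-compression models commonly substitute additive uniform noise for rounding during training so that gradients can flow.)

```python
import numpy as np

# Rounding quantizer: continuous latent y -> discrete latent ŷ.
y = np.array([0.2, 1.7, -0.4, 2.6])   # hypothetical latent values
y_hat = np.round(y)                    # discrete latent ŷ
assert np.array_equal(y_hat, np.array([0.0, 2.0, -0.0, 3.0]))
```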
As shown in fig. 2A, the first encoder 20 is configured to encode the potential representation feature of the target image to obtain a corresponding first code stream.
The first encoder 20 may be an entropy encoder (AE). Entropy coding losslessly compresses ŷ, further reducing coding redundancy and forming a binary compressed image file (i.e., a code stream for storage and transmission). The more compact ŷ is, the fewer bits the entropy coding uses.
As shown in fig. 2A, the first decoder 30 is configured to decode the first code stream to obtain a first potential representation feature corresponding to the first code stream.
The first decoder 30 may be an entropy decoder (AD). AD is the inverse of AE: decoding the input code stream recovers ŷ. (If the codec algorithm is ideal, the ŷ before encoding and the ŷ after decoding are identical; the actual error depends on the merits of the codec algorithm and on whether the parameters of the entropy codec are sufficiently accurate.)
As shown in fig. 2A, the nonlinear inverse transformation network 40 is configured to decompress the first potential representation feature to obtain a reconstructed image corresponding to the target image.
The nonlinear inverse transformation network 40 and the nonlinear transformation network 10 are symmetrical, as shown in fig. 2A. The specific structure of the nonlinear inverse transformation network 40 can be found in the above description regarding the specific structure of the nonlinear transformation network 10.
The nonlinear inverse transformation network 40 performs the inverse of the nonlinear transformation of the nonlinear transformation network 10 described above, i.e., from the input ŷ it obtains a reconstructed (restored) image.
The original image data x is transformed to obtain y, which is quantized to obtain ŷ. The probability model of the quantized data is generally treated as a known distribution, and the image is typically entropy coded under an assumed probability model. However, this model differs from the actual distribution, and because the actual distribution is unknown, the probability model must approximate it as closely as possible. To reduce the gap between the probability model and the actual distribution, a new variable, the side information z, is introduced to estimate the probability model accurately. Specifically, to improve compression performance, this embodiment adds side information to assist decoding: on one hand, the variance and a first mean of the quantized potential representation features are obtained through the super prior transformation network, quantizer, encoder, decoder, and super prior inverse transformation network; on the other hand, the context model predicts a second mean, the two means are fused, a Gaussian distribution model is constructed from the variance and the fused mean, and this Gaussian distribution model assists the decoder in reconstructing the image.
In an embodiment, the image compression system may further include: the super a priori transformation network 50, the second encoder 60, the second decoder 70, the super a priori inverse transformation network 80, and the context model 90, as shown in fig. 2A.
Each of the above modules is explained and illustrated below.
The super a priori transformation network 50 is configured to compress the latent representation features of the target image to obtain side information of the latent representation features of the target image.
In particular, the super a priori transformation network 50 may include: a plurality of convolution layers and a plurality of said feature extraction networks.
As shown in fig. 2B, the super prior transformation network 50 comprises Conv(3,2) and RLAB, then Conv(3,2) and RLAB. The main function of this network of convolution layers and feature extraction networks (i.e., RLABs) is feature extraction and downsampling, reducing redundancy to obtain the corresponding side information z. Assisting decoding with side information can further improve compression performance. For details of the convolution layers and RLAB, see the description of the nonlinear transformation network above.
Furthermore, the image compression system may further include: a first quantizer 1 is located between the super a priori transformation network 50 and the second encoder 60, as shown in fig. 2A.
The first quantizer 1 (Q) turns the continuous side information z into discrete values ẑ, facilitating subsequent coding. The original z is continuous and would require more storage bits, whereas ẑ requires fewer.
As shown in fig. 2A, the second encoder 60 is configured to encode the side information to obtain a corresponding second code stream.
The second encoder 60 may be an entropy encoder (AE). Entropy coding losslessly compresses ẑ, further reducing coding redundancy and forming a binary compressed file (i.e., a code stream for storage and transmission). The more compact ẑ is, the fewer bits the encoding uses.
As shown in fig. 2A, the second decoder 70 is configured to decode the second code stream to obtain a second potential representation feature corresponding to the second code stream.
The second decoder 70 may be an entropy decoder (AD). AD is the inverse of AE: decoding the input code stream recovers ẑ.
As shown in fig. 2A, the super a priori inverse transform network 80 is configured to decode the second potential representation feature to obtain a variance parameter and a first mean parameter of the potential representation feature of the target image.
Accordingly, the super prior inverse transformation network 80 and the super prior transformation network 50 are symmetrical, as shown in fig. 2A. For the specific structure of the super prior inverse transformation network 80, see the description of the super prior transformation network 50 above. The difference is that the super prior inverse transformation network 80 performs the inverse nonlinear transformation: from the input ẑ it obtains the variance parameter σ (i.e., the variance parameter of the Gaussian distribution model) and the first mean parameter μ₁.
As shown in fig. 2A, the context model 90 is configured to predict a second mean parameter of the potential representation feature of the target image according to the potential representation feature of the target image, and reconstruct a gaussian distribution model of the potential representation feature of the target image according to the first mean parameter, the second mean parameter, and the variance parameter.
Accordingly, the first encoder 20 for encoding the potential representation features of the target image comprises: encoding potential representation features of the target image according to the gaussian distribution model, and the first decoder 30 for decoding the first code stream comprises: and decoding the first code stream according to the Gaussian distribution model.
The context model obtains the second mean parameter by an autoregressive prediction method from the potential representation feature (or the quantized potential representation feature). The mean parameter of the Gaussian distribution model is then obtained by fusing the two mean parameters, for example as the weighted combination μ = β·μ₁ + (1 − β)·μ₂, where β is a weight coefficient and 0 ≤ β ≤ 1. The parameters of the assumed Gaussian distribution can thus be output. Given these output parameters, the entropy encoder may encode the potential representation features according to the actual Gaussian distribution, and the entropy decoder may decode the first code stream according to the actual Gaussian distribution, whereby the context model can effectively assist the decoder in reconstructing the image. Thus, the present embodiment can achieve more efficient compression and better restoration of images.
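As a rough illustration, the mean fusion and the per-symbol code length implied by the Gaussian model can be sketched as follows; the weight coefficient `beta`, the function names, and the unit-bin rate estimate are assumptions for illustration, not the patent's exact formulation:

```python
from math import erf, sqrt, log2

def fuse_means(mu1, mu2, beta=0.5):
    """Weighted fusion of the hyperprior mean mu1 and the context-model
    mean mu2; beta is an assumed weight coefficient in [0, 1]."""
    return beta * mu1 + (1.0 - beta) * mu2

def _std_normal_cdf(x):
    # CDF of the standard normal distribution, via the error function.
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def gaussian_bits(y_hat, mu, sigma):
    """Estimated code length in bits of quantized symbols y_hat under
    N(mu, sigma^2): integrate the density over each unit quantization bin."""
    total = 0.0
    for y, m, s in zip(y_hat, mu, sigma):
        p = _std_normal_cdf((y + 0.5 - m) / s) - _std_normal_cdf((y - 0.5 - m) / s)
        total += -log2(max(p, 1e-12))  # clamp to avoid log2(0)
    return total
```

Symbols that fall near the predicted mean receive a high bin probability and therefore cost few bits, which is why a well-fused mean lowers the code rate.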
During encoding, a binary code stream is obtained through the super prior encoder (conv(3, 2)-RLAB-conv(3, 2)-RLAB), the quantizer (Q) and the entropy encoder (AE). When reconstructing an image, the variance and the first mean of the quantized potential representation features are obtained through the entropy decoder (AD) and the super prior decoder; on the other hand, the context model predicts the second mean by the autoregressive prediction method and fuses the two mean values. A Gaussian distribution model is constructed according to the variance and the fused mean, and this Gaussian distribution model assists the decoder in reconstructing the image. In the prior art, the decoder reconstructs the image according to an assumed distribution; such a distribution, obtained without network learning, is less effective. In addition, since the mean parameter adopted in this embodiment uses the fused result, reconstructing the image according to the Gaussian distribution is more reliable; the calculation is simple, no additional network needs to be built to learn the Gaussian distribution parameters, the complexity of the model is reduced, and the lightweight goal of the compression model is further achieved.
Compared with existing neural network image coding methods with optimal performance on a large number of test pictures, the present method can improve the inference speed of image encoding and decoding by 3-4 times without reducing performance, and has better prospects for hardware application. In addition, because the end-to-end image compression method does not require manual design of related parameters, it has better iteration capability and extensibility than traditional image coding methods.
The image compression system described above may be constructed and trained prior to use of the system.
First, a high-definition image data set with diverse content, including humans, natural landscapes, animals, etc. (for example from the OpenImage or Flicker2W libraries), is constructed, and a training set, a validation set and a test set are divided on the basis of the data set.
Next, a lightweight feature extraction unit (RLAB) consisting of a shift convolution and a lightweight attention mechanism is constructed, as shown in fig. 3 or fig. 4.
The lightweight feature extraction unit includes a shift convolution and a lightweight attention mechanism with a residual structure. A ReLU activation layer is added between the two shift convolution operations, and a residual structure is added at the same time, so that the network can better learn image features.
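A minimal sketch of such a unit, under the assumption that the shift convolution splits channels into five groups (up, down, left, right, identity) followed by a pointwise convolution; the group layout, names, and shapes are illustrative, and the lightweight attention branch is omitted here for brevity:

```python
import numpy as np

def shift(x):
    """Spatially shift channel groups of a (C, H, W) tensor, a zero-FLOP
    stand-in for the spatial part of a convolution (assumed 5-way split)."""
    c = x.shape[0]
    g = c // 5
    out = np.zeros_like(x)
    out[:g, :-1, :] = x[:g, 1:, :]            # group 1: shift up
    out[g:2*g, 1:, :] = x[g:2*g, :-1, :]      # group 2: shift down
    out[2*g:3*g, :, :-1] = x[2*g:3*g, :, 1:]  # group 3: shift left
    out[3*g:4*g, :, 1:] = x[3*g:4*g, :, :-1]  # group 4: shift right
    out[4*g:] = x[4*g:]                       # remainder: identity
    return out

def shift_conv(x, w):
    """Shift followed by a 1x1 (pointwise) convolution with weights w (O, C)."""
    return np.einsum('oc,chw->ohw', w, shift(x))

def rlab(x, w1, w2):
    """Residual block: shift-conv -> ReLU -> shift-conv, plus a skip path."""
    h = np.maximum(shift_conv(x, w1), 0.0)  # ReLU between the two shift convs
    return x + shift_conv(h, w2)            # residual connection
```

The shifts cost no multiplications, so the only learned parameters are the two pointwise weight matrices, which is the source of the unit's light weight.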
Then, a nonlinear transformation network and a nonlinear inverse transformation network are constructed in a lightweight manner using the lightweight feature extraction unit, and the model is trained.
In an embodiment, a convolutional neural network may also be used for preliminary feature extraction and downsampling; then, a plurality of lightweight feature extraction units (RLAB) are stacked to form the nonlinear transformation network, the nonlinear inverse transformation network, the super prior transformation network and the super prior inverse transformation network.
A super prior network is designed, and its output z is used as side information to assist image coding, further reducing the redundancy of the hidden-layer variable y and improving the compression rate.
Finally, the image is input into the nonlinear transformation network, a Gaussian distribution model is obtained through the super prior network, entropy encoding and entropy decoding are performed according to the Gaussian distribution model, and the reconstructed image is obtained through the nonlinear inverse transformation network.
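The transform, quantize, inverse-transform flow described above can be illustrated with a toy sketch in which an orthogonal matrix stands in for the learned transforms; this substitution is an assumption for demonstration only, since the real analysis and synthesis networks are trained:

```python
import numpy as np

rng = np.random.default_rng(0)

# Orthogonal matrix as a stand-in for the trained transform pair (assumption).
d = 8
q, _ = np.linalg.qr(rng.standard_normal((d, d)))
g_a = lambda x: x @ q    # nonlinear transformation (analysis)
g_s = lambda y: y @ q.T  # nonlinear inverse transformation (synthesis)

x = rng.standard_normal((4, d))  # toy "image" rows
y = g_a(x)                       # potential representation features
y_hat = np.round(y)              # quantization before entropy coding
x_hat = g_s(y_hat)               # reconstructed image

# Distortion here comes only from quantization, since g_s inverts g_a.
mse = np.mean((x - x_hat) ** 2)
```

Because rounding error per element is bounded by 0.5 and the orthogonal transform preserves squared error, the reconstruction MSE stays small, mirroring how the entropy-coded latent, not the transform, dominates the rate-distortion trade-off.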
In training the model, the loss function L = λ·D + R is used, where D is the mean square error between the original image and the reconstructed image, and R is the code rate (calculated from the super prior side-information code rate and the hidden-layer variable code rate). The weighting coefficient λ trades off the mean square error against the code rate, so that both are jointly optimized and end-to-end training is realized.
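A sketch of this rate-distortion loss, under the assumption that the rates for the hidden variable y and the side information z are available in bits and are normalized by the pixel count; the function name and default λ are illustrative:

```python
import numpy as np

def rd_loss(x, x_hat, rate_y_bits, rate_z_bits, lam=0.01):
    """L = lambda * D + R. D is the mean square error between the original
    image x and the reconstruction x_hat; R is the total code rate in bits
    per pixel (hidden-layer variable y plus super-prior side information z)."""
    D = float(np.mean((x - x_hat) ** 2))
    R = (rate_y_bits + rate_z_bits) / x.size
    return lam * D + R
```

Training with a larger λ penalizes distortion more heavily and yields higher-quality, higher-rate models; sweeping λ traces out the rate-distortion curve.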
Compared with the neural network image compression model with the best performance at present, the parameter quantity is reduced by 5% on the premise of not reducing the compression performance, the coding reasoning time is reduced by more than 40%, and the decoding reasoning time is reduced to 1/3 of the original time, as shown in the table 1. Thus, the image compression system provided by the various embodiments of the invention can be applied to a 5G system to form a lightweight neural network image compression system for the 5G system.
Table 1 PSNR and MS-SSIM performance comparison table
In summary, the present invention creatively provides: a nonlinear transformation network for acquiring potential representation features of a target image; a first encoder for encoding the potential representation features of the target image to obtain a corresponding first code stream; a first decoder for decoding the first code stream to obtain first potential representation features corresponding to the first code stream; and a nonlinear inverse transformation network for decompressing the first potential representation features to obtain a reconstructed image corresponding to the target image. The nonlinear transformation network includes a shift convolution layer and an attention mechanism, wherein the query vector in the attention mechanism is equal to the key vector. Compared with the existing neural network image compression algorithm with optimal performance, the method can greatly improve the image compression speed (for example, by 3-4 times) while keeping the same compression performance; and because the end-to-end image compression method does not require manual design of related parameters, it has better iteration capability and extensibility than traditional image coding methods.
Fig. 7 is a flowchart of an image compression method according to an embodiment of the present invention. As shown in fig. 7, the image compression method may include: step S701, potential representation features of a target image are acquired through a nonlinear transformation network; step S702, encoding potential representation features of the target image through a first encoder to obtain a corresponding first code stream; step S703, decoding, by a first decoder, the first code stream to obtain a first potential representation feature corresponding to the first code stream; and step S704, decompressing the first latent representation feature through a nonlinear inverse transformation network to obtain a reconstructed image corresponding to the target image, where the nonlinear inverse transformation network and the nonlinear transformation network are in a symmetrical structure.
Wherein the nonlinear transformation network comprises a feature extraction network. The feature extraction network includes: the shift convolution layer is used for extracting local features of the target image; and an attention mechanism for extracting potential representation features of the target image from the local features. Wherein the query vector in the attention mechanism is equal to the key vector.
Preferably, the attention mechanism comprises: two 1 x 1 convolutional layers; and/or two batch normalization layers.
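The query-equals-key attention can be sketched as follows for a flattened feature map; the projection matrices and token layout are illustrative assumptions, and the batch normalization layers are omitted:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def lightweight_attention(x, wk, wv):
    """Attention in which the query is reused as the key (Q == K), saving
    one projection. x: (n_tokens, d) flattened features; wk, wv: (d, d)
    pointwise (1x1-conv-equivalent) projection matrices."""
    k = x @ wk  # shared query/key projection (Q == K)
    v = x @ wv
    attn = softmax(k @ k.T / np.sqrt(k.shape[-1]))  # scaled dot-product
    return attn @ v
```

Sharing the query and key projection removes one of the three projection matrices of standard attention, which reduces both parameters and computation in line with the lightweight design goal.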
Preferably, the image compression method further includes: compressing the potential representation features of the target image through a super prior transformation network to obtain side information of the potential representation features of the target image; encoding the side information through a second encoder to obtain a corresponding second code stream; decoding, by a second decoder, the second code stream to obtain a second potential representation feature corresponding to the second code stream; decoding the second potential representation feature through a super-prior inverse transformation network to obtain a variance parameter and a first mean parameter of the potential representation feature of the target image; and predicting, by a context model, a second mean parameter of the potential representation feature of the target image from the potential representation feature of the target image, and reconstructing a gaussian distribution model of the potential representation feature of the target image from the first mean parameter, the second mean parameter, and the variance parameter, the encoding, by a first encoder, the potential representation feature of the target image including: encoding potential representation features of the target image according to the gaussian distribution model, and decoding the first code stream by a first decoder comprises: and decoding the first code stream according to the Gaussian distribution model.
Preferably, the super a priori transformation network comprises: and the super-prior inverse transformation network and the super-prior transformation network are of symmetrical structures.
Preferably, the feature extraction network is a plurality of feature extraction networks, and correspondingly, the nonlinear transformation network further includes: a plurality of convolution layers alternating with the plurality of feature extraction networks.
Preferably, the shift convolution layer includes: the first shifted convolutional layer and the second shifted convolutional layer, the feature extraction network further comprising: a first residual structure connecting an input of the first shifted convolutional layer with an output of the second shifted convolutional layer; and/or a second residual structure for connecting the input and the output of the attention mechanism.
Preferably, the feature extraction network further comprises: an active layer located between the first shifted convolutional layer and the second shifted convolutional layer.
Specific details and benefits of the image compression method provided in the present invention can be found in the above description of the image compression system, and are not repeated here.
An embodiment of the present invention provides a chip, where the chip includes the image compression system.
An embodiment of the present invention provides a chip including: a processor; a memory for storing a computer program for execution by the processor; the processor is configured to read the computer program from the memory and execute the computer program to implement the image compression method.
An embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image compression method.
An embodiment of the invention provides a computer program product comprising a computer program which, when executed by a processor, implements the image compression method.
The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings, but the present invention is not limited to the specific details of the above embodiments, and various simple modifications can be made to the technical solution of the present invention within the scope of the technical concept of the present invention, and all the simple modifications belong to the protection scope of the present invention.
In addition, the specific features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various possible combinations are not described further.
Those skilled in the art will appreciate that all or part of the steps in implementing the methods of the embodiments described above may be implemented by a program stored in a storage medium, including instructions for causing a single-chip microcomputer, chip or processor (processor) to perform all or part of the steps of the methods of the embodiments described herein. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a Read-Only Memory (ROM), a random access Memory (RAM, random Access Memory), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Moreover, any combination of the various embodiments of the invention can be made without departing from the spirit of the invention, which should also be considered as disclosed herein.

Claims (16)

1. An image compression system, the image compression system comprising:
a nonlinear transformation network comprising a feature extraction network,
Wherein the feature extraction network comprises:
a shift convolution layer for extracting local features of the target image, and
An attention mechanism for extracting potential representation features of the target image from the local features, wherein a query vector in the attention mechanism is equal to a key vector;
A first encoder for encoding potential representation features of the target image to obtain a corresponding first code stream;
A first decoder for decoding the first code stream to obtain a first potential representation feature corresponding to the first code stream; and
A non-linear inverse transform network for decompressing the first potential representation feature to obtain a reconstructed image corresponding to the target image,
Wherein the nonlinear inverse transformation network and the nonlinear transformation network are of symmetrical structures,
The image compression system further includes:
the super prior transformation network is used for compressing the potential representation characteristics of the target image so as to acquire side information of the potential representation characteristics of the target image;
the second encoder is used for encoding the side information to obtain a corresponding second code stream;
a second decoder for decoding the second code stream to obtain a second potential representation feature corresponding to the second code stream;
the super-prior inverse transformation network is used for decoding the second potential representation feature to acquire a variance parameter and a first mean parameter of the potential representation feature of the target image; and
A context model for predicting a second mean parameter of a potential representation feature of the target image from the potential representation feature of the target image, and reconstructing a gaussian distribution model of the potential representation feature of the target image from the first mean parameter, the second mean parameter and the variance parameter,
Accordingly, the first encoder for encoding the potential representation features of the target image comprises: encoding potential representation features of the target image according to the Gaussian distribution model, and
The first decoder for decoding the first code stream includes: and decoding the first code stream according to the Gaussian distribution model.
2. The image compression system of claim 1, wherein the attention mechanism comprises: two 1x 1 convolutional layers; and/or two batch normalization layers.
3. The image compression system of claim 2, wherein the super a priori transformation network comprises: a plurality of convolution layers and a plurality of said feature extraction networks,
The super prior inverse transformation network and the super prior transformation network are of symmetrical structures.
4. The image compression system of claim 3, further comprising: a first quantizer is located between the super a priori transform network and the second encoder.
5. The image compression system of claim 1, wherein the feature extraction network is a plurality of feature extraction networks,
Correspondingly, the nonlinear transformation network further comprises: a plurality of convolution layers alternating with the plurality of feature extraction networks.
6. The image compression system of claim 1, wherein the shift convolution layer comprises: a first shifted convolutional layer and a second shifted convolutional layer,
The feature extraction network further includes: a first residual structure connecting an input of the first shifted convolutional layer with an output of the second shifted convolutional layer; and/or a second residual structure for connecting the input and the output of the attention mechanism.
7. The image compression system of claim 6, wherein the feature extraction network further comprises: an active layer located between the first shifted convolutional layer and the second shifted convolutional layer.
8. The image compression system of claim 1, further comprising: a second quantizer is located between the non-linear transformation network and the first encoder.
9. An image compression method, characterized in that the image compression method comprises:
The following operations are performed by the nonlinear transformation network:
Extracting local features of the target image, and
Extracting potential representation features of the target image from the local features,
Wherein the nonlinear transformation network comprises a feature extraction network comprising: shifting a convolutional layer and an attention mechanism, wherein a query vector in the attention mechanism is equal to a key vector;
Encoding potential representation features of the target image by a first encoder to obtain a corresponding first code stream;
Decoding the first code stream by a first decoder to obtain a first potential representation feature corresponding to the first code stream; and
Decompressing the first potential representation feature over a non-linear inverse transform network to obtain a reconstructed image corresponding to the target image,
Wherein the nonlinear inverse transformation network and the nonlinear transformation network are of symmetrical structures,
The image compression method further includes:
Compressing the potential representation features of the target image through a super prior transformation network to obtain side information of the potential representation features of the target image;
Encoding the side information through a second encoder to obtain a corresponding second code stream;
Decoding, by a second decoder, the second code stream to obtain a second potential representation feature corresponding to the second code stream;
Decoding the second potential representation feature through a super-prior inverse transformation network to obtain a variance parameter and a first mean parameter of the potential representation feature of the target image; and
Predicting a second mean parameter of the potential representation feature of the target image according to the potential representation feature of the target image through a context model, reconstructing a Gaussian distribution model of the potential representation feature of the target image according to the first mean parameter, the second mean parameter and the variance parameter,
Accordingly, the encoding, by the first encoder, the potential representation feature of the target image includes: encoding potential representation features of the target image according to the Gaussian distribution model, and
The decoding of the first code stream by a first decoder includes: and decoding the first code stream according to the Gaussian distribution model.
10. The image compression method of claim 9, wherein the attention mechanism comprises: two 1 x 1 convolutional layers; and/or two batch normalization layers.
11. The image compression method of claim 9, wherein the super a priori transformation network comprises: a plurality of convolution layers and a plurality of said feature extraction networks,
The super prior inverse transformation network and the super prior transformation network are of symmetrical structures.
12. The image compression method of claim 9, wherein the feature extraction network is a plurality of feature extraction networks,
Correspondingly, the nonlinear transformation network further comprises: a plurality of convolution layers alternating with the plurality of feature extraction networks.
13. The image compression method of claim 9, wherein the shift convolution layer comprises: a first shifted convolutional layer and a second shifted convolutional layer,
The feature extraction network further includes: a first residual structure connecting an input of the first shifted convolutional layer with an output of the second shifted convolutional layer; and/or a second residual structure for connecting the input and the output of the attention mechanism.
14. The image compression method of claim 13, wherein the feature extraction network further comprises: an active layer located between the first shifted convolutional layer and the second shifted convolutional layer.
15. A chip, characterized in that it comprises an image compression system according to any one of claims 1-8.
16. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the image compression method of any of claims 9-14.
CN202410318128.6A 2024-03-20 2024-03-20 Image compression system, image compression method, storage medium and chip Active CN117915107B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410318128.6A CN117915107B (en) 2024-03-20 2024-03-20 Image compression system, image compression method, storage medium and chip

Publications (2)

Publication Number Publication Date
CN117915107A CN117915107A (en) 2024-04-19
CN117915107B true CN117915107B (en) 2024-05-17


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114222142A (en) * 2022-02-23 2022-03-22 北京智芯微电子科技有限公司 Image coding method and device based on impulse neural network
KR20220124622A (en) * 2021-03-02 2022-09-14 삼성전자주식회사 Image compression method and apparatus thereof
CN115699757A (en) * 2020-08-06 2023-02-03 华为技术有限公司 Input preprocessing method and output post-processing method and device for image processing network
CN116258782A (en) * 2023-02-07 2023-06-13 浙江大华技术股份有限公司 Image compression method, image encoding method, image decoding method and device
CN116630448A (en) * 2023-05-23 2023-08-22 北京工业大学 Image compression method based on neural data dependent transformation of window attention
WO2023172153A1 (en) * 2022-03-09 2023-09-14 Huawei Technologies Co., Ltd. Method of video coding by multi-modal processing
CN116912337A (en) * 2023-06-27 2023-10-20 中国工商银行股份有限公司 Data processing method and device based on image compression coding system
CN117354523A (en) * 2023-09-29 2024-01-05 上海交通大学 Image coding, decoding and compressing method for frequency domain feature perception learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant