CN114554210A - Lossless image compression method and system - Google Patents

Lossless image compression method and system

Info

Publication number
CN114554210A
Authority
CN
China
Prior art keywords
pixels
prediction
pixel
channel
image compression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111563861.7A
Other languages
Chinese (zh)
Inventor
张高志
李凡平
石柱国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ISSA Technology Co Ltd
Original Assignee
ISSA Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ISSA Technology Co Ltd filed Critical ISSA Technology Co Ltd
Priority to CN202111563861.7A
Publication of CN114554210A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146 Data rate or code amount at the encoder output
    • H04N19/149 Data rate or code amount at the encoder output by estimating the code amount by means of a model, e.g. mathematical model or statistical model
    • H04N19/154 Measured or subjectively estimated visual quality after decoding, e.g. measurement of distortion
    • H04N19/169 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/182 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, the unit being a pixel
    • H04N19/30 Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using hierarchical techniques, e.g. scalability

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Algebra (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention provides a lossless image compression method and system in the technical field of computer vision. The method converts an original image from the RGB color space to the YUV color space; processes the converted image with a pre-trained prediction model to obtain predicted pixels on the Y, U and V channels together with context information among the pixels; encodes a final compressed bit stream from the prediction errors between the original pixels on the Y, U and V channels and the corresponding predicted pixels, together with the context information; and performs image compression based on the final compressed bit stream. Built on a multi-layer perceptron (MLP) and adopting channel-by-channel, residual and progressive learning, the method jointly estimates the mean (pixel prediction) and variance (context) of each pixel's prediction-error model, reducing system complexity and power consumption while improving image compression efficiency and quality.

Description

Lossless image compression method and system
Technical Field
The invention relates to the technical field of computer vision, and in particular to a lossless image compression method and system based on a multi-layer perceptron (MLP).
Background
Image compression may be lossy or lossless. Lossless compression is preferred for technical drawings, charts and comics, because lossy compression, especially at low bit rates, introduces compression artifacts. Lossless compression is also the usual choice for archiving valuable content such as medical images or scanned documents. Lossy methods are well suited to natural images: in some applications a slight loss of fidelity is acceptable (and sometimes imperceptible), which can significantly reduce the bit rate.
For lossless compression, predictive coding is the most widely used approach. Examples of non-learning lossless codecs include BPG, PNG, JPEG-LS, JPEG 2000, LCIC, WebP, FLIF and JPEG XL. Deep neural networks have achieved significant success in computer vision and signal processing, and also provide solutions for image compression. For example, neural-network-based compression methods that extract features from an input image and then encode the features into a bit stream can be viewed as learned transform-and-quantization pipelines; because quantization discards information, such methods are not suitable for lossless compression.
Learning-based lossless compression methods can learn a probability model of the pixel values or of a residual signal, or adopt hybrid coding, replacing the predictor of a non-learning codec with a deep neural network and improving the entropy coder. These methods outperform the non-learning codec FLIF in compressed bits per pixel. However, their computation time is much longer than that of FLIF, larger by a factor of at least 10² even on a GPU, which makes them less practical.
Disclosure of Invention
The invention aims to provide a lossless image compression method and system based on a multi-layer perceptron (MLP) that adopt the ideas of channel-by-channel, residual and progressive learning and jointly estimate only the mean (pixel prediction) and variance (context) of the prediction-error model of each pixel, achieving accurate prediction in a short time, so as to solve at least one technical problem in the background art.
In order to achieve the purpose, the invention adopts the following technical scheme:
in one aspect, the present invention provides a lossless image compression method, including:
converting an original image from an RGB color space to a YUV color space;
processing the image after spatial conversion by using a pre-trained prediction model to obtain Y, U, V prediction pixels on three channels and context information among the pixels;
coding to obtain a final compressed bit stream based on the prediction errors and the context information of original pixels on Y, U, V channels and the corresponding prediction pixels;
and performing image compression based on the final compressed bit stream.
Preferably, it is determined whether each pixel to be encoded lies in a smooth region or a texture region, and the pixels to be encoded in the smooth regions and texture regions of the image are predicted by a smooth prediction branch and a texture prediction branch, respectively.
Preferably, the smooth prediction branch and the texture prediction branch are both composed of three MLP neural networks, and each MLP neural network corresponds to one color channel.
Preferably, in the pre-trained prediction model, the MLP neural network takes the support pixel of the pixel to be coded as input, and outputs a pixel prediction value, context information and intermediate characteristics; the support pixels are adjacent pixels of the pixels to be encoded, and the intermediate characteristic is a node value of the last hidden layer of the MLP neural network.
Preferably, based on the support pixels of the Y channel, an MLP neural network predicts the pixel values, context information and intermediate features of the Y channel; the information from the Y channel and the support pixels in the U channel are used to predict the pixel values, context information and intermediate features of the U channel; and the information from the Y and U channels is combined with the support pixels of the V channel to predict the pixel values, context information and intermediate features of the V channel.
Preferably, if the pixel variation of a group of support pixels of the pixel to be encoded is less than or equal to the threshold, the group of support pixels is a smooth patch, and the pixel to be encoded is located in the smooth area; and if the pixel change of a group of support pixels of the pixel to be coded is larger than the threshold value, the group of support pixels are texture patches, and the pixel to be coded is positioned in the texture area.
In a second aspect, the present invention provides a lossless image compression system comprising:
the conversion module is used for converting an original image from an RGB color space to a YUV color space;
the prediction module is used for processing the image after the space conversion by using a pre-trained prediction model to obtain Y, U, V prediction pixels on three channels and context information among the pixels;
the encoding module is used for encoding to obtain a final compressed bit stream based on the prediction errors and the context information of original pixels on Y, U, V channels and the corresponding prediction pixels;
a compression module for image compression based on the final compressed bitstream.
In a third aspect, the invention provides a non-transitory computer readable storage medium for storing computer instructions which, when executed by a processor, implement a lossless image compression method as described above.
In a fourth aspect, the invention provides a computer program product comprising a computer program for implementing a lossless image compression method as described above when the computer program is run on one or more processors.
In a fifth aspect, the present invention provides an electronic device, comprising: a processor, a memory, and a computer program; wherein a processor is connected to the memory, the computer program being stored in the memory, the processor executing the computer program stored in the memory when the electronic device is running, to cause the electronic device to execute instructions implementing the lossless image compression method as described above.
The invention has the beneficial effects that: based on a multilayer perceptron (MLP), the ideas of channel-by-channel, residual and progressive learning are adopted, the average value (pixel prediction) and the variance (context) of a prediction error model of each pixel are jointly estimated, the complexity and the power consumption of a system are reduced, and the image compression calculation efficiency and the compression quality are improved.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic workflow diagram of an efficient image encoding method based on a multi-layered perceptron (MLP) according to an embodiment of the present invention.
Fig. 2 is a flowchart of a channel-level progressive compression scheme according to an embodiment of the present invention.
Fig. 3 is a flowchart of pixel encoding for smooth and texture regions of an image according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of the internal structures of the smooth network and the texture network according to the embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below by way of the drawings are illustrative only and are not to be construed as limiting the invention.
It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
For the purpose of facilitating an understanding of the present invention, the present invention will be further explained by way of specific embodiments with reference to the accompanying drawings, which are not intended to limit the present invention.
It should be understood by those skilled in the art that the drawings are merely schematic representations of embodiments and that the elements shown in the drawings are not necessarily required to practice the invention.
Example 1
This embodiment 1 provides a lossless image compression system, including:
the conversion module is used for converting the original image from an RGB color space to a YUV color space;
the prediction module is used for processing the image after the space conversion by using a pre-trained prediction model to obtain Y, U, V prediction pixels on three channels and context information among the pixels;
the encoding module is used for encoding to obtain a final compressed bit stream based on the prediction errors and the context information of original pixels on Y, U, V channels and the corresponding prediction pixels;
a compression module for image compression based on the final compressed bitstream.
In this embodiment 1, with the above system, a lossless image compression method is implemented, which includes:
firstly, converting an original image from an RGB color space to a YUV color space by using a conversion module;
then, processing the image after space conversion by using a prediction module based on a pre-trained prediction model to obtain Y, U, V prediction pixels on three channels and context information among the pixels;
then, an encoding module is utilized to encode and obtain a final compressed bit stream based on the prediction errors and context information of original pixels on Y, U, V channels and the corresponding prediction pixels;
and finally, performing image compression based on the final compressed bit stream by adopting a compression module.
Specifically, in this embodiment 1, when processing the spatially converted image with the pre-trained prediction model, it is first determined whether the pixel to be encoded is in a smooth region or a texture region; the pixels to be encoded in the smooth and texture regions of the image are then predicted by the smooth prediction branch and the texture prediction branch, respectively.
The smooth prediction branch and the texture prediction branch each consist of three multi-layer perceptron (MLP) neural networks, one per color channel. All networks share the same internal configuration: 4 hidden layers of 64 units each, and the intermediate feature of each network also has size 64.
In a pre-trained prediction model, an MLP neural network takes a support pixel of a pixel to be coded as input and outputs a pixel prediction value, context information and an intermediate characteristic; the support pixels are neighbor pixels of the pixels to be encoded, and the intermediate features are node values of the last hidden layer of the MLP neural network.
Specifically, when performing pixel coding prediction in this embodiment 1, the MLP neural network first predicts the pixel values, context information and intermediate features of the Y channel based on the support pixels of the Y channel; the information from the Y channel and the support pixels in the U channel are then used to predict the pixel values, context information and intermediate features of the U channel; finally, the information from the Y and U channels is combined with the support pixels of the V channel to predict the pixel values, context information and intermediate features of the V channel.
If the pixel change of a group of support pixels of the pixels to be coded is less than or equal to a threshold value, the group of support pixels are smooth patches, and the pixels to be coded are located in a smooth area; and if the pixel change of a group of support pixels of the pixel to be coded is larger than the threshold value, the group of support pixels are texture patches, and the pixel to be coded is positioned in the texture area.
Example 2
To address the long compression times caused by the overly complex models of existing methods, and to achieve high image compression performance within a reasonable time, this embodiment 2 provides an efficient image coding method based on a multi-layer perceptron (MLP). For accurate prediction, the ideas of channel-by-channel, residual and progressive learning are adopted. For a practical computation time, only the mean (pixel prediction) and variance (context) of the prediction error model of each pixel are jointly estimated. Using MLPs reduces the complexity and power consumption of the system compared with convolutional or recurrent neural networks.
As shown in fig. 1, in this embodiment 2, the workflow of the multi-layer perceptron (MLP) -based image efficient coding method includes the following steps:
Step S1: acquire the original image and convert it from the RGB color space to the YUV color space by a reversible color transform;
Step S2: train six simple MLP neural network models, of which three form the smooth prediction branch and three form the texture prediction branch of the prediction model;
Step S3: determine, from the local activity of the image, whether each patch is a smooth patch or a texture patch;
Step S4: input smooth patches to the smooth network (i.e., the smooth prediction branch) and texture patches to the texture network (i.e., the texture prediction branch);
Step S5: apply a channel-level progressive compression scheme, using one MLP network for each of the Y, U, V channels of the image;
Step S6: at the channel level, the MLP neural network simultaneously predicts the pixel value and the context of each pixel to be encoded;
Step S7: feed the prediction errors and contexts to an adaptive arithmetic coder, which produces the final compressed bit stream.
In embodiment 2, the above method uses different network models for different regions of the image; it requires less time than non-learning encoder methods and achieves better image compression performance than learning-based methods, thereby delivering higher lossless image compression performance in a shorter time.
Next, a detailed flow of the method for efficient encoding and compressing of images in embodiment 2 will be described.
First, the acquired original image is converted from the RGB color space to the YUV color space. Compression efficiency for color images is improved by decorrelating the color channels. For lossless compression, the color conversion itself must be lossless, i.e., in integer arithmetic the inverse transform from YUV back to RGB must reproduce the original RGB values exactly. In this embodiment 2, a reversible color transform is employed which closely approximates the conventional YUV transform.
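The patent does not name the specific reversible transform, so the sketch below uses the JPEG 2000-style reversible color transform (RCT) as an illustrative assumption; it closely approximates the conventional YUV transform and is exactly invertible in integer arithmetic:

```python
def rgb_to_yuv_reversible(r, g, b):
    # JPEG 2000-style reversible color transform (RCT); an assumed choice,
    # since the patent only requires "a reversible color transform".
    y = (r + 2 * g + b) >> 2   # floor((R + 2G + B) / 4)
    u = b - g
    v = r - g
    return y, u, v

def yuv_to_rgb_reversible(y, u, v):
    # Exact integer inverse: recover G first, then B and R.
    g = y - ((u + v) >> 2)
    b = u + g
    r = v + g
    return r, g, b
```

The floor divisions cancel exactly in the inverse, which is what makes the transform lossless despite the rounding in the forward pass.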
Next, the loss function for training the MLP neural network is described.
The loss function aims at achieving two goals, namely: (1) minimize prediction error to accurately reconstruct the encoded pixels, (2) find a suitable context that represents local activity well.
The first objective can be achieved by minimizing the L1 loss between the reconstructed encoded pixels and the corresponding original pixels, with the following formula:

$$L_{\mathrm{rec}}^{c} = \sum_{i} \left| x_{i}^{c} - \hat{x}_{i}^{c} \right|$$

where $x_{i}^{c}$ represents the ith original pixel of channel $c$, and $\hat{x}_{i}^{c}$ represents the ith reconstructed encoded pixel of channel $c$.
The second objective is achieved by a context loss, such that the context reflects the magnitude of the local activity. Since the prediction error tends to be large in edge and texture regions and small in smooth regions, in this embodiment 2 a context proportional to the prediction error is modeled. That is, through the context loss the network learns to estimate the context as the prediction-error magnitude, with the following formula:

$$L_{\mathrm{ctx}}^{c} = \sum_{i} \left| \left| x_{i}^{c} - \hat{x}_{i}^{c} \right| - \sigma_{i}^{c} \right|$$

where $\sigma_{i}^{c}$ denotes the context of the ith pixel of channel $c$.
the overall objective function L is the sum of the two objectives, and the specific formula is as follows:
Figure RE-GDA0003569594980000092
wherein λ iscIs a hyperparameter that balances the contribution of each channel. Furthermore, the contribution to the reconstruction loss and the context loss is the same.
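The two loss terms and the overall objective can be sketched in plain Python; this is an illustrative sketch, assuming the context loss penalizes the gap between the context and the prediction-error magnitude:

```python
def channel_loss(originals, predictions, contexts):
    # L1 reconstruction loss: sum_i |x_i - x_hat_i|.
    rec = sum(abs(x - xh) for x, xh in zip(originals, predictions))
    # Context loss: the context sigma_i should track the prediction-error
    # magnitude |x_i - x_hat_i| (an assumption; see the text above).
    ctx = sum(abs(abs(x - xh) - s)
              for x, xh, s in zip(originals, predictions, contexts))
    return rec + ctx

def total_loss(per_channel_losses, lambdas):
    # Overall objective: lambda_c-weighted sum over the Y, U, V channels.
    return sum(lambdas[c] * per_channel_losses[c] for c in ("Y", "U", "V"))
```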
Since the encoding is processed in the order of Y, U, V, the performance of Y affects the performance of U. Furthermore, the performance of V also depends on Y and U. Therefore, in this embodiment 2, the network is sequentially optimized in a progressive manner in the order of Y, U, V, and this improved training scheme is called progressive training. Formally, progressive training takes the following penalties, respectively:
$$L_{Y} = \lambda_{Y} \left( L_{\mathrm{rec}}^{Y} + L_{\mathrm{ctx}}^{Y} \right)$$
$$L_{U} = L_{Y} + \lambda_{U} \left( L_{\mathrm{rec}}^{U} + L_{\mathrm{ctx}}^{U} \right)$$
$$L_{V} = L_{U} + \lambda_{V} \left( L_{\mathrm{rec}}^{V} + L_{\mathrm{ctx}}^{V} \right)$$
in this embodiment 2, the MLP neural network is optimized according to the sequence of the three equations, and the network is finely tuned according to the overall objective function L.
Next, the channel-level progressive prediction is described. First, consider simultaneous estimation using the same support pixels for the three Y, U, V channels.
For each pixel to be encoded, the MLP neural network generates its prediction value and context from the support-pixel input. The support pixels are the causal neighboring pixels used as inputs to the MLP neural network. Since pixels far from the encoded pixel contribute little to the estimation, only pixels within a short distance are used as support pixels.
In this embodiment 2, the distance is set in proportion to the image resolution. Specifically, for low resolution, 2K and 4K UHD images, the distances are set to 1, 2, 4, respectively. Formally, the prediction process is described by the following equation:
$$\left( \hat{x}_{i}^{c}, \sigma_{i}^{c} \right) = f_{c} \left( x_{S,i} \right)$$

where $f_{c}$ represents the neural network for channel $c$, $x_{S,i}$ represents the support pixels of the coded pixel $x_{i}$, $\hat{x}_{i}^{c}$ represents the ith predicted pixel of channel $c$, and $\sigma_{i}^{c}$ represents the context of the ith pixel of channel $c$.
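The gathering of support pixels for one pixel to be encoded can be sketched as follows; the exact neighborhood shape is not specified beyond "causal neighbors within a short distance", so the window below (full rows above within distance d, plus pixels to the left in the current row, matching raster-scan order) is an assumption:

```python
def support_pixels(img, row, col, d):
    # Causal neighbors of (row, col) within distance d: complete rows
    # above, and only already-scanned pixels to the left in the current
    # row. `d` is 1, 2 or 4 for low-resolution, 2K and 4K UHD images.
    h, w = len(img), len(img[0])
    support = []
    for r in range(max(0, row - d), row):
        for c in range(max(0, col - d), min(w, col + d + 1)):
            support.append(img[r][c])
    for c in range(max(0, col - d), col):  # current row, left side only
        support.append(img[row][c])
    return support
```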
The channel level progressive compression scheme is described below. The whole flow is shown in fig. 2.
The pixel values and context of channel Y are first predicted by the MLP neural network. This prediction is based only on the support pixels, since they are the only available information. Formally, $f_{Y}$ takes the support pixels as input and generates three outputs: the pixel prediction $\hat{x}_{i}^{Y}$, the context $\sigma_{i}^{Y}$, and the intermediate feature $h_{i}^{Y}$, where the intermediate feature is the node values of the last hidden layer in the MLP.
To predict the encoded pixels in channel U, in this embodiment 2, the information from Y is used together with the support pixels in channel U. Specifically, $f_{U}$ uses the Y-channel prediction $\hat{x}_{i}^{Y}$, the intermediate feature $h_{i}^{Y}$, and the support pixels to predict the encoded pixel $\hat{x}_{i}^{U}$. Likewise, information from both the Y and U channels is used to predict the encoded pixel in V. Formally, $f_{V}$ takes $\hat{x}_{i}^{Y}$, $\hat{x}_{i}^{U}$, $h_{i}^{Y}$, $h_{i}^{U}$ and the support pixels as inputs.
The specific formula of the channel-level progressive prediction is as follows:
$$\left( \hat{x}_{i}^{Y}, \sigma_{i}^{Y}, h_{i}^{Y} \right) = f_{Y} \left( x_{S,i}^{Y} \right)$$
$$\left( \hat{x}_{i}^{U}, \sigma_{i}^{U}, h_{i}^{U} \right) = f_{U} \left( x_{S,i}^{U}, \hat{x}_{i}^{Y}, h_{i}^{Y} \right)$$
$$\left( \hat{x}_{i}^{V}, \sigma_{i}^{V}, h_{i}^{V} \right) = f_{V} \left( x_{S,i}^{V}, \hat{x}_{i}^{Y}, \hat{x}_{i}^{U}, h_{i}^{Y}, h_{i}^{U} \right)$$
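The data flow of the three-step channel-level prediction can be sketched as follows; `f_y`, `f_u`, `f_v` are hypothetical stand-ins for the per-channel MLPs, each returning a prediction, a context, and an intermediate feature:

```python
def progressive_predict(support_y, support_u, support_v, f_y, f_u, f_v):
    # Channel-level progressive prediction: Y first, then U conditioned
    # on Y's outputs, then V conditioned on both Y's and U's outputs.
    y_hat, sigma_y, h_y = f_y(support_y)
    u_hat, sigma_u, h_u = f_u(support_u, y_hat, h_y)
    v_hat, sigma_v, h_v = f_v(support_v, y_hat, u_hat, h_y, h_u)
    return (y_hat, sigma_y), (u_hat, sigma_u), (v_hat, sigma_v)
```

Stub predictors (e.g. a mean over the support pixels) are enough to exercise the data flow before plugging in trained networks.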
the prediction error using the above described progressive scheme is still too large to yield the most advanced compression performance. In this embodiment 2, one of the pixel values is subtracted from all the support pixels and the encoded pixels. In particular, the pixel to the left of the encoded pixel is selected, i.e.
Figure RE-GDA00035695949800001013
And subtracts it from the support pixels and the encoded pixels.
Since the encoded pixel and the pixel to its left tend to have higher correlation, the variance of the difference between them is usually lower and the mean is close to zero. Thus, the network will produce a more stable and accurate output, resulting in better compression results. In this embodiment 2, the prediction scheme is called residual prediction, and the formula is specifically expressed as follows:
$$r_{i}^{c} = x_{i}^{c} - x_{L,i}^{c}$$
$$\left( \hat{r}_{i}^{c}, \sigma_{i}^{c} \right) = f_{c} \left( x_{S,i}^{c} - x_{L,i}^{c} \right)$$
$$\hat{x}_{i}^{c} = \hat{r}_{i}^{c} + x_{L,i}^{c}$$

where $r_{i}^{c}$ is the encoded pixel of channel $c$ minus the pixel $x_{L,i}^{c}$ to its left.
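Residual prediction can be sketched as a wrapper around any per-channel predictor; `predict` below is a hypothetical stand-in for the MLP of one channel, returning a residual prediction and a context:

```python
def residual_predict(support, left_pixel, predict):
    # Subtract the left neighbor from every support pixel so the network
    # sees a near-zero-mean, low-variance input, then add it back to the
    # predicted residual to recover the pixel prediction.
    shifted = [p - left_pixel for p in support]
    r_hat, context = predict(shifted)   # network predicts the residual
    x_hat = r_hat + left_pixel          # undo the shift
    return x_hat, context
```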
Pixel estimators typically have good estimation performance on smooth regions and relatively poor estimation performance on textured regions. In other words, the networks exhibit different behavior depending on the area, and therefore it would be beneficial to have a different network for each type of area.
In this embodiment 2, two kinds of MLP neural networks are proposed, one for the smooth regions and one for the texture regions of the image (i.e. a smooth prediction branch and a texture prediction branch). Each branch consists of three MLP neural networks, one per color channel. All networks share the same internal configuration: 4 hidden layers with 64 units each, and the intermediate feature of each network also has size 64. One branch specializes in smooth areas and the other in textured areas.
It is first determined whether the encoded pixels are in smooth regions or in texture regions. Specifically, a set of support pixels is denoted as a smooth patch if its pixel variation is small, and as a texture patch otherwise. In this embodiment 2, the mean absolute deviation is used to distinguish smooth from texture regions: if the mean absolute deviation of the support pixels is less than or equal to the threshold, the patch is determined to be a smooth patch; otherwise it is a texture patch. The threshold is set to 50 in this embodiment 2. After the patch type is determined, smooth patches are sent to the smooth network and texture patches to the texture network. In this way, each network can focus on a particular patch type, thereby improving estimation accuracy. The overall workflow is shown in fig. 3, and the internal structure of the smooth network and the texture network is shown in fig. 4.
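The smooth/texture decision described above can be sketched directly as a mean-absolute-deviation test against the threshold of 50:

```python
def is_smooth_patch(support, threshold=50):
    # Mean absolute deviation of the support pixels; at or below the
    # threshold (50 in this embodiment) the patch is treated as smooth
    # and routed to the smooth network, otherwise to the texture network.
    mean = sum(support) / len(support)
    mad = sum(abs(p - mean) for p in support) / len(support)
    return mad <= threshold
```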
The prediction error $x_{i}^{c} - \hat{x}_{i}^{c}$ and the context are fed to an adaptive encoder, which generates the final compressed bit stream.
The adaptive encoder exploits the probability distribution of the prediction error, so the accuracy of the probability-density estimate strongly affects compression performance. To make the entropy coding in the framework more accurate, N adaptive encoders are used in this embodiment 2, where the jth adaptive encoder learns the statistics of the jth level of local activity. That is, the first adaptive encoder processes prediction errors from the regions of smallest local activity, while the Nth adaptive encoder exclusively handles the regions of largest local activity. The context indicates the magnitude of the local activity; in this embodiment 2 it is quantized into N levels, each corresponding to one of the N adaptive encoders. Note that the entire process is performed in raster-scan order, and in Y, U, V channel order. N is set to 24 in this embodiment 2.
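The mapping from a context value to one of the N = 24 adaptive encoders can be sketched as a uniform quantizer; the uniform binning and the `max_context` bound are assumptions, since the text only states that the context is quantized into N levels:

```python
def coder_index(context, max_context, n_coders=24):
    # Quantize the context (local-activity estimate) into n_coders
    # uniform levels; each level selects its own adaptive arithmetic
    # coder, so each coder learns the statistics of one activity level.
    level = int(context / max_context * n_coders)
    return min(level, n_coders - 1)   # clamp the top edge into the last bin
```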
Example 3
Embodiment 3 of the present invention provides a non-transitory computer readable storage medium for storing computer instructions which, when executed by a processor, implement a lossless image compression method as described above, the method including:
converting an original image from an RGB color space to a YUV color space;
processing the spatially converted image by using a pre-trained prediction model to obtain predicted pixels on the Y, U and V channels and context information among the pixels;
coding to obtain a final compressed bit stream based on the context information and the prediction errors between the original pixels on the Y, U and V channels and the corresponding predicted pixels;
and performing image compression based on the final compressed bit stream.
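The first step of the method above converts RGB to YUV. The embodiment does not state which transform is used; since the pipeline must be lossless, this sketch uses the integer-reversible color transform (RCT) of JPEG 2000 as one plausible choice, not the patent's confirmed transform.

```python
# Integer-reversible RGB <-> YUV-style transform (JPEG 2000 RCT), chosen
# here as an assumption: exact integer inversion is what a lossless
# compression pipeline requires of its color-space conversion.

def rgb_to_yuv_rct(r, g, b):
    y = (r + 2 * g + b) >> 2   # luma approximation (floor division by 4)
    u = b - g                  # blue chroma difference
    v = r - g                  # red chroma difference
    return y, u, v

def yuv_to_rgb_rct(y, u, v):
    g = y - ((u + v) >> 2)     # exact inverse: (u+v) = r+b-2g
    b = u + g
    r = v + g
    return r, g, b

# Round-trip check: the transform is exactly invertible on integers.
assert yuv_to_rgb_rct(*rgb_to_yuv_rct(200, 100, 50)) == (200, 100, 50)
print(rgb_to_yuv_rct(200, 100, 50))  # (112, -50, 100)
```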
Example 4
Embodiment 4 of the present invention provides a computer program product comprising a computer program which, when run on one or more processors, implements the lossless image compression method described above, the method comprising:
converting an original image from an RGB color space to a YUV color space;
processing the spatially converted image by using a pre-trained prediction model to obtain predicted pixels on the Y, U and V channels and context information among the pixels;
coding to obtain a final compressed bit stream based on the context information and the prediction errors between the original pixels on the Y, U and V channels and the corresponding predicted pixels;
and performing image compression based on the final compressed bit stream.
Example 5
Embodiment 5 of the present invention provides an electronic device, including: a processor, a memory, and a computer program; wherein the processor is connected to the memory, the computer program is stored in the memory, and when the electronic device is running, the processor executes the computer program stored in the memory to cause the electronic device to execute instructions implementing the lossless image compression method described above, the method comprising:
converting an original image from an RGB color space to a YUV color space;
processing the spatially converted image by using a pre-trained prediction model to obtain predicted pixels on the Y, U and V channels and context information among the pixels;
coding to obtain a final compressed bit stream based on the context information and the prediction errors between the original pixels on the Y, U and V channels and the corresponding predicted pixels;
and performing image compression based on the final compressed bit stream.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the embodiments of the present invention have been described with reference to the accompanying drawings, they are not intended to limit the scope of the present invention, and it should be understood by those skilled in the art that various modifications and variations can be made, without inventive effort, on the basis of the technical solutions disclosed in the present invention.

Claims (10)

1. A lossless image compression method, comprising:
converting an original image from an RGB color space to a YUV color space;
processing the spatially converted image by using a pre-trained prediction model to obtain predicted pixels on the Y, U and V channels and context information among the pixels;
coding to obtain a final compressed bit stream based on the context information and the prediction errors between the original pixels on the Y, U and V channels and the corresponding predicted pixels;
and performing image compression based on the final compressed bit stream.
2. The lossless image compression method as claimed in claim 1, wherein it is determined whether the pixel to be encoded is in a smooth region or a texture region; and the pixels to be encoded in the smooth region and the texture region of the image are predicted by using the smooth prediction branch and the texture prediction branch, respectively.
3. The lossless image compression method as claimed in claim 2, wherein the smooth prediction branch and the texture prediction branch are each composed of three MLP neural networks, each corresponding to one color channel.
4. The lossless image compression method as claimed in claim 3, wherein in the pre-trained prediction model, the MLP neural network takes the support pixels of the pixels to be encoded as input, and outputs the predicted values, context information and intermediate features of the pixels; the support pixels are neighbor pixels of the pixels to be encoded, and the intermediate features are node values of the last hidden layer of the MLP neural network.
5. The lossless image compression method as claimed in claim 4, wherein the pixel values, the context information and the intermediate features of the Y channel are predicted by using an MLP neural network based on the support pixels of the Y channel; the pixel values, the context information and the intermediate features of the U channel are predicted by using the information from the Y channel together with the support pixels of the U channel; and the pixel values, the context information and the intermediate features of the V channel are predicted by combining the information from the Y channel and the U channel with the support pixels of the V channel.
6. A lossless image compression method as claimed in claim 5, wherein if the pixel variation of a set of support pixels of the pixel to be encoded is less than or equal to a threshold, the set of support pixels is a smooth patch, and the pixel to be encoded is located in a smooth region; and if the pixel change of a group of support pixels of the pixel to be coded is larger than the threshold value, the group of support pixels are texture patches, and the pixel to be coded is positioned in the texture area.
7. A lossless image compression system, comprising:
the conversion module is used for converting the original image from an RGB color space to a YUV color space;
the prediction module is used for processing the spatially converted image by using a pre-trained prediction model to obtain predicted pixels on the Y, U and V channels and context information among the pixels;
the encoding module is used for coding to obtain a final compressed bit stream based on the context information and the prediction errors between the original pixels on the Y, U and V channels and the corresponding predicted pixels;
a compression module for image compression based on the final compressed bitstream.
8. A non-transitory computer-readable storage medium for storing computer instructions which, when executed by a processor, implement the lossless image compression method of any one of claims 1 to 6.
9. A computer program product, comprising a computer program for implementing a lossless image compression method as claimed in any one of claims 1 to 6, when the computer program is run on one or more processors.
10. An electronic device, comprising: a processor, a memory, and a computer program; wherein a processor is connected to the memory, in which the computer program is stored, which processor executes the computer program stored by the memory when the electronic device is running, to cause the electronic device to execute instructions implementing the lossless image compression method as claimed in any one of claims 1 to 6.
CN202111563861.7A 2021-12-20 2021-12-20 Lossless image compression method and system Pending CN114554210A (en)

Publications (1)

Publication Number Publication Date
CN114554210A 2022-05-27

Family ID: 81668917

Country Status (1): CN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination