CN114792347A

CN114792347A - An image compression method based on fusion of multi-scale space and context information

Info

Publication number: CN114792347A
Application number: CN202210224174.0A
Authority: CN
Inventors: 王瀚漓; 刘自毅
Original assignee: Tongji University
Current assignee: Tongji University
Priority date: 2022-03-09
Filing date: 2022-03-09
Publication date: 2022-07-26
Anticipated expiration: 2042-03-09
Also published as: CN114792347B

Abstract

The invention relates to an image compression method based on the fusion of multi-scale spatial and context information. The method includes the following steps: 1) constructing an image compression model based on the fusion of multi-scale spatial and context information, extracting hidden features from the original image through a main encoder, and using a multi-scale information fusion module to reduce the loss of effective information during forward propagation; 2) combining, in the hyperprior module, the hyperprior information and the multi-scale context information to obtain the parameters and weights of three Gaussian functions, which are summed according to the weights to form a Gaussian mixture model that gives the probability distribution of the hidden features; 3) encoding and decoding the hidden features with an arithmetic coder based on that probability distribution; 4) reconstructing the hidden features into an image with the main decoder to complete image compression. Compared with the prior art, the present invention achieves better image reconstruction quality at a lower bit rate.

Description

An image compression method based on fusion of multi-scale space and context information

Technical field

The invention relates to the technical field of image compression, and in particular to an image compression method based on the fusion of multi-scale spatial and context information.

Background

Following the third information revolution, large amounts of digital information came to be transmitted between terminals. Limited by the means of acquiring digital information at the time, most of that information was text. With the emergence and spread of consumer digital devices, and especially with the arrival of the mobile Internet era, anyone can be a photographer. Easy-to-use devices send huge numbers of pictures and videos across the Internet, and this volume of data has sharply raised the demands on transmission speed and storage space. Data compression has therefore become a necessity, and how to compress image data, which accounts for a considerable share of Internet data, has become an active research topic.

Before deep-learning-based image coding appeared, there were many traditional methods, including JPEG, JPEG2000 and BPG, which remain widely used today. Traditional methods, however, rely on many hand-designed components, generally including block partitioning, linear transforms, quantization and entropy coding. With the rapid development of deep learning and its wide application in computer vision, a large number of end-to-end image compression methods based on deep learning have been proposed. Most existing methods build on relatively mature deep learning models, such as image compression based on convolutional neural networks, on generative adversarial networks, and on graph convolutional neural networks. Compression algorithms based on generative adversarial networks use adversarial training between a generator and a discriminator to improve the perceptual quality of reconstructed images at low bit rates, but the restored images perform poorly on peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM).

Because convolutional neural networks are well suited to image feature extraction, most deep-learning-based image compression methods use a convolutional architecture. Existing methods of this kind first extract hidden features from the image with a main encoder, and then use a hyperprior autoencoder to extract side information from the hidden features as hyperprior features. Next, the hyperprior features are combined with context features produced by a context model to estimate the probability distribution of the hidden features, which is used to arithmetically encode them. Finally, the main decoder restores the hidden features to an image. These methods still have shortcomings. First, while removing spatial redundancy from the hidden features, the main encoder also discards some useful spatial information; in particular, information about regions with complex texture is lost during forward propagation, which limits reconstruction quality. Second, because the scale of the image content is not known in advance, the fixed-size masked convolution kernel in the context model cannot effectively capture correlation information in the hidden features, so the entropy model of existing methods is still not accurate enough.

Summary of the invention

The purpose of the present invention is to overcome the above defects of the prior art by providing an image compression method based on the fusion of multi-scale spatial and context information.

The purpose of the present invention can be achieved through the following technical solution:

An image compression method based on the fusion of multi-scale spatial and context information, comprising the following steps:

1) Constructing an image compression model based on the fusion of multi-scale spatial and context information, extracting hidden features from the original image through a main encoder, and using a multi-scale information fusion module to reduce the loss of effective information during forward propagation;

2) Combining, in the hyperprior module, the hyperprior information and the multi-scale context information to obtain the parameters and weights of three Gaussian functions, which are summed according to the weights to form a Gaussian mixture model that gives the probability distribution of the hidden features;

3) Encoding and decoding the hidden features with an arithmetic coder, based on the probability distribution of the hidden features;

4) Reconstructing the hidden features into an image with the main decoder to complete image compression.

Step 1) specifically comprises the following steps:

11) The original image is passed through residual blocks, attention modules and multi-scale information fusion modules for feature extraction and downsampling, yielding the hidden feature y. To entropy-code y, it is quantized with a step size of 1 to obtain the quantized hidden feature ŷ:

ŷ = Q(y) = Q(g_a(x; φ))

where x is the original image, φ denotes the parameters of the main encoder, Q(·) denotes quantization, and g_a(·) denotes the main encoder;

12) The multi-scale information fusion module in the main encoder fuses the feature y⁽ⁱ⁾, downsampled i times, with the feature y⁽ⁱ⁺²⁾, downsampled i+2 times, by means of an attention mechanism. To reduce computational cost, the main encoder uses only two multi-scale information fusion modules:

y⁽ⁱ⁺²⁾ = y⁽ⁱ⁺²⁾ + y⁽ⁱ⁺²⁾ * sigmoid(Res(y⁽ⁱ⁾))

where Res(·) denotes a residual block.
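The fusion rule in step 12) can be illustrated with a small NumPy sketch. It is not the patented network: `res_stub` is a hypothetical stand-in for Res(·) that merely average-pools y⁽ⁱ⁾ by a factor of 4, on the assumption that each downsampling step halves the resolution and that both features have the same number of channels.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def res_stub(y_i, factor=4):
    """Hypothetical stand-in for Res(.): average-pool by `factor`
    so that y_i matches the spatial size of the deeper feature."""
    c, h, w = y_i.shape
    return y_i.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))

def fuse(y_i, y_i2):
    """Attention-style fusion: y_(i+2) + y_(i+2) * sigmoid(Res(y_i))."""
    gate = sigmoid(res_stub(y_i))   # values in (0, 1)
    return y_i2 + y_i2 * gate

rng = np.random.default_rng(0)
y_i = rng.normal(size=(8, 64, 64))    # feature downsampled i times (illustrative shape)
y_i2 = rng.normal(size=(8, 16, 16))   # feature downsampled i+2 times
fused = fuse(y_i, y_i2)
```

Because the gate lies between 0 and 1, every element of the deeper feature is scaled by a factor between 1 and 2, so the shallower feature can emphasize a location but never suppresses it.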

Step 2) specifically comprises the following steps:

21) The hyperprior encoder computes the hyperprior feature z from the hidden feature y, and quantization then yields the quantized hyperprior feature ẑ, which helps capture the spatial redundancy in the hidden features and improves the accuracy of the estimated probability distribution of the hidden features:

ẑ = Q(z) = Q(h_a(y; φ_h))

where h_a(·) denotes the hyperprior encoder and φ_h denotes the parameters of the hyperprior encoder;

22) A multi-scale 3D context module computes the multi-scale context feature from the quantized hidden feature ŷ:

[formula not available in the source text]

where downsample denotes downsampling, and the three 3D context models in the formula use convolution kernels of size 5×5×5, 7×7×7 and 9×9×9, respectively;
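The text does not spell out how the masked 3D kernels are laid out. The sketch below assumes the common raster-scan causal mask, in which a kernel may only see positions that precede its center in depth, then row, then column order, and builds that mask for the three kernel sizes named above. The function name `causal_mask_3d` is illustrative.

```python
import numpy as np

def causal_mask_3d(k):
    """Binary mask for a k x k x k kernel: 1 for positions strictly before
    the center in raster-scan order (depth, row, column), 0 for the center
    and every position after it."""
    mask = np.zeros(k ** 3, dtype=np.float32)
    mask[: k ** 3 // 2] = 1.0        # k**3 is odd, so k**3 // 2 is the center index
    return mask.reshape(k, k, k)

# One mask per kernel size used by the multi-scale 3D context module.
masks = {k: causal_mask_3d(k) for k in (5, 7, 9)}
```

Multiplying a 3D convolution kernel element-wise by such a mask before each convolution keeps the context model causal, so the decoder can reproduce the same context from symbols it has already decoded.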

23) After the multi-scale context feature is combined with the hyperprior feature ẑ, the hyperprior decoders compute the model parameters and weights of the Gaussian mixture model:

[formula not available in the source text]

where ω_i, μ_i and σ_i² denote the weight, mean and variance of the i-th Gaussian in the Gaussian mixture model, and the remaining symbol of the formula denotes the i-th hyperprior decoder;

24) The three Gaussian functions are combined, according to their weights, into a Gaussian mixture model that serves as the entropy model, giving the estimated probability distribution of the hidden features:

p(ŷ | ẑ) = (Σ_{i=1..3} ω_i · N(μ_i, σ_i²)) * U(−1/2, 1/2)

where p(ŷ | ẑ) is the conditional probability distribution of the hidden feature ŷ given the hyperprior feature ẑ, N(μ_i, σ_i²) is the Gaussian distribution of the i-th component, weighted by ω_i, * denotes convolution, and U(−1/2, 1/2) is uniformly distributed noise ranging from −1/2 to 1/2.
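As a worked illustration of step 24), the sketch below evaluates the probability of one integer symbol under a three-component Gaussian mixture. It assumes the mixture is convolved with unit-width uniform noise on [−1/2, 1/2], so that each component contributes the difference of its cumulative distribution at symbol ± 0.5. The weights, means and standard deviations are made-up values, not outputs of the patented model.

```python
import math

def gaussian_cdf(t, mu, sigma):
    return 0.5 * (1.0 + math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def gmm_pmf(symbol, weights, means, sigmas):
    """P(symbol) = sum_i w_i * [CDF_i(symbol + 0.5) - CDF_i(symbol - 0.5)]."""
    return sum(
        w * (gaussian_cdf(symbol + 0.5, m, s) - gaussian_cdf(symbol - 0.5, m, s))
        for w, m, s in zip(weights, means, sigmas)
    )

weights = [0.5, 0.3, 0.2]    # illustrative mixture weights (sum to 1)
means = [0.0, 2.0, -1.5]     # illustrative means
sigmas = [1.0, 0.5, 2.0]     # illustrative standard deviations

p = gmm_pmf(0, weights, means, sigmas)
bits = -math.log2(p)         # ideal arithmetic-coding cost of the symbol 0
total = sum(gmm_pmf(k, weights, means, sigmas) for k in range(-60, 61))
```

Summing the probabilities over a wide range of integers gives a value very close to 1, which is what an arithmetic coder needs from its symbol model.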

Step 3) specifically comprises the following steps:

31) To prevent vanishing gradients during training, the quantization process is replaced, in the training stage of the image compression model, by the addition of independent and identically distributed uniform noise;

32) When the image compression model is in use, the hidden features are quantized, their probability distribution is computed from the entropy model produced by the hyperprior encoder, and the quantized features are encoded with arithmetic coding, a form of entropy coding.
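The two quantization modes of steps 31) and 32) can be sketched as follows. The noise range of [−0.5, 0.5] is an assumption that follows from the stated quantization step size of 1, and the function name `quantize` is illustrative.

```python
import numpy as np

def quantize(y, training, rng=None):
    """Training: add i.i.d. uniform noise in [-0.5, 0.5) as a differentiable proxy.
    Inference: round to the nearest integer (np.round rounds halves to even)."""
    if training:
        if rng is None:
            rng = np.random.default_rng()
        return y + rng.uniform(-0.5, 0.5, size=y.shape)
    return np.round(y)

rng = np.random.default_rng(0)
y = rng.normal(size=(4, 4)) * 3.0
noisy = quantize(y, training=True, rng=rng)   # used while training
hard = quantize(y, training=False)            # used when compressing
```

Both outputs stay within half a quantization step of y, which is why the noisy version is a reasonable training-time substitute for rounding.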

Step 4) specifically comprises the following steps:

41) The hidden features are converted back into an image by the residual blocks and upsampling layers of the main decoder:

x̂ = g_s(ŷ; θ)

where g_s(·) denotes the main decoder, θ denotes the parameters of the main decoder, and x̂ is the reconstructed image;

42) The reconstructed image x̂ is compared with the original image x on objective and subjective metrics to evaluate the compression and reconstruction performance of the model.

In step 42), the objective and subjective metrics include PSNR and MS-SSIM.
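Of the two metrics, PSNR is simple enough to show in a few lines. The sketch below assumes 8-bit images with a peak value of 255; MS-SSIM is omitted because it requires a multi-scale filtering pipeline.

```python
import numpy as np

def psnr(x, x_hat, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    diff = x.astype(np.float64) - x_hat.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")          # identical images
    return float(10.0 * np.log10(peak ** 2 / mse))

x = np.zeros((4, 4), dtype=np.uint8)        # toy "original"
x_hat = np.full((4, 4), 5, dtype=np.uint8)  # toy "reconstruction", MSE = 25
value = psnr(x, x_hat)
```

Casting to float64 before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise produce.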

In step 1), the image compression model based on the fusion of multi-scale spatial and context information consists of a main encoder, a hyperprior encoder, a hyperprior decoder, a main decoder and a multi-scale 3D context module.

When the image compression model is trained, the objective function is set as follows to balance bit rate against image reconstruction quality:

L = R(ŷ) + R(ẑ) + λ · D(x, x̂)

where λ is a hyperparameter that balances bit rate against image reconstruction quality, R(ŷ) and R(ẑ) are the bit rates of the quantized hidden feature ŷ and the quantized hyperprior feature ẑ, respectively, and D(·) measures the difference between the original image and the reconstructed image, using MSE or MS-SSIM. When the model is optimized with MSE, it is evaluated by PSNR; when the model is optimized with MS-SSIM, it is evaluated by MS-SSIM.
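A rate-distortion objective of this kind can be sketched numerically. The sketch assumes the form "rate of ŷ plus rate of ẑ plus λ times distortion", measures rate in bits per pixel from per-symbol likelihoods, and uses plain MSE as D(·). The per-pixel normalization and the absence of any pixel-range scaling factor are assumptions, and all function names are illustrative.

```python
import numpy as np

def rate_bpp(likelihoods, num_pixels):
    """Estimated bit rate: total -log2 likelihood divided by the pixel count."""
    return float(-np.sum(np.log2(likelihoods)) / num_pixels)

def rd_loss(lik_y, lik_z, x, x_hat, lam):
    """Rate of y_hat + rate of z_hat + lam * MSE(x, x_hat)."""
    num_pixels = x.shape[-2] * x.shape[-1]
    rate = rate_bpp(lik_y, num_pixels) + rate_bpp(lik_z, num_pixels)
    distortion = float(np.mean((x - x_hat) ** 2))
    return rate + lam * distortion

x = np.zeros((3, 4, 4))              # toy image: 3 channels, 16 pixels
x_hat = np.full((3, 4, 4), 0.1)      # toy reconstruction, MSE = 0.01
lik_y = np.full((2, 2, 2), 0.5)      # 8 symbols at 1 bit each
lik_z = np.full((2,), 0.25)          # 2 symbols at 2 bits each
loss = rd_loss(lik_y, lik_z, x, x_hat, lam=0.0130)
```

In this toy case the rate is 12 bits over 16 pixels, or 0.75 bits per pixel, and the distortion term adds 0.0130 × 0.01, so a larger λ shifts the optimum toward higher quality at a higher bit rate.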

To speed up convergence, the model is first pre-trained at a high bit rate; the value of λ is then changed to adjust the model to other bit rates.

When the pre-trained model is trained, the learning rate decays with the number of iterations; when models at other bit rates are trained, the initial learning rate is increased and likewise decays with the number of iterations.

Compared with the prior art, the present invention has the following advantages:

1. The multi-scale information fusion module used in the main encoder fuses image features of different scales through an attention mechanism. This retains the spatial information of complex regions while avoiding the introduction of additional spatial redundancy into the hidden features.

2. The multi-scale 3D context module used in the context model applies masked 3D convolution kernels of different sizes in parallel, fusing the correlation information found at different spatial scales in the hidden features. This improves the accuracy of the entropy model and therefore the compression efficiency of the model.

Description of drawings

Figure 1 is a schematic diagram of the image compression method based on the fusion of multi-scale spatial and context information.

Figure 2 is a schematic diagram of the multi-scale information fusion module.

Figure 3 is a schematic diagram of the multi-scale 3D context module.

Figure 4 compares the performance of the present invention with that of several other methods.

Figure 5 compares the performance of the present invention with that of several further methods.

Detailed description

The present invention is described in detail below with reference to the accompanying drawings and a specific embodiment. The embodiment is implemented on the basis of the technical solution of the present invention and gives a detailed implementation and a specific operating procedure, but the scope of protection of the present invention is not limited to the embodiment below.

Example

The present invention provides an image compression method based on the fusion of multi-scale spatial and context information, as shown in Figure 1, comprising the following steps:

11) The training stage uses the COCO2014 training set. All training images are randomly cropped to 256×256, the batch size is set to 16, and Kodak24 is used as the test set;

12) To balance bit rate against image reconstruction quality, the objective function for model training is set as:

L = R(ŷ) + R(ẑ) + λ · D(x, x̂)

where λ is a hyperparameter that balances bit rate against image reconstruction quality, R(ŷ) and R(ẑ) are the bit rates of ŷ and ẑ, respectively, and D(·) measures the difference between the original image and the reconstructed image, for which MSE or MS-SSIM can be used. When the model is optimized with MSE, it is evaluated by PSNR, and λ takes the values {0.0035, 0.0067, 0.0130, 0.0250, 0.0483} for the different bit rates. When the model is optimized with MS-SSIM, it is evaluated by MS-SSIM, and λ takes the values {4.58, 8.73, 16.64, 31.73, 60.50} for the different bit rates. To speed up convergence, the model is first pre-trained at a high bit rate; the value of λ is then changed to adjust the model to other bit rates. When the pre-trained model is trained, the learning rate is initially set to 10^-5 and is halved every 100,000 iterations. Afterwards, when models at other bit rates are trained, the learning rate is initially set to 5×10^-5 and is again halved every 100,000 iterations.
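The learning-rate schedule described above is a plain step decay, which the following sketch reproduces using the initial rates, halving interval and λ values quoted in the text. The function and constant names are illustrative.

```python
# Lambda values quoted in the text for the two optimization targets.
LAMBDAS_MSE = [0.0035, 0.0067, 0.0130, 0.0250, 0.0483]
LAMBDAS_MS_SSIM = [4.58, 8.73, 16.64, 31.73, 60.50]

def learning_rate(iteration, initial_lr, step=100_000, factor=0.5):
    """Step decay: multiply the rate by `factor` once every `step` iterations."""
    return initial_lr * factor ** (iteration // step)

pretrain_lr = learning_rate(250_000, initial_lr=1e-5)   # pre-training at a high bit rate
finetune_lr = learning_rate(250_000, initial_lr=5e-5)   # training at another bit rate
```

After 250,000 iterations two halvings have taken place, so each rate is one quarter of its initial value.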

13) The original image is passed through residual blocks, attention modules and multi-scale information fusion modules for feature extraction and downsampling, yielding the hidden feature y. To entropy-code y, it is quantized with a step size of 1 to obtain ŷ. These operations can be expressed by the following formula, in which g_a(·) is the main encoder with parameters φ and Q(·) is quantization:

ŷ = Q(y) = Q(g_a(x; φ))

14) The multi-scale information fusion module in the main encoder fuses the feature y⁽ⁱ⁾, downsampled i times, with the feature y⁽ⁱ⁺²⁾, downsampled i+2 times, by means of an attention mechanism, as shown in Figure 2. To reduce computational cost, the main encoder uses only two multi-scale information fusion modules. This operation can be expressed by the following formula:

y⁽ⁱ⁺²⁾ = y⁽ⁱ⁺²⁾ + y⁽ⁱ⁺²⁾ * sigmoid(Res(y⁽ⁱ⁾))

where Res(·) denotes a residual block.

21) The hyperprior encoder computes the hyperprior feature z from the hidden features, and quantization then yields ẑ, which helps capture the spatial redundancy in the hidden features and improves the accuracy of the estimated probability distribution of the hidden features. These steps can be expressed by the following formula, in which h_a(·) is the hyperprior encoder with parameters φ_h:

ẑ = Q(z) = Q(h_a(y; φ_h))

22) The multi-scale 3D context model computes the multi-scale context features from the quantized hidden features, as shown in Figure 3. This step can be expressed by the formula:

[formula not available in the source text]

23) After the context features are combined with the hyperprior features, the hyperprior decoders compute the model parameters and weights of the Gaussian mixture model, which can be expressed as:

[formula not available in the source text]

24) The three Gaussian functions are combined, according to their weights, into a Gaussian mixture model that serves as the entropy model, giving the estimated probability distribution of the hidden features, which can be expressed as follows, where ω_i, μ_i and σ_i² are the weight, mean and variance of the i-th Gaussian, * denotes convolution, and U(−1/2, 1/2) is uniform noise:

p(ŷ | ẑ) = (Σ_{i=1..3} ω_i · N(μ_i, σ_i²)) * U(−1/2, 1/2)

31) Because the quantized hidden features are discrete and the derivative of the quantization function is zero almost everywhere, gradients vanish during training. To make the image compression model trainable, the quantization process is replaced during the training stage by the addition of independent and identically distributed uniform noise.

32) When the model is in use, the hidden features are quantized, their probability distribution is computed from the entropy model produced by the hyperprior autoencoder, and the quantized features are encoded with entropy coding, typically arithmetic coding.

41) The hidden features are converted back into an image by the residual blocks and upsampling layers of the main decoder. This step can be expressed as follows, where g_s(·) is the main decoder with parameters θ and x̂ is the reconstructed image:

x̂ = g_s(ŷ; θ)

42) The reconstructed image is compared with the original image on objective and subjective metrics to evaluate the compression and reconstruction performance of the model.

To verify the effectiveness of the method, it is compared with traditional methods such as JPEG, JPEG2000, BPG and VVC and with several end-to-end image compression methods. The public Kodak24 test set is used as test data. Each original image is compared with the image reconstructed after compression, and the difference between the two is measured by PSNR and MS-SSIM, giving the two curves shown in Figures 4 and 5.

The multi-scale information fusion module used in the main encoder retains effective spatial information while avoiding the introduction of spatial redundancy into the hidden features, and the multi-scale 3D context module used in the context model fuses context information across different spatial scales, making the entropy model more accurate. Test results on the public Kodak24 test set show that, in terms of PSNR, the method is 0.15 dB higher than VVC, the latest traditional image compression standard.

Claims

1. An image compression method based on the fusion of multi-scale spatial and context information, characterized by comprising the following steps:

1) constructing an image compression model based on the fusion of multi-scale spatial and context information, extracting hidden features from an original image through a main encoder, and using a multi-scale information fusion module to reduce the loss of effective information during forward propagation;

2) combining, in a hyperprior module, hyperprior information and multi-scale context information to obtain the parameters and weights of three Gaussian functions, and summing the Gaussian functions according to the weights to obtain a Gaussian mixture model that gives the probability distribution of the hidden features;

3) encoding and decoding the hidden features with an arithmetic coder, based on the probability distribution of the hidden features;

4) reconstructing the hidden features into an image with a main decoder to complete image compression.

2. The image compression method based on the fusion of multi-scale spatial and context information according to claim 1, wherein step 1) specifically comprises the following steps:

11) the original image is passed through residual blocks, attention modules and multi-scale information fusion modules for feature extraction and downsampling to obtain a hidden feature y, and, in order to entropy-code y, y is quantized with a step size of 1 to obtain a quantized hidden feature ŷ:

ŷ = Q(y) = Q(g_a(x; φ))

wherein x is the original image, φ denotes the parameters of the main encoder, Q(·) denotes quantization, and g_a(·) denotes the main encoder;

12) the multi-scale information fusion module in the main encoder fuses the feature y⁽ⁱ⁾, downsampled i times, with the feature y⁽ⁱ⁺²⁾, downsampled i+2 times, by means of an attention mechanism, and, in order to reduce the consumption of computing resources, the main encoder uses only two multi-scale information fusion modules:

y⁽ⁱ⁺²⁾ = y⁽ⁱ⁺²⁾ + y⁽ⁱ⁺²⁾ * sigmoid(Res(y⁽ⁱ⁾))

wherein Res(·) denotes a residual block.

3. The image compression method based on the fusion of multi-scale spatial and context information according to claim 2, wherein step 2) specifically comprises the following steps:

21) the hyperprior encoder computes a hyperprior feature z from the hidden feature y, and quantization then yields a quantized hyperprior feature ẑ, which helps capture the spatial redundancy in the hidden features and improves the accuracy of the estimated probability distribution of the hidden features:

ẑ = Q(z) = Q(h_a(y; φ_h))

wherein h_a(·) denotes the hyperprior encoder and φ_h denotes the parameters of the hyperprior encoder;

22) a multi-scale 3D context module computes a multi-scale context feature from the quantized hidden feature ŷ:

[formula not available in the source text]

wherein downsample denotes downsampling, and the three 3D context models in the formula use convolution kernels of size 5×5×5, 7×7×7 and 9×9×9, respectively;

23) after the multi-scale context feature is combined with the hyperprior feature ẑ, the hyperprior decoders compute the model parameters and weights of the Gaussian mixture model:

[formula not available in the source text]

wherein ω_i, μ_i and σ_i² denote the weight, mean and variance of the i-th Gaussian in the Gaussian mixture model, and the remaining symbol of the formula denotes the i-th hyperprior decoder;

24) the three Gaussian functions are combined, according to their weights, into a Gaussian mixture model that serves as the entropy model, giving the estimated probability distribution of the hidden features:

p(ŷ | ẑ) = (Σ_{i=1..3} ω_i · N(μ_i, σ_i²)) * U(−1/2, 1/2)

wherein p(ŷ | ẑ) is the conditional probability distribution of the hidden feature ŷ given the hyperprior feature ẑ, N(μ_i, σ_i²) is the Gaussian distribution of the i-th component, weighted by ω_i, * denotes convolution, and U(−1/2, 1/2) is uniformly distributed noise ranging from −1/2 to 1/2.

4. The image compression method based on the fusion of multi-scale spatial and context information according to claim 2, wherein step 3) specifically comprises the following steps:

31) in order to prevent vanishing gradients during model training, the quantization process is replaced, in the training stage of the image compression model, by the addition of independent and identically distributed uniform noise;

32) when the image compression model is in use, the hidden features are quantized, the probability distribution of the features is computed from the entropy model produced by the hyperprior encoder, and the quantized features are encoded with arithmetic coding, a form of entropy coding.

5. The image compression method based on the fusion of multi-scale spatial and context information according to claim 1, wherein step 4) specifically comprises the following steps:

41) the hidden features are converted back into an image by the residual blocks and upsampling layers of the main decoder:

x̂ = g_s(ŷ; θ)

wherein g_s(·) denotes the main decoder, θ denotes the parameters of the main decoder, and x̂ is the reconstructed image;

42) the reconstructed image x̂ is compared with the original image x on objective and subjective metrics to evaluate the compression and reconstruction performance of the model.

6. The image compression method based on the fusion of multi-scale spatial and context information according to claim 5, wherein in step 42) the objective and subjective metrics include PSNR and MS-SSIM.

7. The image compression method based on the fusion of multi-scale spatial and context information according to claim 1, wherein in step 1) the image compression model consists of a main encoder, a hyperprior encoder, a hyperprior decoder, a main decoder and a multi-scale 3D context module.

8. The image compression method based on the fusion of multi-scale spatial and context information according to claim 7, wherein, when the image compression model is trained, the objective function is set as follows to balance bit rate against image reconstruction quality:

L = R(ŷ) + R(ẑ) + λ · D(x, x̂)

wherein λ is a hyperparameter that balances bit rate against image reconstruction quality, R(ŷ) and R(ẑ) are the bit rates of the quantized hidden feature ŷ and the quantized hyperprior feature ẑ, respectively, and D(·) measures the difference between the original image and the reconstructed image, using MSE or MS-SSIM; when the model is optimized with MSE, it is evaluated by PSNR, and when the model is optimized with MS-SSIM, it is evaluated by MS-SSIM.

9. The image compression method based on the fusion of multi-scale spatial and context information according to claim 8, wherein, in order to speed up convergence of the model, the model is first pre-trained at a high bit rate, and the value of λ is then changed to adjust the model to other bit rates.

10. The image compression method based on the fusion of multi-scale spatial and context information according to claim 8, wherein, when the pre-trained model is trained, the learning rate decays with the number of iterations, and, when models at other bit rates are trained, the initial learning rate is increased and decays with the number of iterations.