CN114792347A - An image compression method based on fusion of multi-scale space and context information - Google Patents

An image compression method based on fusion of multi-scale space and context information

Info

Publication number
CN114792347A
Authority
CN
China
Prior art keywords
model
scale
image compression
super
hidden
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210224174.0A
Other languages
Chinese (zh)
Other versions
CN114792347B (en)
Inventor
王瀚漓
刘自毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Application filed by Tongji University
Priority to CN202210224174.0A
Publication of CN114792347A
Application granted
Publication of CN114792347B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/002 Image coding using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Compression Of Band Width Or Redundancy In Fax (AREA)

Abstract

The invention relates to an image compression method based on the fusion of multi-scale spatial and context information. The method comprises the following steps: 1) constructing an image compression model based on the fusion of multi-scale spatial and context information, in which a main encoder extracts hidden features from the original image and a multi-scale information fusion module reduces the loss of useful information during forward propagation; 2) a super-prior module combines the super-prior information with multi-scale context information to obtain the parameters and weights of three Gaussian functions, which are summed according to their weights into a Gaussian mixture model that gives the probability distribution of the hidden features; 3) based on that probability distribution, an arithmetic coder encodes and decodes the hidden features; 4) a main decoder reconstructs the hidden features into a picture, completing the image compression. Compared with the prior art, the invention achieves better image reconstruction quality at a lower bit rate.

Figure 202210224174

Description

An image compression method based on fusion of multi-scale space and context information

Technical field

The invention relates to the technical field of image compression, and in particular to an image compression method based on the fusion of multi-scale spatial and context information.

Background

After the third information revolution, large volumes of digital information began to flow between terminals. Constrained by the ways digital information could be captured at the time, most of it was text. With the arrival and spread of consumer digital devices, and especially with the mobile Internet era, everyone can be a photographer: convenient electronic devices put huge numbers of pictures and videos onto the Internet, and this volume of data has sharply raised the demands on transmission speed and storage space. Data compression has therefore become necessary, and how to compress image data, which makes up a considerable share of Internet traffic, has become an active research topic.

Before image coding methods based on deep learning appeared, there were many traditional methods, including JPEG, JPEG2000 and BPG, which remain widely used today. Traditional methods rely on many hand-designed components, generally block partitioning, linear transforms, quantization and entropy coding. With the rapid development of deep learning and its wide use across computer vision, many end-to-end image compression methods based on deep learning have been proposed. Most are built on relatively mature deep learning models, such as image compression based on convolutional neural networks, on generative adversarial networks, and on graph convolutional neural networks. Compression based on generative adversarial networks uses adversarial training between a generator and a discriminator to improve the perceptual quality of reconstructions at low bit rates, but the restored images score poorly on peak signal-to-noise ratio (PSNR) and multi-scale structural similarity (MS-SSIM).

Because convolutional neural networks are well suited to extracting image features, most image compression based on deep learning uses a convolutional structure. Existing methods of this kind first extract hidden features from the image with a main encoder, then use a super-prior (hyperprior) autoencoder to extract side information from the hidden features as super-prior features. The super-prior features are then combined with context features produced by a context model to estimate the probability distribution of the hidden features, which is used to arithmetically code them. Finally, the main decoder restores the hidden features to an image. These methods still have shortcomings. First, while removing spatial redundancy from the hidden features, the main encoder also discards some useful spatial information; in particular, information about regions with complex texture is lost during forward propagation, which limits reconstruction quality. Second, because the scale of the content in the image being compressed is not known in advance, a masked convolution kernel of fixed size in the context model cannot capture the relevant correlations in the hidden features effectively, so the entropy model of existing methods is still not accurate enough.

Summary of the invention

The purpose of the invention is to overcome the above defects of the prior art by providing an image compression method based on the fusion of multi-scale spatial and context information.

The purpose of the invention can be achieved through the following technical solution:

An image compression method based on the fusion of multi-scale spatial and context information, comprising the following steps:

1) Construct an image compression model based on the fusion of multi-scale spatial and context information, extract hidden features from the original image with a main encoder, and use a multi-scale information fusion module to reduce the loss of useful information during forward propagation;

2) A super-prior module combines the super-prior information with multi-scale context information to obtain the parameters and weights of three Gaussian functions, which are summed according to their weights into a Gaussian mixture model that gives the probability distribution of the hidden features;

3) Based on the probability distribution of the hidden features, an arithmetic coder encodes and decodes the hidden features;

4) A main decoder reconstructs the hidden features into a picture, completing the image compression.

Step 1) specifically comprises the following steps:

11) The original picture passes through residual blocks, an attention module and the multi-scale information fusion module for feature extraction and downsampling, giving the hidden feature y. So that y can be entropy coded, it is quantized with a step size of 1 to obtain the quantized hidden feature

Figure BDA0003538622660000021

such that:

Figure BDA0003538622660000022

Figure BDA0003538622660000023

where x is the original picture,

Figure BDA0003538622660000024

denotes the parameters of the main encoder, Q(·) denotes quantization, and g_a(·) denotes the main encoder;
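The two equations of step 11) survive only as image placeholders. Going by the definitions given in the text, they plausibly take the following form. This is a reconstruction under assumptions, not the patent's own notation: the parameter symbol \phi and the hat notation \hat{y} are conventions borrowed from the learned-compression literature, and "quantization with a step size of 1" is read as rounding to the nearest integer.

```latex
% Assumed notation: \phi = main-encoder parameters, \hat{y} = quantized hidden feature
y = g_a(x;\, \phi), \qquad \hat{y} = Q(y) = \operatorname{round}(y)
```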

12) The multi-scale information fusion module in the main encoder fuses the feature y^(i), downsampled i times, with the feature y^(i+2), downsampled i+2 times, by means of an attention mechanism. To reduce the consumption of computing resources, the main encoder uses only two multi-scale information fusion modules, such that:

y^(i+2) = y^(i+2) + y^(i+2) * sigmoid(Res(y^(i)))

where Res(·) denotes a residual block.
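The fusion formula above can be illustrated with a small sketch. It assumes that the output of Res(y^(i)) has already been brought to the same shape as y^(i+2); the text does not say how the two extra downsampling steps are handled inside the residual branch, so that part is left out. The names `fuse` and `gate_logits` are hypothetical.

```python
import math


def sigmoid(v: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def fuse(y_deep, gate_logits):
    """Apply y <- y + y * sigmoid(gate) element-wise.

    y_deep      : the feature y^(i+2), flattened to a list (assumption)
    gate_logits : Res(y^(i)), assumed already resized to the same shape
    """
    if len(y_deep) != len(gate_logits):
        raise ValueError("feature and gate must have the same shape")
    return [v + v * sigmoid(g) for v, g in zip(y_deep, gate_logits)]


y_deep = [1.0, -2.0, 0.5]
gate_logits = [0.0, 10.0, -10.0]
fused = fuse(y_deep, gate_logits)
print(fused)
```

Because the sigmoid lies strictly between 0 and 1, each fused value keeps its sign and is scaled by a factor between 1 and 2, so the shallow feature can only emphasize, never suppress, the deeper one.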

Step 2) specifically comprises the following steps:

21) The super-prior encoder computes the super-prior feature z from the hidden feature y, and quantization then gives the quantized super-prior feature

Figure BDA0003538622660000031

which helps capture the spatial redundancy in the hidden features and improves the accuracy of their estimated probability distribution, such that:

Figure BDA0003538622660000032

Figure BDA0003538622660000033

where h_a(·) denotes the super-prior encoder and

Figure BDA0003538622660000034

denotes the parameters of the super-prior encoder;

22) A multi-scale three-dimensional context module computes, from the quantized hidden feature

Figure BDA0003538622660000035

the multi-scale context feature

Figure BDA0003538622660000036

such that:

Figure BDA0003538622660000037

where downsample denotes downsampling,

Figure BDA0003538622660000038

denotes a three-dimensional context model with a 5×5×5 convolution kernel,

Figure BDA0003538622660000039

denotes a three-dimensional context model with a 7×7×7 convolution kernel, and

Figure BDA00035386226600000310

denotes a three-dimensional context model with a 9×9×9 convolution kernel;
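The patent describes these context models as masked three-dimensional convolutions of three sizes but does not spell out the mask layout. The sketch below assumes the usual causal (raster-scan) mask, in which a kernel position is enabled only if it precedes the kernel centre in (depth, height, width) order, so that the context for a symbol uses only symbols that are already decoded. The function names are hypothetical.

```python
def causal_mask_3d(k: int):
    """Build a k x k x k mask of 0/1 values (nested lists).

    A position is 1 only if it comes strictly before the kernel centre in
    raster-scan order over (depth, height, width). This layout is an
    assumption; the patent text does not specify it.
    """
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    c = k // 2
    centre = (c, c, c)
    # Tuple comparison is lexicographic, which matches raster-scan order.
    return [[[1 if (d, h, w) < centre else 0 for w in range(k)]
             for h in range(k)]
            for d in range(k)]


def count_ones(mask) -> int:
    """Number of enabled kernel positions."""
    return sum(v for plane in mask for row in plane for v in row)


masks = {k: causal_mask_3d(k) for k in (5, 7, 9)}
for k, m in masks.items():
    print(k, count_ones(m))
```

Under this assumption each mask enables exactly (k³ − 1) / 2 positions, so the 9×9×9 branch sees roughly six times as many past symbols as the 5×5×5 branch, which is the sense in which the parallel branches cover different scales.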

23) The multi-scale context feature

Figure BDA00035386226600000311

and the super-prior feature

Figure BDA00035386226600000312

are combined, and the super-prior decoders then compute the model parameters and weights of the Gaussian mixture model, such that:

Figure BDA00035386226600000313

where ω_i, μ_i and

Figure BDA00035386226600000314

denote the weight, mean and variance of the i-th Gaussian in the Gaussian mixture model, and

Figure BDA00035386226600000315

denotes the i-th super-prior decoder;

24) The three Gaussian functions are combined according to their weights into a Gaussian mixture model, which serves as the entropy model and gives the estimated probability distribution of the hidden features, such that:

Figure BDA00035386226600000316

where

Figure BDA00035386226600000317

is the conditional probability distribution, given the super-prior feature

Figure BDA00035386226600000318

of the hidden feature

Figure BDA00035386226600000319

Figure BDA00035386226600000320

is the Gaussian probability distribution with parameters ω_i, μ_i, and

Figure BDA00035386226600000321

is noise uniformly distributed between the bounds

Figure BDA00035386226600000322

and

Figure BDA00035386226600000323
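As a rough illustration of step 24), the sketch below evaluates the probability that a three-component Gaussian mixture, convolved with uniform noise, assigns to one integer symbol. It assumes the noise bounds hidden in the two image placeholders are −1/2 and +1/2, which matches a quantization step of 1, and it takes standard deviations rather than variances as input. The weights, means and deviations are made-up numbers, not values from the patent.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def gmm_likelihood(y_hat: int, weights, means, sigmas) -> float:
    """Probability mass of the integer y_hat under the mixture.

    Convolving each Gaussian with U(-1/2, 1/2) (assumed bounds) and
    evaluating at y_hat equals the CDF difference over
    [y_hat - 1/2, y_hat + 1/2].
    """
    return sum(
        w * (normal_cdf(y_hat + 0.5, m, s) - normal_cdf(y_hat - 0.5, m, s))
        for w, m, s in zip(weights, means, sigmas)
    )


# Illustrative parameters only; in the method they come from the
# super-prior decoders.
weights = [0.5, 0.3, 0.2]
means = [0.0, 2.0, -1.0]
sigmas = [1.0, 0.5, 2.0]

total = sum(gmm_likelihood(k, weights, means, sigmas) for k in range(-60, 61))
bits_for_zero = -math.log2(gmm_likelihood(0, weights, means, sigmas))
print(total, bits_for_zero)
```

The masses over all integers sum to one when the weights do, and −log2 of a symbol's mass is the ideal code length that the arithmetic coder in step 3) approaches.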

Step 3) specifically comprises the following steps:

31) To prevent vanishing gradients during training, the quantization step is replaced, while the image compression model is being trained, by the addition of independent and identically distributed uniform noise;

32) When the image compression model is in use, the hidden features are quantized, the probability distribution of the features is computed from the entropy model obtained via the super-prior encoder, and the quantized features are encoded with arithmetic coding, a form of entropy coding.
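The difference between steps 31) and 32) can be sketched in a few lines. The noise range of [−0.5, 0.5] is an assumption that follows from the quantization step of 1; the function names are hypothetical.

```python
import random


def quantize(y):
    """Inference-time quantization with step size 1.

    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return [float(round(v)) for v in y]


def add_uniform_noise(y, rng: random.Random):
    """Training-time stand-in for quantization: add i.i.d. uniform noise.

    Unlike rounding, this keeps the output a smooth function of y, so
    gradients can pass through. The +/-0.5 range is an assumption.
    """
    return [v + rng.uniform(-0.5, 0.5) for v in y]


y = [0.2, 1.7, -2.4]
rng = random.Random(0)
y_hat = quantize(y)
y_noisy = add_uniform_noise(y, rng)
print(y_hat, y_noisy)
```

Both operations perturb each value by at most half a quantization step, which is why the noisy version is a reasonable proxy for rounding during training.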

Step 4) specifically comprises the following steps:

41) The hidden features pass through the residual blocks and upsampling layers of the main decoder and are turned back into a picture, such that:

Figure BDA0003538622660000041

where g_s(·) denotes the main decoder,

Figure BDA0003538622660000042

denotes the parameters of the main decoder, and

Figure BDA0003538622660000043

is the reconstructed picture;

42) The reconstructed picture

Figure BDA0003538622660000044

is compared with the original picture x on objective and subjective metrics to evaluate the compression and reconstruction performance of the model.

In step 42), the objective and subjective metrics include PSNR and MS-SSIM.

In step 1), the image compression model based on the fusion of multi-scale spatial and context information consists of a main encoder, a super-prior encoder, a super-prior decoder, a main decoder and a multi-scale three-dimensional context module.

When the image compression model is trained, the objective function is set as follows to balance bit rate against image reconstruction quality:

Figure BDA0003538622660000045

where λ is a hyperparameter that balances bit rate against image reconstruction quality,

Figure BDA0003538622660000046

and

Figure BDA0003538622660000047

are the bit rates of the quantized hidden feature

Figure BDA0003538622660000048

and the quantized super-prior feature

Figure BDA0003538622660000049

respectively, and D(·) denotes the difference between the original picture and the reconstructed picture, measured with MSE or MS-SSIM. When the model is optimized with MSE, it is evaluated by PSNR; when it is optimized with MS-SSIM, it is evaluated by MS-SSIM.
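The objective function itself is only available as an image, so the sketch below assumes the common rate-distortion form: the bit rates of the two quantized features plus λ times the distortion. Where λ sits, the per-pixel normalization of the rate and the use of unscaled MSE are all assumptions, and the numbers are made up for illustration.

```python
import math


def rate_bits(likelihoods) -> float:
    """Ideal code length in bits for symbols with the given probabilities."""
    return -sum(math.log2(p) for p in likelihoods)


def mse(x, x_hat) -> float:
    """Mean squared error between two equally sized pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def rd_loss(p_y, p_z, x, x_hat, lam: float) -> float:
    """Assumed form: (R(y_hat) + R(z_hat)) / num_pixels + lam * D(x, x_hat)."""
    bits_per_pixel = (rate_bits(p_y) + rate_bits(p_z)) / len(x)
    return bits_per_pixel + lam * mse(x, x_hat)


# Toy values: three coded symbols and a four-pixel "image".
p_y = [0.5, 0.25]   # likelihoods of the quantized hidden features
p_z = [0.5]         # likelihoods of the quantized super-prior features
x = [0.0, 0.0, 0.0, 0.0]
x_hat = [1.0, 1.0, 1.0, 1.0]
loss = rd_loss(p_y, p_z, x, x_hat, lam=0.0130)
print(loss)
```

With this form a larger λ penalizes distortion more heavily, pushing the trained model toward higher quality at a higher bit rate.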

To speed up convergence, the model is first pre-trained at a high bit rate; the value of λ is then changed to move the model to other bit rates.

When the pre-trained model is trained, the learning rate decreases as the number of iterations grows. When models for other bit rates are trained, the learning rate starts from a larger initial value and again decreases as the number of iterations grows.

Compared with the prior art, the invention has the following advantages:

1. The multi-scale information fusion module in the main encoder uses an attention mechanism to fuse image features of different scales. This preserves the spatial information of complex regions while avoiding the introduction of extra spatial redundancy into the hidden features.

2. The multi-scale three-dimensional context module in the context model applies masked three-dimensional convolution kernels of different sizes in parallel, fusing the correlation information found at different spatial scales in the hidden features. This improves the accuracy of the entropy model and the compression efficiency of the model.

Description of the drawings

Figure 1 is a schematic diagram of the image compression method based on multi-scale spatial and context information fusion.

Figure 2 is a schematic diagram of the multi-scale information fusion module.

Figure 3 is a schematic diagram of the multi-scale three-dimensional context module.

Figure 4 compares the performance of the invention with several other methods.

Figure 5 compares the performance of the invention with several further methods.

Detailed description

The invention is described in detail below with reference to the drawings and a specific embodiment. The embodiment is implemented on the basis of the technical solution of the invention and gives a detailed implementation and a specific operating procedure, but the scope of protection of the invention is not limited to the embodiment below.

Embodiment

The invention provides an image compression method based on the fusion of multi-scale spatial and context information which, as shown in Figure 1, comprises the following steps:

11) In the training stage, the invention uses the COCO2014 training set. All training pictures are randomly cropped to 256×256, the training batch size is set to 16, and Kodak24 is used as the test set;

12) To balance bit rate against image reconstruction quality, the objective function for model training is set as:

Figure BDA0003538622660000051

where λ is a hyperparameter that balances bit rate against image reconstruction quality,

Figure BDA0003538622660000052

and

Figure BDA0003538622660000053

denote the bit rates of

Figure BDA0003538622660000054

and

Figure BDA0003538622660000055

respectively, and D(·) denotes the difference between the original picture and the reconstructed picture, which can be measured with MSE or MS-SSIM. When the model is optimized with MSE, it is evaluated by PSNR, and λ takes the values {0.0035, 0.0067, 0.0130, 0.0250, 0.0483} for the different bit rates. When the model is optimized with MS-SSIM, it is evaluated by MS-SSIM, and λ takes the values {4.58, 8.73, 16.64, 31.73, 60.50} for the different bit rates. To speed up convergence, the model is first pre-trained at a high bit rate; the value of λ is then changed to move the model to other bit rates. When the pre-trained model is trained, the learning rate is initially set to 10^-5 and is halved every 100,000 iterations. Afterwards, when models for other bit rates are trained, the learning rate is initially set to 5×10^-5 and is likewise halved every 100,000 iterations.
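The learning-rate schedule described above is a plain step decay and can be written down directly. The function name and signature are hypothetical; the initial rates, the interval of 100,000 iterations and the halving factor are the values stated in the text.

```python
def learning_rate(iteration: int, initial_lr: float,
                  decay_every: int = 100_000, factor: float = 0.5) -> float:
    """Step decay: multiply the rate by `factor` every `decay_every` iterations."""
    return initial_lr * factor ** (iteration // decay_every)


# Pre-training at a high bit rate starts from 1e-5.
pretrain_rates = [learning_rate(it, 1e-5) for it in (0, 99_999, 100_000, 250_000)]

# Fine-tuning to other bit rates starts from 5e-5.
finetune_rates = [learning_rate(it, 5e-5) for it in (0, 100_000, 250_000)]

print(pretrain_rates)
print(finetune_rates)
```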

13) The original picture passes through residual blocks, an attention module and the multi-scale information fusion module for feature extraction and downsampling, giving the hidden feature y. So that y can be entropy coded, it is quantized with a step size of 1 to obtain

Figure BDA0003538622660000068

The above operations can be expressed by the following formulas:

Figure BDA0003538622660000061

Figure BDA0003538622660000062

14) The multi-scale information fusion module in the main encoder fuses the feature y^(i), downsampled i times, with the feature y^(i+2), downsampled i+2 times, by means of an attention mechanism, as shown in Figure 2. To reduce the consumption of computing resources, the main encoder uses only two multi-scale information fusion modules. This operation can be expressed by the following formula:

y^(i+2) = y^(i+2) + y^(i+2) * sigmoid(Res(y^(i)))

where Res(·) denotes a residual block.

21) The super-prior encoder computes the super-prior feature z from the hidden features, and quantization then gives

Figure BDA0003538622660000069

which helps capture the spatial redundancy in the hidden features and improves the accuracy of their estimated probability distribution. The above steps can be expressed by the following formulas:

Figure BDA0003538622660000063

Figure BDA0003538622660000064

22) A multi-scale three-dimensional context model computes the multi-scale context features from the quantized hidden features, as shown in Figure 3. This step can be expressed by the formula:

Figure BDA0003538622660000065

23) After the context features and the super-prior features are combined, the super-prior decoders compute the model parameters and weights of the Gaussian mixture model, which can be expressed as:

Figure BDA0003538622660000066

24) The three Gaussian functions are combined according to their weights into a Gaussian mixture model, which serves as the entropy model and gives the estimated probability distribution of the hidden features, which can be expressed as:

Figure BDA0003538622660000067

31) Because the quantized hidden features are discrete and the derivative of a discrete function is zero everywhere, gradients would vanish during training. So that the image compression model can be trained, the quantization step is replaced in the training stage by the addition of independent and identically distributed uniform noise.

32) When the model is in use, the hidden features are quantized, the probability distribution of the features is computed from the entropy model obtained via the super-prior autoencoder, and the quantized features are entropy coded, generally with arithmetic coding.

41) The hidden features pass through the residual blocks and upsampling layers of the main decoder and are turned back into a picture. This step can be expressed as:

Figure BDA0003538622660000071

42) The reconstructed picture is compared with the original picture on objective and subjective metrics to evaluate the compression and reconstruction performance of the model.

To verify its effectiveness, the method is compared with traditional methods such as JPEG, JPEG2000, BPG and VVC and with several end-to-end image compression methods. The public Kodak24 test set serves as the test data: each original picture is compared with the picture reconstructed after compression, and the PSNR and MS-SSIM between the two are computed, giving the two curve plots shown in Figures 4 and 5.
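For reference, PSNR follows directly from the mean squared error between the original and the reconstructed picture. The sketch below uses the standard definition and assumes 8-bit pixels with a peak value of 255; the patent does not state the peak value, and the pixel values here are made up.

```python
import math


def psnr(x, x_hat, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    if mse == 0.0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(peak ** 2 / mse)


original = [100.0, 100.0, 100.0, 100.0]
reconstructed = [101.0, 99.0, 101.0, 99.0]   # every pixel off by one level
value = psnr(original, reconstructed)
print(round(value, 2))
```

An error of one grey level on every pixel gives an MSE of 1 and a PSNR of about 48.13 dB.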

The multi-scale information fusion module used in the main encoder preserves useful spatial information while avoiding the introduction of spatial redundancy into the hidden features, and the multi-scale three-dimensional context module used in the context model fuses context information from different spatial scales, making the entropy model more accurate. Test results on the public Kodak24 test set show that, on the PSNR metric, the method outperforms VVC, the latest traditional image compression standard, by 0.15 dB.

Claims (10)

1. An image compression method based on the fusion of multi-scale spatial and context information, characterized by comprising the following steps:
1) constructing an image compression model based on the fusion of multi-scale spatial and context information, extracting hidden features from an original image with a main encoder, and using a multi-scale information fusion module to reduce the loss of useful information during forward propagation;
2) a super-prior module combines super-prior information with multi-scale context information to obtain the parameters and weights of three Gaussian functions, which are summed according to their weights into a Gaussian mixture model giving the probability distribution of the hidden features;
3) based on the probability distribution of the hidden features, an arithmetic coder encodes and decodes the hidden features;
4) a main decoder reconstructs the hidden features into a picture, completing the image compression.
2. The image compression method based on multi-scale space and context information fusion according to claim 1, wherein step 1) specifically comprises the following steps:
11) the original picture undergoes feature extraction and down-sampling through residual blocks, attention modules and multi-scale information fusion modules to obtain a hidden feature $y$; so that $y$ can be entropy coded, it is quantized with a step size of 1 to obtain the quantized hidden feature $\hat{y}$, so that:
$$y = g_a(x; \phi)$$
$$\hat{y} = Q(y)$$
wherein $x$ is the original picture, $\phi$ denotes the parameters of the main encoder, $Q(\cdot)$ denotes the quantization process, and $g_a(\cdot)$ denotes the main encoder;
12) in the main encoder, a multi-scale information fusion module fuses the feature $y^{(i)}$ obtained after the $i$-th down-sampling with the feature $y^{(i+2)}$ obtained after the $(i+2)$-th down-sampling in the form of an attention mechanism; to reduce the consumption of computing resources, the main encoder uses only two multi-scale information fusion modules, so that:
$$y^{(i+2)} = y^{(i+2)} + y^{(i+2)} \cdot \mathrm{sigmoid}(\mathrm{Res}(y^{(i)}))$$
where $\mathrm{Res}(\cdot)$ denotes the residual block.
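The fusion rule in step 12) can be illustrated with a minimal sketch. It is not the patented network: the single-channel 2D lists, the assumption that each down-sampling halves the resolution, and the use of average pooling as a stand-in for the residual block Res(·) are all hypothetical choices made only so that the two features have matching shapes.

```python
import math


def sigmoid(v):
    """Logistic function used as the attention gate."""
    return 1.0 / (1.0 + math.exp(-v))


def avg_pool(feature, factor):
    """Down-sample a 2D map by averaging non-overlapping factor x factor blocks.

    Assumption: this only stands in for the residual block Res(.), which in
    the claim is a learned module.
    """
    rows, cols = len(feature) // factor, len(feature[0]) // factor
    return [
        [
            sum(feature[r * factor + dr][c * factor + dc]
                for dr in range(factor) for dc in range(factor)) / factor ** 2
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def fuse(y_i, y_i2, res=lambda f: avg_pool(f, 4)):
    """Return y_i2 + y_i2 * sigmoid(res(y_i)), element by element."""
    gate = res(y_i)
    return [
        [v + v * sigmoid(g) for v, g in zip(row_v, row_g)]
        for row_v, row_g in zip(y_i2, gate)
    ]


# Hypothetical inputs: an 8 x 8 shallow feature and a 2 x 2 deeper feature,
# i.e. two further 2x down-samplings (an assumed stride).
y_i = [[0.0] * 8 for _ in range(8)]
y_i2 = [[1.0, 2.0], [3.0, 4.0]]
fused = fuse(y_i, y_i2)
```

Because the gate lies between 0 and 1, each element of the deeper feature is scaled by a factor between 1 and 2, so the shallow feature can emphasize positions without suppressing the original signal.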
3. The image compression method based on multi-scale space and context information fusion according to claim 2, wherein step 2) specifically comprises the following steps:
21) the super-prior encoder computes the super-prior feature $z$ from the hidden feature $y$, and quantization then yields the quantized super-prior feature $\hat{z}$, which helps to capture the spatial redundancy in the hidden features and improves the accuracy of the estimated probability distribution of the hidden features, so that:
$$z = h_a(y; \phi_h)$$
$$\hat{z} = Q(z)$$
wherein $h_a(\cdot)$ denotes the super-prior encoder and $\phi_h$ denotes the parameters of the super-prior encoder;
22) a multi-scale three-dimensional context module derives the multi-scale context features (Figure FDA0003538622650000025) from the quantized hidden features $\hat{y}$, so that:
Figure FDA0003538622650000026
wherein the formula includes a down-sampling operation, and Figure FDA0003538622650000027, Figure FDA0003538622650000028 and Figure FDA0003538622650000029 denote the three-dimensional context models with convolution kernel sizes of 5 × 5 × 5, 7 × 7 × 7 and 9 × 9 × 9, respectively;
23) the multi-scale context features (Figure FDA00035386226500000210) and the quantized super-prior feature $\hat{z}$ are combined and then parsed by the super-prior decoders to obtain the model parameters and weights of the Gaussian mixture model, so that:
Figure FDA00035386226500000211
wherein $\omega_i$, $\mu_i$ and $\sigma_i^2$ denote the weight, mean and variance of the $i$-th Gaussian model in the Gaussian mixture model, respectively, and Figure FDA00035386226500000213 denotes the $i$-th super-prior decoder;
24) the three Gaussian functions are combined according to their weights into a Gaussian mixture model that serves as the entropy model, giving the estimated probability distribution of the hidden features:
$$p_{\hat{y}|\hat{z}}(\hat{y} \mid \hat{z}) = \sum_{i=1}^{3} \omega_i \, \mathcal{N}(\mu_i, \sigma_i^2) * \mathcal{U}\left(-\tfrac{1}{2}, \tfrac{1}{2}\right)$$
wherein $p_{\hat{y}|\hat{z}}(\hat{y} \mid \hat{z})$ is the conditional probability distribution of the hidden feature $\hat{y}$ given the super-prior feature $\hat{z}$, $\mathcal{N}(\mu_i, \sigma_i^2)$ is the Gaussian distribution parameterized by $\mu_i$ and $\sigma_i^2$ and weighted by $\omega_i$, and $\mathcal{U}\left(-\tfrac{1}{2}, \tfrac{1}{2}\right)$ is noise uniformly distributed in the range $-\tfrac{1}{2}$ to $\tfrac{1}{2}$.
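As a hedged illustration of how such an entropy model is commonly evaluated, the sketch below computes the probability mass of an integer-quantized value under a three-component Gaussian mixture convolved with unit-width uniform noise. Convolving a Gaussian with a uniform density on [-1/2, 1/2] and evaluating at an integer equals the difference of the Gaussian CDF at the value plus and minus 1/2. The function names and the example weights, means and standard deviations are hypothetical; in the claimed method these parameters would come from the super-prior decoders, and the code takes standard deviations rather than variances.

```python
import math


def normal_cdf(v, mean, std):
    """Cumulative distribution function of a Gaussian with the given mean and std."""
    return 0.5 * (1.0 + math.erf((v - mean) / (std * math.sqrt(2.0))))


def gmm_likelihood(y_hat, weights, means, stds):
    """Probability mass of integer y_hat under a Gaussian mixture
    convolved with uniform noise on [-1/2, 1/2]."""
    return sum(
        w * (normal_cdf(y_hat + 0.5, m, s) - normal_cdf(y_hat - 0.5, m, s))
        for w, m, s in zip(weights, means, stds)
    )


# Hypothetical mixture parameters for one hidden-feature element.
weights = [0.5, 0.3, 0.2]   # assumed to sum to 1
means = [0.0, 1.0, -2.0]
stds = [1.0, 0.5, 2.0]      # standard deviations, i.e. square roots of the variances

p = gmm_likelihood(0, weights, means, stds)
bits = -math.log2(p)  # ideal code length an arithmetic coder approaches
total = sum(gmm_likelihood(k, weights, means, stds) for k in range(-50, 51))
```

Summing the mass over all integers gives a value very close to 1, which is the property that lets an arithmetic coder use these probabilities directly.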
4. The image compression method based on multi-scale space and context information fusion according to claim 2, wherein step 3) specifically comprises the following steps:
31) to prevent vanishing gradients during model training, the quantization process is replaced, in the training stage of the image compression model, by adding independent, uniformly distributed noise;
32) when the image compression model is in use, the hidden features are quantized, the probability distribution of the features is computed from the entropy model obtained through the super-prior encoder, and the quantized features are encoded by arithmetic coding, a form of entropy coding.
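The difference between the two quantization modes in steps 31) and 32) can be sketched as follows. Rounding has a zero gradient almost everywhere, which is why training substitutes additive noise. The function names, the fixed seed and the sample values are hypothetical, and the noise range of [-1/2, 1/2] is an assumption that follows from the quantization step size of 1.

```python
import math
import random


def quantize_train(y, rng):
    """Training-time surrogate: add independent uniform noise in [-0.5, 0.5]."""
    return [v + rng.uniform(-0.5, 0.5) for v in y]


def quantize_infer(y):
    """Inference-time quantization with step size 1: round to the nearest integer."""
    return [math.floor(v + 0.5) for v in y]


rng = random.Random(0)           # fixed seed so the sketch is repeatable
y = [0.2, 1.7, -0.6, 3.49]       # hypothetical hidden-feature values
y_noisy = quantize_train(y, rng)
y_hat = quantize_infer(y)        # [0, 2, -1, 3]
```

Both modes perturb each value by at most half a step, so the noisy training values cover the same error range that rounding introduces at inference time.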
5. The image compression method based on multi-scale space and context information fusion according to claim 1, wherein step 4) specifically comprises the following steps:
41) the main decoder transforms the hidden features back into a picture through residual blocks and up-sampling, so that:
$$\hat{x} = g_s(\hat{y}; \theta)$$
wherein $g_s(\cdot)$ denotes the main decoder, $\theta$ denotes the parameters of the main decoder, and $\hat{x}$ is the reconstructed picture;
42) the reconstructed picture $\hat{x}$ is compared with the original picture $x$ using objective and subjective indicators to evaluate the compression and reconstruction performance of the model.
6. The method as claimed in claim 5, wherein in step 42), the objective and subjective indicators include PSNR and MS-SSIM.
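For reference, PSNR follows directly from the mean squared error between the original and the reconstructed picture. The sketch below assumes 8-bit pixel values (peak value 255) and uses a hypothetical 2 × 2 single-channel image; MS-SSIM is not shown because it requires a multi-scale filtering pipeline.

```python
import math


def mse(original, reconstructed):
    """Mean squared error between two equally sized 2D images."""
    pairs = [(a, b) for ra, rb in zip(original, reconstructed) for a, b in zip(ra, rb)]
    return sum((a - b) ** 2 for a, b in pairs) / len(pairs)


def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit pixels."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / error)


# Hypothetical pixel values, for illustration only.
x = [[52, 55], [61, 59]]
x_hat = [[54, 55], [60, 57]]
value = psnr(x, x_hat)  # MSE is 2.25, which gives about 44.6 dB
```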
7. The method according to claim 1, wherein the image compression model based on multi-scale space and context information fusion in step 1) comprises a main encoder, a super-prior encoder-decoder, a main decoder, and a multi-scale three-dimensional context module.
8. The method of claim 7, wherein, when the image compression model is trained, the objective function is set as follows in order to balance the code rate against the image reconstruction quality:
$$L = R(\hat{y}) + R(\hat{z}) + \lambda \cdot D(x, \hat{x})$$
wherein $\lambda$ is a hyper-parameter that balances the code rate and the image reconstruction quality, $R(\hat{y})$ and $R(\hat{z})$ are the code rates of the quantized hidden feature $\hat{y}$ and the quantized super-prior feature $\hat{z}$, respectively, and $D(\cdot)$ denotes the difference between the original picture and the reconstructed picture, measured by MSE or MS-SSIM; when the model is optimized for MSE, its evaluation metric is PSNR, and when the model is optimized for MS-SSIM, its evaluation metric is MS-SSIM.
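A rate-distortion objective of this kind can be sketched as below, assuming the common form in which the loss is the total rate plus λ times the distortion. The placement of λ, the normalization of the rate to bits per pixel, and all names and numbers are assumptions made for illustration; the rates are estimated as the negative base-2 logarithm of the likelihoods that the entropy model assigns to each quantized element.

```python
import math


def rate_bits(likelihoods):
    """Estimated code length in bits: sum of -log2(p) over all coded elements."""
    return sum(-math.log2(p) for p in likelihoods)


def rd_loss(p_y, p_z, distortion, lam, num_pixels):
    """Rate-distortion loss: bits per pixel of both streams plus lam * distortion.

    Assumption: dividing the rate by the pixel count is a common convention,
    not something stated in the claim.
    """
    bpp = (rate_bits(p_y) + rate_bits(p_z)) / num_pixels
    return bpp + lam * distortion


# Hypothetical likelihoods for the hidden and super-prior features.
p_y = [0.5, 0.25, 0.25, 0.5]   # 1 + 2 + 2 + 1 = 6 bits
p_z = [0.5, 0.5]               # 1 + 1 = 2 bits
loss = rd_loss(p_y, p_z, distortion=0.01, lam=100.0, num_pixels=4)  # 2.0 + 1.0
```

Under this form, a larger λ penalizes distortion more heavily and therefore pushes the model toward higher code rates and higher reconstruction quality.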
9. The image compression method based on multi-scale space and context information fusion of claim 8, characterized in that, to speed up model convergence, the model is first pre-trained at a high code rate, and the value of λ is then modified to adjust the model to other code rates.
10. The image compression method based on multi-scale space and context information fusion of claim 8, characterized in that, when the pre-training model is trained, the learning rate decreases as the number of iterations grows, and when models for other code rates are trained, the initial learning rate is increased and then decreases as the number of iterations grows.
CN202210224174.0A 2022-03-09 2022-03-09 An image compression method based on multi-scale space and context information fusion Active CN114792347B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210224174.0A CN114792347B (en) 2022-03-09 2022-03-09 An image compression method based on multi-scale space and context information fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210224174.0A CN114792347B (en) 2022-03-09 2022-03-09 An image compression method based on multi-scale space and context information fusion

Publications (2)

Publication Number Publication Date
CN114792347A true CN114792347A (en) 2022-07-26
CN114792347B CN114792347B (en) 2025-02-28

Family

ID=82460743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210224174.0A Active CN114792347B (en) 2022-03-09 2022-03-09 An image compression method based on multi-scale space and context information fusion

Country Status (1)

Country Link
CN (1) CN114792347B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116743182A (en) * 2023-08-15 2023-09-12 国网江西省电力有限公司信息通信分公司 Lossless data compression method
CN117173263A (en) * 2023-10-31 2023-12-05 江苏势通生物科技有限公司 Image compression method for generating countermeasure network based on enhanced multi-scale residual error
CN119135910A (en) * 2024-09-10 2024-12-13 电子科技大学 Image encoding method and device based on deep learning

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126258A (en) * 2019-12-23 2020-05-08 深圳市华尊科技股份有限公司 Image recognition method and related device
US20210012201A1 (en) * 2019-07-10 2021-01-14 Adobe Inc. Center-biased machine learning techniques to determine saliency in digital images
CN113284055A (en) * 2021-03-18 2021-08-20 华为技术有限公司 Image processing method and device
CN113283435A (en) * 2021-05-14 2021-08-20 陕西科技大学 Remote sensing image semantic segmentation method based on multi-scale attention fusion

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210012201A1 (en) * 2019-07-10 2021-01-14 Adobe Inc. Center-biased machine learning techniques to determine saliency in digital images
CN111126258A (en) * 2019-12-23 2020-05-08 深圳市华尊科技股份有限公司 Image recognition method and related device
CN113284055A (en) * 2021-03-18 2021-08-20 华为技术有限公司 Image processing method and device
CN113283435A (en) * 2021-05-14 2021-08-20 陕西科技大学 Remote sensing image semantic segmentation method based on multi-scale attention fusion

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
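He Kai; Feng Xu; Gao Shengnan; Ma Xitao: "Fine-grained image classification algorithm based on multi-scale feature fusion and repeated attention mechanism", Journal of Tianjin University (Science and Technology), no. 10, 2 September 2020 (2020-09-02) *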

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116743182A (en) * 2023-08-15 2023-09-12 国网江西省电力有限公司信息通信分公司 Lossless data compression method
CN116743182B (en) * 2023-08-15 2023-12-08 国网江西省电力有限公司信息通信分公司 Lossless data compression method
CN117173263A (en) * 2023-10-31 2023-12-05 江苏势通生物科技有限公司 Image compression method for generating countermeasure network based on enhanced multi-scale residual error
CN117173263B (en) * 2023-10-31 2024-02-02 江苏势通生物科技有限公司 Image compression method for generating countermeasure network based on enhanced multi-scale residual error
CN119135910A (en) * 2024-09-10 2024-12-13 电子科技大学 Image encoding method and device based on deep learning
CN119135910B (en) * 2024-09-10 2025-06-20 电子科技大学 Image encoding method and device based on deep learning

Also Published As

Publication number Publication date
CN114792347B (en) 2025-02-28

Similar Documents

Publication Publication Date Title
CN111787323B (en) Variable bit rate generation type compression method based on counterstudy
CN114792347B (en) An image compression method based on multi-scale space and context information fusion
CN110248190B (en) Multilayer residual coefficient image coding method based on compressed sensing
CN111147862B (en) End-to-end image compression method based on target coding
CN111641826B (en) Method, device and system for encoding and decoding data
CN109102461B (en) Image reconstruction method, device, equipment and medium for low-sampling block compressed sensing
CN114422802B (en) Self-encoder image compression method based on codebook
CN116600119B (en) Video encoding method, video decoding method, video encoding device, video decoding device, computer equipment and storage medium
CN113132727A (en) Scalable machine vision coding method based on image generation
CN114501034B (en) Image compression method and medium based on discrete Gaussian mixture super prior and Mask
CN114663536B (en) Image compression method and device
CN114549302B (en) Image super-resolution reconstruction method and system
CN118982555A (en) A dynamic point cloud efficient compression method, system, electronic device and medium based on feature anchor points
CN111107377A (en) Depth image compression method, device, equipment and storage medium
CN119152051A (en) Three-dimensional medical image compression system for man-machine vision
CN118608799A (en) Remote sensing image compression method based on multi-scale asymmetric encoding and decoding network
CN111510740B (en) Transcoding method, transcoding device, electronic equipment and computer readable storage medium
CN116828184B (en) Video encoding method, video decoding method, video encoding device, video decoding device, computer equipment and storage medium
CN117319652A (en) Video coding and decoding model processing, video coding and decoding methods and related equipment
CN111163320A (en) Video compression method and system
CN115361555A (en) Image encoding method, image encoding method, device, and computer storage medium
Teng et al. Light field compression via a variational graph auto-encoder
CN111091495A (en) High-resolution compressive sensing reconstruction method for laser image based on residual error network
Liu et al. Learned Image Compression with Multi-Scale Spatial and Contextual Information Fusion
CN117689742A (en) A multi-rate image compression and transmission method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant