CN114519750A - Face image compression method and system - Google Patents
Face image compression method and system
- Publication number
- CN114519750A (application CN202210013946.6A)
- Authority
- CN
- China
- Prior art keywords
- style
- bit stream
- coding bit
- image
- analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 67
- 230000006835 compression Effects 0.000 title claims abstract description 49
- 238000007906 compression Methods 0.000 title claims abstract description 49
- 238000004458 analytical method Methods 0.000 claims abstract description 82
- 230000006870 function Effects 0.000 claims description 37
- 238000012549 training Methods 0.000 claims description 20
- 238000005457 optimization Methods 0.000 claims description 14
- 238000004590 computer program Methods 0.000 claims description 11
- 230000004927 fusion Effects 0.000 claims description 9
- 230000008447 perception Effects 0.000 claims description 7
- 238000012545 processing Methods 0.000 claims description 7
- 230000011218 segmentation Effects 0.000 claims description 7
- 238000000605 extraction Methods 0.000 claims description 3
- 238000007476 Maximum Likelihood Methods 0.000 claims 1
- 230000000007 visual effect Effects 0.000 abstract description 13
- 238000011156 evaluation Methods 0.000 abstract description 10
- 238000010586 diagram Methods 0.000 description 7
- 238000013139 quantization Methods 0.000 description 6
- 230000008569 process Effects 0.000 description 5
- 238000004891 communication Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 238000013528 artificial neural network Methods 0.000 description 2
- 230000005540 biological transmission Effects 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 238000011176 pooling Methods 0.000 description 2
- 230000003044 adaptive effect Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 238000000354 decomposition reaction Methods 0.000 description 1
- 230000006837 decompression Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 230000004438 eyesight Effects 0.000 description 1
- 210000000887 face Anatomy 0.000 description 1
- 230000001815 facial effect Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 230000000644 propagated effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 230000016776 visual perception Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The embodiments of the application disclose a face image compression method and system, wherein the method comprises the following steps: inputting an original face image into a style encoder and a content encoder to extract style features and structural features; respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style features and a structure coding bit stream corresponding to the structural features, and inputting the style coding bit stream and the structure coding bit stream into a decoder and a multitask analysis network; the decoder reconstructs an image from the style coding bit stream and the structure coding bit stream and outputs the reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image. At extremely high compression efficiency, high subjective visual quality of the reconstructed image is maintained, and decoding time and resource overhead are saved.
Description
Technical Field
The embodiment of the application relates to the technical field of digital signal processing, in particular to a method and a system for compressing a human face image.
Background
Image/video compression methods based on neural networks have developed rapidly in recent years, and the quality of their compressed and reconstructed images already exceeds that of the new-generation video coding standard VVC (Versatile Video Coding) on objective indexes such as PSNR (Peak Signal-to-Noise Ratio) and MS-SSIM (Multi-Scale Structural Similarity). Compression frameworks based on generative models can greatly improve the compression ratio without harming the evaluation indexes that directly reflect viewing quality as perceived by the human eye.
Currently, across various lines of research, end-to-end image coding based on neural networks faces two major problems: first, the representation mechanism for the input original image signal is limited and lacks support for the computer vision processing tasks that are now widely deployed; second, the resources at the signal receiving end are limited and cannot support neural network models with huge numbers of parameters.
Disclosure of Invention
Therefore, the embodiments of the application provide a face image compression method and system that maintain high subjective visual quality of the reconstructed image at extremely high compression efficiency, while saving decoding time and resource overhead.
In order to achieve the above object, the embodiments of the present application provide the following technical solutions:
according to a first aspect of the embodiments of the present application, there is provided a face image compression method, including:
inputting an original face image into a style encoder and a content encoder to extract style features and structural features;
respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into a decoder and a multitask analysis network;
the decoder reconstructs the images of the style coding bit stream and the structure coding bit stream and outputs a reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image.
Optionally, performing probability estimation and entropy coding respectively to obtain a style coded bit stream corresponding to the style characteristic and a structure coded bit stream corresponding to the structure characteristic, including:
quantizing the style characteristic and the structural characteristic respectively to obtain quantized style characteristic and structural characteristic;
and entropy coding the quantized style features and the structure features according to probability estimation results calculated by the probability estimation model respectively to obtain style coding bit streams corresponding to the style features and structure coding bit streams corresponding to the structure features.
Optionally, the reconstructing the image of the style coded bitstream and the structure coded bitstream by the decoder, and outputting a reconstructed image, including:
fusing the style coding bit stream and the structure coding bit stream through a fusion module in a decoder, and learning the mean and variance of the convolutional layers in a residual block through multi-layer perceptron (MLP) processing;
executing an image compression task on the fused coded bit stream through a generator in a decoder to obtain a compressed reconstructed image;
discriminating the compressed reconstructed image through a discriminator to obtain a loss optimization function; and training the generator according to the loss optimization function.
Optionally, the loss optimization function is according to formulas (1) and (2) below, where D is the discriminator, E denotes the content encoder and the style encoder, G is the generator, P is the probability estimation model, x is the original face image, x̂ is the reconstructed image, ŷ denotes the quantized style and structure features, p is the probability estimation result, and λ and β are hyper-parameters.
Optionally, the multitask analysis network performs semantic analysis on the style coded bitstream and the structure coded bitstream, and outputs semantic information of an image, including:
inputting the style coding bit stream and the structure coding bit stream into the multitask analysis network, fusing the coded bit streams through a fusion module, and training the multitask analysis network according to a multitask analysis loss function to obtain corresponding task results, which are output as the semantic information of the image.
Optionally, the multitask analysis loss function L_multi is calculated according to the following formula:

L_multi = λ_cls · l_cls + λ_seg · l_seg

where l_cls and l_seg are the loss functions of the classification task and the segmentation task, respectively, and λ_cls and λ_seg are the corresponding weight hyper-parameters.
Optionally, the method further comprises: training the parameters of the multi-task analysis model by optimizing the multi-task analysis loss function to obtain a global optimal solution; wherein the total loss function applied in training the multi-task analysis model is according to the following formula:

L = L_EGP + L_D + γ · L_multi

where γ is a hyper-parameter.
According to a second aspect of the embodiments of the present application, there is provided a face image compression system, the system including:
the feature extraction module is used for inputting an original face image into a style encoder and a content encoder to extract style features and structural features;
the coding module is used for respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into the decoder and the multitask analysis network;
the compression decoding module is used for reconstructing the images of the style coding bit stream and the structure coding bit stream by a decoder and outputting a reconstructed image;
and the multitask analysis module is used for carrying out semantic understanding analysis on the style coding bit stream and the structure coding bit stream by the multitask analysis network and outputting semantic information of the image.
According to a third aspect of embodiments herein, there is provided an electronic device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the computer program to implement the method of the first aspect.
According to a fourth aspect of embodiments herein, there is provided a computer readable storage medium having stored thereon computer readable instructions executable by a processor to implement the method of the first aspect described above.
In summary, the embodiments of the present application provide a face image compression method and system: an original face image is input into a style encoder and a content encoder to extract style features and structural features; probability estimation and entropy coding are performed respectively to obtain a style coding bit stream corresponding to the style features and a structure coding bit stream corresponding to the structural features, and the two bit streams are input into a decoder and a multitask analysis network; the decoder reconstructs an image from the style coding bit stream and the structure coding bit stream and outputs the reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image. At extremely high compression efficiency, high subjective visual quality of the reconstructed image is maintained, and decoding time and resource overhead are saved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are merely exemplary, and other drawings can be derived from them by those of ordinary skill in the art without inventive effort.
The structures, proportions, sizes, and the like shown in this specification are used only to accompany the content disclosed in the specification, for the understanding and reading of those skilled in the art; they are not intended to limit the conditions under which the invention can be implemented and therefore have no essential technical significance. Any structural modification, change of proportional relationship, or adjustment of size that does not affect the functions and purposes achieved by the invention shall still fall within the scope covered by the invention.
Fig. 1 is a schematic flow chart of a face image compression method according to an embodiment of the present application;
fig. 2 is a flowchart of a technical solution provided in an embodiment of the present application;
fig. 3 is a diagram of a multi-tasking analysis network architecture provided by an embodiment of the present application;
fig. 4 is a schematic diagram illustrating a quality index of a compressed and reconstructed image according to an embodiment of the present disclosure;
fig. 5 is a facial image compression system according to an embodiment of the present application;
fig. 6 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application;
fig. 7 shows a schematic diagram of a computer-readable storage medium provided by an embodiment of the present application.
Detailed Description
The present invention is described herein through specific embodiments; other advantages and effects of the invention will be readily apparent to those skilled in the art from the disclosure in this specification. The described embodiments are merely a part, not all, of the embodiments of the invention and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the invention.
The image compression method provided by the embodiments of the application outperforms the prior art on various evaluation indexes simulating human visual perception at the same code rate; meanwhile, the compressed data is used directly as the input of downstream visual analysis tasks with little loss of analysis accuracy compared with the original image, thereby greatly saving the storage resource overhead and network transmission bandwidth of the decoding end.
Fig. 1 illustrates an image compression method provided by an embodiment of the present application, where the method includes the following steps:
step 101: inputting an original face image into a style encoder and a content encoder to extract style features and structural features;
step 102: respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into a decoder and a multitask analysis network;
step 103: the decoder reconstructs the images of the style coding bit stream and the structure coding bit stream and outputs a reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image.
In a possible implementation manner, in step 102, performing probability estimation and entropy coding respectively to obtain a style coded bitstream corresponding to the style characteristic and a structure coded bitstream corresponding to the structure characteristic, including:
quantizing the style characteristic and the structural characteristic respectively to obtain quantized style characteristic and structural characteristic; and entropy coding the quantized style features and the structure features according to probability estimation results calculated by the probability estimation model respectively to obtain style coding bit streams corresponding to the style features and structure coding bit streams corresponding to the structure features.
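As an illustration of this quantize-then-entropy-code step, here is a minimal PyTorch sketch; the factorized Gaussian entropy model is an assumption for concreteness (the description later names Gaussian and mixture-Gaussian fitting), and hard rounding stands in for the quantization function Q:

```python
import torch

def quantize(y: torch.Tensor) -> torch.Tensor:
    # Hard rounding, used at inference; training replaces it with
    # additive uniform noise (see the training notes further below).
    return torch.round(y)

def rate_bits(y_hat: torch.Tensor, mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    # Ideal code length in bits under a factorized Gaussian model:
    # p(ŷ) is the Gaussian probability mass on [ŷ - 0.5, ŷ + 0.5],
    # and an entropy coder approaches -log2 p(ŷ) bits per element.
    gauss = torch.distributions.Normal(mu, sigma)
    p = gauss.cdf(y_hat + 0.5) - gauss.cdf(y_hat - 0.5)
    return -torch.log2(p.clamp_min(1e-9)).sum()
```

The entropy of the fitted probabilities, summed this way, is what the description below takes as the code rate obtained by actual coding.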
In one possible implementation, in step 103, the decoder reconstructs an image of the style coded bitstream and the structure coded bitstream, and outputs a reconstructed image, including:
fusing the style coding bit stream and the structure coding bit stream through a fusion module in the decoder, and learning the mean and variance of the convolutional layers in a residual block through multi-layer perceptron (MLP) processing; executing an image compression task on the fused coded bit stream through a generator in the decoder to obtain a compressed reconstructed image; discriminating the compressed reconstructed image through the discriminator to obtain a loss optimization function; and training the generator according to the loss optimization function.
In one possible embodiment, the loss optimization function is according to formulas (1) and (2), where D is the discriminator, E denotes the content encoder and the style encoder, G is the generator, P is the probability estimation model, x is the original face image, x̂ is the reconstructed image, ŷ denotes the quantized style and structure features, p is the probability estimation result, and λ and β are hyper-parameters.
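As a hedged sketch only — assuming a HiFiC-style adversarial rate-distortion objective consistent with the variables just defined, not necessarily the exact formulas of the filing — the two losses could read:

```latex
\begin{align}
L_{EGP} &= \mathbb{E}_{x}\left[ \lambda\, r(\hat{y}) + d(x,\hat{x}) - \beta \log D(\hat{x},\hat{y}) \right] \tag{1} \\
L_{D}   &= \mathbb{E}_{x}\left[ -\log\!\left(1 - D(\hat{x},\hat{y})\right) \right]
         + \mathbb{E}_{x}\left[ -\log D(x,\hat{y}) \right] \tag{2}
\end{align}
```

Here r(ŷ) = −log₂ p(ŷ) is the code rate estimated by the probability model P, and d(x, x̂) is the total distortion of formula (5) below.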
In a possible implementation manner, in step 103, the multitask analysis network performs semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image, including:
inputting the style coding bit stream and the structure coding bit stream into the multitask analysis network, fusing the coding bit streams through a fusion module, and training the multitask analysis network according to a multitask analysis loss function to obtain a corresponding task result which is used as the output of the semantic information of the image.
In this compression framework, the intermediate compression result is used directly, without decoding, as the input of multiple analysis tasks to obtain the semantic information of the original image signal. Multitasking here refers to various vision- and semantics-related tasks such as recognition, detection, and segmentation.
In one possible embodiment, the multitask analysis loss function L_multi is calculated according to the following formula (3):

L_multi = λ_cls · l_cls + λ_seg · l_seg    formula (3)

where l_cls and l_seg are the loss functions of the classification task and the segmentation task, respectively, and λ_cls and λ_seg are the corresponding weight hyper-parameters.
In one possible embodiment, the method further comprises: training the parameters of the multi-task analysis model by optimizing the multi-task analysis loss function to obtain a global optimal solution; wherein the total loss function applied in training the multi-task analysis model is according to the following formula (4):

L = L_EGP + L_D + γ · L_multi    formula (4)

where γ is a hyper-parameter.
The face image compression method provided by the embodiments of the application follows the construction idea of current generative models that hierarchically represent image signals: the original input image is mapped into style features and structural features, and quantization and entropy coding are then applied to the corresponding feature distributions. In addition, the embodiments directly use compressed-domain data as the input of multiple subsequent visual tasks. Because compressed data is an efficient and compact form of expression, the embodiments provide a multi-task analysis network model that obtains the semantic information of the original image from the compressed data at low computational cost, without decompression. Meanwhile, the rate-distortion loss function and the machine-analysis target loss function are jointly optimized to obtain a common solution for the image compression task and multiple machine vision analysis tasks.
Fig. 2 shows a model architecture diagram applicable to the face image compression method for multi-vision analysis task according to the embodiment of the present application, which mainly includes a compression model and a multi-task analysis model.
The compression model mainly comprises four main parts: an encoder, a generator, a discriminator and a probability estimation model. The encoder includes a content encoder and a genre encoder.
Given an original image x, the encoder first encodes it as y = E(x), which is then quantized as ŷ = Q(y), where Q is the quantization function. The quantized features are then losslessly encoded into a bitstream by an entropy coding method, according to the probability estimates given by the probability estimation model. At the decoder end, x̂ = G(ŷ), where x̂ is the reconstructed image.
The original picture to be compressed is mapped into a visual semantic feature domain and decomposed into style features and structural features. Mutually independent probability estimation methods are used to fit probability distributions to the decomposed style and structural features, and the entropy of the fitted probabilities is taken as the code rate obtained by actual coding.
At the semantic feature level, the input original image signal is decomposed into content features and style features. An input image x is encoded into a content representation and a style representation by two separate encoders, E1 and E2; the content feature and the style feature are y1 = E1(x) and y2 = E2(x), respectively. Quantization with the quantization function Q then yields ŷ1 and ŷ2.

Because the decoupled features have mutually independent data distributions, a separate probability estimation model P is set for each stream, namely p1(y1|z1) and p2(y2|z2). According to the probability estimation results p1(y1|z1) and p2(y2|z2), ŷ1 and ŷ2 are losslessly encoded with an entropy coding method, and the entropy of the fitted probabilities is taken as the code rate obtained by actual coding.
The features are further compressed by entropy coding. Entropy coding requires the probability distribution of the elements in the features as input, so the probability of each occurring element is estimated by the probability estimation model. Entropy coding methods include, but are not limited to, Huffman coding, arithmetic coding, and context-based binary coding.
The coded bit stream is referred to as a code stream. The features are compressed by the entropy encoder into a binary file; code stream 1 and code stream 2 in Fig. 2 are the entropy coding results of the content features and the style features, respectively.
The embodiments of the application also provide a codec based on semantic layering; its decoder part mainly comprises a generator and a discriminator. A fusion module is designed at the decoding end. The fusion module is based on Adaptive Instance Normalization (AdaIN) residual blocks: the content features are input directly to the AdaIN module, while the style features are processed by a multi-layer perceptron (MLP), which learns the mean and variance of the convolutional layers in the residual block.
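A sketch of such an AdaIN residual block follows; the two-convolution layout and layer sizes are assumptions, since the patent does not disclose the exact architecture:

```python
import torch
import torch.nn as nn

class AdaINResBlock(nn.Module):
    # Fusion idea: an MLP maps the style features to the per-channel
    # mean/std that modulate the content pathway inside a residual block.
    def __init__(self, channels: int, style_dim: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.mlp = nn.Linear(style_dim, 4 * channels)  # (mean, std) for 2 convs

    def adain(self, h, mean, std):
        # Instance-normalize the content, then re-scale with style statistics.
        mu = h.mean(dim=(2, 3), keepdim=True)
        sigma = h.std(dim=(2, 3), keepdim=True) + 1e-5
        return std[..., None, None] * (h - mu) / sigma + mean[..., None, None]

    def forward(self, content, style):
        m1, s1, m2, s2 = self.mlp(style).chunk(4, dim=1)
        h = torch.relu(self.adain(self.conv1(content), m1, s1))
        h = self.adain(self.conv2(h), m2, s2)
        return content + h
```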
The embodiments of the application optimize the compression model according to rate-distortion theory. Two reference indicators are used in the distortion loss metric: the pixel-level Mean Absolute Error (MAE) loss d_MAE, and the SSIM loss d_SSIM, which evaluates the overall structure. Meanwhile, considering subjective perceptual quality, a perceptual distortion loss d_p is adopted, which simulates human visual perception using high-order features extracted by the pre-trained convolutional neural network VGG16.

The total distortion loss of the compression model provided by the embodiments of the application is calculated as follows:

d = λ_MAE · d_MAE + λ_SSIM · d_SSIM + λ_p · d_p    formula (5)

where λ_MAE, λ_SSIM and λ_p are hyper-parameters, d_MAE is the pixel-level Mean Absolute Error (MAE) loss, d_SSIM is the SSIM loss evaluating the overall structure, and d_p is the perceptual distortion loss.
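A PyTorch sketch of formula (5); the choice of VGG16 layer (up to relu3_3) and the single-window SSIM are simplifying assumptions, not choices stated in the patent:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

def ssim_global(a, b, c1=0.01**2, c2=0.03**2):
    # Single-window SSIM over the whole image (a simplification of the
    # usual 11x11 sliding-window SSIM).
    mu_a, mu_b = a.mean(dim=(2, 3)), b.mean(dim=(2, 3))
    var_a, var_b = a.var(dim=(2, 3)), b.var(dim=(2, 3))
    cov = ((a - mu_a[..., None, None]) * (b - mu_b[..., None, None])).mean(dim=(2, 3))
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)
            / ((mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2))).mean()

class TotalDistortion(torch.nn.Module):
    def __init__(self, l_mae=1.0, l_ssim=1.0, l_p=1.0):
        super().__init__()
        # Frozen VGG16 features as the perceptual metric d_p.
        self.vgg = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.l_mae, self.l_ssim, self.l_p = l_mae, l_ssim, l_p

    def forward(self, x, x_hat):
        d_mae = (x - x_hat).abs().mean()
        d_ssim = 1.0 - ssim_global(x, x_hat)
        d_p = F.mse_loss(self.vgg(x), self.vgg(x_hat))
        return self.l_mae * d_mae + self.l_ssim * d_ssim + self.l_p * d_p
```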
Based on the loss definitions above, the loss optimization functions of the modules in Fig. 2 (identified by the subscripts of L: E denotes the content and style encoders, G the generator, P the probability estimation model, and D the discriminator) can be defined as formulas (1) and (2).
When training the compression model, the expected code rate of the transmitted code stream is changed by changing the number of feature channels, so that models with extremely high compression ratios can be obtained more effectively.
During training, uniform noise is added to avoid the problem that the gradient of the quantization operation is not differentiable during backpropagation.
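This is the standard additive-noise trick from learned compression; a one-line sketch:

```python
import torch

def quantize_train(y: torch.Tensor) -> torch.Tensor:
    # Additive uniform noise in [-0.5, 0.5] is differentiable and
    # matches the error statistics of hard rounding at inference.
    return y + torch.empty_like(y).uniform_(-0.5, 0.5)
```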
The probability distribution fitting methods used in training include, but are not limited to, the Gaussian model and the Gaussian mixture model.
Fig. 3 shows a schematic diagram of the multitask analysis network structure provided by the embodiments of the application. The compressed-domain multitask analysis network targets a classification task and a semantic segmentation task, and adopts a corresponding network structure design and loss function for each. ASPP refers to Atrous Spatial Pyramid Pooling.
The style features and structural features are input into the multitask analysis network, first passing through a fusion module; task network branches are then designed for the characteristics of the different tasks, and the corresponding results are output according to each task, as sketched below.
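A hedged sketch of such per-task branches on top of the fused compressed-domain features; channel counts, dilation rates and the light ASPP variant are assumptions, not values from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHeads(nn.Module):
    def __init__(self, in_ch: int, n_classes: int, n_seg_classes: int):
        super().__init__()
        # Classification branch: global pooling + linear layer.
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, n_classes))
        # Segmentation branch: a light ASPP-style set of parallel
        # atrous (dilated) convolutions, then a 1x1 projection.
        self.aspp = nn.ModuleList([
            nn.Conv2d(in_ch, 64, 3, padding=r, dilation=r) for r in (1, 6, 12)])
        self.seg_out = nn.Conv2d(3 * 64, n_seg_classes, 1)

    def forward(self, fused, out_size):
        logits_cls = self.cls_head(fused)
        seg = torch.cat([F.relu(c(fused)) for c in self.aspp], dim=1)
        logits_seg = F.interpolate(self.seg_out(seg), size=out_size,
                                   mode="bilinear", align_corners=False)
        return logits_cls, logits_seg
```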
The classification task and the segmentation task are taken as examples in the embodiments because they have been widely studied and are the most representative tasks in visual analysis.
The balance between tasks is controlled by setting hyper-parameters, so the multitask analysis loss L_multi can be expressed as formula (6):

L_multi = λ_cls · l_cls + λ_seg · l_seg    formula (6)

where l_cls and l_seg are the loss functions of the classification task and the segmentation task, respectively, and λ_cls and λ_seg are their corresponding weight hyper-parameters.
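Continuing the sketch above, formula (6) then amounts to the following, where the weights `lam_cls`, `lam_seg` and the label tensors are placeholders:

```python
import torch.nn.functional as F

# logits_cls: (B, n_classes), labels: (B,)
# logits_seg: (B, n_seg_classes, H, W), masks: (B, H, W)
l_cls = F.cross_entropy(logits_cls, labels)      # classification loss
l_seg = F.cross_entropy(logits_seg, masks)       # per-pixel segmentation loss
loss_multi = lam_cls * l_cls + lam_seg * l_seg   # formula (6)
```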
To further examine the relationship between compression and visual analysis, two training methods for the analysis network were validated: separate training and joint training.

In the separate-training setting, the embodiments fix the compression model and train only the multi-task analysis model; this involves fewer parameters and is easier to train.
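In PyTorch terms, fixing the compression model amounts to freezing its parameters; a sketch with `compression_model` and `analysis_model` as hypothetical module names:

```python
import torch

# Separate training: freeze the compression model, optimize only the
# multi-task analysis network.
for p in compression_model.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.Adam(analysis_model.parameters(), lr=1e-4)
```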
Joint training directly balances compression, reconstruction and analysis during training, making the global optimum easier to find and yielding better analysis results.

In the joint-training method, the compression model and the multi-task analysis model are optimized together; the total loss function is defined as formula (4), where the hyper-parameter γ balances the relative weight of the compression task and the visual analysis task.
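A hedged sketch of one joint-training step per formula (4); all component names are assumptions, the encoder is assumed to use the noise-based training quantizer sketched earlier, and the discriminator's alternating GAN update is omitted:

```python
import torch
import torch.nn.functional as F

def joint_step(batch, encode, generator, analysis, dist_loss, opt,
               lam: float = 0.01, gamma: float = 1.0):
    # One generator-side optimization step of L = L_EGP + gamma * L_multi.
    x, labels, masks = batch
    y1_hat, y2_hat, bits = encode(x)             # compressed-domain features
    x_hat = generator(y1_hat, y2_hat)            # reconstruction
    loss_egp = lam * bits + dist_loss(x, x_hat)  # rate + distortion
    logits_cls, logits_seg = analysis(y1_hat, y2_hat, x.shape[-2:])
    loss_multi = (F.cross_entropy(logits_cls, labels)
                  + F.cross_entropy(logits_seg, masks))
    total = loss_egp + gamma * loss_multi        # formula (4)
    opt.zero_grad(); total.backward(); opt.step()
    return float(total)
```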
According to the method and the device, the rate distortion loss function and the machine analysis target loss function are optimized in a combined mode, and a common solution of an image compression task and various machine vision analysis tasks is obtained.
Fig. 4 shows the performance of the compression model's reconstructions on four perception-based image quality indexes; on all of them the method greatly exceeds existing traditional coding methods and end-to-end compression methods based on deep learning.
In the face image compression method provided by the embodiments of the application, the original picture signal is decoupled into style features and structural features in the visual feature domain; mutually independent probability estimation models fit the distributions of these features, and the coded bit streams are then obtained through an entropy encoder. To analyze the compressed data directly and obtain semantic information, a multi-task analysis model is provided.
The method provided by the embodiments of the application maintains high subjective visual quality of the reconstructed image at extremely high compression efficiency; the decoding end can process the received code stream without decoding it, using the multitask analysis network acting on the compressed data to acquire the semantic information of the original image, thereby saving decoding time and resource overhead.
In summary, the embodiments of the present application provide an image compression method: an original face image is input into a style encoder and a content encoder to extract style features and structural features; probability estimation and entropy coding are performed respectively to obtain a style coding bit stream corresponding to the style features and a structure coding bit stream corresponding to the structural features, and the two bit streams are input into a decoder and a multitask analysis network; the decoder reconstructs an image from the style coding bit stream and the structure coding bit stream and outputs the reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image. At extremely high compression efficiency, high subjective visual quality of the reconstructed image is maintained, and decoding time and resource overhead are saved.
Based on the same technical concept, an embodiment of the present application further provides a face image compression system, as shown in fig. 5, the system includes:
a feature extraction module 501, configured to input an original face image into a style encoder and a content encoder to extract style features and structural features;
the encoding module 502 is configured to perform probability estimation and entropy encoding respectively to obtain a style encoded bit stream corresponding to the style characteristics and a structure encoded bit stream corresponding to the structure characteristics, and input the style encoded bit stream and the structure encoded bit stream to the decoder and the multitask analysis network;
a compression decoding module 503, configured to reconstruct the image of the style coded bitstream and the structure coded bitstream by a decoder, and output a reconstructed image;
and a multitask analysis module 504, configured to perform semantic understanding analysis on the style coded bit stream and the structure coded bit stream by using a multitask analysis network, and output semantic information of an image.
The embodiment of the application also provides electronic equipment corresponding to the method provided by the embodiment. Please refer to fig. 6, which illustrates a schematic diagram of an electronic device according to some embodiments of the present application. The electronic device 20 may include: the system comprises a processor 200, a memory 201, a bus 202 and a communication interface 203, wherein the processor 200, the communication interface 203 and the memory 201 are connected through the bus 202; the memory 201 stores a computer program that can be executed on the processor 200, and the processor 200 executes the computer program to perform the method provided by any of the foregoing embodiments of the present application.
The memory 201 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile memory, such as at least one disk memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 203 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, and the like can be used.
The processor 200 may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 200. The processor 200 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 201, and the processor 200 reads the information in the memory 201 and completes the steps of the method in combination with its hardware.
The electronic device provided by the embodiment of the application and the method provided by the embodiment of the application have the same inventive concept and have the same beneficial effects as the method adopted, operated or realized by the electronic device.
Referring to fig. 7, the computer-readable storage medium is an optical disc 30, on which a computer program (i.e., a program product) is stored, and when the computer program is executed by a processor, the computer program performs the method of any of the foregoing embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above-mentioned embodiments of the present application and the method provided by the embodiments of the present application have the same advantages as the method adopted, executed or implemented by the application program stored in the computer-readable storage medium.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best mode of use of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various inventive aspects. However, the disclosed method is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the creation apparatus of a virtual machine according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etc., does not indicate any ordering; these words may be interpreted as names.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A method for compressing a face image, the method comprising:
inputting an original face image into a style encoder and a content encoder to extract style features and structural features;
respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into a decoder and a multitask analysis network;
the decoder reconstructs the images of the style coding bit stream and the structure coding bit stream and outputs a reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image.
2. The method of claim 1, wherein performing probability estimation and entropy coding respectively to obtain a style coded bitstream corresponding to the style characteristic and a structure coded bitstream corresponding to the structure characteristic comprises:
quantizing the style characteristic and the structural characteristic respectively to obtain quantized style characteristic and structural characteristic;
and entropy coding the quantized style features and the structure features according to probability estimation results calculated by the probability estimation model respectively to obtain style coding bit streams corresponding to the style features and structure coding bit streams corresponding to the structure features.
3. The method of claim 1, wherein the decoder reconstructing the images of the style coded bitstream and the structure coded bitstream comprises:
fusing the style coding bit stream and the structure coding bit stream through a fusion module in a decoder, and learning the mean and variance of the convolutional layers in a residual block through multi-layer perceptron (MLP) processing;
executing an image compression task on the fused coded bit stream through a generator in a decoder to obtain a compressed reconstructed image;
discriminating the compressed reconstructed image through a discriminator to obtain a loss optimization function; and training the generator according to the loss optimization function.
4. The method of claim 3, wherein the loss optimization function is according to the following formula, where D is the discriminator, E denotes the content encoder and the style encoder, G is the generator, P is the probability estimation model, x is the original face image, x̂ is the reconstructed image, ŷ denotes the quantized style and structure features, p is the probability estimation result, and λ and β are hyper-parameters.
5. The method of claim 1, wherein the multitask analysis network performing semantic understanding analysis on the style coded bitstream and the structure coded bitstream and outputting semantic information of an image comprises:
inputting the style coding bit stream and the structure coding bit stream into the multitask analysis network, fusing the coded bit streams through a fusion module, and training the multitask analysis network according to a multitask analysis loss function to obtain corresponding task results, which are output as the semantic information of the image.
6. The method of claim 5, wherein the multitask analysis loss function L_multi is calculated according to the following formula:

L_multi = λ_cls · l_cls + λ_seg · l_seg

where l_cls and l_seg are the loss functions of the classification task and the segmentation task, respectively, and λ_cls and λ_seg are the corresponding weight hyper-parameters.
7. The method of any of claims 1 to 6, further comprising: training the parameters of the multi-task analysis model by optimizing the multi-task analysis loss function to obtain a global optimal solution; wherein the total loss function applied in training the multi-task analysis model is according to the following formula:

L = L_EGP + L_D + γ · L_multi

where γ is a hyper-parameter.
8. A face image compression system, the system comprising:
the feature extraction module is used for inputting an original face image into a style encoder and a content encoder to extract style features and structural features;
the coding module is used for respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into the decoder and the multitask analysis network;
the compression decoding module is used for reconstructing the images of the style coding bit stream and the structure coding bit stream by a decoder and outputting a reconstructed image;
and the multitask analysis module is used for carrying out semantic understanding analysis on the style coding bit stream and the structure coding bit stream by the multitask analysis network and outputting semantic information of the image.
9. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any of claims 1-7.
10. A computer-readable storage medium having computer-readable instructions stored thereon, the computer-readable instructions being executable by a processor to implement the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210013946.6A CN114519750A (en) | 2022-01-06 | 2022-01-06 | Face image compression method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210013946.6A CN114519750A (en) | 2022-01-06 | 2022-01-06 | Face image compression method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114519750A true CN114519750A (en) | 2022-05-20 |
Family
ID=81597218
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210013946.6A Pending CN114519750A (en) | 2022-01-06 | 2022-01-06 | Face image compression method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114519750A (en) |
- 2022-01-06: CN CN202210013946.6A patent/CN114519750A/en, status: active, pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110991329A (en) * | 2019-11-29 | 2020-04-10 | 上海商汤智能科技有限公司 | Semantic analysis method and device, electronic equipment and storage medium |
CN111199550A (en) * | 2020-04-09 | 2020-05-26 | 腾讯科技(深圳)有限公司 | Training method, segmentation method, device and storage medium of image segmentation network |
CN112766079A (en) * | 2020-12-31 | 2021-05-07 | 北京航空航天大学 | Unsupervised image-to-image translation method based on content style separation |
CN112819689A (en) * | 2021-02-02 | 2021-05-18 | 百果园技术(新加坡)有限公司 | Training method of face attribute editing model, face attribute editing method and equipment |
Non-Patent Citations (1)
Title |
---|
- Ma Siwei et al.: "Intelligent Video Coding" (智能视频编码), Artificial Intelligence (人工智能), 10 April 2020 (2020-04-10) *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115880762A (en) * | 2023-02-21 | 2023-03-31 | 中国传媒大学 | Scalable human face image coding method and system for human-computer mixed vision |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109218727B (en) | Video processing method and device | |
CN111641832B (en) | Encoding method, decoding method, device, electronic device and storage medium | |
CN113259665B (en) | Image processing method and related equipment | |
CN113259676B (en) | Image compression method and device based on deep learning | |
CN111641826B (en) | Method, device and system for encoding and decoding data | |
CN116600119B (en) | Video encoding method, video decoding method, video encoding device, video decoding device, computer equipment and storage medium | |
CN113079378B (en) | Image processing method and device and electronic equipment | |
CN110930408A (en) | Semantic image compression method based on knowledge reorganization | |
CN111246206A (en) | Optical flow information compression method and device based on self-encoder | |
Löhdefink et al. | GAN-vs. JPEG2000 image compression for distributed automotive perception: Higher peak SNR does not mean better semantic segmentation | |
CN114519750A (en) | Face image compression method and system | |
CN114501031B (en) | Compression coding and decompression method and device | |
CN113382244B (en) | Coding and decoding network structure, image compression method, device and storage medium | |
CN115866265A (en) | Multi-code-rate depth image compression system and method applied to mixed context | |
CN113554719B (en) | Image encoding method, decoding method, storage medium and terminal equipment | |
CN118020306A (en) | Video encoding and decoding method, encoder, decoder, and storage medium | |
CN111565314A (en) | Image compression method, coding and decoding network training method and device and electronic equipment | |
Li et al. | You Can Mask More For Extremely Low-Bitrate Image Compression | |
WO2024060161A1 (en) | Encoding method, decoding method, encoder, decoder and storage medium | |
US20230316048A1 (en) | Multi-rate computer vision task neural networks in compression domain | |
US20230306239A1 (en) | Online training-based encoder tuning in neural image compression | |
US11683515B2 (en) | Video compression with adaptive iterative intra-prediction | |
US20230336738A1 (en) | Multi-rate of computer vision task neural networks in compression domain | |
US20230316588A1 (en) | Online training-based encoder tuning with multi model selection in neural image compression | |
US20230334718A1 (en) | Online training computer vision task models in compression domain |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |