CN114519750A - Face image compression method and system - Google Patents

Face image compression method and system

Info

Publication number
CN114519750A
Authority
CN
China
Prior art keywords
style
bit stream
coding bit
image
analysis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210013946.6A
Other languages
Chinese (zh)
Inventor
贾川民
张悦枫
马思伟
王苫社
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CN202210013946.6A priority Critical patent/CN114519750A/en
Publication of CN114519750A publication Critical patent/CN114519750A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 9/00 - Image coding
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The embodiment of the application discloses an image compression method and system, wherein the method comprises the following steps: inputting an original face image into a style encoder and a content encoder to extract style features and structural features; respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style features and a structure coding bit stream corresponding to the structural features, and inputting the style coding bit stream and the structure coding bit stream into a decoder and a multitask analysis network; the decoder reconstructs an image from the style coding bit stream and the structure coding bit stream and outputs the reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image. High subjective visual quality of the reconstructed image is maintained at extremely high compression efficiency, and decoding time and resource overhead are saved.

Description

Face image compression method and system
Technical Field
The embodiment of the application relates to the technical field of digital signal processing, in particular to a method and a system for compressing a human face image.
Background
Image/video compression methods based on neural networks have developed rapidly in recent years, and the quality of their compressed and reconstructed images exceeds the new-generation video coding standard VVC (Versatile Video Coding) on objective indexes such as PSNR (Peak Signal-to-Noise Ratio) and MS-SSIM (Multi-Scale Structural Similarity). Compression frameworks based on generative models can greatly improve the compression ratio without degrading the evaluation indexes that directly reflect the human viewing experience.
Currently, across various studies, end-to-end image coding based on neural networks faces two major problems: first, the representation mechanism for the input original image signal is limited and lacks support for the computer vision processing tasks that are now widely applied; second, the resources of the signal receiving end are limited and are insufficient to support neural network models with huge parameter counts.
Disclosure of Invention
Therefore, the embodiment of the application provides an image compression method and system, which can keep high subjective visual evaluation quality of a reconstructed image under the condition of extremely high compression efficiency, and save decoding time and resource overhead.
In order to achieve the above object, the embodiments of the present application provide the following technical solutions:
according to a first aspect of the embodiments of the present application, there is provided a face image compression method, including:
inputting an original face image into a style encoder and a content encoder to extract style features and structural features;
respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into a decoder and a multitask analysis network;
the decoder reconstructs the images of the style coding bit stream and the structure coding bit stream and outputs a reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image.
Optionally, performing probability estimation and entropy coding respectively to obtain a style coded bit stream corresponding to the style characteristic and a structure coded bit stream corresponding to the structure characteristic, including:
quantizing the style characteristic and the structural characteristic respectively to obtain quantized style characteristic and structural characteristic;
and entropy coding the quantized style features and the structure features according to probability estimation results calculated by the probability estimation model respectively to obtain style coding bit streams corresponding to the style features and structure coding bit streams corresponding to the structure features.
Optionally, the reconstructing the image of the style coded bitstream and the structure coded bitstream by the decoder, and outputting a reconstructed image, including:
fusing the style coding bit stream and the structure coding bit stream through a fusion module in the decoder, and learning the mean value and the variance of the convolution layers in the residual block through multi-layer perceptron (MLP) processing;
executing an image compression task on the fused coded bit stream through a generator in a decoder to obtain a compressed reconstructed image;
discriminating the compressed reconstructed image through the discriminator to obtain a loss optimization function; the generator is trained according to the loss optimization function.
Optionally, the loss optimization functions are according to the following formulas:

L_EGP = E_x[ -λ·log p(ŷ) + d(x, x̂) - β·log D(x̂) ]

L_D = E_x[ -log(1 - D(x̂)) - log D(x) ]

wherein D is the discriminator, E is the content encoder and the style encoder, G is the generator, P is the probability estimation model, x is the original face image, x̂ is the reconstructed image, ŷ is the quantized style and structure features, p is the probability estimation result, and λ and β are hyper-parameters.
Optionally, the multitask analysis network performs semantic analysis on the style coded bitstream and the structure coded bitstream, and outputs semantic information of an image, including:
inputting the style coding bit stream and the structure coding bit stream into the multitask analysis network, fusing the coding bit streams through a fusion module, and training the multitask analysis network according to a multitask analysis loss function to obtain a corresponding task result which is used as the output of the semantic information of the image.
Optionally, the multitask analysis loss function L_multi is calculated according to the following formula:

L_multi = λ_cls·l_cls + λ_seg·l_seg

wherein l_cls and l_seg are the loss functions of the classification task and the segmentation task, respectively, and λ_cls and λ_seg are the corresponding weight hyper-parameters.
Optionally, the method further comprises: training parameters in the multi-task analysis model through optimization of the multi-task analysis loss function to obtain a global optimal solution; wherein the total loss function applied in the training of the multi-task analysis model is according to the following formula:

L = L_EGP + L_D + γ·L_multi

wherein γ is a hyper-parameter.
According to a second aspect of the embodiments of the present application, there is provided a face image compression system, the system including:
the feature extraction module is used for inputting an original face image into a style encoder and a content encoder so as to extract style features and structural features;
the coding module is used for respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into the decoder and the multitask analysis network;
the compression decoding module is used for reconstructing the images of the style coding bit stream and the structure coding bit stream by a decoder and outputting a reconstructed image;
and the multitask analysis module is used for carrying out semantic understanding analysis on the style coding bit stream and the structure coding bit stream by the multitask analysis network and outputting semantic information of the image.
According to a third aspect of embodiments herein, there is provided an electronic device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the computer program to implement the method of the first aspect.
According to a fourth aspect of embodiments herein, there is provided a computer readable storage medium having stored thereon computer readable instructions executable by a processor to implement the method of the first aspect described above.
In summary, the embodiments of the present application provide an image compression method and system, which input an original face image into a style encoder and a content encoder to extract style features and structural features; respectively carry out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style features and a structure coding bit stream corresponding to the structural features, and input the style coding bit stream and the structure coding bit stream into a decoder and a multitask analysis network; the decoder reconstructs an image from the style coding bit stream and the structure coding bit stream and outputs the reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image. High subjective visual quality of the reconstructed image is maintained at extremely high compression efficiency, and decoding time and resource overhead are saved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
The structures, proportions, sizes, and the like shown in this specification are provided only to accompany the content disclosed in the specification so that those skilled in the art can understand and read it; they do not limit the conditions under which the present invention can be implemented and thus carry no technically essential significance. Any structural modification, change in proportion, or adjustment of size that does not affect the functions and purposes of the present invention shall still fall within the scope of the present invention.
Fig. 1 is a schematic flow chart of a face image compression method according to an embodiment of the present application;
fig. 2 is a flowchart of a technical solution provided in an embodiment of the present application;
fig. 3 is a diagram of a multi-tasking analysis network architecture provided by an embodiment of the present application;
fig. 4 is a schematic diagram illustrating a quality index of a compressed and reconstructed image according to an embodiment of the present disclosure;
fig. 5 is a facial image compression system according to an embodiment of the present application;
fig. 6 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application;
fig. 7 shows a schematic diagram of a computer-readable storage medium provided by an embodiment of the present application.
Detailed Description
The present invention is described below in terms of particular embodiments; other advantages and features of the invention will become apparent to those skilled in the art from the following disclosure. It is to be understood that the described embodiments are merely exemplary of the invention and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
At the same code rate, the image compression method provided by the embodiment of the application outperforms the prior art on various evaluation indexes simulating human visual perception; meanwhile, the compressed data is used directly as the input of downstream visual analysis tasks, with little loss of analysis accuracy compared with the original image, thereby greatly saving the storage resource overhead of the decoding end and the network transmission bandwidth.
Fig. 1 illustrates an image compression method provided by an embodiment of the present application, where the method includes the following steps:
step 101: inputting an original face image into a style encoder and a content encoder to extract style features and structural features;
step 102: respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into a decoder and a multitask analysis network;
step 103: the decoder reconstructs the images of the style coding bit stream and the structure coding bit stream and outputs a reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image.
In a possible implementation manner, in step 102, performing probability estimation and entropy coding respectively to obtain a style coded bitstream corresponding to the style characteristic and a structure coded bitstream corresponding to the structure characteristic, including:
quantizing the style characteristic and the structural characteristic respectively to obtain quantized style characteristic and structural characteristic; and entropy coding the quantized style features and the structure features according to probability estimation results calculated by the probability estimation model respectively to obtain style coding bit streams corresponding to the style features and structure coding bit streams corresponding to the structure features.
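As a minimal sketch of this step (assuming a PyTorch-style setup; `prob_model` and its interface are illustrative names, not taken from the patent):

```python
import torch

def quantize(y: torch.Tensor) -> torch.Tensor:
    # Hard rounding used at test time; training replaces this with
    # additive uniform noise (see the training notes further below).
    return torch.round(y)

def estimated_bits(y_hat: torch.Tensor, prob_model) -> torch.Tensor:
    # prob_model is assumed to return per-element likelihoods p(y_hat);
    # an ideal entropy coder spends -log2 p bits per element, so the
    # summed self-information approximates the coded bit-stream length.
    p = prob_model(y_hat).clamp_min(1e-9)
    return (-torch.log2(p)).sum()
```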
In one possible implementation, in step 103, the decoder reconstructs an image of the style coded bitstream and the structure coded bitstream, and outputs a reconstructed image, including:
fusing the style coding bit stream and the structure coding bit stream through a fusion module in the decoder, and learning the mean value and the variance of the convolution layers in the residual block through multi-layer perceptron (MLP) processing; executing an image compression task on the fused coded bit stream through a generator in the decoder to obtain a compressed reconstructed image; discriminating the compressed reconstructed image through the discriminator to obtain a loss optimization function; the generator is trained according to the loss optimization function.
In one possible embodiment, the loss optimization functions are according to the following formulas (1) and (2):

L_EGP = E_x[ -λ·log p(ŷ) + d(x, x̂) - β·log D(x̂) ]    formula (1)

L_D = E_x[ -log(1 - D(x̂)) - log D(x) ]    formula (2)

wherein D is the discriminator, E is the content encoder and the style encoder, G is the generator, P is the probability estimation model, x is the original face image, x̂ is the reconstructed image, ŷ is the quantized style and structure features, p is the probability estimation result, and λ and β are hyper-parameters.
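A hedged sketch of these two objectives (assuming PyTorch, sigmoid discriminator outputs in (0, 1), and that `rate`, `distortion`, `d_real` and `d_fake` are computed elsewhere; the function names are illustrative):

```python
import torch

def loss_EGP(rate, distortion, d_fake, lam: float, beta: float):
    # Joint objective of encoders E, generator G and probability model P:
    # weighted rate, distortion d(x, x_hat), and a non-saturating
    # adversarial term that rewards fooling the discriminator.
    adv = -torch.log(d_fake.clamp_min(1e-9)).mean()
    return lam * rate + distortion + beta * adv

def loss_D(d_real, d_fake):
    # Discriminator objective: push D(x) toward 1 for original images
    # and D(x_hat) toward 0 for compressed reconstructions.
    eps = 1e-9
    return (-torch.log(d_real.clamp_min(eps))
            - torch.log((1.0 - d_fake).clamp_min(eps))).mean()
```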
In a possible implementation manner, in step 103, the multitask analysis network performs semantic understanding analysis on the style coding bit stream and the structure coding bit stream, and outputs semantic information of an image, including:
inputting the style coding bit stream and the structure coding bit stream into the multitask analysis network, fusing the coding bit streams through a fusion module, and training the multitask analysis network according to a multitask analysis loss function to obtain a corresponding task result which is used as the output of the semantic information of the image.
In this compression framework, the intermediate compression result is used directly, without decoding, as the input of multiple analysis tasks to obtain the semantic information of the original image signal. Multitasking here refers to a variety of vision- and semantics-related tasks such as recognition, detection, and segmentation.
In one possible embodiment, the multitask analysis loss function L_multi is calculated according to the following formula (3):

L_multi = λ_cls·l_cls + λ_seg·l_seg    formula (3)

wherein l_cls and l_seg are the loss functions of the classification task and the segmentation task, respectively, and λ_cls and λ_seg are the corresponding weight hyper-parameters.
In one possible embodiment, the method further comprises: training parameters in the multi-task analysis model through optimization of the multi-task analysis loss function to obtain a global optimal solution; wherein the total loss function applied in the training of the multi-task analysis model is according to the following formula (4):

L = L_EGP + L_D + γ·L_multi    formula (4)

wherein γ is a hyper-parameter.
The face image compression method provided by the embodiment of the application applies the construction idea of current generative models that represent image signals hierarchically: the original input image is mapped into style features and structural features, so that further quantization and entropy coding are carried out on the corresponding feature distributions. In addition, the embodiment of the application directly adopts compressed-domain data as the input of multiple subsequent visual tasks. Because the compressed data is an efficient and compact form of data expression, the embodiment of the application provides a multi-task analysis network model that obtains the semantic information of the original image from the compressed data at low operation cost, without decompression. Meanwhile, the rate-distortion loss function and the machine analysis target loss function are jointly optimized to obtain a common solution of the image compression task and various machine vision analysis tasks.
Fig. 2 shows a model architecture diagram applicable to the face image compression method for multi-vision analysis task according to the embodiment of the present application, which mainly includes a compression model and a multi-task analysis model.
The compression model mainly comprises four main parts: an encoder, a generator, a discriminator and a probability estimation model. The encoder includes a content encoder and a genre encoder.
Given an original image x, the encoder first encodes it as y = E(x), then quantizes it as ŷ = Q(y), where Q is the quantization function. After that, according to the probability estimation result p(ŷ) given by the probability estimation model, ŷ is losslessly encoded into a bit stream using an entropy coding method. At the decoder end, x̂ = G(ŷ), where x̂ is the reconstructed image.
The original picture to be compressed is mapped to a visual semantic feature domain and decomposed into style features and structural features. Probability distribution fitting is performed on the decomposed style features and structural features with mutually independent probability estimation models, and the entropy of the fitted probabilities is taken as the code rate value obtained by actual coding.
At the semantic feature level, the input original image signal is decomposed into content features and style features. An input image x is encoded into a content representation and a style representation using two separate encoders E_1 and E_2. The content feature and the style feature are y_1 = E_1(x) and y_2 = E_2(x), respectively. Then, quantization is carried out with a quantization function Q to obtain ŷ_1 = Q(y_1) and ŷ_2 = Q(y_2). Because the decoupled features have mutually independent data distributions, a probability estimation model is set for each layer, namely p_1(y_1|z_1) and p_2(y_2|z_2). According to the probability estimation results p_1(y_1|z_1) and p_2(y_2|z_2) given by the probability estimation models, ŷ_1 and ŷ_2 are losslessly encoded into bit streams using an entropy coding method, and the entropy of the fitted probabilities is taken as the code rate value obtained by actual coding.
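A compact sketch of this dual-branch pipeline (assuming PyTorch; the module arguments and the use of rounding for Q are illustrative assumptions, not the patent's actual networks):

```python
import torch
import torch.nn as nn

class DualBranchCodec(nn.Module):
    """Two separate encoders E1/E2 with mutually independent
    probability models P1/P2, as described above."""
    def __init__(self, E1: nn.Module, E2: nn.Module,
                 P1: nn.Module, P2: nn.Module):
        super().__init__()
        self.E1, self.E2, self.P1, self.P2 = E1, E2, P1, P2

    def forward(self, x: torch.Tensor):
        y1, y2 = self.E1(x), self.E2(x)                    # content / style
        y1_hat, y2_hat = torch.round(y1), torch.round(y2)  # quantization Q
        # Entropy of the fitted probabilities, taken as the code rate
        # the entropy coder would achieve for each feature layer.
        bits1 = -torch.log2(self.P1(y1_hat).clamp_min(1e-9)).sum()
        bits2 = -torch.log2(self.P2(y2_hat).clamp_min(1e-9)).sum()
        return y1_hat, y2_hat, bits1 + bits2
```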
The features are further compressed by an entropy coding method. Entropy coding requires the probability distribution of the elements in the features, so the probability of each occurring element is estimated by a probability estimation model. Entropy coding methods include, but are not limited to, Huffman coding, arithmetic coding, and context-based binary arithmetic coding.
The coded bit stream is referred to as a code stream. The features are compressed by an entropy encoder to obtain a binary file, and a code stream 1 and a code stream 2 in fig. 2 are entropy encoding results of the content features and the style features respectively.
The embodiment of the application also provides a codec based on semantic layering, whose decoder part mainly comprises a generator and a discriminator. The embodiment of the application designs a fusion module at the decoding end. The fusion module is based on Adaptive Instance Normalization (AdaIN) residual blocks: the content features are input directly to the AdaIN module, while the style features are processed by a multi-layer perceptron (MLP) to learn the mean and variance of the convolutional layers in the AdaIN residual block.
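A minimal sketch of such a fusion block (assuming PyTorch; the MLP width, the use of two AdaIN applications per block, and all names are illustrative assumptions):

```python
import torch
import torch.nn as nn

class AdaINResBlock(nn.Module):
    """Content features enter the residual block directly; an MLP maps
    the style feature to per-channel mean/std for the AdaIN layers."""
    def __init__(self, channels: int, style_dim: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.norm = nn.InstanceNorm2d(channels, affine=False)
        # The MLP predicts (mean, std) pairs for both AdaIN applications.
        self.mlp = nn.Sequential(
            nn.Linear(style_dim, 256), nn.ReLU(),
            nn.Linear(256, 4 * channels))

    def adain(self, h, mean, std):
        # Re-normalize h, then scale and shift with style statistics.
        return std.view(-1, h.size(1), 1, 1) * self.norm(h) \
             + mean.view(-1, h.size(1), 1, 1)

    def forward(self, content, style):
        m1, s1, m2, s2 = self.mlp(style).chunk(4, dim=1)
        h = torch.relu(self.adain(self.conv1(content), m1, s1))
        h = self.adain(self.conv2(h), m2, s2)
        return content + h  # residual connection
```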
At the decoder side, x̂ = G(ŷ_1, ŷ_2), where x̂ is the reconstructed image.
According to the embodiment of the application, the compression model is optimized according to rate-distortion theory. There are two reference indicators in the distortion loss metric: the pixel-level Mean Absolute Error (MAE) loss d_MAE, and the SSIM loss d_SSIM for evaluating the overall structure. Meanwhile, in consideration of subjective perceptual quality, a perceptual distortion loss d_p is adopted, which simulates human visual perception with high-order features extracted by the pre-trained convolutional neural network VGG16.
The total distortion loss of the compression model provided by the embodiment of the application is calculated as follows:

d = λ_MAE·d_MAE + λ_SSIM·d_SSIM + λ_p·d_p    formula (5)

wherein λ_MAE, λ_SSIM and λ_p are hyper-parameters, d_MAE is the pixel-level Mean Absolute Error (MAE) loss, d_SSIM is the SSIM loss for overall structural evaluation, and d_p is the perceptual distortion loss.
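A hedged sketch of formula (5) (assuming PyTorch; `ssim_fn` and `vgg_features` stand for an SSIM implementation and a frozen VGG16 feature extractor supplied by the caller, both assumptions rather than components named by the patent):

```python
import torch
import torch.nn.functional as F

def total_distortion(x, x_hat, vgg_features, ssim_fn,
                     lam_mae=1.0, lam_ssim=1.0, lam_p=1.0):
    # d = lam_mae*d_MAE + lam_ssim*d_SSIM + lam_p*d_p  (formula (5))
    d_mae = F.l1_loss(x_hat, x)                     # pixel-level MAE
    d_ssim = 1.0 - ssim_fn(x_hat, x)                # structural term
    d_p = F.mse_loss(vgg_features(x_hat),
                     vgg_features(x))               # perceptual (VGG16)
    return lam_mae * d_mae + lam_ssim * d_ssim + lam_p * d_p
```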
Based on the definition of the loss terms above, the loss optimization function of each module (referring to the modules of fig. 2 indicated by the subscripts of L: E the content and style encoders, G the generator, P the probability estimation model, and D the discriminator) can be defined as formula (1) and formula (2).
According to the embodiment of the application, the expected transmission code rate of the code stream is changed by changing the number of feature channels when training the compression model, so that models with extremely high compression ratios can be obtained more effectively.
In the training process, additive uniform noise is used to avoid the problem that the quantization operation is non-differentiable during back-propagation.
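A minimal sketch of this trick (assuming PyTorch; the function name is illustrative):

```python
import torch

def quantize_with_noise(y: torch.Tensor, training: bool) -> torch.Tensor:
    # During training, additive uniform noise in [-0.5, 0.5) stands in
    # for hard rounding so gradients can flow through the quantizer;
    # at test time true rounding is applied.
    if training:
        return y + torch.empty_like(y).uniform_(-0.5, 0.5)
    return torch.round(y)
```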
The probability distribution fitting methods used in training include, but are not limited to, the Gaussian model and the Gaussian mixture model.
Fig. 3 shows a schematic diagram of the multitask analysis network structure provided in the embodiment of the present application. In terms of the multi-task analysis network model, the compressed-domain multi-task analysis network targets a classification task and a semantic segmentation task, and adopts corresponding network structure designs and loss functions. ASPP refers to Atrous Spatial Pyramid Pooling.
The style features and structural features are input into the multi-task analysis network and first pass through the fusion module; task network branches are then designed according to the characteristics of the different tasks, and the corresponding results are output for each task, as sketched below.
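For instance (a hedged sketch assuming PyTorch; the branch layouts, the `fusion` and `aspp` modules, and all dimensions are illustrative assumptions rather than the patent's actual design):

```python
import torch
import torch.nn as nn

class MultiTaskHead(nn.Module):
    """Shared fusion stem over the compressed features, then a
    classification branch and a segmentation branch."""
    def __init__(self, fusion: nn.Module, aspp: nn.Module,
                 feat_dim: int, num_classes: int, num_seg_classes: int):
        super().__init__()
        self.fusion = fusion
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(feat_dim, num_classes))
        # The ASPP module is assumed to preserve the channel count.
        self.seg_head = nn.Sequential(
            aspp, nn.Conv2d(feat_dim, num_seg_classes, 1))

    def forward(self, style_feat, struct_feat):
        h = self.fusion(style_feat, struct_feat)
        return self.cls_head(h), self.seg_head(h)
```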
The classification task and the segmentation task, which have been widely studied, are exemplified in the embodiments of the present application because they are the most representative tasks in the visual analysis.
The balance between tasks is controlled by setting hyper-parameters, so the multi-task analysis loss L_multi can be expressed as formula (6):

L_multi = λ_cls·l_cls + λ_seg·l_seg    formula (6)

wherein l_cls and l_seg are the loss functions of the classification task and the segmentation task, respectively, and λ_cls and λ_seg are their corresponding weight hyper-parameters.
To further examine the relationship between compression and visual analysis, two analysis network training methods were validated: separate training and joint training.
Separate training involves a smaller number of parameters and is easier to carry out. In the separate-training setting, the embodiment of the application fixes the compression model and trains only the multi-task analysis model.
Joint training directly balances the relationship among compression, reconstruction and analysis during training, making it easier to find the global optimum and achieving a better analysis effect.
For the joint training method, the compression model and the multi-task analysis model are jointly optimized; the total loss function is defined as formula (4), where the hyper-parameter γ balances the weight between the compression task and the visual analysis tasks.
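As a small illustrative sketch (plain Python; the argument names are assumptions), the combined objective of formulas (3)/(6) and (4) is:

```python
def total_loss(l_egp, l_d, l_cls, l_seg,
               lam_cls=1.0, lam_seg=1.0, gamma=1.0):
    # L_multi = lam_cls*l_cls + lam_seg*l_seg   (formulas (3)/(6))
    l_multi = lam_cls * l_cls + lam_seg * l_seg
    # L = L_EGP + L_D + gamma*L_multi           (formula (4))
    return l_egp + l_d + gamma * l_multi
```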
According to the method and the device, the rate distortion loss function and the machine analysis target loss function are optimized in a combined mode, and a common solution of an image compression task and various machine vision analysis tasks is obtained.
The performance of the reconstructed images of the compression model on four perception-based image evaluation indexes in the embodiment of the application is shown in fig. 4; on all image quality indexes, the method greatly exceeds the existing traditional coding methods and the deep-learning-based end-to-end compression methods.
According to the face image compression method provided by the embodiment of the application, the original image signal is decoupled into style features and structural features in the visual feature domain, distribution fitting is carried out on the features with mutually independent probability estimation models, and the coded bit streams are then obtained through an entropy coder; in order to analyze the compressed data directly and obtain semantic information, a multi-task analysis model is provided.
The method provided by the embodiment of the application can keep high subjective visual evaluation quality of the reconstructed image under the condition of extremely high compression efficiency, can receive the code stream without decoding at a decoding end, and can acquire the semantic information of the original image by using the multitask analysis network acting on compressed data, thereby saving decoding time and resource overhead.
In summary, the embodiment of the present application provides an image compression method, which inputs an original face image into a style encoder and a content encoder to extract style features and structural features; respectively carries out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style features and a structure coding bit stream corresponding to the structural features, and inputs the style coding bit stream and the structure coding bit stream into a decoder and a multitask analysis network; the decoder reconstructs an image from the style coding bit stream and the structure coding bit stream and outputs the reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image. High subjective visual quality of the reconstructed image is maintained at extremely high compression efficiency, and decoding time and resource overhead are saved.
Based on the same technical concept, an embodiment of the present application further provides a face image compression system, as shown in fig. 5, the system includes:
a feature extraction module 501, configured to input an original face image into a style encoder and a content encoder to extract style features and structural features;
the encoding module 502 is configured to perform probability estimation and entropy encoding respectively to obtain a style encoded bit stream corresponding to the style characteristics and a structure encoded bit stream corresponding to the structure characteristics, and input the style encoded bit stream and the structure encoded bit stream to the decoder and the multitask analysis network;
a compression decoding module 503, configured to reconstruct the image of the style coded bitstream and the structure coded bitstream by a decoder, and output a reconstructed image;
and a multitask analysis module 504, configured to perform semantic understanding analysis on the style coded bit stream and the structure coded bit stream by using a multitask analysis network, and output semantic information of an image.
The embodiment of the application also provides electronic equipment corresponding to the method provided by the embodiment. Please refer to fig. 6, which illustrates a schematic diagram of an electronic device according to some embodiments of the present application. The electronic device 20 may include: a processor 200, a memory 201, a bus 202 and a communication interface 203, wherein the processor 200, the communication interface 203 and the memory 201 are connected through the bus 202; the memory 201 stores a computer program that can run on the processor 200, and the processor 200 executes the computer program to perform the method provided by any of the foregoing embodiments of the present application.
The Memory 201 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile memory, such as at least one disk memory. The communication connection between the network element of the system and at least one other network element is realized through the at least one communication interface 203 (which may be wired or wireless), and the internet, a wide area network, a local area network, a metropolitan area network, and the like can be used.
Bus 202 can be an ISA bus, a PCI bus, an EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory 201 is used for storing a program, and the processor 200 executes the program after receiving an execution instruction; the method disclosed by any of the foregoing embodiments of the present application may be applied to the processor 200 or implemented by the processor 200.
The processor 200 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 200. The Processor 200 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in a decoding processor. The software module may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 201, and the processor 200 reads the information in the memory 201 and completes the steps of the method in combination with its hardware.
The electronic device provided by the embodiment of the application and the method provided by the embodiment of the application have the same inventive concept and have the same beneficial effects as the method adopted, operated or realized by the electronic device.
Referring to fig. 7, the computer-readable storage medium is an optical disc 30, on which a computer program (i.e., a program product) is stored, and when the computer program is executed by a processor, the computer program performs the method of any of the foregoing embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above-mentioned embodiments of the present application and the method provided by the embodiments of the present application have the same advantages as the method adopted, executed or implemented by the application program stored in the computer-readable storage medium.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best mode of use of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. However, this disclosure should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the creation apparatus of a virtual machine according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for compressing a face image, the method comprising:
inputting an original face image into a style encoder and a content encoder to extract style features and structural features;
respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into a decoder and a multitask analysis network;
the decoder reconstructs the images of the style coding bit stream and the structure coding bit stream and outputs a reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image.
2. The method of claim 1, wherein the performing probability estimation and entropy coding respectively to obtain a style coding bit stream corresponding to the style features and a structure coding bit stream corresponding to the structural features comprises:
quantizing the style characteristic and the structural characteristic respectively to obtain quantized style characteristic and structural characteristic;
and entropy coding the quantized style features and the structure features according to probability estimation results calculated by the probability estimation model respectively to obtain style coding bit streams corresponding to the style features and structure coding bit streams corresponding to the structure features.
3. The method of claim 1, wherein the decoder reconstructing the images of the style coding bit stream and the structure coding bit stream comprises:
fusing the style coding bit stream and the structure coding bit stream through a fusion module in a decoder, and learning the mean value and the variance of the convolution layers in the residual block through multi-layer perceptron (MLP) processing;
executing an image compression task on the fused coded bit stream through a generator in a decoder to obtain a compressed reconstructed image;
discriminating the compressed reconstructed image through a discriminator to obtain a loss optimization function; the generator is trained according to the loss optimization function.
4. The method of claim 3, wherein the loss optimization functions are in accordance with the following formulas:

L_EGP = E_x[ -λ·log p(ŷ) + d(x, x̂) - β·log D(x̂) ]

L_D = E_x[ -log(1 - D(x̂)) - log D(x) ]

wherein D is the discriminator, E is the content encoder and the style encoder, G is the generator, P is the probability estimation model, x is the original face image, x̂ is the reconstructed image, ŷ is the quantized style and structure features, p is the probability estimation result, and λ and β are hyper-parameters.
5. The method of claim 1, wherein the multitask analysis network performing semantic understanding analysis on the style coding bit stream and the structure coding bit stream to output semantic information of an image comprises:
inputting the style coding bit stream and the structure coding bit stream into the multitask analysis network, fusing the coding bit streams through a fusion module, and training the multitask analysis network according to a multitask analysis loss function to obtain a corresponding task result which is used as the output of the semantic information of the image.
6. The method of claim 5, wherein the multitask analysis loss function L_multi is calculated according to the following formula:

L_multi = λ_cls·l_cls + λ_seg·l_seg

wherein l_cls and l_seg are the loss functions of the classification task and the segmentation task, respectively, and λ_cls and λ_seg are the corresponding weight hyper-parameters.
7. The method of any of claims 1 to 6, further comprising: training parameters in the multi-task analysis model through optimization of the multi-task analysis loss function to obtain a global optimal solution; wherein the total loss function applied in the training of the multi-task analysis model is according to the following formula:

L = L_EGP + L_D + γ·L_multi

wherein γ is a hyper-parameter.
8. A face image compression system, the system comprising:
the feature extraction module is used for inputting an original face image into a style encoder and a content encoder so as to extract style features and structural features;
the coding module is used for respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into the decoder and the multitask analysis network;
the compression decoding module is used for reconstructing the images of the style coding bit stream and the structure coding bit stream by a decoder and outputting a reconstructed image;
and the multitask analysis module is used for carrying out semantic understanding analysis on the style coding bit stream and the structure coding bit stream by the multitask analysis network and outputting semantic information of the image.
9. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the computer program to implement the method according to any of claims 1-7.
10. A computer-readable storage medium having computer-readable instructions stored thereon, the computer-readable instructions being executable by a processor to implement the method of any one of claims 1-7.
CN202210013946.6A 2022-01-06 2022-01-06 Face image compression method and system Pending CN114519750A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210013946.6A CN114519750A (en) 2022-01-06 2022-01-06 Face image compression method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210013946.6A CN114519750A (en) 2022-01-06 2022-01-06 Face image compression method and system

Publications (1)

Publication Number Publication Date
CN114519750A true CN114519750A (en) 2022-05-20

Family

ID=81597218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210013946.6A Pending CN114519750A (en) 2022-01-06 2022-01-06 Face image compression method and system

Country Status (1)

Country Link
CN (1) CN114519750A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115880762A (en) * 2023-02-21 2023-03-31 中国传媒大学 Scalable human face image coding method and system for human-computer mixed vision

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991329A (en) * 2019-11-29 2020-04-10 上海商汤智能科技有限公司 Semantic analysis method and device, electronic equipment and storage medium
CN111199550A (en) * 2020-04-09 2020-05-26 腾讯科技(深圳)有限公司 Training method, segmentation method, device and storage medium of image segmentation network
CN112766079A (en) * 2020-12-31 2021-05-07 北京航空航天大学 Unsupervised image-to-image translation method based on content style separation
CN112819689A (en) * 2021-02-02 2021-05-18 百果园技术(新加坡)有限公司 Training method of face attribute editing model, face attribute editing method and equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110991329A (en) * 2019-11-29 2020-04-10 上海商汤智能科技有限公司 Semantic analysis method and device, electronic equipment and storage medium
CN111199550A (en) * 2020-04-09 2020-05-26 腾讯科技(深圳)有限公司 Training method, segmentation method, device and storage medium of image segmentation network
CN112766079A (en) * 2020-12-31 2021-05-07 北京航空航天大学 Unsupervised image-to-image translation method based on content style separation
CN112819689A (en) * 2021-02-02 2021-05-18 百果园技术(新加坡)有限公司 Training method of face attribute editing model, face attribute editing method and equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Ma Siwei (马思伟) et al., "Intelligent Video Coding" (智能视频编码), Artificial Intelligence (人工智能), 10 April 2020 (2020-04-10) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115880762A (en) * 2023-02-21 2023-03-31 中国传媒大学 Scalable human face image coding method and system for human-computer mixed vision

Similar Documents

Publication Publication Date Title
CN109218727B (en) Video processing method and device
CN111641832B (en) Encoding method, decoding method, device, electronic device and storage medium
CN113259665B (en) Image processing method and related equipment
CN113259676B (en) Image compression method and device based on deep learning
CN111641826B (en) Method, device and system for encoding and decoding data
CN116600119B (en) Video encoding method, video decoding method, video encoding device, video decoding device, computer equipment and storage medium
CN113079378B (en) Image processing method and device and electronic equipment
CN110930408A (en) Semantic image compression method based on knowledge reorganization
CN111246206A (en) Optical flow information compression method and device based on self-encoder
Löhdefink et al. GAN-vs. JPEG2000 image compression for distributed automotive perception: Higher peak SNR does not mean better semantic segmentation
CN114519750A (en) Face image compression method and system
CN114501031B (en) Compression coding and decompression method and device
CN113382244B (en) Coding and decoding network structure, image compression method, device and storage medium
CN115866265A (en) Multi-code-rate depth image compression system and method applied to mixed context
CN113554719B (en) Image encoding method, decoding method, storage medium and terminal equipment
CN118020306A (en) Video encoding and decoding method, encoder, decoder, and storage medium
CN111565314A (en) Image compression method, coding and decoding network training method and device and electronic equipment
Li et al. You Can Mask More For Extremely Low-Bitrate Image Compression
WO2024060161A1 (en) Encoding method, decoding method, encoder, decoder and storage medium
US20230316048A1 (en) Multi-rate computer vision task neural networks in compression domain
US20230306239A1 (en) Online training-based encoder tuning in neural image compression
US11683515B2 (en) Video compression with adaptive iterative intra-prediction
US20230336738A1 (en) Multi-rate of computer vision task neural networks in compression domain
US20230316588A1 (en) Online training-based encoder tuning with multi model selection in neural image compression
US20230334718A1 (en) Online training computer vision task models in compression domain

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination