CN114519750A - Face image compression method and system - Google Patents
Face image compression method and system
- Publication number
- CN114519750A (application CN202210013946.6A)
- Authority
- CN
- China
- Prior art keywords
- style
- bit stream
- coding bit
- image
- analysis
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 67
- 230000006835 compression Effects 0.000 title claims abstract description 49
- 238000007906 compression Methods 0.000 title claims abstract description 49
- 238000004458 analytical method Methods 0.000 claims abstract description 82
- 230000006870 function Effects 0.000 claims description 37
- 238000012549 training Methods 0.000 claims description 20
- 238000005457 optimization Methods 0.000 claims description 14
- 238000004590 computer program Methods 0.000 claims description 11
- 230000004927 fusion Effects 0.000 claims description 9
- 230000008447 perception Effects 0.000 claims description 7
- 238000012545 processing Methods 0.000 claims description 7
- 230000011218 segmentation Effects 0.000 claims description 7
- 238000000605 extraction Methods 0.000 claims description 3
- 238000007476 Maximum Likelihood Methods 0.000 claims 1
- 230000000007 visual effect Effects 0.000 abstract description 13
- 238000011156 evaluation Methods 0.000 abstract description 10
- 238000010586 diagram Methods 0.000 description 7
- 238000013139 quantization Methods 0.000 description 6
- 230000008569 process Effects 0.000 description 5
- 238000004891 communication Methods 0.000 description 3
- 238000013461 design Methods 0.000 description 3
- 230000000694 effects Effects 0.000 description 3
- 238000013528 artificial neural network Methods 0.000 description 2
- 230000005540 biological transmission Effects 0.000 description 2
- 230000003287 optical effect Effects 0.000 description 2
- 238000011176 pooling Methods 0.000 description 2
- 230000003044 adaptive effect Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 238000010276 construction Methods 0.000 description 1
- 238000013527 convolutional neural network Methods 0.000 description 1
- 238000000354 decomposition reaction Methods 0.000 description 1
- 230000006837 decompression Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000001514 detection method Methods 0.000 description 1
- 230000004438 eyesight Effects 0.000 description 1
- 210000000887 face Anatomy 0.000 description 1
- 230000001815 facial effect Effects 0.000 description 1
- 238000013507 mapping Methods 0.000 description 1
- 230000007246 mechanism Effects 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 238000003062 neural network model Methods 0.000 description 1
- 238000010606 normalization Methods 0.000 description 1
- 230000000644 propagated effect Effects 0.000 description 1
- 238000011160 research Methods 0.000 description 1
- 230000003068 static effect Effects 0.000 description 1
- 238000006467 substitution reaction Methods 0.000 description 1
- 230000016776 visual perception Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T9/00—Image coding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Computational Biology (AREA)
- General Engineering & Computer Science (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Compression Or Coding Systems Of Tv Signals (AREA)
Abstract
The embodiments of the application disclose a face image compression method and system, wherein the method comprises the following steps: inputting an original face image into a style encoder and a content encoder to extract style features and structural features; respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style features and a structure coding bit stream corresponding to the structural features, and inputting the style coding bit stream and the structure coding bit stream into a decoder and a multitask analysis network; the decoder reconstructs an image from the style coding bit stream and the structure coding bit stream and outputs the reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image. At extremely high compression efficiency, high subjective visual quality of the reconstructed image is maintained, and decoding time and resource overhead are saved.
Description
Technical Field
The embodiment of the application relates to the technical field of digital signal processing, in particular to a method and a system for compressing a human face image.
Background
Image/video compression methods based on neural networks have developed rapidly in recent years, and the quality of their compressed and reconstructed images already exceeds that of the new-generation video coding standard VVC (Versatile Video Coding) on objective indexes such as PSNR (Peak Signal-to-Noise Ratio) and MS-SSIM (Multi-Scale Structural Similarity). Compression frameworks based on generative models can greatly improve the compression ratio without harming the evaluation indexes that directly reflect viewing quality as perceived by the human eye.
Currently, across various lines of research, end-to-end image coding based on neural networks faces two major problems: first, the representation mechanism for the input original image signal is limited and lacks support for the computer vision processing tasks that are now widely deployed; second, the resources at the signal receiving end are limited and cannot support neural network models with huge numbers of parameters.
Disclosure of Invention
Therefore, the embodiments of the application provide a face image compression method and system that maintain high subjective visual quality of the reconstructed image at extremely high compression efficiency, while saving decoding time and resource overhead.
In order to achieve the above object, the embodiments of the present application provide the following technical solutions:
according to a first aspect of the embodiments of the present application, there is provided a face image compression method, including:
inputting an original face image into a style encoder and a content encoder to extract style features and structural features;
respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into a decoder and a multitask analysis network;
the decoder reconstructs the images of the style coding bit stream and the structure coding bit stream and outputs a reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image.
Optionally, performing probability estimation and entropy coding respectively to obtain a style coded bit stream corresponding to the style characteristic and a structure coded bit stream corresponding to the structure characteristic, including:
quantizing the style characteristic and the structural characteristic respectively to obtain quantized style characteristic and structural characteristic;
and entropy coding the quantized style features and the structure features according to probability estimation results calculated by the probability estimation model respectively to obtain style coding bit streams corresponding to the style features and structure coding bit streams corresponding to the structure features.
Optionally, the reconstructing the image of the style coded bitstream and the structure coded bitstream by the decoder, and outputting a reconstructed image, including:
fusing the style coding bit stream and the structure coding bit stream through a fusion module in a decoder, and learning the mean and variance of the convolutional layers in a residual block through multi-layer perceptron (MLP) processing;
executing an image compression task on the fused coded bit stream through a generator in a decoder to obtain a compressed reconstructed image;
discriminating the compressed reconstructed image through a discriminator to obtain a loss optimization function; and training the generator according to the loss optimization function.
Optionally, the loss optimization function is according to formulas (1) and (2) below, where D is the discriminator, E denotes the content encoder and the style encoder, G is the generator, P is the probability estimation model, x is the original face image, x̂ is the reconstructed image, ŷ denotes the quantized style and structure features, p is the probability estimation result, and λ and β are hyper-parameters.
Optionally, the multitask analysis network performs semantic analysis on the style coded bitstream and the structure coded bitstream, and outputs semantic information of an image, including:
inputting the style coding bit stream and the structure coding bit stream into the multitask analysis network, fusing the coded bit streams through a fusion module, and training the multitask analysis network according to a multitask analysis loss function to obtain corresponding task results, which are output as the semantic information of the image.
Optionally, the multitask analysis loss function L_multi is calculated according to the following formula:

L_multi = λ_cls · l_cls + λ_seg · l_seg

where l_cls and l_seg are the loss functions of the classification task and the segmentation task, respectively, and λ_cls and λ_seg are the corresponding weight hyper-parameters.
Optionally, the method further comprises: training the parameters of the multi-task analysis model by optimizing the multi-task analysis loss function to obtain a global optimal solution; wherein the total loss function applied in training the multi-task analysis model is according to the following formula:

L = L_EGP + L_D + γ · L_multi

where γ is a hyper-parameter.
According to a second aspect of the embodiments of the present application, there is provided a face image compression system, the system including:
the feature extraction module is used for inputting an original face image into a style encoder and a content encoder to extract style features and structural features;
the coding module is used for respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into the decoder and the multitask analysis network;
the compression decoding module is used for reconstructing the images of the style coding bit stream and the structure coding bit stream by a decoder and outputting a reconstructed image;
and the multitask analysis module is used for carrying out semantic understanding analysis on the style coding bit stream and the structure coding bit stream by the multitask analysis network and outputting semantic information of the image.
According to a third aspect of embodiments herein, there is provided an electronic device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the processor executing the computer program to implement the method of the first aspect.
According to a fourth aspect of embodiments herein, there is provided a computer readable storage medium having stored thereon computer readable instructions executable by a processor to implement the method of the first aspect described above.
In summary, the embodiments of the present application provide a face image compression method and system: an original face image is input into a style encoder and a content encoder to extract style features and structural features; probability estimation and entropy coding are performed respectively to obtain a style coding bit stream corresponding to the style features and a structure coding bit stream corresponding to the structural features, and the two bit streams are input into a decoder and a multitask analysis network; the decoder reconstructs an image from the style coding bit stream and the structure coding bit stream and outputs the reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image. At extremely high compression efficiency, high subjective visual quality of the reconstructed image is maintained, and decoding time and resource overhead are saved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are merely exemplary, and other drawings can be derived from them by those of ordinary skill in the art without inventive effort.
The structures, proportions, sizes, and the like shown in this specification are used only to accompany the content disclosed in the specification, for the understanding and reading of those skilled in the art; they are not intended to limit the conditions under which the invention can be implemented and therefore have no essential technical significance. Any structural modification, change of proportional relationship, or adjustment of size that does not affect the functions and purposes achieved by the invention shall still fall within the scope covered by the invention.
Fig. 1 is a schematic flow chart of a face image compression method according to an embodiment of the present application;
fig. 2 is a flowchart of a technical solution provided in an embodiment of the present application;
fig. 3 is a diagram of a multi-tasking analysis network architecture provided by an embodiment of the present application;
fig. 4 is a schematic diagram illustrating a quality index of a compressed and reconstructed image according to an embodiment of the present disclosure;
fig. 5 is a facial image compression system according to an embodiment of the present application;
fig. 6 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application;
fig. 7 shows a schematic diagram of a computer-readable storage medium provided by an embodiment of the present application.
Detailed Description
The present invention is described herein through specific embodiments; other advantages and effects of the invention will be readily apparent to those skilled in the art from the disclosure in this specification. The described embodiments are merely a part, not all, of the embodiments of the invention and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without creative effort shall fall within the protection scope of the invention.
The image compression method provided by the embodiments of the application outperforms the prior art on various evaluation indexes simulating human visual perception at the same code rate; meanwhile, the compressed data is used directly as the input of downstream visual analysis tasks with little loss of analysis accuracy compared with the original image, thereby greatly saving the storage resource overhead and network transmission bandwidth of the decoding end.
Fig. 1 illustrates an image compression method provided by an embodiment of the present application, where the method includes the following steps:
step 101: inputting an original face image into a style encoder and a content encoder to extract style features and structural features;
step 102: respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into a decoder and a multitask analysis network;
step 103: the decoder reconstructs the images of the style coding bit stream and the structure coding bit stream and outputs a reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image.
In a possible implementation manner, in step 102, performing probability estimation and entropy coding respectively to obtain a style coded bitstream corresponding to the style characteristic and a structure coded bitstream corresponding to the structure characteristic, including:
quantizing the style characteristic and the structural characteristic respectively to obtain quantized style characteristic and structural characteristic; and entropy coding the quantized style features and the structure features according to probability estimation results calculated by the probability estimation model respectively to obtain style coding bit streams corresponding to the style features and structure coding bit streams corresponding to the structure features.
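As an illustration of this quantize-then-entropy-code step, here is a minimal PyTorch sketch; the factorized Gaussian entropy model is an assumption for concreteness (the description later names Gaussian and mixture-Gaussian fitting), and hard rounding stands in for the quantization function Q:

```python
import torch

def quantize(y: torch.Tensor) -> torch.Tensor:
    # Hard rounding, used at inference; training replaces it with
    # additive uniform noise (see the training notes further below).
    return torch.round(y)

def rate_bits(y_hat: torch.Tensor, mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    # Ideal code length in bits under a factorized Gaussian model:
    # p(ŷ) is the Gaussian probability mass on [ŷ - 0.5, ŷ + 0.5],
    # and an entropy coder approaches -log2 p(ŷ) bits per element.
    gauss = torch.distributions.Normal(mu, sigma)
    p = gauss.cdf(y_hat + 0.5) - gauss.cdf(y_hat - 0.5)
    return -torch.log2(p.clamp_min(1e-9)).sum()
```

The entropy of the fitted probabilities, summed this way, is what the description below takes as the code rate obtained by actual coding.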
In one possible implementation, in step 103, the decoder reconstructs an image of the style coded bitstream and the structure coded bitstream, and outputs a reconstructed image, including:
fusing the style coding bit stream and the structure coding bit stream through a fusion module in the decoder, and learning the mean and variance of the convolutional layers in a residual block through multi-layer perceptron (MLP) processing; executing an image compression task on the fused coded bit stream through a generator in the decoder to obtain a compressed reconstructed image; discriminating the compressed reconstructed image through the discriminator to obtain a loss optimization function; and training the generator according to the loss optimization function.
In one possible embodiment, the loss optimization function is according to formulas (1) and (2), where D is the discriminator, E denotes the content encoder and the style encoder, G is the generator, P is the probability estimation model, x is the original face image, x̂ is the reconstructed image, ŷ denotes the quantized style and structure features, p is the probability estimation result, and λ and β are hyper-parameters.
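As a hedged sketch only — assuming a HiFiC-style adversarial rate-distortion objective consistent with the variables just defined, not necessarily the exact formulas of the filing — the two losses could read:

```latex
\begin{align}
L_{EGP} &= \mathbb{E}_{x}\left[ \lambda\, r(\hat{y}) + d(x,\hat{x}) - \beta \log D(\hat{x},\hat{y}) \right] \tag{1} \\
L_{D}   &= \mathbb{E}_{x}\left[ -\log\!\left(1 - D(\hat{x},\hat{y})\right) \right]
         + \mathbb{E}_{x}\left[ -\log D(x,\hat{y}) \right] \tag{2}
\end{align}
```

Here r(ŷ) = −log₂ p(ŷ) is the code rate estimated by the probability model P, and d(x, x̂) is the total distortion of formula (5) below.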
In a possible implementation manner, in step 103, the multitask analysis network performs semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image, including:
inputting the style coding bit stream and the structure coding bit stream into the multitask analysis network, fusing the coding bit streams through a fusion module, and training the multitask analysis network according to a multitask analysis loss function to obtain a corresponding task result which is used as the output of the semantic information of the image.
In this compression framework, the intermediate compression result is used directly, without decoding, as the input of multiple analysis tasks to obtain the semantic information of the original image signal. Multitasking here refers to various vision- and semantics-related tasks such as recognition, detection, and segmentation.
In one possible embodiment, the multitask analysis loss function L_multi is calculated according to the following formula (3):

L_multi = λ_cls · l_cls + λ_seg · l_seg    formula (3)

where l_cls and l_seg are the loss functions of the classification task and the segmentation task, respectively, and λ_cls and λ_seg are the corresponding weight hyper-parameters.
In one possible embodiment, the method further comprises: training the parameters of the multi-task analysis model by optimizing the multi-task analysis loss function to obtain a global optimal solution; wherein the total loss function applied in training the multi-task analysis model is according to the following formula (4):

L = L_EGP + L_D + γ · L_multi    formula (4)

where γ is a hyper-parameter.
The face image compression method provided by the embodiments of the application follows the construction idea of current generative models that hierarchically represent image signals: the original input image is mapped into style features and structural features, and quantization and entropy coding are then applied to the corresponding feature distributions. In addition, the embodiments directly use compressed-domain data as the input of multiple subsequent visual tasks. Because compressed data is an efficient and compact form of expression, the embodiments provide a multi-task analysis network model that obtains the semantic information of the original image from the compressed data at low computational cost, without decompression. Meanwhile, the rate-distortion loss function and the machine-analysis target loss function are jointly optimized to obtain a common solution for the image compression task and multiple machine vision analysis tasks.
Fig. 2 shows a model architecture diagram applicable to the face image compression method for multi-vision analysis task according to the embodiment of the present application, which mainly includes a compression model and a multi-task analysis model.
The compression model mainly comprises four main parts: an encoder, a generator, a discriminator and a probability estimation model. The encoder includes a content encoder and a genre encoder.
Given an original image x, the encoder first encodes it as y = E(x), which is then quantized as ŷ = Q(y), where Q is the quantization function. The quantized features are then losslessly encoded into a bitstream by an entropy coding method, according to the probability estimates given by the probability estimation model. At the decoder end, x̂ = G(ŷ), where x̂ is the reconstructed image.
The original picture to be compressed is mapped into a visual semantic feature domain and decomposed into style features and structural features. Mutually independent probability estimation methods are used to fit probability distributions to the decomposed style and structural features, and the entropy of the fitted probabilities is taken as the code rate obtained by actual coding.
At the semantic feature level, the input original image signal is decomposed into content features and style features. An input image x is encoded into a content representation and a style representation by two separate encoders, E1 and E2; the content feature and the style feature are y1 = E1(x) and y2 = E2(x), respectively. Quantization with the quantization function Q then yields ŷ1 and ŷ2.

Because the decoupled features have mutually independent data distributions, a separate probability estimation model P is set for each stream, namely p1(y1|z1) and p2(y2|z2). According to the probability estimation results p1(y1|z1) and p2(y2|z2), ŷ1 and ŷ2 are losslessly encoded with an entropy coding method, and the entropy of the fitted probabilities is taken as the code rate obtained by actual coding.
The features are further compressed by entropy coding. Entropy coding requires the probability distribution of the elements in the features as input, so the probability of each occurring element is estimated by the probability estimation model. Entropy coding methods include, but are not limited to, Huffman coding, arithmetic coding, and context-based binary coding.
The coded bit stream is referred to as a code stream. The features are compressed by the entropy encoder into a binary file; code stream 1 and code stream 2 in Fig. 2 are the entropy coding results of the content features and the style features, respectively.
The embodiments of the application also provide a codec based on semantic layering; its decoder part mainly comprises a generator and a discriminator. A fusion module is designed at the decoding end. The fusion module is based on Adaptive Instance Normalization (AdaIN) residual blocks: the content features are input directly to the AdaIN module, while the style features are processed by a multi-layer perceptron (MLP), which learns the mean and variance of the convolutional layers in the residual block.
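A sketch of such an AdaIN residual block follows; the two-convolution layout and layer sizes are assumptions, since the patent does not disclose the exact architecture:

```python
import torch
import torch.nn as nn

class AdaINResBlock(nn.Module):
    # Fusion idea: an MLP maps the style features to the per-channel
    # mean/std that modulate the content pathway inside a residual block.
    def __init__(self, channels: int, style_dim: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.mlp = nn.Linear(style_dim, 4 * channels)  # (mean, std) for 2 convs

    def adain(self, h, mean, std):
        # Instance-normalize the content, then re-scale with style statistics.
        mu = h.mean(dim=(2, 3), keepdim=True)
        sigma = h.std(dim=(2, 3), keepdim=True) + 1e-5
        return std[..., None, None] * (h - mu) / sigma + mean[..., None, None]

    def forward(self, content, style):
        m1, s1, m2, s2 = self.mlp(style).chunk(4, dim=1)
        h = torch.relu(self.adain(self.conv1(content), m1, s1))
        h = self.adain(self.conv2(h), m2, s2)
        return content + h
```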
The embodiments of the application optimize the compression model according to rate-distortion theory. Two reference indicators are used in the distortion loss metric: the pixel-level Mean Absolute Error (MAE) loss d_MAE, and the SSIM loss d_SSIM, which evaluates the overall structure. Meanwhile, considering subjective perceptual quality, a perceptual distortion loss d_p is adopted, which simulates human visual perception using high-order features extracted by the pre-trained convolutional neural network VGG16.

The total distortion loss of the compression model provided by the embodiments of the application is calculated as follows:

d = λ_MAE · d_MAE + λ_SSIM · d_SSIM + λ_p · d_p    formula (5)

where λ_MAE, λ_SSIM and λ_p are hyper-parameters, d_MAE is the pixel-level Mean Absolute Error (MAE) loss, d_SSIM is the SSIM loss evaluating the overall structure, and d_p is the perceptual distortion loss.
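A PyTorch sketch of formula (5); the choice of VGG16 layer (up to relu3_3) and the single-window SSIM are simplifying assumptions, not choices stated in the patent:

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16

def ssim_global(a, b, c1=0.01**2, c2=0.03**2):
    # Single-window SSIM over the whole image (a simplification of the
    # usual 11x11 sliding-window SSIM).
    mu_a, mu_b = a.mean(dim=(2, 3)), b.mean(dim=(2, 3))
    var_a, var_b = a.var(dim=(2, 3)), b.var(dim=(2, 3))
    cov = ((a - mu_a[..., None, None]) * (b - mu_b[..., None, None])).mean(dim=(2, 3))
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)
            / ((mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2))).mean()

class TotalDistortion(torch.nn.Module):
    def __init__(self, l_mae=1.0, l_ssim=1.0, l_p=1.0):
        super().__init__()
        # Frozen VGG16 features as the perceptual metric d_p.
        self.vgg = vgg16(weights="IMAGENET1K_V1").features[:16].eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.l_mae, self.l_ssim, self.l_p = l_mae, l_ssim, l_p

    def forward(self, x, x_hat):
        d_mae = (x - x_hat).abs().mean()
        d_ssim = 1.0 - ssim_global(x, x_hat)
        d_p = F.mse_loss(self.vgg(x), self.vgg(x_hat))
        return self.l_mae * d_mae + self.l_ssim * d_ssim + self.l_p * d_p
```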
Based on the loss definitions above, the loss optimization functions of the modules in Fig. 2 (identified by the subscripts of L: E denotes the content and style encoders, G the generator, P the probability estimation model, and D the discriminator) can be defined as formulas (1) and (2).
When training the compression model, the expected code rate of the transmitted code stream is changed by changing the number of feature channels, so that models with extremely high compression ratios can be obtained more effectively.
During training, uniform noise is added to avoid the problem that the gradient of the quantization operation is not differentiable during backpropagation.
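This is the standard additive-noise trick from learned compression; a one-line sketch:

```python
import torch

def quantize_train(y: torch.Tensor) -> torch.Tensor:
    # Additive uniform noise in [-0.5, 0.5] is differentiable and
    # matches the error statistics of hard rounding at inference.
    return y + torch.empty_like(y).uniform_(-0.5, 0.5)
```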
The probability distribution fitting methods used in training include, but are not limited to, the Gaussian model and the Gaussian mixture model.
Fig. 3 shows a schematic diagram of the multitask analysis network structure provided by the embodiments of the application. The compressed-domain multitask analysis network targets a classification task and a semantic segmentation task, and adopts a corresponding network structure design and loss function for each. ASPP refers to Atrous Spatial Pyramid Pooling.
The style features and structural features are input into the multitask analysis network, first passing through a fusion module; task network branches are then designed for the characteristics of the different tasks, and the corresponding results are output according to each task, as sketched below.
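A hedged sketch of such per-task branches on top of the fused compressed-domain features; channel counts, dilation rates and the light ASPP variant are assumptions, not values from the patent:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskHeads(nn.Module):
    def __init__(self, in_ch: int, n_classes: int, n_seg_classes: int):
        super().__init__()
        # Classification branch: global pooling + linear layer.
        self.cls_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(in_ch, n_classes))
        # Segmentation branch: a light ASPP-style set of parallel
        # atrous (dilated) convolutions, then a 1x1 projection.
        self.aspp = nn.ModuleList([
            nn.Conv2d(in_ch, 64, 3, padding=r, dilation=r) for r in (1, 6, 12)])
        self.seg_out = nn.Conv2d(3 * 64, n_seg_classes, 1)

    def forward(self, fused, out_size):
        logits_cls = self.cls_head(fused)
        seg = torch.cat([F.relu(c(fused)) for c in self.aspp], dim=1)
        logits_seg = F.interpolate(self.seg_out(seg), size=out_size,
                                   mode="bilinear", align_corners=False)
        return logits_cls, logits_seg
```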
The classification task and the segmentation task are taken as examples in the embodiments because they have been widely studied and are the most representative tasks in visual analysis.
The balance between tasks is controlled by setting hyper-parameters, so the multitask analysis loss L_multi can be expressed as formula (6):

L_multi = λ_cls · l_cls + λ_seg · l_seg    formula (6)

where l_cls and l_seg are the loss functions of the classification task and the segmentation task, respectively, and λ_cls and λ_seg are their corresponding weight hyper-parameters.
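Continuing the sketch above, formula (6) then amounts to the following, where the weights `lam_cls`, `lam_seg` and the label tensors are placeholders:

```python
import torch.nn.functional as F

# logits_cls: (B, n_classes), labels: (B,)
# logits_seg: (B, n_seg_classes, H, W), masks: (B, H, W)
l_cls = F.cross_entropy(logits_cls, labels)      # classification loss
l_seg = F.cross_entropy(logits_seg, masks)       # per-pixel segmentation loss
loss_multi = lam_cls * l_cls + lam_seg * l_seg   # formula (6)
```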
To further examine the relationship between compression and visual analysis, two training methods for the analysis network were validated: separate training and joint training.

In the separate-training setting, the embodiments fix the compression model and train only the multi-task analysis model; this involves fewer parameters and is easier to train.
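In PyTorch terms, fixing the compression model amounts to freezing its parameters; a sketch with `compression_model` and `analysis_model` as hypothetical module names:

```python
import torch

# Separate training: freeze the compression model, optimize only the
# multi-task analysis network.
for p in compression_model.parameters():
    p.requires_grad_(False)
optimizer = torch.optim.Adam(analysis_model.parameters(), lr=1e-4)
```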
Joint training directly balances compression, reconstruction and analysis during training, making the global optimum easier to find and yielding better analysis results.

In the joint-training method, the compression model and the multi-task analysis model are optimized together; the total loss function is defined as formula (4), where the hyper-parameter γ balances the relative weight of the compression task and the visual analysis task.
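A hedged sketch of one joint-training step per formula (4); all component names are assumptions, the encoder is assumed to use the noise-based training quantizer sketched earlier, and the discriminator's alternating GAN update is omitted:

```python
import torch
import torch.nn.functional as F

def joint_step(batch, encode, generator, analysis, dist_loss, opt,
               lam: float = 0.01, gamma: float = 1.0):
    # One generator-side optimization step of L = L_EGP + gamma * L_multi.
    x, labels, masks = batch
    y1_hat, y2_hat, bits = encode(x)             # compressed-domain features
    x_hat = generator(y1_hat, y2_hat)            # reconstruction
    loss_egp = lam * bits + dist_loss(x, x_hat)  # rate + distortion
    logits_cls, logits_seg = analysis(y1_hat, y2_hat, x.shape[-2:])
    loss_multi = (F.cross_entropy(logits_cls, labels)
                  + F.cross_entropy(logits_seg, masks))
    total = loss_egp + gamma * loss_multi        # formula (4)
    opt.zero_grad(); total.backward(); opt.step()
    return float(total)
```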
According to the method and the device, the rate distortion loss function and the machine analysis target loss function are optimized in a combined mode, and a common solution of an image compression task and various machine vision analysis tasks is obtained.
Fig. 4 shows the performance of the compression model's reconstructions on four perception-based image quality indexes; on all of them the method greatly exceeds existing traditional coding methods and end-to-end compression methods based on deep learning.
In the face image compression method provided by the embodiments of the application, the original picture signal is decoupled into style features and structural features in the visual feature domain; mutually independent probability estimation models fit the distributions of these features, and the coded bit streams are then obtained through an entropy encoder. To analyze the compressed data directly and obtain semantic information, a multi-task analysis model is provided.
The method provided by the embodiments of the application maintains high subjective visual quality of the reconstructed image at extremely high compression efficiency; the decoding end can process the received code stream without decoding it, using the multitask analysis network acting on the compressed data to acquire the semantic information of the original image, thereby saving decoding time and resource overhead.
In summary, the embodiments of the present application provide an image compression method: an original face image is input into a style encoder and a content encoder to extract style features and structural features; probability estimation and entropy coding are performed respectively to obtain a style coding bit stream corresponding to the style features and a structure coding bit stream corresponding to the structural features, and the two bit streams are input into a decoder and a multitask analysis network; the decoder reconstructs an image from the style coding bit stream and the structure coding bit stream and outputs the reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image. At extremely high compression efficiency, high subjective visual quality of the reconstructed image is maintained, and decoding time and resource overhead are saved.
Based on the same technical concept, an embodiment of the present application further provides a face image compression system, as shown in fig. 5, the system includes:
a feature extraction module 501, configured to input an original face image into a style encoder and a content encoder to extract style features and structural features;
the encoding module 502 is configured to perform probability estimation and entropy encoding respectively to obtain a style encoded bit stream corresponding to the style characteristics and a structure encoded bit stream corresponding to the structure characteristics, and input the style encoded bit stream and the structure encoded bit stream to the decoder and the multitask analysis network;
a compression decoding module 503, configured to reconstruct the image of the style coded bitstream and the structure coded bitstream by a decoder, and output a reconstructed image;
and a multitask analysis module 504, configured to perform semantic understanding analysis on the style coded bit stream and the structure coded bit stream by using a multitask analysis network, and output semantic information of an image.
The embodiment of the application also provides electronic equipment corresponding to the method provided by the embodiment. Please refer to fig. 6, which illustrates a schematic diagram of an electronic device according to some embodiments of the present application. The electronic device 20 may include: the system comprises a processor 200, a memory 201, a bus 202 and a communication interface 203, wherein the processor 200, the communication interface 203 and the memory 201 are connected through the bus 202; the memory 201 stores a computer program that can be executed on the processor 200, and the processor 200 executes the computer program to perform the method provided by any of the foregoing embodiments of the present application.
The memory 201 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile memory, such as at least one disk memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 203 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, and the like can be used.
The processor 200 may be an integrated circuit chip having signal processing capability. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or by instructions in the form of software in the processor 200. The processor 200 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present application may be implemented or performed. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in RAM, flash memory, ROM, PROM or EPROM, registers, or other storage media well known in the art. The storage medium is located in the memory 201, and the processor 200 reads the information in the memory 201 and completes the steps of the method in combination with its hardware.
The electronic device provided by the embodiment of the application and the method provided by the embodiment of the application have the same inventive concept and have the same beneficial effects as the method adopted, operated or realized by the electronic device.
Referring to fig. 7, the computer-readable storage medium is an optical disc 30, on which a computer program (i.e., a program product) is stored, and when the computer program is executed by a processor, the computer program performs the method of any of the foregoing embodiments.
It should be noted that examples of the computer-readable storage medium may also include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory, or other optical and magnetic storage media, which are not described in detail herein.
The computer-readable storage medium provided by the above-mentioned embodiments of the present application and the method provided by the embodiments of the present application have the same advantages as the method adopted, executed or implemented by the application program stored in the computer-readable storage medium.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best mode of use of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding the understanding of one or more of the various inventive aspects. However, the disclosed method is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the creation apparatus of a virtual machine according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etc., does not indicate any ordering; these words may be interpreted as names.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
Claims (10)
1. A method for compressing a face image, the method comprising:
inputting an original face image into a style encoder and a content encoder to extract style features and structural features;
respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into a decoder and a multitask analysis network;
the decoder reconstructs the images of the style coding bit stream and the structure coding bit stream and outputs a reconstructed image; and the multitask analysis network carries out semantic understanding analysis on the style coding bit stream and the structure coding bit stream and outputs semantic information of the image.
2. The method of claim 1, wherein performing probability estimation and entropy coding respectively to obtain a style coded bitstream corresponding to the style characteristic and a structure coded bitstream corresponding to the structure characteristic comprises:
quantizing the style characteristic and the structural characteristic respectively to obtain quantized style characteristic and structural characteristic;
and entropy coding the quantized style features and the structure features according to probability estimation results calculated by the probability estimation model respectively to obtain style coding bit streams corresponding to the style features and structure coding bit streams corresponding to the structure features.
3. The method of claim 1, wherein the decoder reconstructing the images of the style coded bitstream and the structure coded bitstream comprises:
fusing the style coding bit stream and the structure coding bit stream through a fusion module in a decoder, and learning the mean and variance of the convolutional layers in a residual block through multi-layer perceptron (MLP) processing;
executing an image compression task on the fused coded bit stream through a generator in a decoder to obtain a compressed reconstructed image;
discriminating the compressed reconstructed image through a discriminator to obtain a loss optimization function; and training the generator according to the loss optimization function.
4. The method of claim 3, wherein the loss optimization function is according to the following formula, where D is the discriminator, E denotes the content encoder and the style encoder, G is the generator, P is the probability estimation model, x is the original face image, x̂ is the reconstructed image, ŷ denotes the quantized style and structure features, p is the probability estimation result, and λ and β are hyper-parameters.
5. The method of claim 1, wherein the multitask analysis network performing semantic understanding analysis on the style coded bitstream and the structure coded bitstream and outputting semantic information of an image comprises:
inputting the style coding bit stream and the structure coding bit stream into the multitask analysis network, fusing the coded bit streams through a fusion module, and training the multitask analysis network according to a multitask analysis loss function to obtain corresponding task results, which are output as the semantic information of the image.
6. The method of claim 5, wherein the multitask analysis loss function L_multi is calculated according to the following formula:

L_multi = λ_cls · l_cls + λ_seg · l_seg

where l_cls and l_seg are the loss functions of the classification task and the segmentation task, respectively, and λ_cls and λ_seg are the corresponding weight hyper-parameters.
7. The method of any of claims 1 to 6, further comprising: training the parameters of the multi-task analysis model by optimizing the multi-task analysis loss function to obtain a global optimal solution; wherein the total loss function applied in training the multi-task analysis model is according to the following formula:

L = L_EGP + L_D + γ · L_multi

where γ is a hyper-parameter.
8. A face image compression system, the system comprising:
the feature extraction module is used for inputting an original face image into a style encoder and a content encoder to extract style features and structural features;
the coding module is used for respectively carrying out probability estimation and entropy coding to obtain a style coding bit stream corresponding to the style characteristics and a structure coding bit stream corresponding to the structure characteristics, and inputting the style coding bit stream and the structure coding bit stream into the decoder and the multitask analysis network;
the compression decoding module is used for reconstructing the images of the style coding bit stream and the structure coding bit stream by a decoder and outputting a reconstructed image;
and the multitask analysis module is used for carrying out semantic understanding analysis on the style coding bit stream and the structure coding bit stream by the multitask analysis network and outputting semantic information of the image.
9. An electronic device, comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any of claims 1-7.
10. A computer-readable storage medium having computer-readable instructions stored thereon, the computer-readable instructions being executable by a processor to implement the method of any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210013946.6A CN114519750A (en) | 2022-01-06 | 2022-01-06 | Face image compression method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210013946.6A CN114519750A (en) | 2022-01-06 | 2022-01-06 | Face image compression method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114519750A true CN114519750A (en) | 2022-05-20 |
Family
ID=81597218
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210013946.6A Pending CN114519750A (en) | 2022-01-06 | 2022-01-06 | Face image compression method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114519750A (en) |
- 2022-01-06: CN CN202210013946.6A patent/CN114519750A/en, status: active, pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110991329A (en) * | 2019-11-29 | 2020-04-10 | 上海商汤智能科技有限公司 | Semantic analysis method and device, electronic equipment and storage medium |
CN111199550A (en) * | 2020-04-09 | 2020-05-26 | 腾讯科技(深圳)有限公司 | Training method, segmentation method, device and storage medium of image segmentation network |
CN112766079A (en) * | 2020-12-31 | 2021-05-07 | 北京航空航天大学 | Unsupervised image-to-image translation method based on content style separation |
CN112819689A (en) * | 2021-02-02 | 2021-05-18 | 百果园技术(新加坡)有限公司 | Training method of face attribute editing model, face attribute editing method and equipment |
Non-Patent Citations (1)
Title |
---|
- Ma Siwei et al.: "Intelligent Video Coding" (智能视频编码), Artificial Intelligence (人工智能), 10 April 2020 (2020-04-10) *
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115880762A (en) * | 2023-02-21 | 2023-03-31 | 中国传媒大学 | Scalable human face image coding method and system for human-computer mixed vision |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109218727B (en) | Video processing method and device | |
CN111641832B (en) | Encoding method, decoding method, device, electronic device and storage medium | |
CN113259665B (en) | Image processing method and related equipment | |
CN113259676B (en) | Image compression method and device based on deep learning | |
CN111641826B (en) | Method, device and system for encoding and decoding data | |
CN116600119B (en) | Video encoding method, video decoding method, video encoding device, video decoding device, computer equipment and storage medium | |
CN113079378B (en) | Image processing method and device and electronic equipment | |
CN110930408A (en) | Semantic image compression method based on knowledge reorganization | |
CN111246206A (en) | Optical flow information compression method and device based on self-encoder | |
Löhdefink et al. | GAN-vs. JPEG2000 image compression for distributed automotive perception: Higher peak SNR does not mean better semantic segmentation | |
CN114519750A (en) | Face image compression method and system | |
CN114501031B (en) | Compression coding and decompression method and device | |
CN113382244B (en) | Coding and decoding network structure, image compression method, device and storage medium | |
CN115866265A (en) | Multi-code-rate depth image compression system and method applied to mixed context | |
CN113554719B (en) | Image encoding method, decoding method, storage medium and terminal equipment | |
CN118020306A (en) | Video encoding and decoding method, encoder, decoder, and storage medium | |
CN111565314A (en) | Image compression method, coding and decoding network training method and device and electronic equipment | |
Li et al. | You Can Mask More For Extremely Low-Bitrate Image Compression | |
WO2024060161A1 (en) | Encoding method, decoding method, encoder, decoder and storage medium | |
US20230316048A1 (en) | Multi-rate computer vision task neural networks in compression domain | |
US20230306239A1 (en) | Online training-based encoder tuning in neural image compression | |
US11683515B2 (en) | Video compression with adaptive iterative intra-prediction | |
US20230336738A1 (en) | Multi-rate of computer vision task neural networks in compression domain | |
US20230316588A1 (en) | Online training-based encoder tuning with multi model selection in neural image compression | |
US20230334718A1 (en) | Online training computer vision task models in compression domain |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |