CN114915786B - Asymmetric semantic image compression method for Internet of things scene - Google Patents

Asymmetric semantic image compression method for Internet of things scene

Info

Publication number
CN114915786B
CN114915786B (application CN202210445325.5A)
Authority
CN
China
Prior art keywords
semantic
linear encoder
image
internet
quantized
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210445325.5A
Other languages
Chinese (zh)
Other versions
CN114915786A (en)
Inventor
陈斌 (Chen Bin)
王轩 (Wang Xuan)
Current Assignee
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Priority date
Filing date
Publication date
Application filed by Shenzhen Graduate School Harbin Institute of Technology filed Critical Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202210445325.5A priority Critical patent/CN114915786B/en
Publication of CN114915786A publication Critical patent/CN114915786A/en
Application granted granted Critical
Publication of CN114915786B publication Critical patent/CN114915786B/en


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/146: Data rate or code amount at the encoder output
    • H04N19/147: Data rate or code amount at the encoder output according to rate distortion criteria
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/90: Determination of colour characteristics
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124: Quantisation
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/132: Sampling, masking or truncation of coding units, e.g. adaptive resampling, frame skipping, frame interpolation or high-frequency transform coefficient masking
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/90: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses an asymmetric semantic image compression method for Internet of things scenes. The method comprises: obtaining a quantized sampling signal and reconstructing an image with a depth decoder; extracting semantic information from the reconstructed image; using the semantic information to train a lightweight linear encoder; and broadcasting the trained lightweight linear encoder to Internet of things devices. The invention obtains better rate-distortion performance through reconstruction based on residual fidelity blocks, and guarantees the accuracy of downstream tasks through rate-distortion optimization based on data semantics.

Description

Asymmetric semantic image compression method for Internet of things scene
Technical Field
The invention relates to the field of image processing, in particular to an asymmetric semantic image compression method for an Internet of things scene.
Background
Deep learning has made the concept of the intelligent Internet of things (AIoT) realizable by analyzing and understanding the massive data perceived by Internet of things front-end devices, but deploying deep network models on Internet of things devices is still hindered by their limited computing resources, storage space and battery capacity. With the development of mobile edge computing and 5G (fifth-generation mobile communication) technology, these obstacles can be effectively overcome by deploying the deep neural network model on a server with relatively strong computing capability close to the Internet of things device, and deploying an image compression algorithm between the Internet of things device and the server.
Existing popular lossy image compression algorithms do not consider semantic distortion optimization. The JPEG (Joint Photographic Experts Group) based image compression algorithms proposed by Liu, Choi et al. are fixed and cannot be relearned, resulting in poor performance on certain downstream tasks. The compressed-sensing-based gray image compression framework proposed by Yuan et al. does not consider efficient extension to color images. Although deep lossy image compression methods achieve good rate-distortion performance, their high deployment cost makes them unsuitable for Internet of things devices with limited computing resources.
Accordingly, the prior art is still in need of improvement and development.
Disclosure of Invention
The invention aims to solve the technical problems that existing image compression algorithms do not consider semantic distortion optimization, cannot be learned, do not extend effectively to color images, and cannot be deployed in a lightweight manner on Internet of things devices.
The technical scheme adopted for solving the technical problems is as follows:
in a first aspect, an embodiment of the present invention provides a method for compressing an asymmetric semantic image oriented to a scene of the internet of things, where the method includes:
acquiring a quantized sampling signal, and performing image reconstruction by using a depth decoder based on the quantized sampling signal to obtain a reconstructed image;
extracting semantic information from the reconstructed image, training a first lightweight linear encoder by using the semantic information to obtain a trained lightweight linear encoder, and broadcasting the trained lightweight linear encoder to Internet of things equipment.
In one implementation manner, the quantized sampling signal is obtained from the internet of things device, where the internet of things device is configured to:
acquiring a target image through a second lightweight linear encoder, and performing separation sampling processing on the target image to obtain a sampling signal;
converting the sampled signal into the quantized sampled signal by the second lightweight linear encoder using a learnable quantization;
entropy encoding the quantized sample signal by an arithmetic encoder through the second lightweight linear encoder to obtain a bit stream of the quantized sample signal;
uploading the bitstream to a server by the second lightweight linear encoder.
In one implementation, the obtaining, by the second lightweight linear encoder, a target image, and performing a separation sampling process on the target image includes:
converting an RGB color space of the target image into a YUV color space through RGB-YUV conversion by the second lightweight linear encoder;
dividing each YUV channel in the YUV color space of the target image into non-overlapping B×B-sized sample blocks by the second lightweight linear encoder, wherein the B×B-sized sample blocks are x^c_{i,j}, i ∈ {1, …, H/B}, j ∈ {1, …, W/B}, c ∈ {Y, U, V}, where H, W are the height and width of the target image and Y, U, V are the Y, U, V channel subscripts;
sampling the sample blocks with a learnable linear sampling matrix by the second lightweight linear encoder, the sampling process being y_{i,j} = A·x_{i,j}, where A ∈ ℝ^(M×3B²) is the learnable linear sampling matrix, such that M < 3B², where M is the number of measurements.
In one implementation, the method further comprises:
the learnable linear sampling matrix is integrated as a learnable parameter into a task-dependent semantic depth feature extractor with a parameter set.
In one implementation, the acquiring quantized sample signals and reconstructing an image using a depth decoder includes:
receiving a bit stream of the quantized sample signal;
inputting a bit stream of the quantized sample signal into a depth decoder;
entropy decoding is carried out on the bit stream of the quantized sampling signal by adopting an arithmetic decoder, so that the quantized sampling signal is obtained;
reconstructing the quantized sampling signal to obtain an intermediate YUV reconstruction signal;
performing fidelity processing on the intermediate YUV reconstruction signal to obtain a fidelity restoration, and updating the intermediate YUV reconstruction signal into the fidelity YUV reconstruction signal by using a multichannel gradient;
extracting features from the fidelity restoration by using the semantic depth feature extractor with parameter sets and related to the task, and adding the features into original features;
the reconstructing the quantized sampling signal to obtain an intermediate YUV reconstructed signal includes:
learning the priori property of the target image by using a residual block to obtain a learnable priori based on the residual block;
reconstructing the quantized sampling signal by using the residual block-based learnable prior to obtain the intermediate YUV reconstruction signal.
In one implementation, the extracting semantic information from the reconstructed image and training a first lightweight linear encoder using the semantic information includes:
extracting reconstruction-inferred semantic information from the reconstructed image;
obtaining, from the reconstruction-inferred semantic information, a loss of inferred accuracy for evaluating the downstream task;
obtaining a data-semantic rate-distortion optimization target from the loss of inferred accuracy for evaluating the downstream task;
obtaining a data-semantic rate-distortion loss function based on the data-semantic rate-distortion optimization target;
training the lightweight linear encoder with the data-semantic rate-distortion loss function.
In one implementation, the data-semantic rate-distortion optimization objective includes the estimated bit rate loss, the human-eye perception loss, and the loss of inferred accuracy for evaluating downstream tasks;
the data-semantic rate-distortion loss function is:

L = D_R + λ₁·D₁ + λ₂·D₂ = E[−log₂ p(ŷ)] + λ₁·d₁(X, X̂) + λ₂·d₂(z, ẑ)

where ŷ is the encoded and quantized vector, X is the target image, X̂ is the reconstructed image, z is the true semantic label, ẑ is the semantic label inferred from the reconstructed image by the downstream task model, d₁(·) is the MSE or another image reconstruction quality loss, d₂(·) is the expected semantic distortion, λ₁ and λ₂ are the Lagrangian multipliers controlling the overall loss, D_R is the estimated bit rate loss, D₁ is the human-eye perception loss, and D₂ is the loss of inferred accuracy for evaluating downstream tasks.
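A minimal sketch of evaluating such a data-semantic rate-distortion loss is given below; the concrete choices (d₁ as MSE, d₂ as cross-entropy over downstream-task logits, and the default λ values) are illustrative assumptions, not the patent's exact definitions:

```python
import numpy as np

def rd_loss(p_y, x, x_hat, z_true, z_logits, lam1=1.0, lam2=1.0):
    """Data-semantic rate-distortion loss L = D_R + lam1*D1 + lam2*D2.

    lam1, lam2 play the role of the Lagrangian multipliers; the concrete
    d1 (MSE) and d2 (cross-entropy) choices here are illustrative only.
    """
    D_R = float(-np.log2(p_y).mean())        # estimated bit rate per coded symbol
    D1 = float(np.mean((x - x_hat) ** 2))    # human-eye perception loss d1 (MSE)
    # d2: cross-entropy between true labels and labels inferred from x_hat
    e = np.exp(z_logits - z_logits.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    D2 = float(-np.log(probs[np.arange(len(z_true)), z_true]).mean())
    return D_R + lam1 * D1 + lam2 * D2

p_y = np.array([0.5, 0.25, 0.125, 0.125])    # model probabilities of coded symbols
x = np.ones((2, 2)); x_hat = x + 0.1         # target image and its reconstruction
z_true = np.array([0, 1])                    # true semantic labels
z_logits = np.array([[2.0, 0.0], [0.0, 2.0]])  # downstream-task logits on x_hat
loss = rd_loss(p_y, x, x_hat, z_true, z_logits)
```

Lowering λ₂ trades downstream-task accuracy for bit rate and pixel fidelity, which is the trade-off the optimization target controls.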
In a second aspect, an embodiment of the present invention further provides an asymmetric semantic image compression system facing to a scene of the internet of things, where the asymmetric semantic image compression system includes: server and with the thing networking device that the server is connected, wherein, the server includes:
the reconstruction image acquisition module is used for acquiring a quantized sampling signal, and performing image reconstruction by using a depth decoder based on the quantized sampling signal to obtain a reconstruction image;
the lightweight linear encoder training module is used for extracting semantic information from the reconstructed image, training a first lightweight linear encoder by using the semantic information to obtain a trained lightweight linear encoder, and broadcasting the trained lightweight linear encoder to the Internet of things equipment.
In a third aspect, the present invention further provides a server, where the server includes a memory, a processor, and an asymmetric semantic image compression program for an internet of things scene stored in the memory and capable of running on the processor, and when the processor executes the asymmetric semantic image compression program for an internet of things scene, the steps of the asymmetric semantic image compression method for an internet of things scene according to any one of the above schemes are implemented.
In a fourth aspect, an embodiment of the present invention further provides a storage medium, where one or more programs are stored, where the one or more programs may be executed by one or more processors, so as to implement the steps of an asymmetric semantic image compression method for an internet of things scenario according to any one of the foregoing schemes.
The beneficial effects are that: compared with the prior art, the invention provides an asymmetric semantic image compression algorithm for Internet of things scenes. First, a quantized sampling signal is obtained and an image is reconstructed using a depth decoder. Semantic information is then extracted from the reconstructed image and used to train the lightweight linear encoder. Finally, the trained lightweight linear encoder is broadcast to the Internet of things devices. According to the invention, the image is reconstructed by a depth decoder based on residual fidelity blocks deployed on the server, and the lightweight linear encoder is trained with the data-semantic rate-distortion loss function to obtain better rate-distortion performance, so that the accuracy of downstream tasks is guaranteed and the problems that existing image compression algorithms do not consider semantic distortion optimization and cannot be learned are solved. Deploying the depth decoder and the training procedure on the server addresses the limited computing resources and storage space of Internet of things devices.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required by the embodiments or the description of the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; other drawings may be obtained from them by those skilled in the art without inventive effort.
Fig. 1 is a flow chart of an asymmetric semantic image compression method for an internet of things scene, which is provided by the embodiment of the invention.
Fig. 2 is a network architecture diagram of an asymmetric semantic image compression algorithm for an internet of things scene provided by an embodiment of the invention.
Fig. 3 is a graph of absolute values of pearson correlation coefficients inside RGB versus YUV channels provided by an embodiment of the present invention.
Fig. 4 is a schematic diagram of a separation sampling process according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of an iterative decoding process based on a residual fidelity block according to an embodiment of the present invention.
Fig. 6 is a diagram illustrating the rate-distortion performance of different compression algorithms on the Cityscapes and KITTI data sets provided by an embodiment of the present invention.
Fig. 7 is a schematic block diagram of an asymmetric semantic image compression system for an internet of things scene provided by an embodiment of the present invention.
Fig. 8 is a schematic block diagram of a server according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and more specific, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Existing popular lossy image compression algorithms do not consider semantic distortion optimization. The JPEG (Joint Photographic Experts Group) based image compression algorithms proposed by Liu, Choi et al. are fixed and cannot be relearned, resulting in poor performance on certain downstream tasks. The compressed-sensing-based gray image compression framework proposed by Yuan et al. does not consider efficient extension to color images. Although deep lossy image compression methods achieve good rate-distortion performance, their high deployment cost makes them unsuitable for Internet of things devices with limited computing resources. In order to solve the above technical problems, the present embodiment provides an asymmetric semantic image compression algorithm for Internet of things scenes, in which a quantized sampling signal is first obtained and an image is reconstructed using a depth decoder. Semantic information is then extracted from the reconstructed image and used to train the lightweight linear encoder. Finally, the trained lightweight linear encoder is broadcast to the Internet of things devices. In this embodiment, better rate-distortion performance is obtained through reconstruction based on residual fidelity blocks, and the accuracy of downstream tasks is guaranteed through rate-distortion optimization based on data semantics.
Exemplary method
The embodiment provides an asymmetric semantic image compression algorithm for an internet of things scene, and the embodiment can be applied to a server, wherein the server can be a cloud/edge server. As shown in fig. 1, the method comprises the steps of:
step S100, obtaining a quantized sampling signal, and performing image reconstruction by using a depth decoder based on the quantized sampling signal to obtain a reconstructed image.
In this embodiment, after the cloud/edge server receives the quantized sampling signal sent by the internet of things device, the depth decoder disposed on the cloud/edge server performs reconstruction processing on the quantized sampling signal to obtain a reconstructed image.
The quantized sampling signal is obtained by converting the continuous range of the sampling signal into discrete integer values. In practice, the acquired original image signal is a continuous gray signal, and a continuous signal cannot be transmitted over a channel; the continuous gray image signal is therefore divided into a number of intervals, each represented by a discrete integer value corresponding to its gray level. In this embodiment, the lightweight linear encoder on the Internet of things device samples the target image to obtain a sampling signal, and converts the continuous range of the sampling signal into discrete integer values to obtain the quantized sampling signal.
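As a concrete illustration of this step, the sketch below implements plain uniform scalar quantization with a fixed step size. The patent's learnable quantization (LQ) would learn this step size (and possibly more), so the step value and function names here are illustrative assumptions only:

```python
import numpy as np

def quantize(y, step):
    """Map continuous samples to discrete integer levels (uniform scalar quantization)."""
    return np.round(y / step).astype(np.int32)

def dequantize(q, step):
    """Recover an approximation of the continuous samples on the decoder side."""
    return q.astype(np.float64) * step

y = np.array([0.12, -0.49, 1.03, 2.71])  # continuous sampling signal
q = quantize(y, step=0.5)                # integer symbols suitable for entropy coding
y_hat = dequantize(q, step=0.5)          # per-sample error is bounded by step/2
```

The integer symbols `q` are what the arithmetic encoder turns into the bit stream; a smaller step lowers distortion but raises the bit rate, which is exactly the trade-off the rate-distortion training controls.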
The depth decoder is deployed on the cloud/edge server, and the image decoding method and the image encoding method are mutually corresponding processes. The quantized sample signal is obtained by encoding a target image on the internet of things device, so that the reconstructed image is obtained by reconstructing the quantized sample signal through a depth decoder on a cloud/edge server.
In one implementation manner, in this embodiment, the quantized sampling signal is obtained from the internet of things device, and the internet of things device obtains the quantized sampling signal through the following steps:
s10, acquiring a target image through a second lightweight linear encoder, and performing separation sampling processing on the target image to obtain a sampling signal;
s20, converting the sampling signal into the quantized sampling signal by the second lightweight linear encoder through learning quantization;
s30, entropy coding the quantized sampling signal by an arithmetic coder through the second lightweight linear coder to obtain a bit stream of the quantized sampling signal;
s40, uploading the bit stream to a server through the second lightweight linear encoder.
Specifically, the second lightweight linear encoder deployed on the Internet of things device acquires the target image and samples it: after the Internet of things device receives the target image from the image acquisition device, it samples the image to obtain a sampling signal, converts the sampling signal into a quantized sampling signal, and sends the quantized sampling signal to the server as a bit stream. During transmission, large amounts of signal data cause channel congestion, and the data are often affected by environmental factors such as noise and obstacles, so the information received by the terminal may be incomplete, missing, or delayed, manifesting as image blur or image damage. Therefore, to improve the quality of the data received by the terminal, the original image is sampled, so that the terminal receives only the sampled signal and successfully reconstructs the original image from it through measurement and optimized reconstruction. Sampling thus reduces the sampling rate while preserving signal quality, and the reduction in sampled data significantly lowers the cost of transmitting and processing image and video data.
For example, after an Internet of things front-end device, such as a street monitoring camera or an unmanned aerial vehicle fire monitoring system, shoots a target image, it sends the target image X to the Internet of things device. After receiving X, the Internet of things device performs separation sampling on it through the second lightweight linear encoder deployed on the device to obtain a sampling signal, then quantizes and entropy-encodes the sampling signal into a bit stream of the quantized sampling signal, and transmits the bit stream to the cloud/edge server, as shown in fig. 2. In fig. 2, the lightweight linear encoder is at the upper left and the JPEG encoder at the lower left; the RGB-to-YUV transform and the 2D-DCT are implemented as special convolution operations used to compare the coding effect with the lightweight linear encoder of the invention, and the right side shows the depth decoder that completes image reconstruction. LQ denotes learnable quantization, AE and AD denote the arithmetic encoder and arithmetic decoder respectively, and LE denotes the lossless encoder. Convolution parameters are expressed as: kernel height × kernel width × filter number / stride.
In one implementation, the step S10 in this embodiment includes the following steps:
s11, converting the RGB color space of the target image into a YUV color space through RGB-YUV conversion by the second lightweight linear encoder;
s12, dividing each YUV channel in the YUV color space of the target image into non-overlapping sampling blocks with the size of B multiplied by B through the second lightweight linear encoder, wherein the expression of the sampling blocks with the size of B multiplied by B is as follows:wherein H, W is the height and width of the target image, Y, U, V are Y, U, V channel subscripts;
s13, sampling the sampling block by the second lightweight linear encoder through a learnable linear sampling matrix, wherein the sampling process is that y i:j =Ax i:j WhereinIs the learnable linear sampling matrix, such that M<3B 2 Where M is the number of samples.
Specifically, sampling the color image signal separately along one color dimension and two spatial dimensions reduces the size of the sampling matrix. Furthermore, the correlation between RGB channels is higher than that between YUV channels, as shown in fig. 3. The invention therefore samples each YUV channel independently, since the channels have already been decorrelated. As shown in fig. 4, the second lightweight linear encoder on the Internet of things device samples each YUV channel of the received color target image signal along the spatial dimensions.
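The RGB-to-YUV decorrelation step can be sketched as follows. The patent does not give its exact transfer matrix W, so the standard BT.601 analog matrix is assumed here; the round trip shows that the transform itself is lossless before sampling and quantization:

```python
import numpy as np

# BT.601 RGB -> YUV transfer matrix (an assumption; the patent's exact W is not given)
W = np.array([[ 0.299,  0.587,  0.114],
              [-0.147, -0.289,  0.436],
              [ 0.615, -0.515, -0.100]])

def rgb_to_yuv(img):
    """img: H x W x 3 RGB array -> H x W x 3 YUV array with decorrelated channels."""
    return img @ W.T

def yuv_to_rgb(yuv):
    """Invert the linear transform on the decoder side."""
    return yuv @ np.linalg.inv(W).T

rng = np.random.default_rng(0)
img = rng.random((4, 4, 3))   # toy RGB image
yuv = rgb_to_yuv(img)
back = yuv_to_rgb(yuv)        # matches img up to floating-point error
```

After this transform the three channels can be sampled independently, which is what makes the separated per-channel sampling of fig. 4 effective.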
The size of the sampling matrix can be reduced by a block-based compressed sensing sampling operation that divides each YUV channel into non-overlapping B×B blocks. Specifically, given a target image X, it is divided into non-overlapping B×B blocks x^c_{i,j}, i ∈ {1, …, H/B}, j ∈ {1, …, W/B}, c ∈ {Y, U, V}, where H and W are the height and width of the image, and each sample block is then sampled independently using a learnable linear sampling matrix. In the invention, the blocks are sparse with respect to an orthogonal basis Ψ, such as the Discrete Cosine Transform (DCT). Each sampling step can then be expressed as y_{i,j} = A·x_{i,j}, where A ∈ ℝ^(M×3B²) is the learnable linear sampling matrix, such that M < 3B², where M is the number of measurements. Notably, this block-based sampling step can be represented by convolutions.
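The equivalence between block-based linear sampling and a strided convolution can be checked numerically. The sketch below is a minimal single-channel illustration with assumed sizes (B = 4, M = 6); both views produce identical measurements:

```python
import numpy as np

B, M = 4, 6                           # block size and measurements per block (assumed)
rng = np.random.default_rng(0)
A = rng.standard_normal((M, B * B))   # learnable linear sampling matrix (one channel)
x = rng.standard_normal((8, 8))       # one YUV channel, H = W = 8 -> 2 x 2 blocks

def sample_blocks(x, A, B):
    """Matrix view: flatten each non-overlapping B x B block and multiply by A."""
    H, W = x.shape
    out = np.empty((H // B, W // B, A.shape[0]))
    for i in range(H // B):
        for j in range(W // B):
            out[i, j] = A @ x[i*B:(i+1)*B, j*B:(j+1)*B].reshape(-1)
    return out

def sample_conv(x, A, B):
    """Convolution view: each row of A is a B x B filter applied with stride B."""
    filters = A.reshape(-1, B, B)
    H, W = x.shape
    out = np.empty((H // B, W // B, filters.shape[0]))
    for i in range(H // B):
        for j in range(W // B):
            patch = x[i*B:(i+1)*B, j*B:(j+1)*B]
            out[i, j] = (filters * patch).sum(axis=(1, 2))
    return out

assert np.allclose(sample_blocks(x, A, B), sample_conv(x, A, B))
```

This is why the sampling matrix can be dropped into a deep learning framework as an ordinary convolution layer with kernel size and stride B and trained end to end.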
In one implementation manner, the asymmetric semantic image compression algorithm for the scene of the internet of things further comprises the following steps:
s50, integrating the learnable linear sampling matrix into a task-related semantic depth feature extractor with parameter sets as learnable parameters.
Specifically, the JPEG encoder is designed only for human-eye perception. Unlike the existing JPEG encoder, the lightweight linear encoder of the invention can be trained based on data-semantic rate distortion; that is, the method of the invention takes both human-eye perception and machine learning model perception into account. Each row of the learnable linear sampling matrix can be seen as a filter, so its sampling operation is equivalent to a series of convolution filters whose kernel size and stride are both B×B. From this point of view, the linear sampling matrix can be integrated as a learnable parameter into the task-related semantic depth feature extractor with a parameter set.
In one implementation, the step S100 in this embodiment includes the following steps:
s101, receiving a bit stream of the quantized sampling signal;
s102, inputting a bit stream of the quantized sampling signal into a depth decoder;
s103, entropy decoding is carried out on the bit stream of the quantized sampling signal by adopting an arithmetic decoder, so that the quantized sampling signal is obtained;
s104, carrying out reconstruction processing on the quantized sampling signals to obtain intermediate YUV reconstruction signals;
s105, performing fidelity processing on the intermediate YUV reconstruction signals to obtain fidelity recovery, and updating the intermediate YUV reconstruction signals into fidelity YUV reconstruction signals by using a multichannel gradient;
s106, extracting features from the fidelity restoration by using the semantic depth feature extractor with the parameter set and related to the task, and adding the features into the original features.
In the invention, the server receives the bit stream of the quantized sampling signal sent by the second lightweight linear encoder. The depth decoder performs entropy decoding and a synthesis transform to obtain the quantized sampling signal, reconstructs the received quantized sampling signal into an intermediate YUV reconstruction signal, and obtains the fidelity YUV reconstruction signal through fidelity processing, so that a large amount of multidimensional original data is recovered from a small amount of low-dimensional sampled data. The server may be a cloud/edge server connected to the Internet of things device.
Existing compressed sensing theory proves that if the sampling matrix satisfies the Restricted Isometry Property (RIP), sparse optimization can be used to recover the (i, j)-th block:

x̂_{i,j} = argmin_x ½‖y_{i,j} − A·x‖²₂ + ρ‖Ψᵀx‖₁

where ρ is a hyper-parameter. In the invention, the sparse prior in conventional compressed sensing optimization is replaced by a learnable prior, as in formula (1), to improve rate-distortion performance. In particular, a depth reconstruction function f_θ is used to learn the reconstruction from the quantized sampled signal to the original signal:

X̂ = W⁻¹ · f_θ(ŷ^Y, ŷ^U, ŷ^V)    (1)

where ŷ is the encoded and quantized vector, X is the target image, X̂ is the reconstructed image, Y, U, V are the Y, U, V channel subscripts, and W is the RGB-to-YUV transfer matrix.
Further, this problem is solved using an iterative gradient expansion method. First, the initial reconstruction is defined as x̂⁽⁰⁾ = Aᵀ⊛ŷ. The iterative decoding algorithm can then be derived as follows: for k = 1, …, K,

x̃⁽ᵏ⁾ = x̂⁽ᵏ⁻¹⁾ + f_θ⁽ᵏ⁾(x̂⁽ᵏ⁻¹⁾)    (2)
∇⁽ᵏ⁾ = Aᵀ⊛(A⊛x̃⁽ᵏ⁾ − ŷ)    (3)
x̂⁽ᵏ⁾ = x̃⁽ᵏ⁾ − η·∇⁽ᵏ⁾    (4)

where K is the total number of iterations, x̃⁽ᵏ⁾ is the intermediate recovery, ∇⁽ᵏ⁾ is the gradient of the data-fidelity term with respect to x̃⁽ᵏ⁾, and Aᵀ⊛ is a deconvolution operation.
Specifically, the invention uses the residual block of formula (2) to learn the prior properties of the target image, obtaining a residual-block-based learnable prior. The quantized sampling signal is reconstructed using this residual-block-based learnable prior to obtain the intermediate YUV reconstruction signal, whose reconstruction quality is better than reconstruction based on a sparse prior. The intermediate YUV reconstruction signal is updated to the fidelity YUV reconstruction signal using the multichannel gradient of formula (4), which reduces the reconstruction error of the intermediate YUV reconstruction signal. Finally, as shown in fig. 5, the features extracted from the fidelity restoration are added to the original features to correct accumulated errors in the feature layer.
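The iterative decoding loop above can be sketched as unrolled gradient descent on the measurement-fidelity term with a pluggable residual refinement. The `refine` callback stands in for the trained residual blocks, whose weights are not given here; for the check at the end, A is built with orthonormal rows so the measurement-consistent fixed point is reached exactly:

```python
import numpy as np

def decode(y, A, K=30, eta=0.5, refine=None):
    """Unrolled iterative decoding for measurements y = A @ x of one block.

    Each iteration takes a gradient step on the data-fidelity term
    ||y - A x||^2 and then applies a learned residual refinement; `refine`
    is a placeholder for the patent's trained residual blocks.
    """
    x = A.T @ y                               # initial reconstruction
    for _ in range(K):
        x = x - eta * A.T @ (A @ x - y)       # multichannel gradient (fidelity) step
        if refine is not None:
            x = x + refine(x)                 # residual-block learnable prior
    return x

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((16, 12)))
A = Q.T                                       # 12 x 16 sampling matrix, orthonormal rows
x_true = rng.standard_normal(16)
y = A @ x_true
x_hat = decode(y, A)                          # measurement-consistent reconstruction
```

With a general learned A and a trained `refine`, the loop trades more iterations K for better reconstruction quality, which is the role of the K unrolled stages in the decoder.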
In step 200, semantic information is extracted from the reconstructed image, the first lightweight linear encoder is trained using the semantic information to obtain the trained lightweight linear encoder, and the trained lightweight linear encoder is broadcast to the Internet of things equipment.
In this embodiment, because the resources of the internet of things device are limited, the deep neural network model cannot be trained or deployed on it; the model is therefore deployed on a cloud/edge server with stronger computing capability, keeping the processing on the internet of things device low in complexity. The deep neural network model on the cloud/edge server extracts reconstruction-inferred semantic information from the reconstructed image to obtain a data-semantic rate-distortion optimization target, and trains the first lightweight linear encoder with a data-semantic rate-distortion loss function, yielding a trained, learnable lightweight linear encoder. Finally, the trained lightweight linear encoder is broadcast to the Internet of things equipment, so that the lightweight linear encoder on the device can be learned without affecting the device's processing capacity, further ensuring the accuracy of downstream tasks.
In one implementation, the step S200 in this embodiment includes the following steps:
s201, extracting reconstruction inference semantic information from the reconstruction image;
s202, deducing semantic information according to the reconstruction, and obtaining loss of deducing accuracy rate for evaluating downstream tasks;
s203, deducing the loss of accuracy according to the task for evaluating the downstream task to obtain a data-semantic rate distortion optimization target;
s204, obtaining a data-semantic rate distortion loss function based on the data-semantic rate distortion optimization target;
s205, training the lightweight linear encoder by using the data-semantic rate distortion loss function.
In particular, the reconstruction-inferred semantic information is inferred by a semantic deep neural network analysis model deployed on the cloud/edge server. The model extracts the reconstruction-inferred semantic information from the reconstructed image and obtains the inferred-accuracy loss for evaluating the downstream task; this loss, together with the estimated bit-rate loss and the human-eye perceptual loss, constitutes the data-semantic rate-distortion optimization target. The first lightweight linear encoder is trained with the following data-semantic rate-distortion loss function (5):

L = D_R + λ₁·D₁ + λ₂·D₂ = D_R(ŷ) + λ₁·d₁(X, X̂) + λ₂·d₂(z, ẑ),   (5)

where ŷ is the encoded and quantized vector, X is the target image, X̂ is the reconstructed image, z is the true semantic label, ẑ is the semantic label inferred from the reconstructed image by the downstream task model, d₁(·) is the MSE or another image-reconstruction-quality loss, d₂(·) is the expected semantic distortion, λ₁ and λ₂ are the Lagrangian multipliers controlling the overall loss, D_R is the estimated bit-rate loss, D₁ is the human-eye perceptual loss, and D₂ is the inferred-accuracy loss for evaluating the downstream task.
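A minimal numerical sketch of the data-semantic rate-distortion loss (5) follows. The choice of cross-entropy for the semantic distortion d₂ and the λ values are illustrative assumptions; the patent specifies only that d₁ is an MSE-type reconstruction loss and d₂ an expected semantic distortion:

```python
import numpy as np

def data_semantic_rd_loss(D_R, X, X_hat, z, z_logits, lam1=0.1, lam2=1.0):
    """L = D_R + lam1 * d1(X, X_hat) + lam2 * d2(z, z_hat)."""
    d1 = np.mean((X - X_hat) ** 2)                 # d1: MSE reconstruction loss
    p = np.exp(z_logits - z_logits.max(axis=-1, keepdims=True))
    p = p / p.sum(axis=-1, keepdims=True)          # softmax over semantic classes
    # d2: cross-entropy of the inferred labels against the true labels
    d2 = -np.mean(np.log(p[np.arange(len(z)), z] + 1e-12))
    return D_R + lam1 * d1 + lam2 * d2

X = np.zeros((4, 4)); X_hat = X.copy()             # perfect reconstruction: d1 = 0
z = np.array([0, 1])                               # true semantic labels
z_logits = np.array([[10.0, 0.0], [0.0, 10.0]])    # confident, correct inference
loss = data_semantic_rd_loss(2.0, X, X_hat, z, z_logits)
print(round(loss, 3))  # → 2.0: only the bit-rate term remains
```

Training the lightweight encoder against this combined objective is what trades bit rate not just against pixel fidelity but against downstream semantic accuracy.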
In one implementation, the semantic deep neural network analysis model deployed on the cloud/edge server trains the first lightweight linear encoder to obtain the trained lightweight linear encoder, which is then broadcast to the internet of things device. The invention can adaptively improve the semantic accuracy of specific downstream tasks, realize adaptive coding, and provides an asymmetric semantic image compression algorithm suited to resource-constrained internet of things scenarios.
The lightweight linear encoder (CS-ASIC) based on the asymmetric semantic image compression method, together with its variant trained with the data-semantic rate-distortion loss function, was tested on the Cityscapes and KITTI datasets to simulate resource-limited internet of things equipment. The mainstream image compression methods in current industry were compared using PSNR (Peak Signal-to-Noise Ratio), MS-SSIM (Multi-Scale Structural Similarity), and mIoU (Mean Intersection over Union) as evaluation indexes, giving the comparison results shown in fig. 6. The Cityscapes dataset is a large-scale dataset with high-quality pixel-level annotations of 5000 street-view images from 50 different cities, containing 19 foreground object classes for image segmentation. The KITTI dataset is the principal image-processing dataset in the field of autonomous driving.
FIG. 6 (a) shows the comparison of the two CS-ASIC variants with the JPEG, WebP, H.264, DeepN-JPEG, and Balle (2017) methods on the Cityscapes dataset on a Jetson Nano B01, and FIG. 6 (b) shows the corresponding comparison on the KITTI dataset. It can be seen that the compression rates of the two variants are 1.5-3.8 times and 1.5-2.5 times that of JPEG respectively, outperforming the JPEG encoder. WebP and H.264 are superior to JPEG because their intra prediction decorrelates neighboring blocks. DeepN-JPEG is superior to JPEG on the image segmentation task but inferior on the object detection task. Balle (2017) is superior to CS-ASIC in data rate-distortion performance, but its coding complexity is higher, making it unsuitable for internet of things equipment with limited computing resources. In summary, the asymmetric semantic image compression method is better suited to the internet of things scenario.
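For reference, the PSNR index used in the comparison above can be computed as follows (this is the standard definition, not specific to the patent):

```python
import numpy as np

def psnr(img_a, img_b, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

a = np.full((4, 4), 100, dtype=np.uint8)
b = a.copy()
b[0, 0] = 110                     # one pixel off by 10
print(round(psnr(a, b), 2))       # → 40.17
```

PSNR and MS-SSIM measure pixel/perceptual fidelity of the reconstruction, while mIoU measures the accuracy of the downstream segmentation task, which is why all three are reported together for a semantic compression method.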
Exemplary System
Further, the invention also correspondingly provides an asymmetric semantic image compression system oriented to the internet of things scene, which comprises a server and an internet of things device connected with the server, wherein the server includes:
a reconstructed image obtaining module 10, configured to obtain a quantized sampling signal, and reconstruct an image based on the quantized sampling signal by using a depth decoder, so as to obtain a reconstructed image;
the lightweight linear encoder training module 20 is configured to extract semantic information from the reconstructed image, train a first lightweight linear encoder using the semantic information, obtain a trained lightweight linear encoder, and broadcast the trained lightweight linear encoder to an internet of things device.
The invention also provides a server, which comprises a memory 71, a processor 72 and an asymmetric semantic image compression program 73 which is stored in the memory 71 and can run on the processor 72 and faces to the scene of the internet of things, wherein when the processor 72 executes the asymmetric semantic image compression program 73 which faces to the scene of the internet of things, the steps of the asymmetric semantic image compression method which faces to the scene of the internet of things are realized.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by a computer program stored on a non-transitory computer-readable storage medium which, when executed, may include the steps of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
In summary, the invention discloses an asymmetric semantic image compression method, a system, a server and a storage medium for an Internet of things scene, wherein the method comprises the following steps: acquiring quantized sample signals and reconstructing an image using a depth decoder; extracting semantic information from the reconstructed image, using the semantic information to train the lightweight linear encoder, and broadcasting the trained lightweight linear encoder to Internet of things equipment. According to the invention, better rate distortion performance is obtained based on residual fidelity block reconstruction, and the accuracy of a downstream task is ensured based on rate distortion optimization of data semantics.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. An asymmetric semantic image compression method for an internet of things scene is characterized by comprising the following steps:
acquiring a quantized sampling signal, and performing image reconstruction by using a depth decoder based on the quantized sampling signal to obtain a reconstructed image;
extracting semantic information from the reconstructed image, training a first lightweight linear encoder by using the semantic information to obtain a trained lightweight linear encoder, and broadcasting the trained lightweight linear encoder to Internet of things equipment;
the quantized sampling signal is obtained from the internet of things equipment, and the internet of things equipment is used for:
acquiring a target image through a second lightweight linear encoder, and performing separation sampling processing on the target image to obtain a sampling signal;
converting the sampled signal into the quantized sampled signal by the second lightweight linear encoder using a learnable quantization;
entropy encoding the quantized sample signal by an arithmetic encoder through the second lightweight linear encoder to obtain a bit stream of the quantized sample signal;
uploading the bitstream to a server by the second lightweight linear encoder;
the step of obtaining a target image through the second lightweight linear encoder and performing separation sampling processing on the target image comprises the following steps:
converting an RGB color space of the target image into a YUV color space through RGB-YUV conversion by the second lightweight linear encoder;
dividing each YUV channel in the YUV color space of the target image into non-overlapping sample blocks of size B×B by the second lightweight linear encoder, wherein H, W are the height and width of the target image and Y, U, V are the Y, U, V channel subscripts;
sampling the sample blocks by the second lightweight linear encoder with a learnable linear sampling matrix, wherein the sampling process is y_i = A·x_i, A being the learnable linear sampling matrix, with M < 3B², where M is the number of samples.
2. The method for compressing asymmetric semantic images for an internet of things scene as recited in claim 1, further comprising:
the learnable linear sampling matrix is integrated as a learnable parameter into a task-dependent semantic depth feature extractor with a parameter set.
3. The asymmetric semantic image compression method of claim 2 wherein the acquiring quantized sample signals and reconstructing an image using a depth decoder comprises:
receiving a bit stream of the quantized sample signal;
inputting a bit stream of the quantized sample signal into a depth decoder;
entropy decoding is carried out on the bit stream of the quantized sampling signal by adopting an arithmetic decoder, so that the quantized sampling signal is obtained;
reconstructing the quantized sampling signal to obtain an intermediate YUV reconstruction signal;
performing fidelity processing on the intermediate YUV reconstruction signals to obtain fidelity recovery, and updating the intermediate YUV reconstruction signals into the fidelity YUV reconstruction signals by using a multichannel gradient;
extracting features from the fidelity restoration by using the semantic depth feature extractor with parameter sets and related to the task, and adding the features into original features;
the reconstructing the quantized sampling signal to obtain an intermediate YUV reconstructed signal includes:
learning the priori property of the target image by using a residual block to obtain a learnable priori based on the residual block;
reconstructing the quantized sampling signal by using the residual block-based learnable prior to obtain the intermediate YUV reconstruction signal.
4. The method of asymmetric semantic image compression according to claim 1, wherein the extracting semantic information from the reconstructed image, training a first lightweight linear encoder using the semantic information, comprises:
extracting reconstruction inferred semantic information from the reconstructed image;
obtaining loss of accuracy of estimating the downstream task according to the reconstruction estimation semantic information;
obtaining a data-semantic rate distortion optimization target according to the loss of the inferred accuracy rate for evaluating the downstream task;
obtaining a data-semantic rate distortion loss function based on the data-semantic rate distortion optimization target;
training the lightweight linear encoder with the data-semantic rate-distortion loss function.
5. The asymmetric semantic image compression method of claim 4 wherein the data-semantic rate distortion optimization objective includes estimated bitrate loss, human eye perceived loss and loss of inferred accuracy for evaluation of downstream tasks;
the data-semantic rate distortion loss function is:
wherein,,is the encoded and quantized vector, X is the target image, < >>Is the reconstructed image, z is the true semantic tag,>is a semantic tag which is inferred and generated by the reconstructed image through a downstream task model, d 1 (. Cndot.) is MSE or other loss of image reconstruction quality, d 2 (. Lambda.) is the expected semantic distortion 1 ,λ 2 Is the Lagrangian multiplier, D, controlling overall loss R Is the estimated bit rate loss, D 1 Is the perception loss of human eyes, D 2 The loss of accuracy is inferred for the evaluation of downstream tasks.
6. An asymmetric semantic image compression system oriented to an internet of things scene is characterized by comprising: server and with the thing networking device that the server is connected, wherein, the server includes:
the reconstruction image acquisition module is used for acquiring a quantized sampling signal, and performing image reconstruction by using a depth decoder based on the quantized sampling signal to obtain a reconstruction image;
the lightweight linear encoder training module is used for extracting semantic information from the reconstructed image, training a first lightweight linear encoder by using the semantic information to obtain a trained lightweight linear encoder, and broadcasting the trained lightweight linear encoder to the Internet of things equipment;
the quantized sampling signal is obtained from the internet of things equipment, and the internet of things equipment is used for:
acquiring a target image through a second lightweight linear encoder, and performing separation sampling processing on the target image to obtain a sampling signal;
converting the sampled signal into the quantized sampled signal by the second lightweight linear encoder using a learnable quantization;
entropy encoding the quantized sample signal by an arithmetic encoder through the second lightweight linear encoder to obtain a bit stream of the quantized sample signal;
uploading the bitstream to a server by the second lightweight linear encoder;
the step of obtaining a target image through the second lightweight linear encoder and performing separation sampling processing on the target image comprises the following steps:
converting an RGB color space of the target image into a YUV color space through RGB-YUV conversion by the second lightweight linear encoder;
dividing each YUV channel in the YUV color space of the target image into non-overlapping sample blocks of size B×B by the second lightweight linear encoder, wherein H, W are the height and width of the target image and Y, U, V are the Y, U, V channel subscripts;
sampling the sample blocks by the second lightweight linear encoder with a learnable linear sampling matrix, wherein the sampling process is y_i = A·x_i, A being the learnable linear sampling matrix, with M < 3B², where M is the number of samples.
7. A server, characterized in that the server comprises a memory, a processor and an asymmetric semantic image compression program which is stored in the memory and can run on the processor and faces to an internet of things scene, and when the processor executes the asymmetric semantic image compression program which faces to the internet of things scene, the steps of the asymmetric semantic image compression method which faces to the internet of things scene according to any one of claims 1-5 are realized.
8. A storage medium, wherein the storage medium stores one or more programs executable by one or more processors to implement the steps of an asymmetric semantic image compression method for an internet of things scenario according to any one of claims 1-5.
CN202210445325.5A 2022-04-26 2022-04-26 Asymmetric semantic image compression method for Internet of things scene Active CN114915786B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210445325.5A CN114915786B (en) 2022-04-26 2022-04-26 Asymmetric semantic image compression method for Internet of things scene


Publications (2)

Publication Number Publication Date
CN114915786A CN114915786A (en) 2022-08-16
CN114915786B true CN114915786B (en) 2023-07-28

Family

ID=82765249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210445325.5A Active CN114915786B (en) 2022-04-26 2022-04-26 Asymmetric semantic image compression method for Internet of things scene

Country Status (1)

Country Link
CN (1) CN114915786B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115767108B (en) * 2022-10-20 2023-11-07 哈尔滨工业大学(深圳) Distributed image compression method and system based on feature domain matching
CN115496818B (en) * 2022-11-08 2023-03-10 之江实验室 Semantic graph compression method and device based on dynamic object segmentation

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110009013A (en) * 2019-03-21 2019-07-12 腾讯科技(深圳)有限公司 Encoder training and characterization information extracting method and device
WO2020142077A1 (en) * 2018-12-31 2020-07-09 Didi Research America, Llc Method and system for semantic segmentation involving multi-task convolutional neural network
CN113688836A (en) * 2021-09-28 2021-11-23 四川大学 Real-time road image semantic segmentation method and system based on deep learning
CN114067162A (en) * 2021-11-24 2022-02-18 重庆邮电大学 Image reconstruction method and system based on multi-scale and multi-granularity feature decoupling
CN114143040A (en) * 2021-11-08 2022-03-04 浙江工业大学 Confrontation signal detection method based on multi-channel feature reconstruction




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant