CN114915786B - Asymmetric semantic image compression method for Internet of things scene - Google Patents
Classifications
- H04N19/147 — Data rate or code amount at the encoder output according to rate distortion criteria
- H04N19/124 — Quantisation
- H04N19/132 — Sampling, masking or truncation of coding units, e.g. adaptive resampling
- H04N19/91 — Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
- G06T7/90 — Determination of colour characteristics
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses an asymmetric semantic image compression method for an Internet of things scene. The method obtains a quantized sampling signal and reconstructs an image with a depth decoder; semantic information is extracted from the reconstructed image and used to train a lightweight linear encoder, and the trained lightweight linear encoder is broadcast to the Internet of things devices. The invention obtains better rate-distortion performance through residual-fidelity-block reconstruction and guarantees the accuracy of downstream tasks through rate-distortion optimization based on data semantics.
Description
Technical Field
The invention relates to the field of image processing, in particular to an asymmetric semantic image compression method for an Internet of things scene.
Background
Deep learning has made the intelligent Internet of things (AIoT) a reality by analyzing and understanding the massive data perceived by front-end Internet of things devices, but objective obstacles remain to deploying deep network models on those devices because of their limited computing resources, storage space, and battery capacity. With the development of mobile edge computing and 5G (fifth-generation mobile communication) technology, these obstacles can be effectively overcome by deploying the deep neural network model on a server with relatively strong computing capability close to the Internet of things device and deploying an image compression algorithm between the device and the server.
Existing popular lossy image compression algorithms do not consider semantic distortion optimization. The image compression algorithms based on the JPEG (Joint Photographic Experts Group) framework proposed by Liu, Choi et al. are fixed and cannot be relearned, resulting in poor performance on certain downstream tasks. The compressed-sensing-based gray-scale image compression framework proposed by Yuan et al. does not consider efficient extension to color images. Although deep lossy image compression methods achieve good rate-distortion performance, their high deployment cost makes them unsuitable for Internet of things equipment with limited computing resources.
Accordingly, the prior art is still in need of improvement and development.
Disclosure of Invention
The invention aims to solve the technical problems that existing image compression algorithms do not consider semantic distortion optimization, cannot be learned, do not extend effectively to color images, and cannot be deployed in a lightweight manner.
The technical scheme adopted for solving the technical problems is as follows:
in a first aspect, an embodiment of the present invention provides a method for compressing an asymmetric semantic image oriented to a scene of the internet of things, where the method includes:
acquiring a quantized sampling signal, and performing image reconstruction by using a depth decoder based on the quantized sampling signal to obtain a reconstructed image;
extracting semantic information from the reconstructed image, training a first lightweight linear encoder by using the semantic information to obtain a trained lightweight linear encoder, and broadcasting the trained lightweight linear encoder to Internet of things equipment.
In one implementation manner, the quantized sampling signal is obtained from the internet of things device, where the internet of things device is configured to:
acquiring a target image through a second lightweight linear encoder, and performing separation sampling processing on the target image to obtain a sampling signal;
converting the sampled signal into the quantized sampled signal by the second lightweight linear encoder using a learnable quantization;
entropy encoding the quantized sample signal by an arithmetic encoder through the second lightweight linear encoder to obtain a bit stream of the quantized sample signal;
uploading the bitstream to a server by the second lightweight linear encoder.
In one implementation, the obtaining, by the second lightweight linear encoder, a target image, and performing a separation sampling process on the target image includes:
converting an RGB color space of the target image into a YUV color space through RGB-YUV conversion by the second lightweight linear encoder;
dividing each YUV channel in the YUV color space of the target image into non-overlapping B×B sample blocks by the second lightweight linear encoder, the sample blocks being expressed as x^{(c)}_{i,j} ∈ R^{B×B}, i = 1, …, H/B, j = 1, …, W/B, c ∈ {Y, U, V}, where H, W are the height and width of the target image and Y, U, V are the channel subscripts;
sampling the sample blocks with a learnable linear sampling matrix by the second lightweight linear encoder, the sampling process being y_{i,j} = A x_{i,j}, where A ∈ R^{M×B^2} is the learnable linear sampling matrix with M < 3B^2, M being the number of samples.
In one implementation, the method further comprises:
the learnable linear sampling matrix is integrated as a learnable parameter into a task-dependent semantic depth feature extractor with a parameter set.
In one implementation, the acquiring quantized sample signals and reconstructing an image using a depth decoder includes:
receiving a bit stream of the quantized sample signal;
inputting a bit stream of the quantized sample signal into a depth decoder;
entropy decoding is carried out on the bit stream of the quantized sampling signal by adopting an arithmetic decoder, so that the quantized sampling signal is obtained;
reconstructing the quantized sampling signal to obtain an intermediate YUV reconstruction signal;
performing fidelity processing on the intermediate YUV reconstruction signals to obtain fidelity recovery, and updating the intermediate YUV reconstruction signals into the fidelity YUV reconstruction signals by using a multichannel gradient;
extracting features from the fidelity restoration by using the semantic depth feature extractor with parameter sets and related to the task, and adding the features into original features;
the reconstructing the quantized sampling signal to obtain an intermediate YUV reconstructed signal includes:
learning the priori property of the target image by using a residual block to obtain a learnable priori based on the residual block;
reconstructing the quantized sampling signal by using the residual block-based learnable prior to obtain the intermediate YUV reconstruction signal.
In one implementation, the extracting semantic information from the reconstructed image, training a first lightweight linear encoder using the semantic information, includes:
extracting inferred semantic information from the reconstructed image;
obtaining a loss of inference accuracy for evaluating the downstream task according to the inferred semantic information;
obtaining a data-semantic rate-distortion optimization target according to the loss of inference accuracy for evaluating the downstream task;
obtaining a data-semantic rate-distortion loss function based on the data-semantic rate-distortion optimization target;
training the lightweight linear encoder with the data-semantic rate-distortion loss function.
In one implementation, the data-semantic rate-distortion optimization objective includes the estimated bit-rate loss, the human-eye perceptual loss, and the loss of inference accuracy for evaluating downstream tasks;
the data-semantic rate distortion loss function is:
wherein,,is the encoded and quantized vector, X is the target image, < >>Is the reconstructed image, z is the true semantic tag,>is a semantic tag which is inferred and generated by the reconstructed image through a downstream task model, d 1 (. Cndot.) is MSE or other loss of image reconstruction quality, d 2 (. Lambda.) is the expected semantic distortion 1 ,λ 2 Is the Lagrangian multiplier, D, controlling overall loss R Is the estimated bit rate loss, D 1 Is the perception loss of human eyes, D 2 The loss of accuracy is inferred for the evaluation of downstream tasks.
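The combined objective can be sketched numerically. The function below is an illustrative stand-in, not the patent's trained models: the name `rd_loss`, the scalar `bits_est` standing in for the estimated bit-rate loss, and the MSE used for d_2 are all assumptions.

```python
import numpy as np

def rd_loss(bits_est, x, x_hat, z, z_hat, lam1=1.0, lam2=0.1):
    # L = D_R + lam1 * D1 + lam2 * D2 (data-semantic rate-distortion form)
    D_R = bits_est                          # estimated bit-rate loss (a given scalar here)
    D1 = np.mean((x - x_hat) ** 2)          # human-eye perceptual loss, MSE as d_1
    D2 = np.mean((z - z_hat) ** 2)          # stand-in semantic distortion d_2 (illustrative)
    return D_R + lam1 * D1 + lam2 * D2

# Perfect semantics (z == z_hat) leaves only the rate term plus the MSE term.
loss = rd_loss(2.0, np.zeros(4), np.ones(4), np.zeros(3), np.zeros(3))
```

In training, the two Lagrangian multipliers trade off pixel fidelity for downstream-task accuracy at a given bit rate.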
In a second aspect, an embodiment of the present invention further provides an asymmetric semantic image compression system facing an Internet of things scene, where the asymmetric semantic image compression system includes a server and Internet of things equipment connected to the server, and the server includes:
the reconstruction image acquisition module is used for acquiring a quantized sampling signal, and performing image reconstruction by using a depth decoder based on the quantized sampling signal to obtain a reconstruction image;
the lightweight linear encoder training module is used for extracting semantic information from the reconstructed image, training a first lightweight linear encoder by using the semantic information to obtain a trained lightweight linear encoder, and broadcasting the trained lightweight linear encoder to the Internet of things equipment.
In a third aspect, the present invention further provides a server, where the server includes a memory, a processor, and an asymmetric semantic image compression program for an internet of things scene stored in the memory and capable of running on the processor, and when the processor executes the asymmetric semantic image compression program for an internet of things scene, the steps of the asymmetric semantic image compression method for an internet of things scene according to any one of the above schemes are implemented.
In a fourth aspect, an embodiment of the present invention further provides a storage medium, where one or more programs are stored, where the one or more programs may be executed by one or more processors, so as to implement the steps of an asymmetric semantic image compression method for an internet of things scenario according to any one of the foregoing schemes.
The beneficial effects are that: compared with the prior art, the invention provides an asymmetric semantic image compression algorithm for an Internet of things scene. A quantized sampling signal is first acquired and an image is reconstructed using a depth decoder; semantic information is then extracted from the reconstructed image and used to train the lightweight linear encoder; finally, the trained lightweight linear encoder is broadcast to the Internet of things equipment. In the invention, the image is reconstructed by the depth decoder, which is based on residual fidelity blocks and deployed on the server, and the lightweight linear encoder is trained with the data-semantic rate-distortion loss function to obtain better rate-distortion performance and guarantee the accuracy of downstream tasks, solving the problems that existing image compression algorithms do not consider semantic distortion optimization and cannot be learned. Deploying the depth decoder and the training function on the server also addresses the limited computing resources and storage space of the Internet of things equipment.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present invention, and other drawings may be obtained according to the drawings without inventive effort to those skilled in the art.
Fig. 1 is a flow chart of an asymmetric semantic image compression method for an internet of things scene, which is provided by the embodiment of the invention.
Fig. 2 is a network architecture diagram of an asymmetric semantic image compression algorithm for an Internet of things scene provided by an embodiment of the invention.
Fig. 3 is a graph of absolute values of pearson correlation coefficients inside RGB versus YUV channels provided by an embodiment of the present invention.
Fig. 4 is a schematic diagram of a separation sampling process according to an embodiment of the present invention.
Fig. 5 is a schematic diagram of an iterative decoding process based on a residual fidelity block according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating the rate distortion performance of different compression algorithms on the Cityscapes and KITTI data sets provided by an embodiment of the present invention.
Fig. 7 is a schematic block diagram of an asymmetric semantic image compression system for an internet of things scene provided by an embodiment of the present invention.
Fig. 8 is a schematic block diagram of a server according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and effects of the present invention clearer and more specific, the present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
Existing popular lossy image compression algorithms do not consider semantic distortion optimization. The image compression algorithms based on the JPEG (Joint Photographic Experts Group) framework proposed by Liu, Choi et al. are fixed and cannot be relearned, resulting in poor performance on certain downstream tasks. The compressed-sensing-based gray-scale image compression framework proposed by Yuan et al. does not consider efficient extension to color images. Although deep lossy image compression methods achieve good rate-distortion performance, their high deployment cost makes them unsuitable for Internet of things equipment with limited computing resources. To solve the above technical problems, the present embodiment provides an asymmetric semantic image compression algorithm for an Internet of things scene, in which a quantized sampling signal is first obtained and an image is reconstructed using a depth decoder. Semantic information is then extracted from the reconstructed image and used to train the lightweight linear encoder. Finally, the trained lightweight linear encoder is broadcast to the Internet of things equipment. In this embodiment, better rate-distortion performance is obtained through reconstruction based on residual fidelity blocks, and the accuracy of downstream tasks is guaranteed through rate-distortion optimization based on data semantics.
Exemplary method
The embodiment provides an asymmetric semantic image compression algorithm for an internet of things scene, and the embodiment can be applied to a server, wherein the server can be a cloud/edge server. As shown in fig. 1, the method comprises the steps of:
step S100, obtaining a quantized sampling signal, and performing image reconstruction by using a depth decoder based on the quantized sampling signal to obtain a reconstructed image.
In this embodiment, after the cloud/edge server receives the quantized sampling signal sent by the internet of things device, the depth decoder disposed on the cloud/edge server performs reconstruction processing on the quantized sampling signal to obtain a reconstructed image.
The quantized sampling signal is obtained by converting a continuous variation interval corresponding to the sampling signal into a discrete integer value. In reality, the original image signal acquired is a continuous gray signal, and the continuous signal cannot be transmitted in a channel, so that the continuous gray image signal needs to be divided into a plurality of sections, and each section takes a discrete integer value to represent the corresponding gray level. In this embodiment, a lightweight linear encoder on an internet of things device samples a target image to obtain a sampling signal, and converts a continuous variation interval corresponding to the sampling signal into a discrete integer value to obtain a quantized sampling signal.
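The interval-to-integer mapping described above can be sketched as uniform scalar quantization with a fixed step size. The fixed step is an assumption for illustration; the patent's learnable quantization would instead train this mapping end to end.

```python
import numpy as np

def quantize(y, step=0.5):
    # Map each continuous sample to a discrete integer index (nearest bin).
    return np.round(y / step).astype(np.int32)

def dequantize(q, step=0.5):
    # Recover an approximate continuous value from the integer index.
    return q.astype(np.float32) * step

y = np.array([0.12, -0.98, 2.34])
q = quantize(y)        # discrete integer values suitable for entropy coding
y_hat = dequantize(q)  # approximate reconstruction; the error is bounded by step/2
```

The integer indices `q` are what the arithmetic encoder turns into a bit stream; only the quantization error `|y - y_hat| <= step/2` is irrecoverably lost.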
The depth decoder is deployed on the cloud/edge server, and the image decoding method and the image encoding method are mutually corresponding processes. The quantized sample signal is obtained by encoding a target image on the internet of things device, so that the reconstructed image is obtained by reconstructing the quantized sample signal through a depth decoder on a cloud/edge server.
In one implementation manner, in this embodiment, the quantized sampling signal is obtained from the internet of things device, and the internet of things device obtains the quantized sampling signal through the following steps:
s10, acquiring a target image through a second lightweight linear encoder, and performing separation sampling processing on the target image to obtain a sampling signal;
s20, converting the sampling signal into the quantized sampling signal by the second lightweight linear encoder through learning quantization;
s30, entropy coding the quantized sampling signal by an arithmetic coder through the second lightweight linear coder to obtain a bit stream of the quantized sampling signal;
s40, uploading the bit stream to a server through the second lightweight linear encoder.
Specifically, the second lightweight linear encoder deployed on the Internet of things device acquires the target image and samples it in advance: after the Internet of things device receives the target image sent by the image acquisition device, it samples the image to obtain a sampling signal, converts the sampling signal into a quantized sampling signal, and sends it to the server as a bit stream. During signal transmission, a large amount of data causes channel congestion, and the signal is often affected by environmental factors such as noise and obstacles, so the information received by the terminal may be incomplete or missing, or its reception time may increase, manifesting as image blurring, image damage, and the like. Therefore, to improve the quality of the data received by the terminal, the original image is sampled; the terminal receives the sampled signal and successfully reconstructs the original image through measurement and optimized reconstruction. Sampling thus reduces the sampling rate while guaranteeing signal quality, and the reduced amount of sampled data significantly lowers the cost of transmitting and processing image and video data.
For example, after a front-end Internet of things device, such as a street monitoring camera or an unmanned-aerial-vehicle fire monitoring system, captures a target image, it sends the target image X to the Internet of things equipment. Upon receipt, the target image X is separation-sampled by the second lightweight linear encoder deployed on the device to obtain a sampling signal, which is then quantized and entropy coded into a bit stream of the quantized sampling signal and transmitted to the cloud/edge server, as shown in fig. 2. In fig. 2, the lightweight linear encoder is at the upper left and the JPEG encoder at the lower left; the RGB-to-YUV transform and the 2D-DCT are implemented as special convolution operations used to compare coding performance against the lightweight linear encoder of the invention, and the right side shows the depth decoder that completes image reconstruction. LQ denotes learnable quantization, AE and AD denote the arithmetic encoder and decoder respectively, and LE denotes the lossless encoder. Convolution parameters are written as kernel height × kernel width × number of filters / stride.
In one implementation, the step S10 in this embodiment includes the following steps:
s11, converting the RGB color space of the target image into a YUV color space through RGB-YUV conversion by the second lightweight linear encoder;
s12, dividing each YUV channel in the YUV color space of the target image into non-overlapping sampling blocks with the size of B multiplied by B through the second lightweight linear encoder, wherein the expression of the sampling blocks with the size of B multiplied by B is as follows:wherein H, W is the height and width of the target image, Y, U, V are Y, U, V channel subscripts;
s13, sampling the sampling block by the second lightweight linear encoder through a learnable linear sampling matrix, wherein the sampling process is that y i:j =Ax i:j WhereinIs the learnable linear sampling matrix, such that M<3B 2 Where M is the number of samples.
Specifically, sampling the color image signal along one color dimension and two spatial dimensions, respectively, may reduce the size of the sample matrix. Furthermore, the correlation in the RGB channel is higher than that in the YUV channel, as shown in fig. 3. Thus, the present invention samples each YUV channel independently because they have been de-correlated. As shown in fig. 4, the second lightweight linear encoder on the internet of things device of the present invention samples each YUV channel along the spatial dimension for the received color target image signal.
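The RGB-to-YUV de-correlation step is a fixed per-pixel linear transform. The BT.601 coefficients below are a common choice and an assumption here; the patent does not specify which YUV variant it uses.

```python
import numpy as np

# BT.601 full-range RGB -> YUV transform matrix (an assumed variant).
W = np.array([
    [ 0.299,     0.587,     0.114   ],
    [-0.14713,  -0.28886,   0.436   ],
    [ 0.615,    -0.51499,  -0.10001 ],
])

def rgb_to_yuv(img):
    """img: (H, W, 3) RGB array -> (H, W, 3) YUV array via a per-pixel matrix multiply."""
    return img @ W.T

gray = np.full((2, 2, 3), 0.5)  # a gray image: R = G = B at every pixel
yuv = rgb_to_yuv(gray)
# For gray pixels the chroma channels U and V come out (near) zero,
# illustrating why the YUV channels can then be sampled independently.
```

After this transform the channels are largely de-correlated, which is what justifies sampling each of Y, U, V on its own.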
The size of the sampling matrix can be reduced by performing a block-based compressed sensing sampling operation that divides each YUV channel into non-overlapping B×B blocks. Specifically, given a target image X, it is divided into non-overlapping B×B blocks x^{(c)}_{i,j}, i = 1, …, H/B, j = 1, …, W/B, c ∈ {Y, U, V}, where H, W are the height and width of the image, and each sample block is then sampled independently using a learnable linear sampling matrix. In the invention, the blocks are sparse with respect to an orthogonal basis Ψ, such as the discrete cosine transform (DCT). Each sampling step can then be expressed as y_{i,j} = A x_{i,j}, where A ∈ R^{M×B^2} is the learnable linear sampling matrix with M < 3B^2, M being the number of samples. Notably, this block-based sampling step can be represented by a convolution.
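The block-based sampling step can be sketched as a plain matrix multiply per block. The matrix `A` below is random rather than learned, and the sizes `B`, `M` are illustrative assumptions, not the patent's trained parameters.

```python
import numpy as np

B, M = 4, 8  # block size and measurements per block (M < B*B); illustrative values
rng = np.random.default_rng(0)
A = rng.standard_normal((M, B * B))  # stands in for the learnable linear sampling matrix

def sample_channel(channel, A, B):
    """Split one (H, W) channel into non-overlapping BxB blocks and sample each with A."""
    H, W = channel.shape
    assert H % B == 0 and W % B == 0
    out = np.empty((H // B, W // B, A.shape[0]))
    for i in range(H // B):
        for j in range(W // B):
            block = channel[i*B:(i+1)*B, j*B:(j+1)*B].reshape(-1)  # x_{i,j}
            out[i, j] = A @ block                                   # y_{i,j} = A x_{i,j}
    return out

Y = rng.standard_normal((16, 16))        # one de-correlated YUV channel
measurements = sample_channel(Y, A, B)   # each 16-pixel block reduced to 8 measurements
```

The same computation is what a strided convolution with kernel size B×B, stride B, and M filters performs, which is why the sampling matrix can be folded into a convolutional feature extractor.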
In one implementation manner, the asymmetric semantic image compression algorithm for the scene of the internet of things further comprises the following steps:
s50, integrating the learnable linear sampling matrix into a task-related semantic depth feature extractor with parameter sets as learnable parameters.
Specifically, the JPEG encoder is designed only for human-eye perception. Unlike the existing JPEG encoder, the lightweight linear encoder of the invention can be trained based on data-semantic rate distortion; that is, the method of the invention takes into account both human-eye perception and machine-learning-model perception. Each row of the learnable linear sampling matrix can be seen as a filter, so the sampling operation is equivalent to a series of convolution filters whose kernel size and stride are both B×B. From this point of view, the linear sampling matrix can be integrated as a learnable parameter into a task-dependent semantic depth feature extractor with a parameter set.
In one implementation, the step S100 in this embodiment includes the following steps:
s101, receiving a bit stream of the quantized sampling signal;
s102, inputting a bit stream of the quantized sampling signal into a depth decoder;
s103, entropy decoding is carried out on the bit stream of the quantized sampling signal by adopting an arithmetic decoder, so that the quantized sampling signal is obtained;
s104, carrying out reconstruction processing on the quantized sampling signals to obtain intermediate YUV reconstruction signals;
s105, performing fidelity processing on the intermediate YUV reconstruction signals to obtain fidelity recovery, and updating the intermediate YUV reconstruction signals into fidelity YUV reconstruction signals by using a multichannel gradient;
s106, extracting features from the fidelity restoration by using the semantic depth feature extractor with the parameter set and related to the task, and adding the features into the original features.
In the invention, the server receives the bit stream of the quantized sampling signal sent by the second lightweight linear encoder. The depth decoder further comprises an entropy decoding step and a synthesis transformation to obtain the quantized sampling signal. The received quantized sampling signal is reconstructed, the intermediate YUV reconstruction signal is extracted and recovered, and the fidelity YUV reconstruction signal is obtained through the fidelity processing, so that a large amount of multidimensional original data is recovered from a small amount of low-dimensional sampling data. The server can be a cloud/edge server connected with the Internet of things equipment.
Existing compressed sensing theory proves that if the sampling matrix satisfies the Restricted Isometry Property (RIP), the sparse optimization

x̂_{i,j} = argmin_x ‖y_{i,j} − A x‖₂² + ρ ‖Ψᵀ x‖₁

can be used to recover the (i, j)-th block, where ρ is a hyperparameter. In the invention, the sparse prior in conventional compressed sensing optimization is replaced by a learnable prior, as shown in formula (1), to improve the rate-distortion performance. In particular, we use a depth reconstruction function f_θ to learn the reconstruction from the quantized sampled signal to the original signal, where ŷ is the encoded and quantized vector, X is the target image, X̂ is the reconstructed image, Y, U, V are the Y, U, V channel subscripts, and W is the RGB-to-YUV transfer matrix.
Further, we solve this problem using an iterative gradient-unrolling method. First, we define the initial reconstruction as X̂⁽⁰⁾ = Ã(ŷ). We can then derive the iterative decoding algorithm

X̂⁽ᵏ⁾ = f_θ(X̂⁽ᵏ⁻¹⁾ − Ã(A X̂⁽ᵏ⁻¹⁾ − ŷ)), k = 1, …, K,

where K is the total number of iterations, X̂⁽ᵏ⁾ is the intermediate recovery, A X̂⁽ᵏ⁻¹⁾ − ŷ is the gradient of the data-fidelity term, and Ã(·) is a deconvolution operation.
Specifically, the invention learns the prior properties of the target image using the residual block of formula (2), obtaining a residual-block-based learnable prior. The quantized sampling signal is reconstructed using this learnable prior to obtain the intermediate YUV reconstruction signal, whose reconstruction quality is better than that of sparse-prior-based reconstruction. The intermediate YUV reconstruction signal is then updated into the fidelity YUV reconstruction signal using the multichannel gradient of formula (4), which reduces the reconstruction error. Finally, as shown in fig. 5, the features extracted from the fidelity restoration are added to the original features to correct accumulated errors in the feature layer.
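The iterative gradient decoding described above can be sketched for a single block as follows. This is a toy stand-in, not the patent's method: the patent's learnable prior f_θ is a trained residual network, while here `prior` is an identity placeholder, the "deconvolution" is simply Aᵀ, and the step size `eta` is an assumption chosen for stable plain gradient descent:

```python
import numpy as np

def decode_block(y, A, K=4000, prior=lambda x: x):
    """Toy unrolled decoder for one block.

    y: (M,) measurements of one block; A: (M, n) sampling matrix.
    Repeats: data-fidelity gradient step on ||y - A x||^2, then a prior
    operator (identity here; a trained residual network in the patent).
    """
    eta = 0.9 / np.linalg.norm(A, 2) ** 2      # step size below 1/lambda_max(A^T A)
    x = A.T @ y                                # initial reconstruction via A^T
    for _ in range(K):
        x = prior(x - eta * (A.T @ (A @ x - y)))
    return x

rng = np.random.default_rng(0)
M, n = 8, 16                                   # underdetermined: M < n
A = rng.standard_normal((M, n))
x_true = rng.standard_normal(n)
y = A @ x_true
x_hat = decode_block(y, A)
```

Even with the placeholder prior, the gradient steps drive the measurement residual ‖A x̂ − y‖ toward zero; the learned prior is what additionally pulls x̂ toward the true image among all measurement-consistent solutions.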
And step 200, extracting semantic information from the reconstructed image, training a first lightweight linear encoder by using the semantic information to obtain a trained lightweight linear encoder, and broadcasting the trained lightweight linear encoder to the Internet of things equipment.
In this embodiment, because the resources of the Internet of things device are limited, the deep neural network model cannot be trained or deployed on it; the model is therefore deployed on a cloud/edge server with stronger computing capability so that processing on the Internet of things device remains low in complexity. The deep neural network model on the cloud/edge server extracts reconstruction-inferred semantic information from the reconstructed image to obtain a data-semantic rate-distortion optimization target, and trains the first lightweight linear encoder with a data-semantic rate-distortion loss function, thereby obtaining a trained, learnable lightweight linear encoder. Finally, the trained lightweight linear encoder is broadcast to the Internet of things equipment, so that the lightweight linear encoder on the device can be learned without affecting the device's processing capacity, further ensuring the accuracy of downstream tasks.
In one implementation, the step S200 in this embodiment includes the following steps:
s201, extracting reconstruction inference semantic information from the reconstruction image;
s202, obtaining a loss of inferred accuracy for evaluating downstream tasks according to the reconstruction-inferred semantic information;
s203, obtaining a data-semantic rate distortion optimization target according to the loss of inferred accuracy for evaluating the downstream task;
s204, obtaining a data-semantic rate distortion loss function based on the data-semantic rate distortion optimization target;
s205, training the lightweight linear encoder by using the data-semantic rate distortion loss function.
Specifically, the reconstruction-inferred semantic information is inferred by a semantic deep neural network analysis model deployed on the cloud/edge server. The model extracts the reconstruction-inferred semantic information from the reconstructed image and obtains a loss of inferred accuracy for evaluating downstream tasks; this loss, together with the estimated bitrate loss and the human-eye perception loss, constitutes the data-semantic rate-distortion optimization objective, and the first lightweight linear encoder is trained with the following data-semantic rate-distortion loss function (5):
L = D_R + λ₁ D₁ + λ₂ D₂ = D_R + λ₁ d₁(X, X̂) + λ₂ d₂(z, ẑ),

where ŷ is the encoded and quantized vector, X is the target image, X̂ is the reconstructed image, z is the true semantic tag, ẑ is the semantic tag inferred from the reconstructed image by the downstream task model, d₁(·) is the MSE or another image-reconstruction-quality loss, d₂(·) is the expected semantic distortion, λ₁, λ₂ are the Lagrangian multipliers controlling the overall loss, D_R is the estimated bitrate loss, D₁ is the human-eye perception loss, and D₂ is the loss of inferred accuracy for evaluating downstream tasks.
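The structure of the data-semantic rate-distortion loss above can be sketched as follows. This is a minimal illustration under stated assumptions: the bitrate term D_R is passed in as a number (in practice it comes from a learned entropy model), d₁ is MSE, and d₂ is modeled here as softmax cross-entropy between true labels and downstream-model logits:

```python
import numpy as np

def ds_rate_distortion_loss(rate_est, x, x_hat, z, z_hat_logits, lam1=1.0, lam2=1.0):
    """Sketch of L = D_R + lam1 * d1(X, X_hat) + lam2 * d2(z, z_hat).

    rate_est:     D_R, estimated bitrate loss (a learned entropy model in practice).
    d1: MSE between target and reconstruction (human-eye perception term).
    d2: softmax cross-entropy between true labels z and the downstream
        model's logits, a stand-in for the expected semantic distortion.
    """
    d1 = np.mean((x - x_hat) ** 2)
    shifted = z_hat_logits - z_hat_logits.max(axis=-1, keepdims=True)
    logp = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    d2 = -np.mean(logp[np.arange(len(z)), z])
    return rate_est + lam1 * d1 + lam2 * d2

# Perfect reconstruction and confident correct labels: loss collapses to D_R.
x = np.zeros((4, 4))
z = np.array([0, 2, 1])
logits = np.full((3, 3), -50.0)
logits[np.arange(3), z] = 50.0
loss = ds_rate_distortion_loss(0.25, x, x, z, logits)
```

The Lagrangian multipliers λ₁ and λ₂ trade bitrate against the human-perception and semantic terms, which is how the encoder can be tuned toward a specific downstream task.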
In one implementation, the semantic deep neural network analysis model deployed on the cloud/edge server trains the first lightweight linear encoder to obtain the trained lightweight linear encoder, which is then broadcast to the Internet of things device. The invention can adaptively improve the semantic accuracy of specific downstream tasks, realize adaptive coding, and provide an asymmetric semantic image compression algorithm suitable for resource-constrained Internet of things scenes.
The lightweight linear encoder (CS-ASIC) based on the asymmetric semantic image compression method, and the same encoder trained with the data-semantic rate-distortion loss function, were tested on the Cityscapes and KITTI datasets to simulate resource-limited Internet of things equipment. Comparing against the mainstream image compression methods in current industry, with PSNR (Peak Signal-to-Noise Ratio), MS-SSIM (multiscale structural similarity), and mIoU (Mean Intersection over Union) as evaluation indexes, the comparison results shown in fig. 6 were obtained. The Cityscapes dataset is a large-scale dataset providing high-quality pixel-level annotations of 5000 street-view images from 50 different cities; it contains 19 foreground objects for image segmentation. The KITTI dataset is the principal image-processing dataset in the autonomous driving field.
FIG. 6 (a) shows the results of comparison tests of the two CS-ASIC variants against JPEG, WebP, H.264, DeepN-JPEG, and the Ballé (2017) method on a Jetson Nano B01 with the Cityscapes dataset, and FIG. 6 (b) shows the corresponding results on the KITTI dataset. It can be seen that the compression rates of the two CS-ASIC variants are 1.5-3.8 times and 1.5-2.5 times that of JPEG, respectively, and their performance is superior to the JPEG encoder. WebP and H.264 outperform JPEG because they use intra prediction to decorrelate neighboring blocks. DeepN-JPEG is superior to JPEG in the image segmentation task but inferior in the object detection task. Ballé (2017) is superior to CS-ASIC in data rate-distortion performance, but its encoding complexity is higher, making it unsuitable for Internet of things equipment with limited computing resources. In summary, the asymmetric semantic image compression method is more suitable for the Internet of things scene.
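The evaluation indexes used in these experiments can be computed as follows; this is a generic sketch of PSNR and mIoU (standard definitions, not code from the patent):

```python
import numpy as np

def psnr(x, x_hat, peak=255.0):
    """Peak signal-to-noise ratio in dB; assumes x != x_hat (mse > 0)."""
    mse = np.mean((x.astype(np.float64) - x_hat.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def miou(pred, gt, num_classes):
    """Mean intersection-over-union over classes present in pred or gt."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))

p = psnr(np.zeros((8, 8)), np.full((8, 8), 16.0))
m = miou(np.array([0, 1, 1]), np.array([0, 1, 0]), num_classes=2)
```

PSNR and MS-SSIM measure fidelity for the human eye, while mIoU measures the semantic accuracy of the downstream segmentation task, which is exactly the split the data-semantic rate-distortion objective optimizes.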
Exemplary System
Further, the invention also correspondingly provides an asymmetric semantic image compression system facing the Internet of things scene, which comprises: a server and an Internet of things device connected with the server, wherein the server comprises:
a reconstructed image obtaining module 10, configured to obtain a quantized sampling signal, and reconstruct an image based on the quantized sampling signal by using a depth decoder, so as to obtain a reconstructed image;
the lightweight linear encoder training module 20 is configured to extract semantic information from the reconstructed image, train a first lightweight linear encoder using the semantic information, obtain a trained lightweight linear encoder, and broadcast the trained lightweight linear encoder to an internet of things device.
The invention also provides a server, which comprises a memory 71, a processor 72 and an asymmetric semantic image compression program 73 which is stored in the memory 71 and can run on the processor 72 and faces to the scene of the internet of things, wherein when the processor 72 executes the asymmetric semantic image compression program 73 which faces to the scene of the internet of things, the steps of the asymmetric semantic image compression method which faces to the scene of the internet of things are realized.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program stored on a non-transitory computer-readable storage medium, which, when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), among others.
In summary, the invention discloses an asymmetric semantic image compression method, a system, a server and a storage medium for an Internet of things scene, wherein the method comprises the following steps: acquiring quantized sample signals and reconstructing an image using a depth decoder; extracting semantic information from the reconstructed image, using the semantic information to train the lightweight linear encoder, and broadcasting the trained lightweight linear encoder to Internet of things equipment. According to the invention, better rate distortion performance is obtained based on residual fidelity block reconstruction, and the accuracy of a downstream task is ensured based on rate distortion optimization of data semantics.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (8)
1. An asymmetric semantic image compression method for an internet of things scene is characterized by comprising the following steps:
acquiring a quantized sampling signal, and performing image reconstruction by using a depth decoder based on the quantized sampling signal to obtain a reconstructed image;
extracting semantic information from the reconstructed image, training a first lightweight linear encoder by using the semantic information to obtain a trained lightweight linear encoder, and broadcasting the trained lightweight linear encoder to Internet of things equipment;
the quantized sampling signal is obtained from the internet of things equipment, and the internet of things equipment is used for:
acquiring a target image through a second lightweight linear encoder, and performing separation sampling processing on the target image to obtain a sampling signal;
converting the sampled signal into the quantized sampled signal by the second lightweight linear encoder using a learnable quantization;
entropy encoding the quantized sample signal by an arithmetic encoder through the second lightweight linear encoder to obtain a bit stream of the quantized sample signal;
uploading the bitstream to a server by the second lightweight linear encoder;
the step of obtaining a target image through the second lightweight linear encoder and performing separation sampling processing on the target image comprises the following steps:
converting an RGB color space of the target image into a YUV color space through RGB-YUV conversion by the second lightweight linear encoder;
dividing each YUV channel in the YUV color space of the target image into non-overlapping B×B-sized sample blocks by the second lightweight linear encoder, wherein the expression for the B×B-sized sample blocks is x^c_{i,j}, 1 ≤ i ≤ H/B, 1 ≤ j ≤ W/B, c ∈ {Y, U, V},
wherein H, W are the height and width of the target image, and Y, U, V are the Y, U, V channel subscripts;
sampling the sample blocks by the second lightweight linear encoder using a learnable linear sampling matrix, wherein the sampling process is y_{i,j} = A x_{i,j}, where A is the learnable linear sampling matrix, such that M < 3B², where M is the number of samples.
2. The method for compressing asymmetric semantic images for an internet of things scene as recited in claim 1, further comprising:
the learnable linear sampling matrix is integrated as a learnable parameter into a task-dependent semantic depth feature extractor with a parameter set.
3. The asymmetric semantic image compression method of claim 2 wherein the acquiring quantized sample signals and reconstructing an image using a depth decoder comprises:
receiving a bit stream of the quantized sample signal;
inputting a bit stream of the quantized sample signal into a depth decoder;
entropy decoding is carried out on the bit stream of the quantized sampling signal by adopting an arithmetic decoder, so that the quantized sampling signal is obtained;
reconstructing the quantized sampling signal to obtain an intermediate YUV reconstruction signal;
performing fidelity processing on the intermediate YUV reconstruction signals to obtain fidelity recovery, and updating the intermediate YUV reconstruction signals into the fidelity YUV reconstruction signals by using a multichannel gradient;
extracting features from the fidelity restoration by using the semantic depth feature extractor with parameter sets and related to the task, and adding the features into original features;
the reconstructing the quantized sampling signal to obtain an intermediate YUV reconstructed signal includes:
learning the priori property of the target image by using a residual block to obtain a learnable priori based on the residual block;
reconstructing the quantized sampling signal by using the residual block-based learnable prior to obtain the intermediate YUV reconstruction signal.
4. The method of asymmetric semantic image compression according to claim 1, wherein the extracting semantic information from the reconstructed image, training a first lightweight linear encoder using the semantic information, comprises:
extracting reconstruction inferred semantic information from the reconstructed image;
obtaining a loss of inferred accuracy for evaluating the downstream task according to the reconstruction-inferred semantic information;
obtaining a data-semantic rate distortion optimization target according to the loss of the inferred accuracy rate for evaluating the downstream task;
obtaining a data-semantic rate distortion loss function based on the data-semantic rate distortion optimization target;
training the lightweight linear encoder with the data-semantic rate-distortion loss function.
5. The asymmetric semantic image compression method of claim 4 wherein the data-semantic rate distortion optimization objective includes estimated bitrate loss, human eye perceived loss and loss of inferred accuracy for evaluation of downstream tasks;
the data-semantic rate distortion loss function is:
wherein,,is the encoded and quantized vector, X is the target image, < >>Is the reconstructed image, z is the true semantic tag,>is a semantic tag which is inferred and generated by the reconstructed image through a downstream task model, d 1 (. Cndot.) is MSE or other loss of image reconstruction quality, d 2 (. Lambda.) is the expected semantic distortion 1 ,λ 2 Is the Lagrangian multiplier, D, controlling overall loss R Is the estimated bit rate loss, D 1 Is the perception loss of human eyes, D 2 The loss of accuracy is inferred for the evaluation of downstream tasks.
6. An asymmetric semantic image compression system oriented to an Internet of things scene, characterized by comprising: a server and an Internet of things device connected with the server, wherein the server comprises:
the reconstruction image acquisition module is used for acquiring a quantized sampling signal, and performing image reconstruction by using a depth decoder based on the quantized sampling signal to obtain a reconstruction image;
the lightweight linear encoder training module is used for extracting semantic information from the reconstructed image, training a first lightweight linear encoder by using the semantic information to obtain a trained lightweight linear encoder, and broadcasting the trained lightweight linear encoder to the Internet of things equipment;
the quantized sampling signal is obtained from the internet of things equipment, and the internet of things equipment is used for:
acquiring a target image through a second lightweight linear encoder, and performing separation sampling processing on the target image to obtain a sampling signal;
converting the sampled signal into the quantized sampled signal by the second lightweight linear encoder using a learnable quantization;
entropy encoding the quantized sample signal by an arithmetic encoder through the second lightweight linear encoder to obtain a bit stream of the quantized sample signal;
uploading the bitstream to a server by the second lightweight linear encoder;
the step of obtaining a target image through the second lightweight linear encoder and performing separation sampling processing on the target image comprises the following steps:
converting an RGB color space of the target image into a YUV color space through RGB-YUV conversion by the second lightweight linear encoder;
dividing each YUV channel in the YUV color space of the target image into non-overlapping B×B-sized sample blocks by the second lightweight linear encoder, wherein the expression for the B×B-sized sample blocks is x^c_{i,j}, 1 ≤ i ≤ H/B, 1 ≤ j ≤ W/B, c ∈ {Y, U, V},
wherein H, W are the height and width of the target image, and Y, U, V are the Y, U, V channel subscripts;
sampling the sample blocks by the second lightweight linear encoder using a learnable linear sampling matrix, wherein the sampling process is y_{i,j} = A x_{i,j}, where A is the learnable linear sampling matrix, such that M < 3B², where M is the number of samples.
7. A server, characterized in that the server comprises a memory, a processor and an asymmetric semantic image compression program which is stored in the memory and can run on the processor and faces to an internet of things scene, and when the processor executes the asymmetric semantic image compression program which faces to the internet of things scene, the steps of the asymmetric semantic image compression method which faces to the internet of things scene according to any one of claims 1-5 are realized.
8. A storage medium, wherein the storage medium stores one or more programs executable by one or more processors to implement the steps of an asymmetric semantic image compression method for an internet of things scenario according to any one of claims 1-5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210445325.5A CN114915786B (en) | 2022-04-26 | 2022-04-26 | Asymmetric semantic image compression method for Internet of things scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210445325.5A CN114915786B (en) | 2022-04-26 | 2022-04-26 | Asymmetric semantic image compression method for Internet of things scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114915786A CN114915786A (en) | 2022-08-16 |
CN114915786B true CN114915786B (en) | 2023-07-28 |
Family
ID=82765249
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210445325.5A Active CN114915786B (en) | 2022-04-26 | 2022-04-26 | Asymmetric semantic image compression method for Internet of things scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114915786B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115767108B (en) * | 2022-10-20 | 2023-11-07 | 哈尔滨工业大学(深圳) | Distributed image compression method and system based on feature domain matching |
CN115496818B (en) * | 2022-11-08 | 2023-03-10 | 之江实验室 | Semantic graph compression method and device based on dynamic object segmentation |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110009013A (en) * | 2019-03-21 | 2019-07-12 | 腾讯科技(深圳)有限公司 | Encoder training and characterization information extracting method and device |
WO2020142077A1 (en) * | 2018-12-31 | 2020-07-09 | Didi Research America, Llc | Method and system for semantic segmentation involving multi-task convolutional neural network |
CN113688836A (en) * | 2021-09-28 | 2021-11-23 | 四川大学 | Real-time road image semantic segmentation method and system based on deep learning |
CN114067162A (en) * | 2021-11-24 | 2022-02-18 | 重庆邮电大学 | Image reconstruction method and system based on multi-scale and multi-granularity feature decoupling |
CN114143040A (en) * | 2021-11-08 | 2022-03-04 | 浙江工业大学 | Confrontation signal detection method based on multi-channel feature reconstruction |
- 2022-04-26 CN CN202210445325.5A patent/CN114915786B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN114915786A (en) | 2022-08-16 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |