CN110225350B - Natural image compression method based on generative adversarial network
- Publication number
- CN110225350B (application CN201910460717.7A)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4046—Scaling of whole images or parts thereof, e.g. expanding or contracting using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N19/00—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
- H04N19/42—Methods or arrangements for coding, decoding, compressing or decompressing digital video signals characterised by implementation details or hardware specially adapted for video compression or decompression, e.g. dedicated software implementation
Abstract
The invention discloses a natural image compression method based on a generative adversarial network. It addresses two problems of existing natural image compression methods: low restoration quality at high compression ratios, and the data dependency of generative compression. The specific steps are: (1) constructing an image compression generative network; (2) training an image decoding sub-network; (3) training an image coding sub-network; (4) preprocessing a natural image; (5) acquiring compressed data; (6) acquiring a restored image. The method compresses the original image data with a convolutional neural network, generates an image from the compressed data with the generation module of the generative adversarial network, and constrains the generation module with the discrimination module of the same network, thereby achieving high-quality image restoration.
Description
Technical Field
The invention belongs to the technical field of image processing, and further relates to a natural image compression method based on a generative adversarial network in the technical field of image compression. With limited storage resources, the method can be used to obtain compressed data by reducing the amount of redundant data in a natural image, and to generate an image similar to the original image from the compressed data.
Background
Image compression technology has brought a major breakthrough in reducing redundant information in image data and in relieving storage and transmission pressure. It shows that, under certain conditions, an original image can be recovered from a data volume far smaller than that of the original image, which saves a large amount of resources. Neural-network-based image compression methods are divided into generative and non-generative compression according to whether the restoration model is a generative model. Owing to the adversarial nature of the network, generative compression degrades the restored image gracefully as the compression ratio increases, so that the distortion is more consistent with the characteristics of human vision; however, it has a serious data dependency and can only restore a single class of natural images. Non-generative compression is not limited by the training data set and can compress natural images of various classes, but the restored image is severely distorted at high compression ratios.
Shibani Santurkar et al., in their paper "Generative Compression" (Computer Vision and Pattern Recognition, 2017, Hawaii), propose a natural image compression method that uses the generator of a generative adversarial network as the restoration model. The method first extracts deep features of the original image with a coding network based on a convolutional neural network to obtain compressed bit-stream data, and then inputs the compressed bit-stream data into the trained generator of a generative adversarial network to generate a restored image. The drawback of the method is that the independently trained generator can only generate a single class of natural images, so it has a serious data dependency and cannot restore images of different classes.
Sichuan University, in its patent application "Still image compression method based on a deep convolutional neural network" (patent application No. 201710287432.9, publication No. CN 107018422A), discloses a natural image compression method that combines a convolutional neural network with a conventional encoder. The method first encodes the original image with a conventional encoder, then computes the difference between the restored image and the original image with a loss function based on the peak signal-to-noise ratio and trains a convolutional neural network end to end, and finally restores the original image with the trained network. The drawback of the method is that, at high compression ratios, its loss function does not favor preserving the overall image structure, so the restoration distortion does not match the visual characteristics of the human eye and the quality of the restored image is low.
Disclosure of Invention
The present invention aims to overcome the defects of the prior art by providing a natural image compression method based on a generative adversarial network. The invention can generate a restored image similar to the original image at high compression ratios while also solving the data dependency problem, so that the restored image better matches the characteristics of human vision.
The idea for achieving this purpose is as follows. A convolutional neural network compresses the original image data, and the generation module of a generative adversarial network generates an image from the compressed data. The mutual information between the compressed data and the restored image is used as an additional optimization target of the discrimination module, so that the corresponding image is restored from compressed data of different classes, which achieves restoration of multiple classes of images. A mixed loss is used as the optimization target for generating a restored image similar to the original image, which achieves higher-quality image restoration.
The method comprises the following specific steps:
(1) constructing an image compression generative network:
(1a) building a 7-layer image coding sub-network whose structure is, in sequence: the first convolution layer → the second convolution layer → the first normalization layer → the third convolution layer → the second normalization layer → the fourth convolution layer → the third normalization layer;
(1b) constructing an image decoding sub-network consisting of a generation module and a discrimination module;
the structure of the generation module is as follows in sequence: the fourth normalization layer → the fifth convolution layer → the fifth normalization layer → the sixth convolution layer → the sixth normalization layer → the seventh convolution layer → the seventh normalization layer → the eighth convolution layer;
the structure of the discrimination module is as follows in sequence: a ninth convolution layer → a tenth convolution layer → an eighth normalization layer → an eleventh convolution layer → a ninth normalization layer → a twelfth convolution layer → a tenth normalization layer → a spectral normalization layer;
connecting the eighth convolution layer in the generation module with the ninth convolution layer in the discrimination module to obtain an image decoding subnetwork;
(1c) connecting the third normalization layer in the image coding sub-network with the fourth normalization layer in the image decoding sub-network to obtain a natural image compression network based on a generative adversarial network;
(1d) setting parameters of each layer of the image coding sub-network;
(1e) setting parameters of each layer of a generation module of an image decoding sub-network;
(1f) setting parameters of each layer of a discrimination module of an image decoding subnetwork;
(2) training the image decoding subnetwork:
(2a) randomly selecting 180000 images from a natural image data set to form a training set;
(2b) sequentially inputting each image in the training set into an image coding sub-network, and outputting a compressed data sequence corresponding to each image in the training set; inputting each compressed data sequence into a generation module in an image decoding sub-network, and outputting a restored image corresponding to each image in a training set; inputting each image in the training set and the corresponding restored image into a discrimination module in an image decoding subnetwork, and calculating a weighted total loss value corresponding to each image in the training set by using a weighted total loss formula;
(2c) using the network parameter update formula of the stochastic gradient descent algorithm, with minimization of the weighted total loss value as the target, updating the network parameters in the generation module and the discrimination module to obtain a trained image decoding sub-network;
(3) training an image coding subnetwork:
(3a) sequentially inputting each image in the training set into a VGGNet19 model, and outputting a deep characteristic diagram corresponding to each image in the training set; sequentially inputting the restoration images corresponding to each image in the training set into the VGGNet19 model, and outputting the deep feature map corresponding to each restoration image;
(3b) calculating a mixed loss value of each image in the training set and the corresponding restored image by using a mixed loss formula;
(3c) using the network parameter update formula of the stochastic gradient descent algorithm, with minimization of the mixed loss value as the target, updating the network parameters of the image coding sub-network to obtain a trained image coding sub-network;
(4) preprocessing a natural image:
cropping each natural image to a size of 64 × 64 pixels;
(5) acquiring compressed data:
inputting the preprocessed natural image into a trained image coding sub-network, and outputting compressed data by a third normalization layer in the sub-network;
(6) acquiring a restored image:
the compressed data is input to a generation module in the trained image decoding subnetwork, and a restored image is output by the eighth convolution layer of the generation module.
Compared with the prior art, the invention has the following advantages:
First, the invention constructs and trains the image decoding sub-network and uses the mutual information between the compressed data and the restored image as an additional optimization target, so that the corresponding image is restored from compressed data of different classes. This overcomes the problem in the prior art that an independently trained generator can only generate a single class of natural images, has a serious data dependency, and cannot restore images of different classes, and thus achieves restoration of images of different classes.
Second, the invention constructs and trains the image coding sub-network and uses the mixed loss as the optimization target for generating a restored image similar to the original image. This overcomes the problem in the prior art that the overall structure of the image is lost at high compression ratios when the peak signal-to-noise ratio is the optimization target, and thus achieves higher-quality image restoration.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a graph showing the results of simulation experiment 1;
FIG. 3 is a graph showing the results of simulation experiment 2.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The steps of the present invention are further described with reference to fig. 1.
Step 1, constructing an image compression generative network.
An image coding sub-network is built whose structure is, in sequence: the first convolution layer → the second convolution layer → the first normalization layer → the third convolution layer → the second normalization layer → the fourth convolution layer → the third normalization layer.
An image decoding sub-network consisting of a generation module and a discrimination module is constructed.
The structure of the generation module is, in sequence: the fourth normalization layer → the fifth convolution layer → the fifth normalization layer → the sixth convolution layer → the sixth normalization layer → the seventh convolution layer → the seventh normalization layer → the eighth convolution layer.
The structure of the discrimination module is, in sequence: the ninth convolution layer → the tenth convolution layer → the eighth normalization layer → the eleventh convolution layer → the ninth normalization layer → the twelfth convolution layer → the tenth normalization layer → the spectral normalization layer.
The eighth convolution layer in the generation module is connected with the ninth convolution layer in the discrimination module to obtain the image decoding sub-network.
The third normalization layer in the image coding sub-network is connected with the fourth normalization layer in the image decoding sub-network to obtain the image compression generative network.
The parameters of each layer of the image coding sub-network are set as follows:
the convolution kernel sizes of the first, second, third and fourth convolution layers are all set to 5 × 5, the strides are all set to 2, and the edge padding modes are all set to SAME;
the means of the first, second and third normalization layers are set to 0 and the variances are set to 1.
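As a quick check of the encoder geometry, the following sketch traces the spatial size of a 64 × 64 input through the four stride-2, SAME-padded convolution layers of the image coding sub-network. It assumes the common SAME padding rule, under which the output size is ceil(input / stride). The function names are illustrative, and channel counts are omitted because the text does not give them.

```python
import math


def same_conv_output_size(size, stride=2):
    """Spatial output size of a SAME-padded convolution: ceil(size / stride)."""
    return math.ceil(size / stride)


def encoder_spatial_sizes(input_size=64, num_conv_layers=4, stride=2):
    """Return the spatial size before and after each encoder convolution layer."""
    sizes = [input_size]
    for _ in range(num_conv_layers):
        sizes.append(same_conv_output_size(sizes[-1], stride))
    return sizes


print(encoder_spatial_sizes())  # [64, 32, 16, 8, 4]
```

Under these assumptions, each 64 × 64 input is reduced to a 4 × 4 spatial grid at the output of the third normalization layer.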
The parameters of each layer of the generation module of the image decoding sub-network are set as follows:
the means of the fourth, fifth, sixth and seventh normalization layers are set to 0 and the variances are set to 1;
the convolution kernel sizes of the fifth, sixth, seventh and eighth convolution layers are all set to 5 × 5, the strides are all set to 2, and the edge padding modes are all set to SAME.
The parameters of each layer of the discrimination module of the image decoding sub-network are set as follows:
the convolution kernel sizes of the eighth, ninth, tenth, eleventh and twelfth convolution layers are all set to 5 × 5, the strides are all set to 2, and the edge padding modes are all set to SAME;
the means of the eighth, ninth and tenth normalization layers are set to 0 and the variances are set to 1;
the normalization target of the spectral normalization layer is set to the maximum singular value of the network parameter matrix of the current layer.
The maximum singular value of the network parameter matrix of the current layer is calculated by the following formula:
$$\sigma(W) = \max_{\xi \in R,\ \xi \neq 0} \frac{\|W\xi\|}{\|\xi\|}$$
where σ(W) denotes the maximum singular value of the network parameter matrix of the layer, max denotes the maximum-value operation, ξ ∈ R denotes that ξ is an element of R, ||·|| denotes the spectral norm operation, and W denotes the network parameter matrix of the layer.
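The maximum singular value used by the spectral normalization layer can be estimated numerically. The sketch below uses power iteration on WᵀW, which is a common way to approximate it; the text does not state which numerical procedure is used, so the algorithm, the function names and the iteration count are assumptions.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def l2_norm(vector):
    return math.sqrt(sum(v * v for v in vector))


def largest_singular_value(matrix, num_iters=100):
    """Estimate sigma(W) by power iteration on W^T W."""
    w_t = transpose(matrix)
    # Arbitrary non-zero start; it must not be orthogonal to the top singular vector.
    xi = [1.0] * len(matrix[0])
    for _ in range(num_iters):
        xi = matvec(w_t, matvec(matrix, xi))
        norm = l2_norm(xi)
        if norm == 0.0:
            return 0.0
        xi = [v / norm for v in xi]
    # xi has unit length, so ||W xi|| approximates the largest singular value.
    return l2_norm(matvec(matrix, xi))


def spectral_normalize(matrix, num_iters=100):
    """Divide W by its estimated largest singular value (assumed non-zero)."""
    sigma = largest_singular_value(matrix, num_iters)
    return [[w / sigma for w in row] for row in matrix]


W = [[3.0, 0.0], [0.0, 4.0]]
print(round(largest_singular_value(W), 6))  # 4.0
```

After normalization the largest singular value of the parameter matrix is approximately 1, which is the usual purpose of a spectral normalization layer in a discriminator.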
Step 2, training the image decoding sub-network.
180000 images are randomly selected from the natural image data set to form a training set.
Each image in the training set is input in turn into the image coding sub-network, which outputs the compressed data sequence corresponding to each image in the training set; each compressed data sequence is input into the generation module of the image decoding sub-network, which outputs the restored image corresponding to each image in the training set; each image in the training set and its corresponding restored image are input into the discrimination module of the image decoding sub-network, and the weighted total loss value corresponding to each image in the training set is calculated with the weighted total loss formula.
The weighted total loss formula is as follows:
$$l_i = \lambda_1 l_{iD} + \lambda_2 l_{iI}$$
where $l_i$ denotes the weighted total loss value of the ith image in the training set; $\lambda_1$ and $\lambda_2$ denote weight coefficients, two unequal decimals randomly selected in the range [0,1] whose sum equals 1; $l_{iD}$ denotes the distance loss value between the ith image in the training set and its corresponding restored image; and $l_{iI}$ denotes the mutual information loss value between the ith group of compressed data and its corresponding restored image.
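The weighting can be illustrated with a small sketch. It takes the two loss values as already computed and enforces the stated constraints on the weights (both in [0,1], unequal, summing to 1); the function name and the error handling are illustrative assumptions.

```python
def weighted_total_loss(distance_loss, mutual_info_loss, lambda_1):
    """Combine the two losses as lambda_1 * l_D + lambda_2 * l_I, lambda_2 = 1 - lambda_1."""
    if not 0.0 <= lambda_1 <= 1.0:
        raise ValueError("lambda_1 must lie in [0, 1]")
    lambda_2 = 1.0 - lambda_1
    if lambda_1 == lambda_2:
        raise ValueError("the two weights must be unequal")
    return lambda_1 * distance_loss + lambda_2 * mutual_info_loss


print(weighted_total_loss(2.0, 4.0, lambda_1=0.25))  # 3.5
```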
The distance loss value is calculated by the following formula:
where m denotes the total number of channels of the ith image in the training set, w and h denote the width and height of the ith image respectively, n denotes the total number of pixels of the ith image, j denotes the index of a pixel in the ith image, Σ denotes the summation operation, $\|\cdot\|_2$ denotes the two-norm operation, $y_{i,j}$ denotes the value of the jth pixel in the restored image corresponding to the ith image, and $x_{i,j}$ denotes the value of the jth pixel in the ith image.
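The distance loss formula itself is not reproduced in the text, so the following is only a sketch of one form consistent with the symbol definitions: the squared two-norm of the per-pixel difference, summed over the n pixels and divided by m · w · h. Both the squaring and the normalization are assumptions, as is the representation of an image as a list of n pixels, each a tuple of m channel values.

```python
def distance_loss(original, restored, width, height):
    """Assumed form: sum_j ||y_j - x_j||_2^2 / (m * w * h)."""
    n = width * height
    if len(original) != n or len(restored) != n:
        raise ValueError("each image must contain width * height pixels")
    m = len(original[0])  # number of channels per pixel
    total = 0.0
    for x_pixel, y_pixel in zip(original, restored):
        total += sum((y - x) ** 2 for x, y in zip(x_pixel, y_pixel))
    return total / (m * width * height)


original = [(0, 0, 0)] * 4   # 2 x 2 image, 3 channels
restored = [(1, 1, 1)] * 4
print(distance_loss(original, restored, width=2, height=2))  # 1.0
```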
The mutual information loss value is calculated by the following formula:
$$l_{iI} = E[\ln Q(c_t, y_i)] + P(c_t)\log_2 P(c_t)$$
where E[·] denotes the expectation operation, ln denotes the logarithm to base e, $Q(c_t, y_i)$ denotes the probability distribution of the compressed data $c_t$ corresponding to the ith image and the restored image $y_i$ corresponding to the ith image, $P(c_t)$ denotes the probability distribution of the compressed data $c_t$ corresponding to the ith image, and $\log_2$ denotes the logarithm to base 2.
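A literal numerical reading of this formula is sketched below. It assumes that the expectation is estimated by a sample mean over a list of Q values and that P(c_t) is supplied as a single probability; how Q and P are obtained from the discrimination module is not described here, so both inputs and the function name are hypothetical.

```python
import math


def mutual_info_loss(q_values, p_c):
    """Evaluate E[ln Q] + P * log2(P), with E estimated as a sample mean."""
    expectation = sum(math.log(q) for q in q_values) / len(q_values)
    return expectation + p_c * math.log2(p_c)


print(mutual_info_loss([1.0, 1.0], p_c=0.5))  # -0.5
```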
Using the network parameter update formula of the stochastic gradient descent algorithm, with minimization of the weighted total loss value as the target, the network parameters in the generation module and the discrimination module are updated to obtain the trained image decoding sub-network.
The network parameter update formula of the stochastic gradient descent algorithm is as follows:
$$\theta_{v+1} = \theta_v - L'(\theta_v)$$
where $\theta_{v+1}$ denotes the network parameters of the generation module and the discrimination module after the (v+1)th update, $\theta_v$ denotes the network parameters of the generation module and the discrimination module after the vth update, L′ denotes the partial derivative operation, and $L'(\theta_v)$ denotes the partial derivative of the weighted total loss value L(θ) at the network parameters $\theta_v$.
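The update rule can be illustrated on a toy problem. The sketch below applies the step to a small quadratic loss with a known gradient; the toy loss, the target values and the optional `learning_rate` argument (left at 1.0 to match the formula as written) are assumptions for illustration, and the gradient here is exact rather than estimated from random mini-batches.

```python
def sgd_step(theta, grad_fn, learning_rate=1.0):
    """One update: theta_{v+1} = theta_v - learning_rate * L'(theta_v)."""
    gradient = grad_fn(theta)
    return [t - learning_rate * g for t, g in zip(theta, gradient)]


def toy_gradient(theta, target=(1.0, -2.0)):
    """Gradient of the toy loss L(theta) = 0.25 * sum((theta_k - target_k)^2)."""
    return [0.5 * (t - c) for t, c in zip(theta, target)]


theta = [0.0, 0.0]
for _ in range(20):
    theta = sgd_step(theta, toy_gradient)
print([round(t, 4) for t in theta])  # [1.0, -2.0]
```

Each step halves the distance to the minimizer of the toy loss, so the parameters converge to the target values.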
Step 3, training the image coding sub-network.
Each image in the training set is input in turn into the VGGNet19 model, which outputs the deep feature map corresponding to each image in the training set; the restored image corresponding to each image in the training set is input in turn into the VGGNet19 model, which outputs the deep feature map corresponding to each restored image.
The mixed loss value between each image in the training set and its corresponding restored image is calculated with the mixed loss formula.
The mixed loss formula is as follows:
$$J_i = \alpha_1 J_{iD} + \alpha_2 J_{iV}$$
where $J_i$ denotes the mixed loss value of the ith image in the training set; $\alpha_1$ and $\alpha_2$ denote weight coefficients, two unequal decimals randomly selected in the range [0,1] whose sum equals 1; $J_{iD}$ denotes the distance loss value between the ith image in the training set and its corresponding restored image; and $J_{iV}$ denotes the perceptual loss value between the ith image and its corresponding restored image.
The distance loss value is calculated by the following formula:
where m denotes the total number of channels of the ith image in the training set, w and h denote the width and height of the ith image respectively, n denotes the total number of pixels of the ith image, j denotes the index of a pixel in the ith image, Σ denotes the summation operation, $\|\cdot\|_2$ denotes the two-norm operation, $y_{i,j}$ denotes the value of the jth pixel in the restored image corresponding to the ith image, and $x_{i,j}$ denotes the value of the jth pixel in the ith image.
The perceptual loss value is calculated by the following formula:
where f denotes the total number of channels of the deep feature map corresponding to the ith image in the training set, g and d denote the width and height of the deep feature map corresponding to the ith image respectively, u denotes the total number of pixels of the deep feature map corresponding to the ith image, k denotes the index of a pixel in the ith deep feature map, Σ denotes the summation operation, $\|\cdot\|_2$ denotes the two-norm operation, $a_{i,k}$ denotes the value of the kth pixel in the deep feature map of the restored image corresponding to the ith image, and $b_{i,k}$ denotes the value of the kth pixel in the deep feature map of the ith image.
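The perceptual loss formula is likewise not reproduced in the text. The sketch below assumes a form parallel to the symbol definitions: the squared two-norm of the per-position feature difference, summed over the u positions and divided by f · g · d, and then combines it with a distance loss value using the mixed loss weights. The deep feature maps are taken as precomputed inputs (the VGGNet19 model is not reproduced), and the function names and the squared-norm form are assumptions.

```python
def perceptual_loss(features_original, features_restored, width, height):
    """Assumed form: sum_k ||a_k - b_k||_2^2 / (f * g * d)."""
    u = width * height
    if len(features_original) != u or len(features_restored) != u:
        raise ValueError("each feature map must contain width * height positions")
    f = len(features_original[0])  # number of feature channels
    total = 0.0
    for b_pos, a_pos in zip(features_original, features_restored):
        total += sum((a - b) ** 2 for a, b in zip(a_pos, b_pos))
    return total / (f * width * height)


def mixed_loss(distance_loss_value, perceptual_loss_value, alpha_1):
    """J = alpha_1 * J_D + alpha_2 * J_V with alpha_2 = 1 - alpha_1."""
    alpha_2 = 1.0 - alpha_1
    return alpha_1 * distance_loss_value + alpha_2 * perceptual_loss_value


feat_orig = [(0.0, 0.0)] * 4   # 2 x 2 feature map, 2 channels
feat_rest = [(2.0, 2.0)] * 4
j_v = perceptual_loss(feat_orig, feat_rest, width=2, height=2)
print(j_v)                        # 4.0
print(mixed_loss(1.0, j_v, 0.25))  # 3.25
```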
Using the network parameter update formula of the stochastic gradient descent algorithm, with minimization of the mixed loss value as the target, the network parameters of the image coding sub-network are updated to obtain the trained image coding sub-network.
The network parameter update formula of the stochastic gradient descent algorithm is as follows:
$$\theta_{v+1} = \theta_v - L'(\theta_v)$$
where $\theta_{v+1}$ denotes the network parameters of the image coding sub-network after the (v+1)th update, $\theta_v$ denotes the network parameters of the image coding sub-network after the vth update, L′ denotes the partial derivative operation, and $L'(\theta_v)$ denotes the partial derivative of the mixed loss value L(θ) at the network parameters $\theta_v$.
Step 4, preprocessing the natural image.
Each natural image is cropped to a size of 64 × 64 pixels.
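A minimal sketch of this preprocessing step follows. The text does not say which region of the image is kept, so the center crop is an assumption, as are the function name and the representation of an image as a list of rows.

```python
def center_crop(image, size=64):
    """Crop a 2-D image (list of rows) to size x size around its center."""
    height = len(image)
    width = len(image[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the crop size")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


# Example: an 80 x 100 image whose pixels record their own (row, column) position.
image = [[(r, c) for c in range(100)] for r in range(80)]
cropped = center_crop(image)
print(len(cropped), len(cropped[0]), cropped[0][0])  # 64 64 (8, 18)
```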
Step 5, acquiring compressed data.
The preprocessed natural image is input into the trained image coding sub-network, and the compressed data is output by the third normalization layer of the sub-network.
Step 6, acquiring a restored image.
The compressed data is input into the generation module of the trained image decoding sub-network, and the restored image is output by the eighth convolution layer of the generation module.
The effect of the present invention is further explained below with reference to simulation experiments:
1. Simulation experiment conditions:
The hardware platform of the simulation experiments is: NVIDIA TITAN Xp GPUs as the processor, a clock frequency of 3.4 GHz, and 128 GB of memory.
The software platform of the simulation experiments is: the ios operating system and Python 2.7.
2. Simulation content and result analysis:
Two simulation experiments were carried out.
Simulation experiment 1:
Simulation experiment 1 uses the present invention and two prior-art methods (the JPEG image compression method and the generative compression GC method) to compress, at 3 different compression ratios, 5 test original images randomly selected from the face image set CelebA, and obtains the restored images.
The two prior-art methods used in simulation experiment 1 are:
The prior-art JPEG image compression method refers to the first international digital image compression standard for still images, created by the Joint Photographic Experts Group, which was formed by the International Organization for Standardization (ISO) and the International Telegraph and Telephone Consultative Committee (CCITT); it is referred to as the JPEG image compression method for short.
The prior-art generative compression GC method refers to the image compression method based on a generative adversarial network proposed by Shibani Santurkar et al. in "Generative Compression" (Computer Vision and Pattern Recognition, 2017, Hawaii); it is referred to as the generative compression GC method for short.
The effect of the present invention will be further described with reference to the simulation diagram of fig. 2.
FIG. 2(a) shows the 5 test original images of simulation experiment 1, randomly selected from the CelebA data set. The CelebA data set consists of 202599 face images of 10177 celebrities, collected and organized by the Chinese University of Hong Kong. Each image is 64 × 64 pixels in size. FIG. 2(b) shows the restored images obtained by compressing the 5 test original images 140 times with the method of the present invention, FIG. 2(c) shows the restored images obtained by compressing them 708 times with the method of the present invention, FIG. 2(d) shows the restored images obtained by compressing them 140 times with the generative compression GC method, and FIG. 2(e) shows the restored images obtained by compressing them 38 times with the JPEG image compression method.
As can be seen from FIG. 2(b) and FIG. 2(c), when the original images are compressed 140 times and 708 times with the method of the present invention, the restored images have an overall structure similar to that of the original images in FIG. 2(a), with clear edges, low error loss and high restoration quality. As can be seen from FIG. 2(d), when the original images are compressed 140 times with the generative compression GC method, the facial expressions in the restored images are unnatural. The main reason is that the restored images randomly generated by that method differ markedly from the original images in facial expression, and pixel-level loss optimization cannot optimize the restored images with respect to the deep features of the image. As can be seen from FIG. 2(e), with the JPEG image compression method the restored images cannot maintain the overall structure of the original images even at a compression ratio of only 38 times, and the visual effect is poor. The main reason is that the method uses a fixed codec and processes the image in blocks, which distorts the restored images.
In order to better compare simulation effects, two evaluation indexes (peak signal-to-noise ratio (PSNR), and Structural Similarity (SSIM)) are used for evaluating the restored image quality of the three methods respectively. The peak signal-to-noise ratio PSNR and the structural similarity SSIM of the restored images of the present invention and two prior arts (JPEG image compression method, generative compression GC method) are calculated respectively using the following formulas, and all the calculation results are plotted as table 1:
$$\mathrm{PSNR} = 10\log_{10}\frac{(2^{n}-1)^{2}}{\mathrm{MSE}},\qquad \mathrm{MSE} = \frac{1}{H\,W}\sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(X(i,j)-Y(i,j)\bigr)^{2}$$
wherein $\log_{10}$ denotes the base-10 logarithm, $n$ denotes the number of bits per pixel, $H$ and $W$ denote the height and width of the restored image, $\Sigma$ denotes the summation operation, $X(i,j)$ denotes the pixel value of the pixel matrix of the restored image at position $(i,j)$, and $Y(i,j)$ denotes the pixel value of the pixel matrix of the original image at position $(i,j)$.
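$$\mathrm{SSIM} = \frac{2\mu_x\mu_y + C_1}{\mu_x^{2}+\mu_y^{2}+C_1}\cdot\frac{2\sigma_x\sigma_y + C_2}{\sigma_x^{2}+\sigma_y^{2}+C_2}\cdot\frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$$
wherein $\mu_x$ and $\mu_y$ denote the pixel means of the restored image and the original image respectively; $C_1$, $C_2$ and $C_3$ are constants, usually taken as $C_1 = 6.5025$, $C_2 = 58.5225$ and $C_3 = 29.26125$; $\sigma_x^{2}$ and $\sigma_y^{2}$ denote the variances of the restored image and the original image respectively; and $\sigma_{xy}$ denotes the covariance of the restored image and the original image.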
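The two indexes can be sketched in NumPy as follows. This is an illustrative sketch, not the inventors' evaluation code: PSNR is taken in its usual form 10·log10((2^n − 1)² / MSE), and SSIM is computed once over the whole image as the product of luminance, contrast and structure terms using the constants quoted above. Whether the original experiments used a sliding window is not stated, so the single global window and the function names `psnr` and `ssim_global` are assumptions.

```python
import numpy as np


def psnr(restored, original, bits=8):
    """Peak signal-to-noise ratio in dB for images with `bits` bits per sample."""
    x = np.asarray(restored, dtype=np.float64)
    y = np.asarray(original, dtype=np.float64)
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    peak = 2 ** bits - 1
    return 10.0 * np.log10(peak ** 2 / mse)


def ssim_global(restored, original, c1=6.5025, c2=58.5225, c3=29.26125):
    """Three-term SSIM over the whole image (assumption: no sliding window)."""
    x = np.asarray(restored, dtype=np.float64)
    y = np.asarray(original, dtype=np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    sd_x, sd_y = x.std(), y.std()                 # population standard deviations
    cov_xy = np.mean((x - mu_x) * (y - mu_y))     # population covariance
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (sd_x ** 2 + sd_y ** 2 + c2)
    structure = (cov_xy + c3) / (sd_x * sd_y + c3)
    return luminance * contrast * structure
```

Both functions accept any pair of equally shaped arrays; higher values indicate a restored image closer to the original.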
Table 1. Quantitative analysis of the image restoration results of the present invention and the prior art in simulation experiment 1
As can be seen from Table 1, both the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) of the present invention are higher than those of the two prior-art methods, which shows that the present invention achieves higher image restoration quality.
The above simulation experiment shows that, by constructing and training the image coding sub-network with the adversarial loss and the mixed loss as optimization targets, the method of the present invention generates restored images similar to the original images. It overcomes the prior-art problem of low restored-image quality caused by loss of the overall image structure at high compression ratios, so the method of the present invention achieves high-quality image restoration and is a highly practical natural image compression method.
Simulation experiment 2:
Simulation experiment 2 of the present invention applies the present invention and two prior-art methods (the JPEG image compression method and the NN image compression method based on a convolutional neural network) to 6 test original images randomly selected from the general object image dataset CIFAR-10, compressing them at 2 different compression factors to obtain restored images.
Simulation experiment 2 uses two prior-art methods:
The prior-art JPEG image compression method refers to the first international digital image compression standard for still images, established by the Joint Photographic Experts Group formed by the International Organization for Standardization (ISO) and the International Telegraph and Telephone Consultative Committee (CCITT); it is referred to as the JPEG image compression method for short.
The prior-art NN image compression method based on a convolutional neural network refers to the image compression method proposed by Ballé, Johannes et al. in "End-to-End Optimized Image Compression" (International Conference on Learning Representations, 2017, Toulon); it is referred to as the NN image compression method based on the convolutional neural network for short.
The effect of the present invention is further described below with reference to the simulation diagrams of Fig. 3.
Fig. 3(a) shows the 6 test original images of simulation experiment 2 of the present invention, randomly selected from the CIFAR-10 dataset. The CIFAR-10 dataset consists of 50000 training images and 10000 test images covering 10 classes of objects, organized by Alex Krizhevsky and Ilya Sutskever. Each image is 32 × 32 pixels in size. Fig. 3(b) shows the restored images obtained by compressing the 6 test original images by a factor of 140 with the method of the present invention, Fig. 3(c) shows the restored images obtained by compressing them by a factor of 140 with the NN image compression method based on the convolutional neural network, and Fig. 3(d) shows the restored images obtained by compressing them by a factor of 38 with the JPEG image compression method.
As can be seen from Fig. 3(b), when the original images are compressed by a factor of 140 with the method of the present invention, the restored images have an overall structure similar to the original images in Fig. 3(a), with clear edges and high restoration quality. As can be seen from Fig. 3(c), when the original images are compressed by a factor of 140 with the NN image compression method based on the convolutional neural network, the restored images are blurred and retain only the approximate contours of the objects in the original images. The main reason is that the method optimizes the restored image with a peak signal-to-noise ratio loss and does not use the deep features of the image, so detail information is lost. As can be seen from Fig. 3(d), even at a compression factor of only 38, the JPEG image compression method fails to preserve the overall structure of the original images and the visual quality is poor, mainly because the method uses a fixed codec and splits the image into blocks during processing, which distorts the restored images.
To compare the simulation results more rigorously, the same two evaluation indexes, the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM), are used to evaluate the restored image quality of the three methods. The PSNR and SSIM of the restored images of the present invention and of the two prior-art methods (the JPEG image compression method and the NN image compression method based on the convolutional neural network) are calculated with the same formulas as in simulation experiment 1, and all results are listed in Table 2:
Table 2. Quantitative analysis of the image restoration results of the present invention and the prior art in simulation experiment 2
As can be seen from Table 2, both the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) of the present invention are higher than those of the two prior-art methods, which shows that the present invention achieves higher image restoration quality.
The above simulation experiment shows that the present invention constructs and trains an image decoding sub-network for restoring images from compressed data. It overcomes the prior-art problems that an independently trained generator can only generate a single type of natural image, depends heavily on its training data, and cannot restore images of different types. The present invention can therefore restore images of different types and is a more general natural image compression method.
Claims (6)
1. A natural image compression method based on a generative adversarial network, characterized in that a spectral normalization layer is arranged in the discrimination module of the image decoding sub-network, the network parameters in the generation module and the discrimination module are updated with minimizing the weighted total loss value as the target, and the network parameters of the image coding sub-network are updated with minimizing the mixed loss value as the target, the method comprising the following specific steps:
(1) constructing a generative image compression network:
(1a) building a 7-layer image coding sub-network whose structure is, in order: the first convolution layer → the second convolution layer → the first normalization layer → the third convolution layer → the second normalization layer → the fourth convolution layer → the third normalization layer;
(1b) constructing an image decoding sub-network consisting of a generation module and a discrimination module;
the structure of the generation module is as follows in sequence: the fourth normalization layer → the fifth convolution layer → the fifth normalization layer → the sixth convolution layer → the sixth normalization layer → the seventh convolution layer → the seventh normalization layer → the eighth convolution layer;
the structure of the discrimination module is as follows in sequence: a ninth convolution layer → a tenth convolution layer → an eighth normalization layer → an eleventh convolution layer → a ninth normalization layer → a twelfth convolution layer → a tenth normalization layer → a spectral normalization layer;
connecting the eighth convolution layer in the generation module with the ninth convolution layer in the discrimination module to obtain an image decoding subnetwork;
(1c) connecting the third normalization layer in the image coding sub-network with the fourth normalization layer in the image decoding sub-network to obtain a natural image compression network based on a generative adversarial network;
(1d) setting parameters of each layer of the image coding sub-network;
(1e) setting parameters of each layer of a generation module of an image decoding sub-network;
(1f) setting parameters of each layer of a discrimination module of an image decoding subnetwork;
(2) training the image decoding subnetwork:
(2a) randomly selecting 180000 images from a natural image data set to form a training set;
(2b) sequentially inputting each image in the training set into an image coding sub-network, and outputting a compressed data sequence corresponding to each image in the training set; inputting each compressed data sequence into a generation module in an image decoding sub-network, and outputting a restored image corresponding to each image in a training set; inputting each image in the training set and the corresponding restored image into a discrimination module in an image decoding subnetwork, and calculating a weighted total loss value corresponding to each image in the training set by using a weighted total loss formula;
(2c) updating the network parameters in the generation module and the discrimination module by using the network parameter update direction formula of the stochastic gradient descent algorithm, with minimizing the weighted total loss value as the target, to obtain a trained image decoding sub-network;
(3) training an image coding subnetwork:
(3a) sequentially inputting each image in the training set into the VGGNet19 model, and outputting the deep feature map corresponding to each image in the training set; sequentially inputting the restored image corresponding to each image in the training set into the VGGNet19 model, and outputting the deep feature map corresponding to each restored image;
(3b) calculating a mixed loss value of each image in the training set and the corresponding restored image by using the following mixed loss formula:
$$J_i = \alpha_1 J_{iD} + \alpha_2 J_{iV}$$
wherein $J_i$ represents the mixed loss value of the ith image in the training set; $\alpha_1$ and $\alpha_2$ represent weight coefficients, namely two unequal decimals randomly selected in the range [0, 1] whose sum equals 1; $J_{iD}$ represents the distance loss value between the ith image in the training set and its corresponding restored image; and $J_{iV}$ represents the perception loss value between the ith image and its corresponding restored image;
the distance loss value is calculated by the following formula:
wherein $m$ represents the total number of channels of the ith image in the training set, $w$ and $h$ represent the width and height of the ith image respectively, $n$ represents the total number of pixels of the ith image, $j$ represents the serial number of a pixel in the ith image, $\Sigma$ represents the summation operation, $\|\cdot\|_2$ represents the two-norm operation, $y_{i,j}$ represents the pixel value of the jth pixel in the restored image corresponding to the ith image, and $x_{i,j}$ represents the pixel value of the jth pixel in the ith image;
the perception loss value is calculated by the following formula:
wherein $f$ represents the total number of channels of the deep feature map corresponding to the ith image in the training set, $g$ and $d$ represent the width and height of the deep feature map corresponding to the ith image respectively, $u$ represents the total number of pixels of the deep feature map corresponding to the ith image, $k$ represents the serial number of a pixel in the ith deep feature map, $\Sigma$ represents the summation operation, $\|\cdot\|_2$ represents the two-norm operation, $a_{i,k}$ represents the pixel value of the kth pixel in the deep feature map of the restored image corresponding to the ith image, and $b_{i,k}$ represents the pixel value of the kth pixel in the deep feature map of the ith image;
(3c) updating the network parameters of the image coding sub-network by using the network parameter update direction formula of the stochastic gradient descent algorithm, with minimizing the mixed loss value as the target, to obtain a trained image coding sub-network;
(4) preprocessing a natural image:
cropping each natural image to a size of 64 × 64 pixels;
(5) acquiring compressed data:
inputting the preprocessed natural image into a trained image coding sub-network, and outputting compressed data by a third normalization layer in the sub-network;
(6) acquiring a restored image:
the compressed data is input to a generation module in the trained image decoding subnetwork, and a restored image is output by the eighth convolution layer of the generation module.
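The mixed loss of step (3b) can be sketched as follows. This is a minimal illustration and not the claimed formulas: the distance and perception terms are assumed here to be mean squared differences, the deep feature maps are passed in as plain arrays instead of being computed by VGGNet19, and the names `mean_squared_difference` and `mixed_loss` are hypothetical.

```python
import numpy as np


def mean_squared_difference(a, b):
    """Assumed stand-in for the distance and perception loss terms."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))


def mixed_loss(image, restored, feat_image, feat_restored, alpha1):
    """J_i = alpha1 * J_iD + alpha2 * J_iV with alpha1 + alpha2 = 1."""
    if not 0.0 <= alpha1 <= 1.0 or alpha1 == 0.5:
        # The claim requires two unequal weights in [0, 1] that sum to 1.
        raise ValueError("alpha1 must lie in [0, 1] and differ from 0.5")
    alpha2 = 1.0 - alpha1
    j_distance = mean_squared_difference(restored, image)            # pixel space
    j_perception = mean_squared_difference(feat_restored, feat_image)  # feature space
    return alpha1 * j_distance + alpha2 * j_perception
```

In a real training loop the two feature arrays would come from the same VGGNet19 layer applied to the original and the restored image.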
2. The natural image compression method based on a generative adversarial network as claimed in claim 1, wherein the parameters of each layer of the image coding sub-network in step (1d) are set as follows:
setting the convolution kernel sizes of the first, second, third and fourth convolution layers all to 5 × 5, the strides all to 2, and the padding modes all to SAME;
setting the means of the first, second and third normalization layers all to 0 and the variances all to 1.
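As a rough check of what these settings imply, the sketch below tracks the spatial size of a feature map through four stride-2 convolutions. It assumes the TensorFlow-style convention that SAME padding yields an output size of ceil(input / stride), independent of the kernel size; channel counts are not specified in the claim and are left out.

```python
import math


def same_conv_output_size(size, stride):
    """Spatial output size of a convolution with SAME padding (assumed convention)."""
    return math.ceil(size / stride)


def encoder_spatial_sizes(input_size=64, num_convs=4, stride=2):
    """Sizes before the first convolution and after each of the `num_convs` layers."""
    sizes = [input_size]
    for _ in range(num_convs):
        sizes.append(same_conv_output_size(sizes[-1], stride))
    return sizes


print(encoder_spatial_sizes(64))  # [64, 32, 16, 8, 4]
```

Under this convention each convolution halves the width and height, so a 64 × 64 input reaches the last layer as a 4 × 4 map.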
3. The natural image compression method based on a generative adversarial network as claimed in claim 1, wherein the parameters of each layer of the generation module of the image decoding sub-network in step (1e) are set as follows:
setting the means of the fourth, fifth, sixth and seventh normalization layers all to 0 and the variances all to 1;
setting the convolution kernel sizes of the fifth, sixth, seventh and eighth convolution layers all to 5 × 5, the strides all to 2, and the padding modes all to SAME.
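A normalization layer with mean 0 and variance 1 can be illustrated as follows. This sketch assumes batch-style normalization over the first axis and adds a small `eps` for numerical stability; neither detail is specified in the claim, and the function name is hypothetical.

```python
import numpy as np


def normalize_zero_mean_unit_variance(x, eps=1e-5):
    """Shift and scale each feature so that it has mean 0 and variance close to 1.

    Assumption: statistics are taken over axis 0 (the batch axis).
    """
    x = np.asarray(x, dtype=np.float64)
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(256, 8))
normalized = normalize_zero_mean_unit_variance(batch)
print(normalized.mean(axis=0).round(6), normalized.var(axis=0).round(3))
```

The `eps` term keeps the division well defined when a feature is constant across the batch, at the cost of a variance marginally below 1.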
4. The method of claim 1, wherein the parameters of each layer of the discrimination module of the image decoding sub-network in step (1f) are set as follows:
setting the convolution kernel sizes of the eighth, ninth, tenth, eleventh and twelfth convolution layers all to 5 × 5, the strides all to 2, and the padding modes all to SAME;
setting the means of the eighth, ninth and tenth normalization layers all to 0 and the variances all to 1;
setting the normalization target of the spectral normalization layer to the maximum singular value of the network parameter matrix of the current layer;
the maximum singular value of the network parameter matrix of the current layer is calculated by the following formula:
wherein $\sigma(W)$ represents the maximum singular value of the network parameter matrix of the current layer, max represents the maximum-value operation, $\xi \in R$ indicates that $\xi$ is an element of the matrix $R$, $\|\cdot\|$ represents the spectral norm operation, and $W$ represents the network parameter matrix of the current layer.
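For illustration, the largest singular value can be estimated by power iteration and used to rescale the weight matrix. This is a sketch under the standard definition of the spectral norm (the largest value of ‖Wξ‖₂ / ‖ξ‖₂ over nonzero ξ); the iteration count and the function names are assumptions, not part of the claim.

```python
import numpy as np


def largest_singular_value(w, iterations=50, seed=0):
    """Estimate sigma(W) by power iteration on W^T W (assumes W is not all zeros)."""
    w = np.asarray(w, dtype=np.float64)
    rng = np.random.default_rng(seed)
    v = rng.normal(size=w.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iterations):
        u = w @ v
        u /= np.linalg.norm(u)
        v = w.T @ u
        v /= np.linalg.norm(v)
    return float(np.linalg.norm(w @ v))


def spectral_normalize(w, iterations=50):
    """Divide W by its estimated largest singular value."""
    w = np.asarray(w, dtype=np.float64)
    return w / largest_singular_value(w, iterations)


weights = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 0.5]])
print(largest_singular_value(weights))                 # power-iteration estimate
print(np.linalg.svd(weights, compute_uv=False)[0])     # exact reference value
```

After normalization the largest singular value of the matrix is approximately 1, which bounds how much the layer can stretch its input.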
5. The natural image compression method based on a generative adversarial network as claimed in claim 1, wherein the weighted total loss formula in step (2b) is as follows:
$$l_i = \lambda_1 l_{iD} + \lambda_2 l_{iI}$$
wherein $l_i$ represents the weighted total loss value of the ith image in the training set; $\lambda_1$ and $\lambda_2$ represent weight coefficients, namely two unequal decimals randomly selected in the range [0, 1] whose sum equals 1; $l_{iD}$ represents the distance loss value between the ith image in the training set and its corresponding restored image; and $l_{iI}$ represents the mutual information loss value between the ith group of compressed data and its corresponding restored image;
the distance loss value is calculated by the following formula:
wherein $m$ represents the total number of channels of the ith image in the training set, $w$ and $h$ represent the width and height of the ith image respectively, $n$ represents the total number of pixels of the ith image, $j$ represents the serial number of a pixel in the ith image, $\Sigma$ represents the summation operation, $\|\cdot\|_2$ represents the two-norm operation, $y_{i,j}$ represents the pixel value of the jth pixel in the restored image corresponding to the ith image, and $x_{i,j}$ represents the pixel value of the jth pixel in the ith image;
the mutual information loss value is calculated by the following formula:
$$l_{iI} = E\left[\ln Q(c_t, y_i)\right] + P(c_t)\log_2 P(c_t)$$
wherein $E[\cdot]$ denotes the expectation operation, $\ln$ denotes the logarithm with the natural constant $e$ as base, $Q(c_t, y_i)$ denotes the probability distribution of the compressed data $c_t$ corresponding to the ith image and the restored image $y_i$ corresponding to the ith image, $P(c_t)$ denotes the probability distribution of the compressed data $c_t$ corresponding to the ith image, and $\log_2$ denotes the base-2 logarithm.
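The two formulas of this claim can be evaluated literally as in the sketch below. It assumes that the expectation is estimated by a sample mean over supplied values of Q and that P(c_t) is given as a single probability; how Q and P are actually modelled is not described in the claim, and the function names are hypothetical.

```python
import numpy as np


def mutual_information_loss(q_values, p_c):
    """l_iI = E[ln Q(c_t, y_i)] + P(c_t) * log2 P(c_t).

    Assumption: E[.] is approximated by the mean over the samples in `q_values`.
    """
    q = np.asarray(q_values, dtype=np.float64)
    return float(np.mean(np.log(q)) + p_c * np.log2(p_c))


def weighted_total_loss(distance_loss, mi_loss, lambda1):
    """l_i = lambda1 * l_iD + lambda2 * l_iI with lambda1 + lambda2 = 1."""
    if not 0.0 <= lambda1 <= 1.0 or lambda1 == 0.5:
        # The claim requires two unequal weights in [0, 1] that sum to 1.
        raise ValueError("lambda1 must lie in [0, 1] and differ from 0.5")
    lambda2 = 1.0 - lambda1
    return lambda1 * distance_loss + lambda2 * mi_loss
```

The distance loss passed to `weighted_total_loss` would be computed separately from the original and restored images.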
6. The natural image compression method based on a generative adversarial network as claimed in claim 1, wherein the network parameter update direction formula of the stochastic gradient descent algorithm in step (2c) and step (3c) is as follows:
$$\theta_{v+1} = \theta_v - L'(\theta_v)$$
wherein $\theta_{v+1}$ represents the network parameters after the (v+1)th update and $\theta_v$ represents the network parameters after the vth update; in step (2c) the network parameters are those of the generation module and the discrimination module, and in step (3c) they are the network parameters of the image decoding sub-network; $L'$ represents the partial derivative operation, and $L'(\theta_v)$ represents the partial derivative of the loss value $L(\theta)$ at the network parameters $\theta_v$; the loss value $L(\theta)$ in step (2c) is the weighted total loss value, and the loss value $L(\theta)$ in step (3c) is the mixed loss value.
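The update rule can be tried on a toy problem as sketched below. The quadratic loss L(θ) = 0.25·(θ − 3)² is a hypothetical stand-in for the weighted total loss or the mixed loss. The formula as written has no learning-rate factor, so the sketch uses a unit step; practical implementations usually scale the gradient by a learning rate.

```python
def sgd_step(theta, grad_fn):
    """One update theta_{v+1} = theta_v - L'(theta_v), with an implicit step size of 1."""
    return theta - grad_fn(theta)


def minimize(theta0, grad_fn, steps):
    """Apply `steps` successive updates starting from theta0."""
    theta = theta0
    for _ in range(steps):
        theta = sgd_step(theta, grad_fn)
    return theta


def toy_gradient(theta):
    """Derivative of the hypothetical loss L(theta) = 0.25 * (theta - 3)**2."""
    return 0.5 * (theta - 3.0)


print(minimize(0.0, toy_gradient, steps=20))  # approaches the minimiser 3.0
```

For this toy loss each step halves the distance to the minimum, which is why a unit step still converges; steeper losses would need a smaller step.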
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910460717.7A CN110225350B (en) | 2019-05-30 | 2019-05-30 | Natural image compression method based on generation type countermeasure network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110225350A (en) | 2019-09-10 |
CN110225350B (en) | 2021-03-23 |
Family
ID=67818824
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910460717.7A Active CN110225350B (en) | 2019-05-30 | 2019-05-30 | Natural image compression method based on generation type countermeasure network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110225350B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111050170A (en) * | 2019-12-06 | 2020-04-21 | 山东浪潮人工智能研究院有限公司 | Image compression system construction method, compression system and method based on GAN |
CN111340901B (en) * | 2020-02-19 | 2023-08-11 | 国网浙江省电力有限公司 | Compression method of power transmission network picture under complex environment based on generation type countermeasure network |
CN113542759B (en) * | 2020-04-15 | 2024-05-10 | 辉达公司 | Generating an antagonistic neural network assisted video reconstruction |
CN111639542A (en) * | 2020-05-06 | 2020-09-08 | 中移雄安信息通信科技有限公司 | License plate recognition method, device, equipment and medium |
CN111787323B (en) | 2020-05-23 | 2021-09-03 | 清华大学 | Variable bit rate generation type compression method based on counterstudy |
CN112929666B (en) * | 2021-03-22 | 2023-04-14 | 北京金山云网络技术有限公司 | Method, device and equipment for training coding and decoding network and storage medium |
CN115086670B (en) * | 2022-06-13 | 2023-03-10 | 梧州学院 | Low-bit-rate encoding and decoding method and system for high-definition microscopic video |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109040763A (en) * | 2018-08-07 | 2018-12-18 | 北京飞搜科技有限公司 | A kind of method for compressing image and system based on production confrontation network |
CN109377532A (en) * | 2018-10-18 | 2019-02-22 | 众安信息技术服务有限公司 | Image processing method and device neural network based |
CN109495744A (en) * | 2018-10-29 | 2019-03-19 | 西安电子科技大学 | The big multiplying power remote sensing image compression method of confrontation network is generated based on joint |
CN109801230A (en) * | 2018-12-21 | 2019-05-24 | 河海大学 | A kind of image repair method based on new encoder structure |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10785496B2 (en) * | 2015-12-23 | 2020-09-22 | Sony Corporation | Video encoding and decoding apparatus, system and method |
EP3398114B1 (en) * | 2016-02-05 | 2022-08-24 | Deepmind Technologies Limited | Compressing images using neural networks |
Non-Patent Citations (1)
Title |
---|
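Research on Image Compression Methods Based on Deep Learning (基于深度学习的图像压缩方法研究); Ren Jie (任杰); China Master's Theses Full-text Database (electronic journal); 2018-02-15; full text * |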
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110225350B (en) | Natural image compression method based on generation type countermeasure network | |
CN110992275B (en) | Refined single image rain removing method based on generation of countermeasure network | |
CN111787323B (en) | Variable bit rate generation type compression method based on counterstudy | |
CN113362223B (en) | Image super-resolution reconstruction method based on attention mechanism and two-channel network | |
US7286712B2 (en) | Method and apparatus for efficiently encoding chromatic images using non-orthogonal basis functions | |
CN110751597B (en) | Video super-resolution method based on coding damage repair | |
CN114092330A (en) | Lightweight multi-scale infrared image super-resolution reconstruction method | |
CN111080567A (en) | Remote sensing image fusion method and system based on multi-scale dynamic convolution neural network | |
CN112288632B (en) | Single image super-resolution method and system based on simplified ESRGAN | |
CN109949222A (en) | Image super-resolution rebuilding method based on grapheme | |
CN111340901B (en) | Compression method of power transmission network picture under complex environment based on generation type countermeasure network | |
CN117710216B (en) | Image super-resolution reconstruction method based on variation self-encoder | |
CN110827198A (en) | Multi-camera panoramic image construction method based on compressed sensing and super-resolution reconstruction | |
CN117151990B (en) | Image defogging method based on self-attention coding and decoding | |
WO2024164694A1 (en) | Image compression method and apparatus, electronic device, computer program product, and storage medium | |
CN117274059A (en) | Low-resolution image reconstruction method and system based on image coding-decoding | |
CN114422795A (en) | Face video coding method, decoding method and device | |
CN118172290A (en) | Multi-stage adaptive CNN and hybrid transducer-based Thangka image restoration method, system and storage medium | |
CN112634168A (en) | Image restoration method combined with edge information | |
CN116137043A (en) | Infrared image colorization method based on convolution and transfomer | |
CN113240589A (en) | Image defogging method and system based on multi-scale feature fusion | |
CN114663292A (en) | Ultra-lightweight picture defogging and identification network model and picture defogging and identification method | |
CN117455813B (en) | Method for restoring Chinese character image of shielding handwritten medical record based on gating convolution and SCPAM attention module | |
CN116597256A (en) | Model pruning method based on BGNet three-dimensional matching network | |
CN117132500A (en) | Weak light enhancement method based on sparse conversion network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||