CN110225350B - Natural image compression method based on a generative adversarial network

Info

Publication number: CN110225350B
Application number: CN201910460717.7A
Authority: CN
Inventors: 王柯俨; 刘泉; 刘凯; 李云松
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2019-05-30
Filing date: 2019-05-30
Publication date: 2021-03-23
Anticipated expiration: 2039-05-30
Also published as: CN110225350A

Abstract

The invention discloses a natural image compression method based on a generative adversarial network. It addresses two problems of existing natural image compression methods: low restoration quality at high compression ratios, and the data dependency of generative compression. The specific steps are: (1) constructing an image compression generative network; (2) training the image decoding sub-network; (3) training the image coding sub-network; (4) preprocessing a natural image; (5) acquiring compressed data; (6) acquiring a restored image. The method compresses the original image data with a convolutional neural network, generates an image from the compressed data with the generation module of the generative adversarial network, and constrains the generation module with the discrimination module of the generative adversarial network, thereby achieving high-quality image restoration.

Description

Natural image compression method based on a generative adversarial network

Technical Field

The invention belongs to the technical field of image processing, and more specifically relates to a natural image compression method based on a generative adversarial network in the technical field of image compression. Under limited storage resources, the method can obtain compressed data by reducing the amount of redundant data in a natural image, and can generate an image similar to the original image from the compressed data.

Background

Image compression technology has brought a major breakthrough in reducing redundant information in image data and in relieving storage and transmission pressure. It shows that, under certain conditions, an original image can be recovered from a data volume far smaller than that of the original image, which saves a large amount of resources. Neural-network-based image compression methods are divided into generative compression and non-generative compression, according to whether the restoration model is a generative model. Because of the adversarial nature of the network, generative compression degrades the restored image gracefully as the compression ratio increases, so that the distortion better matches the characteristics of human vision; however, it has severe data dependency and can only restore natural images of a single class. Non-generative compression is not limited by the training data set and can compress natural images of various classes, but the restored image is severely distorted at high compression ratios.

Shibani Santurkar et al., in their paper "Generative Compression" (Computer Vision and Pattern Recognition, 2017, Hawaii), propose a natural image compression method that uses the generator of a generative adversarial network as the restoration model. The method first extracts deep features of the original image with a coding network based on a convolutional neural network to obtain compressed bit stream data, and then inputs the compressed bit stream data into a trained generator of a generative adversarial network to generate a restored image. The drawback of the method is that a generator trained independently can only generate natural images of a single class; it has severe data dependency and cannot restore images of different classes.

Sichuan University, in its patent application document "Still image compression method based on a deep convolutional neural network" (patent application No. 201710287432.9, publication No. CN 107018422A), discloses a natural image compression method that combines a convolutional neural network with a conventional encoder. The method first encodes an original image with a conventional encoder, then computes the difference between the restored image and the original image with a loss function based on the peak signal-to-noise ratio, trains a convolutional neural network end to end, and finally restores the original image with the trained network. The drawback of the method is that, at high compression ratios, the loss function it adopts does not favor preserving the overall image structure, so the restoration distortion does not match the visual characteristics of the human eye and the quality of the restored image is low.

Disclosure of Invention

The present invention aims to overcome the above defects of the prior art by providing a natural image compression method based on a generative adversarial network. The invention can generate a restored image similar to the original image at high compression ratios while also solving the data dependency problem, so that the restored image better matches the characteristics of human vision.

The idea for achieving this purpose is as follows. A convolutional neural network compresses the original image data, and the generation module of a generative adversarial network generates an image from the compressed data. The mutual information between the compressed data and the restored image is used as an additional optimization target of the discrimination module of the generative adversarial network, so that the corresponding image is restored from compressed data of different classes and multi-class image restoration is achieved. A mixed loss is used as the optimization target for generating a restored image similar to the original image, so that higher-quality image restoration is achieved.

The method comprises the following specific steps:

(1) constructing an image compression generation type network:

(1a) building a 7-layer image coding sub-network whose structure is, in sequence: the first convolution layer → the second convolution layer → the first normalization layer → the third convolution layer → the second normalization layer → the fourth convolution layer → the third normalization layer;

(1b) constructing an image decoding sub-network consisting of a generation module and a discrimination module;

the structure of the generation module is as follows in sequence: the fourth normalization layer → the fifth convolution layer → the fifth normalization layer → the sixth convolution layer → the sixth normalization layer → the seventh convolution layer → the seventh normalization layer → the eighth convolution layer;

the structure of the discrimination module is as follows in sequence: a ninth convolution layer → a tenth convolution layer → an eighth normalization layer → an eleventh convolution layer → a ninth normalization layer → a twelfth convolution layer → a tenth normalization layer → a spectral normalization layer;

connecting the eighth convolution layer in the generation module with the ninth convolution layer in the discrimination module to obtain an image decoding subnetwork;

(1c) connecting the third normalization layer in the image coding sub-network with the fourth normalization layer in the image decoding sub-network to obtain a natural image compression network based on a generative adversarial network;

(1d) setting parameters of each layer of the image coding sub-network;

(1e) setting parameters of each layer of a generation module of an image decoding sub-network;

(1f) setting parameters of each layer of a discrimination module of an image decoding subnetwork;

(2) training the image decoding subnetwork:

(2a) randomly selecting 180000 images from a natural image data set to form a training set;

(2b) sequentially inputting each image in the training set into an image coding sub-network, and outputting a compressed data sequence corresponding to each image in the training set; inputting each compressed data sequence into a generation module in an image decoding sub-network, and outputting a restored image corresponding to each image in a training set; inputting each image in the training set and the corresponding restored image into a discrimination module in an image decoding subnetwork, and calculating a weighted total loss value corresponding to each image in the training set by using a weighted total loss formula;

(2c) using the network parameter update formula of the stochastic gradient descent algorithm, with minimization of the weighted total loss value as the target, updating the network parameters in the generation module and the discrimination module to obtain a trained image decoding sub-network;

(3) training an image coding subnetwork:

(3a) sequentially inputting each image in the training set into a VGGNet19 model, and outputting the deep feature map corresponding to each image in the training set; sequentially inputting the restored image corresponding to each image in the training set into the VGGNet19 model, and outputting the deep feature map corresponding to each restored image;

(3b) calculating a mixed loss value of each image in the training set and the corresponding restored image by using a mixed loss formula;

(3c) using the network parameter update formula of the stochastic gradient descent algorithm, with minimization of the mixed loss value as the target, updating the network parameters of the image coding sub-network to obtain a trained image coding sub-network;

(4) preprocessing a natural image:

cropping each natural image to a size of 64 × 64 pixels;

(5) acquiring compressed data:

inputting the preprocessed natural image into a trained image coding sub-network, and outputting compressed data by a third normalization layer in the sub-network;

(6) acquiring a restored image:

the compressed data is input to a generation module in the trained image decoding subnetwork, and a restored image is output by the eighth convolution layer of the generation module.

Compared with the prior art, the invention has the following advantages:

First, by constructing and training the image decoding sub-network with the mutual information between the compressed data and the restored image as an additional optimization target, the invention restores the corresponding image from compressed data of different classes. This overcomes the prior-art problem that an independently trained generator can only generate natural images of a single class, has severe data dependency and cannot restore images of different classes, and thus achieves restoration of images of different classes.

Second, by constructing and training the image coding sub-network with the mixed loss as the optimization target, the invention generates a restored image similar to the original image. This overcomes the prior-art problem that using the peak signal-to-noise ratio as the optimization target loses the overall image structure at high compression ratios, and thus achieves higher-quality image restoration.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a graph showing the results of simulation experiment 1;

FIG. 3 is a graph showing the results of simulation experiment 2.

Detailed Description

The invention is further described below with reference to the accompanying drawings.

The steps of the present invention are further described with reference to fig. 1.

Step 1, constructing an image compression generation type network.

An image coding sub-network is built, and the structure of the image coding sub-network sequentially comprises the following steps: the first convolution layer → the second convolution layer → the first normalization layer → the third convolution layer → the second normalization layer → the fourth convolution layer → the third normalization layer.

An image decoding sub-network consisting of a generation module and a discrimination module is constructed.

The structure of the generation module is as follows in sequence: the fourth normalization layer → the fifth convolution layer → the fifth normalization layer → the sixth convolution layer → the sixth normalization layer → the seventh convolution layer → the seventh normalization layer → the eighth convolution layer.

The structure of the discrimination module is as follows in sequence: the ninth convolution layer → the tenth convolution layer → the eighth normalization layer → the eleventh convolution layer → the ninth normalization layer → the twelfth convolution layer → the tenth normalization layer → the spectral normalization layer.

The eighth convolution layer in the generation module is connected with the ninth convolution layer in the discrimination module to obtain the image decoding sub-network.

The third normalization layer in the image coding sub-network is connected with the fourth normalization layer in the image decoding sub-network to obtain the image compression generative network.

The parameters of each layer of the image coding sub-network are set as follows:

The convolution kernel sizes of the first, second, third and fourth convolution layers are all set to 5 × 5, the strides are all set to 2, and the edge padding modes are all set to SAME.

The means of the first, second and third normalization layers are set to 0 and the variances are set to 1.
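The effect of these settings on the spatial size of the data can be illustrated with a short sketch. It assumes that SAME padding follows the common convention in which the output size equals the input size divided by the stride, rounded up, and that the normalization layers leave the spatial size unchanged. The function names are hypothetical, and channel counts are not modelled because the text does not state them.

```python
import math

def same_conv_output_size(size, stride):
    """Spatial output size of a convolution with SAME padding (assumed convention)."""
    return math.ceil(size / stride)

def encoder_spatial_sizes(input_size=64, num_conv_layers=4, stride=2):
    """Return the spatial size of the input and after each convolution layer."""
    sizes = [input_size]
    for _ in range(num_conv_layers):
        sizes.append(same_conv_output_size(sizes[-1], stride))
    return sizes

# Four stride-2 convolutions applied to a 64 x 64 input.
print(encoder_spatial_sizes())  # [64, 32, 16, 8, 4]
```

Under these assumptions a 64 × 64 input leaves the fourth convolution layer with a 4 × 4 spatial extent; the kernel size of 5 × 5 does not change this result when SAME padding is used.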

The parameters of each layer of the generation module of the image decoding sub-network are set as follows:

The means of the fourth, fifth, sixth and seventh normalization layers are set to 0 and the variances are set to 1.

The convolution kernel sizes of the fifth, sixth, seventh and eighth convolution layers are all set to 5 × 5, the strides are all set to 2, and the edge padding modes are all set to SAME.

The parameters of each layer of the discrimination module of the image decoding sub-network are set as follows:

The convolution kernel sizes of the ninth, tenth, eleventh and twelfth convolution layers are all set to 5 × 5, the strides are all set to 2, and the edge padding modes are all set to SAME.

The means of the eighth, ninth and tenth normalization layers are set to 0 and the variances are set to 1.

The normalization target of the spectral normalization layer is set to the maximum singular value of the network parameter matrix of the current layer.

The maximum singular value of the network parameter matrix of the current layer is calculated by the following formula:

$$\sigma(W) = \max_{\xi \in R,\ \xi \neq 0} \frac{\|W\xi\|_2}{\|\xi\|_2}$$

where $\sigma(W)$ denotes the maximum singular value of the network parameter matrix of the layer, max denotes the maximum operation, $\xi \in R$ denotes that $\xi$ is an element of $R$, $\|\cdot\|_2$ denotes the norm operation that defines the spectral norm, and $W$ denotes the network parameter matrix of the layer.
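As an illustration of how the maximum singular value can be obtained in practice, the following sketch estimates it by power iteration and then divides the matrix by it. Power iteration is a common choice for spectral normalization but is an assumption here; the text does not say how the value is computed. All function names are hypothetical, and the matrix is a plain list of rows.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def l2_norm(x):
    return math.sqrt(sum(v * v for v in x))

def largest_singular_value(W, num_iters=100):
    """Estimate sigma(W) by power iteration on W^T W."""
    Wt = transpose(W)
    # Arbitrary non-zero start vector (assumption); it must not be
    # orthogonal to the dominant right singular vector.
    v = [1.0] * len(W[0])
    for _ in range(num_iters):
        v = matvec(Wt, matvec(W, v))
        n = l2_norm(v)
        if n == 0.0:
            return 0.0
        v = [x / n for x in v]
    # v is a unit vector, so ||W v|| approximates the largest singular value.
    return l2_norm(matvec(W, v))

def spectrally_normalize(W, num_iters=100):
    """Divide W by its estimated largest singular value."""
    sigma = largest_singular_value(W, num_iters)
    if sigma == 0.0:
        raise ValueError("cannot normalize a zero matrix")
    return [[w / sigma for w in row] for row in W]
```

For the diagonal matrix with entries 3 and 4, the estimate approaches 4, and the normalized matrix has a maximum singular value close to 1.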

Step 2, training the image decoding sub-network.

180000 images are randomly selected from the natural image data set to form a training set.

Sequentially inputting each image in the training set into an image coding sub-network, and outputting a compressed data sequence corresponding to each image in the training set; inputting each compressed data sequence into a generation module in an image decoding sub-network, and outputting a restored image corresponding to each image in a training set; and inputting each image in the training set and the corresponding restored image into a discrimination module in the image decoding subnetwork, and calculating a weighted total loss value corresponding to each image in the training set by using a weighted total loss formula.

The weighted total loss formula is as follows:

$$l_i = \lambda_1 l_{iD} + \lambda_2 l_{iI}$$

where $l_i$ denotes the weighted total loss value of the i-th image in the training set; $\lambda_1$ and $\lambda_2$ denote weight coefficients, which are two unequal decimals randomly selected in the range [0,1] whose sum equals 1; $l_{iD}$ denotes the distance loss value between the i-th image in the training set and the corresponding restored image; and $l_{iI}$ denotes the mutual information loss value between the i-th group of compressed data and the corresponding restored image.
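The weighting rule can be sketched as a small helper. The function name and the argument checks are hypothetical; the only facts taken from the text are that the two weights lie in [0,1], are unequal and sum to 1.

```python
def weighted_total_loss(distance_loss, mutual_info_loss, lambda_1):
    """Combine the two loss terms with weights lambda_1 and lambda_2 = 1 - lambda_1."""
    if not 0.0 <= lambda_1 <= 1.0:
        raise ValueError("lambda_1 must lie in [0, 1]")
    if lambda_1 == 0.5:
        raise ValueError("the two weights must be unequal")
    lambda_2 = 1.0 - lambda_1  # enforces lambda_1 + lambda_2 = 1
    return lambda_1 * distance_loss + lambda_2 * mutual_info_loss

# Example: 0.25 * 2.0 + 0.75 * 4.0
print(weighted_total_loss(2.0, 4.0, 0.25))  # 3.5
```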

The distance loss value is calculated by the following formula:

where m denotes the total number of channels of the i-th image in the training set, w and h denote the width and height of the i-th image respectively, n denotes the total number of pixels of the i-th image, j denotes the index of a pixel in the i-th image, Σ denotes the summation operation, $\|\cdot\|_2$ denotes the two-norm operation, $y_{i,j}$ denotes the pixel value of the j-th pixel in the restored image corresponding to the i-th image, and $x_{i,j}$ denotes the pixel value of the j-th pixel in the i-th image.
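The formula itself is not reproduced in this text, so the following is only a sketch of one form consistent with the symbols listed: the squared two-norm of the per-pixel difference, summed over the n pixels and divided by m·w·h. Both the squared norm and the normalization are assumptions, and the function name and data layout (a flat list of per-pixel channel tuples) are hypothetical.

```python
def distance_loss(original, restored, width, height, channels):
    """Assumed form: sum_j ||y_j - x_j||_2^2 / (channels * width * height)."""
    n = width * height
    if len(original) != n or len(restored) != n:
        raise ValueError("each image must contain width * height pixels")
    total = 0.0
    for x_j, y_j in zip(original, restored):
        # Squared two-norm of the difference between the two pixel vectors.
        total += sum((y - x) ** 2 for x, y in zip(x_j, y_j))
    return total / (channels * width * height)

# A 2 x 2 RGB image whose restored version is off by 1 in every channel.
original = [(0.0, 0.0, 0.0)] * 4
restored = [(1.0, 1.0, 1.0)] * 4
print(distance_loss(original, restored, width=2, height=2, channels=3))  # 1.0
```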

The mutual information loss value is calculated by the following formula:

$$l_{iI} = E[\ln Q(c_t, y_i)] + P(c_t)\log_2 P(c_t)$$

where $E[\cdot]$ denotes the expectation operation, ln denotes the logarithm to the base of the natural constant e, $Q(c_t, y_i)$ denotes the probability distribution of the compressed data $c_t$ corresponding to the i-th image and the restored image $y_i$ corresponding to the i-th image, $P(c_t)$ denotes the probability distribution of the compressed data $c_t$ corresponding to the i-th image, and $\log_2$ denotes the logarithm to the base 2.
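A direct numerical transcription of this formula is sketched below. It assumes that the expectation is estimated by the mean over a list of sampled values of Q and that P(c_t) is available as a single probability; how these quantities are obtained from the discrimination module is not stated in the text, and the function name is hypothetical.

```python
import math

def mutual_info_loss(q_values, p_code):
    """Evaluate E[ln Q] + P * log2(P) from sampled Q values and one P value."""
    if not q_values or any(q <= 0.0 for q in q_values):
        raise ValueError("q_values must be non-empty and strictly positive")
    if not 0.0 < p_code <= 1.0:
        raise ValueError("p_code must lie in (0, 1]")
    expected_log_q = sum(math.log(q) for q in q_values) / len(q_values)
    entropy_term = p_code * math.log2(p_code)
    return expected_log_q + entropy_term

# With Q = 1 the first term is 0; with P = 0.5 the second term is -0.5.
print(mutual_info_loss([1.0], 0.5))  # -0.5
```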

Using the network parameter update formula of the stochastic gradient descent algorithm, with minimization of the weighted total loss value as the target, the network parameters in the generation module and the discrimination module are updated to obtain the trained image decoding sub-network.

The network parameter update formula of the stochastic gradient descent algorithm is as follows:

$$\theta_{v+1} = \theta_v - L'(\theta_v)$$

where $\theta_{v+1}$ denotes the network parameters of the generation module and the discrimination module after the (v+1)-th update, $\theta_v$ denotes the network parameters of the generation module and the discrimination module after the v-th update, $L'$ denotes the partial derivative operation, and $L'(\theta_v)$ denotes the partial derivative of the weighted total loss $L(\theta)$ evaluated at the network parameters $\theta_v$.
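The update rule can be illustrated on a toy loss. The formula as printed has no step size, so the sketch below adds a hypothetical `step_size` argument that defaults to 1.0 to match it; practical training normally uses a smaller value. The quadratic loss and its gradient are illustrative stand-ins, not the losses of the invention.

```python
def sgd_step(theta, grad_fn, step_size=1.0):
    """One update: theta_next = theta - step_size * gradient(theta)."""
    grad = grad_fn(theta)
    return [t - step_size * g for t, g in zip(theta, grad)]

def quadratic_grad(theta, target=(3.0, -1.0)):
    """Gradient of the toy loss 0.5 * sum((theta - target)^2)."""
    return [t - c for t, c in zip(theta, target)]

theta = [0.0, 0.0]
theta = sgd_step(theta, quadratic_grad)
print(theta)  # [3.0, -1.0]
```

With a step size of 1.0 this particular toy loss reaches its minimum in a single step; that is a property of the example, not of the method.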

Step 3, training the image coding sub-network.

Each image in the training set is sequentially input into a VGGNet19 model, which outputs the deep feature map corresponding to each image in the training set; the restored image corresponding to each image in the training set is sequentially input into the VGGNet19 model, which outputs the deep feature map corresponding to each restored image.

A mixed loss value between each image in the training set and the corresponding restored image is calculated using the mixed loss formula.

The hybrid loss formula is as follows:

$$J_i = \alpha_1 J_{iD} + \alpha_2 J_{iV}$$

where $J_i$ denotes the mixed loss value of the i-th image in the training set; $\alpha_1$ and $\alpha_2$ denote weight coefficients, which are two unequal decimals randomly selected in the range [0,1] whose sum equals 1; $J_{iD}$ denotes the distance loss value between the i-th image in the training set and the corresponding restored image; and $J_{iV}$ denotes the perceptual loss value between the i-th image and the corresponding restored image.

The distance loss value is calculated by the following formula:

The perception loss value is calculated by the following formula:

where f denotes the total number of channels of the deep feature map corresponding to the i-th image in the training set, g and d denote the width and height of the deep feature map corresponding to the i-th image respectively, u denotes the total number of pixels of the deep feature map corresponding to the i-th image, k denotes the index of a pixel in the i-th deep feature map, Σ denotes the summation operation, $\|\cdot\|_2$ denotes the two-norm operation, $a_{i,k}$ denotes the pixel value of the k-th pixel in the deep feature map of the restored image corresponding to the i-th image, and $b_{i,k}$ denotes the pixel value of the k-th pixel in the deep feature map of the i-th image.
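The perceptual loss formula is not reproduced in this text. The sketch below assumes a form that mirrors the symbol list: the squared two-norm of the feature difference summed over the u feature-map positions and divided by f·g·d. It then combines the result with a distance loss value according to the mixed loss formula. The feature maps are taken as given (no VGGNet19 model is run here), and all function names and the data layout are hypothetical.

```python
def perceptual_loss(features_original, features_restored, channels, width, height):
    """Assumed form: sum_k ||a_k - b_k||_2^2 / (channels * width * height)."""
    u = width * height
    if len(features_original) != u or len(features_restored) != u:
        raise ValueError("each feature map must contain width * height positions")
    total = 0.0
    for b_k, a_k in zip(features_original, features_restored):
        total += sum((a - b) ** 2 for a, b in zip(a_k, b_k))
    return total / (channels * width * height)

def mixed_loss(distance_loss_value, perceptual_loss_value, alpha_1):
    """J = alpha_1 * J_D + alpha_2 * J_V with alpha_2 = 1 - alpha_1."""
    if not 0.0 <= alpha_1 <= 1.0:
        raise ValueError("alpha_1 must lie in [0, 1]")
    alpha_2 = 1.0 - alpha_1
    return alpha_1 * distance_loss_value + alpha_2 * perceptual_loss_value

# One feature position with two channels; the second channel differs by 2.
j_v = perceptual_loss([(1.0, 2.0)], [(1.0, 4.0)], channels=2, width=1, height=1)
print(j_v)                         # 2.0
print(mixed_loss(1.0, j_v, 0.25))  # 1.75
```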

Using the network parameter update formula of the stochastic gradient descent algorithm, with minimization of the mixed loss value as the target, the network parameters of the image coding sub-network are updated to obtain the trained image coding sub-network. The update formula is as follows:

$$\theta_{v+1} = \theta_v - L'(\theta_v)$$

where $\theta_{v+1}$ denotes the network parameters of the image coding sub-network after the (v+1)-th update, $\theta_v$ denotes the network parameters of the image coding sub-network after the v-th update, $L'$ denotes the derivative operation, and $L'(\theta_v)$ denotes the partial derivative of the mixed loss $L(\theta)$ evaluated at the network parameters $\theta_v$.

Step 4, preprocessing the natural image.

Each natural image is cropped to a size of 64 × 64 pixels.
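The text does not say which region of the image is kept, so the following sketch assumes a centre crop. The image is represented as a list of rows and the function name is hypothetical.

```python
def center_crop(image, size=64):
    """Return the central size x size region of an image given as a list of rows."""
    height = len(image)
    width = len(image[0])
    if height < size or width < size:
        raise ValueError("image is smaller than the crop size")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]

# An 80 x 100 image whose pixels record their own (row, column) position.
image = [[(r, c) for c in range(100)] for r in range(80)]
crop = center_crop(image)
print(len(crop), len(crop[0]), crop[0][0])  # 64 64 (8, 18)
```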

Step 5, acquiring compressed data.

The preprocessed natural image is input into the trained image coding sub-network, and the compressed data is output by the third normalization layer of that sub-network.

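Step 6, acquiring a restored image.

The compressed data is input into the generation module of the trained image decoding sub-network, and the restored image is output by the eighth convolution layer of the generation module.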

The effect of the present invention is further described below with reference to simulation experiments:

1. Simulation experiment conditions:

The hardware platform of the simulation experiments is: NVIDIA TITAN Xp GPUs, a processor clock frequency of 3.4 GHz, and 128 GB of memory.

The software platform of the simulation experiments is: ios operating system and Python 2.7.

2. Simulation content and result analysis:

Two simulation experiments were carried out.

Simulation experiment 1:

Simulation experiment 1 applies the invention and two prior-art methods (the JPEG image compression method and the generative compression GC method) to 5 test original images randomly selected from the face image set CelebA, performing compression at 3 different ratios to obtain restored images.

The two prior-art methods used in simulation experiment 1 are:

The prior-art JPEG image compression method refers to the first international digital image compression standard for still images, created by the Joint Photographic Experts Group, a body formed by the International Organization for Standardization (ISO) and the International Telegraph and Telephone Consultative Committee (CCITT); it is referred to as the JPEG image compression method for short.

The prior-art generative compression GC method refers to the image compression method based on a generative adversarial network proposed by Shibani Santurkar et al. in "Generative Compression" (Computer Vision and Pattern Recognition, 2017, Hawaii); it is referred to as the generative compression GC method for short.

The effect of the present invention will be further described with reference to the simulation diagram of fig. 2.

Fig. 2(a) shows the 5 test original images of simulation experiment 1, randomly selected from the CelebA data set. The CelebA data set, collected and organized by The Chinese University of Hong Kong, contains 202599 face images of 10177 celebrities. Each image is 64 × 64 pixels in size. Fig. 2(b) shows the restored images obtained by compressing the 5 test originals by a factor of 140 with the method of the present invention, Fig. 2(c) shows the restored images obtained by compressing the 5 test originals by a factor of 708 with the method of the present invention, Fig. 2(d) shows the restored images obtained by compressing the 5 test originals by a factor of 140 with the generative compression GC method, and Fig. 2(e) shows the restored images obtained by compressing the 5 test originals by a factor of 38 with the JPEG image compression method.

As can be seen from Fig. 2(b) and Fig. 2(c), when the original images are compressed by factors of 140 and 708 with the method of the present invention, the restored images have an overall structure similar to the originals in Fig. 2(a), clear edges, low error loss and high restoration quality. As can be seen from Fig. 2(d), when the original images are compressed by a factor of 140 with the generative compression GC method, the facial expressions in the restored images are unnatural. The main reason is that the restored image randomly generated by that method differs noticeably from the facial expression in the original image, and after pixel-level loss optimization it still cannot be optimized with respect to the deep features of the image, which leaves the expression of the restored image unnatural. As can be seen from Fig. 2(e), with the JPEG image compression method the restored image cannot maintain the overall structure of the original image even at a compression factor of only 38, and the visual effect is poor. The main reason is that the method uses a fixed codec and processes the image in blocks, which distorts the restored image.
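To give a sense of scale for these compression factors, the following sketch computes the bit budget of the compressed data under the usual definition of compression factor as raw size divided by compressed size. It assumes 3 colour channels and 8 bits per sample for the 64 × 64 test images; neither value is stated in the text, and the function name is hypothetical.

```python
def compressed_bits(width, height, channels, bits_per_sample, factor):
    """Bits available after compressing a raw image by the given factor."""
    raw_bits = width * height * channels * bits_per_sample
    return raw_bits / factor

# Assumed raw size: 64 * 64 * 3 * 8 = 98304 bits.
for factor in (38, 140, 708):
    bits = compressed_bits(64, 64, 3, 8, factor)
    print(factor, round(bits))
```

Under these assumptions a factor of 140 leaves roughly 702 bits per image and a factor of 708 leaves roughly 139 bits.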

To compare the simulation results more precisely, two evaluation indexes, the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM), are used to evaluate the restored image quality of the three methods. The PSNR and SSIM of the restored images of the present invention and of the two prior-art methods (the JPEG image compression method and the generative compression GC method) are calculated with the following formulas, and all calculation results are listed in Table 1:

$$\mathrm{PSNR} = 10\log_{10}\frac{(2^n - 1)^2}{\frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W}\bigl(X(i,j) - Y(i,j)\bigr)^2}$$

where $\log_{10}$ denotes the logarithm to the base 10, n denotes the number of bits per pixel, H and W denote the height and width of the restored image, Σ denotes the summation operation, X(i, j) denotes the pixel value of the pixel matrix of the restored image at position (i, j), and Y(i, j) denotes the pixel value of the pixel matrix of the original image at position (i, j).

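$$\mathrm{SSIM} = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}\cdot\frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}\cdot\frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$$

where $\mu_x$ and $\mu_y$ denote the pixel means of the restored image and of the original image respectively; $C_1$, $C_2$ and $C_3$ are constants, usually taken as $C_1 = 6.5025$, $C_2 = 58.5225$ and $C_3 = 29.26125$; $\sigma_x$ and $\sigma_y$ denote the standard deviations of the restored image and of the original image respectively, so that $\sigma_x^2$ and $\sigma_y^2$ are their variances; and $\sigma_{xy}$ denotes the covariance of the restored image and the original image.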
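Both indexes can be sketched in a few lines using their standard definitions. The sketch assumes single-channel images given as lists of rows and computes SSIM once over the whole image rather than over local windows, which is a simplification. The function names are hypothetical, and the default constants are the values quoted in the text.

```python
import math

def psnr(restored, original, bits_per_pixel=8):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    h, w = len(original), len(original[0])
    mse = sum((restored[i][j] - original[i][j]) ** 2
              for i in range(h) for j in range(w)) / (h * w)
    if mse == 0:
        return math.inf  # identical images
    peak = 2 ** bits_per_pixel - 1
    return 10 * math.log10(peak ** 2 / mse)

def ssim_global(restored, original, c1=6.5025, c2=58.5225, c3=29.26125):
    """Three-component SSIM computed once over the whole image (no windows)."""
    xs = [p for row in restored for p in row]
    ys = [p for row in original for p in row]
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mu_x) ** 2 for x in xs) / n
    var_y = sum((y - mu_y) ** 2 for y in ys) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)
    luminance = (2 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)
    contrast = (2 * sd_x * sd_y + c2) / (var_x + var_y + c2)
    structure = (cov + c3) / (sd_x * sd_y + c3)
    return luminance * contrast * structure

black = [[0, 0], [0, 0]]
white = [[255, 255], [255, 255]]
print(psnr(white, black))  # 0.0
```

For two identical images `psnr` returns infinity and `ssim_global` returns a value equal to 1 up to floating-point rounding.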

TABLE 1 quantitative analysis table of the present invention and prior art image restoration results in simulation experiment 1

As can be seen from Table 1, both the peak signal-to-noise ratio PSNR and the structural similarity SSIM of the present invention are higher than those of the two prior-art methods, which shows that the present invention achieves higher image restoration quality.

The above simulation experiment shows that, by constructing and training the image coding sub-network with the adversarial loss and the mixed loss as optimization targets, the method of the present invention generates restored images similar to the original images and overcomes the prior-art problem of low restored image quality caused by loss of the overall image structure at high compression ratios. The method can therefore achieve high-quality image restoration and is a very practical natural image compression method.

Simulation experiment 2:

Simulation experiment 2 applies the invention and two prior-art methods (the JPEG image compression method and the NN image compression method based on a convolutional neural network) to 6 test original images randomly selected from the general object image set CIFAR-10, performing compression at 2 different ratios to obtain restored images.

The two prior-art methods used in simulation experiment 2 are the JPEG image compression method described above and the following:

The prior-art NN image compression method based on a convolutional neural network refers to the image compression method proposed by Johannes Ballé et al. in "End-to-End Optimized Image Compression" (International Conference on Learning Representations, 2017, Toulon); it is referred to as the NN image compression method based on a convolutional neural network for short.

The effect of the present invention will be further described with reference to the simulation diagram of fig. 3.

Fig. 3(a) shows the 6 test original images of simulation experiment 2, randomly selected from the CIFAR-10 data set. The CIFAR-10 data set, organized by Alex Krizhevsky and Ilya Sutskever, consists of 50000 training images and 10000 test images covering 10 classes of objects. Each image is 32 × 32 pixels in size. Fig. 3(b) shows the restored images obtained by compressing the 6 test originals by a factor of 140 with the method of the present invention, Fig. 3(c) shows the restored images obtained by compressing the 6 test originals by a factor of 140 with the NN image compression method based on a convolutional neural network, and Fig. 3(d) shows the restored images obtained by compressing the 6 test originals by a factor of 38 with the JPEG image compression method.

As can be seen from Fig. 3(b), when the original images are compressed by a factor of 140 with the method of the present invention, the restored images have an overall structure similar to the originals in Fig. 3(a), clear edges and high restoration quality. As can be seen from Fig. 3(c), when the original images are compressed by a factor of 140 with the NN image compression method based on a convolutional neural network, the restored images are blurred and retain only the approximate contour of the objects in the original images. The main reason is that the restored image is optimized with a loss based on the peak signal-to-noise ratio, which does not take the deep features of the image into account, so detail information is lost. As can be seen from Fig. 3(d), with the JPEG image compression method the restored image cannot maintain the overall structure of the original image even at a compression factor of only 38, and the visual effect is poor. The main reason is that the method uses a fixed codec and processes the image in blocks, which distorts the restored image.

To compare the simulation results more precisely, the same two evaluation indexes, the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM), are used to evaluate the restored image quality of the three methods. The PSNR and SSIM of the restored images of the present invention and of the two prior-art methods (the JPEG image compression method and the NN image compression method based on a convolutional neural network) are calculated with the same formulas as in simulation experiment 1, and all calculation results are listed in Table 2:

Table 2. Quantitative analysis of the image restoration results of the present invention and the prior art in simulation experiment 2

As can be seen from Table 2, both the PSNR and the SSIM of the present invention are higher than those of the two prior art methods, which shows that the present invention obtains higher image restoration quality.
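For reference, the PSNR index mentioned above can be sketched as follows. This is the commonly used definition, 10·log10(MAX² / MSE), and is only assumed to match the formula of simulation experiment 1, which is not reproduced in this section. The function name `psnr` and the default peak value of 255 (8-bit pixels) are assumptions.

```python
import math


def psnr(original, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened pixel sequences.

    ASSUMPTION: standard definition 10 * log10(max_value^2 / MSE) with
    8-bit pixels (max_value = 255).
    """
    if len(original) != len(restored):
        raise ValueError("images must contain the same number of samples")
    mse = sum((o - r) ** 2 for o, r in zip(original, restored)) / len(original)
    if mse == 0:
        return math.inf  # identical images have unbounded PSNR
    return 10.0 * math.log10(max_value ** 2 / mse)


# Every pixel is off by exactly 1 gray level, so the MSE is 1.
print(round(psnr([10, 20, 30], [11, 21, 31]), 2))
```

A higher PSNR indicates a smaller mean squared error between the restored image and the original image.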

The above simulation experiments show that the present invention constructs and trains an image decoding sub-network for restoring images from compressed data. This overcomes the problems in the prior art that an independently trained generator can only generate a single class of natural images, has severe data dependency, and cannot restore images of different classes. The present invention can therefore restore images of different classes and is a more general natural image compression method.

Claims

1. A natural image compression method based on a generative adversarial network, characterized in that a spectral normalization layer is arranged in the discrimination module of an image decoding network, the network parameters of the generation module and the discrimination module are updated with the goal of minimizing a weighted total loss value, and the network parameters of the image coding sub-network are updated with the goal of minimizing a mixed loss value, the method comprising the following specific steps:

(1) constructing an image compression generation type network:

(1d) setting parameters of each layer of the image coding sub-network;

(2) training the image decoding subnetwork:

(3) training an image coding subnetwork:

(3b) calculating a mixed loss value of each image in the training set and the corresponding restored image by using the following mixed loss formula:

$$J_i = \alpha_1 J_{iD} + \alpha_2 J_{iV}$$

where $J_i$ denotes the mixed loss value of the i-th image in the training set; $\alpha_1$ and $\alpha_2$ denote weight coefficients, which are two unequal decimals randomly selected in the range [0, 1] whose sum equals 1; $J_{iD}$ denotes the distance loss value between the i-th image in the training set and its corresponding restored image; and $J_{iV}$ denotes the perception loss value between the i-th image and its corresponding restored image;

the distance loss value is calculated by the following formula:

where m denotes the total number of channels of the i-th image in the training set, w and h denote the width and height of the i-th image respectively, n denotes the total number of pixels of the i-th image, j denotes the index of a pixel in the i-th image, Σ denotes the summation operation, $\|\cdot\|_2$ denotes the two-norm operation, $y_{i,j}$ denotes the pixel value of the j-th pixel in the restored image corresponding to the i-th image, and $x_{i,j}$ denotes the pixel value of the j-th pixel in the i-th image;

the perception loss value is calculated by the following formula:

where f denotes the total number of channels of the deep feature map corresponding to the i-th image in the training set, g and d denote the width and height of the deep feature map corresponding to the i-th image respectively, u denotes the total number of pixels of the deep feature map corresponding to the i-th image, k denotes the index of a pixel in the i-th deep feature map, Σ denotes the summation operation, $\|\cdot\|_2$ denotes the two-norm operation, $a_{i,k}$ denotes the pixel value of the k-th pixel in the deep feature map of the restored image corresponding to the i-th image, and $b_{i,k}$ denotes the pixel value of the k-th pixel in the deep feature map of the i-th image;

(4) preprocessing a natural image:

cropping each natural image to a size of 64 × 64 pixels;

(5) acquiring compressed data:

(6) acquiring a restored image:
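As a non-limiting illustration of the mixed loss of step (3b) in claim 1, the following sketch combines a distance loss on pixels with a perception loss on deep feature maps. The exact distance and perception loss formulas are not reproduced in this text, so the sketch assumes that both are squared two-norm differences averaged over all samples. The function names are hypothetical, and the deep feature extractor is not modeled (feature maps are passed in directly).

```python
import math


def mean_squared_distance(a, b):
    """Average squared difference between two flattened arrays.

    ASSUMPTION: the distance and perception losses are taken to be squared
    two-norm differences averaged over all samples; the exact normalization
    of the claimed formulas is not reproduced here.
    """
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def mixed_loss(x, y, feat_x, feat_y, alpha1, alpha2):
    """J_i = alpha1 * J_iD + alpha2 * J_iV for one image.

    x, y           : flattened original and restored pixel values
    feat_x, feat_y : flattened deep feature maps of the original and the
                     restored image (the feature extractor is not modeled)
    """
    if not math.isclose(alpha1 + alpha2, 1.0) or alpha1 == alpha2:
        raise ValueError("weights must be unequal and sum to 1")
    j_d = mean_squared_distance(y, x)            # distance loss J_iD
    j_v = mean_squared_distance(feat_y, feat_x)  # perception loss J_iV
    return alpha1 * j_d + alpha2 * j_v


loss = mixed_loss([0, 0], [1, 1], [0, 0], [2, 2], alpha1=0.25, alpha2=0.75)
print(loss)  # 0.25 * 1 + 0.75 * 4 = 3.25
```

Weighting the perception term more heavily, as in this example call, penalizes differences in deep features more than per-pixel differences; the weights shown are illustrative only.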

2. The natural image compression method based on a generative adversarial network as claimed in claim 1, wherein the parameters of each layer of the image coding sub-network in step (1d) are set as follows:

setting the convolution kernel sizes of the first, second, third and fourth convolution layers to 5 × 5, the strides to 2, and the padding mode to SAME;
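As a non-limiting illustration of claim 2, the following sketch traces the spatial size of a 64 × 64 input through four convolution layers with stride 2. It assumes that SAME denotes the widely used convention in which the output size is the input size divided by the stride, rounded up, independent of the kernel size. Channel counts are not given here and are not modeled; the function names are hypothetical.

```python
import math


def same_padding_output_size(input_size, stride):
    """Spatial output size of a convolution under SAME padding.

    ASSUMPTION: SAME follows the convention output = ceil(input / stride),
    so the 5 x 5 kernel size does not affect the output size.
    """
    return math.ceil(input_size / stride)


def encoder_spatial_sizes(input_size=64, num_layers=4, stride=2):
    """List the spatial size before and after each of the stride-2 layers."""
    sizes = [input_size]
    for _ in range(num_layers):
        sizes.append(same_padding_output_size(sizes[-1], stride))
    return sizes


print(encoder_spatial_sizes())  # [64, 32, 16, 8, 4]
```

Under this assumption, each convolution layer halves the width and height, so four layers reduce a 64 × 64 input to a 4 × 4 feature map.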

3. The natural image compression method based on a generative adversarial network as claimed in claim 1, wherein the parameters of each layer of the generation module of the image decoding sub-network in step (1e) are set as follows:

setting the means of the fourth, fifth, sixth and seventh normalization layers to 0 and the variances to 1;
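As a non-limiting illustration of claim 3, the following sketch standardizes a set of activations to mean 0 and variance 1. It assumes that the normalization layers act like a standardization without a learned scale or shift, which the claim does not state; the function name `standardize` and the stabilizing constant `eps` are assumptions.

```python
import math


def standardize(values, eps=1e-5):
    """Shift and scale values to approximately zero mean and unit variance.

    ASSUMPTION: the normalization layer is modeled as plain standardization;
    `eps` is a small constant added for numerical stability and is not part
    of the claim.
    """
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return [(v - mean) / math.sqrt(var + eps) for v in values]


normalized = standardize([1.0, 2.0, 3.0, 4.0])
print([round(v, 3) for v in normalized])
```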

4. The method of claim 1, wherein the parameters of each layer of the discrimination module of the image decoding sub-network in step (1f) are set as follows:

setting the convolution kernel sizes of the eighth, ninth, tenth, eleventh and twelfth convolution layers to 5 × 5, the strides to 2, and the padding mode to SAME;

setting the normalization target of the spectral normalization layer to the maximum singular value of the parameter matrix of the current network layer;
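As a non-limiting illustration of the spectral normalization in claim 4, the following sketch estimates the maximum singular value of a parameter matrix by power iteration and divides the matrix by it. It assumes the usual form of spectral normalization, in which the normalized matrix has a maximum singular value of 1. The function names, the all-ones starting vector, and the iteration count are assumptions.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def largest_singular_value(w, iterations=100):
    """Estimate the maximum singular value of w by power iteration on w^T w.

    ASSUMPTION: the all-ones start vector is not orthogonal to the leading
    right singular vector; practical implementations use a random start.
    """
    w_t = [list(col) for col in zip(*w)]
    v = [1.0] * len(w[0])
    for _ in range(iterations):
        v = matvec(w_t, matvec(w, v))
        norm = math.sqrt(sum(x * x for x in v))
        if norm == 0:
            return 0.0
        v = [x / norm for x in v]
    return math.sqrt(sum(x * x for x in matvec(w, v)))


def spectral_normalize(w, iterations=100):
    """Divide every entry of w by its estimated maximum singular value."""
    sigma = largest_singular_value(w, iterations)
    if sigma == 0:
        return [row[:] for row in w]
    return [[x / sigma for x in row] for row in w]


weights = [[3.0, 0.0], [0.0, 4.0]]
print(spectral_normalize(weights))
```

Bounding the maximum singular value of each layer in this way limits how strongly the discrimination module can amplify changes in its input.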

5. The natural image compression method based on a generative adversarial network as claimed in claim 1, wherein the weighted total loss formula in step (2b) is as follows:

$$l_i = \lambda_1 l_{iD} + \lambda_2 l_{iI}$$

where $l_i$ denotes the weighted total loss value of the i-th image in the training set; $\lambda_1$ and $\lambda_2$ denote weight coefficients, which are two unequal decimals randomly selected in the range [0, 1] whose sum equals 1; $l_{iD}$ denotes the distance loss value between the i-th image in the training set and its corresponding restored image; and $l_{iI}$ denotes the mutual information loss value between the i-th group of compressed data and its corresponding restored image;

the distance loss value is calculated by the following formula:

the mutual information loss value is calculated by the following formula:

$$l_{iI} = E[\ln Q(c_t, y_i)] + P(c_t)\log_2 P(c_t)$$
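As a non-limiting illustration of the weighting in claim 5, the following sketch draws two unequal weight coefficients in [0, 1] that sum to 1 and forms the weighted total loss from given distance and mutual information loss values. The two loss values are taken as inputs and are not computed here. Uniform sampling and the function names are assumptions.

```python
import random


def sample_unequal_weights(rng):
    """Draw two unequal weights in [0, 1] whose sum equals 1.

    ASSUMPTION: the first weight is drawn uniformly; the claim only requires
    random selection, not a particular distribution.
    """
    while True:
        w1 = rng.random()  # uniform in [0, 1)
        w2 = 1.0 - w1
        if w1 != w2:       # reject the single equal case w1 == w2 == 0.5
            return w1, w2


def weighted_total_loss(distance_loss, mutual_info_loss, weights):
    """l_i = lambda1 * l_iD + lambda2 * l_iI for one image."""
    lambda1, lambda2 = weights
    return lambda1 * distance_loss + lambda2 * mutual_info_loss


rng = random.Random(0)  # fixed seed only to make the example repeatable
lambdas = sample_unequal_weights(rng)
print(lambdas, weighted_total_loss(2.0, 4.0, lambdas))
```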

6. The natural image compression method based on a generative adversarial network as claimed in claim 1, wherein the network parameter update formula of the stochastic gradient descent algorithm in step (2c) and step (3c) is as follows:

$$\theta_{v+1} = \theta_v - L'(\theta_v)$$

where $\theta_{v+1}$ denotes the network parameters after the (v+1)-th update and $\theta_v$ denotes the network parameters after the v-th update; the network parameters in step (2c) are those of the generation module and the discrimination module, and the network parameters in step (3c) are those of the image decoding sub-network; $L'$ denotes the partial derivative operation; $L'(\theta_v)$ denotes the partial derivative of the loss value $L(\theta)$ evaluated at the network parameters $\theta_v$; the loss value $L(\theta)$ in step (2c) is the weighted total loss value, and the loss value $L(\theta)$ in step (3c) is the mixed loss value.
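As a non-limiting illustration of the update formula in claim 6, the following sketch applies the rule to a single scalar parameter. The toy loss L(θ) = 0.25·(θ − 3)² is hypothetical and stands in for the weighted total loss or the mixed loss. The formula as written contains no separate learning rate, so the sketch subtracts the derivative directly; practical stochastic gradient descent usually scales the derivative by a step size.

```python
def gradient_step(theta, grad_fn):
    """One update theta_{v+1} = theta_v - L'(theta_v), with no step size."""
    return theta - grad_fn(theta)


def toy_loss_grad(theta):
    """Derivative of the hypothetical loss L(theta) = 0.25 * (theta - 3)^2."""
    return 0.5 * (theta - 3.0)


theta = 0.0
for _ in range(20):
    theta = gradient_step(theta, toy_loss_grad)

print(theta)  # approaches the minimizer theta = 3
```

For this toy loss, each update halves the distance to the minimizer, so the parameter converges toward 3.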