CN114663315A - Image bit enhancement method and device for generating countermeasure network based on semantic fusion - Google Patents

Image bit enhancement method and device for generating countermeasure network based on semantic fusion

Info

Publication number
CN114663315A
CN114663315A
Authority
CN
China
Prior art keywords
image
bit
semantic
bit image
input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210326308.XA
Other languages
Chinese (zh)
Other versions
CN114663315B (en)
Inventor
Liu Jing
Mi Xiaofeng
Dou Qianqian
Su Yuting
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202210326308.XA priority Critical patent/CN114663315B/en
Publication of CN114663315A publication Critical patent/CN114663315A/en
Application granted granted Critical
Publication of CN114663315B publication Critical patent/CN114663315B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G06T5/90
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/40Analysis of texture
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20172Image enhancement details
    • G06T2207/20192Edge enhancement; Edge preservation

Abstract

The invention discloses an image bit enhancement method and device based on a semantic fusion generative adversarial network. The method comprises the following steps: the generator takes a residual structure as its framework, adds a global skip connection from the input image to the output together with a semantic fusion layer, takes the zero-padded high-bit image and the semantic segmentation result as input, and outputs a reconstructed high-bit image; the discriminator is based on a VGG network and has an image real/fake prediction branch and an image semantic category prediction branch: it takes a reconstructed high-bit image or a real high-bit image as input and outputs a prediction of whether the input is a reconstructed image and a prediction of the semantic category to which it belongs; the zero-padded high-bit image and the semantic segmentation result obtained by preprocessing the test set are input into the generator network loaded with the trained model parameters to obtain a reconstructed high-bit image. The device comprises a processor and a memory. The method imposes pixel-level texture generation constraints, preserves the texture and edge details of abrupt-change regions, and improves visual quality.

Description

Image bit enhancement method and device for generating countermeasure network based on semantic fusion
Technical Field
The invention relates to the field of image bit enhancement, and in particular to an image bit enhancement method and device based on a semantic fusion generative adversarial network.
Background
The bit depth of an image is the number of bits occupied by the quantized pixel values of each color channel. The basic goal of Bit-Depth Enhancement (BDE) is to reconstruct the corresponding high-bit image from an existing low-bit image. With the fast iteration of display technology, the latest high-definition displays and High Dynamic Range (HDR) displays [1] can display 10-bit or 12-bit high-bit images, which provide richer image details and make pictures look more real and natural. Most existing images, however, are stored at 8 bits or even lower. When a low-bit image is naively enhanced and shown on a high-bit device, severe false contours and color distortion appear, which fails to satisfy the human visual system, capable of distinguishing image detail at the 12-bit level [2].
Image bit enhancement is one of the important technologies for improving image display quality: it converts a low-bit image into a high-bit image algorithmically, and can thereby overcome the inherent limitations of imaging hardware such as image sensors. Bit enhancement algorithms can be divided into conventional methods [3][4][5][6] and deep-learning-based methods [7][8][9]. The conventional methods can be further classified into simple-calculation-based, interpolation-based, and filtering-based methods. Simple-calculation-based methods are highly efficient, but the images they reconstruct suffer from severe false contours. Interpolation- and filtering-based methods noticeably alleviate the false contour problem, but blur image textures and edges, so the reconstructed images lack high-frequency details.
In recent years, deep neural networks have proven to have powerful feature representation and learning capabilities, and have been widely applied to computer vision tasks including super-resolution [9][10], semantic segmentation, and object recognition. Methods based on Convolutional Neural Networks (CNN) have been shown to far exceed conventional algorithms on the image bit enhancement task. Moreover, by training the convolutional neural network with an added perceptual loss [11], deep-learning bit enhancement methods can preserve some edge details while suppressing false contours, improving the visual quality of the reconstructed high-bit image. In larger color gradient areas, however, the false contour problem remains unsolved; the texture details of the reconstructed image are also still unsatisfactory, and visual quality needs further improvement.
Disclosure of Invention
The invention provides an image bit enhancement method and device based on a semantic fusion generative adversarial network. Semantic segmentation results are introduced into the generative adversarial network to build the bit enhancement algorithm and to impose pixel-level texture generation constraints on the reconstructed high-bit image. In addition, the invention computes a gradient loss based on a gradual-change region detection result, assigning a larger loss weight to gradual-change regions, so that false contours in gradual-change regions are further suppressed while the texture and edge details of abrupt-change regions are preserved, improving image visual quality. Details are given below:
an image bit enhancement method for generating a countermeasure network based on semantic fusion, the method comprising:
1) the generator takes a residual structure as its framework, adds a global skip connection from the input image to the output together with a semantic fusion layer, takes the zero-padded high-bit image and the semantic segmentation result as input, and outputs a reconstructed high-bit image;
the discriminator is based on a VGG network and has an image real/fake prediction branch and an image semantic category prediction branch; it takes a reconstructed high-bit image or a real high-bit image as input and outputs a prediction of whether the input is a reconstructed image and a prediction of the semantic category to which it belongs;
2) taking the zero-padded high-bit image and the semantic segmentation result obtained by preprocessing the training set as input data of the generator, taking the reconstructed high-bit or real high-bit image as input data of the discriminator, and alternately training the model parameters of the generator network and the discriminator network with an Adam optimizer, using a combined loss function that includes the gradient loss over gradual-change and abrupt-change regions;
3) inputting the zero-padded high-bit image and the semantic segmentation result obtained by preprocessing the test set into the generator network loaded with the trained model parameters to obtain a reconstructed high-bit image.
Wherein, prior to step 1), the method further comprises preprocessing the images in the training set, specifically:
quantizing the real high-bit image to obtain a low-bit image, and zero-padding the bits of the low-bit image to obtain the zero-padded high-bit image; and obtaining the semantic segmentation result and the gradual-change region detection result from the real high-bit image.
Further, preprocessing the test set specifically comprises:
quantizing the real high-bit image into a low-bit image, and zero-padding the bits of the low-bit image to obtain the zero-padded high-bit image; and obtaining the semantic segmentation result from the zero-padded high-bit image.
Wherein the generator is a semantic fusion generator based on a residual network structure, composed of four residual groups; each residual group comprises one semantic fusion residual block and several universal residual blocks, with skip connections between adjacent residual blocks;
the universal residual block is composed of two convolutional layers using 3 × 3 convolutions and one ReLU activation layer.
Furthermore, the semantic fusion layer takes the image features and the semantic segmentation result as input, generates semantic feature information corresponding to the image features from the semantic segmentation result, and semantically enhances the image feature map. The semantic fusion layer is composed of convolutional layers: the first convolutional layer and the subsequent activation layer are parameter-shared, and the last two convolutional layers take the activation output as input to produce the transformed semantic feature information.
Wherein the discriminator is composed of a series of convolutions; the first branch at its back end discriminates real from fake images, the second branch predicts the input image as one of eight categories, and the two branches share all preceding parameters.
The combined loss function is:
L_total = α·L_reconstruction + β·L_perception + γ·L_gradient-flat + δ·L_gradient-edge + ε·L_adversary + ζ·L_classification
wherein L_reconstruction is the L1-norm reconstruction loss, L_perception the perceptual loss, L_gradient-flat and L_gradient-edge the gradient losses over the gradual-change and abrupt-change regions respectively, L_adversary the generator adversarial loss, L_classification the cross-entropy classification loss, and α, β, γ, δ, ε, ζ the summation weights of the individual losses.
An image bit enhancement apparatus based on a semantic fusion generative adversarial network, the apparatus comprising: a processor and a memory, the memory having stored therein program instructions, the processor calling the program instructions stored in the memory to cause the apparatus to perform any of the method steps described above.
The technical scheme provided by the invention has the beneficial effects that:
1. On the basis of a generative adversarial network, the generator takes the semantic segmentation result corresponding to the input image as auxiliary input data, converts it into semantic feature information, and semantically enhances the image in feature space, so that the reconstructed high-bit image has real, natural texture;
2. The invention adds an image semantic category prediction branch to a generic discriminator, giving it stronger discrimination capability, so that it can supervise the generator to produce high-bit-depth images closer to reality;
3. The invention proposes a gradient loss function over gradual-change and abrupt-change regions: using the gradual-change region detection result, a larger weight is applied when computing the loss in gradual-change regions, which further suppresses false contours there while preserving more of the image's local texture and semantic structure information.
Drawings
FIG. 1 is a flow chart of the image bit enhancement method based on a semantic fusion generative adversarial network;
FIG. 2 is a block diagram of the image bit enhancement method based on a semantic fusion generative adversarial network;
FIG. 3 is the network architecture of the semantic fusion generator;
FIG. 4 is the network structure of the discriminator;
FIG. 5 is a schematic structural diagram of the image bit enhancement apparatus based on a semantic fusion generative adversarial network.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention are described in further detail below.
Example 1
The embodiment of the invention provides an image bit enhancement method based on a semantic fusion generative adversarial network, which constructs a semantic-fusion generative adversarial network and optimizes the network model with a combined loss function. The method comprises the following steps:
101: preprocess the images of the training sets OutdoorSceneTrain [10] and DIV2K [12] to obtain the semantic segmentation results and gradual-change region detection results of the training set;
the preprocessing of the images in the training set specifically comprises:
quantizing the real high-bit image to obtain a low-bit image, then zero-padding the bits of the low-bit image to obtain the zero-padded high-bit image; and obtaining the semantic segmentation result and the gradual-change region detection result from the real high-bit image;
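As a concrete illustration of this preprocessing, the sketch below quantizes a high-bit image down and zero-pads it back (Python/NumPy; the 16-to-4-bit setting, dtype, and function name are illustrative assumptions rather than values fixed by the text):

```python
import numpy as np

def quantize_and_zero_pad(img, high_bits=16, low_bits=4):
    # Quantize: keep only the `low_bits` most significant bits.
    shift = high_bits - low_bits
    low = img.astype(np.uint16) >> shift
    # Zero-pad: shift back up, leaving the missing LSBs as zeros.
    return (low << shift).astype(np.uint16)

# Example: degrade a synthetic 16-bit image to 4 bits and zero-pad it back.
hbd = np.random.randint(0, 2 ** 16, size=(64, 64, 3), dtype=np.uint16)
zp = quantize_and_zero_pad(hbd)   # input to the generator
```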
The training sets OutdoorSceneTrain [10] and DIV2K [12] are both 8-bit datasets; they are known to those skilled in the art and are not described further here.
102: use a generative adversarial network as the basic core network. The generator takes the residual structure [14] as its core framework, adds a global skip connection from the input image to the output together with semantic fusion layers, takes the zero-padded high-bit image and the semantic segmentation result as input, and outputs a reconstructed high-bit image. The discriminator is based on a VGG network and has an image real/fake prediction branch and an image semantic category prediction branch; it takes the reconstructed high-bit image or a real high-bit image as input and outputs a prediction of whether the input is a reconstructed image and a prediction of the semantic category to which it belongs;
103: in the training stage, take the zero-padded high-bit image and the semantic segmentation result obtained by preprocessing the training set as input data of the generator, take the reconstructed high-bit or real high-bit image as input data of the discriminator, and alternately train the model parameters of the generator network and the discriminator network with the Adam [15] optimizer, using a combined loss function that includes the gradient loss over gradual-change and abrupt-change regions;
both reconstructed and real high-bit images are used throughout training, but only one image is input to the discriminator at a time, so the two kinds of images are fed alternately.
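The alternating scheme can be pictured with the following schematic loop (PyTorch; `generator`, `discriminator`, `combined_loss`, `disc_loss`, and the loader's output format are hypothetical names standing in for the components described in this section):

```python
import torch

def train_alternating(generator, discriminator, loader,
                      combined_loss, disc_loss, epochs=1, lr=1e-4):
    """Schematic alternating optimization of generator and discriminator."""
    g_opt = torch.optim.Adam(generator.parameters(), lr=lr)
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=lr)
    for _ in range(epochs):
        for zp_img, seg, real_img, labels in loader:
            # Discriminator step: real and reconstructed images are fed
            # alternately (the generator output is detached here).
            fake_img = generator(zp_img, seg).detach()
            d_loss = disc_loss(discriminator(real_img),
                               discriminator(fake_img), labels)
            d_opt.zero_grad(); d_loss.backward(); d_opt.step()
            # Generator step: combined loss including the gradient loss
            # over gradual-change and abrupt-change regions.
            fake_img = generator(zp_img, seg)
            g_loss = combined_loss(fake_img, real_img, discriminator, labels)
            g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```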
104: in the testing phase, preprocess the test set Adobe5K [13]: quantize the real high-bit image into a low-bit image and zero-pad the low-bit image to obtain the zero-padded high-bit image; obtain the semantic segmentation result from the zero-padded high-bit image; then input the zero-padded high-bit image and the semantic segmentation result into the generator network loaded with the trained model parameters to obtain a reconstructed high-bit image.
The similarity between the reconstructed high-bit image and the real high-bit image is computed with the relevant objective evaluation criteria to verify the effectiveness of the method.
Adobe5K is a 16-bit dataset well known to those skilled in the art and is not described further here.
In summary, steps 101 to 104 design a bit-depth enhancement method based on a semantic fusion generative adversarial network: the low-bit image is inversely quantized to obtain the zero-padded high-bit image, the semantic segmentation result is obtained from it, and both are input into the trained generator network to obtain a reconstructed high-quality high-bit image. By introducing the semantic segmentation result and using it to guide the reconstruction of the high-bit image, the adopted generative adversarial network has stronger learning capability, and the reconstructed high-bit image achieves a better visual display effect.
Example 2
The scheme of Example 1 is further described below with reference to examples and calculation formulas:
201: outdoorscreentrain is a data set of outdoor scenes containing: seven semantic categories of mountain, water, sky, grassland, vegetation, animal and building. The "sky" category tends to have a large range of smooth color gradient structures, while other categories have rich texture details. Therefore, the OutdoORSceneTracin can generate the color gradient of the image gradient area and the texture of the mutation area learned by the countermeasure network, is beneficial to reducing the false contour of the gradient area and retaining the detail information of the texture, and is beneficial to improving the visual quality of the reconstructed high-bit image. Therefore, OutdoORSceneTrain is selected as the training set. In addition, considering that many images cannot be strictly classified into one of the above seven semantics in practice, DIV2K is used as a background class training set. To verify the effectiveness of the method, Adobe5K was used as a test set.
The pixel values at corresponding positions of a low-bit image and a high-bit image are very similar; they differ only in the least significant bits. The key to image bit enhancement is therefore the reconstruction of the least significant bits, and the difficulty is how to recover the texture and edge details of the image while suppressing the false contours of gradual-change regions. Conventional methods often leave severe false contours or over-blur edges; convolutional-neural-network-based methods have made great progress in improving the visual quality of the reconstructed high-bit image, but still cannot completely solve the false contour problem in large color-gradient areas, and their reconstructions usually lack texture details. Generative adversarial networks have strong learning and feature representation capabilities, can generate images of high visual quality, and have been widely used in computer vision tasks such as super-resolution reconstruction. For a generative adversarial network to generate true natural textures, an effective approach is to use semantic segmentation results as auxiliary input data.
The embodiment of the invention therefore uses a generative adversarial network based on semantic fusion. An image contains gradual-change and abrupt-change color regions: image blocks in gradual-change regions (such as the sky) have more uniform color changes and less obvious texture, whereas abrupt-change regions (such as rough rocks) have non-uniform color changes and rich texture. Considering the different visual characteristics of these two kinds of regions in the bit-depth enhancement task, and that false contours in gradual-change regions degrade display quality more visibly, the embodiment of the invention proposes a gradient loss over gradual-change and abrupt-change regions. This loss measures the difference between the gradients of the true high-bit image and of the reconstructed high-bit image, with the gradient loss of gradual-change regions given a larger weight than that of abrupt-change regions.
Based on the above analysis, the embodiment of the invention reconstructs the high-bit image with a generative adversarial network and introduces the semantic segmentation result into the generator as auxiliary input data; the overall framework is shown in fig. 2. The whole generative adversarial network model is trained with a combined loss function of gradient loss, adversarial loss, pixel-level reconstruction loss, perceptual loss, and image semantic category prediction loss. When the gradient loss is computed, the gradual-change region detection result is used and the gradual-change regions are given a larger weight than the abrupt-change regions, which drives the model to pay more attention to the gradual-change regions of the image and further suppresses their false contours. The training and test images are quantized into low-bit images, and zero-padded high-bit images are obtained by bit zero-padding as the input of the generator; the high-bit images of the training set and the zero-padded high-bit images of the test set are fed into a trained semantic segmentation network [10] to output the corresponding semantic segmentation results; and gradual-change region detection is performed on the high-bit images of the training set to obtain the gradual-change region detection results.
202: although a conventional generative adversarial network has a strong ability to generate image texture and edge details, the generated textures are not realistic and the generated edge positions are not accurate enough. Therefore, the semantic segmentation result is fused into the generator to realize pixel-level semantic generation constraints, ensuring that the reconstructed high-bit image has a real, natural texture and edge structure and further improving its visual quality.
As shown in fig. 3, the semantic fusion generator is based on a residual network structure. Considering computational complexity, it consists mainly of four residual groups, each comprising one semantic fusion residual block and several universal residual blocks, with skip connections between adjacent residual blocks; this noticeably alleviates the vanishing-gradient problem and improves network training. In addition, since the data difference between a low-bit image and a high-bit image lies only in the least significant bits, a global skip connection from the input image to the output is added in the generator, so the generator network actually learns the missing least significant bits of the low-bit image; compared with directly recovering the high-bit image, recovering the missing least significant bits is equivalent but simpler and easier to realize.
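The global-skip idea can be sketched as follows (PyTorch); here the trunk is a plain stack of convolutional blocks rather than the four semantic-fusion residual groups, and all channel counts are illustrative assumptions:

```python
import torch.nn as nn

class GlobalSkipGenerator(nn.Module):
    """The trunk predicts only a residual, which is added onto the
    zero-padded input, so the network learns the missing LSBs."""
    def __init__(self, ch=64, n_blocks=4):
        super().__init__()
        self.head = nn.Conv2d(3, ch, 3, padding=1)
        self.trunk = nn.Sequential(*[
            nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                          nn.ReLU(inplace=True),
                          nn.Conv2d(ch, ch, 3, padding=1))
            for _ in range(n_blocks)])
        self.tail = nn.Conv2d(ch, 3, 3, padding=1)

    def forward(self, zero_padded):
        # Global skip connection from the input image to the output.
        return zero_padded + self.tail(self.trunk(self.head(zero_padded)))
```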
The universal residual block is composed of two convolutional layers (Conv) using 3 × 3 convolutions and one ReLU activation layer; the ReLU layer alleviates the problems of gradient vanishing and overfitting. A Batch Normalization (BN) layer can corrupt the contrast information of the image, so, as in many deep-learning super-resolution tasks, the universal residual block of the generator does not use BN layers. Following SFTGAN [10], the semantic fusion residual block adds a Semantic Fusion (SF) layer to the universal residual block. The SF layer takes the image features and the semantic segmentation result as input and fuses semantic information into the image features. Specifically, the SF layer generates semantic feature information corresponding to the image features from the semantic segmentation result and semantically enhances the image feature map:
F_out = F ⊙ scale(S) ⊕ shift(S)  (1)
where F denotes the feature map, S the semantic segmentation result of the corresponding image, scale(S) and shift(S) the two transformed semantic feature maps that the SF layer generates from S, ⊙ pixel-wise multiplication, and ⊕ element-wise addition.
The semantic fusion layer is mainly composed of convolutional layers: the first convolutional layer and the subsequent activation layer are parameter-shared, and the last two convolutional layers take the activation output as input to produce the transformed semantic feature information, whose spatial dimensions match those of the input features. By semantically enhancing the image features with the semantic segmentation result, the SF layer realizes pixel-level semantic generation constraints and ensures that the reconstructed image has more real, natural textures.
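A minimal sketch of the universal residual block and the SF layer under the scale-and-shift reading of Eq. (1) (PyTorch; channel sizes and the number of segmentation channels are illustrative assumptions):

```python
import torch.nn as nn

class SemanticFusionLayer(nn.Module):
    """Shared conv + activation over the segmentation maps, then two convs
    producing a scale and a shift that modulate the image features."""
    def __init__(self, feat_ch=64, seg_ch=8, hidden=32):
        super().__init__()
        self.shared = nn.Sequential(nn.Conv2d(seg_ch, hidden, 1),
                                    nn.ReLU(inplace=True))
        self.to_scale = nn.Conv2d(hidden, feat_ch, 1)
        self.to_shift = nn.Conv2d(hidden, feat_ch, 1)

    def forward(self, feat, seg):
        h = self.shared(seg)   # parameter-shared front end
        return feat * self.to_scale(h) + self.to_shift(h)   # Eq. (1)

class UniversalResidualBlock(nn.Module):
    """Two 3x3 convolutions and one ReLU, no batch normalization."""
    def __init__(self, ch=64):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(ch, ch, 3, padding=1),
                                  nn.ReLU(inplace=True),
                                  nn.Conv2d(ch, ch, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)   # skip connection
```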
In addition, since the generator and the discriminator of a generative adversarial network compete with each other during training, improving the discrimination ability of the discriminator improves, to a certain extent, the quality of the images produced by the generator. As shown in fig. 4, the discriminator uses a VGG-like structure composed of a series of convolutions: as the number of layers increases, the number of output feature channels gradually increases while the spatial size of the features gradually decreases. The back end of the discriminator is divided into two branches: the first discriminates real from fake images, and the second predicts the input image as one of eight categories. The two branches share all preceding parameters to reduce computation.
The eight categories are mountain, water, sky, grassland, vegetation, animal, building, and background.
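The two-branch discriminator can be sketched as follows (PyTorch; the number of stages, channel widths, and activation choices are illustrative assumptions about the VGG-style trunk):

```python
import torch.nn as nn

class TwoBranchDiscriminator(nn.Module):
    """Shared convolutional trunk, then a real/fake branch and an
    8-way semantic-category branch."""
    def __init__(self, in_ch=3, n_classes=8):
        super().__init__()
        layers, prev, ch = [], in_ch, 64
        for _ in range(4):   # channels grow, spatial size shrinks
            layers += [nn.Conv2d(prev, ch, 3, stride=1, padding=1),
                       nn.LeakyReLU(0.2),
                       nn.Conv2d(ch, ch, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2)]
            prev, ch = ch, ch * 2
        self.trunk = nn.Sequential(*layers,
                                   nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.real_fake = nn.Linear(prev, 1)          # branch 1: real vs. fake
        self.classify = nn.Linear(prev, n_classes)   # branch 2: category

    def forward(self, x):
        h = self.trunk(x)   # all trunk parameters are shared
        return self.real_fake(h), self.classify(h)
```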
203: the whole generative adversarial network model is trained with the Adam optimizer, optimizing the network parameters with a combined loss function of pixel-level reconstruction loss, perceptual loss, gradient loss over gradual-change and abrupt-change regions, adversarial loss, and image semantic category prediction loss.
The pixel-level reconstruction loss is the most widely used loss function in image-to-image tasks; it measures the pixel-wise difference between the reconstructed image and the target image. The reconstruction loss using the L1-norm is defined as:
L_reconstruction = ||Î_HBD − I_HBD||_1  (2)
where Î_HBD is the reconstructed high-bit image and I_HBD is the true high-bit image.
The perceptual loss is another widely used deep-learning loss function; it measures the distance between two images in feature space and is defined as:
L_perception = Σ_{i∈{2,7,16,25,34}} ||φ_i(Î_HBD) − φ_i(I_HBD)||_1  (3)
where φ_i denotes the output of the i-th convolutional layer of the image feature extraction network (VGG-19); the invention takes i = 2, 7, 16, 25, 34, and none of these features have undergone pooling operations.
The gradient loss is computed separately over the gradual-change and abrupt-change regions, with a larger loss weight for the gradual-change regions; this further suppresses the false contours of the gradual-change regions while also preserving structural information to a greater extent. The gradual-change region detection result is obtained from differences of local pixel values. For any pixel g in the image, the local average pixel value difference is computed as:
d_local = |g − (1/P)·Σ_{i=1…P} g_i|  (4)
where g_i (i = 1, 2, …, P) are the neighboring pixels, P is the total number of neighboring pixels, and (1/P)·Σ g_i is their average; here P = 8.
Similarly, the global average pixel value difference is computed as:
d_global = (1/N)·Σ_{n=1…N} |g_n − ḡ|  (5)
where N is the total number of image pixels and ḡ is the average pixel value. The gradual-change region detection result is obtained by comparing the local and global average pixel value differences:
d_local < 0.05 × d_global  (6)
If the above inequality is satisfied, the center pixel position of the local area belongs to a gradual-change region and is recorded as 1; otherwise it belongs to an abrupt-change region and is recorded as 0. This yields a pixel-level binary gradual-change region detection map, denoted M_flat. The gradient losses of the gradual-change and abrupt-change regions are then computed as:
L_gradient-flat = ||M_flat ⊙ (G_HBD − Ĝ_HBD)||_1,  L_gradient-edge = ||(1 − M_flat) ⊙ (G_HBD − Ĝ_HBD)||_1  (7)
where G_HBD and Ĝ_HBD denote the gradient maps of the true high-bit image and the reconstructed high-bit image, respectively.
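Eqs. (4)-(7) admit a compact sketch (PyTorch); the 8-neighbour mean is computed with a fixed convolution kernel, and a simple finite-difference gradient is assumed since the text does not specify the gradient operator:

```python
import torch
import torch.nn.functional as F

def flat_region_mask(gray):
    """Binary gradual-change mask of Eqs. (4)-(6); `gray` is (B,1,H,W)."""
    kernel = torch.ones(1, 1, 3, 3, device=gray.device)
    kernel[0, 0, 1, 1] = 0                           # exclude the center pixel
    neigh_mean = F.conv2d(gray, kernel / 8.0, padding=1)
    d_local = (gray - neigh_mean).abs()              # Eq. (4)
    d_global = (gray - gray.mean()).abs().mean()     # Eq. (5)
    return (d_local < 0.05 * d_global).float()       # Eq. (6): M_flat

def gradient_loss(real, fake, w_flat=2.0, w_edge=1.0):
    """Eq. (7): L1 gradient difference, weighted more on flat regions;
    w_flat > w_edge play the roles of gamma and delta (illustrative)."""
    def gmag(x):                                     # finite-difference |grad|
        gx = F.pad(x[..., :, 1:] - x[..., :, :-1], (0, 1))
        gy = F.pad(x[..., 1:, :] - x[..., :-1, :], (0, 0, 0, 1))
        return gx.abs() + gy.abs()
    m = flat_region_mask(real.mean(dim=1, keepdim=True))
    d = (gmag(real) - gmag(fake)).abs()
    return (w_flat * m * d + w_edge * (1 - m) * d).mean()
```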
The adversarial mechanism guides the generator to produce more realistic, natural content, and is particularly effective at improving perceptual quality in texture details. The generator adversarial loss function of the embodiment of the invention is:
L_adversary = −log D_η(G_θ(x))  (8)
where x denotes the zero-padded low-bit input image, D_η is the discriminator output, and G_θ(x) is the reconstructed high-bit image generated by the generator from x.
The image semantic category prediction loss is computed from the output of the discriminator's semantic category prediction branch; it drives the generator to produce texture details that match the actual semantic category. The embodiment of the invention uses a classification loss in cross-entropy form:
L_classification = −Σ_{i=1…N} z_i · log(e^{q_i} / Σ_{j=1…N} e^{q_j})  (9)
where N = 8 is the number of semantic categories considered by the invention, z_i is the original semantic label, and q_i (q_j) is the confidence of the semantic category prediction branch that the input image belongs to the i-th (j-th) category.
In summary, the combined loss function used to train the generative adversarial model is:
L_total = α·L_reconstruction + β·L_perception + γ·L_gradient-flat + δ·L_gradient-edge + ε·L_adversary + ζ·L_classification  (10)
where α, β, γ, δ, ε, ζ are the summation weights of the individual losses.
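Putting the terms together, Eq. (10) can be assembled as below (PyTorch; the weight values are illustrative, the perceptual term is omitted for brevity, and `gradient_loss` refers to the sketch given after Eq. (7)):

```python
import torch
import torch.nn.functional as F

def combined_loss(fake, real, discriminator, labels,
                  alpha=1.0, gamma=2.0, delta=1.0, eps_w=5e-3, zeta=0.1):
    rec = F.l1_loss(fake, real)                                   # Eq. (2)
    grad = gradient_loss(real, fake, w_flat=gamma, w_edge=delta)  # Eq. (7)
    real_fake_logit, class_logits = discriminator(fake)
    adv = -torch.log(torch.sigmoid(real_fake_logit) + 1e-8).mean()  # Eq. (8)
    cls = F.cross_entropy(class_logits, labels)                   # Eq. (9)
    # The beta-weighted perceptual term of Eq. (3) (VGG-19 features)
    # is omitted here for brevity.
    return alpha * rec + grad + eps_w * adv + zeta * cls          # Eq. (10)
```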
Example 3
The schemes of Examples 1 and 2 are evaluated below with specific experimental data:
301: data composition;
the training set is formed by randomly extracting one thousand images from seven semantics of an OutdoORSceneTracin data set and randomly extracting two hundred images from a DIV2K, wherein the total of seven thousand images and two hundred images are extracted.
The test set consisted of fifty images randomly drawn by Adobe 5K.
302: evaluating criteria;
the embodiment of the invention adopts two objective evaluation indexes to evaluate the quality of the reconstructed high-bit image:
the Peak Signal to Noise Ratio (PSNR) is the most widely used image objective quality evaluation index to calculate the difference between corresponding pixels of the contrast image and the original image. The higher the PSNR index is, the greater the similarity between the contrast image and the original image is, and the better the image quality is.
Structural Similarity Index (Structural Similarity Index, SSIM)[16]Is an index for measuring the similarity between two images by comparing the brightness, contrast and structure between the two images. The SSIM value is between 0 and 1, and the larger the value, the smaller the image distortion and the better the quality.
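For reference, PSNR can be computed directly, while SSIM is usually taken from a library implementation (a minimal sketch; the 16-bit peak value is an assumption matching the Adobe5K setting):

```python
import numpy as np

def psnr(ref, test, peak=65535.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# SSIM compares luminance, contrast, and structure over local windows;
# e.g. skimage.metrics.structural_similarity implements it.
```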
303: comparison algorithms;
The method is compared with four methods in the experiments: two conventional methods and two deep-learning methods.
The conventional methods are bit Zero Padding (ZP) and Intensity Potential for Adaptive De-quantization (IPAD) [6].
The deep-learning methods are the end-to-end convolutional image bit-depth enhancement algorithm BE-CNN (Bit-Depth Enhancement via Convolutional Neural Network) [7] and the dense-feature-concatenation bit-depth enhancement algorithm BE-CALF (Bit-Depth Enhancement by Concatenating All Level Features of DNN) [8].
Table 1 shows the results of 4-bit to 16-bit enhancement on the Adobe5K dataset for this method and the comparison methods (the best value of each metric is shown in bold in the original). As can be seen from Table 1, both the PSNR and the SSIM of the deep-learning methods are significantly better than those of the conventional methods, and the PSNR and SSIM obtained by the method of the embodiment of the invention on the test set are the highest among all algorithms.
TABLE 1: PSNR and SSIM of 4-bit to 16-bit enhancement on Adobe5K (reproduced only as an image in the source; the numerical values are not recoverable here).
Example 4
An image bit enhancement apparatus based on a semantic fusion generative adversarial network, referring to fig. 5. The apparatus comprises a processor and a memory; the memory stores program instructions, and the processor calls the program instructions to perform the following steps:
1) the generator takes a residual structure as its framework, adds a global skip connection from the input image to the output together with a semantic fusion layer, takes the zero-padded high-bit image and the semantic segmentation result as input, and outputs a reconstructed high-bit image;
the discriminator is based on a VGG network and has an image real/fake prediction branch and an image semantic category prediction branch; it takes a reconstructed high-bit image or a real high-bit image as input and outputs a prediction of whether the input is a reconstructed image and a prediction of the semantic category to which it belongs;
2) taking the zero-padded high-bit image and the semantic segmentation result obtained by preprocessing the training set as input data of the generator, taking the reconstructed high-bit or real high-bit image as input data of the discriminator, and alternately training the model parameters of the generator network and the discriminator network with an Adam optimizer, using a combined loss function that includes the gradient loss over gradual-change and abrupt-change regions;
3) inputting the zero-padded high-bit image and the semantic segmentation result obtained by preprocessing the test set into the generator network loaded with the trained model parameters to obtain a reconstructed high-bit image.
Wherein, before step 1), the method further comprises preprocessing the images in the training set, specifically:
quantizing the real high-bit image to obtain a low-bit image, and zero-padding the bits of the low-bit image to obtain the zero-padded high-bit image; and obtaining the semantic segmentation result and the gradual-change region detection result from the real high-bit image.
Further, the test set is preprocessed, specifically:
quantizing the real high-bit image into a low-bit image, and zero-padding the bits of the low-bit image to obtain the zero-padded high-bit image; and obtaining the semantic segmentation result from the zero-padded high-bit image.
Wherein the generator is a semantic fusion generator based on a residual network structure, composed of four residual groups; each residual group comprises one semantic fusion residual block and several universal residual blocks, with skip connections between adjacent residual blocks;
the universal residual block is composed of two convolutional layers using 3 × 3 convolutions and one ReLU activation layer.
Furthermore, the semantic fusion layer takes the image features and the semantic segmentation result as input, generates semantic feature information corresponding to the image features from the semantic segmentation result, and semantically enhances the image feature map. The semantic fusion layer is composed of convolutional layers: the first convolutional layer and the subsequent activation layer are parameter-shared, and the last two convolutional layers take the activation output as input to produce the transformed semantic feature information.
The discriminator is composed of a series of convolutions; the first branch at its back end discriminates real from fake images, the second branch predicts the input image as one of eight categories, and the two branches share all preceding parameters.
The combined loss function is:
L_total = α·L_reconstruction + β·L_perception + γ·L_gradient-flat + δ·L_gradient-edge + ε·L_adversary + ζ·L_classification
wherein L_reconstruction is the L1-norm reconstruction loss, L_perception the perceptual loss, L_gradient-flat and L_gradient-edge the gradient losses over the gradual-change and abrupt-change regions respectively, L_adversary the generator adversarial loss, L_classification the cross-entropy classification loss, and α, β, γ, δ, ε, ζ the summation weights of the individual losses.
Reference to the literature
[1] Mantiuk R, Krawczyk G, Myszkowski K, et al. Perception-motivated high dynamic range video encoding [J]. ACM Transactions on Graphics, 2004, 23(3): 733-741.
[2] Zhao Y, Wang R, Jia W, et al. Deep reconstruction of least significant bits for bit-depth expansion [J]. IEEE Transactions on Image Processing, 2019, 28(6): 2847-2859.
[3] Ulichney R A, Cheung S. Pixel bit-depth increase by bit replication [C]. Color Imaging: Device-Independent Color, Color Hardcopy, and Graphic Arts III. International Society for Optics and Photonics, 1998, 3300: 232-241.
[4] Cheng C H, Au O C, Liu C H, et al. Bit-depth expansion by contour region reconstruction [C]. IEEE International Symposium on Circuits and Systems, 2009: 944-947.
[5] Wan P, Au O C, Tang K, et al. From 2D extrapolation to 1D interpolation: Content adaptive image bit-depth expansion [C]. IEEE International Conference on Multimedia and Expo, 2012: 170-175.
[6] Liu J, Zhai G, Liu A, et al. IPAD: Intensity potential for adaptive de-quantization [J]. IEEE Transactions on Image Processing, 2018, 27(10): 4860-4872.
[7] Liu J, Sun W, Liu Y. Bit-depth enhancement via convolutional neural network [C]. International Forum on Digital TV and Wireless Multimedia Communications. Springer, Singapore, 2017: 255-264.
[8] Liu J, Sun W, Su Y, et al. BE-CALF: Bit-depth enhancement by concatenating all level features of DNN [J]. IEEE Transactions on Image Processing, 2019, 28(10): 4926-4940.
[9] Ledig C, Theis L, Huszar F, et al. Photo-realistic single image super-resolution using a generative adversarial network [C]. IEEE Conference on Computer Vision and Pattern Recognition, 2017: 4681-4690.
[10] Wang X, Yu K, Dong C, et al. Recovering realistic texture in image super-resolution by deep spatial feature transform [C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018: 606-615.
[11] Johnson J, Alahi A, Li F. Perceptual losses for real-time style transfer and super-resolution [C]. European Conference on Computer Vision, 2016: 694-711.
[12] Agustsson E, Timofte R. NTIRE 2017 Challenge on single image super-resolution: Dataset and study [C]. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2017.
[13] Bychkovsky V, Paris S, Chan E, et al. Learning photographic global tonal adjustment with a database of input/output image pairs [C]. IEEE Conference on Computer Vision and Pattern Recognition, 2011.
[14] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition [C]. IEEE Conference on Computer Vision and Pattern Recognition, 2016: 770-778.
[15] Kingma D P, Ba J. Adam: A method for stochastic optimization [J]. arXiv preprint arXiv:1412.6980, 2014.
[16] Wang Z, Bovik A C, Sheikh H R, et al. Image quality assessment: From error visibility to structural similarity [J]. IEEE Transactions on Image Processing, 2004, 13(4): 600-612.
In the embodiment of the present invention, except for the specific description of the model of each device, the model of other devices is not limited, as long as the device can perform the above functions.
Those skilled in the art will appreciate that the drawings are only schematic illustrations of preferred embodiments, and the above-described embodiments of the present invention are merely provided for description and do not represent the merits of the embodiments.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (9)

1. An image bit enhancement method based on a semantic fusion generative adversarial network, the method comprising:
1) the generator takes a residual structure as its framework, adds a global skip connection from the input image to the output together with a semantic fusion layer, takes the zero-padded high-bit image and the semantic segmentation result as input, and outputs a reconstructed high-bit image;
the discriminator is based on a VGG network and has an image real/fake prediction branch and an image semantic category prediction branch; it takes a reconstructed high-bit image or a real high-bit image as input and outputs a prediction of whether the input is a reconstructed image and a prediction of the semantic category to which it belongs;
2) taking the zero-padded high-bit image and the semantic segmentation result obtained by preprocessing the training set as input data of the generator, taking the reconstructed high-bit or real high-bit image as input data of the discriminator, and alternately training the model parameters of the generator network and the discriminator network with an Adam optimizer, using a combined loss function that includes the gradient loss over gradual-change and abrupt-change regions;
3) inputting the zero-padded high-bit image and the semantic segmentation result obtained by preprocessing the test set into the generator network loaded with the trained model parameters to obtain a reconstructed high-bit image.
2. The image bit enhancement method according to claim 1, wherein before step 1) the method further comprises preprocessing the images in the training set, specifically:
quantizing the real high-bit image to obtain a low-bit image, and zero-padding the bits of the low-bit image to obtain the zero-padded high-bit image; and obtaining the semantic segmentation result and the gradual-change region detection result from the real high-bit image.
3. The image bit enhancement method according to claim 1, wherein the preprocessing of the test set specifically comprises:
quantizing the real high-bit image into a low-bit image, and zero-padding the bits of the low-bit image to obtain the zero-padded high-bit image; and obtaining the semantic segmentation result from the zero-padded high-bit image.
4. The image bit enhancement method according to claim 1, wherein the generator is a semantic fusion generator based on a residual network structure, composed of four residual groups; each residual group comprises one semantic fusion residual block and several universal residual blocks, with skip connections between adjacent residual blocks;
the universal residual block is composed of two convolutional layers using 3 × 3 convolutions and one ReLU activation layer.
5. The image bit enhancement method according to claim 1, wherein the semantic fusion layer takes the image features and the semantic segmentation result as input, generates semantic feature information corresponding to the image features from the semantic segmentation result, and semantically enhances the image feature map.
6. The image bit enhancement method according to claim 1 or 5, wherein the semantic fusion layer is composed of convolutional layers: the first convolutional layer and the subsequent activation layer are parameter-shared, and the last two convolutional layers take the activation output as input to produce the transformed semantic feature information.
7. The image bit enhancement method according to claim 1 or 5, wherein the discriminator is composed of a series of convolutions; the first branch at its back end discriminates real from fake images, the second branch predicts the input image as one of eight categories, and the two branches share all preceding parameters.
8. The image bit enhancement method according to claim 1 or 5, wherein the combined loss function is:
L_total = α·L_reconstruction + β·L_perception + γ·L_gradient-flat + δ·L_gradient-edge + ε·L_adversary + ζ·L_classification
wherein L_reconstruction is the L1-norm reconstruction loss, L_perception the perceptual loss, L_gradient-flat and L_gradient-edge the gradient losses over the gradual-change and abrupt-change regions respectively, L_adversary the generator adversarial loss, L_classification the cross-entropy classification loss, and α, β, γ, δ, ε, ζ the summation weights of the individual losses.
9. An image bit enhancement apparatus based on a semantic fusion generative adversarial network, the apparatus comprising:
a processor and a memory, the memory having stored therein program instructions, the processor calling the program instructions stored in the memory to cause the apparatus to perform the method steps of any of claims 1-8.
CN202210326308.XA 2022-03-30 2022-03-30 Image bit enhancement method and device for generating countermeasure network based on semantic fusion Active CN114663315B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210326308.XA CN114663315B (en) 2022-03-30 2022-03-30 Image bit enhancement method and device for generating countermeasure network based on semantic fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210326308.XA CN114663315B (en) 2022-03-30 2022-03-30 Image bit enhancement method and device for generating countermeasure network based on semantic fusion

Publications (2)

Publication Number Publication Date
CN114663315A true CN114663315A (en) 2022-06-24
CN114663315B CN114663315B (en) 2022-11-22

Family

ID=82032608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210326308.XA Active CN114663315B (en) 2022-03-30 2022-03-30 Image bit enhancement method and device for generating countermeasure network based on semantic fusion

Country Status (1)

Country Link
CN (1) CN114663315B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100334A (en) * 2022-08-24 2022-09-23 广州极尚网络技术有限公司 Image edge drawing and animation method, device and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160156917A1 (en) * 2013-07-11 2016-06-02 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
CN110796622A (en) * 2019-10-30 2020-02-14 天津大学 Image bit enhancement method based on multi-layer characteristics of series neural network
CN111008938A (en) * 2019-11-25 2020-04-14 天津大学 Real-time multi-frame bit enhancement method based on content and continuity guidance
CN111681192A (en) * 2020-06-09 2020-09-18 天津大学 Bit depth enhancement method for generating countermeasure network based on residual image condition

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160156917A1 (en) * 2013-07-11 2016-06-02 Nokia Technologies Oy An apparatus, a method and a computer program for video coding and decoding
CN110796622A (en) * 2019-10-30 2020-02-14 天津大学 Image bit enhancement method based on multi-layer characteristics of series neural network
CN111008938A (en) * 2019-11-25 2020-04-14 天津大学 Real-time multi-frame bit enhancement method based on content and continuity guidance
CN111681192A (en) * 2020-06-09 2020-09-18 天津大学 Bit depth enhancement method for generating countermeasure network based on residual image condition

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHANGMENG PENG ET AL.: "CNN-Based Suppression of False Contour and Color Distortion in Bit-Depth Enhancement", Sensors
JING ZHANG ET AL.: "BE-ACGAN: Photo-realistic residual bit-depth enhancement by advanced conditional GAN", Displays
LIU Pingping: "Research on Video Bit-Depth Enhancement Algorithms Based on Convolutional Neural Networks", China Master's Theses Full-text Database, Information Science and Technology

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115100334A (en) * 2022-08-24 2022-09-23 广州极尚网络技术有限公司 Image edge drawing and animation method, device and storage medium

Also Published As

Publication number Publication date
CN114663315B (en) 2022-11-22

Similar Documents

Publication Publication Date Title
CN113673307B (en) Lightweight video action recognition method
CN110363716B (en) High-quality reconstruction method for generating confrontation network composite degraded image based on conditions
CN111028308B (en) Steganography and reading method for information in image
CN112767251B (en) Image super-resolution method based on multi-scale detail feature fusion neural network
Zhou et al. Omnidirectional image quality assessment by distortion discrimination assisted multi-stream network
CN110689599B (en) 3D visual saliency prediction method based on non-local enhancement generation countermeasure network
Liu et al. BE-CALF: Bit-depth enhancement by concatenating all level features of DNN
CN116205962B (en) Monocular depth estimation method and system based on complete context information
CN115565056A (en) Underwater image enhancement method and system based on condition generation countermeasure network
CN114663315B (en) Image bit enhancement method and device for generating countermeasure network based on semantic fusion
CN111768326A (en) High-capacity data protection method based on GAN amplification image foreground object
CN113379606B (en) Face super-resolution method based on pre-training generation model
Li et al. MANet: Multi-scale aggregated network for light field depth estimation
Khan et al. Sparse to dense depth completion using a generative adversarial network with intelligent sampling strategies
CN115131229A (en) Image noise reduction and filtering data processing method and device and computer equipment
CN114529793A (en) Depth image restoration system and method based on gating cycle feature fusion
Kan et al. A GAN-based input-size flexibility model for single image dehazing
CN113570573A (en) Pulmonary nodule false positive eliminating method, system and equipment based on mixed attention mechanism
CN111681192B (en) Bit depth enhancement method for generating countermeasure network based on residual image condition
CN117151990B (en) Image defogging method based on self-attention coding and decoding
CN113096032A (en) Non-uniform blur removing method based on image area division
Han Texture image compression algorithm based on self-organizing neural network
CN116542924A (en) Prostate focus area detection method, device and storage medium
CN116051593A (en) Clothing image extraction method and device, equipment, medium and product thereof
Jiang et al. LatentMap: effective auto-encoding of density maps for spatiotemporal data visualizations

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant