CN113222953A - Natural image enhancement method based on depth gamma transformation - Google Patents
- Publication number
- CN113222953A (application CN202110557873.2A)
- Authority
- CN
- China
- Prior art keywords
- layer
- activation function
- convolution
- input
- network
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G06T5/90—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses a natural image enhancement method based on depth gamma transformation, which mainly solves the problem that prior-art enhancement of photographed natural images yields poor contrast and detail texture. The implementation scheme is as follows: 1) acquire an existing data set and divide it into a training set and a test set; 2) construct a generation network G based on a depth gamma transformation network and an identification network D, and set an optimization objective function for each network; 3) alternately optimize the objective functions of the generation network G and the identification network D and update the network parameters until the set maximum number of iterations is reached, obtaining a trained generation network G; 4) input a low-quality natural image into the trained generation network G and output the enhanced high-quality natural image. The invention not only enhances the overall color and contrast of low-quality natural images but also enhances their detail and texture information, and can be widely used for image beautification or image preprocessing.
Description
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a natural image enhancement method which can be used for image beautification or image preprocessing.
Background
With the rapid development of the information age, images are vital to information acquisition, and high-quality image information in particular is receiving more and more attention. The imaging process is affected by hardware, object motion and ambient light: when the ambient light is dim, the captured image has low contrast and blurred detail; when an object moves, the image is often locally blurred, so that it cannot convey enough information or meet people's visual expectations on social media. Beyond visual quality, low-quality images and videos also impair the computer vision tasks that depend on them: surveillance cameras are now deployed everywhere, and images captured in dark environments strongly affect pedestrian recognition, all-day autonomous driving and biometric identification. Although software allows users to adjust images interactively, batch interactive processing is tedious and difficult, because natural image enhancement requires not only precise control of color, contrast and exposure, but also fine adjustment of the various objects and details in the image. In recent years, environment-adaptive imaging on mobile phone cameras has attracted growing attention, requiring phones to adapt images to the scene and lighting to meet users' visual needs. The degradation of visible image quality therefore seriously affects daily life and scientific research, so research on adaptive natural image enhancement technology has practical significance and value.
Natural image enhancement adjusts the overall color and tone of an image and processes its local details and textures, so that the generated image clearly conveys the intended information and achieves a pleasing visual effect. Doing this manually demands great skill from the photographer and consumes much time, so adaptive image enhancement is worth studying. Current natural image enhancement algorithms can be divided into traditional algorithms and deep learning algorithms. Traditional methods include histogram equalization for image contrast, gamma transformation for contrast adjustment through parameters, and filtering to remove high-frequency or low-frequency image information. However, traditional methods handle image detail and local information poorly, and each image must be enhanced with suitably chosen parameters and method combinations. Enhancement methods based on deep learning address these defects better: they adaptively learn the mapping from a low-quality input image to the target enhanced image, so that the enhanced image satisfies visual requirements in detail, texture and color as far as possible. In recent years, with improvements in computer hardware and growth in data volume, machine learning has developed vigorously; it learns target conversion functions and features from large data sets and predicts new data. As a branch of machine learning, deep learning theory is widely used for many processing tasks owing to its superior learning ability.
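As a concrete illustration of the traditional gamma transformation mentioned above, the following sketch applies the power-law mapping s = a · r^γ to an image normalized to [0, 1]; the function name and normalization convention are illustrative assumptions, not the patent's notation:

```python
import numpy as np

def gamma_transform(image, gamma, a=1.0):
    """Classic power-law (gamma) transform s = a * r**gamma on a [0, 1] image."""
    image = np.clip(image, 0.0, 1.0)
    return np.clip(a * np.power(image, gamma), 0.0, 1.0)

# gamma < 1 brightens dark regions, gamma > 1 darkens them
dark = np.full((2, 2), 0.25)
brightened = gamma_transform(dark, gamma=0.5)   # 0.25**0.5 = 0.5
darkened = gamma_transform(dark, gamma=2.0)     # 0.25**2   = 0.0625
```

Fixed global parameters like these are exactly what the traditional method requires a user to hand-tune per image, which motivates learning them adaptively.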
The rapid development of the internet and big data provides a large data foundation for deep learning, which learns rich mappings over a data set through feature extraction networks and nonlinear layers and uses these mappings to predict unknown data well. Natural image enhancement based on deep learning has thus become a popular research direction, and existing algorithms can be divided into: natural image enhancement based on an end-to-end structure, natural image enhancement based on semi-supervision, and dark-light natural image enhancement based on the Retinex decomposition theory.
Deep-learning-based natural image enhancement methods differ mainly in the design of their end-to-end networks and discriminators.
Ignatov A et al., in the paper "DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks", use a neural network to learn end-to-end the mapping from a common low-quality image to a DSLR-quality image. The method constructs a natural image data set, constrains the color of the generated image with a color loss for the first time, and combines a generative adversarial model so that the color of the generated image approaches that of the target enhanced image. The method has the following defects: the generation network model is relatively simple, the features of the input image cannot be fully extracted, and the enhanced image still needs improvement in details and textures.
Jiang Y et al., in the paper "EnlightenGAN: Deep Light Enhancement without Paired Supervision", use a generative adversarial network model to learn end-to-end the mapping from low-quality to high-quality images. The method distinguishes generated images by constructing a global discriminator and a local discriminator, and supervises the luminance of the generated image with luminance information, making the image closer to the target enhanced image in detail and luminance. The method has the following defects: the modeling of image enhancement is simple, only an end-to-end network structure is used to learn the data set mapping, the optimization target is set simply, and the visual effect of the enhanced image is poor.
Disclosure of Invention
The invention aims to provide a natural image enhancement method based on depth gamma transformation that addresses the defects of existing natural image enhancement methods. Gamma transformation parameters are learned adaptively by a network, and the input low-quality natural image is enhanced both globally and locally; meanwhile, by improving the optimization objective function of the model, the details and textures of the image are enhanced along with its contrast, improving the visual effect.
In order to achieve this purpose, the technical scheme of the invention comprises the following steps:

1) acquiring the existing paired natural image data set MIT-Adobe and dividing it into a training set and a test set: randomly selecting 500 pairs of images from the data set as the test set, and taking the remaining 4500 pairs as the training set;

2) constructing a generation network G consisting, in sequence, of a global feature extraction module, a local feature extraction module and a fusion module, whose objective optimization function L_G is jointly composed of the pixel loss L_MSE, the content loss L_content and the adversarial loss A of the generation network;

3) constructing an identification network D consisting of a convolution module and a fully-connected module, with the adversarial loss L_D as its optimization function;

4) updating the parameters of the generation network G and the identification network D by alternately optimizing the objective functions of the two networks until the set number of training iterations is reached, obtaining a trained generation network G;

5) inputting a low-quality natural image of any size into the trained generation network G, and outputting the enhanced high-quality natural image through the forward propagation calculation of the generation network.
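The alternating optimization in the scheme above can be sketched as a generic training loop; the one-D-update-then-one-G-update schedule per iteration is an assumption, since the text only specifies that the two objective functions are optimized alternately:

```python
def train_gan(d_step, g_step, max_iters):
    """Alternately optimize D then G each iteration until the budget is spent."""
    history = []
    for _ in range(max_iters):
        d_loss = d_step()  # optimize L_D and update D's parameters
        g_loss = g_step()  # optimize L_G and update G's parameters
        history.append((d_loss, g_loss))
    return history

# toy stand-ins for the real per-network optimization steps
losses = iter(range(100))
hist = train_gan(lambda: next(losses), lambda: next(losses), max_iters=3)
```

In a real implementation each step would run a forward pass, compute the corresponding loss and back-propagate, while the other network's parameters are held fixed.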
Compared with the prior art, the invention has the following advantages:
First, the natural image enhancement method based on depth gamma transformation performs global and local enhancement of the input low-quality natural image by adaptively learning the gamma transformation parameters through a network, overcoming the need in the prior art to adjust model parameters manually. Compared with the traditional gamma transformation algorithm, it not only adaptively learns the gamma transformation parameters of each natural image, but also better enhances the local information of the image through the network model.
Second, the natural image enhancement model is constructed from the generation network G and the identification network D, and the parameters of the two networks are updated alternately through their optimization functions, so that the trained generation network G improves details and textures when enhancing images.
Drawings
FIG. 1 is a flow chart of an implementation of the present invention;
FIG. 2 is a block diagram of a generation network constructed in the present invention;
FIG. 3 is a block diagram of the global feature extraction module in the generation network;
FIG. 4 is a block diagram of the local feature extraction module in the generation network;
FIG. 5 is a block diagram of the fusion module in the generation network;
FIG. 6 is a block diagram of the identification network constructed in the present invention;
FIG. 7 is a graph of simulation results of the present invention.
Detailed Description
Embodiments and effects of the present invention will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, specific implementations of this example include the following:
step 1, a training sample set is obtained.
Acquire the existing paired natural image data set MIT-Adobe from the network. The data set contains 5000 pairs of images covering abundant natural scenes; each pair comprises an originally shot low-quality image and a corresponding high-quality image retouched by an expert;
500 pairs of images in the data set were randomly selected as the test set, and 4500 pairs of images were selected as the training set.
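A minimal sketch of the random 500/4500 split described above; the seed and the file-name pair representation are illustrative assumptions:

```python
import random

def split_dataset(pairs, test_size=500, seed=0):
    """Randomly hold out `test_size` image pairs for testing; the rest train."""
    rng = random.Random(seed)
    indices = list(range(len(pairs)))
    rng.shuffle(indices)
    test = [pairs[i] for i in indices[:test_size]]
    train = [pairs[i] for i in indices[test_size:]]
    return train, test

# hypothetical file names standing in for the 5000 MIT-Adobe pairs
pairs = [(f"low_{i}.png", f"high_{i}.png") for i in range(5000)]
train, test = split_dataset(pairs)
```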
Step 2, construct a generation network G and an identification network D respectively, and set their optimization objective functions.
2.1) Construct the generation network G and its objective optimization function L_G:
Referring to fig. 2, the example generation network G is composed of a global feature extraction module, a local feature extraction module, and a fusion module in sequence, and the structures and parameters of the modules are as follows:
The structure of the global feature extraction module is shown in fig. 3: input layer → first convolution layer → first activation function layer → second convolution layer → first normalization layer → second activation function layer → first down-sampling layer → third convolution layer → second normalization layer → third activation function layer → fourth convolution layer → third normalization layer → fourth activation function layer → second down-sampling layer → fifth convolution layer → fourth normalization layer → fifth activation function layer → sixth convolution layer → fifth normalization layer → sixth activation function layer → third down-sampling layer → seventh convolution layer → sixth normalization layer → seventh activation function layer → eighth convolution layer → seventh normalization layer → eighth activation function layer → fourth down-sampling layer → ninth convolution layer → eighth normalization layer → ninth activation function layer → tenth convolution layer → ninth normalization layer → tenth activation function layer → global average pooling layer → first fully-connected layer → eleventh activation function layer → second fully-connected layer → twelfth activation function layer → output layer, which outputs ten different groups of gamma parameters a and γ;
the parameters of each layer are as follows: the input layer inputs low-quality natural images; the number of input channels of the first convolutional layer is 3, and the number of output channels is 32; the input channel number of the second convolution layer is 32, and the output channel number is 32; the input channel number of the third convolution layer is 32, and the output channel number is 64; the input channel number of the fourth convolution layer is 64, and the output channel number is 64; the input channel number of the fifth convolutional layer is 64, and the output channel number is 128; the number of input channels of the sixth convolutional layer is 128, and the number of output channels is 128; the input channel number of the seventh convolutional layer is 128, and the output channel number is 256; the input channel number of the eighth convolutional layer is 256, and the output channel number is 256; the input channel number of the ninth convolutional layer is 256, and the output channel number is 512; the input channel number of the tenth convolutional layer is 512, and the output channel number is 512; the convolution kernel sizes of all convolution layers are set to be 3 multiplied by 3, and the convolution step length is set to be 1; the input of the first full-connection layer is 512 vectors, and the output is 256 vectors; the input of the second full-connection layer is 256 vectors, and the output is 60 vectors; the normalization layers all use BN normalization functions; the activation functions of the first to tenth activation function layers use LeakyRelu; the eleventh and twelfth activation functions use Sigmoid; the first sampling layer to the fourth sampling layer carry out 2 times of down-sampling operation on the input characteristic layer; the global average pooling layer normalizes each channel average of input features to a value.
The structure of the local feature extraction module is shown in fig. 4: input layer → 1st convolution layer → 1st activation function layer → 2nd convolution layer → 1st normalization layer → 2nd activation function layer → 3rd convolution layer → 2nd normalization layer → 3rd activation function layer → 4th convolution layer → 3rd normalization layer → 4th activation function layer → 5th convolution layer → 4th normalization layer → 5th activation function layer → 1st channel connection layer → 6th convolution layer → 5th normalization layer → 6th activation function layer → 7th convolution layer → 6th normalization layer → 7th activation function layer → 8th convolution layer → 7th normalization layer → 8th activation function layer → 9th convolution layer → 8th normalization layer → 9th activation function layer → 10th convolution layer → 9th normalization layer → 10th activation function layer → 2nd channel connection layer → 11th convolution layer → 10th normalization layer → 11th activation function layer → 12th convolution layer → 11th normalization layer → 12th activation function layer → 13th convolution layer → 12th normalization layer → 13th activation function layer → 14th convolution layer → 13th normalization layer → 14th activation function layer → 15th convolution layer → 14th normalization layer → 15th activation function layer → 3rd channel connection layer → 16th convolution layer → 15th normalization layer → 16th activation function layer → 17th convolution layer → output layer, which outputs ten different groups of residual features b;
The parameters of each layer are as follows: the input layer receives the low-quality natural image; the 1st convolution layer has 3 input channels and 64 output channels; the 6th, 11th and 16th convolution layers have 320 input channels and 64 output channels; the 17th convolution layer has 64 input channels and 60 output channels; the remaining convolution layers have 64 input and 64 output channels; the input feature of the 7th convolution layer is the output of the 6th activation function layer added to the output of the 1st activation function layer, the input feature of the 12th convolution layer is the output of the 11th activation function layer added to the input feature of the 7th convolution layer, and the input feature of the 17th convolution layer is the output of the 16th activation function layer added to the input feature of the 12th convolution layer; the convolution kernels of all convolution layers are 3×3 with stride 1; all activation function layers use LeakyReLU; all normalization layers use the BN function; each channel connection layer concatenates the output features of its preceding five activation function layers in the channel domain.
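The 320 input channels of the 6th, 11th and 16th convolution layers follow directly from the channel connection layers: concatenating the outputs of five preceding 64-channel activation layers yields 5 × 64 = 320 channels, as this small check (channel-first layout assumed) shows:

```python
import numpy as np

# Each channel connection layer concatenates the outputs of the five
# preceding 64-channel activation layers along the channel axis, so the
# 6th/11th/16th convolution layers see 5 * 64 = 320 input channels.
h, w = 8, 8
activation_outputs = [np.zeros((64, h, w)) for _ in range(5)]   # (C, H, W) maps
concatenated = np.concatenate(activation_outputs, axis=0)
```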
The structure of the fusion module is shown in fig. 5: input layer → I-th convolution layer → I-th normalization layer → I-th activation function layer → II-th convolution layer → II-th normalization layer → II-th activation function layer → III-th convolution layer → III-th normalization layer → III-th activation function layer → IV-th convolution layer → IV-th normalization layer → IV-th activation function layer → V-th convolution layer → V-th normalization layer → V-th activation function layer → channel connection layer → VI-th convolution layer → VI-th normalization layer → VI-th activation function layer → VII-th convolution layer → VII-th normalization layer → VII-th activation function layer → VIII-th convolution layer → VIII-th activation function layer → output layer, which outputs the image enhanced by the generation network G;
The parameters of each layer are as follows: the input layer receives the image features enhanced by the local and global feature extraction modules; the I-th convolution layer has 60 input channels and 64 output channels; the II-th to V-th convolution layers have 64 input and 64 output channels; the VI-th convolution layer has 320 input channels and 64 output channels; the VII-th convolution layer has 64 input and 64 output channels; the VIII-th convolution layer has 64 input channels and 3 output channels; the convolution kernels of the I-th to VI-th convolution layers are 3×3 with stride 1; those of the VII-th and VIII-th convolution layers are 5×5 with stride 1; the input feature of the VII-th convolution layer is the output feature of the I-th activation function layer added to the output feature of the VI-th activation function layer; all normalization layers use the BN function; the I-th to VI-th activation function layers use LeakyReLU; the VII-th and VIII-th activation function layers use Sigmoid; the channel connection layer concatenates the output features of its preceding five activation function layers in the channel domain.
The objective optimization function L_G of the generation network G is composed of the pixel loss L_MSE, the content loss L_content and the adversarial loss A, expressed as follows:

L_G = 4 · L_MSE + L_content + A

A = -E_{I_real}[log(1 - D_Ra(I_real, I_fake))] - E_{I_fake}[log(D_Ra(I_fake, I_real))]

where I_real is the target enhanced image, I_fake is the enhanced image output by the generation network, i denotes a pixel of the image, f_vgg is the feature extraction function of the VGG16 network, ||·||_2 denotes the L2 norm, H and W are respectively the height and width of the input feature, E denotes averaging over all the data it contains, and D_Ra(I_real, I_fake) and D_Ra(I_fake, I_real) are intermediate variables defined as follows:

D_Ra(I_real, I_fake) = σ(D(I_real) - E[D(I_fake)])

D_Ra(I_fake, I_real) = σ(D(I_fake) - E[D(I_real)])

where D(·) denotes the identification network function and σ denotes the Sigmoid activation function.
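The relativistic average terms and the adversarial loss A can be sketched numerically as follows, with σ applied to the full difference as in the relativistic average GAN formulation; the raw discriminator scores are made-up inputs, and the small epsilon inside the logarithms is an assumption added for numerical safety:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_ra(scores_a, scores_b):
    """Relativistic average term: sigma(D(a) - E[D(b)])."""
    return sigmoid(scores_a - scores_b.mean())

def adversarial_loss_g(d_real, d_fake, eps=1e-12):
    """A = -E[log(1 - D_Ra(real, fake))] - E[log(D_Ra(fake, real))]."""
    return (-np.log(1.0 - d_ra(d_real, d_fake) + eps).mean()
            - np.log(d_ra(d_fake, d_real) + eps).mean())

d_real = np.array([2.0, 1.5, 2.5])    # made-up raw discriminator scores
d_fake = np.array([-1.0, -0.5, -1.5])
loss = adversarial_loss_g(d_real, d_fake)
```

When the discriminator confidently rates real images above fakes, as in this example, A is large, pushing the generator to make I_fake indistinguishable from I_real on average.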
2.2) Construct the identification network D and its optimization function L_D:
Referring to fig. 6, the identification network D consists of a convolution module and a fully-connected module, whose structures and parameters are as follows:
the structure of the convolution module is as follows: an input layer → an a-th convolution layer → an a-th activation function layer → an a-th normalization layer → a b-th convolution layer → a b-th activation function layer → a b-th normalization layer → a c-th convolution layer → a c-th normalization layer → a d-th convolution layer → a d-th activation function layer → a d-th normalization layer → an e-th convolution layer → an e-th activation function layer → an e-th normalization layer → an f-th convolution layer → an f-th activation function layer → an f-th normalization layer → a g-th convolution layer → a g-th activation function layer → a g-th normalization layer → an h-th convolution layer → an h-th activation function layer → a Reshape conversion layer → an output layer, which outputs a two-dimensional feature vector of behavior 1;
the parameters of each layer are as follows: the input layer is the image input to the discrimination network; the number of input channels of the a-th convolution layer is 3, and the number of output channels is 32; the number of input channels of the b-th convolution layer is 32, and the number of output channels is 32; the number of input channels of the c-th convolution layer is 32, and the number of output channels is 64; the number of input channels of the d-th convolution layer is 64, and the number of output channels is 64; the number of input channels of the e-th convolution layer is 64, and the number of output channels is 128; the number of input channels of the f-th convolution layer is 128, and the number of output channels is 128; the number of input channels of the g-th convolution layer is 128, and the number of output channels is 256; the number of input channels of the h-th convolution layer is 256, and the number of output channels is 256; the convolution kernel sizes of the a-th to h-th convolution layers are all set to 3 × 3, and the convolution step sizes are all set to 1; the normalization layers all use BN normalization functions; the activation functions of the a-th to h-th activation function layers use LeakyRelu; the Reshape conversion layer converts the output features of the h-th activation function layer into a two-dimensional feature vector with 1 row.
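As a small illustrative check of the layer bookkeeping above (a sketch; the "same"-padding assumption is ours, since the text only states kernel size 3 × 3 and stride 1), the channel counts chain correctly and the Reshape layer's 1-row vector has length H·W·256:

```python
# (in, out) channel pairs of the eight convolution layers a..h,
# copied from the parameters listed above.
CONV_CHANNELS = [(3, 32), (32, 32), (32, 64), (64, 64),
                 (64, 128), (128, 128), (128, 256), (256, 256)]

def reshape_vector_length(height, width):
    """Length of the 1-row vector produced by the Reshape layer.

    Assumes 'same' padding: with 3x3 kernels and stride 1 the spatial
    size is unchanged, so only the final channel count (256) matters.
    """
    for (_, prev_out), (cur_in, _) in zip(CONV_CHANNELS, CONV_CHANNELS[1:]):
        # each layer's input channels must match the previous layer's output
        assert cur_in == prev_out
    return height * width * CONV_CHANNELS[-1][1]
```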
The structure of the fully-connected module is as follows: an input layer → the a-th fully-connected layer → the i-th activation function layer → the b-th fully-connected layer → the j-th activation function layer → an output layer; the output is a probability value used to judge the probability that the input image is real;
the parameters of each layer are as follows: the input layer is the two-dimensional feature vector output by the convolution module in the discrimination network D; the activation function of the i-th activation function layer uses LeakyRelu; the j-th activation function layer uses Sigmoid; the input of the a-th fully-connected layer is the feature vector output by the Reshape conversion layer, and the output is a 512-dimensional vector; the input of the b-th fully-connected layer is a 512-dimensional vector, and the output is a 1-dimensional vector.
The optimization function L_D of the discrimination network D is set as follows:
L_D = -E_{I_real}[log(D_Ra(I_real, I_fake))] - E_{I_fake}[log(1 - D_Ra(I_fake, I_real))]
wherein I_real is the target enhanced image, I_fake is the enhanced image output by the generation network, E denotes the averaging operation over all the data it contains, and D_Ra(I_real, I_fake) and D_Ra(I_fake, I_real) are intermediate variables expressed as follows:
D_Ra(I_real, I_fake) = σ(D(I_real)) - E[D(I_fake)]
D_Ra(I_fake, I_real) = σ(D(I_fake)) - E[D(I_real)]
where D(·) denotes the discrimination network function and σ denotes the Sigmoid activation function.
Step 3: alternately optimize the objective functions of the generation network G and the discrimination network D, updating the network parameters to obtain the trained generation network G.
3.1) Fix the parameters of the generation network G and update the parameters of the discrimination network D:
(3.1.1) Randomly select an input training sample I_1^input from the training sample set, input I_1^input into the generation network G, and take the output G(I_1^input) of the generation network G as a synthetically generated sample;
(3.1.2) Select from the training sample set the target enhancement sample I_1^label corresponding to the input training sample I_1^input, input it and the synthesized sample G(I_1^input) into the discrimination network D respectively to calculate the optimization function L_D, minimize the function value with the Adam optimization algorithm, and update the parameters θ_D of each layer of the discrimination network D;
3.2) Fix the parameters of the discrimination network D and update the parameters of the generation network G:
(3.2.1) Randomly select an input training sample I_2^input from the input training sample set, input it into the generation network G, and compute the final output G(I_2^input), i.e., a high-quality natural image enhanced by the generation network G; take G(I_2^input) as the synthetically generated sample;
(3.2.2) Select from the training sample set the target enhancement sample I_2^label corresponding to the input training sample I_2^input, input it together with the generated sample G(I_2^input) into the objective optimization function L_G to calculate the function value, minimize the function value with the Adam optimization algorithm, and update the parameters of each layer of the generation network G;
(3.2.3) Judge whether the number of parameter updates of both the generation network G and the discrimination network D has reached the set maximum number of iterations, 100000; if so, the trained generation network G is obtained; otherwise, return to step (3.1);
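The alternating scheme of steps 3.1 and 3.2 can be sketched as a control-flow skeleton (`update_d` and `update_g` are hypothetical placeholders standing in for one Adam step on L_D and L_G respectively; the real steps feed samples through the networks):

```python
def train_gan(update_d, update_g, max_iters=100000):
    """Alternating optimization of discriminator D and generator G.

    Each iteration performs one D update with G fixed (step 3.1),
    then one G update with D fixed (step 3.2).
    """
    for step in range(max_iters):
        update_d(step)   # step 3.1: fix G, minimize L_D, update theta_D
        update_g(step)   # step 3.2: fix D, minimize L_G, update G
```

The strict D-then-G interleaving per iteration is what "alternately optimizing" refers to here.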
Step 4: input a low-quality natural image of arbitrary size into the trained generation network G, and output the enhanced high-quality natural image through the forward-propagation calculation of the generation network G.
(4.1) Take the low-quality natural image x as the input of the global feature extraction module, and obtain by calculation ten different sets of the two gamma parameters a_i and γ_i; then perform the following global transformation:
y_i = a_i * x^γ_i
wherein y_i represents the global enhanced image produced by the global feature extraction module, and i takes values from 1 to 10;
(4.2) Take the low-quality natural image x as the input of the local feature extraction module, and obtain by calculation ten different sets of residual features b_i; then perform the following local enhancement transformation:
y_i' = y_i + b_i
wherein y_i' represents the locally enhanced image, enhanced jointly by the global feature extraction module and the local feature extraction module;
(4.3) Take the ten sets of locally enhanced images y_i' together as the input of the feature fusion module, and obtain by calculation the final output image, i.e., the high-quality natural image enhanced by the generation network G.
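A minimal numerical sketch of steps (4.1) and (4.2) on toy data (pure Python; it assumes the global transform takes the standard gamma-correction form y_i = a_i · x^γ_i with the ten parameter sets, and leaves the learned fusion module of step (4.3) abstract):

```python
def global_gamma_enhance(x, a, gamma):
    """Apply each of the ten (a_i, gamma_i) sets to image x.

    Assumes the standard gamma-correction form y_i = a_i * x ** gamma_i,
    with x a nested list of pixel values scaled to [0, 1].
    """
    return [[[a_i * (p ** g_i) for p in row] for row in x]
            for a_i, g_i in zip(a, gamma)]

def local_enhance(y_i, b_i):
    """y_i' = y_i + b_i: add the residual features elementwise."""
    return [[p + q for p, q in zip(row_y, row_b)]
            for row_y, row_b in zip(y_i, b_i)]
```

With γ_i < 1 the gamma curve brightens dark regions, with γ_i > 1 it darkens them; learning ten (a_i, γ_i) pairs lets the fusion module blend several such curves.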
The effect of the present invention will be further explained with the simulation experiment.
1. Simulation conditions are as follows:
the hardware environment of the simulation experiments is: an NVIDIA GTX 1080Ti GPU and 128 GB of RAM;
the software environment of the simulation experiments is: the deep learning framework TensorFlow 1.4.1;
in the simulation experiments, the peak signal-to-noise ratio PSNR and the structural similarity SSIM are adopted as objective quantitative evaluation indexes; the PSNR expression is as follows:
PSNR = 10 * log10( max² / ( (1/(W*H)) * Σ_i (I_label(i) - I_output(i))² ) )
wherein I_label is the target enhanced image, I_output is the image enhanced by the trained generation network G, W and H are the length and width of the image respectively, and max represents the maximum pixel value in the image;
the peak signal-to-noise ratio PSNR measures the error between corresponding pixel points of two images, in dB; the larger its value, the smaller the difference between the two evaluated images;
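A straightforward sketch of the PSNR computation described above (pure Python over flat pixel lists; the 8-bit assumption max = 255 is ours):

```python
import math

def psnr(label, output, max_val=255.0):
    """Peak signal-to-noise ratio in dB between the target image and
    the enhanced image (flat pixel lists); larger means closer images."""
    mse = sum((l - o) ** 2 for l, o in zip(label, output)) / len(label)
    if mse == 0:
        return float('inf')   # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)
```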
the structural similarity SSIM is expressed as:
SSIM = ((2*μ_x*μ_y + c_1) * (2*σ_xy + c_2)) / ((μ_x² + μ_y² + c_1) * (σ_x² + σ_y² + c_2))
wherein μ_x is the mean of the target enhanced image I_label, σ_x² is the variance of I_label, μ_y is the mean of the image I_output enhanced by the generation network G, σ_y² is the variance of I_output, σ_xy is the covariance of I_label and I_output, c_1 = (k_1*L)² and c_2 = (k_2*L)² are two constants of different values adopted to avoid a zero denominator in the structural similarity SSIM, L represents the dynamic range of the pixels, and the default constants are k_1 = 0.01 and k_2 = 0.03;
The value range of the structural similarity SSIM is [0, 1]; the larger the SSIM value, the smaller the loss of detail and the distortion of the image.
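A simplified sketch of the SSIM definition above (computed once over the whole image as a single window; real implementations average this statistic over local sliding windows, so this is a coarse approximation):

```python
def ssim_global(x, y, dynamic_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two images given as flat pixel lists."""
    n = len(x)
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((p - mu_x) ** 2 for p in x) / n
    var_y = sum((q - mu_y) ** 2 for q in y) / n
    cov = sum((p - mu_x) * (q - mu_y) for p, q in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
```

An image compared with itself scores exactly 1; a uniform brightness shift lowers the luminance term and hence the score.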
2. Simulation experiment content and result analysis:
Simulation experiment 1: the present invention and the existing convolution-network-based image enhancement method DPED are respectively used to enhance a real natural image; the results are shown in fig. 7, where fig. 7(a) is the real low-quality natural image used in the experiment; fig. 7(b) is the image enhanced by the existing natural image enhancement method DPED; fig. 7(c) is the image enhanced with the trained generation network G of the present method;
the existing natural image enhancement method DPED is derived from the article "DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks" (Proceedings of the IEEE International Conference on Computer Vision);
comparing fig. 7(b) and fig. 7(c) shows that the result of enhancing a real natural image with the existing DPED method lacks detail information and has low brightness and definition, whereas the result of the present method has more appropriate brightness, definition and contrast. This shows that the method not only overcomes the drawback that traditional gamma-transformation parameters are difficult to adjust, but also produces enhanced images with more detail information, more natural color and brightness, and an improved overall visual effect.
Simulation experiment 2: the existing trained DPED network, the trained CRN network, the trained FCN network and the trained generation network G of the present invention are each tested on the 500-pair test set of the MIT-Adobe natural image dataset;
the average values of the peak signal-to-noise ratio PSNR and the structural similarity SSIM were calculated, respectively, and the results are shown in table 1:
TABLE 1 Comparison of existing methods with the present invention
Method | PSNR | SSIM |
---|---|---|
DPED | 21.76 | 0.871 |
FCN | 20.66 | 0.849 |
CRN | 22.38 | 0.877 |
Method of the invention | 23.75 | 0.885 |
Wherein the existing natural image enhancement method FCN is derived from the article "Fast Image Processing with Fully-Convolutional Networks" (Proceedings of the IEEE International Conference on Computer Vision);
the existing natural image enhancement method CRN is derived from the article "Photographic Image Synthesis with Cascaded Refinement Networks" (Proceedings of the IEEE International Conference on Computer Vision).
As can be seen from Table 1, the PSNR and SSIM values of the present invention are the highest, indicating that the present invention achieves better enhancement results on low-quality natural images than the existing DPED, FCN and CRN methods.
Claims (10)
1. A natural image enhancement method based on depth gamma transformation comprises the following steps:
acquiring an existing paired natural image dataset MIT-Adobe, and dividing a training set and a testing set: randomly selecting 500 pairs of images in the data set as a test set, and taking 4500 pairs of images as a training set;
constructing a generation network G consisting in sequence of a global feature extraction module, a local feature extraction module and a fusion module, and jointly forming the objective optimization function L_G of the generation network from the pixel loss L_MSE, the content loss L_content and the adversarial loss A of the generation network;
constructing a discrimination network D consisting of a convolution module and a fully-connected module, with the adversarial loss L_D as its optimization function;
updating the network parameters of the generation network G and the discrimination network D by alternately optimizing the objective functions of the two networks until the set number of network training iterations is reached, to obtain a trained generation network G;
and inputting a low-quality natural image of arbitrary size into the updated generation network G, and outputting the enhanced high-quality natural image through the forward-propagation calculation of the generation network.
2. The method of claim 1, wherein: the global feature extraction module in the generation network G has the following structure and parameters:
the structure is as follows: the input layer → the first convolution layer → the first activation function layer → the second convolution layer → the first normalization layer → the second activation function layer → the first downsampling layer → the third convolution layer → the second normalization layer → the third activation function layer → the fourth convolution layer → the third normalization layer → the fourth activation function layer → the second downsampling layer → the fifth convolution layer → the fourth normalization layer → the fifth activation function layer → the sixth convolution layer → the fifth normalization layer → the sixth activation function layer → the third downsampling layer → the seventh convolution layer → the sixth normalization layer → the seventh activation function layer → the eighth convolution layer → the seventh normalization layer → the eighth activation function layer → the fourth downsampling layer → the ninth convolution layer → the eighth normalization layer → the ninth activation function layer → the tenth convolution layer → the ninth normalization layer → the tenth activation function layer → the global average pooling layer → the first fully-connected layer → the eleventh activation function layer → the second fully-connected layer → the twelfth activation function layer → an output layer, which outputs ten different sets of gamma parameters a and γ;
the parameters of each layer are as follows:
the input layer inputs low-quality natural images;
the number of input channels of the first convolutional layer is 3, and the number of output channels is 32;
the input channel number of the second convolution layer is 32, and the output channel number is 32;
the input channel number of the third convolution layer is 32, and the output channel number is 64;
the input channel number of the fourth convolution layer is 64, and the output channel number is 64;
the input channel number of the fifth convolutional layer is 64, and the output channel number is 128;
the number of input channels of the sixth convolutional layer is 128, and the number of output channels is 128;
the input channel number of the seventh convolutional layer is 128, and the output channel number is 256;
the input channel number of the eighth convolutional layer is 256, and the output channel number is 256;
the input channel number of the ninth convolutional layer is 256, and the output channel number is 512;
the input channel number of the tenth convolutional layer is 512, and the output channel number is 512;
the convolution kernel sizes of all convolution layers are set to 3 × 3, and the convolution step sizes are all set to 1;
the input of the first fully-connected layer is a 512-dimensional vector, and the output is a 256-dimensional vector;
the input of the second fully-connected layer is a 256-dimensional vector, and the output is a 60-dimensional vector;
the normalization layers all use BN normalization functions;
the activation functions of the first to tenth activation function layers use LeakyRelu;
the eleventh and twelfth activation functions use Sigmoid;
the first to fourth downsampling layers perform 2× downsampling on the input feature layer;
the global average pooling layer reduces each channel of the input features to a single average value.
3. The method of claim 1, wherein: the local feature extraction module in the generation network G has the following structure and parameters:
the structure is as follows: the input layer → the 1st convolution layer → the 1st activation function layer → the 2nd convolution layer → the 1st normalization layer → the 2nd activation function layer → the 3rd convolution layer → the 2nd normalization layer → the 3rd activation function layer → the 4th convolution layer → the 3rd normalization layer → the 4th activation function layer → the 5th convolution layer → the 4th normalization layer → the 5th activation function layer → the 1st channel connection layer → the 6th convolution layer → the 5th normalization layer → the 6th activation function layer → the 7th convolution layer → the 6th normalization layer → the 7th activation function layer → the 8th convolution layer → the 7th normalization layer → the 8th activation function layer → the 9th convolution layer → the 8th normalization layer → the 9th activation function layer → the 10th convolution layer → the 9th normalization layer → the 10th activation function layer → the 2nd channel connection layer → the 11th convolution layer → the 10th normalization layer → the 11th activation function layer → the 12th convolution layer → the 11th normalization layer → the 12th activation function layer → the 13th convolution layer → the 12th normalization layer → the 13th activation function layer → the 14th convolution layer → the 13th normalization layer → the 14th activation function layer → the 15th convolution layer → the 14th normalization layer → the 15th activation function layer → the 3rd channel connection layer → the 16th convolution layer → the 15th normalization layer → the 16th activation function layer → the 17th convolution layer → an output layer, which outputs ten different sets of residual features b;
the parameters of each layer are as follows:
the input layer inputs low-quality natural images;
the number of input channels of the 1 st convolution layer is 3, and the number of output channels is 64;
the number of input channels of the 6 th, 11 th and 16 th convolution layers is 320, and the number of output channels is 64;
the number of input channels of the 17 th convolution layer is 64 channels, and the number of output channels is 60;
the number of input channels of the other convolution layers is 64, and the number of output channels is 64;
the input feature of the 7th convolution layer is the output of the 6th activation function layer added to the output of the 1st activation function layer; the input feature of the 12th convolution layer is the output feature of the 11th activation function layer added to the input feature of the 7th convolution layer; the input feature of the 17th convolution layer is the output of the 16th activation function layer added to the input feature of the 12th convolution layer;
the convolution kernel sizes of all convolution layers are set to 3 × 3, and the convolution step sizes are all set to 1;
all activation function layers use LeakyRelu activation functions;
the normalization layers all use BN normalization functions;
each channel connection layer concatenates the output features of its preceding five activation function layers in the channel domain.
4. The method of claim 1, wherein: the structure and parameters of the feature fusion module in the generation network G are as follows:
the structure is as follows: an input layer → the I-th convolution layer → the I-th normalization layer → the I-th activation function layer → the II-th convolution layer → the II-th normalization layer → the II-th activation function layer → the III-th convolution layer → the III-th normalization layer → the III-th activation function layer → the IV-th convolution layer → the IV-th normalization layer → the IV-th activation function layer → the V-th convolution layer → the V-th normalization layer → the V-th activation function layer → a channel connection layer → the VI-th convolution layer → the VI-th normalization layer → the VI-th activation function layer → the VII-th convolution layer → the VII-th normalization layer → the VII-th activation function layer → the VIII-th convolution layer → the VIII-th activation function layer → an output layer, which outputs the image enhanced by the generation network G;
the parameters of each layer are as follows:
the input layer is the image characteristics enhanced by the local characteristic extraction module and the global characteristic extraction module;
the number of input channels of the first convolution layer is 60, and the number of output channels is 64;
the number of input channels from the second convolution layer to the fifth convolution layer is 64, and the number of output channels is 64;
the number of input channels of the VI convolution layer is 320, and the number of output channels is 64;
the number of input channels of the VII th convolution layer is 64, and the number of output channels is 64;
the number of input channels of the eighth convolutional layer is 64, and the number of output channels is 3;
the convolution kernel sizes of the I-th to VI-th convolution layers are all set to 3 × 3, and the convolution step sizes are all set to 1;
the convolution kernel sizes of the VII-th and VIII-th convolution layers are all set to 5 × 5, and the convolution step sizes are all set to 1;
the input feature of the VII-th convolution layer is the output feature of the I-th activation function layer added to the output feature of the VI-th activation function layer;
the normalization layers all use BN normalization functions;
the activation functions from the activation function layer I to the activation function layer VI use LeakyRelu;
the activation functions from the VII-th activation function layer to the VIII-th activation function layer use Sigmoid;
the channel connection layer concatenates the output characteristics of its first five activation function layers over the channel domain.
5. The method of claim 1, wherein: the objective optimization function L_G of the generation network G is formed jointly from the pixel loss L_MSE, the content loss L_content and the adversarial loss A, and is expressed as follows:
L_G = 4*L_MSE + L_content + A
L_MSE = (1/(W*H)) * Σ_i (I_real(i) - I_fake(i))²
L_content = (1/(W*H)) * ||f_vgg(I_real) - f_vgg(I_fake)||²
A = -E_{I_real}[log(1 - D_Ra(I_real, I_fake))] - E_{I_fake}[log(D_Ra(I_fake, I_real))]
wherein I_real is the target enhanced image, I_fake is the enhanced image output by the generation network, i indexes the pixel points of the image, f_vgg is the feature-extraction function of the VGG16 network, ||·|| denotes the L2 norm, H and W are the length and width of the input feature respectively, E denotes the averaging operation over all the data it contains, and D_Ra(I_real, I_fake) and D_Ra(I_fake, I_real) are intermediate variables expressed as follows:
D_Ra(I_real, I_fake) = σ(D(I_real)) - E[D(I_fake)]
D_Ra(I_fake, I_real) = σ(D(I_fake)) - E[D(I_real)]
where D(·) denotes the discrimination network function and σ denotes the Sigmoid activation function.
6. The method of claim 1, wherein: the convolution module structure and parameters in the discrimination network D are as follows:
the structure is as follows: an input layer → the a-th convolution layer → the a-th activation function layer → the a-th normalization layer → the b-th convolution layer → the b-th activation function layer → the b-th normalization layer → the c-th convolution layer → the c-th activation function layer → the c-th normalization layer → the d-th convolution layer → the d-th activation function layer → the d-th normalization layer → the e-th convolution layer → the e-th activation function layer → the e-th normalization layer → the f-th convolution layer → the f-th activation function layer → the f-th normalization layer → the g-th convolution layer → the g-th activation function layer → the g-th normalization layer → the h-th convolution layer → the h-th activation function layer → a Reshape conversion layer → an output layer, which outputs a two-dimensional feature vector with 1 row;
the parameters of each layer are as follows:
the input layer is an image input by the authentication network;
the number of input channels of the a-th convolution layer is 3, and the number of output channels is 32;
the number of input channels of the (b) th convolution layer is 32, and the number of output channels is 32;
the number of input channels of the c-th convolution layer is 32, and the number of output channels is 64;
the number of input channels of the d convolution layer is 64, and the number of output channels is 64;
the number of input channels of the e-th convolution layer is 64, and the number of output channels is 128;
the number of input channels of the f-th convolution layer is 128, and the number of output channels is 128;
the number of input channels of the g convolutional layer is 128, and the number of output channels is 256;
the number of input channels of the h convolution layer is 256, and the number of output channels is 256;
the convolution kernel sizes of the a-th to h-th convolution layers are all set to 3 × 3, and the convolution step sizes are all set to 1;
the normalization layers all use BN normalization functions;
the activation functions of the a-th to h-th activation function layers use LeakyRelu;
the Reshape conversion layer converts the output features of the h-th activation function layer into a two-dimensional feature vector with 1 row.
7. The method of claim 1, wherein: the fully connected module structure and parameters in the authentication network D are as follows:
the structure is as follows: an input layer → the a-th fully-connected layer → the i-th activation function layer → the b-th fully-connected layer → the j-th activation function layer → an output layer; the output is a probability value used to judge the probability that the input image is real;
the parameters of each layer are as follows:
the input layer is the two-dimensional feature vector output by the convolution module in the discrimination network D;
the activation function of the ith activation function layer uses LeakyRelu;
the jth activation function layer uses Sigmoid;
the input of the a-th fully-connected layer is the feature vector output by the Reshape conversion layer, and the output is a 512-dimensional vector;
the input of the b-th fully-connected layer is a 512-dimensional vector, and the output is a 1-dimensional vector.
8. The method of claim 1, wherein: the optimization function L_D of the discrimination network D is expressed as follows:
L_D = -E_{I_real}[log(D_Ra(I_real, I_fake))] - E_{I_fake}[log(1 - D_Ra(I_fake, I_real))]
wherein I_real is the target enhanced image, I_fake is the enhanced image output by the generation network, E denotes the averaging operation over all the data it contains, and D_Ra(I_real, I_fake) and D_Ra(I_fake, I_real) are intermediate variables expressed as follows:
D_Ra(I_real, I_fake) = σ(D(I_real)) - E[D(I_fake)]
D_Ra(I_fake, I_real) = σ(D(I_fake)) - E[D(I_real)]
where D(·) denotes the discrimination network function and σ denotes the Sigmoid activation function.
9. The method of claim 1, wherein: updating network parameters of the two networks through alternately optimizing the objective functions of the generating network G and the identifying network D to obtain a trained generating network G, and realizing the following steps:
when training the generation network G, the parameters of the discrimination network D are fixed, a group of samples is randomly selected from the training set, 256 × 256 image blocks are cropped as the input of the generation network, and the parameters of the generation network are updated by optimizing its objective function L_G;
when training the discrimination network D, the parameters of the generation network G are fixed, the low-quality image input to the generation network is enhanced, the enhanced image is taken as the input of the discrimination network, and the parameters of the discrimination network are updated by optimizing the objective function L_D;
and alternately updating and training the generating network G and the identifying network D until the whole network reaches the set training times to obtain the trained generating network G and identifying network D.
10. The method of claim 1, wherein: the forward propagation calculation is implemented as follows:
firstly, the low-quality natural image x is taken as the input of the global feature extraction module, and ten different sets of the two gamma parameters a_i and γ_i are obtained by calculation; then the following global transformation is performed:
y_i = a_i * x^γ_i
wherein y_i represents the global enhanced image produced by the global feature extraction module, and i takes values from 1 to 10;
then, the low-quality natural image x is taken as the input of the local feature extraction module, and ten different sets of residual features b_i are obtained by calculation; then the following local enhancement transformation is performed:
y_i' = y_i + b_i
wherein y_i' represents the locally enhanced image, enhanced jointly by the global feature extraction module and the local feature extraction module;
then, the ten sets of locally enhanced images y_i' are taken together as the input of the feature fusion module, and the final output image, i.e., the high-quality natural image enhanced by the generation network G, is obtained by calculation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110557873.2A CN113222953B (en) | 2021-05-21 | 2021-05-21 | Natural image enhancement method based on depth gamma transformation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110557873.2A CN113222953B (en) | 2021-05-21 | 2021-05-21 | Natural image enhancement method based on depth gamma transformation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113222953A true CN113222953A (en) | 2021-08-06 |
CN113222953B CN113222953B (en) | 2023-10-20 |
Family
ID=77093749
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110557873.2A Active CN113222953B (en) | 2021-05-21 | 2021-05-21 | Natural image enhancement method based on depth gamma transformation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113222953B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109272455A (en) * | 2018-05-17 | 2019-01-25 | 西安电子科技大学 | Based on the Weakly supervised image defogging method for generating confrontation network |
CN111915526A (en) * | 2020-08-05 | 2020-11-10 | 湖北工业大学 | Photographing method based on brightness attention mechanism low-illumination image enhancement algorithm |
AU2020103901A4 (en) * | 2020-12-04 | 2021-02-11 | Chongqing Normal University | Image Semantic Segmentation Method Based on Deep Full Convolutional Network and Conditional Random Field |
CN112465736A (en) * | 2020-11-18 | 2021-03-09 | 武汉理工大学 | Infrared video image enhancement method for port ship monitoring |
CN112767279A (en) * | 2021-02-01 | 2021-05-07 | 福州大学 | Underwater image enhancement method for generating countermeasure network based on discrete wavelet integration |
Non-Patent Citations (1)
Title |
---|
LIN Sen; LIU Shiben; TANG Yandong: "Multi-input fusion adversarial network for underwater image enhancement", 红外与激光工程 (Infrared and Laser Engineering), no. 05 *
Also Published As
Publication number | Publication date |
---|---|
CN113222953B (en) | 2023-10-20 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109614996B (en) | Weak visible light and infrared image fusion identification method based on generative adversarial network | |
CN110599409B (en) | Convolutional neural network image denoising method based on multi-scale convolutional groups and parallel | |
CN108986050B (en) | Image and video enhancement method based on multi-branch convolutional neural network | |
CN109685072B (en) | Composite degraded image high-quality reconstruction method based on generative adversarial network | |
CN110210608B (en) | Low-illumination image enhancement method based on attention mechanism and multi-level feature fusion | |
CN111292264A (en) | Image high dynamic range reconstruction method based on deep learning | |
CN111915525A (en) | Low-illumination image enhancement method based on improved depthwise-separable generative adversarial network | |
CN113420794B (en) | Binarized Faster R-CNN citrus disease and pest identification method based on deep learning | |
CN114445292A (en) | Multi-stage progressive underwater image enhancement method | |
Zhao et al. | A simple and robust deep convolutional approach to blind image denoising | |
Xu et al. | Gan based multi-exposure inverse tone mapping | |
CN110415816B (en) | Skin disease clinical image multi-classification method based on transfer learning | |
CN112634168A (en) | Image restoration method combined with edge information | |
CN112102186A (en) | Real-time enhancement method for underwater video image | |
CN115797205A (en) | Unsupervised single image enhancement method and system based on Retinex fractional order variation network | |
CN113222953B (en) | Natural image enhancement method based on depth gamma transformation | |
CN111754459B (en) | Dyeing fake image detection method based on statistical depth characteristics and electronic device | |
CN115100509A (en) | Image identification method and system based on multi-branch block-level attention enhancement network | |
Lee et al. | Efficient Low Light Video Enhancement Based on Improved Retinex Algorithms | |
CN112712481A (en) | Structure-texture sensing method aiming at low-light image enhancement | |
CN111681176A (en) | Self-adaptive convolution residual error correction single image rain removal method | |
Zhao et al. | Single Image Dehazing Based on Enhanced Generative Adversarial Network | |
CN112907469B (en) | Underwater image identification method based on Lab domain enhancement, classification and contrast improvement | |
CN117253184B (en) | Foggy day image crowd counting method guided by foggy priori frequency domain attention characterization | |
Martyniuk | Multi-task learning for image restoration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||