CN108681991A - High dynamic range inverse tone mapping method and system based on a generative adversarial network

Info

Publication number: CN108681991A
Application number: CN201810299749.9A
Authority: CN
Inventors: 宋利; 宁士钰; 解蓉; 张文军
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2018-04-04
Filing date: 2018-04-04
Publication date: 2018-10-19

Abstract

The present invention provides a high dynamic range inverse tone mapping method and system based on a generative adversarial network. Original high dynamic range video is read, cropped, and converted into a data set of corresponding standard dynamic range and high dynamic range images that can be used for training. A generative adversarial network with skip connections, based on convolutional neural networks, is established to convert standard dynamic range images into high dynamic range images, i.e. inverse tone mapping. According to the configured composite objective function, the whole generative adversarial network is trained and optimized continuously, and the resulting network can complete the mapping from standard dynamic range to high dynamic range. The invention mitigates problems of existing non-learning methods such as insufficient nonlinearity and complicated parameter tuning and, by taking into account the one-dimensional and gradient characteristics of high dynamic range images, achieves better inverse tone mapping to high dynamic range.

Description

High dynamic range inverse tone mapping method and system based on a generative adversarial network

Technical field

The present invention relates to a method in the technical field of image processing, and specifically to a high dynamic range inverse tone mapping method and system based on a generative adversarial network.

Background art

High dynamic range (HDR) display is undoubtedly a major trend in current television and shooting technology. HDR is fundamentally different from existing display technology, mainly in how it renders the picture. With HDR, the colors a television presents are more vivid, blacks are deeper, and objects in the picture are clearer. At the same time, the color range of the picture is extended, from the BT.709 standard widely used for standard dynamic range (SDR) to the BT.2020 standard for HDR. Brightness is the key to HDR: most televisions on the market have a brightness of about 400 nits, and some models reach 750 nits, whereas the peak brightness of an HDR television can reach 1000 nits. The higher brightness makes scenes, especially outdoor scenes, look more realistic. Because HDR content is demanding and expensive to shoot, it is difficult to produce large amounts of HDR content by shooting it directly. Producing HDR content from existing SDR content has therefore become a valuable direction, and this conversion is called inverse tone mapping.

Because traditional methods are not learning-based and are essentially piecewise compression, their nonlinearity and operability are insufficient. The learning-based methods that have appeared in recent years mainly address expansion of the luminance domain and seldom consider expansion of the color gamut, which leads to a poor viewing experience.

Summary of the invention

To address the defects of existing inverse tone mapping techniques, the present invention provides a high dynamic range inverse tone mapping method based on a generative adversarial network. The method considers the expansion of both the luminance range and the color range, uses a generative adversarial network to compensate for the insufficient nonlinearity of conventional methods, and designs a generative adversarial model based on convolutional neural networks, so as to achieve better conversion results both visually and in objective evaluation.

The first object of the present invention is to provide a high dynamic range inverse tone mapping method based on a generative adversarial network, comprising:

S1: reading original high dynamic range video, cropping it and converting it into standard dynamic range images which, together with the high dynamic range images, form a supervised data set used as the subsequent training data set and validation data set;

S2: establishing a generative adversarial network with skip connections based on convolutional neural networks;

S3: for the generative adversarial network, establishing a target loss function that combines one-dimensional features and gradient features, and training and optimizing continuously with the training data set to obtain a generative adversarial network that can complete inverse tone mapping;

S4: inputting the validation data set into the generative adversarial network model that can complete inverse tone mapping, and mapping it to obtain high dynamic range images.

Preferably, when the data set is established in S1, HDRTools is used to extract single-frame high dynamic range images from existing HDR videos scene by scene, and the frames are cut into blocks of lower resolution for network training. Further, in some embodiments of the invention, several high dynamic range images of 512 × 512 resolution are cropped from 4K-resolution video as the high dynamic range images of the data set, and the Reinhard tone mapping algorithm is then used to convert the high dynamic range images into standard dynamic range images, forming a one-to-one data set for training and validation.

More preferably, the data set includes multiple groups of corresponding standard dynamic range and high dynamic range images for training the model, plus further groups for validation. The high dynamic range images are exr files with 10-bit quantization, use the BT.2020 color gamut, and have a maximum brightness of 1000 nits; the standard dynamic range images are png files with 8-bit quantization, use the BT.709 color gamut, and have a maximum brightness of 100 nits. For convenient use in the training described in S3, the images are stored as vectors in h5 files when imported. The above data set is the one used in some embodiments of the invention; other high dynamic range images and their corresponding standard dynamic range images can also be chosen to build the data set, provided that the same quantization bit depth, gamut range and maximum brightness are used.

Preferably, the target loss function is determined by the one-dimensional features of the image and its gradient features in each direction;

Let L and H denote the standard dynamic range input and the high dynamic range output respectively, and let G and D denote the generator and the discriminator respectively. The target loss function of the whole generative adversarial network can then be regarded as a minimax problem:

Where the objective combines the content loss function of the generator and the adversarial loss function of the generator and the discriminator, and λ is a hyperparameter that controls the relative weight of the two loss components. The content loss function can be written as:

α denotes the relative weight of the one-dimensional features and the gradient features, and d_x and d_y denote the gradients in the horizontal and vertical directions of the image respectively. This can be expressed as:

The high dynamic range inverse tone mapping method based on a generative adversarial network designed by the present invention considers the gradient features of high dynamic range images as well as the one-dimensional features of the image. Specifically, the value of a pixel after inverse tone mapping is determined not only by the standard dynamic range pixel at the current position but is also related to the trend of change of the surrounding pixels. Gradient features are therefore calculated between each computed pixel and its surrounding pixels; the present invention uses the 2-norm and implements the operation with convolution, which makes it more effective. With this design, after the generative adversarial network has been fully trained on the established training data set, good visual results and evaluation scores can be achieved.

The second object of the present invention is to provide a high dynamic range inverse tone mapping system based on a generative adversarial network, comprising a processor and a memory, wherein program instructions are stored in the memory and the processor is configured to call the program instructions to execute the above high dynamic range inverse tone mapping method based on a generative adversarial network.

Compared with the prior art, the present invention has the following beneficial effects:

In addition to the luminance expansion considered by existing learning-based methods, the method and system of the invention also take expansion of the color gamut into account, and achieve a better visual result.

Compared with widely used non-learning methods, the method and system of the invention have better nonlinear performance and multi-scale transformation performance. They also preserve the theoretical completeness of inverse tone mapping while improving the visual quality and objective evaluation indexes of the mapping result.

Brief description of the drawings

Other features, objects and advantages of the invention will become more apparent upon reading the detailed description of non-limiting embodiments with reference to the following drawings:

Fig. 1 is a flow chart of the method of an embodiment of the invention;

Fig. 2 is a block diagram of the network structure of the generative adversarial network in an embodiment of the invention;

Fig. 3 compares the results generated in an embodiment of the invention with those of existing methods.

Detailed description of embodiments

The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the invention, but do not limit the invention in any way. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the inventive concept, and these all fall within the protection scope of the invention.

Fig. 1 shows the flow chart of the high dynamic range inverse tone mapping method based on a generative adversarial network of the invention. The design idea is as follows:

1. Read original high dynamic range video, crop it and convert it into standard dynamic range images which, together with the high dynamic range images, form a supervised data set used for training in step 3 and validation in step 4;

2. Based on convolutional neural networks, establish a generative adversarial network in which the generator contains skip connections; after the training in step 3, the resulting generative adversarial network can complete the inverse tone mapping operation, and its results are validated in step 4;

3. Establish a target loss function that combines one-dimensional features and gradient features and, using the training data set established in step 1, train and optimize continuously to obtain a generative adversarial network model that can complete inverse tone mapping;

4. Input the validation data set established in step 1 into the generative adversarial network model established through steps 1 to 3, map it to obtain high dynamic range images, and evaluate the output.

Steps 1 and 2 establish the data set and the generative adversarial network; step 3 trains, with the configured target loss function, a generative adversarial network model that can complete inverse tone mapping; and step 4 evaluates the trained model. The process of establishing the generative adversarial network model for high dynamic range inverse tone mapping is described below.

The detailed technical operations involved in each of the above steps are explained below through specific embodiments. It should be understood that these are only some embodiments of the invention.

1. Establishing the data set of corresponding high dynamic range and standard dynamic range images

When the data set is established, existing HDR videos are split scene by scene into frames and cropped into high dynamic range images. The Reinhard tone mapping algorithm is then used to convert the high dynamic range images into standard dynamic range images, forming a one-to-one data set for training and validation.

In some embodiments of the invention, to obtain high-quality high dynamic range images, existing high-quality high dynamic range demo clips are selected with the following characteristics: exr files with 10-bit quantization, 1000 nits peak brightness, a color gamut conforming to BT.2020, PQ curve processing, HEVC Main10 encoding, YUV color space, and 4:2:0 sampling. The selected videos are then split by scene into single frames of 3840 × 2160 using the HDRTools toolbox, cropped to 512 × 512 resolution, and converted to the RGB color space, giving the high dynamic range images of the data set.
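The cropping step can be illustrated with a short sketch. It assumes non-overlapping 512 × 512 tiles taken from the top-left corner of each 3840 × 2160 frame, with partial tiles at the right and bottom edges discarded; the embodiment does not state how the crop positions are chosen, and the function name `tile_origins` is hypothetical.

```python
def tile_origins(width, height, tile):
    """Return the (x, y) top-left corners of all full, non-overlapping tiles.

    Assumption: tiles start at (0, 0) and partial tiles at the right and
    bottom edges are dropped; the source does not specify the crop layout.
    """
    return [
        (x, y)
        for y in range(0, height - tile + 1, tile)
        for x in range(0, width - tile + 1, tile)
    ]


origins = tile_origins(3840, 2160, 512)
```

Under these assumptions each frame yields 7 × 4 = 28 tiles; overlapping or randomly placed crops would give a different count.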

Further, when the corresponding standard dynamic range images of the supervised learning data set are established, the standard dynamic range content is obtained with the Reinhard tone mapping operator. The mapped standard dynamic range content has the following characteristics: png files with 8-bit quantization, 100 nits peak brightness, a color gamut conforming to BT.709, and RGB color space.
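A minimal sketch of how an SDR value can be derived from an HDR luminance value. It uses the simplest global form of the Reinhard operator, L_d = L / (1 + L), followed by 8-bit quantization. The embodiment does not state which variant of the operator or which parameters were used, so this form and the names `reinhard_global` and `quantize_8bit` are assumptions.

```python
def reinhard_global(luminance):
    """Simplest global Reinhard curve: maps [0, inf) into [0, 1)."""
    if luminance < 0:
        raise ValueError("luminance must be non-negative")
    return luminance / (1.0 + luminance)


def quantize_8bit(value):
    """Quantize a value in [0, 1] to an 8-bit code (0..255)."""
    clipped = min(max(value, 0.0), 1.0)
    return int(clipped * 255.0 + 0.5)


# A relative luminance of 1.0 maps to 0.5, i.e. code 128 in 8 bits.
sdr_code = quantize_8bit(reinhard_global(1.0))
```

The curve compresses bright values strongly while leaving dark values almost unchanged, which is the kind of mapping the trained network has to invert.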

The resulting standard dynamic range and high dynamic range images are linearly normalized to [0, 1]. For convenient use in network training, the images are stored as vectors, in one-to-one correspondence, in h5 files when imported. In the resulting data set, the training data set consists of 2660 image groups and the validation data set consists of 140 image groups.
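The normalization and the training/validation split can be sketched as follows. The sketch assumes scaling by the nominal code range (255 for 8-bit data) and a simple split of an ordered list of image pairs. The embodiment does not describe how the 2660 and 140 groups were selected, and the helper names are hypothetical.

```python
def normalize(values, max_value):
    """Linearly map values in [0, max_value] to [0, 1]."""
    return [v / max_value for v in values]


def split_pairs(pairs, n_train):
    """Split a list of (sdr, hdr) pairs into training and validation parts."""
    return pairs[:n_train], pairs[n_train:]


# Placeholder identifiers stand in for the 2800 image pairs.
pairs = [(f"sdr_{i}", f"hdr_{i}") for i in range(2800)]
train, val = split_pairs(pairs, 2660)
sdr_normalized = normalize([0, 128, 255], 255)
```

With these counts the validation set is 5% of the 2800 pairs.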

2. Establishing the generative adversarial network

A generative adversarial network consists of two networks, a generator and a discriminator. A standard dynamic range image is input to the generator, which outputs a predicted high dynamic range image; the discriminator is used to determine whether an image input to it is generated or is a native high dynamic range image.

When designing the generator, the main purpose is to extract deep features of the standard dynamic range image to represent the whole image, and then to reconstruct the high dynamic range image from these features. The generator in the embodiments of the invention therefore uses an encoder-decoder structure. The input standard dynamic range image passes through successive convolutional layers that progressively extract features. Across these layers the convolution kernel size stays constant, the number of channels keeps increasing, and the spatial size of each channel gradually decreases, which achieves feature extraction in a process similar to encoding the image information.

Then, from the deep features obtained, the high dynamic range information of the original image is progressively restored by deconvolution layers corresponding to the convolutional layers, and the image channels are reduced to the three RGB channels. In particular, skip connections are added between corresponding convolutional and deconvolution layers, providing the deconvolution layers with more of the original information for restoring the image.

In a preferred embodiment of the invention, the generator is a U-Net structure in which the kernel size of every convolutional and deconvolution layer is 3 × 3 and the stride is 2. Each convolution operation is followed by a batch normalization layer and a leaky ReLU activation function, except that the activation function of the last layer is sigmoid, with a stride of 2. The channel numbers of the 5 convolutional layers of the generator are 64, 128, 256, 512 and 1024 respectively, and the channel numbers of the 4 deconvolution layers are 512, 256, 128 and 64 respectively; in addition, the input layer of the generator has 3 channels and the output layer has 3 channels. It should be noted that the network structure of the invention contains no max-pooling layers.
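The layer sizes of the generator can be traced with a short sketch. It assumes a 512 × 512 input and "same"-style padding, so that each stride-2 convolution halves the spatial size and each stride-2 deconvolution doubles it; the padding mode is not stated in the embodiment. The pairing of layers by matching shape is likewise an assumption about where the skip connections sit, and the function name is hypothetical.

```python
import math

CONV_CHANNELS = [64, 128, 256, 512, 1024]  # encoder, from the embodiment
DECONV_CHANNELS = [512, 256, 128, 64]      # decoder, from the embodiment


def trace_generator(input_size, stride=2):
    """Return (encoder, decoder) lists of (spatial_size, channels) per layer."""
    encoder, size = [], input_size
    for channels in CONV_CHANNELS:
        size = math.ceil(size / stride)    # 'same' padding assumed
        encoder.append((size, channels))
    decoder = []
    for channels in DECONV_CHANNELS:
        size = size * stride               # transposed conv doubles the size
        decoder.append((size, channels))
    return encoder, decoder


encoder, decoder = trace_generator(512)
# Decoder layer j (0-based) has the same shape as encoder layer 3 - j,
# which is what makes a skip connection between the two possible.
matches = [decoder[j] == encoder[3 - j] for j in range(4)]
```

Under these assumptions the bottleneck is 16 × 16 × 1024 and the last of the four deconvolution layers outputs 256 × 256 × 64; the 3-channel output layer with stride 2 would then restore 512 × 512, although the text does not spell this out.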

The other part of the generative adversarial network, the discriminator, consists of a convolutional neural network and fully connected layers. It mainly performs feature extraction on the input image, and its final output is used to judge whether the input image is a generated high dynamic range image or a native one. The output value lies between 0 and 1: the closer the value is to 1, the more likely the input image is a native high dynamic range image; the closer it is to 0, the more likely the input image is a generated high dynamic range image. In the preferred embodiment, the discriminator consists of five convolutional layers and two fully connected layers. The kernel size of the first two convolutional layers is 5 × 5 and that of the last three is 3 × 3; the stride of the first three layers is 2 and that of the last two is 1. As in the generator, each convolutional layer has a batch normalization operation and a leaky ReLU activation function. The channel numbers of the 5 convolutional layers are 64, 128, 128, 256 and 1 respectively. After the convolutional layers, the output is converted to a vector by a Flatten operation and then passes through two fully connected layers with 1024 and 1 nodes respectively.
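As a rough check of the discriminator dimensions, the sketch below computes the length of the flattened vector and the parameter counts of the two fully connected layers. It assumes a 512 × 512 input, "same"-style padding and one bias per output node, none of which is stated in the embodiment; the helper names are hypothetical.

```python
import math

# (kernel, stride, channels) per convolutional layer, from the embodiment.
DISC_LAYERS = [(5, 2, 64), (5, 2, 128), (3, 2, 128), (3, 1, 256), (3, 1, 1)]


def flattened_length(input_size):
    """Length of the vector after Flatten, assuming 'same' padding."""
    size, channels = input_size, 3
    for _kernel, stride, channels in DISC_LAYERS:
        size = math.ceil(size / stride)
    return size * size * channels


def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


flat = flattened_length(512)
fc1_params = dense_params(flat, 1024)
fc2_params = dense_params(1024, 1)
```

Under these assumptions the three stride-2 layers reduce 512 to 64, so the single-channel feature map flattens to 64 × 64 = 4096 values, and most of the fully connected parameters sit in the first layer.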

The above network structure is shown in Fig. 2, where conv denotes a convolutional layer, deconv denotes a deconvolution layer, k denotes the kernel size, n denotes the number of channels, s denotes the stride, and FC denotes a fully connected layer.

3. Setting the objective function and training the network to obtain a generative adversarial network model that can complete inverse tone mapping

The training principle of a generative adversarial network is that the generator produces "fake" high dynamic range images, and these generated images are input to the discriminator together with native high dynamic range images to be told apart. On the one hand, training makes the images produced by the generator closer to native images, so as to deceive the discriminator; on the other hand, it trains the discriminator so that it distinguishes generated images from native images more accurately. The generative adversarial network thus forms a game, and the target loss function, formula (1), is established on this principle.
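Formula (1) does not survive in this text. A form consistent with the definitions that follow is sketched here; the symbol names $\mathcal{L}_{con}$ (content loss) and $\mathcal{L}_{adv}$ (adversarial loss) and the placement of λ on the adversarial term are assumptions.

```latex
\min_{G}\max_{D}\; \mathcal{L}_{con}(G) + \lambda\,\mathcal{L}_{adv}(G, D) \tag{1}
```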

Where the objective combines the content loss function of the generator and the adversarial loss function of the generator and the discriminator, and λ is a hyperparameter that controls the relative weight of the two loss components. The content loss function is the content-related loss; the embodiments of the invention propose a mixed loss function that combines the mean squared error and the differential error between the generated high dynamic range image and the native high dynamic range image. In the embodiments of the invention, the content loss function can be written as:
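Formula (2) is likewise missing from this text. One form consistent with the surrounding description (a pixel-level MSE term plus a gradient term, dMSE, weighted by α) is given below. The symbol $\mathcal{L}_{con}$, the squared norms, the omission of normalization constants and the placement of α on the gradient term are assumptions.

```latex
\mathcal{L}_{con}(G) = \mathbb{E}_{(L,H)\sim p_{data}}\Big[
  \|G(L)-H\|_F^2
  + \alpha\big(\|d_x G(L)-d_x H\|_F^2 + \|d_y G(L)-d_y H\|_F^2\big)
\Big] \tag{2}
```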

Where E denotes the expectation of the expression in brackets; (L, H) ~ p_data denotes a data pair (L, H) sampled from the training data set; G(L) denotes the generated high dynamic range image; ||·||_F denotes the Frobenius norm of a tensor; and d_x and d_y denote the differences of each color channel in the horizontal and vertical directions of the image respectively. The first term of formula (2) is the widely used mean squared error (MSE) loss function, which is computed at pixel level. The second term of formula (2) compares deeper contrast differences between the generated and native high dynamic range images and is named dMSE; introducing dMSE helps to suppress the over-smoothing of images found in the original problem.
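The MSE and dMSE terms can be sketched in plain Python on a single color channel. The sketch assumes forward differences for d_x and d_y, a mean over elements for each term, and α multiplying the gradient term; these details are not given in the text, and a practical implementation would use tensor operations in a framework such as TensorFlow. All function names are hypothetical.

```python
def diff_x(img):
    """Horizontal forward difference of a 2-D list (one color channel)."""
    return [[row[j + 1] - row[j] for j in range(len(row) - 1)] for row in img]


def diff_y(img):
    """Vertical forward difference of a 2-D list (one color channel)."""
    return [
        [img[i + 1][j] - img[i][j] for j in range(len(img[0]))]
        for i in range(len(img) - 1)
    ]


def mse(a, b):
    """Mean squared error between two 2-D lists of equal shape."""
    squared = [(x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum(squared) / len(squared)


def content_loss(generated, reference, alpha):
    """Pixel-level MSE plus alpha times the gradient-domain MSE (dMSE)."""
    pixel_term = mse(generated, reference)
    gradient_term = (
        mse(diff_x(generated), diff_x(reference))
        + mse(diff_y(generated), diff_y(reference))
    )
    return pixel_term + alpha * gradient_term


reference = [[0.0, 0.0], [0.0, 0.0]]
offset = [[1.0, 1.0], [1.0, 1.0]]   # constant error, no gradient error
edge = [[0.0, 1.0], [0.0, 1.0]]     # spurious vertical edge
loss_offset = content_loss(offset, reference, alpha=2.0)
loss_edge = content_loss(edge, reference, alpha=2.0)
```

A constant brightness offset is penalized only by the pixel term, while a spurious edge is also penalized by the gradient term, which illustrates how the second term reacts to local structure rather than to absolute level.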

The adversarial loss function of the generative adversarial network is obtained from the principle of GAN networks. This term depends on both the generator and the discriminator, and its specific form is:
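Formula (3) is not reproduced in this text. Since it is said to follow from the principle of GAN networks, the standard GAN objective is a plausible form; the symbol $\mathcal{L}_{adv}$ and the exact expression are assumptions.

```latex
\mathcal{L}_{adv}(G, D) =
  \mathbb{E}_{H\sim p_{data}}\big[\log D(H)\big]
  + \mathbb{E}_{L\sim p_{data}}\big[\log\big(1 - D(G(L))\big)\big] \tag{3}
```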

Where H ~ p_data and L ~ p_data denote high dynamic range images and standard dynamic range images sampled from the training data set respectively.
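A small sketch of an adversarial loss of this kind, evaluated on discriminator outputs for a batch. It assumes the standard GAN form, the mean of log D(H) over native images plus the mean of log(1 - D(G(L))) over generated images; the source does not confirm this exact expression, and the function name is hypothetical.

```python
import math


def adversarial_loss(d_native, d_generated):
    """Standard GAN value: E[log D(H)] + E[log(1 - D(G(L)))].

    d_native and d_generated are discriminator outputs in the open
    interval (0, 1) for native and generated images respectively.
    """
    native_term = sum(math.log(p) for p in d_native) / len(d_native)
    generated_term = sum(math.log(1.0 - p) for p in d_generated) / len(d_generated)
    return native_term + generated_term


undecided = adversarial_loss([0.5, 0.5], [0.5, 0.5])   # cannot tell them apart
confident = adversarial_loss([0.9, 0.9], [0.1, 0.1])   # separates them well
```

The discriminator tries to raise this value, while the generator tries to lower it by pushing the discriminator output for generated images toward 1.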

The minimax problem in formula (1) is a game between generating better images and discriminating generated images more accurately; as training proceeds, the generative adversarial network has to update the parameters of the generator and the discriminator step by step. In particular, when formula (1) is evaluated separately for the generator and the discriminator, it can be written as:
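Formula (4) is missing from this text. Writing $\mathcal{L}_{con}$ and $\mathcal{L}_{adv}$ for the content and adversarial losses, an alternating update consistent with the description would be the following; the symbol names and the iteration indices on the opposing network are assumptions.

```latex
G_k = \arg\min_{G}\; \mathcal{L}_{con}(G) + \lambda\,\mathcal{L}_{adv}(G, D_{k-1}),
\qquad
D_k = \arg\max_{D}\; \mathcal{L}_{adv}(G_k, D) \tag{4}
```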

Where G_k and D_k denote the generator G and the discriminator D at the k-th iteration respectively.

The principle is that, when the generator is trained, minimizing the content loss function of the generator brings the high dynamic range images it generates close to the native high dynamic range images at pixel level. Minimizing the adversarial loss function of the network lets the images produced by the generator "deceive" the discriminator as far as possible into the wrong judgment that they are native high dynamic range images. Meanwhile, as training proceeds, the discriminator's accuracy in distinguishing generated images from native images gradually increases, and in the next iteration its discrimination result is used to train the generator.

After the above training is complete, a generative adversarial network model that can complete inverse tone mapping is obtained.

4. Implementation conditions and evaluation of results

In some embodiments of the invention, the code is implemented in Python using the TensorFlow framework. During training, the batch size of each iteration is 6, the optimization method is RMSProp, and the learning rate starts at 10^-4 and decreases continuously as the number of iterations increases, reaching 10^-5 after 80,000 iterations. For the parameters of the objective function, λ is set to 10^4 and α is set to 10^5.
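The learning-rate schedule can be sketched as a function of the iteration count. The text only gives the two endpoints (10^-4 at the start, 10^-5 after 80,000 iterations); the exponential shape of the decay, the constant rate afterwards and the function name are assumptions.

```python
def learning_rate(step, start=1e-4, end=1e-5, decay_steps=80_000):
    """Exponential decay from `start` to `end` over `decay_steps` iterations.

    Assumption: the decay is exponential and the rate is held at `end`
    afterwards; the source only gives the two endpoint values.
    """
    if step >= decay_steps:
        return end
    return start * (end / start) ** (step / decay_steps)


halfway = learning_rate(40_000)
```

With an exponential decay the rate halfway through is the geometric mean of the endpoints, roughly 3.16 × 10^-5; a linear or stepwise schedule would also satisfy the stated endpoints.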

The evaluation indexes for high dynamic range images differ from those for standard dynamic range images. HDR-VDP-2 is generally used for objective evaluation; its quality score reflects how far the quality of the generated high dynamic range image falls relative to the native high dynamic range image, and expresses the objective evaluation of the output as a mean subjective opinion score. In addition, mPSNR is introduced to evaluate pixel-level quality, and SSIM is introduced to evaluate the structural similarity of the generated images.
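For orientation, the basic PSNR computation underlying pixel-level metrics can be sketched as below, on values normalized to [0, 1]. This is plain PSNR only: mPSNR, as commonly defined, applies the same idea to several exposures of the HDR image, and that step is omitted here. The function name and the sample values are illustrative assumptions, not data from the embodiment.

```python
import math


def psnr(reference, test, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length sequences."""
    if len(reference) != len(test):
        raise ValueError("inputs must have the same length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak ** 2 / mse)


# Illustrative values only.
score = psnr([0.0, 0.5, 1.0, 0.5], [0.1, 0.5, 0.9, 0.5])
```

Higher values indicate a smaller pixel-level error; identical inputs give an infinite score.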

Table 1. Comparison of results between the embodiment of the invention and existing methods

Table 1 compares the evaluation indexes of the results obtained by the embodiment of the invention with those of existing methods. Huo and KO are non-learning methods whose parameters, apart from image attributes, follow their default settings; DrTM is a learning-based method whose code comes from its authors. The results of the embodiment of the invention have the highest scores on all three evaluation indexes, so the proposed method can be considered to improve on existing methods in generation quality. Example results are shown in Fig. 3.

Fig. 3 compares the results of the embodiment of the invention with those of existing methods. It can be seen that the results of the invention have better color rendition and are closer to the native high dynamic range images.

The present invention also provides an embodiment of a high dynamic range inverse tone mapping system based on a generative adversarial network, comprising a processor and a memory, wherein program instructions are stored in the memory and the processor is configured to call the program instructions to execute the high dynamic range inverse tone mapping method based on a generative adversarial network described with reference to Fig. 1 and Fig. 2.

The method and system of the invention mitigate problems of existing non-learning methods such as insufficient nonlinearity and complicated parameter tuning and, by taking into account the one-dimensional and gradient characteristics of high dynamic range images, achieve better inverse tone mapping to high dynamic range.

It should be noted that the steps of the high dynamic range inverse tone mapping method based on a generative adversarial network provided by the invention can be implemented by the corresponding modules, devices, units and the like of the high dynamic range inverse tone mapping system based on a generative adversarial network. Those skilled in the art can refer to the technical solution of the system to implement the step flow of the method; that is, the embodiments of the system can be understood as preferred examples of implementing the method, and are not described again here.

Those skilled in the art know that, besides implementing the system provided by the invention and its devices purely as computer-readable program code, the method steps can be logically programmed so that the system and its devices achieve the same functions in the form of logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. The system provided by the invention and its devices can therefore be regarded as a hardware component, and the devices within it for implementing various functions can be regarded as structures within the hardware component; the devices for implementing various functions can also be regarded both as software modules implementing the method and as structures within the hardware component.

Specific embodiments of the invention have been described above. It should be understood that the invention is not limited to the above specific embodiments; those skilled in the art can make various variations or modifications within the scope of the claims without affecting the substance of the invention.

Claims

1. A high dynamic range inverse tone mapping method based on a generative adversarial network, characterized by comprising:

S1: reading original high dynamic range video, cropping it and converting it into standard dynamic range images which, together with the high dynamic range images, form a supervised data set used as the subsequent training data set and validation data set;

S2: establishing a generative adversarial network with skip connections based on convolutional neural networks;

S3: for the generative adversarial network, establishing a target loss function that combines one-dimensional features and gradient features, and training and optimizing continuously with the training data set to obtain a generative adversarial network model that can complete inverse tone mapping;

S4: inputting the validation data set into the generative adversarial network model that can complete inverse tone mapping, and mapping it to obtain high dynamic range images.

2. The high dynamic range inverse tone mapping method based on a generative adversarial network according to claim 1, characterized in that: in S1, when the data set is established, HDRTools is used to extract single-frame high dynamic range images from existing HDR videos scene by scene, and the frames are cut into blocks of lower resolution for network training.

3. The high dynamic range inverse tone mapping method based on a generative adversarial network according to claim 2, characterized in that: in S1, several high dynamic range images of 512 × 512 resolution are cropped from 4K-resolution video as the high dynamic range images of the data set, and the Reinhard tone mapping algorithm is then used to convert the high dynamic range images into standard dynamic range images, forming a one-to-one data set for training and validation.

4. The high dynamic range inverse tone mapping method based on a generative adversarial network according to any one of claims 1-3, characterized in that: the data set includes multiple groups of corresponding standard dynamic range and high dynamic range images for training the model, plus further groups for validation, wherein:

the high dynamic range images are exr files with 10-bit quantization, use the BT.2020 color gamut, and have a maximum brightness of 1000 nits;

the standard dynamic range images are png files with 8-bit quantization, use the BT.709 color gamut, and have a maximum brightness of 100 nits;

for convenient use in the training described in S3, the images are stored as vectors in h5 files when imported.

5. The high dynamic range inverse tone mapping method based on a generative adversarial network according to claim 1, characterized in that: in S2, the generative adversarial network consists of a generator and a discriminator; a standard dynamic range image is input to the generator, which outputs a predicted high dynamic range image; the discriminator is used to determine whether an image input to it is generated or is a native high dynamic range image.

6. The high dynamic range inverse tone mapping method based on a generative adversarial network according to claim 5, characterized in that: the generator consists of a 9-layer convolutional neural network in which every layer has a batch normalization operation, and the i-th layer and the (n-i)-th layer have a skip connection, where n is 9; the discriminator consists of a 5-layer convolutional neural network and a 2-layer fully connected network.

7. The high dynamic range inverse tone mapping method based on a generative adversarial network according to claim 1, characterized in that: in S3, the target loss function is a minimax problem:

Where L and H denote the standard dynamic range input and the high dynamic range output respectively, G and D denote the generator and the discriminator respectively, the objective combines the content loss function of the generator and the adversarial loss function of the generator and the discriminator, and λ is a hyperparameter that controls the relative weight of the two loss functions.

8. The high dynamic range inverse tone mapping method based on a generative adversarial network according to claim 7, characterized in that the content loss function is written as:

α denotes the relative weight of the one-dimensional features and the gradient features, and d_x and d_y denote the gradients in the horizontal and vertical directions of the image respectively; it is expressed as:

Where E denotes expectation; (L, H) ~ p_data denotes a data pair (L, H) sampled from the training data set; G(L) denotes the generated high dynamic range image; ||·||_F denotes the Frobenius norm of a tensor; and d_x and d_y denote the differences of each color channel in the horizontal and vertical directions of the image respectively.

9. The high dynamic range inverse tone mapping method based on a generative adversarial network according to claim 7, characterized in that the adversarial loss function is obtained from the principle of GAN networks, the term depends on both the generator and the discriminator, and its specific form is:

Where H ~ p_data and L ~ p_data denote high dynamic range images and standard dynamic range images sampled from the training data set respectively.

10. A high dynamic range inverse tone mapping system based on a generative adversarial network, comprising a processor and a memory, characterized in that: program instructions are stored in the memory, and the processor is configured to call the program instructions to execute the high dynamic range inverse tone mapping method based on a generative adversarial network according to any one of claims 1-9.