CN113066114A - Cartoon style migration method based on Retinex model - Google Patents
- Publication number
- CN113066114A (application CN202110305033.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- loss
- reflection
- cartoon
- generator
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T13/20 — 3D [Three Dimensional] animation
- G06N3/02 — Neural networks
- G06N3/08 — Learning methods
- G06T15/50 — Lighting effects (3D image rendering)
- G06T7/40 — Analysis of texture (image analysis)
- G06T7/90 — Determination of colour characteristics (image analysis)
Abstract
A cartoon style migration method based on a Retinex model, belonging to the field of computer vision. Converting real-world photographs into cartoon-style images is a meaningful and challenging task. Existing methods cannot obtain satisfactory cartoon results because they do not separately consider the consistency and continuity of cartoon images and real photographs in terms of structure, texture, and illumination. The Retinex model in the invention jointly learns the intrinsic properties (shape and texture) and the extrinsic property (illumination) of photographs and cartoon images. The RexGAN framework comprises ReflectGAN, which learns the mapping from photographic images to cartoon images through a reflection loss, and LuminGAN, which further improves the structure and illumination of the generated images through an illumination loss. The invention can generate high-quality cartoon images from real photographs, with clear edges and structures and correct illumination, and is superior to state-of-the-art methods in subjective quality.
Description
Technical Field
The invention relates to the fields of style transfer and low-light image enhancement, and presents a cartoon style migration method based on a Retinex decomposition model. The invention belongs to the field of computer vision, and particularly relates to techniques such as neural style transfer and Retinex image decomposition.
Background
In recent years, film and television works on science-fiction subjects have become more and more popular. With the development of computer graphics, the production quality of film and television special effects has been greatly improved, and creators can now achieve finer, more realistic, and more striking picture effects on a computer. The currently mainstream special-effects software includes Houdini, Nuke, and Maya; the character and scene modeling of the famous animated films Final Fantasy and Madagascar both came from Maya. However, the special-effects production process based on this software is complicated, including scene construction, and requires not only substantial financial and human resources but also professional technicians and a great deal of time. Recently, in the field of deep learning, research on style transfer has been receiving attention; its main goal is to convert an ordinary image into a painting with an artistic style, which can provide good technical support for film and television special effects. The artistic style of cartoons mainly emphasizes the information closely related to the theme, simplifying and eliminating extraneous parts. Because of this unique appeal, the style is widely used not only in the video field but also in games, advertisements, and other fields. Therefore, research on cartoon style migration is of great significance.
Since the 1990s, many non-photorealistic rendering (NPR) algorithms have been developed for specific styles, including cartoon, oil-painting, and ink-wash styles. Researchers often use cel shading or filtering to obtain specific styles, and these methods are widely used in various software. In recent years, neural style transfer (NST) methods based on convolutional neural networks have emerged, which achieve good results on painting styles by using the correlations between deep features. In addition, another group of methods based on generative adversarial networks (GANs) transfers images between two domains in an adversarial manner. Subsequently, a series of cycle-consistency-based methods were developed to complete domain migration by training on unpaired data. Furthermore, CartoonGAN, ComixGAN, and AnimeGAN have also achieved great success in cartoon style migration. However, these methods still show unsatisfactory results in two respects: 1) the structure and texture of weak-light areas of the original image are easily lost; 2) the resulting cartoon image does not retain the global color appearance of the original image.
In the field of low-light enhancement, the Retinex model is used as a perceptual model of the human visual system to decompose an image into illumination and reflection components. Its physical model can be described as O = I ∘ R, i.e., the observed image O is decomposed into the illumination I and the reflectance R, where ∘ denotes element-wise multiplication. In recent years, methods represented by WVM, Jeip, and STAR can decompose images well into illumination and reflection components based on Retinex. The Jeip model preserves structural information well through a shape prior, estimates the reflectance well through a texture prior, and recovers the light source well through an illumination prior. These properties can therefore be exploited to better transfer stylization, improve the illumination, and preserve the structure of the generated image.
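The decomposition O = I ∘ R can be illustrated with a minimal single-scale sketch. This is only an illustration of the multiplicative Retinex model, not the Jeip/WVM/STAR priors described above (those solve an optimization problem); here the illumination is simply estimated as a local mean:

```python
import numpy as np

def retinex_decompose(image, ksize=15, eps=1e-6):
    """Minimal single-scale Retinex sketch of O = I * R.

    Estimates the illumination I as a local mean (box blur) of the
    observed image O and recovers the reflectance as R = O / I.
    Illustrative only; the patent uses the Jeip joint-prior model,
    not this simple filter.
    """
    image = image.astype(np.float64)
    pad = ksize // 2
    padded = np.pad(image, pad, mode="edge")
    illumination = np.zeros_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Local mean over a ksize x ksize window around (i, j).
            illumination[i, j] = padded[i:i + ksize, j:j + ksize].mean()
    illumination = np.maximum(illumination, eps)  # avoid division by zero
    reflectance = image / illumination
    return illumination, reflectance
```

By construction I ∘ R reproduces the observed image exactly, which is the defining property of the decomposition.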
A generative adversarial network (GAN) framework based on the Retinex model is proposed and denoted RexGAN. First, the influence of the extrinsic property (illumination) is eliminated by introducing the Jeip Retinex model to decompose the images into their reflection components. The mapping of the reflections from the photographic images to the cartoon images is then learned in an iterative adversarial manner by exploiting the piecewise continuity of the reflection components of the cartoon images, together with a reflection loss. Finally, an illumination loss further improves the illumination and protects the structure of the generated image.
Disclosure of Invention
In achieving image style conversion, neural style transfer methods generally require the content image and the style image to be structurally similar, and this manner emphasizes the painting style. Generative adversarial networks can realize transfer between two completely different domains, so they are one of the current research hotspots; they have also been applied to style transfer and achieve good results, the most representative being CartoonGAN, ComixGAN, and AnimeGAN. However, these methods show unsatisfactory results in two respects: 1) the structure and texture of weak-light areas of the original image are easily lost; 2) the resulting cartoon image does not retain the global color appearance of the original image.
A cartoon style migration method based on a Retinex model is characterized by comprising the following steps:
the method is realized in a training stage and a testing stage; first, two generative adversarial networks are trained simultaneously, namely generator G with discriminator D_R and generator F with discriminator D_I; then, the real photograph x is taken as input and passed successively through generator G and generator F to finally obtain the output image F(G(x));
the method comprises the following three steps: preprocessing a data set, training a RexGAN model and synthesizing cartoon images;
1. preprocessing of data sets
In order to effectively estimate the illumination component I and the reflection component R of the real photographs and cartoon images, the intrinsic-extrinsic joint prior model Jeip is integrated into the mapping function RexGAN; therefore, the cartoon image y and the real photograph x are each decomposed with the Jeip model to obtain the reflection component R_y of the cartoon image and the reflection component R_x of the real photograph; the unpaired training sets {x_i} and {R_y,j} are then used to train generator G to learn the mapping function (ReflectGAN) from the photo domain X of the photographic images x to the reflection domain Q of the cartoon images y; finally, the paired training set {(R_y,j, y_j)} is used to train generator F to learn the mapping function (LuminGAN) from the reflection domain Q of the cartoon images to the cartoon domain Y; the training data comprise real photographs x and cartoon images y, while the test data comprise only real photographs x; the dataset provided by AnimeGAN is used in the training process of RexGAN; furthermore, all training images are resized to 256 × 256;
2. Training of the RexGAN model: based on the assumption that the reflection components of images contain fine textures and have piecewise continuity, the reflections of real photographs are transferred onto the corresponding cartoon images; therefore, the RexGAN model includes two mappings, G: x → R_y (ReflectGAN) and F: R_y → y (LuminGAN); in addition, two adversarial discriminators D_R and D_I are introduced: D_R aims to distinguish the reflection component R_y from the converted image G(x) and corresponds to generator G; likewise, D_I aims to distinguish y from F(R_y) and corresponds to generator F; thus, the objective function is expressed as:

(G*, F*) = arg min_{G, F} max_{D_R, D_I} L(G, F, D_R, D_I)    (1)

where arg min max denotes solving the minimax problem: similarly to training a single generative adversarial network, the two generative adversarial networks are trained by minimizing over generators G and F while maximizing over discriminators D_R and D_I;
LuminGAN essentially uses a generative adversarial network to preserve the structure of the target image and to reconstruct its illumination; thus, the loss is decomposed as:

L(G, F, D_R, D_I) = L_R(G, D_R) + L_I(F, D_I)    (2)

where L_R(G, D_R) and L_I(F, D_I) represent the loss functions of ReflectGAN and LuminGAN, respectively, which are described in detail below;
2.1 Training of ReflectGAN
ReflectGAN is trained to learn the style characteristics of cartoon-image reflections; in order to reduce the training parameters of ReflectGAN, the generator model from AnimeGANv2 is introduced directly; in addition, a simple block-level discriminator is used to determine whether the generated result has the characteristics of the reflection components of cartoon images;
the loss function of ReflectGAN consists of a reflection adversarial loss L_adv^R, a content loss L_con, a reflection style loss L_gra, and a color consistency loss L_col; thus, L_R(G, D_R) is expressed as:

L_R(G, D_R) = ω_1 L_adv^R(G, D_R) + ω_2 L_con(G) + ω_3 L_gra(G) + ω_4 L_col(G)    (3)

where ω_1 = 300, ω_2 = 1.4, ω_3 = 2.5, and ω_4 = 100 are the weights used to balance the ReflectGAN losses;
the realistic photograph x is input into generator G, which attempts to generate an image G(x) whose appearance style and texture are consistent with the reflection component R_y of real cartoon images, while the purpose of discriminator D_R is to distinguish the image G(x) from the reflection component R_y; therefore, the generated image G(x) and the reflection component R_y are input into discriminator D_R to obtain the fake probability D_R(G(x)) and the real probability D_R(R_y); the discriminator pushes the real probability D_R(R_y) toward the real label 1 and the fake probability D_R(G(x)) toward the fake label 0, while the generator pushes D_R(G(x)) toward 1, so that generator G and discriminator D_R are trained in alternate iterations until convergence; in order to effectively learn the style characteristics of cartoon-image reflections, a reflection adversarial loss based on the least-squares loss is proposed to constrain generator G and discriminator D_R; the reflection adversarial loss L_adv^R is expressed as:

L_adv^R(G, D_R) = E_{R_y ~ Q}[(D_R(R_y) - 1)^2] + E_{x ~ X}[(D_R(G(x)))^2]    (4)

where {R_y} denotes the data of the reflection components R_y in the reflection domain Q of the cartoon images, and {x} denotes the data of the real photographs x in the real-photo domain X;
perceptual loss is introduced as the content loss, which has the ability to preserve the image content and the overall spatial structure; thus, the ability of VGG to extract high-level features is used to extract the high-level features of G(x), x, and R_y; in addition, the Gram matrix is used to extract the style features of the image reflections from these high-level features; finally, the content loss L_con and the reflection style loss L_gra are defined as:

L_con(G) = E_{x ~ X}[ || VGG_l(G(x)) - VGG_l(x) ||_1 ]    (5)

L_gra(G) = E_{x ~ X, R_y ~ Q}[ || Gram(VGG_l(G(x))) - Gram(VGG_l(R_y)) ||_1 ]    (6)

where VGG_l denotes the high-level feature map extracted at layer l of a 19-layer VGG network pre-trained on the ImageNet dataset; during training, the "conv4-4" layer is selected to compute these losses;
the reflection style loss contains the color information of the style-image reflections, while the Jeip model performs image decomposition mainly on the illumination (V) channel of the HSV color space; therefore, the RGB images are converted into HSV format and a color consistency loss is established so that the reflection color of the generated image stays close to the reflection color of the real photograph; since a large amount of texture information is contained in the V channel, an l_1 sparsity constraint is used on the V channel, and the Huber loss l_h is used on the hue (H) and saturation (S) channels; the color consistency loss L_col is defined as:

L_col(G) = E_{x ~ X, R_x ~ P}[ l_h(H(G(x)), H(R_x)) + l_h(S(G(x)), S(R_x)) + α || V(G(x)) - V(R_x) ||_1 ]    (7)

where {x, R_x} denotes the real photographs x in the real-photo domain X and their reflection components R_x in the photo reflection domain P; H(·), S(·), and V(·) denote the three channels of the HSV image, and α denotes the weight of the V channel;
2.2 training of LuminGAN
four lightweight channel attention modules (ECA) are integrated into the eight inverted residual blocks (IRB) of generator F to form a new residual block;
the cartoon images and their reflections are trained as a paired dataset for LuminGAN, so that generator F acquires the ability to reconstruct illumination characteristics; thus, the objective function L_I(F, D_I) mainly consists of an illumination adversarial loss L_adv^I, a content loss L_con^I, and a global consistency loss L_glo; the loss function of LuminGAN is expressed as:

L_I(F, D_I) = γ_1 L_adv^I(F, D_I) + γ_2 L_con^I(F) + γ_3 L_glo(F)    (8)

where γ_1 = 150, γ_2 = 0.5, and γ_3 = 1000 are the weights used to balance the LuminGAN losses;
the reflection component R_y of the cartoon image is input into generator F, which attempts to generate an image F(R_y) that is consistent with the real cartoon image y, while the purpose of discriminator D_I is to distinguish the synthesized image F(R_y) from the real cartoon image y; the illumination adversarial loss is constrained in the same way as equation (4):

L_adv^I(F, D_I) = E_{y ~ Y}[(D_I(y) - 1)^2] + E_{R_y ~ Q}[(D_I(F(R_y)))^2]    (9)

except that the cartoon image y and the synthesized image F(R_y) are input into discriminator D_I to obtain the real probability D_I(y) and the fake probability D_I(F(R_y)); the discriminator pushes D_I(y) toward the real label 1 and D_I(F(R_y)) toward the fake label 0, while the generator pushes D_I(F(R_y)) toward 1, so that generator F and discriminator D_I are trained in alternate iterations until convergence; {y} denotes the data of the cartoon images y in the cartoon domain Y;
to accelerate convergence in LuminGAN training, a content loss L_con^I with the same structure as the content loss in ReflectGAN is added to constrain generator F, the only difference being that the input real photograph x in ReflectGAN is replaced by the reflection component R_y of the cartoon image:

L_con^I(F) = E_{R_y ~ Q}[ || VGG_l(F(R_y)) - VGG_l(R_y) ||_1 ]    (10)

to highlight the edge structure of the image, the HSV-space color consistency loss is introduced to constrain generator F, and this consistency loss is applied to the entire image; therefore, the global consistency loss L_glo is defined as:

L_glo(F) = E_{R_y ~ Q, y ~ Y}[ l_h(H(F(R_y)), H(y)) + l_h(S(F(R_y)), S(y)) + β || V(F(R_y)) - V(y) ||_1 ]    (11)

where {R_y, y} denotes the reflection components R_y in the reflection domain Q of the cartoon images and the cartoon images y in the cartoon domain Y; H(·), S(·), and V(·) denote the three channels of the HSV image, and β = 2 denotes the weight of the V channel.
By jointly considering the intrinsic and extrinsic properties of images, the invention provides a generative adversarial network model based on Retinex. The model can effectively preserve the color characteristics of the content image and synthesize high-quality cartoon-style images.
Drawings
FIG. 1: RexGAN framework diagram
FIG. 2: Residual module in the LuminGAN generator F
FIG. 3: Subjective quality comparison with different methods
FIG. 4: Comparison of three different styles
FIG. 5: Verification of the effect of the color consistency loss
FIG. 6: Effect of different compositions of the global consistency loss on the results
Detailed Description
Fig. 1 shows that the implementation of the method is divided into a training stage and a testing stage. The method is realized by simultaneously training two generative adversarial networks, namely generator G with discriminator D_R and generator F with discriminator D_I; then, the real photograph x is taken as input and passed successively through generator G and generator F to finally obtain the output image F(G(x)). The method can be divided into the following three steps: preprocessing of the dataset, training of the RexGAN model, and synthesis of the cartoon images.
3. Preprocessing of data sets

In order to effectively estimate the illumination component I and the reflection component R of the real photographs and cartoon images, the intrinsic-extrinsic joint prior model (Jeip) is integrated into the mapping function (RexGAN). Therefore, the Jeip model is used to decompose the cartoon image y and the real photograph x, respectively, to obtain the reflection component R_y of the cartoon image and the reflection component R_x of the real photograph. The unpaired training sets {x_i} and {R_y,j} are then used to train generator G so that it learns the mapping function (ReflectGAN) from the photo domain X of the photographic images x to the reflection domain Q of the cartoon images y. The paired training set {(R_y,j, y_j)} is used to train generator F so that it learns the mapping function (LuminGAN) from the reflection domain Q of the cartoon images to the cartoon domain Y. The training data contain real photographs x and cartoon images y, while the test data contain only real photographs x. The dataset provided by AnimeGAN is used in the RexGAN training process; it contains 6656 real photos as the content image dataset, and key frames cut from three cartoon films (by Hayao Miyazaki, Makoto Shinkai, and Satoshi Kon, respectively) as the style image datasets, where different authors represent different styles. Furthermore, all training images are resized to 256 × 256.
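The preprocessing step can be sketched as follows; `resize_nearest` and `build_training_sets` are illustrative helper names (not from the patent), and the `decompose` callable stands in for the Jeip model:

```python
import numpy as np

def resize_nearest(img, size=256):
    """Nearest-neighbour resize to size x size.  An illustrative stand-in
    for the 256 x 256 resizing step; the patent does not name a filter."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def build_training_sets(photos, cartoons, decompose):
    """Sketch of the dataset preparation described above.

    decompose(img) -> (illumination, reflectance) stands in for the Jeip
    decomposition.  Returns the unpaired sets ({x}, {R_y}) for ReflectGAN
    and the paired set [(R_y, y), ...] for LuminGAN.
    """
    X = [resize_nearest(p) for p in photos]     # photo domain X
    Y = [resize_nearest(c) for c in cartoons]   # cartoon domain Y
    Q = [decompose(y)[1] for y in Y]            # reflection domain Q
    reflectgan_data = (X, Q)                    # unpaired {x_i}, {R_y,j}
    lumingan_data = list(zip(Q, Y))             # paired (R_y, y)
    return reflectgan_data, lumingan_data
```

Any decomposition routine with the same (illumination, reflectance) return convention can be plugged in for `decompose`.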
4. Training of the RexGAN model

Based on the assumption that the reflection components of images contain fine textures and have piecewise continuity, the training of the RexGAN model transfers the reflections of real photographs onto the corresponding cartoon images. Therefore, the RexGAN model includes two mappings, G: x → R_y (ReflectGAN) and F: R_y → y (LuminGAN). In addition, two adversarial discriminators D_R and D_I are introduced: D_R aims to distinguish the reflection component R_y from the converted image G(x) and corresponds to generator G; likewise, D_I aims to distinguish y from F(R_y) and corresponds to generator F. Thus, the objective function is expressed as:

(G*, F*) = arg min_{G, F} max_{D_R, D_I} L(G, F, D_R, D_I)    (1)

where arg min max denotes solving the minimax problem: similarly to training a single generative adversarial network, the two generative adversarial networks are trained by minimizing over generators G and F while maximizing over discriminators D_R and D_I.
The appearance of an object is influenced by both intrinsic and extrinsic properties. The intrinsic properties of an object, including shape and texture, are illumination independent. On this basis, ReflectGAN establishes the relation between the reflection layer of the real photograph and the reflection layer of the cartoon image, and learns the style characteristics of the cartoon image. LuminGAN essentially uses a generative adversarial network to preserve the structure of the target image and to reconstruct its illumination. Thus, the loss can be decomposed as:

L(G, F, D_R, D_I) = L_R(G, D_R) + L_I(F, D_I)    (2)

L_R(G, D_R) and L_I(F, D_I) represent the loss functions of ReflectGAN and LuminGAN, respectively; they are described in detail below.
4.1 Training of ReflectGAN
ReflectGAN is trained to learn the style characteristics of cartoon-image reflections. To reduce the training parameters of ReflectGAN, the generator model from AnimeGANv2 is introduced directly. In addition, a simple block-level discriminator is used to determine whether the generated result has the characteristics of the reflection components of cartoon images.
The loss function of ReflectGAN consists of a reflection adversarial loss L_adv^R, a content loss L_con, a reflection style loss L_gra, and a color consistency loss L_col. Thus, L_R(G, D_R) can be expressed as:

L_R(G, D_R) = ω_1 L_adv^R(G, D_R) + ω_2 L_con(G) + ω_3 L_gra(G) + ω_4 L_col(G)    (3)

where ω_1 = 300, ω_2 = 1.4, ω_3 = 2.5, and ω_4 = 100 are the weights used to balance the ReflectGAN losses. The weight values were obtained through a large number of experiments.
The realistic photograph x is input into generator G, which attempts to generate an image G(x) whose appearance style and texture are consistent with the reflection component R_y of real cartoon images, while the purpose of discriminator D_R is to distinguish the image G(x) from the reflection component R_y. Therefore, the generated image G(x) and the reflection component R_y are input into discriminator D_R to obtain the fake probability D_R(G(x)) and the real probability D_R(R_y). The discriminator pushes the real probability D_R(R_y) toward the real label 1 and the fake probability D_R(G(x)) toward the fake label 0, while the generator pushes D_R(G(x)) toward 1, so that generator G and discriminator D_R are trained in alternate iterations until convergence. In order to effectively learn the style characteristics of cartoon-image reflections, a reflection adversarial loss based on the least-squares loss is proposed to constrain generator G and discriminator D_R. The reflection adversarial loss L_adv^R can be expressed as:

L_adv^R(G, D_R) = E_{R_y ~ Q}[(D_R(R_y) - 1)^2] + E_{x ~ X}[(D_R(G(x)))^2]    (4)

where {R_y} denotes the data of the reflection components R_y in the reflection domain Q of the cartoon images, and {x} denotes the data of the real photographs x in the real-photo domain X.
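The least-squares adversarial training described above can be sketched as two scalar loss functions (a generic LSGAN-style sketch; the actual probabilities come from the block-level discriminator network):

```python
import numpy as np

def d_loss_lsgan(d_real, d_fake):
    """Discriminator side of the least-squares adversarial loss:
    push D_R(R_y) toward the real label 1 and D_R(G(x)) toward 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def g_loss_lsgan(d_fake):
    """Generator side: push D_R(G(x)) toward the real label 1."""
    return np.mean((d_fake - 1.0) ** 2)
```

Alternating updates minimize `g_loss_lsgan` over G and `d_loss_lsgan` over D_R until convergence.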
Perceptual loss is introduced as the content loss, which has the ability to preserve the image content and the overall spatial structure. Thus, the ability of VGG to extract high-level features is used to extract the high-level features of G(x), x, and R_y. In addition, the Gram matrix is used to extract the style features of the image reflections from these high-level features. Finally, the content loss L_con and the reflection style loss L_gra are defined as:

L_con(G) = E_{x ~ X}[ || VGG_l(G(x)) - VGG_l(x) ||_1 ]    (5)

L_gra(G) = E_{x ~ X, R_y ~ Q}[ || Gram(VGG_l(G(x))) - Gram(VGG_l(R_y)) ||_1 ]    (6)

where VGG_l denotes the high-level feature map extracted at layer l of a 19-layer VGG network pre-trained on the ImageNet dataset. During training, the "conv4-4" layer is chosen to compute these losses.
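The Gram-matrix style statistics and feature distances can be sketched as follows; the feature maps stand in for "conv4-4" activations of a pre-trained VGG-19, and the l1 distance and the H*W normalization of the Gram matrix are assumptions, since the patent does not spell them out:

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: channel-by-channel inner
    products of the flattened features, normalised by H * W."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def content_loss(feat_gx, feat_x):
    """l1 distance between feature maps of G(x) and x (content loss sketch)."""
    return np.mean(np.abs(feat_gx - feat_x))

def style_loss(feat_gx, feat_ry):
    """l1 distance between Gram matrices of G(x) and R_y features
    (reflection style loss sketch)."""
    return np.mean(np.abs(gram_matrix(feat_gx) - gram_matrix(feat_ry)))
```

The Gram matrix discards spatial layout and keeps channel correlations, which is why it captures style rather than content.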
The reflection style loss contains the color information of the style-image reflections, while the Jeip model performs image decomposition mainly on the illumination (V) channel of the HSV color space. Therefore, the RGB images are converted into HSV format and a color consistency loss is established so that the reflection color of the generated image stays close to the reflection color of the real photograph. Since a large amount of texture information is contained in the V channel, an l_1 sparsity constraint is used on the V channel, and the Huber loss l_h is used on the hue (H) and saturation (S) channels. The color consistency loss L_col is defined as:

L_col(G) = E_{x ~ X, R_x ~ P}[ l_h(H(G(x)), H(R_x)) + l_h(S(G(x)), S(R_x)) + α || V(G(x)) - V(R_x) ||_1 ]    (7)

where {x, R_x} denotes the real photographs x in the real-photo domain X and their reflection components R_x in the photo reflection domain P. H(·), S(·), and V(·) denote the three channels of the HSV image, and α denotes the weight of the V channel.
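The HSV color consistency loss can be sketched directly from its description: Huber loss on the H and S channels and a weighted l1 term on the V channel. The Huber delta is an assumption, as the patent does not state it:

```python
import numpy as np

def huber(a, b, delta=1.0):
    """Huber loss l_h: quadratic for small residuals, linear for large
    ones (delta is an assumed threshold, not given in the patent)."""
    r = np.abs(a - b)
    return np.mean(np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta)))

def color_consistency_loss(hsv_gen, hsv_ref, alpha=2.0):
    """Sketch of L_col on HSV images of shape (H, W, 3): Huber loss on
    hue (channel 0) and saturation (channel 1), l1 on the illumination/
    value channel (channel 2) weighted by alpha (the V-channel weight)."""
    h_g, s_g, v_g = hsv_gen[..., 0], hsv_gen[..., 1], hsv_gen[..., 2]
    h_r, s_r, v_r = hsv_ref[..., 0], hsv_ref[..., 1], hsv_ref[..., 2]
    return huber(h_g, h_r) + huber(s_g, s_r) + alpha * np.mean(np.abs(v_g - v_r))
```

The l1 term on V tolerates the dense texture that channel carries, while the Huber terms keep hue and saturation stable without over-penalizing outliers.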
4.2 Training of LuminGAN
The purpose of LuminGAN is to reconstruct the illumination components of the cartoon images. Therefore, the AnimeGAN generator is used directly, together with a conventional pixel-level discriminator. To avoid generating high-frequency artifacts in the image and to reduce the training complexity, four lightweight channel attention modules (ECA) are integrated into the eight inverted residual blocks (IRB) of generator F to form a new residual block, as shown in Fig. 2.
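The channel attention inside the new residual block can be sketched as follows. The 1-D convolution kernel is identity-initialised purely for illustration (a trained ECA module learns these weights), so this shows only the data flow, not the learned behaviour:

```python
import numpy as np

def eca_attention(x, k=3):
    """Sketch of a lightweight channel attention (ECA-style) module on a
    (C, H, W) feature map: global average pooling, a 1-D convolution of
    kernel size k across channels, and a sigmoid gate per channel."""
    c = x.shape[0]
    pooled = x.mean(axis=(1, 2))                 # (C,) channel descriptor
    weights = np.zeros(k)
    weights[k // 2] = 1.0                        # stand-in (identity) kernel
    padded = np.pad(pooled, k // 2, mode="edge")
    conv = np.array([np.dot(padded[i:i + k], weights) for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-conv))           # sigmoid gate in (0, 1)
    return x * gate[:, None, None]               # rescale each channel

def inverted_residual_with_eca(x, k=3):
    """Residual connection wrapped around the attention, mimicking the
    combination of ECA modules with inverted residual blocks."""
    return x + eca_attention(x, k)
```

The module adds almost no parameters (only the k-tap kernel per block), which is what makes it lightweight.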
In order to give the generated images clearer edges and better visual perception, the cartoon images and their reflections are trained as a paired dataset for LuminGAN, making generator F capable of reconstructing illumination characteristics. Thus, the objective function L_I(F, D_I) mainly consists of an illumination adversarial loss L_adv^I, a content loss L_con^I, and a global consistency loss L_glo. The loss function of LuminGAN is expressed as:

L_I(F, D_I) = γ_1 L_adv^I(F, D_I) + γ_2 L_con^I(F) + γ_3 L_glo(F)    (8)

where γ_1 = 150, γ_2 = 0.5, and γ_3 = 1000 are the weights used to balance the LuminGAN losses.
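Assembling the weighted objectives with the stated weights (ω_1..ω_4 for ReflectGAN, γ_1..γ_3 for LuminGAN) can be sketched as two small helpers; the individual loss terms are passed in as already-computed scalars:

```python
def reflectgan_loss(l_adv, l_con, l_gra, l_col,
                    w=(300.0, 1.4, 2.5, 100.0)):
    """Weighted ReflectGAN objective with the weights stated in the text
    (omega_1 = 300, omega_2 = 1.4, omega_3 = 2.5, omega_4 = 100)."""
    return w[0] * l_adv + w[1] * l_con + w[2] * l_gra + w[3] * l_col

def lumingan_loss(l_adv, l_con, l_glo,
                  g=(150.0, 0.5, 1000.0)):
    """Weighted LuminGAN objective with the weights stated in the text
    (gamma_1 = 150, gamma_2 = 0.5, gamma_3 = 1000)."""
    return g[0] * l_adv + g[1] * l_con + g[2] * l_glo
```

The relative magnitudes show the design emphasis: the adversarial and consistency terms dominate, while the content terms act as mild regularizers.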
The reflection component R_y of the cartoon image is input into generator F, which attempts to generate an image F(R_y) that is consistent with the real cartoon image y, while the purpose of discriminator D_I is to distinguish the synthesized image F(R_y) from the real cartoon image y. The illumination adversarial loss is constrained in the same way as equation (4):

L_adv^I(F, D_I) = E_{y ~ Y}[(D_I(y) - 1)^2] + E_{R_y ~ Q}[(D_I(F(R_y)))^2]    (9)

except that the cartoon image y and the synthesized image F(R_y) are input into discriminator D_I to obtain the real probability D_I(y) and the fake probability D_I(F(R_y)). The discriminator pushes D_I(y) toward the real label 1 and D_I(F(R_y)) toward the fake label 0, while the generator pushes D_I(F(R_y)) toward 1, so that generator F and discriminator D_I are trained in alternate iterations until convergence. {y} denotes the data of the cartoon images y in the cartoon domain Y.
To accelerate convergence in LuminGAN training, a content loss L_con^I with the same structure as the content loss in ReflectGAN is added to constrain generator F, the only difference being that the input real photograph x in ReflectGAN is replaced by the reflection component R_y of the cartoon image:

L_con^I(F) = E_{R_y ~ Q}[ || VGG_l(F(R_y)) - VGG_l(R_y) ||_1 ]    (10)

To highlight the edge structure of the image, the HSV-space color consistency loss is introduced to constrain generator F, and this consistency loss is applied to the entire image. Therefore, the global consistency loss L_glo is defined as:

L_glo(F) = E_{R_y ~ Q, y ~ Y}[ l_h(H(F(R_y)), H(y)) + l_h(S(F(R_y)), S(y)) + β || V(F(R_y)) - V(y) ||_1 ]    (11)

where {R_y, y} denotes the reflection components R_y in the reflection domain Q of the cartoon images and the cartoon images y in the cartoon domain Y. H(·), S(·), and V(·) denote the three channels of the HSV image, and β = 2 denotes the weight of the V channel.
5. Synthesis of cartoon images
Generator G first converts the input photographic image x into G(x), whose statistical characteristics are similar to those of the reflection components of cartoon images; generator F then converts G(x) into the cartoon image F(G(x)).
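The test-stage pipeline is a simple composition of the two trained generators, which can be sketched with the generators passed in as callables (the lambdas in the test are placeholders, not real networks):

```python
import numpy as np

def cartoonize(x, generator_g, generator_f):
    """Test-stage pipeline: generator G maps the photo x to an image G(x)
    with cartoon-reflection-like statistics; generator F then reconstructs
    the illumination to produce the final cartoon image F(G(x))."""
    gx = generator_g(x)       # photo -> cartoon-reflection statistics
    return generator_f(gx)    # reflection -> final cartoon image
```

Any trained G and F with matching input/output shapes can be plugged in; only the real photograph x is needed at test time.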
In order to verify the effectiveness of the proposed solution of the invention, experimental verification was carried out from different aspects. The results were compared with four of the most advanced works at present, and the experimental results are shown in fig. 3. As a result, the texture information of the real photo is kept, the clear edge is reproduced, and the whole color is more in line with the visual effect. In addition, attempts were made to test the method from different styles, with the results shown in FIG. 4.
Fig. 5 shows how the color consistency loss affects the generation results of ReflectGAN. The content loss and reflection style loss enable generator G to produce stylized images, but they tend toward over-stylization. Images (c) and (d) show clear textures once the color consistency loss is used. However, different values of α have a significant effect on the results: compared with α = 1, the result with α = 2 shows more accurate color. To further investigate the effect of different compositions of the global consistency loss on LuminGAN, β = 2 was set in equation (11), and the three constituent forms of the loss were compared. As shown in Fig. 6, although the RGB-format global consistency presents a good appearance, its structure is blurred; conversely, the result using only the HSV format as a constraint is the opposite of the RGB format. In contrast to both, image (d) shows sharp edges and a pleasing color appearance.
Claims (1)
1. A cartoon style migration method based on a Retinex model is characterized by comprising the following steps:
The method is realized by a training stage and a testing stage. First, two generative adversarial networks are trained simultaneously: generator G with discriminator D_R, and generator F with discriminator D_I. Then the real photo x is taken as input and passed successively through generator G and generator F to obtain the output image F(G(x));
the method comprises the following three steps: preprocessing a data set, training a RexGAN model and synthesizing cartoon images;
1) preprocessing of data sets
To efficiently estimate the illumination component I and reflection component R of real photos and cartoon images, a joint intrinsic-extrinsic prior model (JieP) is integrated into the mapping function RexGAN. The cartoon image y and the real photo x are each decomposed with the JieP model to obtain the reflection component R_y of the cartoon image and the reflection component R_x of the real photo. Unpaired training data sets are then used to train generator G to learn a mapping function (ReflectGAN) from the photo domain X of the photo image x to the reflection domain Q of the cartoon image y; a paired training data set is used to train generator F to learn a mapping function (LuminGAN) from the cartoon reflection domain Q to the cartoon domain Y. The training data comprise real photos x and cartoon images y, while the test data comprise only real photos x. In training RexGAN, the data set provided by AnimeGAN is used, and all training images are resized to 256 × 256;
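As an illustrative sketch only: the JieP prior model named above is a published optimization method whose details are not reproduced in this text, so a minimal NumPy stand-in for a Retinex-style decomposition of the V channel (local mean as illumination, pointwise quotient as reflection) is given below. The names `estimate_illumination` and `retinex_decompose` are hypothetical.

```python
import numpy as np

def estimate_illumination(v, ksize=15):
    """Crude illumination estimate: local mean of the V (value) channel.
    This is only a stand-in for the JieP joint intrinsic-extrinsic
    prior model; the real model solves a variational problem."""
    pad = ksize // 2
    padded = np.pad(v, pad, mode="edge")
    out = np.empty_like(v)
    for i in range(v.shape[0]):
        for j in range(v.shape[1]):
            out[i, j] = padded[i:i + ksize, j:j + ksize].mean()
    # clip away zeros so the division below is safe
    return np.clip(out, 1e-3, 1.0)

def retinex_decompose(v):
    """Split the V channel into illumination I and reflection R
    under the Retinex assumption V = I * R."""
    illum = estimate_illumination(v)
    reflect = v / illum
    return illum, reflect
```

In the method above this decomposition would be applied to both the real photo x (giving R_x) and the cartoon image y (giving R_y) before any network training.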
2) training of RexGAN model
Based on the assumption that the reflection component of an image contains fine textures and is piecewise continuous, the reflection of the real photo is transferred to the corresponding cartoon image. The RexGAN model therefore includes two mappings, G: X → R_y (ReflectGAN) and F: R_y → y (LuminGAN). In addition, two adversarial discriminators D_R and D_I are introduced: D_R distinguishes the image R_y from the converted image G(x) and corresponds to generator G; likewise, D_I distinguishes y from F(R_y) and corresponds to generator F. The overall objective function is expressed as:
where arg min max denotes solving the min-max problem: as in training a single generative adversarial network, training the two generative adversarial networks amounts to minimizing over generators G and F while maximizing over discriminators D_R and D_I;
LuminGAN essentially uses a generative adversarial network to preserve the structure of the target image and reconstruct its illumination; the objective thus decomposes as:
L(G, F, D_R, D_I) = L_R(G, D_R) + L_I(F, D_I)    (2)
where L_R(G, D_R) and L_I(F, D_I) denote the loss functions of ReflectGAN and LuminGAN, respectively, which are described in detail below;
2.1 Training of ReflectGAN
ReflectGAN is trained to learn the style characteristics of cartoon image reflection. To reduce the number of training parameters of ReflectGAN, the generator model of AnimeGANv2 is adopted directly. In addition, a simple patch-level discriminator is used to judge whether the generated result has the characteristics of the cartoon image reflection component;
The loss function of ReflectGAN consists of a reflection adversarial loss, a content loss, a reflection style loss L_gra, and a color consistency loss L_col; thus L_R(G, D_R) is expressed as:
where ω_1 = 300, ω_2 = 1.4, ω_3 = 2.5, and ω_4 = 100 are the weights used to balance the ReflectGAN losses;
The real photo x is input into generator G, which attempts to generate an image G(x) whose appearance style and texture match the reflection component R_y of a real cartoon image; the purpose of discriminator D_R is to distinguish the generated image G(x) from the reflection component R_y. The generated image G(x) and the reflection component R_y are therefore input to discriminator D_R to obtain a false probability D_R(G(x)) and a true probability D_R(R_y). The generator G is trained so that the false probability D_R(G(x)) approaches the true label 1, while the discriminator D_R is trained so that D_R(R_y) approaches the true label 1 and D_R(G(x)) approaches the false label 0; generator G and discriminator D_R are thus trained in alternating iterations until convergence. To effectively learn the style characteristics of cartoon image reflection, a reflection adversarial loss based on the least-squares loss is proposed to constrain generator G and discriminator D_R. The reflection adversarial loss is then expressed as:
where the expectations are taken over the data set of reflection components R_y in the cartoon reflection domain Q and the data set of real photos x in the real photo domain X;
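The least-squares adversarial scheme described above (fake pushed toward label 1 for the generator; real toward 1 and fake toward 0 for the discriminator) can be sketched as follows. This is a generic LSGAN-style formulation rather than the patent's exact equation (4), and the function names are hypothetical.

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push D(R_y) toward the true
    label 1 and D(G(x)) toward the false label 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push D(G(x)) toward the true
    label 1, i.e. fool the discriminator."""
    return np.mean((d_fake - 1.0) ** 2)
```

During the alternating iterations, one step minimizes `lsgan_d_loss` over the discriminator's parameters, and the next minimizes `lsgan_g_loss` over the generator's parameters.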
A perceptual loss is introduced as the content loss for its ability to preserve image content and the overall spatial structure. The high-level features of G(x), x, and R_y are therefore extracted with a VGG network, and the style characteristics of the image reflection are further extracted from these high-level features with a Gram matrix. Finally, the content loss and the reflection style loss L_gra are defined as:
where the expectations are taken over the data sets of reflection components R_y in the cartoon reflection domain Q and real photos x in the real photo domain X; VGG denotes the high-level feature map extracted by a 19-layer VGG network pre-trained on the ImageNet data set, and l denotes the feature map of a specific VGG layer; during training, the 'conv4_4' layer is selected to compute this loss;
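The content and Gram-based style losses can be sketched as follows on pre-extracted feature maps. Using the l1 distance is an assumption here (the exact distances appear only in the omitted equations), and `gram_matrix`, `style_loss`, and `content_loss` are hypothetical names; in practice the feature maps would come from the 'conv4_4' layer of the pre-trained VGG-19.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: channel-wise
    correlations that capture texture/style while discarding
    spatial layout."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(feat_gen, feat_style):
    """l1 distance between Gram matrices, standing in for the
    reflection style loss L_gra."""
    return np.abs(gram_matrix(feat_gen) - gram_matrix(feat_style)).mean()

def content_loss(feat_gen, feat_content):
    """l1 distance between raw feature maps (perceptual content loss)."""
    return np.abs(feat_gen - feat_content).mean()
```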
The reflection style loss carries the color information of the style image reflection, and the JieP model performs image decomposition mainly on the illumination (V) channel of the HSV color space. The RGB-format image is therefore converted to HSV format, and a color consistency loss is established so that the reflection color of the generated image stays close to that of the real photo. Since the V channel contains a large amount of texture information, an l1 sparsity constraint is applied to the V channel, while the Huber loss l_h is applied to the hue (H) and saturation (S) channels. The color consistency loss L_col is defined as:
where the expectations are taken over the data set of real photos x in the real photo domain X and real-photo reflection components R_x in the real-photo reflection domain P; H(·), S(·), and V(·) denote the three channels of the HSV-format image, and α denotes the weight of the V channel;
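A minimal sketch of the HSV-space color consistency loss, assuming the l1 term on V and the Huber terms l_h on H and S combine additively with the V term weighted by α (the exact combination appears only in the omitted equation); `color_consistency_loss` is a hypothetical name.

```python
import numpy as np

def huber(x, delta=1.0):
    """Huber loss l_h: quadratic near zero, linear in the tails."""
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))

def color_consistency_loss(gen_hsv, ref_hsv, alpha=2.0):
    """L_col sketch: Huber loss on the H and S channels, l1 on the
    V channel weighted by alpha (the experiments above use alpha = 2).
    Each argument is a (H_chan, S_chan, V_chan) tuple of 2D arrays."""
    h_g, s_g, v_g = gen_hsv
    h_r, s_r, v_r = ref_hsv
    return (huber(h_g - h_r).mean()
            + huber(s_g - s_r).mean()
            + alpha * np.abs(v_g - v_r).mean())
```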
2.2 Training of LuminGAN
Four lightweight efficient channel attention (ECA) modules are integrated into the eight inverted residual blocks (IRB) of generator F to form a new residual block;
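The ECA module referenced above is the published efficient channel attention block: global average pooling, a small 1D convolution across the channel descriptor, and a sigmoid gate. A minimal NumPy sketch with an explicit filter is given below; how the four modules are placed inside the inverted residual blocks is not detailed in this text, so `eca_attention` is only illustrative.

```python
import numpy as np

def eca_attention(x, kernel):
    """Efficient channel attention on a (C, H, W) feature map:
    global average pooling gives one descriptor per channel, a 1D
    convolution (explicit filter `kernel`, zero-padded) models
    cross-channel interaction, and a sigmoid gate rescales channels."""
    c = x.shape[0]
    k = kernel.shape[0]
    desc = x.mean(axis=(1, 2))               # (C,) squeeze
    padded = np.pad(desc, k // 2)            # zero-pad the descriptor
    conv = np.array([padded[i:i + k] @ kernel for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-conv))       # sigmoid excitation
    return x * gate[:, None, None]           # channel-wise rescale
```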
The cartoon image and its reflection component are used as a paired data set to train LuminGAN, so that generator F acquires the ability to reconstruct illumination characteristics. The objective function L_I(F, D_I) consists mainly of an illumination adversarial loss, a content loss, and a global consistency loss L_glo. The loss function of LuminGAN is expressed as:
where γ_1 = 150, γ_2 = 0.5, and γ_3 = 1000 are the weights used to balance the LuminGAN losses;
The reflection component R_y of the cartoon image is input into generator F, which attempts to generate an image F(R_y) that coincides with the real cartoon image y; the aim of discriminator D_I is to distinguish the synthesized image F(R_y) from the real cartoon image y;
Equation (9) is constrained in the same way as equation (4), except that the cartoon image y and its reflection component R_y are separately input to the discriminator D_I to obtain a true probability D_I(y) and a false probability D_I(F(R_y)). The generator F is trained so that the false probability D_I(F(R_y)) approaches the true label 1, while the discriminator D_I is trained so that D_I(y) approaches the true label 1 and D_I(F(R_y)) approaches the false label 0; generator F and discriminator D_I are thus trained in alternating iterations until convergence; the expectation is taken over the data set of cartoon images y in the cartoon image domain Y;
To accelerate convergence during LuminGAN training, a content loss with the same structure as the content loss in ReflectGAN is added to constrain generator F; the only difference is that the input real photo x of ReflectGAN is replaced by the reflection component R_y of the input cartoon image. To highlight the edge structure of the image, an HSV-space color consistency loss is introduced to constrain generator F, and this consistency loss is applied to the entire image. The global consistency loss L_glo is therefore defined as:
where the expectation is taken over the data set of reflection components R_y in the cartoon reflection domain Q and cartoon images y in the cartoon image domain Y; H(·), S(·), and V(·) denote the three channels of the HSV-format image, and β = 2 is the weight of the V channel.
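Assuming the global consistency loss mirrors L_col but is applied to the whole image with β = 2 weighting the V channel (consistent with the description above, though the exact equation is omitted from this text), a sketch could be:

```python
import numpy as np

def huber(x, delta=1.0):
    """Huber loss l_h: quadratic near zero, linear in the tails."""
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))

def global_consistency_loss(gen_hsv, target_hsv, beta=2.0):
    """L_glo sketch: HSV-space consistency between the synthesized
    image F(R_y) and the cartoon image y over the entire image;
    beta weights the V (illumination) channel, mirroring alpha in
    L_col. Each argument is a (H, S, V) tuple of 2D arrays."""
    h_g, s_g, v_g = gen_hsv
    h_t, s_t, v_t = target_hsv
    return (huber(h_g - h_t).mean()
            + huber(s_g - s_t).mean()
            + beta * np.abs(v_g - v_t).mean())
```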
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110305033.7A CN113066114A (en) | 2021-03-10 | 2021-03-10 | Cartoon style migration method based on Retinex model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113066114A true CN113066114A (en) | 2021-07-02 |
Family
ID=76563397
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110305033.7A Pending CN113066114A (en) | 2021-03-10 | 2021-03-10 | Cartoon style migration method based on Retinex model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113066114A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110458750A (en) * | 2019-05-31 | 2019-11-15 | 北京理工大学 | A kind of unsupervised image Style Transfer method based on paired-associate learning |
CN111325661A (en) * | 2020-02-21 | 2020-06-23 | 京工数演(福州)科技有限公司 | Seasonal style conversion model and method for MSGAN image |
CN112330535A (en) * | 2020-11-27 | 2021-02-05 | 江南大学 | Picture style migration method |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113870102A (en) * | 2021-12-06 | 2021-12-31 | 深圳市大头兄弟科技有限公司 | Animation method, device, equipment and storage medium of image |
CN113870102B (en) * | 2021-12-06 | 2022-03-08 | 深圳市大头兄弟科技有限公司 | Animation method, device, equipment and storage medium of image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||