CN113066114A - Cartoon style migration method based on Retinex model

Cartoon style migration method based on Retinex model

Info

Publication number
CN113066114A
CN113066114A
Authority
CN
China
Prior art keywords
image
loss
reflection
cartoon
generator
Prior art date
Legal status
Pending
Application number
CN202110305033.7A
Other languages
Chinese (zh)
Inventor
施云惠
欧阳浩然
齐娜
尹宝才
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202110305033.7A priority Critical patent/CN113066114A/en
Publication of CN113066114A publication Critical patent/CN113066114A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 13/00 Animation
    • G06T 13/20 3D [Three Dimensional] animation
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/50 Lighting effects
    • G06T 7/00 Image analysis
    • G06T 7/40 Analysis of texture
    • G06T 7/90 Determination of colour characteristics

Abstract

A cartoon style migration method based on a Retinex model, belonging to the field of computer vision. Converting real-world photographs into cartoon-style images is a meaningful and challenging task. Existing methods cannot obtain satisfactory cartoon results because they do not separately consider the consistency and continuity of cartoon images and real photos in terms of structure, texture and illumination. The Retinex model in the invention jointly learns the intrinsic properties (shape and texture) and the extrinsic property (illumination) of photos and cartoon images. The RexGAN framework comprises ReflectGAN, which learns the mapping from photo images to cartoon-image reflections through a reflection loss, and LuminGAN, which further refines the structure and illumination of the generated images through an illumination loss. The invention can generate high-quality cartoon images from real photos, with clear edges and structures and correct illumination, and is superior to state-of-the-art methods in subjective quality.

Description

Cartoon style migration method based on Retinex model
Technical Field
The invention relates to the fields of style transfer and low-light image enhancement, and presents a cartoon style migration method based on a Retinex decomposition model. The invention belongs to the field of computer vision, and in particular involves neural style transfer and Retinex image decomposition techniques.
Background
In recent years, film and television works on science-fiction themes have become increasingly popular. With the development of computer graphics, the production quality of visual effects has greatly improved, and creators can now produce finer, more realistic and more striking visuals on a computer. Mainstream visual-effects software currently includes Houdini, Nuke and MAYA; the character and scene modeling of the well-known animated films Final Fantasy and Madagascar was done in MAYA. However, the effects production process built on such software is complicated, including scene construction, and requires not only substantial financial and human resources but also professional technicians and a great deal of time. Recently, in the field of deep learning, research on style transfer has been receiving attention; its main goal is to convert an ordinary image into a painting with an artistic style, which can provide good technical support for producing film and television effects. The artistic style of cartoons emphasizes the information closely related to the theme while simplifying and eliminating extraneous detail. Because of this unique appeal, the style is widely used not only in film and video but also in games, advertising and other fields. Research on cartoon style migration is therefore of great significance.
Since the 1990s, many non-photorealistic rendering (NPR) algorithms have been developed for specific styles, including cartoon, oil-painting and ink-wash styles. Researchers often use cel shading or filtering to obtain specific looks, and these methods are widely used in various software. In recent years, neural style transfer (NST) methods based on convolutional neural networks have emerged, which achieve good results on painting styles by exploiting the correlations between deep features. In addition, another group of methods based on generative adversarial networks (GANs) transfers images between two domains in an adversarial manner. Subsequently, a series of cycle-consistency-based methods was developed to accomplish domain transfer by training on unpaired data. CartoonGAN, ComixGAN and AnimeGAN have also achieved great success in cartoon style migration. However, these methods still show unsatisfactory results in two respects: 1) the structure and texture of low-light regions of the original image are easily lost; 2) the resulting cartoon image does not retain the global color appearance of the original image.
In the field of low-light enhancement, the Retinex model is used as a perception model of the human visual system to decompose an image into illumination and reflection components. Its physical model can be written as O = I ∘ R, i.e., the observed image O is decomposed into illumination I and reflectance R, where ∘ denotes element-wise multiplication. In recent years, Retinex-based methods represented by WVM, JieP and STAR can decompose images into illumination and reflection components well. The JieP model preserves structural information through a shape prior, estimates reflectance through a texture prior, and recovers the light source through an illumination prior. These properties can therefore be exploited to better convey stylization, improve illumination and preserve the structure of the generated image.
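As an illustration of the O = I ∘ R factorization, the following minimal Python sketch decomposes the HSV value channel into illumination and reflectance. It approximates illumination with a Gaussian blur rather than the JieP prior optimization actually used by the method; the function name and smoothing choice are illustrative assumptions, not the patent's algorithm.

```python
import cv2
import numpy as np

def retinex_decompose(image_bgr, sigma=15.0, eps=1e-4):
    """Decompose an image as O = I * R (element-wise), Retinex-style.

    Illumination I is approximated here by Gaussian smoothing of the HSV
    value channel; the JieP model replaces this crude estimate with a
    joint shape/texture/illumination prior optimization.
    """
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    observed = hsv[..., 2] / 255.0                     # observed brightness O
    illum = cv2.GaussianBlur(observed, (0, 0), sigma)  # smooth illumination I
    illum = np.maximum(illum, observed)                # keep reflectance R <= 1
    reflect = observed / (illum + eps)                 # R = O / I
    return illum, reflect
```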
A generative adversarial network (GAN) framework based on the Retinex model is proposed and denoted RexGAN. First, the influence of the extrinsic property (illumination) is eliminated by using the JieP Retinex model to decompose each image into its reflection component. The mapping from photo reflections to cartoon reflections is then learned in an iterative, adversarial manner by exploiting the piecewise-continuous nature of the reflection components together with a reflection loss. Finally, an illumination loss further improves the illumination of the generated image while protecting its structure.
Disclosure of Invention
Neural style transfer methods generally require that the content image and the style image be structurally similar for the style conversion to succeed, and they emphasize painterly styles. Generative adversarial networks can realize transfer between two completely different domains; they are therefore one of the current research hotspots, have also been applied to style transfer, and achieve good results. The most representative are CartoonGAN, ComixGAN and AnimeGAN. However, these methods show unsatisfactory results in two respects: 1) the structure and texture of low-light regions of the original image are easily lost; 2) the resulting cartoon image does not retain the global color appearance of the original image.
A cartoon style migration method based on a Retinex model is characterized by comprising the following steps:
the method comprises a training stage and a testing stage; firstly, two generative adversarial networks are trained simultaneously: generator G with discriminator D_R, and generator F with discriminator D_I; then a real photo x is taken as input and passed through generator G and generator F in sequence to finally obtain an output image F(G(x));
the method comprises the following three steps: preprocessing a data set, training a RexGAN model and synthesizing cartoon images;
1. Preprocessing of the data set
To effectively estimate the illumination component I and the reflection component R of real photos and cartoon images, the intrinsic-extrinsic joint prior model JieP is integrated into the RexGAN mapping pipeline; the cartoon image y and the real photo x are therefore each decomposed with the JieP model to obtain the cartoon reflection component R_y and the photo reflection component R_x; the unpaired training sets {x_i | x_i ∈ X} and {R_y^j | R_y^j ∈ Q} are then used to train generator G to learn the mapping function (ReflectGAN) from the photo domain X of photo images x to the reflection domain Q of cartoon images y; finally, the paired training set {(R_y^j, y_j) | R_y^j ∈ Q, y_j ∈ Y} is used to train generator F to learn the mapping function (LuminGAN) from the cartoon reflection domain Q to the cartoon domain Y; the training data comprise real photos x and cartoon images y, while the test data comprise only real photos x; in the training of RexGAN, the data set provided by AnimeGAN is used; furthermore, all training images are resized to 256 × 256;
2. Training of the RexGAN model
Based on the assumption that the reflection component of an image contains fine texture and is piecewise continuous, the reflection of a real photo is transferred onto a corresponding cartoon image; the RexGAN model therefore comprises two mappings, G: x → R_y (ReflectGAN) and F: R_y → y (LuminGAN); in addition, two adversarial discriminators D_R and D_I are introduced: D_R distinguishes the reflection component R_y from the converted image G(x) and corresponds to generator G; likewise, D_I distinguishes y from F(R_y) and corresponds to generator F; the objective function is thus expressed as:
G*, F* = arg min_{G,F} max_{D_R,D_I} L(G, F, D_R, D_I)   (1)
where arg min max denotes solving a minimax problem: as in the training of a single generative adversarial network, the two generative adversarial networks are trained by minimizing over generators G and F while maximizing over discriminators D_R and D_I;
LuminGAN essentially uses a generative adversarial network to preserve the structure of the target image and reconstruct its illumination; the total loss is thus decomposed as:
L(G, F, D_R, D_I) = L_R(G, D_R) + L_I(F, D_I)   (2)
where L_R(G, D_R) and L_I(F, D_I) denote the loss functions of ReflectGAN and LuminGAN, respectively, which are described in detail below;
2.1 Training of ReflectGAN
ReflectGAN is trained to learn the style characteristics of cartoon image reflections; to reduce the number of training parameters of ReflectGAN, the generator model of AnimeGANv2 is adopted directly; in addition, a simple patch-level discriminator is used to judge whether the generated result has the characteristics of cartoon reflection components;
the loss function of ReflectGAN consists of the reflection adversarial loss L_adv^R, the content loss L_con^R, the reflection style loss L_gra and the color consistency loss L_col; L_R(G, D_R) is thus expressed as:
L_R(G, D_R) = ω_1 L_adv^R(G, D_R) + ω_2 L_con^R(G) + ω_3 L_gra(G) + ω_4 L_col(G)   (3)
where ω_1 = 300, ω_2 = 1.4, ω_3 = 2.5 and ω_4 = 100 are the weights used to balance the ReflectGAN losses;
the real photo x is input into generator G, which attempts to generate an image G(x) whose appearance style and texture are consistent with the reflection component R_y of real cartoon images, while the purpose of discriminator D_R is to distinguish the generated image G(x) from the reflection component R_y; the generated image G(x) and the reflection component R_y are therefore input into discriminator D_R to obtain the fake probability D_R(G(x)) and the real probability D_R(R_y); the discriminator pushes D_R(R_y) toward the real label 1 and D_R(G(x)) toward the fake label 0, while the generator pushes D_R(G(x)) toward 1, so that generator G and discriminator D_R are trained in alternating iterations until convergence; to effectively learn the style characteristics of cartoon reflections, a reflection adversarial loss based on the least-squares loss is proposed to constrain generator G and discriminator D_R; the reflection adversarial loss L_adv^R is then expressed as:
L_adv^R(G, D_R) = E_{R_y∼Q}[(D_R(R_y) − 1)^2] + E_{x∼X}[(D_R(G(x)))^2]   (4)
where R_y ∼ Q denotes reflection components in the cartoon reflection domain Q and x ∼ X denotes real photos in the photo domain X;
a perceptual loss is introduced as the content loss, which has the ability to preserve the image content and overall spatial structure; the high-level image features of G(x), x and R_y are therefore extracted with a VGG network, exploiting its ability to extract high-level features; in addition, a Gram matrix is used to extract the reflection style features from the high-level image features; finally, the content loss L_con^R and the reflection style loss L_gra are defined as:
L_con^R(G) = E_{x∼X}[ ‖VGG_l(G(x)) − VGG_l(x)‖_1 ]   (5)
L_gra(G) = E_{x∼X, R_y∼Q}[ ‖Gram(VGG_l(G(x))) − Gram(VGG_l(R_y))‖_1 ]   (6)
where R_y ∼ Q and x ∼ X are as above, VGG_l denotes the high-level feature map extracted by layer l of a 19-layer VGG network pre-trained on the ImageNet data set, and l indexes a specific VGG layer; during training, the "conv4-4" layer is selected to compute these losses;
the reflection style loss contains the color information of the style-image reflections, and the JieP model performs its decomposition mainly on the illumination (V) channel of the HSV color space; the RGB image is therefore converted into HSV format and a color consistency loss is established so that the reflection color of the generated image stays close to that of the real photo; since the V channel contains a large amount of texture information, an l_1 sparsity constraint is used on the V channel, while the Huber loss l_h is used on the hue (H) and saturation (S) channels; the color consistency loss L_col is defined as:
L_col(G) = E_{x∼X, R_x∼P}[ l_h(H(G(x)), H(R_x)) + l_h(S(G(x)), S(R_x)) + α‖V(G(x)) − V(R_x)‖_1 ]   (7)
where x ∼ X denotes real photos in the photo domain X and R_x ∼ P denotes photo reflection components in the photo reflection domain P; H(·), S(·) and V(·) denote the three channels of the HSV image, and α denotes the weight of the V channel;
2.2 Training of LuminGAN
Four lightweight channel attention (ECA) modules are integrated into the eight inverted residual blocks (IRBs) of generator F to form a new residual block;
the cartoon images and their reflections are trained as a set of paired data for LuminGAN, so that generator F acquires the ability to reconstruct illumination characteristics; the objective function L_I(F, D_I) thus consists mainly of the illumination adversarial loss L_adv^I, the content loss L_con^I and the global consistency loss L_glo; the loss function of LuminGAN is expressed as:
L_I(F, D_I) = γ_1 L_adv^I(F, D_I) + γ_2 L_con^I(F) + γ_3 L_glo(F)   (8)
where γ_1 = 150, γ_2 = 0.5 and γ_3 = 1000 are the weights used to balance the LuminGAN losses;
the reflection component R_y of a cartoon image is input into generator F, which attempts to generate an image F(R_y) consistent with the real cartoon image y, while the purpose of discriminator D_I is to distinguish the synthesized image F(R_y) from the real cartoon image y;
the illumination adversarial loss L_adv^I is therefore defined as:
L_adv^I(F, D_I) = E_{y∼Y}[(D_I(y) − 1)^2] + E_{R_y∼Q}[(D_I(F(R_y)))^2]   (9)
equation (9) is constrained in the same way as equation (4), except that the cartoon image y and the synthesized image F(R_y) are input into discriminator D_I to obtain the real probability D_I(y) and the fake probability D_I(F(R_y)); the discriminator pushes D_I(y) toward the real label 1 and D_I(F(R_y)) toward the fake label 0, while the generator pushes D_I(F(R_y)) toward 1, so that generator F and discriminator D_I are trained in alternating iterations until convergence; here y ∼ Y denotes cartoon images in the cartoon domain Y;
to accelerate convergence during LuminGAN training, a content loss L_con^I with the same structure as the content loss in ReflectGAN is added to constrain generator F, the only difference being that the input real photo x in ReflectGAN is replaced with the reflection component R_y of the cartoon image:
L_con^I(F) = E_{R_y∼Q}[ ‖VGG_l(F(R_y)) − VGG_l(R_y)‖_1 ]   (10)
to highlight the edge structure of the image, the HSV color consistency loss is also used to constrain generator F and is applied to the whole image; the global consistency loss L_glo is therefore defined as:
L_glo(F) = E_{R_y∼Q, y∼Y}[ l_h(H(F(R_y)), H(y)) + l_h(S(F(R_y)), S(y)) + β‖V(F(R_y)) − V(y)‖_1 ]   (11)
where R_y ∼ Q denotes reflection components in the cartoon reflection domain Q and y ∼ Y cartoon images in the cartoon domain Y; H(·), S(·) and V(·) denote the three channels of the HSV image, and β = 2 denotes the weight of the V channel.
By jointly considering the intrinsic and extrinsic properties of an image, the invention provides a Retinex-based generative adversarial network model. The model effectively preserves the color characteristics of the content image and synthesizes high-quality cartoon-style images.
Drawings
FIG. 1: RexGAN framework diagram
FIG. 2: Residual module in the LuminGAN generator F
FIG. 3: Subjective quality comparison with different methods
FIG. 4: Comparison of three different styles
FIG. 5: Effect of the color consistency loss
FIG. 6: Effect of different compositions of the global consistency loss on the results
Detailed Description
Fig. 1 shows that the implementation of the method is divided into a training phase and a testing phase. Two generative adversarial networks are trained simultaneously: generator G with discriminator D_R, and generator F with discriminator D_I. A real photo x is then taken as input and passed through generator G and generator F in sequence to finally obtain the output image F(G(x)). The method can be divided into the following three steps: preprocessing of the data set, training of the RexGAN model, and synthesis of cartoon images.
3. Preprocessing of the data set
To efficiently estimate the illumination component I and the reflection component R of real photos and cartoon images, the intrinsic-extrinsic joint prior model (JieP) is integrated into the RexGAN mapping pipeline. The JieP model is therefore used to decompose the cartoon image y and the real photo x separately, obtaining the cartoon reflection component R_y and the photo reflection component R_x. The unpaired training sets {x_i | x_i ∈ X} and {R_y^j | R_y^j ∈ Q} are then used to train generator G so that it learns the mapping function (ReflectGAN) from the photo domain X of the photo images x to the reflection domain Q of the cartoon images y. The paired training set {(R_y^j, y_j) | R_y^j ∈ Q, y_j ∈ Y} is used to train generator F so that it learns the mapping function (LuminGAN) from the cartoon reflection domain Q to the cartoon domain Y. The training data contain real photos x and cartoon images y, while the test data contain only real photos x. In the RexGAN training process, the data set provided by AnimeGAN is used. The data set contains 6656 real photos as the content image set, and key frames extracted from films by three animation directors (Miyazaki Hayao, Makoto Shinkai and Satoshi Kon) as the style image sets, where different directors represent different styles. Furthermore, all training images are resized to 256 × 256.
4. Training of the RexGAN model
Based on the assumption that the reflection components of images contain fine texture and are piecewise continuous, the training of the RexGAN model transfers the reflections of real photos onto corresponding cartoon images. The RexGAN model therefore comprises two mappings, G: x → R_y (ReflectGAN) and F: R_y → y (LuminGAN). In addition, two adversarial discriminators D_R and D_I are introduced: D_R distinguishes the reflection component R_y from the converted image G(x) and corresponds to generator G; likewise, D_I distinguishes y from F(R_y) and corresponds to generator F. The objective function is thus expressed as:

G*, F* = arg min_{G,F} max_{D_R,D_I} L(G, F, D_R, D_I)   (1)

where arg min max denotes solving a minimax problem: as in the training of a single generative adversarial network, the two generative adversarial networks are trained by minimizing over generators G and F while maximizing over discriminators D_R and D_I.

The appearance of an object is influenced by both intrinsic and extrinsic properties; the intrinsic properties, including shape and texture, are independent of illumination. On this basis, the reflection layer is used to establish the relation between the reflection layer of the real photo and that of the cartoon image, and the style characteristics of the cartoon images are learned. LuminGAN essentially uses a generative adversarial network to preserve the structure of the target image and reconstruct its illumination. The total loss can thus be decomposed as:

L(G, F, D_R, D_I) = L_R(G, D_R) + L_I(F, D_I)   (2)

where L_R(G, D_R) and L_I(F, D_I) denote the loss functions of ReflectGAN and LuminGAN, respectively; they are described in detail below.
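The alternating minimax training of equations (1)-(2) can be sketched as one PyTorch step below. The helpers `g_reflect_loss`, `g_lumin_loss`, `d_reflect_loss` and `d_lumin_loss` are hypothetical names standing for equations (3), (8) and the discriminator sides of (4), (9); they are not identifiers from the patent.

```python
import torch

def rexgan_step(G, F, D_R, D_I, x, R_y, y, opt_g, opt_d):
    """One alternating update of the minimax objective (1)-(2)."""
    # Discriminator step: maximize D_R and D_I (generator outputs detached).
    opt_d.zero_grad()
    loss_d = d_reflect_loss(D_R, R_y, G(x).detach()) + \
             d_lumin_loss(D_I, y, F(R_y).detach())
    loss_d.backward()
    opt_d.step()
    # Generator step: minimize G and F jointly, L = L_R + L_I.
    opt_g.zero_grad()
    loss_g = g_reflect_loss(G, D_R, x, R_y) + g_lumin_loss(F, D_I, R_y, y)
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```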
4.1 Training of ReflectGAN
ReflectGAN is trained to learn the style characteristics of cartoon image reflections. To reduce the number of training parameters of ReflectGAN, the generator model of AnimeGANv2 is adopted directly. In addition, a simple patch-level discriminator is used to judge whether the generated result has the characteristics of cartoon reflection components.

The loss function of ReflectGAN consists of the reflection adversarial loss L_adv^R, the content loss L_con^R, the reflection style loss L_gra and the color consistency loss L_col. L_R(G, D_R) can thus be expressed as:

L_R(G, D_R) = ω_1 L_adv^R(G, D_R) + ω_2 L_con^R(G) + ω_3 L_gra(G) + ω_4 L_col(G)   (3)

where ω_1 = 300, ω_2 = 1.4, ω_3 = 2.5 and ω_4 = 100 are the weights used to balance the ReflectGAN losses. The weight values were obtained through extensive experiments.

The real photo x is input into generator G, which attempts to generate an image G(x) whose appearance style and texture are consistent with the reflection component R_y of real cartoon images, while the purpose of discriminator D_R is to distinguish the generated image G(x) from the reflection component R_y. The generated image G(x) and the reflection component R_y are therefore input into discriminator D_R to obtain the fake probability D_R(G(x)) and the real probability D_R(R_y). The discriminator pushes D_R(R_y) toward the real label 1 and D_R(G(x)) toward the fake label 0, while the generator pushes D_R(G(x)) toward 1; in this way generator G and discriminator D_R are trained in alternating iterations until convergence. To effectively learn the style characteristics of cartoon reflections, a reflection adversarial loss based on the least-squares loss is proposed to constrain generator G and discriminator D_R. The reflection adversarial loss L_adv^R can be expressed as:

L_adv^R(G, D_R) = E_{R_y∼Q}[(D_R(R_y) − 1)^2] + E_{x∼X}[(D_R(G(x)))^2]   (4)

where R_y ∼ Q denotes reflection components in the cartoon reflection domain Q and x ∼ X denotes real photos in the photo domain X.
A perceptual loss is introduced as the content loss, which has the ability to preserve the image content and overall spatial structure. The high-level image features of G(x), x and R_y are therefore extracted with a VGG network, exploiting its ability to extract high-level features. In addition, a Gram matrix is used to extract the reflection style features from the high-level image features. Finally, the content loss L_con^R and the reflection style loss L_gra are defined as:

L_con^R(G) = E_{x∼X}[ ‖VGG_l(G(x)) − VGG_l(x)‖_1 ]   (5)

L_gra(G) = E_{x∼X, R_y∼Q}[ ‖Gram(VGG_l(G(x))) − Gram(VGG_l(R_y))‖_1 ]   (6)

where R_y ∼ Q and x ∼ X are as above, VGG_l denotes the high-level feature map extracted by layer l of a 19-layer VGG network pre-trained on the ImageNet data set, and l indexes a specific VGG layer. During training, the "conv4-4" layer is chosen to compute these losses.
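The following sketch implements equations (5)-(6) with a frozen VGG-19 truncated at conv4-4 (index 25 of torchvision's `features` stack, assuming torchvision ≥ 0.13). The L1 distance and Gram normalization are assumptions consistent with the surrounding text, not values stated by the patent.

```python
import torch
from torchvision import models

# Feature extractor up to conv4-4, pre-trained on ImageNet and frozen.
vgg_conv4_4 = models.vgg19(weights="IMAGENET1K_V1").features[:26].eval()
for p in vgg_conv4_4.parameters():
    p.requires_grad_(False)

def gram(feat):
    """Gram matrix of a (B, C, H, W) feature map, normalized by size."""
    b, c, h, w = feat.shape
    f = feat.reshape(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)

def content_loss(gen, photo):
    # Equation (5): L1 distance between conv4-4 features of G(x) and x.
    return torch.mean(torch.abs(vgg_conv4_4(gen) - vgg_conv4_4(photo)))

def reflection_style_loss(gen, cartoon_reflection):
    # Equation (6): L1 distance between Gram matrices of conv4-4 features.
    return torch.mean(torch.abs(gram(vgg_conv4_4(gen)) -
                                gram(vgg_conv4_4(cartoon_reflection))))
```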
The reflection style loss contains the color information of the style-image reflections, while the JieP model performs its decomposition mainly on the illumination (V) channel of the HSV color space. The image in RGB format is therefore converted into HSV format and a color consistency loss is established, so that the reflection color of the generated image stays close to that of the real photo. Since the V channel contains a large amount of texture information, an l_1 sparsity constraint is used on the V channel, while the Huber loss l_h is used on the hue (H) and saturation (S) channels. The color consistency loss L_col is defined as:

L_col(G) = E_{x∼X, R_x∼P}[ l_h(H(G(x)), H(R_x)) + l_h(S(G(x)), S(R_x)) + α‖V(G(x)) − V(R_x)‖_1 ]   (7)

where x ∼ X denotes real photos in the photo domain X and R_x ∼ P denotes photo reflection components in the photo reflection domain P. H(·), S(·) and V(·) denote the three channels of the HSV image, and α denotes the weight of the V channel.
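A sketch of equation (7) follows, assuming the inputs are (B, 3, H, W) tensors already converted to HSV (e.g. with kornia.color.rgb_to_hsv, so that the conversion stays differentiable); the default alpha = 2 follows the ablation reported with fig. 5.

```python
import torch
import torch.nn.functional as nnf

def huber(a, b):
    """Huber loss l_h used on the hue and saturation channels."""
    return nnf.smooth_l1_loss(a, b)

def color_consistency_loss(gen_hsv, ref_hsv, alpha=2.0):
    """Equation (7): Huber loss on H and S, l1 on the texture-heavy V
    channel. The global consistency loss (11) reuses this form with
    beta = 2 and (F(R_y), y) as inputs."""
    h_g, s_g, v_g = gen_hsv.unbind(dim=1)
    h_r, s_r, v_r = ref_hsv.unbind(dim=1)
    return huber(h_g, h_r) + huber(s_g, s_r) + \
           alpha * torch.mean(torch.abs(v_g - v_r))
```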
4.2 Training of LuminGAN
The purpose of LuminGAN is to reconstruct the illumination components of cartoon images. The AnimeGAN generator is therefore used directly, together with a conventional pixel-level discriminator. To avoid generating high-frequency artifacts in the image and to reduce training complexity, four lightweight channel attention (ECA) modules are integrated into the eight inverted residual blocks (IRBs) of generator F to form a new residual block, as shown in fig. 2.
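A minimal sketch of one such block is given below: a standard ECA module (global average pooling followed by a 1-D convolution across channels) inserted into an inverted residual block. Channel sizes, normalization and activation choices are illustrative assumptions; the patent does not specify them.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: global average pooling followed by a
    1-D convolution across channels, with no dimensionality reduction."""
    def __init__(self, k_size=3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, k_size, padding=k_size // 2, bias=False)

    def forward(self, x):
        w = x.mean(dim=(2, 3))                    # (B, C) channel descriptor
        w = self.conv(w.unsqueeze(1)).squeeze(1)  # interaction across channels
        return x * torch.sigmoid(w)[:, :, None, None]

class ECAInvertedResidual(nn.Module):
    """Inverted residual block (expand -> depthwise -> project) with an
    ECA module inserted before the projection."""
    def __init__(self, ch=64, expand=2):
        super().__init__()
        hid = ch * expand
        self.body = nn.Sequential(
            nn.Conv2d(ch, hid, 1, bias=False), nn.InstanceNorm2d(hid), nn.ReLU(True),
            nn.Conv2d(hid, hid, 3, padding=1, groups=hid, bias=False),  # depthwise
            nn.InstanceNorm2d(hid), nn.ReLU(True),
            ECA(),
            nn.Conv2d(hid, ch, 1, bias=False), nn.InstanceNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)
```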
To give the generated images clearer edges and better visual perception, the cartoon images and their reflections are trained as a set of paired data for LuminGAN, so that generator F acquires the ability to reconstruct illumination characteristics. The objective function L_I(F, D_I) thus consists mainly of the illumination adversarial loss L_adv^I, the content loss L_con^I and the global consistency loss L_glo. The loss function of LuminGAN is expressed as:

L_I(F, D_I) = γ_1 L_adv^I(F, D_I) + γ_2 L_con^I(F) + γ_3 L_glo(F)   (8)

where γ_1 = 150, γ_2 = 0.5 and γ_3 = 1000 are the weights used to balance the LuminGAN losses.
The reflection component R_y of a cartoon image is input into generator F, which attempts to generate an image F(R_y) consistent with the real cartoon image y, while the purpose of discriminator D_I is to distinguish the synthesized image F(R_y) from the real cartoon image y.

The illumination adversarial loss L_adv^I is therefore defined as:

L_adv^I(F, D_I) = E_{y∼Y}[(D_I(y) − 1)^2] + E_{R_y∼Q}[(D_I(F(R_y)))^2]   (9)

Equation (9) is constrained in the same way as equation (4), except that the cartoon image y and the synthesized image F(R_y) are input into discriminator D_I to obtain the real probability D_I(y) and the fake probability D_I(F(R_y)). The discriminator pushes D_I(y) toward the real label 1 and D_I(F(R_y)) toward the fake label 0, while the generator pushes D_I(F(R_y)) toward 1; in this way generator F and discriminator D_I are trained in alternating iterations until convergence. Here y ∼ Y denotes cartoon images in the cartoon domain Y.
To accelerate convergence during LuminGAN training, a content loss L_con^I with the same structure as the content loss in ReflectGAN is added to constrain generator F; the only difference is that the input real photo x in ReflectGAN is replaced with the reflection component R_y of the cartoon image:

L_con^I(F) = E_{R_y∼Q}[ ‖VGG_l(F(R_y)) − VGG_l(R_y)‖_1 ]   (10)

To highlight the edge structure of the image, the HSV color consistency loss is also used to constrain generator F and is applied to the whole image. The global consistency loss L_glo is therefore defined as:

L_glo(F) = E_{R_y∼Q, y∼Y}[ l_h(H(F(R_y)), H(y)) + l_h(S(F(R_y)), S(y)) + β‖V(F(R_y)) − V(y)‖_1 ]   (11)

where R_y ∼ Q denotes reflection components in the cartoon reflection domain Q and y ∼ Y cartoon images in the cartoon domain Y. H(·), S(·) and V(·) denote the three channels of the HSV image, and β = 2 denotes the weight of the V channel.
5. Synthesis of cartoon images
Generator G first converts the input photo image x into G(x), whose statistical characteristics are similar to those of the reflection components of cartoon images; generator F then converts G(x) into the cartoon image F(G(x)).
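The test-stage pipeline therefore reduces to two forward passes; a minimal sketch:

```python
import torch

@torch.no_grad()
def cartoonize(photo, G, F):
    """Test-stage pipeline: x -> G(x) -> F(G(x)). `photo` is assumed to
    be a (1, 3, 256, 256) tensor normalized as the generators expect."""
    reflection = G(photo)  # statistics close to cartoon reflection components
    return F(reflection)   # illumination reconstructed on top
```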
To verify the effectiveness of the proposed scheme, experimental validation was carried out from different aspects. The results were compared with four current state-of-the-art works, as shown in fig. 3. The results retain the texture information of the real photos, reproduce clear edges, and their overall color better matches the expected visual effect. In addition, the method was tested on different styles, with the results shown in fig. 4.
Fig. 5 shows how the color consistency loss affects the generation of ReflectGAN. The content loss and reflection style loss enable generator G to generate stylized images, but they tend to produce over-stylization. Images (c) and (d) have clear texture once the color consistency loss is used. However, different values of α have a significant effect on the results: compared with α = 1, the result with α = 2 shows more correct colors. To further investigate the effect of different compositions of the global consistency loss on LuminGAN, β = 2 was set in equation (11) and the effects of three compositions of the loss were compared. As shown in fig. 6, although the global consistency loss in RGB format presents a good color appearance, its structure is blurred; conversely, using only the HSV format as a constraint gives the opposite result. In contrast to both, image (d) shows sharp edges and a pleasing color appearance.

Claims (1)

1. A cartoon style migration method based on a Retinex model is characterized by comprising the following steps:
the method comprises a training stage and a testing stage; firstly, two generative adversarial networks are trained simultaneously: generator G with discriminator D_R, and generator F with discriminator D_I; then a real photo x is taken as input and passed through generator G and generator F in sequence to finally obtain an output image F(G(x));
the method comprises the following three steps: preprocessing a data set, training a RexGAN model and synthesizing cartoon images;
1) Preprocessing of the data set
to effectively estimate the illumination components I and reflection components R of real photos and cartoon images, the intrinsic-extrinsic joint prior model JieP is integrated into the RexGAN mapping pipeline; the cartoon image y and the real photo x are therefore each decomposed with the JieP model to obtain the cartoon reflection component R_y and the photo reflection component R_x; the unpaired training sets {x_i | x_i ∈ X} and {R_y^j | R_y^j ∈ Q} are then used to train generator G to learn the mapping function (ReflectGAN) from the photo domain X of photo images x to the reflection domain Q of cartoon images y; the paired training set {(R_y^j, y_j) | R_y^j ∈ Q, y_j ∈ Y} is used to train generator F to learn the mapping function (LuminGAN) from the cartoon reflection domain Q to the cartoon domain Y; the training data comprise real photos x and cartoon images y, while the test data comprise only real photos x; in the training of RexGAN, the data set provided by AnimeGAN is used; furthermore, all training images are resized to 256 × 256;
2) Training of the RexGAN model
based on the assumption that the reflection component of an image contains fine texture and is piecewise continuous, the reflection of a real photo is transferred onto a corresponding cartoon image; the RexGAN model therefore comprises two mappings, G: x → R_y (ReflectGAN) and F: R_y → y (LuminGAN); in addition, two adversarial discriminators D_R and D_I are introduced: D_R distinguishes the reflection component R_y from the converted image G(x) and corresponds to generator G; likewise, D_I distinguishes y from F(R_y) and corresponds to generator F; the objective function is thus expressed as:
G*, F* = arg min_{G,F} max_{D_R,D_I} L(G, F, D_R, D_I)   (1)
where arg min max denotes solving a minimax problem: as in the training of a single generative adversarial network, the two generative adversarial networks are trained by minimizing over generators G and F while maximizing over discriminators D_R and D_I;
LuminGAN essentially uses a generative adversarial network to preserve the structure of the target image and reconstruct its illumination; the total loss is thus decomposed as:
L(G, F, D_R, D_I) = L_R(G, D_R) + L_I(F, D_I)   (2)
where L_R(G, D_R) and L_I(F, D_I) denote the loss functions of ReflectGAN and LuminGAN, respectively, which are described in detail below;
2.1 Training of ReflectGAN
ReflectGAN is trained to learn the style characteristics of cartoon image reflections; to reduce the number of training parameters of ReflectGAN, the generator model of AnimeGANv2 is adopted directly; in addition, a simple patch-level discriminator is used to judge whether the generated result has the characteristics of cartoon reflection components;
the loss function of ReflectGAN consists of the reflection adversarial loss L_adv^R, the content loss L_con^R, the reflection style loss L_gra and the color consistency loss L_col; L_R(G, D_R) is thus expressed as:
L_R(G, D_R) = ω_1 L_adv^R(G, D_R) + ω_2 L_con^R(G) + ω_3 L_gra(G) + ω_4 L_col(G)   (3)
where ω_1 = 300, ω_2 = 1.4, ω_3 = 2.5 and ω_4 = 100 are the weights used to balance the ReflectGAN losses;
the real photo x is input into generator G, which attempts to generate an image G(x) whose appearance style and texture are consistent with the reflection component R_y of real cartoon images, while the purpose of discriminator D_R is to distinguish the generated image G(x) from the reflection component R_y; the generated image G(x) and the reflection component R_y are therefore input into discriminator D_R to obtain the fake probability D_R(G(x)) and the real probability D_R(R_y); the discriminator pushes D_R(R_y) toward the real label 1 and D_R(G(x)) toward the fake label 0, while the generator pushes D_R(G(x)) toward 1, so that generator G and discriminator D_R are trained in alternating iterations until convergence; to effectively learn the style characteristics of cartoon reflections, a reflection adversarial loss based on the least-squares loss is proposed to constrain generator G and discriminator D_R; the reflection adversarial loss L_adv^R is then expressed as:
L_adv^R(G, D_R) = E_{R_y∼Q}[(D_R(R_y) − 1)^2] + E_{x∼X}[(D_R(G(x)))^2]   (4)
where R_y ∼ Q denotes reflection components in the cartoon reflection domain Q and x ∼ X denotes real photos in the photo domain X;
a perceptual loss is introduced as the content loss, which has the ability to preserve the image content and overall spatial structure; the high-level image features of G(x), x and R_y are therefore extracted with a VGG network, exploiting its ability to extract high-level features; in addition, a Gram matrix is used to extract the reflection style features from the high-level image features; finally, the content loss L_con^R and the reflection style loss L_gra are defined as:
L_con^R(G) = E_{x∼X}[ ‖VGG_l(G(x)) − VGG_l(x)‖_1 ]   (5)
L_gra(G) = E_{x∼X, R_y∼Q}[ ‖Gram(VGG_l(G(x))) − Gram(VGG_l(R_y))‖_1 ]   (6)
where R_y ∼ Q and x ∼ X are as above, VGG_l denotes the high-level feature map extracted by layer l of a 19-layer VGG network pre-trained on the ImageNet data set, and l indexes a specific VGG layer; during training, the "conv4-4" layer is selected to compute these losses;
the reflection style loss contains the color information of the style-image reflections, and the JieP model performs its decomposition mainly on the illumination (V) channel of the HSV color space; the RGB image is therefore converted into HSV format and a color consistency loss is established so that the reflection color of the generated image stays close to that of the real photo; since the V channel contains a large amount of texture information, an l_1 sparsity constraint is used on the V channel, while the Huber loss l_h is used on the hue (H) and saturation (S) channels; the color consistency loss L_col is defined as:
L_col(G) = E_{x∼X, R_x∼P}[ l_h(H(G(x)), H(R_x)) + l_h(S(G(x)), S(R_x)) + α‖V(G(x)) − V(R_x)‖_1 ]   (7)
where x ∼ X denotes real photos in the photo domain X and R_x ∼ P denotes photo reflection components in the photo reflection domain P; H(·), S(·) and V(·) denote the three channels of the HSV image, and α denotes the weight of the V channel;
2.2 Training of LuminGAN
four lightweight channel attention (ECA) modules are integrated into the eight inverted residual blocks (IRBs) of generator F to form a new residual block;
the cartoon images and their reflections are trained as a set of paired data for LuminGAN, so that generator F acquires the ability to reconstruct illumination characteristics; the objective function L_I(F, D_I) thus consists mainly of the illumination adversarial loss L_adv^I, the content loss L_con^I and the global consistency loss L_glo; the loss function of LuminGAN is expressed as:
L_I(F, D_I) = γ_1 L_adv^I(F, D_I) + γ_2 L_con^I(F) + γ_3 L_glo(F)   (8)
where γ_1 = 150, γ_2 = 0.5 and γ_3 = 1000 are the weights used to balance the LuminGAN losses;
the reflection component R_y of a cartoon image is input into generator F, which attempts to generate an image F(R_y) consistent with the real cartoon image y, while the purpose of discriminator D_I is to distinguish the synthesized image F(R_y) from the real cartoon image y;
the illumination adversarial loss L_adv^I is therefore defined as:
L_adv^I(F, D_I) = E_{y∼Y}[(D_I(y) − 1)^2] + E_{R_y∼Q}[(D_I(F(R_y)))^2]   (9)
equation (9) is constrained in the same way as equation (4), except that the cartoon image y and the synthesized image F(R_y) are input into discriminator D_I to obtain the real probability D_I(y) and the fake probability D_I(F(R_y)); the discriminator pushes D_I(y) toward the real label 1 and D_I(F(R_y)) toward the fake label 0, while the generator pushes D_I(F(R_y)) toward 1, so that generator F and discriminator D_I are trained in alternating iterations until convergence; here y ∼ Y denotes cartoon images in the cartoon domain Y;
to accelerate convergence during LuminGAN training, a content loss L_con^I with the same structure as the content loss in ReflectGAN is added to constrain generator F, the only difference being that the input real photo x in ReflectGAN is replaced with the reflection component R_y of the cartoon image:
L_con^I(F) = E_{R_y∼Q}[ ‖VGG_l(F(R_y)) − VGG_l(R_y)‖_1 ]   (10)
to highlight the edge structure of the image, the HSV color consistency loss is also used to constrain generator F and is applied to the whole image; the global consistency loss L_glo is therefore defined as:
L_glo(F) = E_{R_y∼Q, y∼Y}[ l_h(H(F(R_y)), H(y)) + l_h(S(F(R_y)), S(y)) + β‖V(F(R_y)) − V(y)‖_1 ]   (11)
where R_y ∼ Q denotes reflection components in the cartoon reflection domain Q and y ∼ Y cartoon images in the cartoon domain Y; H(·), S(·) and V(·) denote the three channels of the HSV image, and β = 2 denotes the weight of the V channel.
CN202110305033.7A 2021-03-10 2021-03-10 Cartoon style migration method based on Retinex model Pending CN113066114A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110305033.7A CN113066114A (en) 2021-03-10 2021-03-10 Cartoon style migration method based on Retinex model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110305033.7A CN113066114A (en) 2021-03-10 2021-03-10 Cartoon style migration method based on Retinex model

Publications (1)

Publication Number Publication Date
CN113066114A true CN113066114A (en) 2021-07-02

Family

ID=76563397

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110305033.7A Pending CN113066114A (en) 2021-03-10 2021-03-10 Cartoon style migration method based on Retinex model

Country Status (1)

Country Link
CN (1) CN113066114A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113870102A (en) * 2021-12-06 2021-12-31 深圳市大头兄弟科技有限公司 Animation method, device, equipment and storage medium of image

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458750A (en) * 2019-05-31 2019-11-15 北京理工大学 A kind of unsupervised image Style Transfer method based on paired-associate learning
CN111325661A (en) * 2020-02-21 2020-06-23 京工数演(福州)科技有限公司 Seasonal style conversion model and method for MSGAN image
CN112330535A (en) * 2020-11-27 2021-02-05 江南大学 Picture style migration method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110458750A (en) * 2019-05-31 2019-11-15 北京理工大学 A kind of unsupervised image Style Transfer method based on paired-associate learning
CN111325661A (en) * 2020-02-21 2020-06-23 京工数演(福州)科技有限公司 Seasonal style conversion model and method for MSGAN image
CN112330535A (en) * 2020-11-27 2021-02-05 江南大学 Picture style migration method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113870102A (en) * 2021-12-06 2021-12-31 深圳市大头兄弟科技有限公司 Animation method, device, equipment and storage medium of image
CN113870102B (en) * 2021-12-06 2022-03-08 深圳市大头兄弟科技有限公司 Animation method, device, equipment and storage medium of image

Similar Documents

Publication Publication Date Title
Rudnev et al. Nerf for outdoor scene relighting
Lukac Computational photography: methods and applications
US8437514B2 (en) Cartoon face generation
Wu et al. Interactive normal reconstruction from a single image
WO2008050062A1 (en) Method and device for the virtual simulation of a sequence of video images
Kumar et al. A comprehensive survey on non-photorealistic rendering and benchmark developments for image abstraction and stylization
US11727628B2 (en) Neural opacity point cloud
WO2023066173A1 (en) Image processing method and apparatus, and storage medium and electronic device
Li et al. Uphdr-gan: Generative adversarial network for high dynamic range imaging with unpaired data
Bao et al. Deep image-based illumination harmonization
CN113222845A (en) Portrait external shadow removing method based on convolution neural network
Chhabra et al. Detailed survey on exemplar based image inpainting techniques
CN115641391A (en) Infrared image colorizing method based on dense residual error and double-flow attention
Xiao et al. Image hazing algorithm based on generative adversarial networks
Zhao et al. Cartoon image processing: a survey
CN113066114A (en) Cartoon style migration method based on Retinex model
CN117541732A (en) Text-guided neural radiation field building scene stylization method
Wen et al. A survey of image dehazing algorithm based on retinex theory
EP4162448A1 (en) Method and device for three-dimensional reconstruction of a face with toothed portion from a single image
JP5896204B2 (en) Image processing apparatus and program
Tous Pictonaut: movie cartoonization using 3D human pose estimation and GANs
CN114898021B (en) Intelligent cartoon method for music stage performance video
CN116977455A (en) Face sketch image generation system and method based on deep two-way learning
Ma A comparison of art style transfer in Cycle-GAN based on different generators
Wang et al. Uncouple generative adversarial networks for transferring stylized portraits to realistic faces

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination