CN113066114A - Cartoon style migration method based on Retinex model - Google Patents
- Publication number
- CN113066114A (application CN202110305033.7A)
- Authority
- CN
- China
- Prior art keywords
- image
- loss
- reflection
- cartoon
- generator
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T13/20 — 3D [Three Dimensional] animation
- G06N3/02 — Neural networks
- G06N3/08 — Learning methods
- G06T15/50 — Lighting effects (3D image rendering)
- G06T7/40 — Analysis of texture (image analysis)
- G06T7/90 — Determination of colour characteristics (image analysis)
Abstract
A cartoon style migration method based on a Retinex model, belonging to the field of computer vision. Converting real-world photographs into cartoon-style images is a meaningful and challenging task. Existing methods cannot obtain satisfactory cartoon results because they do not separately consider the consistency and continuity of cartoon images and real photographs in terms of structure, texture, and illumination. The Retinex model in the invention jointly learns the intrinsic properties (shape and texture) and the extrinsic property (illumination) of photographs and cartoon images. The RexGAN framework comprises ReflectGAN, which learns the mapping from photographic images to cartoon images through a reflection loss, and LuminGAN, which further improves the structure and illumination of the generated images through an illumination loss. The invention can generate high-quality cartoon images from real photographs, with clear edges and structures and correct illumination, and is superior to state-of-the-art methods in subjective quality.
Description
Technical Field
The invention relates to the fields of style transfer and low-light image enhancement, and presents a cartoon style migration method based on a Retinex decomposition model. The invention belongs to the field of computer vision, and particularly relates to techniques such as neural style transfer and Retinex image decomposition.
Background
In recent years, film and television works on science-fiction subjects have become more and more popular. With the development of computer graphics, the production quality of film and television special effects has been greatly improved, and creators can now achieve finer, more realistic, and more striking picture effects on a computer. The currently mainstream special-effects software includes Houdini, Nuke, and Maya; the character and scene modeling of the famous animated films Final Fantasy and Madagascar both came from Maya. However, the special-effects production process based on this software is complicated, including scene construction, and requires not only substantial financial and human resources but also professional technicians and a great deal of time. Recently, in the field of deep learning, research on style transfer has been receiving attention; its main goal is to convert an ordinary image into a painting with an artistic style, which can provide good technical support for film and television special effects. The artistic style of cartoons mainly emphasizes the information closely related to the theme, simplifying and eliminating extraneous parts. Because of this unique appeal, the style is widely used not only in the video field but also in games, advertisements, and other fields. Therefore, research on cartoon style migration is of great significance.
Since the 1990s, many non-photorealistic rendering (NPR) algorithms have been developed for specific styles, including cartoon, oil-painting, and ink-wash styles. Researchers often use cel shading or filtering to obtain specific styles, and these methods are widely used in various software. In recent years, neural style transfer (NST) methods based on convolutional neural networks have emerged, which achieve good results on painting styles by using the correlations between deep features. In addition, another group of methods based on generative adversarial networks (GANs) transfers images between two domains in an adversarial manner. Subsequently, a series of cycle-consistency-based methods were developed to complete domain migration by training on unpaired data. Furthermore, CartoonGAN, ComixGAN, and AnimeGAN have also achieved great success in cartoon style migration. However, these methods still show unsatisfactory results in two respects: 1) the structure and texture of weak-light areas of the original image are easily lost; 2) the resulting cartoon image does not retain the global color appearance of the original image.
In the field of low-light enhancement, the Retinex model is used as a perceptual model of the human visual system to decompose an image into illumination and reflection components. Its physical model can be described as O = I ∘ R, i.e., the observed image O is decomposed into the illumination I and the reflectance R, where ∘ denotes element-wise multiplication. In recent years, methods represented by WVM, Jeip, and STAR can decompose images well into illumination and reflection components based on Retinex. The Jeip model preserves structural information well through a shape prior, estimates the reflectance well through a texture prior, and recovers the light source well through an illumination prior. These properties can therefore be exploited to better transfer stylization, improve the illumination, and preserve the structure of the generated image.
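The decomposition O = I ∘ R can be illustrated with a minimal single-scale sketch. This is only an illustration of the multiplicative Retinex model, not the Jeip/WVM/STAR priors described above (those solve an optimization problem); here the illumination is simply estimated as a local mean:

```python
import numpy as np

def retinex_decompose(image, ksize=15, eps=1e-6):
    """Minimal single-scale Retinex sketch of O = I * R.

    Estimates the illumination I as a local mean (box blur) of the
    observed image O and recovers the reflectance as R = O / I.
    Illustrative only; the patent uses the Jeip joint-prior model,
    not this simple filter.
    """
    image = image.astype(np.float64)
    pad = ksize // 2
    padded = np.pad(image, pad, mode="edge")
    illumination = np.zeros_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            # Local mean over a ksize x ksize window around (i, j).
            illumination[i, j] = padded[i:i + ksize, j:j + ksize].mean()
    illumination = np.maximum(illumination, eps)  # avoid division by zero
    reflectance = image / illumination
    return illumination, reflectance
```

By construction I ∘ R reproduces the observed image exactly, which is the defining property of the decomposition.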
A generative adversarial network (GAN) framework based on the Retinex model is proposed and denoted RexGAN. First, the influence of the extrinsic property (illumination) is eliminated by introducing the Jeip Retinex model to decompose the images into their reflection components. The mapping of the reflections from the photographic images to the cartoon images is then learned in an iterative adversarial manner by exploiting the piecewise continuity of the reflection components of the cartoon images, together with a reflection loss. Finally, an illumination loss further improves the illumination and protects the structure of the generated image.
Disclosure of Invention
In achieving image style conversion, neural style transfer methods generally require the content image and the style image to be structurally similar, and this manner emphasizes the painting style. Generative adversarial networks can realize transfer between two completely different domains, so they are one of the current research hotspots; they have also been applied to style transfer and achieve good results, the most representative being CartoonGAN, ComixGAN, and AnimeGAN. However, these methods show unsatisfactory results in two respects: 1) the structure and texture of weak-light areas of the original image are easily lost; 2) the resulting cartoon image does not retain the global color appearance of the original image.
A cartoon style migration method based on a Retinex model is characterized by comprising the following steps:
the method is realized in a training stage and a testing stage; first, two generative adversarial networks are trained simultaneously, namely generator G with discriminator D_R and generator F with discriminator D_I; then, the real photograph x is taken as input and passed successively through generator G and generator F to finally obtain the output image F(G(x));
the method comprises the following three steps: preprocessing a data set, training a RexGAN model and synthesizing cartoon images;
1. preprocessing of data sets
In order to effectively estimate the illumination component I and the reflection component R of the real photographs and cartoon images, the intrinsic-extrinsic joint prior model Jeip is integrated into the mapping function RexGAN; therefore, the cartoon image y and the real photograph x are each decomposed with the Jeip model to obtain the reflection component R_y of the cartoon image and the reflection component R_x of the real photograph; the unpaired training sets {x_i} and {R_y,j} are then used to train generator G to learn the mapping function (ReflectGAN) from the photo domain X of the photographic images x to the reflection domain Q of the cartoon images y; finally, the paired training set {(R_y,j, y_j)} is used to train generator F to learn the mapping function (LuminGAN) from the reflection domain Q of the cartoon images to the cartoon domain Y; the training data comprise real photographs x and cartoon images y, while the test data comprise only real photographs x; the dataset provided by AnimeGAN is used in the training process of RexGAN; furthermore, all training images are resized to 256 × 256;
2. Training of the RexGAN model: based on the assumption that the reflection components of images contain fine textures and have piecewise continuity, the reflections of real photographs are transferred onto the corresponding cartoon images; therefore, the RexGAN model includes two mappings, G: x → R_y (ReflectGAN) and F: R_y → y (LuminGAN); in addition, two adversarial discriminators D_R and D_I are introduced: D_R aims to distinguish the reflection component R_y from the converted image G(x) and corresponds to generator G; likewise, D_I aims to distinguish y from F(R_y) and corresponds to generator F; thus, the objective function is expressed as:

(G*, F*) = arg min_{G, F} max_{D_R, D_I} L(G, F, D_R, D_I)    (1)

where arg min max denotes solving the minimax problem: similarly to training a single generative adversarial network, the two generative adversarial networks are trained by minimizing over generators G and F while maximizing over discriminators D_R and D_I;
LuminGAN essentially uses a generative adversarial network to preserve the structure of the target image and to reconstruct its illumination; thus, the loss is decomposed as:

L(G, F, D_R, D_I) = L_R(G, D_R) + L_I(F, D_I)    (2)

where L_R(G, D_R) and L_I(F, D_I) represent the loss functions of ReflectGAN and LuminGAN, respectively, which are described in detail below;
2.1 Training of ReflectGAN
ReflectGAN is trained to learn the style characteristics of cartoon-image reflections; in order to reduce the training parameters of ReflectGAN, the generator model from AnimeGANv2 is introduced directly; in addition, a simple block-level discriminator is used to determine whether the generated result has the characteristics of the reflection components of cartoon images;
the loss function of ReflectGAN consists of a reflection adversarial loss L_adv^R, a content loss L_con, a reflection style loss L_gra, and a color consistency loss L_col; thus, L_R(G, D_R) is expressed as:

L_R(G, D_R) = ω_1 L_adv^R(G, D_R) + ω_2 L_con(G) + ω_3 L_gra(G) + ω_4 L_col(G)    (3)

where ω_1 = 300, ω_2 = 1.4, ω_3 = 2.5, and ω_4 = 100 are the weights used to balance the ReflectGAN losses;
the realistic photograph x is input into generator G, which attempts to generate an image G(x) whose appearance style and texture are consistent with the reflection component R_y of real cartoon images, while the purpose of discriminator D_R is to distinguish the image G(x) from the reflection component R_y; therefore, the generated image G(x) and the reflection component R_y are input into discriminator D_R to obtain the fake probability D_R(G(x)) and the real probability D_R(R_y); the discriminator pushes the real probability D_R(R_y) toward the real label 1 and the fake probability D_R(G(x)) toward the fake label 0, while the generator pushes D_R(G(x)) toward 1, so that generator G and discriminator D_R are trained in alternate iterations until convergence; in order to effectively learn the style characteristics of cartoon-image reflections, a reflection adversarial loss based on the least-squares loss is proposed to constrain generator G and discriminator D_R; the reflection adversarial loss L_adv^R is expressed as:

L_adv^R(G, D_R) = E_{R_y ~ Q}[(D_R(R_y) - 1)^2] + E_{x ~ X}[(D_R(G(x)))^2]    (4)

where {R_y} denotes the data of the reflection components R_y in the reflection domain Q of the cartoon images, and {x} denotes the data of the real photographs x in the real-photo domain X;
perceptual loss is introduced as the content loss, which has the ability to preserve the image content and the overall spatial structure; thus, the ability of VGG to extract high-level features is used to extract the high-level features of G(x), x, and R_y; in addition, the Gram matrix is used to extract the style features of the image reflections from these high-level features; finally, the content loss L_con and the reflection style loss L_gra are defined as:

L_con(G) = E_{x ~ X}[ || VGG_l(G(x)) - VGG_l(x) ||_1 ]    (5)

L_gra(G) = E_{x ~ X, R_y ~ Q}[ || Gram(VGG_l(G(x))) - Gram(VGG_l(R_y)) ||_1 ]    (6)

where VGG_l denotes the high-level feature map extracted at layer l of a 19-layer VGG network pre-trained on the ImageNet dataset; during training, the "conv4-4" layer is selected to compute these losses;
the reflection style loss contains the color information of the style-image reflections, while the Jeip model performs image decomposition mainly on the illumination (V) channel of the HSV color space; therefore, the RGB images are converted into HSV format and a color consistency loss is established so that the reflection color of the generated image stays close to the reflection color of the real photograph; since a large amount of texture information is contained in the V channel, an l_1 sparsity constraint is used on the V channel, and the Huber loss l_h is used on the hue (H) and saturation (S) channels; the color consistency loss L_col is defined as:

L_col(G) = E_{x ~ X, R_x ~ P}[ l_h(H(G(x)), H(R_x)) + l_h(S(G(x)), S(R_x)) + α || V(G(x)) - V(R_x) ||_1 ]    (7)

where {x, R_x} denotes the real photographs x in the real-photo domain X and their reflection components R_x in the photo reflection domain P; H(·), S(·), and V(·) denote the three channels of the HSV image, and α denotes the weight of the V channel;
2.2 training of LuminGAN
four lightweight channel attention modules (ECA) are integrated into the eight inverted residual blocks (IRB) of generator F to form a new residual block;
the cartoon images and their reflections are trained as a paired dataset for LuminGAN, so that generator F acquires the ability to reconstruct illumination characteristics; thus, the objective function L_I(F, D_I) mainly consists of an illumination adversarial loss L_adv^I, a content loss L_con^I, and a global consistency loss L_glo; the loss function of LuminGAN is expressed as:

L_I(F, D_I) = γ_1 L_adv^I(F, D_I) + γ_2 L_con^I(F) + γ_3 L_glo(F)    (8)

where γ_1 = 150, γ_2 = 0.5, and γ_3 = 1000 are the weights used to balance the LuminGAN losses;
the reflection component R_y of the cartoon image is input into generator F, which attempts to generate an image F(R_y) that is consistent with the real cartoon image y, while the purpose of discriminator D_I is to distinguish the synthesized image F(R_y) from the real cartoon image y; the illumination adversarial loss is constrained in the same way as equation (4):

L_adv^I(F, D_I) = E_{y ~ Y}[(D_I(y) - 1)^2] + E_{R_y ~ Q}[(D_I(F(R_y)))^2]    (9)

except that the cartoon image y and the synthesized image F(R_y) are input into discriminator D_I to obtain the real probability D_I(y) and the fake probability D_I(F(R_y)); the discriminator pushes D_I(y) toward the real label 1 and D_I(F(R_y)) toward the fake label 0, while the generator pushes D_I(F(R_y)) toward 1, so that generator F and discriminator D_I are trained in alternate iterations until convergence; {y} denotes the data of the cartoon images y in the cartoon domain Y;
to accelerate convergence in LuminGAN training, a content loss L_con^I with the same structure as the content loss in ReflectGAN is added to constrain generator F, the only difference being that the input real photograph x in ReflectGAN is replaced by the reflection component R_y of the cartoon image:

L_con^I(F) = E_{R_y ~ Q}[ || VGG_l(F(R_y)) - VGG_l(R_y) ||_1 ]    (10)

to highlight the edge structure of the image, the HSV-space color consistency loss is introduced to constrain generator F, and this consistency loss is applied to the entire image; therefore, the global consistency loss L_glo is defined as:

L_glo(F) = E_{R_y ~ Q, y ~ Y}[ l_h(H(F(R_y)), H(y)) + l_h(S(F(R_y)), S(y)) + β || V(F(R_y)) - V(y) ||_1 ]    (11)

where {R_y, y} denotes the reflection components R_y in the reflection domain Q of the cartoon images and the cartoon images y in the cartoon domain Y; H(·), S(·), and V(·) denote the three channels of the HSV image, and β = 2 denotes the weight of the V channel.
By jointly considering the intrinsic and extrinsic properties of images, the invention provides a generative adversarial network model based on Retinex. The model can effectively preserve the color characteristics of the content image and synthesize high-quality cartoon-style images.
Drawings
FIG. 1: RexGAN framework diagram
FIG. 2: Residual module in the LuminGAN generator F
FIG. 3: Subjective quality comparison with different methods
FIG. 4: Comparison of three different styles
FIG. 5: Verification of the effect of the color consistency loss
FIG. 6: Effect of different compositions of the global consistency loss on the results
Detailed Description
Fig. 1 shows that the implementation of the method is divided into a training stage and a testing stage. The method is realized by simultaneously training two generative adversarial networks, namely generator G with discriminator D_R and generator F with discriminator D_I; then, the real photograph x is taken as input and passed successively through generator G and generator F to finally obtain the output image F(G(x)). The method can be divided into the following three steps: preprocessing of the dataset, training of the RexGAN model, and synthesis of the cartoon images.
3. Preprocessing of data sets

In order to effectively estimate the illumination component I and the reflection component R of the real photographs and cartoon images, the intrinsic-extrinsic joint prior model (Jeip) is integrated into the mapping function (RexGAN). Therefore, the Jeip model is used to decompose the cartoon image y and the real photograph x, respectively, to obtain the reflection component R_y of the cartoon image and the reflection component R_x of the real photograph. The unpaired training sets {x_i} and {R_y,j} are then used to train generator G so that it learns the mapping function (ReflectGAN) from the photo domain X of the photographic images x to the reflection domain Q of the cartoon images y. The paired training set {(R_y,j, y_j)} is used to train generator F so that it learns the mapping function (LuminGAN) from the reflection domain Q of the cartoon images to the cartoon domain Y. The training data contain real photographs x and cartoon images y, while the test data contain only real photographs x. The dataset provided by AnimeGAN is used in the RexGAN training process; it contains 6656 real photos as the content image dataset, and key frames cut from three cartoon films (by Hayao Miyazaki, Makoto Shinkai, and Satoshi Kon, respectively) as the style image datasets, where different authors represent different styles. Furthermore, all training images are resized to 256 × 256.
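The preprocessing step can be sketched as follows; `resize_nearest` and `build_training_sets` are illustrative helper names (not from the patent), and the `decompose` callable stands in for the Jeip model:

```python
import numpy as np

def resize_nearest(img, size=256):
    """Nearest-neighbour resize to size x size.  An illustrative stand-in
    for the 256 x 256 resizing step; the patent does not name a filter."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows][:, cols]

def build_training_sets(photos, cartoons, decompose):
    """Sketch of the dataset preparation described above.

    decompose(img) -> (illumination, reflectance) stands in for the Jeip
    decomposition.  Returns the unpaired sets ({x}, {R_y}) for ReflectGAN
    and the paired set [(R_y, y), ...] for LuminGAN.
    """
    X = [resize_nearest(p) for p in photos]     # photo domain X
    Y = [resize_nearest(c) for c in cartoons]   # cartoon domain Y
    Q = [decompose(y)[1] for y in Y]            # reflection domain Q
    reflectgan_data = (X, Q)                    # unpaired {x_i}, {R_y,j}
    lumingan_data = list(zip(Q, Y))             # paired (R_y, y)
    return reflectgan_data, lumingan_data
```

Any decomposition routine with the same (illumination, reflectance) return convention can be plugged in for `decompose`.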
4. Training of the RexGAN model

Based on the assumption that the reflection components of images contain fine textures and have piecewise continuity, the training of the RexGAN model transfers the reflections of real photographs onto the corresponding cartoon images. Therefore, the RexGAN model includes two mappings, G: x → R_y (ReflectGAN) and F: R_y → y (LuminGAN). In addition, two adversarial discriminators D_R and D_I are introduced: D_R aims to distinguish the reflection component R_y from the converted image G(x) and corresponds to generator G; likewise, D_I aims to distinguish y from F(R_y) and corresponds to generator F. Thus, the objective function is expressed as:

(G*, F*) = arg min_{G, F} max_{D_R, D_I} L(G, F, D_R, D_I)    (1)

where arg min max denotes solving the minimax problem: similarly to training a single generative adversarial network, the two generative adversarial networks are trained by minimizing over generators G and F while maximizing over discriminators D_R and D_I.
The appearance of an object is influenced by both intrinsic and extrinsic properties. The intrinsic properties of an object, including shape and texture, are illumination independent. On this basis, ReflectGAN establishes the relation between the reflection layer of the real photograph and the reflection layer of the cartoon image, and learns the style characteristics of the cartoon image. LuminGAN essentially uses a generative adversarial network to preserve the structure of the target image and to reconstruct its illumination. Thus, the loss can be decomposed as:

L(G, F, D_R, D_I) = L_R(G, D_R) + L_I(F, D_I)    (2)

L_R(G, D_R) and L_I(F, D_I) represent the loss functions of ReflectGAN and LuminGAN, respectively; they are described in detail below.
4.1 Training of ReflectGAN
ReflectGAN is trained to learn the style characteristics of cartoon-image reflections. To reduce the training parameters of ReflectGAN, the generator model from AnimeGANv2 is introduced directly. In addition, a simple block-level discriminator is used to determine whether the generated result has the characteristics of the reflection components of cartoon images.
The loss function of ReflectGAN consists of a reflection adversarial loss L_adv^R, a content loss L_con, a reflection style loss L_gra, and a color consistency loss L_col. Thus, L_R(G, D_R) can be expressed as:

L_R(G, D_R) = ω_1 L_adv^R(G, D_R) + ω_2 L_con(G) + ω_3 L_gra(G) + ω_4 L_col(G)    (3)

where ω_1 = 300, ω_2 = 1.4, ω_3 = 2.5, and ω_4 = 100 are the weights used to balance the ReflectGAN losses. The weight values were obtained through a large number of experiments.
The realistic photograph x is input into generator G, which attempts to generate an image G(x) whose appearance style and texture are consistent with the reflection component R_y of real cartoon images, while the purpose of discriminator D_R is to distinguish the image G(x) from the reflection component R_y. Therefore, the generated image G(x) and the reflection component R_y are input into discriminator D_R to obtain the fake probability D_R(G(x)) and the real probability D_R(R_y). The discriminator pushes the real probability D_R(R_y) toward the real label 1 and the fake probability D_R(G(x)) toward the fake label 0, while the generator pushes D_R(G(x)) toward 1, so that generator G and discriminator D_R are trained in alternate iterations until convergence. In order to effectively learn the style characteristics of cartoon-image reflections, a reflection adversarial loss based on the least-squares loss is proposed to constrain generator G and discriminator D_R. The reflection adversarial loss L_adv^R can be expressed as:

L_adv^R(G, D_R) = E_{R_y ~ Q}[(D_R(R_y) - 1)^2] + E_{x ~ X}[(D_R(G(x)))^2]    (4)

where {R_y} denotes the data of the reflection components R_y in the reflection domain Q of the cartoon images, and {x} denotes the data of the real photographs x in the real-photo domain X.
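The least-squares adversarial training described above can be sketched as two scalar loss functions (a generic LSGAN-style sketch; the actual probabilities come from the block-level discriminator network):

```python
import numpy as np

def d_loss_lsgan(d_real, d_fake):
    """Discriminator side of the least-squares adversarial loss:
    push D_R(R_y) toward the real label 1 and D_R(G(x)) toward 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def g_loss_lsgan(d_fake):
    """Generator side: push D_R(G(x)) toward the real label 1."""
    return np.mean((d_fake - 1.0) ** 2)
```

Alternating updates minimize `g_loss_lsgan` over G and `d_loss_lsgan` over D_R until convergence.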
Perceptual loss is introduced as the content loss, which has the ability to preserve the image content and the overall spatial structure. Thus, the ability of VGG to extract high-level features is used to extract the high-level features of G(x), x, and R_y. In addition, the Gram matrix is used to extract the style features of the image reflections from these high-level features. Finally, the content loss L_con and the reflection style loss L_gra are defined as:

L_con(G) = E_{x ~ X}[ || VGG_l(G(x)) - VGG_l(x) ||_1 ]    (5)

L_gra(G) = E_{x ~ X, R_y ~ Q}[ || Gram(VGG_l(G(x))) - Gram(VGG_l(R_y)) ||_1 ]    (6)

where VGG_l denotes the high-level feature map extracted at layer l of a 19-layer VGG network pre-trained on the ImageNet dataset. During training, the "conv4-4" layer is chosen to compute these losses.
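The Gram-matrix style statistics and feature distances can be sketched as follows; the feature maps stand in for "conv4-4" activations of a pre-trained VGG-19, and the l1 distance and the H*W normalization of the Gram matrix are assumptions, since the patent does not spell them out:

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: channel-by-channel inner
    products of the flattened features, normalised by H * W."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (h * w)

def content_loss(feat_gx, feat_x):
    """l1 distance between feature maps of G(x) and x (content loss sketch)."""
    return np.mean(np.abs(feat_gx - feat_x))

def style_loss(feat_gx, feat_ry):
    """l1 distance between Gram matrices of G(x) and R_y features
    (reflection style loss sketch)."""
    return np.mean(np.abs(gram_matrix(feat_gx) - gram_matrix(feat_ry)))
```

The Gram matrix discards spatial layout and keeps channel correlations, which is why it captures style rather than content.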
The reflection style loss contains the color information of the style-image reflections, while the Jeip model performs image decomposition mainly on the illumination (V) channel of the HSV color space. Therefore, the RGB images are converted into HSV format and a color consistency loss is established so that the reflection color of the generated image stays close to the reflection color of the real photograph. Since a large amount of texture information is contained in the V channel, an l_1 sparsity constraint is used on the V channel, and the Huber loss l_h is used on the hue (H) and saturation (S) channels. The color consistency loss L_col is defined as:

L_col(G) = E_{x ~ X, R_x ~ P}[ l_h(H(G(x)), H(R_x)) + l_h(S(G(x)), S(R_x)) + α || V(G(x)) - V(R_x) ||_1 ]    (7)

where {x, R_x} denotes the real photographs x in the real-photo domain X and their reflection components R_x in the photo reflection domain P. H(·), S(·), and V(·) denote the three channels of the HSV image, and α denotes the weight of the V channel.
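The HSV color consistency loss can be sketched directly from its description: Huber loss on the H and S channels and a weighted l1 term on the V channel. The Huber delta is an assumption, as the patent does not state it:

```python
import numpy as np

def huber(a, b, delta=1.0):
    """Huber loss l_h: quadratic for small residuals, linear for large
    ones (delta is an assumed threshold, not given in the patent)."""
    r = np.abs(a - b)
    return np.mean(np.where(r <= delta, 0.5 * r ** 2, delta * (r - 0.5 * delta)))

def color_consistency_loss(hsv_gen, hsv_ref, alpha=2.0):
    """Sketch of L_col on HSV images of shape (H, W, 3): Huber loss on
    hue (channel 0) and saturation (channel 1), l1 on the illumination/
    value channel (channel 2) weighted by alpha (the V-channel weight)."""
    h_g, s_g, v_g = hsv_gen[..., 0], hsv_gen[..., 1], hsv_gen[..., 2]
    h_r, s_r, v_r = hsv_ref[..., 0], hsv_ref[..., 1], hsv_ref[..., 2]
    return huber(h_g, h_r) + huber(s_g, s_r) + alpha * np.mean(np.abs(v_g - v_r))
```

The l1 term on V tolerates the dense texture that channel carries, while the Huber terms keep hue and saturation stable without over-penalizing outliers.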
4.2 Training of LuminGAN
The purpose of LuminGAN is to reconstruct the illumination components of the cartoon images. Therefore, the AnimeGAN generator is used directly, together with a conventional pixel-level discriminator. To avoid generating high-frequency artifacts in the image and to reduce the training complexity, four lightweight channel attention modules (ECA) are integrated into the eight inverted residual blocks (IRB) of generator F to form a new residual block, as shown in Fig. 2.
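The channel attention inside the new residual block can be sketched as follows. The 1-D convolution kernel is identity-initialised purely for illustration (a trained ECA module learns these weights), so this shows only the data flow, not the learned behaviour:

```python
import numpy as np

def eca_attention(x, k=3):
    """Sketch of a lightweight channel attention (ECA-style) module on a
    (C, H, W) feature map: global average pooling, a 1-D convolution of
    kernel size k across channels, and a sigmoid gate per channel."""
    c = x.shape[0]
    pooled = x.mean(axis=(1, 2))                 # (C,) channel descriptor
    weights = np.zeros(k)
    weights[k // 2] = 1.0                        # stand-in (identity) kernel
    padded = np.pad(pooled, k // 2, mode="edge")
    conv = np.array([np.dot(padded[i:i + k], weights) for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-conv))           # sigmoid gate in (0, 1)
    return x * gate[:, None, None]               # rescale each channel

def inverted_residual_with_eca(x, k=3):
    """Residual connection wrapped around the attention, mimicking the
    combination of ECA modules with inverted residual blocks."""
    return x + eca_attention(x, k)
```

The module adds almost no parameters (only the k-tap kernel per block), which is what makes it lightweight.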
In order to give the generated images clearer edges and better visual perception, the cartoon images and their reflections are trained as a paired dataset for LuminGAN, making generator F capable of reconstructing illumination characteristics. Thus, the objective function L_I(F, D_I) mainly consists of an illumination adversarial loss L_adv^I, a content loss L_con^I, and a global consistency loss L_glo. The loss function of LuminGAN is expressed as:

L_I(F, D_I) = γ_1 L_adv^I(F, D_I) + γ_2 L_con^I(F) + γ_3 L_glo(F)    (8)

where γ_1 = 150, γ_2 = 0.5, and γ_3 = 1000 are the weights used to balance the LuminGAN losses.
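Assembling the weighted objectives with the stated weights (ω_1..ω_4 for ReflectGAN, γ_1..γ_3 for LuminGAN) can be sketched as two small helpers; the individual loss terms are passed in as already-computed scalars:

```python
def reflectgan_loss(l_adv, l_con, l_gra, l_col,
                    w=(300.0, 1.4, 2.5, 100.0)):
    """Weighted ReflectGAN objective with the weights stated in the text
    (omega_1 = 300, omega_2 = 1.4, omega_3 = 2.5, omega_4 = 100)."""
    return w[0] * l_adv + w[1] * l_con + w[2] * l_gra + w[3] * l_col

def lumingan_loss(l_adv, l_con, l_glo,
                  g=(150.0, 0.5, 1000.0)):
    """Weighted LuminGAN objective with the weights stated in the text
    (gamma_1 = 150, gamma_2 = 0.5, gamma_3 = 1000)."""
    return g[0] * l_adv + g[1] * l_con + g[2] * l_glo
```

The relative magnitudes show the design emphasis: the adversarial and consistency terms dominate, while the content terms act as mild regularizers.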
The reflection component R_y of the cartoon image is input into generator F, which attempts to generate an image F(R_y) that is consistent with the real cartoon image y, while the purpose of discriminator D_I is to distinguish the synthesized image F(R_y) from the real cartoon image y. The illumination adversarial loss is constrained in the same way as equation (4):

L_adv^I(F, D_I) = E_{y ~ Y}[(D_I(y) - 1)^2] + E_{R_y ~ Q}[(D_I(F(R_y)))^2]    (9)

except that the cartoon image y and the synthesized image F(R_y) are input into discriminator D_I to obtain the real probability D_I(y) and the fake probability D_I(F(R_y)). The discriminator pushes D_I(y) toward the real label 1 and D_I(F(R_y)) toward the fake label 0, while the generator pushes D_I(F(R_y)) toward 1, so that generator F and discriminator D_I are trained in alternate iterations until convergence. {y} denotes the data of the cartoon images y in the cartoon domain Y.
To accelerate convergence in LuminGAN training, a content loss L_con^I with the same structure as the content loss in ReflectGAN is added to constrain generator F, the only difference being that the input real photograph x in ReflectGAN is replaced by the reflection component R_y of the cartoon image:

L_con^I(F) = E_{R_y ~ Q}[ || VGG_l(F(R_y)) - VGG_l(R_y) ||_1 ]    (10)

To highlight the edge structure of the image, the HSV-space color consistency loss is introduced to constrain generator F, and this consistency loss is applied to the entire image. Therefore, the global consistency loss L_glo is defined as:

L_glo(F) = E_{R_y ~ Q, y ~ Y}[ l_h(H(F(R_y)), H(y)) + l_h(S(F(R_y)), S(y)) + β || V(F(R_y)) - V(y) ||_1 ]    (11)

where {R_y, y} denotes the reflection components R_y in the reflection domain Q of the cartoon images and the cartoon images y in the cartoon domain Y. H(·), S(·), and V(·) denote the three channels of the HSV image, and β = 2 denotes the weight of the V channel.
5. Synthesis of cartoon images
Generator G first converts the input photographic image x into G(x), whose statistical characteristics are similar to those of the reflection components of cartoon images; generator F then converts G(x) into the cartoon image F(G(x)).
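The test-stage pipeline is a simple composition of the two trained generators, which can be sketched with the generators passed in as callables (the lambdas in the test are placeholders, not real networks):

```python
import numpy as np

def cartoonize(x, generator_g, generator_f):
    """Test-stage pipeline: generator G maps the photo x to an image G(x)
    with cartoon-reflection-like statistics; generator F then reconstructs
    the illumination to produce the final cartoon image F(G(x))."""
    gx = generator_g(x)       # photo -> cartoon-reflection statistics
    return generator_f(gx)    # reflection -> final cartoon image
```

Any trained G and F with matching input/output shapes can be plugged in; only the real photograph x is needed at test time.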
In order to verify the effectiveness of the proposed solution of the invention, experimental verification was carried out from different aspects. The results were compared with four of the most advanced works at present, and the experimental results are shown in fig. 3. As a result, the texture information of the real photo is kept, the clear edge is reproduced, and the whole color is more in line with the visual effect. In addition, attempts were made to test the method from different styles, with the results shown in FIG. 4.
Fig. 5 shows how the color consistency loss affects the generation results of ReflectGAN. The content loss and reflection style loss enable generator G to produce stylized images, but they tend toward over-stylization. Images (c) and (d) show clear textures once the color consistency loss is used. However, different values of α have a significant effect on the results: compared with α = 1, the result with α = 2 shows more accurate color. To further investigate the effect of different compositions of the global consistency loss on LuminGAN, β = 2 was set in equation (11), and the three constituent forms of the loss were compared. As shown in Fig. 6, although the RGB-format global consistency presents a good appearance, its structure is blurred; conversely, the result using only the HSV format as a constraint is the opposite of the RGB format. In contrast to both, image (d) shows sharp edges and a pleasing color appearance.
Claims (1)
1. A cartoon style migration method based on a Retinex model is characterized by comprising the following steps:
The method is realized by a training stage and a testing stage. First, two generative adversarial networks are trained simultaneously: generator G with discriminator D_R, and generator F with discriminator D_I. Then the real photo x is taken as input and passed successively through generator G and generator F to obtain the output image F(G(x));
the method comprises the following three steps: preprocessing a data set, training a RexGAN model and synthesizing cartoon images;
1) preprocessing of data sets
To efficiently estimate the illumination component I and reflection component R of real photos and cartoon images, a joint intrinsic-extrinsic prior model (JieP) is integrated into the mapping function RexGAN. The cartoon image y and the real photo x are each decomposed with the JieP model to obtain the reflection component R_y of the cartoon image and the reflection component R_x of the real photo. Unpaired training data sets are then used to train generator G to learn a mapping function (ReflectGAN) from the photo domain X of the photo image x to the reflection domain Q of the cartoon image y; a paired training data set is used to train generator F to learn a mapping function (LuminGAN) from the cartoon reflection domain Q to the cartoon domain Y. The training data comprise real photos x and cartoon images y, while the test data comprise only real photos x. In training RexGAN, the data set provided by AnimeGAN is used, and all training images are resized to 256 × 256;
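As an illustrative sketch only: the JieP prior model named above is a published optimization method whose details are not reproduced in this text, so a minimal NumPy stand-in for a Retinex-style decomposition of the V channel (local mean as illumination, pointwise quotient as reflection) is given below. The names `estimate_illumination` and `retinex_decompose` are hypothetical.

```python
import numpy as np

def estimate_illumination(v, ksize=15):
    """Crude illumination estimate: local mean of the V (value) channel.
    This is only a stand-in for the JieP joint intrinsic-extrinsic
    prior model; the real model solves a variational problem."""
    pad = ksize // 2
    padded = np.pad(v, pad, mode="edge")
    out = np.empty_like(v)
    for i in range(v.shape[0]):
        for j in range(v.shape[1]):
            out[i, j] = padded[i:i + ksize, j:j + ksize].mean()
    # clip away zeros so the division below is safe
    return np.clip(out, 1e-3, 1.0)

def retinex_decompose(v):
    """Split the V channel into illumination I and reflection R
    under the Retinex assumption V = I * R."""
    illum = estimate_illumination(v)
    reflect = v / illum
    return illum, reflect
```

In the method above this decomposition would be applied to both the real photo x (giving R_x) and the cartoon image y (giving R_y) before any network training.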
2) training of RexGAN model
Based on the assumption that the reflection component of an image contains fine textures and is piecewise continuous, the reflection of the real photo is transferred to the corresponding cartoon image. The RexGAN model therefore includes two mappings, G: X → R_y (ReflectGAN) and F: R_y → y (LuminGAN). In addition, two adversarial discriminators D_R and D_I are introduced: D_R distinguishes the image R_y from the converted image G(x) and corresponds to generator G; likewise, D_I distinguishes y from F(R_y) and corresponds to generator F. The overall objective function is expressed as:
where arg min max denotes solving the min-max problem: as in training a single generative adversarial network, training the two generative adversarial networks amounts to minimizing over generators G and F while maximizing over discriminators D_R and D_I;
LuminGAN essentially uses a generative adversarial network to preserve the structure of the target image and reconstruct its illumination; the objective thus decomposes as:
L(G, F, D_R, D_I) = L_R(G, D_R) + L_I(F, D_I)    (2)
where L_R(G, D_R) and L_I(F, D_I) denote the loss functions of ReflectGAN and LuminGAN, respectively, which are described in detail below;
2.1 Training of ReflectGAN
ReflectGAN is trained to learn the style characteristics of cartoon image reflection. To reduce the number of training parameters of ReflectGAN, the generator model of AnimeGANv2 is adopted directly. In addition, a simple patch-level discriminator is used to judge whether the generated result has the characteristics of the cartoon image reflection component;
The loss function of ReflectGAN consists of a reflection adversarial loss, a content loss, a reflection style loss L_gra, and a color consistency loss L_col; thus L_R(G, D_R) is expressed as:
where ω_1 = 300, ω_2 = 1.4, ω_3 = 2.5, and ω_4 = 100 are the weights used to balance the ReflectGAN losses;
The real photo x is input into generator G, which attempts to generate an image G(x) whose appearance style and texture match the reflection component R_y of a real cartoon image; the purpose of discriminator D_R is to distinguish the generated image G(x) from the reflection component R_y. The generated image G(x) and the reflection component R_y are therefore input to discriminator D_R to obtain a false probability D_R(G(x)) and a true probability D_R(R_y). The generator G is trained so that the false probability D_R(G(x)) approaches the true label 1, while the discriminator D_R is trained so that D_R(R_y) approaches the true label 1 and D_R(G(x)) approaches the false label 0; generator G and discriminator D_R are thus trained in alternating iterations until convergence. To effectively learn the style characteristics of cartoon image reflection, a reflection adversarial loss based on the least-squares loss is proposed to constrain generator G and discriminator D_R. The reflection adversarial loss is then expressed as:
where the expectations are taken over the data set of reflection components R_y in the cartoon reflection domain Q and the data set of real photos x in the real photo domain X;
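The least-squares adversarial scheme described above (fake pushed toward label 1 for the generator; real toward 1 and fake toward 0 for the discriminator) can be sketched as follows. This is a generic LSGAN-style formulation rather than the patent's exact equation (4), and the function names are hypothetical.

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push D(R_y) toward the true
    label 1 and D(G(x)) toward the false label 0."""
    return np.mean((d_real - 1.0) ** 2) + np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push D(G(x)) toward the true
    label 1, i.e. fool the discriminator."""
    return np.mean((d_fake - 1.0) ** 2)
```

During the alternating iterations, one step minimizes `lsgan_d_loss` over the discriminator's parameters, and the next minimizes `lsgan_g_loss` over the generator's parameters.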
A perceptual loss is introduced as the content loss for its ability to preserve image content and the overall spatial structure. The high-level features of G(x), x, and R_y are therefore extracted with a VGG network, and the style characteristics of the image reflection are further extracted from these high-level features with a Gram matrix. Finally, the content loss and the reflection style loss L_gra are defined as:
where the expectations are taken over the data sets of reflection components R_y in the cartoon reflection domain Q and real photos x in the real photo domain X; VGG denotes the high-level feature map extracted by a 19-layer VGG network pre-trained on the ImageNet data set, and l denotes the feature map of a specific VGG layer; during training, the 'conv4_4' layer is selected to compute this loss;
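The content and Gram-based style losses can be sketched as follows on pre-extracted feature maps. Using the l1 distance is an assumption here (the exact distances appear only in the omitted equations), and `gram_matrix`, `style_loss`, and `content_loss` are hypothetical names; in practice the feature maps would come from the 'conv4_4' layer of the pre-trained VGG-19.

```python
import numpy as np

def gram_matrix(features):
    """Gram matrix of a (C, H, W) feature map: channel-wise
    correlations that capture texture/style while discarding
    spatial layout."""
    c, h, w = features.shape
    f = features.reshape(c, h * w)
    return f @ f.T / (c * h * w)

def style_loss(feat_gen, feat_style):
    """l1 distance between Gram matrices, standing in for the
    reflection style loss L_gra."""
    return np.abs(gram_matrix(feat_gen) - gram_matrix(feat_style)).mean()

def content_loss(feat_gen, feat_content):
    """l1 distance between raw feature maps (perceptual content loss)."""
    return np.abs(feat_gen - feat_content).mean()
```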
The reflection style loss carries the color information of the style image reflection, and the JieP model performs image decomposition mainly on the illumination (V) channel of the HSV color space. The RGB-format image is therefore converted to HSV format, and a color consistency loss is established so that the reflection color of the generated image stays close to that of the real photo. Since the V channel contains a large amount of texture information, an l1 sparsity constraint is applied to the V channel, while the Huber loss l_h is applied to the hue (H) and saturation (S) channels. The color consistency loss L_col is defined as:
where the expectations are taken over the data set of real photos x in the real photo domain X and real-photo reflection components R_x in the real-photo reflection domain P; H(·), S(·), and V(·) denote the three channels of the HSV-format image, and α denotes the weight of the V channel;
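A minimal sketch of the HSV-space color consistency loss, assuming the l1 term on V and the Huber terms l_h on H and S combine additively with the V term weighted by α (the exact combination appears only in the omitted equation); `color_consistency_loss` is a hypothetical name.

```python
import numpy as np

def huber(x, delta=1.0):
    """Huber loss l_h: quadratic near zero, linear in the tails."""
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))

def color_consistency_loss(gen_hsv, ref_hsv, alpha=2.0):
    """L_col sketch: Huber loss on the H and S channels, l1 on the
    V channel weighted by alpha (the experiments above use alpha = 2).
    Each argument is a (H_chan, S_chan, V_chan) tuple of 2D arrays."""
    h_g, s_g, v_g = gen_hsv
    h_r, s_r, v_r = ref_hsv
    return (huber(h_g - h_r).mean()
            + huber(s_g - s_r).mean()
            + alpha * np.abs(v_g - v_r).mean())
```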
2.2 Training of LuminGAN
Four lightweight efficient channel attention (ECA) modules are integrated into the eight inverted residual blocks (IRB) of generator F to form a new residual block;
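The ECA module referenced above is the published efficient channel attention block: global average pooling, a small 1D convolution across the channel descriptor, and a sigmoid gate. A minimal NumPy sketch with an explicit filter is given below; how the four modules are placed inside the inverted residual blocks is not detailed in this text, so `eca_attention` is only illustrative.

```python
import numpy as np

def eca_attention(x, kernel):
    """Efficient channel attention on a (C, H, W) feature map:
    global average pooling gives one descriptor per channel, a 1D
    convolution (explicit filter `kernel`, zero-padded) models
    cross-channel interaction, and a sigmoid gate rescales channels."""
    c = x.shape[0]
    k = kernel.shape[0]
    desc = x.mean(axis=(1, 2))               # (C,) squeeze
    padded = np.pad(desc, k // 2)            # zero-pad the descriptor
    conv = np.array([padded[i:i + k] @ kernel for i in range(c)])
    gate = 1.0 / (1.0 + np.exp(-conv))       # sigmoid excitation
    return x * gate[:, None, None]           # channel-wise rescale
```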
The cartoon image and its reflection component are used as a paired data set to train LuminGAN, so that generator F acquires the ability to reconstruct illumination characteristics. The objective function L_I(F, D_I) consists mainly of an illumination adversarial loss, a content loss, and a global consistency loss L_glo. The loss function of LuminGAN is expressed as:
where γ_1 = 150, γ_2 = 0.5, and γ_3 = 1000 are the weights used to balance the LuminGAN losses;
The reflection component R_y of the cartoon image is input into generator F, which attempts to generate an image F(R_y) that coincides with the real cartoon image y; the aim of discriminator D_I is to distinguish the synthesized image F(R_y) from the real cartoon image y;
Equation (9) is constrained in the same way as equation (4), except that the cartoon image y and its reflection component R_y are separately input to the discriminator D_I to obtain a true probability D_I(y) and a false probability D_I(F(R_y)). The generator F is trained so that the false probability D_I(F(R_y)) approaches the true label 1, while the discriminator D_I is trained so that D_I(y) approaches the true label 1 and D_I(F(R_y)) approaches the false label 0; generator F and discriminator D_I are thus trained in alternating iterations until convergence; the expectation is taken over the data set of cartoon images y in the cartoon image domain Y;
To accelerate convergence during LuminGAN training, a content loss with the same structure as the content loss in ReflectGAN is added to constrain generator F; the only difference is that the input real photo x of ReflectGAN is replaced by the reflection component R_y of the input cartoon image. To highlight the edge structure of the image, an HSV-space color consistency loss is introduced to constrain generator F, and this consistency loss is applied to the entire image. The global consistency loss L_glo is therefore defined as:
where the expectation is taken over the data set of reflection components R_y in the cartoon reflection domain Q and cartoon images y in the cartoon image domain Y; H(·), S(·), and V(·) denote the three channels of the HSV-format image, and β = 2 is the weight of the V channel.
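Assuming the global consistency loss mirrors L_col but is applied to the whole image with β = 2 weighting the V channel (consistent with the description above, though the exact equation is omitted from this text), a sketch could be:

```python
import numpy as np

def huber(x, delta=1.0):
    """Huber loss l_h: quadratic near zero, linear in the tails."""
    a = np.abs(x)
    return np.where(a <= delta, 0.5 * a ** 2, delta * (a - 0.5 * delta))

def global_consistency_loss(gen_hsv, target_hsv, beta=2.0):
    """L_glo sketch: HSV-space consistency between the synthesized
    image F(R_y) and the cartoon image y over the entire image;
    beta weights the V (illumination) channel, mirroring alpha in
    L_col. Each argument is a (H, S, V) tuple of 2D arrays."""
    h_g, s_g, v_g = gen_hsv
    h_t, s_t, v_t = target_hsv
    return (huber(h_g - h_t).mean()
            + huber(s_g - s_t).mean()
            + beta * np.abs(v_g - v_t).mean())
```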
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110305033.7A CN113066114A (en) | 2021-03-10 | 2021-03-10 | Cartoon style migration method based on Retinex model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113066114A true CN113066114A (en) | 2021-07-02 |
Family
ID=76563397
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110305033.7A Pending CN113066114A (en) | 2021-03-10 | 2021-03-10 | Cartoon style migration method based on Retinex model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113066114A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110458750A (en) * | 2019-05-31 | 2019-11-15 | 北京理工大学 | A kind of unsupervised image Style Transfer method based on paired-associate learning |
CN111325661A (en) * | 2020-02-21 | 2020-06-23 | 京工数演(福州)科技有限公司 | Seasonal style conversion model and method for MSGAN image |
CN112330535A (en) * | 2020-11-27 | 2021-02-05 | 江南大学 | Picture style migration method |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113870102A (en) * | 2021-12-06 | 2021-12-31 | 深圳市大头兄弟科技有限公司 | Animation method, device, equipment and storage medium of image |
CN113870102B (en) * | 2021-12-06 | 2022-03-08 | 深圳市大头兄弟科技有限公司 | Animation method, device, equipment and storage medium of image |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||