CN111275651A - Face highlight removal method based on adversarial neural network - Google Patents

Face highlight removal method based on adversarial neural network

Info

Publication number
CN111275651A
Authority
CN
China
Prior art keywords
image, discriminator, face, generator, light
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010115474.6A
Other languages
Chinese (zh)
Other versions
CN111275651B (en)
Inventor
朱婷
夏思宇
杨凡
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN202010115474.6A
Publication of CN111275651A
Application granted
Publication of CN111275651B
Legal status: Active
Anticipated expiration


Classifications

    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL (G PHYSICS; G06 COMPUTING)
    • G06T 5/00 Image enhancement or restoration
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30201 Face

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a face highlight removal method based on an adversarial neural network, comprising the following steps: synthesizing human faces by computer graphics to build a virtual face dataset of paired highlighted and highlight-free face images; building a real face dataset of paired images by photographing faces both normally and through a polarizing lens; constructing an adversarial neural network and training its parameters on the datasets to obtain a highlight removal network model; and inputting a highlighted face image into the trained model to obtain the corresponding highlight-free image. Through the adversarial neural network, the invention achieves more natural and realistic face highlight removal.

Description

Face highlight removal method based on an adversarial neural network
Technical Field
The invention relates to the field of pattern recognition, and in particular to a face highlight removal method based on an adversarial neural network.
Background
Skin reflects illumination from ambient light sources, which appears in images as specular highlights on the skin. These highlights cover the original color and texture of the affected region, and local highlights on the human face cause problems in computer vision, computer graphics, and medical examination. In computer vision, facial highlights hinder face detection and face recognition, leading to missed detections and recognition errors. In computer graphics, the lighting of a rendered face must be consistent with its environment, and skin should not show highlights under a light source that would not produce them. In medical examination, highlights interfere with the detection of facial pigmented skin diseases: in regions covered by highlights it is difficult to determine whether disease is present, which hampers diagnosis and the design of treatment plans. Removing highlights from the human face is therefore highly necessary.
Because skin is a complex material, general highlight removal methods are not suitable, and existing research on highlight removal cannot eliminate highlights on facial skin naturally and realistically, especially when dense blemishes such as freckles are present: the goal is to remove the highlight naturally while recovering, as far as possible, the original skin of the highlighted region together with features such as spots and moles. Removing highlights from a human face is therefore a complex and challenging problem. Methods that remove highlights by building an illumination model are constrained by the physical model and cannot fully reflect the real lighting environment, especially for complex materials such as facial skin, so their removal results are not realistic enough.
With the rise of artificial intelligence and the continued development of deep learning algorithms, an end-to-end neural network can predict the corresponding highlight-free image directly from a highlighted original image. Compared with illumination-model-based removal methods, this approach is not constrained by a fixed illumination model and can recover the skin in highlighted facial regions more faithfully.
Disclosure of Invention
Purpose of the invention: the technical problem to be solved by the invention is to provide a face highlight removal method based on an adversarial neural network, which uses an end-to-end neural network to repair highlighted facial regions, removing highlights naturally and realistically while recovering the original skin color and texture of the corresponding regions, and which is applicable to a wide variety of highlighted faces.
Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme. A face highlight removal method based on an adversarial neural network comprises the following steps:
(1) synthesizing human faces by computer graphics, generating paired highlighted and highlight-free virtual face images, and building a virtual face dataset;
(2) obtaining paired highlighted and highlight-free face images in a real environment by photographing faces both normally and through a polarizing lens, and building a real face dataset;
(3) building a generative adversarial neural network model;
(4) pre-training the neural network with the virtual face dataset;
(5) fine-tuning the pre-trained network parameters with the real face dataset to obtain the final highlight removal network model;
(6) inputting the face image to be processed into the highlight removal network model to obtain the corresponding highlight-free face image.
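The six steps above can be sketched as a minimal pipeline (illustrative only; `build_gan`, `pretrain`, and `finetune` are hypothetical placeholders for the procedures detailed in the following sections, not the patent's actual implementation):

```python
def train_highlight_removal(build_gan, pretrain, finetune, virtual_set, real_set):
    """Steps (3)-(5): build the adversarial network, pre-train it on the
    virtual face dataset, then fine-tune it on the smaller real dataset."""
    generator, discriminator = build_gan()
    generator, discriminator = pretrain(generator, discriminator, virtual_set)
    generator, discriminator = finetune(generator, discriminator, real_set)
    return generator

def remove_highlight(generator, face_image):
    """Step (6): one forward pass maps a highlighted face image to its
    predicted highlight-free counterpart."""
    return generator(face_image)
```

The placeholder callables stand in for the dataset construction and training procedures; only the control flow between steps is shown.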
Further, the virtual face dataset in step (1) is built as follows:
(1.1) collecting frontal, highlight-free face images of people from the internet as material for generating three-dimensional faces;
(1.2) generating the three-dimensional face corresponding to each image with HeadShot, adding a light source, setting different reflectance levels for the face material, obtaining highlighted face images under different expressions via the preset expressions of the three-dimensional face model, capturing the highlight-free face image (no light source) and the highlighted face image (light source present) as screenshots, and building the virtual face dataset.
Further, the real face dataset in step (2) is built as follows:
(2.1) collecting data with a camera, setting the positions and angles of the camera, the subject, the polarizing lens, and the light source, so that highlights can be eliminated by the polarizing lens during shooting;
(2.2) during image collection, the subject stays still in a fixed position; real face images with different highlights are shot under different illumination intensities, the polarizing lens is used to shoot the highlight-free version of each corresponding image, and the real face dataset is built.
Further, the generative adversarial neural network model in step (3) is built as follows:
An adversarial neural network is constructed, comprising a generator and a discriminator. The generator adopts a U-net structure with 15 layers in total; the features of the i-th layer are connected with those of the (15-i)-th layer, i ∈ [1, 7], so that corresponding layers share low-level information and the output is improved. The generator's input image size is 256 × 256. Downsampling is performed by convolution with a 4 × 4 kernel, batch normalization, and ReLU activation; after each of the 8 downsampling steps the spatial dimensions of the features are, in order: 128 × 128, 64 × 64, 32 × 32, 16 × 16, 8 × 8, 4 × 4, 2 × 2, and 1 × 1. Upsampling is then performed by activation, transposed convolution with a 4 × 4 kernel, and batch normalization; after each of the 8 upsampling steps the dimensions are: 2 × 2, 4 × 4, 8 × 8, 16 × 16, 32 × 32, 64 × 64, 128 × 128, and a final output of size 256 × 256.
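The layer pairing and downsampling schedule just described can be checked with a small sketch (pure bookkeeping, not the network itself; the function names are ours):

```python
def unet_skip_pairs(total_layers=15, max_i=7):
    """Skip connections of the 15-layer U-net: the i-th layer's features
    are concatenated with the (total_layers - i)-th layer's, i in [1, 7]."""
    return [(i, total_layers - i) for i in range(1, max_i + 1)]

def downsample_dims(size=256, steps=8):
    """Spatial size after each of the stride-2 downsampling steps."""
    dims = []
    for _ in range(steps):
        size //= 2
        dims.append(size)
    return dims
```

Running `downsample_dims()` reproduces the sequence 128, 64, ..., 1 given in the text, and reversing it (plus the final 256 output) gives the upsampling sequence.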
the discriminator is a Markov discriminator (PatchGAN), the working principle of the discriminator is that an input image is divided into N × N small blocks for judgment, the average value of the judgment results of all the small blocks is used as an output result, the multi-scale judgment is realized by changing the size of the input image without changing the parameter N of the discriminator, the result of the discriminator is the average value of the results of 2 discriminators with different scales, and the training process is optimized by adopting an Adam optimizer.
Further, in step (4) the neural network is pre-trained with the virtual face dataset as follows. The network constructed in step (3) is pre-trained with the virtual face dataset to obtain the generator and discriminator parameters of the pre-trained model. The specific adversarial training principle is as follows: a highlighted image a from the training set is input into the generator G to obtain the generated image G(a); the pair (a, G(a)) is input into the discriminators D1 and D2 at different scales to obtain the discriminator result D(a, G(a)). Since the output of the initial generator is not realistic, the probability that the discriminator judges the generated image to be real is close to 0, i.e. D(a, G(a)) ≈ 0. The highlighted image a and its corresponding highlight-free image a′ from the training set are input into D1 and D2 to obtain the result D(a, a′); the discriminator judges this input to be a real image, so D(a, a′) ≈ 1. The generator parameters are updated to maximize the probability D(a, G(a)) that the generated image is judged real; when this value exceeds 0.5, the discriminator considers the generated image real. Then, keeping the generator parameters fixed, the discriminator is trained to minimize D(a, G(a)); when this value falls below 0.5, the discriminator considers the generated image fake. The generator and discriminator are trained alternately until the discriminator cannot distinguish real images from generated images, i.e. Nash equilibrium is reached between generator and discriminator, and training ends.
To ensure similarity between the generated image G(a) and the highlight-free image a′ in the training set, the L1 distance between G(a) and a′ is added to the loss function. The overall objective is
G* = arg min_G max_D L_GAN(G, D) + λ L_L1(G),
with the adversarial loss of the generator and discriminator
L_GAN(G, D) = E_{a,a′}[log D(a, a′)] + E_a[log(1 − D(a, G(a)))].
The discriminator D is trained to maximize L_GAN, and the generator G to minimize it. The L1-norm loss is L_L1(G) = E_{a,a′}[||a′ − G(a)||_1], where E denotes expectation and λ is a weighting coefficient.
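The loss terms above can be written out numerically; a minimal numpy sketch, with the expectations approximated by sample means over a batch (the function names are ours, and λ = 100 follows the value given later in the description):

```python
import numpy as np

def gan_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """L_GAN = E[log D(a, a')] + E[log(1 - D(a, G(a)))], with the
    expectations approximated by means over batches of scores."""
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake)))

def l1_loss(target: np.ndarray, generated: np.ndarray) -> float:
    """L_L1 = E[ ||a' - G(a)||_1 ]: mean absolute difference between
    the highlight-free target a' and the generated image G(a)."""
    return float(np.mean(np.abs(target - generated)))

def generator_objective(d_fake: np.ndarray, target, generated, lam=100.0) -> float:
    """The generator minimizes its adversarial term plus lam * L1."""
    return float(np.mean(np.log(1.0 - d_fake))) + lam * l1_loss(target, generated)
```

At D = 0.5 everywhere (the indistinguishable case) `gan_loss` evaluates to 2·log(0.5), which is the value the alternating training drives it toward.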
Further, in step (5) the pre-trained network parameters are fine-tuned with the real face dataset to obtain the final highlight removal network model. On the basis of step (4), training continues with the real face dataset, and the generator and discriminator parameters are adjusted so that the final highlight removal model generates highlight-free face images that match real faces. The adversarial training principle is the same as in step (4): a highlighted image a from the training set is input into the generator G to obtain the generated image G(a); the pair (a, G(a)) is input into the discriminators D1 and D2 at different scales to obtain the result D(a, G(a)). Since the output of the initial generator is not realistic, D(a, G(a)) ≈ 0. The pair (a, a′), where a′ is the corresponding highlight-free image, is input into D1 and D2; the discriminator judges it real, so D(a, a′) ≈ 1. The generator parameters are updated to maximize D(a, G(a)); above 0.5, the discriminator considers the generated image real. Keeping the generator parameters fixed, the discriminator is trained to minimize D(a, G(a)); below 0.5, the discriminator considers the generated image fake. The generator and discriminator are trained alternately until the discriminator cannot distinguish real from generated images, i.e. Nash equilibrium is reached, and training ends.
To ensure similarity between the generated image G(a) and the highlight-free image a′ in the training set, the L1 distance between G(a) and a′ is added to the loss function. The overall objective is
G* = arg min_G max_D L_GAN(G, D) + λ L_L1(G),
with L_GAN(G, D) = E_{a,a′}[log D(a, a′)] + E_a[log(1 − D(a, G(a)))]. The discriminator D is trained to maximize L_GAN, and the generator G to minimize it; the L1-norm loss is L_L1(G) = E_{a,a′}[||a′ − G(a)||_1], where E denotes expectation and λ is a weighting coefficient.
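The alternation used in both pre-training and fine-tuning can be sketched as a skeleton loop (the step callables and the 0.5-based stopping test are illustrative stand-ins for the Nash-equilibrium criterion, not the patent's code):

```python
def alternating_training(g_step, d_step, d_fake_prob, max_epochs=100, tol=0.05):
    """Alternate one generator update (pushing D(a, G(a)) toward 1) and
    one discriminator update (pushing it back toward 0) until the
    discriminator's score on generated images settles near 0.5,
    i.e. it can no longer tell generated images from real ones."""
    for epoch in range(max_epochs):
        g_step()
        d_step()
        if abs(d_fake_prob() - 0.5) < tol:
            return epoch + 1  # number of epochs actually run
    return max_epochs
```

In a real implementation `g_step` and `d_step` would be gradient updates under the Adam optimizer, and `d_fake_prob` the discriminator's mean score on a batch of generated images.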
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following advantages:
(1) the invention removes facial highlights with an end-to-end neural network; it is not constrained by a fixed physical illumination model and produces a more realistic removal effect;
(2) compared with highlight removal for other materials, the invention adapts well to the complex material of skin;
(3) the invention not only recovers the original skin color of the highlighted region but also has a certain ability to recover spots, color patches, and similar features in that region;
(4) additional features and advantages of the invention will be set forth in the description which follows, and in part will be apparent from the description or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description, claims, and appended drawings.
Drawings
FIG. 1 is a flow chart of the face highlight removal method based on an adversarial neural network implemented by the present invention;
FIG. 2 is a flow chart of training the adversarial neural network in the present invention;
FIG. 3 is a schematic structural diagram of the adversarial neural network generator constructed in the present invention;
FIG. 4 is a schematic diagram of the highlight removal effect of the present invention.
Detailed Description
The embodiments of the present invention are described in detail below with reference to the drawings and examples, to show fully how technical means are applied to solve the technical problems and achieve the technical effects. Provided there is no conflict, the embodiments and their features may be combined with one another, and the resulting technical solutions all fall within the scope of the present invention.
Additionally, the steps illustrated in the flow charts of the figures may be performed in a computer system such as a set of computer-executable instructions and, although a logical order is illustrated in the flow charts, in some cases, the steps illustrated or described may be performed in an order different than here.
Fig. 1 is a flowchart of the face highlight removal method based on an adversarial neural network according to the present invention; the steps are described in detail below with reference to Fig. 1.
Step S110: because there is no public dataset of highlighted face images, and collecting a large number of paired real face images is time-consuming, labor-intensive, and difficult, a computer graphics method is used to add highlights to face images, producing paired highlighted and highlight-free virtual face images and building the virtual face dataset.
Face images are collected from the internet, the corresponding three-dimensional face is generated with HeadShot, a light source is added, different reflectance levels are set for the face material, and highlighted face images under different expressions are obtained via the preset expressions of the three-dimensional face model. The virtual face dataset contains male and female faces with black, white, and yellow skin tones, ensuring its comprehensiveness and laying a foundation for applying the subsequently trained network to diverse populations.
Step S120: faces generated by computer graphics look somewhat artificial, so corresponding real images of highlighted and highlight-free faces are needed to compensate. During image collection, the subject stays still in a fixed position; real face images with different highlights are shot under different illumination intensities (with the light source angle fixed), and a polarizing lens mounted at a fixed angle is then used to remove the highlights, giving the highlight-free version of the same face. A smaller real face dataset is built in this way.
Step S130: an adversarial neural network comprising a generator and a discriminator is constructed; their parameters are obtained in the subsequent training, so that the generator produces natural highlight-free images and the discriminator comes to judge the generated images as real, natural highlight-free images.
Step S140: the constructed network is pre-trained with the virtual face dataset, yielding the generator and discriminator parameters of the pre-trained model.
Step S150: on the basis of the pre-trained model, training continues with the real face dataset, adjusting the generator and discriminator parameters so that the final highlight removal model generates natural, artifact-free highlight-free face images matching real faces. Because the real face dataset is small, data augmentation by image flipping is applied to increase the data volume and improve the trained network's performance. The generator and discriminator parameters obtained from the final training are saved.
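The flip-based augmentation mentioned here can be sketched with numpy (a minimal illustration; the patent does not specify the exact flipping code, and the key point is that the paired target must be flipped identically so the correspondence survives):

```python
import numpy as np

def augment_by_flipping(pairs):
    """Double a paired dataset: for each (highlighted, highlight-free)
    pair, also keep the horizontally flipped pair."""
    out = []
    for x, y in pairs:
        out.append((x, y))
        out.append((x[:, ::-1], y[:, ::-1]))  # flip both images together
    return out
```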
Step S160: a highlighted face image is input into the highlight removal network model obtained in the previous step to produce the corresponding highlight-free face image.
FIG. 2 is a flow chart of training the adversarial neural network, divided into pre-training and parameter fine-tuning. The network consists of two parts, a generator and a discriminator. The generator adopts a U-net structure with 15 layers; the features of the i-th layer are connected with those of the (15-i)-th layer, so corresponding layers share low-level information and the output is improved. A standard Markov discriminator (PatchGAN) is used; since a single discriminator has a limited receptive field, the invention adopts a multi-scale scheme in which the discriminators capture detail information and global information respectively. By resizing the input image and feeding it into discriminators with the same structure, detail information is extracted at different scales. The discriminator loss is the sum of the losses at all scales.
Both the virtual face dataset and the real face dataset are paired. To focus the network's attention on the differences in the highlight regions, paired images are concatenated as input, so the network can learn the image residual and thus learn features of the highlight regions effectively. The overall loss function is the adversarial loss of the generator and discriminator plus the L1-norm loss between the generator's output and the corresponding highlight-free image in the dataset. The training process is optimized with the Adam optimizer.
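Concatenating a pair so the network sees both images at once can be sketched as follows (an assumption about the exact layout; channel-last numpy arrays are used here purely for illustration):

```python
import numpy as np

def concat_pair(highlighted: np.ndarray, partner: np.ndarray) -> np.ndarray:
    """Stack the two paired images along the channel axis, giving the
    network direct access to their residual in the highlight regions."""
    assert highlighted.shape == partner.shape
    return np.concatenate([highlighted, partner], axis=-1)
```

Two 256 × 256 × 3 images would thus become one 256 × 256 × 6 input tensor.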
The specific training procedure is: pre-train with the virtual face dataset to obtain rough generator and discriminator parameters, then train with the real face dataset, fine-tuning the generator and discriminator parameters on real face images so that the generator's output becomes more realistic and natural and the discriminator judges image authenticity and highlight removal quality more accurately. The specific adversarial training principle is as follows: a highlighted image a from the training set is input into the generator G to obtain the generated image G(a); the pair (a, G(a)) is input into the discriminators D1 and D2 at different scales to obtain the discriminator result D(a, G(a)). Since the initial generator's output is not realistic, the probability that the discriminator judges it real is close to 0, i.e. D(a, G(a)) ≈ 0. The pair (a, a′), where a′ is the corresponding highlight-free image, is input into D1 and D2; the discriminator judges it real, so D(a, a′) ≈ 1. The generator parameters are updated to maximize D(a, G(a)); above 0.5, the discriminator considers the generated image real. Keeping the generator parameters fixed, the discriminator is trained to minimize D(a, G(a)); below 0.5, the discriminator considers the generated image fake. The generator and discriminator are trained alternately in this way until the discriminator cannot tell whether an image was generated (real and generated images are indistinguishable), i.e. Nash equilibrium is reached, and training ends. To ensure similarity between the generated image G(a) and the highlight-free image a′ in the training set, the L1 distance between G(a) and a′ is added to the loss function. The overall objective is
G* = arg min_G max_D L_GAN(G, D) + λ L_L1(G),
with L_GAN(G, D) = E_{a,a′}[log D(a, a′)] + E_a[log(1 − D(a, G(a)))]. The discriminator D is trained to maximize L_GAN, and the generator G to minimize it; the L1-norm loss is L_L1(G) = E_{a,a′}[||a′ − G(a)||_1], where E denotes expectation and λ is a weighting coefficient, set to 100 during training.
FIG. 3 is a schematic structural diagram of the adversarial neural network generator, which adds dimension-wise concatenation on top of a plain Encoder-Decoder structure. The generator adopts a U-net structure with 15 layers; the features of the i-th layer are connected with those of the (15-i)-th layer (i ∈ [1, 7]), so corresponding layers share low-level information and the output is improved. The input image size is 256 × 256. Downsampling is performed by convolution (4 × 4 kernel), batch normalization, and ReLU activation; after each of the 8 downsampling steps the feature dimensions are, in order: 128 × 128, 64 × 64, 32 × 32, 16 × 16, 8 × 8, 4 × 4, 2 × 2, and 1 × 1. Upsampling is then performed by activation, transposed convolution (4 × 4 kernel), and batch normalization; after each of the 8 upsampling steps the dimensions are: 2 × 2, 4 × 4, 8 × 8, 16 × 16, 32 × 32, 64 × 64, 128 × 128, and an output of size 256 × 256.
Fig. 4 shows the highlight removal effect of the present invention.
Although the embodiments of the present invention have been described above, the above descriptions are only for the convenience of understanding the present invention, and are not intended to limit the present invention. It will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (6)

1. A face highlight removal method based on an adversarial neural network, characterized by comprising the following steps:
(1) synthesizing human faces by computer graphics, generating paired highlighted and highlight-free virtual face images, and building a virtual face dataset;
(2) obtaining paired highlighted and highlight-free face images in a real environment by photographing faces both normally and through a polarizing lens, and building a real face dataset;
(3) building a generative adversarial neural network model;
(4) pre-training the neural network with the virtual face dataset;
(5) fine-tuning the pre-trained network parameters with the real face dataset to obtain the final highlight removal network model;
(6) inputting the face image to be processed into the highlight removal network model to obtain the corresponding highlight-free face image.
2. The face highlight removal method based on an adversarial neural network according to claim 1, characterized in that the virtual face dataset in step (1) is built as follows:
(1.1) collecting frontal, highlight-free face images of people from the internet as material for generating three-dimensional faces;
(1.2) generating the three-dimensional face corresponding to each image with HeadShot, adding a light source, setting different reflectance levels for the face material, obtaining highlighted face images under different expressions via the preset expressions of the three-dimensional face model, capturing the highlight-free face image (no light source) and the highlighted face image (light source present) as screenshots, and building the virtual face dataset.
3. The face highlight removal method based on an adversarial neural network according to claim 1, characterized in that the real face dataset in step (2) is built as follows:
(2.1) collecting data with a camera, setting the positions and angles of the camera, the subject, the polarizing lens, and the light source, so that highlights can be eliminated by the polarizing lens during shooting;
(2.2) during image collection, the subject stays still in a fixed position; real face images with different highlights are shot under different illumination intensities, the polarizing lens is used to shoot the highlight-free version of each corresponding image, and the real face dataset is built.
4. The method for removing the face with light based on the antagonistic neural network according to the claim 1, wherein in the step (3), a generative antagonistic neural network model is built, and the method comprises the following steps:
constructing an adversarial neural network comprising a generator and a discriminator. The generator adopts a U-net structure with 15 layers in total; the features of layer i and layer 15 − i are connected, i ∈ [1, 7], so that corresponding layers share low-level information and the output result is improved. The input image of the generator is of size 256 × 256; downsampling is realized by convolution with a 4 × 4 kernel, batch normalization, and ReLU activation, and is performed 8 times, the feature dimensions after each step being, in order: 128 × 128, 64 × 64, 32 × 32, 16 × 16, 8 × 8, 4 × 4, 2 × 2, and 1 × 1. Upsampling is then realized by activation, deconvolution with a 4 × 4 kernel, and batch normalization; after each of the 8 upsampling steps the dimensions are, in order: 2 × 2, 4 × 4, 8 × 8, 16 × 16, 32 × 32, 64 × 64, 128 × 128, and an output of size 256 × 256;
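The layer bookkeeping described above can be sketched as follows (a hedged illustration, not code from the patent; the helper names `unet_spatial_sizes` and `skip_pairs` are introduced here):

```python
# Sketch (assumption): bookkeeping for the claimed U-net generator --
# 8 stride-2 downsampling stages from a 256 x 256 input, then 8
# upsampling stages, with layer i connected to layer 15 - i, i in 1..7.

def unet_spatial_sizes(input_size=256, n_down=8):
    """Spatial size after each stage (4 x 4 conv / deconv, stride 2)."""
    down = []
    s = input_size
    for _ in range(n_down):
        s = s // 2          # stride-2 convolution halves the size
        down.append(s)
    up = []
    for _ in range(n_down):
        s = s * 2           # stride-2 transposed convolution doubles it
        up.append(s)
    return down, up

def skip_pairs(total_layers=15, n_pairs=7):
    """Encoder layer i is concatenated with decoder layer 15 - i."""
    return [(i, total_layers - i) for i in range(1, n_pairs + 1)]

down, up = unet_spatial_sizes()
print(down)          # [128, 64, 32, 16, 8, 4, 2, 1]
print(up)            # [2, 4, 8, 16, 32, 64, 128, 256]
print(skip_pairs())  # [(1, 14), (2, 13), (3, 12), (4, 11), (5, 10), (6, 9), (7, 8)]
```

The printed size lists match the dimension sequences recited in the claim.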
the discriminator is a Markov discriminator (PatchGAN): the input image is divided into N × N patches, each patch is judged separately, and the mean of all patch judgments is taken as the output result. Multi-scale judgment is realized by changing the size of the input image without changing the discriminator parameter N; the final discriminator result is the average of the results of the 2 discriminators at different scales. The training process is optimized with the Adam optimizer.
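The patch-averaging scheme can be illustrated with a minimal numpy sketch (an assumption for illustration only: a toy mean-intensity scorer stands in for the learned convolutional discriminator, and the function names are introduced here):

```python
# Sketch (assumption): multi-scale PatchGAN scoring. Each of the N x N
# patches gets a "realness" score; the image score is the mean over
# patches, and the final result averages the scores at two input scales.
import numpy as np

def patch_scores(img, n=4):
    """Split img into an n x n grid and score each patch (toy scorer:
    mean intensity stands in for the learned realness estimate)."""
    h, w = img.shape
    ph, pw = h // n, w // n
    scores = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            patch = img[i * ph:(i + 1) * ph, j * pw:(j + 1) * pw]
            scores[i, j] = patch.mean()
    return scores

def discriminator_score(img, n=4):
    """Image-level result: the average of all patch judgments."""
    return patch_scores(img, n).mean()

def multiscale_score(img, n=4):
    """Average the patch-discriminator result over two input scales,
    keeping the grid parameter n fixed as in the claim."""
    half = img[::2, ::2]   # crude 2x downscale of the input image
    return 0.5 * (discriminator_score(img, n) + discriminator_score(half, n))

img = np.full((64, 64), 0.8)
print(round(multiscale_score(img), 3))  # 0.8 for a constant image
```

Note that only the input size changes between the two scales; the grid parameter N (and, in the real network, the discriminator weights) stays the same.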
5. The method for removing face light based on the adversarial neural network according to claim 1, wherein in step (4) the neural network is pre-trained with the virtual face data set as follows: the network constructed in step (3) is pre-trained with the virtual face data set to obtain the generator and discriminator parameters of the pre-trained model. The specific adversarial training principle is as follows: a lit image a in the training set is input into the generator G to obtain the generated image G(a); G(a) together with the corresponding lit image a is input into the discriminators D1 and D2 of different scales to obtain the discriminator result D(a, G(a)). Since the output of the initial generator is not realistic, the probability that the discriminator regards the generated image as real is close to 0, i.e. D(a, G(a)) ≈ 0. The lit image a and the corresponding light-free image a' in the training set are input into D1 and D2 to obtain the discriminator result D(a, a'); the discriminator regards this input as a real image, so D(a, a') ≈ 1. The generator parameters are then updated so that the probability D(a, G(a)) that the generated image is judged real is maximized; once this value exceeds 0.5, the discriminator regards the generated image as real. Keeping the generator parameters unchanged, the discriminator is trained so that the probability D(a, G(a)) is minimized; once this value falls below 0.5, the discriminator regards the generated image as not real. The generator and the discriminator are trained alternately until the discriminator can no longer distinguish the real image from the generated image, i.e. Nash equilibrium between the generator and the discriminator is reached, and training ends;

to ensure similarity between the generated image G(a) and the light-free image a' in the training set, the L1 distance between G(a) and a' is added as part of the loss function, and the overall objective is

G* = arg min_G max_D L_GAN(G, D) + λ · L_L1(G),

where the loss function of the generator and discriminator is L_GAN = E_{a,a'}[log D(a, a')] + E_a[log(1 − D(a, G(a)))]. When training the discriminator D, the goal is to maximize L_GAN; when training the generator G, the goal is to minimize L_GAN. The L1 norm loss function is L_L1 = E_{a,a'}[||a' − G(a)||_1], where E denotes expectation and λ is a weighting coefficient.
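The combined objective can be sketched numerically (a minimal numpy illustration, not the patent's implementation; the function names and the λ = 100 default are assumptions, λ = 100 being the common pix2pix choice while the claim leaves the coefficient unspecified):

```python
# Sketch (assumption): the combined loss described above, on toy arrays.
# d_real stands for D(a, a'), d_fake for D(a, G(a)); the discriminator
# is trained to maximize L_GAN, the generator to minimize its share of
# L_GAN plus the weighted L1 term.
import numpy as np

def gan_loss(d_real, d_fake, eps=1e-12):
    """L_GAN = E[log D(a, a')] + E[log(1 - D(a, G(a)))]."""
    return np.mean(np.log(d_real + eps)) + np.mean(np.log(1.0 - d_fake + eps))

def l1_loss(target, generated):
    """L_L1 = E[||a' - G(a)||_1], here as a per-pixel mean absolute error."""
    return np.mean(np.abs(target - generated))

def generator_objective(d_fake, target, generated, lam=100.0, eps=1e-12):
    """The generator minimizes its adversarial term plus weighted L1."""
    adv = np.mean(np.log(1.0 - d_fake + eps))  # generator's share of L_GAN
    return adv + lam * l1_loss(target, generated)

d_real = np.full((4, 4), 0.9)   # discriminator scores on real pairs
d_fake = np.full((4, 4), 0.1)   # discriminator scores on generated pairs
a_prime = np.zeros((8, 8))      # light-free target image a'
g_a = np.full((8, 8), 0.05)     # generator output G(a)
print(round(gan_loss(d_real, d_fake), 4))  # -0.2107
print(round(l1_loss(a_prime, g_a), 4))     # 0.05
```

With a well-trained discriminator (d_real near 1, d_fake near 0) L_GAN approaches 0 from below; the generator pushes d_fake toward 1, which drives its adversarial term more negative.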
6. The method for removing face light based on the adversarial neural network according to claim 1, wherein in step (5) the pre-trained neural network parameters are adjusted with the real face data set to obtain the final light removal network model as follows: on the basis of step (4), training is carried out with the real face data set, and the parameters of the generator and the discriminator are adjusted so that the final light removal network model generates light-free face images consistent with the real faces. The specific adversarial training principle is as follows: a lit image a in the training set is input into the generator G to obtain the generated image G(a); G(a) together with the corresponding lit image a is input into the discriminators D1 and D2 of different scales to obtain the discriminator result D(a, G(a)). Since the output of the initial generator is not realistic, the probability that the discriminator regards the generated image as real is close to 0, i.e. D(a, G(a)) ≈ 0. The lit image a and the corresponding light-free image a' in the training set are input into D1 and D2 to obtain the discriminator result D(a, a'); the discriminator regards this input as a real image, so D(a, a') ≈ 1. The generator parameters are then updated so that the probability D(a, G(a)) that the generated image is judged real is maximized; once this value exceeds 0.5, the discriminator regards the generated image as real. Keeping the generator parameters unchanged, the discriminator is trained so that the probability D(a, G(a)) is minimized; once this value falls below 0.5, the discriminator regards the generated image as not real. The generator and the discriminator are trained alternately until the discriminator can no longer distinguish the real image from the generated image, i.e. Nash equilibrium between the generator and the discriminator is reached, and training ends;

to ensure similarity between the generated image G(a) and the light-free image a' in the training set, the L1 distance between G(a) and a' is added as part of the loss function, and the overall objective is

G* = arg min_G max_D L_GAN(G, D) + λ · L_L1(G),

where the loss function of the generator and discriminator is L_GAN = E_{a,a'}[log D(a, a')] + E_a[log(1 − D(a, G(a)))]. When training the discriminator D, the goal is to maximize L_GAN; when training the generator G, the goal is to minimize L_GAN. The L1 norm loss function is L_L1 = E_{a,a'}[||a' − G(a)||_1], where E denotes expectation and λ is a weighting coefficient.
CN202010115474.6A 2020-02-25 2020-02-25 Face bright removal method based on antagonistic neural network Active CN111275651B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010115474.6A CN111275651B (en) 2020-02-25 2020-02-25 Face bright removal method based on antagonistic neural network

Publications (2)

Publication Number Publication Date
CN111275651A true CN111275651A (en) 2020-06-12
CN111275651B CN111275651B (en) 2023-05-12

Family

ID=71002283

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010115474.6A Active CN111275651B (en) 2020-02-25 2020-02-25 Face bright removal method based on antagonistic neural network

Country Status (1)

Country Link
CN (1) CN111275651B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108205659A (en) * 2017-11-30 2018-06-26 深圳市深网视界科技有限公司 Face occluder removes and its method, equipment and the medium of model construction
CN108596062A (en) * 2018-04-12 2018-09-28 清华大学 The real-time high-intensity region method and device of face picture based on deep learning
CN109711364A (en) * 2018-12-29 2019-05-03 成都视观天下科技有限公司 A kind of facial image super-resolution reconstruction method, device and computer equipment
CN109815893A (en) * 2019-01-23 2019-05-28 中山大学 The normalized method in colorized face images illumination domain of confrontation network is generated based on circulation
CN110728628A (en) * 2019-08-30 2020-01-24 南京航空航天大学 Face de-occlusion method for generating confrontation network based on condition
WO2020029356A1 (en) * 2018-08-08 2020-02-13 杰创智能科技股份有限公司 Method employing generative adversarial network for predicting face change

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
NING NING: "GAN-based illumination transfer of face images", Journal of Beijing Electronic Science and Technology Institute *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111951373A (en) * 2020-06-30 2020-11-17 重庆灵翎互娱科技有限公司 Method and equipment for processing face image
CN111951373B (en) * 2020-06-30 2024-02-13 重庆灵翎互娱科技有限公司 Face image processing method and equipment
CN112598598A (en) * 2020-12-25 2021-04-02 南京信息工程大学滨江学院 Image reflected light removing method based on two-stage reflected light eliminating network
CN112598598B (en) * 2020-12-25 2023-11-28 南京信息工程大学滨江学院 Image reflected light removing method based on two-stage reflected light eliminating network
CN112669284A (en) * 2020-12-29 2021-04-16 天津大学 Method for realizing pulmonary nodule detection by generating confrontation network
CN115018996A (en) * 2022-08-08 2022-09-06 海马云(天津)信息技术有限公司 Method and device for generating 3D face model according to real person face photo

Also Published As

Publication number Publication date
CN111275651B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN111275651A (en) Face bright removal method based on antagonistic neural network
CN111241958B (en) Video image identification method based on residual error-capsule network
Lian et al. Attention guided U-Net for accurate iris segmentation
KR102041906B1 (en) API engine for discrimination of facial skin disease based on artificial intelligence that discriminates skin disease by using image captured through facial skin photographing device
WO2021143101A1 (en) Face recognition method and face recognition device
CN115423734B (en) Infrared and visible light image fusion method based on multi-scale attention mechanism
CN113256561B (en) Intestinal tract focus auxiliary diagnosis method based on non-normalized depth residual error and attention mechanism
CN112734915A (en) Multi-view stereoscopic vision three-dimensional scene reconstruction method based on deep learning
CN113221655B (en) Face spoofing detection method based on feature space constraint
CN110070574A (en) A kind of binocular vision Stereo Matching Algorithm based on improvement PSMNet
CN112418041A (en) Multi-pose face recognition method based on face orthogonalization
Zhang et al. Hyperspectral-cube-based mobile face recognition: A comprehensive review
Luo et al. ClawGAN: Claw connection-based generative adversarial networks for facial image translation in thermal to RGB visible light
Yao et al. Laplacian pyramid fusion network with hierarchical guidance for infrared and visible image fusion
CN112287765A (en) Face living body detection method, device and equipment and readable storage medium
CN115861570A (en) Multi-view human body reconstruction method based on luminosity consistency matching and optimization algorithm
Gao A method for face image inpainting based on generative adversarial networks
CN114842029B (en) Convolutional neural network polyp segmentation method integrating channels and spatial attention
CN116310396A (en) RGB-D significance target detection method based on depth quality weighting
CN115511882B (en) Melanoma identification method based on lesion weight characteristic map
CN114581984B (en) Mask face recognition algorithm based on low-rank attention mechanism
Zhu et al. From darkness to clarity: A comprehensive review of contemporary image shadow removal research (2017–2023)
Rahman et al. Integration of Handcrafted and Deep Neural Features for Melanoma Classification and Localization of Cancerous Region
Vieira et al. Gaze estimation via self-attention augmented convolutions
CN117636449A (en) Fundus image classification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant