CN111275651B - Face bright removal method based on antagonistic neural network - Google Patents
- Publication number
- CN111275651B (application CN202010115474.6A)
- Authority
- CN
- China
- Prior art keywords
- image
- discriminator
- generator
- face
- real
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING › G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10004—Still image; Photographic image
- G06T2207/20081—Training; Learning
- G06T2207/20084—Artificial neural networks [ANN]
- G06T2207/30201—Face
Abstract
The invention discloses a method for removing specular highlights from human faces based on an adversarial neural network. The method includes: synthesizing faces by computer graphics to build a virtual face dataset of paired highlighted and highlight-free face images; photographing real faces both normally and through a polarizing filter to obtain a real face dataset of corresponding highlighted and highlight-free image pairs; building an adversarial neural network and training its parameters on the datasets to obtain a highlight-removal network model; and feeding a highlighted face image into the trained highlight-removal model to obtain the highlight-free image of that face. By means of the adversarial neural network, the invention achieves more natural and realistic removal of facial highlights.
Description
Technical Field
The invention relates to the field of pattern recognition, and in particular to a method for removing specular highlights from human faces based on an adversarial neural network.
Background Art
The skin surface reflects ambient light, which appears in images as shiny, highlighted skin. Such highlights mask the original color and texture of the affected region, and local highlights on the face affect computer vision, computer graphics, and medical examination. In computer vision, facial highlights hinder face detection and face recognition, causing missed detections and recognition errors. In computer graphics, highlights on a real face must be consistent with the environment: under a light source that produces no highlights, the skin should show none. In medical examination, highlights interfere with the detection of facial pigmented skin diseases, since it is hard to tell whether such a disease is present in the highlighted region, which hampers diagnosis and treatment planning. Facial highlight removal is therefore necessary.
Because skin is a complex material, general-purpose highlight-removal methods do not apply, and existing research has not yet achieved natural and realistic removal of highlights from facial skin. This is especially true when the face bears dense markings such as freckles: the highlight must be removed naturally while restoring, as far as possible, the original skin and features such as spots and moles beneath it. Facial highlight removal is therefore a complex and challenging problem. Removal methods based on explicit illumination models are limited by the physical model, which cannot fully capture the real lighting environment, especially for complex materials such as facial skin, so their results are not realistic enough.
With the rise of artificial intelligence, deep learning algorithms have advanced rapidly. Compared with de-lighting methods based on explicit illumination models, an end-to-end neural network can predict the corresponding highlight-free image directly from the original highlighted image. It is not restricted to a fixed illumination model and can restore the skin in the highlighted regions more accurately and realistically.
Summary of the Invention
Purpose of the invention: the technical problem to be solved by the present invention is to propose a method for removing facial highlights based on an adversarial neural network. The method uses an end-to-end neural network to repair the highlighted regions of a face, removing them naturally and realistically while restoring the original skin color and texture of the corresponding regions, and is applicable to a wide variety of facial highlights.
Technical solution: to achieve the above purpose, the present invention provides a method for removing facial highlights based on an adversarial neural network, comprising the following steps:
(1) Synthesize faces by computer graphics, generate virtual pairs of highlighted and highlight-free face images, and build a virtual face dataset;
(2) Photograph faces both normally and through a polarizing filter to obtain pairs of highlighted and highlight-free face images in real environments, and build a real face dataset;
(3) Build a generative adversarial network model;
(4) Pre-train the network on the virtual face dataset;
(5) Fine-tune the pre-trained network parameters on the real face dataset to obtain the final highlight-removal network model;
(6) Input the face image to be processed into the highlight-removal network model to obtain the corresponding highlight-free face image.
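As an illustration only, the six steps can be sketched as a skeleton pipeline. Every function and variable name below is a hypothetical placeholder standing in for a stage of the method; none of them comes from the patent itself:

```python
# Hypothetical stage names for the six-step method; each stub only records
# what a real implementation would do at that stage.

def pretrain(model, dataset):        # step (4): virtual data first
    return model + ["pretrained:" + dataset]

def finetune(model, dataset):        # step (5): then real data refines it
    return model + ["finetuned:" + dataset]

def remove_highlights(model, image): # step (6): inference on a new image
    return "matte(" + image + ")"

virtual = "virtual_pairs"            # step (1): CG-rendered image pairs
real = "real_pairs"                  # step (2): camera + polarizer pairs
model = []                           # step (3): freshly built GAN

model = pretrain(model, virtual)
model = finetune(model, real)
print(remove_highlights(model, "face.png"))  # matte(face.png)
```

The ordering matters: pre-training on abundant synthetic pairs precedes fine-tuning on the smaller real dataset.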
Further, the method of building the virtual face dataset in step (1) is as follows:
(1.1) Collect highlight-free frontal face images from the Internet as source material for generating three-dimensional faces;
(1.2) Generate the three-dimensional face corresponding to each image with HeadShot, add light sources, assign different degrees of reflectivity to the face material, and obtain different highlighted images of the face under different expressions via the preset expressions of the 3D face model. By capturing renders, obtain the highlight-free face image in the environment without light sources and the highlighted face image in the environment with light sources, and build the virtual face dataset.
Further, the method of building the real face dataset in step (2) is as follows:
(2.1) Collect data with a camera; set the camera position, the subject's position, the polarizing filter's position, and the light source's position and angle so that the polarizer can eliminate the highlights during shooting;
(2.2) While images are collected, the subject stays still in a fixed position. Capture real face images with different highlights under different lighting intensities, then capture the highlight-free version of each image through the polarizer, and build the real face dataset.
Further, in step (3), the generative adversarial network model is built as follows:
Build an adversarial neural network consisting of a generator and a discriminator. The generator uses a U-net structure with 15 layers in total; the features of layer i are connected to the features of layer 15-i, where i ∈ [1, 7], so that corresponding layers share low-level information, improving the output. The generator's input image is 256*256. Downsampling is performed by a 4*4 convolution, batch normalization, and ReLU activation; after 8 such downsampling steps the feature sizes are, in order, 128*128, 64*64, 32*32, 16*16, 8*8, 4*4, 2*2, and 1*1. Upsampling is then performed by activation, a 4*4 transposed convolution, and batch normalization; after 8 such upsampling steps the feature sizes are, in order, 2*2, 4*4, 8*8, 16*16, 32*32, 64*64, and 128*128, followed by a 256*256 output.
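The downsampling/upsampling schedule above can be checked with a few lines of arithmetic — a sketch assuming, as is standard for 4*4 stride-2 (transposed) convolutions with padding 1, that each downsampling step halves and each upsampling step doubles the spatial size:

```python
def unet_feature_sizes(input_size=256, steps=8):
    """Spatial sizes after each stride-2 downsample, then each upsample."""
    down = []
    s = input_size
    for _ in range(steps):   # 4*4 conv, stride 2: size halves
        s //= 2
        down.append(s)
    up = []
    for _ in range(steps):   # 4*4 transposed conv, stride 2: size doubles
        s *= 2
        up.append(s)
    return down, up

down, up = unet_feature_sizes()
print(down)  # [128, 64, 32, 16, 8, 4, 2, 1]
print(up)    # [2, 4, 8, 16, 32, 64, 128, 256]
```

The last upsampling size, 256, is the 256*256 output named in the text; the earlier seven sizes match the listed feature dimensions.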
The discriminator is a Markovian discriminator (PatchGAN). It splits the input image into N*N patches, judges each patch, and outputs the average of the per-patch decisions. Multi-scale discrimination is achieved by changing the size of the input image while keeping the discriminator's patch parameter N unchanged; the discriminator's result is the average of the results of two discriminators at different scales. Training is optimized with the Adam optimizer.
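The PatchGAN decision rule just described — average the per-patch judgments, then average the two scales — can be sketched as follows; the patch scores are dummy numbers standing in for the convolutional discriminator's outputs:

```python
def patchgan_score(patch_scores):
    """Average the per-patch real/fake scores of one discriminator."""
    flat = [s for row in patch_scores for s in row]
    return sum(flat) / len(flat)

def multiscale_score(scores_scale1, scores_scale2):
    """Average the outputs of two fixed-N discriminators, each fed a
    differently sized version of the same image."""
    return (patchgan_score(scores_scale1) + patchgan_score(scores_scale2)) / 2

# 2x2 grids of dummy patch scores at two input scales
s1 = [[0.8, 0.6], [0.7, 0.9]]   # full-resolution input
s2 = [[0.5, 0.7], [0.6, 0.6]]   # downscaled input
print(multiscale_score(s1, s2))  # ≈ 0.675
```

Because N stays fixed while the input shrinks, each patch of the downscaled image covers a larger portion of the face, which is how the second scale captures more global structure.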
Further, in step (4) the neural network is pre-trained on the virtual face dataset as follows. The network built in step (3) is first pre-trained on the virtual face dataset to obtain the parameters of the pre-trained model's generator and discriminator. The adversarial training works as follows: a highlighted image a from the training set is fed into the generator G to obtain the generated image G(a); G(a) together with the corresponding highlighted image a is fed into the discriminators D1 and D2 at different scales to obtain the outputs D(a, G(a)). Since the generator's early outputs are not realistic, the probability the discriminator assigns to the generated image being real approaches 0, i.e. its output tends to 0. The highlighted image a together with the corresponding highlight-free image a' from the training set is fed into D1 and D2 to obtain the outputs D(a, a'); since the discriminator regards this input as a real image, its output tends to 1. The generator's parameters are then updated to maximize the probability that the generated image is judged real, until that value exceeds 0.5, at which point the discriminator takes the generated image for a real one. Next, with the generator's parameters held fixed, the discriminator is trained to minimize the probability that the generated image is judged real, until that value falls below 0.5, at which point the discriminator regards the generated image as fake. The generator and discriminator are trained alternately in this way until the discriminator finds real and generated images indistinguishable, i.e. a Nash equilibrium is reached between them, and training is complete.
To ensure similarity between the generated image G(a) and the highlight-free training image a', the L1 distance between G(a) and a' is added as part of the loss. The overall objective is G* = arg min_G max_D L_GAN(G, D) + λ·L1(G), where the adversarial loss of the generator and discriminator is L_GAN = E_{a,a'}[log D(a, a')] + E_a[log(1 − D(a, G(a)))]; training the discriminator D maximizes L_GAN, and training the generator G minimizes it. The L1-norm loss is L1 = E_{a,a'}[||a' − G(a)||_1], where E denotes expectation and λ is a weighting coefficient.
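On toy numbers, the combined objective L_GAN + λ·L1 can be evaluated directly (all pixel values and discriminator outputs below are invented for illustration; λ = 100 follows the value given later in the description):

```python
import math

def gan_loss(d_real, d_fake):
    """L_GAN = E[log D(a, a')] + E[log(1 - D(a, G(a)))], on single samples."""
    return math.log(d_real) + math.log(1 - d_fake)

def l1_loss(generated, target):
    """Mean absolute difference between generated and target pixels."""
    return sum(abs(g - t) for g, t in zip(generated, target)) / len(generated)

d_real, d_fake = 0.9, 0.2          # dummy discriminator outputs in (0, 1)
generated = [0.5, 0.4, 0.8]        # dummy generator output pixels
target    = [0.6, 0.4, 0.7]        # dummy ground-truth highlight-free pixels
lam = 100                          # L1 weight

total = gan_loss(d_real, d_fake) + lam * l1_loss(generated, target)
print(round(total, 4))             # 6.3382
```

With λ this large, the L1 term dominates early training, pushing G(a) toward the paired highlight-free image before the adversarial term sharpens the result.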
Further, in step (5), the pre-trained network's parameters are fine-tuned on the real face dataset to obtain the final highlight-removal network model, as follows. Building on step (4), the network is trained on the real face dataset, adjusting the generator's and discriminator's parameters so that the final highlight-removal model generates highlight-free face images indistinguishable from real faces. The adversarial training works as follows: a highlighted image a from the training set is fed into the generator G to obtain the generated image G(a); G(a) together with the corresponding highlighted image a is fed into the discriminators D1 and D2 at different scales to obtain the outputs D(a, G(a)). Since the generator's early outputs are not realistic, the probability the discriminator assigns to the generated image being real approaches 0, i.e. its output tends to 0. The highlighted image a together with the corresponding highlight-free image a' is fed into D1 and D2 to obtain the outputs D(a, a'); since the discriminator regards this input as a real image, its output tends to 1. The generator's parameters are then updated to maximize the probability that the generated image is judged real, until that value exceeds 0.5, at which point the discriminator takes the generated image for a real one. Next, with the generator's parameters held fixed, the discriminator is trained to minimize the probability that the generated image is judged real, until that value falls below 0.5, at which point the discriminator regards the generated image as fake. The generator and discriminator are trained alternately in this way until the discriminator finds real and generated images indistinguishable, i.e. a Nash equilibrium is reached between them, and training is complete.
To ensure similarity between the generated image G(a) and the highlight-free training image a', the L1 distance between G(a) and a' is added as part of the loss. The overall objective is G* = arg min_G max_D L_GAN(G, D) + λ·L1(G), where the adversarial loss of the generator and discriminator is L_GAN = E_{a,a'}[log D(a, a')] + E_a[log(1 − D(a, G(a)))]; training the discriminator D maximizes L_GAN, and training the generator G minimizes it. The L1-norm loss is L1 = E_{a,a'}[||a' − G(a)||_1], where E denotes expectation and λ is a weighting coefficient.
Beneficial effects: compared with the prior art, the technical solution of the present invention has the following beneficial technical effects:
(1) The invention removes facial highlights with an end-to-end neural network; it is not restricted to a fixed physical illumination model and produces more realistic highlight-removal results;
(2) Compared with highlight removal on other materials, the invention adapts well to a material as complex as skin;
(3) The invention not only restores the original skin color of the highlighted region, but also recovers, to a degree, spots, patches of pigment, and similar features in that region;
(4) Other features and advantages of the invention are set forth in the description that follows, and in part become apparent from the description or are learned by practicing the invention. The purposes and other advantages of the invention can be realized and obtained through the structures particularly pointed out in the description, the claims, and the drawings.
Brief Description of the Drawings
FIG. 1 is a flow chart of the facial highlight-removal method based on an adversarial neural network implemented by the present invention;
FIG. 2 is a flow chart of training the adversarial neural network in the present invention;
FIG. 3 is a schematic diagram of the structure of the adversarial-network generator built by the present invention;
FIG. 4 is a schematic diagram of the highlight-removal effect of the present invention.
Detailed Description
The embodiments of the present invention are described in detail below with reference to the drawings and examples, so that the process by which the invention applies technical means to solve the technical problem and achieve the technical effect can be fully understood and implemented. It should be noted that, as long as no conflict arises, the embodiments of the invention and the features within them may be combined with one another, and all resulting technical solutions fall within the scope of protection of the invention.
In addition, the steps shown in the flow charts of the drawings may be executed in a computer system such as a set of computer-executable instructions, and, although a logical order is shown in the flow charts, in some cases the steps shown or described may be executed in an order different from that given here.
FIG. 1 is a flow chart of the facial highlight-removal method based on an adversarial neural network implemented by the present invention; each step is described in detail below with reference to FIG. 1.
Step S110: since no public dataset of highlighted face images exists, and collecting large numbers of paired real highlighted face images is time-consuming, labor-intensive, and hard to achieve, computer-graphics methods are used to add highlights to face images, yielding paired virtual highlighted and highlight-free face images and building the virtual face dataset.
Face images are collected from the Internet; HeadShot generates the corresponding three-dimensional face, light sources are added, different reflectivity levels are assigned to the face material, and different highlighted images of the face under different expressions are obtained through the preset expressions of the 3D face model. The virtual face dataset includes men and women of Black, White, and Asian ethnicity, ensuring the comprehensiveness of the dataset and providing a basis for the trained network to generalize across populations.
Step S120: faces generated by computer graphics look somewhat artificial; to compensate, corresponding images of real highlighted and highlight-free faces are needed. To this end, the subject stays still in a fixed position while images are collected; real face images with different highlights are captured under different lighting intensities (with the light-source angle fixed), and a polarizing lens mounted at a fixed angle then removes the highlights, giving a highlight-free version of the same face. A smaller real face dataset is built in this way.
Step S130: build the adversarial neural network, consisting of a generator and a discriminator. Subsequent training yields the generator's and discriminator's parameters, so that the generator produces natural highlight-free images and the discriminator accepts the generated images as genuine, natural highlight-free images.
Step S140: pre-train the network built above on the virtual face dataset to obtain the parameters of the pre-trained model's generator and discriminator.
Step S150: starting from the pre-trained model, train on the real face dataset, adjusting the generator's and discriminator's parameters so that the final highlight-removal model generates highlight-free face images as natural as real ones. Because the real face dataset is small, the data are augmented by image flipping, increasing the data volume and improving the trained network's performance. The generator and discriminator parameters from the final round of training are saved.
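The flip-based augmentation mentioned above amounts to mirroring each image left-to-right, applied to both members of a pair so they stay aligned — a minimal sketch on toy nested-list "images":

```python
def hflip(image):
    """Mirror an image (rows of pixels) left-to-right."""
    return [row[::-1] for row in image]

lit   = [[1, 2, 3], [4, 5, 6]]     # toy highlighted image
matte = [[7, 8, 9], [0, 1, 2]]     # its paired highlight-free image

# Flip both members of the pair so the pixel correspondence is preserved.
augmented_pair = (hflip(lit), hflip(matte))
print(augmented_pair[0])  # [[3, 2, 1], [6, 5, 4]]
```

Flipping the highlighted image without flipping its paired target would break the pixel-wise L1 supervision, which is why the pair is transformed together.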
Step S160: input a highlighted face image into the highlight-removal network model obtained in the previous step to obtain the corresponding highlight-free face image.
FIG. 2 is a flow chart of training the adversarial neural network in the present invention, divided into two stages, pre-training and parameter fine-tuning. The adversarial network has two parts, a generator and a discriminator. The generator uses a U-net structure with 15 layers in total; the features of layer i are connected to those of layer 15-i, so that corresponding layers share low-level information, improving the output. The widely used Markovian discriminator (PatchGAN) serves as the discriminator. Because a single discriminator has a limited receptive field, the invention applies the discriminator at multiple scales to capture detail and global information separately: by resizing the input image and feeding it into discriminators of identical structure, details of the image are extracted at different scales. The discriminator loss is the sum of the losses at all scales.
Both the virtual and the real face datasets consist of pairs. To focus the network's attention on the differences in the highlighted regions, each image pair is joined together as the input; the network can then automatically learn the residual between the images and thus learn features of the highlighted regions effectively. The overall loss function is the adversarial loss of the generator and discriminator plus the L1-norm loss between the generator's output and the corresponding highlight-free image in the dataset. Training is optimized with the Adam optimizer.
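Joining a pair of images as a single network input, as described above, is typically a concatenation along the channel axis; a minimal NumPy sketch of the shapes involved (the network itself is not shown):

```python
import numpy as np

lit   = np.zeros((256, 256, 3))   # highlighted input image (H, W, C)
matte = np.ones((256, 256, 3))    # paired highlight-free image

# Concatenate along the channel axis: the network sees both images at once
# and can focus on their residual, i.e. the highlighted regions.
pair = np.concatenate([lit, matte], axis=-1)
print(pair.shape)  # (256, 256, 6)
```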
The training procedure is: first pre-train on the virtual face dataset to obtain approximate generator and discriminator parameters, then train on the real face dataset, fine-tuning both against real face images so that the generator's outputs become more realistic and natural and the discriminator judges image realism and highlight removal more accurately. The adversarial training works as follows: a highlighted image a from the training set is fed into the generator G to obtain the generated image G(a); G(a) together with the corresponding highlighted image a is fed into the discriminators D1 and D2 at different scales to obtain the outputs D(a, G(a)). Since the generator's early outputs are barely realistic, the probability the discriminator assigns to the generated image being real approaches 0, i.e. its output tends to 0; the highlighted image a together with the corresponding highlight-free image a' is fed into D1 and D2 to obtain the outputs D(a, a'), and since the discriminator regards this input as real, its output tends to 1. The generator's parameters are updated to maximize the probability that the generated image is judged real, until that value exceeds 0.5, at which point the discriminator takes the generated image for real; then, with the generator's parameters fixed, the discriminator is trained to minimize that probability, until it falls below 0.5, at which point the discriminator regards the generated image as fake. Generator and discriminator are trained alternately in this way until the discriminator can no longer tell whether the generator's images are real (it finds real and generated images indistinguishable), i.e. they reach a Nash equilibrium, and training is complete. To ensure similarity between G(a) and the highlight-free training image a', the L1 distance between G(a) and a' is added as part of the loss. The overall objective is G* = arg min_G max_D L_GAN(G, D) + λ·L1(G), where L_GAN = E_{a,a'}[log D(a, a')] + E_a[log(1 − D(a, G(a)))]; training the discriminator D maximizes L_GAN, and training the generator G minimizes it. The L1-norm loss is L1 = E_{a,a'}[||a' − G(a)||_1], E denotes expectation, and λ is a coefficient, set to 100 during training.
图3为本发明搭建的对抗神经网络生成器的结构示意图,是在简单的Encoder-Decoder结构的基础上添加了维度拼接。生成器采用U-net的结构,总层数为15,将第i层特征与第15-i层特征之间连接起来(其中i∈[1,7]),对应层共享底层信息,从而优化输出结果。生成器输入图像的大小为256*256,经过卷积(卷积核大小为4*4)、批量归一化和ReLu激活,实现下采样,经过8次上述的下采样,每次采样后得到的特征的维度依次为:128*128、64*64、32*32、16*16、8*8*、4*4、2*2和1*1;再进行激活、反卷积(卷积核大小为4*4)、批量归一化,实现上采样,经过8次上述的上采样,每次采样后得到维度依次为:2*2、4*4、8*8、16*16、32*32、64*64、128*128的特征和大小为256*256的输出。Figure 3 is a schematic diagram of the structure of the adversarial neural network generator built by the present invention, which adds dimension splicing on the basis of a simple Encoder-Decoder structure. The generator adopts the U-net structure with a total of 15 layers, connecting the features of the i-th layer with the features of the 15-i-th layer (where i∈[1,7]), and the corresponding layers share the underlying information, thereby optimizing the output results. The size of the generator input image is 256*256. After convolution (the convolution kernel size is 4*4), batch normalization and ReLu activation, downsampling is achieved. After 8 times of the above downsampling, the dimensions of the features obtained after each sampling are: 128*128, 64*64, 32*32, 16*16, 8*8*, 4*4, 2*2 and 1*1; then activation, deconvolution (the convolution kernel size is 4*4), batch normalization are performed to achieve upsampling. After 8 times of the above upsampling, the dimensions of the features obtained after each sampling are: 2*2, 4*4, 8*8, 16*16, 32*32, 64*64, 128*128 and an output size of 256*256.
FIG. 4 is a diagram showing the light removal effect of the present invention.
Although the embodiments of the present invention are disclosed as above, they are described only to facilitate understanding of the present invention and are not intended to limit it. Any person skilled in the technical field to which the present invention belongs may make modifications and changes in the form and details of implementation without departing from the spirit and scope disclosed herein, but the patent protection scope of the present invention shall still be subject to the scope defined by the appended claims.
Claims (3)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010115474.6A CN111275651B (en) | 2020-02-25 | 2020-02-25 | Face bright removal method based on antagonistic neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111275651A CN111275651A (en) | 2020-06-12 |
CN111275651B true CN111275651B (en) | 2023-05-12 |
Family
ID=71002283
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010115474.6A Active CN111275651B (en) | 2020-02-25 | 2020-02-25 | Face bright removal method based on antagonistic neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111275651B (en) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111951373B (en) * | 2020-06-30 | 2024-02-13 | 重庆灵翎互娱科技有限公司 | Face image processing method and equipment |
CN112598598B (en) * | 2020-12-25 | 2023-11-28 | 南京信息工程大学滨江学院 | Image reflected light removing method based on two-stage reflected light eliminating network |
CN112669284A (en) * | 2020-12-29 | 2021-04-16 | 天津大学 | Method for realizing pulmonary nodule detection by generating confrontation network |
CN114565509B (en) * | 2022-02-08 | 2025-02-28 | 厦门美图之家科技有限公司 | A method, device and equipment for removing facial oiliness based on CNN |
CN115018996A (en) * | 2022-08-08 | 2022-09-06 | 海马云(天津)信息技术有限公司 | Method and device for generating 3D face model according to real person face photo |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108205659A (en) * | 2017-11-30 | 2018-06-26 | 深圳市深网视界科技有限公司 | Face occluder removes and its method, equipment and the medium of model construction |
CN108596062B (en) * | 2018-04-12 | 2021-04-06 | 清华大学 | Real-time highlight removal method and device for face images based on deep learning |
CN109308450A (en) * | 2018-08-08 | 2019-02-05 | 杰创智能科技股份有限公司 | A kind of face's variation prediction method based on generation confrontation network |
CN109711364A (en) * | 2018-12-29 | 2019-05-03 | 成都视观天下科技有限公司 | A kind of facial image super-resolution reconstruction method, device and computer equipment |
CN109815893B (en) * | 2019-01-23 | 2021-03-26 | 中山大学 | Color face image illumination domain normalization method based on cyclic generation countermeasure network |
CN110728628B (en) * | 2019-08-30 | 2022-06-17 | 南京航空航天大学 | A face de-occlusion method based on conditional generative adversarial network |
-
2020
- 2020-02-25 CN CN202010115474.6A patent/CN111275651B/en active Active
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||