CN112861578B - Method for generating human face from human eyes based on self-attention mechanism - Google Patents

Method for generating human face from human eyes based on self-attention mechanism

Info

Publication number
CN112861578B
CN112861578B (application CN201911182680.2A)
Authority
CN
China
Prior art keywords
face
human
training
network
self
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911182680.2A
Other languages
Chinese (zh)
Other versions
CN112861578A (en)
Inventor
罗晓东
何小海
卿粼波
刘露平
许一宁
滕奇志
吴小强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan University
Original Assignee
Sichuan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan University filed Critical Sichuan University
Priority to CN201911182680.2A priority Critical patent/CN112861578B/en
Publication of CN112861578A publication Critical patent/CN112861578A/en
Application granted granted Critical
Publication of CN112861578B publication Critical patent/CN112861578B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

A method for generating a face from human eyes based on a self-attention mechanism. The invention discloses a method for generating a human face from human eyes: eye information is extracted from a covered or occluded face, the intrinsic mapping between the eyes and the face is fully mined by a self-attention generative adversarial network, a realistic face image is synthesized from the eyes, and face recognition is then performed on the generated face. The method mainly comprises the following steps: constructing an eye-to-face data set from face images in public face data sets, training the model proposed by the invention on this data set, and completing training after parameter tuning; extracting the eye region of a face image and normalizing it; feeding the eye image into the trained model to generate the face; and finally performing identity verification between the synthesized face and the original face (ground truth). The invention introduces the self-attention mechanism into the generative adversarial network to guide training, can generate more realistic faces, effectively improves the face recognition rate, and can be applied to fields such as public security and counter-terrorism.

Description

Method for generating human face from human eyes based on self-attention mechanism
Technical Field
The invention provides a method for generating a human face from human eyes based on a self-attention mechanism, and relates to the technical fields of computer vision, deep learning, and public safety.
Background
With the progress of face recognition technology, its applications have become increasingly widespread, and current recognition rates on the public face databases CelebA and LFW exceed 98%. However, in practical application environments, recognition performance varies considerably across scenes. In close-range settings where a complete, clear face can be captured, such as railway stations, airports, examination rooms, and mobile payment, recognition works very well; but in settings affected by distance, illumination, background, occlusion, and similar factors, the results are unsatisfactory. For example, in the public safety field, criminals usually cover their faces so that only the eye region is visible, which makes recognition challenging.
With the development of public surveillance, monitoring cameras have become widespread in public places and can capture the faces of criminals and terrorists in real time. However, these faces are usually covered so that only the two eyes are exposed, making it difficult for law enforcement personnel to identify the person and greatly increasing the difficulty of locating and apprehending suspects. Mainstream face recognition technology recognizes the whole face, and the recognition rate based on eye information alone is low. Research on recognizing a face from the person's eyes is therefore very necessary and of significant importance.
Existing face recognition methods fall into two main categories, traditional machine learning and deep learning. Traditional machine learning has produced many breakthroughs in face recognition; typical methods include: (1) classification based on template matching; (2) recognition based on extracting geometric features of the face; (3) recognition based on mathematical statistics, including algorithms such as singular value decomposition (Singular Value Decomposition, SVD) classification, KL (Karhunen-Loeve) transform recognition, and hidden Markov models (Hidden Markov Model, HMM), all of which achieve high recognition rates. With the development of deep learning, face recognition based on deep learning has become the dominant research trend; its recognition rate greatly exceeds that of traditional methods, and the latest deep learning methods even surpass human performance on the public face data sets CelebA and LFW. A typical convolutional neural network (Convolutional Neural Network, CNN) uses gradient descent and back-propagation to adaptively learn convolution kernel parameters and extract facial features for comparison and recognition, which in practice is more principled and effective than the hand-crafted feature extraction of traditional machine learning. Meanwhile, generative adversarial networks (Generative Adversarial Networks, GANs) have also made breakthrough progress in image generation and computer vision.
Disclosure of Invention
The invention provides a method for generating a human face from human eyes based on a self-attention mechanism to solve the above problems. In the public safety field, criminals usually cover most of the face and expose only the two eyes; in order to identify them effectively, the invention designs a process and method that first generates the corresponding face from the eyes and then performs face recognition on it.
The invention realizes the above purpose through the following technical scheme:
1. The deep neural network model provided by the invention needs to be trained on a dedicated data set, and no public eye-to-face data set currently exists, so the first step is to construct one. The steps and requirements are as follows:
(1) Acquire original face images from the public face recognition data sets CelebA and LFW, network resources, and the like, and normalize them to a uniform size of 256×256;
(2) Extract the eye region of each face image obtained in step (1) with a self-developed tool, and erase the image information of the rest of the face;
(3) Normalize the eye images obtained in step (2) to a uniform size of 256×256;
(4) Combine each original face image from step (1) and its corresponding eye image from step (3) into a data pair, merged as a single 512×256 training image, to form the final eye-to-face data set, named SCU-eye.
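The pair-construction steps (1)-(4) above can be sketched with NumPy arrays. This is an illustrative sketch, not the patent's tool: the eye-region box coordinates are hypothetical placeholders for whatever the self-developed extraction tool produces.

```python
import numpy as np

def make_training_pair(face, eye_box=(70, 120, 48, 208)):
    """Build a 512x256 eye/face training pair from a 256x256x3 face image.

    eye_box = (top, bottom, left, right) is a hypothetical eye region;
    a real pipeline would locate it with a landmark detector.
    """
    assert face.shape == (256, 256, 3)
    top, bottom, left, right = eye_box
    eyes = np.zeros_like(face)                                    # erase the whole face ...
    eyes[top:bottom, left:right] = face[top:bottom, left:right]   # ... except the eye region
    # concatenate eye image (left half) and original face (right half)
    pair = np.concatenate([eyes, face], axis=1)                   # shape (256, 512, 3)
    return pair

face = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
pair = make_training_pair(face)
print(pair.shape)  # (256, 512, 3)
```

The left-half/right-half layout matches the data-pair samples described for Fig. 2.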
2. The invention provides a method for generating a human face from human eyes based on a self-attention mechanism, wherein the specific network model structure and principle are as follows:
(1) The invention provides a self-attention-based generative adversarial network with the following structure: a cyclic adversarial network consisting of two generators and two discriminators, where the encoder-decoder in each generator is the convolutional neural network U-Net, and a self-attention mechanism is added to the generator to guide its training. The generator is trained with a weighted sum of the GAN losses, the face feature loss, the L1 and L2 pixel losses, and the KL loss; the whole training process is a continuous adversarial game between generator and discriminator. The overall loss function of the model is:

G*, E* = arg min_{G,E} max_{D} [ L_{1GAN}(G, D) + L_{2GAN}(G, D, E) + λ_1 L_1(G, E) + λ_2 L_2(G, E) + λ_{FL} FL(E_{PR}) + λ_{KL} L_{KL}(E) ]

where L_{1GAN}(G, D) and L_{2GAN}(G, D, E) are the GAN loss functions, L_1(G, E) and L_2(G, E) are image pixel distribution loss functions, FL(E_{PR}) is the feature loss function, and L_{KL}(E) is the conditional noise distribution loss function.
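As a minimal sketch, the weighted combination above can be expressed as a plain function of the six loss terms. The λ default values below are arbitrary illustrative weights, not weights specified by the patent.

```python
def total_loss(l_gan1, l_gan2, l1, l2, fl, l_kl,
               lam1=10.0, lam2=10.0, lam_fl=1.0, lam_kl=0.01):
    """Weighted sum of the six generator loss terms:
    GAN losses + lam1*L1 + lam2*L2 + lam_fl*FL + lam_kl*L_KL."""
    return l_gan1 + l_gan2 + lam1 * l1 + lam2 * l2 + lam_fl * fl + lam_kl * l_kl

print(total_loss(1.0, 1.0, 0.5, 0.5, 2.0, 0.1))
```

In a training loop this scalar would be minimized over the generator and encoder while the discriminator maximizes its own adversarial terms.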
(2) During model training, a pre-trained face feature extraction network (ResNet) is used to extract features of both the generated face and the original face; the conditional noise loss and the face feature loss are computed and fed back to guide model training, so that the generated face more closely resembles the real person. The feature loss function is:

FL(E_{PR}) = || E_{PR}(B̂) − E_{PR}(B) ||

where E_{PR}(·) denotes the pre-trained ResNet network, E_{PR}(B̂) denotes the feature vector of the generated face, and E_{PR}(B) denotes the feature vector of the original face.
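A minimal NumPy sketch of this feature loss, with small stand-in vectors in place of real ResNet features; the choice of L1 norm here is an assumption, since the text does not name the norm.

```python
import numpy as np

def feature_loss(feat_generated, feat_original):
    """Distance between the feature vectors of the generated face,
    E_PR(B-hat), and of the original face, E_PR(B).
    An L1 norm is assumed for illustration."""
    return float(np.abs(feat_generated - feat_original).sum())

f_gen = np.array([0.2, 0.8, 0.5])   # stand-in for E_PR(generated face)
f_org = np.array([0.1, 0.9, 0.5])   # stand-in for E_PR(original face)
loss = feature_loss(f_gen, f_org)
print(round(loss, 6))  # 0.2
```

Driving this value down during training pulls the generated face's identity features toward those of the ground-truth face.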
(3) The invention adds a self-attention mechanism to guide the training of the generator, so that the model can better learn the intrinsic mapping between human eyes and the human face and generate a more realistic face. An attention model is embedded between two convolution layers in the encoder-decoder of the generator: the feature map x output by the previous convolution layer is fed into the attention model, and the attention output o_j, after being weighted and summed with x, becomes the input of the next convolution layer. The principle and formulas are as follows:

β_{j,i} = exp(s_{ij}) / Σ_{i=1}^{N} exp(s_{ij}),  with  s_{ij} = f(x_i)^T g(x_j)

where f(x) = W_f x, g(x) = W_g x, h(x) = W_h x; x is the input of the self-attention model; f(x), g(x), h(x) are the three 1×1 convolution feature branches of the image; and W_f, W_g, W_h are the corresponding weights.

o_j = Σ_{i=1}^{N} β_{j,i} h(x_i)

y_i = γ o_i + x_i

where o_j denotes the output of the self-attention model and y_i is the input of the next convolution layer of the encoder-decoder.
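The attention formulas above can be sketched in NumPy for a flattened feature map. Random matrices stand in for the learned 1×1-convolution weights W_f, W_g, W_h, and the feature-map size is arbitrary; this is an illustration of the computation, not the patent's trained layer.

```python
import numpy as np

def self_attention(x, Wf, Wg, Wh, gamma=0.0):
    """x: (N, C) flattened feature map, N spatial positions, C channels.
    Wf, Wg, Wh: (C, C) weights of the 1x1 convolutions f, g, h."""
    f, g, h = x @ Wf, x @ Wg, x @ Wh               # three 1x1-conv branches
    s = f @ g.T                                    # s_ij = f(x_i)^T g(x_j)
    s = s - s.max(axis=0, keepdims=True)           # numerical stabilization
    beta = np.exp(s) / np.exp(s).sum(axis=0, keepdims=True)  # softmax over i
    o = beta.T @ h                                 # o_j = sum_i beta_{j,i} h(x_i)
    return gamma * o + x                           # y_i = gamma * o_i + x_i

rng = np.random.default_rng(0)
N, C = 16, 8
x = rng.standard_normal((N, C))
W = [rng.standard_normal((C, C)) for _ in range(3)]
y = self_attention(x, *W, gamma=0.5)
print(y.shape)  # (16, 8)
```

With γ = 0 the layer is an identity map, which mirrors the usual practice of initializing γ at zero so attention is blended in gradually during training.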
(4) Experiments show that the deep neural network model provided by the invention reaches its best convergence when trained for 300-400 epochs; the model is then fully trained and is used in the next step to generate a face image from a human eye image.
(5) The generated face image is recognized by a pre-trained deep convolutional neural network, and the identity of the recognized face is then determined.
The invention provides a solution to face recognition when only eye information is available. The designed deep neural network for generating faces from eyes can produce a face close to the original, effectively improving the recognition rate, and has great application prospects in fields such as public security and counter-terrorism.
Drawings
FIG. 1 is a flow chart of a proposal of the invention
FIG. 2 is a sample of a human face training dataset generated from a human eye in accordance with the present invention
FIG. 3 is a block diagram of an algorithm for human eye-based face generation designed in accordance with the present invention
FIG. 4 is a schematic diagram of the self-attention mechanism embedded in the generator of the present invention
Detailed Description
The invention is further described below with reference to the accompanying drawings:
as shown in fig. 1, a method for generating a face from a human eye based on a self-attention mechanism comprises the following working steps:
step one: creating a data set for generating a face from human eyes based on face images in the public face data set;
step two: training a network model, namely training a Self-attention mechanism Generative Adversarial Networks (GANs) based on an attention mechanism countermeasure generation network on the data set constructed in the step one, and completing model training through multiple rounds of parameter adjustment;
step three: preprocessing human eye images, namely extracting human eye parts in human face images according to requirements, and carrying out normalization processing on the human eye parts;
step four: generating a human face from human eyes, and inputting the human eye image preprocessed in the step three into the trained neural network model in the step two to finish the generation of the human face image;
step five: face recognition, namely, carrying out identity recognition verification on the synthesized face and an original face (group trunk) in a face recognition network.
Fig. 2 shows data-pair samples from the self-constructed data set. Each sample is a 512×256 image: the left half is the eye image corresponding to the face in the right half.
Fig. 3 shows the end-to-end eye-to-face generation network designed on the basis of the self-attention generative adversarial network.
The specific design is as follows:
A cyclic adversarial network consisting of two generators and two discriminators, where the encoder-decoder in each generator is the convolutional neural network U-Net, and a self-attention mechanism is added to the generator to guide its training. The generator is trained with a weighted sum of the GAN losses, the face feature loss, the L1 and L2 pixel losses, and the KL loss; the whole training process is a continuous adversarial game between generator and discriminator until the generated face is indistinguishable from the original face (ground truth).
Fig. 4 is a schematic diagram of the self-attention mechanism introduced in the invention. The self-attention mechanism is added to guide the training of the model's generator, so that the model can better learn the intrinsic mapping between human eyes and the human face and generate a more realistic face.

Claims (1)

1. A method of generating a face from a human eye based on a self-attention mechanism, comprising the steps of:
step one: creating a data set for generating a face from human eyes based on face images in the public face data set;
step two: training a network model, namely training a data set constructed in the first step by using an attention-based mechanism countermeasure generation network GANs, wherein the network is a circular countermeasure generation network consisting of two generators and two discriminators; embedding an attention model between two convolution layers in a coder and a decoder of a generator, inputting a feature vector output by a previous convolution layer in the coder and the decoder into the attention model for calculation, and taking the calculated value output by the attention model and the input feature vector as the input of the next convolution layer of the coder and the decoder after weighted summation, wherein the whole training process is as follows: (1) Training by a convolutional neural network U-Net guide generator added with a self-attention mechanism, and generating a face image through an input eye image; (2) The method comprises the steps of extracting characteristics of a generated face and an original face by using a pre-trained face characteristic extraction network Resnet, calculating conditional noise loss and face characteristic loss, feeding back to training of a guide model in the network, and enabling the generated face to be more similar to the human, wherein a characteristic loss function is as follows:
FL(E_{PR}) = || E_{PR}(B̂) − E_{PR}(B) ||

where E_{PR}(·) denotes the pre-trained ResNet network, E_{PR}(B̂) denotes the feature vector of the generated face, and E_{PR}(B) denotes the feature vector of the original face;
model training is completed through multiple rounds of parameter adjustment;
step three: preprocessing human eye images, namely extracting human eye parts in human face images according to requirements, and carrying out normalization processing on the human eye parts;
step four: generating a human face from human eyes, and inputting the human eye image preprocessed in the step three into the trained neural network model in the step two to finish the generation of the human face image;
step five: face recognition, namely, carrying out identity recognition verification on the synthesized face and the original face groundtrunk in a face recognition network.
CN201911182680.2A 2019-11-27 2019-11-27 Method for generating human face from human eyes based on self-attention mechanism Active CN112861578B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911182680.2A CN112861578B (en) 2019-11-27 2019-11-27 Method for generating human face from human eyes based on self-attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911182680.2A CN112861578B (en) 2019-11-27 2019-11-27 Method for generating human face from human eyes based on self-attention mechanism

Publications (2)

Publication Number Publication Date
CN112861578A CN112861578A (en) 2021-05-28
CN112861578B true CN112861578B (en) 2023-07-04

Family

ID=75984733

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911182680.2A Active CN112861578B (en) 2019-11-27 2019-11-27 Method for generating human face from human eyes based on self-attention mechanism

Country Status (1)

Country Link
CN (1) CN112861578B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107577985A (en) * 2017-07-18 2018-01-12 南京邮电大学 Implementation method for face portrait cartoonization based on a cycle generative adversarial network
CN108269245A (en) * 2018-01-26 2018-07-10 深圳市唯特视科技有限公司 Eye image restoration method based on a novel generative adversarial network
CN110288537A (en) * 2019-05-20 2019-09-27 湖南大学 Face image completion method based on a self-attention deep generative adversarial network
CN110348331A (en) * 2019-06-24 2019-10-18 深圳和而泰家居在线网络科技有限公司 Face recognition method and electronic device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10579785B2 (en) * 2017-09-29 2020-03-03 General Electric Company Automatic authentification for MES system using facial recognition
CN109960975B (en) * 2017-12-23 2022-07-01 四川大学 Human face generation and human face recognition method based on human eyes


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
范宝杰 (Fan Baojie); 《电视技术》 (Video Engineering); Vol. 43, No. 04; pp. 14-17 *

Also Published As

Publication number Publication date
CN112861578A (en) 2021-05-28

Similar Documents

Publication Publication Date Title
CN108537743B (en) Face image enhancement method based on generation countermeasure network
Yuan et al. Fingerprint liveness detection using an improved CNN with image scale equalization
CN105138993B (en) Establish the method and device of human face recognition model
CN104268568B (en) Activity recognition method based on Independent subspace network
Thapar et al. VGR-net: A view invariant gait recognition network
CN109960975B (en) Human face generation and human face recognition method based on human eyes
Arora et al. AutoFER: PCA and PSO based automatic facial emotion recognition
CN110633624B (en) Machine vision human body abnormal behavior identification method based on multi-feature fusion
CN109934158A (en) Video feeling recognition methods based on local strengthening motion history figure and recursive convolution neural network
CN110175248A (en) A kind of Research on face image retrieval and device encoded based on deep learning and Hash
CN109740578A (en) It is a kind of suitable for illumination, the face identification method of posture, expression shape change
CN108564061A (en) A kind of image-recognizing method and system based on two-dimensional principal component analysis
Garg et al. Facial expression recognition & classification using hybridization of ICA, GA, and neural network for human-computer interaction
Merlin Linda et al. Intelligent recognition system for viewpoint variations on gait and speech using CNN-CapsNet
Chen et al. Defakehop++: An enhanced lightweight deepfake detector
CN109886251A (en) A kind of recognition methods again of pedestrian end to end guiding confrontation study based on posture
CN112861578B (en) Method for generating human face from human eyes based on self-attention mechanism
Kekre et al. Performance comparison of DCT and VQ based techniques for iris recognition
CN114612991A (en) Conversion method and device for attacking face picture, electronic equipment and storage medium
US20220004821A1 (en) Adversarial face recognition
Albalooshi et al. Facial recognition system for secured mobile banking
Srinivasan Analysis of Gait Biometric in the frequency domain
Li et al. Face recognition algorithm based on sparse representation of DAE convolution neural network
Al-Shiha Biometric face recognition using multilinear projection and artificial intelligence
Bushra et al. A Review on Fuzzy Face Recognition (FFR) using DCGAN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant