AU2019430859B2 - Generative adversarial mechanism and attention mechanism-based standard face generation method - Google Patents
- Publication number
- AU2019430859B2
- Authority
- AU
- Australia
- Prior art keywords
- image
- face
- network
- model
- sub
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/08—Learning methods
Abstract
A generative adversarial mechanism and attention mechanism-based standard face generation method, comprising: a dataset design step, constructing, according to database-related annotation data, face code having a plurality of non-limiting factors for a face image, and taking the code and the face image as inputs of a model; a model design and training step, using a generative adversarial mechanism and an attention mechanism to design a corresponding network structure, and using the constructed data pair to perform model training, so as to obtain a network model weight; and a model prediction step, predicting the acquired face image by means of the model. The present invention applies deep learning network technology to standard face generation to generate a colour, front-facing, and standard face image under normal light illumination. The method using a deep learning network is capable of obtaining an accurate standard face photograph, reducing the difficulty of matching with data in a single-sample database, and laying a solid foundation for subsequent face feature extraction and single-sample facial recognition.
Description
A Standard Face Generation Method Based on a Generative Adversarial Mechanism and an Attention Mechanism
Technical Field

The invention relates to the technical field of deep learning applications, in particular to a standard face generation method based on a generative adversarial mechanism and an attention mechanism.
Technical Background

In recent years, video surveillance has been popularized in large and medium-sized cities across the country, has been widely used in the construction of social security prevention and control systems, and has become a powerful technical means for public security agencies to investigate and solve cases. Especially in mass incidents, major cases and robberies, the evidential clues obtained from surveillance video play a key role in solving cases quickly. At present, domestic public security agencies mainly use surveillance video to find post-crime clues and evidence, locking in a suspect's identity by comparing the face information of key suspects with personal information in the public security bureau's database. However, the face information of a suspect in surveillance video is subject to many restrictive factors, such as expression interference, posture interference and shooting illumination interference. Since most of the face images in the public security bureau's database are single-sample ID photos, recognition of face images affected by the above restrictive factors suffers a greatly reduced success rate and easily causes missed detections and false detections.
In recent years, the field of artificial intelligence has been included in the scope of national key development. This indicates that the combination of artificial intelligence with related industries is an inevitable trend in the country's development towards intelligence, and it is of great significance in promoting industries towards intelligence and automation. The core task in the field of artificial intelligence is to design deep learning network models suited to different industry tasks. With the increase in computing power, the difficulty of network training has been greatly reduced, and the accuracy of network prediction has continuously improved. The basic characteristics of deep learning networks are strong model fitting ability, a large amount of information and high precision, which can meet the varied needs of different industries. For the face recognition problem with multiple non-limiting factors, the key issue is how to generate a standard front-facing face image that meets the needs of subsequent face image feature extraction and recognition. It is therefore urgent to design a reasonable deep learning network framework for this problem, use high-performance computing to train the network, generate more standard front-facing face images, improve the accuracy of face matching and reduce false detections during face recognition.
Reference to cited material or information contained in the text should not be understood as a concession that the material or information was part of the common general knowledge or was known in Australia or any other country.
Each document, reference, patent application or patent cited in this text is expressly incorporated herein in its entirety by reference, which means that it should be read and considered by the reader as part of this text. That the document, reference, patent application or patent cited in this text is not repeated in this text is merely for reasons of conciseness.
Reference numbers and letters appearing between parentheses in the claims, identifying features described in the embodiment(s) and/or example(s) and/or illustrated in the accompanying drawings, are provided as an aid to the reader as an exemplification of the matter claimed. The inclusion of such reference numbers and letters is not to be interpreted as placing any limitations on the scope of the claims.
Throughout the specification and claims, unless the context requires otherwise, the word "comprise" or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated integer or group of integers but not the exclusion of any other integer or group of integers.
Summary of the Invention

The present invention addresses the above-mentioned shortcomings in the prior art by providing a standard face generation method based on a generative adversarial mechanism and an attention mechanism, using a deep learning network framework to design the related models, thereby obtaining a more standard front-facing face image and laying a solid foundation for subsequent face feature extraction and single-sample face recognition.
In an embodiment, the invention provides a standard face generation method based on a generative adversarial mechanism and an attention mechanism, characterized in that the generation method comprises the following steps:

S1. data construction: collecting face data, constructing a face code with multiple non-limiting factors for each face image, then classifying the face data; wherein the non-limiting factors comprise face expression factors, face posture factors and shooting illumination factors; an encoded face image forms an information unit $U = \{L, E, A\}$, comprising an 8-bit illumination code $L$, an 8-bit expression code $E$, and a 19-bit posture code $A$;

S2. establishing a network model based on the generative adversarial mechanism and the attention mechanism; the network model comprises three sub-networks: an image generator sub-network for generating a standard face, a model discriminator sub-network for discriminating the generated results, and an image restoration sub-network for restoring the generated results; first, using the image generator sub-network and the attention mechanism to generate a standard face from an input face image; then, using the model discriminator sub-network to discriminate the generated image; finally, constructing the image restoration sub-network, restoring the generated image, and comparing the restoration result with the input image to optimize the constraints of the network model;

S3. model training: using the information unit $U = \{L, E, A\}$ as an input to optimize the outputs of the image generator sub-network, the model discriminator sub-network and the image restoration sub-network, and labeling similarities, to achieve convergence of the network model based on the generative adversarial mechanism and the attention mechanism;

S4. model prediction: extracting a face image from an actual image as the input of the network model, finally obtaining a standard front face image output by controlling the information unit $U$;

wherein the face expression factors are divided into eight situations, namely happy, angry, sad, contemptuous, disappointed, scared, surprised and natural; a face expression is encoded as $E = (E_1, E_2, \ldots, E_8)$, where $E_l$ represents the $l$-th expression, $l = 1, 2, \ldots, 8$, its value is in $[0, 1]$, and $E = (0, 0, \ldots, 1)$ means a natural expression;

the face illumination factors are divided into eight situations, namely front illumination, left illumination, right illumination, front-left illumination, front-right illumination, left-right illumination, no illumination and full illumination; the illumination information of the face is encoded as $L = (L_1, L_2, \ldots, L_8)$, where $L_n$ represents the $n$-th illumination situation, $n = 1, 2, \ldots, 8$, its value is in $[0, 1]$, and $L = (0, 0, \ldots, 1)$ represents full illumination image information;

the face posture factors are divided into 19 situations, namely left 90°, left 80°, left 70°, left 60°, left 50°, left 40°, left 30°, left 20°, left 10°, front face, right 10°, right 20°, right 30°, right 40°, right 50°, right 60°, right 70°, right 80° and right 90°; the posture information of the face is encoded as $A = (A_1, A_2, \ldots, A_{19})$, where $A_m$ represents the $m$-th face pose, $m = 1, 2, \ldots, 19$, its value is in $[0, 1]$, and $A = (0, 0, \ldots, 1)$ represents front posture information.
A standard face generation method based on a generative adversarial mechanism and an attention mechanism comprises: dataset design steps, model design and training steps, and model prediction steps. The dataset design steps are mainly based on the current mainstream Radboud Faces Database (RaFD) dataset and the Institute of Artificial Intelligence and Robotics (IAIR) face dataset; according to the relevant annotated data of the database, a face code with various non-limiting factors, comprising face expression factors, face posture factors and shooting illumination factors, is constructed for each face image, and the code and the face image are used as inputs to the model. The model design and training steps mainly use the principles of the generative adversarial mechanism and the attention mechanism to design the corresponding network structure, and use the constructed data pairs for model training to obtain the network model weights. The model prediction step mainly predicts the result of model processing performed on a face image acquired in reality.
Specifically, the operation steps are as follows:
S1. Data construction: collecting face data from the RaFD face dataset and the IAIR face dataset, constructing a face code with multiple non-limiting factors for each face image, then classifying the face data; wherein the non-limiting factors comprise face expression factors, face posture factors and shooting illumination factors; an encoded face image forms an information unit $U = \{L, E, A\}$, comprising an 8-bit illumination code $L$, an 8-bit expression code $E$, and a 19-bit posture code $A$;

S2. Establishing a network model based on the generative adversarial mechanism and the attention mechanism; the network model comprises three sub-networks: an image generator sub-network for generating a standard face, a model discriminator sub-network for discriminating the generated results, and an image restoration sub-network for restoring the generated results; first, using the image generator sub-network and the attention mechanism to generate a standard face from an input face image; then, using the model discriminator sub-network to discriminate the generated image; finally, constructing the image restoration sub-network, restoring the generated image, and comparing the restoration result with the input image to optimize the constraints of the network model;

S3. Model training: using the image units generated in step S1, taking the images with multiple non-limiting factors as input to optimize the outputs of the image generator sub-network, the model discriminator sub-network and the image restoration sub-network, and labeling similarities, to achieve convergence of the network model based on the generative adversarial mechanism and the attention mechanism;

S4. Model prediction: extracting a face from an actual image as the input of the model, finally obtaining a more standard front face image output by controlling the unified information unit.
Further, in step S1, the face information in the face dataset is correspondingly encoded and divided into two types: non-limiting factor face images and standard front natural face images.

The process of step S1 is as follows:

S11. Face information coding. For the different face data in the dataset, a face code with multiple non-limiting factors is constructed for each face image, wherein the non-limiting factors comprise, but are not limited to, face expression factors, face posture factors and shooting illumination factors.

The specific rules for coding face images are as follows:
A) the face expression factors are divided into eight situations, namely happy, angry, sad, contemptuous, disappointed, scared, surprised and natural; a face expression is encoded as $E = (E_1, E_2, \ldots, E_8)$, where $E_l$ represents the $l$-th expression, $l = 1, 2, \ldots, 8$, its value is in $[0, 1]$, and $E = (0, 0, \ldots, 1)$ means a natural expression;

B) the face illumination factors are divided into eight situations, mainly front illumination, left illumination, right illumination and combinations of these three, namely front illumination, left illumination, right illumination, front-left illumination, front-right illumination, left-right illumination, no illumination and full illumination; the illumination information of the face is encoded as $L = (L_1, L_2, \ldots, L_8)$, where $L_n$ represents the $n$-th illumination situation, $n = 1, 2, \ldots, 8$, its value is in $[0, 1]$, and $L = (0, 0, \ldots, 1)$ represents full illumination image information;

C) the face posture factors are divided into 19 situations, comprising 9 poses of the left face at 10° intervals, 9 poses of the right face at 10° intervals, and the front face posture, that is, left 90°, left 80°, left 70°, left 60°, left 50°, left 40°, left 30°, left 20°, left 10°, front face, right 10°, right 20°, right 30°, right 40°, right 50°, right 60°, right 70°, right 80° and right 90°; the posture information of the face is encoded as $A = (A_1, A_2, \ldots, A_m, \ldots, A_{19})$, where $A_m$ represents the $m$-th face pose, $m = 1, 2, \ldots, 19$, its value is in $[0, 1]$, and $A = (0, 0, \ldots, 1)$ represents front posture information. Finally, the face information code is integrated into the unified information code $U = \{L, E, A\}$, which is a 35-bit one-dimensional code.
S12. Classifying face data: classifying the encoded face data into non-limiting factor face images and standard front natural clear face images, specifically:

face images with the unified code information $U_0 = (L = (0, 0, \ldots, 1),\ E = (0, 0, \ldots, 1),\ A = (0, 0, \ldots, 1))$ are taken as the standard front natural clear face images and used as the target images of the model; the remaining face images are taken as the non-limiting factor face images and used as the input images of the model.
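The following Python sketch illustrates how such a 35-bit unified information code could be assembled; the helper names (`one_hot`, `unified_code`) and the assumption that each one-hot index follows the order in which the situations are listed above are illustrative, not taken from the patent.

```python
import numpy as np

# Situation lists in the order given above; the index-to-label mapping is assumed.
ILLUMINATIONS = ["front", "left", "right", "front_left", "front_right",
                 "left_right", "none", "full"]
EXPRESSIONS = ["happy", "angry", "sad", "contemptuous", "disappointed",
               "scared", "surprised", "natural"]
POSES = [f"left_{d}" for d in range(90, 0, -10)] + ["front"] + \
        [f"right_{d}" for d in range(10, 100, 10)]

def one_hot(label, vocabulary):
    vec = np.zeros(len(vocabulary), dtype=np.float32)
    vec[vocabulary.index(label)] = 1.0
    return vec

def unified_code(illumination, expression, pose):
    # U = {L, E, A}: concatenate the 8-bit, 8-bit and 19-bit codes.
    return np.concatenate([one_hot(illumination, ILLUMINATIONS),
                           one_hot(expression, EXPRESSIONS),
                           one_hot(pose, POSES)])

# Standard-face target code U0: full illumination, natural expression, front face.
U0 = unified_code("full", "natural", "front")
assert U0.shape == (35,)
```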
Further, in step S2,

assuming that the input image is $Y$ and its corresponding original unified information code is $U_Y$; the generated standard face image is $I_0$, and the standard-face unified information code $U_0$ corresponds to $I_0$; the corresponding standard face image in the database is $I$, and the unified information code corresponding to the standard face image $I$ is also $U_0$.

In the image generator sub-network, the inputs are the image $Y$ and the unified information code $U_0$. The invention designs two codec networks $G_c$ and $G_f$, which generate a colour information mask $C$ and an attention mask $F$ in combination with the attention mechanism; the standard face is then generated through the following synthesis mechanism:

$$C = G_c(Y, U_0), \quad F = G_f(Y, U_0)$$

$$I_0 = (1 - F) \odot C + F \odot Y$$

wherein $\odot$ represents element-wise multiplication of matrices. Accordingly, the codec network $G_c$ mainly focuses on the colour information and texture information of the face, and the codec network $G_f$ mainly focuses on the areas of the face that need to be changed.

In the model discriminator sub-network, the input is the image $I_0$ generated by the image generator sub-network. The invention likewise designs two deep convolution networks, an image discrimination sub-network $D_I$ and an information code discrimination sub-network $D_U$, which respectively distinguish the difference between the generated standard face image $I_0$ and the corresponding standard face image $I$ in the database, and the difference between the unified information code $U_0$ corresponding to the generated standard face image $I_0$ and the unified information code $U_0$ corresponding to the standard face image $I$ in the database.

In the image restoration sub-network, the inputs are the generated standard face image $I_0$ and the original unified information code $U_Y$ corresponding to the input image $Y$. The restoration sub-network is consistent with the image generator sub-network, and its restoration result is $\hat{Y}$. By comparing the restoration result $\hat{Y}$ with the input image $Y$ of the overall network, the goal of loop-optimizing the network result is achieved.
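A minimal sketch of the synthesis mechanism follows, assuming `Gc` and `Gf` are callables (e.g., Keras models) that map an image batch and a code batch to the two masks; the function name and tensor shapes are illustrative.

```python
def synthesize_standard_face(Y, U0, Gc, Gf):
    """Sketch of I0 = (1 - F) ⊙ C + F ⊙ Y.

    Y  : input face batch, shape (B, H, W, 3)
    U0 : standard-face unified information code batch, shape (B, 35)
    Gc : codec network producing the 3-channel colour information mask C
    Gf : codec network producing the 1-channel attention mask F in [0, 1]
    """
    C = Gc([Y, U0])              # colour and texture content
    F = Gf([Y, U0])              # attention mask, broadcast over the channel axis
    I0 = (1.0 - F) * C + F * Y   # element-wise blend per the synthesis mechanism
    return I0
```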
Further, the processing flow of the network model based on the generative adversarial mechanism and the attention mechanism is as follows:

First, input the input image $Y$ and the unified information code $U_0$ corresponding to the standard face image $I$ into the image generator sub-network, which incorporates the attention mechanism, to generate the standard face image $I_0$;

Then, in order to distinguish between real images and generated images, send the generated standard face image $I_0$ and the corresponding standard face image $I$ in the database (that is, the real image $I$) to the image discrimination sub-network $D_I$ in the model discriminator sub-network for discrimination, and at the same time send the unified information code $U_0$ corresponding to the generated standard face image $I_0$ and the unified information code $U_0$ corresponding to the standard face image $I$ in the database to the information code discrimination sub-network $D_U$ in the model discriminator sub-network for discrimination, optimizing through a continuous loop so that the image generator sub-network and the model discriminator sub-network progress together;

Finally, in order to loop-optimize the network model, the present invention designs an image restoration sub-network: the generated standard face image $I_0$ is restored according to the original unified information code $U_Y$ corresponding to the original input image $Y$, and the restoration result is compared with the input image $Y$. The entire network realizes convergence of the overall network model by continuously optimizing the corresponding loss function, and the removal of non-limiting environmental factors from the face image is finally realized. A code sketch of this cycle follows.
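The sketch below assumes `G`, `D_I` and `D_U` are the generator, image discriminator and code discriminator as callables, with `G` reused for restoration because the two sub-networks share parameters; the function and variable names are illustrative.

```python
def forward_cycle(Y, U_Y, U0, G, D_I, D_U):
    """One pass through the three sub-networks described above (a sketch)."""
    I0 = G([Y, U0])        # 1. generate the standard face from Y and the target code U0
    d_img = D_I(I0)        # 2a. image discrimination of the generated face
    d_code = D_U(I0)       # 2b. unified information code recovered from I0
    Y_hat = G([I0, U_Y])   # 3. restore I0 back towards the original input Y
    return I0, d_img, d_code, Y_hat
```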
Further, in step S3, the model training achieves convergence of the model by optimizing a loss function, wherein the design process of the loss function is as follows:

1) optimizing the difference between the generated standard face image $I_0$ and the corresponding standard face image $I$ in the database by discrimination: an image loss function is set as

$$L_I = \frac{1}{H \times W} \left\| D_I(I_0) - D_I(I) \right\|_2^2,$$

where $H$ and $W$ are the height and width of the output face image, and $D_I(I_0)$ and $D_I(I)$ are the evaluation results of the images $I_0$ and $I$ by the image discrimination sub-network; then, considering the effectiveness of a gradient loss, a gradient-based penalty is added to the image loss function, which may improve the efficiency of convergence and the quality of image generation; that is, the image loss function is designed as

$$L_I = \frac{1}{H \times W} \left\| D_I(I_0) - D_I(I) \right\|_2^2 + \lambda_I \frac{1}{H \times W} \left\| \nabla D_I(I_0) - 1 \right\|_2^2,$$

where $\nabla$ represents a gradient operation of the image, and $\lambda_I$ is the weight of the penalty;

2) optimizing the difference of the conditional unified information code: a conditional expression loss function is set, that is, distinguishing the difference between the generated standard face image $I_0$ and the corresponding standard face image $I$ in the database, whose unified information codes should both equal $U_0$; therefore, the conditional expression loss function is designed as

$$L_U = \frac{1}{N} \left\| D_U(I_0) - U_0 \right\|_2^2,$$

where $N$ is the length of the output unified information code; then, a mapping relationship between the input image $Y$ and its corresponding original unified information code $U_Y$ is added to the conditional expression loss function, which may improve the discriminating ability of the discriminator; therefore, the conditional expression loss function is designed as

$$L_U = \frac{1}{N} \left\| D_U(I_0) - U_0 \right\|_2^2 + \frac{1}{N} \left\| D_U(Y) - U_Y \right\|_2^2,$$

where $U_Y$ is the original unified information code corresponding to the input image $Y$, $U_0$ is the unified information code corresponding to the standard face image $I$, and $D_U(I_0)$ and $D_U(Y)$ are the discrimination results of the information code discrimination sub-network on the images $I_0$ and $Y$ respectively;

3) optimizing the difference between the result of the image restoration sub-network and the original input image: the image $I_0$ generated by the generator is restored with the original unified information code $U_Y$ and then compared with the original input image $Y$; therefore, the restoration loss function is designed as

$$L_r = \frac{1}{h \times w} \left\| G(G(Y, U_0), U_Y) - Y \right\|_1,$$

where $h$ and $w$ represent the height and width of the image, and $G$ represents the image generator sub-network.

Therefore, the loss function of the entire network model is: $L = L_I + L_U + L_r$.
By optimizing the loss function, the convergence of the network model is achieved, and a generator structure and weights for generating standard faces are obtained.
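A TensorFlow sketch of the three loss terms follows, written for eager execution; the penalty weight `lambda_I = 10.0` and the batch-summation details are assumptions, as the patent does not fix them.

```python
import tensorflow as tf

def image_loss(D_I, I0, I, lambda_I=10.0):
    # L_I: discriminator-response difference plus a gradient-based penalty.
    hw = tf.cast(tf.reduce_prod(tf.shape(I0)[1:3]), tf.float32)
    base = tf.reduce_sum(tf.square(D_I(I0) - D_I(I))) / hw
    with tf.GradientTape() as tape:
        tape.watch(I0)
        score = D_I(I0)
    grad = tape.gradient(score, I0)          # gradient of D_I w.r.t. the image
    penalty = tf.reduce_sum(tf.square(grad - 1.0)) / hw
    return base + lambda_I * penalty

def code_loss(D_U, I0, Y, U0, U_Y):
    # L_U: the generated image should carry U0, the input image its code U_Y.
    n = tf.cast(tf.shape(U0)[-1], tf.float32)
    return (tf.reduce_sum(tf.square(D_U(I0) - U0)) +
            tf.reduce_sum(tf.square(D_U(Y) - U_Y))) / n

def restoration_loss(G, I0, U_Y, Y):
    # L_r: restoring I0 with the original code U_Y should reproduce Y (1-norm).
    hw = tf.cast(tf.reduce_prod(tf.shape(Y)[1:3]), tf.float32)
    return tf.reduce_sum(tf.abs(G([I0, U_Y]) - Y)) / hw

# Total objective: L = L_I + L_U + L_r.
```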
Further, for the generation of the actual face image in step S4: first, the face positioning method based on the face Histogram of Oriented Gradients (HOG) image is used to obtain the face image in the actual image; then, the generator trained by the model and a manually set unified information code are used to realize rapid standard face generation for the face in the actual image. In addition, it is foreseeable that by setting different unified information codes, other attributes of the face can be changed; for example, it is feasible to control other expressions or to further change the face posture.
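As an illustration of the HOG-based face positioning, the sketch below uses dlib's stock HOG frontal face detector as a stand-in for the positioning method; the crop size of 128 pixels is an assumed value.

```python
import cv2
import dlib

detector = dlib.get_frontal_face_detector()  # HOG-based face detector

def extract_faces(image_path, size=128):
    img = cv2.imread(image_path)
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    faces = []
    for rect in detector(rgb, 1):  # upsample once to catch small faces
        top, bottom = max(rect.top(), 0), min(rect.bottom(), rgb.shape[0])
        left, right = max(rect.left(), 0), min(rect.right(), rgb.shape[1])
        faces.append(cv2.resize(rgb[top:bottom, left:right], (size, size)))
    return faces
```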
Compared with the prior art, the present invention has the following advantages and effects:
The present invention applies deep learning network technology to the standard face generation task to generate colour, front-facing standard face images under normal illumination; using the deep learning network method, accurate standard front face photos can be obtained, the difficulty of matching against data in a single-sample database is reduced, and a solid foundation is laid for subsequent face feature extraction and single-sample face recognition.
Brief Description of the Figures

Figure 1 is a flowchart of the model training and the model application in an embodiment of the present invention;
Figure 2 is a flowchart of a data construction of a database in an embodiment of the present invention;
Figure 3 is an overall design diagram of a network model in an embodiment of the present invention;
Figure 4 is a specific structure diagram of an image generation network in an embodiment of the present invention;
Figure 5 is a specific structure diagram of an image discrimination network in an embodiment of the present invention.
Description

In order to better clarify the objectives, technical solutions and advantages of the embodiments of the present invention, the technical solutions of the embodiments will be described clearly and completely in conjunction with the accompanying figures. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
Embodiments
This embodiment discloses a standard face generation method based on a generative adversarial mechanism and an attention mechanism, which mainly involves the following technologies: 1) design of the training data: using existing datasets to design unified information codes; 2) design of the network model structure: taking the generative adversarial network framework and the loop optimization network method as the basic network structure; 3) the standard face generation method: adding an attention mechanism to the generator to constrain the accuracy of standard face generation.
This embodiment is based on the TensorFlow framework and the PyCharm development environment. The TensorFlow framework is a development framework based on the Python language, which can build a reasonable deep learning network conveniently and quickly, and has good cross-platform interaction capabilities. TensorFlow provides interfaces for many packaged functions and various image processing functions in the deep learning architecture, comprising image processing functions related to OpenCV. The TensorFlow framework can also use GPUs to train and verify models, which improves computational efficiency.

The PyCharm development environment, available under the Windows platform or Linux platform, is an integrated development environment (IDE) that is currently one of the first choices for deep learning network design and development. PyCharm provides users with project templates, design tools, testing and debugging tools, and an interface to directly call remote servers.
The present embodiment discloses a standard face generation method based on a generative adversarial mechanism and an attention mechanism. The main process comprises two stages, model training and model application.
In the model training stage: first, processing the existing face data set, and generating a data set that meets the model training by designing a unified information code mechanism; then, using a cloud server with high computing power to train the network model, by optimizing a loss function and adjusting the network model parameters until the network model converges to obtain the generator structure and weights for generating standard faces.
In the model application stage: first, using the HOG face image processing method to extract faces from the actual picture to obtain the actual face image; then, calling the trained network model with a face image with non-limiting factors and the designed unified information code as inputs to perform standard face generation; finally obtaining a colour, front-facing face image.
Figure 1 is a flowchart of a standard face generation method based on a generative adversarial mechanism and an attention mechanism disclosed in this embodiment. Specific steps are as follows:
Step 1. Since current face databases mainly focus on recognition tasks, there is no face image database with the unified information codes required by the present invention. Therefore, it is necessary to integrate existing databases to construct a suitable database.
Figure 2 shows a construction process of a face image and a unified information code in the database.
Step 2. Figure 3 is an illustrative diagram of the overall architecture of the network model. The entire model framework mainly comprises three sub-networks: an image generator sub-network for generating standard faces, a model discriminator sub-network for discriminating the generated results, and an image restoration sub-network for restoring the generated results, wherein parameters are shared between the image generator sub-network and the image restoration sub-network, and the image generator sub-network mainly combines the attention mechanism to generate face images. Figure 4 shows the specific network structure of the image generator sub-network, and Figure 5 shows the specific network structure of the model discriminator sub-network.
The main parameters are as follows:
1) The image generator sub-network has the same parameters as the image restoration sub-network, and each comprises two generators, namely the colour information generator and the attention mask generator, specifically as follows:

The colour information generator comprises 8 convolution layers and 7 deconvolution layers; the convolution kernel size of all convolution layers is 5, the step size is 1, and finally a 3-channel colour information image is generated.

The attention mask generator comprises 8 convolution layers and 7 deconvolution layers; the convolution kernel size of all convolution layers is 5, the step size is 1, and finally a 1-channel attention mask is generated.

2) The model discriminator sub-network comprises two parts, namely an information code discrimination sub-network and an image discrimination sub-network, specifically as follows: the information code discrimination sub-network comprises 6 convolution layers and 1 fully connected layer, the convolution kernel size is 5, the step size is 1, and finally a one-dimensional unified information code of length N is generated; the image discrimination sub-network comprises 6 convolution layers, the convolution kernel size is 5, and the step size is 1. A sketch of one generator branch under these parameters is given below.
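The following Keras sketch shows one generator branch built from these parameters; the filter counts, activations, 128×128 input size and the tiling of the 35-bit code over the spatial grid are all assumptions, since the patent only fixes the layer counts, kernel size and step size.

```python
import tensorflow as tf
from tensorflow.keras import layers

def make_mask_generator(out_channels):
    image_in = layers.Input(shape=(128, 128, 3))   # assumed input resolution
    code_in = layers.Input(shape=(35,))            # unified information code
    code_map = layers.Reshape((1, 1, 35))(code_in)
    code_map = layers.UpSampling2D(size=(128, 128))(code_map)  # tile over the image
    x = layers.Concatenate()([image_in, code_map])
    for f in (64, 64, 128, 128, 256, 256, 512, 512):           # 8 convolution layers
        x = layers.Conv2D(f, 5, strides=1, padding="same", activation="relu")(x)
    for f in (512, 256, 256, 128, 64, 64):                     # first 6 of 7 deconvolutions
        x = layers.Conv2DTranspose(f, 5, strides=1, padding="same",
                                   activation="relu")(x)
    out = layers.Conv2DTranspose(out_channels, 5, strides=1, padding="same",
                                 activation="sigmoid")(x)      # 7th deconvolution
    return tf.keras.Model([image_in, code_in], out)

Gc = make_mask_generator(3)   # colour information generator (3-channel output)
Gf = make_mask_generator(1)   # attention mask generator (1-channel output)
```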
Step 3. The model training is carried out on a high-performance GPU. The specific training parameters are designed as follows: the Adam optimizer can be used, with its parameters set to 0.9/0.999; the learning rate is set to 0.0001; the number of training epochs is set to 100; the batch size for training depends on the number of training samples in the data.
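In TensorFlow these settings correspond to roughly the following; the batch size of 16 is a placeholder, as the patent leaves it to the data.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-4,   # learning rate 0.0001
                                     beta_1=0.9, beta_2=0.999)
EPOCHS = 100       # training epochs
BATCH_SIZE = 16    # placeholder; depends on the training samples
```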
Step 4. Model prediction: extracting the face in an actual image as the input of the model and controlling the unified information unit, a more standard front-facing face image output is finally obtained.
The above-mentioned embodiments are preferred embodiments of the present invention, but the embodiments of the present invention are not limited thereto. Any other changes, modifications, substitutions, combinations and simplifications made without departing from the spirit and principle of the present invention shall be equivalent replacements, and they are all included in the protection scope of the present invention.
Claims (7)
- 1. A standard face generation method based on a generative adversarial mechanism and an attention mechanism, characterized in that the generation method comprises the following steps:

  S1. data construction: collecting face data, constructing a face code with multiple non-limiting factors for each face image, then classifying the face data; wherein the non-limiting factors comprise face expression factors, face posture factors and shooting illumination factors; an encoded face image forms an information unit $U = \{L, E, A\}$, comprising an 8-bit illumination code $L$, an 8-bit expression code $E$, and a 19-bit posture code $A$;

  S2. establishing a network model based on the generative adversarial mechanism and the attention mechanism; the network model comprises three sub-networks: an image generator sub-network for generating a standard face, a model discriminator sub-network for discriminating the generated results, and an image restoration sub-network for restoring the generated results; first, using the image generator sub-network and the attention mechanism to generate a standard face from an input face image; then, using the model discriminator sub-network to discriminate the generated image; finally, constructing the image restoration sub-network, restoring the generated image, and comparing the restoration result with the input image to optimize the constraints of the network model;

  S3. model training: using the information unit $U = \{L, E, A\}$ as an input to optimize the outputs of the image generator sub-network, the model discriminator sub-network and the image restoration sub-network, and labeling similarities, to achieve convergence of the network model based on the generative adversarial mechanism and the attention mechanism;

  S4. model prediction: extracting a face image from an actual image as the input of the network model, finally obtaining a standard front face image output by controlling the information unit $U$;

  wherein the face expression factors are divided into eight situations, namely happy, angry, sad, contemptuous, disappointed, scared, surprised and natural; a face expression is encoded as $E = (E_1, E_2, \ldots, E_8)$, where $E_l$ represents the $l$-th expression, $l = 1, 2, \ldots, 8$, its value is in $[0, 1]$, and $E = (0, 0, \ldots, 1)$ means a natural expression;

  the face illumination factors are divided into eight situations, namely front illumination, left illumination, right illumination, front-left illumination, front-right illumination, left-right illumination, no illumination and full illumination; the illumination information of the face is encoded as $L = (L_1, L_2, \ldots, L_8)$, where $L_n$ represents the $n$-th illumination situation, $n = 1, 2, \ldots, 8$, its value is in $[0, 1]$, and $L = (0, 0, \ldots, 1)$ represents full illumination image information;

  the face posture factors are divided into 19 situations, namely left 90°, left 80°, left 70°, left 60°, left 50°, left 40°, left 30°, left 20°, left 10°, front face, right 10°, right 20°, right 30°, right 40°, right 50°, right 60°, right 70°, right 80° and right 90°; the posture information of the face is encoded as $A = (A_1, A_2, \ldots, A_m, \ldots, A_{19})$, where $A_m$ represents the $m$-th face pose, $m = 1, 2, \ldots, 19$, its value is in $[0, 1]$, and $A = (0, 0, \ldots, 1)$ represents front posture information.
- 2. The standard face generation method based on a generative adversarial mechanism and an attention mechanism according to claim 1, characterized in that the process of classifying face data in step S1 is as follows: classifying the encoded face data into non-limiting factor face images and standard front natural clear face images, wherein face images with the unified code information $U_0 = (L = (0, 0, \ldots, 1),\ E = (0, 0, \ldots, 1),\ A = (0, 0, \ldots, 1))$ are taken as the standard front natural clear face images and used as target images of the model, and the remaining face images are taken as the non-limiting factor face images and used as input images of the model.
- 3. The standard face generation method based on a generative adversarial mechanism and an attention mechanism according to claim 1, characterized in that the inputs of the image generator sub-network are the image $Y$ and the standard-face unified information code $U_0$; the image generator sub-network comprises two codec networks $G_c$ and $G_f$, wherein the codec network $G_c$ focuses on the colour information and texture information of the face, and the codec network $G_f$ focuses on the areas of the face that need to be changed, generating a colour information mask $C$ and an attention mask $F$ in combination with the attention mechanism, and then generating the standard face through the following synthesis mechanism:

  $$C = G_c(Y, U_0), \quad F = G_f(Y, U_0)$$

  $$I_0 = (1 - F) \odot C + F \odot Y$$

  wherein $\odot$ represents element-wise multiplication of matrices;

  in the model discriminator sub-network, its input is the image $I_0$ generated by the image generator sub-network; the model discriminator sub-network comprises two deep convolution networks, an image discrimination sub-network $D_I$ and an information code discrimination sub-network $D_U$, which respectively distinguish the difference between the generated standard face image $I_0$ and the corresponding standard face image $I$ in a database, and the difference between the unified information code $U_0$ corresponding to the generated standard face image $I_0$ and the unified information code $U_0$ corresponding to the corresponding standard face image $I$ in the database;

  the inputs of the image restoration sub-network are the generated standard face image $I_0$ and the original unified information code $U_Y$ corresponding to the input image $Y$, and the output of the image restoration sub-network is a network restoration result $\hat{Y}$; by comparing the restoration result $\hat{Y}$ with the input image $Y$ of the overall network, a loop-optimized network result is realized.
- 4. The standard face generation method based on a generative adversarial mechanism and an attention mechanism according to claim 3, characterized in that the process of step S2 is as follows:

  first, inputting the input image $Y$ and the unified information code $U_0$ corresponding to the standard face image $I$ into the image generator sub-network incorporating the attention mechanism to generate the standard face image $I_0$;

  then, sending the generated standard face image $I_0$ and the corresponding standard face image $I$ in the database to the deep convolution network $D_I$ in the model discriminator sub-network for discrimination, and at the same time sending the unified information code $U_0$ corresponding to the generated standard face image $I_0$ and the unified information code $U_0$ corresponding to the standard face image $I$ in the database to the deep convolution network $D_U$ in the model discriminator sub-network for discrimination, so that the image generator sub-network and the model discriminator sub-network are optimized simultaneously;

  finally, inputting the generated standard face image $I_0$ to the image restoration sub-network, restoring based on the original unified information code $U_Y$ corresponding to the original input image $Y$, comparing the restoration result $\hat{Y}$ with the input image $Y$, and continuously optimizing the corresponding loss function to achieve convergence of the network model based on the generative adversarial mechanism and the attention mechanism.
- 5. The standard face generation method based on a generative adversarial mechanism and an attention mechanism according to claim 1, characterized in that, in step S3, the model training achieves convergence of the model by optimizing a loss function, wherein the design process of the loss function is as follows:

  optimizing the difference between the generated standard face image $I_0$ and the corresponding standard face image $I$ in the database by discrimination: an image loss function is set as $L_I = \frac{1}{H \times W} \left\| D_I(I_0) - D_I(I) \right\|_2^2$, where $H$ and $W$ are the height and width of the output face image, and $D_I(I_0)$ and $D_I(I)$ are the evaluation results of the images $I_0$ and $I$ by the image discrimination sub-network; then, considering the effectiveness of a gradient loss, a gradient-based penalty is added to the image loss function, that is, the image loss function is designed as $L_I = \frac{1}{H \times W} \left\| D_I(I_0) - D_I(I) \right\|_2^2 + \lambda_I \frac{1}{H \times W} \left\| \nabla D_I(I_0) - 1 \right\|_2^2$, where $\nabla$ represents a gradient operation of the image and $\lambda_I$ is the weight of the penalty;

  optimizing the difference of the conditional unified information code: a conditional expression loss function is set, that is, distinguishing the difference between the generated standard face image $I_0$ and the corresponding standard face image $I$ in the database, whose unified information codes should both equal $U_0$; the conditional expression loss function is designed as $L_U = \frac{1}{N} \left\| D_U(I_0) - U_0 \right\|_2^2$, where $N$ is the length of the output unified information code; then, a mapping relationship between the input image $Y$ and its corresponding original unified information code $U_Y$ is added to the conditional expression loss function; therefore, the conditional expression loss function is designed as $L_U = \frac{1}{N} \left\| D_U(I_0) - U_0 \right\|_2^2 + \frac{1}{N} \left\| D_U(Y) - U_Y \right\|_2^2$, where $U_Y$ is the original unified information code corresponding to the input image $Y$, $U_0$ is the unified information code corresponding to the standard face image $I$, and $D_U(I_0)$ and $D_U(Y)$ are the discrimination results of the information code discrimination sub-network on the images $I_0$ and $Y$ respectively;

  optimizing the difference between the result of the image restoration sub-network and the original input image: the image $I_0$ generated by the generator is restored with the original unified information code $U_Y$ and then compared with the original input image $Y$; therefore, the restoration loss function is designed as $L_r = \frac{1}{h \times w} \left\| G(G(Y, U_0), U_Y) - Y \right\|_1$, where $h$ and $w$ represent the height and width of the image, and $G$ represents the image generator sub-network;

  the loss function of the entire network model is: $L = L_I + L_U + L_r$.
- 6. The standard face generation method based on a generative adversarial mechanism and an attention mechanism according to claim 1, characterized in that the process of step S4 is as follows: first, using a face positioning method based on a face HOG image to obtain a face image in the actual image; then, using a generator trained by the network model and a manually set unified information code to realize rapid standard face generation for the face in the actual image.
- 7. The standard face generation method based on a generative adversarial mechanism and an attention mechanism according to claim 1, characterized in that, in step S1, the face data are collected from a RaFD face dataset and an IAIR face dataset.
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910121233.XA CN109934116B (en) | 2019-02-19 | 2019-02-19 | Standard face generation method based on confrontation generation mechanism and attention generation mechanism |
CN201910121233.X | 2019-02-19 | ||
PCT/CN2019/112045 WO2020168731A1 (en) | 2019-02-19 | 2019-10-18 | Generative adversarial mechanism and attention mechanism-based standard face generation method |
Publications (2)
Publication Number | Publication Date |
---|---|
AU2019430859A1 AU2019430859A1 (en) | 2021-04-29 |
AU2019430859B2 true AU2019430859B2 (en) | 2022-12-08 |
Family
ID=66985683
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
AU2019430859A Active AU2019430859B2 (en) | 2019-02-19 | 2019-10-18 | Generative adversarial mechanism and attention mechanism-based standard face generation method |
Country Status (3)
Country | Link |
---|---|
CN (1) | CN109934116B (en) |
AU (1) | AU2019430859B2 (en) |
WO (1) | WO2020168731A1 (en) |
Application events:
- 2019-02-19: CN application CN201910121233.XA — patent CN109934116B (active)
- 2019-10-18: AU application AU2019430859A — patent AU2019430859B2 (active)
- 2019-10-18: WO application PCT/CN2019/112045 — WO2020168731A1 (application filing)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101777116A (en) * | 2009-12-23 | 2010-07-14 | 中国科学院自动化研究所 | Method for analyzing facial expressions on basis of motion tracking |
WO2018020275A1 (en) * | 2016-07-29 | 2018-02-01 | Unifai Holdings Limited | Computer vision systems |
CN108510061A (en) * | 2018-03-19 | 2018-09-07 | 华南理工大学 | The method that more positive faces of monitor video human face segmentation of confrontation network are generated based on condition |
Non-Patent Citations (3)
- Bettadapura, V. "Face expression recognition and analysis: the state of the art." arXiv preprint arXiv:1203.6722 (2012). *
- Liu, X. et al. "Normalized face image generation with perceptron generative adversarial networks." 2018 IEEE 4th International Conference on Identity, Security, and Behavior Analysis (ISBA), IEEE, 2018, pp. 1-8. *
- Zhang, G. et al. "Generative adversarial network with spatial attention for face attribute editing." Proceedings of the European Conference on Computer Vision (ECCV), 2018. *
Also Published As
Publication number | Publication date |
---|---|
CN109934116B (en) | 2020-11-24 |
WO2020168731A1 (en) | 2020-08-27 |
AU2019430859A1 (en) | 2021-04-29 |
CN109934116A (en) | 2019-06-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FGA | Letters patent sealed or granted (standard patent) |