CN112800937A - Intelligent face recognition method - Google Patents

Intelligent face recognition method

Info

Publication number
CN112800937A
CN112800937A (application CN202110101590.7A; granted publication CN112800937B)
Authority
CN
China
Prior art keywords
face
picture
posture
identity
expression
Prior art date
Legal status
Granted
Application number
CN202110101590.7A
Other languages
Chinese (zh)
Other versions
CN112800937B (en)
Inventor
李弘
肖南峰
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110101590.7A
Publication of CN112800937A
Application granted
Publication of CN112800937B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/46 Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 Salient features, e.g. scale invariant feature transforms [SIFT]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Security & Cryptography (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Hardware Design (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses an intelligent face recognition method comprising the following steps: 1) face detection: cropping, from an original picture, a source-pose face picture whose main content is the face; 2) face alignment: identifying and locating the face key points in the source-pose face picture; 3) face pose rotation: according to the source-pose face picture and a selected pose, generating a target-pose face picture while keeping the identity and expression information of the source picture; 4) facial expression and identity recognition: judging the expression and identity of the face in the picture by combining the source-pose and target-pose face pictures. The invention provides an end-to-end recognition method built on three innovations: an attention mechanism, a generative adversarial network, and integrated learning. It breaks through the limitation of extreme poses, uses the synthesized frontal image for unconstrained face identity and expression recognition, improves accuracy and robustness, and has broad application prospects in the field of face recognition.

Description

Intelligent face recognition method
Technical Field
The invention relates to the technical field of face recognition, in particular to an intelligent face recognition method.
Background
Vision tasks related to the human face are an important field of computer vision applications and have made tremendous progress with the help of deep learning. However, in real application scenarios the performance of visual algorithms is severely restricted by complex factors such as multiple view angles, expressions, illumination and occlusion, among which pose change degrades performance most seriously. The "recognize after rotation" strategy, i.e., rotating the face to the front before recognizing it, is one of the mainstream means of solving the face pose problem. Referring to fig. 1, the general flow of "recognize after rotation" can be summarized as face detection, face alignment, face pose rotation, and face recognition.
Face detection: a partial picture whose main content is the human face is cropped from the original picture and fed to the subsequent stages. The current mainstream face detection pipeline works from coarse to fine: the whole image is sliced with different window sizes and step lengths, networks of different depths judge whether each slice contains a face image and correct the bounding-box localization, and finally several image regions most likely to contain a face are obtained.
Face alignment: identifying and locating the key points of the human face. Face key points are feature points predefined on the face picture, mainly located around or at the centers of facial components such as the five sense organs and the facial contour. The common 68-key-point labeling scheme is shown in fig. 2.
Face pose rotation: given a face picture in an arbitrary pose, keep its identity and expression information and convert it to generate visually realistic pictures in other poses. In the current literature, most work addresses frontalization in the horizontal direction, i.e., given a non-frontal face picture, generate a frontal-pose picture. Applications include 3D face modeling when data sources are scarce; correcting, during photo editing, faces in a group photo that do not look at the lens into direct-view faces; and face synthesis in virtual and augmented reality. Face rotation follows 2 main technical routes: the 2D route directly converts the source-pose face picture into the target-pose face picture, while the 3D route constructs a 3D model from the source-pose face picture, rotates it to the target pose, projects it onto a 2D plane and renders the final picture. The present invention adopts the 2D strategy.
Face recognition: a broad term whose detailed application categories include face authentication, identity recognition, attribute recognition, expression recognition, and the like. Identity recognition has two main application forms: identity query, where a face to be tested and a face database of a certain scale are given and the identity number of the face is identified; and identity authentication, where a face to be tested and a comparison face are given and it is judged whether the two faces share the same identity.
Expression recognition: descriptions of expression are generally divided into discrete labels, expression action units, and continuous expression spaces. For simplicity and practicality, the present invention adopts the discrete-label approach. The 7 basic expression categories are "fear", "anger", "disgust", "happiness", "neutral", "sadness" and "surprise", see fig. 3.
Face recognition has been widely applied to many aspects of social life, such as passes and payment authentication based on face verification, human-machine emotion interaction based on identity and expression recognition, public management monitoring, and driver monitoring. However, in real application environments a large number of face recognition tasks face unconstrained conditions such as changing poses, expressions, illumination and occlusion; extreme poses in particular degrade the performance of face recognition systems most severely.
In early research on extreme-pose face recognition, Liu et al. trained multiple mutually independent sub-networks for extracting bottom-layer features based on discrete pose labels, adopting a simple attention sub-channel strategy for feature extraction, so computational efficiency and flexibility were weak.
At present, few studies address the retention and recognition of expression features during face normalization. Existing rotate-then-recognize algorithms either have obvious defects in the visual quality of the generated pictures, cannot restore strong expression actions, or lack modeling of pose change in the vertical direction; Luan et al. even propose eliminating expression information as interference during rotation, regressing faces with various expressions into frontal neutral-expression faces. There remains great room for progress in the field of face rotation and recognition.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an intelligent face recognition method, which can effectively process and recognize face images in extreme postures in various practical application scenes and expand the application range of a face recognition system.
In order to achieve the above purpose, the technical scheme provided by the invention is as follows: an intelligent face recognition method comprising the following steps (a high-level sketch in code follows the list):
1) face detection: cropping from an original picture a source-pose face picture whose main content is the face;
2) face alignment: identifying and locating the face key points in the source-pose face picture;
3) face pose rotation: according to the source-pose face picture and a selected pose, keeping the identity and expression information of the source-pose face picture and converting it to generate a visually realistic target-pose face picture;
4) facial expression and identity recognition: judging the expression and identity of the face in the picture by combining the source-pose and target-pose face pictures.
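The four stages chain into one pipeline. The following Python sketch shows the data flow only; all module names (detector, aligner, generator, recognizer and their methods) are hypothetical placeholders for the networks described below, not an implementation of them.

```python
# Hypothetical high-level pipeline sketch; module names are placeholders,
# not real library APIs. Each stage mirrors steps 1)-4) above.

def recognize_face(original_image, detector, aligner, generator, recognizer):
    # 1) Face detection: crop the source-pose face picture from the original.
    source_face = detector.crop_face(original_image)

    # 2) Face alignment: locate key points and build the two heat maps
    #    (key-point heat map, radius 3; attention heat map, radius 25).
    keypoints = aligner.locate(source_face)
    kp_heatmap, attn_heatmap = aligner.heatmaps(keypoints)

    # 3) Face pose rotation: synthesize the target-pose picture while
    #    preserving identity and expression information.
    target_face = generator.rotate(source_face, kp_heatmap, attn_heatmap)

    # 4) Expression and identity recognition, combining both pictures.
    expression = recognizer.expression(source_face, target_face)
    identity = recognizer.identity(source_face, target_face)
    return expression, identity
```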
In step 1), a multitask convolutional neural network (MTCNN) is adopted to preprocess the face data, cropping from original pictures in various practical application fields the local pictures whose main content is the face.
In step 2), a neural network based on an adaptive loss function (AWing) detects face pictures in various poses and determines the positions of the face key points, from which heat maps with Gaussian divergence around the key points are generated. The heat maps are of two kinds: key-point heat maps with a divergence radius of 3 and attention heat maps with a divergence radius of 25. The key-point heat maps provide pose guidance for the subsequent steps; the attention heat maps are used for image detail enhancement in the subsequent steps.
In step 3), a generative adversarial network with a fused attention mechanism (AFGAN) is provided; according to the source pose and the selected pose, it keeps the identity and expression information of the face picture and converts it to generate a visually realistic target-pose face picture.
The network structure of AFGAN is based on CGAN and comprises 1 generator G, 2 discriminators D_ii and D_ih, and an identity feature extractor D_ip. The generator G takes the source-pose face picture and the target-pose key-point heat map as input, uses the key-point heat map as pose condition information, and outputs the synthesized target-pose picture. The discriminator D_ii takes the source-pose face picture and a real or synthesized target-pose face picture as input; the discriminator D_ih takes a real or synthesized target-pose face picture and the target-pose key-point heat map; both output an authenticity label and an expression label. The structures of the discriminators D_ii and D_ih are basically consistent. The identity feature extractor D_ip is a LightCNN model, and the identity feature vectors it extracts are used to maintain identity-information consistency before and after rotation and for the identity recognition task.
The loss functions used to train the generator G in AFGAN comprise a conditional adversarial loss, a total variation loss, an identity-preserving loss, an expression recognition loss and a multi-scale pixel-value loss. The conditional adversarial loss ensures the realism of the synthesized picture, the total variation loss suppresses jagged distortion of the synthesized picture, the identity-preserving loss maintains identity-information consistency, the expression recognition loss maintains expression-information consistency, and the multi-scale pixel-value loss accelerates training convergence.
the method comprises the steps of using a generator G in the AFGAN to realize the rotation of the face pose and the picture synthesis, multiplying a source pose face picture by a source pose attention heat map based on an attention mechanism of a face key point to generate an attention subgraph, wherein the formal definition is as follows:
x=Is+Ht,x1=(Is*Hs)+Ht
wherein, IsAs a source pose, HsAs a source pose attention heat map, HtRepresenting element series multiplication operation for the key point heat map of the target attitude, + representing the connection operation of the matrix, x being the main channel input, x1An attention subchannel input;
the generator G respectively processes the input of the main channel and the attention sub-channel, and performs connection fusion on the characteristic output of the two channels, wherein the formalization definition is as follows:
G2(x,x1)=Linear(G1(x)+G1(x1))
wherein G is2(x,x1) To generate input features of the penultimate layer of G, G1(x) Output characteristic of the penultimate layer of the main channel, G1(x1) For feature output of the penultimate layer of the attention subchannel, Linear () represents a Linear combination;
the final feature after fusion is processed by the last layer of the generator G to obtain a synthetic target pose picture, formally defined as:
It=G3(G2(x,x1))
wherein, ItFor the resultant target pose face picture, G3() To the output of the generator G.
Further, during pose rotation AFGAN uses the identity recognition network LightCNN as the identity feature extractor D_ip to extract identity feature vectors from the synthesized and the real target-pose pictures respectively, penalizes the error between them through the identity-preserving loss function, and maintains the consistency of face identity information. Expression recognition learning is integrated in the subsequent discriminators: the expression recognition loss function forces the pictures synthesized by the generator G to exhibit the same expression characteristics as real pictures, maintaining the before-and-after consistency of facial expression information.
In step 4), the generative adversarial network AFGAN with the fused attention mechanism is used to recognize the facial expression, adopting the face-key-point-based attention mechanism during recognition.
The network structure of AFGAN is based on CGAN and comprises 1 generator G, 2 discriminators D_ii and D_ih, and an identity feature extractor D_ip. The generator G takes the source-pose face picture and the target-pose key-point heat map as input, uses the key-point heat map as pose condition information, and outputs the synthesized target-pose picture. The discriminator D_ii takes the source-pose face picture and a real or synthesized target-pose face picture as input; the discriminator D_ih takes a real or synthesized target-pose face picture and the target-pose key-point heat map; both output an authenticity label and an expression label. The structures of the discriminators D_ii and D_ih are basically consistent. The identity feature extractor D_ip is a LightCNN model, and the identity feature vectors it extracts are used to maintain identity-information consistency before and after rotation and for the identity recognition task.
The loss functions used to train the discriminators D_ii and D_ih in AFGAN comprise the conditional adversarial loss and the expression recognition loss. The conditional adversarial loss ensures the discriminators can distinguish real from synthesized face pictures, and the expression recognition loss ensures the discriminators can recognize facial expressions.
The 2 discriminators D_ii and D_ih in AFGAN check the realism of the synthesized picture and recognize the facial expression. Based on the face-key-point attention mechanism, the face picture to be recognized is multiplied element-wise by the corresponding attention heat maps to generate attention sub-pictures, formally defined as:

x = I_t + H_t,  x_1 = (I_s * H_s1) + H_t1,  x_2 = (I_s * H_s2) + H_t2

where I_t is the face picture to be recognized, i.e., a real or synthesized target-pose picture; H_s1 and H_s2 are the eye and mouth attention heat maps of the source pose respectively; H_t1 and H_t2 are the eye and mouth key-point heat maps of the target pose respectively; * denotes element-wise multiplication; + denotes matrix concatenation; and x, x_1, x_2 are the main-channel, eye-attention-sub-channel and mouth-attention-sub-channel inputs respectively.

The discriminator D_ih processes the main-channel input and the two attention-sub-channel inputs separately, then concatenates and fuses the feature outputs of the three channels, formally defined as:

D_2(x, x_1, x_2) = Linear(D_1(x) + D_1(x_1) + D_1(x_2))

where D_2(x, x_1, x_2) is the final output of the common feature-extraction stage of D_ih; D_1(x), D_1(x_1) and D_1(x_2) are the common feature outputs of the main channel, eye attention sub-channel and mouth attention sub-channel respectively; and Linear() denotes a linear combination.

The fused common features enter the two output branches of the discriminator D_ih respectively, yielding the realism feature matrix and the expression-prediction feature vector of the face picture to be recognized, formally defined as:

Exp = D_3(D_2(x, x_1, x_2)),  Gan = D_4(D_2(x, x_1, x_2))

where D_3() is the expression output branch, Exp is the expression-prediction feature vector, D_4() is the picture-realism output branch, and Gan is the picture-realism feature matrix.

The discriminator D_ii is processed the same way as D_ih, except that H_t is replaced by the source-pose face picture I_s; everything else stays consistent.
Further, a "recognize after rotation" strategy is adopted: the face is recognized after rotation to the front, and the identity recognition network LightCNN directly recognizes the face identity by combining the source-pose face picture with the synthesized frontal face picture.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. The invention pioneers an attention mechanism based on face key points. Academia and industry have conducted extensive and intensive research on attention mechanisms in artificial intelligence, particularly in computer vision. Previous results mainly rely on the neural network self-learning an attention heat map, which is effective but lacks interpretability and stability, or simply crop the picture around fixed facial-feature regions, implicitly increasing the attention weight of those regions but lacking flexibility and adaptivity. The AFGAN method combines the reliable face key points extracted by the upstream networks MTCNN and AWing to generate attention heat maps around the key-point regions, giving higher interpretability and effectiveness.
2. The invention is the first to fuse the face-key-point attention mechanism with a generative adversarial network. It makes full use of the strong learning and reasoning ability of the generative adversarial network, modeling reliable face structure migration and reconstruction to ensure the overall realism of the image; it constructs the face-key-point attention mechanism to improve feature extraction and image reconstruction quality in the local key-point regions, obtaining synthesized pictures with more realistic and complete details. The advantages of the two complement each other.
3. The invention is not limited to face frontalization: it can rotate a face picture of any pose into a specified pose while keeping identity and expression information consistent. The face rotation part of the invention is therefore not limited to improving face recognition accuracy and can be applied to other fields related to face images.
4. In the expression recognition process, the invention also introduces the face-key-point attention mechanism, extracting and analyzing more efficiently the image regions closely related to facial expression. The "recognize after rotation" strategy improves the recognition accuracy of facial expression pictures in all poses.
5. In the identity recognition process, the invention adopts the "recognize after rotation" strategy, combining the source-pose face picture with the synthesized frontal face picture and improving the identity recognition accuracy in all poses.
6. The invention pioneers the integrated learning of face rotation, expression recognition and identity recognition. Conventional research treats face rotation, identity recognition and expression recognition as mutually independent machine learning tasks and develops different neural network models for each. In contrast, the invention proposes integrated learning that discovers and utilizes the common features and expression rules among the three tasks: identity and expression consistency is maintained during face rotation, and the bottom-layer features obtained during rotation are effectively reused during identity and expression recognition. Compared with learning the three tasks independently, this saves model capacity, increases computation speed, and improves both synthesis and recognition.
7. The invention breaks through the limitations that pose and expression impose on face recognition systems, broadens the usable range of face recognition applications under large poses and large expressions, effectively utilizes face image data that were previously hard to use, and benefits many fields such as social public safety monitoring, driving monitoring, teaching monitoring, pass verification, and human-computer interaction of service robots.
Drawings
Fig. 1 is an overall flow of face rotation and recognition.
Fig. 2 is a labeled diagram of 68 key points of a human face.
Fig. 3 shows 7 basic expressions of human face.
FIG. 4 is a schematic diagram of the structure of AFGAN.
Fig. 5 is a schematic diagram of an attention mechanism based on face key points.
FIG. 6 is a diagram of the visual effect of AFGAN application.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the present invention is not limited thereto.
The embodiment discloses an intelligent face recognition method; the specific implementation is as follows:
1) face detection: and intercepting a source posture face picture taking the face of the person as the main content from the original picture.
The face detection adopts a multitask convolutional neural network (MTCNN) to carry out preprocessing of face data, and partial pictures taking the face as main contents are intercepted from original pictures in various practical application fields.
MTCNN combines the face-region detection task and the arbitrary-pose key-point alignment task into one integrated learning task, sequentially learning image pyramids of different resolutions with cascaded CNNs. The model takes an original picture as input and outputs face-region bounding boxes and 5 corresponding key points.
First, an image pyramid of different resolutions is constructed over the whole picture, and the three-stage networks P-Net, R-Net and O-Net sequentially perform three tasks (i.e., three output branches) on picture slices of different resolutions: judging whether a slice is a face (binary classification), face-bounding-box regression (adjusting the frame position), and face key-point alignment (performed on O-Net). The result of each stage serves as input to the next-stage network.
Thus MTCNN first uses a small network to rapidly extract a large number of face candidates on low-resolution pictures, then uses larger networks to screen the extracted candidates, finally obtaining a fine result.
The MTCNN objective function consists of the loss terms of the three tasks. The face-region classification loss is a binary cross-entropy loss judging whether a picture region contains a face image; the bounding-box regression loss computes the squared Euclidean distance between the vertices of each candidate window and the closest real face region; and the face key-point localization loss similarly computes the squared Euclidean distances between the 5 predicted landmark points and the real key points.
Considering the different sample types (for example, a non-face region needs neither bounding-box regression nor landmark alignment), different networks use different task weights; if a sample does not involve a task, that task's label for the sample is set to 0, otherwise 1. During training, a hard-sample mining strategy identifies hard samples and gives them higher penalty weight: the samples with the top 70% loss values in each training batch are recorded as hard samples, and only the gradients they produce are back-propagated, as sketched below.
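A minimal PyTorch sketch of this mining step, assuming the per-sample losses have already been computed without reduction; the task weights in the usage comment are illustrative, not the patent's values.

```python
import torch

def hard_sample_loss(per_sample_losses: torch.Tensor, keep_ratio: float = 0.7):
    """Online hard-sample mining as described: keep the samples with the
    top 70% loss values in the batch and back-propagate only their gradients.
    `per_sample_losses` is a 1-D tensor of unreduced losses."""
    k = max(1, int(keep_ratio * per_sample_losses.numel()))
    hard_losses, _ = torch.topk(per_sample_losses, k)  # largest k losses
    return hard_losses.mean()

# Usage sketch: combine the three MTCNN task losses with per-network weights
# (cls_loss, box_loss, lmk_loss are unreduced per-sample loss tensors):
# total = hard_sample_loss(1.0 * cls_loss + 0.5 * box_loss + 0.5 * lmk_loss)
```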
2) Face alignment: identifying and locating the face key points in the source-pose face picture.
Face alignment adopts a neural network based on an adaptive loss function (AWing) to detect face pictures in various poses and determine the positions of the face key points, from which heat maps diverging around the key points are generated. The heat maps are of two kinds: key-point heat maps with a divergence radius of 3 and attention heat maps with a divergence radius of 25. The key-point heat maps provide pose guidance for the subsequent steps; the attention heat maps are used for image detail enhancement in the subsequent steps. A sketch of the heat-map construction follows.
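A minimal NumPy sketch of drawing the two kinds of heat maps; treating the divergence radius as the Gaussian spread is an assumption, since the patent does not specify the exact kernel.

```python
import numpy as np

def gaussian_heatmap(h, w, points, radius):
    """Draw a heat map that diverges with a Gaussian around each key point.
    `points` is an iterable of (x, y); `radius` controls the spread
    (3 for key-point heat maps, 25 for attention heat maps)."""
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w), dtype=np.float32)
    for (px, py) in points:
        g = np.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2.0 * radius ** 2))
        heat = np.maximum(heat, g)  # keep the strongest response per pixel
    return heat

# keypoint_map = gaussian_heatmap(128, 128, landmarks, radius=3)
# attention_map = gaussian_heatmap(128, 128, landmarks, radius=25)
```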
AWing aims to solve face alignment in complex environments. The mainstream loss function currently used for face key-point alignment is the mean squared error (MSE), which over-tolerates tiny errors and yields blurry predicted key-point heat maps. AWing takes an original picture as input and outputs predicted key-point heat maps; the supervision signal is the heat map drawn by Gaussian divergence from the true key points (high pixel values near a key point are foreground, low pixel values far from it are background).
The network body is a stacked Hourglass (HG) model, similar to a nested residual model: the down-sampling layers are symmetric to the up-sampling layers, with addition between the outputs of symmetric layers, but each layer is expanded into a residual module. Each HG predicts a landmark heat map and a single-channel boundary-line heat map (boundary lines are interpolated from the real key points).
AWing provides an adaptive wing loss function that treats foreground and background pixels differently, punishing small errors on foreground pixels (making landmark localization accurate) and tolerating background-pixel errors (making the loss easy to converge); a weighted loss map adjusts the foreground and background pixel loss weights according to the real heat map; and boundary prediction with coordinate convolution on boundary pixels is introduced.
The adaptive wing function is the core component of the model. When the error is small, an ln form is adopted, and the smaller the error value, the larger the gradient; when the error is large, a linear function is adopted and the gradient is stable. The exponent depends on the pixel value of the real key-point heat map: where the value is large (foreground) the ln exponent approaches 1, yet the gradient drops rapidly as the error approaches 0, avoiding gradient discontinuity; where the value is small (background) the ln exponent approaches 2, so the loss behaves like MSE for small errors, i.e., background pixels are allowed to tolerate small errors. A sketch follows.
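The adaptive wing loss can be written down from its published form (Wang et al., ICCV 2019); the PyTorch sketch below uses that paper's default hyper-parameters (alpha=2.1, omega=14, epsilon=1, theta=0.5), which the patent text does not itself specify.

```python
import torch

def adaptive_wing_loss(pred, target, alpha=2.1, omega=14.0, epsilon=1.0, theta=0.5):
    """Adaptive wing loss over heat maps. The target pixel value modulates the
    ln exponent (alpha - target): near key points (target ~ 1) the exponent
    approaches alpha - 1; far away (target ~ 0) it approaches alpha ~ 2,
    behaving like MSE for small background errors."""
    diff = (pred - target).abs()
    expo = alpha - target
    # Linear-branch constants, chosen so the two branches join smoothly.
    a = omega * (1.0 / (1.0 + (theta / epsilon) ** expo)) * expo \
        * ((theta / epsilon) ** (expo - 1.0)) / epsilon
    c = theta * a - omega * torch.log1p((theta / epsilon) ** expo)
    loss = torch.where(
        diff < theta,
        omega * torch.log1p((diff / epsilon) ** expo),  # ln branch, small errors
        a * diff - c,                                   # linear branch, large errors
    )
    return loss.mean()
```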
For the weighted loss map, a dilation operation from digital image processing is applied to the real key-point heat map, a binary mask is divided with 0.2 as the threshold, and the computed AWing loss values falling on the foreground are multiplied by the loss weight.
For boundary prediction with coordinate convolution, a coordinate-information channel consisting of the pixels' x and y coordinates is first added to the network; the first HG generates a supervised boundary heat map, from which a boundary mask is generated with a threshold of 0.05 to obtain boundary coordinate information, as sketched below.
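A minimal sketch of the coordinate channels and boundary mask just described; normalizing coordinates to [-1, 1] is an assumption, while the 0.05 threshold is taken from the text.

```python
import torch

def add_coord_channels(feat: torch.Tensor) -> torch.Tensor:
    """Append x- and y-coordinate channels (normalized to [-1, 1])
    to a feature map of shape (N, C, H, W)."""
    n, _, h, w = feat.shape
    xs = torch.linspace(-1.0, 1.0, w, device=feat.device).view(1, 1, 1, w).expand(n, 1, h, w)
    ys = torch.linspace(-1.0, 1.0, h, device=feat.device).view(1, 1, h, 1).expand(n, 1, h, w)
    return torch.cat([feat, xs, ys], dim=1)

def boundary_mask(boundary_heatmap: torch.Tensor, thresh: float = 0.05):
    """Binary boundary mask from the first HG's boundary heat map."""
    return boundary_heatmap > thresh
```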
3) Face pose rotation: according to the source-pose face picture and the selected pose, keep the identity and expression information of the source-pose face picture and convert it to generate a visually realistic target-pose face picture.
The invention integrates the face rotation and recognition tasks into AFGAN, allowing the two tasks to share bottom-layer features and internal relations, reducing model capacity and improving operating efficiency and system performance. AFGAN combines the face-key-point attention mechanism, the generative adversarial network and the expression-recognition integrated-learning mechanism into one whole, with all parts complementing each other. The generator G synthesizes the target-pose face picture from the source-pose face picture and the selected pose; the two discriminators D_ii and D_ih judge picture authenticity and recognize the expression label.
Referring to FIG. 4, the network structure of AFGAN is based on CGAN and comprises 1 generator G, 2 discriminators D_ii and D_ih, and an identity feature extractor D_ip. G takes the source-pose face picture and the target-pose key-point heat map as input, uses the key-point heat map as pose condition information, and outputs the synthesized target-pose picture. D_ii takes the source-pose face picture and a real or synthesized target-pose face picture as input; D_ih takes a real or synthesized target-pose face picture and the target-pose key-point heat map; both output authenticity labels and expression labels. The structures of D_ii and D_ih are basically consistent. D_ip is a LightCNN model whose extracted identity feature vectors are used to maintain identity-information consistency before and after rotation and for the identity recognition task.
The network structure of the generator G is based on U-Net: the down-sampling layers stack "ReLU activation + convolution + batch normalization" modules; the up-sampling layers stack "ReLU activation + deconvolution + batch normalization" modules; the up- and down-sampling layers are symmetrically distributed, with concatenation between the output features of symmetric layers.
The network structures of the discriminators D_ii and D_ih are based on PatchGAN, basically composed of stacked "ReLU activation + convolution + batch normalization" modules. At the output end they split into two branches, picture realism and expression prediction: the expression prediction is a 7-dimensional feature vector, while the picture-realism prediction is not a 0/1 label but a feature matrix in which each feature value represents the realism of a 70×70 sub-picture.
The attention mechanism of AFGAN is built around the attention heat maps. Observation shows that the main characteristics of face identity and expression are expressed by the geometric shapes and action changes of the eyes and mouth. Increasing the weight of feature information in these key regions therefore helps targeted extraction of feature vectors and improves the effect of face rotation and recognition, as shown in fig. 5.
The generator G multiplies the attention heat map element-wise with the source-pose face picture to construct adaptive local pictures centered on the facial features, which form an input sub-channel of G. G processes the attention-weighted sub-channel, concatenates it with the main channel's feature matrix before the last up-sampling layer, and synthesizes the final picture. Thus G can attend to the facial-feature regions weighted by the attention heat map while ensuring the overall quality of the generated picture, obtaining finer detail in the key regions.
At the input end, the discriminators D_ii and D_ih multiply the face picture to be recognized element-wise with the attention heat maps for weighting, likewise splitting into three branches (a main channel, an eye sub-channel and a mouth sub-channel) so as to train convolution kernels specialized in analyzing different geometric shape characteristics. The three channels' features are then concatenated and merged, and the obtained common bottom-layer features serve the subsequent branch tasks. The expression output branch and the realism output branch each connect a linear fully-connected layer after several "ReLU activation + convolution + batch normalization" modules.
AFGAN aims to rotate a face picture of any pose to a specified pose while keeping identity and expression information consistent. Picture authenticity, identity-information consistency, expression-information consistency and expression recognition capability therefore form the model's multi-task learning objective. The model guides network training through 5 complementary loss functions: the conditional adversarial loss, total variation loss, identity-preserving loss, expression recognition loss and multi-scale pixel-value loss.
The conditional adversarial loss guides the migration of data from the source domain to the target domain and improves the realism of the synthesized picture; it suppresses over-smoothing and produces images with richer detail. The loss encourages the generator G to deceive the discriminators so that they assign the highest possible realism to the synthesized picture, while also strengthening the discriminators so that real pictures are judged highly authentic and synthesized pictures are judged low.
The total variation loss suppresses the jagged-distortion phenomenon caused by the conditional adversarial loss. Specifically, the sum of the generated picture's pixel-value changes in the vertical and horizontal directions is computed. This loss term makes the generated picture vary gradually overall and suppresses abrupt pixel-value changes.
The identity-preserving loss maintains identity-information consistency; for this AFGAN introduces the identity recognition network LightCNN, which has been trained to recognize the identities of a large number of faces and therefore has strong identity-feature extraction capability. AFGAN uses it as the identity extractor D_ip to compute the identity feature vectors of the source-pose picture and of the target-pose picture synthesized by the generator G, and requires them to be consistent. The loss is defined as the cosine distance between the identity feature vectors of the source and target pictures. The identity feature vectors extracted by D_ip largely eliminate the interference of non-identity information, making them an effective means of preserving identity information during face rotation.
As far as is currently known, the invention is the first to propose following a multi-task learning strategy that integrates expression recognition learning into the two discriminators D_ii, D_ih and the generator G. In this way, the bottom-layer features and latent associated information shared between the two tasks can be fully utilized. During expression recognition, the adaptive attention mechanism helps the discriminators focus on the key facial-feature regions and better extract the local information that matters for expression classification.
The expression recognition loss employs a cross-entropy function. In training, each of the two discriminators D_ii and D_ih outputs an expression prediction vector, and the model averages the two vectors to obtain the final prediction. In this way expression information from different poses is fused to make a robust prediction, as sketched below. The expression recognition loss is applied to the discriminators to improve their recognition ability, and to the generator G to force it to generate pictures with consistent expressions.
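A minimal sketch of this prediction fusion, under the assumption that each discriminator's expression branch outputs raw 7-dimensional logits.

```python
import torch
import torch.nn.functional as F

def expression_loss(logits_dii: torch.Tensor,
                    logits_dih: torch.Tensor,
                    labels: torch.Tensor) -> torch.Tensor:
    """Average the two discriminators' 7-dim expression predictions and
    apply cross entropy, fusing expression cues from different poses."""
    fused = 0.5 * (logits_dii + logits_dih)
    return F.cross_entropy(fused, labels)
```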
To further improve picture authenticity, AFGAN requires the generated picture to be as close as possible to the real picture: it builds a picture pyramid from the pictures generated by G and computes the pixel-level L1 distance between the synthesized and real pictures. Although this multi-scale pixel-value loss may cause some over-smoothing of the synthesized picture, it helps accelerate the convergence of training. Sketches of several of these losses follow.
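Minimal sketches of three of the generator losses described above (total variation, identity preservation, multi-scale pixel values), assuming LightCNN identity features are given as vectors and using average pooling with factors (1, 2, 4) to build the picture pyramid; the pyramid construction is an assumption, as the patent does not fix it.

```python
import torch
import torch.nn.functional as F

def total_variation_loss(img):
    """Sum of pixel-value changes in the vertical and horizontal directions."""
    dv = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    dh = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return dv + dh

def identity_preserving_loss(feat_src, feat_gen):
    """Cosine distance between identity vectors of the source picture
    and the synthesized target-pose picture."""
    return (1.0 - F.cosine_similarity(feat_src, feat_gen, dim=1)).mean()

def multiscale_pixel_loss(gen, real, scales=(1, 2, 4)):
    """L1 distance over a picture pyramid (downsampling factors assumed)."""
    loss = 0.0
    for s in scales:
        g = F.avg_pool2d(gen, s) if s > 1 else gen
        r = F.avg_pool2d(real, s) if s > 1 else real
        loss = loss + F.l1_loss(g, r)
    return loss
```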
With the assistance of the front-end face detection and face alignment methods, AFGAN can convert face pictures between arbitrary poses while keeping identity and expression information, and achieves robust expression recognition and classification by combining the source and synthesized images. Fig. 6 shows rotation and recognition examples of some pictures from the test process: the identity and expression information of the original faces are well maintained and the pose rotation is completed accurately.
The generator G in AFGAN implements face pose rotation and picture synthesis. Based on the face-key-point attention mechanism, the source-pose face picture is multiplied element-wise by the source-pose attention heat map to generate an attention sub-picture, formally defined as:

x = I_s + H_t,  x_1 = (I_s * H_s) + H_t

where I_s is the source-pose face picture, H_s is the source-pose attention heat map, H_t is the target-pose key-point heat map, * denotes element-wise multiplication, + denotes matrix concatenation, x is the main-channel input, and x_1 is the attention-sub-channel input.

The generator G processes the main-channel and attention-sub-channel inputs separately and concatenates and fuses the feature outputs of the two channels, formally defined as:

G_2(x, x_1) = Linear(G_1(x) + G_1(x_1))

where G_2(x, x_1) is the fused input feature to the last layer of G, G_1(x) is the penultimate-layer feature output of the main channel, G_1(x_1) is the penultimate-layer feature output of the attention sub-channel, and Linear() denotes a linear combination.

The fused final feature is processed by the last layer of the generator G to obtain the synthesized target-pose picture, formally defined as:

I_t = G_3(G_2(x, x_1))

where I_t is the synthesized target-pose face picture and G_3() is the output layer of the generator G. A sketch of this two-channel fusion follows.
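The formulas above translate into the following PyTorch sketch. The U-Net trunk G1, the fusion layer and the output layer G3 are reduced to placeholders, and rendering Linear() as a 1×1 convolution is an assumption; the patent gives only the layer roles, not their exact sizes.

```python
import torch
import torch.nn as nn

class AttentionGenerator(nn.Module):
    """Sketch of G's two-channel attention fusion: x = [I_s, H_t] and
    x1 = [I_s * H_s, H_t] share the trunk G1, are fused by a linear
    combination G2, and decoded by the output layer G3."""
    def __init__(self, trunk: nn.Module, feat_ch: int, out_ch: int = 3):
        super().__init__()
        self.g1 = trunk                               # shared trunk up to penultimate layer
        self.g2 = nn.Conv2d(2 * feat_ch, feat_ch, 1)  # Linear() as 1x1 conv (assumption)
        self.g3 = nn.Sequential(nn.ReLU(),
                                nn.Conv2d(feat_ch, out_ch, 3, padding=1),
                                nn.Tanh())            # output layer G3

    def forward(self, i_s, h_s, h_t):
        x = torch.cat([i_s, h_t], dim=1)              # main-channel input
        x1 = torch.cat([i_s * h_s, h_t], dim=1)       # attention-sub-channel input
        fused = self.g2(torch.cat([self.g1(x), self.g1(x1)], dim=1))
        return self.g3(fused)                         # synthesized I_t
```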
4) Facial expression and identity recognition: AFGAN continues to be used to recognize the facial expression (adopting the face-key-point attention mechanism during recognition), and the expression and identity of the face in the picture are judged by combining the source-pose and target-pose face pictures.
The network structure of AFGAN is based on CGAN and comprises 1 generator G, 2 discriminators D_ii and D_ih, and an identity feature extractor D_ip. The generator G takes the source-pose face picture and the target-pose key-point heat map as input, uses the key-point heat map as pose condition information, and outputs the synthesized target-pose picture. The discriminator D_ii takes the source-pose face picture and a real or synthesized target-pose face picture as input; the discriminator D_ih takes a real or synthesized target-pose face picture and the target-pose key-point heat map; both output an authenticity label and an expression label. The structures of the discriminators D_ii and D_ih are basically consistent. The identity feature extractor D_ip is a LightCNN model, and the identity feature vectors it extracts are used to maintain identity-information consistency before and after rotation and for the identity recognition task.

The loss functions used to train the discriminators D_ii and D_ih in AFGAN comprise the conditional adversarial loss and the expression recognition loss. The conditional adversarial loss ensures the discriminators can distinguish real from synthesized face pictures, and the expression recognition loss ensures the discriminators can recognize facial expressions.
The 2 discriminators D_ii and D_ih in AFGAN check the realism of the synthesized picture and recognize the facial expression. Taking D_ih as an example, based on the face-key-point attention mechanism the face picture to be recognized is multiplied element-wise by the corresponding attention heat maps to generate attention sub-pictures, formally defined as:

x = I_t + H_t,  x_1 = (I_s * H_s1) + H_t1,  x_2 = (I_s * H_s2) + H_t2

where I_t is the face picture to be recognized (a real or synthesized target-pose picture); H_s1 and H_s2 are the eye and mouth attention heat maps of the source pose respectively; H_t1 and H_t2 are the eye and mouth key-point heat maps of the target pose respectively; * denotes element-wise multiplication; + denotes matrix concatenation; and x, x_1, x_2 are the main-channel, eye-attention-sub-channel and mouth-attention-sub-channel inputs respectively.

The discriminator D_ih processes the main-channel input and the two attention-sub-channel inputs separately, then concatenates and fuses the feature outputs of the three channels, formally defined as:

D_2(x, x_1, x_2) = Linear(D_1(x) + D_1(x_1) + D_1(x_2))

where D_2(x, x_1, x_2) is the final output of the common feature-extraction stage of D_ih; D_1(x), D_1(x_1) and D_1(x_2) are the common feature outputs of the main channel, eye attention sub-channel and mouth attention sub-channel respectively; and Linear() denotes a linear combination.

The fused common features enter the two output branches of the discriminator D_ih respectively, yielding the realism feature matrix and the expression-prediction feature vector of the face picture to be recognized, formally defined as:

Exp = D_3(D_2(x, x_1, x_2)),  Gan = D_4(D_2(x, x_1, x_2))

where D_3() is the expression output branch, Exp is the expression-prediction feature vector, D_4() is the picture-realism output branch, and Gan is the picture-realism feature matrix.

The discriminator D_ii is processed the same way as D_ih, except that H_t is replaced by the source-pose face picture I_s; everything else stays consistent. A sketch of this three-channel fusion follows.
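A minimal sketch of D_ih's three-channel processing and its two output branches. The PatchGAN trunk D1 and the branch heads are placeholders, and rendering Linear() as a 1×1 convolution is again an assumption.

```python
import torch
import torch.nn as nn

class AttentionDiscriminator(nn.Module):
    """Sketch of D_ih: shared trunk D1 over the main, eye and mouth channels;
    fusion D2 as a linear combination; expression branch D3 (7-dim vector)
    and realism branch D4 (per-patch feature matrix)."""
    def __init__(self, trunk: nn.Module, feat_ch: int, n_expr: int = 7):
        super().__init__()
        self.d1 = trunk
        self.d2 = nn.Conv2d(3 * feat_ch, feat_ch, 1)  # Linear() as 1x1 conv (assumption)
        self.d3 = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                nn.Linear(feat_ch, n_expr))  # expression logits Exp
        self.d4 = nn.Conv2d(feat_ch, 1, 3, padding=1)        # realism map Gan

    def forward(self, x, x1, x2):
        common = self.d2(torch.cat([self.d1(x), self.d1(x1), self.d1(x2)], dim=1))
        return self.d3(common), self.d4(common)       # (Exp, Gan)
```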
In addition, this step adopts the "recognize after rotation" strategy: the face is recognized after rotation to the front, and the identity recognition network LightCNN directly recognizes the face identity by combining the source-pose face picture with the synthesized frontal face picture, improving the identity recognition accuracy in all poses. A sketch of this identity query follows.
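A minimal sketch of the identity query, under the assumption that the identity vectors of the source and synthesized frontal pictures are fused by averaging; the patent says the two pictures are combined but does not fix the fusion rule.

```python
import torch
import torch.nn.functional as F

def identify(feat_source, feat_frontal, gallery):
    """Fuse the identity vectors of the source-pose picture and the
    synthesized frontal picture, then return the gallery index with the
    highest cosine similarity. `gallery` is an (N, D) matrix of enrolled
    identity vectors; averaging as the fusion rule is an assumption."""
    probe = F.normalize(0.5 * (feat_source + feat_frontal), dim=-1)
    sims = F.normalize(gallery, dim=-1) @ probe  # cosine similarities, shape (N,)
    return int(torch.argmax(sims)), sims
```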
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited to the above embodiments, and any other changes, modifications, substitutions, combinations, and simplifications which do not depart from the spirit and principle of the present invention should be construed as equivalents thereof, and all such changes, modifications, substitutions, combinations, and simplifications are intended to be included in the scope of the present invention.

Claims (7)

1. An intelligent face recognition method, characterized by comprising the following steps:
1) face detection: cropping from an original picture a source-pose face picture whose main content is the face;
2) face alignment: identifying and locating the face key points in the source-pose face picture;
3) face pose rotation: according to the source-pose face picture and a selected pose, keeping the identity and expression information of the source-pose face picture and converting it to generate a visually realistic target-pose face picture;
4) facial expression and identity recognition: judging the expression and identity of the face in the picture by combining the source-pose and target-pose face pictures.
2. The intelligent face recognition method of claim 1, wherein: in step 1), a multitask convolutional neural network MTCNN is adopted to preprocess the face data, and local pictures whose main content is the face are cropped from original pictures in various practical application fields.
3. The intelligent face recognition method of claim 1, wherein: in step 2), a neural network AWing based on an adaptive loss function detects face pictures in various poses and determines the positions of the face key points, from which heat maps with Gaussian divergence around the key points are generated; the heat maps are of two kinds, key-point heat maps with a divergence radius of 3 and attention heat maps with a divergence radius of 25; the key-point heat maps provide pose guidance for the subsequent steps, and the attention heat maps are used for image detail enhancement in the subsequent steps.
4. The intelligent face recognition method of claim 1, wherein: in step 3), a generative adversarial network AFGAN with a fused attention mechanism is provided; according to the source pose and the selected pose, it keeps the identity and expression information of the face picture and converts it to generate a visually realistic target-pose face picture;
the network structure of AFGAN is based on CGAN and comprises 1 generator G, 2 discriminators D_ii and D_ih, and an identity feature extractor D_ip; the generator G takes the source-pose face picture and the target-pose key-point heat map as input, uses the key-point heat map as pose condition information, and outputs the synthesized target-pose picture; the discriminator D_ii takes the source-pose face picture and a real or synthesized target-pose face picture as input, the discriminator D_ih takes a real or synthesized target-pose face picture and the target-pose key-point heat map, and both output an authenticity label and an expression label; the structures of the discriminators D_ii and D_ih are basically consistent; the identity feature extractor D_ip is a LightCNN model, and the identity feature vectors it extracts are used to maintain identity-information consistency before and after rotation and for the identity recognition task;
the loss functions used to train the generator G in AFGAN comprise a conditional adversarial loss, a total variation loss, an identity-preserving loss, an expression recognition loss and a multi-scale pixel-value loss; the conditional adversarial loss ensures the realism of the synthesized picture, the total variation loss suppresses jagged distortion of the synthesized picture, the identity-preserving loss maintains identity-information consistency, the expression recognition loss maintains expression-information consistency, and the multi-scale pixel-value loss accelerates training convergence;
the generator G in AFGAN implements face pose rotation and picture synthesis; based on the face-key-point attention mechanism, the source-pose face picture is multiplied element-wise by the source-pose attention heat map to generate an attention sub-picture, formally defined as:
x = I_s + H_t,  x_1 = (I_s * H_s) + H_t
where I_s is the source-pose face picture, H_s is the source-pose attention heat map, H_t is the target-pose key-point heat map, * denotes element-wise multiplication, + denotes matrix concatenation, x is the main-channel input, and x_1 is the attention-sub-channel input;
the generator G processes the main-channel and attention-sub-channel inputs separately and concatenates and fuses the feature outputs of the two channels, formally defined as:
G_2(x, x_1) = Linear(G_1(x) + G_1(x_1))
where G_2(x, x_1) is the fused input feature to the last layer of G, G_1(x) is the penultimate-layer feature output of the main channel, G_1(x_1) is the penultimate-layer feature output of the attention sub-channel, and Linear() denotes a linear combination;
the fused final feature is processed by the last layer of the generator G to obtain the synthesized target-pose picture, formally defined as:
I_t = G_3(G_2(x, x_1))
where I_t is the synthesized target-pose face picture and G_3() is the output layer of the generator G.
5. The intelligent face recognition method of claim 4, wherein: during pose rotation, AFGAN uses the identity recognition network LightCNN as the identity feature extractor D_ip to extract identity feature vectors from the synthesized and the real target-pose pictures respectively, penalizes the error between them through the identity-preserving loss function, and maintains the consistency of face identity information; expression recognition learning is integrated in the subsequent discriminators, and the expression recognition loss function forces the pictures synthesized by the generator G to exhibit the same expression characteristics as real pictures, maintaining the before-and-after consistency of facial expression information.
6. The intelligent face recognition method of claim 1, wherein: in the step 4), a face expression is recognized by using a generation countermeasure network AFGAN of a fusion attention mechanism, and an attention mechanism based on key points of the face is adopted in the recognition process;
the network structure of the AFGAN is based on CGAN, and comprises 1 generator G and 2 discriminators Dii、DihAnd an identity feature extractor Dip(ii) a A generator G inputs a source posture face picture and a target posture key point heat map, takes the key point heat map as posture condition information, and outputs a synthesized target posture picture; discriminator DiiInputting the face picture of the source posture and the face picture of the real or synthesized target posture, and a discriminator DihInputting a real or synthesized target posture face picture and a target posture key point heat map, and outputting an authenticity label and an expression label; discriminator DiiAnd identity feature extractor DipIs consistent, and an identity feature extractor DipAnd the extracted identity characteristic vector is used for maintaining identity information consistency before and after rotation and an identity recognition task for the LightCNN model.
the loss functions used to train the discriminators D_ii and D_ih in the AFGAN comprise a conditional adversarial loss and an expression recognition loss; the conditional adversarial loss ensures the discriminators can distinguish real from synthesized face pictures, and the expression recognition loss ensures the discriminators can recognize facial expressions;
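A hedged sketch of such a combined discriminator objective follows, assuming a binary cross-entropy adversarial form (as in CGAN) and a weighting factor lambda_exp that the claim does not specify:

```python
import torch
import torch.nn.functional as F

def discriminator_loss(gan_real, gan_fake, exp_logits, exp_labels, lambda_exp=1.0):
    # Conditional adversarial loss: push real pairs toward 1, synthesized toward 0
    adv = (F.binary_cross_entropy_with_logits(gan_real, torch.ones_like(gan_real))
           + F.binary_cross_entropy_with_logits(gan_fake, torch.zeros_like(gan_fake)))
    # Expression recognition loss: cross-entropy on the expression branch outputs
    exp = F.cross_entropy(exp_logits, exp_labels)
    return adv + lambda_exp * exp
```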
the two discriminators D_ii and D_ih in the AFGAN check the authenticity of the synthesized picture and recognize the facial expression; based on the facial key-point attention mechanism, the face picture to be recognized is multiplied by the corresponding attention heat maps to generate attention sub-images, formally defined as follows:
x = I_t + H_t,  x_1 = (I_s * H_s1) + H_t1,  x_2 = (I_s * H_s2) + H_t2
where I_t is the face picture to be recognized, i.e. the real or synthesized target-pose picture; H_s1 and H_s2 are the eye and mouth attention heat maps of the source pose, respectively; H_t1 and H_t2 are the eye and mouth key-point heat maps of the target pose, respectively; * denotes element-wise multiplication, + denotes matrix concatenation; and x, x_1 and x_2 are the main-channel, eye-attention-sub-channel and mouth-attention-sub-channel inputs, respectively;
the discriminator D_ih processes the main-channel input and the two attention-sub-channel inputs separately, then concatenates and fuses the feature outputs of the three channels, formally defined as follows:
D_2(x, x_1, x_2) = Linear(D_1(x) + D_1(x_1) + D_1(x_2))
where D_2(x, x_1, x_2) is the final output of the common feature extraction stage of the discriminator D_ih; D_1(x), D_1(x_1) and D_1(x_2) are the common feature outputs of the main channel, the eye attention sub-channel and the mouth attention sub-channel, respectively; and Linear() denotes a linear combination;
the fused common features then enter the two output branches of the discriminator D_ih to obtain an authenticity feature matrix and an expression prediction feature vector for the face picture to be recognized, formally defined as follows:
Exp = D_3(D_2(x, x_1, x_2)),  Gan = D_4(D_2(x, x_1, x_2))
where D_3() is the expression output branch, Exp is the expression prediction feature vector, D_4() is the picture authenticity output branch, and Gan is the picture authenticity feature matrix;
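The three-channel discriminator D_ih described by these formulas could look like the sketch below, where D_1 is an assumed shared convolutional trunk, Linear() is a 1x1 convolution, the authenticity branch D_4 emits a PatchGAN-style feature matrix, and the expression class count of 7 is an assumption not fixed by the claim:

```python
import torch
import torch.nn as nn

class ThreeChannelDiscriminator(nn.Module):
    def __init__(self, in_ch=4, feat=64, n_expr=7):
        super().__init__()
        # D_1: common feature extractor, assumed shared by the three channels
        self.d1 = nn.Sequential(
            nn.Conv2d(in_ch, feat, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(feat, feat, 4, stride=2, padding=1), nn.LeakyReLU(0.2))
        # Linear(): linear combination of the concatenated channel features
        self.fuse = nn.Conv2d(3 * feat, feat, 1)
        # D_4: authenticity branch producing a feature matrix (Gan)
        self.d4 = nn.Conv2d(feat, 1, 3, padding=1)
        # D_3: expression branch producing a prediction vector (Exp)
        self.d3 = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(feat, n_expr))

    def forward(self, x, x1, x2):
        d2 = self.fuse(torch.cat([self.d1(x), self.d1(x1), self.d1(x2)], dim=1))
        return self.d3(d2), self.d4(d2)   # Exp, Gan
```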
the processing of the discriminator D_ii is identical to that of D_ih, except that H_t is replaced by the source-pose face picture I_s.
7. The intelligent face recognition method of claim 1, wherein: a "rotate-then-recognize" strategy is adopted, i.e. recognition is performed after the face has been rotated to the frontal pose; the source-pose face picture and the synthesized frontal face picture are combined, and the identity recognition network LightCNN directly recognizes the face identity.
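A sketch of how the rotate-then-recognize strategy might be applied at inference time; "frontalize" stands in for generator G together with its heat-map inputs, feature averaging is an assumed combination rule, and gallery matching by cosine similarity is an assumed recognition step, none of which the claim prescribes:

```python
import torch
import torch.nn.functional as F

def recognize(frontalize, lightcnn, I_s, gallery_feats):
    """frontalize: callable synthesizing a frontal picture from the source-pose
    picture I_s; lightcnn: identity feature extractor; gallery_feats: (N, d)
    L2-normalized identity features of enrolled faces (all assumed helpers)."""
    with torch.no_grad():
        I_front = frontalize(I_s)                          # synthesized frontal face
        # Combine source-pose and synthesized frontal identity features
        f = F.normalize(lightcnn(I_s) + lightcnn(I_front), dim=-1)
        sims = f @ gallery_feats.t()                       # cosine similarities
    return sims.argmax(dim=-1)                             # predicted identity index
```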