CN111523406A - Deflected face correction method based on an improved generative adversarial network structure - Google Patents

Deflected face correction method based on an improved generative adversarial network structure

Info

Publication number
CN111523406A
CN111523406A (application CN202010269281.6A)
Authority
CN
China
Prior art keywords
face
pred
picture
classifier
loss
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010269281.6A
Other languages
Chinese (zh)
Other versions
CN111523406B (en)
Inventor
达飞鹏
胡惠雅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202010269281.6A priority Critical patent/CN111523406B/en
Publication of CN111523406A publication Critical patent/CN111523406A/en
Application granted granted Critical
Publication of CN111523406B publication Critical patent/CN111523406B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T90/00Enabling technologies or technologies with a potential or indirect contribution to GHG emissions mitigation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a deflected face correction method based on an improved generative adversarial network structure. The processing steps are: (1) detect the facial feature points and extract fixed-size face region blocks (eyes, nose and mouth); (2) input the face region blocks and the whole face from step 1 into a local channel and a whole channel respectively, obtaining the corrected results of both channels; (3) fuse the outputs of the local channel and the whole channel, setting each pixel in an overlapping region to the maximum value over that region, to obtain the final generated face; (4) input the generated face and the frontal face into a discriminator and a classifier to ensure the accuracy and identity consistency of the generated face; (5) save the trained network model for testing. The BEGAN network adopted by the invention has a simple and efficient structure, and improves the accuracy and speed of deflected face correction to a certain extent.

Description

Deflected face correction method based on an improved generative adversarial network structure
Technical Field
The invention relates to a deflected face correction method based on an improved generative adversarial network structure, and belongs to the field of computer vision.
Background
With the continuous development of deep learning, research on face recognition has made many breakthroughs. Recognition algorithms based on deep learning even exceed human-level performance; however, most of this research assumes a frontal or near-frontal face and is therefore limited. There is evidence that even the best-performing frontal face recognition methods suffer a greatly reduced recognition rate under large angular deflection. Existing methods for face recognition under pose variation can be roughly divided into three categories: pose-robust feature extraction methods, frontal face generation methods, and subspace analysis methods.
For the first category, traditional feature extraction mainly relies on robust local descriptors such as Gabor, SIFT and LBP features, while more recent methods extract features with deep learning, for example with LightCNN or FaceNet structures; neither kind of feature extraction, however, handles large pose deflection effectively. For the third category, a linear subspace can hardly express the nonlinearity of the face pose variation process, and learning a nonlinear subspace is usually accompanied by complex training. The invention therefore focuses mainly on the second category, the frontal face generation approach. Early work generated faces by building a three-dimensional face model, which places high demands on the feature points; in particular, when the deflection angle is somewhat large, some feature points in the two-dimensional face picture are invisible, so this approach has certain limitations.
Compared with generating the face from a three-dimensional face model, performing face conversion with a generative adversarial network has become a major trend and has achieved exciting performance. Current methods that use a generative adversarial network for face frontalization can be divided into two-dimensional and three-dimensional methods. The two-dimensional methods include TP-GAN and PIM; the three-dimensional methods mainly apply a 3D morphable face model (3DMM) to the generative adversarial network, obtaining shape and texture parameters from the model to provide a prior for accurately recovering the face structure. Because the face is a complex three-dimensional structure and a two-dimensional method lacks such constraints, two-dimensional methods usually adopt a two-path network that generates the face contour and the face details separately, and establish a series of supervisory loss functions to constrain the face structure, such as a constraint that keeps the generated face symmetric.
In addition, whether the method is two-dimensional or three-dimensional, a feature extraction module is often added to maintain identity consistency before and after correction; among these, the face feature extraction structure Light-CNN performs well in both time and space complexity and is therefore widely used.
Disclosure of Invention
Purpose of the invention: in TP-GAN, a two-dimensional face correction method, the DCGAN structure adopted as the generative adversarial network is difficult to train and prone to mode collapse, and collecting multi-scale images of the face makes the training process relatively cumbersome. To address these problems, the invention provides a deflected face correction method based on an improved generative adversarial network structure.
On the basis of the traditional two-path, two-player generative adversarial network structure, the method introduces a third adversarial structure, namely a classifier, to maintain identity consistency of the face before and after generation. Practice shows that the method better preserves identity consistency before and after face correction and yields a higher-quality generated frontal face, while also greatly reducing the difficulty of network training and improving training efficiency.
The technical scheme is as follows: in order to realize the purpose of the invention, the technical scheme adopted by the invention is: a method for correcting a deflected face based on an improved generative adversarial network structure, comprising the following steps:
Step 1, for the input deflected face I_in, detect the facial feature points and extract fixed-size face region blocks, namely the eyes, nose and mouth;
Step 2, input the face region blocks and the whole face obtained in step 1 into a local channel and a whole channel respectively, obtaining the corrected results of the local channel and the whole channel;
Step 3, fuse the corrected face regions output by the local channel with the whole face contour output by the whole channel, setting each pixel in an overlapping region to the maximum value over that region, to obtain the final generated face I_pred;
Step 4, input the generated face I_pred and the corresponding frontal face I_gt into the discriminator in sequence to ensure the accuracy of face generation, where I_gt is the frontal face corresponding to the input deflected face;
Step 5, input the generated face I_pred and the corresponding frontal face I_gt into the classifier together to ensure identity consistency between the generated face and the input deflected face;
Step 6, save the generative adversarial network model obtained by training and use it to correct deflected faces during testing.
As a further technical solution of the present invention, step 1 specifically comprises:
Step 1.1, uniformly normalize the face size to 128 × 128 and build a caffe deep learning environment;
Step 1.2, in the caffe deep learning environment, detect the feature points of the face and locate five feature points: the center points of the two eyes, the nose tip, and the two mouth-corner points;
Step 1.3, obtain each region block in the face, namely the mouth, the nose and the two eyes, from the five feature points obtained in step 1.2; to ensure that the network can be trained smoothly, the region block sizes are kept constant across different faces, with the eye regions set to 40 × 40 and the nose and mouth regions set to 32 × 40 and 32 × 48, respectively.
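As an illustration of step 1.3, the following Python sketch crops fixed-size region blocks around the detected landmarks. It is not the patent's caffe implementation; the function and dictionary names are hypothetical, and landmark coordinates are assumed to be (x, y) pixel positions on the 128 × 128 face.

```python
import numpy as np

# Illustrative sketch: crop fixed-size region blocks around the five landmarks.
# Sizes follow the text: eyes 40x40, nose 32x40, mouth 32x48 (width x height).
PATCH_SIZES = {"eyel": (40, 40), "eyer": (40, 40),
               "nose": (32, 40), "mouth": (32, 48)}  # (width, height)

def crop_patch(img, center_xy, size_wh):
    """Crop a fixed-size patch centered on a landmark from a 128x128 face."""
    w, h = size_wh
    cx, cy = int(round(center_xy[0])), int(round(center_xy[1]))
    # Clamp the top-left corner so the patch stays inside the image.
    x0 = min(max(cx - w // 2, 0), img.shape[1] - w)
    y0 = min(max(cy - h // 2, 0), img.shape[0] - h)
    return img[y0:y0 + h, x0:x0 + w]

def extract_region_blocks(img, landmarks):
    """landmarks: dict with 'eyel', 'eyer', 'nose', 'mouth_l', 'mouth_r' -> (x, y)."""
    mouth_center = ((landmarks["mouth_l"][0] + landmarks["mouth_r"][0]) / 2,
                    (landmarks["mouth_l"][1] + landmarks["mouth_r"][1]) / 2)
    centers = {"eyel": landmarks["eyel"], "eyer": landmarks["eyer"],
               "nose": landmarks["nose"], "mouth": mouth_center}
    return {k: crop_patch(img, centers[k], PATCH_SIZES[k]) for k in centers}
```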
As a further technical solution of the present invention, step 2 specifically is:
Step 2.1, extract the eye, nose and mouth region blocks from the frontal face I_gt in the same way, obtaining Eyel_gt, Eyer_gt, Nose_gt and Mouth_gt respectively;
Step 2.2: for both the local channel and the whole channel, design the generator as a U-Net structure so that the picture size is kept unchanged before and after generation;
Step 2.3: for the local channel, input the face region blocks obtained in step 1 into the local channel generator to obtain the corrected face region blocks, namely Eyel_pred, Eyer_pred, Nose_pred and Mouth_pred; the loss function of the local channel is designed as an L1 loss and consists of the following parts:
eyel_loss = (1/(W_eyel·H_eyel)) Σ_{x=1}^{W_eyel} Σ_{y=1}^{H_eyel} |Eyel_pred(x,y) − Eyel_gt(x,y)|
eyer_loss = (1/(W_eyer·H_eyer)) Σ_{x=1}^{W_eyer} Σ_{y=1}^{H_eyer} |Eyer_pred(x,y) − Eyer_gt(x,y)|
nose_loss = (1/(W_nose·H_nose)) Σ_{x=1}^{W_nose} Σ_{y=1}^{H_nose} |Nose_pred(x,y) − Nose_gt(x,y)|
mouth_loss = (1/(W_mou·H_mou)) Σ_{x=1}^{W_mou} Σ_{y=1}^{H_mou} |Mouth_pred(x,y) − Mouth_gt(x,y)|
wherein W_eyel, W_eyer, W_nose and W_mou are the widths of the left-eye, right-eye, nose and mouth regions, equal to 40, 40, 32 and 32 respectively, and H_eyel, H_eyer, H_nose and H_mou are the corresponding heights, equal to 40, 40, 40 and 48 respectively; Eyel_pred(x,y), Eyer_pred(x,y), Nose_pred(x,y) and Mouth_pred(x,y) are the gray values of the corrected left eye, right eye, nose and mouth at coordinates (x, y), and Eyel_gt(x,y), Eyer_gt(x,y), Nose_gt(x,y) and Mouth_gt(x,y) are the gray values of the corresponding region blocks of the frontal face at coordinates (x, y).
The loss function of the local channel is the sum of the losses of the region blocks; setting λ2 as the weight of the local channel loss, the loss function local_loss of the local channel can be expressed as:
local_loss = λ2·(eyel_loss + eyer_loss + nose_loss + mouth_loss)
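A minimal Python sketch of this local-channel loss is given below; the default λ2 value and the region keys are illustrative placeholders, since the text does not fix them here.

```python
import numpy as np

def region_l1_loss(pred, gt):
    """Mean absolute gray-value difference over one region block."""
    return np.mean(np.abs(pred.astype(np.float64) - gt.astype(np.float64)))

def local_channel_loss(pred_blocks, gt_blocks, lambda2=1.0):
    """local_loss = lambda2 * (eyel_loss + eyer_loss + nose_loss + mouth_loss)."""
    return lambda2 * sum(region_l1_loss(pred_blocks[k], gt_blocks[k])
                         for k in ("eyel", "eyer", "nose", "mouth"))
```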
Step 2.4: for the whole channel, likewise, input the deflected face I_in into the encoder structure with 3 × 3 convolution kernels to obtain a 128 × 128 frontal face contour of the same size as the original face.
As a further technical solution of the present invention, step 3 specifically is:
Step 3.1: based on the corrected results of the region blocks obtained in step 2, zero-pad each region block to a picture of the same size as the original face, i.e. one face-sized picture each for the left eye, right eye, nose and mouth;
Step 3.2: obtain the face picture corresponding to the local channel from the four equally sized face pictures obtained in step 3.1, as shown in the sketch below. The specific method is: compare the gray values of the pixels with the same coordinates in the four face pictures, select the maximum value and take it as the gray value at that coordinate; repeating this pixel by pixel finally yields the face picture corresponding to the local channel;
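The pixel-wise maximum fusion of step 3.2, and the later maximum fusion with the whole-channel output, can be sketched as follows; this is an illustrative NumPy version, not code from the patent.

```python
import numpy as np

def fuse_local_channel(eyel_img, eyer_img, nose_img, mouth_img):
    """Pixel-wise maximum over the four zero-padded 128x128 region pictures."""
    return np.maximum.reduce([eyel_img, eyer_img, nose_img, mouth_img])

def fuse_with_global(local_img, global_img):
    """Overlay local details on the whole-face contour: overlapping pixels
    take the maximum value, as described for the fusion step."""
    return np.maximum(local_img, global_img)
```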
Step 3.3: fuse the local-detail face picture obtained in the previous step with the frontal face contour obtained from the whole channel in step 2.4, and obtain the final generated picture I_pred through several applied loss functions.
The loss function applied in step 3.3 is specifically:
1) Pixel-wise L1 loss:
L_pixel = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_gt(x,y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_pred(x,y) and I_gt(x,y) denote the gray values of the generated face picture I_pred and of the corresponding frontal face at coordinates (x, y);
2) Generative adversarial loss:
L_G = λ0·L_G-D + λ1·L_G-C
Since in the invention the generator is trained adversarially against both the discriminator and the classifier, L_G-D and L_G-C in the above formula are the loss functions established when the generator competes with the discriminator and with the classifier respectively; the adversarial loss of the generator is designed as the weighted sum of these two parts, addressing the quality of the generated picture and the preservation of identity consistency, with weights λ0 and λ1 respectively. The two parts are given by the following formulas, wherein I_pred' denotes the output of the discriminator D for the generated face picture I_pred, W and H are the width and height of the face picture (both 128), I_pred(x,y) and I_pred'(x,y) are the gray values of I_pred and I_pred' at coordinates (x, y), and C(·) denotes the features extracted by the classifier:
L_G-D = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred'(x,y)|
L_G-C = 1 − (C(I_pred)·C(I_gt)) / (‖C(I_pred)‖·‖C(I_gt)‖)
3) face symmetry loss:
L_sym = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred(W−(x−1), y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_pred(x,y) and I_pred(W−(x−1), y) denote the gray values of the generated face picture I_pred at coordinates (x, y) and at the horizontally mirrored coordinates (W−(x−1), y);
4) A regularization loss, which is applied to suppress the noise present in the generated face picture.
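The pixel-level terms above can be sketched as follows; the combination weights are illustrative placeholders (the text only names λ0, λ1 and λ2 for specific terms), and the adversarial and regularization terms are passed in from outside.

```python
import numpy as np

def pixel_l1_loss(i_pred, i_gt):
    """Pixel-wise L1 loss between generated face and frontal face (W = H = 128)."""
    return np.mean(np.abs(i_pred - i_gt))

def symmetry_loss(i_pred):
    """L1 difference between the generated face and its horizontal mirror."""
    return np.mean(np.abs(i_pred - i_pred[:, ::-1]))

def total_generator_loss(i_pred, i_gt, adv_loss, reg_loss,
                         w_pixel=1.0, w_sym=1.0, w_adv=1.0, w_reg=1.0):
    """Weighted combination of the four losses listed above; the weights are
    illustrative placeholders, not values given in the patent."""
    return (w_pixel * pixel_l1_loss(i_pred, i_gt)
            + w_sym * symmetry_loss(i_pred)
            + w_adv * adv_loss
            + w_reg * reg_loss)
```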
As a further technical solution of the present invention, step 4 specifically is:
Step 4.1: input the face picture I_pred obtained in step 3.3 and the frontal face I_gt into the discriminator in sequence; the discriminator is structured as an auto-encoder, and its output is a face picture;
Step 4.2: denote the face pictures output by the discriminator for the generated face I_pred and the frontal face I_gt as I_pred' and I_gt' respectively, and compute the pixel-wise errors separately, i.e. compute and sum the L1-norm error over each pixel of the picture, as expressed below:
L(I_gt) = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_gt(x,y) − I_gt'(x,y)|
L(I_pred) = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred'(x,y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_gt(x,y), I_pred(x,y), I_gt'(x,y) and I_pred'(x,y) denote the gray values of I_gt, I_pred, I_gt' and I_pred' at coordinates (x, y);
Step 4.3: the loss function of the discriminator D is designed as:
L_D = L(I_gt) − k_t·L(I_pred)
wherein k_t manually controls the degree of emphasis placed on the discriminator; the larger k_t is, the stronger the discriminating ability the discriminator is set to have. During training, the discriminator minimizes L_D, which requires maximizing the pixel-wise reconstruction error of the generated face while minimizing the pixel-wise reconstruction error of the real face, thereby achieving the goal of distinguishing the two;
step 4.4: for the game process between the discriminator and the generator, the loss function of the generator is designed as follows:
L_G-D = L(I_pred)
Contrary to the discriminator, the generator aims to make the generated face as close as possible to the real face, so the generator's loss minimizes the pixel-wise reconstruction error of the generated face I_pred, which ensures the quality of the generated face;
Step 4.5: with the loss functions of step 4.3 and step 4.4, the discriminator and the generator are trained adversarially against each other; their expressive power grows ever stronger and the generated training data become more and more realistic.
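A minimal NumPy sketch of the reconstruction-error-based losses of steps 4.2–4.4 is given below; the variable names are illustrative.

```python
import numpy as np

def reconstruction_error(img, img_rec):
    """L(I) = mean pixel-wise L1 error between an image and its
    reconstruction by the auto-encoder discriminator."""
    return np.mean(np.abs(img - img_rec))

def discriminator_loss(i_gt, i_gt_rec, i_pred, i_pred_rec, k_t):
    """L_D = L(I_gt) - k_t * L(I_pred)."""
    return (reconstruction_error(i_gt, i_gt_rec)
            - k_t * reconstruction_error(i_pred, i_pred_rec))

def generator_adv_loss_d(i_pred, i_pred_rec):
    """L_G-D = L(I_pred): the generator minimizes the reconstruction error
    of the generated face under the discriminator."""
    return reconstruction_error(i_pred, i_pred_rec)
```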
As a further technical solution of the present invention, step 5 specifically is:
Step 5.1: input the face picture I_pred obtained in step 3.3 and the corresponding frontal face I_gt into the classifier together (assuming the training set contains N faces in total, there are thus 2N pictures); the classifier C is a Light-CNN model trained in advance and acts as an independent structure that competes with the generator;
Step 5.2: the features extracted by the classifier from the frontal face I_gt and from the generated face I_pred are denoted C(I_gt) and C(I_pred) respectively. The result labels of the classifier are defined as 1 to 2N; the classification target of the classifier is to assign the frontal face pictures I_gt to the first N labels and the generated pictures I_pred to the last N labels. The corresponding cross-entropy loss functions are:
L_C-gt = −Σ_{j=1}^{2N} y_gt^j·log({C(I_gt)}_j)
L_C-pred = −Σ_{j=1}^{2N} y_pred^j·log({C(I_pred)}_j)
wherein j ∈ {1, ..., 2N}; y_gt^j and y_pred^j are the classifier labels corresponding to the frontal face picture and the generated face picture: y_gt^j denotes the j-th component of the correct label of the frontal face picture I_gt and {C(I_gt)}_j denotes the classifier's classification output for the frontal face picture on label j; y_pred^j denotes the j-th component of the correct label of the generated face picture I_pred and {C(I_pred)}_j is the classifier's output on label j after classification;
Step 5.3: the classifier loss function is designed as a weighted sum of the two cross-entropy losses, with α as the weight factor, because the generated face is relatively close to its corresponding frontal face picture in the feature space. The loss function of the classifier is formulated as:
L_C = L_C-gt + α·L_C-pred
Step 5.4: for the game between the generator and the classifier, the corresponding loss function of the generator is designed as:
L_G-C = 1 − (C(I_pred)·C(I_gt)) / (‖C(I_pred)‖·‖C(I_gt)‖)
that is, the cosine distance between the features of the generated face picture and of the corresponding frontal face picture is minimized, so as to maintain identity consistency between the generated face picture and the corresponding frontal face picture;
Step 5.5: weighting the adversarial losses of the generator against the discriminator (step 4.4) and against the classifier (step 5.4) gives the generative adversarial loss of the generator in step 3.4, whose formula is rewritten as:
L_G = λ0·L_G-D + λ1·L_G-C
Step 5.6: the classifier and the generator each have their corresponding adversarial loss functions; as they continuously compete and generate, their expressive power grows ever stronger and the generated training data come closer to the identity information of the original face.
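The classifier and identity-consistency losses of steps 5.2–5.4 can be sketched as follows; the default α value is an illustrative placeholder, since the patent only states that it is a weight factor.

```python
import numpy as np

def cross_entropy(one_hot_label, probs, eps=1e-12):
    """-sum_j y_j * log(p_j) over the 2N classifier labels."""
    return -np.sum(one_hot_label * np.log(probs + eps))

def classifier_loss(y_gt, p_gt, y_pred, p_pred, alpha=0.5):
    """L_C = L_C-gt + alpha * L_C-pred (alpha is a placeholder value)."""
    return cross_entropy(y_gt, p_gt) + alpha * cross_entropy(y_pred, p_pred)

def generator_identity_loss(feat_pred, feat_gt):
    """L_G-C = 1 - cosine similarity between classifier features of the
    generated face and the corresponding frontal face."""
    cos = np.dot(feat_pred, feat_gt) / (np.linalg.norm(feat_pred) * np.linalg.norm(feat_gt))
    return 1.0 - cos
```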
As a further technical solution of the present invention, step 6 specifically is:
step 6.1: storing the network model parameters during training;
step 6.2: extracting the characteristic points of the test set deflection human face picture, wherein the extraction method is the same as that of the training set and is completed in a caffe deep learning environment;
Step 6.3: extract the face blocks of the deflected face regions (eyes, nose and mouth) from the feature points obtained in step 6.2, with sizes consistent with those in the training set, i.e. 40 × 40 for the eyes, 32 × 40 for the nose and 32 × 48 for the mouth;
Step 6.4: use the network model obtained in training to correct the face region blocks of the test set; similarly, the corrected whole-face contour of the deflected face can also be obtained;
Step 6.5: fuse the two outputs of the local channel and the whole channel according to the network model parameters from training, keeping the loss function of the generator consistent with that in step 3.4, thereby obtaining the generated frontal face picture corresponding to the deflected face of the test set.
Beneficial effects: compared with the prior art, the technical scheme of the invention has the following beneficial technical effects:
1) First, unlike other single-channel network structures, the invention adopts a two-channel network structure that recovers the face from both the local details and the whole face. Compared with feeding the whole face into the network as a single input, which loses detail, the two-channel structure better preserves facial detail information and therefore generates a more realistic frontal face picture.
2) Second, a dedicated structure is used to keep the facial identity features consistent before and after correction, and this structure participates in network training as a third component of the generative adversarial network; experimental results show that this accelerates network convergence and preserves facial identity information.
3) Finally, the discriminator is designed as an auto-encoder whose output is still a face picture, and the Wasserstein distance between the pixel-wise error distributions of the generated face and the original face is minimized; the loss function of the discriminator is an energy-based loss with a continuous value, unlike a traditional generative adversarial network that defines the discriminator output as a discrete value, so high-resolution face pictures can be generated better.
Drawings
FIG. 1 is a flow chart of the overall process of the present invention.
Fig. 2 shows distribution positions and serial numbers of 68 contour feature points extracted in a caffe environment.
Fig. 3 is a schematic diagram of the structure of the generator.
Fig. 4 is a schematic diagram of the structure of the discriminator.
Fig. 5 is a schematic diagram of a classifier structure.
Fig. 6 is a schematic diagram of the correction effect of the present invention on a Multi-PIE data set.
Fig. 7 is a schematic diagram of the frontalization effect on the LFW test set, where (a) is the original face picture in the LFW test set, (b) is the frontalization result obtained by the invention, and (c) and (d) are the corresponding result pictures from other studies.
Detailed Description
The technical solution of the present invention will be further described in detail with reference to the following examples and accompanying drawings.
Under a Linux operating system, Spyder is selected as the programming tool and the generative adversarial network model is built. This example is trained with pictures of 337 individuals from the Multi-PIE face database in 13 different poses under the same lighting conditions, and tested on the LFW deflected-face data set.
Fig. 1 is a schematic diagram of the network structure of the present invention, and the specific steps are as follows:
step 1: the method comprises the following steps of detecting characteristic points of a human face, and extracting human face region blocks (eyes, a nose and a mouth) with fixed sizes, wherein the specific steps are as follows:
step 1.1, uniformly normalizing the size of the face to be 128 multiplied by 128, and constructing a caffe deep learning environment;
Step 1.2, in the caffe deep learning environment, detect the feature points of the face with the keypoint extraction method proposed in "Combining Data-driven and Model-driven Methods for Robust Facial Landmark Detection" (IEEE Transactions on Information Forensics & Security, 2018), and locate five feature points: the center points of the two eyes, the nose tip, and the two mouth-corner points;
Step 1.3, obtain each region block in the face (mouth, nose and two eyes) from the five feature points obtained in step 1.2; to ensure that the network can be trained smoothly, the region block sizes are kept constant across different faces, with the eye regions set to 40 × 40 and the nose and mouth regions set to 32 × 40 and 32 × 48, respectively.
Fig. 2 shows distribution positions and serial numbers of 68 contour feature points extracted in a caffe environment.
Step 2: respectively inputting the human face region blocks and the whole human face obtained in the step 1 into a local channel and a whole channel to obtain a corrected result of the local channel and the whole channel, and specifically comprising the following steps:
Step 2.1, extract the eye, nose and mouth region blocks from the frontal face I_gt in the same way, obtaining Eyel_gt, Eyer_gt, Nose_gt and Mouth_gt respectively;
Step 2.2: for both the local channel and the whole channel, design the generator as a U-Net structure so that the picture size is kept unchanged before and after generation;
Step 2.3: for the local channel, input the face region blocks obtained in step 1 into the local channel generator to obtain the corrected face region blocks, namely Eyel_pred, Eyer_pred, Nose_pred and Mouth_pred.
The loss function of the local channel is designed as an L1 loss function and consists of the following parts:
eyel_loss = (1/(W_eyel·H_eyel)) Σ_{x=1}^{W_eyel} Σ_{y=1}^{H_eyel} |Eyel_pred(x,y) − Eyel_gt(x,y)|
eyer_loss = (1/(W_eyer·H_eyer)) Σ_{x=1}^{W_eyer} Σ_{y=1}^{H_eyer} |Eyer_pred(x,y) − Eyer_gt(x,y)|
nose_loss = (1/(W_nose·H_nose)) Σ_{x=1}^{W_nose} Σ_{y=1}^{H_nose} |Nose_pred(x,y) − Nose_gt(x,y)|
mouth_loss = (1/(W_mou·H_mou)) Σ_{x=1}^{W_mou} Σ_{y=1}^{H_mou} |Mouth_pred(x,y) − Mouth_gt(x,y)|
wherein W_eyel, W_eyer, W_nose and W_mou are the widths of the left-eye, right-eye, nose and mouth regions, equal to 40, 40, 32 and 32 respectively, and H_eyel, H_eyer, H_nose and H_mou are the corresponding heights, equal to 40, 40, 40 and 48 respectively; Eyel_pred(x,y), Eyer_pred(x,y), Nose_pred(x,y) and Mouth_pred(x,y) are the gray values of the corrected left eye, right eye, nose and mouth at coordinates (x, y), and Eyel_gt(x,y), Eyer_gt(x,y), Nose_gt(x,y) and Mouth_gt(x,y) are the gray values of the corresponding region blocks of the frontal face at coordinates (x, y).
The loss function of the local channel is the sum of the losses of the region blocks; setting λ2 as the weight of the local channel loss, the loss function local_loss of the local channel can be expressed as:
local_loss = λ2·(eyel_loss + eyer_loss + nose_loss + mouth_loss)
Step 2.4: for the whole channel, likewise, input the deflected face I_in into the encoder structure with 3 × 3 convolution kernels to obtain a 128 × 128 frontal face contour of the same size as the original face.
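For illustration, a minimal U-Net-style generator with 3 × 3 convolutions that preserves the 128 × 128 input size is sketched below. The patent implements its networks in a caffe environment, so this PyTorch version is only a structural sketch; the channel counts and layer depth are arbitrary assumptions.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy U-Net-style generator: one downsampling step, one upsampling step,
    with a skip connection, so the output has the same size as the input."""
    def __init__(self, ch=32):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(1, ch, 3, padding=1), nn.ReLU(inplace=True))
        self.down = nn.Sequential(nn.Conv2d(ch, ch * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True))
        self.up = nn.Sequential(nn.ConvTranspose2d(ch * 2, ch, 4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.dec1 = nn.Conv2d(ch * 2, 1, 3, padding=1)  # skip connection doubles the channels

    def forward(self, x):               # x: (B, 1, 128, 128)
        e1 = self.enc1(x)               # (B, ch, 128, 128)
        d = self.up(self.down(e1))      # (B, ch, 128, 128)
        return self.dec1(torch.cat([e1, d], dim=1))  # same size as the input
```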
Step 3: fuse the corrected face regions output by the local channel with the whole face contour output by the whole channel, setting each pixel in an overlapping region to the maximum value over that region, to obtain the final generated face I_pred; the specific steps are as follows:
Step 3.1: based on the corrected results of the region blocks obtained in step 2, zero-pad each region block to a picture of the same size as the original face, i.e. one face-sized picture each for the left eye, right eye, nose and mouth;
Step 3.2: obtain the face picture corresponding to the local channel from the four equally sized face pictures obtained in step 3.1. The specific method is: compare the gray values of the pixels with the same coordinates in the four face pictures, select the maximum value and take it as the gray value at that coordinate; repeating this pixel by pixel finally yields the face picture corresponding to the local channel;
Step 3.3: fuse the local-detail face picture obtained in the previous step with the frontal face contour obtained from the whole channel in step 2.4, and obtain the final generated picture I_pred through several applied loss functions.
The loss function applied in step 3.3 is specifically:
1) Pixel-wise L1 loss:
L_pixel = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_gt(x,y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_pred(x,y) and I_gt(x,y) denote the gray values of the generated face picture I_pred and of the corresponding frontal face at coordinates (x, y);
2) Generative adversarial loss:
L_G = λ0·L_G-D + λ1·L_G-C
Since in the invention the generator is trained adversarially against both the discriminator and the classifier, L_G-D and L_G-C in the above formula are the loss functions established when the generator competes with the discriminator and with the classifier respectively; the adversarial loss of the generator is designed as the weighted sum of these two parts, addressing the quality of the generated picture and the preservation of identity consistency, with weights λ0 and λ1 respectively. The two parts are given by the following formulas, wherein I_pred' denotes the output of the discriminator D for the generated face picture I_pred, W and H are the width and height of the face picture (both 128), I_pred(x,y) and I_pred'(x,y) are the gray values of I_pred and I_pred' at coordinates (x, y), and C(·) denotes the features extracted by the classifier:
L_G-D = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred'(x,y)|
L_G-C = 1 − (C(I_pred)·C(I_gt)) / (‖C(I_pred)‖·‖C(I_gt)‖)
3) face symmetry loss:
L_sym = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred(W−(x−1), y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_pred(x,y) and I_pred(W−(x−1), y) denote the gray values of the generated face picture I_pred at coordinates (x, y) and at the horizontally mirrored coordinates (W−(x−1), y);
4) A regularization loss, which is applied to suppress the noise present in the generated face picture.
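The patent states only that a regularization loss suppresses noise in the generated picture without giving its exact form. A common choice for this purpose is total-variation regularization; the sketch below is that assumption, not the patent's definition.

```python
import numpy as np

def total_variation(img):
    """Sum of absolute differences between neighbouring pixels,
    a common smoothness regularizer (assumed form, not from the patent)."""
    dh = np.abs(img[1:, :] - img[:-1, :]).sum()
    dw = np.abs(img[:, 1:] - img[:, :-1]).sum()
    return dh + dw
```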
Fig. 3 is a schematic diagram of the structure of the generator.
Step 4: input the generated face I_pred and the corresponding frontal face I_gt into the discriminator in sequence to ensure the accuracy of face generation; the specific steps are as follows:
Step 4.1: input the face picture I_pred obtained in step 3.3 and the frontal face I_gt into the discriminator in sequence; the discriminator is structured as an auto-encoder, and its output is a face picture;
Step 4.2: denote the face pictures output by the discriminator for the generated face I_pred and the frontal face I_gt as I_pred' and I_gt' respectively, and compute the pixel-wise errors separately, i.e. compute and sum the L1-norm error over each pixel of the picture, as expressed below:
L(I_gt) = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_gt(x,y) − I_gt'(x,y)|
L(I_pred) = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred'(x,y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_gt(x,y), I_pred(x,y), I_gt'(x,y) and I_pred'(x,y) denote the gray values of I_gt, I_pred, I_gt' and I_pred' at coordinates (x, y);
Step 4.3: the loss function of the discriminator D is designed as:
L_D = L(I_gt) − k_t·L(I_pred)
wherein k_t manually controls the degree of emphasis placed on the discriminator; the larger k_t is, the stronger the discriminating ability the discriminator is set to have. During training, the discriminator minimizes L_D, which requires maximizing the pixel-wise reconstruction error of the generated face while minimizing the pixel-wise reconstruction error of the real face, thereby achieving the goal of distinguishing the two;
step 4.4: for the game process between the discriminator and the generator, the loss function of the generator is designed as follows:
L_G-D = L(I_pred)
Contrary to the discriminator, the generator aims to make the generated face as close as possible to the real face, so the generator's loss minimizes the pixel-wise reconstruction error of the generated face I_pred, which ensures the quality of the generated face;
Step 4.5: with the loss functions of step 4.3 and step 4.4, the discriminator and the generator are trained adversarially against each other; their expressive power grows ever stronger and the generated training data become more and more realistic.
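One adversarial training step between the generator and the auto-encoder discriminator, following the losses of steps 4.3–4.4, could look like the PyTorch sketch below. The names (D, optimizer_d, optimizer_g) are illustrative, and k_t is treated as a manually chosen constant, as the text states.

```python
import torch

def recon_error(img, img_rec):
    return torch.mean(torch.abs(img - img_rec))      # pixel-wise L1, i.e. L(I)

def discriminator_step(D, i_gt, i_pred, k_t, optimizer_d):
    # L_D = L(I_gt) - k_t * L(I_pred); detach the generated face so only D is updated.
    loss_d = recon_error(i_gt, D(i_gt)) - k_t * recon_error(i_pred.detach(), D(i_pred.detach()))
    optimizer_d.zero_grad(); loss_d.backward(); optimizer_d.step()
    return loss_d.item()

def generator_step(D, i_pred, optimizer_g):
    loss_g_d = recon_error(i_pred, D(i_pred))         # L_G-D = L(I_pred)
    optimizer_g.zero_grad(); loss_g_d.backward(); optimizer_g.step()
    return loss_g_d.item()
```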
Fig. 4 is a schematic diagram of the structure of the discriminator.
Step 5, input the generated face I_pred and the corresponding frontal face I_gt into the classifier together to ensure identity consistency between the generated face and the input deflected face; the specific steps are as follows:
Step 5.1: input the face picture I_pred obtained in step 3.3 and the corresponding frontal face I_gt into the classifier together (assuming the training set contains N faces in total, there are thus 2N pictures); the classifier C is a Light-CNN model trained in advance and acts as an independent structure that competes with the generator;
Step 5.2: the features extracted by the classifier from the frontal face I_gt and from the generated face I_pred are denoted C(I_gt) and C(I_pred) respectively. The result labels of the classifier are defined as 1 to 2N; the classification target of the classifier is to assign the frontal face pictures I_gt to the first N labels and the generated pictures I_pred to the last N labels. The corresponding cross-entropy loss functions are:
L_C-gt = −Σ_{j=1}^{2N} y_gt^j·log({C(I_gt)}_j)
L_C-pred = −Σ_{j=1}^{2N} y_pred^j·log({C(I_pred)}_j)
wherein j ∈ {1, ..., 2N}; y_gt^j and y_pred^j are the classifier labels corresponding to the frontal face picture and the generated face picture: y_gt^j denotes the j-th component of the correct label of the frontal face picture I_gt and {C(I_gt)}_j denotes the classifier's classification output for the frontal face picture on label j; y_pred^j denotes the j-th component of the correct label of the generated face picture I_pred and {C(I_pred)}_j is the classifier's output on label j after classification;
Step 5.3: the classifier loss function is designed as a weighted sum of the two cross-entropy losses, with α as the weight factor, because the generated face is relatively close to its corresponding frontal face picture in the feature space. The loss function of the classifier is formulated as:
L_C = L_C-gt + α·L_C-pred
Step 5.4: for the game between the generator and the classifier, the corresponding loss function of the generator is designed as:
L_G-C = 1 − (C(I_pred)·C(I_gt)) / (‖C(I_pred)‖·‖C(I_gt)‖)
that is, the cosine distance between the features of the generated face picture and of the corresponding frontal face picture is minimized, so as to maintain identity consistency between the generated face picture and the corresponding frontal face picture;
Step 5.5: weighting the adversarial losses of the generator against the discriminator (step 4.4) and against the classifier (step 5.4) gives the generative adversarial loss of the generator in step 3.4, whose formula is rewritten as:
L_G = λ0·L_G-D + λ1·L_G-C
Step 5.6: the classifier and the generator each have their corresponding adversarial loss functions; as they continuously compete and generate, their expressive power grows ever stronger and the generated training data come closer to the identity information of the original face.
Fig. 5 is a schematic diagram of the structure of the classifier.
Fig. 6 is a schematic diagram of the correction effect of the present invention on a Multi-PIE data set.
Step 6, save the generative adversarial network model obtained by training and use it to correct deflected faces during testing; the specific steps are as follows:
step 6.1: storing the network model parameters during training;
step 6.2: extracting the characteristic points of the test set deflection human face picture, wherein the extraction method is the same as that of the training set and is completed in a caffe deep learning environment;
Step 6.3: extract the face blocks of the deflected face regions (eyes, nose and mouth) from the feature points obtained in step 6.2, with sizes consistent with those in the training set, i.e. 40 × 40 for the eyes, 32 × 40 for the nose and 32 × 48 for the mouth;
Step 6.4: use the network model obtained in training to correct the face region blocks of the test set; similarly, the corrected whole-face contour of the deflected face can also be obtained;
Step 6.5: fuse the two outputs of the local channel and the whole channel according to the network model parameters from training, keeping the loss function of the generator consistent with that in step 3.4, thereby obtaining the generated frontal face picture corresponding to the deflected face of the test set, as sketched below.
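The following Python sketch strings steps 6.3–6.5 together for a single test picture. local_generator and global_generator stand in for the trained local-channel and whole-channel generators; they and the block-position bookkeeping are hypothetical, not APIs defined by the patent.

```python
import numpy as np

def paste_into_canvas(patch, top_left_xy, canvas_hw=(128, 128)):
    """Zero-pad a corrected region block back to the full 128x128 face size."""
    canvas = np.zeros(canvas_hw, dtype=patch.dtype)
    x0, y0 = top_left_xy
    canvas[y0:y0 + patch.shape[0], x0:x0 + patch.shape[1]] = patch
    return canvas

def frontalize(deflected_face, blocks, block_positions, local_generator, global_generator):
    corrected = {k: local_generator(v) for k, v in blocks.items()}            # step 6.4
    padded = [paste_into_canvas(corrected[k], block_positions[k]) for k in corrected]
    local_full = np.maximum.reduce(padded)                                    # pixel-wise max over regions
    global_full = global_generator(deflected_face)                            # whole-face contour
    return np.maximum(local_full, global_full)                                # overlap -> max (step 6.5)
```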
Fig. 7 is a schematic diagram of the positive effects on the LFW test set.
The invention is based on BEGAN among generative adversarial networks; this structure minimizes the Wasserstein distance between the pixel-wise error distributions of the faces before and after generation and avoids many problems of traditional generative adversarial networks. A two-channel network structure is used to accurately recover the face structure at both the local and the whole-face level. Furthermore, with a three-player adversarial network structure, the classifier that maintains identity consistency competes with the generator as an independent structure instead of merely supervising the generation process as a loss-function term; this improves the accuracy and speed of deflected face correction, and a good correction effect can be obtained even under large-angle deflection.
The above description is only an embodiment of the present invention, but the scope of the present invention is not limited thereto; any modification or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed by the present invention falls within the scope of the present invention, and therefore the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (7)

1. A method for correcting a deflected face based on an improved generative adversarial network structure, characterized by comprising the following steps:
Step 1, for the input deflected face I_in, detect the facial feature points and extract fixed-size face region blocks, namely the eyes, nose and mouth;
Step 2, input the face region blocks and the whole face obtained in step 1 into a local channel and a whole channel respectively, obtaining the corrected results of the local channel and the whole channel;
Step 3, fuse the corrected face regions output by the local channel with the whole face contour output by the whole channel, setting each pixel in an overlapping region to the maximum value over that region, to obtain the final generated face I_pred;
Step 4, input the generated face I_pred and the corresponding frontal face I_gt into the discriminator in sequence to ensure the accuracy of face generation, wherein I_gt is the frontal face corresponding to the input deflected face;
Step 5, input the generated face I_pred and the corresponding frontal face I_gt into the classifier together to ensure identity consistency between the generated face and the input deflected face;
Step 6, save the generative adversarial network model obtained by training and use it to correct deflected faces during testing.
2. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 1, wherein step 1 specifically comprises:
Step 1.1, uniformly normalize the face size to 128 × 128 and build a caffe deep learning environment;
Step 1.2, in the caffe deep learning environment, detect the feature points of the face and locate five feature points: the center points of the two eyes, the nose tip, and the two mouth-corner points;
Step 1.3, obtain each region block in the face, namely the mouth, the nose and the two eyes, from the five feature points obtained in step 1.2; the region block sizes are kept constant across different faces, with the eye regions set to 40 × 40 and the nose and mouth regions set to 32 × 40 and 32 × 48, respectively.
3. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 2, wherein step 2 specifically comprises:
Step 2.1, extract the eye, nose and mouth region blocks from the frontal face I_gt in the same way, obtaining Eyel_gt, Eyer_gt, Nose_gt and Mouth_gt respectively;
Step 2.2: for both the local channel and the whole channel, design the generator as a U-Net structure so that the picture size is kept unchanged before and after generation;
Step 2.3: for the local channel, input the face region blocks obtained in step 1 into the local channel generator to obtain the corrected face region blocks, namely Eyel_pred, Eyer_pred, Nose_pred and Mouth_pred, wherein the loss function of the local channel is designed as an L1 loss and consists of the following parts:
eyel_loss = (1/(W_eyel·H_eyel)) Σ_{x=1}^{W_eyel} Σ_{y=1}^{H_eyel} |Eyel_pred(x,y) − Eyel_gt(x,y)|
eyer_loss = (1/(W_eyer·H_eyer)) Σ_{x=1}^{W_eyer} Σ_{y=1}^{H_eyer} |Eyer_pred(x,y) − Eyer_gt(x,y)|
nose_loss = (1/(W_nose·H_nose)) Σ_{x=1}^{W_nose} Σ_{y=1}^{H_nose} |Nose_pred(x,y) − Nose_gt(x,y)|
mouth_loss = (1/(W_mou·H_mou)) Σ_{x=1}^{W_mou} Σ_{y=1}^{H_mou} |Mouth_pred(x,y) − Mouth_gt(x,y)|
wherein W_eyel, W_eyer, W_nose and W_mou are the widths of the left-eye, right-eye, nose and mouth regions, equal to 40, 40, 32 and 32 respectively, and H_eyel, H_eyer, H_nose and H_mou are the corresponding heights, equal to 40, 40, 40 and 48 respectively; Eyel_pred(x,y), Eyer_pred(x,y), Nose_pred(x,y) and Mouth_pred(x,y) are the gray values of the corrected left eye, right eye, nose and mouth at coordinates (x, y), and Eyel_gt(x,y), Eyer_gt(x,y), Nose_gt(x,y) and Mouth_gt(x,y) are the gray values of the corresponding region blocks of the frontal face at coordinates (x, y);
the loss function of the local channel is the sum of the losses of the region blocks; setting λ2 as the weight of the local channel loss, the loss function local_loss of the local channel can be expressed as:
local_loss = λ2·(eyel_loss + eyer_loss + nose_loss + mouth_loss)
Step 2.4: for the whole channel, input the deflected face I_in into the encoder structure with 3 × 3 convolution kernels to obtain a 128 × 128 frontal face contour of the same size as the original face.
4. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 3, wherein step 3 specifically comprises:
Step 3.1: based on the corrected results of the region blocks obtained in step 2, zero-pad each region block to a picture of the same size as the original face, i.e. one face-sized picture each for the left eye, right eye, nose and mouth;
Step 3.2: obtain the face picture corresponding to the local channel from the four equally sized face pictures obtained in step 3.1; the specific method is: compare the gray values of the pixels with the same coordinates in the four face pictures, select the maximum value and take it as the gray value at that coordinate; repeating this pixel by pixel finally yields the face picture corresponding to the local channel;
Step 3.3: fuse the local-detail face picture obtained in the previous step with the frontal face contour obtained from the whole channel in step 2.4, and obtain the final generated picture I_pred through training with several applied loss functions;
The loss function applied in step 3.3 is specifically:
1) Pixel-wise L1 loss:
L_pixel = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_gt(x,y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_pred(x,y) and I_gt(x,y) denote the gray values of the generated face picture I_pred and of the corresponding frontal face at coordinates (x, y);
2) Generative adversarial loss:
L_G = λ0·L_G-D + λ1·L_G-C
L_G-D and L_G-C in the above formula are the loss functions established when the generator competes with the discriminator and with the classifier respectively; the adversarial loss of the generator is designed as the weighted sum of these two parts, addressing the quality of the generated picture and the preservation of identity consistency, with weights λ0 and λ1 respectively. The two parts are given by the following formulas, wherein I_pred' denotes the output of the discriminator D for the generated face picture I_pred, W and H are the width and height of the face picture (both 128), I_pred(x,y) and I_pred'(x,y) are the gray values of I_pred and I_pred' at coordinates (x, y), and C(·) denotes the features extracted by the classifier:
L_G-D = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred'(x,y)|
L_G-C = 1 − (C(I_pred)·C(I_gt)) / (‖C(I_pred)‖·‖C(I_gt)‖)
3) face symmetry loss:
L_sym = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred(W−(x−1), y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_pred(x,y) and I_pred(W−(x−1), y) denote the gray values of the generated face picture I_pred at coordinates (x, y) and at the horizontally mirrored coordinates (W−(x−1), y);
4) A regularization loss, which is applied to suppress the noise present in the generated face picture.
5. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 4, wherein step 4 specifically comprises:
Step 4.1: input the face picture I_pred obtained in step 3.3 and the frontal face I_gt into the discriminator in sequence; the discriminator is structured as an auto-encoder, and its output is a face picture;
Step 4.2: denote the face pictures output by the discriminator for the generated face I_pred and the frontal face I_gt as I_pred' and I_gt' respectively, and compute the pixel-wise errors separately, i.e. compute and sum the L1-norm error over each pixel of the picture, as expressed below:
L(I_gt) = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_gt(x,y) − I_gt'(x,y)|
L(I_pred) = (1/(W·H)) Σ_{x=1}^{W} Σ_{y=1}^{H} |I_pred(x,y) − I_pred'(x,y)|
wherein W and H are the width and height of the face picture, both equal to 128, and I_gt(x,y), I_pred(x,y), I_gt'(x,y) and I_pred'(x,y) denote the gray values of I_gt, I_pred, I_gt' and I_pred' at coordinates (x, y);
Step 4.3: the loss function of the discriminator D is designed as:
L_D = L(I_gt) − k_t·L(I_pred)
wherein k_t manually controls the degree of emphasis placed on the discriminator; during training, the discriminator minimizes L_D, which requires maximizing the pixel-wise reconstruction error of the generated face while minimizing the pixel-wise reconstruction error of the real face, thereby achieving the goal of distinguishing the two;
step 4.4: for the game process between the discriminator and the generator, the loss function of the generator is designed as follows:
L_G-D = L(I_pred)
the loss function of the generator minimizes the pixel-wise reconstruction error of the generated face I_pred, which ensures the quality of the generated face;
Step 4.5: according to the loss functions of step 4.3 and step 4.4, the discriminator and the generator are continuously trained adversarially against each other until the training reaches a preset condition.
6. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 5, wherein step 5 specifically comprises:
Step 5.1: input the face picture I_pred obtained in step 3.3 and the corresponding frontal face I_gt into the classifier together; assuming the training set contains N faces in total, there are thus 2N pictures; the classifier C is a Light-CNN model trained in advance and acts as an independent structure that competes with the generator;
Step 5.2: the features extracted by the classifier from the frontal face I_gt and from the generated face I_pred are denoted C(I_gt) and C(I_pred) respectively; the result labels of the classifier are defined as 1 to 2N, and the classification target of the classifier is to assign the frontal face pictures I_gt to the first N labels and the generated pictures I_pred to the last N labels; the corresponding cross-entropy loss functions are:
L_C-gt = −Σ_{j=1}^{2N} y_gt^j·log({C(I_gt)}_j)
L_C-pred = −Σ_{j=1}^{2N} y_pred^j·log({C(I_pred)}_j)
wherein j ∈ {1, ..., 2N}; y_gt^j and y_pred^j are the classifier labels corresponding to the frontal face picture and the generated face picture: y_gt^j denotes the j-th component of the correct label of the frontal face picture I_gt and {C(I_gt)}_j denotes the classifier's classification output for the frontal face picture on label j; y_pred^j denotes the j-th component of the correct label of the generated face picture I_pred and {C(I_pred)}_j is the classifier's output on label j after classification;
Step 5.3: the classifier loss function is designed as a weighted sum of the two cross-entropy losses, with α as the weight factor; the loss function of the classifier is formulated as:
L_C = L_C-gt + α·L_C-pred
Step 5.4: for the game between the generator and the classifier, the corresponding loss function of the generator is designed as:
L_G-C = 1 − (C(I_pred)·C(I_gt)) / (‖C(I_pred)‖·‖C(I_gt)‖)
that is, the cosine distance between the features of the generated face picture and of the corresponding frontal face picture is minimized;
Step 5.5: weighting the adversarial losses of the generator against the discriminator (step 4.4) and against the classifier (step 5.4) gives the generative adversarial loss of the generator in step 3.4, whose formula is rewritten as:
L_G = λ0·L_G-D + λ1·L_G-C
Step 5.6: the classifier and the generator each have their corresponding adversarial loss functions, and they continuously compete and generate until the training reaches a preset condition.
7. The method for correcting a deflected face based on an improved generative adversarial network structure according to claim 6, wherein step 6 specifically comprises:
step 6.1: storing the network model parameters during training;
step 6.2: extracting the characteristic points of the test set deflection human face picture, wherein the extraction method is the same as that of the training set and is completed in a caffe deep learning environment;
Step 6.3: extract the face blocks of the deflected face regions, namely the eyes, nose and mouth, from the feature points obtained in step 6.2, with sizes consistent with those in the training set, i.e. 40 × 40 for the eyes, 32 × 40 for the nose and 32 × 48 for the mouth;
Step 6.4: use the network model obtained in training to correct the face region blocks of the test set; similarly, the corrected whole-face contour of the deflected face can also be obtained;
Step 6.5: fuse the two outputs of the local channel and the whole channel according to the network model parameters from training, keeping the loss function of the generator consistent with that in step 3.4, thereby obtaining the generated frontal face picture corresponding to the deflected face of the test set.
CN202010269281.6A 2020-04-08 2020-04-08 Deflected face correction method based on an improved generative adversarial network structure Active CN111523406B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010269281.6A CN111523406B (en) 2020-04-08 2020-04-08 Deflected face correction method based on an improved generative adversarial network structure

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010269281.6A CN111523406B (en) 2020-04-08 2020-04-08 Deflected face correction method based on an improved generative adversarial network structure

Publications (2)

Publication Number Publication Date
CN111523406A true CN111523406A (en) 2020-08-11
CN111523406B CN111523406B (en) 2023-04-18

Family

ID=71902548

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010269281.6A Active CN111523406B (en) 2020-04-08 2020-04-08 Deflected face correction method based on an improved generative adversarial network structure

Country Status (1)

Country Link
CN (1) CN111523406B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288861A (en) * 2020-11-02 2021-01-29 湖北大学 Automatic face three-dimensional model construction method and system based on single photo

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738161A (en) * 2019-10-12 2020-01-31 电子科技大学 face image correction method based on improved generation type confrontation network

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110738161A (en) * 2019-10-12 2020-01-31 电子科技大学 face image correction method based on improved generation type confrontation network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
RUI HUANG等: "Beyond Face Rotation: Global and Local Perception GAN for Photorealistic and Identity Preserving Frontal View Synthesis" *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112288861A (en) * 2020-11-02 2021-01-29 湖北大学 Automatic face three-dimensional model construction method and system based on single photo
CN112288861B (en) * 2020-11-02 2022-11-25 湖北大学 Single-photo-based automatic construction method and system for three-dimensional model of human face

Also Published As

Publication number Publication date
CN111523406B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN109684924B (en) Face living body detection method and device
Lee et al. Ensemble deep learning for skeleton-based action recognition using temporal sliding lstm networks
CN106815566B (en) Face retrieval method based on multitask convolutional neural network
CN108629336B (en) Face characteristic point identification-based color value calculation method
Vemulapalli et al. R3DG features: Relative 3D geometry-based skeletal representations for human action recognition
Tang et al. Facial landmark detection by semi-supervised deep learning
CN110728209A (en) Gesture recognition method and device, electronic equipment and storage medium
CN108171133B (en) Dynamic gesture recognition method based on characteristic covariance matrix
CN105335719A (en) Living body detection method and device
CN111783748A (en) Face recognition method and device, electronic equipment and storage medium
CN115830652B (en) Deep palm print recognition device and method
CN111914643A (en) Human body action recognition method based on skeleton key point detection
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
Zhou et al. MTCNet: Multi-task collaboration network for rotation-invariance face detection
Iqbal et al. Facial expression recognition with active local shape pattern and learned-size block representations
Wu et al. An unsupervised real-time framework of human pose tracking from range image sequences
CN111523406B (en) Deflected face correction method based on an improved generative adversarial network structure
Das et al. A fusion of appearance based CNNs and temporal evolution of skeleton with LSTM for daily living action recognition
Liao et al. 3D face tracking and expression inference from a 2D sequence using manifold learning
CN110598595B (en) Multi-attribute face generation algorithm based on face key points and postures
US20210042510A1 (en) Adaptive hand tracking and gesture recognition using face-shoulder feature coordinate transforms
CN112784800B (en) Face key point detection method based on neural network and shape constraint
Liu et al. Adaptive recognition method for VR image of Wushu decomposition based on feature extraction
Deng et al. Multi-stream face anti-spoofing system using 3D information
CN114782992A (en) Super-joint and multi-mode network and behavior identification method thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant