CN113706404A - Depression angle human face image correction method and system based on self-attention mechanism

Info

Publication number
CN113706404A
CN113706404A (application CN202110899936.2A, granted as CN113706404B)
Authority
CN
China
Prior art keywords
layer
input
fusion
module
convolution
Prior art date
Legal status
Granted
Application number
CN202110899936.2A
Other languages
Chinese (zh)
Other versions
CN113706404B (en)
Inventor
邹华
斯马依力江·木萨汗
王中元
邬欢欢
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University (WHU)
Priority to CN202110899936.2A
Publication of CN113706404A
Application granted
Publication of CN113706404B
Active legal status
Anticipated expiration

Classifications

    • G06T 5/80 Geometric correction (image enhancement or restoration)
    • G06F 18/253 Fusion techniques of extracted features (pattern recognition)
    • G06N 3/045 Combinations of networks (neural network architectures)
    • G06N 3/047 Probabilistic or stochastic networks
    • G06N 3/08 Learning methods
    • G06T 7/0012 Biomedical image inspection (image analysis)
    • G06T 2207/10016 Video; image sequence
    • G06T 2207/20081 Training; learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30196 Human being; person
    • G06T 2207/30201 Face
    • Y02T 10/40 Engine management systems


Abstract

The invention discloses a method and a system for correcting depression angle (downward-looking) face images based on a self-attention mechanism. To make full use of the complementary information in several pictures, a convolutional gated recurrent unit is introduced: the pictures are fed in sequentially, from the one with the largest deflection angle to the one with the smallest, to simulate the process of face frontalization, so that correlated facial feature information can be better predicted and extracted from several depression-angle face pictures and the face information of the pictures complements one another. Compared with correcting a single face picture, performing depression-angle face correction on several pictures yields a generated picture that is closer to the real frontal picture and richer in detail.

Description

Depression angle human face image correction method and system based on self-attention mechanism
Technical Field
The invention belongs to the technical field of computer vision, relates to a depression angle face image correction method and system, and particularly relates to a depression angle face image correction method and system based on a self-attention mechanism.
Background
In modern society, payment, security checks, suspect tracking and even office check-in all place high demands on identifying and authenticating each individual, and there are many means of identification and authentication, such as fingerprint identification and genetic identification. Because it requires little contact and no special cooperation from users and can collect information at a distance, face recognition has become the most widely applied and most widely deployed form of identity recognition in today's society. Research around the human face has also branched in many directions, including estimating a person's age from the face, simulating the person's past appearance and future aging by modifying the face, and analyzing a person's mental state by recognizing facial expressions.
The technology for recognizing standard frontal faces is by now quite mature: the important regions of the face (eyes, nose, mouth and so on) are mapped to specific locations by some method without losing identity information. In controlled, supervised scenes, such as face-verification checkpoints, the subject consciously adjusts the face to a fixed position so that a frontal face image can be obtained accurately and effectively. As recognition accuracy on standard faces approaches its peak, researchers have turned from controlled face images to uncontrolled natural images. In everyday life, uncontrolled images account for the larger share, and the overall facial changes caused by variations in lighting, expression, pose and so on remain nearly insurmountable barriers for the methods used in standard face recognition.
This is especially true in the surveillance field. Because surveillance cameras are usually mounted high up, the captured picture is usually a depression-angle picture of the face. When the camera is installed in a relatively confined space, such as a corner, a frontal face image is hard to obtain at all; even when it can be obtained in an open space, the person is usually far away, so extracting a frontal face image places high demands on the camera equipment. Such high-performance cameras are necessarily expensive and cannot be used in every situation. If, on the other hand, the frontal face image can be restored from a depression-angle picture, the requirements on the camera are much lower and the approach can be applied in almost all cases. Multi-pose face correction therefore arose to meet this need.
As people rely ever more heavily on face information and face-processing problems diversify, face correction has become a field in its own right, separate from face recognition. Compared with problems of illumination, expression, resolution and so on, the effect of pose on face recognition is not negligible and can even be decisive: changes of face pose, especially large ones, make face recognition very unstable. Like any three-dimensional object, the human face can be imaged at any angle through rotation about three axes, namely pitch, yaw and roll. Most current research addresses roll and yaw correction, while the pitch angle is corrected only slightly. Pitch rotation produces depression-angle or elevation-angle images; depression-angle images occur frequently in surveillance and have a very wide range of application, yet research in this area is relatively scarce and effective results are few. Research on the face depression-angle correction problem therefore has great practical significance.
Disclosure of Invention
In order to solve the technical problem, the invention provides a depression angle human face image correction method and system based on a self-attention mechanism.
The method adopts the technical scheme that: a depression angle face image correction method based on a self-attention mechanism comprises the following steps:
step 1: constructing a multi-input fusion countermeasure generation network based on an attention mechanism;
the multi-input fusion confrontation generation network comprises a multi-input fusion coding module, a self-attention module, a single-layer fusion module, a multi-input fusion decoding module and a confrontation generation network identification module;
the multi-input fusion coding module comprises four convolutional layers which are arranged in series, the first layer is a convolutional layer with the convolutional kernel size of 7, and the step length is 1; the second layer is a convolution layer with convolution kernel size of 5 and step length of 2; the third and fourth layers are convolution layers with convolution kernel size of 3, and step length is 2; a residual block is added behind the first layer and the second layer of convolution layers, and a normalization layer, an activation layer and a residual block are sequentially added behind the third layer and the fourth layer of convolution layers;
the self-attention module is used for constructing, through convolution kernels of size 1, three feature maps f, g and h from the feature map F output by the multi-input fusion coding module; matrix multiplication and a softmax operation are performed on feature map f and feature map g to obtain the matrix feature map β_(i,j); β_(i,j) is then multiplied with feature map h to obtain the weighted value o_j, and the weighted value is added to the feature map F before being output;
the single-layer fusion module is used for fusing a plurality of picture features of the C pictures output by each convolution layer in the multi-input fusion coding module through the C ConvGRU modules;
the multilayer fusion module is used for enabling all the characteristics to be in the same scale by respectively passing through a deconvolution layer for four single-layer fusion characteristics G1, G2, G3 and G4 output by the single-layer fusion module, and respectively passing through a ConvGRU module according to the sequence of G4, G3, G2 and G1 to finally obtain multilayer fusion characteristics, wherein the multilayer fusion characteristics pass through a convolution layer with a convolution kernel size of 3 and a step size of 2 and two full-connection layers to obtain overall characteristics;
the multi-input fusion decoding module consists of four deconvolution layers, two self-attention layers and two convolution layers; the system is used for adding Gaussian noise information into the total features output by the multilayer fusion module for reconstruction to obtain new features F1, and then performing up-sampling on the features F1 to respectively form three features F2, F3 and F4 with different scales and then inputting the three features into a deconvolution layer; entering a deconvolution operation; the input of the deconvolution network of the first layer of the multi-input fusion decoding module is an up-sampling value of the output of the fourth layer of the multi-input fusion coding module after passing through the residual block and fused with F1; the input of the second layer of the convolution layer of the multi-input fusion decoding module is the result of the output of the previous layer of convolution layer after passing through the residual block, F2 and the output of the third layer of the convolution layer of the multi-input fusion encoding module after passing through the residual block are fused; the input of the deconvolution layer of the third layer of the multi-input fusion decoding module is the residual output of the deconvolution layer of the previous layer, the result of the output of the attention module after passing through the residual block, F3, the cross-layer input of the second layer of the convolution layer of the multi-input fusion coding module, and the fusion of the four values after the input picture is subjected to resize to be in a certain size; the input of the deconvolution layer of the fourth layer of the multi-input fusion decoding module is the result of the output of the attention module after passing through the residual block, the result of the output of the first convolution layer of the multi-input fusion coding module after passing through the parameter block, and the fusion input of the input picture; the face correction fine picture is output through the two convolution layers after the fourth layer of the multi-input fusion decoding module; after the input feature map passes through the unit, each feature map has a weight map which represents the association degree of each part in the feature map;
the generation confrontation network identification module consists of seven convolutional layers, and residual blocks are added to the penultimate and antepenultimate layers;
step 2: and inputting the depression angle face image to be corrected into the multi-input fusion confrontation generation network to obtain a face correction fine picture.
The technical scheme adopted by the system of the invention is as follows: a depression angle human face image correction system based on a self-attention mechanism comprises the following modules:
the module 1 is used for constructing a multi-input fusion countermeasure generation network based on an attention mechanism;
the multi-input fusion confrontation generation network comprises a multi-input fusion coding module, a self-attention module, a single-layer fusion module, a multi-input fusion decoding module and a confrontation generation network identification module;
the multi-input fusion coding module comprises four convolutional layers which are arranged in series, the first layer is a convolutional layer with the convolutional kernel size of 7, and the step length is 1; the second layer is a convolution layer with convolution kernel size of 5 and step length of 2; the third and fourth layers are convolution layers with convolution kernel size of 3, and step length is 2; a residual block is added behind the first layer and the second layer of convolution layers, and a normalization layer, an activation layer and a residual block are sequentially added behind the third layer and the fourth layer of convolution layers;
the self-attention module is used for constructing, through convolution kernels of size 1, three feature maps f, g and h from the feature map F output by the multi-input fusion coding module; matrix multiplication and a softmax operation are performed on feature map f and feature map g to obtain the matrix feature map β_(i,j); β_(i,j) is then multiplied with feature map h to obtain the weighted value o_j, and the weighted value is added to the feature map F before being output;
the single-layer fusion module is used for fusing a plurality of picture features of the C pictures output by each convolution layer in the multi-input fusion coding module through the C ConvGRU modules;
the multilayer fusion module is used for enabling all the characteristics to be in the same scale by respectively passing through a deconvolution layer for four single-layer fusion characteristics G1, G2, G3 and G4 output by the single-layer fusion module, and respectively passing through a ConvGRU module according to the sequence of G4, G3, G2 and G1 to finally obtain multilayer fusion characteristics, wherein the multilayer fusion characteristics pass through a convolution layer with a convolution kernel size of 3 and a step size of 2 and two full-connection layers to obtain overall characteristics;
the multi-input fusion decoding module consists of four deconvolution layers, two self-attention layers and two convolution layers; the system is used for adding Gaussian noise information into the total features output by the multilayer fusion module for reconstruction to obtain new features F1, and then performing up-sampling on the features F1 to respectively form three features F2, F3 and F4 with different scales and then inputting the three features into a deconvolution layer; entering a deconvolution operation; the input of the deconvolution network of the first layer of the multi-input fusion decoding module is an up-sampling value of the output of the fourth layer of the multi-input fusion coding module after passing through the residual block and fused with F1; the input of the second layer of the convolution layer of the multi-input fusion decoding module is the result of the output of the previous layer of convolution layer after passing through the residual block, F2 and the output of the third layer of the convolution layer of the multi-input fusion encoding module after passing through the residual block are fused; the input of the deconvolution layer of the third layer of the multi-input fusion decoding module is the residual output of the deconvolution layer of the previous layer, the result of the output of the attention module after passing through the residual block, F3, the cross-layer input of the second layer of the convolution layer of the multi-input fusion coding module, and the fusion of the four values after the input picture is subjected to resize to be in a certain size; the input of the deconvolution layer of the fourth layer of the multi-input fusion decoding module is the result of the output of the attention module after passing through the residual block, the result of the output of the first convolution layer of the multi-input fusion coding module after passing through the parameter block, and the fusion input of the input picture; the face correction fine picture is output through the two convolution layers after the fourth layer of the multi-input fusion decoding module; after the input feature map passes through the unit, each feature map has a weight map which represents the association degree of each part in the feature map;
the generation confrontation network identification module consists of seven convolutional layers, and residual blocks are added to the penultimate and antepenultimate layers;
and the module 2 is used for inputting the depression angle face image needing to be corrected into the multi-input fusion confrontation generation network to obtain a face correction fine picture.
The invention designs a multi-input depression-angle face correction method and system based on an attention mechanism, in accordance with the characteristics of multi-pose face image correction. By using the self-attention unit and the convolutional gated recurrent unit, complementary information in several pictures can be effectively retained while relations are established between global pixels within a single picture. Useful information is extracted while the interference of low-value redundant information is removed, and a residual layer added after each convolutional layer strengthens the gradient flow, improving training efficiency and learning quality. The invention uses single-scale and multi-scale feature fusion to establish feature relations at the deep levels of the picture, so that the generated picture has accurate face features overall as well as fine face picture information.
Drawings
Fig. 1 is a schematic diagram of a multi-input fusion countermeasure generation network structure of an attention mechanism according to an embodiment of the invention.
Fig. 2 is a schematic structural diagram of a self-attention module in a multiple-input fusion countermeasure generation network of an attention mechanism according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a single-layer fusion module in a multi-input fusion countermeasure generation network of an attention mechanism according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of a multi-layer fusion module in a multi-input fusion countermeasure generation network of an attention mechanism according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of an authentication module of a generation countermeasure network in a multi-input fusion countermeasure generation network of an attention mechanism according to an embodiment of the present invention.
FIG. 6 shows face correction results of the depression-angle face image correction method based on the self-attention mechanism according to an embodiment of the present invention, for a single input picture on the M2FPA data set; each group of three face pictures, from left to right, consists of the input image, the real image and the generated result.
Fig. 7 shows the face correction results of the depression-angle face image correction method based on the self-attention mechanism according to an embodiment of the present invention with two input pictures on the DFW data set; each group of four pictures, from left to right, consists of the two input pictures, the generated picture and the real face picture.
FIG. 8 shows the effect of the self-attention mechanism-based dip angle face image correction method on the DFW data set in comparison with the DA-GAN and TP-GAN methods.
Fig. 9 is a structural diagram of a ConvGRU module according to an embodiment of the present invention.
Fig. 10 is a block diagram of single-layer fusion and multi-layer fusion modules according to an embodiment of the present invention.
Detailed Description
In order to facilitate the understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, it is to be understood that the embodiments described herein are merely illustrative and explanatory of the present invention and are not restrictive thereof.
The invention provides a depression angle face image correction method based on a self-attention mechanism, which comprises the following steps of:
step 1: constructing a multi-input fusion countermeasure generation network based on an attention mechanism;
the multi-input fusion confrontation generation network of the embodiment comprises a multi-input fusion coding module, a self-attention module, a single-layer fusion module, a multi-input fusion decoding module and a generation confrontation network identification module;
the multi-input fusion coding module of the embodiment comprises four convolutional layers which are arranged in series, wherein the first layer is a convolutional layer with a convolutional kernel size of 7, and the step length is 1; the second layer is a convolution layer with convolution kernel size of 5 and step length of 2; the third and fourth layers are convolution layers with convolution kernel size of 3, and step length is 2; a residual block is added after the first layer and the second layer of the convolution layer, and a normalization layer, an activation layer and a residual block are sequentially added after the third layer and the fourth layer of the convolution layer;
in this embodiment, a picture passes through the first layer of convolutional layer and then outputs a feature value 128x128x64, the feature value is transmitted to the same-layer fusion structure of the first layer of convolutional layer while being transmitted to the next layer of convolutional layer, the second layer of convolutional layer outputs a feature value 64x64x64, the feature value is transmitted to the third layer of convolutional layer as an input and is simultaneously transmitted to the same-layer fusion structure of the second layer, the outputs of the third layer and the fourth layer are features 32x32x128 and 16x16x256, and the subsequent steps are consistent with those of the previous two layers. The output of the fourth layer is also input to the multi-input fusion decoding module via one convolutional layer and two fully-connected layers.
In this embodiment, the multi-input fusion coding module includes four convolution layers and four integrally fused ConvGRU units corresponding to each convolution layer.
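Purely for illustration, a minimal PyTorch sketch of such a four-layer encoder might look as follows; the residual-block design, the use of instance normalization and leaky ReLU as the normalization and activation layers, and the padding values are assumptions, while the kernel sizes, strides and the 64/64/128/256 channel widths follow the description above:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Simple residual block used after each encoder convolution (illustrative only)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)

class MultiInputEncoder(nn.Module):
    """Four convolutional layers with kernel sizes 7/5/3/3 and strides 1/2/2/2."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(in_ch, 64, 7, stride=1, padding=3), ResidualBlock(64))
        self.conv2 = nn.Sequential(nn.Conv2d(64, 64, 5, stride=2, padding=2), ResidualBlock(64))
        self.conv3 = nn.Sequential(nn.Conv2d(64, 128, 3, stride=2, padding=1),
                                   nn.InstanceNorm2d(128), nn.LeakyReLU(0.2, inplace=True),
                                   ResidualBlock(128))
        self.conv4 = nn.Sequential(nn.Conv2d(128, 256, 3, stride=2, padding=1),
                                   nn.InstanceNorm2d(256), nn.LeakyReLU(0.2, inplace=True),
                                   ResidualBlock(256))

    def forward(self, x):                     # x: (B, 3, 128, 128)
        f1 = self.conv1(x)                    # (B, 64, 128, 128)
        f2 = self.conv2(f1)                   # (B, 64, 64, 64)
        f3 = self.conv3(f2)                   # (B, 128, 32, 32)
        f4 = self.conv4(f3)                   # (B, 256, 16, 16)
        return f1, f2, f3, f4                 # each output also feeds its same-layer fusion branch
```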
Referring to fig. 2, the self-attention module of this embodiment is configured to construct, through convolution kernels of size 1, three feature maps f(x), g(x) and h(x) from the feature map F output by the multi-input fusion coding module; matrix multiplication and a softmax operation are performed on feature map f(x) and feature map g(x) to obtain the matrix feature map β_(i,j); β_(i,j) is then multiplied with feature map h(x) to obtain the weighted value o_j, which is added to the feature map F before being output;
the calculation steps are as follows:
f(x) = W_f·x,  g(x) = W_g·x,  h(x) = W_h·x,  v(x) = W_v·x    (1)
s_ij = f(x_i)^T · g(x_j)    (2)
β_(i,j) = exp(s_ij) / Σ_i exp(s_ij)    (3)
o_j = v( Σ_i β_(i,j) · h(x_i) )    (4)
y_i = α·o_i + x_i    (5)
In the above formulas, x denotes the input feature map and x_i denotes the i-th feature of that map. W_f, W_g, W_h and W_v are four different weight matrices through which the input feature map is processed in different ways; they are also the quantities the network has to learn. v(x) is the final processing of the attended feature values, which are then added back to the original input to obtain the final self-attention value.
The weight matrices W_f, W_g and W_h that generate f(x), g(x) and h(x) are all implemented with 1x1 convolution kernels; s_ij is an intermediate quantity, and y_i is the output of the self-attention network, obtained by multiplying the output of the attention layer by the scale parameter α and then adding back the input feature map.
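For illustration, a minimal PyTorch sketch of a self-attention layer following equations (1) to (5) might look as follows; the channel-reduction factor of 8 and the zero initialisation of the scale parameter α are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Self-attention over a feature map, following Eqs. (1)-(5)."""
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Conv2d(channels, channels // 8, 1)   # W_f (1x1 convolution)
        self.g = nn.Conv2d(channels, channels // 8, 1)   # W_g
        self.h = nn.Conv2d(channels, channels, 1)        # W_h
        self.v = nn.Conv2d(channels, channels, 1)        # W_v
        self.alpha = nn.Parameter(torch.zeros(1))        # scale parameter alpha

    def forward(self, x):                                # x: (B, C, H, W)
        b, c, hgt, wid = x.shape
        n = hgt * wid
        fx = self.f(x).view(b, -1, n)                    # (B, C/8, N)
        gx = self.g(x).view(b, -1, n)                    # (B, C/8, N)
        hx = self.h(x).view(b, -1, n)                    # (B, C,   N)
        s = torch.bmm(fx.transpose(1, 2), gx)            # s_ij = f(x_i)^T g(x_j), shape (B, N, N)
        beta = F.softmax(s, dim=1)                       # Eq. (3): normalise over i
        o = self.v(torch.bmm(hx, beta).view(b, c, hgt, wid))  # Eq. (4)
        return self.alpha * o + x                        # Eq. (5)
```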
Referring to fig. 3, the single-layer fusion module of the present embodiment is configured to fuse characteristics of multiple pictures for characteristics of C pictures output by each convolution layer in the multi-input fusion coding module through C ConvGRU modules;
in this embodiment, the ConvGRU module has two usage forms in the network, one is to merge the convolution results of each layer, and the number of input in this layer determines the number of ConvGRU modules, that is, the number of C is the number of input pictures and the number of ConvGRU modules in the same layer, that is, how many input pictures have to be merged in the same layer. Each ConvGRU module input is the result of the output of the kth convolutional layer passing through a residual block, which in this embodiment has only 4 convolutional layers, so K < 4.
In this embodiment, each control gate of the ConvGRU module is preceded by a weight map, and the feature map is updated and retained according to the weight map.
Please refer to fig. 9, which shows the structure of the ConvGRU module. Each module takes as input the value x_t of one picture after convolution and the hidden-state value h_(t-1) produced when the previous picture was processed by the ConvGRU module. Unlike the LSTM, the GRU module has only a reset gate and an update gate. After the input image is fused with the hidden state of the previous step, a sigmoid function is applied and the resulting value is multiplied point-wise with the previous hidden feature value; this is the update-gate operation of the GRU. After the hidden-state value is fused with the current picture feature value and given a weight, the sigmoid operation decides which feature information is retained: because the sigmoid pushes smaller values towards 0 and larger values towards 1, multiplying these near-0 or near-1 numbers with the previous hidden feature values keeps part of the features almost intact and resets the part multiplied by values close to 0, so the sigmoid function acts here as a filter. The update gate controls how much of the state at the previous time step is written into the current state; a larger update-gate value means a larger proportion of the previous input. The information of the reset gate, multiplied point-wise with the previous hidden-state information, is normalized by a tanh function; the tanh result is multiplied by the update-gate information and added to the previous state information to obtain the hidden-state information at this time step, which is then output or passed to the next ConvGRU module. This constitutes one cycle of the recurrent ConvGRU network.
The ConvGRU update equations are as follows:
Z_t = σ(W_xz * X_t + W_hz * H_(t-1))
R_t = σ(W_xr * X_t + W_hr * H_(t-1))
H'_t = f(W_xh * X_t + R_t ⊙ (W_hh * H_(t-1)))
H_t = (1 - Z_t) ⊙ H'_t + Z_t ⊙ H_(t-1)
In the above formulas, Z_t denotes the update gate, R_t the reset gate, and X_t the input at this time step; * denotes the convolution operation. W is a weight value whose subscript indicates the transition it performs; for example, W_xz maps the original input to the update-gate input, and the other weights are analogous. H denotes a state value: H_(t-1) is the state produced by the previous input after processing, H_t is the state finally output for the current input after the above operations, and ⊙ denotes the Hadamard product. All convolutions above are weighting operations with 1x1 kernels; f() is an activation function, a leaky ReLU with negative slope 0.2, used to normalize the output after each operation. The ConvGRU module itself is prior art.
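A minimal sketch of a ConvGRU cell implementing the update equations above might look as follows (1x1 convolutions and a leaky ReLU with negative slope 0.2 as f(·), per the description; everything else, such as zero initialisation of the hidden state for the first picture, is an assumption):

```python
import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    """Convolutional GRU cell: update gate Z_t, reset gate R_t, candidate state H'_t."""
    def __init__(self, channels):
        super().__init__()
        self.w_xz = nn.Conv2d(channels, channels, 1); self.w_hz = nn.Conv2d(channels, channels, 1)
        self.w_xr = nn.Conv2d(channels, channels, 1); self.w_hr = nn.Conv2d(channels, channels, 1)
        self.w_xh = nn.Conv2d(channels, channels, 1); self.w_hh = nn.Conv2d(channels, channels, 1)
        self.act = nn.LeakyReLU(0.2, inplace=True)          # f(.) in the update equations

    def forward(self, x_t, h_prev=None):
        if h_prev is None:                                   # first picture in the sequence
            h_prev = torch.zeros_like(x_t)
        z_t = torch.sigmoid(self.w_xz(x_t) + self.w_hz(h_prev))        # update gate
        r_t = torch.sigmoid(self.w_xr(x_t) + self.w_hr(h_prev))        # reset gate
        h_cand = self.act(self.w_xh(x_t) + r_t * self.w_hh(h_prev))    # candidate state H'_t
        return (1 - z_t) * h_cand + z_t * h_prev                       # new hidden state H_t
```

In single-layer fusion, the per-layer features of the C input pictures would be fed through such a cell one after another, from the picture with the largest deflection angle to the one with the smallest.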
Referring to fig. 4, the multilayer fusion module of the present embodiment is configured to enable all the four single-layer fusion features G1, G2, G3, and G4 output by the single-layer fusion module to be in the same size through one deconvolution layer, and pass through a ConvGRU module according to the sequence of G4, G3, G2, and G1, respectively, to obtain a multilayer fusion feature, where the multilayer fusion feature passes through one convolution layer with a convolution kernel size of 3 and a step size of 2, and two full-connection layers to obtain an overall feature;
please refer to fig. 10, which is a network fusion structure, the ConvGRU module is used for both single-layer fusion and multi-layer fusion, the left part of the upper diagram is an encoder, the right part of the upper diagram is a decoder, and the middle part of the upper diagram is a four-layer fusion module.
In this embodiment, the single-layer fusion structure fuses the outputs of each convolutional layer; the network has four convolutional layers, so there are 4 single-layer fusion branches. On top of the single-layer fusion structure, the multi-layer fusion structure summarizes the single-layer fusion results; it is more inclined to extract deep information from the several inputs and looks for dependencies among the features in a higher-dimensional space. The first ConvGRU module of the multi-layer fusion structure takes as input the result of the first convolutional layer after its single-layer fusion; the whole multi-layer fusion structure has 4 ConvGRU units, and the formula of this process is as follows:
H_4 = ConvGRU(G_4^↑),  H_k = ConvGRU(G_k^↑, H_(k+1)^↑) for k = 3, 2, 1,  H_F = conv(H_1)    (6)

In the above formula, H_F denotes the final output of the multi-layer fusion module, conv() denotes a convolution, G_n^↑ denotes the output of the n-th single-layer fusion branch after one up-sampling, and H^↑ denotes an intermediate output of the multi-layer fusion module after one up-sampling. In the multi-layer fusion structure the input size of each layer is different, so when data is passed from the bottom layer to a higher layer it is up-sampled once and fused with the input of that layer; after the last convolution of the fusion layer, the result is passed to the top deconvolution layer for deconvolution, finally generating the face picture.
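Reusing the ConvGRUCell sketched above, the single-layer and multi-layer fusion logic could be outlined roughly as follows; the assumption that the G_k features have already been brought to a common scale and channel width by their deconvolution layers, and the use of one ConvGRU cell per fused feature, are simplifications:

```python
def single_layer_fusion(cell, feature_seq):
    """Fuse the features of C pictures from one encoder layer,
    ordered from largest to smallest deflection angle."""
    h = None
    for x_t in feature_seq:          # C feature maps of identical shape
        h = cell(x_t, h)
    return h                         # G_k: single-layer fusion feature of this layer

def multi_layer_fusion(cells, g_feats):
    """Fuse the single-layer features in the order G4, G3, G2, G1.
    Assumes each G_k was already brought to a common scale by its deconvolution layer."""
    h = None
    for cell, g in zip(cells, g_feats):   # g_feats = (g4, g3, g2, g1), cells = 4 ConvGRUCell modules
        h = cell(g, h)
    return h                              # multi-layer fusion feature, before the final conv + two FC layers
```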
The multi-input fusion decoding module of the embodiment is composed of four deconvolution layers, two self-attention layers and two convolution layers; the system is used for adding Gaussian noise information into the overall features output by the multilayer fusion module for reconstruction to obtain new features F1, and then performing up-sampling on the features F1 to respectively form three features F2, F3 and F4 with different scales and then inputting the three features into a deconvolution layer; entering a deconvolution operation; the input of the deconvolution network of the first layer of the multi-input fusion decoding module is an up-sampling value of the output of the fourth layer of the multi-input fusion coding module after passing through the residual block and fused with F1; the input of the second layer of the deconvolution layer of the multi-input fusion decoding module is the result of the output of the previous layer of the deconvolution layer after passing through the residual block, F2 and the output of the third layer of the convolution layer of the multi-input fusion coding module after passing through the residual block are fused; the input of the deconvolution layer of the third layer of the multi-input fusion decoding module is the residual output of the deconvolution layer of the previous layer, the result of the output of the attention module after passing through the residual block, F3, the cross-layer input of the second layer of the convolution layer of the multi-input fusion coding module, and the fusion of the four values after the input picture is subjected to resize to be in a certain size; the input of the deconvolution layer of the fourth layer of the multi-input fusion decoding module is the result of the output of the attention module after passing through the residual block, the result of the output of the first convolution layer of the multi-input fusion coding module after passing through the parameter block, and the fusion input of the input picture; the face correction fine picture is output through the two convolution layers after the fourth layer of the multi-input fusion decoding module; after the input feature map passes through the unit, each feature map has a weight map which represents the association degree of each part in the feature map.
In this embodiment, cross-layer input means that every layer's input includes the output of the previous layer and, in addition, extra inputs taken across layers: the output of the first convolutional layer is input across layers to the second self-attention module, the output of the second convolutional layer is input across layers to the first self-attention module, and the outputs of the third and fourth convolutional layers are input across layers to the second and the first deconvolution layers, respectively. All cross-layer inputs are inputs that have first been fused by the ConvGRU modules.
In this embodiment, the network first adds Gaussian noise information to the 256 features that the encoding module provides through same-layer fusion, reconstructing them into an 8 x 8 x 64 feature; this feature is then up-sampled into three features of different sizes (32 x 32 x 64, 64 x 64 x 16 and 128 x 128 x 8), which are fed into the subsequent deconvolution layers. The up-sampling step here can be subdivided into a deconvolution followed by a ReLU activation. The deconvolution stage then begins. The input of the first deconvolution layer of the decoding module is the fused output of the fourth convolutional layer of the encoding module combined with the up-sampled 8 x 8 x 64 feature. The input of the second deconvolution layer is the fusion of the previous deconvolution layer's output after a residual block with the same-layer-fused output of the third convolutional layer of the encoding module. The input of the third deconvolution layer is the fusion of four values: the residual output of the previous deconvolution layer, the 32 x 32 x 64 up-sampled value, the cross-layer input of the fused second convolutional layer of the encoding module, and the input picture resized to 32 x 32. The third deconvolution layer is followed by the self-attention network layer. The input of the fourth deconvolution layer is the fusion of the self-attention output after a residual block, the output of the first convolutional layer of the encoding module after a residual block, and the input picture resized to 64 x 64 x 3. The last deconvolution layer is followed by a convolutional layer that fuses the pictures generated by the two generators; its input is the result of the local face generator after the organs have been arranged, the residual-block output of the first layer of the encoding module, and the cross-layer input of the original input picture. A 128 x 128 x 3 fine face correction picture can then be output after two convolutional layers.
Referring to fig. 5, the generation countermeasure network identification module of this embodiment is composed of seven convolutional layers, with residual blocks added to the penultimate and antepenultimate layers; better convergence can thus be obtained with a shallower network structure.
This embodiment uses a convolution filter of size 1 in the last layer to reduce dimensionality while preserving the spatial structure of the image. The discriminator finally produces a 4x4 probability map. Compared with a single probability value for the authenticity of the whole picture, this map can concentrate on the organ receptive fields of different regions of the face; in particular, taking the coordinates of the upper-left cell as (0,0), the six receptive fields (1,1), (1,2), (1,3), (2,1), (2,2) and (2,3) just cover the facial organ regions. This improves the discrimination ability of the generation countermeasure network identification module with respect to the organs and, fed back to the multi-input fusion countermeasure generation network, helps it generate finer face pictures.
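As a rough sketch only, a seven-layer convolutional discriminator that ends in a 1x1 convolution and outputs a 4x4 probability map from a 128x128 input might be built as below; the channel widths, strides and padding are assumptions, and ResidualBlock refers to the block sketched with the encoder above:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Seven convolutional layers; residual blocks on the two layers before the last;
    a final 1x1 convolution produces a 4x4 probability map from a 128x128 input."""
    def __init__(self, in_ch=3):
        super().__init__()
        chs = [64, 128, 256, 512, 512, 512]                   # assumed channel widths
        layers, prev = [], in_ch
        for i, ch in enumerate(chs):
            stride = 2 if i < 5 else 1                        # 128 -> 64 -> 32 -> 16 -> 8 -> 4 -> 4
            layers += [nn.Conv2d(prev, ch, 3, stride=stride, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            if i >= 4:                                        # residual blocks on the 5th and 6th layers
                layers.append(ResidualBlock(ch))
            prev = ch
        layers.append(nn.Conv2d(prev, 1, 1))                  # seventh layer: 1x1 convolution
        self.net = nn.Sequential(*layers)

    def forward(self, x):                                     # x: (B, 3, 128, 128)
        return torch.sigmoid(self.net(x))                     # (B, 1, 4, 4) probability map
```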
Step 2: inputting the depression angle face image to be corrected into a multi-input fusion confrontation generation network to obtain a face correction fine picture.
In the embodiment, a multi-input fusion countermeasure generation network needs to be trained, and the trained multi-input fusion countermeasure generation network is obtained; the specific implementation comprises the following substeps:
step 1.1: making a training set comprising a front image dataset IFAnd depression angle image data set IP
In this embodiment, a face depression-angle data set TFD is used. The TFD data set contains N individuals; each individual has one frontal face picture and K depression-angle face pictures, with exactly one picture per depression angle. These depression-angle photographs cover almost all depression angles at which the frontal face might be pictured. All frontal images are first taken from the TFD data set to form the frontal image data set I_F, and the other depression-angle images form the depression-angle image data set I_P; all depression-angle images of each individual are kept in folders distinguished by the individual's name. A individuals are used as the training data set and B individuals as the test data set. In both the training data set and the test data set, each depression-angle face picture forms a face picture pair with the corresponding frontal face picture.
In this embodiment, the TFD data set contains 926 individuals with 6 pictures each: one frontal picture and 5 face depression-angle pictures at 15°, 30°, 45°, 60° and 75°, giving 5556 pictures in total. 700 individuals are taken as the training data set and 226 individuals as the test data set, and the depression-angle pictures of each individual are paired with that individual's frontal picture; all pictures are strictly required to be of size 128 x 128.
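A small illustrative helper for building the (depression-angle picture, frontal picture) training pairs could look as follows; the directory layout and file names (front.jpg, pitch_XX.jpg) are purely hypothetical:

```python
from pathlib import Path

def build_pairs(root, identities):
    """Build (depression-angle picture, frontal picture) pairs for the listed identities.
    Assumes a hypothetical layout root/<identity>/front.jpg and root/<identity>/pitch_XX.jpg."""
    pairs = []
    for ident in identities:
        person_dir = Path(root) / ident
        frontal = person_dir / "front.jpg"
        for pitch in (15, 30, 45, 60, 75):
            profile = person_dir / f"pitch_{pitch}.jpg"
            if profile.exists() and frontal.exists():
                pairs.append((profile, frontal))
    return pairs
```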
Step 1.2: a depression angle image data set IPInputting the depression angle picture into a multi-input fusion countermeasure generation network, and collecting a front image data set IFAgainst a generator of a multiple-input fusion generation network, a generated picture I to be generatedGCalculating pixel loss, identity retention loss, confrontation loss, total variation regularization and total loss;
In this embodiment, the identity-preserving loss evaluates the difference between the generated frontal photo and the real frontal face photo, which makes it possible to judge whether the face generated by the model is accurate and reliable. The LightCNN algorithm is used to extract the face features, and the Euclidean distance between the generated face features and the real face features is computed as the identity loss of the network.
L_ID = Σ_i || d_i(G(I_input)) − d_i(I_gt) ||_2    (7)
where d_i() is the feature extracted by the i-th layer from the end of LightCNN's feature-extraction network, ||·|| is the Euclidean distance, G(I_input) is the frontal face picture generated by feeding the depression-angle picture I_input into the generator, and I_gt is the real frontal face picture. LightCNN is a model trained on many thousands of pictures; it can accurately extract the key features of the face and provides a reliable classification effect.
Total variation regularization:
L_tv = Σ_(c=1..C) Σ_(w,h) ( |I_G^(w+1,h,c) − I_G^(w,h,c)| + |I_G^(w,h+1,c) − I_G^(w,h,c)| )    (8)
pixel loss:
L_pixel = Σ_(s∈S) 1/(W_s·H_s·C) Σ_(w=1..W_s) Σ_(h=1..H_s) Σ_(c=1..C) | I_G^(s,w,h,c) − I_gt^(s,w,h,c) |    (9)
where S is the set of picture scales: in this embodiment a 128 x 128 image is used as input and a 128 x 128 picture is regenerated, and by reducing its scale the pixel loss also captures information over a range of sizes; three image scales of 128 x 128, 64 x 64 and 32 x 32 are used. W_s and H_s denote the width and height of the picture at scale s, C denotes the color channels, and G() denotes the generator output. In formulas 8 and 9, w and h denote the width and height coordinates of the picture; I_G^(w,h,c) denotes the pixel of the generated picture at channel c and position (w, h) when computing the total-variation regularization; I_G^(s,w,h,c) denotes the pixel value of the generated picture at scale s and position (w, h, c) when computing the pixel loss; and I_gt^(s,w,h,c) denotes the corresponding pixel value of the real picture at scale s.
The adversarial loss is an essential part of the adversarial generation framework: through the adversarial learning of the generator G and the discriminator D the whole network can be made to perform better. This embodiment adopts the conventional adversarial loss; the loss function of the adversarial network is as follows:
L_adv = E_(I_gt)[ log D_θD(I_gt) ] + E_(I_input)[ log(1 − D_θD(G_n(I_input))) ]    (10)
the method is characterized in that a classical confrontation generation network formula is adopted, the identifier evaluates the pictures, gives the highest value of the real front picture and the minimum value of the generated picture to the greatest extent, and judges that the generator is trained when the identifier evaluates the real picture and the generated picture to be consistent. In equation 10, E () represents the expected value, I, of the distribution functiongtRepresenting a real picture, IinputSide face picture representing input, GnA generator representing an nth iteration. DθDRepresenting a pre-trained discriminator.
The total loss of the network proposed by this embodiment is a linear combination of the above losses, and its expression form is as follows:
L = λ_1·L_ID + λ_2·L_pixel + λ_3·L_adv + λ_4·L_tv    (11)
In the above formula, L denotes the calculated total loss; L_ID, L_pixel, L_adv and L_tv denote the identity loss, pixel loss, adversarial loss and total-variation regularization respectively; λ_1, λ_2, λ_3 and λ_4 are the weights of the different loss terms, set to 0.1, 20, 0.1 and 10^(-4) respectively.
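A hedged sketch of how the loss terms of equations (7) to (11) could be combined, with the weights 0.1, 20, 0.1 and 10^(-4) given above; light_cnn is a placeholder for the pretrained feature extractor and is assumed to return a list of feature tensors, and the normalized TV term and the non-saturating generator loss are implementation choices, not taken from the patent:

```python
import torch
import torch.nn.functional as F

def total_variation(img):
    """Total-variation regularization over the generated image (cf. Eq. 8, mean-normalized)."""
    return (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean() + \
           (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()

def multiscale_pixel_loss(fake, real, scales=(128, 64, 32)):
    """Mean absolute pixel error at the three scales of Eq. 9."""
    loss = 0.0
    for s in scales:
        loss = loss + F.l1_loss(
            F.interpolate(fake, size=(s, s), mode="bilinear", align_corners=False),
            F.interpolate(real, size=(s, s), mode="bilinear", align_corners=False))
    return loss

def generator_loss(fake, real, disc, light_cnn,
                   w_id=0.1, w_pix=20.0, w_adv=0.1, w_tv=1e-4):
    """Linear combination of identity, pixel, adversarial and TV terms (Eq. 11)."""
    # squared Euclidean distance between deep features (cf. Eq. 7)
    id_loss = sum(F.mse_loss(a, b) for a, b in zip(light_cnn(fake), light_cnn(real)))
    pix_loss = multiscale_pixel_loss(fake, real)
    adv_loss = -torch.log(disc(fake) + 1e-8).mean()      # generator side of Eq. 10 (non-saturating form)
    tv_loss = total_variation(fake)
    return w_id * id_loss + w_pix * pix_loss + w_adv * adv_loss + w_tv * tv_loss
```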
Step 1.3: using an optimizer Adam, setting parameters as defaults, performing iterative training on the multi-input fusion countermeasure generation network, and continuously optimizing the model through a back propagation and gradient descent method according to the error of each forward propagation calculation to finally obtain the trained multi-input fusion countermeasure generation network;
In this embodiment the network parameters are set as follows: Adam is used as the network optimizer, the batch size is set to 32, the initial learning rate is set to 0.001, and the learning rate decays to 0.9 times its value every 96 batches. The network is trained for 400,000 iterations in total.
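A minimal training-loop sketch under the stated settings (Adam with default parameters, batch size 32, initial learning rate 0.001, learning rate multiplied by 0.9 every 96 batches, 400,000 iterations); the generator, discriminator, light_cnn and data iterator are assumed to exist, and generator_loss is the sketch above:

```python
import torch

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)          # default betas
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
g_sched = torch.optim.lr_scheduler.StepLR(g_opt, step_size=96, gamma=0.9)
d_sched = torch.optim.lr_scheduler.StepLR(d_opt, step_size=96, gamma=0.9)

for step in range(400_000):
    profiles, frontal = next(train_iter)                            # depression-angle pictures + ground truth
    fake = generator(profiles)

    # discriminator step: real frontal pictures scored high, generated pictures scored low
    d_loss = -(torch.log(discriminator(frontal) + 1e-8).mean()
               + torch.log(1 - discriminator(fake.detach()) + 1e-8).mean())
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # generator step: total loss of Eq. (11)
    g_loss = generator_loss(fake, frontal, discriminator, light_cnn)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
    g_sched.step(); d_sched.step()
```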
Step 1.4: test the test set with the trained model, compare the obtained images with the frontal pictures in the frontal image data set I_F, and calculate the rank-1 index. The calculated indices show that, in multi-pose correction at all angles, the method achieves almost the best results, with rank-1 values all above 95.
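One illustrative way to compute the rank-1 index, assuming an embedding function embed (for example LightCNN features) that maps a batch of images to one feature vector per image:

```python
import torch

def rank1_accuracy(embed, generated_imgs, gallery_imgs, gallery_ids, query_ids):
    """A query counts as correct when its nearest gallery embedding has the same identity."""
    q = torch.nn.functional.normalize(embed(generated_imgs), dim=1)   # (Nq, D)
    g = torch.nn.functional.normalize(embed(gallery_imgs), dim=1)     # (Ng, D)
    nearest = (q @ g.t()).argmax(dim=1)                               # cosine-similarity nearest neighbour
    correct = sum(query_ids[i] == gallery_ids[j] for i, j in enumerate(nearest.tolist()))
    return correct / len(query_ids)
```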
For comparison with other methods, this embodiment uses the M2FPA, DFW and CAS-PEAL-R1 data sets for training and testing. As shown in Table 1, at a 15° depression angle the accuracy of the invention is always among the top two for the different side-face angles, and the invention is especially robust at large angles. Fig. 6 shows the effect of face correction at a depression angle on the M2FPA data set: the method recovers a clear picture that is sufficiently similar to the frontal picture. Fig. 7 shows the corrected faces obtained on the DFW data set with two pictures as input: with several input pictures the skin and facial organs are recovered better, which is sufficient to demonstrate the effectiveness of the method.
Table 1. Rank-1 accuracy comparison with other methods at different angles (reproduced as an image in the original document).
FIG. 8 compares the depression-angle face image correction method based on the self-attention mechanism with DA-GAN, TP-GAN and other methods on the DFW data set. Table 1 shows the advantage of our method: its rank-1 accuracy is the highest at almost all angles, and even at the 30° angle, where it is not the best, it achieves the second-best value of 99.5.
The invention adopts several state-of-the-art techniques:
(1) A self-attention module is added to the face generation network in order to make the correction effect finer.
(2) The picture is divided into multiple regions for discrimination. Experiments show that the self-attention module fuses the fine features retained over the whole face with the rich local features provided by the local face generation network, and can generate a face correction picture with fine features.
(3) An input-enhanced multi-input face depression-angle correction network is provided. In the enhanced input module a ConvGRU module is added; it can extract the correlated features in several depression-angle face images so that their information complements one another, reduce the number of network parameters, lower the complexity of network training and improve the efficiency of the model.
(4) The invention can efficiently reconstruct a real, accurate and high-precision frontal face image from a single depression-angle picture or from several of them.
It should be understood that the above description of the preferred embodiments is given for clarity and not for any purpose of limitation, and that various changes, substitutions and alterations can be made herein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (5)

1. A depression angle face image correction method based on a self-attention mechanism is characterized by comprising the following steps:
step 1: constructing a multi-input fusion countermeasure generation network based on an attention mechanism;
the multi-input fusion confrontation generation network comprises a multi-input fusion coding module, a self-attention module, a single-layer fusion module, a multi-input fusion decoding module and a confrontation generation network identification module;
the multi-input fusion coding module comprises four convolutional layers which are arranged in series, the first layer is a convolutional layer with the convolutional kernel size of 7, and the step length is 1; the second layer is a convolution layer with convolution kernel size of 5 and step length of 2; the third and fourth layers are convolution layers with convolution kernel size of 3, and step length is 2; a residual block is added behind the first layer and the second layer of convolution layers, and a normalization layer, an activation layer and a residual block are sequentially added behind the third layer and the fourth layer of convolution layers;
the self-attention module is used for constructing, through convolution kernels of size 1, three feature maps f, g and h from the feature map F output by the multi-input fusion coding module; matrix multiplication and a softmax operation are performed on feature map f and feature map g to obtain the matrix feature map β_(i,j); β_(i,j) is then multiplied with feature map h to obtain the weighted value o_j, and the weighted value is added to the feature map F before being output;
the single-layer fusion module is used for fusing a plurality of picture features of the C pictures output by each convolution layer in the multi-input fusion coding module through the C ConvGRU modules;
the multilayer fusion module is used for enabling all the characteristics to be in the same scale by respectively passing through a deconvolution layer for four single-layer fusion characteristics G1, G2, G3 and G4 output by the single-layer fusion module, and respectively passing through a ConvGRU module according to the sequence of G4, G3, G2 and G1 to finally obtain multilayer fusion characteristics, wherein the multilayer fusion characteristics pass through a convolution layer with a convolution kernel size of 3 and a step size of 2 and two full-connection layers to obtain overall characteristics;
the multi-input fusion decoding module consists of four deconvolution layers, two self-attention layers and two convolution layers; the system is used for adding Gaussian noise information into the total features output by the multilayer fusion module for reconstruction to obtain new features F1, and then performing up-sampling on the features F1 to respectively form three features F2, F3 and F4 with different scales and then inputting the three features into a deconvolution layer; entering a deconvolution operation; the input of the deconvolution network of the first layer of the multi-input fusion decoding module is an up-sampling value of the output of the fourth layer of the multi-input fusion coding module after passing through the residual block and fused with F1; the input of the second layer of the convolution layer of the multi-input fusion decoding module is the result of the output of the previous layer of convolution layer after passing through the residual block, F2 and the output of the third layer of the convolution layer of the multi-input fusion encoding module after passing through the residual block are fused; the input of the deconvolution layer of the third layer of the multi-input fusion decoding module is the residual output of the deconvolution layer of the previous layer, the result of the output of the attention module after passing through the residual block, F3, the cross-layer input of the second layer of the convolution layer of the multi-input fusion coding module, and the fusion of the four values after the input picture is subjected to resize to be in a certain size; the input of the deconvolution layer of the fourth layer of the multi-input fusion decoding module is the result of the output of the attention module after passing through the residual block, the result of the output of the first convolution layer of the multi-input fusion coding module after passing through the parameter block, and the fusion input of the input picture; the face correction fine picture is output through the two convolution layers after the fourth layer of the multi-input fusion decoding module; after the input feature map passes through the unit, each feature map has a weight map which represents the association degree of each part in the feature map;
the generative adversarial network discriminator module consists of seven convolution layers, with residual blocks added to the penultimate and antepenultimate layers;
Step 2: inputting the depression angle face image to be corrected into the multi-input fusion adversarial generation network to obtain a refined corrected face picture.
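A minimal PyTorch-style sketch of the self-attention operation described in the claim above: 1x1 convolutions build f, g and h, a softmax over the product of f and g gives the attention map βi,j, which weights h, and the weighted values are added back to the input feature map. The channel reduction factor and the learnable scale gamma are illustrative assumptions not specified in the claim.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Self-attention block: f, g, h come from 1x1 convolutions; beta = softmax(f^T g)
    weights h, and the weighted values are added back to the input feature map."""
    def __init__(self, channels, reduction=8):  # reduction factor is an assumption
        super().__init__()
        self.f = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.g = nn.Conv2d(channels, channels // reduction, kernel_size=1)
        self.h = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable scale, assumed

    def forward(self, x):
        b, c, hgt, wid = x.shape
        n = hgt * wid
        f = self.f(x).view(b, -1, n)                                # B x C' x N
        g = self.g(x).view(b, -1, n)                                # B x C' x N
        h = self.h(x).view(b, c, n)                                 # B x C  x N
        beta = F.softmax(torch.bmm(f.transpose(1, 2), g), dim=-1)   # B x N x N attention map
        o = torch.bmm(h, beta).view(b, c, hgt, wid)                 # weighted values o_j
        return x + self.gamma * o                                    # add back to the input
```

For example, SelfAttention(256)(torch.randn(1, 256, 16, 16)) returns a tensor of the same shape as its input.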
2. The depression-angle face image correction method based on the self-attention mechanism according to claim 1, characterized in that: in step 1, the multi-input fusion adversarial generation network is trained to obtain a trained multi-input fusion adversarial generation network; the specific implementation comprises the following substeps:
Step 1.1: making a training set comprising a front image data set IF and a depression angle image data set IP;
Step 1.2: inputting the depression angle pictures of the depression angle image data set IP into the multi-input fusion adversarial generation network, and calculating the pixel loss, identity preservation loss, adversarial loss, total variation regularization and total loss between the picture IG produced by the generator of the multi-input fusion adversarial generation network and the front image data set IF (a loss sketch follows this claim);
Step 1.3: using the Adam optimizer with default parameters, iteratively training the multi-input fusion adversarial generation network, and continuously optimizing the model through back propagation and gradient descent according to the error computed at each forward propagation, to finally obtain the trained multi-input fusion adversarial generation network;
Step 1.4: testing the test set with the trained model, comparing the obtained images with the front pictures in the front image data set IF, and calculating the rank-1 index.
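A hedged sketch of the loss terms named in step 1.2 and the optimizer setup of step 1.3. The claim only names the terms; the concrete forms used here (L1 for the pixel and identity terms, binary cross-entropy for the adversarial term) and the weights w_pix, w_id, w_adv and w_tv are assumptions, as are the placeholder tensors d_fake, id_feat_fake and id_feat_real.

```python
import torch
import torch.nn.functional as F
from torch import optim

def total_variation(img):
    """Total variation regularization over a batch of images of shape (B, C, H, W)."""
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return dh + dw

def generator_loss(I_G, I_F, d_fake, id_feat_fake, id_feat_real,
                   w_pix=10.0, w_id=1.0, w_adv=0.1, w_tv=1e-4):  # weights are assumptions
    pixel_loss = F.l1_loss(I_G, I_F)                       # pixel loss against the frontal target
    identity_loss = F.l1_loss(id_feat_fake, id_feat_real)  # identity preservation loss on ID features
    adv_loss = F.binary_cross_entropy_with_logits(         # adversarial loss from discriminator scores
        d_fake, torch.ones_like(d_fake))
    tv_loss = total_variation(I_G)                         # total variation regularization
    return w_pix * pixel_loss + w_id * identity_loss + w_adv * adv_loss + w_tv * tv_loss

# Step 1.3: Adam with default parameters; generator and discriminator are
# placeholders for the networks defined elsewhere in the patent.
# optimizer_G = optim.Adam(generator.parameters())
# optimizer_D = optim.Adam(discriminator.parameters())
```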
3. The depression-angle face image correction method based on the self-attention mechanism according to claim 1, characterized in that: in step 1, each control gate of the ConvGRU module is preceded by a weight map, and the feature map is updated and retained according to the weight map.
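One possible reading of claim 3, sketched as a ConvGRU cell in which a weight map is computed before each control gate and modulates the gate input, so the feature map is updated or retained according to the weight maps; the exact placement and form of the weight maps are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class WeightedConvGRUCell(nn.Module):
    """ConvGRU cell in which a weight map is computed before each control gate
    and used to decide how much of the feature map is updated or retained."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2
        self.weight_z = nn.Conv2d(2 * channels, 1, kernel_size, padding=pad)  # weight map for update gate
        self.weight_r = nn.Conv2d(2 * channels, 1, kernel_size, padding=pad)  # weight map for reset gate
        self.conv_z = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)
        self.conv_r = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)
        self.conv_h = nn.Conv2d(2 * channels, channels, kernel_size, padding=pad)

    def forward(self, x, h_prev):
        xh = torch.cat([x, h_prev], dim=1)
        z = torch.sigmoid(self.conv_z(torch.sigmoid(self.weight_z(xh)) * xh))  # update gate
        r = torch.sigmoid(self.conv_r(torch.sigmoid(self.weight_r(xh)) * xh))  # reset gate
        h_tilde = torch.tanh(self.conv_h(torch.cat([x, r * h_prev], dim=1)))   # candidate state
        return (1 - z) * h_prev + z * h_tilde                                   # retain vs. update
```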
4. The depression-angle face image correction method based on the self-attention mechanism according to any one of claims 1 to 3, characterized in that: in step 1, the multi-input fusion coding module comprises four convolution layers and four ConvGRU units for integral fusion, with one ConvGRU unit corresponding to each convolution layer.
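A sketch of the convolutional trunk of the multi-input fusion coding module, using the kernel sizes 7/5/3/3 and strides 1/2/2/2 stated in the system claim below; the channel widths, padding and normalization type are assumptions, and the residual blocks and the per-layer ConvGRU fusion units are omitted for brevity.

```python
import torch
import torch.nn as nn

def conv_block(in_c, out_c, k, s, with_norm=False):
    layers = [nn.Conv2d(in_c, out_c, kernel_size=k, stride=s, padding=k // 2)]
    if with_norm:  # third and fourth layers add a normalization layer and an activation layer
        layers += [nn.InstanceNorm2d(out_c), nn.LeakyReLU(0.2)]
    return nn.Sequential(*layers)

class MultiInputEncoder(nn.Module):
    """Convolutional trunk of the multi-input fusion coding module: kernel sizes 7/5/3/3
    and strides 1/2/2/2 follow the claim; channel widths and norm type are assumptions."""
    def __init__(self, in_channels=3):
        super().__init__()
        self.layer1 = conv_block(in_channels, 64, k=7, s=1)
        self.layer2 = conv_block(64, 128, k=5, s=2)
        self.layer3 = conv_block(128, 256, k=3, s=2, with_norm=True)
        self.layer4 = conv_block(256, 512, k=3, s=2, with_norm=True)

    def forward(self, x):
        f1 = self.layer1(x)
        f2 = self.layer2(f1)
        f3 = self.layer3(f2)
        f4 = self.layer4(f3)
        return f1, f2, f3, f4  # per-layer features, later fused across the C input pictures
```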
5. A depression angle human face image correction system based on a self-attention mechanism is characterized by comprising the following modules:
module 1 is used for constructing a multi-input fusion adversarial generation network based on an attention mechanism;
the multi-input fusion adversarial generation network comprises a multi-input fusion coding module, a self-attention module, a single-layer fusion module, a multi-input fusion decoding module and a generative adversarial network discriminator module;
the multi-input fusion coding module comprises four convolution layers arranged in series: the first layer is a convolution layer with convolution kernel size 7 and stride 1; the second layer is a convolution layer with convolution kernel size 5 and stride 2; the third and fourth layers are convolution layers with convolution kernel size 3 and stride 2; a residual block is added after the first and second convolution layers, and a normalization layer, an activation layer and a residual block are added in sequence after the third and fourth convolution layers;
the self-attention module is used for constructing three feature maps f, g and h from the feature map F output by the multi-input fusion coding module through convolution kernels of size 1; matrix multiplication and a softmax operation are performed on feature map f and feature map g to obtain the attention feature map βi,j, βi,j is then multiplied by feature map h to obtain the weighted values oj, and the weighted values are added to the feature map F before it is output;
the single-layer fusion module is used for fusing, through C ConvGRU modules, the features of the C input pictures output by each convolution layer of the multi-input fusion coding module;
the multilayer fusion module is used for bringing the four single-layer fusion features G1, G2, G3 and G4 output by the single-layer fusion module to the same scale by passing each through a deconvolution layer, and then passing them through a ConvGRU module in the order G4, G3, G2, G1 to obtain the multilayer fusion feature; the multilayer fusion feature then passes through a convolution layer with convolution kernel size 3 and stride 2 and two fully connected layers to obtain the overall feature;
the multi-input fusion decoding module consists of four deconvolution layers, two self-attention layers and two convolution layers; it is used for adding Gaussian noise information to the overall feature output by the multilayer fusion module and reconstructing it to obtain a new feature F1, and for up-sampling the feature F1 into three features F2, F3 and F4 of different scales before the deconvolution operations; the input of the first deconvolution layer of the multi-input fusion decoding module is the up-sampled output of the fourth layer of the multi-input fusion coding module after passing through a residual block, fused with F1; the input of the second deconvolution layer of the multi-input fusion decoding module is the fusion of the output of the previous deconvolution layer after passing through a residual block, F2, and the output of the third convolution layer of the multi-input fusion coding module after passing through a residual block; the input of the third deconvolution layer of the multi-input fusion decoding module is the fusion of four values: the output of the previous deconvolution layer after passing through the self-attention module and a residual block, F3, the cross-layer input from the second convolution layer of the multi-input fusion coding module, and the input picture resized to a certain size; the input of the fourth deconvolution layer of the multi-input fusion decoding module is the fusion of the output of the self-attention module after passing through a residual block, the output of the first convolution layer of the multi-input fusion coding module after passing through the parameter block, and the input picture; the refined corrected face picture is output through the two convolution layers after the fourth deconvolution layer of the multi-input fusion decoding module; after the input feature maps pass through this unit, each feature map has a weight map representing the degree of association of each part of the feature map;
the generative adversarial network discriminator module consists of seven convolution layers, with residual blocks added to the penultimate and antepenultimate layers (a discriminator sketch follows this claim);
and module 2 is used for inputting the depression angle face image to be corrected into the multi-input fusion adversarial generation network to obtain a refined corrected face picture.
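A hedged sketch of the generative adversarial network discriminator module: seven convolution layers with residual blocks inserted near the end. Only the seven-convolution structure and the late residual blocks follow the claim; the channel widths, strides, activation and the patch-level scoring head are assumptions.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Plain 3x3 residual block; the internal structure is an assumption."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.LeakyReLU(0.2),
            nn.Conv2d(channels, channels, 3, padding=1))

    def forward(self, x):
        return x + self.body(x)

class Discriminator(nn.Module):
    """Seven-convolution discriminator with residual blocks after the antepenultimate
    and penultimate convolutions; channel widths and strides are assumptions."""
    def __init__(self, in_channels=3):
        super().__init__()
        widths = [64, 64, 128, 128, 256, 256, 512]  # assumed channel progression
        layers, c_in = [], in_channels
        for i, c_out in enumerate(widths):
            stride = 2 if i % 2 == 0 else 1
            layers += [nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1), nn.LeakyReLU(0.2)]
            if i in (4, 5):                          # residual blocks near the end, per the claim
                layers.append(ResidualBlock(c_out))
            c_in = c_out
        self.features = nn.Sequential(*layers)
        self.score = nn.Conv2d(c_in, 1, kernel_size=3, padding=1)  # patch-level real/fake score

    def forward(self, x):
        return self.score(self.features(x))
```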
CN202110899936.2A 2021-08-06 2021-08-06 Depression angle face image correction method and system based on self-attention mechanism Active CN113706404B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110899936.2A CN113706404B (en) 2021-08-06 2021-08-06 Depression angle face image correction method and system based on self-attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110899936.2A CN113706404B (en) 2021-08-06 2021-08-06 Depression angle face image correction method and system based on self-attention mechanism

Publications (2)

Publication Number Publication Date
CN113706404A true CN113706404A (en) 2021-11-26
CN113706404B CN113706404B (en) 2023-11-21

Family

ID=78651853

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110899936.2A Active CN113706404B (en) 2021-08-06 2021-08-06 Depression angle face image correction method and system based on self-attention mechanism

Country Status (1)

Country Link
CN (1) CN113706404B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114639156A (en) * 2022-05-17 2022-06-17 武汉大学 Depression angle face recognition method and system based on axial attention weight distribution network
CN115731243A (en) * 2022-11-29 2023-03-03 北京长木谷医疗科技有限公司 Spine image segmentation method and device based on artificial intelligence and attention mechanism

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902667A (en) * 2019-04-02 2019-06-18 电子科技大学 Human face in-vivo detection method based on light stream guide features block and convolution GRU
CN110059602A (en) * 2019-04-10 2019-07-26 武汉大学 A kind of vertical view face antidote based on orthographic projection eigentransformation
US20200265219A1 (en) * 2017-09-18 2020-08-20 Board Of Trustees Of Michigan State University Disentangled representation learning generative adversarial network for pose-invariant face recognition
CN112418074A (en) * 2020-11-20 2021-02-26 重庆邮电大学 Coupled posture face recognition method based on self-attention
CN112818850A (en) * 2021-02-01 2021-05-18 华南理工大学 Cross-posture face recognition method based on progressive neural network and attention mechanism

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200265219A1 (en) * 2017-09-18 2020-08-20 Board Of Trustees Of Michigan State University Disentangled representation learning generative adversarial network for pose-invariant face recognition
CN109902667A (en) * 2019-04-02 2019-06-18 电子科技大学 Human face in-vivo detection method based on light stream guide features block and convolution GRU
CN110059602A (en) * 2019-04-10 2019-07-26 武汉大学 A kind of vertical view face antidote based on orthographic projection eigentransformation
CN112418074A (en) * 2020-11-20 2021-02-26 重庆邮电大学 Coupled posture face recognition method based on self-attention
CN112818850A (en) * 2021-02-01 2021-05-18 华南理工大学 Cross-posture face recognition method based on progressive neural network and attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YANG ZHUANG; WU BIN; LIAN WEIWEN; HAN XING: "Face recognition based on attention mechanism and deep identity mapping", Transducer and Microsystem Technologies, no. 09, pages 150 - 153 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114639156A (en) * 2022-05-17 2022-06-17 武汉大学 Depression angle face recognition method and system based on axial attention weight distribution network
CN114639156B (en) * 2022-05-17 2022-07-22 武汉大学 Depression angle face recognition method and system based on axial attention weight distribution network
CN115731243A (en) * 2022-11-29 2023-03-03 北京长木谷医疗科技有限公司 Spine image segmentation method and device based on artificial intelligence and attention mechanism
CN115731243B (en) * 2022-11-29 2024-02-09 北京长木谷医疗科技股份有限公司 Spine image segmentation method and device based on artificial intelligence and attention mechanism

Also Published As

Publication number Publication date
CN113706404B (en) 2023-11-21

Similar Documents

Publication Publication Date Title
CN113221641B (en) Video pedestrian re-identification method based on generation of antagonism network and attention mechanism
CN108537743A (en) A kind of face-image Enhancement Method based on generation confrontation network
CN107292225B (en) Face recognition method
CN111368672A (en) Construction method and device for genetic disease facial recognition model
CN112784929B (en) Small sample image classification method and device based on double-element group expansion
CN113706404B (en) Depression angle face image correction method and system based on self-attention mechanism
CN112801015A (en) Multi-mode face recognition method based on attention mechanism
CN113379655B (en) Image synthesis method for generating antagonistic network based on dynamic self-attention
CN115050064A (en) Face living body detection method, device, equipment and medium
CN113205002B (en) Low-definition face recognition method, device, equipment and medium for unlimited video monitoring
CN115966010A (en) Expression recognition method based on attention and multi-scale feature fusion
CN117079098A (en) Space small target detection method based on position coding
CN114596622A (en) Iris and periocular antagonism adaptive fusion recognition method based on contrast knowledge drive
CN111444957B (en) Image data processing method, device, computer equipment and storage medium
Zhao et al. [Retracted] Hybrid Depth‐Separable Residual Networks for Hyperspectral Image Classification
CN112836605B (en) Near-infrared and visible light cross-modal face recognition method based on modal augmentation
CN110728238A (en) Personnel re-detection method of fusion type neural network
CN114429646A (en) Gait recognition method based on deep self-attention transformation network
CN118053232A (en) Enterprise safety intelligent management system and method thereof
CN117934849A (en) Deep learning-based RGB-D image semantic segmentation method
CN113450297A (en) Fusion model construction method and system for infrared image and visible light image
CN113221683A (en) Expression recognition method based on CNN model in teaching scene
CN112907692A (en) SFRC-GAN-based sketch-to-face reconstruction method
CN109583406B (en) Facial expression recognition method based on feature attention mechanism
CN111461061A (en) Pedestrian re-identification method based on camera style adaptation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant