CN114445889A - Lightweight face aging method based on double attention mechanism - Google Patents

Lightweight face aging method based on double attention mechanism

Info

Publication number
CN114445889A
CN114445889A
Authority
CN
China
Prior art keywords
age
image
face
aging
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210095562.3A
Other languages
Chinese (zh)
Inventor
马小林
郭翔
张家亮
旷海兰
刘新华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202210095562.3A
Publication of CN114445889A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a lightweight face aging method based on a double attention mechanism, comprising the following steps: preprocessing the input face image; encoding the input target aging age and converting it into a multi-dimensional age vector; extracting identity features from the preprocessed face image to obtain high-dimensional identity features; feeding the multi-dimensional age vector into a multi-layer perceptron, which maps it to age-related high-dimensional age features; fusing the high-dimensional identity features and the high-dimensional age features through an adaptive instance normalization layer to obtain a fused feature vector; applying skip connections, upsampling and multi-scale conventional convolution to the fused feature vector to obtain a texture attention map and a color attention map; and fusing the texture attention map, the color attention map and the input original image to obtain the face image aged to the target age. The method ultimately yields a high-resolution image of the face aged to the target age.

Description

Lightweight face aging method based on double attention mechanism
Technical Field
The invention belongs to the technical field of digital image processing, and particularly relates to a lightweight face aging method based on a double attention mechanism.
Background
With rising living standards, people's pursuit of social-entertainment quality has gradually increased, and short videos have become the most popular form of social entertainment. In short-video shooting, special effects that simulate face aging and rejuvenation are widely praised by users for their realism and entertainment value. However, face aging is a very complex process in which many factors must be considered comprehensively. Traditional algorithms require complex structural designs to achieve realistic aging effects, and especially for high-resolution face aging the computation required for a single image is huge, which hinders deployment on embedded devices such as mobile phones.
To address the difficulty of deploying face aging algorithms on embedded devices, researchers have applied deep learning to face aging. Existing face aging algorithms usually design neural network models with relatively simple structures; although they can achieve face aging to a certain extent, the effect is unsatisfactory and realism is greatly reduced.
At present, lightweight network design has achieved great success in deep learning and is widely applied in image processing research, but it has seen relatively little application in the specific field of face aging; this area merits deeper study and offers great room for progress.
Disclosure of Invention
The invention aims to remedy the defects of the background art and provides a lightweight face aging method based on a double attention mechanism. Identity features of the input face image are extracted using conventional convolution, depth-separable convolution, inverted bottleneck residual modules and mixed-domain attention; the target aging age serves as the input for image attribute editing; an adaptive instance normalization layer fuses the identity features with the aging age features; and skip connection, upsampling and convolution operations on the fused features generate the texture attention map and color attention map of the double attention mechanism. Finally, the texture attention map, the color attention map and the original input image are combined to obtain a high-resolution image of the face aged to the target age, and network training is completed by unsupervised learning.
The technical scheme adopted by the invention is as follows: a lightweight face aging method based on a double attention mechanism, comprising the following steps:
S1, preprocessing the input face image to realize pixel normalization;
S2, encoding the input target aging age and converting it into a multi-dimensional age vector;
S3, extracting identity features from the preprocessed face image with a coding network to obtain high-dimensional identity features;
S4, feeding the multi-dimensional age vector into a multi-layer perceptron that gradually increases its dimensionality, mapping it to age-related high-dimensional age features;
S5, fusing the high-dimensional identity features and the high-dimensional age features through an adaptive instance normalization layer to obtain a fused feature vector;
S6, applying to the fused feature vector a decoding network constructed from skip connections, upsampling and multi-scale conventional convolution to obtain a texture attention map and a color attention map; and fusing the texture attention map, the color attention map and the input original image to obtain the face image aged to the target age.
In the above technical solution, in step S1 the original input image x_i with original age i is preprocessed: it is normalized to a mean of [0.5, 0.5, 0.5] and a standard deviation of [0.5, 0.5, 0.5]; stretching, cropping and noise addition are introduced only during network training, to prevent overfitting.
In the above technical solution, to address the weak correlation between adjacent ages in common encoding schemes, in step S2 an encoding method combining classification and regression is designed: the age interval containing the target aging age is first identified, and the correlation between the target aging age and the interval boundaries is then obtained by linear computation and used as the encoding result, yielding a multi-dimensional age vector that preserves a degree of age correlation.
In the above technical solution, in step S2, with an age-interval width N = 10, ages 0-100 are divided into 10 intervals indexed 0-1, 1-2, 2-3, ..., 9-10. For an input target age j, the interval containing j is determined: its boundaries are A, the floor of j/N, and B, the floor of j/N plus 1.
The association coefficients p, q between the target age and the interval boundaries are obtained from the determined interval by solving the following system:
p+q=1
A×p+B×q=j
The encoding result of the target aging age j is the 11-dimensional vector T_j:

T_j[m] = p if m = A, q if m = B, and 0 otherwise, where m ∈ [0, 10] and m is an integer.
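For illustration, the following is a minimal Python sketch of this classification-plus-regression encoding. The function name encode_age is an assumption, and the boundary equation A×p+B×q=j is read here in units of the interval width N (so that, e.g., age 34 gives boundaries 3 and 4 with q = 0.4); this interpretation is ours, not the patent's explicit statement.

import numpy as np

def encode_age(j: float, N: int = 10, dim: int = 11) -> np.ndarray:
    """Encode a target age j in [0, 100] as an 11-dimensional vector T_j."""
    a = int(j // N)          # lower interval boundary A = floor(j / N)
    b = a + 1                # upper interval boundary B = A + 1
    # Solve p + q = 1 and A*p + B*q = j/N (boundary equation taken in
    # units of the interval width N so the system is consistent):
    q = j / N - a
    p = 1.0 - q
    t = np.zeros(dim, dtype=np.float32)
    t[a] = p                 # T_j[m] = p at m = A
    if b < dim:
        t[b] = q             # T_j[m] = q at m = B, zero elsewhere
    return t

# Example: age 34 lies between boundaries 3 and 4 (30 and 40 years),
# giving p = 0.6 at index 3 and q = 0.4 at index 4.
print(encode_age(34))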
In the above technical solution, to address the large computation and parameter count of conventional convolution, in step S3 the coding network comprises a conventional convolution module, depth-separable convolution modules with step size 2, inverted bottleneck residual modules and a mixed-domain attention module. The preprocessed face image is downsampled and its features extracted by the conventional convolution module and the depth-separable convolution modules with step size 2; the inverted bottleneck residual modules deepen the sampling depth of identity extraction; and the mixed-domain attention module strengthens the coding network's extraction of important identity-feature regions. This design effectively reduces the parameter count and computation of the coding network.
Preferably, downsampling and feature extraction are implemented with one 7 × 7 conventional convolution module with step size 1 and two 3 × 3 depth-separable convolution modules with step size 2; the convolution depth is deepened with four 3 × 3 inverted bottleneck residual modules with step size 1; one mixed-domain attention module guides the network's extraction of important feature regions; and the image is finally encoded into a 128-dimensional n × n identity feature Z.
In the above technical solution, in step S4 the multi-dimensional age vector from step S2 is fed into a [11, 64, 128, 256] multi-layer perceptron that gradually increases the feature dimensionality, yielding a 256-dimensional 1 × 1 age feature L.
In the above technical solution, to fuse the two feature vectors reasonably and sufficiently, in step S5 the high-dimensional age feature is format-converted into 2 age feature vectors of the same dimensionality as the high-dimensional identity feature; feature fusion of the high-dimensional identity feature with the 2 age feature vectors using an adaptive instance normalization layer (AdaIN) then gives a fused feature vector AdaIN(Z, L) of the same dimensionality as the identity feature:

AdaIN(Z, L) = α(L) × (Z - μ(Z)) / σ(Z) + β(L)

where Z denotes the identity feature and L the age feature; μ(Z) and σ(Z) denote the mean and standard deviation of the identity feature, computed by the standard formulas; α(L) and β(L) denote the 2 age feature vectors after format conversion of the age feature.
Preferably, the obtained 128-dimensional identity feature and 256-dimensional age feature are fused: the 256-dimensional feature is first compressed into 2 128-dimensional 1 × 1 feature vectors, and feature fusion with the adaptive instance normalization layer (AdaIN) then yields a 128-dimensional n × n fused feature vector.
In the above technical solution, to counteract the skip connection layers weakening the age information carried by the fused feature during decoding, in step S6 the decoding network is constructed from skip connection layers combined with an attention gate mechanism, upsampling modules and conventional convolution modules, which strengthens the role of the age information during decoding and brings the generated image closer to the target age. After the decoding network performs feature dimensionality reduction and scale expansion on the fused feature vector, 2 independent conventional convolution modules process the decoder output to generate a texture attention map R and a color attention map C matching the scale of the input image. The unprocessed original face image x_i, the texture attention map R and the color attention map C are fused according to the following formula to obtain the aged face image x_ij for target age j:
x_ij = R × x_i + (1 - R) × C
Preferably, the decoding network is constructed from 2 skip connection layers combined with an attention gate mechanism, 2 upsampling operations with scale factor 2, and 2 3 × 3 conventional convolutions with step size 1; after feature dimensionality reduction and scale expansion of the fused feature, 2 independent 7 × 7 conventional convolutions with step size 1 generate the texture attention map R and the color attention map C at the scale of the input image.
In the above technical solution, to enable the method to generate visually more realistic face aging images with clearer details, the method further comprises step S7:
training an authenticity discriminator to judge the face aging image finally aged to the target age: the authenticity discriminator detects whether an input picture was generated in step S6, and an authenticity error loss value is computed; an age discriminator estimates the age of the aged face in the input picture, and an age error loss value is computed; the authenticity error loss value and the age error loss value jointly guide the training of the coding network, multi-layer perceptron and decoding network of steps S3-S6. The age discriminator adopts VGG-FACE.
In the above technical solution, to address the lack of paired data in the data set during training, the method further comprises step S8:
following the cycle consistency principle, the face aging image obtained in step S6 and the original age of the face image used in step S1 are taken as the original face image and the target aging age, and steps S1-S6 are executed again to restore the aged image to the original age; the resulting face image is compared with the original face image input in step S1 by a pixel-level loss, which guides the training of the coding network, multi-layer perceptron and decoding network of steps S3-S6.
In the above technical solution, to enable the method to generate visually more realistic face aging images with clearer details, the method further comprises step S9:
following the image reconstruction consistency principle, the original input image and the original age are taken as the original face image and the target aging age, and steps S1-S6 are executed again to obtain an image of the original face at its original age; this image is compared with the original input face image by a pixel-level loss and an age loss, which guide the training of the coding network, multi-layer perceptron and decoding network of steps S3-S6.
The invention further provides a computer-readable storage medium storing a program for the double-attention high-resolution face image aging method of a lightweight network; when executed by a processor, the program implements the steps of the method described in the above technical scheme.
The beneficial effects of the invention are as follows: the depth-separable convolution and inverted bottleneck residual modules reduce the computation needed to extract identity features from the face image; the adaptive instance normalization layer reduces the computation of high-dimensional feature fusion; fusing the texture attention map, the color attention map and the original input image reduces the pixel loss between the aged image and the original image and realizes high-resolution synthesis of aged face images; and the cycle consistency principle and the reconstruction consistency principle realize unsupervised network training.
Drawings
FIG. 1 is a flow chart of a lightweight face aging method based on a dual attention mechanism according to the present invention;
FIG. 2 is a schematic diagram of a network structure of a lightweight face aging method based on a dual attention mechanism according to the present invention;
FIG. 3 is a graph comparing depth separable convolution with conventional convolution;
FIG. 4 is a comparison of an inverted bottleneck residual module and a conventional residual module;
FIG. 5 is a schematic diagram of a skip connection layer combined with the attention gate mechanism;
FIG. 6 is a diagram of a training process of a lightweight face aging method based on a dual attention mechanism according to the present invention;
FIG. 7 is a face aging effect diagram of the lightweight face aging method based on a double attention mechanism (original image: 34 years old; aged image: 65 years old).
Detailed Description
The invention will be further described in detail with reference to the following drawings and specific examples, which are not intended to limit the invention, but are for clear understanding.
The invention provides a lightweight face aging method based on a double attention mechanism. As shown in fig. 1, it comprises 5 aspects: preprocessing the input face image and target aging age, extracting the identity of the face image, mapping high-dimensional age features, fusing identity and age features, and generating the double attention maps and synthesizing the aged face image. The network model is trained without supervision, as shown in fig. 6. The whole method comprises the following steps:
the method comprises the following steps of firstly, inputting a face image and preprocessing a target aging age, and specifically comprises the following steps:
(1-1) Normalize the input face image; during training, stretch it and randomly crop it, and add Gaussian noise to prevent overfitting of network training. Specifically:
(1-1-1) Rescale the RGB channels of the input face image from the range 0-255 to 0-1, and normalize the image to a mean of [0.5, 0.5, 0.5] and a standard deviation of [0.5, 0.5, 0.5]. During training, an extra stretch-and-crop step precedes normalization: the image is stretched to 1.1 times its original size, and an image of the original size is randomly cropped from the stretched image; after normalization, Gaussian noise with mean 0.5 and variance 0.5 is added.
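For illustration, a minimal sketch of this training-time preprocessing using torchvision follows; the resize factor, crop size and noise parameters follow the text, while the input resolution SIZE and the pipeline composition are assumptions.

import torch
from torchvision import transforms

SIZE = 256  # assumed input resolution; the patent only speaks of n x n

def add_gaussian_noise(x: torch.Tensor) -> torch.Tensor:
    # Gaussian noise with mean 0.5 and variance 0.5, as stated above
    return x + torch.randn_like(x) * (0.5 ** 0.5) + 0.5

train_transform = transforms.Compose([
    transforms.Resize(int(SIZE * 1.1)),           # stretch to 1.1x original size
    transforms.RandomCrop(SIZE),                  # random crop at original size
    transforms.ToTensor(),                        # maps pixels from 0-255 to 0-1
    transforms.Normalize(mean=[0.5, 0.5, 0.5],
                         std=[0.5, 0.5, 0.5]),
    transforms.Lambda(add_gaussian_noise),        # noise added after normalization
])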
(1-2) Encode the target aging age j. Combining classification and regression encoding, first identify the age interval containing the target age, then obtain the correlation between the target age and the interval boundaries by linear computation, and output the age feature. Specifically:
(1-2-1) With an age-interval width N = 10, divide ages 0-100 into 10 intervals indexed 0-1, 1-2, 2-3, ..., 9-10; obtain the input aging age j and determine its interval, whose boundaries are A, the floor of j/N, and B, the floor of j/N plus 1.
(1-2-2) Using the interval determined in (1-2-1), compute the association coefficients p, q between the target age and the interval by solving the following system:
p+q=1
A×p+B×q=j
The encoding result for aging age j is the 11-dimensional vector T_j, where m is the integer index of the age interval:

T_j[m] = p if m = A, q if m = B, and 0 otherwise, with m ∈ [0, 10].
Secondly, extracting the identity of the face image comprises the following steps:
(2) Construct the coding network from a conventional convolution module with step size 1, depth-separable convolution modules with step size 2, inverted bottleneck residual modules and a mixed-domain attention module, and extract identity features from the preprocessed face image. This comprises the following sub-steps:
(2-1) As shown in FIG. 2, one 7 × 7 conventional convolution with padding 3 and step size 1 is used, with ReLU as the activation function; it maps the 3 × n × n low-dimensional space to a 32 × n × n feature space while preserving a large receptive field.
(2-2) As shown in FIG. 2, two 3 × 3 depth-separable convolutions with padding 1 and step size 2 are applied in succession, with h-swish as the activation function. Convolution with step size 2 replaces pooling layers for feature-scale reduction; while reducing parameters and computation, it converts the feature space from 32 × n × n to 64 × (n/2) × (n/2) and then to 128 × (n/4) × (n/4).
Compared with the mixed-domain convolution of conventional convolution (upper module in FIG. 3), depth-separable convolution (lower module in FIG. 3) uses channel-by-channel convolution followed by point-by-point convolution, achieving the same convolutional effect while greatly reducing the computational requirement; the specific structure is shown in FIG. 3.
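For illustration, a minimal PyTorch sketch of such a depth-separable convolution block follows; the h-swish activation follows step (2-2), while the module name, BatchNorm placement and channel counts are assumptions.

import torch.nn as nn

class DepthSeparableConv(nn.Module):
    """3x3 channel-by-channel conv (groups=in_ch) followed by 1x1 point-by-point conv."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 2):
        super().__init__()
        self.block = nn.Sequential(
            # channel-by-channel (depthwise) convolution
            nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                      padding=1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.Hardswish(),          # h-swish activation, per step (2-2)
            # point-by-point (1x1) convolution mixes channels
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.Hardswish(),
        )

    def forward(self, x):
        return self.block(x)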
(2-3) As shown in fig. 2, four inverted bottleneck residual modules are applied in succession, with h-swish as the activation function; they reduce parameters and computation while strengthening the network's extraction of deep features.
Unlike a conventional residual module, the inverted bottleneck residual module raises the dimensionality with point-by-point convolution, reduces it back after a depth-separable convolution, and inserts lightweight SE attention in the middle, extracting deeper features at low computational cost. The specific structure is shown in FIG. 4: the conventional residual module on the left, and the inverted bottleneck residual module used in this method on the right.
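For illustration, a minimal PyTorch sketch of an inverted bottleneck residual module with lightweight SE attention follows; the expansion ratio and SE reduction factor are assumptions, not values given in the patent.

import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    def __init__(self, ch: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)       # channel-wise reweighting

class InvertedBottleneck(nn.Module):
    def __init__(self, ch: int, expand: int = 4):
        super().__init__()
        hidden = ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1, bias=False),               # point-wise expansion
            nn.BatchNorm2d(hidden), nn.Hardswish(),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),               # depthwise conv
            nn.BatchNorm2d(hidden), nn.Hardswish(),
            SqueezeExcite(hidden),                              # lightweight SE attention
            nn.Conv2d(hidden, ch, 1, bias=False),               # project back down
            nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return x + self.block(x)    # residual shortcut (stride 1, same dims)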
(2-4) As shown in fig. 2, one mixed-domain attention module is used to further guide the coding network to focus on the identity-feature regions of interest.
Thirdly, high-dimensional age feature mapping comprises the following steps:
(3) The length-11 age vector T is mapped to 256 dimensions by a multi-layer perceptron with dimension transformation [11, 64, 128, 256], using the Sigmoid function for nonlinear mapping, to obtain the 256-dimensional 1 × 1 age feature L:

L = G(b^(3) + W^(3) · s(b^(2) + W^(2) · s(b^(1) + W^(1) · T)))

where W^(k) denotes the weight matrix of the k-th fully connected layer, b^(k) its bias, T the target aging age encoding obtained in (1-2-2), G the softmax function, and s the Sigmoid function.
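For illustration, a minimal PyTorch sketch of this [11, 64, 128, 256] age perceptron follows; the class name is an assumption, while the layer dimensions and activations follow the formula above.

import torch
import torch.nn as nn

class AgeMapper(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(11, 64), nn.Sigmoid(),    # s(b1 + W1·T)
            nn.Linear(64, 128), nn.Sigmoid(),   # s(b2 + W2·...)
            nn.Linear(128, 256),                # b3 + W3·...
            nn.Softmax(dim=-1),                 # G, per the formula above
        )

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch, 11) age encoding -> (batch, 256) age feature L
        return self.net(t)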
Fourthly, fusing identity and age features comprises the following steps:
(4) First, the 256-dimensional 1 × 1 age feature L is format-converted into 2 128-dimensional 1 × 1 feature vectors; feature fusion with the adaptive instance normalization layer (AdaIN) then follows the formula below, giving the 128-dimensional n × n fused feature AdaIN(Z, L):

AdaIN(Z, L) = α(L) × (Z - μ(Z)) / σ(Z) + β(L)

μ(Z) and σ(Z) denote the mean and standard deviation of the identity feature, computed by the standard formulas; α(L) and β(L) denote the two 128-dimensional feature vectors after format conversion of the age feature.
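For illustration, a minimal PyTorch sketch of this AdaIN fusion follows; the split of L into halves for α(L) and β(L) is an assumed convention for the format conversion.

import torch

def adain(z: torch.Tensor, l: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # z: (batch, 128, n, n) identity feature; l: (batch, 256) age feature
    alpha, beta = l.chunk(2, dim=1)                 # 2 x (batch, 128)
    alpha = alpha[:, :, None, None]                 # broadcast to (B, 128, 1, 1)
    beta = beta[:, :, None, None]
    mu = z.mean(dim=(2, 3), keepdim=True)           # per-channel mean mu(Z)
    sigma = z.std(dim=(2, 3), keepdim=True) + eps   # per-channel std sigma(Z)
    return alpha * (z - mu) / sigma + beta          # AdaIN(Z, L)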
Fifthly, generating the double attention maps and synthesizing the aged face image comprises the following steps:
(5) Apply skip connections, upsampling and multi-scale conventional convolution to the fused feature to obtain the texture attention map and the color attention map; combine and fuse the texture attention map, the color attention map and the input original image to obtain the face image aged to the target age. This step comprises the following sub-steps:
(5-1) As shown in fig. 2, skip connection layers combined with an attention gate mechanism are used: the fused feature serves as the gating signal and guides the weighting of the skip connection layer, highlighting salient image regions and suppressing feature responses irrelevant to the task.
The attention gate mechanism is shown in FIG. 5. The fused feature serves as the gating signal x_g: a 1 × 1 conventional convolution first compresses it to H_g × W_g × 1. The skip connection layer serves as the gated signal x_l and is likewise compressed to H_l × W_l × 1 by a 1 × 1 conventional convolution. Here H denotes the feature height, W the feature width, and the subscripts l, g the corresponding signal source. The two single-channel features are concatenated into a 2-channel feature; a 7 × 7 convolution then produces an H_l × W_l × 1 layer which, repeated across channels and multiplied with the gated signal x_l, gives the final output signal x̂_l.
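For illustration, a minimal PyTorch sketch of this attention-gate skip connection follows; the module name, the sigmoid on the attention map, and the assumption that both signals share one spatial size are ours, not the patent's.

import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, gate_ch: int, skip_ch: int):
        super().__init__()
        self.compress_g = nn.Conv2d(gate_ch, 1, kernel_size=1)   # x_g -> H x W x 1
        self.compress_l = nn.Conv2d(skip_ch, 1, kernel_size=1)   # x_l -> H x W x 1
        self.mix = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),           # 7x7 conv on concat
            nn.Sigmoid(),                                        # assumed gating range
        )

    def forward(self, x_g: torch.Tensor, x_l: torch.Tensor) -> torch.Tensor:
        # x_g: gating signal (fused feature); x_l: skip-connection feature,
        # assumed here to share the same spatial size as x_g
        attn = self.mix(torch.cat([self.compress_g(x_g),
                                   self.compress_l(x_l)], dim=1))
        return x_l * attn    # single-channel map repeats across channels via broadcast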
(5-2) As shown in FIG. 2, in this network structure the fused feature has exactly the same scale as the feature from the skip connection layer combined with the attention gate mechanism; the two are concatenated, and two upsamplings with scale factor 2 together with 3 × 3 conventional convolutions with step size 1 convert the feature space from 128 × (n/4) × (n/4) to 64 × (n/2) × (n/2) and then to 32 × n × n.
(5-3) As shown in FIG. 2, 7 × 7 × 1 and 7 × 7 × 3 convolutions are applied to the 32 × n × n feature vector to obtain the texture attention map R and the color attention map C respectively, and these are fused with the original input image according to the following formula to obtain the aged image x_ij:

x_ij = R × x_i + (1 - R) × C
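For illustration, a minimal PyTorch sketch of this double-attention synthesis follows; the sigmoid/tanh output ranges for R and C are assumptions, while the 7 × 7 kernel sizes and the blending formula follow the text.

import torch
import torch.nn as nn

class DualAttentionHead(nn.Module):
    def __init__(self, in_ch: int = 32):
        super().__init__()
        self.texture = nn.Conv2d(in_ch, 1, kernel_size=7, padding=3)  # 7x7x1 -> R
        self.color = nn.Conv2d(in_ch, 3, kernel_size=7, padding=3)    # 7x7x3 -> C

    def forward(self, feat: torch.Tensor, x_i: torch.Tensor) -> torch.Tensor:
        r = torch.sigmoid(self.texture(feat))   # R in [0, 1], assumed range
        c = torch.tanh(self.color(feat))        # C in [-1, 1], matching normalized inputs
        return r * x_i + (1.0 - r) * c          # x_ij = R*x_i + (1-R)*C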
Sixthly, the unsupervised training process of the network model comprises the following steps:
(6) As shown in fig. 6, during network training the generated aging image must be judged for authenticity by the authenticity discriminator D, and the judgment result serves as the original GAN loss function L_GAN guiding the training of the coding network, decoding network and multi-layer perceptron:

L_GAN(E, G, M) = E_{x~P(x)} E_{y~P(y)} [(D(G(E(x), M(y))) - 1)^2]

where E_{x~P(x)} denotes the mathematical expectation over input images x following the distribution P(x); E(·) denotes the coding network, G(·) the decoding network and M(·) the multi-layer perceptron; P(x) denotes the true distribution of input images and P(y) the true distribution of target aging age labels; x denotes the original input face image and y the target aging age.
When training the discriminator D itself, the loss function is:

L_GAN(D) = E_{x~P(x)} E_{y~P(y)} [(D(G(E(x), M(y))))^2] + E_{x~P(x)} [(D(x) - 1)^2]

with the same notation as above.
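For illustration, a minimal sketch of these least-squares GAN objectives follows; the function names are assumptions, with d_fake and d_real standing for the discriminator's scores on generated and real images.

import torch

def generator_gan_loss(d_fake: torch.Tensor) -> torch.Tensor:
    # L_GAN(E, G, M): push D's score on generated images toward 1
    return ((d_fake - 1.0) ** 2).mean()

def discriminator_gan_loss(d_fake: torch.Tensor,
                           d_real: torch.Tensor) -> torch.Tensor:
    # L_GAN(D): push scores on generated images toward 0, on real images toward 1
    return (d_fake ** 2).mean() + ((d_real - 1.0) ** 2).mean()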
The age discriminator V adopts the existing VGG-FACE model, so it needs no training. The aging image is fed into the age discriminator to obtain an estimated age, which is compared with the target age to obtain the loss function L_age1:

L_age1 = E_{x~P(x)} E_{y~P(y)} [L_CE(V(G(E(x), M(y))), C(y_j))]

where P(y_j) denotes the target age distribution, C(y_j) the 101-dimensional one-hot vector encoding the target age y_j, and L_CE the cross-entropy loss function.
(7) As shown in fig. 6, paired images of the same subject at different ages are lacking for supervised learning during training, so the cycle consistency principle is adopted to solve this problem.
In theory, the above steps yield an image x_ij aged to the target age. For lack of supervised samples, the aging image x_ij is passed through the steps again to restore the original age i, giving the restored image x_iji, which is compared with the original image x_i to obtain the pixel-level loss function L_cycle:

L_cycle = ||x_i - x_iji||

where x_i denotes the original input image, x_iji the cyclically reconstructed image, and ||·|| the L1 norm.
Meanwhile, during training, the identity features obtained in the two encoding passes are constrained to guide the accuracy of the encoder's identity extraction; the loss function is the Pearson correlation coefficient L_id:

L_id = E[(Z_1 - μ(Z_1)) (Z_2 - μ(Z_2))] / (σ(Z_1) σ(Z_2))

where μ and σ are the mean and standard deviation respectively, Z_1 denotes the identity feature the coding network extracts from the input original image during aging, and Z_2 the identity feature it extracts from the input aging image during cyclic reconstruction.
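For illustration, a minimal sketch of this Pearson-correlation identity constraint follows; flattening the feature maps and negating the coefficient (so that minimizing the loss maximizes correlation) are assumptions.

import torch

def pearson_id_loss(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    # z1, z2: identity features from the two encoding passes, shape (B, C, n, n)
    z1, z2 = z1.flatten(1), z2.flatten(1)
    z1 = z1 - z1.mean(dim=1, keepdim=True)
    z2 = z2 - z2.mean(dim=1, keepdim=True)
    rho = (z1 * z2).sum(1) / (z1.norm(dim=1) * z2.norm(dim=1) + 1e-8)
    return -rho.mean()   # assumed sign: minimize negative correlation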
(8) As shown in fig. 6, to improve the quality and age accuracy of generated images, the reconstruction consistency principle is adopted.
In theory, taking the original image x_i and the original age i as the algorithm input yields the reconstructed image x_ii, which is compared with the original image x_i to obtain the pixel-level loss function L_recon and the age loss function L_age2:

L_recon = ||x_i - x_ii||
L_age2 = E_{x~P(x)} [L_CE(V(x_ii), C(y_i))]

where ||·|| denotes the L1 norm, P(y_i) the original age distribution, C(y_i) the 101-dimensional one-hot vector encoding the original age y_i, and L_CE the cross-entropy loss function.
(9) As shown in FIG. 6, the training of the generation network G of the lightweight face aging method based on the double attention mechanism takes L_all as the objective loss function, uses the Adam optimizer for parameter optimization, and guides the training of the coding network, multi-layer perceptron and decoding network of steps S3-S6:

L_all = λ_GAN L_GAN(E, G, M) + λ_recon L_recon + λ_cycle L_cycle + λ_id L_id + λ_age1 L_age1 + λ_age2 L_age2
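For illustration, a minimal sketch of the combined objective follows; the patent does not specify the λ weights, so the values below are placeholders only.

import torch

def total_loss(l_gan, l_recon, l_cycle, l_id, l_age1, l_age2,
               lambdas=(1.0, 10.0, 10.0, 1.0, 1.0, 1.0)) -> torch.Tensor:
    # L_all = weighted sum of the six losses defined above
    lg, lr, lc, li, la1, la2 = lambdas
    return (lg * l_gan + lr * l_recon + lc * l_cycle
            + li * l_id + la1 * l_age1 + la2 * l_age2)

# optimizer = torch.optim.Adam(generator_params, lr=1e-4)  # Adam, per step (9)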
it will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention. Those not described in detail in this specification are within the skill of the art.

Claims (10)

1. A lightweight face aging method based on a double attention mechanism, characterized by comprising the following steps:
S1, preprocessing the input face image to realize pixel normalization;
S2, encoding the input target aging age and converting it into a multi-dimensional age vector;
S3, extracting identity features from the preprocessed face image with a coding network to obtain high-dimensional identity features;
S4, feeding the multi-dimensional age vector into a multi-layer perceptron that gradually increases its dimensionality, mapping it to age-related high-dimensional age features;
S5, fusing the high-dimensional identity features and the high-dimensional age features through an adaptive instance normalization layer to obtain a fused feature vector;
S6, applying to the fused feature vector a decoding network constructed from skip connections, upsampling and multi-scale conventional convolution to obtain a texture attention map and a color attention map; and fusing the texture attention map, the color attention map and the input original image to obtain the face image aged to the target age.
2. The lightweight face aging method based on the double attention mechanism according to claim 1, wherein in step S2 an encoding method combining classification and regression is used: the age interval containing the target age is first identified, and the correlation coefficients between the target age and the interval boundaries are obtained by linear computation as the encoding result, yielding the multi-dimensional age vector.
3. The lightweight face aging method based on the double attention mechanism according to claim 1, wherein in step S3 the coding network comprises a conventional convolution module, depth-separable convolution modules with step size 2, inverted bottleneck residual modules and a mixed-domain attention module; the preprocessed face image is downsampled and its features extracted by the conventional convolution module and the depth-separable convolution modules with step size 2; the inverted bottleneck residual modules deepen the sampling depth of identity feature extraction; and the mixed-domain attention module strengthens the coding network's extraction of important identity-feature regions.
4. The lightweight face aging method based on the double attention mechanism according to claim 1, wherein in step S5 the single high-dimensional age feature is split by format conversion into 2 age feature vectors of the same dimensionality as the high-dimensional identity feature; feature fusion of the high-dimensional identity feature with the 2 age feature vectors using an adaptive instance normalization layer gives a fused feature vector AdaIN(Z, L) of the same dimensionality as the identity feature:

AdaIN(Z, L) = α(L) × (Z - μ(Z)) / σ(Z) + β(L)

where Z denotes the identity feature, L the age feature, μ(Z) and σ(Z) the mean and standard deviation of the identity feature computed by the standard formulas, and α(L) and β(L) the 2 age feature vectors after format conversion of the age feature.
5. The lightweight face aging method based on the double attention mechanism according to claim 1, wherein in step S6 a decoding network is constructed from skip connection layers combined with an attention gate mechanism, upsampling modules and conventional convolution modules; after the decoding network performs feature dimensionality reduction and scale expansion on the fused feature vector, 2 independent conventional convolution modules process the decoder output image to generate a texture attention map R and a color attention map C matching the scale of the input image; the unprocessed original face image x_i, the texture attention map R and the color attention map C are fused according to the following formula to obtain the aged face image x_ij for target age j:
x_ij = R × x_i + (1 - R) × C.
6. The lightweight face aging method based on the double attention mechanism according to claim 1, characterized by further comprising step S7:
training an authenticity discriminator to judge the face aging image finally aged to the target age: the authenticity discriminator detects whether an input picture was generated in step S6, and an authenticity error loss value is computed; an age discriminator estimates the age of the aged face in the input picture, and an age error loss value is computed; the authenticity error loss value and the age error loss value jointly guide the training of the coding network, multi-layer perceptron and decoding network of steps S3-S6.
7. The lightweight face aging method based on the double attention mechanism according to claim 1, characterized by further comprising step S8:
following the cycle consistency principle, the face aging image obtained in step S6 and the original age of the face image used in step S1 are taken as the original face image and the target aging age, and steps S1-S6 are executed again to restore the aged image to the original age; the resulting face image is compared with the original face image used in step S1 by a pixel-level loss, which guides the training of the coding network, multi-layer perceptron and decoding network of steps S3-S6.
8. The lightweight face aging method based on the double attention mechanism according to claim 1, characterized by further comprising step S9:
following the image reconstruction consistency principle, the original input image and the original age are taken as the original face image and the target aging age, and steps S1-S6 are executed again to obtain an image of the original face at its original age; this image is compared with the original input face image by a pixel-level loss and an age loss, which guide the training of the coding network, multi-layer perceptron and decoding network of steps S3-S6.
9. The lightweight face aging method based on the double attention mechanism according to claim 2, wherein in step S2, with an age-interval width N = 10, ages 0-100 are divided into 10 intervals indexed 0-1, 1-2, 2-3, ..., 9-10; the input target age j is obtained and its interval determined, with boundaries A, the floor of j/N, and B, the floor of j/N plus 1; the association coefficients p, q between the target age and the interval are obtained from the determined interval by solving the following system:
p+q=1
A×p+B×q=j
The encoding result of the target aging age j is the 11-dimensional vector T_j:

T_j[m] = p if m = A, q if m = B, and 0 otherwise, where m ∈ [0, 10] and m is an integer.
10. A computer-readable storage medium, characterized in that it stores a program for the double-attention high-resolution face image aging method of a lightweight network; when executed by a processor, the program implements the steps of the method according to any one of claims 1 to 9.
CN202210095562.3A 2022-01-26 2022-01-26 Lightweight face aging method based on double attention mechanism Pending CN114445889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210095562.3A CN114445889A (en) 2022-01-26 2022-01-26 Lightweight face aging method based on double attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210095562.3A CN114445889A (en) 2022-01-26 2022-01-26 Lightweight face aging method based on double attention mechanism

Publications (1)

Publication Number Publication Date
CN114445889A true CN114445889A (en) 2022-05-06

Family

ID=81369782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210095562.3A Pending CN114445889A (en) 2022-01-26 2022-01-26 Lightweight face aging method based on double attention mechanism

Country Status (1)

Country Link
CN (1) CN114445889A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311720A (en) * 2022-08-11 2022-11-08 Shandong Artificial Intelligence Institute Deepfake generation method based on Transformer
CN115311720B (en) * 2022-08-11 2023-06-06 Shandong Artificial Intelligence Institute Deepfake generation method based on Transformer

Similar Documents

Publication Publication Date Title
CN112149504B (en) Motion video identification method combining mixed convolution residual network and attention
Guo et al. Content-based image retrieval using error diffusion block truncation coding features
CN113177882B (en) Single-frame image super-resolution processing method based on diffusion model
CN110689599A (en) 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement
Li et al. Example-based image super-resolution with class-specific predictors
CN111932458B (en) Image information extraction and generation method based on inter-region attention mechanism
Wang et al. Semantic perceptual image compression with a Laplacian pyramid of convolutional networks
CN114757864B (en) Multi-level fine granularity image generation method based on multi-scale feature decoupling
CN111210382A (en) Image processing method, image processing device, computer equipment and storage medium
CN110852935A (en) Image processing method for human face image changing with age
Löhdefink et al. On low-bitrate image compression for distributed automotive perception: Higher peak snr does not mean better semantic segmentation
CN111986132A (en) Infrared and visible light image fusion method based on DLatLRR and VGG & Net
CN116168197A (en) Image segmentation method based on Transformer segmentation network and regularization training
CN116664435A (en) Face restoration method based on multi-scale face analysis map integration
CN117314808A (en) Infrared and visible light image fusion method combining transducer and CNN (carbon fiber network) double encoders
Löhdefink et al. GAN-vs. JPEG2000 image compression for distributed automotive perception: Higher peak SNR does not mean better semantic segmentation
CN111967358A (en) Neural network gait recognition method based on attention mechanism
CN114445889A (en) Lightweight face aging method based on double attention mechanism
CN118212463A (en) Target tracking method based on fractional order hybrid network
CN114283301A (en) Self-adaptive medical image classification method and system based on Transformer
CN117078539A (en) CNN-transducer-based local global interactive image restoration method
CN116486495A (en) Attention and generation countermeasure network-based face image privacy protection method
CN108259914B (en) Cloud image encoding method based on object library
EP4164221A1 (en) Processing image data
CN115147317A (en) Point cloud color quality enhancement method and system based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination