CN114445889A - Lightweight face aging method based on double attention mechanism - Google Patents

Lightweight face aging method based on double attention mechanism

Info

Publication number
CN114445889A
CN114445889A
Authority
CN
China
Prior art keywords
age
image
face
aging
dimensional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210095562.3A
Other languages
Chinese (zh)
Inventor
马小林
郭翔
张家亮
旷海兰
刘新华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN202210095562.3A
Publication of CN114445889A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G06F18/253 - Fusion techniques of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a lightweight face aging method based on a double attention mechanism, comprising the following steps: preprocessing the input face image; encoding the input target aging age and converting it into a multi-dimensional age vector; extracting identity features from the preprocessed face image to obtain high-dimensional identity features; feeding the multi-dimensional age vector into a multi-layer perceptron, which maps it to age-related high-dimensional age features; fusing the high-dimensional identity features and the high-dimensional age features through an adaptive instance normalization layer to obtain a fused feature vector; applying skip connections, upsampling and multi-scale conventional convolution to the fused feature vector to obtain a texture attention map and a color attention map; and fusing the texture attention map, the color attention map and the input original image to obtain the face image aged to the target age. The method ultimately yields a high-resolution image of the face aged to the target age.

Description

Lightweight face aging method based on double attention mechanism
Technical Field
The invention belongs to the technical field of digital image processing, and particularly relates to a lightweight face aging method based on a double attention mechanism.
Background
With rising living standards, people's pursuit of social-entertainment quality has gradually increased, and short videos have become the most popular form of social entertainment. In short-video shooting, special effects that simulate face aging and rejuvenation are widely praised by users for their realism and entertainment value. However, face aging is a very complex process in which many factors must be considered comprehensively. Traditional algorithms require complex structural designs to achieve realistic aging effects, and especially for high-resolution face aging the computation required for a single image is huge, which hinders deployment on embedded devices such as mobile phones.
To address the difficulty of deploying face aging algorithms on embedded devices, researchers have applied deep learning to face aging. Existing face aging algorithms usually design neural network models with relatively simple structures; although they can achieve face aging to a certain extent, the effect is unsatisfactory and realism is greatly reduced.
At present, lightweight network design has achieved great success in deep learning and is widely applied in image processing research, but it has seen relatively little application in the specific field of face aging; this area merits deeper study and offers great room for progress.
Disclosure of Invention
The invention aims to remedy the defects of the background art and provides a lightweight face aging method based on a double attention mechanism. Identity features of the input face image are extracted using conventional convolution, depth-separable convolution, inverted bottleneck residual modules and mixed-domain attention; the target aging age serves as the input for image attribute editing; an adaptive instance normalization layer fuses the identity features with the aging age features; and skip connection, upsampling and convolution operations on the fused features generate the texture attention map and color attention map of the double attention mechanism. Finally, the texture attention map, the color attention map and the original input image are combined to obtain a high-resolution image of the face aged to the target age, and network training is completed by unsupervised learning.
The technical scheme adopted by the invention is as follows: a lightweight face aging method based on a double attention mechanism, comprising the following steps:
S1, preprocessing the input face image to realize pixel normalization;
S2, encoding the input target aging age and converting it into a multi-dimensional age vector;
S3, extracting identity features from the preprocessed face image with a coding network to obtain high-dimensional identity features;
S4, feeding the multi-dimensional age vector into a multi-layer perceptron that gradually increases its dimensionality, mapping it to age-related high-dimensional age features;
S5, fusing the high-dimensional identity features and the high-dimensional age features through an adaptive instance normalization layer to obtain a fused feature vector;
S6, applying to the fused feature vector a decoding network constructed from skip connections, upsampling and multi-scale conventional convolution to obtain a texture attention map and a color attention map; and fusing the texture attention map, the color attention map and the input original image to obtain the face image aged to the target age.
In the above technical solution, in step S1 the original input image x_i with original age i is preprocessed: it is normalized to a mean of [0.5, 0.5, 0.5] and a standard deviation of [0.5, 0.5, 0.5]; stretching, cropping and noise addition are introduced only during network training, to prevent overfitting.
In the above technical solution, to address the weak correlation between adjacent ages in common encoding schemes, in step S2 an encoding method combining classification and regression is designed: the age interval containing the target aging age is first identified, and the correlation between the target aging age and the interval boundaries is then obtained by linear computation and used as the encoding result, yielding a multi-dimensional age vector that preserves a degree of age correlation.
In the above technical solution, in step S2, with an age-interval width N = 10, ages 0-100 are divided into 10 intervals indexed 0-1, 1-2, 2-3, ..., 9-10. For an input target age j, the interval containing j is determined: its boundaries are A, the floor of j/N, and B, the floor of j/N plus 1.
The association coefficients p, q between the target age and the interval boundaries are obtained from the determined interval by solving the following system:
p+q=1
A×p+B×q=j
The encoding result of the target aging age j is the 11-dimensional vector T_j:

T_j[m] = p if m = A, q if m = B, and 0 otherwise, where m ∈ [0, 10] and m is an integer.
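For illustration, the following is a minimal Python sketch of this classification-plus-regression encoding. The function name encode_age is an assumption, and the boundary equation A×p+B×q=j is read here in units of the interval width N (so that, e.g., age 34 gives boundaries 3 and 4 with q = 0.4); this interpretation is ours, not the patent's explicit statement.

import numpy as np

def encode_age(j: float, N: int = 10, dim: int = 11) -> np.ndarray:
    """Encode a target age j in [0, 100] as an 11-dimensional vector T_j."""
    a = int(j // N)          # lower interval boundary A = floor(j / N)
    b = a + 1                # upper interval boundary B = A + 1
    # Solve p + q = 1 and A*p + B*q = j/N (boundary equation taken in
    # units of the interval width N so the system is consistent):
    q = j / N - a
    p = 1.0 - q
    t = np.zeros(dim, dtype=np.float32)
    t[a] = p                 # T_j[m] = p at m = A
    if b < dim:
        t[b] = q             # T_j[m] = q at m = B, zero elsewhere
    return t

# Example: age 34 lies between boundaries 3 and 4 (30 and 40 years),
# giving p = 0.6 at index 3 and q = 0.4 at index 4.
print(encode_age(34))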
In the above technical solution, to address the large computation and parameter count of conventional convolution, in step S3 the coding network comprises a conventional convolution module, depth-separable convolution modules with step size 2, inverted bottleneck residual modules and a mixed-domain attention module. The preprocessed face image is downsampled and its features extracted by the conventional convolution module and the depth-separable convolution modules with step size 2; the inverted bottleneck residual modules deepen the sampling depth of identity extraction; and the mixed-domain attention module strengthens the coding network's extraction of important identity-feature regions. This design effectively reduces the parameter count and computation of the coding network.
Preferably, downsampling and feature extraction are implemented with one 7 × 7 conventional convolution module with step size 1 and two 3 × 3 depth-separable convolution modules with step size 2; the convolution depth is deepened with four 3 × 3 inverted bottleneck residual modules with step size 1; one mixed-domain attention module guides the network's extraction of important feature regions; and the image is finally encoded into a 128-dimensional n × n identity feature Z.
In the above technical solution, in step S4 the multi-dimensional age vector from step S2 is fed into a [11, 64, 128, 256] multi-layer perceptron that gradually increases the feature dimensionality, yielding a 256-dimensional 1 × 1 age feature L.
In the above technical solution, to fuse the two feature vectors reasonably and sufficiently, in step S5 the high-dimensional age feature is format-converted into 2 age feature vectors of the same dimensionality as the high-dimensional identity feature; feature fusion of the high-dimensional identity feature with the 2 age feature vectors using an adaptive instance normalization layer (AdaIN) then gives a fused feature vector AdaIN(Z, L) of the same dimensionality as the identity feature:

AdaIN(Z, L) = α(L) × (Z - μ(Z)) / σ(Z) + β(L)

where Z denotes the identity feature and L the age feature; μ(Z) and σ(Z) denote the mean and standard deviation of the identity feature, computed by the standard formulas; α(L) and β(L) denote the 2 age feature vectors after format conversion of the age feature.
Preferably, the obtained 128-dimensional identity feature and 256-dimensional age feature are fused: the 256-dimensional feature is first compressed into 2 128-dimensional 1 × 1 feature vectors, and feature fusion with the adaptive instance normalization layer (AdaIN) then yields a 128-dimensional n × n fused feature vector.
In the above technical solution, to counteract the skip connection layers weakening the age information carried by the fused feature during decoding, in step S6 the decoding network is constructed from skip connection layers combined with an attention gate mechanism, upsampling modules and conventional convolution modules, which strengthens the role of the age information during decoding and brings the generated image closer to the target age. After the decoding network performs feature dimensionality reduction and scale expansion on the fused feature vector, 2 independent conventional convolution modules process the decoder output to generate a texture attention map R and a color attention map C matching the scale of the input image. The unprocessed original face image x_i, the texture attention map R and the color attention map C are fused according to the following formula to obtain the aged face image x_ij for target age j:
x_ij = R × x_i + (1 - R) × C
Preferably, the decoding network is constructed from 2 skip connection layers combined with an attention gate mechanism, 2 upsampling operations with scale factor 2, and 2 3 × 3 conventional convolutions with step size 1; after feature dimensionality reduction and scale expansion of the fused feature, 2 independent 7 × 7 conventional convolutions with step size 1 generate the texture attention map R and the color attention map C at the scale of the input image.
In the above technical solution, to enable the method to generate visually more realistic face aging images with clearer details, the method further comprises step S7:
training an authenticity discriminator to judge the face aging image finally aged to the target age: the authenticity discriminator detects whether an input picture was generated in step S6, and an authenticity error loss value is computed; an age discriminator estimates the age of the aged face in the input picture, and an age error loss value is computed; the authenticity error loss value and the age error loss value jointly guide the training of the coding network, multi-layer perceptron and decoding network of steps S3-S6. The age discriminator adopts VGG-FACE.
In the above technical solution, to address the lack of paired data in the data set during training, the method further comprises step S8:
following the cycle consistency principle, the face aging image obtained in step S6 and the original age of the face image used in step S1 are taken as the original face image and the target aging age, and steps S1-S6 are executed again to restore the aged image to the original age; the resulting face image is compared with the original face image input in step S1 by a pixel-level loss, which guides the training of the coding network, multi-layer perceptron and decoding network of steps S3-S6.
In the above technical solution, to enable the method to generate visually more realistic face aging images with clearer details, the method further comprises step S9:
following the image reconstruction consistency principle, the original input image and the original age are taken as the original face image and the target aging age, and steps S1-S6 are executed again to obtain an image of the original face at its original age; this image is compared with the original input face image by a pixel-level loss and an age loss, which guide the training of the coding network, multi-layer perceptron and decoding network of steps S3-S6.
The invention further provides a computer-readable storage medium storing a program for the double-attention high-resolution face image aging method of a lightweight network; when executed by a processor, the program implements the steps of the method described in the above technical scheme.
The beneficial effects of the invention are as follows: the depth-separable convolution and inverted bottleneck residual modules reduce the computation needed to extract identity features from the face image; the adaptive instance normalization layer reduces the computation of high-dimensional feature fusion; fusing the texture attention map, the color attention map and the original input image reduces the pixel loss between the aged image and the original image and realizes high-resolution synthesis of aged face images; and the cycle consistency principle and the reconstruction consistency principle realize unsupervised network training.
Drawings
FIG. 1 is a flow chart of a lightweight face aging method based on a dual attention mechanism according to the present invention;
FIG. 2 is a schematic diagram of a network structure of a lightweight face aging method based on a dual attention mechanism according to the present invention;
FIG. 3 is a graph comparing depth separable convolution with conventional convolution;
FIG. 4 is a comparison of an inverted bottleneck residual module and a conventional residual module;
FIG. 5 is a schematic diagram of a skip connection layer combined with the attention gate mechanism;
FIG. 6 is a diagram of a training process of a lightweight face aging method based on a dual attention mechanism according to the present invention;
FIG. 7 is a face aging effect diagram of the lightweight face aging method based on a double attention mechanism (original image: 34 years old; aged image: 65 years old).
Detailed Description
The invention will be further described in detail with reference to the following drawings and specific examples, which are not intended to limit the invention, but are for clear understanding.
The invention provides a lightweight face aging method based on a double attention mechanism. As shown in fig. 1, it comprises 5 aspects: preprocessing the input face image and target aging age, extracting the identity of the face image, mapping high-dimensional age features, fusing identity and age features, and generating the double attention maps and synthesizing the aged face image. The network model is trained without supervision, as shown in fig. 6. The whole method comprises the following steps:
the method comprises the following steps of firstly, inputting a face image and preprocessing a target aging age, and specifically comprises the following steps:
(1-1) Normalize the input face image; during training, stretch it and randomly crop it, and add Gaussian noise to prevent overfitting of network training. Specifically:
(1-1-1) Rescale the RGB channels of the input face image from the range 0-255 to 0-1, and normalize the image to a mean of [0.5, 0.5, 0.5] and a standard deviation of [0.5, 0.5, 0.5]. During training, an extra stretch-and-crop step precedes normalization: the image is stretched to 1.1 times its original size, and an image of the original size is randomly cropped from the stretched image; after normalization, Gaussian noise with mean 0.5 and variance 0.5 is added.
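For illustration, a minimal sketch of this training-time preprocessing using torchvision follows; the resize factor, crop size and noise parameters follow the text, while the input resolution SIZE and the pipeline composition are assumptions.

import torch
from torchvision import transforms

SIZE = 256  # assumed input resolution; the patent only speaks of n x n

def add_gaussian_noise(x: torch.Tensor) -> torch.Tensor:
    # Gaussian noise with mean 0.5 and variance 0.5, as stated above
    return x + torch.randn_like(x) * (0.5 ** 0.5) + 0.5

train_transform = transforms.Compose([
    transforms.Resize(int(SIZE * 1.1)),           # stretch to 1.1x original size
    transforms.RandomCrop(SIZE),                  # random crop at original size
    transforms.ToTensor(),                        # maps pixels from 0-255 to 0-1
    transforms.Normalize(mean=[0.5, 0.5, 0.5],
                         std=[0.5, 0.5, 0.5]),
    transforms.Lambda(add_gaussian_noise),        # noise added after normalization
])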
(1-2) Encode the target aging age j. Combining classification and regression encoding, first identify the age interval containing the target age, then obtain the correlation between the target age and the interval boundaries by linear computation, and output the age feature. Specifically:
(1-2-1) With an age-interval width N = 10, divide ages 0-100 into 10 intervals indexed 0-1, 1-2, 2-3, ..., 9-10; obtain the input aging age j and determine its interval, whose boundaries are A, the floor of j/N, and B, the floor of j/N plus 1.
(1-2-2) Using the interval determined in (1-2-1), compute the association coefficients p, q between the target age and the interval by solving the following system:
p+q=1
A×p+B×q=j
The encoding result for aging age j is the 11-dimensional vector T_j, where m is the integer index of the age interval:

T_j[m] = p if m = A, q if m = B, and 0 otherwise, with m ∈ [0, 10].
Secondly, extracting the identity of the face image comprises the following steps:
(2) Construct the coding network from a conventional convolution module with step size 1, depth-separable convolution modules with step size 2, inverted bottleneck residual modules and a mixed-domain attention module, and extract identity features from the preprocessed face image. This comprises the following sub-steps:
(2-1) As shown in FIG. 2, one 7 × 7 conventional convolution with padding 3 and step size 1 is used, with ReLU as the activation function; it maps the 3 × n × n low-dimensional space to a 32 × n × n feature space while preserving a large receptive field.
(2-2) As shown in FIG. 2, two 3 × 3 depth-separable convolutions with padding 1 and step size 2 are applied in succession, with h-swish as the activation function. Convolution with step size 2 replaces pooling layers for feature-scale reduction; while reducing parameters and computation, it converts the feature space from 32 × n × n to 64 × (n/2) × (n/2) and then to 128 × (n/4) × (n/4).
Compared with the mixed-domain convolution of conventional convolution (upper module in FIG. 3), depth-separable convolution (lower module in FIG. 3) uses channel-by-channel convolution followed by point-by-point convolution, achieving the same convolutional effect while greatly reducing the computational requirement; the specific structure is shown in FIG. 3.
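For illustration, a minimal PyTorch sketch of such a depth-separable convolution block follows; the h-swish activation follows step (2-2), while the module name, BatchNorm placement and channel counts are assumptions.

import torch.nn as nn

class DepthSeparableConv(nn.Module):
    """3x3 channel-by-channel conv (groups=in_ch) followed by 1x1 point-by-point conv."""
    def __init__(self, in_ch: int, out_ch: int, stride: int = 2):
        super().__init__()
        self.block = nn.Sequential(
            # channel-by-channel (depthwise) convolution
            nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                      padding=1, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.Hardswish(),          # h-swish activation, per step (2-2)
            # point-by-point (1x1) convolution mixes channels
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.Hardswish(),
        )

    def forward(self, x):
        return self.block(x)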
(2-3) As shown in fig. 2, four inverted bottleneck residual modules are applied in succession, with h-swish as the activation function; they reduce parameters and computation while strengthening the network's extraction of deep features.
Unlike a conventional residual module, the inverted bottleneck residual module raises the dimensionality with point-by-point convolution, reduces it back after a depth-separable convolution, and inserts lightweight SE attention in the middle, extracting deeper features at low computational cost. The specific structure is shown in FIG. 4: the conventional residual module on the left, and the inverted bottleneck residual module used in this method on the right.
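For illustration, a minimal PyTorch sketch of an inverted bottleneck residual module with lightweight SE attention follows; the expansion ratio and SE reduction factor are assumptions, not values given in the patent.

import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    def __init__(self, ch: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch, ch // reduction, 1), nn.ReLU(),
            nn.Conv2d(ch // reduction, ch, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)       # channel-wise reweighting

class InvertedBottleneck(nn.Module):
    def __init__(self, ch: int, expand: int = 4):
        super().__init__()
        hidden = ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1, bias=False),               # point-wise expansion
            nn.BatchNorm2d(hidden), nn.Hardswish(),
            nn.Conv2d(hidden, hidden, 3, padding=1,
                      groups=hidden, bias=False),               # depthwise conv
            nn.BatchNorm2d(hidden), nn.Hardswish(),
            SqueezeExcite(hidden),                              # lightweight SE attention
            nn.Conv2d(hidden, ch, 1, bias=False),               # project back down
            nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return x + self.block(x)    # residual shortcut (stride 1, same dims)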
(2-4) As shown in fig. 2, one mixed-domain attention module is used to further guide the coding network to focus on the identity-feature regions of interest.
Thirdly, high-dimensional age feature mapping comprises the following steps:
(3) The length-11 age vector T is mapped to 256 dimensions by a multi-layer perceptron with dimension transformation [11, 64, 128, 256], using the Sigmoid function for nonlinear mapping, to obtain the 256-dimensional 1 × 1 age feature L:

L = G(b^(3) + W^(3) · s(b^(2) + W^(2) · s(b^(1) + W^(1) · T)))

where W^(k) denotes the weight matrix of the k-th fully connected layer, b^(k) its bias, T the target aging age encoding obtained in (1-2-2), G the softmax function, and s the Sigmoid function.
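For illustration, a minimal PyTorch sketch of this [11, 64, 128, 256] age perceptron follows; the class name is an assumption, while the layer dimensions and activations follow the formula above.

import torch
import torch.nn as nn

class AgeMapper(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(11, 64), nn.Sigmoid(),    # s(b1 + W1·T)
            nn.Linear(64, 128), nn.Sigmoid(),   # s(b2 + W2·...)
            nn.Linear(128, 256),                # b3 + W3·...
            nn.Softmax(dim=-1),                 # G, per the formula above
        )

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch, 11) age encoding -> (batch, 256) age feature L
        return self.net(t)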
Fourthly, fusing identity and age features comprises the following steps:
(4) First, the 256-dimensional 1 × 1 age feature L is format-converted into 2 128-dimensional 1 × 1 feature vectors; feature fusion with the adaptive instance normalization layer (AdaIN) then follows the formula below, giving the 128-dimensional n × n fused feature AdaIN(Z, L):

AdaIN(Z, L) = α(L) × (Z - μ(Z)) / σ(Z) + β(L)

μ(Z) and σ(Z) denote the mean and standard deviation of the identity feature, computed by the standard formulas; α(L) and β(L) denote the two 128-dimensional feature vectors after format conversion of the age feature.
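For illustration, a minimal PyTorch sketch of this AdaIN fusion follows; the split of L into halves for α(L) and β(L) is an assumed convention for the format conversion.

import torch

def adain(z: torch.Tensor, l: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # z: (batch, 128, n, n) identity feature; l: (batch, 256) age feature
    alpha, beta = l.chunk(2, dim=1)                 # 2 x (batch, 128)
    alpha = alpha[:, :, None, None]                 # broadcast to (B, 128, 1, 1)
    beta = beta[:, :, None, None]
    mu = z.mean(dim=(2, 3), keepdim=True)           # per-channel mean mu(Z)
    sigma = z.std(dim=(2, 3), keepdim=True) + eps   # per-channel std sigma(Z)
    return alpha * (z - mu) / sigma + beta          # AdaIN(Z, L)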
Fifthly, generating the double attention maps and synthesizing the aged face image comprises the following steps:
(5) Apply skip connections, upsampling and multi-scale conventional convolution to the fused feature to obtain the texture attention map and the color attention map; combine and fuse the texture attention map, the color attention map and the input original image to obtain the face image aged to the target age. This step comprises the following sub-steps:
(5-1) As shown in fig. 2, skip connection layers combined with an attention gate mechanism are used: the fused feature serves as the gating signal and guides the weighting of the skip connection layer, highlighting salient image regions and suppressing feature responses irrelevant to the task.
The attention gate mechanism is shown in FIG. 5. The fused feature serves as the gating signal x_g: a 1 × 1 conventional convolution first compresses it to H_g × W_g × 1. The skip connection layer serves as the gated signal x_l and is likewise compressed to H_l × W_l × 1 by a 1 × 1 conventional convolution. Here H denotes the feature height, W the feature width, and the subscripts l, g the corresponding signal source. The two single-channel features are concatenated into a 2-channel feature; a 7 × 7 convolution then produces an H_l × W_l × 1 layer which, repeated across channels and multiplied with the gated signal x_l, gives the final output signal x̂_l.
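For illustration, a minimal PyTorch sketch of this attention-gate skip connection follows; the module name, the sigmoid on the attention map, and the assumption that both signals share one spatial size are ours, not the patent's.

import torch
import torch.nn as nn

class AttentionGate(nn.Module):
    def __init__(self, gate_ch: int, skip_ch: int):
        super().__init__()
        self.compress_g = nn.Conv2d(gate_ch, 1, kernel_size=1)   # x_g -> H x W x 1
        self.compress_l = nn.Conv2d(skip_ch, 1, kernel_size=1)   # x_l -> H x W x 1
        self.mix = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),           # 7x7 conv on concat
            nn.Sigmoid(),                                        # assumed gating range
        )

    def forward(self, x_g: torch.Tensor, x_l: torch.Tensor) -> torch.Tensor:
        # x_g: gating signal (fused feature); x_l: skip-connection feature,
        # assumed here to share the same spatial size as x_g
        attn = self.mix(torch.cat([self.compress_g(x_g),
                                   self.compress_l(x_l)], dim=1))
        return x_l * attn    # single-channel map repeats across channels via broadcast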
(5-2) As shown in FIG. 2, in this network structure the fused feature has exactly the same scale as the feature from the skip connection layer combined with the attention gate mechanism; the two are concatenated, and two upsamplings with scale factor 2 together with 3 × 3 conventional convolutions with step size 1 convert the feature space from 128 × (n/4) × (n/4) to 64 × (n/2) × (n/2) and then to 32 × n × n.
(5-3) As shown in FIG. 2, 7 × 7 × 1 and 7 × 7 × 3 convolutions are applied to the 32 × n × n feature vector to obtain the texture attention map R and the color attention map C respectively, and these are fused with the original input image according to the following formula to obtain the aged image x_ij:

x_ij = R × x_i + (1 - R) × C
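For illustration, a minimal PyTorch sketch of this double-attention synthesis follows; the sigmoid/tanh output ranges for R and C are assumptions, while the 7 × 7 kernel sizes and the blending formula follow the text.

import torch
import torch.nn as nn

class DualAttentionHead(nn.Module):
    def __init__(self, in_ch: int = 32):
        super().__init__()
        self.texture = nn.Conv2d(in_ch, 1, kernel_size=7, padding=3)  # 7x7x1 -> R
        self.color = nn.Conv2d(in_ch, 3, kernel_size=7, padding=3)    # 7x7x3 -> C

    def forward(self, feat: torch.Tensor, x_i: torch.Tensor) -> torch.Tensor:
        r = torch.sigmoid(self.texture(feat))   # R in [0, 1], assumed range
        c = torch.tanh(self.color(feat))        # C in [-1, 1], matching normalized inputs
        return r * x_i + (1.0 - r) * c          # x_ij = R*x_i + (1-R)*C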
Sixthly, the unsupervised training process of the network model comprises the following steps:
(6) As shown in fig. 6, during network training the generated aging image must be judged for authenticity by the authenticity discriminator D, and the judgment result serves as the original GAN loss function L_GAN guiding the training of the coding network, decoding network and multi-layer perceptron:

L_GAN(E, G, M) = E_{x~P(x)} E_{y~P(y)} [(D(G(E(x), M(y))) - 1)^2]

where E_{x~P(x)} denotes the mathematical expectation over input images x following the distribution P(x); E(·) denotes the coding network, G(·) the decoding network and M(·) the multi-layer perceptron; P(x) denotes the true distribution of input images and P(y) the true distribution of target aging age labels; x denotes the original input face image and y the target aging age.
When training the discriminator D itself, the loss function is:

L_GAN(D) = E_{x~P(x)} E_{y~P(y)} [(D(G(E(x), M(y))))^2] + E_{x~P(x)} [(D(x) - 1)^2]

with the same notation as above.
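For illustration, a minimal sketch of these least-squares GAN objectives follows; the function names are assumptions, with d_fake and d_real standing for the discriminator's scores on generated and real images.

import torch

def generator_gan_loss(d_fake: torch.Tensor) -> torch.Tensor:
    # L_GAN(E, G, M): push D's score on generated images toward 1
    return ((d_fake - 1.0) ** 2).mean()

def discriminator_gan_loss(d_fake: torch.Tensor,
                           d_real: torch.Tensor) -> torch.Tensor:
    # L_GAN(D): push scores on generated images toward 0, on real images toward 1
    return (d_fake ** 2).mean() + ((d_real - 1.0) ** 2).mean()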
The age discriminator V adopts the existing VGG-FACE model, so it needs no training. The aging image is fed into the age discriminator to obtain an estimated age, which is compared with the target age to obtain the loss function L_age1:

L_age1 = E_{x~P(x)} E_{y~P(y)} [L_CE(V(G(E(x), M(y))), C(y_j))]

where P(y_j) denotes the target age distribution, C(y_j) the 101-dimensional one-hot vector encoding the target age y_j, and L_CE the cross-entropy loss function.
(7) As shown in fig. 6, paired images of the same subject at different ages are lacking for supervised learning during training, so the cycle consistency principle is adopted to solve this problem.
In theory, the above steps yield an image x_ij aged to the target age. For lack of supervised samples, the aging image x_ij is passed through the steps again to restore the original age i, giving the restored image x_iji, which is compared with the original image x_i to obtain the pixel-level loss function L_cycle:

L_cycle = ||x_i - x_iji||

where x_i denotes the original input image, x_iji the cyclically reconstructed image, and ||·|| the L1 norm.
Meanwhile, during training, the identity features obtained in the two encoding passes are constrained to guide the accuracy of the encoder's identity extraction; the loss function is the Pearson correlation coefficient L_id:

L_id = E[(Z_1 - μ(Z_1)) (Z_2 - μ(Z_2))] / (σ(Z_1) σ(Z_2))

where μ and σ are the mean and standard deviation respectively, Z_1 denotes the identity feature the coding network extracts from the input original image during aging, and Z_2 the identity feature it extracts from the input aging image during cyclic reconstruction.
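For illustration, a minimal sketch of this Pearson-correlation identity constraint follows; flattening the feature maps and negating the coefficient (so that minimizing the loss maximizes correlation) are assumptions.

import torch

def pearson_id_loss(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    # z1, z2: identity features from the two encoding passes, shape (B, C, n, n)
    z1, z2 = z1.flatten(1), z2.flatten(1)
    z1 = z1 - z1.mean(dim=1, keepdim=True)
    z2 = z2 - z2.mean(dim=1, keepdim=True)
    rho = (z1 * z2).sum(1) / (z1.norm(dim=1) * z2.norm(dim=1) + 1e-8)
    return -rho.mean()   # assumed sign: minimize negative correlation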
(8) As shown in fig. 6, to improve the quality and age accuracy of generated images, the reconstruction consistency principle is adopted.
In theory, taking the original image x_i and the original age i as the algorithm input yields the reconstructed image x_ii, which is compared with the original image x_i to obtain the pixel-level loss function L_recon and the age loss function L_age2:

L_recon = ||x_i - x_ii||
L_age2 = E_{x~P(x)} [L_CE(V(x_ii), C(y_i))]

where ||·|| denotes the L1 norm, P(y_i) the original age distribution, C(y_i) the 101-dimensional one-hot vector encoding the original age y_i, and L_CE the cross-entropy loss function.
(9) As shown in FIG. 6, the training of the generation network G of the lightweight face aging method based on the double attention mechanism takes L_all as the objective loss function, uses the Adam optimizer for parameter optimization, and guides the training of the coding network, multi-layer perceptron and decoding network of steps S3-S6:

L_all = λ_GAN L_GAN(E, G, M) + λ_recon L_recon + λ_cycle L_cycle + λ_id L_id + λ_age1 L_age1 + λ_age2 L_age2
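For illustration, a minimal sketch of the combined objective follows; the patent does not specify the λ weights, so the values below are placeholders only.

import torch

def total_loss(l_gan, l_recon, l_cycle, l_id, l_age1, l_age2,
               lambdas=(1.0, 10.0, 10.0, 1.0, 1.0, 1.0)) -> torch.Tensor:
    # L_all = weighted sum of the six losses defined above
    lg, lr, lc, li, la1, la2 = lambdas
    return (lg * l_gan + lr * l_recon + lc * l_cycle
            + li * l_id + la1 * l_age1 + la2 * l_age2)

# optimizer = torch.optim.Adam(generator_params, lr=1e-4)  # Adam, per step (9)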
it will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention. Those not described in detail in this specification are within the skill of the art.

Claims (10)

1. A lightweight face aging method based on a double attention mechanism, characterized by comprising the following steps:
S1, preprocessing the input face image to realize pixel normalization;
S2, encoding the input target aging age and converting it into a multi-dimensional age vector;
S3, extracting identity features from the preprocessed face image with a coding network to obtain high-dimensional identity features;
S4, feeding the multi-dimensional age vector into a multi-layer perceptron that gradually increases its dimensionality, mapping it to age-related high-dimensional age features;
S5, fusing the high-dimensional identity features and the high-dimensional age features through an adaptive instance normalization layer to obtain a fused feature vector;
S6, applying to the fused feature vector a decoding network constructed from skip connections, upsampling and multi-scale conventional convolution to obtain a texture attention map and a color attention map; and fusing the texture attention map, the color attention map and the input original image to obtain the face image aged to the target age.
2. The lightweight face aging method based on the double attention mechanism according to claim 1, wherein in step S2 an encoding method combining classification and regression is used: the age interval containing the target age is first identified, and the correlation coefficients between the target age and the interval boundaries are obtained by linear computation as the encoding result, yielding the multi-dimensional age vector.
3. The lightweight face aging method based on the double attention mechanism according to claim 1, wherein in step S3 the coding network comprises a conventional convolution module, depth-separable convolution modules with step size 2, inverted bottleneck residual modules and a mixed-domain attention module; the preprocessed face image is downsampled and its features extracted by the conventional convolution module and the depth-separable convolution modules with step size 2; the inverted bottleneck residual modules deepen the sampling depth of identity feature extraction; and the mixed-domain attention module strengthens the coding network's extraction of important identity-feature regions.
4. The lightweight face aging method based on the double attention mechanism according to claim 1, wherein in step S5 the single high-dimensional age feature is split by format conversion into 2 age feature vectors of the same dimensionality as the high-dimensional identity feature; feature fusion of the high-dimensional identity feature with the 2 age feature vectors using an adaptive instance normalization layer gives a fused feature vector AdaIN(Z, L) of the same dimensionality as the identity feature:

AdaIN(Z, L) = α(L) × (Z - μ(Z)) / σ(Z) + β(L)

where Z denotes the identity feature, L the age feature, μ(Z) and σ(Z) the mean and standard deviation of the identity feature computed by the standard formulas, and α(L) and β(L) the 2 age feature vectors after format conversion of the age feature.
5. The lightweight face aging method based on the double attention mechanism according to claim 1, wherein in step S6 a decoding network is constructed from skip connection layers combined with an attention gate mechanism, upsampling modules and conventional convolution modules; after the decoding network performs feature dimensionality reduction and scale expansion on the fused feature vector, 2 independent conventional convolution modules process the decoder output image to generate a texture attention map R and a color attention map C matching the scale of the input image; the unprocessed original face image x_i, the texture attention map R and the color attention map C are fused according to the following formula to obtain the aged face image x_ij for target age j:
x_ij = R × x_i + (1 - R) × C.
6. The lightweight face aging method based on the double attention mechanism according to claim 1, characterized by further comprising step S7:
training an authenticity discriminator to judge the face aging image finally aged to the target age: the authenticity discriminator detects whether an input picture was generated in step S6, and an authenticity error loss value is computed; an age discriminator estimates the age of the aged face in the input picture, and an age error loss value is computed; the authenticity error loss value and the age error loss value jointly guide the training of the coding network, multi-layer perceptron and decoding network of steps S3-S6.
7. The lightweight face aging method based on the double attention mechanism according to claim 1, characterized by further comprising step S8:
following the cycle consistency principle, the face aging image obtained in step S6 and the original age of the face image used in step S1 are taken as the original face image and the target aging age, and steps S1-S6 are executed again to restore the aged image to the original age; the resulting face image is compared with the original face image used in step S1 by a pixel-level loss, which guides the training of the coding network, multi-layer perceptron and decoding network of steps S3-S6.
8. The lightweight face aging method based on the double attention mechanism according to claim 1, characterized by further comprising step S9:
following the image reconstruction consistency principle, the original input image and the original age are taken as the original face image and the target aging age, and steps S1-S6 are executed again to obtain an image of the original face at its original age; this image is compared with the original input face image by a pixel-level loss and an age loss, which guide the training of the coding network, multi-layer perceptron and decoding network of steps S3-S6.
9. The lightweight face aging method based on the double attention mechanism according to claim 2, wherein in step S2, with an age-interval width N = 10, ages 0-100 are divided into 10 intervals indexed 0-1, 1-2, 2-3, ..., 9-10; the input target age j is obtained and its interval determined, with boundaries A, the floor of j/N, and B, the floor of j/N plus 1; the association coefficients p, q between the target age and the interval are obtained from the determined interval by solving the following system:
p+q=1
A×p+B×q=j
The encoding result of the target aging age j is the 11-dimensional vector T_j:

T_j[m] = p if m = A, q if m = B, and 0 otherwise, where m ∈ [0, 10] and m is an integer.
10. A computer-readable storage medium, characterized in that it stores a program for the double-attention high-resolution face image aging method of a lightweight network; when executed by a processor, the program implements the steps of the method according to any one of claims 1 to 9.
CN202210095562.3A 2022-01-26 2022-01-26 Lightweight face aging method based on double attention mechanism Pending CN114445889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210095562.3A CN114445889A (en) 2022-01-26 2022-01-26 Lightweight face aging method based on double attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210095562.3A CN114445889A (en) 2022-01-26 2022-01-26 Lightweight face aging method based on double attention mechanism

Publications (1)

Publication Number Publication Date
CN114445889A true CN114445889A (en) 2022-05-06

Family

ID=81369782

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210095562.3A Pending CN114445889A (en) 2022-01-26 2022-01-26 Lightweight face aging method based on double attention mechanism

Country Status (1)

Country Link
CN (1) CN114445889A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115311720A (en) * 2022-08-11 2022-11-08 Shandong Artificial Intelligence Institute Deepfake generation method based on Transformer
CN115311720B (en) * 2022-08-11 2023-06-06 Shandong Artificial Intelligence Institute Deepfake generation method based on Transformer

Similar Documents

Publication Publication Date Title
CN112149504B (en) Motion video identification method combining mixed convolution residual network and attention
Guo et al. Content-based image retrieval using error diffusion block truncation coding features
CN113177882B (en) Single-frame image super-resolution processing method based on diffusion model
CN110689599A (en) 3D visual saliency prediction method for generating countermeasure network based on non-local enhancement
Li et al. Example-based image super-resolution with class-specific predictors
CN111932458B (en) Image information extraction and generation method based on inter-region attention mechanism
Wang et al. Semantic perceptual image compression with a Laplacian pyramid of convolutional networks
CN114757864B (en) Multi-level fine granularity image generation method based on multi-scale feature decoupling
CN111210382A (en) Image processing method, image processing device, computer equipment and storage medium
CN110852935A (en) Image processing method for human face image changing with age
Löhdefink et al. On low-bitrate image compression for distributed automotive perception: Higher peak snr does not mean better semantic segmentation
CN111986132A (en) Infrared and visible light image fusion method based on DLatLRR and VGG & Net
CN116168197A (en) Image segmentation method based on Transformer segmentation network and regularization training
CN116664435A (en) Face restoration method based on multi-scale face analysis map integration
CN117314808A (en) Infrared and visible light image fusion method combining transducer and CNN (carbon fiber network) double encoders
Löhdefink et al. GAN-vs. JPEG2000 image compression for distributed automotive perception: Higher peak SNR does not mean better semantic segmentation
CN111967358A (en) Neural network gait recognition method based on attention mechanism
CN114445889A (en) Lightweight face aging method based on double attention mechanism
CN118212463A (en) Target tracking method based on fractional order hybrid network
CN114283301A (en) Self-adaptive medical image classification method and system based on Transformer
CN117078539A (en) CNN-transducer-based local global interactive image restoration method
CN116486495A (en) Attention and generation countermeasure network-based face image privacy protection method
CN108259914B (en) Cloud image encoding method based on object library
EP4164221A1 (en) Processing image data
CN115147317A (en) Point cloud color quality enhancement method and system based on convolutional neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination