CN113065525B - Age identification model training method, face age identification method and related device - Google Patents


Info

Publication number
CN113065525B
Authority
CN
China
Prior art keywords
face
age
sample
face image
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110456649.4A
Other languages
Chinese (zh)
Other versions
CN113065525A (en)
Inventor
陈仿雄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Original Assignee
Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Shuliantianxia Intelligent Technology Co Ltd
Priority to CN202110456649.4A
Publication of CN113065525A
Application granted
Publication of CN113065525B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/178 Human faces, e.g. facial parts, sketches or expressions estimating age from face image; using age information for improving recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T 10/00 Road transport of goods or passengers
    • Y02T 10/10 Internal combustion engine [ICE] based vehicles
    • Y02T 10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The application provides an age identification model training method, a face age identification method and a related device. The age identification model training method comprises the following steps: obtaining a sample face image group; inputting the sample face image group into a face age recognition model to respectively obtain high-level semantic feature vectors and predicted ages corresponding to two different sample face images; respectively calculating the difference between the predicted age and the age label corresponding to each of the two different sample face images to obtain the age loss of the face age recognition model; calculating the similarity between the high-level semantic feature vectors corresponding to the two different sample face images to obtain the feature similarity loss of the face age recognition model; and iteratively adjusting the parameters of the face age recognition model based on the age loss and the feature similarity loss to obtain a target face age recognition model. The technical scheme can improve the recognition accuracy of the face age recognition model.

Description

Age identification model training method, face age identification method and related device
Technical Field
The application relates to the field of facial image recognition, in particular to an age recognition model training method, a facial age recognition method and a related device.
Background
A face image contains various kinds of face feature information, such as face shape, skin state, facial features and face age. Among these, face age, as an important piece of feature information, is widely used in the field of face image recognition. For example, some clients running on mobile devices have a face age recognition function: the client acquires a face image and outputs the age recognized from the acquired face image as feedback to the user, so that the client interacts with the user and increases user stickiness.
For these clients with the face age recognition function, the accuracy of age recognition, that is, the gap between the recognized age and the user's true age, is of particular concern. At present, in related face age recognition technology, the true age of a face is generally used as a single piece of label information: the true age serves as the label of a face image, a one-to-one correspondence is established between the face image and the true age, and the face age recognition model is then trained on these pairs. Because each user's identity is unique, the facial features of different faces of the same age differ. During training, every training face image input to the face age recognition model is a new image for the model, and the model can only learn the face features of the training data. When the trained model is used to recognize face age, a newly input face image is again a new image whose face features the model has not learned, so the model has difficulty adapting to the identity uniqueness of the newly input face image, and its recognition effect on such images is poor.
Disclosure of Invention
The application provides an age identification model training method, a face age identification method and a related device, which are used for solving the technical problem of poor identification effect in the existing face age identification technology.
In a first aspect, there is provided a method of training an age identification model, the method comprising the steps of:
obtaining a sample face image group, wherein the sample face image group comprises two different sample face images with the same age label;
inputting the sample face image group into a face age recognition model to respectively obtain high-level semantic feature vectors and predicted ages corresponding to the two different sample face images, wherein the face age recognition model comprises two layers of face age recognition networks, one layer of face age recognition network is used for obtaining the high-level semantic feature vector and the predicted age corresponding to one sample face image in the sample face image group, and the high-level semantic feature vector of one sample face image is used for expressing the face features of the one sample face image;
respectively calculating the difference between the predicted age and the age label corresponding to each of the two different sample face images to obtain the age loss of the face age recognition model;
calculating the similarity between the high-level semantic feature vectors corresponding to the two different sample face images to obtain the feature similarity loss of the face age recognition model;
and iteratively adjusting parameters of the face age recognition model based on the age loss and the feature similarity loss to obtain a target face age recognition model.
According to the above technical scheme, two different sample face images with the same age label are input into a face age recognition model comprising two layers of face age recognition networks, and the high-level semantic feature vectors and predicted ages of the two sample face images are obtained through the two layers of face age recognition networks respectively. The difference between the predicted age of each sample face image and the age label is then calculated to obtain the age loss of the face age recognition model, the similarity between the high-level semantic feature vectors of the two sample face images is calculated to obtain the feature similarity loss of the face age recognition model, and finally iterative parameter adjustment is carried out on the face age recognition model based on the age loss and the feature similarity loss to obtain the target face age recognition model. Since each face age recognition network acquires the high-level semantic feature vector and predicted age of one sample face image, the face features of the two sample face images can be extracted relatively independently through the two-layer face age recognition network, which helps the face age recognition model better compare the feature commonalities and feature differences of the two sample face images. Because the two sample face images correspond to the same age, introducing, on the basis of the age loss, the feature similarity loss calculated from the high-level semantic feature vectors of two sample face images belonging to the same age enables the face age recognition model, during iterative parameter adjustment based on its losses, to learn from the various face features those face features that are related to age information and irrelevant to identity information, thereby establishing the correspondence between such face features and age information and improving the recognition accuracy of the face age recognition model.
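As a concrete illustration of this pairwise training scheme, the following is a minimal PyTorch-style sketch of one training step. The names (model.branch1, model.branch2, the weights a and b) are illustrative assumptions rather than elements of the patent, and the cosine-based similarity term is only a placeholder for the metric-matrix similarity loss described later in the specification.

```python
import torch.nn.functional as F

def train_step(model, optimizer, img1, img2, age_label, a=1.0, b=1.0):
    # img1, img2: two different sample face images sharing the same age label
    # age_label: class index of the shared true age (LongTensor of shape (B,))
    feat1, logits1 = model.branch1(img1)   # high-level semantic feature vector + age logits
    feat2, logits2 = model.branch2(img2)

    # Age loss: deviation between each branch's prediction and the shared label.
    age_loss = F.cross_entropy(logits1, age_label) + F.cross_entropy(logits2, age_label)

    # Feature similarity loss: a simple (1 - cosine similarity) placeholder here;
    # the patent instead uses a learnable similarity measurement matrix M.
    sim_loss = (1.0 - F.cosine_similarity(feat1, feat2, dim=1)).mean()

    total = a * age_loss + b * sim_loss    # weighted total loss
    optimizer.zero_grad()
    total.backward()
    optimizer.step()
    return total.item()
```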
With reference to the first aspect, in a possible implementation manner, the network structure of each layer of face age recognition network is the same; each layer of face age recognition network comprises a feature extraction module, a first full-connection layer and a second full-connection layer, and the feature extraction module is connected with the first full-connection layer and the second full-connection layer respectively. The inputting the sample face image group into the face age recognition model to obtain the high-level semantic feature vectors and predicted ages corresponding to the two different sample face images includes: performing feature extraction on a target sample face image through the feature extraction module of a target face age recognition network to obtain a feature map of the target sample face image, wherein the target face age recognition network is any layer of face age recognition network in the face age recognition model, and the target sample face image is any sample face image in the sample face image group; extracting high-level semantic features from the feature map through the first full-connection layer to obtain the high-level semantic feature vector of the target sample face image; and performing age prediction on the feature map through the second full-connection layer to obtain the predicted age of the target sample face image. By setting two full-connection layers for each layer of face age recognition network, the high-level semantic feature vector and the predicted age of the sample face image can be obtained respectively, and because the two full-connection layers share the parameters of the feature extraction module, the face age recognition model can be simplified and the operation efficiency improved.
With reference to the first aspect, in one possible implementation manner, the calculating the similarity between the high-level semantic feature vectors corresponding to the two different sample face images to obtain the feature similarity loss of the face age recognition model includes: calculating the similarity between the high-level semantic feature vectors corresponding to the two different sample face images according to the following formula to obtain the feature similarity loss of the face age recognition model: wherein L1 is the feature similarity loss, X1 is the high-level semantic feature vector corresponding to one of the sample face images in the sample face image group, X2 is the high-level semantic feature vector corresponding to the other sample face image in the sample face image group, and M is a similarity measurement matrix. Introducing the similarity measurement matrix is equivalent to setting weights on the face features and enables a differentiated measurement of the face features, so that the face age recognition model can better learn the face features related to age information.
With reference to the first aspect, in one possible implementation manner, the respectively calculating the difference between the predicted age and the age label corresponding to each of the two different sample face images to obtain the age loss of the face age recognition model includes: respectively calculating the cross entropy between the predicted age and the age label corresponding to each of the two different sample face images to obtain the cross entropy corresponding to each of the two different sample face images; and determining the age loss of the face age recognition model according to the cross entropies corresponding to the two different sample face images. Measuring the difference between the real age and the predicted age output by the face age recognition model with cross entropy allows the face age recognition model to converge more quickly, which improves the training speed of the face age recognition model.
With reference to the first aspect, in one possible implementation manner, the acquiring a sample face image group includes: acquiring a face sample image set; performing sample data enhancement on the face sample image set; and acquiring, from the face sample image set subjected to sample data enhancement, two different sample face images belonging to the same age as the sample face image group. Performing data enhancement on the face sample image set enriches the variety of sample face images in the set, so that the face age recognition model can be trained better.
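The patent does not fix which enhancement operations are used; the snippet below is only a plausible torchvision pipeline (flips, mild color jitter, random crops to the 320x320 input size used later in the description), and every operation and parameter in it is an assumption.

```python
from torchvision import transforms

# Hypothetical sample data enhancement pipeline; operations and parameters are assumptions.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                 # mirror the face
    transforms.ColorJitter(brightness=0.2, contrast=0.2),   # mild photometric change
    transforms.RandomResizedCrop(320, scale=(0.9, 1.0)),    # crop to the 320x320 input size
    transforms.ToTensor(),
])
```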
In a second aspect, a face age identification method is provided, including the following steps:
acquiring a face image to be identified;
inputting the face image to be identified into a target face age identification model to obtain two predicted age values corresponding to the face image to be identified, wherein the target face age identification model is obtained by training the method of the first aspect;
and taking the average value of the two predicted age values as the face age of the face image to be identified.
Because the target face age recognition model trained by the method of the first aspect has learned, from the various face features, the face features that are related to age information and irrelevant to identity information, and has established the correspondence between such face features and age information, the target face age recognition model can recognize the age of the face image to be recognized by using these learned face features and the established correspondence. This removes the influence of identity-related face features on age recognition and improves the accuracy of face age recognition.
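A minimal inference sketch of the second aspect, assuming the trained two-branch model from the earlier training sketch; the branch attribute names and the 1-to-100 age indexing are assumptions.

```python
import torch

@torch.no_grad()
def predict_age(trained_model, face_image):
    # Each branch yields a feature vector (unused here) and per-age logits.
    _, logits1 = trained_model.branch1(face_image)
    _, logits2 = trained_model.branch2(face_image)
    age1 = logits1.argmax(dim=1).float() + 1  # assume index 0 corresponds to age 1
    age2 = logits2.argmax(dim=1).float() + 1
    return (age1 + age2) / 2.0                # mean of the two predicted age values
```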
In a third aspect, there is provided an age identification model training apparatus, comprising:
the first acquisition module is used for acquiring a sample face image group, wherein the sample face image group comprises two different sample face images with the same age label;
the result prediction module is used for inputting the sample face image group into a face age recognition model to respectively obtain high-level semantic feature vectors and predicted ages corresponding to the two different sample face images, wherein the face age recognition model comprises two layers of face age recognition networks, one layer of face age recognition network is used for obtaining the high-level semantic feature vector and the predicted age corresponding to one sample face image in the sample face image group, and the high-level semantic feature vector of one sample face image is used for expressing the face features of the one sample face image;
the first loss calculation module is used for respectively calculating the difference between the predicted age and the age label corresponding to each of the two different sample face images to obtain the age loss of the face age recognition model;
the second loss calculation module is used for calculating the similarity between the high-level semantic feature vectors corresponding to the two different sample face images to obtain the feature similarity loss of the face age recognition model;
And the parameter adjusting module is used for carrying out iterative parameter adjustment on the face age identification model based on the age loss and the characteristic similarity loss so as to obtain a target face age identification model.
With reference to the third aspect, in one possible design, the network structure of each layer of face age identification network is the same; each layer of face age identification network comprises a feature extraction module, a first full-connection layer and a second full-connection layer, wherein the feature extraction module is respectively connected with the first full-connection layer and the second full-connection layer; the result prediction module is specifically configured to: extracting features of a target sample face image through a feature extraction module of a target face age recognition network to obtain a feature image of the target sample face image, wherein the target face age recognition network is any one layer of face age recognition network in the face age recognition model, and the target sample face image is any one sample face image in the sample face image group; extracting high-level semantic features of the feature map through the first full-connection layer to obtain high-level semantic feature vectors of the target sample face image; and carrying out age prediction on the feature map through the second full-connection layer so as to obtain the predicted age of the face image of the target sample.
With reference to the third aspect, in one possible design, the second loss calculation module is specifically configured to: calculate the similarity between the high-level semantic feature vectors corresponding to the two different sample face images according to the following formula to obtain the feature similarity loss of the face age recognition model: wherein L1 is the feature similarity loss, X1 is the high-level semantic feature vector corresponding to one of the sample face images in the sample face image group, X2 is the high-level semantic feature vector corresponding to the other sample face image in the sample face image group, and M is a similarity measurement matrix.
With reference to the third aspect, in one possible design, the first loss calculation module is specifically configured to: respectively calculate the cross entropy between the predicted age and the age label corresponding to each of the two different sample face images to obtain the cross entropy corresponding to each of the two different sample face images; and determine the age loss of the face age recognition model according to the cross entropies corresponding to the two different sample face images.
With reference to the third aspect, in one possible design, the first obtaining module is specifically configured to: acquiring a face sample image set; and carrying out sample data enhancement on the face sample image set, and acquiring two different sample face images belonging to the same age from the face sample image set subjected to sample data enhancement to serve as the sample face image group.
In a fourth aspect, a face age recognition apparatus is provided, including:
the second acquisition module is used for acquiring the face image to be identified;
the age prediction module is used for inputting the face image to be recognized into a target face age recognition model to obtain two predicted age values corresponding to the face image to be recognized, wherein the target face age recognition model is trained by the method of the first aspect;
and the age calculation module is used for taking the average value of the two predicted age values as the face age of the face image to be identified.
In a fifth aspect, a computer device is provided, comprising a memory and one or more processors, the one or more processors being configured to execute one or more computer programs stored in the memory; when executing the one or more computer programs, the one or more processors cause the computer device to implement the age recognition model training method of the first aspect or the face age recognition method of the second aspect.
In a sixth aspect, there is provided a computer readable storage medium storing a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the age-recognition model training method of the first aspect or the face age-recognition method of the second aspect.
The application can realize the following beneficial effects: since each face age recognition network acquires the high-level semantic feature vector and predicted age of one sample face image, the face features of the two sample face images can be extracted relatively independently through the two-layer face age recognition network, which helps the face age recognition model better compare the feature commonalities and feature differences of the two sample face images. Because the two sample face images correspond to the same age, introducing, on the basis of the age loss, the feature similarity loss calculated from the high-level semantic feature vectors of two sample face images belonging to the same age enables the face age recognition model, during iterative parameter adjustment based on its losses, to learn the face features that are related to age information and irrelevant to identity information, thereby establishing the correspondence between such face features and age information and improving the recognition accuracy of the face age recognition model.
Drawings
FIG. 1 is a flow chart of an age identification model training method according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a face age identification network according to an embodiment of the present application;
FIG. 3 is a schematic flow chart of a face age identification method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an age identification model training device according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of a face age identification device according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the accompanying drawings in the embodiments of the present application.
The technical scheme of the application can be suitable for various scenes of face recognition, and particularly can be used for recognizing the age of the face corresponding to the face image in the scene of face recognition. In a face recognition scene, in some implementations, a face image of the scene is recognized by a face age recognition model with an age recognition function, so as to determine a face age corresponding to the face image, wherein the face age recognition model is obtained through pre-training.
In the process of obtaining a face age recognition model through training, a large number of face images are obtained and each face image is labeled with a corresponding age label; each face image and its age label are then input into the not-yet-trained face age recognition model for training, so that the age recognition result output by the face age recognition model for each face image comes as close as possible to the corresponding age label. In this way the face age recognition model learns the face features of the face images and acquires the capability of distinguishing face images of different ages, and a face age recognition model with this capability can be used to recognize the face age corresponding to a face image.
In order to facilitate understanding of the technical scheme of the application, a training process of a face age recognition model is specifically introduced through an example. Taking the age range to be identified between 1 and 100, training the face age identification model through 5 face images at a time as an example, assume that the face age corresponding to the face image 1 is 20 years old, the face age corresponding to the face image 2 is 25 years old, the face age corresponding to the face image 3 is 37 years old, the face age corresponding to the face image 4 is 55 years old, and the face age corresponding to the face image 5 is 86 years old.
The primary training process is as follows:
1) And labeling each face image with a face age label. Specifically, since the age range to be identified is between 1 and 100, a 100-dimensional vector is used as the age label corresponding to one face image, and the 100-dimensional vector indicates the face age corresponding to that face image. In the 100-dimensional vector corresponding to face image 1, the 20th bit is 1 and the remaining bits are 0; in the vector corresponding to face image 2, the 25th bit is 1 and the remaining bits are 0; in the vector corresponding to face image 3, the 37th bit is 1 and the remaining bits are 0; in the vector corresponding to face image 4, the 55th bit is 1 and the remaining bits are 0; in the vector corresponding to face image 5, the 86th bit is 1 and the remaining bits are 0. That is, in the 100-dimensional vector, the position corresponding to the age of the face image is marked as 1 and the other positions are marked as 0, so as to represent the face age corresponding to the face image. This way of labeling is also known as one-hot coding.
2) And inputting each face image into a face age recognition model, and outputting an age detection result aiming at each face image by the face age recognition model, wherein each age detection result comprises the probability of the face image corresponding to each age. Specifically, after 5 face images are input to the face age recognition model, each detection result output by the face age recognition model is a 100-dimensional vector, each 100-dimensional vector contains 100 probability values, and each probability value is in a range of 0 to 1 so as to respectively indicate the probability that the age of the face image corresponding to the 100-dimensional vector is 1-100. That is, after the face images 1 to 5 are input into the face age recognition model, the age recognition model outputs a 100-dimensional vector corresponding to the face image 1 for indicating the probability that the face in the face image 1 is 1 year old, the probability that the face is 2 years old, the probability that the face is 3 years old … … and the probability that the face is 100 years old; outputting a 100-dimensional vector corresponding to the face image 2, wherein the 100-dimensional vector is used for indicating the probability that the face in the face image 2 is 1 year old, the probability that the face is 2 years old, the probability that the face is 3 years old … … and the probability that the face is 100 years old; outputting a 100-dimensional vector corresponding to the face image 3, wherein the 100-dimensional vector is used for indicating the probability that the face in the face image 3 is 1 year old, the probability that the face is 2 years old, the probability that the face is 3 years old … … and the probability that the face is 100 years old; outputting a 100-dimensional vector corresponding to the face image 4, wherein the 100-dimensional vector is used for indicating the probability that the face in the face image 4 is 1 year old, the probability that the face is 2 years old, the probability that the face is 3 years old … … and the probability that the face is 100 years old; and outputting a 100-dimensional vector corresponding to the face image 5, wherein the 100-dimensional vector is used for indicating the probability that the face in the face image 5 is 1 year old, the probability that the face is 2 years old and the probability that the face is 3 years old, and the probability that the face is … … is 100 years old.
3) And calculating the difference between the output result of the face age recognition model and the age label to determine the loss of the face age recognition model, wherein the loss of the face age recognition model represents the accuracy of age recognition, and the smaller the loss is, the higher the accuracy of the face age recognition model is, and the closer the accuracy is to the real situation. Specifically, calculating the gap between the age label of the face image 1 and the age detection result of the face image 1 to obtain a gap 1; calculating the difference between the age label of the face image 2 and the age detection result of the face image 2 to obtain a difference 2; calculating the difference between the age label of the face image 3 and the age detection result of the face image 3 to obtain a difference 3; calculating the difference between the age label of the face image 4 and the age detection result of the face image 4 to obtain a difference 4; calculating the difference between the age label of the face image 5 and the age detection result of the face image 5 to obtain a difference 5; and summing and averaging the gaps 1, 2, 3, 4 and 5 to obtain the loss of the face age recognition model.
4) And adjusting the internal parameters of the whole face age recognition model according to the loss; a numerical sketch of steps 1) to 3) follows below.
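The following is a PyTorch sketch of the labeling and loss computation in steps 1) to 3) for the five example images. The random probabilities merely stand in for real model outputs, and cross entropy is used as the gap measure here only because the later description adopts it; nothing in this snippet is prescribed by the patent itself.

```python
import torch
import torch.nn.functional as F

ages = torch.tensor([20, 25, 37, 55, 86])                  # true ages of face images 1-5
labels = F.one_hot(ages - 1, num_classes=100).float()      # bit at the true age is 1, rest 0

probs = torch.softmax(torch.randn(5, 100), dim=1)          # stand-in for the model's 100-dim outputs

gaps = -(labels * probs.clamp_min(1e-12).log()).sum(dim=1) # gap 1 ... gap 5
loss = gaps.mean()                                         # summed and averaged model loss
```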
The above is a single training and parameter-adjusting pass of the face age recognition model; in the actual training process, a large number of face images are obtained, the real ages corresponding to the face images are used as face age labels, and iterative parameter-adjusting training is carried out repeatedly. A face image contains various face features: besides the face features representing age, there are also face features representing the identity of the user. For the trained face age recognition model, in terms of face features in the user identity dimension, a newly input face image belongs to a new class of images, and the face age recognition model has not learned the identity-related face features of the newly input face image; these features therefore interfere with the model and the recognition effect is poor.
In view of this, the technical scheme of the application proposes a new method for training a face age recognition model. After a large number of face images of different users are obtained, the obtained face images are classified according to age, and two face images belonging to the same age are used as a group of training samples. The face features of the two face images in the group are extracted respectively through a pre-established two-layer face age recognition network to obtain the high-level semantic feature vectors and predicted ages of the two face images, and the loss of the face age recognition model is then calculated based on these high-level semantic feature vectors and predicted ages. On the basis of the conventionally calculated age loss, a feature similarity loss for measuring the similarity of the face features of the two face images is added, so that the face age recognition model can learn the face features that are related to age information and irrelevant to identity information. This eliminates the influence caused by the identity uniqueness of the users, improves the accuracy of model training, and improves the recognition accuracy of the face age recognition model.
The technical scheme of the application is specifically described below.
Referring first to fig. 1, fig. 1 is a flowchart of an age identification model training method according to an embodiment of the present application, where the method may be applied to various face recognition devices, as shown in fig. 1, and the method includes the following steps:
S101, acquiring a sample face image group, wherein the sample face image group comprises two different sample face images with the same age labels.
Here, the sample face image refers to a face image for training a face age recognition model, and the age label refers to a label for indicating the true age of a face (i.e., user) in the face image; the sample face image group refers to an image combination consisting of two different sample face images belonging to the same age, that is, the sample face image group comprises two sample face images, and the real ages of faces in the two sample face images are the same. For each age, a plurality of different face images with the true age being the age can be obtained to be used as face sample subsets corresponding to the ages, and the face sample subsets corresponding to the ages are combined to obtain a face image sample set. Any two face images obtained from a face sample subset corresponding to any age may be referred to as a sample face image group.
In a specific implementation, a face image set can be obtained from various public face databases as the face image sample set, the sample face images in the face image sample set are classified according to age to obtain the face sample subset corresponding to each age, and two face images are then taken at random from the face sample subset corresponding to any age as a sample face image group. It should be understood that in the actual training process, multiple sample face image groups can be obtained at the same time for training the face age recognition model; because each sample face image group is processed in the same way within one training pass, the embodiment of the application describes the training process of the face age recognition model in terms of the processing of one sample face image group.
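A small helper sketching how such a sample face image group could be drawn; the dictionary structure and function name are assumptions, the only requirement from the text being that the two images differ but share the same true age.

```python
import random

def sample_pair(image_paths_by_age):
    # image_paths_by_age: hypothetical dict mapping an age to the list of face
    # image paths in that age's face sample subset.
    eligible = [age for age, imgs in image_paths_by_age.items() if len(imgs) >= 2]
    age = random.choice(eligible)
    img1, img2 = random.sample(image_paths_by_age[age], 2)  # two distinct images, same age
    return img1, img2, age
```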
S102, inputting the sample face image group into a face age recognition model to respectively obtain high-level semantic feature vectors and predicted ages corresponding to two different sample face images.
In the embodiment of the application, the face age recognition model comprises two layers of face age recognition networks, wherein the two layers of face age recognition networks are mutually independent in network structure (namely, no connection relation exists between the two layers of face age recognition networks), and one layer of face age recognition network is used for acquiring a high-level semantic feature vector and predicted age of one face image input into the face age recognition model. The high-level semantic feature vector refers to a vector which is extracted by the face age recognition network and used for representing the face features of the face image input to the face age recognition network; the predicted age is an age obtained by age prediction of a face image input to the face age recognition network by the face age recognition network. The sample face image group comprises two different sample face images, and the high-level semantic feature vectors and the predicted ages corresponding to the two different sample face images can be respectively obtained through the two-level face age recognition network. The face age recognition network may be any network structure capable of extracting high-level semantic feature vectors and predicting ages of face images.
In some embodiments, the network structure of the two layers of face age recognition networks may be identical.
In some possible embodiments, each layer of face age recognition network includes a feature extraction module, a first full-connection layer, and a second full-connection layer, where the first full-connection layer is different from the second full-connection layer, and the feature extraction module is connected to the first full-connection layer and the second full-connection layer, respectively. The feature extraction module is used for extracting face features of face images input to the face age recognition network; the first full-connection layer is used for carrying out feature splicing and fusion on the face features extracted by the feature extraction module to form a vector; the second full-connection layer is used for carrying out age prediction based on the face features extracted by the feature extraction module to obtain the possibility that the faces in the face images belong to each age so as to obtain the predicted ages of the face images.
Because the network structures of the two layers of face age recognition networks are the same, the processing procedure of each layer of face age recognition network on the input sample face image is the same, and the processing procedure of the face age recognition network on one sample face image is as follows: extracting features of the sample face image through a feature extraction module to obtain a feature map of the sample face image; and performing high-level semantic feature extraction on the feature map of the sample face image through the first full-connection layer to obtain a high-level semantic feature vector of the sample face image, and performing age prediction on the feature map of the sample face image through the second full-connection layer to obtain the predicted age of the sample face image. Specifically, the feature extraction module may be any network module capable of extracting facial features, for example, may be a network module composed of multiple convolution layers, or a network module composed of multiple serially connected residual structures, or a network module composed of multiple feature extraction operators, or the like, which is not limited by the present application.
Illustratively, the face age recognition network may be as shown in fig. 2, where the feature extraction module is composed of 5 sequentially connected convolution layers, each of which employs a ReLU function as its activation function. For a sample face image with a size of 320×320×3 input to the face age recognition network, the processing procedure is as follows:
(1) Processing the sample face image through a convolution layer C1, wherein the convolution layer C1 comprises 16 convolution kernels with the size of 3*3 and the step length of 2, and after the face image is subjected to convolution processing through the convolution layer C1 and is activated by an activation function, 16 feature images with the size of 160 x 160 can be obtained; the 16 feature maps with the sizes of 160×160 obtained by the processing of the convolution layer C1 are input to the convolution layer C2.
(2) Processing 16 feature images with the size of 160 x 160 through a convolution layer C2, wherein the convolution layer C2 comprises 32 convolution kernels with the size of 3*3 and the step size of 2, and after the convolution layer C2 carries out convolution processing on the 16 feature images with the size of 160 x 160 and activation by using an activation function, 32 feature images with the size of 80 x 80 can be obtained; the 32 feature maps with the sizes of 80 x 80, which are obtained by processing the convolution layer C2, are input to the convolution layer C3.
(3) Processing 32 feature images with the size of 80 x 80 through a convolution layer C3, wherein the convolution layer C3 comprises 64 convolution kernels with the size of 3*3 and the step length of 2, and after the convolution layer C3 carries out convolution processing on the 32 feature images with the size of 80 x 80 and activation by using an activation function, 64 feature images with the size of 40 x 40 can be obtained; the 64 feature maps with the size of 40 x 40 obtained by the processing of the convolution layer C3 are input to the convolution layer C4.
(4) Processing 64 feature maps with the size of 40 x 40 through a convolution layer C4, wherein the convolution layer C4 comprises 128 convolution kernels with the size of 3*3 and the step length of 2, and after the convolution layer C4 carries out convolution processing on the 64 feature maps with the size of 40 x 40 and activation by using an activation function, 128 feature maps with the size of 20 x 20 can be obtained; the 128 feature maps with the size of 20 x 20 obtained by the processing of the convolution layer C4 are input to the convolution layer C5.
(5) Processing the 128 feature maps with the size of 20 x 20 through a convolution layer C5, wherein the convolution layer C5 comprises 256 convolution kernels with the size of 3*3 and the step length of 2, and after the convolution layer C5 carries out convolution processing on the 128 feature maps with the size of 20 x 20 and activation by using an activation function, 256 feature maps with the size of 10 x 10 can be obtained; the 256 feature maps with the size of 10 x 10, which are obtained through the processing of the convolution layer C5, are input to the first full-connection layer A1 and the second full-connection layer A2.
(6) The first full-connection layer comprises a feature matrix of size (256×10×10)×m. The 256 feature maps with the size of 10 x 10 are spliced and then multiplied by the feature matrix to obtain a vector 1 with dimension m, where vector 1 is the high-level semantic feature vector corresponding to the sample face image and m is the number of feature dimensions of the high-level semantic feature vector.
(7) The second full-connection layer comprises a feature matrix of size (256×10×10)×n. After full connection and normalization processing of the 256 feature maps with the size of 10 x 10, a vector 2 with dimension n is obtained, where n is the number of age values; the values in vector 2 respectively indicate the probability that the face in the sample face image belongs to each of the n age values, and the age value with the maximum probability is the predicted age of the sample face image.
Based on the face age recognition network with the sequentially connected multi-layer convolution layers as the feature extraction modules shown in fig. 2, the full-range face feature extraction from global features to local features can be realized along with the gradual deepening of the convolution layers, the gradual shrinking of the feature graphs and the gradual increasing of the number of convolution kernels, the extracted face features are ensured to be enough and fine enough, and the face age recognition model is favorable for better finding the face features which are related to age information and are irrelevant to identity information. The two full-connection layers share parameters of the feature extraction module, so that the structure of the face age identification network can be simplified, and the operation efficiency is improved. It should be understood that the structure, size, number of convolution kernels, full connection layer arrangement, etc. of the feature extraction module in fig. 2 are merely examples of the present application, and are not limited thereto, and in practical applications, other feature extraction structures and full connection layers may be designed according to the size of the sample face image and the required feature number.
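The described branch (five 3x3, stride-2 convolution layers with 16/32/64/128/256 kernels followed by two fully connected heads) can be sketched in PyTorch as below; a padding of 1 is assumed so that the feature map sizes match the 160/80/40/20/10 progression above, and the class and attribute names are illustrative rather than taken from the patent.

```python
import torch.nn as nn

class AgeBranch(nn.Module):
    """One layer of face age recognition network: shared conv features, two FC heads."""
    def __init__(self, m=256, n=100):        # m: feature dimensions, n: number of age values
        super().__init__()
        channels = [3, 16, 32, 64, 128, 256]
        layers = []
        for c_in, c_out in zip(channels[:-1], channels[1:]):
            layers += [nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1),
                       nn.ReLU(inplace=True)]
        self.features = nn.Sequential(*layers)            # C1..C5: 320x320 -> 10x10
        self.fc_feature = nn.Linear(256 * 10 * 10, m)     # first full-connection layer A1
        self.fc_age = nn.Linear(256 * 10 * 10, n)         # second full-connection layer A2

    def forward(self, x):                                 # x: (B, 3, 320, 320)
        f = self.features(x).flatten(1)                   # (B, 256*10*10)
        feature_vec = self.fc_feature(f)                  # high-level semantic feature vector
        age_logits = self.fc_age(f)                       # softmax over these gives per-age probabilities
        return feature_vec, age_logits
```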
In other possible embodiments, the second full-connection layer may be connected to the first full-connection layer, and the age prediction may be performed by the second full-connection layer based on the high-level semantic feature vector of the sample face image extracted by the first full-connection layer, so as to obtain the predicted age of the sample face image. After the second full-connection layer is connected to the first full-connection layer, the second full-connection layer can acquire the result of the first full-connection layer, and accuracy of age prediction is improved.
In still other possible embodiments, feature extraction modules may be further provided for the first full connection layer and the second full connection layer in the face age identification network, that is, a first feature extraction module and a second feature extraction module are provided, where the first feature extraction module is connected with the first full connection layer and is used for extracting face features in a face image input to the face age identification network, so as to be used for feature similarity matching; the second feature extraction module is connected with the second full-connection layer and is used for extracting face features in face images input to the face age recognition network so as to be used for age prediction. For one of the face images, the face age recognition network is processed as follows: carrying out feature extraction on the sample face image through a first feature extraction module so as to obtain a first feature map of the sample face image; extracting high-level semantic features of the first feature image through the first full-connection layer to obtain high-level semantic feature vectors of the sample face image; and carrying out feature extraction on the sample face image through a second feature extraction module so as to obtain a second feature image of the sample face image, and carrying out age prediction on the second feature image through a second full-connection layer so as to obtain the predicted age of the sample face image. The two feature extraction modules may have the same structure, for example, the feature extraction structures shown in fig. 2 may be the same, and of course, the two feature extraction modules may also be different. The feature extraction modules are respectively arranged to perform similarity matching and age prediction, so that the face age recognition model can be helped to better distinguish face features related to age information and face features unrelated to identity information.
The network structures of the two-layer face age recognition network are set to be identical, so that the types of face features extracted by the two-layer face age recognition model are identical, the face age recognition network can conveniently compare feature commonalities and feature differences of two sample face images, and the face features related to age information and the face features unrelated to identity information can be distinguished more easily.
Alternatively, in other embodiments, the network structure of the two-layer face age identification network may be different. In this embodiment, the face age recognition network may determine the predicted ages corresponding to the two different face images based on the high-level semantic feature vector by determining the face features of the same kind as the high-level semantic feature vector.
S103, respectively calculating the difference between the predicted age and the age label corresponding to each of the two different sample face images to obtain the age loss of the face age recognition model.
Here, age loss is used to reflect the deviation between the predicted age and the true age. The larger the age loss is, the less accurate the face age recognition model is, and the smaller the age loss is, the more accurate the face age recognition model is.
In some embodiments, the age penalty of the face age recognition model may be equal to the sum of the deviations between the predicted ages and the true ages of the two sample face images.
In a possible implementation, the predicted age and the age label are both presented in the form of n-dimensional vectors, where n is equal to the number of age values in the preset age range, and the cross entropy can then be used to measure the deviation between the predicted age and the true age. Specifically, the cross entropy between the predicted age and the age label corresponding to each of the two different sample face images can be calculated to obtain the cross entropy corresponding to each of the two sample face images, and the sum of the two cross entropies is determined to be the age loss of the face age recognition model. The cross entropy corresponding to one sample face image can be expressed as the following formula:
L2 = -∑ Y_i·log(P_i), where the sum is taken over i = 1 to n, L2 is the cross entropy, n is the number of age values, Y_i is the value of the age label in the i-th dimension, and P_i is the probability, output by the face age recognition model, that the face belongs to the i-th age value. Illustratively, for Y_i and P_i, reference is made to the 100-dimensional vectors in the foregoing description of the face age recognition model training process. Representing the predicted age and the age label as n-dimensional vectors and measuring the deviation between the true age and the predicted age by cross entropy allows the face age recognition model to converge better and improves the training speed of the face age recognition model.
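As a sketch, the per-image cross entropy above reduces, for a one-hot label, to the negative log of the probability assigned to the true age; the helper name is illustrative.

```python
import torch

def age_cross_entropy(one_hot_label, probs):
    # L2 = -sum_i Y_i * log(P_i); with a one-hot Y this is -log(P at the true age).
    return -(one_hot_label * probs.clamp_min(1e-12).log()).sum()
```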
Alternatively, in other embodiments, the predicted age and the age label may be directly presented in a numerical manner, and then the difference between the two may be used to measure the deviation between the predicted age and the age label. The deviation corresponding to one sample face image is equal to the absolute value of the difference between the predicted age and the age label, and is expressed as L3= |Y-P|, Y is a real age value, and P is a predicted age value.
Alternatively, in other embodiments, the age loss of the face age recognition model may be equal to the mean of the deviation between the predicted age and the true age of the two sample face images. It should be understood that, in particular, the deviation between the predicted age and the true age is measured in what manner, and may be designed according to the specific situation, and the present application is not limited.
S104, calculating the similarity between the high-level semantic feature vectors corresponding to the two different sample face images to obtain the feature similarity loss of the face age recognition model.
Here, the feature similarity loss is used to reflect the similarity between the features of the two sample face images extracted by the face age recognition model. The larger the feature similarity loss, the more dissimilar the extracted features of the two sample face images; the smaller the feature similarity loss, the more similar they are. By constraining the feature similarity loss to be small enough, the face age recognition model can discard the face features that are related to identity information and irrelevant to age information, extract sufficiently similar face features from the two sample face images, and thus reduce the influence caused by different identity information.
In some possible embodiments, the similarity between the high-level semantic feature vectors corresponding to the two different sample face images can be measured with various similarity measures to obtain the feature similarity loss of the face age recognition model. For example, the similarity between the high-level semantic feature vectors corresponding to the two different sample face images may be calculated based on cosine similarity, the Pearson correlation coefficient, likelihood probability, and the like.
In other possible embodiments, a similarity matrix can be constructed through metric learning, and the similarity between the high-level semantic feature vectors corresponding to the two face images with different samples is calculated based on the similarity matrix. Specifically, the similarity between the high-level semantic feature vectors corresponding to the two face images with different samples can be calculated according to the following formula, so as to obtain the feature similarity loss of the face age recognition model:
wherein L1 is the feature similarity loss, X_1 is the high-level semantic feature vector corresponding to one sample face image in the sample face image group, X_2 is the high-level semantic feature vector corresponding to the other sample face image in the sample face image group, and M is a similarity measurement matrix. It should be understood that the similarity measurement matrix is preset based on the dimension of the high-level semantic feature vectors of the two sample face images: its numbers of rows and columns are determined by that dimension, and during training its parameters change along with the parameters of the face age recognition model. When the face age recognition model converges, the parameters of the similarity measurement matrix no longer change; a similarity measurement matrix determined under this condition makes the feature similarity loss of the face age recognition model smaller, thereby ensuring that the high-level semantic features of the two sample face images are sufficiently similar. Introducing the similarity measurement matrix is equivalent to setting weights on the face features and enables a differentiated measurement of the face features, so that the face age recognition model can better learn the face features associated with age.
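The formula itself appears in the patent only as an image and is not reproduced here. A Mahalanobis-style quadratic form L1 = (X_1 - X_2)·M·(X_1 - X_2)^T is one common metric-learning choice consistent with the variable definitions above, but it is an assumption, not the patent's confirmed formula. A minimal sketch under that assumption:

```python
import torch
import torch.nn as nn

class FeatureSimilarityLoss(nn.Module):
    """Illustrative feature similarity loss with a learnable similarity
    measurement matrix M, trained jointly with the face age recognition
    model. The quadratic form used here is an assumption for this sketch."""

    def __init__(self, feat_dim: int):
        super().__init__()
        self.M = nn.Parameter(torch.eye(feat_dim))  # similarity measurement matrix

    def forward(self, x1: torch.Tensor, x2: torch.Tensor) -> torch.Tensor:
        diff = x1 - x2                               # (batch, feat_dim)
        # per-pair quadratic form (x1 - x2) M (x1 - x2)^T, averaged over the batch
        return torch.einsum('bi,ij,bj->b', diff, self.M, diff).mean()
```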
S105, iteratively adjusting parameters of the face age recognition model based on the age loss of the face age recognition model and the feature similarity loss of the face age recognition model to obtain a target face age recognition model.
In the embodiment of the application, iterative parameter adjustment is performed on the face age recognition model based on its age loss and its feature similarity loss; that is, the parameters of the face age recognition model are adjusted so that the age loss and the feature similarity loss calculated with the adjusted model decrease, until the accuracy of the face age recognition model reaches a preset degree. The face age recognition model whose accuracy reaches the preset degree is the target face age recognition model.
In a possible implementation manner, a weighted sum of the age loss and the feature similarity loss of the face age recognition model may be calculated to obtain the total loss of the face age recognition model; the parameters of the whole face age recognition model are then adjusted based on the total loss, and steps S101 to S104 are executed again until the face age recognition model converges, the converged model being taken as the target face age recognition model. Convergence of the face age recognition model may mean that its total loss is less than or equal to a preset threshold, or that the number of parameter adjustments reaches a preset number. Specifically, the total loss of the face age recognition model can be expressed as L = a×La + b×Lb, where L is the total loss, La is the age loss, a is the weight of the age loss, Lb is the feature similarity loss, and b is the weight of the feature similarity loss. Setting different weights for the age loss and the feature similarity loss allows the face age recognition model to learn with a chosen emphasis during training, which improves its training precision.
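As an illustrative sketch only, the weighted total loss and a simple convergence check could look as follows; the weight values and thresholds are placeholders, not values from the patent:

```python
import torch

def total_loss(age_loss: torch.Tensor, sim_loss: torch.Tensor,
               a: float = 1.0, b: float = 1.0) -> torch.Tensor:
    """L = a * La + b * Lb, with La the age loss and Lb the feature
    similarity loss; the weights a and b are hyperparameters."""
    return a * age_loss + b * sim_loss

def converged(loss_value: float, step: int,
              loss_threshold: float = 1e-3, max_steps: int = 100_000) -> bool:
    """Convergence: total loss at or below a preset threshold, or a preset
    number of parameter adjustments reached (both values are illustrative)."""
    return loss_value <= loss_threshold or step >= max_steps
```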
In a specific implementation, the parameters of the face age recognition model can be optimized with an adaptive optimization algorithm, using an initial learning rate of 0.001 and a weight decay of 0.0005, with the learning rate decayed to 1/10 of its previous value every 50 iterations, so as to perform the iterative parameter adjustment of the face age recognition model.
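The patent names only a generic "adaptive algorithm"; the choice of Adam below is an assumption and the model is a stand-in module, but the learning rate, weight decay, and decay schedule follow the values given above:

```python
import torch

model = torch.nn.Linear(512, 100)   # placeholder for the face age recognition model
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0005)
# decay the learning rate to 1/10 of its previous value every 50 iterations
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

for step in range(200):                                     # illustrative iteration count
    optimizer.zero_grad()
    dummy_loss = model(torch.randn(8, 512)).pow(2).mean()   # stands in for the total loss
    dummy_loss.backward()
    optimizer.step()
    scheduler.step()
```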
Based on the above description, in the technical scheme corresponding to fig. 1, two different sample face images with the same age label are input into a face age recognition model comprising two layers of face age recognition networks; the high-level semantic feature vectors and the predicted ages of the two sample face images are obtained through the two layers of face age recognition networks respectively; the gap between the predicted age of each sample face image and the age label is then calculated to obtain the age loss of the face age recognition model, and the similarity between the high-level semantic feature vectors of the two sample face images is calculated to obtain the feature similarity loss of the face age recognition model; finally, iterative parameter adjustment is performed on the face age recognition model based on the age loss and the feature similarity loss to obtain the target face age recognition model. Because one face age recognition network is used for acquiring the high-level semantic feature vector and the predicted age of one sample face image, the face features of the two sample face images can be extracted relatively independently through the two layers of face age recognition networks, which helps the face age recognition model better compare the feature commonalities and feature differences of the two sample face images. Because the two sample face images correspond to the same age, introducing, on the basis of the age loss, the feature similarity loss calculated from the high-level semantic feature vectors of the two sample face images belonging to the same age enables the face age recognition model, during iterative parameter adjustment, to learn from various face features those face features that are related to age information and irrelevant to identity information, thereby establishing the correspondence between such face features and the age information and improving the recognition precision of the face age recognition model.
In some possible embodiments, in the process of acquiring the sample face images, the training samples can be enriched by performing data enhancement on the acquired sample face images. Step S101 then specifically includes the following steps:
A1, acquiring a face sample image set.
Here, regarding the concept of the face sample image set, reference may be made to the description of step S101, and the description is omitted here.
A2, carrying out sample data enhancement on the face sample image set, and acquiring two different sample face images belonging to the same age from the face sample image set subjected to sample data enhancement, to serve as a sample face image group.
Here, performing sample data enhancement on the face sample image set refers to obtaining new sample face images through data enhancement methods and adding them into the face sample image set.
Specifically, the sample face images in the face sample image set may be transformed by simple enhancement means such as geometric transformation (rotation, flipping, cropping, deformation, scaling, etc.) or color transformation (noise, blurring, color change, erasure, filling, etc.) to obtain new face images; optionally, image synthesis techniques (such as interpolation) may be applied to the sample face images to obtain new face images; further optionally, a combination of multiple enhancement means may be used, and the enhancement is not limited to the examples described here (an illustrative pipeline is sketched below). By performing data enhancement on the face sample images, the variety of sample face images in the face sample image set can be enriched and the number of sample face images corresponding to each age can be balanced, so that the face age recognition model can be trained better.
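A minimal torchvision-based sketch of such an augmentation pipeline; the specific operations and parameters are assumptions, since the patent only names the categories of transforms:

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),               # geometric: flipping
    transforms.RandomRotation(degrees=10),                # geometric: rotation
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),  # geometric: cropping / scaling
    transforms.ColorJitter(brightness=0.2, contrast=0.2), # color transformation
    transforms.GaussianBlur(kernel_size=3),               # blurring
])
# new_image = augment(sample_face_image)  # PIL image input; result is added back to the set
```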
After an age recognition model has been trained with the age identification model training method provided by the application, it can be used for face age recognition. Referring to fig. 3, fig. 3 is a flow chart of a face age identification method according to an embodiment of the present application; as shown in fig. 3, the method includes the following steps:
S201, acquiring a face image to be recognized.
Here, the face image to be recognized refers to a face image whose face age is to be recognized.
In a specific implementation, the face image to be recognized can be obtained by reading a locally stored image, shooting an image directly, and the like.
S202, inputting the face image to be recognized into a target face age recognition model to obtain two predicted age values corresponding to the face image to be recognized.
Here, the target face age recognition model refers to a face age recognition model trained by the foregoing method embodiments of fig. 1-2. As described in step S102, the target face age recognition model includes two layers of face age recognition networks; the face image to be recognized is therefore input into each of the two face age recognition networks, and each network performs age prediction on the face image to be recognized, yielding two predicted age values. For the manner in which a face age recognition network predicts an age value from the face image to be recognized, reference can be made to the process in which the face age recognition network processes a sample face image to obtain its predicted age value, which is not repeated here.
S203, taking the average value of the two predicted age values corresponding to the face image to be recognized as the face age of the face image to be recognized.
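For illustration only, the two-branch inference and averaging step could be sketched as follows; branch_a and branch_b are placeholders for the two trained face age recognition networks, each assumed here to return a single age value:

```python
import torch

@torch.no_grad()
def predict_face_age(image: torch.Tensor,
                     branch_a: torch.nn.Module,
                     branch_b: torch.nn.Module) -> float:
    """Feed the face image to be recognized into both face age recognition
    networks of the target model and average the two predicted age values."""
    age_a = branch_a(image.unsqueeze(0)).item()  # first predicted age value
    age_b = branch_b(image.unsqueeze(0)).item()  # second predicted age value
    return 0.5 * (age_a + age_b)
```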
Based on the above description, in the technical solution corresponding to fig. 3, the target face age recognition model has learned, from various face features, the face features related to age information and irrelevant to identity information, and has established the correspondence between these features and the age information. When the target face age recognition model is used, it can therefore recognize the age of the face image to be recognized by using the learned age-related, identity-independent face features and the established correspondence, removing the influence of identity-related face features on age recognition and improving the accuracy of face age recognition.
The method of the application has been described above; in order to better carry out the method, the apparatus of the application is described next.
Referring to fig. 4, fig. 4 is a schematic structural diagram of an age identification model training device according to an embodiment of the present application, as shown in fig. 4, the device 30 includes:
A first obtaining module 301, configured to obtain a sample face image group, where the sample face image group includes two different sample face images with the same age label;
the result prediction module 302 is configured to input the sample face image group into the face age recognition model to obtain the high-level semantic feature vectors and the predicted ages corresponding to the two different sample face images respectively, where the face age recognition model includes two layers of face age recognition networks, one layer of face age recognition network is configured to obtain the high-level semantic feature vector and the predicted age corresponding to one sample face image in the sample face image group, and the high-level semantic feature vector of one sample face image is used to express the face features of that sample face image;
a first loss calculation module 303, configured to calculate differences between the predicted ages corresponding to the two different sample face images and the age labels corresponding to the two different sample face images, so as to obtain age loss of the face age recognition model;
the second loss calculation module 304 is configured to calculate a similarity between the high-level semantic feature vectors corresponding to the two face images of different samples, so as to obtain a feature similarity loss of the face age recognition model;
The parameter tuning module 305 is configured to iteratively tune the face age identification model based on the age loss and the feature similarity loss, so as to obtain a target face age identification model.
In one possible design, the network structure of each layer of face age recognition network is the same; each layer of face age recognition network comprises a feature extraction module, a first full-connection layer and a second full-connection layer, and the feature extraction module is connected to the first full-connection layer and the second full-connection layer respectively. The result prediction module 302 is specifically configured to: extract features of a target sample face image through the feature extraction module of a target face age recognition network to obtain a feature map of the target sample face image, where the target face age recognition network is any one layer of face age recognition network in the face age recognition model and the target sample face image is any one sample face image in the sample face image group; extract high-level semantic features of the feature map through the first full-connection layer to obtain the high-level semantic feature vector of the target sample face image; and perform age prediction on the feature map through the second full-connection layer to obtain the predicted age of the target sample face image.
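A minimal sketch of one such face age recognition network branch; the backbone and layer sizes are assumptions, since the patent does not specify them:

```python
import torch
import torch.nn as nn

class FaceAgeBranch(nn.Module):
    """One layer of the face age recognition network: a feature extraction
    module followed by two full-connection layers, one producing the
    high-level semantic feature vector and one producing the age prediction."""

    def __init__(self, feat_dim: int = 512, n_ages: int = 100):
        super().__init__()
        self.feature_extractor = nn.Sequential(      # placeholder feature extraction module
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
        )
        self.fc_semantic = nn.Linear(16, feat_dim)   # first full-connection layer
        self.fc_age = nn.Linear(16, n_ages)          # second full-connection layer

    def forward(self, x: torch.Tensor):
        feat = self.feature_extractor(x)             # features of the sample face image
        semantic = self.fc_semantic(feat)            # high-level semantic feature vector
        age_logits = self.fc_age(feat)               # scores over the n_ages age values
        return semantic, age_logits
```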
In one possible design, the second loss calculation module 304 is specifically configured to: calculate, according to the formula given in the foregoing method embodiment, the similarity between the high-level semantic feature vectors corresponding to the two different sample face images, so as to obtain the feature similarity loss of the face age recognition model, wherein L1 is the feature similarity loss, X_1 is the high-level semantic feature vector corresponding to one sample face image in the sample face image group, X_2 is the high-level semantic feature vector corresponding to the other sample face image in the sample face image group, and M is a similarity measurement matrix.
In one possible design, the first loss calculation module 303 is specifically configured to: respectively calculate the cross entropy between the predicted age corresponding to each of the two different sample face images and the corresponding age label, so as to obtain the cross entropy corresponding to each of the two different sample face images; and determine the age loss of the face age recognition model according to the cross entropy corresponding to each of the two different sample face images.
In one possible design, the first obtaining module 301 is specifically configured to: acquiring a face sample image set; and carrying out sample data enhancement on the face sample image set, and acquiring two different sample face images belonging to the same age from the face sample image set subjected to sample data enhancement to serve as the sample face image group.
It should be noted that, what is not mentioned in the embodiment corresponding to fig. 4 may refer to the description of the method embodiment corresponding to fig. 1-2, and will not be repeated here.
According to the device, two different sample face images with the same age label are input into a face age recognition model comprising two layers of face age recognition networks; the high-level semantic feature vectors and the predicted ages of the two sample face images are obtained through the two layers of face age recognition networks respectively; the difference between the predicted age of each sample face image and the age label is then calculated to obtain the age loss of the face age recognition model, and the similarity between the high-level semantic feature vectors of the two sample face images is calculated to obtain the feature similarity loss of the face age recognition model; finally, iterative parameter adjustment is performed on the face age recognition model based on the age loss and the feature similarity loss to obtain the target face age recognition model. Because one face age recognition network is used for acquiring the high-level semantic feature vector and the predicted age of one sample face image, the face features of the two sample face images can be extracted relatively independently through the two layers of face age recognition networks, which helps the face age recognition model better compare the feature commonalities and feature differences of the two sample face images. Because the two sample face images correspond to the same age, introducing, on the basis of the age loss, the feature similarity loss calculated from the high-level semantic feature vectors of the two sample face images belonging to the same age enables the face age recognition model, during iterative parameter adjustment, to learn from various face features those face features that are related to age information and irrelevant to identity information, thereby establishing the correspondence between such face features and the age information and improving the recognition precision of the face age recognition model.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a face age identifying device according to an embodiment of the present application, and as shown in fig. 5, the device 40 includes:
a second obtaining module 401, configured to obtain a face image to be identified;
the age prediction module 402 is configured to input the face image to be recognized into a target face age recognition model, so as to obtain two predicted age values corresponding to the face image to be recognized, where the target face age recognition model is trained by the age recognition model training method in the method embodiment;
the age calculation module 403 is configured to take the average of the two predicted age values as the face age of the face image to be identified.
It should be noted that, what is not mentioned in the embodiment corresponding to fig. 5 may refer to the description of the embodiment of the method corresponding to fig. 3, and will not be repeated here.
In the device, since the target face age recognition model has learned, from various face features, the face features related to age information and irrelevant to identity information, and has established the correspondence between these features and the age information, when the target face age recognition model is used it can recognize the age of the face image to be recognized by using the learned age-related, identity-independent face features and the established correspondence, thereby removing the influence of identity-related face features on age recognition and improving the accuracy of face age recognition.
Referring to fig. 6, fig. 6 is a schematic structural diagram of a computer device according to an embodiment of the present application, and the computer device 50 includes a processor 501 and a memory 502. The processor 501 is connected to the memory 502, for example the processor 501 may be connected to the memory 502 via a bus.
The processor 501 is configured to support the computer device 50 to perform the corresponding functions in the methods of fig. 1-2 or the method of fig. 3. The processor 501 may be a central processing unit (central processing unit, CPU), a network processor (network processor, NP), a hardware chip or any combination thereof. The hardware chip may be an application specific integrated circuit (application specific integrated circuit, ASIC), a programmable logic device (programmable logic device, PLD), or a combination thereof. The PLD may be a complex programmable logic device (complex programmable logic device, CPLD), a field-programmable gate array (field-programmable gate array, FPGA), general-purpose array logic (generic array logic, GAL), or any combination thereof.
The memory 502 is used for storing program code and the like. The memory 502 may include a volatile memory (VM), such as a random access memory (RAM); the memory 502 may also include a non-volatile memory (NVM), such as a read-only memory (ROM), a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); the memory 502 may also include a combination of the above types of memory.
In some possible cases, the processor 501 may call the program code to:
obtaining a sample face image group, wherein the sample face image group comprises two different sample face images with the same age label;
inputting the sample face image group into a face age recognition model to respectively obtain high-level semantic feature vectors and predicted ages corresponding to the two different sample face images, wherein the face age recognition model comprises two layers of face age recognition networks, one layer of face age recognition network is used for obtaining the high-level semantic feature vector and the predicted age corresponding to one sample face image in the sample face image group, and the high-level semantic feature vector of one sample face image is used for expressing the face features of the one sample face image;
calculating the difference between the predicted age corresponding to each of the two different sample face images and the age label corresponding to each of the two different sample face images, so as to obtain the age loss of the face age recognition model;
calculating the similarity between the high-level semantic feature vectors corresponding to the two different sample face images, so as to obtain the feature similarity loss of the face age recognition model;
And iteratively adjusting parameters of the face age recognition model based on the age loss and the feature similarity loss to obtain a target face age recognition model.
In other possible cases, the processor 501 may call the program code to:
acquiring a face image to be identified;
inputting the face image to be identified into a target face age identification model to obtain two predicted age values corresponding to the face image to be identified, wherein the target face age identification model is obtained by training an age identification model training method in the method embodiment;
and taking the average value of the two predicted age values as the face age of the face image to be identified.
It should be noted that, implementation of each operation may also correspond to the corresponding description referring to the above method embodiment; the processor 501 may also cooperate with other functional hardware to perform other operations in the method embodiments described above.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program comprising program instructions that, when executed by a computer, cause the computer to perform the method of the previous embodiments.
Those skilled in the art will appreciate that implementing all or part of the above-described methods in the embodiments may be accomplished by computer programs stored in a computer-readable storage medium, which when executed, may include the steps of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only memory (ROM), a random-access memory (Random Access memory, RAM), or the like.
The foregoing disclosure is illustrative of the present application and is not to be construed as limiting the scope of the application, which is defined by the appended claims.

Claims (10)

1. An age identification model training method, comprising:
obtaining a sample face image group, wherein the sample face image group comprises two different sample face images with the same age label;
inputting the sample face image group into a face age recognition model to respectively obtain high-level semantic feature vectors and predicted ages corresponding to the two different sample face images, wherein the face age recognition model comprises two layers of face age recognition networks, one layer of face age recognition network is used for obtaining the high-level semantic feature vector and the predicted age corresponding to one sample face image in the sample face image group, and the high-level semantic feature vector of one sample face image is used for expressing the face features of the one sample face image;
respectively calculating the difference between the predicted age corresponding to each of the two different sample face images and the age label corresponding to each of the two different sample face images, so as to obtain the age loss of the face age recognition model;
calculating the similarity between the high-level semantic feature vectors corresponding to the two different sample face images, so as to obtain the feature similarity loss of the face age recognition model;
and carrying out iterative parameter adjustment on the face age recognition model based on the age loss and the feature similarity loss so as to obtain a target face age recognition model.
2. The method of claim 1, wherein the network structure of each layer of face age identification network is the same; each layer of face age identification network comprises a feature extraction module, a first full-connection layer and a second full-connection layer, wherein the feature extraction module is respectively connected with the first full-connection layer and the second full-connection layer;
the step of inputting the sample face image group to the face age recognition model to obtain the high-level semantic feature vectors and the predicted ages corresponding to the two different sample face images respectively, includes:
Extracting features of a target sample face image through a feature extraction module of a target face age recognition network to obtain a feature image of the target sample face image, wherein the target face age recognition network is any layer of face age recognition network in the face age recognition model, and the target sample face image is any sample face image in the sample face image group;
extracting high-level semantic features of the feature map through the first full-connection layer to obtain high-level semantic feature vectors of the target sample face image;
and carrying out age prediction on the feature map through the second full-connection layer so as to obtain the predicted age of the face image of the target sample.
3. The method according to claim 1 or 2, wherein the calculating the similarity between the high-level semantic feature vectors corresponding to the two different sample face images to obtain the feature similarity loss of the face age recognition model includes:
calculating the similarity between the high-level semantic feature vectors corresponding to the two different sample face images according to the following formula, to obtain the feature similarity loss of the face age recognition model:
wherein L1 is the feature similarity loss, X_1 is the high-level semantic feature vector corresponding to one sample face image in the sample face image group, X_2 is the high-level semantic feature vector corresponding to the other sample face image in the sample face image group, and M is a similarity measurement matrix.
4. The method according to claim 1 or 2, wherein calculating the difference between the predicted ages corresponding to the two different sample face images and the age labels corresponding to the two different sample face images to obtain the age loss of the face age recognition model includes:
respectively calculating the cross entropy between the predicted age corresponding to each of the two different sample face images and the age label corresponding to each of the two different sample face images, so as to obtain the cross entropy corresponding to each of the two different sample face images;
and determining the age loss of the face age recognition model according to the cross entropy corresponding to each of the two different sample face images.
5. The method according to claim 1 or 2, wherein the acquiring a sample set of face images comprises:
acquiring a face sample image set;
And carrying out sample data enhancement on the face sample image set, and acquiring two different sample face images belonging to the same age from the face sample image set subjected to sample data enhancement to serve as the sample face image group.
6. A face age identification method, comprising:
acquiring a face image to be identified;
inputting the face image to be recognized into a target face age recognition model to obtain two predicted age values corresponding to the face image to be recognized, wherein the target face age recognition model is obtained by training according to the method of any one of claims 1-5;
and taking the average value of the two predicted age values as the face age of the face image to be identified.
7. An age identification model training device, comprising:
the first acquisition module is used for acquiring a sample face image group, wherein the sample face image group comprises two different sample face images with the same age label;
the result prediction module is used for inputting the sample face image group into a face age recognition model to respectively obtain high-level semantic feature vectors and predicted ages corresponding to the two different sample face images, wherein the face age recognition model comprises two layers of face age recognition networks, one layer of face age recognition network is used for obtaining the high-level semantic feature vector and the predicted age corresponding to one sample face image in the sample face image group, and the high-level semantic feature vector of one sample face image is used for expressing the face features of the one sample face image;
the first loss calculation module is used for calculating the difference between the predicted age corresponding to each of the two different sample face images and the age label corresponding to each of the two different sample face images, so as to obtain the age loss of the face age recognition model;
the second loss calculation module is used for calculating the similarity between the high-level semantic feature vectors corresponding to the two different sample face images, so as to obtain the feature similarity loss of the face age recognition model;
and the parameter adjusting module is used for carrying out iterative parameter adjustment on the face age identification model based on the age loss and the feature similarity loss so as to obtain a target face age identification model.
8. A face age recognition apparatus, comprising:
the second acquisition module is used for acquiring the face image to be identified;
the age prediction module is configured to input the face image to be recognized into a target face age recognition model to obtain two predicted age values corresponding to the face image to be recognized, where the target face age recognition model is obtained by training according to the method of any one of claims 1-5;
and the age calculation module is used for taking the average value of the two predicted age values as the face age of the face image to be identified.
9. A computer device comprising a memory and one or more processors configured to execute one or more computer programs stored in the memory, which when executed, cause the computer device to implement the method of any of claims 1-5 or claim 6.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores a computer program comprising program instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1-5 or claim 6.
CN202110456649.4A 2021-04-27 2021-04-27 Age identification model training method, face age identification method and related device Active CN113065525B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110456649.4A CN113065525B (en) 2021-04-27 2021-04-27 Age identification model training method, face age identification method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110456649.4A CN113065525B (en) 2021-04-27 2021-04-27 Age identification model training method, face age identification method and related device

Publications (2)

Publication Number Publication Date
CN113065525A CN113065525A (en) 2021-07-02
CN113065525B true CN113065525B (en) 2023-12-12

Family

ID=76567533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110456649.4A Active CN113065525B (en) 2021-04-27 2021-04-27 Age identification model training method, face age identification method and related device

Country Status (1)

Country Link
CN (1) CN113065525B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113553961B (en) * 2021-07-27 2023-09-05 北京京东尚科信息技术有限公司 Training method and device of face recognition model, electronic equipment and storage medium
CN113887325A (en) * 2021-09-10 2022-01-04 北京三快在线科技有限公司 Model training method, expression recognition method and device
CN113920562B (en) * 2021-09-24 2024-04-30 深圳数联天下智能科技有限公司 Training method of age prediction model, age prediction method and device
CN114093011B (en) * 2022-01-12 2022-05-06 北京新氧科技有限公司 Hair classification method, device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018033143A1 (en) * 2016-08-19 2018-02-22 北京市商汤科技开发有限公司 Video image processing method, apparatus and electronic device
WO2019109526A1 (en) * 2017-12-06 2019-06-13 平安科技(深圳)有限公司 Method and device for age recognition of face image, storage medium
CN112183326A (en) * 2020-09-27 2021-01-05 深圳数联天下智能科技有限公司 Face age recognition model training method and related device
CN112560823A (en) * 2021-02-23 2021-03-26 中国科学院自动化研究所 Adaptive variance and weight face age estimation method based on distribution learning

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018033143A1 (en) * 2016-08-19 2018-02-22 北京市商汤科技开发有限公司 Video image processing method, apparatus and electronic device
WO2019109526A1 (en) * 2017-12-06 2019-06-13 平安科技(深圳)有限公司 Method and device for age recognition of face image, storage medium
CN112183326A (en) * 2020-09-27 2021-01-05 深圳数联天下智能科技有限公司 Face age recognition model training method and related device
CN112560823A (en) * 2021-02-23 2021-03-26 中国科学院自动化研究所 Adaptive variance and weight face age estimation method based on distribution learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PCA face recognition technology with age information; Chen Wenxing; Yue Jing; Fu Hao; Mathematics in Practice and Theory (Issue 06); full text *
Deep fusion neural network for face age estimation; Sun Ning; Gu Zhengdong; Liu Jixin; Han Guang; Journal of Image and Graphics (Issue 01); full text *

Also Published As

Publication number Publication date
CN113065525A (en) 2021-07-02

Similar Documents

Publication Publication Date Title
CN113065525B (en) Age identification model training method, face age identification method and related device
CN108764195B (en) Handwriting model training method, handwritten character recognition method, device, equipment and medium
CN106570464B (en) Face recognition method and device for rapidly processing face shielding
CN110472675B (en) Image classification method, image classification device, storage medium and electronic equipment
CN109086654B (en) Handwriting model training method, text recognition method, device, equipment and medium
CN111027576B (en) Cooperative significance detection method based on cooperative significance generation type countermeasure network
CN111079683A (en) Remote sensing image cloud and snow detection method based on convolutional neural network
CN108874889B (en) Target body retrieval method, system and device based on target body image
CN115170934B (en) Image segmentation method, system, equipment and storage medium
CN114241505B (en) Method and device for extracting chemical structure image, storage medium and electronic equipment
CN112016315A (en) Model training method, text recognition method, model training device, text recognition device, electronic equipment and storage medium
CN113723070B (en) Text similarity model training method, text similarity detection method and device
CN115392357A (en) Classification model training and labeled data sample spot inspection method, medium and electronic equipment
CN110135428B (en) Image segmentation processing method and device
CN108496174B (en) Method and system for face recognition
CN109101984B (en) Image identification method and device based on convolutional neural network
CN111815627A (en) Remote sensing image change detection method, model training method and corresponding device
CN113051901B (en) Identification card text recognition method, system, medium and electronic terminal
CN113221662B (en) Training method and device of face recognition model, storage medium and terminal
CN112446428B (en) Image data processing method and device
CN112950652B (en) Robot and hand image segmentation method and device thereof
CN115512693A (en) Audio recognition method, acoustic model training method, device and storage medium
CN111967426A (en) Vehicle weight recognition method and device, electronic equipment and medium
CN110942179A (en) Automatic driving route planning method and device and vehicle
CN116912920B (en) Expression recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant