CN111967383A

CN111967383A - Age estimation method, and training method and device of age estimation model

Info

Publication number: CN111967383A
Application number: CN202010822845.4A
Authority: CN
Inventors: 苏驰; 李凯; 刘弘也; 王育林
Original assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Current assignee: Beijing Kingsoft Cloud Network Technology Co Ltd
Priority date: 2020-08-14
Filing date: 2020-08-14
Publication date: 2020-11-20

Abstract

The invention provides an age estimation method, an age estimation model training method and an age estimation model training device, wherein the method comprises the following steps: acquiring an image to be processed containing a human face; inputting an image to be processed into an age estimation model to obtain an output result; determining an age estimation result corresponding to the face based on the output result; the age estimation model is obtained through machine learning training based on age distances among age labels of a plurality of samples in a preset sample group and characteristic distances among sample characteristics of the plurality of samples output by the age estimation model. According to the method, the age of the face in the image to be processed is estimated through an age estimation model, and the age corresponding to the face is obtained; the age estimation model can learn the relation between the age distance and the characteristic distance in the training process, and can restrict the characteristic distance according to the age distance, so that the characteristics learned by the model are more discriminative and accord with the objective rule of age change, and the accuracy of estimating the age by the model is improved.

Description

Age estimation method, and training method and device of age estimation model

Technical Field

The invention relates to the technical field of image processing, in particular to an age estimation method, an age estimation model training method and an age estimation model training device.

Background

Age is an important human face attribute, and has important application in the fields of human-computer interaction, intelligent commerce, safety monitoring, entertainment and the like. The estimation of the age of the face generally means that the real age of the face is automatically estimated according to an input face image by adopting a computer vision technology and the like.

In the related art, the age of a person in a face image can be estimated through a trained deep learning model. In the training process of the deep learning model, an age label needs to be marked on each face image in a training set, usually, face images of one category correspond to the same age label, and then the deep learning model is trained based on the mapping relation between the face images and the age labels in the training set. In this way, the data referred by the deep learning model in the training process is limited, and the accuracy of the model for estimating the age is limited.

Disclosure of Invention

The invention aims to provide an age estimation method, an age estimation model training method and an age estimation model training device, so as to improve the accuracy of the age estimation model in estimating the age.

In a first aspect, an embodiment of the present invention provides an age estimation method, where the method includes: acquiring an image to be processed containing a human face; inputting the image to be processed into an age estimation model which is trained in advance to obtain an output result; determining an age estimation result corresponding to the face based on the output result; the age estimation model is obtained through machine learning training based on age distances among age labels of a plurality of samples in a preset sample group and characteristic distances among sample characteristics of the plurality of samples output by the age estimation model; each sample contains a face image and an age label indicating the age of the person in the face image.

In an alternative embodiment, the weight parameter of the age estimation model is determined according to the loss amount in the machine learning training process; the loss amount is determined according to the age distance between the age labels corresponding to the samples and the characteristic distance between the sample characteristics of the samples output by the age estimation model.

In an alternative embodiment, the set of samples includes a first sample and a second sample; the loss amount comprises a first loss value and a second loss value; the first loss value is indicative of: the characteristic distance between the sample characteristic data of the first sample and the sample characteristic data of the second sample output by the age estimation model; the characteristic distance is determined according to the age distance between the age label of the first sample and the age label of the second sample; the second loss value is used to indicate: the age estimation model outputs a gap between the age estimation result of the first sample and the age label of the first sample, and the age estimation model outputs a gap between the age estimation result of the second sample and the age label of the second sample.

In an alternative embodiment, the first loss value is determined by the following equation:

wherein L is_featureRepresenting a first loss value; f1 represents a sample characteristic of the first sample; f2 represents a sample characteristic of the second sample; a1 denotes the age label of the first sample; a2 denotes the age label of the second sample; l |. electrically ventilated margin₂A two-norm representation of a vector; max represents the selected maximum; exp denotes an exponential function operation with a natural constant e as the base.

In an alternative embodiment, the age estimation result includes: the probability that the face in the sample belongs to each age value under a plurality of preset age values; the second loss value is determined by the following equation:

wherein L is_ageRepresenting a second loss value;

indicating the age of the first sampleEstimating a result; a1 denotes the age label of the first sample;

to represent

The a1 th element in (1) represents the result of estimation of age

The probability that the age value corresponding to the face in the first sample is a 1;

representing an age estimation of the second sample; a2 denotes the age label of the second sample;

to represent

The a2 th element in (1) represents the result of estimation of age

The probability that the age value corresponding to the face in the second sample is a 2.

In an alternative embodiment, the age label of the face image is determined by: acquiring a plurality of labeling results corresponding to the face image; the labeling result is used for identifying the age value of the person in the face image; the labeled age value in the labeling result is one of a plurality of preset age values; calculating the average value of the age values corresponding to the plurality of labeling results to obtain an age average value; and taking the age mean value as an age label of the face image.

In an alternative embodiment, the output result includes: the probability that the face in the image to be processed belongs to each age value under a plurality of preset age values; the step of determining an age estimation result corresponding to the face based on the output result includes: and determining the age value corresponding to the maximum probability in the output result as an age estimation result corresponding to the face in the image to be processed.

In a second aspect, an embodiment of the present invention provides a training method for an age estimation model, where the training method includes: obtaining a sample set; wherein the sample set comprises a plurality of samples, each sample comprises a face image and an age label, and the age label is used for indicating the age of a person in the face image; and performing machine learning training on the initial model based on the age distances among the age labels of the samples in the sample set and the characteristic distances among the sample characteristics of the samples output by the age estimation model to obtain the age estimation model.

In an optional embodiment, the step of performing machine learning training on the initial model based on age distances between age labels of a plurality of samples in the sample set and feature distances between sample features of a plurality of samples output by the age estimation model to obtain the age estimation model includes: determining a target sample group based on the sample set; the target sample group comprises a first sample and a second sample in a sample set; inputting the first sample and the second sample into the initial model to obtain output results corresponding to the first sample and the second sample; the output result includes a sample feature; determining the loss amount according to the age difference between the age labels corresponding to the first sample and the second sample and the output result; updating the weight parameters of the initial model according to the loss amount; and continuing to execute the step of determining the target sample pair based on the sample set until the loss amount is converged or a preset training number is reached, so as to obtain an age estimation model.

In an optional embodiment, the output result further includes an age estimation result; the step of determining the loss amount according to the age difference between the age tags corresponding to the first sample and the second sample and the output result includes: calculating a characteristic difference between the sample characteristic data of the first sample and the sample characteristic data of the second sample according to the age distance between the age label corresponding to the first sample and the age label of the second sample to obtain a first loss value; calculating a first gap between the age estimate of the first sample and the age label of the first sample, and a second gap between the age estimate of the second sample and the age label of the second sample; obtaining a second loss value according to the first difference and the second difference; and obtaining the loss amount according to the first loss value and the second loss value.

In a third aspect, an embodiment of the present invention provides an age estimation apparatus, including: the image acquisition module is used for acquiring an image to be processed containing a human face; the image input model is used for inputting the image to be processed into the age estimation model which is trained in advance to obtain an output result; determining an age estimation result corresponding to the face based on the output result; the age estimation model is obtained through machine learning training based on age distances among age labels of a plurality of samples in a preset sample group and characteristic distances among sample characteristics of the plurality of samples output by the age estimation model; each sample contains a face image and an age label indicating the age of the person in the face image.

In a fourth aspect, an embodiment of the present invention provides a training apparatus for an age estimation model, where the training apparatus includes: the sample set acquisition module is used for acquiring a sample set; the sample set comprises a plurality of samples, each sample comprises a face image and an age label, and the age label is used for indicating the age of a person in the face image; and the model training module is used for performing machine learning training on the initial model based on the age distances among the age labels of the samples in the sample set and the characteristic distances among the sample characteristics of the samples output by the age estimation model to obtain the age estimation model.

In a fifth aspect, an embodiment of the present invention provides an electronic device, which includes a processor and a memory, where the memory stores machine executable instructions capable of being executed by the processor, and the processor executes the machine executable instructions to implement the above age estimation method or the above training method of the age estimation model.

In a sixth aspect, embodiments of the present invention provide a machine-readable storage medium storing machine-executable instructions that, when invoked and executed by a processor, cause the processor to implement the above age estimation method or the above training method of an age estimation model.

The embodiment of the invention has the following beneficial effects:

the invention provides an age estimation method, an age estimation model training method and an age estimation model training device, which are characterized in that firstly, to-be-processed images containing human faces are obtained; inputting the image to be processed into an age estimation model trained in advance to obtain an output result; and then determines an age estimation result corresponding to the face based on the output result. The method comprises the steps of carrying out age estimation on a face in an image to be processed through an age estimation model to obtain an age estimation result corresponding to the face, wherein the age estimation model is obtained through machine learning training based on age distances among age labels of a plurality of samples in a preset sample group and characteristic distances among sample characteristics of the plurality of samples output by the age estimation model; therefore, the age estimation model can learn the relation between the age distance and the characteristic distance in the training process, and restricts the characteristic distance according to the age distance in a display mode, so that the age characteristic learned by the model is more discriminative and conforms to the objective rule of age change, and the accuracy of estimating the age by the model is improved.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention as set forth above.

In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is a flowchart of an age estimation method according to an embodiment of the present invention;

FIG. 2 is a flow chart of another age estimation method according to an embodiment of the present invention;

fig. 3 is a schematic network structure diagram of an age estimation model according to an embodiment of the present invention;

fig. 4 is a flowchart of a training method of an age estimation device according to an embodiment of the present invention;

fig. 5 is a schematic structural diagram of an age estimation apparatus according to an embodiment of the present invention;

fig. 6 is a schematic structural diagram of an age estimation model training apparatus according to an embodiment of the present invention;

fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The age estimation task is more specific than the general image classification task (e.g., classifying cats, dogs, etc.), which is a more difficult fine-grained classification task (e.g., distinguishing between faces 5 and 6 years old is very difficult for both humans and machines). In the related art, two age estimation methods exist, the first is a traditional face age estimation algorithm, generally, face features (such as active appearance features, anthropometric features, biological heuristic features, and the like) in a face image are manually extracted, then a regressor for age estimation can be obtained from the face features, age estimation can be performed on the face image to be estimated through the regressor, but the method lacks high-level semantic information of the face, so that the accuracy of an age estimation result obtained by the method is low.

The second is a way of estimating the age of a person in a face image based on a deep learning model. In the training process of the deep learning model, an age label needs to be marked on each face image in a training set, usually, face images of one category correspond to the same age label, and then the deep learning model is trained based on the mapping relation between the face images and the age labels in the training set. Compared with the traditional human face age estimation algorithm, the method can learn the high-level semantic information of the human face, and the estimation precision is improved; however, in the method, an objective rule of the age change of the face is not considered in the process of training the deep learning model (that is, the more similar the appearances of the faces with the similar ages are, the more different the appearances of the faces with different ages are, the larger the difference of the appearances of the faces with the different ages is), so that the age features learned by the deep learning model do not necessarily conform to the objective change rule of the ages, that is, data referred by the deep learning model in the training process is limited, and the accuracy of the model in estimating the ages is limited.

Based on the above problems, embodiments of the present invention provide an age estimation method, an age estimation model training method, and an age estimation model training device, which can be applied in the fields of human-computer interaction, intelligent commerce, security monitoring, entertainment, and the like, for age identification and age estimation. To facilitate understanding of the present embodiment, an age estimation method disclosed in the present embodiment will be described in detail first, and as shown in fig. 1, the method includes the following steps:

step S102, acquiring an image to be processed containing a human face.

The image to be processed can be a picture or a photo shot by a video camera or a camera, or can be a certain video frame in a designated video file; the image to be processed contains a face, which can be a front face, a side face or faces with various expressions. In a specific implementation, the manner of acquiring the image may be: the images are taken by a camera, a camera head and the like connected through communication and then transmitted into the storage device, or are acquired from the storage device which stores the images to be processed which are already taken.

Step S104, inputting the image to be processed into an age estimation model which is trained in advance to obtain an output result; and determining an age estimation result corresponding to the face based on the output result.

The age estimation model can be obtained through machine learning training based on age distances among age labels of a plurality of samples in a preset sample group and characteristic distances among sample characteristics of the plurality of samples output by the age estimation model; each sample contains a face image and an age label indicating the age of the person in the face image.

The samples in the preset sample group may be a plurality of samples selected from a preset sample set, where the sample set includes a large number of samples, each sample includes a face image and an age tag corresponding to the face image, each face image includes a face, and the age tag identifies the age of a person in the face image. For example, if the age of a person in a face image is ten years old, the corresponding age label of the face image is 10.

The age estimation model may adopt a deep learning model or a neural network model. In the process of performing machine learning training on an age estimation model, a sample set needs to be selected from a sample set, face images corresponding to a plurality of samples in the sample set are input into the age estimation model, the age estimation model can perform age estimation on each face image, and output sample characteristics and an age estimation result of the face image, and then network parameters (namely, weight parameters of each layer of network) of the age estimation model are adjusted according to age distances between age labels of the plurality of samples in the sample set and characteristic distances between sample characteristics of the plurality of samples output by the age estimation model, and the sample set is continuously selected from the sample set and input into the adjusted age estimation model until the network parameters converge or reach a preset training number, so as to obtain a trained age estimation model.

The age distance between the age labels of the multiple samples can be understood as a difference between the age labels of every two samples, and the feature distance between the sample features of the multiple samples output by the age estimation model can also be understood as a difference between the sample features corresponding to every two samples, so that the model training mode in the invention restrains the feature distance according to the age distance, so that the age features learned by the model conform to an objective rule of age change, that is, the more similar the face features with smaller age differences, the larger the age differences, and the more different the face features with larger age differences, thereby the model in the embodiment can realize high-precision face age estimation.

The age estimation method provided by the embodiment of the invention comprises the steps of firstly, acquiring an image to be processed containing a human face; inputting the image to be processed into an age estimation model trained in advance to obtain an output result; and then determines an age estimation result corresponding to the face based on the output result. The method comprises the steps of carrying out age estimation on a face in an image to be processed through an age estimation model to obtain an age estimation result corresponding to the face, wherein the age estimation model is obtained through machine learning training based on age distances among age labels of a plurality of samples in a preset sample group and characteristic distances among sample characteristics of the plurality of samples output by the age estimation model; therefore, the age estimation model can learn the relation between the age distance and the characteristic distance in the training process, and restricts the characteristic distance according to the age distance in a display mode, so that the age characteristic learned by the model is more discriminative and conforms to the objective rule of age change, and the accuracy of estimating the age by the model is improved.

The embodiment of the invention also provides another age estimation method, which is realized on the basis of the method of the embodiment; the method mainly describes a specific process of determining a weight parameter of an age estimation model (realized by the following steps S202-S206) before acquiring an image to be processed, and a specific process of determining an age estimation result corresponding to a human face based on an output result (realized by the following step S210); as shown in fig. 2, the method comprises the steps of:

step S202, inputting a plurality of samples determined from a sample set into an age estimation model to obtain sample characteristics corresponding to the samples; each sample contained an age label.

The sample set comprises a plurality of samples, each sample comprises a face image and an age label corresponding to the face image, and each face image comprises a face. The plurality of samples are a plurality of samples randomly determined from a sample set. Before model training, the corresponding age label of each face image in the sample set needs to be determined, and in specific implementation, the age label of the face image is determined through the following steps 10-11:

step 10, obtaining a plurality of labeling results corresponding to the face image; the labeling result is used for identifying the age value of the person in the face image; the labeled age value in the labeling result is one of a plurality of preset age values.

The preset age values are set by the research and development personnel according to the requirements, the range and the number of the age values are also set according to the research and development requirements, for example, 101 age values can be set, and the age values are integers between 0 and 100 and respectively represent 0 to 100. The plurality of labeling results corresponding to the face image may be n labeling results obtained after the preset n individuals respectively perform age labeling on the people in the face image, and the age value labeled by the n individuals is one of the preset age values.

Step 11, calculating an average value of the age values corresponding to the plurality of labeling results to obtain an age average value; the age average is used as an age label of the face image.

For example, assuming that a plurality of preset age values are integers between 0 and 100, n persons perform age labeling on the persons in the face image to obtainn number of labeled results

Wherein k has a value ranging from 1 to n,

and representing the labeling result of the kth person to the face image, and obtaining the age mean value as follows according to the n labeling results:

wherein, a represents the age mean of the face image, namely the age label of the face image;

represents rounding down.

The age estimation model may include a feature extraction layer and an output layer; the feature extraction layer is used for extracting features of the face image to obtain feature data of the face image; the output layer is used for outputting sample characteristics and age estimation results of the face images according to the characteristic data. The feature extraction layer comprises a convolution layer and an activation function layer which are connected in sequence, the feature extraction layer can extract image features of a face image to obtain high-level semantic information of the image features, and in order to improve the performance of the feature extraction layer, the feature extraction layer generally comprises a plurality of groups of convolution layers and activation function layers which are connected in sequence. The activation function layer in the feature extraction layer can perform function transformation on the image features output by the convolutional layer, the transformation process can break the linear combination of the input of the convolutional layer, and the activation function layer can be a Sigmoid function, a tanh function, a Relu function and the like.

The output layer comprises at least one fully-connected layer, and the fully-connected layer can obtain an output result with a specified dimension. In some embodiments, to normalize the output results, an activation function layer may be concatenated after the last fully-concatenated layer in the output layer, which may employ a softmax function. In specific implementation, the number of feature extraction layers included in the age estimation model, the number of groups of a plurality of sequentially connected convolution layers and activation function layers included in each feature extraction layer, and the number of fully-connected layers in the output layer can be determined according to the speed and precision of data processing, and generally, the greater the number or the greater the number of groups, the deeper the network structure of the model is, the better the performance is, but the calculation speed is reduced.

Fig. 3 is a schematic network structure diagram of an age estimation model including 4 feature extraction layers and 2 fully-connected layers, where the 4 feature extraction layers in fig. 3 are Block1, Block2, Block3, and Block 4; the 2 full connection layers are FC1 and FC _ a, the face image is input into Block1, feature data are output after sequentially passing through Block2, Block3 and Block4, the feature data are input into FC1, a sample feature with dimension c (the value of c is set according to task requirements, the effect is better when the value of c is larger is general) is obtained, the sample feature is the image feature (also called age feature) of the extracted face image, the image feature is input into FC _ a, the age estimation result of the face image can be obtained, and the dimension of the age estimation result can be consistent with the number of a plurality of preset age values. The dimension of the data output by Block4 is usually large, and FC1 can also be understood as a dimension reduction process, that is, the feature data output by Block4 is reduced to c dimension to obtain the sample feature.

Step S204, determining the loss amount according to the age distance between the age labels corresponding to the samples and the characteristic distance between the sample characteristics of the samples output by the age estimation model.

The loss value is a value related to an age distance between age features corresponding to the plurality of samples and a feature distance between sample features of the plurality of samples output by the age estimation model. In this embodiment, the different age distances correspond to different loss functions, that is, different age distances, and the loss functions for calculating the feature distances between the sample features are different.

In a specific implementation, if the plurality of samples input into the age estimation model are two samples (i.e., a first sample and a second sample), the loss amount may include a first loss value and a second loss value; the first loss value is indicative of: the characteristic distance between the sample characteristic data of the first sample and the sample characteristic data of the second sample output by the age estimation model; the characteristic distance is determined from an age distance between an age label of the first sample and an age label of the second sample. The second loss value is used to indicate: the age estimation model outputs a gap between the age estimation result of the first sample and the age label of the first sample, and the age estimation model outputs a gap between the age estimation result of the second sample and the age label of the second sample. Specifically, the above loss amount may be a sum of the first loss value and the second loss value.

The face image in the sample is input into an age estimation model, and the age estimation model can output sample characteristics corresponding to the sample and an age estimation result corresponding to the face in the face image. In order to enable the age estimation model to learn the age characteristics (equivalent to the sample characteristics) according with the objective rule of age change in the training process, the age characteristics corresponding to the samples can be constrained according to the age distance between the samples, so that the trained age estimation model can accurately estimate the age of a person. The objective rule of the age change can be understood that the more similar the age characteristics of the faces with the similar ages, the more different the age characteristics of the faces with different ages, the larger the difference.

Specifically, the first loss value may be determined by the following equation:

wherein L is_featureRepresenting a first loss value; f1 represents a sample characteristic of the first sample; f2 represents a sample characteristic of the second sample; a1 denotes the age label of the first sample; a2 denotes the age label of the second sample; l |. electrically ventilated margin₂Representing the two-norm of the vector (i.e. the squares of the elements in a x are summed, and then the sum is rooted); max represents the selected maximum; exp is represented byAnd (4) performing exponential function operation with a natural constant e as a base.

L_featureThe meanings of (A) are as follows: as training progresses, if the age label of the first sample is the same as the age label of the second sample (i.e., the age distance corresponding to the age labels of the first and second samples is 0), L_featureThe distance between f1 and f2 is reduced, i.e. the sample features at the same age are made as similar as possible; if the age labels of the first and second samples are different, then L_featureThe distance between f1 and f2 is increased, and the greater the age difference between the age labels in the first and second samples, the greater the feature distance between the two samples will be pulled apart. Therefore, the feature constraint loss function L_featureThe age characteristics (corresponding to the first loss value) can be made to conform to the objective rule of age change, that is, the more similar the characteristics of the faces with similar ages, the more different the characteristics of the faces with different ages.

In a specific implementation, the age estimation result includes: the probability that the face in the sample belongs to each age value under a plurality of preset age values; the second loss value can be determined by the following equation:

wherein L is_ageRepresenting a second loss value;

representing an age estimation result of the first sample; a1 denotes the age label of the first sample;

to represent

The a1 th element in (1) represents the result of estimation of age

to represent

The a2 th element in (1) represents the result of estimation of age

The probability that the age value corresponding to the face in the second sample is a 2. L is_ageThe meaning is as follows: the larger the probability value corresponding to the sample label in the age identification result of the sample is, the better the probability value is.

In a specific implementation, if the preset plurality of age values are 101, the dimension of the vector corresponding to the age estimation result is 101 dimensions. The probability in the age estimation result may be a normalized probability value or an unnormalized probability value, and if the unnormalized probability value is to be converted into the normalized probability value, a softmax function may be added to the last full-connected layer of the age estimation model.

For example, assume that the first sample corresponds to an unnormalized age estimate of as

Wherein,

a probability value corresponding to 0 years of age for a face in the face image,

a probability value corresponding to the age of 1 year for a face in the face image,

for the probability value of 100 years of age corresponding to the face in the face image, each element in the age estimation result is sent into a softmax function, so that the age estimation result of the first sample after the probability value normalization can be obtained:

wherein,

representing the probability value of the face in the normalized age estimation result belonging to the age value j;

a probability value representing the probability value that a face in the non-normalized age estimation result belongs to the age value j;

and a probability value representing the probability value that the face in the age estimation result belongs to the age value m. The age estimation result of the normalized second sample can also be obtained based on the same method as described above.

Step S206, determining a weight parameter of the age estimation model according to the loss amount in the machine learning and training process; and determining the age estimation model after the weight parameters are determined as the trained age estimation model. The weight parameter is typically a weight corresponding to each network parameter in the network structure of the age estimation model.

During specific implementation, the weight parameters of the age estimation model are required to be continuously adjusted according to the loss amount until the loss amount is converged or the specified training times are reached; and determining the corresponding weight parameter when the adjustment of the weight parameter is stopped as the final weight parameter of the age estimation model.

Step S208, if the image to be processed is obtained, inputting the image to be processed into the trained age estimation model to obtain an output result of the image to be processed; the output result includes: and under a plurality of preset age values, the probability that the face in the image to be processed belongs to each age value.

Step S210, determining the age value corresponding to the maximum probability in the output result as the age estimation result corresponding to the face in the image to be processed.

According to the age estimation method, the machine learning training age estimation model is adopted, so that the multi-level semantic features related to the age can be automatically learned; in the process of training the age estimation model, the age characteristics are explicitly restricted, so that the age characteristics learned by the model can better accord with the objective rule of age change, and high-precision face age estimation can be realized.

For the embodiment of the age estimation method, an embodiment of the present invention further provides a training method of an age estimation model, as shown in fig. 4, the method includes the following steps:

step S402, acquiring a sample set; wherein the sample set comprises a plurality of samples, each sample comprising a face image and an age label indicating the age of the person in the face image.

The samples in the preset sample group may be a plurality of samples selected from a preset sample set, where the sample set includes a large number of samples, each sample includes a face image and an age tag corresponding to the face image, each face image includes a face, and the age tag identifies the age of a person in the face image. In a particular implementation, the age label may be determined by steps 10-11 in the above embodiment.

Step S404, based on the age distance between the age labels of the samples in the sample set and the characteristic distance between the sample characteristics of the samples output by the age estimation model, the initial model is trained by machine learning, and the age estimation model is obtained.

In a specific implementation, the step S404 may be implemented by the following steps 20 to 23:

step 20, determining a target sample group based on the sample set; the target set of samples includes a first sample and a second sample of a set of samples.

Step 21, inputting the first sample and the second sample into the initial model to obtain output results corresponding to the first sample and the second sample; the output includes the sample characteristic. The initial model may employ a deep learning model or a neural network model, etc. The network structure of the initial model can be seen in the network structure shown in fig. 3.

And step 22, determining the loss amount according to the age difference between the age labels corresponding to the first sample and the second sample and the output result.

Step 23, updating the weight parameters of the initial model according to the loss amount; and continuing to execute the step of determining the target sample pair based on the sample set until the loss amount is converged or a preset training number is reached, so as to obtain an age estimation model.

Specifically, the output result further includes an age estimation result, and the step 22 can be implemented by the following steps 30 to 32:

and step 30, calculating a characteristic gap between the sample characteristic data of the first sample and the sample characteristic data of the second sample according to the age distance between the age label corresponding to the first sample and the age label corresponding to the second sample to obtain a first loss value.

Step 31, calculating a first gap between the age estimation result of the first sample and the age label of the first sample, and a second gap between the age estimation result of the second sample and the age label of the second sample; and obtaining a second loss value according to the first difference and the second difference.

And step 32, obtaining the loss amount according to the first loss value and the second loss value. The loss amount may be a sum of the first loss value and the second loss value.

In specific implementation, the parts not mentioned in this embodiment can be implemented through steps S202 to S206 in the above embodiment, that is, the manner of calculating the first loss value and the second loss value in this embodiment is the same as that in the above embodiment.

Specifically, the above step 23 can be realized by the following steps 40 to 43:

step 40, calculating the derivative of the loss amount to the weight parameter to be updated in the initial model

Wherein L is the loss amount; w is a weight parameter to be updated; the weight parameters to be updated can be all parameters in the initial model, and can also be partial parameters randomly determined from the initial model; the updated weight parameter is the weight of each layer of network in the initial model. The derivative of the weight parameter to be updated can be solved according to a back propagation algorithm in general; if the loss is large, the difference between the identification result of the current initial model and the expected result is large, the derivative of the loss to the weight parameter to be updated in the initial model is solved, and the derivative can be used as the basis for updating the weight parameter to be updated.

Step 41, updating the weight parameter to be updated to obtain the updated weight parameter to be updated

Wherein α is a preset coefficient, and the preset coefficient is a manually preset hyper-parameter, and can be 0.01, 0.001, and the like. This process may also be referred to as a random gradient descent algorithm; the derivative of each weight parameter to be updated can also be understood as the direction in which the loss amount decreases most rapidly relative to the current parameter, and the loss amount can be reduced rapidly by adjusting the parameter in the direction, so that the weight parameter converges.

Step 42, judging whether the parameters of the updated initial model are all converged, and if yes, executing the step of determining a training sample based on the sample set; otherwise step 43 is performed.

If the parameters of the updated initial model do not all converge, new training samples are determined based on the sample set, and steps 30-33 are continued until the parameters of the updated initial model all converge.

And step 43, determining the initial model after the parameters are updated as the trained age estimation model.

In particular implementations, images in the sample set may be divided into a sample set used to train the model and a sample set used to validate the model according to a preset ratio (e.g., 10: 1). The identification precision of the trained age estimation model can be determined through a sample set used for verifying the model; generally, a test sample can be determined from a sample set used for verifying a model, the test sample comprises a face image and an age label corresponding to the face image, the test sample is input into a trained age estimation model to obtain an age estimation result, an age value corresponding to a maximum probability value in the age estimation result is compared with the age label to judge whether the age estimation result is correct, and the test sample is continuously determined from the sample set used for verifying the model until all samples in the sample set used for verifying the model are selected; and counting the correctness corresponding to the test result corresponding to each test sample to obtain the prediction precision of the trained age estimation model.

The training method of the age estimation model comprises the steps of firstly obtaining a sample set, and then conducting machine learning training on an initial model based on age distances among age labels of a plurality of samples in the sample set and characteristic distances among sample characteristics of the plurality of samples output by the age estimation model to obtain the age estimation model. The age estimation model summarized in the mode can learn the relation between the age distance and the characteristic distance in the training process, and restricts the characteristic distance according to the age distance in a display mode, so that the age characteristic learned by the model has more discriminative power and accords with the objective rule of age change, and the accuracy of estimating the age by the model is improved.

Corresponding to the embodiment of the age estimation method, an embodiment of the present invention further provides an age estimation apparatus, as shown in fig. 5, the apparatus including:

and the image acquisition module 50 is used for acquiring the image to be processed containing the human face.

The image input model 51 is used for inputting the image to be processed into the pre-trained age estimation model to obtain an output result; and determining an age estimation result corresponding to the face based on the output result.

The age estimation model is obtained through machine learning training based on age distances among age labels of a plurality of samples in a preset sample group and characteristic distances among sample characteristics of the plurality of samples output by the age estimation model; each of the samples contains a face image and an age label indicating the age of the person in the face image.

The age estimation device firstly acquires an image to be processed containing a human face; inputting the image to be processed into an age estimation model trained in advance to obtain an output result; and then determines an age estimation result corresponding to the face based on the output result. The method comprises the steps of carrying out age estimation on a face in an image to be processed through an age estimation model to obtain an age estimation result corresponding to the face, wherein the age estimation model is obtained through machine learning training based on age distances among age labels of a plurality of samples in a preset sample group and characteristic distances among sample characteristics of the plurality of samples output by the age estimation model; therefore, the age estimation model can learn the relation between the age distance and the characteristic distance in the training process, and restricts the characteristic distance according to the age distance in a display mode, so that the age characteristic learned by the model is more discriminative and conforms to the objective rule of age change, and the accuracy of estimating the age by the model is improved.

Further, the apparatus further includes a model parameter training module, configured to: determining a weight parameter of an age estimation model according to the loss amount in the machine learning training process; the loss amount is determined according to the age distance between the age labels corresponding to the samples and the characteristic distance between the sample characteristics of the samples output by the age estimation model.

Further, the sample group includes a first sample and a second sample; the loss amount includes a first loss value and a second loss value; the first loss value is used to indicate: the characteristic distance between the sample characteristic data of the first sample and the sample characteristic data of the second sample output by the age estimation model; the characteristic distance is determined from an age distance between an age label of the first sample and an age label of the second sample; the second loss value is used to indicate: the age estimation model outputs a gap between the age estimation result of the first sample and the age label of the first sample, and the age estimation model outputs a gap between the age estimation result of the second sample and the age label of the second sample.

In a specific implementation, the first loss value is determined by the following equation:

Further, the age estimation result includes: the probability that the face in the sample belongs to each age value under a plurality of preset age values; a second loss value determined by the following equation:

wherein L is_ageRepresenting a second loss value;

to represent

The a1 th element in (1) represents the result of estimation of age

In the first sampleThe probability when the age value corresponding to the face is a 1;

to represent

The a2 th element in (1) represents the result of estimation of age

Further, the apparatus further includes a tag determination module configured to: acquiring a plurality of labeling results corresponding to the face image; the labeling result is used for identifying the age value of the person in the face image; the labeled age value in the labeling result is one of a plurality of preset age values; calculating the average value of the age values corresponding to the plurality of labeling results to obtain an age average value; and taking the age mean value as an age label of the face image.

Further, the output result includes: the probability that the face in the image to be processed belongs to each age value is determined according to the preset age values; the image input model 51 is further configured to: and determining the age value corresponding to the maximum probability in the output result as an age estimation result corresponding to the face in the image to be processed.

The age estimation apparatus provided in the embodiment of the present invention has the same implementation principle and technical effect as those of the age estimation method embodiment, and for the sake of brief description, reference may be made to the corresponding contents in the method embodiment for the part not mentioned in the apparatus embodiment.

Corresponding to the above embodiment of the training method of the age estimation model, an embodiment of the present invention further provides a training apparatus of an age estimation model, as shown in fig. 6, the training apparatus includes:

a sample set obtaining module 60 for obtaining a sample set; wherein the sample set comprises a plurality of samples, each sample comprising a face image and an age label indicating the age of the person in the face image.

And the model training module 61 is configured to perform machine learning training on the initial model based on the age distances between the age labels of the multiple samples in the sample set and the feature distances between the sample features of the multiple samples output by the age estimation model, so as to obtain the age estimation model.

The training device of the age estimation model firstly acquires a sample set, and then performs machine learning training on an initial model based on age distances among age labels of a plurality of samples in the sample set and characteristic distances among sample characteristics of the plurality of samples output by the age estimation model to obtain the age estimation model. The age estimation model summarized in the mode can learn the relation between the age distance and the characteristic distance in the training process, and restricts the characteristic distance according to the age distance in a display mode, so that the age characteristic learned by the model has more discriminative power and accords with the objective rule of age change, and the accuracy of estimating the age by the model is improved.

Further, the model training module 61 is configured to: determining a target sample group based on the sample set; the target sample group comprises a first sample and a second sample in a sample set; inputting the first sample and the second sample into the initial model to obtain output results corresponding to the first sample and the second sample; the output result includes a sample feature; determining the loss amount according to the age difference between the age labels corresponding to the first sample and the second sample and the output result; updating the weight parameters of the initial model according to the loss amount; and continuing to execute the step of determining the target sample pair based on the sample set until the loss amount is converged or a preset training number is reached, so as to obtain an age estimation model.

Specifically, the output result further includes an age estimation result; the model training module 61 is further configured to: calculating a characteristic difference between the sample characteristic data of the first sample and the sample characteristic data of the second sample according to the age distance between the age label corresponding to the first sample and the age label of the second sample to obtain a first loss value; calculating a first gap between the age estimate of the first sample and the age label of the first sample, and a second gap between the age estimate of the second sample and the age label of the second sample; obtaining a second loss value according to the first difference and the second difference; and obtaining the loss amount according to the first loss value and the second loss value.

The second loss value is determined by the following equation:

wherein L is_ageRepresenting a second loss value;

to represent

The a1 th element in (1) represents the result of estimation of age

In the first sample ofThe probability that the age value corresponding to the face of (a) is a 1;

to represent

The a2 th element in (1) represents the result of estimation of age

The implementation principle and the generated technical effect of the training device of the age estimation model provided by the embodiment of the invention are the same as those of the embodiment of the training method of the age estimation model, and for the sake of brief description, corresponding contents in the embodiment of the method can be referred to where the embodiment of the device is not mentioned.

An embodiment of the present invention further provides an electronic device, which is shown in fig. 7 and includes a processor 101 and a memory 100, where the memory 100 stores machine executable instructions that can be executed by the processor 101, and the processor executes the machine executable instructions to implement the age estimation method or the training method of the age estimation model.

Further, the electronic device shown in fig. 7 further includes a bus 102 and a communication interface 103, and the processor 101, the communication interface 103, and the memory 100 are connected through the bus 102.

The Memory 100 may include a high-speed Random Access Memory (RAM) and may further include a non-volatile Memory (non-volatile Memory), such as at least one disk Memory. The communication connection between the network element of the system and at least one other network element is realized through at least one communication interface 103 (which may be wired or wireless), and the internet, a wide area network, a local network, a metropolitan area network, and the like can be used. The bus 102 may be an ISA bus, PCI bus, EISA bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 7, but this does not indicate only one bus or one type of bus.

The processor 101 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 101. The Processor 101 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a discrete Gate or transistor logic device, or a discrete hardware component. The various methods, steps and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 100, and the processor 101 reads the information in the memory 100, and completes the steps of the method of the foregoing embodiment in combination with the hardware thereof.

An embodiment of the present invention further provides a machine-readable storage medium, where the machine-readable storage medium stores machine-executable instructions, and when the machine-executable instructions are called and executed by a processor, the machine-executable instructions cause the processor to implement the age estimation method or the training method for the age estimation model, and specific implementation may refer to method embodiments and will not be described herein again.

It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the apparatus and/or the electronic device described above may refer to corresponding processes in the foregoing method embodiments, and are not described herein again.

Finally, it should be noted that: the above-mentioned embodiments are only specific embodiments of the present invention, which are used for illustrating the technical solutions of the present invention and not for limiting the same, and the protection scope of the present invention is not limited thereto, although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive the technical solutions described in the foregoing embodiments or equivalent substitutes for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method of age estimation, the method comprising:

acquiring an image to be processed containing a human face;

inputting the image to be processed into an age estimation model trained in advance to obtain an output result; determining an age estimation result corresponding to the face based on the output result;

2. The method of claim 1, wherein the weight parameters of the age estimation model are determined from the amount of loss during machine learning training; wherein the loss amount is determined according to age distances between age labels corresponding to the samples and characteristic distances between sample characteristics of the samples output by the age estimation model.

3. The method of claim 2, wherein the set of samples comprises a first sample and a second sample; the loss amount comprises a first loss value and a second loss value;

the first loss value is used to indicate: a feature distance between sample feature data of the first sample and sample feature data of the second sample output by the age estimation model; the characteristic distance is determined from an age distance between an age label of the first sample and an age label of the second sample;

the second loss value is used to indicate: a gap between an age estimation result of the first sample and an age label of the first sample output by the age estimation model, and a gap between an age estimation result of the second sample and an age label of the second sample output by the age estimation model.

4. The method of claim 3, wherein the first loss value is determined by the following equation:

wherein L is_featureRepresenting the first loss value; f1 represents a sample characteristic of the first sample; f2 represents a sample characteristic of the second sample; a1 denotes an age mark of the first sampleSigning; a2 represents the age label of the second sample; l |. electrically ventilated margin₂A two-norm representation of a vector; max represents the selected maximum; exp denotes an exponential function operation with a natural constant e as the base.

5. The method of claim 3, wherein the age estimation comprises: the probability that the face in the sample belongs to each age value under a plurality of preset age values;

the second loss value is determined by the following equation:

wherein L is_ageRepresenting the second loss value;

representing an age estimation of the first sample; a1 represents an age label of the first sample;

to represent

A1 th element in (a) represents the result of the age estimation

representing an age estimation of the second sample; a2 represents the age label of the second sample;

to represent

The a2 th element in (1) represents the result of estimation of age

6. The method of claim 1, wherein the age label of the face image is determined by:

acquiring a plurality of labeling results corresponding to the face image; the labeling result is used for identifying the age value of the person in the face image; the labeled age value in the labeling result is one of a plurality of preset age values;

calculating the average value of the age values corresponding to the plurality of labeling results to obtain an age average value; and taking the age mean value as an age label of the face image.

7. The method of claim 1, wherein outputting the result comprises: the probability that the face in the image to be processed belongs to each age value is determined according to the age values;

the step of determining an age estimation result corresponding to the face based on the output result includes:

and determining the age value corresponding to the maximum probability in the output result as an age estimation result corresponding to the face in the image to be processed.

8. A method for training an age estimation model, the method comprising:

obtaining a sample set; wherein the sample set comprises a plurality of samples, each sample comprises a face image and an age label, and the age label is used for indicating the age of a person in the face image;

and performing machine learning training on an initial model based on age distances among age labels of a plurality of samples in the sample set and characteristic distances among sample characteristics of the plurality of samples output by the age estimation model to obtain the age estimation model.

9. The training method according to claim 8, wherein the step of performing machine learning training on an initial model based on age distances between age labels of a plurality of samples in the sample set and feature distances between sample features of a plurality of samples output by the age estimation model to obtain the age estimation model comprises:

determining a target sample group based on the sample set; the target sample group comprises a first sample and a second sample in the sample set;

inputting the first sample and the second sample into the initial model to obtain output results corresponding to the first sample and the second sample; the output result comprises a sample feature;

determining the loss amount according to the age difference between the age labels corresponding to the first sample and the second sample and the output result;

updating the weight parameters of the initial model according to the loss amount; and continuing to execute the step of determining a target sample pair based on the sample set until the loss amount converges or a preset training time is reached, so as to obtain the age estimation model.

10. A training method as claimed in claim 9, wherein the output further comprises an age estimation result; the step of determining the loss amount according to the age difference between the age labels corresponding to the first sample and the second sample and the output result comprises:

calculating a characteristic gap between the sample characteristic data of the first sample and the sample characteristic data of the second sample according to the age distance between the age label corresponding to the first sample and the age label corresponding to the second sample to obtain a first loss value;

calculating a first gap between the age estimate of the first sample and the age label of the first sample and a second gap between the age estimate of the second sample and the age label of the second sample; obtaining a second loss value according to the first difference and the second difference;

and obtaining the loss amount according to the first loss value and the second loss value.

11. An age estimation apparatus, characterized in that the apparatus comprises:

the image acquisition module is used for acquiring an image to be processed containing a human face;

the image input model is used for inputting the image to be processed into an age estimation model which is trained in advance to obtain an output result; determining an age estimation result corresponding to the face based on the output result;

12. An apparatus for training an age estimation model, the apparatus comprising:

the sample set acquisition module is used for acquiring a sample set; wherein the sample set comprises a plurality of samples, each sample comprises a face image and an age label, and the age label is used for indicating the age of a person in the face image;

and the model training module is used for performing machine learning training on an initial model based on the age distance between the age labels of the samples in the sample set and the characteristic distance between the sample characteristics of the samples output by the age estimation model to obtain the age estimation model.

13. An electronic device comprising a processor and a memory, the memory storing machine executable instructions executable by the processor, the processor executing the machine executable instructions to implement the age estimation method of any one of claims 1 to 7 or the training method of the age estimation model of any one of claims 8 to 10.

14. A machine-readable storage medium having stored thereon machine-executable instructions which, when invoked and executed by a processor, cause the processor to implement the age estimation method of any of claims 1 to 7 or the training method of the age estimation model of any of claims 8 to 10.