CN112560823A - Adaptive variance and weight face age estimation method based on distribution learning
- Publication number: CN112560823A (application number CN202110199644.8A)
- Authority: CN (China)
- Prior art keywords: face, loss, model, weight, age
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/178—Human faces, e.g. facial parts, sketches or expressions estimating age from face image; using age information for improving recognition
Abstract
The invention belongs to the field of computer vision and pattern recognition, and particularly relates to a face age estimation method, system and device based on adaptive variance and weight of distribution learning, aiming at solving the problem that the face age estimation result differs greatly from the true value because existing face age data sets are class-imbalanced and existing methods rely on a fixed variance. The method comprises: preprocessing an input image to obtain a preprocessed image; and obtaining the predicted age of the face in the preprocessed image through a pre-trained face age estimation model. The face age estimation model is constructed based on a deep residual network. The invention reduces the difference between the face age estimation result and the true value.
Description
Technical Field
The invention belongs to the field of computer vision and pattern recognition, and particularly relates to a method, a system and a device for face age estimation based on adaptive variance and weight of distribution learning.
Background
The human face, as one of the most important biometric features of human beings, contains a great deal of information such as identity, race, sex, expression and age. With the development of artificial intelligence technology, research on face-related technology has become a hotspot in the field of computer vision. Age estimation is an important task in face attribute recognition, namely predicting age from an input face image. It has many potential applications, including demographic data collection, business user management and video security monitoring. Compared with traditional methods based on hand-crafted features and statistical modeling, methods based on convolutional neural networks have achieved better performance in recent years.
One of the most troublesome problems in face age estimation is the ambiguity of labels: face images of the same person at similar ages are often difficult to distinguish. Some existing approaches use label distribution learning (LDL) to address this problem by exploiting the semantic relatedness between age labels. These methods assume that the true age of a face can be represented by a discrete distribution, and then measure the similarity between the predicted distribution and the true distribution using the K-L divergence. For label distribution learning, the mean of the age label distribution is the real age value, but the distribution variance of a face image is usually unknown. These methods usually treat the variance as a hyperparameter and set it directly to some fixed value. However, the variance is closely related to the correlation between adjacent ages; the variance of different people should be different, and the variance of the same person at different ages should also be different. Assuming that all images share the same variance may therefore degrade the performance of the model.
Another troublesome problem in face age estimation is that existing public data sets suffer from serious class imbalance, which restricts the performance of age estimation: there are too many samples of adolescents and young adults and too few samples of the elderly. In machine learning, it is typically assumed that the training samples are balanced across classes, i.e., each class contains a similar number of samples, but practical problems often do not meet this assumption. Generally, unbalanced samples cause the trained model to focus on classes with more samples while "overlooking" classes with fewer samples, thus affecting the generalization capability of the model on the test data.
Disclosure of Invention
In order to solve the above-mentioned problems in the prior art, namely that the face age estimation result differs greatly from the true value because existing face age data sets are class-imbalanced and existing methods rely on a fixed variance, the first aspect of the present invention provides a face age estimation method based on adaptive variance and weight of distribution learning, which comprises:
step S10, acquiring a face image whose age is to be estimated as an input image;
step S20, preprocessing the input image to obtain a preprocessed image; obtaining the predicted age of the face in the preprocessed image through a pre-trained face age estimation model;
the face age estimation model is constructed based on a deep residual network; the training method comprises the following steps:
a10, pre-training a face age estimation model on an IMDB-WIKI database, and taking the pre-trained model as a first model;
a20, acquiring a face sample image and a truth label of the face sample image corresponding to the face age, and constructing a training set; inputting a face sample image into a first model to obtain the prediction probability distribution of the face age and a corresponding prediction value label;
a30, taking a truth-value label of the face age as a mean value, combining a preset variance to construct a normal distribution, and combining a prediction probability distribution corresponding to each face sample image to calculate the K-L loss;
a40, calculating a cross entropy loss value of each face sample image according to a corresponding predicted value label and a corresponding true value label, and performing weighted summation by combining with a preset weight to obtain a weight loss;
a50, summing the weight loss and the K-L loss to obtain a total loss, updating model parameters of the first model by an SGD optimization method, and taking the updated model as a second model;
a60, selecting the same number of face sample images from all ages in a training set, and constructing a verification set; inputting the face sample images in the verification set into a second model to obtain corresponding predicted value labels, and calculating L1 loss and cross entropy loss by combining the true value labels of the face sample images;
a70, updating the variance and weight of each face image by combining L1 loss and cross entropy loss corresponding to each face sample image in the verification set; and after updating, recalculating the K-L loss and the weight loss of each face sample image in the training set, updating the model parameters of the second model, and taking the updated second model as a finally trained face age estimation model.
In some preferred embodiments, the normal distribution of the truth label is:
$$p_i(y) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left(-\frac{(y-\mu_i)^2}{2\sigma_i^2}\right)$$

wherein $p_i(y)$ represents the probability that the true age of the face in the $i$-th face sample image is $y$, i.e. a normal distribution, $\mu_i$ is the truth label of the face age in the face sample image, and $\sigma_i$ represents the corresponding initial variance value of the face sample image.
In some preferred embodiments, the K-L loss is calculated by:
$$L_{KL} = \sum_{i=1}^{N}\sum_{y} p_i(y)\,\log\frac{p_i(y)}{\hat{p}_i(y;\theta)}$$

wherein $L_{KL}$ represents the K-L loss value, $\hat{p}_i(y;\theta)$ represents the predicted probability distribution corresponding to the face sample image $x_i$, and $\theta$ represents the model parameters.
In some preferred embodiments, the weight loss is calculated by:
$$L_{w} = \frac{1}{N}\sum_{i=1}^{N} w_i\, L_{CE}\!\left(\hat{y}_i, y_i\right)$$

wherein $L_w$ represents the weight loss value, $L_{CE}$ represents the cross-entropy loss value, $\hat{y}_i$ represents the prediction result of the $i$-th face sample image, $y_i$ represents the truth label of the $i$-th face sample image, $w_i$ represents the weight corresponding to the $i$-th face sample image, and $N$ represents the number of face sample images.
In some preferred embodiments, the method for updating the variance and the weight of each face image in combination with the L1 loss and the cross entropy loss corresponding to each face sample image in the verification set includes:
summing the L1 loss and the cross entropy loss corresponding to each face sample image in the verification set to obtain a first loss;
updating the variance and the disturbance variable corresponding to the weight of each face sample image based on the first loss;
and updating the corresponding variance and weight of each face sample image based on the updated variance and the disturbance variable corresponding to the weight.
In some preferred embodiments, the method for updating the variance and the disturbance variable corresponding to the weight of each face sample image based on the first loss includes:
$$\epsilon_\sigma' = \epsilon_\sigma - \eta_\sigma\,\nabla_{\epsilon_\sigma} L_{val}\!\left(x^{v}, y^{v}; \theta'\right)$$
$$\epsilon_w' = \epsilon_w - \eta_w\,\nabla_{\epsilon_w} L_{val}\!\left(x^{v}, y^{v}; \theta'\right)$$

wherein $L_{val}$ represents the first loss, $\epsilon_\sigma'$ and $\epsilon_w'$ represent the updated disturbance variables corresponding to the variance and the weight, $\epsilon_\sigma$ and $\epsilon_w$ represent the disturbance variables corresponding to the variance and the weight before updating, $\eta_\sigma$ and $\eta_w$ represent preset descent step sizes, $x^{v}$ and $y^{v}$ represent the face sample images in the verification set and the corresponding truth labels, $\theta'$ represents the updated model parameters, and $\nabla_{\epsilon_\sigma} L_{val}$ and $\nabla_{\epsilon_w} L_{val}$ represent the gradients of the first loss with respect to $\epsilon_\sigma$ and $\epsilon_w$.
In a second aspect of the present invention, a face age estimation system based on adaptive variance and weight of distribution learning is provided, the system comprising: an image acquisition module and an age prediction module;
the image acquisition module is configured to acquire a face image to be age estimated as an input image;
the age prediction module is configured to preprocess the input image to obtain a preprocessed image, and to obtain the predicted age of the face in the preprocessed image through a pre-trained face age estimation model;
the face age estimation model is constructed based on a deep residual network; the training method comprises the following steps:
a10, pre-training a face age estimation model on an IMDB-WIKI database, and taking the pre-trained model as a first model;
a20, acquiring a face sample image and a truth label of the face sample image corresponding to the face age, and constructing a training set; inputting a face sample image into a first model to obtain the prediction probability distribution of the face age and a corresponding prediction value label;
a30, taking a truth-value label of the face age as a mean value, combining a preset variance to construct a normal distribution, and combining a prediction probability distribution corresponding to each face sample image to calculate the K-L loss;
a40, calculating a cross entropy loss value of each face sample image according to a corresponding predicted value label and a corresponding true value label, and performing weighted summation by combining with a preset weight to obtain a weight loss;
a50, summing the weight loss and the K-L loss to obtain a total loss, updating model parameters of the first model by an SGD optimization method, and taking the updated model as a second model;
a60, selecting the same number of face sample images from all ages in a training set, and constructing a verification set; inputting the face sample images in the verification set into a second model to obtain corresponding predicted value labels, and calculating L1 loss and cross entropy loss by combining the true value labels of the face sample images;
a70, updating the variance and weight of each face image by combining L1 loss and cross entropy loss corresponding to each face sample image in the verification set; and after updating, recalculating the K-L loss and the weight loss of each face sample image in the training set, updating the model parameters of the second model, and taking the updated second model as a finally trained face age estimation model.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-mentioned face age estimation method based on adaptive variance and weight of distribution learning.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor and a storage device; the processor is adapted to execute various programs; the storage device is adapted to store a plurality of programs; the programs are adapted to be loaded and executed by the processor to implement the above-mentioned face age estimation method based on adaptive variance and weight of distribution learning.
The invention has the beneficial effects that:
the invention reduces the difference between the estimation result of the face age and the true value. The method of the invention utilizes the meta-learning training strategy to execute the step of meta-gradient descent on variables (namely, variance and weight), and the predicted age label distribution is more effectively close to the real distribution by adaptively learning the proper variance and weight of each sample, and meanwhile, the generalization capability of the model is improved, the purpose of improving the age estimation effect is achieved, and the difference between the age estimation result of the human face and the real value is further reduced.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings.
FIG. 1 is a flowchart illustrating a method for estimating a face age based on adaptive variance and weight of distribution learning according to an embodiment of the present invention;
FIG. 2 is a block diagram of an adaptive variance and weight face age estimation system based on distribution learning according to an embodiment of the present invention;
FIG. 3 is a schematic frame diagram of a training process of a face age estimation model according to an embodiment of the present invention;
fig. 4 is a flowchart illustrating a training process of a face age estimation model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict.
The invention relates to a face age estimation method based on distribution learning adaptive variance and weight, as shown in figure 1, the method comprises the following steps:
step S10, acquiring a face image to be age estimated as an input image;
step S20, preprocessing the input image to obtain a preprocessed image; obtaining the predicted age of the face in the preprocessed image through a pre-trained face age estimation model;
the face age estimation model is constructed based on a deep residual network; the training method comprises the following steps:
a10, pre-training a face age estimation model on an IMDB-WIKI database, and taking the pre-trained model as a first model;
a20, acquiring a face sample image and a truth label of the face sample image corresponding to the face age, and constructing a training set; inputting a face sample image into a first model to obtain the prediction probability distribution of the face age and a corresponding prediction value label;
a30, taking a truth-value label of the face age as a mean value, combining a preset variance to construct a normal distribution, and combining a prediction probability distribution corresponding to each face sample image to calculate the K-L loss;
a40, calculating a cross entropy loss value of each face sample image according to a corresponding predicted value label and a corresponding true value label, and performing weighted summation by combining with a preset weight to obtain a weight loss;
a50, summing the weight loss and the K-L loss to obtain a total loss, updating model parameters of the first model by an SGD optimization method, and taking the updated model as a second model;
a60, selecting the same number of face sample images from all ages in a training set, and constructing a verification set; inputting the face sample images in the verification set into a second model to obtain corresponding predicted value labels, and calculating L1 loss and cross entropy loss by combining the true value labels of the face sample images;
a70, updating the variance and weight of each face image by combining L1 loss and cross entropy loss corresponding to each face sample image in the verification set; and after updating, recalculating the K-L loss and the weight loss of each face sample image in the training set, updating the model parameters of the second model, and taking the updated second model as a finally trained face age estimation model.
In order to more clearly describe the method for estimating the age of the human face based on the adaptive variance and the weight of the distribution learning, the following describes in detail the steps in an embodiment of the method of the present invention with reference to the drawings.
In the following embodiments, the training process of the face age estimation model is detailed first, and then the process of obtaining the predicted age of the face by the face age estimation method based on the adaptive variance and weight of the distribution learning is detailed.
1. Training process of the face age estimation model, as shown in fig. 3 and 4
A10, pre-training a face age estimation model on an IMDB-WIKI database, and taking the pre-trained model as a first model;
In this embodiment, the deep residual network ResNet-18 is used as the backbone network of the face age estimation model to extract features. Unlike the commonly used VGG-16 network, ResNet-18 converges more quickly and improves classification accuracy while keeping the number of network parameters limited, which ensures the feasibility of industrial application.
ResNet-18 comprises a convolutional neural network (CNN) and two fully connected layers. The convolutional neural network extracts a feature map from the input face sample image, the first fully connected layer maps the extracted feature map into a feature vector, and the second fully connected layer outputs the corresponding predicted age distribution according to the feature vector output by the first fully connected layer.
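By way of illustration, the backbone described above could be sketched in PyTorch as follows; the number of age classes (101) and the feature-vector dimension are assumed values not fixed by the description.

```python
# Minimal PyTorch sketch of the described backbone: a ResNet-18 trunk followed by
# two fully connected layers. The 101 age classes and 512-d feature vector are
# assumed values.
import torch
import torch.nn as nn
from torchvision import models

class AgeEstimator(nn.Module):
    def __init__(self, num_ages=101, feat_dim=512):
        super().__init__()
        backbone = models.resnet18()                                # deep residual network
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])   # feature-map extractor
        self.fc1 = nn.Linear(512, feat_dim)                         # feature map -> feature vector
        self.fc2 = nn.Linear(feat_dim, num_ages)                    # feature vector -> age logits

    def forward(self, x):
        feat = self.cnn(x).flatten(1)        # (B, 512)
        feat = torch.relu(self.fc1(feat))
        # A softmax over these logits gives the predicted age distribution.
        return self.fc2(feat)
```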
In the present invention, training is divided into two major steps. In the first step, the face age estimation model is pre-trained on the IMDB-WIKI database, a data set containing about 500,000 images from IMDb and Wikipedia. The pre-trained model is taken as the first model.
A20, acquiring a face sample image and a truth label of the face sample image corresponding to the face age, and constructing a training set; inputting a face sample image into a first model to obtain the prediction probability distribution of the face age and a corresponding prediction value label;
In this embodiment, face sample images are obtained to construct a training set, and the real age of the face in each face sample image is labeled as its truth label. The face sample images are then input into the first model to obtain the prediction probability distribution of the face age in each image and the corresponding predicted value label. In addition, unlike the common division into only a training set and a test set, the invention adds a verification set: specifically, the same number of pictures are selected from every age in the training set to form a small unbiased verification set, and a suitable variance and weight for each picture are adaptively found according to the performance of the input images on the verification set, as sketched below.
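For illustration only, such a class-balanced verification set could be drawn as in the following sketch; the number of images per age is an assumed value.

```python
# Sketch of building a small class-balanced verification set: the same number of
# images is drawn for every age present in the training set. The per-age count (5)
# is an assumed value.
import random
from collections import defaultdict

def build_verification_set(train_samples, per_age=5, seed=0):
    rng = random.Random(seed)
    by_age = defaultdict(list)
    for image_path, age in train_samples:
        by_age[age].append((image_path, age))
    verification = []
    for age, items in sorted(by_age.items()):
        verification.extend(rng.sample(items, min(per_age, len(items))))
    return verification
```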
A30, taking a truth-value label of the face age as a mean value, combining a preset variance to construct a normal distribution, and combining a prediction probability distribution corresponding to each face sample image to calculate the K-L loss;
In this embodiment, $\sigma_i$ denotes the variance of the $i$-th face sample image, and the initial variance of all face sample images is set to a fixed value $\sigma_0$. To find a suitable variance, each $\sigma_i$ is perturbed by $\epsilon_{\sigma,i}$, as shown in equation (1):

$$\sigma_i = \sigma_0 + \epsilon_{\sigma,i} \qquad (1)$$

wherein $\epsilon_{\sigma,i}$ is initially set to 0 and represents the $i$-th component of the disturbance vector $\epsilon_\sigma$; finding a suitable variance $\sigma_i$ is therefore equivalent to finding a suitable $\epsilon_{\sigma,i}$.

In addition, $w_i$ denotes the weight of the $i$-th face sample image, and the initial weights of all face sample images are set to 0. To find a suitable weight, each weight is perturbed by $\epsilon_{w,i}$; finding a suitable weight $w_i$ is therefore equivalent to finding a suitable $\epsilon_{w,i}$.
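A minimal sketch of how these per-sample adaptive variables could be held as learnable perturbation vectors, assuming the additive form of equation (1); the training-set size and the initial variance are assumed values.

```python
# Sketch of the per-sample adaptive variables: each variance is the fixed initial
# value sigma_0 plus a learnable perturbation (equation (1)), and each weight is a
# learnable perturbation initialised to zero and kept non-negative.
import torch

num_train = 10000                 # assumed number of training images
sigma_0 = 2.0                     # assumed fixed initial variance
eps_sigma = torch.zeros(num_train, requires_grad=True)   # variance perturbations
eps_w = torch.zeros(num_train, requires_grad=True)       # weight perturbations

def sample_variance(idx):
    return sigma_0 + eps_sigma[idx]            # sigma_i = sigma_0 + eps_i

def sample_weight(idx):
    return torch.clamp(eps_w[idx], min=0.0)    # weights constrained to be non-negative
```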
A normal distribution of the truth label is constructed by taking the truth label of the face age as the mean and combining the preset variance of the face sample image. The normal distribution is shown in equation (2):
$$p_i(y) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left(-\frac{(y-\mu_i)^2}{2\sigma_i^2}\right) \qquad (2)$$

wherein $p_i(y)$ represents the probability that the true age of the face in the $i$-th face sample image is $y$, i.e. a normal distribution, $\mu_i$ is the truth label of the face age in the face sample image, and $\sigma_i$ represents the corresponding initial variance value of the face sample image.
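As a rough sketch, the target distribution of equation (2) can be discretized over an assumed age range of 0–100 as follows:

```python
# Sketch of equation (2): a discrete Gaussian target distribution centred on the
# true age with per-sample variance sigma, normalised over the age range.
import torch

def target_distribution(true_age, sigma, num_ages=101):
    ages = torch.arange(num_ages, dtype=torch.float32)
    p = torch.exp(-(ages - float(true_age)) ** 2 / (2.0 * float(sigma) ** 2))
    return p / p.sum()          # normalise so the probabilities sum to one
```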
The K-L loss is calculated based on the normal distribution of the truth labels and the prediction probability distribution of the face sample images; the calculation process is shown in formula (3):
$$L_{KL} = \sum_{i=1}^{N}\sum_{y} p_i(y)\,\log\frac{p_i(y)}{\hat{p}_i(y;\theta)} \qquad (3)$$

wherein $L_{KL}$ represents the K-L loss value, $\hat{p}_i(y;\theta)$ represents the predicted probability distribution corresponding to the face sample image $x_i$, and $\theta$ represents the model parameters.
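A minimal sketch of the K-L loss of equation (3) between the target distributions and the distributions predicted by the model (a small clamp is added for numerical stability; the averaging over the batch is an assumed choice):

```python
# Sketch of equation (3): KL(target || predicted) summed over ages and averaged
# over the batch; logits are the raw model outputs.
import torch.nn.functional as F

def kl_loss(logits, target_dists, eps=1e-8):
    log_pred = F.log_softmax(logits, dim=1)
    t = target_dists.clamp(min=eps)
    return (t * (t.log() - log_pred)).sum(dim=1).mean()
```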
A40, calculating a cross entropy loss value of each face sample image according to a corresponding predicted value label and a corresponding true value label, and performing weighted summation by combining with a preset weight to obtain a weight loss;
In this embodiment, a cross-entropy loss value is calculated from the prediction result and the truth label of each face sample image. A weight loss value is then obtained based on the cross-entropy loss value and the weight corresponding to each face sample image; the specific process is shown in formula (4):
$$L_{w} = \frac{1}{N}\sum_{i=1}^{N} w_i\, L_{CE}\!\left(\hat{y}_i, y_i\right) \qquad (4)$$

wherein $L_w$ represents the weight loss value, $L_{CE}$ represents the cross-entropy loss value, $\hat{y}_i$ represents the prediction result of the $i$-th face sample image, $y_i$ represents the corresponding truth label, $w_i$ represents the weight corresponding to the $i$-th face sample image, and $N$ represents the number of face sample images.
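A small sketch of the weight loss of equation (4), assuming the true ages are given as class indices:

```python
# Sketch of equation (4): per-sample cross entropy between the predicted logits
# and the true age class, multiplied by the adaptive sample weight and averaged.
import torch.nn.functional as F

def weight_loss(logits, true_ages, weights):
    # true_ages: LongTensor of age class indices; weights: per-sample weights.
    ce = F.cross_entropy(logits, true_ages, reduction='none')   # one loss value per image
    return (weights * ce).mean()
```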
A50, summing the weight loss and the K-L loss to obtain a total loss, updating model parameters of the first model by an SGD optimization method, and taking the updated model as a second model;
In this embodiment, the weight loss and the K-L loss are summed to obtain the total loss, and the model parameters of the first model are updated by the SGD optimization method, as shown in equations (5) and (6):
$$L_{total} = L_{KL} + L_{w} \qquad (5)$$
$$\theta' = \theta_t - \alpha\,\nabla_{\theta} L_{total} \qquad (6)$$

wherein $\alpha$ represents the learning step size, $\theta_t$ represents the model parameters at the $t$-th iteration, $\theta'$ represents the updated model parameters, and $\nabla_{\theta} L_{total}$ represents the gradient of the total loss with respect to $\theta$. The updated first model is taken as the second model.
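For illustration, one inner training step combining equations (5) and (6) could be sketched as follows, reusing the model and loss helpers sketched above; the learning rate and momentum are assumed values.

```python
# Sketch of equations (5) and (6): total loss = K-L loss + weight loss, followed
# by one SGD step on the model parameters.
import torch

model = AgeEstimator()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

def train_step(images, true_ages, sigmas, weights):
    logits = model(images)
    targets = torch.stack([target_distribution(a, s)
                           for a, s in zip(true_ages.tolist(), sigmas.tolist())])
    total_loss = kl_loss(logits, targets) + weight_loss(logits, true_ages, weights)
    optimizer.zero_grad()
    total_loss.backward()
    optimizer.step()            # theta' = theta - alpha * grad_theta(L_total)
    return total_loss.item()
```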
A60, selecting the same number of face sample images from all ages in a training set, and constructing a verification set; inputting the face sample images in the verification set into a second model to obtain corresponding predicted value labels, and calculating L1 loss and cross entropy loss by combining the true value labels of the face sample images;
In this embodiment, the same number of images (pictures) are selected from every age in the training set to form a small unbiased verification set, and a suitable variance and weight for each picture are adaptively found according to the performance of the input images on the verification set. That is, the updated model parameters are shared with the verification-set model, and the joint loss of the L1 loss and the corresponding cross-entropy loss, computed from the predicted expected age and the truth labels on the verification set, is used as the meta-objective. Using this meta-objective, the two disturbance variables are updated through gradient descent, and a suitable variance and weight are adaptively found, so that the model performs better on the verification set. The specific steps are as follows:
The face sample images $x^{v}$ of the verification set, whose truth labels of the face age are $y^{v}$, are input into the second model. Since the training loss is defined on distributions, the invention adopts an L1 loss during verification to compensate for the lack of a constraint on the final predicted age value; the L1 loss measures the distance between the predicted face age and the verification truth value, as shown in equation (7):
$$\hat{y}^{v}_{j} = \sum_{k} k\,\hat{p}_{j,k}, \qquad L_{1} = \frac{1}{M}\sum_{j=1}^{M}\left|\hat{y}^{v}_{j} - y^{v}_{j}\right| \qquad (7)$$

wherein $\hat{p}_{j,k}$ is the $k$-th component of the prediction vector for the $j$-th verification-set input, $k$ represents the age value of the $k$-th class, $L_1$ represents the L1 loss value, $M$ represents the number of verification-set inputs, and $\hat{y}^{v}_{j}$ represents the predicted age value corresponding to the $j$-th verification-set input.
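A brief sketch of the expected age and L1 loss of equation (7):

```python
# Sketch of equation (7): the expected age under the predicted distribution,
# sum_k k * p_k, compared with the verification ground truth by an L1 distance.
import torch
import torch.nn.functional as F

def expected_age(logits):
    probs = F.softmax(logits, dim=1)
    ages = torch.arange(probs.size(1), dtype=probs.dtype, device=probs.device)
    return (probs * ages).sum(dim=1)

def l1_loss(logits, true_ages):
    return (expected_age(logits) - true_ages.float()).abs().mean()
```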
A70, updating the variance and weight of each face image by combining L1 loss and cross entropy loss corresponding to each face sample image in the verification set; and after updating, recalculating the K-L loss and the weight loss of each face sample image in the training set, updating the model parameters of the second model, and taking the updated second model as a finally trained face age estimation model.
Better hyperparameters mean better verification performance. In this embodiment, the L1 loss and the cross-entropy loss corresponding to each face sample image in the verification set are summed to serve as the first loss, and the disturbance variables $\epsilon_\sigma$ and $\epsilon_w$ are updated by the gradient descent method based on the first loss, as shown in equations (8), (9) and (10):
$$L_{val} = L_{1} + L_{CE} \qquad (8)$$
$$\epsilon_\sigma' = \epsilon_\sigma - \eta_\sigma\,\nabla_{\epsilon_\sigma} L_{val}\!\left(x^{v}, y^{v}; \theta'\right) \qquad (9)$$
$$\epsilon_w' = \epsilon_w - \eta_w\,\nabla_{\epsilon_w} L_{val}\!\left(x^{v}, y^{v}; \theta'\right) \qquad (10)$$

wherein $\eta_\sigma$ and $\eta_w$ represent preset descent step sizes, $L_{val}$ represents the first loss, $\epsilon_\sigma'$ and $\epsilon_w'$ represent the updated disturbance variables corresponding to the variance and the weight, and $\nabla_{\epsilon_\sigma} L_{val}$ and $\nabla_{\epsilon_w} L_{val}$ represent the gradients of the first loss with respect to $\epsilon_\sigma$ and $\epsilon_w$.
Due to the non-negativity constraints on the variance and the weight, the disturbance variable $\epsilon_\sigma$ is normalized to $[-1, 1]$, and the weight is first constrained to be greater than 0 and then normalized.
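One possible form of this meta-update is sketched below for the disturbance variables introduced earlier; it assumes the first loss was computed through a differentiable one-step model update (so that gradients can flow back to the disturbance variables), and the step sizes and the exact normalization are assumed choices.

```python
# Sketch of equations (8)-(10): the first loss (L1 + cross entropy on the
# verification set) is back-propagated to the two disturbance variables, which
# take one gradient-descent step and are then normalised as described above.
import torch

def meta_update(first_loss, eps_sigma, eps_w, eta_sigma=0.1, eta_w=0.1):
    # first_loss must carry a graph that reaches eps_sigma and eps_w, i.e. it was
    # computed through a differentiable inner update of the model (meta-learning).
    g_sigma, g_w = torch.autograd.grad(first_loss, [eps_sigma, eps_w])
    with torch.no_grad():
        eps_sigma -= eta_sigma * g_sigma
        eps_w -= eta_w * g_w
        eps_sigma.clamp_(-1.0, 1.0)            # variance perturbation kept in [-1, 1]
        eps_w.clamp_(min=0.0)                  # weights constrained to be non-negative...
        eps_w /= eps_w.sum().clamp(min=1e-8)   # ...then normalised (unit sum assumed)
    return eps_sigma, eps_w
```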
Finally, using the learned variance and weight, the invention recalculates the sum of the K-L loss and the weight loss of the training input images in a forward pass, and then updates the model parameters with the SGD optimizer. The updated model is taken as the finally trained face age estimation model, as shown in fig. 3.
In addition, the evaluation index in the invention preferably adopts the MAE, namely the mean of the absolute age differences between the predicted values and the true values over all face images; the lower the MAE, the more accurate the model prediction.
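As a small illustration, the MAE could be computed as:

```python
# Sketch of the MAE evaluation index: mean absolute difference between predicted
# and true ages over all evaluated face images.
def mean_absolute_error(pred_ages, true_ages):
    return sum(abs(float(p) - float(t)) for p, t in zip(pred_ages, true_ages)) / len(pred_ages)
```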
2. Adaptive variance and weight face age estimation method based on distribution learning
Step S10, acquiring a face image to be age estimated as an input image;
in this embodiment, a face image is acquired, and face age estimation is performed.
Step S20, preprocessing the input image to obtain a preprocessed image; and obtaining the predicted age of the face in the preprocessed image through a pre-trained face age estimation model.
In this embodiment, the obtained face image is preprocessed and input into the trained face age estimation model to obtain the predicted age of the face in the input image, and the predicted age is output as the estimation result. The preprocessing process comprises: detecting the face region, locating the face key points, and correcting the face image according to the key-point locations.
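Assuming OpenCV is used for face detection, the preprocessing could be sketched as follows; the 224×224 input size and the Haar-cascade detector are assumptions, and key-point alignment is left as a placeholder.

```python
# Minimal preprocessing sketch: detect the face region, crop it, and
# resize/normalise it for the network. Key-point localisation and rotation
# correction are summarised by a placeholder comment.
import cv2
import numpy as np

def preprocess(image_bgr, size=224):
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    x, y, w, h = faces[0]                       # take the first detected face
    face = image_bgr[y:y + h, x:x + w]
    # Face key-point localisation and alignment would be applied here.
    face = cv2.resize(face, (size, size)).astype(np.float32) / 255.0
    return face
```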
A second embodiment of the present invention relates to a face age estimation system based on adaptive variance and weight of distribution learning, as shown in fig. 2, specifically including: an image acquisition module 100, an age prediction module 200;
the image acquisition module 100 is configured to acquire a face image to be age-estimated as an input image;
the age prediction module 200 is configured to pre-process the input image to obtain a pre-processed image; obtaining the predicted age of the face in the preprocessed image through a pre-trained face age estimation model;
the face age estimation model is constructed based on a deep residual network; the training method comprises the following steps:
a10, pre-training a face age estimation model on an IMDB-WIKI database, and taking the pre-trained model as a first model;
a20, acquiring a face sample image and a truth label of the face sample image corresponding to the face age, and constructing a training set; inputting a face sample image into a first model to obtain the prediction probability distribution of the face age and a corresponding prediction value label;
a30, taking a truth-value label of the face age as a mean value, combining a preset variance to construct a normal distribution, and combining a prediction probability distribution corresponding to each face sample image to calculate the K-L loss;
a40, calculating a cross entropy loss value of each face sample image according to a corresponding predicted value label and a corresponding true value label, and performing weighted summation by combining with a preset weight to obtain a weight loss;
a50, summing the weight loss and the K-L loss to obtain a total loss, updating model parameters of the first model by an SGD optimization method, and taking the updated model as a second model;
a60, selecting the same number of face sample images from all ages in a training set, and constructing a verification set; inputting the face sample images in the verification set into a second model to obtain corresponding predicted value labels, and calculating L1 loss and cross entropy loss by combining the true value labels of the face sample images;
a70, updating the variance and weight of each face image by combining L1 loss and cross entropy loss corresponding to each face sample image in the verification set; and after updating, recalculating the K-L loss and the weight loss of each face sample image in the training set, updating the model parameters of the second model, and taking the updated second model as a finally trained face age estimation model.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiment, and details are not described herein again.
It should be noted that, the face age estimation system based on adaptive variance and weight of distribution learning provided in the foregoing embodiment is only illustrated by the division of the above functional modules, and in practical applications, the above functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiments of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiments may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the above described functions. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device according to a third embodiment of the present invention stores therein a plurality of programs adapted to be loaded by a processor and to implement the above-described face age estimation method based on adaptive variance and weight for distribution learning.
A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is suitable for being loaded and executed by a processor to implement the above-described adaptive variance and weight based face age estimation method based on distribution learning.
It can be clearly understood by those skilled in the art that, for convenience and brevity, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method examples, and are not described herein again.
It should be noted that the computer readable medium mentioned above in the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present application may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + + or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.
Claims (9)
1. A face age estimation method based on adaptive variance and weight of distribution learning is characterized by comprising the following steps:
s10, acquiring a face image to be age estimated as an input image;
s20, preprocessing the input image to obtain a preprocessed image; obtaining the predicted age of the face in the preprocessed image through a pre-trained face age estimation model;
the face age estimation model is constructed based on a deep residual network; the training method comprises the following steps:
a10, pre-training a face age estimation model on an IMDB-WIKI database, and taking the pre-trained model as a first model;
a20, acquiring a face sample image and a truth label of the face sample image corresponding to the face age, and constructing a training set; inputting a face sample image into a first model to obtain the prediction probability distribution of the face age and a corresponding prediction value label;
a30, taking a truth-value label of the face age as a mean value, combining a preset variance to construct a normal distribution, and combining a prediction probability distribution corresponding to each face sample image to calculate the K-L loss;
a40, calculating a cross entropy loss value of each face sample image according to a corresponding predicted value label and a corresponding true value label, and performing weighted summation by combining with a preset weight to obtain a weight loss;
a50, summing the weight loss and the K-L loss to obtain a total loss, updating model parameters of the first model by an SGD optimization method, and taking the updated model as a second model;
a60, selecting the same number of face sample images from all ages in a training set, and constructing a verification set; inputting the face sample images in the verification set into a second model to obtain corresponding predicted value labels, and calculating L1 loss and cross entropy loss by combining the true value labels of the face sample images;
a70, updating the variance and the weight of each face sample image by combining the L1 loss and the cross entropy loss corresponding to each face sample image in the verification set; and after updating, recalculating the K-L loss and the weight loss of each face sample image in the training set, updating the model parameters of the second model, and taking the updated second model as a finally trained face age estimation model.
2. The adaptive variance and weight face age estimation method based on distribution learning of claim 1, wherein the normal distribution of the truth label is:

$$p_i(y) = \frac{1}{\sqrt{2\pi}\,\sigma_i}\exp\left(-\frac{(y-\mu_i)^2}{2\sigma_i^2}\right)$$

wherein $p_i(y)$ represents the probability that the true age of the face in the $i$-th face sample image is $y$, $\mu_i$ is the truth label of the face age, and $\sigma_i$ represents the corresponding initial variance value.
3. The adaptive variance and weight face age estimation method based on distribution learning of claim 2, wherein the K-L loss is calculated by:

$$L_{KL} = \sum_{i=1}^{N}\sum_{y} p_i(y)\,\log\frac{p_i(y)}{\hat{p}_i(y;\theta)}$$

wherein $L_{KL}$ represents the K-L loss value, $\hat{p}_i(y;\theta)$ represents the predicted probability distribution corresponding to the face sample image $x_i$, and $\theta$ represents the model parameters.
4. The method for estimating face age based on adaptive variance and weight of distribution learning according to claim 3, wherein the weight loss is calculated by:

$$L_{w} = \frac{1}{N}\sum_{i=1}^{N} w_i\, L_{CE}\!\left(\hat{y}_i, y_i\right)$$

wherein $L_w$ represents the weight loss value, $L_{CE}$ represents the cross-entropy loss value, $\hat{y}_i$ represents the prediction result of the $i$-th face sample image, $w_i$ represents the weight corresponding to the $i$-th face sample image, $N$ represents the number of face sample images, and $y_i$ represents the truth label of the $i$-th face sample image.
5. The method for estimating the age of the face based on the adaptive variance and the weight of the distribution learning of claim 1, wherein the method for updating the variance and the weight of each face sample image by combining the L1 loss and the cross entropy loss corresponding to each face sample image in the verification set comprises the following steps:
summing the L1 loss and the cross entropy loss corresponding to each face sample image in the verification set to obtain a first loss;
updating the variance and the disturbance variable corresponding to the weight of each face sample image based on the first loss;
and updating the corresponding variance and weight of each face sample image based on the updated variance and the disturbance variable corresponding to the weight.
6. The method for estimating face age based on adaptive variance and weight of distribution learning according to claim 5, wherein updating the disturbance variable corresponding to the variance and weight of each face sample image based on the first loss comprises:
$$\epsilon_\sigma' = \epsilon_\sigma - \eta_\sigma\,\nabla_{\epsilon_\sigma} L_{val}\!\left(x^{v}, y^{v}; \theta'\right)$$
$$\epsilon_w' = \epsilon_w - \eta_w\,\nabla_{\epsilon_w} L_{val}\!\left(x^{v}, y^{v}; \theta'\right)$$

wherein $L_{val}$ represents the first loss, $\epsilon_\sigma'$ and $\epsilon_w'$ represent the updated disturbance variables corresponding to the variance and the weight, $\epsilon_\sigma$ and $\epsilon_w$ represent the disturbance variables corresponding to the variance and the weight before updating, $\eta_\sigma$ and $\eta_w$ represent preset descent step sizes, $x^{v}$ and $y^{v}$ represent the face sample images in the verification set and the corresponding truth labels, $\theta'$ represents the updated model parameters, and $\nabla_{\epsilon_\sigma} L_{val}$ and $\nabla_{\epsilon_w} L_{val}$ represent the gradients of the first loss with respect to $\epsilon_\sigma$ and $\epsilon_w$.
7. A face age estimation system based on adaptive variance and weight of distribution learning, the system comprising: an image acquisition module and an age prediction module;
the image acquisition module is configured to acquire a face image to be age estimated as an input image;
the age prediction module is configured to preprocess the input image to obtain a preprocessed image; obtaining the predicted age of the face in the preprocessed image through a pre-trained face age estimation model;
the face age estimation model is constructed based on a depth residual error network; the training method comprises the following steps:
a10, pre-training a face age estimation model on an IMDB-WIKI database, and taking the pre-trained model as a first model;
a20, acquiring a face sample image and a truth label of the face sample image corresponding to the face age, and constructing a training set; inputting a face sample image into a first model to obtain the prediction probability distribution of the face age and a corresponding prediction value label;
a30, taking a truth-value label of the face age as a mean value, combining a preset variance to construct a normal distribution, and combining a prediction probability distribution corresponding to each face sample image to calculate the K-L loss;
a40, calculating a cross entropy loss value of each face sample image according to a corresponding predicted value label and a corresponding true value label, and performing weighted summation by combining with a preset weight to obtain a weight loss;
a50, summing the weight loss and the K-L loss to obtain a total loss, updating model parameters of the first model by an SGD optimization method, and taking the updated model as a second model;
a60, selecting the same number of face sample images from all ages in a training set, and constructing a verification set; inputting the face sample images in the verification set into a second model to obtain corresponding predicted value labels, and calculating L1 loss and cross entropy loss by combining the true value labels of the face sample images;
a70, updating the variance and the weight of each face sample image by combining the L1 loss and the cross entropy loss corresponding to each face sample image in the verification set; and after updating, recalculating the K-L loss and the weight loss of each face sample image in the training set, updating the model parameters of the second model, and taking the updated second model as a finally trained face age estimation model.
8. A storage device having stored therein a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the distribution learning based adaptive variance and weight face age estimation method of any of claims 1-6.
9. A processing device comprising a processor and a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; characterized in that said program is adapted to be loaded and executed by a processor to implement the distribution learning based adaptive variance and weight face age estimation method of any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110199644.8A CN112560823B (en) | 2021-02-23 | 2021-02-23 | Adaptive variance and weight face age estimation method based on distribution learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110199644.8A CN112560823B (en) | 2021-02-23 | 2021-02-23 | Adaptive variance and weight face age estimation method based on distribution learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112560823A true CN112560823A (en) | 2021-03-26 |
CN112560823B CN112560823B (en) | 2021-06-08 |
Family
ID=75034536
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110199644.8A Active CN112560823B (en) | 2021-02-23 | 2021-02-23 | Adaptive variance and weight face age estimation method based on distribution learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112560823B (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105678253A (en) * | 2016-01-04 | 2016-06-15 | 东南大学 | Semi-supervised age estimation device based on faces and semi-supervised age estimation method based on faces |
CN108256482A (en) * | 2018-01-18 | 2018-07-06 | 中科视拓(北京)科技有限公司 | A kind of face age estimation method that Distributed learning is carried out based on convolutional neural networks |
CN112241664A (en) * | 2019-07-18 | 2021-01-19 | 顺丰科技有限公司 | Face recognition method, face recognition device, server and storage medium |
CN111563413A (en) * | 2020-04-03 | 2020-08-21 | 华南理工大学 | Mixed dual-model-based age prediction method |
CN111881722A (en) * | 2020-06-10 | 2020-11-03 | 广东芯盾微电子科技有限公司 | Cross-age face recognition method, system, device and storage medium |
CN112241723A (en) * | 2020-10-27 | 2021-01-19 | 新疆爱华盈通信息技术有限公司 | Sex and age identification method, system, electronic device and storage medium |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113065525A (en) * | 2021-04-27 | 2021-07-02 | 深圳数联天下智能科技有限公司 | Age recognition model training method, face age recognition method and related device |
CN113065525B (en) * | 2021-04-27 | 2023-12-12 | 深圳数联天下智能科技有限公司 | Age identification model training method, face age identification method and related device |
CN113569955A (en) * | 2021-07-29 | 2021-10-29 | 中国工商银行股份有限公司 | Model training method, user portrait generation method, device and equipment |
CN115359546A (en) * | 2022-10-21 | 2022-11-18 | 乐山师范学院 | Human age identification method and system based on facial identification |
Also Published As
Publication number | Publication date |
---|---|
CN112560823B (en) | 2021-06-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right |
Effective date of registration: 20240619 Address after: 200-19, 2nd Floor, Building B, Wanghai Building, No.10 West Third Ring Middle Road, Haidian District, Beijing, 100036 Patentee after: Zhongke Zidong Taichu (Beijing) Technology Co.,Ltd. Country or region after: China Address before: 100190 No. 95 East Zhongguancun Road, Beijing, Haidian District Patentee before: INSTITUTE OF AUTOMATION, CHINESE ACADEMY OF SCIENCES Country or region before: China |