CN109002755B - Age estimation model construction method and estimation method based on face image - Google Patents

Publication number: CN109002755B (application CN201810563826.7A)
Authority: CN (China)
Prior art keywords: image, face, face image, age, images
Legal status: Active
Original language: Chinese (zh)
Other versions: CN109002755A
Inventors: 彭进业, 李帆, 李展, 王珺, 章勇勤, 祝轩, 唐文华
Current assignee: Northwestern University
Original assignee: Northwestern University
Events: application CN201810563826.7A filed by Northwestern University; publication of CN109002755A; application granted; publication of CN109002755B

Classifications

    • G06F 18/285: Pattern recognition; analysing; selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G06F 18/253: Pattern recognition; analysing; fusion techniques of extracted features
    • G06V 10/56: Extraction of image or video features relating to colour
    • G06V 40/162: Human faces; detection, localisation, normalisation using pixel segmentation or colour matching
    • G06V 40/168: Human faces; feature extraction; face representation
    • G06V 40/172: Human faces; classification, e.g. identification
    • G06V 10/467: Descriptors for shape, contour or point-related descriptors; encoded features or binary features, e.g. local binary patterns [LBP]
    • G06V 40/178: Human faces; estimating age from face image; using age information for improving recognition

Abstract

The invention discloses a method for constructing an age estimation model based on face images, together with an estimation method, using skin color classification and deep label distribution learning. Age estimation is performed on face images from the MORPH database, and the influence of individual skin color differences is incorporated into the estimation method; compared with existing methods, this effectively reduces the error caused by skin color differences. The final global average pooling layer of the Inception-v3 deep convolutional neural network is replaced with a global max pooling layer, which reduces the mean shift of estimates caused by convolutional-layer parameter errors and retains more texture information. A deep label distribution learning algorithm is combined with the Inception-v3 deep convolutional neural network, the network is fine-tuned on the data by transfer learning, and the feasibility and effectiveness of the method are verified by theoretical analysis and experiments.

Description

Age estimation model construction method and estimation method based on face image
Technical Field
The invention relates to the field of face image recognition, in particular to a face image-based age estimation model construction method and an age estimation method.
Background
Age information is an important human biometric attribute with many applications in human-computer interaction, and it strongly affects the performance of face recognition systems. Age estimation from a face image means using computer techniques to model how the face changes with age, so that a machine can estimate a person's approximate age or age range from a face image.
In the prior art, age estimation from a face image has used Gabor features and LBP features with an SVM classifier for age classification; face age estimation has also been performed with KPLS (Kernel Partial Least Squares) regression and BIF (Biologically Inspired Features) methods.
These prior-art methods do not consider the influence of skin color differences and the ageing process on estimation accuracy, so their age estimation accuracy is limited.
Disclosure of Invention
The invention aims to provide a construction method and an estimation method for a face-image-based age estimation model, to solve the problem that prior-art methods, which ignore the influence of skin color when estimating age from a face image, achieve low age estimation accuracy.
In order to realize the task, the invention adopts the following technical scheme:
an age estimation model construction method based on face images comprises the following steps:
step 1, performing face detection on a plurality of images containing faces, cropping only the face region of each as a face image, and storing the resulting face images as an original face image set;
step 2, dividing all the face images in the original face image set into two groups, wherein all the face images with black skin form a first image set, and all the face images with non-black skin form a second image set;
associating each face image in the first image set with its age label to obtain a first age label set, and associating each face image in the second image set with its age label to obtain a second age label set, where an age label is the true age of the face in the face image;
step 3, extracting LBP features from each face image in the first image set to obtain its first LBP feature; extracting DCNN (deep convolutional neural network) features from each face image in the first image set with a deep convolutional neural network to obtain its first DCNN feature; fusing the first LBP feature and the first DCNN feature of each face image into its first image feature, and collecting the first image features of all face images in the first image set into a first image feature set;
extracting LBP features from each face image in the second image set to obtain its second LBP feature; extracting DCNN features from each face image in the second image set with a deep convolutional neural network to obtain its second DCNN feature; fusing the second LBP feature and the second DCNN feature of each face image into its second image feature, and collecting the second image features of all face images in the second image set into a second image feature set;
step 4, training an XGBoost regression model with the first image feature set as input and the first age label set as output to obtain a first age estimation model;
and training the XGBoost regression model with the second image feature set as input and the second age label set as output to obtain a second age estimation model.
Further, dividing all the face images in the original face image set into two groups, specifically including:
step 21, taking a plurality of face images from the original face image set as a grouped image set, wherein each face image in the grouped image set corresponds to a respective skin color label, the skin color labels comprise black skin [0] and non-black skin [1], and collecting the skin color labels of all the face images in the grouped image set to obtain a skin color label set;
step 22, extracting the color feature and the LBP feature of each facial image in the grouped image set, fusing the color feature and the LBP feature of each facial image to obtain the skin color feature of each facial image, and collecting the skin color features of all the facial images in the grouped image set to obtain a skin color feature set;
step 23, training an XGBoost classification model with the skin color feature set as input and the skin color label set as output to obtain a face classification model;
and step 24, using the face classification model obtained in step 23 to group the original face image set processed as in step 22, labelling each image as a black-skin or non-black-skin face image.
Further, the loss function in the XGBoost classification model is logistic loss.
Further, the DCNN features are extracted with an Inception-v3 network.
Further, when the Inception-v3 network is used to extract the DCNN features, the top global average pooling layer in the Inception-v3 network is replaced with a global max pooling layer, several Gaussian-initialized fully connected layers are added after the global max pooling layer, the label distribution of the KL loss layer is replaced with a deep label distribution, and the output of the last of these fully connected layers is taken as the DCNN feature.
Further, two Gaussian-initialized fully connected layers are added after the global max pooling layer; the weights of the global max pooling layer in the Inception-v3 network and of the two added fully connected layers are updated by back-propagation fine-tuning; the output of the second fully connected layer is taken; and the learning rate is set by exponential decay, with an initial learning rate of 0.1, a minimum step size of 32, and 100 iterations.
Further, the Gaussian initialization of the fully connected layers uses mean 0 and variance 0.01.
Further, the loss function in the XGBoost regression model is square loss.
An age estimation method based on a face image: the image to be estimated is processed as in step 1 to obtain the face image to be estimated, and its age is then estimated as follows:
Step A, using the face classification model, the face image to be estimated, processed as in step 22, is classified; if it is a black-skin face image, step B is performed; if it is a non-black-skin face image, step C is performed;
Step B, the first age estimation model is used to estimate the age of the black-skin face image processed as in step 3;
Step C, the second age estimation model is used to estimate the age of the non-black-skin face image processed as in step 3.
Compared with the prior art, the invention has the following technical characteristics:
1. The invention incorporates the influence of individual skin color differences into the age estimation method: face images in the database are classified as black skin or non-black skin, and a separate age estimation model is built for each group, effectively reducing the error caused by skin color differences compared with existing methods.
2. The final global average pooling layer of the Inception-v3 deep convolutional neural network is changed to a global max pooling layer. Global average pooling mitigates the increase in estimate variance caused by the limited size of the convolution kernel but retains more of the image's background information, whereas face age estimation needs more texture information; global max pooling mitigates the mean shift of estimates caused by convolutional-layer parameter errors and retains more texture information.
3. The method combines a deep label distribution learning algorithm with the Inception-v3 deep convolutional neural network, fine-tunes the network on the data via transfer learning, extracts deep features, and fuses them with LBP features, compensating both for DCNN's neglect of local structural features when extracting face features directly and for the subjective, manually chosen factors involved in LBP feature extraction.
Drawings
FIG. 1 is a flow chart of a model building method provided by the present invention;
FIG. 2 is a face image in a first image set provided by the present invention;
FIG. 3 is a face image in a second image set provided by the present invention;
FIG. 4 is an image to be estimated provided in an embodiment of the present invention;
FIG. 5 is a diagram of a face image to be estimated according to an embodiment of the present invention;
FIG. 6 is a diagram of a face image to be estimated according to another embodiment of the present invention;
FIG. 7 is a graph of MAE results after training samples are added for each age estimation algorithm provided in one embodiment of the present invention;
fig. 8 is a graph of experimental results of depth selection of trees in the XGBoost algorithm provided by the present invention.
Detailed Description
The following are specific examples provided by the inventors to further explain the technical solutions of the present invention.
For example, the skin color of Africans is mostly black, that of Europeans mostly white, and that of Asians mostly yellow; here, black skin refers to African skin color, and non-black skin refers to European, Asian, and other skin colors such as white and yellow.
Example one
The invention discloses a method for constructing an age estimation model based on face images, used to build a model that estimates the age of a face appearing in an image; as shown in Fig. 1, the method comprises the following steps:
step 1, performing face detection on a plurality of images containing faces, cropping only the face region of each as a face image, and storing the resulting face images as an original face image set;
Because an acquired image contains not only the face but also many other distracting elements, this step extracts the face region of the image as the face image, which also speeds up the subsequent steps.
In this example, all images containing faces come from the MORPH database, which contains 55,134 images of more than 13,000 individuals. Each person has about 4 images, with ages from 16 to 77 and diverse ethnicities: about 77% African, 19% European, and the remaining 4% Asian and others. 10,000 black-skin images and 10,000 non-black-skin images are randomly selected from the 55,134 images; the MATLAB face detection toolbox is used to perform face detection on these 20,000 images, and each detected face region is cropped and saved as a 299 × 299-pixel face image. These 20,000 face images form the original face image set.
Step 2, dividing all original face images in the original face image set into two groups, wherein all face images with black skin form a first image set, and all face images with non-black skin form a second image set;
each image in the first image set is associated with its age label to obtain a first age label set, and each image in the second image set with its age label to obtain a second age label set, where an age label is the true age of the face in the image;
the method for dividing all original face images in the original face image set into two groups of black skin and non-black skin can be that after the face image features are extracted, a neural network classifier and a support vector machine classifier are adopted to classify the images.
In a preferred embodiment, the color features and LBP features of the image are extracted as the face image features and classified with an XGBoost classification model.
Specifically, the method for classifying face images according to skin colors provided by the invention comprises the following steps:
step 21, taking a plurality of face images from the original face image set as a grouped image set, wherein each image in the grouped image set corresponds to a respective skin color label, the skin color labels comprise black skin [0] and non-black skin [1], and collecting the skin color labels of all the images in the grouped image set to obtain a skin color label set;
in this embodiment, 1000 face images are extracted from the original face image set as a grouped image set, wherein 500 face images are black skins [0], and 500 face images are non-black skins [1], and a 1000 × 1-dimensional vector is obtained as a skin color tag set according to the sequence of the face images in the grouped image set corresponding to the skin color tags.
Step 22, extracting the color feature and LBP feature of each facial image in the grouped image set, fusing the color feature and LBP feature of each original facial image to obtain the skin color feature of each facial image, and collecting the skin color features of all the images in the grouped image set to obtain a skin color feature set;
in the step, the pixel value of the color face image is used as the color feature, before the LBP feature is extracted, in order to improve the operation speed of the algorithm, the color face image is grayed, the gray face image is obtained to extract the LBP feature, after each face image is segmented, the LBP feature is extracted from each region of the segmented image.
In this embodiment, the face image is divided into a 10 × 10 grid of square regions; the LBP feature operator is applied to each region, histogram features are computed in uniform-pattern mode, each region is given equal weight, and the per-region features are concatenated to give a 5900-dimensional LBP feature.
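The gridded uniform-pattern LBP extraction described above can be sketched in numpy. This is a minimal illustrative implementation, not the patent's code, assuming the standard 59-bin uniform mapping for 8 neighbours; with a 10 × 10 grid it reproduces the 5900-dimensional feature the embodiment describes:

```python
import numpy as np

def uniform_lbp_histograms(gray, grid=(10, 10)):
    """Basic 8-neighbour LBP with the 'uniform' mapping (59 bins per cell).

    gray: 2-D uint8 array (a grayscale face image).
    Returns the concatenated per-cell histograms: 100 cells x 59 bins
    = 5900 dims for a 10x10 grid, matching the embodiment.
    """
    g = gray.astype(np.int32)
    c = g[1:-1, 1:-1]
    # 8 neighbours, clockwise from top-left
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros_like(c)
    for bit, (dy, dx) in enumerate(shifts):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= ((nb >= c).astype(np.int32) << bit)

    # Uniform mapping: patterns with <= 2 circular bit transitions get their
    # own bin; all other patterns share one bin -> 58 + 1 = 59 bins.
    table = np.full(256, 58, dtype=np.int32)
    next_label = 0
    for p in range(256):
        bits = [(p >> i) & 1 for i in range(8)]
        transitions = sum(bits[i] != bits[(i + 1) % 8] for i in range(8))
        if transitions <= 2:
            table[p] = next_label
            next_label += 1
    mapped = table[code]

    rows, cols = grid
    h, w = mapped.shape
    feats = []
    for r in range(rows):
        for k in range(cols):
            cell = mapped[r * h // rows:(r + 1) * h // rows,
                          k * w // cols:(k + 1) * w // cols]
            hist = np.bincount(cell.ravel(), minlength=59).astype(np.float64)
            feats.append(hist / max(hist.sum(), 1))  # equal weight per region
    return np.concatenate(feats)

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(299, 299), dtype=np.uint8)  # stand-in image
lbp = uniform_lbp_histograms(face)
print(lbp.shape)  # (5900,)
```

Each per-region histogram is normalized to sum to 1, which gives every region equal weight as the text requires.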
In this application, to improve the efficiency of the algorithm, the LBP features and the color features are spliced end to end to obtain the skin color feature of each face image.
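The end-to-end splicing above is plain vector concatenation; a minimal sketch with stand-in values, where the shapes assume a 299 × 299 color image and the 5900-dimensional LBP feature:

```python
import numpy as np

# Sketch of the "end to end" splicing: the colour feature (raw pixel values
# of the 299 x 299 colour face image) and the 5900-dim LBP feature are
# concatenated into a single skin colour feature vector.
rng = np.random.default_rng(0)
color_feature = rng.random(299 * 299 * 3)  # stand-in for raw pixel values
lbp_feature = rng.random(5900)             # stand-in for the LBP histograms
skin_feature = np.concatenate([color_feature, lbp_feature])
print(skin_feature.shape)  # (274103,)
```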
Step 23, taking the skin color feature set as input and the skin color label set as output, training an XGboost classification model, and obtaining a face classification model;
in this step, the XGBoost classification model is trained, the loss function may be Square loss, logistic loss, etc., and the regularization term may be L1 regularization, L2 regularization, etc.
In this embodiment, the loss function in the XGBoost classification model is logistic loss, and the regularization term is L2 regularization.
And in step 24, using the face classification model obtained in step 23, the face images of the original face image set processed as in step 22 are grouped into black-skin and non-black-skin face images.
In the step, the original face images in the original face image set are classified by using a face classification model, and the skin color property of each image is obtained. Through the classification of the step, each face image in the original face image set obtains the corresponding skin color label, so that the existing original face image set is divided into two groups, one group is a first image set formed by face images of all black skins, and the other group is a second image set formed by face images of all non-black skins.
Finally, each face image in the first image set corresponds to a respective age label, and a first age label set is obtained; and respectively corresponding each face image in the second image set to a respective age label to obtain a second age label set, wherein the age label is a real age value of the face in the image.
In this embodiment, the true age of the black-skin face image shown in Fig. 2 is 67, corresponding to age label [67], and the true age of the non-black-skin face image shown in Fig. 3 is 50, corresponding to age label [50].
In the invention, in order to reduce the influence on age estimation caused by skin color difference, a set of age estimation models are respectively established for a black skin face image and a non-black skin face image, so that different age estimation models are selected according to different colors of the face skin when the face image is estimated.
Step 3, extracting LBP features from each face image in the first image set to obtain its first LBP feature; extracting DCNN (deep convolutional neural network) features from each face image in the first image set with a deep convolutional neural network to obtain its first DCNN feature; fusing the first LBP feature and the first DCNN feature of each face image into its first image feature, and collecting the first image features of all face images in the first image set into a first image feature set;
In this step, the LBP features of the face images in the first image set are those already extracted in step 22, giving the first LBP feature set.
When a convolutional neural network is used to extract the DCNN features, an AlexNet network, a ZF network, an Inception-v3 network, or the like can be used.
In a preferred embodiment, the Inception-v3 network is used to extract the DCNN features, giving the DCNN feature sets.
The first DCNN features are extracted with an Inception-v3 network to obtain the first DCNN feature set; to retain more facial texture information, the structure of the Inception-v3 network is improved and some of its parameters are modified.
Specifically, when the Inception-v3 network is used to extract DCNN features, to make the method less prone to overfitting, the top global average pooling layer in the Inception-v3 network is replaced with a global max pooling layer, several Gaussian-initialized fully connected layers are added after the global max pooling layer, the label distribution of the KL loss layer is replaced with a deep label distribution, and the output of the last of these fully connected layers is taken as the DCNN feature.
The final global average pooling layer of the Inception-v3 deep convolutional neural network is changed to a global max pooling layer: global average pooling mitigates the increase in estimate variance caused by the limited size of the convolution kernel but retains more of the image's background information, whereas face age estimation needs more texture information; global max pooling mitigates the mean shift of estimates caused by convolutional-layer parameter errors and retains more texture information.
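The claimed difference between the two pooling choices can be illustrated with a toy numpy sketch: two feature maps share an identical strong "texture" peak but differ in flat background activation. Average pooling mixes the background into the descriptor, while max pooling keeps only the strongest response:

```python
import numpy as np

# Two toy 8x8 feature maps with the same sharp wrinkle-like peak but
# different amounts of flat background activation.
fmap_a = np.zeros((8, 8))
fmap_a[3, 4] = 5.0            # peak only
fmap_b = np.full((8, 8), 0.6)
fmap_b[3, 4] = 5.0            # same peak plus background

gap_a, gap_b = fmap_a.mean(), fmap_b.mean()  # global average pooling
gmp_a, gmp_b = fmap_a.max(), fmap_b.max()    # global max pooling

# Average pooling yields different descriptors for the two maps even though
# the discriminative peak is identical; max pooling keeps only the peak.
print(gap_a, gap_b, gmp_a, gmp_b)
```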
To balance effectiveness and processing speed, the proposed scheme adds two Gaussian-initialized fully connected layers after the global max pooling layer (see Table 1), updates the weights of the global max pooling layer in the Inception-v3 network and of the two added fully connected layers by back-propagation fine-tuning, takes the output of the second fully connected layer, and sets the learning rate by exponential decay, with an initial learning rate of 0.1, a minimum step size of 32, and 100 iterations.
In a preferred embodiment, the Gaussian initialization of the fully connected layers uses mean 0 and variance 0.01.
Table 1: improved Inception-v3 network structure (reproduced as an image in the original publication)
In addition, to improve the accuracy of the age estimation algorithm, the invention replaces the label distribution of the KL loss layer with a deep label distribution. Deep label distribution learning converts the age label of each face image into a discrete label distribution and uses a deep convolutional neural network to minimize the Kullback-Leibler divergence between the true and predicted age label distributions; the Kullback-Leibler divergence serves as the measure of similarity between the true and predicted label distributions, and the loss function is optimized by stochastic gradient descent.
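A minimal numpy sketch of the label distribution step: each age label is converted into a discrete distribution over the MORPH age range and compared to a prediction by KL divergence. The Gaussian shape and the sigma = 2.0 width are illustrative assumptions, not values given in the patent:

```python
import numpy as np

AGES = np.arange(16, 78)  # MORPH age range, 16..77

def label_distribution(age, sigma=2.0):
    """Discrete label distribution centred on the true age (assumed Gaussian)."""
    p = np.exp(-0.5 * ((AGES - age) / sigma) ** 2)
    return p / p.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): the training criterion minimised by the network."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

true_dist = label_distribution(50)
good_pred = label_distribution(51)  # off by one year
bad_pred = label_distribution(30)   # off by twenty years

# A prediction near the true age gives a much smaller KL loss.
print(kl_divergence(true_dist, good_pred) < kl_divergence(true_dist, bad_pred))  # True
```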
Through the steps above, the LBP feature and the DCNN feature of each face image in the first image set are extracted. To ensure the quality of the extracted features and hence the accuracy of age estimation, the LBP and DCNN features are fused, compensating both for DCNN's neglect of local structural features when extracting face features directly and for the subjective factors involved in LBP feature extraction.
In this scheme, to improve the efficiency of the algorithm, the LBP feature and the DCNN feature are directly concatenated end to end to obtain the first image feature of each image in the first image set, giving the first image feature set.
LBP features are extracted from each face image in the second image set to obtain its second LBP feature; DCNN features are extracted from each face image in the second image set with a deep convolutional neural network to obtain its second DCNN feature; the second LBP and second DCNN features of each face image are fused into its second image feature, and the second image features of all face images in the second image set are collected into a second image feature set.
All face images in the second image set are processed with the same method used for the first image set: LBP and DCNN features are extracted and then fused to obtain the second image features, giving the second image feature set.
Step 4, training an XGBoost regression model with the first image feature set as input and the first age label set as output to obtain a first age estimation model;
and training the XGBoost regression model with the second image feature set as input and the second age label set as output to obtain a second age estimation model.
In the scheme, two age estimation models are respectively established for the first image feature set and the second image feature set, namely, the image feature set of black skin corresponds to the first age estimation model, and the image feature set of non-black skin corresponds to the second age estimation model.
When building the age estimation models, an XGBoost regression model is trained on these inputs and outputs; in a preferred embodiment, the loss function of the XGBoost regression model is square loss.
Example two
The image to be estimated is processed as in step 1 of Example 1 to obtain the face image to be estimated, whose age is then estimated.
In this embodiment, the age of the image to be estimated shown in fig. 4 is estimated, and after the processing in step 1, the face image to be estimated shown in fig. 5 is obtained, and the age of the face image to be estimated is estimated.
Specifically, the method comprises the following steps:
Step A, using the face classification model of Example 1, the face image to be estimated, processed as in step 22, is classified; if it is a black-skin face image, step B is performed; if it is a non-black-skin face image, step C is performed.
In this embodiment, the face image shown in Fig. 6 is classified with the face classification model; the result is a black-skin [0] face image, so step B is performed.
Step B, adopting the first age estimation model of the first embodiment to estimate the age of the black skin face image processed in the step 3;
and step C, adopting the second age estimation model in the first embodiment to estimate the age of the non-black skin face image processed in the step 3.
In steps B and C, age estimation is performed on face images of the respective skin colors.
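Steps A to C amount to a two-way model dispatch. A hypothetical sketch follows; the function and model names are placeholders, and the 0.6 threshold is the one used for skin classification in Example 3:

```python
# Hypothetical dispatch for the estimation method: classify the face image's
# skin colour first, then route it to the matching age estimation model.
def estimate_age(features, skin_classifier, model_black, model_nonblack,
                 threshold=0.6):
    """skin_classifier returns P(non-black skin); the models map a
    feature vector to an estimated age (trained XGBoost regressors)."""
    if skin_classifier(features) > threshold:   # step C: non-black skin
        return model_nonblack(features)
    return model_black(features)                # step B: black skin

# Stub models standing in for the trained classifier and regressors:
age = estimate_age([0.1], lambda f: 0.2, lambda f: 70, lambda f: 50)
print(age)  # 70: routed to the black-skin model, as in Example 2
```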
In this embodiment, a first age estimation model is selected for a face image to be estimated as shown in fig. 6 to perform age estimation, and an obtained age label is: [70] i.e. the estimated age of the face in the image is 70 years.
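The routing logic of steps A-C can be sketched as a small dispatch function. All callables below (`skin_classifier`, `model_black`, `model_nonblack`) are hypothetical stand-ins for the trained models, and the 0.6 cut-off mirrors the threshold used in the patent's classification experiments.

```python
def estimate_age(features, skin_classifier, model_black, model_nonblack,
                 threshold=0.6):
    """Route a face feature vector to one of two age regressors based on
    the skin-color classifier's score (label 0 = black, 1 = non-black).
    A score above the threshold is treated as non-black skin."""
    score = skin_classifier(features)
    if score > threshold:
        return model_nonblack(features)  # step C: second age estimation model
    return model_black(features)         # step B: first age estimation model
```

For the fig. 6 example, a score of, say, 0.2 would fall below the threshold, select the first (black-skin) model, and return its age estimate.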
Example three
Validity test of the face age estimation model
To prove the effectiveness of the face age estimation model provided by the invention, two groups of experiments were carried out. Experiment one uses the feature extraction method described herein and compares several existing age estimation algorithms: a BP neural network, SVM, KNN, and the label distribution learning algorithms IIS-LLD and CPNN. For CPNN, the number of hidden-layer units is 400; k in the KNN algorithm is set to 61; the activation function of the BP neural network is the sigmoid, and its hidden layer has 100 neurons. Experiment two compares, on the same database, the results of existing age estimation methods, including AGES, MTWGB, OHRanker, and Rkcca.
When the original image set is classified, black skin is labeled 0 and non-black skin is labeled 1, and the threshold is set to 0.6: a classification score above the threshold is classified as non-black skin. The experiments use 10-fold cross-validation; the final average classification accuracy is 0.94 with a deviation of 0.03.
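The 10-fold protocol above can be sketched as follows; `scores_fn` is a hypothetical trainer that fits the skin classifier on the training fold and returns scores for the held-out fold (it stands in for the XGBoost classification model, which is not reimplemented here).

```python
import numpy as np

def cross_validated_accuracy(scores_fn, features, labels,
                             folds=10, threshold=0.6):
    """K-fold cross-validation of a binary skin classifier; scores above
    the threshold are predicted as non-black skin (label 1)."""
    idx = np.arange(len(labels))
    rng = np.random.default_rng(0)
    rng.shuffle(idx)
    accs = []
    for chunk in np.array_split(idx, folds):
        train = np.setdiff1d(idx, chunk)           # all indices outside the fold
        score = scores_fn(features[train], labels[train], features[chunk])
        pred = (score > threshold).astype(int)     # 0.6 cut-off from the patent
        accs.append((pred == labels[chunk]).mean())
    return float(np.mean(accs)), float(np.std(accs))
```

The reported 0.94 +/- 0.03 corresponds to the mean and deviation of the ten per-fold accuracies.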
Table 2 shows the age estimation comparison results (MAE) for different input features. MAE1 refers to the mean absolute error of the model using only the LBP features of the classified images as input; MAE2 refers to the mean absolute error obtained using only the depth features of the classified images as input; MAE3 refers to the mean absolute error of the model using both the LBP features and the depth features of the classified images as input.
Table 2: inputting age estimation comparison results with different characteristics
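The MAE metric compared in Table 2 is simply the average absolute deviation between predicted and true ages; a minimal helper (name illustrative):

```python
import numpy as np

def mean_absolute_error(true_ages, predicted_ages):
    """MAE: the average absolute difference between predicted and true
    ages, i.e. the quantity reported as MAE1/MAE2/MAE3 in Table 2."""
    true_ages = np.asarray(true_ages, dtype=float)
    predicted_ages = np.asarray(predicted_ages, dtype=float)
    return float(np.abs(predicted_ages - true_ages).mean())
```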
From the results of Table 2, the age estimation model proposed herein exhibits excellent performance. The results of the first five comparison algorithms are the average of the results on the two modes, black-skin and non-black-skin face images. After feature fusion, most algorithms perform better than with a single feature alone, except for the BP neural network and the SVM; a possible reason is the effect of the increased number of samples and the higher feature dimension. The BP neural network learns very slowly: it is optimized by gradient descent, the objective function to be optimized is very complex, and the algorithm is inefficient. From a mathematical point of view, the BP algorithm is a local search method, whereas the problem here requires the global extremum of a complex nonlinear function, so the algorithm is likely to fall into a local extremum. Because the database contains many samples and the ages are divided into many classes, storing and computing the matrix consumes a large amount of memory and computation time, which may affect the performance of the SVM algorithm.
The time complexity and storage cost of the KNN algorithm grow rapidly with the size of the training set and the feature dimension, because each new sample to be classified must have its similarity computed against the entire training set. When the face images in the database are classified with KNN, the large number of age classes makes the class distribution extremely unbalanced: it is hard to guarantee that every age group has at least one image, some age groups have many images and some almost none, which affects the classification accuracy. IIS-LLD and CPNN are algorithms proposed by Geng et al. specifically for face age estimation. CPNN may perform better for two reasons. First, CPNN learns without prior assumptions, while IIS-LLD assumes a maximum entropy model, which is not necessarily suitable for age estimation. Second, all class labels share the same set of model parameters in CPNN, whereas IIS-LLD learns the parameters of each class label separately, so CPNN can better exploit the correlation between class labels; however, this also suggests that CPNN is more susceptible to overfitting.
As shown in Table 2, the age estimation model proposed by the invention performs better on black-skin face images than on non-black-skin ones; this may be due to illumination during database image acquisition, which especially affects white-skin face images.
Convolutional neural networks perform excellently in fields such as target detection and image recognition, and a large amount of training data is an important reason. For the age estimation problem, does increasing the number of training samples affect the result? To verify this idea, the 30000 black-skin face images remaining in the MORPH database were gradually added to the established black-skin face age estimation model while the test data were kept the same. As shown in fig. 7, by increasing the number of training samples, the age estimation model provided herein achieves a lower MAE of 3.39.
Table 3 lists the comparison results of different age estimation models; the method proposed by the invention achieves better results overall. The age estimation results on non-black-skin face images are less than ideal. Possible reasons are that the number of training samples is insufficient (the 0-16 age range has few sample images, and sufficient data cannot be guaranteed for every age group), and that illumination interferes with the extraction of texture features. Better results are achieved on black-skin face images: the MORPH database has about 40000 black-skin face images, and since the contrast between black skin and illumination brightness is obvious, illumination is less likely to affect those results.
Table 3: comparison results of different age estimation models
For the XGBoost algorithm, the depth of the tree is also an important parameter. If the depth is insufficient, under-fitting may occur and the estimated ages will be inaccurate; if the depth is too large, over-fitting may occur. The choice of tree depth therefore needs experimental verification; as shown in fig. 8, a tree depth of 3 produces a better result.
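The depth sweep of fig. 8 can be illustrated with a toy validation loop. The tiny 1-D regression tree below is a stand-in for XGBoost's trees, written only to show how validation error separates an under-fitting depth from a better one; `build_tree`, `tree_predict`, and `val_mae` are illustrative names, not from the patent.

```python
import numpy as np

def build_tree(x, y, depth):
    """Tiny 1-D regression tree: split on the squared-error-optimal
    threshold until the maximum depth is reached."""
    if depth == 0 or len(np.unique(x)) < 2:
        return float(y.mean())               # leaf: predict the mean
    best = None
    for t in np.unique(x)[:-1]:              # exclude max so both sides are nonempty
        l, r = y[x <= t], y[x > t]
        loss = ((l - l.mean()) ** 2).sum() + ((r - r.mean()) ** 2).sum()
        if best is None or loss < best[0]:
            best = (loss, t)
    t = best[1]
    return (t,
            build_tree(x[x <= t], y[x <= t], depth - 1),
            build_tree(x[x > t], y[x > t], depth - 1))

def tree_predict(node, x):
    """Walk the tree for a scalar input."""
    if not isinstance(node, tuple):
        return node
    t, left, right = node
    return tree_predict(left, x) if x <= t else tree_predict(right, x)

def val_mae(depth, x_tr, y_tr, x_va, y_va):
    """Validation MAE of a tree of the given depth."""
    tree = build_tree(x_tr, y_tr, depth)
    return float(np.mean([abs(tree_predict(tree, v) - t)
                          for v, t in zip(x_va, y_va)]))
```

Sweeping `depth` over a held-out set and picking the minimum-MAE value mirrors the experimental selection of depth 3 in fig. 8.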

Claims (9)

1. A method for constructing an age estimation model based on a face image is characterized by comprising the following steps:
step 1, carrying out face detection on a plurality of images containing faces, cropping only the face regions as face images, and storing the plurality of face images as an original face image set;
step 2, dividing all the face images in the original face image set into two groups, wherein all the face images with black skin form a first image set, and all the face images with non-black skin form a second image set;
taking the real age value of each facial image in the first image set as a respective age label to obtain a first age label set, and taking the real age value of each facial image in the second image set as a respective age label to obtain a second age label set;
step 3, extracting LBP features from each face image in the first image set to obtain first LBP features of each face image; extracting DCNN features from each face image in the first image set by adopting a deep convolutional neural network (DCNN) to obtain first DCNN features of each face image; fusing the first LBP features and the first DCNN features of each face image to obtain first image features of each face image, and collecting the first image features of all face images in the first image set to obtain a first image feature set;
extracting LBP features from each face image in the second image set to obtain second LBP features of each face image; extracting DCNN features from each face image in the second image set by adopting a deep convolutional neural network to obtain second DCNN features of each face image; fusing the second LBP features and the second DCNN features of each face image to obtain second image features of each face image, and collecting the second image features of all face images in the second image set to obtain a second image feature set;
step 4, taking the first image feature set as input and the first age label set as output, training an XGBoost regression model, and obtaining a first age estimation model;
and taking the second image feature set as input and the second age label set as output, training the XGBoost regression model, and obtaining a second age estimation model.
2. The method for constructing an age estimation model based on facial images according to claim 1, wherein the dividing of all facial images in the original facial image set into two groups specifically comprises:
step 21, taking a plurality of face images from the original face image set as a grouped image set, wherein each face image in the grouped image set corresponds to a respective skin color label, the skin color labels comprise black skin [0] and non-black skin [1], and collecting the skin color labels of all the face images in the grouped image set to obtain a skin color label set;
step 22, extracting the color feature and the LBP feature of each facial image in the grouped image set, fusing the color feature and the LBP feature of each facial image to obtain the skin color feature of each facial image, and collecting the skin color features of all the facial images in the grouped image set to obtain a skin color feature set;
step 23, taking the skin color feature set as input and the skin color label set as output, training an XGBoost classification model, and obtaining a face classification model;
and step 24, grouping the original face image set processed in step 22 by using the face classification model obtained in step 23, to obtain black-skin face images and non-black-skin face images.
3. The method for constructing an age estimation model based on facial images as claimed in claim 2, wherein the loss function in the XGBoost classification model is the logistic loss.
4. The method of claim 1, wherein an Inception-v3 network is adopted to extract the DCNN features.
5. The age estimation model construction method based on facial images as claimed in claim 4, wherein when the Inception-v3 network is adopted to extract the DCNN features, the top global average pooling layer in the Inception-v3 network is replaced by a global max pooling layer, a plurality of Gaussian-initialized fully-connected layers are added behind the global max pooling layer, the label distribution of the KL loss layer is replaced by the deep label distribution, and the output of the last of the Gaussian-initialized fully-connected layers is taken as the DCNN features.
6. The method for constructing an age estimation model based on facial images as claimed in claim 5, wherein two Gaussian-initialized fully-connected layers are added behind the global max pooling layer; the weights of the global max pooling layer in the Inception-v3 network and of the two added Gaussian-initialized fully-connected layers are updated by a back-propagation fine-tuning method to obtain the output of the second fully-connected layer; and the learning rate is set by an exponential decay method, wherein the initial learning rate is 0.1, the minimum step size is 32, and the number of iterations is 100.
7. The method of claim 6, wherein the Gaussian distribution used to initialize the fully-connected layers has a mean of 0 and a variance of 0.01.
8. The method for constructing an age estimation model based on facial images as claimed in claim 1, wherein the loss function in the XGBoost regression model is the squared loss.
9. An age estimation method based on a face image, characterized in that the image to be estimated is processed by adopting step 1 of claim 1 to obtain a face image to be estimated, and age estimation is performed on the face image to be estimated, the method comprising:
step A, adopting the face classification model of claim 2 or 3 to classify the face image to be estimated processed in step 22; if it is a black-skin face image, executing step B; if it is a non-black-skin face image, executing step C;
step B, adopting the first age estimation model of any one of claims 1 to 8 to perform age estimation on the black-skin face image processed in step 3;
and step C, adopting the second age estimation model of any one of claims 1 to 8 to perform age estimation on the non-black-skin face image processed in step 3.
CN201810563826.7A 2018-06-04 2018-06-04 Age estimation model construction method and estimation method based on face image Active CN109002755B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810563826.7A CN109002755B (en) 2018-06-04 2018-06-04 Age estimation model construction method and estimation method based on face image


Publications (2)

Publication Number Publication Date
CN109002755A CN109002755A (en) 2018-12-14
CN109002755B true CN109002755B (en) 2020-09-01

Family

ID=64574206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810563826.7A Active CN109002755B (en) 2018-06-04 2018-06-04 Age estimation model construction method and estimation method based on face image

Country Status (1)

Country Link
CN (1) CN109002755B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109766810B (en) * 2018-12-31 2023-02-28 陕西师范大学 Face recognition classification method based on collaborative representation, pooling and fusion
CN109815864B (en) * 2019-01-11 2021-01-01 浙江工业大学 Facial image age identification method based on transfer learning
CN110069994B (en) * 2019-03-18 2021-03-23 中国科学院自动化研究所 Face attribute recognition system and method based on face multiple regions
CN110084134A (en) * 2019-04-03 2019-08-02 东华大学 A kind of face attendance checking system based on cascade neural network and Fusion Features
CN110222215B (en) * 2019-05-31 2021-05-04 浙江大学 Crop pest detection method based on F-SSD-IV3
CN111539911B (en) * 2020-03-23 2021-09-28 中国科学院自动化研究所 Mouth breathing face recognition method, device and storage medium
CN112102314B (en) * 2020-11-02 2021-03-09 成都考拉悠然科技有限公司 Computing method for judging quality of face image based on uncertainty
CN113095300A (en) * 2021-05-13 2021-07-09 华南理工大学 Age prediction method and system fusing race information

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504376A (en) * 2014-12-22 2015-04-08 厦门美图之家科技有限公司 Age classification method and system for face images
CN105095833A (en) * 2014-05-08 2015-11-25 中国科学院声学研究所 Network constructing method for human face identification, identification method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5239126B2 (en) * 2006-04-11 2013-07-17 株式会社ニコン Electronic camera


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A study of convolutional sparse feature learning for human age estimate; Xiaolong Wang et al.; 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition; 20171231; pp. 566-572 *
Research on age estimation techniques for face images; Wang Xianmei et al.; Journal of Image and Graphics; 20120616; pp. 603-618 *

Also Published As

Publication number Publication date
CN109002755A (en) 2018-12-14

Similar Documents

Publication Publication Date Title
CN109002755B (en) Age estimation model construction method and estimation method based on face image
CN109685115B (en) Fine-grained conceptual model with bilinear feature fusion and learning method
CN110443143B (en) Multi-branch convolutional neural network fused remote sensing image scene classification method
Hridayami et al. Fish species recognition using VGG16 deep convolutional neural network
CN113378632B (en) Pseudo-label optimization-based unsupervised domain adaptive pedestrian re-identification method
CN107564025B (en) Electric power equipment infrared image semantic segmentation method based on deep neural network
CN111079639B (en) Method, device, equipment and storage medium for constructing garbage image classification model
Kae et al. Augmenting CRFs with Boltzmann machine shape priors for image labeling
US20190228268A1 (en) Method and system for cell image segmentation using multi-stage convolutional neural networks
CN110321967B (en) Image classification improvement method based on convolutional neural network
CN112966691B (en) Multi-scale text detection method and device based on semantic segmentation and electronic equipment
US8379994B2 (en) Digital image analysis utilizing multiple human labels
US20140270489A1 (en) Learned mid-level representation for contour and object detection
CN110276248B (en) Facial expression recognition method based on sample weight distribution and deep learning
Taheri et al. Animal classification using facial images with score‐level fusion
WO2014205231A1 (en) Deep learning framework for generic object detection
CN104915972A (en) Image processing apparatus, image processing method and program
CN112364791B (en) Pedestrian re-identification method and system based on generation of confrontation network
CN106408037A (en) Image recognition method and apparatus
CN109063626A (en) Dynamic human face recognition methods and device
CN110135435B (en) Saliency detection method and device based on breadth learning system
CN111310820A (en) Foundation meteorological cloud chart classification method based on cross validation depth CNN feature integration
CN104598898A (en) Aerially photographed image quick recognizing system and aerially photographed image quick recognizing method based on multi-task topology learning
CN112836755B (en) Sample image generation method and system based on deep learning
Yeh et al. Intelligent mango fruit grade classification using alexnet-spp with mask r-cnn-based segmentation algorithm

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant