CN111814611A - Multi-scale face age estimation method and system embedded with high-order information - Google Patents


Info

Publication number: CN111814611A (application publication); CN111814611B (granted publication)
Application number: CN202010590398.4A
Authority: CN (China)
Original language: Chinese (zh)
Inventors: 钟福金 (Zhong Fujin), 王新月 (Wang Xinyue)
Applicant: Chongqing University of Post and Telecommunications
Current assignee: Dragon Totem Technology Hefei Co ltd
Legal status: Granted; Active (the legal status and the assignee listing are assumptions, not legal conclusions; Google has not performed a legal analysis)
Prior art keywords: module, face image, age, global, local

Classifications

    • G Physics
    • G06 Computing; calculating or counting
    • G06V Image or video recognition or understanding
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; face representation
    • G Physics
    • G06 Computing; calculating or counting
    • G06F Electric digital data processing
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G Physics
    • G06 Computing; calculating or counting
    • G06N Computing arrangements based on specific computational models
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G Physics
    • G06 Computing; calculating or counting
    • G06N Computing arrangements based on specific computational models
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G Physics
    • G06 Computing; calculating or counting
    • G06V Image or video recognition or understanding
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/32 Normalisation of the pattern dimensions
    • G Physics
    • G06 Computing; calculating or counting
    • G06V Image or video recognition or understanding
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G Physics
    • G06 Computing; calculating or counting
    • G06V Image or video recognition or understanding
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/178 Estimating age from face image; using age information for improving recognition

Abstract

The invention relates to the field of face age estimation, and in particular to a multi-scale face age estimation method and system embedded with high-order information. The method comprises the following steps: inputting a face image and preprocessing it; inputting the face image into a residual network for global feature extraction to construct a global branch; inserting blocks for extracting high-order age information at different positions of the global branch; taking the output feature map of the first convolutional layer of ResNet as the input of a long short-term memory (LSTM) network, acquiring the position information of age-sensitive areas, and cropping to obtain a local feature map, thereby constructing a local branch; jointly optimizing the two branches by back-propagating the minimized loss function and iteratively training the neural network; and inputting the test set into the trained neural network model, which calculates and outputs the final predicted age from the age features. The network model of the invention has low computational cost and high accuracy, and related products have strong applicability.

Description

Multi-scale face age estimation method and system embedded with high-order information
Technical Field
The invention belongs to the field of face age estimation, and particularly relates to a multi-scale face age estimation method and system with embedded high-order information.
Background
The purpose of face age estimation is to automatically output a biological age from a face image. It is widely applied in age-based face retrieval, targeted advertising, intelligent monitoring, human-computer interaction (HCI), Internet access control and other fields, and is an active research topic in computer vision. Due to the combined action of intrinsic factors of facial aging (such as genes) and complex variations of facial images (such as facial poses at different angles and camera viewpoints), the facial aging process is uncontrollable and personalized, which makes accurate and reliable automatic age estimation from facial images extremely challenging.
A classical age estimation algorithm consists of two successive but relatively independent stages: age feature extraction and age estimation. According to the way features are extracted, current face age estimation methods can be divided into two categories: the first is based on traditional machine learning; the second is based on deep learning. Traditional machine learning methods mainly extract age features manually and then classify them with a traditional classifier to realize face age estimation. In recent years, with the development of deep learning, deep neural networks have achieved state-of-the-art performance in image recognition; they can extract facial features automatically, have been widely applied to age estimation, and outperform traditional machine learning methods.
In the prior art, the design of deep convolutional neural networks mainly focuses on deeper or wider networks to enhance the nonlinear modeling capability of the model. However, face age estimation methods based on deep learning cannot express face age features in a way that takes both global and local details into account, which limits the feature expression capability of the CNN to a certain extent. Therefore, how to realize face age feature expression that considers both global and local details is one of the future research directions of face age estimation.
Disclosure of Invention
In view of the above-mentioned lack of global-local feature expression capability, the present invention aims to provide a multi-scale face age estimation method and system embedded with high-order information that can better express global-local age features. By inserting blocks that extract high-order age features into the network, the nonlinear modeling capability of the model is further enhanced, effectively improving the accuracy of face age estimation and realizing high-precision age estimation.
In a first aspect, the present invention provides a multi-scale face age estimation method embedded with high-order information, comprising the following steps:
inputting a face image set with an accurate age label as a data set, and preprocessing the face image data set;
inputting the preprocessed face image into a baseline model ResNet-50, and extracting a shallow feature map through a convolution layer and a maximum pooling layer;
after the shallow feature map is extracted, four groups of residual modules which are sequentially connected are connected to form a residual network, the residual network is used as a global branch, and global features of the face image are extracted;
embedding a global second-order pooling block between the first set of residual blocks and the second set of residual blocks, thereby generating a high-dimensional global image representation in the global branch;
taking the shallow feature map as the input of a long short-term memory (LSTM) neural network, constructing a local branch and extracting the local features of age-sensitive areas;
performing joint optimization to solve a cross entropy loss function of the two branches, performing iterative training on a convolutional neural network formed by the global branch and the local branch until convergence, and storing a trained convolutional neural network model;
and inputting the face image to be detected into the trained convolutional neural network model, and calculating and outputting the final predicted age by the classifier according to the age characteristics.
In a second aspect of the present invention, the present invention provides a multi-scale face age estimation system embedded with high-order information, comprising an image acquisition module, a data preprocessing module, a data enhancement module, a neural network module and an output module;
the image acquisition module is used for inputting a data set and acquiring face image information or a face image to be detected;
the data preprocessing module is used for performing face detection, face alignment and cropping on the face image information or the face image to be detected, and for performing pixel normalization on the face image;
the data enhancement module is used for expanding the training set according to random horizontal turning, zooming, rotating and translating operations;
the neural network module is used for constructing and training a convolutional neural network formed by the global module and the local module;
preferably, a sharing module is further arranged in front of the global module and the local module, and the sharing module provides a common input feature map to both the global module and the local module;
the global module is used for extracting and learning global features;
the local module is used for extracting and learning local features;
the output module is used for outputting the final predicted age of the face image to be detected.
The invention has the beneficial technical effects that:
(1) the invention has the effects of high speed and high precision, and can accurately estimate the age of any input face image.
(2) The invention provides a novel multi-scale feature extraction framework giving consideration to global-local information, ensures that the network can extract age features of different types (global and local details) through multi-scale feature extraction, enhances the feature characterization capability of the network, and overcomes the defects in the existing face age estimation method.
(3) According to the invention, the GSoP block used for extracting high-order age information is embedded in the age estimation network, the high-order module can capture global second-order statistical information along a channel dimension or a position dimension, and the nonlinear modeling capability of the model is stronger than that of a traditional first-order network.
Drawings
Fig. 1 is a flowchart of a multi-scale face age estimation method with embedded high-order information according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the high-order block according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a training process according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a multi-scale network embedded with high-order information according to an embodiment of the present invention;
fig. 5 is a diagram illustrating an application effect of the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention discloses a multi-scale face age estimation method embedded with high-order information, which comprises the following steps of:
inputting a face image set with an accurate age label as a data set, and preprocessing the face image data set;
inputting the preprocessed face image into a baseline model ResNet-50, and extracting a shallow feature map through a convolution layer and a maximum pooling layer;
after the shallow feature map is extracted, four groups of residual modules which are sequentially connected are connected to form a residual network, the residual network is used as a global branch, and global features of the face image are extracted;
embedding a global second-order pooling block between the first set of residual blocks and the second set of residual blocks, thereby generating a high-dimensional global image representation in the global branch;
taking the shallow feature map as the input of a long short-term memory (LSTM) neural network, constructing a local branch and extracting the local features of age-sensitive areas;
performing joint optimization to solve a cross entropy loss function of the two branches, performing iterative training on a convolutional neural network formed by the global branch and the local branch until convergence, and storing a trained convolutional neural network model;
and inputting the face image to be detected into the trained convolutional neural network model, and calculating and outputting the final predicted age by the classifier according to the age characteristics.
In one embodiment, the data set used in the present invention is the Morph II face age data set, which comprises 55134 face images of 13618 subjects aged 16-77 years, captured in a controlled environment and labeled with exact age values. To ensure a sufficient training set and a reasonable test set, the invention adopts the widely used S1-S2-S3 protocol on this data set: the experiment is run twice, the first pass using S1 as the training set and S2+S3 as the test set, and the second pass using S2 as the training set and S1+S3 as the test set. The original images provided by the Morph II face age data set have the advantages of high quality, low noise and large quantity, which facilitates subsequent experimental processing.
Preprocessing the Morph II dataset comprises the following steps: a multi-task convolutional neural network (MTCNN) performs face detection on the originally acquired face images; key-point alignment is performed using the coordinates of the eye centers, nose tip and upper lip; the processed images are uniformly cropped to 256 × 256; a series of data augmentation operations, namely random horizontal flipping, scaling, rotation (e.g. ±5°) and translation, is applied to the candidate training set to enhance the generalization capability of the subsequent convolutional neural network model; and pixel normalization is applied to the processed face images according to the formula:
X_pix = (X_pix − 128) / 128
wherein X_pix is the pixel value of the input face image, specifically the face image input to the MTCNN network.
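As an illustrative sketch (not part of the claimed invention), the pixel normalization step can be written in NumPy; the function name is illustrative:

```python
import numpy as np

def normalize_pixels(image):
    """X_pix = (X_pix - 128) / 128: maps 8-bit pixels [0, 255] to about [-1, 1]."""
    return (image.astype(np.float32) - 128.0) / 128.0

# A dummy mid-gray 256 x 256 RGB crop normalizes to all zeros:
img = np.full((256, 256, 3), 128, dtype=np.uint8)
norm = normalize_pixels(img)
```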
The data-augmented training sample images are transmitted to the neural network in sequence, and the network is trained by back-propagating the minimized loss function. Compared with traditional age estimation algorithms, the baseline model ResNet-50 is adopted to reduce the model size and improve accuracy. ResNet-50 adds a bypass connection (shortcut) branch beside the original convolutional layers to form a basic residual module, expressing the original mapping H(x) as H(x) = F(x) + x, where F(x) is the residual mapping and x is the input signal. The residual module structure thus converts the convolutional layers' task of learning H(x) into learning F(x), which is simpler. While reducing the amount of computation, this structure effectively solves the degradation problem caused by overly deep networks.
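The residual reformulation H(x) = F(x) + x can be illustrated with a toy shape-preserving residual function (a stand-in for the actual conv-BN-ReLU stack):

```python
import numpy as np

def residual_block(x, F):
    """H(x) = F(x) + x: the layers learn the residual F(x), and the
    identity shortcut carries the input x forward unchanged."""
    return F(x) + x

# Toy shape-preserving residual (illustrative stand-in for a conv stack):
x = np.ones(4)
y = residual_block(x, lambda v: 0.1 * v)   # each element becomes 1 + 0.1 = 1.1
```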
A face image is input into the ResNet-50 network, and shallow features are extracted through a convolutional layer and a maximum pooling layer to serve as the input feature map of each subsequent branch network. Specifically, the 3-channel input passes through a convolutional layer with kernel size 7 × 7, 64 channels and stride 2, yielding an output feature map of size 112 × 112 with 64 channels, and then through a maximum pooling layer with kernel size 3 × 3 and stride 2, keeping 64 channels; the resulting feature map serves as the input of each subsequent branch.
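The spatial sizes quoted in this description follow from the usual convolution output-size formula, assuming standard ResNet padding (3 for the 7 × 7 convolution, 1 for the 3 × 3 pooling). Note that a 224 × 224 input yields the 112 × 112 map quoted here, while a 256 × 256 crop would yield 128 × 128, matching the tensor size quoted later:

```python
def conv_out(in_size, kernel, stride, pad):
    """Spatial output size of a conv/pool layer:
    floor((in + 2*pad - kernel) / stride) + 1."""
    return (in_size + 2 * pad - kernel) // stride + 1

# 7x7 stride-2 conv, then 3x3 stride-2 max pool (standard ResNet padding):
after_conv_224 = conv_out(224, kernel=7, stride=2, pad=3)            # 112
after_pool_224 = conv_out(after_conv_224, kernel=3, stride=2, pad=1)
after_conv_256 = conv_out(256, kernel=7, stride=2, pad=3)            # 128
```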
After the shallow feature map is extracted, four groups of residual modules which are sequentially connected are connected to form a residual network, the residual network is used as a global branch, and the global features of the face image are extracted;
It can be understood that the core improvement of the present invention lies in the two proposed branch networks, namely the global branch and the local branch. For the global branch, the core is a modification of the baseline model ResNet-50: a shallow feature map is extracted by the convolutional layer and maximum pooling layer of ResNet-50, four sequentially connected groups of residual modules after the maximum pooling layer form a residual network, and this residual network serves as the global branch to extract global features of the face image; in addition, a high-order module for extracting high-order age information is embedded in the global branch. In the local branch, by contrast, the shallow feature map is used as the input of the LSTM, the coordinates of local features are obtained through the gate structure of the LSTM, and the local feature map is then obtained by cropping. In the present invention, unless specifically emphasized otherwise, the residual network mainly refers to the structure formed by the groups of residual modules following the stem of the baseline model ResNet-50. Of course, this division only serves to highlight the improvements of the present invention, and those skilled in the art can understand it adaptively from the overall embodiments and the drawings.
In this embodiment, the convolutional layer and the maximum pooling layer of the baseline model ResNet-50 are used as a shared layer, and the output feature map of the shared layer is used as the input of the dual-branch network, forming a hybrid network structure composed of a global branch and a local branch, i.e., the convolutional neural network model finally obtained by the present invention;
furthermore, the global branch is composed of a residual module and a high-order embedding module.
Further, the process of constructing the global branch includes the steps of:
firstly, the feature map of the shared layer is input into the global network branch, which is formed by connecting 4 groups of residual modules in series; the numbers of input channels of the groups are 64, 128, 256 and 512. Each residual module consists of convolution, BN (Batch Normalization) and ReLU (Rectified Linear Unit) operations, which are applied to the mapping of global features; the corresponding numbers of output channels become 256, 512, 1024 and 2048;
then, a global second-order pooling block is embedded between the first group of residual modules and the second group of residual modules, and the embedding process of the global second-order pooling block comprises the following steps:
a block for extracting high-order information is inserted into the residual network. Specifically, as shown in fig. 2, a three-dimensional tensor of size h′ × w′ × c′ is input and passed through a 1 × 1 convolution to obtain a three-dimensional tensor of size h′ × w′ × c, wherein h′ and w′ are the length and width of the input feature map, c′ is the number of channels, and c is smaller than c′;
the channel correlations are computed to obtain a fixed-size c × c covariance matrix, which is normalized in the row direction;
two consecutive operations, covariance matrix row convolution and sigmoid nonlinear activation, are performed to output a c × 1 weight vector;
each channel of the input tensor is multiplied by the corresponding element of the weight vector to obtain a new three-dimensional tensor of size h′ × w′ × c, which serves as the input of the subsequent residual module;
and inserting a matrix normalized covariance matrix at the end of the last residual module of the residual network to generate a final global feature representation.
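As an illustration, the channel-attention path of the global second-order pooling block described above can be sketched in NumPy. The 1 × 1 convolution and row-convolution weights below are random stand-ins (the real block learns them), row-wise L2 normalization stands in for the row-direction normalization, and no channel reduction is applied (c = c′, as in the embodiment):

```python
import numpy as np

rng = np.random.default_rng(0)

def gsop_block(x):
    """Global second-order pooling channel attention, following the steps
    above. x: (h, w, c). All weights are random illustrative stand-ins."""
    h, w, c = x.shape
    w1 = rng.standard_normal((c, c)) * 0.1           # 1x1 conv weights
    y = x.reshape(-1, c) @ w1                        # (h*w, c)
    yc = y - y.mean(axis=0)
    cov = yc.T @ yc / y.shape[0]                     # c x c channel covariance
    cov = cov / (np.linalg.norm(cov, axis=1, keepdims=True) + 1e-8)  # row norm
    wr = rng.standard_normal(c) * 0.1                # row-convolution weights
    weights = 1.0 / (1.0 + np.exp(-(cov @ wr)))      # sigmoid -> (c,) in (0, 1)
    return x * weights                               # softly rescale each channel

x = rng.standard_normal((8, 8, 16))
out = gsop_block(x)   # same shape as the input, channels softly re-weighted
```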
In an embodiment, the first residual module group outputs a 128 × 128 × 256 three-dimensional tensor (length, width and number of channels of the feature map). This tensor first passes through a 1 × 1 convolution to obtain a 128 × 128 × c tensor; note that choosing c < c′ can reduce the computational cost, but in this embodiment c = 256 is taken, i.e., no channel compression is performed. Then the channel correlations are computed to obtain a fixed-size c × c covariance matrix, which is normalized in the row direction; next, two consecutive operations, covariance matrix row convolution and sigmoid nonlinear activation, are performed to output a c × 1 weight vector; finally, each channel of the input tensor is multiplied by the corresponding element of the weight vector, emphasizing or suppressing channels in a soft manner, to obtain a new 128 × 128 × c three-dimensional tensor representing global features.
Finally, at the end of the network, the first-order global average pooling is replaced by a second-order statistic: a matrix-normalized covariance matrix is inserted as the final global image representation, thereby realizing the embedding of high-order information. Specifically, the fourth residual stage of ResNet-50 outputs a 7 × 7 × 2048 three-dimensional tensor after feature mapping, which is reshaped into a feature matrix X with dimension 2048 and n = 49 feature vectors. Second-order pooling then computes the covariance matrix
Σ = X Ī Xᵀ, with Ī = (1/n)(I − (1/n) 1 1ᵀ),
where I and 1 are the n × n identity matrix and the n × n all-ones matrix, respectively.
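As a sketch, the second-order (covariance) pooling of a d × n feature matrix, Σ = X Ī Xᵀ with Ī = (1/n)(I − (1/n) 1 1ᵀ), equals the biased sample covariance of the n feature vectors and can be checked numerically:

```python
import numpy as np

def second_order_pool(X):
    """Covariance pooling of a feature matrix X (d x n):
    Sigma = X @ I_bar @ X.T, I_bar = (1/n) * (I - (1/n) * ones)."""
    d, n = X.shape
    I_bar = (np.eye(n) - np.ones((n, n)) / n) / n
    return X @ I_bar @ X.T

# The 7x7x2048 output of the last residual stage, reshaped to d=2048, n=49:
X = np.random.default_rng(1).standard_normal((2048, 49))
sigma = second_order_pool(X)   # 2048 x 2048 global representation
```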
In one embodiment, the local branch, which extracts local features of age-sensitive regions, is composed of a long short-term memory (LSTM) neural network, a local region positioning module and a cropping module.
Further, the process of constructing the local branch comprises the following steps:
Firstly, the output feature map of the shared layer is input into the LSTM. An LSTM unit controls its cell state through gate structures; by considering the features of the current image while exploiting the position information of other similar images, it localizes age-sensitive regions more comprehensively. The gate structure is divided into an input gate, a forgetting gate and an output gate. First, the forgetting gate selects information from the output of the previous state C_prev, and the input gate is multiplied by the new candidate vector C_in-tan generated by the tanh layer; the two sources of information are then combined for the state update, the purpose being to discard unnecessary information and add new information. Furthermore, the state output of the LSTM hidden layer is obtained from the cell state, which is squashed to between −1 and 1 by tanh and multiplied by the output value of the output gate. The formulas are:
C_next = forget_gate ⊙ C_prev + in_gate ⊙ C_in-tan
h_next = out_gate ⊙ tanh(C_next)
C_in-tan = tanh(W_C [h_prev, x_input] + b_C)
wherein forget_gate, in_gate and out_gate denote the forgetting gate, input gate and output gate of the long short-term memory neural network LSTM; ⊙ denotes the element-wise (Hadamard) product; C_prev and C_next are the previous and current cell states, and h_next is the hidden state; C_in-tan is the candidate vector used to update the cell state; W_C and b_C denote the weight and bias respectively, and x_input is the input of the LSTM;
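A minimal NumPy sketch of one LSTM cell update following these state equations; the sigmoid gate parameterization and the stacked weight layout are standard-LSTM assumptions, not details from the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM cell update:
       c_next = f * c_prev + i * tanh-candidate
       h_next = o * tanh(c_next)
    W stacks input/forget/candidate/output weights (a common convention)."""
    hx = np.concatenate([h_prev, x])
    d = h_prev.size
    z = W @ hx + b                      # (4d,)
    i = sigmoid(z[0:d])                 # input gate
    f = sigmoid(z[d:2 * d])             # forgetting gate
    g = np.tanh(z[2 * d:3 * d])         # candidate vector C_in-tan
    o = sigmoid(z[3 * d:4 * d])         # output gate
    c_next = f * c_prev + i * g
    h_next = o * np.tanh(c_next)
    return h_next, c_next

rng = np.random.default_rng(2)
d, m = 8, 16                            # hidden size, input size (illustrative)
W = rng.standard_normal((4 * d, d + m)) * 0.1
b = np.zeros(4 * d)
h, c = lstm_step(rng.standard_normal(m), np.zeros(d), np.zeros(d), W, b)
```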
Then, S_next is input into a positioning module consisting of a convolutional layer and a sigmoid activation function. With S_next as the input of the convolutional layer, the output is l_{1-4} = L(W ∗ S_next), where l_{1-4} is a four-dimensional vector representing the coordinates (x, y), width and height; the LSTM unit block and the positioning block are updated with a cross-entropy loss function strategy during back propagation.
Finally, the cropping module crops the feature map at the predicted position coordinates to obtain a local feature map of size 112 × 112, which is sequentially input into the following 4 groups of residual modules for local feature learning; the numbers of input channels are 64, 128, 256 and 512, and the corresponding numbers of output channels are 256, 512, 1024 and 2048.
The cross-entropy losses of the global branch and the local branch are solved jointly: the two branches are jointly optimized by back-propagating the minimized loss function, and the neural network is trained iteratively.
Further, the loss function is expressed as follows:
P_final(X_i) = P_global + 0.5 P_local
L = −(1/n) Σ_{i=1}^{n} log P_final(X_i)
wherein L denotes the loss of the convolutional neural network, P_global denotes the predicted age probability of sample i in the global branch, P_local denotes the predicted age probability of sample i in the local branch, P_final(X_i) denotes the final predicted age of sample i, and n denotes the total number of training samples of face images.
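A hedged sketch of the joint objective: the fused probability P_final = P_global + 0.5 · P_local scored with cross-entropy, averaged over n samples. Renormalizing P_final so each row sums to one is an assumption added here for numerical sanity, not a step stated in the text:

```python
import numpy as np

def joint_loss(p_global, p_local, labels):
    """Cross-entropy on the fused prediction P_final = P_global + 0.5 * P_local,
    averaged over the n training samples."""
    p_final = p_global + 0.5 * p_local
    p_final = p_final / p_final.sum(axis=1, keepdims=True)  # assumed renorm
    n = labels.shape[0]
    return -np.mean(np.log(p_final[np.arange(n), labels] + 1e-12))

# Both branches uniformly unsure over 5 age classes for 4 samples:
loss = joint_loss(np.full((4, 5), 0.2), np.full((4, 5), 0.2),
                  np.zeros(4, dtype=int))
```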
Training is adjusted with an Adam optimizer; after multiple rounds of training the neural network tends to be stable, the iteration process ends, and the trained convolutional neural network model is obtained. The training process is shown in fig. 3:
after an image data set is obtained, preprocessing a face image;
constructing a multi-scale network model embedded with high-order information, namely a convolutional neural network model constructed by the invention;
training the network using the data set and performing multiple iterations;
and computing the loss between the network output and the true age label corresponding to the face image, until the loss tends to be stable.
At this time, the training is finished and the trained convolutional neural network model is output.
The trained convolutional neural network is shown in fig. 4.
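The Adam adjustment mentioned above can be sketched as the standard update rule; the hyperparameter values below are the common defaults, not values taken from the text:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam parameter update with bias-corrected moment estimates."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy run: minimizing f(theta) = theta^2 (gradient 2*theta) for 200 steps
theta, m, v = np.array([1.0]), np.zeros(1), np.zeros(1)
for t in range(1, 201):
    theta, m, v = adam_step(theta, 2.0 * theta, m, v, t)
```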
When the trained neural network model is used, the image containing the human face is input into the trained neural network model, and the trained neural network model calculates the predicted age value of the sample according to the weight parameters obtained in advance.
A multi-scale human face age estimation system embedded with high-order information comprises an image acquisition module, a data preprocessing module, a data enhancement module, a neural network module and an output module;
the image acquisition module is used for inputting a data set and acquiring face image information or a face image to be detected; the image acquisition module is used as a data reading inlet of the whole system and is used for inputting a data set and acquiring pixels and age tags of an original image;
the data preprocessing module is used for performing face detection, face alignment and cropping on the face image information or the face image to be detected, and for performing pixel normalization on the face image;
the data enhancement module is used for expanding the training set according to random horizontal turning, zooming, rotating and translating operations; data enhancement is carried out on the limited training set to increase the generalization capability of the model, so that the network can deal with face estimation under a more complex background such as an uncontrolled environment;
the neural network module is used for constructing and training a convolutional neural network formed by the global module and the local module; the neural network module is used for training and testing a network and is a core module of the whole system;
the global module is used for extracting and learning global features, and the local module is used for extracting and learning local features;
In a preferred embodiment, the convolutional layer and maximum pooling layer of the baseline model ResNet-50 can be used as a shared layer, which provides the input of both the global module and the local module and realizes the transfer between them.
The output module is used for outputting the age estimation value of the face image to be detected.
The global module comprises residual modules and a high-order module: the sequentially connected residual modules form a residual network, the residual network extracts the global features of the face image, and the high-order module introduces global second-order pooling blocks from lower layers to higher layers, so that the second-order statistical information of the face image is fully utilized.
The high-order module is used for embedding high-order information and comprises: a convolution module with the size of 1 multiplied by 1, which is used for integrating the information of each channel and reducing the number of output channels at the same time so as to compress the parameter number; the covariance matrix module is used for calculating channel correlation, obtaining a covariance matrix with a fixed size, and normalizing the covariance matrix in the row direction; and the covariance convolution module is used for performing covariance matrix row convolution and Sigmoid nonlinear activation two continuous operations.
The local module comprises a long-short term memory neural network, a local area positioning module and a cutting module, wherein the long-short term memory neural network is used for updating the state, the local area positioning module is used for positioning the coordinate, the width and the height of the age sensitive area, and the cutting module cuts the local characteristic diagram according to the local position information. The invention relates to a multi-scale human face age estimation system embedded with high-order information, which comprises an image acquisition module, a data preprocessing module, a neural network module and an output module.
Fig. 5 is a face age estimation diagram of the present invention. After the leftmost original face picture is input, the face is preprocessed according to face key point detection to highlight the age characteristics of the face image, in particular the distances between the facial features; the processed picture is then input into the multi-scale face age estimation network embedded with high-order information for feature extraction and age estimation. It can be seen that after the global features and the local features of the face image are extracted, the age corresponding to the face is estimated to be 22.
It can be understood that some features of the multi-scale face age estimation method and system embedded with high-order information of the present invention may refer to each other; for example, the global branch in the method corresponds to the global module of the system. Those skilled in the art can accordingly understand and implement the present invention from the embodiments, so these features are not described in further detail.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (10)

1. A multi-scale face age estimation method embedded with high-order information is characterized by comprising the following steps:
inputting a face image set with an accurate age label as a data set, and preprocessing the face image data set;
inputting the preprocessed face image into a baseline model ResNet-50, and extracting a shallow feature map through a convolution layer and a maximum pooling layer;
after the shallow feature map is extracted, connecting four sequentially connected groups of residual modules to form a residual network, using the residual network as a global branch, and extracting global features of the face image;
embedding a global second-order pooling block between the first set of residual blocks and the second set of residual blocks, thereby generating a high-dimensional global image representation in the global branch;
taking the shallow feature map as the input of a long short-term memory neural network, constructing a local branch and extracting the local features of the age-sensitive region;
performing joint optimization to solve a cross entropy loss function of the two branches, performing iterative training on a convolutional neural network formed by the global branch and the local branch until convergence, and storing a trained convolutional neural network model;
and inputting the face image to be detected into the trained convolutional neural network model, and calculating and outputting the final predicted age by the classifier according to the age characteristics.
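The final prediction step of the method above can be illustrated with a small sketch. The fusion weight of 0.5 for the local branch follows the patent's loss formula; the use of a softmax classifier, the one-class-per-age layout and all variable names here are illustrative assumptions, not the patent's exact implementation.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of raw scores.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predict_age(global_logits, local_logits, ages):
    """Fuse the two branches as P_final = P_global + 0.5 * P_local and
    return the age whose fused score is highest (an assumed decision rule)."""
    pg = softmax(global_logits)
    pl = softmax(local_logits)
    fused = [g + 0.5 * l for g, l in zip(pg, pl)]
    return ages[fused.index(max(fused))]

ages = list(range(0, 101))        # one class per age in [0, 100], an assumption
g = [0.0] * 101; g[22] = 5.0      # global branch strongly favours age 22
l = [0.0] * 101; l[25] = 3.0      # local branch weakly favours age 25
assert predict_age(g, l, ages) == 22
```

Because the global probability enters with full weight and the local one with weight 0.5, the global branch dominates when the two branches disagree, which matches the weighting in the loss function of claim 7.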
2. The method of claim 1, wherein the preprocessing of the face image dataset comprises: performing face detection and face alignment using a multi-task convolutional neural network, cropping the face images to the same size, performing data enhancement on the candidate training set in the face image dataset, and performing pixel normalization on the face images according to the following formula:
X_pix = (X_pix - 128) / 128
wherein X_pix is the input face image pixel value.
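The normalization formula of claim 2 maps an 8-bit pixel value into roughly [-1, 1]; a one-line sketch (the function name is illustrative):

```python
def normalize_pixel(x_pix):
    """Map an 8-bit pixel value in [0, 255] to roughly [-1, 1],
    per the claim-2 formula X_pix = (X_pix - 128) / 128."""
    return (x_pix - 128) / 128

assert normalize_pixel(128) == 0.0        # mid-grey maps to zero
assert normalize_pixel(0) == -1.0         # black maps to -1
assert normalize_pixel(255) == 127 / 128  # white maps to just under +1
```

Centering pixels near zero in this way is a common preprocessing step that keeps early-layer activations in a well-conditioned range.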
3. The method as claimed in claim 1, wherein the convolution layer and the maximum pooling layer of the baseline model ResNet-50 operate as follows: a face image is input into ResNet-50, a 112 × 112 shallow feature map of the face image is output through the convolution layer with convolution kernel size 7 × 7 and stride 2, and the shallow feature map is then passed through the maximum pooling layer.
4. The method as claimed in claim 1, wherein after the shallow feature map is extracted, it is passed sequentially through four different groups of residual modules; the four groups contain 3, 4, 6 and 3 residual modules respectively, the output dimensions of the residual modules differ between groups, and the sizes of the output feature maps are 56 × 56, 28 × 28, 14 × 14 and 7 × 7 respectively.
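The spatial sizes in claims 3 and 4 follow from the strides of a standard ResNet-50, assuming a 224 × 224 input (the input resolution is not stated in the claims). A sketch of the arithmetic; modelling each downsampling stage as a single stride-2, 1 × 1 operation is a simplification of the real stride-2 residual blocks:

```python
def conv_out(size, kernel, stride, padding):
    # Standard convolution / pooling output-size formula.
    return (size + 2 * padding - kernel) // stride + 1

size = 224                                            # assumed input resolution
size = conv_out(size, kernel=7, stride=2, padding=3)  # 7x7 conv, stride 2
assert size == 112                                    # shallow feature map (claim 3)

size = conv_out(size, kernel=3, stride=2, padding=1)  # 3x3 max pooling, stride 2
sizes = [size]
for _ in range(3):                                    # each later stage halves the resolution
    size = conv_out(size, kernel=1, stride=2, padding=0)
    sizes.append(size)
assert sizes == [56, 28, 14, 7]                       # the four stage outputs (claim 4)
```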
5. The method for estimating the age of a multi-scale face embedded with high-order information as claimed in claim 1, wherein the embedding process of the global second-order pooling block comprises:
inserting a block for extracting high-order information into the residual network; specifically, inputting a three-dimensional tensor of size h′ × w′ × c′ and performing a 1 × 1 convolution on it to obtain a three-dimensional tensor of size h′ × w′ × c; wherein h′ and w′ are respectively the length and width of the input face image, c′ is the number of channels, and c < c′;
calculating the correlation of the channels to obtain a fixed-size c × c covariance matrix, and performing row direction normalization on the covariance matrix;
performing two continuous operations of covariance matrix row convolution and Sigmoid nonlinear activation, and outputting a weight vector of c multiplied by 1;
multiplying each channel of the input tensor by the corresponding element in the weight vector to obtain a new three-dimensional tensor of size h′ × w′ × c, which serves as the input of the subsequent residual module;
and inserting matrix normalization of the covariance matrix at the end of the last residual module of the residual network to generate the final global feature representation of the face image.
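The global second-order pooling steps of claim 5 can be sketched numerically. The sketch below uses NumPy and random stand-in weights (a trained network would learn them), and approximates the row convolution with a dense linear map over the covariance rows; everything else follows the claimed order of operations (channel reduction c′ → c, covariance, row normalization, sigmoid channel weights, channel reweighting).

```python
import numpy as np

rng = np.random.default_rng(0)

def gsop_block(x, c=4):
    """Illustrative global second-order pooling block (claim 5).
    x has shape (h', w', c'); weights here are random stand-ins."""
    hp, wp, cp = x.shape
    # 1x1 convolution = per-pixel linear map reducing c' channels to c.
    w1 = rng.standard_normal((cp, c)) / np.sqrt(cp)
    y = x.reshape(-1, cp) @ w1                       # (h'*w', c)
    # Channel covariance matrix (c x c), then row-direction normalization.
    cov = np.cov(y.T)
    cov = cov / (np.abs(cov).sum(axis=1, keepdims=True) + 1e-8)
    # Row convolution + sigmoid -> one weight per channel (c x 1 vector).
    wr = rng.standard_normal((c,)) / np.sqrt(c)
    weights = 1.0 / (1.0 + np.exp(-(cov @ wr)))      # sigmoid
    # Reweight each channel of the reduced tensor.
    return y.reshape(hp, wp, c) * weights

out = gsop_block(rng.standard_normal((7, 7, 16)))
assert out.shape == (7, 7, 4)
```

The covariance step is what makes the block "high-order": each channel weight depends on the correlations between channels rather than on a first-order statistic such as the channel mean.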
6. The multi-scale face age estimation method embedded with high-order information as claimed in claim 1, wherein the process of constructing local branches based on long-short term memory neural network and extracting local features of age sensitive region comprises:
the long short-term memory neural network automatically retains, through its long- and short-term memory mechanism, the position information of other face images similar to the current face image so as to realize the positioning function, and the calculation formulas comprise:
C_next = forget_gate ⊙ C_prev + in_gate ⊙ C_in-tan
h_next = out_gate ⊙ tanh(C_next)
C_in-tan = tanh(W_C · [h_prev, x_input] + b_C)
wherein forget_gate represents the forget gate of the long short-term memory neural network (LSTM), in_gate represents the input gate of the LSTM, and out_gate represents the output gate of the LSTM; ⊙ denotes element-wise multiplication; C_prev and C_next are respectively the previous and current cell states of the LSTM, and h_prev and h_next the previous and current hidden states; C_in-tan is the candidate vector for updating the cell state; W_C and b_C respectively represent the weight and the bias; x_input is the input of the LSTM;
generating the coordinates, width and height of the age-sensitive local region box through the state update, the formula comprising:
l_1-4 = L(W * S_next)
wherein l_1-4 represents a four-dimensional vector whose components are respectively the coordinates (x, y), the width and the height; S_next is the joint output of the LSTM; W is the overall parameter; L(·) represents the convolution function;
and cropping according to the position coordinates to obtain the local features of the age-sensitive region, and sequentially inputting them into the four groups of residual modules in the residual network for local feature learning.
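A minimal pure-Python sketch of the claim-6 cell update. For readability the state is a scalar and the three gates share one stand-in weight and bias; in a real LSTM each gate has its own learned parameters and the state is a vector.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def lstm_step(c_prev, h_prev, x_input, w, b):
    """One LSTM update following the claim-6 equations.
    Scalar state; `w` and `b` are shared stand-ins for the per-gate weights."""
    z = w * (h_prev + x_input) + b      # stands in for W_C . [h_prev, x_input] + b_C
    forget_gate = sigmoid(z)
    in_gate = sigmoid(z)
    out_gate = sigmoid(z)
    c_in_tan = math.tanh(z)             # candidate vector C_in-tan
    c_next = forget_gate * c_prev + in_gate * c_in_tan
    h_next = out_gate * math.tanh(c_next)
    return c_next, h_next

c, h = 0.0, 0.0
for x in [1.0, -0.5, 2.0]:              # feed a short input sequence
    c, h = lstm_step(c, h, x, w=0.8, b=0.1)
assert -1.0 < h < 1.0                   # hidden state stays bounded by the tanh/gates
```

In the patent's local branch, a further layer (the L(·) function of claim 6) would map the hidden-state output to the four box parameters (x, y, width, height).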
7. The multi-scale face age estimation method embedded with high-order information as claimed in claim 1, wherein the cross entropy loss function is expressed as follows:
P_final(X_i) = P_global + 0.5 P_local
Loss = -Σ_{i=1}^{n} log P_final(X_i)
wherein Loss represents the loss of the convolutional neural network, P_global represents the predicted age probability of sample i in the global branch, P_local represents the predicted age probability of sample i in the local branch, P_final(X_i) represents the final predicted age probability of sample i, and n represents the number of samples in the face image training set.
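The joint loss of claim 7 can be sketched as follows. The per-sample probability lists and names are illustrative; note that the fused score P_global + 0.5 * P_local is deliberately unnormalized (its class scores sum to 1.5), which is how the patent's formula weights the global branch more heavily.

```python
import math

def joint_loss(p_global, p_local, labels):
    """Cross-entropy over the fused prediction P_final = P_global + 0.5 * P_local,
    summed over the n training samples as in claim 7.

    p_global / p_local: per-sample lists of age-class probabilities;
    labels: true age-class index of each sample.
    """
    loss = 0.0
    for pg, pl, y in zip(p_global, p_local, labels):
        p_final = pg[y] + 0.5 * pl[y]   # fused probability of the true class
        loss -= math.log(p_final)
    return loss

# Two samples, three age classes.
pg = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
pl = [[0.6, 0.3, 0.1], [0.2, 0.6, 0.2]]
good = joint_loss(pg, pl, labels=[0, 1])   # branches agree with the labels
bad = joint_loss(pl, pl, labels=[2, 0])    # branches disagree with the labels
assert good < bad                          # better predictions give a lower loss
```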
8. A multi-scale human face age estimation system embedded with high-order information is characterized by comprising an image acquisition module, a data preprocessing module, a data enhancement module, a neural network module and an output module;
the image acquisition module is used for inputting a data set and acquiring face image information or a face image to be detected;
the data preprocessing module is used for carrying out face detection, face alignment and cropping on the face image information or the face image to be detected and carrying out pixel normalization processing on the face image;
the data enhancement module is used for expanding the training set through random horizontal flipping, scaling, rotation and translation operations;
the neural network module is used for constructing and training a convolutional neural network formed by the global module and the local module;
the global module is used for extracting and learning global features, and the local module is used for extracting and learning local features;
the output module is used for outputting the final predicted age of the face image to be detected.
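The data enhancement operations of claim 8 (random horizontal flipping, scaling, rotation, translation) can be sketched for two of the operations using plain Python on a pixel grid; scaling and rotation would normally be delegated to an image library, and all names here are illustrative.

```python
import random

def augment(image, max_shift=2, rng=None):
    """Illustrative augmentation: random horizontal flip and translation.
    `image` is a 2-D list of pixel rows."""
    rng = rng or random.Random()
    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        image = [list(reversed(row)) for row in image]
    # Random horizontal translation, padding the vacated pixels with zeros.
    shift = rng.randint(-max_shift, max_shift)
    out = []
    for row in image:
        if shift >= 0:
            out.append([0] * shift + row[:len(row) - shift])
        else:
            out.append(row[-shift:] + [0] * (-shift))
    return out

img = [[1, 2, 3, 4], [5, 6, 7, 8]]
aug = augment(img)
assert len(aug) == 2 and all(len(r) == 4 for r in aug)  # shape is preserved
```

Applying such label-preserving transforms at training time enlarges the effective training set, which is the generalization benefit the data enhancement module targets.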
9. The system according to claim 8, wherein the global module comprises residual modules and a high-order module; the residual modules are used for extracting the global features of the face image, and the high-order module introduces a global second-order pooling block from the lower layers to the higher layers, so that the second-order statistical information of the face image is fully utilized.
10. The system of claim 8, wherein the local module comprises a long short-term memory neural network, a local region positioning module and a cropping module; the long short-term memory neural network is used for updating the state, the local region positioning module is used for locating the coordinates, width and height of the age-sensitive region, and the cropping module crops the local feature map according to the local position information.
CN202010590398.4A 2020-06-24 2020-06-24 Multi-scale face age estimation method and system embedded with high-order information Active CN111814611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010590398.4A CN111814611B (en) 2020-06-24 2020-06-24 Multi-scale face age estimation method and system embedded with high-order information


Publications (2)

Publication Number Publication Date
CN111814611A true CN111814611A (en) 2020-10-23
CN111814611B CN111814611B (en) 2022-09-13

Family

ID=72854944

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010590398.4A Active CN111814611B (en) 2020-06-24 2020-06-24 Multi-scale face age estimation method and system embedded with high-order information

Country Status (1)

Country Link
CN (1) CN111814611B (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919897A (en) * 2016-12-30 2017-07-04 华北电力大学(保定) A kind of facial image age estimation method based on three-level residual error network
CN107622261A (en) * 2017-11-03 2018-01-23 北方工业大学 Face age estimation method and device based on deep learning
US20190065906A1 (en) * 2017-08-25 2019-02-28 Baidu Online Network Technology (Beijing) Co., Ltd . Method and apparatus for building human face recognition model, device and computer storage medium
CN109829375A (en) * 2018-12-27 2019-05-31 深圳云天励飞技术有限公司 A kind of machine learning method, device, equipment and system
CN110458084A (en) * 2019-08-06 2019-11-15 南京邮电大学 A kind of face age estimation method based on inversion residual error network
CN110717401A (en) * 2019-09-12 2020-01-21 Oppo广东移动通信有限公司 Age estimation method and device, equipment and storage medium
CN111027490A (en) * 2019-12-12 2020-04-17 腾讯科技(深圳)有限公司 Face attribute recognition method and device and storage medium
CN111274882A (en) * 2020-01-11 2020-06-12 上海悠络客电子科技股份有限公司 Automatic estimation method for human face age based on weak supervision


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KE ZHANG et al.: "Fine-Grained Age Group Classification in the Wild", 2018 24th International Conference on Pattern Recognition (ICPR) *
ZHANG Ke et al.: "Survey of deep learning methods for face age estimation", Journal of Image and Graphics *
BAI Haoyang et al.: "Face age estimation based on residual networks", Computer Knowledge and Technology *

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528897A (en) * 2020-12-17 2021-03-19 Oppo(重庆)智能科技有限公司 Portrait age estimation method, Portrait age estimation device, computer equipment and storage medium
CN112801040A (en) * 2021-03-08 2021-05-14 重庆邮电大学 Lightweight unconstrained facial expression recognition method and system embedded with high-order information
CN112801040B (en) * 2021-03-08 2022-09-23 重庆邮电大学 Lightweight unconstrained facial expression recognition method and system embedded with high-order information
CN112950631A (en) * 2021-04-13 2021-06-11 西安交通大学口腔医院 Age estimation method based on saliency map constraint and X-ray head skull positioning lateral image
CN112950631B (en) * 2021-04-13 2023-06-30 西安交通大学口腔医院 Age estimation method based on saliency map constraint and X-ray head cranium positioning side image
CN115132275A (en) * 2022-05-25 2022-09-30 西北工业大学 Method for predicting EGFR gene mutation state based on end-to-end three-dimensional convolutional neural network
CN115132275B (en) * 2022-05-25 2024-02-27 西北工业大学 Method for predicting EGFR gene mutation state based on end-to-end three-dimensional convolutional neural network
CN116091496A (en) * 2023-04-07 2023-05-09 菲特(天津)检测技术有限公司 Defect detection method and device based on improved Faster-RCNN
CN116091496B (en) * 2023-04-07 2023-11-24 菲特(天津)检测技术有限公司 Defect detection method and device based on improved Faster-RCNN
CN116416667A (en) * 2023-04-25 2023-07-11 天津大学 Facial action unit detection method based on dynamic association information embedding
CN116416667B (en) * 2023-04-25 2023-10-24 天津大学 Facial action unit detection method based on dynamic association information embedding

Also Published As

Publication number Publication date
CN111814611B (en) 2022-09-13

Similar Documents

Publication Publication Date Title
CN111814611B (en) Multi-scale face age estimation method and system embedded with high-order information
CN112949565B (en) Single-sample partially-shielded face recognition method and system based on attention mechanism
CN111563508B (en) Semantic segmentation method based on spatial information fusion
CN110348330B (en) Face pose virtual view generation method based on VAE-ACGAN
CN111091045A (en) Sign language identification method based on space-time attention mechanism
CN109359608B (en) Face recognition method based on deep learning model
CN112800903B (en) Dynamic expression recognition method and system based on space-time diagram convolutional neural network
CN112784763B (en) Expression recognition method and system based on local and overall feature adaptive fusion
CN107169954B (en) Image significance detection method based on parallel convolutional neural network
CN111259786A (en) Pedestrian re-identification method based on synchronous enhancement of appearance and motion information of video
CN111612008A (en) Image segmentation method based on convolution network
CN112464865A (en) Facial expression recognition method based on pixel and geometric mixed features
CN109033978B (en) Error correction strategy-based CNN-SVM hybrid model gesture recognition method
CN110321805B (en) Dynamic expression recognition method based on time sequence relation reasoning
CN110609917B (en) Image retrieval method and system based on convolutional neural network and significance detection
CN110458235B (en) Motion posture similarity comparison method in video
CN110675421B (en) Depth image collaborative segmentation method based on few labeling frames
CN112861718A (en) Lightweight feature fusion crowd counting method and system
CN110135435B (en) Saliency detection method and device based on breadth learning system
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
CN114694255B (en) Sentence-level lip language recognition method based on channel attention and time convolution network
CN113780249B (en) Expression recognition model processing method, device, equipment, medium and program product
CN110610138A (en) Facial emotion analysis method based on convolutional neural network
CN116740362B (en) Attention-based lightweight asymmetric scene semantic segmentation method and system
CN111144469B (en) End-to-end multi-sequence text recognition method based on multi-dimensional associated time sequence classification neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240115

Address after: 230000 floor 1, building 2, phase I, e-commerce Park, Jinggang Road, Shushan Economic Development Zone, Hefei City, Anhui Province

Patentee after: Dragon totem Technology (Hefei) Co.,Ltd.

Address before: 400065 Chongwen Road, Nanshan Street, Nanan District, Chongqing

Patentee before: Chongqing University of Posts and Telecommunications
