CN108021908B - Face age group identification method and device, computer device and readable storage medium - Google Patents


Info

Publication number
CN108021908B
CN108021908B (application number CN201711449997.9A)
Authority
CN
China
Prior art keywords
face image
age group
face
sample set
training sample
Prior art date
Legal status
Active
Application number
CN201711449997.9A
Other languages
Chinese (zh)
Other versions
CN108021908A
Inventor
杨龙
游德创
Current Assignee
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd filed Critical Shenzhen Intellifusion Technologies Co Ltd
Priority to CN201711449997.9A priority Critical patent/CN108021908B/en
Publication of CN108021908A publication Critical patent/CN108021908A/en
Application granted granted Critical
Publication of CN108021908B publication Critical patent/CN108021908B/en


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/178: Human faces, e.g. facial parts, sketches or expressions estimating age from face image; using age information for improving recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

A face age group identification method comprises the following steps: (a) acquiring the face features of each face image in a training sample set; (b) pre-training a multi-layer stacked self-coding model; (c) encoding the face features of each face image to obtain the age group features of each face image; (d) clustering the age group features of each face image to obtain a preset number of cluster centers; (e) calculating the attribution degree of the age group features of each face image to each cluster center, and adjusting the network parameters of the multi-layer stacked self-coding model to optimize the attribution degree; (f) repeating (c)-(e) until a preset condition is met; (g) encoding the face image to be processed to obtain the age group features of the face image to be processed; (h) performing age group identification on the face image to be processed to obtain the age group type of the face image to be processed. The invention also provides a face age group recognition device, a computer device and a readable storage medium. The invention can realize rapid and efficient face age group identification.

Description

Face age group identification method and device, computer device and readable storage medium
Technical Field
The invention relates to the technical field of computer vision, in particular to a face age group identification method and device, a computer device and a computer readable storage medium.
Background
Age group identification is a new research direction in the field of biometric recognition. Accurate age group identification has broad application prospects in areas such as security control, video surveillance, and electronic customer relationship management.
The human face contains a large amount of age-related information, and age group recognition (i.e., face age group recognition) can be performed from the face image. Existing face age group identification techniques include the convolutional neural network and the aging pattern subspace. However, the convolutional neural network needs to repeatedly perform convolution calculations on the image, so extracting age group features takes a long time and time efficiency is low. The aging pattern subspace needs to splice the features of all age groups into one large vector, which easily causes the curse of dimensionality.
Disclosure of Invention
In view of the above, it is desirable to provide a method and an apparatus for identifying a face age group, a computer apparatus and a computer readable storage medium, which can realize fast and efficient face age group identification.
A first aspect of the present application provides a method for identifying a face age group, the method including:
(a) acquiring the face characteristics of each face image in a training sample set;
(b) pre-training a multilayer stack type self-coding model by using the training sample set to obtain an initial value of a network parameter of the multilayer stack type self-coding model;
(c) coding the face features of each face image in the training sample set by using the multilayer stack type self-coding model to obtain the age characteristics of each face image in the training sample set;
(d) clustering age group characteristics of all face images in the training sample set to obtain a preset number of clustering centers;
(e) calculating the attribution degree of the age group characteristics of each face image in the training sample set to each clustering center, and adjusting the network parameters of the multilayer stack type self-coding model to optimize the attribution degree;
(f) judging whether a preset training end condition is met; if so, obtaining a trained multilayer stacked self-coding model; otherwise, returning to the step (c);
(g) coding the face image to be processed by utilizing the trained multilayer stack type self-coding model to obtain the age group characteristics of the face image to be processed;
(h) and carrying out age group identification on the face image to be processed according to the age group characteristics of the face image to be processed to obtain the age group type of the face image to be processed.
In another possible implementation manner, the face features include a gradient direction histogram feature and/or a local binary pattern feature.
In another possible implementation manner, the multi-layer stacked self-coding model includes three hidden layers, and the number of neurons in the three hidden layers is 500, and 1000, respectively.
In another possible implementation manner, the pre-training the multi-layer stacked self-coding model includes:
performing greedy layer-by-layer pre-training on the multilayer stacked self-coding model by using a restricted Boltzmann machine.
In another possible implementation manner, the optimizing the attribution degree includes: optimizing the attribution degree by optimizing an objective function L, wherein the objective function L is measured by the KL divergence and is calculated as follows:

$$L = \mathrm{KL}(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}},$$

wherein Q is the feature space composed of the $q_{ij}$ and P is the feature space composed of the $p_{ij}$, with

$$q_{ij} = \frac{\left(1 + \|z_i - \mu_j\|^2/\alpha\right)^{-\frac{\alpha+1}{2}}}{\sum_{j'} \left(1 + \|z_i - \mu_{j'}\|^2/\alpha\right)^{-\frac{\alpha+1}{2}}},$$

$$p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}},$$

$$f_j = \sum_i q_{ij},$$

wherein $q_{ij}$ is the attribution degree of the face image $x_i$ in the training sample set to the cluster center $\mu_j$, $z_i$ is the age group feature of the face image $x_i$, $\alpha$ denotes the degree of freedom of the t-distribution, $j = 1, \ldots, n$, $j' = 1, \ldots, n$, and $n$ is the number of cluster centers.
A second aspect of the present application provides a face age group recognition apparatus, the apparatus comprising:
the acquisition unit is used for acquiring the face characteristics of each face image in the training sample set;
the pre-training unit is used for pre-training the multi-layer stacked self-coding model by utilizing the training sample set to obtain an initial value of a network parameter of the multi-layer stacked self-coding model;
the adjusting unit is used for encoding the face features of the face images in the training sample set by using the multilayer stack type self-encoding model to obtain the age characteristics of the face images in the training sample set; clustering age group characteristics of all face images in the training sample set to obtain a preset number of clustering centers; calculating the attribution degree of the age group characteristics of each face image in the training sample set to each clustering center, and adjusting the network parameters of the multilayer stack type self-coding model to optimize the attribution degree;
the judging unit is used for judging whether a preset training end condition is met or not, and if the preset training end condition is met, obtaining a well-trained multilayer stack type self-coding model;
the encoding unit is used for encoding the face image to be processed by utilizing the trained multilayer stack type self-encoding model to obtain the age group characteristics of the face image to be processed;
and the identification unit is used for identifying the age group of the face image to be processed according to the age group characteristics of the face image to be processed to obtain the age group type of the face image to be processed.
In another possible implementation manner, the face features include a gradient direction histogram feature and/or a local binary pattern feature.
In another possible implementation manner, the multi-layer stacked self-coding model includes three hidden layers, and the number of neurons in the three hidden layers is 500, and 1000, respectively.
A third aspect of the present application provides a computer device comprising a processor, wherein the processor implements the face age group identification method when executing a computer program stored in a memory.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the face age group identification method.
The method comprises the steps of (a) obtaining the face characteristics of each face image in a training sample set; (b) pre-training a multilayer stack type self-coding model by using the training sample set to obtain an initial value of a network parameter of the multilayer stack type self-coding model; (c) coding the face features of each face image in the training sample set by using the multilayer stack type self-coding model to obtain the age characteristics of each face image in the training sample set; (d) clustering age group characteristics of all face images in the training sample set to obtain a preset number of clustering centers; (e) calculating the attribution degree of the age group characteristics of each face image in the training sample set to each clustering center, and adjusting the network parameters of the multilayer stack type self-coding model to optimize the attribution degree; (f) judging whether a preset training end condition is met, if so, obtaining a well-trained multilayer stacked self-coding model, otherwise, if not, returning to the step (c); (g) coding the face image to be processed by utilizing the trained multilayer stack type self-coding model to obtain the age group characteristics of the face image to be processed; (h) and carrying out age group identification on the face image to be processed according to the age group characteristics of the face image to be processed to obtain the age group type of the face image to be processed.
The invention uses a deep learning stacked self-coding structure, which can continuously revise the feature expression of the face image and finally trains the optimal age group feature expression of the face image. The invention improves on the feature extraction of the prior art: the present training mode requires no convolution calculation, which greatly improves time efficiency. In addition, the encoding process of acquiring the age group features maps the face features to a feature space of smaller dimension, which avoids the curse of dimensionality, reduces the computational complexity of the algorithm, and accelerates the training and recognition processes. Therefore, the invention can realize rapid and efficient face age group identification.
Drawings
Fig. 1 is a flowchart of a multi-layer stacked self-coding model training method according to an embodiment of the present invention.
Fig. 2 is a flowchart of a face age group recognition method according to a second embodiment of the present invention.
Fig. 3 is a structural diagram of a multi-layer stacked self-coding model training apparatus according to a third embodiment of the present invention.
Fig. 4 is a structural diagram of a face age group recognition apparatus according to a fourth embodiment of the present invention.
Fig. 5 is a schematic diagram of a computer device according to a fifth embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention, and the described embodiments are merely a subset of the embodiments of the present invention, rather than a complete embodiment. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Preferably, the face age group identification method of the present invention is applied to one or more computer devices. The computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing equipment. The computer device can be in man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
Example one
Fig. 1 is a flowchart of a multi-layer stacked self-coding model training method according to an embodiment of the present invention. The multilayer stack type self-coding model training method is applied to a computer device. The multi-layer stack type self-coding model training method is used for training a multi-layer stack type self-coding model suitable for face age group recognition (namely, age group recognition is carried out according to face images) so as to be applied to occasions such as safety control, video monitoring, electronic customer relationship management and the like.
As shown in fig. 1, the method for training a multi-layer stacked self-coding model specifically includes the following steps:
101: and acquiring the face features of each face image in the training sample set.
The training sample set of the multilayer stacked self-coding model comprises a plurality of face images with labeled age group types, and each face image is a training sample. For example, the training sample set includes 4000 face images, with 1000 face images labeled as each of the child age group type, the adolescent age group type, the young-and-middle-aged age group type, and the elderly age group type.
Different age group types can be divided as needed, and the face images in the training sample set are labeled according to the divided age group types. For example, four age groups of children, adolescents, middle-aged people and elderly people may be defined. Alternatively, a different division, such as young children, adults, and the elderly, may be used.
In this embodiment, the facial features may include one or more of the following features:
(A) histogram of Oriented Gradient (HOG) feature.
The HOG features focus on the description of the local gradients of the image, with geometric and illumination invariance.
In one embodiment, I is a grayscale image, and I (x, y) is the pixel value of the image I at the pixel point (x, y). The HOG features of the image I can be extracted as follows:
(A1) Image normalization, which can be performed as follows:

$$I(x,y) = I(x,y)^{1/2}.$$
(A2) the gradient of the image is calculated. The calculation formula is as follows:
$$G_x(x,y) = I(x+1,y) - I(x-1,y),$$
$$G_y(x,y) = I(x,y+1) - I(x,y-1),$$
$$G(x,y) = \left(G_x(x,y)^2 + G_y(x,y)^2\right)^{1/2},$$
$$\alpha(x,y) = \tan^{-1}\left(G_y(x,y)/G_x(x,y)\right),$$

wherein $G_x(x,y)$, $G_y(x,y)$, $G(x,y)$, and $\alpha(x,y)$ respectively represent the horizontal gradient value, the vertical gradient value, the gradient magnitude, and the gradient direction of the image I at the pixel point (x, y).
(A3) Dividing the image into a plurality of cells (cells), and constructing a gradient direction histogram for each cell to obtain the HOG feature vector of each cell.
For example, a 32 × 32 image is divided into 64 cells of 4 × 4 pixels each; the 360-degree gradient direction range is evenly divided into 9 direction bins, and the gradient magnitudes of the pixels whose gradient directions fall within each bin are accumulated, forming a 9-dimensional feature vector per cell.
(A4) Combining a plurality of adjacent unit grids into an image block (block), and normalizing the gradient histogram in the image block to obtain the HOG characteristic vector of each image block.
For example, four adjacent cells form one image block, and the 36-dimensional feature vector of the block is denoted $z = [z_1, z_2, z_3, \ldots, z_{36}]$. Normalizing according to the formula $v_i = (z_i - \min(z)) / (\max(z) - \min(z))$ yields the HOG feature vector of the image block, $v = [v_1, v_2, v_3, \ldots, v_{36}]$.
(A5) And connecting HOG feature vectors of all the image blocks in series to obtain the HOG feature vectors of the images.
Further HOG feature extraction methods can refer to the prior art, and are not described herein again.
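For illustration only, steps (A1) to (A5) can be sketched in Python/NumPy as follows. This is a simplified, hypothetical implementation (the overlapping 2 × 2-cell block layout and the zeroed border gradients are assumptions), not the patent's exact code:

```python
import numpy as np

def hog_features(img, cell=4, bins=9):
    # (A1) Image normalization: I(x,y) <- I(x,y)^(1/2)
    img = np.sqrt(img.astype(np.float64))
    # (A2) Central-difference gradients (borders left at zero for brevity)
    gx = np.zeros_like(img); gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 360.0
    # (A3) Per-cell histograms: 360 degrees split evenly into 9 bins
    h, w = img.shape
    ch, cw = h // cell, w // cell
    bin_idx = (ang // (360.0 / bins)).astype(int) % bins
    hist = np.zeros((ch, cw, bins))
    for i in range(ch):
        for j in range(cw):
            ys = slice(i * cell, (i + 1) * cell)
            xs = slice(j * cell, (j + 1) * cell)
            for b in range(bins):
                hist[i, j, b] = mag[ys, xs][bin_idx[ys, xs] == b].sum()
    # (A4) 2x2-cell blocks (assumed layout), min-max normalized per block
    feats = []
    for i in range(ch - 1):
        for j in range(cw - 1):
            z = hist[i:i + 2, j:j + 2].ravel()          # 36-dim vector z
            feats.append((z - z.min()) / (z.max() - z.min() + 1e-12))
    # (A5) Concatenate all block vectors into the image's HOG feature
    return np.concatenate(feats)
```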
(B) Local Binary Pattern (LBP) feature.
The LBP features focus on the description of local texture of the image, with rotation invariance and gray-scale invariance.
In one embodiment, the LBP features of the image may be extracted as follows:
(B1) the image is divided into a number of cells. For example, a 32 × 32 image is divided into 4 cells, each having 16 × 16 pixel points.
(B2) For each pixel point in each cell, the gray values of the 8 neighboring pixel points centered on that pixel point are compared with the gray value of the pixel point itself: if a neighbor's gray value is greater, its position is marked as 1; otherwise it is marked as 0. The generated 8-bit binary number is the LBP value of the center pixel point.
(B3) A statistical histogram of the LBP values within each cell is calculated.
(B4) And normalizing the statistical histogram of the LBP value in each cell to obtain the LBP characteristic vector of each cell.
(B5) And connecting the LBP characteristic vectors of all the cells in series to obtain the LBP characteristic vector of the image.
Further LBP feature extraction methods can refer to the prior art, and are not described herein.
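For illustration only, steps (B1) to (B5) can be sketched as follows; the 256-bin histogram and the skipped border pixels are assumptions made for brevity:

```python
import numpy as np

def lbp_features(img, cell=16):
    # (B1) The image is divided into cells of `cell` x `cell` pixels.
    h, w = img.shape
    lbp = np.zeros((h, w), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    # (B2) Compare each pixel with its 8 neighbors (borders skipped)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            code = 0
            for k, (dy, dx) in enumerate(offsets):
                code |= int(img[y + dy, x + dx] > img[y, x]) << k
            lbp[y, x] = code
    feats = []
    for i in range(0, h, cell):
        for j in range(0, w, cell):
            # (B3) Statistical histogram of LBP values within the cell
            hist, _ = np.histogram(lbp[i:i + cell, j:j + cell],
                                   bins=256, range=(0, 256))
            # (B4) Normalize the histogram
            feats.append(hist / (hist.sum() + 1e-12))
    # (B5) Concatenate the per-cell vectors into the image's LBP feature
    return np.concatenate(feats)
```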
102: and pre-training the multilayer stack type self-coding model by utilizing the training sample set to obtain the initial value of the network parameter of the multilayer stack type self-coding model.
The multilayer stack type self-coding model is a neural network formed by multilayer self-encoders, and the output of the front layer self-encoder is used as the input of the rear layer self-encoder. In one embodiment, the multi-layer stacked self-coding model may include an input layer and three hidden layers, and the number of neurons in the three hidden layers is 500, and 1000, respectively. The input layer and the three hidden layers constitute a stack of three self-encoders.
The self-encoder is divided into two parts, one is an encoding part and the other is a decoding part. The encoding part encodes input data (such as face features of each face image in a training sample set), namely, the input data is compressed and mapped to obtain encoded data, and the decoding part decodes the encoded data to obtain reconstructed output data. The principle of the self-encoder is to try to learn a function such that the output data generated by the reconstruction is as equal as possible to the input data. Therefore, by limiting the number of hidden neurons, the self-encoder can learn the compressed representation of the input data, and the self-encoder can discover the main characteristic expression for some specific structures hidden in the input data.
Pre-training of the multi-layer stacked self-coding model is an unsupervised learning process. The pre-training process does not need the age group types labeled on the face images in the training sample set; pre-training is performed according to the principle of the self-encoder (the output should equal the input as closely as possible). Greedy layer-by-layer pre-training is performed on the multi-layer stacked self-coding model using a restricted Boltzmann machine to obtain the initial values of the network parameters of the model.
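As a minimal illustrative sketch of such greedy layer-wise pre-training, scikit-learn's BernoulliRBM can be chained as below. The input matrix, data shape and layer sizes are hypothetical placeholders, not values taken from the patent:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# X: (n_samples, n_features) face-feature matrix, scaled to [0, 1]
# (BernoulliRBM expects inputs in that range); stand-in data for illustration.
X = np.random.rand(4000, 1296)

# Hypothetical sizes for the three hidden layers described in this embodiment.
layer_sizes = [500, 500, 1000]

rbms, h = [], X
for n_hidden in layer_sizes:
    rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.01,
                       n_iter=20, random_state=0)
    h = rbm.fit_transform(h)   # greedy layer-wise: activations feed next RBM
    rbms.append(rbm)

# The learned weights and biases initialize the corresponding encoder layers.
init_params = [(r.components_.T, r.intercept_hidden_) for r in rbms]
```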
103: and coding the face features of each face image in the training sample set by using a multilayer stack type self-coding model to obtain the age characteristics of each face image in the training sample set.
As previously described, the self-encoder is divided into an encoding portion and a decoding portion. The method comprises the steps of utilizing a multi-layer stack type self-coding model to code face features of all face images in a training sample set, namely utilizing a coding part of a self-coder of the multi-layer stack type self-coding model to code the face features to obtain age-group features of all face images in the training sample set.
The face features of each face image in the training sample set are encoded by using a multi-layer stacked self-encoding model, so as to obtain better feature expression. The age characteristics obtained by coding have better age differentiation than the human face characteristics, and are suitable for identifying the age of the human face image. For example, the face features of each face image in the training sample set are encoded by using a multi-layer stacked self-encoding model, so that fine texture features and geometric features in the face images can be obtained.
Encoding the face features maps them to a feature space of smaller dimension, which avoids the curse of dimensionality, reduces the computational complexity of the algorithm, and accelerates the training process.
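Continuing the hypothetical pre-training sketch above, the encoding part is simply the stacked forward mapping (the sigmoid activation is an assumption):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def encode(x, init_params):
    # Stacked forward mapping through the encoder layers; init_params is the
    # list of (weights, biases) produced by the pre-training sketch above.
    h = x
    for W, b in init_params:
        h = sigmoid(h @ W + b)   # each layer compresses/maps its input
    return h                     # the age group feature z of the input
```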
104: and clustering the age group characteristics of all the face images in the training sample set to obtain a preset number of clustering centers.
The preset number is the number of different divided age bracket types. For example, four age groups including a child age group type, a teenage group type, a middle and young age group type, and an old age group type are divided, i.e., the preset number is 4.
Each cluster center corresponds to an age group type. For example, 4 clustering centers are obtained by clustering, and correspond to the age group type of children, the age group type of teenagers, the age group type of middle-aged and young children, and the age group type of old people, respectively.
The age class features of each face image in the training sample set can be clustered by using a GMM (Gaussian Mixture Model) or a K-Means algorithm to obtain a preset number of cluster centers. For example, the age bracket features of each face image in the training sample set are clustered by using a Gaussian mixture model GMM or a K-Means algorithm with the clustering center number of 4 to obtain 4 clustering centers.
Other clustering algorithms can also be used to cluster the age group features of each face image in the training sample set. For example, a DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm with 4 cluster centers is used to cluster the age group features of each face image in the training sample set to obtain 4 cluster centers.
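A minimal sketch of this clustering step with scikit-learn, where Z stands in for the matrix of encoded age group features (all names are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

Z = np.random.rand(4000, 10)     # stand-in for the encoded age group features

# K-Means with 4 cluster centers, one per age group type.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(Z)
centers = kmeans.cluster_centers_

# GMM variant: the mean of each Gaussian component serves as a cluster center.
gmm = GaussianMixture(n_components=4, random_state=0).fit(Z)
gmm_centers = gmm.means_
```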
105: and calculating the attribution degree of the age class characteristics of each face image in the training sample set to each clustering center, and adjusting the network parameters of the multilayer stack type self-coding model to optimize the attribution degree.
For each face image $x_i$ in the training sample set, the t-distribution attribution degree of $x_i$ to each cluster center $\mu_j$, $j = 1, \ldots, n$, can be calculated, which represents the probability that $x_i$ belongs to $\mu_j$. The attribution degree is calculated as follows:

$$q_{ij} = \frac{\left(1 + \|z_i - \mu_j\|^2/\alpha\right)^{-\frac{\alpha+1}{2}}}{\sum_{j'} \left(1 + \|z_i - \mu_{j'}\|^2/\alpha\right)^{-\frac{\alpha+1}{2}}},$$

wherein $z_i$ represents the age group feature of the face image $x_i$, $\alpha$ is the degree of freedom of the t-distribution ($\alpha$ may be 1), $j' = 1, \ldots, n$, and $n$ denotes the number of cluster centers obtained in step 104 (e.g., 4). For example, if the attribution degree of a face image to the cluster center $\mu_2$ is the highest, the age group type of the face image is identified as the age group type corresponding to $\mu_2$.
An auxiliary target distribution P may be used to optimize the attribution degree. Optimizing the attribution degree refers to maximizing the correct attribution degree.

The KL divergence can be used as the objective function for the optimization, as follows:

$$L = \mathrm{KL}(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}},$$

wherein Q is the feature space composed of the $q_{ij}$ and P is the feature space composed of the $p_{ij}$, with

$$p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}, \qquad f_j = \sum_i q_{ij}.$$

The network parameters of the multilayer stacked self-coding model are adjusted to optimize the objective function L, that is, to optimize the attribution degree.
106: and judging whether a preset training ending condition is met.
Whether the identification accuracy of the age group types of the face images in the training sample set is greater than or equal to a first threshold (e.g., 99%) can be judged; if so, it is determined that the preset training end condition is satisfied. For example, the training sample set includes 4000 face images; the recognition results of 3980 face images are correct (i.e., the recognized age group type is the same as the labeled age group type) and the recognition results of 20 face images are incorrect (i.e., the recognized age group type differs from the labeled age group type), so the recognition accuracy is 3980/4000 = 99.5%, which is greater than the first threshold of 99%, and thus the preset training end condition is satisfied.
Alternatively, it may be determined whether the change rate of the recognition results of the age group types of the face images in the training sample set is less than or equal to a second threshold (e.g., 0.1%); if so, it is determined that the preset training end condition is satisfied. The change rate is the proportion of recognition results that changed in the current adjustment round compared with the previous round. For example, if the training sample set includes 4000 face images and the recognition results of the age group types of 2 face images change in the current round compared with the previous round, the change rate of the recognition results is 2/4000 = 0.05%, which is less than the second threshold of 0.1%, so the preset training end condition is satisfied.
Alternatively, it may be determined whether the adjusted number of rounds reaches a preset number of rounds (for example, 400 rounds), and if the adjusted number of rounds reaches the preset number of rounds, it is determined that the preset training end condition is satisfied.
If the preset training end condition is not satisfied, for example, if the identification accuracy of the age group type of each face image in the training sample set is smaller than the first threshold, the process returns to step 103.
Otherwise, if a preset training end condition is met, for example, if the identification accuracy of the age group type of each face image in the training sample set is greater than or equal to a first threshold, a trained multi-layer stacked self-coding model is obtained, and the process is ended.
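A schematic check of the three alternative stopping criteria described above might look as follows (a sketch under assumed names and thresholds):

```python
import numpy as np

def training_finished(pred, labels, prev_pred, epoch,
                      acc_thresh=0.99, change_thresh=0.001, max_epochs=400):
    # pred / prev_pred: age group types recognized in the current / previous
    # round; labels: the annotated age group types (all 1-D numpy arrays).
    accuracy = float((pred == labels).mean())        # criterion 1
    change_rate = float((pred != prev_pred).mean())  # criterion 2
    return (accuracy >= acc_thresh
            or change_rate <= change_thresh
            or epoch >= max_epochs)                  # criterion 3
```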
The method for training the multilayer stack type self-coding model comprises the following steps of (a) obtaining face features of face images in a training sample set; (b) pre-training a multilayer stack type self-coding model by using the training sample set to obtain an initial value of a network parameter of the multilayer stack type self-coding model; (c) coding the face features of each face image in the training sample set by using the multilayer stack type self-coding model to obtain the age characteristics of each face image in the training sample set; (d) clustering age group characteristics of all face images in the training sample set to obtain a preset number of clustering centers; (e) calculating the attribution degree of the age group characteristics of each face image in the training sample set to each clustering center, and adjusting the network parameters of the multilayer stack type self-coding model to optimize the attribution degree; (f) and (c) judging whether a preset training end condition is met, if so, obtaining a trained multilayer stacked self-coding model, otherwise, if not, returning to the step (c).
The existing models used for face age group identification include the convolutional neural network and the aging pattern subspace. The convolutional neural network needs to repeatedly perform convolution calculations on the image, so extracting age group features takes a long time and time efficiency is low. The aging pattern subspace needs to splice the features of all age groups into one large vector, which easily causes the curse of dimensionality. The first embodiment uses a deep learning stacked self-coding structure in which forward and backward propagation are performed repeatedly, continuously revising the feature representation of the face image until the optimal age group feature representation is trained. This embodiment improves on the feature extraction of the prior art: the training procedure requires no convolution calculation, which greatly improves time efficiency. In addition, the encoding process for acquiring the age group features maps the face features to a feature space of smaller dimension, which avoids the curse of dimensionality, reduces the computational complexity of the algorithm, and accelerates the training process. Therefore, the first embodiment can quickly and efficiently train a multi-layer stacked self-coding model for face age group recognition.
Example two
Fig. 2 is a flowchart of a face age group recognition method according to a second embodiment of the present invention. The face age group identification method is applied to a computer device. The face age group identification method can be applied to occasions such as safety control, video monitoring, electronic customer relationship management and the like. The method trains a multilayer stack type self-coding model, and uses the trained multilayer stack type self-coding model to identify the age group of the face of a face image to be processed.
As shown in fig. 2, the face age group identification method specifically includes the following steps:
201: and acquiring a training sample set of the multilayer stack type self-coding model, and acquiring the face characteristics of each face image in the training sample set.
Step 201 in this embodiment is the same as step 101 in the first embodiment, and please refer to the related description of step 101 in the first embodiment, which is not described herein again.
202: establishing a multi-layer stacked self-coding model, and pre-training the multi-layer stacked self-coding model by using the training sample set to obtain an initial value of a network parameter of the multi-layer stacked self-coding model.
Step 202 in this embodiment is the same as step 102 in the first embodiment, and please refer to the related description of step 102 in the first embodiment, which is not described herein again.
203: and coding the face features of each face image in the training sample set by using the multilayer stack type self-coding model to obtain the age characteristics of each face image.
Step 203 in this embodiment is the same as step 103 in the first embodiment, and please refer to the related description of step 103 in the first embodiment, which is not described herein again.
204: and clustering the age group characteristics of all the face images in the training sample set to obtain a preset number of clustering centers.
Step 204 in this embodiment is the same as step 104 in the first embodiment, and please refer to the related description of step 104 in the first embodiment, which is not described herein again.
205: and calculating the attribution degree of the age group characteristics of each face image in the training sample set to each clustering center, and adjusting the network parameters of the multilayer stack type self-coding model to optimize the attribution degree.
Step 205 in this embodiment is the same as step 105 in the first embodiment, and please refer to the related description of step 105 in the first embodiment, which is not described herein again.
206: judging whether a preset training end condition is met, if so, obtaining a trained multilayer stacked self-coding model, and executing step 207; otherwise, if the preset training end condition is not satisfied, the procedure returns to step 203.
Step 206 in this embodiment is the same as step 106 in the first embodiment, and please refer to the related description of step 106 in the first embodiment, which is not described herein again.
207: and coding the face image to be processed by using the trained multilayer stack type self-coding model to obtain the age characteristics of the face image to be processed.
When the age group identification of the face image to be processed is needed, the face image to be processed is received, the trained multilayer stack type self-coding model is used for coding the face image to be processed, and the age group characteristics of the face image to be processed are obtained.
208: and performing age bracket identification on the face image to be processed according to the age bracket characteristics of the face image to be processed to obtain the age bracket type of the face image to be processed.
For example, the age bracket identification is performed on the face image to be processed according to the age bracket characteristics of the face image to be processed, and the obtained age bracket type of the face image to be processed is an old age bracket type.
Age group recognition can be performed on the face image to be processed by using a Decision Tree. Decision trees are tree structures applied to classification. Each internal node in the decision tree represents a test on a certain attribute, each edge represents a test result, the leaf nodes represent a class or a distribution over the classes, and the topmost node is the root node. Other classifiers can also be used for the age group identification of the face image to be processed; for example, a Softmax classifier may be used.
The prior art can be referred to for age bracket identification of the face image to be processed according to the age bracket characteristics of the face image to be processed, and details are not repeated here.
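As an illustrative sketch (hypothetical data and names), a scikit-learn decision tree could map age group features to age group types:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

Z_train = np.random.rand(4000, 10)             # age group features (stand-in)
y_train = np.random.randint(0, 4, size=4000)   # labeled age group types
z_new = np.random.rand(10)                     # feature of the image to process

clf = DecisionTreeClassifier(random_state=0).fit(Z_train, y_train)
age_group_type = clf.predict(z_new.reshape(1, -1))[0]
```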
The face age group identification method of the second embodiment (a) obtains the face features of each face image in a training sample set; (b) pre-training a multilayer stack type self-coding model by using the training sample set to obtain an initial value of a network parameter of the multilayer stack type self-coding model; (c) coding the face features of each face image in the training sample set by using the multilayer stack type self-coding model to obtain the age characteristics of each face image in the training sample set; (d) clustering age group characteristics of all face images in the training sample set to obtain a preset number of clustering centers; (e) calculating the attribution degree of the age group characteristics of each face image in the training sample set to each clustering center, and adjusting the network parameters of the multilayer stack type self-coding model to optimize the attribution degree; (f) judging whether a preset training end condition is met, if so, obtaining a well-trained multilayer stacked self-coding model, otherwise, if not, returning to the step (c); (g) coding the face image to be processed by utilizing the trained multilayer stack type self-coding model to obtain the age group characteristics of the face image to be processed; (h) and carrying out age group identification on the face image to be processed according to the age group characteristics of the face image to be processed to obtain the age group type of the face image to be processed.
The existing face age group identification techniques include the convolutional neural network and the aging pattern subspace. The convolutional neural network needs to repeatedly perform convolution calculations on the image, so extracting age group features takes a long time and time efficiency is low. The aging pattern subspace needs to splice the features of all age groups into one large vector, which easily causes the curse of dimensionality. The second embodiment uses a deep learning stacked self-coding structure in which forward and backward propagation are performed repeatedly, continuously revising the feature representation of the face image until the optimal age group feature representation is trained. The second embodiment improves on the feature extraction of the prior art: the training procedure requires no convolution calculation, which greatly improves time efficiency. In addition, the encoding process for acquiring the age group features in the second embodiment maps the face features to a feature space of smaller dimension, which avoids the curse of dimensionality, reduces the computational complexity of the algorithm, and accelerates the training and recognition processes. Therefore, the second embodiment can realize fast and efficient face age group recognition.
EXAMPLE III
Fig. 3 is a structural diagram of a multi-layer stacked self-coding model training apparatus according to a third embodiment of the present invention. The multi-layer stacked self-coding model training device 10 is applied to a computer device. The multi-layer stack type self-coding model training device 10 trains a multi-layer stack type self-coding model suitable for face age group recognition (namely, age group recognition is carried out according to face images) so as to be applied to occasions such as safety control, video monitoring, electronic customer relationship management and the like.
As shown in fig. 3, the multi-layer stacked self-coding model training apparatus 10 may include: acquisition unit 301, pre-training unit 302, adjustment unit 303, and determination unit 304.
An obtaining unit 301, configured to obtain face features of face images in the training sample set.
The training sample set of the multilayer stacked self-coding model comprises a plurality of face images with labeled age group types, and each face image is a training sample. For example, the training sample set includes 4000 face images, with 1000 face images labeled as each of the child age group type, the adolescent age group type, the young-and-middle-aged age group type, and the elderly age group type.
Different age group types can be divided as needed, and the face images in the training sample set are labeled according to the divided age group types. For example, four age groups of children, adolescents, middle-aged people and elderly people may be defined. Alternatively, a different division, such as young children, adults, and the elderly, may be used.
In this embodiment, the facial features may include one or more of the following features:
(A) histogram of Oriented Gradient (HOG) feature.
The HOG features focus on the description of the local gradients of the image, with geometric and illumination invariance.
In one embodiment, I is a grayscale image, and I (x, y) is the pixel value of the image I at the pixel point (x, y). The HOG features of the image I can be extracted as follows:
(A1) Image normalization, which can be performed as follows:

$$I(x,y) = I(x,y)^{1/2}.$$
(A2) the gradient of the image is calculated. The calculation formula is as follows:
$$G_x(x,y) = I(x+1,y) - I(x-1,y),$$
$$G_y(x,y) = I(x,y+1) - I(x,y-1),$$
$$G(x,y) = \left(G_x(x,y)^2 + G_y(x,y)^2\right)^{1/2},$$
$$\alpha(x,y) = \tan^{-1}\left(G_y(x,y)/G_x(x,y)\right),$$

wherein $G_x(x,y)$, $G_y(x,y)$, $G(x,y)$, and $\alpha(x,y)$ respectively represent the horizontal gradient value, the vertical gradient value, the gradient magnitude, and the gradient direction of the image I at the pixel point (x, y).
(A3) Dividing the image into a plurality of cells (cells), and constructing a gradient direction histogram for each cell to obtain the HOG feature vector of each cell.
For example, a 32 × 32 image is divided into 64 cells of 4 × 4 pixels each; the 360-degree gradient direction range is evenly divided into 9 direction bins, and the gradient magnitudes of the pixels whose gradient directions fall within each bin are accumulated, forming a 9-dimensional feature vector per cell.
(A4) Combining a plurality of adjacent unit grids into an image block (block), and normalizing the gradient histogram in the image block to obtain the HOG characteristic vector of each image block.
For example, four adjacent cells form one image block, and the 36-dimensional feature vector of the block is denoted $z = [z_1, z_2, z_3, \ldots, z_{36}]$. Normalizing according to the formula $v_i = (z_i - \min(z)) / (\max(z) - \min(z))$ yields the HOG feature vector of the image block, $v = [v_1, v_2, v_3, \ldots, v_{36}]$.
(A5) And connecting HOG feature vectors of all the image blocks in series to obtain the HOG feature vectors of the images.
Further HOG feature extraction methods can refer to the prior art, and are not described herein again.
(B) Local Binary Pattern (LBP) feature.
The LBP features focus on the description of local texture of the image, with rotation invariance and gray-scale invariance.
In one embodiment, the LBP features of the image may be extracted as follows:
(B1) the image is divided into a number of cells. For example, a 32 × 32 image is divided into 4 cells, each having 16 × 16 pixel points.
(B2) For each pixel point in each cell, the gray values of the 8 neighboring pixel points centered on that pixel point are compared with the gray value of the pixel point itself: if a neighbor's gray value is greater, its position is marked as 1; otherwise it is marked as 0. The generated 8-bit binary number is the LBP value of the center pixel point.
(B3) A statistical histogram of the LBP values within each cell is calculated.
(B4) And normalizing the statistical histogram of the LBP value in each cell to obtain the LBP characteristic vector of each cell.
(B5) And connecting the LBP characteristic vectors of all the cells in series to obtain the LBP characteristic vector of the image.
Further LBP feature extraction methods can refer to the prior art, and are not described herein.
And a pre-training unit 302, configured to pre-train the multi-layer stacked self-coding model by using a training sample set, so as to obtain an initial value of a network parameter of the multi-layer stacked self-coding model.
The multilayer stack type self-coding model is a neural network formed by multilayer self-encoders, and the output of the front layer self-encoder is used as the input of the rear layer self-encoder. In one embodiment, the multi-layer stacked self-coding model may include an input layer and three hidden layers, and the number of neurons in the three hidden layers is 500, and 1000, respectively. The input layer and the three hidden layers constitute a stack of three self-encoders.
The self-encoder is divided into two parts, one is an encoding part and the other is a decoding part. The encoding part encodes input data (such as face features of each face image in a training sample set), namely, the input data is compressed and mapped to obtain encoded data, and the decoding part decodes the encoded data to obtain reconstructed output data. The principle of the self-encoder is to try to learn a function such that the output data generated by the reconstruction is as equal as possible to the input data. Therefore, by limiting the number of hidden neurons, the self-encoder can learn the compressed representation of the input data, and the self-encoder can discover the main characteristic expression for some specific structures hidden in the input data.
Pre-training of the multi-layer stacked self-coding model is an unsupervised learning process. The pre-training process does not need the age group types labeled on the face images in the training sample set; pre-training is performed according to the principle of the self-encoder (the output should equal the input as closely as possible). Greedy layer-by-layer pre-training is performed on the multi-layer stacked self-coding model using a restricted Boltzmann machine to obtain the initial values of the network parameters of the model.
The adjusting unit 303 is configured to encode the face features of each face image in the training sample set by using the multi-layer stacked self-encoding model, so as to obtain the age-related features of each face image in the training sample set.
As previously described, the self-encoder is divided into an encoding portion and a decoding portion. The method comprises the steps of utilizing a multi-layer stack type self-coding model to code face features of all face images in a training sample set, namely utilizing a coding part of a self-coder of the multi-layer stack type self-coding model to code the face features to obtain age-group features of all face images in the training sample set.
The face features of each face image in the training sample set are encoded by using a multi-layer stacked self-encoding model, so as to obtain better feature expression. The age characteristics obtained by coding have better age differentiation than the human face characteristics, and are suitable for identifying the age of the human face image. For example, the face features of each face image in the training sample set are encoded by using a multi-layer stacked self-encoding model, so that fine texture features and geometric features in the face images can be obtained.
Encoding the face features maps them to a feature space of smaller dimension, which avoids the curse of dimensionality, reduces the computational complexity of the algorithm, and accelerates the training process.
The adjusting unit 303 is further configured to cluster the age group characteristics of each face image in the training sample set to obtain a preset number of cluster centers.
The preset number is the number of different divided age bracket types. For example, four age groups including a child age group type, a teenage group type, a middle and young age group type, and an old age group type are divided, i.e., the preset number is 4.
Each cluster center corresponds to an age group type. For example, 4 clustering centers are obtained by clustering, and correspond to the age group type of children, the age group type of teenagers, the age group type of middle-aged and young children, and the age group type of old people, respectively.
The age class features of each face image in the training sample set can be clustered by using a GMM (Gaussian Mixture Model) or a K-Means algorithm to obtain a preset number of cluster centers. For example, the age bracket features of each face image in the training sample set are clustered by using a Gaussian mixture model GMM or a K-Means algorithm with the clustering center number of 4 to obtain 4 clustering centers.
Other clustering algorithms can also be used to cluster the age group features of each face image in the training sample set. For example, a DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm with 4 cluster centers is used to cluster the age group features of each face image in the training sample set to obtain 4 cluster centers.
The adjusting unit 303 is further configured to calculate an attribution degree of the age group feature of each face image in the training sample set to each cluster center, and adjust a network parameter of the multi-layer stacked self-coding model to optimize the attribution degree.
For each face image $x_i$ in the training sample set, the t-distribution attribution degree of $x_i$ to each cluster center $\mu_j$, $j = 1, \ldots, n$, can be calculated, which represents the probability that $x_i$ belongs to $\mu_j$. The attribution degree is calculated as follows:

$$q_{ij} = \frac{\left(1 + \|z_i - \mu_j\|^2/\alpha\right)^{-\frac{\alpha+1}{2}}}{\sum_{j'} \left(1 + \|z_i - \mu_{j'}\|^2/\alpha\right)^{-\frac{\alpha+1}{2}}},$$

wherein $z_i$ represents the age group feature of the face image $x_i$, $\alpha$ is the degree of freedom of the t-distribution ($\alpha$ may be 1), $j' = 1, \ldots, n$, and $n$ denotes the number of cluster centers (e.g., 4). For example, if the attribution degree of a face image to the cluster center $\mu_2$ is the highest, the age group type of the face image is identified as the age group type corresponding to $\mu_2$.
An auxiliary target distribution P may be used to optimize the attribution degree. Optimizing the attribution degree refers to maximizing the correct attribution degree.

The KL divergence can be used as the objective function for the optimization, as follows:

$$L = \mathrm{KL}(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}},$$

wherein Q is the feature space composed of the $q_{ij}$ and P is the feature space composed of the $p_{ij}$, with

$$p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}, \qquad f_j = \sum_i q_{ij}.$$

The network parameters of the multilayer stacked self-coding model are adjusted to optimize the objective function L, that is, to optimize the attribution degree.
The determining unit 304 is configured to determine whether a preset training end condition is met, and if the preset training end condition is met, obtain a trained multi-layer stacked self-coding model.
Whether the identification accuracy of the age group types of the face images in the training sample set is greater than or equal to a first threshold (e.g., 99%) can be judged; if so, it is determined that the preset training end condition is satisfied. For example, the training sample set includes 4000 face images; the recognition results of 3980 face images are correct (i.e., the recognized age group type is the same as the labeled age group type) and the recognition results of 20 face images are incorrect (i.e., the recognized age group type differs from the labeled age group type), so the recognition accuracy is 3980/4000 = 99.5%, which is greater than the first threshold of 99%, and thus the preset training end condition is satisfied.
Alternatively, it may be determined whether the change rate of the recognition results of the age group types of the face images in the training sample set is less than or equal to a second threshold (e.g., 0.1%); if so, it is determined that the preset training end condition is satisfied. The change rate is the proportion of recognition results that changed in the current adjustment round compared with the previous round. For example, if the training sample set includes 4000 face images and the recognition results of the age group types of 2 face images change in the current round compared with the previous round, the change rate of the recognition results is 2/4000 = 0.05%, which is less than the second threshold of 0.1%, so the preset training end condition is satisfied.
Alternatively, it may be determined whether the adjusted number of rounds reaches a preset number of rounds (for example, 400 rounds), and if the adjusted number of rounds reaches the preset number of rounds, it is determined that the preset training end condition is satisfied.
If the preset training end condition is not met, for example, if the identification accuracy of the age group type of each face image in the training sample set is smaller than the first threshold, the adjusting unit 303 continues to adjust the network parameters.
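The three end conditions above can be checked with a small helper. A minimal sketch, assuming the counts of correct and changed recognition results are tracked during training; the thresholds mirror the examples above and everything else is an assumption:

```python
def training_should_end(correct, changed, total, rounds,
                        first_threshold=0.99, second_threshold=0.001,
                        preset_rounds=400):
    """True if any of the three preset training end conditions holds."""
    accuracy = correct / total     # e.g. 3980 / 4000 = 0.995 >= 0.99
    change_rate = changed / total  # e.g. 2 / 4000 = 0.0005 <= 0.001
    return (accuracy >= first_threshold
            or change_rate <= second_threshold
            or rounds >= preset_rounds)
```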
The multi-layer stacked self-coding model training device of the third embodiment (a) obtains the face features of each face image in a training sample set; (b) pre-trains a multi-layer stacked self-coding model with the training sample set to obtain initial values of the network parameters of the model; (c) encodes the face features of each face image in the training sample set with the model to obtain the age group features of each face image; (d) clusters the age group features of all face images in the training sample set to obtain a preset number of cluster centers; (e) calculates the attribution degree of the age group features of each face image to each cluster center and adjusts the network parameters of the model to optimize the attribution degree; and (f) judges whether a preset training end condition is met and, if so, obtains the trained multi-layer stacked self-coding model.
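Read as a whole, steps (c) through (f) form one iterative loop. The following runnable toy sketch uses PyTorch to tie the pieces together; the tiny two-layer encoder, random stand-in features, four centers, and learning rate are all illustrative assumptions, not the patent's actual model:

```python
import torch

torch.manual_seed(0)
x = torch.randn(200, 64)        # stand-in face features (HOG/LBP in the patent)
encoder = torch.nn.Sequential(  # stand-in for the stacked self-encoder
    torch.nn.Linear(64, 32), torch.nn.Sigmoid(), torch.nn.Linear(32, 10))

with torch.no_grad():           # (d) initialize 4 cluster centers from encoded points
    z0 = encoder(x)
centers = torch.nn.Parameter(z0[torch.randperm(len(z0))[:4]].clone())
opt = torch.optim.SGD(list(encoder.parameters()) + [centers], lr=0.01)

alpha = 1.0
for _ in range(400):                            # (f) round limit as end condition
    z = encoder(x)                              # (c) age group features
    d2 = ((z[:, None, :] - centers[None]) ** 2).sum(-1)
    q = (1 + d2 / alpha) ** (-(alpha + 1) / 2)  # (e) t-distribution attribution
    q = q / q.sum(1, keepdim=True)
    with torch.no_grad():                       # auxiliary target distribution P
        w = q ** 2 / q.sum(0)
        p = w / w.sum(1, keepdim=True)
    loss = (p * (p / q).log()).sum()            # L = KL(P || Q)
    opt.zero_grad()
    loss.backward()                             # adjust network parameters and centers
    opt.step()

age_group = q.argmax(1)                         # most-attributed center per image
```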
Existing models for face age group identification include the convolutional neural network and the aging pattern subspace. A convolutional neural network must repeatedly perform convolution over the image, so extracting age group features is slow and time efficiency is low. The aging pattern subspace must concatenate the features of all age groups into one large vector, which easily causes the curse of dimensionality. The third embodiment instead uses a deep-learning stacked self-coding structure that can repeatedly propagate forward and back-propagate to adjust parameters, continuously refining the feature representation of the face image until an optimal age group feature representation is trained. This addresses the feature-extraction problem of the prior art: the training procedure needs no convolution computation, which greatly improves time efficiency. In addition, the encoding that produces the age group features maps the face features into a lower-dimensional feature space, avoiding the curse of dimensionality, reducing the computational complexity of the algorithm, and speeding up training. The third embodiment can therefore train a multi-layer stacked self-coding model for face age group recognition quickly and efficiently.
Example four
Fig. 4 is a structural diagram of a face age group recognition apparatus according to a fourth embodiment of the present invention. The face age group recognition device 11 is applied to a computer device and can be used in scenarios such as security control, video surveillance, and electronic customer relationship management. The device 11 trains the multi-layer stacked self-coding model and uses the trained model to identify the age group of a face image to be processed.
As shown in fig. 4, the face age group recognition device 11 may include: an obtaining unit 401, a pre-training unit 402, an adjusting unit 403, a determining unit 404, an encoding unit 405, and an identifying unit 406.
An obtaining unit 401 is configured to obtain face features of face images in the training sample set.
The obtaining unit 401 in this embodiment is the same as the obtaining unit 301 in the third embodiment, and please refer to the related description of the obtaining unit 301 in the third embodiment, which is not described herein again.
And a pre-training unit 402, configured to pre-train the multi-layer stacked self-coding model by using a training sample set, so as to obtain an initial value of a network parameter of the multi-layer stacked self-coding model.
The pre-training unit 402 in this embodiment is the same as the pre-training unit 302 in the third embodiment, and specific reference is made to the description of the pre-training unit 302 in the third embodiment, which is not repeated herein.
An adjusting unit 403, configured to encode the face features of each face image in the training sample set by using a multi-layer stacked self-encoding model, to obtain age-group features of each face image in the training sample set; clustering the age group characteristics of all the face images in the training sample set to obtain a preset number of clustering centers; and calculating the attribution degree of the age class characteristics of each face image in the training sample set to each clustering center, and adjusting the network parameters of the multilayer stack type self-coding model to optimize the attribution degree.
The adjusting unit 403 in this embodiment is the same as the adjusting unit 303 in the third embodiment, and please refer to the related description of the adjusting unit 303 in the third embodiment, which is not described herein again.
The determining unit 404 is configured to determine whether a preset training end condition is met, and if the preset training end condition is met, obtain a trained multi-layer stacked self-coding model.
The determining unit 404 in this embodiment is the same as the determining unit 304 in the third embodiment, and please refer to the related description of the determining unit 304 in the third embodiment, which is not described herein again.
And the encoding unit 405 is configured to encode the face image to be processed by using the trained multilayer stacked self-encoding model, so as to obtain the age group characteristics of the face image to be processed.
When the age group identification of the face image to be processed is needed, the face image to be processed is received, the trained multilayer stack type self-coding model is used for coding the face image to be processed, and the age group characteristics of the face image to be processed are obtained.
And the identifying unit 406 is configured to perform age group identification on the face image to be processed according to the age group characteristics of the face image to be processed, so as to obtain an age group type of the face image to be processed.
For example, performing age group identification on the face image to be processed according to its age group features may yield the elderly age group type as the age group type of the image.
Age group recognition can be performed on the face image to be processed using a Decision Tree. A decision tree is a tree structure used for classification: each internal node represents a test on an attribute, each edge represents a test outcome, each leaf node represents a class or a class distribution, and the topmost node is the root node. Other classifiers can also be used to identify the age group of the face image to be processed, for example a Softmax classifier.
The age bracket identification of the face image to be processed according to the age bracket characteristics of the face image to be processed can refer to the prior art, and is not described herein again.
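As a concrete illustration of the decision-tree option, the following sketch uses scikit-learn, which is an assumption here (the patent names no library); the features and labels are random stand-ins:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

train_z = np.random.randn(100, 10)           # age group features (training set)
train_labels = np.random.randint(0, 4, 100)  # labeled age group types, e.g. 4 types
clf = DecisionTreeClassifier().fit(train_z, train_labels)

test_z = np.random.randn(1, 10)              # feature of the face image to process
age_group_type = clf.predict(test_z)[0]      # e.g. 3 might denote the elderly type
```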
The face age group recognition device of the fourth embodiment (a) obtains the face features of each face image in a training sample set; (b) pre-trains a multi-layer stacked self-coding model with the training sample set to obtain initial values of the network parameters of the model; (c) encodes the face features of each face image in the training sample set with the model to obtain the age group features of each face image; (d) clusters the age group features of all face images in the training sample set to obtain a preset number of cluster centers; (e) calculates the attribution degree of the age group features of each face image to each cluster center and adjusts the network parameters of the model to optimize the attribution degree; (f) judges whether a preset training end condition is met and, if so, obtains the trained multi-layer stacked self-coding model; (g) encodes the face image to be processed with the trained model to obtain its age group features; and (h) performs age group identification on the face image to be processed according to those features to obtain its age group type.
Existing face age group identification techniques include the convolutional neural network and the aging pattern subspace. A convolutional neural network must repeatedly perform convolution over the image, so extracting age group features is slow and time efficiency is low. The aging pattern subspace must concatenate the features of all age groups into one large vector, which easily causes the curse of dimensionality. The fourth embodiment instead uses a deep-learning stacked self-coding structure that can repeatedly propagate forward and back-propagate to adjust parameters, continuously refining the feature representation of the face image until an optimal age group feature representation is trained. This addresses the feature-extraction problem of the prior art: the training procedure needs no convolution computation, which greatly improves time efficiency. In addition, the encoding that produces the age group features maps the face features into a lower-dimensional feature space, avoiding the curse of dimensionality, reducing the computational complexity of the algorithm, and speeding up both training and recognition. The fourth embodiment can therefore realize fast and efficient face age group recognition.
EXAMPLE five
Fig. 5 is a schematic diagram of a computer device according to a fifth embodiment of the present invention. The computer device 1 comprises a memory 20, a processor 30 and a computer program 40, such as a face age identification program, stored in the memory 20 and executable on the processor 30. The processor 30, when executing the computer program 40, implements the steps of the above-mentioned face age group identification method embodiments, such as the steps 101 to 106 shown in fig. 1 or the steps 201 to 208 shown in fig. 2. Alternatively, the processor 30, when executing the computer program 40, implements the functions of the modules/units in the above-mentioned device embodiments, such as units 301 to 304 in FIG. 3 or units 401 to 406 in FIG. 4.
Illustratively, the computer program 40 may be partitioned into one or more modules/units that are stored in the memory 20 and executed by the processor 30 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution process of the computer program 40 in the computer apparatus 1. For example, the computer program 40 may be divided into an acquisition unit 301, a pre-training unit 302, an adjustment unit 303, and a determination unit 304 in fig. 3 or an acquisition unit 401, a pre-training unit 402, an adjustment unit 403, a determination unit 404, an encoding unit 405, and an identification unit 406 in fig. 4, where specific functions of each unit are described in the third embodiment and the fourth embodiment.
The computer device 1 may be a desktop computer, a notebook, a palmtop computer, a cloud server, or another computing device. It will be understood by those skilled in the art that the schematic diagram in fig. 5 is only an example of the computer device 1 and does not constitute a limitation on it; the device may include more or fewer components than shown, combine some components, or use different components. For example, the computer device 1 may further include input and output devices, a network access device, a bus, and the like.
The processor 30 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor 30 may be any conventional processor. The processor 30 is the control center of the computer device 1 and connects the various parts of the whole computer device 1 through various interfaces and lines.
The memory 20 may be used for storing the computer program 40 and/or the modules/units; the processor 30 implements the various functions of the computer device 1 by running or executing the computer program and/or modules/units stored in the memory 20 and calling data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the computer device 1 (such as audio data or a phonebook). In addition, the memory 20 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash memory card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
The modules/units integrated in the computer device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the methods of the above embodiments may also be implemented by a computer program, which may be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of the method embodiments may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.
In the embodiments provided in the present invention, it should be understood that the disclosed computer apparatus and method can be implemented in other ways. For example, the above-described embodiments of the computer apparatus are merely illustrative, and for example, the division of the units is only one logical function division, and there may be other divisions when the actual implementation is performed.
In addition, functional units in the embodiments of the present invention may be integrated into the same processing unit, or each unit may exist alone physically, or two or more units are integrated into the same unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. The units or computer means recited in the computer means claims may also be implemented by the same unit or computer means, either in software or in hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

1. A face age group identification method is characterized by comprising the following steps:
(a) acquiring the face features of each face image in a training sample set, wherein the face features comprise gradient direction histogram features and/or local binary pattern features;
(b) pre-training a multi-layer stacked self-coding model by using the training sample set to obtain an initial value of a network parameter of the multi-layer stacked self-coding model, wherein the multi-layer stacked self-coding model comprises an input layer and three hidden layers, and the input layer and the three hidden layers form three stacked self-encoders;
(c) encoding gradient direction histogram features and/or local binary pattern features of each face image in the training sample set by using the multilayer stack type self-encoding model to obtain age class features of each face image in the training sample set;
(d) clustering age group characteristics of all face images in the training sample set to obtain a preset number of clustering centers;
(e) calculating the attribution degree of the age group characteristics of each face image in the training sample set to each clustering center, and adjusting the network parameters of the multilayer stack type self-coding model to optimize the attribution degree;
(f) judging whether a preset training end condition is met; if so, obtaining a well-trained multilayer stacked self-coding model; otherwise, returning to the step (c);
(g) coding the face image to be processed by utilizing the trained multilayer stack type self-coding model to obtain the age group characteristics of the face image to be processed;
(h) and carrying out age group identification on the face image to be processed according to the age group characteristics of the face image to be processed to obtain the age group type of the face image to be processed.
2. The method of claim 1, wherein the determining whether a preset training end condition is satisfied comprises:
judging whether the identification accuracy rate of the age group type of each face image in the training sample set is greater than or equal to a preset threshold value or not; or
And judging whether the adjusted number of rounds reaches the preset number of rounds.
3. The method of claim 1, wherein the number of neurons in the three hidden layers is 500, 1000, respectively.
4. The method of claim 1, wherein the pre-training the multi-layer stacked self-encoding model comprises:
and carrying out greedy pre-training on the multilayer stacked self-coding model layer by using a restricted Boltzmann machine.
5. The method of any of claims 1-4, wherein the optimizing the degree of attribution comprises: optimizing the attribution degree by optimizing an objective function L, wherein the objective function L is measured by KL divergence, and the calculation formula is as follows:
L = \mathrm{KL}(P \parallel Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}
wherein Q is the distribution formed by the q_ij and P is the auxiliary target distribution formed by the p_ij, with
p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'=1}^{n} q_{ij'}^2 / f_{j'}}

f_j = \sum_i q_{ij}

q_{ij} = \frac{\left(1 + \|z_i - \mu_j\|^2/\alpha\right)^{-\frac{\alpha+1}{2}}}{\sum_{j'=1}^{n} \left(1 + \|z_i - \mu_{j'}\|^2/\alpha\right)^{-\frac{\alpha+1}{2}}}
wherein q_ij is the attribution degree of face image x_i in the training sample set to cluster center μ_j, z_i is the age group feature of face image x_i, α represents the degree of freedom of the t-distribution, j = 1, ..., n, and j' = 1, ..., n.
6. An apparatus for identifying the age group of a human face, the apparatus comprising:
the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is used for acquiring the face characteristics of each face image in a training sample set, and the face characteristics comprise gradient direction histogram characteristics and/or local binary pattern characteristics;
the pre-training unit is used for pre-training a multi-layer stacked self-coding model by using the training sample set to obtain an initial value of a network parameter of the multi-layer stacked self-coding model, wherein the multi-layer stacked self-coding model comprises an input layer and three hidden layers, and the input layer and the three hidden layers form three stacked self-encoders;
the adjusting unit is used for encoding the gradient direction histogram feature and/or the local binary pattern feature of each face image in the training sample set by using the multi-layer stacked self-encoding model to obtain the age class feature of each face image in the training sample set; clustering age group characteristics of all face images in the training sample set to obtain a preset number of clustering centers; calculating the attribution degree of the age group characteristics of each face image in the training sample set to each clustering center, and adjusting the network parameters of the multilayer stack type self-coding model to optimize the attribution degree;
the judging unit is used for judging whether a preset training end condition is met or not, and if the preset training end condition is met, obtaining a well-trained multilayer stack type self-coding model;
the encoding unit is used for encoding the face image to be processed by utilizing the trained multilayer stack type self-encoding model to obtain the age group characteristics of the face image to be processed;
and the identification unit is used for identifying the age group of the face image to be processed according to the age group characteristics of the face image to be processed to obtain the age group type of the face image to be processed.
7. The apparatus according to claim 6, wherein the determining unit is specifically configured to:
judging whether the identification accuracy rate of the age group type of each face image in the training sample set is greater than or equal to a preset threshold value or not; or
And judging whether the adjusted number of rounds reaches the preset number of rounds.
8. The apparatus of claim 6, wherein the number of neurons in the three hidden layers is 500, 1000, respectively.
9. A computer device, characterized in that: the computer device comprises a processor, and the processor implements the face age group identification method of any one of claims 1-5 when executing a computer program stored in a memory.
10. A computer-readable storage medium having stored thereon a computer program, characterized in that: the computer program, when executed by a processor, implements the face age group identification method of any one of claims 1-5.
CN201711449997.9A 2017-12-27 2017-12-27 Face age group identification method and device, computer device and readable storage medium Active CN108021908B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711449997.9A CN108021908B (en) 2017-12-27 2017-12-27 Face age group identification method and device, computer device and readable storage medium

Publications (2)

Publication Number Publication Date
CN108021908A CN108021908A (en) 2018-05-11
CN108021908B true CN108021908B (en) 2020-06-16

Family

ID=62071891

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711449997.9A Active CN108021908B (en) 2017-12-27 2017-12-27 Face age group identification method and device, computer device and readable storage medium

Country Status (1)

Country Link
CN (1) CN108021908B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108805802B (en) * 2018-06-05 2020-07-31 东北大学 Constraint condition-based front face reconstruction system and method of stacked stepping self-encoder
CN109117808B (en) * 2018-08-24 2020-11-03 深圳前海达闼云端智能科技有限公司 Face recognition method and device, electronic equipment and computer readable medium
CN109993150B (en) * 2019-04-15 2021-04-27 北京字节跳动网络技术有限公司 Method and device for identifying age
CN113569790B (en) * 2019-07-30 2022-07-29 北京市商汤科技开发有限公司 Image processing method and device, processor, electronic device and storage medium
CN112090088A (en) * 2020-09-09 2020-12-18 腾讯科技(深圳)有限公司 Game data processing method and related equipment
CN112364831B (en) * 2020-11-30 2022-02-25 北京智慧荣升科技有限公司 Face recognition method and online education system
CN113191290A (en) * 2021-05-10 2021-07-30 合肥美菱物联科技有限公司 Control method applied to heating of refrigerator door handles in different age groups

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598871B (en) * 2014-12-06 2017-11-17 电子科技大学 A kind of facial age computational methods based on correlation regression
CN105069400B (en) * 2015-07-16 2018-05-25 北京工业大学 Facial image gender identifying system based on the sparse own coding of stack
CN105426872B (en) * 2015-12-17 2019-06-21 电子科技大学 A kind of facial age estimation method returned based on correlated Gaussian process
CN106503654A (en) * 2016-10-24 2017-03-15 中国地质大学(武汉) A kind of face emotion identification method based on the sparse autoencoder network of depth

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Junyuan Xie et al., "Unsupervised Deep Embedding for Clustering Analysis", JMLR: W&CP, vol. 48, May 24, 2016; Section 3 "Deep embedded clustering", paragraph 2; Section 3.1 "Clustering with KL divergence", paragraph 1; Sections 3.1.1-3.1.3 *
Fang Sheng, "Research on Gender Determination and Age Estimation Methods Based on Facial Features", China Master's Theses Full-text Database, Information Science and Technology, No. 12, Dec. 15, 2015; Sections 2.2.4-2.2.5 (pp. 19-23), Section 3.3 (pp. 33-40), Chapter 4 (pp. 47-59) *
Xu Wenhan et al., "Facial Age Recognition Based on a Deep Stacked Self-Encoding Network", Information & Communications, No. 11, Nov. 2017, pp. 45-46, Sections 1-4 *

Also Published As

Publication number Publication date
CN108021908A (en) 2018-05-11

Similar Documents

Publication Publication Date Title
CN108021908B (en) Face age group identification method and device, computer device and readable storage medium
CN108121986B (en) Object detection method and device, computer device and computer readable storage medium
CN108304936B (en) Machine learning model training method and device, and expression image classification method and device
CN108764195B (en) Handwriting model training method, handwritten character recognition method, device, equipment and medium
CN108027899B (en) Method for improving performance of trained machine learning model
US11417148B2 (en) Human face image classification method and apparatus, and server
CN112734775B (en) Image labeling, image semantic segmentation and model training methods and devices
CN108564129B (en) Trajectory data classification method based on generation countermeasure network
CN106415594B (en) Method and system for face verification
US8606022B2 (en) Information processing apparatus, method and program
CN106156777B (en) Text picture detection method and device
CN111352965A (en) Training method of sequence mining model, and processing method and equipment of sequence data
CN113761259A (en) Image processing method and device and computer equipment
CN105631416A (en) Method for carrying out face recognition by using novel density clustering
CN109492610B (en) Pedestrian re-identification method and device and readable storage medium
CN112784929A (en) Small sample image classification method and device based on double-element group expansion
CN114282059A (en) Video retrieval method, device, equipment and storage medium
CN108985442B (en) Handwriting model training method, handwritten character recognition method, device, equipment and medium
WO2023088174A1 (en) Target detection method and apparatus
CN114283350A (en) Visual model training and video processing method, device, equipment and storage medium
CN114332500A (en) Image processing model training method and device, computer equipment and storage medium
CN115880562A (en) Lightweight target detection network based on improved YOLOv5
CN110111365B (en) Training method and device based on deep learning and target tracking method and device
CN111352926B (en) Method, device, equipment and readable storage medium for data processing
CN114417095A (en) Data set partitioning method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant