CN107766787B - Face attribute identification method, device, terminal and storage medium - Google Patents


Info

Publication number
CN107766787B
CN107766787B (granted from application CN201710591603.7A / CN201710591603A)
Authority
CN
China
Prior art keywords
face
network model
neural network
recognition task
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710591603.7A
Other languages
Chinese (zh)
Other versions
CN107766787A (en)
Inventor
牟永强
田第鸿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Intellifusion Technologies Co Ltd
Original Assignee
Shenzhen Intellifusion Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Intellifusion Technologies Co Ltd filed Critical Shenzhen Intellifusion Technologies Co Ltd
Publication of CN107766787A publication Critical patent/CN107766787A/en
Application granted granted Critical
Publication of CN107766787B publication Critical patent/CN107766787B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • G06V20/53Recognition of crowd images, e.g. recognition of crowd congestion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems


Abstract

A face attribute identification method, the method comprising: pre-training a neural network model; fine-tuning the pre-trained neural network model on a face recognition task; fine-tuning the resulting model on a face attribute recognition task; and performing face attribute recognition on a given image using the fully fine-tuned neural network model. The invention also provides a face attribute recognition device, a terminal and a storage medium. The invention can train a neural network model suited to face attribute recognition and thereby achieve a better face attribute recognition effect.

Description

Face attribute identification method, device, terminal and storage medium
Technical Field
The invention relates to the technical field of image recognition, and in particular to a face attribute recognition method, device, terminal and storage medium.
Background
Vision-based face attribute recognition, such as race/ethnicity recognition, gender recognition and age recognition, has very wide application in fields such as video surveillance, face recognition, population analysis and business analysis. Traditional recognition algorithms based on hand-crafted features struggle to meet the accuracy requirements of real-world scenes. In recent years, deep-learning-based vision algorithms have advanced greatly in image classification, object detection, object segmentation and related fields. However, the biggest problem with deep learning is that training a model requires a very large number of samples, which makes it difficult to achieve a major breakthrough over conventional algorithms on tasks for which only a limited number of samples are available. In addition, studies have shown that the features learned by deep networks are closely tied to the task they were trained on and are difficult to apply directly to other tasks.
Disclosure of Invention
In view of the above, there is a need for a face attribute recognition method, device, terminal and storage medium that can train a neural network model suited to face attribute recognition and thereby obtain a better face attribute recognition effect.
A first aspect of the present application provides a face attribute recognition method, the method comprising:
pre-training a neural network model;
fine-tuning the pre-trained neural network model on a face recognition task;
fine-tuning the model resulting from the face recognition fine-tuning on a face attribute recognition task;
and performing face attribute recognition on a given image using the neural network model obtained after the face attribute recognition fine-tuning.
In another possible implementation, pre-training the neural network model comprises pre-training the neural network model with natural scene images;
fine-tuning the pre-trained neural network model on the face recognition task comprises fine-tuning the pre-trained neural network model with face images;
and fine-tuning on the face attribute recognition task comprises fine-tuning the model resulting from the face recognition fine-tuning with images whose face attributes have been labeled.
In another possible implementation manner, the number of the natural scene images is greater than the number of the face images, and the number of the face images is greater than the number of the images with labeled face attributes.
In another possible implementation, the neural network model is a convolutional network model in which a ReLU activation function follows each convolutional layer.
In another possible implementation manner, the facial attributes include race, gender, age, and expression.
In another possible implementation, in the steps of fine-tuning the pre-trained neural network model on the face recognition task and fine-tuning the resulting model on the face attribute recognition task, the learning rate of the last layer of the neural network model is 10 times that of the other layers.
In another possible implementation, in the steps of fine-tuning the pre-trained neural network model on the face recognition task and fine-tuning the resulting model on the face attribute recognition task, face detection, face alignment and face normalization are performed on an image before it is input into the neural network model.
A second aspect of the present application provides a face attribute recognition apparatus, the apparatus comprising:
a pre-training unit for pre-training a neural network model;
a first fine-tuning unit for fine-tuning the pre-trained neural network model on a face recognition task;
a second fine-tuning unit for fine-tuning the model resulting from the face recognition fine-tuning on a face attribute recognition task;
and a recognition unit for performing face attribute recognition on a given image using the neural network model obtained after the face attribute recognition fine-tuning.
A third aspect of the present application provides a terminal comprising a processor configured to implement the steps of the neural network model training method or the face attribute recognition method when executing a computer program stored in a memory.
A fourth aspect of the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the neural network model training method or the face attribute recognition method.
The method pre-trains a neural network model; fine-tunes the pre-trained model on a face recognition task; fine-tunes the resulting model on a face attribute recognition task; and performs face attribute recognition on a given image using the fully fine-tuned model. The invention combines the ideas of deep learning and transfer learning and applies them to face attribute recognition, so that a neural network model suited to face attribute recognition can be trained even with a limited sample size, yielding a better face attribute recognition effect.
Drawings
Fig. 1 is a flowchart of a neural network model training method according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a training process for a neural network model.
Fig. 3 is a flowchart of a face attribute recognition method according to a second embodiment of the present invention.
Fig. 4 is a structural diagram of a neural network model training apparatus according to a third embodiment of the present invention.
Fig. 5 is a structural diagram of a face attribute recognition apparatus according to a fourth embodiment of the present invention.
Fig. 6 is a schematic diagram of a terminal according to a fifth embodiment of the present invention.
The following detailed description will further illustrate the invention in conjunction with the above-described figures.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a detailed description of the present invention will be given below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention; the described embodiments are merely a subset of the embodiments of the present invention, rather than all of them. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Preferably, the face attribute recognition method of the present invention is applied to one or more terminals. A terminal is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions; its hardware includes, but is not limited to, a microprocessor, an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The terminal can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The terminal can be in man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
Example one
The neural network model training method of the present invention is described below with reference to fig. 1 and 2. The neural network model training method is applied to a terminal. Fig. 1 is a flowchart of a neural network model training method according to an embodiment of the present invention. FIG. 2 is a schematic diagram of a training process for a neural network model.
As shown in fig. 1, the neural network model training method specifically includes the following steps:
101: and pre-training the neural network model.
In this embodiment, the neural network model is a convolutional neural network model. The convolutional neural network model comprises convolutional layers, downsampling layers and fully-connected layers, and may also include other layers, such as normalization and Dropout layers. The number of network layers can be set as needed: for a large input image, more convolutional and downsampling layers are chosen; for a small input image, fewer are chosen.
In a preferred embodiment, the convolutional neural network model comprises 8 convolutional layers (Conv11, Conv12, Conv21, Conv22, Conv31, Conv32, Conv41, Conv42), 4 downsampling layers (Pool1, Pool2, Pool3, Pool4), 1 Dropout layer (Dropout1) and 2 fully-connected layers (Fc6, Fc7). Every two convolutional layers are followed by a downsampling layer; for example, Conv11 and Conv12 are followed by Pool1, and Conv21 and Conv22 are followed by Pool2. The layers with parameters total 10 (the 8 convolutional layers and the 2 fully-connected layers). Conv11 is the first layer of the model, Fc7 is the last, and the Dropout layer sits after the convolutional and downsampling layers and before the fully-connected layers. In this embodiment an activation function follows each convolutional layer; here it is the ReLU function, though in other embodiments it may be another function, such as sigmoid or tanh.
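The architecture described above can be sketched as follows in PyTorch. The channel widths, kernel sizes and input resolution are assumptions for illustration only (the patent does not specify them), and `AttributeNet` is a hypothetical name.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Two convolutional layers, each followed by ReLU, then one downsampling
    # layer, matching the Conv11/Conv12 -> Pool1 pattern described above.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
    )

class AttributeNet(nn.Module):
    def __init__(self, num_classes=1000, in_size=128):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(3, 64),     # Conv11, Conv12, Pool1
            conv_block(64, 128),   # Conv21, Conv22, Pool2
            conv_block(128, 256),  # Conv31, Conv32, Pool3
            conv_block(256, 512),  # Conv41, Conv42, Pool4
        )
        feat = 512 * (in_size // 16) ** 2  # 4 pools halve the size 4 times
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),                                # Dropout1
            nn.Linear(feat, 4096), nn.ReLU(inplace=True),   # Fc6
            nn.Linear(4096, num_classes),                   # Fc7
        )

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, 1))
```

The 10 parameterized layers (8 convolutional, 2 fully-connected) appear in the same order as in the text; `num_classes` is later re-sized for each fine-tuning stage.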
The neural network model may be pre-trained using natural scene images. Specifically, in a preferred embodiment, the neural network model is pre-trained using the ImageNet image library, which contains more than one million images labeled with over 1,000 categories and is well suited to large-scale network training. In other embodiments, other image libraries may be used for pre-training.
Table 1 shows the specific parameters of the neural network model training process according to the preferred embodiment of the present invention.
TABLE 1 training parameters for neural network model training Process
Stage                                    Batch size   Learning rate    Training epochs
Pre-training                             128          0.01-0.00001     40
Face recognition task fine-tuning        64           0.00001          20
Attribute recognition task fine-tuning   32           0.00001          10
As shown in Table 1, during pre-training the number of images per batch is 128, the learning rate decays from 0.01 to 0.00001 (i.e., 0.01, 0.001, 0.0001, 0.00001), and training runs for 40 epochs in total. In other embodiments, the pre-training parameters may differ; for example, the batch size may be 256, the learning rate may decay from 0.001 to 0.00001 (i.e., 0.001, 0.0001, 0.00001), and training may run for 60 epochs.
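The staged decay of the pre-training learning rate (0.01 down to 0.00001 over 40 epochs) can be expressed as a simple step schedule. The switch point of every 10 epochs is an assumption; Table 1 gives only the overall range.

```python
def pretrain_lr(epoch, base_lr=0.01, decay=0.1, step=10, min_lr=1e-5):
    """Step-decay schedule: divide the learning rate by 10 every `step`
    epochs, never going below `min_lr` (0.00001, the smallest rate in
    Table 1)."""
    return max(base_lr * decay ** (epoch // step), min_lr)
```

With these defaults the rate is 0.01 for epochs 0-9, 0.001 for 10-19, 0.0001 for 20-29, and 0.00001 for 30-39, reproducing the four values listed in the text.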
Table 1 also shows training parameters for performing face recognition task fine adjustment and attribute recognition task fine adjustment on the neural network model, which will be described in detail later.
In the pre-training process, the neural network model may be trained using a neural network training algorithm, such as a back propagation algorithm. Neural network training algorithms are prior art and will not be described herein.
102: and carrying out fine adjustment on the face recognition task on the pre-trained neural network model.
Fine-tuning a neural network model means that, starting from a trained model, the parameters it has learned are locally adjusted under a certain training strategy, so that the model can express a different objective function.
The pre-trained neural network model can be fine-tuned on the face recognition task using a face image library. When doing so, the output of the last layer of the model (e.g., the last fully-connected layer) is changed to match the number of people contained in the face image library. For example, if the library contains 500 people, the output of the last layer is changed to 500 (i.e., 500 classes are output). In at least one embodiment, the learning rate of the last layer (e.g., the last fully-connected layer) may be set to 10 times that of the other layers during the face recognition fine-tuning; for example, the other layers keep a learning rate of 0.00001 while the last layer uses 0.0001.
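The two adjustments described above, re-sizing the output layer to 500 identities and giving it a learning rate 10 times that of the other layers, can be sketched with PyTorch parameter groups. The tiny stand-in network is illustrative only; the patent does not disclose exact layer sizes.

```python
import torch
import torch.nn as nn

# Stand-in for a pre-trained model whose head outputs 1000 ImageNet classes.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1000))

# Replace the last layer so its output matches the 500 people in the
# face image library.
model[2] = nn.Linear(32, 500)

base_lr = 1e-5  # 0.00001, the fine-tuning rate from Table 1
last_params = list(model[2].parameters())
last_ids = {id(p) for p in last_params}
other_params = [p for p in model.parameters() if id(p) not in last_ids]

# Two parameter groups: earlier layers at base_lr, the new last layer at
# 10x base_lr (0.0001), as described in the text.
optimizer = torch.optim.SGD(
    [{"params": other_params, "lr": base_lr},
     {"params": last_params, "lr": base_lr * 10}],
    lr=base_lr, momentum=0.9)
```

The same pattern applies later when the head is re-sized again (e.g. to 4 attribute classes) for the face attribute fine-tuning stage.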
The number of images used for the face recognition fine-tuning may be smaller than the number used for pre-training. For example, pre-training uses over one million images covering more than 1,000 classes, while the face image library used for the face recognition fine-tuning contains 500 people with about 40-100 images each (about 20,000-50,000 in total).
As shown in Table 1, during the face recognition fine-tuning the number of images per batch is 64, the learning rate is 0.00001, and training runs for 20 epochs. In other embodiments, these parameters may differ; for example, the batch size may be 32, the learning rate 0.0001, and the number of epochs 15.
As shown in fig. 2, in the preferred embodiment, during the face recognition fine-tuning, face detection, face alignment and face normalization are performed on each face image before it is input into the neural network model.
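A minimal sketch of the normalization step, assuming detection and alignment have already produced a face crop (the patent names no specific detector or aligner): the crop is resized to a fixed resolution and its pixel values are scaled to [0, 1]. `normalize_face` and the target size are illustrative assumptions.

```python
import numpy as np

def normalize_face(crop, size=128):
    """Resize an aligned face crop (H x W x 3, uint8) to a fixed square
    resolution and map pixel values to [0, 1] as float32. Nearest-neighbour
    resizing is enough for a sketch; a real pipeline would interpolate."""
    h, w = crop.shape[:2]
    ys = np.arange(size) * h // size  # source row for each output row
    xs = np.arange(size) * w // size  # source column for each output column
    resized = crop[ys][:, xs]
    return resized.astype(np.float32) / 255.0
```

Detection and alignment would run before this step; their outputs (a tightly cropped, rotation-corrected face) are what `normalize_face` receives.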
Similar to the pre-training, in the fine tuning process of the face recognition task, a neural network training algorithm, such as a back propagation algorithm, may be used to train the pre-trained neural network model.
103: and carrying out fine adjustment on the face attribute recognition task on the neural network model subjected to fine adjustment on the face recognition task.
The neural network model after the face recognition task is finely adjusted can be subjected to face attribute recognition task fine adjustment by using the image with the labeled face attributes.
Face attributes include race, gender, age, expression, and so on. Specifically, race may be divided into yellow, white, black and brown, or into a particular race (e.g., the yellow race) versus all others. Gender may be divided into male and female. Age may be divided into young children, adolescents, middle-aged and elderly, or into different specific ages. Expressions may be divided into happy, sad, angry, frightened, surprised, disgusted, and so on. Face attributes can be divided differently according to actual needs.
When the face attribute recognition task is fine-tuned, different images marked with the face attributes can be used according to different face attributes.
For example, when the invention is used for race recognition, the face attribute fine-tuning can use images of the yellow, white, black and brown races, or use images of a particular race (e.g., yellow-race images) together with images of other races (e.g., non-yellow-race images).
For another example, the method is used for gender identification, and the face attribute identification task can be finely adjusted by using the male image and the female image.
For another example, the invention is used for age recognition, and can perform fine adjustment of a face attribute recognition task by using images of infants, images of juveniles, images of young adults, images of middle-aged people and images of old people, or perform fine adjustment of a face attribute recognition task by using images of people of different specific ages.
When the model resulting from the face recognition fine-tuning is fine-tuned using images with labeled face attributes, the output of the last layer (e.g., the last fully-connected layer) is changed to match the number of classes in those images. For example, if the labeled images contain four classes (e.g., yellow-race, white-race, black-race and brown-race images), the output of the last layer is changed to 4 (i.e., 4 classes are output). In at least one embodiment, the learning rate of the last layer (e.g., the last fully-connected layer) may be set to 10 times that of the other layers during the face attribute fine-tuning; for example, the other layers keep a learning rate of 0.00001 while the last layer uses 0.0001.
The number of images used for the face attribute fine-tuning may be smaller than the number used for the face recognition fine-tuning. For example, the face image library used for the face recognition fine-tuning contains 500 people with about 40-100 images each (about 20,000-50,000 in total), while the images used for the face attribute fine-tuning contain 2 classes (e.g., yellow-race and non-yellow-race images) with about 1,000 images per class (about 2,000 in total).
As shown in Table 1, during the face attribute fine-tuning the number of images per batch is 32, the learning rate is 0.00001, and training runs for 10 epochs. In other embodiments, these parameters may differ; for example, the batch size may be 8, the learning rate 0.0001, and the number of epochs 8.
As shown in fig. 2, in the preferred embodiment and as in the face recognition fine-tuning, during the face attribute fine-tuning face detection, face alignment and face normalization are performed on each image before it is input into the neural network model.
Similar to pre-training, during the face attribute fine-tuning a neural network training algorithm, such as back propagation, may be used to train the model.
Through steps 101-103, a neural network model suited to face attribute recognition is obtained.
The neural network model training method of the first embodiment pre-trains a neural network model, fine-tunes it on a face recognition task, and then fine-tunes it on a face attribute recognition task. The first embodiment uses a deep-learning network structure: the idea of deep learning is used to pre-train the model, and the idea of transfer learning is used twice, first to fine-tune the pre-trained model on the face recognition task, yielding a model suited to face recognition, and then to fine-tune that model on the face attribute recognition task, yielding a model suited to face attribute recognition. The method of the first embodiment thus combines deep learning and transfer learning and applies them to face attribute recognition, so that a model suited to face attribute recognition can be trained even with a limited sample size.
Example two
Fig. 3 is a flowchart of a face attribute identification method according to a second embodiment of the present invention. As shown in fig. 3, the face attribute recognition method specifically includes the following steps:
301: and pre-training the neural network model.
Step 301 in this embodiment is the same as step 101 in the first embodiment, and please refer to the related description of step 101 in the first embodiment, which is not described herein again.
302: and carrying out fine adjustment on the face recognition task on the pre-trained neural network model.
Step 302 in this embodiment is the same as step 102 in the first embodiment, and please refer to the related description of step 102 in the first embodiment, which is not repeated herein.
303: and carrying out fine adjustment on the face attribute recognition task on the neural network model subjected to fine adjustment on the face recognition task.
In this embodiment, step 303 is the same as step 103 in the first embodiment, and specific reference is made to the related description of step 103 in the first embodiment, which is not repeated herein.
304: and carrying out face attribute recognition on the given image by using the neural network model after the fine adjustment of the face attribute recognition task.
When the face attributes of a given image need to be recognized, the given image is input into the neural network model obtained after the face attribute fine-tuning. The model receives the image, recognizes it, and produces a recognition result: the face attributes determined from the image, such as race, gender, age and expression.
The face attribute recognition method of the second embodiment pre-trains a neural network model, fine-tunes it on a face recognition task, fine-tunes it further on a face attribute recognition task, and then uses the resulting model to perform face attribute recognition on a given image. Like the first embodiment, it uses the idea of deep learning to pre-train the model and the idea of transfer learning to fine-tune it, first for face recognition and then for face attribute recognition. The method of the second embodiment thus combines deep learning and transfer learning and applies them to face attribute recognition, so that a model suited to face attribute recognition can be trained with a limited sample size and used to obtain better face attribute recognition results.
To verify the effectiveness of the proposed solution, a comparative experiment was performed on the convolutional neural network model comprising 8 convolutional layers (Conv11, Conv12, Conv21, Conv22, Conv31, Conv32, Conv41, Conv42), 4 downsampling layers (Pool1, Pool2, Pool3, Pool4), 1 Dropout layer (Dropout1) and 2 fully-connected layers (Fc6, Fc7). The test set comprises 500 positive and 500 negative examples, and the classification results of the different strategies are shown in Table 2.
TABLE 2 Classification results of different strategies
[Table 2 is reproduced only as an image in the original document; its contents are not available in this text.]
Test 1: the neural network model is pre-trained by using an ImageNet image library, the characteristics of an Fc6 layer are extracted for attribute information expression, and an SVM (Support Vector Machine) is used as an attribute classifier. The SVM is a supervised learning model that is commonly used for pattern recognition, classification, and regression analysis.
Test 2: and pre-training a neural network model by using an ImageNet image library, finely adjusting a face recognition task, extracting the characteristics of an Fc6 layer to express attribute information, and using an SVM as an attribute classifier.
Test 3: and pre-training a neural network model by using an ImageNet image library, finely adjusting a face recognition task and a face attribute recognition task, and directly classifying attributes by using the output of the network model.
Test 4: the method comprises the steps of pre-training a neural network model by using an ImageNet image library, finely adjusting a face recognition task and a face attribute recognition task, extracting the characteristics of an Fc6 layer to express attribute information, and using an SVM as an attribute classifier.
As Table 2 shows, after the neural network model is pre-trained and fine-tuned on both the face recognition task and the face attribute recognition task, it recognizes face attributes well. Moreover, extracting Fc6-layer features as the attribute representation and classifying them with an SVM gives better results than classifying attributes directly from the network output.
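As a hedged illustration (not part of the patent's disclosure), the SVM classifier used in Tests 1, 2 and 4 reduces, in its linear form, to taking the sign of a weighted sum of the input features plus a bias. The weights and inputs below are purely illustrative; in the experiments they would come from training on Fc6-layer features.

```python
def svm_decision(w, b, x):
    """Linear SVM decision rule: classify by the sign of w.x + b.

    w and b would be learned by training an SVM on Fc6-layer features;
    the values passed in here are purely illustrative.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

A trained kernel SVM generalizes this rule by replacing the dot product with a kernel evaluation against the support vectors.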
The invention addresses the shortcomings of existing face attribute recognition, namely low accuracy and sensitivity to factors such as illumination, pose and expression, by applying the strengths of deep learning in image recognition to the face attribute recognition task. Feature representation based on deep learning requires a large number of samples for model training, yet tasks in certain specific domains typically provide only a small number of samples. The invention therefore combines the ideas of deep learning and transfer learning and applies them to face attribute recognition (such as ethnicity recognition), achieving good results even with a limited sample size.
In the preferred embodiment of the present invention, the neural network model is first pre-trained using natural scene images. Natural scene images are very easy to acquire, and many large-scale public datasets, such as ImageNet, can be used to pre-train neural network models. Second, the pre-trained neural network model is fine-tuned on a face recognition task using face images. Because the model has no prior knowledge of facial features, features learned from natural scene images perform poorly when applied directly to face images, so the pre-trained model is fine-tuned on a certain number of face images. Finally, the features learned in the previous stage express faces well but lack knowledge of specific face attributes, such as ethnicity, so the model is further fine-tuned using a small number of images with labeled face attributes. This method is particularly suitable when the sample size is very limited. The initialization of a neural network is crucial to its training and convergence: when the sample size is sufficient, convergence can generally be guaranteed by specific initialization algorithms, but when the sample size is very limited, direct training is likely to fall into a local minimum, whereas parameters learned through pre-training and fine-tuning can to a great extent ensure a globally optimal result. Through these stages of pre-training and fine-tuning, the resulting features classify face attributes better.
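The three-stage strategy described above can be sketched as follows. The stage names and sample counts are illustrative assumptions roughly matching the figures given later in the text, not values fixed by the patent:

```python
def training_stage_order(sample_counts):
    """Return the three training stages in order, checking that each
    successive stage uses fewer images than the one before it
    (pretrain > face fine-tune > attribute fine-tune)."""
    order = ["pretrain", "face_finetune", "attribute_finetune"]
    for earlier, later in zip(order, order[1:]):
        assert sample_counts[earlier] > sample_counts[later], \
            "each fine-tuning stage needs fewer images than the previous stage"
    return order

# Illustrative sample counts only (the patent gives approximate ranges).
counts = {"pretrain": 1_200_000, "face_finetune": 35_000,
          "attribute_finetune": 2_000}
```

The decreasing sample sizes are the point of the design: transfer learning lets the data-hungry stages run on plentiful generic images, reserving the scarce labeled attribute images for the final, smallest adjustment.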
The face attribute recognition of the invention can be applied in many fields, such as video surveillance, population analysis and business analysis. For example, in traffic monitoring, the invention can perform face attribute recognition on monitored face images of pedestrians or drivers and determine their face attributes (such as gender).
EXAMPLE III
Fig. 4 is a structural diagram of a neural network model training device according to a third embodiment of the present invention. As shown in fig. 4, the neural network model training device 10 may include: a pre-training unit 401, a first fine tuning unit 402, and a second fine tuning unit 403.
And a pre-training unit 401, configured to pre-train the neural network model.
In this embodiment, the neural network model is a convolutional neural network model comprising convolutional layers, downsampling layers and fully-connected layers. The model may also include other layers, such as a normalization layer or a Dropout layer. The number of network layers can be set as needed: for example, more convolutional and downsampling layers are used when the input images are large, and fewer when they are small.
In a preferred embodiment, the convolutional neural network model comprises 8 convolutional layers (Conv11, Conv12, Conv21, Conv22, Conv31, Conv32, Conv41, Conv42), 4 downsampling layers (Pool1, Pool2, Pool3, Pool4), 1 Dropout layer (Dropout1) and 2 fully-connected layers (Fc6, Fc7). Every two convolutional layers are followed by one downsampling layer; for example, Conv11 and Conv12 are followed by Pool1, and Conv21 and Conv22 by Pool2. There are 10 layers with parameters in total (the 8 convolutional layers and the 2 fully-connected layers). Conv11 is the first layer of the model, Fc7 is the last, and the Dropout layer sits after the convolutional and downsampling layers and before the fully-connected layers. In this embodiment, an activation function follows each convolutional layer; here it is the ReLU function. In other embodiments, the activation function may be another function, such as the sigmoid or tanh function.
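The layer ordering just described can be captured as a simple configuration list. This is a structural sketch using the patent's own layer names, not an executable network definition:

```python
# Layer ordering of the convolutional network described above.
LAYERS = [
    "Conv11", "Conv12", "Pool1",
    "Conv21", "Conv22", "Pool2",
    "Conv31", "Conv32", "Pool3",
    "Conv41", "Conv42", "Pool4",
    "Dropout1",
    "Fc6", "Fc7",
]

def parameterized_layers(layers):
    """Only convolutional and fully-connected layers carry learnable
    weights; pooling and Dropout layers do not."""
    return [name for name in layers if name.startswith(("Conv", "Fc"))]
```

Counting the parameterized entries recovers the "10 layers with parameters" stated in the text.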
The neural network model may be pre-trained using natural scene images. Specifically, in a preferred embodiment, the model is pre-trained on the ImageNet image library, which contains more than 1 million labeled images covering over 1,000 categories and is well suited to large-scale network training. In other embodiments, other image libraries may be used for pre-training.
As shown in Table 1, during pre-training of the neural network model the batch size is 128, the learning rate decays from 0.01 to 0.00001 (i.e., 0.01, 0.001, 0.0001, 0.00001), and training runs for 40 epochs in total. In other embodiments, the pre-training parameters may differ: for example, a batch size of 256, a learning rate decaying from 0.001 to 0.00001 (i.e., 0.001, 0.0001, 0.00001), and 60 training epochs.
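A step-decay schedule matching the quoted rates might look like the sketch below. The even split of the 40 epochs across the four rates is an assumption; the text only lists the rates and the total epoch count:

```python
PRETRAIN = {"batch_size": 128, "epochs": 40,
            "learning_rates": [0.01, 0.001, 0.0001, 0.00001]}

def lr_for_epoch(epoch, config=PRETRAIN):
    """Step-decay schedule: split the epochs evenly across the listed
    rates (10 epochs per rate with the default configuration)."""
    rates = config["learning_rates"]
    step = config["epochs"] // len(rates)
    return rates[min(epoch // step, len(rates) - 1)]
```

With the default configuration, epochs 0-9 train at 0.01, epochs 10-19 at 0.001, and so on down to 0.00001.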
Table 1 also shows training parameters for performing face recognition task fine adjustment and attribute recognition task fine adjustment on the neural network model, which will be described in detail later.
In the pre-training process, the neural network model may be trained using a neural network training algorithm, such as a back propagation algorithm. Neural network training algorithms are prior art and will not be described herein.
A first fine tuning unit 402, configured to perform face recognition task fine tuning on the pre-trained neural network model.
Neural network model fine-tuning means that, starting from a trained model, the learned parameters are locally adjusted using a certain training strategy so that the model can express a different objective function.
The pre-trained neural network model can be fine-tuned on the face recognition task using a face image library. During this fine-tuning, the output of the last layer of the neural network model (e.g., the last fully-connected layer) is changed to match the number of people in the face image library. For example, if the library contains 500 people, the output of the last layer is changed to 500 (i.e., 500 classes are output). In at least one embodiment, during face recognition fine-tuning the learning rate of the last layer (e.g., the last fully-connected layer) may be set to 10 times that of the other layers; for example, the other layers are kept at 0.00001 while the last layer uses 0.0001.
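The per-layer learning-rate arrangement can be sketched as follows. The helper name and grouping shape are illustrative; in a real framework this would correspond to per-parameter optimizer groups:

```python
def finetune_param_groups(layer_names, base_lr=0.00001, last_layer_boost=10):
    """Build per-layer learning-rate groups: the replaced final layer
    gets a learning rate 10x that of every other layer, as described
    in the text. A hypothetical helper, not a framework API."""
    groups = []
    for name in layer_names:
        lr = base_lr * last_layer_boost if name == layer_names[-1] else base_lr
        groups.append({"layer": name, "lr": lr})
    return groups
```

The boosted rate lets the freshly re-initialized final layer adapt quickly while the pre-trained layers change only slightly.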
The number of images used for face recognition fine-tuning may be smaller than the number used for pre-training. For example, pre-training uses over 1 million images covering more than 1,000 categories, whereas the face image library used for fine-tuning contains 500 people with about 40 to 100 images each (about 20,000 to 50,000 in total).
As shown in Table 1, during face recognition fine-tuning the batch size is 64, the learning rate is 0.00001, and training runs for 20 epochs in total. In other embodiments, the fine-tuning parameters may differ: for example, a batch size of 32, a learning rate of 0.0001, and 15 training epochs.
As shown in fig. 2, in the preferred embodiment, when fine-tuning the face recognition task, face detection, face alignment and face normalization are performed on each face image before it is input into the neural network model.
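The fixed ordering of the three preprocessing steps can be sketched with trivial stand-in functions; real implementations (a face detector, a landmark-based aligner, an intensity normalizer) are assumed and not specified by this sketch:

```python
def detect_face(img):
    # Stand-in: a real detector would crop the face region.
    return img + ["detected"]

def align_face(img):
    # Stand-in: a real aligner would warp the face to a canonical pose.
    return img + ["aligned"]

def normalize_face(img):
    # Stand-in: a real normalizer would scale size and pixel intensity.
    return img + ["normalized"]

def preprocess(image):
    """Apply the three steps in the order the text specifies:
    detection, then alignment, then normalization."""
    return normalize_face(align_face(detect_face(image)))
```

The order matters: alignment needs a detected face region, and normalization is applied to the aligned crop.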
Similar to the pre-training, in the fine tuning process of the face recognition task, a neural network training algorithm, such as a back propagation algorithm, may be used to train the pre-trained neural network model.
And a second fine-tuning unit 403, configured to perform fine-tuning of the face attribute recognition task on the neural network model after the fine-tuning of the face recognition task.
The face-recognition-fine-tuned neural network model can be fine-tuned on the face attribute recognition task using images labeled with face attributes.
The face attributes include race, gender, age, expression, and the like. Specifically, race may be divided into yellow, white, black and brown, or into a particular race (e.g., yellow) and all other races. Gender may be divided into male and female. Age may be divided into young children, adolescents, middle-aged and elderly, or into different specific ages. Expressions may be classified as happy, sad, angry, fearful, surprised, disgusted, and the like. Face attributes can be divided differently according to actual needs.
When the face attribute recognition task is fine-tuned, different images marked with the face attributes can be used according to different face attributes.
For example, when the invention is used for race recognition, the face attribute recognition task may be fine-tuned using yellow, white, black and brown race images, or using images of a particular race (e.g., yellow) together with images of other races (e.g., non-yellow).
For another example, when used for gender recognition, the face attribute recognition task may be fine-tuned using male and female images.
For another example, when used for age recognition, the face attribute recognition task may be fine-tuned using images of young children, adolescents, young adults, middle-aged and elderly people, or using images of people at different specific ages.
When fine-tuning the face-recognition-fine-tuned neural network model with images labeled with face attributes, the output of the last layer of the model (e.g., the last fully-connected layer) is changed to match the number of categories in the labeled images. For example, if the labeled images contain four classes (e.g., yellow, white, black and brown race images), the output of the last layer is changed to 4 (i.e., 4 classes are output). In at least one embodiment, during face attribute recognition fine-tuning the learning rate of the last layer (e.g., the last fully-connected layer) may be set to 10 times that of the other layers; for example, the other layers are kept at 0.00001 while the last layer uses 0.0001.
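Resizing the output layer to the number of distinct attribute classes can be sketched with a small hypothetical helper; the label strings are the ethnicity example from the text:

```python
def output_units_for(labeled_examples):
    """Return the number of output units for the last fully-connected
    layer: one per distinct attribute class in the labeled set.
    A hypothetical helper for illustration."""
    return len({label for _, label in labeled_examples})

# e.g. four ethnicity classes -> a 4-way output layer
examples = [("img1", "yellow"), ("img2", "white"),
            ("img3", "black"), ("img4", "brown"), ("img5", "yellow")]
```

The same helper gives 2 for a binary split such as a particular race versus all others, or male versus female.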
The number of images used for face attribute recognition fine-tuning may be smaller than the number used for face recognition fine-tuning. For example, the face image library used for face recognition fine-tuning contains 500 people with about 40 to 100 images each (about 20,000 to 50,000 in total), whereas the images used for attribute fine-tuning contain 2 classes (e.g., yellow versus non-yellow images) with about 1,000 images per class (about 2,000 in total).
As shown in Table 1, during face attribute recognition fine-tuning the batch size is 32, the learning rate is 0.00001, and training runs for 10 epochs in total. In other embodiments, the fine-tuning parameters may differ: for example, a batch size of 8, a learning rate of 0.0001, and 8 training epochs.
As shown in fig. 2, in the preferred embodiment, and similarly to the face recognition fine-tuning process, face detection, face alignment and face normalization are performed on each image before it is input into the neural network model during face attribute recognition fine-tuning.
Similar to pre-training, during face attribute recognition fine-tuning a neural network training algorithm, such as a back propagation algorithm, may be used to train the face-recognition-fine-tuned neural network model.
The neural network model training device can train a neural network model suitable for face attribute recognition through the units 401 to 403.
The neural network model training device of the third embodiment pre-trains the neural network model, fine-tunes the pre-trained model on a face recognition task, and fine-tunes the resulting model on a face attribute recognition task. The third embodiment uses a neural network model with a deep learning structure: the idea of deep learning is applied to pre-train the model, and the idea of transfer learning is applied first to fine-tune the pre-trained model on face recognition, yielding a model suited to face recognition, and then to fine-tune that model on face attribute recognition, yielding a model suited to face attribute recognition. The device of the third embodiment thus combines deep learning and transfer learning and applies them to face attribute recognition, so that a model suited to face attribute recognition can be trained even with a limited sample size.
Example four
Fig. 5 is a structural diagram of a face attribute recognition apparatus according to a fourth embodiment of the present invention. As shown in fig. 5, the face attribute recognition apparatus 50 may include: a pre-training unit 501, a first fine-tuning unit 502, a second fine-tuning unit 503, and a recognition unit 504.
And a pre-training unit 501, configured to pre-train the neural network model.
The pre-training unit 501 in this embodiment is the same as the pre-training unit 401 in the third embodiment, and specific reference is made to the description of the pre-training unit 401 in the third embodiment, which is not repeated herein.
The first fine tuning unit 502 is configured to perform fine tuning of a face recognition task on the pre-trained neural network model.
The first trimming unit 502 in this embodiment is the same as the first trimming unit 402 in the third embodiment, and please refer to the description related to the first trimming unit 402 in the third embodiment, which is not repeated herein.
And a second fine-tuning unit 503, configured to perform fine-tuning of the face attribute recognition task on the neural network model after the fine-tuning of the face recognition task.
The second trimming unit 503 in this embodiment is the same as the second trimming unit 403 in the third embodiment, and please refer to the description related to the second trimming unit 403 in the third embodiment, which is not repeated herein.
And the identifying unit 504 is configured to perform face attribute identification on the given image by using the neural network model after the fine adjustment of the face attribute identification task.
When face attribute recognition needs to be performed on a given image, the image is input into the neural network model fine-tuned for the face attribute recognition task. The model receives the given image and recognizes it to produce a recognition result, namely the face attributes determined from the image, such as race, gender, age and expression.
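The final classification step reduces to picking the class with the highest network output (an argmax). The label tuple below is the illustrative ethnicity example from the text; the output values are hypothetical:

```python
def classify_attribute(outputs, labels=("yellow", "white", "black", "brown")):
    """Map the network's per-class outputs to an attribute label by
    choosing the class with the highest score (argmax)."""
    best = max(range(len(outputs)), key=lambda i: outputs[i])
    return labels[best]
```

For a gender model the same function would be called with a 2-element output vector and labels such as ("male", "female").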
The face attribute recognition device of the fourth embodiment pre-trains the neural network model, fine-tunes the pre-trained model on a face recognition task, fine-tunes the resulting model on a face attribute recognition task, and performs face attribute recognition on a given image using the model fine-tuned for the face attribute recognition task. The fourth embodiment uses a neural network model with a deep learning structure: the idea of deep learning is applied to pre-train the model, and the idea of transfer learning is applied first to fine-tune the pre-trained model on face recognition, yielding a model suited to face recognition, and then to fine-tune that model on face attribute recognition, yielding a model suited to face attribute recognition. The device of the fourth embodiment thus combines deep learning and transfer learning and applies them to face attribute recognition, so that a model suited to face attribute recognition can be trained even with a limited sample size, and better recognition results can be obtained with that model.
EXAMPLE five
Fig. 6 is a schematic diagram of a terminal according to a fifth embodiment of the present invention. The terminal 1 comprises a memory 20, a processor 30 and a computer program 40, such as a neural network model training program or a face attribute recognition program, stored in the memory 20 and executable on the processor 30. When the computer program 40 is executed by the processor 30, the steps in the embodiment of the neural network model training method or the face attribute recognition method, such as the steps 101 to 103 shown in fig. 1 or the steps 301 to 304 shown in fig. 3, are implemented. Alternatively, the processor 30, when executing the computer program 40, implements the functions of the modules/units in the above-described apparatus embodiments, such as the units 401 to 403 in fig. 4 or the units 501 to 504 in fig. 5.
Illustratively, the computer program 40 may be partitioned into one or more modules/units, which are stored in the memory 20 and executed by the processor 30 to implement the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 40 in the terminal 1. For example, the computer program 40 may be divided into a pre-training unit 401, a first fine-tuning unit 402, and a second fine-tuning unit 403 in fig. 4, or divided into a pre-training unit 501, a first fine-tuning unit 502, a second fine-tuning unit 503, and an identification unit 504 in fig. 5, where the specific functions of each unit are described in the third embodiment and the fourth embodiment.
The terminal 1 may be a desktop computer, a notebook, a palm computer, a cloud server, or another computing device. It will be appreciated by those skilled in the art that fig. 6 is only an example of the terminal 1 and does not constitute a limitation of it; the terminal may comprise more or fewer components than shown, combine some components, or use different components. For example, the terminal 1 may further comprise input and output devices, network access devices, buses, and so on.
The processor 30 may be a Central Processing Unit (CPU), another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor 30 may be any conventional processor. The processor 30 is the control center of the terminal 1 and connects the various parts of the whole terminal 1 through various interfaces and lines.
The memory 20 may be used for storing the computer program 40 and/or the modules/units; the processor 30 implements the various functions of the terminal 1 by running or executing the computer programs and/or modules/units stored in the memory 20 and calling data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and the application programs required by at least one function (such as a sound playing function or an image playing function), and the data storage area may store data created according to the use of the terminal 1 (such as audio data or a phonebook). In addition, the memory 20 may include high-speed random access memory and may also include non-volatile memory, such as a hard disk, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a Flash Card, at least one magnetic disk storage device, a flash memory device, or another non-volatile solid state storage device.
The integrated modules/units of the terminal 1, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on this understanding, all or part of the flow of the methods of the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and which, when executed by a processor, implements the steps of the method embodiments. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content of the computer readable medium may be increased or decreased as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer readable media do not include electrical carrier signals and telecommunications signals.
In the embodiments provided in the present invention, it should be understood that the disclosed terminal and method can be implemented in other manners. For example, the above-described terminal embodiments are merely illustrative, and for example, the division of the unit is only one logical function division, and there may be another division manner in actual implementation.
In addition, functional units in the embodiments of the present invention may be integrated into the same processing unit, or each unit may exist alone physically, or two or more units are integrated into the same unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. Several units or terminals recited in the terminal claims may also be implemented by one and the same unit or terminal, either in software or in hardware. The terms first, second, etc. are used to denote names, but not any particular order.
Finally, it should be noted that the above embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention is described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (7)

1. A face attribute recognition method is applied to a scene combining deep learning and transfer learning, and comprises the following steps:
pre-training a neural network model by using a natural scene image to obtain pre-trained network model parameters, wherein the neural network model comprises 8 convolution layers, 4 down-sampling layers, 1 Dropout layer and 2 full-connection layers, the down-sampling layer is connected behind each two convolution layers, and the Dropout layer is positioned behind the convolution layers and the down-sampling layers and in front of the full-connection layers;
carrying out face recognition task fine adjustment on the pre-trained neural network model by using a face image so as to carry out face recognition, adjusting the parameters of the pre-trained network model to obtain the parameters of the network model after the face recognition task is finely adjusted, and changing the output of the last layer of the pre-trained neural network model to be consistent with the number of people contained in the face image when the face recognition task is finely adjusted;
fine tuning a face attribute recognition task by using the image with the face attributes marked to the neural network model after the fine tuning of the face recognition task so as to recognize the face attributes, adjusting the network model parameters after the fine tuning of the face recognition task to obtain the network model parameters after the fine tuning of the face attribute recognition task, and changing the output of the last layer of the neural network model after the fine tuning of the face recognition task into the class consistent with the class contained in the image with the face attributes marked when the fine tuning of the face attribute recognition task is performed;
carrying out face attribute recognition on a given image by using the neural network model after the fine tuning of the face attribute recognition task;
in the step of performing the fine adjustment of the face recognition task on the pre-trained neural network model and the fine adjustment of the face attribute recognition task on the neural network model after the fine adjustment of the face recognition task, the learning rate of the last layer of the neural network model is 10 times that of the other layers;
and the number of the natural scene images is greater than that of the face images, and the number of the face images is greater than that of the images marked with the face attributes.
2. The method of claim 1, wherein the neural network model is a convolutional network model in which each convolutional layer is followed by an activation function.
3. The method of claim 1, wherein the facial attributes comprise race, gender, age, expression.
4. The method of claim 1, wherein in the step of performing the fine tuning of the face recognition task on the pre-trained neural network model and the fine tuning of the face attribute recognition task on the neural network model after the fine tuning of the face recognition task, the steps of face detection, face alignment and face normalization are performed on the image before the image is input into the neural network model.
5. A face attribute recognition apparatus, applied to a combined scene of deep learning and transfer learning, the apparatus comprising:
the device comprises a pre-training unit, a pre-training unit and a pre-training unit, wherein the pre-training unit is used for pre-training a neural network model by using a natural scene image to obtain pre-trained network model parameters, the neural network model comprises 8 convolution layers, 4 down-sampling layers, 1 Dropout layer and 2 full-connection layers, the down-sampling layer is connected behind each two convolution layers, and the Dropout layer is positioned behind the convolution layers and the down-sampling layers and in front of the full-connection layers;
a first fine-tuning unit configured to fine-tune the pre-trained neural network model on the face recognition task using face images so as to perform face recognition, and to adjust the pre-trained network model parameters to obtain network model parameters fine-tuned for the face recognition task, wherein, when the face recognition task is fine-tuned, the output of the last layer of the pre-trained neural network model is changed to be consistent with the number of persons contained in the face images;
a second fine-tuning unit configured to fine-tune the neural network model fine-tuned for the face recognition task on the face attribute recognition task using images labeled with face attributes so as to recognize face attributes, and to adjust the network model parameters fine-tuned for the face recognition task to obtain network model parameters fine-tuned for the face attribute recognition task, wherein, when the face attribute recognition task is fine-tuned, the output of the last layer of the neural network model fine-tuned for the face recognition task is changed to be consistent with the number of categories contained in the images labeled with face attributes;
a recognition unit configured to perform face attribute recognition on a given image by using the neural network model fine-tuned for the face attribute recognition task;
wherein, when the fine-tuning of the face recognition task is performed on the pre-trained neural network model and the fine-tuning of the face attribute recognition task is performed on the neural network model fine-tuned for the face recognition task, the learning rate of the last layer of the neural network model is 10 times that of the other layers;
and wherein the number of natural scene images is greater than the number of face images, and the number of face images is greater than the number of images labeled with face attributes.
6. A terminal, characterized in that the terminal comprises a processor, and the processor, when executing a computer program stored in a memory, implements the steps of the face attribute recognition method according to any one of claims 1-4.
7. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed by a processor, implements the steps of the face attribute recognition method according to any one of claims 1-4.
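Both the method and the apparatus claims swap the size of the network's last layer at each transfer stage: one output per identity during face recognition fine-tuning, then one output per attribute category afterwards. A minimal sketch of that bookkeeping follows; all class counts are assumptions for illustration, not values from the patent.

```python
class FinalLayer:
    """Stand-in for a fully-connected output layer."""
    def __init__(self, out_features):
        self.out_features = out_features


class TransferModel:
    """Keeps the shared backbone and re-initializes only the head."""
    def __init__(self, num_outputs):
        self.shared_params = "pre-trained backbone"  # carried across stages
        self.final = FinalLayer(num_outputs)

    def replace_final(self, num_outputs):
        # Only the last layer is rebuilt; everything else transfers.
        self.final = FinalLayer(num_outputs)


model = TransferModel(num_outputs=1000)  # pre-training classes (assumed)
model.replace_final(num_outputs=500)     # one output per identity (assumed)
model.replace_final(num_outputs=4)       # attribute categories (assumed)
```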
CN201710591603.7A 2016-08-16 2017-07-19 Face attribute identification method, device, terminal and storage medium Active CN107766787B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610674170.7A CN106295584A (en) 2016-08-16 2016-08-16 Recognition method of crowd attributes based on deep transfer learning
CN2016106741707 2016-08-16

Publications (2)

Publication Number Publication Date
CN107766787A CN107766787A (en) 2018-03-06
CN107766787B true CN107766787B (en) 2023-04-07

Family

ID=57678081

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201610674170.7A Pending CN106295584A (en) 2016-08-16 2016-08-16 Recognition method of crowd attributes based on deep transfer learning
CN201710591603.7A Active CN107766787B (en) 2016-08-16 2017-07-19 Face attribute identification method, device, terminal and storage medium

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201610674170.7A Pending CN106295584A (en) 2016-08-16 2016-08-16 Recognition method of crowd attributes based on deep transfer learning

Country Status (1)

Country Link
CN (2) CN106295584A (en)

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295584A (en) * 2016-08-16 2017-01-04 深圳云天励飞技术有限公司 Depth migration study is in the recognition methods of crowd's attribute
CN107220600B (en) * 2017-05-17 2019-09-10 清华大学深圳研究生院 A kind of Picture Generation Method and generation confrontation network based on deep learning
CN108932459B (en) * 2017-05-26 2021-12-10 富士通株式会社 Face recognition model training method and device and face recognition method
CN107704877B (en) * 2017-10-09 2020-05-29 哈尔滨工业大学深圳研究生院 Image privacy perception method based on deep learning
CN107730497B (en) * 2017-10-27 2021-09-10 哈尔滨工业大学 Intravascular plaque attribute analysis method based on deep migration learning
CN108108662B (en) * 2017-11-24 2021-05-25 深圳市华尊科技股份有限公司 Deep neural network recognition model and recognition method
CN107862387B (en) * 2017-12-05 2022-07-08 深圳地平线机器人科技有限公司 Method and apparatus for training supervised machine learning models
CN109934242A (en) * 2017-12-15 2019-06-25 北京京东尚科信息技术有限公司 Image identification method and device
CN108256447A (en) * 2017-12-29 2018-07-06 广州海昇计算机科技有限公司 A kind of unmanned plane video analysis method based on deep neural network
CN108427972B (en) * 2018-04-24 2024-06-07 云南佳叶现代农业发展有限公司 Tobacco leaf classification method and system based on online learning
CN108596138A (en) * 2018-05-03 2018-09-28 南京大学 A kind of face identification method based on migration hierarchical network
CN110516514B (en) * 2018-05-22 2022-09-30 杭州海康威视数字技术股份有限公司 Modeling method and device of target detection model
CN110738071A (en) * 2018-07-18 2020-01-31 浙江中正智能科技有限公司 face algorithm model training method based on deep learning and transfer learning
WO2020034902A1 (en) * 2018-08-11 2020-02-20 昆山美卓智能科技有限公司 Smart desk having status monitoring function, monitoring system server, and monitoring method
CN109190514B (en) * 2018-08-14 2021-10-01 电子科技大学 Face attribute recognition method and system based on bidirectional long-short term memory network
CN109271884A (en) * 2018-08-29 2019-01-25 厦门理工学院 Face character recognition methods, device, terminal device and storage medium
CN109214386B (en) * 2018-09-14 2020-11-24 京东数字科技控股有限公司 Method and apparatus for generating image recognition model
CN109377501A (en) * 2018-09-30 2019-02-22 上海鹰觉科技有限公司 Remote sensing images naval vessel dividing method and system based on transfer learning
CN109300170B (en) * 2018-10-18 2022-10-28 云南大学 Method for transmitting shadow of portrait photo
CN111325311B (en) * 2018-12-14 2024-03-29 深圳云天励飞技术有限公司 Neural network model generation method for image recognition and related equipment
CN109743580A (en) * 2018-12-24 2019-05-10 秒针信息技术有限公司 A kind of method for processing video frequency and device, storage medium and processor
CN109743579A (en) * 2018-12-24 2019-05-10 秒针信息技术有限公司 A kind of method for processing video frequency and device, storage medium and processor
CN109726170A (en) * 2018-12-26 2019-05-07 上海新储集成电路有限公司 A kind of on-chip system chip of artificial intelligence
CN110008841B (en) * 2019-03-08 2021-07-06 中国华戎科技集团有限公司 Expression recognition model construction method and system
CN110009059B (en) * 2019-04-16 2022-03-29 北京字节跳动网络技术有限公司 Method and apparatus for generating a model
CN110110611A (en) * 2019-04-16 2019-08-09 深圳壹账通智能科技有限公司 Portrait attribute model construction method, device, computer equipment and storage medium
CN111383357A (en) * 2019-05-31 2020-07-07 纵目科技(上海)股份有限公司 Network model fine-tuning method, system, terminal and storage medium adapting to target data set
CN110569780A (en) * 2019-09-03 2019-12-13 北京清帆科技有限公司 high-precision face recognition method based on deep transfer learning
CN110879993B (en) * 2019-11-29 2023-03-14 北京市商汤科技开发有限公司 Neural network training method, and execution method and device of face recognition task
CN111651989B (en) * 2020-04-13 2024-04-02 上海明略人工智能(集团)有限公司 Named entity recognition method and device, storage medium and electronic device
CN112069898A (en) * 2020-08-05 2020-12-11 中国电子科技集团公司电子科学研究院 Method and device for recognizing human face group attribute based on transfer learning
CN112801054B (en) * 2021-04-01 2021-06-22 腾讯科技(深圳)有限公司 Face recognition model processing method, face recognition method and device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100492399C (en) * 2007-03-15 2009-05-27 上海交通大学 Method for making human face posture estimation utilizing dimension reduction method
CN104217216B (en) * 2014-09-01 2017-10-17 华为技术有限公司 Generate method and apparatus, the method and apparatus for detecting target of detection model
CN104408470B (en) * 2014-12-01 2017-07-25 中科创达软件股份有限公司 The sex-screening method learnt in advance based on average face
CN104751140A (en) * 2015-03-30 2015-07-01 常州大学 Three-dimensional face recognition algorithm based on deep learning SDAE theory and application thereof in field of finance
CN106295584A (en) * 2016-08-16 2017-01-04 深圳云天励飞技术有限公司 Depth migration study is in the recognition methods of crowd's attribute

Also Published As

Publication number Publication date
CN106295584A (en) 2017-01-04
CN107766787A (en) 2018-03-06

Similar Documents

Publication Publication Date Title
CN107766787B (en) Face attribute identification method, device, terminal and storage medium
CN108780519B (en) Structural learning of convolutional neural networks
Guo et al. Driver drowsiness detection using hybrid convolutional neural network and long short-term memory
Barros et al. Developing crossmodal expression recognition based on a deep neural model
Mane et al. A survey on supervised convolutional neural network and its major applications
Xie et al. Scut-fbp: A benchmark dataset for facial beauty perception
Do et al. Deep neural network-based fusion model for emotion recognition using visual data
WO2022042043A1 (en) Machine learning model training method and apparatus, and electronic device
CN108021908B (en) Face age group identification method and device, computer device and readable storage medium
CN111160350A (en) Portrait segmentation method, model training method, device, medium and electronic equipment
US20220292394A1 (en) Multi-scale deep supervision based reverse attention model
CN107871103B (en) Face authentication method and device
CN110427802B (en) AU detection method and device, electronic equipment and storage medium
Song et al. Dynamic facial models for video-based dimensional affect estimation
Sheng et al. Adaptive semantic-spatio-temporal graph convolutional network for lip reading
CN111860046A (en) Facial expression recognition method for improving MobileNet model
Tereikovska et al. Recognition of emotions by facial Geometry using a capsule neural network
Kächele et al. Revisiting the EmotiW challenge: how wild is it really? Classification of human emotions in movie snippets based on multiple features
Gorbova et al. Integrating vision and language for first-impression personality analysis
CN114863572A (en) Myoelectric gesture recognition method of multi-channel heterogeneous sensor
CN108806699B (en) Voice feedback method and device, storage medium and electronic equipment
CN113870863A (en) Voiceprint recognition method and device, storage medium and electronic equipment
Yu et al. A joint multi-task cnn for cross-age face recognition
CN115171042A (en) Student classroom behavior identification method, device, terminal equipment and medium
Patil et al. Gender recognition and age approximation using deep learning techniques

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant