WO2020114118A1 - Facial attribute identification method and device, storage medium and processor - Google Patents

Facial attribute identification method and device, storage medium and processor

Info

Publication number
WO2020114118A1
Authority
WO
WIPO (PCT)
Prior art keywords
attribute
facial
attributes
data
module
Prior art date
Application number
PCT/CN2019/112478
Other languages
French (fr)
Chinese (zh)
Inventor
刘若鹏
栾琳
刘凯品
Original Assignee
深圳光启空间技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳光启空间技术有限公司
Publication of WO2020114118A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Definitions

  • the present invention relates to the field of image recognition technology, and more specifically, to a facial attribute recognition method, device, storage medium, and processor.
  • The face is a very important biometric feature. It has a complex structure with many fine details, and it carries a large amount of information such as gender, race, age, and expression. A normal adult can easily interpret this facial information, but giving a computer the same ability, so that it can perform brain-like reasoning in place of humans, remains a scientific problem that researchers urgently need to solve.
  • Some existing attribute recognition methods train multiple deep convolutional neural networks and then perform score fusion or feature fusion for further training. This approach is laborious and complex, which is not conducive to practical application.
  • Other methods merge a facial identity authentication network and an attribute recognition network into a fusion network and jointly learn identity features and facial attribute features in a multi-task network; a cost-sensitive weighting function makes training independent of the data distribution in the target domain, achieving balanced training in the source data domain. Although the modified fusion framework adds only a small number of parameters, it still adds extra computational load.
  • The technical problem to be solved by the present invention is to provide a facial attribute recognition method, device, storage medium and processor that reduce network parameters by changing the number of parameters of certain layers of the network model, such as the fully connected layer, thereby reducing GPU memory occupancy, and that feed data of multiple attributes into a convolutional neural network through data merging so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.
  • An embodiment of the present invention provides a facial attribute recognition method, comprising: establishing data sets of multiple facial attributes; merging the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes; establishing a multi-task deep convolutional neural network to train the data set collection to obtain a multi-attribute network model that can recognize multiple facial attributes; and using the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
  • Establishing the data sets of multiple facial attributes includes: preprocessing the data sets of the multiple facial attributes; normalizing the preprocessed data sets; and labeling the multiple facial attributes after normalization.
  • Merging the data sets of multiple facial attributes to form the data set collection of the multiple facial attributes includes: setting the input format of each facial attribute data set to the array form D1[n, c, w, h]; and merging the data sets of the multiple facial attributes along the first dimension of the input array, that is, the number of images. Assuming the number of facial attribute types is m, the merged data has the form D2[m×n, c, w, h], where n, c, w, and h are respectively the number, channel count, width, and height of the images of the multiple facial attribute data sets input into the deep convolutional neural network.
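  • As a concrete illustration of the merging step above (a minimal NumPy sketch, not from the patent; the array contents and sizes are placeholders):

```python
import numpy as np

# Hypothetical per-attribute data sets, each already in the D1[n, c, w, h] layout.
age_data = np.random.rand(100, 3, 128, 128).astype(np.float32)         # 100 age-labeled images
gender_data = np.random.rand(100, 3, 128, 128).astype(np.float32)      # 100 gender-labeled images
face_shape_data = np.random.rand(100, 3, 128, 128).astype(np.float32)  # 100 face-shape-labeled images

attribute_datasets = [age_data, gender_data, face_shape_data]  # m = 3 attribute types

# Merge along the first dimension (the image count), giving D2[m*n, c, w, h].
merged = np.concatenate(attribute_datasets, axis=0)
print(merged.shape)  # (300, 3, 128, 128)
```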
  • Establishing the multi-task deep convolutional network to train the collection of facial attribute data sets includes: using 4 residual modules of a deep residual network.
  • The first residual module is composed of two small residual modules. The first small residual module consists of a convolutional layer with 64 3×3 kernels connected to another convolutional layer with 64 3×3 kernels; its identity (shortcut) branch uses a 1×1 convolution with 64 kernels, and the sum of the two branches is fed into the second small residual module. The second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module.
  • The other three residual modules have the same structure as the first residual module, but with 128, 256, and 512 convolution kernels, respectively.
  • Establishing a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes and obtain a multi-attribute network model that can recognize multiple facial attributes includes: according to the trained network model, obtaining in the image recognition stage, in a single pass, the highest-probability recognition result for each of the multiple facial attributes, that is, the multi-attribute values of the facial attributes.
  • preprocessing the data set of the plurality of face attributes includes: using a multi-task convolutional neural network algorithm to detect the face in the picture to obtain a face image.
  • Detecting the face in the picture using the multi-task convolutional neural network algorithm to obtain the face image includes: the convolutional neural network adopts a fully connected stage and uses bounding box vectors to fine-tune the candidate windows, so that the coordinate points of the face in the image are detected and a face image is obtained.
  • normalizing the data sets of the plurality of face attributes includes: normalizing the width and height of the face image of the data sets of the plurality of face attributes to 128 pixels ⁇ 128 pixels.
  • Labeling the facial attributes includes: labeling whether the face wears glasses, whether it wears a mask, and its hairstyle, face shape, age, and gender.
  • Obtaining, according to the trained network model, the highest-probability recognition result for each facial attribute in a single pass during the image recognition stage includes: when the loss function is computed during network training, splitting the merged data along the first dimension of each array according to the number of images corresponding to each attribute, so that the final result for each attribute has the array form D3[n, c, w, h]; the split is performed according to the corresponding image counts and each attribute's data is fed into its corresponding loss function; the loss function takes the form of a probability, with the formula:
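  • The formula is rendered as an image in the original publication; from the symbol definitions that follow, it is the standard softmax:

    S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}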
  • Here j is the index of the current sub-attribute of a given facial attribute, k indexes the sub-attributes of that attribute, T is the total number of sub-attributes of that attribute, and S_j is the probability of the j-th sub-attribute; k runs from 1, and the probabilities of the T sub-attributes sum to 1. The term e^{a_j} is the exponential of the score for one attribute label; a_j and a_k denote the scores of particular attribute labels, and the denominator sums the exponentials over all attribute labels, yielding the probability that the facial attribute takes a specific label.
  • an embodiment of the present invention provides a storage medium, the storage medium including a stored program, wherein the face attribute recognition method described above is executed when the program runs.
  • an embodiment of the present invention provides a processor for running a program, wherein the facial attribute recognition method described above is executed when the program is run.
  • An embodiment of the present invention provides a facial recognition device, comprising a data establishment module, a data merging module, a training module, and a prediction module that are electrically connected in sequence: the data establishment module is configured to establish data sets of multiple facial attributes; the data merging module is configured to merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes; the training module is configured to establish a multi-task deep convolutional neural network to train the data set collection to obtain a multi-attribute network model that can recognize multiple facial attributes; and the prediction module is configured to use the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
  • The data establishment module includes: a data preprocessing module configured to preprocess facial image data; a data normalization module configured to normalize the facial image data; and a data labeling module configured to label facial attributes.
  • the data merging module includes: a first storage module, the first storage module is used to store facial images of each attribute.
  • The training module includes a deep residual network comprising 4 residual modules.
  • The first residual module is composed of two small residual modules. The first small residual module consists of a convolutional layer with 64 3×3 kernels connected to another convolutional layer with 64 3×3 kernels, and its identity (shortcut) branch uses a 1×1 convolution with 64 kernels.
  • The second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module. The other three residual modules have the same structure as the first residual module, with 128, 256, and 512 convolution kernels, respectively.
  • The prediction module includes a third storage module configured to store the highest-probability recognition result for each facial attribute obtained in a single pass from the trained network model, that is, the multi-attribute values of the facial attributes.
  • the data normalization processing module for normalizing the face image data includes: normalizing the width and height of the face image of the data set of multiple face attributes to 128 pixels ⁇ 128 pixels.
  • The present invention reduces network parameters, that is, it changes the number of parameters of certain layers of the network model, such as the fully connected layer, which lowers GPU memory occupancy; and through data merging it feeds data of multiple attributes into the convolutional neural network, so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.
  • FIG. 1 is a flowchart of a face attribute recognition method of the present invention.
  • FIG. 2 is a basic block diagram of the residual network used in the facial attribute recognition method of the present invention.
  • FIG. 3 is a flowchart of an embodiment of the facial attribute recognition method of the present invention.
  • FIG. 4 is a diagram of facial shape categories.
  • FIG. 5 is a structural diagram of a face attribute recognition device of the present invention.
  • FIG. 1 is a flowchart of the facial attribute recognition method of the present invention. As shown in FIG. 1, the facial attribute recognition method of the present invention includes the following steps:
  • the step S11 of establishing a plurality of facial attribute data sets includes: preprocessing, normalizing, and labeling the multiple facial attribute data sets.
  • Preprocessing the dataset of multiple facial attributes further includes:
  • the multi-task convolutional neural network algorithm is used to detect the face in the picture to obtain a facial image.
  • A fully connected stage of the convolutional neural network can be used, with bounding box vectors fine-tuning the candidate windows to detect the coordinate points of the face in the image. After the face is detected, the face image is cropped out for training.
  • Normalizing the image to a certain size is determined by the network input size. Different networks have different input sizes.
  • normalizing the data set of multiple face attributes includes: normalizing the width and height of the face image of the data set of multiple face attributes to but not limited to 128 pixels ⁇ 128 pixels.
  • Labeling facial attributes includes but is not limited to: labeling face wearing glasses, wearing a mask, hairstyle, face shape, age, gender attributes, etc.
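  • The detection, cropping, and 128 pixel × 128 pixel normalization described above can be sketched as follows (illustrative only; detect_face stands in for the MTCNN-style detector and here simply returns the whole frame so the sketch stays self-contained):

```python
import cv2  # OpenCV for image loading and resizing

def detect_face(image):
    """Placeholder for an MTCNN-style detector; a real detector would return
    the bounding box (x, y, w, h) of the face found in the image."""
    h, w = image.shape[:2]
    return 0, 0, w, h

def preprocess(image_path, size=128):
    image = cv2.imread(image_path)
    x, y, w, h = detect_face(image)        # face coordinates from the detector
    face = image[y:y + h, x:x + w]         # crop the detected face region
    return cv2.resize(face, (size, size))  # normalize to 128 x 128 pixels
```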
  • Step S12, merging the data sets of multiple facial attributes to form a data set collection of the multiple facial attributes, includes:
  • the data sets of multiple face attributes are combined according to the first dimension of the input array, that is, the number of pictures. Assuming that the number of types of face attributes is m, the data form generated after the merge is D2[m ⁇ n, c, w, h];
  • n, c, w, and h are the number, channel number, width, and height of the data set images of the multiple facial attributes input into the deep convolutional neural network, respectively.
  • Step S13, establishing a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes and obtain a multi-attribute network model that can recognize multiple facial attributes, includes: using 4 residual modules of a deep residual network. The first residual module is composed of two small residual modules; the first small residual module consists of a convolutional layer with 64 3×3 kernels connected to another convolutional layer with 64 3×3 kernels, its identity (shortcut) branch uses a 1×1 convolution with 64 kernels, and the sum of the two branches is fed into the second small residual module. The second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module. The other three residual modules have the same structure as the first residual module, but with 128, 256, and 512 convolution kernels, respectively. The facial attribute data of the images to be recognized are input into the deep residual network for training.
  • The number of residual modules can be chosen according to actual needs; four residual modules are used here as an example.
  • Step S14 uses the multi-attribute network model to perform multi-attribute prediction on the facial attributes of the image to be recognized, so as to identify multiple facial attributes in that image: the highest-probability recognition result for each of the multiple facial attributes, that is, the multi-attribute values of the facial attributes, is obtained in a single pass.
  • During network training, when the loss function is computed, the merged data is split along the first dimension of each array according to the number of images corresponding to each attribute, so that the final result for each attribute has the array form D3[n, c, w, h];
  • the loss function takes the form of probability, the formula is:
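  • As above, the formula (shown as an image in the original publication) is the softmax:

    S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}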
  • Here j is the index of the current sub-attribute of a given facial attribute, k indexes the sub-attributes of that attribute, and T is the total number of sub-attributes of that attribute. S_j is the probability of the j-th sub-attribute; k runs from 1, and the probabilities of the T sub-attributes sum to 1. The term e^{a_j} is the exponential of the score for one attribute label; a_j and a_k denote the scores of particular attribute labels, and the denominator sums the exponentials over all attribute labels, yielding the probability that the facial attribute takes a specific label.
  • For example, if the face-shape attribute has the sub-attributes round face, square face, and sharp face, the probabilities of these sub-attributes sum to 1, and if the round-face probability is the highest, the face is determined to belong to the round-face sub-attribute among the three.
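  • A minimal PyTorch-style sketch of this split-and-per-attribute-loss step (not the patent's code; the attribute names, sub-attribute counts, and linear heads are placeholder assumptions):

```python
import torch
import torch.nn as nn

# Merged batch produced by the shared backbone, ordered attribute-by-attribute:
# first n "age" images, then n "gender" images, then n "face_shape" images.
n = 32                                                        # images per attribute in this batch
sub_attribute_counts = {"age": 4, "gender": 2, "face_shape": 7}

backbone_features = torch.randn(3 * n, 512)                   # stand-in for shared conv features
heads = {name: nn.Linear(512, k) for name, k in sub_attribute_counts.items()}
labels = {name: torch.randint(0, k, (n,)) for name, k in sub_attribute_counts.items()}

loss_fn = nn.CrossEntropyLoss()                               # softmax + negative log-likelihood
total_loss = 0.0
for i, (name, k) in enumerate(sub_attribute_counts.items()):
    chunk = backbone_features[i * n:(i + 1) * n]              # the D3[n, ...] slice for this attribute
    total_loss = total_loss + loss_fn(heads[name](chunk), labels[name])
total_loss.backward()  # gradients flow into each attribute head (and, in the real network, the shared backbone)
```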
  • The parameters of the deep convolutional neural network are reduced, that is, the number of parameters of certain layers of the network model, such as the fully connected layer, is changed, which lowers GPU memory occupancy; and through data merging, data of multiple attributes are fed into the convolutional neural network, so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.
  • An embodiment of the present invention further provides a storage medium, the storage medium includes a stored program, wherein the above-mentioned facial attribute recognition method flow is executed when the above program runs.
  • The above storage medium may be configured to store program code for performing the flow of the facial attribute recognition method described above.
  • the above storage medium may include, but is not limited to: a USB flash drive, a read-only memory (Read-Only Memory, ROM for short), a random access memory (Random Access Memory, RAM for short), Various media that can store program codes, such as removable hard disks, magnetic disks, or optical disks.
  • Because the storage requirement is reduced, the stored facial attribute recognition program runs faster, so the recognition of all facial attributes is completed quickly and efficiently.
  • An embodiment of the present invention further provides a processor for running a program, wherein, when the program runs, the steps in the facial attribute recognition method described above are executed.
  • The above program is used to perform the steps of the facial attribute recognition method described above.
  • FIG. 3 is a flowchart of an embodiment of the facial attribute recognition method of the present invention.
  • The flow of the present invention is described using age, gender, and face shape as the training attributes.
  • Samples for training data preprocessing include data pictures with age information, data pictures with gender information, and data pictures with face shape information.
  • The first step is to detect the face in each picture using the multi-task convolutional neural network (MTCNN) algorithm. The algorithm uses a cascade of three convolutional neural networks to detect the face: a fully convolutional network produces face candidate windows and bounding-box regression, and non-maximum suppression is used to remove overlapping windows.
  • The convolutional neural network in this step then adopts a fully connected stage and uses bounding box vectors to fine-tune the candidate windows, detecting the coordinate points of the face in the image, after which the face image is cropped out for training. A convolutional neural network with one more layer is used to further refine the position of the face detection box, and a face image normalized to 128 pixels × 128 pixels is obtained.
  • The second step is to annotate the facial attributes. The labeled attributes include whether glasses are worn, whether a mask is worn, hairstyle, face shape, age, gender, and so on.
  • For labeling, an initial model is trained on the existing public attribute data set Market1501 and used to coarsely classify the pictures obtained by face detection, so that each attribute is given a numeric label; the fine classification of the facial attribute pictures is then done manually, thereby constructing data sets of the different facial attributes. For example, different attributes of the same picture can be placed in different folders.
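  • One plausible way to turn such per-attribute folders into labeled lists is sketched below (an assumption about directory layout, not part of the patent):

```python
from pathlib import Path

def build_attribute_datasets(root):
    """Assumes a layout like root/age/0_juvenile/*.jpg, root/age/1_youth/*.jpg,
    root/gender/0_male/*.jpg, ...: one folder per attribute, one subfolder per label."""
    datasets = {}
    for attr_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        samples = []
        for label_idx, label_dir in enumerate(sorted(d for d in attr_dir.iterdir() if d.is_dir())):
            samples += [(str(img), label_idx) for img in label_dir.glob("*.jpg")]
        datasets[attr_dir.name] = samples   # e.g. {"age": [(path, label), ...], ...}
    return datasets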
  • the multi-task network learning mechanism can enable the network to share the characteristics of other data.
  • many deep learning networks only focus on a single task, so that many data features with the same commonality cannot be shared.
  • Multi-task learning can solve this problem well. It is an inductive transfer mechanism. The main goal is to use the specific field information hidden in the training signals of multiple related tasks to improve the generalization ability.
  • Multi-task learning trains multiple tasks in parallel using a shared representation to accomplish this goal; that is, while learning one problem, the shared representation can be used to acquire knowledge about other related problems. Multi-task learning is therefore a method that applies knowledge gained on one problem to other related problems.
  • This solution uses data merging to prepare the multi-task learning training data. The data merging process (taking face shape, gender, and age data as an example; other attributes follow the same steps) proceeds as follows:
  • Each facial attribute data set is an array with the format D1[n, c, w, h], where n is the number of images input into the network, c is the number of channels of the image (generally 3), and w and h are the width and height of the image.
  • The data sets are merged along the first dimension of the input array, that is, the number of images, and the merged data has the form D2[m×n, c, w, h], where n, c, w, and h are respectively the number, channel count, width, and height of the images of the multiple facial attribute data sets input into the deep convolutional neural network.
  • the above two steps complete the data preparation phase of multi-task learning.
  • the network can learn the correlation between each data set to achieve the purpose of multi-task learning.
  • the present invention uses the deep residual network as the basic network to perform feature extraction on the network input data.
  • The residual network was proposed in 2015 and performs better than many other deep networks.
  • The invention uses the classification ideas of detection networks to modify the deep residual network, reducing the residual modules in the network and discarding the fully connected layer to form a new network structure. This makes the network structure simple, greatly reduces the size of the network model, and also greatly reduces memory usage.
  • the structure of the deep residual network used in this scheme is relatively simple.
  • In the residual module, x is the input and F(x) is the original mapping of the convolutional neural network; ReLU is the activation function in the deep residual module, and H(x) is the output function of the module. The deep residual module combines the original mapping F(x) with the input x to form the module output, H(x) = F(x) + x.
  • This solution uses 4 residual modules of the deep residual network. The first residual module is composed of two small residual modules; the first small residual module consists of a convolutional layer with 64 3×3 kernels connected to another convolutional layer with 64 3×3 kernels, and its identity (shortcut) branch uses a 1×1 convolution with 64 kernels. The second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module. The other three residual modules have the same structure as the first residual module, but with 128, 256, and 512 convolution kernels, respectively.
  • Only the four residual modules in the deep residual network are used, which makes the structure smaller and more conducive to feature extraction of the network input data. Through the depth residual network, the depth features of the image can be well extracted, which can facilitate better classification.
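  • A compact PyTorch sketch of a backbone along these lines is given below (an interpretation for illustration, not the patented network itself; the 64/128/256/512 widths follow the description, while the strides and global pooling are assumptions):

```python
import torch.nn as nn

class SmallResBlock(nn.Module):
    """Two 3x3 conv layers plus a shortcut, summed (H(x) = F(x) + x)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        # 1x1 projection when shapes change (first small block); plain identity otherwise (second small block).
        self.shortcut = (nn.Conv2d(in_ch, out_ch, 1, stride=stride)
                         if (in_ch != out_ch or stride != 1) else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

class MultiAttributeBackbone(nn.Module):
    """Four residual stages (64, 128, 256, 512 kernels); no fully connected layer."""
    def __init__(self, widths=(64, 128, 256, 512)):
        super().__init__()
        stages, in_ch = [], 3
        for w in widths:
            stages += [SmallResBlock(in_ch, w, stride=2), SmallResBlock(w, w)]
            in_ch = w
        self.stages = nn.Sequential(*stages)
        self.pool = nn.AdaptiveAvgPool2d(1)      # assumption: global pooling before the loss layers

    def forward(self, x):                        # x: [m*n, 3, 128, 128]
        return self.pool(self.stages(x)).flatten(1)   # [m*n, 512] shared features
```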
  • the data is merged before the network training begins to facilitate the sharing of data.
  • After the network's training and learning, the network has learned good features, and the number of images in the training data does not change during the learning process. Therefore, in order to obtain the learning status of each facial attribute data set, when the network output is used to compute the loss function, it is split along the first dimension of each attribute array and cut per attribute, so that the final result for each attribute is still in the array form D3[n, c, w, h], and each attribute's data is fed into its corresponding loss function.
  • the data set after cutting is input into the corresponding loss function layer, so that the corresponding loss function calculation is performed, the corresponding weight update is obtained, and the corresponding category output of each facial attribute data set is obtained.
  • the loss function can calculate the loss function corresponding to each attribute data based on the shared features learned.
  • The loss function used in this scheme takes the form of a probability (the softmax given above), where a_j and a_k denote the scores of particular attribute labels and the denominator sums the exponentials over all attribute labels, yielding the probability that the facial attribute takes a specific label.
  • the weights are updated using the back propagation algorithm to make the network reach the optimal state, so as to obtain the facial attributes corresponding to the input samples, such as age recognition set, gender recognition set, and face recognition set.
  • Taking age as an example, age is labeled with four labels: juvenile, youth, middle age, and old age.
  • The output of the last layer of the network is evaluated against the loss function and the corresponding labels, producing four probabilities that represent the probabilities of the corresponding categories (juvenile, youth, middle age, old age). If the juvenile probability is the highest, the face is judged to be a juvenile.
  • In the final network structure of the present invention there is a loss function corresponding to each attribute, so each attribute computes its own probabilities through its corresponding loss function, and the category it belongs to is determined from those probabilities.
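  • At inference time this amounts to taking the argmax of each attribute head's softmax output, as in the following sketch (the label sets are the examples used in the text, not an exhaustive list):

```python
import torch

LABELS = {
    "age": ["juvenile", "youth", "middle age", "old age"],
    "gender": ["male", "female"],
}

def decode_predictions(logits_by_attr):
    """logits_by_attr: dict mapping attribute name -> 1-D logit tensor for one face."""
    results = {}
    for attr, logits in logits_by_attr.items():
        probs = torch.softmax(logits, dim=0)                  # probabilities sum to 1 per attribute
        results[attr] = LABELS[attr][int(probs.argmax())]
    return results

# Example: a face whose "age" logits favour the first class is judged "juvenile".
print(decode_predictions({"age": torch.tensor([2.1, 0.3, -0.5, -1.0]),
                          "gender": torch.tensor([0.2, 1.4])}))
```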
  • the weights are updated using the back-propagation algorithm to make the network reach the optimal state, and the facial attributes corresponding to the input samples are obtained.
  • Taking gender as an example, gender is labeled with two labels: male and female.
  • the output of the last layer of the network will be calculated according to the loss function and the corresponding label, and two probabilities will be obtained, which respectively represent the probability of the corresponding category (male and female). If the probability of male is the highest, it will be judged as male.
  • the present invention will have a loss function corresponding to each attribute in the final network structure, so each attribute will calculate its corresponding probability according to its corresponding loss function. Determine which attribute it belongs to based on the probability.
  • the weights are updated using the back-propagation algorithm to make the network reach the optimal state, and the facial attributes corresponding to the input samples are obtained.
  • FIG. 4 is a diagram of face shape categories. According to the most common face types, faces can be roughly divided into 7 types: round face, oval face, heart-shaped face (inverted triangle face), diamond face, square face, long face, and pear-shaped face (regular triangle face).
  • the judgment process for the face shape (the type of face is artificially specified by the practitioner according to the business needs, taking the above seven face types as an example) is:
  • Step 1: after data preprocessing, the data sets for the training network and the test network are established through the face detection and data annotation processes.
  • Step 2: the training data of each attribute are input into the network, the data are merged (taking gender, age, and face shape as examples), fed into the network model, and the network model is trained.
  • Step 3: after the network model is trained, the picture to be recognized is input into the trained model, and the gender, age, and face shape of the face in the picture are obtained.
  • the elements of the age recognition set output through the network may be any of the above ages (juvenile, youth, middle age, old age).
  • The elements of the face shape recognition set output by the network may be any of the above face shapes (round face, oval face, heart-shaped face (inverted triangle face), diamond face, square face, long face, pear-shaped face (regular triangle face)).
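  • Putting the three steps together, the prediction stage could look like the sketch below (the backbone, preprocessing helper, attribute heads, and label table are the hypothetical pieces from the earlier sketches, with linear heads over pooled features):

```python
import torch

def recognize_attributes(image_path, backbone, heads, preprocess, labels):
    """Run one image through the shared backbone and every attribute head."""
    face = preprocess(image_path)                             # detect, crop, resize to 128x128
    x = torch.from_numpy(face).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        features = backbone(x)                                # shared deep-residual features
        return {attr: labels[attr][int(head(features).softmax(dim=1).argmax())]
                for attr, head in heads.items()}              # e.g. {"age": "youth", "gender": "male", ...}
```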
  • The parameters of the deep convolutional neural network are reduced, that is, the number of parameters of certain layers of the network model, such as the fully connected layer, is changed, which lowers GPU memory occupancy; and through data merging, data of multiple attributes are fed into the convolutional neural network, so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.
  • An embodiment of the present invention further provides a storage medium, which includes a stored program, where the flow of the facial attribute recognition method described in Embodiment 4 is executed when the above program runs.
  • the above storage medium may include, but is not limited to: a USB flash drive, a read-only memory (Read-Only Memory, ROM for short), a random access memory (Random Access Memory, RAM for short), Various media that can store program codes, such as removable hard disks, magnetic disks, or optical disks.
  • Because the storage requirement is reduced, the stored facial attribute recognition program runs faster, so the recognition of all facial attributes is completed quickly and efficiently.
  • An embodiment of the present invention further provides a processor for running a program, where the steps in the facial attribute recognition method described in Embodiment 4 are executed when the program is run.
  • FIG. 5 is a structural diagram of a face attribute recognition device of the present invention.
  • The device includes: a data establishment module configured to establish data sets of multiple facial attributes; a data merging module configured to merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes; a training module configured to establish a multi-task deep convolutional neural network to train the data set collection to obtain a multi-attribute network model that can recognize multiple facial attributes; and a prediction module configured to use the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
  • The data establishment module includes: a data preprocessing module configured to preprocess facial image data; a data normalization module configured to normalize the facial image data; and a data labeling module configured to label facial attributes.
  • The data normalization module normalizes the width and height of the face images of the multi-attribute data sets to, but not limited to, 128 pixels × 128 pixels. Labeling facial attributes means labeling whether the face wears glasses, whether it wears a mask, and what its hairstyle, face shape, age, and gender are.
  • the data merging module includes: a first storage module; the first storage module is used to store facial image data of each attribute.
  • The software program dynamically generates the input array D1[n, c, w, h] and merges the data sets of the multiple facial attributes along the first dimension of the input array, that is, the number of images.
  • The data array generated after merging is D2[m×n, c, w, h], where n is the number of images input into the deep convolutional neural network, c is the number of channels of those images, w is their width, and h is their height.
  • The training module includes a deep residual network comprising 4 residual modules. The first residual module is composed of two small residual modules; the first small residual module consists of a convolutional layer with 64 3×3 kernels connected to another convolutional layer with 64 3×3 kernels, and its identity (shortcut) branch uses a 1×1 convolution with 64 kernels. The second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module. The other three residual modules have the same structure as the first residual module, with 128, 256, and 512 convolution kernels, respectively.
  • the prediction module includes a second storage module, which is used to store a recognition result with the highest probability of each attribute of the face obtained according to the trained network model, that is, a multi-attribute value of the face attribute.
  • The parameters of the deep convolutional neural network are reduced, that is, the number of parameters of certain layers of the network model, such as the fully connected layer, is changed, which lowers GPU memory occupancy; and through data merging, data of multiple attributes are fed into the convolutional neural network, so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.
  • The process of the original facial attribute recognition method is: establish data sets corresponding to age, gender, and face shape (for both training and testing); feed age, gender, and face shape into their corresponding network models (three network models have to be trained at this point); and input the test pictures into the three trained network models separately, obtaining the attribute recognition result of each network model from the attribute with the highest probability in that model.
  • the multi-task deep learning recognition method is used for facial recognition.
  • the number of parameters in the network determines the size of the network, and the size of the network determines the size of the network memory.
  • The method of reducing network parameters is adopted, that is, the number of parameters of certain layers in the network (such as fully connected layers) is changed, thereby reducing GPU memory occupancy; and through data merging, data of multiple attributes are fed into the convolutional neural network so that all attributes are learned and trained in one network, allowing the same network to complete the recognition of all facial attributes.
  • The last fully connected layer needs to learn 1000 parameters. Since the fully connected layer occupies most of the network parameters, this solution prunes the network: the fully connected layer of the original network structure is removed, so that the convolutional layers connect directly to the loss layer and the probabilities of the facial attributes are regressed directly, completing the modification and pruning of the network structure. Such pruning reduces the number of parameters to compute, so the network size and GPU memory usage are reduced.
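  • A sketch of what replacing the fully connected layer with per-attribute 1×1 convolutional heads could look like (an illustrative interpretation; the exact pruned topology is not spelled out in the text):

```python
import torch.nn as nn

def make_attribute_heads(feature_channels, sub_attribute_counts):
    """Per-attribute 1x1 conv heads feeding the loss layers directly,
    in place of a single large fully connected classifier."""
    return nn.ModuleDict({
        attr: nn.Sequential(
            nn.Conv2d(feature_channels, k, kernel_size=1),  # k scores for this attribute
            nn.AdaptiveAvgPool2d(1),                        # collapse spatial dimensions
            nn.Flatten(),                                   # -> [batch, k] logits
        )
        for attr, k in sub_attribute_counts.items()
    })

heads = make_attribute_heads(512, {"age": 4, "gender": 2, "face_shape": 7})
```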
  • The facial attribute recognition method, device, storage medium and processor reduce the network parameters, that is, they change the number of parameters of certain layers of the network model, such as the fully connected layer, which lowers GPU memory occupancy; and through data merging they feed data of multiple attributes into the convolutional neural network, so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.

Abstract

The present invention provides a facial attribute identification method and device, a storage medium and a processor. The method comprises: building data sets of multiple facial attributes; merging the data sets of the multiple facial attributes to form a data set collection; building a multi-task deep convolutional network to train the data set collection of the multiple facial attributes, to obtain a network model capable of identifying the multiple facial attributes; and using the trained multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be identified, so as to identify the multiple attributes in the image to be identified. Network parameters are reduced by changing the number of parameters of certain layers of the network model, for example the fully connected layer, which lowers memory occupancy; and data merging is used to input data of multiple attributes into a convolutional neural network, so that all attributes are learned and trained in the network and the identification of all facial attributes is completed rapidly and efficiently by the same network.

Description

Facial attribute recognition method, device, storage medium and processor

[Technical Field]

The present invention relates to the field of image recognition technology, and more specifically, to a facial attribute recognition method, device, storage medium, and processor.

[Background Art]

With the rapid development of computer vision and deep learning, faces need to be recognized continuously in fields such as security, intelligent video surveillance, urban public safety, and accident early warning. However, as task requirements grow, the need is no longer limited to authenticating facial identity; it is increasingly important to recognize and authenticate facial sub-attributes as well. This makes it easier to identify a person and to deploy facial information for monitoring and control in the security field. Using facial recognition and facial attributes together in security applications also makes task requirements clearer.

The face is a very important biometric feature. It has a complex structure with many fine details, and it carries a large amount of information such as gender, race, age, and expression. A normal adult can easily interpret this facial information, but giving a computer the same ability, so that it can perform brain-like reasoning in place of humans, remains a scientific problem that researchers urgently need to solve.

Some existing attribute recognition methods train multiple deep convolutional neural networks and then perform score fusion or feature fusion for further training. This approach is laborious and complex, which is not conducive to practical application. Other methods merge a facial identity authentication network and an attribute recognition network into a fusion network and jointly learn identity features and facial attribute features in a multi-task network; a cost-sensitive weighting function makes training independent of the data distribution in the target domain, achieving balanced training in the source data domain. Although the modified fusion framework adds only a small number of parameters, it still adds extra computational load.

It has also been proposed to fuse multiple attributes into one network, but the network structure and loss function are too complex, which hinders training, and facial attribute recognition is used only to improve facial recognition accuracy. Although such a scheme is also a multi-attribute network structure, it actually feeds face images carrying multiple attribute labels into the network, so that one image corresponds to multiple labels, which is not conducive to network training and learning. At the same time, the network model used is a deep residual network structure, which makes the model too large, requires excessive computing resources, and has poor practicality.
[Summary of the Invention]

The technical problem to be solved by the present invention is to provide a facial attribute recognition method, device, storage medium and processor that reduce network parameters by changing the number of parameters of certain layers of the network model, such as the fully connected layer, thereby reducing GPU memory occupancy, and that feed data of multiple attributes into a convolutional neural network through data merging so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.

To solve the above technical problem, in one aspect, an embodiment of the present invention provides a facial attribute recognition method, comprising: establishing data sets of multiple facial attributes; merging the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes; establishing a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes to obtain a multi-attribute network model that can recognize multiple facial attributes; and using the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.

Preferably, establishing the data sets of multiple facial attributes includes: preprocessing the data sets of the multiple facial attributes; normalizing the preprocessed data sets of the multiple facial attributes; and labeling the multiple facial attributes after normalization.

Preferably, merging the data sets of multiple facial attributes to form the data set collection of the multiple facial attributes includes: setting the input format of each facial attribute data set to the array form D1[n, c, w, h]; and merging the data sets of the multiple facial attributes along the first dimension of the input array, that is, the number of images. Assuming the number of facial attribute types is m, the merged data has the form D2[m×n, c, w, h], where n, c, w, and h are respectively the number, channel count, width, and height of the images of the multiple facial attribute data sets input into the deep convolutional neural network.

Preferably, establishing a multi-task deep convolutional network to train the data set collection of multiple facial attributes includes: using 4 residual modules of a deep residual network;

inputting the facial attribute data of the image to be recognized into the deep residual network for network training;

wherein the first residual module is composed of two small residual modules; the first small residual module consists of a convolutional layer with 64 3×3 kernels connected to another convolutional layer with 64 3×3 kernels, its identity (shortcut) branch uses a 1×1 convolution with 64 kernels, and the sum of the two is fed into the second small residual module; the second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module;

the other three residual modules have the same structure as the first residual module, but with 128, 256, and 512 convolution kernels, respectively.

Preferably, establishing a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes to obtain a multi-attribute network model that can recognize multiple facial attributes includes:

according to the trained network model, obtaining in the image recognition stage, in a single pass, the highest-probability recognition result for each of the multiple facial attributes, that is, the multi-attribute values of the facial attributes.

Preferably, preprocessing the data sets of the multiple facial attributes includes: detecting the face in each picture using a multi-task convolutional neural network algorithm to obtain a face image.

Preferably, detecting the face in the picture using the multi-task convolutional neural network algorithm to obtain the face image includes: the convolutional neural network adopts a fully connected stage and uses bounding box vectors to fine-tune the candidate windows, so that the coordinate points of the face in the image are detected and a face image is obtained.

Preferably, normalizing the data sets of the multiple facial attributes includes: normalizing the width and height of the face images of the data sets of the multiple facial attributes to 128 pixels × 128 pixels.

Preferably, labeling the facial attributes includes: labeling whether the face wears glasses, whether it wears a mask, and its hairstyle, face shape, age, and gender.
Preferably, obtaining, according to the trained network model, the highest-probability recognition result for each of the multiple facial attributes in a single pass during the image recognition stage includes: when the loss function is computed during network training, splitting the merged data along the first dimension of each array according to the number of images corresponding to each attribute, so that the final result for each attribute has the array form D3[n, c, w, h]; performing the split according to the corresponding number of images and feeding each attribute's data into its corresponding loss function; the loss function takes the form of a probability, with the formula:

S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}

where j is the index of the current sub-attribute of a given facial attribute, k indexes the sub-attributes of that attribute, T is the total number of sub-attributes of that attribute, and S_j is the probability of the j-th sub-attribute; k runs from 1, and the probabilities of the T sub-attributes sum to 1. The term e^{a_j} is the exponential of the score for one attribute label; a_j and a_k denote the scores of particular attribute labels, and the denominator sums the exponentials over all attribute labels, yielding the probability that the facial attribute takes a specific label.
In another aspect, an embodiment of the present invention provides a storage medium, the storage medium comprising a stored program, wherein the above facial attribute recognition method is executed when the program runs.

In another aspect, an embodiment of the present invention provides a processor configured to run a program, wherein the above facial attribute recognition method is executed when the program runs.

In another aspect, an embodiment of the present invention provides a facial recognition device, the device comprising a data establishment module, a data merging module, a training module, and a prediction module that are electrically connected in sequence: the data establishment module is configured to establish data sets of multiple facial attributes; the data merging module is configured to merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes; the training module is configured to establish a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes to obtain a multi-attribute network model that can recognize multiple facial attributes; and the prediction module is configured to use the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.

Preferably, the data establishment module includes: a data preprocessing module configured to preprocess facial image data; a data normalization module configured to normalize the facial image data; and a data labeling module configured to label facial attributes.

Preferably, the data merging module includes a first storage module configured to store the face images of each attribute.

Preferably, the training module includes a deep residual network comprising 4 residual modules, wherein the first residual module is composed of two small residual modules; the first small residual module consists of a convolutional layer with 64 3×3 kernels connected to another convolutional layer with 64 3×3 kernels, its identity (shortcut) branch uses a 1×1 convolution with 64 kernels, and the sum of the two is fed into the second small residual module; the second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module; the other three residual modules have the same structure as the first residual module, with 128, 256, and 512 convolution kernels, respectively.

Preferably, the prediction module includes a third storage module configured to store the highest-probability recognition result for each facial attribute obtained in a single pass from the trained network model, that is, the multi-attribute values of the facial attributes.

Preferably, the data normalization module normalizing the facial image data includes: normalizing the width and height of the face images of the multi-attribute data sets to 128 pixels × 128 pixels.

Compared with the prior art, the above technical solution has the following advantages: the present invention reduces network parameters by changing the number of parameters of certain layers of the network model, such as the fully connected layer, which lowers GPU memory occupancy, and feeds data of multiple attributes into the convolutional neural network through data merging so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.
[Brief Description of the Drawings]
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of the facial attribute recognition method of the present invention.
FIG. 2 is a basic block diagram of the residual network used in the facial attribute recognition method of the present invention.
FIG. 3 is a flowchart of an embodiment of the facial attribute recognition method of the present invention.
FIG. 4 is a diagram of facial shape categories.
FIG. 5 is a structural diagram of the facial attribute recognition device of the present invention.
[Detailed Description of the Embodiments]
The technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Embodiment One
FIG. 1 is a flowchart of the facial attribute recognition method of the present invention. As shown in FIG. 1, the facial attribute recognition method of the present invention includes the following steps:
S11: establish data sets of multiple facial attributes;
S12: merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes;
S13: establish a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes, obtaining a multi-attribute network model capable of recognizing multiple facial attributes;
S14: use the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
In a specific implementation, establishing the data sets of multiple facial attributes in step S11 includes preprocessing, normalizing and labeling the data sets of the multiple facial attributes.
Preprocessing the data sets of multiple facial attributes further includes:
detecting the faces in pictures with a multi-task convolutional neural network algorithm to obtain facial images. A fully connected stage of the convolutional neural network can be used to fine-tune the candidate windows with bounding-box vectors and detect the coordinates of the face in the image; after a face is detected, the facial image is cropped out for training.
The size to which the images are normalized is determined by the network input size, and different networks have different input sizes. As one embodiment of the present invention, normalizing the data sets of multiple facial attributes includes normalizing the width and height of the facial images in the data sets to, but not limited to, 128 pixels × 128 pixels.
Labeling facial attributes includes, but is not limited to, labeling attributes such as wearing glasses, wearing a mask, hairstyle, face shape, age and gender.
In a specific implementation, merging the data sets of multiple facial attributes in step S12 to form the data set collection of the multiple facial attributes includes:
setting the input format of each facial attribute data set to the array form D1[n, c, w, h];
merging the data sets of the multiple facial attributes along the first dimension of the input array, i.e. the number of pictures; assuming the number of facial attribute types is m, the merged data takes the form D2[m×n, c, w, h];
where n, c, w and h are respectively the number, the number of channels, the width and the height of the data set images of the multiple facial attributes input into the deep convolutional neural network.
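The merge along the first dimension can be sketched in a few lines of NumPy (an illustrative example, not part of the original disclosure; the array names and the values chosen for m and n below are assumptions):

```python
import numpy as np

# Hypothetical example: m = 3 attribute data sets (e.g. age, gender, face shape),
# each holding n = 4 images with c = 3 channels of size w = h = 128.
n, c, w, h = 4, 3, 128, 128
d1_age = np.random.rand(n, c, w, h).astype(np.float32)         # D1 for the age data set
d1_gender = np.random.rand(n, c, w, h).astype(np.float32)      # D1 for the gender data set
d1_face_shape = np.random.rand(n, c, w, h).astype(np.float32)  # D1 for the face shape data set

# Merge along the first (image-count) dimension: D2[m*n, c, w, h]
d2 = np.concatenate([d1_age, d1_gender, d1_face_shape], axis=0)
print(d2.shape)  # (12, 3, 128, 128), i.e. (m*n, c, w, h)
```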
In a specific implementation, establishing the multi-task deep convolutional neural network in step S13 to train the data set collection of the multiple facial attributes and obtain a multi-attribute network model capable of recognizing multiple facial attributes includes: using four residual modules of a deep residual network, where the first residual module is composed of two small residual modules; in the first small residual module, a convolutional layer with 64 kernels of size 3×3 is connected to another convolutional layer with 64 kernels of size 3×3, and its identity mapping uses 64 1×1 convolutions; the two are summed and fed into the second small residual module; the second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module; the other three residual modules have the same structure as the first, but with 128, 256 and 512 convolution kernels respectively; and inputting the facial attribute data of the images to be recognized into the deep residual network for network training. In a specific implementation, the number of residual modules can be selected according to actual needs; four residual modules are used here as an example.
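As an illustrative sketch only (PyTorch is an assumption; the patent does not name a framework), the first "small residual module" described above, with two 3×3 convolution layers and a 1×1 convolution as the identity mapping, could look like this:

```python
import torch
import torch.nn as nn

class SmallResidualBlock(nn.Module):
    """Sketch of one 'small residual module': two 3x3 conv layers plus an
    identity mapping implemented here as a 1x1 convolution (an assumption)."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.shortcut(x)          # identity mapping via 1x1 convolution
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + residual)     # sum of the two branches, H(x) = F(x) + x

block = SmallResidualBlock(3, 64)
print(block(torch.randn(1, 3, 128, 128)).shape)  # torch.Size([1, 64, 128, 128])
```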
In a specific implementation, using the multi-attribute network model in step S14 to perform multi-attribute prediction on the facial attributes of the image to be recognized, so as to identify the multiple facial attributes in the image, includes: according to the trained network model, obtaining in a single pass, at the picture recognition stage, the recognition result with the highest probability for each of the multiple facial attributes, i.e. the multi-attribute values of the facial attributes.
In the network training stage, when the loss function is computed, the merged data is split along the first dimension of each array and cut apart according to the number of pictures corresponding to each attribute, so that the final result is the array form D3[n, c, w, h] for each attribute.
The splitting is carried out according to the corresponding number of pictures, and each attribute's data is fed into its corresponding loss function.
The loss function takes a probability (softmax) form, with the formula

S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}

where j is the index of the current sub-attribute of one facial attribute, k is the index of a sub-attribute of that facial attribute, T is the total number of sub-attributes of that facial attribute, and S_j is the probability of the j-th sub-attribute of that facial attribute; k takes values starting from 1, and the probabilities of the T sub-attributes sum to 1. Here e^{a_j} is the exponential form of one attribute value of the data, a_j and a_k denote attribute value labels of the facial attribute, and the denominator sums the exponentials of all attribute labels, so that the result is the probability that the facial attribute equals a specific attribute label.
For example, consider the face shape attribute, which has three sub-attributes: round face, square face and pointed face. Then T = 3 and k takes values starting from 1. In a single judgment, the probabilities of the sub-attributes round face, square face and pointed face sum to 1. For the round face, the network output computed by the neural network is a_1 = 3; for the square face, a_2 = 1; and for the pointed face, a_3 = -3. According to the probability form of the loss function,

S_j = \frac{e^{a_j}}{\sum_{k=1}^{3} e^{a_k}},

the probability that this judgment is a round face is

S_1 = \frac{e^{3}}{e^{3} + e^{1} + e^{-3}} \approx 0.88,

the probability that it is a square face is

S_2 = \frac{e^{1}}{e^{3} + e^{1} + e^{-3}} \approx 0.12,

and the probability that it is a pointed face is

S_3 = \frac{e^{-3}}{e^{3} + e^{1} + e^{-3}} \approx 0.002.

Comparing the probability values S_1, S_2 and S_3 for the round, square and pointed faces, the face shape in this judgment is determined to belong to the round-face sub-attribute among the three sub-attributes.
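The three probabilities above can be verified with a few lines of Python (a quick check, not part of the original text); the outputs a_1 = 3, a_2 = 1 and a_3 = -3 are taken from the example:

```python
import math

# Network outputs for the three face-shape sub-attributes from the example
a = {"round": 3.0, "square": 1.0, "pointed": -3.0}

denom = sum(math.exp(v) for v in a.values())                   # sum of exponentials over all labels
probs = {name: math.exp(v) / denom for name, v in a.items()}   # softmax probabilities

print(probs)                      # round ≈ 0.879, square ≈ 0.119, pointed ≈ 0.002
print(max(probs, key=probs.get))  # 'round' -- the predicted face shape
```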
It can be seen that by adopting the facial attribute recognition method of the present invention, the parameters of the deep convolutional neural network are reduced, i.e. the number of parameters of certain layers in the network model, such as the fully connected layer, is changed, which reduces the network's video memory occupancy; and by merging data, data of multiple attributes are input into the convolutional neural network so that all attributes are learned and trained in the same network, allowing the same network to complete the recognition of all facial attributes quickly and efficiently.
Embodiment Two
An embodiment of the present invention further provides a storage medium. The storage medium includes a stored program, and the above facial attribute recognition method flow is executed when the program runs.
Optionally, in this embodiment, the above storage medium may be configured to store program code for executing the following facial attribute recognition method flow:
S11: establish data sets of multiple facial attributes;
S12: merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes;
S13: establish a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes, obtaining a multi-attribute network model capable of recognizing multiple facial attributes;
S14: use the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
Optionally, in this embodiment, the above storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
It can be seen that by using the storage medium of the present invention, the storage capacity is reduced and the program implementing the built-in facial attribute recognition method flow runs faster, so that the recognition of all facial attributes is completed quickly and efficiently.
Embodiment Three
An embodiment of the present invention further provides a processor. The processor is used to run a program, and the steps of the above facial attribute recognition method are executed when the program runs.
Optionally, in this embodiment, the above program is used to execute the following steps:
S11: establish data sets of multiple facial attributes;
S12: merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes;
S13: establish a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes, obtaining a multi-attribute network model capable of recognizing multiple facial attributes;
S14: use the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
Optionally, for specific examples in this embodiment, reference may be made to the above embodiment and the examples described in the specific implementations, which are not repeated here.
It can be seen that by using the processor of the present invention, the amount of data to be processed is reduced and the program implementing the built-in facial attribute recognition method flow runs faster, so that the recognition of all facial attributes is completed quickly and efficiently.
Embodiment Four
FIG. 3 is a flowchart of an embodiment of the facial attribute recognition method of the present invention. In today's society, practitioners can obtain many kinds of data from a variety of channels, which provides good data support for deep learning. To introduce the specific embodiments of the present invention more conveniently, the flow of the invention is described below with age, gender and face shape as the training sets.
Assume that enough age, gender and face shape samples in various poses have been collected; these samples serve as the training data to be preprocessed. Facial attribute recognition starts, and the training data are then preprocessed. The preprocessing samples include data pictures with age information, data pictures with gender information and data pictures with face shape information.
Before the training set for the network input is constructed, the pictures must be preprocessed. The data preprocessing of this scheme follows the steps below:
Step 1: detect the faces in the pictures using the multi-task convolutional neural network (MTCNN) algorithm, which cascades three convolutional neural networks for face detection. First, a fully convolutional network obtains candidate face windows and bounding-box regression values; after the candidate windows are calibrated according to the bounding boxes, overlapping windows are removed by non-maximum suppression. Second, the convolutional network of this stage adopts a fully connected form and fine-tunes the candidate windows with the bounding-box vectors to detect the coordinates of the face in the image; after the face is detected, the facial image is cropped out for training. Finally, a convolutional network with one more layer than the previous one further refines the position of the face detection box. After the face detection algorithm, facial pictures normalized to 128 pixels × 128 pixels are obtained (a code sketch of this detect-crop-resize step is given after Step 2 below).
Step 2: label the facial attributes. The labeled attributes include whether glasses are worn, whether a mask is worn, hairstyle, face shape, age, gender, and so on. For labeling, an initial model is trained on the existing public attribute data set Market1501 and used to coarsely classify the pictures obtained from face detection, giving each attribute a numerical label; the coarsely classified facial attribute pictures are then finely classified by hand, thereby constructing data sets of different facial attributes. For example, different attributes of the same picture can be placed in different folders.
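A minimal sketch of the detect-crop-normalize operation in Step 1, assuming the third-party `mtcnn` Python package and OpenCV (the file name and helper name below are hypothetical):

```python
import cv2
from mtcnn import MTCNN  # third-party 'mtcnn' package (an assumption; any MTCNN implementation works)

detector = MTCNN()

def crop_normalized_face(image_path: str, size: int = 128):
    """Detect the first face in a picture, crop it, and resize it to size x size."""
    bgr = cv2.imread(image_path)                   # 'image_path' is a hypothetical input file
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)     # MTCNN expects RGB input
    detections = detector.detect_faces(rgb)        # list of dicts with a 'box' = [x, y, w, h]
    if not detections:
        return None
    x, y, w, h = detections[0]["box"]
    face = rgb[max(y, 0): y + h, max(x, 0): x + w]  # crop the detected face region
    return cv2.resize(face, (size, size))           # normalize to 128 x 128 pixels

face_128 = crop_normalized_face("sample.jpg")
```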
Next comes the training data merging and multi-task training data preparation stage. A multi-task learning mechanism allows the network to share features across data sets. Many deep learning networks today focus on only a single task, so many data characteristics with common properties cannot be shared. Multi-task learning solves this problem well: it is an inductive transfer mechanism whose main goal is to improve generalization by exploiting the domain-specific information hidden in the training signals of multiple related tasks. Multi-task learning achieves this by training multiple tasks in parallel with a shared representation, i.e. while learning one problem, the shared representation can be used to acquire knowledge about other related problems. Multi-task learning is therefore a method focused on applying the knowledge gained in solving one problem to other related problems. This scheme prepares the multi-task learning training data by merging data. The data merging process (taking face shape, gender and age data as an example; other attributes can follow the same steps) observes the following steps:
a. A precondition for merging is that every image in each data set input into the network has the same number of channels, width and height; this has already been ensured in the data preprocessing stage.
b. When the three facial attribute data sets are merged, the input format of each facial attribute data set is an array of the form D1[n, c, w, h], where n is the number of pictures input into the network, c is the number of image channels (generally 3), and w and h are the width and height of the images. Under condition a, the data sets are merged along the first dimension of the input array, i.e. the number of pictures; the merged data takes the form D2[m×n, c, w, h], where m is the number of attribute types and n, c, w and h are respectively the number, the number of channels, the width and the height of the data set images of the multiple facial attributes input into the deep convolutional neural network.
The above two steps complete the data preparation stage of multi-task learning. By feeding the merged data into the network for learning, the network can learn the correlations between the data sets, achieving the purpose of multi-task learning.
Next comes the deep convolutional neural network training stage. The present invention uses a deep residual network as the base network for feature extraction from the network input data. The residual network, proposed in 2015, outperforms other deep networks, but its very deep structure and large number of parameters lead to high video memory usage. The present invention applies the classification idea of detection networks to modify the deep residual network: the residual modules in the network are reduced and the fully connected layer is discarded, forming a new network structure. This makes the network structure simpler and greatly reduces the network model, while the video memory usage is also reduced substantially.
The deep residual network structure used in this scheme is relatively simple. As shown in FIG. 2, x is the input of the residual module, F(x) is the original mapping of the convolutional neural network, relu is the activation function in the deep residual module, and H(x) is the output function of the deep residual module; the deep residual module forms the network output function from the original mapping F(x) and the input x, i.e. H(x) = F(x) + x. This scheme uses four residual modules of the deep residual network. The first residual module is composed of two small residual modules: in the first small residual module, a convolutional layer with 64 kernels of size 3×3 is connected to another convolutional layer with 64 kernels of size 3×3, and its identity mapping uses 64 1×1 convolutions; the two are summed and fed into the second small residual module. The second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module. The other three residual modules have the same structure as the first, but with 128, 256 and 512 convolution kernels respectively. Only these four residual modules of the deep residual network are used, giving a smaller structure that is better suited to feature extraction from the network input data. Through the deep residual network, deep features of the image can be extracted well, which facilitates better classification.
Next comes the network output stage. The data were merged before network training began so that the shared knowledge in the data could be learned. Through training, the network learns good features, and the number of training images does not change during learning. Therefore, to obtain the learning status of each facial attribute data set, when the network output computes the loss function, the data is split along the first dimension of each attribute array and cut apart per attribute, so that the final result is still the array form D3[n, c, w, h] for each attribute, and each attribute's data is fed into its corresponding loss function.
When the data are merged, they are merged according to the number of facial attribute pictures input into the network. When the data are separated, they are cut according to the number of pictures of each data set input into the network, which must be kept identical to the original number.
The split data sets are fed into their corresponding loss function layers, where the corresponding loss functions are computed, the corresponding weight updates are obtained, and the class output of each facial attribute data set is obtained.
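A hedged PyTorch sketch of this split-and-loss step (the backbone feature size, attribute names and per-attribute linear heads are assumptions used only for illustration; the patent itself discards the fully connected layer and feeds the convolutional output to the loss layers):

```python
import torch
import torch.nn as nn

# Assumed setup: 'features' is the backbone output for a merged batch D2 of m = 3
# attribute data sets with n = 4 images each, and each attribute has its own head.
n = 4
heads = nn.ModuleDict({
    "age": nn.Linear(512, 4),         # juvenile / youth / middle-aged / elderly
    "gender": nn.Linear(512, 2),      # male / female
    "face_shape": nn.Linear(512, 7),  # seven face-shape sub-attributes
})
criterion = nn.CrossEntropyLoss()     # softmax-based probability loss

features = torch.randn(3 * n, 512)    # backbone output for the merged batch
labels = {k: torch.randint(0, h.out_features, (n,)) for k, h in heads.items()}

# Split back into per-attribute chunks D3[n, ...] along the first dimension
chunks = dict(zip(heads.keys(), torch.split(features, n, dim=0)))

total_loss = sum(criterion(heads[k](chunks[k]), labels[k]) for k in heads)
total_loss.backward()                 # back-propagation updates the weights
```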
The loss function computes, from the learned shared features, the loss corresponding to each attribute's data. The loss function used in this scheme takes a probability (softmax) form, with the formula

S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}

where e^{a_j} is the exponential form of one result of the data, a_j and a_k denote attribute value labels of one facial attribute, and the denominator sums the exponentials of all attribute labels, so that the result is the probability that the facial attribute equals a specific attribute label. According to the network loss function, the weights are updated with the back-propagation algorithm so that the network reaches an optimal state, thereby obtaining the facial attributes corresponding to the input samples, such as the age recognition set, the gender recognition set and the face shape recognition set.
Taking age as an example, age is labeled with four labels: juvenile, youth, middle-aged and elderly. The output of the last layer of the network is computed according to the loss function and the corresponding labels, giving four probabilities that represent the corresponding classes (juvenile, youth, middle-aged, elderly); if the probability of juvenile is the highest, the sample is judged to be a juvenile. In the final network structure of the present invention there is a loss function for each attribute, so each attribute computes its corresponding probabilities from its own loss function, and the class it belongs to is decided by the magnitude of the probabilities. According to the network loss function, the weights are updated with the back-propagation algorithm so that the network reaches an optimal state, thereby obtaining the facial attributes corresponding to the input samples.
Taking gender as an example, gender is labeled with two labels: male and female. The output of the last layer of the network is computed according to the loss function and the corresponding labels, giving two probabilities that represent the corresponding classes (male, female); if the probability of male is the highest, the sample is judged to be male. In the final network structure of the present invention there is a loss function for each attribute, so each attribute computes its corresponding probabilities from its own loss function, and the class it belongs to is decided by the magnitude of the probabilities. According to the network loss function, the weights are updated with the back-propagation algorithm so that the network reaches an optimal state, thereby obtaining the facial attributes corresponding to the input samples.
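For illustration, a prediction step over such per-attribute outputs might look like the following sketch (the dummy backbone, the heads and the label lists are assumptions; each head is assumed to have a matching entry in the label table):

```python
import torch
import torch.nn as nn

# Illustrative inference sketch: a dummy 'backbone' and per-attribute 'heads' stand in
# for the trained shared network and its per-attribute output layers.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 128 * 128, 512))
heads = nn.ModuleDict({"age": nn.Linear(512, 4), "gender": nn.Linear(512, 2)})
labels = {
    "age": ["juvenile", "youth", "middle-aged", "elderly"],
    "gender": ["male", "female"],
}

@torch.no_grad()
def predict_attributes(image: torch.Tensor) -> dict:
    feature = backbone(image.unsqueeze(0))            # add the batch dimension
    out = {}
    for name, head in heads.items():
        probs = torch.softmax(head(feature), dim=1)   # probability of every sub-attribute
        out[name] = labels[name][probs.argmax(dim=1).item()]
    return out

print(predict_attributes(torch.randn(3, 128, 128)))   # e.g. {'age': 'youth', 'gender': 'male'}
```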
FIG. 4 is a diagram of facial shape categories. According to the most common face types, faces can be roughly divided into seven categories: round face, oval face, heart-shaped face (inverted-triangle face), diamond face, square face, long face and pear-shaped face (regular-triangle face). The process of judging the face shape (the face shape categories are defined by practitioners according to business needs; the seven face shapes above are used as an example) is as follows:
Step 1: after the data preprocessing process, establish the data sets of the training network and the test network through face detection and data labeling.
Step 2: input the training data of each attribute into the network, merge the data (taking gender, age and face shape as an example), feed it into the network model, and train the network model.
Step 3: after the network model training is finished, use the trained model and input the picture to be recognized into the network model, thereby obtaining the gender, age and face shape categories of the face contained in the picture.
The elements of the age recognition set output by the network can be any of the above ages (juvenile, youth, middle-aged, elderly).
The elements of the face shape recognition set output by the network can be any of the above face shapes (round face, oval face, heart-shaped face (inverted-triangle face), diamond face, square face, long face, pear-shaped face (regular-triangle face)).
Other facial attributes, such as wearing glasses, wearing a mask, hairstyle, skin color, expression and emotion, beard, hair color, wearing a hat, ethnicity and attractiveness, can be recognized accurately by similar methods.
It can be seen that by adopting the facial attribute recognition method of the present invention, the parameters of the deep convolutional neural network are reduced, i.e. the number of parameters of certain layers in the network model, such as the fully connected layer, is changed, which reduces the network's video memory occupancy; and by merging data, data of multiple attributes are input into the convolutional neural network so that all attributes are learned and trained in the same network, allowing the same network to complete the recognition of all facial attributes quickly and efficiently.
Embodiment Five
An embodiment of the present invention further provides a storage medium. The storage medium includes a stored program, and the facial attribute recognition method flow described in Embodiment Four is executed when the program runs.
Optionally, in this embodiment, the above storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
It can be seen that by using the storage medium of the present invention, the storage capacity is reduced and the program implementing the built-in facial attribute recognition method flow runs faster, so that the recognition of all facial attributes is completed quickly and efficiently.
Embodiment Six
An embodiment of the present invention further provides a processor. The processor is used to run a program, and the steps of the facial attribute recognition method described in Embodiment Four are executed when the program runs.
Optionally, for specific examples in this embodiment, reference may be made to the above embodiment and the examples described in the specific implementations, which are not repeated here.
It can be seen that by using the processor of the present invention, the amount of data to be processed is reduced and the program implementing the built-in facial attribute recognition method flow runs faster, so that the recognition of all facial attributes is completed quickly and efficiently.
Embodiment Seven
FIG. 5 is a structural diagram of the facial attribute recognition device of the present invention. The device includes: a data establishment module configured to establish data sets of multiple facial attributes; a data merging module configured to merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes; a training module configured to establish a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes, obtaining a multi-attribute network model capable of recognizing multiple facial attributes; and a prediction module configured to use the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
The data establishment module includes: a data preprocessing module configured to preprocess facial image data; a data normalization module configured to normalize the facial image data; and a data labeling module configured to label facial attributes. The data normalization module normalizes the facial images of the data sets of multiple facial attributes to a width and height of, but not limited to, 128 pixels × 128 pixels. Labeling facial attributes refers to labeling whether glasses are worn, whether a mask is worn, and what the hairstyle, face shape, age and gender are.
The data merging module includes a first storage module configured to store the facial image data of each attribute. When the facial attribute recognition method is executed on the facial attribute recognition device, the software program dynamically generates the input array D1[n, c, w, h] for each facial attribute data set and merges the data sets of the multiple facial attributes along the first dimension of the input array, i.e. the number of pictures. Assuming the number of facial attribute types is m, the merged data array is D2[m×n, c, w, h], where n is the number of images input into the deep convolutional neural network, c is the number of channels of the images, w is the width of the images, and h is the height of the images.
The training module includes a deep residual network comprising four residual modules. The first residual module is composed of two small residual modules: in the first small residual module, a convolutional layer with 64 kernels of size 3×3 is connected to another convolutional layer with 64 kernels of size 3×3, and its identity mapping uses 64 1×1 convolutions; the two are summed and fed into the second small residual module. The second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module. The other three residual modules have the same structure as the first residual module, with 128, 256 and 512 convolution kernels respectively.
The prediction module includes a second storage module configured to store the recognition results with the highest probability for each facial attribute, obtained in a single pass from the trained network model, i.e. the multi-attribute values of the facial attributes.
It can be seen that by using the facial attribute recognition device of the present invention, the parameters of the deep convolutional neural network are reduced, i.e. the number of parameters of certain layers in the network model, such as the fully connected layer, is changed, which reduces the network's video memory occupancy; and by merging data, data of multiple attributes are input into the convolutional neural network so that all attributes are learned and trained in the same network, allowing the same network to complete the recognition of all facial attributes quickly and efficiently.
By contrast, the process of the original facial attribute recognition method is: establish the data sets corresponding to age, gender and face shape (including training and testing); input age, gender and face shape respectively into their corresponding network models (three network models then need to be trained); and input the test pictures into the three trained network models respectively, obtaining each network model's attribute recognition result from the attribute with the highest probability in that model.
It can thus be seen that the present scheme performs facial recognition with a multi-task deep learning recognition method. In a deep convolutional neural network, the number of parameters in the network determines the size of the network, and the size of the network determines how much video memory the network occupies. By reducing the network parameters, i.e. changing the number of parameters of certain layers in the network (for example the fully connected layer), the network's video memory occupancy is reduced; and by merging data, data of multiple attributes are input into the convolutional neural network so that all attributes are learned and trained in the network, allowing the same network to complete the recognition of all facial attributes. In the original deep residual network, the final fully connected layer learns parameters for 1000 outputs; since the fully connected layer accounts for most of the network parameters, this scheme prunes the network, i.e. removes the fully connected layer from the original network structure, so that the convolutional layers are connected directly to the loss layers and the facial attribute probabilities are regressed directly, completing the modification and pruning of the network structure. Such pruning reduces the number of computed network parameters, thereby reducing the network size and the video memory usage.
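One way to express this pruning idea as a sketch, using torchvision's ResNet-18 as a stand-in for the reduced backbone (an assumption; the patent's own network keeps only four residual modules and connects the convolutional output directly to the loss layers):

```python
import torch
import torch.nn as nn
from torchvision import models

# Drop the 1000-way fully connected layer of a stock residual network and attach
# lightweight per-attribute heads instead.
backbone = models.resnet18()
backbone.fc = nn.Identity()             # discard the fully connected classification layer

heads = nn.ModuleDict({                 # small per-attribute heads feeding the loss layers
    "gender": nn.Linear(512, 2),
    "age": nn.Linear(512, 4),
    "face_shape": nn.Linear(512, 7),
})

x = torch.randn(2, 3, 128, 128)         # a merged mini-batch of normalized face crops
feat = backbone(x)                      # 512-d features after global average pooling
outputs = {name: head(feat) for name, head in heads.items()}
print({k: v.shape for k, v in outputs.items()})
```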
From the above description it can be seen that the facial attribute recognition method, device, storage medium and processor according to the present invention reduce the network parameters, i.e. change the number of parameters of certain layers in the network model, such as the fully connected layer, which lowers the network's video memory occupancy; and, by merging data, input data of multiple attributes into the convolutional neural network so that all attributes are learned and trained in the network, allowing the same network to complete the recognition of all facial attributes quickly and efficiently.
The embodiments of the present invention have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present invention. The description of the above embodiments is only intended to help understand the method and core idea of the present invention. At the same time, a person of ordinary skill in the art may, according to the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (18)

  1. A facial attribute recognition method, characterized by comprising the steps of:
    establishing data sets of multiple facial attributes;
    merging the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes;
    establishing a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes, obtaining a multi-attribute network model capable of recognizing the multiple facial attributes; and
    using the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
  2. The facial attribute recognition method according to claim 1, characterized in that establishing data sets of multiple facial attributes comprises:
    preprocessing the data sets of the multiple facial attributes;
    normalizing the preprocessed data sets of the multiple facial attributes; and
    labeling the normalized multiple facial attributes.
  3. The facial attribute recognition method according to claim 1, characterized in that merging the data sets of multiple facial attributes to form the data set collection of the multiple facial attributes comprises:
    setting the input format of each facial attribute data set to the array form D1[n, c, w, h];
    merging the data sets of the multiple facial attributes along the first dimension of the input array, i.e. the number of pictures, wherein, assuming the number of facial attribute types is m, the merged data takes the form D2[m×n, c, w, h];
    where n, c, w and h are respectively the number, the number of channels, the width and the height of the data set images of the multiple facial attributes input into the deep convolutional neural network.
  4. The facial attribute recognition method according to claim 1, characterized in that establishing the multi-task deep convolutional network to train the collection of multiple facial attribute data sets comprises:
    using four residual modules of a deep residual network;
    inputting the facial attribute data of the image to be recognized into the deep residual network for network training;
    wherein the first residual module of the deep residual network is composed of two small residual modules; in the first small residual module, a convolutional layer with 64 kernels of size 3×3 is connected to another convolutional layer with 64 kernels of size 3×3, and its identity mapping uses 64 1×1 convolutions; the two are summed and fed into the second small residual module; the second small residual module has the same structure as the first small residual module, and the identity mapping of the second small residual module is given by the output of the first small residual module; and
    the other three residual modules have the same structure as the first residual module, with 128, 256 and 512 convolution kernels respectively.
  5. The facial attribute recognition method according to claim 1, characterized in that establishing the multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes and obtain the multi-attribute network model capable of recognizing multiple facial attributes comprises:
    according to the trained network model, obtaining in a single pass, at the picture recognition stage, the recognition result with the highest probability for each of the multiple facial attributes, i.e. the multi-attribute values of the facial attributes.
  6. The facial attribute recognition method according to claim 2, characterized in that preprocessing the data sets of the multiple facial attributes comprises:
    detecting the face in a picture using a multi-task convolutional neural network algorithm to obtain a facial image.
  7. The facial attribute recognition method according to claim 6, characterized in that detecting the face in the picture using the multi-task convolutional neural network algorithm to obtain the facial image comprises:
    the convolutional neural network adopting a fully connected form and fine-tuning the candidate windows with bounding-box vectors, so as to detect the coordinates of the face in the image and obtain the facial image.
  8. The facial attribute recognition method according to claim 2, characterized in that normalizing the data sets of the multiple facial attributes comprises: normalizing the facial images of the data sets of multiple facial attributes to a width and height of 128 pixels × 128 pixels.
  9. The facial attribute recognition method according to claim 2, characterized in that labeling the facial attributes comprises: labeling the attributes of wearing glasses, wearing a mask, hairstyle, face shape, age and gender.
  10. The facial attribute recognition method according to claim 5, characterized in that obtaining, according to the trained network model, the recognition result with the highest probability for each of the multiple facial attributes in a single pass at the picture recognition stage comprises:
    when computing the loss function during network training, splitting the data along the first dimension of each array and cutting it apart according to the number of pictures corresponding to each attribute, so that the final result is the array form D3[n, c, w, h] for each attribute;
    when splitting, proceeding according to the corresponding number of pictures and feeding each attribute's data into its corresponding loss function;
    the loss function taking a probability form, with the formula

    S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}

    where j is the index of the current sub-attribute of one facial attribute, k is the index of a sub-attribute of one facial attribute, T is the total number of sub-attributes of one facial attribute, and S_j is the probability of the j-th sub-attribute of one facial attribute; k takes values starting from 1, and the probabilities of the T sub-attributes sum to 1; e^{a_j} is the exponential form of one attribute of the data, a_j and a_k denote attribute value labels of one facial attribute, and the denominator sums the exponentials of all attribute labels, so that the result is the probability that the facial attribute equals a specific attribute label.
  11. A storage medium, characterized in that the storage medium comprises a stored program, wherein the facial attribute recognition method according to any one of claims 1 to 10 is executed when the program runs.
  12. A processor, characterized in that the processor is used to run a program, wherein the facial attribute recognition method according to any one of claims 1 to 10 is executed when the program runs.
  13. A facial recognition device, characterized in that the device comprises a data establishment module, a data merging module, a training module and a prediction module that are electrically connected in sequence, wherein
    the data establishment module is configured to establish data sets of multiple facial attributes;
    the data merging module is configured to merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes;
    the training module is configured to establish a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes, obtaining a multi-attribute network model capable of recognizing multiple facial attributes; and
    the prediction module is configured to use the obtained multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
  14. The facial recognition device according to claim 13, characterized in that the data establishment module comprises:
    a data preprocessing module configured to preprocess facial image data;
    a data normalization module configured to normalize the facial image data; and
    a data labeling module configured to label facial attributes.
  15. The facial recognition device according to claim 13, characterized in that the data merging module comprises a first storage module configured to store the facial image data of each attribute.
  16. The facial recognition device according to claim 13, characterized in that the training module comprises a deep residual network, the deep residual network comprising four residual modules,
    wherein the first residual module is composed of two small residual modules; in the first small residual module, a convolutional layer with 64 kernels of size 3×3 is connected to another convolutional layer with 64 kernels of size 3×3, and its identity mapping uses 64 1×1 convolutions; the two are summed and fed into the second small residual module; the second small residual module has the same structure as the first small residual module, and the identity mapping of the second small residual module is given by the output of the first small residual module; and
    the other three residual modules have the same structure as the first residual module, with 128, 256 and 512 convolution kernels respectively.
  17. The facial recognition device according to claim 13, characterized in that the prediction module comprises a second storage module configured to store the recognition results with the highest probability for each facial attribute, obtained in a single pass from the trained network model, i.e. the multi-attribute values of the facial attributes.
  18. The facial recognition device according to claim 14, characterized in that the data normalization module normalizing the facial image data comprises: normalizing the facial images of the data sets of multiple facial attributes to a width and height of 128 pixels × 128 pixels.
PCT/CN2019/112478 2018-12-07 2019-10-22 Facial attribute identification method and device, storage medium and processor WO2020114118A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811502128.2 2018-12-07
CN201811502128.2A CN111291604A (en) 2018-12-07 2018-12-07 Face attribute identification method, device, storage medium and processor

Publications (1)

Publication Number Publication Date
WO2020114118A1 true WO2020114118A1 (en) 2020-06-11

Family

ID=70973734

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/112478 WO2020114118A1 (en) 2018-12-07 2019-10-22 Facial attribute identification method and device, storage medium and processor

Country Status (2)

Country Link
CN (1) CN111291604A (en)
WO (1) WO2020114118A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783574B (en) * 2020-06-17 2024-02-23 李利明 Meal image recognition method, device and storage medium
CN112149601A (en) * 2020-09-30 2020-12-29 北京澎思科技有限公司 Occlusion-compatible face attribute identification method and device and electronic equipment
CN114067488A (en) * 2021-11-03 2022-02-18 深圳黑蚂蚁环保科技有限公司 Recovery system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017102671A (en) * 2015-12-01 2017-06-08 キヤノン株式会社 Identification device, adjusting device, information processing method, and program
CN106228139A (en) * 2016-07-27 2016-12-14 东南大学 A kind of apparent age prediction algorithm based on convolutional network and system thereof
CN108921022A (en) * 2018-05-30 2018-11-30 腾讯科技(深圳)有限公司 A kind of human body attribute recognition approach, device, equipment and medium
CN108921029A (en) * 2018-06-04 2018-11-30 浙江大学 A kind of SAR automatic target recognition method merging residual error convolutional neural networks and PCA dimensionality reduction

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105825191A (en) * 2016-03-23 2016-08-03 厦门美图之家科技有限公司 Face multi-attribute information-based gender recognition method and system and shooting terminal
CN106529402A (en) * 2016-09-27 2017-03-22 中国科学院自动化研究所 Multi-task learning convolutional neural network-based face attribute analysis method
WO2018133034A1 (en) * 2017-01-20 2018-07-26 Intel Corporation Dynamic emotion recognition in unconstrained scenarios
CN107247947A (en) * 2017-07-07 2017-10-13 北京智慧眼科技股份有限公司 Face character recognition methods and device

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695513B (en) * 2020-06-12 2023-02-14 长安大学 Facial expression recognition method based on depth residual error network
CN111695513A (en) * 2020-06-12 2020-09-22 长安大学 Facial expression recognition method based on depth residual error network
CN111783619A (en) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 Human body attribute identification method, device, equipment and storage medium
CN111783619B (en) * 2020-06-29 2023-08-11 北京百度网讯科技有限公司 Human body attribute identification method, device, equipment and storage medium
CN112287966A (en) * 2020-09-21 2021-01-29 深圳市爱深盈通信息技术有限公司 Face recognition method and device and electronic equipment
CN112232231A (en) * 2020-10-20 2021-01-15 城云科技(中国)有限公司 Pedestrian attribute identification method, system, computer device and storage medium
CN112232231B (en) * 2020-10-20 2024-02-02 城云科技(中国)有限公司 Pedestrian attribute identification method, system, computer equipment and storage medium
CN112783990A (en) * 2021-02-02 2021-05-11 贵州大学 Graph data attribute-based reasoning method and system
CN112783990B (en) * 2021-02-02 2023-04-18 贵州大学 Graph data attribute-based reasoning method and system
CN113128345A (en) * 2021-03-22 2021-07-16 深圳云天励飞技术股份有限公司 Multitask attribute identification method and device and computer readable storage medium
CN112906668B (en) * 2021-04-07 2023-08-25 上海应用技术大学 Face information identification method based on convolutional neural network
CN112906668A (en) * 2021-04-07 2021-06-04 上海应用技术大学 Face information identification method based on convolutional neural network
CN113657486A (en) * 2021-08-16 2021-11-16 浙江新再灵科技股份有限公司 Multi-label multi-attribute classification model establishing method based on elevator picture data
CN113705527A (en) * 2021-09-08 2021-11-26 西南石油大学 Expression recognition method based on loss function integration and coarse and fine hierarchical convolutional neural network
CN113705527B (en) * 2021-09-08 2023-09-22 西南石油大学 Expression recognition method based on loss function integration and thickness grading convolutional neural network
CN113963426A (en) * 2021-12-22 2022-01-21 北京的卢深视科技有限公司 Model training method, mask wearing face recognition method, electronic device and storage medium
CN113963426B (en) * 2021-12-22 2022-08-26 合肥的卢深视科技有限公司 Model training method, mask wearing face recognition method, electronic device and storage medium
CN114626527A (en) * 2022-03-25 2022-06-14 中国电子产业工程有限公司 Neural network pruning method and device based on sparse constraint retraining
CN114626527B (en) * 2022-03-25 2024-02-09 中国电子产业工程有限公司 Neural network pruning method and device based on sparse constraint retraining

Also Published As

Publication number Publication date
CN111291604A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
WO2020114118A1 (en) Facial attribute identification method and device, storage medium and processor
US11195051B2 (en) Method for person re-identification based on deep model with multi-loss fusion training strategy
Salama AbdELminaam et al. A deep facial recognition system using computational intelligent algorithms
Torralba et al. Sharing visual features for multiclass and multiview object detection
Ali et al. Boosted NNE collections for multicultural facial expression recognition
CN110689025B (en) Image recognition method, device and system and endoscope image recognition method and device
CN110458038B (en) Small data cross-domain action identification method based on double-chain deep double-current network
CN112784763B (en) Expression recognition method and system based on local and overall feature adaptive fusion
Sha et al. Feature level analysis for 3D facial expression recognition
CN108830237B (en) Facial expression recognition method
US10936868B2 (en) Method and system for classifying an input data set within a data category using multiple data recognition tools
Danisman et al. Boosting gender recognition performance with a fuzzy inference system
Xia et al. Face occlusion detection using deep convolutional neural networks
Gupta et al. Single attribute and multi attribute facial gender and age estimation
Sun et al. Perceptual multi-channel visual feature fusion for scene categorization
CN114677730A (en) Living body detection method, living body detection device, electronic apparatus, and storage medium
CN114463812A (en) Low-resolution face recognition method based on dual-channel multi-branch fusion feature distillation
Sajid et al. Facial asymmetry-based feature extraction for different applications: a review complemented by new advances
Wasi et al. ARBEx: Attentive feature extraction with reliability balancing for robust facial expression learning
Tunc et al. Age group and gender classification using convolutional neural networks with a fuzzy logic-based filter method for noise reduction
Karungaru et al. Face recognition in colour images using neural networks and genetic algorithms
Yu et al. Research on face recognition method based on deep learning
CN111898473B (en) Driver state real-time monitoring method based on deep learning
CN112241680A (en) Multi-mode identity authentication method based on vein similar image knowledge migration network
Oladipo et al. A novel genetic-artificial neural network based age estimation system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19891840

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28/09/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19891840

Country of ref document: EP

Kind code of ref document: A1