WO2020114118A1 - Facial attribute identification method and device, storage medium and processor - Google Patents

Facial attribute identification method and device, storage medium and processor

Info

Publication number
WO2020114118A1
Authority
WO
WIPO (PCT)
Prior art keywords
attribute
facial
attributes
data
module
Prior art date
Application number
PCT/CN2019/112478
Other languages
French (fr)
Chinese (zh)
Inventor
刘若鹏
栾琳
刘凯品
Original Assignee
深圳光启空间技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳光启空间技术有限公司
Publication of WO2020114118A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/24 - Classification techniques
    • G06F 18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Definitions

  • the present invention relates to the field of image recognition technology, and more specifically, to a facial attribute recognition method, device, storage medium, and processor.
  • The face is a very important biometric feature. It has a complex structure with many fine details, and it carries a large amount of information such as gender, race, age, and expression. A normal adult can easily interpret this facial information, but giving a computer the same ability, so that it can perform brain-like reasoning in place of humans, remains a scientific problem that researchers urgently need to solve.
  • Some existing attribute recognition methods train multiple deep convolutional neural networks and then perform score fusion or feature fusion for further training. This approach is laborious and complex, which is not conducive to practical application.
  • Other methods merge a facial identity authentication network and an attribute recognition network into a fusion network and jointly learn identity features and facial attribute features in a multi-task network; a cost-sensitive weighting function makes training independent of the data distribution in the target domain, achieving balanced training in the source data domain. Although the modified fusion framework adds only a small number of parameters, it still adds extra computational load.
  • The technical problem to be solved by the present invention is to provide a facial attribute recognition method, device, storage medium and processor that reduce network parameters by changing the number of parameters of certain layers of the network model, such as the fully connected layer, thereby reducing GPU memory occupancy, and that feed data of multiple attributes into a convolutional neural network through data merging so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.
  • An embodiment of the present invention provides a facial attribute recognition method, comprising: establishing data sets of multiple facial attributes; merging the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes; establishing a multi-task deep convolutional neural network to train the data set collection to obtain a multi-attribute network model that can recognize multiple facial attributes; and using the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
  • Establishing the data sets of multiple facial attributes includes: preprocessing the data sets of the multiple facial attributes; normalizing the preprocessed data sets; and labeling the multiple facial attributes after normalization.
  • Merging the data sets of multiple facial attributes to form the data set collection of the multiple facial attributes includes: setting the input format of each facial attribute data set to the array form D1[n, c, w, h]; and merging the data sets of the multiple facial attributes along the first dimension of the input array, that is, the number of images. Assuming the number of facial attribute types is m, the merged data has the form D2[m×n, c, w, h], where n, c, w, and h are respectively the number, channel count, width, and height of the images of the multiple facial attribute data sets input into the deep convolutional neural network.
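  • As a concrete illustration of the merging step above (a minimal NumPy sketch, not from the patent; the array contents and sizes are placeholders):

```python
import numpy as np

# Hypothetical per-attribute data sets, each already in the D1[n, c, w, h] layout.
age_data = np.random.rand(100, 3, 128, 128).astype(np.float32)         # 100 age-labeled images
gender_data = np.random.rand(100, 3, 128, 128).astype(np.float32)      # 100 gender-labeled images
face_shape_data = np.random.rand(100, 3, 128, 128).astype(np.float32)  # 100 face-shape-labeled images

attribute_datasets = [age_data, gender_data, face_shape_data]  # m = 3 attribute types

# Merge along the first dimension (the image count), giving D2[m*n, c, w, h].
merged = np.concatenate(attribute_datasets, axis=0)
print(merged.shape)  # (300, 3, 128, 128)
```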
  • Establishing the multi-task deep convolutional network to train the collection of facial attribute data sets includes: using 4 residual modules of a deep residual network.
  • The first residual module is composed of two small residual modules. The first small residual module consists of a convolutional layer with 64 3×3 kernels connected to another convolutional layer with 64 3×3 kernels; its identity (shortcut) branch uses a 1×1 convolution with 64 kernels, and the sum of the two branches is fed into the second small residual module. The second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module.
  • The other three residual modules have the same structure as the first residual module, but with 128, 256, and 512 convolution kernels, respectively.
  • Establishing a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes and obtain a multi-attribute network model that can recognize multiple facial attributes includes: according to the trained network model, obtaining in the image recognition stage, in a single pass, the highest-probability recognition result for each of the multiple facial attributes, that is, the multi-attribute values of the facial attributes.
  • preprocessing the data set of the plurality of face attributes includes: using a multi-task convolutional neural network algorithm to detect the face in the picture to obtain a face image.
  • Detecting the face in the picture using the multi-task convolutional neural network algorithm to obtain the face image includes: the convolutional neural network adopts a fully connected stage and uses bounding box vectors to fine-tune the candidate windows, so that the coordinate points of the face in the image are detected and a face image is obtained.
  • normalizing the data sets of the plurality of face attributes includes: normalizing the width and height of the face image of the data sets of the plurality of face attributes to 128 pixels ⁇ 128 pixels.
  • Labeling the facial attributes includes: labeling whether the face wears glasses, whether it wears a mask, and its hairstyle, face shape, age, and gender.
  • Obtaining, according to the trained network model, the highest-probability recognition result for each facial attribute in a single pass during the image recognition stage includes: when the loss function is computed during network training, splitting the merged data along the first dimension of each array according to the number of images corresponding to each attribute, so that the final result for each attribute has the array form D3[n, c, w, h]; the split is performed according to the corresponding image counts and each attribute's data is fed into its corresponding loss function; the loss function takes the form of a probability, with the formula:
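  • The formula is rendered as an image in the original publication; from the symbol definitions that follow, it is the standard softmax:

    S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}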
  • Here j is the index of the current sub-attribute of a given facial attribute, k indexes the sub-attributes of that attribute, T is the total number of sub-attributes of that attribute, and S_j is the probability of the j-th sub-attribute; k runs from 1, and the probabilities of the T sub-attributes sum to 1. The term e^{a_j} is the exponential of the score for one attribute label; a_j and a_k denote the scores of particular attribute labels, and the denominator sums the exponentials over all attribute labels, yielding the probability that the facial attribute takes a specific label.
  • an embodiment of the present invention provides a storage medium, the storage medium including a stored program, wherein the face attribute recognition method described above is executed when the program runs.
  • an embodiment of the present invention provides a processor for running a program, wherein the facial attribute recognition method described above is executed when the program is run.
  • An embodiment of the present invention provides a facial recognition device, comprising a data establishment module, a data merging module, a training module, and a prediction module that are electrically connected in sequence: the data establishment module is configured to establish data sets of multiple facial attributes; the data merging module is configured to merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes; the training module is configured to establish a multi-task deep convolutional neural network to train the data set collection to obtain a multi-attribute network model that can recognize multiple facial attributes; and the prediction module is configured to use the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
  • The data establishment module includes: a data preprocessing module configured to preprocess facial image data; a data normalization module configured to normalize the facial image data; and a data labeling module configured to label facial attributes.
  • the data merging module includes: a first storage module, the first storage module is used to store facial images of each attribute.
  • The training module includes a deep residual network comprising 4 residual modules.
  • The first residual module is composed of two small residual modules. The first small residual module consists of a convolutional layer with 64 3×3 kernels connected to another convolutional layer with 64 3×3 kernels, and its identity (shortcut) branch uses a 1×1 convolution with 64 kernels.
  • The second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module. The other three residual modules have the same structure as the first residual module, with 128, 256, and 512 convolution kernels, respectively.
  • The prediction module includes a third storage module configured to store the highest-probability recognition result for each facial attribute obtained in a single pass from the trained network model, that is, the multi-attribute values of the facial attributes.
  • the data normalization processing module for normalizing the face image data includes: normalizing the width and height of the face image of the data set of multiple face attributes to 128 pixels ⁇ 128 pixels.
  • The present invention reduces network parameters, that is, it changes the number of parameters of certain layers of the network model, such as the fully connected layer, which lowers GPU memory occupancy; and through data merging it feeds data of multiple attributes into the convolutional neural network, so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.
  • FIG. 1 is a flowchart of a face attribute recognition method of the present invention.
  • FIG. 2 is a basic block diagram of the residual network used in the facial attribute recognition method of the present invention.
  • FIG. 3 is a flowchart of an embodiment of the facial attribute recognition method of the present invention.
  • FIG. 4 is a diagram of facial shape categories.
  • FIG. 5 is a structural diagram of a face attribute recognition device of the present invention.
  • FIG. 1 is a flowchart of the facial attribute recognition method of the present invention. As shown in FIG. 1, the facial attribute recognition method of the present invention includes the following steps:
  • the step S11 of establishing a plurality of facial attribute data sets includes: preprocessing, normalizing, and labeling the multiple facial attribute data sets.
  • Preprocessing the dataset of multiple facial attributes further includes:
  • the multi-task convolutional neural network algorithm is used to detect the face in the picture to obtain a facial image.
  • A fully connected stage of the convolutional neural network can be used, with bounding box vectors fine-tuning the candidate windows to detect the coordinate points of the face in the image. After the face is detected, the face image is cropped out for training.
  • Normalizing the image to a certain size is determined by the network input size. Different networks have different input sizes.
  • normalizing the data set of multiple face attributes includes: normalizing the width and height of the face image of the data set of multiple face attributes to but not limited to 128 pixels ⁇ 128 pixels.
  • Labeling facial attributes includes but is not limited to: labeling face wearing glasses, wearing a mask, hairstyle, face shape, age, gender attributes, etc.
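  • The detection, cropping, and 128 pixel × 128 pixel normalization described above can be sketched as follows (illustrative only; detect_face stands in for the MTCNN-style detector and here simply returns the whole frame so the sketch stays self-contained):

```python
import cv2  # OpenCV for image loading and resizing

def detect_face(image):
    """Placeholder for an MTCNN-style detector; a real detector would return
    the bounding box (x, y, w, h) of the face found in the image."""
    h, w = image.shape[:2]
    return 0, 0, w, h

def preprocess(image_path, size=128):
    image = cv2.imread(image_path)
    x, y, w, h = detect_face(image)        # face coordinates from the detector
    face = image[y:y + h, x:x + w]         # crop the detected face region
    return cv2.resize(face, (size, size))  # normalize to 128 x 128 pixels
```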
  • Step S12, merging the data sets of multiple facial attributes to form a data set collection of the multiple facial attributes, includes:
  • the data sets of multiple face attributes are combined according to the first dimension of the input array, that is, the number of pictures. Assuming that the number of types of face attributes is m, the data form generated after the merge is D2[m ⁇ n, c, w, h];
  • n, c, w, and h are the number, channel number, width, and height of the data set images of the multiple facial attributes input into the deep convolutional neural network, respectively.
  • Step S13, establishing a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes and obtain a multi-attribute network model that can recognize multiple facial attributes, includes: using 4 residual modules of a deep residual network. The first residual module is composed of two small residual modules; the first small residual module consists of a convolutional layer with 64 3×3 kernels connected to another convolutional layer with 64 3×3 kernels, its identity (shortcut) branch uses a 1×1 convolution with 64 kernels, and the sum of the two branches is fed into the second small residual module. The second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module. The other three residual modules have the same structure as the first residual module, but with 128, 256, and 512 convolution kernels, respectively. The facial attribute data of the images to be recognized are input into the deep residual network for training.
  • The number of residual modules can be chosen according to actual needs; four residual modules are used here as an example.
  • Step S14 uses the multi-attribute network model to perform multi-attribute prediction on the facial attributes of the image to be recognized, so as to identify multiple facial attributes in that image: the highest-probability recognition result for each of the multiple facial attributes, that is, the multi-attribute values of the facial attributes, is obtained in a single pass.
  • During network training, when the loss function is computed, the merged data is split along the first dimension of each array according to the number of images corresponding to each attribute, so that the final result for each attribute has the array form D3[n, c, w, h];
  • the loss function takes the form of probability, the formula is:
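  • As above, the formula (shown as an image in the original publication) is the softmax:

    S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}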
  • Here j is the index of the current sub-attribute of a given facial attribute, k indexes the sub-attributes of that attribute, and T is the total number of sub-attributes of that attribute. S_j is the probability of the j-th sub-attribute; k runs from 1, and the probabilities of the T sub-attributes sum to 1. The term e^{a_j} is the exponential of the score for one attribute label; a_j and a_k denote the scores of particular attribute labels, and the denominator sums the exponentials over all attribute labels, yielding the probability that the facial attribute takes a specific label.
  • For example, if the face-shape attribute has the sub-attributes round face, square face, and sharp face, the probabilities of these sub-attributes sum to 1, and if the round-face probability is the highest, the face is determined to belong to the round-face sub-attribute among the three.
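  • A minimal PyTorch-style sketch of this split-and-per-attribute-loss step (not the patent's code; the attribute names, sub-attribute counts, and linear heads are placeholder assumptions):

```python
import torch
import torch.nn as nn

# Merged batch produced by the shared backbone, ordered attribute-by-attribute:
# first n "age" images, then n "gender" images, then n "face_shape" images.
n = 32                                                        # images per attribute in this batch
sub_attribute_counts = {"age": 4, "gender": 2, "face_shape": 7}

backbone_features = torch.randn(3 * n, 512)                   # stand-in for shared conv features
heads = {name: nn.Linear(512, k) for name, k in sub_attribute_counts.items()}
labels = {name: torch.randint(0, k, (n,)) for name, k in sub_attribute_counts.items()}

loss_fn = nn.CrossEntropyLoss()                               # softmax + negative log-likelihood
total_loss = 0.0
for i, (name, k) in enumerate(sub_attribute_counts.items()):
    chunk = backbone_features[i * n:(i + 1) * n]              # the D3[n, ...] slice for this attribute
    total_loss = total_loss + loss_fn(heads[name](chunk), labels[name])
total_loss.backward()  # gradients flow into each attribute head (and, in the real network, the shared backbone)
```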
  • The parameters of the deep convolutional neural network are reduced, that is, the number of parameters of certain layers of the network model, such as the fully connected layer, is changed, which lowers GPU memory occupancy; and through data merging, data of multiple attributes are fed into the convolutional neural network, so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.
  • An embodiment of the present invention further provides a storage medium, the storage medium includes a stored program, wherein the above-mentioned facial attribute recognition method flow is executed when the above program runs.
  • The above storage medium may be configured to store program code for performing the flow of the facial attribute recognition method described above.
  • the above storage medium may include, but is not limited to: a USB flash drive, a read-only memory (Read-Only Memory, ROM for short), a random access memory (Random Access Memory, RAM for short), Various media that can store program codes, such as removable hard disks, magnetic disks, or optical disks.
  • Because the storage requirement is reduced, the stored facial attribute recognition program runs faster, so the recognition of all facial attributes is completed quickly and efficiently.
  • An embodiment of the present invention further provides a processor for running a program, wherein, when the program runs, the steps in the facial attribute recognition method described above are executed.
  • The above program is used to perform the steps of the facial attribute recognition method described above.
  • FIG. 3 is a flowchart of an embodiment of the facial attribute recognition method of the present invention.
  • The flow of the present invention is described using age, gender, and face shape as the training attributes.
  • Samples for training data preprocessing include data pictures with age information, data pictures with gender information, and data pictures with face shape information.
  • The first step is to detect the face in each picture using the multi-task convolutional neural network (MTCNN) algorithm. The algorithm uses a cascade of three convolutional neural networks to detect the face: a fully convolutional network produces face candidate windows and bounding-box regression, and non-maximum suppression is used to remove overlapping windows.
  • The convolutional neural network in this step then adopts a fully connected stage and uses bounding box vectors to fine-tune the candidate windows, detecting the coordinate points of the face in the image, after which the face image is cropped out for training. A convolutional neural network with one more layer is used to further refine the position of the face detection box, and a face image normalized to 128 pixels × 128 pixels is obtained.
  • The second step is to annotate the facial attributes. The labeled attributes include whether glasses are worn, whether a mask is worn, hairstyle, face shape, age, gender, and so on.
  • For labeling, an initial model is trained on the existing public attribute data set Market1501 and used to coarsely classify the pictures obtained by face detection, so that each attribute is given a numeric label; the fine classification of the facial attribute pictures is then done manually, thereby constructing data sets of the different facial attributes. For example, different attributes of the same picture can be placed in different folders.
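  • One plausible way to turn such per-attribute folders into labeled lists is sketched below (an assumption about directory layout, not part of the patent):

```python
from pathlib import Path

def build_attribute_datasets(root):
    """Assumes a layout like root/age/0_juvenile/*.jpg, root/age/1_youth/*.jpg,
    root/gender/0_male/*.jpg, ...: one folder per attribute, one subfolder per label."""
    datasets = {}
    for attr_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        samples = []
        for label_idx, label_dir in enumerate(sorted(d for d in attr_dir.iterdir() if d.is_dir())):
            samples += [(str(img), label_idx) for img in label_dir.glob("*.jpg")]
        datasets[attr_dir.name] = samples   # e.g. {"age": [(path, label), ...], ...}
    return datasets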
  • the multi-task network learning mechanism can enable the network to share the characteristics of other data.
  • many deep learning networks only focus on a single task, so that many data features with the same commonality cannot be shared.
  • Multi-task learning can solve this problem well. It is an inductive transfer mechanism. The main goal is to use the specific field information hidden in the training signals of multiple related tasks to improve the generalization ability.
  • Multi-task learning trains multiple tasks in parallel using a shared representation to accomplish this goal; that is, while learning one problem, the shared representation can be used to acquire knowledge about other related problems. Multi-task learning is therefore a method that applies knowledge gained on one problem to other related problems.
  • This solution uses data merging to prepare the multi-task learning training data. The data merging process (taking face shape, gender, and age data as an example; other attributes follow the same steps) proceeds as follows:
  • Each facial attribute data set is an array with the format D1[n, c, w, h], where n is the number of images input into the network, c is the number of channels of the image (generally 3), and w and h are the width and height of the image.
  • The data sets are merged along the first dimension of the input array, that is, the number of images, and the merged data has the form D2[m×n, c, w, h], where n, c, w, and h are respectively the number, channel count, width, and height of the images of the multiple facial attribute data sets input into the deep convolutional neural network.
  • the above two steps complete the data preparation phase of multi-task learning.
  • the network can learn the correlation between each data set to achieve the purpose of multi-task learning.
  • the present invention uses the deep residual network as the basic network to perform feature extraction on the network input data.
  • The residual network was proposed in 2015 and performs better than many other deep networks.
  • The invention uses the classification ideas of detection networks to modify the deep residual network, reducing the residual modules in the network and discarding the fully connected layer to form a new network structure. This makes the network structure simple, greatly reduces the size of the network model, and also greatly reduces memory usage.
  • the structure of the deep residual network used in this scheme is relatively simple.
  • In the residual module, x is the input and F(x) is the original mapping of the convolutional neural network; ReLU is the activation function in the deep residual module, and H(x) is the output function of the module. The deep residual module combines the original mapping F(x) with the input x to form the module output, H(x) = F(x) + x.
  • This solution uses 4 residual modules of the deep residual network. The first residual module is composed of two small residual modules; the first small residual module consists of a convolutional layer with 64 3×3 kernels connected to another convolutional layer with 64 3×3 kernels, and its identity (shortcut) branch uses a 1×1 convolution with 64 kernels. The second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module. The other three residual modules have the same structure as the first residual module, but with 128, 256, and 512 convolution kernels, respectively.
  • Only the four residual modules in the deep residual network are used, which makes the structure smaller and more conducive to feature extraction of the network input data. Through the depth residual network, the depth features of the image can be well extracted, which can facilitate better classification.
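  • A compact PyTorch sketch of a backbone along these lines is given below (an interpretation for illustration, not the patented network itself; the 64/128/256/512 widths follow the description, while the strides and global pooling are assumptions):

```python
import torch.nn as nn

class SmallResBlock(nn.Module):
    """Two 3x3 conv layers plus a shortcut, summed (H(x) = F(x) + x)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=stride, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        # 1x1 projection when shapes change (first small block); plain identity otherwise (second small block).
        self.shortcut = (nn.Conv2d(in_ch, out_ch, 1, stride=stride)
                         if (in_ch != out_ch or stride != 1) else nn.Identity())
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.shortcut(x))

class MultiAttributeBackbone(nn.Module):
    """Four residual stages (64, 128, 256, 512 kernels); no fully connected layer."""
    def __init__(self, widths=(64, 128, 256, 512)):
        super().__init__()
        stages, in_ch = [], 3
        for w in widths:
            stages += [SmallResBlock(in_ch, w, stride=2), SmallResBlock(w, w)]
            in_ch = w
        self.stages = nn.Sequential(*stages)
        self.pool = nn.AdaptiveAvgPool2d(1)      # assumption: global pooling before the loss layers

    def forward(self, x):                        # x: [m*n, 3, 128, 128]
        return self.pool(self.stages(x)).flatten(1)   # [m*n, 512] shared features
```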
  • the data is merged before the network training begins to facilitate the sharing of data.
  • After the network's training and learning, the network has learned good features, and the number of images in the training data does not change during the learning process. Therefore, in order to obtain the learning status of each facial attribute data set, when the network output is used to compute the loss function, it is split along the first dimension of each attribute array and cut per attribute, so that the final result for each attribute is still in the array form D3[n, c, w, h], and each attribute's data is fed into its corresponding loss function.
  • the data set after cutting is input into the corresponding loss function layer, so that the corresponding loss function calculation is performed, the corresponding weight update is obtained, and the corresponding category output of each facial attribute data set is obtained.
  • the loss function can calculate the loss function corresponding to each attribute data based on the shared features learned.
  • The loss function used in this scheme takes the form of a probability (the softmax given above), where a_j and a_k denote the scores of particular attribute labels and the denominator sums the exponentials over all attribute labels, yielding the probability that the facial attribute takes a specific label.
  • the weights are updated using the back propagation algorithm to make the network reach the optimal state, so as to obtain the facial attributes corresponding to the input samples, such as age recognition set, gender recognition set, and face recognition set.
  • Taking age as an example, age is labeled with four labels: juvenile, youth, middle age, and old age.
  • The output of the last layer of the network is evaluated against the loss function and the corresponding labels, producing four probabilities that represent the probabilities of the corresponding categories (juvenile, youth, middle age, old age). If the juvenile probability is the highest, the face is judged to be a juvenile.
  • In the final network structure of the present invention there is a loss function corresponding to each attribute, so each attribute computes its own probabilities through its corresponding loss function, and the category it belongs to is determined from those probabilities.
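  • At inference time this amounts to taking the argmax of each attribute head's softmax output, as in the following sketch (the label sets are the examples used in the text, not an exhaustive list):

```python
import torch

LABELS = {
    "age": ["juvenile", "youth", "middle age", "old age"],
    "gender": ["male", "female"],
}

def decode_predictions(logits_by_attr):
    """logits_by_attr: dict mapping attribute name -> 1-D logit tensor for one face."""
    results = {}
    for attr, logits in logits_by_attr.items():
        probs = torch.softmax(logits, dim=0)                  # probabilities sum to 1 per attribute
        results[attr] = LABELS[attr][int(probs.argmax())]
    return results

# Example: a face whose "age" logits favour the first class is judged "juvenile".
print(decode_predictions({"age": torch.tensor([2.1, 0.3, -0.5, -1.0]),
                          "gender": torch.tensor([0.2, 1.4])}))
```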
  • the weights are updated using the back-propagation algorithm to make the network reach the optimal state, and the facial attributes corresponding to the input samples are obtained.
  • Taking gender as an example, gender is labeled with two labels: male and female.
  • the output of the last layer of the network will be calculated according to the loss function and the corresponding label, and two probabilities will be obtained, which respectively represent the probability of the corresponding category (male and female). If the probability of male is the highest, it will be judged as male.
  • the present invention will have a loss function corresponding to each attribute in the final network structure, so each attribute will calculate its corresponding probability according to its corresponding loss function. Determine which attribute it belongs to based on the probability.
  • the weights are updated using the back-propagation algorithm to make the network reach the optimal state, and the facial attributes corresponding to the input samples are obtained.
  • FIG. 4 is a diagram of face shape categories. According to the most common face types, faces can be roughly divided into 7 types: round face, oval face, heart-shaped face (inverted triangle face), diamond face, square face, long face, and pear-shaped face (regular triangle face).
  • the judgment process for the face shape (the type of face is artificially specified by the practitioner according to the business needs, taking the above seven face types as an example) is:
  • Step 1: after data preprocessing, the data sets for the training network and the test network are established through the face detection and data annotation processes.
  • Step 2: the training data of each attribute are input into the network, the data are merged (taking gender, age, and face shape as examples), fed into the network model, and the network model is trained.
  • Step 3: after the network model is trained, the picture to be recognized is input into the trained model, and the gender, age, and face shape of the face in the picture are obtained.
  • the elements of the age recognition set output through the network may be any of the above ages (juvenile, youth, middle age, old age).
  • The elements of the face shape recognition set output by the network may be any of the above face shapes (round face, oval face, heart-shaped face (inverted triangle face), diamond face, square face, long face, pear-shaped face (regular triangle face)).
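  • Putting the three steps together, the prediction stage could look like the sketch below (the backbone, preprocessing helper, attribute heads, and label table are the hypothetical pieces from the earlier sketches, with linear heads over pooled features):

```python
import torch

def recognize_attributes(image_path, backbone, heads, preprocess, labels):
    """Run one image through the shared backbone and every attribute head."""
    face = preprocess(image_path)                             # detect, crop, resize to 128x128
    x = torch.from_numpy(face).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    with torch.no_grad():
        features = backbone(x)                                # shared deep-residual features
        return {attr: labels[attr][int(head(features).softmax(dim=1).argmax())]
                for attr, head in heads.items()}              # e.g. {"age": "youth", "gender": "male", ...}
```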
  • The parameters of the deep convolutional neural network are reduced, that is, the number of parameters of certain layers of the network model, such as the fully connected layer, is changed, which lowers GPU memory occupancy; and through data merging, data of multiple attributes are fed into the convolutional neural network, so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.
  • An embodiment of the present invention further provides a storage medium, which includes a stored program, where the flow of the facial attribute recognition method described in Embodiment 4 is executed when the above program runs.
  • the above storage medium may include, but is not limited to: a USB flash drive, a read-only memory (Read-Only Memory, ROM for short), a random access memory (Random Access Memory, RAM for short), Various media that can store program codes, such as removable hard disks, magnetic disks, or optical disks.
  • Because the storage requirement is reduced, the stored facial attribute recognition program runs faster, so the recognition of all facial attributes is completed quickly and efficiently.
  • An embodiment of the present invention further provides a processor for running a program, where the steps in the facial attribute recognition method described in Embodiment 4 are executed when the program is run.
  • FIG. 5 is a structural diagram of a face attribute recognition device of the present invention.
  • The device includes: a data establishment module configured to establish data sets of multiple facial attributes; a data merging module configured to merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes; a training module configured to establish a multi-task deep convolutional neural network to train the data set collection to obtain a multi-attribute network model that can recognize multiple facial attributes; and a prediction module configured to use the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
  • The data establishment module includes: a data preprocessing module configured to preprocess facial image data; a data normalization module configured to normalize the facial image data; and a data labeling module configured to label facial attributes.
  • The data normalization module normalizes the width and height of the face images of the multi-attribute data sets to, but not limited to, 128 pixels × 128 pixels. Labeling facial attributes means labeling whether the face wears glasses, whether it wears a mask, and what its hairstyle, face shape, age, and gender are.
  • the data merging module includes: a first storage module; the first storage module is used to store facial image data of each attribute.
  • The software program dynamically generates the input array D1[n, c, w, h] and merges the data sets of the multiple facial attributes along the first dimension of the input array, that is, the number of images.
  • The data array generated after merging is D2[m×n, c, w, h], where n is the number of images input into the deep convolutional neural network, c is the number of channels of those images, w is their width, and h is their height.
  • The training module includes a deep residual network comprising 4 residual modules. The first residual module is composed of two small residual modules; the first small residual module consists of a convolutional layer with 64 3×3 kernels connected to another convolutional layer with 64 3×3 kernels, and its identity (shortcut) branch uses a 1×1 convolution with 64 kernels. The second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module. The other three residual modules have the same structure as the first residual module, with 128, 256, and 512 convolution kernels, respectively.
  • the prediction module includes a second storage module, which is used to store a recognition result with the highest probability of each attribute of the face obtained according to the trained network model, that is, a multi-attribute value of the face attribute.
  • The parameters of the deep convolutional neural network are reduced, that is, the number of parameters of certain layers of the network model, such as the fully connected layer, is changed, which lowers GPU memory occupancy; and through data merging, data of multiple attributes are fed into the convolutional neural network, so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.
  • The process of the original facial attribute recognition method is: establish data sets corresponding to age, gender, and face shape (for both training and testing); feed age, gender, and face shape into their corresponding network models (three network models have to be trained at this point); and input the test pictures into the three trained network models separately, obtaining the attribute recognition result of each network model from the attribute with the highest probability in that model.
  • the multi-task deep learning recognition method is used for facial recognition.
  • the number of parameters in the network determines the size of the network, and the size of the network determines the size of the network memory.
  • The method of reducing network parameters is adopted, that is, the number of parameters of certain layers in the network (such as fully connected layers) is changed, thereby reducing GPU memory occupancy; and through data merging, data of multiple attributes are fed into the convolutional neural network so that all attributes are learned and trained in one network, allowing the same network to complete the recognition of all facial attributes.
  • The last fully connected layer needs to learn 1000 parameters. Since the fully connected layer occupies most of the network parameters, this solution prunes the network: the fully connected layer of the original network structure is removed, so that the convolutional layers connect directly to the loss layer and the probabilities of the facial attributes are regressed directly, completing the modification and pruning of the network structure. Such pruning reduces the number of parameters to compute, so the network size and GPU memory usage are reduced.
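  • A sketch of what replacing the fully connected layer with per-attribute 1×1 convolutional heads could look like (an illustrative interpretation; the exact pruned topology is not spelled out in the text):

```python
import torch.nn as nn

def make_attribute_heads(feature_channels, sub_attribute_counts):
    """Per-attribute 1x1 conv heads feeding the loss layers directly,
    in place of a single large fully connected classifier."""
    return nn.ModuleDict({
        attr: nn.Sequential(
            nn.Conv2d(feature_channels, k, kernel_size=1),  # k scores for this attribute
            nn.AdaptiveAvgPool2d(1),                        # collapse spatial dimensions
            nn.Flatten(),                                   # -> [batch, k] logits
        )
        for attr, k in sub_attribute_counts.items()
    })

heads = make_attribute_heads(512, {"age": 4, "gender": 2, "face_shape": 7})
```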
  • The facial attribute recognition method, device, storage medium and processor reduce the network parameters, that is, they change the number of parameters of certain layers of the network model, such as the fully connected layer, which lowers GPU memory occupancy; and through data merging they feed data of multiple attributes into the convolutional neural network, so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.

Abstract

The present invention provides a facial attribute identification method and device, a storage medium and a processor. The method comprises: building data sets of multiple facial attributes; merging the data sets of the multiple facial attributes to form a data set collection; building a multi-task deep convolutional network to train the data set collection of the multiple facial attributes, to obtain a network model capable of identifying the multiple facial attributes; and using the trained multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be identified, so as to identify the multiple attributes in the image to be identified. Network parameters are reduced by changing the number of parameters of certain layers of the network model, for example the fully connected layer, which lowers memory occupancy; and data merging is used to input data of multiple attributes into a convolutional neural network, so that all attributes are learned and trained in the network and the identification of all facial attributes is completed rapidly and efficiently by the same network.

Description

Facial attribute recognition method, device, storage medium and processor

[Technical Field]

The present invention relates to the field of image recognition technology, and more specifically, to a facial attribute recognition method, device, storage medium, and processor.

[Background Art]

With the rapid development of computer vision and deep learning, faces need to be recognized continuously in fields such as security, intelligent video surveillance, urban public safety, and accident early warning. However, as task requirements grow, the need is no longer limited to authenticating facial identity; it is increasingly important to recognize and authenticate facial sub-attributes as well. This makes it easier to identify a person and to deploy facial information for monitoring and control in the security field. Using facial recognition and facial attributes together in security applications also makes task requirements clearer.

The face is a very important biometric feature. It has a complex structure with many fine details, and it carries a large amount of information such as gender, race, age, and expression. A normal adult can easily interpret this facial information, but giving a computer the same ability, so that it can perform brain-like reasoning in place of humans, remains a scientific problem that researchers urgently need to solve.

Some existing attribute recognition methods train multiple deep convolutional neural networks and then perform score fusion or feature fusion for further training. This approach is laborious and complex, which is not conducive to practical application. Other methods merge a facial identity authentication network and an attribute recognition network into a fusion network and jointly learn identity features and facial attribute features in a multi-task network; a cost-sensitive weighting function makes training independent of the data distribution in the target domain, achieving balanced training in the source data domain. Although the modified fusion framework adds only a small number of parameters, it still adds extra computational load.

It has also been proposed to fuse multiple attributes into one network, but the network structure and loss function are too complex, which hinders training, and facial attribute recognition is used only to improve facial recognition accuracy. Although such a scheme is also a multi-attribute network structure, it actually feeds face images carrying multiple attribute labels into the network, so that one image corresponds to multiple labels, which is not conducive to network training and learning. At the same time, the network model used is a deep residual network structure, which makes the model too large, requires excessive computing resources, and has poor practicality.
[Summary of the Invention]

The technical problem to be solved by the present invention is to provide a facial attribute recognition method, device, storage medium and processor that reduce network parameters by changing the number of parameters of certain layers of the network model, such as the fully connected layer, thereby reducing GPU memory occupancy, and that feed data of multiple attributes into a convolutional neural network through data merging so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.

To solve the above technical problem, in one aspect, an embodiment of the present invention provides a facial attribute recognition method, comprising: establishing data sets of multiple facial attributes; merging the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes; establishing a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes to obtain a multi-attribute network model that can recognize multiple facial attributes; and using the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.

Preferably, establishing the data sets of multiple facial attributes includes: preprocessing the data sets of the multiple facial attributes; normalizing the preprocessed data sets of the multiple facial attributes; and labeling the multiple facial attributes after normalization.

Preferably, merging the data sets of multiple facial attributes to form the data set collection of the multiple facial attributes includes: setting the input format of each facial attribute data set to the array form D1[n, c, w, h]; and merging the data sets of the multiple facial attributes along the first dimension of the input array, that is, the number of images. Assuming the number of facial attribute types is m, the merged data has the form D2[m×n, c, w, h], where n, c, w, and h are respectively the number, channel count, width, and height of the images of the multiple facial attribute data sets input into the deep convolutional neural network.

Preferably, establishing a multi-task deep convolutional network to train the data set collection of multiple facial attributes includes: using 4 residual modules of a deep residual network;

inputting the facial attribute data of the image to be recognized into the deep residual network for network training;

wherein the first residual module is composed of two small residual modules; the first small residual module consists of a convolutional layer with 64 3×3 kernels connected to another convolutional layer with 64 3×3 kernels, its identity (shortcut) branch uses a 1×1 convolution with 64 kernels, and the sum of the two is fed into the second small residual module; the second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module;

the other three residual modules have the same structure as the first residual module, but with 128, 256, and 512 convolution kernels, respectively.

Preferably, establishing a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes to obtain a multi-attribute network model that can recognize multiple facial attributes includes:

according to the trained network model, obtaining in the image recognition stage, in a single pass, the highest-probability recognition result for each of the multiple facial attributes, that is, the multi-attribute values of the facial attributes.

Preferably, preprocessing the data sets of the multiple facial attributes includes: detecting the face in each picture using a multi-task convolutional neural network algorithm to obtain a face image.

Preferably, detecting the face in the picture using the multi-task convolutional neural network algorithm to obtain the face image includes: the convolutional neural network adopts a fully connected stage and uses bounding box vectors to fine-tune the candidate windows, so that the coordinate points of the face in the image are detected and a face image is obtained.

Preferably, normalizing the data sets of the multiple facial attributes includes: normalizing the width and height of the face images of the data sets of the multiple facial attributes to 128 pixels × 128 pixels.

Preferably, labeling the facial attributes includes: labeling whether the face wears glasses, whether it wears a mask, and its hairstyle, face shape, age, and gender.
Preferably, obtaining, according to the trained network model, the highest-probability recognition result for each of the multiple facial attributes in a single pass during the image recognition stage includes: when the loss function is computed during network training, splitting the merged data along the first dimension of each array according to the number of images corresponding to each attribute, so that the final result for each attribute has the array form D3[n, c, w, h]; performing the split according to the corresponding number of images and feeding each attribute's data into its corresponding loss function; the loss function takes the form of a probability, with the formula:

S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}

where j is the index of the current sub-attribute of a given facial attribute, k indexes the sub-attributes of that attribute, T is the total number of sub-attributes of that attribute, and S_j is the probability of the j-th sub-attribute; k runs from 1, and the probabilities of the T sub-attributes sum to 1. The term e^{a_j} is the exponential of the score for one attribute label; a_j and a_k denote the scores of particular attribute labels, and the denominator sums the exponentials over all attribute labels, yielding the probability that the facial attribute takes a specific label.
In another aspect, an embodiment of the present invention provides a storage medium, the storage medium comprising a stored program, wherein the above facial attribute recognition method is executed when the program runs.

In another aspect, an embodiment of the present invention provides a processor configured to run a program, wherein the above facial attribute recognition method is executed when the program runs.

In another aspect, an embodiment of the present invention provides a facial recognition device, the device comprising a data establishment module, a data merging module, a training module, and a prediction module that are electrically connected in sequence: the data establishment module is configured to establish data sets of multiple facial attributes; the data merging module is configured to merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes; the training module is configured to establish a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes to obtain a multi-attribute network model that can recognize multiple facial attributes; and the prediction module is configured to use the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.

Preferably, the data establishment module includes: a data preprocessing module configured to preprocess facial image data; a data normalization module configured to normalize the facial image data; and a data labeling module configured to label facial attributes.

Preferably, the data merging module includes a first storage module configured to store the face images of each attribute.

Preferably, the training module includes a deep residual network comprising 4 residual modules, wherein the first residual module is composed of two small residual modules; the first small residual module consists of a convolutional layer with 64 3×3 kernels connected to another convolutional layer with 64 3×3 kernels, its identity (shortcut) branch uses a 1×1 convolution with 64 kernels, and the sum of the two is fed into the second small residual module; the second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module; the other three residual modules have the same structure as the first residual module, with 128, 256, and 512 convolution kernels, respectively.

Preferably, the prediction module includes a third storage module configured to store the highest-probability recognition result for each facial attribute obtained in a single pass from the trained network model, that is, the multi-attribute values of the facial attributes.

Preferably, the data normalization module normalizing the facial image data includes: normalizing the width and height of the face images of the multi-attribute data sets to 128 pixels × 128 pixels.

Compared with the prior art, the above technical solution has the following advantages: the present invention reduces network parameters by changing the number of parameters of certain layers of the network model, such as the fully connected layer, which lowers GPU memory occupancy, and feeds data of multiple attributes into the convolutional neural network through data merging so that all attributes are learned and trained in the same network, allowing that single network to recognize all facial attributes quickly and efficiently.
[Brief Description of the Drawings]
To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of the facial attribute recognition method of the present invention.
FIG. 2 is a basic block diagram of the residual network used in the facial attribute recognition method of the present invention.
FIG. 3 is a flowchart of an embodiment of the facial attribute recognition method of the present invention.
FIG. 4 is a diagram of facial shape categories.
FIG. 5 is a structural diagram of the facial attribute recognition device of the present invention.
[Detailed Description of the Embodiments]
The technical solutions in the embodiments of the present invention will be described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present invention.
Embodiment One
FIG. 1 is a flowchart of the facial attribute recognition method of the present invention. As shown in FIG. 1, the facial attribute recognition method of the present invention includes the following steps:
S11: establish data sets of multiple facial attributes;
S12: merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes;
S13: establish a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes, obtaining a multi-attribute network model capable of recognizing multiple facial attributes;
S14: use the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
In a specific implementation, establishing the data sets of multiple facial attributes in step S11 includes preprocessing, normalizing and labeling the data sets of the multiple facial attributes.
Preprocessing the data sets of multiple facial attributes further includes:
detecting the faces in pictures with a multi-task convolutional neural network algorithm to obtain facial images. A fully connected stage of the convolutional neural network can be used to fine-tune the candidate windows with bounding-box vectors and detect the coordinates of the face in the image; after a face is detected, the facial image is cropped out for training.
The size to which the images are normalized is determined by the network input size, and different networks have different input sizes. As one embodiment of the present invention, normalizing the data sets of multiple facial attributes includes normalizing the width and height of the facial images in the data sets to, but not limited to, 128 pixels × 128 pixels.
Labeling facial attributes includes, but is not limited to, labeling attributes such as wearing glasses, wearing a mask, hairstyle, face shape, age and gender.
In a specific implementation, merging the data sets of multiple facial attributes in step S12 to form the data set collection of the multiple facial attributes includes:
setting the input format of each facial attribute data set to the array form D1[n, c, w, h];
merging the data sets of the multiple facial attributes along the first dimension of the input array, i.e. the number of pictures; assuming the number of facial attribute types is m, the merged data takes the form D2[m×n, c, w, h];
where n, c, w and h are respectively the number, the number of channels, the width and the height of the data set images of the multiple facial attributes input into the deep convolutional neural network.
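The merge along the first dimension can be sketched in a few lines of NumPy (an illustrative example, not part of the original disclosure; the array names and the values chosen for m and n below are assumptions):

```python
import numpy as np

# Hypothetical example: m = 3 attribute data sets (e.g. age, gender, face shape),
# each holding n = 4 images with c = 3 channels of size w = h = 128.
n, c, w, h = 4, 3, 128, 128
d1_age = np.random.rand(n, c, w, h).astype(np.float32)         # D1 for the age data set
d1_gender = np.random.rand(n, c, w, h).astype(np.float32)      # D1 for the gender data set
d1_face_shape = np.random.rand(n, c, w, h).astype(np.float32)  # D1 for the face shape data set

# Merge along the first (image-count) dimension: D2[m*n, c, w, h]
d2 = np.concatenate([d1_age, d1_gender, d1_face_shape], axis=0)
print(d2.shape)  # (12, 3, 128, 128), i.e. (m*n, c, w, h)
```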
In a specific implementation, establishing the multi-task deep convolutional neural network in step S13 to train the data set collection of the multiple facial attributes and obtain a multi-attribute network model capable of recognizing multiple facial attributes includes: using four residual modules of a deep residual network, where the first residual module is composed of two small residual modules; in the first small residual module, a convolutional layer with 64 kernels of size 3×3 is connected to another convolutional layer with 64 kernels of size 3×3, and its identity mapping uses 64 1×1 convolutions; the two are summed and fed into the second small residual module; the second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module; the other three residual modules have the same structure as the first, but with 128, 256 and 512 convolution kernels respectively; and inputting the facial attribute data of the images to be recognized into the deep residual network for network training. In a specific implementation, the number of residual modules can be selected according to actual needs; four residual modules are used here as an example.
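As an illustrative sketch only (PyTorch is an assumption; the patent does not name a framework), the first "small residual module" described above, with two 3×3 convolution layers and a 1×1 convolution as the identity mapping, could look like this:

```python
import torch
import torch.nn as nn

class SmallResidualBlock(nn.Module):
    """Sketch of one 'small residual module': two 3x3 conv layers plus an
    identity mapping implemented here as a 1x1 convolution (an assumption)."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        self.shortcut = nn.Conv2d(in_channels, out_channels, kernel_size=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = self.shortcut(x)          # identity mapping via 1x1 convolution
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + residual)     # sum of the two branches, H(x) = F(x) + x

block = SmallResidualBlock(3, 64)
print(block(torch.randn(1, 3, 128, 128)).shape)  # torch.Size([1, 64, 128, 128])
```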
In a specific implementation, using the multi-attribute network model in step S14 to perform multi-attribute prediction on the facial attributes of the image to be recognized, so as to identify the multiple facial attributes in the image, includes: according to the trained network model, obtaining in a single pass, at the picture recognition stage, the recognition result with the highest probability for each of the multiple facial attributes, i.e. the multi-attribute values of the facial attributes.
In the network training stage, when the loss function is computed, the merged data is split along the first dimension of each array and cut apart according to the number of pictures corresponding to each attribute, so that the final result is the array form D3[n, c, w, h] for each attribute.
The splitting is carried out according to the corresponding number of pictures, and each attribute's data is fed into its corresponding loss function.
The loss function takes a probability (softmax) form, with the formula

S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}

where j is the index of the current sub-attribute of one facial attribute, k is the index of a sub-attribute of that facial attribute, T is the total number of sub-attributes of that facial attribute, and S_j is the probability of the j-th sub-attribute of that facial attribute; k takes values starting from 1, and the probabilities of the T sub-attributes sum to 1. Here e^{a_j} is the exponential form of one attribute value of the data, a_j and a_k denote attribute value labels of the facial attribute, and the denominator sums the exponentials of all attribute labels, so that the result is the probability that the facial attribute equals a specific attribute label.
For example, consider the face shape attribute, which has three sub-attributes: round face, square face and pointed face. Then T = 3 and k takes values starting from 1. In a single judgment, the probabilities of the sub-attributes round face, square face and pointed face sum to 1. For the round face, the network output computed by the neural network is a_1 = 3; for the square face, a_2 = 1; and for the pointed face, a_3 = -3. According to the probability form of the loss function,

S_j = \frac{e^{a_j}}{\sum_{k=1}^{3} e^{a_k}},

the probability that this judgment is a round face is

S_1 = \frac{e^{3}}{e^{3} + e^{1} + e^{-3}} \approx 0.88,

the probability that it is a square face is

S_2 = \frac{e^{1}}{e^{3} + e^{1} + e^{-3}} \approx 0.12,

and the probability that it is a pointed face is

S_3 = \frac{e^{-3}}{e^{3} + e^{1} + e^{-3}} \approx 0.002.

Comparing the probability values S_1, S_2 and S_3 for the round, square and pointed faces, the face shape in this judgment is determined to belong to the round-face sub-attribute among the three sub-attributes.
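The three probabilities above can be verified with a few lines of Python (a quick check, not part of the original text); the outputs a_1 = 3, a_2 = 1 and a_3 = -3 are taken from the example:

```python
import math

# Network outputs for the three face-shape sub-attributes from the example
a = {"round": 3.0, "square": 1.0, "pointed": -3.0}

denom = sum(math.exp(v) for v in a.values())                   # sum of exponentials over all labels
probs = {name: math.exp(v) / denom for name, v in a.items()}   # softmax probabilities

print(probs)                      # round ≈ 0.879, square ≈ 0.119, pointed ≈ 0.002
print(max(probs, key=probs.get))  # 'round' -- the predicted face shape
```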
It can be seen that by adopting the facial attribute recognition method of the present invention, the parameters of the deep convolutional neural network are reduced, i.e. the number of parameters of certain layers in the network model, such as the fully connected layer, is changed, which reduces the network's video memory occupancy; and by merging data, data of multiple attributes are input into the convolutional neural network so that all attributes are learned and trained in the same network, allowing the same network to complete the recognition of all facial attributes quickly and efficiently.
Embodiment Two
An embodiment of the present invention further provides a storage medium. The storage medium includes a stored program, and the above facial attribute recognition method flow is executed when the program runs.
Optionally, in this embodiment, the above storage medium may be configured to store program code for executing the following facial attribute recognition method flow:
S11: establish data sets of multiple facial attributes;
S12: merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes;
S13: establish a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes, obtaining a multi-attribute network model capable of recognizing multiple facial attributes;
S14: use the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
Optionally, in this embodiment, the above storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
It can be seen that by using the storage medium of the present invention, the storage capacity is reduced and the program implementing the built-in facial attribute recognition method flow runs faster, so that the recognition of all facial attributes is completed quickly and efficiently.
Embodiment Three
An embodiment of the present invention further provides a processor. The processor is used to run a program, and the steps of the above facial attribute recognition method are executed when the program runs.
Optionally, in this embodiment, the above program is used to execute the following steps:
S11: establish data sets of multiple facial attributes;
S12: merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes;
S13: establish a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes, obtaining a multi-attribute network model capable of recognizing multiple facial attributes;
S14: use the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
Optionally, for specific examples in this embodiment, reference may be made to the above embodiment and the examples described in the specific implementations, which are not repeated here.
It can be seen that by using the processor of the present invention, the amount of data to be processed is reduced and the program implementing the built-in facial attribute recognition method flow runs faster, so that the recognition of all facial attributes is completed quickly and efficiently.
Embodiment Four
FIG. 3 is a flowchart of an embodiment of the facial attribute recognition method of the present invention. In today's society, practitioners can obtain many kinds of data from a variety of channels, which provides good data support for deep learning. To introduce the specific embodiments of the present invention more conveniently, the flow of the invention is described below with age, gender and face shape as the training sets.
Assume that enough age, gender and face shape samples in various poses have been collected; these samples serve as the training data to be preprocessed. Facial attribute recognition starts, and the training data are then preprocessed. The preprocessing samples include data pictures with age information, data pictures with gender information and data pictures with face shape information.
Before the training set for the network input is constructed, the pictures must be preprocessed. The data preprocessing of this scheme follows the steps below:
Step 1: detect the faces in the pictures using the multi-task convolutional neural network (MTCNN) algorithm, which cascades three convolutional neural networks for face detection. First, a fully convolutional network obtains candidate face windows and bounding-box regression values; after the candidate windows are calibrated according to the bounding boxes, overlapping windows are removed by non-maximum suppression. Second, the convolutional network of this stage adopts a fully connected form and fine-tunes the candidate windows with the bounding-box vectors to detect the coordinates of the face in the image; after the face is detected, the facial image is cropped out for training. Finally, a convolutional network with one more layer than the previous one further refines the position of the face detection box. After the face detection algorithm, facial pictures normalized to 128 pixels × 128 pixels are obtained (a code sketch of this detect-crop-resize step is given after Step 2 below).
Step 2: label the facial attributes. The labeled attributes include whether glasses are worn, whether a mask is worn, hairstyle, face shape, age, gender, and so on. For labeling, an initial model is trained on the existing public attribute data set Market1501 and used to coarsely classify the pictures obtained from face detection, giving each attribute a numerical label; the coarsely classified facial attribute pictures are then finely classified by hand, thereby constructing data sets of different facial attributes. For example, different attributes of the same picture can be placed in different folders.
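A minimal sketch of the detect-crop-normalize operation in Step 1, assuming the third-party `mtcnn` Python package and OpenCV (the file name and helper name below are hypothetical):

```python
import cv2
from mtcnn import MTCNN  # third-party 'mtcnn' package (an assumption; any MTCNN implementation works)

detector = MTCNN()

def crop_normalized_face(image_path: str, size: int = 128):
    """Detect the first face in a picture, crop it, and resize it to size x size."""
    bgr = cv2.imread(image_path)                   # 'image_path' is a hypothetical input file
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)     # MTCNN expects RGB input
    detections = detector.detect_faces(rgb)        # list of dicts with a 'box' = [x, y, w, h]
    if not detections:
        return None
    x, y, w, h = detections[0]["box"]
    face = rgb[max(y, 0): y + h, max(x, 0): x + w]  # crop the detected face region
    return cv2.resize(face, (size, size))           # normalize to 128 x 128 pixels

face_128 = crop_normalized_face("sample.jpg")
```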
Next comes the training data merging and multi-task training data preparation stage. A multi-task learning mechanism allows the network to share features across data sets. Many deep learning networks today focus on only a single task, so many data characteristics with common properties cannot be shared. Multi-task learning solves this problem well: it is an inductive transfer mechanism whose main goal is to improve generalization by exploiting the domain-specific information hidden in the training signals of multiple related tasks. Multi-task learning achieves this by training multiple tasks in parallel with a shared representation, i.e. while learning one problem, the shared representation can be used to acquire knowledge about other related problems. Multi-task learning is therefore a method focused on applying the knowledge gained in solving one problem to other related problems. This scheme prepares the multi-task learning training data by merging data. The data merging process (taking face shape, gender and age data as an example; other attributes can follow the same steps) observes the following steps:
a. A precondition for merging is that every image in each data set input into the network has the same number of channels, width and height; this has already been ensured in the data preprocessing stage.
b. When the three facial attribute data sets are merged, the input format of each facial attribute data set is an array of the form D1[n, c, w, h], where n is the number of pictures input into the network, c is the number of image channels (generally 3), and w and h are the width and height of the images. Under condition a, the data sets are merged along the first dimension of the input array, i.e. the number of pictures; the merged data takes the form D2[m×n, c, w, h], where m is the number of attribute types and n, c, w and h are respectively the number, the number of channels, the width and the height of the data set images of the multiple facial attributes input into the deep convolutional neural network.
The above two steps complete the data preparation stage of multi-task learning. By feeding the merged data into the network for learning, the network can learn the correlations between the data sets, achieving the purpose of multi-task learning.
Next comes the deep convolutional neural network training stage. The present invention uses a deep residual network as the base network for feature extraction from the network input data. The residual network, proposed in 2015, outperforms other deep networks, but its very deep structure and large number of parameters lead to high video memory usage. The present invention applies the classification idea of detection networks to modify the deep residual network: the residual modules in the network are reduced and the fully connected layer is discarded, forming a new network structure. This makes the network structure simpler and greatly reduces the network model, while the video memory usage is also reduced substantially.
The deep residual network structure used in this scheme is relatively simple. As shown in FIG. 2, x is the input of the residual module, F(x) is the original mapping of the convolutional neural network, relu is the activation function in the deep residual module, and H(x) is the output function of the deep residual module; the deep residual module forms the network output function from the original mapping F(x) and the input x, i.e. H(x) = F(x) + x. This scheme uses four residual modules of the deep residual network. The first residual module is composed of two small residual modules: in the first small residual module, a convolutional layer with 64 kernels of size 3×3 is connected to another convolutional layer with 64 kernels of size 3×3, and its identity mapping uses 64 1×1 convolutions; the two are summed and fed into the second small residual module. The second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module. The other three residual modules have the same structure as the first, but with 128, 256 and 512 convolution kernels respectively. Only these four residual modules of the deep residual network are used, giving a smaller structure that is better suited to feature extraction from the network input data. Through the deep residual network, deep features of the image can be extracted well, which facilitates better classification.
Next comes the network output stage. The data were merged before network training began so that the shared knowledge in the data could be learned. Through training, the network learns good features, and the number of training images does not change during learning. Therefore, to obtain the learning status of each facial attribute data set, when the network output computes the loss function, the data is split along the first dimension of each attribute array and cut apart per attribute, so that the final result is still the array form D3[n, c, w, h] for each attribute, and each attribute's data is fed into its corresponding loss function.
When the data are merged, they are merged according to the number of facial attribute pictures input into the network. When the data are separated, they are cut according to the number of pictures of each data set input into the network, which must be kept identical to the original number.
The split data sets are fed into their corresponding loss function layers, where the corresponding loss functions are computed, the corresponding weight updates are obtained, and the class output of each facial attribute data set is obtained.
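A hedged PyTorch sketch of this split-and-loss step (the backbone feature size, attribute names and per-attribute linear heads are assumptions used only for illustration; the patent itself discards the fully connected layer and feeds the convolutional output to the loss layers):

```python
import torch
import torch.nn as nn

# Assumed setup: 'features' is the backbone output for a merged batch D2 of m = 3
# attribute data sets with n = 4 images each, and each attribute has its own head.
n = 4
heads = nn.ModuleDict({
    "age": nn.Linear(512, 4),         # juvenile / youth / middle-aged / elderly
    "gender": nn.Linear(512, 2),      # male / female
    "face_shape": nn.Linear(512, 7),  # seven face-shape sub-attributes
})
criterion = nn.CrossEntropyLoss()     # softmax-based probability loss

features = torch.randn(3 * n, 512)    # backbone output for the merged batch
labels = {k: torch.randint(0, h.out_features, (n,)) for k, h in heads.items()}

# Split back into per-attribute chunks D3[n, ...] along the first dimension
chunks = dict(zip(heads.keys(), torch.split(features, n, dim=0)))

total_loss = sum(criterion(heads[k](chunks[k]), labels[k]) for k in heads)
total_loss.backward()                 # back-propagation updates the weights
```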
The loss function computes, from the learned shared features, the loss corresponding to each attribute's data. The loss function used in this scheme takes a probability (softmax) form, with the formula

S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}

where e^{a_j} is the exponential form of one result of the data, a_j and a_k denote attribute value labels of one facial attribute, and the denominator sums the exponentials of all attribute labels, so that the result is the probability that the facial attribute equals a specific attribute label. According to the network loss function, the weights are updated with the back-propagation algorithm so that the network reaches an optimal state, thereby obtaining the facial attributes corresponding to the input samples, such as the age recognition set, the gender recognition set and the face shape recognition set.
Taking age as an example, age is labeled with four labels: juvenile, youth, middle-aged and elderly. The output of the last layer of the network is computed according to the loss function and the corresponding labels, giving four probabilities that represent the corresponding classes (juvenile, youth, middle-aged, elderly); if the probability of juvenile is the highest, the sample is judged to be a juvenile. In the final network structure of the present invention there is a loss function for each attribute, so each attribute computes its corresponding probabilities from its own loss function, and the class it belongs to is decided by the magnitude of the probabilities. According to the network loss function, the weights are updated with the back-propagation algorithm so that the network reaches an optimal state, thereby obtaining the facial attributes corresponding to the input samples.
Taking gender as an example, gender is labeled with two labels: male and female. The output of the last layer of the network is computed according to the loss function and the corresponding labels, giving two probabilities that represent the corresponding classes (male, female); if the probability of male is the highest, the sample is judged to be male. In the final network structure of the present invention there is a loss function for each attribute, so each attribute computes its corresponding probabilities from its own loss function, and the class it belongs to is decided by the magnitude of the probabilities. According to the network loss function, the weights are updated with the back-propagation algorithm so that the network reaches an optimal state, thereby obtaining the facial attributes corresponding to the input samples.
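For illustration, a prediction step over such per-attribute outputs might look like the following sketch (the dummy backbone, the heads and the label lists are assumptions; each head is assumed to have a matching entry in the label table):

```python
import torch
import torch.nn as nn

# Illustrative inference sketch: a dummy 'backbone' and per-attribute 'heads' stand in
# for the trained shared network and its per-attribute output layers.
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 128 * 128, 512))
heads = nn.ModuleDict({"age": nn.Linear(512, 4), "gender": nn.Linear(512, 2)})
labels = {
    "age": ["juvenile", "youth", "middle-aged", "elderly"],
    "gender": ["male", "female"],
}

@torch.no_grad()
def predict_attributes(image: torch.Tensor) -> dict:
    feature = backbone(image.unsqueeze(0))            # add the batch dimension
    out = {}
    for name, head in heads.items():
        probs = torch.softmax(head(feature), dim=1)   # probability of every sub-attribute
        out[name] = labels[name][probs.argmax(dim=1).item()]
    return out

print(predict_attributes(torch.randn(3, 128, 128)))   # e.g. {'age': 'youth', 'gender': 'male'}
```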
FIG. 4 is a diagram of facial shape categories. According to the most common face types, faces can be roughly divided into seven categories: round face, oval face, heart-shaped face (inverted-triangle face), diamond face, square face, long face and pear-shaped face (regular-triangle face). The process of judging the face shape (the face shape categories are defined by practitioners according to business needs; the seven face shapes above are used as an example) is as follows:
Step 1: after the data preprocessing process, establish the data sets of the training network and the test network through face detection and data labeling.
Step 2: input the training data of each attribute into the network, merge the data (taking gender, age and face shape as an example), feed it into the network model, and train the network model.
Step 3: after the network model training is finished, use the trained model and input the picture to be recognized into the network model, thereby obtaining the gender, age and face shape categories of the face contained in the picture.
The elements of the age recognition set output by the network can be any of the above ages (juvenile, youth, middle-aged, elderly).
The elements of the face shape recognition set output by the network can be any of the above face shapes (round face, oval face, heart-shaped face (inverted-triangle face), diamond face, square face, long face, pear-shaped face (regular-triangle face)).
Other facial attributes, such as wearing glasses, wearing a mask, hairstyle, skin color, expression and emotion, beard, hair color, wearing a hat, ethnicity and attractiveness, can be recognized accurately by similar methods.
It can be seen that by adopting the facial attribute recognition method of the present invention, the parameters of the deep convolutional neural network are reduced, i.e. the number of parameters of certain layers in the network model, such as the fully connected layer, is changed, which reduces the network's video memory occupancy; and by merging data, data of multiple attributes are input into the convolutional neural network so that all attributes are learned and trained in the same network, allowing the same network to complete the recognition of all facial attributes quickly and efficiently.
Embodiment Five
An embodiment of the present invention further provides a storage medium. The storage medium includes a stored program, and the facial attribute recognition method flow described in Embodiment Four is executed when the program runs.
Optionally, in this embodiment, the above storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk or an optical disc.
It can be seen that by using the storage medium of the present invention, the storage capacity is reduced and the program implementing the built-in facial attribute recognition method flow runs faster, so that the recognition of all facial attributes is completed quickly and efficiently.
Embodiment Six
An embodiment of the present invention further provides a processor. The processor is used to run a program, and the steps of the facial attribute recognition method described in Embodiment Four are executed when the program runs.
Optionally, for specific examples in this embodiment, reference may be made to the above embodiment and the examples described in the specific implementations, which are not repeated here.
It can be seen that by using the processor of the present invention, the amount of data to be processed is reduced and the program implementing the built-in facial attribute recognition method flow runs faster, so that the recognition of all facial attributes is completed quickly and efficiently.
Embodiment Seven
FIG. 5 is a structural diagram of the facial attribute recognition device of the present invention. The device includes: a data establishment module configured to establish data sets of multiple facial attributes; a data merging module configured to merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes; a training module configured to establish a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes, obtaining a multi-attribute network model capable of recognizing multiple facial attributes; and a prediction module configured to use the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
The data establishment module includes: a data preprocessing module configured to preprocess facial image data; a data normalization module configured to normalize the facial image data; and a data labeling module configured to label facial attributes. The data normalization module normalizes the facial images of the data sets of multiple facial attributes to a width and height of, but not limited to, 128 pixels × 128 pixels. Labeling facial attributes refers to labeling whether glasses are worn, whether a mask is worn, and what the hairstyle, face shape, age and gender are.
The data merging module includes a first storage module configured to store the facial image data of each attribute. When the facial attribute recognition method is executed on the facial attribute recognition device, the software program dynamically generates the input array D1[n, c, w, h] for each facial attribute data set and merges the data sets of the multiple facial attributes along the first dimension of the input array, i.e. the number of pictures. Assuming the number of facial attribute types is m, the merged data array is D2[m×n, c, w, h], where n is the number of images input into the deep convolutional neural network, c is the number of channels of the images, w is the width of the images, and h is the height of the images.
The training module includes a deep residual network comprising four residual modules. The first residual module is composed of two small residual modules: in the first small residual module, a convolutional layer with 64 kernels of size 3×3 is connected to another convolutional layer with 64 kernels of size 3×3, and its identity mapping uses 64 1×1 convolutions; the two are summed and fed into the second small residual module. The second small residual module has the same structure as the first, and its identity mapping is given by the output of the first small residual module. The other three residual modules have the same structure as the first residual module, with 128, 256 and 512 convolution kernels respectively.
The prediction module includes a second storage module configured to store the recognition results with the highest probability for each facial attribute, obtained in a single pass from the trained network model, i.e. the multi-attribute values of the facial attributes.
It can be seen that by using the facial attribute recognition device of the present invention, the parameters of the deep convolutional neural network are reduced, i.e. the number of parameters of certain layers in the network model, such as the fully connected layer, is changed, which reduces the network's video memory occupancy; and by merging data, data of multiple attributes are input into the convolutional neural network so that all attributes are learned and trained in the same network, allowing the same network to complete the recognition of all facial attributes quickly and efficiently.
By contrast, the process of the original facial attribute recognition method is: establish the data sets corresponding to age, gender and face shape (including training and testing); input age, gender and face shape respectively into their corresponding network models (three network models then need to be trained); and input the test pictures into the three trained network models respectively, obtaining each network model's attribute recognition result from the attribute with the highest probability in that model.
It can thus be seen that the present scheme performs facial recognition with a multi-task deep learning recognition method. In a deep convolutional neural network, the number of parameters in the network determines the size of the network, and the size of the network determines how much video memory the network occupies. By reducing the network parameters, i.e. changing the number of parameters of certain layers in the network (for example the fully connected layer), the network's video memory occupancy is reduced; and by merging data, data of multiple attributes are input into the convolutional neural network so that all attributes are learned and trained in the network, allowing the same network to complete the recognition of all facial attributes. In the original deep residual network, the final fully connected layer learns parameters for 1000 outputs; since the fully connected layer accounts for most of the network parameters, this scheme prunes the network, i.e. removes the fully connected layer from the original network structure, so that the convolutional layers are connected directly to the loss layers and the facial attribute probabilities are regressed directly, completing the modification and pruning of the network structure. Such pruning reduces the number of computed network parameters, thereby reducing the network size and the video memory usage.
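One way to express this pruning idea as a sketch, using torchvision's ResNet-18 as a stand-in for the reduced backbone (an assumption; the patent's own network keeps only four residual modules and connects the convolutional output directly to the loss layers):

```python
import torch
import torch.nn as nn
from torchvision import models

# Drop the 1000-way fully connected layer of a stock residual network and attach
# lightweight per-attribute heads instead.
backbone = models.resnet18()
backbone.fc = nn.Identity()             # discard the fully connected classification layer

heads = nn.ModuleDict({                 # small per-attribute heads feeding the loss layers
    "gender": nn.Linear(512, 2),
    "age": nn.Linear(512, 4),
    "face_shape": nn.Linear(512, 7),
})

x = torch.randn(2, 3, 128, 128)         # a merged mini-batch of normalized face crops
feat = backbone(x)                      # 512-d features after global average pooling
outputs = {name: head(feat) for name, head in heads.items()}
print({k: v.shape for k, v in outputs.items()})
```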
From the above description it can be seen that the facial attribute recognition method, device, storage medium and processor according to the present invention reduce the network parameters, i.e. change the number of parameters of certain layers in the network model, such as the fully connected layer, which lowers the network's video memory occupancy; and, by merging data, input data of multiple attributes into the convolutional neural network so that all attributes are learned and trained in the network, allowing the same network to complete the recognition of all facial attributes quickly and efficiently.
The embodiments of the present invention have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present invention. The description of the above embodiments is only intended to help understand the method and core idea of the present invention. At the same time, a person of ordinary skill in the art may, according to the idea of the present invention, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (18)

  1. A facial attribute recognition method, characterized by comprising the steps of:
    establishing data sets of multiple facial attributes;
    merging the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes;
    establishing a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes, obtaining a multi-attribute network model capable of recognizing the multiple facial attributes; and
    using the multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
  2. The facial attribute recognition method according to claim 1, characterized in that establishing data sets of multiple facial attributes comprises:
    preprocessing the data sets of the multiple facial attributes;
    normalizing the preprocessed data sets of the multiple facial attributes; and
    labeling the normalized multiple facial attributes.
  3. The facial attribute recognition method according to claim 1, characterized in that merging the data sets of multiple facial attributes to form the data set collection of the multiple facial attributes comprises:
    setting the input format of each facial attribute data set to the array form D1[n, c, w, h];
    merging the data sets of the multiple facial attributes along the first dimension of the input array, i.e. the number of pictures, wherein, assuming the number of facial attribute types is m, the merged data takes the form D2[m×n, c, w, h];
    where n, c, w and h are respectively the number, the number of channels, the width and the height of the data set images of the multiple facial attributes input into the deep convolutional neural network.
  4. The facial attribute recognition method according to claim 1, characterized in that establishing the multi-task deep convolutional network to train the collection of multiple facial attribute data sets comprises:
    using four residual modules of a deep residual network;
    inputting the facial attribute data of the image to be recognized into the deep residual network for network training;
    wherein the first residual module of the deep residual network is composed of two small residual modules; in the first small residual module, a convolutional layer with 64 kernels of size 3×3 is connected to another convolutional layer with 64 kernels of size 3×3, and its identity mapping uses 64 1×1 convolutions; the two are summed and fed into the second small residual module; the second small residual module has the same structure as the first small residual module, and the identity mapping of the second small residual module is given by the output of the first small residual module; and
    the other three residual modules have the same structure as the first residual module, with 128, 256 and 512 convolution kernels respectively.
  5. The facial attribute recognition method according to claim 1, characterized in that establishing the multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes and obtain the multi-attribute network model capable of recognizing multiple facial attributes comprises:
    according to the trained network model, obtaining in a single pass, at the picture recognition stage, the recognition result with the highest probability for each of the multiple facial attributes, i.e. the multi-attribute values of the facial attributes.
  6. The facial attribute recognition method according to claim 2, characterized in that preprocessing the data sets of the multiple facial attributes comprises:
    detecting the face in a picture using a multi-task convolutional neural network algorithm to obtain a facial image.
  7. The facial attribute recognition method according to claim 6, characterized in that detecting the face in the picture using the multi-task convolutional neural network algorithm to obtain the facial image comprises:
    the convolutional neural network adopting a fully connected form and fine-tuning the candidate windows with bounding-box vectors, so as to detect the coordinates of the face in the image and obtain the facial image.
  8. The facial attribute recognition method according to claim 2, characterized in that normalizing the data sets of the multiple facial attributes comprises: normalizing the facial images of the data sets of multiple facial attributes to a width and height of 128 pixels × 128 pixels.
  9. The facial attribute recognition method according to claim 2, characterized in that labeling the facial attributes comprises: labeling the attributes of wearing glasses, wearing a mask, hairstyle, face shape, age and gender.
  10. The facial attribute recognition method according to claim 5, characterized in that obtaining, according to the trained network model, the recognition result with the highest probability for each of the multiple facial attributes in a single pass at the picture recognition stage comprises:
    when computing the loss function during network training, splitting the data along the first dimension of each array and cutting it apart according to the number of pictures corresponding to each attribute, so that the final result is the array form D3[n, c, w, h] for each attribute;
    when splitting, proceeding according to the corresponding number of pictures and feeding each attribute's data into its corresponding loss function;
    the loss function taking a probability form, with the formula

    S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}

    where j is the index of the current sub-attribute of one facial attribute, k is the index of a sub-attribute of one facial attribute, T is the total number of sub-attributes of one facial attribute, and S_j is the probability of the j-th sub-attribute of one facial attribute; k takes values starting from 1, and the probabilities of the T sub-attributes sum to 1; e^{a_j} is the exponential form of one attribute of the data, a_j and a_k denote attribute value labels of one facial attribute, and the denominator sums the exponentials of all attribute labels, so that the result is the probability that the facial attribute equals a specific attribute label.
  11. A storage medium, characterized in that the storage medium comprises a stored program, wherein the facial attribute recognition method according to any one of claims 1 to 10 is executed when the program runs.
  12. A processor, characterized in that the processor is used to run a program, wherein the facial attribute recognition method according to any one of claims 1 to 10 is executed when the program runs.
  13. A facial recognition device, characterized in that the device comprises a data establishment module, a data merging module, a training module and a prediction module that are electrically connected in sequence, wherein
    the data establishment module is configured to establish data sets of multiple facial attributes;
    the data merging module is configured to merge the data sets of the multiple facial attributes to form a data set collection of the multiple facial attributes;
    the training module is configured to establish a multi-task deep convolutional neural network to train the data set collection of the multiple facial attributes, obtaining a multi-attribute network model capable of recognizing multiple facial attributes; and
    the prediction module is configured to use the obtained multi-attribute network model to perform multi-attribute prediction on the facial attributes of an image to be recognized, so as to identify multiple facial attributes in the image to be recognized.
  14. The facial recognition device according to claim 13, characterized in that the data establishment module comprises:
    a data preprocessing module configured to preprocess facial image data;
    a data normalization module configured to normalize the facial image data; and
    a data labeling module configured to label facial attributes.
  15. The facial recognition device according to claim 13, characterized in that the data merging module comprises a first storage module configured to store the facial image data of each attribute.
  16. The facial recognition device according to claim 13, characterized in that the training module comprises a deep residual network, the deep residual network comprising four residual modules,
    wherein the first residual module is composed of two small residual modules; in the first small residual module, a convolutional layer with 64 kernels of size 3×3 is connected to another convolutional layer with 64 kernels of size 3×3, and its identity mapping uses 64 1×1 convolutions; the two are summed and fed into the second small residual module; the second small residual module has the same structure as the first small residual module, and the identity mapping of the second small residual module is given by the output of the first small residual module; and
    the other three residual modules have the same structure as the first residual module, with 128, 256 and 512 convolution kernels respectively.
  17. The facial recognition device according to claim 13, characterized in that the prediction module comprises a second storage module configured to store the recognition results with the highest probability for each facial attribute, obtained in a single pass from the trained network model, i.e. the multi-attribute values of the facial attributes.
  18. The facial recognition device according to claim 14, characterized in that the data normalization module normalizing the facial image data comprises: normalizing the facial images of the data sets of multiple facial attributes to a width and height of 128 pixels × 128 pixels.
PCT/CN2019/112478 2018-12-07 2019-10-22 Facial attribute identification method and device, storage medium and processor WO2020114118A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811502128.2 2018-12-07
CN201811502128.2A CN111291604A (en) 2018-12-07 2018-12-07 Face attribute identification method, device, storage medium and processor

Publications (1)

Publication Number Publication Date
WO2020114118A1 true WO2020114118A1 (en) 2020-06-11

Family

ID=70973734

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/112478 WO2020114118A1 (en) 2018-12-07 2019-10-22 Facial attribute identification method and device, storage medium and processor

Country Status (2)

Country Link
CN (1) CN111291604A (en)
WO (1) WO2020114118A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783574B (en) * 2020-06-17 2024-02-23 李利明 Meal image recognition method, device and storage medium
CN112149601A (en) * 2020-09-30 2020-12-29 北京澎思科技有限公司 Occlusion-compatible face attribute identification method and device and electronic equipment
CN114067488A (en) * 2021-11-03 2022-02-18 深圳黑蚂蚁环保科技有限公司 Recovery system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017102671A (en) * 2015-12-01 2017-06-08 キヤノン株式会社 Identification device, adjusting device, information processing method, and program
CN106228139A (en) * 2016-07-27 2016-12-14 东南大学 A kind of apparent age prediction algorithm based on convolutional network and system thereof
CN108921022A (en) * 2018-05-30 2018-11-30 腾讯科技(深圳)有限公司 A kind of human body attribute recognition approach, device, equipment and medium
CN108921029A (en) * 2018-06-04 2018-11-30 浙江大学 A kind of SAR automatic target recognition method merging residual error convolutional neural networks and PCA dimensionality reduction

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105825191A (en) * 2016-03-23 2016-08-03 厦门美图之家科技有限公司 Face multi-attribute information-based gender recognition method and system and shooting terminal
CN106529402A (en) * 2016-09-27 2017-03-22 中国科学院自动化研究所 Multi-task learning convolutional neural network-based face attribute analysis method
WO2018133034A1 (en) * 2017-01-20 2018-07-26 Intel Corporation Dynamic emotion recognition in unconstrained scenarios
CN107247947A (en) * 2017-07-07 2017-10-13 北京智慧眼科技股份有限公司 Face character recognition methods and device

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695513B (en) * 2020-06-12 2023-02-14 长安大学 Facial expression recognition method based on depth residual error network
CN111695513A (en) * 2020-06-12 2020-09-22 长安大学 Facial expression recognition method based on depth residual error network
CN111783619A (en) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 Human body attribute identification method, device, equipment and storage medium
CN111783619B (en) * 2020-06-29 2023-08-11 北京百度网讯科技有限公司 Human body attribute identification method, device, equipment and storage medium
CN112287966A (en) * 2020-09-21 2021-01-29 深圳市爱深盈通信息技术有限公司 Face recognition method and device and electronic equipment
CN112232231A (en) * 2020-10-20 2021-01-15 城云科技(中国)有限公司 Pedestrian attribute identification method, system, computer device and storage medium
CN112232231B (en) * 2020-10-20 2024-02-02 城云科技(中国)有限公司 Pedestrian attribute identification method, system, computer equipment and storage medium
CN112783990A (en) * 2021-02-02 2021-05-11 贵州大学 Graph data attribute-based reasoning method and system
CN112783990B (en) * 2021-02-02 2023-04-18 贵州大学 Graph data attribute-based reasoning method and system
CN113128345A (en) * 2021-03-22 2021-07-16 深圳云天励飞技术股份有限公司 Multitask attribute identification method and device and computer readable storage medium
CN112906668B (en) * 2021-04-07 2023-08-25 上海应用技术大学 Face information identification method based on convolutional neural network
CN112906668A (en) * 2021-04-07 2021-06-04 上海应用技术大学 Face information identification method based on convolutional neural network
CN113657486A (en) * 2021-08-16 2021-11-16 浙江新再灵科技股份有限公司 Multi-label multi-attribute classification model establishing method based on elevator picture data
CN113705527A (en) * 2021-09-08 2021-11-26 西南石油大学 Expression recognition method based on loss function integration and coarse and fine hierarchical convolutional neural network
CN113705527B (en) * 2021-09-08 2023-09-22 西南石油大学 Expression recognition method based on loss function integration and thickness grading convolutional neural network
CN113963426A (en) * 2021-12-22 2022-01-21 北京的卢深视科技有限公司 Model training method, mask wearing face recognition method, electronic device and storage medium
CN113963426B (en) * 2021-12-22 2022-08-26 合肥的卢深视科技有限公司 Model training method, mask wearing face recognition method, electronic device and storage medium
CN114626527A (en) * 2022-03-25 2022-06-14 中国电子产业工程有限公司 Neural network pruning method and device based on sparse constraint retraining
CN114626527B (en) * 2022-03-25 2024-02-09 中国电子产业工程有限公司 Neural network pruning method and device based on sparse constraint retraining

Also Published As

Publication number Publication date
CN111291604A (en) 2020-06-16

Similar Documents

Publication Publication Date Title
WO2020114118A1 (en) Facial attribute identification method and device, storage medium and processor
US11195051B2 (en) Method for person re-identification based on deep model with multi-loss fusion training strategy
Salama AbdELminaam et al. A deep facial recognition system using computational intelligent algorithms
Torralba et al. Sharing visual features for multiclass and multiview object detection
Ali et al. Boosted NNE collections for multicultural facial expression recognition
CN110689025B (en) Image recognition method, device and system and endoscope image recognition method and device
CN110458038B (en) Small data cross-domain action identification method based on double-chain deep double-current network
CN112784763B (en) Expression recognition method and system based on local and overall feature adaptive fusion
Sha et al. Feature level analysis for 3D facial expression recognition
CN108830237B (en) Facial expression recognition method
US10936868B2 (en) Method and system for classifying an input data set within a data category using multiple data recognition tools
Danisman et al. Boosting gender recognition performance with a fuzzy inference system
Xia et al. Face occlusion detection using deep convolutional neural networks
Gupta et al. Single attribute and multi attribute facial gender and age estimation
Sun et al. Perceptual multi-channel visual feature fusion for scene categorization
CN114677730A (en) Living body detection method, living body detection device, electronic apparatus, and storage medium
CN114463812A (en) Low-resolution face recognition method based on dual-channel multi-branch fusion feature distillation
Sajid et al. Facial asymmetry-based feature extraction for different applications: a review complemented by new advances
Wasi et al. ARBEx: Attentive feature extraction with reliability balancing for robust facial expression learning
Tunc et al. Age group and gender classification using convolutional neural networks with a fuzzy logic-based filter method for noise reduction
Karungaru et al. Face recognition in colour images using neural networks and genetic algorithms
Yu et al. Research on face recognition method based on deep learning
CN111898473B (en) Driver state real-time monitoring method based on deep learning
CN112241680A (en) Multi-mode identity authentication method based on vein similar image knowledge migration network
Oladipo et al. A novel genetic-artificial neural network based age estimation system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19891840

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28/09/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19891840

Country of ref document: EP

Kind code of ref document: A1