WO2020114118A1 - Facial attribute recognition method and device, storage medium, and processor - Google Patents

Facial attribute recognition method and device, storage medium, and processor

Publication number
WO2020114118A1
Authority
WO
WIPO (PCT)
Prior art keywords
attribute
facial
attributes
data
module
Prior art date
Application number
PCT/CN2019/112478
Other languages
English (en)
Chinese (zh)
Inventor
刘若鹏
栾琳
刘凯品
Original Assignee
深圳光启空间技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳光启空间技术有限公司
Publication of WO2020114118A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 - Classification, e.g. identification
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Definitions

  • The present invention relates to the field of image recognition technology, and more specifically to a facial attribute recognition method, device, storage medium, and processor.
  • The face is a very important biometric feature. It has a complex structure and many details, and it carries a great deal of information, such as gender, race, age, and expression. A normal adult can easily read this information from a face, but giving a computer the same ability, so that it can stand in for human judgment, remains a research problem that urgently needs to be solved.
  • Some existing attribute recognition methods train multiple deep convolutional neural networks and then perform score fusion or feature fusion for further training.
  • This approach is labor-intensive and complicated, which hinders practical application.
  • In other approaches, a facial identity authentication network and an attribute recognition network are merged into a fusion network, a multi-task network in which identity features and facial attribute features are learned simultaneously by joint learning. A cost-sensitive weighting function makes the training independent of the data distribution in the target domain, so that balanced training in the source data domain is achieved. Although the modified fusion framework adds only a small number of parameters, it still adds computational load.
  • The technical problem to be solved by the present invention is to provide a facial attribute recognition method, device, storage medium, and processor that reduce the number of network parameters by changing the parameter count of certain layers in the network model, such as the fully connected layers, thereby reducing the network's video memory usage. Data for multiple attributes is merged and input into one convolutional neural network, so that all attributes are learned and trained in the same network and that single network can recognize all facial attributes quickly and efficiently.
  • An embodiment of the present invention provides a facial attribute recognition method, characterized by comprising: establishing data sets for a plurality of facial attributes; merging the data sets of the multiple facial attributes to form a combined data set of the multiple facial attributes; establishing a multi-task deep convolutional neural network and training it on the combined data set to obtain a multi-attribute network model that can recognize multiple facial attributes; and using the multi-attribute network model to perform multi-attribute prediction on the image to be recognized, so as to identify multiple facial attributes in that image.
  • Establishing the data sets of multiple facial attributes includes: pre-processing the data sets of the multiple facial attributes; normalizing the pre-processed data sets; and labeling the normalized facial attributes.
  • Merging the data sets of multiple facial attributes to form the combined data set includes: setting the input format of each facial attribute's data to an array of the form D1[n, c, w, h]; merging the data sets of the multiple face attributes along the first dimension of the input array, that is, the number of pictures. Assuming that the number of face attribute types is m, the data form generated after the merge is D2[m × n, c, w, h], where n, c, w, and h are respectively the number, channel count, width, and height of the data set images of the multiple facial attributes input into the deep convolutional neural network.
  • Establishing a multi-task deep convolutional network to train the combined data set of multiple facial attributes includes: using 4 residual modules of a deep residual network.
  • The first residual module consists of two small residual modules. The first small residual module uses a convolutional layer with 64 kernels of size 3 × 3 followed by another convolutional layer with 64 kernels of size 3 × 3; its identity mapping uses 64 1 × 1 convolutions, and the sum of the two is input into the second small residual module.
  • The second small residual module has the same structure as the first small residual module, and its identity mapping is the output of the first small residual module.
  • The other three residual modules have the same structure as the first residual module, but with 128, 256, and 512 convolution kernels, respectively.
  • Establishing the multi-task deep convolutional neural network to train the combined data set of multiple facial attributes and obtain a multi-attribute network model that can recognize multiple facial attributes includes: obtaining, at one time, the recognition result with the highest probability for each of the multiple face attributes, that is, the multi-attribute values of the face.
  • Preprocessing the data sets of the plurality of face attributes includes: using a multi-task convolutional neural network algorithm to detect the face in the picture to obtain a face image.
  • Using the multi-task convolutional neural network algorithm to detect the face in the picture and obtain the facial image includes: the convolutional neural network adopts a fully connected method and uses the bounding-box vectors to fine-tune the candidate windows, so that the coordinate points of the face in the image are detected and a facial image is obtained.
  • Normalizing the data sets of the plurality of face attributes includes: normalizing the width and height of the face images in those data sets to 128 pixels × 128 pixels.
  • Labeling the facial attributes includes: labeling whether the face wears glasses, whether it wears a mask, and its hairstyle, face shape, age, and gender.
  • Obtaining, at one time in the picture recognition stage, the recognition result with the highest probability for each of the plurality of face attributes includes: when the loss function is calculated during network training, the merged array is divided along its first dimension and cut according to the number of pictures corresponding to each attribute, so that the result for each attribute is again an array of the form D3[n, c, w, h]; each attribute's data is input into its corresponding loss function; the loss function takes the form of a probability, given by the formula: $S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}$
  • $j$ is the index of the current sub-attribute of a given facial attribute,
  • $k$ is the running index over the sub-attributes of that facial attribute, starting from 1,
  • $T$ is the total number of sub-attributes of that facial attribute,
  • $S_j$ is the probability of the $j$-th sub-attribute of that facial attribute; the probabilities of the $T$ sub-attributes sum to 1. $e^{a_j}$ is the exponential of the value for the $j$-th sub-attribute.
  • $a_j$ and $a_k$ are the attribute values for sub-attributes $j$ and $k$, and the denominator is the sum of the exponentials over all attribute labels, which yields the probability that the facial attribute takes a specific attribute label.
  • An embodiment of the present invention provides a storage medium, the storage medium including a stored program, wherein the facial attribute recognition method described above is executed when the program runs.
  • An embodiment of the present invention provides a processor for running a program, wherein the facial attribute recognition method described above is executed when the program runs.
  • An embodiment of the present invention provides a face recognition device, which includes a data establishment module, a data merge module, a training module, and a prediction module that are electrically connected in sequence: the data establishment module is used to establish data sets for a variety of facial attributes; the data merge module is used to merge the data sets of the multiple facial attributes to form a combined data set of the multiple facial attributes; the training module is used to establish a multi-task deep convolutional neural network and train it on the combined data set to obtain a multi-attribute network model that can recognize multiple facial attributes; the prediction module is used to perform multi-attribute prediction on the image to be recognized using the multi-attribute network model, so as to identify multiple facial attributes in that image.
  • The data establishment module includes: a data preprocessing module used to preprocess facial image data; a data normalization processing module used to normalize the face image data; and a data labeling module used to label facial attributes.
  • The data merging module includes: a first storage module used to store the facial images of each attribute.
  • The training module includes a deep residual network, and the deep residual network includes 4 residual modules. The first residual module consists of two small residual modules. The first small residual module uses a convolutional layer with 64 kernels of size 3 × 3 followed by another convolutional layer with 64 kernels of size 3 × 3, and its identity mapping uses 64 1 × 1 convolutions. The second small residual module has the same structure as the first small residual module, and its identity mapping is the output of the first small residual module. The other three residual modules have the same structure as the first residual module, with 128, 256, and 512 convolution kernels, respectively.
  • The prediction module includes a third storage module, which is used to store the recognition result with the highest probability for each facial attribute obtained from the trained network model, that is, the multi-attribute values of the face.
  • Normalizing the face image data in the data normalization processing module includes: normalizing the width and height of the face images in the data sets of multiple face attributes to 128 pixels × 128 pixels.
  • The present invention reduces the number of network parameters by changing the parameter count of certain layers in the network model, such as the fully connected layer, which reduces the network's memory usage. Through data merging, the data of multiple attributes is input into one convolutional neural network, so that all attributes are learned and trained in the same network and that single network can recognize all facial attributes quickly and efficiently.
  • FIG. 1 is a flowchart of a face attribute recognition method of the present invention.
  • FIG. 2 is a basic block diagram of the residual network used in the facial attribute recognition method of the present invention.
  • FIG. 3 is an embodiment of a flowchart of a facial attribute recognition method of the present invention.
  • FIG. 4 is a diagram of facial shape categories.
  • FIG. 5 is a structural diagram of a face attribute recognition device of the present invention.
  • FIG. 1 is a flowchart of a face attribute recognition method of the present invention. As shown in FIG. 1, the facial attribute recognition method of the present invention includes the following steps:
  • Step S11, establishing data sets for a plurality of facial attributes, includes: preprocessing, normalizing, and labeling the multiple facial attribute data sets.
  • Preprocessing the data sets of multiple facial attributes further includes:
  • The multi-task convolutional neural network algorithm is used to detect the face in the picture to obtain a facial image.
  • The convolutional neural network can be fully connected, and the candidate windows can be fine-tuned using the bounding-box vectors to detect the coordinate points of the face in the image. After the face is detected, the face image is cropped for training.
  • The size to which the image is normalized is determined by the network's input size; different networks have different input sizes.
  • Normalizing the data sets of multiple face attributes includes: normalizing the width and height of the face images to, for example but not limited to, 128 pixels × 128 pixels.
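  • As an illustration of the size normalization described above, the following is a minimal sketch that resizes a cropped face to 128 × 128 pixels. The function name `resize_nearest` and the nearest-neighbour sampling are assumptions made for this example; the source does not say which interpolation is used.

```python
import numpy as np

def resize_nearest(image, out_h=128, out_w=128):
    """Resize an H x W x C image to out_h x out_w by nearest-neighbour sampling.

    Hypothetical helper: the patent only states the target size, not the method.
    """
    in_h, in_w = image.shape[:2]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return image[rows[:, None], cols[None, :]]

# Stand-in for a cropped face of arbitrary size (200 x 160 pixels, 3 channels).
face = np.zeros((200, 160, 3), dtype=np.uint8)
normalized = resize_nearest(face)
print(normalized.shape)  # (128, 128, 3)
```

  • A different network input size would only change `out_h` and `out_w`.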
  • Labeling facial attributes includes but is not limited to: labeling face wearing glasses, wearing a mask, hairstyle, face shape, age, gender attributes, etc.
  • In step S12, merging the data sets of multiple facial attributes to form the combined data set of multiple facial attributes includes:
  • The data sets of multiple face attributes are merged along the first dimension of the input array, that is, the number of pictures. Assuming that the number of face attribute types is m, the data form generated after the merge is D2[m × n, c, w, h];
  • where n, c, w, and h are respectively the number, channel count, width, and height of the data set images of the multiple facial attributes input into the deep convolutional neural network.
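  • The merge along the first dimension can be sketched with NumPy as follows. The attribute names, the sizes, and the random arrays standing in for real face crops are assumptions for illustration only.

```python
import numpy as np

# Hypothetical sizes: m = 3 attributes, n = 4 images each, 3 channels, 128 x 128 pixels.
m, n, c, w, h = 3, 4, 3, 128, 128

# One D1[n, c, w, h] array per attribute (random stand-ins for real face crops).
rng = np.random.default_rng(0)
per_attribute = {
    name: rng.random((n, c, w, h), dtype=np.float32)
    for name in ("age", "gender", "face_shape")
}

# Merge along the first dimension (the picture count) to obtain D2[m*n, c, w, h].
d2 = np.concatenate(list(per_attribute.values()), axis=0)
print(d2.shape)  # (12, 3, 128, 128)
```

  • Because the arrays are concatenated in a fixed order, rows `n` to `2n - 1` of `d2` are the gender images, which is what allows the batch to be cut apart again before the loss is computed.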
  • In step S13, establishing a multi-task deep convolutional neural network to train the combined data set of multiple facial attributes, so as to obtain a multi-attribute network model that can recognize multiple facial attributes, includes: using 4 residual modules of a deep residual network. The first residual module consists of two small residual modules. The first small residual module uses a convolutional layer with 64 kernels of size 3 × 3 followed by another convolutional layer with 64 kernels of size 3 × 3, and its identity mapping uses 64 1 × 1 convolutions; the sum of the two is input into the second small residual module. The second small residual module has the same structure as the first, and its identity mapping is the output of the first small residual module. The other three residual modules have the same structure as the first residual module, but with 128, 256, and 512 convolution kernels, respectively. The facial attribute data of the image to be recognized is input into the deep residual network for network training.
  • The number of residual modules can be selected according to actual needs; four residual modules are used here as an example.
  • In step S14, the multi-attribute network model is used to perform multi-attribute prediction on the image to be recognized, so as to identify multiple facial attributes in it. The recognition result with the highest probability for each of the multiple face attributes, that is, the multi-attribute values of the face, is obtained at one time.
  • In the network training stage, when the loss function is calculated, the merged array is divided along its first dimension and cut according to the number of pictures corresponding to each attribute, so that the result for each attribute is again an array of the form D3[n, c, w, h];
  • The loss function takes the form of a probability, given by the formula: $S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}$
  • $j$ is the index of the current sub-attribute of a given face attribute,
  • $k$ is the running index over the sub-attributes of that face attribute, starting from 1,
  • $T$ is the total number of sub-attributes of that face attribute,
  • $S_j$ is the probability of the $j$-th sub-attribute of that face attribute; the probabilities of the $T$ sub-attributes sum to 1. $e^{a_j}$ is the exponential of the value for the $j$-th sub-attribute.
  • $a_j$ and $a_k$ are the attribute values for sub-attributes $j$ and $k$, and the denominator is the sum of the exponentials over all attribute labels, which yields the probability that the facial attribute takes a specific attribute label.
  • For example, for a face shape attribute with the three sub-attributes round face, square face, and sharp face, the probabilities of all the sub-attributes sum to 1.
  • If the round face has the highest probability, the face is determined to belong to the round face sub-attribute among the three.
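  • The probability formula and the highest-probability decision can be sketched as follows. The scores are made-up values standing in for the network outputs $a_1, \dots, a_T$; subtracting the maximum before exponentiating is a common numerical-stability step that leaves the result unchanged and is not something the source describes.

```python
import numpy as np

def softmax(a):
    """S_j = exp(a_j) / sum_k exp(a_k), computed in a numerically stable way."""
    a = np.asarray(a, dtype=float)
    e = np.exp(a - a.max())  # shifting by the maximum does not change the ratios
    return e / e.sum()

labels = ["round face", "square face", "sharp face"]  # T = 3 sub-attributes
scores = [2.0, 1.0, 0.1]                              # hypothetical a_1 .. a_T

probs = softmax(scores)
predicted = labels[int(np.argmax(probs))]  # sub-attribute with the highest probability
print(predicted)  # round face
```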
  • The parameters of the deep convolutional neural network are reduced by changing the parameter count of certain layers in the network model, such as the fully connected layer, which reduces the network's memory usage. Through data merging, the data of multiple attributes is input into one convolutional neural network, so that all attributes are learned and trained in the same network and that single network can recognize all facial attributes quickly and efficiently.
  • An embodiment of the present invention further provides a storage medium, the storage medium includes a stored program, wherein the above-mentioned facial attribute recognition method flow is executed when the above program runs.
  • the above storage medium may be set to store program code for performing the following flow of facial attribute recognition method:
  • The above storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, read-only memory (ROM), random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
  • the storage capacity is reduced, and the program of the built-in facial attribute recognition method process runs faster, thereby quickly and efficiently completing the recognition of all facial attributes.
  • An embodiment of the present invention further provides a processor for running a program, wherein, when the program runs, the steps in the facial attribute recognition method described above are executed.
  • the above program is used to perform the following steps:
  • FIG. 3 is an embodiment of a flowchart of a facial attribute recognition method of the present invention.
  • The flow is introduced using age, gender, and face shape as the training set.
  • The samples for training data preprocessing include pictures with age information, pictures with gender information, and pictures with face shape information.
  • The first step: use the multi-task convolutional neural network (MTCNN) algorithm to detect the face in the picture.
  • The algorithm detects the face with a cascade of three convolutional neural networks.
  • First, a fully convolutional neural network is used to obtain candidate face windows and their bounding-box regression vectors.
  • Non-maximum suppression is then used to remove overlapping windows.
  • The convolutional neural network in this step adopts the fully connected method and uses the bounding-box vectors to fine-tune the candidate windows, so that the coordinate points of the face in the image are detected.
  • After the face is detected, the face image is cropped for training.
  • A convolutional neural network with one more layer is used to further refine the position of the face detection box.
  • A face image normalized to 128 pixels × 128 pixels is obtained.
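  • The non-maximum suppression used to remove overlapping candidate windows can be sketched as below. This is a generic greedy NMS, not the MTCNN authors' implementation; the box format `[x1, y1, x2, y2]`, the 0.5 overlap threshold, and the example boxes are assumptions.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the boxes that are kept."""
    order = np.argsort(scores)[::-1]  # highest score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the best box with every remaining box.
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        order = rest[iou <= iou_threshold]  # drop boxes that overlap too much
    return keep

# Two heavily overlapping candidates and one separate candidate (hypothetical values).
boxes = np.array([[10, 10, 110, 110], [12, 12, 112, 112], [200, 200, 260, 260]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # [0, 2]
```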
  • The second step: annotate the facial attributes.
  • The labeled attributes include whether glasses are worn, whether a mask is worn, hairstyle, face shape, age, gender, and so on.
  • The labeling method uses the existing public attribute data set Market1501 to train an initial model, which then coarsely classifies the pictures obtained by face detection and gives each attribute a numerical label. The coarse classification of the facial attribute pictures is then refined manually, thus constructing data sets for the different facial attributes. For example, different attributes of the same picture can be placed in different folders.
  • The multi-task learning mechanism enables the network to share features learned from other data.
  • Many deep learning networks focus on a single task only, so data features that the tasks have in common cannot be shared.
  • Multi-task learning solves this problem well. It is an inductive transfer mechanism whose main goal is to use the domain-specific information hidden in the training signals of multiple related tasks to improve generalization.
  • Multi-task learning accomplishes this goal by training multiple tasks in parallel on a shared representation; that is, while learning one problem, the shared representation can be used to acquire knowledge about other related problems. Multi-task learning is therefore a method focused on applying the knowledge gained from solving one problem to other related problems.
  • This solution uses data merging to prepare the training data for multi-task learning. In the process of data merging (taking face shape, gender, and age data as an example; other attributes can be handled in the same way), the following steps need to be followed:
  • The data of each facial attribute is an array whose format is D1[n, c, w, h], where n is the number of images input into the network, c is the number of channels of the image (generally 3 channels), and w and h are the width and height of the image.
  • The data sets are merged along the first dimension of the input array, that is, the number of pictures, and the data form generated after the merge is D2[m × n, c, w, h], where m is the number of face attribute types and n, c, w, and h are respectively the number, channel count, width, and height of the data set images of the multiple facial attributes input into the deep convolutional neural network.
  • The above two steps complete the data preparation phase of multi-task learning.
  • The network can then learn the correlation between the data sets to achieve the purpose of multi-task learning.
  • The present invention uses a deep residual network as the base network to extract features from the network input data.
  • The residual network was proposed in 2015 and outperformed earlier deep networks on image classification.
  • The invention borrows the classification approach used in detection networks to modify the deep residual network: the number of residual modules is reduced and the fully connected layer is discarded, forming a new network structure. This simplifies the network structure, greatly reduces the size of the network model, and also greatly reduces memory usage.
  • The structure of the deep residual network used in this scheme is relatively simple.
  • Here x is the input of the residual module and F(x) is the original mapping of the convolutional neural network.
  • ReLU is the activation function in the deep residual module.
  • H(x) is the output function of the deep residual module.
  • The deep residual module adds the original mapping F(x) and the input x to form the network output function, H(x) = F(x) + x.
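  • A minimal numerical sketch of the residual connection follows. The toy matrix product stands in for the stacked convolutions F(x), and applying ReLU after the addition follows the common residual-network design; both are assumptions, since the source does not spell out where the activation is applied.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_module(x, f, shortcut=lambda t: t):
    """Return ReLU(F(x) + shortcut(x)); the shortcut defaults to the identity mapping."""
    return relu(f(x) + shortcut(x))

# Toy stand-in for the convolutional mapping F (hypothetical weights).
w = np.array([[0.5, -1.0],
              [0.25, 0.0]])
f = lambda x: x @ w

x = np.array([[1.0, 2.0]])
h = residual_module(x, f)
print(h)  # [[2. 1.]]
```

  • When F(x) is zero the module simply passes a non-negative input through, which is the property that makes deep stacks of such modules easier to train.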
  • This solution uses 4 residual modules of the deep residual network. The first residual module consists of two small residual modules. The first small residual module uses a convolutional layer with 64 kernels of size 3 × 3 followed by another convolutional layer with 64 kernels of size 3 × 3, and its identity mapping uses 64 1 × 1 convolutions.
  • The second small residual module has the same structure as the first small residual module, and its identity mapping is the output of the first small residual module. The other three residual modules have the same structure as the first residual module, but with 128, 256, and 512 convolution kernels, respectively.
  • Only four residual modules of the deep residual network are used, which makes the structure smaller and more conducive to feature extraction from the network input data. The deep residual network extracts the deep features of the image well, which facilitates better classification.
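  • To give a rough sense of the model size this structure implies, the sketch below counts the convolution weights of the four residual modules as described. It assumes 64 input channels to the first module, ignores biases and any normalization layers, and excludes whatever layers precede or follow the residual modules; none of these details are given in the source.

```python
def conv_weights(kernel, in_ch, out_ch):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_ch * out_ch

def residual_module_weights(in_ch, out_ch):
    # First small module: two 3x3 convolutions plus a 1x1 convolution on the shortcut.
    first = (conv_weights(3, in_ch, out_ch)
             + conv_weights(3, out_ch, out_ch)
             + conv_weights(1, in_ch, out_ch))
    # Second small module: two 3x3 convolutions; its shortcut is the first module's output.
    second = 2 * conv_weights(3, out_ch, out_ch)
    return first + second

channels = [64, 64, 128, 256, 512]  # assumed input width, then the four module widths
per_module = [residual_module_weights(channels[i], channels[i + 1]) for i in range(4)]
total = sum(per_module)
print(per_module, total)
```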
  • The data is merged before network training begins so that the data can be shared.
  • After training, the network has learned better features, and the number of images in the training data has not changed during learning. Therefore, to obtain the learning result for each facial attribute data set, when the loss function is calculated on the network output, the output is divided along the first dimension and cut per attribute, so that the result for each attribute is again an array of the form D3[n, c, w, h], and each attribute's data is input into its corresponding loss function.
  • The cut data is input into the corresponding loss function layer, so that the corresponding loss is calculated, the corresponding weight update is obtained, and the category output for each facial attribute data set is obtained.
  • The loss for each attribute's data is calculated on the basis of the shared features learned.
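  • The cut-and-route step can be sketched as follows: the shared output for the merged batch is split back along the first dimension, and each slice goes to its own loss. The flat feature vectors, the randomly initialized per-attribute classifier weights, and the softmax cross-entropy loss are illustrative assumptions rather than details taken from the source.

```python
import numpy as np

def cross_entropy(logits, labels):
    """Mean softmax cross-entropy for logits [n, T] and integer labels [n]."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())

m, n, feat = 3, 4, 8                       # hypothetical: 3 attributes, 4 images each
rng = np.random.default_rng(0)
shared = rng.standard_normal((m * n, feat))  # stand-in for shared features of the merged batch

# Cut the merged batch back into one slice per attribute along the first dimension.
chunks = np.split(shared, m, axis=0)

classes = {"age": 4, "gender": 2, "face_shape": 7}  # sub-attribute counts per attribute
losses = {}
for (name, t), chunk in zip(classes.items(), chunks):
    head = rng.standard_normal((feat, t)) * 0.01     # hypothetical per-attribute classifier
    labels = rng.integers(0, t, size=n)               # made-up ground-truth labels
    losses[name] = cross_entropy(chunk @ head, labels)

print(sorted(losses))  # ['age', 'face_shape', 'gender']
```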
  • The loss function used in this scheme takes the form of a probability, given by the formula: $S_j = \frac{e^{a_j}}{\sum_{k=1}^{T} e^{a_k}}$
  • $a_j$ and $a_k$ are the attribute values for sub-attributes $j$ and $k$ of a facial attribute, $T$ is the total number of its sub-attributes, and the denominator is the sum of the exponentials over all attribute labels, which yields the probability $S_j$ that the facial attribute takes a specific attribute label.
  • The weights are updated using the back-propagation algorithm to bring the network to its optimal state, so as to obtain the facial attributes corresponding to the input samples, such as the age recognition set, the gender recognition set, and the face shape recognition set.
  • Taking age as an example, age is labeled with four labels: teenager, youth, middle age, and old age.
  • The output of the last layer of the network is evaluated with the loss function and the corresponding label, yielding four probabilities, one for each category (teenager, youth, middle age, old age). If the probability of teenager is the highest, the face is judged to be a teenager.
  • In the final network structure there is a loss function for each attribute, so each attribute's probabilities are calculated by its own loss function, and the category is determined from those probabilities.
  • The weights are updated using the back-propagation algorithm to bring the network to its optimal state, and the facial attributes corresponding to the input samples are obtained.
  • Taking gender as an example, gender is labeled with two labels: male and female.
  • The output of the last layer of the network is evaluated with the loss function and the corresponding label, yielding two probabilities, one for each category (male and female). If the probability of male is the highest, the face is judged to be male.
  • FIG. 4 is a diagram of face shape categories. The most common face shapes can be roughly divided into 7 types: round face, oval face, heart-shaped face (inverted-triangle face), diamond face, square face, long face, and pear-shaped face (upright-triangle face).
  • The judgment process for the face shape (the face types are specified by the practitioner according to business needs; the above seven face types are taken as an example) is:
  • The first step: after data preprocessing, the data sets for training and testing the network are established through face detection and data annotation.
  • The second step: input the training data of each attribute, merge the data (taking gender, age, and face shape as examples), feed it into the network model, and train the network model.
  • The third step: after training is completed, input the picture to be recognized into the trained network model to obtain the gender, age, and face shape of the face contained in the picture.
  • The elements of the age recognition set output by the network can be any of the above age groups (teenager, youth, middle age, old age).
  • The elements of the face shape recognition set output by the network can be any of the above face shapes: round face, oval face, heart-shaped face (inverted-triangle face), diamond face, square face, long face, or pear-shaped face (upright-triangle face).
  • An embodiment of the present invention further provides a storage medium, which includes a stored program, where the flow of the facial attribute recognition method described in Embodiment 4 is executed when the above program runs.
  • The above storage medium may include, but is not limited to, various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
  • The required storage capacity is reduced, and the stored program implementing the facial attribute recognition method runs faster, so that the recognition of all facial attributes is completed quickly and efficiently.
  • An embodiment of the present invention further provides a processor for running a program, where the steps in the facial attribute recognition method described in Embodiment 4 are executed when the program is run.
  • FIG. 5 is a structural diagram of a face attribute recognition device of the present invention.
  • The device includes: a data establishment module, used to establish data sets of a plurality of facial attributes; a data merging module, used to merge the data sets of the plurality of facial attributes into a data set collection; a training module, used to establish a multi-task deep convolutional neural network and train it on the data set collection to obtain a multi-attribute network model capable of recognizing multiple facial attributes; and a prediction module, used to perform multi-attribute prediction on the image to be recognized using the multi-attribute network model, so as to identify multiple facial attributes in that image.
  • The data establishment module includes: a data preprocessing module, used to preprocess facial image data; a data normalization module, used to normalize the facial image data; and a data labeling module, used to label facial attributes.
  • The data normalization module normalizes the width and height of the face images in the data sets of the multiple facial attributes to, for example but not limited to, 128 × 128 pixels. Labeling facial attributes means annotating, for each face, whether glasses are worn, whether a mask is worn, the hairstyle, the face shape, the age, and the gender.
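The size normalization described above can be sketched as follows. The source gives only the target size (128 × 128 pixels, stated as non-limiting) and does not specify the interpolation method, so nearest-neighbor sampling and the function name `resize_nearest` are assumptions chosen to keep the example dependency-free.

```python
def resize_nearest(image, out_w=128, out_h=128):
    """Resize a single-channel image (list of rows) by nearest-neighbor sampling.

    Each output pixel copies the source pixel at the proportionally
    scaled row and column. Interpolation choice is an assumption.
    """
    in_h = len(image)
    in_w = len(image[0])
    return [
        [image[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]

# A tiny 2 x 2 "face crop" stretched to the 128 x 128 network input size.
crop = [[1, 2],
        [3, 4]]
normalized = resize_nearest(crop)
```

A production pipeline would more likely use an image library with bilinear or area interpolation; the point here is only that every face crop ends up with the same width and height before merging.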
  • The data merging module includes a first storage module, which is used to store the facial image data of each attribute.
  • The software program dynamically generates an input array D1[n, c, w, h] and merges the data sets of the multiple facial attributes along the first dimension of the input array, that is, the number of pictures.
  • The data array generated after merging is D2[m × n, c, w, h], where n is the number of images input into the deep convolutional neural network, c is the number of channels of the input image, w is the width of the input image, and h is the height of the input image.
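The merge along the first dimension can be sketched with plain nested lists. The source does not define m; the sketch assumes m is the number of attribute data sets being merged and that each data set contributes n images of identical shape [c, w, h]. The helper names `shape`, `make_batch`, and `merge_datasets` are hypothetical.

```python
def shape(array):
    """Return the dimensions of a regular nested list, e.g. (n, c, w, h)."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return tuple(dims)

def make_batch(n, c, w, h, fill):
    """Build a dummy D1[n, c, w, h] array filled with one value."""
    return [[[[fill] * h for _ in range(w)] for _ in range(c)] for _ in range(n)]

def merge_datasets(datasets):
    """Concatenate attribute data sets along the first (picture) dimension."""
    per_image = shape(datasets[0])[1:]
    merged = []
    for data in datasets:
        if shape(data)[1:] != per_image:
            raise ValueError("all data sets must share the same [c, w, h] shape")
        merged.extend(data)
    return merged

# m = 3 attribute data sets (e.g. gender, age, face shape), n = 2 images each.
d1_sets = [make_batch(2, 3, 4, 4, fill=k) for k in range(3)]
d2 = merge_datasets(d1_sets)
```

Under these assumptions `shape(d2)` is `(6, 3, 4, 4)`, that is, [m × n, c, w, h] with only the picture count changed.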
  • The training module includes a deep residual network with 4 residual modules. The first residual module consists of two small residual modules. The first small residual module uses a convolutional layer with 64 kernels of size 3 × 3 connected to another convolutional layer with 64 kernels of size 3 × 3, and its identity mapping uses 64 convolutions of size 1 × 1. The first small residual module is followed by the second small residual module, which has the same structure as the first, and whose identity mapping uses the output of the first small residual module. The other three residual modules have the same structure as the first residual module, with 128, 256, and 512 convolution kernels, respectively.
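To make the structure above concrete, the following sketch counts the convolution weights of the four residual modules. Several details are assumptions not stated in the source: biases and normalization layers are ignored, the first module is assumed to receive 64 input channels from some earlier layer, and each later module is assumed to take the previous module's output channels as input. All function names are hypothetical.

```python
def conv_params(in_ch, out_ch, kernel):
    """Weights in a kernel x kernel convolution (bias ignored)."""
    return in_ch * out_ch * kernel * kernel

def small_block_params(in_ch, out_ch, projection):
    """Two stacked 3x3 convolutions, plus a 1x1 shortcut if projection is True."""
    body = conv_params(in_ch, out_ch, 3) + conv_params(out_ch, out_ch, 3)
    shortcut = conv_params(in_ch, out_ch, 1) if projection else 0
    return body + shortcut

def residual_module_params(in_ch, out_ch):
    """One residual module = two small residual modules.

    The first small module uses a 1x1 convolution on its shortcut; the
    second reuses the first one's output directly, so it adds no weights.
    """
    first = small_block_params(in_ch, out_ch, projection=True)
    second = small_block_params(out_ch, out_ch, projection=False)
    return first + second

def backbone_params(in_ch=64, widths=(64, 128, 256, 512)):
    """Total weights of the four residual modules with the stated kernel counts."""
    total = 0
    for out_ch in widths:
        total += residual_module_params(in_ch, out_ch)
        in_ch = out_ch
    return total

first_module = residual_module_params(64, 64)
total = backbone_params()
```

Under these assumptions the widest module (512 kernels) dominates the count, which is why the number of kernels per module matters more for model size than the number of modules.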
  • The prediction module includes a second storage module, used to store, for each facial attribute, the recognition result with the highest probability obtained from the trained network model, that is, the multi-attribute values of the face.
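Selecting the highest-probability result per attribute can be sketched as below. The age and face shape categories come from the description; the gender labels, the probability values, and the function name `predict_attributes` are illustrative assumptions.

```python
CLASSES = {
    "gender": ("male", "female"),  # assumed labels; the source does not list them
    "age": ("juvenile", "youth", "middle age", "old age"),
    "face_shape": ("round", "oval", "heart-shaped", "diamond",
                   "square", "long", "pear-shaped"),
}

def predict_attributes(probabilities, classes=CLASSES):
    """Pick, for every attribute, the class with the highest probability."""
    result = {}
    for name, probs in probabilities.items():
        best = max(range(len(probs)), key=probs.__getitem__)
        result[name] = classes[name][best]
    return result

# Hypothetical per-attribute probabilities produced by the trained model.
outputs = {
    "gender": [0.2, 0.8],
    "age": [0.1, 0.6, 0.2, 0.1],
    "face_shape": [0.05, 0.5, 0.1, 0.1, 0.1, 0.1, 0.05],
}
prediction = predict_attributes(outputs)
```

The resulting dictionary is the kind of multi-attribute value the second storage module would hold for one face.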
  • The parameters of the deep convolutional neural network are reduced, that is, the number of parameters of certain layers in the network model, such as the fully connected layer, is changed, which lowers the network's memory usage. In addition, through data merging, data of multiple attributes are input into the convolutional neural network so that it learns and trains all attributes in one network, and the same network can quickly and efficiently recognize all facial attributes.
  • The original facial attribute recognition method proceeds as follows: establish data sets (training and test) for age, gender, and face shape; input the age, gender, and face shape data into their respective network models (three network models must be trained); then input the test pictures into each of the three trained network models and take, for each model, the attribute with the highest probability as its recognition result.
  • The multi-task deep learning recognition method is used for facial attribute recognition.
  • The number of parameters in the network determines the size of the network, and the size of the network determines how much video memory the network occupies.
  • The method reduces the network parameters, that is, changes the number of parameters of certain layers (such as fully connected layers) in the network, thereby reducing the network's video memory usage. Through data merging, data of multiple attributes are input into the convolutional neural network so that it learns and trains all the attributes, and the same network can recognize all facial attributes.
  • The last fully connected layer needs to learn 1000 parameters. Since the fully connected layer accounts for most of the network parameters, this solution prunes the network: the fully connected layer of the original network structure is removed so that the convolutional layer is connected directly to the loss layer and the probabilities of the facial attributes are regressed directly, which completes the modification and pruning of the network structure.
  • Such network pruning reduces the number of parameters the network must compute, so that the network size and video memory usage are reduced.
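The effect of removing a fully connected layer can be illustrated with a rough parameter count. Every number below is an assumption made for illustration: a final feature map of 512 channels at 4 × 4 resolution, a fully connected layer with 1000 outputs, a replacement 1 × 1 convolution that emits 13 class scores (2 gender + 4 age + 7 face shape), and 4 bytes per parameter. The source states only that the fully connected layer is removed and the convolutional layer feeds the loss layer directly.

```python
def fc_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features

def conv1x1_params(in_ch, out_ch):
    """Weights plus biases of a 1x1 convolution."""
    return in_ch * out_ch + out_ch

def megabytes(param_count, bytes_per_param=4):
    """Approximate storage for the parameters alone (float32 assumed)."""
    return param_count * bytes_per_param / (1024 * 1024)

# Assumed final feature map: 512 channels, 4 x 4 spatial size.
flattened = 512 * 4 * 4

fc_head = fc_params(flattened, 1000)   # original-style fully connected head
conv_head = conv1x1_params(512, 13)    # pruned head: one score per attribute class

saved_mb = megabytes(fc_head - conv_head)
```

Under these assumptions the fully connected head holds over a thousand times more parameters than the convolutional head, which matches the description's claim that the fully connected layer dominates the parameter count.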
  • The facial attribute recognition method, device, storage medium, and processor reduce the network parameters, that is, change the number of parameters of certain layers in the network model, such as the fully connected layer, which lowers the network's memory usage. Through data merging, data of multiple attributes are input into the convolutional neural network so that it learns and trains all attributes in one network, and the same network can quickly and efficiently recognize all facial attributes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention relates to a facial attribute recognition method and device, a storage medium, and a processor. The method comprises: constructing data sets of multiple facial attributes; merging the data sets of the multiple facial attributes to form a data set collection; constructing a multi-task deep convolutional network and training it on the data set collection of the multiple facial attributes to obtain a network model capable of recognizing the multiple facial attributes; and using the trained multi-attribute network model to perform multi-attribute prediction of the facial attributes of an image to be recognized, so as to identify the multiple attributes in that image. Network parameters are reduced by changing the number of parameters of certain layers in the network model, for example a fully connected layer, which lowers the network's memory usage; and data merging is used to input data of multiple attributes into a convolutional neural network, so that all attributes are learned and trained in the network and the recognition of all facial attributes is completed quickly and efficiently using the same network.
PCT/CN2019/112478 2018-12-07 2019-10-22 Facial attribute recognition method and device, storage medium, and processor WO2020114118A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201811502128.2 2018-12-07
CN201811502128.2A CN111291604A (zh) 2018-12-07 2018-12-07 面部属性识别方法、装置、存储介质及处理器

Publications (1)

Publication Number Publication Date
WO2020114118A1 true WO2020114118A1 (fr) 2020-06-11

Family

ID=70973734

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/112478 WO2020114118A1 (fr) 2018-12-07 2019-10-22 Facial attribute recognition method and device, storage medium, and processor

Country Status (2)

Country Link
CN (1) CN111291604A (fr)
WO (1) WO2020114118A1 (fr)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695513A (zh) * 2020-06-12 2020-09-22 长安大学 一种基于深度残差网络的人脸表情识别方法
CN111783619A (zh) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 人体属性的识别方法、装置、设备及存储介质
CN112232231A (zh) * 2020-10-20 2021-01-15 城云科技(中国)有限公司 行人属性的识别方法、系统、计算机设备和存储介质
CN112287966A (zh) * 2020-09-21 2021-01-29 深圳市爱深盈通信息技术有限公司 一种人脸识别方法、装置及电子设备
CN112783990A (zh) * 2021-02-02 2021-05-11 贵州大学 一种基于图数据属性推理方法及系统
CN112906668A (zh) * 2021-04-07 2021-06-04 上海应用技术大学 基于卷积神经网络的人脸信息识别方法
CN113033310A (zh) * 2021-02-25 2021-06-25 北京工业大学 一种基于视觉自注意力网络的表情识别方法
CN113128345A (zh) * 2021-03-22 2021-07-16 深圳云天励飞技术股份有限公司 多任务属性识别方法及设备、计算机可读存储介质
CN113191201A (zh) * 2021-04-06 2021-07-30 上海夏数网络科技有限公司 基于视觉的鸡雏公母智能鉴别方法及系统
CN113657486A (zh) * 2021-08-16 2021-11-16 浙江新再灵科技股份有限公司 基于电梯图片数据的多标签多属性分类模型建立方法
CN113705527A (zh) * 2021-09-08 2021-11-26 西南石油大学 一种基于损失函数集成和粗细分级卷积神经网络的表情识别方法
CN113947780A (zh) * 2021-09-30 2022-01-18 吉林农业大学 一种基于改进卷积神经网络的梅花鹿面部识别方法
CN113963231A (zh) * 2021-10-15 2022-01-21 中国石油大学(华东) 基于图像增强与样本平衡优化的行人属性识别方法
CN113963426A (zh) * 2021-12-22 2022-01-21 北京的卢深视科技有限公司 模型训练、戴口罩人脸识别方法、电子设备及存储介质
CN114092759A (zh) * 2021-10-27 2022-02-25 北京百度网讯科技有限公司 图像识别模型的训练方法、装置、电子设备及存储介质
CN114626527A (zh) * 2022-03-25 2022-06-14 中国电子产业工程有限公司 基于稀疏约束再训练的神经网络剪枝方法及装置

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111783574B (zh) * 2020-06-17 2024-02-23 李利明 膳食图像识别方法、装置以及存储介质
CN112149601A (zh) * 2020-09-30 2020-12-29 北京澎思科技有限公司 兼容遮挡的面部属性识别方法、装置和电子设备
CN114067488B (zh) * 2021-11-03 2024-06-11 深圳黑蚂蚁环保科技有限公司 回收系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105825191A (zh) * 2016-03-23 2016-08-03 厦门美图之家科技有限公司 基于人脸多属性信息的性别识别方法、系统及拍摄终端
CN106529402A (zh) * 2016-09-27 2017-03-22 中国科学院自动化研究所 基于多任务学习的卷积神经网络的人脸属性分析方法
CN107247947A (zh) * 2017-07-07 2017-10-13 北京智慧眼科技股份有限公司 人脸属性识别方法及装置
WO2018133034A1 (fr) * 2017-01-20 2018-07-26 Intel Corporation Dynamic emotion recognition in unconstrained scenarios

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017102671A (ja) * 2015-12-01 2017-06-08 キヤノン株式会社 識別装置、調整装置、情報処理方法及びプログラム
CN106228139A (zh) * 2016-07-27 2016-12-14 东南大学 一种基于卷积网络的表观年龄预测算法及其系统
CN108921022A (zh) * 2018-05-30 2018-11-30 腾讯科技(深圳)有限公司 一种人体属性识别方法、装置、设备及介质
CN108921029A (zh) * 2018-06-04 2018-11-30 浙江大学 一种融合残差卷积神经网络和pca降维的sar自动目标识别方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105825191A (zh) * 2016-03-23 2016-08-03 厦门美图之家科技有限公司 基于人脸多属性信息的性别识别方法、系统及拍摄终端
CN106529402A (zh) * 2016-09-27 2017-03-22 中国科学院自动化研究所 基于多任务学习的卷积神经网络的人脸属性分析方法
WO2018133034A1 (fr) * 2017-01-20 2018-07-26 Intel Corporation Dynamic emotion recognition in unconstrained scenarios
CN107247947A (zh) * 2017-07-07 2017-10-13 北京智慧眼科技股份有限公司 人脸属性识别方法及装置

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695513B (zh) * 2020-06-12 2023-02-14 长安大学 一种基于深度残差网络的人脸表情识别方法
CN111695513A (zh) * 2020-06-12 2020-09-22 长安大学 一种基于深度残差网络的人脸表情识别方法
CN111783619A (zh) * 2020-06-29 2020-10-16 北京百度网讯科技有限公司 人体属性的识别方法、装置、设备及存储介质
CN111783619B (zh) * 2020-06-29 2023-08-11 北京百度网讯科技有限公司 人体属性的识别方法、装置、设备及存储介质
CN112287966A (zh) * 2020-09-21 2021-01-29 深圳市爱深盈通信息技术有限公司 一种人脸识别方法、装置及电子设备
CN112232231A (zh) * 2020-10-20 2021-01-15 城云科技(中国)有限公司 行人属性的识别方法、系统、计算机设备和存储介质
CN112232231B (zh) * 2020-10-20 2024-02-02 城云科技(中国)有限公司 行人属性的识别方法、系统、计算机设备和存储介质
CN112783990A (zh) * 2021-02-02 2021-05-11 贵州大学 一种基于图数据属性推理方法及系统
CN112783990B (zh) * 2021-02-02 2023-04-18 贵州大学 一种基于图数据属性推理方法及系统
CN113033310A (zh) * 2021-02-25 2021-06-25 北京工业大学 一种基于视觉自注意力网络的表情识别方法
CN113128345A (zh) * 2021-03-22 2021-07-16 深圳云天励飞技术股份有限公司 多任务属性识别方法及设备、计算机可读存储介质
CN113191201A (zh) * 2021-04-06 2021-07-30 上海夏数网络科技有限公司 基于视觉的鸡雏公母智能鉴别方法及系统
CN112906668B (zh) * 2021-04-07 2023-08-25 上海应用技术大学 基于卷积神经网络的人脸信息识别方法
CN112906668A (zh) * 2021-04-07 2021-06-04 上海应用技术大学 基于卷积神经网络的人脸信息识别方法
CN113657486A (zh) * 2021-08-16 2021-11-16 浙江新再灵科技股份有限公司 基于电梯图片数据的多标签多属性分类模型建立方法
CN113705527A (zh) * 2021-09-08 2021-11-26 西南石油大学 一种基于损失函数集成和粗细分级卷积神经网络的表情识别方法
CN113705527B (zh) * 2021-09-08 2023-09-22 西南石油大学 一种基于损失函数集成和粗细分级卷积神经网络的表情识别方法
CN113947780A (zh) * 2021-09-30 2022-01-18 吉林农业大学 一种基于改进卷积神经网络的梅花鹿面部识别方法
CN113963231A (zh) * 2021-10-15 2022-01-21 中国石油大学(华东) 基于图像增强与样本平衡优化的行人属性识别方法
CN114092759A (zh) * 2021-10-27 2022-02-25 北京百度网讯科技有限公司 图像识别模型的训练方法、装置、电子设备及存储介质
CN113963426A (zh) * 2021-12-22 2022-01-21 北京的卢深视科技有限公司 模型训练、戴口罩人脸识别方法、电子设备及存储介质
CN113963426B (zh) * 2021-12-22 2022-08-26 合肥的卢深视科技有限公司 模型训练、戴口罩人脸识别方法、电子设备及存储介质
CN114626527A (zh) * 2022-03-25 2022-06-14 中国电子产业工程有限公司 基于稀疏约束再训练的神经网络剪枝方法及装置
CN114626527B (zh) * 2022-03-25 2024-02-09 中国电子产业工程有限公司 基于稀疏约束再训练的神经网络剪枝方法及装置

Also Published As

Publication number Publication date
CN111291604A (zh) 2020-06-16

Similar Documents

Publication Publication Date Title
WO2020114118A1 (fr) Facial attribute recognition method and device, storage medium, and processor
US11195051B2 (en) Method for person re-identification based on deep model with multi-loss fusion training strategy
Salama AbdELminaam et al. A deep facial recognition system using computational intelligent algorithms
Torralba et al. Sharing visual features for multiclass and multiview object detection
Ali et al. Boosted NNE collections for multicultural facial expression recognition
CN110689025B (zh) 图像识别方法、装置、系统及内窥镜图像识别方法、装置
CN108427921A (zh) 一种基于卷积神经网络的人脸识别方法
Sha et al. Feature level analysis for 3D facial expression recognition
CN108830237B (zh) 一种人脸表情的识别方法
CN112784763A (zh) 基于局部与整体特征自适应融合的表情识别方法及系统
US10936868B2 (en) Method and system for classifying an input data set within a data category using multiple data recognition tools
Chanti et al. Improving bag-of-visual-words towards effective facial expressive image classification
US11538577B2 (en) System and method for automated diagnosis of skin cancer types from dermoscopic images
Danisman et al. Boosting gender recognition performance with a fuzzy inference system
Xia et al. Face occlusion detection using deep convolutional neural networks
Gupta et al. Single attribute and multi attribute facial gender and age estimation
CN114677730A (zh) 活体检测方法、装置、电子设备及存储介质
Sun et al. Perceptual multi-channel visual feature fusion for scene categorization
Wasi et al. Arbex: Attentive feature extraction with reliability balancing for robust facial expression learning
Sajid et al. Facial asymmetry-based feature extraction for different applications: a review complemented by new advances
CN116311387B (zh) 一种基于特征交集的跨模态行人重识别方法
Almuashi et al. Siamese convolutional neural network and fusion of the best overlapping blocks for kinship verification
Tunc et al. Age group and gender classification using convolutional neural networks with a fuzzy logic-based filter method for noise reduction
Yu et al. Research on face recognition method based on deep learning
Karungaru et al. Face recognition in colour images using neural networks and genetic algorithms

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19891840

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28/09/2021)

122 Ep: pct application non-entry in european phase

Ref document number: 19891840

Country of ref document: EP

Kind code of ref document: A1