CN112825119A - Face attribute judgment method and device, computer readable storage medium and equipment - Google Patents

Face attribute judgment method and device, computer readable storage medium and equipment

Info

Publication number
CN112825119A
CN112825119A (application CN201911138227.1A)
Authority
CN
China
Prior art keywords
extraction module
face
probability
branch
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911138227.1A
Other languages
Chinese (zh)
Inventor
周军 (Zhou Jun)
王洋 (Wang Yang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Eyes Intelligent Technology Co ltd
Beijing Eyecool Technology Co Ltd
Original Assignee
Beijing Eyes Intelligent Technology Co ltd
Beijing Eyecool Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Eyes Intelligent Technology Co ltd, Beijing Eyecool Technology Co Ltd filed Critical Beijing Eyes Intelligent Technology Co ltd
Priority to CN201911138227.1A priority Critical patent/CN112825119A/en
Publication of CN112825119A publication Critical patent/CN112825119A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Abstract

The invention discloses a face attribute judgment method and device, a computer-readable storage medium, and a device, belonging to the field of pattern recognition. The method comprises the following steps: obtaining key points of a face image; aligning the key points to specified coordinate positions and cropping a region of specified width and height from the aligned image to obtain a face image; inputting the face image into a convolutional neural network to obtain the probability of each face attribute: the face image undergoes convolution, activation, Eltwise, and pooling operations to obtain facial features; the facial features are then processed by a plurality of branches, and the classifier or regressor of each branch outputs the probability of one face attribute. The invention performs face-related attribute analysis from facial features and achieves a good face attribute judgment effect; with the multi-task supervision method, the attribute tasks share only part of the network, which keeps the network compact while still achieving good results.

Description

Face attribute judgment method and device, computer readable storage medium and equipment
Technical Field
The present invention relates to the field of pattern recognition, and in particular, to a method and an apparatus for determining a face attribute, a computer-readable storage medium, and a device.
Background
The rise of deep learning has greatly advanced biometric recognition, especially research in face recognition, image understanding, and related directions. As the accuracy of face comparison algorithms keeps improving, more and more researchers have turned their attention to human behavioral attributes. In the field of face recognition, this research covers attributes of the person, such as expression and posture, as well as external attributes, such as whether glasses or sunglasses are worn.
To improve human-machine interaction, research on face attributes is growing. Current work mainly applies a classifier to perform binary or multi-class classification on face images.
(1) Binary classification of face behavior attributes
The binary classification approach judges a single attribute type: a convolutional neural network classifies the face image into two classes, determining whether the image has the attribute and with what probability. For the glasses attribute, for example, the goal is to decide whether the person in a captured face image is wearing glasses; a deep network performs binary classification on the image, and the result is the probability that the face is wearing glasses and the probability that it is not.
This approach is simple and direct, but if n attributes must be judged, n classifiers or deep networks must be trained and n forward passes performed. Judging all face attributes this way is therefore cumbersome, has high time complexity, and wastes resources.
(2) Multi-class classification of face behavior attributes
This approach takes all the face attributes to be judged as the final targets and uses a single classifier or deep network to output all the results. For example, to simultaneously judge three attributes of a face image, namely wearing glasses, eyes closed, and wearing makeup, only one convolutional neural network performing a three-way classification is needed, and each output represents the probability that the image has the corresponding attribute. Because a single multi-class network is used, the judgment speed improves considerably for multi-attribute tasks. However, as the number of categories grows, the generalization of the algorithm deteriorates.
In summary, the binary classification approach in the prior art can only judge a single attribute, so judging multiple face attributes in practice increases the time overhead. The multi-class approach does not scale well with the number of attribute types: with more attributes to classify, generalization worsens. Moreover, as the number of attributes grows, deeper and wider convolutional neural networks must be chosen to preserve generalization, at the cost of higher time complexity.
Disclosure of Invention
To solve the above technical problems, the invention provides a face attribute judgment method, a device, a computer-readable storage medium, and equipment that perform face-related attribute analysis from facial features and achieve a good face attribute judgment effect. With a multi-task supervision method, the attribute tasks share only part of the network, which keeps the network compact while still achieving good results.
The technical scheme provided by the invention is as follows:
in a first aspect, the present invention provides a method for determining a face attribute, where the method includes:
obtaining key points of a face image by using a face detection and key point positioning method;
aligning the key points to specified coordinate positions, and cropping an image of specified width and height from the aligned image to obtain a face image;
inputting the face image into a convolutional neural network to obtain the probability of each face attribute, wherein:
the facial image undergoes a series of convolution, activation, Eltwise, and pooling operations to obtain facial features;
and the facial features are processed by a plurality of branches, where each branch consists of a convolution layer followed by a discriminator (a classifier or a regressor), the convolution layer of each branch differs from those of the other branches, and the classifier or regressor of each branch outputs the probability of one face attribute.
Further, the convolutional neural network comprises a first extraction module, a second extraction module, a third extraction module, a fourth extraction module, a fifth extraction module, a sixth extraction module, a seventh extraction module, an eighth extraction module, a ninth extraction module and a tenth extraction module which are connected in sequence;
the facial image passes through a first extraction module to a tenth extraction module and passes through a convolution layer, an activation layer and a pooling layer to obtain the facial features;
the first extraction module, the third extraction module, the fourth extraction module, the sixth extraction module, the seventh extraction module, the eighth extraction module and the tenth extraction module respectively comprise a plurality of convolution layers, an activation layer and an Eltwise layer, and the second extraction module, the fifth extraction module and the ninth extraction module respectively comprise a plurality of convolution layers and an activation layer;
in the third, fourth, sixth, seventh, eighth, and tenth extraction modules, the feature map input to the current module undergoes several convolution and activation operations, and the resulting feature map and the input feature map undergo an Eltwise operation to produce the output feature map, which serves as the input of the next module.
Further, the face attributes include age, whether there is a beard, whether glasses are worn, whether a mask is worn, and whether sunglasses are worn, and the number of branches is 5, wherein:
the facial features pass through the convolution layer and classifier of the first branch to obtain the probability of having a beard and the probability of not having one; through the convolution layer and regressor of the second branch to obtain the person's age; through the convolution layer and classifier of the third branch to obtain the probability of wearing glasses and of not wearing glasses; through the convolution layer and classifier of the fourth branch to obtain the probability of wearing a mask and of not wearing one; and through the convolution layer and classifier of the fifth branch to obtain the probability of wearing sunglasses and of not wearing them.
Further, the convolutional neural network is obtained by multi-task supervised training, and the training loss is the cross-entropy loss; during training, samples in the training set are translated or rotated to augment the data.
In a second aspect, the present invention provides a face attribute determination apparatus, including:
the face detection and key point positioning module is used for obtaining key points of a face image by using a face detection and key point positioning method;
the face image acquisition module is used for aligning the key points to the specified coordinate positions and cropping an image of the specified width and height from the aligned image to obtain a face image;
the facial feature extraction and face attribute classification module is used for inputting the facial image into a convolutional neural network to obtain the probability of each face attribute, and includes:
the facial feature extraction unit, used for obtaining facial features from the facial image through a series of convolution, activation, Eltwise, and pooling operations;
the face attribute classification unit, used for processing the facial features through a plurality of branches, where each branch consists of a convolution layer followed by a discriminator (a classifier or a regressor), the convolution layer of each branch differs from those of the other branches, and the classifier or regressor of each branch outputs the probability of one face attribute.
Further, the convolutional neural network comprises a first extraction module, a second extraction module, a third extraction module, a fourth extraction module, a fifth extraction module, a sixth extraction module, a seventh extraction module, an eighth extraction module, a ninth extraction module and a tenth extraction module which are connected in sequence;
the facial image passes through a first extraction module to a tenth extraction module and passes through a convolution layer, an activation layer and a pooling layer to obtain the facial features;
the first extraction module, the third extraction module, the fourth extraction module, the sixth extraction module, the seventh extraction module, the eighth extraction module and the tenth extraction module respectively comprise a plurality of convolution layers, an activation layer and an Eltwise layer, and the second extraction module, the fifth extraction module and the ninth extraction module respectively comprise a plurality of convolution layers and an activation layer;
in the third, fourth, sixth, seventh, eighth, and tenth extraction modules, the feature map input to the current module undergoes several convolution and activation operations, and the resulting feature map and the input feature map undergo an Eltwise operation to produce the output feature map, which serves as the input of the next module.
Further, the face attributes include age, whether there is a beard, whether glasses are worn, whether a mask is worn, and whether sunglasses are worn, and the number of branches is 5, wherein:
the facial features pass through the convolution layer and classifier of the first branch to obtain the probability of having a beard and the probability of not having one; through the convolution layer and regressor of the second branch to obtain the person's age; through the convolution layer and classifier of the third branch to obtain the probability of wearing glasses and of not wearing glasses; through the convolution layer and classifier of the fourth branch to obtain the probability of wearing a mask and of not wearing one; and through the convolution layer and classifier of the fifth branch to obtain the probability of wearing sunglasses and of not wearing them.
Further, the convolutional neural network is obtained by multi-task supervised training, and the training loss is the cross-entropy loss; during training, samples in the training set are translated or rotated to augment the data.
In a third aspect, the present invention provides a computer-readable storage medium for face attribute determination, comprising a memory for storing processor-executable instructions which, when executed by a processor, implement the steps of the face attribute determination method according to the first aspect.
In a fourth aspect, the present invention provides an apparatus for face attribute determination, including at least one processor and a memory storing computer-executable instructions, where the processor implements the steps of the face attribute determination method according to the first aspect when executing the instructions.
The invention has the following beneficial effects:
the invention effectively solves the attribute analysis related to the face through the face characteristics, and obtains better face attribute judgment effect; and by using a multi-task supervision method, each attribute task only shares a part of the network, so that the convenience of the network is ensured, and a better effect can be obtained.
Drawings
FIG. 1 is a flow chart of a face attribute determination method of the present invention;
FIG. 2 is a schematic diagram of a face attribute determination method according to the present invention;
FIG. 3 is a schematic diagram of acquiring a face image;
fig. 4 is a schematic diagram of a human face attribute determination apparatus according to the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
Example 1:
the embodiment of the invention provides a face attribute judgment method, as shown in fig. 1-2, the method comprises the following steps:
step S100: and obtaining key points of the face image by using a face detection and key point positioning method.
First, an SSD face detector is used to detect the face in the input image, and the TDCNN method is then used to locate key points on the detected face region, yielding the key points of the face image. These key points are mainly used later to align the face image and serve as a reference when cropping it. The face detector is not limited to SSD; CRAFT, AdaBoost, and the like may also be used. Similarly, the key point localization method is not limited to TDCNN; SDM may also be used. This step only needs to obtain the coordinates of the key points of the face image.
Step S200: the key points are aligned to the specified coordinate positions, and the images with the specified width and the specified height are intercepted on the aligned images to obtain the face images, as shown in fig. 3.
Illustratively, a face image 200 pixels wide and 200 pixels high is cropped from the aligned image. In this example, the input to the convolutional neural network is therefore a three-channel RGB face image of width and height 200.
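As a hedged illustration (not code from the patent), the fixed-size crop described above can be sketched in plain Python, with the aligned image held as nested lists of RGB tuples:

```python
# Illustrative sketch: cropping a fixed-size face region from an aligned image.

def crop(image, top, left, height, width):
    """Return a height x width sub-image starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

# A dummy 300x300 "aligned image"; each pixel is an (R, G, B) tuple.
aligned = [[(r % 256, c % 256, 0) for c in range(300)] for r in range(300)]

face = crop(aligned, top=0, left=0, height=200, width=200)
assert len(face) == 200 and len(face[0]) == 200  # the 200x200 network input
```

The crop offsets here are arbitrary; in the patent's setting they would be fixed by the alignment template.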
Step S300: inputting the face image into a convolutional neural network to obtain the probability of each face attribute, wherein:
the facial image undergoes a series of convolution, activation, Eltwise, and pooling operations to obtain facial features.
For example, after this series of operations, an input 3-channel image of size 200 × 200 may yield a feature map with 256 channels and a width and height of 4; global average pooling then produces a feature map with 256 channels and a width and height of 1, i.e., the facial features.
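The final global average pooling step can be sketched as follows (an illustrative stand-in, not the patent's implementation), reducing a 256-channel 4 × 4 feature map to a 256-dimensional facial feature vector:

```python
# Sketch of global average pooling: one mean value per channel.

def global_avg_pool(fmap):
    """fmap: list of channels, each a 2-D list. Returns one mean per channel."""
    out = []
    for ch in fmap:
        total = sum(sum(row) for row in ch)
        count = len(ch) * len(ch[0])
        out.append(total / count)
    return out

# 256 channels of 4x4; channel c is filled with the constant value c.
fmap = [[[float(c)] * 4 for _ in range(4)] for c in range(256)]
feat = global_avg_pool(fmap)
assert len(feat) == 256     # the 256-d facial feature
assert feat[10] == 10.0     # mean of a constant channel is that constant
```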
Then the facial features are processed by a plurality of branches, where each branch consists of a convolution layer followed by a discriminator (a classifier or a regressor), the convolution layer of each branch differs from those of the other branches, and the classifier or regressor of each branch outputs the probability of one face attribute.
Illustratively, there are five branches: the 256-dimensional feature obtained above serves as the shared feature of the five branches, each branch convolves it with its own convolution layer, and different classifiers or regressors classify or regress the result to obtain the final output.
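A minimal sketch of the five-branch head, under the assumption that each classification branch ends in two logits followed by a softmax while the age branch outputs a single value; the weights below are random stand-ins, not trained parameters:

```python
import math
import random

# Hedged sketch of the five-branch head over a shared 256-d feature.
random.seed(0)
FEAT_DIM = 256
feature = [random.uniform(-1, 1) for _ in range(FEAT_DIM)]

def linear(x, w, b):
    """Fully connected layer: one output per (row of w, bias)."""
    return [sum(xi * wi for xi, wi in zip(x, row)) + bi for row, bi in zip(w, b)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Classification branches: beard, glasses, mask, sunglasses (2 classes each).
for name in ["beard", "glasses", "mask", "sunglasses"]:
    w = [[random.uniform(-0.1, 0.1) for _ in range(FEAT_DIM)] for _ in range(2)]
    probs = softmax(linear(feature, w, [0.0, 0.0]))
    assert abs(sum(probs) - 1.0) < 1e-9  # "with"/"without" probabilities sum to 1

# Regression branch: age (a single output, no softmax).
w_age = [[random.uniform(-0.1, 0.1) for _ in range(FEAT_DIM)]]
age = linear(feature, w_age, [0.0])[0]
```

In the patent each branch also has its own convolution layer before the discriminator; the fully connected stand-in above only illustrates the classifier/regressor split.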
The invention performs face detection and key point localization on the input image, obtains the position of the face, crops the face image, and analyzes its features to obtain the relevant face attributes. A convolutional neural network extracts features from the face image while performing several attribute classifications or regressions, effectively handling face-related attribute analysis such as judging age and whether the face has a beard, wears glasses, wears a mask, or wears sunglasses.
The face attribute classification of the invention uses a multi-task supervised method, i.e., one convolutional neural network classifies several attributes at the same time. To obtain a better classification effect, however, the tasks share only part of the network (the part that extracts facial features), and each task uses an additional branch for feature extraction, which preserves classification and regression accuracy while reducing time complexity.
In conclusion, the invention performs face-related attribute analysis from facial features and achieves a good face attribute judgment effect; with the multi-task supervision method, the attribute tasks share only part of the network, which keeps the network compact while still achieving good results.
As an improvement of the present invention, the convolutional neural network includes a first extraction module, a second extraction module, a third extraction module, a fourth extraction module, a fifth extraction module, a sixth extraction module, a seventh extraction module, an eighth extraction module, a ninth extraction module, and a tenth extraction module, which are connected in sequence.
The facial image passes through the first extraction module to the tenth extraction module and passes through a convolution layer, an activation layer and a pooling layer to obtain facial features.
The first extraction module, the third extraction module, the fourth extraction module, the sixth extraction module, the seventh extraction module, the eighth extraction module and the tenth extraction module respectively comprise a plurality of convolution layers, an activation layer and an Eltwise layer, and the second extraction module, the fifth extraction module and the ninth extraction module respectively comprise a plurality of convolution layers and an activation layer.
In the third, fourth, sixth, seventh, eighth, and tenth extraction modules, the feature map input to the current module undergoes several convolution and activation operations, and the resulting feature map and the input feature map undergo an Eltwise operation to produce the output feature map, which serves as the input of the next module.
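The Eltwise operation described above can be sketched as an element-wise sum of a module's convolution/activation output with its input (a residual-style skip connection); the `identity` convolution below is a placeholder, not the patent's actual layers:

```python
# Sketch of an Eltwise (element-wise sum) skip connection.

def relu(x):
    return [[max(v, 0.0) for v in row] for row in x]

def eltwise_sum(a, b):
    return [[va + vb for va, vb in zip(ra, rb)] for ra, rb in zip(a, b)]

def extraction_module(fmap, conv_fn):
    """conv_fn stands in for the module's convolutions; shapes must match."""
    return eltwise_sum(relu(conv_fn(fmap)), fmap)

identity = lambda f: f  # placeholder convolution that keeps values unchanged
x = [[1.0, -2.0], [3.0, 4.0]]
y = extraction_module(x, identity)
assert y == [[2.0, -2.0], [6.0, 8.0]]  # relu(x) + x, element by element
```

Because the output is added to the input, the convolutions inside such a module must preserve the feature-map shape.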
The invention emphasizes five attributes highly correlated with facial features and classifies them with a convolutional neural network: age, whether there is a beard, whether glasses are worn, whether a mask is worn, and whether sunglasses are worn, each handled by classification or regression. Age is a regression task; the beard, glasses, mask, and sunglasses attributes are classification tasks.
The facial features pass through the convolution layer of the first branch and the classifier to obtain the probability of having a beard and the probability of not having a beard; the facial features pass through the convolution layer and the regressor of the second branch to obtain the age of the person; the facial features pass through the convolution layer and the classifier of the third branch to obtain the probability of wearing glasses and the probability of not wearing glasses; the facial features pass through the convolution layer and the classifier of the fourth branch to obtain the probability of wearing the mask and the probability of not wearing the mask; the facial features are subjected to the convolution layer and the classifier of the fifth branch, and the probability of wearing sunglasses and the probability of not wearing sunglasses are obtained.
The network structure used in the invention describes facial features well, and a lightweight convolutional neural network classifies the five attributes simultaneously. Controlling the depth and width of the network preserves classification accuracy while reducing time complexity, so classification is both accurate and fast.
The convolutional neural network is obtained by multi-task supervised training, and the training loss is the cross-entropy loss.
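A minimal cross-entropy sketch consistent with the loss named above; in the multi-task setting, such terms would be summed over the classification branches:

```python
import math

# Cross-entropy for a single classification branch.

def cross_entropy(probs, label):
    """Negative log-probability of the true class."""
    return -math.log(probs[label])

loss_good = cross_entropy([0.99, 0.01], 0)  # confident and correct: small loss
loss_bad = cross_entropy([0.01, 0.99], 0)   # confident and wrong: large loss
assert loss_good < 0.02
assert loss_bad > 4.0
```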
In practical use, different face detectors or key point locators can introduce deviations in the acquired face images, so during training the invention translates or rotates the samples in the training set to augment the data, giving the algorithm better robustness.
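The translation and rotation augmentation can be sketched on key-point coordinates (an illustrative sketch; the image pixels would be transformed the same way):

```python
import math

# Sketch of translation/rotation augmentation applied to a key point.

def translate(pt, dx, dy):
    return (pt[0] + dx, pt[1] + dy)

def rotate(pt, center, degrees):
    """Rotate pt around center by the given angle in degrees."""
    rad = math.radians(degrees)
    x, y = pt[0] - center[0], pt[1] - center[1]
    return (center[0] + x * math.cos(rad) - y * math.sin(rad),
            center[1] + x * math.sin(rad) + y * math.cos(rad))

p = translate((50, 70), 5, -3)
assert p == (55, 67)

q = rotate((150, 100), center=(100, 100), degrees=90)
assert abs(q[0] - 100) < 1e-9 and abs(q[1] - 150) < 1e-9
```

The shift and angle ranges are not specified in the patent; in practice they would be kept small so the augmented faces remain plausible detector outputs.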
In one example of the present invention, the key points of the face image include five key points, i.e., a left eye center point, a right eye center point, a nose tip point, a left mouth corner point, and a right mouth corner point.
In aligning the keypoints to the specified coordinate locations, the exemplary five keypoint aligned coordinates are as follows:
the left eye center point abscissa is aligned to 50, and the left eye center point ordinate is aligned to 70;
the abscissa of the center point of the right eye is aligned to 150, and the ordinate of the center point of the right eye is aligned to 70;
the horizontal coordinate of the nose tip point is aligned to 100, and the vertical coordinate of the nose tip point is aligned to 100;
the abscissa of the left mouth corner point is aligned to 65, and the ordinate of the left mouth corner point is aligned to 130;
the right mouth corner point abscissa is aligned to 135 and the right mouth corner point ordinate is aligned to 130.
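Using the template coordinates above, alignment can be sketched as a similarity transform (scale, rotation, translation) solved from the two eye centers; the patent does not specify the solver, so this two-point method is an assumption:

```python
import math

# Hedged sketch: map detected coordinates onto the alignment template
# using only the two eye centers.

TEMPLATE_LEFT_EYE = (50.0, 70.0)
TEMPLATE_RIGHT_EYE = (150.0, 70.0)

def eye_alignment(left_eye, right_eye):
    """Return a function mapping detected coordinates to template coordinates."""
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    template_dx = TEMPLATE_RIGHT_EYE[0] - TEMPLATE_LEFT_EYE[0]  # 100 px
    scale = template_dx / math.hypot(dx, dy)
    angle = -math.atan2(dy, dx)  # rotate the eye line back to horizontal
    cos_a, sin_a = math.cos(angle), math.sin(angle)

    def warp(pt):
        x, y = pt[0] - left_eye[0], pt[1] - left_eye[1]
        return (TEMPLATE_LEFT_EYE[0] + scale * (x * cos_a - y * sin_a),
                TEMPLATE_LEFT_EYE[1] + scale * (x * sin_a + y * cos_a))
    return warp

# Detected eyes 50 px apart and level: scale 2, no rotation.
warp = eye_alignment(left_eye=(80, 120), right_eye=(130, 120))
assert warp((80, 120)) == (50.0, 70.0)          # left eye lands on template
rx, ry = warp((130, 120))
assert abs(rx - 150.0) < 1e-9 and abs(ry - 70.0) < 1e-9  # right eye too
```

A production pipeline would typically solve a least-squares similarity transform over all five key points rather than the two eyes alone.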
The face attribute judgment method of the invention uses a convolutional neural network to extract features while performing attribute classification or regression, and can well predict age and whether the person has a beard, wears glasses, wears a mask, or wears sunglasses. Note that the method is not limited to these five attributes; it also applies to other attributes related to facial features, such as whether makeup is worn.
Example 2:
an embodiment of the present invention provides a face attribute determination apparatus, as shown in fig. 4, the apparatus includes:
and the face detection and key point positioning module 10 is used for obtaining key points of the face image by using a face detection and key point positioning method.
The face image acquisition module 20 is configured to align the key points to the specified coordinate positions and crop an image of the specified width and height from the aligned image to obtain a face image.
The facial feature extraction and face attribute classification module 30 is configured to input the facial image into a convolutional neural network to obtain the probability of each face attribute, and includes:
the facial feature extraction unit 31, used for obtaining the facial features of the facial image through a series of convolution, activation, Eltwise, and pooling operations.
The face attribute classification unit 32 is configured to process the facial features through a plurality of branches, where each branch consists of a convolution layer followed by a discriminator (a classifier or a regressor), the convolution layer of each branch differs from those of the other branches, and the classifier or regressor of each branch outputs the probability of one face attribute.
The invention effectively solves the attribute analysis related to the face through the face characteristics, and obtains better face attribute judgment effect; and by using a multi-task supervision method, each attribute task only shares a part of the network, so that the convenience of the network is ensured, and a better effect can be obtained.
As an improvement of the present invention, the convolutional neural network includes a first extraction module, a second extraction module, a third extraction module, a fourth extraction module, a fifth extraction module, a sixth extraction module, a seventh extraction module, an eighth extraction module, a ninth extraction module, and a tenth extraction module, which are connected in sequence.
The facial image passes through the first to tenth extraction modules, followed by a convolution layer, an activation layer, and a pooling layer, to obtain the facial features.
The first extraction module, the third extraction module, the fourth extraction module, the sixth extraction module, the seventh extraction module, the eighth extraction module and the tenth extraction module respectively comprise a plurality of convolution layers, an activation layer and an Eltwise layer, and the second extraction module, the fifth extraction module and the ninth extraction module respectively comprise a plurality of convolution layers and an activation layer.
In the third, fourth, sixth, seventh, eighth, and tenth extraction modules, the feature map input to the current extraction module undergoes several convolution and activation operations, and the resulting feature map is combined with the input feature map by an Eltwise operation to produce the output feature map, which serves as the input to the next extraction module.
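The Eltwise merge described here is a residual-style skip connection: the module's input is element-wise summed with the result of its convolution/activation stack. A minimal single-channel sketch follows; the 3×3 kernels, ReLU activation, and two-convolution depth are illustrative assumptions, not the patent's actual layer configuration.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def conv3x3_same(x, w):
    # naive 3x3 'same' convolution over a single-channel feature map
    h, wd = x.shape
    pad = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(pad[i:i + 3, j:j + 3] * w)
    return out

def extraction_module(x, w1, w2):
    # convolution -> activation -> convolution, then an Eltwise
    # (element-wise) SUM with the module's own input feature map
    y = relu(conv3x3_same(x, w1))
    y = conv3x3_same(y, w2)
    return relu(x + y)  # output feature map, input to the next module
```

With an identity kernel (a 3×3 kernel whose center is 1), the module simply doubles a non-negative input, which makes the skip-connection behavior easy to verify.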
The invention focuses on five attributes that are highly correlated with facial features and classifies them using a convolutional neural network. It judges five face attributes from the facial features: age, whether the person has a beard, wears glasses, wears a mask, and wears sunglasses. Each attribute is handled by classification or regression: age is a regression task, while the other four are classification tasks.
The facial features pass through the convolution layer and the classifier of the first branch to obtain the probability of having a beard and the probability of not having one; through the convolution layer and the regressor of the second branch to obtain the person's age; through the convolution layer and the classifier of the third branch to obtain the probabilities of wearing and not wearing glasses; through the convolution layer and the classifier of the fourth branch to obtain the probabilities of wearing and not wearing a mask; and through the convolution layer and the classifier of the fifth branch to obtain the probabilities of wearing and not wearing sunglasses.
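The five-branch head can be sketched as follows. For brevity, each branch's convolution is stood in for by a linear projection on a flattened feature vector; the feature size, weight shapes, and random weights are hypothetical, not the patent's actual parameters.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
feat = rng.standard_normal(64)  # shared facial feature vector (hypothetical size)

# each branch owns its weights; "cls" branches end in a classifier,
# the "reg" branch (age) ends in a regressor
branches = {
    "beard":      ("cls", rng.standard_normal((2, 64))),
    "age":        ("reg", rng.standard_normal((1, 64))),
    "glasses":    ("cls", rng.standard_normal((2, 64))),
    "mask":       ("cls", rng.standard_normal((2, 64))),
    "sunglasses": ("cls", rng.standard_normal((2, 64))),
}

outputs = {}
for name, (kind, w) in branches.items():
    z = w @ feat  # branch-specific projection (stand-in for its convolution)
    outputs[name] = softmax(z) if kind == "cls" else float(z[0])
# classification branches yield (P(yes), P(no)); the age branch a scalar
```

Each classification branch's two outputs sum to one, matching the "probability of wearing / not wearing" pairs in the text.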
The network structure used in the invention describes facial features well. Because a lightweight convolutional neural network classifies the five attributes simultaneously, the depth and width of the network are kept under control, which preserves classification accuracy while reducing time complexity, so classification is both accurate and fast.
The convolutional neural network is trained by a multi-task supervised method, and the loss function used in training is cross entropy loss.
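Under multi-task supervision, the overall loss is a sum of per-branch terms. A minimal sketch: the cross-entropy terms follow the text, while the L2 term for the age (regression) branch is an assumption, since the patent only names cross entropy for the training loss.

```python
import numpy as np

def cross_entropy(p, label):
    # p: a branch's softmax output; label: the true class index
    return -np.log(p[label] + 1e-12)

def multitask_loss(preds, labels, age_pred=None, age_true=None):
    # sum of per-branch cross-entropy terms over the classification
    # branches; the age regression branch contributes an L2 term
    # (an illustrative assumption, see lead-in)
    loss = sum(cross_entropy(preds[k], labels[k]) for k in labels)
    if age_pred is not None:
        loss += (age_pred - age_true) ** 2
    return loss
```

Because every attribute task shares only part of the network, each branch's gradient flows back through its own head and into the shared feature extractor.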
In practical use, differences between face detection and key point positioning results cause deviations in the acquired face images. Accordingly, during training the samples in the training set are translated or rotated to augment the data, giving the algorithm better robustness.
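The translation side of this augmentation can be sketched as below (rotation could be added analogously); the pad-and-crop scheme, the edge padding mode, and the shift range are illustrative assumptions.

```python
import numpy as np

def random_shift(img, max_px, rng):
    # pad with edge values, then crop at a random offset:
    # a small translation that keeps the output size unchanged
    dy, dx = rng.integers(-max_px, max_px + 1, size=2)
    pad = np.pad(img, max_px, mode='edge')
    h, w = img.shape
    y0 = max_px + dy
    x0 = max_px + dx
    return pad[y0:y0 + h, x0:x0 + w]
```

Applying this to each training sample simulates the small localization deviations described above, so the trained network tolerates imperfect alignment.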
In one example of the present invention, the key points of the face image include five key points, i.e., a left eye center point, a right eye center point, a nose tip point, a left mouth corner point, and a right mouth corner point.
In aligning the key points to the specified coordinate positions, exemplary aligned coordinates for the five key points are as follows:
the left eye center point abscissa is aligned to 50, and the left eye center point ordinate is aligned to 70;
the abscissa of the center point of the right eye is aligned to 150, and the ordinate of the center point of the right eye is aligned to 70;
the nose tip point abscissa is aligned to 100, and the nose tip point ordinate is aligned to 100;
the abscissa of the left mouth corner point is aligned to 65, and the ordinate of the left mouth corner point is aligned to 130;
the right mouth corner point abscissa is aligned to 135 and the right mouth corner point ordinate is aligned to 130.
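Aligning detected key points to the target coordinates above amounts to fitting a similarity transform (scale, rotation, translation) and warping the image with it. A least-squares sketch in NumPy follows; the detected key point coordinates in `src` are hypothetical.

```python
import numpy as np

# detected key points (hypothetical) and the target coordinates listed
# above: left eye, right eye, nose tip, left mouth corner, right mouth corner
src = np.array([[60., 80.], [160., 78.], [112., 112.], [75., 140.], [148., 141.]])
dst = np.array([[50., 70.], [150., 70.], [100., 100.], [65., 130.], [135., 130.]])

def similarity_transform(src, dst):
    # least-squares fit of x' = a*x - b*y + tx,  y' = b*x + a*y + ty
    n = len(src)
    A = np.zeros((2 * n, 4))
    A[0::2] = np.c_[src[:, 0], -src[:, 1], np.ones(n), np.zeros(n)]
    A[1::2] = np.c_[src[:, 1],  src[:, 0], np.zeros(n), np.ones(n)]
    a, b, tx, ty = np.linalg.lstsq(A, dst.ravel(), rcond=None)[0]
    return np.array([[a, -b, tx], [b, a, ty]])  # 2x3 warp matrix

M = similarity_transform(src, dst)
aligned = src @ M[:, :2].T + M[:, 2]  # key points after warping, near dst
```

The same 2×3 matrix would then warp the whole image, after which the specified-width, specified-height crop yields the face image.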
The face attribute judgment device extracts features with a convolutional neural network while performing attribute classification or regression, and can reliably predict age and whether a person has a beard, wears glasses, wears a mask, or wears sunglasses. It should be noted that the device of the present invention is not limited to these five attributes; it also applies to other attributes related to facial features, such as whether makeup is worn.
The device provided by the embodiments of the present invention has the same implementation principle and technical effects as the foregoing method embodiments; for brevity, where the device embodiments omit details, reference may be made to the corresponding content in the method embodiments. Those skilled in the art will appreciate that, for convenience and brevity of description, the specific working processes of the device and units described above may refer to the corresponding processes in the foregoing method embodiments and are not repeated here.
Example 3:
The method provided by the embodiments of the present specification can implement its business logic through a computer program recorded on a storage medium; the storage medium can be read and executed by a computer to achieve the effects of the solution described in Embodiment 1 of the present specification. Accordingly, the present invention also provides a computer-readable storage medium for face attribute judgment, comprising a memory for storing processor-executable instructions which, when executed by a processor, implement the steps of the face attribute judgment method of Embodiment 1.
The invention effectively performs face-related attribute analysis through facial features and achieves a good face attribute judgment effect; by using a multi-task supervision method, each attribute task shares only part of the network, which keeps the network simple and lightweight while still achieving good results.
The storage medium may include a physical device for storing information; typically, the information is digitized and then stored in an electrical, magnetic, or optical medium. The storage medium may include: devices that store information using electrical energy, such as various types of memory (RAM, ROM, etc.) and USB flash drives; devices that store information using magnetic energy, such as hard disks, floppy disks, magnetic tapes, core memories, and bubble memories; and devices that store information optically, such as CDs or DVDs. Of course, other kinds of readable storage media exist, such as quantum memories and graphene memories.
The above description of the storage medium according to the method embodiment may also include other implementations. The specific implementation manner may refer to the description of the related method embodiment, and is not described in detail herein.
Example 4:
The invention also provides a device for face attribute judgment, which may be a single computer, or may be an actual operating device that uses one or more of the methods, or one or more of the embodiment devices, of this specification. The device for face attribute judgment may include at least one processor and a memory storing computer-executable instructions; when executing the instructions, the processor implements the steps of the face attribute judgment method of Embodiment 1 above.
The invention effectively performs face-related attribute analysis through facial features and achieves a good face attribute judgment effect; by using a multi-task supervision method, each attribute task shares only part of the network, which keeps the network simple and lightweight while still achieving good results.
The above description of the device according to the method or apparatus embodiment may also include other embodiments, and specific implementation may refer to the description of the related method embodiment, which is not described herein in detail.
It should be noted that, the above-mentioned apparatus or system in this specification may also include other implementation manners according to the description of the related method embodiment, and a specific implementation manner may refer to the description of the method embodiment, which is not described herein in detail. The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the hardware + program class, storage medium + program embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and for the relevant points, refer to the partial description of the method embodiment.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. One typical implementation device is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a vehicle-mounted human-computer interaction device, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. Of course, when implementing one or more of the present description, the functions of each module may be implemented in one or more software and/or hardware, or a module implementing the same function may be implemented by a combination of multiple sub-modules or sub-units, etc. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method or apparatus that comprises the element.
As will be appreciated by one skilled in the art, one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
One or more embodiments of the present description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. One or more embodiments of the present specification can also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment. In the description of the specification, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the specification. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than limit them, and the protection scope of the present invention is not limited thereto. Although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field can still modify or easily conceive of changes to the technical solutions described in the foregoing embodiments, or make equivalent substitutions for some technical features, within the technical scope of the present disclosure; such modifications, changes, or substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention and are intended to be covered by its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (10)

1. A face attribute judging method is characterized by comprising the following steps:
obtaining key points of a face image by using a face detection and key point positioning method;
aligning the key points to the appointed coordinate position, and intercepting the image with the appointed width and the appointed height on the aligned image to obtain a face image;
inputting the face image into a convolutional neural network to obtain the probability of each face attribute, wherein:
the facial image is subjected to a series of convolution operation, activation operation, Eltwise operation and pooling operation to obtain facial features;
and processing the facial features through a plurality of branches respectively, wherein each branch sequentially comprises a convolution layer and a discriminator, the discriminator is a classifier or a regressor, the convolution layer of each branch is different from the convolution layers of other branches, and the classifier or the regressor of each branch outputs a probability of a human face attribute.
2. The face attribute judging method according to claim 1, wherein the convolutional neural network comprises a first extraction module, a second extraction module, a third extraction module, a fourth extraction module, a fifth extraction module, a sixth extraction module, a seventh extraction module, an eighth extraction module, a ninth extraction module and a tenth extraction module which are connected in sequence;
the facial image passes through a first extraction module to a tenth extraction module and passes through a convolution layer, an activation layer and a pooling layer to obtain the facial features;
the first extraction module, the third extraction module, the fourth extraction module, the sixth extraction module, the seventh extraction module, the eighth extraction module and the tenth extraction module respectively comprise a plurality of convolution layers, an activation layer and an Eltwise layer, and the second extraction module, the fifth extraction module and the ninth extraction module respectively comprise a plurality of convolution layers and an activation layer;
in a third extraction module, a fourth extraction module, a sixth extraction module, a seventh extraction module, an eighth extraction module and a tenth extraction module, the feature map input by the current extraction module is subjected to a plurality of convolution operations and activation operations, and the obtained feature map and the input feature map are subjected to Eltwise operation to obtain an output feature map which is used as the input of the next extraction module.
3. The face attribute judging method according to claim 2, wherein the face attributes include age, whether or not there is a beard, whether or not to wear glasses, whether or not to wear a mask, whether or not to wear sunglasses, and the number of branches is 5, wherein:
the facial features pass through the convolution layer of the first branch and the classifier to obtain the probability of having a beard and the probability of not having a beard; the facial features pass through the convolution layer and the regressor of the second branch to obtain the age of the person; the facial features pass through the convolution layer and the classifier of the third branch to obtain the probability of wearing glasses and the probability of not wearing glasses; the facial features pass through the convolution layer and the classifier of the fourth branch to obtain the probability of wearing the mask and the probability of not wearing the mask; and the facial features are subjected to the convolution layer and the classifier of the fifth branch to obtain the probability of wearing sunglasses and the probability of not wearing sunglasses.
4. The method for judging the attributes of the human face according to any one of claims 1 to 3, wherein the convolutional neural network is obtained by multitask supervised method training, and the trained loss function is cross entropy loss; during training, the samples in the training set are subjected to translation or rotation to expand data.
5. A face attribute determination apparatus, the apparatus comprising:
the face detection and key point positioning module is used for obtaining key points of a face image by using a face detection and key point positioning method;
the face image acquisition module is used for aligning the key points to the specified coordinate positions, and intercepting the images with the specified width and the specified height on the aligned images to obtain face images;
the facial feature extraction and human face attribute classification module is used for inputting facial images into a convolutional neural network to obtain the probability of each human face attribute, and comprises the following steps:
the facial feature extraction unit is used for obtaining facial features of the facial image through a series of convolution operation, activation operation, Eltwise operation and pooling operation;
the face attribute classification unit is used for processing the face features through a plurality of branches respectively, each branch sequentially comprises a convolution layer and a discriminator, the discriminator is a classifier or a regressor, the convolution layer of each branch is different from the convolution layers of other branches, and the classifier or the regressor of each branch outputs the probability of one face attribute.
6. The device for judging the attributes of the human face according to claim 5, wherein the convolutional neural network comprises a first extraction module, a second extraction module, a third extraction module, a fourth extraction module, a fifth extraction module, a sixth extraction module, a seventh extraction module, an eighth extraction module, a ninth extraction module and a tenth extraction module which are connected in sequence;
the facial image passes through a first extraction module to a tenth extraction module and passes through a convolution layer, an activation layer and a pooling layer to obtain the facial features;
the first extraction module, the third extraction module, the fourth extraction module, the sixth extraction module, the seventh extraction module, the eighth extraction module and the tenth extraction module respectively comprise a plurality of convolution layers, an activation layer and an Eltwise layer, and the second extraction module, the fifth extraction module and the ninth extraction module respectively comprise a plurality of convolution layers and an activation layer;
in a third extraction module, a fourth extraction module, a sixth extraction module, a seventh extraction module, an eighth extraction module and a tenth extraction module, the feature map input by the current extraction module is subjected to a plurality of convolution operations and activation operations, and the obtained feature map and the input feature map are subjected to Eltwise operation to obtain an output feature map which is used as the input of the next extraction module.
7. The apparatus according to claim 6, wherein the face attributes include age, whether or not there is a beard, whether or not glasses are worn, whether or not a mask is worn, whether or not sunglasses are worn, and the number of the branches is 5, wherein:
the facial features pass through the convolution layer of the first branch and the classifier to obtain the probability of having a beard and the probability of not having a beard; the facial features pass through the convolution layer and the regressor of the second branch to obtain the age of the person; the facial features pass through the convolution layer and the classifier of the third branch to obtain the probability of wearing glasses and the probability of not wearing glasses; the facial features pass through the convolution layer and the classifier of the fourth branch to obtain the probability of wearing the mask and the probability of not wearing the mask; and the facial features are subjected to the convolution layer and the classifier of the fifth branch to obtain the probability of wearing sunglasses and the probability of not wearing sunglasses.
8. The face attribute judging device of any one of claims 5 to 7, wherein the convolutional neural network is obtained by multitask supervised method training, and the loss function of the training is cross entropy loss; during training, the samples in the training set are subjected to translation or rotation to expand data.
9. A computer-readable storage medium for face attribute determination, comprising a memory for storing processor-executable instructions which, when executed by the processor, implement steps comprising the face attribute determination method of any of claims 1-4.
10. An apparatus for face attribute determination, comprising at least one processor and a memory storing computer-executable instructions, the processor implementing the steps of the face attribute determination method according to any one of claims 1 to 4 when executing the instructions.
CN201911138227.1A 2019-11-20 2019-11-20 Face attribute judgment method and device, computer readable storage medium and equipment Pending CN112825119A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911138227.1A CN112825119A (en) 2019-11-20 2019-11-20 Face attribute judgment method and device, computer readable storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911138227.1A CN112825119A (en) 2019-11-20 2019-11-20 Face attribute judgment method and device, computer readable storage medium and equipment

Publications (1)

Publication Number Publication Date
CN112825119A true CN112825119A (en) 2021-05-21

Family

ID=75906153

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911138227.1A Pending CN112825119A (en) 2019-11-20 2019-11-20 Face attribute judgment method and device, computer readable storage medium and equipment

Country Status (1)

Country Link
CN (1) CN112825119A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105404877A (en) * 2015-12-08 2016-03-16 商汤集团有限公司 Human face attribute prediction method and apparatus based on deep study and multi-task study
CN106203395A (en) * 2016-07-26 2016-12-07 厦门大学 Face character recognition methods based on the study of the multitask degree of depth
CN106529402A (en) * 2016-09-27 2017-03-22 中国科学院自动化研究所 Multi-task learning convolutional neural network-based face attribute analysis method
CN106815566A (en) * 2016-12-29 2017-06-09 天津中科智能识别产业技术研究院有限公司 A kind of face retrieval method based on multitask convolutional neural networks
CN107038429A (en) * 2017-05-03 2017-08-11 四川云图睿视科技有限公司 A kind of multitask cascade face alignment method based on deep learning
CN107220635A (en) * 2017-06-21 2017-09-29 北京市威富安防科技有限公司 Human face in-vivo detection method based on many fraud modes
CN108564029A (en) * 2018-04-12 2018-09-21 厦门大学 Face character recognition methods based on cascade multi-task learning deep neural network
CN109977781A (en) * 2019-02-26 2019-07-05 上海上湖信息技术有限公司 Method for detecting human face and device, readable storage medium storing program for executing
WO2019183758A1 (en) * 2018-03-26 2019-10-03 Intel Corporation Methods and apparatus for multi-task recognition using neural networks


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
RANJAN RAJEEV et al.: "HyperFace: A Deep Multi-Task Learning Framework for Face Detection, Landmark Localization, Pose Estimation, and Gender Recognition", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 41, no. 01, pages 121 - 135, XP011703713, DOI: 10.1109/TPAMI.2017.2781233 *
ZHANG ZHANPENG et al.: "Facial landmark detection by deep multi-task learning", European Conference on Computer Vision 2014, pages 94 *
ZHANG KE: "Research on Face Detection and Face Attribute Recognition Based on Convolutional Neural Networks", China Masters' Theses Full-text Database (Information Science and Technology), no. 09, pages 138 - 903 *
XU PEICHAO et al.: "Application of Multi-task Learning and ResNet to Face Multi-attribute Recognition", Journal of Chinese Computer Systems, vol. 39, no. 12, pages 2720 - 2724 *
ZENG CHENG et al.: "Multi-attribute Face Liveness Detection Based on Multi-task CNN", Science Technology and Engineering, vol. 16, no. 32, pages 88 - 92 *
YANG JUNQIN et al.: "A Deep-Learning-Based Face Multi-attribute Recognition System", Modern Computer (Professional Edition), no. 05, pages 52 - 55 *

Similar Documents

Publication Publication Date Title
CN109359538B (en) Training method of convolutional neural network, gesture recognition method, device and equipment
Lajevardi et al. Higher order orthogonal moments for invariant facial expression recognition
CN110276248B (en) Facial expression recognition method based on sample weight distribution and deep learning
CN111539389A (en) Face anti-counterfeiting recognition method, device, equipment and storage medium
KR101802500B1 (en) Learning device for improving image recogntion performance and learning method thereof
CN111553326B (en) Hand motion recognition method and device, electronic equipment and storage medium
Mahmood et al. A Comparative study of a new hand recognition model based on line of features and other techniques
CN110288079A (en) Feature acquisition method, device and equipment
CN111860078A (en) Silent face liveness detection method and device, readable storage medium and equipment
CN111860056B (en) Blink-based liveness detection method, device, readable storage medium and equipment
JP7141518B2 (en) Finger vein matching method, device, computer equipment, and storage medium
Verma et al. Age prediction using image dataset using machine learning
CN107944381A (en) Face tracking method, device, terminal and storage medium
Wang et al. Fusion network for face-based age estimation
US20210406568A1 (en) Utilizing multiple stacked machine learning models to detect deepfake content
Badi et al. New method for optimization of static hand gesture recognition
Reddy et al. Emotion detection using periocular region: A cross-dataset study
Minhas et al. Accurate pixel-wise skin segmentation using shallow fully convolutional neural network
Eckert et al. Fast facial expression recognition for emotion awareness disposal
Sikkandar Design a contactless authentication system using hand gestures technique in COVID-19 panic situation
CN112825119A (en) Face attribute judgment method and device, computer readable storage medium and equipment
CN112825117A (en) Behavior attribute judgment method, device, medium and equipment based on head features
CN112825122A (en) Ethnicity judgment method, device, medium and equipment based on two-dimensional face images
Mohan et al. Facial expression recognition using improved local binary pattern and min-max similarity with nearest neighbor algorithm
YILDIZ et al. CNN-based gender prediction in uncontrolled environments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination