CN116386099A - Face multi-attribute identification method and model acquisition method and device thereof - Google Patents

Face multi-attribute identification method and model acquisition method and device thereof

Info

Publication number
CN116386099A
Authority
CN
China
Prior art keywords
face
attribute
recognition model
loss
attribute recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211715157.3A
Other languages
Chinese (zh)
Inventor
黄乐
王侃
庞建新
谭欢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Ubtech Technology Co ltd
Original Assignee
Shenzhen Ubtech Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Ubtech Technology Co ltd filed Critical Shenzhen Ubtech Technology Co ltd
Priority to CN202211715157.3A
Publication of CN116386099A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements using pattern recognition or machine learning
    • G06V10/82: Arrangements using neural networks
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G06V40/172: Classification, e.g. identification
    • G06V40/174: Facial expression recognition
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The application belongs to the technical field of face recognition, and in particular relates to a face multi-attribute recognition method and to a method and device for obtaining the corresponding model. A face multi-attribute recognition model is constructed from a backbone network and a plurality of auxiliary branches and trained; a target face image is then input into the trained model for multi-attribute recognition, outputting a plurality of face attributes corresponding to the target face image. Where the prior art can recognize only one or two face attributes, the method extends recognition to many attributes, broadening the applicable scenarios and meeting the requirements of complex applications.

Description

Face multi-attribute identification method and model acquisition method and device thereof
Technical Field
The present disclosure relates to the field of face recognition technologies, and in particular to a face multi-attribute recognition method and to a method and apparatus for obtaining the corresponding model.
Background
With the development of deep learning, tasks such as face detection, face recognition and face attribute recognition have been widely applied in many fields; among these, face attribute recognition is a comparatively complex task. Common face attributes include key points, pose, expression, age, gender, face value, face quality, mask and glasses. Existing approaches that recognize key points and pose among the face attributes can identify only one or two attributes at a time, which is limiting for application scenarios that require simultaneous multi-attribute analysis. How to achieve multi-attribute recognition of face attributes is therefore a problem that needs to be solved.
Disclosure of Invention
In view of this, the embodiments of the present application provide a face multi-attribute recognition model acquisition method, a face multi-attribute recognition method, corresponding devices, a terminal device, and a readable storage medium.
In a first aspect, an embodiment of the present application provides a method for acquiring a face multi-attribute recognition model, including:
inputting a training image into a backbone network for face feature extraction, and performing different attribute predictions on the extracted face features through a plurality of auxiliary branches to obtain a prediction result for each face attribute;
and calculating a loss value for each face attribute prediction result through a loss function, taking the weighted sum of the loss values as the total loss for gradient backpropagation, and continuing training until a preset training-stop condition is met, thereby obtaining the face multi-attribute recognition model.
In some embodiments, the attribute prediction tasks comprise regression-type tasks and classification-type tasks, and different task types adopt different loss functions. The regression-type tasks comprise the prediction of face key points, pose, age, face quality and face value;
the classification-type tasks comprise the prediction of expression, gender, glasses and mask.
In some embodiments, three types of loss function are used: the prediction of face key points in the regression-type tasks adopts the Wing Loss function;
the remaining attribute predictions in the regression-type tasks adopt the Smooth L1 Loss function;
each attribute prediction in the classification-type tasks adopts the cross entropy loss function.
In some embodiments, the Wing Loss function is expressed as follows:
    wing(x) = w · ln(1 + |x|/ε),  if |x| < w
    wing(x) = |x| − C,  otherwise, with C = w − w·ln(1 + w/ε)
wherein wing (x) represents the calculated loss value; x represents the difference between the predicted value and the real label; w and ε are preset values and C is a constant.
In some embodiments, each auxiliary branch comprises a fully connected layer; when the face multi-attribute recognition model is formed, each fully connected layer is attached at a shared feature layer of the backbone network to form the plurality of auxiliary branches, the number of auxiliary branches being equal to the number of attribute predictions.
In some embodiments, the backbone network employs a MobileNet-V2 lightweight network.
In a second aspect, an embodiment of the present application further provides a face multi-attribute identification method, including:
inputting a target face image into the face multi-attribute recognition model obtained by the above method to carry out multi-attribute recognition, and obtaining a prediction result for each face attribute.
In some embodiments, before the target face image is input, the method further includes:
acquiring an image containing a face, extracting a face prediction box from the image through a face detection algorithm, and cropping along the face prediction box to obtain the target face image to be input into the face multi-attribute recognition model.
In a third aspect, an embodiment of the present application provides a device for acquiring a face multi-attribute recognition model, including:
a feature extraction module, used for inputting a training image into a backbone network to extract face features, and for performing different attribute predictions on the extracted face features through a plurality of auxiliary branches to obtain a prediction result for each face attribute;
and a network training module, used for calculating a loss value for each face attribute prediction result through a loss function, taking the weighted sum of the loss values as the total loss for gradient backpropagation, and continuing training until a preset training-stop condition is met, thereby obtaining the face multi-attribute recognition model.
In a fourth aspect, an embodiment of the present application further provides a terminal device, where the terminal device includes a processor and a memory, where the memory stores a computer program, and the processor is configured to execute the computer program to implement the above-mentioned method for acquiring a face multi-attribute recognition model or the method for identifying a face multi-attribute.
In a fifth aspect, embodiments of the present application further provide a readable storage medium storing a computer program, where the computer program implements the above-mentioned face multi-attribute recognition model acquisition method or the face multi-attribute recognition method when executed on a processor.
The application has the following beneficial effects:
according to the face multi-attribute recognition method, the target face image is input into the trained face multi-attribute recognition model to carry out multi-attribute recognition, so that a plurality of face attributes corresponding to the target face image are output; furthermore, the method and the device amplify the identification of a plurality of face attributes on the basis that the prior art can only identify one or two face attributes, and further enlarge the application scene; and different auxiliary branches are connected to the shared feature layer of the backbone network for decoupling, so that the same model can output more face attributes, and the application requirements of complex scenes can be met.
Drawings
To illustrate the technical solutions of the embodiments more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and therefore should not be considered limiting in scope; a person skilled in the art may obtain other related drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an implementation of a method for acquiring a face multi-attribute recognition model according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a face multi-attribute identification method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a face multi-attribute recognition model obtaining device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a face multi-attribute identification apparatus according to an embodiment of the present application;
fig. 5 shows a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments.
The components of the embodiments of the present application, which are generally described and illustrated in the figures herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as provided in the accompanying drawings, is not intended to limit the scope of the application, as claimed, but is merely representative of selected embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present application without making any inventive effort, are intended to be within the scope of the present application.
In the following, the terms "comprises", "comprising", "having" and their cognates, as used in the various embodiments of the present application, are intended only to refer to a particular feature, number, step, operation, element, component, or combination of the foregoing, and should not be interpreted as excluding the existence, or the possible addition, of one or more other features, numbers, steps, operations, elements, components, or combinations thereof. Furthermore, the terms "first", "second", "third" and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which various embodiments of this application belong. The terms (such as those defined in commonly used dictionaries) will be interpreted as having a meaning that is identical to the meaning of the context in the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein in connection with the various embodiments.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The embodiments described below and features of the embodiments may be combined with each other without conflict.
In the prior art, approaches that recognize key points and pose among the face attributes can identify only one or two attributes, which is limiting for application scenarios requiring simultaneous multi-attribute analysis. One example is PFLD (A Practical Facial Landmark Detector), a face key point detection algorithm. Face detection with the PFLD algorithm can recognize key points and pose among the face attributes, but it remains limited for scenarios that need several attributes analyzed at once. In addition, although the PFLD algorithm designs a dedicated loss function for key-point training in scenes such as side faces, raised heads and lowered heads, giving different categories different weights, it essentially computes the L1 or L2 distance between the predicted and true key-point values; L1 loss hinders training convergence, while L2 loss is sensitive to outliers, either of which can affect the accuracy of key-point prediction.
Based on this, the present application proposes a face multi-attribute recognition method, described in detail below. In this embodiment the method is divided into two phases: a model training phase and a model application phase.
Optionally, the embodiments of the present application may be implemented on a terminal device with data storage and visualization capabilities, such as a server or a computer device. This embodiment takes a server as an example: a face image database is mounted on the server, storing face images with different face poses and different lighting conditions. The stored face images cover faces of different genders and nationalities, with varying poses and facial expressions. The face multi-attribute recognition model can then be trained by retrieving face images from this database as training images.
The model training phase is described first. Referring to FIG. 1, an exemplary embodiment of the present application provides a method for obtaining a face multi-attribute recognition model, the method comprising:
s110, inputting the training image into a backbone network to extract the facial features, and predicting the extracted facial features with different attributes through a plurality of auxiliary branches to obtain each facial attribute prediction result.
In this embodiment, the face image database is queried for a predetermined number of face images with different face poses, and these are input as training images into the backbone network for face feature extraction. The predetermined number may be set according to actual requirements and is not limited here; for example, it may be 100, 200 or 300.
This embodiment combines a backbone network and a plurality of auxiliary branches to form the face multi-attribute recognition model. In the model training stage, the backbone network and the auxiliary branches are trained on a certain amount of sample data to improve the recognition accuracy of the model. Optionally, face attribute recognition and training may be performed on the basis of a face key point detection algorithm such as the PFLD algorithm.
Specifically, a predetermined number of face images are retrieved from the face image database to form a face image dataset, which is divided into a training dataset and a verification dataset at a certain ratio; the ratio is not limited here, for example 9:1. The training dataset is used to train the model, and the verification dataset is used to check the training result and judge whether the model's accuracy meets the requirements.
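As a concrete illustration, the random split described above can be sketched as follows (a minimal example; the function name and the fixed seed are assumptions for reproducibility, not part of the original disclosure):

```python
import random

def split_dataset(samples, train_ratio=0.9, seed=0):
    """Shuffle and split a face-image dataset into a training subset
    and a verification subset at the given ratio (9:1 by default)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before the cut keeps the pose and lighting variation of the database spread across both subsets.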
Further, all face images in the training dataset are annotated to form label data stored in text (TXT) format; the annotation covers the attributes corresponding to the face features in each image, and each stored line holds, in sequence, the real labels of the various face attributes of one image. The face attributes include, but are not limited to, key points, pose, expression, age, gender, face value, face quality, mask and glasses. Face key points are the salient feature points that characterize the uniqueness of a face, for example (but not limited to) the eyes, mouth, nose and eyebrows; they may be chosen according to actual needs and are not limited here.
When a face image is annotated, at least nine face attributes are labeled; a face multi-attribute recognition model trained on these at least nine attributes can then recognize all of them through a single model, achieving multi-attribute face recognition.
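Since each annotation line stores the real labels of all attributes in sequence, parsing can be sketched as below. The exact field order is not specified in the disclosure, so the layout here (image path, five (x, y) key points, yaw/pitch/roll pose, then expression, age, gender, face value, face quality, mask, glasses) is a hypothetical example:

```python
def parse_label_line(line):
    """Parse one line of the hypothetical TXT annotation file into a
    record of the nine face attributes plus the image path."""
    f = line.strip().split()
    return {
        "path": f[0],
        "keypoints": [float(v) for v in f[1:11]],  # 5 (x, y) pairs
        "pose": [float(v) for v in f[11:14]],      # yaw, pitch, roll
        "expression": int(f[14]),
        "age": float(f[15]),
        "gender": int(f[16]),
        "face_value": float(f[17]),
        "quality": float(f[18]),
        "mask": int(f[19]),
        "glasses": int(f[20]),
    }
```

A real annotation scheme would likely use more key points; the five-point layout only keeps the sketch short.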
In this embodiment, a training data set is loaded, face features of face images in the training data set are extracted through a pre-built backbone network, and further, predicted values of corresponding face attributes are output through each auxiliary branch.
Further, the description of face features can be divided into geometric features and algebraic features. Geometric features are based on the shapes of the facial organs and the geometric relations between them: the face is composed of the eyes, nose, mouth, chin and other organs, whose feature-point positions are relatively fixed, so a geometric description can serve as an important face feature. That is, using a structure-based method and prior knowledge of the geometric relations in the facial topology, features of the main facial organs are extracted at the knowledge level and the face is represented by a set of geometric feature vectors. Algebraic features of the face are determined by the gray-level distribution of the image; they describe the internal information of the image and capture and characterize the face as a whole, and the statistical techniques involved are comparatively complex to apply.
It should be understood that the backbone network in this embodiment is only a tool for extracting image features and does not depend on a specific network structure; a common deep learning model may be used, such as a heavyweight network of the ResNet series or a lightweight network of the MobileNet or ShuffleNet series, without limitation here. Optionally, the backbone network may employ the MobileNet-V2 lightweight network.
Further, in this embodiment, different attribute predictions are performed on the extracted face features through the auxiliary branches to obtain each face attribute prediction result. When the face multi-attribute recognition model is formed, each fully connected layer is attached at a shared feature layer of the backbone network to form the plurality of auxiliary branches, the number of auxiliary branches being equal to the number of attribute predictions.
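The idea of one fully connected head per attribute on a shared feature layer can be sketched with NumPy (a minimal illustration; the feature dimension, head names and random weights are assumptions, not the patent's actual configuration):

```python
import numpy as np

def attribute_heads(shared_features, heads):
    """Apply one fully connected layer per auxiliary branch.

    shared_features: (batch, feat_dim) output of the backbone's shared
    feature layer; heads: {attribute_name: (W, b)}, one (W, b) pair per
    attribute, so the number of branches equals the number of
    attribute predictions.
    """
    return {name: shared_features @ W + b for name, (W, b) in heads.items()}

rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 8))  # batch of 2, 8-dim shared features
heads = {
    "age": (rng.normal(size=(8, 1)), np.zeros(1)),     # regression head
    "gender": (rng.normal(size=(8, 2)), np.zeros(2)),  # 2-class head
}
out = attribute_heads(feats, heads)
```

Because every head reads the same shared features, adding a new attribute only means adding one more (W, b) pair, which is the decoupling property described above.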
In this embodiment, the fully connected layers (FC) implement classification and prediction over the face attribute features; that is, they associate face features with attributes and output predicted attribute values for the face features.
It can be understood that different auxiliary branches are attached to the shared feature layer of the backbone network, and each branch carries out its own prediction to output an attribute value. In this embodiment, by decoupling the attributes across different auxiliary branches, the same model can output more attribute prediction results, which is convenient for complex scenarios.
As an alternative implementation, when predicting different attributes of the face features through multiple auxiliary branches, the nine prediction tasks may be divided into regression-type and classification-type tasks, with different loss functions for different task types, so as to improve prediction precision and model training precision. Specifically, the regression-type tasks comprise the prediction of face key points, pose, age, face quality and face value; the classification-type tasks comprise the prediction of expression, gender, glasses and mask.
S120, calculating a loss value for each face attribute prediction result through a loss function, taking the weighted sum of the loss values as the total loss for gradient backpropagation, and continuing training until a preset training-stop condition is met, thereby obtaining the face multi-attribute recognition model.
The real labels corresponding to all face images (i.e. training images) in the training dataset are obtained along with the face attribute prediction results output during training; substituting both into the loss function expressions yields the total loss value of the current training round.
The loss value characterizes the similarity between a face attribute prediction result and its real label: a smaller loss value means the prediction is closer to the real label and the model's recognition accuracy is higher; conversely, a larger loss value means the prediction deviates from the real label and the accuracy is lower.
In one embodiment, three types of loss function are used. The prediction of face key points in the regression-type tasks adopts the Wing Loss function, which amplifies small losses and thereby improves the accuracy of key point regression; the remaining attribute predictions in the regression-type tasks adopt the Smooth L1 Loss function; each attribute prediction in the classification-type tasks adopts the cross entropy loss function, which computes the cross entropy loss value of the corresponding image features.
For the key point prediction task, the Wing Loss function is adopted, expressed as follows:

    wing(x) = w · ln(1 + |x|/ε),  if |x| < w
    wing(x) = |x| − C,  otherwise

where wing(x) represents the calculated loss value; x represents the difference between the predicted value and the real label; w and ε are preset values; and C = w − w·ln(1 + w/ε) is a constant that joins the two pieces continuously at |x| = w.
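A minimal sketch of the piecewise definition above, using w = 10 and ε = 2 as example preset values (the patent does not fix them):

```python
import math

def wing_loss(x, w=10.0, eps=2.0):
    """Wing Loss for key-point regression: logarithmic near zero so
    small errors are amplified, linear for large errors."""
    # C makes the two pieces continuous at |x| == w
    C = w - w * math.log(1.0 + w / eps)
    ax = abs(x)
    if ax < w:
        return w * math.log(1.0 + ax / eps)
    return ax - C
```

The logarithmic region is what distinguishes Wing Loss from L1/L2 distances: small key-point errors still produce meaningful gradients.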
For the pose, age, face quality and face value prediction tasks, the Smooth L1 Loss function is adopted, expressed as follows:

    SmoothL1(x) = 0.5 · x²,  if |x| < 1
    SmoothL1(x) = |x| − 0.5,  otherwise

where SmoothL1(x) represents the calculated loss value and x represents the difference between the predicted value and the real label.
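The piecewise expression above translates directly to code (a minimal sketch of the standard Smooth L1 form with threshold 1):

```python
def smooth_l1(x):
    """Smooth L1 Loss: quadratic (L2-like) near zero for stable
    convergence, linear (L1-like) for large errors so outliers have
    limited influence."""
    ax = abs(x)
    if ax < 1.0:
        return 0.5 * x * x
    return ax - 0.5
```

This combination avoids both drawbacks noted earlier: the L1 convergence problem near zero and the L2 sensitivity to outliers.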
For the expression, gender, glasses and mask prediction tasks, the Cross Entropy Loss function is adopted, expressed as follows:

    H = − Σ_{i=1..n} p(x_i) · log q(x_i)

where H represents the calculated cross entropy loss; p(x_i) is the true label value; q(x_i) is the predicted value; and n is the number of classes.
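The summation above can be sketched as follows (a minimal version; terms with p(x_i) = 0 are skipped so one-hot labels are handled without evaluating log 0):

```python
import math

def cross_entropy(p, q):
    """Cross entropy between the true label distribution p and the
    predicted distribution q over n classes."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)
```

For a one-hot label, the loss reduces to the negative log-probability the model assigns to the correct class.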
It will be appreciated that the above loss function expressions are merely examples; other loss functions may be selected for the loss value calculation in actual use, and this is not limited here. In addition, beyond constructing a loss function directly from the above constraints, further constraint conditions may be added to refine its construction, which is likewise not limited here.
It will be appreciated that the total loss value of each training round is obtained by adding or weighted-summing the loss values calculated by the three types of loss function. In this embodiment, the network parameters of the backbone network and the auxiliary branches are adjusted according to the total loss value computed in each round, improving the model's precision round by round, until the face multi-attribute recognition model obtained after multiple rounds of training meets the accuracy requirement.
In the above process, gradient backpropagation is performed according to the calculated total loss value; for example, gradient descent may be used during backpropagation to iterate the weights in the backbone network and the auxiliary branch structures, and training continues until the preset training-stop condition is met, yielding the trained face multi-attribute recognition model. The weights here mainly include the connection weights between neurons in the network structure, the neuron biases, and so on. For the principle of neural network backpropagation, reference may be made to the published literature; it is not the focus of the present application and is not described here.
The preset training-stop condition may include, but is not limited to, the total loss value becoming sufficiently small, such as approaching 0 or falling within a certain range (i.e. judged against a predetermined loss threshold), or a set number of training rounds being reached; it may be set according to actual requirements and is not limited here. When the total loss value is less than or equal to the predetermined loss threshold, the accuracy of the current model's predictions meets the requirement and training may stop. Conversely, when the loss value is greater than the threshold, a large gap remains between the model's face attribute prediction results and the real labels, and the face multi-attribute recognition model needs further training.
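The weighted summation of S120 and the stopping check described above can be sketched as follows (the per-attribute weights, threshold and round limit are placeholder values chosen for illustration):

```python
def total_loss(losses, weights):
    """Weighted sum of the per-attribute loss values, used as the
    total loss for the gradient backpropagation step."""
    assert len(losses) == len(weights)
    return sum(w * l for w, l in zip(weights, losses))

def should_stop(loss, epoch, loss_threshold=0.01, max_epochs=100):
    """Preset training-stop condition: total loss at or below a
    threshold, or the maximum number of training rounds reached."""
    return loss <= loss_threshold or epoch >= max_epochs
```

In practice the weights would balance the regression-type and classification-type losses so no single attribute dominates training.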
During training, the network parameters of the face multi-attribute recognition model are adjusted on the basis of each round's total loss value to obtain an updated model; after repeated iterative training, if the model's predicted loss value falls within a preset allowable range, the face multi-attribute recognition model can be judged trained. The trained model can then be used to extract the desired multi-attribute face features.
Further, the model application stage is described on the basis of the trained face multi-attribute recognition model. Face images in various poses are acquired in the target application scenario, and the trained face multi-attribute recognition model obtained in the training stage is used to extract the image features.
In the face multi-attribute recognition model of this application, different attribute predictions are performed on the face features through the backbone network and the plurality of auxiliary branches; the model is trained over multiple rounds using the face attribute prediction results and the total loss value computed by the loss functions in each round, finally yielding a high-precision face multi-attribute recognition model. A plurality of face attributes of a face image can then be recognized by this model, satisfying application scenarios that require simultaneous multi-attribute analysis.
Referring to fig. 2, based on the model acquisition method of the above embodiment, the present embodiment provides a face multi-attribute recognition method, which includes:
s210, inputting the target face image into a face multi-attribute recognition model to perform multi-attribute recognition, and obtaining a prediction result of each face attribute.
The target face image is input into the trained face multi-attribute recognition model for face multi-attribute recognition, and the model outputs a prediction result for each face attribute of the target face image; specifically, the results may cover the nine attributes described above (face key points, pose, age, face quality, face value, expression, gender, glasses, and mask).
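Assuming the nine attribute branches listed in the claims (the branch names and the `branches` callables here are illustrative stand-ins for the trained fully connected heads), collecting the per-branch outputs into one result might look like:

```python
# Attribute names follow the regression/classification split in the claims;
# the exact identifiers are illustrative, not from the patent.
REGRESSION_ATTRS = ("keypoints", "pose", "age", "quality", "face_value")
CLASSIFICATION_ATTRS = ("expression", "gender", "glasses", "mask")

def predict_attributes(shared_features, branches):
    """Run every auxiliary branch on the shared backbone features and
    gather the nine attribute predictions into one dictionary."""
    return {name: branches[name](shared_features)
            for name in REGRESSION_ATTRS + CLASSIFICATION_ATTRS}
```

A single forward pass through the shared backbone thus yields all nine prediction results at once, which is the point of the multi-branch design.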
As an alternative implementation manner, before S210 described above, this embodiment further includes the following steps:
s220, obtaining an image containing a human face, extracting a human face prediction frame from the image through a human face detection algorithm, and cutting the human face prediction frame to obtain a target human face image to be input into a human face multi-attribute recognition model.
An image containing a face is acquired; it may be a face image or a full-body image that contains a face. A face detection algorithm then identifies the face region in the image, that region is cropped out, and the cropped result is taken as the target face image.
Specifically, the face contour information in the image can be identified by the face detection algorithm, a face prediction box is then generated from the contour information, and the image is cropped according to that prediction box.
Optionally, the face prediction box may fit the identified face contour, with its edges following the contour edges; alternatively, it may be a box of fixed shape or fixed outline. The specific shape and outline of the box can be set according to actual requirements and are not limited here.
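A minimal sketch of the cropping step follows, assuming the detection algorithm has already produced a prediction box in (x, y, w, h) pixel form; the detector itself is not specified by the patent, and any face detection algorithm applies:

```python
def crop_face(image_rows, box):
    """Cut the face prediction box out of a row-major image.

    `image_rows` is a list of pixel rows; `box` is an assumed
    (x, y, w, h) tuple: top-left corner plus width and height.
    """
    x, y, w, h = box
    return [row[x:x + w] for row in image_rows[y:y + h]]
```

The cropped region is then what gets fed to the face multi-attribute recognition model as the target face image.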
According to the face multi-attribute recognition method provided by this embodiment of the application, the target face image is input into the face multi-attribute recognition model for multi-attribute recognition, so that a plurality of face attributes corresponding to the target face image are output. Compared with the prior art, which can identify only one or two face attributes, the method extends recognition to many attributes and thereby broadens the applicable scenarios. Different branches are connected to the shared feature layer of the backbone network for decoupling, so that the same model can output more attributes, which benefits complex scenarios. The method can be applied to scenarios such as interactive entertainment, beauty special effects, and security monitoring, obtaining multiple items of face attribute information in real time. In addition, the method uses a lightweight recognition model to extract face features and can be deployed on edge devices, improving deployment flexibility.
Referring to fig. 3, based on the model acquisition method of the above embodiment, the present embodiment provides a face multi-attribute recognition model acquisition device 100, including:
the feature extraction module 110 is configured to input the training image into a backbone network for face feature extraction, and then perform different attribute predictions on the extracted face features through a plurality of auxiliary branches to obtain each face attribute prediction result; the backbone network and the plurality of auxiliary branches form the face multi-attribute recognition model;
the network training module 120 is configured to calculate a total loss value of the training by using a loss function according to various face attribute prediction results and real labels of the training images, and input a next training image to continue training when the total loss value of the training does not meet a preset condition, and stop until the calculated total loss value meets the preset condition, so as to obtain a trained face multi-attribute recognition model.
It can be understood that the apparatus of this embodiment corresponds to the method for acquiring a face multi-attribute recognition model of the above embodiment, and the options in the above embodiment are also applicable to this embodiment, so the description thereof will not be repeated here.
Referring to fig. 4, based on the above-mentioned method for recognizing multiple attributes of a face, the present embodiment provides a device 200 for recognizing multiple attributes of a face, which includes:
the recognition module 210 is configured to input the target face image into a face multi-attribute recognition model to perform multi-attribute recognition, so as to obtain a prediction result of each face attribute.
Further optionally, the face multi-attribute identifying apparatus 200 further includes:
the obtaining module 220 is configured to obtain an image including a face, extract a face prediction frame from the image through a face detection algorithm, and cut the face prediction frame to obtain the target face image to be input into the multi-attribute face recognition model.
It can be understood that the apparatus of this embodiment corresponds to the face multi-attribute recognition method of the above embodiment, and the options in the above embodiment are also applicable to this embodiment, so the description is not repeated here.
Referring to fig. 5, a schematic structural diagram of a terminal device 10 according to an embodiment of the present application is shown. The terminal device 10 may exemplarily include a memory 11 and a processor 12, where the memory 11 stores a computer program, and the processor 12 is configured to execute the computer program to implement the face multi-attribute recognition model acquisition method or the face multi-attribute recognition method according to the embodiment of the present application, so that multiple attribute features of a face can be recognized at the same time, and recognition efficiency is improved.
It should be noted that the terminal device 10 in this embodiment of the application may be a device with strong computing capability, such as a computer or a notebook computer, or a mobile terminal or embedded device with very limited computing capability, such as a robot, a smart phone, a tablet, or even a smart home device. The face multi-attribute recognition model obtained by the training method of this embodiment is deployed on the terminal device 10 to perform face multi-attribute detection in the required scenarios, for example detecting multiple face attributes of a disguised person to help determine whether it is the same individual; it can of course also be applied to other scenarios, which are not limited here.
The memory 11 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), etc. The memory 11 stores a computer program which, upon receiving an execution instruction, is executed by the processor 12 accordingly.
The processor 12 may be an integrated circuit chip with signal processing capability. The processor 12 may be a general-purpose processor, including at least one of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), and a Network Processor (NP), or a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor capable of implementing or executing the methods, steps, and logic blocks disclosed in the embodiments of the present application.
The present application also provides a readable storage medium for storing the computer program for use in the above terminal device.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may also be implemented in other manners. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, a segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of such blocks, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
In addition, functional modules or units in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a smart phone, a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application.

Claims (10)

1. A method for acquiring a face multi-attribute recognition model, characterized by comprising the following steps:
inputting a training image into a backbone network for face feature extraction, and performing different attribute predictions on the extracted face features through a plurality of auxiliary branches to obtain each face attribute prediction result;
and calculating the loss value of each face attribute prediction result through a loss function, and carrying out gradient back propagation on the weighted summation of the loss values as total loss to continue training until a preset training stopping condition is met, so as to obtain the face multi-attribute recognition model.
2. The method for acquiring the face multi-attribute recognition model according to claim 1, wherein the attribute prediction tasks comprise regression-type tasks and classification-type tasks, and different loss functions are adopted for the different types of tasks; the regression-type tasks comprise the prediction of face key points, pose, age, face quality, and face value;
the classification task comprises the prediction of facial expression, gender, glasses and mask.
3. The method for acquiring the face multi-attribute recognition model according to claim 2, wherein the Loss function comprises three types of Loss functions, wherein a Wing Loss function is adopted for the prediction of the face key points in the regression task;
the rest attribute prediction in the regression task adopts a Smooth L1 Loss function;
each attribute prediction in the classification task adopts a cross entropy loss function.
4. A face multi-attribute recognition model acquisition method according to claim 3, wherein the expression of the Wing Loss function is as follows:
wing(x) = w·ln(1 + |x|/ε),  when |x| < w;
wing(x) = |x| − C,  otherwise.
wherein wing (x) represents the calculated loss value; x represents the difference between the predicted value and the real label; w and ε are preset values and C is a constant.
5. The method for acquiring the face multi-attribute recognition model according to claim 1, wherein each auxiliary branch comprises a fully connected layer; when the face multi-attribute recognition model is formed, each fully connected layer is connected to a shared feature layer of the backbone network to form the plurality of auxiliary branches, and the number of auxiliary branches equals the number of attribute predictions;
the backbone network adopts a MobileNet-V2 lightweight network.
6. A face multi-attribute recognition method, characterized by comprising the following steps:
inputting the target face image into the face multi-attribute recognition model obtained by the method according to any one of claims 1 to 5 for multi-attribute recognition to obtain a prediction result of each face attribute.
7. The face multi-attribute recognition method according to claim 6, further comprising, before the target face image is input:
and acquiring an image containing a human face, extracting a human face prediction frame from the image through a human face detection algorithm, and cutting the human face prediction frame to obtain the target human face image to be input into the human face multi-attribute recognition model.
8. A face multi-attribute recognition model acquisition device, characterized by comprising:
the feature extraction module is used for inputting a training image into a backbone network for face feature extraction, and performing different attribute predictions on the extracted face features through a plurality of auxiliary branches to obtain each face attribute prediction result; the backbone network and the plurality of auxiliary branches form the face multi-attribute recognition model;
and the network training module is used for calculating the loss value of each face attribute prediction result through a loss function, carrying out gradient back propagation on the weighted summation of the loss values as the total loss so as to continue training until the preset training stopping condition is met, and obtaining the face multi-attribute recognition model.
9. A terminal device, characterized in that it comprises a processor and a memory, the memory storing a computer program, the processor being adapted to execute the computer program to implement the method of any of claims 1-7.
10. A readable storage medium, characterized in that it stores a computer program which, when executed on a processor, implements the method according to any of claims 1-7.
CN202211715157.3A 2022-12-29 2022-12-29 Face multi-attribute identification method and model acquisition method and device thereof Pending CN116386099A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211715157.3A CN116386099A (en) 2022-12-29 2022-12-29 Face multi-attribute identification method and model acquisition method and device thereof


Publications (1)

Publication Number Publication Date
CN116386099A true CN116386099A (en) 2023-07-04

Family

ID=86977524



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination