CN111723613A - Face image data processing method, device, equipment and storage medium - Google Patents

Face image data processing method, device, equipment and storage medium

Info

Publication number
CN111723613A
CN111723613A (application CN201910214184.4A)
Authority
CN
China
Prior art keywords
image
face
feature map
network
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910214184.4A
Other languages
Chinese (zh)
Inventor
任明罡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huiruisitong Information Technology Co Ltd
Original Assignee
Guangzhou Huiruisitong Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huiruisitong Information Technology Co Ltd filed Critical Guangzhou Huiruisitong Information Technology Co Ltd
Priority to CN201910214184.4A priority Critical patent/CN111723613A/en
Publication of CN111723613A publication Critical patent/CN111723613A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G06V40/169: Holistic features and representations, i.e. based on the facial image taken as a whole
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification
    • G06V40/173: Classification, e.g. identification; face re-identification, e.g. recognising unknown faces across different face tracks

Abstract

The application relates to a face image data processing method, apparatus, device, and storage medium, wherein the method comprises: acquiring an image to be processed, the image to be processed containing at least one face; preprocessing the image to be processed to obtain a frontal-pose image of the face; extracting a target feature map from the frontal-pose image by using a preset feature map extraction network; and performing attribute estimation and face recognition on the target feature map respectively to obtain attribute data and a recognition category of the face. The method solves the prior-art problem that running separate models increases the amount of computation and makes the prediction process time-consuming: attribute data can be estimated during face recognition without degrading recognition accuracy, without adding a large amount of computation, and without consuming a large amount of time.

Description

Face image data processing method, device, equipment and storage medium
Technical Field
The present application relates to the field of machine learning applications, and in particular, to a method, an apparatus, a device, and a storage medium for processing facial image data.
Background
In recent years, with the growth of data volume and the increase of computing power, deep learning has been widely applied in many fields, and its application in image detection and recognition is especially broad. As the study of artificial neural networks, deep learning has pushed machine learning a step further.
In practice, the structural characteristics of convolutional neural networks make them particularly well suited to face recognition on digital images. However, face recognition alone no longer satisfies users' needs: people also want to perform gender estimation and/or age estimation from the same face image used for recognition. In the prior art, a face image is fed into one deep network model for face recognition, and the same face image is then fed into another deep network model for gender and/or age estimation. Because multiple deep network models are run for each face image, the amount of computation increases and the prediction process becomes time-consuming.
Disclosure of Invention
To solve, or at least partially solve, the above technical problems, the present application provides a face image data processing method, apparatus, device, and storage medium.
In a first aspect, an embodiment of the present application provides a method for processing face image data, including:
acquiring an image to be processed, wherein the image to be processed comprises at least one face;
preprocessing the image to be processed to obtain a frontal-pose image of the face;
extracting a target feature map from the frontal-pose image by using a preset feature map extraction network;
and performing attribute estimation and face recognition on the target feature map respectively to obtain attribute data and a recognition category of the face.
Optionally, performing attribute estimation on the target feature map to obtain the attribute data of the face includes:
inputting the target feature map into a preset attribute estimation network;
and performing attribute estimation on the target feature map by using the preset attribute estimation network to obtain the attribute data of the face.
Optionally, if the attribute data is age, performing attribute estimation on the target feature map to obtain the attribute data of the face includes:
inputting the target feature map into the preset attribute estimation network to obtain an age similarity vector containing a similarity corresponding to each age label;
multiplying each similarity in the age similarity vector by the value of the corresponding age label to obtain a multiplication result;
and summing the multiplication results to obtain the target age corresponding to the face.
Optionally, performing face recognition on the target feature map to obtain a recognition category of the face, including:
inputting the target feature map into a preset face classification network;
carrying out face recognition on the target feature map by using the preset face classification network to obtain a target feature vector;
and determining the recognition category of the face by using the target feature vector.
Optionally, preprocessing the image to be processed to obtain the frontal-pose image of the face includes:
locating key feature points of the face in the image to be processed;
when the number of the key feature points of the face exceeds a preset value, determining that a face image exists in the image to be processed;
acquiring image brightness information, sharpness information, and symmetry information of the face image;
judging whether the image brightness information, the sharpness information, and the symmetry information all meet the image requirements;
if the image brightness information, the sharpness information, and the symmetry information all meet the image requirements, correcting the face image;
and normalizing the corrected face image to obtain the frontal-pose image.
Optionally, the method further includes:
constructing a face recognition data set, an attribute estimation data set and a basic neural network, wherein each image in the attribute estimation data set corresponds to an attribute data label;
preprocessing each image in the face recognition data set respectively, and preprocessing each image in the attribute estimation data set respectively;
training the basic neural network by using the preprocessed face recognition data set to obtain the preset feature map extraction network;
extracting a first feature map of each image in the preprocessed face recognition data set by using the preset feature map extraction network to obtain a first feature map data set, and extracting a second feature map of each image in the preprocessed attribute estimation data set by using the preset feature map extraction network to obtain a second feature map data set;
respectively constructing a first network and a second network at the output end of the preset feature map extraction network;
training the first network by using the first feature map data set to obtain a preset face classification network;
and training the second network by using the second feature map data set and the attribute data labels respectively corresponding to each second feature map in the second feature map data set to obtain a preset attribute estimation network.
Optionally, training the second network by using the second feature map data set and the attribute data labels respectively corresponding to each second feature map in the second feature map data set to obtain the preset attribute estimation network includes:
dividing the second feature map dataset into a training dataset and a testing dataset;
selecting attribute data labels respectively corresponding to each second feature map in the training data set to obtain a training label data set;
selecting attribute data labels respectively corresponding to each second feature map in the test data set to obtain a test label data set;
training the second network by using the training data set and the training label data set to obtain a trained network;
and testing the trained network by using the test data set and the test label data set to obtain the preset attribute estimation network.
In a second aspect, an embodiment of the present application provides a face image data processing apparatus, including: an acquisition module, a preprocessing module, an extraction module, and a processing module;
the acquisition module is used for acquiring an image to be processed, wherein the image to be processed comprises at least one face;
the preprocessing module is used for preprocessing the image to be processed to obtain a frontal-pose image of the face;
the extraction module is used for extracting a target feature map from the frontal-pose image by using a preset feature map extraction network;
and the processing module is used for performing attribute estimation and face recognition on the target feature map respectively to obtain attribute data and a recognition category of the face.
In a third aspect, an embodiment of the present application provides a face image data processing device, including: a processor, a memory, a communication interface, and a bus;
the processor, the memory, and the communication interface communicate with one another through the bus;
the communication interface is used for information transmission with external devices;
the processor is configured to invoke program instructions in the memory to perform the steps of the method according to any of the first aspects.
In a fourth aspect, the present embodiments provide a computer-readable storage medium storing computer instructions for causing a computer to perform the steps of the method according to any one of the first aspect.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
the method for processing the face image data comprises the following steps: acquiring an image to be processed, wherein the image to be processed comprises at least one face; preprocessing an image to be processed to obtain a positive attitude image of a human face; extracting a target feature map in the positive attitude image by using a preset feature map extraction network; and respectively carrying out attribute estimation and face recognition on the target feature map to obtain attribute data and recognition categories of the face. Therefore, because the face recognition process and the attribute estimation process share one feature map extraction network, and the face recognition process and the attribute estimation process use the same feature map extraction network to output the target feature map, the face recognition accuracy is not influenced, a large amount of calculation is not increased, and a large amount of time is not consumed in the process of simultaneously estimating attribute data in the face recognition. Therefore, the problem that the operation amount is increased and the prediction process is time-consuming in the prior art can be solved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below; it will be obvious to those skilled in the art that other drawings can be derived from these drawings without inventive effort.
Fig. 1 is a flowchart of a method for processing face image data according to an embodiment of the present application;
fig. 2 is a first connection diagram of the preset attribute estimation network and the preset feature map extraction network according to an embodiment of the present application;
fig. 3 is a second connection diagram of the preset attribute estimation network and the preset feature map extraction network according to an embodiment of the present application;
fig. 4 is a connection diagram of the preset face classification network and the preset feature map extraction network according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a face image data processing apparatus according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a face image data processing device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The application provides a face image data processing method, apparatus, device, and storage medium, which solve the prior-art problem of increased computation and a time-consuming prediction process: the accuracy of face recognition is not affected, no large amount of computation is added, and no large amount of time is consumed while attribute data is estimated during face recognition.
First, the face image data processing method in the embodiments of the present application is described in detail. As shown in fig. 1, the method may include steps S101 to S104:
s101, obtaining an image to be processed, wherein the image to be processed comprises at least one face.
Illustratively, the image to be processed may be a snapshot image or a frame in a video stream.
S102, preprocessing the image to be processed to obtain a frontal-pose image of the face.
The preprocessing may include: locating key feature points of the face, face detection, image quality screening, image normalization, image correction, and the like.
S103, extracting a target feature map from the frontal-pose image by using a preset feature map extraction network.
For example, the preset feature map extraction network may be a MobileNet. A MobileNet has few parameters, requires few addition and multiplication operations, and therefore has a small computational cost.
S104, performing attribute estimation and face recognition on the target feature map respectively to obtain the attribute data and recognition category of the face.
Specifically, attribute estimation is performed on the target feature map to obtain the attribute data of the face, and face recognition is performed on the target feature map to obtain the recognition category of the face.
The recognition category may include: whether the face is in a preset database, and/or the identity information corresponding to the face.
The face image data processing method comprises: acquiring an image to be processed, the image containing at least one face; preprocessing the image to obtain a frontal-pose image of the face; extracting a target feature map from the frontal-pose image by using a preset feature map extraction network; and performing attribute estimation and face recognition on the target feature map respectively to obtain the attribute data and recognition category of the face. Because the face recognition process and the attribute estimation process share a single feature map extraction network and both operate on the same target feature map output by that network, attribute data can be estimated at the same time as face recognition without degrading recognition accuracy, without adding a large amount of computation, and without consuming a large amount of time. This solves the prior-art problem of increased computation and a time-consuming prediction process.
Furthermore, the preset feature map extraction network can be a lightweight convolutional neural network, which reduces the hardware requirements for model deployment and makes the face image data processing method provided by the embodiments of the present application suitable for offline mobile terminal devices.
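As an illustration of this shared-backbone design, the following is a minimal inference-flow sketch in PyTorch, assuming a MobileNet-style backbone truncated so that it outputs a feature map; the function and module names are hypothetical placeholders, not names used in this application:

```python
import torch
import torch.nn as nn

def process_face_image(frontal_image: torch.Tensor,
                       backbone: nn.Module,
                       attribute_head: nn.Module,
                       face_head: nn.Module):
    """Run attribute estimation and face recognition on one shared feature map.

    frontal_image:  a preprocessed frontal-pose image, e.g. shape (1, 3, 112, 112).
    backbone:       the preset feature map extraction network.
    attribute_head: the preset attribute estimation network.
    face_head:      the preset face classification network.
    """
    with torch.no_grad():
        # The feature map is computed once and reused by both tasks; this
        # sharing is what avoids the duplicated computation of the prior art.
        target_feature_map = backbone(frontal_image)
        attribute_data = attribute_head(target_feature_map)  # e.g. age/gender
        face_embedding = face_head(target_feature_map)       # e.g. 128-d vector
    return attribute_data, face_embedding
```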
In another embodiment of the present invention, on the basis of the foregoing embodiments of steps S101 to S104, performing attribute estimation on the target feature map to obtain the attribute data of the face may include steps S201 to S202:
s201, inputting the target characteristic diagram into a preset attribute estimation network.
The preset attribute estimation network may include: a global average pooling layer and one or more fully connected layers. The number of global average pooling layers can be set according to actual needs, as can the number of fully connected layers and the number of output categories of each fully connected layer.
S202, carrying out attribute estimation on the target feature map by using the preset attribute estimation network to obtain the attribute data of the human face.
Illustratively, the attribute data may include at least one of age, gender, and a liveness (living body) detection result.
The attribute data may also include other data according to actual use requirements.
In another embodiment of the present invention, on the basis of the foregoing steps S201 to S202, if the attribute data is age, performing attribute estimation on the target feature map to obtain the attribute data of the face may include steps S301 to S303:
S301, inputting the target feature map into the preset attribute estimation network to obtain an age similarity vector containing a similarity corresponding to each age label.
The age labels may range from 0 to 100 years and take integer values. Illustratively, the age similarity vector may be [p_0, p_1, p_2, ..., p_i, ..., p_100], where p_i corresponds to the age label value x_i, and x_i = i for i = 0, 1, ..., 100.
S302, multiplying each similarity in the age similarity vector by the value of its corresponding age label to obtain a multiplication result.
This yields the multiplication results 0×p_0, 1×p_1, ..., 99×p_99, and 100×p_100.
S303, summing the multiplication results to obtain the target age corresponding to the face. That is, the target age is the expected value Σ_i x_i·p_i of the age distribution.
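As a concrete illustration of steps S301 to S303, a minimal sketch of this expected-value computation in Python, assuming the age branch outputs a 101-dimensional probability vector (the variable names are illustrative only):

```python
import numpy as np

def expected_age(age_similarity: np.ndarray) -> float:
    """Target age = sum over i of (x_i * p_i), with x_i = i for i = 0..100.

    age_similarity: 101-dim vector of similarities (e.g. softmax probabilities),
    one per age label.
    """
    labels = np.arange(len(age_similarity))       # x_i = i
    return float(np.dot(labels, age_similarity))  # weighted sum = expectation

# Example: a distribution peaked around 30 yields a target age of 30.
p = np.zeros(101)
p[29:32] = [0.2, 0.6, 0.2]
print(expected_age(p))  # 30.0
```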
In another embodiment of the present invention, on the basis of the foregoing steps S201 to S202, if the attribute data is gender, performing attribute estimation on the target feature map to obtain the attribute data of the face may include steps S601 to S602:
S601, inputting the target feature map into the preset attribute estimation network to obtain a gender similarity vector containing a similarity corresponding to each gender label.
The gender label content may be male or female. For example, the gender similarity vector may be [p_M, p_W], where p_M corresponds to the gender label "male" and p_W corresponds to the gender label "female".
S602, determining the content of the gender label with the larger similarity as the target gender.
That is, if p_M is greater than p_W, the target gender is male; if p_M is less than p_W, the target gender is female.
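A minimal sketch of this decision rule; the same pattern applies to the liveness result in steps S701 to S702 below. The label strings are illustrative placeholders:

```python
def pick_label(similarities, labels):
    """Return the label content whose similarity is largest."""
    best_index = max(range(len(similarities)), key=lambda i: similarities[i])
    return labels[best_index]

print(pick_label([0.8, 0.2], ["male", "female"]))    # male
print(pick_label([0.1, 0.9], ["live", "not live"]))  # not live
```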
In another embodiment of the present invention, on the basis of the foregoing steps S201 to S202, if the attribute data is a liveness detection result, performing attribute estimation on the target feature map to obtain the attribute data of the face may include steps S701 to S702:
S701, inputting the target feature map into the preset attribute estimation network to obtain a liveness detection similarity vector containing a similarity corresponding to each liveness result label.
The liveness result label content may be "live" or "not live". Illustratively, the liveness similarity vector may be [p_Y, p_N], where p_Y corresponds to the label "live" and p_N corresponds to the label "not live".
S702, determining the content of the liveness result label with the larger similarity as the target liveness detection result.
That is, if p_Y is greater than p_N, the target liveness result is "live"; if p_Y is less than p_N, the target liveness result is "not live".
In another embodiment of the present invention, on the basis of the foregoing steps S201 to S202, if the attribute data includes age and gender, performing attribute estimation on the target feature map to obtain the attribute data of the face may include steps S801 to S803:
S801, inputting the target feature map into the preset attribute estimation network to obtain an age similarity vector containing a similarity corresponding to each age label and a gender similarity vector containing a similarity corresponding to each gender label.
The age labels may range from 0 to 100 years and take integer values. Illustratively, the age similarity vector may be [p_0, p_1, p_2, ..., p_i, ..., p_100], where p_i corresponds to the age label value x_i, and x_i = i for i = 0, 1, ..., 100.
The gender label content may be male or female. For example, the gender similarity vector may be [p_M, p_W], where p_M corresponds to the gender label "male" and p_W corresponds to the gender label "female".
S802, multiplying each similarity in the age similarity vector by the value of its corresponding age label to obtain a multiplication result, and summing the multiplication results to obtain the target age corresponding to the face.
This yields the multiplication results 0×p_0, 1×p_1, ..., 99×p_99, and 100×p_100.
S803, determining the content of the gender label with the larger similarity in the gender similarity vector as the target gender.
Further, take the case where the attribute data includes age and gender as an example. Illustratively, as shown in fig. 2, the first output end of the preset feature map extraction network is connected to the input end of the preset attribute estimation network, and the preset attribute estimation network may include: a global average pooling layer, a first fully connected layer, a second fully connected layer, and a third fully connected layer. The first output end of the preset feature map extraction network is connected to the input end of the global average pooling layer; the output end of the global average pooling layer is connected to the input end of the first fully connected layer; the first output end of the first fully connected layer is connected to the input end of the second fully connected layer; and the second output end of the first fully connected layer is connected to the input end of the third fully connected layer. The second and third fully connected layers may be arranged in parallel.
Preferably, the number of output categories of the first fully connected layer may be 256, that of the second fully connected layer may be 101, and that of the third fully connected layer may be 2.
The second fully connected layer may output the age similarity vector (101-dimensional) for the age categories, and the third fully connected layer may output the gender similarity vector (2-dimensional) for the gender categories.
The second and third fully connected layers are each followed by a cross-entropy loss function; these cross-entropy loss functions are used when training the preset attribute estimation network.
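The following is a minimal PyTorch sketch of such an attribute estimation head, assuming the backbone's feature map has 512 channels; the class name and channel count are assumptions for illustration and are not specified by this application:

```python
import torch
import torch.nn as nn

class AttributeHead(nn.Module):
    """Global average pooling + shared FC layer, branching into age and gender."""

    def __init__(self, in_channels: int = 512):
        super().__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)      # global average pooling layer
        self.fc1 = nn.Linear(in_channels, 256)  # first FC layer: 256 outputs
        self.fc_age = nn.Linear(256, 101)       # second FC layer: ages 0..100
        self.fc_gender = nn.Linear(256, 2)      # third FC layer: male/female

    def forward(self, feature_map: torch.Tensor):
        x = self.gap(feature_map).flatten(1)    # (N, C, H, W) -> (N, C)
        x = self.fc1(x)
        # During training, each branch's logits feed a cross-entropy loss;
        # at inference, softmax turns them into the similarity vectors above.
        age_similarity = torch.softmax(self.fc_age(x), dim=1)        # 101-dim
        gender_similarity = torch.softmax(self.fc_gender(x), dim=1)  # 2-dim
        return age_similarity, gender_similarity
```

Extending this head to the age, gender, and liveness variant of fig. 3 below would simply add a fourth two-way linear branch after fc1.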
In another embodiment of the present invention, on the basis of the foregoing steps S801 to S803, if the attribute data includes age, gender, and a liveness detection result, performing attribute estimation on the target feature map to obtain the attribute data of the face may further include steps S901 to S902:
S901, inputting the target feature map into the preset attribute estimation network to obtain a liveness detection similarity vector containing a similarity corresponding to each liveness result label.
The liveness result label content may be "live" or "not live". Illustratively, the liveness similarity vector may be [p_Y, p_N], where p_Y corresponds to the label "live" and p_N corresponds to the label "not live".
S902, determining the content of the liveness result label with the larger similarity in the liveness similarity vector as the target liveness detection result.
That is, if p_Y is greater than p_N, the target liveness result is "live"; if p_Y is less than p_N, the target liveness result is "not live".
Further, take the case where the attribute data includes age, gender, and a liveness detection result as an example. Illustratively, as shown in fig. 3, the first output end of the preset feature map extraction network is connected to the input end of the preset attribute estimation network, and the preset attribute estimation network may include: a global average pooling layer, a first fully connected layer, a second fully connected layer, a third fully connected layer, and a fourth fully connected layer. The first output end of the preset feature map extraction network is connected to the input end of the global average pooling layer; the output end of the global average pooling layer is connected to the input end of the first fully connected layer; the first output end of the first fully connected layer is connected to the input end of the second fully connected layer; the second output end of the first fully connected layer is connected to the input end of the third fully connected layer; and the third output end of the first fully connected layer is connected to the input end of the fourth fully connected layer. The second, third, and fourth fully connected layers may be arranged in parallel.
Preferably, the number of output categories of the first fully connected layer may be 256, that of the second fully connected layer may be 101, that of the third fully connected layer may be 2, and that of the fourth fully connected layer may be 2.
The second fully connected layer may output the age similarity vector (101-dimensional) for the age categories, the third fully connected layer may output the gender similarity vector (2-dimensional) for the gender categories, and the fourth fully connected layer may output the liveness similarity vector (2-dimensional) for the liveness result categories.
The second, third, and fourth fully connected layers are each followed by a cross-entropy loss function; these cross-entropy loss functions are used when training the preset attribute estimation network.
The embodiments of the present application can thus distinguish groups such as children, young people, middle-aged people, and the elderly. For example, age classification (0, 1, ..., 99, or 100 years), gender classification (male or female), and liveness classification (live or not live) are performed.
In another embodiment of the present invention, on the basis of the foregoing embodiments of steps S101 to S104, performing face recognition on the target feature map to obtain the recognition category of the face may include steps S401 to S403:
s401, inputting the target feature map into a preset face classification network.
Illustratively, as shown in fig. 4, the second output end of the preset feature map extraction network is connected to the input end of the preset face classification network, and the preset face classification network may include: a first convolutional layer, a second convolutional layer, and a fifth fully connected layer. The second output end of the preset feature map extraction network is connected to the input end of the first convolutional layer; the output end of the first convolutional layer is connected to the input end of the second convolutional layer; and the output end of the second convolutional layer is connected to the input end of the fifth fully connected layer.
Preferably, the first convolutional layer may be a global depthwise convolutional layer (7×7). The number of output categories of the fifth fully connected layer may be 128.
S402, carrying out face recognition on the target feature map by using the preset face classification network to obtain a target feature vector.
The output end of the fifth fully connected layer outputs the target feature vector. The fifth fully connected layer is followed by a cross-entropy loss function, which is used when training the preset face classification network.
S403, determining the recognition category of the face by using the target feature vector.
Specifically, the similarity between the target feature vector and the feature vectors of database images may be calculated to determine the recognition category of the face, i.e., whether the face is in the preset database and/or which identity information corresponds to the face.
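A sketch of this matching step, assuming the database stores one 128-dimensional feature vector per enrolled identity and that cosine similarity with a fixed threshold is used; the similarity measure and the threshold value are illustrative assumptions, as this application does not fix them:

```python
import numpy as np

def identify(target_vector: np.ndarray, gallery: dict, threshold: float = 0.5):
    """Compare the target feature vector against database feature vectors.

    gallery: mapping from identity name to a stored 128-d feature vector.
    Returns the best-matching identity, or None when no similarity exceeds
    the (assumed) threshold, i.e. the face is not in the preset database.
    """
    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    best_identity, best_similarity = None, threshold
    for identity, stored_vector in gallery.items():
        similarity = cosine(target_vector, stored_vector)
        if similarity > best_similarity:
            best_identity, best_similarity = identity, similarity
    return best_identity
```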
In another embodiment of the present invention, based on the foregoing embodiments of step S101 to step S104, step S102 may include step S1021 to step S1026:
s1021, positioning the key feature points of the face in the image to be processed.
For example, the key feature points of the face may be contour points of parts of the face such as the eyes, nose tip, mouth corners, and eyebrows. Specifically, the MTCNN (Multi-task Cascaded Convolutional Networks) algorithm may be used to locate the key feature points of the face in the image to be processed.
S1022, when the number of the key feature points of the face exceeds a preset value, determining that a face image exists in the image to be processed.
Whether a face image exists in the image to be processed may likewise be determined using the MTCNN algorithm.
The preset value can be set according to actual needs.
For example, the preset value may be 3; when the number of key feature points exceeds 3, it is determined that a face image exists in the image to be processed.
Step S1022 may also be: when the located key feature points of the face meet a preset condition, determining that a face image exists in the image to be processed. For example, the preset condition may be: the located key feature points include the contour points of both eyes, the nose tip, and the mouth corners.
S1023, acquiring image brightness information, sharpness information, and symmetry information of the face image.
S1024, judging whether the image brightness information, the sharpness information, and the symmetry information all meet the image requirements.
S1025, if the image brightness information, the sharpness information, and the symmetry information all meet the image requirements, correcting the face image.
The face image may be corrected using a similarity transformation; for example, a face image with a turned or tilted head can be corrected.
S1026, normalizing the corrected face image to obtain the frontal-pose image.
The corrected face image may be normalized using a bilinear interpolation algorithm. Illustratively, the normalized size may be 112×112 pixels.
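A sketch of steps S1021 to S1026, assuming five MTCNN landmarks (eye centers, nose tip, mouth corners) and OpenCV for the similarity transform and bilinear resize; the reference landmark coordinates below are common alignment values used here as an illustrative assumption, and the brightness/sharpness/symmetry screening of steps S1023 to S1025 is omitted for brevity:

```python
import cv2
import numpy as np

# Assumed reference positions of five landmarks in a 112x112 frontal-pose image
# (left eye, right eye, nose tip, left mouth corner, right mouth corner).
REFERENCE_POINTS = np.float32([[38.3, 51.7], [73.5, 51.5], [56.0, 71.7],
                               [41.5, 92.4], [70.7, 92.2]])

def to_frontal_pose(image, landmarks):
    """Correct and normalize one detected face to a 112x112 frontal-pose image.

    landmarks: (5, 2) array of key feature points located by, e.g., MTCNN.
    Returns None when too few landmarks are found (no face image present).
    """
    if landmarks is None or len(landmarks) <= 3:  # the preset value of 3 above
        return None
    # Estimate a similarity transform (rotation + uniform scale + translation)
    # mapping the detected landmarks onto the reference positions.
    matrix, _ = cv2.estimateAffinePartial2D(np.float32(landmarks),
                                            REFERENCE_POINTS)
    # warpAffine with INTER_LINEAR resamples by bilinear interpolation.
    return cv2.warpAffine(image, matrix, (112, 112), flags=cv2.INTER_LINEAR)
```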
In another embodiment of the present invention, on the basis of the foregoing embodiments of step S101 to step S104, the method for processing facial image data may further include step S501 to step S507:
s501, a face recognition data set, an attribute estimation data set and a basic neural network are constructed, wherein each image in the attribute estimation data set corresponds to an attribute data label.
Illustratively, the attribute estimation data set may contain 300,000 images, and the face recognition data set may contain 2,000,000 images.
Illustratively, if the attribute data includes age and gender, each image in the attribute estimation data set corresponds to an age label and a gender label. The age labels may range from 0 to 100 years and take integer values.
Illustratively, if the attribute data includes age, gender, and a liveness detection result, each image in the attribute estimation data set corresponds to an age label, a gender label, and a liveness result label. The age labels may range from 0 to 100 years and take integer values. The gender label content may be male or female. The liveness result label content may be "live" or "not live".
The basic neural network may be a convolutional neural network, such as a MobileFaceNet model.
The face recognition data set may be a data set such as VGGFace2 or CASIA, and the attribute estimation data set may be an IMDB data set.
S502, preprocessing each image in the face recognition data set respectively, and preprocessing each image in the attribute estimation data set respectively.
The preprocessing may include: locating key feature points of the face, face detection, image quality screening, image normalization, image correction, and the like.
S503, training the basic neural network by using the preprocessed face recognition data set to obtain the preset feature map extraction network.
The preset feature map extraction network can be a MobileNet network.
S504, extracting a first feature map of each image in the preprocessed face recognition data set by using the preset feature map extraction network to obtain a first feature map data set, and extracting a second feature map of each image in the preprocessed attribute estimation data set by using the preset feature map extraction network to obtain a second feature map data set.
And S505, respectively constructing a first network and a second network at the output end of the preset feature map extraction network.
The first network and the second network are arranged in parallel.
S506, training the first network by using the first feature map data set to obtain a preset face classification network.
And S507, training the second network by using the second feature map data set and the attribute data labels respectively corresponding to each second feature map in the second feature map data set to obtain a preset attribute estimation network.
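A sketch of one way to realize steps S503 to S507 in PyTorch, training a head on feature maps cached from the backbone; the helper below is a generic single-branch trainer, and freezing the backbone after stage one is an assumption implied by reusing its extracted feature maps:

```python
import torch
import torch.nn as nn

def train_head(head: nn.Module, cached_maps, epochs: int = 5) -> nn.Module:
    """Train one network (first or second) on pre-extracted feature maps.

    cached_maps: iterable of (feature_map, label) pairs, i.e. a feature map
    data set paired with its label data set. Uses the cross-entropy loss
    named in the text above; labels are class indices.
    """
    optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for feature_map, label in cached_maps:
            optimizer.zero_grad()
            loss = loss_fn(head(feature_map), label)
            loss.backward()
            optimizer.step()
    return head

# Usage (names hypothetical): after training the backbone on the face
# recognition data set (S503), extract and cache feature maps once (S504):
#   first_maps  = [(backbone(x), y) for x, y in face_recognition_data]
#   second_maps = [(backbone(x), y) for x, y in attribute_data]
# then train the two parallel heads on the cached maps (S505 to S507):
#   face_net      = train_head(face_head, first_maps)
#   attribute_net = train_head(attribute_head, second_maps)
```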
In still another embodiment of the present invention, on the basis of the foregoing embodiments of step S501 to step S507, step S507 may include step S5071 to step S5075:
s5071, the second feature map dataset is divided into a training dataset and a testing dataset.
In particular, the second feature map data set may be divided into two parts, with 90% of the data in the second feature map data set constituting the training data set and the remaining 10% of the data in the second feature map data set constituting the testing data set.
S5072, selecting attribute data labels corresponding to each second feature map in the training data set, to obtain a training label data set.
And S5073, selecting attribute data labels respectively corresponding to the second feature maps in the test data set to obtain the test label data set.
And S5074, training the second network by using the training data set and the training label data set to obtain a trained network.
S5075, testing the trained network by using the test data set and the test label data set to obtain the preset attribute estimation network.
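As a concrete illustration of the 90/10 division and label selection in steps S5071 to S5073, assuming the second feature map data set and its attribute data labels are held in two parallel lists (an assumption about data layout, not the application's storage format):

```python
import random

def split_dataset(feature_maps, labels, train_fraction=0.9, seed=0):
    """Divide the second feature map data set into training and test parts,
    selecting the attribute data label corresponding to each feature map."""
    indices = list(range(len(feature_maps)))
    random.Random(seed).shuffle(indices)
    cut = int(train_fraction * len(indices))
    train_idx, test_idx = indices[:cut], indices[cut:]
    train_data   = [feature_maps[i] for i in train_idx]
    train_labels = [labels[i] for i in train_idx]   # training label data set
    test_data    = [feature_maps[i] for i in test_idx]
    test_labels  = [labels[i] for i in test_idx]    # test label data set
    return train_data, train_labels, test_data, test_labels
```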
In another embodiment of the present invention, the face image data processing apparatus of the embodiments of the present application is described in detail. As shown in fig. 5, it includes: an acquisition module 51, a preprocessing module 52, an extraction module 53, and a processing module 54.
The obtaining module 51 is configured to obtain an image to be processed, where the image to be processed includes at least one human face.
The preprocessing module 52 is configured to preprocess the image to be processed to obtain a frontal-pose image of the face.
The extracting module 53 is configured to extract a target feature map from the frontal-pose image by using a preset feature map extraction network.
The processing module 54 is configured to perform attribute estimation and face recognition on the target feature map respectively to obtain attribute data and a recognition category of the face.
In another embodiment of the present invention, the face image data processing device of the embodiments of the present application is described in detail. As shown in fig. 6, the device includes: a processor 601, a memory 602, a communication interface 603, and a bus 604.
The processor 601, the memory 602, and the communication interface 603 communicate with one another through the bus 604.
The communication interface 603 is used for information transmission with external devices.
Illustratively, an external device may be a user equipment (UE).
The processor 601 is configured to call program instructions in the memory 602 to execute the steps of the facial image data processing method according to any one of the above embodiments.
Specifically, the processor 601 is configured to execute the face image data processing program to implement the following steps: acquiring an image to be processed, the image to be processed containing at least one face; preprocessing the image to be processed to obtain a frontal-pose image of the face; extracting a target feature map from the frontal-pose image by using a preset feature map extraction network; and performing attribute estimation and face recognition on the target feature map respectively to obtain the attribute data and recognition category of the face.
In a further embodiment of the present invention, a computer-readable storage medium in an embodiment of the present application is described in detail, and the computer-readable storage medium stores computer instructions that cause the computer to execute the steps of the facial image data processing method according to any one of the above embodiments.
The computer-readable storage medium may store one or more computer instructions. The computer-readable storage medium may include volatile memory, such as random access memory; the computer-readable storage medium may also include non-volatile memory, such as read-only memory, flash memory, a hard disk, or a solid state disk; the computer readable storage medium may also include a combination of memories of the above kinds.
Specifically, the computer instructions cause the computer to perform the following steps: acquiring an image to be processed, the image to be processed containing at least one face; preprocessing the image to be processed to obtain a frontal-pose image of the face; extracting a target feature map from the frontal-pose image by using a preset feature map extraction network; and performing attribute estimation and face recognition on the target feature map respectively to obtain the attribute data and recognition category of the face.
It is noted that, in this document, relational terms such as "first" and "second" may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A face image data processing method, characterized by comprising the following steps:
acquiring an image to be processed, wherein the image to be processed comprises at least one face;
preprocessing the image to be processed to obtain a frontal-pose image of the face;
extracting a target feature map from the frontal-pose image by using a preset feature map extraction network;
and performing attribute estimation and face recognition on the target feature map respectively to obtain attribute data and a recognition category of the face.
2. The method of claim 1, wherein performing attribute estimation on the target feature map to obtain the attribute data of the face comprises:
inputting the target feature map into a preset attribute estimation network;
and performing attribute estimation on the target feature map by using the preset attribute estimation network to obtain the attribute data of the face.
3. The method according to claim 2, wherein, if the attribute data is age, performing attribute estimation on the target feature map to obtain the attribute data of the face comprises:
inputting the target feature map into the preset attribute estimation network to obtain an age similarity vector containing a similarity corresponding to each age label;
multiplying each similarity in the age similarity vector by the value of the corresponding age label to obtain a multiplication result;
and summing the multiplication results to obtain the target age corresponding to the face.
4. The method of claim 1, wherein performing face recognition on the target feature map to obtain the recognition category of the face comprises:
inputting the target feature map into a preset face classification network;
carrying out face recognition on the target feature map by using the preset face classification network to obtain a target feature vector;
and determining the recognition category of the face by using the target feature vector.
5. The method according to claim 1, wherein preprocessing the image to be processed to obtain the frontal-pose image of the face comprises:
locating key feature points of the face in the image to be processed;
when the number of the key feature points of the face exceeds a preset value, determining that a face image exists in the image to be processed;
acquiring image brightness information, sharpness information, and symmetry information of the face image;
judging whether the image brightness information, the sharpness information, and the symmetry information all meet the image requirements;
if the image brightness information, the sharpness information, and the symmetry information all meet the image requirements, correcting the face image;
and normalizing the corrected face image to obtain the frontal-pose image.
6. The method of claim 1, further comprising:
constructing a face recognition data set, an attribute estimation data set and a basic neural network, wherein each image in the attribute estimation data set corresponds to an attribute data label;
preprocessing each image in the face recognition data set respectively, and preprocessing each image in the attribute estimation data set respectively;
training the basic neural network by using the preprocessed face recognition data set to obtain the preset feature map extraction network;
extracting a first feature map of each image in the preprocessed face recognition data set by using the preset feature map extraction network to obtain a first feature map data set, and extracting a second feature map of each image in the preprocessed attribute estimation data set by using the preset feature map extraction network to obtain a second feature map data set;
respectively constructing a first network and a second network at the output end of the preset feature map extraction network;
training the first network by using the first feature map data set to obtain a preset face classification network;
and training the second network by using the second feature map data set and the attribute data labels respectively corresponding to each second feature map in the second feature map data set to obtain a preset attribute estimation network.
7. The method of claim 6, wherein training the second network by using the second feature map data set and the attribute data labels respectively corresponding to each second feature map in the second feature map data set to obtain the preset attribute estimation network comprises:
dividing the second feature map dataset into a training dataset and a testing dataset;
selecting attribute data labels respectively corresponding to each second feature map in the training data set to obtain a training label data set;
selecting attribute data labels respectively corresponding to each second feature map in the test data set to obtain a test label data set;
training the second network by using the training data set and the training label data set to obtain a trained network;
and testing the trained network by using the test data set and the test label data set to obtain the preset attribute estimation network.
8. A face image data processing apparatus, characterized by comprising: an acquisition module, a preprocessing module, an extraction module, and a processing module;
the acquisition module is used for acquiring an image to be processed, wherein the image to be processed comprises at least one face;
the preprocessing module is used for preprocessing the image to be processed to obtain a frontal-pose image of the face;
the extraction module is used for extracting a target feature map from the frontal-pose image by using a preset feature map extraction network;
and the processing module is used for performing attribute estimation and face recognition on the target feature map respectively to obtain attribute data and a recognition category of the face.
9. A face image data processing apparatus characterized by comprising: a processor, a memory, a communication interface, and a bus;
the processor, the memory, and the communication interface communicate with one another through the bus;
the communication interface is used for information transmission with external devices;
the processor is configured to invoke program instructions in the memory to perform the steps of the method of any of claims 1 to 7.
10. A computer-readable storage medium storing computer instructions for causing a computer to perform the steps of the method of any one of claims 1 to 7.
CN201910214184.4A 2019-03-20 2019-03-20 Face image data processing method, device, equipment and storage medium Pending CN111723613A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910214184.4A CN111723613A (en) 2019-03-20 2019-03-20 Face image data processing method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910214184.4A CN111723613A (en) 2019-03-20 2019-03-20 Face image data processing method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111723613A (en) 2020-09-29

Family

ID=72562545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910214184.4A Pending CN111723613A (en) 2019-03-20 2019-03-20 Face image data processing method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111723613A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095833A (en) * 2014-05-08 2015-11-25 中国科学院声学研究所 Network constructing method for human face identification, identification method and system
CN106096538A (en) * 2016-06-08 2016-11-09 中国科学院自动化研究所 Face identification method based on sequencing neural network model and device
CN106447625A (en) * 2016-09-05 2017-02-22 北京中科奥森数据科技有限公司 Facial image series-based attribute identification method and device
CN106503669A (en) * 2016-11-02 2017-03-15 重庆中科云丛科技有限公司 A kind of based on the training of multitask deep learning network, recognition methods and system
CN107622261A (en) * 2017-11-03 2018-01-23 北方工业大学 Face age estimation method and device based on deep learning
CN107766850A (en) * 2017-11-30 2018-03-06 电子科技大学 Based on the face identification method for combining face character information
CN108304829A (en) * 2018-03-08 2018-07-20 北京旷视科技有限公司 Face identification method, apparatus and system
CN108596011A (en) * 2017-12-29 2018-09-28 中国电子科技集团公司信息科学研究院 A kind of face character recognition methods and device based on combined depth network


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510000 no.2-8, North Street, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Applicant after: Guangzhou huiruisitong Technology Co.,Ltd.

Address before: 510000 no.2-8, North Street, Nancun Town, Panyu District, Guangzhou City, Guangdong Province

Applicant before: GUANGZHOU HUIRUI SITONG INFORMATION TECHNOLOGY Co.,Ltd.
