CN114038044A - Face gender and age identification method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN114038044A
Authority
CN
China
Prior art keywords
age
gender
face
image
prediction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111398427.8A
Other languages
Chinese (zh)
Inventor
杨凯
罗超
梁贤朋
邹宇
程国强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ctrip Travel Information Technology Shanghai Co Ltd
Original Assignee
Ctrip Travel Information Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ctrip Travel Information Technology Shanghai Co Ltd
Priority to CN202111398427.8A
Publication of CN114038044A
Status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the technical field of image processing, and provides a face gender and age identification method and device, an electronic device, and a storage medium. The face gender and age identification method comprises the following steps: intercepting a face image from an initial image based on fusion detection combining face detection and portrait detection; inputting the face image into a gender and age recognition model to obtain the gender prediction output and the age prediction output of the model, the model being provided with a dual-branch convolutional neural network used for gender prediction and age prediction respectively; and obtaining a gender recognition result and an age recognition result of the face image according to the gender prediction output and the age prediction output respectively. By fusing face detection and portrait detection, the invention obtains reliable face images; by predicting the gender and age of the face simultaneously with the dual-branch model, it obtains the face gender and age recognition results accurately and efficiently.

Description

Face gender and age identification method and device, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of image processing, in particular to a face gender and age identification method, a face gender and age identification device, electronic equipment and a storage medium.
Background
Currently, pictures are important means for displaying goods, browsing and booking by users on an Online service platform, such as an OTA (Online Travel Agency) platform. By mining and analyzing the picture content, personalized accurate recommendation can be provided for the user during user search, and the use experience of the user is improved. In a travel scene, information such as gender and age of people contained in the picture is an important reference dimension for searching and recommending travel products.
Many face gender and age identification methods already exist, but the openness and complexity of OTA scenes lead to low detection accuracy, making these methods difficult to deploy in OTA applications. The specific reasons include:
pictures containing non-real faces such as statues and cartoon characters are more common in OTA scenes than in other scenes, and such faces are difficult to filter out;
face detection is affected by camera shake, focusing problems, lighting, and various beautifying filters;
detection errors are caused by faces that are too small or unclear, occlusion by masks, sunglasses or hats, and angle deflections of varying degrees.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the invention, and therefore may include information that does not constitute prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of the above, the present invention provides a method, an apparatus, an electronic device and a storage medium for identifying gender and age of a human face, which can acquire a reliable human face image through fusion detection of human face detection and portrait detection, and predict gender and age of the human face simultaneously by using a gender and age identification model with a dual-branch convolutional neural network, thereby accurately and efficiently obtaining an identification result of gender and age of the human face.
According to one aspect of the invention, a face gender and age identification method is provided, which comprises the following steps: based on the fusion detection of face detection and portrait detection, intercepting a face image from an initial image; inputting the face image into a gender and age identification model to obtain gender prediction output and age prediction output of the gender and age identification model; the gender and age identification model is provided with a double-branch convolutional neural network, and the double-branch convolutional neural network is used for carrying out gender prediction and age prediction respectively; and acquiring a gender identification result and an age identification result of the face image according to the gender prediction output and the age prediction output respectively.
In some embodiments, the backbone network of the gender age identification model comprises: the public convolutional neural network is used for extracting public features of the face image; and the double-branch convolutional neural network is connected with the public convolutional neural network and is used for respectively extracting gender characteristics and age characteristics of the public characteristics extracted by the public convolutional neural network and respectively outputting the gender prediction output and the age prediction output.
In some embodiments, the backbone network of the gender age identification model employs a Res2Net network; before the face image is input into a gender and age recognition model, the method further comprises a model training process, wherein the model training process comprises the following steps: training the Res2Net network based on a first sample set to obtain a first model with a first parameter set; freezing the front end part of the Res2Net network, and training the first model based on a second sample set to obtain a second model with a second parameter set; combining the first model and the second model to enable the front end part to form the public convolutional neural network, and enabling the back end part with first parameters and the back end part with second parameters of the Res2Net network to respectively form the double-branch convolutional neural network; wherein the first set of samples is a gender tagged set of samples and the second set of samples is an age tagged set of samples; alternatively, the first set of samples is an age-tagged set of samples and the second set of samples is a gender-tagged set of samples.
In some embodiments, the Res2Net network includes a pre-processing module, four Stage modules, and a post-processing module; the front end part comprises the preprocessing module and the first two Stage modules, and the back end part comprises the last two Stage modules and the post-processing module.
In some embodiments, when the Res2Net network is trained based on the gender sample set, the number of output nodes of the rear end part is set to two, the two output nodes correspond to the two gender labels respectively, and the gender prediction loss of the two output nodes for each gender sample image is calculated based on cross entropy; and when the Res2Net network is trained based on the age sample set, the number of output nodes is set to multiple, the output nodes correspond to a plurality of age labels respectively, and the age prediction loss of the multiple output nodes for each age sample image is calculated based on the earth mover's distance and the mean square error.
In some embodiments, the gender prediction loss is calculated by the formula:

$$\mathrm{loss}_{gender} = -\log\frac{\exp\left(y_{y_{gender}}\right)}{\sum_{i=0}^{1}\exp\left(y_{i}\right)}$$

wherein $y_{gender}$ is the gender label value of the gender sample image with value range $\{0, 1\}$, and $y_{i}$ is the gender prediction value of the corresponding output node for the gender sample image;
the calculation formula of the age prediction loss is as follows:
lossage=losscls+αlossreg
therein, lossclsLoss for classification based on bulldozer distanceregAlpha is a weight factor for the regression loss based on the mean square error;
the calculation formula of the classification loss based on the bulldozer distance is as follows:
Figure BDA0003370770930000032
wherein, Y [ 0: 100]SoftMax (·) representing SoftMax operations for age prediction distribution of the age sample images by the plurality of output nodes; age is an age label value of the age sample image, the value range is {0, 1., 100}, and OneHot (·) represents a one hot code; CDFi() represents the ith element in the probability distribution;
the calculation formula of the regression loss based on the mean square error is as follows:
Figure BDA0003370770930000033
wherein, yiAnd ykAn age prediction value for the age sample image for a corresponding output node.
In some embodiments, the model training process further comprises a sample construction process comprising: obtaining a sample image set; respectively carrying out gender label labeling and age label labeling on the sample image set; and according to the data scene characteristics, performing data enhancement on the sample image set to obtain the gender tag sample set and the age tag sample set.
In some embodiments, data enhancement of the sample image set comprises: dividing the sample image set into a plurality of groups; setting a random selection probability and a data enhancement mode for each group of sample images, wherein the data enhancement mode of each group of sample images covers the data scene characteristics; and carrying out image random selection on each group of sample images according to the random selection probability, and carrying out image enhancement processing according to the data enhancement mode.
In some embodiments, the intercepting a face image from an initial image based on fusion detection of face detection and portrait detection includes: carrying out face detection on the initial image to obtain face frames, and carrying out portrait detection on the initial image to obtain portrait frames; judging whether each obtained face frame has a matching portrait frame, wherein the area of the intersection region of the matching portrait frame and the face frame is larger than a preset proportion of the face frame's area; retaining face frames that have a matching portrait frame as real face frames; and intercepting the face image from the initial image according to the real face frame.
In some embodiments, the intercepting the face image from the initial image according to the real face frame includes: expanding the real face frame outward at an equal ratio to obtain an expanded face frame; calculating a face inclination angle according to the two eye key points of the real face frame, and rotating the initial image about the center of the real face frame by the face inclination angle; and cropping the region corresponding to the expanded face frame from the rotated initial image to obtain the face image.
In some embodiments, the gender prediction output corresponds to two gender tags, the age prediction output corresponds to a plurality of age tags; obtaining a gender identification result and an age identification result of the face image, comprising: respectively carrying out SoftMax operation on the gender prediction output and the age prediction output to obtain gender prediction probabilities corresponding to the two gender tags and age prediction probabilities corresponding to the plurality of age tags; obtaining the gender identification result according to the gender label corresponding to the larger probability value in the gender prediction probability; and calculating mathematical expectation according to the age prediction probability and the age labels to obtain the age identification result.
According to an aspect of the present invention, there is provided a face gender age recognition apparatus, comprising: the face image acquisition module is used for intercepting a face image from an initial image based on fusion detection of face detection and portrait detection; the gender and age prediction module is used for inputting the face image into a gender and age recognition model to obtain gender prediction output and age prediction output of the gender and age recognition model; the gender and age identification model is provided with a double-branch convolutional neural network, and the double-branch convolutional neural network is used for carrying out gender prediction and age prediction respectively; and the gender and age calculating module is used for obtaining a gender identification result and an age identification result of the face image according to the gender prediction output and the age prediction output respectively.
According to an aspect of the present invention, there is provided an electronic apparatus including: a processor; a memory having executable instructions stored therein; wherein the executable instructions, when executed by the processor, implement the face gender and age identification method as described in any of the above embodiments.
According to an aspect of the present invention, there is provided a computer-readable storage medium for storing a program which, when executed by a processor, implements the face gender age identification method as described in any of the above embodiments.
Compared with the prior art, the invention has the beneficial effects that:
the method comprises the steps of associating and aggregating a face detection result and a portrait detection result through fusion detection of face detection and portrait detection to obtain a reliable face image; the gender and age of the face are predicted simultaneously by utilizing a gender and age recognition model with a double-branch convolutional neural network, so that the recognition result of the gender and age of the face is accurately and efficiently obtained;
according to the technical scheme, information such as the gender and age of faces in the image library is automatically identified and predicted based on deep learning face processing technology, and an automated flow of image face detection and gender and age identification is established; the machine replaces manual checking and screening work, greatly saving manual operation and maintenance cost and improving the processing efficiency and accuracy of gender and age identification; meanwhile, mining and applying information such as face gender and age can improve the search and recommendation precision for pictures, increase the product click-through rate and the user's browsing experience, and build a good OTA brand image.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings described below are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
FIG. 1 is a schematic diagram illustrating steps of a face gender and age identification method according to an embodiment of the invention;
FIG. 2 is a network architecture diagram illustrating a gender age identification model in accordance with one embodiment of the present invention;
FIG. 3 is a diagram illustrating the training steps of a gender age identification model in one embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating data enhancement of a sample image set according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a training structure of a gender age recognition model in an embodiment of the present invention;
FIG. 6 is a schematic block diagram of an apparatus for identifying gender and age of a human face according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device in an embodiment of the invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
The drawings are merely schematic illustrations of the invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In addition, the flow shown in the drawings is only an exemplary illustration, and not necessarily includes all the steps. For example, some steps may be divided, some steps may be combined or partially combined, and the actual execution sequence may be changed according to the actual situation. The use of "first," "second," and similar terms in the detailed description is not intended to imply any order, quantity, or importance, but rather is used to distinguish one element from another. It should be noted that features of the embodiments of the invention and of the different embodiments may be combined with each other without conflict.
Fig. 1 shows the main steps of the face gender and age identification method in an embodiment, and referring to fig. 1, the face gender and age identification method in the embodiment includes:
and step S110, intercepting a face image from the initial image based on the fusion detection of the face detection and the portrait detection.
A face detection algorithm and a portrait detection algorithm are used to perform face detection and portrait detection on the picture respectively, and the results are associated and aggregated, so that falsely detected faces can be filtered out and the accuracy of face detection is improved.
In one embodiment, intercepting a face image from an initial image based on fusion detection of face detection and portrait detection specifically includes: carrying out face detection on the initial image to obtain face frames, and carrying out portrait detection on the initial image to obtain portrait frames; judging whether each obtained face frame has a matching portrait frame, wherein the area of the intersection region of the matching portrait frame and the face frame is larger than a preset proportion of the face frame's area; retaining face frames that have a matching portrait frame as real face frames; and cropping the initial image according to the real face frame to obtain the face image.
The face detection can adopt an existing face detection algorithm. The initial image $O$ is detected with the face detection algorithm to obtain $N$ face frames $B_{face} = \{b_{f1}, b_{f2}, \ldots, b_{fN}\}$ and the corresponding face key points $L_{face} = \{l_{f1}, l_{f2}, \ldots, l_{fN}\}$, where the 5 face key points of each face frame comprise the left eye, right eye, nose, left mouth corner and right mouth corner. The portrait detection can adopt an existing portrait detection algorithm. The initial image is detected with the portrait detection algorithm to obtain $M$ portrait frames $B_{person} = \{b_{p1}, b_{p2}, \ldots, b_{pM}\}$. For each face frame $b_{fn}$, if a portrait frame $b_{pm}$ can be found such that the area of the intersection region of $b_{fn}$ and $b_{pm}$ is more than 80% of the area of $b_{fn}$, the face frame is kept; otherwise the face frame $b_{fn}$ and its corresponding key points $l_{fn}$ are removed.
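As an illustration of this matching rule, the following is a minimal sketch in plain Python (the box representation and function names are our own; only the criterion that the intersection exceeds 80% of the face frame's area comes from the embodiment above):

```python
def intersection_area(a, b):
    # Boxes are (x1, y1, x2, y2); returns the area of the overlap of a and b.
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0) * max(h, 0)

def filter_real_faces(face_boxes, face_keypoints, person_boxes, ratio=0.8):
    """Keep a face box only if some portrait box covers more than
    `ratio` of the face box's own area; drop its keypoints otherwise."""
    real_boxes, real_keypoints = [], []
    for bf, lf in zip(face_boxes, face_keypoints):
        area_f = (bf[2] - bf[0]) * (bf[3] - bf[1])
        if any(intersection_area(bf, bp) > ratio * area_f for bp in person_boxes):
            real_boxes.append(bf)
            real_keypoints.append(lf)
    return real_boxes, real_keypoints
```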
Further, intercepting the face image from the initial image according to the real face frame specifically comprises the following steps: expanding the real face frame outward at an equal ratio to obtain an expanded face frame; calculating the face inclination angle according to the two eye key points of the real face frame, and rotating the initial image about the center of the real face frame by the face inclination angle; and cropping the region corresponding to the expanded face frame from the rotated initial image to obtain the face image.
As in the above embodiment, $N'$ real face frames are obtained. The $N'$ real face frames are each expanded outward by 0.4 times in the horizontal and vertical directions, yielding $N'$ expanded face frames $\tilde{B}_{face} = \{\tilde{b}_{f1}, \tilde{b}_{f2}, \ldots, \tilde{b}_{fN'}\}$. Using the face key points $l_{fn}$ corresponding to each real face frame, the face inclination angle is calculated from the two eye key points, the initial image $O$ is rotated about the center of the real face frame based on the face inclination angle, and the face image $I_{fn}$ is then cropped from the rotated initial image according to the region corresponding to the expanded face frame $\tilde{b}_{fn}$ and resized to 224 × 224 as the input to the face gender and age recognition model.
The equal-ratio outward expansion produces an expanded face frame that contains more useful face information; calculating the face inclination angle from the two eye key points and rotating the original image accordingly yields a true, well-aligned face image that is convenient for model recognition. The method can therefore distinguish and filter out non-real faces such as cartoons and sculptures, handle faces that are too small or too blurred, and accurately identify attribute information such as gender and age for faces of different age groups and different deflection angles.
Step S120, inputting the face image into a gender and age identification model to obtain gender prediction output and age prediction output of the gender and age identification model; the gender and age identification model is provided with a double-branch convolutional neural network, and the double-branch convolutional neural network is used for gender prediction and age prediction respectively.
The backbone network of the gender age identification model may specifically include: the public convolutional neural network is used for extracting public features of the face image; and the two-branch convolutional neural network is connected with the public convolutional neural network and used for respectively extracting the gender characteristic and the age characteristic of the public characteristic extracted by the public convolutional neural network and respectively outputting gender prediction output and age prediction output.
Fig. 2 shows the network structure of the gender age recognition model in an embodiment. Referring to fig. 2, a Res2Net network, specifically a Res2Net-50 network, is adopted as the backbone network of the gender age recognition model 200 in this embodiment. The Res2Net network is an upgraded version of the ResNet (residual) network; it comes in multiple versions, and the Res2Net-50 network is the 50-layer version. The Res2Net network includes a preprocessing module, four Stage modules (Stage1, Stage2, Stage3, and Stage4), and a post-processing module.
The input to the gender age recognition model 200 is a 224 × 224 × 3 face image. The operations of the preprocessing module comprise, in order, a convolution operation (Conv), a batch normalization operation (Bn) and an activation function (Relu); the preprocessing module may also include max pooling, not specifically shown. The four Stage modules are all composed of Bottleneck blocks and have similar structures: Stage1 may contain 3 Bottlenecks, Stage2 may contain 4 Bottlenecks, Stage3 may contain 6 Bottlenecks, and Stage4 may contain 3 Bottlenecks. The specific structure of the Res2Net network and its Bottleneck blocks is known, so it is not described further. The operations of the post-processing module comprise, in order, average pooling (Avg pool) and a fully connected operation (FC). The preprocessing module, Stage1 and Stage2 form the common convolutional neural network of the gender age recognition model 200; two sets of Stage3, Stage4 and post-processing modules with different parameters form the dual-branch convolutional neural network of the gender age recognition model 200, comprising a gender branch convolutional neural network and an age branch convolutional neural network.
The outputs of the gender age recognition model 200 are the gender prediction output of the gender branch convolutional neural network and the age prediction output of the age branch convolutional neural network. The gender prediction output is 1 × 2 dimensional, corresponding to the 2 gender labels male and female; the age prediction output is 1 × 101 dimensional, corresponding to the 101 age labels from 0 to 100 years old.
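The dual-branch layout can be expressed as a short PyTorch sketch (the stage grouping follows the description above; the trunk and branch modules are stand-ins to be supplied by an actual Res2Net-50 implementation):

```python
import torch.nn as nn

class DualBranchGenderAge(nn.Module):
    """Shared trunk (preprocessing + Stage1 + Stage2) feeding two branches
    (Stage3 + Stage4 + Avg pool + FC), one for gender, one for age."""
    def __init__(self, trunk, gender_branch, age_branch):
        super().__init__()
        self.trunk = trunk                  # common convolutional neural network
        self.gender_branch = gender_branch  # ends in a 2-way FC layer
        self.age_branch = age_branch        # ends in a 101-way FC layer

    def forward(self, x):                   # x: (N, 3, 224, 224)
        shared = self.trunk(x)              # common features, computed once
        return self.gender_branch(shared), self.age_branch(shared)
```

A single forward pass thus yields both the 1 × 2 gender prediction and the 1 × 101 age prediction.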
And step S130, acquiring a gender identification result and an age identification result of the face image according to the gender prediction output and the age prediction output respectively.
In one embodiment, the obtaining of the gender identification result and the age identification result of the face image specifically includes: respectively carrying out SoftMax operation on the gender prediction output and the age prediction output to obtain gender prediction probabilities corresponding to the two gender labels and age prediction probabilities corresponding to the plurality of age labels; obtaining a gender identification result according to the gender label corresponding to the larger probability value in the gender prediction probability; and calculating a mathematical expectation according to the age prediction probability and the plurality of age labels to obtain an age identification result.
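A sketch of this post-processing in PyTorch (tensor shapes follow the model outputs above):

```python
import torch

def decode_outputs(gender_logits, age_logits):
    """gender_logits: (N, 2); age_logits: (N, 101)."""
    gender_prob = torch.softmax(gender_logits, dim=1)
    gender = gender_prob.argmax(dim=1)      # label with the larger probability
    age_prob = torch.softmax(age_logits, dim=1)
    ages = torch.arange(101, dtype=age_prob.dtype, device=age_prob.device)
    age = (age_prob * ages).sum(dim=1)      # mathematical expectation over 0..100
    return gender, age
```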
Therefore, the face gender and age identification method can obtain a reliable face image through fusion detection of face detection and portrait detection, i.e., by associating and aggregating the face detection result and the portrait detection result; it predicts the gender and age of the face simultaneously with a gender and age recognition model having a dual-branch convolutional neural network, thereby obtaining the face gender and age recognition results accurately and efficiently. With this method, an automated algorithmic processing flow can be established to address the currently insufficient mining of content information such as face gender and age in OTA image libraries, performing automatic, high-accuracy face detection, screening and filtering, and face gender and age recognition and prediction on the images in the library based on deep learning face processing technology.
The training process of the gender age identification model is explained in detail below.
Fig. 3 shows the main training steps of the gender age identification model in an embodiment, which is shown in fig. 3 and includes: s310, training data collection and enhancement; and S320, constructing, training and combining the models. S320 specifically includes: training a gender branch classification model; training an age branch prediction model; and combining to obtain a double-branch face gender and age identification model.
The sample construction process, namely training data collection and enhancement, specifically comprises the following steps: obtaining a sample image set; respectively carrying out gender label labeling and age label labeling on the sample image set; and according to the data scene characteristics, performing data enhancement on the sample image set to obtain a gender tag sample set and an age tag sample set.
Open-source datasets related to face gender recognition and age prediction can be collected, integrated with related data accumulated on the OTA platform, and supplemented with additionally labeled data to form the initial sample image set for training.
Further, in order to bring the training data closer to the data distribution of the real application scene, data enhancement is performed on the training data, mainly covering various kinds of noise, blur, occlusion, and changes of picture style and tone.
In one embodiment, the data enhancement of the sample image set specifically includes: dividing the sample image set into a plurality of groups; setting a random selection probability and a data enhancement mode for each group of sample images, wherein the data enhancement modes of the groups together cover the data scene characteristics; and randomly selecting images from each group of sample images according to the random selection probability and performing image enhancement processing according to the corresponding data enhancement mode. Covering the data scene characteristics means, for example, that since face pictures in OTA scenes exhibit mask occlusion and filter color grading, various occlusions and filters are added to the pictures during data enhancement, so that the training data approaches the data distribution of the real application scene.
Fig. 4 shows a data enhancement schematic of the sample image set in an embodiment. Referring to fig. 4, in order to make the training data better approximate the real data distribution of the OTA scene, the training data is enhanced with the Albumentations data augmentation library. In the concrete implementation, the sample image set 410 is divided into 7 independent groups; during training, each group processes the images according to a preset probability, and when a group is triggered, one image processing method is randomly selected from the group, a process controlled with the OneOf method in Albumentations.
The random selection probabilities and data enhancement modes of the seven groups are:
  • First group of sample images 420: probability P1 = 0.2; CoarseDropout and GridDropout.
  • Second group 430: P2 = 0.2; RandomFog (fogging), RandomRain (rain), RandomShadow (shadow), RandomSnow (snow) and RandomSunFlare (sunlight).
  • Third group 440: P3 = 0.5; HorizontalFlip and VerticalFlip.
  • Fourth group 450: P4 = 0.3; RGBShift (RGB shift), HueSaturationValue (hue, saturation, value), ChannelShuffle (channel shuffle), ChannelDropout (channel drop), CLAHE (contrast-limited adaptive histogram equalization), RandomGamma (random gamma), RandomBrightness (random brightness), Contrast (contrast processing), ImageCompression (image compression), Posterize (posterization) and Equalize (histogram equalization).
  • Fifth group 460: P5 = 0.2; MedianBlur (median filtering), GaussianBlur (Gaussian filtering), MotionBlur (motion blur) and GlassBlur (ground-glass effect).
  • Sixth group 470: P6 = 0.2; ToGray (grayscale), ToSepia (sepia effect) and ToFloat (floating-point conversion).
  • Seventh group 480: P7 = 0.3; GaussNoise (Gaussian noise), ISONoise (ISO noise), MultiplicativeNoise (multiplicative noise) and IAAAdditiveGaussianNoise (additive Gaussian noise).
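A sketch of this grouped pipeline with Albumentations is shown below (transform arguments are left at their defaults since the patent does not give them; we fold RandomBrightness and Contrast into RandomBrightnessContrast and omit ToFloat and IAAAdditiveGaussianNoise, whose availability depends on the library version):

```python
import albumentations as A

augment = A.Compose([
    A.OneOf([A.CoarseDropout(), A.GridDropout()], p=0.2),              # group 1
    A.OneOf([A.RandomFog(), A.RandomRain(), A.RandomShadow(),
             A.RandomSnow(), A.RandomSunFlare()], p=0.2),              # group 2
    A.OneOf([A.HorizontalFlip(), A.VerticalFlip()], p=0.5),            # group 3
    A.OneOf([A.RGBShift(), A.HueSaturationValue(), A.ChannelShuffle(),
             A.ChannelDropout(), A.CLAHE(), A.RandomGamma(),
             A.RandomBrightnessContrast(), A.ImageCompression(),
             A.Posterize(), A.Equalize()], p=0.3),                     # group 4
    A.OneOf([A.MedianBlur(), A.GaussianBlur(), A.MotionBlur(),
             A.GlassBlur()], p=0.2),                                   # group 5
    A.OneOf([A.ToGray(), A.ToSepia()], p=0.2),                         # group 6
    A.OneOf([A.GaussNoise(), A.ISONoise(),
             A.MultiplicativeNoise()], p=0.3),                         # group 7
])
# Usage: augmented = augment(image=image)["image"]
```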
Through the data enhancement processing, a processed sample image set 490 is obtained, containing a gender tag sample set and an age tag sample set. The images of the gender tag sample set and the age tag sample set may be the same; only the added labels differ.
The model building, training and merging process specifically comprises the following steps: training the Res2Net network based on a first sample set by taking the Res2Net network as a backbone network to obtain a first model with a first parameter set; freezing the front end part of the Res2Net network, and training the first model based on a second sample set to obtain a second model with a second parameter set; combining the first model and the second model to enable a front end part to form a common convolutional neural network, and enabling a rear end part with first parameters and a rear end part with second parameters of the Res2Net network to respectively form a double-branch convolutional neural network; wherein the first set of samples is a gender tagged set of samples and the second set of samples is an age tagged set of samples; alternatively, the first set of samples is an age-tagged set of samples and the second set of samples is a gender-tagged set of samples.
FIG. 5 shows a training structure of a gender age recognition model in an embodiment, and referring to FIG. 5, a Res2Net network 500 (which may be a Res2Net-50 network) specifically includes a preprocessing module 510, four Stage modules (520-550), and a post-processing module 560; the front-end portion of the Res2Net network 500 includes a pre-processing module 510 and the first two Stage modules (520 and 530), and the back-end portion includes the last two Stage modules (540 and 550) and a post-processing module 560.
After the deep convolutional neural network is constructed based on the Res2Net network 500, the gender branch and the age branch are trained separately; the training order of the two branches is interchangeable. First, the Res2Net network 500 is trained on the gender label sample set to obtain a first model 510 with a first parameter set: during gender branch training, the number of output nodes at the rear end of the network is set to 2, corresponding to the two gender labels; the gender prediction loss of the 2 output nodes for each gender sample image is calculated with cross entropy, and the model is optimized by back propagation. Then, the Res2Net network 500 is trained on the age label sample set: the number of output nodes at the rear end of the network is set to 101, corresponding to the 101 age labels; the parameters of the trained first model 510 are loaded, the front end of the network is frozen, and the weights of the rear-end parameters are retrained. During this training, the age prediction loss of the 101 output nodes for each age sample image is calculated based on the earth mover's distance (EMD) and the mean square error (MSE), and the model is optimized by back propagation to obtain a second model 520 with a second parameter set. Finally, the parts of the two models whose front-end parameter weights are identical are merged, while the parts whose rear-end parameter weights differ form the gender branch and the age branch respectively, so that the whole model becomes the dual-branch gender age recognition model 200, able to recognize the gender and the age of a face at the same time. The structure of each module in the gender age recognition model 200 can refer to the description of fig. 2 above and is not repeated here.
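The freeze-and-retrain phase can be sketched as follows (assuming the trunk/branch split from the model sketch above; the optimizer, learning rate and epoch count are illustrative, not taken from the patent):

```python
import torch

def train_second_branch(model, loader, loss_fn, epochs=10, lr=1e-3):
    """Phase-2 training: the shared front end stays frozen and only the
    age branch is optimized (swap branches to train gender second instead)."""
    for p in model.trunk.parameters():
        p.requires_grad = False                 # preprocessing + Stage1/2 frozen
    optimizer = torch.optim.Adam(model.age_branch.parameters(), lr=lr)
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            _, age_out = model(images)          # only the age head is trained
            loss = loss_fn(age_out, labels)
            loss.backward()                     # back propagation
            optimizer.step()
```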
During gender branch training, the output nodes are modified to 2, corresponding to the gender labels male and female, and the classification loss of gender recognition is calculated with cross entropy; the concrete formula is:

$$\mathrm{loss}_{gender} = -\log\frac{\exp\left(y_{y_{gender}}\right)}{\sum_{i=0}^{1}\exp\left(y_{i}\right)}$$

wherein $y_{gender}$ is the gender label value of the gender sample image with value range $\{0, 1\}$, and $y_{i}$ is the gender prediction value of the corresponding output node for the gender sample image.
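Under the reconstruction above, the gender loss is simply softmax cross entropy over the two output nodes; in PyTorch:

```python
import torch.nn.functional as F

def gender_loss(gender_logits, gender_labels):
    """gender_logits: (N, 2) raw outputs; gender_labels: (N,) values in {0, 1}."""
    return F.cross_entropy(gender_logits, gender_labels)
```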
During age branch training, the output nodes are modified to 101, corresponding to the age labels 0 to 100; the trained gender model parameter weights are loaded, Stage2 of the network and the layers before it are frozen, and the parameter weights after Stage2 are retrained. The classification loss and the regression loss of the age prediction are calculated with the EMD loss and the MSE loss respectively, and the model is trained with a weighted sum of the two losses. The classification loss, the regression loss, and the total age prediction loss are denoted $\mathrm{loss}_{cls}$, $\mathrm{loss}_{reg}$ and $\mathrm{loss}_{age}$ respectively.
The classification loss $\mathrm{loss}_{cls}$ based on the earth mover's distance is calculated as:

$$\mathrm{loss}_{cls} = \sqrt{\frac{1}{101}\sum_{i=0}^{100}\left(\mathrm{CDF}_{i}\left(\mathrm{SoftMax}\left(Y[0{:}100]\right)\right) - \mathrm{CDF}_{i}\left(\mathrm{OneHot}(age)\right)\right)^{2}}$$

wherein $Y[0{:}100]$ is the age prediction distribution of the plurality of output nodes for the age sample image, and $\mathrm{SoftMax}(\cdot)$ represents the SoftMax operation; $age$ is the age label value of the age sample image, with value range $\{0, 1, \ldots, 100\}$, and $\mathrm{OneHot}(\cdot)$ represents one-hot encoding; $\mathrm{CDF}_{i}(\cdot)$ represents the $i$-th element of the cumulative distribution.
The regression loss $\mathrm{loss}_{reg}$ based on the mean square error is calculated as:

$$\mathrm{loss}_{reg} = \left(\sum_{i=0}^{100} i\cdot\frac{\exp\left(y_{i}\right)}{\sum_{k=0}^{100}\exp\left(y_{k}\right)} - age\right)^{2}$$

wherein $y_{i}$ and $y_{k}$ are the age prediction values of the corresponding output nodes for the age sample image.
The total age prediction loss $\mathrm{loss}_{age}$ is calculated as:

$$\mathrm{loss}_{age} = \mathrm{loss}_{cls} + \alpha\,\mathrm{loss}_{reg}$$

where $\alpha$ is a weight factor used to balance the age classification loss and the regression loss.
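The combined age loss under the formulas above can be sketched as follows (the per-element normalization inside the EMD term is an assumption; the patent does not fix it):

```python
import torch
import torch.nn.functional as F

def age_loss(age_logits, age_labels, alpha=1.0):
    """age_logits: (N, 101); age_labels: (N,) integer ages in 0..100."""
    prob = torch.softmax(age_logits, dim=1)
    one_hot = F.one_hot(age_labels, num_classes=101).float()
    # Classification loss: EMD between the predicted and one-hot label CDFs.
    cdf_diff = torch.cumsum(prob, dim=1) - torch.cumsum(one_hot, dim=1)
    loss_cls = torch.sqrt((cdf_diff ** 2).mean(dim=1)).mean()
    # Regression loss: MSE between the distribution's expectation and the label.
    ages = torch.arange(101, dtype=prob.dtype, device=prob.device)
    expected = (prob * ages).sum(dim=1)
    loss_reg = ((expected - age_labels.float()) ** 2).mean()
    return loss_cls + alpha * loss_reg
```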
Finally, the gender model and the age model are merged: Stage2 and the part before it is taken from either the gender model or the age model (these parts are completely identical in the two models), while the part after Stage2 of the gender model and the part after Stage2 of the age model are spliced in parallel, forming a new dual-branch network model, namely the gender and age recognition model.
In conclusion, the deep-learning-based face gender and age identification method can effectively extract and mine face gender, age and other information contained in open scenes. The detection results of the face detection algorithm and the portrait detection algorithm are first aggregated to filter out falsely detected faces and obtain more reliable face information; data enhancement is then performed on the existing training set according to the characteristics of OTA data, so that the training set approaches the data distribution of the OTA scene. In addition, during model training, two independent deep convolutional neural networks are trained separately with different loss functions, and the parts of the two networks with identical parameters are then merged to obtain a dual-branch network that recognizes face gender and age simultaneously. With the scheme of the invention, automatic face detection and face gender and age identification can be realized, greatly saving labor cost and improving processing efficiency; in subsequent production applications, content information such as the gender and age of faces in the gallery improves the accuracy of personalized recommendation, increases the product click-through rate and the user's browsing experience, and builds a good OTA brand image.
The embodiment of the invention also provides a face gender and age identification device which can be used for realizing the face gender and age identification method described in any embodiment. The features and principles of the face gender age identification method described in any of the above embodiments can be applied to the following face gender age identification device embodiments. In the following embodiments of the face gender and age recognition device, the features and principles that have been elucidated with respect to the face gender and age recognition will not be repeated.
Fig. 6 shows the main modules of the face gender age recognition apparatus in an embodiment, and referring to fig. 6, the face gender age recognition apparatus 800 of the embodiment includes: a face image obtaining module 810, configured to intercept a face image from an initial image based on fusion detection of face detection and portrait detection; a gender and age predicting module 820, for inputting the face image into a gender and age identifying model, obtaining gender predicting output and age predicting output of the gender and age identifying model; the gender and age identification model is provided with a double-branch convolutional neural network, and the double-branch convolutional neural network is used for carrying out gender prediction and age prediction respectively; and a gender and age calculating module 830 for obtaining a gender recognition result and an age recognition result of the face image according to the gender prediction output and the age prediction output, respectively.
Further, the face gender and age identifying apparatus 800 may further include modules for implementing other process steps of the above embodiments of the face gender and age identifying method, and specific principles of the modules may refer to the description of the above embodiments of the face gender and age identifying method, and will not be described again here.
The face gender and age recognition device can acquire reliable face information through the fusion detection of face detection and portrait detection; training data distribution of matched data scene characteristics is obtained through data enhancement; independently training the sex age branches by using a model through various loss functions to obtain a double-branch sex age identification model; therefore, automatic and high-precision face detection, screening and gender and age identification can be realized, content information such as face gender and age in the picture can be conveniently mined, the search recommendation precision is improved, and the use experience of a user is improved.
The embodiment of the invention also provides electronic equipment which comprises a processor and a memory, wherein the memory stores executable instructions, and the executable instructions are executed by the processor to realize the face gender and age identification method described in any embodiment.
The electronic equipment can acquire reliable face information through the fusion detection of the face detection and the portrait detection; training data distribution of matched data scene characteristics is obtained through data enhancement; independently training the sex age branches by using a model through various loss functions to obtain a double-branch sex age identification model; therefore, automatic and high-precision face detection, screening and gender and age identification can be realized, content information such as face gender and age in the picture can be conveniently mined, the search recommendation precision is improved, and the use experience of a user is improved.
Fig. 7 is a schematic structural diagram of an electronic device in an embodiment of the invention. It should be understood that fig. 7 only schematically illustrates the various modules; these modules may be virtual software modules or actual hardware modules, and combining or splitting these modules or adding further modules falls within the scope of the present invention.
As shown in fig. 7, the electronic device 600 is embodied in the form of a general purpose computing device. The components of the electronic device 600 include, but are not limited to: at least one processing unit 610, at least one memory unit 620, a bus 630 connecting the different platform components (including the memory unit 620 and the processing unit 610), a display unit 640, etc.
Wherein the storage unit stores program code which can be executed by the processing unit 610 such that the processing unit 610 performs the steps of the face gender age identification method described in any of the above embodiments. For example, processing unit 610 may perform the steps shown in fig. 1 and 3.
The storage unit 620 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)6201 and/or a cache memory unit 6202, and may further include a read-only memory unit (ROM) 6203.
The memory unit 620 may also include programs/utilities 6204 including one or more program modules 6205, such program modules 6205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 630 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 600 may also communicate with one or more external devices 700, and the external devices 700 may be one or more of a keyboard, a pointing device, a bluetooth device, and the like. The external devices 700 enable a user to interactively communicate with the electronic device 600. The electronic device 600 may also be capable of communicating with one or more other computing devices, including routers, modems. Such communication may occur via an input/output (I/O) interface 650. Also, the electronic device 600 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 660. The network adapter 660 may communicate with other modules of the electronic device 600 via the bus 630. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 600, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage platforms, to name a few.
The embodiment of the invention also provides a computer-readable storage medium for storing a program, and the program can be executed to implement the face gender and age identification method described in any of the above embodiments. In some possible embodiments, the aspects of the present invention may also be implemented in the form of a program product, which includes program code for causing a terminal device to execute the face gender age identification method described in any of the above embodiments, when the program product is run on the terminal device.
When the storage medium is executed, the reliable face information can be obtained through the fusion detection of the face detection and the portrait detection; training data distribution of matched data scene characteristics is obtained through data enhancement; independently training the sex age branches by using a model through various loss functions to obtain a double-branch sex age identification model; therefore, automatic and high-precision face detection, screening and gender and age identification can be realized, content information such as face gender and age in the picture can be conveniently mined, the search recommendation precision is improved, and the use experience of a user is improved.
The program product may employ a portable compact disc read only memory (CD-ROM) and include program code, and may be run on a terminal device, such as a personal computer. However, the program product of the present invention is not limited in this respect, and may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of readable storage media include, but are not limited to: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A readable signal medium may include a propagated data signal with readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device, such as through the internet using an internet service provider.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (14)

1. A face gender and age identification method is characterized by comprising the following steps:
based on the fusion detection of face detection and portrait detection, intercepting a face image from an initial image;
inputting the face image into a gender and age identification model to obtain gender prediction output and age prediction output of the gender and age identification model;
the gender and age identification model is provided with a double-branch convolutional neural network, and the double-branch convolutional neural network is used for carrying out gender prediction and age prediction respectively;
and acquiring a gender identification result and an age identification result of the face image according to the gender prediction output and the age prediction output respectively.
2. The method of claim 1, wherein the backbone network of the gender age recognition model comprises:
the public convolutional neural network is used for extracting public features of the face image;
and the double-branch convolutional neural network is connected with the public convolutional neural network and is used for respectively extracting gender characteristics and age characteristics of the public characteristics extracted by the public convolutional neural network and respectively outputting the gender prediction output and the age prediction output.
3. The face gender and age identification method of claim 2, wherein the backbone network of the gender and age identification model adopts a Res2Net network;
before the face image is input into a gender and age recognition model, the method further comprises a model training process, wherein the model training process comprises the following steps:
training the Res2Net network based on a first sample set to obtain a first model with a first parameter set;
freezing the front end part of the Res2Net network, and training the first model based on a second sample set to obtain a second model with a second parameter set;
combining the first model and the second model to enable the front end part to form the public convolutional neural network, and enabling the back end part with first parameters and the back end part with second parameters of the Res2Net network to respectively form the double-branch convolutional neural network;
wherein the first set of samples is a gender tagged set of samples and the second set of samples is an age tagged set of samples; alternatively, the first set of samples is an age-tagged set of samples and the second set of samples is a gender-tagged set of samples.
4. The face gender and age identification method according to claim 3, wherein the Res2Net network comprises a preprocessing module, four Stage modules, and a post-processing module;
the front-end part comprises the preprocessing module and the first two Stage modules, and the back-end part comprises the last two Stage modules and the post-processing module.
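Illustratively, this split might be expressed as follows; attribute names such as stem, stages, and head are assumptions, not the actual Res2Net API:

import torch.nn as nn

def split_backbone(backbone):
    # Front end: preprocessing module + first two Stage modules.
    front = nn.Sequential(backbone.stem, backbone.stages[0], backbone.stages[1])
    # Back end: last two Stage modules + post-processing module.
    back = nn.Sequential(backbone.stages[2], backbone.stages[3], backbone.head)
    return front, back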
5. The face gender and age identification method according to claim 3, wherein, when the Res2Net network is trained based on the gender-labeled sample set, the number of output nodes of the back-end part is set to two, the two output nodes corresponding to the two gender labels respectively, and the gender prediction loss of the two output nodes for each gender sample image is calculated based on cross entropy;
and when the Res2Net network is trained based on the age-labeled sample set, the number of output nodes is set to a plurality, the output nodes corresponding to a plurality of age labels respectively, and the age prediction loss of the plurality of output nodes for each age sample image is calculated based on the earth mover's distance and the mean square error.
6. The face gender and age identification method according to claim 5, wherein the gender prediction loss is calculated as:

$$\mathrm{loss}_{gender} = -\sum_{i=0}^{1} \mathbb{1}\{y_{gender} = i\}\,\log\frac{e^{y_i}}{\sum_{j=0}^{1} e^{y_j}}$$

where $y_{gender}$ is the gender label value of the gender sample image, with value range $\{0, 1\}$; $y_i$ is the gender prediction value of the corresponding output node for the gender sample image; and $\mathbb{1}\{\cdot\}$ is the indicator function;
the age prediction loss is calculated as:

$$\mathrm{loss}_{age} = \mathrm{loss}_{cls} + \alpha\,\mathrm{loss}_{reg}$$

where $\mathrm{loss}_{cls}$ is the classification loss based on the earth mover's distance, $\mathrm{loss}_{reg}$ is the regression loss based on the mean square error, and $\alpha$ is a weight factor;
the classification loss based on the earth mover's distance is calculated as:

$$\mathrm{loss}_{cls} = \sum_{i=0}^{100} \big|\,\mathrm{CDF}_i(Y) - \mathrm{CDF}_i(\mathrm{OneHot}(age))\,\big|$$

where $Y$ is the age prediction distribution of the plurality of output nodes for the age sample image, obtained by applying the SoftMax operation $\mathrm{SoftMax}(\cdot)$ to the outputs for ages $[0, 100]$; $age$ is the age label value of the age sample image, with value range $\{0, 1, \ldots, 100\}$; $\mathrm{OneHot}(\cdot)$ denotes one-hot encoding; and $\mathrm{CDF}_i(\cdot)$ denotes the $i$-th element of the cumulative distribution of a probability distribution;
the regression loss based on the mean square error is calculated as:

$$\mathrm{loss}_{reg} = \left(\sum_{k=0}^{100} k \cdot \frac{e^{y_k}}{\sum_{i=0}^{100} e^{y_i}} - age\right)^{2}$$

where $y_i$ and $y_k$ are the age prediction values of the corresponding output nodes for the age sample image.
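A runnable sketch of these losses under the reconstructions above (PyTorch); the batch shapes and the cumulative-distribution formulation of the earth mover's distance are assumptions:

import torch
import torch.nn.functional as F

def gender_loss(gender_logits, gender_labels):
    # Two-class cross entropy over the two gender output nodes.
    return F.cross_entropy(gender_logits, gender_labels)

def age_loss(age_logits, age_labels, alpha=1.0):
    probs = F.softmax(age_logits, dim=-1)                        # (B, 101)
    onehot = F.one_hot(age_labels, probs.size(-1)).to(probs.dtype)
    # Earth mover's distance: L1 gap between cumulative distributions.
    loss_cls = (probs.cumsum(-1) - onehot.cumsum(-1)).abs().sum(-1)
    # Mean square error between the distribution's expectation and the label.
    ages = torch.arange(probs.size(-1), dtype=probs.dtype, device=probs.device)
    loss_reg = ((probs * ages).sum(-1) - age_labels.to(probs.dtype)) ** 2
    return (loss_cls + alpha * loss_reg).mean()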
7. The face gender and age identification method according to claim 3, wherein the model training process further comprises a sample construction process, the sample construction process comprising:
obtaining a sample image set;
labeling the sample image set with gender labels and with age labels respectively; and
performing data enhancement on the sample image set according to data scene characteristics to obtain the gender-labeled sample set and the age-labeled sample set.
8. The face gender and age identification method according to claim 7, wherein performing data enhancement on the sample image set comprises:
dividing the sample image set into a plurality of groups;
setting a random selection probability and a data enhancement mode for each group of sample images, the data enhancement modes of the groups together covering the data scene characteristics; and
randomly selecting images from each group of sample images according to the random selection probability, and performing image enhancement processing according to the corresponding data enhancement mode.
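A minimal sketch of such grouped, probabilistic enhancement; the groupings, probabilities, and transforms are purely illustrative assumptions:

import random

def enhance_groups(grouped_images, group_configs):
    # grouped_images: list of image lists, one list per group.
    # group_configs: per-group (selection_probability, transform) pairs whose
    # transforms together cover the data scene characteristics.
    enhanced = []
    for images, (prob, transform) in zip(grouped_images, group_configs):
        for img in images:
            # Randomly select images within the group at the set probability.
            enhanced.append(transform(img) if random.random() < prob else img)
    return enhanced

# e.g. group_configs = [(0.5, add_motion_blur), (0.3, lower_brightness)]
# where the transforms are hypothetical scene-specific augmentations.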
9. The face gender and age identification method according to claim 1, wherein cropping the face image from the initial image based on the fusion detection of face detection and portrait detection comprises:
performing face detection on the initial image to obtain face frames, and performing portrait detection on the initial image to obtain portrait frames;
judging whether each obtained face frame has a matched portrait frame, a matched portrait frame being one whose intersection area with the face frame is larger than a preset proportion of the area of the face frame;
retaining each face frame that has a matched portrait frame as a real face frame; and
cropping the initial image according to the real face frame to obtain the face image.
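One way this matching might be sketched; the box format and the threshold value are assumptions:

def intersection_area(a, b):
    # Boxes given as (x1, y1, x2, y2).
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)

def real_face_frames(face_frames, portrait_frames, ratio=0.8):
    # Keep a face frame only if some portrait frame overlaps it by more
    # than the preset proportion of the face frame's own area.
    real = []
    for f in face_frames:
        f_area = (f[2] - f[0]) * (f[3] - f[1])
        if any(intersection_area(f, p) > ratio * f_area for p in portrait_frames):
            real.append(f)
    return real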
10. The face gender and age identification method according to claim 9, wherein cropping the face image from the initial image according to the real face frame comprises:
expanding the real face frame outward at an equal ratio to obtain an expanded face frame;
calculating a face inclination angle from the two eye key points of the real face frame, and rotating the initial image about the center of the real face frame by the face inclination angle; and
cropping the region corresponding to the expanded face frame from the rotated initial image to obtain the face image.
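A sketch of this alignment-and-crop step with OpenCV; the expansion ratio and keypoint format are assumptions:

import math
import cv2

def align_and_crop(image, frame, left_eye, right_eye, expand=0.2):
    x1, y1, x2, y2 = frame
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Face inclination angle from the line through the two eye key points.
    angle = math.degrees(math.atan2(right_eye[1] - left_eye[1],
                                    right_eye[0] - left_eye[0]))
    # Rotate the whole image about the face-frame center by that angle.
    M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)
    h, w = image.shape[:2]
    rotated = cv2.warpAffine(image, M, (w, h))
    # Equal-ratio outward expansion of the frame, then crop.
    half_w = (x2 - x1) * (1 + expand) / 2.0
    half_h = (y2 - y1) * (1 + expand) / 2.0
    nx1, ny1 = max(0, int(cx - half_w)), max(0, int(cy - half_h))
    nx2, ny2 = min(w, int(cx + half_w)), min(h, int(cy + half_h))
    return rotated[ny1:ny2, nx1:nx2]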
11. The face gender and age identification method according to claim 1, wherein the gender prediction output corresponds to two gender labels and the age prediction output corresponds to a plurality of age labels;
obtaining the gender identification result and the age identification result of the face image comprises:
performing a SoftMax operation on the gender prediction output and on the age prediction output respectively, to obtain gender prediction probabilities corresponding to the two gender labels and age prediction probabilities corresponding to the plurality of age labels;
taking the gender label corresponding to the larger of the gender prediction probabilities as the gender identification result; and
calculating a mathematical expectation from the age prediction probabilities and the age labels to obtain the age identification result.
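A sketch of this decoding step (PyTorch); it assumes one-dimensional logit vectors and age labels 0 to 100:

import torch
import torch.nn.functional as F

def decode(gender_logits, age_logits):
    # Gender: the label with the larger SoftMax probability.
    gender = int(F.softmax(gender_logits, dim=-1).argmax())
    # Age: mathematical expectation of the SoftMax age distribution.
    probs = F.softmax(age_logits, dim=-1)
    ages = torch.arange(probs.size(-1), dtype=probs.dtype)
    return gender, float((probs * ages).sum())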
12. A face gender and age identification device, characterized by comprising:
a face image obtaining module for cropping a face image from an initial image based on fusion detection of face detection and portrait detection;
a gender and age prediction module for inputting the face image into a gender and age identification model to obtain a gender prediction output and an age prediction output of the gender and age identification model, the gender and age identification model having a double-branch convolutional neural network whose two branches perform gender prediction and age prediction respectively; and
a gender and age calculation module for obtaining a gender identification result and an age identification result of the face image from the gender prediction output and the age prediction output, respectively.
13. An electronic device, comprising:
a processor;
a memory having executable instructions stored therein;
wherein the executable instructions, when executed by the processor, implement the face gender and age identification method of any one of claims 1 to 11.
14. A computer-readable storage medium storing a program, wherein the program, when executed by a processor, implements the face gender and age identification method of any one of claims 1 to 11.
CN202111398427.8A 2021-11-23 2021-11-23 Face gender and age identification method and device, electronic equipment and storage medium Pending CN114038044A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111398427.8A CN114038044A (en) 2021-11-23 2021-11-23 Face gender and age identification method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111398427.8A CN114038044A (en) 2021-11-23 2021-11-23 Face gender and age identification method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114038044A true CN114038044A (en) 2022-02-11

Family

ID=80138544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111398427.8A Pending CN114038044A (en) 2021-11-23 2021-11-23 Face gender and age identification method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114038044A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115457644A (en) * 2022-11-10 2022-12-09 成都智元汇信息技术股份有限公司 Method and device for obtaining image recognition of target based on extended space mapping
CN115457644B (en) * 2022-11-10 2023-04-28 成都智元汇信息技术股份有限公司 Picture identification method and device for obtaining target based on expansion space mapping

Similar Documents

Publication Publication Date Title
CN108171260B (en) Picture identification method and system
CN110533097B (en) Image definition recognition method and device, electronic equipment and storage medium
US20180114071A1 (en) Method for analysing media content
CN111046821B (en) Video behavior recognition method and system and electronic equipment
CN109543691A (en) Ponding recognition methods, device and storage medium
CN111046956A (en) Occlusion image detection method and device, electronic equipment and storage medium
CN111476319B (en) Commodity recommendation method, commodity recommendation device, storage medium and computing equipment
CN110688524A (en) Video retrieval method and device, electronic equipment and storage medium
CN111626960A (en) Image defogging method, terminal and computer storage medium
CN111985374A (en) Face positioning method and device, electronic equipment and storage medium
CN111327949A (en) Video time sequence action detection method, device, equipment and storage medium
CN114943937A (en) Pedestrian re-identification method and device, storage medium and electronic equipment
CN110910445A (en) Object size detection method and device, detection equipment and storage medium
CN113361549A (en) Model updating method and related device
CN115482221A (en) End-to-end weak supervision semantic segmentation labeling method for pathological image
CN115577768A (en) Semi-supervised model training method and device
CN115100469A (en) Target attribute identification method, training method and device based on segmentation algorithm
CN114038044A (en) Face gender and age identification method and device, electronic equipment and storage medium
CN115690545A (en) Training target tracking model and target tracking method and device
CN114120454A (en) Training method and device of living body detection model, electronic equipment and storage medium
CN111488887B (en) Image processing method and device based on artificial intelligence
CN113569911A (en) Vehicle identification method and device, electronic equipment and storage medium
CN115719428A (en) Face image clustering method, device, equipment and medium based on classification model
CN114842411A (en) Group behavior identification method based on complementary space-time information modeling
CN114419018A (en) Image sampling method, system, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination