CN110532970A

CN110532970A - Age and gender attribute analysis method, system, equipment and medium for face 2D image

Info

Publication number: CN110532970A
Application number: CN201910823680.XA
Authority: CN
Inventors: 张帅
Original assignee: Xiamen Reconova Information Technology Co Ltd
Current assignee: Xiamen Reconova Information Technology Co Ltd
Priority date: 2019-09-02
Filing date: 2019-09-02
Publication date: 2019-12-03
Anticipated expiration: 2039-09-02
Also published as: CN110532970B

Abstract

The invention discloses an age and gender attribute analysis method, system and computer equipment for a face 2D image. The method includes: obtaining a face 2D picture to be detected; performing face detection on a single face 2D picture through a trained first neural network model to obtain the face frame position and the facial feature point positions; correcting and cropping the picture according to the face frame position and the facial feature point positions to obtain a corrected and standardized face 2D picture; performing age and gender attribute prediction on the corrected and standardized face 2D picture through a trained second neural network model to obtain an original prediction value; determining the age and gender attributes of the face according to the original prediction value and an age and gender attribute selection strategy, and outputting the predicted age and gender; and outputting the predicted age and gender results to the background and recording them in a database for subsequent data analysis. The method can quickly and accurately detect the age and gender attribute information of faces captured by a camera.

Description

Age and gender attribute analysis method, system, equipment and medium for face 2D image

Technical Field

The invention relates to the technical field of image processing based on a deep learning method, in particular to an age and gender attribute analysis method and system based on a 2D face picture and computer equipment.

Background Art

When walking along a street or through shopping malls, supermarkets and shops, an attentive observer will find cameras distributed throughout daily life. Most of these cameras are used for data recording and have a storage function; the monitoring data are retrieved only under certain conditions (case tracking, shop monitoring and the like) for historical backtracking analysis. The cameras generate a large amount of data every day, most of which is used only for backtracking, so the data are not fully utilized. To solve the practical problems of such scenes, the invention provides an age and gender attribute analysis method based on a 2D face photo.

Chinese invention patent publication CN109858388A, published on 2019-06-07, discloses an intelligent tourism management system comprising: an unmanned aerial vehicle aerial-photography tourist distribution system, a scenic spot face recognition system, a scenic spot entrance people flow prediction system, a scenic spot basic information data system, a hotel data statistics system, a cloud data management platform and a mobile terminal. The scenic spot face recognition system uses face recognition technology to recognize the age stage and gender of tourists, as follows:

firstly, establishing a face database, wherein face images in the database comprise photos from different ages and different expressions, and the photo background is consistent with the photo background shot by a scenic spot entrance camera;

then, the database is manually sorted by gender, and the training samples are divided into a male image set and a female image set. The sub-databases are named with English abbreviations. The first layer is the initial division by gender; the second layer divides each gender into young (YM), middle-aged (MM) and old (OM); the third layer divides age ranges; the fourth layer divides databases with smaller age intervals, where 'MM-i-13' is interpreted as the 3rd sub-database attached to the 1st database of the i-th middle-aged man; the fifth layer is age estimation;

finally, an average age estimation method is adopted, wherein Li is the age of the database, Nij represents the total number of the sub-database training, pictures are divided into a plurality of pictures of each person per year, and then the pictures are independently trained;

the training model of the scenic spot face recognition system is as follows:

firstly, face recognition pre-training is performed on a face database to obtain a deep learning face model; the model is then fine-tuned on a face attribute data set for the features of hair, eyes, nose, mouth and beard to obtain a face attribute model; the features of all fully connected layers of the network are concatenated as the face feature vector; and finally a random forest classifier is trained and tested on the data set;

then, the ages are divided into four age stage categories: 5-15 years old, 15-25 years old, 25-50 years old and over 50 years old. The cloud data management platform classifies the tourist ages obtained by the scenic spot face recognition system into these four age stages and counts the number of tourists in each stage; when tourists query scenic spot information in the terminal APP, they enter their age and gender, and the system pushes scenic spot data suited to that age and gender. However, that invention can only predict age groups and cannot predict a specific age value, its range of application scenes is narrow, and its deep learning face model does not use multi-person labeling with weighted averaging, so its results are inaccurate.

Disclosure of Invention

The technical problem to be solved by the invention is to provide a method, a system, equipment and a medium for analyzing the age and gender attributes of a 2D face image, which can quickly and accurately analyze the age and gender of the face image and statistically analyze the age and gender information in a camera in various scenes.

In a first aspect, the method of the present invention is implemented by: a method for analyzing the age and gender attribute of a 2D image of a human face comprises the following steps:

step S1, obtaining a 2D picture of a human face to be detected;

step S2, carrying out face detection on a single face 2D picture through the trained first neural network model to obtain the position of a face frame and the position of a face feature point; correcting and intercepting the picture according to the position of the face frame and the position of the facial feature point to obtain a corrected and standardized 2D picture of the face;

step S3, performing age and gender attribute prediction on the corrected and standardized human face 2D picture through the trained second neural network model to obtain an original prediction value;

step S4, determining the age and gender attribute of the face according to the original predicted value and the age and gender attribute selection strategy, and outputting the predicted age and gender;

and step S5, outputting the result of the predicted age and sex to the background and recording the result into a database for subsequent data analysis.

In a second aspect, the system of the present invention is implemented as: an age gender attribute analysis system of a 2D image of a human face, comprising:

the data acquisition module is used for acquiring a 2D picture of a face to be detected;

the first neural network model is used for carrying out face detection on a single face 2D picture to obtain the position of a face frame and the position of a face characteristic point; correcting and intercepting the picture according to the position of the face frame and the position of the facial feature point to obtain a corrected and standardized 2D picture of the face;

the second neural network model is used for carrying out age and gender attribute prediction on the corrected and standardized human face 2D picture to obtain an original prediction value;

the prediction module is used for determining the age and gender attribute of the face according to the original prediction value and the age and gender attribute selection strategy and outputting the predicted age and gender;

and the result output module is used for outputting the predicted age and sex results to a background and recording the results into a database for subsequent data analysis.

In a third aspect, the computer apparatus of the present invention is realized: a computer device comprising a memory storing a computer program and a processor implementing the method of the invention as described above when the processor executes the computer program.

In a fourth aspect, the medium of the present invention is realized by: a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 5.

Compared with the prior art, the invention has the following beneficial effects:

(1) the invention relates to an age and gender attribute analysis method, a system and computer equipment based on a 2D face photo, which can quickly detect a face frame and facial feature points in a picture through a face detection neural network model and output the position of the face frame and the position of the facial feature points; and after the face picture is corrected and amplified, a corrected and standardized face picture is intercepted.

(2) According to the age and gender attribute analysis method and system based on the 2D face photo and the computer device, the age and gender attribute information of the face in the camera can be detected quickly and accurately, so that a store owner can accurately master the age and gender distribution of customers in a store, and an effective strategy can be made by using the analyzed data to improve the turnover.

(3) The predicted age range of the invention is 0 to 90 years. After the data are acquired, the model predicts a very accurate apparent age, and the final prediction result is an age value and a gender rather than an age range and a gender.

(4) In the invention, the data originate from real usage scenes, and multi-person labeling with weighted averaging is adopted to make the labels more accurate. After face detection, the faces are uniformly corrected and cropped for standardization. For the age and gender prediction model, the base model was chosen after reading a large number of related papers, and a feature processing branch structure was designed. An accurate age value and gender are obtained from the output of the age and gender prediction model through a post-processing flow.

Drawings

The invention will be further described with reference to the following examples with reference to the accompanying drawings.

FIG. 1 is a flow chart illustrating the use of the age and gender attribute analysis method based on 2D face photos in a real scene according to the present invention;

FIG. 2 is a diagram of a neural network model for face detection according to an embodiment of the present invention; wherein 2(a) is a P-Net network structure diagram of a face detection model; 2(b) is an R-Net network structure diagram of the face detection model; 2(c) is an O-Net network structure diagram of the face detection model;

FIG. 3 is a graph of the age and gender prediction effect in an actual scene according to the present invention, in which the predicted gender (M for male and F for female) and the corresponding predicted value (range 0-1; the closer to 0, the more female-like, and the closer to 1, the more male-like) are marked in the upper left corner of the picture, and the position of the detected face frame and the positions of the five coordinate points (left eye pupil, right eye pupil, nose tip, left mouth corner and right mouth corner) are drawn in the picture;

fig. 4 is an architecture diagram of the system of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.

On one hand, the invention provides an age and gender attribute analysis method of a face 2D image, which can meet the requirements of some scenes needing to judge the age and gender of the face by analyzing a video by using a deep learning face detection algorithm and a face age and gender analysis algorithm. The method can effectively, quickly and accurately detect the face position and the feature point in the video or/and the picture and predict the age and gender attribute of the face of the person, so that the age and gender attribute analysis of the face picture can be performed on some projects or scenes with requirements on the age and gender attribute of the face, and further the related data can be better analyzed and utilized.

As shown in fig. 1, the method of the present invention comprises:

step S1, obtaining a 2D picture of a human face to be detected;

On the other hand, as shown in fig. 4, the present invention further provides an age and gender attribute analysis system for a 2D image of a human face, comprising:

In still another aspect, the present invention further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor executes the computer program to implement the method for analyzing the age and gender attribute of the 2D image of the human face according to the present invention.

In still another aspect, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method for analyzing the age and gender attribute of a 2D image of a human face according to the present invention.

The specific steps for realizing the scheme of the invention are as follows:

training of neural network model

The training data mainly comprise two parts. First, the online public data set IMDB-WIKI is collected for pre-training. Second, face data captured by cameras in real scenes are collected to optimize the model and improve prediction accuracy in real scenes, because a neural network model performs well on data from the same scene as its training data but suffers a large loss of performance in different scenes.

1. Training of a first neural network

Firstly, pictures and videos of various people captured by cameras in various scenes are collected; then the face area is manually marked with a bounding rectangle, and five facial feature points (left eye pupil, right eye pupil, nose tip, left mouth corner and right mouth corner) are manually marked; the labeled data and the corresponding labels are fed to the first neural network for training. In a specific embodiment, the first neural network model adopts the MTCNN (Multi-task Cascaded Convolutional Networks) face detection model, which is composed of three network structures, namely P-Net (Proposal Network), R-Net (Refine Network) and O-Net (Output Network), and obtaining the face frame position and the facial feature point positions includes three stages:

(1) obtaining a candidate window of a face area and a regression vector of a boundary frame by the P-Net network, performing regression by using the regression vector of the boundary frame, calibrating the candidate window, combining highly overlapped candidate frames by non-maximum suppression, and outputting an initial face frame prediction result and five face feature points;

(2) removing the false-positive areas through bounding box regression and non-maximum suppression by the R-Net network, and outputting a more accurate face frame prediction result and five facial feature points;

(3) and further removing the false-positive areas through bounding box regression and non-maximum suppression by the O-Net network, and outputting a more accurate face frame prediction result and five facial feature points.
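All three stages above rely on non-maximum suppression to merge highly overlapping candidate frames. The following is a minimal sketch of greedy NMS in Python with NumPy, not the patent's own implementation; the function name `nms`, the `[x1, y1, x2, y2]` box layout and the default `iou_threshold` of 0.5 are assumptions chosen for illustration.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression (illustrative sketch).

    boxes:  (N, 4) array-like of [x1, y1, x2, y2] (assumed layout).
    scores: (N,) array-like of face confidences.
    Returns the indices of the kept boxes, highest score first.
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores)  # candidate indices, best score first
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection rectangle between the best box and the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        yy1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        xx2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        yy2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop every remaining box that overlaps the kept box too much
        order = rest[iou <= iou_threshold]
    return keep
```

For example, two candidates covering nearly the same face collapse into the higher-scoring one, while a distant candidate survives.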

These three networks are described in detail below:

P-Net network: the network structure is shown in fig. 2(a). It takes 12 pixel × 12 pixel × 3 channel as network input and obtains a 1 × 1 × 32 output after passing through a 3 × 3 convolutional layer -> MaxPooling layer -> 3 × 3 convolutional layer.

R-Net network: the network structure is shown in fig. 2(b). It mainly removes false-positive areas (areas that the network predicts as faces but that are not actually faces) through bounding box regression and NMS. Its structure differs from that of the P-Net network in that the input is changed to 24 pixel × 24 pixel × 3 channel and a fully connected layer is added, so that false positives are suppressed more effectively.

O-Net network: the network structure is shown in fig. 2(c). The input is further enlarged to 48 pixel × 48 pixel × 3 channel so that the input information is more detailed, and the network has one more convolutional layer than R-Net. Its function is the same as that of R-Net, but it applies more supervision to the face area. As the last stage of the whole model, its five output facial feature points (landmarks: the left eye pupil, the right eye pupil, the nose tip, the leftmost point of the mouth and the rightmost point of the mouth) are more accurate than those of the first two stages. All three small networks output the coordinates of the five facial feature points, but because the inputs of the P-Net and R-Net networks are small and carry very little facial feature point information, the weight coefficient of the facial feature point regression loss is set to a relatively small 0.5 in the first two stages and to a relatively large 1.0 in the final O-Net stage. Because the O-Net input is the largest of the three small networks, facial features can be extracted most accurately there, so in practice the facial feature points output by the O-Net stage are used as the facial feature point prediction result.

The loss function of the MTCNN face detection model mainly comprises 3 parts: a face classification loss function (face/non-face classifier), a face frame loss function (bounding box regression), and a facial feature point loss function (feature point localization).

(a) The face classification loss function is expressed as follows:

$$L_i^{det} = -\left(y_i^{det}\log(p_i) + \left(1 - y_i^{det}\right)\log(1 - p_i)\right)$$

wherein i represents the i-th sample; $p_i$ represents the probability that the i-th sample is a human face, $p_i \in [0, 1]$; $y_i^{det}$ represents the real label of the i-th sample and takes the value 0 or 1, $y_i^{det} \in \{0, 1\}$;

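(b) the face frame loss function is represented as follows:

$$L_i^{box} = \left\| \hat{y}_i^{box} - y_i^{box} \right\|_2^2$$

wherein $\hat{y}_i^{box}$ is the face frame predicted by the network and $y_i^{box}$ is the real labeled face frame; each is a quadruple consisting of the horizontal and vertical coordinates of the upper left corner of the face frame, the length of the face frame and the width of the face frame;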

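(c) the facial feature point loss function is represented as follows:

$$L_i^{landmark} = \left\| \hat{y}_i^{landmark} - y_i^{landmark} \right\|_2^2$$

wherein $\hat{y}_i^{landmark}$ is the set of facial feature point coordinates predicted by the network and $y_i^{landmark}$ is the set of actual real facial feature point coordinates; each is a ten-tuple consisting of the coordinates of the 5 facial feature points.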

In summary, the overall loss function of the entire model training process can be expressed as follows:

$$\min \sum_{i=1}^{N} \sum_{j \in \{det,\, box,\, landmark\}} \alpha_j \, \beta_i^j \, L_i^j$$

P-Net, R-Net: ($\alpha_{det} = 1$, $\alpha_{box} = 0.5$, $\alpha_{landmark} = 0.5$)

O-Net: ($\alpha_{det} = 1$, $\alpha_{box} = 0.5$, $\alpha_{landmark} = 1$)

wherein N is the preset number of positive face frame samples; $\alpha_{det}$, $\alpha_{box}$ and $\alpha_{landmark}$ are the weights of the face classification loss, the face frame loss and the facial feature point loss, respectively; $\beta_i^j \in \{0, 1\}$ indicates whether the input is a human face; and $L_i^{det}$, $L_i^{box}$ and $L_i^{landmark}$ are the face classification loss function, the face frame loss function and the facial feature point loss function, respectively.

As can be seen from the above, all three loss functions are computed during training, but not every loss is meaningful for every input, so the formula above is defined to control which losses are used for different inputs and to assign them different weights. In the P-Net and R-Net networks, the loss weight $\alpha_{landmark}$ of facial feature point regression is smaller than in the O-Net network, because the first two stages focus on filtering out non-face bounding boxes. The indicator $\beta$ exists so that, for a non-face input for example, only the meaningful face classification loss is calculated, while the bounding box and facial feature point regression losses, which are meaningless for a non-face region, are not.
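The weighting described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the patent's training code: the per-stage weights are the ones quoted in the text, while the function names, the sample dictionary keys (`p`, `y_det`, `box_pred`, `box_true`, `lm_pred`, `lm_true`) and the simplification that the indicator β is 1 for the regression terms only when the sample is a face are hypothetical.

```python
import math

# Per-stage loss weights as quoted in the text.
ALPHA = {
    "P-Net": {"det": 1.0, "box": 0.5, "landmark": 0.5},
    "R-Net": {"det": 1.0, "box": 0.5, "landmark": 0.5},
    "O-Net": {"det": 1.0, "box": 0.5, "landmark": 1.0},
}

def det_loss(p, y):
    """Cross-entropy for face / non-face; p in (0, 1), y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def l2_loss(pred, target):
    """Squared Euclidean distance, used for box and landmark regression."""
    return sum((a - b) ** 2 for a, b in zip(pred, target))

def sample_loss(stage, sample):
    """Weighted loss of one sample; beta switches off meaningless terms.

    Assumption: regression terms count only when the sample is a face.
    """
    alpha = ALPHA[stage]
    is_face = sample["y_det"] == 1
    beta = {"det": 1, "box": int(is_face), "landmark": int(is_face)}
    losses = {
        "det": det_loss(sample["p"], sample["y_det"]),
        "box": l2_loss(sample["box_pred"], sample["box_true"]) if is_face else 0.0,
        "landmark": l2_loss(sample["lm_pred"], sample["lm_true"]) if is_face else 0.0,
    }
    return sum(alpha[j] * beta[j] * losses[j] for j in alpha)
```

With the same face sample, the O-Net stage weights the landmark error twice as heavily as the P-Net stage, while a non-face sample contributes only its classification loss in every stage.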

After training, a deep learning neural network model capable of accurately detecting the face frame and the facial feature points is obtained and used for predicting the positions of the face frame and the facial feature points in the video or/and the picture, and then the face is extracted for next age and gender attribute analysis of the extracted face.
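The correction and cropping that follow detection (levelling the two pupils, then enlarging the face frame before cropping, as in steps S22 and S23 of the claims) can be sketched with plain coordinate geometry. This is a minimal sketch, not the patent's implementation: the function names, the fraction `t` standing in for the preset position along the longitudinal line, and the `scale` argument standing in for the preset enlargement ratio are all hypothetical, and the actual image rotation and cropping would be done with an image library of choice.

```python
import math

def alignment_params(left_eye, right_eye, mouth_left, mouth_right, t=0.5):
    """Return (angle_deg, center) needed to level the pupils.

    t is a hypothetical preset fraction along the line from the pupil
    midpoint to the mouth midpoint, measured from the top.
    """
    # Angle between the pupil line and the horizontal
    angle = math.degrees(math.atan2(right_eye[1] - left_eye[1],
                                    right_eye[0] - left_eye[0]))
    eye_mid = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
    mouth_mid = ((mouth_left[0] + mouth_right[0]) / 2,
                 (mouth_left[1] + mouth_right[1]) / 2)
    # Rotation center on the longitudinal line
    center = (eye_mid[0] + t * (mouth_mid[0] - eye_mid[0]),
              eye_mid[1] + t * (mouth_mid[1] - eye_mid[1]))
    return angle, center

def rotate_point(pt, center, angle_deg):
    """Rotate pt about center by -angle_deg, undoing the measured tilt."""
    a = math.radians(-angle_deg)
    dx, dy = pt[0] - center[0], pt[1] - center[1]
    return (center[0] + dx * math.cos(a) - dy * math.sin(a),
            center[1] + dx * math.sin(a) + dy * math.cos(a))

def enlarge_box(box, scale, width, height):
    """Scale [x1, y1, x2, y2] about its center and clip to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    hw, hh = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    return (max(0.0, cx - hw), max(0.0, cy - hh),
            min(float(width), cx + hw), min(float(height), cy + hh))
```

Applying `rotate_point` to both pupils with the returned angle and center places them on the same horizontal line, which is the property the correction step is meant to guarantee.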

2. Training of a second neural network model

In a specific embodiment, the second neural network model uses LightCNN as a feature extraction layer, takes 128 pixel × 128 pixel × 3 channel as network input, and sets the output as a 512-dimensional vector serving as the extracted feature, followed by three parallel branches:

the first branch is used for predicting the gender, the prediction result is between 0 and 1, the closer to 1, the more the model determines that a male is in the picture, and the closer to 0, the more the model determines that a female is in the picture;

the second branch is used for classifying age groups. The predicted age range is set to 0-90 years old and is evenly divided into 18 segments of 5 years each, so the second branch outputs 18 results, each representing the confidence of one segment; the segment with the highest confidence is selected as the predicted age group during training and prediction;

the third branch also has 18 results output, each corresponding to a small range of adjustment values, combined with the results of the second branch to obtain a predicted age value.

For example, if the segment with the highest confidence in the second branch is the fifth age group, whose age range is [20, 25) and whose center age is 22.5 years, and the fifth output of the third branch is 1.2, then combining the results of the second and third branches gives a final predicted age of 22.5 + 1.2 = 23.7 ≈ 24 years.
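The way the second and third branches are combined can be sketched as follows. This is a minimal illustration of the worked example above, not the patent's post-processing code; the function name `decode_age` and its parameters are hypothetical.

```python
def decode_age(segment_confidences, adjustments, segment_width=5.0):
    """Combine the age-group branch and the adjustment branch.

    segment_confidences: 18 confidences, one per 5-year segment (second branch).
    adjustments: 18 adjustment values, one per segment (third branch).
    """
    if len(segment_confidences) != len(adjustments):
        raise ValueError("both branches must output the same number of values")
    # Index of the segment with the highest confidence
    k = max(range(len(segment_confidences)), key=segment_confidences.__getitem__)
    # Center age of segment k, e.g. 22.5 for the fifth segment [20, 25)
    center = k * segment_width + segment_width / 2
    return center + adjustments[k]

# Reproduces the example in the text: fifth segment, adjustment 1.2
conf = [0.01] * 18
conf[4] = 0.6
adj = [0.0] * 18
adj[4] = 1.2
age = decode_age(conf, adj)   # approximately 23.7
print(round(age))             # prints 24
```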

1) The first branch (gender prediction branch) adopts the mean square error MSELoss as its loss function, and the formula is as follows:

$$L_{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

wherein $\hat{y}$ represents the predicted probability value of the male gender attribute; y represents the true value of the gender attribute, y ∈ {0, 1}, where 0 represents that the picture is female and 1 represents that the picture is male; n represents the number of categories of all attributes;

2) the second branch (age group classification branch) adopts the cross entropy CELoss as its loss function, and the formula is as follows:

$$L_{CE} = -\sum_{i=1}^{n} y_i \log\left(\hat{y}_i\right)$$

wherein $\hat{y}$ represents the predicted probability values of all age groups; y represents the true values of all age groups, y ∈ {0, 1}, where 0 represents not in the age group and 1 represents in the age group, and for the same picture the label of only one age group is 1 and all others are 0; $\hat{y}_i$ represents the predicted probability value of the i-th age group; $y_i$ represents the true value of the i-th age group; n represents the number of all age groups;

3) and the third branch (intra-segment age adjustment branch) adopts cross-entropy CELoss as a loss function, and the formula is as follows:

wherein $\hat{y}$ represents the predicted regression values of the adjustment values of the age groups; y represents the true adjustment values of all age groups, y ∈ [-2.5, 2.5]; $\hat{y}_i$ represents the predicted regression value of the adjustment value of the i-th age group; $y_i$ represents the true regression value of the i-th age group; n represents the number of all age groups.

And obtaining a model capable of accurately predicting the age and gender attributes of the face through a large amount of training and parameter adjustment, and using the model for analyzing the age and gender attributes of the face.
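For reference, the two loss types named for the first and second branches can be written out in a few lines of plain Python. This is a generic sketch of mean square error and one-hot cross entropy, not the patent's training code; the function names and the small `eps` guard against log(0) are assumptions.

```python
import math

def mse_loss(pred, target):
    """Mean square error over n values (first branch, gender)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def cross_entropy_loss(probs, one_hot, eps=1e-12):
    """Cross entropy between predicted probabilities and a one-hot label
    (second branch, age group)."""
    return -sum(t * math.log(max(p, eps)) for p, t in zip(probs, one_hot))

# A male picture (label 1) predicted as 0.8
gender_loss = mse_loss([0.8], [1.0])
# Three age groups shown for brevity; the true group is the second one
age_group_loss = cross_entropy_loss([0.1, 0.7, 0.2], [0, 1, 0])
```

Both values shrink toward 0 as the prediction approaches the label, which is the behaviour the training process exploits.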

Use in real scenes

As shown in fig. 1, in a specific embodiment, using a trained first neural network model and a trained second neural network model to predict age and gender of data in a real scene, the embodiment specifically includes:

step S1, acquiring a 2D picture of a human face to be detected from a video screen;

FIG. 3 shows the age and gender prediction effect in an actual scene according to the present invention. The predicted gender (M for male and F for female), the corresponding predicted value (range 0-1; the closer to 0, the more female-like, and the closer to 1, the more male-like) and the predicted age are marked in the upper left corner of the picture, and the position of the detected face frame and the positions of the five coordinate points (left eye pupil, right eye pupil, nose tip, left mouth corner and right mouth corner) are drawn in the picture.

Claims

1. A method for analyzing age and gender attributes of a face 2D image, characterized in that the method comprises the following steps:

step S1, obtaining a 2D picture of a human face to be detected;

2. The method for analyzing age-gender attribute of a human face 2D image according to claim 1, wherein: the step S2 specifically includes:

step S21, carrying out face frame detection on a single picture through the trained first neural network model to obtain the position of a face frame and the position of a facial feature point; the position of the face frame comprises the coordinates of the upper left corner of the face frame and the coordinates of the lower right corner of the face frame; the facial feature points comprise left eye pupils, right eye pupils, nose tips, leftmost points of the mouth and rightmost points of the mouth; the position of the facial feature point comprises coordinates of the five facial feature points;

step S22, calculating the included angle between the line connecting the two pupils and the horizontal line according to the positions of the left eye pupil and the right eye pupil; connecting the midpoint of the pupil line with the midpoint of the line connecting the leftmost and rightmost points of the mouth to form a longitudinal line, and taking the point at a preset position along the longitudinal line, measured from top to bottom, as the center point of the image; rotating the picture about the center point in the reverse direction by the included angle to obtain a picture in which the two pupils are horizontal;

and step S23, enlarging the face frame by a preset ratio, and cropping the picture within the enlarged face frame to obtain the corrected and standardized face picture.

3. The method for analyzing age-gender attribute of a human face 2D image according to claim 1, wherein: the second neural network model uses LightCNN as a feature extraction layer, takes 128pixel × 128pixel × 3channel as network input, sets output as a 512-dimensional vector as an extracted feature, and is followed by three parallel branches:

the second branch is used for classifying age groups, the predicted age range is set to 0-90 years old and is evenly divided into 18 segments, so that 18 results are output in the second branch, each representing the confidence of one segment, and the segment with the highest confidence is selected as the predicted age group during training and prediction;

4. The method for analyzing age-gender attribute of a human face 2D image according to claim 3, wherein:

1) and the first branch adopts mean square error MSELoss as a loss function, and the formula is as follows:

2) and the second branch adopts cross entropy CELoss as a loss function, and the formula is as follows:

3) and the third branch adopts cross entropy CELoss as a loss function, and the formula is as follows:

wherein $\hat{y}$ represents the predicted regression values of the adjustment values of the age groups; y represents the true adjustment values of all age groups, y ∈ [-2.5, 2.5]; $\hat{y}_i$ represents the predicted regression value of the adjustment value of the i-th age group; $y_i$ represents the true regression value of the i-th age group; n represents the number of all age groups.

5. A system for analyzing age and gender attributes of a face 2D image, characterized in that the system comprises:

6. The age-gender attribute analysis system of a human face 2D image as claimed in claim 5, wherein: the first neural network model is specifically configured to:

carrying out face frame detection on a single picture to obtain the position of a face frame and the position of a facial feature point; the position of the face frame comprises the coordinates of the upper left corner of the face frame and the coordinates of the lower right corner of the face frame; the facial feature points comprise left eye pupils, right eye pupils, nose tips, leftmost points of the mouth and rightmost points of the mouth; the position of the facial feature point comprises coordinates of the five facial feature points;

calculating the included angle between the line connecting the two pupils and the horizontal line according to the positions of the left eye pupil and the right eye pupil; connecting the midpoint of the pupil line with the midpoint of the line connecting the leftmost and rightmost points of the mouth to form a longitudinal line, and taking the point at a preset position along the longitudinal line, measured from top to bottom, as the center point of the image; rotating the picture about the center point in the reverse direction by the included angle to obtain a picture in which the two pupils are horizontal;

and enlarging the face frame by a preset ratio, and cropping the picture within the enlarged face frame to obtain a corrected and standardized face picture.

7. The age-gender attribute analysis system of a human face 2D image as claimed in claim 5, wherein: the second neural network model uses LightCNN as a feature extraction layer, takes 128pixel × 128pixel × 3channel as network input, sets output as a 512-dimensional vector as an extracted feature, and is followed by three parallel branches:

8. The system for analyzing age-gender attribute of a human face 2D image according to claim 7, wherein:

9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that: the processor, when executing the computer program, implementing the method of any of claims 1 to 5.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 5.