CN110443189A - Face attribute recognition method based on multi-task multi-label learning convolutional neural network - Google Patents
Face attribute recognition method based on multi-task multi-label learning convolutional neural network
- Publication number
- CN110443189A CN110443189A CN201910704048.3A CN201910704048A CN110443189A CN 110443189 A CN110443189 A CN 110443189A CN 201910704048 A CN201910704048 A CN 201910704048A CN 110443189 A CN110443189 A CN 110443189A
- Authority
- CN
- China
- Prior art keywords
- attribute
- face
- face attribute
- sample
- convolutional neural
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/048—Activation functions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- General Physics & Mathematics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- Life Sciences & Earth Sciences (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
A face attribute recognition method based on a multi-task multi-label learning convolutional neural network, relating to computer vision. Multi-task learning is first used to learn two tasks jointly: facial key-point detection and face attribute recognition. Considering that different attributes differ in learning difficulty and convergence speed, the attributes are divided into subjective and objective attributes, and a dynamic-weight and adaptive-threshold strategy is used to accelerate network convergence and mitigate the sample-imbalance problem. Finally, given the trained network model, the outputs of the subjective-attribute and objective-attribute sub-networks are taken as the final face attribute recognition result. The dynamic-weight scheme and adaptive-threshold adjustment relieve label imbalance while accelerating network convergence; spatial pyramid pooling is used to train three different sub-networks, achieving end-to-end training for multi-task recognition of multiple face attributes. The method improves the accuracy of face attribute recognition, especially for the harder subjective attributes.
Description
Technical field
The present invention relates to computer vision, and specifically to a face attribute recognition method based on a multi-task multi-label learning convolutional neural network.
Background technique
In the past few years, face attribute recognition has attracted great interest in computer vision and pattern recognition. Its main applications include image retrieval, face recognition, person re-identification, micro-expression recognition, image generation, and recommender systems. The face attribute recognition task is: given a face image, predict multiple face attributes, such as gender, attractiveness, and smiling. Although face attribute recognition is an image-level classification task, it still poses many challenges, mainly due to variations in facial pose, illumination, and facial expression.
Recently, owing to the outstanding performance of convolutional neural networks (CNNs), convolutional networks have been applied to face attribute classification. Roughly, these methods can be divided into single-label learning and multi-label learning. In general, they first perform face detection/alignment and then predict the face attributes. These tasks are usually trained separately, so the internal relations between them are often ignored, even though the tasks are closely related; for example, the key points around the mouth can help determine whether a person is smiling. Meanwhile, although some multi-label learning methods do learn to discriminate multiple face attributes simultaneously, they treat every attribute equally (using the same network architecture for all attributes) and ignore the attributes' different learning complexities (e.g., learning to predict the "eyeglasses" attribute may be easier than recognizing "oval face"). Some attributes (e.g., "big lips", "heavy makeup") are highly subjective; they are hard for a machine to distinguish and can sometimes even confuse humans. Beyond these problems, training sets frequently suffer from label imbalance (for example, positive samples of the "bald" attribute are very scarce), and rebalancing multi-label data is very difficult: balancing one attribute often disturbs the balance of another.
Chinese patent application CN201811093395.9 discloses a face attribute recognition method based on multi-instance multi-label deep transfer learning, comprising the following steps: prepare a face image dataset and, for each face image, extract multiple neural-layer features from a deep convolutional neural network transfer model and combine them into a multi-layer face feature; build a network model that extracts multi-label relation features, taking the multi-layer face feature as input and the multiple face attribute labels as ground truth, and train the network model parameters; design a linear binary classifier for each face attribute, use the trained multi-label relation-feature network model as a feature extractor migrated into the multi-attribute classifier model, and train each linear binary classifier on the face image dataset. That invention adopts transfer learning to quickly and effectively migrate a strong pretrained model to the selected dataset, builds and trains a structurally simple multi-label relation-feature model, and simultaneously trains linear binary classifiers for multiple face attributes.
Summary of the invention
In view of the above problems in the prior art, the object of the present invention is to provide a face attribute recognition method based on a multi-task multi-label learning convolutional neural network.
The present invention first uses multi-task learning to learn two tasks jointly: facial key-point detection and face attribute recognition. Considering that different attributes differ in learning difficulty and convergence speed, the attributes are divided into two classes, subjective and objective, and a dynamic-weight and adaptive-threshold strategy is used to accelerate network convergence and mitigate the sample-imbalance problem. Finally, given the trained network model, the face attribute recognition results of the subjective-attribute and objective-attribute sub-networks are taken as the final result.
The present invention specifically includes the following steps:
1) prepare a training sample set and a validation sample set;
2) feed each face image in the training sample set through a feature extraction network to obtain a feature map;
3) apply spatial pyramid pooling (SPP) at different levels to the feature map to obtain features of different dimensions;
4) use multi-task learning to perform facial key-point detection and face attribute recognition simultaneously;
5) divide the face attributes into two classes, subjective and objective, taking the features of different dimensions as input;
6) use a dynamic loss weight and adaptive threshold mechanism in the deep convolutional neural network to compute the loss weight of each face attribute and adjust the decision-boundary threshold; feed all images of the training sample set into the deep convolutional neural network for end-to-end training with back-propagation, and feed the validation samples into the network to derive the loss weights and decision boundaries from the classification results;
7) perform face attribute recognition with the trained network model; the subjective-attribute and objective-attribute sub-networks output the recognition results.
In step 1), the specific method of preparing the training and validation sample sets can be as follows:
(1) obtain the annotations for facial key-point detection and face attribute recognition separately;
(2) merge the key-point and attribute annotations to form the training and validation sample sets. The training set is denoted {(x_i^train, y_i^landmark, y_i^attr)}, i = 1, ..., N, where N is the number of training samples, and the validation set is denoted {(x_j^val, y_j^landmark, y_j^attr)}, j = 1, ..., M, where M is the number of validation samples; landmark_p is the number of facial key points, attr_q is the total number of face attribute classes, and N, M, landmark_p, attr_q are natural numbers. x_i^train and x_j^val denote a face sample image of the training set and validation set respectively; y^landmark denotes the coordinates of the landmark_p facial key points; y^attr denotes the labels of the attr_q face attributes, each taking the value 1 or -1, where 1 indicates the attribute is present in the image and -1 indicates it is absent.
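The merged annotation layout above can be sketched as a small record structure; the field names and the concrete landmark_p/attr_q values below are illustrative assumptions, not fixed by the text:

```python
# Illustrative sketch of one merged annotation record: an image path,
# landmark_p key-point coordinates, and attr_q attribute labels in {+1, -1}.
landmark_p = 72   # number of facial key points (assumed value)
attr_q = 40       # total number of face attributes (assumed value)

sample = {
    "image": "train/000001.jpg",             # x_i^train
    "landmarks": [(0.0, 0.0)] * landmark_p,  # (x, y) per key point
    "attributes": [1, -1] * (attr_q // 2),   # +1 = present, -1 = absent
}

def is_valid(s):
    """Check that a record matches the annotation layout described above."""
    return (len(s["landmarks"]) == landmark_p
            and len(s["attributes"]) == attr_q
            and all(a in (1, -1) for a in s["attributes"]))
```

The same record shape serves both the training set and the validation set; only the index range (i = 1..N vs. j = 1..M) differs.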
In step 2), the specific steps of obtaining a feature map through a feature extraction network can be as follows:
(1) normalize all original images to 224 × 224;
(2) use a ResNet-50 network (K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770-778) with the final global average pooling layer and fully connected classification layer removed, obtaining a 7 × 7 feature map.
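As a sanity check on the sizes above: ResNet-50 halves the spatial resolution five times (an overall stride of 32), so a 224 × 224 input indeed ends in a 7 × 7 feature map once the global pooling and classifier are removed. A minimal sketch:

```python
def resnet50_feature_map_size(input_size):
    """Spatial size of ResNet-50's last conv feature map.
    Stride-2 stages: conv1, max pool, conv3_x, conv4_x, conv5_x -> total stride 32."""
    total_stride = 2 * 2 * 2 * 2 * 2  # = 32
    return input_size // total_stride
```

So `resnet50_feature_map_size(224)` gives 7, matching the 7 × 7 map used by the pooling step that follows.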
In step 3), the specific steps of applying spatial pyramid pooling at different levels to the feature map to obtain features of different dimensions can be as follows:
(1) apply 1 × 1 spatial pyramid pooling (K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904-1916, 2015) to the feature map to obtain a 2048-dimensional feature, which is connected to the fully connected layers of the two sub-networks for facial key-point detection and objective attribute recognition;
(2) apply 3 × 3 spatial pyramid pooling to the feature map to obtain a 28672-dimensional feature, which is connected to the fully connected layer of the subjective attribute recognition sub-network.
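Spatial pyramid pooling at a single level can be sketched in NumPy as below. This is a generic max-pooling variant under stated assumptions: the output has level² × C values, so with C = 2048 channels the 1 × 1 level yields the 2048-dimensional vector quoted above, while the exact 28672 figure in the text depends on the channel count and levels actually used.

```python
import numpy as np

def spp(fmap, level):
    """Max-pool a C x H x W feature map into a level x level grid and
    flatten it, in the spirit of He et al.'s spatial pyramid pooling."""
    c, h, w = fmap.shape
    pooled = np.empty((c, level, level))
    for i in range(level):
        for j in range(level):
            # adaptive bin edges: floor for the start, ceil for the end
            h0, h1 = i * h // level, ((i + 1) * h + level - 1) // level
            w0, w1 = j * w // level, ((j + 1) * w + level - 1) // level
            pooled[:, i, j] = fmap[:, h0:h1, w0:w1].max(axis=(1, 2))
    return pooled.reshape(-1)
```

On a 2048 × 7 × 7 ResNet-50 map, `spp(fmap, 1)` returns a fixed 2048-dimensional vector and `spp(fmap, 3)` a 9 × 2048-dimensional one, regardless of the input map's spatial size; that fixed dimensionality is what lets the pooled features feed fully connected layers directly.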
In step 4), the specific method of multi-task learning can be as follows:
(1) the facial key-point detection task predicts the relative coordinate values of 144 feature points for a given image; for each training sample x_i^train, the key-point detection loss is computed with the MSE loss function, where n denotes the number of samples and the predicted and ground-truth facial key-point labels are compared;
(2) the face attribute analysis task can be regarded as a set of binary classification problems; for each training sample x_i^train, the face attribute loss is computed from the predicted value and ground-truth label of the j-th attribute of the i-th sample.
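The two losses can be sketched as follows. The MSE form follows the text; the sigmoid cross-entropy used for the per-attribute binary loss is an assumption, since the patent's attribute-loss formula is an image not reproduced in the text:

```python
import numpy as np

def landmark_mse(pred, true):
    """MSE loss between predicted and ground-truth key-point coordinates
    (pred, true: n x 2*landmark_p arrays of relative coordinates)."""
    return np.mean((pred - true) ** 2)

def attribute_loss(logits, labels):
    """Per-attribute binary loss. Labels are in {+1, -1} as in the annotation;
    sigmoid cross-entropy here is an assumed stand-in for the patent's formula."""
    y = (labels + 1) / 2.0                 # map {-1, +1} -> {0, 1}
    p = 1.0 / (1.0 + np.exp(-logits))      # sigmoid probability of "present"
    eps = 1e-12                            # numerical floor for the logs
    return -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
```

A perfect landmark prediction gives zero MSE, and an uninformative attribute logit of 0 gives a loss of log 2 per attribute, which is the usual chance-level baseline for a balanced binary task.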
In step 5), the subjective attributes include subjective face attributes such as arched eyebrows, attractive, bags under the eyes, big lips, big nose, blurry, bushy eyebrows, chubby, heavy makeup, high cheekbones, narrow eyes, oval face, pointy nose, rosy cheeks, smiling, straight hair, wavy hair, and young; the objective attributes include objective face attributes such as five o'clock shadow, bald, bangs, black hair, blond hair, brown hair, double chin, eyeglasses, goatee, gray hair, male, mouth slightly open, mustache, no beard, pale skin, receding hairline, sideburns, wearing earrings, wearing a hat, wearing lipstick, wearing a necklace, and wearing a necktie. The subjective-attribute sub-network has three fully connected layers with 2048, 1024, and 22 nodes respectively, and the objective-attribute sub-network has two fully connected layers with 1024 and 22 nodes respectively.
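The layer widths of the two heads can be sketched as a toy forward pass; random weights stand in for the trained parameters, so only the shapes are meaningful here:

```python
import numpy as np

def mlp_forward(x, layer_sizes, rng):
    """Forward pass through a stack of fully connected layers with ReLU
    between hidden layers; weights are random placeholders."""
    dims = [x.shape[-1]] + list(layer_sizes)
    for k in range(len(layer_sizes)):
        x = x @ (rng.standard_normal((dims[k], dims[k + 1])) * 0.01)
        if k < len(layer_sizes) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x

rng = np.random.default_rng(0)
# subjective head: three FC layers (2048, 1024, 22 nodes) on the 3x3 SPP feature
subjective_out = mlp_forward(np.ones((1, 28672)), [2048, 1024, 22], rng)
# objective head: two FC layers (1024, 22 nodes) on the 1x1 SPP feature
objective_out = mlp_forward(np.ones((1, 2048)), [1024, 22], rng)
```

Both heads end in a vector of per-attribute scores; the deeper subjective head reflects the text's premise that subjective attributes are harder to learn than objective ones.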
In step 6), the dynamic loss weight and adaptive threshold mechanism can be specified as follows:
each attribute's weight in back-propagation is set according to the trend of its loss on the validation set, the weight of the j-th attribute in the t-th iteration being determined by that attribute's loss on the validation set;
the adaptive threshold is adjusted by an update formula in which τ_t is a threshold vector whose dimension equals the number of attributes, l denotes the current number of full training epochs, V is the total number of validation samples, the two counts are respectively the numbers of validation samples wrongly classified as positive and wrongly classified as negative, and γ is a constant set to 0.01.
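Since the weight and threshold formulas themselves are images not reproduced in the text, the following is only one plausible reading of the mechanism described in words: loss-proportional per-attribute weights, and thresholds nudged by the normalized false-positive/false-negative imbalance on the validation set (sign convention assumed):

```python
import numpy as np

GAMMA = 0.01  # the constant gamma from the text

def dynamic_weights(val_losses):
    """Assumed reading of the dynamic-weight rule: give each attribute a
    backprop weight proportional to its current validation loss, so
    slow-converging attributes receive more emphasis (weights average to 1)."""
    val_losses = np.asarray(val_losses, dtype=float)
    return val_losses * len(val_losses) / val_losses.sum()

def update_thresholds(tau, false_pos, false_neg, n_val):
    """Assumed adaptive-threshold step: shift each attribute's decision
    boundary by gamma times the normalized imbalance between false
    positives and false negatives measured on the validation set."""
    imbalance = (np.asarray(false_pos, dtype=float)
                 - np.asarray(false_neg, dtype=float)) / n_val
    return np.asarray(tau, dtype=float) + GAMMA * imbalance
```

Under this reading, an attribute that over-predicts positives (many false positives) has its threshold raised slightly each round, and vice versa, which is how the mechanism can ease the positive/negative imbalance without resampling the data.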
The invention proposes a face attribute recognition method based on a multi-task multi-label learning convolutional neural network. Multi-task learning trains the two related tasks of facial key-point detection and face attribute recognition simultaneously, so as to exploit their inner connection. At the same time, the attributes are divided into two classes, objective and subjective, and, in view of the differences between face attributes, different network structures are used to learn their features and classifiers, thereby predicting multiple face attributes simultaneously and improving the accuracy of face attribute recognition. The present invention performs face attribute recognition effectively: algorithm analysis shows that, compared with conventional face attribute recognition algorithms, the present invention improves the accuracy of face attribute recognition, especially for the harder subjective attributes. Using the dynamic-weight scheme and adaptive-threshold adjustment, the present invention relieves label imbalance while accelerating network convergence. Using spatial pyramid pooling, the present invention trains three different sub-networks, achieving end-to-end training for multi-task recognition of multiple face attributes.
Brief description of the drawings
Fig. 1 is the framework diagram of an embodiment of the present invention.
Detailed description of the embodiments
The following embodiment elaborates the method of the present invention with reference to the accompanying drawing. The embodiment is implemented on the premise of the technical solution of the present invention and gives an implementation and a specific operating process, but the protection scope of the present invention is not limited to the embodiment below.
Referring to Fig. 1, the embodiment of the present invention includes following steps:
1. Prepare the training sample set and validation sample set. The facial key-point labels are obtained with open-source Python libraries; the face attribute labels come with the database.
A1. Obtain the annotations for facial key-point detection and face attribute recognition separately.
A2. Merge the key-point and attribute annotations to form the training and validation sample sets {(x_i^train, y_i^landmark, y_i^attr)}, i = 1, ..., N, and {(x_j^val, y_j^landmark, y_j^attr)}, j = 1, ..., M, where N is the number of training samples, M is the number of validation samples, landmark_p is the number of facial key points, attr_q is the total number of face attribute classes, and N, M, landmark_p, attr_q are natural numbers; x_i^train and x_j^val denote a sample image of the training set and validation set respectively; y^landmark denotes the coordinates of the landmark_p facial key points; y^attr denotes the labels of the attr_q face attributes.
2. Adjust any given picture to a fixed size first, specifically as follows:
B1. Normalize all original images to 224 × 224.
B2. Use a ResNet-50 network (K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2016, pp. 770-778) with the final global average pooling layer and fully connected classification layer removed, obtaining a 7 × 7 feature map.
3. Apply spatial pyramid pooling at different levels to the obtained feature map to obtain features of different dimensions, specifically as follows:
C1. Apply 1 × 1 spatial pyramid pooling (K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE Trans. Pattern Anal. Mach. Intell., vol. 37, no. 9, pp. 1904-1916, 2015) to the feature map to obtain a 2048-dimensional feature, connected to the fully connected layers of the two sub-networks for facial key-point detection and objective attribute recognition.
C2. Apply 3 × 3 spatial pyramid pooling to the feature map to obtain a 28672-dimensional feature, connected to the fully connected layer of the subjective attribute recognition sub-network.
4. Multi-task learning, further performing the facial key-point detection task:
D1. The facial key-point detection task predicts the relative coordinate values of 144 feature points for a given image. For each training sample x_i^train, the key-point detection loss is computed with the MSE loss function, where n denotes the number of samples and the predicted and ground-truth facial key-point labels are compared.
D2. The face attribute analysis task can be regarded as a set of binary classification problems. For each training sample x_i^train, the face attribute loss is computed from the predicted value and ground-truth label of the j-th attribute of the i-th sample.
5. Divide the attributes into two groups, subjective and objective, and then obtain the output results through different network structures.
The 18 subjective face attributes are: arched eyebrows, attractive, bags under the eyes, big lips, big nose, blurry, bushy eyebrows, chubby, heavy makeup, high cheekbones, narrow eyes, oval face, pointy nose, rosy cheeks, smiling, straight hair, wavy hair, and young. The 22 objective face attributes are: five o'clock shadow, bald, bangs, black hair, blond hair, brown hair, double chin, eyeglasses, goatee, gray hair, male, mouth slightly open, mustache, no beard, pale skin, receding hairline, sideburns, wearing earrings, wearing a hat, wearing lipstick, wearing a necklace, and wearing a necktie. The subjective-attribute sub-network has three fully connected layers with 2048, 1024, and 22 nodes respectively; the objective-attribute sub-network has two fully connected layers with 1024 and 22 nodes respectively.
6. During training, use the dynamic loss weight and adaptive threshold mechanism to accelerate convergence and relieve the positive/negative class imbalance, specifically as follows:
each attribute's weight in back-propagation is set according to the trend of its loss on the validation set, the weight of the j-th attribute in the t-th iteration being determined by that attribute's loss on the validation set.
The adaptive threshold is adjusted by an update formula in which τ_t is a threshold vector whose dimension equals the number of attributes, l denotes the current number of full training epochs, V is the total number of validation samples, the two counts are respectively the numbers of validation samples wrongly classified as positive and wrongly classified as negative, and γ is a constant set to 0.01.
Table 1 compares the proposed method with other face attribute recognition methods on the CelebA dataset; Table 2 gives the corresponding comparison on the LFWA dataset.
Table 1
Table 2
In Tables 1 and 2:
PANDA corresponds to the method of N. Zhang et al. (N. Zhang, M. Paluri, M. Ranzato, T. Darrell, "PANDA: Pose aligned networks for deep attribute modeling," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1637-1644);
LNets+ANet corresponds to the method of Z. Liu et al. (Z. Liu, P. Luo, X. Wang, X. Tang, "Deep learning face attributes in the wild," in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3730-3738);
MOON corresponds to the method of E. M. Rudd et al. (E. M. Rudd, M. Gunther, and T. E. Boult, "MOON: A mixed objective optimization network for the recognition of facial attributes," in Proc. Eur. Conf. Comput. Vis., 2016, pp. 19-35);
NSA corresponds to the method of U. Mahbub et al. (U. Mahbub, S. Sarkar, and R. Chellappa, "Segment-based methods for facial attribute detection from partial faces," IEEE Trans. Affective Comput., doi: 10.1109/TAFFC.2018.2820048, 2018);
MCNN-AUX corresponds to the method of E. M. Hand et al. (E. M. Hand and R. Chellappa, "Attributes for improved attributes: A multi-task network for attribute classification," in Proc. Thirty-First AAAI Conf. Artif. Intell., 2017);
MCFA corresponds to the method of N. Zhuang et al. (N. Zhuang, Y. Yan, S. Chen and H. Wang, "Multi-task learning of cascaded CNN for facial attribute classification," in Proc. Int. Conf. Pattern Recog., 2018, pp. 2069-2074).
As can be seen from Tables 1 and 2, the present invention performs face attribute recognition effectively. Algorithm analysis shows that, compared with conventional face attribute recognition algorithms, the present invention improves the accuracy of face attribute recognition, especially for some of the harder subjective attributes.
Claims (7)
1. A face attribute recognition method based on a multi-task multi-label learning convolutional neural network, characterized by including the following steps:
1) preparing a training sample set and a validation sample set;
2) feeding each face image in the training sample set through a feature extraction network to obtain a feature map;
3) applying spatial pyramid pooling at different levels to the feature map to obtain features of different dimensions;
4) using multi-task learning to perform facial key-point detection and face attribute recognition simultaneously;
5) dividing the face attributes into two classes, subjective and objective, taking the features of different dimensions as input;
6) using a dynamic loss weight and adaptive threshold mechanism in the deep convolutional neural network to compute the loss weight of each face attribute and adjust the decision-boundary threshold; feeding all images of the training sample set into the deep convolutional neural network for end-to-end training with back-propagation, and feeding the validation samples into the network to derive the loss weights and decision boundaries from the classification results;
7) performing face attribute recognition with the trained network model, the subjective-attribute and objective-attribute sub-networks outputting the recognition results.
2. The face attribute recognition method based on a multi-task multi-label learning convolutional neural network of claim 1, characterized in that in step 1), preparing the training and validation sample sets specifically comprises:
(1) obtaining the annotations for facial key-point detection and face attribute recognition separately;
(2) merging the key-point and attribute annotations to form the training and validation sample sets; the training set is denoted {(x_i^train, y_i^landmark, y_i^attr)}, i = 1, ..., N, where N is the number of training samples, and the validation set is denoted {(x_j^val, y_j^landmark, y_j^attr)}, j = 1, ..., M, where M is the number of validation samples; landmark_p is the number of facial key points, attr_q is the total number of face attribute classes, and N, M, landmark_p, attr_q are natural numbers; x_i^train and x_j^val denote a face sample image of the training set and validation set respectively; y^landmark denotes the coordinates of the landmark_p facial key points; y^attr denotes the labels of the attr_q face attributes, each taking the value 1 or -1, where 1 indicates the attribute is present in the image and -1 indicates it is absent.
3. The face attribute recognition method based on a multi-task multi-label learning convolutional neural network of claim 1, characterized in that in step 2), obtaining a feature map through a feature extraction network specifically comprises:
(1) normalizing all original images to 224 × 224;
(2) using a ResNet-50 network with the final global average pooling layer and fully connected classification layer removed, obtaining a 7 × 7 feature map.
4. The face attribute recognition method based on a multi-task multi-label learning convolutional neural network of claim 1, characterized in that in step 3), applying spatial pyramid pooling at different levels to the feature map to obtain features of different dimensions specifically comprises:
(1) applying 1 × 1 spatial pyramid pooling to the feature map to obtain a 2048-dimensional feature, connected to the fully connected layers of the two sub-networks for facial key-point detection and objective attribute recognition;
(2) applying 3 × 3 spatial pyramid pooling to the feature map to obtain a 28672-dimensional feature, connected to the fully connected layer of the subjective attribute recognition sub-network.
5. The face attribute recognition method based on a multi-task multi-label learning convolutional neural network of claim 1, characterized in that in step 4), the multi-task learning specifically comprises:
(1) the facial key-point detection task predicting the relative coordinate values of 144 feature points for a given image; for each training sample x_i^train, the key-point detection loss is computed with the MSE loss function, where n denotes the number of samples and the predicted and ground-truth facial key-point labels are compared;
(2) the face attribute analysis task being regarded as a set of binary classification problems; for each training sample x_i^train, the face attribute loss is computed from the predicted value and ground-truth label of the j-th attribute of the i-th sample.
6. The face attribute recognition method based on a multi-task multi-label learning convolutional neural network of claim 1, characterized in that in step 5), the subjective attributes include subjective face attributes such as arched eyebrows, attractive, bags under the eyes, big lips, big nose, blurry, bushy eyebrows, chubby, heavy makeup, high cheekbones, narrow eyes, oval face, pointy nose, rosy cheeks, smiling, straight hair, wavy hair, and young; the objective attributes include objective face attributes such as five o'clock shadow, bald, bangs, black hair, blond hair, brown hair, double chin, eyeglasses, goatee, gray hair, male, mouth slightly open, mustache, no beard, pale skin, receding hairline, sideburns, wearing earrings, wearing a hat, wearing lipstick, wearing a necklace, and wearing a necktie; the subjective-attribute sub-network has three fully connected layers with 2048, 1024, and 22 nodes respectively, and the objective-attribute sub-network has two fully connected layers with 1024 and 22 nodes respectively.
7. The face attribute recognition method based on a multi-task multi-label learning convolutional neural network of claim 1, characterized in that in step 6), the dynamic loss weight and adaptive threshold mechanism specifically comprises:
setting each attribute's weight in back-propagation according to the trend of its loss on the validation set, the weight of the j-th attribute in the t-th iteration being determined by that attribute's loss on the validation set;
adjusting the adaptive threshold by an update formula in which τ_t is a threshold vector whose dimension equals the number of attributes, l denotes the current number of full training epochs, V is the total number of validation samples, the two counts are respectively the numbers of validation samples wrongly classified as positive and wrongly classified as negative, and γ is a constant set to 0.01.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910704048.3A CN110443189B (en) | 2019-07-31 | 2019-07-31 | Face attribute identification method based on multitask multi-label learning convolutional neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110443189A true CN110443189A (en) | 2019-11-12 |
CN110443189B CN110443189B (en) | 2021-08-03 |
Family
ID=68432694
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910704048.3A Active CN110443189B (en) | 2019-07-31 | 2019-07-31 | Face attribute identification method based on multitask multi-label learning convolutional neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110443189B (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111178403A (en) * | 2019-12-16 | 2020-05-19 | 北京迈格威科技有限公司 | Method and device for training attribute recognition model, electronic equipment and storage medium |
CN111191675A (en) * | 2019-12-03 | 2020-05-22 | 深圳市华尊科技股份有限公司 | Pedestrian attribute recognition model implementation method and related device |
CN111523578A (en) * | 2020-04-13 | 2020-08-11 | 北京推想科技有限公司 | Image classification method and device and neural network model training method and device |
CN111612133A (en) * | 2020-05-20 | 2020-09-01 | 广州华见智能科技有限公司 | Internal organ feature coding method based on face image multi-stage relation learning |
CN111666846A (en) * | 2020-05-27 | 2020-09-15 | 厦门大学 | Face attribute identification method and device |
CN112200260A (en) * | 2020-10-19 | 2021-01-08 | 厦门大学 | Figure attribute identification method based on discarding loss function |
CN112215162A (en) * | 2020-10-13 | 2021-01-12 | 北京中电兴发科技有限公司 | Multi-label multi-task face attribute prediction method based on MCNN (multi-core neural network) |
CN112232445A (en) * | 2020-12-11 | 2021-01-15 | 北京世纪好未来教育科技有限公司 | Training method and device for multi-label classification task network |
CN112825117A (en) * | 2019-11-20 | 2021-05-21 | 北京眼神智能科技有限公司 | Behavior attribute judgment method, behavior attribute judgment device, behavior attribute judgment medium and behavior attribute judgment equipment based on head features |
CN113139486A (en) * | 2021-04-29 | 2021-07-20 | 北京百度网讯科技有限公司 | Method, apparatus, device and storage medium for processing image |
CN113496393A (en) * | 2021-01-09 | 2021-10-12 | 武汉谦屹达管理咨询有限公司 | Offline payment financial system and method based on block chain |
CN115311730A (en) * | 2022-09-23 | 2022-11-08 | 北京智源人工智能研究院 | Face key point detection method and system and electronic equipment |
CN115798023A (en) * | 2023-02-13 | 2023-03-14 | 成都睿瞳科技有限责任公司 | Face identification authentication method and device, storage medium and processor |
CN117152566A (en) * | 2023-10-30 | 2023-12-01 | 苏州元脑智能科技有限公司 | Classification model training method, model, classification method and product |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105814519A (en) * | 2013-12-12 | 2016-07-27 | 触摸式有限公司 | System and method for inputting images or labels into electronic devices |
CN106203395A (en) * | 2016-07-26 | 2016-12-07 | 厦门大学 | Face attribute recognition method based on multi-task deep learning |
CN108564029A (en) * | 2018-04-12 | 2018-09-21 | 厦门大学 | Face attribute recognition method based on cascaded multi-task learning deep neural network |
US20190095704A1 (en) * | 2017-09-28 | 2019-03-28 | Nec Laboratories America, Inc. | Long-tail large scale face recognition by non-linear feature level domain adaption |
CN109815826A (en) * | 2018-12-28 | 2019-05-28 | 新大陆数字技术股份有限公司 | Method and device for generating a face attribute model |
- 2019-07-31: Application CN201910704048.3A filed; granted as patent CN110443189B (status: Active)
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105814519A (en) * | 2013-12-12 | 2016-07-27 | 触摸式有限公司 | System and method for inputting images or labels into electronic devices |
CN106203395A (en) * | 2016-07-26 | 2016-12-07 | 厦门大学 | Face attribute recognition method based on multi-task deep learning |
US20190095704A1 (en) * | 2017-09-28 | 2019-03-28 | Nec Laboratories America, Inc. | Long-tail large scale face recognition by non-linear feature level domain adaption |
CN108564029A (en) * | 2018-04-12 | 2018-09-21 | 厦门大学 | Face attribute recognition method based on cascaded multi-task learning deep neural network |
CN109815826A (en) * | 2018-12-28 | 2019-05-28 | 新大陆数字技术股份有限公司 | Method and device for generating a face attribute model |
Non-Patent Citations (4)
Title |
---|
NI ZHUANG ET AL: "Multi-task Learning of Cascaded CNN for Facial Attribute Classification", 《2018 24TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR)》 * |
YONG ZHANG ET AL: "Human Activity Recognition Based on Motion Sensor Using U-Net", 《IEEE ACCESS》 * |
GUAN LIXIN ET AL: "Support Vector Machine Based on Pre-extraction with Radial Basis Function Neural Network", 《JOURNAL OF GANNAN NORMAL UNIVERSITY》 * |
LUO ZHENZHEN ET AL: "Spontaneous Smile Detection in Unconstrained Environments Based on Conditional Random Forests", 《ACTA AUTOMATICA SINICA》 * |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112825117A (en) * | 2019-11-20 | 2021-05-21 | 北京眼神智能科技有限公司 | Behavior attribute judgment method, behavior attribute judgment device, behavior attribute judgment medium and behavior attribute judgment equipment based on head features |
CN111191675A (en) * | 2019-12-03 | 2020-05-22 | 深圳市华尊科技股份有限公司 | Pedestrian attribute recognition model implementation method and related device |
CN111191675B (en) * | 2019-12-03 | 2023-10-24 | 深圳市华尊科技股份有限公司 | Pedestrian attribute identification model realization method and related device |
CN111178403B (en) * | 2019-12-16 | 2023-10-17 | 北京迈格威科技有限公司 | Method, device, electronic equipment and storage medium for training attribute identification model |
CN111178403A (en) * | 2019-12-16 | 2020-05-19 | 北京迈格威科技有限公司 | Method and device for training attribute recognition model, electronic equipment and storage medium |
CN111523578A (en) * | 2020-04-13 | 2020-08-11 | 北京推想科技有限公司 | Image classification method and device and neural network model training method and device |
CN111523578B (en) * | 2020-04-13 | 2021-07-23 | 推想医疗科技股份有限公司 | Image classification method and device and neural network model training method and device |
CN111612133A (en) * | 2020-05-20 | 2020-09-01 | 广州华见智能科技有限公司 | Internal organ feature coding method based on face image multi-stage relation learning |
CN111666846B (en) * | 2020-05-27 | 2023-05-30 | 厦门大学 | Face attribute identification method and device |
CN111666846A (en) * | 2020-05-27 | 2020-09-15 | 厦门大学 | Face attribute identification method and device |
CN112215162A (en) * | 2020-10-13 | 2021-01-12 | 北京中电兴发科技有限公司 | Multi-label multi-task face attribute prediction method based on MCNN (multi-core neural network) |
CN112215162B (en) * | 2020-10-13 | 2023-07-25 | 北京中电兴发科技有限公司 | Multi-label and multi-task face attribute prediction method based on MCNN (media channel network) |
CN112200260B (en) * | 2020-10-19 | 2022-06-14 | 厦门大学 | Figure attribute identification method based on discarding loss function |
CN112200260A (en) * | 2020-10-19 | 2021-01-08 | 厦门大学 | Figure attribute identification method based on discarding loss function |
CN112232445A (en) * | 2020-12-11 | 2021-01-15 | 北京世纪好未来教育科技有限公司 | Training method and device for multi-label classification task network |
CN113496393A (en) * | 2021-01-09 | 2021-10-12 | 武汉谦屹达管理咨询有限公司 | Offline payment financial system and method based on block chain |
CN113139486A (en) * | 2021-04-29 | 2021-07-20 | 北京百度网讯科技有限公司 | Method, apparatus, device and storage medium for processing image |
CN115311730A (en) * | 2022-09-23 | 2022-11-08 | 北京智源人工智能研究院 | Face key point detection method and system and electronic equipment |
CN115798023A (en) * | 2023-02-13 | 2023-03-14 | 成都睿瞳科技有限责任公司 | Face identification authentication method and device, storage medium and processor |
CN117152566A (en) * | 2023-10-30 | 2023-12-01 | 苏州元脑智能科技有限公司 | Classification model training method, model, classification method and product |
Also Published As
Publication number | Publication date |
---|---|
CN110443189B (en) | 2021-08-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110443189A (en) | Face attribute recognition method based on multi-task multi-label learning convolutional neural network | |
Yang et al. | An emotion recognition model based on facial recognition in virtual learning environment | |
Kae et al. | Augmenting CRFs with Boltzmann machine shape priors for image labeling | |
CN110889672B (en) | Student card punching and class taking state detection system based on deep learning | |
CN107169455B (en) | Face attribute recognition method based on depth local features | |
CN109815801A (en) | Face identification method and device based on deep learning | |
Li et al. | Data-free prior model for facial action unit recognition | |
CN109815826A (en) | Method and device for generating a face attribute model | |
CN109117744A (en) | Siamese neural network training method for face verification | |
CN108229268A (en) | Expression Recognition and convolutional neural networks model training method, device and electronic equipment | |
Rafique et al. | Age and gender prediction using deep convolutional neural networks | |
CN106960202A (en) | Smiling face recognition method based on fusion of visible light and infrared images | |
CN105868716A (en) | Method for human face recognition based on face geometrical features | |
CN109902660A (en) | Expression recognition method and device | |
CN110175501A (en) | Multi-person scene attention focus recognition method based on face recognition | |
CN110909680A (en) | Facial expression recognition method and device, electronic equipment and storage medium | |
Kaur et al. | A deep learning technique for emotion recognition using face and voice features | |
CN113537164B (en) | Real-time action time sequence positioning method | |
Shukla et al. | Deep Learning Model to Identify Hide Images using CNN Algorithm | |
Dudekula et al. | Linear fusion approach to convolutional neural networks for facial emotion recognition | |
Zhang et al. | GmFace: A Mathematical Model for Face Image Representation Using Multi-Gaussian | |
Naqvi et al. | Advancements in Facial Expression-Based Automatic Emotion Identification Using Deep Learning | |
Bavkar et al. | Geometric approach for human emotion recognition using facial expression | |
Saudagare et al. | Human Facial Expression Recognition using Eigen Face and Neural Network | |
Nithiyasree | Facial emotion recognition of students using deep convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||