CN110569724B - Face alignment method based on residual hourglass network - Google Patents

Face alignment method based on residual hourglass network

Info

Publication number
CN110569724B
CN110569724B (application CN201910716528.1A)
Authority
CN
China
Prior art keywords
network
residual
hourglass
heat map
hourglass network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910716528.1A
Other languages
Chinese (zh)
Other versions
CN110569724A (en)
Inventor
邵雄凯
阳邹
高榕
刘建舟
王春枝
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hubei University of Technology
Original Assignee
Hubei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hubei University of Technology
Priority to CN201910716528.1A
Publication of CN110569724A
Application granted
Publication of CN110569724B
Legal status: Active
Anticipated expiration


Classifications

    • G06F18/241: Pattern recognition; classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N3/045: Neural networks; architectures; combinations of networks
    • G06N3/08: Neural networks; learning methods
    • G06V10/462: Image or video feature extraction; salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V40/161: Human faces; detection; localisation; normalisation
    • G06V40/168: Human faces; feature extraction; face representation
    • G06V40/172: Human faces; classification, e.g. identification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a face alignment method based on a residual hourglass network. First, a convolution branch is added to the residual module that forms the basic unit of the hourglass network, enlarging the receptive field so that feature information at different scales is extracted more effectively while high-resolution information is preserved. Then, exploiting the structure of the hourglass network, the kernel size of the added convolution branch is adjusted with the layer of the hourglass network, which balances the trade-off between feature map resolution and receptive field, so the network extracts more detailed information while retaining structural information from local to global. Finally, the hourglass networks are stacked and trained with an intermediate supervision mechanism, which keeps low-level parameters updating normally and allows the network to re-evaluate its initial estimates and features over the whole image. By stacking the new residual hourglass networks, the invention extracts more effective information, strengthens the network's ability to capture local detail, and improves the accuracy of face key point detection.

Description

Face alignment method based on residual hourglass network
Technical Field
The invention belongs to the technical field of computer vision and relates to a face alignment method based on a residual hourglass network, in particular to a face alignment method based on a novel residual hourglass network for face recognition in digital images.
Background
Face alignment, also called face key point detection, locates key points of a face such as the eyes, nose, mouth and facial contour. It provides accurate, semantically meaningful face shape information and plays a vital role in face recognition, facial expression analysis, gender and age estimation, three-dimensional face modelling and related fields. Because of facial expressions, exaggerated head poses, varying illumination and partial occlusion in natural scenes and other unconstrained conditions, face alignment remains a challenging problem, so an efficient and accurate face alignment algorithm is needed to better meet practical requirements.
In recent years, with the wide application of deep learning frameworks to face alignment, research on face-related problems has progressed rapidly. Deep learning is particularly strong at feature extraction: its deep network structure extracts data features layer by layer, so the extracted features are more discriminative and easier to classify.
Deep learning was first introduced into face alignment by the Deep Convolutional Network Cascade (DCNN) [document 1], which designs a three-level cascade of convolutional neural networks to avoid the local optima caused by poor initialization and obtains more accurate key point detection by exploiting the strong feature extraction capability of CNNs. Early optimization-based face alignment algorithms (ASM [document 2], AAM [document 3]-[document 5], CLM [document 6]-[document 7]) achieve alignment by minimizing an error function; the underlying nonlinear optimization is relatively complex to solve and an overly high intermediate dimensionality raises the solving cost. Cascade shape regression based face alignment algorithms [document 8]-[document 14] start from an initial shape and progressively estimate shape increments to approach the ground-truth shape, but they depend heavily on the initial shape and require elaborate hand-crafted features. The deep learning approach is clearly simpler and more efficient.
As convolutional neural networks became widely adopted, the hourglass network [document 15] was proposed for feature extraction. For tasks related to face alignment, different parts of the face are not best recognized on the same feature map: eyes, for example, may be easier to recognize on the layer-3 feature map, while mouths are easier to recognize on layer 5. Unlike a traditional convolutional neural network that uses only the last convolutional layer as the target feature, the hourglass network uses its bottom-up, top-down structure to combine low-level and high-level features, so the final feature is more effective and the image feature information is used to the greatest extent. The novel residual-based hourglass network of the invention adds convolution branches to the hourglass network and adjusts the convolution kernel size according to the hourglass layer, which enlarges the receptive field, balances the relationship between feature map resolution and receptive field, lets the network extract more detailed information while keeping structural information from local to global, and strengthens the network's ability to extract effective features.
[document 1] Sun Y, Wang X, Tang X. Deep Convolutional Network Cascade for Facial Point Detection. IEEE Conference on Computer Vision and Pattern Recognition, 2013: 3476-.
[document 2] Cootes T F, Taylor C J, Cooper D H, Graham J. Active shape models - their training and application. Computer Vision and Image Understanding, 1995, 61(1): 38-59.
[document 3] Sauer P, Cootes T F, Taylor C J. Accurate regression procedures for active appearance models. Proceedings of the British Machine Vision Conference, Dundee, Scotland, 2011: 681-.
[document 4] Cootes T F, Edwards G J, Taylor C J. Active appearance models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(6): 681-685.
[document 5] Asthana A, Zafeiriou S, Cheng S, Pantic M. Robust discriminative response map fitting with constrained local models. IEEE Conference on Computer Vision and Pattern Recognition, Portland, USA, 2013: 3444-.
[document 6] Cristinacce D, Cootes T. Feature detection and tracking with constrained local models. Proceedings of the British Machine Vision Conference, Edinburgh, UK, 2006: 929-.
[document 7] Asthana A, Zafeiriou S, Cheng S-Y, Pantic M. Incremental face alignment in the wild. IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 1859-.
[document 8] Xiong X-H, De la Torre F. Supervised descent method and its applications to face alignment. IEEE Conference on Computer Vision and Pattern Recognition, Portland, USA, 2013: 532-.
[document 9] Cao X-D, Wei Y-C, Wen F, Sun J. Face alignment by explicit shape regression. International Journal of Computer Vision, 2014, 107(2): 177-.
[document 10] Burgos-Artizzu X P, Perona P, Dollar P. Robust face landmark estimation under occlusion. IEEE International Conference on Computer Vision, Sydney, Australia, 2013: 1513-.
[document 11] Ren S-Q, Cao X-D, Wei Y-C, Sun J. Face alignment at 3000 FPS via regressing local binary features. IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 1685-.
[document 12] Dollar P, Welinder P, Perona P. Cascaded pose regression. IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, USA, 2010: 1078-.
[document 13] Tzimiropoulos G, Pantic M. Gauss-Newton deformable part models for face alignment in-the-wild. IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 1851-.
[document 14] Smith B M, Brandt J, Lin Z, Zhang L. Nonparametric context modeling of local appearance for pose- and expression-robust facial landmark localization. IEEE Conference on Computer Vision and Pattern Recognition, Columbus, USA, 2014: 1741-.
[document 15] Newell A, Yang K, Deng J. Stacked hourglass networks for human pose estimation. European Conference on Computer Vision, Springer, 2016: 483-499.
Disclosure of Invention
The invention aims to provide a novel residual hourglass network-based face alignment method, which increases the capability of extracting effective features of a network and improves the accuracy of face alignment.
The technical scheme adopted by the invention is as follows: a face alignment method based on a residual hourglass network is characterized by comprising the following steps:
step 1: constructing a novel residual hourglass network;
the novel residual hourglass network comprises a novel residual module and a residual module; the novel residual module adds a convolution branch to the residual module, the kernel size of the added convolution branch is k, and k changes with the current layer hg_level of the hourglass network; the output of the residual module is h(x) and the output of the novel residual module is h′(x), where:
h(x)=f(x)+x;
h′(x)=f(x)+g_k(x)+x;
k=hg_level*2+1;
[equation image: definition of H(x), the final output of the hourglass network]
where x is the input to the residual module, f(x) is the output of x after a three-layer convolution, g_k(x) is the output of x through the added convolution branch with kernel size k in the novel residual module, hg_level is the layer the hourglass network is currently at, hg_levels is the total number of layers of the hourglass network, and H(x) is the final output of the hourglass network;
step 2: passing the input picture through the stacked novel residual hourglass networks to obtain the estimated face key point heat maps {H_1, H_2, ..., H_N}, where H_i denotes the estimated face key point heat map produced by the i-th novel residual hourglass network, 1 ≤ i ≤ N, and N is the number of stacked novel residual hourglass networks;
and step 3: generating the ground-truth face key point heat map Ĥ by applying a two-dimensional Gaussian function to the real face key points of the input picture;
And 4, step 4: predicted human face key point thermodynamic diagram H of each stage of novel residual hourglass networkiAnd real face key point thermodynamic diagram
Figure BDA0002155616930000043
Pass to obtain LiThen the loss { L ] obtained from the whole network stage1,L2...,LNTaking an average value to obtain a final L;
and step 5: training the network to obtain a model, running the trained model on an input picture to obtain the predicted face key point heat map H, and converting the heat map H into the predicted face key point coordinates P;
step 6: drawing the predicted face key points on the original image.
The invention discloses a face alignment method based on a novel residual hourglass network, a relatively simple face alignment method with good robustness. The novel residual hourglass network model strengthens the description of the global and local information associated with the key points and improves the completeness of effective feature extraction. Stacking the novel residual hourglass networks together with an intermediate supervision mechanism effectively avoids the vanishing-gradient problem as the network grows deeper, guarantees the normal updating of low-level parameters, lets features be processed in the context of the whole network and re-evaluated, strengthens the fault tolerance of the network and improves the performance of the algorithm.
Drawings
FIG. 1: an overall framework diagram of an embodiment of the invention;
FIG. 2: a network flow diagram of an embodiment of the invention;
FIG. 3: a schematic diagram of the novel residual hourglass network model of the embodiment of the invention;
FIG. 4: cumulative error distribution (CED) curves of the hourglass network (HG) and the novel residual hourglass network (NRHG) on the 300W data set in the embodiment of the invention;
FIG. 5: cumulative error distribution (CED) curves of the hourglass network (HG) and the novel residual hourglass network (NRHG) on the IBUG data set in the embodiment of the invention;
FIG. 6: cumulative error distribution (CED) curves of the hourglass network (HG) and the novel residual hourglass network (NRHG) on the COFW data set in the embodiment of the invention.
Detailed Description
In order to facilitate understanding and implementation of the present invention for those of ordinary skill in the art, the present invention is further described in detail with reference to the accompanying drawings and examples, which are described herein for the purpose of illustration and explanation only and are not intended to limit the present invention.
Referring to fig. 1 and fig. 2, the novel residual hourglass network human face alignment method provided by the invention comprises the following steps:
step 1: constructing a novel residual hourglass network;
fig. 3 shows the structure of the novel residual hourglass network. Its basic building blocks are the novel residual module and the residual module. The novel residual module adds a convolution branch to the residual module; the kernel size of the added branch is k, and k changes with the layer hg_level of the hourglass network. The output of the residual module is h(x) and the output of the novel residual module is h′(x), where:
h(x)=f(x)+x
h′(x)=f(x)+g_k(x)+x
k=hg_level*2+1
[equation image: definition of H(x), the final output of the hourglass network]
where x is the input to the residual module, f(x) is the output of x after a three-layer convolution, g_k(x) is the output of x through the added convolution branch with kernel size k in the novel residual module, hg_level is the layer the hourglass network is currently at, hg_levels is the total number of layers of the hourglass network (designed to be 4), and H(x) is the final output of the hourglass network.
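As an illustration, a minimal PyTorch sketch of such a novel residual module is given below. The three-layer bottleneck used for f(x), the channel split and the Conv/BN/ReLU ordering are assumptions made for the example; the description only fixes h′(x) = f(x) + g_k(x) + x and k = hg_level * 2 + 1.

```python
# Hypothetical sketch of the novel residual module; layer choices are assumed.
import torch
import torch.nn as nn

class NovelResidual(nn.Module):
    def __init__(self, channels: int, hg_level: int):
        super().__init__()
        k = hg_level * 2 + 1                      # kernel size of the added branch
        # f(x): an assumed three-layer bottleneck, as in common hourglass residual blocks
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1),
            nn.BatchNorm2d(channels // 2), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, channels // 2, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels // 2), nn.ReLU(inplace=True),
            nn.Conv2d(channels // 2, channels, kernel_size=1),
        )
        # g_k(x): the added convolution branch whose kernel size grows with hg_level
        self.g = nn.Conv2d(channels, channels, kernel_size=k, padding=k // 2)

    def forward(self, x):
        return self.f(x) + self.g(x) + x          # h'(x) = f(x) + g_k(x) + x
```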
Step 2: passing the input picture through the stacked novel residual hourglass networks to obtain the estimated face key point heat maps {H_1, H_2, ..., H_N}, where H_i denotes the estimated face key point heat map produced by the i-th novel residual hourglass network, 1 ≤ i ≤ N, and N is the number of stacked novel residual hourglass networks;
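The sketch below shows one possible way to stack N hourglass stages so that every stage emits its own heat map for intermediate supervision. The 1x1 prediction heads, the re-injection of each prediction into the next stage and the hourglass_factory argument (e.g. an hourglass built from NovelResidual blocks like the one above) are illustrative assumptions, not the exact architecture of the embodiment.

```python
# Hypothetical stacking pattern with per-stage heat-map heads for intermediate supervision.
import torch.nn as nn

class StackedHourglass(nn.Module):
    def __init__(self, hourglass_factory, num_stacks: int, channels: int, num_keypoints: int):
        super().__init__()
        self.stages = nn.ModuleList([hourglass_factory() for _ in range(num_stacks)])
        self.heads = nn.ModuleList(
            [nn.Conv2d(channels, num_keypoints, kernel_size=1) for _ in range(num_stacks)]
        )
        self.remaps = nn.ModuleList(
            [nn.Conv2d(num_keypoints, channels, kernel_size=1) for _ in range(num_stacks)]
        )

    def forward(self, x):
        heatmaps = []                               # collects {H_1, ..., H_N}
        for stage, head, remap in zip(self.stages, self.heads, self.remaps):
            features = stage(x)                     # one hourglass pass
            h = head(features)                      # per-stage heat-map prediction
            heatmaps.append(h)
            x = x + features + remap(h)             # feed the prediction back to the next stage
        return heatmaps
```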
and step 3: generating the ground-truth face key point heat map Ĥ by applying a two-dimensional Gaussian function to the real face key points of the input picture:
[equation image: two-dimensional Gaussian function f(x, y) with amplitude ksize, variance sigma^2 and centre (x0, y0)]
hm(w, h, i), 0 < i < M, with hm(w, h, i) = f(x_i, y_i), 0 < x_i < w, 0 < y_i < h
where f(x, y) is the two-dimensional Gaussian function, ksize is the size of the Gaussian, corresponding to its amplitude, sigma^2 is the variance of the Gaussian, (x0, y0) is the centre coordinate, here the coordinate of a real face key point, hm(w, h, i) is the heat map of the i-th key point, w and h are respectively the width and height of the heat map, and M is the total number of key points. The finally generated ground-truth face key point heat map Ĥ is the combination of the hm(w, h, i) generated for all the key points.
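Purely as an illustration, assuming the Gaussian takes the common form ksize * exp(-((x - x0)^2 + (y - y0)^2) / (2 * sigma^2)) suggested by the symbols above, the ground-truth heat maps of step 3 could be rendered as follows; the default values of ksize and sigma are placeholders, not values taken from the description.

```python
# Hypothetical rendering of ground-truth key point heat maps with a 2-D Gaussian.
import numpy as np

def gaussian_heatmaps(keypoints, w, h, ksize=1.0, sigma=1.5):
    """keypoints: array of shape (M, 2) holding (x0, y0) in heat-map coordinates."""
    ys, xs = np.mgrid[0:h, 0:w]                    # pixel grid of the heat map
    maps = np.zeros((len(keypoints), h, w), dtype=np.float32)
    for i, (x0, y0) in enumerate(keypoints):
        maps[i] = ksize * np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))
    return maps                                    # all M heat maps, i.e. the combined map
```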
And 4, step 4: the predictive face key point thermodynamic diagram H of each stage of the networki(i.e., output thermodynamic diagram of ith hourglass network) and real face key point thermodynamic diagrams
Figure BDA0002155616930000065
Pass to obtain LiThen the loss { L ] obtained from the whole network stage1,L2...,LNTaking an average value to obtain a final L;
[equation image: per-stage loss L_i computed from the estimated and ground-truth key point heat maps]
[equation image: final loss L obtained by averaging the stage losses]
where h_j and ĥ_j are respectively the estimated heat map and the ground-truth heat map of the j-th face key point at the current stage, M is the total number of face key points, and n is the total number of network stages (i.e. the number of stacked novel residual hourglass networks).
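A hedged sketch of this training objective, assuming the per-stage distance between an estimated and a ground-truth heat map is the usual mean squared error (the exact per-stage loss is given only in the equation images above):

```python
# Hypothetical per-stage MSE averaged over the n stacked stages.
import torch
import torch.nn.functional as F

def stacked_loss(stage_heatmaps, target):
    """stage_heatmaps: list of n tensors of shape (B, M, h, w); target: tensor (B, M, h, w)."""
    per_stage = [F.mse_loss(pred, target) for pred in stage_heatmaps]   # L_1 ... L_n
    return sum(per_stage) / len(per_stage)                              # final loss L
```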
And 5: training a network to obtain a training model, carrying out the training model on an input picture to obtain a predicted human face key point thermodynamic diagram H, and converting the thermodynamic diagram H into predicted human face key point coordinates P;
the predicted face key point thermodynamic diagram H is composed of all key point thermodynamic diagrams hm (w, H, i)0<i<MThe deeper the color in the thermodynamic diagram, the larger the two-dimensional Gaussian function value corresponding to the position with the larger color, namely, the closer to the real key point, and the thermodynamic diagram hm (w, h, i) is formed0<i<MAnd converting the value into a one-bit vector, taking the maximum value, and calculating the position coordinate of the original thermodynamic diagram pixel where the value is located, namely the predicted face key point coordinate P.
Step 6: drawing the predicted face key points on the original image.
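For completeness, a hedged sketch of drawing the points: the predicted heat-map coordinates are scaled back to the resolution of the original image and drawn with OpenCV. The 4x scale factor is an assumption (heat maps are often a quarter of the input resolution) and is not specified in the description.

```python
# Hypothetical drawing of predicted key points on the original image.
import cv2

def draw_keypoints(image, points, scale=4.0):
    """image: BGR array; points: (M, 2) heat-map coordinates, e.g. from heatmaps_to_points."""
    for x, y in points:
        cv2.circle(image, (int(x * scale), int(y * scale)), 2, (0, 255, 0), -1)
    return image
```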
The invention also provides a performance evaluation method for the residual hourglass network based face alignment method, which compares the error between the predicted face key point coordinates P and the real face key point coordinates P̂.
The normalized mean error (NME) is commonly used to measure the quality of a face alignment algorithm and is defined as follows:
[equation image: NME defined as the mean normalized distance between the predicted and real key points]
where M is the number of face key points, p_i and p̂_i are the predicted coordinate and the real coordinate of the i-th face key point, and d is the normalization factor; this embodiment adopts two normalization modes, inter-pupil normalization and inter-ocular normalization.
Compared with a face alignment method using the traditional hourglass network (HG), in which the network structure and training parameters are kept the same and only the novel residual hourglass network is replaced by the ordinary hourglass network, the proposed algorithm improves the face alignment accuracy. The data sets used in the experiments are 300W, IBUG and COFW. The 300W training set contains 3148 images and its test set 554 images; the IBUG training set is the same as that of 300W and its test set contains 135 images; the COFW training set contains 1345 images and its test set 507 images.
TABLE 1 comparison of the results of the two different normalization methods on 300W, IBUG and COFW data sets for NRHG and HG of this example (% omitted)
[Table 1 image: normalized mean error of NRHG and HG on the 300W, IBUG and COFW data sets under the two normalization modes]
The data in Table 1 are the normalized mean errors (NME) on the different test sets. The comparison shows that on 300W, IBUG and COFW the novel residual hourglass network (NRHG) reduces the mean error under both normalization modes compared with the traditional hourglass network (HG), with the largest reduction on the difficult IBUG data set. This shows that the NRHG proposed in this embodiment extracts face image features more effectively than the traditional hourglass network (HG) and is more helpful for locating face key points, so the face alignment accuracy of the algorithm is improved to a certain degree; at the same time, NRHG has an advantage over HG in locating key points in complex scenes such as occlusion and other interference.
Referring to fig. 4, the cumulative error distribution (CED) curves of the ordinary hourglass network (HG) and the novel residual hourglass network (NRHG) on the 300W data set in the embodiment of the present invention;
referring to fig. 5, the cumulative error distribution (CED) curves of the ordinary hourglass network (HG) and the novel residual hourglass network (NRHG) on the IBUG data set in the embodiment of the present invention;
referring to fig. 6, the cumulative error distribution (CED) curves of the ordinary hourglass network (HG) and the novel residual hourglass network (NRHG) on the COFW data set in the embodiment of the present invention;
fig. 4, fig. 5 and fig. 6 visually show that, under the two normalization modes and on the three data sets, the face alignment effect of the novel residual hourglass network (NRHG) proposed by the invention is improved to a certain extent compared with the traditional hourglass network (HG).
Table 2 comparison of experimental results of this example algorithm (NRHG) and other face alignment algorithms on the COFW dataset (% omitted)
[Table 2 image: NME and failure rate of NRHG and other face alignment algorithms on the COFW data set]
Table 3 comparison of experimental results of this example algorithm (NRHG) and other face alignment algorithms on 300W dataset (% omitted)
[Table 3 image: NME of NRHG and other face alignment algorithms on the common subset, challenging subset and full set of the 300W data set]
As is clear from Table 2, the experimental results of the novel residual hourglass network algorithm (NRHG) proposed in this embodiment are better than those of the other face alignment algorithms: besides a smaller normalized mean error (NME), the failure rate (FR) is also significantly reduced. The data show the superiority of the proposed algorithm on the COFW data set.
Table 3 compares the experimental results of the proposed novel residual hourglass network face alignment algorithm (NRHG) with other face alignment algorithms on the 300W data set under three settings (common subset, challenging subset, full set). The data show that the proposed algorithm outperforms these face alignment algorithms, and the improvement is largest relative to traditional cascade regression algorithms such as RCPR, ESR and LBF, which reflects to a certain extent that the proposed algorithm has a clear advantage over traditional cascade regression for face alignment. Although the proposed algorithm improves on the compared methods to different degrees, some problems remain on the challenging subset: the proposed algorithm focuses on processing face image features, and while this representation helps in complex situations such as occlusion, there is still a considerable gap to close with respect to algorithms such as RAR and TSR that explicitly handle pose, illumination and occlusion, which is something to learn from and improve on later. Overall, however, the algorithm of this embodiment has certain advantages.
It should be understood that parts of the specification not set forth in detail belong to the prior art. Those of ordinary skill in the art may make replacements or modifications without departing from the scope of the present invention as defined by the appended claims.

Claims (4)

1. A face alignment method based on a residual hourglass network is characterized by comprising the following steps:
step 1: constructing a novel residual hourglass network;
the novel residual hourglass network comprises a novel residual module and a residual module; the novel residual module adds a convolution branch to the residual module, the kernel size of the added convolution branch is k, and k changes with the current layer hg_level of the hourglass network; the output of the residual module is h(x) and the output of the novel residual module is h′(x), where:
h(x)=f(x)+x;
h′(x)=f(x)+g_k(x)+x;
k=hg_level*2+1;
[equation image: definition of H(x), the final output of the hourglass network]
where x is the input to the residual module, f(x) is the output of x after a three-layer convolution, g_k(x) is the output of x through the added convolution branch with kernel size k in the novel residual module, hg_level is the layer the hourglass network is currently at, hg_levels is the total number of layers of the hourglass network, and H(x) is the final output of the hourglass network;
step 2: passing the input picture through the stacked novel residual hourglass networks to obtain the estimated face key point heat maps {H_1, H_2, ..., H_N}, where H_i denotes the estimated face key point heat map produced by the i-th novel residual hourglass network, 1 ≤ i ≤ N, and N is the number of stacked novel residual hourglass networks;
and step 3: generating the ground-truth face key point heat map Ĥ by applying a two-dimensional Gaussian function to the real face key points of the input picture;
And 4, step 4: predicted human face key point thermodynamic diagram H of each stage of novel residual hourglass networkiAnd real face key point thermodynamic diagram
Figure FDA0002915799690000013
Pass to obtain LiThen the loss { L ] obtained from the whole network stage1,L2...,LNTaking an average value to obtain a final L;
and step 5: training the network to obtain a model, running the trained model on an input picture to obtain the predicted face key point heat map H, and converting the heat map H into the predicted face key point coordinates P;
step 6: drawing the predicted face key points on the original image.
2. The residual hourglass network-based face alignment method according to claim 1, wherein in step 3 the face key point heat maps are generated with a two-dimensional Gaussian function as follows:
[equation image: two-dimensional Gaussian function f(x, y) with amplitude ksize, variance sigma^2 and centre (x0, y0)]
hm(w, h, i), 0 < i < M, with hm(w, h, i) = f(x_i, y_i), 0 < x_i < w, 0 < y_i < h
where f(x, y) is the two-dimensional Gaussian function, ksize is the size of the Gaussian, corresponding to its amplitude, sigma^2 is the variance of the Gaussian, (x0, y0) is the centre coordinate, here the coordinate of a real face key point, hm(w, h, i) is the heat map of the i-th key point, w and h are respectively the width and height of the heat map, and M is the total number of key points; the finally generated ground-truth face key point heat map Ĥ is the combination of the hm(w, h, i) generated for all the key points.
3. The residual hourglass network-based face alignment method according to claim 1, wherein in step 4 the loss function L_i of each stage and the final loss function L are expressed as:
[equation image: per-stage loss L_i computed from the estimated and ground-truth key point heat maps]
[equation image: final loss L obtained by averaging the stage losses]
where h_j and ĥ_j are respectively the estimated heat map and the ground-truth heat map of the j-th face key point at the current stage, M is the total number of face key points, and n is the total number of network stages, i.e. the number of stacked novel residual hourglass networks.
4. The residual hourglass network-based face alignment method according to claim 2, wherein the specific implementation of step 5 comprises the following steps:
step 5.1: converting each heat map hm(w, h, i), 0 < i < M, into a one-dimensional vector and taking its maximum value;
step 5.2: computing the position coordinate of the heat map pixel at which that maximum value lies, which is the predicted face key point coordinate P.
CN201910716528.1A 2019-08-05 2019-08-05 Face alignment method based on residual hourglass network Active CN110569724B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910716528.1A CN110569724B (en) 2019-08-05 2019-08-05 Face alignment method based on residual hourglass network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910716528.1A CN110569724B (en) 2019-08-05 2019-08-05 Face alignment method based on residual hourglass network

Publications (2)

Publication Number Publication Date
CN110569724A CN110569724A (en) 2019-12-13
CN110569724B (en) 2021-06-04

Family

ID=68774545

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910716528.1A Active CN110569724B (en) 2019-08-05 2019-08-05 Face alignment method based on residual hourglass network

Country Status (1)

Country Link
CN (1) CN110569724B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111402228B (en) * 2020-03-13 2021-05-07 腾讯科技(深圳)有限公司 Image detection method, device and computer readable storage medium
CN111523484B (en) * 2020-04-24 2021-08-27 北京嘀嘀无限科技发展有限公司 Face key point detection method and device, electronic equipment and storage medium
CN112417991B (en) * 2020-11-02 2022-04-29 武汉大学 Double-attention face alignment method based on hourglass capsule network
CN112699847B (en) * 2021-01-15 2021-12-07 苏州大学 Face characteristic point detection method based on deep learning
CN113610115B (en) * 2021-07-14 2024-04-12 广州敏视数码科技有限公司 Efficient face alignment method based on gray level image

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101499128A (en) * 2008-01-30 2009-08-05 中国科学院自动化研究所 Three-dimensional human face action detecting and tracing method based on video stream
CN106951875A (en) * 2017-03-24 2017-07-14 深圳市唯特视科技有限公司 The method that a kind of human body attitude estimation and face based on binary system convolution are alignd
CN108090470A (en) * 2018-01-10 2018-05-29 浙江大华技术股份有限公司 A kind of face alignment method and device
CN108898556A (en) * 2018-05-24 2018-11-27 麒麟合盛网络技术股份有限公司 A kind of image processing method and device of three-dimensional face
CN109033946A (en) * 2018-06-08 2018-12-18 东南大学 Merge the estimation method of human posture of directional diagram
CN109241910A (en) * 2018-09-07 2019-01-18 高新兴科技集团股份有限公司 A kind of face key independent positioning method returned based on the cascade of depth multiple features fusion
CN109299669A (en) * 2018-08-30 2019-02-01 清华大学 Video human face critical point detection method and device based on double intelligent bodies
CN109299659A (en) * 2018-08-21 2019-02-01 中国农业大学 A kind of human posture recognition method and system based on RGB camera and deep learning
CN109657595A (en) * 2018-12-12 2019-04-19 中山大学 Based on the key feature Region Matching face identification method for stacking hourglass network

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4389956B2 (en) * 2007-04-04 2009-12-24 ソニー株式会社 Face recognition device, face recognition method, and computer program
WO2018165620A1 (en) * 2017-03-09 2018-09-13 The Board Of Trustees Of The Leland Stanford Junior University Systems and methods for clinical image classification
CN108764133B (en) * 2018-05-25 2020-10-20 北京旷视科技有限公司 Image recognition method, device and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An Approach to Face Detection and Alignment Using Hough Transformation with Convolution Neural Network; Oshin Misra et al.; ICACCA; 2016-11-21; pp. 1-5 *
Gait recognition based on stacked deep convolutional hourglass networks (基于堆叠深度卷积沙漏网络的步态识别); Wang Hao et al.; Computer Engineering and Applications; 2018-09-19; pp. 127-133 *

Also Published As

Publication number Publication date
CN110569724A (en) 2019-12-13

Similar Documents

Publication Publication Date Title
CN110569724B (en) Face alignment method based on residual hourglass network
CN109472198B (en) Gesture robust video smiling face recognition method
WO2020186886A1 (en) Method and device for generating face recognition model
CN110348330B (en) Face pose virtual view generation method based on VAE-ACGAN
WO2020108362A1 (en) Body posture detection method, apparatus and device, and storage medium
CN101964064B (en) Human face comparison method
Zhang et al. Content-adaptive sketch portrait generation by decompositional representation learning
CN109949255A (en) Image rebuilding method and equipment
WO2016023264A1 (en) Fingerprint identification method and fingerprint identification device
CN108446672B (en) Face alignment method based on shape estimation of coarse face to fine face
Zhou et al. Pose-robust face recognition with Huffman-LBP enhanced by divide-and-rule strategy
CN107066969A (en) A kind of face identification method
CN110909778A (en) Image semantic feature matching method based on geometric consistency
CN110610138A (en) Facial emotion analysis method based on convolutional neural network
Hao et al. Finger vein recognition based on multi-task learning
CN115376159A (en) Cross-appearance pedestrian re-recognition method based on multi-mode information
Wan et al. Palmprint recognition system for mobile device based on circle loss
Jiang et al. Face recognition method based on sparse representation and feature fusion
CN110969101A (en) Face detection and tracking method based on HOG and feature descriptor
CN114743234A (en) Efficient face mask recognition method based on deep learning
Karungaru et al. Automatic human faces morphing using genetic algorithms based control points selection
Su et al. A multiattribute sparse coding approach for action recognition from a single unknown viewpoint
Deepa et al. Age estimation in facial images using histogram equalization
Kumar et al. Face hallucination using OLPP and kernel ridge regression
Hong et al. Effective Face Inpainting by Conditional Generative Adversarial Network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant