CN105426882B - Method for rapidly locating human eyes in a face image - Google Patents
Method for rapidly locating human eyes in a face image
- Publication number
- CN105426882B CN105426882B CN201510991486.4A CN201510991486A CN105426882B CN 105426882 B CN105426882 B CN 105426882B CN 201510991486 A CN201510991486 A CN 201510991486A CN 105426882 B CN105426882 B CN 105426882B
- Authority
- CN
- China
- Prior art keywords
- human eye
- positioning
- training
- error
- random forest
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/161—Detection; Localisation; Normalisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
Landscapes
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Ophthalmology & Optometry (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Image Analysis (AREA)
Abstract
The invention belongs to the field of machine learning and specifically provides a method for rapidly locating human eyes in a face image. Eye localization is performed with a random forest algorithm based on adaptive gradient-boosted decision trees, and the training samples are diversified over multiple poses, which gives the method better robustness. A multi-stage localization structure is used: each stage is centered on the position found by the previous stage, which narrows the search range and improves both convergence speed and accuracy. Finally, the outputs of the trees in the random forest are combined by a weighted average to determine the eye coordinates, and the method runs in real time. The invention further improves localization precision, while the low-complexity binary-tree algorithm reduces time cost. Regression trees are learned from a large set of diversified eye samples based on the CART model, robustness is also well improved, and a significant performance gain is achieved, so the method can be applied in the field of eye localization.
Description
Technical field
The invention belongs to the fields of computer vision and image processing, and specifically relates to a method for rapidly locating human eyes in a face image, applied in the field of eye recognition.
Background technique
Whether in communication between people or in human-computer interaction, the human eye is one of the most important facial features, so locating the eyes accurately matters a great deal. Since the beginning of the Internet era, the protection of private information and its security have become a hot issue, and iris recognition entered the public eye as a method with higher security than fingerprints. The coding space of iris recognition is rich: each person's iris pattern is unique and easy to distinguish. Moreover, a living iris exhibits different characteristics under different illumination conditions, so only a living iris can pass detection, which prevents security risks such as copied fingerprints and makes iris recognition the most secure identification technology available today. Still, one hopes for the highest possible security under a friendly user experience. A user-friendly example is face recognition: in open settings (such as security checks or access control), a face can be captured by hardware without the user noticing, and therefore imposes no extra burden on the user.
Eye localization is also widely used in human-computer interaction. With the appearance of head-mounted wearable devices such as Google Glass, people have been looking for novel interaction modes beyond the keyboard-and-mouse combination, and the eye, being closest to the device, has become a popular candidate. Some short-range hardware already exists; for example, Pupil Labs has developed a glasses-like device built from a head-mounted frame and several cameras to capture the user's eye movement. These eye-tracking techniques let people browse web pages with their eyes without using their hands, can help players control characters in games, and can be applied in scenarios such as virtual reality. Locating the eye center is a key step in this class of interaction technologies. When the gaze moves across a screen, the eyeball rotates only very slightly, so precise control such as clicking during web browsing requires accurate eye localization.
Eye localization is widely used in other fields as well. For example, to address the serious traffic accidents caused by fatigued driving, many studies use on-board equipment to observe the driver's eye movement, eyelid closure, facial expression, and head movement, judge whether the driver is fatigued, and take the necessary measures when fatigue is detected. Over the past decade, eye localization has attracted growing attention from both academia and industry.
Several relatively mature eye-localization methods exist. For example, in "An integrating, transformation-oriented approach to concurrency control and undo in group editors" (Proceedings of the 1996 ACM Conference on Computer Supported Cooperative Work), M. Ressel uses the Hough transform to identify regular geometric objects and locate the eye in an image. M. Asadifard and J. Shanbezadeh, in "Automatic adaptive center of pupil detection using face detection and cdf analysis" (Proceedings of the International MultiConference of Engineers and Computer Scientists), use an active shape model (ASM) to model the facial feature points as a whole and locate the eye through the relations between feature points. In recent years, as machine learning algorithms have spread in depth and breadth across many areas, they have been applied to eye localization more and more. For example, building on the outstanding performance of Haar features and the AdaBoost classifier in face detection, Y. Ma et al., in "Robust precise eye location under probabilistic framework" (Automatic Face and Gesture Recognition, 2004, Proceedings of the Sixth IEEE International Conference), locate the eye with a similar method: multiple simple classifiers are cascaded, each new classifier is trained with emphasis on the mistakes of the previous ones, and the cascade is finally combined into a strong classifier. Although these methods are relatively mature, each has its advantages and shortcomings, and the shortcomings concentrate on a trade-off: methods of low complexity cannot break through the accuracy bottleneck, while methods that reach the required accuracy are too complex to meet real-time requirements. Many eye-localization scenarios place high demands on both real-time performance and accuracy, so a method that does well in both respects is needed.
Summary of the invention
To solve the above problems, the present invention provides a method for rapidly locating human eyes in a face image.
The technical solution of the invention is as follows:
A method for rapidly locating human eyes in a face image, characterized by the following steps:
Step 1: based on the random forest machine learning algorithm, train a certain number of decision trees and combine them into a random forest by ensemble techniques.
Step 2: input the face image to be measured and convert it into a grayscale image, i.e. a two-dimensional matrix.
Step 3: apply the random forest obtained in step 1 to the two-dimensional matrix of gray values from step 2 using a multi-stage localization structure: starting from a fixed coarse eye region, use the current localization result as the center of the next localization region, narrowing the search range stage by stage; iterate until the last stage completes, and its result determines the eye position coordinates.
Step 4: take the weighted average of the eye coordinates produced by the multiple decision trees of the random forest in step 3 to obtain the final eye localization result.
The training process based on the random forest machine learning algorithm in step 1 proceeds as follows:
1.1) Normalize the input grayscale face samples so that the coordinates of the top-left, top-right, and bottom-right corners of each picture are (0,0), (1,0), and (1,1) respectively.
1.2) As the "seed" of the random forest, diversify the samples: starting from the standard samples, apply random perturbations, namely random offsets within a certain range along the abscissa, the ordinate, and the picture scale, together with a random rotation within a certain angle, to generate multi-pose training samples.
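The diversification step above can be sketched in plain Python. This is an illustrative example, not part of the patent: the function name `augment_coordinate`, the perturbation ranges, and the choice to transform only the eye coordinate (the corresponding image warp is omitted for brevity) are all assumptions made for the sketch.

```python
import math
import random

def augment_coordinate(eye_xy, n_copies, max_shift=0.05, max_scale=0.1,
                       max_angle_deg=10.0, seed=0):
    """Generate multi-pose variants of one normalized eye coordinate.

    Coordinates live in the unit square used by the patent:
    top-left (0,0), top-right (1,0), bottom-right (1,1).
    Rotation is about the image center (0.5, 0.5).
    """
    rng = random.Random(seed)
    x, y = eye_xy
    out = []
    for _ in range(n_copies):
        # random in-plane rotation about the image center
        a = math.radians(rng.uniform(-max_angle_deg, max_angle_deg))
        cx, cy = x - 0.5, y - 0.5
        rx = cx * math.cos(a) - cy * math.sin(a)
        ry = cx * math.sin(a) + cy * math.cos(a)
        # random isotropic scale change (stands in for picture-scale jitter)
        s = 1.0 + rng.uniform(-max_scale, max_scale)
        rx, ry = rx * s, ry * s
        # random offsets along abscissa and ordinate
        nx = rx + 0.5 + rng.uniform(-max_shift, max_shift)
        ny = ry + 0.5 + rng.uniform(-max_shift, max_shift)
        out.append((nx, ny))
    return out

variants = augment_coordinate((0.3, 0.4), n_copies=5)
```

In a full implementation the same random transform would also be applied to the sample image itself; here only the label side of the augmentation is shown.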
1.3) From the two-dimensional matrix obtained in 1.2), take its most direct feature values, the gray values at pixel coordinates of the image, as input, and the eye coordinates as output. The decision tree is a full binary tree: the root node and every internal node store a trained test based on the grey-level difference I(I1) − I(I2) between two pixel positions and a threshold T. We define the binary test on image I as:

t(I; I1, I2, T) = 1 if I(I1) − I(I2) > T, and 0 otherwise.
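The binary test just defined can be written as a few lines of Python. This is an illustrative sketch; the function name and the ">" convention for routing right are choices made for the example, consistent with the definition above but not prescribed by the patent text.

```python
def binary_test(image, p1, p2, threshold):
    """Pixel-difference test stored at each internal node.

    image: 2-D grayscale array (list of rows); p1, p2: (row, col)
    pixel positions. Returns 1 (route right) when the grey-level
    difference exceeds the threshold, else 0 (route left).
    """
    diff = image[p1[0]][p1[1]] - image[p2[0]][p2[1]]
    return 1 if diff > threshold else 0

img = [[10, 200], [30, 90]]
```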
Preferably, this binary tree is a decision tree obtained by the adaptive gradient-boosted decision tree algorithm, with the following steps:
First, initialize the fitted model to the constant minimizing the loss:

F_0(x) = argmin_ρ Σ_{i=1..N} L(y_i, ρ)

where F(x) is the decision tree function and ỹ_i denotes the fitting (pseudo-)residual.
Further, compute the pseudo-residuals in the direction of the negative gradient:

ỹ_i = −[∂L(y_i, F(x_i)) / ∂F(x_i)] evaluated at F(x) = F_{m−1}(x)

where we define the loss function as
L(y_i, F(x_i)) = (1/2)·(y_i − F(x_i))²
Substituting gives
ỹ_i = y_i − F_{m−1}(x_i)
Further, fit the pseudo-residuals with a base learner:

α_m = argmin_α Σ_{i=1..N} [ỹ_i − h(x_i; α)]²

where h(x_i; α) is the fitting result; for example, the first fitting result is h(x_i; α_1).
Further, update the weight and regression step-size multiplier:

ρ_m = argmin_ρ Σ_{i=1..N} L(y_i, F_{m−1}(x_i) + ρ·h(x_i; α_m))

Further, update the model for m = 1 → M; after M iterations the procedure ends:

F_m(x) = F_{m−1}(x) + γ·ρ_m·h(x; α_m)

where 0 < γ ≤ 1 is the learning rate, whose size determines the convergence speed of the iteration.
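The update rule above (with squared loss, where the pseudo-residuals reduce to y_i − F_{m−1}(x_i)) can be sketched in Python with one-dimensional regression stumps standing in for the patent's trees. This is a minimal illustration under assumed names (`fit_gbdt`, `_best_stump`) and toy data; with a least-squares-fitted stump and squared loss, the step size ρ_m is 1, so only the learning rate γ appears.

```python
def _best_stump(xs, resid):
    """Exhaustive search for the 1-D split minimizing squared error."""
    best = None
    for split in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, resid) if x <= split]
        right = [r for x, r in zip(xs, resid) if x > split]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, split, lm, rm)
    return best[1], best[2], best[3]

def fit_gbdt(xs, ys, n_rounds=50, lr=0.4):
    # F_0: the constant minimizing squared loss is the mean of the targets
    f0 = sum(ys) / len(ys)
    preds = [f0] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # squared loss => pseudo-residuals are y_i - F_{m-1}(x_i)
        resid = [y - p for y, p in zip(ys, preds)]
        s, lm, rm = _best_stump(xs, resid)
        stumps.append((s, lm, rm))
        # F_m = F_{m-1} + gamma * h(x; alpha_m)
        preds = [p + lr * (lm if x <= s else rm)
                 for p, x in zip(preds, xs)]
    return f0, lr, stumps

def gbdt_predict(model, x):
    f0, lr, stumps = model
    return f0 + sum(lr * (lm if x <= s else rm) for s, lm, rm in stumps)

model = fit_gbdt([0.0, 1.0, 2.0, 3.0], [0.0, 0.0, 1.0, 1.0])
```

With γ = 0.4 the residual shrinks by a factor of 0.6 per round on this toy data, illustrating how γ governs convergence speed.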
1.4) Using the method of 1.3), the n pictures are split into two classes and mounted on the left and right child nodes of the root respectively; at the next layer, the pictures are split by their respective binary tests into four classes in total, and so on, stopping when an appropriate tree depth is reached.
1.5) The classification effect in 1.4) can be characterized by the degree of aggregation of the estimates, i.e. the sum of squared errors, so the purpose of training is to minimize the following error:

Q_node = Σ_{S ∈ {S_l, S_r}} Σ_{i ∈ S} ‖x_i − μ_S‖²,  where μ_S is the mean coordinate of set S    (1)

where S_l and S_r are the sets of coordinates in the left and right child nodes generated by classifying on a given feature and threshold. Training stops when the specified tree depth is reached or the error falls below a specified size.
Because the prediction for a picture is the average of the coordinates of the samples in the same class, the prediction is an estimate, and the more concentrated this estimate is, the higher the confidence of the prediction and the better the training effect. The degree of aggregation of an estimate can be expressed by its variance; considering that the number of pictures mounted on each node must be guaranteed, i.e. the classes should be kept as even as possible, the sum of squared errors is the most suitable measure. The training objective is therefore to minimize the following error:

Q_tree = Σ_{i=1..2^d} Σ_{j ∈ S_i} ‖x_j − μ_i‖²

where i = 1, 2, ..., 2^d indexes the i-th class, corresponding to the i-th leaf node, 2^d is the total number of leaf nodes, and S_i is the set of pictures at the i-th leaf node.
In practice, however, a binary tree is itself a nonlinear system, and considering all influencing factors at once is extremely difficult and complicated. The decision-tree training here therefore uses a greedy method: starting from the root node and proceeding by depth, the feature parameters and threshold of each node are determined in turn so that formula (1) becomes small. Note that the training samples are not yet on leaf nodes at this point, but as the depth advances we treat the current node as a leaf node and still use Q_tree to represent the sum of squared Euclidean distances from all samples to the estimated centers of their classes. Since only one node is trained at a time, only the training samples of that node can change. Thus, when training each node, the best feature and threshold must be found by traversal, so that the two resulting classes have the smallest variance, and the objective function becomes:

Q_node = Σ_{S ∈ {S_l, S_r}} Σ_{i ∈ S} ‖x_i − μ_S‖²

where S_l and S_r are the sets in the left and right child nodes respectively.
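The greedy per-node search described above (traverse the candidate tests, keep the one minimizing the within-child squared error) can be sketched as follows. This is an assumed implementation for illustration: the names `node_split_error` and `best_binary_test`, the sample layout, and the candidate-test format are inventions of the example.

```python
def node_split_error(coords_left, coords_right):
    """Q_node: sum over both children of squared Euclidean distance
    to the child's mean coordinate."""
    total = 0.0
    for coords in (coords_left, coords_right):
        if not coords:
            continue
        mx = sum(c[0] for c in coords) / len(coords)
        my = sum(c[1] for c in coords) / len(coords)
        total += sum((x - mx) ** 2 + (y - my) ** 2 for x, y in coords)
    return total

def best_binary_test(samples, candidate_tests):
    """Try every candidate (p1, p2, T) and keep the one whose induced
    left/right partition has minimal Q_node.

    samples: list of (image, eye_xy); a candidate routes a sample
    right when image[p1] - image[p2] > T.
    """
    best = None
    for p1, p2, thr in candidate_tests:
        left = [xy for img, xy in samples
                if img[p1[0]][p1[1]] - img[p2[0]][p2[1]] <= thr]
        right = [xy for img, xy in samples
                 if img[p1[0]][p1[1]] - img[p2[0]][p2[1]] > thr]
        err = node_split_error(left, right)
        if best is None or err < best[0]:
            best = (err, (p1, p2, thr))
    return best[1]

# toy data: one test separates the two eye-position clusters perfectly
img_a, img_b = [[100, 0]], [[0, 100]]
samples = [(img_a, (0.2, 0.2)), (img_a, (0.25, 0.2)),
           (img_b, (0.8, 0.8)), (img_b, (0.75, 0.8))]
cands = [((0, 0), (0, 1), 50), ((0, 0), (0, 1), 200)]
best = best_binary_test(samples, cands)
```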
Compared with the prior art, the beneficial effects of the invention are:
1) The invention further improves localization precision, and the low-complexity binary-tree algorithm reduces time cost. Regression trees are learned from a large set of diversified eye samples based on the CART model, and robustness is also well improved.
2) For the case where a single regression-tree model has weak localization ability, the invention combines the multiple weak locators obtained by repeated learning into one strong locator through the two ensemble techniques of random forests and gradient-boosted decision trees.
3) Adapted to the application environment of eye localization, the decision trees of the invention continually shrink the localization region during positioning, and sample weights derived from localization results are introduced to further improve the training and prediction processes of the random forest and the gradient-boosted decision trees, finally achieving a significant performance gain.
4) Eye localization based on the CART model achieves, even on low-resolution images, localization accuracy and robustness better than most other models at extremely low computational cost, requiring only a small amount of additional memory; it is therefore a good choice for fast eye localization.
Description of the drawings
Fig. 1 is the flow chart of the method for rapidly locating human eyes in a face image according to the invention;
Fig. 2 shows generated multi-pose training samples;
Fig. 3 shows the process by which gradient-boosted decision trees use ensemble techniques for eye localization;
Fig. 4 shows the effect of the multi-stage localization structure;
Fig. 5 shows the structure of a single decision tree.
Specific embodiment
To make the technical measures, creative features, objectives, and effects of the invention easy to understand, the invention is further explained below with reference to specific illustrations.
Fig. 1 is the flow chart of the method for rapidly locating human eyes in a face image according to the invention, which includes the following steps:
Step 1: based on the random forest machine learning algorithm, train 25 decision trees of depth 11.
First, as shown in Fig. 2, normalize and diversify the samples so that the coordinates of the top-left, top-right, and bottom-right corners of each picture are (0,0), (1,0), and (1,1). Starting from the standard samples, apply random perturbations: random offsets within a certain range along the abscissa, the ordinate, and the picture scale, plus a random rotation within a certain angle, to generate multi-pose training samples.
The decision trees are obtained by gradient boosting. As shown in Fig. 3, the gradient-boosted decision tree with γ = 0.4 continually corrects its direction of advance so as to reach the eye center more accurately and obtain a better result.
Obviously a single decision tree is necessarily a weak classifier; these trees are combined into a random forest by ensemble techniques to form a strong classifier.
Step 2: input the face picture and convert it into a two-dimensional matrix of gray values.
Step 3: use each decision-tree model from step 1. As shown in Fig. 5, a decision tree is a binary tree, usually a full binary tree. Starting from the root of the binary tree, each non-leaf node stores a pre-trained question; according to the answer to that question, traversal proceeds to the left or right child of the node, until a leaf node of the decision tree is reached. This yields 25 eye coordinates in total.
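The root-to-leaf walk just described can be sketched as follows. This is an illustrative structure, not the patent's: the dict-based node layout and the names `traverse`, `test`, `coord` are assumptions of the example.

```python
def traverse(tree, image):
    """Walk a full binary decision tree from the root to a leaf.

    Internal nodes are dicts {'test': (p1, p2, T), 'left': ..., 'right': ...};
    leaves are dicts {'coord': (x, y)} holding the mean eye coordinate
    of the training samples that reached them.
    """
    node = tree
    while 'coord' not in node:
        (r1, c1), (r2, c2), thr = node['test']
        # answer the node's question: grey-level difference vs. threshold
        if image[r1][c1] - image[r2][c2] > thr:
            node = node['right']
        else:
            node = node['left']
    return node['coord']

# a depth-1 toy tree
tree = {'test': ((0, 0), (0, 1), 0),
        'left': {'coord': (0.2, 0.2)},
        'right': {'coord': (0.8, 0.8)}}
```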
A 5-stage localization structure determines the eye position coordinates. As shown in Fig. 4, to improve accuracy the invention uses a multi-stage localization structure, also called a pyramid structure, which raises precision by continually shrinking the localization range: starting from a fixed, larger ROI (Region of Interest), the localization result of the current ROI becomes the center of the next, smaller ROI, and the iteration continues until the localization result of the last ROI is returned.
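The coarse-to-fine loop above can be sketched in a few lines. This is an assumed skeleton: the function names, the shrink factor of 0.5, and the stand-in predictor `_demo_predictor` (which simply clamps a fixed "true" eye position to the current ROI) are inventions of the example, not details from the patent.

```python
def multistage_localize(predict_in_roi, image, levels=5,
                        init_size=1.0, shrink=0.5):
    """Coarse-to-fine localization: each stage predicts inside the
    current ROI, then a smaller ROI is re-centred on that prediction.

    predict_in_roi(image, center, size) -> (x, y) is assumed to be the
    forest's prediction restricted to a square ROI.
    """
    center = (0.5, 0.5)          # fixed coarse starting region
    size = init_size
    for _ in range(levels):
        center = predict_in_roi(image, center, size)
        size *= shrink           # shrink the search range each stage
    return center

def _demo_predictor(image, center, size):
    # stand-in for the real forest: clamp a fixed eye position
    # (0.3, 0.4) to the current square ROI
    tx, ty = 0.3, 0.4
    half = size / 2.0
    return (min(max(tx, center[0] - half), center[0] + half),
            min(max(ty, center[1] - half), center[1] + half))

found = multistage_localize(_demo_predictor, image=None)
```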
Step 4: average the 25 eye coordinates obtained in step 3 to obtain the final eye localization result.
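The final combination step can be sketched as a (weighted) average of the per-tree outputs; the uniform-weight default and the name `forest_estimate` are choices made for this illustration.

```python
def forest_estimate(tree_coords, weights=None):
    """Combine per-tree eye coordinates into one estimate by a
    weighted average; with no weights given, a plain average."""
    n = len(tree_coords)
    if weights is None:
        weights = [1.0] * n
    total = sum(weights)
    x = sum(w * c[0] for w, c in zip(weights, tree_coords)) / total
    y = sum(w * c[1] for w, c in zip(weights, tree_coords)) / total
    return (x, y)

est = forest_estimate([(0.2, 0.4), (0.4, 0.6)])
```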
Claims (3)
1. A method for rapidly locating human eyes in a face image, characterized by comprising the following steps:
Step 1: based on the random forest machine learning algorithm, training a certain number of decision trees and combining them into a random forest by ensemble techniques;
Step 2: inputting the face image to be measured and converting it into a grayscale image, i.e. a two-dimensional matrix;
Step 3: applying the random forest obtained in step 1 to the two-dimensional matrix of gray values from step 2 using a multi-stage localization structure, i.e., starting from a fixed coarse eye region, using the current localization result as the center of the next localization region, narrowing the search range stage by stage, and iterating until the last stage completes and its result determines the eye position coordinates;
Step 4: taking the weighted average of the eye coordinates produced by the multiple decision trees of the random forest in step 3 to obtain the final eye localization result;
wherein the training process based on the random forest machine learning algorithm in step 1 comprises the following steps:
Step 1.1) normalizing the input grayscale face samples to obtain standard training samples, the coordinates of the top-left, top-right, and bottom-right corners of each grayscale face sample image being (0,0), (1,0), and (1,1) respectively;
Step 1.2) diversifying the standard training samples: applying random offsets within a certain range along the abscissa, the ordinate, and the picture scale, together with a random rotation within a certain angle, to generate multi-pose training samples;
Step 1.3) from the two-dimensional matrix of each multi-pose training sample, taking its most direct feature values, the gray values at pixel coordinates of the image, as input, and the eye coordinates as output, to obtain a binary tree, i.e., the root node and every internal node store a trained test based on the grey-level difference I(I1) − I(I2) between two pixel positions and a threshold T, the binary test on image I being defined as:

t(I; I1, I2, T) = 1 if I(I1) − I(I2) > T, and 0 otherwise;

Step 1.4) splitting the n input multi-pose training samples into two classes according to the binary tree and mounting them on the left and right child nodes of the root respectively; at the next layer, the pictures are split by their respective binary tests into four classes in total, and so on, ending when the specified tree depth or a specified error size is reached.
2. The method for rapidly locating human eyes in a face image according to claim 1, characterized in that the training process based on the random forest machine learning algorithm in step 1 further comprises:
Step 1.5) performing error validation on the training result obtained in step 1.4) and judging whether it attains the minimum error, by the formula:

Q_node = Σ_{S ∈ {S_l, S_r}} Σ_{i ∈ S} ‖x_i − μ_S‖²

where Q_node is the minimized error, representing the sum of squared Euclidean distances from all samples to the estimated centers of their classes, and S_l and S_r are respectively the sets of coordinates in the left and right child nodes of the two classes generated by classifying on a given feature and threshold.
3. The method for rapidly locating human eyes in a face image according to claim 1, characterized in that step 1.3) obtains the binary tree by the following steps:
Step 1.31) initializing the fitted model to the constant minimizing the loss:

F_0(x) = argmin_ρ Σ_{i=1..N} L(y_i, ρ)

where F(x) is the decision tree function and ỹ_i is the fitting residual;
Step 1.32) computing the pseudo-residuals in the direction of the negative gradient:

ỹ_i = −[∂L(y_i, F(x_i)) / ∂F(x_i)] evaluated at F(x) = F_{m−1}(x), which for the squared loss gives ỹ_i = y_i − F_{m−1}(x_i)

Step 1.33) fitting the pseudo-residuals:

α_m = argmin_α Σ_{i=1..N} [ỹ_i − h(x_i; α)]²

where h(x_i; α) is the fitting result; the first fitting result is h(x_i; α_1);
Step 1.34) updating the weight and regression step-size multiplier:

ρ_m = argmin_ρ Σ_{i=1..N} L(y_i, F_{m−1}(x_i) + ρ·h(x_i; α_m))

Step 1.35) updating the model, m = 1 → M, ending after M iterations:

F_m(x) = F_{m−1}(x) + γ·ρ_m·h(x; α_m)

where 0 < γ ≤ 1 is the learning rate, whose size determines the convergence speed of the iteration.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510991486.4A CN105426882B (en) | 2015-12-24 | 2015-12-24 | Method for rapidly locating human eyes in a face image
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510991486.4A CN105426882B (en) | 2015-12-24 | 2015-12-24 | Method for rapidly locating human eyes in a face image
Publications (2)
Publication Number | Publication Date |
---|---|
CN105426882A CN105426882A (en) | 2016-03-23 |
CN105426882B true CN105426882B (en) | 2018-11-20 |
Family
ID=55505081
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510991486.4A Active CN105426882B (en) | 2015-12-24 | 2015-12-24 | Method for rapidly locating human eyes in a face image
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105426882B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107862285A (en) * | 2017-11-07 | 2018-03-30 | 哈尔滨工业大学深圳研究生院 | A kind of face alignment method |
CA3083008A1 (en) * | 2017-12-01 | 2019-06-06 | 1Qb Information Technologies Inc. | Systems and methods for stochastic optimization of a robust inference problem |
CN108732559B (en) * | 2018-03-30 | 2021-09-24 | 北京邮电大学 | Positioning method, positioning device, electronic equipment and readable storage medium |
CN109522871B (en) * | 2018-12-04 | 2022-07-12 | 北京大生在线科技有限公司 | Face contour positioning method and system based on random forest |
CN111260149B (en) * | 2020-02-10 | 2023-06-23 | 北京工业大学 | Dioxin emission concentration prediction method |
CN114021705A (en) * | 2022-01-04 | 2022-02-08 | 浙江大华技术股份有限公司 | Model accuracy determination method, related device and equipment |
CN114529857A (en) * | 2022-02-25 | 2022-05-24 | 平安科技(深圳)有限公司 | User online state identification method, device, server and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103093215A (en) * | 2013-02-01 | 2013-05-08 | 北京天诚盛业科技有限公司 | Eye location method and device |
CN104766059A (en) * | 2015-04-01 | 2015-07-08 | 上海交通大学 | Rapid and accurate human eye positioning method and sight estimation method based on human eye positioning |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7155036B2 (en) * | 2000-12-04 | 2006-12-26 | Sony Corporation | Face detection under varying rotation |
EP2713307B1 (en) * | 2012-09-28 | 2018-05-16 | Accenture Global Services Limited | Liveness detection |
- 2015-12-24 CN CN201510991486.4A patent/CN105426882B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103093215A (en) * | 2013-02-01 | 2013-05-08 | 北京天诚盛业科技有限公司 | Eye location method and device |
CN104766059A (en) * | 2015-04-01 | 2015-07-08 | 上海交通大学 | Rapid and accurate human eye positioning method and sight estimation method based on human eye positioning |
Also Published As
Publication number | Publication date |
---|---|
CN105426882A (en) | 2016-03-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105426882B (en) | Method for rapidly locating human eyes in a face image | |
CN103942577B (en) | Based on the personal identification method for establishing sample database and composite character certainly in video monitoring | |
CN105426827B (en) | Living body verification method, device and system | |
CN107895160A (en) | Human face detection and tracing device and method | |
CN103902961B (en) | Face recognition method and device | |
CN108345869A (en) | Driver's gesture recognition method based on depth image and virtual data | |
CN109101865A (en) | A kind of recognition methods again of the pedestrian based on deep learning | |
CN108256421A (en) | A kind of dynamic gesture sequence real-time identification method, system and device | |
CN108182397B (en) | Multi-pose multi-scale human face verification method | |
CN105138954A (en) | Image automatic screening, query and identification system | |
CN107169974A (en) | It is a kind of based on the image partition method for supervising full convolutional neural networks more | |
CN104680559B (en) | The indoor pedestrian tracting method of various visual angles based on motor behavior pattern | |
CN104850825A (en) | Facial image face score calculating method based on convolutional neural network | |
CN106469298A (en) | Age recognition methodss based on facial image and device | |
CN106326857A (en) | Gender identification method and gender identification device based on face image | |
CN106295591A (en) | Gender identification method based on facial image and device | |
CN110176016B (en) | Virtual fitting method based on human body contour segmentation and skeleton recognition | |
CN106651915A (en) | Target tracking method of multi-scale expression based on convolutional neural network | |
CN104615996B (en) | A kind of various visual angles two-dimension human face automatic positioning method for characteristic point | |
CN104123543A (en) | Eyeball movement identification method based on face identification | |
CN106599785A (en) | Method and device for building human body 3D feature identity information database | |
CN108197584A (en) | A kind of recognition methods again of the pedestrian based on triple deep neural network | |
US20200327726A1 (en) | Method of Generating 3D Facial Model for an Avatar and Related Device | |
CN101853397A (en) | Bionic human face detection method based on human visual characteristics | |
CN102567716A (en) | Face synthetic system and implementation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |