CN106599994B - Gaze estimation method based on a deep regression network - Google Patents

Gaze estimation method based on a deep regression network

Info

Publication number
CN106599994B
Authority
CN
China
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201611036387.1A
Other languages
Chinese (zh)
Other versions
CN106599994A (en)
Inventor
潘力立 (Pan Lili)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201611036387.1A
Publication of CN106599994A
Application granted
Publication of CN106599994B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/50 Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HOG]; by summing image-intensity values; projection analysis

Abstract

This patent proposes a gaze estimation method based on a deep regression network, belonging to the fields of computer vision and machine learning. The main idea of the method is to establish the mapping between input image features and gaze through a deep regression network. First, histogram-of-oriented-gradients features are extracted from the eye-image region; then a 5-layer deep regression model is established to fit the mapping between input image features and output gaze direction; next, the parameters of the deep regression model are optimized by gradient descent; finally, for an eye image to be estimated, the gaze direction is estimated with the learned deep model.

Description

Gaze estimation method based on a deep regression network
Technical field
The invention belongs to the technical field of computer vision and relates to deep learning methods. It mainly addresses visual mapping problems such as gaze estimation and gaze tracking, and can be applied to fields such as safe vehicle driving and region-of-interest detection.
Background technique
In computer vision, gaze estimation refers to locating the eye region in an input face image and automatically estimating the gaze direction from the eyeball position. Existing gaze estimation methods fall into two classes: (1) geometry-based methods and (2) regression-based methods. See: Takahiro Ishikawa, Simon Baker, Iain Matthews, and Takeo Kanade, Passive Driver Gaze Tracking with Active Appearance Models, Tech. Report CMU-RI-TR-04-08, 2004.
Geometry-based gaze estimation methods compute the gaze direction mainly from the positions of the pupil center, feature points on the upper and lower eyelids, and the eye corners. These eye-region feature points are usually located with an Active Appearance Model, a method that locates facial feature points (eye corners, mouth corners, etc.) from the global appearance of the face. The advantage of geometry-based methods is high gaze estimation accuracy when the eye feature points are located precisely; the disadvantage is that active appearance models are sensitive to illumination, occlusion, and pose, which makes the eye feature-point localization unreliable. See: Takahiro Ishikawa, Simon Baker, Iain Matthews, and Takeo Kanade, Passive Driver Gaze Tracking with Active Appearance Models, Tech. Report CMU-RI-TR-04-08, 2004, and Iain Matthews and Simon Baker, Active Appearance Models Revisited, International Journal of Computer Vision, Vol. 60, No. 2, pp. 135-164, 2004.
Regression-based gaze estimation methods detect the eye region and establish a mapping between eye-image features and gaze. Existing methods build this mapping mainly with support vector regression, Gaussian process regression, and the like. Their main advantage is that they are simple and easy to implement provided the eye region is located accurately; their disadvantage is that existing regression methods have difficulty describing the mapping between eye features and gaze with high accuracy. See: Zhiwei Zhu, Qiang Ji, and Kristin P. Bennett, Nonlinear Eye Gaze Mapping Function Estimation via Support Vector Regression, The 18th International Conference on Pattern Recognition, Vol. 1, pp. 1132-1135, 2006, and Oliver Williams, Andrew Blake, and Roberto Cipolla, Sparse and Semi-supervised Visual Mapping with the S3GP, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Vol. 1, pp. 230-237, 2006.
In regression-based gaze estimation, the most important problem is establishing a nonlinear regression model from eye features to gaze. Among nonlinear regression models, the deep regression model has been found to be one of the best for solving this problem. Owing to its high accuracy, flexibility, and versatility, deep methods are now widely used; in recent years, research work centered on autoencoders has been applied to more and more practical problems. See: Yann LeCun, Yoshua Bengio, and Geoffrey Hinton, Deep Learning, Nature, Vol. 521, pp. 436-444, 2015.
Summary of the invention
The present invention provides a gaze estimation method based on a deep network. First, the collected eye images are size-normalized; then a deep regression model is established between the histogram-of-oriented-gradients features of the input eye images and the corresponding gaze; next, the parameters of the deep regression model are initialized and solved by gradient descent; finally, histogram-of-oriented-gradients features are extracted from the eye image to be estimated, and the gaze direction is estimated with the learned deep regression model. A schematic of the algorithm is shown in Fig. 2.
In order to describe the content of the invention conveniently, some terms are defined first.
Definition 1: gaze direction. The direction in which the eyeball is fixated, usually represented by a vector in two-dimensional space consisting of two elements: the first element is the horizontal angle and the second element is the vertical angle.
Definition 2: histogram-of-oriented-gradients (HOG) feature. A visual feature descriptor that describes the appearance and shape of objects in an image by the distribution of intensity-gradient directions or edge directions. The image is first divided into small connected regions called cells; the gradient-direction or edge-orientation histogram of the pixels within each cell is computed; and these histograms are concatenated to form the feature descriptor. To improve accuracy, the local histograms can be contrast-normalized over larger regions of the image called blocks: the histogram density over each block is computed first, and every cell in the block is then normalized by this density. This normalization gives stronger robustness to illumination changes and shadows.
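For illustration only, a minimal sketch of HOG extraction with scikit-image follows; the mapping of the block (2 × 2), cell (4 × 4), and orientation-bin (9) parameters given later in step 2 onto `pixels_per_cell` and `cells_per_block` is an assumption, not the patent's exact configuration.

```python
# Sketch only: HOG extraction with scikit-image on a grayscale eye image.
# The parameter mapping below is an assumed reading of the patent's 2x2
# block / 4x4 cell / 9 orientation-bin settings, not its exact setup.
from skimage.feature import hog
from skimage.transform import resize

def extract_hog(eye_image):
    """Resize a grayscale eye image to 100x60 pixels and return its HOG vector."""
    img = resize(eye_image, (60, 100))        # rows x cols, i.e. 100x60 pixels
    return hog(img,
               orientations=9,                # 9 orientation bins
               pixels_per_cell=(4, 4),        # assumed cell size
               cells_per_block=(2, 2),        # assumed block size
               block_norm='L2')               # contrast normalization per block
```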
Definition 3: deep regression network. A neural network that performs regression layer by layer; the output of each layer is the input of the next.
Definition 4: S-shaped function. The sigmoid function, generally written σ(·), with expression σ(z) = 1 / (1 + e^(-z)).
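A minimal sketch of this function and its derivative σ'(z) = σ(z)(1 - σ(z)), on which the error terms of step 6 below rely:

```python
import numpy as np

def sigmoid(z):
    """Sigmoid function: sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    """Derivative sigma'(z) = sigma(z) * (1 - sigma(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)
```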
Definition 5: backpropagation algorithm. A supervised learning algorithm often used to train multilayer neural networks. It generally comprises two stages: (1) the forward-propagation stage, in which a training input is fed into the network to obtain the activation responses; (2) the back-propagation stage, in which the difference between the activation responses and the target outputs corresponding to the training inputs is computed, yielding the response errors of the hidden and output layers.
Definition 6: gradient descent. An unconstrained optimization method in which, to minimize an objective function, one finds the gradient direction and searches along the opposite direction until a local minimum is reached.
The gaze estimation method based on a deep regression network according to the invention comprises the following steps:
Step 1: Collect N eye images with different gaze directions (see Fig. 1) and record the gaze direction ỹ_n = (ỹ_{n1}, ỹ_{n2})^T corresponding to each image; the first dimension represents the horizontal direction, the second dimension the vertical direction, and the subscript n indicates the gaze direction corresponding to the n-th image;
Step 2: Normalize the eye images collected in step 1 and extract histogram-of-oriented-gradients (HOG) features.
Step 3: Normalize the range of the gaze directions of the N images to the interval [0, 1]; since the recorded angles lie in [-90°, +90°] (cf. step 8), the specific procedure is:

y_{nj} = (ỹ_{nj} + 90°) / 180°, j = 1, 2,

where ỹ_{nj} denotes the j-th dimension of the calibrated gaze of the n-th sample and y_{nj} the value after normalization.
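A small sketch of this normalization and its inverse (used again in step 8), assuming raw angles in [-90°, +90°] as stated there:

```python
def normalize_angle(angle_deg):
    """Map a gaze angle in [-90, +90] degrees to [0, 1] (step 3)."""
    return (angle_deg + 90.0) / 180.0

def denormalize_angle(y):
    """Recover a gaze angle in [-90, +90] degrees from [0, 1] (step 8)."""
    return 180.0 * y - 90.0
```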
Step 4: Design the mapping function of the deep regression model. For an input feature x_n ∈ R^{s_1}, where s_1 denotes the feature dimension, the input feature is mapped layer by layer with a^{(1)} = x_n:

z_i^{(l+1)} = Σ_{j=1}^{s_l} W_{ij}^{(l)} a_j^{(l)} + b_i^{(l)},

where z_i^{(l+1)} denotes the input of the i-th unit of layer l+1, and W_i^{(l)} denotes the parameters connecting all s_l units of layer l of the deep neural network to the i-th unit of layer l+1. Specifically, W_{ij}^{(l)} denotes the parameter connecting the j-th unit of layer l to the i-th unit of layer l+1, b_i^{(l)} is the bias term associated with hidden unit i of layer l+1, and s_{l+1} is the number of hidden units of layer l+1. Whether the i-th unit of layer l+1 is activated is determined by the output of the sigmoid function, that is:

a_i^{(l+1)} = σ(z_i^{(l+1)}).

In matrix-vector form, the above can also be written as:

z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}, a^{(l+1)} = σ(z^{(l+1)}).

The output layer of the deep regression model designed in this patent has 2 units, denoted a^{(L)} = (a_1^{(L)}, a_2^{(L)})^T, which estimate the horizontal angle and vertical angle of the gaze direction; the superscript (L) denotes the output layer. The overall deep regression model function h_{W,b}(x_n) denotes the gaze estimate when the input is x_n, that is:

h_{W,b}(x_n) = a^{(L)} = σ(W^{(L-1)} a^{(L-1)} + b^{(L-1)}).
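A sketch of this layer-by-layer mapping, assuming the parameters of each layer l are stored as a matrix W^{(l)} of shape (s_{l+1}, s_l) and a bias vector b^{(l)}:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))              # Definition 4

def forward(x, weights, biases):
    """Map an input feature x layer by layer through the network.

    weights and biases are lists with one entry per connection; entry l
    implements z = W @ a + b followed by a = sigmoid(z). Returns all
    activations a and pre-activations z for reuse in backpropagation
    (step 6); the last activation is h_{W,b}(x).
    """
    activations, pre_activations = [x], []
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b
        a = sigmoid(z)
        pre_activations.append(z)
        activations.append(a)
    return activations, pre_activations
```

For the layer sizes given later (s_1 = 1440, s_2 = 300, s_3 = 250, s_4 = 200, s_5 = 2), `weights` would hold four matrices of shapes (300, 1440), (250, 300), (200, 250), and (2, 200).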
Step 5: Take the histogram-of-oriented-gradients features [x_1, ..., x_N] normalized in step 2 as the input of the deep regression model, with corresponding calibrated gaze directions [y_1, ..., y_N], and establish the objective function of the deep regression model:

J(w, b) = (1/N) Σ_{n=1}^{N} (1/2) ||h_{W,b}(x_n) - y_n||² + (λ/2) Σ_{l=1}^{L-1} Σ_{i=1}^{s_{l+1}} Σ_{j=1}^{s_l} (W_{ij}^{(l)})²,

where L is the number of layers of the deep regression network and λ controls the strength of the second (regularization) term.
Step 6: Let z_i^{(l)}(x_n) denote the input of unit i of layer l when the input is x_n. To express how much any unit i of any layer l contributes to the squared error, define an error term δ_i^{(l)}(x_n), i = 1, ..., s_l, l = 2, ..., L. For the output layer (layer L), the error term of each unit i is:

δ_i^{(L)}(x_n) = -(y_{ni} - a_i^{(L)}(x_n)) · σ'(z_i^{(L)}(x_n)),

where σ'(·) denotes the derivative of σ(·). Using the backpropagation algorithm, the error term of each node j at layers l = 2, 3, ..., L-1 is computed as:

δ_j^{(l)}(x_n) = (Σ_{i=1}^{s_{l+1}} W_{ij}^{(l)} δ_i^{(l+1)}(x_n)) · σ'(z_j^{(l)}(x_n)).

Finally, the partial derivatives of the objective function J(w, b) with respect to the parameters W_{ij}^{(l)} and b_i^{(l)} are obtained:

∂J/∂W_{ij}^{(l)} = (1/N) Σ_{n=1}^{N} a_j^{(l)}(x_n) δ_i^{(l+1)}(x_n) + λ W_{ij}^{(l)}, ∂J/∂b_i^{(l)} = (1/N) Σ_{n=1}^{N} δ_i^{(l+1)}(x_n),

where a_j^{(l)}(x_n) and δ_i^{(l+1)}(x_n) denote, when the input is x_n, the output of the j-th unit of layer l and the error term of the i-th unit of layer l+1, respectively. Stacking these finally gives the gradients ∇_w J(w, b) and ∇_b J(w, b) of the objective function with respect to the parameter vectors w and b.
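A per-sample sketch of these error terms and gradients, reusing `forward` and `sigmoid_prime` from the earlier sketches; the weight-decay term λW^{(l)} is added once after averaging over samples, as in the partial derivatives above:

```python
import numpy as np

def per_sample_gradients(x, y, weights, biases):
    """Error terms (deltas) and unregularized gradients for one sample.

    Output layer: delta_L = -(y - a_L) * sigma'(z_L); hidden layers:
    delta_l = (W_l^T @ delta_{l+1}) * sigma'(z_l); gradients:
    dJ/dW_l = outer(delta_{l+1}, a_l), dJ/db_l = delta_{l+1}.
    """
    activations, zs = forward(x, weights, biases)
    delta = -(y - activations[-1]) * sigmoid_prime(zs[-1])
    grads_W = [None] * len(weights)
    grads_b = [None] * len(biases)
    for l in range(len(weights) - 1, -1, -1):
        grads_W[l] = np.outer(delta, activations[l])
        grads_b[l] = delta
        if l > 0:                                  # propagate the error backwards
            delta = (weights[l].T @ delta) * sigmoid_prime(zs[l - 1])
    return grads_W, grads_b
```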
Step 7: To find the optimal parameters w and b of the deep neural network, we first initialize the parameters; the initialization values are chosen to minimize the reconstruction error of the input signal (layer-wise pretraining of the stacked autoencoder), yielding initial values w^{[0]} and b^{[0]}. The parameters are then optimized by gradient descent, that is:

w^{[t+1]} = w^{[t]} - α ∇_w J(w^{[t]}, b^{[t]}), b^{[t+1]} = b^{[t]} - α ∇_b J(w^{[t]}, b^{[t]}),

where the superscripts [t] and [t+1] denote the t-th and (t+1)-th iterations. Iteration stops when w and b satisfy the convergence condition.
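A sketch of this batch gradient-descent loop under the assumptions of the earlier sketches; the pretrained initial values w^{[0]}, b^{[0]} are taken as given, and the tolerance `tol` stands in for the "parameters no longer change" condition:

```python
import numpy as np

def train(X, Y, weights, biases, alpha=0.5, lam=1e-4, tol=1e-8, max_iter=100000):
    """Batch gradient descent of step 7.

    X, Y are lists of normalized feature vectors and gaze targets;
    weights, biases hold the (pretrained) initial values w[0], b[0].
    """
    for t in range(max_iter):
        gW = [np.zeros_like(W) for W in weights]
        gb = [np.zeros_like(b) for b in biases]
        for x, y in zip(X, Y):                     # average over the N samples
            dW, db = per_sample_gradients(x, y, weights, biases)
            gW = [g + d for g, d in zip(gW, dW)]
            gb = [g + d for g, d in zip(gb, db)]
        gW = [g / len(X) + lam * W for g, W in zip(gW, weights)]  # add weight decay
        gb = [g / len(X) for g in gb]
        weights = [W - alpha * g for W, g in zip(weights, gW)]    # w[t+1] = w[t] - a*grad
        biases = [b - alpha * g for b, g in zip(biases, gb)]
        if max(np.abs(alpha * g).max() for g in gW + gb) < tol:   # convergence check
            break
    return weights, biases
```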
Step 8: For a new eye image, detect the eye region and extract the histogram-of-oriented-gradients feature; after numerical normalization, feed it into the trained deep network to obtain the estimated gaze direction, and map the numerical range back to -90° to +90°.
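A sketch chaining the earlier helpers into this final step; `feat_min` and `feat_max` are illustrative names (not from the patent) for the per-dimension minima and maxima saved from training:

```python
import numpy as np

def estimate_gaze(eye_image, weights, biases, feat_min, feat_max):
    """Step 8: HOG extraction, normalization, forward pass, de-normalization."""
    x = extract_hog(eye_image)                       # HOG sketch above
    x = (x - feat_min) / np.maximum(feat_max - feat_min, 1e-12)
    activations, _ = forward(x, weights, biases)
    return 180.0 * activations[-1] - 90.0            # back to [-90, +90] degrees
```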
It should further be understood that the specific procedure of step 2 is as follows. The eye images collected in step 1 are normalized to a size of 100 × 60 pixels. In the computation of the histogram-of-oriented-gradients feature, the block parameter is set to 2 × 2, the number of pixel cells in each block is set to 4 × 4, and the number of orientation bins is set to 9; the resulting histogram-of-oriented-gradients feature of any image has dimension 1152. Denote the histogram-of-oriented-gradients feature vector of the n-th image by x̃_n. Each dimension is then numerically normalized, compressing the data range to the interval [0, 1]: for the n-th sample, the i-th dimension x̃_{ni} is normalized as

x_{ni} = (x̃_{ni} - min_n x̃_{ni}) / (max_n x̃_{ni} - min_n x̃_{ni}),

where min_n x̃_{ni} is the minimum over the i-th dimension of all samples and, analogously, max_n x̃_{ni} is the maximum over the i-th dimension of all samples.
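A sketch of this per-dimension min-max normalization over the N × 1152 training matrix, returning the minima and maxima for reuse at test time:

```python
import numpy as np

def minmax_normalize(X):
    """Compress each feature dimension of X (shape N x d) to [0, 1]."""
    lo, hi = X.min(axis=0), X.max(axis=0)            # per-dimension min and max
    return (X - lo) / np.maximum(hi - lo, 1e-12), lo, hi
```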
The stacked autoencoder mentioned in step 4 has layer sizes s_1 = 1440, s_2 = 300, s_3 = 250, and s_4 = 200; the output layer has only 2 units, s_5 = 2.
When the deep network parameters are solved by gradient descent in step 7, the convergence condition is that the parameters of two successive iterations no longer change, i.e., a local optimum is reached. The gradient-descent optimization is:

w^{[t+1]} = w^{[t]} - α ∇_w J(w^{[t]}, b^{[t]}), b^{[t+1]} = b^{[t]} - α ∇_b J(w^{[t]}, b^{[t]}),

where the superscripts [t] and [t+1] denote the t-th and (t+1)-th iterations. Iteration stops when w and b satisfy the convergence condition.
The innovation of the invention is as follows:

It proposes to use a deep regression network to establish the nonlinear mapping between eye images and gaze direction. The invention first collects N eye images as training samples, normalizes them to a size of 100 × 60, extracts 1152-dimensional histogram-of-oriented-gradients features, and records the corresponding gaze directions. A deep regression network is then designed with 3 intermediate layers, excluding the input and output layers. Next, the parameters of each layer of the deep regression network are learned from the training samples and calibrated gaze data by gradient descent. Finally, for an eye image whose gaze is to be estimated, the histogram-of-oriented-gradients feature is extracted and the gaze direction is estimated with the learned deep regression network model. Compared with traditional gaze estimation methods, this method can model the complex mapping from input features to gaze direction and effectively overcomes the low estimation accuracy of shallow models.
Detailed description of the invention
Fig. 1 is a schematic diagram of gaze estimation;
Fig. 2 is a schematic diagram of the deep regression network.
Specific embodiment
According to the method of the invention, the training model of the deep regression network is first written in Matlab or C; next, the collected training samples are input and the parameters of the deep regression network are trained; then, the histogram-of-oriented-gradients features extracted from a captured image are input as source data into the trained deep regression network and processed, yielding the estimated gaze direction. The method of the invention can be used for gaze estimation of eyes in natural scenes.
A gaze estimation method based on a deep regression network comprises the following steps:
Step 1: Collect N eye images with different gaze directions (see Fig. 1) and record the gaze direction ỹ_n = (ỹ_{n1}, ỹ_{n2})^T corresponding to each image; the first dimension represents the horizontal direction, the second dimension the vertical direction, and the subscript n indicates the gaze direction corresponding to the n-th image;
Step 2: Normalize the eye images collected in step 1 to a size of 100 × 60 pixels and extract histogram-of-oriented-gradients (HOG) features. In the computation of the histogram-of-oriented-gradients feature, the block parameter is set to 2 × 2, the number of pixel cells in each block is set to 4 × 4, and the number of orientation bins is set to 9; the resulting histogram-of-oriented-gradients feature of any image has dimension 1152. Denote the histogram-of-oriented-gradients feature vector of the n-th image by x̃_n. Each dimension is then numerically normalized, compressing the data range to [0, 1]: for the n-th sample, the i-th dimension x̃_{ni} is normalized as x_{ni} = (x̃_{ni} - min_n x̃_{ni}) / (max_n x̃_{ni} - min_n x̃_{ni}), where min_n x̃_{ni} and max_n x̃_{ni} are the minimum and maximum over the i-th dimension of all samples.
Step 3: Normalize the range of the gaze directions of the N images to the interval [0, 1]; since the recorded angles lie in [-90°, +90°] (cf. step 8), the specific procedure is:

y_{nj} = (ỹ_{nj} + 90°) / 180°, j = 1, 2,

where ỹ_{nj} denotes the j-th dimension of the calibrated gaze of the n-th sample and y_{nj} the value after normalization.
Step 4: Design the mapping function of the deep regression model. For an input feature x_n ∈ R^{s_1}, where s_1 denotes the feature dimension, the input feature is mapped layer by layer with a^{(1)} = x_n:

z_i^{(l+1)} = Σ_{j=1}^{s_l} W_{ij}^{(l)} a_j^{(l)} + b_i^{(l)},

where z_i^{(l+1)} denotes the input of the i-th unit of layer l+1, and W_i^{(l)} denotes the parameters connecting all s_l units of layer l of the deep neural network to the i-th unit of layer l+1. Specifically, W_{ij}^{(l)} denotes the parameter connecting the j-th unit of layer l to the i-th unit of layer l+1, b_i^{(l)} is the bias term associated with hidden unit i of layer l+1, and s_{l+1} is the number of hidden units of layer l+1. Whether the i-th unit of layer l+1 is activated is determined by the output of the sigmoid function, that is:

a_i^{(l+1)} = σ(z_i^{(l+1)}).

In matrix-vector form, the above can also be written as:

z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}, a^{(l+1)} = σ(z^{(l+1)}).

The output layer of the deep regression model designed in this patent has 2 units, denoted a^{(L)} = (a_1^{(L)}, a_2^{(L)})^T, which estimate the horizontal angle and vertical angle of the gaze direction; the superscript (L) denotes the output layer. The overall deep regression model function h_{W,b}(x_n) denotes the gaze estimate when the input is x_n, that is:

h_{W,b}(x_n) = a^{(L)} = σ(W^{(L-1)} a^{(L-1)} + b^{(L-1)}).
Step 5: Take the histogram-of-oriented-gradients features [x_1, ..., x_N] normalized in step 2 as the input of the deep regression model, with corresponding calibrated gaze directions [y_1, ..., y_N], and establish the objective function of the deep regression model:

J(w, b) = (1/N) Σ_{n=1}^{N} (1/2) ||h_{W,b}(x_n) - y_n||² + (λ/2) Σ_{l=1}^{L-1} Σ_{i=1}^{s_{l+1}} Σ_{j=1}^{s_l} (W_{ij}^{(l)})²,

where L is the number of layers of the deep regression network and λ controls the strength of the second (regularization) term.
Step 6: Let z_i^{(l)}(x_n) denote the input of unit i of layer l when the input is x_n. To express how much any unit i of any layer l contributes to the squared error, define an error term δ_i^{(l)}(x_n), i = 1, ..., s_l, l = 2, ..., L. For the output layer (layer L), the error term of each unit i is:

δ_i^{(L)}(x_n) = -(y_{ni} - a_i^{(L)}(x_n)) · σ'(z_i^{(L)}(x_n)),

where σ'(·) denotes the derivative of σ(·). Using the backpropagation algorithm, the error term of each node j at layers l = 2, 3, ..., L-1 is computed as:

δ_j^{(l)}(x_n) = (Σ_{i=1}^{s_{l+1}} W_{ij}^{(l)} δ_i^{(l+1)}(x_n)) · σ'(z_j^{(l)}(x_n)).

Finally, the partial derivatives of the objective function J(w, b) with respect to the parameters W_{ij}^{(l)} and b_i^{(l)} are obtained:

∂J/∂W_{ij}^{(l)} = (1/N) Σ_{n=1}^{N} a_j^{(l)}(x_n) δ_i^{(l+1)}(x_n) + λ W_{ij}^{(l)}, ∂J/∂b_i^{(l)} = (1/N) Σ_{n=1}^{N} δ_i^{(l+1)}(x_n),

where a_j^{(l)}(x_n) and δ_i^{(l+1)}(x_n) denote, when the input is x_n, the output of the j-th unit of layer l and the error term of the i-th unit of layer l+1, respectively. Stacking these finally gives the gradients ∇_w J(w, b) and ∇_b J(w, b) of the objective function with respect to the parameter vectors w and b.
Step 7: To find the optimal parameters w and b of the deep neural network, we first initialize the parameters; the initialization values are chosen to minimize the reconstruction error of the input signal, yielding initial values w^{[0]} and b^{[0]}. The parameters are then optimized by gradient descent, that is:

w^{[t+1]} = w^{[t]} - α ∇_w J(w^{[t]}, b^{[t]}), b^{[t+1]} = b^{[t]} - α ∇_b J(w^{[t]}, b^{[t]}),

where the superscripts [t] and [t+1] denote the t-th and (t+1)-th iterations. Iteration stops when w and b satisfy the convergence condition.
Step 8: For a new eye image, detect the eye region and extract the histogram-of-oriented-gradients feature; after numerical normalization, feed it into the trained deep network to obtain the estimated gaze direction, and map the numerical range back to -90° to +90°.
It should further be understood that the stacked autoencoder mentioned in step 4 has layer sizes s_1 = 1440, s_2 = 300, s_3 = 250, and s_4 = 200; the output layer has only 2 units, that is, s_5 = 2.
When the deep network parameters are solved by gradient descent in step 7, the convergence condition is that the parameters of two successive iterations no longer change, i.e., a local optimum is reached.

Claims (3)

1. A gaze estimation method based on a deep regression network, comprising the following steps:
Step 1: Collect N eye images with different gaze directions and record the gaze direction ỹ_n = (ỹ_{n1}, ỹ_{n2})^T corresponding to each image; the first dimension represents the horizontal direction, the second dimension the vertical direction, and the subscript n indicates the gaze direction corresponding to the n-th image;
Step 2: Normalize the eye images collected in step 1 and extract histogram-of-oriented-gradients features, obtaining the histogram-of-oriented-gradients features [x_1, ..., x_N] of the N images;
Step 3: Normalize the range of the gaze directions of the N images to the interval [0, 1], obtaining the calibrated gaze directions [y_1, ..., y_N]; specifically:

y_{nj} = (ỹ_{nj} + 90°) / 180°, j = 1, 2,

where ỹ_{nj} denotes the j-th dimension of the calibrated gaze of the n-th image and y_{nj} the value after normalization; for convenience of expression, (y_{n1}, y_{n2})^T is written y_n;
Step 4: Design the mapping function of the deep regression model. For an input feature x_n ∈ R^{s_1}, where s_1 denotes the feature dimension, the input feature is mapped layer by layer with a^{(1)} = x_n:

z_i^{(l+1)} = Σ_{j=1}^{s_l} W_{ij}^{(l)} a_j^{(l)} + b_i^{(l)},

where z_i^{(l+1)} denotes the input of the i-th unit of layer l+1 and W_i^{(l)} denotes the parameters connecting all s_l units of layer l of the deep neural network to the i-th unit of layer l+1; specifically, W_{ij}^{(l)} denotes the parameter connecting the j-th unit of layer l to the i-th unit of layer l+1, a_i^{(l+1)} denotes the output of the sigmoid function of the i-th unit of layer l+1, b_i^{(l)} is the bias term associated with hidden unit i of layer l+1, and s_{l+1} is the number of hidden units of layer l+1; whether the i-th unit of layer l+1 is activated is determined by the output of the sigmoid function, that is:

a_i^{(l+1)} = σ(z_i^{(l+1)});

the output layer of the designed deep regression model has 2 units, denoted a^{(L)} = (a_1^{(L)}, a_2^{(L)})^T, which estimate the horizontal angle and vertical angle of the gaze direction, the superscript (L) denoting the output layer; the overall deep regression model function h_{W,b}(x_n) denotes the gaze estimate when the input is x_n, that is: h_{W,b}(x_n) = a^{(L)} = σ(W^{(L-1)} a^{(L-1)} + b^{(L-1)}), where σ(·) denotes the sigmoid function;
Step 5: Take the histogram-of-oriented-gradients features [x_1, ..., x_N] normalized in step 2 as the input of the deep regression model, with corresponding calibrated gaze directions [y_1, ..., y_N], and establish the objective function of the deep regression model:

J(w, b) = (1/N) Σ_{n=1}^{N} (1/2) ||h_{W,b}(x_n) - y_n||² + (λ/2) Σ_{l=1}^{L-1} Σ_{i=1}^{s_{l+1}} Σ_{j=1}^{s_l} (W_{ij}^{(l)})²,

where L is the number of layers of the deep regression network and λ controls the strength of the second constraint term;
Step 6: Let z_i^{(l)}(x_n) denote the input of unit i of layer l when the input is x_n. To express how much any unit i of any layer l contributes to the squared error, define an error term δ_i^{(l)}(x_n), i = 1, ..., s_l, l = 2, ..., L. For the output layer, taken as layer L, the error term of each unit i is:

δ_i^{(L)}(x_n) = -(y_{ni} - a_i^{(L)}(x_n)) · σ'(z_i^{(L)}(x_n)),

where σ'(·) denotes the derivative of σ(·) and z_i^{(L)}(x_n) denotes the input of the i-th node of layer L when the input is x_n; using the backpropagation algorithm, the error term of each node j at layers l = 2, 3, ..., L-1 is computed as:

δ_j^{(l)}(x_n) = (Σ_{i=1}^{s_{l+1}} W_{ij}^{(l)} δ_i^{(l+1)}(x_n)) · σ'(z_j^{(l)}(x_n));

finally, the partial derivatives of the objective function J(w, b) with respect to the parameters W_{ij}^{(l)} and b_i^{(l)} are obtained:

∂J/∂W_{ij}^{(l)} = (1/N) Σ_{n=1}^{N} a_j^{(l)}(x_n) δ_i^{(l+1)}(x_n) + λ W_{ij}^{(l)}, ∂J/∂b_i^{(l)} = (1/N) Σ_{n=1}^{N} δ_i^{(l+1)}(x_n),

where a_j^{(l)}(x_n) and δ_i^{(l+1)}(x_n) denote, when the input is x_n, the output of the j-th unit of layer l and the error term of the i-th unit of layer l+1, respectively; stacking these finally gives the gradients ∇_w J(w, b) and ∇_b J(w, b) of the objective function with respect to the parameter vectors w and b;
Step 7: To find the optimal parameters w and b of the deep neural network, we first initialize the parameters; the initialization values are chosen to minimize the reconstruction error of the input signal, yielding initial values w^{[0]} and b^{[0]}; the parameters are then optimized by gradient descent, that is:

w^{[t+1]} = w^{[t]} - α ∇_w J(w^{[t]}, b^{[t]}), b^{[t+1]} = b^{[t]} - α ∇_b J(w^{[t]}, b^{[t]}),

where the superscripts [t] and [t+1] denote the t-th and (t+1)-th iterations and α denotes the gradient-descent step size; iteration stops when w and b satisfy the convergence condition;
Step 8: For a new eye image, detect the eye region and extract the histogram-of-oriented-gradients feature; after numerical normalization, feed it into the trained deep network to obtain the estimated gaze direction, and map the numerical range back to -90° to +90°.
2. The gaze estimation method based on a deep regression network according to claim 1, characterized in that the specific procedure of step 2 is: normalize the eye images collected in step 1 to a size of 100 × 60 pixels; in the computation of the histogram-of-oriented-gradients feature, set the block parameter to 2 × 2, the number of pixel cells in each block to 4 × 4, and the number of orientation bins to 9; the resulting histogram-of-oriented-gradients feature of any image has dimension 1152; denote the histogram-of-oriented-gradients feature vector of the n-th image by x̃_n; then numerically normalize each dimension, compressing the data range to [0, 1]: for the n-th image, the i-th dimension is normalized as

x_{ni} = (x̃_{ni} - min_n x̃_{ni}) / (max_n x̃_{ni} - min_n x̃_{ni}),

where min_n x̃_{ni} is the minimum over the i-th dimension of all samples, max_n x̃_{ni} is analogously the maximum over the i-th dimension of all samples, and x̃_{ni} denotes the i-th dimension of the histogram-of-oriented-gradients feature vector of the n-th image; for convenience of expression, the normalized x_{ni} are collectively written x_n.
3. The gaze estimation method based on a deep regression network according to claim 1, characterized in that, when the deep network parameters are solved by gradient descent in step 7, the convergence condition is that the parameters of two successive iterations no longer change, i.e., a local optimum is reached; the gradient-descent optimization is:

w^{[t+1]} = w^{[t]} - α ∇_w J(w^{[t]}, b^{[t]}), b^{[t+1]} = b^{[t]} - α ∇_b J(w^{[t]}, b^{[t]}),

where the superscripts [t] and [t+1] denote the t-th and (t+1)-th iterations; iteration stops when w and b satisfy the convergence condition.
CN201611036387.1A 2016-11-23 2016-11-23 Gaze estimation method based on a deep regression network Expired - Fee Related CN106599994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611036387.1A CN106599994B (en) 2016-11-23 2016-11-23 Gaze estimation method based on a deep regression network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611036387.1A CN106599994B (en) 2016-11-23 2016-11-23 Gaze estimation method based on a deep regression network

Publications (2)

Publication Number Publication Date
CN106599994A CN106599994A (en) 2017-04-26
CN106599994B true CN106599994B (en) 2019-02-15

Family

ID=58591880

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611036387.1A Expired - Fee Related CN106599994B (en) 2016-11-23 2016-11-23 Gaze estimation method based on a deep regression network

Country Status (1)

Country Link
CN (1) CN106599994B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109409173B (en) * 2017-08-18 2021-06-04 安徽三联交通应用技术股份有限公司 Driver state monitoring method, system, medium and equipment based on deep learning
CN109409172B (en) * 2017-08-18 2021-08-13 安徽三联交通应用技术股份有限公司 Driver sight line detection method, system, medium, and apparatus
CN108345843B (en) * 2018-01-25 2020-12-29 电子科技大学 Head posture estimation method based on mixed depth regression network
EP3750029A1 (en) 2018-02-09 2020-12-16 Pupil Labs GmbH Devices, systems and methods for predicting gaze-related parameters using a neural network
WO2019154509A1 (en) 2018-02-09 2019-08-15 Pupil Labs Gmbh Devices, systems and methods for predicting gaze-related parameters
US11393251B2 (en) 2018-02-09 2022-07-19 Pupil Labs Gmbh Devices, systems and methods for predicting gaze-related parameters
CN108681699A (en) * 2018-05-04 2018-10-19 上海像我信息科技有限公司 Gaze estimation method and gaze estimation device based on deep learning
CN111277857B (en) * 2018-12-04 2021-04-13 清华大学 Streaming media scheduling method and device
EP3912013A1 (en) 2019-01-16 2021-11-24 Pupil Labs GmbH Methods for generating calibration data for head-wearable devices and eye tracking system
US11676422B2 (en) 2019-06-05 2023-06-13 Pupil Labs Gmbh Devices, systems and methods for predicting gaze-related parameters
CN110347837B (en) * 2019-07-17 2022-02-18 电子科技大学 Cardiovascular disease unplanned hospitalization risk prediction method
CN110533065A (en) * 2019-07-18 2019-12-03 西安电子科技大学 Shield machine attitude prediction method based on autoencoder features and a deep learning regression model
CN114241179A (en) * 2021-12-06 2022-03-25 电子科技大学 Gaze estimation method based on self-learning


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120166083A1 (en) * 2010-12-28 2012-06-28 Aisin Aw Co., Ltd. Navigation device, navigation method, and navigation program
CN102830793A (en) * 2011-06-16 2012-12-19 北京三星通信技术研究有限公司 Sight tracking method and sight tracking device
CN104809443A (en) * 2015-05-05 2015-07-29 上海交通大学 Convolutional neural network-based license plate detection method and system
CN105354548A (en) * 2015-10-30 2016-02-24 武汉大学 Surveillance video pedestrian re-recognition method based on ImageNet retrieval
CN106056039A (en) * 2016-05-18 2016-10-26 电子科技大学 Robust mixed regression method for line of sight estimation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
3D Road Environment Modeling Applied to Visibility Mapping: an Experimental; Jean-Philippe Tarel et al.; ACM; 2012-12-31; full text *
Eye Gaze Calculation Based on Nonlinear Polynomial and Generalized Regression Neural Network; Chi Jian-nan et al.; IEEE; 2009-12-31; full text *
Driver gaze region estimation method based on a monocular camera; 冮俊冶; China Master's Theses Full-text Database, Information Science and Technology; 2016-07-15; full text *

Also Published As

Publication number Publication date
CN106599994A (en) 2017-04-26

Similar Documents

Publication Publication Date Title
CN106599994B (en) Gaze estimation method based on a deep regression network
CN106897675B (en) Face liveness detection method combining binocular-vision depth features and appearance features
CN103942577B (en) Identity recognition method in video surveillance based on a self-built sample database and composite features
CN110210551A (en) Visual target tracking method based on adaptive subject sensitivity
CN108182397B (en) Multi-pose, multi-scale face verification method
CN109102547A (en) Robot grasp pose estimation method based on an object-recognition deep learning model
CN107145842A (en) Face recognition method combining LBP feature maps and convolutional neural networks
CN108665481A (en) Adaptive anti-occlusion infrared target tracking method with multilayer depth feature fusion
CN104268539A (en) High-performance face recognition method and system
CN103971106B (en) Multi-view face image gender recognition method and device
CN106599810B (en) Head pose estimation method based on stacked autoencoding
CN109389045A (en) Micro-expression recognition method and device based on a hybrid spatio-temporal convolution model
Sang et al. Pose-invariant face recognition via RGB-D images
CN109255319A (en) Anti-counterfeiting method for face-recognition payment information against still photos
CN106599785A (en) Method and device for building a human-body 3D-feature identity information database
CN108921064B (en) Pedestrian re-identification method based on multi-feature fusion
CN104036299B (en) Human-eye contour tracking method based on local-texture AAM
Ravi et al. Sign language recognition with multi feature fusion and ANN classifier
He et al. Human segmentation of infrared image for mobile robot search
Ejbali et al. Face recognition based on beta 2D elastic bunch graph matching
CN106056039A (en) Robust mixed regression method for line of sight estimation
CN105404883B (en) Heterogeneous three-dimensional face recognition method
Pathak et al. Entropy based CNN for segmentation of noisy color eye images using color, texture and brightness contour features
Baqar et al. Efficient iris recognition system based on dual boundary detection using robust variable learning rate multilayer feed forward neural network
Lin et al. An object tracking method based on CNN and optical flow

Legal Events

Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20190215
Termination date: 20211123