CN106599810A - Head pose estimation method based on stacked auto-encoding - Google Patents


Info

Publication number
CN106599810A
CN106599810A
Authority
CN
China
Prior art keywords
layer, stack, head, parameter, represent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611100343.0A
Other languages
Chinese (zh)
Other versions
CN106599810B (en)
Inventor
潘力立
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201611100343.0A priority Critical patent/CN106599810B/en
Publication of CN106599810A publication Critical patent/CN106599810A/en
Application granted granted Critical
Publication of CN106599810B publication Critical patent/CN106599810B/en
Expired - Fee Related
Anticipated expiration


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 — Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 — Movements or behaviour, e.g. gesture recognition

Abstract

The invention discloses a head pose estimation method based on stacked auto-encoding, belonging to the technical field of computer vision. The main idea is to establish a nonlinear mapping relation between head depth images and pose by means of a stacked auto-encoder. The method comprises: first acquiring a large number of head depth images as training samples, extracting their histogram of oriented gradients (HOG) features, and recording the corresponding head poses; then designing the stacked auto-encoder and learning the parameters of each of its layers from the training samples and the calibrated pose data by gradient descent; and finally, for a head image whose pose is to be estimated, extracting its HOG features and estimating the head pose with the learned stacked auto-encoder. Compared with conventional head pose estimation methods, this method can model the complex mapping from input features to head pose and effectively overcomes the low estimation accuracy of shallow models.

Description

Head pose estimation method based on stacked auto-encoding
Technical field
The invention belongs to the technical field of computer vision and relates to the problem of head pose estimation in images.
Background art
Head pose estimation (see Fig. 1) refers to quickly and accurately estimating, from a digital image of the head and by means of machine learning and computer vision methods, the deflection angles of the head in the image, also referred to as the head pose. In recent years it has become a popular research problem in computer vision and machine learning, with wide applications in human-computer interaction, safe driving and attention analysis. For example, in human-computer interaction the deflection angles of the head can be used to control the direction and position indicated on a computer or machine; in safe driving, the head pose can assist gaze estimation and thus prompt the driver towards the correct viewing direction. Head pose estimation has developed further on the basis of advances in manifold learning and subspace theory. Existing head pose estimation methods fall into three broad categories: 1. appearance-based methods, 2. classification-based methods and 3. regression-based methods.
The basic principle of appearance-based head pose estimation methods is to compare the input head image one by one with the images already in a database, and to take the angle corresponding to the most similar image found as the head pose (i.e. the angle) of the image to be estimated. The main drawbacks of such methods are that they can only output discrete head deflection angles and that, because the input must be compared with every existing image in turn, the computational cost is enormous. See: D. J. Beymer, Face Recognition under Varying Pose, IEEE Conference on Computer Vision and Pattern Recognition, pp. 756-761, 1994, and J. Sherrah, S. Gong, and E. J. Ong, Face Distributions in Similarity Space under Varying Head Pose, Image and Vision Computing, vol. 19, no. 12, pp. 807-819, 2001.
Classification-based head pose estimation methods train a classifier from the features of the input images and the corresponding head deflection angles, and use the learned classifier to decide which class the deflection angle of the image to be estimated belongs to, thereby determining the approximate range of the head pose. Classifiers commonly used in such methods include the support vector machine (SVM), linear discriminant analysis (LDA) and kernel linear discriminant analysis (KLDA). The main drawback of this kind of method is that it cannot output continuous head pose estimates. See: J. Huang, X. Shao, and H. Wechsler, Face Pose Discrimination using Support Vector Machines (SVM), International Conference on Pattern Recognition, pp. 154-156, 1998.
Regression-based head pose estimation methods are currently the most common. Their basic principle is to build a mapping function from existing image features and the corresponding head angles, and to use this mapping function to estimate the head pose of the image to be processed. Such methods solve the problem that the two preceding categories cannot output continuous poses, while also reducing the computational complexity. See G. Fanelli, J. Gall, and L. Van Gool, Real Time Head Pose Estimation with Random Regression Forests, IEEE Conference on Computer Vision and Pattern Recognition, pp. 617-624, 2011, and H. Ji, R. Liu, F. Su, Z. Su, and Y. Tian, Convex Regularized Sparse Regression for Head Pose Estimation, IEEE International Conference on Image Processing, pp. 3617-3620, 2011.
Summary of the invention
The task of the present invention is to provide a head pose estimation method based on stacked auto-encoding. The method takes a depth image as input and uses a stacked auto-encoder to find the mapping relation between the depth image and the corresponding head pose. Through this modeling approach, the complex mapping relation between depth image and head pose can be found accurately, which both improves the accuracy of head pose estimation and ensures the efficiency of the estimation.
To facilitate the description of the invention, several terms are defined first.
Definition 1: Head pose. The rotation of the head in three-dimensional space is usually represented by a vector of three elements: the first element is the pitch angle, the second the yaw angle, and the third the roll angle.
Definition 2: Pitch angle. In the x-y-z coordinate system shown in Fig. 2(b), the pitch angle is the angle θ of rotation about the x-axis.
Definition 3: Yaw angle. In the x-y-z coordinate system shown in Fig. 2(a), the yaw angle is the angle φ of rotation about the z-axis.
Definition 4: Roll angle. In the x-y-z coordinate system shown in Fig. 2(c), the roll angle is the angle Ψ of rotation about the z'-axis.
Definition 5: Histogram of oriented gradients (HOG) feature. A visual feature extraction method that describes the appearance and shape of objects in an image by the distribution of intensity gradients or edge directions. It is implemented by first dividing the image into small connected regions called cells; then collecting a histogram of the gradient directions or edge orientations of the pixels within each cell; and finally combining these histograms into the feature descriptor. To improve accuracy, the local histograms can be contrast-normalized over larger regions of the image called blocks, by first computing the density of the histograms within a block and then normalizing each cell in the block by this density value. This normalization gives better robustness to illumination changes and shadows.
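For illustration only, the HOG extraction of Definition 5 might be sketched with scikit-image as below; the image size, cell size and block size are assumptions chosen for the sketch (the embodiment later uses 96×128 images and a 1440-dimensional descriptor, whose exact HOG parameters are not specified here), not values fixed by this definition.

# Sketch of HOG feature extraction (assumed parameters, not the patent's exact settings).
import numpy as np
from skimage.feature import hog
from skimage.transform import resize

def extract_hog(depth_image, size=(96, 128)):
    """Resize a single-channel head depth image and return its HOG descriptor."""
    img = resize(depth_image, size, anti_aliasing=True)
    # Cell/block parameters are illustrative; the descriptor length depends on them.
    return hog(img, orientations=9, pixels_per_cell=(16, 16),
               cells_per_block=(2, 2), block_norm='L2-Hys', feature_vector=True)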
Definition 6: Back-propagation algorithm. A supervised learning algorithm often used to train multilayer neural networks. It generally comprises two phases: (1) the forward-propagation phase feeds the training input into the network to obtain the activations; (2) the back-propagation phase takes the difference between these activations and the target output corresponding to the training input, thereby obtaining the response errors of the hidden layers and the output layer.
Definition 7: Gradient descent method. An unconstrained optimization method: to minimize an objective function, the gradient direction is computed and a search is carried out along the opposite direction of the gradient until a local minimum is reached.
The head pose estimation method based on stacked auto-encoding according to the present invention comprises the following steps:
Step 1: Collect N head depth images containing different poses and, according to the position of the camera when each image is captured, record the pitch, yaw and roll of the head corresponding to each of the N images, obtaining the head pose vector $\tilde{\mathbf{y}}_n$; its 1st dimension is the pitch angle, its 2nd dimension the yaw angle and its 3rd dimension the roll angle, and the subscript n denotes the n-th image;
Step 2: Detect the head region of each image collected in Step 1 and extract the histogram of oriented gradients features of that region, forming the HOG feature vector $\tilde{\mathbf{x}}_n$;
Step 3: Normalize each dimension of the HOG feature vectors $\tilde{\mathbf{x}}_n$ obtained in Step 2, compressing the range of the values to the interval [0, 1], and normalize the range of the poses to the interval [0, 1];
The concrete procedure of Step 3 is as follows.
The feature values are compressed to the interval [0, 1]: for the n-th sample, the value $\tilde{x}_{ni}$ of its i-th dimension is normalized by
$$x_{ni} = \frac{\tilde{x}_{ni} - \min(\tilde{x}_{ni},\, n = 1, \ldots, N)}{\max(\tilde{x}_{ni},\, n = 1, \ldots, N) - \min(\tilde{x}_{ni},\, n = 1, \ldots, N)}$$
where $\min(\tilde{x}_{ni},\, n = 1, \ldots, N)$ is the minimum of the i-th dimension over all samples and $\max(\tilde{x}_{ni},\, n = 1, \ldots, N)$ the maximum of the i-th dimension over all samples.
The range of the poses is normalized to [0, 1] by
$$y_{nj} = \frac{\tilde{y}_{nj} + 180}{360}$$
where $\tilde{y}_{nj}$ is the j-th component of the calibrated pose of the n-th sample and $y_{nj}$ the value after normalization of that dimension.
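A minimal sketch of the two normalizations of Step 3, assuming the HOG features of the N samples are stacked row-wise in a NumPy array X_tilde of shape (N, d) and the calibrated angles, in degrees, in Y_tilde of shape (N, 3); the small constant guarding against a zero range is an implementation detail, not part of the method:

import numpy as np

def normalize_features(X_tilde):
    """Min-max normalize each feature dimension to [0, 1] over the N training samples."""
    x_min = X_tilde.min(axis=0)
    x_max = X_tilde.max(axis=0)
    # 1e-12 only guards against a constant dimension (zero range).
    return (X_tilde - x_min) / (x_max - x_min + 1e-12), x_min, x_max

def normalize_poses(Y_tilde):
    """Map pitch/yaw/roll angles from [-180, 180] degrees to [0, 1]."""
    return (Y_tilde + 180.0) / 360.0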
Step 4: Build the mapping function of the stacked auto-encoder (see Fig. 3). Let the input be $x \in \mathbb{R}^{s_1}$, where $s_1$ is the feature dimension. The stacked auto-encoder used in this patent has five layers in total: layer 1 is the input layer, whose input is the HOG feature vector, so the number of nodes in layer 1 equals the dimension of the HOG feature vector; layers 2-4 are hidden layers; layer 5 is the output layer. Any node unit of any layer l is denoted by the symbol $a_i^{(l)}$, where the superscript (l) denotes layer l, and is computed as
$$a_i^{(l+1)} = \sigma\!\left(w_{i1}^{(l)} a_1^{(l)} + w_{i2}^{(l)} a_2^{(l)} + \cdots + w_{i s_l}^{(l)} a_{s_l}^{(l)} + b_i^{(l)}\right), \quad i = 1, \ldots, s_{l+1}$$
Here $w_{i1}^{(l)}, \ldots, w_{i s_l}^{(l)}$ are the parameters connecting all $s_l$ units of layer l of the neural network to the i-th unit of layer l+1; specifically, $w_{ij}^{(l)}$ is the parameter between the j-th unit of layer l and the i-th unit of layer l+1, $b_i^{(l)}$ is the bias term associated with hidden unit i of layer l+1, and $s_{l+1}$ is the number of units in layer l+1. $\sigma(\cdot)$ is the sigmoid function, $\sigma(z) = 1/(1 + e^{-z})$. Defining $z_i^{(l+1)} = \sum_{j=1}^{s_l} w_{ij}^{(l)} a_j^{(l)} + b_i^{(l)}$, the formula above can also be written as
$$a_i^{(l+1)} = \sigma\!\left(z_i^{(l+1)}\right), \quad i = 1, \ldots, s_{l+1}$$
The output layer of the stacked auto-encoder has 3 units, denoted $a_1^{(5)}, a_2^{(5)}, a_3^{(5)}$, representing the pitch, yaw and roll angles of the estimated head pose; the whole stacked auto-encoder model function $h_{w,b}(x)$ denotes the head pose estimated when the input is $x$, i.e. $h_{w,b}(x) = (a_1^{(5)}, a_2^{(5)}, a_3^{(5)})^{\mathrm{T}}$.
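A sketch of the five-layer mapping of Step 4; the layer sizes follow the embodiment (1440-80-80-80-3), while the random initialization scale and the (s_{l+1} × s_l) weight-matrix layout are implementation assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(sizes=(1440, 80, 80, 80, 3), seed=0):
    """Random initial weights w[l] of shape (s_{l+1}, s_l) and biases b[l] of shape (s_{l+1},)."""
    rng = np.random.default_rng(seed)
    w = [rng.normal(0.0, 0.01, (sizes[l + 1], sizes[l])) for l in range(len(sizes) - 1)]
    b = [np.zeros(sizes[l + 1]) for l in range(len(sizes) - 1)]
    return w, b

def forward(x, w, b):
    """Return activations a[0..4] and pre-activations z[1..4]; a[4] is the estimate h_{w,b}(x)."""
    a, z = [x], []
    for wl, bl in zip(w, b):
        z.append(wl @ a[-1] + bl)     # z_i^{(l+1)} = sum_j w_ij^{(l)} a_j^{(l)} + b_i^{(l)}
        a.append(sigmoid(z[-1]))      # a^{(l+1)} = sigma(z^{(l+1)})
    return a, z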
Step 5: When the input is x and the corresponding calibrated pose is y, the error of the stacked auto-encoder between the pose estimate and the calibrated pose is $\frac{1}{2}\|y - h_{w,b}(x)\|^2$.
Meanwhile, to express how much each unit of the output layer contributes to the error, an error term is defined as
$$\delta_i^{(5)} = \frac{\partial}{\partial z_i^{(5)}} \frac{1}{2}\left\|y - h_{w,b}(x)\right\|^2 = -\left(y_i - a_i^{(5)}\right)\sigma'\!\left(z_i^{(5)}\right)$$
where $\sigma'(\cdot)$ denotes the derivative of $\sigma(\cdot)$. Using the back-propagation algorithm, the error term corresponding to each node j of layers l = 2, 3, 4 is computed as
$$\delta_j^{(l)} = \left(\sum_{k=1}^{s_{l+1}} w_{kj}^{(l)} \delta_k^{(l+1)}\right)\sigma'\!\left(z_j^{(l)}\right)$$
Finally the following two partial derivatives of the estimation error with respect to $w_{ij}^{(l)}$ and $b_i^{(l)}$ are obtained:
$$\frac{\partial}{\partial w_{ij}^{(l)}} \frac{1}{2}\left\|y - h_{w,b}(x)\right\|^2 = a_j^{(l)}\delta_i^{(l+1)}, \qquad \frac{\partial}{\partial b_i^{(l)}} \frac{1}{2}\left\|y - h_{w,b}(x)\right\|^2 = \delta_i^{(l+1)}$$
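The error terms and per-sample partial derivatives of Step 5, sketched on top of the forward-pass sketch above; this is the standard back-propagation recursion for the squared error 1/2·||y − h_{w,b}(x)||², using σ'(z) = a(1 − a) for the sigmoid:

import numpy as np

def backprop_single(x, y, w, b):
    """Per-sample gradients of 1/2*||y - h_{w,b}(x)||^2 w.r.t. every w[l] and b[l]."""
    a, z = forward(x, w, b)
    sig_prime = [al * (1.0 - al) for al in a[1:]]          # sigma'(z) expressed via a
    delta = [None] * len(w)                                # delta[l] = error terms of layer l+2
    delta[-1] = -(y - a[-1]) * sig_prime[-1]               # output-layer error term (layer 5)
    for l in range(len(w) - 2, -1, -1):                    # propagate errors back to layer 2
        delta[l] = (w[l + 1].T @ delta[l + 1]) * sig_prime[l]
    grad_w = [np.outer(delta[l], a[l]) for l in range(len(w))]   # a_j^{(l)} * delta_i^{(l+1)}
    grad_b = [delta[l] for l in range(len(w))]                   # delta_i^{(l+1)}
    return grad_w, grad_b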
Step 6: Using the stacked auto-encoder model of Step 4, take the normalized HOG features $[x_1, \ldots, x_N]$ of Step 3 as the input of the stacked auto-encoder, with the corresponding calibrated head pose values $[y_1, \ldots, y_N]$, and set up the optimization objective of the stacked auto-encoder:
$$J(w, b) = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{2}\left\|y_n - h_{w,b}(x_n)\right\|_2^2 + \frac{\lambda}{2}\left\|w\right\|_2^2$$
where λ controls the strength of the regularization term $\frac{\lambda}{2}\|w\|_2^2$.
Step 7: Compute the partial derivatives of the objective function J(w, b) with respect to the parameters $w_{ij}^{(l)}$ and $b_i^{(l)}$:
$$\frac{\partial J(w,b)}{\partial w_{ij}^{(l)}} = \frac{1}{N}\sum_{n=1}^{N} a_{nj}^{(l)}\,\delta_{ni}^{(l+1)} + \lambda w_{ij}^{(l)}, \qquad \frac{\partial J(w,b)}{\partial b_i^{(l)}} = \frac{1}{N}\sum_{n=1}^{N}\delta_{ni}^{(l+1)}$$
where $a_{nj}^{(l)}$ and $\delta_{ni}^{(l+1)}$ denote, when the input is $x_n$, the output of the j-th unit of layer l and the error term of the i-th unit of layer l+1, respectively; finally the gradients of J(w, b) with respect to the parameter vectors w and b, $\nabla_w J(w,b)$ and $\nabla_b J(w,b)$, are obtained.
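Building on the two sketches above, the objective J(w, b) of Step 6 and its gradients from Step 7, including the λw term contributed by the regularizer, might be computed as follows:

import numpy as np

def objective_and_gradients(X, Y, w, b, lam):
    """J(w,b) = (1/N)*sum_n 1/2*||y_n - h(x_n)||^2 + (lam/2)*sum_l ||w[l]||_F^2, with gradients."""
    N = X.shape[0]
    grad_w = [np.zeros_like(wl) for wl in w]
    grad_b = [np.zeros_like(bl) for bl in b]
    J = 0.0
    for x, y in zip(X, Y):
        a, _ = forward(x, w, b)
        J += 0.5 * np.sum((y - a[-1]) ** 2)
        gw, gb = backprop_single(x, y, w, b)
        for l in range(len(w)):
            grad_w[l] += gw[l]
            grad_b[l] += gb[l]
    J = J / N + 0.5 * lam * sum(np.sum(wl ** 2) for wl in w)
    grad_w = [gw / N + lam * wl for gw, wl in zip(grad_w, w)]   # data term plus lambda*w
    grad_b = [gb / N for gb in grad_b]
    return J, grad_w, grad_b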
Step 8: To obtain the optimal stacked auto-encoder parameters w and b, the parameters are first initialized and then optimized by gradient descent, in the following two steps (both stages are sketched after this step):
(a) Initialization of w and b. First w and b are initialized randomly, with w written as $(w^{(1)}, \ldots, w^{(4)})^{\mathrm{T}}$, where $w^{(l)}$ denotes the parameters of layer l, and b written as $(b^{(1)}, \ldots, b^{(4)})^{\mathrm{T}}$; the parameters of layers 1, 2 and 3 are then corrected layer by layer. When correcting the parameters of layer 1, gradient descent is used to optimize $w^{(1)}$ and $b^{(1)}$ so that the layer-1 network reconstructs the original input features with minimal reconstruction error. When correcting the parameters of layer 2, gradient descent is used to optimize $w^{(2)}$ and $b^{(2)}$, taking the output of layer 1 as the input of layer 2, so that the layer-2 network reconstructs its input with minimal reconstruction error. When correcting the parameters of layer 3, gradient descent is used to optimize $w^{(3)}$ and $b^{(3)}$, taking the output of layer 2 as the input of layer 3, so that the layer-3 network reconstructs its input with minimal reconstruction error. For the parameters of layer 4, the output of layer 3 is used as the input of layer 4, and $w^{(4)}$ and $b^{(4)}$ are optimized so that the sum of squared errors between the output and the calibrated pose is minimal. The networks of layers 1 to 4 are thus initialized.
(b) Gradient descent. Starting from the initialization, the parameter vectors w and b are updated as
$$w^{[t+1]} = w^{[t]} - \alpha\nabla_w J(w, b), \qquad b^{[t+1]} = b^{[t]} - \alpha\nabla_b J(w, b)$$
where the superscripts [t] and [t+1] denote the t-th and (t+1)-th iterations; the iteration stops when w and b satisfy the convergence condition.
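As indicated in Step 8, the two stages can be sketched as follows. The first sketch reads Step 8(a) as greedy layer-wise pre-training: each hidden layer is trained as a one-hidden-layer auto-encoder that reconstructs its own input through a temporary decoder (discarded afterwards); the learning rate and epoch count are assumptions, and sigmoid is reused from the forward-pass sketch.

import numpy as np

def pretrain_layer(H, n_hidden, lr=0.1, epochs=50, seed=0):
    """Train a one-hidden-layer auto-encoder on the rows of H; return encoder weights/bias."""
    rng = np.random.default_rng(seed)
    n_in = H.shape[1]
    w_enc = rng.normal(0.0, 0.01, (n_hidden, n_in)); b_enc = np.zeros(n_hidden)
    w_dec = rng.normal(0.0, 0.01, (n_in, n_hidden)); b_dec = np.zeros(n_in)
    for _ in range(epochs):
        for x in H:
            h = sigmoid(w_enc @ x + b_enc)              # encode
            r = sigmoid(w_dec @ h + b_dec)              # reconstruct the layer's input
            dr = -(x - r) * r * (1.0 - r)               # error term at the reconstruction layer
            dh = (w_dec.T @ dr) * h * (1.0 - h)         # error term at the hidden layer
            w_dec -= lr * np.outer(dr, h); b_dec -= lr * dr
            w_enc -= lr * np.outer(dh, x); b_enc -= lr * dh
    return w_enc, b_enc

The encoder weights of each trained layer would initialize $w^{(l)}$, $b^{(l)}$, and the encoded activations become the training input for the next layer; for the fourth layer, $w^{(4)}$, $b^{(4)}$ would instead be fitted to the normalized poses. The fine-tuning of Step 8(b) is then plain batch gradient descent on J(w, b), stopping when the parameters essentially stop changing; the step size alpha and the tolerance below are assumptions:

def train(X, Y, w, b, lam=1e-4, alpha=0.5, max_iter=1000, tol=1e-6):
    """Batch gradient descent on J(w, b); stops when the parameter update becomes negligible."""
    for t in range(max_iter):
        _, grad_w, grad_b = objective_and_gradients(X, Y, w, b, lam)
        w_new = [wl - alpha * gw for wl, gw in zip(w, grad_w)]
        b_new = [bl - alpha * gb for bl, gb in zip(b, grad_b)]
        change = max(np.max(np.abs(wn - wl)) for wn, wl in zip(w_new, w))
        w, b = w_new, b_new
        if change < tol:                      # convergence: parameters essentially unchanged
            break
    return w, b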
Step 9: For a new head image, determine the head region and extract the HOG features; after numerical normalization, feed them into the trained stacked auto-encoder to obtain the corresponding head pose estimate, and restore the numerical range to -180 to +180.
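Step 9 in sketch form: the new image goes through the same HOG extraction and per-dimension normalization (reusing extract_hog, forward and the training minima and maxima from the earlier sketches), and the network output in [0, 1] is mapped back to degrees in [-180, +180]:

import numpy as np

def estimate_pose(depth_image, w, b, x_min, x_max):
    """Return the estimated (pitch, yaw, roll) in degrees for one head depth image."""
    x_tilde = extract_hog(depth_image)                     # same HOG pipeline as training
    x = (x_tilde - x_min) / (x_max - x_min + 1e-12)        # normalize with the training min/max
    a, _ = forward(x, w, b)
    return a[-1] * 360.0 - 180.0                           # undo the pose normalization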
Further, the concrete procedure of Step 3 is:
the feature values are compressed to the interval [0, 1]: for the n-th sample, the value $\tilde{x}_{ni}$ of its i-th dimension is normalized by
$$x_{ni} = \frac{\tilde{x}_{ni} - \min(\tilde{x}_{ni},\, n = 1, \ldots, N)}{\max(\tilde{x}_{ni},\, n = 1, \ldots, N) - \min(\tilde{x}_{ni},\, n = 1, \ldots, N)}$$
where $\min(\tilde{x}_{ni},\, n = 1, \ldots, N)$ is the minimum of the i-th dimension over all samples and $\max(\tilde{x}_{ni},\, n = 1, \ldots, N)$ the maximum of the i-th dimension over all samples;
the range of the poses is normalized to [0, 1] by
$$y_{nj} = \frac{\tilde{y}_{nj} + 180}{360}$$
where $\tilde{y}_{nj}$ is the j-th component of the calibrated pose of the n-th sample and $y_{nj}$ the value after normalization of that dimension.
Further, in the stacked auto-encoder mentioned in Step 4, the numbers of units in the layers are $s_1 = 1440$, $s_2 = 80$, $s_3 = 80$ and $s_4 = 80$, and the output layer has only 3 units, i.e. $s_5 = 3$.
Further, when the stacked auto-encoder parameters are solved by gradient descent in Step 8, the convergence condition is that the parameters no longer change between two consecutive iterations, i.e. a local optimum has been reached.
The innovation of the present invention is:
It is proposed to use a stacked auto-encoder to establish the nonlinear mapping relation between head depth images and pose. The invention first collects N head depth images as training samples, normalizes the depth images to a size of 96×128, extracts 1440-dimensional HOG features, and records the corresponding head poses. A stacked auto-encoder is then designed with 3 intermediate layers in addition to the input and output layers. Next, the parameters of each layer of the stacked auto-encoder are learned from the training samples and the calibrated pose data by gradient descent. Finally, for a head image whose pose is to be estimated, the HOG features are extracted and the head pose is estimated with the learned stacked auto-encoder. Compared with traditional head pose estimation methods, this method can model the complex mapping from input features to head pose and effectively overcomes the low estimation accuracy of shallow models.
Description of the drawings
Fig. 1 is a schematic diagram of head pose estimation;
Fig. 2 is a schematic diagram of the pitch, yaw and roll angles;
Fig. 3 is a schematic diagram of the stacked auto-encoder.
Specific embodiment
According to the method of the invention, the training model of the stacked auto-encoder is first written in Matlab or C; the collected training samples are then input to train the stacked auto-encoder parameters; next, HOG features are extracted from the collected images and fed as source data into the trained stacked auto-encoder for processing, yielding the estimated head pose. The method of the invention can be used for head pose estimation problems in natural scenes.
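As an illustration only (the patent itself targets a Matlab or C implementation), the Python sketches given with the steps above could be assembled end to end as follows; the synthetic data below merely stands in for real collected depth images and calibrated angles:

import numpy as np

# Synthetic stand-in data, purely to make the sketch executable end to end.
rng = np.random.default_rng(0)
depth_images = [rng.random((96, 128)) for _ in range(20)]   # would be real head depth images
poses_deg = rng.uniform(-60, 60, size=(20, 3))              # calibrated pitch/yaw/roll in degrees

X_tilde = np.stack([extract_hog(img) for img in depth_images])
X, x_min, x_max = normalize_features(X_tilde)
Y = normalize_poses(poses_deg)

w, b = init_params(sizes=(X.shape[1], 80, 80, 80, 3))
# (Optionally initialize the hidden layers with pretrain_layer as in Step 8(a).)
w, b = train(X, Y, w, b, alpha=0.5, max_iter=100)
print(estimate_pose(depth_images[0], w, b, x_min, x_max))   # estimated pitch/yaw/roll in degrees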
The head pose estimation method based on stacked auto-encoding comprises the following steps:
Step 1: Collect N head depth images containing different poses and, according to the position of the camera when each image is captured, record the pitch, yaw and roll of the head corresponding to each of the N images, obtaining the head pose vector $\tilde{\mathbf{y}}_n$; its 1st dimension is the pitch angle, its 2nd dimension the yaw angle and its 3rd dimension the roll angle, and the subscript n denotes the n-th image;
Step 2: Detect the head region of each image collected in Step 1 and extract the histogram of oriented gradients features of that region, forming the HOG feature vector $\tilde{\mathbf{x}}_n$;
Step 3: Normalize each dimension of the HOG feature vectors $\tilde{\mathbf{x}}_n$ obtained in Step 2, compressing the range of the values to the interval [0, 1], and normalize the range of the poses to the interval [0, 1];
Step 4: Build the mapping function of the stacked auto-encoder (see Fig. 3). Let the input be $x \in \mathbb{R}^{s_1}$, where $s_1$ is the feature dimension. The stacked auto-encoder used in this patent has five layers in total: layer 1 is the input layer, whose input is the HOG feature vector, so the number of nodes in layer 1 equals the dimension of the HOG feature vector; layers 2-4 are hidden layers; layer 5 is the output layer. Any node unit of any layer l is denoted $a_i^{(l)}$, where the superscript (l) denotes layer l, and is computed as
$$a_i^{(l+1)} = \sigma\!\left(w_{i1}^{(l)} a_1^{(l)} + w_{i2}^{(l)} a_2^{(l)} + \cdots + w_{i s_l}^{(l)} a_{s_l}^{(l)} + b_i^{(l)}\right), \quad i = 1, \ldots, s_{l+1}$$
Here $w_{i1}^{(l)}, \ldots, w_{i s_l}^{(l)}$ are the parameters connecting all $s_l$ units of layer l to the i-th unit of layer l+1; specifically, $w_{ij}^{(l)}$ is the parameter between the j-th unit of layer l and the i-th unit of layer l+1, $b_i^{(l)}$ is the bias term associated with hidden unit i of layer l+1, and $s_{l+1}$ is the number of units in layer l+1. $\sigma(\cdot)$ is the sigmoid function, $\sigma(z) = 1/(1 + e^{-z})$. Defining $z_i^{(l+1)} = \sum_{j=1}^{s_l} w_{ij}^{(l)} a_j^{(l)} + b_i^{(l)}$, this can also be written as $a_i^{(l+1)} = \sigma(z_i^{(l+1)})$.
The output layer of the stacked auto-encoder has 3 units, denoted $a_1^{(5)}, a_2^{(5)}, a_3^{(5)}$, representing the pitch, yaw and roll angles of the estimated head pose; the whole stacked auto-encoder model function $h_{w,b}(x)$ denotes the head pose estimated when the input is $x$, i.e. $h_{w,b}(x) = (a_1^{(5)}, a_2^{(5)}, a_3^{(5)})^{\mathrm{T}}$.
In the stacked auto-encoder mentioned in Step 4, the numbers of units in the layers are $s_1 = 1440$, $s_2 = 80$, $s_3 = 80$ and $s_4 = 80$, and the output layer has only 3 units, i.e. $s_5 = 3$.
Step 5: When the input is x and the corresponding calibrated pose is y, the error of the stacked auto-encoder between the pose estimate and the calibrated pose is $\frac{1}{2}\|y - h_{w,b}(x)\|^2$.
Meanwhile, to express how much each unit of the output layer contributes to the error, an error term is defined as
$$\delta_i^{(5)} = \frac{\partial}{\partial z_i^{(5)}} \frac{1}{2}\left\|y - h_{w,b}(x)\right\|^2 = -\left(y_i - a_i^{(5)}\right)\sigma'\!\left(z_i^{(5)}\right)$$
where $\sigma'(\cdot)$ denotes the derivative of $\sigma(\cdot)$. Using the back-propagation algorithm, the error term corresponding to each node j of layers l = 2, 3, 4 is computed as
$$\delta_j^{(l)} = \left(\sum_{k=1}^{s_{l+1}} w_{kj}^{(l)} \delta_k^{(l+1)}\right)\sigma'\!\left(z_j^{(l)}\right)$$
Finally the following two partial derivatives of the estimation error with respect to $w_{ij}^{(l)}$ and $b_i^{(l)}$ are obtained:
$$\frac{\partial}{\partial w_{ij}^{(l)}} \frac{1}{2}\left\|y - h_{w,b}(x)\right\|^2 = a_j^{(l)}\delta_i^{(l+1)}, \qquad \frac{\partial}{\partial b_i^{(l)}} \frac{1}{2}\left\|y - h_{w,b}(x)\right\|^2 = \delta_i^{(l+1)}$$
Step 6: Using the stacked auto-encoder model of Step 4, take the normalized HOG features $x_n$ of Step 3 as the input of the stacked auto-encoder, with the corresponding calibrated head pose values $[y_1, \ldots, y_N]$, and set up the optimization objective of the stacked auto-encoder:
$$J(w, b) = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{2}\left\|y_n - h_{w,b}(x_n)\right\|_2^2 + \frac{\lambda}{2}\left\|w\right\|_2^2$$
where λ controls the strength of the regularization term $\frac{\lambda}{2}\|w\|_2^2$.
Step 7: Compute the partial derivatives of the objective function J(w, b) with respect to the parameters $w_{ij}^{(l)}$ and $b_i^{(l)}$:
$$\frac{\partial J(w,b)}{\partial w_{ij}^{(l)}} = \frac{1}{N}\sum_{n=1}^{N} a_{nj}^{(l)}\,\delta_{ni}^{(l+1)} + \lambda w_{ij}^{(l)}, \qquad \frac{\partial J(w,b)}{\partial b_i^{(l)}} = \frac{1}{N}\sum_{n=1}^{N}\delta_{ni}^{(l+1)}$$
where $a_{nj}^{(l)}$ and $\delta_{ni}^{(l+1)}$ denote, when the input is $x_n$, the output of the j-th unit of layer l and the error term of the i-th unit of layer l+1, respectively; finally the gradients of J(w, b) with respect to the parameter vectors w and b, $\nabla_w J(w,b)$ and $\nabla_b J(w,b)$, are obtained.
Step 8: To obtain the optimal stacked auto-encoder parameters w and b, the parameters are first initialized and then optimized by gradient descent, in the following two steps:
(a) Initialization of w and b. First w and b are initialized randomly, with w written as $(w^{(1)}, \ldots, w^{(4)})^{\mathrm{T}}$, where $w^{(l)}$ denotes the parameters of layer l, and b written as $(b^{(1)}, \ldots, b^{(4)})^{\mathrm{T}}$; the parameters of layers 1, 2 and 3 are then corrected layer by layer. When correcting the parameters of layer 1, gradient descent is used to optimize $w^{(1)}$ and $b^{(1)}$ so that the layer-1 network reconstructs the original input features with minimal reconstruction error. When correcting the parameters of layer 2, gradient descent is used to optimize $w^{(2)}$ and $b^{(2)}$, taking the output of layer 1 as the input of layer 2, so that the layer-2 network reconstructs its input with minimal reconstruction error. When correcting the parameters of layer 3, gradient descent is used to optimize $w^{(3)}$ and $b^{(3)}$, taking the output of layer 2 as the input of layer 3, so that the layer-3 network reconstructs its input with minimal reconstruction error. For the parameters of layer 4, the output of layer 3 is used as the input of layer 4, and $w^{(4)}$ and $b^{(4)}$ are optimized so that the sum of squared errors between the output and the calibrated pose is minimal. The networks of layers 1 to 4 are thus initialized.
(b) Gradient descent. Starting from the initialization, the parameter vectors w and b are updated as
$$w^{[t+1]} = w^{[t]} - \alpha\nabla_w J(w, b), \qquad b^{[t+1]} = b^{[t]} - \alpha\nabla_b J(w, b)$$
where the superscripts [t] and [t+1] denote the t-th and (t+1)-th iterations; the iteration stops when w and b satisfy the convergence condition.
When the stacked auto-encoder parameters are solved by gradient descent in Step 8, the convergence condition is that the parameters no longer change between two consecutive iterations, i.e. a local optimum has been reached.
Step 9: For a new head image, determine the head region and extract the HOG features; after numerical normalization, feed them into the trained stacked auto-encoder to obtain the corresponding head pose estimate, and restore the numerical range to -180 to +180.

Claims (4)

1. A head pose estimation method based on stacked auto-encoding, comprising the following steps:
Step 1: collecting N head depth images containing different poses and, according to the position of the camera when each image is captured, recording the pitch, yaw and roll of the head corresponding to each of the N images to obtain the head pose vector $\tilde{\mathbf{y}}_n$, whose 1st dimension is the pitch angle, 2nd dimension the yaw angle and 3rd dimension the roll angle, the subscript n denoting the n-th image;
Step 2: detecting the head region of each image collected in Step 1 and extracting the histogram of oriented gradients features of that region, forming the HOG feature vector $\tilde{\mathbf{x}}_n$;
Step 3: normalizing each dimension of the HOG feature vectors $\tilde{\mathbf{x}}_n$ obtained in Step 2, compressing the range of the values to the interval [0, 1], and normalizing the range of the poses to the interval [0, 1];
the concrete procedure of Step 3 being:
compressing the value range to [0, 1]: for the n-th sample, the value $\tilde{x}_{ni}$ of its i-th dimension is normalized by
$$x_{ni} = \frac{\tilde{x}_{ni} - \min(\tilde{x}_{ni},\, n = 1, \ldots, N)}{\max(\tilde{x}_{ni},\, n = 1, \ldots, N) - \min(\tilde{x}_{ni},\, n = 1, \ldots, N)}$$
where $\min(\tilde{x}_{ni},\, n = 1, \ldots, N)$ is the minimum of the i-th dimension over all samples and $\max(\tilde{x}_{ni},\, n = 1, \ldots, N)$ the maximum of the i-th dimension over all samples;
normalizing the range of the poses to [0, 1] by
$$y_{nj} = \frac{\tilde{y}_{nj} + 180}{360}$$
where $\tilde{y}_{nj}$ is the j-th component of the calibrated pose of the n-th sample and $y_{nj}$ the value after normalization of that dimension;
Step 4: building the mapping function of the stacked auto-encoder: letting the input be $x \in \mathbb{R}^{s_1}$, where $s_1$ is the feature dimension, the stacked auto-encoder used in this patent having five layers in total, layer 1 being the input layer whose input is the HOG feature vector, so that the number of nodes in layer 1 equals the dimension of the HOG feature vector, layers 2-4 being hidden layers and layer 5 being the output layer; any node unit of any layer l being denoted $a_i^{(l)}$, where the superscript (l) denotes layer l, and being computed as
$$a_i^{(l+1)} = \sigma\!\left(w_{i1}^{(l)} a_1^{(l)} + w_{i2}^{(l)} a_2^{(l)} + \cdots + w_{i s_l}^{(l)} a_{s_l}^{(l)} + b_i^{(l)}\right), \quad i = 1, \ldots, s_{l+1}$$
where $w_{i1}^{(l)}, \ldots, w_{i s_l}^{(l)}$ are the parameters connecting all $s_l$ units of layer l of the neural network to the i-th unit of layer l+1; specifically, $w_{ij}^{(l)}$ is the parameter between the j-th unit of layer l and the i-th unit of layer l+1, $b_i^{(l)}$ is the bias term associated with hidden unit i of layer l+1, and $s_{l+1}$ is the number of units in layer l+1; $\sigma(\cdot)$ is the sigmoid function, $\sigma(z) = 1/(1 + e^{-z})$; defining $z_i^{(l+1)} = \sum_{j=1}^{s_l} w_{ij}^{(l)} a_j^{(l)} + b_i^{(l)}$, the formula above can also be written as
$$a_i^{(l+1)} = \sigma\!\left(z_i^{(l+1)}\right), \quad i = 1, \ldots, s_{l+1}$$
the output layer of the stacked auto-encoder having 3 units, denoted $a_1^{(5)}, a_2^{(5)}, a_3^{(5)}$, representing the pitch, yaw and roll angles of the estimated head pose, and the whole stacked auto-encoder model function $h_{w,b}(x)$ denoting the head pose estimated when the input is x, i.e. $h_{w,b}(x) = (a_1^{(5)}, a_2^{(5)}, a_3^{(5)})^{\mathrm{T}}$;
Step 5: when the input is x and the corresponding calibrated pose is y, the error of the stacked auto-encoder between the pose estimate and the calibrated pose being $\frac{1}{2}\|y - h_{w,b}(x)\|^2$;
meanwhile, to express how much each unit of the output layer contributes to the error, defining an error term
$$\delta_i^{(5)} = \frac{\partial}{\partial z_i^{(5)}} \frac{1}{2}\left\|y - h_{w,b}(x)\right\|^2 = -\left(y_i - a_i^{(5)}\right)\sigma'\!\left(z_i^{(5)}\right)$$
where $\sigma'(\cdot)$ denotes the derivative of $\sigma(\cdot)$; using the back-propagation algorithm, computing the error term corresponding to each node j of layers l = 2, 3, 4 as
$$\delta_j^{(l)} = \left(\sum_{k=1}^{s_{l+1}} w_{kj}^{(l)} \delta_k^{(l+1)}\right)\sigma'\!\left(z_j^{(l)}\right)$$
and finally obtaining the following two partial derivatives of the estimation error with respect to $w_{ij}^{(l)}$ and $b_i^{(l)}$:
$$\frac{\partial}{\partial w_{ij}^{(l)}} \frac{1}{2}\left\|y - h_{w,b}(x)\right\|^2 = a_j^{(l)}\delta_i^{(l+1)}$$
$$\frac{\partial}{\partial b_i^{(l)}} \frac{1}{2}\left\|y - h_{w,b}(x)\right\|^2 = \delta_i^{(l+1)}$$
Step 6: using the stacked auto-encoder model of Step 4, taking the normalized HOG features $x_n$ of Step 3 as the input of the stacked auto-encoder, with the corresponding calibrated head pose values $[y_1, \ldots, y_N]$, and setting up the optimization objective of the stacked auto-encoder:
$$J(w, b) = \frac{1}{N}\sum_{n=1}^{N}\frac{1}{2}\left\|y_n - h_{w,b}(x_n)\right\|_2^2 + \frac{\lambda}{2}\left\|w\right\|_2^2$$
where λ controls the strength of the regularization term $\frac{\lambda}{2}\|w\|_2^2$;
Step 7: computing the partial derivatives of the objective function J(w, b) with respect to the parameters $w_{ij}^{(l)}$ and $b_i^{(l)}$:
$$\frac{\partial J(w,b)}{\partial w_{ij}^{(l)}} = \frac{1}{N}\sum_{n=1}^{N} a_{nj}^{(l)}\,\delta_{ni}^{(l+1)} + \lambda w_{ij}^{(l)}$$
$$\frac{\partial J(w,b)}{\partial b_i^{(l)}} = \frac{1}{N}\sum_{n=1}^{N}\delta_{ni}^{(l+1)}$$
where $a_{nj}^{(l)}$ and $\delta_{ni}^{(l+1)}$ denote, when the input is $x_n$, the output of the j-th unit of layer l and the error term of the i-th unit of layer l+1, respectively; and finally obtaining the gradients of J(w, b) with respect to the parameter vectors w and b, $\nabla_w J(w,b)$ and $\nabla_b J(w,b)$;
Step 8: to obtain the optimal stacked auto-encoder parameters w and b, first initializing the parameters and then optimizing them by gradient descent, in the following two steps:
(a) initialization of w and b: first initializing w and b randomly, with w written as $(w^{(1)}, \ldots, w^{(4)})^{\mathrm{T}}$, where $w^{(l)}$ denotes the parameters of layer l, and b written as $(b^{(1)}, \ldots, b^{(4)})^{\mathrm{T}}$, and then correcting the parameters of layers 1, 2 and 3 layer by layer: when correcting the parameters of layer 1, using gradient descent to optimize $w^{(1)}$ and $b^{(1)}$ so that the layer-1 network reconstructs the original input features with minimal reconstruction error; when correcting the parameters of layer 2, using gradient descent to optimize $w^{(2)}$ and $b^{(2)}$, taking the output of layer 1 as the input of layer 2, so that the layer-2 network reconstructs its input with minimal reconstruction error; when correcting the parameters of layer 3, using gradient descent to optimize $w^{(3)}$ and $b^{(3)}$, taking the output of layer 2 as the input of layer 3, so that the layer-3 network reconstructs its input with minimal reconstruction error; for the parameters of layer 4, using the output of layer 3 as the input of layer 4 and optimizing $w^{(4)}$ and $b^{(4)}$ so that the sum of squared errors between the output and the calibrated pose is minimal; the networks of layers 1 to 4 being thus initialized;
(b) gradient descent: starting from the initialization, updating the parameter vectors w and b as
$$w^{[t+1]} = w^{[t]} - \alpha\nabla_w J(w, b)$$
$$b^{[t+1]} = b^{[t]} - \alpha\nabla_b J(w, b)$$
where the superscripts [t] and [t+1] denote the t-th and (t+1)-th iterations, and stopping the iteration when w and b satisfy the convergence condition;
Step 9: for a new head image, determining the head region and extracting its HOG features; after numerical normalization, feeding them into the trained stacked auto-encoder to obtain the corresponding head pose estimate, and restoring the numerical range to -180 to +180.
2. The head pose estimation method based on stacked auto-encoding according to claim 1, characterised in that the concrete procedure of Step 3 is:
compressing the value range to [0, 1]: for the n-th sample, the value $\tilde{x}_{ni}$ of its i-th dimension is normalized by
$$x_{ni} = \frac{\tilde{x}_{ni} - \min(\tilde{x}_{ni},\, n = 1, \ldots, N)}{\max(\tilde{x}_{ni},\, n = 1, \ldots, N) - \min(\tilde{x}_{ni},\, n = 1, \ldots, N)}$$
where $\min(\tilde{x}_{ni},\, n = 1, \ldots, N)$ is the minimum of the i-th dimension over all samples and $\max(\tilde{x}_{ni},\, n = 1, \ldots, N)$ the maximum of the i-th dimension over all samples; and
normalizing the range of the poses to [0, 1] by
$$y_{nj} = \frac{\tilde{y}_{nj} + 180}{360}$$
where $\tilde{y}_{nj}$ is the j-th component of the calibrated pose of the n-th sample and $y_{nj}$ the value after normalization of that dimension.
3. The head pose estimation method based on stacked auto-encoding according to claim 1, characterised in that, in the stacked auto-encoder mentioned in Step 4, the numbers of units in the layers are $s_1 = 1440$, $s_2 = 80$, $s_3 = 80$ and $s_4 = 80$, and the output layer has only 3 units, i.e. $s_5 = 3$.
4. The head pose estimation method based on stacked auto-encoding according to claim 1, characterised in that, when the stacked auto-encoder parameters are solved by gradient descent in Step 8, the convergence condition is that the parameters no longer change between two consecutive iterations, i.e. a local optimum has been reached.
CN201611100343.0A 2016-12-05 2016-12-05 Head pose estimation method based on stacked auto-encoding Expired - Fee Related CN106599810B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611100343.0A CN106599810B (en) 2016-12-05 2016-12-05 Head pose estimation method based on stacked auto-encoding


Publications (2)

Publication Number Publication Date
CN106599810A true CN106599810A (en) 2017-04-26
CN106599810B CN106599810B (en) 2019-05-14

Family

ID=58596108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611100343.0A Expired - Fee Related CN106599810B (en) 2016-12-05 2016-12-05 Head pose estimation method based on stacked auto-encoding

Country Status (1)

Country Link
CN (1) CN106599810B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9292734B2 (en) * 2011-01-05 2016-03-22 Ailive, Inc. Method and system for head tracking and pose estimation
US20160070966A1 (en) * 2014-09-05 2016-03-10 Ford Global Technologies, Llc Head-mounted display head pose and activity estimation
CN104392241A (en) * 2014-11-05 2015-03-04 电子科技大学 Mixed regression-based head pose estimation method
CN105760809A (en) * 2014-12-19 2016-07-13 联想(北京)有限公司 Method and apparatus for head pose estimation

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11367197B1 (en) * 2014-10-20 2022-06-21 Henry Harlyn Baker Techniques for determining a three-dimensional representation of a surface of an object from a set of images
US11869205B1 (en) 2014-10-20 2024-01-09 Henry Harlyn Baker Techniques for determining a three-dimensional representation of a surface of an object from a set of images
CN107506725A (en) * 2017-08-22 2017-12-22 杭州远鉴信息科技有限公司 High voltage isolator positioning and status image recognizer based on neural network
CN107481292A (en) * 2017-09-05 2017-12-15 百度在线网络技术(北京)有限公司 Attitude error estimation method and device for vehicle-mounted camera
CN107481292B (en) * 2017-09-05 2020-07-28 百度在线网络技术(北京)有限公司 Attitude error estimation method and device for vehicle-mounted camera
CN107749757A (en) * 2017-10-18 2018-03-02 广东电网有限责任公司电力科学研究院 Data compression method and device based on stacked auto-encoding and the PSO algorithm
CN107945161A (en) * 2017-11-21 2018-04-20 重庆交通大学 Road surface defect inspection method based on texture feature extraction
CN107945161B (en) * 2017-11-21 2020-10-23 重庆交通大学 Road surface defect detection method based on textural feature extraction
CN110533065A (en) * 2019-07-18 2019-12-03 西安电子科技大学 Shield attitude prediction method based on auto-encoded features and a deep learning regression model

Also Published As

Publication number Publication date
CN106599810B (en) 2019-05-14

Similar Documents

Publication Publication Date Title
CN108345869B (en) Driver posture recognition method based on depth image and virtual data
CN110059558B (en) Orchard obstacle real-time detection method based on improved SSD network
CN106599810A (en) Head pose estimation method based on stacked auto-encoding
CN108764065B (en) Pedestrian re-recognition feature fusion aided learning method
CN110532920B (en) Face recognition method for small-quantity data set based on FaceNet method
CN110674741B (en) Gesture recognition method in machine vision based on double-channel feature fusion
CN104392241B (en) Head pose estimation method based on mixed regression
CN108171112A (en) Vehicle identification and tracking based on convolutional neural networks
CN112184752A (en) Video target tracking method based on pyramid convolution
CN107424161B (en) Coarse-to-fine indoor scene image layout estimation method
CN107808129A (en) Facial multi-feature-point localization method based on a single convolutional neural network
CN108182397B (en) Multi-pose multi-scale human face verification method
CN104268539A (en) High-performance human face recognition method and system
CN106599994A (en) Gaze estimation method based on a deep regression network
CN103324938A (en) Method for training attitude classifier and object classifier and method and device for detecting objects
CN111368759B (en) Monocular vision-based mobile robot semantic map construction system
CN105205449A (en) Sign language recognition method based on deep learning
CN103279936A (en) Human face fake photo automatic combining and modifying method based on portrayal
CN104636732A (en) Pedestrian identification method based on sequential deep belief networks
CN105760898A (en) Vision mapping method based on mixed group regression method
WO2023151237A1 (en) Face pose estimation method and apparatus, electronic device, and storage medium
CN113361542A (en) Local feature extraction method based on deep learning
CN105488541A (en) Natural feature point identification method based on machine learning in augmented reality system
CN112232263A (en) Tomato identification method based on deep learning
CN103093211B (en) Human body motion tracking method based on deep kernel information image features

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190514

Termination date: 20211205