CN106599810A - Head pose estimation method based on stacked auto-encoding - Google Patents
- Publication number
- CN106599810A
- Authority
- CN
- China
- Prior art keywords
- layer
- stack
- head
- parameter
- represent
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
Abstract
The invention discloses a head pose estimation method based on stacked auto-encoding and belongs to the technical field of computer vision. The main idea is to use a stacked auto-encoder to establish a nonlinear mapping between a head depth image and the head pose. The method first acquires a large number of head depth images as training samples, extracts histogram of oriented gradients (HOG) features from them, and records the corresponding head poses. It then designs a stacked auto-encoder and learns the parameters of each of its layers by gradient descent from the training samples and the calibrated pose data. Finally, for a head image whose pose is to be estimated, it extracts the HOG features and estimates the head pose with the learned stacked auto-encoder. Compared with conventional head pose estimation methods, the method can model the complex mapping from input features to head pose and effectively overcomes the low estimation accuracy of shallow models.
Description
Technical field
The invention belongs to the technical field of computer vision and relates to the problem of head pose estimation in images.
Background
Head pose estimation (see Fig. 1) refers to using machine learning and computer vision methods to estimate, quickly and accurately from a digital image of a head, the deflection angles of that head, which are also called the head pose. It has been an active research topic in computer vision and machine learning in recent years and is widely applied in human-computer interaction, safe driving, and attention analysis. For example, in human-computer interaction the deflection angle of the head can be used to control the direction and position of what a computer or machine displays; in safe driving the head pose can assist gaze estimation and thereby prompt the driver to correct the gaze direction. In recent years, head pose estimation has developed further on the basis of advances in manifold learning and subspace theory. Existing head pose estimation methods fall into three broad categories: (1) appearance-based methods, (2) classification-based methods, and (3) regression-based methods.
The basic principle of appearance-based head pose estimation is to compare the input head image one by one with the images already in a database and to take the angle of the most similar image found as the head pose of the image to be estimated. The main drawback of such methods is that they can output only discrete head deflection angles, and because the input must be compared with every existing image in turn, the computational cost is very high. See D. J. Beymer, "Face Recognition under Varying Pose," IEEE Conference on Computer Vision and Pattern Recognition, pp. 756-761, 1994, and J. Sherrah, S. Gong, and E. J. Ong, "Face Distributions in Similarity Space under Varying Head Pose," Image and Vision Computing, vol. 19, no. 12, pp. 807-819, 2001.
Classification-based head pose estimation trains a classifier from the features of input images and the corresponding head deflection angles, and uses the learned classifier to determine the class to which the head deflection angle of the image to be estimated belongs, thereby determining the approximate range of the head pose. Classifiers commonly used in such methods include the support vector machine (SVM), linear discriminant analysis (LDA), and kernel linear discriminant analysis (KLDA). The main drawback of these methods is that they cannot output a continuous head pose. See J. Huang, X. Shao, and H. Wechsler, "Face Pose Discrimination Using Support Vector Machines (SVM)," International Conference on Pattern Recognition, pp. 154-156, 1998.
Regression-based head pose estimation is currently the most commonly used approach. Its basic principle is to build a mapping function from existing image features and the corresponding head angles, and to use that mapping function to estimate the head pose of the image to be processed. Such methods overcome the inability of the two preceding approaches to output a continuous pose, and also reduce computational complexity. See G. Fanelli, J. Gall, and L. Van Gool, "Real Time Head Pose Estimation with Random Regression Forests," IEEE Conference on Computer Vision and Pattern Recognition, pp. 617-624, 2011, and H. Ji, R. Liu, F. Su, Z. Su, and Y. Tian, "Convex Regularized Sparse Regression for Head Pose Estimation," IEEE International Conference on Image Processing, pp. 3617-3620, 2011.
Summary of the invention
The task of the present invention is to provide a head pose estimation method based on stacked auto-encoding. The method takes a depth image as the input image and uses a stacked auto-encoder to find the mapping between the depth image and the corresponding head pose. Modeling the problem in this way makes it possible to capture the complex mapping between depth image and head pose accurately, which improves the accuracy of head pose estimation while preserving the efficiency of estimation.
To describe the content of the invention conveniently, some terms are defined first.
Definition 1: Head pose. The rotation of the head in three-dimensional space is generally represented by a vector of three elements: the first element is the pitch angle, the second is the yaw angle, and the third is the rotation angle.
Definition 2: Pitch angle. In the x-y-z coordinate system shown in Fig. 2(b), the pitch angle is the angle θ of rotation about the x-axis.
Definition 3: Yaw angle. In the x-y-z coordinate system shown in Fig. 2(a), the yaw angle is the angle φ of rotation about the z-axis.
Definition 4: Rotation angle. In the x-y-z coordinate system shown in Fig. 2(c), the rotation angle is the angle Ψ of rotation about the z' axis.
Definition 5: Histogram of oriented gradients (HOG) feature. A visual feature extraction method that describes the appearance and shape of the objects in an image through the distribution of the directions of pixel-intensity gradients or edges. It is implemented by first dividing the image into small connected regions called cells, then collecting a histogram of the gradient or edge directions of the pixels in each cell, and finally combining these histograms to form the feature descriptor. To improve accuracy, the local histograms can be contrast-normalized over larger regions of the image called blocks: the density of the histograms within the block is computed first, and every cell in the block is then normalized by that value. This normalization gives greater robustness to illumination changes and shadows.
Definition 6: Back-propagation algorithm. A supervised learning algorithm that is often used to train multilayer neural networks. It generally comprises two stages: (1) a forward-propagation stage, in which the training input is fed into the network to obtain the activation response; and (2) a back-propagation stage, in which the difference between the activation response and the target output for that training input is taken, yielding the response errors of the hidden layers and the output layer.
Definition 7: Gradient descent. An unconstrained optimization method that, when solving for the minimum of an objective function, finds the gradient direction and searches along the opposite direction until a local minimum is reached.
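The cell-and-block procedure of Definition 5 can be illustrated with a minimal sketch. It builds the orientation histogram of a single cell and normalizes the histograms of one block. The 9 unsigned orientation bins, the central-difference gradient, the magnitude-weighted voting, the L2 block norm, and the function names `cell_histogram` and `normalize_block` are all assumptions chosen for illustration; the patent does not state which HOG variant or parameters it uses.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Orientation histogram of one cell (a 2-D list of intensities).

    Gradients use central differences on interior pixels. Each pixel votes
    for its unsigned orientation bin (0 to 180 degrees) with a weight equal
    to its gradient magnitude. Bin count and weighting are assumptions.
    """
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(1, len(cell) - 1):
        for c in range(1, len(cell[0]) - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

def normalize_block(histograms, eps=1e-6):
    """Concatenate the cell histograms of one block and L2-normalize them."""
    flat = [v for h in histograms for v in h]
    norm = math.sqrt(sum(v * v for v in flat) + eps * eps)
    return [v / norm for v in flat]

# A 4x4 cell containing a horizontal intensity ramp: every interior
# gradient points along x, so all votes fall into the first bin.
ramp = [[float(c) for c in range(4)] for _ in range(4)]
hist = cell_histogram(ramp)
block = normalize_block([hist, hist])
```

A full descriptor would concatenate the normalized blocks over the whole head region; how the patent arrives at its 1440 dimensions is not stated, so the sketch does not try to reproduce that number.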
The head pose estimation method based on stacked auto-encoding according to the present invention comprises the following steps:
Step 1: Acquire N head depth images covering different poses and, from the position of the camera when each image was acquired, record the head pitch, yaw, and rotation angles corresponding to each of the N images, obtaining the head pose vectors $\tilde{y}_n \in \mathbb{R}^3$. The 1st dimension represents the pitch angle, the 2nd dimension the yaw angle, and the 3rd dimension the rotation angle; the subscript n denotes the n-th image;
Step 2: Detect the head region in each image acquired in step 1 and extract the HOG feature of that head region, forming the HOG feature vector $\tilde{x}_n$;
Step 3: Normalize each dimension of the HOG feature vectors $\tilde{x}_n$ obtained in step 2 so that its numerical range is compressed to the interval [0, 1], and normalize the range of the pose to the interval [0, 1];
The specific method of step 3 is as follows:
To compress the numerical range to the interval [0, 1], the i-th dimension $\tilde{x}_{ni}$ of the n-th sample is normalized by
$$x_{ni} = \frac{\tilde{x}_{ni} - \min_m \tilde{x}_{mi}}{\max_m \tilde{x}_{mi} - \min_m \tilde{x}_{mi}},$$
where $\min_m \tilde{x}_{mi}$ is the minimum of the i-th dimension over all samples and $\max_m \tilde{x}_{mi}$ is the maximum of the i-th dimension over all samples;
To normalize the range of the pose to the interval [0, 1], each component is rescaled by
$$y_{nj} = \frac{\tilde{y}_{nj} + 180}{360},$$
where $\tilde{y}_{nj}$ is the j-th component of the calibrated pose of the n-th sample and $y_{nj}$ is its value after normalization;
Step 4: Build the mapping function corresponding to the stacked auto-encoder (see Fig. 3). Let the input be $x \in \mathbb{R}^{s_1}$, where $s_1$ is the dimension of the feature. The stacked auto-encoder used in this patent has five layers in total: layer 1 is the input layer, whose input is the HOG feature vector and whose number of nodes equals the dimension of that vector; layers 2 to 4 are hidden layers; and layer 5 is the output layer. Any node unit of any layer l is denoted $a_i^{(l)}$, where the superscript (l) indicates layer l, with $a^{(1)} = x$, and is computed as
$$a_i^{(l+1)} = \sigma\Big(\sum_{j=1}^{s_l} w_{ij}^{(l)} a_j^{(l)} + b_i^{(l)}\Big),$$
where $w_i^{(l)} = (w_{i1}^{(l)}, \ldots, w_{is_l}^{(l)})$ denotes the parameters connecting all $s_l$ units of layer l to the i-th unit of layer l+1; specifically, $w_{ij}^{(l)}$ is the parameter connecting the j-th unit of layer l to the i-th unit of layer l+1, $b_i^{(l)}$ is the bias term associated with unit i of layer l+1, and $s_{l+1}$ is the number of units in layer l+1; $\sigma(\cdot)$ is the sigmoid function, $\sigma(z) = 1/(1 + e^{-z})$. Defining $z_i^{(l+1)} = \sum_{j=1}^{s_l} w_{ij}^{(l)} a_j^{(l)} + b_i^{(l)}$, the formula above can also be written as
$$a_i^{(l+1)} = \sigma\big(z_i^{(l+1)}\big).$$
The output layer of the stacked auto-encoder has 3 units, denoted $a_1^{(5)}, a_2^{(5)}, a_3^{(5)}$, which represent the pitch, yaw, and rotation angles of the estimated head pose. The function $h_{W,b}(x)$ of the whole stacked auto-encoder model gives the estimated head pose for input x, that is,
$$h_{W,b}(x) = \big(a_1^{(5)}, a_2^{(5)}, a_3^{(5)}\big)^T.$$
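Step 5: For an input x with corresponding calibrated pose y, the error between the pose estimated by the stacked auto-encoder and the calibrated pose is
$$J(w, b; x, y) = \tfrac{1}{2}\,\lVert h_{W,b}(x) - y \rVert^2.$$
To express how much each unit of the output layer contributes to this error, the error term
$$\delta_i^{(5)} = -\big(y_i - a_i^{(5)}\big)\,\sigma'\big(z_i^{(5)}\big)$$
is defined, where $\sigma'(\cdot)$ denotes the derivative of $\sigma(\cdot)$. Using the back-propagation algorithm, the error term of each node j in layers l = 2, 3, 4 is computed as
$$\delta_j^{(l)} = \Big(\sum_{i=1}^{s_{l+1}} w_{ij}^{(l)}\,\delta_i^{(l+1)}\Big)\,\sigma'\big(z_j^{(l)}\big).$$
Finally, the partial derivatives of the estimation error with respect to $w_{ij}^{(l)}$ and $b_i^{(l)}$ are obtained:
$$\frac{\partial J(w, b; x, y)}{\partial w_{ij}^{(l)}} = a_j^{(l)}\,\delta_i^{(l+1)}, \qquad \frac{\partial J(w, b; x, y)}{\partial b_i^{(l)}} = \delta_i^{(l+1)}.$$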
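Step 6: Using the stacked auto-encoder model of step 4, take the normalized HOG features $[x_1, \ldots, x_N]$ from step 3 as the input of the stacked auto-encoder, with corresponding calibrated head poses $[y_1, \ldots, y_N]$, and set up the optimization objective of the stacked auto-encoder:
$$J(w, b) = \frac{1}{N}\sum_{n=1}^{N} J(w, b; x_n, y_n) + \frac{\lambda}{2}\sum_{l=1}^{4}\sum_{i=1}^{s_{l+1}}\sum_{j=1}^{s_l}\big(w_{ij}^{(l)}\big)^2,$$
where w and b collect the parameters $w_{ij}^{(l)}$ and $b_i^{(l)}$ of all layers, and λ controls the strength of the constraint term $\sum_{l}\sum_{i}\sum_{j}\big(w_{ij}^{(l)}\big)^2$;
Step 7: Solve for the partial derivatives of the objective J(w, b) with respect to the parameters $w_{ij}^{(l)}$ and $b_i^{(l)}$:
$$\frac{\partial J(w, b)}{\partial w_{ij}^{(l)}} = \frac{1}{N}\sum_{n=1}^{N} a_{j,n}^{(l)}\,\delta_{i,n}^{(l+1)} + \lambda\,w_{ij}^{(l)}, \qquad \frac{\partial J(w, b)}{\partial b_i^{(l)}} = \frac{1}{N}\sum_{n=1}^{N} \delta_{i,n}^{(l+1)},$$
where $a_{j,n}^{(l)}$ and $\delta_{i,n}^{(l+1)}$ denote, for input $x_n$, the output of the j-th unit of layer l and the error term of the i-th unit of layer l+1. This finally yields the gradients $\nabla_w J(w, b)$ and $\nabla_b J(w, b)$ of the objective with respect to the parameter vectors w and b;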
Step 8: To obtain the optimal stacked auto-encoder parameters w and b, the parameters are first initialized and then optimized by gradient descent, in the following two steps:
(a) Initialization of w and b. First initialize w and b randomly, where w is written $(w^{(1)}, \ldots, w^{(4)})^T$ with $w^{(l)}$ the parameters of layer l, and b is written $(b^{(1)}, \ldots, b^{(4)})^T$; then correct the parameters of layers 1, 2, and 3 layer by layer. When correcting the layer-1 parameters, optimize $w^{(1)}$ and $b^{(1)}$ by gradient descent so that the layer-1 network reconstructs the original input features with minimum reconstruction error. When correcting the layer-2 parameters, optimize $w^{(2)}$ and $b^{(2)}$ by gradient descent, taking the output of layer 1 as the input of layer 2, so that the layer-2 network reconstructs the original input features with minimum reconstruction error. When correcting the layer-3 parameters, optimize $w^{(3)}$ and $b^{(3)}$ by gradient descent, taking the output of layer 2 as the input of layer 3, so that the layer-3 network reconstructs the original input features with minimum reconstruction error. For the layer-4 parameters, take the output of layer 3 as the input of layer 4 and optimize $w^{(4)}$ and $b^{(4)}$ so that the sum of squared errors between the output and the calibrated pose is minimized. This initializes layers 1 to 4 of the network;
(b) Gradient descent. Starting from the initial values, update the parameter vectors w and b:
$$w^{[t+1]} = w^{[t]} - \alpha\,\nabla_w J\big(w^{[t]}, b^{[t]}\big), \qquad b^{[t+1]} = b^{[t]} - \alpha\,\nabla_b J\big(w^{[t]}, b^{[t]}\big),$$
where α is the step size and the superscripts [t] and [t+1] denote the t-th and (t+1)-th iterations; iteration stops when w and b satisfy the convergence condition;
Step 9: For a new head image, determine the head region and extract the HOG feature; after numerical normalization, feed it into the trained stacked auto-encoder to obtain the corresponding head pose estimate, and restore its numerical range to -180 to +180.
Further, the specific method of step 3 is as follows:
To compress the numerical range to the interval [0, 1], the i-th dimension $\tilde{x}_{ni}$ of the n-th sample is normalized by
$$x_{ni} = \frac{\tilde{x}_{ni} - \min_m \tilde{x}_{mi}}{\max_m \tilde{x}_{mi} - \min_m \tilde{x}_{mi}},$$
where $\min_m \tilde{x}_{mi}$ is the minimum of the i-th dimension over all samples and $\max_m \tilde{x}_{mi}$ is the maximum of the i-th dimension over all samples;
To normalize the range of the pose to the interval [0, 1], each component is rescaled by
$$y_{nj} = \frac{\tilde{y}_{nj} + 180}{360},$$
where $\tilde{y}_{nj}$ is the j-th component of the calibrated pose of the n-th sample and $y_{nj}$ is its value after normalization;
Further, the numbers of units in the layers of the stacked auto-encoder of step 4 are $s_1 = 1440$, $s_2 = 80$, $s_3 = 80$, and $s_4 = 80$, and the output layer has only 3 units, that is, $s_5 = 3$.
Further, when the stacked auto-encoder parameters are solved by gradient descent in step 8, the convergence condition is that the parameters no longer change between two successive iterations, that is, a local optimum has been reached.
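The layer-by-layer initialization of step 8(a) can be illustrated with a small, hedged sketch. It pretrains three auto-encoder layers in turn on toy data with full-batch gradient descent, feeding each trained encoder's output to the next layer. The sketch reads the step as each layer reconstructing the input it receives, which is one interpretation of the text. The toy sizes (6 inputs, 4 hidden units), the step size, the epoch count, the random data, and every function name are assumptions made for illustration; the patent specifies none of them, and the supervised fit of the 4th layer is omitted.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(w, b, v):
    """Sigmoid layer: one output per row of w."""
    return [sigmoid(sum(wij * vj for wij, vj in zip(row, v)) + bi)
            for row, bi in zip(w, b)]

def mean_error(data, enc, dec):
    """Mean of 0.5 * ||reconstruction - input||^2 over the data."""
    total = 0.0
    for v in data:
        r = layer(dec[0], dec[1], layer(enc[0], enc[1], v))
        total += 0.5 * sum((ri - vi) ** 2 for ri, vi in zip(r, v))
    return total / len(data)

def pretrain_layer(data, n_hidden, rng, step=0.3, epochs=300):
    """Fit one encoder/decoder pair by full-batch gradient descent."""
    n_in, n = len(data[0]), len(data)
    ew = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
    eb = [0.0] * n_hidden
    dw = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden)] for _ in range(n_in)]
    db = [0.0] * n_in
    before = mean_error(data, (ew, eb), (dw, db))
    for _ in range(epochs):
        g_ew = [[0.0] * n_in for _ in range(n_hidden)]
        g_eb = [0.0] * n_hidden
        g_dw = [[0.0] * n_hidden for _ in range(n_in)]
        g_db = [0.0] * n_in
        for v in data:
            h = layer(ew, eb, v)
            r = layer(dw, db, h)
            # Error terms; for the sigmoid, sigma'(z) = a * (1 - a).
            d_out = [(ri - vi) * ri * (1.0 - ri) for ri, vi in zip(r, v)]
            d_hid = [sum(dw[i][j] * d_out[i] for i in range(n_in))
                     * h[j] * (1.0 - h[j]) for j in range(n_hidden)]
            for i in range(n_in):
                g_db[i] += d_out[i]
                for j in range(n_hidden):
                    g_dw[i][j] += d_out[i] * h[j]
            for j in range(n_hidden):
                g_eb[j] += d_hid[j]
                for k in range(n_in):
                    g_ew[j][k] += d_hid[j] * v[k]
        scale = step / n
        for i in range(n_in):
            db[i] -= scale * g_db[i]
            for j in range(n_hidden):
                dw[i][j] -= scale * g_dw[i][j]
        for j in range(n_hidden):
            eb[j] -= scale * g_eb[j]
            for k in range(n_in):
                ew[j][k] -= scale * g_ew[j][k]
    after = mean_error(data, (ew, eb), (dw, db))
    return (ew, eb), before, after

rng = random.Random(0)
inputs = [[rng.random() for _ in range(6)] for _ in range(20)]
encoders, errors = [], []
for n_hidden in (4, 4, 4):
    encoder, before, after = pretrain_layer(inputs, n_hidden, rng)
    encoders.append(encoder)
    errors.append((before, after))
    # The trained encoder's output becomes the next layer's input.
    inputs = [layer(encoder[0], encoder[1], v) for v in inputs]
```

Only the encoder half of each pair is kept; the decoder exists solely to define the reconstruction error during pretraining. The kept encoders would then serve as the starting point for the joint gradient descent of step 8(b).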
The innovation of the present invention is as follows:
It proposes using a stacked auto-encoder to establish the nonlinear mapping between the head depth image and the pose. The invention first acquires N head depth images as training samples, normalizes each depth image to a size of 96*128, extracts a 1440-dimensional HOG feature from it, and records the corresponding head pose. A stacked auto-encoder is then designed which, apart from the input and output layers, has 3 intermediate layers. Next, the parameters of each layer of the stacked auto-encoder are learned by gradient descent from the training samples and the calibrated pose data. Finally, for a head image whose pose is to be estimated, the HOG feature is extracted and the head pose is estimated with the learned stacked auto-encoder. Compared with traditional head pose estimation methods, the method can model the complex mapping from input features to head pose and effectively overcomes the low estimation accuracy of shallow models.
Description of the drawings
Fig. 1 is a schematic diagram of head pose estimation;
Fig. 2 is a schematic diagram of the pitch, yaw, and rotation angles;
Fig. 3 is a schematic diagram of the stacked auto-encoder.
Specific embodiment
According to the method of the invention, a training program for the stacked auto-encoder is first written in Matlab or C. The acquired training samples are then input and the stacked auto-encoder parameters are trained. Next, HOG features are extracted from an acquired image and fed as source data into the trained stacked auto-encoder for processing, which yields the estimated head pose. The method of the invention can be used for head pose estimation in natural scenes.
A head pose estimation method based on stacked auto-encoding comprises the following steps:
Step 1: Acquire N head depth images covering different poses and, from the position of the camera when each image was acquired, record the head pitch, yaw, and rotation angles corresponding to each of the N images, obtaining the head pose vectors $\tilde{y}_n \in \mathbb{R}^3$. The 1st dimension represents the pitch angle, the 2nd dimension the yaw angle, and the 3rd dimension the rotation angle; the subscript n denotes the n-th image;
Step 2: Detect the head region in each image acquired in step 1 and extract the HOG feature of that head region, forming the HOG feature vector $\tilde{x}_n$;
Step 3: Normalize each dimension of the HOG feature vectors $\tilde{x}_n$ obtained in step 2 so that its numerical range is compressed to the interval [0, 1], and normalize the range of the pose to the interval [0, 1];
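The two normalizations of step 3 can be sketched as follows. The feature scaling is ordinary per-dimension min-max scaling over the training set. The pose scaling assumes a linear map from [-180, 180] degrees to [0, 1], which is inferred from the range restored in step 9 rather than stated here. The function names and the choice to map a constant dimension to 0.0 are assumptions for illustration.

```python
def fit_min_max(features):
    """Per-dimension minimum and maximum over all training samples."""
    columns = list(zip(*features))
    return [min(col) for col in columns], [max(col) for col in columns]

def normalize_features(sample, mins, maxs):
    """Map each dimension to [0, 1]; a constant dimension maps to 0.0
    (an assumption, since the text does not cover that case)."""
    return [
        (v - lo) / (hi - lo) if hi > lo else 0.0
        for v, lo, hi in zip(sample, mins, maxs)
    ]

def normalize_pose(pose_degrees):
    """Map angles from [-180, 180] degrees to [0, 1] (assumed linear map)."""
    return [(angle + 180.0) / 360.0 for angle in pose_degrees]

# Three toy samples with two feature dimensions; the second is constant.
train = [[2.0, 10.0], [4.0, 10.0], [6.0, 10.0]]
mins, maxs = fit_min_max(train)
scaled = [normalize_features(s, mins, maxs) for s in train]
pose = normalize_pose([-180.0, 0.0, 90.0])
```

The minima and maxima are computed once on the training set and reused for new images, so that training and test features share the same scale.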
Step 4: Build the mapping function corresponding to the stacked auto-encoder (see Fig. 3). Let the input be $x \in \mathbb{R}^{s_1}$, where $s_1$ is the dimension of the feature. The stacked auto-encoder used in this patent has five layers in total: layer 1 is the input layer, whose input is the HOG feature vector and whose number of nodes equals the dimension of that vector; layers 2 to 4 are hidden layers; and layer 5 is the output layer. Any node unit of any layer l is denoted $a_i^{(l)}$, where the superscript (l) indicates layer l, with $a^{(1)} = x$, and is computed as
$$a_i^{(l+1)} = \sigma\Big(\sum_{j=1}^{s_l} w_{ij}^{(l)} a_j^{(l)} + b_i^{(l)}\Big),$$
where $w_i^{(l)} = (w_{i1}^{(l)}, \ldots, w_{is_l}^{(l)})$ denotes the parameters connecting all $s_l$ units of layer l to the i-th unit of layer l+1; specifically, $w_{ij}^{(l)}$ is the parameter connecting the j-th unit of layer l to the i-th unit of layer l+1, $b_i^{(l)}$ is the bias term associated with unit i of layer l+1, and $s_{l+1}$ is the number of units in layer l+1; $\sigma(\cdot)$ is the sigmoid function, $\sigma(z) = 1/(1 + e^{-z})$. Defining $z_i^{(l+1)} = \sum_{j=1}^{s_l} w_{ij}^{(l)} a_j^{(l)} + b_i^{(l)}$, the formula above can also be written as
$$a_i^{(l+1)} = \sigma\big(z_i^{(l+1)}\big).$$
The output layer of the stacked auto-encoder has 3 units, denoted $a_1^{(5)}, a_2^{(5)}, a_3^{(5)}$, which represent the pitch, yaw, and rotation angles of the estimated head pose. The function $h_{W,b}(x)$ of the whole stacked auto-encoder model gives the estimated head pose for input x, that is,
$$h_{W,b}(x) = \big(a_1^{(5)}, a_2^{(5)}, a_3^{(5)}\big)^T.$$
The numbers of units in the layers of the stacked auto-encoder of step 4 are $s_1 = 1440$, $s_2 = 80$, $s_3 = 80$, and $s_4 = 80$, and the output layer has only 3 units, that is, $s_5 = 3$.
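The forward mapping of step 4 can be sketched with the layer sizes given above. The code applies a sigmoid layer four times to go from the 1440-dimensional input to the 3 output units. The uniform random initialization, its scale, the seed, the constant test input, and the function names are assumptions for illustration only.

```python
import math
import random

LAYER_SIZES = [1440, 80, 80, 80, 3]  # s1..s5 as given in the text

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_params(sizes, seed=0, scale=0.01):
    """Small random weights and zero biases (an assumed initialization)."""
    rng = random.Random(seed)
    weights = [
        [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
        for n_in, n_out in zip(sizes, sizes[1:])
    ]
    biases = [[0.0] * n_out for n_out in sizes[1:]]
    return weights, biases

def forward(x, weights, biases):
    """Return the activations of every layer; the last entry is h(x)."""
    activations = [x]
    for w, b in zip(weights, biases):
        prev = activations[-1]
        activations.append([
            sigmoid(sum(wij * aj for wij, aj in zip(row, prev)) + bi)
            for row, bi in zip(w, b)
        ])
    return activations

weights, biases = init_params(LAYER_SIZES)
x = [0.5] * LAYER_SIZES[0]          # stand-in for a normalized HOG vector
activations = forward(x, weights, biases)
pose_estimate = activations[-1]     # three values in (0, 1)
```

Because the output units are sigmoids, the estimate lies in (0, 1), which is why the calibrated poses are normalized to [0, 1] before training.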
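Step 5: For an input x with corresponding calibrated pose y, the error between the pose estimated by the stacked auto-encoder and the calibrated pose is
$$J(w, b; x, y) = \tfrac{1}{2}\,\lVert h_{W,b}(x) - y \rVert^2.$$
To express how much each unit of the output layer contributes to this error, the error term
$$\delta_i^{(5)} = -\big(y_i - a_i^{(5)}\big)\,\sigma'\big(z_i^{(5)}\big)$$
is defined, where $\sigma'(\cdot)$ denotes the derivative of $\sigma(\cdot)$. Using the back-propagation algorithm, the error term of each node j in layers l = 2, 3, 4 is computed as
$$\delta_j^{(l)} = \Big(\sum_{i=1}^{s_{l+1}} w_{ij}^{(l)}\,\delta_i^{(l+1)}\Big)\,\sigma'\big(z_j^{(l)}\big).$$
Finally, the partial derivatives of the estimation error with respect to $w_{ij}^{(l)}$ and $b_i^{(l)}$ are obtained:
$$\frac{\partial J(w, b; x, y)}{\partial w_{ij}^{(l)}} = a_j^{(l)}\,\delta_i^{(l+1)}, \qquad \frac{\partial J(w, b; x, y)}{\partial b_i^{(l)}} = \delta_i^{(l+1)}.$$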
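The back-propagation of step 5 can be checked numerically on a tiny network. The sketch below assumes the per-sample error is half the squared distance between output and target, which is the usual convention but is not legible in this text, and compares one back-propagated derivative with a central finite difference. The 2-2-1 network, its weights, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, weights, biases):
    acts = [x]
    for w, b in zip(weights, biases):
        acts.append([sigmoid(sum(wij * aj for wij, aj in zip(row, acts[-1])) + bi)
                     for row, bi in zip(w, b)])
    return acts

def sample_error(x, y, weights, biases):
    """Assumed error: 0.5 * squared distance between output and target."""
    out = forward(x, weights, biases)[-1]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, y))

def backprop(x, y, weights, biases):
    """Return (dJ/dw, dJ/db) for one sample."""
    acts = forward(x, weights, biases)
    # For the sigmoid, sigma'(z) = a * (1 - a), so activations suffice.
    delta = [(a - t) * a * (1.0 - a) for a, t in zip(acts[-1], y)]
    grad_w, grad_b = [], []
    for l in range(len(weights) - 1, -1, -1):
        grad_w.insert(0, [[d * a for a in acts[l]] for d in delta])
        grad_b.insert(0, list(delta))
        if l > 0:
            # Propagate the error terms one layer back.
            delta = [sum(weights[l][i][j] * delta[i] for i in range(len(delta)))
                     * acts[l][j] * (1.0 - acts[l][j])
                     for j in range(len(acts[l]))]
    return grad_w, grad_b

# Tiny 2-2-1 network used only for the numerical check.
weights = [[[0.1, -0.2], [0.3, 0.4]], [[0.5, -0.6]]]
biases = [[0.0, 0.1], [-0.1]]
x, y = [0.2, 0.7], [0.3]
grad_w, grad_b = backprop(x, y, weights, biases)

eps = 1e-6
weights[0][1][0] += eps
j_plus = sample_error(x, y, weights, biases)
weights[0][1][0] -= 2 * eps
j_minus = sample_error(x, y, weights, biases)
weights[0][1][0] += eps  # restore the original weight
numeric = (j_plus - j_minus) / (2 * eps)
```

If the error terms are right, `numeric` and `grad_w[0][1][0]` agree to many decimal places; this kind of gradient check is a common safeguard when back-propagation is written by hand.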
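Step 6: Using the stacked auto-encoder model of step 4, take the normalized HOG features $x_n$ from step 3 as the input of the stacked auto-encoder, with corresponding calibrated head poses $[y_1, \ldots, y_N]$, and set up the optimization objective of the stacked auto-encoder:
$$J(w, b) = \frac{1}{N}\sum_{n=1}^{N} J(w, b; x_n, y_n) + \frac{\lambda}{2}\sum_{l=1}^{4}\sum_{i=1}^{s_{l+1}}\sum_{j=1}^{s_l}\big(w_{ij}^{(l)}\big)^2,$$
where w and b collect the parameters $w_{ij}^{(l)}$ and $b_i^{(l)}$ of all layers, and λ controls the strength of the constraint term $\sum_{l}\sum_{i}\sum_{j}\big(w_{ij}^{(l)}\big)^2$;
Step 7: Solve for the partial derivatives of the objective J(w, b) with respect to the parameters $w_{ij}^{(l)}$ and $b_i^{(l)}$:
$$\frac{\partial J(w, b)}{\partial w_{ij}^{(l)}} = \frac{1}{N}\sum_{n=1}^{N} a_{j,n}^{(l)}\,\delta_{i,n}^{(l+1)} + \lambda\,w_{ij}^{(l)}, \qquad \frac{\partial J(w, b)}{\partial b_i^{(l)}} = \frac{1}{N}\sum_{n=1}^{N} \delta_{i,n}^{(l+1)},$$
where $a_{j,n}^{(l)}$ and $\delta_{i,n}^{(l+1)}$ denote, for input $x_n$, the output of the j-th unit of layer l and the error term of the i-th unit of layer l+1. This finally yields the gradients $\nabla_w J(w, b)$ and $\nabla_b J(w, b)$ of the objective with respect to the parameter vectors w and b;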
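The structure of the objective in steps 6 and 7, an average data error plus a λ-weighted constraint on the weights, can be sketched as below. The exact scaling (the factor 1/2 and the average over N) is an assumption based on the common weight-decay form; only the presence of λ and of a constraint term is recoverable from the text. Function names and the toy numbers are hypothetical.

```python
def objective(predictions, targets, weights, lam):
    """Average half squared error plus (lam / 2) * sum of squared weights."""
    n = len(predictions)
    data_term = sum(
        0.5 * sum((p - t) ** 2 for p, t in zip(pred, tgt))
        for pred, tgt in zip(predictions, targets)
    ) / n
    penalty = 0.5 * lam * sum(w * w for layer in weights for row in layer for w in row)
    return data_term + penalty

def weight_gradient(per_sample_grads, w, lam):
    """Average the per-sample derivatives of one weight and add lam * w."""
    return sum(per_sample_grads) / len(per_sample_grads) + lam * w

# Two toy samples with three outputs each, and a single 1x2 weight layer.
predictions = [[0.5, 0.5, 0.5], [1.0, 0.0, 0.0]]
targets = [[0.5, 0.5, 0.5], [0.0, 0.0, 0.0]]
weights = [[[1.0, 2.0]]]
value = objective(predictions, targets, weights, lam=0.1)
grad = weight_gradient([0.2, 0.4], w=1.0, lam=0.1)
```

Here the data term is 0.25 and the penalty is 0.25, so the two parts contribute equally; raising λ shifts the balance toward smaller weights. The bias terms are left out of the penalty, matching the gradient expressions in which only the weight derivative carries a λ term.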
Step 8: To obtain the optimal stacked auto-encoder parameters w and b, the parameters are first initialized and then optimized by gradient descent, in the following two steps:
(a) Initialization of w and b. First initialize w and b randomly, where w is written $(w^{(1)}, \ldots, w^{(4)})^T$ with $w^{(l)}$ the parameters of layer l, and b is written $(b^{(1)}, \ldots, b^{(4)})^T$; then correct the parameters of layers 1, 2, and 3 layer by layer. When correcting the layer-1 parameters, optimize $w^{(1)}$ and $b^{(1)}$ by gradient descent so that the layer-1 network reconstructs the original input features with minimum reconstruction error. When correcting the layer-2 parameters, optimize $w^{(2)}$ and $b^{(2)}$ by gradient descent, taking the output of layer 1 as the input of layer 2, so that the layer-2 network reconstructs the original input features with minimum reconstruction error. When correcting the layer-3 parameters, optimize $w^{(3)}$ and $b^{(3)}$ by gradient descent, taking the output of layer 2 as the input of layer 3, so that the layer-3 network reconstructs the original input features with minimum reconstruction error. For the layer-4 parameters, take the output of layer 3 as the input of layer 4 and optimize $w^{(4)}$ and $b^{(4)}$ so that the sum of squared errors between the output and the calibrated pose is minimized. This initializes layers 1 to 4 of the network;
(b) Gradient descent. Starting from the initial values, update the parameter vectors w and b:
$$w^{[t+1]} = w^{[t]} - \alpha\,\nabla_w J\big(w^{[t]}, b^{[t]}\big), \qquad b^{[t+1]} = b^{[t]} - \alpha\,\nabla_b J\big(w^{[t]}, b^{[t]}\big),$$
where α is the step size and the superscripts [t] and [t+1] denote the t-th and (t+1)-th iterations; iteration stops when w and b satisfy the convergence condition;
When the stacked auto-encoder parameters are solved by gradient descent in step 8, the convergence condition is that the parameters no longer change between two successive iterations, that is, a local optimum has been reached.
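The update rule and stopping condition of step 8(b) can be sketched on a simple quadratic instead of the full network. The text says iteration stops when the parameters no longer change between successive iterations; in floating point this is approximated here by a small tolerance on the largest parameter change. The tolerance, the step size, the iteration cap, and the toy objective are assumptions.

```python
def gradient_descent(grad_fn, params, step=0.1, tol=1e-10, max_iters=10000):
    """Repeat params <- params - step * gradient until the update is negligible."""
    iteration = 0
    for iteration in range(1, max_iters + 1):
        grads = grad_fn(params)
        updated = [p - step * g for p, g in zip(params, grads)]
        change = max(abs(u - p) for u, p in zip(updated, params))
        params = updated
        if change < tol:  # parameters have effectively stopped changing
            break
    return params, iteration

def quad_grad(params):
    """Gradient of the toy objective f(w, b) = (w - 3)^2 + (b + 1)^2."""
    w, b = params
    return [2.0 * (w - 3.0), 2.0 * (b + 1.0)]

solution, iterations = gradient_descent(quad_grad, [0.0, 0.0])
```

For the stacked auto-encoder, `grad_fn` would return the gradients from step 7 and `params` would hold all entries of w and b; the loop itself stays the same.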
Step 9: For a new head image, determine the head region and extract the HOG feature; after numerical normalization, feed it into the trained stacked auto-encoder to obtain the corresponding head pose estimate, and restore its numerical range to -180 to +180.
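The inference path of step 9 can be sketched as a short pipeline: scale the feature with the training minima and maxima, run the trained model, and map the [0, 1] outputs back to degrees. The inverse map assumes the pose was normalized linearly from [-180, 180]. The `model` argument is a stand-in for the trained stacked auto-encoder, and all names and toy values are hypothetical.

```python
def denormalize_pose(normalized):
    """Map values in [0, 1] back to degrees in [-180, 180] (assumed linear map)."""
    return [360.0 * v - 180.0 for v in normalized]

def estimate_pose(feature, mins, maxs, model):
    """Normalize a HOG feature, apply the trained model, restore degrees."""
    scaled = [
        (v - lo) / (hi - lo) if hi > lo else 0.0
        for v, lo, hi in zip(feature, mins, maxs)
    ]
    return denormalize_pose(model(scaled))

def stand_in_model(scaled_feature):
    """Placeholder for the trained network: returns fixed outputs in [0, 1]."""
    return [0.5, 0.75, 0.25]

angles = estimate_pose([3.0, 8.0], mins=[2.0, 4.0], maxs=[6.0, 12.0],
                       model=stand_in_model)
limits = denormalize_pose([0.0, 0.5, 1.0])
```

With the placeholder outputs, `angles` holds pitch, yaw, and rotation of 0, 90, and -90 degrees, and `limits` confirms that 0 and 1 map to the ends of the angular range.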
Claims (4)
1. A head pose estimation method based on stacked auto-encoding, comprising the following steps:
Step 1: Acquire N head depth images covering different poses and, from the position of the camera when each image was acquired, record the head pitch, yaw, and rotation angles corresponding to each of the N images, obtaining the head pose vectors $\tilde{y}_n \in \mathbb{R}^3$. The 1st dimension represents the pitch angle, the 2nd dimension the yaw angle, and the 3rd dimension the rotation angle; the subscript n denotes the n-th image;
Step 2: Detect the head region in each image acquired in step 1 and extract the HOG feature of that head region, forming the HOG feature vector $\tilde{x}_n$;
Step 3: Normalize each dimension of the HOG feature vectors $\tilde{x}_n$ obtained in step 2 so that its numerical range is compressed to the interval [0, 1], and normalize the range of the pose to the interval [0, 1];
The specific method of step 3 is as follows:
To compress the numerical range to the interval [0, 1], the i-th dimension $\tilde{x}_{ni}$ of the n-th sample is normalized by
$$x_{ni} = \frac{\tilde{x}_{ni} - \min_m \tilde{x}_{mi}}{\max_m \tilde{x}_{mi} - \min_m \tilde{x}_{mi}},$$
where $\min_m \tilde{x}_{mi}$ is the minimum of the i-th dimension over all samples and $\max_m \tilde{x}_{mi}$ is the maximum of the i-th dimension over all samples;
To normalize the range of the pose to the interval [0, 1], each component is rescaled by
$$y_{nj} = \frac{\tilde{y}_{nj} + 180}{360},$$
where $\tilde{y}_{nj}$ is the j-th component of the calibrated pose of the n-th sample and $y_{nj}$ is its value after normalization;
Step 4: Build the mapping function corresponding to the stacked auto-encoder. Let the input be $x \in \mathbb{R}^{s_1}$, where $s_1$ is the dimension of the feature. The stacked auto-encoder used in this patent has five layers in total: layer 1 is the input layer, whose input is the HOG feature vector and whose number of nodes equals the dimension of that vector; layers 2 to 4 are hidden layers; and layer 5 is the output layer. Any node unit of any layer l is denoted $a_i^{(l)}$, where the superscript (l) indicates layer l, with $a^{(1)} = x$, and is computed as
$$a_i^{(l+1)} = \sigma\Big(\sum_{j=1}^{s_l} w_{ij}^{(l)} a_j^{(l)} + b_i^{(l)}\Big),$$
where $w_i^{(l)} = (w_{i1}^{(l)}, \ldots, w_{is_l}^{(l)})$ denotes the parameters connecting all $s_l$ units of layer l to the i-th unit of layer l+1; specifically, $w_{ij}^{(l)}$ is the parameter connecting the j-th unit of layer l to the i-th unit of layer l+1, $b_i^{(l)}$ is the bias term associated with unit i of layer l+1, and $s_{l+1}$ is the number of units in layer l+1; $\sigma(\cdot)$ is the sigmoid function, $\sigma(z) = 1/(1 + e^{-z})$. Defining $z_i^{(l+1)} = \sum_{j=1}^{s_l} w_{ij}^{(l)} a_j^{(l)} + b_i^{(l)}$, the formula above can also be written as
$$a_i^{(l+1)} = \sigma\big(z_i^{(l+1)}\big).$$
The output layer of the stacked auto-encoder has 3 units, denoted $a_1^{(5)}, a_2^{(5)}, a_3^{(5)}$, which represent the pitch, yaw, and rotation angles of the estimated head pose. The function $h_{W,b}(x)$ of the whole stacked auto-encoder model gives the estimated head pose for input x, that is,
$$h_{W,b}(x) = \big(a_1^{(5)}, a_2^{(5)}, a_3^{(5)}\big)^T.$$
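Step 5: For an input x with corresponding calibrated pose y, the error between the pose estimated by the stacked auto-encoder and the calibrated pose is
$$J(w, b; x, y) = \tfrac{1}{2}\,\lVert h_{W,b}(x) - y \rVert^2.$$
To express how much each unit of the output layer contributes to this error, the error term
$$\delta_i^{(5)} = -\big(y_i - a_i^{(5)}\big)\,\sigma'\big(z_i^{(5)}\big)$$
is defined, where $\sigma'(\cdot)$ denotes the derivative of $\sigma(\cdot)$. Using the back-propagation algorithm, the error term of each node j in layers l = 2, 3, 4 is computed as
$$\delta_j^{(l)} = \Big(\sum_{i=1}^{s_{l+1}} w_{ij}^{(l)}\,\delta_i^{(l+1)}\Big)\,\sigma'\big(z_j^{(l)}\big).$$
Finally, the partial derivatives of the estimation error with respect to $w_{ij}^{(l)}$ and $b_i^{(l)}$ are obtained:
$$\frac{\partial J(w, b; x, y)}{\partial w_{ij}^{(l)}} = a_j^{(l)}\,\delta_i^{(l+1)}, \qquad \frac{\partial J(w, b; x, y)}{\partial b_i^{(l)}} = \delta_i^{(l+1)}.$$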
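Step 6: Using the stacked auto-encoder model of step 4, take the normalized HOG features $x_n$ from step 3 as the input of the stacked auto-encoder, with corresponding calibrated head poses $[y_1, \ldots, y_N]$, and set up the optimization objective of the stacked auto-encoder:
$$J(w, b) = \frac{1}{N}\sum_{n=1}^{N} J(w, b; x_n, y_n) + \frac{\lambda}{2}\sum_{l=1}^{4}\sum_{i=1}^{s_{l+1}}\sum_{j=1}^{s_l}\big(w_{ij}^{(l)}\big)^2,$$
where w and b collect the parameters $w_{ij}^{(l)}$ and $b_i^{(l)}$ of all layers, and λ controls the strength of the constraint term $\sum_{l}\sum_{i}\sum_{j}\big(w_{ij}^{(l)}\big)^2$;
Step 7: Solve for the partial derivatives of the objective J(w, b) with respect to the parameters $w_{ij}^{(l)}$ and $b_i^{(l)}$:
$$\frac{\partial J(w, b)}{\partial w_{ij}^{(l)}} = \frac{1}{N}\sum_{n=1}^{N} a_{j,n}^{(l)}\,\delta_{i,n}^{(l+1)} + \lambda\,w_{ij}^{(l)}, \qquad \frac{\partial J(w, b)}{\partial b_i^{(l)}} = \frac{1}{N}\sum_{n=1}^{N} \delta_{i,n}^{(l+1)},$$
where $a_{j,n}^{(l)}$ and $\delta_{i,n}^{(l+1)}$ denote, for input $x_n$, the output of the j-th unit of layer l and the error term of the i-th unit of layer l+1. This finally yields the gradients $\nabla_w J(w, b)$ and $\nabla_b J(w, b)$ of the objective with respect to the parameter vectors w and b;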
Step 8: To obtain the optimal stacked auto-encoder parameters w and b, the parameters are first initialized and then optimized by gradient descent, in the following two steps:
(a) Initialization of w and b. First initialize w and b randomly, where w is written $(w^{(1)}, \ldots, w^{(4)})^T$ with $w^{(l)}$ the parameters of layer l, and b is written $(b^{(1)}, \ldots, b^{(4)})^T$; then correct the parameters of layers 1, 2, and 3 layer by layer. When correcting the layer-1 parameters, optimize $w^{(1)}$ and $b^{(1)}$ by gradient descent so that the layer-1 network reconstructs the original input features with minimum reconstruction error. When correcting the layer-2 parameters, optimize $w^{(2)}$ and $b^{(2)}$ by gradient descent, taking the output of layer 1 as the input of layer 2, so that the layer-2 network reconstructs the original input features with minimum reconstruction error. When correcting the layer-3 parameters, optimize $w^{(3)}$ and $b^{(3)}$ by gradient descent, taking the output of layer 2 as the input of layer 3, so that the layer-3 network reconstructs the original input features with minimum reconstruction error. For the layer-4 parameters, take the output of layer 3 as the input of layer 4 and optimize $w^{(4)}$ and $b^{(4)}$ so that the sum of squared errors between the output and the calibrated pose is minimized. This initializes layers 1 to 4 of the network;
(b) Gradient descent. Starting from the initial values, update the parameter vectors w and b:
$$w^{[t+1]} = w^{[t]} - \alpha\,\nabla_w J\big(w^{[t]}, b^{[t]}\big), \qquad b^{[t+1]} = b^{[t]} - \alpha\,\nabla_b J\big(w^{[t]}, b^{[t]}\big),$$
where α is the step size and the superscripts [t] and [t+1] denote the t-th and (t+1)-th iterations; iteration stops when w and b satisfy the convergence condition;
Step 9: For a new head image, determine the head region and extract the HOG feature; after numerical normalization, feed it into the trained stacked auto-encoder to obtain the corresponding head pose estimate, and restore its numerical range to -180 to +180.
2. The head pose estimation method based on stacked auto-encoding according to claim 1, characterized in that the specific method of step 3 is as follows:
To compress the numerical range to the interval [0, 1], the i-th dimension $\tilde{x}_{ni}$ of the n-th sample is normalized by
$$x_{ni} = \frac{\tilde{x}_{ni} - \min_m \tilde{x}_{mi}}{\max_m \tilde{x}_{mi} - \min_m \tilde{x}_{mi}},$$
where $\min_m \tilde{x}_{mi}$ is the minimum of the i-th dimension over all samples and $\max_m \tilde{x}_{mi}$ is the maximum of the i-th dimension over all samples;
To normalize the range of the pose to the interval [0, 1], each component is rescaled by
$$y_{nj} = \frac{\tilde{y}_{nj} + 180}{360},$$
where $\tilde{y}_{nj}$ is the j-th component of the calibrated pose of the n-th sample and $y_{nj}$ is its value after normalization;
3. The head pose estimation method based on stacked auto-encoding according to claim 1, characterized in that the numbers of units in the layers of the stacked auto-encoder of step 4 are $s_1 = 1440$, $s_2 = 80$, $s_3 = 80$, and $s_4 = 80$, and the output layer has only 3 units, that is, $s_5 = 3$.
4. The head pose estimation method based on stacked auto-encoding according to claim 1, characterized in that, when the stacked auto-encoder parameters are solved by gradient descent in step 8, the convergence condition is that the parameters no longer change between two successive iterations, that is, a local optimum has been reached.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611100343.0A CN106599810B (en) | 2016-12-05 | 2016-12-05 | Head pose estimation method based on stacked auto-encoding |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106599810A (en) | 2017-04-26 |
CN106599810B (en) | 2019-05-14 |
Family
ID=58596108
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611100343.0A Expired - Fee Related CN106599810B (en) | Head pose estimation method based on stacked auto-encoding | 2016-12-05 | 2016-12-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106599810B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104392241A (en) * | 2014-11-05 | 2015-03-04 | 电子科技大学 | Mixed regression-based head pose estimation method |
US20160070966A1 (en) * | 2014-09-05 | 2016-03-10 | Ford Global Technologies, Llc | Head-mounted display head pose and activity estimation |
US9292734B2 (en) * | 2011-01-05 | 2016-03-22 | Ailive, Inc. | Method and system for head tracking and pose estimation |
CN105760809A (en) * | 2014-12-19 | 2016-07-13 | 联想(北京)有限公司 | Method and apparatus for head pose estimation |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11367197B1 (en) * | 2014-10-20 | 2022-06-21 | Henry Harlyn Baker | Techniques for determining a three-dimensional representation of a surface of an object from a set of images |
US11869205B1 (en) | 2014-10-20 | 2024-01-09 | Henry Harlyn Baker | Techniques for determining a three-dimensional representation of a surface of an object from a set of images |
CN107506725A (en) * | 2017-08-22 | 2017-12-22 | 杭州远鉴信息科技有限公司 | High voltage isolator positioning and status image recognizer based on neural network |
CN107481292A (en) * | 2017-09-05 | 2017-12-15 | 百度在线网络技术(北京)有限公司 | The attitude error method of estimation and device of vehicle-mounted camera |
CN107481292B (en) * | 2017-09-05 | 2020-07-28 | 百度在线网络技术(北京)有限公司 | Attitude error estimation method and device for vehicle-mounted camera |
CN107749757A (en) * | 2017-10-18 | 2018-03-02 | 广东电网有限责任公司电力科学研究院 | A kind of data compression method and device based on stacking-type own coding and PSO algorithms |
CN107945161A (en) * | 2017-11-21 | 2018-04-20 | 重庆交通大学 | Road surface defect inspection method based on texture feature extraction |
CN107945161B (en) * | 2017-11-21 | 2020-10-23 | 重庆交通大学 | Road surface defect detection method based on textural feature extraction |
CN110533065A (en) * | 2019-07-18 | 2019-12-03 | 西安电子科技大学 | Based on the shield attitude prediction technique from coding characteristic and deep learning regression model |
Also Published As
Publication number | Publication date |
---|---|
CN106599810B (en) | 2019-05-14 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN108345869B (en) | Driver posture recognition method based on depth image and virtual data | |
CN110059558B (en) | Orchard obstacle real-time detection method based on improved SSD network | |
CN106599810A (en) | Head pose estimation method based on stacked auto-encoding | |
CN108764065B (en) | Pedestrian re-recognition feature fusion aided learning method | |
CN110532920B (en) | Face recognition method for small-quantity data set based on FaceNet method | |
CN110674741B (en) | Gesture recognition method in machine vision based on double-channel feature fusion | |
CN104392241B (en) | Mixed regression-based head pose estimation method | |
CN108171112A (en) | Vehicle identification and tracking based on convolutional neural networks | |
CN112184752A (en) | Video target tracking method based on pyramid convolution | |
CN107424161B (en) | Coarse-to-fine indoor scene image layout estimation method | |
CN107808129A (en) | A kind of facial multi-characteristic points localization method based on single convolutional neural networks | |
CN108182397B (en) | Multi-pose multi-scale human face verification method | |
CN104268539A (en) | High-performance human face recognition method and system | |
CN106599994A (en) | Sight line estimation method based on depth regression network | |
CN103324938A (en) | Method for training attitude classifier and object classifier and method and device for detecting objects | |
CN111368759B (en) | Monocular vision-based mobile robot semantic map construction system | |
CN105205449A (en) | Sign language recognition method based on deep learning | |
CN103279936A (en) | Human face fake photo automatic combining and modifying method based on portrayal | |
CN104636732A (en) | Sequence deeply convinced network-based pedestrian identifying method | |
CN105760898A (en) | Vision mapping method based on mixed group regression method | |
WO2023151237A1 (en) | Face pose estimation method and apparatus, electronic device, and storage medium | |
CN113361542A (en) | Local feature extraction method based on deep learning | |
CN105488541A (en) | Natural feature point identification method based on machine learning in augmented reality system | |
CN112232263A (en) | Tomato identification method based on deep learning | |
CN103093211B (en) | Based on the human body motion tracking method of deep nuclear information image feature |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20190514 Termination date: 20211205 |