An enhanced coding method for vision mapping target values
Technical field
The invention belongs to the technical field of computer vision and relates to vision mapping techniques. It is mainly applied to vision estimation problems such as pose estimation, gaze tracking, and age estimation.
Background art
In computer vision, vision mapping refers to the process of learning a mapping function between input image features and output variables, so that when a new image is input, the target output value corresponding to that image can be estimated. Specifically, vision mapping includes human body pose estimation, head pose estimation, gaze estimation, object tracking, and the like. See: O. Williams, A. Blake, and R. Cipolla, "Sparse and Semi-Supervised Visual Mapping with the S3GP," in IEEE Conference on Computer Vision and Pattern Recognition, pp. 230-237, 2006.
As an important branch of computer vision, vision mapping changes the situation in which, on many occasions, people must estimate the target output from image content one image at a time. Instead, a computer predicts the output from the input image content using a learned vision mapping function, so that cameras and computers replace the human eye and brain in automatically analyzing and estimating images. At present, the technology has begun to be applied in several industries closely related to daily life. Head pose estimation is applied in the automotive safe-driving industry; gaze estimation and human body pose estimation are applied in intelligent human-machine interfaces and the game industry; object tracking is applied in industries such as intelligent transportation; and human body pose estimation is applied in the field of human-computer interaction. It is believed that, with the continuous improvement of computer hardware processing power and the gradual solution of the key technical problems in vision mapping, its application prospects will become broader.
Among the models used for vision mapping problems, regression models of various kinds have proven to be the best suited. When a regression model is built, the input image features usually need to be mapped to target values (for example, head pose, age, body pose, and gaze direction). In some specific problems the range of target values is fixed and the values are equally spaced, for example age, the angles corresponding to gaze direction, and the angles corresponding to pose. For target values of this kind, a mapping built directly from the original features to the target values suffers from sparsely and unevenly distributed target values. To solve this problem and improve algorithm performance, we propose an enhanced coding method that encodes the target values.
Summary of the invention
The invention provides an enhanced coding method for vision mapping target values; once encoded, the target values are better suited to establishing a mapping relationship with the input. First, features (raw grayscale, HOG, SIFT, Haar, etc.) are extracted from the collected images, and the corresponding target values (age, pose angle, gaze direction, etc.) are recorded. Next, the target values are enhancement-coded, each bit of the code being a 0/1 binary variable. Then the mapping relationship between the original input image features and the binary codes is established, and all input images are mapped to binary codes according to this mapping relationship. Finally, the random forest method is used to establish the mapping relationship between the binary codes and the target values. For a new test image, image features are extracted, the learned model is used to estimate the binary code, and the binary code is regressed to the target value. The invention addresses the problem that existing vision mapping methods estimate poorly when samples are sparse and unevenly distributed.
To describe the invention conveniently, some terms are defined first.
Definition 1: Vision mapping. The regression of visual features to target values.
Definition 2: Input features. In vision estimation problems, visual features often need to be extracted from the original image, such as gradient orientation histogram features and local binary features.
Definition 3: Target value. In vision estimation problems, a corresponding output value often needs to be estimated from the input features, for example estimating age from a face image or head angle deflection from a head image; the age and the head angle deflection here are target values.
Definition 4: Gradient orientation histogram. A visual feature extraction method that describes the appearance and shape of objects in an image using the distribution of pixel intensity gradients or edge directions. The image is first divided into small connected regions called cells; a histogram of the gradient or edge directions of the pixels in each cell is then collected; finally, these histograms are combined to form the feature descriptor. To improve accuracy, the local histograms can be contrast-normalized over a larger region of the image (a block): the density of each histogram within the block is computed first, and each cell in the block is then normalized by that density value. This normalization gives greater robustness to illumination changes and shadows.
Definition 5: Shallow regression model. The estimate is obtained directly from the input features by a single layer of weighted combination.
Definition 6: Deep regression model. The input features are combined with weights to obtain the hidden features of the next layer; the hidden features are again combined with weights to obtain the hidden features of the following layer, and so on until the final target value estimate is obtained.
Definition 7: Random forest. In machine learning, a random forest is a classifier or regressor comprising multiple decision trees; its output is determined by the mode of the classes (for classification) or the mean of the values (for regression) output by the individual trees.
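The gradient orientation histogram defined above can be illustrated with a minimal sketch. It is not the implementation used by the invention: the function name `hog_descriptor`, the cell size, the number of bins, the block size, and the use of unsigned orientations with L2 block normalization are all assumptions chosen for illustration.

```python
import math

def hog_descriptor(image, cell=4, bins=9, block=2, eps=1e-6):
    """Minimal gradient orientation histogram of a 2-D list of intensities.

    All parameter defaults are illustrative assumptions.
    """
    h, w = len(image), len(image[0])
    ny, nx = h // cell, w // cell
    # One orientation histogram per cell.
    hists = [[[0.0] * bins for _ in range(nx)] for _ in range(ny)]
    for y in range(ny * cell):
        for x in range(nx * cell):
            # Central differences, clamped at the image border.
            gx = image[y][min(x + 1, w - 1)] - image[y][max(x - 1, 0)]
            gy = image[min(y + 1, h - 1)][x] - image[max(y - 1, 0)][x]
            magnitude = math.hypot(gx, gy)
            # Unsigned orientation in [0, 180) degrees.
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            b = min(int(angle / (180.0 / bins)), bins - 1)
            hists[y // cell][x // cell][b] += magnitude
    # Contrast-normalize over overlapping blocks of block x block cells.
    descriptor = []
    for by in range(ny - block + 1):
        for bx in range(nx - block + 1):
            v = [val
                 for dy in range(block)
                 for dx in range(block)
                 for val in hists[by + dy][bx + dx]]
            norm = math.sqrt(sum(val * val for val in v)) + eps
            descriptor.extend(val / norm for val in v)
    return descriptor
```

For an 8 x 8 image with the default parameters there are 2 x 2 cells and a single block, so the descriptor has 4 x 9 = 36 entries.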
The detailed technical solution of the invention is an enhanced coding method for vision mapping target values; the method comprises:
Step 1: Collect N input images and, as each image is collected, label its corresponding target value;
Step 2: Extract visual features from the images obtained in Step 1, and denote the visual feature vector of any n-th image as x_n;
Step 3: Arrange the feature vectors of all N images in order to obtain the input data matrix X, i.e. X = [x_1, x_2, ..., x_N];
Step 4: Arrange the target value vectors of the N images in order as the data matrix Y, i.e. Y = [y_1, y_2, ..., y_N];
Step 5: Perform enhanced coding on the output target value vectors;
Each dimension y_{nj} of y_n is binary-coded as follows. The value range of y_{nj} is [-M_1 + 1, M_2], set according to the actual situation. The value range of y_{nj} is first shifted to [1, M_1 + M_2], and M = M_1 + M_2.
Binary coding is then performed according to the value of [formula missing]; the code, of length [formula missing], is the coding vector a_n, where [·] denotes the rounding operator. The first M dimensions of a_n are encoded as: [formula missing]
where k denotes the dimension index of the coding vector a_n;
dimensions M+1 to 2M of a_n are encoded as: [formula missing]
dimensions 2M+1 to [index missing] of a_n are encoded as: [formula missing]
dimensions [index missing] to [index missing] of a_n are encoded as: [formula missing]
Step 6: Establish a regression model from the input features x_n to the enhanced codes a_n, and solve the model to obtain its parameters;
Step 7: Using the model parameters obtained in Step 6, map the features x_n into the enhanced coding space to obtain the estimated enhanced codes;
Step 8: To finally map the enhanced codes to target values, a Random Forest model is used to establish the mapping relationship between the estimated enhanced codes and the output target values y_n; the number of random trees and the feature dimensionality of each random tree are selected according to the length of the enhanced code and the number of training samples;
Step 9: Given a sample to be estimated, first map its input features to the enhanced code using the model established in Step 6, then map the enhanced code to the target value using the Random Forest model of Step 8.
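The range adjustment in the coding step above, from [-M_1 + 1, M_2] to [1, M_1 + M_2], amounts to adding M_1 to the target value, and can be sketched as follows. The 0/1 code shown afterwards is only a stand-in: it is a plain thermometer (ordinal) code, assumed here for illustration, and is not claimed to be the segment-wise enhanced code of the invention. All function names are hypothetical.

```python
def shift_target(y, m1, m2):
    """Shift an integer target from [-m1 + 1, m2] to [1, m1 + m2]."""
    if not (-m1 + 1 <= y <= m2):
        raise ValueError("target outside the configured range")
    return y + m1

def unshift_target(v, m1):
    """Inverse of shift_target: back from [1, m1 + m2] to [-m1 + 1, m2]."""
    return v - m1

def thermometer_code(v, m):
    """Stand-in 0/1 code (assumption): bit k is 1 when k <= v, k = 1..m."""
    return [1 if k <= v else 0 for k in range(1, m + 1)]

def decode_thermometer(bits):
    """Recover v by counting set bits (valid for an exact thermometer code)."""
    return sum(bits)
```

For example, with M_1 = 3 and M_2 = 4 the range is [-2, 4] and M = 7; the target 0 shifts to 3, and `thermometer_code(3, 7)` gives `[1, 1, 1, 0, 0, 0, 0]`.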
Further, the regression model in Step 6 is a shallow model or a deep model.
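As a minimal sketch of the shallow case, a single layer of weighted combination from features to 0/1 code vectors can be fitted by batch gradient descent on the squared error. The function names, the learning rate, the number of epochs, and the 0.5 rounding threshold are assumptions for illustration; the sketch also assumes the features are scaled to roughly [0, 1] so that the step size is stable.

```python
def train_shallow_model(X, A, lr=0.1, epochs=1000):
    """Fit A ~ W x + b by batch gradient descent on mean squared error.

    X: list of N feature vectors; A: list of N 0/1 code vectors.
    lr and epochs are illustrative assumptions.
    """
    n, d, c = len(X), len(X[0]), len(A[0])
    W = [[0.0] * d for _ in range(c)]
    b = [0.0] * c
    for _ in range(epochs):
        gW = [[0.0] * d for _ in range(c)]
        gb = [0.0] * c
        # Accumulate the gradient over the whole training set.
        for x, a in zip(X, A):
            for k in range(c):
                err = sum(W[k][i] * x[i] for i in range(d)) + b[k] - a[k]
                gb[k] += err
                for i in range(d):
                    gW[k][i] += err * x[i]
        # One gradient step on every weight and bias.
        for k in range(c):
            b[k] -= lr * gb[k] / n
            for i in range(d):
                W[k][i] -= lr * gW[k][i] / n
    return W, b

def predict_code(W, b, x):
    """Map one feature vector into the coding space, then round to 0/1 bits."""
    scores = [sum(wk[i] * x[i] for i in range(len(x))) + bk
              for wk, bk in zip(W, b)]
    return [1 if s >= 0.5 else 0 for s in scores]
```

A deep model would replace the single weighted combination with several stacked layers; the interface (features in, code bits out) would stay the same.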
The invention first extracts features from the collected images and records the corresponding target values. Next, the target values are enhancement-coded, each bit of the code being a 0/1 binary variable. Then the mapping relationship between the original input image features and the binary codes is established, and all input images are mapped to binary codes according to this mapping relationship. Finally, the random forest method is used to establish the mapping relationship between the binary codes and the target values. For a new test image, image features are extracted, the learned model is used to estimate the binary code, and the binary code is regressed to the target value. The invention improves the sample recognition rate and the recognition accuracy when samples are sparse and unevenly distributed.
Description of the drawings
Fig. 1 is a schematic diagram of vision mapping (head pose estimation, body pose estimation, and gaze estimation).
Fig. 2 is a schematic diagram of the coding.
Detailed description of the embodiments
Implementation languages: Matlab, C/C++
Hardware platform: Intel Core2 E7400 + 4 GB DDR RAM
Software platform: Matlab 2012a, Visual Studio 2010
According to the method of the invention, the vision mapping problem to be solved is first made explicit, and the relevant images (head images, body images, face images, etc.) are collected and their target values (head pose angle, body pose angle, and age) are labeled. Following the invention, code written in Matlab or C is first used to learn the mapping model from images to enhanced codes and the Random Forest model from enhanced codes to target values; vision mapping is then performed on an input image to be estimated, yielding the estimated target value. The method of the invention can be applied to a variety of vision mapping problems in computer vision and significantly improves on the performance of direct mapping (from input features to target values).
The technical solution of the invention is described in further detail below with reference to the drawings: an enhanced coding method for vision mapping target values; the method comprises:
Step 1: Collect N input images (see Fig. 1) and, as each image is collected, label its corresponding target value. Taking head pose estimation as an example, the N input images are N head images and the labeled value is the head pose y_n: the first dimension of y_n represents the pitch angle, the second dimension the inclination angle, and the third dimension the rotation angle, and the subscript n denotes the pose corresponding to the n-th image. In practical applications, for a body pose estimation problem the input images are body images and the target values are the angles between body parts; for a gaze estimation problem the input images are eye images and the target value is the gaze direction (horizontal angle and vertical angle);
Step 2: Extract visual features from the images obtained in Step 1, and denote the visual feature vector of any n-th image as x_n. Again taking head pose as an example, the visual feature usually extracted is the gradient orientation histogram feature, so x_n represents the gradient orientation histogram feature of the n-th image;
Step 3: Arrange the feature vectors of all N images in order to obtain the input data matrix X, i.e. X = [x_1, x_2, ..., x_N];
Step 4: Arrange the target value vectors of the N images in order as the data matrix Y, i.e. Y = [y_1, y_2, ..., y_N];
Step 5: Perform enhanced coding (see Fig. 2) on the output target value vectors;
Each dimension y_{nj} of y_n is binary-coded as follows. The value range of y_{nj} is [-M_1 + 1, M_2], set according to the actual situation. The value range of y_{nj} is first shifted to [1, M_1 + M_2], and M = M_1 + M_2.
Binary coding is then performed according to the value of [formula missing]; the code, of length [formula missing], is the coding vector a_n, where [·] denotes the rounding operator. The first M dimensions of a_n are encoded as: [formula missing]
where k denotes the dimension index of the coding vector a_n;
dimensions M+1 to 2M of a_n are encoded as: [formula missing]
dimensions 2M+1 to [index missing] of a_n are encoded as: [formula missing]
dimensions [index missing] to [index missing] of a_n are encoded as: [formula missing]
Step 6: Establish a regression model from the input features x_n to the enhanced codes a_n, and solve the model to obtain its parameters; the model is a shallow model or a deep model;
Step 7: Using the model parameters obtained in Step 6, map the features x_n into the enhanced coding space to obtain the estimated enhanced codes;
Step 8: To finally map the enhanced codes to target values, a Random Forest model is used to establish the mapping relationship between the estimated enhanced codes and the output target values y_n; the number of random trees and the feature dimensionality of each random tree are selected according to the length of the enhanced code and the number of training samples;
Step 9: Given a sample to be estimated, first map its input features to the enhanced code using the model established in Step 6, then map the enhanced code to the target value using the Random Forest model of Step 8. Taking head pose estimation as an example, the input feature is the gradient orientation histogram feature, which is first mapped to the enhanced code; the enhanced code is then mapped to the head pose.
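The final stage, from 0/1 codes back to a target value, can be sketched with a small random forest of regression trees that split on individual code bits. This is an illustrative sketch under stated assumptions, not the implementation of the invention: the function names, the default number of trees, and the default of half the code length as the number of bits tried per split are assumptions, and one forest per target dimension (for example, one for each head pose angle) is assumed.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(codes, targets, n_features, rng):
    """Grow one regression tree whose splits test a single 0/1 code bit."""
    if len(set(targets)) == 1:
        return targets[0]  # pure leaf
    best = None
    order = rng.sample(range(len(codes[0])), len(codes[0]))
    for pos, k in enumerate(order):
        # Try n_features random bits; keep looking only if none could split.
        if pos >= n_features and best is not None:
            break
        left = [i for i, c in enumerate(codes) if c[k] == 0]
        right = [i for i, c in enumerate(codes) if c[k] == 1]
        if not left or not right:
            continue
        score = (sse([targets[i] for i in left])
                 + sse([targets[i] for i in right]))
        if best is None or score < best[0]:
            best = (score, k, left, right)
    if best is None:
        return sum(targets) / len(targets)  # identical codes: mean leaf
    _, k, left, right = best
    return (k,
            build_tree([codes[i] for i in left],
                       [targets[i] for i in left], n_features, rng),
            build_tree([codes[i] for i in right],
                       [targets[i] for i in right], n_features, rng))

def predict_tree(node, code):
    """Follow the bit tests down to a leaf value."""
    while isinstance(node, tuple):
        k, left, right = node
        node = right if code[k] == 1 else left
    return node

def train_forest(codes, targets, n_trees=25, n_features=None, seed=0):
    """Bagging: each tree is grown on a bootstrap sample of the training set."""
    rng = random.Random(seed)
    n_features = n_features or max(1, len(codes[0]) // 2)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(codes)) for _ in codes]
        forest.append(build_tree([codes[i] for i in idx],
                                 [targets[i] for i in idx], n_features, rng))
    return forest

def predict_forest(forest, code):
    """Regression output: the mean of the individual tree outputs."""
    return sum(predict_tree(t, code) for t in forest) / len(forest)
```

In line with Step 8, `n_trees` and `n_features` would be chosen from the code length and the number of training samples; the defaults here are placeholders.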