CN106503696A - An enhanced coding method for visual mapping target values

Info

Publication number: CN106503696A
Application number: CN201611102813.7A
Authority: CN
Inventors: 潘力立
Original assignee: University of Electronic Science and Technology of China
Current assignee: Qilu Electric Technology Shandong Scientific And Technological Achievement Transformation Co ltd
Priority date: 2016-12-05
Filing date: 2016-12-05
Publication date: 2017-03-15
Anticipated expiration: 2036-12-05
Also published as: CN106503696B

Abstract

The present invention proposes an enhanced coding method for visual mapping target values. It belongs to the technical field of computer vision and relates to visual mapping techniques. Features are first extracted from the collected images and the corresponding target values are recorded. The target values are then enhanced-coded, each bit of the code being a 0/1 binary variable. Next, a mapping is established between the original input image features and the binary codes, and all input images are mapped to binary codes according to this mapping. Finally, the random forest method is used to establish a mapping between the binary codes and the target values. For a new test image, image features are extracted, the learned model is used to estimate the binary code, and the binary code is regressed back to the target value. When samples are sparse and unevenly distributed, the invention improves the sample recognition rate and the recognition accuracy.

Description

An enhanced coding method for visual mapping target values

Technical field

The invention belongs to the technical field of computer vision and relates to visual mapping techniques. It is mainly applied to visual estimation problems such as pose estimation, gaze tracking, and age estimation.

Background art

In computer vision, visual mapping refers to the process of learning a mapping function between input image features and output variables, so that when a new image is input, the target output value corresponding to that image can be estimated. Specifically, visual mapping includes human body pose estimation, head pose estimation, gaze estimation, and object tracking. See: O. Williams, A. Blake, and R. Cipolla, Sparse and Semi-Supervised Visual Mapping with the S3GP, in IEEE Conference on Computer Vision and Pattern Recognition, pp. 230-237, 2006.

As an important branch of computer vision, visual mapping has in many settings replaced the practice of people estimating the target output image by image from the image content. Instead, a computer predicts the output from the input image content using a learned visual mapping function, so that cameras and computers take the place of the human eye and brain in automatically analyzing and estimating images. The technology is already applied in several industries closely tied to daily life: head pose estimation in automotive driving safety; gaze estimation and human body pose estimation in intelligent human-machine interfaces and the game industry; object tracking in intelligent transportation; and human body pose estimation in human-computer interaction. As the information processing capability of computer hardware continues to improve and the key technical problems in visual mapping are progressively solved, its application prospects will broaden further.

Among the models used for visual mapping problems, regression models of various kinds have proven the most effective. Building a regression model usually requires mapping input image features to target values (for example, head pose, age, body pose, and gaze direction). In some specific problems the target range is fixed and the values are evenly spaced, for example age, the angle corresponding to a gaze direction, and the angle corresponding to a pose. For target values of this kind, a mapping built directly from the original features to the target values suffers from a sparse and uneven distribution of target values. To address this problem and improve algorithm performance, we propose an enhanced coding method that encodes the target values.

Summary of the invention

The invention provides an enhanced coding method for visual mapping target values; once coded, the target values are better suited to establishing a mapping to the input. First, features (raw grayscale, HOG, SIFT, Haar, etc.) are extracted from the collected images and the corresponding target values (age, pose angle, gaze direction, etc.) are recorded. The target values are then enhanced-coded, each bit of the code being a 0/1 binary variable. Next, a mapping is established between the original input image features and the binary codes, and all input images are mapped to binary codes according to this mapping. Finally, the random forest method is used to establish a mapping between the binary codes and the target values. For a new test image, image features are extracted, the learned model is used to estimate the binary code, and the binary code is regressed back to the target value. The invention addresses the problem that existing visual mapping methods estimate poorly when samples are sparse and unevenly distributed.

To describe the invention conveniently, some terms are defined first.

Definition 1: Visual mapping. Regressing visual features to target values.

Definition 2: Input features. In visual estimation problems it is often necessary to extract visual features from the original image, such as gradient orientation histogram features and local binary features.

Definition 3: Target value. In visual estimation problems it is often necessary to estimate the corresponding output value from the input features, for example estimating age from a face image or estimating head angular deflection from a head image; the age and the head angular deflection here are the target values.

Definition 4: Gradient orientation histogram. A visual feature extraction method that describes the appearance and shape of objects in an image by the distribution of pixel intensity gradients or edge directions. It is implemented by first dividing the image into small connected regions called cells, then collecting a histogram of the gradient directions or edge directions of the pixels in each cell, and finally combining these histograms to form the feature descriptor. To improve accuracy, these local histograms can be contrast-normalized over a larger region of the image (a block): the density of each histogram within the block is computed first, and each cell in the block is then normalized by this density value. This normalization gives greater robustness to illumination changes and shadows.

Definition 5: Shallow regression model. The estimate is obtained directly from the input features by a single layer of weighted combination.

Definition 6: Deep regression model. A weighted combination of the input features yields the hidden features of the next layer; a weighted combination of those hidden features yields the hidden features of the layer after that, and so on until the final target value estimate is obtained.

Definition 7: Random forest. In machine learning, a random forest is a classifier or regressor comprising multiple decision trees, whose output is determined by the mode of the classes, or of the values, output by the individual trees.
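
The gradient orientation histogram defined above can be illustrated with a minimal Python sketch. It is not the patent's implementation: the function names `cell_histogram` and `l2_normalize` are hypothetical, and the choices of 9 unsigned orientation bins, central-difference gradients over interior pixels only, and L2 normalization as the block "density" are assumptions made for illustration.

```python
import math

def cell_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations in one cell."""
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for r in range(1, len(cell) - 1):
        for c in range(1, len(cell[0]) - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]   # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]   # vertical central difference
            magnitude = math.hypot(gx, gy)
            if magnitude == 0.0:
                continue
            angle = math.degrees(math.atan2(gy, gx)) % 180.0  # fold to [0, 180)
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist

def l2_normalize(block, eps=1e-12):
    """Contrast-normalize the concatenated cell histograms of one block."""
    norm = math.sqrt(sum(v * v for v in block)) + eps
    return [v / norm for v in block]

# A 4x4 cell containing a vertical edge: every gradient points horizontally,
# so all votes fall into the first orientation bin.
cell = [[0, 0, 1, 1] for _ in range(4)]
hist = cell_histogram(cell)
descriptor = l2_normalize(hist)
```

In a full descriptor, the histograms of all cells in a block would be concatenated before normalization, and the normalized blocks concatenated to form the feature vector of the image.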

The detailed technical scheme of the invention is an enhanced coding method for visual mapping target values. The method comprises:

Step 1: Collect N input images, and calibrate the target value corresponding to each image at the time it is collected;

Step 2: Extract visual features from the images obtained in Step 1, and denote the visual feature vector of any n-th image as x_n;

Step 3: Arrange the feature vectors of all N images in order to obtain the input data matrix X, i.e. X = [x₁, x₂, ..., x_N];

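Step 4: Arrange the target value vectors of the N images in order as the data matrix Y, i.e. Y = [y₁, y₂, ..., y_N];

Step 5: Apply enhanced coding to the output target value vectors;

Each dimension y_nj of y_n is binary-coded as follows. The value range of y_nj is [-M₁+1, M₂], set according to the actual situation. The range of y_nj is first shifted to [1, M₁+M₂], with M = M₁+M₂:

\hat{y}_{nj} = y_{nj} + M_{1}

Binary coding is then carried out according to the value of \hat{y}_{nj}. The code is a vector a_n of length

Q = 2M + \left[\frac{M + 1}{2}\right] + \left[\frac{M + 9}{10}\right]

where [ ] denotes rounding to an integer. The first M dimensions of the coding vector a_n are coded as:

a_{nk} = \begin{cases} 1 & 1 \leq k \leq \hat{y}_{nj} \\ 0 & \hat{y}_{nj} + 1 \leq k \leq M \end{cases},

where k denotes the dimension index of the coding vector a_n;

Dimensions M+1 to 2M of a_n are coded as:

a_{nk} = \begin{cases} 1 & k = M + \hat{y}_{nj} \\ 0 & M + 1 \leq k \leq 2M,\ k \neq M + \hat{y}_{nj} \end{cases},

Dimensions 2M+1 to 2M + [(M+1)/2] of a_n are coded as:

a_{nk} = \begin{cases} 1 & k = 2M + \left[\frac{\hat{y}_{nj} + 1}{2}\right] \\ 0 & 2M + 1 \leq k \leq 2M + \left[\frac{M + 1}{2}\right],\ k \neq 2M + \left[\frac{\hat{y}_{nj} + 1}{2}\right] \end{cases},

Dimensions 2M + [(M+1)/2] + 1 to Q of a_n are coded as:

a_{nk} = \begin{cases} 1 & k = 2M + \left[\frac{M + 1}{2}\right] + \left[\frac{\hat{y}_{nj} + 9}{10}\right] \\ 0 & 2M + \left[\frac{M + 1}{2}\right] + 1 \leq k \leq Q,\ k \neq 2M + \left[\frac{M + 1}{2}\right] + \left[\frac{\hat{y}_{nj} + 9}{10}\right] \end{cases}.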

Step 6: Establish a regression model from the input features x_n to the enhanced codes a_n, solve the model, and obtain its parameters;

Step 7: Using the model parameters obtained in Step 6, map the features x_n into the enhanced coding space to obtain the estimated enhanced codes;

Step 8: To map the enhanced codes finally to the target values, establish the mapping between the estimated enhanced codes and the output target values y_n using a random forest model; the number of random trees and the feature dimensionality of each random tree are selected according to the length of the enhanced code and the number of training samples;

Step 9: Given a sample to be estimated, first map its input features to an enhanced code using the model established in Step 6, then map the enhanced code to the target value using the random forest model of Step 8.
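
The enhanced coding of a single target dimension described in the steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patent's implementation: the function name `enhanced_code` is hypothetical, the target value is assumed to be an integer, and the bracket [ ] is assumed to mean rounding down, which is the reading under which the third and fourth segments have exactly [(M+1)/2] and [(M+9)/10] positions.

```python
def enhanced_code(y, m1, m2):
    """Encode one integer target value y in [-m1 + 1, m2] as a list of 0/1 bits."""
    if not (-m1 + 1 <= y <= m2):
        raise ValueError("y is outside the range [-m1 + 1, m2]")
    m = m1 + m2
    y_hat = y + m1                 # shifted value in [1, M]
    n_pair = (m + 1) // 2          # length of the third segment (bins of width 2)
    n_ten = (m + 9) // 10          # length of the fourth segment (bins of width 10)
    q = 2 * m + n_pair + n_ten     # total code length Q

    code = [0] * q                 # list index = dimension k - 1
    # Segment 1 (dimensions 1..M): the first y_hat bits are 1.
    for k in range(1, y_hat + 1):
        code[k - 1] = 1
    # Segment 2 (dimensions M+1..2M): a single 1 at k = M + y_hat.
    code[m + y_hat - 1] = 1
    # Segment 3: a single 1 at k = 2M + [(y_hat + 1) / 2].
    code[2 * m + (y_hat + 1) // 2 - 1] = 1
    # Segment 4: a single 1 at k = 2M + [(M + 1) / 2] + [(y_hat + 9) / 10].
    code[2 * m + n_pair + (y_hat + 9) // 10 - 1] = 1
    return code

# Illustrative values only: range [-2, 3], so M = 6 and Q = 16.
example = enhanced_code(0, 3, 3)
# example == [1, 1, 1, 0, 0, 0,  0, 0, 1, 0, 0, 0,  0, 1, 0,  1]
```

The four segments describe the same value at different granularities (cumulative, exact, in bins of 2, in bins of 10), so every valid code contains exactly \hat{y}_{nj} + 3 ones.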

Further, the regression model in Step 6 is a shallow model or a deep model.
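
The difference between the two model families can be sketched as a forward pass in Python. This is an illustration only: the function names are hypothetical, the sigmoid squashing (chosen here because the code bits lie in [0, 1]) is an assumption that the patent does not specify, and training of the weights is omitted.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def weighted_layer(weights, bias, x):
    """One layer of weighted combination: one output per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def shallow_predict(weights, bias, x):
    """Shallow model: a single weighted layer from features to code estimates."""
    return [sigmoid(z) for z in weighted_layer(weights, bias, x)]

def deep_predict(layers, x):
    """Deep model: each (weights, bias) pair produces the next layer's hidden features."""
    hidden = x
    for weights, bias in layers:
        hidden = [sigmoid(z) for z in weighted_layer(weights, bias, hidden)]
    return hidden

# Toy parameters (hypothetical): 1 input feature, 2 hidden units, 1 code bit.
shallow_out = shallow_predict([[0.0, 0.0]], [0.0], [1.0, 2.0])
deep_out = deep_predict([([[1.0], [-1.0]], [0.0, 0.0]), ([[0.0, 0.0]], [0.0])], [2.0])
```

In either case the output has one entry per code dimension, which can be read as an estimate of the corresponding 0/1 bit.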

The invention first extracts features from the collected images and records the corresponding target values. The target values are then enhanced-coded, each bit of the code being a 0/1 binary variable. Next, a mapping is established between the original input image features and the binary codes, and all input images are mapped to binary codes according to this mapping. Finally, the random forest method is used to establish a mapping between the binary codes and the target values. For a new test image, image features are extracted, the learned model is used to estimate the binary code, and the binary code is regressed back to the target value. When samples are sparse and unevenly distributed, the invention improves the sample recognition rate and the recognition accuracy.
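
The random forest used for the final mapping combines the outputs of its individual trees. The sketch below shows that aggregation rule alone, in Python; training is omitted. It rests on assumptions: `forest_predict` is a hypothetical name, each tree is taken to be any callable that accepts a code, the toy "trees" are hand-written stand-ins rather than trained decision trees, and averaging numeric outputs is the usual regression convention rather than something the patent specifies.

```python
from statistics import mean, mode

def forest_predict(trees, code, task="regression"):
    """Combine per-tree outputs: mean for regression, mode for classification."""
    outputs = [tree(code) for tree in trees]
    if task == "classification":
        return mode(outputs)
    return mean(outputs)

# Hand-written stand-ins for trained trees (hypothetical), each reading a
# different part of a 16-bit code whose first 6 bits are cumulative and whose
# next 6 bits are one-hot.
toy_trees = [
    lambda a: float(sum(a[:6])),             # counts the ones in the cumulative part
    lambda a: float(a[6:12].index(1) + 1),   # position of the one-hot bit
    lambda a: 3.0,                           # a constant tree
]
estimate = forest_predict(toy_trees, [1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1])
```

Because different trees can rely on different segments of the code, an error in one segment of an estimated code need not dominate the combined estimate.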

Description of the drawings

Fig. 1 is a schematic diagram of visual mapping (head pose estimation, body pose estimation, and gaze estimation).

Fig. 2 is a schematic diagram of the coding.

Detailed description of the embodiments

Implementation languages: Matlab, C/C++

Hardware platform: Intel Core 2 E7400 + 4 GB DDR RAM

Software platform: Matlab 2012a, Visual Studio 2010

Following the method of the invention, first identify the visual mapping problem to be solved, collect the relevant images (head images, body images, face images, etc.), and calibrate the target values (head pose angle, body pose angle, and age). Then, following this patent, write code in Matlab or C to learn the mapping model from images to enhanced codes and the random forest model from enhanced codes to target values. Afterwards, perform visual mapping on the input images to be estimated and estimate their target values. The method of the invention can be used for a wide range of visual mapping problems in computer vision, and markedly improves on the performance of direct mapping methods (from input features to target values).

The technical scheme of the invention is described in further detail below with reference to the drawings. An enhanced coding method for visual mapping target values comprises:

Step 1: Collect N input images (see Fig. 1), and calibrate the target value corresponding to each image at the time it is collected. Taking head pose estimation as an example, the N input images are N head images and the calibrated value is the head pose y_n, whose first dimension is the pitch angle, second dimension the tilt angle, and third dimension the rotation angle; the subscript n denotes the pose corresponding to the n-th image. In practice, for a body pose estimation problem the input images are body images and the target values are the angles between body parts; for a gaze estimation problem the input images are eye images and the target value is the gaze direction (horizontal angle and vertical angle);

Step 2: Extract visual features from the images obtained in Step 1, and denote the visual feature vector of any n-th image as x_n. Taking head pose again as an example, the visual feature usually extracted is the gradient orientation histogram, so x_n represents the gradient orientation histogram feature of the n-th image;

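Step 5: Apply enhanced coding (see Fig. 2) to the output target value vectors. With \hat{y}_{nj} = y_{nj} + M_{1} and M = M₁+M₂, the first M dimensions of the coding vector a_n are coded as:

a_{nk} = \begin{cases} 1 & 1 \leq k \leq \hat{y}_{nj} \\ 0 & \hat{y}_{nj} + 1 \leq k \leq M \end{cases},

where k denotes the dimension index of the coding vector a_n;

Dimensions M+1 to 2M of a_n are coded as:

a_{nk} = \begin{cases} 1 & k = M + \hat{y}_{nj} \\ 0 & M + 1 \leq k \leq 2M,\ k \neq M + \hat{y}_{nj} \end{cases},

Dimensions 2M+1 to 2M + [(M+1)/2] of a_n are coded as:

a_{nk} = \begin{cases} 1 & k = 2M + \left[\frac{\hat{y}_{nj} + 1}{2}\right] \\ 0 & 2M + 1 \leq k \leq 2M + \left[\frac{M + 1}{2}\right],\ k \neq 2M + \left[\frac{\hat{y}_{nj} + 1}{2}\right] \end{cases},

Dimensions 2M + [(M+1)/2] + 1 to Q of a_n, where Q = 2M + [(M+1)/2] + [(M+9)/10] is the code length, are coded as:

a_{nk} = \begin{cases} 1 & k = 2M + \left[\frac{M + 1}{2}\right] + \left[\frac{\hat{y}_{nj} + 9}{10}\right] \\ 0 & 2M + \left[\frac{M + 1}{2}\right] + 1 \leq k \leq Q,\ k \neq 2M + \left[\frac{M + 1}{2}\right] + \left[\frac{\hat{y}_{nj} + 9}{10}\right] \end{cases}.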

Step 6: Establish a regression model from the input features x_n to the enhanced codes a_n, solve the model, and obtain its parameters; the model is a shallow model or a deep model;

Step 7: Using the model parameters obtained in Step 6, map the features x_n into the enhanced coding space to obtain the estimated enhanced codes;

Step 9: Given a sample to be estimated, first map its input features to an enhanced code using the model established in Step 6, then map the enhanced code to the target value using the random forest model of Step 8. Taking head pose estimation as an example, the input feature is the gradient orientation histogram feature, which is first mapped to the enhanced code and then from the enhanced code to the head pose.
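
As a worked instance of the enhanced coding used in this embodiment, suppose, purely for illustration (these numbers are not from the patent), that one head pose angle is quantized to integers in [-9, 10], so that M₁ = 10 and M₂ = 10, and that [ ] rounds down. For y_nj = 2 the code is obtained as:

```latex
\begin{aligned}
M &= M_1 + M_2 = 20, \qquad \hat{y}_{nj} = y_{nj} + M_1 = 12, \\
Q &= 2M + \left[\tfrac{M+1}{2}\right] + \left[\tfrac{M+9}{10}\right] = 40 + 10 + 2 = 52, \\
k = 1, \dots, 20 &: \quad a_{nk} = 1 \ \text{for } 1 \leq k \leq 12, \quad a_{nk} = 0 \ \text{for } 13 \leq k \leq 20, \\
k = 21, \dots, 40 &: \quad a_{nk} = 1 \ \text{only at } k = M + \hat{y}_{nj} = 32, \\
k = 41, \dots, 50 &: \quad a_{nk} = 1 \ \text{only at } k = 2M + \left[\tfrac{12+1}{2}\right] = 40 + 6 = 46, \\
k = 51, 52 &: \quad a_{nk} = 1 \ \text{only at } k = 2M + \left[\tfrac{M+1}{2}\right] + \left[\tfrac{12+9}{10}\right] = 40 + 10 + 2 = 52.
\end{aligned}
```

The resulting 52-dimensional code contains 12 + 3 = 15 ones: twelve from the cumulative segment and one from each of the three position-coded segments.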

Claims

1. An enhanced coding method for visual mapping target values, the method comprising:

Step 2: Extract visual features from the images obtained in Step 1, and denote the visual feature vector of any n-th image as x_n;

Step 3: Arrange the feature vectors of all N images in order to obtain the input data matrix X, i.e. X = [x₁, x₂, ..., x_N];

Each dimension y_nj of y_n is binary-coded as follows: the value range of y_nj is [-M₁+1, M₂], set according to the actual situation; the range of y_nj is first shifted to [1, M₁+M₂], with M = M₁+M₂:

\hat{y}_{nj} = y_{nj} + M_{1}

Binary coding is then carried out according to the value of \hat{y}_{nj}. The code is a vector a_n of length

Q = 2M + \left[\frac{M + 1}{2}\right] + \left[\frac{M + 9}{10}\right]

where [ ] denotes rounding to an integer. The first M dimensions of the coding vector a_n are coded as:

a_{nk} = \begin{cases} 1 & 1 \leq k \leq \hat{y}_{nj} \\ 0 & \hat{y}_{nj} + 1 \leq k \leq M \end{cases},

where k denotes the dimension index of the coding vector a_n;

Dimensions M+1 to 2M of a_n are coded as:

a_{nk} = \begin{cases} 1 & k = M + \hat{y}_{nj} \\ 0 & M + 1 \leq k \leq 2M,\ k \neq M + \hat{y}_{nj} \end{cases},

Dimensions 2M+1 to 2M + [(M+1)/2] of a_n are coded as:

a_{nk} = \begin{cases} 1 & k = 2M + \left[\frac{\hat{y}_{nj} + 1}{2}\right] \\ 0 & 2M + 1 \leq k \leq 2M + \left[\frac{M + 1}{2}\right],\ k \neq 2M + \left[\frac{\hat{y}_{nj} + 1}{2}\right] \end{cases},

Dimensions 2M + [(M+1)/2] + 1 to Q of a_n are coded as:

a_{nk} = \begin{cases} 1 & k = 2M + \left[\frac{M + 1}{2}\right] + \left[\frac{\hat{y}_{nj} + 9}{10}\right] \\ 0 & 2M + \left[\frac{M + 1}{2}\right] + 1 \leq k \leq Q,\ k \neq 2M + \left[\frac{M + 1}{2}\right] + \left[\frac{\hat{y}_{nj} + 9}{10}\right] \end{cases}.

Step 6: Establish a regression model from the input features x_n to the enhanced codes a_n, solve the model, and obtain its parameters;

Step 7: Using the model parameters obtained in Step 6, map the features x_n into the enhanced coding space to obtain the estimated enhanced codes;

Step 8: To map the enhanced codes finally to the target values, establish the mapping between the estimated enhanced codes and the output target values y_n using a random forest model; the number of random trees and the feature dimensionality of each random tree are selected according to the length of the enhanced code and the number of training samples;

Step 9: Given a sample to be estimated, first map its input features to an enhanced code using the model established in Step 6, then map the enhanced code to the target value using the random forest model of Step 8.

2. The enhanced coding method for visual mapping target values as claimed in claim 1, characterized in that the regression model in Step 6 is a shallow model or a deep model.