CN102663449A

CN102663449A - Method for tracing human body movement based on maximum geometric flow histogram

Info

Publication number: CN102663449A
Application number: CN2012100640600A
Authority: CN
Inventors: 韩红; 苟靖翔; 谢福强; 冯光洁; 韩启强; 王瑞
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2012-03-12
Filing date: 2012-03-12
Publication date: 2012-09-12

Abstract

The present invention discloses a method for tracking human body motion based on a maximum geometric flow histogram. It mainly addresses defects of existing feature extraction methods, such as blurred descriptions of human contours and edges and the failure to reflect the internal geometric structure and texture patterns of the features. The method proceeds as follows: input the video images to be processed and extract a bounding box of the main human body; apply a two-dimensional multi-scale wavelet transform to the image; search for the optimal geometric flow direction using quadtree partitioning and a bottom-up merging rule; apply a one-dimensional wavelet transform to the quantized signal along the optimal geometric flow direction, then rearrange the result into two-dimensional form to obtain a coefficient matrix; count, for each region, a histogram of geometric flow coefficient strength over 9 directions as the final image feature representation; and, through a regression process, learn the mapping from image features to three-dimensional motion data and use it to predict and recover the three-dimensional poses of new video images. The method is fast and accurate, strengthens the robustness of the image features, and can be applied to human target recognition, detection, and pose reconstruction.

Description

Human body motion tracking method based on a maximum geometric flow direction histogram

Technical field

The invention belongs to the technical field of video image processing and relates mainly to methods for representing texture features of video images. Specifically, it is a human body motion tracking method based on a maximum geometric flow direction histogram built on second-generation bandelets, used for human motion tracking in video and for three-dimensional pose recovery.

Background technology

Human motion tracking in video has been one of the major focuses of computer vision in recent decades. People are the core content of an image and carry its core semantic features. The related techniques have seen initial application in motion capture, human-computer interaction, video surveillance, and other fields, and have great application prospects. Understanding and interpreting human motion in video belongs to video image processing and also involves pattern recognition, machine learning, signal processing, and other disciplines. Image-based three-dimensional human motion recovery can promote the development of natural human-computer interaction, in which postures, gestures, and the like are used to interact with machines. Three-dimensional human motion tracking and pose recovery is a long-standing and important problem in computer vision that is still far from being thoroughly solved. A human viewer can understand the pose of a person in an image almost instantly. For a computer, however, this understanding must overcome many difficulties: effective image features must be used to characterize the person's motion state together with details such as image texture and contour, turning them into a manageable pattern that serves as the input signal for recognition.

During motion tracking, a tracking decision method and an image feature representation must be combined to achieve motion tracking and three-dimensional pose recovery of the human body in two-dimensional video. The tracking decision methods used at the back end of existing motion tracking fall broadly into generative methods, which optimize over the state space, and discriminative methods, which learn from data and then perform probabilistic inference. The image feature representations used at the front end fall broadly into methods based on global features and methods based on local codebook features. Global features include the histogram of oriented gradients (HOG) and the hierarchical feature HMAX; local features include Shape Context and the scale-invariant feature transform (SIFT).

Many mature image feature representations have already been applied to human feature representation and motion tracking. Most of them, however, describe the human body using contour and edge information and are not theoretically rigorous. For a single static image they can represent the general content to a certain degree, but for a continuous image sequence with slight changes they have difficulty capturing the internal information of the image. Edge-based methods also face a major problem: rapid local changes in the image often do not correspond to discontinuities along an edge curve. As a result, the gray-level discontinuity at closed boundaries is blurred, and the texture variations that are captured do not cluster along geometric curves. The net effect is that the geometric texture orientation in the image cannot be represented accurately and effectively, and the pose and feature information of the person cannot be fully described, which introduces ambiguity into later motion tracking and recovery.

Summary of the invention

The object of the invention is to overcome the deficiencies of the prior art described above. Bandelets are used to accurately reflect slight changes in the geometric direction of the image and its internal texture orientation, and the optimal geometric flow direction is sought adaptively for multi-scale analysis. On this basis, a human body motion tracking method based on a maximum geometric flow direction histogram is proposed. It improves the accuracy of image feature extraction, reflects both the correlation and the subtle differences between consecutive single images so that a computer algorithm can better distinguish them, and improves the representational power of the features while reducing the ambiguity in describing similar images. Under the given distribution of image data, accurate pose estimation is carried out by learning from a database that carries prior knowledge.

To achieve this object, the technical scheme of the invention comprises the following steps:

(1) Convert the input training and test video sets to be processed into continuous sequences of single frames. Identify one main human target and extract a rectangular bounding box containing the body. Unify the image size by converting every frame into an initial image of 192*64 pixels, which approximates the proportions of a moving human body, for use as a training sample image in later processing. When the image contains one moving person, that person is selected as the target; when it contains several, the required main human target is selected;

(2) Apply a two-dimensional discrete orthogonal wavelet transform to each training sample image to obtain the image sub-band information; the number of wavelet transform levels is L=1;

(3) Partition each wavelet-transformed training sample image according to quadtree partitioning and a bottom-up merging rule to obtain the optimal bandelet blocks. The minimum size of a partitioned region is a 4*4 image sub-block. Compute and extract the bandelet direction of each image region;

(4) Reorder the image sub-band information obtained in step (2) by projection error value along the optimal bandelet direction, from small to large, and apply a one-dimensional discrete transform to obtain the transformed signal f_θ;

(5) Compute the quantized value \tilde{f}_θ of the transformed signal f_θ. The quantized value consists entirely of the quantized coefficients Q(x), computed as:

Q(x) = \begin{cases} 0, & |x| \leq T \\ \operatorname{sign}(x) \cdot (q + 0.5) \cdot T, & qT \leq |x| \leq (q + 1) T \end{cases}

where x is a coefficient of the signal f_θ, Q(x) is the corresponding quantized coefficient of \tilde{f}_θ, T is the quantization threshold, and q ∈ Z is a constant parameter, Z being the set of integers;

(6) Apply bandeletization to each quantized image sub-block to obtain the bandelet coefficients, computing at the same time the optimal geometric flow direction on each sub-block, and arrange the bandelet coefficients in matrix form following the layout of the original two-dimensional image;

(7) Divide the image into a grid with 4 equal divisions horizontally and 4 vertically, and divide each grid cell into 9 equal direction intervals. Accumulate, direction by direction, the votes of bandelet coefficient strength in each direction to form the maximum geometric flow direction histogram feature;

(8) Using the maximum geometric flow direction histogram features, carry out machine-learning-based tracking of human motion poses: estimate, for the input video images, the spatial three-dimensional motion pose comprising 20 joints, and restore the estimated three-dimensional pose data to a joint skeleton in image form, completing human motion tracking for the video.
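The quantization rule of step (5) can be illustrated with a short sketch. It is only an illustration, not the patented implementation: the function names are hypothetical, and because the printed formula uses "≤" at both ends of each interval, the sketch assumes half-open intervals qT ≤ |x| < (q+1)T above the dead zone.

```python
import math

def quantize(x: float, T: float) -> float:
    """Dead-zone uniform quantizer Q(x) as written in step (5)."""
    a = abs(x)
    if a <= T:
        return 0.0                      # dead zone: |x| <= T maps to 0
    q = math.floor(a / T)               # integer q with qT <= |x| < (q+1)T (assumed half-open)
    return math.copysign((q + 0.5) * T, x)

def quantize_signal(coeffs, T):
    """Quantize every coefficient of a transformed signal f_theta."""
    return [quantize(c, T) for c in coeffs]
```

With the threshold T = 15 used in Embodiment 1, a coefficient of 40 falls in the interval with q = 2 and is mapped to the interval midpoint 37.5.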

The technical idea of the invention is as follows. The image region containing one main human body is extracted from the video sequence, and a two-dimensional multi-scale wavelet transform is applied to it. Wavelet functions warped along the geometric flow direction are constructed by quadtree partitioning with bottom-up CART merging. In the resulting partition, the optimal direction of every sub-block is determined and an orthogonal projection is made onto it, converting the two-dimensional function into a one-dimensional one. Finally, within the fixed-size regions into which the image is divided, the distribution of the directions of maximum geometric flow value is counted over 9 equal direction intervals of 20 degrees each, and the resulting histogram serves as the image statistical feature. The statistical features obtained from the training database and the three-dimensional poses of the corresponding training data are used to learn a regression mapping with a twin Gaussian process tracking decision model, and the learned mapping is then used to track human motion and recover the corresponding three-dimensional poses in new video sequences.
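As a rough illustration of the single-level two-dimensional orthogonal wavelet transform of step (2), the sketch below uses the Haar wavelet. The text does not say which orthogonal wavelet is used, so Haar is an assumption made for brevity, and the function names are hypothetical. Image sides must be even, which holds for 192*64.

```python
import math

def haar_1d(v):
    """One level of the orthonormal Haar transform: [approximations | details]."""
    s = math.sqrt(2.0)
    half = len(v) // 2
    approx = [(v[2 * i] + v[2 * i + 1]) / s for i in range(half)]
    detail = [(v[2 * i] - v[2 * i + 1]) / s for i in range(half)]
    return approx + detail

def haar_2d(img):
    """Single-level (L = 1) 2-D transform: rows first, then columns.

    The low-pass sub-band ends up in the top-left quadrant and the three
    detail sub-bands in the other quadrants.
    """
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d(list(c)) for c in zip(*rows)]   # transform each column
    return [list(r) for r in zip(*cols)]            # transpose back
```

Because the transform is orthonormal, the sum of squared coefficients equals the sum of squared pixel values, which is a convenient sanity check.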

A further feature of the invention is that the extraction of the bandelet features of the optimal geometric flow in step (3) comprises:

3.1 For a sub-block S of width L in the initial image, compute its optimal geometric flow direction. The optimal direction is determined by minimizing a Lagrangian penalty function L₀(S):

L(f_{θ}, R) = \| f_{θ} - \tilde{f}_{θ} \|^{2} + λ T^{2} (R_{g} + R_{b})

The first term on the right, \| f_{θ} - \tilde{f}_{θ} \|^{2}, is the squared approximation error, and the second term, λ T^{2} (R_{g} + R_{b}), is a penalty term for computational complexity. Here f_θ is the true optimal basis function, \tilde{f}_θ is its estimate, λ is the penalty scale factor, which takes an empirical value in the computation, T is the quantization threshold, R is the number of bits, R_g is the number of bits required to entropy-code the optimal geometric flow parameter d, and R_b is the number of bits required to code the quantized coefficients {Q(t)}. By evaluating the Lagrangian penalty function, the invention obtains an estimated geometric flow direction that approximates the true geometric flow direction.

3.2 Set L = 2L. Each L×L square is again denoted S; compute its optimal geometric flow direction and the corresponding Lagrangian function value L(S). Here L is the width of an image sub-block, a variable in the computation of the optimal geometric direction, and S likewise changes from iteration to iteration.

3.3 For each bandelet block S of size L×L, denote its four child blocks (S₁, S₂, S₃, S₄) and compute the Lagrangian function value obtained by merging these four children as leaf nodes:

\tilde{L} (S) = L_{0} (S_{1}) + L_{0} (S_{2}) + L_{0} (S_{3}) + L_{0} (S_{4}) + L_{0} (S) + λ \cdot T^{2}

3.4 Set

L_{0}(S) = \min \{ L(S), \tilde{L}(S) \}

3.5 Repeat steps (3.2)-(3.4) iteratively until L reaches the maximum partition scale. L₀(S) then gives the final quadtree partition, and the optimal geometric flow direction of each image region block is obtained at the same time.

The advantage of the bandelet direction feature extraction of the optimal geometric flow adopted by the invention is that local texture information and geometric directions within image sub-blocks are located adaptively, without much manual parameter tuning or intervention. The internal and external geometric structure of the whole image is thereby outlined, forming the basic elements for the subsequent computation of the histogram statistical feature.
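The bottom-up merging of steps 3.1 to 3.5 can be sketched as a recursion over a square block whose side is a power-of-two multiple of the minimum size. This is a sketch under stated assumptions, not the patented procedure: `cost` is a hypothetical callable standing in for the Lagrangian L(S) of step 3.1, `make_sse_cost` is a toy replacement for it, and the cost of splitting is taken here as the sum of the four children's values plus the penalty λT², which is one reading of the formula in step 3.3.

```python
def best_partition(x0, y0, size, cost, split_penalty, min_size=4):
    """Return (L0, leaves) for the square block at (x0, y0) with side `size`.

    Children are evaluated first, so merging proceeds from the smallest
    blocks upward, as in steps 3.2-3.5.
    """
    own = cost(x0, y0, size)                      # L(S): keep the block whole
    if size <= min_size:
        return own, [(x0, y0, size)]
    half = size // 2
    total, leaves = split_penalty, []             # lambda * T^2 for the split
    for dy in (0, half):
        for dx in (0, half):
            c, sub = best_partition(x0 + dx, y0 + dy, half,
                                    cost, split_penalty, min_size)
            total += c
            leaves += sub
    if own <= total:                              # L0(S) = min{L(S), L~(S)}
        return own, [(x0, y0, size)]
    return total, leaves

def make_sse_cost(img, overhead):
    """Toy stand-in for L(S): squared error around the block mean plus a
    fixed per-block overhead (NOT the bandelet cost of step 3.1)."""
    def cost(x0, y0, size):
        vals = [img[y][x] for y in range(y0, y0 + size)
                for x in range(x0, x0 + size)]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) + overhead
    return cost
```

With this toy cost, a flat 8×8 block stays whole, while a block made of four different constant quadrants is split into four 4×4 leaves.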

A further feature of the invention is that the reordering in step (4) of the image sub-band information obtained in step (2) comprises:

4.1 Sort all points in each sub-block by their projection error value along the optimal geometric flow direction, from small to large, obtaining a sort index of length L².

4.2 Reorder the two-dimensional discrete wavelet transform coefficients in the sub-block according to the index to obtain a one-dimensional signal f_d. This symbol denotes only a data vector, not a function.

4.3 Apply a one-dimensional discrete wavelet transform to f_d to obtain the transformed signal f_θ. This symbol likewise denotes only a data vector, not a function.

The advantage of reordering the sample information as adopted by the invention is that it speeds up computation, improves search capability during computation, and reduces computing time.
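Steps 4.1 to 4.3 can be sketched as follows. Two assumptions are made that the text does not fix: the sort key is taken to be the signed projection of each sample position across the flow direction, t = -x sin θ + y cos θ, and the one-dimensional transform is a full Haar decomposition. The function names are hypothetical.

```python
import math

def reorder_block(block, theta):
    """Steps 4.1-4.2: sort the L*L samples by projection, then flatten to f_d."""
    n = len(block)
    pts = [(x, y) for y in range(n) for x in range(n)]

    def projection(p):
        # Assumed key: signed distance across the flow direction theta.
        return -p[0] * math.sin(theta) + p[1] * math.cos(theta)

    index = sorted(pts, key=projection)   # stable sort: ties keep raster order
    return [block[y][x] for x, y in index]

def haar_full(v):
    """Step 4.3: full 1-D orthonormal Haar decomposition (length a power of two)."""
    out = list(v)
    n = len(out)
    while n > 1:
        half = n // 2
        a = [(out[2 * i] + out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        d = [(out[2 * i] - out[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        out[:n] = a + d
        n = half
    return out
```

When the block is reordered along its true flow direction, f_d becomes piecewise constant and most Haar coefficients vanish; reordering along a wrong direction leaves many more nonzero coefficients.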

A further feature of the invention is that the histogram voting of the maximum geometric flow direction in step (7) comprises:

7.1 Divide the image into a 4*4 grid of region blocks, and record the main geometric flow direction in each block together with the corresponding coefficient strength.

7.2 Divide each image block into 9 equal direction intervals of 20 degrees each, accumulate votes for the number of geometric flow directions falling in each direction interval and their coefficient strengths, and compute the histogram distribution over the whole image.

The advantage of the geometric flow histogram statistics adopted by the invention is that the directional distribution of the overall geometric features of an image region can be computed, mismatches or missed matches of local features are reduced, the robustness of the feature representation is strengthened, and the feature itself is sparse.
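The voting of steps 7.1 and 7.2 can be sketched as below. The input layout is hypothetical: each sub-block is given as a tuple (x, y, theta, strength) with its position, flow direction in radians, and coefficient strength. The sketch also assumes the 192*64 image is 64 pixels wide and 192 high, and that votes are weighted by the absolute strength.

```python
import math

def mgh_feature(blocks, width=64, height=192, grid=4, bins=9):
    """Accumulate a grid*grid*bins direction histogram (144 values by default)."""
    cell_w, cell_h = width / grid, height / grid
    hist = [0.0] * (grid * grid * bins)
    for x, y, theta, strength in blocks:
        deg = math.degrees(theta) % 180.0                 # directions are modulo 180
        b = min(int(deg // (180.0 / bins)), bins - 1)     # 20-degree intervals
        cx = min(int(x // cell_w), grid - 1)              # grid column
        cy = min(int(y // cell_h), grid - 1)              # grid row
        hist[(cy * grid + cx) * bins + b] += abs(strength)
    return hist
```

The result has 4 × 4 × 9 = 144 entries, matching the feature dimension stated for the method.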

A further feature of the invention is that the machine-learning-based human motion pose tracking and three-dimensional pose recovery in step (8) comprise:

8.1 Use a Gaussian process to learn the mapping between the maximum geometric flow direction histogram features X={x₁, x₂, ..., x_n} of the training images and the corresponding three-dimensional pose coordinate data Y={y₁, y₂, ..., y_n}. This learning process determines the relevant parameters of the kernel function and thereby the mapping from feature space to pose state space.

8.2 For a test sequence of human motion images to be processed, extract the corresponding histogram features X'={x'₁, x'₂, ..., x'_n} according to steps (1) to (7) and, using the learned mapping f, solve for the three-dimensional human motion pose data Y'={y'₁, y'₂, ..., y'_n} corresponding to the test samples.

8.3 From the three-dimensional tracking result Y'={y'₁, y'₂, ..., y'_n}, accurately recover the three-dimensional human motion of the input two-dimensional video, that is, the spatial three-dimensional motion pose comprising the coordinates of 20 joints, and restore the estimated three-dimensional pose data to a joint skeleton in image form in space.

The advantage of the tracking and three-dimensional pose recovery performed with the machine learning method adopted by the invention is that learning prior knowledge improves the generalization ability of the algorithm and provides an important prior condition and necessary component for later fast motion tracking and three-dimensional pose recovery.
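The regression of steps 8.1 and 8.2 can be illustrated with a minimal Gaussian process mean predictor. This is a simplified sketch, not the model of the invention: it uses plain Gaussian process regression rather than the twin Gaussian process model, the RBF kernel with fixed `length` and `noise` values is an assumption, and no kernel parameters are learned. All function names are hypothetical.

```python
import math

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two feature vectors (assumed kernel)."""
    d2 = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-d2 / (2.0 * length ** 2))

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def gp_fit(X, Y, length=1.0, noise=1e-6):
    """Return alpha = (K + noise * I)^-1 Y for training features X and poses Y."""
    K = [[rbf(a, b, length) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    return solve(K, Y)

def gp_predict(x_new, X, alpha, length=1.0):
    """Posterior mean pose for one test feature vector x_new."""
    k = [rbf(x_new, a, length) for a in X]
    dims = len(alpha[0])
    return [sum(k[i] * alpha[i][d] for i in range(len(X))) for d in range(dims)]
```

In the setting of the text, each row of X would be a 144-dimensional histogram feature and each row of Y a 60-dimensional pose vector; the toy solver is only practical for small training sets.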

Compared with the prior art, the invention has the following advantages:

(1) The invention does not need to perform background subtraction on the human motion region in the video sequence, and therefore saves more computer resources and time than conventional feature extraction methods.

(2) The maximum geometric flow direction histogram representation used in the invention accurately expresses the directional statistics of the human pose in an image through the geometric flow. Describing the image by geometric flow statistics avoids the representational ambiguity (mutual occlusion of front and rear limbs, and depth relationships) produced by conventional edge-based or contour-based image representations. The geometric flow feature can distinguish subtle differences in human motion pattern and pose between consecutive frames, establishing a necessary precondition for obtaining more accurate three-dimensional motion tracking and recovery results later.

(3) The image features extracted by the invention are lower-dimensional and sparser than those of conventional image description methods, which effectively reduces the amount of computation in the learning and training stage and lays a foundation for real-time tracking.

Description of drawings

Fig. 1 is a flow chart of the algorithm of the invention;

Fig. 2 is a schematic diagram of the feature extraction method of the invention;

Fig. 3 shows video frames of several motion sequences, each a human body image with background extracted after processing the original database, together with the corresponding three-dimensional poses recovered by the invention;

Fig. 4 is a comparative histogram of the mean three-dimensional joint error of the algorithm of the invention and the existing HOG method when tracking the continuous walking sequence of the person in Fig. 3a;

Fig. 5 shows screenshots of consecutive video frames of a person's motion poses in the international Humaneva database;

Fig. 6 shows the three-dimensional recovery results of the invention for the motion poses of the person shown in Fig. 5 from the international Humaneva database.

Embodiment

Understanding and interpreting human motion in video belongs to video image processing and also involves pattern recognition, machine learning, signal processing, and other technical fields. Image-based three-dimensional human motion recovery can promote the development of natural human-computer interaction, in which postures, gestures, and the like are used to interact with machines.

Embodiment 1

The invention is a human body motion tracking method based on a maximum geometric flow direction histogram. It mainly performs early-stage processing of the image features of human motion video, learns a mapping from the resulting image feature representation by means of a machine learning model, and thereby estimates the corresponding three-dimensional pose for the current human motion video data. With reference to Fig. 1, the specific implementation of the invention is as follows:

(1) Convert the input training and test video sets to be processed into continuous sequences of single frames. According to the image content, determine the main human target to be identified and extract a rectangular bounding box containing the body. Uniformly resize every image to an initial image of 192*64 pixels, which approximates the proportions of a moving human body, for use as a training sample in later processing. Because the invention requires no preprocessing such as background subtraction on the original images, computational resources are saved and time complexity is reduced.

(2) Apply a two-dimensional discrete orthogonal wavelet transform to each training sample image to obtain the image sub-band information; the number of levels of the orthogonal wavelet transform is L=1.

(3) Partition the wavelet-transformed image according to quadtree partitioning and a bottom-up merging rule to obtain the optimal bandelet blocks, and extract the bandelet direction of the optimal geometric flow. The specific steps are as follows:

3.1 For a sub-block S of width L in the image, compute its optimal geometric flow direction. The optimal direction is determined by minimizing a Lagrangian penalty function L₀(S):

L(f_{θ}, R) = \| f_{θ} - \tilde{f}_{θ} \|^{2} + λ T^{2} (R_{g} + R_{b})

The first term on the right, \| f_{θ} - \tilde{f}_{θ} \|^{2}, is the squared approximation error, and the second term, λ T^{2} (R_{g} + R_{b}), is a penalty term for computational complexity. Here f_θ is the true optimal basis function and \tilde{f}_θ is its estimate; both are obtained by experimental computation. λ is the penalty scale factor and takes the empirical value 3/28, T is the quantization threshold and is set to 15, R is the number of bits, R_g is the number of bits required to entropy-code the optimal geometric flow parameter d, and R_b is the number of bits required to code the quantized coefficients {Q(t)}; both bit counts are determined by computation. The minimum sub-block size L is 4.

3.2 Set L = 2L. Each L×L square is again denoted S; compute its optimal geometric flow direction and the corresponding Lagrangian function value L(S). Here L is the width of an image sub-block, a variable in the computation of the optimal geometric direction, and S likewise changes from iteration to iteration.

3.3 For each bandelet block S of size L×L, denote its four child blocks (S₁, S₂, S₃, S₄) and compute the Lagrangian function value obtained by merging these four children as leaf nodes:

\tilde{L}(S) = L_{0}(S_{1}) + L_{0}(S_{2}) + L_{0}(S_{3}) + L_{0}(S_{4}) + L_{0}(S) + λ \cdot T^{2}

3.4 Set

L_{0}(S) = \min \{ L(S), \tilde{L}(S) \}

3.5 Repeat steps (3.2)-(3.4) until L reaches the maximum partition scale. The maximum sub-block size L is obtained adaptively by the algorithm and in theory does not exceed the image width. L₀(S) then gives the final quadtree partition, and the optimal geometric flow direction of each fixed-size sub-block is obtained at the same time.

(4) Reorder the image sub-band information obtained in step (2) by the projection error value between the geometric flow direction and the optimal direction in each partitioned region, from small to large, and apply a one-dimensional discrete transform to obtain the transformed signal f_θ. The specific steps are as follows:

4.1 Sort all points in each sub-block by their projection error value along the optimal geometric flow direction, from small to large, obtaining a sort index of length L².

4.2 Reorder the two-dimensional discrete wavelet transform coefficients in the sub-block according to the index to obtain a one-dimensional signal f_d.

4.3 Apply a one-dimensional discrete wavelet transform to f_d to obtain the transformed signal f_θ.

(5) Compute the quantized value \tilde{f}_θ of the transformed signal f_θ. The quantized value consists entirely of the quantized coefficients Q(x), computed as:

Q(x) = \begin{cases} 0, & |x| \leq T \\ \operatorname{sign}(x) \cdot (q + 0.5) \cdot T, & qT \leq |x| \leq (q + 1) T \end{cases}

where x is a coefficient of the signal f_θ and Q(x) is the corresponding quantized coefficient of \tilde{f}_θ.

(6) The wavelet coefficients corresponding to the optimal geometric flow direction d of each sub-block are stored in a two-dimensional matrix of the same size as the sub-block S, which is the bandelet coefficient matrix. Because the second-generation bandelet features used by the invention respond to the variation trend of the geometric flow in the image, they accurately represent complex, changing human body structure and shape information. In other words, they capture the texture information in the video image and reduce the edge blurring to which video image contours are prone.

(7) Divide each image block region of the 4*4 grid into 9 directions and count the distribution of geometric flow coefficient strength over the quantized directions. This histogram is the final feature; each statistical feature has 144 dimensions (16 blocks × 9 directions) and serves as the image representation. See Fig. 2: the left side shows a standing person in the image; the upper right shows the approximate geometric flow orientation at the boxed shoulder region of the person; and the lower right is a schematic of the geometric flow information in 9 directions, with no actual correspondence to the image. The histogram statistics allow the invention to describe the video image features and the overall texture distribution with a very low dimension, which optimizes the early-stage algorithm and the overall processing scheme, reduces the amount of computation, and saves running time for the later learning stage.

(8) Using the maximum geometric flow direction histogram features, carry out machine-learning-based tracking of human motion poses: recover the three-dimensional pose for the input video images and restore the estimated three-dimensional pose data to a joint skeleton in image form. The specific steps are as follows:

8.1 Use a Gaussian process to learn the mapping between the maximum geometric flow direction histogram features X={x₁, x₂, ..., x_n} of the training images and the corresponding three-dimensional pose coordinate data Y={y₁, y₂, ..., y_n}. This learning process determines the relevant parameters of the kernel function and thereby the mapping f from feature space to pose state space.

8.2 For a test sequence of human motion images to be processed, likewise extract the corresponding maximum geometric flow direction histogram features X'={x'₁, x'₂, ..., x'_n} according to steps (2)-(7). Using the learned mapping f, solve for the three-dimensional human motion pose data Y'={y'₁, y'₂, ..., y'_n} corresponding to the test samples. The image features extracted by the invention are lower-dimensional and sparser than those of conventional image description methods, which effectively reduces the amount of computation in the learning and training stage and lays a foundation for real-time tracking.

8.3 The video human motion tracking result Y'={y'₁, y'₂, ..., y'_n} has 60 dimensions per frame and allows the corresponding human motion to be recovered accurately, that is, the spatial three-dimensional motion pose comprising the coordinates of 20 joints. The estimated three-dimensional pose data are restored to a joint skeleton in image form; see Fig. 3 and Fig. 6.
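The relation between the 60-dimensional pose vector and the 20 joints of step 8.3 can be made concrete with a small helper. The interleaved (x, y, z) ordering of the vector is an assumption, since the text does not state how the coordinates are laid out, and `to_joints` is a hypothetical name.

```python
def to_joints(pose, n_joints=20):
    """Split a flat pose vector into n_joints (x, y, z) tuples.

    Assumes the layout [x1, y1, z1, x2, y2, z2, ...].
    """
    if len(pose) != 3 * n_joints:
        raise ValueError("expected %d values, got %d" % (3 * n_joints, len(pose)))
    return [tuple(pose[3 * j:3 * j + 3]) for j in range(n_joints)]
```

Drawing the skeleton additionally needs the list of joint pairs that form bones, which depends on the motion capture database and is not given in the text.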

The invention uses bandelets to accurately reflect slight changes in the geometric direction of the image and its internal texture orientation, and adaptively seeks the optimal geometric flow direction for multi-scale analysis. On this basis it proposes a human body motion tracking method based on a maximum geometric flow direction histogram. The method improves the accuracy of image feature extraction, reflects both the correlation and the subtle differences between consecutive single images so that a computer algorithm can better distinguish them, improves the representational power of the features while reducing the ambiguity in describing similar images, and saves more computational resources and time than conventional feature extraction methods. Under the given distribution of image data, accurate pose estimation is carried out by learning from a database that carries prior knowledge, yielding more robust three-dimensional human motion tracking results. Because the image features extracted by the invention have a much lower dimension than those of conventional image description methods, the time and amount of computation in the learning and training stage are effectively reduced, which also lays a foundation for future real-time human motion tracking.

Embodiment 2

The human body motion tracking method based on a maximum geometric flow direction histogram is the same as in Embodiment 1, except that in step 3 the penalty scale factor λ used when minimizing the Lagrangian function takes the value 2/35 and the quantization threshold T is 10. R_g, the number of bits required to entropy-code the optimal geometric flow parameter d, and R_b, the number of bits required to code the quantized coefficients {Q(t)}, are both determined by computation.
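To compare the two parameter settings, the penalty weight λT² of the Lagrangian can be evaluated exactly for Embodiment 1 (λ = 3/28, T = 15) and Embodiment 2 (λ = 2/35, T = 10). In the sketch below, the squared error and bit counts passed to `lagrangian` are made-up illustrative values, not figures from the text.

```python
from fractions import Fraction

def lagrangian(err2, r_g, r_b, lam, T):
    """L = ||f - f~||^2 + lambda * T^2 * (R_g + R_b)."""
    return err2 + lam * T ** 2 * (r_g + r_b)

weight_emb1 = Fraction(3, 28) * 15 ** 2    # 675/28, about 24.11 per bit
weight_emb2 = Fraction(2, 35) * 10 ** 2    # 40/7, about 5.71 per bit

# Hypothetical block: squared error 100, 3 bits for the flow, 5 for coefficients.
example = lagrangian(100, 3, 5, Fraction(3, 28), 15)   # 2050/7
```

Each coded bit therefore costs roughly four times less under the Embodiment 2 setting, so for the same image that setting tolerates finer partitions and more coefficients before the penalty outweighs the error reduction.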

The maximum geometric flow direction histogram representation used in the invention accurately expresses the directional statistics of the human pose in an image through the geometric flow. Describing the image by geometric flow statistics avoids the representational ambiguity produced by conventional edge-based or contour-based image representations, such as defects caused by mutual occlusion of front and rear limbs and by depth relationships. The geometric flow feature can distinguish subtle differences in human motion pattern, pose, and the like between consecutive frames, establishing a necessary condition for obtaining more accurate three-dimensional motion tracking and recovery results later.

In addition, the regression learning method used in step (8) is implemented with a twin Gaussian process or a nearest-neighbor algorithm. Results consistent with Embodiment 1 are likewise obtained: the extracted features have clear edges and, in particular, reflect the texture information under motion.
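A nearest-neighbor regressor of the kind mentioned above can be sketched in a few lines. The inverse-distance weighting, the default k = 3, and the function name are assumptions; the text only states that a nearest-neighbor algorithm may be used.

```python
import math

def wknn_predict(x_new, X, Y, k=3, eps=1e-12):
    """Predict a pose as the inverse-distance weighted mean of the k nearest
    training poses (the weighting scheme is an assumption)."""
    nearest = sorted((math.dist(x_new, x), i) for i, x in enumerate(X))[:k]
    weights = [1.0 / (d + eps) for d, _ in nearest]
    total = sum(weights)
    dims = len(Y[0])
    return [sum(w * Y[i][d] for w, (_, i) in zip(weights, nearest)) / total
            for d in range(dims)]
```

Unlike Gaussian process regression, this needs no training step, at the cost of a distance computation against every training feature for each test frame.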

Embodiment 3

The human body motion tracking method based on a maximum geometric flow direction histogram is the same as in Embodiments 1-2; here the invention is verified by simulation.

(1) Experimental setup

The motion images are divided into classes, and the invention is verified separately on different subsets of the widely recognized HUMANEVA motion video sequence database. The simulation was programmed in the Matlab 7.0 environment on a Pentium Duo p6500 machine with 4G of memory and a 160G hard disk.

In Fig. 3, the upper images are the original frames and the lower images are the recovered three-dimensional poses. Figs. 3(a)-3(e) are successive screenshots of the sequences. Fig. 3a is the first screenshot, from the "walking" video sequence, in which a female subject walks in a circle on a red carpet, parallel to the camera viewing direction, and is shown walking toward the camera. The original image size is 640*480; after processing by step 1, each image containing the body is 192*64. The sequence includes segments of frames facing the camera and facing away from it. Fig. 3b is the second screenshot, in which a male subject walks away from the camera. Fig. 3c is the third screenshot, in which the male subject walks showing his left side. Fig. 3d is the fourth screenshot, in which the male subject boxes facing the camera. General image feature extraction methods have difficulty accurately outlining details blurred by front/back visual ambiguity and by limb occlusion, and this also holds for the test images above.

(2) Simulation content and results

A simulation experiment is carried out on the "walking" human motion video images shown in Fig. 2a using both the existing classical HOG descriptor feature and the method of the invention. The mean error (estimated value versus true value) over the 20 human joint points of the estimated three-dimensional motion pose is shown as an error histogram in Fig. 4. Fig. 4 compares the mean motion estimation error of HOG and of the invention. The horizontal axis lists the machine learning models: twin Gaussian process (TGP), twin Gaussian process with nearest neighbors (TGPKNN), weighted nearest-neighbor regression (WKNN), and Gaussian process (GP). The vertical axis is the mean motion tracking estimation error of the three-dimensional positions of the 20 human joint points, in millimeters (mm). The mean errors in the best case are: HOG, 28.376 mm; the method of this document (MGH), 25.10 mm.

The error values in Fig. 4 are computed from the Euclidean distances between the estimated and true positions of the 20 joint points; the lower the error, the better the result. Mean motion tracking with the HOG descriptor is fairly stable. By comparison, however, because the method of the invention extracts the directions of geometric flow detail and collects them into histogram statistics, its image representation ability is considerably better than that of earlier edge- and contour-based features, which matches the descriptive power that learning-based human motion tracking actually requires of its features. The method of the invention is trained and tested on the internationally recognized database (HUMANEVA); the experiments give good results, and under every learning model both accuracy and error are better than those of the classical method in the comparison. In addition, because the feature dimension is low, the computational resources required by the invention are also reduced.
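The error measure described above can be sketched as follows. This is a minimal illustration only: the function name `mean_joint_error` and the data layout (a list of frames, each a list of (x, y, z) joint positions in millimeters) are assumptions made for the example and are not taken from the original simulation code.

```python
import math

def mean_joint_error(estimated, truth):
    """Mean Euclidean distance between estimated and true joint positions.

    estimated, truth: lists of frames; each frame is a list of (x, y, z)
    joint coordinates (assumed layout). The result has the same unit as
    the inputs, for example millimeters.
    """
    total = 0.0
    count = 0
    for est_frame, true_frame in zip(estimated, truth):
        for est_joint, true_joint in zip(est_frame, true_frame):
            total += math.dist(est_joint, true_joint)  # Euclidean distance
            count += 1
    return total / count

# One frame with two joints: distances 0 and 5, so the mean is 2.5.
estimated = [[(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]]
truth = [[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]]
print(mean_joint_error(estimated, truth))
```

With 20 joints per frame, the same function would yield the per-model values plotted on the vertical axis of an error histogram of this kind.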

Embodiment 4

The image feature representation and human body motion tracking method based on the maximum geometric flow direction histogram is the same as in Embodiments 1-2; the invention is verified here by simulation.

(1) Experimental conditions

In the invention the moving image category is taken to be "jogging", and verification is carried out on subset S3 of the widely recognized motion video sequence database. The simulation is programmed in the Matlab environment.

Fig. 5 shows the test video sequence images for "jogging"; Figs. 5(a)-5(j) are the sequence of video screenshots. A male subject on a red carpet jogs in repeated circles, toward and away from the camera. The original image size is 640 × 480, and each image is 192*64 after processing by step 1. Because the motion differs only slightly from frame to frame in frames 7-16, ordinary contour extraction methods have difficulty resolving the detailed motion of the hand and foot joints.

The number of training samples used in the invention depends on the specific motion: for "jogging", n = 400 training samples. For each category, the smoothing kernel parameter is σ = 0.4 when Gaussian regression is used; when twin Gaussian regression is used, the input kernel function parameter is k_in = 0.3 and the output kernel function parameter is k_out = 2.4 × 1e-6. The number of sample points for K-nearest-neighbor clustering is 100.

(2) Simulation content and results

The method of the invention is used to carry out a motion tracking and three-dimensional recovery simulation on the continuous human motion video image sequence of Fig. 5; the tracking results are shown in Fig. 6. Figs. 6(a)-6(j) are the three-dimensional pose diagrams recovered from the video screenshots of Fig. 5, corresponding to Figs. 5(a)-5(j) respectively, and constitute the three-dimensional motion tracking result for "jogging".

As can be seen from Figs. 6(a)-6(j), the method of the invention markedly improves the accuracy of the limb movements and also locates the body contour fairly accurately; the three-dimensional poses recovered from the video tracking result look good visually as well.

The invention takes the raw video images to be processed as input and extracts the bounding box of the moving human body; applies a two-dimensional multi-scale wavelet transform to the image; searches for the best geometric flow direction using quadtree partitioning and a bottom-up merging rule; applies a one-dimensional wavelet transform to the quantized best geometric flow direction signal and reassembles it into two-dimensional form to obtain the bandelet coefficient matrix; computes the maximum geometric flow direction votes over 9 directions in each image region and takes this histogram statistic as the final image feature representation; and, through a machine learning regression process, learns the mapping from image features to three-dimensional motion data, then uses this mapping to accurately estimate the three-dimensional motion pose of new video images.

While further reducing the dimensionality of the image features, the invention improves their representational power. Given the distribution of the image data to be tested, it performs accurate three-dimensional pose estimation using the learned prior knowledge, which greatly reduces the ambiguity in computer image representation and visual recognition. The invention recovers human motion from video quickly and accurately, and the technique can further be used for human target recognition, detection, and three-dimensional pose reconstruction.

In summary, the invention mainly addresses the defects of existing feature extraction methods, namely the ambiguity of their descriptions of human contours and edges and their failure to reflect the internal geometric structure and texture patterns of the features, and provides a reliable method for human motion pose tracking and three-dimensional recovery. The invention computes quickly and gives accurate results, strengthens the robustness of the image features, and can be used for human target recognition, detection, and pose reconstruction.

Claims

1. A human body motion tracking method based on the maximum geometric flow direction histogram, characterized by comprising the following steps:

(1) converting the input training and test video image sets to be processed into continuous single-frame image sequences, identifying one main human target, and extracting the rectangular box containing the human body as the training sample image for subsequent processing;

(2) applying a two-dimensional discrete orthogonal wavelet transform to each training sample image to obtain the image subband information, the number of levels of the orthogonal wavelet transform being L=1;

(3) for the wavelet-transformed image, building a quadtree partition for each subband separately, the minimum size of each partition region being a 4*4 image sub-block, and extracting the best geometric flow direction within each partition region;

(4) using a one-dimensional discrete transform: for the geometric flow direction in each partition region, reordering the image subband information obtained in step (2) in ascending order of the projection error value along the best direction, to obtain the transformed signal f_θ;

(5) computing the quantized value $\tilde{f}_{θ}$ of the transformed signal f_θ, where the quantized value $\tilde{f}_{θ}$ consists entirely of quantized coefficients Q(x), computed as:

$Q(x) = \begin{cases} 0 & |x| \leq T \\ \operatorname{sign}(x) \cdot (q + 0.5) \cdot T & qT \leq |x| \leq (q + 1) T \end{cases}$

where x is a coefficient of the signal f_θ, Q(x) is a quantized coefficient of the quantized value $\tilde{f}_{θ}$, T is the quantization threshold, q ∈ Z is a constant parameter, and Z is the set of integers;
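The quantization rule of step (5) can be illustrated by the short sketch below. It is an illustration under stated assumptions, not the claimed implementation: the name `quantize` is hypothetical, q is taken as floor(|x|/T), and where the two cases of the formula overlap at |x| = T the first case (zero) is applied.

```python
import math

def quantize(x, T):
    """Dead-zone uniform quantizer Q(x) with threshold T (sketch).

    Assumptions: q = floor(|x| / T); at |x| == T the zero case is used.
    """
    magnitude = abs(x)
    if magnitude <= T:
        return 0.0
    q = math.floor(magnitude / T)            # index of the interval [qT, (q+1)T]
    return math.copysign((q + 0.5) * T, x)   # interval midpoint, sign of x

# Example with T = 1.0: small coefficients vanish, others snap to midpoints.
print([quantize(v, 1.0) for v in (0.5, 2.3, -3.7)])
```

Coefficients whose magnitude does not exceed T are set to zero, and every other coefficient is replaced by the midpoint of its interval, so the quantized signal is determined by the integer q and the sign alone.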

(6) applying the bandelet transform (bandeletization) to each quantized image sub-block to obtain the Bandelet coefficients, computing at the same time the best geometric flow direction on each image sub-block, and arranging the Bandelet coefficients in matrix form according to the layout of the original two-dimensional image;

(7) dividing this two-dimensional image matrix into a grid with 4 equal parts both horizontally and vertically, dividing each grid cell into 9 equal direction bins, and accumulating by direction the votes of the Bandelet coefficient intensity in each direction, to form the maximum geometric flow direction histogram feature;

(8) using the maximum geometric flow direction histogram feature to carry out machine learning tracking of the human motion pose, estimating for the input video images the spatial three-dimensional motion pose comprising 20 joint points, and restoring the estimated three-dimensional motion pose data to a joint skeleton in image form as the final tracking result.
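The histogram construction of step (7) can be sketched as follows. This is a simplified illustration under assumptions that are not stated in the claim: each pixel carries a Bandelet coefficient and the flow direction of its block as an angle in [0, π), the vote weight is the absolute coefficient value, and the name `mgh_feature` is hypothetical.

```python
import math

def mgh_feature(coeffs, directions, grid=4, bins=9):
    """Direction histogram of coefficient intensity over a grid x grid layout.

    coeffs, directions: 2-D lists of equal shape; directions are angles in
    [0, pi) (assumed encoding). Returns grid * grid * bins values.
    """
    height, width = len(coeffs), len(coeffs[0])
    feature = [0.0] * (grid * grid * bins)
    for y in range(height):
        for x in range(width):
            cell_y = y * grid // height           # grid row of this pixel
            cell_x = x * grid // width            # grid column of this pixel
            b = min(int(directions[y][x] / math.pi * bins), bins - 1)
            index = (cell_y * grid + cell_x) * bins + b
            feature[index] += abs(coeffs[y][x])   # intensity-weighted vote
    return feature

# 8 x 8 example: all flow directions vertical (pi / 2), unit coefficients.
coeffs = [[1.0] * 8 for _ in range(8)]
directions = [[math.pi / 2] * 8 for _ in range(8)]
feature = mgh_feature(coeffs, directions)
print(len(feature))
```

Under these assumptions a 4 × 4 grid with 9 direction bins gives a feature vector of 4 × 4 × 9 = 144 values per image.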
2. The human body motion tracking method based on the maximum geometric flow direction histogram according to claim 1, characterized in that the extraction of the best geometric flow direction in step (3) comprises:

3a) for each sub-block S of width L in the initial image, computing its best geometric flow direction, the best direction being determined by optimizing a Lagrangian penalty function L₀(S):

$L(f_{θ}, R) = \| f_{θ} - \tilde{f}_{θ} \|^{2} + λ \cdot T^{2} (R_{g} + R_{b})$

where f_θ is the true best basis function, $\tilde{f}_{θ}$ is its estimate, λ is the penalty scale factor, T is the quantization threshold, R is the number of bits, R_g is the number of bits required to entropy-code the best geometric flow parameter d, and R_b is the number of bits required to encode the quantized coefficients {Q(t)};

3b) setting L=2L and, for each L × L square, still denoted S, computing the best geometric flow direction and the corresponding Lagrangian function value L(S);

3c) for each bandelet square S of size L × L, denoting its four children by (S₁, S₂, S₃, S₄), and computing the Lagrangian function value obtained when these four children are joined as leaf nodes:

$\tilde{L} (S) = L_{0} (S_{1}) + L_{0} (S_{2}) + L_{0} (S_{3}) + L_{0} (S_{4}) + L_{0} (S) + λ \cdot T^{2}$

3d) setting $L_{0}(S) = \min\{L(S), \tilde{L}(S)\}$;

3e) repeating steps 3b)-3d) until L reaches the maximum partition scale, at which point L₀(S) gives the final quadtree partition result, and the best geometric flow direction of each bandelet block is obtained at the same time.
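The keep-or-split decision of steps 3b)-3e) can be sketched as below. This is a simplified illustration only: `leaf_cost` is a hypothetical callable standing in for the Lagrangian value L(S) of a block kept as a single leaf, the recursion is a top-down formulation that gives the same decisions as the bottom-up pass, and the split cost is modeled as the sum of the four children's values plus the penalty λ·T²; the additional L₀(S) term that appears in the formula of step 3c) is not modeled here.

```python
def best_partition(leaf_cost, x, y, size, min_size, split_penalty):
    """Return (cost, tree) for the square block at (x, y) with side `size`.

    leaf_cost(x, y, size): assumed cost of keeping the block as one leaf.
    tree is an (x, y, size) tuple for a leaf, or a list of four subtrees.
    """
    keep = leaf_cost(x, y, size)
    if size <= min_size:
        return keep, (x, y, size)
    half = size // 2
    children = [
        best_partition(leaf_cost, x + dx, y + dy, half, min_size, split_penalty)
        for dy in (0, half) for dx in (0, half)
    ]
    split = sum(cost for cost, _ in children) + split_penalty
    if keep <= split:                      # merging the four children is cheaper
        return keep, (x, y, size)
    return split, [tree for _, tree in children]

# Toy cost growing faster than the block area, so small blocks are preferred.
cost, tree = best_partition(lambda x, y, s: s ** 3, 0, 0, 8, 2, 1.0)
print(cost)
```

In the method itself the leaf cost would come from the Lagrangian of step 3a), with the minimum block size of 4*4 given in step (3) of claim 1.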
3. The human body motion tracking method based on the maximum geometric flow direction histogram according to claim 2, characterized in that the reordering in step (4) of the image subband information obtained in step (2) comprises:

4a) sorting all points in each sub-block in ascending order of their projection error value along the best geometric flow direction, to obtain a sorting index of length L²;

4b) reordering the two-dimensional discrete wavelet coefficients in the sub-block by this index, to obtain a one-dimensional signal f_d;

4c) applying a one-dimensional discrete wavelet transform to f_d, to obtain the transformed signal f_θ.
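Steps 4a)-4c) can be sketched as follows. The sketch makes several assumptions that the claim does not fix: the sort key is the projection -x·sin θ + y·cos θ of each pixel position onto the normal of the flow direction θ, the one-dimensional wavelet is a single-level Haar transform, and the names `reorder_block` and `haar_1d` are hypothetical.

```python
import math

def reorder_block(block, theta):
    """Flatten a square block into a 1-D signal f_d ordered by projection."""
    n = len(block)
    coords = [(x, y) for y in range(n) for x in range(n)]
    # Assumed sort key: position projected onto the normal of direction theta.
    coords.sort(key=lambda p: -p[0] * math.sin(theta) + p[1] * math.cos(theta))
    return [block[y][x] for x, y in coords]

def haar_1d(signal):
    """Single-level Haar transform: approximation part, then detail part."""
    scale = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx + detail

# A block that is constant along the horizontal direction (theta = 0).
block = [[1.0, 1.0], [3.0, 3.0]]
f_d = reorder_block(block, 0.0)
f_theta = haar_1d(f_d)
print(f_d, f_theta)
```

When the chosen direction matches the structure of the block, neighboring samples of f_d are similar and the detail coefficients of f_θ are close to zero, which is what makes the subsequent quantization effective.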
4. The human body motion tracking method based on the maximum geometric flow direction histogram according to claim 1, characterized in that the machine learning tracking of the human motion pose and the three-dimensional pose estimation in step (8) comprise:

8a) using a Gaussian process to learn the mapping between the maximum geometric flow direction histogram features X={x₁, x₂, ..., x_n} of the training sample images and the corresponding three-dimensional pose coordinate data Y={y₁, y₂, ..., y_n}, the kernel function parameters learned in this process determining the mapping from the feature space to the pose state space;

8b) for the human motion image sequence to be tested, likewise extracting the corresponding maximum geometric flow direction histogram features X'={x'₁, x'₂, ..., x'_n} by steps (2)-(7), and, according to the mapping f learned in step 8a), solving for the three-dimensional human motion pose data Y'={y'₁, y'₂, ..., y'_n} corresponding to the test samples, to obtain the video human motion tracking result;

8c) from the video human motion tracking result Y'={y'₁, y'₂, ..., y'_n}, accurately recovering the human motion of the input video, comprising the spatial three-dimensional motion pose of 20 joint points, and restoring the estimated three-dimensional motion pose data to a joint skeleton in image form as the final tracking result.
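The regression of steps 8a)-8b) can be illustrated with the sketch below, which computes only the posterior mean of a Gaussian process. The assumptions are labeled here and in the comments: the kernel is a radial basis function whose width `sigma` is an illustrative parameter, `noise` is a small regularization term, no kernel parameters are learned, and the names `gp_fit` and `gp_predict` are hypothetical.

```python
import math

def rbf(a, b, sigma):
    """Radial basis function kernel (assumed kernel choice)."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2.0 * sigma ** 2))

def solve(A, B):
    """Solve A X = B by Gaussian elimination with partial pivoting."""
    n, m = len(A), len(B[0])
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + m):
                M[r][c] -= factor * M[col][c]
    X = [[0.0] * m for _ in range(n)]
    for i in range(n - 1, -1, -1):          # back substitution
        for j in range(m):
            rest = sum(M[i][k] * X[k][j] for k in range(i + 1, n))
            X[i][j] = (M[i][n + j] - rest) / M[i][i]
    return X

def gp_fit(X, Y, sigma=0.4, noise=1e-6):
    """Return the weights alpha = (K + noise * I)^-1 Y."""
    K = [[rbf(a, b, sigma) + (noise if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    return solve(K, Y)

def gp_predict(X, alpha, x_new, sigma=0.4):
    """Posterior mean of the pose vector for one new feature vector."""
    k = [rbf(x_new, a, sigma) for a in X]
    return [sum(k[i] * alpha[i][j] for i in range(len(X)))
            for j in range(len(alpha[0]))]

# Toy data: 1-D features mapped to 2-D "poses".
X = [[0.0], [1.0], [2.0]]
Y = [[0.0, 1.0], [1.0, 0.0], [2.0, -1.0]]
alpha = gp_fit(X, Y)
print(gp_predict(X, alpha, [1.0]))
```

In the method each x would be a histogram feature vector and each y the stacked three-dimensional coordinates of the 20 joint points; the toy dimensions above are chosen only to keep the example short.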