CN108154104A

CN108154104A - Human posture estimation method based on joint superpixel features of depth images

Info

Publication number: CN108154104A
Application number: CN201711395472.1A
Authority: CN
Inventors: 孔德慧; 张雯晖; 王少帆; 王玉萍; 尹宝才
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2017-12-21
Filing date: 2017-12-21
Publication date: 2018-06-12
Anticipated expiration: 2037-12-21
Also published as: CN108154104B

Abstract

The present invention discloses a human posture estimation method based on joint superpixel features of depth images. A single depth image containing a human body is taken as input; human posture features are extracted from the depth image; the body is segmented into parts using these features; the segmented parts are clustered; and the positions of the human skeletal joints are estimated by sparse regression. The technical solution of the present invention improves the accuracy of human posture estimation and the real-time performance of the estimation method.

Description

Human posture estimation method based on joint superpixel features of depth images

Technical field

The invention belongs to the fields of computer vision and pattern recognition, and in particular relates to a human posture estimation method based on joint superpixel features of depth images.

Background art

Segmenting the human body into parts and locating its skeletal joints, together known as the human posture estimation problem, is a fundamental task in computer vision and human-computer interaction. Posture estimation is widely applied in action recognition, animation, gait analysis, content-based video and image retrieval, and intelligent video surveillance. With the development of depth acquisition devices such as the Kinect sensor and time-of-flight (TOF) cameras, much research has shifted from conventional color or grayscale intensity images to depth images. Compared with color images, depth images are largely unaffected by varying illumination, appearance, and background clutter.

Because the human body is an articulated structure with many degrees of freedom, a constrained parameter space, self-similar parts, and self-occlusion, directly modeling the human skeleton is extremely difficult. The difficulty of human posture estimation lies in building a complex representation of the body joints and computing joint positions from unmarked data; real-time application requirements make the problem harder still.

In terms of input data, human posture estimation methods are divided into single-frame methods and image-sequence methods. Compared with sequence-based methods, single-frame methods accumulate no error and need no failure recovery: the required posture is obtained directly from a single image. However, lacking the temporal context of the motion, they easily misjudge ambiguous postures.

In terms of methodology, early methods were based on human body modeling: they search the human state space to match and align a body model with image features, for example the common iterative closest point method, or methods that fit a body model using Markov chain detectors for the head, torso, and limbs. Such model-fitting methods usually require an initialization step and a body model that matches the real human body, and they are computationally complex and expensive. Machine-learning methods have gradually been adopted instead: random decision forests (RDF), support vector machines (SVM), k-nearest-neighbor classification (KNN), deep learning, and others have all been applied to human posture estimation. These learning methods need no prior body model, but they depend on a sufficiently large and diverse training set, which lengthens training; moreover, extracting effective, accurate, and stable feature descriptors remains a major challenge.

In 2011, Shotton et al. proposed a real-time human posture estimation method that was successfully deployed in Kinect applications. It combines depth difference features with a random decision forest to classify pixels into body parts, and finally obtains the skeletal joints by mean-shift clustering. One problem with random decision forests is that the more trees a forest contains, the more stable its output; yet as trees are added, both training and test time grow, which limits the forest size in real-time applications. At the same time, depth image resolution keeps increasing as technology advances, multiplying the number of pixels to process and placing ever higher real-time demands on processing methods.

Superpixel segmentation, an image segmentation technique in image processing, groups semantically similar pixels into a single superpixel region. Processing then changes from pixel-by-pixel to whole superpixel blocks, which can speed up complex image-processing pipelines by an order of magnitude and makes more complex feature computation feasible in applications with real-time requirements. Simple linear iterative clustering (SLIC) is an excellent superpixel segmentation method: its segments are compact and similar in size, and it preserves neighborhood relationships better than other superpixel methods. Few methods apply SLIC-based superpixel segmentation to depth maps. Some measure pixel distance directly by Euclidean distance in the 3D point-cloud space, but the resulting superpixels vary greatly in size; some add gradient-direction information to enhance semantic segmentation, at the cost of more computation; and some segment superpixels from combined color and depth, which requires additional input.

Summary of the invention

To extract human body features quickly and accurately and compute them efficiently, thereby improving the accuracy of human posture estimation and the real-time performance of the estimation method, the present invention proposes a human posture estimation method based on joint superpixel features. It extracts a novel superpixel-based joint feature that fuses a superpixel depth difference feature with a superpixel geodesic distance feature; applies this joint feature in a random decision forest to segment the body into parts; then applies K-means clustering to each segmented part and estimates the human posture by sparse regression. The technical problems solved by the present invention include: fast and effective superpixel segmentation; body part segmentation with superpixel-based joint features; and sparse regression of human posture from body-part clusters.

To achieve the above object, the present invention adopts the following technical solution:

The human posture estimation method of the present invention takes a single depth image containing a human body as input, extracts human posture features from the depth image, segments the body into parts using these features, clusters the segmented parts, and estimates the positions of the human skeletal joints by sparse regression. The overall framework comprises the following steps:

Step (1): μSLIC superpixel segmentation:

The present invention segments the depth map into superpixels using simple linear iterative clustering with a weight coefficient μ (μSLIC). The method has two stages. The first is the initialization stage: for a depth image I, the depth space (u, v, d(u, v)) is converted into the corresponding three-dimensional point-cloud space (x, y, z), where u, v are two-dimensional image coordinates, d(u, v) is the depth value at position (u, v), and x, y, z are three-dimensional spatial coordinates. The depth image is evenly divided by a δ × δ grid into cells holding N_s seeds; the pixels in each cell are added to the cluster centered on its seed, and the seed is moved to the geometric center obtained by averaging the positions of all pixels in the cluster. The second is the iteration stage: for every seed, each pixel in its 3δ × 3δ neighborhood is measured against the seed with the distance D_s and assigned to the cluster of its nearest seed, after which the positions of the N_s seeds are updated. These steps are repeated until the process converges or the maximum number of iterations N_i is reached.
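The conversion from depth space (u, v, d(u, v)) to the point cloud (x, y, z) can be sketched as below. This is a minimal illustration assuming a pinhole camera; the text does not state the camera model, so the intrinsics fx, fy, cx, cy and the function name are hypothetical.

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map d(u, v) to camera-space points (x, y, z).

    fx, fy, cx, cy are assumed pinhole intrinsics (focal lengths and principal
    point, in pixels). Returns an (H, W, 3) array; zero depth maps to the origin.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # v: row index, u: column index
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```

Each point keeps its (u, v) grid position in the returned array, so the δ × δ seeding and the 3δ × 3δ search window can still be expressed in image coordinates while distances are measured on (x, y, z).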

The distance metric D_s between a pixel X_k(x_k, y_k, z_k) and the center of the i-th cluster is designed as:

where μ is the weight that adjusts the compactness of the superpixels.

Step (2): Human body part segmentation with SDDF+SGDF joint superpixel features:

A joint feature (SDDF+SGDF) is used that combines the superpixel depth difference feature (SDDF) with the superpixel geodesic distance feature (SGDF). A set of offsets is obtained by random uniform sampling within a circle of radius Θ centered on the geometric center of superpixel χ_s. On image I, the N_SDDF SDDF values f_θ and the N_SGDF SGDF values g_θ are combined into a feature of dimension N_SDDF + N_SGDF for superpixel χ_s:
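Sampling offsets uniformly over a disc is easy to get subtly wrong: drawing the radius itself uniformly crowds samples near the center, so the radius is taken as Θ·sqrt(U). A minimal sketch; the function name and the use of one shared offset set split 800/200 between SDDF and SGDF are assumptions, while the 800+200 split and Θ = 180 pixels are the values reported in the experiments below.

```python
import numpy as np

def sample_offsets(n, radius, seed=0):
    """Draw n 2-D offsets uniformly over a disc of the given radius."""
    rng = np.random.default_rng(seed)
    r = radius * np.sqrt(rng.uniform(0.0, 1.0, n))   # sqrt -> uniform in area
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    return np.stack([r * np.cos(phi), r * np.sin(phi)], axis=1)

# Offsets are generated once, in advance, and reused for every superpixel.
offsets = sample_offsets(1000, 180.0, seed=0)
theta_sddf, theta_sgdf = offsets[:800], offsets[800:]
```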

1) Superpixel depth difference feature f_θ: for a superpixel χ_s segmented in depth map I, let θ be an offset generated in advance by random uniform sampling within a circle of radius Θ around its geometric center. The depth difference feature value is:

where d_I(·) denotes the depth value of image I at a given pixel location.
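The SDDF formula itself is not reproduced in this text. The sketch below assumes the single-pixel depth difference feature of Shotton et al. with its second offset set to zero, transferred to superpixels: f_θ(I, χ_s) = d_I(c_s + θ / d_I(c_s)) − d_I(c_s), where c_s is the image position of the superpixel's geometric center. The depth normalization of θ, the background constant, and the function names are all assumptions.

```python
import numpy as np

BACKGROUND_DEPTH = 1e4  # assumed "far" value for probes on background or outside the image

def depth_at(depth, p):
    """d_I(p): depth at the rounded pixel p = (u, v), or the background constant."""
    u, v = int(round(p[0])), int(round(p[1]))
    h, w = depth.shape
    if 0 <= v < h and 0 <= u < w and depth[v, u] > 0:
        return float(depth[v, u])
    return BACKGROUND_DEPTH

def sddf(depth, center_uv, offsets):
    """Depth difference features of one superpixel (assumed Shotton-style form).

    center_uv: (u, v) image position of the superpixel's geometric center.
    offsets:   (n, 2) array of pre-sampled offsets theta.
    """
    d0 = depth_at(depth, center_uv)
    probes = np.asarray(center_uv, dtype=float) + offsets / d0  # depth-normalized probes
    return np.array([depth_at(depth, p) - d0 for p in probes])
```

Dividing θ by the center depth makes the probe pattern shrink for distant bodies, which is the usual reason for this normalization in depth-difference features.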

2) Superpixel geodesic distance feature g_θ: first, an undirected graph is built from the segmented superpixels that contain foreground, with these foreground superpixels as its vertices. Two rules then decide whether an edge is added between two vertices. Rule 1: if two superpixels χ_i and χ_j contain directly adjacent pixels, and the absolute depth difference between those adjacent pixels is less than δ_d, an edge between the two superpixels is added to the graph, weighted by dist(χ_i, χ_j), the Euclidean distance between the two superpixel centers. Rule 2: for an outlier χ_i that is still not connected to any other vertex after Rule 1, find its nearest superpixel χ_c = argmin_χ dist(χ_i, χ) and add an edge to χ_c with weight dist(χ_i, χ_c). Applying the Floyd-Warshall algorithm to the resulting connected undirected graph then yields the shortest distances between all pairs of superpixels.
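The two edge rules and the all-pairs shortest paths can be sketched with NumPy as follows. Assumptions: superpixels arrive as a label map with -1 for background, pixel adjacency is 4-connectivity, and `centers` holds each superpixel's 3-D geometric center; the function names are hypothetical.

```python
import numpy as np

def build_superpixel_graph(labels, depth, centers, delta_d):
    """Weighted adjacency matrix over foreground superpixels (inf = no edge)."""
    depth = np.asarray(depth, dtype=float)           # avoid unsigned wrap-around
    n = len(centers)
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    W = np.full((n, n), np.inf)
    np.fill_diagonal(W, 0.0)
    # Rule 1: directly adjacent pixels of two superpixels with similar depth
    for a, b, da, db in (
        (labels[:, :-1], labels[:, 1:], depth[:, :-1], depth[:, 1:]),   # horizontal pairs
        (labels[:-1, :], labels[1:, :], depth[:-1, :], depth[1:, :]),   # vertical pairs
    ):
        ok = (a >= 0) & (b >= 0) & (a != b) & (np.abs(da - db) < delta_d)
        for i, j in zip(a[ok], b[ok]):
            W[i, j] = W[j, i] = dist[i, j]
    # Rule 2: link every still-isolated vertex to its nearest superpixel
    if n > 1:
        for i in range(n):
            if np.all(np.isinf(np.delete(W[i], i))):
                d = dist[i].copy()
                d[i] = np.inf
                c = int(np.argmin(d))
                W[i, c] = W[c, i] = dist[i, c]
    return W

def floyd_warshall(W):
    """All-pairs shortest paths, relaxing through each vertex k in turn."""
    D = W.copy()
    for k in range(len(D)):
        D = np.minimum(D, D[:, k:k + 1] + D[k:k + 1, :])
    return D
```

Floyd-Warshall costs O(n³) in the number of vertices, which is affordable for the roughly one hundred foreground superpixels per image reported in the experiments, but not for a graph built over individual pixels.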

For an offset θ sampled randomly in the same way as for the depth difference feature, the geodesic distance feature value of superpixel χ_s is:

where SP_I(X) denotes the superpixel of image I to which pixel X belongs, and d_geodesic denotes the shortest distance between two vertices of the undirected graph constructed above, i.e., the geodesic distance over the set of superpixels.
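As with SDDF, the exact SGDF formula is not reproduced in this text. A plausible reading of the definitions above is g_θ(I, χ_s) = d_geodesic(χ_s, SP_I(c_s + θ / d_I(c_s))): the geodesic distance from χ_s to the superpixel that the offset probe lands in. The sketch assumes this form, a Shotton-style depth normalization of θ, and a large constant for probes on background; the names are hypothetical.

```python
import numpy as np

FAR = 1e4  # assumed value when a probe lands on background or outside the image

def sgdf(labels, geo, s, center_uv, d0, offsets):
    """Geodesic distance features of superpixel s (assumed form).

    labels:    (H, W) superpixel id per pixel, -1 for background (plays SP_I).
    geo:       (n, n) all-pairs shortest paths over superpixels (d_geodesic).
    center_uv: (u, v) image position of the geometric center of superpixel s.
    d0:        depth at that center, used to normalize the offsets.
    """
    h, w = labels.shape
    feats = []
    for t in offsets:
        u = int(round(center_uv[0] + t[0] / d0))
        v = int(round(center_uv[1] + t[1] / d0))
        if 0 <= v < h and 0 <= u < w and labels[v, u] >= 0:
            feats.append(geo[s, labels[v, u]])
        else:
            feats.append(FAR)
    return np.array(feats)
```

Because the lookup is a table read into the precomputed distance matrix, each SGDF value costs about the same as an SDDF value once the graph has been solved.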

3) Random decision forest classification of superpixels: the joint feature F_I(χ) of each superpixel is classified with a random decision forest.

First, N_t trees are trained to form the forest. During training, the information entropy and information gain must be computed for every split node. The information entropy of a node containing the training sample set S is:

$$H(S) = -\sum_{l} p(l \mid S)\,\log p(l \mid S)$$

where l is a body part label and p(l | S) is the proportion of samples in S labeled as part l. The random decision forest algorithm selects the split that yields the largest information gain. A split P = {P_left, P_right} of a node divides its training sample set S into {S_left, S_right}, and the information gain of the split is defined as:

$$G(P) = H(S) - \sum_{i \in \{\mathrm{left},\,\mathrm{right}\}} \frac{|S_i|}{|S|}\, H(S_i)$$
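The entropy and information gain described above, and the choice of the split that maximizes the gain, can be sketched as follows. The split test `feature < tau` on a single feature dimension and the base-2 logarithm are assumptions (the base does not change which split wins); the function names are hypothetical.

```python
import numpy as np

def entropy(labels, n_classes):
    """H(S) = -sum_l p(l|S) log2 p(l|S) over the part labels in S."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=n_classes) / len(labels)
    p = p[p > 0]                                   # 0 * log 0 is taken as 0
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, go_left, n_classes):
    """Gain of splitting S into S_left (go_left is True) and S_right."""
    n = len(labels)
    left, right = labels[go_left], labels[~go_left]
    return entropy(labels, n_classes) - sum(
        len(part) / n * entropy(part, n_classes) for part in (left, right))

def best_threshold(feature_values, labels, thresholds, n_classes):
    """Pick the threshold tau on one feature dimension that maximizes the gain."""
    gains = [information_gain(labels, feature_values < tau, n_classes)
             for tau in thresholds]
    k = int(np.argmax(gains))
    return thresholds[k], gains[k]
```

In a full forest this search runs over a random subset of feature dimensions and thresholds at every node, and the winning (dimension, threshold) pair is stored in the node.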

After the random decision forest has been trained, each tree t in the forest gives the probability p_t(l | I, χ) that superpixel χ of image I belongs to part l, and the whole forest averages these into $p(l \mid I, \chi) = \frac{1}{N_t}\sum_{t=1}^{N_t} p_t(l \mid I, \chi)$. The part with the largest probability is taken as the class of the superpixel, and this result is assigned to every pixel the superpixel contains.
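Averaging the per-tree posteriors, choosing the most probable part per superpixel, and copying that label to the superpixel's pixels takes a few lines of NumPy. A sketch, assuming the per-tree posteriors are already available as an array; the function name is hypothetical.

```python
import numpy as np

def label_pixels(superpixel_ids, tree_posteriors):
    """Per-pixel part labels from per-superpixel forest outputs.

    superpixel_ids:  (H, W) superpixel id per pixel, -1 for background.
    tree_posteriors: (n_superpixels, N_t, n_parts) array of p_t(l | I, chi).
    """
    forest = tree_posteriors.mean(axis=1)          # average over the N_t trees
    sp_labels = np.argmax(forest, axis=1)          # most probable part per superpixel
    out = np.full(superpixel_ids.shape, -1)
    fg = superpixel_ids >= 0
    out[fg] = sp_labels[superpixel_ids[fg]]        # every pixel inherits its superpixel's part
    return out
```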

Step (3): Human posture sparse regression based on body-part cluster features:

Once the random decision forest has classified the superpixels, every foreground pixel has a part label. A feature that maps from parts to joints is then constructed, and the skeletal joint positions are obtained by sparse regression.

1) Human posture representation by part cluster features: each part l is clustered with K-means into N_k cluster points, which are sorted by their distance to a preset reference part to form a vector. The geometric center c₀ of all foreground pixels is then concatenated with the vectors of all N_p parts, giving the new human posture representation based on part cluster features:
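A sketch of the part cluster feature: K-means on each part, the part's clusters ordered by distance to a reference part, and everything concatenated after c₀. The plain Lloyd's K-means, the `parent` mapping, and the function names are assumptions for illustration; each part is assumed to contain at least N_k foreground points.

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means (a stand-in; the text does not name an implementation)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)].astype(float)
    for _ in range(iters):
        assign = np.argmin(((points[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = points[assign == j].mean(axis=0)
    return centers

def part_cluster_feature(points, part_labels, parent, k=3):
    """c = [c0, sorted clusters of part 0, ..., sorted clusters of part N_p - 1].

    points:      (n, 3) foreground 3-D points.
    part_labels: (n,) part id in 0..N_p-1 for each point.
    parent:      hypothetical mapping; parent[l] is the reference part whose
                 center orders the clusters of part l (e.g. head -> torso).
    """
    c0 = points.mean(axis=0)
    n_parts = len(parent)
    part_centers = [points[part_labels == l].mean(axis=0) for l in range(n_parts)]
    feats = [c0]
    for l in range(n_parts):
        cl = kmeans(points[part_labels == l], k)
        order = np.argsort(np.linalg.norm(cl - part_centers[parent[l]], axis=1))
        feats.extend(cl[order])                    # nearest cluster first
    return np.concatenate(feats)
```

Sorting by distance to a fixed reference part gives every cluster a stable slot in c, so the same feature dimension always describes, say, "the head cluster nearest the torso".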

2) Sparse regression: the goal of human posture estimation is to obtain the positions y of N_J three-dimensional skeletal joints. Suppose there are N training images with known joint positions y_i and part features c_i (i = 1, ..., N), where N_q is the number of part feature points. A regression matrix A is defined whose j-th block A_j maps the feature c to the j-th skeletal joint (j = 1, ..., N_J). The sparse regression model is then y_i ≈ A c_i, and the projection matrix A can be obtained from the following N_J independent optimizations:

where y_i(3j−2 : 3j) denotes the sub-vector of y_i from dimension 3j−2 to dimension 3j. With the trained matrix A and the part cluster feature c, the three-dimensional joint positions are obtained by linear sparse regression as y = Ac.
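The per-joint objective is not reproduced in this text. Assuming it is an L1-regularized least-squares (lasso) problem, min over A_j of ½ Σ_i ‖y_i(3j−2 : 3j) − A_j c_i‖² + λ‖A_j‖₁, each of the N_J problems can be solved on its own, for example by iterative soft-thresholding (ISTA). ISTA is a swapped-in solver for illustration; λ, the iteration count, and the function names are hypothetical.

```python
import numpy as np

def fit_joint_regressor(C, Yj, lam=0.1, iters=500):
    """ISTA for min_A 0.5*||Yj - A C||_F^2 + lam*||A||_1 (assumed objective).

    C:  (D, N) part cluster features c_i stored as columns.
    Yj: (3, N) positions of joint j over the N training images.
    Returns A_j with shape (3, D).
    """
    L = np.linalg.norm(C, 2) ** 2                 # Lipschitz constant of the gradient
    A = np.zeros((Yj.shape[0], C.shape[0]))
    for _ in range(iters):
        grad = (A @ C - Yj) @ C.T                  # gradient of the squared-error term
        Z = A - grad / L
        A = np.sign(Z) * np.maximum(np.abs(Z) - lam / L, 0.0)   # soft-thresholding
    return A

def fit_regression_matrix(C, Y, lam=0.1):
    """Stack the N_J independent per-joint solutions into A, shape (3*N_J, D)."""
    n_joints = Y.shape[0] // 3
    return np.vstack([fit_joint_regressor(C, Y[3 * j:3 * j + 3], lam)
                      for j in range(n_joints)])
```

At test time the joints are simply `A @ c` for a single feature vector c, which is why the regression stage adds almost nothing to the run time.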

In experiments, the present invention observed that the skeletal joints produced by sparse regression sometimes do not lie on foreground pixels. Because human joints normally lie at the center of a local body part, and, allowing for changes in viewpoint, at least on the foreground pixels of the body, the regression result can be matched to the nearest foreground pixel so that joints that deviate from the foreground are corrected. This further improves the precision of the final regressed joints.
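The nearest-foreground correction can be sketched in image space: a joint whose projection falls off the foreground mask is moved to the nearest foreground pixel, and joints already on the foreground are left alone. Whether the matching is done in 2-D or in the 3-D point cloud is not stated; the 2-D version, the brute-force search, and the function name are assumptions.

```python
import numpy as np

def snap_off_foreground(joints_uv, mask):
    """Move joints lying off the foreground to the nearest foreground pixel.

    joints_uv: (N_J, 2) float (u, v) image positions of the regressed joints.
    mask:      (H, W) boolean foreground mask (assumed non-empty).
    """
    fg_v, fg_u = np.nonzero(mask)
    fg = np.stack([fg_u, fg_v], axis=1).astype(float)
    out = joints_uv.astype(float).copy()
    h, w = mask.shape
    for n, (u, v) in enumerate(joints_uv):
        ui, vi = int(round(u)), int(round(v))
        on_body = 0 <= vi < h and 0 <= ui < w and mask[vi, ui]
        if not on_body:                            # only off-foreground joints move
            out[n] = fg[np.argmin(((fg - (u, v)) ** 2).sum(axis=1))]
    return out
```

The depth at a corrected pixel could then be read from the depth map to lift the joint back to 3-D; for large images a k-d tree over the foreground pixels would replace the brute-force search.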

Description of the drawings

Fig. 1 Human posture estimation framework;

Fig. 2 Part clustering result with K = 3;

Fig. 3(a) Tree depth and number of trees;

Fig. 3(b) Offset range and feature dimension;

Fig. 4 Superpixel parameters: (a) μ = 1.0; (b) μ = 1.5; (c) μ = 2.0;

Fig. 5 Combinations of SDDF and SGDF features;

Fig. 6 Classification results with single-pixel features and superpixel features;

Fig. 7 Qualitative classification results using single-pixel features (PDDF+PGDF) and superpixel features (SDDF+SGDF);

Fig. 8 Posture regression results on the CMUSD and XiDian datasets;

Fig. 9 Posture regression results on the EVAL dataset.

Detailed description of the embodiments

As shown in Fig. 1, the present invention provides a human posture estimation method for single-frame depth images. It takes a single depth image containing a human body as input, extracts human posture features from the depth image, segments the body into parts using these features, clusters the segmented parts, and estimates the positions of the human skeletal joints by sparse regression. The overall framework comprises the following steps:

(1) μSLIC superpixel segmentation:

The present invention segments the depth map into superpixels using simple linear iterative clustering with a weight coefficient μ (μSLIC). The method has two stages. The first is the initialization stage: for a depth image I, the depth space (u, v, d(u, v)) is converted into the corresponding three-dimensional point-cloud space (x, y, z). The depth image is evenly divided by a δ × δ grid into cells holding N_s seeds; the pixels in each cell are added to the cluster centered on its seed, and the seed is moved to the geometric center of all pixels in the cluster. The second is the iteration stage: for every seed, each pixel in its 3δ × 3δ neighborhood is measured against the seed with the distance D_s and assigned to the cluster of its nearest seed, after which the positions of the N_s seeds are updated. These steps are repeated until the process converges or the maximum number of iterations N_i is reached.

To limit the influence of noise and of regions with abrupt depth changes on the superpixels, while also speeding up the superpixel stage, the distance metric D_s between pixel X_k and the center of the i-th cluster is designed as:

where μ is the weight that adjusts superpixel compactness. When μ is small, the segmentation tends to be more uniform in the two-dimensional image space, but depth details are poorly represented. When μ is large, pixels of similar depth are more easily grouped into one superpixel, but many elongated regions may appear and compactness suffers.

Because the algorithm uses the distance metric D_s only to compare which seed is nearer and never needs its actual value, the implementation uses D_s² in place of D_s; this avoids the square-root operation and speeds up execution.
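The iteration stage, including the squared-distance shortcut, can be sketched as follows. Because the text does not spell out D_s, the sketch uses a stand-in, D_s² = (x_k − x_i)² + (y_k − y_i)² + μ²(z_k − z_i)², chosen only to show where μ enters: a larger μ lets depth differences dominate, as described above. The label-map representation and the function name are assumptions; `points` is the back-projected (x, y, z) map of the depth image.

```python
import numpy as np

def mu_slic(points, mask, delta, mu=1.5, max_iter=10):
    """muSLIC-style clustering of foreground pixels; returns labels (-1 = background).

    points: (H, W, 3) camera-space points; mask: (H, W) boolean foreground mask.
    """
    h, w = mask.shape
    vv, uu = np.mgrid[0:h, 0:w]
    # Initialization: one cluster per delta x delta grid cell that contains foreground
    cells_per_row = -(-w // delta)
    cell_id = (vv // delta) * cells_per_row + (uu // delta)
    labels = np.full((h, w), -1)
    labels[mask] = np.unique(cell_id[mask], return_inverse=True)[1]
    half = int(np.ceil(1.5 * delta))               # 3*delta x 3*delta search window
    weight = np.array([1.0, 1.0, mu * mu])         # stand-in: mu^2 scales the depth term
    for _ in range(max_iter):
        idx = labels[mask]
        cnt = np.bincount(idx).astype(float)
        # Update step: 3-D geometric centers and their 2-D image positions
        c3 = np.stack([np.bincount(idx, weights=points[mask][:, d]) for d in range(3)],
                      axis=1) / cnt[:, None]
        cu = np.bincount(idx, weights=uu[mask]) / cnt
        cv = np.bincount(idx, weights=vv[mask]) / cnt
        best = np.full((h, w), np.inf)
        new = labels.copy()                        # pixels no window reaches keep their label
        for i in range(len(cnt)):
            u0, v0 = int(round(cu[i])), int(round(cv[i]))
            ys = slice(max(v0 - half, 0), min(v0 + half + 1, h))
            xs = slice(max(u0 - half, 0), min(u0 + half + 1, w))
            d2 = (((points[ys, xs] - c3[i]) ** 2) * weight).sum(axis=-1)  # D_s^2, no sqrt
            d2[~mask[ys, xs]] = np.inf
            closer = d2 < best[ys, xs]
            best[ys, xs][closer] = d2[closer]      # slices are views, so this writes through
            new[ys, xs][closer] = i
        new[mask] = np.unique(new[mask], return_inverse=True)[1]   # drop emptied clusters
        if np.array_equal(new, labels):            # converged
            break
        labels = new
    return labels
```

Comparing squared distances yields exactly the same assignments as comparing D_s, because the square root is monotonic on non-negative values.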

(2) Human body part segmentation with SDDF+SGDF joint superpixel features:

The Kinect system demonstrated the effectiveness of single-pixel depth difference features for posture estimation. This method proposes a joint feature (SDDF+SGDF) that combines the superpixel depth difference feature (SDDF) with the superpixel geodesic distance feature (SGDF). A set of offsets is obtained by random uniform sampling within a circle of radius Θ centered on the geometric center of superpixel χ_s. On image I, the N_SDDF SDDF values f_θ and the N_SGDF SGDF values g_θ are combined into a feature of dimension N_SDDF + N_SGDF for superpixel χ_s:

where d_I(·) denotes the depth value of image I at a given pixel location.

2) Superpixel geodesic distance feature g_θ: first, an undirected graph is built from the segmented superpixels that contain foreground, with these foreground superpixels as its vertices. Two rules then decide whether an edge is added between two vertices. Rule 1: if two superpixels χ_i and χ_j contain directly adjacent pixels, and the absolute depth difference between those adjacent pixels is less than δ_d, an edge between the two superpixels is added to the graph, weighted by dist(χ_i, χ_j), the Euclidean distance between the two superpixel centers. Rule 2: for an outlier χ_i that is still not connected to any other vertex after Rule 1, find its nearest superpixel χ_c = argmin_χ dist(χ_i, χ) and add an edge to χ_c with weight dist(χ_i, χ_c). Applying the Floyd-Warshall algorithm to the resulting connected undirected graph then yields the shortest distances between all pairs of superpixels.

where SP_I(X) denotes the superpixel of image I to which pixel X belongs, and d_geodesic denotes the shortest distance between two vertices of the undirected graph constructed above, i.e., the geodesic distance over the set of superpixels.

3) Random decision forest classification of superpixels: the joint feature F_I(χ) of each superpixel is classified with a random decision forest.

First, N_t trees are trained to form the forest. During training, the information entropy and information gain must be computed for every split node. The information entropy of a node containing the training sample set S is:

$$H(S) = -\sum_{l} p(l \mid S)\,\log p(l \mid S)$$

where l is a body part label and p(l | S) is the proportion of samples in S labeled as part l. The random decision forest algorithm selects the split that yields the largest information gain. A split P = {P_left, P_right} of a node divides its training sample set S into {S_left, S_right}, and the information gain of the split is defined as:

$$G(P) = H(S) - \sum_{i \in \{\mathrm{left},\,\mathrm{right}\}} \frac{|S_i|}{|S|}\, H(S_i)$$

(3) Human posture sparse regression based on body-part cluster features:

Once the random decision forest has classified the superpixels, every foreground pixel is labeled and the image is segmented into the predefined parts. Among the required skeletal joints, however, some (such as the head and chest joints) lie at the center of a part (such as the head or torso), while others (such as the elbow joint) lie where two parts meet (the upper arm and the forearm). The present invention therefore designs a feature that maps from parts to joints and obtains the joint positions by sparse regression.

1) Human posture representation by part cluster features: each part l is clustered with K-means into N_k cluster points, which are sorted by their distance to a preset reference part (the main part connected to part l) to form a vector. The geometric center c₀ of all foreground pixels is then concatenated with the vectors of all N_p parts, giving the new human posture representation based on part cluster features:

2) Sparse regression: the goal of human posture estimation is to obtain the positions y of N_J three-dimensional skeletal joints. Suppose there are N training images with known joint positions y_i and part features c_i (i = 1, ..., N), where N_q is the number of part feature points. A regression matrix A is defined whose j-th block A_j maps the feature c to the j-th skeletal joint (j = 1, ..., N_J). The sparse regression model is then y_i ≈ A c_i, and the projection matrix A can be obtained from the following N_J independent optimizations:

where y_i(3j−2 : 3j) denotes the sub-vector of y_i from dimension 3j−2 to dimension 3j. With the trained matrix A and the part cluster feature c, the three-dimensional joint positions are obtained by linear sparse regression as y = Ac.

The present invention uses a new human posture estimation framework and proposes a new joint superpixel feature representation for depth images. Body parts are segmented by applying a random decision forest to this joint feature, cluster features are extracted from the segmented parts, and the final skeletal joint positions are obtained by sparse regression. Experiments on several datasets compare the method with others and verify the effectiveness of the proposed feature and framework.

Embodiment 1:

Machine-learning methods usually require large datasets for training and validation. The present invention uses three datasets in its experiments. The EVAL dataset contains 3 subjects, each a different actor performing 8 action sequences, about 10,000 frames in total at a resolution of 320 × 240. The XiDian dataset contains 5 action sequences, 2850 frames in total at a resolution of 2048 × 2048; it was captured with a prototype high-resolution depth acquisition device, and its depth data are noisier. Because these datasets contain only depth data and cannot be used to train a random decision forest for body-part classification, the present invention uses the CMU motion capture database to generate the CMU synthetic dataset (CMUSD), which provides depth data together with part-label data. Rendered with the Poser software, it covers 113 subjects and 2549 action sequences; each posture is rendered from 8 cameras at random positions, mostly frontal, giving more than 820,000 depth images of 640 × 480 with part labels. CMUSD covers a rich variety of actions.

As Fig. 3(a) shows, the prediction accuracy of the random decision forest increases with tree depth, but as the trees grow deeper the gain in accuracy becomes less and less noticeable while the cost of the trees keeps rising. The present invention therefore sets the tree depth to 20. Adding trees has the same effect, so to balance execution efficiency the forest size is set to 8 trees. For the range of the feature offsets and the feature dimension, Fig. 3(b) shows that good results are obtained with an offset range of 180 pixels and a feature dimension of 1000. This set of parameters is used in all subsequent experiments.

(1) μSLIC superpixel segmentation:

For superpixel segmentation, the 640 × 480 images of CMUSD are initially divided with a 12 × 12-pixel grid, giving on average about 120 superpixels that contain human foreground. For datasets of other resolutions, the initial superpixel size is scaled according to the proportion of foreground pixels, so that the foreground yields a similar number of superpixels. During segmentation, the pixels contained in each superpixel stabilize once the number of iterations reaches a certain value, while the run time grows in proportion to the number of iterations; the maximum number of iterations is therefore set to N_i = 10. For the superpixel distance-metric parameter, Fig. 4(a) shows that the head superpixels fail to capture the contour of the chin; in Fig. 4(c) the contour details of each part are captured well, but many elongated superpixels appear along the body edges. In Fig. 4(b) the superpixels are fairly uniform in size while still capturing detail. The subsequent experiments of the present invention use μ = 1.5.
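One way to realize the resolution-dependent grid, assuming each superpixel covers about δ² foreground pixels and the target is the roughly 120 foreground superpixels observed on CMUSD (120 × 12² = 17,280 foreground pixels at δ = 12). The scaling rule and the names are illustrative, not a formula stated in the text.

```python
import math

REF_FG_SUPERPIXELS = 120   # average foreground superpixel count on CMUSD with a 12 x 12 grid

def grid_size_for(foreground_pixels):
    """Grid step delta giving about REF_FG_SUPERPIXELS foreground superpixels.

    Assumes each superpixel covers about delta**2 foreground pixels, so
    delta = sqrt(foreground_pixels / 120), rounded to a whole pixel.
    """
    return max(1, round(math.sqrt(foreground_pixels / REF_FG_SUPERPIXELS)))
```

For example, a body covering 16 times as many pixels as a typical CMUSD body (plausible at 2048 × 2048) gets δ = 48 instead of 12.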

(2) Superpixel feature extraction:

In this experiment, 12 CMUSD sequences are used, about 80,000 images in total containing about 1,000,000 superpixels. For each sequence, a random 50% is used for training and the other 50% for testing.

In Fig. 5, different bars represent different feature combinations. The figure shows that 0+1000, which uses only the geodesic distance feature (GDF), performs worse than 1000+0, which uses only the depth difference feature (DDF); judging by its accuracy, however, the geodesic feature is still useful on its own. In particular, mixing the two features as 800+200 gives better results than depth difference alone, which shows that for some postures and parts the geodesic distance feature usefully complements the depth difference feature, although the depth difference feature does most of the work. The feature finally adopted is therefore a 1000-dimensional superpixel feature that mixes depth difference and geodesic distance as 800+200.

(3) Comparison of single-pixel and superpixel classification:

Single-pixel features (PDDF+PGDF) and superpixel features (SDDF+SGDF) are compared on CMUSD with the same feature dimension (800+200) and the same random forest settings. For PDDF+PGDF, 120 pixels are randomly sampled from each depth map as the training set. The final average accuracy is 92.105 for PDDF+PGDF and 92.468 for SDDF+SGDF. Fig. 6 shows that the accuracies of the two fluctuate slightly across subjects; given the randomness of the sampled points, there is no clear difference between them. Fig. 7 shows that SDDF+SGDF may assign some pixels to an adjacent, different part, because a superpixel's class applies to every pixel in it; however, superpixels make semantically identical regions tend toward consistent labels, and they greatly speed up both geodesic-feature extraction and testing.

(4) Posture regression:

When extracting part features for posture regression, the number of K-means clusters is set to 3, as shown in Fig. 2. Each of the 10 parts is clustered into 3 classes, and the clusters are sorted by their Euclidean distance to the parent part they connect to (for example, the head clusters by their distance to the torso). The resulting feature c has 93 dimensions.
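The 93 dimensions follow from the layout of c: the three-dimensional foreground center c₀ plus N_k = 3 three-dimensional cluster points for each of the N_p = 10 parts:

```latex
\dim(c) = \underbrace{3}_{c_0} + N_p \cdot N_k \cdot 3 = 3 + 10 \times 3 \times 3 = 93
```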

For CMUSD, 50% of the images are used as the training set for the random forest and the sparse regression, and the rest are used as the test set. Fig. 8 shows that the limb joints are estimated less accurately than joints such as the head and chest. This is because the extremities move more violently over a larger range, and once they are occluded, pixel classification is strongly affected, which in turn degrades the final regression result.

In the experiment on the XiDian database, cross-validation is performed with 4 sequences for training and 1 for testing, giving a mean average precision (mAP) of 91.7. This shows that the method produces good results on high-resolution, noisier data. Although the XiDian data are noisier, the motions are relatively simple, so the accuracy is higher than on CMUSD.

For the EVAL dataset, frames are removed in which some joints lie far from the body foreground or in which distances between joints clearly exceed the proportions of the human body, leaving an average of 3,000 frames per subject. In the pixel-classification stage, the random forest trained on all CMUSD images is used; in the regression stage, two EVAL subjects are used for training and the remaining one for testing, and the mean over the three splits is reported as the cross-validation result.

Fig. 9 compares the method with the algorithms of Ye et al. and Jung et al. on the EVAL dataset. The algorithm proposed by Ye et al. in 2014 is based on a Gaussian mixture model; it must fit a body model and runs slowly. The random walk tree model proposed by Jung et al. in 2015 greatly improves execution efficiency but was validated only on small data. The figure shows that joints near the center of the body are relatively accurate, while the extremities are less precise: the extremities are classified less accurately in the pixel-classification stage, which directly shifts the cluster points used as regression features and hence reduces the precision of posture regression.

(5) Run time

The run times of the single-pixel method and the superpixel method are compared using identical feature dimension settings and identical random forest settings. Because computing geodesic distances has time complexity O(n³), geodesic distance is practically unusable as a feature for high-resolution images in a real-time setting. Although the superpixel-based method adds the time needed to compute the superpixels, it greatly improves the efficiency of feature extraction and also shortens random forest classification. Even when only the depth difference is used as the feature, the superpixel method maintains accuracy while speeding up the whole algorithm by 1.5 to 8 times across different resolutions. Table 1 shows that the method meets real-time requirements on datasets of various resolutions.

Table 1 Execution time (unit: ms)

The experiments in this part show that, across a variety of resolutions and data qualities, the new joint feature effectively expresses regional characteristics, and the whole framework computes the human pose effectively and in real time.

The superpixel generation method uses a pixel distance measure that takes both 2D and 3D information into account, downsampling from pixels to superpixels so that the amount of data processed directly drops by orders of magnitude. The method extracts, from a single depth frame, a joint feature that fuses depth difference and geodesic distance information, making combined use of the global and local relationships between pixels and improving the classification accuracy of body parts. Compared with previous work, sparse regression on part-wise cluster feature points is used to estimate the human joints, which not only reduces processing time but also yields higher pose estimation accuracy.

Claims

1. A human posture estimation method based on depth image superpixel joint features, characterized in that it comprises the following steps:

Step (1): μSLIC superpixel segmentation

Superpixel segmentation is performed on the depth map with simple linear iterative clustering weighted by a coefficient μ (μSLIC), in two stages. The first is the initialization stage: the depth space (u, v, d(u, v)) of depth image I is converted into the corresponding 3D point cloud space (x, y, z), and the depth image is divided evenly by a δ × δ grid so that it contains N_s seeds; the pixels of each grid cell are added to the cluster centered on that cell's seed point, and each seed point is moved to the geometric center of all pixels in its cluster. The second is the iteration stage: for every seed point, the distance between each pixel in the seed's 3δ × 3δ neighborhood and the seed point is measured by D_s, each pixel is assigned to the cluster of its closest seed point, and the positions of the N_s seed points are updated. These steps are repeated until the process converges or the maximum number of iterations N_i is reached;
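The two μSLIC stages can be sketched in NumPy as follows. This is a minimal illustration under assumptions, not the patented implementation: the pinhole intrinsics `fx`, `fy` (principal point at the image center) used for back-projection are hypothetical, and because the exact D_s of claim 2 is not reproduced in the text, the sketch assumes a hypothetical D_s that combines the 3D point-cloud distance with a μ-weighted image-plane distance normalized by δ. All function names are illustrative.

```python
import numpy as np


def back_project(depth, fx, fy, cx, cy):
    """Map (u, v, d(u, v)) to camera-space (x, y, z); the pinhole model is an assumption."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    z = depth.astype(float)
    xyz = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=-1)
    uv = np.stack([u, v], axis=-1).astype(float)
    return xyz, uv


def cluster_means(labels, feats, n_s, old=None):
    """Geometric center of every cluster; empty clusters keep their previous center."""
    flat = labels.ravel()
    counts = np.bincount(flat, minlength=n_s)
    sums = np.stack([np.bincount(flat, weights=feats[..., c].ravel(), minlength=n_s)
                     for c in range(feats.shape[-1])], axis=-1)
    means = sums / np.maximum(counts, 1)[:, None]
    if old is not None:
        means[counts == 0] = old[counts == 0]
    return means


def mu_slic(depth, delta, mu, fx=365.0, fy=365.0, max_iter=10):
    h, w = depth.shape
    xyz, uv = back_project(depth, fx, fy, (w - 1) / 2, (h - 1) / 2)
    # Initialization stage: one seed per delta x delta grid cell, moved to the cell's center.
    n_cols = -(-w // delta)
    labels = (np.arange(h)[:, None] // delta) * n_cols + np.arange(w)[None, :] // delta
    n_s = int(labels.max()) + 1
    c_xyz, c_uv = cluster_means(labels, xyz, n_s), cluster_means(labels, uv, n_s)
    half = 1.5 * delta  # half-width of the 3*delta x 3*delta search window
    for _ in range(max_iter):
        best = np.full((h, w), np.inf)
        new_labels = labels.copy()
        for k in range(n_s):
            u0, v0 = c_uv[k]
            win = (slice(max(int(v0 - half), 0), min(int(v0 + half) + 1, h)),
                   slice(max(int(u0 - half), 0), min(int(u0 + half) + 1, w)))
            d3 = np.linalg.norm(xyz[win] - c_xyz[k], axis=-1)   # 3D point-cloud distance
            d2 = np.linalg.norm(uv[win] - c_uv[k], axis=-1)     # image-plane distance
            dist = np.sqrt(d3 ** 2 + (mu * d2 / delta) ** 2)    # hypothetical D_s
            closer = dist < best[win]
            best[win][closer] = dist[closer]                    # views, so these write through
            new_labels[win][closer] = k
        converged = np.array_equal(new_labels, labels)
        labels = new_labels
        c_xyz = cluster_means(labels, xyz, n_s, c_xyz)
        c_uv = cluster_means(labels, uv, n_s, c_uv)
        if converged:
            break
    return labels
```

For example, `mu_slic(depth, 6, 10.0)` returns a label map of the same shape as `depth`; because the 3D term dominates across depth discontinuities, two surfaces at clearly different depths end up in different superpixels.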

Step (2): Human body part segmentation with the SDDF+SGDF superpixel joint feature

The joint feature SDDF+SGDF, which combines the superpixel depth difference feature (SDDF) and the superpixel geodesic distance feature (SGDF), is used. For a superpixel χ_s, a set of offsets is obtained by random uniform sampling within the circle of radius Θ centered on the geometric center of χ_s. On image I, the N_SDDF SDDF values f_θ and the N_SGDF SGDF values g_θ are combined to obtain the feature F_I(χ_s) of superpixel χ_s, of dimension N_SDDF + N_SGDF.
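Offsets that are uniform over a disk of radius Θ need the radius drawn as Θ·√U rather than Θ·U, otherwise samples cluster near the center. The sketch below is illustrative only; `sample_offsets` and `joint_feature` are hypothetical names, and the fixed seed reflects that the offsets are generated once, in advance, and shared by all superpixels.

```python
import math
import random


def sample_offsets(n, radius, seed=0):
    """Draw n offsets uniformly over a disk of the given radius (Theta in the text)."""
    rng = random.Random(seed)  # fixed seed: offsets are generated once and reused
    offsets = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())  # sqrt makes the density uniform over the area
        phi = 2.0 * math.pi * rng.random()
        offsets.append((r * math.cos(phi), r * math.sin(phi)))
    return offsets


def joint_feature(sddf_values, sgdf_values):
    """F_I(chi_s): the N_SDDF depth-difference values followed by the N_SGDF geodesic values."""
    return list(sddf_values) + list(sgdf_values)
```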

1) Superpixel-based depth difference feature f_θ: for a superpixel χ_s segmented from depth map I, let θ be an offset generated in advance by random uniform sampling within the circle of radius Θ around its geometric center; the depth difference feature value is:

where d_I(·) denotes the depth value at a given pixel location.
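The formula for f_θ is not reproduced above. As a hedged illustration only, the sketch below assumes f_θ = d_I(c_s + θ) − d_I(c_s), where c_s is the superpixel's geometric center in image coordinates (u, v). Nearest-pixel lookup and the hypothetical constant `BACKGROUND_DEPTH`, returned for probes that leave the image or hit zero (invalid) depth, are also assumptions.

```python
import numpy as np

BACKGROUND_DEPTH = 10000.0  # hypothetical depth for probes off the image or on invalid pixels


def depth_at(depth, point):
    """d_I(.): depth at the nearest pixel (u, v), or BACKGROUND_DEPTH if unavailable."""
    u, v = int(round(point[0])), int(round(point[1]))
    h, w = depth.shape
    if 0 <= v < h and 0 <= u < w and depth[v, u] > 0:
        return float(depth[v, u])
    return BACKGROUND_DEPTH


def sddf(depth, center, offset):
    """Assumed form: depth at (center + offset) minus depth at the superpixel center."""
    probe = (center[0] + offset[0], center[1] + offset[1])
    return depth_at(depth, probe) - depth_at(depth, center)
```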

2) Superpixel-based geodesic distance feature g_θ: first, an undirected graph is built from the superpixels in the image that contain foreground, its vertices being these foreground superpixels. Whether to add an edge between two vertices is then decided by two rules. ① If two superpixels χ_i and χ_j contain directly adjacent pixels and the absolute depth difference between the adjacent pixels is less than δ_d, an edge between the two superpixels is added to the graph, with weight dist(χ_i, χ_j), the Euclidean distance between the two superpixel centers. ② For an outlier χ_i that is still not connected to any other vertex by rule ①, the superpixel closest to it, χ_c = argmin_χ dist(χ_i, χ), is found, and an edge to χ_c with weight dist(χ_i, χ_c) is added. Applying the Floyd-Warshall algorithm to the connected undirected graph thus constructed yields the shortest distances between all pairs of superpixels.
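The two edge rules and the all-pairs step can be sketched in NumPy as follows. Assumptions for this illustration: superpixels are given as a label map with −1 for background, adjacency is checked on the 4-neighborhood, and superpixel centers are supplied as 3D points. Floyd-Warshall runs in O(n³) for n vertices, so building the graph on superpixels instead of pixels keeps n small.

```python
import numpy as np


def build_superpixel_graph(labels, depth, centers, delta_d):
    """Weighted adjacency matrix over foreground superpixels (labels >= 0); inf = no edge."""
    labels = np.asarray(labels)
    depth = np.asarray(depth, dtype=float)  # float avoids wrap-around for unsigned depth maps
    centers = np.asarray(centers, dtype=float)
    n = len(centers)

    def dist(i, j):
        return float(np.linalg.norm(centers[i] - centers[j]))

    W = np.full((n, n), np.inf)
    np.fill_diagonal(W, 0.0)
    # Rule 1: directly adjacent pixels of two superpixels whose depth gap is below delta_d.
    pairs = [(labels[:, :-1], labels[:, 1:], depth[:, :-1], depth[:, 1:]),   # horizontal
             (labels[:-1, :], labels[1:, :], depth[:-1, :], depth[1:, :])]   # vertical
    for a, b, da, db in pairs:
        mask = (a >= 0) & (b >= 0) & (a != b) & (np.abs(da - db) < delta_d)
        for i, j in set(zip(a[mask].tolist(), b[mask].tolist())):
            W[i, j] = W[j, i] = dist(i, j)
    # Rule 2: a vertex still without edges is linked to its nearest superpixel.
    for i in range(n):
        if n > 1 and np.all(np.isinf(np.delete(W[i], i))):
            c = min((j for j in range(n) if j != i), key=lambda j: dist(i, j))
            W[i, c] = W[c, i] = dist(i, c)
    return W


def floyd_warshall(W):
    """All-pairs shortest paths (geodesic distances) in O(n^3) for n vertices."""
    D = W.copy()
    for k in range(D.shape[0]):
        D = np.minimum(D, D[:, [k]] + D[[k], :])
    return D
```

In a three-superpixel strip where the middle and right superpixels are separated by a large depth jump, rule ① links only the left and middle ones, rule ② then attaches the right one to its nearest center, and Floyd-Warshall sums the two edges for the left-to-right geodesic distance.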

For an offset θ obtained by the same random sampling as for the depth difference feature, the geodesic distance feature value of superpixel χ_s is expressed as:

where SP_I(X) denotes the superpixel of the image to which pixel X belongs, and d_geodesic denotes the shortest distance between two vertices of the undirected graph constructed above, i.e., the geodesic distance within the superpixel set;
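The geodesic feature formula is likewise not reproduced. From the definitions of SP_I(X) and d_geodesic, one plausible reading, which is an assumption here, is g_θ = d_geodesic(χ_s, SP_I(c_s + θ)): the geodesic distance from χ_s to the superpixel hit by the offset probe. The sketch takes a label map (−1 for background) and a precomputed all-pairs distance matrix; `FAR` is a hypothetical value for probes that miss the foreground.

```python
import numpy as np

FAR = 1e6  # hypothetical value when the probe falls off the image or on background


def sgdf(labels, geodesic, s, center, offset):
    """Assumed form: geodesic distance from superpixel s to SP_I(center + offset)."""
    u = int(round(center[0] + offset[0]))
    v = int(round(center[1] + offset[1]))
    h, w = labels.shape
    if not (0 <= v < h and 0 <= u < w) or labels[v, u] < 0:
        return FAR
    return float(geodesic[s, labels[v, u]])  # labels[v, u] plays the role of SP_I(X)
```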

3) Random decision forest classification of superpixels: the joint feature F_I(χ) of each superpixel is classified with a random decision forest.

First, N_t trees are trained to form the forest. During training of the random decision forest, the information entropy and information gain must be computed for every split node. The information entropy of a node whose training set of sample points is S is:

H(S) = −Σ_l p(l | S) log p(l | S)

where l is the body part to be classified and p(l | S) is the probability of class l in set S. The random decision forest algorithm chooses the split that yields the maximum information gain. A split P = {P_left, P_right} of a node divides the training sample set S contained in that node into {S_left, S_right}, and the information gain of the split is defined as:

InfGain(S, P) = H(S) − H(S | P) = H(S) − p_left H(S_left) − p_right H(S_right).
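A small pure-Python sketch of this split criterion. The logarithm base is not specified in the text; base 2 is assumed here, and p_left and p_right are taken as the fractions of S routed to each child, the usual convention. The single-feature threshold test `f < t` is likewise an assumed split form, and all names are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """H(S) = -sum_l p(l|S) log2 p(l|S) over the body-part labels in S."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(s, s_left, s_right):
    """InfGain(S, P) with p_left, p_right = fractions of S sent to each child."""
    n = len(s)
    return entropy(s) - len(s_left) / n * entropy(s_left) - len(s_right) / n * entropy(s_right)


def best_split(features, labels, thresholds):
    """Return (threshold, gain) maximizing the information gain on one feature dimension."""
    best = None
    for t in thresholds:
        left = [l for f, l in zip(features, labels) if f < t]
        right = [l for f, l in zip(features, labels) if f >= t]
        if left and right:
            g = info_gain(labels, left, right)
            if best is None or g > best[1]:
                best = (t, g)
    return best
```

For example, `best_split([1, 2, 3, 4], ['a', 'a', 'b', 'b'], [1.5, 2.5, 3.5])` picks the threshold 2.5, which separates the two parts perfectly and gains one full bit.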

After the random decision forest has been trained, passing image I through a tree t of the forest gives the probability p_t(l | I, χ) that superpixel χ is classified as part l, and the whole forest combines the N_t per-tree probabilities into an overall probability p(l | I, χ). The part with the maximum probability is chosen as the class of the superpixel and assigned as the classification result of all the pixels it contains.
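The rule that combines the N_t trees is not reproduced in the text. The sketch below assumes the common choice of averaging the per-tree distributions, then takes the most probable part per superpixel and copies it to every pixel of that superpixel. Array shapes and function names are hypothetical.

```python
import numpy as np


def forest_classify(tree_probs, sp_labels):
    """Classify superpixels and propagate the labels to their pixels.

    tree_probs: (N_t, n_superpixels, n_parts) array of per-tree p_t(l | I, chi).
    sp_labels:  per-pixel superpixel index, -1 for background.
    """
    p = np.mean(tree_probs, axis=0)           # assumed combination: average over trees
    sp_part = np.argmax(p, axis=1)            # most probable part per superpixel
    pixel_part = np.where(sp_labels >= 0, sp_part[np.clip(sp_labels, 0, None)], -1)
    return sp_part, pixel_part
```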

Step (3): Human pose sparse regression based on body part cluster features

After the superpixels have been classified by the random decision forest, every foreground pixel carries a part label. A feature mapping from body parts to joints is then used, and the positions of the skeletal points are obtained by sparse regression.

1) human body attitude of the cluster feature based on Divisional represents：For a position l, K mean cluster is carried out to it, is obtained To N_kIt is a to cluster point and according to the distance-taxis at the position and preset position, obtain vectorThen The geometric center c of all foreground pixels₀With all N_pA position is joined together, and is obtained new about based on position cluster feature Human body attitude expression:

2) Sparse regression: the goal of human pose estimation is to obtain the positions y of N_J 3D skeletal joints. Suppose there are N training images with known joint information y_i and part features c_i, where N_q is the number of part feature points and i = 1, ..., N. Define a regression matrix A whose j-th block maps the feature c to the j-th skeletal point (j = 1, ..., N_J). The sparse regression model is then y_i ≈ A c_i, and the projection matrix A can be obtained by solving the following N_J independent optimization problems:

where y_i(3j−2 : 3j) denotes the subvector of y_i from dimension 3j−2 to dimension 3j. With the trained matrix A and the part cluster feature c, the 3D positions of the skeletal points are obtained by linear sparse regression as y = Ac.
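The optimization problem itself is not reproduced above. Assuming, for illustration only, that each of the N_J problems is an L1-regularized least-squares (Lasso) fit of the three coordinates y_i(3j−2 : 3j) from c_i, the per-joint blocks of A can be computed by iterative soft thresholding (ISTA) as below. The regularization weight `lam` and the choice of solver are assumptions, not the patent's stated method. Python's 0-based slice `Y[:, 3*j:3*j+3]` corresponds to the 1-based dimensions 3j−2 to 3j.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def fit_joint(C, Yj, lam, iters=500):
    """ISTA for min_A 0.5 * ||Yj - C A^T||_F^2 + lam * ||A||_1.

    C: (N, d) part features c_i as rows; Yj: (N, 3) coordinates of one joint.
    Returns this joint's block of A, shape (3, d).
    """
    step = 1.0 / np.linalg.norm(C, 2) ** 2   # 1 / Lipschitz constant of the gradient
    A = np.zeros((Yj.shape[1], C.shape[1]))
    for _ in range(iters):
        grad = (C @ A.T - Yj).T @ C           # gradient of the quadratic term w.r.t. A
        A = soft_threshold(A - step * grad, step * lam)
    return A


def fit_regression(C, Y, lam=0.1, iters=500):
    """Solve the N_J independent problems and stack them into A of shape (3 * N_J, d)."""
    n_joints = Y.shape[1] // 3
    return np.vstack([fit_joint(C, Y[:, 3 * j:3 * j + 3], lam, iters) for j in range(n_joints)])
```

Prediction for a new feature vector c is then `A @ c`, which gives the stacked joint coordinates y. Solving each joint separately mirrors the N_J independent problems and keeps every subproblem small.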

2. The human posture estimation method based on depth image superpixel joint features according to claim 1, characterized in that, in step (1), the distance measure D_s between a pixel X_k(x_k, y_k, z_k) and the i-th cluster center is defined as:

where μ is a weight that adjusts the compactness of the superpixels.