CN108154104A - Human pose estimation method based on joint superpixel features of depth images



Publication number
CN108154104A
Authority
CN
China
Legal status
Granted
Application number
CN201711395472.1A
Other languages
Chinese (zh)
Other versions
CN108154104B (en)
Inventor
孔德慧
张雯晖
王少帆
王玉萍
尹宝才
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Application filed by Beijing University of Technology
Priority to CN201711395472.1A
Publication of CN108154104A
Application granted
Publication of CN108154104B
Current legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/231 Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

The invention discloses a human pose estimation method based on joint superpixel features of depth images. A single depth image containing a human body is taken as input; human pose features are extracted from the depth image; the body is segmented into parts using these features; the segmented parts are clustered; and the clustering result is fed to sparse regression to estimate the positions of the human skeleton points. The technical solution improves the accuracy of human pose estimation and the real-time performance of the pose estimation method.

Description

Human pose estimation method based on joint superpixel features of depth images
Technical field
The invention belongs to the fields of computer vision and pattern recognition, and in particular relates to a human pose estimation method based on joint superpixel features of depth images.
Background
Human pose estimation, which comprises body part segmentation and skeletal joint localization, is a fundamental task in computer vision and human-computer interaction. It is widely applied in action recognition, animation simulation, gait analysis, content-based video and image retrieval, and intelligent video surveillance. With the development of depth acquisition devices such as the Kinect sensor and time-of-flight (TOF) cameras, much research has gradually moved from conventional color or grayscale intensity images to depth images. Compared with color images, depth images avoid the influence of varying illumination, appearance and background noise.
Because the human body is an articulated structure with many degrees of freedom, a constrained parameter space, self-similar parts and self-occlusion, modeling the human skeleton directly is extremely difficult. The difficulty of human pose estimation lies in building a complex representation model of the body joints and computing the joint positions from unlabeled data; real-time application requirements make it harder still.
In terms of input data, human pose estimation methods divide into methods based on a single frame and methods based on image sequences. Compared with sequence-based methods, single-frame methods have no accumulated error, need no error recovery, and obtain the required pose directly from a single image; however, because they lack the temporal context of the motion, they easily misjudge ambiguous poses.
In terms of approach, early methods were based on human body modeling: they search the body state space in order to match and align the model with image features. Examples include the commonly used iterative closest point method, and methods that fit a human body model using Markov chains over detectors for the head, torso and limbs. Such fitting-and-matching methods usually require an initialization step and a realistic synthetic human model, and they are computationally expensive. Machine learning methods have gradually been adopted instead: random decision forests (RDF), support vector machines (SVM), k-nearest-neighbor classification (KNN) and deep learning have all been applied to human pose estimation. Learning-based methods need no prior human body model, but they depend on a sufficiently large and diverse training set, which increases training time, and extracting effective, accurate and stable feature descriptors remains a major challenge.
In 2011, Shotton et al. proposed a real-time human pose estimation method that was applied successfully in Kinect. It combines depth difference features with a random decision forest to classify body parts, and then obtains the skeleton points by mean-shift clustering regression. One problem with random decision forests is that the more trees the forest contains, the more stable its result; but as the number of trees grows, both the training time and the testing time increase, which in turn limits the size of the forest in real-time applications. Meanwhile, as technology advances, depth image resolution keeps increasing, so the number of raw pixels to be processed multiplies and the real-time requirements on processing methods become ever more demanding.
Superpixel segmentation is an image segmentation technique from image processing that groups semantically similar pixels into one superpixel region, so that an image can be processed one superpixel block at a time instead of pixel by pixel. For complex image processing programs this can improve efficiency by an order of magnitude, which makes it feasible to compute more complex features in applications that require real-time performance. Simple linear iterative clustering (SLIC) is an excellent superpixel segmentation method: its segments are compact and similar in size, and it preserves neighborhood relationships better than other superpixel segmentation methods. Few methods apply SLIC-based superpixel segmentation to depth maps. Some measure the distance between pixels directly by the Euclidean distance in 3D point cloud space, which yields superpixel blocks of widely varying size; some add gradient direction to strengthen semantic segmentation, at the cost of higher computational complexity; and others segment superpixels from combined color and depth, which requires additional input information.
Summary of the invention
To extract features of the human body quickly and accurately and compute them efficiently, and thereby improve both the accuracy and the real-time performance of human pose estimation, the invention proposes a human pose estimation method based on joint superpixel features. A novel superpixel-based joint feature is extracted, which fuses a superpixel depth difference feature with a superpixel geodesic distance feature. The joint feature is used with a random decision forest to segment the human body into parts; K-means clustering is then applied to the points of each segmented part, followed by sparse regression, to estimate the human pose. The technical problems addressed by the invention are: fast and effective superpixel segmentation; body part segmentation based on the joint superpixel feature; and sparse regression of the human pose based on body part clustering.
To achieve the above object, the invention adopts the following technical solution:
The human pose estimation method based on joint superpixel features of depth images takes a single depth image containing a human body as input, extracts human pose features from the depth image, segments the body into parts using these features, clusters the segmented parts, and applies sparse regression to the clustering result to estimate the positions of the human skeleton points. The overall framework comprises the following steps:
Step (1) μSLIC superpixel segmentation:
The invention performs superpixel segmentation of the depth map using simple linear iterative clustering with a weight μ (μSLIC). The method has two stages. The first is initialization: for a depth image I, the depth space (u, v, d(u, v)) is converted to the corresponding 3D point cloud space (x, y, z), where u, v are the 2D image coordinates, d(u, v) is the depth value at position (u, v), and x, y, z are the 3D space coordinates. The depth image is divided evenly by a δ × δ grid into N_s seeds; the pixels in each grid cell are added to the cluster centered on its seed point, and the seed point is moved to the geometric center obtained by averaging the positions of all pixels in the cluster. The second is iteration: for every seed point, within its 3δ × 3δ neighborhood, the distance between each pixel and the seed point is measured by the distance D_s; each pixel is assigned to the cluster of its nearest seed point, and the positions of the N_s seed points are updated. These steps are repeated until the process converges or the maximum number of iterations N_i is reached.
The distance metric D_s between a pixel X_k = (x_k, y_k, z_k) and the i-th cluster center is defined as:
where μ is the weight that adjusts the compactness of the superpixels.
Step (2) Body part segmentation with the SDDF+SGDF joint superpixel feature:
A joint feature (SDDF+SGDF) is used, combining the superpixel depth difference feature (SDDF) and the superpixel geodesic distance feature (SGDF). A set of offsets is obtained by uniform random sampling within a circle of radius Θ centered on the geometric center of superpixel χ_s. On image I, the N_SDDF SDDF values f_θ and the N_SGDF SGDF values g_θ are combined into a feature of superpixel χ_s with dimension N_SDDF + N_SGDF:
1) Superpixel depth difference feature f_θ: for a superpixel χ_s segmented in depth map I, and an offset θ generated in advance by uniform random sampling within the circle of radius Θ around its geometric center, the depth difference feature value is:
where d_I(·) denotes the depth value at a pixel location.
2) Superpixel geodesic distance feature g_θ: first, an undirected graph is built from the segmented superpixels that contain foreground, with these foreground superpixels as its vertices. Two rules then decide whether an edge is added between vertices. ① If two superpixels χ_i and χ_j contain directly adjacent pixels, and the absolute depth difference of the adjacent pixels is less than δ_d, an edge between the two superpixels is added to the graph, with weight:
which is the Euclidean distance between the centers of the two superpixels. ② For an outlier χ_i that, after the first rule, still has no edge connecting it to any other vertex, there is a superpixel χ_c = argmin_χ dist(χ_i, χ) at minimum distance from χ_i, and an edge to χ_c with weight dist(χ_i, χ_c) is added. Applying the Floyd-Warshall algorithm to the resulting connected undirected graph then yields the shortest distance between every pair of superpixels.
For an offset θ obtained at random in the same way as for the depth difference feature, the geodesic distance feature value of superpixel χ_s is:
where SP_I(X) denotes the superpixel to which pixel X belongs in image I, and d_geodesic denotes the shortest distance between two vertices of the undirected graph built above, i.e. the geodesic distance over the set of superpixels.
3) Random decision forest classification of superpixels: the joint feature F_I(χ) of a superpixel is classified with a random decision forest.
First, N_t trees are trained to form the forest. During training, the information entropy and the information gain are computed for every split node. The information entropy of a node whose training sample set is S is:
$$H(S) = -\sum_{l} p(l \mid S) \log p(l \mid S)$$
where l is a part label to be classified and p(l | S) is the probability of label l in the set S. The random decision forest algorithm selects the split that yields the maximum information gain. For a split P = {P_left, P_right} of a node, which divides the node's training sample set S into {S_left, S_right}, the information gain is defined as:
$$G(S, P) = H(S) - \sum_{i \in \{\mathrm{left}, \mathrm{right}\}} \frac{|S_i|}{|S|} H(S_i)$$
After training, a tree t in the forest gives, for image I, the probability p_t(l | I, χ) that superpixel χ is classified as part l, and the whole forest gives:
$$p(l \mid I, \chi) = \frac{1}{N_t} \sum_{t=1}^{N_t} p_t(l \mid I, \chi)$$
The part with the highest probability is taken as the class of the superpixel, and also as the classification result of the pixels the superpixel contains.
Step (3) Sparse regression of the human pose based on body part clustering features:
After the random decision forest has classified the superpixels, all foreground pixels are classified. A feature that maps from parts to joints is then used, and the positions of the skeleton points are obtained by sparse regression.
1) Human pose representation based on part clustering features: for a part l, K-means clustering is performed to obtain N_k cluster points, which are sorted by their distance to a preset part, giving a vector for that part. The geometric center c_0 of all foreground pixels is then combined with the vectors of all N_p parts to give a new human pose representation based on part clustering features:
2) Sparse regression: the goal of human pose estimation is to obtain the positions y of the N_J 3D skeleton joints. Suppose there are N training images with known joint positions y_i and part features c_i, where N_q is the number of part feature points and i = 1, ..., N. Define a regression matrix A composed of blocks A_j, where A_j maps the feature c to the j-th skeleton point (j = 1, ..., N_J). The sparse regression model is then y_i ≈ A c_i, and the projection matrix A can be obtained through the following N_J independent optimizations:
where y_i(3j-2 : 3j) denotes the subvector of y_i from dimension (3j-2) to dimension 3j. With the trained matrix A and the part clustering feature c, the 3D positions of the skeleton points are obtained by linear sparse regression as y = Ac.
In experiments it was observed that the skeleton points produced by sparse regression sometimes do not lie on foreground pixels. A human joint normally lies at the center of a local body part and, even allowing for changes of viewpoint, at least on foreground pixels belonging to the body. The regression result can therefore be matched to the nearest foreground pixel, which corrects skeleton points that deviate from the foreground and makes the final skeleton points more accurate than the raw regression result.
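The nearest-neighbor correction described above can be sketched as follows. This is a minimal illustration, assuming the foreground pixels are available as 3D points and that plain Euclidean distance is the matching criterion; the name `snap_to_foreground` and the toy data are hypothetical and do not come from the patent.

```python
import math


def snap_to_foreground(joint, foreground_points):
    """Return the foreground point closest to a regressed joint position.

    A joint that already coincides with a foreground point is returned
    unchanged, because its distance to that point is zero.
    """
    return min(foreground_points, key=lambda p: math.dist(joint, p))


# Toy data (assumed): three foreground points and one joint regressed off the body.
foreground = [(0.0, 0.0, 2.0), (0.1, 0.0, 2.0), (0.0, 0.1, 2.1)]
regressed_joint = (0.3, 0.0, 2.0)
corrected = snap_to_foreground(regressed_joint, foreground)
```

A linear scan is enough for a sketch; with many foreground points a spatial index would be the usual choice.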
Brief description of the drawings
Fig. 1: framework of human pose estimation;
Fig. 2: part clustering result for K = 3;
Fig. 3(a): tree depth and number of trees;
Fig. 3(b): offset range and feature dimension;
Fig. 4: superpixel parameters, where (a) μ = 1.0, (b) μ = 1.5, (c) μ = 2.0;
Fig. 5: combinations of SDDF and SGDF features;
Fig. 6: classification results of single-pixel features and superpixel features;
Fig. 7: visual comparison of classification with single-pixel features (PDDF+PGDF) and superpixel features (SDDF+SGDF);
Fig. 8: pose regression results on the CMUSD and XiDian datasets;
Fig. 9: pose regression results on the EVAL dataset.
Detailed description
As shown in Fig. 1, the invention provides a human pose estimation method for single-frame depth images. It takes a single depth image containing a human body as input, extracts human pose features from the depth image, segments the body into parts using these features, clusters the segmented parts, and applies sparse regression to the clustering result to estimate the positions of the human skeleton points. The overall framework comprises the following steps:
(1) μSLIC superpixel segmentation:
The invention performs superpixel segmentation of the depth map using simple linear iterative clustering with a weight μ (μSLIC). The method has two stages. The first is initialization: for a depth image I, the depth space (u, v, d(u, v)) is converted to the corresponding 3D point cloud space (x, y, z). The depth image is divided evenly by a δ × δ grid into N_s seeds; the pixels in each grid cell are added to the cluster centered on its seed point, and the seed point is moved to the geometric center of all pixels in the cluster. The second is iteration: for every seed point, within its 3δ × 3δ neighborhood, the distance between each pixel and the seed point is measured by the distance D_s; each pixel is assigned to the cluster of its nearest seed point, and the positions of the N_s seed points are updated. These steps are repeated until the process converges or the maximum number of iterations N_i is reached.
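The initialization stage can be illustrated with a short sketch. The text does not state how (u, v, d(u, v)) is converted to (x, y, z), so a standard pinhole camera model is assumed here, with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`; placing each seed at the center of its grid cell is likewise an assumption made for illustration.

```python
def back_project(u, v, d, fx, fy, cx, cy):
    """Map a depth pixel (u, v, d) to a 3D point (x, y, z).

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy); the patent does not specify the conversion.
    """
    z = d
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


def grid_seeds(width, height, delta):
    """Place one seed at the center of every delta x delta grid cell."""
    seeds = []
    for v0 in range(0, height, delta):
        for u0 in range(0, width, delta):
            # Clamp so that partial cells at the border stay inside the image.
            u = min(u0 + delta // 2, width - 1)
            v = min(v0 + delta // 2, height - 1)
            seeds.append((u, v))
    return seeds


# A 640 x 480 image with a 12 x 12 grid, the setting reported for CMUSD.
seeds = grid_seeds(640, 480, 12)
center_point = back_project(320, 240, 2.0, 525.0, 525.0, 320.0, 240.0)
```

In this sketch the seed count covers the whole image; only the seeds whose clusters contain foreground pixels would matter for the later stages.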
To avoid the influence of noisy regions or regions of sharply changing depth on the superpixels, and to speed up the superpixel stage, the distance metric D_s between pixel X_k and the i-th cluster center is defined as:
where μ is the weight that adjusts superpixel compactness. When μ is small, the segmentation tends to be more uniform in the 2D image plane, but depth detail is captured poorly. When μ is large, pixels of similar depth are more likely to be grouped into the same block, but many elongated regions may appear and compactness suffers.
Because the algorithm uses the distance metric D_s only to compare which of two distances is smaller, and never sums distances, the implementation replaces D_s with D_s^2. This avoids the square root and speeds up execution.
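The effect of comparing squared distances can be sketched as follows. The exact form of D_s is not reproduced in this text, so the metric below is only an assumed stand-in in which μ scales the depth term; `dist_sq` and `nearest_seed` are hypothetical names. The point of the sketch is that the nearest seed is the same whether D_s or D_s^2 is compared, since the square root is monotonic.

```python
import math


def dist_sq(p, c, mu):
    """Assumed squared metric: image-plane terms plus a mu-weighted depth term."""
    dx, dy, dz = p[0] - c[0], p[1] - c[1], p[2] - c[2]
    return dx * dx + dy * dy + (mu * dz) ** 2


def nearest_seed(p, seeds, mu, squared=True):
    """Index of the seed closest to p, comparing D_s^2 (default) or D_s."""
    def key(i):
        d2 = dist_sq(p, seeds[i], mu)
        return d2 if squared else math.sqrt(d2)
    return min(range(len(seeds)), key=key)


# Toy data (assumed): one seed differs from the pixel in the plane, one in depth.
pixel = (0.0, 0.0, 0.0)
seeds = [(3.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
small_mu = nearest_seed(pixel, seeds, 1.5)   # depth gap is cheap: seed 1 wins
large_mu = nearest_seed(pixel, seeds, 5.0)   # depth gap is costly: seed 0 wins
```

Under this assumed form, raising μ makes a depth difference more expensive, which matches the described tendency of large μ to group pixels of similar depth.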
(2) Body part segmentation with the SDDF+SGDF joint superpixel feature:
The Kinect system demonstrated the effectiveness of the depth difference feature as a single-pixel feature for pose estimation. This method proposes a joint feature (SDDF+SGDF) that combines the superpixel depth difference feature (SDDF) and the superpixel geodesic distance feature (SGDF). A set of offsets is obtained by uniform random sampling within a circle of radius Θ centered on the geometric center of superpixel χ_s. On image I, the N_SDDF SDDF values f_θ and the N_SGDF SGDF values g_θ are combined into a feature of superpixel χ_s with dimension N_SDDF + N_SGDF:
1) Superpixel depth difference feature f_θ: for a superpixel χ_s segmented in depth map I, and an offset θ generated in advance by uniform random sampling within the circle of radius Θ around its geometric center, the depth difference feature value is:
where d_I(·) denotes the depth value at a pixel location.
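Generating the offsets in advance by uniform random sampling inside a circle of radius Θ can be sketched as below. The function name `sample_offsets` and the fixed seed are hypothetical; the values 1000 and 180 are the feature dimension and offset range reported in the experiments. The square root on the radius is what makes the samples uniform over the area of the disc rather than concentrated near the center.

```python
import math
import random


def sample_offsets(n, radius, seed=0):
    """Draw n 2D offsets uniformly from a disc of the given radius."""
    rng = random.Random(seed)
    offsets = []
    for _ in range(n):
        # sqrt of a uniform variate gives a radius that is uniform in area.
        r = radius * math.sqrt(rng.random())
        phi = 2.0 * math.pi * rng.random()
        offsets.append((r * math.cos(phi), r * math.sin(phi)))
    return offsets


# 1000 offsets within 180 pixels, generated once and reused for every superpixel.
offsets = sample_offsets(1000, 180.0)
```

Fixing the seed keeps the same offsets at training and test time, which the features require in order to be comparable.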
2) Superpixel geodesic distance feature g_θ: first, an undirected graph is built from the segmented superpixels that contain foreground, with these foreground superpixels as its vertices. Two rules then decide whether an edge is added between vertices. ① If two superpixels χ_i and χ_j contain directly adjacent pixels, and the absolute depth difference of the adjacent pixels is less than δ_d, an edge between the two superpixels is added to the graph, with weight:
which is the Euclidean distance between the centers of the two superpixels. ② For an outlier χ_i that, after the first rule, still has no edge connecting it to any other vertex, there is a superpixel χ_c = argmin_χ dist(χ_i, χ) at minimum distance from χ_i, and an edge to χ_c with weight dist(χ_i, χ_c) is added. Applying the Floyd-Warshall algorithm to the resulting connected undirected graph then yields the shortest distance between every pair of superpixels.
For an offset θ obtained at random in the same way as for the depth difference feature, the geodesic distance feature value of superpixel χ_s is:
where SP_I(X) denotes the superpixel to which pixel X belongs in image I, and d_geodesic denotes the shortest distance between two vertices of the undirected graph built above, i.e. the geodesic distance over the set of superpixels.
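The graph construction and the all-pairs shortest paths can be sketched as follows. This is an illustration under stated assumptions: `adjacent_pairs` is taken to list only the superpixel pairs that already satisfy rule ① (adjacent pixels with depth difference below δ_d), superpixel centers are given as 3D points, and the function names are hypothetical.

```python
import math


def build_graph(centers, adjacent_pairs):
    """Weight matrix of the superpixel graph built with rules 1 and 2."""
    n = len(centers)
    w = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        w[i][i] = 0.0
    # Rule 1: adjacent superpixels are linked by their center distance.
    for i, j in adjacent_pairs:
        w[i][j] = w[j][i] = math.dist(centers[i], centers[j])
    # Rule 2: a vertex with no edge is linked to its nearest superpixel.
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if others and all(math.isinf(w[i][j]) for j in others):
            c = min(others, key=lambda j: math.dist(centers[i], centers[j]))
            w[i][c] = w[c][i] = math.dist(centers[i], centers[c])
    return w


def floyd_warshall(w):
    """All-pairs shortest path lengths for a weight matrix."""
    n = len(w)
    d = [row[:] for row in w]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


# Toy data (assumed): a chain of three superpixels plus one isolated superpixel.
centers = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (5.0, 1.0, 0.0)]
geodesic = floyd_warshall(build_graph(centers, [(0, 1), (1, 2)]))
```

In the toy data the geodesic distance from superpixel 0 to superpixel 3 follows the chain and is longer than the straight-line distance between their centers, which is the property that distinguishes this feature from a Euclidean one.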
3) Random decision forest classification of superpixels: the joint feature F_I(χ) of a superpixel is classified with a random decision forest.
First, N_t trees are trained to form the forest. During training, the information entropy and the information gain are computed for every split node. The information entropy of a node whose training sample set is S is:
$$H(S) = -\sum_{l} p(l \mid S) \log p(l \mid S)$$
where l is a part label to be classified and p(l | S) is the probability of label l in the set S. The random decision forest algorithm selects the split that yields the maximum information gain. For a split P = {P_left, P_right} of a node, which divides the node's training sample set S into {S_left, S_right}, the information gain is defined as:
$$G(S, P) = H(S) - \sum_{i \in \{\mathrm{left}, \mathrm{right}\}} \frac{|S_i|}{|S|} H(S_i)$$
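The split criterion can be sketched with the standard Shannon entropy and the size-weighted information gain, which are assumed here to match the definitions intended in the text. The base-2 logarithm and the function names are choices made for this illustration; any base gives the same ranking of candidate splits.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of the part labels in a sample set."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return (entropy(parent)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))


# Toy node (assumed): eight samples, half labelled head and half torso.
node = ["head"] * 4 + ["torso"] * 4
perfect = information_gain(node, node[:4], node[4:])       # separates the labels
useless = information_gain(node, node[::2], node[1::2])    # keeps the same mix
```

A split that separates the two labels completely recovers the full entropy of the node, while a split that leaves both children with the same label mix gains nothing.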
After training, a tree t in the forest gives, for image I, the probability p_t(l | I, χ) that superpixel χ is classified as part l, and the whole forest gives:
$$p(l \mid I, \chi) = \frac{1}{N_t} \sum_{t=1}^{N_t} p_t(l \mid I, \chi)$$
The part with the highest probability is taken as the class of the superpixel, and also as the classification result of the pixels the superpixel contains.
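Combining the trees and propagating the superpixel label to its pixels can be sketched as follows. Averaging the per-tree posteriors is the usual way to combine a random decision forest and is assumed here; the function names and the toy posteriors are hypothetical.

```python
def forest_posterior(tree_posteriors):
    """Average the per-tree posteriors p_t(l | I, chi) over the forest."""
    n_trees = len(tree_posteriors)
    labels = tree_posteriors[0].keys()
    return {l: sum(t[l] for t in tree_posteriors) / n_trees for l in labels}


def classify_superpixel(tree_posteriors):
    """Pick the part with the highest averaged probability."""
    avg = forest_posterior(tree_posteriors)
    return max(avg, key=avg.get)


def label_pixels(superpixel_of_pixel, superpixel_label):
    """Give every pixel the label of the superpixel it belongs to."""
    return [superpixel_label[s] for s in superpixel_of_pixel]


# Toy posteriors (assumed) from three trees for one superpixel.
trees = [
    {"head": 0.7, "torso": 0.3},
    {"head": 0.4, "torso": 0.6},
    {"head": 0.6, "torso": 0.4},
]
part = classify_superpixel(trees)
pixel_labels = label_pixels([0, 0, 1], {0: part, 1: "torso"})
```

Because one classification covers every pixel of a superpixel, the per-pixel cost of the forest is paid only once per superpixel.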
(3) Sparse regression of the human pose based on body part clustering features:
Once the random decision forest has classified the superpixels, all foreground pixels are classified, and the image is thereby segmented into the predefined parts. Regarding the required skeleton point information, however, some joints (such as the head joint and the chest joint) lie at the center of a part (such as the head and the torso), while others (such as the elbow joint) lie where two parts adjoin (the upper arm and the forearm). The invention therefore designs a feature that maps from parts to joints, and obtains the positions of the skeleton points by sparse regression.
1) Human pose representation based on part clustering features: for a part l, K-means clustering is performed to obtain N_k cluster points, which are sorted by their distance to a preset part (the main part connected to this part), giving a vector for that part. The geometric center c_0 of all foreground pixels is then combined with the vectors of all N_p parts to give a new human pose representation based on part clustering features:
2) Sparse regression: the goal of human pose estimation is to obtain the positions y of the N_J 3D skeleton joints. Suppose there are N training images with known joint positions y_i and part features c_i, where N_q is the number of part feature points and i = 1, ..., N. Define a regression matrix A composed of blocks A_j, where A_j maps the feature c to the j-th skeleton point (j = 1, ..., N_J). The sparse regression model is then y_i ≈ A c_i, and the projection matrix A can be obtained through the following N_J independent optimizations:
where y_i(3j-2 : 3j) denotes the subvector of y_i from dimension (3j-2) to dimension 3j. With the trained matrix A and the part clustering feature c, the 3D positions of the skeleton points are obtained by linear sparse regression as y = Ac.
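Applying a trained matrix and reading the joints out of y can be sketched as below. The training step is not shown, since the optimization objective is not reproduced in this text; the matrix `A` and feature `c` are toy values and `predict_joints` is a hypothetical name. The sketch only illustrates y = Ac and the block indexing y(3j-2 : 3j).

```python
def predict_joints(A, c):
    """Compute y = A c and split y into one (x, y, z) triple per joint."""
    y = [sum(a * x for a, x in zip(row, c)) for row in A]
    # The 1-based block y(3j-2 : 3j) is the 0-based slice y[3*(j-1) : 3*j].
    return [tuple(y[3 * j:3 * j + 3]) for j in range(len(y) // 3)]


# Toy regression matrix (assumed) for two joints and a 2-dimensional feature.
A = [
    [1.0, 0.0],
    [0.0, 1.0],
    [1.0, 1.0],
    [2.0, 0.0],
    [0.0, 2.0],
    [0.5, 0.5],
]
c = [1.0, 2.0]
joints = predict_joints(A, c)
```

Each group of three consecutive rows of A is one block A_j, which is why the N_J optimizations can be solved independently of one another.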
The invention uses a new human pose estimation framework and proposes a new joint superpixel feature representation on depth images. A random decision forest applied to the joint feature segments the body into parts; clustering features are extracted from the segmented parts, and sparse regression maps them to the final skeleton joint positions. The invention is compared with other methods in experiments on multiple datasets to assess their strengths and weaknesses and to verify the effectiveness of the proposed feature and framework.
Embodiment 1:
Machine learning methods usually require large datasets for training and validation. Three datasets are used in the experiments. The EVAL dataset contains 3 subjects, each corresponding to a different performer; each performer has 8 action sequences, about 10,000 frames in total, at a resolution of 320 × 240. The XiDian dataset contains 5 action sequences, 2,850 frames in total, at a resolution of 2048 × 2048; it was captured on a prototype for acquiring high-resolution depth data, and its depth data is relatively noisy. Because these datasets contain only depth data, they cannot be used to train the random decision forest for body part classification. The invention therefore uses a CMU synthetic dataset (CMUSD) with depth data and part labels, generated from the CMU motion capture database: 113 subjects and 2,549 action sequences in total, rendered with Poser software, each pose viewed by 8 cameras placed randomly but mainly in front, giving more than 820,000 depth images with part labels at 640 × 480. CMUSD covers a rich range of actions.
As Fig. 3(a) shows, the prediction accuracy of the random decision forest increases with tree depth, but the gain becomes less and less noticeable as the trees get deeper, while the cost of each tree keeps growing; the invention therefore uses a tree depth of 20. Increasing the number of trees has the same effect, so to balance execution efficiency the forest size is set to 8 trees. As for the range of feature offsets and the feature dimension, Fig. 3(b) shows that an offset range of 180 pixels and a feature dimension of 1000 give comparatively good results. This set of parameters is used in all subsequent experiments.
(1) μSLIC superpixel segmentation:
For superpixel segmentation, the 640 × 480 images of CMUSD are initially divided by a grid of 12 × 12 pixels, and the number of superpixels containing human foreground averages about 120. For datasets of other resolutions, the initial superpixel size is scaled according to the ratio of foreground pixels, so that the foreground yields the same order of magnitude of superpixels. During segmentation, the pixels contained in each superpixel stabilize once the number of iterations reaches a certain count, and running time is proportional to the number of iterations, so the maximum number of iterations is set to N_i = 10. As for the superpixel distance metric parameter μ, Fig. 4(a) shows that the contour detail of the chin is not captured by the head superpixels, while in Fig. 4(c) the contour details of each part are well captured but many elongated superpixels appear at the body edges. In Fig. 4(b) the superpixels are fairly uniform in size while still capturing detail. μ = 1.5 is used in the subsequent experiments.
(2) Superpixel feature extraction:
This experiment uses 12 CMUSD sequences, about 80,000 images and about 1,000,000 superpixels in total. From each sequence, 50% is randomly selected for training and the other 50% is used for testing.
The bars in Fig. 5 represent different feature combinations. The 0+1000 combination, which uses only the geodesic distance feature (GDF), performs worse than the 1000+0 combination, which uses only the depth difference feature (DDF), but its accuracy shows that the feature has some validity. In particular, mixing the two features as 800+200 gives better results than using the depth difference alone, which shows that the geodesic distance feature is a useful complement to the depth difference feature for certain poses and parts, although the depth difference feature does most of the work. The feature finally used is therefore a 1000-dimensional superpixel feature that mixes depth difference and geodesic distance in an 800+200 combination.
(3) Comparison of single-pixel and superpixel classification:
Single-pixel features (PDDF+PGDF) and superpixel features (SDDF+SGDF) are compared on CMUSD with the same feature dimension (800+200) and random forest settings. For PDDF+PGDF, 120 pixels are randomly sampled from each depth map as the training set. The final average accuracy is 92.105 for PDDF+PGDF and 92.468 for SDDF+SGDF. Fig. 6 shows that the accuracy of SDDF+SGDF and PDDF+PGDF fluctuates slightly across subjects; given the randomness of the sampled points, there is no significant gap between the two. Fig. 7 shows that with SDDF+SGDF some pixels may be merged into an adjacent, different part during classification, and the class of a superpixel affects all of the pixels it contains. Nevertheless, using superpixels makes semantically identical parts tend toward consistent classification, and it greatly improves the efficiency of geodesic distance feature extraction and of testing.
(4) Posture regression:
In the part feature extraction for posture regression, as shown in Fig. 2, the number of K-means cluster points is set to 3. Each of the 10 parts is clustered into 3 classes, and the cluster points are sorted by their Euclidean distance to the parent connected part (for example, for the head, by the distance to the torso). The dimension of c is 93.
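The stated dimension can be checked with a short calculation. It assumes, consistently with the claim text, that c concatenates the three-dimensional geometric center c_0 of all foreground pixels with N_k = 3 three-dimensional cluster points for each of the N_p = 10 parts:

```latex
\[
\dim(c) \;=\; \underbrace{3}_{c_0} \;+\; \underbrace{N_p}_{10} \times \underbrace{N_k}_{3} \times 3
\;=\; 3 + 90 \;=\; 93 .
\]
```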
For CMUSD, 50% of the images in the data set are used as the training set for the random forest and the sparse regression, and the rest are used as the test set. Fig. 8 shows that the joints of the limbs are estimated less accurately than joints such as the head and chest. This is because the extremities move more violently and over a larger range, and once an occlusion occurs it has a larger effect on pixel classification, which in turn affects the final regression result.
In the experiment on the XiDian database, 4 of its sequences are used as the training set and 1 as the test set for cross-validation, and the average accuracy (mAP) is 91.7. This shows that the method is effective and produces relatively good results on data with high resolution and heavy noise. Although the XiDian data are comparatively noisy, the motions are relatively simple, so the accuracy is higher than on CMUSD.
For the EVAL data set, frames in which some joints lie far from the body foreground, or in which the distances between joints clearly exceed what the human body structure allows, are removed, leaving about 3,000 frames per subject on average. In the pixel classification stage, the random forest model trained on all CMUSD images is used. In the regression stage, two EVAL subjects are used as the training set and the remaining one as the test set; the final result is the mean over the three folds of cross-validation.
Fig. 9 compares the method with the algorithms of Ye et al. and Jung et al. on the EVAL data set. The algorithm that Ye et al. proposed in 2014 is based on a Gaussian mixture model; it needs to fit a body model and runs slowly. The random tree walk model that Jung et al. proposed in 2015 greatly improves execution efficiency, but it was verified only on small data. The figure shows that joints near the body center are relatively accurate, while the extremities are less precise. This is because the extremities are classified less accurately in the pixel classification stage, which directly affects the positions of the cluster points used as regression features and thus the precision of the posture regression.
(5) Run time:
Using the same feature dimensions and the same random forest settings, the run times of the single-pixel method and the super-pixel method are compared. Because the time complexity of the geodesic distance computation is O(n^3), it can hardly be used as a feature in a real-time setting for high-resolution images. Although the super-pixel-based method adds the time needed to compute the super-pixels, it greatly reduces the time for feature extraction and also the time for random forest classification. Even when only depth difference is used as the feature, the super-pixel method speeds the algorithm up by an overall factor of 1.5 to 8 across resolutions while maintaining accuracy. As the table shows, the method meets the real-time requirement on data sets of various resolutions.
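The effect of the cubic complexity can be made concrete with a small worked estimate. Suppose, purely as an illustration (the factor k is not a value from the source), that n foreground pixels are grouped into n/k super-pixels. The all-pairs shortest-path cost then shrinks by a factor of k^3:

```latex
\[
O\!\left(\left(\tfrac{n}{k}\right)^{3}\right) \;=\; O\!\left(\tfrac{n^{3}}{k^{3}}\right),
\qquad \text{e.g. } k = 50 \;\Rightarrow\; k^{3} = 1.25 \times 10^{5}.
\]
```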
Table 1. Execution time (unit: milliseconds)
These experiments show that, across a variety of resolutions and data qualities, the new combined feature effectively expresses the characteristics of body-part regions, and the whole framework computes the human posture effectively and in real time.
The super-pixel generation method uses a pixel distance metric that takes both two-dimensional and three-dimensional information into account and realizes a down-sampling from pixels to super-pixels, so that the amount of data processed directly drops by an order of magnitude. On a single depth frame, the method extracts a combined feature that fuses depth difference and geodesic distance information, making comprehensive use of the global and local relations between pixels and improving the classification accuracy of human body parts. Compared with previous work, performing sparse regression on part-wise cluster feature points to estimate the human joints not only reduces processing time but also achieves higher posture estimation accuracy.

Claims (2)

1. A human posture estimation method based on depth image super-pixel combined features, characterized by comprising the following steps:
Step (1): μSLIC super-pixel segmentation
Super-pixel segmentation is performed on the depth map using simple linear iterative clustering with a weight coefficient μ (μSLIC), in two stages. The first is the initialization stage: the depth space (u, v, d(u, v)) of the depth image I is converted to the corresponding three-dimensional point cloud space (x, y, z), and the depth image is evenly divided by a δ × δ grid into N_s seeds; the pixels in each grid cell are added to the cluster centered on its seed point, and the position of each seed point is updated to the geometric center of all pixels in its cluster. The second is the iteration stage: for every seed point, within its 3δ × 3δ neighborhood, the distance between each pixel and the seed point is measured by the distance D_s, each pixel is assigned to the cluster of its nearest seed point, and the new positions of the N_s seed points are updated. These steps are iterated until the process converges or the maximum number of iterations N_i is reached.
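The two stages of step (1) can be sketched as follows. This is a minimal illustration, not the claimed implementation: the pinhole intrinsics `f`, `cu`, `cv` are placeholders, and `assumed_distance` is only one plausible form of D_s (a 3-D Euclidean term plus a μ-weighted image-plane term), since the exact formula is not reproduced in this text. All function names are hypothetical.

```python
import math

def backproject(u, v, d, f=500.0, cu=0.0, cv=0.0):
    """Pinhole back-projection (u, v, d) -> (x, y, z); intrinsics are placeholders."""
    return ((u - cu) * d / f, (v - cv) * d / f, d)

def assumed_distance(p, s, mu, delta):
    """ASSUMED form of D_s: 3-D distance combined with a mu-weighted 2-D distance."""
    d3 = math.dist(p[2], s[2])
    d2 = math.hypot(p[0] - s[0], p[1] - s[1]) / delta
    return math.sqrt(d3 * d3 + mu * d2 * d2)

def update_seeds(pts, labels):
    """Move each seed to the geometric center of the pixels in its cluster."""
    sums = {}
    for key, (u, v, (x, y, z)) in pts.items():
        acc = sums.setdefault(labels[key], [0.0] * 5 + [0])
        for i, val in enumerate((u, v, x, y, z)):
            acc[i] += val
        acc[5] += 1
    return {k: (a[0] / a[5], a[1] / a[5], (a[2] / a[5], a[3] / a[5], a[4] / a[5]))
            for k, a in sums.items()}

def mu_slic(depth, delta, mu=1.0, max_iter=10):
    """depth: list of rows (depth[v][u]). Returns {(u, v): cluster label}."""
    h, w = len(depth), len(depth[0])
    pts = {(u, v): (u, v, backproject(u, v, depth[v][u]))
           for v in range(h) for u in range(w)}
    # Initialization: one cluster per delta x delta grid cell.
    cols = math.ceil(w / delta)
    labels = {(u, v): (v // delta) * cols + u // delta for (u, v) in pts}
    seeds = update_seeds(pts, labels)
    for _ in range(max_iter):
        new_labels = {}
        for key, p in pts.items():
            # Candidate seeds: those whose 3*delta x 3*delta window contains the pixel.
            cands = [(assumed_distance(p, s, mu, delta), k) for k, s in seeds.items()
                     if abs(s[0] - p[0]) <= 1.5 * delta and abs(s[1] - p[1]) <= 1.5 * delta]
            new_labels[key] = min(cands)[1] if cands else labels[key]
        if new_labels == labels:   # converged
            break
        labels = new_labels
        seeds = update_seeds(pts, labels)
    return labels
```

In this sketch a cluster that loses all of its pixels simply disappears; a fuller implementation would also handle background pixels and enforce connectivity.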
Step (2): Human body part segmentation with the SDDF+SGDF super-pixel combined feature
A combined feature (SDDF+SGDF) based on the super-pixel depth difference feature (SDDF) and the super-pixel geodesic distance feature (SGDF) is used. A group of offsets is obtained by random uniform sampling within a circle of radius Θ centered at the geometric center of the super-pixel χ_s. On image I, the N_SDDF SDDF values f_θ and the N_SGDF SGDF values g_θ are combined to obtain the feature F_I(χ_s) of dimension N_SDDF + N_SGDF for super-pixel χ_s.
1) Super-pixel depth difference feature f_θ: for a super-pixel χ_s segmented in depth map I, let θ be an offset generated in advance by random uniform sampling within the circle of radius Θ around its geometric center. The depth difference feature value is defined in terms of d_I(·),
where d_I(·) denotes the depth value at a given pixel position.
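A minimal sketch of the depth difference feature is given below. The exact formula is not reproduced in this text, so the sketch assumes the simplest reading: the feature value is the depth at the offset position minus the depth at the super-pixel center. The `background` constant for out-of-image positions and all function names are likewise assumptions.

```python
import math
import random

def sample_offsets(n, radius, seed=0):
    """Draw n offsets uniformly from the disc of the given radius."""
    rng = random.Random(seed)
    offsets = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())   # sqrt gives uniform density over the disc
        a = 2.0 * math.pi * rng.random()
        offsets.append((r * math.cos(a), r * math.sin(a)))
    return offsets

def depth_at(depth, u, v, background=10.0):
    """d_I(.): depth at the nearest pixel, or an assumed constant outside the image."""
    ui, vi = int(round(u)), int(round(v))
    if 0 <= vi < len(depth) and 0 <= ui < len(depth[0]):
        return depth[vi][ui]
    return background

def sddf(depth, center, offsets):
    """ASSUMED feature: f_theta = d_I(center + theta) - d_I(center), one value per offset."""
    cu, cv = center
    d0 = depth_at(depth, cu, cv)
    return [depth_at(depth, cu + du, cv + dv) - d0 for du, dv in offsets]
```

With the 800+200 configuration described in the experiments, `sample_offsets(800, radius)` would supply the offsets for the SDDF part of the feature.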
2) Super-pixel geodesic distance feature g_θ: first, an undirected graph is built from the segmented super-pixels that contain foreground; its vertices are these foreground super-pixels. Edges are then added between vertices according to two rules. ① If two super-pixels χ_i and χ_j contain directly adjacent pixels, and the absolute difference of the adjacent pixels' depths is less than δ_d, an edge between the two super-pixels is added to the graph, with a weight equal to the Euclidean distance between the centers of the two super-pixels. ② For an outlier χ_i that the first rule leaves without an edge to any other vertex, the super-pixel at minimum distance from it, χ_c = argmin_χ dist(χ_i, χ), is found, and an edge to χ_c with weight dist(χ_i, χ_c) is added. Applying the Floyd-Warshall algorithm to the connected undirected graph thus constructed yields the shortest distances between all pairs of super-pixels.
For the same randomly generated offsets θ as in the depth difference feature, the geodesic distance feature value of super-pixel χ_s is expressed in terms of SP_I(X) and d_geodesic,
where SP_I(X) denotes the super-pixel to which pixel X belongs in the image, and d_geodesic denotes the shortest distance between two vertices on the undirected graph above, that is, the geodesic distance over the set of super-pixels;
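The graph construction and the all-pairs shortest-path step can be sketched as follows. The input layout is an assumption made for illustration: `centers` holds the 3-D center of each foreground super-pixel, and `adjacent` lists triples (i, j, gap) for super-pixel pairs that share directly adjacent pixels, where gap is the absolute depth difference of those pixels.

```python
import math

def geodesic_distances(centers, adjacent, delta_d):
    """Return the matrix of shortest graph distances between super-pixels."""
    n = len(centers)
    w = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]

    def link(i, j):
        # Edge weight: Euclidean distance between the two super-pixel centers.
        w[i][j] = w[j][i] = math.dist(centers[i], centers[j])

    # Rule 1: connect adjacent super-pixels whose depth gap is below delta_d.
    for i, j, gap in adjacent:
        if gap < delta_d:
            link(i, j)

    # Rule 2: connect every still-isolated vertex to its nearest super-pixel.
    for i in range(n):
        if n > 1 and all(w[i][j] == math.inf for j in range(n) if j != i):
            nearest = min((j for j in range(n) if j != i),
                          key=lambda j: math.dist(centers[i], centers[j]))
            link(i, nearest)

    # Floyd-Warshall: relax every pair (i, j) through every intermediate vertex k.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if w[i][k] + w[k][j] < w[i][j]:
                    w[i][j] = w[i][k] + w[k][j]
    return w
```

The triple loop is the source of the O(n^3) cost mentioned in the run-time discussion, with n here being the number of foreground super-pixels rather than pixels. Pairs that remain in different components keep an infinite distance in this sketch.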
3) Random decision forest classification of super-pixels: the combined feature F_I(χ) of a super-pixel is classified using a random decision forest.
First, N_t trees are trained to form the forest. During training, the information entropy and the information gain must be computed for each split node. The information entropy of a node containing the training sample set S is:
H(S) = −Σ_l p(l | S) log p(l | S)
where l is a body part to be classified and p(l | S) is the probability of class l in the set S. The random decision forest algorithm chooses the split that yields the maximum information gain. For a split P = {P_left, P_right} of a node, which divides the node's training sample set S into {S_left, S_right}, the information gain of the split is defined as:
InfGain(S, P) = H(S) − H(S | P) = H(S) − p_left H(S_left) − p_right H(S_right).
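The split criterion can be illustrated with a few lines of code. The sketch assumes base-2 logarithms and takes p_left and p_right to be the fractions of S that fall into S_left and S_right; neither choice is stated explicitly in the text.

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_l p(l|S) * log2 p(l|S), with p estimated from label counts."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, left, right):
    """InfGain(S, P) = H(S) - p_left * H(S_left) - p_right * H(S_right)."""
    n = len(labels)
    return (entropy(labels)
            - len(left) / n * entropy(left)
            - len(right) / n * entropy(right))
```

A split that separates two equally frequent parts perfectly gains one full bit, while a split that leaves both children with the parent's class mix gains nothing.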
After the random decision forest has been trained, each tree t in the forest gives, for image I, the probability p_t(l | I, X) that super-pixel X is classified as part l, and the whole forest combines the outputs of its N_t trees into p(l | I, X). The part with the maximum probability is chosen as the class of the super-pixel and is also taken as the classification result of the pixels it contains.
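The inference step can be sketched as below. The text does not reproduce the formula that combines the trees, so the sketch assumes the usual choice of averaging the per-tree distributions; the function names and the dictionary layout are hypothetical.

```python
def forest_posterior(tree_posteriors):
    """Average the per-tree distributions {part: probability} (assumed combination rule)."""
    parts = set().union(*tree_posteriors)
    n = len(tree_posteriors)
    return {part: sum(t.get(part, 0.0) for t in tree_posteriors) / n for part in parts}

def classify_superpixel(tree_posteriors, pixels):
    """Pick the most probable part and propagate it to every pixel of the super-pixel."""
    posterior = forest_posterior(tree_posteriors)
    best = max(posterior, key=posterior.get)
    return best, {px: best for px in pixels}
```

Propagating one label to all pixels of a super-pixel is what makes semantically identical regions tend toward consistent classification, as noted in the experiments.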
Step (3): Sparse regression of human posture based on body-part cluster features
After the super-pixels have been classified by the random decision forest, all foreground pixels are classified. Features that map from parts to joints are then used, and the positions of the skeleton points are obtained by sparse regression.
1) Posture representation based on part-wise cluster features: for a part l, K-means clustering is applied to it to obtain N_k cluster points, which are sorted by their distance to a preset part to give a vector for that part. The geometric center c_0 of all foreground pixels is then combined with the vectors of all N_p parts to obtain a new human posture representation c based on part cluster features.
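The construction of the part-wise cluster feature can be sketched as follows. This is a minimal illustration with a plain Lloyd-style K-means and a deterministic initialization chosen for brevity; the source does not specify the K-means variant. The input layout (`parts` maps a part name to its 3-D foreground points, `parents` maps it to the reference point used for sorting) is an assumption, and every part is assumed to contain at least one point.

```python
import math

def kmeans(points, k, iters=20):
    """Plain Lloyd iterations with evenly spaced initial centers (assumed variant)."""
    centers = [points[i * len(points) // k] for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda i: math.dist(p, centers[i]))].append(p)
        # An empty group keeps its previous center.
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers

def posture_feature(parts, parents, k=3):
    """Concatenate c_0 with k sorted cluster points per part into one vector c."""
    all_pts = [p for pts in parts.values() for p in pts]
    c0 = tuple(sum(c) / len(all_pts) for c in zip(*all_pts))
    feature = list(c0)
    for name in sorted(parts):   # fixed part order keeps the layout of c stable
        centers = sorted(kmeans(parts[name], k),
                         key=lambda c: math.dist(c, parents[name]))
        for c in centers:
            feature.extend(c)
    return feature
```

With 10 parts and k = 3 the resulting vector has 3 + 10 × 3 × 3 = 93 entries, matching the dimension reported in the experiments.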
2) Sparse regression: the goal of human posture estimation is to obtain the positions y of N_J three-dimensional skeleton joints. Suppose there are N training images whose joint information y_i and part features c_i are known, where N_q is the number of part feature points and i = 1, ..., N. A regression matrix A is also defined, whose j-th block maps the feature c to the j-th skeleton point (j = 1, ..., N_q). The sparse regression model is then y_i ≈ A c_i, and the projection matrix A can be obtained through N_J independent optimizations, one per skeleton point,
where y_i(3j−2:3j) denotes the subvector of y_i from dimension (3j−2) to dimension 3j. With the trained matrix A and the part cluster feature c, the three-dimensional positions of the skeleton points are obtained by linear sparse regression as y = Ac.
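Once A has been trained, prediction is a single matrix-vector product followed by a regrouping of y into joints. The sketch below shows only this prediction step, with A given as a list of rows; the training objective is not reproduced in this text and is therefore not sketched. Note the index shift: the one-based slice y(3j−2:3j) corresponds to the zero-based Python slice `y[3*(j-1):3*j]`.

```python
def regress_joints(A, c):
    """Compute y = A c and split it into (x, y, z) triples, one per joint.

    A is a list of 3 * N_J rows, each as long as the feature vector c.
    """
    y = [sum(a * x for a, x in zip(row, c)) for row in A]
    return [tuple(y[3 * j:3 * j + 3]) for j in range(len(y) // 3)]
```

For the 93-dimensional feature described in the experiments, each row of A would have 93 entries, and a sparse A means that each joint coordinate depends on only a few of them.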
2. The human posture estimation method based on depth image super-pixel combined features according to claim 1, characterized in that, in step (1), the distance metric D_s between a pixel X_k(x_k, y_k, z_k) and the i-th cluster center is designed with a weight μ,
where μ is the weight that adjusts the compactness of the super-pixels.
CN201711395472.1A 2017-12-21 2017-12-21 Human body posture estimation method based on depth image super-pixel combined features Active CN108154104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711395472.1A CN108154104B (en) 2017-12-21 2017-12-21 Human body posture estimation method based on depth image super-pixel combined features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711395472.1A CN108154104B (en) 2017-12-21 2017-12-21 Human body posture estimation method based on depth image super-pixel combined features

Publications (2)

Publication Number Publication Date
CN108154104A true CN108154104A (en) 2018-06-12
CN108154104B CN108154104B (en) 2021-10-15

Family

ID=62464113

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711395472.1A Active CN108154104B (en) 2017-12-21 2017-12-21 Human body posture estimation method based on depth image super-pixel combined features

Country Status (1)

Country Link
CN (1) CN108154104B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110206276A1 (en) * 2007-09-24 2011-08-25 Microsoft Corporation Hybrid graph model for unsupervised object segmentation
CN103890752A (en) * 2012-01-11 2014-06-25 三星电子株式会社 Apparatus for recognizing objects, apparatus for learning classification trees, and method for operating same
CN105389569A (en) * 2015-11-17 2016-03-09 北京工业大学 Human body posture estimation method

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109635783A (en) * 2019-01-02 2019-04-16 上海数迹智能科技有限公司 Video monitoring method, device, terminal and medium
CN110288677A (en) * 2019-05-21 2019-09-27 北京大学 It is a kind of based on can deformation structure pedestrian image generation method and device
CN112288798A (en) * 2019-07-24 2021-01-29 鲁班嫡系机器人(深圳)有限公司 Posture recognition and training method, device and system
CN110598675A (en) * 2019-09-24 2019-12-20 深圳度影医疗科技有限公司 Ultrasonic fetal posture identification method, storage medium and electronic equipment
CN110610505A (en) * 2019-09-25 2019-12-24 中科新松有限公司 Image segmentation method fusing depth and color information
CN111046733A (en) * 2019-11-12 2020-04-21 宁波大学 3D human body posture estimation method based on sparsity and depth
CN111046733B (en) * 2019-11-12 2023-04-18 宁波大学 3D human body posture estimation method based on sparsity and depth
CN111428619B (en) * 2020-03-20 2022-08-05 电子科技大学 Three-dimensional point cloud head attitude estimation system and method based on ordered regression and soft labels
CN111428619A (en) * 2020-03-20 2020-07-17 电子科技大学 Three-dimensional point cloud head attitude estimation system and method based on ordered regression and soft labels
CN111860311A (en) * 2020-07-20 2020-10-30 南京智金科技创新服务中心 Method and system for prompting abnormal posture of human body
CN112070835A (en) * 2020-08-21 2020-12-11 达闼机器人有限公司 Mechanical arm pose prediction method and device, storage medium and electronic equipment
CN112766335A (en) * 2021-01-08 2021-05-07 四川九洲北斗导航与位置服务有限公司 Image processing method and device, electronic equipment and storage medium
CN112766335B (en) * 2021-01-08 2023-12-01 四川九洲北斗导航与位置服务有限公司 Image processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN108154104B (en) 2021-10-15

Similar Documents

Publication Publication Date Title
CN108154104A (en) A kind of estimation method of human posture based on depth image super-pixel union feature
CN110458939B (en) Indoor scene modeling method based on visual angle generation
US10891511B1 (en) Human hairstyle generation method based on multi-feature retrieval and deformation
CN112002014B (en) Fine structure-oriented three-dimensional face reconstruction method, system and device
CN109658445A (en) Network training method, increment build drawing method, localization method, device and equipment
CN105389569B (en) A kind of estimation method of human posture
US9942535B2 (en) Method for 3D scene structure modeling and camera registration from single image
Sirmacek et al. Performance evaluation for 3-D city model generation of six different DSMs from air-and spaceborne sensors
CN110163836A (en) Based on deep learning for the excavator detection method under the inspection of high-altitude
CN109410321A (en) Three-dimensional rebuilding method based on convolutional neural networks
CN108319957A (en) A kind of large-scale point cloud semantic segmentation method based on overtrick figure
CN106651926A (en) Regional registration-based depth point cloud three-dimensional reconstruction method
CN107871106A (en) Face detection method and device
CN109214366A (en) Localized target recognition methods, apparatus and system again
CN107481279A (en) A kind of monocular video depth map computational methods
EP3905194A1 (en) Pose estimation method and apparatus
CN109934065A (en) A kind of method and apparatus for gesture identification
Aiteanu et al. Hybrid tree reconstruction from inhomogeneous point clouds
CN104346824A (en) Method and device for automatically synthesizing three-dimensional expression based on single facial image
CN109598234A (en) Critical point detection method and apparatus
CN108648194A (en) Based on the segmentation of CAD model Three-dimensional target recognition and pose measuring method and device
CN112085835B (en) Three-dimensional cartoon face generation method and device, electronic equipment and storage medium
CN110334584B (en) Gesture recognition method based on regional full convolution network
CN109766873A (en) A kind of pedestrian mixing deformable convolution recognition methods again
CN112669448B (en) Virtual data set development method, system and storage medium based on three-dimensional reconstruction technology

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant