CN108154104A - Human pose estimation method based on depth-image superpixel joint features - Google Patents
- Publication number
- CN108154104A (application number CN201711395472.1A)
- Authority
- CN
- China
- Prior art keywords
- pixel
- super
- feature
- point
- cluster
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/20—Movements or behaviour, e.g. gesture recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/231—Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
Abstract
The present invention discloses a human pose estimation method based on depth-image superpixel joint features. A single depth image containing a human body is taken as input; human pose features are extracted from the depth image; the body is segmented into parts using these features; the segmented parts are clustered; and the cluster results are passed to a sparse regression that estimates the positions of the human skeleton points. The technical solution of the invention improves the accuracy of human pose estimation and the real-time performance of the pose estimation method.
Description
Technical field
The invention belongs to the fields of computer vision and pattern recognition, and more particularly relates to a human pose estimation method based on depth-image superpixel joint features.
Background art
Body part segmentation and skeletal joint localization, together known as human pose estimation, are a fundamental task in computer vision and human-computer interaction. Pose estimation is widely used in action recognition, animation simulation, gait analysis, content-based video and image retrieval, and intelligent video surveillance. With the development of depth acquisition devices such as the Kinect sensor and TOF cameras, much research has gradually shifted from conventional color or grayscale intensity images to depth images. Compared with color images, depth images avoid the influence of varying illumination, appearance, and background noise.
The human body is an articulated structure with many degrees of freedom, a constrained parameter space, self-similar parts, and self-occlusion, which makes direct modeling of the human skeleton extremely difficult. The difficulty of human pose estimation lies in building a complex joint representation model of the body and computing joint positions from markerless data; real-time application requirements make it harder still.
In terms of input data, human pose estimation methods are divided into single-frame methods and image-sequence methods. Compared with sequence-based methods, single-frame methods have no accumulated error, need no error recovery, and obtain the required pose directly from a single image. Because they lack the temporal context of the motion, however, they easily misjudge ambiguous poses.
In terms of approach, early methods were based on human body modeling and searched the body state space in order to match and align a model with image features. Examples include the commonly used iterative closest point method and methods that fit a human body model to head, torso, and limb detectors using Markov chains. Such fitting and matching methods usually require an initialization step and a synthetic body model that matches the real body, and they are computationally complex and expensive. Machine learning methods have gradually been adopted: random decision forests (RDF), support vector machines (SVM), k-nearest-neighbor classification (KNN), and deep learning have all been applied to human pose estimation. These learning methods need no prior body model, but they depend on a sufficiently large and diverse training set, which increases training time, and extracting effective, accurate, and stable feature descriptors remains a major challenge.
In 2011 Shotton et al. proposed a real-time human pose estimation method that was applied successfully in Kinect. It combines depth difference features with a random decision forest to classify body parts and then obtains the skeleton points by mean-shift clustering. One problem with random decision forests, however, is that the result of the forest becomes stable only as the number of trees grows. As trees are added, both the training time and the test time of the forest increase, which in turn limits the size of the forest in real-time applications. Meanwhile, as technology advances, depth image resolution keeps growing, so the number of raw pixels to process multiplies and the real-time demands on the processing method become ever higher.
Superpixel segmentation is an image segmentation technique that groups semantically similar pixels into one superpixel region, so that the image can be processed one superpixel block at a time instead of pixel by pixel. For complex image processing pipelines this can improve efficiency by an order of magnitude, which makes more complex feature computation feasible in applications that require real-time performance. Simple linear iterative clustering (SLIC) is an excellent superpixel segmentation method: its segments are compact and similar in size, and it preserves neighborhood relationships better than other superpixel segmentation methods. Few methods apply SLIC-based superpixel segmentation to depth maps. Some measure the distance between pixels directly as the Euclidean distance in 3D point cloud space, which yields superpixel blocks that differ greatly in size; some add gradient direction to strengthen the semantic segmentation, at the cost of extra computation; and some segment superpixels from a combination of color and depth, which requires additional input information.
Summary of the invention
To extract human body features quickly and accurately and compute them efficiently, thereby improving the accuracy of human pose estimation and the real-time performance of the pose estimation method, the present invention proposes a human pose estimation method based on joint superpixel features. It extracts a novel superpixel-based joint feature that fuses a superpixel depth difference feature with a superpixel geodesic distance feature, applies this joint feature in a random decision forest to segment the body into parts, clusters each segmented part by K-means, and applies sparse regression to the cluster points to estimate the human pose. The technical problems addressed by the invention include: fast and effective superpixel segmentation; body part segmentation based on the superpixel joint feature; and sparse regression of the human pose based on body part clustering.
To achieve the above object, the present invention adopts the following technical solution:
The human pose estimation method based on depth-image superpixel joint features takes a single depth image containing a human body as input, extracts human pose features from the depth image, segments the body into parts using these features, clusters the segmented parts, and applies sparse regression to estimate the positions of the human skeleton points. The overall framework comprises the following steps:
Step (1) μSLIC superpixel segmentation:
The invention performs superpixel segmentation of the depth map with a simple linear iterative clustering method weighted by a coefficient μ (μSLIC). The method has two stages. The first is the initialization stage: for a depth image I, the depth space (u, v, d(u, v)) is converted into the corresponding 3D point cloud space (x, y, z), where u and v are 2D image coordinates, d(u, v) is the depth value at position (u, v), and x, y, z are 3D spatial coordinates. The depth image is divided evenly by a δ × δ grid into N_s seeds, the pixels in each grid cell are added to the cluster centered on that cell's seed point, and the seed point is moved to the geometric center obtained by averaging the positions of all pixels in the cluster. The second is the iteration stage: for every seed point, within its 3δ × 3δ neighborhood, the distance between each pixel and the seed point is measured by the distance D_s, each pixel is assigned to the cluster of its nearest seed point, and the positions of the N_s seed points are updated. These steps are iterated until the process converges or the maximum number of iterations N_i is reached.
The distance metric D_s between a pixel X_k(x_k, y_k, z_k) and the i-th cluster center is designed as:
where μ is the weight that adjusts compactness within a superpixel.
Step (2) Body part segmentation with the SDDF+SGDF superpixel joint feature:
A joint feature (SDDF+SGDF) based on the superpixel depth difference feature (SDDF) and the superpixel geodesic distance feature (SGDF) is used. For a set of offsets obtained by uniform random sampling within the circle of radius Θ centered on the geometric center of superpixel χ_s, the values f_θ of N_SDDF SDDFs and the values g_θ of N_SGDF SGDFs on image I are combined to give a feature of dimension N_SDDF + N_SGDF for superpixel χ_s:
1) Superpixel-based depth difference feature f_θ: for a superpixel χ_s segmented in the depth map I, and an offset θ generated in advance by uniform random sampling within the circle of radius Θ centered on its geometric center, the depth difference feature value is:
where d_I(·) denotes the depth value at a pixel position.
2) Superpixel-based geodesic distance feature g_θ: first, an undirected graph is built from the segmented superpixels that contain foreground, with these foreground superpixels as its vertices. Edges are then added between vertices according to two rules. ① If two superpixels χ_i and χ_j have directly adjacent pixels and the absolute difference between the depths of the adjacent pixels is less than δ_d, an edge between the two superpixels is added to the graph, weighted by the Euclidean distance between the two superpixel centers. ② For an isolated vertex χ_i that still has no edge to any other vertex after the first rule, there is a superpixel χ_c = argmin_χ dist(χ_i, χ) at minimum distance from χ_i, and an edge to χ_c with weight dist(χ_i, χ_c) is added. Applying the Floyd-Warshall algorithm to the resulting connected undirected graph then yields the shortest distances between all pairs of superpixels.
For an offset θ obtained randomly in the same way as for the depth difference feature, the geodesic distance feature value of superpixel χ_s is expressed as:
where SP_I(X) denotes the superpixel to which pixel X belongs in the image, and d_geodesic denotes the shortest distance between two vertices of the undirected graph built above, which is taken as the geodesic distance over the superpixel set;
3) Random decision forest classification of superpixels: the joint feature F_I(χ) of each superpixel is classified with a random decision forest.
First, N_t trees are trained to form the forest. During training, the information entropy and the information gain must be computed for every split node. The information entropy of a node containing the training sample set S is:
$$H(S) = -\sum_{l} p(l \mid S)\,\log p(l \mid S)$$
where l is a body part to be classified and p(l | S) is the probability of class l in the set S. The random decision forest algorithm selects the split that yields the maximum information gain. For a split P = {P_left, P_right} of a node, which divides the training sample set S of that node into {S_left, S_right}, the information gain is defined as:
$$G(S, P) = H(S) - \sum_{k \in \{left,\, right\}} \frac{|S_k|}{|S|}\,H(S_k)$$
After the random decision forest has been trained, a tree t in the forest gives, for image I, the probability p_t(l | I, χ) that superpixel χ is classified as part l, and the whole forest gives
$$p(l \mid I, \chi) = \frac{1}{N_t}\sum_{t=1}^{N_t} p_t(l \mid I, \chi)$$
The part with the highest probability is chosen as the class of the superpixel and is also used as the classification result of the pixels the superpixel contains.
Step (3) Sparse regression of the human pose based on body part cluster features:
After the superpixels have been classified by the random decision forest, all foreground pixels are classified. A feature that maps from parts to joints is used, and the positions of the skeleton points are obtained by sparse regression.
1) Human pose representation based on part cluster features: for a part l, K-means clustering is applied to obtain N_k cluster points, which are sorted by their distance to a preset part to give a vector. The geometric center c_0 of all foreground pixels and the vectors of all N_p parts are then concatenated to obtain a new human pose representation based on part cluster features:
2) Sparse regression: the goal of human pose estimation is to obtain the positions y of N_J 3D skeleton joints. Suppose there are N training images with known joint information and part features, where N_q is the number of part feature points and i = 1, ..., N. A regression matrix A is defined whose j-th block maps the feature c to the j-th skeleton point (j = 1, ..., N_q). The sparse regression model is then y_i ≈ A c_i, and the projection matrix A can be obtained through the following N_J independent optimizations:
where y_i(3j-2 : 3j) denotes the subvector of y_i from dimension (3j-2) to dimension (3j). With the trained matrix A and the part cluster feature c, the 3D positions of the skeleton points are obtained by linear sparse regression as y = Ac.
In experiments it was observed that the skeleton points produced by sparse regression sometimes do not lie on foreground pixels. A human joint usually lies at the center of a local body part and, even under changes of viewpoint, should at least lie on foreground pixels that contain the body. Nearest-neighbor foreground pixel matching can therefore be applied to the regression result to correct skeleton points that deviate from the foreground, so that the final skeleton points are more accurate than the raw regression result.
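The nearest-neighbor correction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `snap_to_foreground`, the tolerance `tol`, and the choice to measure distances in 3D point cloud coordinates are all hypothetical.

```python
import numpy as np

def snap_to_foreground(joints, foreground_points, tol=0.0):
    """Move regressed joints that lie off the body onto the nearest foreground point.

    joints: (J, 3) regressed skeleton points.
    foreground_points: (M, 3) 3D points of the foreground pixels.
    tol: assumed tolerance; joints closer than this to the foreground are kept.
    """
    # Pairwise distances between every joint and every foreground point.
    d = np.linalg.norm(joints[:, None, :] - foreground_points[None, :, :], axis=-1)
    nearest = foreground_points[d.argmin(axis=1)]
    off_body = d.min(axis=1) > tol
    # Only joints farther than tol from the foreground are replaced.
    return np.where(off_body[:, None], nearest, joints)
```

The tolerance keeps joints that are already close to the body unchanged, so only clearly deviating skeleton points are corrected.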
Brief description of the drawings
Fig. 1: framework of the human pose estimation method;
Fig. 2: part clustering result for K = 3;
Fig. 3(a): depth of the trees and number of trees;
Fig. 3(b): offset range and feature dimension;
Fig. 4: superpixel parameters, where (a) μ = 1.0, (b) μ = 1.5, and (c) μ = 2.0;
Fig. 5: combinations of SDDF and SGDF features;
Fig. 6: classification results with single-pixel features and superpixel features;
Fig. 7: qualitative classification results with single-pixel features (PDDF+PGDF) and superpixel features (SDDF+SGDF);
Fig. 8: pose regression results on the CMUSD and XiDian data sets;
Fig. 9: pose regression results on the EVAL data set.
Detailed description
As shown in Fig. 1, the present invention provides a human pose estimation method for single-frame depth images. A single depth image containing a human body is taken as input, human pose features are extracted from the depth image, the body is segmented into parts using these features, the segmented parts are clustered, and sparse regression is applied to estimate the positions of the human skeleton points. The overall framework comprises the following steps:
(1) μSLIC superpixel segmentation:
The invention performs superpixel segmentation of the depth map with a simple linear iterative clustering method weighted by a coefficient μ (μSLIC). The method has two stages. The first is the initialization stage: for a depth image I, the depth space (u, v, d(u, v)) is converted into the corresponding 3D point cloud space (x, y, z). The depth image is divided evenly by a δ × δ grid into N_s seeds, the pixels in each grid cell are added to the cluster centered on that cell's seed point, and the seed point is moved to the geometric center of all pixels in the cluster. The second is the iteration stage: for every seed point, within its 3δ × 3δ neighborhood, the distance between each pixel and the seed point is measured by the distance D_s, each pixel is assigned to the cluster of its nearest seed point, and the positions of the N_s seed points are updated. These steps are iterated until the process converges or the maximum number of iterations N_i is reached.
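As an illustration of the initialization stage, the conversion from depth space (u, v, d(u, v)) to point cloud space (x, y, z) might look like the sketch below. It assumes a pinhole camera model; the intrinsic parameters `fx`, `fy`, `cx`, `cy` and the function name are hypothetical and are not given in the text.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image d(u, v) to an (H, W, 3) array of (x, y, z) points.

    fx, fy: assumed focal lengths in pixels; cx, cy: assumed principal point.
    """
    h, w = depth.shape
    # u is the column index and v is the row index of each pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64)
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)
```

With the point cloud in hand, the seeds can be placed on the δ × δ grid and each seed moved to the mean of the 3D points in its cell.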
To avoid the influence of noisy regions or regions where the depth changes sharply on the superpixels, and to speed up the superpixel stage, the distance metric D_s between a pixel X_k and the i-th cluster center is designed as:
where μ is the weight that adjusts compactness within a superpixel. When μ is small, the segmentation tends to be more uniform in the 2D image plane, but depth detail is poorly captured. When μ is large, pixels of similar depth are more easily grouped into one superpixel, but many elongated regions may appear and compactness suffers.
Because D_s is used in the algorithm only to compare distances and never needs to be accumulated, the implementation uses D_s^2 in place of D_s, which avoids the square root and speeds up the algorithm.
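The assignment and update steps of the iteration stage can be sketched as below. The exact form of D_s is not reproduced in this text, so the sketch assumes D_s^2 = Δx^2 + Δy^2 + μ·Δz^2, which matches the described behavior of μ (a larger μ favors grouping pixels of similar depth) but remains an assumption. For brevity the sketch compares each point with every seed and omits the 3δ × 3δ neighborhood restriction; all function names are hypothetical.

```python
import numpy as np

def squared_distance(points, center, mu):
    """Assumed D_s^2: planar squared distance plus mu-weighted squared depth difference."""
    diff = points - center
    return diff[:, 0] ** 2 + diff[:, 1] ** 2 + mu * diff[:, 2] ** 2

def assign_pixels(points, seeds, mu):
    """Label each 3D point with the index of the seed of smallest squared distance."""
    # Comparing squared distances gives the same ordering as comparing D_s,
    # so no square root is needed.
    dists = np.stack([squared_distance(points, s, mu) for s in seeds], axis=1)
    return np.argmin(dists, axis=1)

def update_seeds(points, labels, seeds):
    """Move every non-empty cluster's seed to the mean of its member points."""
    new_seeds = seeds.copy()
    for i in range(len(seeds)):
        members = points[labels == i]
        if len(members) > 0:
            new_seeds[i] = members.mean(axis=0)
    return new_seeds
```

Alternating `assign_pixels` and `update_seeds` until the labels stop changing, or for at most N_i rounds, gives the iteration described above.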
(2) Body part segmentation with the SDDF+SGDF superpixel joint feature:
The Kinect system demonstrated the effectiveness of depth difference features computed at single pixels for pose estimation. This method proposes a joint feature (SDDF+SGDF) based on the superpixel depth difference feature (SDDF) and the superpixel geodesic distance feature (SGDF). For a set of offsets obtained by uniform random sampling within the circle of radius Θ centered on the geometric center of superpixel χ_s, the values f_θ of N_SDDF SDDFs and the values g_θ of N_SGDF SGDFs on image I are combined to give a feature of dimension N_SDDF + N_SGDF for superpixel χ_s:
1) Superpixel-based depth difference feature f_θ: for a superpixel χ_s segmented in the depth map I, and an offset θ generated in advance by uniform random sampling within the circle of radius Θ centered on its geometric center, the depth difference feature value is:
where d_I(·) denotes the depth value at a pixel position.
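One possible reading of the depth difference feature is sketched below. Because the formula itself is not reproduced in this text, the sketch makes several assumptions: the feature is the depth probed at the offset position minus the depth at the superpixel center, the offset is divided by the center depth (in the style of the depth-normalized features of Shotton et al.), and probes that leave the image return a large constant. The names `depth_at`, `sddf`, and `BACKGROUND_DEPTH` are hypothetical.

```python
import numpy as np

BACKGROUND_DEPTH = 1e6  # assumed value for probes that fall outside the image

def depth_at(depth, u, v):
    """d_I at the pixel nearest to (u, v); u is the column, v is the row."""
    h, w = depth.shape
    ui, vi = int(round(u)), int(round(v))
    if vi in range(h) and ui in range(w):
        return float(depth[vi, ui])
    return BACKGROUND_DEPTH

def sddf(depth, center_uv, theta):
    """Assumed SDDF: depth at the depth-normalized offset minus depth at the center.

    center_uv: (u, v) of the superpixel's geometric center (must have positive depth).
    theta: (du, dv) offset sampled inside the circle of radius Theta.
    """
    cu, cv = center_uv
    d_center = depth_at(depth, cu, cv)
    # Dividing by the center depth makes the probe distance shrink for far bodies.
    du, dv = theta[0] / d_center, theta[1] / d_center
    return depth_at(depth, cu + du, cv + dv) - d_center
```

Evaluating `sddf` for N_SDDF pre-sampled offsets at each superpixel center would give the depth difference part of the joint feature.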
2) Superpixel-based geodesic distance feature g_θ: first, an undirected graph is built from the segmented superpixels that contain foreground, with these foreground superpixels as its vertices. Edges are then added between vertices according to two rules. ① If two superpixels χ_i and χ_j have directly adjacent pixels and the absolute difference between the depths of the adjacent pixels is less than δ_d, an edge between the two superpixels is added to the graph, weighted by the Euclidean distance between the two superpixel centers. ② For an isolated vertex χ_i that still has no edge to any other vertex after the first rule, there is a superpixel χ_c = argmin_χ dist(χ_i, χ) at minimum distance from χ_i, and an edge to χ_c with weight dist(χ_i, χ_c) is added. Applying the Floyd-Warshall algorithm to the resulting connected undirected graph then yields the shortest distances between all pairs of superpixels.
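The graph construction and the all-pairs shortest distances can be sketched as follows. The sketch assumes that `adjacent_pairs` already lists the superpixel pairs that passed the first rule (touching pixels whose depth difference is below δ_d); the function names and the array layout are hypothetical.

```python
import numpy as np

def build_graph(centers, adjacent_pairs):
    """Weighted adjacency matrix of the superpixel graph (np.inf means no edge).

    centers: (n, 3) superpixel centers.
    adjacent_pairs: index pairs (i, j) that satisfy rule 1.
    """
    n = len(centers)
    pairwise = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=-1)
    dist = np.full((n, n), np.inf)
    np.fill_diagonal(dist, 0.0)
    # Rule 1: connect adjacent superpixels, weighted by center distance.
    for i, j in adjacent_pairs:
        dist[i, j] = dist[j, i] = pairwise[i, j]
    # Rule 2: connect each isolated vertex to its nearest superpixel.
    for i in range(n):
        if np.isinf(np.delete(dist[i], i)).all():
            others = pairwise[i].copy()
            others[i] = np.inf
            c = int(np.argmin(others))
            dist[i, c] = dist[c, i] = pairwise[i, c]
    return dist

def floyd_warshall(dist):
    """All-pairs shortest path lengths of a weighted graph."""
    d = dist.copy()
    for k in range(len(d)):
        # Allow paths that pass through vertex k.
        d = np.minimum(d, d[:, k:k + 1] + d[k:k + 1, :])
    return d
```

Floyd-Warshall costs O(n^3) in the number of vertices, which stays cheap here because the graph has one vertex per foreground superpixel rather than one per pixel.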
For an offset θ obtained randomly in the same way as for the depth difference feature, the geodesic distance feature value of superpixel χ_s is expressed as:
where SP_I(X) denotes the superpixel to which pixel X belongs in the image, and d_geodesic denotes the shortest distance between two vertices of the undirected graph built above, which is taken as the geodesic distance over the superpixel set.
3) Random decision forest classification of superpixels: the joint feature F_I(χ) of each superpixel is classified with a random decision forest.
First, N_t trees are trained to form the forest. During training, the information entropy and the information gain must be computed for every split node. The information entropy of a node containing the training sample set S is:
$$H(S) = -\sum_{l} p(l \mid S)\,\log p(l \mid S)$$
where l is a body part to be classified and p(l | S) is the probability of class l in the set S. The random decision forest algorithm selects the split that yields the maximum information gain. For a split P = {P_left, P_right} of a node, which divides the training sample set S of that node into {S_left, S_right}, the information gain is defined as:
$$G(S, P) = H(S) - \sum_{k \in \{left,\, right\}} \frac{|S_k|}{|S|}\,H(S_k)$$
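The split criterion can be illustrated with a short sketch. It assumes the standard Shannon entropy (base 2 here, an arbitrary choice) and the usual size-weighted information gain; the function names and the boolean mask `go_left` that encodes a candidate split are hypothetical.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of the part labels in a sample set S."""
    labels = np.asarray(labels)
    if labels.size == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, go_left):
    """Entropy of S minus the size-weighted entropies of S_left and S_right."""
    labels = np.asarray(labels)
    go_left = np.asarray(go_left, dtype=bool)
    left, right = labels[go_left], labels[~go_left]
    weighted = (left.size * entropy(left) + right.size * entropy(right)) / labels.size
    return entropy(labels) - weighted
```

During training, each candidate split of a node would be scored with `information_gain` and the one with the highest score kept.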
After the random decision forest has been trained, a tree t in the forest gives, for image I, the probability p_t(l | I, χ) that superpixel χ is classified as part l, and the whole forest gives
$$p(l \mid I, \chi) = \frac{1}{N_t}\sum_{t=1}^{N_t} p_t(l \mid I, \chi)$$
The part with the highest probability is chosen as the class of the superpixel and is also used as the classification result of the pixels the superpixel contains.
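The forest-level decision can be sketched as below, assuming that the forest posterior is the mean of the per-tree posteriors and that a superpixel map stores, for each pixel, the index of its superpixel. The array layouts and function names are hypothetical.

```python
import numpy as np

def classify_superpixels(tree_probs):
    """Average per-tree posteriors and pick the most probable part.

    tree_probs: (N_t, n_superpixels, n_parts) array of p_t(l | I, chi).
    Returns the forest posterior and the winning part index per superpixel.
    """
    forest = np.mean(tree_probs, axis=0)
    return forest, np.argmax(forest, axis=1)

def labels_to_pixels(superpixel_map, superpixel_labels):
    """Give every pixel the part label of the superpixel it belongs to."""
    return superpixel_labels[superpixel_map]
```

Because one prediction is made per superpixel rather than per pixel, the number of forest evaluations drops from the number of foreground pixels to the number of foreground superpixels.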
(3) Sparse regression of the human pose based on body part cluster features:
After the superpixels have been classified by the random decision forest, all foreground pixels are classified, and the image is thereby segmented into the predefined parts. In the required skeleton information, however, some joints (such as the head joint and the chest joint) lie at the center of a part (such as the head and the torso), while others (such as the elbow joint) lie where two parts (the upper arm and the forearm) adjoin. The invention therefore designs a feature that maps from parts to joints and obtains the skeleton point positions by sparse regression.
1) Human pose representation based on part cluster features: for a part l, K-means clustering is applied to obtain N_k cluster points, which are sorted by their distance to a preset part (the main part connected to this part) to give a vector. The geometric center c_0 of all foreground pixels and the vectors of all N_p parts are then concatenated to obtain a new human pose representation based on part cluster features:
2) Sparse regression: the goal of human pose estimation is to obtain the positions y of N_J 3D skeleton joints. Suppose there are N training images with known joint information and part features, where N_q is the number of part feature points and i = 1, ..., N. A regression matrix A is defined whose j-th block maps the feature c to the j-th skeleton point (j = 1, ..., N_q). The sparse regression model is then y_i ≈ A c_i, and the projection matrix A can be obtained through the following N_J independent optimizations:
where y_i(3j-2 : 3j) denotes the subvector of y_i from dimension (3j-2) to dimension (3j). With the trained matrix A and the part cluster feature c, the 3D positions of the skeleton points are obtained by linear sparse regression as y = Ac.
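The per-joint fitting can be illustrated with the sketch below. The optimization objective is not reproduced in this text, so the sketch assumes an L1-penalized least-squares problem (a lasso) for each joint and solves it with iterative soft thresholding (ISTA). The penalty weight `lam`, the iteration count, and all function names are assumptions made for the example.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fit_joint(C, Yj, lam, n_iter=500):
    """Minimize 0.5 * ||C @ W - Yj||_F^2 + lam * ||W||_1 over W by ISTA.

    C: (N, D) part cluster features, one row per training image.
    Yj: (N, 3) coordinates of one joint, i.e. y_i(3j-2 : 3j) for all i.
    """
    step = 1.0 / np.linalg.norm(C, 2) ** 2  # inverse Lipschitz constant of the gradient
    W = np.zeros((C.shape[1], Yj.shape[1]))
    for _ in range(n_iter):
        grad = C.T @ (C @ W - Yj)
        W = soft_threshold(W - step * grad, step * lam)
    return W

def fit_regression_matrix(C, Y, lam):
    """Stack the independently fitted joint blocks into A, so that y is close to A @ c."""
    n_joints = Y.shape[1] // 3
    blocks = [fit_joint(C, Y[:, 3 * j:3 * j + 3], lam).T for j in range(n_joints)]
    return np.vstack(blocks)  # shape (3 * n_joints, D)
```

Fitting each joint separately mirrors the independent optimizations described above; at test time a pose is predicted as `A @ c` for a new feature vector `c`.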
The present invention uses a new human pose estimation framework and proposes a new joint superpixel feature representation on depth images. Body parts are segmented by applying a random decision forest to the joint feature, and after cluster features are extracted from the segmented parts, sparse regression maps them to the final skeleton joint positions. The invention compares the strengths and weaknesses of other methods through experiments on several data sets and verifies the effectiveness of the proposed feature and framework.
Embodiment 1:
Machine learning methods usually require large data sets for training and validation. Three data sets are used for the experiments in the present invention. The EVAL data set contains 3 subjects, each corresponding to a different performer; each performer has 8 action sequences, for about 10,000 frames in total at a resolution of 320 × 240. The XiDian data set contains 5 action sequences, 2850 frames in total, at a resolution of 2048 × 2048; it was captured on a prototype high-resolution depth acquisition device, and its depth data are comparatively noisy. Because these data sets contain only depth data, they cannot be used to train the random decision forest for body part classification. The invention therefore uses the CMU motion capture database to generate a CMU synthetic data set (CMUSD) with depth data and part label data. It was produced with the Poser software and comprises 113 subjects and 2549 action sequences in total. Each pose is rendered from 8 cameras placed at random but predominantly frontal positions, giving more than 820,000 depth images of 640 × 480 with part labels. CMUSD covers a comparatively rich range of actions.
Fig. 3(a) shows that the prediction accuracy of the random decision forest increases with the depth of the trees, but as the trees get deeper the gain in accuracy becomes less and less noticeable while the cost of the trees keeps growing. The invention therefore uses a tree depth of 20. Increasing the number of trees has the same effect; to balance execution efficiency, the forest size is set to 8 trees. For the range of feature offsets and the feature dimension, Fig. 3(b) shows that an offset range of 180 pixels and a feature dimension of 1000 give comparatively good results. This set of parameters is used in all subsequent experiments.
(1) μSLIC superpixel segmentation:
For the superpixel operation, the 640 × 480 images of CMUSD are initially divided with a grid of 12 × 12 pixels, and the number of superpixels that contain human foreground averages about 120. For data sets with other resolutions, the initial superpixel size is scaled according to the ratio of foreground pixels, so that the foreground yields a number of superpixels of the same order of magnitude. During superpixel segmentation, once the number of iterations reaches a certain value the pixels contained in each superpixel stabilize, and the running time is proportional to the number of iterations, so the maximum number of iterations is set to N_i = 10. For the superpixel distance metric parameter μ, Fig. 4(a) shows that the contour detail of the chin is not captured in the head superpixels. In Fig. 4(c), although the contour details of each part are well captured, many elongated superpixels appear at the edges of the body. In Fig. 4(b) the superpixels are comparatively uniform in size while detail is preserved. μ = 1.5 is used in the subsequent experiments.
(2) Superpixel feature extraction:
This experiment uses 12 sequences of CMUSD, about 80,000 images and about 1,000,000 superpixels in total. From each sequence, 50% is randomly selected for training and the other 50% is used for testing.
The different bars in Fig. 5 represent different feature combinations. The figure shows that 0+1000, which uses only geodesic distance features (GDF), performs worse than 1000+0, which uses only depth difference features (DDF), but its accuracy shows that the feature has some validity. In particular, mixing the two features as 800+200 gives better results than using depth difference alone, which indicates that geodesic distance features usefully complement depth difference features for certain poses and parts, although depth difference features do most of the work. The feature finally used is therefore a 1000-dimensional superpixel feature that mixes depth difference and geodesic distance as 800+200.
(3) Comparison of single-pixel and superpixel classification:
The single-pixel feature (PDDF+PGDF) and the superpixel feature (SDDF+SGDF) are compared experimentally on CMUSD with the same feature dimension (800+200) and random forest settings. For PDDF+PGDF, 120 pixels are randomly sampled from each depth map as the training set. The final average accuracy is 92.105 for PDDF+PGDF and 92.468 for SDDF+SGDF. Fig. 6 shows that the accuracies of SDDF+SGDF and PDDF+PGDF fluctuate slightly across subjects; given the randomness of the sampled points, there is no obvious gap between the two. Fig. 7 shows that with SDDF+SGDF some pixels may be merged into an adjacent, different part, and the class of a superpixel determines the class of every pixel in it. Nevertheless, using superpixels makes the classification of semantically identical regions more consistent and greatly reduces the cost of geodesic distance feature extraction and of testing.
(4) Pose regression:
In the part feature extraction step of pose regression, as shown in Fig. 2, the number of K-means cluster points is set to 3. Each of the 10 body parts is clustered into 3 classes, and the cluster points are sorted by their Euclidean distance to the parent connecting part (for example, the head is sorted by distance to the torso). The resulting feature c has 93 dimensions, that is, 10 × 3 × 3 coordinates for the cluster points plus the 3 coordinates of the foreground geometric center c_0.
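A short sketch of how such a 93-dimensional vector could be assembled is given below. It assumes that the cluster points of each part are already available (for example from K-means) and that each part has a single reference point on its parent part to sort against; the function name `build_pose_feature`, the input layout, and all coordinate values are hypothetical.

```python
import math


def build_pose_feature(c0, part_clusters, parent_points):
    """Concatenate c0 with each part's cluster points, sorted per part.

    c0: (x, y, z) geometric center of all foreground pixels.
    part_clusters: list (one entry per part) of lists of (x, y, z) cluster points.
    parent_points: list (one per part) of the (x, y, z) reference point of the
        parent connecting part used for sorting (an assumed input here).
    """
    feature = list(c0)
    for clusters, parent in zip(part_clusters, parent_points):
        ordered = sorted(clusters, key=lambda p: math.dist(p, parent))
        for point in ordered:
            feature.extend(point)
    return feature


# Toy data: 10 parts, 3 cluster points each (values are placeholders).
n_parts, n_clusters = 10, 3
c0 = (0.0, 0.0, 2.0)
part_clusters = [
    [(float(part), float(k), 2.0) for k in (2, 0, 1)] for part in range(n_parts)
]
parent_points = [(float(part), 0.0, 2.0) for part in range(n_parts)]

c = build_pose_feature(c0, part_clusters, parent_points)
print(len(c))  # 3 + 10 * 3 * 3 = 93
```

Sorting the cluster points gives every training and test vector the same component order, which a linear regressor needs because it ties each weight to a fixed position in c.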
For CMUSD, 50% of the images in the data set are used as the training set for the random forest and the sparse regression, and the rest are used as the test set. Fig. 8 shows that the joints of the limbs are estimated less accurately than joints such as the head and chest. This is because the extremities move more vigorously and over a larger range, and once occlusion occurs it has a larger effect on pixel classification, which in turn affects the final regression result.
In the experiment on the XiDian database, 4 of the sequences are used as the training set and 1 as the test set for cross-validation, giving a mean average precision (mAP) of 91.7. This shows that the method remains effective on data with higher resolution and more noise and can produce relatively good results. Although the XiDian data are comparatively noisy, the movements are relatively simple, so the result is higher than the accuracy on CMUSD.
For the EVAL data set, frames in which some joints lie far from the body foreground, or in which the distance between joints clearly exceeds what the human body structure allows, are removed; each subject retains about 3,000 frames on average. In the pixel classification stage, the random forest model trained on all CMUSD images is used. In the regression stage, two EVAL subjects are used as the training set and the remaining one as the test set, and the final result is the mean of the three cross-validation folds.
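The three-fold scheme described above, in which each subject is held out once, can be sketched as follows. The subject names, the placeholder scores, and the helper names `leave_one_subject_out` and `cross_validate` are hypothetical; the scorer stands in for training the regressor on two subjects and measuring accuracy on the third.

```python
def leave_one_subject_out(subjects):
    """Yield (train_subjects, test_subject) for each fold."""
    for held_out in subjects:
        train = [s for s in subjects if s != held_out]
        yield train, held_out


def cross_validate(subjects, evaluate):
    """Average the score returned by evaluate(train, test) over all folds."""
    scores = [evaluate(train, test) for train, test in leave_one_subject_out(subjects)]
    return sum(scores) / len(scores)


subjects = ["subject_0", "subject_1", "subject_2"]  # placeholder names
folds = list(leave_one_subject_out(subjects))

# Placeholder scorer standing in for "train the regressor, then measure accuracy".
dummy_scores = {"subject_0": 0.5, "subject_1": 0.75, "subject_2": 1.0}
mean_score = cross_validate(subjects, lambda train, test: dummy_scores[test])
```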
Fig. 9 compares the method on the EVAL data set with the algorithms of Ye et al. and Jung et al. The algorithm proposed by Ye et al. in 2014 is based on a Gaussian mixture model; it needs to fit a body model and therefore runs slowly. The random tree walk model proposed by Jung et al. in 2015 greatly improves execution efficiency but has only been validated on small data sets. The figure shows that joints close to the body center are estimated relatively accurately, while the extremities are less precise. This is because the extremities are classified less accurately in the pixel classification stage, which directly affects the positions of the cluster points used as regression features and thus the precision of pose regression.
(5) Run time:
Using the same feature dimensions and the same random forest settings, the run times of the single-pixel method and the super-pixel method are compared. Because the time complexity of the geodesic distance computation is $O(n^3)$, it is almost unusable as a feature in a real-time setting for high-resolution images. Although the super-pixel-based approach adds the time needed to compute the super-pixels, it substantially reduces the time spent on feature extraction and also reduces the classification time of the random forest. Even when only depth difference is used as the feature, the super-pixel method preserves accuracy while speeding up the algorithm as a whole by a factor of 1.5 to 8, depending on resolution. As Table 1 shows, the method meets real-time requirements on data sets of various resolutions.
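The cubic cost comes from computing shortest paths between all pairs of graph vertices; the claims name the Floyd-Warshall algorithm for this step. The sketch below runs it on a toy graph whose vertices stand for foreground super-pixels. The graph, its weights, and the function name `floyd_warshall` are illustrative assumptions, not data from the patent.

```python
import math


def floyd_warshall(n, edges):
    """All-pairs shortest distances on an undirected weighted graph.

    n: number of vertices (foreground super-pixels).
    edges: iterable of (i, j, weight) with 0 <= i, j < n.
    """
    dist = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for i, j, w in edges:
        dist[i][j] = min(dist[i][j], w)
        dist[j][i] = min(dist[j][i], w)
    # Three nested loops over the vertices: n**3 relaxation steps.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist


# Toy graph of 4 super-pixels in a chain plus one long shortcut edge.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.5), (0, 3, 10.0)]
geodesic = floyd_warshall(4, edges)
print(geodesic[0][3])  # 4.5, via 0-1-2-3 rather than the direct 10.0 edge
```

Because the loop count grows as n³, replacing n pixels by n/k super-pixels cuts the number of relaxation steps by a factor of k³, which is why the graph is built on super-pixels rather than on individual pixels.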
Table 1. Execution time (unit: milliseconds)
The experiments in this part show that, across a variety of resolutions and data qualities, the new joint feature effectively expresses regional characteristics, and the whole framework can compute the human pose effectively and in real time.
The super-pixel generation method uses a pixel distance measure that takes both two-dimensional and three-dimensional information into account and down-samples pixels to super-pixels, so that the amount of data processed directly drops by orders of magnitude. On a single depth frame, the method extracts a joint feature that fuses depth-difference and geodesic distance information, making comprehensive use of both global and local relationships between pixels and improving the classification accuracy of body parts. Compared with previous work, performing sparse regression on part-wise cluster feature points to estimate the human joint points not only reduces processing time but also achieves higher pose estimation accuracy.
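The first stage of the down-sampling from pixels to super-pixels, converting depth pixels to 3D points and placing one seed per grid cell, might look like the sketch below. It assumes a standard pinhole camera model; the intrinsics `fx`, `fy`, `cx`, `cy`, the image size, the grid size, and both function names are placeholders chosen for illustration, and the iterative assignment with the weighted distance measure is not shown.

```python
def back_project(u, v, depth, fx, fy, cx, cy):
    """Map a depth pixel (u, v, d) to a 3D point (x, y, z), pinhole model."""
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)


def grid_seeds(width, height, delta):
    """Place one seed at the center of every delta x delta grid cell."""
    seeds = []
    for top in range(0, height, delta):
        for left in range(0, width, delta):
            # Clamp so that cells cut off by the image border stay inside it.
            u = min(left + delta // 2, width - 1)
            v = min(top + delta // 2, height - 1)
            seeds.append((u, v))
    return seeds


# Placeholder intrinsics and image size, chosen only for illustration.
fx = fy = 500.0
cx, cy = 160.0, 120.0
seeds = grid_seeds(width=320, height=240, delta=20)
point = back_project(u=260, v=120, depth=2.0, fx=fx, fy=fy, cx=cx, cy=cy)
```

With these placeholder values, 76,800 pixels are represented by 192 seeds, which illustrates the order-of-magnitude reduction in the data handled by the later stages.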
Claims (2)
1. A human pose estimation method based on joint super-pixel features of depth images, characterized by comprising the following steps:
Step (1): μSLIC super-pixel segmentation
Super-pixel segmentation is performed on the depth map using simple linear iterative clustering with a weight μ (μSLIC), in two stages. First, in the initialization stage, the depth space (u, v, d(u, v)) of the depth image I is converted to the corresponding three-dimensional point cloud space (x, y, z), and the depth image is divided uniformly by a δ × δ grid into N_s seeds; the pixels in each grid cell are added to the cluster centered on that cell's seed point, and the seed point position is updated to the geometric center of all pixels in the cluster. Next, in the iteration stage, for every seed point, the distance between each pixel in its 3δ × 3δ neighborhood and the seed point is measured by the distance D_s, each pixel is assigned to the cluster of its nearest seed point, and the new positions of the N_s seed points are updated; these steps are iterated until the process converges or the maximum number of iterations N_i is reached;
Step (2): Body part segmentation with the SDDF+SGDF joint super-pixel feature
A joint feature (SDDF+SGDF) based on the super-pixel depth-difference feature (SDDF) and the super-pixel geodesic distance feature (SGDF) is used. A set of offsets is obtained by random uniform sampling within a circle of radius Θ centered on the geometric center of super-pixel χ_s. On image I, the N_SDDF SDDF values f_θ and the N_SGDF SGDF values g_θ are combined to give a feature F_I(χ_s) of dimension N_SDDF + N_SGDF for super-pixel χ_s, with the two kinds of values obtained as follows:
1) Super-pixel depth-difference feature f_θ: for a super-pixel χ_s segmented from depth map I, given an offset θ generated in advance by random uniform sampling within the circle of radius Θ around its geometric center, the depth-difference feature value f_θ is defined using d_I(·), which denotes the depth value at a given pixel location.
2) Super-pixel geodesic distance feature g_θ: first, an undirected graph is built from the segmented super-pixels that contain foreground, with these foreground super-pixels as its vertices. Edges are then added between vertices according to two rules. (i) If two super-pixels χ_i and χ_j have directly adjacent pixels, and the absolute difference in depth between the adjacent pixels is less than δ_d, an edge between the two super-pixels is added to the graph, with a weight equal to the Euclidean distance between the two super-pixel centers. (ii) For an isolated vertex χ_i that the first rule leaves without an edge to any other vertex, there exists a super-pixel χ_c = argmin_χ dist(χ_i, χ) at minimum distance from χ_i, and an edge to χ_c with weight dist(χ_i, χ_c) is added. Applying the Floyd-Warshall algorithm to the resulting connected undirected graph gives the shortest distance between every pair of super-pixels.
For an offset θ drawn at random in the same way as for the depth-difference feature, the geodesic distance feature value g_θ of super-pixel χ_s is expressed in terms of SP_I(X), the super-pixel to which pixel X belongs in the image, and d_geodesic, the shortest distance between two vertices on the undirected graph built above, that is, the geodesic distance within the set of super-pixels;
3) Random decision forest classification of super-pixels: the joint feature F_I(χ) of each super-pixel is classified with a random decision forest.
First, a forest of N_t trees is trained. During training, the information entropy and information gain must be computed for each split node. The information entropy of a node whose training sample set is S is:
$$H(S) = -\sum_{l} p(l \mid S)\,\log p(l \mid S)$$
where l is a body part to be classified and p(l | S) is the probability of being classified as l in the set S. The random decision forest algorithm chooses the split that yields the maximum information gain. For a split P = {P_left, P_right} of a node, the training sample set S of that node is divided into {S_left, S_right}, and the information gain of the split is defined as:
$$\mathrm{InfGain}(S, P) = H(S) - H(S \mid P) = H(S) - p_{left}\,H(S_{left}) - p_{right}\,H(S_{right})$$
After training, a tree t in the forest gives, for image I, the probability p_t(l | I, χ) that super-pixel χ is classified as part l, and the whole forest gives:
$$p(l \mid I, \chi) = \frac{1}{N_t}\sum_{t=1}^{N_t} p_t(l \mid I, \chi)$$
The part with the highest probability is chosen as the class of the super-pixel and as the classification result of all the pixels the super-pixel contains.
Step (3): Sparse regression of human pose based on body part cluster features
After the super-pixels have been classified by the random decision forest, all foreground pixels are classified. A feature mapping from body parts to joints is then used, and the positions of the skeleton points are obtained by sparse regression.
1) Human pose representation based on part-wise cluster features: for a part l, K-means clustering is applied to it to obtain N_k cluster points, which are sorted by their distance to a preset part to give a vector. The geometric center c_0 of all foreground pixels is then combined with the vectors of all N_p parts to give a new human pose representation c based on part cluster features.
2) Sparse regression: the goal of human pose estimation is to obtain the positions y of N_J three-dimensional skeleton joints. Suppose there are N training images with known joint information y_i and part features c_i, where N_q is the number of part feature points and i = 1, ..., N. Define a regression matrix A whose j-th block maps the feature c to the j-th skeleton point (j = 1, ..., N_J), so that the sparse regression model is y_i ≈ A c_i. The projection matrix A can then be obtained from N_J independent optimizations, in which y_i(3j-2 : 3j) denotes the sub-vector of y_i from dimension (3j-2) to dimension 3j. With the trained matrix A and the part cluster feature c, the three-dimensional positions of the skeleton points are obtained by linear sparse regression as y = Ac.
2. The human pose estimation method based on joint super-pixel features of depth images according to claim 1, characterized in that, in step (1), the distance measure D_s between a pixel X_k(x_k, y_k, z_k) and the i-th cluster center point is defined by a formula in which μ is the weight that adjusts the compactness of the super-pixels.
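The sparse regression model y_i ≈ A c_i of claim 1 can be illustrated with a small L1-penalized least-squares fit. The exact objective is not reproduced in this text, so the lasso form used below, the penalty weight `lam`, the coordinate-descent solver, all function names, and the toy data are assumptions made for illustration only. The sketch fits each output coordinate separately; with an entry-wise L1 penalty this gives the same result as fitting the three rows of one joint together.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_row(features, targets, lam, n_iter=200):
    """Fit one row a of A by minimizing
    0.5 * sum_i (targets[i] - a . features[i])**2 + lam * ||a||_1
    with cyclic coordinate descent.
    """
    n_dim = len(features[0])
    a = [0.0] * n_dim
    for _ in range(n_iter):
        for d in range(n_dim):
            # Correlation of coordinate d with the residual that ignores it.
            rho = 0.0
            norm = 0.0
            for c, y in zip(features, targets):
                pred_without_d = sum(a[k] * c[k] for k in range(n_dim) if k != d)
                rho += c[d] * (y - pred_without_d)
                norm += c[d] * c[d]
            a[d] = soft_threshold(rho, lam) / norm if norm > 0 else 0.0
    return a


def fit_regression_matrix(features, joints, lam):
    """Fit A row by row so that joints[i] is approximately A @ features[i]."""
    n_out = len(joints[0])
    return [lasso_row(features, [y[r] for y in joints], lam) for r in range(n_out)]


def predict(A, c):
    """Linear map y = A c."""
    return [sum(a_k * c_k for a_k, c_k in zip(row, c)) for row in A]


# Toy data: 3-D features, one 3-D joint that depends on two feature dimensions.
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
joints = [[2.0 * c[0], -1.0 * c[1], 0.0] for c in features]
A = fit_regression_matrix(features, joints, lam=0.01)
y = predict(A, [1.0, 1.0, 0.0])
```

The L1 penalty drives weights on irrelevant feature dimensions exactly to zero, so each joint ends up depending on only a few cluster points; at test time the prediction is a single matrix-vector product.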
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711395472.1A CN108154104B (en) | 2017-12-21 | 2017-12-21 | Human body posture estimation method based on depth image super-pixel combined features |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108154104A true CN108154104A (en) | 2018-06-12 |
CN108154104B CN108154104B (en) | 2021-10-15 |
Family
ID=62464113
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711395472.1A Active CN108154104B (en) | 2017-12-21 | 2017-12-21 | Human body posture estimation method based on depth image super-pixel combined features |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108154104B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110206276A1 (en) * | 2007-09-24 | 2011-08-25 | Microsoft Corporation | Hybrid graph model for unsupervised object segmentation |
CN103890752A (en) * | 2012-01-11 | 2014-06-25 | 三星电子株式会社 | Apparatus for recognizing objects, apparatus for learning classification trees, and method for operating same |
CN105389569A (en) * | 2015-11-17 | 2016-03-09 | 北京工业大学 | Human body posture estimation method |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109635783A (en) * | 2019-01-02 | 2019-04-16 | 上海数迹智能科技有限公司 | Video monitoring method, device, terminal and medium |
CN110288677A (en) * | 2019-05-21 | 2019-09-27 | 北京大学 | It is a kind of based on can deformation structure pedestrian image generation method and device |
CN112288798A (en) * | 2019-07-24 | 2021-01-29 | 鲁班嫡系机器人(深圳)有限公司 | Posture recognition and training method, device and system |
CN110598675A (en) * | 2019-09-24 | 2019-12-20 | 深圳度影医疗科技有限公司 | Ultrasonic fetal posture identification method, storage medium and electronic equipment |
CN110610505A (en) * | 2019-09-25 | 2019-12-24 | 中科新松有限公司 | Image segmentation method fusing depth and color information |
CN111046733A (en) * | 2019-11-12 | 2020-04-21 | 宁波大学 | 3D human body posture estimation method based on sparsity and depth |
CN111046733B (en) * | 2019-11-12 | 2023-04-18 | 宁波大学 | 3D human body posture estimation method based on sparsity and depth |
CN111428619B (en) * | 2020-03-20 | 2022-08-05 | 电子科技大学 | Three-dimensional point cloud head attitude estimation system and method based on ordered regression and soft labels |
CN111428619A (en) * | 2020-03-20 | 2020-07-17 | 电子科技大学 | Three-dimensional point cloud head attitude estimation system and method based on ordered regression and soft labels |
CN111860311A (en) * | 2020-07-20 | 2020-10-30 | 南京智金科技创新服务中心 | Method and system for prompting abnormal posture of human body |
CN112070835A (en) * | 2020-08-21 | 2020-12-11 | 达闼机器人有限公司 | Mechanical arm pose prediction method and device, storage medium and electronic equipment |
CN112766335A (en) * | 2021-01-08 | 2021-05-07 | 四川九洲北斗导航与位置服务有限公司 | Image processing method and device, electronic equipment and storage medium |
CN112766335B (en) * | 2021-01-08 | 2023-12-01 | 四川九洲北斗导航与位置服务有限公司 | Image processing method and device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN108154104B (en) | 2021-10-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |