CN104063677A - Equipment used for estimating human body posture and method thereof - Google Patents


Info

Publication number
CN104063677A
CN104063677A (application CN201310088425.8A)
Authority
CN
China
Prior art keywords
pose hypothesis
pose
arm
torso
left arm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310088425.8A
Other languages
Chinese (zh)
Other versions
CN104063677B (en)
Inventor
胡芝兰
陈茂林
宫鲁津
孙迅
刘荣
张帆
金智渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Original Assignee
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Samsung Telecommunications Technology Research Co Ltd, Samsung Electronics Co Ltd filed Critical Beijing Samsung Telecommunications Technology Research Co Ltd
Priority to CN201310088425.8A priority Critical patent/CN104063677B/en
Priority to KR1020140001444A priority patent/KR20140114741A/en
Publication of CN104063677A publication Critical patent/CN104063677A/en
Application granted granted Critical
Publication of CN104063677B publication Critical patent/CN104063677B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras

Abstract

The invention provides an apparatus and method for estimating a human body pose. The apparatus comprises: an image acquisition unit for acquiring a depth image containing a human object; a part feature detection unit for extracting the human object from the acquired depth image and detecting each candidate body part and its features, performing a minimum energy skeleton scan (MESS) on the depth image to determine multiple skeleton points and build a MESS skeleton, and building a PIXLA skeleton for each candidate part from its pixel labeling result and depth distribution; a part generation unit for generating part hypotheses for each body part by fusing the result of the minimum energy skeleton scan with the pixel labeling result; and a pose determination unit for assembling the part hypotheses into at least one pose hypothesis, evaluating each pose hypothesis against a pose evaluation criterion, and determining the human body pose.

Description

Apparatus and method for estimating human body pose
Technical field
The present invention relates to an apparatus and method for estimating human body pose, and more particularly to an apparatus and method for estimating human body pose by fusing the results of minimum energy skeleton scanning (MESS) and pixel labeling performed on a depth image containing a human object.
Background art
With the development of computer vision technology, people can interact with objects displayed in real or virtual 3D space. Such interaction requires natural, contactless remote control of the displayed objects. Here, the human body itself (for example, the head, hand/finger/arm, torso, or whole body) can serve as the controlling entity, so that various movements of body parts in the real scene manipulate the displayed objects. In this case, a depth camera can capture images or video, the human body pose can be estimated from the depth image data, and the user's intention can be inferred from the estimated pose, so that objects displayed in virtual or real 3D space can be manipulated without a mouse, keyboard, joystick, or touch screen. Human body pose recognition is also needed in many other application scenarios.
How to estimate human body pose has been studied extensively. However, existing schemes estimate the pose directly in a single-level configuration space, which leads to a large computational load and limited accuracy. Moreover, existing pose estimation schemes typically depend on large numbers of pose samples; yet even a very large sample set can hardly cover all body shapes and all poses (simple or complex), and building a pose database with so many samples is itself a hard problem for machine learning methods.
For example, U.S. patent application US20100278384, "Human body pose estimation", proposes a system that recognizes human body pose based on a large number of pose samples. This scheme depends heavily on the sampled poses and requires a long training time. Because the training database cannot cover all complex poses of all body shapes, performance drops significantly when complex poses are estimated. U.S. patent application US20100197390, "Pose tracking pipeline", discloses a scheme that generates body parts from pixel clusters; it depends on the pixel labeling result disclosed in US20100278384. In that scheme the pose estimation algorithm is relatively complex and runs in a single-level configuration space, so the accuracy of the estimated pose is limited. U.S. patent applications US20090252423 "Controlled human pose estimation from depth image streams", US2010049675A1 "Recovery of 3D Human Pose by Jointly Learning Metrics and Mixtures of Experts", and US2011025834A1 "Method and apparatus of identifying human body posture" have similar problems: they are applicable only to simple poses, their accuracy drops when complex poses are estimated, or their computational load is too large for real-time systems.
In summary, traditional human pose estimation schemes have two main problems. First, they rely too heavily on pose sample data; in practice it is very difficult to collect pose samples covering different body shapes and both simple and complex poses, and an oversized sample database also burdens the machine learning process. Second, the estimated poses are not classified (for example, simple versus complex poses, or frontal/side/crossed-limb poses), which makes it difficult to infer the human body pose accurately.
Summary of the invention
The object of the present invention is to provide an apparatus and method that fuse the minimum energy skeleton scanning technique and the pixel labeling technique in a complementary fashion to estimate human body pose, so that the pose can be estimated fairly accurately from a human depth image without relying on a huge database of pose samples.
According to an aspect of the present invention, an apparatus for estimating human body pose is provided, comprising: an image acquisition unit for obtaining a depth image containing a human object; a part feature detection unit for extracting the human object from the obtained depth image and detecting each candidate body part and its features, performing minimum energy skeleton scanning on the depth image to determine multiple skeleton points and build a MESS skeleton, and building a PIXLA skeleton for each candidate part from its pixel labeling result and depth distribution; a part generation unit for generating a part hypothesis for each body part by fusing the result of the minimum energy skeleton scan with the pixel labeling result; and a pose determination unit for assembling the part hypotheses into at least one pose hypothesis, evaluating each pose hypothesis according to a pose evaluation criterion, and determining the human body pose.
Preferably, for any candidate part, the part feature detection unit determines the continuous skeleton points of the part according to the pixel label of each pixel in the candidate part and the depth continuity between pixels, so as to build the PIXLA skeleton of the part.
Preferably, the part generation unit generates a head hypothesis from the detected candidate head, and evaluates the head hypothesis according to information about the torso region and the PIXLA confidence of the pixels in the generated head hypothesis.
Preferably, the part generation unit estimates a torso hypothesis by fusing the result of the minimum energy skeleton scan, the pixel labeling result, and the detected candidate head.
Preferably, the part generation unit determines a rough torso region from the foreground of the minimum energy skeleton scan, estimates the 2D torso direction, removes non-torso pixels from the rough torso region based on the pixel labeling result, performs 2D torso modeling on the rough torso region, and determines the 3D shoulder and 3D pelvis using the shoulder/pelvis pixels detected by pixel labeling around the upper/lower torso, respectively.
Preferably, the part generation unit performs 2D torso modeling on the rough torso region as follows: determining the torso top based on the head region; determining the torso bottom based on the body centroid and the leg region; and determining the final torso region from the rough torso region by projecting along the torso's tilt direction under torso-size constraints to determine the torso's left and right boundaries.
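As an illustration of the 2D torso-modeling steps just described, the following Python sketch determines a torso box from a foreground mask. The input formats, the centroid-mirroring rule for the torso bottom, and the half-width threshold are illustrative assumptions, not details taken from the patent:

```python
import numpy as np

def model_torso_2d(foreground, head_bottom_y, leg_top_y, tilt_rad, max_half_width):
    """Rough 2D torso modeling following the steps described above.

    foreground     : HxW boolean mask of body pixels (assumed input format)
    head_bottom_y  : row where the detected head region ends -> torso top
    leg_top_y      : row where the leg region begins -> bounds the torso bottom
    tilt_rad       : estimated 2D torso tilt (radians, 0 = upright)
    max_half_width : torso size constraint (pixels from the torso axis)
    """
    ys, xs = np.nonzero(foreground)
    cy, cx = ys.mean(), xs.mean()               # body centroid
    top = head_bottom_y
    # torso bottom from the centroid (mirrored about it) and the leg region;
    # the mirroring rule is an assumed heuristic
    bottom = min(leg_top_y, int(2 * cy - top))

    # keep foreground pixels between torso top and bottom
    band = (ys >= top) & (ys <= bottom)
    ys_b, xs_b = ys[band], xs[band]

    # project onto the direction perpendicular to the tilted torso axis and
    # keep pixels within the torso-size constraint
    perp = xs_b * np.cos(tilt_rad) - ys_b * np.sin(tilt_rad)
    axis = cx * np.cos(tilt_rad) - cy * np.sin(tilt_rad)
    keep = np.abs(perp - axis) <= max_half_width
    left, right = xs_b[keep].min(), xs_b[keep].max()
    return top, bottom, int(left), int(right)
```

On an upright body (tilt 0), the projection reduces to clipping columns around the centroid column, which matches the intent of the left/right boundary step.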
Preferably, the part generation unit also recognizes complex poses of the human body from the obtained depth image, and re-labels the candidate parts involved in the complex pose.
The complex poses may include crossed legs and arms raised overhead.
Preferably, the part generation unit uses the MESS result, the pixel labeling result, and the motion region in a complementary fashion to produce a small number of hypotheses for each limb part.
Preferably, the part generation unit produces left forearm hypotheses by performing the following operations: when the left elbow and the left wrist are detected, producing a left forearm hypothesis by connecting the left elbow and the left wrist (left hand); when the left forearm and the left wrist are detected, producing a left forearm hypothesis by connecting the left forearm and the left wrist; when the left elbow and the left forearm are detected, producing a left forearm hypothesis by connecting the left forearm and the left elbow; producing a left forearm hypothesis from the PIXLA skeleton extracted from the detected left forearm; producing a left forearm hypothesis from a MESS skeleton that lies in the upper body but does not belong to the head; when no reliable left forearm is found, producing a left forearm hypothesis from the motion region near the torso, where the motion region is detected by frame differencing; removing overlapping hypotheses from among the multiple left forearm hypotheses produced; and weighting each produced left forearm hypothesis and removing low-weight hypotheses, where the weight of a left forearm hypothesis is determined by the number of its pixels falling in the foreground region and the probability that those pixels belong to the left forearm. The part generation unit performs similar operations to produce hypotheses for the right arm, left leg, and right leg parts.
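The weighting and pruning of forearm hypotheses described above can be sketched as follows. The segment representation, the rasterization, and the exact weight formula (foreground coverage times mean part probability) are illustrative assumptions; the patent only states that the weight depends on the number of foreground pixels and their part probability:

```python
import numpy as np

def weight_hypothesis(seg, foreground, part_prob):
    """Weight a forearm hypothesis given as a 2D line segment.

    seg        : ((x0, y0), (x1, y1)) endpoints in pixel coordinates
    foreground : HxW boolean mask of body pixels
    part_prob  : HxW array, per-pixel probability of the part label
    """
    (x0, y0), (x1, y1) = seg
    # rasterize the segment into pixel samples
    n = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
    xs = np.linspace(x0, x1, n).round().astype(int)
    ys = np.linspace(y0, y1, n).round().astype(int)
    in_fg = foreground[ys, xs]
    if not in_fg.any():
        return 0.0
    # foreground coverage times mean part probability of the covered pixels
    return in_fg.mean() * part_prob[ys[in_fg], xs[in_fg]].mean()

def prune(hypotheses, foreground, part_prob, min_weight=0.3):
    """Drop low-weight hypotheses; keep the rest sorted best-first."""
    scored = [(weight_hypothesis(h, foreground, part_prob), h) for h in hypotheses]
    return [h for w, h in sorted(scored, reverse=True) if w >= min_weight]
```

The `min_weight` threshold is an assumed value; in the patent's scheme the overlap-removal step would run alongside this pruning.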
Preferably, the part generation unit also removes unreasonable part hypotheses according to the relationships between different parts, so as to select the part hypotheses.
Preferably, the pose determination unit comprises: a pose classification module for assembling the part hypotheses of the body into the at least one pose hypothesis and determining, from at least one part parameter of each pose hypothesis, the probability distribution of the pose hypothesis over predefined pose classes; and a pose evaluation module for refining each pose hypothesis's probability distribution over the predefined pose classes using at least one part binding feature of the pose hypothesis, and then determining as the human body pose the pose hypothesis corresponding to the maximum probability value among the probability distributions of all pose hypotheses after evaluation.
Preferably, the pose classification module determines, based on a machine learning algorithm, the probability distribution of each pose hypothesis over the predefined pose classes from the part parameters of the pose hypothesis.
Preferably, the at least one part parameter includes at least one of: the direction of the torso region, the distance between an arm part and the torso region, the size of the intersection region between arm parts, and the size of the intersection region between leg parts.
Preferably, the at least one part binding feature includes at least one of: the 2D or 3D length of an arm part, the 2D or 3D length of a leg part, the depth continuity within an arm or leg, the depth contrast between an arm or leg and its surroundings in the direction perpendicular to the limb axis, the foreground coverage of each part, the depth consistency of each part along its axis, and the distance and angle between adjacent parts.
According to another aspect of the present invention, a method for estimating human body pose is provided, comprising: A) obtaining a depth image containing a human object; B) extracting the human object from the obtained depth image and detecting each candidate body part and its features, performing minimum energy skeleton scanning on the depth image to determine multiple skeleton points and build a MESS skeleton, and building a PIXLA skeleton for each candidate part from its pixel labeling result and depth distribution; C) producing a part hypothesis for each body part by fusing the result of the minimum energy skeleton scan with the pixel labeling result; and D) assembling the part hypotheses into at least one pose hypothesis, evaluating each pose hypothesis according to a pose evaluation criterion, and determining the human body pose.
Preferably, for any candidate part, the continuous skeleton points of the part are determined according to the pixel label of each pixel in the candidate part and the depth continuity between pixels, so as to build the PIXLA skeleton of the part.
Preferably, in step C), a head hypothesis is generated from the detected candidate head and evaluated according to information about the torso region and the PIXLA confidence of the pixels in the generated head hypothesis.
Preferably, in step C), a torso hypothesis is estimated by fusing the result of the minimum energy skeleton scan, the pixel labeling result, and the detected candidate head.
Preferably, in step C), a rough torso region is determined from the foreground of the minimum energy skeleton scan, the 2D torso direction is estimated, non-torso pixels are removed from the rough torso region based on the pixel labeling result, 2D torso modeling is performed on the rough torso region, and the 3D shoulder and 3D pelvis are determined using the shoulder/pelvis pixels detected by pixel labeling around the upper/lower torso, respectively.
Preferably, in step C), 2D torso modeling is performed on the rough torso region as follows: determining the torso top based on the head region; determining the torso bottom based on the body centroid and the leg region; and determining the final torso region from the rough torso region by projecting along the torso's tilt direction under torso-size constraints to determine the torso's left and right boundaries.
Preferably, in step C), complex poses of the human body are also recognized from the obtained depth image, and the candidate parts involved in the complex pose are re-labeled.
The complex poses may include crossed legs and arms raised overhead.
Preferably, in step C), the MESS result, the pixel labeling result, and the motion region are used in a complementary fashion to produce a small number of limb hypotheses.
Preferably, in step C), left forearm hypotheses are produced by performing the following operations: when the left elbow and the left wrist are detected, producing a left forearm hypothesis by connecting the left elbow and the left wrist (left hand); when the left forearm and the left wrist are detected, producing a left forearm hypothesis by connecting the left forearm and the left wrist; when the left elbow and the left forearm are detected, producing a left forearm hypothesis by connecting the left forearm and the left elbow; producing a left forearm hypothesis from the PIXLA skeleton extracted from the detected left forearm; producing a left forearm hypothesis from a MESS skeleton that lies in the upper body but does not belong to the head; when no reliable left forearm is found, producing a left forearm hypothesis from the motion region near the torso, where the motion region is detected by frame differencing; removing overlapping hypotheses from among the multiple left forearm hypotheses produced; and weighting each produced left forearm hypothesis and removing low-weight hypotheses, where the weight of a left forearm hypothesis is determined by the number of its pixels falling in the foreground region and the probability that those pixels belong to the left forearm. In step C), similar operations are also performed to produce hypotheses for the right arm, left leg, and right leg parts.
Preferably, in step C), unreasonable part hypotheses are also removed according to the relationships between different parts, so as to select the part hypotheses.
Preferably, step D) comprises: assembling the part hypotheses of the body into the at least one pose hypothesis, and determining, from at least one part parameter of each pose hypothesis, the probability distribution of the pose hypothesis over predefined pose classes; and refining each pose hypothesis's probability distribution over the predefined pose classes using at least one part binding feature of the pose hypothesis, and then determining as the human body pose the pose hypothesis corresponding to the maximum probability value among the probability distributions of all pose hypotheses after evaluation.
Preferably, the probability distribution of each pose hypothesis over the predefined pose classes is determined from the part parameters of the pose hypothesis based on a machine learning algorithm.
Preferably, the at least one part parameter includes at least one of: the direction of the torso region, the distance between an arm part and the torso region, the size of the intersection region between arm parts, and the size of the intersection region between leg parts.
Preferably, the at least one part binding feature includes at least one of: the 2D or 3D length of an arm part, the 2D or 3D length of a leg part, the depth continuity within an arm or leg, the depth contrast between an arm or leg and its surroundings in the direction perpendicular to the limb axis, the foreground coverage of each part, the depth consistency of each part along its axis, and the distance and angle between adjacent parts.
Brief description of the drawings
The above and other objects and features of the present invention will become apparent from the following description taken in conjunction with the accompanying drawings, which illustrate exemplary embodiments of the invention, in which:
Fig. 1 is a block diagram of a system for human-computer interaction based on a user's body pose according to an exemplary embodiment of the present invention;
Fig. 2 is a logic diagram of an apparatus for estimating human body pose according to an exemplary embodiment of the present invention;
Fig. 3 is an overall flowchart of a method for estimating human body pose according to an exemplary embodiment of the present invention;
Fig. 4 illustrates examples of body parts and features detected from a depth image;
Fig. 5 illustrates examples of the MESS skeleton and PIXLA skeletons built according to an exemplary embodiment of the present invention;
Fig. 6 is a flowchart of the processing of step S330 in Fig. 3;
Fig. 7 is a flowchart of producing a torso hypothesis according to an exemplary embodiment of the present invention;
Fig. 8 shows an example of an estimated torso according to an exemplary embodiment of the present invention;
Fig. 9 and Fig. 10 show examples of complex pose classification according to an exemplary embodiment of the present invention;
Fig. 11 shows an example of an estimated human body pose according to an exemplary embodiment of the present invention.
Detailed description
Exemplary embodiments of the present invention will now be described in detail, examples of which are illustrated in the accompanying drawings, wherein like reference numerals refer to like parts throughout. The embodiments are described below with reference to the drawings in order to explain the present invention.
Fig. 1 illustrates a block diagram of a system for human-computer interaction based on a user's body pose according to an exemplary embodiment of the present invention. Referring to Fig. 1, the human-computer interaction system comprises an input interface unit 110, a pose estimation device 120, a display interface unit 130, a network interface unit 140, and an application layer interface 150.
The input interface unit 110 receives input data (such as depth images and color images) from devices such as a depth camera, color camera, or stereo camera.
The pose estimation device 120 performs human body pose estimation using the depth images received from the input interface unit 110. The pose estimation device 120 may also perform other image processing for human-computer interaction, such as motion detection and color processing.
The display interface unit 130 displays the input data from the input interface unit 110, the human body pose stream data from the pose estimation device 120, and other image processing results (which may include, but are not limited to, pose data, current motion speed, acceleration, and body and skeleton size).
The network interface unit 140 can send the data output by the pose estimation device 120 over a local area network, the Internet, or a wireless network, and can receive related data. The application layer interface 150 receives the pose stream data from the pose estimation device 120, identifies the user's intention, and provides relevant feedback to the user.
As described above, the pose estimation device 120 can be integrated into an embedded system to provide an automatic pose estimation function.
Fig. 2 illustrates a logic diagram of an apparatus for estimating human body pose according to an exemplary embodiment of the present invention. The apparatus may be implemented as the pose estimation device 120 in Fig. 1 or as a component thereof.
Referring to Fig. 2, the apparatus for estimating human body pose comprises an image acquisition unit 210, a part feature detection unit 220, a part generation unit 230, a pose determination unit 240, and a pose output unit 250.
The image acquisition unit 210 obtains a depth image containing a human object, for example by receiving the depth image through the input interface unit 110 of Fig. 1.
The part feature detection unit 220 extracts the human object from the depth image obtained by the image acquisition unit 210 and detects each candidate body part and its features, performs minimum energy skeleton scanning (MESS) on the depth image to determine multiple skeleton points and build a MESS skeleton, and builds a PIXLA skeleton for each candidate part from its pixel labeling (PIXLA) result and depth distribution.
The part feature detection unit 220 can use various existing human object detection and extraction techniques to extract the human object from the depth image, performing pixel labeling on the depth image during the detection and extraction. On this basis, the part feature detection unit 220 can detect the candidate parts and features of each body part using various existing human body detection techniques. The detected candidate parts include, but are not limited to, rigid or near-rigid parts and joint parts, and the detected features include, but are not limited to, foreground features and shape features. Chinese patent application No. 201210141357.2 ("human body part detection method") discloses a method for detecting human body parts from a depth image of a human object. Fig. 4 shows examples of detected parts and features. Although the body parts and features detected/extracted by the part feature detection unit 220 may not reach a 100% detection rate or accuracy, a 0% false alarm rate can be achieved.
The part feature detection unit 220 also performs MESS on the depth image to determine multiple skeleton points and build the MESS skeleton. Chinese patent application No. 201210176875.8 ("human body image parsing apparatus and method") discloses a technique that performs minimum energy skeleton scanning (MESS) on a human depth image to obtain human skeleton points. Of course, when the part feature detection unit 220 of the present invention performs MESS, it is not limited to the method disclosed in that Chinese patent application.
In addition, according to the present invention, the part feature detection unit 220 also builds a PIXLA skeleton for each candidate part from its pixel labeling (PIXLA) result and depth distribution. For any candidate part, the part feature detection unit 220 determines the continuous skeleton points of the part according to the pixel label of each pixel in the candidate part and the depth continuity between pixels, so as to build the PIXLA skeleton of the part.
For example, the part feature detection unit 220 can build the PIXLA skeleton of the left forearm as follows:
(1) Find the skeleton point on the center line of the detected part:
a. Find the left endpoint: scan the image leftward from the center until the depth changes sharply or multiple consecutive pixels do not belong to the left forearm;
b. Find the right endpoint: scan the image rightward from the center until the depth changes sharply or multiple consecutive pixels do not belong to the left forearm;
c. Take the midpoint between the left endpoint and the right endpoint as the skeleton point.
(2) Find continuous skeleton points one by one above the center of the part's bounding box, until no left forearm pixels remain.
(3) Similarly, find continuous skeleton points below the center of the part's bounding box.
(4) The PIXLA skeleton of the left forearm consists of the skeleton points found in (1)-(3); the line segments connecting the skeleton points represent the left forearm.
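Steps (1)-(4) above can be sketched as a per-row scanline procedure. The thresholds for a "sharp" depth change and for the run of non-part pixels that stops a scan are assumed values, since the patent does not fix them:

```python
import numpy as np

def pixla_skeleton(labels, depth, box, part_id, depth_jump=50, gap=3):
    """Scanline sketch of PIXLA skeleton extraction for one part.

    labels  : HxW int array of per-pixel part labels (PIXLA result)
    depth   : HxW depth map (same units as depth_jump)
    box     : (top, bottom, left, right) bounding box of the detected part
    part_id : label of the part whose skeleton is extracted
    """
    top, bottom, left, right = box
    cy, cx = (top + bottom) // 2, (left + right) // 2

    def endpoint(y, step):
        # scan left (step=-1) or right (step=+1) from the box center column
        x, miss = cx, 0
        while left <= x + step <= right:
            nxt = x + step
            if abs(int(depth[y, nxt]) - int(depth[y, x])) > depth_jump:
                break                      # sharp depth change
            miss = miss + 1 if labels[y, nxt] != part_id else 0
            if miss >= gap:                # several consecutive non-part pixels
                break
            x = nxt
        return x

    points = []
    for rows in (range(cy, top - 1, -1), range(cy + 1, bottom + 1)):
        for y in rows:                     # up, then down, from the box center
            if part_id not in labels[y, left:right + 1]:
                break                      # no part pixels left on this row
            lx, rx = endpoint(y, -1), endpoint(y, 1)
            points.append((y, (lx + rx) // 2))   # midpoint = skeleton point
    return sorted(points)
```

Connecting consecutive `(row, column)` points with line segments yields the PIXLA skeleton of step (4).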
Fig. 5 shows examples of detected body parts and the skeletons built from them. Panel (a) shows the label map of the body parts; (b) shows the detected parts indicated by boxes; (c) shows the PIXLA skeleton built for each part; and (d) shows the built MESS skeleton.
Because the PIXLA skeleton is extracted by combining the PIXLA result with depth continuity, when the PIXLA labeling is fairly accurate, the skeleton of a part with low depth contrast against its surroundings can still be recovered. Hence, when the depth contrast is low, the PIXLA skeleton is more complete than the MESS skeleton; as shown in Fig. 5, the PIXLA skeletons of the arms and legs in (c) are more complete than the corresponding parts of the MESS skeleton in (d). However, PIXLA skeleton extraction depends on the PIXLA result: when the labeling is poor, the part skeleton cannot be extracted, whereas the MESS skeleton may be easier to extract. The PIXLA labeling result and the MESS scanning result are therefore largely complementary, and the present invention proposes to combine them in a complementary manner to estimate human body pose.
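A minimal sketch of this complementary use of the two skeletons might look as follows. The confidence threshold and the merge-by-row rule are illustrative assumptions; the patent only states that the two results are fused complementarily:

```python
def fuse_skeletons(pixla_skel, mess_skel, label_confidence, conf_thresh=0.6):
    """Prefer the PIXLA skeleton when the pixel labeling for the part is
    confident, fall back to the MESS skeleton otherwise, and merge in rows
    that only the other source recovered.

    Skeletons are lists of (row, column) skeleton points.
    """
    if not pixla_skel:
        return list(mess_skel)             # labeling failed: MESS only
    if label_confidence >= conf_thresh:
        base, extra = pixla_skel, mess_skel
    else:
        base, extra = mess_skel, pixla_skel
    rows = {y for y, _ in base}
    # keep the preferred skeleton, add rows it missed from the other source
    merged = list(base) + [(y, x) for y, x in extra if y not in rows]
    return sorted(merged)
```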
Here, the part generation unit 230 produces the part hypotheses for each body part by fusing the result of the minimum energy skeleton scan with the pixel labeling result. The processing by which the part generation unit 230 produces the part hypotheses for each body part is described in detail later with reference to Figs. 6-10.
The pose determination unit 240 assembles the part hypotheses produced by the part generation unit 230 into at least one pose hypothesis, evaluates each pose hypothesis according to a pose evaluation criterion, and determines the human body pose. The pose evaluation criterion can be determined according to the number of produced pose hypotheses and the detected part features, and may comprise any combination of the pose's pixel labeling result and the 2D length, 3D length, and depth continuity of key parts.
According to an exemplary embodiment of the present invention, the pose determination unit 240 comprises a pose classification module and a pose evaluation module (not shown in Fig. 2).
The pose classification module assembles the part hypotheses of the body into the at least one pose hypothesis, and determines, from at least one part parameter of each pose hypothesis, the probability distribution of the pose hypothesis over predefined pose classes. The pose classification module determines this probability distribution from the part parameters based on a machine learning algorithm. The at least one part parameter includes at least one of: the direction of the torso region, the distance between an arm part and the torso region, the size of the intersection region between arm parts, and the size of the intersection region between leg parts.
Attitude evaluation module suppose the probability distribution between described predefined attitude classification for utilizing at least one position binding characteristic of each attitude supposition to assess each attitude, then by with assessment after the probability distribution of all attitudes supposition in the corresponding attitude of maximum probability value suppose and be defined as human body attitude.Described at least one position binding characteristic comprises at least one in following: in the two dimension at the two dimension at arm position or three-dimensional length, shank position or three-dimensional length, arm or shank in degree of depth continuity, arm or shank vertically the degree of depth along the degree of depth consistance at the prospect coverage rate at the contrast of direction perpendicular to axial direction and peripheral region, each position, each position, be close to distance and angle between position.
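The interplay of the two modules can be sketched as follows, purely for illustration: a classifier (here a stand-in lambda, not the disclosed machine-learned model) yields a probability distribution over pose categories, the binding features rescale those probabilities, and the maximum rescaled probability selects the pose. All names and values are hypothetical.

```python
def select_pose(pose_hyps, classify, binding_score):
    """Return the (hypothesis, category) pair whose classifier probability,
    reassessed by the part binding features, is maximal."""
    best, best_p = None, -1.0
    for hyp in pose_hyps:
        probs = classify(hyp)  # distribution over predefined pose categories
        for cat, p in probs.items():
            p_adj = p * binding_score(hyp, cat)  # assess with binding features
            if p_adj > best_p:
                best, best_p = (hyp, cat), p_adj
    return best

# Toy stand-ins for the learned classifier and the binding-feature assessment:
classify = lambda h: {"stand": 0.7, "cross": 0.3} if h == "h1" else {"stand": 0.4, "cross": 0.6}
binding = lambda h, c: 0.5 if (h, c) == ("h1", "stand") else 1.0
print(select_pose(["h1", "h2"], classify, binding))  # -> ('h2', 'cross')
```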
The pose output unit 250 outputs the data of the human body pose determined by the pose determination unit 240.
Fig. 3 is an overall flowchart illustrating a method of estimating a human body pose according to an exemplary embodiment of the present invention.
Referring to Fig. 3, at step S310 the image acquisition unit 210 obtains a depth image containing a human object.
At step S320, the part feature detection unit 220 extracts candidate parts and skeleton features of the human object's body from the obtained depth image, performs minimum-energy skeleton scanning on the depth image to determine a plurality of skeleton points and build a MESS skeleton, and builds a PIXLA skeleton of each candidate part from the pixel-labeling result and depth distribution of that part. According to another embodiment of the invention, for any candidate part, the part feature detection unit 220 determines the continuous skeleton points of the part according to the pixel label of each pixel in the candidate part and the depth continuity between pixels, so as to build the PIXLA skeleton of the part.
At step S330, the part generation unit 230 produces part hypotheses for each body part by merging the result of the minimum-energy skeleton scanning with the result of the pixel labeling. In step S330, the MESS result and the PIXLA result are merged in a complementary manner to produce hypotheses for the head, torso, limbs, and joint parts.
Fig. 6 is a flowchart illustrating the processing of the part generation unit 230 in step S330. It should be noted that the processing shown in Fig. 6 is only exemplary; the invention is not limited to it, and only part of the processing shown in Fig. 6 may be performed as needed.
Referring to Fig. 6, in step 610 the part generation unit 230 produces a head hypothesis from the detected candidate head, and evaluates the head hypothesis according to the information of the torso and the PIXLA confidence of the pixels in the produced head hypothesis. When producing the head hypothesis, a falsely detected head may be removed according to the information of the estimated torso, or a missing head may be recovered from the MESS result.
In step 620, the part generation unit 230 estimates torso hypotheses by merging the result of the minimum-energy skeleton scanning, the result of the pixel labeling, and the detected candidate head. As the largest visible part of the human body and the part connected to the limbs, the torso naturally carries rich information for motion analysis. However, in complex poses the intricate mutual occlusion between torso and arms makes it difficult to estimate the torso accurately and stably. Fig. 7 illustrates the flow of the torso estimation processing according to an exemplary embodiment of the present invention.
Referring to Fig. 7, in step 710 the part generation unit 230 determines a rough torso region according to the foreground region found by the minimum-energy skeleton scanning (MESS). According to a preferred embodiment of the invention, the rough torso region may be refined according to the PIXLA labeling result. For example, the top border of the torso may be refined by the head pixels that bound the torso, and similarly the bottom border of the torso by the leg pixels that bound it.
In step 720, the part generation unit 230 estimates the 2D torso direction.
In step 730, the part generation unit 230 removes non-torso pixels from the rough torso region based on the pixel-labeling result, computes the torso depth, and predicts the size of the torso.
In step 740, the part generation unit 230 performs 2D torso modeling on the rough torso region. According to an exemplary embodiment of the present invention, the part generation unit 230 performs the 2D torso modeling through the following processing:
A. Determine the torso top based on the head region. When no head is found, assume the head is occluded by an arm and take the depth region at the top of the torso as the head.
B. Determine the torso bottom based on the body centroid and the leg regions.
C. Determine the left and right borders of the torso by projecting along the tilt direction of the torso under the torso-size constraint, thereby determining the final torso region from the rough torso region. Suppose torso pixels have weight 2 and the remaining foreground pixels of the rough torso region have weight 1. The projection value at a position is computed by summing the weights of all pixels at that position along the tilt direction of the torso. The left and right borders are then found by sliding from the center toward the two sides until the projection value falls below a given threshold.
D. Determine the final torso region, i.e. the 6 points of the human torso model. For example, TorsoNL/TorsoNR may be found by scanning from the top center toward the left/right until the foreground border or the rough-torso border is reached; TorsoPL/TorsoPR by scanning from the bottom center toward the left/right until the foreground border or the rough-torso border is reached; and TorsoSL/TorsoSR by scanning toward the left/right from a point one third of the torso length below the top center, until the foreground border or the rough-torso border is reached.
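Step C above can be illustrated with a minimal sketch. It assumes the torso tilt has already been rectified to vertical, so projecting along the tilt direction reduces to summing pixel weights per column; the function names, pixel encoding, and threshold are illustrative only, not part of the disclosed apparatus.

```python
def torso_lr_borders(columns, is_torso, threshold):
    """columns: per-column lists of foreground pixels (tilt rectified to
    vertical). Torso-labeled pixels weigh 2, other foreground pixels 1.
    Slide from the center column outward while the next column's
    projection value stays at or above the threshold."""
    proj = [sum(2 if is_torso(px) else 1 for px in col) for col in columns]
    center = len(proj) // 2
    left = center
    while left - 1 >= 0 and proj[left - 1] >= threshold:
        left -= 1
    right = center
    while right + 1 < len(proj) and proj[right + 1] >= threshold:
        right += 1
    return left, right

# 7 columns; 't' = torso-labeled pixel, 'o' = other foreground pixel.
cols = [[], ["o"], ["t", "t"], ["t", "t", "t"], ["t", "t"], ["o"], []]
print(torso_lr_borders(cols, lambda px: px == "t", threshold=3))  # -> (2, 4)
```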
In step 750, the part generation unit 230 estimates the joint parts of the torso using the PIXLA result; specifically, it determines the 3D shoulders and the 3D pelvis using the shoulder pixels detected by pixel labeling around the upper torso and the pelvis pixels detected around the lower torso, respectively.
Through the processing of steps 710 to 750, the part generation unit 230 estimates one or more torso hypotheses. Fig. 8 illustrates an estimated torso and the corresponding joint parts.
Because some human poses are highly complex, it is difficult to distinguish the exact locations and skeletons of the limbs and other parts. Therefore, according to a preferred embodiment of the invention, the part generation unit 230 performs step 630: it identifies complex poses of the human body from the obtained depth image and relabels the candidate parts involved in each complex pose. The complex poses include, but are not limited to, crossed legs and an arm raised above the head.
Take the crossed-legs pose as an example; here it is difficult to correctly distinguish the left leg from the right leg. Leg crossing can be further divided into crossing at the upper legs, crossing at the lower legs, and one upper leg crossing the other lower leg. The crossed-legs pose may be determined by various methods (for example, methods based on machine learning). A method of determining the leg-crossing condition using the MESS depth regions and the PIXLA result is introduced below.
The front leg cannot be occluded by the other leg, and its lower leg and upper leg are continuous in depth. Therefore, a leg can be determined to be the front leg if one of the following conditions is met:
1) the upper leg and lower leg detected by PIXLA lie in the same MESS depth region, as shown in Fig. 9(a);
2) there is depth continuity from the lower-leg region (the MESS depth region in which the lower leg lies, regardless of whether it is the left or the right leg) up to the upper-leg region, as shown in Fig. 9(b).
Whether the front leg is the left or the right leg can be determined from the attributes of the detected upper leg. For the front leg, the candidate parts are relabeled based on PIXLA. Taking the case where the left leg is the front leg as an example, a right lower leg lying inside the front-leg region may be removed or relabeled as the left lower leg, and a left lower leg lying outside the front-leg region may be removed or relabeled as the right lower leg. This greatly reduces the ambiguity of the PIXLA result, so that leg hypotheses can be produced accurately.
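The two front-leg conditions and the relabeling rule can be sketched as follows; `mess_region`, `depth_continuous`, and `in_front_region` stand in for the MESS region lookup, the depth-continuity test, and the front-leg region test, and are assumptions of this illustration rather than the disclosed routines.

```python
def is_front_leg(upper, lower, mess_region, depth_continuous):
    # Condition 1: PIXLA upper and lower leg lie in the same MESS depth region.
    if mess_region(upper) == mess_region(lower):
        return True
    # Condition 2: the lower leg's region is depth-continuous up to the upper leg.
    return depth_continuous(lower, upper)

def relabel_lower_legs(front_side, lower_legs, in_front_region):
    """With the left leg in front: a 'right' lower leg inside the front-leg
    region becomes 'left', and a 'left' lower leg outside it becomes 'right'
    (mirrored when the right leg is in front)."""
    other = "right" if front_side == "left" else "left"
    relabeled = []
    for side, leg in lower_legs:
        if side == other and in_front_region(leg):
            side = front_side
        elif side == front_side and not in_front_region(leg):
            side = other
        relabeled.append((side, leg))
    return relabeled

print(relabel_lower_legs("left", [("right", "A"), ("left", "B")],
                         lambda leg: leg == "A"))  # -> [('left', 'A'), ('right', 'B')]
```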
Another complex pose is the raised-arm pose, for which the PIXLA-based shoulder and elbow detection is error-prone. As shown in Fig. 10(a), the initially detected shoulder and elbow are swapped. In this pose the detected elbow will lie above the shoulder, so the erroneously detected shoulder and elbow are relabeled. When a detected arm lies above the torso or head, the pose can be determined to be a raised arm. The positions of the detected shoulder and elbow are then compared, and if they are swapped, the shoulder and elbow labels are exchanged, which benefits the subsequent generation of arm hypotheses.
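The relabeling for the raised-arm pose can be sketched as below, using image coordinates in which y grows downward; the point layout and thresholds are assumptions of the illustration.

```python
def fix_raised_arm(shoulder, elbow, torso_top_y, head_top_y):
    """If the detected arm lies above the torso and head, the pose is a
    raised arm; in that pose the elbow must be above the shoulder, so a
    shoulder/elbow pair detected the other way round is swapped."""
    arm_top = min(shoulder[1], elbow[1])
    raised = arm_top < min(torso_top_y, head_top_y)
    if raised and elbow[1] > shoulder[1]:  # elbow below shoulder: mislabeled
        shoulder, elbow = elbow, shoulder
    return shoulder, elbow

# A raised arm whose shoulder and elbow were swapped by detection:
print(fix_raised_arm(shoulder=(10, 5), elbow=(10, 20),
                     torso_top_y=30, head_top_y=25))  # -> ((10, 20), (10, 5))
```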
As mentioned above, the complex poses are not limited to crossed legs and a raised arm; more complex poses may be defined and added, and different detection results may be selected to design the recognition of complex poses and their part relabeling in different ways.
Returning to Fig. 6, after step 630 is completed, in steps 640, 650, and 660 the part generation unit 230 uses the MESS result, the pixel-labeling result, and the motion region to produce a small number of hypotheses for each limb part in a complementary manner. Taking the generation of left-lower-arm hypotheses as an example, the following operations are performed:
When a left elbow and a left wrist are detected, a left-lower-arm hypothesis is produced by connecting the left elbow and the left wrist (left hand);
When a left lower arm and a left wrist are detected, a left-lower-arm hypothesis is produced by connecting the left lower arm and the left wrist;
When a left elbow and a left lower arm are detected, a left-lower-arm hypothesis is produced by connecting the left lower arm and the left elbow;
A left-lower-arm hypothesis is produced from the PIXLA skeleton extracted from the detected left lower arm;
A left-lower-arm hypothesis is produced from a MESS skeleton that lies in the upper body but does not belong to the head;
When no reliable left lower arm is found, a left-lower-arm hypothesis is produced from the motion region of the torso, where the motion region is detected by frame differencing;
Overlapping left-lower-arm hypotheses are removed from the multiple left-lower-arm hypotheses produced;
Each produced left-lower-arm hypothesis is weighted, and low-weight hypotheses are removed, where the weight of a left-lower-arm hypothesis is determined according to the number of its pixels that fall into the foreground region and the probability that these pixels belong to the left lower arm.
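The last two operations can be sketched as follows. The patent states only that the weight depends on the number of foreground pixels of the hypothesis and on the probability that those pixels belong to the left lower arm; multiplying the count by the mean probability is one plausible reading, assumed here purely for illustration, as are all names.

```python
def arm_hypothesis_weight(pixels, in_foreground, p_lower_arm):
    """Weight = (# hypothesis pixels in the foreground) x (their mean
    probability of belonging to the left lower arm)."""
    fg = [px for px in pixels if in_foreground(px)]
    if not fg:
        return 0.0
    return len(fg) * sum(p_lower_arm(px) for px in fg) / len(fg)

def prune_hypotheses(weighted_hyps, overlap, min_weight):
    """Keep hypotheses from highest weight down, dropping any that overlaps
    an already kept one or whose weight falls below min_weight."""
    kept = []
    for hyp, w in sorted(weighted_hyps, key=lambda hw: -hw[1]):
        if w >= min_weight and not any(overlap(hyp, k) for k, _ in kept):
            kept.append((hyp, w))
    return kept

print(prune_hypotheses([("a", 2.0), ("b", 1.5), ("c", 0.2)],
                       lambda x, y: {x, y} == {"a", "b"}, min_weight=0.5))
# -> [('a', 2.0)]
```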
Similarly, the part generation unit 230 may perform operations analogous to the left-lower-arm hypothesis generation to produce hypotheses for the right arm, the left leg, and the right leg parts. It should be appreciated that the above method is only one exemplary way of merging the MESS result, the PIXLA result, and the motion region in a complementary manner to produce hypotheses for each limb part; other strategies may be used.
In addition, different joint hypotheses may be produced from the detected candidate joint parts; for example, the center of a shoulder detected by PIXLA may be determined to be a left-shoulder hypothesis.
Afterwards, in step 670, the part generation unit 230 removes unreasonable part hypotheses according to the relations between different parts, so as to screen the part hypotheses. For example, screening may be performed so that an upper arm is close to a shoulder, a lower arm is close to its upper arm, and so on. The screening of part hypotheses may use various kinds of reasonable information and pose constraints.
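The screening of step 670 can be sketched as a set of pairwise constraints between parts; the hypothesis representation (pairs of 2D endpoints) and the predicate are illustrative assumptions, not the disclosed data structures.

```python
def screen_part_hypotheses(parts, constraints):
    """parts: part name -> list of hypotheses (here, pairs of 2D endpoints).
    constraints: (part_a, part_b, predicate) triples; a hypothesis of
    part_a survives only if some hypothesis of part_b satisfies the
    predicate, e.g. 'the lower arm starts near where the upper arm ends'."""
    screened = {}
    for name, hyps in parts.items():
        screened[name] = [
            h for h in hyps
            if all(any(pred(h, other) for other in parts[b])
                   for a, b, pred in constraints if a == name)
        ]
    return screened

# Lower arms must start within 2 pixels (L1 distance) of an upper arm's end:
near = lambda la, ua: abs(la[0][0] - ua[1][0]) + abs(la[0][1] - ua[1][1]) <= 2
parts = {"upper_arm": [((0, 0), (0, 10))],
         "lower_arm": [((0, 10), (0, 20)), ((9, 9), (9, 19))]}
print(screen_part_hypotheses(parts, [("lower_arm", "upper_arm", near)]))
# -> {'upper_arm': [((0, 0), (0, 10))], 'lower_arm': [((0, 10), (0, 20))]}
```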
The processing of step S330 in Fig. 3 has been described in detail above with reference to Figs. 6 to 10: the part generation unit 230 produces the part hypotheses of each body part by merging the result of the minimum-energy skeleton scanning with the result of the pixel labeling.
Returning now to Fig. 3, at step S340 the pose determination unit 240 assembles the part hypotheses produced by the part generation unit 230 into at least one pose hypothesis, evaluates each pose hypothesis according to a pose evaluation criterion, and determines the human body pose. The assembled pose hypotheses may be evaluated by various pose evaluation criteria and any combination thereof, such as the parameters of the key parts involved (e.g. the direction of the torso, the lengths of the arms and legs, the depth continuity along an arm or leg, the foreground coverage of each part, the depth consistency of each part, and the distances and angles between adjacent parts).
According to a preferred embodiment of the invention, step S340 comprises: the pose classification module assembles the part hypotheses of each body part into the at least one pose hypothesis, and determines, according to at least one part parameter of each pose hypothesis, the probability distribution of each pose hypothesis over predefined pose categories; the pose evaluation module assesses the probability distribution of each pose hypothesis over the predefined pose categories by using at least one part binding feature of the pose hypothesis, and then determines as the human body pose the pose hypothesis corresponding to the maximum probability value among the assessed probability distributions of all pose hypotheses.
According to a preferred embodiment of the invention, the pose classification module determines the probability distribution of each pose hypothesis over the predefined pose categories from the part parameters of the pose hypothesis on the basis of a machine learning algorithm.
According to a preferred embodiment of the invention, the at least one part parameter comprises at least one of the following: the direction of the torso, the distance between the arm parts and the torso, the size of the intersection region between the arm parts, and the size of the intersection region between the leg parts.
According to a preferred embodiment of the invention, the at least one part binding feature comprises at least one of the following: the 2D or 3D length of the arm parts, the 2D or 3D length of the leg parts, the depth continuity within an arm or leg, the contrast between the depth along the axial direction and the depth perpendicular to the axial direction within an arm or leg, the foreground coverage of each part, the depth consistency between each part and its surrounding region, and the distances and angles between adjacent parts.
Thereafter, at step S350, the pose output unit 250 outputs the information of the human body pose determined by the pose determination unit 240. The information of the human body pose may comprise the size information of each body part (e.g. 2D/3D length and 2D/3D width), the 3D positions of the joint parts, the tilt angles, and so on.
Because the method and apparatus for estimating a human body pose according to the present invention produce a relatively small amount of pose data based on effective observations, the estimation results for different poses can be obtained relatively quickly and accurately. Fig. 11 illustrates, for different human body poses, the final pose representation estimated from the captured depth image (the rightmost image in each group of four).
Therefore, by merging the MESS result and the PIXLA result in a complementary manner, the method and apparatus for estimating a human body pose according to the present invention can estimate the human body pose from a human depth image fairly accurately without relying on a huge corpus of pose samples.
Although the present invention has been particularly shown and described with reference to exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the claims.

Claims (30)

1. An apparatus for estimating a human body pose, comprising:
an image acquisition unit for obtaining a depth image containing a human object;
a part feature detection unit for extracting candidate parts and features of the human object's body from the obtained depth image, performing minimum-energy skeleton scanning on the depth image to determine a plurality of skeleton points and build a MESS skeleton, and building a PIXLA skeleton of each candidate part from the pixel-labeling result and depth distribution of that part;
a part generation unit for producing part hypotheses for each body part by merging the result of the minimum-energy skeleton scanning with the result of the pixel labeling; and
a pose determination unit for assembling the part hypotheses into at least one pose hypothesis, evaluating each pose hypothesis according to a pose evaluation criterion, and determining the human body pose.
2. The apparatus of claim 1, wherein, for any candidate part, the part feature detection unit determines the continuous skeleton points of the part according to the pixel label of each pixel in the candidate part and the depth continuity between pixels, so as to build the PIXLA skeleton of the part.
3. The apparatus of claim 2, wherein the part generation unit produces a head hypothesis from the detected candidate head and evaluates the head hypothesis according to the information of the torso and the PIXLA confidence of the pixels in the produced head hypothesis.
4. The apparatus of claim 3, wherein the part generation unit estimates torso hypotheses by merging the result of the minimum-energy skeleton scanning, the result of the pixel labeling, and the detected candidate head.
5. The apparatus of claim 4, wherein the part generation unit determines a rough torso region according to the foreground found by the minimum-energy skeleton scanning, estimates the 2D torso direction, removes non-torso pixels from the rough torso region based on the pixel-labeling result, performs 2D torso modeling on the rough torso region, and determines the 3D shoulders and the 3D pelvis using the shoulder pixels detected by pixel labeling around the upper torso and the pelvis pixels detected around the lower torso, respectively.
6. The apparatus of claim 5, wherein the part generation unit performs the 2D torso modeling on the rough torso region by:
determining the torso top based on the head region;
determining the torso bottom based on the body centroid and the leg regions;
determining the left and right borders of the torso by projecting along the tilt direction of the torso under the torso-size constraint; and
determining the final torso region from the rough torso region.
7. The apparatus of claim 6, wherein the part generation unit further identifies a complex pose of the human body from the obtained depth image, and relabels the candidate parts involved in the complex pose.
8. The apparatus of claim 7, wherein the complex pose comprises crossed legs and an arm raised above the head.
9. The apparatus of claim 8, wherein the part generation unit uses the MESS result, the pixel-labeling result, and the motion region to produce a small number of hypotheses for each limb part in a complementary manner.
10. The apparatus of claim 9, wherein the part generation unit produces left-lower-arm hypotheses by:
when a left elbow and a left wrist are detected, producing a left-lower-arm hypothesis by connecting the left elbow and the left wrist (left hand);
when a left lower arm and a left wrist are detected, producing a left-lower-arm hypothesis by connecting the left lower arm and the left wrist;
when a left elbow and a left lower arm are detected, producing a left-lower-arm hypothesis by connecting the left lower arm and the left elbow;
producing a left-lower-arm hypothesis from the PIXLA skeleton extracted from the detected left lower arm;
producing a left-lower-arm hypothesis from a MESS skeleton that lies in the upper body but does not belong to the head;
when no reliable left lower arm is found, producing a left-lower-arm hypothesis from the motion region of the torso, the motion region being detected by frame differencing;
removing overlapping left-lower-arm hypotheses from the multiple left-lower-arm hypotheses produced; and
weighting each produced left-lower-arm hypothesis and removing low-weight hypotheses, the weight of a left-lower-arm hypothesis being determined according to the number of its pixels that fall into the foreground region and the probability that these pixels belong to the left lower arm,
wherein the part generation unit performs similar operations to produce hypotheses for the right arm, the left leg, and the right leg parts.
11. The apparatus of claim 9, wherein the part generation unit further removes unreasonable part hypotheses according to the relations between different parts, so as to screen the part hypotheses.
12. The apparatus of claim 11, wherein the pose determination unit comprises:
a pose classification module for assembling the part hypotheses of each body part into the at least one pose hypothesis, and determining, according to at least one part parameter of each pose hypothesis, the probability distribution of each pose hypothesis over predefined pose categories; and
a pose evaluation module for assessing the probability distribution of each pose hypothesis over the predefined pose categories by using at least one part binding feature of the pose hypothesis, and then determining as the human body pose the pose hypothesis corresponding to the maximum probability value among the assessed probability distributions of all pose hypotheses.
13. The apparatus of claim 12, wherein the pose classification module determines the probability distribution of each pose hypothesis over the predefined pose categories from the part parameters of the pose hypothesis on the basis of a machine learning algorithm.
14. The apparatus of claim 13, wherein the at least one part parameter comprises at least one of the following: the direction of the torso, the distance between the arm parts and the torso, the size of the intersection region between the arm parts, and the size of the intersection region between the leg parts.
15. The apparatus of claim 14, wherein the at least one part binding feature comprises at least one of the following: the 2D or 3D length of the arm parts, the 2D or 3D length of the leg parts, the depth continuity within an arm or leg, the contrast between the depth along the axial direction and the depth perpendicular to the axial direction within an arm or leg, the foreground coverage of each part, the depth consistency between each part and its surrounding region, and the distances and angles between adjacent parts.
16. A method of estimating a human body pose, comprising:
A) obtaining a depth image containing a human object;
B) extracting candidate parts and features of the human object's body from the obtained depth image, performing minimum-energy skeleton scanning on the depth image to determine a plurality of skeleton points and build a MESS skeleton, and building a PIXLA skeleton of each candidate part from the pixel-labeling result and depth distribution of that part;
C) producing part hypotheses for each body part by merging the result of the minimum-energy skeleton scanning with the result of the pixel labeling; and
D) assembling the part hypotheses into at least one pose hypothesis, evaluating each pose hypothesis according to a pose evaluation criterion, and determining the human body pose.
17. The method of claim 16, wherein in step B), for any candidate part, the continuous skeleton points of the part are determined according to the pixel label of each pixel in the candidate part and the depth continuity between pixels, so as to build the PIXLA skeleton of the part.
18. The method of claim 17, wherein in step C), a head hypothesis is produced from the detected candidate head and is evaluated according to the information of the torso and the PIXLA confidence of the pixels in the produced head hypothesis.
19. The method of claim 18, wherein in step C), torso hypotheses are estimated by merging the result of the minimum-energy skeleton scanning, the result of the pixel labeling, and the detected candidate head.
20. The method of claim 19, wherein in step C), a rough torso region is determined according to the foreground found by the minimum-energy skeleton scanning, the 2D torso direction is estimated, non-torso pixels are removed from the rough torso region based on the pixel-labeling result, 2D torso modeling is performed on the rough torso region, and the 3D shoulders and the 3D pelvis are determined using the shoulder pixels detected by pixel labeling around the upper torso and the pelvis pixels detected around the lower torso, respectively.
21. The method of claim 20, wherein in step C), the 2D torso modeling is performed on the rough torso region by:
determining the torso top based on the head region;
determining the torso bottom based on the body centroid and the leg regions;
determining the left and right borders of the torso by projecting along the tilt direction of the torso under the torso-size constraint; and
determining the final torso region from the rough torso region.
22. The method of claim 21, wherein in step C), a complex pose of the human body is further identified from the obtained depth image, and the candidate parts involved in the complex pose are relabeled.
23. The method of claim 22, wherein the complex pose comprises crossed legs and an arm raised above the head.
24. The method of claim 23, wherein in step C), the MESS result, the pixel-labeling result, and the motion region are used to produce a small number of limb hypotheses in a complementary manner.
25. The method of claim 24, wherein in step C), left-lower-arm hypotheses are produced by:
when a left elbow and a left wrist are detected, producing a left-lower-arm hypothesis by connecting the left elbow and the left wrist (left hand);
when a left lower arm and a left wrist are detected, producing a left-lower-arm hypothesis by connecting the left lower arm and the left wrist;
when a left elbow and a left lower arm are detected, producing a left-lower-arm hypothesis by connecting the left lower arm and the left elbow;
producing a left-lower-arm hypothesis from the PIXLA skeleton extracted from the detected left lower arm;
producing a left-lower-arm hypothesis from a MESS skeleton that lies in the upper body but does not belong to the head;
when no reliable left lower arm is found, producing a left-lower-arm hypothesis from the motion region of the torso, the motion region being detected by frame differencing;
removing overlapping left-lower-arm hypotheses from the multiple left-lower-arm hypotheses produced; and
weighting each produced left-lower-arm hypothesis and removing low-weight hypotheses, the weight of a left-lower-arm hypothesis being determined according to the number of its pixels that fall into the foreground region and the probability that these pixels belong to the left lower arm,
wherein in step C), similar operations are further performed to produce hypotheses for the right arm, the left leg, and the right leg parts.
26. The method of claim 24, wherein in step C), unreasonable part hypotheses are further removed according to the relations between different parts, so as to screen the part hypotheses.
27. The method of claim 26, wherein step D) comprises:
assembling the part hypotheses of each body part into the at least one pose hypothesis, and determining, according to at least one part parameter of each pose hypothesis, the probability distribution of each pose hypothesis over predefined pose categories; and
assessing the probability distribution of each pose hypothesis over the predefined pose categories by using at least one part binding feature of the pose hypothesis, and then determining as the human body pose the pose hypothesis corresponding to the maximum probability value among the assessed probability distributions of all pose hypotheses.
28. The method of claim 27, wherein the probability distribution of each pose hypothesis over the predefined pose categories is determined from the part parameters of the pose hypothesis on the basis of a machine learning algorithm.
29. The method of claim 28, wherein the at least one part parameter comprises at least one of the following: the direction of the torso, the distance between the arm parts and the torso, the size of the intersection region between the arm parts, and the size of the intersection region between the leg parts.
30. The method of claim 29, wherein the at least one part binding feature comprises at least one of the following: the 2D or 3D length of the arm parts, the 2D or 3D length of the leg parts, the depth continuity within an arm or leg, the contrast between the depth along the axial direction and the depth perpendicular to the axial direction within an arm or leg, the foreground coverage of each part, the depth consistency between each part and its surrounding region, and the distances and angles between adjacent parts.
CN201310088425.8A 2013-03-19 2013-03-19 Device and method for estimating human body posture Expired - Fee Related CN104063677B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201310088425.8A CN104063677B (en) 2013-03-19 2013-03-19 Device and method for estimating human body posture
KR1020140001444A KR20140114741A (en) 2013-03-19 2014-01-06 Apparatus and method for human pose estimation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310088425.8A CN104063677B (en) 2013-03-19 2013-03-19 Device and method for estimating human body posture

Publications (2)

Publication Number Publication Date
CN104063677A true CN104063677A (en) 2014-09-24
CN104063677B CN104063677B (en) 2019-04-30

Family

ID=51551384

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310088425.8A Expired - Fee Related CN104063677B (en) 2013-03-19 2013-03-19 Device and method for estimating human body posture

Country Status (2)

Country Link
KR (1) KR20140114741A (en)
CN (1) CN104063677B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106600626A (en) * 2016-11-01 2017-04-26 中国科学院计算技术研究所 Three-dimensional human body movement capturing method and system
WO2017166019A1 (en) * 2016-03-28 2017-10-05 Xiaogang Wang Method and system for pose estimation
CN107492108A (en) * 2017-08-18 2017-12-19 成都通甲优博科技有限责任公司 Deep-learning-based skeleton line extraction algorithm, system and storage medium
WO2017215668A1 (en) * 2016-06-16 2017-12-21 北京市商汤科技开发有限公司 Posture estimation method and apparatus, and computer system
CN108062526A (en) * 2017-12-15 2018-05-22 厦门美图之家科技有限公司 Human posture estimation method and mobile terminal
CN108154169A (en) * 2017-12-11 2018-06-12 北京小米移动软件有限公司 Image processing method and device
CN108245893A (en) * 2018-02-09 2018-07-06 腾讯科技(深圳)有限公司 Posture determination method, device and medium for virtual objects in a three-dimensional virtual environment
CN109508698A (en) * 2018-12-19 2019-03-22 中山大学 Human behavior recognition method based on binary tree
CN110347255A (en) * 2019-07-03 2019-10-18 死海旅游度假有限公司 Somatosensory interaction system based on 3D dynamic simulation technology
CN111316283A (en) * 2017-10-31 2020-06-19 Sk电信有限公司 Gesture recognition method and device
US20230018900A1 (en) * 2019-10-07 2023-01-19 Kabushiki Kaisha Tokai Rika Denki Seisakusho Image processing device, and non-transitory computer-readable medium

Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
CN105701822B (en) * 2016-01-15 2018-05-08 上海交通大学 Method for extracting the human body side-leg skeleton based on two-dimensional images
KR101896827B1 (en) * 2017-02-15 2018-09-07 연세대학교 산학협력단 Apparatus and Method for Estimating Pose of User
CN109377513B (en) * 2018-09-20 2021-04-27 浙江大学 Global three-dimensional human body posture credible estimation method for two views
KR102345760B1 (en) * 2018-11-14 2021-12-31 한국과학기술원 Target recognizing method and apparatus
CN111461017B (en) * 2020-04-01 2024-01-19 杭州视在科技有限公司 High-precision recognition method for restaurant kitchen work clothes at city scale
GB2623085A (en) * 2022-10-04 2024-04-10 Continental Automotive Gmbh Method and system of pose estimation
CN116843748B (en) * 2023-09-01 2023-11-24 上海仙工智能科技有限公司 Remote two-dimensional code and object space pose acquisition method and system thereof

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100278384A1 (en) * 2009-05-01 2010-11-04 Microsoft Corporation Human body pose estimation
CN101982836A (en) * 2010-10-14 2011-03-02 西北工业大学 Mark point identification initializing method based on principal components analysis (PCA) in motion capture system
CN102609684A (en) * 2012-01-16 2012-07-25 宁波江丰生物信息技术有限公司 Human body posture detection method and device


Non-Patent Citations (1)

Title
JUNCHI YAN et al.: "Simultaneous 3D human motion tracking and voxel reconstruction", Optical Engineering *

Cited By (19)

Publication number Priority date Publication date Assignee Title
WO2017166019A1 (en) * 2016-03-28 2017-10-05 Xiaogang Wang Method and system for pose estimation
US10891471B2 (en) 2016-03-28 2021-01-12 Beijing Sensetime Technology Development Co., Ltd Method and system for pose estimation
WO2017215668A1 (en) * 2016-06-16 2017-12-21 北京市商汤科技开发有限公司 Posture estimation method and apparatus, and computer system
US10482624B2 (en) 2016-06-16 2019-11-19 Beijing Sensetime Technology Development Co., Ltd. Posture estimation method and apparatus, and computer system
CN106600626A (en) * 2016-11-01 2017-04-26 中国科学院计算技术研究所 Three-dimensional human body movement capturing method and system
CN106600626B (en) * 2016-11-01 2020-07-31 中国科学院计算技术研究所 Three-dimensional human motion capture method and system
CN107492108A (en) * 2017-08-18 2017-12-19 成都通甲优博科技有限责任公司 Deep-learning-based skeleton line extraction algorithm, system and storage medium
CN111316283A (en) * 2017-10-31 2020-06-19 Sk电信有限公司 Gesture recognition method and device
CN111316283B (en) * 2017-10-31 2023-10-17 Sk电信有限公司 Gesture recognition method and device
CN108154169A (en) * 2017-12-11 2018-06-12 北京小米移动软件有限公司 Image processing method and device
CN108062526A (en) * 2017-12-15 2018-05-22 厦门美图之家科技有限公司 Human posture estimation method and mobile terminal
CN108245893B (en) * 2018-02-09 2021-06-29 腾讯科技(深圳)有限公司 Method, device and medium for determining posture of virtual object in three-dimensional virtual environment
US11087537B2 (en) 2018-02-09 2021-08-10 Tencent Technology (Shenzhen) Company Limited Method, device and medium for determining posture of virtual object in virtual environment
CN108245893A (en) * 2018-02-09 2018-07-06 腾讯科技(深圳)有限公司 Posture determination method, device and medium for virtual objects in a three-dimensional virtual environment
CN109508698A (en) * 2018-12-19 2019-03-22 中山大学 Human behavior recognition method based on binary tree
CN109508698B (en) * 2018-12-19 2023-01-10 中山大学 Human behavior recognition method based on binary tree
CN110347255A (en) * 2019-07-03 2019-10-18 死海旅游度假有限公司 Somatosensory interaction system based on 3D dynamic simulation technology
CN110347255B (en) * 2019-07-03 2023-06-13 灶灶科技有限公司 Somatosensory interaction system based on 3D dynamic simulation technology
US20230018900A1 (en) * 2019-10-07 2023-01-19 Kabushiki Kaisha Tokai Rika Denki Seisakusho Image processing device, and non-transitory computer-readable medium

Also Published As

Publication number Publication date
CN104063677B (en) 2019-04-30
KR20140114741A (en) 2014-09-29

Similar Documents

Publication Publication Date Title
CN104063677A (en) Equipment used for estimating human body posture and method thereof
US10417775B2 (en) Method for implementing human skeleton tracking system based on depth data
CN103718175B (en) Apparatus, method and medium for detecting subject poses
Kamal et al. A hybrid feature extraction approach for human detection, tracking and activity recognition using depth sensors
JP6369534B2 (en) Image processing apparatus, image processing method, and image processing program
Kim et al. Simultaneous gesture segmentation and recognition based on forward spotting accumulative HMMs
CN106598227B (en) Gesture identification method based on Leap Motion and Kinect
CN104115192B (en) Improvements in or relating to three-dimensional close interaction
CN103376890B (en) Vision-based gesture remote control system
CN107292234B (en) Indoor scene layout estimation method based on information edge and multi-modal features
CN102184541B (en) Multi-objective optimized human body motion tracking method
CN107357427A (en) Gesture recognition control method for virtual reality device
CN104850219A (en) Equipment and method for estimating posture of human body attached with object
CN103839040A (en) Gesture identification method and device based on depth images
CN101256673A (en) Method for tracing arm motion in real time video tracking system
Yan et al. Estimating worker-centric 3D spatial crowdedness for construction safety management using a single 2D camera
CN102622766A (en) Multi-objective optimization multi-lens human motion tracking method
CN108898063A (en) Human body posture recognition device and method based on fully convolutional neural networks
KR20080075730A (en) Method for estimating location using object recognition of a robot
JP2007066094A (en) Posture estimation device and posture estimation method
CN104616028A (en) Method for recognizing posture and action of human limbs based on space division learning
CN104573612A (en) Equipment and method for estimating postures of multiple overlapped human body objects in range image
She et al. A real-time hand gesture recognition approach based on motion features of feature points
CN102156994B (en) Joint positioning method for single-view markerless human motion tracking
CN103426000A (en) Method for detecting static gesture fingertip

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190430