CN101794384B - Shooting action identification method based on human body skeleton map extraction and grouping motion diagram inquiry - Google Patents
- Publication number
- CN101794384B CN101794384B CN2010101229166A CN201010122916A CN101794384B CN 101794384 B CN101794384 B CN 101794384B CN 2010101229166 A CN2010101229166 A CN 2010101229166A CN 201010122916 A CN201010122916 A CN 201010122916A CN 101794384 B CN101794384 B CN 101794384B
- Authority
- CN
- China
- Prior art keywords
- group
- image
- action
- motion
- human body
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Landscapes
- Image Analysis (AREA)
Abstract
The invention discloses a shooting-action recognition method based on human-body contour extraction and grouped motion-graph queries. The method comprises the following steps: shooting actions are captured into a database in advance and grouped by class; a motion graph is constructed for each group; all poses are rendered as two-dimensional images from multiple viewpoints, key features are extracted, and an image feature value is computed for each pose. During operation, accurate contour extraction is performed on the sequence of pictures taken while a person shoots, the feature value of each contour image is computed, and the group containing the pose most similar to that feature value is found in the database and recorded as a hit group. The group hit most often across all contour images of the shooting action is selected; on its motion graph, the node whose pose is most similar to each contour image's feature value is then located, and these points are analyzed and repaired into one continuous segment, which is taken as the action recognition result. The invention can quickly and accurately recognize shooting actions using only image capture devices.
Description
Technical field
The present invention relates to a shooting-action recognition method based on human-body contour extraction and grouped motion-graph queries; more specifically, it relates to a method that recognizes a person's shooting action by means of the person's contour images and a motion graph formed from pre-captured shooting-action data.
Background technology
The recognition of human motion has long been a research focus in computer vision. Its goal is to let machines identify human actions, including body movements, gestures, and so on. It is also a brand-new mode of human-machine interaction: instead of relying on marker points or other hardware attached to the body, the interaction is driven purely by the person's own motion. It has broad potential applications in industry, and huge prospects in digital entertainment industries such as film, television, and games.
Existing mature motion recognition technology relies mainly on special hardware devices such as motion capture equipment, which must record a person's three-dimensional position by attaching marker points to the body; this greatly degrades the interactive experience. Markerless motion recognition is a good alternative: it requires no extra hardware on the user. It relies mainly on cameras to acquire images and then analyzes the two-dimensional images to estimate three-dimensional motion. Since recovering three dimensions from two is an under-constrained problem with multiple solutions, one hundred percent recognition accuracy cannot be reached, but in some applications (such as interactive games) the achievable accuracy is sufficient.
The present invention obtains high-quality human contours by removing the person's shadow and environmental interference from the images, organizes the captured data into motion graphs, and applies a multi-segment expansion and local search strategy on those graphs, achieving fast markerless recognition of basketball shooting motions.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a shooting-action recognition method based on human-body contour extraction and grouped motion-graph queries.
The shooting-action recognition method based on human-body contour extraction and grouped motion-graph queries comprises the following steps:
1) Shooting actions captured in advance with motion capture equipment are divided into broad groups according to orientation and left/right hand; the actions in each group are then decomposed into independent three-dimensional poses, and the poses in each group are built into a motion graph.
2) Each pose on each group's motion graph is rendered as two-dimensional images from multiple viewpoints, and machine learning is used to extract an image feature vector from the images of each group at each viewpoint.
3) The two-dimensional images of each pose from all viewpoints in step 2) are stitched into one image, and the feature vector from step 2) is used to compute the feature value of this stitched image, called the feature value of the pose.
4) Pictures of the basketball action recognition system's scene are taken from each viewpoint when nobody is present; these are called backgrounds.
5) During action recognition, pictures of the person's entire shooting-action sequence are taken from each viewpoint; these are called foreground pictures.
6) Contour extraction is performed at each viewpoint as follows: the foreground picture is compared with the background picture to obtain a difference image; the position of the bounding box containing the human body is then determined from the difference image; a precise image of the body is extracted within the bounding box; finally the body's shadow interference is removed, and the resulting image is called the contour image.
7) The contour images from all viewpoints are stitched together, and the feature vector extracted in step 2) is used to compute a feature value. Among the groups from step 1), the pose most similar to this feature value is found, and the group containing that nearest pose is called the hit group of this frame.
8) The hit group of every frame of the shooting action is found, and the group hit most often wins the vote; it is called the hit group of the whole action.
9) On the hit group of the action, the node holding the pose whose feature value is closest to each frame is found; the relations among these points are analyzed, and multi-segment expansion and local search are used to repair them into a continuous segment on the motion graph. The action sequence this segment represents is the recognition result of the person's shooting action.
The step of dividing the pre-captured shooting actions into broad groups according to orientation and left/right hand, decomposing the actions in each group into independent three-dimensional poses, and building the poses in each group into a motion graph comprises:
1) The three-dimensional motions captured with the motion capture equipment are divided into four groups. The first group contains forward left-hand, forward two-hand, and left-side-body left-hand shots; the second group contains forward right-hand, forward two-hand, and right-side-body right-hand shots; the third group contains backward two-hand, backward left-hand, and left-side-body left-hand shots; the fourth group contains backward two-hand, backward right-hand, and right-side-body right-hand shots. The grouping criterion is that the actions within each group transition naturally into one another.
2) Every frame of every action in each group is cut out; each frame is called a pose.
3) The three-dimensional distance between any two poses in each group is computed, and the poses and the distances between them are represented with a motion graph: a node of the graph is a pose, and an edge of the graph carries the three-dimensional skeleton distance between the poses represented by the two nodes it connects. One motion graph is built for each group.
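The motion-graph construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the pose representation (an array of 3-D joint positions), the mean-joint-distance metric, and the edge threshold are all assumptions introduced for the example.

```python
import numpy as np

def skeleton_distance(pose_a, pose_b):
    """Mean Euclidean distance between corresponding joints of two poses.

    Each pose is an (n_joints, 3) array of 3-D joint positions.
    """
    return float(np.mean(np.linalg.norm(pose_a - pose_b, axis=1)))

def build_motion_graph(poses, edge_threshold=0.5):
    """Build a motion graph for one group of poses.

    Nodes are pose indices; an edge (weighted by the skeleton distance)
    connects every pair of poses closer than `edge_threshold`, so that
    nearby poses form natural transitions.  Returned as an adjacency dict.
    """
    graph = {i: {} for i in range(len(poses))}
    for i in range(len(poses)):
        for j in range(i + 1, len(poses)):
            d = skeleton_distance(poses[i], poses[j])
            if d < edge_threshold:
                graph[i][j] = d
                graph[j][i] = d
    return graph

# toy example: three 2-joint poses, the first two nearly identical
poses = [np.array([[0.0, 0, 0], [1, 0, 0]]),
         np.array([[0.1, 0, 0], [1, 0, 0]]),
         np.array([[5.0, 0, 0], [6, 0, 0]])]
g = build_motion_graph(poses, edge_threshold=1.0)
```

In this sketch the third pose ends up isolated, while the two similar poses are linked, mirroring how poses close in skeleton space become neighbours on the graph.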
The contour extraction step performed at each viewpoint (comparing the foreground picture with the background picture to obtain a difference image, determining the bounding box of the human body from the difference image, extracting the precise body image within the bounding box, and finally removing shadow interference to produce the contour image) comprises:
1) For an input image taken by the camera, compute the difference I_differ between the input image and the background image. Let the color of a point (x, y) on the background image be (R_background(x, y), G_background(x, y), B_background(x, y)) and the color of the corresponding point on the input image be (R_input(x, y), G_input(x, y), B_input(x, y)). I_differ is a grayscale image representing the difference between corresponding points of the input and background images, computed as

s(x, y) = max(|R_background(x, y) − R_input(x, y)|, |G_background(x, y) − G_input(x, y)|, |B_background(x, y) − B_input(x, y)|);

I_differ(x, y) = s(x, y)/3 + s(x−1, y)/6 + s(x+1, y)/6 + s(x, y−1)/6 + s(x, y+1)/6.
2) First binarize I_differ with a large threshold, foreground white and background black, then dilate; this yields some rough body blobs while small noise is eliminated. Find the largest of these blobs, set a distance threshold, and absorb into this main blob every blob whose distance to it is below the threshold; the rectangular region R enclosing the foreground found in this way is the region where the body appears.
3) Binarize the image inside the bounding box with a small threshold; this yields a clear contour image.
4) Binarize the bottom of the bounding box with a large threshold; this rejects the person's shadow.
5) Finally apply simple smoothing and denoising to the extracted image, scale the body contour to a uniform size, and place it at the image center to obtain the contour image.
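The difference image of step 1) above can be computed directly from the two formulas. The sketch below is an illustration under assumed conventions (RGB arrays of dtype uint8, image borders simply receiving fewer neighbour contributions); the threshold value in the usage is arbitrary.

```python
import numpy as np

def difference_image(background, frame):
    """Per-pixel difference image I_differ from the formulas above.

    `background` and `frame` are (H, W, 3) uint8 RGB arrays.  s(x, y) is
    the maximum absolute channel difference; I_differ averages s over the
    pixel itself (weight 1/3) and its four neighbours (weight 1/6 each).
    """
    s = np.max(np.abs(background.astype(np.int32) - frame.astype(np.int32)), axis=2)
    s = s.astype(np.float64)
    i_differ = s / 3.0
    i_differ[1:, :] += s[:-1, :] / 6.0   # contribution of the neighbour above
    i_differ[:-1, :] += s[1:, :] / 6.0   # neighbour below
    i_differ[:, 1:] += s[:, :-1] / 6.0   # neighbour to the left
    i_differ[:, :-1] += s[:, 1:] / 6.0   # neighbour to the right
    return i_differ

def binarize(i_differ, threshold):
    """Foreground (white, 255) wherever the difference exceeds the threshold."""
    return np.where(i_differ > threshold, 255, 0).astype(np.uint8)
```

With a large threshold this gives the rough blobs of step 2); with a small threshold inside the bounding box it gives the fine contour of step 3).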
The step of finding, on the hit group of the action, the node holding the pose whose feature value is closest to each frame, analyzing the relations among these points, and using multi-segment expansion and local search to repair them into a continuous segment on the motion graph (the action sequence this segment represents being the recognition result of the person's shooting action) comprises:
1) The contour image sequence is mapped to a set of points on the hit group's motion graph. First the points that obviously deviate from the population are removed; the remaining points on the motion graph form several contiguous segments.
2) Starting from each segment, expand forward and backward to fill in the missing nodes. For a node to be inferred, take the node that its nearest neighbouring frame was mapped to as the center and a set distance as the radius; for each point in this region, compute the Hamming distance between the feature value of its two-dimensional image and that of the contour image whose frame number matches the node being inferred. The point with the minimum distance is taken as the matching node for this frame.
3) Find all segment sequences formed by the nodes on the motion graph. For each segment, use the one-step expansion method described above to infer all nodes of the whole action sequence, forming a new sequence. For each inferred sequence, compute the sum over frames of the Hamming distances between the feature value of each node's two-dimensional image and that of the corresponding frame of the original contour images; the sequence with the minimum sum is the matched result.
4) Interpolate the resulting action sequence, adding frames as appropriate to make the action smooth and coherent.
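The local search of step 2) above can be sketched as follows. This is an illustration under assumptions: feature values are encoded as integer bit vectors (so Hamming distance is a popcount of XOR), and a hop count on the graph stands in for the patent's distance radius.

```python
def hamming(a, b):
    """Hamming distance between two integer-encoded binary feature values."""
    return bin(a ^ b).count("1")

def infer_node(graph, features, frame_feature, center, radius):
    """Local search: among graph nodes within `radius` hops of `center`,
    pick the node whose stored feature value is closest (in Hamming
    distance) to the feature value of the frame being inferred.

    `graph` is an adjacency dict {node: iterable of neighbours};
    `features` maps node -> integer feature value.
    """
    # breadth-first collection of the local neighbourhood around the center
    region, frontier = {center}, {center}
    for _ in range(radius):
        frontier = {n for f in frontier for n in graph[f]} - region
        region |= frontier
    return min(region, key=lambda n: hamming(features[n], frame_feature))
```

Restricting the candidate set to this neighbourhood is what keeps consecutive result frames in the same local area of the motion graph.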
The present method uses the person's contour images as the carrier of action information and the input to the system; the pre-captured shooting actions, organized by motion graphs, serve as the database; image features found through simple machine learning methods serve as the matching tool; and the three-dimensional action recognition problem is thereby transformed into a two-dimensional image matching problem. The invention improves recognition accuracy mainly through the quality of the contour image, the key factor affecting the final result, and through the organization and querying of the motion graphs. When extracting the person's contour, the bounding box of the body in the image is first obtained by filtering the foreground-background difference image with a large threshold and applying morphological operations, which removes most of the noise outside the body region; a fine body contour is then obtained inside the bounding box with a smaller threshold; finally the person's shadow is removed at the bottom of the body region with another large threshold, yielding a high-quality body contour image.
When organizing the motion graphs, the pre-captured shooting actions are first divided into broad groups by left or right hand, single or two hands, and so on, and a vote then decides which group a query matches; this first keeps recognition safe from mistakes at the coarse level. After the matching frames on the original motion graph are obtained, obviously erroneous matching frames are first detected and repaired on the graph, overcoming part of the error produced by the machine learning; inference is then run on the motion graph from each of the several disconnected continuous segments, and the optimum among the results is finally selected, optimizing the matching result. Moreover, when inferring a further point on the motion graph, the search is carried out only within a local range of the graph, which prevents excessive spans between consecutive result frames and guarantees the continuity and smoothness of the resulting motion sequence.
Description of drawings
Fig. 1(a) shows the scene in which shooting actions are captured with motion capture equipment;
Fig. 1(b) shows an action captured with the motion capture equipment;
Fig. 2 shows the grouping of the captured actions and the motion graph constructed within each group;
Fig. 3 shows a background picture captured while the system is running;
Fig. 4 shows a picture containing the person (a foreground picture) captured in real time while the system is running;
Fig. 5 shows the contour extraction process; following the arrows, the four images are: the original difference image, the bounding box determination, the finer contour image from small-threshold filtering, and the contour image with the shadow removed;
Fig. 6 illustrates the repair of the initially matched nodes on the hit group;
Fig. 7 illustrates the several contiguous node segments found after repairing the first matched nodes;
Fig. 8(a) shows the node sequence (action) generated by expanding continuous segment 1 of Fig. 7;
Fig. 8(b) shows the node sequence (action) generated by expanding continuous segment 2 of Fig. 7.
Embodiment
The steps of the shooting-action recognition method based on human-body contour extraction and grouped motion-graph queries are as follows:
1) First, various shooting actions are captured with motion capture equipment as in Fig. 1a and Fig. 1b, and these actions are placed in a database. The actions are then divided into four groups in the manner of Fig. 2 (by the acting hand and by single versus two hands); every action in the database belongs to exactly one group. Finally a motion graph is built within each group: every frame of every action in the group is cut out as a pose and represented as a node of the graph, and the three-dimensional skeleton distance between any two poses is computed and represented as the edge connecting the two nodes.
2) The pose represented by each node in each group is rendered as two-dimensional images, like the images on the nodes in Fig. 2. Machine learning is used to train on all these images and extract the image feature vectors; any of various common image features could be chosen. In this example Haar-like features of the images are used, and 150 Haar-like features are finally extracted.
3) The 150 extracted Haar-like features are used to compute the feature value of the image of every node of every motion graph, and the value is stored on the node.
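Steps 2 and 3 above can be illustrated with a minimal Haar-like feature sketch. The exact feature templates the patent uses are not specified, so everything here is an assumption for illustration: simple two-rectangle features evaluated via an integral image, with each response thresholded at zero and packed into one integer so that feature values can later be compared by Hamming distance.

```python
import numpy as np

def integral_image(img):
    """Summed-area table; any rectangle sum then costs O(1)."""
    return img.cumsum(axis=0).cumsum(axis=1)

def rect_sum(ii, top, left, h, w):
    """Sum of img[top:top+h, left:left+w] read off the integral image."""
    total = ii[top + h - 1, left + w - 1]
    if top > 0:
        total -= ii[top - 1, left + w - 1]
    if left > 0:
        total -= ii[top + h - 1, left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1, left - 1]
    return total

def feature_value(img, features):
    """Pack thresholded two-rectangle Haar-like responses into one integer.

    Each feature is ((top, left, h, w), axis): the response is the
    top/left half-rectangle sum minus the bottom/right half-rectangle
    sum, and its sign contributes one bit of the feature value.
    """
    ii = integral_image(img.astype(np.int64))
    value = 0
    for k, ((top, left, h, w), axis) in enumerate(features):
        if axis == 0:   # horizontal split: top half minus bottom half
            resp = (rect_sum(ii, top, left, h // 2, w)
                    - rect_sum(ii, top + h // 2, left, h // 2, w))
        else:           # vertical split: left half minus right half
            resp = (rect_sum(ii, top, left, h, w // 2)
                    - rect_sum(ii, top, left + w // 2, h, w // 2))
        if resp > 0:
            value |= 1 << k
    return value
```

With 150 such features, each stitched contour image collapses to a 150-bit value, which is what makes the later Hamming-distance matching cheap.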
4) When the system starts, a picture of the empty scene is first taken from each viewpoint and saved as the background; Fig. 3 shows the background picture at one of the viewpoints.
5) With the scene and camera positions unchanged, pictures are taken as the person enters the scene and performs shooting actions; these are the foreground pictures, one frame of which is shown in Fig. 4.
6) Contour extraction is performed on every foreground frame at each viewpoint following the process of Fig. 5: the foreground picture is compared with the background picture to obtain a difference image; the bounding box of the human body is determined from the difference image; the precise body image is extracted inside the bounding box; and finally the body's shadow interference is removed, yielding the contour image.
7) A captured motion sequence consists of many frames. For each frame, the contour images from all viewpoints are stitched into a new image, and the Haar-like features extracted in step 2 are used to compute its feature value, so every frame has a feature value. For each frame, the closest node (and its gap to the frame's feature value) is found in each action group; the group with the smallest gap among all groups is recorded, and its number is called the hit group of the frame.
8) The hit group of every frame is found by the method of step 7, the number of hits of each group is tallied, and the group with the most hits is selected; it is called the hit group of the whole action sequence.
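Steps 7 and 8 above amount to a nearest-pose search per frame followed by a majority vote. The sketch below illustrates this under the same bit-vector feature assumption as earlier; the function and parameter names are illustrative, not from the patent.

```python
from collections import Counter

def hamming(a, b):
    """Hamming distance between two integer-encoded feature values."""
    return bin(a ^ b).count("1")

def vote_hit_group(frame_features, group_feature_tables):
    """Per-frame nearest-pose search followed by a majority vote.

    `group_feature_tables` maps group id -> list of stored pose feature
    values.  Each frame hits the group containing its closest pose, and
    the group hit by the most frames wins.
    """
    hits = []
    for f in frame_features:
        best_group = min(
            group_feature_tables,
            key=lambda g: min(hamming(f, p) for p in group_feature_tables[g]),
        )
        hits.append(best_group)
    return Counter(hits).most_common(1)[0][0]
```

The vote makes a single mismatched frame harmless: as long as most frames land in the correct group, the whole sequence is assigned to it.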
9) For each frame, the most similar node in the action's hit group is found and called the matched node of the frame; the relations among these points are analyzed, and multi-segment expansion and local search are used to repair them into a continuous segment on the motion graph. The action sequence this segment represents is the recognition result of the person's shooting action.
The steps of extracting the person's contour image in step 6, illustrated in Fig. 5, are:
1) First compute the original difference image I_differ of the foreground image and the background image. Let the color of a point (x, y) on the background image be (R_background(x, y), G_background(x, y), B_background(x, y)) and the color of the corresponding point on the input image be (R_input(x, y), G_input(x, y), B_input(x, y)). Then I_differ is computed as

s(x, y) = max(|R_background(x, y) − R_input(x, y)|, |G_background(x, y) − G_input(x, y)|, |B_background(x, y) − B_input(x, y)|);

I_differ(x, y) = s(x, y)/3 + s(x−1, y)/6 + s(x+1, y)/6 + s(x, y−1)/6 + s(x, y+1)/6;
2) Then determine the approximate location of the person in the image, i.e. compute the person's bounding box. First binarize I_differ with a larger threshold, foreground white and background black, then dilate; this yields some rough body blobs while small noise is eliminated. Find the largest of these blobs, taken to be the main part of the body, then set a distance threshold: blobs whose distance to the main blob is below the threshold are kept, the rest are discarded. The blobs found this way are considered the person's body, and the minimal bounding box enclosing them is the bounding box of the body;
3) Inside the bounding box, filter I_differ with a smaller threshold, and set everything outside the bounding box to black; this yields a rather fine body contour image, but a shadow region often appears at the person's feet;
4) Finally, at roughly the bottom third of the body's bounding box, filter I_differ with a larger threshold; this filters out the shadow well and yields a better contour image at last;
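The dual-threshold scheme of steps 3) and 4) can be sketched as a single masking function. This is an illustration, not the patent's code: the box convention, the exact "bottom third" fraction, and the threshold values are assumptions.

```python
import numpy as np

def contour_mask(i_differ, box, low_thr, high_thr, shadow_frac=1/3):
    """Dual-threshold contour extraction inside a bounding box.

    `box` is (top, left, bottom, right).  The small threshold `low_thr`
    is applied inside the box; in the bottom `shadow_frac` of the box the
    larger `high_thr` is applied instead, suppressing the shadow at the
    feet.  Pixels outside the box are always background.
    """
    top, left, bottom, right = box
    mask = np.zeros(i_differ.shape, dtype=np.uint8)
    shadow_top = bottom - int((bottom - top) * shadow_frac)
    # fine contour in the upper part of the box
    region = i_differ[top:shadow_top, left:right]
    mask[top:shadow_top, left:right] = np.where(region > low_thr, 255, 0)
    # stricter threshold near the feet to reject the shadow
    region = i_differ[shadow_top:bottom, left:right]
    mask[shadow_top:bottom, left:right] = np.where(region > high_thr, 255, 0)
    return mask
```

Because a shadow darkens the floor only mildly, its difference values fall between the two thresholds, so raising the threshold near the feet removes it while the body itself survives.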
The steps of step 9, synthesizing the final motion result from the original match points, are:
1) From step 8, the match point of every frame on the hit group is known; these are the shaded points in Fig. 6. On the motion graph, some of these points are contiguous, while some may deviate from the majority; points that obviously deviate from the population, such as point 5, are considered matching errors and are removed first.
2) Because some points have been removed, they must be repaired; for example, the position of frame 5 is repaired into the solid black point in Fig. 6. The repair method is: first find some neighbouring points within a certain range around the point of frame 6 or frame 4, then compute which of these neighbours' feature values is most similar to the feature value of frame 5's contour image; the most similar node is taken as the new matched node of frame 5.
3) After the erroneous points are repaired, what remains is a number of broken segments, each forming a continuous region, like continuous regions 1 and 2 in Fig. 7. The method assumes that the final synthesized action sequence certainly contains one of these continuous regions, so this step infers the whole action sequence taking each continuous region as a basis; this is called multi-segment expansion. Fig. 8a shows the inference made from continuous region 1, and Fig. 8b the inference made from continuous region 2. The inference algorithm expands forward and backward from each segment. Let f be the node to be inferred, representing the matched node of frame i, and let f_parent be the parent node used to expand it (normally the known node of the frame nearest to frame i). On the motion graph, take the region centered on f_parent with a certain distance as radius; for each point in this region, compute the Hamming distance between the feature values of its contour image and the contour image of frame i, and take the point with the minimum distance as the pose node that frame i should match. The basis of this inference is that two consecutive frames of an action sequence should lie in the same local area of the motion graph; this is called local search. Finally, for each inferred result, compute the sum over frames of the Hamming distances between the feature value of each node's two-dimensional image and that of the corresponding frame of the original contour images, and select the result with the minimum sum; the action sequence this result represents is the matched result;
4) Interpolate the resulting action sequence, appropriately increasing the frame count to make the motion smoother and more coherent.
This method has been implemented as a concrete shooting-action recognition system, written in C++ under Windows. The key performance metric is the action recognition rate; we tested the system's recognition of motions performed by different people, with results summarized below:
The test subjects were two people, A and B, with very different body shapes. Each performed 46 shooting actions, 92 in total, including left-hand, right-hand, two-hand, side-on, and half-turned-back shots. Three statistics were recorded according to the different levels of matching strictness: the first is the consistency between the matched action and the basic pose of the test data, requiring an exact match of coarse motion features such as left vs. right hand, one vs. two hands, and whether the subject squats; the second is the recognition rate of the specific shooting hand form, requiring correct discrimination between a set shot and a jump shot and of the release position; the third is the accuracy of the test data's orientation, requiring correct identification of facing forward, sideways, and backward.
The test results show that the system's recognition rate for shooting actions is good, and the system can be used in interactive games.
Claims (3)
1. A shooting-action recognition method based on human silhouette extraction and grouped motion-graph querying, characterized by comprising the steps of:
1) grouping various shooting actions captured in advance with a motion capture device into broad classes according to orientation and handedness, then decomposing the actions in each group into independent 3D poses and building the poses in each group into a motion graph;
2) for the poses on each group's motion graph, rendering each pose into 2D images at multiple viewing angles, and using a machine learning method to extract an image feature vector for each group's images at each viewing angle;
3) stitching the 2D images of each pose at all viewing angles from step 2) into one image, and computing the feature value of this stitched image with the feature vector from step 2); this is called the feature value of the pose;
4) capturing the picture at each viewing angle of the basketball action recognition system's scene when no one is present, called the background;
5) when performing action recognition, capturing the pictures of the person's whole shooting-action sequence at each viewing angle, called foreground pictures;
6) performing silhouette extraction at each viewing angle by the following process: comparing the foreground and background pictures to obtain a difference image; determining the bounding-box position of the human body in the image from the difference image; determining the precise human-body image within the bounding box; and finally removing shadow interference; the resulting image is called the silhouette;
7) stitching together the silhouettes from all viewing angles, computing the feature value with the feature vector extracted in step 2), searching each group described in step 1) for the pose most similar to this feature value, and calling the group containing the nearest pose the hit group of this frame;
8) finding the hit group of every frame of the shooting action and voting for the group with the most hits, called the hit group of the whole action;
9) finding, on the action's hit group, the point of the pose whose feature value is closest for each frame; analyzing the relationships among these points; using multi-segment expansion and local search to repair these points into a continuous segment on the motion graph; the action sequence represented by this segment is the recognition result of the person's shooting action;
The step of finding, on the action's hit group, the point of the pose whose feature value is closest for each frame, analyzing the relationships among these points, and using multi-segment expansion and local search to repair these points into a continuous segment on the motion graph, where the action sequence represented by this segment is the recognition result of the person's shooting action, comprises:
1) the silhouette image sequence is mapped to points on the hit group's motion graph; points that clearly depart from the cluster are removed first, leaving the remaining points on the motion graph forming several continuous segments;
2) expanding forward and backward from each segment's start to fill in missing nodes: for a node to be inferred, taking the node that its nearest frame was mapped to as the center and a set distance as the radius, for each point in this region computing the Hamming distance on feature values between its 2D image and the silhouette of the frame whose number corresponds to the inferred node; the minimum-distance point is taken as the matched node of that frame;
3) finding all segment sequences formed by nodes on the motion graph; for each segment, using the expansion method of the previous step to infer all nodes of the whole action sequence and forming these nodes into a new sequence; for each newly inferred sequence, computing the sum of Hamming distances between the feature values of each node's 2D image and the corresponding frame of the original silhouettes; the sequence with the minimum sum is selected at last as the matching result;
4) interpolating the resulting action sequence, appropriately increasing the frame count to make the motion smoother and more coherent.
2. The shooting-action recognition method based on human silhouette extraction and grouped motion-graph querying according to claim 1, characterized in that the step of grouping various shooting actions captured in advance with a motion capture device according to orientation and handedness, then decomposing the actions in each group into independent 3D poses and building the poses in each group into a motion graph comprises:
1) dividing the 3D motions captured with the motion capture equipment into four groups: the first group comprises forward left-hand, forward two-hand, and left-side-body left-hand shots; the second group comprises forward single right-hand, forward two-hand, and right-side-body right-hand shots; the third group comprises backward two-hand, backward left-hand, and left-side-body left-hand shots; the fourth group comprises backward two-hand, backward right-hand, and right-side-body right-hand shots; the grouping criterion is that the actions within each group transition naturally;
2) extracting every frame of every action in each group; each frame is called a pose;
3) computing the 3D distance between every pair of poses within each group and representing these poses and the distances between them with a motion graph: a node on the motion graph is a pose, and an edge on the motion graph is the 3D skeleton distance between the poses represented by the two nodes it connects; a motion graph is built for each group.
3. The shooting-action recognition method based on human silhouette extraction and grouped motion-graph querying according to claim 1, characterized in that the step of performing silhouette extraction at each viewing angle by the following process: comparing the foreground and background pictures to obtain a difference image, determining the bounding-box position of the human body in the image from the difference image, determining the precise human-body image within the bounding box, and finally removing shadow interference, the resulting image being called the silhouette, comprises:
1) for an input image captured by the camera, computing the difference I_differ between the input image and the background image; defining the color value of a point (x, y) on the background image as (R_background(x, y), G_background(x, y), B_background(x, y)) and the color value of the corresponding point on the input image as (R_input(x, y), G_input(x, y), B_input(x, y)); I_differ is a gray-scale image representing the difference between corresponding points of the input and background images, computed as:
s(x,y) = max(|R_background(x,y) − R_input(x,y)|, |G_background(x,y) − G_input(x,y)|, |B_background(x,y) − B_input(x,y)|);
I_differ(x,y) = s(x,y)/3 + s(x−1,y)/6 + s(x+1,y)/6 + s(x,y−1)/6 + s(x,y+1)/6;
2) first binarizing I_differ with a large threshold (foreground white, background black) and then dilating, which yields several rough human-body blobs while eliminating small noise; finding the largest of these blobs first; then, with a set distance threshold, absorbing into this main blob every blob whose distance to it is less than the threshold; finally, the rectangular region R containing the foreground in this image is the region where the human body appears;
3) first binarizing the image inside the bounding box with a small threshold, which yields a clear silhouette;
4) binarizing the bottom of the bounding box with a large threshold, which rejects the person's shadow;
5) finally applying simple smoothing and denoising to the extracted image, scaling the human silhouette to a unified size, and centering it in the image to obtain the silhouette.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010101229166A CN101794384B (en) | 2010-03-12 | 2010-03-12 | Shooting action identification method based on human body skeleton map extraction and grouping motion diagram inquiry |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101794384A CN101794384A (en) | 2010-08-04 |
CN101794384B true CN101794384B (en) | 2012-04-18 |
Family
ID=42587066
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2010101229166A Expired - Fee Related CN101794384B (en) | 2010-03-12 | 2010-03-12 | Shooting action identification method based on human body skeleton map extraction and grouping motion diagram inquiry |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101794384B (en) |
Families Citing this family (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102682302B (en) * | 2012-03-12 | 2014-03-26 | 浙江工业大学 | Human body posture identification method based on multi-characteristic fusion of key frame |
CN103310191B (en) * | 2013-05-30 | 2016-12-28 | 上海交通大学 | The human motion recognition method of movable information image conversion |
CN103955680B (en) * | 2014-05-20 | 2017-05-31 | 深圳市赛为智能股份有限公司 | Action identification method and device based on Shape context |
CN104408616A (en) * | 2014-11-25 | 2015-03-11 | 苏州福丰科技有限公司 | Supermarket prepayment method based on three-dimensional face recognition |
CN105396278B (en) * | 2015-12-29 | 2018-03-09 | 太原理工大学 | A kind of basketball training device |
CN106446757A (en) * | 2016-05-20 | 2017-02-22 | 北京九艺同兴科技有限公司 | Human body motion data similarity automatic evaluation method |
CN107688465A (en) * | 2016-08-04 | 2018-02-13 | 惠州学院 | A kind of motion analysis system that swings based on computer vision |
CN107292899B (en) * | 2017-05-05 | 2020-12-29 | 浙江大学 | Angular point feature extraction method for two-dimensional laser scanner |
CN107203638B (en) * | 2017-06-08 | 2020-09-25 | 北京深瞐科技有限公司 | Monitoring video processing method, device and system |
CN107566911B (en) * | 2017-09-08 | 2021-06-29 | 广州方硅信息技术有限公司 | Live broadcast method, device and system and electronic equipment |
US10916059B2 (en) * | 2017-12-06 | 2021-02-09 | Universal City Studios Llc | Interactive video game system having an augmented virtual representation |
CN108256433B (en) * | 2017-12-22 | 2020-12-25 | 银河水滴科技(北京)有限公司 | Motion attitude assessment method and system |
CN108647710B (en) * | 2018-04-28 | 2022-10-18 | 山东影响力智能科技有限公司 | Video processing method and device, computer and storage medium |
CN109191366B (en) * | 2018-07-12 | 2020-12-01 | 中国科学院自动化研究所 | Multi-view human body image synthesis method and device based on human body posture |
CN109871893B (en) * | 2019-02-18 | 2020-10-16 | 清华大学 | Behavior prediction method and device based on cyclic time domain retention generation |
CN109961039B (en) * | 2019-03-20 | 2020-10-27 | 上海者识信息科技有限公司 | Personal goal video capturing method and system |
CN111061916B (en) * | 2019-12-20 | 2023-05-30 | 中通服咨询设计研究院有限公司 | Video sharing system based on multi-target library image recognition |
CN112507953B (en) * | 2020-12-21 | 2022-10-14 | 重庆紫光华山智安科技有限公司 | Target searching and tracking method, device and equipment |
CN113944396B (en) * | 2021-10-25 | 2023-06-06 | 上海网车科技有限公司 | System and method for automatically opening front and back trunk doors of vehicle |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5206915A (en) * | 1989-10-10 | 1993-04-27 | Unisys Corporation | Image-based document processing system providing lost image recovery |
JP4184842B2 (en) * | 2003-03-19 | 2008-11-19 | 富士フイルム株式会社 | Image discrimination device, method and program |
CN1209731C (en) * | 2003-07-01 | 2005-07-06 | 南京大学 | Automatic human face identification method based on personal image |
CN101057736A (en) * | 2006-04-18 | 2007-10-24 | 明基电通股份有限公司 | Cradle device and method for using image recognition technology to control swinging state of cradle |
CN101587541B (en) * | 2009-06-18 | 2011-02-02 | 上海交通大学 | Character recognition method based on human body contour outline |
2010-03-12: application CN2010101229166A filed, granted as patent CN101794384B; status: not active, Expired - Fee Related
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20120418 Termination date: 20150312 |
EXPY | Termination of patent right or utility model |