CN101794384A - Shooting action identification method based on human body skeleton map extraction and grouping motion diagram inquiry - Google Patents

Shooting action identification method based on human body skeleton map extraction and grouping motion diagram inquiry

Info

Publication number
CN101794384A
CN 201010122916 CN201010122916A CN101794384A CN101794384B
Authority
CN
China
Prior art keywords
group
image
action
motion
human body
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201010122916
Other languages
Chinese (zh)
Other versions
CN101794384B (en)
Inventor
耿卫东
魏知晓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN2010101229166A priority Critical patent/CN101794384B/en
Publication of CN101794384A publication Critical patent/CN101794384A/en
Application granted granted Critical
Publication of CN101794384B publication Critical patent/CN101794384B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Landscapes

  • Image Analysis (AREA)

Abstract

The invention discloses a shooting-action recognition method based on human-silhouette extraction and grouped motion-graph query. The method comprises the following steps: shooting actions are captured into a database in advance and grouped by class; a motion graph is constructed for each group; all actions are rendered into two-dimensional images from multiple viewpoints, key features are extracted, and the image feature value of every pose is computed. During operation, precise silhouette extraction is performed on the picture sequence taken while a person shoots, and the feature value of each silhouette is computed; the group containing the pose most similar to that feature value is found in the database and taken as the hit group; the group hit most often across all the silhouettes of the shooting action is found; the nodes of the poses on that group's motion graph most similar to each silhouette's feature value are then found; these points are analyzed and repaired into one continuous segment, which is taken as the action-recognition result. The invention can recognize shooting actions quickly and accurately using only image-capture devices.

Description

A shooting-action recognition method based on human-silhouette extraction and grouped motion-graph query
Technical field
The present invention relates to a shooting-action recognition method based on silhouette extraction and grouped motion-graph query, and in particular to a method for recognizing a person's shooting actions from the person's silhouette images and from motion graphs built from shooting-action data.
Background technology
Human-motion recognition has long been a research focus in computer vision. Its goal is to let machines recognize human actions, including body movements and gestures. It is also a brand-new mode of human-machine interaction, one that relies only on the person's own motion rather than on marker points or other hardware attached to the body. It has broad potential applications in industry, and also great prospects in film, games, and other digital-entertainment industries, where it is already being applied.
Existing mature motion-recognition technology relies mainly on special hardware such as motion-capture equipment, which records a person's three-dimensional position by means of marker points bound to the body; this greatly degrades the interactive experience. Markerless motion recognition is a good alternative: it requires no extra hardware on the user and relies mainly on cameras to capture images and then estimate three-dimensional motion from two-dimensional image analysis. Recovering three dimensions from two is an under-constrained problem with multiple solutions, so one hundred percent recognition accuracy cannot be reached, but in some applications (such as interactive games) the achievable accuracy is sufficient.
The present invention obtains high-quality human silhouettes by removing the person's shadow and environmental interference from the images, organizes the data with motion graphs, and applies a multi-segment-expansion and local-search strategy on the motion graphs, thereby achieving fast markerless recognition of basketball shooting actions.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a shooting-action recognition method based on human-silhouette extraction and grouped motion-graph query.
The shooting-action recognition method based on human-silhouette extraction and grouped motion-graph query comprises the following steps:
1) Various shooting actions captured in advance with motion-capture equipment are grouped into broad classes by orientation and by left/right hand; the actions in each group are then decomposed into independent three-dimensional poses, and the poses in each group are built into a motion graph;
2) Each pose on each group's motion graph is rendered into two-dimensional images from multiple viewpoints, and machine-learning methods are used to extract image-feature vectors from the images of each group at each viewpoint;
3) The two-dimensional images of each pose from all viewpoints in step 2) are stitched into one image, and the feature vectors from step 2) are used to compute the feature value of this stitched image, called the feature value of the pose;
4) Pictures of the basketball-action-recognition scene are taken from each viewpoint with nobody present; these are called backgrounds;
5) During recognition, pictures of the person's whole shooting-action sequence are taken from each viewpoint; these are called foreground pictures;
6) Silhouette extraction is performed at each viewpoint as follows: the foreground picture is compared with the background picture to obtain a difference image; the bounding-box position of the human body in the image is determined from the difference image; a precise image of the body is then determined within the bounding box; finally the body's shadow interference is removed; the resulting image is called the silhouette;
7) The silhouettes from all viewpoints are stitched together, the feature value is computed with the feature vectors extracted in step 2), the pose most similar to this feature value is sought in each of the groups from step 1), and the group containing the pose nearest in feature value is called the hit group of this frame;
8) The hit group of every frame of the shooting action is found, and the group with the most hits wins the vote; it is called the hit group of the whole action;
9) For every frame, the node of the pose whose feature value is closest is found on the action's hit group; the relationships among these nodes are analyzed, and multi-segment expansion and local search are used to repair them into one continuous segment on the motion graph; the action sequence this segment represents is the recognition result of the person's shooting action.
The step of grouping the various shooting actions captured in advance with motion-capture equipment into broad classes by orientation and left/right hand, decomposing the actions in each group into independent three-dimensional poses, and building the poses in each group into a motion graph comprises:
1) The three-dimensional motions captured with the motion-capture equipment are divided into four groups: the first group comprises facing-forward left-hand, facing-forward two-hand, and left-side-body left-hand actions; the second group comprises facing-forward one-handed right-hand, facing-forward two-hand, and right-side-body right-hand actions; the third group comprises facing-away two-hand, facing-away left-hand, and left-side-body left-hand actions; the fourth group comprises facing-away two-hand, facing-away right-hand, and right-side-body right-hand actions. The grouping criterion is that the actions within each group transition naturally;
2) Every frame of every action in each group is cut out; each frame is called a pose;
3) The three-dimensional distance between any two poses within each group is computed, and the poses and the distances between them are represented by a motion graph: a node of the motion graph is a pose, and an edge of the motion graph carries the three-dimensional skeleton distance between the poses represented by the two nodes it connects. A motion graph is built for each group.
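As an illustration of step 3), the motion-graph construction can be sketched as follows. This is a minimal sketch assuming each pose is a list of 3-D joint coordinates and taking the skeleton distance as the sum of per-joint Euclidean distances; the patent does not specify the exact distance formula, so that choice is an assumption.

```python
import math

def pose_distance(pose_a, pose_b):
    # Assumed skeleton distance: sum of Euclidean distances between
    # corresponding 3-D joint positions of the two poses.
    return sum(math.dist(ja, jb) for ja, jb in zip(pose_a, pose_b))

def build_motion_graph(poses):
    # One node per pose; every edge carries the 3-D skeleton distance
    # between the two poses it connects (a complete weighted graph).
    edges = {}
    for i in range(len(poses)):
        for j in range(i + 1, len(poses)):
            edges[(i, j)] = pose_distance(poses[i], poses[j])
    return edges
```

A query step can then restrict itself to edges with small weights, since a small skeleton distance means the two poses can follow each other naturally.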
The step of silhouette extraction at each viewpoint (comparing the foreground picture with the background picture to obtain a difference image, determining the bounding-box position of the human body from the difference image, determining the precise image of the body within the bounding box, and finally removing the body's shadow interference to obtain the silhouette) comprises:
1) For an input camera image, the difference I_differ between the input image and the background image is computed. Let the color of a point (x, y) on the background image be (R_background(x, y), G_background(x, y), B_background(x, y)) and the color of the corresponding point on the input image be (R_input(x, y), G_input(x, y), B_input(x, y)). I_differ is a gray-scale map representing the point-wise difference between the input and background images, computed as s(x, y) = max(|R_background(x, y) - R_input(x, y)|, |G_background(x, y) - G_input(x, y)|, |B_background(x, y) - B_input(x, y)|); I_differ(x, y) = s(x, y)/3 + s(x-1, y)/6 + s(x+1, y)/6 + s(x, y-1)/6 + s(x, y+1)/6;
2) I_differ is first binarized with a large threshold (foreground white, background black) and then dilated, yielding several rough body blobs while small noise is eliminated. The largest of these blobs is found first; a distance threshold is then set, and every blob whose distance to the main blob is below this threshold is absorbed into it. Finally the rectangular region R enclosing the foreground of this image is taken as the region where the body appears;
3) The image inside the bounding box is then binarized with a small threshold, which yields a clear silhouette;
4) The bottom of the bounding box is binarized with a large threshold, which rejects the person's shadow;
5) Finally, the extracted image is given simple smoothing and denoising, and the silhouette is scaled to a uniform size and centered in the picture, yielding the silhouette image.
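The difference-map formula of step 1) and the threshold binarization of step 2) can be sketched as below. The border handling (clamping out-of-range neighbours to the nearest pixel) is an assumption, since the patent leaves boundary pixels unspecified.

```python
def difference_map(background, frame):
    # background / frame: H x W grids of (R, G, B) tuples.
    h, w = len(frame), len(frame[0])
    # s(x, y): maximum channel-wise absolute difference (step 1 formula).
    s = [[max(abs(b - f) for b, f in zip(background[y][x], frame[y][x]))
          for x in range(w)] for y in range(h)]

    def s_at(x, y):
        # Assumption: clamp coordinates at the image border.
        return s[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    # I_differ(x, y) = s/3 plus the four cross neighbours, each weighted 1/6.
    return [[s_at(x, y) / 3 + (s_at(x - 1, y) + s_at(x + 1, y)
             + s_at(x, y - 1) + s_at(x, y + 1)) / 6
             for x in range(w)] for y in range(h)]

def binarize(i_differ, threshold):
    # Step 2: foreground is white (1) wherever the difference exceeds the threshold.
    return [[1 if v > threshold else 0 for v in row] for row in i_differ]
```

Calling `binarize` with a large threshold gives the rough blobs of step 2), and with a small threshold inside the bounding box the fine silhouette of step 3).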
The step of finding, for every frame, the node of the pose whose feature value is closest on the action's hit group, analyzing the relationships among these nodes, and using multi-segment expansion and local search to repair them into one continuous segment on the motion graph (the action sequence this segment represents being the recognition result of the person's shooting action) comprises:
1) The silhouette-image sequence is mapped to a set of points on the hit group's motion graph. Points that clearly deviate from the population are first removed; the remaining points on the motion graph form several continuous segments;
2) Starting from each segment, the missing nodes are expanded forward and backward. For a node to be inferred, a region is taken with the node mapped to by the nearest frame as its center and a set distance as its radius; for each point in this region, the Hamming distance between the feature values of its two-dimensional image and of the silhouette of the frame in question is computed, and the point with the minimum distance is taken as the matching node of this frame;
3) All sequence segments formed by nodes on the motion graph are found. For each segment, the expansion method of the previous step is used to infer all nodes of the whole action sequence, and these nodes form a new sequence. For each inferred new sequence, the sum of the Hamming distances between the feature values of each node's two-dimensional image and of the corresponding frame of the original silhouette sequence is computed; the sequence with the minimum sum is the matched result;
4) The resulting action sequence is interpolated, appropriately increasing the frame count, to improve the smoothness and continuity of the action.
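The local search used when expanding a segment can be sketched as follows; `graph_dist` and `radius` stand in for the motion graph's distance function and the set search radius, and the binary feature vectors are an assumption consistent with the Hamming-distance comparison described above.

```python
def hamming(a, b):
    # Hamming distance between two equal-length binary feature vectors.
    return sum(x != y for x, y in zip(a, b))

def infer_node(parent, frame_feature, node_features, graph_dist, radius):
    # Local search: among graph nodes within `radius` of the parent node
    # (the known node nearest to this frame), pick the node whose stored
    # feature value is nearest in Hamming distance to the feature value
    # of the frame's silhouette.
    candidates = [n for n in node_features if graph_dist(parent, n) <= radius]
    return min(candidates, key=lambda n: hamming(node_features[n], frame_feature))
```

Restricting the candidates to a radius around the parent node is what keeps consecutive output frames in the same local area of the motion graph.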
The method uses the person's silhouette images as the carrier of motion information and the input of the system, organizes the pre-captured shooting actions in a database via motion graphs, and uses image features found by simple machine-learning methods as the matching tool, turning three-dimensional action recognition into a two-dimensional image-matching problem. The invention improves recognition accuracy mainly by improving the quality of the silhouette, the key factor affecting the final result, and the organization and query of the motion graphs. In silhouette extraction, the bounding-box position of the body in the image is first obtained by filtering the foreground-background difference map with a large threshold and applying morphological operations, which filters out noise in most of the non-body region of the image; a fine silhouette is then obtained inside the bounding box with a smaller threshold; finally the person's shadow is removed at the bottom of the body region with another large threshold, yielding a high-quality silhouette image.
In organizing the motion graphs, the pre-captured shooting actions are first grouped into broad classes such as left/right hand and one/two hands, and a final vote decides which group to match on, which first of all prevents errors at the broad-class level. After the matching frames on the original motion graph are obtained, clearly erroneous matches are first detected and repaired on the motion graph, overcoming part of the error produced by the machine learning; inferences are then made on the motion graph from each of several disconnected continuous segments, and the optimal solution is finally chosen, optimizing the matching result. When one node is inferred from another on the motion graph, the search is confined to a local range of the graph, which prevents excessive frame-to-frame spans in the final result and guarantees the continuity and smoothness of the resulting motion sequence.
Description of drawings
Fig. 1(a) is the scene in which shooting actions are captured with motion-capture equipment;
Fig. 1(b) shows actions captured with the motion-capture equipment;
Fig. 2 shows the grouping of the captured actions and the motion graph constructed in each group;
Fig. 3 is a background picture captured when the system runs;
Fig. 4 is a picture containing a person (a foreground picture) captured in real time when the system runs;
Fig. 5 shows the silhouette-extraction process; following the arrows, the four images are: the original difference map, the bounding-box determination diagram, the finer silhouette obtained with a small threshold, and the silhouette with the shadow removed;
Fig. 6 illustrates the repair of the initially matched nodes on the hit group;
Fig. 7 illustrates the several continuous node segments found after the first matched nodes are repaired;
Fig. 8(a) is the node sequence (action) generated by expanding continuous segment 1 in Fig. 7;
Fig. 8(b) is the node sequence (action) generated by expanding continuous segment 2 in Fig. 7.
Embodiment
The steps of the shooting-action recognition method based on human-silhouette extraction and grouped motion-graph query are as follows:
1) Various shooting actions are first captured with motion-capture equipment as in Fig. 1a and Fig. 1b and stored in a database; these actions are then divided into four groups in the manner of Fig. 2 (by left/right hand and by one/two hands), each action in the database belonging to exactly one group. Finally a motion graph is built within each group: every frame of every action in the group is cut out as a separate pose and represented as a node of the graph, and the three-dimensional skeleton distance between any two poses is computed and represented as the edge connecting the two nodes;
2) The pose represented by each node in each group is rendered into two-dimensional images, shown on the nodes in Fig. 2; machine-learning methods are used to train on all these images and extract image-feature vectors. Any general image feature can be chosen; this example uses Haar-like features, and 150 image Haar features are finally extracted;
3) The 150 extracted Haar features are used to compute the feature value of the images of every motion-graph node, which is stored on that node;
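The patent does not detail how the Haar features are evaluated; a common approach, shown here as a hedged sketch, is to use an integral image (summed-area table) so that each rectangle sum, and hence each two-rectangle Haar-like feature, costs O(1):

```python
def integral_image(img):
    # Summed-area table: ii[y][x] is the sum of img over rows < y, cols < x.
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        for x in range(w):
            ii[y + 1][x + 1] = (img[y][x] + ii[y][x + 1]
                                + ii[y + 1][x] - ii[y][x])
    return ii

def rect_sum(ii, x, y, w, h):
    # Sum of the w x h rectangle with top-left corner (x, y), in O(1).
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    # Simple two-rectangle Haar-like feature: left half minus right half.
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)
```

Evaluating the 150 chosen features on a stitched silhouette image, and thresholding each response, would give the binary feature value compared by Hamming distance later in the method.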
4) At start-up, a picture of the empty scene is first taken and saved as the background, such as the background picture from one of the viewpoints of our system in Fig. 3;
5) With the scene and camera positions unchanged, pictures are taken while the person enters the scene and performs shooting actions; these are the foreground pictures, one frame of which is shown in Fig. 4;
6) Silhouette extraction is performed on every foreground frame at each viewpoint as follows (the extraction process is shown in Fig. 5): the foreground picture is compared with the background picture to obtain a difference image; the bounding-box position of the body in the image is determined from the difference image; the precise image of the body is determined within the bounding box; finally the body's shadow interference is removed, yielding the silhouette;
7) A captured action sequence consists of many frames. For every frame, the silhouettes of all viewpoints are stitched into a new image, and the Haar features extracted in step 2 are used to compute the stitched image's feature value, so every frame has a feature value. For each frame, the most similar node and its gap to the frame are found in each action group; the group with the minimum gap among all groups is found, its group number is recorded, and it is called the hit group of this frame;
8) The hit group of every frame is found by the method of step 7; the number of times each group is hit is counted, and the group hit the most times is called the hit group of the whole action sequence;
9) For every frame, the node most similar to its feature value is found within the action's hit group and is called the matched node of this frame; the relationships among these nodes are analyzed, and multi-segment expansion and local search are used to repair them into one continuous segment on the motion graph; the action sequence this segment represents is the recognition result of the person's shooting action.
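The majority vote of step 8 can be sketched in a few lines:

```python
from collections import Counter

def hit_group_of_action(frame_hits):
    # frame_hits: the hit-group number recorded for every frame (step 7).
    # The hit group of the whole action is the group hit the most times.
    return Counter(frame_hits).most_common(1)[0][0]
```

Voting over the whole sequence makes the broad-class decision robust to individual frames whose nearest pose falls in the wrong group.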
The steps of the silhouette extraction of step 6, shown in Fig. 5, are:
1) The original difference map I_differ of the foreground image and the background image is first computed. Let the color of a point (x, y) on the background image be (R_background(x, y), G_background(x, y), B_background(x, y)) and the color of the corresponding point on the input image be (R_input(x, y), G_input(x, y), B_input(x, y)); then I_differ is computed as s(x, y) = max(|R_background(x, y) - R_input(x, y)|, |G_background(x, y) - G_input(x, y)|, |B_background(x, y) - B_input(x, y)|); I_differ(x, y) = s(x, y)/3 + s(x-1, y)/6 + s(x+1, y)/6 + s(x, y-1)/6 + s(x, y+1)/6;
2) The approximate location of the person in the image, that is, the person's bounding box, is then determined. I_differ is first binarized with a larger threshold (foreground white, background black) and then dilated, yielding several rough body blobs while small noise is eliminated. The largest blob is found and taken as the main part of the body; a distance threshold is then set, and blobs whose distance to the main blob is below this threshold are kept while the rest are discarded; the blobs thus kept are considered the person's body, and the minimal bounding box of these blobs is the body's bounding box;
3) Inside the bounding box, I_differ is filtered with a smaller threshold, and everything outside the bounding box is set to black, yielding a fairly fine silhouette; however, a patch of shadow often remains under the person's feet;
4) Finally, roughly the bottom third of the body's bounding box is filtered on I_differ with a larger threshold, which filters out the shadow well and yields a better silhouette of the person.
The ninth step, synthesizing the final motion result from the original match points, consists of:
1) From the eighth step, every frame has a match point on the hit group's motion graph; these are the shaded points in Fig. 6. On the motion graph some of these points run contiguously, while others may depart from the majority. The points that obviously depart from the cluster, such as point 5, are considered matching errors and are removed;
2) Because some points have been removed, they must be repaired; for example, the position of the 5th frame is repaired to become the solid black point in Fig. 6. The repair method is first to collect the neighbouring nodes within a certain range around the point of the 6th frame or the point of the 4th frame, then to compute which of these neighbours' feature values is most similar to the feature value of the 5th frame's silhouette; the most similar node is taken as the new matched node of the 5th frame;
3) After the erroneous points are repaired, what remains is a set of broken subsequences, each forming a continuous segment, such as continuous segments 1 and 2 in Fig. 7. The method assumes that the final synthesized motion sequence must contain one of these segments, so this step infers the whole motion sequence from each segment in turn; this is called multi-segment expansion. Fig. 8a shows the inference made from segment 1, and Fig. 8b the inference made from segment 2. The inference algorithm expands forwards and backwards from each segment. Let f be the node to be inferred, representing the matched pose of frame i, and let f_parent be the parent node used to expand it (normally the known node of the nearest matched frame). On the motion graph, take the region centred at f_parent with a certain distance as radius; for every point in this region, compute the Hamming distance, over the feature values, between its silhouette and the silhouette of frame i, and take the point with the minimum distance as the pose node that frame i should match. The justification for inferring points this way is that two adjacent frames of a motion sequence should lie in the same local area of the motion graph; this is called local search. Finally, for each inferred result, compute the sum of the Hamming distances between the feature values of each node's two-dimensional image and of the corresponding frame of the original silhouette sequence, and pick the result with the minimum sum; the motion sequence this result represents is the matched result;
4) Interpolate the resulting motion sequence, adding frames as appropriate, to make the motion smooth and coherent.
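The local-search repair of step 2) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: pose feature values are bit-strings stored as integers, the motion graph is an adjacency map, and a hop count stands in for the patent's distance radius.

```python
def hamming(a, b):
    """Hamming distance between two feature bit-strings stored as ints."""
    return bin(a ^ b).count("1")

def repair_frame(frame_feature, anchor_node, graph, node_features, radius=2):
    """Among motion-graph nodes within `radius` hops of the anchor (the
    node matched by the nearest reliable frame, e.g. frame 4 or 6 when
    repairing frame 5), return the node whose pose feature value is
    closest in Hamming distance to the frame's silhouette feature."""
    seen = {anchor_node}          # breadth-first collect the neighbourhood
    frontier = [anchor_node]
    for _ in range(radius):
        nxt = []
        for n in frontier:
            for m in graph.get(n, ()):
                if m not in seen:
                    seen.add(m)
                    nxt.append(m)
        frontier = nxt
    return min(seen, key=lambda n: hamming(frame_feature, node_features[n]))
```

The same neighbourhood search drives the forward/backward expansion in step 3): the anchor is simply the previously inferred node rather than a surviving match point.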
This method has been implemented as a concrete shooting-action recognition system, written in C++ under Windows. The main performance measure is the recognition rate of actions; we tested the system's recognition of motions performed by different people, with the results shown in the following table:
(Table of recognition-rate test results, rendered as image Figure GSA00000055800900091 in the original publication.)
The test subjects were two people of very different builds, A and B. Each performed 46 shooting actions, 92 in total, covering left-handed, right-handed, two-handed, side-on, and half-back-turned shots. Three statistics were collected, corresponding to different levels of matching strictness. The first is the consistency of the matched motion with the basic posture of the test data, requiring an exact match of intuitive motion features such as left versus right hand, one versus two hands, and whether the subject squats. The second is the recognition rate of the specific shooting hand form, requiring the method to distinguish well between throwing and tossing styles and the position of the release. The third is the accuracy of the recognized orientation, requiring the method to identify facing forward, side-on, and facing away.
The test results show that the system's recognition rate for shooting actions is good, and that the system can be used for action recognition in interactive entertainment.

Claims (4)

1. A shooting-action recognition method based on human silhouette extraction and grouped motion-graph querying, characterized by comprising the steps of:
1) grouping the various shooting actions captured in advance with a motion-capture device into broad classes by body orientation and by left/right hand, then decomposing the actions in each group into independent three-dimensional poses and building the poses of each group into a motion graph;
2) rendering each pose on each group's motion graph into two-dimensional images at multiple viewing angles, and extracting an image-feature vector from each group's images at each viewing angle with a machine-learning method;
3) stitching the two-dimensional images of each pose at all viewing angles in step 2) into one image, and computing this stitched image's feature value with the feature vectors of step 2); this is called the feature value of the pose;
4) photographing the scene of the basketball-action recognition system at each viewing angle with nobody present; these pictures are called the background;
5) when performing action recognition, photographing the person's whole shooting-action sequence at each viewing angle; these pictures are called the foreground pictures;
6) performing the following silhouette extraction at each viewing angle: comparing the foreground picture with the background picture to obtain a difference image, locating the bounding box of the human body in the image from the difference image, determining the precise image of the body inside the bounding box, and finally removing the interference of the body's shadow; the resulting image is called the silhouette;
7) stitching together the silhouettes of all viewing angles, computing their feature value with the feature vectors extracted in step 2), searching the groups of step 1) for the pose most similar to this feature value, and taking the group containing the nearest pose as the hit group of this frame;
8) finding the hit group of every frame of the shooting action and voting for the group hit most often, which is called the hit group of the whole action;
9) finding, on the action's hit group, the node of the pose whose feature value is closest to each frame, analysing the relations among these points, and repairing the points into one continuous segment on the motion graph with multi-segment expansion and local search; the motion sequence this segment represents is the recognition result of the person's shooting action.
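Steps 7) and 8) above — nearest-pose lookup per frame, then a majority vote over the per-frame hit groups — can be sketched as follows. This is an illustrative Python sketch; the integer bit-string features and the group/feature data layout are assumptions, not the patent's actual machine-learned feature vectors.

```python
from collections import Counter

def hamming(a, b):
    """Hamming distance between two pose feature values stored as ints."""
    return bin(a ^ b).count("1")

def hit_group_for_frame(frame_feature, groups):
    """Step 7: the hit group of a frame is the group containing the pose
    whose feature value is nearest to the frame's feature value."""
    _, gid = min((hamming(frame_feature, feat), gid)
                 for gid, feats in groups.items() for feat in feats)
    return gid

def action_hit_group(frame_features, groups):
    """Step 8: the whole action's hit group is the majority vote over
    the per-frame hit groups."""
    votes = Counter(hit_group_for_frame(f, groups) for f in frame_features)
    return votes.most_common(1)[0][0]
```

Voting over all frames makes the group decision robust to the occasional frame whose silhouette matches a pose in the wrong group.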
2. The shooting-action recognition method based on human silhouette extraction and grouped motion-graph querying according to claim 1, characterized in that said grouping of the shooting actions captured in advance with a motion-capture device into broad classes by orientation and left/right hand, decomposing the actions in each group into independent three-dimensional poses, and building the poses of each group into a motion graph comprises:
1) dividing the three-dimensional motions captured with the motion-capture equipment into four groups: the first group comprises forward left-hand, forward two-hand, and left-side-body left-hand shots; the second group comprises forward single right-hand, forward two-hand, and right-side-body right-hand shots; the third group comprises back-facing two-hand, back-facing left-hand, and left-side-body left-hand shots; the fourth group comprises back-facing two-hand, back-facing right-hand, and right-side-body right-hand shots; the grouping criterion is that the actions within each group transition into one another naturally;
2) cutting out every frame of every action in each group; each frame is called a pose;
3) computing the three-dimensional distance between every two poses in each group and representing the poses and the distances between them with a motion graph, in which a node is a pose and an edge carries the three-dimensional skeleton distance between the poses represented by the two nodes it connects; one motion graph is built for each group.
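The motion-graph construction of step 3) can be sketched as follows. The patent only states that nodes are poses and edges carry the three-dimensional skeleton distance; the root-mean-square joint distance and the edge threshold for a "natural transition" used here are assumptions for illustration.

```python
import math

def pose_distance(p, q):
    """Assumed 3-D skeleton distance between two poses, each a list of
    (x, y, z) joint positions: RMS joint displacement."""
    return math.sqrt(sum((a - b) ** 2
                         for ja, jb in zip(p, q)
                         for a, b in zip(ja, jb)) / len(p))

def build_motion_graph(poses, max_dist=1.0):
    """One node per pose; an edge between any two poses whose skeleton
    distance is small enough that the transition looks natural."""
    graph = {i: [] for i in range(len(poses))}
    for i in range(len(poses)):
        for j in range(i + 1, len(poses)):
            if pose_distance(poses[i], poses[j]) <= max_dist:
                graph[i].append(j)
                graph[j].append(i)
    return graph
```

One such graph is built per group, so the later per-frame matching only ever searches poses that can plausibly follow one another.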
3. The shooting-action recognition method based on human silhouette extraction and grouped motion-graph querying according to claim 1, characterized in that said silhouette extraction performed at each viewing angle — comparing the foreground picture with the background picture to obtain a difference image, locating the bounding box of the human body in the image from the difference image, determining the precise image of the body inside the bounding box, and removing the interference of the body's shadow to obtain the image called the silhouette — comprises:
1) for an input image captured by the camera, computing the difference I_differ between the input image and the background image; let the colour of a point (x, y) on the background image be (R_background(x, y), G_background(x, y), B_background(x, y)) and the colour of the corresponding point on the input image be (R_input(x, y), G_input(x, y), B_input(x, y)); I_differ is a grey-scale image representing the difference between corresponding points of the input and background images, computed as
s(x, y) = max(|R_background(x, y) - R_input(x, y)|, |G_background(x, y) - G_input(x, y)|, |B_background(x, y) - B_input(x, y)|);
I_differ(x, y) = s(x, y)/3 + s(x-1, y)/6 + s(x+1, y)/6 + s(x, y-1)/6 + s(x, y+1)/6;
2) first binarizing I_differ with a large threshold (foreground white, background black) and then dilating, which yields several rough body blobs while small noise is eliminated; finding the largest of these blobs, then setting a distance threshold and absorbing onto this main blob every blob whose distance to it is less than the threshold; the rectangular region R enclosing the foreground in this image is the region in which the human body appears;
3) binarizing the image inside the bounding box with a small threshold, which yields a clear silhouette;
4) binarizing the bottom of this bounding box with a large threshold, which rejects the person's shadow;
5) finally applying simple smoothing and denoising to the extracted image, scaling the human silhouette to a unified size, and placing it at the centre of the picture, obtaining the silhouette.
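The normalization of step 5) can be sketched as follows. This is an illustrative Python sketch; the output size, nearest-neighbour scaling, and function name are assumptions — the claim only requires a unified size and centring.

```python
def normalize_silhouette(img, size=64):
    """Crop the silhouette's bounding box, scale it to fit a `size`x`size`
    canvas with nearest-neighbour sampling (aspect ratio preserved), and
    centre it on the canvas.  `img` is a 2-D list of 0/255 values."""
    ys = [y for y, row in enumerate(img) if any(row)]
    xs = [x for x in range(len(img[0])) if any(row[x] for row in img)]
    if not ys:                                  # empty silhouette
        return [[0] * size for _ in range(size)]
    y0, y1, x0, x1 = min(ys), max(ys) + 1, min(xs), max(xs) + 1
    bh, bw = y1 - y0, x1 - x0
    scale = min(size / bh, size / bw)           # keep aspect ratio
    nh, nw = max(1, int(bh * scale)), max(1, int(bw * scale))
    out = [[0] * size for _ in range(size)]
    oy, ox = (size - nh) // 2, (size - nw) // 2  # centre on the canvas
    for y in range(nh):
        for x in range(nw):
            out[oy + y][ox + x] = img[y0 + int(y / scale)][x0 + int(x / scale)]
    return out
```

Normalizing every silhouette to the same size and position is what makes the Hamming comparison of feature values between different frames and rendered poses meaningful.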
4. The shooting-action recognition method based on human silhouette extraction and grouped motion-graph querying according to claim 1, characterized in that said finding, on the action's hit group, the node of the pose whose feature value is closest to each frame, analysing the relations among these points, and repairing the points into one continuous segment on the motion graph with multi-segment expansion and local search, the motion sequence represented by this segment being the recognition result of the person's shooting action, comprises:
1) mapping the silhouette image sequence to points on the hit group's motion graph, first removing the points that clearly depart from the cluster, after which the remaining points on the motion graph form several sequence segments;
2) expanding forwards and backwards from each segment to infer the missing nodes: for a node to be inferred, taking the node to which the nearest mapped frame was matched as the centre of a region whose radius is a set distance; for every point in this region, computing the Hamming distance between the feature values of its two-dimensional image and of the silhouette of the frame being inferred, and taking the point with the minimum distance as the node matched to this frame;
3) finding all sequence segments formed by the nodes on the motion graph; for each segment, inferring all nodes of the whole motion sequence with the expansion method of the previous step and forming them into a new sequence; for each inferred sequence, computing the sum of the Hamming distances between the feature values of each node's two-dimensional image and of the corresponding frame of the original silhouette sequence; the sequence with the minimum sum is the matched result;
4) interpolating the resulting motion sequence, adding frames as appropriate, to make the motion smooth and coherent.
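The candidate-scoring rule of step 3) can be sketched as follows. This is an illustrative Python sketch under the same assumption as above — feature values as integer bit-strings; each candidate is one full node sequence expanded from one continuous segment.

```python
def hamming(a, b):
    """Hamming distance between two feature bit-strings stored as ints."""
    return bin(a ^ b).count("1")

def best_expansion(candidate_sequences, frame_features, node_features):
    """Score each candidate full sequence by the summed Hamming distance
    between every node's feature value and the corresponding frame's
    silhouette feature value; keep the minimum-sum candidate."""
    def score(seq):
        return sum(hamming(node_features[n], f)
                   for n, f in zip(seq, frame_features))
    return min(candidate_sequences, key=score)
```

Because every continuous segment seeds one candidate, the final minimum-sum selection is what decides which surviving segment the recognized action is built around.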
CN2010101229166A 2010-03-12 2010-03-12 Shooting action identification method based on human body skeleton map extraction and grouping motion diagram inquiry Expired - Fee Related CN101794384B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010101229166A CN101794384B (en) 2010-03-12 2010-03-12 Shooting action identification method based on human body skeleton map extraction and grouping motion diagram inquiry


Publications (2)

Publication Number Publication Date
CN101794384A true CN101794384A (en) 2010-08-04
CN101794384B CN101794384B (en) 2012-04-18

Family

ID=42587066

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010101229166A Expired - Fee Related CN101794384B (en) 2010-03-12 2010-03-12 Shooting action identification method based on human body skeleton map extraction and grouping motion diagram inquiry

Country Status (1)

Country Link
CN (1) CN101794384B (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682302A (en) * 2012-03-12 2012-09-19 浙江工业大学 Human body posture identification method based on multi-characteristic fusion of key frame
CN103310191A (en) * 2013-05-30 2013-09-18 上海交通大学 Human body action identification method for motion information imaging
CN103955680A (en) * 2014-05-20 2014-07-30 深圳市赛为智能股份有限公司 Action recognition method and device based on shape context
CN104408616A (en) * 2014-11-25 2015-03-11 苏州福丰科技有限公司 Supermarket prepayment method based on three-dimensional face recognition
CN105396278A (en) * 2015-12-29 2016-03-16 太原理工大学 Basketball training device
CN106446757A (en) * 2016-05-20 2017-02-22 北京九艺同兴科技有限公司 Human body motion data similarity automatic evaluation method
CN107203638A (en) * 2017-06-08 2017-09-26 北京深瞐科技有限公司 Monitor video processing method, apparatus and system
CN107292899A (en) * 2017-05-05 2017-10-24 浙江大学 A kind of Corner Feature extracting method for two dimensional laser scanning instrument
CN107566911A (en) * 2017-09-08 2018-01-09 广州华多网络科技有限公司 A kind of live broadcasting method, device, system and electronic equipment
CN107688465A (en) * 2016-08-04 2018-02-13 惠州学院 A kind of motion analysis system that swings based on computer vision
CN108256433A (en) * 2017-12-22 2018-07-06 银河水滴科技(北京)有限公司 A kind of athletic posture appraisal procedure and system
CN108647710A (en) * 2018-04-28 2018-10-12 上海与德科技有限公司 A kind of method for processing video frequency, device, computer and storage medium
CN109191366A (en) * 2018-07-12 2019-01-11 中国科学院自动化研究所 Multi-angle of view human body image synthetic method and device based on human body attitude
CN109871893A (en) * 2019-02-18 2019-06-11 清华大学 The behavior prediction method and apparatus generated are kept based on circulation time domain
CN109961039A (en) * 2019-03-20 2019-07-02 上海者识信息科技有限公司 A kind of individual's goal video method for catching and system
CN111061916A (en) * 2019-12-20 2020-04-24 中通服咨询设计研究院有限公司 Video sharing system based on multi-target library image recognition
CN111417443A (en) * 2017-12-06 2020-07-14 环球城市电影有限责任公司 Interactive video game system
CN112507953A (en) * 2020-12-21 2021-03-16 重庆紫光华山智安科技有限公司 Target searching and tracking method, device and equipment
CN113944396A (en) * 2021-10-25 2022-01-18 上海网车科技有限公司 System and method for automatically opening front trunk door of vehicle

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5206915A (en) * 1989-10-10 1993-04-27 Unisys Corporation Image-based document processing system providing lost image recovery
CN1477588A (en) * 2003-07-01 2004-02-25 �Ϻ���ͨ��ѧ Automatic human face identification method based on personal image
US20040184646A1 (en) * 2003-03-19 2004-09-23 Fuji Photo Film Co., Ltd. Method, apparatus, and program for judging images
CN101057736A (en) * 2006-04-18 2007-10-24 明基电通股份有限公司 Cradle device and method for using image recognition technology to control swinging state of cradle
CN101587541A (en) * 2009-06-18 2009-11-25 上海交通大学 Character recognition method based on human body contour outline


Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102682302B (en) * 2012-03-12 2014-03-26 浙江工业大学 Human body posture identification method based on multi-characteristic fusion of key frame
CN102682302A (en) * 2012-03-12 2012-09-19 浙江工业大学 Human body posture identification method based on multi-characteristic fusion of key frame
CN103310191A (en) * 2013-05-30 2013-09-18 上海交通大学 Human body action identification method for motion information imaging
CN103310191B (en) * 2013-05-30 2016-12-28 上海交通大学 The human motion recognition method of movable information image conversion
CN103955680A (en) * 2014-05-20 2014-07-30 深圳市赛为智能股份有限公司 Action recognition method and device based on shape context
CN103955680B (en) * 2014-05-20 2017-05-31 深圳市赛为智能股份有限公司 Action identification method and device based on Shape context
CN104408616A (en) * 2014-11-25 2015-03-11 苏州福丰科技有限公司 Supermarket prepayment method based on three-dimensional face recognition
CN105396278A (en) * 2015-12-29 2016-03-16 太原理工大学 Basketball training device
CN105396278B (en) * 2015-12-29 2018-03-09 太原理工大学 A kind of basketball training device
CN106446757A (en) * 2016-05-20 2017-02-22 北京九艺同兴科技有限公司 Human body motion data similarity automatic evaluation method
CN107688465A (en) * 2016-08-04 2018-02-13 惠州学院 A kind of motion analysis system that swings based on computer vision
CN107292899A (en) * 2017-05-05 2017-10-24 浙江大学 A kind of Corner Feature extracting method for two dimensional laser scanning instrument
CN107203638B (en) * 2017-06-08 2020-09-25 北京深瞐科技有限公司 Monitoring video processing method, device and system
CN107203638A (en) * 2017-06-08 2017-09-26 北京深瞐科技有限公司 Monitor video processing method, apparatus and system
CN107566911B (en) * 2017-09-08 2021-06-29 广州方硅信息技术有限公司 Live broadcast method, device and system and electronic equipment
CN107566911A (en) * 2017-09-08 2018-01-09 广州华多网络科技有限公司 A kind of live broadcasting method, device, system and electronic equipment
CN111417443A (en) * 2017-12-06 2020-07-14 环球城市电影有限责任公司 Interactive video game system
CN108256433A (en) * 2017-12-22 2018-07-06 银河水滴科技(北京)有限公司 A kind of athletic posture appraisal procedure and system
CN108647710A (en) * 2018-04-28 2018-10-12 上海与德科技有限公司 A kind of method for processing video frequency, device, computer and storage medium
CN109191366A (en) * 2018-07-12 2019-01-11 中国科学院自动化研究所 Multi-angle of view human body image synthetic method and device based on human body attitude
CN109871893B (en) * 2019-02-18 2020-10-16 清华大学 Behavior prediction method and device based on cyclic time domain retention generation
CN109871893A (en) * 2019-02-18 2019-06-11 清华大学 The behavior prediction method and apparatus generated are kept based on circulation time domain
CN109961039A (en) * 2019-03-20 2019-07-02 上海者识信息科技有限公司 A kind of individual's goal video method for catching and system
CN111061916A (en) * 2019-12-20 2020-04-24 中通服咨询设计研究院有限公司 Video sharing system based on multi-target library image recognition
CN111061916B (en) * 2019-12-20 2023-05-30 中通服咨询设计研究院有限公司 Video sharing system based on multi-target library image recognition
CN112507953A (en) * 2020-12-21 2021-03-16 重庆紫光华山智安科技有限公司 Target searching and tracking method, device and equipment
CN113944396A (en) * 2021-10-25 2022-01-18 上海网车科技有限公司 System and method for automatically opening front trunk door of vehicle
CN113944396B (en) * 2021-10-25 2023-06-06 上海网车科技有限公司 System and method for automatically opening front and back trunk doors of vehicle

Also Published As

Publication number Publication date
CN101794384B (en) 2012-04-18

Similar Documents

Publication Publication Date Title
CN101794384B (en) Shooting action identification method based on human body skeleton map extraction and grouping motion diagram inquiry
Chen et al. Sports camera calibration via synthetic data
US20230418389A1 (en) Information processing device and method, program and recording medium for identifying a gesture of a person from captured image data
CN106780620B (en) Table tennis motion trail identification, positioning and tracking system and method
US8929602B2 (en) Component based correspondence matching for reconstructing cables
CN108764065A (en) A kind of method of pedestrian's weight identification feature fusion assisted learning
US8578299B2 (en) Method and computing device in a system for motion detection
CN103279765B (en) Steel wire rope surface damage detection method based on images match
CN109145708B (en) Pedestrian flow statistical method based on RGB and D information fusion
CN109598684B (en) Correlation filtering tracking method combined with twin network
CN104851094A (en) Improved method of RGB-D-based SLAM algorithm
CN107424161B (en) Coarse-to-fine indoor scene image layout estimation method
CN105160310A (en) 3D (three-dimensional) convolutional neural network based human body behavior recognition method
CN101551852B (en) Training system, training method and detection method
CN111724439A (en) Visual positioning method and device in dynamic scene
CN106991370B (en) Pedestrian retrieval method based on color and depth
CN107798313A (en) A kind of human posture recognition method, device, terminal and storage medium
CN110866903A (en) Ping-pong ball identification method based on Hough circle transformation technology
CN101826155B (en) Method for identifying act of shooting based on Haar characteristic and dynamic time sequence matching
Yan et al. Depth map generation for 2d-to-3d conversion by limited user inputs and depth propagation
CN109684941A (en) One kind picking region partitioning method based on MATLAB image procossing litchi fruits
CN109272577A (en) A kind of vision SLAM method based on Kinect
CN116052222A (en) Cattle face recognition method for naturally collecting cattle face image
CN104484679B (en) Non- standard rifle shooting warhead mark image automatic identifying method
Sokolova et al. Human identification by gait from event-based camera

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120418

Termination date: 20150312

EXPY Termination of patent right or utility model