CN106446847A

CN106446847A - Human body movement analysis method based on video data

Info

Publication number: CN106446847A
Application number: CN201610867148.4A
Authority: CN
Inventors: 夏春秋
Original assignee: Shenzhen Vision Technology Co Ltd
Current assignee: Shenzhen Vision Technology Co Ltd
Priority date: 2016-09-30
Filing date: 2016-09-30
Publication date: 2017-02-22

Abstract

The invention provides a human body movement analysis method based on video data. The main content of the method comprises data input, space movement evaluation, time movement path extraction and movement suggestion generation. The method comprises processes of firstly using UCF-Sports data for carrying out training, using Olympic sports dataset for carrying out testing, subjecting input data to space movement evaluation including human body evaluation and movement evaluation, obtaining movement scores, completing movement paths by time movement path generation and connection, and finally obtaining movement suggestion results. The method can process human body movements of different postures and generate movement suggestions, provides a greedy search algorithm to achieve movement path generation, and improves the precision of suggestion generation and the efficiency of suggestions.

Description

A kind of human action analysis method based on video data

Technical field

The present invention relates to field of human motion analysis, especially relate to a kind of analysis of the human action based on video data Method.

Background technology

Video actions analysis is an important subject understanding mankind's activity, has obtained extensive concern in recent years. One common task of video actions analysis is action recognition, and its purpose is to determine that the action of which type occurs regarding In frequency.Compare with action recognition, motion detection is an extremely difficult task, and it does not require nothing more than determination type of action, and Also analyze semantics information.

Nowadays video action analysis remain a challenging problem.The space-time of the complexity due to being related in task Relationship modeling, this problem can be considered as there are two basic steps, and that is, space is (for example：Frame level) action evaluation and the (example of time As：Videl stage) path of motion generation.On the one hand, due to the multiformity of action species and the change of behavior of men, very difficult have The frame level action suggestion of meaning and differentiation.Other aspects, it is true that generally in the potential action region of each frame sum, video Persistent period be exponentially increased, this causes certain difficulty to motion analyses.

The present invention proposes a kind of new frame extracting based on spatial displacements evaluation and time path of motion.Using UCF- Sports data is trained, and is tested using Olympic sports dataset data, the data of input is wrapped Include human body evaluation and the spatial displacements evaluation of Motion evaluation, acquisition action fraction, then generated by time path of motion and contact Execution path, finally obtains action advisory result.The present invention can process human action and the generation action of different attitudes Suggestion, provides a greedy search algorithm to solve coordinates measurement of taking action, and improves precision and the suggestion of suggestion generation simultaneously Efficiency.

Content of the invention

For solving the problems, such as no constraining in video clipping search action suggestion, it is an object of the invention to provide a kind of Human action analysis method based on video data is it is proposed that a kind of extracted based on spatial displacements evaluation and time path of motion New frame.

For solving the above problems, the present invention provides a kind of human action analysis method based on video data, and it is mainly interior Hold and include：

(1) data input；

(2) spatial displacements evaluation；

(3) time path of motion extracts；

(4) action suggestion generates.

Wherein, described data input, including training and test two parts, is wherein instructed using UCF-Sports data Practice, tested using Olympic sports dataset data；

(1) UCF-Sports data set has 10 kinds of actions, 150 sections of short-sighted frequencies, has been widely used for operating position fixing；

(2)Olympic sports dataset：This data base has 16 kinds of behaviors, 783 sections of videos,.

Wherein, described spatial displacements evaluation, calculates including human body evaluation, Motion evaluation and action fraction.

Further, described evaluation, including having evaluation index, evaluation is based on action suggestionWith ground truth G it Between average IoU value, it is defined as：

Wherein G_tWithBe respectively t frame detection bounding box and ground truth, o (...) is IoU value, and | C | is one group Frame, testing result therein or ground truth are not empty；WhenWhen, then action suggestion is positive group；η is specified Threshold value, η is set to 0.5.

Further, described human body evaluation, including execution training data, rotate each training sample, respectively fromArriveSeven different angles, be spaced apart Represent t frame i-th action bounding box, bounding box be expressed as [x, y, W, h], wherein w and h represents width and height respectively, and (x, y) is center；After training terminates, each bounding box is in test video 'sProbabilityCNN assessing network can be passed through；By arranging a probability threshold value, the mankind with more high probability build View, keeps for subsequent treatment.

Further, described action evaluation, including using motion clue exclusion negative action suggestion；Light stream rectangular histogram (HOF) descriptor is used to describe everyone exercise suggestion；Two gauss hybrid models (GMMs), G are constructed based on HOFs_p (.) and G_n(.), represents positive and negative suggestion respectively, and prediction belongs to the probability of the motor pattern of action or ground truth； HOFs calculates intersecting unit (IoU) bounding box, overlapping with ground truth more than 0.5 as positive, and those overlaps are less than 0.1 is negative sample；A given testing schemeHOFh with it_i, definitionProbability as motion scores, make Prediction with the mixture of two Gauss models：

σ=1/ (1+e^-x) mapping probability scope be [0,1].

Further, described action fraction calculates, and scores including a bounding box actionBy human detection scoring and Motion scores two parts form, and are defined as follows：

λ_pIt is the parameter that the balance mankind evaluate and Motion evaluation scores.

Wherein, described time path of motion extracts, and generates including path of motion and contact, path of motion complete, step As follows：

(1) path of motion generates

Action suggestion on each framework, finds one group of action path P={ p₁,p₂,…,p_i, wherein p_i=A corresponding path, starts to e-th frame end from s-th frame；Formulating and finding action path set P is maximum Collection covering problem (MSCP), formulates improved optimization purpose MSCP, makes the member in action scoring and set of paths P simultaneously Between similarity maximum；In form, optimization aim is as follows：

s.t|P|≤N (4)

O(p_i,p_j)≤η_P,i≠j

W(p_i,p_j) represent path of motion p_iAnd p_jBetween similarity, its definition will action path association in explain； S(b_t) it is bounding box b_tAction scoring；Φ is path of motion Candidate Set；η_PIt is a threshold value；

First constraint setting in equation (4) comprises the maximum number of path P；Second constraint is conducive to P to avoid producing Raw overlapping redundant actions path；The overlap in two kinds of paths passes through O (p_i,p_j) evaluate, it is defined as follows：

In equation (5),It is defined asRepresent two bounding boxsWithIoU；

In order to solve the MSCP in equation (4), need first to obtain action path candidates collection φ；φ is by space-time smooth-path p_i Composition, its continuous element Following two requirements should be met：

Represent IoU,WithRepresentColor histogram (HOC) and histogrammic gradient (HOG)；λ_aIt is the balance balancing this two weights；η_oAnd η_fIt is threshold value；First requirement in equation (6) ensures continuous bag Enclose boxWithSpatially continuous；Second requirement guaranteesWithThere is similar outward appearance；Therefore, path p_iMay be with With same action person；

The algorithm obtaining φ includes two stages：Sweep forward and backward tracking；The former purpose is the knot of location path Bundle, the purpose of the latter is intended to recover whole path；Its central idea is intended to maintain the optimal Top-N path candidates of a renewal People, is expressed as φ=(τ_k,b^k), k=1,2 ..., N, wherein, τ_kPath K score, by accumulation'sObtain, b^kIt is k road The bounding box of footpath end；In forward lookup, it also records eachAccumulative action fraction

WithMeet two requirements of formula (6), in t frameUpdate path candidates according to following two steps Pond：For each candidate, (τ_k,b^k), k=1,2 ..., N, if there is anyIt is connected to b^k, then b^kTo there be is maximum'sReplace；IfCumulative point bigger than the fraction that N-th advises, for example,(τ_N,b^N), then more It is newlyAfter searching for forward, follow the tracks of backward and recover path candidate (τ_k,b^k) eachMore specifically, for road Footpath p_k, obtainBy solving equation

(2) path of motion contact

After obtaining φ, the MSCP in formula (4) can be solved；Maximum set covering problem greedy search algorithm can To realize 1-1/e approximation ratio；At first, use maximum actuation fraction τ in φ_kCandidate pool p found_i, then it is added It is added to path set P；Assume that P has and comprise k path of motion, enumerate remaining path in φ, find a maximum flow equation For：

In formula (6), W (p_i,p_j) path of motion p_iAnd p_jSimilarity, be defined as

W(p_i,p_j)=1/ (‖ C (p_i)-C(p_j)‖+λ_a‖H(p_i)-H(p_j)‖) (10)

C(p_*) and H (p_*) difference delegated path p_*The cluster centre of bounding box HOC and HOG, W (p_i,p_j) higher value, p_i And p_jIt is probably identical actor；In order to reduce the redundant path in set P, the new path p adding_iEquation (5) should be met In constraint；

(3) path of motion completes

The linear SVM of training is as frame level detector；The positive bag including data set P of initial group Enclose box, and negative bounding box composition excludes data set p, bounding box randomly chooses in positive group, and IoU is less than 0.3；Give in t frame Detection zone b_t, the test position missed in t+1 frame, in order to find most possible position；First, with region b_tInterior light Transformation in stream, b_tIt is mapped to b '_t+1；Second, by extending b '_t+1Height and width, the original length of past half, build Region of search b '_t+1；3rd, b ' is scanned by a set of window_t+1, the ratio of width and length changes in the range of [0.8,1.2] Adapting to a performer may size variation；b_t+1Best regional choice is as a ground below equation to greatest extent：

N(b′_t+1) represent scanning b '_t+1The window collection producing, S_f() is SVM classifier, and input feature vector is selected as The combination of HOC and HOG；Obtaining b_t+1Afterwards, the support vector machine detector of renewal, is used as a positive sample by adding This, b_t+1The bounding box that IoU is less than 0.3 is feminine gender.

Wherein, described action suggestion generates, and can be considered an action including space and time continuous track, be absorbed in one Actor is from appearance until disappearing；For each action, if its persistent period is more than a threshold value specified, this row Dynamic suggestion, is expressed as T.

Brief description

Fig. 1 is a kind of system flow chart of the human action analysis method based on video data of the present invention.

Fig. 2 is a kind of comparison diagram of the human detection result of the human action analysis method based on video data of the present invention.

Fig. 3 is a kind of example of the path of motion generation of human action analysis method based on video data of the present invention.

Fig. 4 is a kind of action suggestion in UCF-Sports of human action analysis method based on video data of the present invention The result generating.

Specific embodiment

It should be noted that in the case of not conflicting, the embodiment in the application and the feature in embodiment can phases Mutually combine, with specific embodiment, the present invention is described in further detail below in conjunction with the accompanying drawings.

Fig. 1 is a kind of system flow chart of the human action analysis method based on video data of the present invention.Main inclusion number According to input；Spatial displacements are evaluated；Time path of motion extracts；Action suggestion generates.

Wherein, data input includes training and test two parts, is wherein trained using UCF-Sports data, makes Tested with Olympic sports dataset data；

(2)Olympic sports dataset：This data base has 16 kinds of behaviors, 783 sections of videos.

Wherein, spatial displacements evaluation includes human body evaluation, Motion evaluation and action fraction and calculates.

Evaluate inclusion and there is evaluation index, evaluation is based on action suggestionAverage IoU value and ground truth G between, it It is defined as：

Human body evaluate, include execute training data, rotate each training sample, respectively fromArriveSeven different Angle, is spaced apart Represent the bounding box of i-th action in t frame, bounding box is expressed as [x, y, w, h], and wherein w and h divides Do not represent width and height, (x, y) is center；After training terminates, each bounding box is in test videoProbability CNN assessing network can be passed through；By arranging a probability threshold value, there is mankind's suggestion of more high probability, keep for follow-up Process.

Action evaluation, including using motion clue exclusion negative action suggestion；Light stream rectangular histogram (HOF) descriptor is used to Everyone exercise suggestion is described；Two gauss hybrid models (GMMs), G are constructed based on HOFs_p(.) and G_n(.), generation respectively The positive and negative suggestion of table, prediction belongs to the probability of the motor pattern of action or ground truth；HOFs calculates intersecting unit (IoU) bounding box, overlapping with ground truth more than 0.5 as positive, and those overlaps are negative sample less than 0.1；Give A fixed testing schemeHOFh with it_i, definitionProbability as motion scores, using two Gauss models The prediction of mixture：

σ=1/ (1+e^-x) mapping probability scope be [0,1].

Action fraction calculates, and scores including a bounding box actionBy human detection scoring and motion scores two parts group Become, be defined as follows：

Wherein, the extraction of time path of motion includes path of motion generation and contact, path of motion complete.Step is as follows：

(1) path of motion generates

s.t|P|≤N (4)

O(p_i,p_j)≤η_P,i≠j

W(p_i,p_j) represent path of motion p_iAnd p_jBetween similarity, its definition will action path association in explain； S(b_t) it is bounding box b_tAction scoring；Φ is path of motion Candidate Set；η_PIt is a threshold value；First in equation (4) about Bundle setting comprises the maximum number of path P；Second constraint is conducive to P to avoid producing overlapping redundant actions path；Liang Zhong road The overlap in footpath passes through O (p_i,p_j) evaluate, it is defined as follows：

In equation (5),It is defined asRepresent two bounding boxsWithIoU；

Represent IoU,And HRepresentColor histogram (HOC) and histogrammic gradient (HOG)；λ_aIt is the balance balancing this two weights；η_oAnd η_fIt is threshold value；First requirement in equation (6) ensures continuous bag Enclose boxWithSpatially continuous；Second requirement guaranteesWithThere is similar outward appearance；Therefore, path p_iMay be with With same action person；

(2) path of motion contact

In formula (6), W (p_i,p_j) path of motion p_iAnd p_jSimilarity, be defined as

W(p_i,p_j)=1/ (‖ C (p_i)-C(p_j)‖+λ_a‖H(p_i)-H(p_j)‖) (10)

(3) path of motion completes

Wherein, action advises that generation includes space and time continuous track and can be considered an action, is absorbed in an actor From appearance until disappearing；For each action, if its persistent period is more than a threshold value specified, this action is built View, is expressed as.

Fig. 2 is a kind of comparison diagram of the human detection result of the human action analysis method based on video data of the present invention. As shown in the figure it can be observed that model inspection result is more accurate and complicated.The bounding box of square frame 1 and 2 is ground truth respectively And testing result.First width figure and the 3rd width figure are that (there is the inspection of a loss in the 3rd by what quick r-cnn obtained Survey)；And the second width figure and the 4th width figure are the results using method therefor of the present invention, there is not loss to human action detection Situation.

Fig. 3 is a kind of example of the path of motion generation of human action analysis method based on video data of the present invention.As Shown in figure it can be observed that, in the first row, front several square frames contain incoherent actor, and in the second row, employ this Invention method therefor, the path of motion of actor, all by accurate recording, illustrates that the method improves to some extent.

Fig. 4 is a kind of action suggestion in UCF-Sports of human action analysis method based on video data of the present invention The result generating.The bounding box of square frame 1 and 2 is ground truth and action suggestion respectively.

For those skilled in the art, the present invention is not restricted to the details of above-described embodiment, in the essence without departing substantially from the present invention In the case of god and scope, the present invention can be realized with other concrete forms.Additionally, those skilled in the art can be to this Bright carry out various change and modification without departing from the spirit and scope of the present invention, these improve and modification also should be regarded as the present invention's Protection domain.Therefore, all changes that claims are intended to be construed to including preferred embodiment and fall into the scope of the invention More and modification.

Claims

1. a kind of human action analysis method based on video data is it is characterised in that mainly include data input ()；Space Action evaluation (two)；Time path of motion extracts (three)；Action suggestion generates (four).

2., based on the data input () described in claim 1 it is characterised in that including training and test two parts, wherein make It is trained with UCF-Sports data, tested using Olympic sports dataset data；

3. based on the spatial displacements evaluation (two) described in claim 1 it is characterised in that include human body evaluation, Motion evaluation and Action fraction calculates.

4., based on the evaluation described in claim 3 it is characterised in that including thering is evaluation index, evaluation is based on action suggestion Average IoU value and ground truth G between, it is defined as：

Wherein G_tWithBe respectively t frame detection bounding box and ground truth, o (...) is IoU value, and | C | is a framing, its In testing result or ground truth be not empty；WhenWhen, then action suggestion is positive group；η is the threshold specified Value, η is set to 0.5.

5., based on the human body evaluation described in claim 3 it is characterised in that execution training data, rotate each training sample, point Be not fromArriveSeven different angles, be spaced apart Represent the bounding box of i-th action in t frame, bounding box table It is shown as [x, y, w, h], wherein w and h represents width and height respectively, and (x, y) is center；After training terminates, each bounding box exists In test videoProbabilityCNN assessing network can be passed through；By arranging a probability threshold value, there is more high probability The mankind suggestion, keep for subsequent treatment.

6. based on the action evaluation described in claim 3 it is characterised in that including building using the negative action of motion clue exclusion View；Light stream rectangular histogram (HOF) descriptor is used to describe everyone exercise suggestion；Two Gaussian Mixture are constructed based on HOFs Model (GMMs), G_p(.) and G_n(.), represents positive and negative suggestion respectively, and prediction belongs to the motion of action or ground truth The probability of pattern；HOFs calculates intersecting unit (IoU) bounding box, overlapping with ground truth more than 0.5 as positive, and that Overlapping a bit is negative sample less than 0.1；A given testing schemeHOFh with it_i, definitionProbability as one transport Dynamic scoring, using the prediction of the mixture of two Gauss models：

σ=1/ (1+e^-x) mapping probability scope be [0,1].

7. calculated based on the action fraction described in claim 3 it is characterised in that a bounding box action is scoredBy people's health check-up Test and appraisal point and motion scores two parts composition, are defined as follows：

8. (three) are extracted based on the time path of motion described in claim 1 and generate and join it is characterised in that including path of motion System, path of motion complete, and step is as follows：

(1) path of motion generates

Action suggestion on each framework, finds one group of action path P={ p₁,p₂,…,p_i, wherein A corresponding path, starts to e-th frame end from s-th frame；Formulating and finding action path set P is maximum Collection covering problem (MSCP), formulates improved optimization purpose MSCP, makes the member in action scoring and set of paths P simultaneously Between similarity maximum；In form, optimization aim is as follows：

W(p_i,p_j) represent path of motion p_iAnd p_jBetween similarity, its definition will action path association in explain；S(b_t) It is bounding box b_tAction scoring；Φ is path of motion Candidate Set；η_PIt is a threshold value；

First constraint setting in equation (4) comprises the maximum number of path P；Second constraint is conducive to P to avoid producing weight Folded redundant actions path；The overlap in two kinds of paths passes through O (p_i,p_j) evaluate, it is defined as follows：

In equation (5),It is defined asRepresent two bounding boxsWithIoU；

In order to solve the MSCP in equation (4), need first to obtain action path candidates collection φ；φ is by space-time smooth-path p_iComposition, Its continuous elementFollowing two requirements should be met：

Represent IoU,WithRepresentColor histogram (HOC) and histogrammic gradient (HOG)；λ_aIt is Balance the balance of this two weights；η_oAnd η_fIt is threshold value；First requirement in equation (6) ensures continuous bounding boxWithSpatially continuous；Second requirement guaranteesWithThere is similar outward appearance；Therefore, path p_iMay follow identical Actor；

The algorithm obtaining φ includes two stages：Sweep forward and backward tracking；The former purpose is the end of location path, after The purpose of person is intended to recover whole path；Its central idea is intended to maintain the optimal Top-N path candidates people of a renewal, represents For φ=(τ_k,b^k), k=1,2 ..., N, wherein, τ_kPath K score, by accumulation'sObtain, b^kIt is k path ends Bounding box；In forward lookup, it also records eachAccumulative action fraction

WithMeet two requirements of formula (6), in t frameUpdate path candidates pond according to following two steps：For Each candidate, (τ_k,b^k), k=1,2 ..., N, if there is anyIt is connected to b^k, then b^kTo there be is maximum'sReplace；IfCumulative point bigger than the fraction that N-th advises, for example,(τ_N,b^N), then it is updated toAfter searching for forward, follow the tracks of backward and recover path candidate (τ_k,b^k) eachMore specifically, for path p_k, ObtainBy solving equation

(2) path of motion contact

After obtaining φ, the MSCP in formula (4) can be solved；Maximum set covering problem greedy search algorithm can be real Existing 1-1/e approximation ratio；At first, use maximum actuation fraction τ in φ_kCandidate pool p found_i, then add it to Path set P；Assume that P has and comprise k path of motion, enumerate remaining path in φ, finding a maximum flow equation is：

In formula (6), W (p_i,p_j) path of motion p_iAnd p_jSimilarity, be defined as

W(p_i,p_j)=1/ (‖ C (p_i)-C(p_j)‖+λ_a‖H(p_i)-H(p_j)‖) (10)

C(p_*) and H (p_*) difference delegated path p_*The cluster centre of bounding box HOC and HOG, W (p_i,p_j) higher value, p_iAnd p_jCan It can be identical actor；In order to reduce the redundant path in set P, the new path p adding_iShould meet in equation (5) Constraint.

9. completed based on the path of motion described in claim 8 it is characterised in that including the linear SVM conduct trained Frame level detector；The positive bounding box including data set P of initial group, and negative bounding box composition exclusion data set p, Bounding box randomly chooses in positive group, and IoU is less than 0.3；Give detection zone b in t frame_t, the test position missed in t+1 frame, In order to find most possible position；First, with region b_tTransformation in interior light stream, b_tIt is mapped to b_t′₊₁；Second, pass through Extension b_t′₊₁Height and width, the original length of past half, build region of search b_t′₊₁；3rd, swept by a set of window Retouch b_t′₊₁, the ratio of width and length changes one performer of adaptation in the range of [0.8,1.2] may size variation；b_t+1Best Regional choice as one ground below equation to greatest extent：

N(b_t′₊₁) represent scanning b_t′₊₁The window collection producing, S_f() is SVM classifier, input feature vector be selected as HOC and The combination of HOG；Obtaining b_t+1Afterwards, the support vector machine detector of renewal, is used as a positive sample, b by adding_t+ ₁The bounding box that IoU is less than 0.3 is feminine gender.

10. (four) are generated based on the action suggestion described in claim 1 it is characterised in that include space and time continuous track can be by It is considered as an action, be absorbed in an actor from appearance until disappearing；For each action, if its persistent period is big In a threshold value specified, this action suggestion, it is expressed as