CN104881655A

CN104881655A - Human behavior recognition method based on multi-feature time-space relationship fusion

Info

Publication number: CN104881655A
Application number: CN201510298003.2A
Authority: CN
Inventors: 姚莉
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2015-06-03
Filing date: 2015-06-03
Publication date: 2015-09-02
Anticipated expiration: 2035-06-03
Also published as: CN104881655B

Abstract

The invention discloses a human behavior recognition method based on multi-feature time-space relationship fusion, specifically comprising the following steps: first, dense trajectory features extracted from videos are represented through an optical flow histogram and a moving boundary histogram; then, a time-space bipartite graph between the centers of mass of the two types of features is constructed by a KMEANS algorithm, the time-space bipartite graph is segmented by a K-path bipartite graph segmentation technology, and a representation method based on conditional probability is adopted to obtain video-level coding after fusion of the two types of features; and finally, a classifier is trained, and recognition is performed. According to the human behavior recognition method based on multi-feature time-space relationship fusion, the time-space bipartite graph between the centers of mass of the two types of features is constructed by calculating the time-space distance between the features of each video, the time-space bipartite graph is segmented by the K-path bipartite graph segmentation technology, and the centers of mass with strong time-space relationship are fused. Thus, effective information of different features is better mined, and the recognition accuracy is improved.

Description

A kind of human behavior recognition methods of merging based on multiple features time-space relationship

Technical field

The present invention relates to computer vision field, especially relate to a kind of human behavior recognition methods of merging based on multiple features time-space relationship.

Background technology

Along with the development of computer science, video starts to become a part for people's life, how to allow human behavior in computing machine " understanding " video, all has vital role to fields such as content based video retrieval system, intelligent monitoring, man-machine interaction and virtual realities.

Generally speaking, a classical human behavior identification framework mainly comprises three steps: the training of feature extraction, Video coding and sorter and identification, in addition, for the manifold situation of employing, also comprise an optional multiple features to merge or later stage fusion steps early stage, Video coding is wherein the committed step determining recognition accuracy.

At present, one of coding method being widely used and improving is word bag (BagofWords, be called for short BoW) method, first classical BoW method carries out cluster to feature, then videometer be shown as feature appear in each barycenter frequency histogram vector, although BoW coding shows good generalization ability and robustness in a lot of document, but the method also has a lot of shortcoming: such as time-consuming feature clustering process, the loss having time-space relationship information between supervision parameter k and barycenter of KMEANS algorithm.

In order to the problem that the parameter k dependence experience eliminating KMEANS algorithm is determined, " LiuJ, ShahM.Learninghumanactionsviainformationmaximization [C] .ComputerVisionandPatternRecognition, 2008.CVPR2008.IEEEConferenceon.IEEE, 2008:1-8. " use that mutual information maximization clustering algorithm is unsupervised determines most suitable barycenter quantity, first this algorithm carries out KMEANS cluster with a larger k, to reduce the information dropout because KMEANS cluster causes, under the prerequisite of the least possible drop-out, barycenter quantity is reduced afterwards by mutual information maximization clustering algorithm, the computing velocity of subsequent step is improved with this.

In order to solve the problem of time-space relationship information dropout, many researchers propose the extended method based on BoW, according to the difference of retained information, these methods are divided into two classes: the BoW retaining absolute space-time information represents and the BoW that retains relative space-time information represents.The former needs to carry out global segmentation to the space-time body of video usually, and this makes the absolute space-time coordinate of Video coding and the feature calculated relevant, lacks translation invariance." LaptevI; MarszalekM; SchmidC; etal.Learningrealistichumanactionsfrommovies [C] .ComputerVisionandPatternRecognition; 2008.CVPR2008.IEEEConferenceon.IEEE; 2008:1-8. " is slit into predefined space-time grid the space-time body integration of video, calculates BoW respectively afterwards and the BoW of all grids vector is together in series as final Video coding in each grid.But in order to determine best grid combination, the method needs to carry out greed search with cross validation, and this step is very time-consuming, and in addition, the overlength vector obtained by the BoW of different grid of connecting further increases computation complexity." SunJ; WuX; YanS; etal.Hierarchicalspatio-temporalcontextmodelingforaction recognition [C] .ComputerVisionandPatternRecognition; 2009.CVPR2009.IEEEConferenceon.IEEE, 2009:2004-2011. " obtains three layers of spatio-temporal context information in a kind of mode of level.And the latter, namely retain the method for relative space-time information, normally utilize the relative space-time distance between BoW barycenter or feature to carry out Video coding." KovashkaA; GraumanK.Learningahierarchyofdiscriminativespace-timenei ghborhoodfeaturesforhumanactionrecognition [C] .ComputerVisionandPatternRecognition (CVPR); 2010IEEEConferenceon.IEEE; 2010:2046-2053. " first with the some structure new feature around former unique point, afterwards in conjunction with barycenter information belonging to new feature and new feature directional information structure Video coding.Owing to needing the barycenter constructing multi-layer, the computation complexity of the method is relatively high." WangJ; ChenZ; WuY.Actionrecognitionwithmultiscalespatio-temporalcontex ts [C] .ComputerVisionandPatternRecognition (CVPR); 2011IEEEConferenceon.IEEE, 2011:3185-3192. " carries out Video coding by the space-time context interactive information obtained in multiple spatial and temporal scales of former feature between feature.。

Summary of the invention

The technical matters that the present invention mainly solves is to provide a kind of human behavior recognition methods of merging based on multiple features time-space relationship, the method carries out explicit code to the time-space relationship information between the corresponding barycenter of two kinds of features, and the effective information that better can excavate different characteristic carries out human behavior identification.

For solving the problems of the technologies described above, the technical scheme that the present invention adopts is: a kind of human behavior recognition methods of merging based on multiple features time-space relationship, and concrete steps comprise:

Step 1: intensive track characteristic extraction is carried out to video, and with light stream histogram and moving boundaries histogram two kinds of methods, the track characteristic extracted is represented, obtain two kinds of character representations;

Step 2: build the space-time bigraph (bipartite graph) between the corresponding barycenter of two kinds of features with KMEANS algorithm;

Step 3: adopt K road bigraph (bipartite graph) cutting techniques the space-time bigraph (bipartite graph) in step 2 to be divided into the barycenter with strong time-space relationship and the barycenter with weak time-space relationship, the barycenter after segmentation with strong time-space relationship is merged, will there is the barycenter of weak time-space relationship separately;

Step 4: calculate there is strong time-space relationship barycenter between time-space matrix matrix, and adopt the matrix of adjusting the distance of the method for expressing based on conditional probability to compress, obtain the coding of the videl stage after two kinds of Fusion Features;

Step 5: training classifier also identifies.

In a preferred embodiment of the present invention, in described step 2, the two kinds of features obtained in described step 1 are carried out cluster by KMEANS algorithm, thus obtain several barycenter, the time-space relationship between two features is weighed by the L1 distance calculated in each video between the corresponding spacetime coordinates of any two features, utilize the time-space relationship between two kinds of features to calculate time-space relationship between its barycenter, and obtain the space-time bigraph (bipartite graph) between the corresponding barycenter of two kinds of features.

In a preferred embodiment of the present invention, conditional probability described in described step 4 represents that first method carries out discretize to the distance vector between barycenter, then describes any two time-space matrix distributed intelligences of merging between rear barycenter with conditional probability.

The invention has the beneficial effects as follows: a kind of human behavior recognition methods of merging based on multiple features time-space relationship of the present invention, the method is by calculating the time-space matrix in each video between feature, thus the space-time bigraph (bipartite graph) built between the corresponding barycenter of two kinds of features, and adopt K road bigraph (bipartite graph) cutting techniques to split space-time bigraph (bipartite graph), with this, barycenter with strong time-space relationship is merged, the better effective information excavating different characteristic, improves recognition accuracy.

Accompanying drawing explanation

Fig. 1 is a kind of process flow diagram of the human behavior recognition methods based on the fusion of multiple features time-space relationship.

Embodiment

Below preferred embodiment of the present invention is described in detail, can be easier to make advantages and features of the invention be readily appreciated by one skilled in the art, thus more explicit defining is made to protection scope of the present invention.

The embodiment of the present invention comprises: a kind of human behavior recognition methods of merging based on multiple features time-space relationship, and concrete steps comprise:

Step 1: intensive track characteristic extraction is carried out to video, first carries out unique point sampling in an intensive grid.Change of scale can be adapted to make the unique point gathered, to sample in the grid of multiple different spaces yardstick simultaneously, then intensive track characteristic is by estimating that the optical flow field of each frame is followed the tracks of each sampled point, and each sampled point only follows the tracks of L frame in the space scale of its correspondence, finally respectively to feature calculation light stream histogram and moving boundaries histogram as two kinds of different features;

Step 2: establish two kinds of extracting in described step 1 to be characterized as fea1 and fea2, the barycenter that KMEANS cluster obtains is carried out respectively to these two kinds of features and is respectively with , construct space-time bigraph (bipartite graph) G (V, E) by the barycenter of these the two kinds of features obtained.

Wherein V= , E is the adjacency matrix of bigraph (bipartite graph), that is:

Wherein, S is time-space matrix matrix sum between two kinds of features in whole training set, that is:

Wherein, be the time-space matrix matrix of the corresponding barycenter of two kinds of features in video V, weigh the time-space relationship between two features by the L1 distance calculated in each video between the corresponding spacetime coordinates of any two features, L1 distance computing formula is:

；

Step 3: adopt K road bigraph (bipartite graph) cutting techniques the space-time bigraph (bipartite graph) in step 2 to be divided into the barycenter with strong time-space relationship and the given bigraph (bipartite graph) G (V of barycenter with weak time-space relationship, E), vertex set V is divided into k subset by the K road segmentation of bigraph (bipartite graph) , object minimizes balancing cost function:

First Laplacian Matrix is constructed :

And use svd to obtain L's the left singular vector that individual time singular value is corresponding greatly with right singular vector , Qi Zhongshe , , arrive corresponding L's is second largest to the corresponding left singular vector of the large singular value of l respectively, arrive corresponding L's is second largest to the corresponding right singular vector of the large singular value of l respectively;

If

Right row vector carry out KMEANS cluster, K the barycenter obtained for merge barycenter, if for the barycenter quantity of feature 1, for the barycenter quantity of feature 2, then matrix line number be , by cluster, before barycenter difference character pair 1 belonging to individual row vector individual barycenter generic, after barycenter difference character pair 2 belonging to individual row vector individual barycenter generic;

Step 4: set merge before video V distance matrix as , two kinds that carry out merging are characterized as fea1 and fea2, and two kinds of corresponding barycenter set of feature are respectively with , and the protoplasm heart to the mapping function merging barycenter is ;

Wherein , ,

Then the distance matrix of barycenter fusion rear video V is , wherein

First calculate fea1barycenter in video V with fea2all barycenter with distance relevant conditional probability is:

Wherein, represent in video V fea1barycenter with fea2all barycenter apart number of times,

Then, the barycenter of symmetrical calculating fea2 in video V with fea1all barycenter with distance relevant conditional probability is:

Finally, whole video V can be encoded into the matrix of 2m*k:

；

Step 5: last, utilizes the fusion rear video level coding obtained, trains a multi-class support vector machine to be used for the identification of new video.

Compared with prior art, a kind of human behavior recognition methods of merging based on multiple features time-space relationship of the present invention, the method is by calculating the time-space matrix in each video between feature, thus the space-time bigraph (bipartite graph) built between the corresponding barycenter of two kinds of features, and adopt K road bigraph (bipartite graph) cutting techniques to split space-time bigraph (bipartite graph), with this, barycenter with strong time-space relationship is merged, better excavated the effective information of different characteristic, improved recognition accuracy.

The foregoing is only embodiments of the invention; not thereby the scope of the claims of the present invention is limited; every utilize instructions of the present invention to do equivalent structure or the conversion of equivalent flow process, or be directly or indirectly used in other relevant technical fields, be all in like manner included in scope of patent protection of the present invention.

Claims

1., based on the human behavior recognition methods that multiple features time-space relationship merges, it is characterized in that, concrete steps comprise:

Step 5: training classifier also identifies.

2. a kind of human behavior recognition methods of merging based on multiple features time-space relationship according to claim 1, it is characterized in that: in described step 2, the two kinds of features obtained in described step 1 are carried out cluster by KMEANS algorithm, thus obtain several barycenter, the time-space relationship between two features is weighed by the L1 distance calculated in each video between the corresponding spacetime coordinates of any two features, utilize the time-space relationship between two kinds of features to calculate time-space relationship between its barycenter, and obtain the space-time bigraph (bipartite graph) between the corresponding barycenter of two kinds of features.

3. a kind of human behavior recognition methods of merging based on multiple features time-space relationship according to claim 1, it is characterized in that: conditional probability described in described step 4 represents that first method carries out discretize to the distance vector between barycenter, then describe any two time-space matrix distributed intelligences of merging between rear barycenter with conditional probability.