CN104881655B

CN104881655B - A kind of human behavior recognition methods based on the fusion of multiple features time-space relationship

Info

Publication number: CN104881655B
Application number: CN201510298003.2A
Authority: CN
Inventors: 姚莉
Original assignee: Southeast University
Current assignee: Southeast University
Priority date: 2015-06-03
Filing date: 2015-06-03
Publication date: 2018-08-28
Anticipated expiration: 2035-06-03
Also published as: CN104881655A

Abstract

The invention discloses a kind of human behavior recognition methods based on the fusion of multiple features time-space relationship, specific steps include：The intensive track characteristic of video extraction is indicated by light stream histogram and moving boundaries histogram,Then it uses KMEANS algorithms to build two kinds of features and corresponds to the space-time bigraph (bipartite graph) between barycenter,Space-time bigraph (bipartite graph) is divided using the roads K bigraph (bipartite graph) cutting techniques,The coding of the videl stage after two kinds of Fusion Features is obtained using the representation method based on conditional probability,It finally trains grader and is identified,Through the above way,A kind of human behavior recognition methods based on the fusion of multiple features time-space relationship of the present invention,This method is by calculating the time-space matrix in each video between feature,To which two kinds of features of structure correspond to the space-time bigraph (bipartite graph) between barycenter,Space-time bigraph (bipartite graph) is split using the roads K bigraph (bipartite graph) cutting techniques,It will be merged with the barycenter of strong time-space relationship,The effective information of different characteristic is preferably excavated,Improve recognition accuracy.

Description

A kind of human behavior recognition methods based on the fusion of multiple features time-space relationship

Technical field

The present invention relates to computer vision fields, more particularly, to a kind of mankind's row based on the fusion of multiple features time-space relationship For recognition methods.

Background technology

With the development of computer science, video starts into a part for people's lives, how to allow computer " understanding " Human behavior in video all has the fields such as content based video retrieval system, intelligent monitoring, human-computer interaction and virtual reality Important function.

In general, a classical human behavior identification framework includes mainly three steps：Feature extraction, Video coding And the training and identification of grader, in addition, for using various features the case where, further include an optional multiple features early period Fusion or later stage fusion steps, Video coding therein are the committed steps for determining recognition accuracy.

Currently, it is bag of words to be widely used with one of improved coding method（BagofWords, abbreviation BoW）Method, warp The BoW methods of allusion quotation first cluster feature, and it is straight that representation of video shot is then appeared in the frequency in each barycenter at feature Side's figure vector, although BoW codings show that good generalization ability and robustness, this method also have in many documents Many disadvantages：For example time-consuming feature clustering process, KMEANS algorithms have time-space relationship between supervision parameter k and barycenter to believe The loss of breath.

Parameter k in order to eliminate KMEANS algorithms relies on empirically determined problem, " Liu J, Shah M. Learning human actions via information maximization[C]// Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008:1-8. " uses mutual trust Breath maximizes the unsupervised most suitable barycenter quantity of determination of clustering algorithm, which is carried out with a larger k first KMEANS is clustered, and to reduce since information caused by KMEANS clusters is lost, is existed later by mutual information maximization clustering algorithm Barycenter quantity is reduced under the premise of loss information as few as possible, and the calculating speed of subsequent step is improved with this.

In order to solve the problems, such as that time-space relationship information is lost, many researchers propose the extended method based on BoW, press According to the difference of retained information, these methods are divided into two classes：The BoW for retaining absolute space-time information indicates and retains relative space-time The BoW of information is indicated.The former usually requires to carry out global segmentation to the space-time body of video, this makes the Video coding being calculated It is related to the absolute space-time coordinate of feature, lack translation invariance.“Laptev I, Marszalek M, Schmid C, et al. Learning realistic human actions from movies[C]// Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008:1-8. " The space-time body integral of video is cut into predefined space-time grid, calculates BoW in each grid respectively later and all grids BoW vectors be together in series as final Video coding.However, in order to determine that best grid combination, this method need to use Cross validation carries out greedy search, and the step is very time-consuming, in addition, obtained by the BoW for different grids of connecting Overlength vector further increases computation complexity.“Sun J, Wu X, Yan S, et al. Hierarchical spatio-temporal context modeling for action recognition[C]// Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009: 2004-2011. " obtains three layers of spatio-temporal context information in a manner of a kind of level.And the latter, that is, retain relative space-time information Method typically carries out Video coding using the relative space-time distance between BoW barycenter or feature.“Kovashka A, Grauman K. Learning a hierarchy of discriminative space-time neighborhood features for human action recognition[C]// Computer Vision and Pattern Recognition. IEEE, 2010:2046-2053. " constructs new feature with the point around former characteristic point first, combines later The affiliated barycenter information of new feature and new feature directional information construct Video coding.Due to needing the barycenter of construction multi-layer, the party The computation complexity of method is relatively high.“Wang J, Chen Z, Wu Y. Action recognition with multiscale spatio-temporal contexts[C]// Computer Vision and Pattern Recognition. IEEE, 2011:3185-3192. " passes through between the acquisition feature in multiple spatial and temporal scales of former feature Space-time context interactive information carries out Video coding.

Invention content

The invention mainly solves the technical problem of providing a kind of human behavior knowledges based on the fusion of multiple features time-space relationship Other method, two kinds of features of this method pair correspond to the time-space relationship information between barycenter and carry out explicit code, can preferably excavate The effective information of different characteristic carries out human behavior identification.

In order to solve the above technical problems, one aspect of the present invention is：One kind being based on multiple features time-space relationship The human behavior recognition methods of fusion, specific steps include：

Step 1：Intensive track characteristic extraction is carried out to video, and with two kinds of sides of light stream histogram and moving boundaries histogram Method is indicated the track characteristic of extraction, obtains two kinds of character representations；

Step 2：Two kinds of features, which are built, with KMEANS algorithms corresponds to the space-time bigraph (bipartite graph) between barycenter；

Step 3：The space-time bigraph (bipartite graph) in step 2 is divided into strong time-space relationship using the roads K bigraph (bipartite graph) cutting techniques Barycenter and barycenter with weak time-space relationship, the barycenter fusion with strong time-space relationship will get up after segmentation, when will have weak The barycenter of void relation separates；

Step 4：The time-space matrix matrix between the barycenter with strong time-space relationship is calculated, and using based on conditional probability Representation method matrix of adjusting the distance is compressed, and the coding of the videl stage after two kinds of Fusion Features is obtained；

Step 5：Training grader is simultaneously identified.

In a preferred embodiment of the present invention, KMEANS algorithms will obtain in the step 1 in the step 2 two Kind feature is clustered, and to obtain several barycenter, spacetime coordinate is corresponded to by calculating any two feature in each video Between L1 distances weigh the time-space relationship between two features, using the time-space relationship between two kinds of features calculate its barycenter Between time-space relationship, and obtain two kinds of features and correspond to space-time bigraph (bipartite graph) between barycenter.

In a preferred embodiment of the present invention, conditional probability described in the step 4 indicates method first to barycenter The distance between vector carry out discretization, then the time-space matrix point after any two fusion between barycenter is described with conditional probability Cloth information.

The beneficial effects of the invention are as follows：A kind of human behavior identification side based on the fusion of multiple features time-space relationship of the present invention Method, this method is by calculating the time-space matrix in each video between feature, to which two kinds of features of structure correspond between barycenter Space-time bigraph (bipartite graph), and space-time bigraph (bipartite graph) is split using the roads K bigraph (bipartite graph) cutting techniques, it will be with strong time-space relationship with this Barycenter merges, and has preferably excavated the effective information of different characteristic, has improved recognition accuracy.

Description of the drawings

Fig. 1 is a kind of flow chart of the human behavior recognition methods based on the fusion of multiple features time-space relationship.

Specific implementation mode

The preferred embodiments of the present invention will be described in detail below so that advantages and features of the invention can be easier to by It will be appreciated by those skilled in the art that so as to make a clearer definition of the protection scope of the present invention.

The embodiment of the present invention includes：A kind of human behavior recognition methods based on the fusion of multiple features time-space relationship, it is specific to walk Suddenly include：

Step 1：Intensive track characteristic extraction is carried out to video, carrying out characteristic point in an intensive grid first adopts Sample.In order to enable the characteristic point of acquisition adapts to change of scale, will be adopted in the grid of multiple and different space scales simultaneously Sample, then intensive track characteristic by estimate the optical flow field of each frame to each sampled point into line trace, and each sampled point L frames are only tracked in its corresponding space scale, and finally feature calculation light stream histogram and moving boundaries histogram are made respectively For two different features；

Step 2：If two kinds extracted in the step 1 are characterized as fea1 and fea2, both features are carried out respectively The barycenter that KMEANS is clustered is respectivelyWith, by this obtained The barycenter construction space-time bigraph (bipartite graph) G (V, E) of two kinds of features.

Wherein V=, E is the adjacency matrix of bigraph (bipartite graph), i.e.,：

Wherein, S the sum of time-space matrix matrixes between two kinds of features in entire training set, i.e.,：

Wherein,Time-space matrix matrix of the barycenter in video V is corresponded to for two kinds of features, is each regarded by calculating Any two feature corresponds to the L1 distances between spacetime coordinate to weigh the time-space relationship between two features, L1 distance meters in frequency Calculating formula is:

；

Step 3：The space-time bigraph (bipartite graph) in step 2 is divided into strong time-space relationship using the roads K bigraph (bipartite graph) cutting techniques Barycenter and barycenter with weak time-space relationship give a bigraph (bipartite graph) G (V, E), the roads the K segmentation of bigraph (bipartite graph) is by vertex set V points It is cut into k subset, it is therefore an objective to minimize balancing cost function：

Laplacian Matrix is constructed first：

And find out L's using singular value decompositionA corresponding left singular vector of big singular valueIt is strange with the right side Incorgruous amount, wherein setting,,It arrivesThe second largest unusual greatly to l of L is corresponded to respectively The corresponding left singular vector of value,It arrivesThe second largest of L is corresponded to respectively corresponds to right singular vector to the big singular values of l；

If

It is rightRow vector carry out KMEANS clusters, K obtained barycenter be merge barycenter, ifIt is characterized 1 barycenter Quantity,It is characterized 2 barycenter quantity, then matrixLine number be, by clustering,BeforeA row Barycenter difference character pair 1 belonging to vectorA barycenter generic,AfterA affiliated barycenter of row vector is right respectively Answer feature 2A barycenter generic；

Step 4：If the distance matrix of video V is before fusion, merged two kinds are characterized as fea1 and fea2, two Kind feature corresponds to barycenter set and is respectivelyWith, and the protoplasm heart is to melting Close barycenter mapping function be；

Wherein,,

Then the distance matrix of barycenter fusion rear video V is, wherein

It calculates firstfea1BarycenterIn video V withfea2All barycenter with distanceRelevant condition is general Rate is:

Wherein,It indicates in video Vfea1BarycenterWithfea2All barycenter apart's Number,

Then, the barycenter of fea2 is symmetrically calculatedIn video V withfea1All barycenter with distanceIt is relevant Conditional probability is：

Finally, entire video V can be encoded into the matrix of 2m*k：

；

Step 5：Finally, it is encoded using the fusion rear video grade of acquisition, one multi-class support vector machine of training is for newly regarding The identification of frequency.

Compared with prior art, a kind of human behavior recognition methods based on the fusion of multiple features time-space relationship of the present invention, should Method is by calculating the time-space matrix in each video between feature, to which two kinds of features of structure correspond to the space-time two between barycenter Portion's figure, and space-time bigraph (bipartite graph) is split using the roads K bigraph (bipartite graph) cutting techniques, barycenter with strong time-space relationship is melted with this It closes, has preferably excavated the effective information of different characteristic, improved recognition accuracy.

Example the above is only the implementation of the present invention is not intended to limit the scope of the invention, every to utilize this hair Equivalent structure or equivalent flow shift made by bright specification is applied directly or indirectly in other relevant technical fields, Similarly it is included within the scope of the present invention.

Claims

1. a kind of human behavior recognition methods based on the fusion of multiple features time-space relationship, which is characterized in that specific steps include：

Step 1：Intensive track characteristic extraction is carried out to video, and with light stream histogram and moving boundaries histogram two methods pair The track characteristic of extraction is indicated, and obtains two kinds of character representations；

Step 2：Obtained in the step 1 two kinds of features are clustered, it is each by calculating to obtain several barycenter Any two feature corresponds to the L1 distances between spacetime coordinate to weigh the time-space relationship between two features in video, utilizes two Time-space relationship between kind of feature calculates the time-space relationship between its barycenter, and obtains two kinds of features and correspond to space-time between barycenter Bigraph (bipartite graph)；

Step 3：The space-time bigraph (bipartite graph) in step 2 is divided into the matter with strong time-space relationship using the roads K bigraph (bipartite graph) cutting techniques The heart and barycenter with weak time-space relationship will get up after segmentation, will have weak space-time to close in the barycenter fusion with strong time-space relationship The barycenter of system separates；

Step 4：The time-space matrix matrix between the barycenter with strong time-space relationship is calculated, and uses the expression based on conditional probability Method matrix of adjusting the distance is compressed, and the coding of the videl stage after two kinds of Fusion Features is obtained；

Step 5：Training grader is simultaneously identified.

2. a kind of human behavior recognition methods based on the fusion of multiple features time-space relationship according to claim 1, feature It is：Conditional probability described in the step 4 indicates that method carries out discretization to the distance between barycenter vector first, then The time-space matrix distributed intelligence after any two merges between barycenter is described with conditional probability.