CN104881655B - Human behavior recognition method based on multi-feature spatio-temporal relationship fusion - Google Patents

Human behavior recognition method based on multi-feature spatio-temporal relationship fusion

Info

Publication number
CN104881655B
Authority
CN
China
Prior art keywords
time
space
centroid
bipartite graph
features
Prior art date
Legal status
Expired - Fee Related
Application number
CN201510298003.2A
Other languages
Chinese (zh)
Other versions
CN104881655A (en)
Inventor
姚莉 (Yao Li)
Current Assignee
Southeast University
Original Assignee
Southeast University
Priority date
Filing date
Publication date
Application filed by Southeast University
Priority to CN201510298003.2A
Publication of CN104881655A
Application granted
Publication of CN104881655B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

Abstract

The invention discloses a human behavior recognition method based on multi-feature spatio-temporal relationship fusion. The specific steps include: extracting dense trajectory features from the video and representing them with histograms of optical flow and motion boundary histograms; using the K-means algorithm to build the spatio-temporal bipartite graph between the centroids of the two feature types; partitioning the spatio-temporal bipartite graph with a K-way bipartite graph cut; obtaining the video-level encoding after fusion of the two features with a conditional-probability-based representation; and finally training a classifier and performing recognition. In this way, the method computes the spatio-temporal distances between features in each video, builds the spatio-temporal bipartite graph between the centroids of the two feature types, partitions it with a K-way bipartite graph cut, and merges centroids with strong spatio-temporal relationships, so that the useful information in different features is better exploited and the recognition accuracy is improved.

Description

Human behavior recognition method based on multi-feature spatio-temporal relationship fusion
Technical field
The present invention relates to the field of computer vision, and in particular to a human behavior recognition method based on multi-feature spatio-temporal relationship fusion.
Background technology
With the development of computer science, video has become part of everyday life. Enabling computers to "understand" the human behavior in video plays an important role in fields such as content-based video retrieval, intelligent surveillance, human-computer interaction, and virtual reality.
In general, a classical human behavior recognition framework consists of three main steps: feature extraction, video encoding, and classifier training and recognition. When multiple features are used, an optional early-fusion or late-fusion step is also included. Among these steps, video encoding is the key step that determines recognition accuracy.
Currently, one of the most widely used and extended encoding methods is the Bag of Words (BoW) method. The classical BoW method first clusters the features and then represents a video as a histogram vector of the frequencies with which its features fall into each centroid. Although BoW encoding has shown good generalization ability and robustness in many publications, it also has several drawbacks: the feature clustering step is time-consuming, the K-means algorithm depends on the supervised parameter k, and the spatio-temporal relationship information between centroids is lost.
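For reference, the classical BoW pipeline just described can be sketched as follows. This is a minimal illustration, not part of the patented method; it assumes local features have already been extracted as fixed-length descriptors, and all names and parameter values are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

def bow_encode(train_descriptors, video_descriptors, k=100):
    """Classical Bag-of-Words encoding: cluster the training descriptors
    into k centroids, then represent a video as the normalized histogram
    of centroid assignments of its own descriptors."""
    kmeans = KMeans(n_clusters=k, n_init=10).fit(train_descriptors)
    assignments = kmeans.predict(video_descriptors)
    hist = np.bincount(assignments, minlength=k).astype(float)
    return hist / max(hist.sum(), 1.0)
```

The two drawbacks mentioned above are visible here: k must be chosen by hand, and the histogram discards where and when each feature occurred.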
To eliminate the problem that the parameter k of the K-means algorithm must be determined empirically, "Liu J, Shah M. Learning human actions via information maximization[C]// Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008:1-8." uses a mutual information maximization clustering algorithm to determine the most suitable number of centroids in an unsupervised way. The method first performs K-means clustering with a relatively large k to reduce the information loss caused by K-means clustering, and then uses the mutual information maximization clustering algorithm to reduce the number of centroids while losing as little information as possible, which speeds up the subsequent steps.
To address the loss of spatio-temporal relationship information, many researchers have proposed extensions of BoW. According to the kind of information retained, these methods fall into two classes: BoW representations that retain absolute spatio-temporal information and BoW representations that retain relative spatio-temporal information.

The former usually requires a global partition of the spatio-temporal volume of the video, which ties the computed video encoding to the absolute spatio-temporal coordinates of the features and thus sacrifices translation invariance. "Laptev I, Marszalek M, Schmid C, et al. Learning realistic human actions from movies[C]// Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008:1-8." cuts the whole spatio-temporal volume of a video into predefined spatio-temporal grids, then computes a BoW within each grid and concatenates the BoW vectors of all grids into the final video encoding. However, to determine the best grid combination, this method performs a greedy search with cross-validation, a very time-consuming step; in addition, the overlong vector obtained by concatenating the BoW of the different grids further increases the computational complexity. "Sun J, Wu X, Yan S, et al. Hierarchical spatio-temporal context modeling for action recognition[C]// Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009: 2004-2011." obtains three levels of spatio-temporal context information in a hierarchical manner.

The latter, i.e., the methods that retain relative spatio-temporal information, typically encode a video using the relative spatio-temporal distances between BoW centroids or between features. "Kovashka A, Grauman K. Learning a hierarchy of discriminative space-time neighborhood features for human action recognition[C]// Computer Vision and Pattern Recognition. IEEE, 2010:2046-2053." first constructs new features from the points around the original feature points, and then combines the centroid membership of the new features with their orientation information to construct the video encoding; because centroids must be constructed at multiple levels, the computational complexity of this method is relatively high. "Wang J, Chen Z, Wu Y. Action recognition with multiscale spatio-temporal contexts[C]// Computer Vision and Pattern Recognition. IEEE, 2011:3185-3192." encodes a video from the spatio-temporal context interaction information between features captured at multiple spatio-temporal scales of the original features.
Summary of the invention
The technical problem mainly solved by the invention is to provide a human behavior recognition method based on multi-feature spatio-temporal relationship fusion. The method explicitly encodes the spatio-temporal relationship information between the centroids corresponding to two kinds of features, so that the useful information in the different features can be better exploited for human behavior recognition.
To solve the above technical problem, one aspect of the invention provides a human behavior recognition method based on multi-feature spatio-temporal relationship fusion, whose specific steps include:
Step 1: extract dense trajectory features from the video, and represent the extracted trajectory features with two methods, histograms of optical flow and motion boundary histograms, to obtain two feature representations;
Step 2: build the spatio-temporal bipartite graph between the centroids of the two feature types with the K-means algorithm;
Step 3: partition the spatio-temporal bipartite graph of step 2 into centroids with strong spatio-temporal relationships and centroids with weak spatio-temporal relationships using a K-way bipartite graph cut; after the partition, centroids with strong spatio-temporal relationships are merged while centroids with weak spatio-temporal relationships are kept separate;
Step 4: compute the spatio-temporal distance matrix between the centroids with strong spatio-temporal relationships, compress the distance matrix with a conditional-probability-based representation, and obtain the video-level encoding after fusion of the two features;
Step 5: train a classifier and perform recognition.
In a preferred embodiment of the invention, in step 2 the K-means algorithm clusters the two kinds of features obtained in step 1 to obtain a number of centroids; the spatio-temporal relationship between any two features in each video is measured by the L1 distance between their spatio-temporal coordinates, and the spatio-temporal relationships between the two kinds of features are used to compute the spatio-temporal relationships between their centroids, yielding the spatio-temporal bipartite graph between the centroids of the two feature types.
In a preferred embodiment of the invention, the conditional probability representation method of step 4 first discretizes the inter-centroid distance vectors, and then describes the spatio-temporal distance distribution information between any two fused centroids with conditional probabilities.
The beneficial effects of the invention are as follows: the human behavior recognition method based on multi-feature spatio-temporal relationship fusion of the present invention computes the spatio-temporal distances between features in each video, builds the spatio-temporal bipartite graph between the centroids of the two feature types, and partitions the bipartite graph with a K-way bipartite graph cut, thereby merging centroids with strong spatio-temporal relationships, better exploiting the useful information in the different features, and improving the recognition accuracy.
Description of the drawings
Fig. 1 is a flow chart of the human behavior recognition method based on multi-feature spatio-temporal relationship fusion.
Specific implementation mode
The preferred embodiments of the present invention are described in detail below, so that the advantages and features of the invention can be more easily understood by those skilled in the art and the protection scope of the present invention can be defined more clearly.
An embodiment of the present invention comprises a human behavior recognition method based on multi-feature spatio-temporal relationship fusion, whose specific steps include:
Step 1: dense trajectory features are extracted from the video. Feature points are first sampled on a dense grid; to make the sampled feature points robust to scale changes, sampling is performed simultaneously on grids at multiple spatial scales. The dense trajectories are then obtained by tracking each sampled point through the optical flow field estimated for each frame, and each sampled point is tracked for only L frames at its corresponding spatial scale. Finally, a histogram of optical flow and a motion boundary histogram are computed for each trajectory as two different features;
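A minimal sketch of the dense sampling and optical-flow tracking in step 1 is given below. The patent does not prescribe a particular optical flow estimator; OpenCV's Farneback flow, the grid stride, and L = 15 are assumptions made for illustration:

```python
import cv2
import numpy as np

def track_dense_points(frames, stride=8, L=15):
    """Sample points on a dense grid in the first (grayscale) frame and
    track each point for L frames through per-frame dense optical flow."""
    h, w = frames[0].shape[:2]
    ys, xs = np.mgrid[stride // 2:h:stride, stride // 2:w:stride]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float32)
    tracks = [pts.copy()]
    for t in range(min(L, len(frames) - 1)):
        flow = cv2.calcOpticalFlowFarneback(
            frames[t], frames[t + 1], None,
            0.5, 3, 15, 3, 5, 1.2, 0)          # dense (dx, dy) field
        rows = np.clip(pts[:, 1].astype(int), 0, h - 1)
        cols = np.clip(pts[:, 0].astype(int), 0, w - 1)
        pts = pts + flow[rows, cols]           # advect points along the flow
        tracks.append(pts.copy())
    return np.stack(tracks, axis=1)            # (num_points, L + 1, 2)
```

In the full method this sampling would be repeated at several spatial scales, and the HOF and MBH descriptors would be computed around each trajectory.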
Step 2: let the two kinds of features extracted in step 1 be fea1 and fea2. K-means clustering is performed separately on the two features, yielding the centroid sets $C^1 = \{c^1_1, c^1_2, \dots, c^1_m\}$ and $C^2 = \{c^2_1, c^2_2, \dots, c^2_n\}$, from which the spatio-temporal bipartite graph G(V, E) is constructed,
where $V = C^1 \cup C^2$ and E is the adjacency matrix of the bipartite graph, i.e.:

$$E = \begin{pmatrix} 0 & S \\ S^T & 0 \end{pmatrix}$$

where S is the sum of the spatio-temporal distance matrices between the two kinds of features over the entire training set, i.e.:

$$S = \sum_{V} D_V$$

where $D_V$ is the spatio-temporal distance matrix between the centroids of the two feature types in video V. The spatio-temporal relationship between any two features in a video is measured by the L1 distance between their spatio-temporal coordinates; for features located at $(x_i, y_i, t_i)$ and $(x_j, y_j, t_j)$, the L1 distance is computed as:

$$d_{ij} = |x_i - x_j| + |y_i - y_j| + |t_i - t_j|$$
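The construction of the centroid-level matrix S can be sketched as follows. This is an illustration under stated assumptions: each extracted feature carries a descriptor plus an (x, y, t) coordinate, per-video centroid distances are aggregated by averaging, and the dictionary keys are hypothetical names:

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import cdist

def build_bipartite_weights(videos, m=64, n=64):
    """videos: list of dicts with descriptor arrays 'fea1', 'fea2' and
    matching per-feature (x, y, t) arrays 'fea1_xyz', 'fea2_xyz'.
    Returns S, the sum over videos of the centroid distance matrices D_V."""
    km1 = KMeans(n_clusters=m, n_init=10).fit(np.vstack([v['fea1'] for v in videos]))
    km2 = KMeans(n_clusters=n, n_init=10).fit(np.vstack([v['fea2'] for v in videos]))
    S = np.zeros((m, n))
    for v in videos:
        a1, a2 = km1.predict(v['fea1']), km2.predict(v['fea2'])
        d = cdist(v['fea1_xyz'], v['fea2_xyz'], metric='cityblock')  # L1 distance
        for i in np.unique(a1):
            for j in np.unique(a2):
                block = d[np.ix_(a1 == i, a2 == j)]
                S[i, j] += block.mean()          # this video's D_V entry
    return S, km1, km2
```

The full adjacency matrix E then places S and its transpose in the off-diagonal blocks, as in the formula above.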
Step 3: the spatio-temporal bipartite graph of step 2 is partitioned into centroids with strong spatio-temporal relationships and centroids with weak spatio-temporal relationships using a K-way bipartite graph cut. Given a bipartite graph G(V, E), the K-way cut partitions the vertex set V into k subsets $V_1, \dots, V_k$, with the objective of minimizing the balanced cut cost function:

$$\min_{V_1, \dots, V_k} \sum_{i=1}^{k} \frac{\mathrm{cut}(V_i, V \setminus V_i)}{\mathrm{weight}(V_i)}$$

The normalized Laplacian-type matrix is constructed first:

$$L = D_1^{-1/2} S D_2^{-1/2}$$

where $D_1$ and $D_2$ are the diagonal degree matrices of the two centroid sets. Singular value decomposition is then used to find the left singular vectors $u_2, \dots, u_l$ and the right singular vectors $v_2, \dots, v_l$ of L corresponding to its second-largest through l-th-largest singular values, where $l = \lceil \log_2 k \rceil + 1$.

Let

$$Z = \begin{pmatrix} D_1^{-1/2} U \\ D_2^{-1/2} \widetilde{V} \end{pmatrix}, \qquad U = (u_2, \dots, u_l), \quad \widetilde{V} = (v_2, \dots, v_l)$$

K-means clustering is performed on the row vectors of Z, and the K resulting clusters are the fused centroids. If m is the number of fea1 centroids and n is the number of fea2 centroids, the matrix Z has m + n rows; after clustering, the cluster memberships of the first m row vectors give the fused centroids to which the m fea1 centroids belong, and the cluster memberships of the last n row vectors give the fused centroids to which the n fea2 centroids belong;
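An illustrative implementation of this K-way spectral partition is sketched below, in the spirit of the bipartite spectral co-clustering of Dhillon (cited in the non-patent literature below). The degree normalization and the choice of l are assumptions where the patent text is ambiguous:

```python
import numpy as np
from sklearn.cluster import KMeans

def bipartite_kway_cut(S, k):
    """Partition the bipartite graph with m x n weight matrix S into k
    clusters of centroids via SVD of the normalized matrix L."""
    d1 = np.maximum(S.sum(axis=1), 1e-12)       # degrees of fea1 centroids
    d2 = np.maximum(S.sum(axis=0), 1e-12)       # degrees of fea2 centroids
    L = S / np.sqrt(np.outer(d1, d2))           # D1^{-1/2} S D2^{-1/2}
    U, _, Vt = np.linalg.svd(L, full_matrices=False)
    l = max(int(np.ceil(np.log2(k))), 1)
    Zu = U[:, 1:l + 1] / np.sqrt(d1)[:, None]   # skip the trivial leading pair
    Zv = Vt.T[:, 1:l + 1] / np.sqrt(d2)[:, None]
    Z = np.vstack([Zu, Zv])                     # one row per centroid, m + n rows
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(Z)
    return labels[:S.shape[0]], labels[S.shape[0]:]  # fea1 / fea2 memberships
```

Centroids of the two features that land in the same cluster are the ones merged as having strong spatio-temporal relationships.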
Step 4: let $D_V$ be the inter-centroid distance matrix of video V before fusion, let the two fused features be fea1 and fea2 with corresponding centroid sets $C^1 = \{c^1_1, \dots, c^1_m\}$ and $C^2 = \{c^2_1, \dots, c^2_n\}$, and let g be the mapping function from original centroids to fused centroids,
where $g(c^1_i)$ and $g(c^2_j)$ give the indices of the fused centroids to which $c^1_i$ and $c^2_j$ belong.
The distance matrix of video V after centroid fusion is then $D'_V$, where $D'_V(p, q)$ aggregates the entries $D_V(i, j)$ with $g(c^1_i) = p$ and $g(c^2_j) = q$.
First, the conditional probability of each fea1 centroid $c^1_i$ in video V with respect to the discretized distance d to all fea2 centroids is computed as:

$$P(d \mid c^1_i, V) = \frac{N(c^1_i, d)}{\sum_{d'} N(c^1_i, d')}$$

where $N(c^1_i, d)$ denotes the number of fea2 centroids in video V at distance d from the fea1 centroid $c^1_i$.
Then, symmetrically, the conditional probability of each fea2 centroid $c^2_j$ in video V with respect to the distance d to all fea1 centroids is computed as:

$$P(d \mid c^2_j, V) = \frac{N(c^2_j, d)}{\sum_{d'} N(c^2_j, d')}$$

Finally, with the inter-centroid distances discretized into k levels, the entire video V can be encoded as a 2m × k matrix that stacks the conditional probability vectors of the two feature types:

$$\mathrm{Enc}(V) = \big( P(d_q \mid c_p, V) \big)_{p = 1, \dots, 2m;\; q = 1, \dots, k}$$
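A rough sketch of this conditional-probability encoding follows. The uniform bin edges and the shape of the per-video distance matrix are assumptions consistent with the description above, not details fixed by the patent:

```python
import numpy as np

def conditional_probability_encoding(D_v, k_bins=10):
    """Encode one video from its m x n fused inter-centroid distance
    matrix D_v: discretize the distances into k_bins levels, then model
    P(distance bin | centroid) for rows (fea1) and columns (fea2)."""
    edges = np.linspace(D_v.min(), D_v.max() + 1e-9, k_bins + 1)
    bins = np.digitize(D_v, edges[1:-1])         # values in 0..k_bins-1

    def row_probs(B):
        counts = np.stack([(B == b).sum(axis=1) for b in range(k_bins)], axis=1)
        return counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

    enc1 = row_probs(bins)       # P(d | fea1 centroid), one row per centroid
    enc2 = row_probs(bins.T)     # P(d | fea2 centroid)
    return np.vstack([enc1, enc2])   # the (m + n) x k_bins video encoding
```

With m = n this matches the 2m × k encoding size stated above.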
Step 5: finally, a multi-class support vector machine is trained on the fused video-level encodings obtained above and used to recognize new videos.
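For completeness, a minimal classifier stage using a multi-class SVM; the linear kernel and C value are illustrative choices, not specified by the patent:

```python
import numpy as np
from sklearn.svm import SVC

def train_and_predict(train_encodings, train_labels, test_encodings):
    """Flatten each video's encoding matrix into a vector and train a
    multi-class SVM (SVC is one-vs-one across classes by default)."""
    X_train = np.stack([e.ravel() for e in train_encodings])
    X_test = np.stack([e.ravel() for e in test_encodings])
    clf = SVC(kernel='linear', C=1.0).fit(X_train, train_labels)
    return clf.predict(X_test)
```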
Compared with the prior art, the human behavior recognition method based on multi-feature spatio-temporal relationship fusion of the present invention computes the spatio-temporal distances between features in each video, builds the spatio-temporal bipartite graph between the centroids of the two feature types, and partitions this bipartite graph with a K-way bipartite graph cut so that centroids with strong spatio-temporal relationships are merged. In this way the useful information in the different features is better exploited and the recognition accuracy is improved.
The above is only an embodiment of the present invention and does not limit the patent scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the present specification, applied directly or indirectly in other related technical fields, is likewise included within the patent protection scope of the present invention.

Claims (2)

1. A human behavior recognition method based on multi-feature spatio-temporal relationship fusion, characterized in that the specific steps include:
Step 1: extracting dense trajectory features from the video and representing the extracted trajectory features with two methods, histograms of optical flow and motion boundary histograms, to obtain two feature representations;
Step 2: clustering the two kinds of features obtained in step 1 to obtain a number of centroids, measuring the spatio-temporal relationship between any two features in each video by the L1 distance between their spatio-temporal coordinates, using the spatio-temporal relationships between the two kinds of features to compute the spatio-temporal relationships between their centroids, and obtaining the spatio-temporal bipartite graph between the centroids of the two feature types;
Step 3: partitioning the spatio-temporal bipartite graph of step 2 into centroids with strong spatio-temporal relationships and centroids with weak spatio-temporal relationships using a K-way bipartite graph cut, merging the centroids with strong spatio-temporal relationships after the partition, and separating the centroids with weak spatio-temporal relationships;
Step 4: computing the spatio-temporal distance matrix between the centroids with strong spatio-temporal relationships, compressing the distance matrix with a conditional-probability-based representation, and obtaining the video-level encoding after fusion of the two features;
Step 5: training a classifier and performing recognition.
2. The human behavior recognition method based on multi-feature spatio-temporal relationship fusion according to claim 1, characterized in that: the conditional probability representation method of step 4 first discretizes the inter-centroid distance vectors and then describes the spatio-temporal distance distribution information between any two fused centroids with conditional probabilities.
CN201510298003.2A 2015-06-03 2015-06-03 Human behavior recognition method based on multi-feature spatio-temporal relationship fusion Expired - Fee Related CN104881655B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510298003.2A CN104881655B (en) 2015-06-03 2015-06-03 Human behavior recognition method based on multi-feature spatio-temporal relationship fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510298003.2A CN104881655B (en) 2015-06-03 2015-06-03 Human behavior recognition method based on multi-feature spatio-temporal relationship fusion

Publications (2)

Publication Number Publication Date
CN104881655A CN104881655A (en) 2015-09-02
CN104881655B (en) 2018-08-28

Family

ID=53949142

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510298003.2A Expired - Fee Related CN104881655B (en) 2015-06-03 2015-06-03 Human behavior recognition method based on multi-feature spatio-temporal relationship fusion

Country Status (1)

Country Link
CN (1) CN104881655B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106056043B * 2016-05-19 2019-07-30 Institute of Automation, Chinese Academy of Sciences Animal behavior recognition method and device based on transfer learning
CN106599907B * 2016-11-29 2019-11-29 Beihang University Dynamic scene classification method and device based on multi-feature fusion
CN107680119A * 2017-09-05 2018-02-09 Yanshan University Tracking algorithm based on spatio-temporal context fusion of multiple features and scale filtering
CN108322473B * 2018-02-12 2020-05-01 JD Digital Technology Holdings Co., Ltd. User behavior analysis method and device
CN109508684B * 2018-11-21 2022-12-27 Sun Yat-sen University Method for recognizing human behavior in video
CN111080677B * 2019-12-23 2023-09-12 Tianjin University of Technology Protection method for real-time partitioned operation of workers at a pollution remediation site
CN111860598B * 2020-06-18 2023-02-28 China University of Geosciences (Wuhan) Data analysis method and electronic equipment for identifying sports behaviors and relationships


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103854016A * 2014-03-27 2014-06-11 Peking University Shenzhen Graduate School Human body behavior classification and recognition method and system based on directional co-occurrence features
CN104268577A * 2014-06-27 2015-01-07 Dalian University of Technology Human body behavior recognition method based on inertial sensors
CN104063721A * 2014-07-04 2014-09-24 Institute of Automation, Chinese Academy of Sciences Human behavior recognition method based on automatic semantic feature learning and screening

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Inderjit S. Dhillon et al.; Co-clustering documents and words using Bipartite Spectral Graph Partitioning; ACM; 2001-12-31; pp. 269-274 *
Jhuo I H et al.; Discovering joint audio-visual codewords for video event detection; Machine Vision and Applications; 2014-12-31; pp. 33-47 *
Iosifidis A et al.; Discriminant Bag of Words based representation for human action recognition; Pattern Recognition Letters; 2014-12-31; pp. 185-192 *
Wang Bo; Research on human behavior recognition methods based on spatio-temporal interest points; China Master's Theses Full-text Database, Information Technology; 2015-05-15; pp. I138-1068 *

Also Published As

Publication number Publication date
CN104881655A (en) 2015-09-02

Similar Documents

Publication Publication Date Title
CN104881655B (en) Human behavior recognition method based on multi-feature spatio-temporal relationship fusion
CN110929578B (en) Anti-shielding pedestrian detection method based on attention mechanism
CN110428428B (en) Image semantic segmentation method, electronic equipment and readable storage medium
CN105512289B (en) Image search method based on deep learning and Hash
Khan et al. Automatic shadow detection and removal from a single image
CN110210551A (en) A kind of visual target tracking method based on adaptive main body sensitivity
CN110852182B (en) Depth video human body behavior recognition method based on three-dimensional space time sequence modeling
CN107273905B (en) Target active contour tracking method combined with motion information
CN111814719A (en) Skeleton behavior identification method based on 3D space-time diagram convolution
CN111681274A (en) 3D human skeleton recognition and extraction method based on depth camera point cloud data
CN105740915B (en) A kind of collaboration dividing method merging perception information
CN111462191B (en) Non-local filter unsupervised optical flow estimation method based on deep learning
CN110473284A (en) A kind of moving object method for reconstructing three-dimensional model based on deep learning
CN113240691A (en) Medical image segmentation method based on U-shaped network
CN111241963B (en) First person view video interactive behavior identification method based on interactive modeling
CN103226708A (en) Multi-model fusion video hand division method based on Kinect
CN104063721B (en) A kind of human behavior recognition methods learnt automatically based on semantic feature with screening
CN109508686B (en) Human behavior recognition method based on hierarchical feature subspace learning
CN104298974A (en) Human body behavior recognition method based on depth video sequence
CN113128344B (en) Multi-information fusion stereoscopic video significance detection method
CN113435520A (en) Neural network training method, device, equipment and computer readable storage medium
Chen et al. A full density stereo matching system based on the combination of CNNs and slanted-planes
CN115359372A (en) Unmanned aerial vehicle video moving object detection method based on optical flow network
CN113762201A (en) Mask detection method based on yolov4
Liu et al. Towards interpretable and robust hand detection via pixel-wise prediction

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by sipo to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20180828

Termination date: 20190603