CN104881655B - Human behavior recognition method based on multi-feature spatio-temporal relationship fusion - Google Patents
- Publication number
- CN104881655B CN104881655B CN201510298003.2A CN201510298003A CN104881655B CN 104881655 B CN104881655 B CN 104881655B CN 201510298003 A CN201510298003 A CN 201510298003A CN 104881655 B CN104881655 B CN 104881655B
- Authority
- CN
- China
- Prior art keywords
- time
- space
- barycenter
- bigraph
- features
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06V20/42—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/49—Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
Abstract
The invention discloses a human behavior recognition method based on multi-feature spatio-temporal relationship fusion. The specific steps include: dense trajectory features are extracted from a video and represented by histograms of optical flow and motion boundary histograms; the K-means algorithm is then used to build a spatio-temporal bipartite graph between the centroids of the two feature types; the bipartite graph is partitioned with a K-way bipartite graph cut; a video-level encoding fusing the two feature types is obtained with a conditional-probability-based representation; finally, a classifier is trained and recognition is performed. By computing the spatio-temporal distances between features in each video, the method builds the spatio-temporal bipartite graph between the centroids of the two feature types, partitions it with the K-way cut, and merges the centroids with strong spatio-temporal relationships, so that the effective information of the different features is better exploited and recognition accuracy is improved.
Description
Technical field
The present invention relates to the field of computer vision, and more particularly to a human behavior recognition method based on multi-feature spatio-temporal relationship fusion.
Background art
With the development of computer science, video has become a part of people's lives. Enabling computers to "understand" the human behavior in video is important for fields such as content-based video retrieval, intelligent surveillance, human-computer interaction, and virtual reality.
In general, a classical human behavior recognition framework mainly includes three steps: feature extraction, video encoding, and classifier training and recognition. In addition, when multiple features are used, an optional early-fusion or late-fusion step may be included. Among these, video encoding is the key step that determines recognition accuracy.
At present, one of the most widely used and extended encoding methods is the bag-of-words (BoW) method. The classical BoW method first clusters the features and then represents a video as a histogram vector of the frequencies with which its features fall into each centroid. Although BoW encoding has shown good generalization ability and robustness in many publications, the method also has several drawbacks: a time-consuming feature clustering step, the supervised parameter k of the K-means algorithm, and the loss of the spatio-temporal relationship information between centroids.
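The classical BoW encoding described above can be sketched as follows. This is a minimal illustration with toy descriptors and a fixed codebook, not the patent's implementation; the function and variable names are chosen for this example only.

```python
import numpy as np

def bow_encode(descriptors, centroids):
    """Classical BoW: assign each local descriptor to its nearest
    centroid and return the normalized frequency histogram."""
    # pairwise squared Euclidean distances, shape (n_descriptors, n_centroids)
    d = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    assignments = d.argmin(axis=1)  # nearest centroid per descriptor
    hist = np.bincount(assignments, minlength=len(centroids)).astype(float)
    return hist / hist.sum()        # frequency histogram vector

# toy example: 2 centroids, descriptors clustered around each of them
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
descs = np.array([[0.1, -0.2], [0.3, 0.1], [9.8, 10.1], [10.2, 9.9]])
h = bow_encode(descs, centroids)
```

In a real pipeline the centroids would come from K-means over the training features; the drawbacks noted above (clustering cost, the choice of k, lost spatio-temporal relations) apply to exactly this representation.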
To eliminate the empirical determination of the parameter k in the K-means algorithm, "Liu J, Shah M. Learning human actions via information maximization[C]// Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008: 1-8." uses an unsupervised mutual-information-maximization clustering algorithm to determine the most suitable number of centroids. The method first performs K-means clustering with a relatively large k to reduce the information loss caused by K-means, and then uses mutual-information-maximization clustering to reduce the number of centroids while losing as little information as possible, which also speeds up the subsequent steps.
To address the loss of spatio-temporal relationship information, many researchers have proposed extensions of BoW. According to the kind of information retained, these methods fall into two classes: BoW representations that retain absolute spatio-temporal information and BoW representations that retain relative spatio-temporal information. The former usually require a global partition of the spatio-temporal volume of the video, which makes the resulting video encoding depend on the absolute spatio-temporal coordinates of the features, so it lacks translation invariance. "Laptev I, Marszalek M, Schmid C, et al. Learning realistic human actions from movies[C]// Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008: 1-8." cuts the whole spatio-temporal volume of a video into predefined spatio-temporal grids, computes a BoW in each grid, and concatenates the BoW vectors of all grids as the final video encoding. However, to determine the best grid combination, this method performs a greedy search with cross-validation, which is very time-consuming; moreover, the overlength vector obtained by concatenating the BoW of the different grids further increases the computational complexity. "Sun J, Wu X, Yan S, et al. Hierarchical spatio-temporal context modeling for action recognition[C]// Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 2009: 2004-2011." obtains three levels of spatio-temporal context information in a hierarchical manner. The latter class, i.e., the methods that retain relative spatio-temporal information, typically encodes a video using the relative spatio-temporal distances between BoW centroids or between features. "Kovashka A, Grauman K. Learning a hierarchy of discriminative space-time neighborhood features for human action recognition[C]// Computer Vision and Pattern Recognition. IEEE, 2010: 2046-2053." first constructs new features from the points around the original feature points, and then builds the video encoding from the centroid memberships and orientation information of the new features. Because centroids must be constructed at multiple levels, the computational complexity of this method is relatively high. "Wang J, Chen Z, Wu Y. Action recognition with multiscale spatio-temporal contexts[C]// Computer Vision and Pattern Recognition. IEEE, 2011: 3185-3192." encodes a video using the spatio-temporal context interaction information between features collected at multiple spatio-temporal scales of the original features.
Summary of the invention
The main technical problem solved by the invention is to provide a human behavior recognition method based on multi-feature spatio-temporal relationship fusion. The method explicitly encodes the spatio-temporal relationship information between the centroids of two feature types, so that the effective information of the different features can be better exploited for human behavior recognition.
To solve the above technical problem, one aspect of the present invention provides a human behavior recognition method based on multi-feature spatio-temporal relationship fusion, whose specific steps include:
Step 1: dense trajectory features are extracted from the video, and the extracted trajectory features are represented with two methods, histograms of optical flow and motion boundary histograms, to obtain two feature representations;
Step 2: the spatio-temporal bipartite graph between the centroids of the two feature types is built with the K-means algorithm;
Step 3: the spatio-temporal bipartite graph of step 2 is partitioned with a K-way bipartite graph cut into centroids with strong spatio-temporal relationships and centroids with weak spatio-temporal relationships; after the partition, the centroids with strong spatio-temporal relationships are merged and the centroids with weak spatio-temporal relationships are separated;
Step 4: the spatio-temporal distance matrix between the centroids with strong spatio-temporal relationships is computed, the distance matrix is compressed with a conditional-probability-based representation, and the video-level encoding after fusion of the two feature types is obtained;
Step 5: a classifier is trained and recognition is performed.
In a preferred embodiment of the present invention, in step 2 the K-means algorithm clusters the two feature types obtained in step 1 to obtain a number of centroids; the spatio-temporal relationship between any two features is measured by the L1 distance between their spatio-temporal coordinates in each video; the spatio-temporal relationships between the centroids are then computed from the spatio-temporal relationships between the two feature types, yielding the spatio-temporal bipartite graph between the centroids of the two feature types.
In a preferred embodiment of the present invention, the conditional-probability representation of step 4 first discretizes the distances between the centroid vectors, and then describes the spatio-temporal distance distribution between any two merged centroids with conditional probabilities.
The beneficial effects of the invention are as follows: by computing the spatio-temporal distances between features in each video, the human behavior recognition method based on multi-feature spatio-temporal relationship fusion of the present invention builds a spatio-temporal bipartite graph between the centroids of the two feature types, partitions the bipartite graph with a K-way bipartite graph cut, and merges the centroids with strong spatio-temporal relationships; the effective information of the different features is better exploited, and recognition accuracy is improved.
Description of the drawings
Fig. 1 is a flowchart of a human behavior recognition method based on multi-feature spatio-temporal relationship fusion.
Detailed description of the embodiments
The preferred embodiments of the present invention are described in detail below, so that the advantages and features of the invention can be more easily understood by those skilled in the art, and the protection scope of the invention can be defined more clearly.
An embodiment of the present invention comprises a human behavior recognition method based on multi-feature spatio-temporal relationship fusion, whose specific steps include:
Step 1: dense trajectory features are extracted from the video. Feature points are first sampled on a dense grid; so that the sampled points adapt to scale changes, sampling is performed simultaneously on grids at multiple different spatial scales. The dense trajectories are then obtained by estimating the optical flow field of each frame and tracking each sampled point through it; each sampled point is tracked for only L frames at its corresponding spatial scale. Finally, a histogram of optical flow and a motion boundary histogram are computed for each trajectory as two different features;
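As an illustration of the two trajectory descriptors named in step 1, the sketch below computes a histogram of optical flow (HOF) and motion boundary histograms (MBH) from a given flow field with NumPy. The 8-bin quantization and the function names are assumptions of this example; the patent does not fix these details.

```python
import numpy as np

def orientation_hist(fx, fy, bins=8):
    """Histogram of 2-D vectors (fx, fy), binned by orientation and
    weighted by magnitude, then L1-normalized (zeros if no motion)."""
    ang = np.arctan2(fy, fx) % (2 * np.pi)
    mag = np.hypot(fx, fy)
    idx = np.minimum((ang / (2 * np.pi) * bins).astype(int), bins - 1)
    h = np.zeros(bins)
    np.add.at(h, idx.ravel(), mag.ravel())
    s = h.sum()
    return h / s if s > 0 else h

def hof_mbh(flow_x, flow_y, bins=8):
    """HOF: orientation histogram of the optical flow itself.
    MBH: orientation histograms of the spatial gradients of each flow
    component (motion boundaries), one for the x and one for the y
    component of the flow."""
    hof = orientation_hist(flow_x, flow_y, bins)
    gx0, gx1 = np.gradient(flow_x)   # spatial derivatives of flow_x
    gy0, gy1 = np.gradient(flow_y)   # spatial derivatives of flow_y
    mbh_x = orientation_hist(gx1, gx0, bins)
    mbh_y = orientation_hist(gy1, gy0, bins)
    return hof, mbh_x, mbh_y

# uniform rightward motion: all HOF mass falls into the first bin,
# and the motion boundaries (gradients of a constant field) vanish
flow_x = np.ones((5, 5))
flow_y = np.zeros((5, 5))
hof, mbh_x, mbh_y = hof_mbh(flow_x, flow_y)
```

In the dense trajectory pipeline these histograms would be accumulated along each tracked point's L-frame trajectory; here a single flow field stands in for that aggregation.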
Step 2: let the two feature types extracted in step 1 be fea1 and fea2, and let the centroids obtained by performing K-means clustering on each of them be A = {a_1, ..., a_m1} for fea1 and B = {b_1, ..., b_m2} for fea2. From these centroids, the spatio-temporal bipartite graph G(V, E) is constructed, where the vertex set is V = A ∪ B and E is the adjacency matrix of the bipartite graph, i.e., E = S, where S is the sum of the spatio-temporal distance matrices between the two feature types over the entire training set: S = sum over all training videos V of S_V. Here S_V is the spatio-temporal distance matrix between the centroids of the two feature types in video V; it is obtained by measuring the spatio-temporal relationship between any two features in the video by the L1 distance between their spatio-temporal coordinates, computed as d(f1, f2) = |x1 - x2| + |y1 - y2| + |t1 - t2|;
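A minimal sketch of the per-video distance matrix S_V described in step 2: given the spatio-temporal coordinates (x, y, t) of the features of both types and their centroid assignments, the L1 distance of every cross-type feature pair is accumulated into the entry of their centroid pair. The bipartite adjacency E is then the sum of S_V over all training videos. All names here are illustrative, not the patent's.

```python
import numpy as np

def video_distance_matrix(coords1, assign1, coords2, assign2, m1, m2):
    """S_V[i, j]: summed L1 spatio-temporal distance between every fea1
    feature assigned to centroid i and every fea2 feature assigned to
    centroid j, within one video."""
    # |x1-x2| + |y1-y2| + |t1-t2| for all cross-type feature pairs
    d = np.abs(coords1[:, None, :] - coords2[None, :, :]).sum(axis=2)
    S = np.zeros((m1, m2))
    for p, ci in enumerate(assign1):
        for q, cj in enumerate(assign2):
            S[ci, cj] += d[p, q]
    return S

# toy video: two fea1 features, one fea2 feature, coordinates are (x, y, t)
coords1 = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
assign1 = np.array([0, 1])            # fea1 centroid index of each feature
coords2 = np.array([[1.0, 0.0, 0.0]])
assign2 = np.array([0])               # fea2 centroid index of each feature
S = video_distance_matrix(coords1, assign1, coords2, assign2, m1=2, m2=1)
# E would be the sum of such S matrices over all training videos
```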
Step 3: the spatio-temporal bipartite graph of step 2 is partitioned into centroids with strong spatio-temporal relationships and centroids with weak spatio-temporal relationships using a K-way bipartite graph cut. Given a bipartite graph G(V, E), the K-way partition of the bipartite graph cuts the vertex set V into k subsets with the objective of minimizing a balanced cut cost function. Following the bipartite spectral co-clustering of Dhillon et al., the normalized matrix L_n = D1^(-1/2) E D2^(-1/2) is constructed first, where D1 and D2 are the diagonal degree matrices of the two vertex sets; singular value decomposition is then used to find the left singular vectors u_2, ..., u_l and the right singular vectors v_2, ..., v_l corresponding to the second largest to l-th largest singular values of L_n. Setting U_l = (u_2, ..., u_l) and V_l = (v_2, ..., v_l), let Z = [D1^(-1/2) U_l ; D2^(-1/2) V_l]. K-means clustering is performed on the row vectors of Z, and the K clusters obtained are the merged centroids. If m1 is the number of fea1 centroids and m2 is the number of fea2 centroids, the matrix Z has m1 + m2 rows; after clustering, the clusters to which the first m1 row vectors belong give the merged-centroid memberships of the m1 fea1 centroids, and the clusters of the last m2 row vectors give those of the m2 fea2 centroids;
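The K-way bipartite cut of step 3 can be sketched with the bipartite spectral co-clustering scheme of Dhillon (listed in this patent's non-patent citations): normalize the weight matrix by the vertex degrees, take singular vectors starting from the second largest singular value, stack the degree-scaled left and right singular vectors, and run K-means on the rows. The choice l = ceil(log2 k) + 1 and the use of scikit-learn's KMeans are assumptions of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

def bipartite_kway_cut(E, k, seed=0):
    """K-way spectral partition of a bipartite graph with nonnegative
    weight matrix E (m1 x m2), following Dhillon's co-clustering:
    SVD of D1^{-1/2} E D2^{-1/2}, then k-means on the scaled singular
    vectors. Returns cluster labels for the m1 row vertices and the
    m2 column vertices."""
    d1 = np.maximum(E.sum(axis=1), 1e-12)   # degrees of the row vertices
    d2 = np.maximum(E.sum(axis=0), 1e-12)   # degrees of the column vertices
    Ln = (E / np.sqrt(d1)[:, None]) / np.sqrt(d2)[None, :]
    U, s, Vt = np.linalg.svd(Ln)
    l = int(np.ceil(np.log2(k))) + 1        # number of singular vectors kept
    U_l = U[:, 1:l] / np.sqrt(d1)[:, None]  # skip the trivial first vector
    V_l = Vt[1:l].T / np.sqrt(d2)[:, None]
    Z = np.vstack([U_l, V_l])               # (m1 + m2) x (l - 1)
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(Z)
    return labels[:E.shape[0]], labels[E.shape[0]:]

# two near-disconnected blocks: vertices {rows 0,1; cols 0,1} vs.
# {rows 2,3; cols 2,3} should land in different merged clusters
E = np.array([[5.0, 5.0, 0.01, 0.01],
              [4.0, 6.0, 0.01, 0.01],
              [0.01, 0.01, 5.0, 5.0],
              [0.01, 0.01, 6.0, 4.0]])
row_lab, col_lab = bipartite_kway_cut(E, k=2)
```

The small 0.01 cross-weights keep the graph connected so that the second singular vector is well defined; with real data E would be the accumulated distance matrix of step 2.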
Step 4: let D_V be the distance matrix of video V before fusion, let the two fused feature types be fea1 and fea2 with centroid sets A = {a_1, ..., a_m} and B = {b_1, ..., b_m}, and let f be the mapping from the original centroids to the merged centroids produced by step 3, f: A ∪ B -> {1, ..., K}. The distance matrix of video V after centroid fusion, D'_V, is obtained from D_V by regrouping its rows and columns according to f. The distances are discretized into k values d_1, ..., d_k. First, for each centroid a_i of fea1, the conditional probability in video V of distance d_r with respect to all centroids of fea2 is computed: p(d_r | a_i) = N(a_i, d_r) / (N(a_i, d_1) + ... + N(a_i, d_k)), where N(a_i, d_r) denotes the number of fea2 centroids in video V whose distance from a_i equals d_r. Then, symmetrically, for each centroid b_j of fea2, the conditional probability of distance d_r with respect to all centroids of fea1 is computed: p(d_r | b_j) = N(b_j, d_r) / (N(b_j, d_1) + ... + N(b_j, d_k)). Finally, the entire video V can be encoded as the 2m x k matrix whose rows are the k-dimensional conditional probability vectors of the 2m centroids;
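The conditional-probability compression of step 4 can be sketched as follows: the entries of a centroid distance matrix are discretized into k bins, and each centroid is described by the conditional distribution of the binned distances to all centroids of the other feature type, giving a 2m x k encoding for m centroids per feature type. The uniform binning scheme below is an assumption; the patent does not specify it.

```python
import numpy as np

def conditional_probability_encoding(D, k):
    """Encode one video from the m x m centroid distance matrix D
    (rows: fea1 centroids, columns: fea2 centroids). Distances are
    discretized into k bins; output row r is the conditional
    distribution p(distance bin | centroid), first for the m fea1
    centroids, then for the m fea2 centroids -> shape (2m, k)."""
    edges = np.linspace(D.min(), D.max(), k + 1)
    bins = np.clip(np.digitize(D, edges) - 1, 0, k - 1)  # bin index per pair
    m = D.shape[0]
    enc = np.zeros((2 * m, k))
    for i in range(m):              # fea1 centroid i vs. all fea2 centroids
        np.add.at(enc[i], bins[i, :], 1.0)
    for j in range(D.shape[1]):     # fea2 centroid j vs. all fea1 centroids
        np.add.at(enc[m + j], bins[:, j], 1.0)
    enc /= enc.sum(axis=1, keepdims=True)  # counts -> conditional probabilities
    return enc

# toy 2 x 2 centroid distance matrix for one video
D = np.array([[0.0, 10.0],
              [10.0, 0.0]])
enc = conditional_probability_encoding(D, k=2)
```

Each row sums to 1, so the encoding is invariant to the absolute number of centroid pairs, which is the compression effect the patent attributes to the conditional-probability representation.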
Step 5: finally, using the obtained fused video-level encodings, a multi-class support vector machine is trained for the recognition of new videos.
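Step 5 can be sketched with a multi-class linear SVM on the flattened video-level encodings; scikit-learn's LinearSVC is assumed here, and the training data are synthetic stand-ins for real encodings.

```python
import numpy as np
from sklearn.svm import LinearSVC

# flatten each video's 2m x k encoding into one vector and train a
# multi-class linear SVM (one-vs-rest) on labelled training videos
rng = np.random.default_rng(0)
m, k = 4, 6
X_train = np.vstack([
    (rng.random((2 * m, k)) + label).ravel()   # toy encodings, 3 classes
    for label in (0, 1, 2) for _ in range(10)
])
y_train = np.repeat([0, 1, 2], 10)
clf = LinearSVC(C=1.0, max_iter=5000).fit(X_train, y_train)
pred = clf.predict(X_train)
```

A new video would be encoded with the same fused centroids and binning, flattened the same way, and passed to `clf.predict`.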
Compared with the prior art, the human behavior recognition method based on multi-feature spatio-temporal relationship fusion of the present invention computes the spatio-temporal distances between features in each video, builds the spatio-temporal bipartite graph between the centroids of the two feature types, and partitions the bipartite graph with a K-way bipartite graph cut, thereby merging the centroids with strong spatio-temporal relationships; the effective information of the different features is better exploited, and recognition accuracy is improved.
The above are only embodiments of the present invention and are not intended to limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of the present description, whether applied directly or indirectly in other related technical fields, is likewise included within the protection scope of the present invention.
Claims (2)
1. A human behavior recognition method based on multi-feature spatio-temporal relationship fusion, characterized in that its specific steps include:
Step 1: dense trajectory features are extracted from a video, and the extracted trajectory features are represented with two methods, histograms of optical flow and motion boundary histograms, to obtain two feature representations;
Step 2: the two feature types obtained in step 1 are clustered to obtain a number of centroids; the spatio-temporal relationship between any two features in each video is measured by the L1 distance between their spatio-temporal coordinates; the spatio-temporal relationships between the centroids are computed from the spatio-temporal relationships between the two feature types; and the spatio-temporal bipartite graph between the centroids of the two feature types is obtained;
Step 3: the spatio-temporal bipartite graph of step 2 is partitioned with a K-way bipartite graph cut into centroids with strong spatio-temporal relationships and centroids with weak spatio-temporal relationships; after the partition, the centroids with strong spatio-temporal relationships are merged and the centroids with weak spatio-temporal relationships are separated;
Step 4: the spatio-temporal distance matrix between the centroids with strong spatio-temporal relationships is computed, the distance matrix is compressed with a conditional-probability-based representation, and the video-level encoding after fusion of the two feature types is obtained;
Step 5: a classifier is trained and recognition is performed.
2. The human behavior recognition method based on multi-feature spatio-temporal relationship fusion according to claim 1, characterized in that: the conditional-probability representation of step 4 first discretizes the distances between the centroid vectors, and then describes the spatio-temporal distance distribution between any two merged centroids with conditional probabilities.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510298003.2A CN104881655B (en) | 2015-06-03 | 2015-06-03 | A kind of human behavior recognition methods based on the fusion of multiple features time-space relationship |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104881655A CN104881655A (en) | 2015-09-02 |
CN104881655B true CN104881655B (en) | 2018-08-28 |
Family
ID=53949142
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510298003.2A Expired - Fee Related CN104881655B (en) | 2015-06-03 | 2015-06-03 | A kind of human behavior recognition methods based on the fusion of multiple features time-space relationship |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104881655B (en) |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106056043B (en) * | 2016-05-19 | 2019-07-30 | 中国科学院自动化研究所 | Animal behavior recognition methods and device based on transfer learning |
CN106599907B (en) * | 2016-11-29 | 2019-11-29 | 北京航空航天大学 | The dynamic scene classification method and device of multiple features fusion |
CN107680119A (en) * | 2017-09-05 | 2018-02-09 | 燕山大学 | A kind of track algorithm based on space-time context fusion multiple features and scale filter |
CN108322473B (en) * | 2018-02-12 | 2020-05-01 | 京东数字科技控股有限公司 | User behavior analysis method and device |
CN109508684B (en) * | 2018-11-21 | 2022-12-27 | 中山大学 | Method for recognizing human behavior in video |
CN111080677B (en) * | 2019-12-23 | 2023-09-12 | 天津理工大学 | Protection method for real-time partition operation of workers in pollution remediation site |
CN111860598B (en) * | 2020-06-18 | 2023-02-28 | 中国地质大学(武汉) | Data analysis method and electronic equipment for identifying sports behaviors and relationships |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103854016A (en) * | 2014-03-27 | 2014-06-11 | 北京大学深圳研究生院 | Human body behavior classification and identification method and system based on directional common occurrence characteristics |
CN104063721A (en) * | 2014-07-04 | 2014-09-24 | 中国科学院自动化研究所 | Human behavior recognition method based on automatic semantic feature study and screening |
CN104268577A (en) * | 2014-06-27 | 2015-01-07 | 大连理工大学 | Human body behavior identification method based on inertial sensor |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103854016A (en) * | 2014-03-27 | 2014-06-11 | 北京大学深圳研究生院 | Human body behavior classification and identification method and system based on directional common occurrence characteristics |
CN104268577A (en) * | 2014-06-27 | 2015-01-07 | 大连理工大学 | Human body behavior identification method based on inertial sensor |
CN104063721A (en) * | 2014-07-04 | 2014-09-24 | 中国科学院自动化研究所 | Human behavior recognition method based on automatic semantic feature study and screening |
Non-Patent Citations (4)
Title |
---|
Co-clustering documents and words using Bipartite Spectral Graph Partitioning;Inderjit S.Dhillon等;《ACM》;20011231;第269-274页 * |
Discovering joint audio-visual codewords for video event detection;Jhuo I H等;《 Machine vision and applications》;20141231;第33-47页 * |
Discriminant Bag of Words based representation for human action recognition;Iosifidis A等;《Pattern Recognition Letters》;20141231;第185-192页 * |
Research on human behavior recognition methods based on spatio-temporal interest points; Wang Bo; China Master's Theses Full-text Database, Information Science and Technology; 20150515; I138-1068 *
Also Published As
Publication number | Publication date |
---|---|
CN104881655A (en) | 2015-09-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104881655B (en) | A kind of human behavior recognition methods based on the fusion of multiple features time-space relationship | |
CN110929578B (en) | Anti-shielding pedestrian detection method based on attention mechanism | |
CN110428428B (en) | Image semantic segmentation method, electronic equipment and readable storage medium | |
CN105512289B (en) | Image search method based on deep learning and Hash | |
Khan et al. | Automatic shadow detection and removal from a single image | |
CN110210551A (en) | A kind of visual target tracking method based on adaptive main body sensitivity | |
CN110852182B (en) | Depth video human body behavior recognition method based on three-dimensional space time sequence modeling | |
CN107273905B (en) | Target active contour tracking method combined with motion information | |
CN111814719A (en) | Skeleton behavior identification method based on 3D space-time diagram convolution | |
CN111681274A (en) | 3D human skeleton recognition and extraction method based on depth camera point cloud data | |
CN105740915B (en) | A kind of collaboration dividing method merging perception information | |
CN111462191B (en) | Non-local filter unsupervised optical flow estimation method based on deep learning | |
CN110473284A (en) | A kind of moving object method for reconstructing three-dimensional model based on deep learning | |
CN113240691A (en) | Medical image segmentation method based on U-shaped network | |
CN111241963B (en) | First person view video interactive behavior identification method based on interactive modeling | |
CN103226708A (en) | Multi-model fusion video hand division method based on Kinect | |
CN104063721B (en) | A kind of human behavior recognition methods learnt automatically based on semantic feature with screening | |
CN109508686B (en) | Human behavior recognition method based on hierarchical feature subspace learning | |
CN104298974A (en) | Human body behavior recognition method based on depth video sequence | |
CN113128344B (en) | Multi-information fusion stereoscopic video significance detection method | |
CN113435520A (en) | Neural network training method, device, equipment and computer readable storage medium | |
Chen et al. | A full density stereo matching system based on the combination of CNNs and slanted-planes | |
CN115359372A (en) | Unmanned aerial vehicle video moving object detection method based on optical flow network | |
CN113762201A (en) | Mask detection method based on yolov4 | |
Liu et al. | Towards interpretable and robust hand detection via pixel-wise prediction |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
EXSB | Decision made by sipo to initiate substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20180828 Termination date: 20190603 |