CN102034096A - Video event recognition method based on top-down motion attention mechanism - Google Patents
- Publication number
- CN102034096A (application CN201010591513A)
- Authority
- CN
- China
- Prior art keywords
- video
- motion
- interest
- point
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a video event recognition method based on a top-down motion attention mechanism, which comprises the following steps: 1) detecting, on a computer, the interest points of every frame of each video in a video set using a difference-of-Gaussian detector, the video set comprising a training video set and a test video set; 2) extracting scale-invariant feature transform (SIFT) descriptors and optical-flow features at the detected interest points of each frame; 3) building an appearance vocabulary and a motion vocabulary; 4) learning, on the training video set, the probability of each motion word with respect to each event class and building a motion-attention histogram; 5) computing the similarity between the videos in the set with the Earth Mover's Distance and generating a kernel matrix; and 6) training a support vector machine classifier with the obtained kernel matrix to get the classifier parameters, classifying the test video set, and outputting the classification results.
Description
Technical field
The present invention relates to the field of computer application technology, and in particular to a video event recognition method.
Background technology
In recent years, with the rapid development of the Internet, the spread of video compression technology, DVD, WebTV and third-generation mobile communication (3G), and especially the build-out of broadband networks, people have ever more opportunities to access video information interactively, and video portal sites have emerged, such as Youku and Tudou in China and YouTube abroad. Video producers around the world — television stations, film studios, advertisement makers and so on — keep creating new audio-visual material, and digital capture devices such as digital cameras and camcorders have entered ordinary homes, so digital video media have come to fill people's daily lives in large quantities.
How to let people quickly locate, conveniently obtain and effectively manage the useful information contained in video is a problem demanding a prompt solution; in essence, it is the problem of using computer technology to manage and represent video content effectively. Video content understanding has become an international research focus: many researchers apply video data processing techniques to extract the implicit, useful and understandable semantic information in video, thereby realizing video content understanding. Video data has its own characteristics — it is voluminous and poorly structured — so the problems brought by the explosion of video information are serious, and in many fields large amounts of collected video lie idle because they cannot be processed effectively.
Event recognition has long been one of the main tasks of TRECVID. As multimedia information on the network grows ever richer, content-based multimedia retrieval attracts more and more attention. At present, the greatest problem faced by content-based retrieval is the "semantic gap" between low-level features and high-level semantics. Video event detection and recognition combines computer vision techniques with content-based multimedia retrieval, context information and relevant domain knowledge, fusing multiple cues for reasoning, and establishes the link between low-level features and high-level semantics on the basis of events. By building an event-based semantic description of video, we can carry out higher-level semantic analysis of multimedia video and build efficient indexing and retrieval mechanisms. Past video analysis was confined to videos under fixed cameras or strictly controlled databases such as Weizmann, KTH and IXMAS; unlike such controlled video, the videos in event detection all come from real sources such as news broadcasts, sports videos and films, which confronts event detection with many challenges: disordered motion, complex backgrounds, occlusion of targets, illumination changes, geometric deformation of targets, and so on.
A video event is commonly described from two aspects: what happens and how it happens. "What" generally refers to the appearance features of the video frames — people, objects, buildings, etc.; "how" refers to the dynamic characteristics of the video, i.e., its motion features. Motion information is unique to video data; it represents how the video content develops and changes over time and plays a considerable role in describing and understanding video content. How to fuse these two aspects effectively is also a very challenging problem. Effective methods for describing events are still lacking, mainly because existing methods consider only one aspect of an event — either "what" or "how" — and in particular some methods use only the distribution of motion, which is not robust on real video. Work on fusing the two aspects remains rare, and traditional fusion methods such as early fusion and late fusion are basically bottom-up: they blindly combine the two aspects of an event and are not task-driven.
Summary of the invention
(1) Technical problem to be solved
In order to overcome the interference of background information with the classification process in the prior art, which makes the extracted features poorly targeted and the recognition accuracy low, the object of the present invention is to provide a video event recognition method based on a top-down motion attention mechanism that fuses the static and dynamic features of video.
(2) Technical scheme
To achieve the above object, the present invention provides a video event recognition method based on a top-down motion attention mechanism, comprising the following steps:
Step S1: on a computer, detect the interest points of every frame of each video in the video set using a difference-of-Gaussian detector, the video set comprising a training video set and a test video set;
Step S2: extract appearance features and motion features at the detected interest points of each frame, the appearance features being scale-invariant feature transform (SIFT) descriptors and the motion features being optical-flow features;
Step S3: cluster the obtained SIFT descriptors and optical-flow features to build an appearance vocabulary and a motion vocabulary, respectively;
Step S4: on the training video set, learn the probability of each motion word with respect to each event class and build the motion-attention histogram;
Step S5: using the motion-attention histogram features of the video set, compute with the Earth Mover's Distance the similarities between training videos, and between training videos and test videos, and generate the kernel matrix;
Step S6: train a support vector machine classifier with the obtained kernel matrix to get the classifier parameters, classify the test video set with the trained support vector machine model, and output the classification results of the test video set.
Wherein, the interest points of each frame may be extracted with any one of: Harris corners, Harris-Laplace, Hessian-Laplace, Harris-affine, Hessian-affine, maximally stable extremal regions, speeded-up robust features, grid points, or the difference-of-Gaussian detector.
Wherein, the step of building the motion-attention histogram comprises:
Step S41: represent each frame I_i of a video in the set as

    n(w_v; I_i, c) = Σ_{d_j ∈ I_i} P(C = c | w_m = v_j^m) · δ(v_j^a = w_v)

where n(·) is the histogram representation of the i-th frame I_i, w_v is an appearance word, w_m is a motion word, C is the class label of the event with c ∈ {1, 2, ...}, P(C = c | w_m) is the probability that motion word w_m belongs to class c, δ is the indicator function, and v_j^m and v_j^a are the motion- and appearance-word indices of interest point d_j;
Step S42: build two types of attention histogram, one for motion magnitude and one for motion orientation:
the magnitude-based motion-attention histogram over visual words (MMA-BoW):

    n_MMA(w_v; I_i, c) = Σ_{d_j ∈ I_i} P(C = c | v_j^mag) · δ(v_j^a = w_v)

the orientation-based motion-attention histogram over visual words (OMA-BoW):

    n_OMA(w_v; I_i, c) = Σ_{d_j ∈ I_i} P(C = c | v_j^ori) · δ(v_j^a = w_v)

Step S43: considering the magnitude and orientation of the optical flow simultaneously, build the motion-attention bag-of-words histogram (MOMA-BoW):

    n_MOMA(w_v; I_i, c) = Σ_{d_j ∈ I_i} P(C = c | v_j^mag) · P(C = c | v_j^ori) · δ(v_j^a = w_v)
Wherein, for each class c ∈ C of the training video set, the probability P(C = c | w_m) of each motion word w_m with respect to the class is obtained by the Bayes rule:

    P(C = c | w_m) = ||w_m ∈ T_c+|| / ||w_m ∈ T||

where T_c+ is the set of training videos belonging to class c, T is the set of all training videos, and ||·|| denotes the number of interest points assigned to the word.
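The Bayes-rule probability above reduces to counting word occurrences per class. A minimal NumPy sketch under assumed inputs (`motion_word_probabilities` and its argument layout are illustrative names, not from the patent):

```python
import numpy as np

def motion_word_probabilities(word_indices, labels, n_words, n_classes):
    """Estimate P(C = c | w_m) as in step S4: occurrences of motion word
    w_m among interest points of class-c training videos, divided by its
    occurrences over all training videos.

    word_indices: list of 1-D int arrays, the motion-word index of every
                  interest point in one training video.
    labels:       class label (0..n_classes-1) of each training video.
    Returns an (n_classes, n_words) array of probabilities.
    """
    counts = np.zeros((n_classes, n_words))
    for idx, c in zip(word_indices, labels):
        counts[c] += np.bincount(idx, minlength=n_words)
    totals = counts.sum(axis=0)      # ||w_m in T|| per word
    totals[totals == 0] = 1.0        # avoid division by zero for unused words
    return counts / totals           # ||w_m in T_c+|| / ||w_m in T||
```

Columns sum to one over the classes (for words that occur at all), so a word concentrated in one event class receives a high attention weight for that class.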
Wherein, the Earth Mover's Distance is used to measure the distance between two video sequences of the video set. Any two videos P and Q are represented respectively as

    P = {(p_1, w_{p_1}), ..., (p_m, w_{p_m})},  Q = {(q_1, w_{q_1}), ..., (q_n, w_{q_n})}

where p_i and q_j are the histogram features of videos P and Q, w_{p_i} and w_{q_j} are the weights of the i-th frame of P and the j-th frame of Q, and m and n are the numbers of frames of P and Q. The similarity D(P, Q) of videos P and Q is computed as

    D(P, Q) = (Σ_{i=1}^m Σ_{j=1}^n f_ij · d_ij) / (Σ_{i=1}^m Σ_{j=1}^n f_ij)

where d_ij is the Euclidean distance between p_i and q_j, and f_ij is the optimal matching between videos P and Q, solved by a linear programming problem.
(3) Beneficial effects
From the above technical scheme, the present invention has the following advantages:
1. In the provided recognition method the choice of interest-point detector is varied and the choice of local feature at the interest points is flexible, so if faster and more robust interest-point detectors or local-feature extractors appear in the future, they can easily be plugged into the system to further improve its performance.
2. The number of interest points extracted directly from video is often very large and includes complex background information, whose presence seriously disturbs subsequent processing and lowers classification accuracy. Because the provided method selects interest points with a human attention mechanism, highlighting those that contribute most to event recognition, it significantly reduces the interference of background information with the classification process, makes the extracted features more targeted, and can significantly improve recognition accuracy.
3. Traditional feature fusion methods such as early fusion and late fusion are all bottom-up; we use the human attention mechanism to fuse the static and dynamic features of video in a top-down manner, which significantly improves fusion efficiency.
The present invention fuses the appearance and motion features of video in a top-down manner according to the human attention mechanism. This fusion method requires no parameter setting, combines the advantages of early and late fusion well, and significantly improves recognition efficiency. The invention overcomes the shortcoming of traditional event recognition methods that require techniques such as background subtraction, target tracking and detection, and has good application prospects.
Description of drawings
Fig. 1 is the flow chart of the video event recognition method based on the top-down motion attention mechanism of the present invention;
Fig. 1a-Fig. 1d are examples of interest-point detection and optical flow on video frames of the present invention;
Fig. 2 is the system architecture diagram of the present invention.
Embodiment
To make the objects, technical schemes and advantages of the present invention clearer, the invention is described below in more detail in conjunction with specific embodiments and with reference to the accompanying drawings.
The execution environment of the present invention is a Pentium 4 computer with a 3.0 GHz central processing unit and 2 GB of memory, running an efficient video event recognition program written in Matlab and C; other execution environments can also be adopted and are not repeated here.
The general framework of the system scheme of the present invention is shown in Fig. 2. Video event recognition based on the top-down motion attention mechanism is realized on a computer with five main modules:
Interest-point detection module 1: divides the video database into a training set (training videos) and a test set (test videos), and detects the interest points of every frame of the training and test videos with the difference-of-Gaussian detector.
Feature extraction module 2: its input is connected to the output of interest-point detection module 1; on the basis of module 1, it extracts the SIFT descriptor and the optical-flow feature of each interest point.
Vocabulary building module 3: its input is connected to the output of feature extraction module 2; it clusters the obtained SIFT descriptors and optical-flow features on the training data and builds the appearance vocabulary and the motion vocabulary, respectively.
Motion-attention histogram building module 4: its input is connected to the outputs of feature extraction module 2 and vocabulary building module 3; from the training data it computes, for each motion word in the motion vocabulary, the probability of the word with respect to each event class, and obtains the motion-attention histogram from these probabilities and the appearance words of the appearance vocabulary.
Classification module 5: its input is connected to the output of motion-attention histogram building module 4; it receives the motion-attention histogram features of the videos, computes the similarity of any two videos with the Earth Mover's Distance, generates the kernel matrix, trains a support vector machine classifier on the training set to obtain the classifier parameters, classifies the test set with the trained model, and outputs the classification results of the test video set. Our event recognition task covers the classes "Existing Car", "Handshaking", "Running", "Demonstration Or Protest", "Walking", "Riot", "Dancing", "Shooting" and "People Marching".
The flow chart of the video event recognition method based on the top-down motion attention mechanism is shown in Fig. 1. Each technical detail involved in the invention is explained below.
(1) Interest-point detection
Many interest-point extraction methods can be chosen, e.g.: Harris corners (Harris), Harris-Laplace, Hessian-Laplace, Harris-affine, Hessian-affine, maximally stable extremal regions (MSER), speeded-up robust features (SURF), grid points (Grid), etc.
A video V is denoted V = {I_i}, i ∈ {1, 2, ..., N}. For each frame I_i of the video, local extrema in the difference-of-Gaussian (DoG) scale space are detected as interest points.
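To illustrate the idea, here is a deliberately simplified single-octave DoG sketch (the actual detector searches extrema across a full scale-space pyramid; `dog_interest_points` and its thresholds are illustrative assumptions, not the patent's implementation):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def dog_interest_points(frame, sigma=1.6, k=np.sqrt(2), thresh=0.01):
    """Toy DoG detector: response = |G(sigma) - G(k*sigma)| applied to one
    frame; keep strong local maxima as interest points (x, y)."""
    f = frame.astype(np.float64)
    dog = gaussian_filter(f, sigma) - gaussian_filter(f, k * sigma)
    resp = np.abs(dog)
    # local maxima in a 3x3 neighbourhood, above a contrast threshold
    peaks = (resp == maximum_filter(resp, size=3)) & (resp > thresh * resp.max())
    ys, xs = np.nonzero(peaks)
    return np.stack([xs, ys], axis=1)
```

A bright blob produces a peak response at its centre, which is the blob-like structure DoG is designed to find.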
(2) Feature extraction
Next, local image features are extracted at the interest points. Candidate local-feature extraction methods include: scale-invariant feature transform (SIFT), speeded-up robust features (SURF), shape context descriptors (SC), etc.
We adopt the 128-dimensional SIFT descriptor to represent the appearance feature of an interest point and, from the detected interest points, compute the optical flow of a sparse feature set with the iterative Lucas-Kanade method in pyramids. Fig. 1a to Fig. 1d give examples of detected interest points and optical-flow vectors on some video frames.
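The method uses the iterative pyramidal Lucas-Kanade scheme (as in OpenCV's `calcOpticalFlowPyrLK`); the sketch below is a single-level, single-iteration simplification in plain NumPy, with `lucas_kanade` a hypothetical helper name:

```python
import numpy as np

def lucas_kanade(prev, curr, points, win=9):
    """Estimate flow (u, v) at sparse integer points between two frames by
    solving the windowed least-squares system [Ix Iy] v = -It."""
    prev = prev.astype(np.float64)
    curr = curr.astype(np.float64)
    Ix = np.gradient(prev, axis=1)        # spatial gradients
    Iy = np.gradient(prev, axis=0)
    It = curr - prev                      # temporal gradient
    r = win // 2
    flows = []
    for x, y in points:
        win_y = slice(y - r, y + r + 1)
        win_x = slice(x - r, x + r + 1)
        A = np.stack([Ix[win_y, win_x].ravel(),
                      Iy[win_y, win_x].ravel()], axis=1)
        b = -It[win_y, win_x].ravel()
        v, *_ = np.linalg.lstsq(A, b, rcond=None)   # 2-vector (u, v)
        flows.append(v)
    return np.array(flows)
```

Without the pyramid and iterations this only recovers small displacements, which is exactly the limitation the pyramidal iterative version removes.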
The detected interest points are clustered separately according to their appearance and motion features with the k-means method (or another clustering method) into two vocabularies: w_m (motion words) and w_v (appearance words); each cluster centre is defined as a word.
Under polar coordinates the optical flow can be represented by magnitude Mag and orientation Orient; in the two-dimensional motion field, every motion vector contains these two motion cues. Magnitude reflects the spatial amplitude of the motion, and orientation reflects its trend. We therefore have two types of motion word: magnitude words w_m^mag and orientation words w_m^ori.
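The polar decomposition and the vocabulary step can be sketched as follows; `flow_to_polar` and the tiny `kmeans` loop are illustrative stand-ins (the patent allows k-means or any other clustering method):

```python
import numpy as np

def flow_to_polar(flow):
    """Decompose optical-flow vectors (N, 2) into magnitude Mag and
    orientation Orient under polar coordinates."""
    mag = np.hypot(flow[:, 0], flow[:, 1])
    orient = np.arctan2(flow[:, 1], flow[:, 0])   # radians in (-pi, pi]
    return mag, orient

def kmeans(data, k, iters=20, seed=0):
    """Minimal k-means; each final centre plays the role of one 'word'."""
    rng = np.random.default_rng(seed)
    centres = data[rng.choice(len(data), size=k, replace=False)].copy()
    assign = np.zeros(len(data), dtype=int)
    for _ in range(iters):
        dists = np.linalg.norm(data[:, None, :] - centres[None, :, :], axis=2)
        assign = dists.argmin(axis=1)             # nearest-centre assignment
        for j in range(k):
            if np.any(assign == j):
                centres[j] = data[assign == j].mean(axis=0)
    return centres, assign
```

Clustering magnitudes and orientations separately yields the two motion vocabularies; clustering the SIFT descriptors yields the appearance vocabulary.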
(3) Building the motion-attention histogram
As Fig. 1a-Fig. 1d show, the number of interest points extracted on a video frame is often very large and includes complex background information and information irrelevant to our event recognition task; the presence of this information seriously disturbs subsequent processing. The present invention uses the human attention mechanism to select and weight the interest points. Biological and psychological research proves that humans always actively focus on certain specific regions that produce novel or expected stimulation, called the focus of attention or salient regions. Visual saliency comprises bottom-up and top-down modes: the former is data-driven, the latter knowledge- or task-driven. We use the top-down attention mechanism to highlight the interest points that contribute most to event recognition and to ignore, as far as possible, those irrelevant to the recognition task.
Each frame I_i of a video can be represented as

    n(w_v; I_i, c) = Σ_{d_j ∈ I_i} P(C = c | w_m = v_j^m) · δ(v_j^a = w_v)

where C is the class label of the event with c ∈ {1, 2, ...}, δ is the indicator function, and v_j^m and v_j^a are the motion- and appearance-word indices of interest point d_j.
From this formula we can see that the SIFT descriptor plays the role of describing the "what" aspect of the event, while the motion feature has two roles: on the one hand it describes the "how" aspect of the event, and on the other hand it serves as an attention cue guiding recognition toward the corresponding event class.
Two types of attention histogram can be built, for motion magnitude and motion orientation:
the magnitude-based motion-attention histogram over visual words (MMA-BoW):

    n_MMA(w_v; I_i, c) = Σ_{d_j ∈ I_i} P(C = c | v_j^mag) · δ(v_j^a = w_v)

the orientation-based motion-attention histogram over visual words (OMA-BoW):

    n_OMA(w_v; I_i, c) = Σ_{d_j ∈ I_i} P(C = c | v_j^ori) · δ(v_j^a = w_v)

If the magnitude and orientation of the optical flow are considered simultaneously, the motion-attention bag-of-words histogram (MOMA-BoW) is

    n_MOMA(w_v; I_i, c) = Σ_{d_j ∈ I_i} P(C = c | v_j^mag) · P(C = c | v_j^ori) · δ(v_j^a = w_v)
And for each video event class c ∈ C, the probability of each motion word with respect to the class can be obtained by the Bayes rule:

    P(C = c | w_m) = ||w_m ∈ T_c+|| / ||w_m ∈ T||

where T_c+ is the set of training videos belonging to class c, T is the set of all training videos, and ||·|| denotes the number of interest points assigned to the word.
From the motion-attention histogram formula we can see that the motion information is implicit in the representation of the video and also serves as a weight on the appearance (SIFT) information. In particular, a given motion word has different probabilities for different event classes; that is, the same motion word contributes differently to the recognition of different classes. For example, when classifying the event "Running", among all detected interest points the motion words that really describe the running action should be given large weights. On the other hand, for some events such as "Riot", motion information is not discriminative: every motion word then has essentially the same probability for the class, and the bag-of-words model degenerates to its most basic form.
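A minimal sketch of the attention-weighted voting described above, assuming the learned word probabilities are already available as an array (`attention_histogram` is an illustrative name):

```python
import numpy as np

def attention_histogram(app_idx, mot_idx, p_word, c, n_app_words):
    """Motion-attention BoW for one frame: each interest point votes into
    its appearance-word bin, weighted by P(C = c | its motion word).

    app_idx, mot_idx: appearance- and motion-word index of each point.
    p_word:           (n_classes, n_motion_words) probabilities of step S4.
    c:                event class the histogram is built for.
    """
    h = np.zeros(n_app_words)
    for a, m in zip(app_idx, mot_idx):
        h[a] += p_word[c, m]          # top-down weight from the motion cue
    return h
```

With p_word uniform across classes the weights become constant and the histogram degenerates to an ordinary (unweighted) bag of words, matching the "Riot" discussion above.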
(4) Event recognition
Given a video V, after obtaining the motion-attention bag-of-words histogram feature p_i of the i-th frame, the video can be represented as P = {(p_1, w_{p_1}), ..., (p_m, w_{p_m})}, where the frame weights w_{p_i} satisfy Σ_i w_{p_i} = 1; here the default value 1/m is adopted. The Earth Mover's Distance (EMD) is adopted to measure the distance between two video sequences. Any two videos P and Q can be represented respectively as

    P = {(p_1, w_{p_1}), ..., (p_m, w_{p_m})},  Q = {(q_1, w_{q_1}), ..., (q_n, w_{q_n})}

where p_i and q_j are the histogram features of videos P and Q, w_{p_i} and w_{q_j} are the weights of the i-th frame of P and the j-th frame of Q, and m and n are the numbers of frames of P and Q. The EMD has the properties of time shifting and scale variation: the start frame of one video may be matched with the end frame of another, and one frame of a video may be matched with several frames of another.
The similarity of videos P and Q can be calculated as

    D(P, Q) = (Σ_{i=1}^m Σ_{j=1}^n f_ij · d_ij) / (Σ_{i=1}^m Σ_{j=1}^n f_ij)

where d_ij is the Euclidean distance between p_i and q_j, and f_ij is the optimal matching between the two videos P and Q, solved by the linear program

    min Σ_i Σ_j f_ij · d_ij
    s.t. f_ij ≥ 0,  Σ_j f_ij ≤ w_{p_i},  Σ_i f_ij ≤ w_{q_j},
         Σ_i Σ_j f_ij = min(Σ_i w_{p_i}, Σ_j w_{q_j}).
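The linear program above can be sketched directly with a general-purpose LP solver; this is a plain formulation for illustration (`emd` is a hypothetical helper), not an efficient transportation-problem solver:

```python
import numpy as np
from scipy.optimize import linprog

def emd(P, Q, wp, wq):
    """Earth Mover's Distance between signatures P (m, d) and Q (n, d) with
    frame weights wp, wq: minimise sum f_ij * d_ij subject to f_ij >= 0,
    row sums <= wp, column sums <= wq, total flow = min(sum wp, sum wq)."""
    m, n = len(P), len(Q)
    d = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=2).ravel()
    A_ub, b_ub = [], []
    for i in range(m):                        # sum_j f_ij <= wp_i
        row = np.zeros(m * n)
        row[i * n:(i + 1) * n] = 1.0
        A_ub.append(row); b_ub.append(wp[i])
    for j in range(n):                        # sum_i f_ij <= wq_j
        row = np.zeros(m * n)
        row[j::n] = 1.0
        A_ub.append(row); b_ub.append(wq[j])
    total = min(wp.sum(), wq.sum())           # total flow constraint
    res = linprog(d, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  A_eq=np.ones((1, m * n)), b_eq=[total],
                  bounds=(0, None), method="highs")
    return res.fun / total                    # normalised matching cost
```

Because the frames are unordered signatures, the flow f_ij is free to match a start frame of one video with an end frame of the other, which is the time-shift property noted above.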
Next, a support vector machine is used as the classifier, with "one-vs-rest" as the classification strategy.
Since 9 events need to be recognized, 9 classifiers are trained; in each classifier the samples of one event class are taken as positive and the remaining samples as negative. The Earth Mover's Distance between videos is embedded in the Gaussian kernel function of the support vector machine classifier:

    K(P, Q) = exp( -D(P, Q) / (λ · M) )

where M is a normalization factor obtained as the average EMD over the training set, and λ is a scale factor that can be determined empirically by cross-validation.
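A minimal sketch of the kernel-plus-SVM stage, assuming the pairwise EMDs are already computed; the toy distance matrix, `emd_gaussian_kernel`, and λ = 1 are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC

def emd_gaussian_kernel(D_train, D_test=None, lam=1.0):
    """K = exp(-D / (lam * M)), M = mean training EMD (normalisation),
    lam = cross-validated scale factor."""
    M = D_train.mean()
    K_train = np.exp(-D_train / (lam * M))
    if D_test is None:
        return K_train
    return K_train, np.exp(-D_test / (lam * M))   # test-vs-train kernel

# Toy pairwise EMD matrix for four videos of two event classes.
D = np.array([[0.0, 0.2, 2.0, 2.1],
              [0.2, 0.0, 1.9, 2.0],
              [2.0, 1.9, 0.0, 0.1],
              [2.1, 2.0, 0.1, 0.0]])
y = np.array([0, 0, 1, 1])
K = emd_gaussian_kernel(D)
# one-vs-rest SVM trained on the precomputed kernel matrix
clf = SVC(kernel="precomputed", C=10.0,
          decision_function_shape="ovr").fit(K, y)
pred = clf.predict(K)
```

Passing `kernel="precomputed"` lets the SVM consume the EMD-based kernel matrix directly, which is how step S6 avoids ever needing explicit feature vectors per video.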
The above is only an embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any change or replacement that a person familiar with the art can conceive within the technical scope disclosed by the present invention shall be encompassed within the scope of the present invention; therefore, the scope of protection of the present invention shall be subject to the scope of protection of the claims.
Claims (5)
1. A video event recognition method based on a top-down motion attention mechanism, comprising the steps of:
Step S1: on a computer, detecting the interest points of every frame of each video in a video set using a difference-of-Gaussian detector, the video set comprising a training video set and a test video set;
Step S2: extracting appearance features and motion features at the detected interest points of each frame, the appearance features being scale-invariant feature transform (SIFT) descriptors and the motion features being optical-flow features;
Step S3: clustering the obtained SIFT descriptors and optical-flow features to build an appearance vocabulary and a motion vocabulary, respectively;
Step S4: on the training video set, computing the probability of each motion word with respect to each event class and building the motion-attention histogram;
Step S5: using the motion-attention histogram features of the video set, computing with the Earth Mover's Distance the similarities between training videos, and between training videos and test videos, and generating the kernel matrix;
Step S6: training a support vector machine classifier with the obtained kernel matrix to obtain the classifier parameters, classifying the test video set with the trained support vector machine model, and outputting the classification results of the test video set.
2. The video event recognition method according to claim 1, characterized in that the interest points of each frame are extracted with one of: Harris corners, Harris-Laplace, Hessian-Laplace, Harris-affine, Hessian-affine, maximally stable extremal regions, speeded-up robust features, grid points, or the difference-of-Gaussian detector.
3. The video event recognition method according to claim 1, characterized in that the step of building the motion-attention histogram comprises:
Step S41: representing each frame I_i of a video in the set as

    n(w_v; I_i, c) = Σ_{d_j ∈ I_i} P(C = c | w_m = v_j^m) · δ(v_j^a = w_v)

where n(·) is the histogram representation of the i-th frame I_i, w_v is an appearance word, w_m is a motion word, C is the class label of the event with c ∈ {1, 2, ...}, P(C = c | w_m) is the probability that motion word w_m belongs to class c, δ is the indicator function, and v_j^m and v_j^a are the motion- and appearance-word indices of interest point d_j;
Step S42: building two types of attention histogram, for motion magnitude and motion orientation:
the magnitude-based motion-attention histogram over visual words (MMA-BoW):

    n_MMA(w_v; I_i, c) = Σ_{d_j ∈ I_i} P(C = c | v_j^mag) · δ(v_j^a = w_v)

the orientation-based motion-attention histogram over visual words (OMA-BoW):

    n_OMA(w_v; I_i, c) = Σ_{d_j ∈ I_i} P(C = c | v_j^ori) · δ(v_j^a = w_v)

Step S43: considering the magnitude and orientation of the optical flow simultaneously, building the motion-attention bag-of-words histogram (MOMA-BoW):

    n_MOMA(w_v; I_i, c) = Σ_{d_j ∈ I_i} P(C = c | v_j^mag) · P(C = c | v_j^ori) · δ(v_j^a = w_v)
4. The video event recognition method according to claim 3, wherein for each class c ∈ C in the training video set, the probability P(C = c | w_m) of each motion word w_m with respect to each class is obtained by Bayes' rule:

P(C = c | w_m) = ||{d ∈ T_{c+} : m(d) = w_m}|| / ||{d ∈ T_c : m(d) = w_m}||

where T_{c+} is the set of all videos of the training video set belonging to class c, T_c is the set of all training samples, and || · || denotes the number of interest points.
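A small numpy sketch of this counting realisation of Bayes' rule: for each motion word, the posterior for class c is the fraction of all training interest points assigned to that word that come from class-c videos (variable names are illustrative):

```python
import numpy as np

def class_posterior_per_motion_word(motion_words_per_video, labels,
                                    n_classes, n_motion_words):
    """Empirical P(C=c | w_m).
    motion_words_per_video: per training video, the motion word index of each
    of its interest points. labels: class label of each training video."""
    counts = np.zeros((n_classes, n_motion_words))
    for words, c in zip(motion_words_per_video, labels):
        for w in words:
            counts[c, w] += 1                 # ||{d in T_c+ : m(d) = w_m}||
    totals = counts.sum(axis=0)               # ||{d in T : m(d) = w_m}||
    totals[totals == 0] = 1                   # guard unseen motion words
    return counts / totals
```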
5. The video event recognition method according to claim 1, wherein the earth mover's distance is used to measure the distance between two video sequences of the video set; any two videos P and Q are represented respectively as

P = {(p_1, w_{p_1}), ..., (p_m, w_{p_m})}, Q = {(q_1, w_{q_1}), ..., (q_n, w_{q_n})},

where p_i and q_i denote the histogram features of videos P and Q, w_{p_i} and w_{q_i} denote the weight of the i-th frame of video P and video Q respectively, and m and n denote the numbers of frames of video P and video Q; the similarity D(P, Q) of videos P and Q is calculated by

D(P, Q) = (Σ_i Σ_j f_ij · d_ij) / (Σ_i Σ_j f_ij),

where d_ij is the Euclidean distance between p_i and q_j, and f_ij is the optimal matching (flow) between video P and video Q, solved as a linear programming problem.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 201010591513 CN102034096B (en) | 2010-12-08 | 2010-12-08 | Video event recognition method based on top-down motion attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102034096A true CN102034096A (en) | 2011-04-27 |
CN102034096B CN102034096B (en) | 2013-03-06 |
Family
ID=43886959
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN 201010591513 Active CN102034096B (en) | 2010-12-08 | 2010-12-08 | Video event recognition method based on top-down motion attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102034096B (en) |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102163290A (en) * | 2011-05-16 | 2011-08-24 | 天津大学 | Method for modeling abnormal events in multi-visual angle video monitoring based on temporal-spatial correlation information |
CN102930302A (en) * | 2012-10-18 | 2013-02-13 | 山东大学 | On-line sequential extreme learning machine-based incremental human behavior recognition method |
CN103077401A (en) * | 2012-12-27 | 2013-05-01 | 深圳市赛为智能股份有限公司 | Method and system for detecting context histogram abnormal behaviors based on light streams |
CN103093236A (en) * | 2013-01-15 | 2013-05-08 | 北京工业大学 | Movable terminal porn filtering method based on analyzing image and semantics |
CN103226713A (en) * | 2013-05-16 | 2013-07-31 | 中国科学院自动化研究所 | Multi-view behavior recognition method |
CN103366370A (en) * | 2013-07-03 | 2013-10-23 | 深圳市智美达科技有限公司 | Target tracking method and device in video monitoring |
CN103854016A (en) * | 2014-03-27 | 2014-06-11 | 北京大学深圳研究生院 | Human body behavior classification and identification method and system based on directional common occurrence characteristics |
CN104200235A (en) * | 2014-07-28 | 2014-12-10 | 中国科学院自动化研究所 | Time-space local feature extraction method based on linear dynamic system |
CN104657468A (en) * | 2015-02-12 | 2015-05-27 | 中国科学院自动化研究所 | Fast video classification method based on images and texts |
WO2015078134A1 (en) * | 2013-11-29 | 2015-06-04 | 华为技术有限公司 | Video classification method and device |
CN103116896B (en) * | 2013-03-07 | 2015-07-15 | 中国科学院光电技术研究所 | Visual saliency model based automatic detecting and tracking method |
CN105512606A (en) * | 2015-11-24 | 2016-04-20 | 北京航空航天大学 | AR-model-power-spectrum-based dynamic scene classification method and apparatus |
CN105528594A (en) * | 2016-01-31 | 2016-04-27 | 江南大学 | Incident identification method based on video signal |
CN108268597A (en) * | 2017-12-18 | 2018-07-10 | 中国电子科技集团公司第二十八研究所 | A kind of moving-target activity probability map construction and behavior intension recognizing method |
CN108764050A (en) * | 2018-04-28 | 2018-11-06 | 中国科学院自动化研究所 | Skeleton Activity recognition method, system and equipment based on angle independence |
CN109670174A (en) * | 2018-12-14 | 2019-04-23 | 腾讯科技(深圳)有限公司 | A kind of training method and device of event recognition model |
CN110288592A (en) * | 2019-07-02 | 2019-09-27 | 中南大学 | A method of the zinc flotation dosing state evaluation based on probability semantic analysis model |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030033347A1 (en) * | 2001-05-10 | 2003-02-13 | International Business Machines Corporation | Method and apparatus for inducing classifiers for multimedia based on unified representation of features reflecting disparate modalities |
JP2006012012A (en) * | 2004-06-29 | 2006-01-12 | Matsushita Electric Ind Co Ltd | Event extraction device, and method and program therefor |
CN1945628A (en) * | 2006-10-20 | 2007-04-11 | 北京交通大学 | Video frequency content expressing method based on space-time remarkable unit |
CN101894276A (en) * | 2010-06-01 | 2010-11-24 | 中国科学院计算技术研究所 | Training method of human action recognition and recognition method |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102163290B (en) * | 2011-05-16 | 2012-08-01 | 天津大学 | Method for modeling abnormal events in multi-visual angle video monitoring based on temporal-spatial correlation information |
CN102163290A (en) * | 2011-05-16 | 2011-08-24 | 天津大学 | Method for modeling abnormal events in multi-visual angle video monitoring based on temporal-spatial correlation information |
CN102930302A (en) * | 2012-10-18 | 2013-02-13 | 山东大学 | On-line sequential extreme learning machine-based incremental human behavior recognition method |
CN102930302B (en) * | 2012-10-18 | 2016-01-13 | 山东大学 | Based on the incrementally Human bodys' response method of online sequential extreme learning machine |
CN103077401A (en) * | 2012-12-27 | 2013-05-01 | 深圳市赛为智能股份有限公司 | Method and system for detecting context histogram abnormal behaviors based on light streams |
CN103093236A (en) * | 2013-01-15 | 2013-05-08 | 北京工业大学 | Movable terminal porn filtering method based on analyzing image and semantics |
CN103093236B (en) * | 2013-01-15 | 2015-11-04 | 北京工业大学 | A kind of pornographic filter method of mobile terminal analyzed based on image, semantic |
CN103116896B (en) * | 2013-03-07 | 2015-07-15 | 中国科学院光电技术研究所 | Visual saliency model based automatic detecting and tracking method |
CN103226713A (en) * | 2013-05-16 | 2013-07-31 | 中国科学院自动化研究所 | Multi-view behavior recognition method |
CN103226713B (en) * | 2013-05-16 | 2016-04-13 | 中国科学院自动化研究所 | A kind of various visual angles Activity recognition method |
CN103366370A (en) * | 2013-07-03 | 2013-10-23 | 深圳市智美达科技有限公司 | Target tracking method and device in video monitoring |
CN103366370B (en) * | 2013-07-03 | 2016-04-20 | 深圳市智美达科技股份有限公司 | Method for tracking target in video monitoring and device |
WO2015078134A1 (en) * | 2013-11-29 | 2015-06-04 | 华为技术有限公司 | Video classification method and device |
US10002296B2 (en) | 2013-11-29 | 2018-06-19 | Huawei Technologies Co., Ltd. | Video classification method and apparatus |
CN103854016A (en) * | 2014-03-27 | 2014-06-11 | 北京大学深圳研究生院 | Human body behavior classification and identification method and system based on directional common occurrence characteristics |
CN104200235A (en) * | 2014-07-28 | 2014-12-10 | 中国科学院自动化研究所 | Time-space local feature extraction method based on linear dynamic system |
CN104657468B (en) * | 2015-02-12 | 2018-07-31 | 中国科学院自动化研究所 | The rapid classification method of video based on image and text |
CN104657468A (en) * | 2015-02-12 | 2015-05-27 | 中国科学院自动化研究所 | Fast video classification method based on images and texts |
CN105512606A (en) * | 2015-11-24 | 2016-04-20 | 北京航空航天大学 | AR-model-power-spectrum-based dynamic scene classification method and apparatus |
CN105512606B (en) * | 2015-11-24 | 2018-12-21 | 北京航空航天大学 | Dynamic scene classification method and device based on AR model power spectrum |
CN105528594B (en) * | 2016-01-31 | 2019-01-22 | 江南大学 | A kind of event recognition method based on vision signal |
CN105528594A (en) * | 2016-01-31 | 2016-04-27 | 江南大学 | Incident identification method based on video signal |
CN108268597A (en) * | 2017-12-18 | 2018-07-10 | 中国电子科技集团公司第二十八研究所 | A kind of moving-target activity probability map construction and behavior intension recognizing method |
CN108764050A (en) * | 2018-04-28 | 2018-11-06 | 中国科学院自动化研究所 | Skeleton Activity recognition method, system and equipment based on angle independence |
CN108764050B (en) * | 2018-04-28 | 2021-02-26 | 中国科学院自动化研究所 | Method, system and equipment for recognizing skeleton behavior based on angle independence |
CN109670174A (en) * | 2018-12-14 | 2019-04-23 | 腾讯科技(深圳)有限公司 | A kind of training method and device of event recognition model |
CN109670174B (en) * | 2018-12-14 | 2022-12-16 | 腾讯科技(深圳)有限公司 | Training method and device of event recognition model |
CN110288592A (en) * | 2019-07-02 | 2019-09-27 | 中南大学 | A method of the zinc flotation dosing state evaluation based on probability semantic analysis model |
CN110288592B (en) * | 2019-07-02 | 2021-03-02 | 中南大学 | Zinc flotation dosing state evaluation method based on probability semantic analysis model |
Also Published As
Publication number | Publication date |
---|---|
CN102034096B (en) | 2013-03-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102034096B (en) | Video event recognition method based on top-down motion attention mechanism | |
Fenil et al. | Real time violence detection framework for football stadium comprising of big data analysis and deep learning through bidirectional LSTM | |
CN109034044B (en) | Pedestrian re-identification method based on fusion convolutional neural network | |
Yang et al. | STA-CNN: Convolutional spatial-temporal attention learning for action recognition | |
Pouyanfar et al. | Automatic video event detection for imbalance data using enhanced ensemble deep learning | |
CN101894276B (en) | Training method of human action recognition and recognition method | |
Gnouma et al. | Stacked sparse autoencoder and history of binary motion image for human activity recognition | |
Soomro et al. | Action localization in videos through context walk | |
Wang et al. | Video event detection using motion relativity and feature selection | |
CN104268586A (en) | Multi-visual-angle action recognition method | |
Gao et al. | Multi‐dimensional data modelling of video image action recognition and motion capture in deep learning framework | |
Jiang et al. | An efficient attention module for 3d convolutional neural networks in action recognition | |
CN103886585A (en) | Video tracking method based on rank learning | |
Kiruba et al. | Hexagonal volume local binary pattern (H-VLBP) with deep stacked autoencoder for human action recognition | |
Pang et al. | Predicting skeleton trajectories using a Skeleton-Transformer for video anomaly detection | |
Huang et al. | Multilabel remote sensing image annotation with multiscale attention and label correlation | |
Yang et al. | Bottom-up foreground-aware feature fusion for practical person search | |
Symeonidis et al. | Neural attention-driven non-maximum suppression for person detection | |
Wang et al. | Action recognition using linear dynamic systems | |
Sun et al. | Exploiting deeply supervised inception networks for automatically detecting traffic congestion on freeway in China using ultra-low frame rate videos | |
Aakur et al. | Action localization through continual predictive learning | |
Zhou et al. | Learning semantic context feature-tree for action recognition via nearest neighbor fusion | |
Li et al. | Video is graph: Structured graph module for video action recognition | |
Elharrouss et al. | Mhad: multi-human action dataset | |
Ahmed | Motion classification using CNN based on image difference |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |