CN101430689A - Detection method for figure action in video

Detection method for figure action in video

Info

Publication number
CN101430689A
Authority
CN
China
Prior art keywords
video
action
steps
model
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2008101375080A
Other languages
Chinese (zh)
Inventor
姚鸿勋
纪荣嵘
孙晓帅
许鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CNA2008101375080A
Publication of CN101430689A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method for detecting figure actions in video and relates to content-based video detection, solving the problem that existing multimedia retrieval methods cannot detect the action information in video material. The method comprises the following steps: the video is segmented into shots by a shot boundary detection method based on a graph partition model; for consecutive video frames, a spatiotemporal saliency map is obtained by building a dynamic saliency model on the basis of the saliency map of each frame; attention shifts in the spatiotemporal saliency map are computed by setting a threshold and separating the attention-shift values that exceed it; within the same action, the separated attention-shift values undergo frame-superimposed 3D sequence slicing to establish the action detection model. With the invention, a large collection of video material can be searched according to the figure-action semantic information it contains, making it convenient for users to quickly browse and retrieve video and watch the content they are interested in.

Description

Method for detecting figure actions in video
Technical field
The invention belongs to the field of content-based video detection. It extracts and efficiently indexes the figure actions in video content, making the method robust to viewpoint changes and duration changes, thereby realizing action-based video indexing and retrieval.
Background technology
The large-scale emergence of multimedia information on the Internet has drawn broad attention to techniques for organizing, indexing, and retrieving it. At present, however, multimedia retrieval mainly relies on keyword matching (for example, the video search engines of Google and Baidu). Keyword-matching methods do not understand the video content; they classify a video according to how the web-page author or the video producer understood it.
In recent years, content-based multimedia retrieval has gradually developed: the content of the multimedia material is analyzed, its low-level features (such as color and texture features) are extracted, and these features serve as a new matching criterion for retrieval. Although matching low-level features can to some extent reflect the content similarity of two pieces of multimedia information, the objectively existing semantic gap remains a difficult problem that this technology has not yet overcome. Extracting mid-level semantics from multimedia content, particularly from images and video, is considered an important way to bridge the semantic gap, as has been verified in sports video analysis. Action information is a very important kind of semantic information in video material; in film and television videos especially, the development of the story tends to unfold in specific actions, which are also the focus of users' browsing and retrieval. Indexing video material by its action information would therefore greatly help users browse and retrieve the video clips they are interested in.
Summary of the invention
To solve the problem that existing multimedia retrieval methods cannot detect the action information in video material, the present invention provides a method for detecting figure actions in video. The invention comprises the following steps:
Step 1: segment the video into shots with a shot boundary detection method based on a graph partition model;
Step 2: for consecutive video frames, obtain a spatiotemporal saliency map by building a dynamic saliency model on the basis of the saliency map of each frame;
Step 3: compute the attention-shift variable A_shift of the spatiotemporal saliency map by the formula

$$A_{\mathrm{shift}} = \begin{cases} 1, & \mathrm{CenterDis}(i,j) > T_C \ \text{and} \ \mathrm{DiameterVar} < T_D \\ 0, & \mathrm{CenterDis}(i,j) < T_C \ \text{or} \ \mathrm{DiameterVar} > T_D \end{cases}$$

where CenterDis(i, j) denotes the distance between the centers of the foci of attention in adjacent frames, and DiameterVar denotes the variation of the radius of the circumscribed circle of the focus of attention across adjacent frames (a code sketch of this test follows step 5);
Step 4: set a threshold and separate the attention-shift values A_shift that exceed it;
Step 5: within the same action, perform frame-superimposed 3D sequence slicing on the separated attention-shift values A_shift to establish the action detection model.
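For illustration, here is a minimal sketch of the attention-shift test of step 3. The representation of a focus of attention as a (center x, center y, radius) triple, the function name, and the threshold values are assumptions made for the example, not details fixed by the invention.

```python
import numpy as np

def attention_shift(focus_prev, focus_cur, t_c=30.0, t_d=10.0):
    """A_shift test between the foci of attention of adjacent frames.

    Each focus of attention is assumed to be summarized as (cx, cy, r):
    the center and radius of the circumscribed circle of its salient
    region. t_c and t_d stand in for the thresholds T_C and T_D.
    """
    center_dis = np.hypot(focus_prev[0] - focus_cur[0],
                          focus_prev[1] - focus_cur[1])
    diameter_var = abs(focus_prev[2] - focus_cur[2])
    # A_shift = 1: the focus center jumps while the focus size stays stable
    return 1 if (center_dis > t_c and diameter_var < t_d) else 0

# Example: the focus jumps 50 px with a nearly constant radius -> shift
print(attention_shift((100.0, 80.0, 20.0), (150.0, 80.0, 22.0)))  # 1
```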
Beneficial effects: a large collection of video material can be indexed according to the figure-action semantic information it contains, making it convenient for users to quickly browse and retrieve videos and watch the content they are interested in. First, the invention provides a model based on saliency shifts for segmenting video actions. Second, it effectively extracts the place semantic information in video material by analyzing the co-occurrence relations of objects within a shot. Third, it provides a novel similarity computation model that makes the action similarity insensitive to viewpoint change, scale change, gradual appearance change, and duration change. Finally, it proposes an index structure based on hierarchical clustering of local features, indexing by a 3D visual vocabulary, thereby achieving higher accuracy in real-time retrieval.
The objective of the invention is to extract and use the action semantic information in video material, build an index of the video material library, and thereby enable users to browse or retrieve video material by figure action. The significance of the invention is that it proposes slice-based visual vocabulary generation and a scalable similarity matching algorithm, combined with figure-attention model analysis and hierarchical local-feature clustering, to realize effective retrieval and browsing of figure actions in video.
Embodiment
Embodiment one: this embodiment consists of the following steps:
Step 1: segment the video into shots with a shot boundary detection method based on a graph partition model;
Step 2: for consecutive video frames, obtain a spatiotemporal saliency map by building a dynamic saliency model on the basis of the saliency map of each frame;
Step 3: compute the attention-shift variable A_shift of the spatiotemporal saliency map by the formula

$$A_{\mathrm{shift}} = \begin{cases} 1, & \mathrm{CenterDis}(i,j) > T_C \ \text{and} \ \mathrm{DiameterVar} < T_D \\ 0, & \mathrm{CenterDis}(i,j) < T_C \ \text{or} \ \mathrm{DiameterVar} > T_D \end{cases}$$

where CenterDis(i, j) denotes the distance between the centers of the foci of attention in adjacent frames, and DiameterVar denotes the variation of the radius of the circumscribed circle of the focus of attention across adjacent frames;
Step 4: set a threshold and separate the attention-shift values A_shift that exceed it; whenever an A_shift value exceeds the threshold range, a switch of the action focus within the shot is considered to have occurred at that moment;
Step 5: within the same action, perform frame-superimposed 3D sequence slicing on the separated attention-shift values A_shift to establish the action detection model. The slices generated in this step can be regarded as spatiotemporal ordered sets formed by superimposing multiple frames, and these sets together constitute the structural primitives of the action index model, as sketched below.
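A sketch of the slicing of step 5, under simple assumptions: each frame is an H x W saliency map, and shift_flags[t] holds the A_shift value for the transition from frame t to frame t+1. The frames between two detected shifts are stacked into one 3D volume.

```python
import numpy as np

def slice_actions(frames, shift_flags):
    """Group frames between attention-shift points into 3D sequence slices.

    frames: list of H x W arrays (per-frame saliency maps).
    shift_flags: list of len(frames) - 1 values of A_shift (0 or 1).
    Returns a list of arrays of shape (n_frames, H, W); each slice is a
    spatiotemporal ordered set, a structural primitive of the index model.
    """
    slices, start = [], 0
    for t, flag in enumerate(shift_flags, start=1):
        if flag == 1:                        # attention shifted: close the slice
            slices.append(np.stack(frames[start:t]))
            start = t
    slices.append(np.stack(frames[start:]))  # trailing slice after last shift
    return slices
```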
This embodiment first removes close-up shots in the video according to the character occupancy ratio, then filters the background information in the scene with a visual attention computation model, then generates and quantizes temporally sliced local features, and, combined with dynamic time warping, effectively computes the similarity of the corresponding actions. For indexing the action data, the invention proposes an indexing algorithm based on hierarchical local-feature clustering, which effectively satisfies the real-time requirement of retrieval, thereby realizing fast and accurate action-based video browsing and retrieval.
Embodiment two: on the basis of embodiment one, this embodiment further specifies that establishing the action detection model described in step 5 comprises the following steps:
Step A1: describe each 3D sequence slice in space-time with 3D-SIFT spatiotemporal features;
Step A2: quantize all the extracted 3D-SIFT spatiotemporal features in a high-dimensional space, and organize the quantization results into a hierarchical clustering model by hierarchical K-means clustering;
Step A3: at the leaves of this hierarchical clustering model, describe each clustered feature subspace as a visual word; the visual word quantizes all 3D-SIFT features converging to that cluster center into one word, and an inverted index is built over the 3D sequence slices from which these 3D-SIFT features were extracted. A sketch of steps A2 and A3 follows.
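A minimal sketch of steps A2 and A3, under the assumption that the 3D-SIFT descriptors are already extracted as rows of a NumPy array. Scikit-learn's KMeans stands in for the hierarchical K-means of the embodiment, using a two-level tree, and a Python dict of sets serves as the inverted index from visual words to slice ids.

```python
import numpy as np
from collections import defaultdict
from sklearn.cluster import KMeans

def build_vocabulary(descriptors, branch=8):
    """Two-level hierarchical K-means over 3D-SIFT descriptors.

    descriptors: (N, D) array; assumes every top-level cluster keeps at
    least `branch` members. Each leaf cluster defines one visual word.
    """
    top = KMeans(n_clusters=branch, n_init=10).fit(descriptors)
    leaves = [KMeans(n_clusters=branch, n_init=10)
              .fit(descriptors[top.labels_ == c]) for c in range(branch)]
    return top, leaves

def quantize(top, leaves, desc):
    """Map one descriptor to the id of its leaf cluster (visual word)."""
    c = int(top.predict(desc[None, :])[0])
    w = int(leaves[c].predict(desc[None, :])[0])
    return c * leaves[c].n_clusters + w

def build_inverted_index(top, leaves, slice_descs):
    """slice_descs: {slice_id: (n_i, D) array of that slice's features}."""
    index = defaultdict(set)
    for sid, descs in slice_descs.items():
        for d in descs:
            index[quantize(top, leaves, d)].add(sid)
    return index
```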
An action sequence indexed through this inverted index is called a 3D visual sentence, because it is composed of 3D visual words arranged in temporal order. Furthermore, this embodiment adopts the term frequency-inverse document frequency (TF-IDF) scheme from text retrieval to compute the importance of each word in a 3D visual sentence, and accordingly assigns a different weight to each 3D visual word in the sentence. A 3D visual word encodes the temporal information of an action, its spatial correspondences, its motion information, and the appearance attributes of the moving object.
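For the TF-IDF weighting described above, a sketch under the assumption that each 3D visual sentence is simply a list of visual-word ids; the +1 smoothing in the IDF term is an implementation choice of the example, not of the invention.

```python
import math
from collections import Counter

def tfidf_weights(sentence, corpus):
    """Weight each 3D visual word within one 3D visual sentence.

    sentence: list of visual-word ids for one action.
    corpus: list of all 3D visual sentences in the database.
    """
    tf = Counter(sentence)
    n_docs = len(corpus)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for s in corpus if word in s)  # document frequency
        idf = math.log(n_docs / (1 + df))         # +1 guards against df = 0
        weights[word] = (count / len(sentence)) * idf
    return weights
```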
Embodiment three: on the basis of embodiment two, this embodiment further specifies that the method of establishing the hierarchical clustering model described in step A3 comprises the following steps:
Step B1: search for the two 3D vocabularies to be matched through the hierarchical structure of the model, and judge whether the number of visual words co-occurring in the two vocabularies exceeds a threshold; if so, proceed to step B2; if not, repeat step B1 and search again;
Step B2: compute the similarity by dynamic time warping.
To achieve rotation, scale, and viewpoint invariance in the action-matching process, this embodiment proposes a 3D visual sentence matching algorithm based on dynamic time warping (DTW) for this problem. The DTW algorithm measures feature strings of unequal length along the time axis: at each feature match it searches the features for the current best-matching feature point. Because it adopts the idea of dynamic programming, the algorithm achieves a near-optimal matching effect.
First, define two 3D visual sentences as $C = \langle c_0, c_1, c_2, \ldots, c_m \rangle$ and $C' = \langle c'_0, c'_1, c'_2, \ldots, c'_{m'} \rangle$. Each 3D visual sentence represents one action extracted in embodiment one, and their lengths $m$ and $m'$ are not necessarily equal. To measure the similarity of the two 3D visual sentences, we define the tail of a visual sentence as $\mathrm{Tail}(C) = \langle c_1, c_2, \ldots, c_m \rangle$, and then compute the similarity of the two 3D visual sentences by formula (2):
$$\mathrm{DTW}(\langle\rangle, \langle\rangle) = 0$$

$$\mathrm{DTW}(C, \langle\rangle) = \mathrm{DTW}(\langle\rangle, C') = \infty \qquad (2)$$

$$\mathrm{DTW}(C, C') = \|c_i - c'_j\| + \min \begin{cases} \mathrm{DTW}(C, \mathrm{Tail}(C')) \\ \mathrm{DTW}(\mathrm{Tail}(C), C') \\ \mathrm{DTW}(\mathrm{Tail}(C), \mathrm{Tail}(C')) \end{cases}$$
This similarity is computed by dynamic programming. In general, $\|c_i - c'_j\|$ can be the L2 or cosine distance between two 3D visual words. In implementation, since all 3D visual sentences have been extracted in advance, this computation can be completed in efficient time. A sketch of the recursion as a dynamic-programming table follows.
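Formula (2) unrolls into the standard dynamic-programming table. The sketch below assumes the per-word distance is a 0/1 identity test on word ids, whereas the text allows the L2 or cosine distance between the underlying descriptors.

```python
import numpy as np

def dtw_distance(c, c_prime, dist=lambda a, b: 0.0 if a == b else 1.0):
    """Dynamic time warping between two 3D visual sentences.

    c, c_prime: sequences of visual-word ids; lengths m and m' may differ.
    dist: per-word distance (0/1 identity here, as a stand-in for the
    L2 or cosine distance between visual-word descriptors).
    """
    m, n = len(c), len(c_prime)
    D = np.full((m + 1, n + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = dist(c[i - 1], c_prime[j - 1])
            # formula (2): head cost plus the cheapest of the three tails
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[m, n]

# Example: a repeated word is absorbed by the warp, so the distance is 0
print(dtw_distance([3, 7, 7, 2], [3, 7, 2]))  # 0.0
```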

Claims (3)

1. A method for detecting figure actions in video, characterized in that it comprises the following steps:
Step 1: segment the video into shots with a shot boundary detection method based on a graph partition model;
Step 2: for consecutive video frames, obtain a spatiotemporal saliency map by building a dynamic saliency model on the basis of the saliency map of each frame;
Step 3: compute the attention-shift variable A_shift of the spatiotemporal saliency map by the formula

$$A_{\mathrm{shift}} = \begin{cases} 1, & \mathrm{CenterDis}(i,j) > T_C \ \text{and} \ \mathrm{DiameterVar} < T_D \\ 0, & \mathrm{CenterDis}(i,j) < T_C \ \text{or} \ \mathrm{DiameterVar} > T_D \end{cases}$$

where CenterDis(i, j) denotes the distance between the centers of the foci of attention in adjacent frames, and DiameterVar denotes the variation of the radius of the circumscribed circle of the focus of attention across adjacent frames;
Step 4: set a threshold and separate the attention-shift values A_shift that exceed it;
Step 5: within the same action, perform frame-superimposed 3D sequence slicing on the separated attention-shift values A_shift to establish the action detection model.
2. The method for detecting figure actions in video according to claim 1, characterized in that establishing the action detection model described in step 5 comprises the following steps:
Step A1: describe each 3D sequence slice in space-time with 3D-SIFT spatiotemporal features;
Step A2: quantize all the extracted 3D-SIFT spatiotemporal features in a high-dimensional space, and organize the quantization results into a hierarchical clustering model by hierarchical K-means clustering;
Step A3: at the leaves of the hierarchical clustering model obtained in step A2, describe each clustered feature subspace as one visual word, said visual word being the word obtained by quantizing all the 3D-SIFT features converging to each cluster center; an inverted index over the 3D sequence slices of each 3D-SIFT feature is then built from the quantized feature value.
3. The method for detecting figure actions in video according to claim 2, characterized in that the method of establishing the hierarchical clustering model described in step A3 comprises the following steps:
Step B1: search for the two 3D vocabularies to be matched through the hierarchical structure of the model, and judge whether the number of visual words co-occurring in the two vocabularies exceeds a threshold; if so, proceed to step B2; if not, repeat step B1 and search again;
Step B2: compute the similarity by dynamic time warping.
CNA2008101375080A 2008-11-12 2008-11-12 Detection method for figure action in video Pending CN101430689A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2008101375080A CN101430689A (en) 2008-11-12 2008-11-12 Detection method for figure action in video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2008101375080A CN101430689A (en) 2008-11-12 2008-11-12 Detection method for figure action in video

Publications (1)

Publication Number Publication Date
CN101430689A (en) 2009-05-13

Family

ID=40646094

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2008101375080A Pending CN101430689A (en) 2008-11-12 2008-11-12 Detection method for figure action in video

Country Status (1)

Country Link
CN (1) CN101430689A (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102668548A (en) * 2009-12-17 2012-09-12 佳能株式会社 Video information processing method and video information processing apparatus
CN102668548B (en) * 2009-12-17 2015-04-15 佳能株式会社 Video information processing method and video information processing apparatus
CN105049674A (en) * 2015-07-01 2015-11-11 中科创达软件股份有限公司 Video image processing method and system
CN106534951A (en) * 2016-11-30 2017-03-22 北京小米移动软件有限公司 Method and apparatus for video segmentation
CN106534951B (en) * 2016-11-30 2020-10-09 北京小米移动软件有限公司 Video segmentation method and device
WO2019144840A1 (en) * 2018-01-25 2019-08-01 北京一览科技有限公司 Method and apparatus for acquiring video semantic information
CN109982109A (en) * 2019-04-03 2019-07-05 睿魔智能科技(深圳)有限公司 Short video generation method and device, server and storage medium
CN109982109B (en) * 2019-04-03 2021-08-03 睿魔智能科技(深圳)有限公司 Short video generation method and device, server and storage medium
CN110097115A (en) * 2019-04-28 2019-08-06 南开大学 Video salient object detection method based on attention transfer mechanism
CN110097115B (en) * 2019-04-28 2022-11-25 南开大学 Video salient object detection method based on attention transfer mechanism

Similar Documents

Publication Publication Date Title
CN103218608B (en) Network violent video identification method
CN101650722B (en) Method based on audio/video combination for detecting highlight events in football video
CN104199933B (en) The football video event detection and semanteme marking method of a kind of multimodal information fusion
Mao et al. Deep cross-modal retrieval for remote sensing image and audio
CN112163122B (en) Method, device, computing equipment and storage medium for determining label of target video
CN101430689A (en) Detection method for figure action in video
CN101477798A (en) Method for analyzing and extracting audio data of set scene
CN103745000A (en) Hot topic detection method of Chinese micro-blogs
US20110106531A1 (en) Program endpoint time detection apparatus and method, and program information retrieval system
Su et al. Environmental sound classification for scene recognition using local discriminant bases and HMM
CN103761261A (en) Voice recognition based media search method and device
EP2395502A1 (en) Systems and methods for manipulating electronic content based on speech recognition
CN111754302A (en) Video live broadcast interface commodity display intelligent management system based on big data
CN101247470A (en) Method for detecting scene boundaries in genre independent videos
CN113709384A (en) Video editing method based on deep learning, related equipment and storage medium
CN104281608A (en) Emergency analyzing method based on microblogs
CN113111663A (en) Abstract generation method fusing key information
CN110705292A (en) Entity name extraction method based on knowledge base and deep learning
CN114547373A (en) Method for intelligently identifying and searching programs based on audio
Bartolini et al. Shiatsu: semantic-based hierarchical automatic tagging of videos by segmentation using cuts
CN102253993B (en) Vocabulary tree-based audio-clip retrieving algorithm
Tu et al. Challenge Huawei challenge: Fusing multimodal features with deep neural networks for Mobile Video Annotation
Thuseethan et al. Multimodal deep learning framework for sentiment analysis from text-image web Data
Yang et al. Multimodal short video rumor detection system based on contrastive learning
Tirupattur et al. Violence Detection in Videos

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Open date: 20090513