CN104036023B - Method for creating context fusion tree video semantic indexes - Google Patents
- Publication number
- CN104036023B (application CN201410297974.0A)
- Authority
- CN
- China
- Prior art keywords
- shot
- video
- semantic
- scene
- context
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/71—Indexing; Data structures therefor; Storage structures
Abstract
The invention belongs to the field of video retrieval technology and discloses a method for creating tree-structured video semantic indexes. A video semantic index built by the method contains video semantics at multiple granularities, fuses the context among those semantics, and connects the semantics of different granularities according to their context, so that a tree structure is formed. The method comprises the steps of: extracting the shot semantic set of each shot one by one; acquiring the context among the video shot semantics under supervision and representing it with a context label tree; combining the shot semantic sets with the context information to infer scene semantics; and embedding the shot semantic sets and the scene semantics into the context label tree to obtain the video index. After a semantic index has been created for a video by this method, users can input keywords of different granularities to retrieve the video, and the contextual information in the index reduces the search space, thereby improving the efficiency of the retrieval system.
Description
Technical field
The invention belongs to the field of video retrieval technology, and is a method for building a video semantic index from the shot semantics and scene semantics of a video and the context between those semantics.
Background technology
Video data has become one of the most important kinds of data on the Internet. With the explosive growth of video data, however, managing and retrieving video efficiently has become a very difficult problem. When retrieving video, a user generally inputs a keyword, and a video search engine must then find the relevant video data according to that keyword. This requires building a suitable semantic index for the video in order to improve the efficiency and hit rate of user retrieval. Building a video index based on video semantics means automatically analyzing the visual features of the video by computer to obtain the semantic information the video contains, and then using that semantic information as the index of the video, so that the user can retrieve the video by entering keywords.
However, users' demands on video search engines keep rising, and users often input keywords of different granularities according to their needs. For example, when searching for football-related video, a user may input keywords of varying granularity such as "football", "highlights", "goal shot", or "referee close-up". A traditional single-granularity, flat video semantic index therefore cannot satisfy such retrieval demands. In addition, the semantic content of video is rich: besides semantic information there is also a large amount of contextual information. This contextual information can help a search engine understand the interactions among semantics of different granularities and establish relationships between the semantics of different granularities within a video, so that related videos can be found through these relationships at retrieval time. Contextual information can reduce the search space and improve retrieval efficiency while preserving the hit rate. On this basis, the present invention realizes a video semantic index capable of fusing context, in order to improve the effectiveness of video indexing.
Summary of the invention
The purpose of the present invention is to realize a method for building a tree-structured video semantic index that fuses contextual information. The method incorporates contextual information into the video semantic index and improves the hit rate and efficiency of video retrieval.
The present invention is realized by the following scheme: a context-fusing method for building a tree-structured video semantic index, characterized in that the method comprises the following steps:
Step 1: Input n training video segments video_j, j ∈ {1, ..., n}, and preprocess each video_j. Then, taking the shot as the unit, manually annotate the shot semantic set of each shot of video_j, construct a shot-semantic training set for each class of shot semantic, and train a classifier on each to obtain a shot semantic analyzer. Input the m video segments video_k, k ∈ {1, ..., m}, for which tree indexes are to be built, preprocess each video_k, and extract the shot semantic set of each of its shots with the shot semantic analyzer;
Step 2: Taking the video segment as the unit, manually annotate the context among the shot semantics of video_j, represent it with a context label tree LT_j, and build a context training set. Train a structured support vector machine (SVM-Struct) to obtain a context label tree analyzer, and use it to extract the context label tree LT_k of video_k;
Step 3: Taking the scene of video_j as the unit, manually annotate the scene semantics and build a scene-semantic training set. Train a C4.5 classifier to obtain a scene semantic analyzer, and use it to extract the scene semantics of each scene of video_k;
Step 4: Embed the shot semantic set of each shot of video_k obtained in step 1 and the scene semantics of each scene of video_k obtained in step 3 into the corresponding nodes of the LT_k obtained in step 2, and take the LT_k carrying the shot semantics and scene semantics as the video index of video_k.
Further, step 1 is carried out as follows:
Step 2.1: Perform shot segmentation on the n training video segments video_j to obtain r training video shots; extract and quantize the visual features of each shot to construct its visual feature vector v;
Step 2.2: Define the annotation semantic set Semantic = {Sem_t | t = 1, ..., e}. Manually annotate the semantics Sem_t appearing in the r shots and add them to the shot semantic set of each shot. Then construct a shot-semantic training set for each class of shot semantic Sem_t, obtaining e training sets Tra_t = {(v_i, s_i) | i = 1, ..., r}, where s_i = 1 if the semantic Sem_t appears in the shot and s_i = 0 otherwise;
Step 2.3: Using an SVM classifier as the classification model, train one classifier SVM_t for each semantic Sem_t. The discriminant function of SVM_t has the form f_t(v) = sgn[g(v)], where g(v) = <w, v> + b. The optimization objective of training SVM_t on the training set Tra_t (taking the labels s_i as ±1) is the soft-margin objective:
min_{w,b,ξ} (1/2)||w||² + C Σ_{i=1..r} ξ_i, s.t. s_i(<w, v_i> + b) ≥ 1 − ξ_i, ξ_i ≥ 0 (1)
Using a Lagrangian function, the optimization problem with constraint (1) is converted into its dual:
max_α Σ_i α_i − (1/2) Σ_i Σ_h α_i α_h s_i s_h <v_i, v_h>, s.t. Σ_i α_i s_i = 0, 0 ≤ α_i ≤ C (2)
Introducing a kernel function K(v_i, v_h), formula (2) becomes:
max_α Σ_i α_i − (1/2) Σ_i Σ_h α_i α_h s_i s_h K(v_i, v_h) (3)
The kernel function is chosen to be the RBF kernel, defined as:
K(v_i, v_h) = exp(−||v_i − v_h||² / (2σ²)) (4)
where exp(·) is the exponential function and σ is a parameter.
Once training has determined a set of α_i, the discriminant function of the shot semantic Sem_t is also determined:
f_t(v) = sgn[Σ_i α_i s_i K(v_i, v) + b_0] (5)
where b_0 is a parameter.
Step 2.4: After the classifiers SVM_t of all Sem_t have been trained according to step 2.3, e shot-semantic discriminant functions are obtained; these e discriminant functions form the shot semantic analyzer group.
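The discriminant function of formula (5) with the RBF kernel of formula (4) can be sketched as follows. The support vectors, labels, and coefficients below are hypothetical stand-ins for what a real training run would produce, and the 0/1 labels of the training set are mapped to ±1 as the SVM formulation requires.

```python
import math

def rbf_kernel(u, v, sigma=1.0):
    """K(u, v) = exp(-||u - v||^2 / (2 * sigma^2)), formula (4)."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

def make_discriminant(support_vectors, labels, alphas, b0, sigma=1.0):
    """Build f_t(v) = sgn(sum_i alpha_i * s_i * K(v_i, v) + b0), formula (5).
    The 0/1 training labels are mapped to -1/+1."""
    signed = [2 * s - 1 for s in labels]
    def f(v):
        g = sum(a * s * rbf_kernel(sv, v, sigma)
                for a, s, sv in zip(alphas, signed, support_vectors)) + b0
        return 1 if g >= 0 else 0  # 1: semantic Sem_t present in the shot
    return f

# Hypothetical trained parameters for one shot semantic Sem_t
f_goal = make_discriminant(
    support_vectors=[[0.0, 0.0], [1.0, 1.0]],
    labels=[0, 1], alphas=[1.0, 1.0], b0=0.0)
```

In step 2.5, a shot's feature vector is run through each of the e such functions, and every semantic whose function fires is added to the shot's semantic set.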
Step 2.5: Perform shot segmentation on the m video segments video_k for which tree indexes are to be built, then extract the visual features of each shot to form a feature vector v. Input v into the shot semantic analyzer group to determine which semantics appear in the shot, and add the semantics that appear to the shot semantic set of that shot.
Further, step 2 is carried out as follows:
Step 3.1: From the shot semantic set of each shot of video_j, select one shot semantic to represent the shot, and then form a shot semantic sequence wu_j according to temporal order;
Step 3.2: Manually annotate the context of wu_j and represent the contextual information with a context label tree LT_j. A context label tree is a five-tuple LT = <L, Video, Scene, NL, P>, where L is the shot semantic label set, whose elements are the shot semantics representing the shots in wu_j; Video is the "video context" label, whose context is that its child nodes jointly express the content of the video segment; Scene is the "scene context" label, whose context is that its child nodes jointly express the content of one scene; NL is the set of context labels other than Video and Scene, each element of which represents a kind of context relation; and P is the set of context rules, each element of which represents one context rule;
Step 3.3: Construct the context training set from the n pairs of wu_j and their corresponding context label trees:
Context = {(x_j, y_j) | j = 1, ..., n}, where x_j is a shot semantic sequence and y_j is the corresponding context label tree;
Step 3.4: Train the structured support vector machine SVM-Struct with the context training set; the concrete operations are:
Step 3.4.1: Construct the mapping function from a shot semantic sequence to a context label tree:
ŷ = argmax_{y ∈ Y} f(x, y; W)
where f(x, y; W) = <W, ψ(x, y)> is the discriminant function, W is a weight vector, and ψ(x, y) is the joint feature vector of a shot semantic sequence in the training data and its corresponding context label tree. ψ(x, y) is constructed as:
ψ(x, y) = (a_1, a_2, ..., a_N)
where p_i and a_i (i ∈ [1, N]) are, respectively, a rule in the context rule set P of the context label tree and the number of times that rule occurs, and N is the total number of context rule classes occurring in the context training set;
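The joint feature vector ψ(x, y) described above is simply a vector of rule counts. A sketch, assuming rules are represented as (parent, children) tuples and the inventory of N rule types is known from the training set (the inventory below is illustrative):

```python
from collections import Counter

def joint_feature_vector(tree_rules, rule_inventory):
    """psi(x, y): the i-th component a_i counts how often rule p_i from the
    inventory of N rule types occurs in the tree's rule set P."""
    counts = Counter(tree_rules)
    return [counts[p] for p in rule_inventory]

# Hypothetical inventory of N = 3 rule types seen in the training set
inventory = [
    ("Video", ("Scene",)),
    ("Scene", ("nl1", "l3")),
    ("nl1", ("l1", "l2")),
]
psi = joint_feature_vector(
    [("Video", ("Scene",)), ("Scene", ("nl1", "l3")), ("nl1", ("l1", "l2"))],
    inventory)
```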
Step 3.4.2: Convert the training of SVM-Struct into the optimization problem:
min_{W,ε} (1/2)||W||² + (C/n) Σ_j ε_j, s.t. for all j and all y ≠ y_j: <W, ψ(x_j, y_j) − ψ(x_j, y)> ≥ Δ(y_j, y) − ε_j, ε_j ≥ 0 (6)
where ε_j is a slack variable, C > 0 is the penalty for misclassified samples, and Δ(y_j, y) is the loss function, taken to be Δ(y_j, y) = 1 − F1(y_j, y). Here y_j is the true context label tree of a shot semantic sequence in the context training set, and y is a context label tree predicted during training. F1 is computed as:
F1 = 2 · Precision · Recall / (Precision + Recall), with Precision = |E(y_j) ∩ E(y)| / |E(y)| and Recall = |E(y_j) ∩ E(y)| / |E(y_j)|
where Precision is the prediction accuracy over the nodes of the context label tree, Recall is the prediction recall over the nodes of the context label tree, E(y_j) is the edge set of y_j, and E(y) is the edge set of y;
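The loss Δ(y_j, y) = 1 − F1 over edge sets can be sketched as follows; edges are assumed to be represented as (parent, child) label pairs:

```python
def tree_f1(true_edges, pred_edges):
    """F1 between the true and predicted context label trees, computed over
    their edge sets E(y_j) and E(y); the training loss is Delta = 1 - F1."""
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    common = true_edges & pred_edges
    if not common:
        return 0.0
    precision = len(common) / len(pred_edges)
    recall = len(common) / len(true_edges)
    return 2 * precision * recall / (precision + recall)
```

For example, if the predicted tree gets two of three edges right (and predicts one spurious edge), precision and recall are both 2/3, so F1 = 2/3 and the loss is 1/3.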
Step 3.4.3: Transform formula (6) into its dual form:
max_α Σ_{j, y≠y_j} Δ(y_j, y) α_jy − (1/2) Σ_{j, y≠y_j} Σ_{h, y'≠y_h} α_jy α_hy' <δψ_j(y), δψ_h(y')> (7)
where δψ_j(y) = ψ(x_j, y_j) − ψ(x_j, y) and the α_jy are Lagrange multipliers; for the soft margin there is in addition a group of constraints:
n Σ_{y≠y_j} α_jy ≤ C, α_jy ≥ 0, for all j = 1, ..., n
Step 3.4.4: Solve formula (7) on the context training set Context; once the optimal set of α_jy has been found, the weight vector W is determined and the context label tree analyzer is obtained;
Step 3.5: Extract the shot semantic sequence wu_k of video_k in the same way as step 3.1, and input wu_k into the video context label tree analyzer to obtain the LT_k corresponding to wu_k.
Further, step 3 is carried out as follows:
Step 4.1: According to the "scene context" labels Scene in LT_j, take the shots corresponding to the leaf nodes under each Scene label as one complete video scene, thereby realizing the scene segmentation of the video. Then, taking the scene as the unit, manually annotate the scene semantics of the scenes of video_j;
Step 4.2: Construct the scene-semantic training set from the shot semantic sets of the shots of each scene together with the contextual information in the corresponding LT_j. The features of a scene semantic are of two kinds:
A. Shot semantic features: if a certain shot semantic appears in the scene, the value of that shot semantic feature is 1, otherwise it is 0;
B. Contextual features: a contextual feature is the context relation between two shot semantics. Each shot semantic corresponds to one leaf node in LT_j, so the contextual feature value of two shot semantics is the context label on the lowest common ancestor node of the two leaf nodes;
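The contextual feature of kind B — the label at the lowest common ancestor of two leaves — can be sketched as below. The node class and the example tree (shaped after Fig. 2, where l1 and l2 sit under nl1 and l3 sits directly under Scene) are illustrative.

```python
class Node:
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def find_path(node, label, path=None):
    """Root-to-leaf node path to the leaf carrying `label`, or None."""
    path = (path or []) + [node]
    if node.label == label and not node.children:
        return path
    for c in node.children:
        found = find_path(c, label, path)
        if found:
            return found
    return None

def contextual_feature(root, leaf_a, leaf_b):
    """Label on the lowest common ancestor of two shot-semantic leaves,
    used as the contextual feature between the two shot semantics."""
    pa, pb = find_path(root, leaf_a), find_path(root, leaf_b)
    lca = None
    for x, y in zip(pa, pb):
        if x is y:          # still on the shared prefix of both paths
            lca = x
    return lca.label

tree = Node("Scene", [
    Node("nl1", [Node("l1"), Node("l2")]),
    Node("l3"),
])
```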
Step 4.3: Using the C4.5 algorithm as the classification model, select attributes as nodes according to the information gain ratio of each feature attribute in the scene-semantic training set, ultimately generating a decision tree that analyzes the semantics of video scenes, and take this decision tree as the scene semantic analyzer;
Step 4.4: According to the LT_k of wu_k, divide video_k into scenes by the same method as in step 4.1, and extract the feature vector of each scene, taking the scene as the unit. Input the feature vector of each scene of video_k into the scene semantic analyzer to obtain the scene semantics of each scene of video_k.
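C4.5's attribute-selection criterion in step 4.3, the information gain ratio, can be sketched for the binary features above as follows (a minimal sketch of the criterion only, not of full decision-tree induction):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a class-label list, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(feature_values, labels):
    """C4.5 attribute-selection criterion for one feature: information
    gain divided by the split information of the feature."""
    n = len(labels)
    cond_entropy = 0.0
    split_info = 0.0
    for val in set(feature_values):
        subset = [l for f, l in zip(feature_values, labels) if f == val]
        p = len(subset) / n
        cond_entropy += p * entropy(subset)
        split_info -= p * math.log2(p)
    gain = entropy(labels) - cond_entropy
    return gain / split_info if split_info > 0 else 0.0
```

A feature that perfectly separates the scene classes gets ratio 1, and a constant feature gets 0, so the most informative shot-semantic or contextual feature is chosen at each node.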
Further, step 4 is carried out as follows:
Step 5.1: Replace the shot semantic label in each leaf node of LT_k with the shot semantic set of the shot it represents;
Step 5.2: Replace each Scene label in LT_k with the corresponding scene semantic;
Step 5.3: Take the LT_k containing the shot semantics and scene semantics as the video index of video_k.
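Steps 5.1–5.3 amount to a relabeling pass over LT_k. A sketch with trees represented as (label, children) pairs, where the shot-to-semantic-set and scene-to-semantic mappings are assumed to come from the earlier analyzers:

```python
def build_video_index(lt, shot_semantics, scene_semantics):
    """Turn the context label tree LT_k into the video index: each leaf's
    shot label is replaced by that shot's semantic set (step 5.1), and each
    'Scene' label by the inferred scene semantic, in document order (5.2)."""
    scenes = list(scene_semantics)  # consumed left to right
    def rebuild(node):
        label, children = node
        if not children:                       # leaf: shot semantic set
            return (shot_semantics[label], [])
        if label == "Scene":                   # internal: scene semantic
            return (scenes.pop(0), [rebuild(c) for c in children])
        return (label, [rebuild(c) for c in children])
    return rebuild(lt)

# Illustrative LT_k with two scenes over three shots
lt = ("Video", [("Scene", [("l1", []), ("l2", [])]),
                ("Scene", [("l3", [])])])
index = build_video_index(
    lt,
    {"l1": {"goal"}, "l2": {"cheer"}, "l3": {"referee"}},
    ["shoot", "close-up"])
```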
The beneficial effects of the invention are as follows: after a semantic index has been built for a video by the method of the present invention, users can input keywords of different granularities to retrieve the video, and the contextual information in the index reduces the search space and improves the efficiency of the retrieval system.
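Retrieval against such an index can match keywords at any granularity, because scene semantics sit on internal nodes and shot semantics sit on leaves. A sketch (the index layout and keywords are illustrative, continuing the (label, children) representation):

```python
def matches(node, keyword):
    """True if the keyword hits this subtree: either an internal
    scene/context label, or a shot semantic inside a leaf's semantic set."""
    label, children = node
    if isinstance(label, set):      # leaf: shot semantic set
        return keyword in label
    if label == keyword:            # internal node: scene/context label
        return True
    return any(matches(c, keyword) for c in children)

# Illustrative tree video index
index = ("Video", [
    ("shoot", [({"goal"}, []), ({"cheer"}, [])]),
    ("close-up", [({"referee"}, [])]),
])
```

A coarse keyword such as "shoot" is answered at the scene level without visiting the leaves beneath it, which is how the tree structure narrows the search space.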
Description of the drawings
Fig. 1 is the flow of establishing a tree-structured video semantic index.
Fig. 2 is a model of a video context label tree.
Fig. 3 is a model of a tree-structured video index.
Specific embodiment
Referring to Fig. 1, a context-fusing method for building a tree-structured video index first extracts the semantic information of each shot, taking the shot as the unit; then obtains the semantic context of the video shots in a supervised manner and represents that context with a tree structure; then infers scene semantics by combining the shot semantics with their context; and finally embeds the shot semantics and scene semantics into the tree structure, which serves as the index of the video. The details are as follows:
1. Perform shot segmentation on the n training video segments video_j to obtain r training video shots. Extract and quantize the visual features of each shot to construct its visual feature vector v.
Define the annotation semantic set Semantic = {Sem_t | t = 1, ..., e}. Manually annotate the semantics Sem_t appearing in the r shots and add them to the shot semantic set of each shot. Then construct a shot-semantic training set for each class of shot semantic Sem_t, obtaining e training sets Tra_t = {(v_i, s_i) | i = 1, ..., r}, where s_i = 1 if the semantic Sem_t appears in the shot and s_i = 0 otherwise.
Using an SVM classifier as the classification model, train one classifier SVM_t for each semantic Sem_t. The discriminant function of SVM_t has the form f_t(v) = sgn[g(v)], where g(v) = <w, v> + b. The optimization objective of training SVM_t on the training set Tra_t is formula (1). Using a Lagrangian function, the optimization problem with constraint (1) is converted into its dual (2). Introducing a kernel function K(v_i, v_h), formula (2) becomes formula (3). The kernel function is chosen to be the RBF kernel, defined as K(v_i, v_h) = exp(−||v_i − v_h||² / (2σ²)), where exp(·) is the exponential function and σ is a parameter. Once training has determined a set of α_i, the discriminant function of the shot semantic Sem_t is also determined as formula (5), where b_0 is a parameter.
After the classifier SVM_t corresponding to each Sem_t has been trained, a shot semantic analyzer group comprising e shot semantic analyzers is obtained.
Perform shot segmentation on the m video segments video_k for which tree indexes are to be built, then extract the visual features of each shot to form a feature vector v. Input v into the shot semantic analyzer group to determine which semantics appear in the shot, and add the semantics that appear to the shot semantic set of that shot.
2. From the shot semantic set of each shot of video_j, select one shot semantic to represent the shot, and form a shot semantic sequence wu_j according to temporal order.
Taking the video segment as the unit, manually annotate the context of the semantic sequence wu_j of each training video segment, and represent the contextual information with the corresponding context label tree LT_j. A context label tree is formally defined as a five-tuple LT = <L, Video, Scene, NL, P>, where L is the shot semantic label set, whose elements are the shot semantics representing the shots in wu_j; Video is the "video context" label, whose context is that its child nodes jointly express the content of the video segment; Scene is the "scene context" label, whose context is that its child nodes jointly express the content of one scene; NL is the set of context labels other than Video and Scene, each element of which represents a kind of context relation; and P is the set of context rules, each element of which represents one context rule. For example, in Fig. 2 the leaf nodes l_1 and l_2 form a rule with their parent node nl_1, which can be formally expressed as nl_1 → l_1 l_2.
Construct the context training set from the n pairs of wu_j and their corresponding context label trees: Context = {(x_j, y_j) | j = 1, ..., n}, where x_j is a shot semantic sequence and y_j is the corresponding context label tree.
Train the structured support vector machine SVM-Struct with the context training set. The mapping function from a shot semantic sequence to a context label tree is ŷ = argmax_{y ∈ Y} f(x, y; W), where f(x, y; W) = <W, ψ(x, y)> is the discriminant function, W is a weight vector, and ψ(x, y) is the joint feature vector of a shot semantic sequence in the training data and its corresponding context label tree, constructed as ψ(x, y) = (a_1, ..., a_N), where p_i and a_i are, respectively, a context rule in the rule set P of the context label tree and the number of times that rule occurs, and N is the total number of context rule classes occurring in the context training set.
Training SVM-Struct is converted into the optimization problem of formula (6), where ε_j is a slack variable, C > 0 is the penalty for misclassified samples, and Δ(y_j, y) is the loss function, taken to be Δ(y_j, y) = 1 − F1(y_j, y). Here y_j is the true context label tree of a shot semantic sequence in the context training set, and y is a context label tree predicted during training. F1 is computed from Precision, the prediction accuracy of each node in the context label tree, and Recall, the prediction recall of each node in the context label tree, over the edge set E(y_j) of y_j and the edge set E(y) of y.
Formula (6) is transformed into its dual form (7), where the α_jy are Lagrange multipliers; for the soft margin there is in addition a group of constraints. After the penalty C is set, formula (7) is solved on the context training set Context; once the optimal set of α_jy has been found, the weight vector W is determined and the context label tree analyzer is obtained.
Extract the shot semantic sequence wu_k of video_k and input it into the video context label tree analyzer to obtain the LT_k of wu_k.
3. According to the "scene context" labels Scene in LT_j, take the shots corresponding to the leaf nodes under each Scene label as one complete video scene, thereby realizing the scene segmentation of the video. Then, taking the scene as the unit, manually annotate the scene semantics of the scenes of video_j.
Construct the scene-semantic training set from the shot semantic sets of the shots of each scene together with the contextual information in the corresponding LT_j. The features of a scene semantic are of two kinds:
A. Shot semantic features: if a certain shot semantic appears in the scene, the value of that shot semantic feature is 1, otherwise it is 0;
B. Contextual features: a contextual feature is the context relation between two shot semantics. Each shot semantic corresponds to one leaf node in LT_j, so the contextual feature value of two shot semantics is the context label on the lowest common ancestor node of the two leaf nodes. For example, in Fig. 2 the contextual feature of l_1 and l_2 is "nl_1", and the contextual feature of l_1 and l_3 is "Scene".
Using the C4.5 algorithm as the classification model, select attributes as nodes according to the information gain ratio of each feature attribute in the scene-semantic training set, ultimately generating a decision tree that analyzes the semantics of video scenes; this decision tree serves as the scene semantic analyzer.
According to the "scene context" labels Scene in the LT_k of wu_k, divide video_k into scenes, and, taking the scene as the unit, extract the shot semantic features and contextual features of each scene to form its feature vector. Input the feature vector of each scene of video_k into the scene semantic analyzer to obtain the scene semantics of each scene of video_k.
4. Replace the shot semantic label in each leaf node of LT_k with the shot semantic set of the shot it represents, then replace each Scene label in LT_k with the corresponding scene semantic, and finally take the LT_k containing the shot semantics and scene semantics as the video index of video_k.
The foregoing is only a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the patent claims of the present invention shall fall within the scope covered by the present invention.
Claims (2)
1. A context-fusing method for building a tree-structured video semantic index, characterized in that the method comprises the following steps:
Step 1: Input n training video segments video_j, j ∈ {1, ..., n}, and preprocess each video_j; then, taking the shot as the unit, manually annotate the shot semantic set of each shot of video_j, construct a shot-semantic training set for each class of shot semantic, and train a classifier on each to obtain a shot semantic analyzer; input the m video segments video_k, k ∈ {1, ..., m}, for which tree indexes are to be built, preprocess each video_k, and extract the shot semantic set of each of its shots with the shot semantic analyzer;
Step 2: Taking the video segment as the unit, manually annotate the context among the shot semantics of video_j, represent it with a context label tree LT_j, and build a context training set; train a structured support vector machine (SVM-Struct) to obtain a context label tree analyzer; use the context label tree analyzer to extract the context label tree LT_k of video_k;
Step 3: Taking the scene of video_j as the unit, manually annotate the scene semantics and build a scene-semantic training set; train a C4.5 classifier to obtain a scene semantic analyzer; use the scene semantic analyzer to extract the scene semantics of each scene of video_k;
Step 4: Embed the shot semantic set of each shot of video_k obtained in step 1 and the scene semantics of each scene of video_k obtained in step 3 into the corresponding nodes of the LT_k obtained in step 2, and take the LT_k carrying the shot semantics and scene semantics as the video index of video_k;
wherein step 2 is carried out as follows:
Step 3.1: From the shot semantic set of each shot of video_j, select one shot semantic to represent the shot, and form a shot semantic sequence wu_j according to temporal order;
Step 3.2: Manually annotate the context of wu_j and represent the contextual information with a context label tree LT_j; a context label tree is a five-tuple LT_j = <L, Video, Scene, NL, P>, where L is the shot semantic label set, whose elements are the shot semantics representing the shots in wu_j; Video is the "video context" label, whose context is that its child nodes jointly express the content of the video segment; Scene is the "scene context" label, whose context is that its child nodes jointly express the content of one scene; NL is the set of context labels other than Video and Scene, each element of which represents a kind of context relation; and P is the set of context rules, each element of which represents one context rule;
Step 3.3: Construct the context training set from the n pairs of wu_j and their corresponding context label trees: Context = {(x_j, y_j) | j = 1, ..., n}, where x_j is a shot semantic sequence in the context training set and y_j is the context label tree corresponding to x_j in the context training set;
Step 3.4: Train the structured support vector machine SVM-Struct with the context training set; the concrete operations are:
Step 3.4.1: Construct the mapping function from a shot semantic sequence to a context label tree:
ŷ = argmax_{y ∈ Y} f(x_j, y; W)
where f(x_j, y_j; W) = <W, ψ(x_j, y_j)> is the discriminant function, Y is the set of all context label trees that can be constructed over x_j, W is a weight vector, and ψ(x_j, y_j) is the joint feature vector of a shot semantic sequence in the training data and its corresponding context label tree; ψ(x_j, y_j) is constructed as ψ(x_j, y_j) = (a_1, a_2, ..., a_N), where p_i and a_i, i ∈ [1, N], are, respectively, a rule in the context rule set P of the context label tree and the number of times that rule occurs, and N is the total number of context rule classes occurring in the context training set;
Step 3.4.2: Convert the training of SVM-Struct into the optimization problem (6), where ε_j is a slack variable, C > 0 is the penalty for misclassified samples, and Δ(y_j, y) is the loss function, taken to be Δ(y_j, y) = 1 − F1(y_j, y); here y_j is the true context label tree of a shot semantic sequence in the context training set, and y is a context label tree predicted during training; F1 is computed as:
F1 = 2 · Precision · Recall / (Precision + Recall), with Precision = |E(y_j) ∩ E(y)| / |E(y)| and Recall = |E(y_j) ∩ E(y)| / |E(y_j)|
where Precision is the prediction accuracy of each node in the context label tree, Recall is the prediction recall of each node in the context label tree, E(y_j) is the edge set of y_j, and E(y) is the edge set of y;
Step 3.4.3: Transform formula (6) into its dual form (7), where the α_jy are Lagrange multipliers; for the soft margin there is in addition a group of constraints;
Step 3.4.4: Solve formula (7) on the context training set Context; once the optimal set of α_jy has been found, the weight vector W is determined and the context label tree analyzer is obtained;
Step 3.5: Extract the shot semantic sequence wu_k of video_k in the same way as step 3.1, and input wu_k into the video context label tree analyzer to obtain the LT_k corresponding to wu_k;
wherein step 3 is carried out as follows:
Step 4.1: According to the "scene context" labels Scene in LT_j, take the shots corresponding to the leaf nodes under each Scene label as one complete video scene, thereby realizing the scene segmentation of the video; then, taking the scene as the unit, manually annotate the scene semantics of the scenes of video_j;
Step 4.2: Construct the scene-semantic training set from the shot semantic sets of the shots of each scene together with the contextual information in the corresponding LT_j; the features of a scene semantic are of two kinds:
A. Shot semantic features: if a certain shot semantic appears in the scene, the value of that shot semantic feature is 1, otherwise it is 0;
B. Contextual features: a contextual feature is the context relation between two shot semantics; each shot semantic corresponds to one leaf node in LT_j, so the contextual feature value of two shot semantics is the context label on the lowest common ancestor node of the two leaf nodes;
Step 4.3: Using the C4.5 algorithm as the classification model, select attributes as nodes according to the information gain ratio of each feature attribute in the scene-semantic training set, ultimately generating a decision tree that analyzes the semantics of video scenes, and take this decision tree as the scene semantic analyzer;
Step 4.4: According to the LT_k of wu_k, divide video_k into scenes by the same method as in step 4.1, and extract the feature vector of each scene, taking the scene as the unit; input the feature vector of each scene of video_k into the scene semantic analyzer to obtain the scene semantics of each scene of video_k;
wherein step 4 is carried out as follows:
Step 5.1: Replace the shot semantic label in each leaf node of LT_k with the shot semantic set of the shot it represents;
Step 5.2: Replace each Scene label in LT_k with the corresponding scene semantic;
Step 5.3: Take the LT_k containing the shot semantics and scene semantics as the video index of video_k.
2. the tree-like video semanteme index establishing method of a kind of integrating context according to claim 1, it is characterised in that:
Carry out as follows in the step 1:
Step 2.1:To n training video fragment videojShot segmentation is carried out, r training video camera lens shot is obtained1,
shot2..., shotr;Extract and quantify camera lens shotiVisual signature, construct its visual feature vector vi;
Step 2.2:The semantic collection Semantic={ Sem of mark are sett| t=1 ..., e }, manually occur in r camera lens of mark
Semantic Semt, the semantic concentration of camera lens of each camera lens is added to, it is then each class camera lens semanteme SemtThe semantic training of construction camera lens
Collection Trat,Trat={ (vi,si) | i=1 ..., r }, if semanteme SemtOccur in camera lens shotiIn, then si=1, otherwise for
0;Finally give the semantic training set Tra of e camera lens1, Tra2..., Trae;
Step 2.3:It is each semantic Sem using SVM classifier as disaggregated modeltOne grader SVM of trainingt;SVMt's
Discriminant function form is:ft(vi)=sgn [g (vi)], wherein g (vi)=< w, vi>+b, w and b are desired optimized parameters, vi
For video lens shotiVisual feature vector;
Training set TratTraining SVMtOptimization aim be:
Merge optimization problem using Lagrangian and constraint is converted into (1) formula:
Wherein α={ α1,α2,...,αrIt is Lagrange multiplier, h and i is subscript, viAnd vhIt is camera lens shotiAnd shothIt is right
The visual feature vector answered;
Introduce kernel function K (vi,vh), formula (2) is converted to:
Kernel function is chosen to be RBF, is defined as:
Wherein exp () is exponential function, and σ is parameter;
Once training has determined the set of multipliers αi, the discriminant function of shot semantic Semt is also determined:
    ft(v) = sgn( Σ_{i=1..r} αi·si·K(vi, v) + b0 )
where b0 is a parameter;
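As a check on the discriminant form above, here is a minimal NumPy sketch of evaluating ft(v) from an RBF kernel, a set of multipliers αi, labels si, and bias b0. The support vectors, multipliers, labels (taken as ±1, as in the standard SVM derivation), σ, and b0 are all made-up toy values:

```python
import numpy as np

def rbf_kernel(vi, vh, sigma=1.0):
    # K(vi, vh) = exp(-||vi - vh||^2 / (2 * sigma^2))
    d = np.asarray(vi, dtype=float) - np.asarray(vh, dtype=float)
    return np.exp(-np.dot(d, d) / (2.0 * sigma ** 2))

def discriminant(v, support_vecs, alphas, labels, b0, sigma=1.0):
    # f_t(v) = sgn( sum_i alpha_i * s_i * K(v_i, v) + b0 )
    g = sum(a * s * rbf_kernel(sv, v, sigma)
            for a, s, sv in zip(alphas, labels, support_vecs)) + b0
    return 1 if g >= 0 else -1

# Toy trained model: two support vectors with opposite labels
support_vecs = [[0.0, 0.0], [1.0, 1.0]]
alphas = [1.0, 1.0]
labels = [1, -1]
b0 = 0.0
```

A vector near the positively labeled support vector is then classified as containing the semantic, and one near the negative support vector as not containing it.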
Step 2.4: After the classifiers SVMt for all semantics Semt have been trained according to step 2.3, e shot-semantic discriminant functions are obtained; these e discriminant functions together form the shot semantic analyzer bank;
Step 2.5: Perform shot segmentation on the m video fragments videok for which a tree index is to be built, then extract the visual features of each shot to form a feature vector v; input v into the shot semantic analyzer bank to determine which semantics appear in the shot, and add the detected semantics to that shot's semantic set.
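Steps 2.3-2.5 amount to training one binary RBF-kernel SVM per shot semantic and running each new shot's feature vector through the resulting analyzer bank. A sketch using scikit-learn's SVC, with randomly generated feature vectors and two hypothetical semantics ("person", "car") standing in for real annotated training data:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
semantics = ["person", "car"]

# Toy training vectors v_i; per-semantic labels s_i (1 if Sem_t appears in
# shot_i, 0 otherwise), here derived from the features for illustration
V = rng.normal(size=(40, 8))
labels = {"person": (V[:, 0] > 0).astype(int),
          "car": (V[:, 1] > 0).astype(int)}

# One RBF-kernel SVM per semantic: the "shot semantic analyzer bank"
bank = {t: SVC(kernel="rbf", gamma="scale").fit(V, labels[t])
        for t in semantics}

def shot_semantic_set(v):
    """Return the set of semantics the bank detects in one shot's vector."""
    return {t for t, clf in bank.items() if clf.predict([v])[0] == 1}
```

Each shot of a video to be indexed would be passed through `shot_semantic_set`, and the detected semantics accumulated into that shot's semantic set before scene inference.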
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410297974.0A CN104036023B (en) | 2014-06-26 | 2014-06-26 | Method for creating context fusion tree video semantic indexes |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104036023A CN104036023A (en) | 2014-09-10 |
CN104036023B true CN104036023B (en) | 2017-05-10 |
Family
ID=51466793
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410297974.0A Expired - Fee Related CN104036023B (en) | 2014-06-26 | 2014-06-26 | Method for creating context fusion tree video semantic indexes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104036023B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104506947B (en) * | 2014-12-24 | 2017-09-05 | 福州大学 | A kind of video fast forward based on semantic content/rewind speeds self-adapting regulation method |
US20170083623A1 (en) * | 2015-09-21 | 2017-03-23 | Qualcomm Incorporated | Semantic multisensory embeddings for video search by text |
CN106878632B (en) * | 2017-02-28 | 2020-07-10 | 北京知慧教育科技有限公司 | Video data processing method and device |
CN107590442A (en) * | 2017-08-22 | 2018-01-16 | 华中科技大学 | A kind of video semanteme Scene Segmentation based on convolutional neural networks |
KR102387767B1 (en) * | 2017-11-10 | 2022-04-19 | 삼성전자주식회사 | Apparatus and method for user interest information generation |
US10860649B2 (en) * | 2018-03-14 | 2020-12-08 | TCL Research America Inc. | Zoomable user interface for TV |
CN110545299B (en) * | 2018-05-29 | 2022-04-05 | 腾讯科技(深圳)有限公司 | Content list information acquisition method, content list information providing method, content list information acquisition device, content list information providing device and content list information equipment |
CN109344887B (en) * | 2018-09-18 | 2020-07-07 | 山东大学 | Short video classification method, system and medium based on multi-mode dictionary learning |
CN109685144B (en) * | 2018-12-26 | 2021-02-12 | 上海众源网络有限公司 | Method and device for evaluating video model and electronic equipment |
CN111435453B (en) * | 2019-01-14 | 2022-07-22 | 中国科学技术大学 | Fine-grained image zero sample identification method |
CN110097094B (en) * | 2019-04-15 | 2023-06-13 | 天津大学 | Multiple semantic fusion few-sample classification method for character interaction |
CN110765314A (en) * | 2019-10-21 | 2020-02-07 | 长沙品先信息技术有限公司 | Video semantic structural extraction and labeling method |
CN114302224B (en) * | 2021-12-23 | 2023-04-07 | 新华智云科技有限公司 | Intelligent video editing method, device, equipment and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103593363A (en) * | 2012-08-15 | 2014-02-19 | 中国科学院声学研究所 | Video content indexing structure building method and video searching method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10078693B2 (en) * | 2006-06-16 | 2018-09-18 | International Business Machines Corporation | People searches by multisensor event correlation |
- 2014-06-26 CN CN201410297974.0A patent/CN104036023B/en not_active Expired - Fee Related
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103593363A (en) * | 2012-08-15 | 2014-02-19 | 中国科学院声学研究所 | Video content indexing structure building method and video searching method and device |
Non-Patent Citations (2)
Title |
---|
"Co-Concept-Boosting video semantic indexing method"; Chen Danwen et al.; Journal of Chinese Computer Systems; 2012-07-31; vol. 33, no. 7; pp. 1603-1607 * |
"A new semantic index for video retrieval"; Han Zhiguang et al.; Harmonious Human-Machine Environment 2008; 2008-12-31; pp. 454-459 * |
Also Published As
Publication number | Publication date |
---|---|
CN104036023A (en) | 2014-09-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104036023B (en) | Method for creating context fusion tree video semantic indexes | |
Qu et al. | Dynamic modality interaction modeling for image-text retrieval | |
Gorti et al. | X-pool: Cross-modal language-video attention for text-video retrieval | |
Liang et al. | Jointly learning aspect-focused and inter-aspect relations with graph convolutional networks for aspect sentiment analysis | |
Chen et al. | Hierarchical visual-textual graph for temporal activity localization via language | |
Perez-Martin et al. | Improving video captioning with temporal composition of a visual-syntactic embedding | |
CN106951438A (en) | A kind of event extraction system and method towards open field | |
CN105760507A (en) | Cross-modal subject correlation modeling method based on deep learning | |
CN103970733B (en) | A kind of Chinese new word identification method based on graph structure | |
CN103778227A (en) | Method for screening useful images from retrieved images | |
Zablocki et al. | Context-aware zero-shot learning for object recognition | |
Hii et al. | Multigap: Multi-pooled inception network with text augmentation for aesthetic prediction of photographs | |
CN109948668A (en) | A kind of multi-model fusion method | |
CN105849720A (en) | Visual semantic complex network and method for forming network | |
CN105824862A (en) | Image classification method based on electronic equipment and electronic equipment | |
CN106203296B (en) | The video actions recognition methods of one attribute auxiliary | |
CN104376108B (en) | A kind of destructuring natural language information abstracting method based on the semantic marks of 6W | |
CN104537028B (en) | A kind of Web information processing method and device | |
CN106649663A (en) | Video copy detection method based on compact video representation | |
CN103761286B (en) | A kind of Service Source search method based on user interest | |
Hinami et al. | Discriminative learning of open-vocabulary object retrieval and localization by negative phrase augmentation | |
Jung et al. | Devil's on the edges: Selective quad attention for scene graph generation | |
CN113076483A (en) | Case element heteromorphic graph-based public opinion news extraction type summarization method | |
CN109376964A (en) | A kind of criminal case charge prediction technique based on Memory Neural Networks | |
Wang et al. | Topic scene graph generation by attention distillation from caption |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20170510; Termination date: 20200626 |