CN104036023A - Method for creating context fusion tree video semantic indexes - Google Patents
- Publication number
- CN104036023A CN104036023A CN201410297974.0A CN201410297974A CN104036023A CN 104036023 A CN104036023 A CN 104036023A CN 201410297974 A CN201410297974 A CN 201410297974A CN 104036023 A CN104036023 A CN 104036023A
- Authority
- CN
- China
- Prior art keywords
- shot
- video
- semantic
- scene
- context
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/71—Indexing; Data structures therefor; Storage structures
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Multimedia (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Studio Devices (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention belongs to the field of video retrieval technology and discloses a method for creating tree-structured video semantic indexes. A video semantic index built with the method contains video semantics at multiple granularities; the index fuses the context among these semantics, and semantics of different granularities are linked to one another according to that context, forming a tree structure. The method comprises the steps of: extracting the shot semantic set of each shot, one shot at a time; obtaining the context of the video shot semantics under supervision and representing it with a context label tree; inferring scene semantics by combining the shot semantic sets with the context information; and embedding the shot semantic sets and scene semantics into the context label tree to obtain the video index. After a semantic index has been created for a video with this method, users can retrieve the video by entering keywords of different granularities, and the context information in the index narrows the search space, thereby improving the efficiency of the retrieval system.
Description
Technical field
The invention belongs to the field of video retrieval technology, and concerns a method that builds a video semantic index from the shot semantics of a video, its scene semantics, and the context between those semantics.
Background art
Video data has already become one of the most important kinds of data on the Internet. With the explosive growth of video data, however, managing and retrieving video efficiently has become a very hard problem. Typically a user enters a keyword when retrieving video, and a video search engine then finds the relevant video data according to that keyword. This requires that a suitable semantic index be built for the video in order to improve the efficiency and hit rate of the user's search. Semantics-based video index construction uses a computer to automatically analyze the visual features of a video to obtain the semantic information the video contains, and then uses that semantic information as the video's index, so that the user can retrieve the video by entering keywords.
Users' demands on video search engines keep rising, and users often enter keywords of different granularities depending on their needs. When searching for football-related videos, for example, a user may enter keywords of varying granularity such as "football", "highlight", "shooting", or "referee close-up". A traditional single-granularity, flat video semantic index therefore cannot satisfy such search requirements. Moreover, the semantic content of video is rich: besides semantic information there is also a large amount of contextual information. Contextual information can help a search engine understand the interaction between semantics of different granularities and establish relations between semantics of different granularities within a video, so that related videos can be searched through these relations at retrieval time. Contextual information can narrow the search space while preserving the search hit rate, improving search efficiency. On this basis, the present invention realizes a video semantic index that fuses context, so as to improve the effectiveness of video indexing.
Summary of the invention
The object of the invention is to realize a method for building a tree-structured video semantic index that fuses contextual information. The method incorporates contextual information into the video semantic index and improves video retrieval hit rate and efficiency.
The invention adopts the following scheme: a context-fusing tree-structured video semantic index construction method, characterized in that the method comprises the following steps:
Step 1: Input n training video segments video_j, j ∈ {1, …, n}. Preprocess each video_j, then manually annotate, shot by shot, the shot semantic set of each shot of video_j; construct a shot semantic training set for every class of shot semantics and train classifiers, obtaining a shot semantic analyzer. Input the m video segments video_k, k ∈ {1, …, m}, for which tree indexes are to be built; preprocess each video_k and use the shot semantic analyzer to extract the shot semantic set of each shot of video_k.
Step 2: Taking a video segment as the unit, manually annotate the context among the shot semantics of video_j, represent it with a context label tree LT_j carrying context labels, and build a context training set. Train a structural support vector machine (SVM-Struct), obtaining a context label tree analyzer, and use it to extract the context label tree LT_k of video_k.
Step 3: Taking a scene of video_j as the unit, manually annotate scene semantics and build a scene semantic training set. Train a C4.5 classifier, obtaining a scene semantic analyzer, and use it to extract the scene semantics of each scene of video_k.
Step 4: Embed the shot semantic set of each shot of video_k obtained in Step 1 and the scene semantics of each scene of video_k obtained in Step 3 into the corresponding nodes of LT_k obtained in Step 2, and take LT_k, now carrying shot semantics and scene semantics, as the video index of video_k.
Further, step 1 is carried out as follows:
Step 2.1: Perform shot segmentation on the n training video segments video_j, obtaining r training video shots; extract and quantize the visual features of each shot to form a visual feature vector v.
Step 2.2: Define the annotation semantic set Semantic = {Sem_t | t = 1, …, e}. Manually annotate the semantics Sem_t occurring in the r shots, adding them to each shot's shot semantic set; then construct a shot semantic training set for each class of shot semantics Sem_t, obtaining e shot semantic training sets Tra_t = {(v_i, s_i) | i = 1, …, r}, where s_i = 1 if semantics Sem_t appears in shot i and s_i = 0 otherwise.
Step 2.3: Using the SVM classifier as the classification model, train one classifier SVM_t for each semantics Sem_t. The discriminant function of SVM_t has the form f_t(v) = sgn[g(v)], where g(v) = <w, v> + b. The objective of training SVM_t on training set Tra_t is therefore:

  min_{w,b,ξ} (1/2)||w||² + C Σ_{i=1}^{r} ξ_i,  s.t. s_i(<w, v_i> + b) ≥ 1 − ξ_i, ξ_i ≥ 0  (1)

Using the Lagrangian function to merge the optimization problem with its constraints, formula (1) is converted into:

  max_α Σ_{i=1}^{r} α_i − (1/2) Σ_{i=1}^{r} Σ_{h=1}^{r} α_i α_h s_i s_h <v_i, v_h>,  s.t. Σ_i α_i s_i = 0, 0 ≤ α_i ≤ C  (2)

Introducing a kernel function K(v_i, v_h), formula (2) becomes:

  max_α Σ_{i=1}^{r} α_i − (1/2) Σ_{i=1}^{r} Σ_{h=1}^{r} α_i α_h s_i s_h K(v_i, v_h)  (3)

The kernel is chosen as the radial basis function, defined as:

  K(v_i, v_h) = exp(−||v_i − v_h||² / (2σ²))  (4)

where exp(·) is the exponential function and σ is a parameter. After training, a set of α_i is determined, which in turn determines the discriminant function of shot semantics Sem_t:

  f_t(v) = sgn(Σ_{i=1}^{r} α_i s_i K(v_i, v) + b_0)  (5)

where b_0 is a parameter.
Step 2.4: After completing the training of the classifiers SVM_t for all Sem_t according to Step 2.3, the discriminant functions of the e shot semantics are obtained; together they form the shot semantic analyzer group.
Step 2.5: Perform shot segmentation on the m video segments video_k whose tree indexes are to be built, then extract the visual features of each shot to form a feature vector v. Input v into the shot semantic analyzer group to determine which semantics appear in the shot, and add those semantics to the shot's shot semantic set.
Further, step 2 is carried out as follows:
Step 3.1: From the shot semantic set of each shot of video_j, extract one shot semantics to represent the shot, then form the shot semantic sequence wu_j according to temporal order.
Step 3.2: Manually annotate the context of wu_j and represent the contextual information with a context label tree LT_j. A context label tree is a five-tuple LT = <L, Video, Scene, NL, P>, where L is the shot semantic label set, whose elements represent the shot semantics of the shots in wu_j; Video is the "video context" label, whose represented context is that its child nodes jointly express the content of the video segment; Scene is the "scene context" label, whose represented context is that its child nodes jointly express the content of the scene; NL is the set of context labels other than Video and Scene, each element of which represents one kind of context relation; and P is the context rule set, each element of which represents one context rule.
Step 3.3: Assemble the n sequences wu_j and their corresponding context label trees into the context training set Context = {(x_j, y_j) | j = 1, …, n}, where x_j is a shot semantic sequence and y_j is its corresponding context label tree.
Step 3.4: Train a structural support vector machine (SVM-Struct) on the context training set. The concrete operations are:
Step 3.4.1: Construct the mapping function from shot semantic sequences to context label trees:

  f(x) = argmax_y F(x, y; W)

where F(x, y; W) = <W, ψ(x, y)> is the discriminant function, W is the weight vector, and ψ(x, y) is the joint feature vector of a shot semantic sequence in the training data and its corresponding context label tree. ψ(x, y) is constructed as:

  ψ(x, y) = (a_1, a_2, …, a_N)^T

where p_i and a_i (i ∈ [1, N]) are, respectively, a rule in the context rule set P of the context label tree and the number of times that rule occurs, and N is the total number of context rule classes occurring in the context training set.
Step 3.4.2: Training the SVM-Struct is converted into the optimization problem:

  min_{W,ε} (1/2)||W||² + C Σ_{j=1}^{n} ε_j,  s.t. ∀j, ∀y ≠ y_j: <W, ψ(x_j, y_j)> − <W, ψ(x_j, y)> ≥ Δ(y_j, y) − ε_j  (6)

where ε_j is a slack variable, C > 0 is the penalty for misclassified samples, and Δ(y_j, y) is the loss function, chosen as Δ(y_j, y) = 1 − F1(y_j, y). Here y_j is the true context label tree of a shot semantic sequence in the context training set, y is the context label tree predicted during training, and F1 is computed as:

  F1 = 2 · Precision · Recall / (Precision + Recall),  Precision = |E(y_j) ∩ E(y)| / |E(y)|,  Recall = |E(y_j) ∩ E(y)| / |E(y_j)|

where Precision is the accuracy of predicting each node of the context label tree, Recall is the recall of predicting each node, E(y_j) is the edge set of y_j, and E(y) is the edge set of y.
Step 3.4.3: Convert formula (6) into its dual form:

  max_α Σ_{j,y} α_{jy} Δ(y_j, y) − (1/2) Σ_{j,y} Σ_{h,y'} α_{jy} α_{hy'} <δψ_j(y), δψ_h(y')>  (7)

where α_{jy} are Lagrange multipliers and δψ_j(y) = ψ(x_j, y_j) − ψ(x_j, y). For the soft margin there is in addition a set of constraints:

  α_{jy} ≥ 0,  Σ_y α_{jy} ≤ C, ∀j

Step 3.4.4: Solve formula (7) on the context training set Context; once an optimal set of α_{jy} is found, the weight vector W is determined and the context label tree analyzer is obtained.
Step 3.5: Extract the shot semantic sequence wu_k of video_k in the same way as Step 3.1, and input wu_k into the video context label tree analyzer to obtain the LT_k of wu_k.
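A minimal sketch of the joint feature vector of Step 3.4.1: ψ(x, y) simply counts how often each context rule (parent label → child labels) fires in a label tree. The nested-tuple tree encoding and the rule inventory below are illustrative assumptions, not the patent's exact representation.

```python
from collections import Counter

# A context label tree as nested tuples: (label, child, child, ...);
# leaves are plain strings (shot semantic labels).
tree = ("Video",
        ("Scene", ("nl1", "l1", "l2"), "l3"),
        ("Scene", "l4"))

def rules(node):
    """Yield every context rule (parent, children) in the tree (the set P)."""
    if isinstance(node, str):
        return
    head, *kids = node
    labels = [k if isinstance(k, str) else k[0] for k in kids]
    yield (head, tuple(labels))
    for k in kids:
        yield from rules(k)

# The N rule classes observed in the training set (assumed inventory).
rule_inventory = [("Video", ("Scene", "Scene")),
                  ("Scene", ("nl1", "l3")),
                  ("Scene", ("l4",)),
                  ("nl1", ("l1", "l2"))]

counts = Counter(rules(tree))
psi = [counts[p] for p in rule_inventory]   # the vector (a_1, ..., a_N)
print(psi)  # each rule fires once in this tree: [1, 1, 1, 1]
```

The discriminant F(x, y; W) is then just the dot product of W with this count vector.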
Further, step 3 is carried out as follows:
Step 4.1: According to the "scene context" labels Scene in LT_j, take the shots corresponding to the leaf nodes under each Scene label as one complete video scene, realizing scene segmentation of the video; then, scene by scene, manually annotate the scene semantics of video_j.
Step 4.2: Build the scene semantic training set from the shot semantic set of each shot in each scene together with the contextual information in the corresponding LT_j. The features of scene semantics are of two kinds:
A. Shot semantic features: if a certain shot semantics appears in the scene, the value of that shot semantic feature is 1, otherwise 0;
B. Contextual features: a contextual feature is the context relation between two shot semantics. A shot semantics corresponds to a leaf node in LT_j, so the contextual feature value of two shot semantics is the context label at the nearest common ancestor node of the two leaf nodes.
Step 4.3: Using the C4.5 algorithm as the classification model, select attributes as nodes according to the information gain ratio of each feature attribute in the scene semantic training set, finally generating a decision tree that analyzes video scene semantics; take this decision tree as the scene semantic analyzer.
Step 4.4: According to the LT_k of wu_k, divide video_k into scenes with the same method as Step 4.1 and, scene by scene, extract the feature vector of each scene. Input the feature vector of each scene of video_k into the scene semantic analyzer to obtain the scene semantics of each scene of video_k.
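Step 4.3's C4.5 classifier can be approximated with an entropy-criterion decision tree (scikit-learn's `DecisionTreeClassifier` uses information gain rather than C4.5's gain ratio, so this is an approximation, not the patent's exact algorithm); the scene feature vectors and scene semantics below are fabricated placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Each row: binary shot-semantic features plus an encoded contextual feature
# (e.g. the index of the context label at the nearest common ancestor).
X = np.array([[1, 0, 1, 0],   # hypothetical scene feature vectors
              [1, 1, 0, 1],
              [0, 1, 0, 1],
              [0, 0, 1, 0]])
y = np.array(["goal", "foul", "foul", "goal"])  # placeholder scene semantics

# Entropy criterion stands in for C4.5's information-gain-based splitting.
scene_analyzer = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(scene_analyzer.predict([[1, 0, 1, 0]])[0])  # -> "goal"
```

The resulting decision tree plays the role of the scene semantic analyzer applied in Step 4.4.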
Further, step 4 is carried out as follows:
Step 5.1: Replace the shot semantic label in each leaf node of LT_k with the shot semantic set of the shot that leaf represents;
Step 5.2: Replace each Scene node of LT_k with the corresponding scene semantics;
Step 5.3: Take LT_k, now containing shot semantics and scene semantics, as the video index of video_k.
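Steps 5.1–5.3 amount to a tree rewrite: leaves become shot semantic sets and Scene nodes are replaced by their inferred scene semantics. The sketch below uses an assumed nested-tuple encoding (label, child, ...) for the label tree; all node names and semantics are illustrative.

```python
def build_index(node, shot_sets, scene_sems, scene_iter=None):
    """Rewrite a context label tree into a video index:
    leaf 'shotN' -> its shot semantic set (Step 5.1);
    'Scene' node -> its scene semantics (Step 5.2)."""
    if scene_iter is None:
        scene_iter = iter(scene_sems)
    if isinstance(node, str):                 # leaf: a shot label
        return shot_sets[node]
    head, *kids = node
    if head == "Scene":                       # Scenes are consumed in pre-order
        head = next(scene_iter)
    return (head, *(build_index(k, shot_sets, scene_sems, scene_iter)
                    for k in kids))

shot_sets = {"shot1": {"player", "ball"}, "shot2": {"goalpost"},
             "shot3": {"referee"}}
tree = ("Video", ("Scene", "shot1", "shot2"), ("Scene", "shot3"))
index = build_index(tree, shot_sets, ["shooting", "penalty"])
print(index)
```

The rewritten tree is then stored as the video index of video_k (Step 5.3).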
The beneficial effects of the invention are: after a semantic index has been built for a video with the method of the invention, users can enter keywords of different granularities to retrieve the video, and the contextual information in the index narrows the search space and improves the efficiency of the retrieval system.
Brief description of the drawings
Fig. 1 is the flow of building the tree-structured video semantic index.
Fig. 2 is a model of a video context label tree.
Fig. 3 is a tree-structured video index model.
Detailed description of embodiments
Referring to Fig. 1, a context-fusing tree-structured video index construction method first extracts the semantic information of each shot, shot by shot; then obtains the context of the video shot semantics under supervision and represents that context with a tree structure; next infers scene semantics by combining the shot semantics with their context; and finally embeds the shot semantics and scene semantics into the tree structure as the index of the video. The details are as follows:
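The four stages can be summarized as a pipeline skeleton. Every function name here is hypothetical, and the toy stand-ins merely show how the stages compose; the real analyzers are the trained models described below.

```python
def build_video_index(shots, shot_analyzer, tree_analyzer, scene_analyzer):
    """Four-stage pipeline of the method (all analyzers assumed pre-trained)."""
    shot_sets = [shot_analyzer(v) for v in shots]            # stage 1
    sequence = [min(s) if s else "none" for s in shot_sets]  # one label per shot
    label_tree = tree_analyzer(sequence)                     # stage 2
    scene_sems = scene_analyzer(shot_sets, label_tree)       # stage 3
    return label_tree, shot_sets, scene_sems                 # stage 4 embeds these

# Toy stand-ins so the skeleton runs end to end.
toy = build_video_index(
    shots=[{"v": 1}, {"v": 2}],
    shot_analyzer=lambda v: {"ball"} if v["v"] == 1 else {"referee"},
    tree_analyzer=lambda seq: ("Video", ("Scene", *seq)),
    scene_analyzer=lambda sets, tree: ["shooting"],
)
print(toy[0])  # -> ('Video', ('Scene', 'ball', 'referee'))
```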
1. Perform shot segmentation on the n training video segments video_j, obtaining r training video shots. Extract and quantize the visual features of each shot to form a visual feature vector v.
Define the annotation semantic set Semantic = {Sem_t | t = 1, …, e}. Manually annotate the semantics Sem_t occurring in the r shots, adding them to each shot's shot semantic set; then construct a shot semantic training set for each class of shot semantics Sem_t, obtaining e shot semantic training sets Tra_t = {(v_i, s_i) | i = 1, …, r}, where s_i = 1 if semantics Sem_t appears in shot i and s_i = 0 otherwise.
Using the SVM classifier as the classification model, train one classifier SVM_t for each semantics Sem_t. The discriminant function of SVM_t has the form f_t(v) = sgn[g(v)], where g(v) = <w, v> + b. The objective of training SVM_t on training set Tra_t is therefore:

  min_{w,b,ξ} (1/2)||w||² + C Σ_{i=1}^{r} ξ_i,  s.t. s_i(<w, v_i> + b) ≥ 1 − ξ_i, ξ_i ≥ 0  (1)

Using the Lagrangian function to merge the optimization problem with its constraints, formula (1) is converted into:

  max_α Σ_{i=1}^{r} α_i − (1/2) Σ_{i=1}^{r} Σ_{h=1}^{r} α_i α_h s_i s_h <v_i, v_h>,  s.t. Σ_i α_i s_i = 0, 0 ≤ α_i ≤ C  (2)

Introducing a kernel function K(v_i, v_h), formula (2) becomes:

  max_α Σ_{i=1}^{r} α_i − (1/2) Σ_{i=1}^{r} Σ_{h=1}^{r} α_i α_h s_i s_h K(v_i, v_h)  (3)

The kernel is chosen as the radial basis function, defined as:

  K(v_i, v_h) = exp(−||v_i − v_h||² / (2σ²))  (4)

where exp(·) is the exponential function and σ is a parameter. After training, a set of α_i is determined, which in turn determines the discriminant function of shot semantics Sem_t:

  f_t(v) = sgn(Σ_{i=1}^{r} α_i s_i K(v_i, v) + b_0)  (5)

where b_0 is a parameter.
After completing the training of the classifier SVM_t corresponding to each Sem_t, the shot semantic analyzer group comprising e shot semantic analyzers is obtained.
Perform shot segmentation on the m video segments video_k whose tree indexes are to be built, then extract the visual features of each shot to form a feature vector v. Input v into the shot semantic analyzer group to determine which semantics appear in the shot, and add those semantics to the shot's shot semantic set.
2. From the shot semantic set of each shot of video_j, extract one shot semantics to represent the shot, then form the shot semantic sequence wu_j according to temporal order.
Taking a video segment as the unit, manually annotate the context of each training segment's semantic sequence wu_j and represent the contextual information with the corresponding context label tree LT_j. A context label tree is formally defined as the five-tuple LT = <L, Video, Scene, NL, P>. L is the shot semantic label set, whose elements represent the shot semantics of the shots in wu_j. Video is the "video context" label, whose represented context is that its child nodes jointly express the content of the video segment. Scene is the "scene context" label, whose represented context is that its child nodes jointly express the content of the scene. NL is the set of context labels other than Video and Scene, each element of which represents one kind of context relation. P is the context rule set, each element of which represents one context rule. For example, in Fig. 2 the leaf nodes l_1 and l_2 together with their parent node nl_1 form a rule, which can be formally written as: nl_1 → l_1 l_2.
Assemble the n sequences wu_j and their corresponding context label trees into the context training set context = {(x_j, y_j) | j = 1, …, n}, where x_j is a shot semantic sequence and y_j is its corresponding context label tree.
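The rule notation nl_1 → l_1 l_2 is an ordinary production rule, so the rule set P of a tree can be listed mechanically. The nested-tuple encoding of the Fig. 2 fragment below is an assumption for illustration.

```python
def production_rules(node):
    """List each context rule of a label tree as 'parent -> child1 child2 ...'."""
    if isinstance(node, str):       # leaves produce no rules
        return []
    head, *kids = node
    labels = [k if isinstance(k, str) else k[0] for k in kids]
    out = [f"{head} -> {' '.join(labels)}"]
    for k in kids:
        out += production_rules(k)
    return out

# The Fig. 2 fragment described in the text: nl1 is the parent of l1 and l2.
lt = ("Video", ("Scene", ("nl1", "l1", "l2"), "l3"))
print(production_rules(lt))
# -> ['Video -> Scene', 'Scene -> nl1 l3', 'nl1 -> l1 l2']
```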
Train the structural support vector machine SVM-Struct on the context training set. The mapping function from shot semantic sequences to context label trees is constructed as:

  f(x) = argmax_y F(x, y; W)

where F(x, y; W) = <W, ψ(x, y)> is the discriminant function, W is the weight vector, and ψ(x, y) is the joint feature vector of a shot semantic sequence in the training data and its corresponding context label tree. ψ(x, y) is constructed as:

  ψ(x, y) = (a_1, a_2, …, a_N)^T

where p_i and a_i (i ∈ [1, N]) are, respectively, a context rule in the context rule set P of the context label tree and the number of times that rule occurs, and N is the total number of context rule classes occurring in the context training set.
Training the SVM-Struct is converted into the optimization problem:

  min_{W,ε} (1/2)||W||² + C Σ_{j=1}^{n} ε_j,  s.t. ∀j, ∀y ≠ y_j: <W, ψ(x_j, y_j)> − <W, ψ(x_j, y)> ≥ Δ(y_j, y) − ε_j  (6)

where ε_j is a slack variable, C > 0 is the penalty for misclassified samples, and Δ(y_j, y) is the loss function, chosen as Δ(y_j, y) = 1 − F1(y_j, y). Here y_j is the true context label tree of a shot semantic sequence in the context training set, y is the context label tree predicted during training, and F1 is computed as:

  F1 = 2 · Precision · Recall / (Precision + Recall),  Precision = |E(y_j) ∩ E(y)| / |E(y)|,  Recall = |E(y_j) ∩ E(y)| / |E(y_j)|

where Precision is the accuracy of predicting each node of the context label tree, Recall is the recall of predicting each node, E(y_j) is the edge set of y_j, and E(y) is the edge set of y.
Convert formula (6) into its dual form:

  max_α Σ_{j,y} α_{jy} Δ(y_j, y) − (1/2) Σ_{j,y} Σ_{h,y'} α_{jy} α_{hy'} <δψ_j(y), δψ_h(y')>  (7)

where α_{jy} are Lagrange multipliers and δψ_j(y) = ψ(x_j, y_j) − ψ(x_j, y). For the soft margin there is in addition a set of constraints:

  α_{jy} ≥ 0,  Σ_y α_{jy} ≤ C, ∀j

After the penalty value C has been set, solve formula (7) on the context training set context; once an optimal set of α_{jy} is found, the weight vector W is determined and the context label tree analyzer is obtained.
Extract the shot semantic sequence wu_k of video_k and input wu_k into the video context label tree analyzer to obtain the LT_k of wu_k.
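The loss Δ(y_j, y) = 1 − F1 compares the edge sets of the true and predicted trees. A small sketch under the assumed nested-tuple encoding (edges taken as parent-label/child-label pairs; labels assumed distinct for simplicity):

```python
def edges(node, parent=None):
    """Edge set E(y): (parent_label, child_label) pairs of a label tree."""
    label = node if isinstance(node, str) else node[0]
    es = set() if parent is None else {(parent, label)}
    if not isinstance(node, str):
        for k in node[1:]:
            es |= edges(k, label)
    return es

def tree_loss(y_true, y_pred):
    """Delta(y_j, y) = 1 - F1 over the edge sets of the two trees."""
    et, ep = edges(y_true), edges(y_pred)
    inter = len(et & ep)
    if inter == 0:
        return 1.0
    precision, recall = inter / len(ep), inter / len(et)
    return 1.0 - 2 * precision * recall / (precision + recall)

y_true = ("Video", ("Scene", "l1", "l2"))
y_pred = ("Video", ("Scene", "l1", "l3"))
print(tree_loss(y_true, y_true), tree_loss(y_true, y_pred))
```

A perfect prediction incurs zero loss; here the wrong leaf l3 costs 2 of 3 edges matched on each side, giving loss 1 − 2/3 = 1/3.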
3. According to the "scene context" labels Scene in LT_j, take the shots corresponding to the leaf nodes under each Scene label as one complete video scene, realizing scene segmentation of the video. Then, scene by scene, manually annotate the scene semantics of video_j.
Build the scene semantic training set from the shot semantic set of each shot in each scene together with the contextual information in the corresponding LT_j. The features of scene semantics are of two kinds:
A. Shot semantic features: if a certain shot semantics appears in the scene, the value of that shot semantic feature is 1, otherwise 0;
B. Contextual features: a contextual feature is the context relation between two shot semantics. A shot semantics corresponds to a leaf node in LT_j, so the contextual feature value of two shot semantics is the context label at the nearest common ancestor node of the two leaf nodes. For example, in Fig. 2 the contextual feature of l_1 and l_2 is "nl_1", and the contextual feature of l_1 and l_3 is "Scene".
Using the C4.5 algorithm as the classification model, select attributes as nodes according to the information gain ratio of each feature attribute in the scene semantic training set, finally generating a decision tree that analyzes video scene semantics. Take this decision tree as the scene semantic analyzer.
According to the "scene context" labels Scene in the LT_k of wu_k, divide video_k into scenes and, scene by scene, extract the shot semantic features and contextual features of each scene to form its feature vector. Input the feature vector of each scene of video_k into the scene semantic analyzer to obtain the scene semantics of each scene of video_k.
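The contextual feature of point B is the label at the nearest common ancestor of two leaves. A sketch under the assumed nested-tuple encoding, reproducing the Fig. 2 example from the text (leaf labels assumed unique):

```python
def contextual_feature(tree, leaf_a, leaf_b):
    """Context label at the nearest common ancestor of two leaves."""
    def path(node, leaf, acc):
        """Ancestor labels from the root down to (not including) the leaf."""
        if node == leaf:
            return acc
        if isinstance(node, str):
            return None
        for k in node[1:]:
            p = path(k, leaf, acc + [node[0]])
            if p is not None:
                return p
        return None
    pa, pb = path(tree, leaf_a, []), path(tree, leaf_b, [])
    nca = None
    for x, y in zip(pa, pb):    # walk the shared prefix of the two paths
        if x == y:
            nca = x
        else:
            break
    return nca

# Fig. 2 example: l1 and l2 share nl1; l1 and l3 share only Scene.
lt = ("Video", ("Scene", ("nl1", "l1", "l2"), "l3"))
print(contextual_feature(lt, "l1", "l2"),   # -> nl1
      contextual_feature(lt, "l1", "l3"))   # -> Scene
```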
4. Replace the shot semantic label in each leaf node of LT_k with the shot semantic set of the shot that leaf represents; then replace each Scene node of LT_k with the corresponding scene semantics; finally, take LT_k, now containing shot semantics and scene semantics, as the video index of video_k.
The foregoing is only a preferred embodiment of the invention; all equivalent changes and modifications made within the scope of the claims of the present application shall fall within the scope of the invention.
Claims (5)
1. A context-fusing tree-structured video semantic index construction method, characterized in that the method comprises the following steps:
Step 1: input n training video segments video_j, j ∈ {1, …, n}; preprocess each video_j, then manually annotate, shot by shot, the shot semantic set of each shot of video_j; construct a shot semantic training set for every class of shot semantics and train classifiers, obtaining a shot semantic analyzer; input the m video segments video_k, k ∈ {1, …, m}, for which tree indexes are to be built; preprocess each video_k and use the shot semantic analyzer to extract the shot semantic set of each shot of video_k;
Step 2: taking a video segment as the unit, manually annotate the context among the shot semantics of video_j, represent it with a context label tree LT_j carrying context labels, and build a context training set; train a structural support vector machine SVM-Struct, obtaining a context label tree analyzer; use the context analyzer to extract the context label tree LT_k of video_k;
Step 3: taking a scene of video_j as the unit, manually annotate scene semantics and build a scene semantic training set; train a C4.5 classifier, obtaining a scene semantic analyzer; use the scene semantic analyzer to extract the scene semantics of each scene of video_k;
Step 4: embed the shot semantic set of each shot of video_k obtained in Step 1 and the scene semantics of each scene of video_k obtained in Step 3 into the corresponding nodes of LT_k obtained in Step 2, and take LT_k, carrying shot semantics and scene semantics, as the video index of video_k.
2. The context-fusing tree-structured video semantic index construction method according to claim 1, characterized in that said step 1 is carried out as follows:
Step 2.1: perform shot segmentation on the n training video segments video_j, obtaining r training video shots; extract and quantize the visual features of each shot to form a visual feature vector v;
Step 2.2: define the annotation semantic set Semantic = {Sem_t | t = 1, …, e}; manually annotate the semantics Sem_t occurring in the r shots, adding them to each shot's shot semantic set; then construct a shot semantic training set for each class of shot semantics Sem_t, obtaining e shot semantic training sets Tra_t = {(v_i, s_i) | i = 1, …, r}, where s_i = 1 if semantics Sem_t appears in shot i and s_i = 0 otherwise;
Step 2.3: using the SVM classifier as the classification model, train one classifier SVM_t for each semantics Sem_t; the discriminant function of SVM_t has the form f_t(v) = sgn[g(v)], where g(v) = <w, v> + b; the objective of training SVM_t on training set Tra_t is therefore:

  min_{w,b,ξ} (1/2)||w||² + C Σ_{i=1}^{r} ξ_i,  s.t. s_i(<w, v_i> + b) ≥ 1 − ξ_i, ξ_i ≥ 0  (1)

using the Lagrangian function to merge the optimization problem with its constraints, formula (1) is converted into:

  max_α Σ_{i=1}^{r} α_i − (1/2) Σ_{i=1}^{r} Σ_{h=1}^{r} α_i α_h s_i s_h <v_i, v_h>,  s.t. Σ_i α_i s_i = 0, 0 ≤ α_i ≤ C  (2)

introducing a kernel function K(v_i, v_h), formula (2) becomes:

  max_α Σ_{i=1}^{r} α_i − (1/2) Σ_{i=1}^{r} Σ_{h=1}^{r} α_i α_h s_i s_h K(v_i, v_h)  (3)

the kernel is chosen as the radial basis function, defined as:

  K(v_i, v_h) = exp(−||v_i − v_h||² / (2σ²))  (4)

where exp(·) is the exponential function and σ is a parameter; after training, a set of α_i is determined, which in turn determines the discriminant function of shot semantics Sem_t:

  f_t(v) = sgn(Σ_{i=1}^{r} α_i s_i K(v_i, v) + b_0)  (5)

where b_0 is a parameter;
Step 2.4: after completing the training of the classifiers SVM_t for all Sem_t according to Step 2.3, the discriminant functions of the e shot semantics are obtained; together they form the shot semantic analyzer group;
Step 2.5: perform shot segmentation on the m video segments video_k whose tree indexes are to be built, then extract the visual features of each shot to form a feature vector v; input v into the shot semantic analyzer group to determine which semantics appear in the shot, and add those semantics to the shot's shot semantic set.
3. The context-fusing tree-structured video semantic index construction method according to claim 1, characterized in that said step 2 is carried out as follows:
Step 3.1: from the shot semantic set of each shot of video_j, extract one shot semantics to represent the shot, then form the shot semantic sequence wu_j according to temporal order;
Step 3.2: manually annotate the context of wu_j and represent the contextual information with a context label tree LT_j; a context label tree is a five-tuple LT = <L, Video, Scene, NL, P>, where L is the shot semantic label set, whose elements represent the shot semantics of the shots in wu_j; Video is the "video context" label, whose represented context is that its child nodes jointly express the content of the video segment; Scene is the "scene context" label, whose represented context is that its child nodes jointly express the content of the scene; NL is the set of context labels other than Video and Scene, each element of which represents one kind of context relation; P is the context rule set, each element of which represents one context rule;
Step 3.3: assemble the n sequences wu_j and their corresponding context label trees into the context training set context = {(x_j, y_j) | j = 1, …, n}, where x_j is a shot semantic sequence and y_j is its corresponding context label tree;
Step 3.4: train a structural support vector machine SVM-Struct on the context training set; the concrete operations are:
Step 3.4.1: construct the mapping function from shot semantic sequences to context label trees:

  f(x) = argmax_y F(x, y; W)

where F(x, y; W) = <W, ψ(x, y)> is the discriminant function, W is the weight vector, and ψ(x, y) is the joint feature vector of a shot semantic sequence in the training data and its corresponding context label tree; ψ(x, y) is constructed as:

  ψ(x, y) = (a_1, a_2, …, a_N)^T

where p_i and a_i (i ∈ [1, N]) are, respectively, a rule in the context rule set P of the context label tree and the number of times that rule occurs, and N is the total number of context rule classes occurring in the context training set;
Step 3.4.2: training the SVM-Struct is converted into the optimization problem:

  min_{W,ε} (1/2)||W||² + C Σ_{j=1}^{n} ε_j,  s.t. ∀j, ∀y ≠ y_j: <W, ψ(x_j, y_j)> − <W, ψ(x_j, y)> ≥ Δ(y_j, y) − ε_j  (6)

where ε_j is a slack variable, C > 0 is the penalty for misclassified samples, and Δ(y_j, y) is the loss function, chosen as Δ(y_j, y) = 1 − F1(y_j, y); here y_j is the true context label tree of a shot semantic sequence in the context training set, y is the context label tree predicted during training, and F1 is computed as:

  F1 = 2 · Precision · Recall / (Precision + Recall),  Precision = |E(y_j) ∩ E(y)| / |E(y)|,  Recall = |E(y_j) ∩ E(y)| / |E(y_j)|

where Precision is the accuracy of predicting each node of the context label tree, Recall is the recall of predicting each node, E(y_j) is the edge set of y_j, and E(y) is the edge set of y;
Step 3.4.3: convert formula (6) into its dual form:

  max_α Σ_{j,y} α_{jy} Δ(y_j, y) − (1/2) Σ_{j,y} Σ_{h,y'} α_{jy} α_{hy'} <δψ_j(y), δψ_h(y')>  (7)

where α_{jy} are Lagrange multipliers and δψ_j(y) = ψ(x_j, y_j) − ψ(x_j, y); for the soft margin there is in addition a set of constraints:

  α_{jy} ≥ 0,  Σ_y α_{jy} ≤ C, ∀j

Step 3.4.4: solve formula (7) on the context training set context; once an optimal set of α_{jy} is found, the weight vector W is determined and the context label tree analyzer is obtained;
Step 3.5: extract the shot semantic sequence wu_k of video_k in the same way as Step 3.1, and input wu_k into the video context label tree analyzer to obtain the LT_k corresponding to wu_k.
4. The context-fusing tree-structured video semantic index construction method according to claim 1, characterized in that said step 3 is carried out as follows:
Step 4.1: according to the "scene context" labels Scene in LT_j, take the shots corresponding to the leaf nodes under each Scene label as one complete video scene, realizing scene segmentation of the video; then, scene by scene, manually annotate the scene semantics of video_j;
Step 4.2: build the scene semantic training set from the shot semantic set of each shot in each scene together with the contextual information in the corresponding LT_j, the features of scene semantics being of two kinds:
A. shot semantic features: if a certain shot semantics appears in the scene, the value of that shot semantic feature is 1, otherwise 0;
B. contextual features: a contextual feature is the context relation between two shot semantics; a shot semantics corresponds to a leaf node in LT_j, so the contextual feature value of two shot semantics is the context label at the nearest common ancestor node of the two leaf nodes;
Step 4.3: using the C4.5 algorithm as the classification model, select attributes as nodes according to the information gain ratio of each feature attribute in the scene semantic training set, finally generating a decision tree that analyzes video scene semantics, and take this decision tree as the scene semantic analyzer;
Step 4.4: according to the LT_k of wu_k, divide video_k into scenes with the same method as Step 4.1 and, scene by scene, extract the feature vector of each scene; input the feature vector of each scene of video_k into the scene semantic analyzer to obtain the scene semantics of each scene of video_k.
5. The method for creating context-fused tree-like video semantic indexes according to claim 1, characterized in that step 4 is carried out as follows:
Step 5.1: Replace the shot semantic label in each leaf node of LT_k with the shot semantic set of the shot that leaf represents;
Step 5.2: Replace each Scene label in LT_k with the corresponding scene semantics;
Step 5.3: Take the LT_k that now contains shot semantics and scene semantics as the video index of video_k.
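The three steps of claim 5 amount to a single label-rewriting pass over LT_k: each leaf's shot label becomes that shot's semantic set, each "Scene" label becomes the inferred scene semantics, and the rewritten tree is the index. The dict layout and key names below are assumptions for illustration.

```python
# Hedged sketch of claim 5: building the video index by rewriting labels in LT_k.

def build_index(node, shot_semantics, scene_semantics):
    """Rewrite labels in place; the rewritten tree is the video index (step 5.3)."""
    if "shot" in node:                                         # leaf node = one shot
        node["label"] = shot_semantics[node["shot"]]           # step 5.1
    else:
        if node["label"] == "Scene":
            node["label"] = scene_semantics[node["scene_id"]]  # step 5.2
        for child in node["children"]:
            build_index(child, shot_semantics, scene_semantics)
    return node
```

After the pass, a query at shot granularity matches the semantic sets now stored at the leaves, while a scene-level keyword matches the rewritten Scene nodes, so both granularities are searchable in one tree.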
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410297974.0A CN104036023B (en) | 2014-06-26 | 2014-06-26 | Method for creating context fusion tree video semantic indexes |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104036023A true CN104036023A (en) | 2014-09-10 |
CN104036023B CN104036023B (en) | 2017-05-10 |
Family
ID=51466793
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410297974.0A Expired - Fee Related CN104036023B (en) | 2014-06-26 | 2014-06-26 | Method for creating context fusion tree video semantic indexes |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104036023B (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104506947A (en) * | 2014-12-24 | 2015-04-08 | 福州大学 | Video fast forward/fast backward speed self-adaptive regulating method based on semantic content |
CN106878632A (en) * | 2017-02-28 | 2017-06-20 | 北京知慧教育科技有限公司 | A kind for the treatment of method and apparatus of video data |
CN107590442A (en) * | 2017-08-22 | 2018-01-16 | 华中科技大学 | A kind of video semanteme Scene Segmentation based on convolutional neural networks |
CN108027834A (en) * | 2015-09-21 | 2018-05-11 | 高通股份有限公司 | Semantic more sense organ insertions for the video search by text |
CN109344887A (en) * | 2018-09-18 | 2019-02-15 | 山东大学 | Short video classification methods, system and medium based on multi-modal dictionary learning |
CN109685144A (en) * | 2018-12-26 | 2019-04-26 | 上海众源网络有限公司 | The method, apparatus and electronic equipment that a kind of pair of Video Model does to assess |
CN110097094A (en) * | 2019-04-15 | 2019-08-06 | 天津大学 | It is a kind of towards personage interaction multiple semantic fusion lack sample classification method |
CN110275744A (en) * | 2018-03-14 | 2019-09-24 | Tcl集团股份有限公司 | It is a kind of for making the method and system of scalable user interface |
CN110545299A (en) * | 2018-05-29 | 2019-12-06 | 腾讯科技(深圳)有限公司 | content list information acquisition method, content list information providing method, content list information acquisition device, content list information providing device and content list information equipment |
CN110765314A (en) * | 2019-10-21 | 2020-02-07 | 长沙品先信息技术有限公司 | Video semantic structural extraction and labeling method |
CN111435453A (en) * | 2019-01-14 | 2020-07-21 | 中国科学技术大学 | Fine-grained image zero sample identification method |
US20210182558A1 (en) * | 2017-11-10 | 2021-06-17 | Samsung Electronics Co., Ltd. | Apparatus for generating user interest information and method therefor |
CN114302224A (en) * | 2021-12-23 | 2022-04-08 | 新华智云科技有限公司 | Intelligent video editing method, device, equipment and storage medium |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080252727A1 (en) * | 2006-06-16 | 2008-10-16 | Lisa Marie Brown | People searches by multisensor event correlation |
CN103593363A (en) * | 2012-08-15 | 2014-02-19 | 中国科学院声学研究所 | Video content indexing structure building method and video searching method and device |
- 2014-06-26: CN CN201410297974.0A patent CN104036023B (en), not active, Expired - Fee Related
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080252727A1 (en) * | 2006-06-16 | 2008-10-16 | Lisa Marie Brown | People searches by multisensor event correlation |
CN103593363A (en) * | 2012-08-15 | 2014-02-19 | 中国科学院声学研究所 | Video content indexing structure building method and video searching method and device |
Non-Patent Citations (2)
Title |
---|
Chen Danwen et al.: "Co-Concept-Boosting Video Semantic Indexing Method", Journal of Chinese Computer Systems * |
Han Zhiguang et al.: "A New Semantic Index for Video Retrieval", Harmonious Man-Machine Environment 2008 * |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104506947B (en) * | 2014-12-24 | 2017-09-05 | 福州大学 | A kind of video fast forward based on semantic content/rewind speeds self-adapting regulation method |
CN104506947A (en) * | 2014-12-24 | 2015-04-08 | 福州大学 | Video fast forward/fast backward speed self-adaptive regulating method based on semantic content |
CN108027834A (en) * | 2015-09-21 | 2018-05-11 | 高通股份有限公司 | Semantic more sense organ insertions for the video search by text |
CN106878632A (en) * | 2017-02-28 | 2017-06-20 | 北京知慧教育科技有限公司 | A kind for the treatment of method and apparatus of video data |
CN106878632B (en) * | 2017-02-28 | 2020-07-10 | 北京知慧教育科技有限公司 | Video data processing method and device |
CN107590442A (en) * | 2017-08-22 | 2018-01-16 | 华中科技大学 | A kind of video semanteme Scene Segmentation based on convolutional neural networks |
US20210182558A1 (en) * | 2017-11-10 | 2021-06-17 | Samsung Electronics Co., Ltd. | Apparatus for generating user interest information and method therefor |
US11678012B2 (en) * | 2017-11-10 | 2023-06-13 | Samsung Electronics Co., Ltd. | Apparatus and method for user interest information generation |
CN110275744A (en) * | 2018-03-14 | 2019-09-24 | Tcl集团股份有限公司 | It is a kind of for making the method and system of scalable user interface |
CN110275744B (en) * | 2018-03-14 | 2021-11-23 | Tcl科技集团股份有限公司 | Method and system for making scalable user interface |
CN110545299B (en) * | 2018-05-29 | 2022-04-05 | 腾讯科技(深圳)有限公司 | Content list information acquisition method, content list information providing method, content list information acquisition device, content list information providing device and content list information equipment |
CN110545299A (en) * | 2018-05-29 | 2019-12-06 | 腾讯科技(深圳)有限公司 | content list information acquisition method, content list information providing method, content list information acquisition device, content list information providing device and content list information equipment |
CN109344887A (en) * | 2018-09-18 | 2019-02-15 | 山东大学 | Short video classification methods, system and medium based on multi-modal dictionary learning |
CN109685144A (en) * | 2018-12-26 | 2019-04-26 | 上海众源网络有限公司 | The method, apparatus and electronic equipment that a kind of pair of Video Model does to assess |
CN111435453A (en) * | 2019-01-14 | 2020-07-21 | 中国科学技术大学 | Fine-grained image zero sample identification method |
CN111435453B (en) * | 2019-01-14 | 2022-07-22 | 中国科学技术大学 | Fine-grained image zero sample identification method |
CN110097094A (en) * | 2019-04-15 | 2019-08-06 | 天津大学 | It is a kind of towards personage interaction multiple semantic fusion lack sample classification method |
CN110097094B (en) * | 2019-04-15 | 2023-06-13 | 天津大学 | Multiple semantic fusion few-sample classification method for character interaction |
CN110765314A (en) * | 2019-10-21 | 2020-02-07 | 长沙品先信息技术有限公司 | Video semantic structural extraction and labeling method |
CN114302224A (en) * | 2021-12-23 | 2022-04-08 | 新华智云科技有限公司 | Intelligent video editing method, device, equipment and storage medium |
CN114302224B (en) * | 2021-12-23 | 2023-04-07 | 新华智云科技有限公司 | Intelligent video editing method, device, equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN104036023B (en) | 2017-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104036023A (en) | Method for creating context fusion tree video semantic indexes | |
Chang et al. | Semantic pooling for complex event analysis in untrimmed videos | |
Duan et al. | Exploiting web images for event recognition in consumer videos: A multiple source domain adaptation approach | |
Habibian et al. | Videostory: A new multimedia embedding for few-example recognition and translation of events | |
US20170357878A1 (en) | Multi-dimensional realization of visual content of an image collection | |
Garcia et al. | Context-aware embeddings for automatic art analysis | |
US20180293313A1 (en) | Video content retrieval system | |
CN102799684B (en) | The index of a kind of video and audio file cataloguing, metadata store index and searching method | |
Zhou et al. | Conceptlearner: Discovering visual concepts from weakly labeled image collections | |
Dal Bianco et al. | A practical and effective sampling selection strategy for large scale deduplication | |
CN102890700A (en) | Method for retrieving similar video clips based on sports competition videos | |
CN103425757A (en) | Cross-medial personage news searching method and system capable of fusing multi-mode information | |
Zhang et al. | Enhancing video event recognition using automatically constructed semantic-visual knowledge base | |
CN104391924A (en) | Mixed audio and video search method and system | |
CN105678244B (en) | A kind of near video search method based on improved edit-distance | |
CN107515934A (en) | A kind of film semanteme personalized labels optimization method based on big data | |
CN106649663A (en) | Video copy detection method based on compact video representation | |
CN107291895A (en) | A kind of quick stratification document searching method | |
CN103617263A (en) | Television advertisement film automatic detection method based on multi-mode characteristics | |
CN103761286B (en) | A kind of Service Source search method based on user interest | |
CN103778206A (en) | Method for providing network service resources | |
CN108241713A (en) | A kind of inverted index search method based on polynary cutting | |
CN107818183A (en) | A kind of Party building video pushing method based on three stage combination recommended technologies | |
CN110968721A (en) | Method and system for searching infringement of mass images and computer readable storage medium thereof | |
CN104657376A (en) | Searching method and searching device for video programs based on program relationship |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | | Granted publication date: 20170510; Termination date: 20200626 |