CN101968797A - Inter-lens context-based video concept labeling method

Inter-lens context-based video concept labeling method

Info

Publication number
CN101968797A
CN101968797A (application CN 201010279086)
Authority
CN
China
Prior art keywords
concept
shot
video
target concept
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201010279086
Other languages
Chinese (zh)
Inventor
易剑 (Yi Jian)
彭宇新 (Peng Yuxin)
肖建国 (Xiao Jianguo)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN 201010279086
Publication of CN101968797A

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides an inter-shot context-based video concept annotation method, which comprises: training a detector for each concept annotated in a training set, and inputting the shots of the training set and the test set into the concept detectors, which output the prediction probability that each shot contains the corresponding concept; designating a target concept, dividing the shots into different types according to the relation between the concepts contained in the training-set shots and the target concept, and extracting type features; building a model from the relation between the type features and type labels of the training-set video shots and from the relations between the type labels of adjacent shots; extracting the type features of the test-set video shots, inputting them into the model, obtaining the probability that a shot is labeled as a certain type, and using this probability as the optimized prediction probability of the target concept; and optimizing all concepts in turn, and annotating a shot with a concept if its optimized prediction probability is greater than a given threshold. Since the method fully considers the different relations between the shots in a video, it can optimize the concept annotation results more effectively.

Description

Video concept annotation method based on inter-shot context
Technical field
The invention belongs to the technical field of video content analysis and understanding, and specifically relates to a video concept annotation method based on inter-shot context.
Background technology
In recent years, with the rapid development of Internet and multimedia technology, massive amounts of video content have appeared on the network. Faced with so much video information, how to use computers to manage it effectively, so that users can quickly and accurately retrieve the content they want, has become a key problem urgently awaiting a solution. However, there is a difference between the video that humans understand and the video that computers understand: humans understand video at the semantic level, while computers can only extract low-level features from video content, such as color, texture, shape and sound, and cannot understand the semantics. A wide gap therefore exists between human understanding and computer understanding, called the semantic gap, and it has become a major obstacle to the automatic analysis and management of video content by computers. To narrow the semantic gap and help computers better understand and manage video content, researchers have proposed video concept annotation technology. This approach first defines a semantic concept dictionary whose semantic concepts cover, as broadly as possible, the content that may appear in video; for a given video, appropriate concepts can then be chosen to describe its main content. Once the semantic concept dictionary is defined, video concept annotation mainly studies how to establish the mapping between videos and semantic concepts: each video is mapped to the corresponding semantic concepts, and these concepts are used to describe the video content, which effectively narrows the semantic gap. Fig. 1 gives an example; its concepts are taken from the semantic concept dictionary defined in the document "LSCOM lexicon definitions and annotations version 1.0, DTO challenge workshop on large scale concept ontology for multimedia" (a 2006 Columbia University technical report). It can be seen that the semantic concepts contained in a video can describe its main content. When retrieving videos, the corresponding concepts can be used to index them, thereby achieving effective management of massive video collections. Video concept annotation technology therefore has important research and application value.
Existing video concept annotation technology can be divided into two classes. The first class directly establishes the mapping between low-level video features and semantic concepts. For example, in the 2007 Columbia University technical report "Columbia university's baseline detectors for 374 LSCOM semantic visual concepts", A. Yanagawa, S.F. Chang et al. extract three kinds of low-level video features and, for each concept, train three detectors based on these three features respectively; when annotating a shot, each of the three detectors outputs a prediction probability, and the mean of the three probability values is taken as the probability that the shot contains the corresponding concept. The second class, building on the first, further optimizes the concept annotation results using the semantic relations between adjacent video shots. In general, focusing on a given concept, called the target concept, we divide the relations between the shots in a video into four classes: consistency, positive correlation, negative correlation and irrelevance. Consistency means that if a shot contains the target concept, the probability that its adjacent shots contain the target concept increases. Positive correlation means that if a shot contains a concept positively correlated with the target concept, the probability that its adjacent shots contain the target concept increases. Negative correlation means that if a shot contains a concept negatively correlated with the target concept, the probability that its adjacent shots contain the target concept decreases. Irrelevance means that the concepts contained in a shot are unrelated to the target concept, so the shot has no influence on whether its adjacent shots contain the target concept. At the ACM Multimedia Conference in 2008, M.F. Weng and Y.Y. Chuang, in the document "Multi-cue fusion for semantic video indexing", used the semantic consistency of adjacent video shots to optimize the semantic concept annotation results. However, that method considers only the semantic consistency of shot sequences and ignores the other relations, such as positive and negative correlation, so there is room for further improvement.
Summary of the invention
In view of the deficiencies of the prior art, the present invention proposes a video concept annotation method based on inter-shot context, which annotates video shots with the relevant semantic concepts so as to describe the video content. The method fully considers the various relations between the shots in a video, including consistency, positive correlation, negative correlation and irrelevance, and can further optimize the video concept annotation results, thereby aiding the understanding and management of video content.
To achieve the above purpose, the technical scheme of the present invention is as follows:
A video concept annotation method based on inter-shot context, comprising the following steps:
1) given a video training set with manual annotation results, training a concept detector for each annotated semantic concept, inputting the video shots of the training set and of the test set to be annotated into the concept detectors, and outputting for each shot the prediction probability that it contains the corresponding concept;
2) taking a given semantic concept as the target concept, classifying the shots into different types according to the relation between the concepts contained in the training-set video shots and the target concept, and extracting shot type features;
3) based on the shot classification and the type features extracted in step 2), modeling the mutual relations between the shots in the training-set videos, the constructed relational model expressing the relation between a shot's type feature and its type label as well as the relations between the type labels of adjacent shots; extracting the type features of the test-set video shots, inputting them into the relational model, obtaining the probability that a shot is labeled as a certain type, and taking this probability as the optimized prediction probability of the target concept;
4) repeating step 2) and step 3) until all concepts of the test set have been optimized as the target concept, and then, if the probability that a shot contains a concept is greater than a given threshold, annotating the shot with that concept.
In step 1), the training set is divided into positive and negative samples according to the manual annotation results, and a classifier is trained by a machine learning method; this classifier serves as the detector for the corresponding concept.
In step 2), the shots are divided into four types: target-concept shots, positively correlated concept shots, negatively correlated concept shots, and irrelevant-concept shots.
The four classes of shots are defined as follows: a target-concept shot contains the target concept; a positively correlated concept shot does not contain the target concept but contains a concept positively correlated with it; a negatively correlated concept shot contains neither the target concept nor a concept positively correlated with it, but contains a concept negatively correlated with it; an irrelevant-concept shot contains neither the target concept nor any concept positively or negatively correlated with it.
Given a target concept, a concept being positively correlated with the target concept means that the two are semantically associated with each other; a concept being negatively correlated with the target concept means that the two are semantically mutually exclusive.
In step 2), the type features are extracted from the prediction probabilities output by the concept detectors and express the probabilities of a shot being classified into the different types.
In step 3), the shots belonging to one video in the training set are arranged in temporal order to obtain a shot sequence, and the inter-shot relational model is trained on the shot sequences.
The given threshold value of described step 3) is 0.5.
The effect of the present invention is as follows: compared with existing methods, the present invention obtains more accurate video concept annotation results, thereby giving full play to the important role of concept annotation in video content analysis and understanding. The reason the present invention achieves this effect is that it fully considers the different relations between the shots in a video, including consistency, positive correlation, negative correlation and irrelevance, and can therefore optimize the concept annotation results more effectively.
Description of drawings
Fig. 1 is a schematic diagram of video concept annotation;
Fig. 2 is a flow chart of the method of the present invention.
Embodiment
The present invention is described in further detail below with reference to the drawings and a specific embodiment.
As shown in Fig. 2, the video concept annotation method based on inter-shot context of the present invention comprises the following steps:
(1) Training a concept detector for each semantic concept in the training set
The training set is divided into positive and negative samples according to the manual annotation results, and a classifier is trained by a machine learning method; this classifier serves as the detector for the corresponding concept. The present embodiment adopts the method proposed in the 2007 Columbia University technical report "Columbia university's baseline detectors for 374 LSCOM semantic visual concepts" (A. Yanagawa and S.F. Chang) to obtain the concept detectors. This method first extracts three kinds of low-level features from each video shot, namely color moment features, Gabor texture features, and edge histogram features, and then trains three detectors based on these three features respectively. For a shot to be processed, each of the three detectors outputs a prediction probability, and the mean of the three prediction probabilities is taken as the probability that the shot contains the corresponding concept. In this method, the detector for each concept therefore consists of three sub-detectors.
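For illustration, the following is a minimal Python sketch of this baseline's fusion step, assuming SVM classifiers with probability outputs (the cited baseline uses support vector machines, as noted in the experiments section); the function names, feature representations and the choice of scikit-learn are illustrative assumptions, not the cited report's actual code:

```python
import numpy as np
from sklearn.svm import SVC

# One detector per low-level feature; probability=True enables probability outputs.
FEATURES = ["color_moments", "gabor_texture", "edge_histogram"]

def train_concept_detector(feats_by_type, labels):
    """Train three sub-detectors (one per feature type) for a single concept.
    feats_by_type: dict mapping feature name -> (n_shots, d) feature array
    labels: (n_shots,) array with 1 for positive samples, 0 for negative."""
    return {name: SVC(probability=True).fit(feats_by_type[name], labels)
            for name in FEATURES}

def predict_concept_prob(detector, feats_by_type):
    """Probability that each shot contains the concept: the mean of the
    three sub-detector outputs."""
    probs = [detector[name].predict_proba(feats_by_type[name])[:, 1]
             for name in FEATURES]
    return np.mean(probs, axis=0)
```

Here predict_concept_prob produces the prediction probabilities P_i^j used in the following steps.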
Symbol definitions: to clearly explain the method used in the present invention, the symbols used hereinafter are first described.
The shots belonging to the same video are denoted S = {s_1, s_2, ..., s_n}, where the shot subscripts are assigned according to the temporal order of the shots in the video: shot s_{i-1} precedes shot s_i, and shot s_{i+1} follows shot s_i; n is the number of shots in the video.
L = {l_1, l_2, ..., l_n} denotes the manual annotations of the shots in S, where l_i^j is the manual annotation of shot s_i for concept c_j in the training set: l_i^j = 1 indicates that shot s_i contains concept c_j, and l_i^j = 0 indicates that it does not; m is the number of concepts in the semantic dictionary. L^train denotes the manual concept annotations of the training set.
P = {P^1, P^2, ..., P^m} denotes the prediction probabilities of the concept detectors on the shots in S, where P^j is the prediction probability output by the detector for concept c_j, and P_i^j is the prediction probability of the detector for concept c_j on shot s_i. P^train and P^test denote the prediction probabilities output by the concept detectors on the shots of the training set and the test set, respectively.
Y = {y^1, y^2, ..., y^m} denotes the types of the shots in S, where y^j is the result of classifying the shots according to the relation between the concepts they contain and c_j when c_j is the target concept, and y_i^j is the type of shot s_i. Y^train denotes the type labels of the training set.
X = {x^1, x^2, ..., x^m} denotes the type features of the shots in S, where x^j is the type feature extracted from the shots when c_j is the target concept, and x_i^j is the type feature of shot s_i. X^train and X^test denote the type features of the shots in the training set and the test set, respectively.
P̂ = {P̂^1, P̂^2, ..., P̂^m} denotes the prediction probabilities of the shots in S after optimization with the inter-shot relational model, where P̂^j is the optimized prediction probability for concept c_j, and P̂_i^j is the optimized prediction probability for concept c_j on shot s_i. P̂^test denotes the optimized prediction probabilities of the shots in the test set.
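As a point of reference for the sketches in this section, the notation above can be held in plain numpy arrays; this layout is an illustrative convention, not part of the patent:

```python
import numpy as np

n, m = 120, 374            # n shots in one video, m concepts in the dictionary
L = np.zeros((n, m), int)  # manual labels: L[i, j] = l_i^j, 1 if shot s_i contains c_j
P = np.zeros((n, m))       # detector outputs: P[i, j] = P_i^j in [0, 1]
P_hat = np.empty((n, m))   # optimized probabilities: P_hat[i, j] = P̂_i^j
```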
(2) For a given semantic concept c_j, referred to as the target concept, the shots in the training set are classified according to the relation between the concepts contained in each shot and the target concept.
We divide the shots in the training set into four classes: target-concept shots, denoted T_consist^j; positively correlated concept shots, denoted T_pos^j; negatively correlated concept shots, denoted T_neg^j; and irrelevant-concept shots, denoted T_irrel^j. The four classes are defined as follows: a target-concept shot contains concept c_j; a positively correlated concept shot does not contain c_j but contains a concept positively correlated with c_j; a negatively correlated concept shot contains neither c_j nor a concept positively correlated with it, but contains a concept negatively correlated with c_j; an irrelevant-concept shot contains neither c_j nor any concept positively or negatively correlated with c_j. These four types together are denoted T^j = {T_consist^j, T_pos^j, T_neg^j, T_irrel^j}.
(3) Focusing on the target concept c_j, the type features of the shots in the training set and the test set are extracted. In the present embodiment, we extract the type features from the prediction probabilities of the concept detectors on the video shots; other methods could also be used, for example extracting type features from the low-level features of the video shots.
The type feature of shot s_i is a four-dimensional feature vector, denoted x_i^j = (x_{i,consist}^j, x_{i,pos}^j, x_{i,neg}^j, x_{i,irrel}^j), whose components are defined as follows:

Formula 1: x_{i,consist}^j = P_i^j

Formula 2: x_{i,pos}^j = (1 − P_i^j) · (1 − ∏_{c_h∈POS_j} (1 − P_i^h))

Formula 3: x_{i,neg}^j = (1 − P_i^j) · ∏_{c_h∈POS_j} (1 − P_i^h) · (1 − ∏_{c_h∈NEG_j} (1 − P_i^h))

Formula 4: x_{i,irrel}^j = (1 − P_i^j) · ∏_{c_h∈POS_j} (1 − P_i^h) · ∏_{c_h∈NEG_j} (1 − P_i^h)

where P_i^j is the output of the detector for concept c_j on shot s_i, i.e. the prediction probability that shot s_i contains concept c_j; POS_j is the set of concepts positively correlated with c_j, and NEG_j is the set of concepts negatively correlated with c_j. A concept positively correlated with c_j usually appears in the same shot as c_j; a concept negatively correlated with c_j essentially never appears in the same shot as c_j. Formulas 5 and 6 define POS_j and NEG_j: a concept c_h satisfying formula 5 belongs to POS_j, and a concept c_h satisfying formula 6 belongs to NEG_j.
Formula 5: NormMI_pos(c_h, c_j) > Th_pos^j

Formula 6: NormMI_neg(c_h, c_j) > Th_neg^j

where the values NormMI_pos(c_h, c_j) and NormMI_neg(c_h, c_j) are given by formulas 7 to 11: the larger NormMI_pos(c_h, c_j) is, the stronger the positive correlation between c_h and c_j; the larger NormMI_neg(c_h, c_j) is, the stronger the negative correlation. Th_pos^j and Th_neg^j are thresholds. In the present embodiment, Th_pos^j is taken to be the sixth-largest value among all NormMI_pos(c_h, c_j) with h ≠ j, so that exactly five concepts c_h satisfy formula 5; Th_neg^j is chosen in the same way.

Formula 7: NormMI_pos(c_h, c_j) = MI_pos(c_h, c_j) / min{H(c_h), H(c_j)}

Formula 8: NormMI_neg(c_h, c_j) = MI_neg(c_h, c_j) / min{H(c_h), H(c_j)}

Formulas 9 and 10 define MI_pos(c_h, c_j) and MI_neg(c_h, c_j), the mutual-information measures of the positive and negative correlation between c_h and c_j, respectively (both formulas are given only as images in the original).

Formula 11: H(c_j) = −∑_{l_j∈{0,1}} p(l_j) log p(l_j)

where l_j = 1 indicates that concept c_j appears in a shot and l_j = 0 indicates that it does not, and p(l_j) is the prior probability of whether concept c_j appears in a shot, which can be obtained by counting the occurrences of c_j in the training set.
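A minimal Python/numpy sketch of formulas 1 to 6 follows; it assumes the NormMI_pos and NormMI_neg matrices have already been computed via formulas 7 to 11, and all function and variable names are illustrative:

```python
import numpy as np

def correlated_sets(norm_mi_pos, norm_mi_neg, j, k=5):
    """Select POS_j and NEG_j (formulas 5 and 6): the sixth-largest-value
    threshold in the text is equivalent to keeping the top k=5 concepts.
    norm_mi_pos / norm_mi_neg: (m, m) arrays of NormMI values indexed [h, j]."""
    order_pos = np.argsort(norm_mi_pos[:, j])[::-1]   # descending by NormMI_pos
    order_neg = np.argsort(norm_mi_neg[:, j])[::-1]   # descending by NormMI_neg
    pos = [h for h in order_pos if h != j][:k]
    neg = [h for h in order_neg if h != j][:k]
    return pos, neg

def type_features(P, j, pos_j, neg_j):
    """Four-dimensional type feature of every shot for target concept c_j
    (formulas 1 to 4). P: (n, m) array of detector probabilities P_i^h."""
    p_j = P[:, j]
    prod_pos = np.prod(1.0 - P[:, pos_j], axis=1)     # ∏ over POS_j
    prod_neg = np.prod(1.0 - P[:, neg_j], axis=1)     # ∏ over NEG_j
    x_consist = p_j
    x_pos = (1.0 - p_j) * (1.0 - prod_pos)
    x_neg = (1.0 - p_j) * prod_pos * (1.0 - prod_neg)
    x_irrel = (1.0 - p_j) * prod_pos * prod_neg
    return np.stack([x_consist, x_pos, x_neg, x_irrel], axis=1)   # (n, 4)
```

By construction the four components of each feature vector sum to 1, so x_i^j can be read as a distribution of shot s_i over the four shot types, consistent with the statement that the type features express the probabilities of a shot being classified into the different types.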
(4) Optimizing the target-concept annotation results
The present invention adopts a conditional random field (CRF) to model the mutual relations between the shots in a video; other methods, for example Markov random fields, could also be used for this modeling. The conditional random field was proposed by J. Lafferty et al. in the document "Conditional random fields: Probabilistic models for segmenting and labeling sequence data" (International Conference on Machine Learning, 2001) and is used to solve sequence labeling problems. In a conditional random field, the type label y_i of node i depends not only on the type feature of that node but also on the labels of its neighboring nodes. In the present invention, the shots belonging to the same video are arranged into a shot sequence in temporal order, and each shot is to be labeled as one of the four types T_consist^j, T_pos^j, T_neg^j and T_irrel^j; the shot labeling problem can thus be regarded as a sequence labeling problem and solved well with a conditional random field. The probability that a shot is labeled as type T_consist^j is then regarded as the probability that the shot contains concept c_j and can serve as the optimized prediction probability for c_j. Since the dependencies between shots have been taken into account, the optimized probability values are more accurate. The definition of the conditional random field is given below:
Let X and Y be two random vectors, and let G = (V, E) be an undirected graph in which each node v ∈ V corresponds to a random variable y_v, so that the variables in Y correspond one-to-one to the nodes of G. When the conditional distribution of each variable y_v given X satisfies the Markov property expressed by the graph G, we call ⟨X, Y⟩ a conditional random field, i.e. P(y_v | X, y_w, w ≠ v) = P(y_v | X, y_w, w ∼ v), where w ∼ v means that nodes w and v are neighbors in G.
In practical problems, X usually represents the type features of the nodes and Y their type labels. As described in the document "Conditional random fields: Probabilistic models for segmenting and labeling sequence data" (J. Lafferty, A. McCallum and F. Pereira, International Conference on Machine Learning, 2001), given X, the conditional probability of Y is given by formula 12:

Formula 12: P(Y|X) = (1/Z) exp( ∑_{e∈E} I(e, Y|_e, X, λ) + ∑_{v∈V} A(v, Y|_v, X, μ) )

where D = {e, v} is a subgraph of G and Y|_D denotes the random variables in Y associated with the subgraph D; A is the association potential of the conditional random field, which models the relation between a node's type feature and its type label; I is the interaction potential, which models the relations between the type labels of different nodes; and Z is a normalization term that keeps the value of P(Y|X) between 0 and 1. For a particular problem, once the association and interaction potentials are defined, labeling with a conditional random field proceeds in two stages, training and prediction: the training stage learns the parameters θ = (λ, μ); once θ is obtained, the goal of the prediction stage is to find the label sequence Y* that maximizes P(Y|X), as shown in formula 13:

Formula 13: Y* = argmax_Y P(Y | X, θ)

For the details of learning and prediction with conditional random fields, see the document "Conditional random fields: Probabilistic models for segmenting and labeling sequence data" (J. Lafferty, A. McCallum and F. Pereira, International Conference on Machine Learning, 2001).
In the present invention, the two potential functions of the conditional random field, i.e. the association potential (Association Potential) and the interaction potential (Interactive Potential), are determined as follows:
The association potential expresses the relation between a shot's type feature and its type label. For target concept c_j, it is defined as:

Formula 14: A(s_i, y^j, x^j) =
x_{i,consist}^j, if y_i^j = T_consist^j;
x_{i,pos}^j, if y_i^j = T_pos^j;
x_{i,neg}^j, if y_i^j = T_neg^j;
x_{i,irrel}^j, if y_i^j = T_irrel^j.

As defined above, y_i^j is the type label of shot s_i when c_j is the target concept, y^j = (y_1^j, ..., y_n^j) are the type labels of the n shots in the video, x^j = (x_1^j, ..., x_n^j) are the type features of the n shots, and x_i^j is the type feature of shot s_i.
The interaction potential expresses the mutual relations between the type labels of adjacent shots. For target concept c_j, it is defined as:

Formula 15: I(y_i^j, y_{i+1}^j, x^j) = ∑_{t',t∈T^j} λ_{t',t} δ(y_i^j = t') δ(y_{i+1}^j = t)

where δ(·) is the indicator function, whose value is 1 when the condition in parentheses is true and 0 otherwise, and T^j = {T_consist^j, T_pos^j, T_neg^j, T_irrel^j} is the set of shot types defined for target concept c_j.
After the association and interaction potentials have been defined, the inter-shot relational model M_j can be learned from the training set. For a video to be annotated, the type features of its shots are extracted by the method of step (3) and input into the model M_j, yielding the marginal probability that each shot is labeled as type T_consist^j, which serves as the optimized prediction probability for concept c_j. If the optimized probability is greater than the threshold 0.5, the shot is annotated with concept c_j.
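The following sketch illustrates the prediction stage on one shot sequence. It is a minimal sketch, assuming the learned parameters θ = (λ, μ) reduce to one association weight per type (μ_t, scaling the node score of formula 14) plus the transition weights λ_{t',t} of formula 15; parameter learning (e.g. by maximizing the conditional log-likelihood) is omitted, and all names are illustrative assumptions:

```python
import numpy as np
from scipy.special import logsumexp

TYPES = ["T_consist", "T_pos", "T_neg", "T_irrel"]
CONSIST = 0  # index of type T_consist^j

def crf_marginals(x, mu, lam):
    """Forward-backward marginals P(y_i = t | x) for one shot sequence,
    with node log-potentials mu[t] * x[i, t] (formula 14) and transition
    log-potentials lam[t', t] (formula 15).

    x   : (n, 4) type features of the n shots (formulas 1 to 4)
    mu  : (4,)   association weights
    lam : (4, 4) interaction weights lambda_{t',t}
    """
    n = x.shape[0]
    node = mu * x                                   # log node potentials, (n, 4)
    alpha = np.zeros((n, len(TYPES)))               # forward messages (log-space)
    alpha[0] = node[0]
    for i in range(1, n):
        alpha[i] = logsumexp(alpha[i - 1][:, None] + lam, axis=0) + node[i]
    beta = np.zeros((n, len(TYPES)))                # backward messages (log-space)
    for i in range(n - 2, -1, -1):
        beta[i] = logsumexp(lam + node[i + 1] + beta[i + 1], axis=1)
    log_marg = alpha + beta
    log_marg -= logsumexp(log_marg, axis=1, keepdims=True)  # normalize per shot
    return np.exp(log_marg)                         # (n, 4) marginal probabilities
```

Under these assumptions, the optimized probability P̂_i^j is crf_marginals(x^j, mu, lam)[:, CONSIST], and a shot is annotated with c_j when this value exceeds the 0.5 threshold.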
The experimental results below show that, compared with existing methods, the present invention obtains more accurate concept annotation results, thereby giving full play to the important role of video concept annotation in video content analysis and understanding.
The present embodiment was evaluated on the TRECVID 2006 data set; TRECVID is an authoritative evaluation activity in the video retrieval field organized by the U.S. National Institute of Standards and Technology. The data set is divided into two parts: a training set and a test set. The training set contains 85 hours of news video, covering Arabic, Chinese and English news. These news programs consist of 137 videos, segmented into 43,907 shots. The document "Columbia university's baseline detectors for 374 LSCOM semantic visual concepts" (A. Yanagawa, S.F. Chang et al., a Columbia University technical report) manually annotated the shots in this training set, labeling each shot with its corresponding concepts, and the present embodiment adopts these data as the training-set annotations. The test set contains 259 videos, segmented into 79,484 shots. After processing these shots with the different concept annotation methods, we evaluate on the 20 concepts officially adopted by TRECVID 2006. The following four methods were tested and compared:
I. The method in the document "Columbia university's baseline detectors for 374 LSCOM semantic visual concepts" (A. Yanagawa and S.F. Chang, Columbia University technical report, 2007), which adopts support vector machines as classifiers and directly establishes the mapping between low-level video features and semantic concepts;
II. The method in the document "Association and temporal rule mining for post-processing of semantic concept detection in video" (K.H. Liu and M.F. Weng, IEEE Transactions on Multimedia, 2008), which, on the basis of method I, optimizes the concept annotation results with a rule-based approach;
III. The method in the document "Multi-cue fusion for semantic video indexing" (M.F. Weng and Y.Y. Chuang, ACM Multimedia Conference, 2008), which, on the basis of method I, uses the semantic consistency between adjacent shots in a video to optimize the concept prediction probabilities;
IV. The method of the present embodiment.
Document " Estimating average precision with incomplete and imperfectjudgments " is adopted in experiment, and (author is E.Yilmaz and J.A.Aslam, be published on the ACM Conference onInformation and Knowledge Management in 2006) in the InfAP that proposes as evaluation metrics, the value of InfAP is big more, illustrates that the result of notion mark is just good more.
Table 1. Comparison of experimental results

Method                          InfAP
Method I                        0.0948
Method II                       0.1108
Method III                      0.1112
Method IV (present invention)   0.1252
As can be seen from Table 1, method IV of the present invention obtains the best concept annotation results. Comparing method I with method IV: method I does not use the inter-shot context to optimize the annotation results, while method IV exploits the mutual relations between shots, which improves annotation performance. Comparing methods II and III with method IV: method II is rule-based, and the rules that can be summarized are always limited; method III considers only the consistency between adjacent shots and ignores the other inter-shot relations; both therefore leave room for further improvement. The present invention uses a conditional random field to model the mutual relations between video shots, including consistency, positive correlation and negative correlation, and thus greatly improves the accuracy of concept annotation.
Obviously, those skilled in the art can make various changes and modifications to the present invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass them.

Claims (8)

1. A video concept annotation method based on inter-shot context, comprising the following steps:
1) given a video training set with manual annotation results, training a concept detector for each annotated semantic concept, inputting the video shots of the training set and of the test set to be annotated into the concept detectors, and outputting for each shot the prediction probability that it contains the corresponding concept;
2) taking a given semantic concept as the target concept, classifying the shots into different types according to the relation between the concepts contained in the training-set video shots and the target concept, and extracting shot type features;
3) based on the shot classification and the type features extracted in step 2), modeling the mutual relations between the shots in the training-set videos, the constructed relational model expressing the relation between a shot's type feature and its type label as well as the relations between the type labels of adjacent shots; extracting the type features of the test-set video shots, inputting them into the relational model, obtaining the probability that a shot is labeled as a certain type, and taking this probability as the optimized prediction probability of the target concept;
4) repeating step 2) and step 3) until all concepts of the test set have been optimized as the target concept, and then, if the probability that a shot contains a concept is greater than a given threshold, annotating the shot with that concept.
2. The video concept annotation method of claim 1, wherein in step 1) the training set is divided into positive and negative samples according to the manual annotation results, a classifier is then trained by a machine learning method, and this classifier serves as the detector for the corresponding concept.
3. The video concept annotation method of claim 1, wherein in step 2) the shots are divided into four types: target-concept shots, positively correlated concept shots, negatively correlated concept shots, and irrelevant-concept shots.
4. The video concept annotation method of claim 3, wherein, given a target concept, the four classes of shots are defined as follows: a target-concept shot contains the target concept; a positively correlated concept shot does not contain the target concept but contains a concept positively correlated with it; a negatively correlated concept shot contains neither the target concept nor a concept positively correlated with it, but contains a concept negatively correlated with it; an irrelevant-concept shot contains neither the target concept nor any concept positively or negatively correlated with it.
5. The video concept annotation method of claim 4, wherein, given a target concept, a concept being positively correlated with the target concept means that the two are semantically associated with each other, and a concept being negatively correlated with the target concept means that the two are semantically mutually exclusive.
6. The video concept annotation method of claim 1, wherein in step 2) the type features are extracted from the prediction probabilities output by the concept detectors and express the probabilities of a shot being classified into the different types.
7. The video concept annotation method of claim 1, wherein in step 3) the shots belonging to one video in the training set are arranged in temporal order to obtain a shot sequence, and the inter-shot relational model is trained on the shot sequences.
8. The video concept annotation method of claim 1, wherein the given threshold of step 4) is 0.5.
CN 201010279086 2010-09-10 2010-09-10 Inter-lens context-based video concept labeling method Pending CN101968797A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201010279086 CN101968797A (en) 2010-09-10 2010-09-10 Inter-lens context-based video concept labeling method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201010279086 CN101968797A (en) 2010-09-10 2010-09-10 Inter-lens context-based video concept labeling method

Publications (1)

Publication Number Publication Date
CN101968797A 2011-02-09

Family

ID=43547954

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201010279086 Pending CN101968797A (en) 2010-09-10 2010-09-10 Inter-lens context-based video concept labeling method

Country Status (1)

Country Link
CN (1) CN101968797A (en)


Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1477566A (en) * 2003-07-18 2004-02-25 北京大学计算机科学技术研究所 Method for making video search of scenes based on contents

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1477566A (en) * 2003-07-18 2004-02-25 北京大学计算机科学技术研究所 Method for making video search of scenes based on contents

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Akira Yanagawa et al., "Columbia University's Baseline Detectors for 374 LSCOM Semantic Visual Concepts", Columbia University ADVENT Technical Report, 2007-03-20, pp. 1-17.
彭宇新等 (Peng Yuxin et al.), "一种基于二分图最优匹配的镜头检索方法" (A shot retrieval method based on optimal bipartite-graph matching), 《电子学报》 (Acta Electronica Sinica), Vol. 32, No. 7, July 2004, pp. 1135-1139.
肖建国等 (Xiao Jianguo et al.), "一种新的基于模糊聚类的镜头检索方法" (A new shot retrieval method based on fuzzy clustering), 《计算机工程与应用》 (Computer Engineering and Applications), No. 1, January 2004, pp. 56-57, 102.

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104781831A (en) * 2012-09-13 2015-07-15 通用电气公司 System and method for generating semantic annotations
CN104781831B (en) * 2012-09-13 2019-04-30 通用电气公司 System and method for generative semantics mark
CN103279579A (en) * 2013-06-24 2013-09-04 魏骁勇 Video retrieval method based on visual space
CN103279579B (en) * 2013-06-24 2016-07-06 魏骁勇 The video retrieval method in view-based access control model space
CN107025433A (en) * 2017-03-03 2017-08-08 深圳大学 Video Events class people's concept learning method and device
CN107025433B (en) * 2017-03-03 2021-03-16 深圳大学 Video event human concept learning method and device
CN109274964A (en) * 2018-11-09 2019-01-25 北京奇艺世纪科技有限公司 A kind of video lens type information modification method and device
CN109508668A (en) * 2018-11-09 2019-03-22 北京奇艺世纪科技有限公司 A kind of lens type information identifying method and device
CN111291618A (en) * 2020-01-13 2020-06-16 腾讯科技(深圳)有限公司 Labeling method, device, server and storage medium
CN111291618B (en) * 2020-01-13 2024-01-09 腾讯科技(深圳)有限公司 Labeling method, labeling device, server and storage medium
CN111460971A (en) * 2020-03-27 2020-07-28 北京百度网讯科技有限公司 Video concept detection method and device and electronic equipment
CN111460971B (en) * 2020-03-27 2023-09-12 北京百度网讯科技有限公司 Video concept detection method and device and electronic equipment


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110209