CN1461142A - Video segment searching method based on contents - Google Patents

Video segment searching method based on contents

Info

Publication number
CN1461142A
CN1461142A CN03148305A
Authority
CN
China
Prior art keywords
segment
similar
factor
shot
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN03148305A
Other languages
Chinese (zh)
Other versions
CN1206847C (en)
Inventor
彭宇新
杨宗桦
肖建国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIDA FANGZHENG TECHN INST Co Ltd BEIJING
Inst Of Computer Science & Technology Peking University
Original Assignee
BEIDA FANGZHENG TECHN INST Co Ltd BEIJING
Inst Of Computer Science & Technology Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIDA FANGZHENG TECHN INST Co Ltd BEIJING, Inst Of Computer Science & Technology Peking University filed Critical BEIDA FANGZHENG TECHN INST Co Ltd BEIJING
Priority to CNB031483054A
Publication of CN1461142A
Application granted
Publication of CN1206847C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention relates to a content-based video segment retrieval method. It applies the maximum matching and optimal matching of graph theory to raise retrieval accuracy and retrieval speed. The method comprises the following steps: first, candidate similar segments are obtained preliminarily by examining the continuity of similar shots; then the Hungarian algorithm, which computes a maximum matching, determines the truly similar segments; finally, the Kuhn-Munkres algorithm for optimal matching is combined with a dynamic programming algorithm to solve the problem of measuring segment similarity. Experiments show that the method obtains higher retrieval accuracy and faster retrieval speed.

Description

A content-based video segment retrieval method
Technical field
The invention belongs to the technical field of video retrieval, and specifically relates to a content-based video segment retrieval method.
Background art
With the accumulation of television video programs, the growth of online digital video, and the spread of multimedia applications such as digital libraries, video on demand and distance learning, quickly retrieving the desired data from massive video collections has become extremely important. Traditional keyword-based video retrieval cannot meet the demands of massive video retrieval, owing to its limited descriptive power, strong subjectivity, manual annotation and poor intuitiveness. Therefore, since the 1990s, content-based video retrieval technology has become a hot research topic.
Content-based video clip retrieval is a principal mode of content-based video retrieval: given a query segment, it finds all segments similar to it in a video database. Content-based video clip retrieval must solve two problems and support two types of retrieval simultaneously. The two problems are: 1. automatically segmenting, in the video database, the multiple segments similar to the query segment; 2. ranking these similar segments from high to low similarity. The two types of retrieval are: 1. exact retrieval, where the retrieved segment is essentially the same as the query segment, with the same shots and frame sequence; 2. similarity retrieval, which covers two cases: either the original video has been edited in various ways, such as inserting/deleting frames (slow motion/snapshots), inserting/deleting shots, or reordering frames/shots; or the segments are different recordings of similar programs, such as different football matches. A good segment retrieval algorithm should solve both problems and perform both types of retrieval within a reasonable time.
Existing segment retrieval methods can be divided into two classes. The first, as described in "A Framework for Measuring Video Similarity and Its Application to Video Query by Example" [Y.P. Tan, S.R. Kulkarni, and P.J. Ramadge, IEEE International Conference on Image Processing, Vol. 2, pp. 106-110, 1999], treats a video segment at two levels, segment and frame, and measures segment similarity directly from the similarity of the constituent frames. The shortcoming of this class is that it requires similar segments to follow the same temporal order, a constraint that practical video programs do not obey: post-production editing means similar segments may have completely different shot orders, as with different edits of the same advertisement; at the same time, the frame-by-frame comparison makes retrieval slow. The second class, and the prior art closest to the present invention, is the paper "A Match and Tiling Approach to Content-based Video Retrieval" published at IEEE International Conference on Multimedia and Expo 2001 (by L. Chen and T.S. Chua, pp. 417-420). This document discloses a segment retrieval method that treats a video segment at three levels (segment, shot and frame) and comprises the following steps: (1) shot boundaries are first detected with the temporal multi-resolution analysis (TMRA) method, and each frame of each shot is then given a color code and a texture code; the color code uses the mean μ and variance σ of the Y component, and the texture code uses the fractal dimension (FD) feature; (2) assuming that similar frames inside two shots correspond in temporal order, the longest sequence of similar frames of the two shots is computed; the final similarity of two shots is expressed as a linear combination of the above three features, and a similarity threshold σ_L decides whether two shots are similar; (3) on this basis, a sliding window is used to find the segments similar to the query. This method performs exact retrieval and similarity retrieval simultaneously, but its problems are: (1) it considers only the number of similar shots of two segments and ignores the effect of many-to-many shot correspondences (granularity) on the overall similarity, so that even if all shots of segment Y are similar to only a single shot of segment X, Y is still judged similar to X; (2) the assumption it proposes does not hold, i.e. similar frames inside two shots may not correspond in temporal order; (3) shot similarity is judged from the longest similar frame sequence of the two shots, and this frame-by-frame comparison makes segment retrieval slow.
Summary of the invention
In view of the defects of existing video segment retrieval methods, the object of the invention is to propose a content-based video segment retrieval method that, on the basis of the existing technology, greatly improves the retrieval accuracy and retrieval speed of content-based video clip retrieval, so that video clip retrieval technology can play its great role in today's networked information society more fully. Another object of the invention is, while improving retrieval accuracy and retrieval speed, to make the ranking of the similar segments accord better with human psychological characteristics.
The object of the invention is achieved as follows: a content-based video segment retrieval method, comprising the following steps:
(1) First, shot boundary detection is performed with the spatio-temporal slice algorithm, dividing the query segment and the videos in the video database into shots; the camera motion information in each shot is then detected, and key frames are extracted or constructed to represent the shot content; shot similarity is measured by comparing the key frames of the shots of the query segment with the key frames of the shots of the video database, and according to the shot retrieval result, all shots in the video database similar to the shots of the query segment are retrieved;
(2) By examining the continuity of the similar shots, the segments similar to the query segment are preliminarily segmented;
(3) These segments include both truly similar segments and dissimilar ones; the Hungarian algorithm for maximum matching is used at this point to filter out the dissimilar segments, so that only similar segments are kept for the next step;
(4) For the similar segments, the optimal matching of graph theory computes their visual similarity to the query segment, i.e. the vision factor; based on the optimal matching result, a dynamic programming algorithm measures the temporal-order similarity of the two similar segments, i.e. the order factor; the interference factor is further measured; the final similarity of two segments is expressed as a linear combination of the above vision factor, order factor and interference factor.
It should be noted that, since the optimal matching computes the vision factor under the premise of one-to-one correspondence (granularity), and the computation of the order factor and the interference factor is also based on the optimal matching result, the final similarity measure in fact includes the measurement of the granularity factor.
To achieve the object of the invention better, when performing video clip retrieval, the theory, algorithms and results of bipartite graphs in graph theory are incorporated into the similarity measurement of video content; specifically, the Hungarian algorithm for maximum matching and the Kuhn-Munkres algorithm for optimal matching of graph theory are applied to content-based video clip retrieval.
Specifically, when performing video clip retrieval, the segments of the database video Y similar to the query segment X are preliminarily segmented as follows: the shots y_j of Y similar to X are sorted in ascending order, and the continuity of these y_j is then examined; whenever |y_{j+1} − y_j| > 2, j = 1, 2, ..., λ−1, a possible similar segment Y_k = {y_i, y_{i+1}, ..., y_j}, i, j ∈ [1, λ] is obtained, where λ is the length of the video Y, expressed in number of shots.
More specifically, when performing video clip retrieval, the Hungarian algorithm for maximum matching is used to filter the dissimilar segments and determine the truly similar ones: for the bipartite graph G_k = {X, Y_k, E_k}, if |M| ≥ n/2, then the segment Y_k is similar to the query segment X; in this formula, E_k = {e_ij}, e_ij indicates that x_i and y_j are similar, M ⊆ E_k is a maximum matching, no two edges of M are adjacent, and n is the number of shots of the query segment X.
Further, when performing video clip retrieval, the Kuhn-Munkres algorithm for optimal matching and a dynamic programming algorithm compute the concrete similarity of two segments. The Kuhn-Munkres algorithm for optimal matching computes the vision factor of the query segment X and a similar segment Y_k as Vision = ω/n, where ω is the weight of the optimal matching of the weighted bipartite graph G_k = {X, Y_k, E_k} and n is the number of shots of the query segment X. Based on the optimal matching result, the dynamic programming algorithm measures the order factor of the two similar segments as Order = c[n, l]/n, where c[n, l] is the length of the longest common subsequence defined below. The interference factor is further measured as Interference = 2|M|/(n + l), where l is the number of shots of the similar segment Y_k' and |M| is the number of edges of the optimal matching of G_k = {X, Y_k, E_k}. The final similarity of the two segments is expressed as the linear combination of the above vision factor, order factor and interference factor: Similarity(X, Y_k') = ω_1·Vision + ω_2·Order + ω_3·Interference, where ω_1, ω_2 and ω_3 represent the weights of the vision, order and interference factors respectively.
More specifically, when performing video clip retrieval, the method of computing the vision factor by optimal matching and determining the boundaries of the similar segment is as follows: the similarity value of each pair of similar shots is assigned as the weight of the corresponding edge of G_k = {X, Y_k, E_k}; G_k is thus converted into a weighted bipartite graph, and the Kuhn-Munkres algorithm computing the optimal matching is as follows (ω_ij denotes the weight of edge (x_i, y_j)):
(1) Give the initial labels l(x_i) = max_j ω_ij, l(y_j) = 0, i, j = 1, 2, ..., t, t = max(n, m);
(2) Obtain the edge set E_l = {(x_i, y_j) | l(x_i) + l(y_j) = ω_ij}, the subgraph G_l = (X, Y_k, E_l), and a matching M of G_l;
(3) If M saturates all nodes of X, then M is the optimal matching of G_k and the computation ends; otherwise go to the next step;
(4) Find an M-unsaturated node x_0 in X and let A ← {x_0}, B ← ∅, where A and B are two sets;
(5) If N_Gl(A) = B, go to step (9); otherwise go to the next step; here N_Gl(A) ⊆ Y_k is the set of nodes adjacent to the nodes of A;
(6) Find a node y ∈ N_Gl(A) − B;
(7) If y is M-saturated, find its matched node z, let A ← A ∪ {z}, B ← B ∪ {y}, and go to step (5); otherwise go to the next step;
(8) There is an M-augmenting path P from x_0 to y; let M ← M ⊕ E(P) and go to step (3);
(9) Compute the value a = min{ l(x_i) + l(y_j) − ω_ij : x_i ∈ A, y_j ∉ N_Gl(A) } and revise the labels:
l′(v) = l(v) − a for v ∈ A; l′(v) = l(v) + a for v ∈ N_Gl(A); l′(v) = l(v) for all other nodes;
then obtain E_l′ and G_l′ from l′;
(10) Let l ← l′, G_l ← G_l′, and go to step (6).
After the optimal matching M and its total weight ω are obtained, the vision factor is Vision = ω/n. To determine the boundaries of the segment of Y_k similar to X, the invention takes all shots y of Y_k covered by M and sorts them in ascending order as {y_α, y_β, ..., y_γ}, α, β, γ ∈ [1, m]; in this set y_α and y_β may be discontinuous, i.e. y_β − y_α > 1; according to the continuity definition of a video segment, the invention takes all shots between y_α and y_γ to constitute the similar segment Y_k' = {y_α, y_α+1, ..., y_γ}.
To implement the invention better, the dynamic programming algorithm can compute the order factor as follows: within the computed optimal matching M, further examine how Y_k' corresponds to X in temporal order, i.e. find the maximum number of matched shots whose edges are arranged in temporal order with X, and measure the order factor with it. This problem reduces to the longest common subsequence (LCS) problem: given the two sequences X = {x_1, x_2, ..., x_n} and Y_k' = {y_α, y_α+1, ..., y_γ}, find a longest common subsequence of X and Y_k'; the dynamic programming algorithm solves this effectively. For convenience of calculation, {y_α, y_α+1, ..., y_γ} is written as {y_1, y_2, ..., y_l}, l = γ − α + 1, and c[i, j] records the length of the longest common subsequence of X and Y_k'. The recurrence relation is established as follows:
c[i, j] = 0, if i = 0 or j = 0;
c[i, j] = c[i−1, j−1] + 1, if i, j > 0 and (x_i, y_j) ∈ M;
c[i, j] = max(c[i, j−1], c[i−1, j]), if i, j > 0 and (x_i, y_j) ∉ M.
The order factor is Order = c[n, l]/n.
The effect of the invention is that, with the video segment retrieval method of the invention, higher retrieval accuracy and faster retrieval speed can be obtained; another effect is that the ranking of the similar segments accords better with human psychological characteristics.
The invention achieves such a significant technical effect for the following reasons:
First, as described in the summary above, in order to segment the similar segments, the retrieval is divided into two stages, shot retrieval and segment retrieval. In the shot retrieval stage, the temporal information in the video is considered: the time-varying content inside a shot is decomposed into several sub-shots of consistent content, and this sub-shot basis reflects more comprehensively whether two shots are similar; it avoids both the inadequacy of existing methods that compare each shot by a single key frame only, and the slow retrieval speed caused by existing methods that compare frame by frame. In the segment retrieval stage, the candidate similar segments are obtained preliminarily by examining the continuity of similar shots, and the Hungarian algorithm for maximum matching then determines the truly similar segments. For ranking the similar segments, the invention considers the vision, granularity, temporal-order and interference factors of segment similarity measurement, and proposes to measure the influence of these factors by combining the Kuhn-Munkres algorithm for optimal matching with a dynamic programming algorithm. The invention is the first to apply the matching theory of graph theory to the video retrieval problem. The idea of matching requires similar shots to correspond one to one (granularity); under this condition, the computed maximum matching and optimal matching objectively and comprehensively reflect the number of similar shots of two segments and the degree of their visual similarity, thereby avoiding the granularity problem of shot counting in the existing method. Experimental results show that, compared with the existing method of the same function, the invention obtains outstanding results in both retrieval accuracy and retrieval speed.
Second, besides visual information, the similarity measurement of video segments also depends on the internal relations between the shots composing the segments; to reach the significant technical effect of the invention, the following four factors are considered in concrete retrieval:
(1) The vision factor: the greatest factor deciding whether two segments are similar, measured mainly from the similarity of the shots composing the segments;
(2) The granularity factor: a shot in one segment may be similar to several shots in the other segment, so one-to-many, many-to-one and many-to-many cases can appear in the similar-shot correspondence graph of two segments; a method is needed to measure the similarity of the different shot correspondences; for example, two segments in a many-to-one relationship should be given a lower similarity value;
(3) The order factor: two visually similar segments should not be judged dissimilar merely because of different shot orders; but of two segments equally similar in vision, the one whose temporal order also matches should be given the higher similarity value;
(4) The interference factor: in two similar segments, some shots may find no corresponding similar shot; the existence of these shots embodies the discontinuity of the correspondence and affects the final similarity of the two segments.
Third, a search strategy for content-based video segments is proposed: first find all segments visually similar to the query segment; then, for the similar segments only, compute their concrete similarity to the query segment. Since vision is the greatest factor in measuring whether two segments are similar, the advantages of this search strategy are that visually similar segments cannot be missed through the influence of the other factors, and that retrieval is faster, because the concrete similarity of dissimilar segments need not be computed.
Description of drawings
Fig. 1 is the overall framework of the invention, a schematic flowchart of the steps of the method;
Fig. 2 is the bipartite graph of two dissimilar segments;
Fig. 3 is the bipartite graph of two dissimilar segments;
Fig. 4 is the bipartite graph of two similar segments;
Fig. 5 is the result of applying the Hungarian algorithm for maximum matching to Fig. 3;
Fig. 6 is the result of applying the Hungarian algorithm for maximum matching to Fig. 4;
Fig. 7 is a retrieval result of the invention for a query video segment.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows the schematic flowchart of the steps of the invention, which comprise the following:
1. Shot retrieval
Shot boundary detection is first performed with the spatio-temporal slice algorithm, dividing the query segment X and the videos in the database Y into shots; for a detailed description of the spatio-temporal slice algorithm see "Video Partitioning by Temporal Slice Coherency" [C.W. Ngo, T.C. Pong, and R.T. Chin, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 8, pp. 941-953, August 2001]. Then, following the method in "Motion-based Video Representation for Scene Change Detection" [C.W. Ngo, T.C. Pong, and H.J. Zhang, International Journal of Computer Vision, Vol. 50, No. 2, pp. 127-143, Nov 2002], the camera motion information in each shot is detected and key frames are extracted or constructed to represent the shot content. The similarity value Similarity(x_i, y_j) of two shots x_i and y_j is computed from their key frames. The invention then sets the threshold T = 0.5: when Similarity(x_i, y_j) > T, the two shots x_i and y_j are considered similar. With this rule, all shots y_j in the video database Y similar to shots x_i of the query segment X are retrieved.
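The following minimal Python sketch (for illustration only, not part of the patent text) expresses the threshold test of this step; shot_similarity is a hypothetical stand-in for the key-frame comparison described above:

```python
# Illustrative sketch of step 1: retrieve all database shots similar to
# some query shot, using the threshold T = 0.5 from the description.
# `shot_similarity` is a hypothetical stand-in for the key-frame comparison.
T = 0.5

def find_similar_shots(query_shots, db_shots, shot_similarity):
    # Return the sorted indices j of database shots similar to any query shot.
    similar = set()
    for x in query_shots:
        for j, y in enumerate(db_shots):
            if shot_similarity(x, y) > T:  # Similarity(x_i, y_j) > T
                similar.add(j)
    return sorted(similar)
```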
2. Preliminary segmentation of similar segments
In the database video Y, the shots similar to the query segment X are a minority, and a large number of shots are dissimilar. According to the definition that a segment is composed of a series of consecutive shots, the invention first sorts the shots y_j of Y similar to X in ascending order and then examines the continuity of these y_j: whenever |y_{j+1} − y_j| > 2, j = 1, 2, ..., λ−1, where λ is the length of Y (expressed in number of shots), a boundary is placed, yielding a possible similar segment Y_k = {y_i, y_{i+1}, ..., y_j}, i, j ∈ [1, λ]. The threshold |y_{j+1} − y_j| > 2 is taken for the robustness of the algorithm, because: (1) post-production editing may insert unrelated shots, as with different edits of the same advertisement, where the long version inserts a few dissimilar shots on the basis of the short one; (2) if a new segment begins, there is an interval between the two segments, and this interval is generally greater than 2 shots.
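As an illustrative sketch (assumed names, not the patent's code), the continuity test of step 2 amounts to splitting the sorted similar-shot indices wherever consecutive indices differ by more than 2:

```python
# Illustrative sketch of step 2: group sorted shot indices into candidate
# similar segments Y_k, starting a new segment whenever the gap between
# consecutive similar shots exceeds 2 (the robustness threshold).
def split_candidate_segments(similar_shot_indices):
    segments, current = [], []
    for j in sorted(similar_shot_indices):
        if current and j - current[-1] > 2:  # |y_{j+1} - y_j| > 2: new segment
            segments.append(current)
            current = []
        current.append(j)
    if current:
        segments.append(current)
    return segments

# Example: split_candidate_segments([3, 4, 6, 20, 21, 22])
# -> [[3, 4, 6], [20, 21, 22]]
```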
3. Confirming similar segments by maximum matching
Suppose the query segment is X = {x_1, x_2, ..., x_n} and each possible similar segment is Y_k = {y_1, y_2, ..., y_m}, where x_i and y_j denote shots. The similar-shot correspondence of X and Y_k can then be expressed as the bipartite graph G_k = {X, Y_k, E_k} of graph theory, with vertex set V_k = X ∪ Y_k and edge set E_k = {e_ij}, where e_ij indicates that x_i and y_j are similar.
The possible similar segments judged in step 2 include both dissimilar segments and truly similar ones. A large number of experimental observations can be reduced to the three typical cases of Fig. 2, Fig. 3 and Fig. 4, where Fig. 2 and Fig. 3 are bipartite graphs of dissimilar segments and Fig. 4 is the bipartite graph of similar segments. Because a video segment is composed of a series of shots expressing the same semantics, the shots inside one video segment are themselves similar to each other; we call this property the self-similarity of a video segment. Owing to this self-similarity, one-to-many, many-to-one and many-to-many cases generally appear in the bipartite graph of X and Y_k, as shown in Figs. 2, 3 and 4. Whether two segments are similar can be judged from the number of their similar shots. From the judgement of step 2 we know that essentially every y_j can find a similar shot x_i in X; but because of many-to-many similarity, not every x_i can necessarily find a similar shot y_j in Y_k. We therefore examine the similarity situation of the x_i. Since the length of Y_k may be smaller than the length of X, and considering the robustness of the algorithm, if half of the shots of X can find similar shots in Y_k, we consider that Y_k has enough shots similar to X, so Y_k is a similar segment of X. This rule effectively distinguishes the case of Fig. 2. But in both Fig. 3 and Fig. 4, six shots of the query segment X = {x_1, x_2, ..., x_8} find similar shots, so with this rule both would be judged similar segments, although Fig. 3 is a typical case of dissimilar segments.
Therefore, we further examine the similarity of Y_k and X under a one-to-one rather than a repeated correspondence. Applying the Hungarian algorithm for maximum matching to Fig. 3 and Fig. 4 yields Fig. 5 and Fig. 6. If |M| ≥ n/2, we consider that Y_k has enough shots similar to X, so it is a truly similar segment; in this formula M ⊆ E_k is a maximum matching, no two edges of M are adjacent, and n is the number of shots of the query segment X. From Fig. 5 and Fig. 6 we can thus clearly distinguish the dissimilar segment from the similar one. The concrete Hungarian algorithm is as follows:
(1) Start from an arbitrary initial matching M of the graph G_k;
(2) If M saturates all nodes of X, then M is a maximum matching and the computation ends; otherwise go to the next step;
(3) Find an M-unsaturated node x_0 in X and let A ← {x_0}, B ← ∅, where A and B are two sets;
(4) If N(A) = B, treat x_0 as saturated (a so-called pseudo-saturated node) and go to step (2); otherwise go to the next step; here N(A) ⊆ Y_k is the set of nodes adjacent to the nodes of A;
(5) Find a node y ∈ N(A) − B;
(6) If y is M-saturated, find its matched node z, let A ← A ∪ {z}, B ← B ∪ {y}, and go to step (4); otherwise go to the next step;
(7) There is an M-augmenting path P from x_0 to y; let M ← M ⊕ P (the ring sum, i.e. symmetric difference, of M and P) and go to step (2).
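For illustration, a minimal Python sketch (assumed data layout, not the patent's code) of the augmenting-path idea behind the Hungarian algorithm above, together with the |M| ≥ n/2 filter of this step; edges[i] is assumed to list the indices j of the Y_k shots similar to x_i:

```python
# Illustrative sketch of step 3: maximum matching by augmenting paths
# (Hungarian-style), then the |M| >= n/2 similarity filter.
def maximum_matching_size(n, m, edges):
    match_of_y = [-1] * m  # match_of_y[j] = X-node matched to y_j, or -1

    def augment(i, visited):
        # Try to extend the matching with an augmenting path from x_i.
        for j in edges[i]:
            if j not in visited:
                visited.add(j)
                if match_of_y[j] == -1 or augment(match_of_y[j], visited):
                    match_of_y[j] = i
                    return True
        return False

    return sum(1 for i in range(n) if augment(i, set()))

def is_truly_similar(n, m, edges):
    return maximum_matching_size(n, m, edges) >= n / 2
```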
4. The similarity model of video segments
Through the computation of step 3, we have obtained multiple segments visually similar to the query segment; next we consider ranking them from high to low similarity. We consider the following factors of segment similarity measurement:
(1) The vision factor: the greatest factor deciding whether two segments are similar, measured mainly from the similarity of the shots composing the segments;
(2) The granularity factor: a shot in one segment may be similar to several shots in the other segment, so one-to-many, many-to-one and many-to-many cases can appear in the similar-shot correspondence graph of two segments; a method is needed to measure the similarity of the different shot correspondences; for example, two segments in a many-to-one relationship should be given a lower similarity value;
(3) The order factor: two visually similar segments should not be judged dissimilar merely because of different shot orders; but of two segments equally similar in vision, the one whose temporal order also matches should be given the higher similarity value;
(4) The interference factor: in two similar segments, some shots may find no corresponding similar shot; the existence of these shots embodies the discontinuity of the correspondence and affects the final similarity of the two segments.
The invention represents and models the above similarity model based on the optimal matching of graph theory; a notable advantage of doing so is that the validity of the invention can be verified through optimal matching. Moreover, since vision is the most important criterion of similar segments, the invention does not, like existing methods, judge whether two segments are similar directly by a linear combination of the above factors; instead it first obtains the visually similar segments with maximum matching, and only then represents and models the similarity based on optimal matching, so visually similar segments cannot be missed through the influence of the other factors; in addition, since the computational complexity of maximum matching is lower than that of optimal matching, this also speeds up retrieval. Optimal matching, like maximum matching, is computed under the one-to-one granularity constraint. The other three factors are computed concretely as follows:
4.1 Computing the vision factor by optimal matching
We assign the similarity value of each pair of similar shots as the weight of the corresponding edge of G_k = {X, Y_k, E_k}; G_k is thus converted into a weighted bipartite graph, and the Kuhn-Munkres algorithm computing the optimal matching is as follows (ω_ij denotes the weight of edge (x_i, y_j)):
(1) Give the initial labels l(x_i) = max_j ω_ij, l(y_j) = 0, i, j = 1, 2, ..., t, t = max(n, m);
(2) Obtain the edge set E_l = {(x_i, y_j) | l(x_i) + l(y_j) = ω_ij}, the subgraph G_l = (X, Y_k, E_l), and a matching M of G_l;
(3) If M saturates all nodes of X, then M is the optimal matching of G_k and the computation ends; otherwise go to the next step;
(4) Find an M-unsaturated node x_0 in X and let A ← {x_0}, B ← ∅, where A and B are two sets;
(5) If N_Gl(A) = B, go to step (9); otherwise go to the next step; here N_Gl(A) ⊆ Y_k is the set of nodes adjacent to the nodes of A;
(6) Find a node y ∈ N_Gl(A) − B;
(7) If y is M-saturated, find its matched node z, let A ← A ∪ {z}, B ← B ∪ {y}, and go to step (5); otherwise go to the next step;
(8) There is an M-augmenting path P from x_0 to y; let M ← M ⊕ E(P) and go to step (3);
(9) Compute the value a = min{ l(x_i) + l(y_j) − ω_ij : x_i ∈ A, y_j ∉ N_Gl(A) } and revise the labels:
l′(v) = l(v) − a for v ∈ A; l′(v) = l(v) + a for v ∈ N_Gl(A); l′(v) = l(v) for all other nodes;
then obtain E_l′ and G_l′ from l′;
(10) Let l ← l′, G_l ← G_l′, and go to step (6).
After the optimal matching M and its total weight ω are obtained, the invention defines the vision factor Vision = ω/n. To determine the boundaries of the segment of Y_k similar to X, the invention takes all shots y of Y_k covered by M and sorts them in ascending order as {y_α, y_β, ..., y_γ}, α, β, γ ∈ [1, m]; in this set y_α and y_β may be discontinuous, i.e. y_β − y_α > 1; according to the continuity definition of a video segment, the invention takes all shots between y_α and y_γ to constitute the similar segment Y_k' = {y_α, y_α+1, ..., y_γ}.
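As an illustrative sketch of 4.1 (not the patent's code), the vision factor and the boundaries of Y_k' can be computed from an optimal matching; here SciPy's linear_sum_assignment stands in for the Kuhn-Munkres procedure listed above, and weights is a hypothetical n×m array holding Similarity(x_i, y_j), zero where the shots are not similar:

```python
# Illustrative sketch of 4.1: Vision = omega / n, and the span of matched
# Y_k shots gives the boundaries of the similar segment Y_k'.
import numpy as np
from scipy.optimize import linear_sum_assignment

def vision_factor_and_boundaries(weights):
    w = np.asarray(weights, dtype=float)  # shape (n, m)
    n, m = w.shape
    t = max(n, m)
    padded = np.zeros((t, t))             # pad to square, as t = max(n, m)
    padded[:n, :m] = w
    rows, cols = linear_sum_assignment(padded, maximize=True)

    omega = padded[rows, cols].sum()      # total weight of the matching
    vision = omega / n                    # Vision = omega / n

    # Y_k shots actually matched by a positive-weight edge
    matched = sorted(int(c) for r, c in zip(rows, cols)
                     if r < n and c < m and padded[r, c] > 0)
    # Y_k' spans all shots from the first to the last matched shot
    boundaries = (matched[0], matched[-1]) if matched else None
    return vision, boundaries
```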
4.2 Computing the order factor by dynamic programming
Within the optimal matching M computed in 4.1, we further examine how Y_k' corresponds to X in temporal order, i.e. find the maximum number of matched shots whose edges are arranged in temporal order with X, and measure the order factor with it. This problem reduces to the longest common subsequence (LCS) problem: given the two sequences X = {x_1, x_2, ..., x_n} and Y_k' = {y_α, y_α+1, ..., y_γ}, find a longest common subsequence of X and Y_k'; the dynamic programming algorithm solves this effectively. For convenience of calculation, {y_α, y_α+1, ..., y_γ} is written as {y_1, y_2, ..., y_l}, l = γ − α + 1, and c[i, j] records the length of the longest common subsequence of X and Y_k'. The recurrence relation is established as follows:
c[i, j] = 0, if i = 0 or j = 0;
c[i, j] = c[i−1, j−1] + 1, if i, j > 0 and (x_i, y_j) ∈ M;
c[i, j] = max(c[i, j−1], c[i−1, j]), if i, j > 0 and (x_i, y_j) ∉ M.
The invention defines the order factor Order = c[n, l]/n.
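For illustration only, the recurrence of 4.2 translates directly into a small dynamic program; matched is a hypothetical set of optimal-matching edges (i, j), with X indexed 1..n and Y_k' reindexed 1..l as in the text:

```python
# Illustrative sketch of 4.2: Order = c[n, l] / n via the LCS recurrence.
def order_factor(n, l, matched):
    c = [[0] * (l + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, l + 1):
            if (i, j) in matched:                 # x_i matched to y_j in M
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c[n][l] / n
```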
4.3 Computing the interference factor
In the optimal matching M, a few shots of X and Y_k' are associated with no edge; these shots cannot find corresponding similar shots, and their existence embodies the discontinuity of the correspondence. The invention defines the interference factor Interference = 2|M|/(n + l), where n is the number of shots of the query segment X and l is the number of shots of the similar segment Y_k'. This equation expresses the proportion of shots, among all shots of the two similar segments X and Y_k', that can find a corresponding similar shot.
4.4 Computing the total similarity
According to the preceding analysis, the invention computes the similarity of the query segment X and a similar segment Y_k' with the formula: Similarity(X, Y_k') = ω_1·Vision + ω_2·Order + ω_3·Interference
Here ω_1, ω_2 and ω_3 express the degree of attention people pay to the vision, order and interference factors, and different users can adjust them according to their own preference for these three criteria. In the invention, ω_1 = 0.4, ω_2 = 0.3 and ω_3 = 0.3 are taken; experimental results show that this choice accords with the human criterion of similarity.
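A final illustrative sketch (assumed inputs) combining 4.1-4.4 under the weights stated above:

```python
# Illustrative sketch of 4.3-4.4: interference factor and total similarity.
W1, W2, W3 = 0.4, 0.3, 0.3  # weights omega_1, omega_2, omega_3

def total_similarity(vision, order, matching_size, n, l):
    interference = 2 * matching_size / (n + l)  # Interference = 2|M|/(n+l)
    return W1 * vision + W2 * order + W3 * interference
```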
The outstanding performance of the invention in video clip retrieval is illustrated below with experimental results. The experimental data are several days of programs recorded from television. This video database is very challenging: 3 hours 11 minutes in total, 4714 shots and 286936 frames, covering advertisements, news, sports and film programs of various types. It contains repeated identical video segments, such as news openings and advertisements, and also many repeated similar video segments, such as different tennis matches in the sports programs and versions of the same advertisement with different durations and edits. To verify the validity of the invention, the existing method was used as the experimental baseline, for two main reasons: (1) it is currently the best method on the given experimental data, and also the most recent; (2) its function is consistent with that of the invention: it can automatically segment similar segments in the video database and then rank them from high to low similarity. In video clip retrieval, besides retrieval accuracy, retrieval speed is also a very important index; in view of this, the retrieval speeds of the two methods were also compared. The test machine was a dual Pentium III 1 GHz with 256 MB of memory.
Fig. 7 shows the user interface of the experimental program: the top row is the queried advertisement, displayed by its key frames; below it are the retrieval results, arranged in decreasing order of similarity. The first retrieved row is the query segment itself, whose similarity is naturally the highest; the remaining segments are arranged in decreasing order of similarity. It can be seen that the ranked similar segments embody the effect of the different factors of step 4; for example, the first three segments are closer to the query segment in temporal order. Concrete experimental results are given in Table 1 and Table 2 respectively.
Table 1. Experimental results of exact video segment retrieval

Query segment | Frames | The invention: precision / recall / speed (s) | Existing method: precision / recall / speed (s)
1. News opening | 832 | 100% / 100% / 108 | 75% / 100% / 230
2. Football news | 715 | 100% / 100% / 74 | 100% / 100% / 196
3. Huiyuan advertisement | 367 | 100% / 100% / 167 | 33.3% / 100% / 97
4. Bright advertisement | 374 | 100% / 100% / 89 | 100% / 100% / 101
5. Fulinmen advertisement | 432 | 100% / 100% / 99 | 100% / 100% / 116
Average | 544 | 100% / 100% / 107 | 81.7% / 100% / 148
As can be seen from Table 1, both the invention and the existing method obtained 100% recall, but in precision the invention is better than the existing method; the main reason is that the existing method only counts the number of similar shots of two segments, while the invention considers the correspondence relations of the similar shots. In retrieval speed the invention is faster than the existing method: according to our experiments, the total retrieval time essentially equals the time of similar-shot judgement, and the existing method compares frame by frame in temporal order, while the invention only needs to compare the key frames of each shot, so the retrieval speed of the invention is much faster than that of the existing method.
Table 2. Experimental results of video segment similarity retrieval

Query segment | Frames | The invention: precision / recall / speed (s) | Existing method: precision / recall / speed (s)
1. Tennis match | 507 | 100% / 50% / 49 | 100% / 50% / 140
2. Doctor rescuing a patient | 1806 | 60% / 85.7% / 93 | 50% / 50% / 507
3. TCL advertisement | 374 | 100% / 100% / 116 | 85.7% / 100% / 100
4. Melatonin advertisement | 374 | 100% / 100% / 129 | 100% / 100% / 100
5. Amoisonic advertisement | 374 | 100% / 100% / 103 | 100% / 50% / 99
Average | 687 | 92% / 87.1% / 98 | 87.1% / 70% / 189
In Table 2 the invention is better than the existing method in both recall and precision. Query segments 1 and 2 are two very difficult queries. Tennis matches appear 4 times in our video database, and the invention missed two of them: the query used a blue tennis court, while one missed match was on a green court and the other consisted mainly of shots of players and spectators with few shots showing the blue court; the existing method missed these two segments as well. Like query segment 1, query segment 2 is a segment with strong semantics whose color features are hard to exploit; taking the whole segment together reflects the basic color features of the semantics, and the invention also obtained a good retrieval effect. In retrieval speed the invention is again faster than the existing method, and the longer the query segment, the more obvious the advantage of the invention; for query segment 2, for example, the invention is more than 5 times faster than the existing method. In addition, as shown in Fig. 7, a significant advantage of the invention over the existing method also appears in the ranking of the similar segments by decreasing similarity: besides visual features, the invention considers the different factors of similar segments, while the similarity of the existing method depends only on the number of similar shots; tests with several people show that the ranking of similar segments by the invention accords better with human visual and psychological characteristics.
By collecting 3 hours 11 minutes of video programs and experimentally comparing with the existing method that currently has the best and most recent experimental results, it is shown that the video segment retrieval method of the invention obtains higher retrieval accuracy and faster retrieval speed, and that its ranking of the similar segments accords better with human psychological characteristics. Besides the 6 advertisement queries listed in Table 1 and Table 2, we queried dozens of differently edited advertisements, and the invention obtained 100% precision and recall.

Claims (7)

1. A content-based video segment retrieval method, comprising the following steps:
(1) first performing shot boundary detection, dividing the query segment and the videos in the video database into shots; then measuring the similarity between the shots of the query segment and the shots of the video database, and, according to the measurement result, retrieving all shots in the video database similar to the shots of the query segment;
(2) preliminarily segmenting, by examining the continuity of the similar shots, the segments similar to the query segment;
(3) these segments including both truly similar segments and dissimilar ones, the maximum matching of graph theory being used at this point to filter out the dissimilar segments, so that only similar segments are kept for the next step;
(4) for the similar segments, computing with the optimal matching of graph theory their visual similarity to the query segment, i.e. the vision factor; measuring with a dynamic programming algorithm, based on the optimal matching result, the temporal-order similarity of the two similar segments, i.e. the order factor; further measuring the interference factor; and expressing the final similarity of two segments as a linear combination of the above vision factor, order factor and interference factor.
2. The content-based video segment retrieval method as claimed in claim 1, characterized in that, when performing video clip retrieval, the theory, algorithms and results of bipartite graphs in graph theory are incorporated into the similarity measurement of video content; specifically, the Hungarian algorithm for maximum matching and the Kuhn-Munkres algorithm for optimal matching of graph theory are applied to content-based video clip retrieval.
3. The content-based video segment retrieval method as claimed in claim 2, characterized in that, in step (3), the Hungarian algorithm for maximum matching is used to filter the dissimilar segments and determine the truly similar ones: for the bipartite graph G_k = {X, Y_k, E_k}, if |M| ≥ n/2, then the segment Y_k is similar to the query segment X; in this formula, E_k = {e_ij}, e_ij indicates that x_i and y_j are similar, M ⊆ E_k is a maximum matching, no two edges of M are adjacent, and n is the number of shots of the query segment X.
4. The content-based video segment retrieval method as claimed in claim 2, characterized in that, in step (4), the Kuhn-Munkres algorithm for optimal matching and a dynamic programming algorithm compute the concrete similarity of two segments: the Kuhn-Munkres algorithm for optimal matching computes the vision factor of the query segment X and a similar segment Y_k as Vision = ω/n, where ω is the weight of the optimal matching of the weighted bipartite graph G_k = {X, Y_k, E_k} and n is the number of shots of the query segment X; based on the optimal matching result, the dynamic programming algorithm measures the order factor of the two similar segments as Order = c[n, l]/n, where c[n, l] is the length of the longest common subsequence of the matched shot sequences; the interference factor is further measured as Interference = 2|M|/(n + l), where l is the number of shots of the similar segment Y_k' and |M| is the number of edges of the optimal matching of G_k = {X, Y_k, E_k}; the final similarity of the two segments is expressed as the linear combination of the above vision factor, order factor and interference factor: Similarity(X, Y_k') = ω_1·Vision + ω_2·Order + ω_3·Interference, where ω_1, ω_2 and ω_3 represent the weights of the vision, order and interference factors respectively.
5. The content-based video segment retrieval method as claimed in claim 1, characterized in that, in step (2), the segments of the database video Y similar to the query segment X are preliminarily segmented as follows: the shots y_j of Y similar to the query segment X are sorted in ascending order, and the continuity of these y_j is then examined; whenever |y_{j+1} − y_j| > 2, j = 1, 2, ..., λ−1, a possible similar segment Y_k = {y_i, y_{i+1}, ..., y_j}, i, j ∈ [1, λ] is obtained, where λ is the length of the video Y, expressed in number of shots.
6. The content-based video segment retrieval method as claimed in claim 4, characterized in that the method of computing the vision factor by optimal matching and determining the boundaries of the similar segment is as follows:
The similarity value of each pair of similar shots is assigned as the weight of the corresponding edge of G_k = {X, Y_k, E_k}; G_k is thus converted into a weighted bipartite graph, and the Kuhn-Munkres algorithm computing the optimal matching is as follows (ω_ij denotes the weight of edge (x_i, y_j)):
(1) Give the initial labels l(x_i) = max_j ω_ij, l(y_j) = 0, i, j = 1, 2, ..., t, t = max(n, m);
(2) Obtain the edge set E_l = {(x_i, y_j) | l(x_i) + l(y_j) = ω_ij}, the subgraph G_l = (X, Y_k, E_l), and a matching M of G_l;
(3) If M saturates all nodes of X, then M is the optimal matching of G_k and the computation ends; otherwise go to the next step;
(4) Find an M-unsaturated node x_0 in X and let A ← {x_0}, B ← ∅, where A and B are two sets;
(5) If N_Gl(A) = B, go to step (9); otherwise go to the next step; here N_Gl(A) ⊆ Y_k is the set of nodes adjacent to the nodes of A;
(6) Find a node y ∈ N_Gl(A) − B;
(7) If y is M-saturated, find its matched node z, let A ← A ∪ {z}, B ← B ∪ {y}, and go to step (5); otherwise go to the next step;
(8) There is an M-augmenting path P from x_0 to y; let M ← M ⊕ E(P) and go to step (3);
(9) Compute the value a = min{ l(x_i) + l(y_j) − ω_ij : x_i ∈ A, y_j ∉ N_Gl(A) } and revise the labels:
l′(v) = l(v) − a for v ∈ A; l′(v) = l(v) + a for v ∈ N_Gl(A); l′(v) = l(v) for all other nodes;
then obtain E_l′ and G_l′ from l′;
(10) Let l ← l′, G_l ← G_l′, and go to step (6);
After the optimal matching M and its total weight ω are obtained, the vision factor is Vision = ω/n; to determine the boundaries of the segment of Y_k similar to X, all shots y of Y_k covered by M are taken and sorted in ascending order as {y_α, y_β, ..., y_γ}, α, β, γ ∈ [1, m]; in this set y_α and y_β may be discontinuous, i.e. y_β − y_α > 1; according to the continuity definition of a video segment, all shots between y_α and y_γ are taken to constitute the similar segment Y_k' = {y_α, y_α+1, ..., y_γ}.
7. The content-based video segment retrieval method as claimed in claim 4, characterized in that the dynamic programming algorithm computes the order factor as follows:
Within the computed optimal matching M, the correspondence of Y_k' and X in temporal order is further examined, i.e. the maximum number of matched shots whose edges are arranged in temporal order with X is found, and the order factor is measured with it. This problem reduces to the longest common subsequence (LCS) problem: given the two sequences X = {x_1, x_2, ..., x_n} and Y_k' = {y_α, y_α+1, ..., y_γ}, find a longest common subsequence of X and Y_k'; the dynamic programming algorithm solves this effectively. For convenience of calculation, {y_α, y_α+1, ..., y_γ} is written as {y_1, y_2, ..., y_l}, l = γ − α + 1, and c[i, j] records the length of the longest common subsequence of X and Y_k'. The recurrence relation is established as follows:
c[i, j] = 0, if i = 0 or j = 0;
c[i, j] = c[i−1, j−1] + 1, if i, j > 0 and (x_i, y_j) ∈ M;
c[i, j] = max(c[i, j−1], c[i−1, j]), if i, j > 0 and (x_i, y_j) ∉ M.
The order factor is Order = c[n, l]/n.
CNB031483054A 2003-06-30 2003-06-30 Video segment searching method based on contents Expired - Fee Related CN1206847C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB031483054A CN1206847C (en) 2003-06-30 2003-06-30 Video segment searching method based on contents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB031483054A CN1206847C (en) 2003-06-30 2003-06-30 Video segment searching method based on contents

Publications (2)

Publication Number Publication Date
CN1461142A true CN1461142A (en) 2003-12-10
CN1206847C CN1206847C (en) 2005-06-15

Family

ID=29591422

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB031483054A Expired - Fee Related CN1206847C (en) 2003-06-30 2003-06-30 Video segment searching method based on contents

Country Status (1)

Country Link
CN (1) CN1206847C (en)


Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100447782C (en) * 2004-03-22 2008-12-31 微软公司 Method for duplicate detection and suppression
CN101107851B (en) * 2005-01-19 2010-12-15 皇家飞利浦电子股份有限公司 Apparatus and method for analyzing a content stream comprising a content item
CN1955964B (en) * 2005-10-28 2010-09-29 乐金电子(中国)研究开发中心有限公司 Video frequency retrieve method
CN102771115B (en) * 2009-12-29 2017-09-01 威智优构造技术有限责任公司 The video segment recognition methods of network television and context targeted content display methods
CN102771115A (en) * 2009-12-29 2012-11-07 电视互动系统有限公司 Method for identifying video segments and displaying contextually targeted content on a connected television
US8867892B2 (en) 2011-03-31 2014-10-21 Fujitsu Limited Method and apparatus for camera motion analysis in video
CN102737383B (en) * 2011-03-31 2014-12-17 富士通株式会社 Camera movement analyzing method and device in video
CN102737383A (en) * 2011-03-31 2012-10-17 富士通株式会社 Camera movement analyzing method and device in video
CN102222103A (en) * 2011-06-22 2011-10-19 央视国际网络有限公司 Method and device for processing matching relationship of video content
CN102222103B (en) * 2011-06-22 2013-03-27 央视国际网络有限公司 Method and device for processing matching relationship of video content
WO2013143465A1 (en) * 2012-03-27 2013-10-03 华为技术有限公司 Video query method, device and system
CN103605914A (en) * 2013-11-15 2014-02-26 南京云川信息技术有限公司 Method for computing piracy predictive indexes of network movie resources
CN103605914B (en) * 2013-11-15 2016-05-11 南京云川信息技术有限公司 A kind of computational methods of online movie resource infringement predictive index
CN103984778A (en) * 2014-06-06 2014-08-13 北京金山网络科技有限公司 Video retrieval method and video retrieval system
CN105183752A (en) * 2015-07-13 2015-12-23 中国电子科技集团公司第十研究所 Method for associated query of specific content of infrared video images
CN105183752B (en) * 2015-07-13 2018-08-10 中国电子科技集团公司第十研究所 The method of correlation inquiry Infrared video image specific content
CN106126619A (en) * 2016-06-20 2016-11-16 中山大学 A kind of video retrieval method based on video content and system
CN109982126A (en) * 2017-12-27 2019-07-05 艾迪普(北京)文化科技股份有限公司 A kind of stacking method of associated video
CN109246446A (en) * 2018-11-09 2019-01-18 东方明珠新媒体股份有限公司 Compare the method, apparatus and equipment of video content similitude
CN113886632A (en) * 2021-12-03 2022-01-04 杭州并坚科技有限公司 Video retrieval matching method based on dynamic programming
CN113886632B (en) * 2021-12-03 2022-04-01 杭州并坚科技有限公司 Video retrieval matching method based on dynamic programming

Also Published As

Publication number Publication date
CN1206847C (en) 2005-06-15

Similar Documents

Publication Publication Date Title
CN1206847C (en) Video segment searching method based on contents
US10867212B2 (en) Learning highlights using event detection
JP5711387B2 (en) Method and apparatus for comparing pictures
Papadopoulos et al. Social Event Detection at MediaEval 2011: Challenges, dataset and evaluation.
CN103686231B (en) Method and system for integrated management, failure replacement and continuous playing of film
US8457466B1 (en) Videore: method and system for storing videos from multiple cameras for behavior re-mining
CN101369281A (en) Retrieval method based on video abstract metadata
CN103164539B (en) A kind of combination user evaluates and the interactive image retrieval method of mark
CN109508671A (en) A kind of video accident detection system and method based on Weakly supervised study
CN103430175B (en) For the method and apparatus that video is compared
CN101853295A (en) Image search method
WO2018113673A1 (en) Method and apparatus for pushing search result of variety show query
CN103984778B (en) A kind of video retrieval method and system
CN1245697C (en) Method of proceeding video frequency searching through video frequency segment
EP2573685A1 (en) Ranking of heterogeneous information objects
Kuzey et al. Evin: Building a knowledge base of events
Liu et al. Query sensitive dynamic web video thumbnail generation
US20090106208A1 (en) Apparatus and method for content item annotation
Tsikrika et al. Image annotation using clickthrough data
CN106604068B (en) A kind of method and its system of more new media program
Tsai et al. Qualitative evaluation of automatic assignment of keywords to images
Peng et al. Clip-based similarity measure for hierarchical video retrieval
CN106844573B (en) Video abstract acquisition method based on manifold sorting
Brenner et al. Multimodal detection, retrieval and classification of social events in web photo collections
Stylianou et al. Indexing open imagery to create tools to fight sex trafficking

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20050615

CF01 Termination of patent right due to non-payment of annual fee