CN1461142A - Video segment searching method based on contents - Google Patents

Video segment searching method based on contents

Info

Publication number
CN1461142A
CN1461142A CN03148305A
Authority
CN
China
Prior art keywords
segment
similar
factor
shot
video
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN03148305A
Other languages
Chinese (zh)
Other versions
CN1206847C (en)
Inventor
彭宇新
杨宗桦
肖建国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIDA FANGZHENG TECHN INST Co Ltd BEIJING
Inst Of Computer Science & Technology Peking University
Original Assignee
BEIDA FANGZHENG TECHN INST Co Ltd BEIJING
Inst Of Computer Science & Technology Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIDA FANGZHENG TECHN INST Co Ltd BEIJING, Inst Of Computer Science & Technology Peking University filed Critical BEIDA FANGZHENG TECHN INST Co Ltd BEIJING
Priority to CNB031483054A
Publication of CN1461142A
Application granted
Publication of CN1206847C
Anticipated expiration
Expired - Fee Related

Landscapes

  • Image Analysis (AREA)

Abstract

The present invention relates to a content-based video segment retrieval method. It applies the maximum matching and optimal matching of graph theory to raise retrieval accuracy and retrieval speed. The method comprises the following steps: first, candidate similar segments are obtained preliminarily by examining the continuity of similar shots; then the Hungarian algorithm, which computes a maximum matching, determines the truly similar segments; finally, the Kuhn-Munkres algorithm for optimal matching is combined with a dynamic programming algorithm to solve the problem of measuring segment similarity. Experiments show that the method obtains higher retrieval accuracy and faster retrieval speed.

Description

A content-based video segment retrieval method
Technical field
The invention belongs to the technical field of video retrieval, and specifically relates to a content-based video segment retrieval method.
Background art
With the accumulation of television video programs, the growth of online digital video, and the spread of multimedia applications such as digital libraries, video on demand and distance learning, quickly retrieving the desired data from massive video collections has become extremely important. Traditional keyword-based video retrieval cannot meet the demands of massive video retrieval, owing to its limited descriptive power, strong subjectivity, manual annotation and poor intuitiveness. Therefore, since the 1990s, content-based video retrieval technology has become a hot research topic.
Content-based video clip retrieval is a principal mode of content-based video retrieval: given a query segment, it finds all segments similar to it in a video database. Content-based video clip retrieval must solve two problems and support two types of retrieval simultaneously. The two problems are: 1. automatically segmenting, in the video database, the multiple segments similar to the query segment; 2. ranking these similar segments from high to low similarity. The two types of retrieval are: 1. exact retrieval, where the retrieved segment is essentially the same as the query segment, with the same shots and frame sequence; 2. similarity retrieval, which covers two cases: either the original video has been edited in various ways, such as inserting/deleting frames (slow motion/snapshots), inserting/deleting shots, or reordering frames/shots; or the segments are different recordings of similar programs, such as different football matches. A good segment retrieval algorithm should solve both problems and perform both types of retrieval within a reasonable time.
Existing segment retrieval methods can be divided into two classes. The first, as described in "A Framework for Measuring Video Similarity and Its Application to Video Query by Example" [Y.P. Tan, S.R. Kulkarni, and P.J. Ramadge, IEEE International Conference on Image Processing, Vol. 2, pp. 106-110, 1999], treats a video segment at two levels, segment and frame, and measures segment similarity directly from the similarity of the constituent frames. The shortcoming of this class is that it requires similar segments to follow the same temporal order, a constraint that practical video programs do not obey: post-production editing means similar segments may have completely different shot orders, as with different edits of the same advertisement; at the same time, the frame-by-frame comparison makes retrieval slow. The second class, and the prior art closest to the present invention, is the paper "A Match and Tiling Approach to Content-based Video Retrieval" published at IEEE International Conference on Multimedia and Expo 2001 (by L. Chen and T.S. Chua, pp. 417-420). This document discloses a segment retrieval method that treats a video segment at three levels (segment, shot and frame) and comprises the following steps: (1) shot boundaries are first detected with the temporal multi-resolution analysis (TMRA) method, and each frame of each shot is then given a color code and a texture code; the color code uses the mean μ and variance σ of the Y component, and the texture code uses the fractal dimension (FD) feature; (2) assuming that similar frames inside two shots correspond in temporal order, the longest sequence of similar frames of the two shots is computed; the final similarity of two shots is expressed as a linear combination of the above three features, and a similarity threshold σ_L decides whether two shots are similar; (3) on this basis, a sliding window is used to find the segments similar to the query. This method performs exact retrieval and similarity retrieval simultaneously, but its problems are: (1) it considers only the number of similar shots of two segments and ignores the effect of many-to-many shot correspondences (granularity) on the overall similarity, so that even if all shots of segment Y are similar to only a single shot of segment X, Y is still judged similar to X; (2) the assumption it proposes does not hold, i.e. similar frames inside two shots may not correspond in temporal order; (3) shot similarity is judged from the longest similar frame sequence of the two shots, and this frame-by-frame comparison makes segment retrieval slow.
Summary of the invention
In view of the defects of existing video segment retrieval methods, the object of the invention is to propose a content-based video segment retrieval method that, on the basis of the existing technology, greatly improves the retrieval accuracy and retrieval speed of content-based video clip retrieval, so that video clip retrieval technology can play its great role in today's networked information society more fully. Another object of the invention is, while improving retrieval accuracy and retrieval speed, to make the ranking of the similar segments accord better with human psychological characteristics.
The object of the invention is achieved as follows: a content-based video segment retrieval method, comprising the following steps:
(1) First, shot boundary detection is performed with the spatio-temporal slice algorithm, dividing the query segment and the videos in the video database into shots; the camera motion information in each shot is then detected, and key frames are extracted or constructed to represent the shot content; shot similarity is measured by comparing the key frames of the shots of the query segment with the key frames of the shots of the video database, and according to the shot retrieval result, all shots in the video database similar to the shots of the query segment are retrieved;
(2) By examining the continuity of the similar shots, the segments similar to the query segment are preliminarily segmented;
(3) These segments include both truly similar segments and dissimilar ones; the Hungarian algorithm for maximum matching is used at this point to filter out the dissimilar segments, so that only similar segments are kept for the next step;
(4) For the similar segments, the optimal matching of graph theory computes their visual similarity to the query segment, i.e. the vision factor; based on the optimal matching result, a dynamic programming algorithm measures the temporal-order similarity of the two similar segments, i.e. the order factor; the interference factor is further measured; the final similarity of two segments is expressed as a linear combination of the above vision factor, order factor and interference factor.
It should be noted that, since the optimal matching computes the vision factor under the premise of one-to-one correspondence (granularity), and the computation of the order factor and the interference factor is also based on the optimal matching result, the final similarity measure in fact includes the measurement of the granularity factor.
To achieve the object of the invention better, when performing video clip retrieval, the theory, algorithms and results of bipartite graphs in graph theory are incorporated into the similarity measurement of video content; specifically, the Hungarian algorithm for maximum matching and the Kuhn-Munkres algorithm for optimal matching of graph theory are applied to content-based video clip retrieval.
Specifically, when performing video clip retrieval, the segments of the database video Y similar to the query segment X are preliminarily segmented as follows: the shots y_j of Y similar to X are sorted in ascending order, and the continuity of these y_j is then examined; whenever |y_{j+1} − y_j| > 2, j = 1, 2, ..., λ−1, a possible similar segment Y_k = {y_i, y_{i+1}, ..., y_j}, i, j ∈ [1, λ] is obtained, where λ is the length of the video Y, expressed in number of shots.
More specifically, when performing video clip retrieval, the Hungarian algorithm for maximum matching is used to filter the dissimilar segments and determine the truly similar ones: for the bipartite graph G_k = {X, Y_k, E_k}, if |M| ≥ n/2, then the segment Y_k is similar to the query segment X; in this formula, E_k = {e_ij}, e_ij indicates that x_i and y_j are similar, M ⊆ E_k is a maximum matching, no two edges of M are adjacent, and n is the number of shots of the query segment X.
Further, when performing video clip retrieval, the Kuhn-Munkres algorithm for optimal matching and a dynamic programming algorithm compute the concrete similarity of two segments. The Kuhn-Munkres algorithm for optimal matching computes the vision factor of the query segment X and a similar segment Y_k as Vision = ω/n, where ω is the weight of the optimal matching of the weighted bipartite graph G_k = {X, Y_k, E_k} and n is the number of shots of the query segment X. Based on the optimal matching result, the dynamic programming algorithm measures the order factor of the two similar segments as Order = c[n, l]/n, where c[n, l] is the length of the longest common subsequence defined below. The interference factor is further measured as Interference = 2|M|/(n + l), where l is the number of shots of the similar segment Y_k' and |M| is the number of edges of the optimal matching of G_k = {X, Y_k, E_k}. The final similarity of the two segments is expressed as the linear combination of the above vision factor, order factor and interference factor: Similarity(X, Y_k') = ω_1·Vision + ω_2·Order + ω_3·Interference, where ω_1, ω_2 and ω_3 represent the weights of the vision, order and interference factors respectively.
More specifically, when performing video clip retrieval, the method of computing the vision factor by optimal matching and determining the boundaries of the similar segment is as follows: the similarity value of each pair of similar shots is assigned as the weight of the corresponding edge of G_k = {X, Y_k, E_k}; G_k is thus converted into a weighted bipartite graph, and the Kuhn-Munkres algorithm computing the optimal matching is as follows (ω_ij denotes the weight of edge (x_i, y_j)):
(1) Give the initial labels l(x_i) = max_j ω_ij, l(y_j) = 0, i, j = 1, 2, ..., t, t = max(n, m);
(2) Obtain the edge set E_l = {(x_i, y_j) | l(x_i) + l(y_j) = ω_ij}, the subgraph G_l = (X, Y_k, E_l), and a matching M of G_l;
(3) If M saturates all nodes of X, then M is the optimal matching of G_k and the computation ends; otherwise go to the next step;
(4) Find an M-unsaturated node x_0 in X and let A ← {x_0}, B ← ∅, where A and B are two sets;
(5) If N_Gl(A) = B, go to step (9); otherwise go to the next step; here N_Gl(A) ⊆ Y_k is the set of nodes adjacent to the nodes of A;
(6) Find a node y ∈ N_Gl(A) − B;
(7) If y is M-saturated, find its matched node z, let A ← A ∪ {z}, B ← B ∪ {y}, and go to step (5); otherwise go to the next step;
(8) There is an M-augmenting path P from x_0 to y; let M ← M ⊕ E(P) and go to step (3);
(9) Compute the value a = min{ l(x_i) + l(y_j) − ω_ij : x_i ∈ A, y_j ∉ N_Gl(A) } and revise the labels:
l′(v) = l(v) − a for v ∈ A; l′(v) = l(v) + a for v ∈ N_Gl(A); l′(v) = l(v) for all other nodes;
then obtain E_l′ and G_l′ from l′;
(10) Let l ← l′, G_l ← G_l′, and go to step (6).
After the optimal matching M and its total weight ω are obtained, the vision factor is Vision = ω/n. To determine the boundaries of the segment of Y_k similar to X, the invention takes all shots y of Y_k covered by M and sorts them in ascending order as {y_α, y_β, ..., y_γ}, α, β, γ ∈ [1, m]; in this set y_α and y_β may be discontinuous, i.e. y_β − y_α > 1; according to the continuity definition of a video segment, the invention takes all shots between y_α and y_γ to constitute the similar segment Y_k' = {y_α, y_α+1, ..., y_γ}.
To implement the invention better, the dynamic programming algorithm can compute the order factor as follows: within the computed optimal matching M, further examine how Y_k' corresponds to X in temporal order, i.e. find the maximum number of matched shots whose edges are arranged in temporal order with X, and measure the order factor with it. This problem reduces to the longest common subsequence (LCS) problem: given the two sequences X = {x_1, x_2, ..., x_n} and Y_k' = {y_α, y_α+1, ..., y_γ}, find a longest common subsequence of X and Y_k'; the dynamic programming algorithm solves this effectively. For convenience of calculation, {y_α, y_α+1, ..., y_γ} is written as {y_1, y_2, ..., y_l}, l = γ − α + 1, and c[i, j] records the length of the longest common subsequence of X and Y_k'. The recurrence relation is established as follows:
c[i, j] = 0, if i = 0 or j = 0;
c[i, j] = c[i−1, j−1] + 1, if i, j > 0 and (x_i, y_j) ∈ M;
c[i, j] = max(c[i, j−1], c[i−1, j]), if i, j > 0 and (x_i, y_j) ∉ M.
The order factor is Order = c[n, l]/n.
The effect of the invention is that, with the video segment retrieval method of the invention, higher retrieval accuracy and faster retrieval speed can be obtained; another effect is that the ranking of the similar segments accords better with human psychological characteristics.
The invention achieves such a significant technical effect for the following reasons:
First, as described in the summary above, in order to segment the similar segments, the retrieval is divided into two stages, shot retrieval and segment retrieval. In the shot retrieval stage, the temporal information in the video is considered: the time-varying content inside a shot is decomposed into several sub-shots of consistent content, and this sub-shot basis reflects more comprehensively whether two shots are similar; it avoids both the inadequacy of existing methods that compare each shot by a single key frame only, and the slow retrieval speed caused by existing methods that compare frame by frame. In the segment retrieval stage, the candidate similar segments are obtained preliminarily by examining the continuity of similar shots, and the Hungarian algorithm for maximum matching then determines the truly similar segments. For ranking the similar segments, the invention considers the vision, granularity, temporal-order and interference factors of segment similarity measurement, and proposes to measure the influence of these factors by combining the Kuhn-Munkres algorithm for optimal matching with a dynamic programming algorithm. The invention is the first to apply the matching theory of graph theory to the video retrieval problem. The idea of matching requires similar shots to correspond one to one (granularity); under this condition, the computed maximum matching and optimal matching objectively and comprehensively reflect the number of similar shots of two segments and the degree of their visual similarity, thereby avoiding the granularity problem of shot counting in the existing method. Experimental results show that, compared with the existing method of the same function, the invention obtains outstanding results in both retrieval accuracy and retrieval speed.
Second, besides visual information, the similarity measurement of video segments also depends on the internal relations between the shots composing the segments; to reach the significant technical effect of the invention, the following four factors are considered in concrete retrieval:
(1) The vision factor: the greatest factor deciding whether two segments are similar, measured mainly from the similarity of the shots composing the segments;
(2) The granularity factor: a shot in one segment may be similar to several shots in the other segment, so one-to-many, many-to-one and many-to-many cases can appear in the similar-shot correspondence graph of two segments; a method is needed to measure the similarity of the different shot correspondences; for example, two segments in a many-to-one relationship should be given a lower similarity value;
(3) The order factor: two visually similar segments should not be judged dissimilar merely because of different shot orders; but of two segments equally similar in vision, the one whose temporal order also matches should be given the higher similarity value;
(4) The interference factor: in two similar segments, some shots may find no corresponding similar shot; the existence of these shots embodies the discontinuity of the correspondence and affects the final similarity of the two segments.
Third, a search strategy for content-based video segments is proposed: first find all segments visually similar to the query segment; then, for the similar segments only, compute their concrete similarity to the query segment. Since vision is the greatest factor in measuring whether two segments are similar, the advantages of this search strategy are that visually similar segments cannot be missed through the influence of the other factors, and that retrieval is faster, because the concrete similarity of dissimilar segments need not be computed.
Description of drawings
Fig. 1 is the overall framework of the invention, a schematic flowchart of the steps of the method;
Fig. 2 is the bipartite graph of two dissimilar segments;
Fig. 3 is the bipartite graph of two dissimilar segments;
Fig. 4 is the bipartite graph of two similar segments;
Fig. 5 is the result of applying the Hungarian algorithm for maximum matching to Fig. 3;
Fig. 6 is the result of applying the Hungarian algorithm for maximum matching to Fig. 4;
Fig. 7 is a retrieval result of the invention for a query video segment.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 shows the schematic flowchart of the steps of the invention, which comprise the following:
1. Shot retrieval
Shot boundary detection is first performed with the spatio-temporal slice algorithm, dividing the query segment X and the videos in the database Y into shots; for a detailed description of the spatio-temporal slice algorithm see "Video Partitioning by Temporal Slice Coherency" [C.W. Ngo, T.C. Pong, and R.T. Chin, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 11, No. 8, pp. 941-953, August 2001]. Then, following the method in "Motion-based Video Representation for Scene Change Detection" [C.W. Ngo, T.C. Pong, and H.J. Zhang, International Journal of Computer Vision, Vol. 50, No. 2, pp. 127-143, Nov 2002], the camera motion information in each shot is detected and key frames are extracted or constructed to represent the shot content. The similarity value Similarity(x_i, y_j) of two shots x_i and y_j is computed from their key frames. The invention then sets the threshold T = 0.5: when Similarity(x_i, y_j) > T, the two shots x_i and y_j are considered similar. With this rule, all shots y_j in the video database Y similar to shots x_i of the query segment X are retrieved.
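The following minimal Python sketch (for illustration only, not part of the patent text) expresses the threshold test of this step; shot_similarity is a hypothetical stand-in for the key-frame comparison described above:

```python
# Illustrative sketch of step 1: retrieve all database shots similar to
# some query shot, using the threshold T = 0.5 from the description.
# `shot_similarity` is a hypothetical stand-in for the key-frame comparison.
T = 0.5

def find_similar_shots(query_shots, db_shots, shot_similarity):
    # Return the sorted indices j of database shots similar to any query shot.
    similar = set()
    for x in query_shots:
        for j, y in enumerate(db_shots):
            if shot_similarity(x, y) > T:  # Similarity(x_i, y_j) > T
                similar.add(j)
    return sorted(similar)
```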
2. Preliminary segmentation of similar segments
In the database video Y, the shots similar to the query segment X are a minority, and a large number of shots are dissimilar. According to the definition that a segment is composed of a series of consecutive shots, the invention first sorts the shots y_j of Y similar to X in ascending order and then examines the continuity of these y_j: whenever |y_{j+1} − y_j| > 2, j = 1, 2, ..., λ−1, where λ is the length of Y (expressed in number of shots), a boundary is placed, yielding a possible similar segment Y_k = {y_i, y_{i+1}, ..., y_j}, i, j ∈ [1, λ]. The threshold |y_{j+1} − y_j| > 2 is taken for the robustness of the algorithm, because: (1) post-production editing may insert unrelated shots, as with different edits of the same advertisement, where the long version inserts a few dissimilar shots on the basis of the short one; (2) if a new segment begins, there is an interval between the two segments, and this interval is generally greater than 2 shots.
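As an illustrative sketch (assumed names, not the patent's code), the continuity test of step 2 amounts to splitting the sorted similar-shot indices wherever consecutive indices differ by more than 2:

```python
# Illustrative sketch of step 2: group sorted shot indices into candidate
# similar segments Y_k, starting a new segment whenever the gap between
# consecutive similar shots exceeds 2 (the robustness threshold).
def split_candidate_segments(similar_shot_indices):
    segments, current = [], []
    for j in sorted(similar_shot_indices):
        if current and j - current[-1] > 2:  # |y_{j+1} - y_j| > 2: new segment
            segments.append(current)
            current = []
        current.append(j)
    if current:
        segments.append(current)
    return segments

# Example: split_candidate_segments([3, 4, 6, 20, 21, 22])
# -> [[3, 4, 6], [20, 21, 22]]
```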
3. Confirming similar segments by maximum matching
Suppose the query segment is X = {x_1, x_2, ..., x_n} and each possible similar segment is Y_k = {y_1, y_2, ..., y_m}, where x_i and y_j denote shots. The similar-shot correspondence of X and Y_k can then be expressed as the bipartite graph G_k = {X, Y_k, E_k} of graph theory, with vertex set V_k = X ∪ Y_k and edge set E_k = {e_ij}, where e_ij indicates that x_i and y_j are similar.
The possible similar segments judged in step 2 include both dissimilar segments and truly similar ones. A large number of experimental observations can be reduced to the three typical cases of Fig. 2, Fig. 3 and Fig. 4, where Fig. 2 and Fig. 3 are bipartite graphs of dissimilar segments and Fig. 4 is the bipartite graph of similar segments. Because a video segment is composed of a series of shots expressing the same semantics, the shots inside one video segment are themselves similar to each other; we call this property the self-similarity of a video segment. Owing to this self-similarity, one-to-many, many-to-one and many-to-many cases generally appear in the bipartite graph of X and Y_k, as shown in Figs. 2, 3 and 4. Whether two segments are similar can be judged from the number of their similar shots. From the judgement of step 2 we know that essentially every y_j can find a similar shot x_i in X; but because of many-to-many similarity, not every x_i can necessarily find a similar shot y_j in Y_k. We therefore examine the similarity situation of the x_i. Since the length of Y_k may be smaller than the length of X, and considering the robustness of the algorithm, if half of the shots of X can find similar shots in Y_k, we consider that Y_k has enough shots similar to X, so Y_k is a similar segment of X. This rule effectively distinguishes the case of Fig. 2. But in both Fig. 3 and Fig. 4, six shots of the query segment X = {x_1, x_2, ..., x_8} find similar shots, so with this rule both would be judged similar segments, although Fig. 3 is a typical case of dissimilar segments.
Therefore, we further examine the similarity of Y_k and X under a one-to-one rather than a repeated correspondence. Applying the Hungarian algorithm for maximum matching to Fig. 3 and Fig. 4 yields Fig. 5 and Fig. 6. If |M| ≥ n/2, we consider that Y_k has enough shots similar to X, so it is a truly similar segment; in this formula M ⊆ E_k is a maximum matching, no two edges of M are adjacent, and n is the number of shots of the query segment X. From Fig. 5 and Fig. 6 we can thus clearly distinguish the dissimilar segment from the similar one. The concrete Hungarian algorithm is as follows:
(1) Start from an arbitrary initial matching M of the graph G_k;
(2) If M saturates all nodes of X, then M is a maximum matching and the computation ends; otherwise go to the next step;
(3) Find an M-unsaturated node x_0 in X and let A ← {x_0}, B ← ∅, where A and B are two sets;
(4) If N(A) = B, treat x_0 as saturated (a so-called pseudo-saturated node) and go to step (2); otherwise go to the next step; here N(A) ⊆ Y_k is the set of nodes adjacent to the nodes of A;
(5) Find a node y ∈ N(A) − B;
(6) If y is M-saturated, find its matched node z, let A ← A ∪ {z}, B ← B ∪ {y}, and go to step (4); otherwise go to the next step;
(7) There is an M-augmenting path P from x_0 to y; let M ← M ⊕ P (the ring sum, i.e. symmetric difference, of M and P) and go to step (2).
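For illustration, a minimal Python sketch (assumed data layout, not the patent's code) of the augmenting-path idea behind the Hungarian algorithm above, together with the |M| ≥ n/2 filter of this step; edges[i] is assumed to list the indices j of the Y_k shots similar to x_i:

```python
# Illustrative sketch of step 3: maximum matching by augmenting paths
# (Hungarian-style), then the |M| >= n/2 similarity filter.
def maximum_matching_size(n, m, edges):
    match_of_y = [-1] * m  # match_of_y[j] = X-node matched to y_j, or -1

    def augment(i, visited):
        # Try to extend the matching with an augmenting path from x_i.
        for j in edges[i]:
            if j not in visited:
                visited.add(j)
                if match_of_y[j] == -1 or augment(match_of_y[j], visited):
                    match_of_y[j] = i
                    return True
        return False

    return sum(1 for i in range(n) if augment(i, set()))

def is_truly_similar(n, m, edges):
    return maximum_matching_size(n, m, edges) >= n / 2
```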
4. The similarity model of video segments
Through the computation of step 3, we have obtained multiple segments visually similar to the query segment; next we consider ranking them from high to low similarity. We consider the following factors of segment similarity measurement:
(1) The vision factor: the greatest factor deciding whether two segments are similar, measured mainly from the similarity of the shots composing the segments;
(2) The granularity factor: a shot in one segment may be similar to several shots in the other segment, so one-to-many, many-to-one and many-to-many cases can appear in the similar-shot correspondence graph of two segments; a method is needed to measure the similarity of the different shot correspondences; for example, two segments in a many-to-one relationship should be given a lower similarity value;
(3) The order factor: two visually similar segments should not be judged dissimilar merely because of different shot orders; but of two segments equally similar in vision, the one whose temporal order also matches should be given the higher similarity value;
(4) The interference factor: in two similar segments, some shots may find no corresponding similar shot; the existence of these shots embodies the discontinuity of the correspondence and affects the final similarity of the two segments.
The invention represents and models the above similarity model based on the optimal matching of graph theory; a notable advantage of doing so is that the validity of the invention can be verified through optimal matching. Moreover, since vision is the most important criterion of similar segments, the invention does not, like existing methods, judge whether two segments are similar directly by a linear combination of the above factors; instead it first obtains the visually similar segments with maximum matching, and only then represents and models the similarity based on optimal matching, so visually similar segments cannot be missed through the influence of the other factors; in addition, since the computational complexity of maximum matching is lower than that of optimal matching, this also speeds up retrieval. Optimal matching, like maximum matching, is computed under the one-to-one granularity constraint. The other three factors are computed concretely as follows:
4.1 Computing the vision factor by optimal matching
We assign the similarity value of each pair of similar shots as the weight of the corresponding edge of G_k = {X, Y_k, E_k}; G_k is thus converted into a weighted bipartite graph, and the Kuhn-Munkres algorithm computing the optimal matching is as follows (ω_ij denotes the weight of edge (x_i, y_j)):
(1) Give the initial labels l(x_i) = max_j ω_ij, l(y_j) = 0, i, j = 1, 2, ..., t, t = max(n, m);
(2) Obtain the edge set E_l = {(x_i, y_j) | l(x_i) + l(y_j) = ω_ij}, the subgraph G_l = (X, Y_k, E_l), and a matching M of G_l;
(3) If M saturates all nodes of X, then M is the optimal matching of G_k and the computation ends; otherwise go to the next step;
(4) Find an M-unsaturated node x_0 in X and let A ← {x_0}, B ← ∅, where A and B are two sets;
(5) If N_Gl(A) = B, go to step (9); otherwise go to the next step; here N_Gl(A) ⊆ Y_k is the set of nodes adjacent to the nodes of A;
(6) Find a node y ∈ N_Gl(A) − B;
(7) If y is M-saturated, find its matched node z, let A ← A ∪ {z}, B ← B ∪ {y}, and go to step (5); otherwise go to the next step;
(8) There is an M-augmenting path P from x_0 to y; let M ← M ⊕ E(P) and go to step (3);
(9) Compute the value a = min{ l(x_i) + l(y_j) − ω_ij : x_i ∈ A, y_j ∉ N_Gl(A) } and revise the labels:
l′(v) = l(v) − a for v ∈ A; l′(v) = l(v) + a for v ∈ N_Gl(A); l′(v) = l(v) for all other nodes;
then obtain E_l′ and G_l′ from l′;
(10) Let l ← l′, G_l ← G_l′, and go to step (6).
After the optimal matching M and its total weight ω are obtained, the invention defines the vision factor Vision = ω/n. To determine the boundaries of the segment of Y_k similar to X, the invention takes all shots y of Y_k covered by M and sorts them in ascending order as {y_α, y_β, ..., y_γ}, α, β, γ ∈ [1, m]; in this set y_α and y_β may be discontinuous, i.e. y_β − y_α > 1; according to the continuity definition of a video segment, the invention takes all shots between y_α and y_γ to constitute the similar segment Y_k' = {y_α, y_α+1, ..., y_γ}.
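As an illustrative sketch of 4.1 (not the patent's code), the vision factor and the boundaries of Y_k' can be computed from an optimal matching; here SciPy's linear_sum_assignment stands in for the Kuhn-Munkres procedure listed above, and weights is a hypothetical n×m array holding Similarity(x_i, y_j), zero where the shots are not similar:

```python
# Illustrative sketch of 4.1: Vision = omega / n, and the span of matched
# Y_k shots gives the boundaries of the similar segment Y_k'.
import numpy as np
from scipy.optimize import linear_sum_assignment

def vision_factor_and_boundaries(weights):
    w = np.asarray(weights, dtype=float)  # shape (n, m)
    n, m = w.shape
    t = max(n, m)
    padded = np.zeros((t, t))             # pad to square, as t = max(n, m)
    padded[:n, :m] = w
    rows, cols = linear_sum_assignment(padded, maximize=True)

    omega = padded[rows, cols].sum()      # total weight of the matching
    vision = omega / n                    # Vision = omega / n

    # Y_k shots actually matched by a positive-weight edge
    matched = sorted(int(c) for r, c in zip(rows, cols)
                     if r < n and c < m and padded[r, c] > 0)
    # Y_k' spans all shots from the first to the last matched shot
    boundaries = (matched[0], matched[-1]) if matched else None
    return vision, boundaries
```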
4.2 Computing the order factor by dynamic programming
Within the optimal matching M computed in 4.1, we further examine how Y_k' corresponds to X in temporal order, i.e. find the maximum number of matched shots whose edges are arranged in temporal order with X, and measure the order factor with it. This problem reduces to the longest common subsequence (LCS) problem: given the two sequences X = {x_1, x_2, ..., x_n} and Y_k' = {y_α, y_α+1, ..., y_γ}, find a longest common subsequence of X and Y_k'; the dynamic programming algorithm solves this effectively. For convenience of calculation, {y_α, y_α+1, ..., y_γ} is written as {y_1, y_2, ..., y_l}, l = γ − α + 1, and c[i, j] records the length of the longest common subsequence of X and Y_k'. The recurrence relation is established as follows:
c[i, j] = 0, if i = 0 or j = 0;
c[i, j] = c[i−1, j−1] + 1, if i, j > 0 and (x_i, y_j) ∈ M;
c[i, j] = max(c[i, j−1], c[i−1, j]), if i, j > 0 and (x_i, y_j) ∉ M.
The invention defines the order factor Order = c[n, l]/n.
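For illustration only, the recurrence of 4.2 translates directly into a small dynamic program; matched is a hypothetical set of optimal-matching edges (i, j), with X indexed 1..n and Y_k' reindexed 1..l as in the text:

```python
# Illustrative sketch of 4.2: Order = c[n, l] / n via the LCS recurrence.
def order_factor(n, l, matched):
    c = [[0] * (l + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, l + 1):
            if (i, j) in matched:                 # x_i matched to y_j in M
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])
    return c[n][l] / n
```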
4.3 Computing the interference factor
In the optimal matching M, a few shots of X and Y_k' are associated with no edge; these shots cannot find corresponding similar shots, and their existence embodies the discontinuity of the correspondence. The invention defines the interference factor Interference = 2|M|/(n + l), where n is the number of shots of the query segment X and l is the number of shots of the similar segment Y_k'. This equation expresses the proportion of shots, among all shots of the two similar segments X and Y_k', that can find a corresponding similar shot.
4.4 Computing the total similarity
According to the preceding analysis, the invention computes the similarity of the query segment X and a similar segment Y_k' with the formula: Similarity(X, Y_k') = ω_1·Vision + ω_2·Order + ω_3·Interference
Here ω_1, ω_2 and ω_3 express the degree of attention people pay to the vision, order and interference factors, and different users can adjust them according to their own preference for these three criteria. In the invention, ω_1 = 0.4, ω_2 = 0.3 and ω_3 = 0.3 are taken; experimental results show that this choice accords with the human criterion of similarity.
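A final illustrative sketch (assumed inputs) combining 4.1-4.4 under the weights stated above:

```python
# Illustrative sketch of 4.3-4.4: interference factor and total similarity.
W1, W2, W3 = 0.4, 0.3, 0.3  # weights omega_1, omega_2, omega_3

def total_similarity(vision, order, matching_size, n, l):
    interference = 2 * matching_size / (n + l)  # Interference = 2|M|/(n+l)
    return W1 * vision + W2 * order + W3 * interference
```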
The outstanding performance of the invention in video clip retrieval is illustrated below with experimental results. The experimental data are several days of programs recorded from television. This video database is very challenging: 3 hours 11 minutes in total, 4714 shots and 286936 frames, covering advertisements, news, sports and film programs of various types. It contains repeated identical video segments, such as news openings and advertisements, and also many repeated similar video segments, such as different tennis matches in the sports programs and versions of the same advertisement with different durations and edits. To verify the validity of the invention, the existing method was used as the experimental baseline, for two main reasons: (1) it is currently the best method on the given experimental data, and also the most recent; (2) its function is consistent with that of the invention: it can automatically segment similar segments in the video database and then rank them from high to low similarity. In video clip retrieval, besides retrieval accuracy, retrieval speed is also a very important index; in view of this, the retrieval speeds of the two methods were also compared. The test machine was a dual Pentium III 1 GHz with 256 MB of memory.
Fig. 7 shows the user interface of the experimental program: the top row is the queried advertisement, displayed by its key frames; below it are the retrieval results, arranged in decreasing order of similarity. The first retrieved row is the query segment itself, whose similarity is naturally the highest; the remaining segments are arranged in decreasing order of similarity. It can be seen that the ranked similar segments embody the effect of the different factors of step 4; for example, the first three segments are closer to the query segment in temporal order. Concrete experimental results are given in Table 1 and Table 2 respectively.
Table 1. Experimental results of exact video segment retrieval

Query segment | Frames | The invention: precision / recall / speed (s) | Existing method: precision / recall / speed (s)
1. News opening | 832 | 100% / 100% / 108 | 75% / 100% / 230
2. Football news | 715 | 100% / 100% / 74 | 100% / 100% / 196
3. Huiyuan advertisement | 367 | 100% / 100% / 167 | 33.3% / 100% / 97
4. Bright advertisement | 374 | 100% / 100% / 89 | 100% / 100% / 101
5. Fulinmen advertisement | 432 | 100% / 100% / 99 | 100% / 100% / 116
Average | 544 | 100% / 100% / 107 | 81.7% / 100% / 148
As can be seen from Table 1, both the invention and the existing method obtained 100% recall, but in precision the invention is better than the existing method; the main reason is that the existing method only counts the number of similar shots of two segments, while the invention considers the correspondence relations of the similar shots. In retrieval speed the invention is faster than the existing method: according to our experiments, the total retrieval time essentially equals the time of similar-shot judgement, and the existing method compares frame by frame in temporal order, while the invention only needs to compare the key frames of each shot, so the retrieval speed of the invention is much faster than that of the existing method.
Table 2. Experimental results of video segment similarity retrieval

Query segment | Frames | The invention: precision / recall / speed (s) | Existing method: precision / recall / speed (s)
1. Tennis match | 507 | 100% / 50% / 49 | 100% / 50% / 140
2. Doctor rescuing a patient | 1806 | 60% / 85.7% / 93 | 50% / 50% / 507
3. TCL advertisement | 374 | 100% / 100% / 116 | 85.7% / 100% / 100
4. Melatonin advertisement | 374 | 100% / 100% / 129 | 100% / 100% / 100
5. Amoisonic advertisement | 374 | 100% / 100% / 103 | 100% / 50% / 99
Average | 687 | 92% / 87.1% / 98 | 87.1% / 70% / 189
In Table 2 the invention is better than the existing method in both recall and precision. Query segments 1 and 2 are two very difficult queries. Tennis matches appear 4 times in our video database, and the invention missed two of them: the query used a blue tennis court, while one missed match was on a green court and the other consisted mainly of shots of players and spectators with few shots showing the blue court; the existing method missed these two segments as well. Like query segment 1, query segment 2 is a segment with strong semantics whose color features are hard to exploit; taking the whole segment together reflects the basic color features of the semantics, and the invention also obtained a good retrieval effect. In retrieval speed the invention is again faster than the existing method, and the longer the query segment, the more obvious the advantage of the invention; for query segment 2, for example, the invention is more than 5 times faster than the existing method. In addition, as shown in Fig. 7, a significant advantage of the invention over the existing method also appears in the ranking of the similar segments by decreasing similarity: besides visual features, the invention considers the different factors of similar segments, while the similarity of the existing method depends only on the number of similar shots; tests with several people show that the ranking of similar segments by the invention accords better with human visual and psychological characteristics.
By collecting 3 hours 11 minutes of video programs and experimentally comparing with the existing method that currently has the best and most recent experimental results, it is shown that the video segment retrieval method of the invention obtains higher retrieval accuracy and faster retrieval speed, and that its ranking of the similar segments accords better with human psychological characteristics. Besides the 6 advertisement queries listed in Table 1 and Table 2, we queried dozens of differently edited advertisements, and the invention obtained 100% precision and recall.

Claims (7)

1. A content-based video segment retrieval method, comprising the following steps:
(1) first performing shot boundary detection, dividing the query segment and the videos in the video database into shots; then measuring the similarity between the shots of the query segment and the shots of the video database, and, according to the measurement result, retrieving all shots in the video database similar to the shots of the query segment;
(2) preliminarily segmenting, by examining the continuity of the similar shots, the segments similar to the query segment;
(3) these segments including both truly similar segments and dissimilar ones, the maximum matching of graph theory being used at this point to filter out the dissimilar segments, so that only similar segments are kept for the next step;
(4) for the similar segments, computing with the optimal matching of graph theory their visual similarity to the query segment, i.e. the vision factor; measuring with a dynamic programming algorithm, based on the optimal matching result, the temporal-order similarity of the two similar segments, i.e. the order factor; further measuring the interference factor; and expressing the final similarity of two segments as a linear combination of the above vision factor, order factor and interference factor.
2. The content-based video segment retrieval method as claimed in claim 1, characterized in that, when performing video clip retrieval, the theory, algorithms and results of bipartite graphs in graph theory are incorporated into the similarity measurement of video content; specifically, the Hungarian algorithm for maximum matching and the Kuhn-Munkres algorithm for optimal matching of graph theory are applied to content-based video clip retrieval.
3. The content-based video segment retrieval method as claimed in claim 2, characterized in that, in step (3), the Hungarian algorithm for maximum matching is used to filter the dissimilar segments and determine the truly similar ones: for the bipartite graph G_k = {X, Y_k, E_k}, if |M| ≥ n/2, then the segment Y_k is similar to the query segment X; in this formula, E_k = {e_ij}, e_ij indicates that x_i and y_j are similar, M ⊆ E_k is a maximum matching, no two edges of M are adjacent, and n is the number of shots of the query segment X.
4. The content-based video segment retrieval method as claimed in claim 2, characterized in that, in step (4), the Kuhn-Munkres algorithm for optimal matching and a dynamic programming algorithm compute the concrete similarity of two segments: the Kuhn-Munkres algorithm for optimal matching computes the vision factor of the query segment X and a similar segment Y_k as Vision = ω/n, where ω is the weight of the optimal matching of the weighted bipartite graph G_k = {X, Y_k, E_k} and n is the number of shots of the query segment X; based on the optimal matching result, the dynamic programming algorithm measures the order factor of the two similar segments as Order = c[n, l]/n, where c[n, l] is the length of the longest common subsequence of the matched shot sequences; the interference factor is further measured as Interference = 2|M|/(n + l), where l is the number of shots of the similar segment Y_k' and |M| is the number of edges of the optimal matching of G_k = {X, Y_k, E_k}; the final similarity of the two segments is expressed as the linear combination of the above vision factor, order factor and interference factor: Similarity(X, Y_k') = ω_1·Vision + ω_2·Order + ω_3·Interference, where ω_1, ω_2 and ω_3 represent the weights of the vision, order and interference factors respectively.
5. The content-based video segment retrieval method as claimed in claim 1, characterized in that, in step (2), the segments of the database video Y similar to the query segment X are preliminarily segmented as follows: the shots y_j of Y similar to the query segment X are sorted in ascending order, and the continuity of these y_j is then examined; whenever |y_{j+1} − y_j| > 2, j = 1, 2, ..., λ−1, a possible similar segment Y_k = {y_i, y_{i+1}, ..., y_j}, i, j ∈ [1, λ] is obtained, where λ is the length of the video Y, expressed in number of shots.
6. The content-based video segment retrieval method as claimed in claim 4, characterized in that the method of computing the vision factor by optimal matching and determining the boundaries of the similar segment is as follows:
The similarity value of each pair of similar shots is assigned as the weight of the corresponding edge of G_k = {X, Y_k, E_k}; G_k is thus converted into a weighted bipartite graph, and the Kuhn-Munkres algorithm computing the optimal matching is as follows (ω_ij denotes the weight of edge (x_i, y_j)):
(1) Give the initial labels l(x_i) = max_j ω_ij, l(y_j) = 0, i, j = 1, 2, ..., t, t = max(n, m);
(2) Obtain the edge set E_l = {(x_i, y_j) | l(x_i) + l(y_j) = ω_ij}, the subgraph G_l = (X, Y_k, E_l), and a matching M of G_l;
(3) If M saturates all nodes of X, then M is the optimal matching of G_k and the computation ends; otherwise go to the next step;
(4) Find an M-unsaturated node x_0 in X and let A ← {x_0}, B ← ∅, where A and B are two sets;
(5) If N_Gl(A) = B, go to step (9); otherwise go to the next step; here N_Gl(A) ⊆ Y_k is the set of nodes adjacent to the nodes of A;
(6) Find a node y ∈ N_Gl(A) − B;
(7) If y is M-saturated, find its matched node z, let A ← A ∪ {z}, B ← B ∪ {y}, and go to step (5); otherwise go to the next step;
(8) There is an M-augmenting path P from x_0 to y; let M ← M ⊕ E(P) and go to step (3);
(9) Compute the value a = min{ l(x_i) + l(y_j) − ω_ij : x_i ∈ A, y_j ∉ N_Gl(A) } and revise the labels:
l′(v) = l(v) − a for v ∈ A; l′(v) = l(v) + a for v ∈ N_Gl(A); l′(v) = l(v) for all other nodes;
then obtain E_l′ and G_l′ from l′;
(10) Let l ← l′, G_l ← G_l′, and go to step (6);
After the optimal matching M and its total weight ω are obtained, the vision factor is Vision = ω/n; to determine the boundaries of the segment of Y_k similar to X, all shots y of Y_k covered by M are taken and sorted in ascending order as {y_α, y_β, ..., y_γ}, α, β, γ ∈ [1, m]; in this set y_α and y_β may be discontinuous, i.e. y_β − y_α > 1; according to the continuity definition of a video segment, all shots between y_α and y_γ are taken to constitute the similar segment Y_k' = {y_α, y_α+1, ..., y_γ}.
7. The content-based video segment retrieval method as claimed in claim 4, characterized in that the dynamic programming algorithm computes the order factor as follows:
Within the computed optimal matching M, the correspondence of Y_k' and X in temporal order is further examined, i.e. the maximum number of matched shots whose edges are arranged in temporal order with X is found, and the order factor is measured with it. This problem reduces to the longest common subsequence (LCS) problem: given the two sequences X = {x_1, x_2, ..., x_n} and Y_k' = {y_α, y_α+1, ..., y_γ}, find a longest common subsequence of X and Y_k'; the dynamic programming algorithm solves this effectively. For convenience of calculation, {y_α, y_α+1, ..., y_γ} is written as {y_1, y_2, ..., y_l}, l = γ − α + 1, and c[i, j] records the length of the longest common subsequence of X and Y_k'. The recurrence relation is established as follows:
c[i, j] = 0, if i = 0 or j = 0;
c[i, j] = c[i−1, j−1] + 1, if i, j > 0 and (x_i, y_j) ∈ M;
c[i, j] = max(c[i, j−1], c[i−1, j]), if i, j > 0 and (x_i, y_j) ∉ M.
The order factor is Order = c[n, l]/n.
CNB031483054A 2003-06-30 2003-06-30 Video segment searching method based on contents Expired - Fee Related CN1206847C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB031483054A CN1206847C (en) 2003-06-30 2003-06-30 Video segment searching method based on contents

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB031483054A CN1206847C (en) 2003-06-30 2003-06-30 Video segment searching method based on contents

Publications (2)

Publication Number Publication Date
CN1461142A true CN1461142A (en) 2003-12-10
CN1206847C CN1206847C (en) 2005-06-15

Family

ID=29591422

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB031483054A Expired - Fee Related CN1206847C (en) 2003-06-30 2003-06-30 Video segment searching method based on contents

Country Status (1)

Country Link
CN (1) CN1206847C (en)


Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100447782C (en) * 2004-03-22 2008-12-31 微软公司 Method for duplicate detection and suppression
CN101107851B (en) * 2005-01-19 2010-12-15 皇家飞利浦电子股份有限公司 Apparatus and method for analyzing a content stream comprising a content item
CN1955964B (en) * 2005-10-28 2010-09-29 乐金电子(中国)研究开发中心有限公司 Video frequency retrieve method
CN102771115B (en) * 2009-12-29 2017-09-01 威智优构造技术有限责任公司 The video segment recognition methods of network television and context targeted content display methods
CN102771115A (en) * 2009-12-29 2012-11-07 电视互动系统有限公司 Method for identifying video segments and displaying contextually targeted content on a connected television
US8867892B2 (en) 2011-03-31 2014-10-21 Fujitsu Limited Method and apparatus for camera motion analysis in video
CN102737383B (en) * 2011-03-31 2014-12-17 富士通株式会社 Camera movement analyzing method and device in video
CN102737383A (en) * 2011-03-31 2012-10-17 富士通株式会社 Camera movement analyzing method and device in video
CN102222103A (en) * 2011-06-22 2011-10-19 央视国际网络有限公司 Method and device for processing matching relationship of video content
CN102222103B (en) * 2011-06-22 2013-03-27 央视国际网络有限公司 Method and device for processing matching relationship of video content
WO2013143465A1 (en) * 2012-03-27 2013-10-03 华为技术有限公司 Video query method, device and system
CN103605914A (en) * 2013-11-15 2014-02-26 南京云川信息技术有限公司 Method for computing piracy predictive indexes of network movie resources
CN103605914B (en) * 2013-11-15 2016-05-11 南京云川信息技术有限公司 A kind of computational methods of online movie resource infringement predictive index
CN103984778A (en) * 2014-06-06 2014-08-13 北京金山网络科技有限公司 Video retrieval method and video retrieval system
CN105183752A (en) * 2015-07-13 2015-12-23 中国电子科技集团公司第十研究所 Method for associated query of specific content of infrared video images
CN105183752B (en) * 2015-07-13 2018-08-10 中国电子科技集团公司第十研究所 The method of correlation inquiry Infrared video image specific content
CN106126619A (en) * 2016-06-20 2016-11-16 中山大学 A kind of video retrieval method based on video content and system
CN109982126A (en) * 2017-12-27 2019-07-05 艾迪普(北京)文化科技股份有限公司 A kind of stacking method of associated video
CN109246446A (en) * 2018-11-09 2019-01-18 东方明珠新媒体股份有限公司 Compare the method, apparatus and equipment of video content similitude
CN113886632A (en) * 2021-12-03 2022-01-04 杭州并坚科技有限公司 Video retrieval matching method based on dynamic programming
CN113886632B (en) * 2021-12-03 2022-04-01 杭州并坚科技有限公司 Video retrieval matching method based on dynamic programming

Also Published As

Publication number Publication date
CN1206847C (en) 2005-06-15

Similar Documents

Publication Publication Date Title
CN1206847C (en) Video segment searching method based on contents
US10867212B2 (en) Learning highlights using event detection
JP5711387B2 (en) Method and apparatus for comparing pictures
Papadopoulos et al. Social Event Detection at MediaEval 2011: Challenges, dataset and evaluation.
CN103686231B (en) Method and system for integrated management, failure replacement and continuous playing of film
US8457466B1 (en) Videore: method and system for storing videos from multiple cameras for behavior re-mining
CN101369281A (en) Retrieval method based on video abstract metadata
CN103164539B (en) A kind of combination user evaluates and the interactive image retrieval method of mark
CN109508671A (en) A kind of video accident detection system and method based on Weakly supervised study
CN103430175B (en) For the method and apparatus that video is compared
CN101853295A (en) Image search method
WO2018113673A1 (en) Method and apparatus for pushing search result of variety show query
CN103984778B (en) A kind of video retrieval method and system
CN1245697C (en) Method of proceeding video frequency searching through video frequency segment
EP2573685A1 (en) Ranking of heterogeneous information objects
Kuzey et al. Evin: Building a knowledge base of events
Liu et al. Query sensitive dynamic web video thumbnail generation
US20090106208A1 (en) Apparatus and method for content item annotation
Tsikrika et al. Image annotation using clickthrough data
CN106604068B (en) A kind of method and its system of more new media program
Tsai et al. Qualitative evaluation of automatic assignment of keywords to images
Peng et al. Clip-based similarity measure for hierarchical video retrieval
CN106844573B (en) Video abstract acquisition method based on manifold sorting
Brenner et al. Multimodal detection, retrieval and classification of social events in web photo collections
Stylianou et al. Indexing open imagery to create tools to fight sex trafficking

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20050615

CF01 Termination of patent right due to non-payment of annual fee