CN102314613B - Information theory and support vector machine-based shot boundary detection algorithm


Info

Publication number
CN102314613B
Authority
CN
China
Prior art keywords
frame
video
frames
histogram
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN 201110188738
Other languages
Chinese (zh)
Other versions
CN102314613A (en)
Inventor
毕佳磊
郎波
刘祥龙
李未
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN 201110188738 priority Critical patent/CN102314613B/en
Publication of CN102314613A publication Critical patent/CN102314613A/en
Application granted granted Critical
Publication of CN102314613B publication Critical patent/CN102314613B/en


Abstract

The invention relates to an information theory and support vector machine-based shot boundary detection algorithm, which comprises the following steps: (1) obtain the total number of frames (nFrame) of a video, compute the mutual information MI_1 and the joint entropy JE_1 of all adjacent frame pairs, and compute the mutual information MI_2 between every pair of frames separated by one frame; (2) for each frame t satisfying the boundary conditions and for each k, compute the ratio of the mean of the mutual information MI_k within a window w1 centered at t to MI_k(t), take the ratio as the dissimilarity between frame t and frame t+k, and record it as D_k(t); (3) for each frame t satisfying the boundary conditions, construct a feature vector F(t) of dimension 2*w2 from D_k(t), using a window w2 centered at t; and (4) feed F(t) to a trained support vector machine, which outputs whether frame t is a shot boundary. The algorithm overcomes the shortcomings of the traditional threshold-based method while also reaching higher precision and recall.

Description

A shot boundary detection algorithm based on information theory and support vector machines
Technical field
The present invention is an algorithm that, on the basis of information theory, uses the values of mutual information and joint entropy to construct feature vectors, trains a support vector machine, and performs shot boundary detection with the trained machine. It is a prerequisite for content-based information retrieval over unstructured video data.
Background technology
With the rapid development of multimedia and network technologies, digital video has become easier and easier to acquire and distribute, and has gradually become one of the main carriers of human information. With video information expanding so quickly, the problem that follows is how to efficiently retrieve and browse massive amounts of video. Traditional video retrieval works by manually attaching textual labels to videos; this approach involves an enormous workload, is very inefficient, and is affected by subjective factors, so it cannot meet the needs of practical use. Content-based video retrieval lets the computer retrieve video by its content through a process of low-level to high-level processing, analysis, and understanding. It overcomes the deficiencies of traditional text-based retrieval and has become a research hotspot in the field of multimedia information retrieval.
Video shot segmentation is the prerequisite of content-based video retrieval. Many shot boundary detection algorithms have been proposed so far: methods based on pixel comparison, methods based on brightness or color histogram comparison, methods based on mutual information comparison, methods based on machine learning, and so on. Traditional threshold-based methods find it hard to choose a universal threshold, while machine learning-based methods require a training set to be constructed before they can predict.
Summary of the invention
The technical problem solved by the present invention: choose mutual information, which changes markedly at abrupt shot cuts, as the basic feature, construct feature vectors whose classes are clearly separable, and use the support vector machine method from machine learning to perform shot boundary detection, thereby overcoming the deficiency of existing traditional threshold-based methods and detecting shot boundaries in video more accurately and effectively.
The technical solution of the present invention: a shot boundary detection algorithm based on information theory and a support vector machine, characterized by the following steps:
(1) obtain the total number of frames nFrame of the video, extract the RGB color information of each pixel of each frame in the video, compute the mutual information MI_1 and the joint entropy JE_1 of all adjacent frame pairs, compute the mutual information MI_2 between all pairs of frames separated by one frame, and save these quantities;
(2) for each frame t satisfying the first boundary condition and for each frame interval k, k = 1 or 2, compute the ratio of the mean of the mutual information MI_k within a window of length w1 centered at t to MI_k(t), take this ratio as the dissimilarity between frame t and frame t+k, and record it as D_k(t);
(3) for each frame t satisfying the second boundary condition, take a window of length w2 centered at t and construct from D_k(t) a feature vector F(t) of dimension 2*w2;
(4) feed F(t) to the trained support vector machine, which outputs whether frame t is a shot boundary;
According to a further aspect of the invention, step (1) further comprises the following steps (an illustrative sketch follows this list):
(a) set the initial value of i to 1 and obtain the total number of frames nFrame of the video;
(b) capture the i-th frame of the video, extract the RGB information of each pixel in the frame, and compute a color histogram for each of the three RGB color components: H_R(i), H_G(i), H_B(i);
(c) if i-1 ≥ 1, compute the joint histograms JH_R(i-1, i), JH_G(i-1, i), JH_B(i-1, i) of frames i-1 and i for the three RGB components, and use the histograms and joint histograms to compute the mutual information MI_1(i-1) and joint entropy JE_1(i-1) of the two frames;
(d) if i-2 ≥ 1, compute the joint histograms JH_R(i-2, i), JH_G(i-2, i), JH_B(i-2, i) of frames i-2 and i for the three RGB components, use the histograms and joint histograms to compute the mutual information MI_2(i-2) of the two frames, and delete the temporary data buffered for frame i-2, such as its RGB data, histograms and joint histograms;
(e) if i < nFrame, set i = i+1 and go to (b);
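As an illustration only (not part of the patented method's literal text), the per-frame-pair computation in sub-steps (b)-(d) can be sketched as follows in Python with NumPy. The 32-bin quantization matches the bins value given in the experiments below, the summation of the three RGB channel contributions is an assumption consistent with the cited Cernekova et al. work, and all function names are placeholders of my own.

```python
import numpy as np

BINS = 32  # histogram segments, as in the experiments (assumes 8-bit RGB channels)

def channel_histograms(frame):
    """Step (b): per-channel color histograms H_R(i), H_G(i), H_B(i) of an (H, W, 3) uint8 frame."""
    q = (frame.astype(np.uint16) * BINS) // 256            # quantize each channel to BINS levels
    return [np.bincount(q[..., c].ravel(), minlength=BINS) for c in range(3)]

def joint_histogram(frame_a, frame_b, c):
    """Joint histogram JH_c of channel c of two equally sized frames."""
    qa = (frame_a[..., c].astype(np.uint16) * BINS) // 256
    qb = (frame_b[..., c].astype(np.uint16) * BINS) // 256
    pairs = qa.ravel().astype(np.int64) * BINS + qb.ravel()
    return np.bincount(pairs, minlength=BINS * BINS).reshape(BINS, BINS)

def mi_and_je(frame_a, frame_b):
    """Steps (c)/(d): mutual information and joint entropy of two frames, summed over R, G, B."""
    mi, je = 0.0, 0.0
    for c in range(3):
        jh = joint_histogram(frame_a, frame_b, c).astype(np.float64)
        p_xy = jh / jh.sum()                                # joint distribution of the two frames
        p_x = p_xy.sum(axis=1, keepdims=True)               # marginal of frame_a (equals its histogram)
        p_y = p_xy.sum(axis=0, keepdims=True)               # marginal of frame_b
        nz = p_xy > 0
        mi += float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))
        je += float(-np.sum(p_xy[nz] * np.log(p_xy[nz])))
    return mi, je
```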
According to a further aspect of the invention, step (2) further comprises the following (a sketch follows the formulas):
(a) when computing D_k(t), the first boundary condition on t is: t - w1/2 ≥ 1, t + k + w1/2 ≤ nFrame, 1 ≤ k ≤ 2;
(b) D_k(t) is computed with the following formulas:
MI_k(t, w1) = \frac{1}{w1} \sum_{i \in w1,\, i \ne t} MI_k(i)
D_k(t) = \frac{MI_k(t, w1)}{MI_k(t)};
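As a minimal sketch of this window ratio, assuming mi_k is a NumPy array with mi_k[i] holding MI_k between frames i and i+k (indexed from 0 rather than from 1 as in the text), and assuming the window consists of the w1 frames around t excluding t itself, which matches the 1/w1 normalization in the formula:

```python
import numpy as np

def dissimilarity(mi_k, t, w1=4):
    """D_k(t): mean of MI_k over the w1 neighbours of t (t itself excluded), divided by MI_k(t).

    The first boundary condition on t is assumed to be checked by the caller.
    """
    lo, hi = t - w1 // 2, t + w1 // 2
    window = np.concatenate([mi_k[lo:t], mi_k[t + 1:hi + 1]])   # i in the window, i != t
    return float(window.mean() / mi_k[t])
```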
According to a further aspect of the invention, step (3) further comprises the following (a sketch follows sub-step (d)):
(a) when computing the feature vector F(t), the second boundary condition on t is: t - w2/2 ≥ 1, t + w2/2 - 1 ≤ nFrame;
(b) for k = 1, 2, construct the w2-dimensional vector
V(t, k) = \{ D_k(t - w2/2), \ldots, D_k(t), \ldots, D_k(t + w2/2 - 1) \};
(c) if JE_1(t) > T (T is an empirical threshold), normalize V(t, k) by its root mean square RMS:
V'(t, k) = V(t, k) / RMS;
otherwise set V'(t, k) to the constant vector
V'(t, k) = \{ 1/w2, \ldots, 1/w2, \ldots, 1/w2 \};
(d) construct the feature vector F(t) = { V'(t, 1), V'(t, 2) };
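An illustrative sketch of this feature construction, under the same zero-based indexing assumptions as above; the argument names and default values (w2 = 20, T = 1.8, the empirical values given later) are placeholders:

```python
import numpy as np

def feature_vector(d1, d2, je1, t, w2=20, T=1.8):
    """F(t) of dimension 2*w2 built from D_1 and D_2 in a window of length w2 centred at t.

    d1[t] and d2[t] hold D_1(t) and D_2(t); je1[t] holds the joint entropy JE_1(t).
    The second boundary condition on t is assumed to be checked by the caller.
    """
    def normalized(dk):
        v = np.asarray(dk[t - w2 // 2 : t + w2 // 2], dtype=np.float64)  # D_k(t-w2/2) .. D_k(t+w2/2-1)
        if je1[t] > T:                                   # ordinary frame: RMS-normalize V(t, k)
            return v / np.sqrt(np.mean(v ** 2))
        return np.full(w2, 1.0 / w2)                     # likely fade frame: flat constant vector
    return np.concatenate([normalized(d1), normalized(d2)])  # F(t) = {V'(t, 1), V'(t, 2)}
```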
According to a further aspect of the invention, step (4) further comprises the construction of the data set for training the support vector machine (a sketch follows this list):
(a) choose videos containing multiple types of abrupt shot cuts as the original training data. Such cuts include: the two shots before and after the cut have similar visual information; the shots contain object or camera motion; the transition spans 2-3 frames; and so on;
(b) take all cut frames in the selected training videos as positive examples;
(c) select the frame immediately before or after a cut frame in the training videos as a negative example with probability P1;
(d) select normal frames in the training videos as negative examples with probability P2;
(e) to keep positive and negative examples balanced, set P1 = 0.1 and P2 = 0.8 × (number of cuts) / (total number of frames).
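The sampling scheme in (a)-(e) could be sketched as follows; cut_frames and n_frames are assumed inputs (the annotated cut positions and the frame count of the training videos), and the returned (frame, label) pairs would then be turned into feature vectors F(t) for the SVM trainer:

```python
import random

def build_training_labels(cut_frames, n_frames, p1=0.1, seed=0):
    """Select positive and negative training frames as in steps (a)-(e).

    cut_frames: set of frame indices annotated as abrupt shot cuts.
    Returns a list of (frame_index, label) pairs with label 1 for cuts and 0 otherwise.
    """
    rng = random.Random(seed)
    p2 = 0.8 * len(cut_frames) / n_frames        # P2 = 0.8 * number of cuts / total frames
    samples = [(t, 1) for t in cut_frames]       # (b): every cut frame is a positive example
    for t in range(n_frames):
        if t in cut_frames:
            continue
        next_to_cut = (t - 1) in cut_frames or (t + 1) in cut_frames
        p = p1 if next_to_cut else p2            # (c): cut neighbours with P1; (d): normal frames with P2
        if rng.random() < p:
            samples.append((t, 0))               # negative example
    return samples
```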
The advantage of the present invention over the prior art: building on information theory, the invention uses information measures to construct feature vectors whose classes are clearly separable and that are well suited to support vector machine classification, and performs shot boundary detection with them; at the same time, it uses the joint entropy to exclude the influence of fade-in/fade-out transitions on cut detection. Experiments show that, compared with the traditional threshold-based method and with the histogram method combined with machine learning, this algorithm achieves higher precision and recall.
Description of drawings
Fig. 1 is the basic flowchart of the algorithm of the present invention;
Fig. 2 is a scatter plot of frame number versus mutual information for the algorithm of the present invention;
Fig. 3 is a scatter plot of frame number versus dissimilarity for the algorithm of the present invention;
Fig. 4 shows feature vectors of different frames for the algorithm of the present invention;
Fig. 5 is a comparison chart of the algorithm of the invention and two other algorithms.
Embodiment
Embodiments of the invention are described in detail below with reference to the accompanying drawings.
First, the principle of the algorithm of the present invention is described.
A shot is a single continuous shooting process of a camera. Research and practice show that consecutive frames within the same shot are highly similar, but if a shot change occurs in the video, the difference between two frames in different shots is much larger. Mutual information is defined as I(X, Y) = H(X) - H(X|Y), where H(X) is the unconditional entropy of X, i.e. the prior uncertainty of X, and H(X|Y) is the conditional entropy of X, i.e. the uncertainty that remains in X once Y is known. I(X, Y) can thus be understood as the reduction in the uncertainty of X after Y becomes known. For two frames within one shot, the frames are very similar, so once Y is known the remaining uncertainty of X is small, i.e. H(X|Y) is small and I(X, Y) is large. For two frames in different shots, the frames differ greatly, so even after Y is known a large uncertainty remains in X, i.e. H(X|Y) is large and I(X, Y) is small. For this reason mutual information can represent the similarity between two frames. As shown in Fig. 2, at the shot boundaries at frames 1131, 1150 and 1193, the mutual information value is small. The algorithm of the present invention exploits this property of mutual information: it transforms the mutual information values inside a small window of length w1 (w1 = 4) into a dissimilarity value D between two frames (as shown in Fig. 3), then takes a larger window of length w2 (w2 = 20) to obtain the feature vector of the corresponding frame (as shown in Fig. 4), and finally classifies the feature vector with a support vector machine to decide whether the frame is a shot boundary.
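For reference, writing p_c(x) for the normalized color histogram of channel c ∈ {R, G, B} of one frame and p_c(x, y) for the normalized joint histogram of the two frames, the quantities used above are the standard histogram-based estimates; summing the three channel contributions follows the convention of the cited Cernekova et al. work, since the patent text itself does not spell out how the channels are combined:

I_c(X, Y) = \sum_{x, y} p_c(x, y) \log \frac{p_c(x, y)}{p_c(x)\, p_c(y)}, \qquad JE_c(X, Y) = -\sum_{x, y} p_c(x, y) \log p_c(x, y),

MI(X, Y) = I_R + I_G + I_B, \qquad JE(X, Y) = JE_R + JE_G + JE_B.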
Specifically, the basic flow of the improved algorithm proposed by the invention is shown in Fig. 1.
The core idea of the present invention: use mutual information to construct feature vectors whose classes are clearly separable and that lend themselves to support vector machine classification, and judge shot boundaries from the classification result. The algorithm avoids the deficiency of traditional threshold-based methods while also reaching higher precision and recall.
The following variables are defined before the algorithm is described:
1. let the total number of frames of the video be nFrame; while the video frames are being processed, the index of the current frame is i, with i = 1 when the algorithm starts;
2. let w1 be the window size used when computing the dissimilarity from the mutual information values;
3. let w2 be the window size used when constructing the feature vector from the dissimilarity values;
4. let T be the joint-entropy threshold used, when constructing the feature vector, to judge whether a frame belongs to a fade transition;
The algorithm of the present invention is described as follows:
1. compute the mutual information MI_1 and joint entropy JE_1 of all adjacent frame pairs, compute the mutual information MI_2 between all pairs of frames separated by one frame, and save these quantities: initially set i = 1 and obtain the total number of frames nFrame of the video; capture the current frame (frame i) of the video, extract the RGB information of each pixel, and compute the color histograms H_R(i), H_G(i), H_B(i) for the three RGB components; if i-1 ≥ 1, compute the joint histograms JH_R(i-1, i), JH_G(i-1, i), JH_B(i-1, i) of frames i-1 and i for the three RGB components, and use the histograms and joint histograms to compute the mutual information MI_1(i-1) and joint entropy JE_1(i-1) of the two frames; if i-2 ≥ 1, compute the joint histograms JH_R(i-2, i), JH_G(i-2, i), JH_B(i-2, i) of frames i-2 and i for the three RGB components, use the histograms and joint histograms to compute the mutual information MI_2(i-2) of the two frames, and delete the temporary data buffered for frame i-2, such as its RGB data, histograms and joint histograms; if i < nFrame, set i = i+1 and repeat the above operations;
2. for each frame t satisfying the boundary condition and for each k, compute the ratio of the mean of the mutual information MI_k within the window w1 centered at t to MI_k(t), take it as the dissimilarity between frame t and frame t+k, and record it as D_k(t). The boundary condition on t is: t - w1/2 ≥ 1, t + k + w1/2 ≤ nFrame, 1 ≤ k ≤ 2. D_k(t) is computed with the following formulas:
MI_k(t, w1) = \frac{1}{w1} \sum_{i \in w1,\, i \ne t} MI_k(i),
D_k(t) = \frac{MI_k(t, w1)}{MI_k(t)};
3. for each frame t satisfying the boundary condition, take a window of length w2 centered at t and construct from D_k(t) a feature vector F(t) of dimension 2*w2. The boundary condition on t is: t - w2/2 ≥ 1, t + w2/2 - 1 ≤ nFrame. For k = 1, 2, construct the w2-dimensional vector V(t, k) = { D_k(t - w2/2), ..., D_k(t), ..., D_k(t + w2/2 - 1) }. If JE_1(t) > T (T is an empirical threshold), normalize V(t, k) by the root mean square RMS of its components: V'(t, k) = V(t, k) / RMS; otherwise set V'(t, k) to the constant vector V'(t, k) = { 1/w2, ..., 1/w2, ..., 1/w2 }. Finally construct the feature vector F(t) = { V'(t, 1), V'(t, 2) }. The joint-entropy threshold T is used here mainly to reduce false detections of shot boundaries caused by frames inside fade transitions;
4. feed F(t) to the trained support vector machine, which outputs whether frame t is a shot boundary. Before classifying with the support vector machine, a trained model must be available; the present invention trains the model as follows: choose videos containing multiple types of abrupt shot cuts as the original training data. Such cuts include: the two shots before and after the cut have similar visual information; the shots contain object or camera motion; the transition spans 2-3 frames; and so on. Take all cut frames in the selected training videos as positive examples; select the frame immediately before or after a cut frame in the training videos as a negative example with probability P1; select normal frames in the training videos as negative examples with probability P2; to keep positive and negative examples balanced, set P1 = 0.1 and P2 = 0.8 × (number of cuts) / (total number of frames). A consolidated sketch of the full pipeline follows.
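Putting the pieces together, a non-authoritative sketch of the pipeline described above might look as follows. scikit-learn's SVC stands in for the support vector machine (the patent does not name a specific kernel or implementation), the helpers mi_and_je, dissimilarity, feature_vector and build_training_labels are the illustrative functions sketched earlier, and the parameter values are the empirical ones reported in the experiments below:

```python
import numpy as np
from sklearn.svm import SVC

W1, W2, T = 4, 20, 1.8                       # empirical values from the experiments

def shot_boundary_features(frames):
    """Return (ts, X): the frame indices that satisfy both boundary conditions and their F(t) rows."""
    n = len(frames)
    mi1, je1 = map(np.array, zip(*[mi_and_je(frames[i], frames[i + 1]) for i in range(n - 1)]))
    mi2 = np.array([mi_and_je(frames[i], frames[i + 2])[0] for i in range(n - 2)])

    d1 = np.full(n, np.nan)
    d2 = np.full(n, np.nan)
    for t in range(W1 // 2, len(mi1) - W1 // 2):          # first boundary condition, k = 1
        d1[t] = dissimilarity(mi1, t, W1)
    for t in range(W1 // 2, len(mi2) - W1 // 2):          # first boundary condition, k = 2
        d2[t] = dissimilarity(mi2, t, W1)

    ts, rows = [], []
    for t in range(W2 // 2, n - W2 // 2 + 1):             # second boundary condition
        f = feature_vector(d1, d2, je1, t, W2, T)
        if not np.isnan(f).any():                         # skip frames whose window touches undefined D_k
            ts.append(t)
            rows.append(f)
    return ts, np.vstack(rows)

# Usage sketch: annotated training videos, then an unseen video.
# ts, X = shot_boundary_features(train_frames)
# labels = dict(build_training_labels(cut_frames, len(train_frames)))
# keep = [i for i, t in enumerate(ts) if t in labels]
# clf = SVC(kernel="rbf").fit(X[keep], [labels[ts[i]] for i in keep])
# ts_new, X_new = shot_boundary_features(new_frames)
# boundaries = [t for t, y in zip(ts_new, clf.predict(X_new)) if y == 1]
```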
The biggest difference between the algorithm of the present invention and the original mutual-information-based algorithm is that this algorithm uses the mutual information values to construct feature vectors whose classes are clearly separable and that lend themselves to classification (as shown in Fig. 4, the feature vector of a frame at a shot boundary has obvious peaks at dimensions 10 and 30, whereas the feature vector of a non-boundary frame either has no obvious peak or has its peak elsewhere), and it classifies them with a support vector machine, avoiding the problem that a universal threshold is hard to choose when classification is done by thresholding. At the same time, this algorithm uses joint entropy together with a threshold to reduce false detections of shot boundaries caused by frames inside fade transitions.
A comparison of the algorithm of the present invention with two existing algorithms shows that both the precision and the recall of the present invention are improved when detecting abrupt shot cuts. The experiments use the standard test data published for TREC2001 and TREC2002, eight manually annotated National Geographic videos chosen at random (abbreviated NG), and the union ALL of all these data sets.
The experiments compare the mutual-information-plus-threshold algorithm (algorithm 1), the histogram-plus-support-vector-machine algorithm (algorithm 2), and the algorithm of the present invention (algorithm *). The test results are shown in Fig. 5. The parameters of the present algorithm are specified as follows in the experiments: T = 1.8, w1 = 4, w2 = 20, and the number of histogram bins used when computing mutual information and joint entropy is bins = 32 (these are empirical values).
As shown in Fig. 5, the algorithm of the present invention improves on both algorithm 1 and algorithm 2 in precision and in recall.
Other advantages and modifications will be readily apparent to persons of ordinary skill in the art. Therefore, the invention in its broader aspects is not limited to the specific illustrative and exemplary embodiments shown and described here. Accordingly, various modifications may be made without departing from the spirit and scope of the general inventive concept as defined by the appended claims and their equivalents.

Claims (2)

1. A shot boundary detection algorithm based on information theory and a support vector machine, characterized by the following steps:
(1) obtain the total number of frames nFrame of the video, extract the RGB color information of each pixel of each frame in the video, compute the mutual information MI_1 and the joint entropy JE_1 of all adjacent frame pairs, compute the mutual information MI_2 between all pairs of frames separated by one frame, and save these quantities;
(2) for each frame t satisfying the first boundary condition and for each frame interval k, k = 1 or 2, compute the ratio of the mean of the mutual information MI_k within a window of length w1 centered at t to MI_k(t), take this ratio as the dissimilarity between frame t and frame t+k, and record it as D_k(t);
(3) for each frame t satisfying the second boundary condition, take a window of length w2 centered at t and construct from D_k(t) a feature vector F(t) of dimension 2*w2;
(4) feed F(t) to the trained support vector machine, which outputs whether frame t is a shot boundary;
Said step (1) further comprises:
(a) set the initial value of i to 1 and obtain the total number of frames nFrame of the video;
(b) capture the i-th frame of the video, extract the RGB information of each pixel in the frame, and compute a color histogram for each of the three RGB color components: H_R(i), H_G(i), H_B(i);
(c) if i-1 ≥ 1, compute the joint histograms JH_R(i-1, i), JH_G(i-1, i), JH_B(i-1, i) of frames i-1 and i for the three RGB components, and use the histograms and joint histograms to compute the mutual information MI_1(i-1) and joint entropy JE_1(i-1) of the two frames;
(d) if i-2 ≥ 1, compute the joint histograms JH_R(i-2, i), JH_G(i-2, i), JH_B(i-2, i) of frames i-2 and i for the three RGB components, use the histograms and joint histograms to compute the mutual information MI_2(i-2) of the two frames, and delete the temporary data buffered for frame i-2, namely its RGB data, histograms and joint histograms;
(e) if i < nFrame, set i = i+1 and go to step 1(b);
Said step (2) further comprises:
(a) when computing D_k(t), the first boundary condition on t is: t - w1/2 ≥ 1, t + k + w1/2 ≤ nFrame, 1 ≤ k ≤ 2;
(b) D_k(t) is computed with the following formulas:
MI_k(t, w1) = \frac{1}{w1} \sum_{i \in w1,\, i \ne t} MI_k(i)
D_k(t) = \frac{MI_k(t, w1)}{MI_k(t)};
Said step (3) further comprises:
(a) when computing the feature vector F(t), the second boundary condition on t is: t - w2/2 ≥ 1, t + w2/2 - 1 ≤ nFrame;
(b) for k = 1, 2, construct the w2-dimensional vector
V(t, k) = \{ D_k(t - w2/2), \ldots, D_k(t), \ldots, D_k(t + w2/2 - 1) \};
(c) if JE_1(t) > T, normalize V(t, k) by its root mean square RMS:
V'(t, k) = V(t, k) / RMS, where T is an empirical threshold;
otherwise set V'(t, k) to the constant vector
V'(t, k) = \{ 1/w2, \ldots, 1/w2, \ldots, 1/w2 \};
(d) construct the feature vector F(t) = { V'(t, 1), V'(t, 2) }.
2. The shot boundary detection algorithm based on information theory and a support vector machine according to claim 1, characterized in that said step (4) further comprises the construction of the data set for training the support vector machine:
(a) choose videos containing multiple types of abrupt shot cuts as the original training data, such cuts including: the two shots before and after the cut have similar visual information; the shots contain object or camera motion; the transition spans 2-3 frames;
(b) take all cut frames in the selected training videos as positive examples;
(c) select the frame immediately before or after a cut frame in the training videos as a negative example with probability P1;
(d) select normal frames in the training videos as negative examples with probability P2;
(e) to keep positive and negative examples balanced, set P1 = 0.1 and P2 = 0.8 × (number of cuts) / (total number of frames).
CN 201110188738 2011-07-06 2011-07-06 Information theory and support vector machine-based shot boundary detection algorithm Expired - Fee Related CN102314613B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110188738 CN102314613B (en) 2011-07-06 2011-07-06 Information theory and support vector machine-based shot boundary detection algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110188738 CN102314613B (en) 2011-07-06 2011-07-06 Information theory and support vector machine-based shot boundary detection algorithm

Publications (2)

Publication Number Publication Date
CN102314613A CN102314613A (en) 2012-01-11
CN102314613B (en) 2013-06-19

Family

ID=45427760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110188738 Expired - Fee Related CN102314613B (en) 2011-07-06 2011-07-06 Information theory and support vector machine-based shot boundary detection algorithm

Country Status (1)

Country Link
CN (1) CN102314613B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104952073B (en) * 2015-06-15 2017-12-15 上海交通大学 Scene Incision method based on deep learning
CN106327513B (en) * 2016-08-15 2020-11-17 上海交通大学 Shot boundary detection method based on convolutional neural network
CN113033582B (en) * 2019-12-09 2023-09-26 杭州海康威视数字技术股份有限公司 Model training method, feature extraction method and device
CN111062926B (en) * 2019-12-18 2023-08-22 腾讯科技(深圳)有限公司 Video data processing method, device and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271465A (en) * 2007-12-12 2008-09-24 北京航空航天大学 Lens clustering method based on information bottleneck theory
CN101599126A (en) * 2009-04-22 2009-12-09 哈尔滨工业大学 Utilize the support vector machine classifier of overall intercommunication weighting

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5196425B2 (en) * 2008-03-07 2013-05-15 Kddi株式会社 Support vector machine relearning method

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101271465A (en) * 2007-12-12 2008-09-24 北京航空航天大学 Lens clustering method based on information bottleneck theory
CN101599126A (en) * 2009-04-22 2009-12-09 哈尔滨工业大学 Utilize the support vector machine classifier of overall intercommunication weighting

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Zuzana Cernekova et al., "Information Theory-Based Shot Cut/Fade Detection and Video Summarization", IEEE Transactions on Circuits and Systems for Video Technology, vol. 16, no. 1, January 2006, pp. 82-91. *
Vasileios Chasanis et al., "Simultaneous detection of abrupt cuts and dissolves in videos using support vector machines", Pattern Recognition Letters, vol. 30, pp. 55-65 (online 6 December 2008). *

Also Published As

Publication number Publication date
CN102314613A (en) 2012-01-11

Similar Documents

Publication Publication Date Title
Hu et al. Learning supervised scoring ensemble for emotion recognition in the wild
WO2016127883A1 (en) Image area detection method and device
CN101604325B (en) Method for classifying sports video based on key frame of main scene lens
CN103745468B (en) Significant object detecting method based on graph structure and boundary apriority
CN103593464A (en) Video fingerprint detecting and video sequence matching method and system based on visual features
Li et al. Coda: Counting objects via scale-aware adversarial density adaption
CN111506773B (en) Video duplicate removal method based on unsupervised depth twin network
CN107169417B (en) RGBD image collaborative saliency detection method based on multi-core enhancement and saliency fusion
CN102314613B (en) Information theory and support vector machine-based shot boundary detection algorithm
CN103440640A (en) Method for clustering and browsing video scenes
CN105138991A (en) Video emotion identification method based on emotion significant feature integration
CN104537355A (en) Remarkable object detecting method utilizing image boundary information and area connectivity
CN104952073A (en) Shot boundary detecting method based on deep learning
CN111126126A (en) Intelligent video strip splitting method based on graph convolution neural network
CN106373162A (en) Salient object detection method based on saliency fusion and propagation
CN104123396A (en) Soccer video abstract generation method and device based on cloud television
Kuang et al. Deep multimodality learning for UAV video aesthetic quality assessment
CN102567738B (en) Rapid detection method for pornographic videos based on Gaussian distribution
CN107169503B (en) Indoor scene classification method and device
CN105608233A (en) Video copy detection method based on improved OM features
Wan et al. A new technique for summarizing video sequences through histogram evolution
CN110913207B (en) Video transmission quality evaluation method based on multitask deep learning
Dai et al. Fudan at MediaEval 2013: Violent Scenes Detection Using Motion Features and Part-Level Attributes.
CN106327513B (en) Shot boundary detection method based on convolutional neural network
CN102547477A (en) Video fingerprint method based on contourlet transformation model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20130619

Termination date: 20140706

EXPY Termination of patent right or utility model