CN101719271A - Video shot boundary detection method based on mixed projection function and support vector machine - Google Patents


Info

Publication number: CN101719271A
Application number: CN200910154120A
Authority: CN (China)
Prior art keywords: frame, video, sequence, vector, projection function
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 凌坚, 练益群
Current Assignee: Zhejiang University of Media and Communications
Original Assignee: Zhejiang University of Media and Communications
Application filed by Zhejiang University of Media and Communications
Priority to CN200910154120A
Publication of CN101719271A

Landscapes

  • Image Analysis (AREA)
Abstract

The invention discloses a video shot boundary detection method that computes frame features with a mixed projection function and classifies candidate sequences with a support vector machine. A video is a concatenation of shots, and the shot is the basic structural unit of video data; video content is located by the boundaries between shots. The method comprises the steps of: (1) introducing image projection functions and defining a video frame distance measure from a combination of them; (2) computing the feature vector of a video frame sequence; (3) selecting candidate sequences for shot boundary detection by analyzing changes in the frame distance; and (4) identifying cut boundaries with an adaptive threshold and gradual-transition boundaries with the support vector machine. The method has low time complexity and high detection accuracy, and is suitable for automatic analysis of large volumes of digital video.

Description

Video shot boundary detection method based on mixed projection function and support vector machine
Technical field
The present invention relates to multimedia, video processing, video analysis, and pattern recognition, and in particular to a video shot boundary detection method that uses a mixed projection function and a support vector machine.
Background technology
Video data is organized frame by frame at a fixed time interval (e.g., PAL or NTSC), forming a one-dimensional linear structure. Yet video data carries complex semantic content and has a complex "paragraph" structure. Structuring video data means analyzing the semantic structure present in the video, and is the basic premise of video analysis and video understanding. From coarse to fine, a video can be divided into four layers: video, section, scene, and shot. A shot consists of the temporally continuous frames captured in one continuous run of a camera, and shots are joined by various transition types. Since every piece of video data is composed of shots, the shot is the elementary unit of video content analysis, and shot segmentation is the basis of the whole analysis: only after a video sequence has been decomposed into shots can key-frame extraction, video abstraction, and video sequence recognition proceed. Shot transition detection (shot detection) has therefore become the first problem to solve in video retrieval, and the quality of the detection directly affects the performance of video analysis and video understanding.
There are two kinds of transitions between shots: cuts (Cut Transition) and gradual transitions (Gradual Transition). In a cut, the last frame of one shot is directly followed by the first frame of the next shot, with no editing effect in between. A gradual transition changes slowly from one shot to another; the whole process completes gradually and usually lasts from a dozen to several dozen frames. Gradual transitions come in many varieties: video editing tools such as Adobe Premiere and Ulead MediaStudio offer more than 100 different editing effects.
Because a shot is composed of a sequence of video frames with temporal and spatial continuity, the content of adjacent frames within the same shot is similar; at a shot transition, the video content changes greatly. This change generally shows up as a sudden increase in color difference, the separation of new and old edges, changes in object shape, and discontinuity of motion. The basic idea of shot boundary detection is to select suitable features to measure the difference between video frames, find the patterns by which shot boundaries change, and identify shot boundaries by analyzing those change characteristics. In addition, eliminating the influence of noise, especially illumination change and camera or object motion, on algorithm performance is a major issue that shot boundary detection algorithms must consider.
Summary of the invention
The objective of the invention is to overcome the deficiencies of the prior art by providing a video shot boundary detection method based on a mixed projection function and a support vector machine.
The video shot boundary detection method based on a mixed projection function and a support vector machine comprises the steps of:
(1) computing the difference and integral projection functions of the video frame image data in the vertical and horizontal directions;
(2) taking the projection function values of a video frame as that frame's feature vector, the frame difference of two video frames being the distance between their feature vectors in vector space;
(3) using a moving window, computing the frame differences of adjacent video frames within the window, and judging cut boundaries with an adaptive threshold;
(4) for a video frame sequence, taking the vector formed by the frame differences of adjacent frames as the sequence's feature vector, choosing candidate frame sequences according to those frame differences, and deleting frames at equal intervals or interpolating so that the feature vector of each sequence reaches a designated length;
(5) classifying the frame-difference vectors of the candidate sequences with a support vector machine to identify sequences that belong to a gradual transition.
The step of computing the difference and integral projection functions of the video frame image data in the vertical and horizontal directions comprises:
Integral and difference projection values are computed in units of several rows (or columns). The discrete horizontal integral projection function MH and vertical integral projection function MV are respectively:

MH_i = \frac{1}{w\lambda} \sum_{j=0}^{\lambda-1} \sum_{k=0}^{w-1} I(k, i\lambda + j), \quad i = 1, 2, \ldots, m-1 \quad (1)

MV_i = \frac{1}{h\eta} \sum_{j=0}^{\eta-1} \sum_{k=0}^{h-1} I(i\eta + j, k), \quad i = 1, 2, \ldots, n-1 \quad (2)

The discrete horizontal and vertical difference projection functions are:

DH_i = \frac{1}{w\lambda} \sum_{j=0}^{\lambda-1} \sum_{k=0}^{w-1} \left[ I(k, i\lambda + j) - MH_i \right]^2, \quad i = 1, 2, \ldots, m-1 \quad (3)

DV_i = \frac{1}{h\eta} \sum_{j=0}^{\eta-1} \sum_{k=0}^{h-1} \left[ I(i\eta + j, k) - MV_i \right]^2, \quad i = 1, 2, \ldots, n-1 \quad (4)

where w and h are the width and height of a video frame, λ and η are the numbers of merged rows and columns, m = ⌊h/λ⌋, and n = ⌊w/η⌋.
The step in which the projection function values of a video frame serve as that frame's feature vector, and the frame difference of two frames is the distance between their feature vectors in vector space, comprises:
(1) taking the length of the frame-difference vector as the frame distance:

|FFD(i)| = |V(i+1) - V(i)| = \sqrt{ \sum_{k=0}^{m-1} \left( DH_{i+1,k} - DH_{i,k} \right)^2 + \sum_{k=0}^{n-1} \left( DV_{i+1,k} - DV_{i,k} \right)^2 } \quad (5)

where DH_{i,j} is the j-th horizontal projection value of frame i and DV_{i,j} is the j-th vertical projection value of frame i;
(2) forming the feature vector of a video frame sequence, with the i-th adjacent frame difference as the i-th component:

SV = [V_2 - V_1, V_3 - V_2, \ldots, V_n - V_{n-1}]^T \quad (6)

where V_i is the feature vector of the i-th video frame in the sequence.
The step of using a moving window, computing the frame differences of adjacent video frames within the window, and judging cut boundaries with an adaptive threshold comprises:
The adaptive threshold T(i) for judging whether a boundary lies between frames i and i+1 is determined as follows: take a moving window centered on frame i, find the second-largest adjacent-frame distance D_sec-max within the window, and set T(i) = a × D_sec-max, where a is a constant.
Whether a shot boundary lies between frames i and i+1 is judged by the following condition, where N is the width of the moving window: if |FFD(i)| is the maximum within the moving window and is greater than a times the second-largest value in the window, a cut is deemed to exist between frames i and i+1.
The step of classifying the frame-difference vectors of candidate sequences with a support vector machine to identify sequences belonging to a gradual transition comprises:
(1) determining the feature vector. A video sequence of l frames (l ≥ 2) corresponds to a frame-difference vector sequence of length l−1. By linear interpolation or equal-interval deletion of the candidate sequence's frame-difference vectors, the sequence is stretched or compressed to a fixed length. The feature vector of the candidate sequence is defined from the sequence of frame-difference vectors and used as the input vector of the SVM input space:

x = \frac{1}{\max_{i,j} |f_{i,j}|} [f_{1,1}, f_{1,2}, \ldots, f_{1,m+n}, f_{2,1}, f_{2,2}, \ldots, f_{l,1}, f_{l,2}, \ldots, f_{l,m+n}]^T \quad (8)

where l is the number of frames of the candidate sequence after stretching or compression (i.e., the sequence length), m and n are the numbers of projection components in the horizontal and vertical directions, and f_{i,j} is the j-th component of the i-th vector in the candidate sequence's frame-difference vector sequence;
(2) determining the support vector set from training samples. Once the feature vector of the problem is determined, the feature vectors of candidate sequences of known type serve as training samples from which the support vector set of the SVM is obtained, and the support vector machine is then constructed from the support vector set;
(3) constructing the following support vector machine to classify candidate frame sequences:

f(x) = \mathrm{sgn}\left[ \sum_{x_i \in SV} \alpha_i y_i K(x_i, x) + b \right] \quad (9)

where K is the RBF kernel function, and the threshold b can be obtained from any standard support vector.
Compared with the prior art, the present invention has the following beneficial effects:
(1) A mixed projection function based on the difference and integral projection functions is proposed as a spatial feature of video frame images. Compared with previous pixel- or contour-based feature extraction, it reduces the time complexity and feature dimensionality of feature extraction, and effectively reduces the influence of random noise, common in video, on shot boundary detection.
(2) A method is proposed that uses distances between video frame feature vectors to define the feature vector of a frame sequence. A coarse-to-fine analysis of the sequence feature vector quickly determines candidate frame sequences and cut boundaries, resolving the conflict between detection precision and detection speed.
(3) Gradual boundaries are detected in candidate frame sequences by support vector machine classification, which avoids the tendency of threshold methods to miss gradual shots, and avoids the limitation of model-fitting methods that only detect the designated transition model.
Description of drawings
Fig. 1 is a schematic diagram of the basic steps of the algorithm of the present invention;
Fig. 2(a) is a video frame image used in step 2, of width 360 and height 288;
Fig. 2(b) is the horizontal mixed projection function of the image in Fig. 2(a); the X direction is the vertical position in the image and the Y direction is the horizontal projection value at that position;
Fig. 2(c) is the vertical mixed projection function of the image in Fig. 2(a); the X direction is the horizontal position in the image and the Y direction is the vertical projection value at that position;
Fig. 3 is the frame-difference curve of step 4; the larger frame differences between frames 0-80 and 400-450 are caused by scene changes within a shot; two cuts lie near frames 305 and 550; two gradual transitions lie near frames 210 and 500;
Fig. 4 shows the gradual transition sequence of frames 491-512 in Fig. 3, sampled at 10 equally spaced frames;
Fig. 5 is an implementation of the method of the present invention;
Fig. 6 shows the Filter links in the DirectShow Graph of the method of Fig. 5.
Embodiment
The concrete technical scheme and implementation steps are as follows:
1. Compute the difference projection function and integral projection function of the video frame image
A two-dimensional image can be analyzed through two orthogonal one-dimensional projection functions; the reduction in dimensionality makes image features easier to analyze and reduces computation. The present invention computes video frame features with a mixed projection function built from the integral projection function (Integral Projection Function, IPF) and the variance projection function (Variance Projection Function, VPF).
Suppose I(x, y) is the gray or color component value of the image at point (x, y). The mean integral projection function M_h(y) in the horizontal direction over the interval [x_1, x_2] and the mean integral projection function M_v(x) in the vertical direction over the interval [y_1, y_2] are respectively:

M_h(y) = \frac{1}{x_2 - x_1} \int_{x_1}^{x_2} I(x, y)\, dx \quad (1)

M_v(x) = \frac{1}{y_2 - y_1} \int_{y_1}^{y_2} I(x, y)\, dy \quad (2)

The horizontal (vertical) integral projection function is the integral of the gray or color component values of all pixels at a given vertical (horizontal) position. When the gray mean of a row of the image changes, the change is reflected in the horizontal integral projection value; likewise, a change in the gray values of a column is reflected in the vertical projection value, so image features can be extracted from the integral projection values. However, because the integral projection function ignores how the gray values vary along the projection direction, it cannot distinguish two images with the same gray mean along that direction. To reflect the variation of gray values, the variance is used in place of the mean; this is the variance projection function.
With I(x, y) as above, the difference projection function σ_h(y) in the horizontal direction over [x_1, x_2] and the difference projection function σ_v(x) in the vertical direction over [y_1, y_2] are defined respectively as:

σ_h(y) = \frac{1}{x_2 - x_1} \int_{x_1}^{x_2} \left[ I(x, y) - M_h(y) \right]^2 dx \quad (3)

σ_v(x) = \frac{1}{y_2 - y_1} \int_{y_1}^{y_2} \left[ I(x, y) - M_v(x) \right]^2 dy \quad (4)

where M_h(y) and M_v(x) are the horizontal and vertical mean integral projection functions of formulas 1 and 2. When the variance of the pixel gray values of a column (row) changes, the change is reflected in the variance projection value. The VPF is insensitive to random noise, so it can be used as an image feature for analysis.
2. Feature extraction of the video frame image
The integral projection value and the variance projection value each have advantages and limitations as image features: the integral projection function cannot distinguish two images with identical integrals along the projection direction, and the difference projection feature cannot distinguish two images with identical variances along that direction; yet their definitions show that they are strongly complementary. The present invention applies suitable processing to the integral and difference projection functions, defines a mixed projection function from their combination, and uses the mixed projection value as the image feature. The mixed projection functions in the vertical and horizontal directions are defined as:

H_v(x) = \frac{1}{2} σ'_v(x) + \frac{1}{2} M'_v(x) \quad (5)

H_h(y) = \frac{1}{2} σ'_h(y) + \frac{1}{2} M'_h(y) \quad (6)

where σ'_v(x), σ'_h(y), M'_v(x), M'_h(y) are σ_v(x), σ_h(y), M_v(x), M_h(y) normalized to the interval [0, 1]:

σ'_v(x) = \frac{σ_v(x) - \min(σ_v(x))}{\max(σ_v(x)) - \min(σ_v(x))} \quad (7)

σ'_h(y) = \frac{σ_h(y) - \min(σ_h(y))}{\max(σ_h(y)) - \min(σ_h(y))} \quad (8)

M'_v(x) = \frac{M_v(x) - \min(M_v(x))}{\max(M_v(x)) - \min(M_v(x))} \quad (9)

M'_h(y) = \frac{M_h(y) - \min(M_h(y))}{\max(M_h(y)) - \min(M_h(y))} \quad (10)
The mixed projection function is insensitive to random noise. Let X be a random variable with expectation E(X) and variance σ(X), and let η be independent random noise following the normal distribution N(0, σ(η)). Then:

\frac{1}{2} σ(X+η) + \frac{1}{2} E(X+η) = \frac{1}{2}\left( E\left[ (X + η - E(X+η))^2 \right] + E(X) \right)
= \frac{1}{2}\left( E\left[ (X - E(X))^2 \right] + E(η^2) + E(X) \right)
= \frac{1}{2}\left( σ(X) + E(X) + σ(η) \right) \quad (11)

In general σ(η) ≪ σ(X), so

\frac{1}{2} σ(X+η) + \frac{1}{2} E(X+η) ≈ \frac{1}{2}\left( σ(X) + E(X) \right)

Therefore the mixed projection function is insensitive to random noise, and frame features based on it effectively overcome the random noise common in video.
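A quick numerical check of this insensitivity claim (a simulation sketch, not part of the patent): adding low-variance Gaussian noise to a band of pixel values barely moves the combined ½·variance + ½·mean feature, matching the approximation above.

```python
import numpy as np

def mixed_feature(x):
    # 0.5 * variance + 0.5 * mean, the per-band quantity analyzed in formula 11
    # (sigma in that derivation denotes variance).
    return 0.5 * np.var(x) + 0.5 * np.mean(x)

rng = np.random.default_rng(0)
band = rng.uniform(0, 255, size=10_000)      # pixel values of one projection band
noise = rng.normal(0.0, 2.0, size=10_000)    # sigma(noise) << sigma(band)

clean = mixed_feature(band)
noisy = mixed_feature(band + noise)
# The shift is about 0.5 * Var(noise) = 2, against a feature value in the thousands.
assert abs(noisy - clean) / clean < 0.01
```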
3. Video frame features and inter-frame distance computation
The video frame is the basic unit of video, and defining and obtaining features of a single frame image is the basis of further video analysis.
Combining the advantages of pixel-based and histogram-based methods, the present invention proposes a frame feature representation based on the mixed projection function.
Except near shot boundaries, adjacent video frames are highly similar; inter-frame change within a shot is mainly pixel displacement caused by camera movement, lens zoom (zoom in/out), scene change, and object motion. To reduce sensitivity to such displacement, and to reduce feature dimensionality at the same time, the integral and difference projection values are computed here in units of several rows/columns. The discrete horizontal integral projection function MH and vertical integral projection function MV are then:
MH_i = \frac{1}{w\lambda} \sum_{j=0}^{\lambda-1} \sum_{k=0}^{w-1} I(k, i\lambda + j), \quad i = 1, 2, \ldots, m-1 \quad (12)

MV_i = \frac{1}{h\eta} \sum_{j=0}^{\eta-1} \sum_{k=0}^{h-1} I(i\eta + j, k), \quad i = 1, 2, \ldots, n-1 \quad (13)

Similarly, the discrete horizontal and vertical difference projection functions are:

DH_i = \frac{1}{w\lambda} \sum_{j=0}^{\lambda-1} \sum_{k=0}^{w-1} \left[ I(k, i\lambda + j) - MH_i \right]^2, \quad i = 1, 2, \ldots, m-1 \quad (14)

DV_i = \frac{1}{h\eta} \sum_{j=0}^{\eta-1} \sum_{k=0}^{h-1} \left[ I(i\eta + j, k) - MV_i \right]^2, \quad i = 1, 2, \ldots, n-1 \quad (15)

Here w and h are the width and height of a video frame, λ and η are the numbers of merged rows and columns, m = ⌊h/λ⌋, and n = ⌊w/η⌋.
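As a concrete illustration, the discrete projections of formulas 12-15 can be sketched in NumPy. This is a sketch under assumptions not fixed by the patent: 0-based component indexing, and λ = η = 8 as an arbitrary choice of merged rows/columns.

```python
import numpy as np

def discrete_projections(frame, lam=8, eta=8):
    """Integral (MH, MV) and difference (DH, DV) projections of
    formulas 12-15, merging `lam` rows / `eta` columns per component.
    `frame` is a 2-D array of gray values with shape (h, w)."""
    h, w = frame.shape
    m, n = h // lam, w // eta  # numbers of horizontal / vertical components

    # Horizontal projections: mean and variance over bands of `lam` rows.
    bands_h = frame[:m * lam].reshape(m, lam, w)                  # (m, lam, w)
    MH = bands_h.mean(axis=(1, 2))                                # formula 12
    DH = ((bands_h - MH[:, None, None]) ** 2).mean(axis=(1, 2))   # formula 14

    # Vertical projections: mean and variance over bands of `eta` columns.
    bands_v = frame[:, :n * eta].reshape(h, n, eta).transpose(1, 0, 2)  # (n, h, eta)
    MV = bands_v.mean(axis=(1, 2))                                # formula 13
    DV = ((bands_v - MV[:, None, None]) ** 2).mean(axis=(1, 2))   # formula 15

    return MH, MV, DH, DV
```

On a 288×360 frame with λ = η = 8, this yields 36 horizontal and 45 vertical components, far fewer values than the raw pixels.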
Because the image is divided into horizontal and vertical bands when the projection functions are computed, sensitivity to the pixel displacement caused by camera movement, lens zoom, scene change, object motion, and so on is also very low. The image features of formulas 14 and 15 are therefore particularly suitable for digital video analysis tasks such as shot boundary detection and key frame analysis.
From the values given by formulas 12, 13, 14, and 15, the mixed projection values of a video frame in the horizontal and vertical directions can be computed. Arranging the vertical projection values along the Y direction and the horizontal projection values along the X direction yields an (m+n)-dimensional vector, which forms the feature vector V of the video frame:

V = [DH_0, DH_1, \ldots, DH_{m-1}, DV_0, DV_1, \ldots, DV_{n-1}]^T \quad (16)

The frame difference between frames i and i+1 is taken as the inter-frame distance vector:

FFD(i) = V(i+1) - V(i) \quad (17)

where V(i) and V(i+1) are the feature vectors of frames i and i+1.
Clearly FFD(i) is also an (m+n)-dimensional vector; its length |FFD| is the Euclidean distance between the feature vectors of frames i and i+1 and roughly reflects the difference between the frames. In the first stage of gradual transition detection, the value of |FFD| can be used to exclude most within-shot frames, improving detection speed.
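The feature vector of formula 16 and the frame distance of formulas 17-18 reduce to a concatenation and a Euclidean norm; a minimal sketch, with the function names chosen here for illustration:

```python
import numpy as np

def frame_feature(dh, dv):
    """Frame feature vector V of formula 16: the m horizontal difference
    projections followed by the n vertical ones."""
    return np.concatenate([dh, dv])

def frame_distance(v_a, v_b):
    """|FFD| of formulas 17-18: Euclidean length of V(i+1) - V(i)."""
    return float(np.linalg.norm(v_b - v_a))
```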
A gradual transition moves from one shot to another. During a gradual transition, adjacent frames show relatively large frame distances, in most cases larger than the frame distances between adjacent frames within a shot. However, camera movement, lens zoom, subject motion, and similar causes can also produce large frame differences between adjacent frames within a shot (such frame sequences are called within-shot compound motion). To reduce the chance effect of a single frame difference, the frame distances are first smoothed; a threshold is then used to exclude sequences with small frame differences and those whose differences are caused by cuts, leaving the sequences with larger frame differences as candidate sequences. These candidates contain both gradual transitions and within-shot compound motion. As long as the threshold is small enough, all gradual transition processes are retained among the candidates, and gradual transition detection becomes a two-class problem: gradual transition versus within-shot compound motion.
4. Cut boundary detection
A cut joins two shots directly, with no video editing effect in between. Frames in two different shots differ visually, so a cut corresponds to a sudden change of visual content between the last frame of one shot and the first frame of the adjacent shot. Cut detection generally selects a feature to characterize the visual content of video frames and measures content change by the change of this feature, converting the visual cut into a change of a mathematical quantity. The basic procedure is: extract features, compute the difference of the feature values of adjacent frames, and compare this difference with a specific threshold (which may be adaptive or global); if the difference exceeds the threshold, the two adjacent frames are deemed to belong to different shots, i.e., a shot boundary has been detected; otherwise they belong to the same shot.
Following this idea, a cut boundary detection method is proposed here: features are chosen based on the projection functions, the Euclidean distance |FFD| of the difference vector between video frames serves as the frame distance, and the existence of a cut is judged by comparing the frame distance with an adaptive threshold. The method is described in detail below.
Taking the length of the frame-difference vector as the frame distance:

|FFD(i)| = |V(i+1) - V(i)| = \sqrt{ \sum_{k=0}^{m-1} \left( DH_{i+1,k} - DH_{i,k} \right)^2 + \sum_{k=0}^{n-1} \left( DV_{i+1,k} - DV_{i,k} \right)^2 } \quad (18)

where DH_{i,j} is the j-th horizontal projection value of frame i and DV_{i,j} is the j-th vertical projection value of frame i.
The adaptive threshold T(i) for judging whether a boundary lies between frames i and i+1 is determined as follows: take a moving window centered on frame i, find the second-largest adjacent-frame distance D_sec-max within the window, and set T(i) = a × D_sec-max, where a is a constant that can be chosen according to the type of video.
Whether a shot boundary lies between frames i and i+1 is judged by the following condition:

a cut exists between frames i and i+1 if |FFD(i)| = \max_{j \in \text{window}} |FFD(j)| and |FFD(i)| > a \times D_{sec\text{-}max} \quad (19)

where N is the width of the moving window. That is, if |FFD(i)| is the maximum within the moving window of width N and exceeds a times the second-largest value in the window, a cut is deemed to exist between frames i and i+1. This condition uses the pattern a cut forms on the time axis; a acts as a shape parameter of the frame-difference curve produced by a cut.
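The window test of formula 19 can be sketched in plain Python. The defaults `window = 9` and `a = 3.0` are illustrative only, since the patent leaves both the window width N and the constant a to be tuned per video type.

```python
def detect_cuts(ffd, window=9, a=3.0):
    """Sliding-window cut detector: frame i is a cut boundary when
    |FFD(i)| is the window maximum and exceeds a times the
    second-largest distance D_sec-max in the window centered on i.
    `ffd` is the sequence of frame distances |FFD(i)|."""
    half = window // 2
    cuts = []
    for i in range(half, len(ffd) - half):
        win = ffd[i - half:i + half + 1]
        second_max, largest = sorted(win)[-2:]
        if ffd[i] == largest and ffd[i] > a * second_max:
            cuts.append(i)
    return cuts
```

A flat distance sequence yields no detections, while one isolated spike above a times its neighbors is reported as a cut.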
5. Gradual boundary detection
A gradual transition generally lasts from a few frames to several dozen frames. A video sequence of l frames (l ≥ 2) corresponds to a frame-difference vector sequence of length l−1. By linear interpolation or equal-interval deletion, the candidate sequence's frame-difference vectors are stretched or compressed to a fixed length. The feature vector of the candidate sequence is defined from the sequence of frame-difference vectors and used as the input vector of the SVM input space:

x = \frac{1}{\max_{i,j} |f_{i,j}|} [f_{1,1}, f_{1,2}, \ldots, f_{1,m+n}, f_{2,1}, f_{2,2}, \ldots, f_{l,1}, f_{l,2}, \ldots, f_{l,m+n}]^T \quad (20)

where l is the number of frames of the candidate sequence after stretching or compression (i.e., the sequence length), m and n are the numbers of projection components in the horizontal and vertical directions, and f_{i,j} is the j-th component of the i-th vector in the candidate sequence's frame-difference vector sequence. The feature vector has dimension (m+n) × l, determined by three factors:
1. the size of the video image;
2. the numbers of merged rows and columns when computing the variance projection function, i.e., λ and η in formulas 3 and 4;
3. the candidate sequence length l after stretching or compression.
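The stretch/compress step can be sketched with linear interpolation. Here `target_len = 30` is taken from the 30-50 range suggested in claim 1, and per-dimension `np.interp` is one plausible reading of the patent's "linear interpolation"; both are assumptions of this sketch.

```python
import numpy as np

def resample_sequence(diff_vectors, target_len=30):
    """Stretch or compress a candidate sequence of frame-difference
    vectors to a fixed length, then normalize by the largest absolute
    component as in formula 20. Each row of `diff_vectors` is one
    (m+n)-dimensional frame-difference vector; the flattened result is
    the SVM input vector x."""
    diff_vectors = np.asarray(diff_vectors, dtype=float)
    src = np.linspace(0.0, 1.0, len(diff_vectors))
    dst = np.linspace(0.0, 1.0, target_len)
    # Interpolate each feature dimension independently.
    resampled = np.column_stack(
        [np.interp(dst, src, diff_vectors[:, k])
         for k in range(diff_vectors.shape[1])]
    )
    scale = float(np.max(np.abs(resampled))) or 1.0   # avoid dividing by zero
    return (resampled / scale).ravel()
```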
Once the feature vector of the problem is determined, the feature vectors of candidate sequences of known type serve as training samples from which the support vector set of the SVM is obtained, and the support vector machine is then constructed from the support vector set. This is a convex quadratic optimization problem under inequality constraints:

\max W(α) = \sum_{i=1}^{l} α_i - \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j α_i α_j K(x_i, x_j) \quad (21)

subject to the constraints:

0 ≤ α_i ≤ C, \quad i = 1, \ldots, l
\sum_{i=1}^{l} α_i y_i = 0 \quad (22)

where α_i is the Lagrange multiplier corresponding to a frame-difference vector; K is the mapping function from the input space to the feature space, satisfies the Mercer condition, and is taken as the RBF kernel; l is the number of candidate sequences, i.e., the number of training samples; x_i is an input vector; C is a constant greater than 0; and y ∈ {−1, 1} is determined by:

y_i = 1 if candidate sequence i is a gradual transition, and y_i = −1 if it is within-shot compound motion \quad (23)

Solving this optimization problem yields a unique solution, in which the samples with α_i > 0 are support vectors (SV, Support Vector) and those with 0 < α_i < C are standard support vectors (NSV, Normal Support Vector). The following support vector machine can then be constructed to classify candidate frame sequences:

f(x) = \mathrm{sgn}\left[ \sum_{x_i \in SV} α_i y_i K(x_i, x) + b \right] \quad (24)

where K is the RBF kernel function, and the threshold b can be obtained from any standard support vector:

b = y_i - \sum_{x_j \in SV} α_j y_j K(x_j, x_i), \quad x_i \in NSV \quad (25)
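The decision function of formula 24 can be sketched directly. The toy support vectors, multipliers, labels, and kernel width γ below are invented for illustration; the patent obtains the multipliers and threshold b by solving the quadratic program of formulas 21-22.

```python
import numpy as np

def rbf(u, v, gamma=1.0):
    """RBF kernel K(u, v) = exp(-gamma * |u - v|^2); gamma is an
    illustrative choice, the patent only fixes the kernel family."""
    return np.exp(-gamma * np.sum((u - v) ** 2))

def svm_decide(x, support_vecs, alphas, labels, b, gamma=1.0):
    """Decision function of formula 24:
    f(x) = sgn(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    s = sum(a * y * rbf(sv, x, gamma)
            for sv, a, y in zip(support_vecs, alphas, labels))
    return 1 if s + b >= 0 else -1
```

With two toy support vectors of opposite labels, a query near the positive one is classified +1 (gradual transition) and a query on the negative one is classified −1 (within-shot compound motion).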
The video boundary detection method of the present invention exploits the complementarity of the difference and integral projection functions in image feature extraction, replacing traditional pixel- or contour-based features with a combined difference/integral projection feature of video frames, which speeds up feature extraction and effectively overcomes the influence of random noise. A moving-window method then determines possible shot boundaries, and candidate frame sequences are classified with an adaptive threshold and a support vector machine to detect cut and gradual boundaries respectively. The method improves the precision and speed of boundary detection and effectively overcomes the influence of random noise in video.

Claims (5)

1. A video shot boundary detection method based on a mixed projection function and a support vector machine, characterized by comprising the steps of:
(1) computing the difference and integral projection functions of the video frame image data in the vertical and horizontal directions;
(2) taking the projection function values of a video frame as the feature vector of that frame, the video inter-frame difference being the distance between the feature vectors of two video frames in their vector space;
(3) using a sliding window, computing the inter-frame differences of all adjacent video frames within the window, and judging cut shot boundaries with an adaptive threshold;
(4) in a video frame sequence, taking the vector formed by the inter-frame differences of adjacent frames as the feature vector of the sequence; selecting candidate frame sequences according to the inter-frame differences of adjacent frames, and bringing the feature vectors of all sequences to the same length by equally-spaced deletion or interpolation, the length being specified in advance and typically 30-50;
(5) classifying the inter-frame difference vectors of the video frame sequences with a support vector machine to identify the sequences belonging to gradual transitions.
2. The video shot boundary detection method based on a mixed projection function and a support vector machine according to claim 1, characterized in that computing the difference and integral projection functions of the video frame image data in the vertical and horizontal directions comprises the steps of:
Computing the integral projection values and difference projection values in units of several rows (or columns). The expressions of the discrete horizontal integral projection function MH and the discrete vertical integral projection function MV are respectively:
MH_i = \frac{1}{w \lambda} \sum_{j=0}^{\lambda-1} \sum_{k=0}^{w-1} I(k, i\lambda + j), \quad i = 1, 2, \ldots, m-1    (1)
MV_i = \frac{1}{h \eta} \sum_{j=0}^{\eta-1} \sum_{k=0}^{h-1} I(i\eta + j, k), \quad i = 1, 2, \ldots, n-1    (2)
The expressions of the discrete horizontal and vertical difference projection functions are:
DH_i = \frac{1}{w \lambda} \sum_{j=0}^{\lambda-1} \sum_{k=0}^{w-1} \left[ I(k, i\lambda + j) - MH_i \right]^2, \quad i = 1, 2, \ldots, m-1    (3)
DV_i = \frac{1}{h \eta} \sum_{j=0}^{\eta-1} \sum_{k=0}^{h-1} \left[ I(i\eta + j, k) - MV_i \right]^2, \quad i = 1, 2, \ldots, n-1    (4)
Wherein, w and h are respectively the width and height of a frame in the video, and \lambda and \eta are respectively the number of rows and columns merged per component,
m = \lfloor h / \lambda \rfloor, \quad n = \lfloor w / \eta \rfloor.
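A minimal NumPy sketch of Eqs. (1)-(4), assuming the frame is stored as a 2-D grayscale array indexed [row, column] (the patent's I(k, i·λ+j) uses the transposed convention) and that any rows or columns beyond whole bands are truncated:

```python
import numpy as np

def mixed_projections(I, lam=2, eta=2):
    """Integral (MH, MV) and difference (DH, DV) projections of a
    grayscale frame I (h x w), merging lam rows / eta columns per
    component, following Eqs. (1)-(4)."""
    h, w = I.shape
    m, n = h // lam, w // eta
    # Horizontal integral projection: mean intensity of each lam-row band.
    MH = I[:m * lam].reshape(m, lam, w).mean(axis=(1, 2))
    # Vertical integral projection: mean intensity of each eta-column band.
    MV = I[:, :n * eta].reshape(h, n, eta).mean(axis=(0, 2))
    # Difference projections: variance of each band around its mean.
    DH = ((I[:m * lam].reshape(m, lam, w) - MH[:, None, None]) ** 2).mean(axis=(1, 2))
    DV = ((I[:, :n * eta].reshape(h, n, eta) - MV[None, :, None]) ** 2).mean(axis=(0, 2))
    return MH, MV, DH, DV
```

On a uniform frame the integral projections equal the constant intensity and the difference projections are zero, which gives a quick sanity check.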
3. The video shot boundary detection method based on a mixed projection function and a support vector machine according to claim 1, characterized in that taking the projection function values of a video frame as the feature vector of that frame, the video inter-frame difference being the distance between the feature vectors of two video frames in their vector space, comprises the steps of:
(1) taking the length of the inter-frame difference vector FFD as the frame distance, computed as:
|FFD(i)| = |V(i+1) - V(i)| = \sqrt{ \sum_{k=0}^{m-1} (DH_{i+1,k} - DH_{i,k})^2 + \sum_{k=0}^{n-1} (DV_{i+1,k} - DV_{i,k})^2 }    (5)
Wherein, DH_{i,j} is the j-th horizontal difference projection value of the i-th frame in the video, and DV_{i,j} is the j-th vertical difference projection value of the i-th frame.
(2) forming the feature vector of a video frame sequence: taking the i-th adjacent inter-frame difference in the frame sequence as the i-th component of the feature vector, which can be expressed as:
SV = [V_2 - V_1, V_3 - V_2, \ldots, V_n - V_{n-1}]^T    (6)
where V_i is the feature vector of the i-th video frame in the sequence.
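A sketch of Eqs. (5)-(6), assuming each frame is represented by its concatenated difference-projection vector (DH followed by DV), so the frame distance is the Euclidean norm of the difference and the sequence feature is the stack of adjacent differences:

```python
import numpy as np

def frame_distance(feat_a, feat_b):
    """Eq. (5): Euclidean distance between the concatenated
    difference-projection feature vectors of two frames."""
    feat_a, feat_b = np.asarray(feat_a, float), np.asarray(feat_b, float)
    return float(np.sqrt(np.sum((feat_a - feat_b) ** 2)))

def sequence_feature(frame_feats):
    """Eq. (6): feature vector of a frame sequence, one inter-frame
    difference vector V_{i+1} - V_i per pair of adjacent frames."""
    feats = np.asarray(frame_feats, dtype=float)
    return feats[1:] - feats[:-1]
```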
4. The video shot boundary detection method based on a mixed projection function and a support vector machine according to claim 1, characterized in that the step of using a sliding window, computing the inter-frame differences of adjacent video frames within the window, and judging cut shot boundaries with an adaptive threshold comprises:
The adaptive threshold T(i) for judging whether a boundary lies between frame i and frame i+1 is determined as follows: a sliding window centered on frame i is taken in the video frame sequence, the second-largest inter-frame distance D_sec-max of adjacent frames within the window is found, and the adaptive threshold is set to T(i) = a × D_sec-max, where a is a constant with a value between 0.1 and 1.0.
Whether frames i and i+1 form a shot boundary is judged by the following condition:
A cut exists between frames i and i+1 if |FFD(i)| = \max_{j \in [i - N/2, \, i + N/2]} |FFD(j)| and |FFD(i)| > T(i)    (7)
Wherein, N is the width of the sliding window. If |FFD(i)| is the maximum value within the sliding window and greater than a times the second-largest value in the window, a cut is considered to exist between frame i and frame i+1.
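The sliding-window test above can be sketched as follows; the window width and the constant `a` are the claim's parameters, and the interpretation that the window holds the inter-frame distances |FFD(j)| centred on index i is an assumption:

```python
import numpy as np

def detect_cuts(ffd, window=7, a=0.5):
    """Flag a cut between frames i and i+1 when |FFD(i)| is the
    maximum in the sliding window centred on i and exceeds
    T(i) = a * (second-largest distance in the window)."""
    ffd = np.asarray(ffd, dtype=float)
    half = window // 2
    cuts = []
    for i in range(len(ffd)):
        lo, hi = max(0, i - half), min(len(ffd), i + half + 1)
        win = np.sort(ffd[lo:hi])          # ascending
        if len(win) < 2:
            continue
        largest, second = win[-1], win[-2]
        if ffd[i] == largest and ffd[i] > a * second:
            cuts.append(i)                 # cut between frames i and i+1
    return cuts
```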
5. The video shot boundary detection method based on a mixed projection function and a support vector machine according to claim 1, characterized in that the step of classifying the inter-frame difference vectors of video frame sequences with a support vector machine and identifying the sequences belonging to gradual transitions comprises:
(1) determining the feature vector
A video sequence of l (l ≥ 2) frames corresponds to an inter-frame difference vector sequence of length l−1. By linear interpolation or equally-spaced deletion of the inter-frame difference vector sequence corresponding to a candidate sequence, the sequence is stretched or compressed to a fixed length. The feature vector of the candidate sequence is defined by the sequence of inter-frame difference vectors and used as the input vector of the SVM input space:
x = \frac{1}{\max_{i,j} |f_{i,j}|} \left[ f_{1,1}, f_{1,2}, \ldots, f_{1,m+n}, f_{2,1}, f_{2,2}, \ldots, f_{l,1}, f_{l,2}, \ldots, f_{l,m+n} \right]^T    (8)
Wherein, l is the number of frames of the candidate sequence after stretching or compression, i.e., the sequence length; m and n are respectively the number of projection components in the horizontal and vertical directions; f_{i,j} is the j-th component of the i-th vector in the inter-frame difference vector sequence corresponding to the candidate sequence.
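A sketch of the resampling and normalization of Eq. (8), assuming the candidate sequence's inter-frame differences are given as an (l−1) × (m+n) matrix; the default `target_len` of 30 follows the range suggested in claim 1:

```python
import numpy as np

def candidate_input_vector(diff_seq, target_len=30):
    """Resample a candidate sequence's inter-frame difference matrix
    to a fixed number of rows by linear interpolation, then flatten
    and scale by the largest absolute component, as in Eq. (8)."""
    diff_seq = np.asarray(diff_seq, dtype=float)
    l, d = diff_seq.shape
    old_pos = np.linspace(0.0, 1.0, l)
    new_pos = np.linspace(0.0, 1.0, target_len)
    # Interpolate each projection component independently.
    resampled = np.stack(
        [np.interp(new_pos, old_pos, diff_seq[:, k]) for k in range(d)],
        axis=1)
    x = resampled.ravel()
    peak = np.max(np.abs(x))
    return x / peak if peak > 0 else x
```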
(2) determining the support vector set from training samples
Once the feature vector of the problem has been determined, the support vector set of the SVM can be obtained by using the feature vectors of candidate sequences of known type as training samples; the support vector machine is then constructed from the support vector set.
(3) constructing the following support vector machine to classify the candidate frame sequences:
f(x) = \mathrm{sgn}\left[ \sum_{x_i \in SV} \alpha_i y_i K(x_i, x) + b \right]    (9)
Wherein, K is the RBF kernel function, and the threshold b can be obtained from any standard support vector.
CN200910154120A 2009-11-05 2009-11-05 Video shot boundary detection method based on mixed projection function and support vector machine Pending CN101719271A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910154120A CN101719271A (en) 2009-11-05 2009-11-05 Video shot boundary detection method based on mixed projection function and support vector machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910154120A CN101719271A (en) 2009-11-05 2009-11-05 Video shot boundary detection method based on mixed projection function and support vector machine

Publications (1)

Publication Number Publication Date
CN101719271A true CN101719271A (en) 2010-06-02

Family

ID=42433841

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910154120A Pending CN101719271A (en) 2009-11-05 2009-11-05 Video shot boundary detection method based on mixed projection function and support vector machine

Country Status (1)

Country Link
CN (1) CN101719271A (en)


Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129681A (en) * 2011-02-28 2011-07-20 太原理工大学 Method for shot boundary detection
CN102129681B (en) * 2011-02-28 2013-05-29 太原理工大学 Method for shot boundary detection
CN102831399A (en) * 2012-07-30 2012-12-19 华为技术有限公司 Method and device for determining eye state
CN103310451A (en) * 2013-06-17 2013-09-18 中国传媒大学 Video shot boundary detection method based on progressive dichotomy and adaptive thresholds
CN103310451B (en) * 2013-06-17 2016-12-28 中国传媒大学 Based on progressive two points and the Methods for Shot Boundary Detection of Video Sequences of adaptive threshold
CN104978731A (en) * 2014-04-10 2015-10-14 联想(北京)有限公司 Information processing method and electronic equipment
CN104866825A (en) * 2015-05-17 2015-08-26 华南理工大学 Gesture language video frame sequence classification method based on Hu moments
CN104866825B (en) * 2015-05-17 2019-01-29 华南理工大学 A kind of sign language video frame sequence classification method based on Hu square
CN109917974A (en) * 2019-03-20 2019-06-21 安徽慧视金瞳科技有限公司 A kind of non-linear coordinate mapping method of interactive projection system
CN109917974B (en) * 2019-03-20 2022-03-22 安徽慧视金瞳科技有限公司 Non-linear point coordinate mapping method for interactive projection system
CN113112519A (en) * 2021-04-23 2021-07-13 电子科技大学 Key frame screening method based on interested target distribution

Similar Documents

Publication Publication Date Title
CN110400332B (en) Target detection tracking method and device and computer equipment
CN111008562B (en) Human-vehicle target detection method with feature map depth fusion
CN109033950B (en) Vehicle illegal parking detection method based on multi-feature fusion cascade depth model
CN108492319B (en) Moving target detection method based on deep full convolution neural network
CN101719271A (en) Video shot boundary detection method based on mixed projection function and support vector machine
CN102800095B (en) Lens boundary detection method
CN104978567B (en) Vehicle checking method based on scene classification
CN113052210A (en) Fast low-illumination target detection method based on convolutional neural network
CN102176208B (en) Robust video fingerprint method based on three-dimensional space-time characteristics
CN102592147A (en) Method and device for detecting human face
CN107944354B (en) Vehicle detection method based on deep learning
CN104036284A (en) Adaboost algorithm based multi-scale pedestrian detection method
CN102915544A (en) Video image motion target extracting method based on pattern detection and color segmentation
Liang et al. A video shot boundary detection approach based on CNN feature
WO2017166597A1 (en) Cartoon video recognition method and apparatus, and electronic device
CN110969164A (en) Low-illumination imaging license plate recognition method and device based on deep learning end-to-end
CN111597875A (en) Traffic sign identification method, device, equipment and storage medium
CN107452212B (en) Crossing signal lamp control method and system
CN111488839B (en) Target detection method and target detection system
CN113408550A (en) Intelligent weighing management system based on image processing
CN112580435A (en) Face positioning method, face model training and detecting method and device
CN106951831B (en) Pedestrian detection tracking method based on depth camera
CN115546610A (en) Infrared small target detection method based on multi-mechanism attention collaborative fusion contrast
CN113298027B (en) Flame detection method and device, electronic equipment and storage medium
Zhu et al. RHA-Net: An Encoder-Decoder Network with Residual Blocks and Hybrid Attention Mechanisms for Pavement Crack Segmentation

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20100602