CN107222795A - A kind of video abstraction generating method of multiple features fusion - Google Patents
A kind of video abstraction generating method of multiple features fusion
- Publication number
- CN107222795A CN107222795A CN201710486660.9A CN201710486660A CN107222795A CN 107222795 A CN107222795 A CN 107222795A CN 201710486660 A CN201710486660 A CN 201710486660A CN 107222795 A CN107222795 A CN 107222795A
- Authority
- CN
- China
- Prior art keywords
- video
- frame
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/8549—Creating video summaries, e.g. movie trailer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
Abstract
The invention provides a multi-feature-fusion video summary generation method comprising the following steps: obtain a video and take the video as input data; segment the input video data into fragments, recording the cut points and the number of video segments; extract the video frames and frame center blocks in each video segment; compute features and image quality for the extracted video frames and frame center blocks; compute global importance and local importance from the obtained features; fuse the global importance and local importance of each frame to obtain a fused importance; compute the importance of each video segment according to the cut points; select video segments according to the obtained importance of each segment and a preset threshold, obtaining an optimized video segment subset; and synthesize the video summary from the selected video segment subset.
Description
Technical field
The present invention relates to video analysis and image processing techniques, and in particular to a multi-feature-fusion video summary generation method.
Background technology
The rapid development of Internet technology and smart devices has made the ways in which people obtain and browse video increasingly diverse, while the volume of video data they face keeps growing. Faced with such a massive amount of video data, how to find the video data or visual information we need is a current research hotspot and a central topic of video analysis technology. Research on massive video data still lacks adequate methods for analyzing, processing and storing it, so users search for useful video data blindly. Moreover, most current methods produce unsatisfactory video summary results, because many of them generate static video summaries, which are inconvenient for users to browse and do little to help users grasp the video content. It is therefore necessary to apply data mining and image processing to video data to obtain a practical multi-feature-fusion video summary generation method based on global importance and local importance.
Summary of the invention
It is an object of the invention to provide a multi-feature-fusion video summary generation method based on global importance and local importance, comprising the following steps:
Step 1, obtaining a video and taking the video as input data;
Step 2, segmenting the input video data into fragments and recording the cut points and the number of video segments;
Step 3, extracting the video frames and frame center blocks in each video segment;
Step 4, computing features and image quality for the extracted video frames and frame center blocks;
Step 5, computing global importance and local importance from the obtained features;
Step 6, fusing the global importance and local importance of each frame to obtain a fused importance;
Step 7, computing the importance of each video segment according to the cut points;
Step 8, selecting video segments according to the obtained importance of each video segment and a preset threshold, to obtain an optimized video segment subset;
Step 9, synthesizing a video summary from the selected video segment subset.
The present invention makes use of the various video data a user can obtain, including video acquired through smart devices and video obtained from the Internet; video data from these varied sources covers, as far as possible, all kinds of video data on the network. The present invention can quickly obtain the video summary the user wants without any training, saving the user a substantial amount of time and effort. In addition, the present invention detects whether the video contains audio information and, if so, dynamically extracts the audio information into the video summary. When presenting the summary result to the user, the present invention applies video analysis and image processing techniques to analyze and process the original video into a condensed video summary, enabling the user to quickly obtain the condensed video and considerably improving the user experience.
The present invention is described further below with reference to the accompanying drawings.
Brief description of the drawings
Fig. 1 is a flow chart of the multi-feature-fusion video summary generation method based on global importance and local importance according to the present invention.
Fig. 2 is a schematic diagram of original video frames extracted from an original video by the present invention.
Fig. 3 is a schematic diagram showing how a video frame extracted by the present invention is first divided into 5x5 small blocks, after which the central 3x3 block is extracted for computing local importance.
Fig. 4 shows the demonstration effect of the video summary generation system of multi-feature fusion based on global importance and local importance according to the present invention.
Detailed description of the embodiments
With reference to Fig. 1, a multi-feature-fusion video summary generation method based on global importance and local importance comprises the following steps:
Step 1, obtaining a video and taking the video as input data;
Step 2, processing the input video data to obtain the individual cut points and the number of video segments;
Step 3, extracting the video frames and frame center blocks in each video segment;
Step 4, computing features and image quality for the extracted video frames and frame center blocks;
Step 5, computing global importance and local importance from the obtained features;
Step 6, fusing the global importance and local importance of each frame to obtain the final fused importance;
Step 7, computing the importance of each video segment according to the cut points;
Step 8, selecting video segments according to the obtained importance of each video segment and a preset threshold, to obtain an optimized video segment subset;
Step 9, synthesizing a video summary from the selected video segment subset.
The video data in step 1 can be obtained from the Internet and various smart devices. Websites for obtaining video include http://www.youku.com/ and http://www.iqiyi.com/, among others; smart devices for obtaining video include smartphones, tablets and the like.
In step 2, the acquired video is taken as input and segmented into fragments: the superframe segmentation method, combined with the foreground, background and motion information of the video, splits the video into small video segments, yielding the cut points and the number of video segments. The cut points and the segment count are stored for later computation.
In step 3, video frames and frame center blocks are extracted from the video. Frames are extracted with a conventional extraction method, but extracting the frame center block requires first dividing the frame. Here, so that the visual content is effectively preserved, the video frame is divided into 5x5 blocks, and the central 3x3 block is then extracted for computing local importance.
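The 5x5-to-central-3x3 extraction above can be sketched with simple array slicing. This is a minimal illustration: the function name and the NumPy array representation are assumptions, not the patent's implementation.

```python
import numpy as np

def extract_center_block(frame, grid=5, center=3):
    """Tile the frame into a grid x grid layout and return the central
    center x center block of tiles (here 5x5 -> central 3x3), as used
    for the local-importance computation."""
    h, w = frame.shape[:2]
    th, tw = h // grid, w // grid       # tile height and width
    start = (grid - center) // 2        # index of the first central tile
    return frame[start * th:(start + center) * th,
                 start * tw:(start + center) * tw]

# A 100x100 frame yields a 60x60 central block (3/5 of each side).
frame = np.zeros((100, 100, 3), dtype=np.uint8)
block = extract_center_block(frame)
```

Whatever features are computed on the full frame can then also be computed on `block` to obtain the local counterparts.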
In step 4, image features and image quality are computed for the extracted video frames and frame center blocks. The computed features include visual saliency, exposure, saturation, chroma, the Rule of Thirds, contrast and directionality; in addition, the image quality of each video frame and frame center block must be computed. The calculation formula of visual saliency is:
f1 = F_A(A_S, A_T) = (1/2)·[(A_S + A_T) + |A_S − A_T| / (1 + γ)], where γ > 0 (1)
where A_S is the static saliency, A_T is the temporal saliency, γ is a non-negative empirical parameter, and F_A is simply a function name denoting the fusion of the two kinds of visual saliency;
The calculation formula of exposure is:
f2 = (1/(XY)) Σ_{x_v=0}^{X−1} Σ_{y_v=0}^{Y−1} I_V(x_v, y_v) (2)
where X and Y are the dimensions of the extracted video image after conversion to an HSV image, x_v and y_v are pixel positions in channel V, and I_V(x_v, y_v) is the V channel of the HSV image.
The calculation formula of chroma is:
f3 = (1/(XY)) Σ_{x_s=0}^{X−1} Σ_{y_s=0}^{Y−1} I_S(x_s, y_s) (3)
where x_s and y_s are pixel positions in channel S, and I_S(x_s, y_s) is the S channel of the HSV image.
The calculation formula of saturation is:
f4 = (1/(XY)) Σ_{x_h=0}^{X−1} Σ_{y_h=0}^{Y−1} I_H(x_h, y_h) (4)
where x_h and y_h are pixel positions in channel H, and I_H(x_h, y_h) is the H channel of the HSV image.
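Formulas (2)-(4) are plain per-channel means over the HSV image. A minimal sketch, assuming the image is a NumPy array with channels ordered H, S, V:

```python
import numpy as np

def hsv_channel_means(hsv):
    """Global channel means per formulas (2)-(4): f = (1/XY) * sum I(x, y).
    Exposure f2 is the mean of channel V, chroma f3 the mean of channel S,
    and the saturation feature f4 the mean of channel H."""
    f2 = hsv[..., 2].mean()  # exposure:           V channel
    f3 = hsv[..., 1].mean()  # chroma:             S channel
    f4 = hsv[..., 0].mean()  # saturation feature: H channel
    return f2, f3, f4

# Example: an image whose V channel is 0.5 everywhere and H, S are zero.
hsv = np.zeros((4, 4, 3))
hsv[..., 2] = 0.5
f2, f3, f4 = hsv_channel_means(hsv)
```

In practice the frame would first be converted from RGB/BGR to HSV (e.g. with an image library); that conversion step is outside the patent's formulas.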
The Rule of Thirds calculation formulas are:
f5 = (9/(XY)) Σ_{x_h=X/3}^{2X/3} Σ_{y_h=Y/3}^{2Y/3} I_H(x_h, y_h) (5)
f6 = (9/(XY)) Σ_{x_s=X/3}^{2X/3} Σ_{y_s=Y/3}^{2Y/3} I_S(x_s, y_s) (6)
f7 = (9/(XY)) Σ_{x_v=X/3}^{2X/3} Σ_{y_v=Y/3}^{2Y/3} I_V(x_v, y_v) (7)
where I_H, I_S and I_V are the three channels of the HSV image. f5, f6 and f7 are the three feature values computed according to the Rule of Thirds; they mainly reflect whether the main information of the image lies near the one-third positions of the image.
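Since each sum in (5)-(7) runs over the central-third window and is scaled by 9/(XY), f5-f7 are (up to boundary rounding) the channel means around the image center. A sketch under that reading, with integer division standing in for the X/3 bounds:

```python
import numpy as np

def rule_of_thirds_features(hsv):
    """f5-f7 per formulas (5)-(7): 9/(XY) times the sum of each HSV
    channel over the central window [X/3, 2X/3] x [Y/3, 2Y/3] --
    approximately the channel mean near the image center."""
    X, Y = hsv.shape[0], hsv.shape[1]
    win = hsv[X // 3:2 * X // 3, Y // 3:2 * Y // 3]
    scale = 9.0 / (X * Y)
    f5 = scale * win[..., 0].sum()  # H channel
    f6 = scale * win[..., 1].sum()  # S channel
    f7 = scale * win[..., 2].sum()  # V channel
    return f5, f6, f7

# For a constant image every feature equals the constant value.
hsv = np.ones((90, 90, 3))
f5, f6, f7 = rule_of_thirds_features(hsv)
```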
Contrast and directionality are computed using Tamura texture features. The Tamura texture features comprise six features: coarseness, contrast, directionality, line-likeness, regularity and roughness; the first three of these six features are particularly important in the field of image retrieval.
The image quality q_Gk of a video frame and the image quality q_Lk of a frame center block are obtained by a no-reference image quality assessment method. Image quality is mainly used to constrain the quality of the extracted video frames and frame center blocks: some frames and center blocks extracted from a video may be of relatively low quality, so it must be considered whether the features computed on such blurred or distorted frames and blocks can still represent the video well, since image quality has a very important effect on the quality of the generated video summary.
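The patent does not name the no-reference quality measure it uses. As a purely illustrative stand-in — an assumption, not the patent's method — the variance of a discrete Laplacian is a common blur-sensitive quality proxy:

```python
import numpy as np

def quality_proxy(gray):
    """Blur-sensitive stand-in for the (unspecified) no-reference image
    quality score q: the variance of a 4-neighbour discrete Laplacian.
    Blurred frames have little high-frequency content, so the variance
    is low; sharp, detailed frames score higher."""
    g = gray.astype(float)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())
```

Any real no-reference IQA method (the literature offers several) could be substituted here as long as it yields a scalar q per frame and per center block.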
In step 5, the global importance and local importance of every video frame are computed. The calculation formula of the global importance is:
I_Gk = q_Gk·[fG_1 + fG_2 + (1 − fG_3) + fG_4 + fG_5 + (1 − fG_6) + fG_7 + fG_8 + fG_9] (8)
where k indexes the k-th video frame, q_Gk is the quality of the video frame, and fG_1~fG_9 are the values of the nine features of the video frame required to be computed in step 4.
The calculation formula of the local importance is:
I_Lk = q_Lk·[fL_1 + fL_2 + (1 − fL_3) + fL_4 + fL_5 + (1 − fL_6) + fL_7 + fL_8 + fL_9] (9)
where q_Lk is the quality of the frame center block and fL_1~fL_9 are the values of the nine features of the frame center block.
In step 6, the fused importance of every video frame is computed. The fused importance consists of two parts, the global importance and the local importance. Its calculation formula is:
I_Gk&Lk = I_Gk + I_Lk (10)
where I_Gk and I_Lk are respectively the global importance and the local importance of the video frame.
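Formulas (8)-(10) reduce to a weighted feature sum per frame plus an addition. A minimal sketch (function names are assumptions):

```python
def frame_importance(q, f):
    """Per-frame importance per formulas (8)/(9): quality q times the
    sum of the nine feature values f[0..8], where the third and sixth
    features enter as (1 - f), exactly as in the patent's formulas."""
    f1, f2, f3, f4, f5, f6, f7, f8, f9 = f
    return q * (f1 + f2 + (1 - f3) + f4 + f5 + (1 - f6) + f7 + f8 + f9)

def fused_importance(i_g, i_l):
    """Formula (10): fused importance = global + local importance."""
    return i_g + i_l

# Global importance from full-frame features, local from center-block
# features; with all nine features at 0.5 the bracket sums to 4.5.
ig = frame_importance(1.0, [0.5] * 9)
il = frame_importance(0.8, [0.5] * 9)
total = fused_importance(ig, il)
```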
In step 7, the importance of each video segment is computed: from the cut points of the video segments obtained in step 2 and the fused importance of each video frame obtained in step 6, the average fused importance of each video segment is calculated; this importance is computed mainly in preparation for the subsequent selection of the video segment subset.
The calculation formulas for a video segment are:
I_C = Σ_{k=i}^{next_i} I_Gk&Lk (11)
I_j = I_C / (next_i − i + 1) (12)
where I_C is the sum of the fused importances of the frames in the video segment, I_j is the average fused importance of the video segment, i is a cut point obtained in step 2 and next_i is the next cut point. The average fused importance I_j serves as the basis for the subsequent selection of the video segment subset.
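The per-segment average of step 7 can be sketched directly from the per-frame fused importances and a pair of cut points (the function name is an assumption):

```python
def segment_average_importance(fused, i, next_i):
    """Average fused importance I_j of the segment running from cut
    point i to the next cut point next_i (frame indices, inclusive):
    I_C = sum of per-frame fused importances, I_j = I_C/(next_i - i + 1)."""
    i_c = sum(fused[i:next_i + 1])
    return i_c / (next_i - i + 1)

# Segment covering frames 1..3 of a 5-frame clip: (2 + 3 + 4) / 3.
fused = [1.0, 2.0, 3.0, 4.0, 5.0]
ij = segment_average_importance(fused, 1, 3)
```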
In step 8, a subset is selected from the set of video segments obtained by the segmentation in step 2, according to the fused importance of each video segment computed in step 7 and a preset threshold. Here the threshold is set as the proportion of all video segments taken up by the summary segments. This proportion must be neither too high nor too low: too many or too few selected segments would degrade the quality of the video summary. Setting the proportion to, for example, 15% or 20% is appropriate.
The calculation formula for selecting the subset is:
I = argmax Σ_{c=1}^{N} {1,0}·I_C (13)
where N is the total number of video segments and {1,0} is a decision function used to judge whether a video segment is chosen as part of the video summary: if the segment is chosen as part of the summary, the value of the function is 1, otherwise it is 0. With this formula a suitable subset of video segments can be selected.
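One simple way to realize this selection is a greedy sketch: take segments in decreasing importance until the summary reaches the threshold proportion of the total length. The greedy order is an assumption — the patent only states the argmax objective and the 15-20% threshold:

```python
def select_segments(importance, length, ratio=0.15):
    """Greedy sketch of the step-8 subset selection: pick segments in
    decreasing average importance while the summary stays within `ratio`
    of the total video length. A picked segment corresponds to the
    decision function {1,0} taking the value 1."""
    budget = ratio * sum(length)
    used, chosen = 0.0, []
    for c in sorted(range(len(importance)), key=lambda c: importance[c],
                    reverse=True):
        if used + length[c] <= budget:
            chosen.append(c)
            used += length[c]
    return sorted(chosen)  # temporal order, ready for step-9 synthesis

# Four 10-frame segments, 50% budget: the two most important fit.
picked = select_segments([5.0, 1.0, 3.0, 4.0], [10, 10, 10, 10], ratio=0.5)
```

Returning the chosen indices in sorted order matches step 9, which merges the selected segments in their original temporal order.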
In step 9, the video summary is synthesized from the video segment subset selected in step 8. Synthesis means merging the video segments of the resulting subset according to their order in the original video. During synthesis of the summary it must be considered whether the video contains audio information; if it does, the audio information is included while synthesizing the summary. Fig. 4 shows the video summary demonstration system. This video summarization method presents the summary result to the user in a concise form, significantly improving the user's experience of viewing video data and meeting the user's needs.
Claims (10)
1. A multi-feature-fusion video summary generation method, characterised by comprising the following steps:
Step 1, obtaining a video and taking the video as input data;
Step 2, segmenting the input video data into fragments and recording the cut points and the number of video segments;
Step 3, extracting the video frames and frame center blocks in each video segment;
Step 4, computing features and image quality for the extracted video frames and frame center blocks;
Step 5, computing global importance and local importance from the obtained features;
Step 6, fusing the global importance and local importance of each frame to obtain a fused importance;
Step 7, computing the importance of each video segment according to the cut points;
Step 8, selecting video segments according to the obtained importance of each video segment and a preset threshold, to obtain an optimized video segment subset;
Step 9, synthesizing a video summary from the selected video segment subset.
2. The method according to claim 1, characterised in that in step 2 the input video is split into several small video segments by the superframe segmentation method using the computed foreground, background and motion information of the video, yielding the cut points and the number of video segments.
3. The method according to claim 1, characterised in that the process of extracting the frame center block in step 3 is: the video frame is divided into 5x5 blocks, and the central 3x3 block is then extracted.
4. The method according to claim 1, characterised in that the features computed in step 4 comprise visual saliency f1, exposure f2, chroma f3, saturation f4, the three Rule of Thirds feature values f5, f6, f7, contrast f8 and directionality f9, and the image quality computed in step 4 comprises the image quality q_Gk of the video frame and the image quality q_Lk of the frame center block, wherein
f1 = F_A(A_S, A_T) = (1/2)·[(A_S + A_T) + |A_S − A_T| / (1 + γ)], where γ > 0 (1)
where A_S is the static saliency, A_T is the temporal saliency and γ is a non-negative empirical parameter;
f2 = (1/(XY)) Σ_{x_v=0}^{X−1} Σ_{y_v=0}^{Y−1} I_V(x_v, y_v) (2)
where X and Y are the dimensions of the extracted video image after conversion to an HSV image, x_v and y_v are pixel positions in channel V, and I_V(x_v, y_v) is the V channel of the HSV image;
f3 = (1/(XY)) Σ_{x_s=0}^{X−1} Σ_{y_s=0}^{Y−1} I_S(x_s, y_s) (3)
where x_s and y_s are pixel positions in channel S, and I_S(x_s, y_s) is the S channel of the HSV image;
f4 = (1/(XY)) Σ_{x_h=0}^{X−1} Σ_{y_h=0}^{Y−1} I_H(x_h, y_h) (4)
where x_h and y_h are pixel positions in channel H, and I_H(x_h, y_h) is the H channel of the HSV image;
f5 = (9/(XY)) Σ_{x_h=X/3}^{2X/3} Σ_{y_h=Y/3}^{2Y/3} I_H(x_h, y_h) (5)
f6 = (9/(XY)) Σ_{x_s=X/3}^{2X/3} Σ_{y_s=Y/3}^{2Y/3} I_S(x_s, y_s) (6)
f7 = (9/(XY)) Σ_{x_v=X/3}^{2X/3} Σ_{y_v=Y/3}^{2Y/3} I_V(x_v, y_v) (7)
contrast f8 and directionality f9 are computed using Tamura texture features;
the image quality q_Gk of the video frame and the image quality q_Lk of the frame center block are obtained by a no-reference image quality assessment method.
5. The method according to claim 4, characterised in that the calculation formula of the global importance I_Gk in step 5 is:
I_Gk = q_Gk·[fG_1 + fG_2 + (1 − fG_3) + fG_4 + fG_5 + (1 − fG_6) + fG_7 + fG_8 + fG_9] (8)
where k is the index of the video frame and fG_1~fG_9 are the values of the 9 features based on the video frame;
the calculation formula of the local importance I_Lk in step 5 is:
I_Lk = q_Lk·[fL_1 + fL_2 + (1 − fL_3) + fL_4 + fL_5 + (1 − fL_6) + fL_7 + fL_8 + fL_9] (9)
where fL_1~fL_9 are the values of the 9 features based on the frame center block.
6. The method according to claim 1, characterised in that step 6 obtains the fused importance by formula (10):
I_Gk&Lk = I_Gk + I_Lk (10)
where k is the index of the video frame, I_Gk&Lk is the fused importance, and I_Gk and I_Lk are respectively the global importance and the local importance of the video frame.
7. according to the method described in claim 1, it is characterised in that each video segment importance described in step 7 includes regarding
The fusion importance sum I of frequency fragmentC, video segment average fusion importance Ij,
<mrow>
<msub>
<mi>I</mi>
<mi>j</mi>
</msub>
<mo>=</mo>
<mfrac>
<msub>
<mi>I</mi>
<mi>C</mi>
</msub>
<mrow>
<mi>n</mi>
<mi>e</mi>
<mi>x</mi>
<mi>t</mi>
<mo>_</mo>
<mi>i</mi>
<mo>-</mo>
<mi>i</mi>
<mo>+</mo>
<mn>1</mn>
</mrow>
</mfrac>
<mo>-</mo>
<mo>-</mo>
<mo>-</mo>
<mrow>
<mo>(</mo>
<mn>12</mn>
<mo>)</mo>
</mrow>
</mrow>
Wherein k is the index of the video frame, I_Gk&Lk is the fusion importance of each frame, i denotes the i-th cut point, and next_i is the next cut point.
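A hedged sketch of formula (12): averaging the per-frame fusion importance over the segment that spans frames i through next_i inclusive. Zero-based frame indexing is an implementation choice made here, not stated in the claim.

```python
# Sketch of formula (12): average fusion importance of the segment
# between cut point i and the next cut point next_i (inclusive).
def segment_average_importance(fusion, i, next_i):
    segment = fusion[i:next_i + 1]      # frames i..next_i inclusive
    i_c = sum(segment)                  # fusion-importance sum I_C
    return i_c / (next_i - i + 1)       # average fusion importance I_j
```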
8. The method according to claim 7, wherein step 8 selects the optimized subset of video segments by formula (12):
I = argmax Σ_{c=1}^{N} {1,0} * I_C (12)
Wherein N is the total number of video segments, and {1, 0} is a decision function indicating whether a given video segment is selected as part of the video summary: the function takes the value 1 if the segment is selected, and 0 otherwise.
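One way to realize this selection is a greedy sketch: rank segments by importance and set the {1, 0} decision for the top ones. The `max_segments` budget is an assumption added here to make the argmax non-trivial; the claim itself does not fix the constraint.

```python
# Illustrative sketch of the selection in formula (12): assign a {1,0}
# decision to each segment so as to maximize the total fusion importance,
# under an assumed budget on the number of selected segments.
def select_segments(segment_importance, max_segments):
    ranked = sorted(range(len(segment_importance)),
                    key=lambda c: segment_importance[c], reverse=True)
    chosen = set(ranked[:max_segments])
    # decision[c] plays the role of the {1,0} function in the claim.
    decision = [1 if c in chosen else 0
                for c in range(len(segment_importance))]
    total = sum(d * s for d, s in zip(decision, segment_importance))
    return decision, total
```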
9. The method according to claim 1, wherein in step 9 the video segments selected in step 7 are merged, each segment in the subset being taken in its order of appearance in the original video.
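The merge in step 9 restores temporal order regardless of importance ranking; a minimal sketch, representing each segment as a hypothetical (start_frame, end_frame) tuple:

```python
# Sketch of step 9: concatenate the selected segments in their original
# temporal order (by start frame), not in importance order.
def merge_in_original_order(selected_segments):
    return sorted(selected_segments, key=lambda seg: seg[0])
```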
10. The method according to claim 9, wherein if the video contains audio information, the audio information is included in the synthesis of the video summary.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710486660.9A CN107222795B (en) | 2017-06-23 | 2017-06-23 | Multi-feature fusion video abstract generation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107222795A true CN107222795A (en) | 2017-09-29 |
CN107222795B CN107222795B (en) | 2020-07-31 |
Family
ID=59950929
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710486660.9A Active CN107222795B (en) | 2017-06-23 | 2017-06-23 | Multi-feature fusion video abstract generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107222795B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102930061A (en) * | 2012-11-28 | 2013-02-13 | 安徽水天信息科技有限公司 | Video abstraction method and system based on moving target detection |
US20140037269A1 (en) * | 2012-08-03 | 2014-02-06 | Mrityunjay Kumar | Video summarization using group sparsity analysis |
CN105228033A (en) * | 2015-08-27 | 2016-01-06 | 联想(北京)有限公司 | A kind of method for processing video frequency and electronic equipment |
US20160299968A1 (en) * | 2015-04-09 | 2016-10-13 | Yahoo! Inc. | Topical based media content summarization system and method |
CN106713964A (en) * | 2016-12-05 | 2017-05-24 | 乐视控股(北京)有限公司 | Method of generating video abstract viewpoint graph and apparatus thereof |
CN106941631A (en) * | 2015-11-20 | 2017-07-11 | 联发科技股份有限公司 | Summarized radio production method and video data processing system |
2017-06-23: application CN201710486660.9A filed; granted as CN107222795B (active).
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108804578A (en) * | 2018-05-24 | 2018-11-13 | 南京理工大学 | The unsupervised video summarization method generated based on consistency segment |
CN108804578B (en) * | 2018-05-24 | 2022-06-07 | 南京理工大学 | Unsupervised video abstraction method based on consistency segment generation |
CN110868630A (en) * | 2018-08-27 | 2020-03-06 | 北京优酷科技有限公司 | Method and device for generating forecast report |
WO2020077999A1 (en) * | 2018-10-19 | 2020-04-23 | 深圳市商汤科技有限公司 | Video abstract generation method and apparatus, electronic device and computer storage medium |
CN111246246A (en) * | 2018-11-28 | 2020-06-05 | 华为技术有限公司 | Video playing method and device |
WO2020134926A1 (en) * | 2018-12-28 | 2020-07-02 | 广州市百果园信息技术有限公司 | Video quality evaluation method, apparatus and device, and storage medium |
US11762905B2 (en) | 2018-12-28 | 2023-09-19 | Bigo Technology Pte. Ltd. | Video quality evaluation method and apparatus, device, and storage medium |
CN109819338A (en) * | 2019-02-22 | 2019-05-28 | 深圳岚锋创视网络科技有限公司 | A kind of automatic editing method, apparatus of video and portable terminal |
US11955143B2 (en) | 2019-02-22 | 2024-04-09 | Arashi Vision Inc. | Automatic video editing method and portable terminal |
CN109819338B (en) * | 2019-02-22 | 2021-09-14 | 影石创新科技股份有限公司 | Automatic video editing method and device and portable terminal |
CN111062284A (en) * | 2019-12-06 | 2020-04-24 | 浙江工业大学 | Visual understanding and diagnosing method of interactive video abstract model |
CN111062284B (en) * | 2019-12-06 | 2023-09-29 | 浙江工业大学 | Visual understanding and diagnosis method for interactive video abstract model |
CN111641868A (en) * | 2020-05-27 | 2020-09-08 | 维沃移动通信有限公司 | Preview video generation method and device and electronic equipment |
CN112052841A (en) * | 2020-10-12 | 2020-12-08 | 腾讯科技(深圳)有限公司 | Video abstract generation method and related device |
CN112052841B (en) * | 2020-10-12 | 2021-06-29 | 腾讯科技(深圳)有限公司 | Video abstract generation method and related device |
CN112734733A (en) * | 2021-01-12 | 2021-04-30 | 天津大学 | Non-reference image quality monitoring method based on channel recombination and feature fusion |
CN112734733B (en) * | 2021-01-12 | 2022-11-01 | 天津大学 | Non-reference image quality monitoring method based on channel recombination and feature fusion |
CN113052149B (en) * | 2021-05-20 | 2021-08-13 | 平安科技(深圳)有限公司 | Video abstract generation method and device, computer equipment and medium |
CN113052149A (en) * | 2021-05-20 | 2021-06-29 | 平安科技(深圳)有限公司 | Video abstract generation method and device, computer equipment and medium |
CN114140461B (en) * | 2021-12-09 | 2023-02-14 | 成都智元汇信息技术股份有限公司 | Picture cutting method based on edge picture recognition box, electronic equipment and medium |
CN114140461A (en) * | 2021-12-09 | 2022-03-04 | 成都智元汇信息技术股份有限公司 | Picture cutting method based on edge picture recognition box, electronic equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN107222795B (en) | 2020-07-31 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |