CN107222795B - Multi-feature fusion video abstract generation method - Google Patents
Multi-feature fusion video abstract generation method
- Publication number
- CN107222795B CN107222795B CN201710486660.9A CN201710486660A CN107222795B CN 107222795 B CN107222795 B CN 107222795B CN 201710486660 A CN201710486660 A CN 201710486660A CN 107222795 B CN107222795 B CN 107222795B
- Authority
- CN
- China
- Prior art keywords
- video
- importance
- frame
- video frame
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Links
- 238000000034 method Methods 0.000 title claims abstract description 24
- 230000004927 fusion Effects 0.000 title claims description 23
- 230000011218 segmentation Effects 0.000 claims abstract description 11
- 230000002194 synthesizing effect Effects 0.000 claims abstract description 6
- 238000004364 calculation method Methods 0.000 claims description 14
- 230000000007 visual effect Effects 0.000 claims description 6
- 230000015572 biosynthetic process Effects 0.000 claims description 2
- 238000013441 quality evaluation Methods 0.000 claims description 2
- 230000003068 static effect Effects 0.000 claims description 2
- 238000003786 synthesis reaction Methods 0.000 claims description 2
- 230000002123 temporal effect Effects 0.000 claims description 2
- 238000012545 processing Methods 0.000 description 5
- 238000005516 engineering process Methods 0.000 description 4
- 238000004458 analytical method Methods 0.000 description 3
- 238000010586 diagram Methods 0.000 description 3
- 238000000605 extraction Methods 0.000 description 3
- 238000011160 research Methods 0.000 description 3
- 238000007418 data mining Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000018109 developmental process Effects 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 238000003860 storage Methods 0.000 description 1
- 238000012549 training Methods 0.000 description 1
Images
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/85—Assembly of content; Generation of multimedia applications
- H04N21/854—Content authoring
- H04N21/8549—Creating video summaries, e.g. movie trailer
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/44016—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Databases & Information Systems (AREA)
- Computer Security & Cryptography (AREA)
- Television Signal Processing For Recording (AREA)
- Other Investigation Or Analysis Of Materials By Electrical Means (AREA)
Abstract
The invention provides a multi-feature fusion video abstract generation method, which comprises the following steps: acquiring a video and taking it as input data; segmenting the input video data and recording the segmentation points and the number of video segments; extracting a video frame and a video frame center block from each video segment; calculating the features and the image quality of the extracted video frame and of the video frame center block respectively; calculating the global importance and the local importance from the obtained features; fusing the global importance and local importance of each frame to obtain the fused importance; calculating the importance of each video segment from the segmentation points; selecting video segments according to each segment's importance and a set threshold, obtaining an optimized subset of video segments; and synthesizing the video abstract from the selected subset of video segments.
Description
Technical Field
The invention relates to video analysis and image processing technology, in particular to a multi-feature fusion video abstract generation method.
Background
With the rapid development of Internet technology and devices, people acquire and browse more and more videos, and the volume of video data they face keeps growing. Given such a large amount of video data, how to find the needed videos or visual information within it is a current research hotspot and a core topic of video analysis technology. Methods for analyzing, processing and storing massive video data are still lacking, so users search for useful video data largely blindly. A robust video abstract generation method based on multi-feature fusion of global importance and local importance, built on data mining and image processing of video data, is therefore needed.
Disclosure of Invention
The invention aims to provide a video abstract generation method based on multi-feature fusion of global importance and local importance, which comprises the following steps:
step 1, acquiring a video and taking it as input data;
step 2, segmenting the input video data, and recording the segmentation points and the number of video segments;
step 3, extracting a video frame and a video frame center block in each video clip;
step 4, calculating the characteristics and the image quality of the extracted video frame and the central block of the video frame respectively;
step 5, calculating the global importance and the local importance according to the obtained features;
step 6, fusing the obtained global importance and the local importance of each frame to obtain fusion importance;
step 7, calculating the importance of each video segment according to the dividing points;
step 8, selecting the video clips according to the importance of each obtained video clip and a set threshold value, and selecting an optimized video clip subset;
and 9, synthesizing the video abstract according to the selected video segment subset.
The invention uses video data acquired by users from many sources, including smart devices and the Internet, so that the collected data can cover as many kinds of network video as possible. The method obtains the video abstract the user wants quickly and without training, saving the user a great deal of time and effort. In addition, the invention dynamically extracts the audio information in the video, when present, and places it in the video abstract. When the result is presented to the user, video analysis and image processing techniques condense the original video into a concise abstract, so the user quickly obtains the desired condensed video, which greatly improves the user experience.
The invention is further described below with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of a video summary generation method based on multi-feature fusion of global importance and local importance according to the present invention.
Fig. 2 is a schematic diagram of an original video frame extracted from an original video according to the present invention.
Fig. 3 is a schematic diagram of an extracted video frame that is first divided into 5x5 small blocks, from which the central 3x3 block is then extracted for calculating local importance, according to the present invention.
FIG. 4 is an effect diagram of a demonstration of a video summary generation system based on multi-feature fusion of global importance and local importance in the invention.
Detailed Description
With reference to fig. 1, a video abstract generation method based on multi-feature fusion of global importance and local importance includes the following steps:
step 1, acquiring a video and taking it as input data;
step 2, segmenting the input video data to obtain the segmentation points and the number of video segments;
step 3, extracting a video frame and a video frame center block in each video clip;
step 4, calculating the characteristics and the image quality of the extracted video frame and the central block of the video frame respectively;
step 5, calculating the global importance and the local importance according to the obtained features;
step 6, fusing the obtained global importance and the local importance of each frame to obtain final fusion importance;
step 7, calculating the importance of each video segment according to the dividing points;
step 8, setting a threshold value to select the video clips according to the importance of each obtained video clip, and selecting an optimized video clip subset;
and 9, synthesizing the video abstract according to the selected video segment subset.
The video data in step 1 can be obtained through the Internet and various smart devices; websites for obtaining videos include http://www.youku.com/, http://www.iqiyi.com/, and the like, and smart devices for obtaining videos include smartphones, tablets, and the like.
In step 2, the acquired video data is taken as the input video and segmented: a superframe segmentation method, combined with the foreground, background and motion information of the video, cuts the video into small segments, yielding the segmentation points and the number of video segments; both are stored for later calculation.
In step 3, video frames and their center blocks are extracted from the video. Frames are extracted with a conventional method; for the center block, each frame is first divided evenly into 5x5 blocks, in order to preserve the visual content well, and the 3x3 block at the center is then extracted for calculating local importance.
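A minimal sketch of this center-block extraction. How grid boundaries are rounded when the frame size is not divisible by 5 is our assumption; the patent does not specify it:

```python
import numpy as np

def center_block(frame):
    """Divide a frame into a 5x5 grid and return the central 3x3 region,
    as described in step 3. Edge cells absorb any remainder pixels."""
    h, w = frame.shape[:2]
    # Grid line positions for a 5x5 partition.
    rows = [round(h * i / 5) for i in range(6)]
    cols = [round(w * j / 5) for j in range(6)]
    # Keep grid cells 1..3 in both directions (the 3x3 center).
    return frame[rows[1]:rows[4], cols[1]:cols[4]]
```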
In step 4, picture features and image quality are calculated for the extracted video frame and the video frame center block. The calculated features comprise visual saliency, exposure, saturation, chroma, Rule of Thirds, contrast and directionality; in addition, the image quality of the video frame and of the video frame center block must be calculated. The calculation formula of the visual saliency is as follows:
in the formula, A_S is the static saliency, A_T is the temporal saliency, γ is a non-negative empirical parameter, and F_A is simply a function name representing the fusion of the two visual saliencies;
the exposure is calculated by the formula:
wherein X and Y are respectively the length and width of the HSV image converted from the extracted video frame, x and y index the pixel position in channel V, and I_V(x, y) is the V channel of the HSV image.
The formula for calculating the chroma is as follows:
wherein X and Y are respectively the length and width of the HSV image converted from the extracted video frame, x and y index the pixel position in channel S, and I_S(x, y) is the S channel of the HSV image.
The formula for calculating the saturation is:
wherein X and Y are respectively the length and width of the HSV image converted from the extracted video frame, x and y index the pixel position in channel H, and I_H(x, y) is the H channel of the HSV image.
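The exposure, chroma and saturation formulas themselves are not reproduced in this copy of the text. Based on the channel definitions above, a plausible sketch computes each feature as the mean of one HSV channel (f_2 from V, f_3 from S, f_4 from H); the function name and the mean-based form are our assumptions, not the patent's exact formulas:

```python
import numpy as np

def channel_mean_features(hsv):
    """Per-channel means of an HSV image, matching the variable definitions
    above (a sum over all X*Y pixels of one channel, normalized).
    hsv: float array of shape (X, Y, 3) with channels ordered (H, S, V)."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    f2 = v.mean()  # exposure feature, from the V (brightness) channel
    f3 = s.mean()  # chroma feature, from the S channel (per the text)
    f4 = h.mean()  # saturation feature, from the H channel (per the text)
    return f2, f3, f4
```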
The formula for the Rule of Thirds is:
wherein X and Y are respectively the length and width of the HSV image converted from the extracted video frame, x and y index the pixel position in each channel, and I_H(x, y), I_S(x, y), I_V(x, y) are the three channels of the HSV image. f_5, f_6, f_7 are the three feature values calculated according to the Rule of Thirds; they mainly reflect whether the main information in the image lies near the image's third lines.
The contrast and directionality are calculated with Tamura texture features. Tamura image texture comprises six features: coarseness, contrast, directionality, line-likeness, regularity and roughness; the first three of these play a very important role in the field of image retrieval.
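The patent only names the Tamura contrast feature. As a hedged sketch, the standard Tamura definition (standard deviation divided by the fourth root of kurtosis) would look like this:

```python
import numpy as np

def tamura_contrast(gray):
    """Tamura contrast of a grayscale image: sigma / kurtosis^(1/4), where
    kurtosis is the fourth central moment normalized by sigma^4. This is the
    standard Tamura definition; the patent does not spell it out."""
    g = gray.astype(float).ravel()
    sigma = g.std()
    if sigma == 0:
        return 0.0  # flat image: no contrast
    kurtosis = ((g - g.mean()) ** 4).mean() / sigma ** 4
    return sigma / kurtosis ** 0.25
```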
The image quality of the video frame, q_Gk, and the image quality of the video frame center block, q_Lk, are obtained with a no-reference image quality evaluation method. These quality scores weight the extracted frames and center blocks: some frames and center blocks extracted from a video may be of low quality, and we must consider whether features computed from distorted or blurred frames and blocks can still represent the video well, because image quality plays a very important role in generating the video abstract.
In step 5, the global importance and the local importance of each video frame are calculated. The calculation formula of the global importance is as follows:
where k denotes the k-th frame, q_Gk is the quality of the video frame, and f_G_1 ~ f_G_9 are respectively the values of the nine features of the video frame calculated in step 4.
The calculation formula of the local importance is as follows:
where k refers to the k-th frame, q_Lk is the quality of the video frame center block, and f_L_1 ~ f_L_9 are respectively the values of the nine features of the video frame center block.
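The global- and local-importance formulas themselves are not reproduced in this copy. This sketch assumes a quality-weighted sum of the nine feature values, which matches the named inputs (q_Gk or q_Lk plus nine features) but is only our assumption, not the patent's exact formula; the same function serves for I_Gk (whole-frame inputs) and I_Lk (center-block inputs):

```python
def frame_importance(quality, features):
    """Importance of one frame from its quality score and nine feature values.
    Assumed form: quality-weighted sum of the features (placeholder for the
    patent's unreproduced formula)."""
    assert len(features) == 9, "the method uses nine features per frame"
    return quality * sum(features)
```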
In step 6, the fused importance of each video frame is calculated; it consists of two parts, the global importance and the local importance. The calculation formula is as follows:
I_Gk&Lk = I_Gk + I_Lk (10)
wherein I_Gk and I_Lk are respectively the global importance and the local importance of the video frame.
In step 7, the importance of each video segment is calculated: the average fused importance of each segment is computed from the cut points obtained in step 2 and the per-frame fused importance obtained in step 6. This calculation prepares for the selection of the video-segment subset in the next step.
The calculation formulas for a video segment are as follows:
where I_C is the sum of the fused importance of the frames in a segment, I_j is the average fused importance of the segment, i is a cut point obtained in step 2, and next_i is the next cut point.
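The per-segment sum I_C and average I_j follow directly from the variable definitions above; a sketch (the function name is ours):

```python
def segment_importance(fused, cut_points):
    """Step 7: for each segment [i, next_i) given by the sorted cut points
    (starting at 0), compute the sum I_C of per-frame fused importances and
    the average I_j = I_C / (next_i - i)."""
    bounds = list(cut_points) + [len(fused)]
    sums, avgs = [], []
    for i, next_i in zip(bounds, bounds[1:]):
        ic = sum(fused[i:next_i])       # I_C: sum over the segment's frames
        sums.append(ic)
        avgs.append(ic / (next_i - i))  # I_j: average fused importance
    return sums, avgs
```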
In step 8, a subset of the video segments obtained in step 2 is selected according to the fused importance of each segment calculated in step 7 and a set threshold. The threshold is the proportion of summary segments to all video segments; it should be set neither too high nor too low, since selecting too many or too few segments necessarily degrades the quality of the video abstract. A proportion of 15% or 20%, for example, is quite suitable.
The calculation formula for selecting the subset is:
where {1,0} is a decision function that determines whether a video segment is selected as part of the video abstract: the function's value is 1 if the segment is selected, and 0 otherwise. Based on this formula, a suitable subset of video segments can be selected.
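One way to realize the {1,0} decision function with a proportional threshold, treating the set proportion as a top-fraction cutoff on average importance (our reading of the text):

```python
def select_segments(avg_importance, ratio=0.15):
    """Step 8: per-segment {1,0} decision. The top `ratio` fraction of
    segments by average fused importance receive 1 (kept), the rest 0."""
    n = len(avg_importance)
    n_keep = max(1, round(ratio * n))  # always keep at least one segment
    order = sorted(range(n), key=lambda j: -avg_importance[j])
    keep = set(order[:n_keep])
    return [1 if j in keep else 0 for j in range(n)]
```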
In step 9, the video abstract is synthesized from the video-segment subset selected in step 8: the segments of the subset are combined in the order they appear in the original video. Whether the video contains audio information is also considered; if it does, the audio information is included when the video abstract is synthesized. Fig. 4 shows a video abstract presentation system. The method presents the summarization result to the user in a concise mode and greatly improves the user's experience of browsing video data.
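The synthesis in step 9, combining selected segments in original-video order (audio handling omitted), can be sketched as:

```python
def synthesize(frames, selected_segments):
    """Step 9: concatenate the chosen segments in original-video order.
    Segments are (start, end) frame-index ranges into `frames`."""
    summary = []
    for start, end in sorted(selected_segments):  # restore original order
        summary.extend(frames[start:end])
    return summary
```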
Claims (7)
1. A multi-feature fused video abstract generation method is characterized by comprising the following steps:
step 1, acquiring a video and taking the video as input data;
step 2, segmenting input video data, and recording segmentation points and the number of video segments;
step 3, extracting a video frame and a video frame center block in each video clip;
step 4, calculating the features and the image quality of the extracted video frame and of the video frame center block respectively;
step 5, calculating the global importance and the local importance according to the obtained features;
step 6, fusing the obtained global importance and the local importance of each frame to obtain fusion importance;
step 7, calculating the importance of each video segment according to the dividing points;
step 8, selecting the video clips according to the importance of each obtained video clip and a set threshold value, and selecting an optimized video clip subset;
step 9, synthesizing the video abstract according to the selected video segment subset;
global importance I_Gk in step 5 is calculated by the formula:
where k is the index value of the video frame, and f_G_1 ~ f_G_9 are respectively values based on the 9 features of the video frame;
local importance I_Lk in step 5 is calculated by the formula:
where k is the index value of the video frame, and f_L_1 ~ f_L_9 are respectively values based on the 9 features of the video frame center block;
in step 7, the importance of each video segment comprises the sum I_C of the fused importance of the segment's frames and the average fused importance I_j of the segment,
where k is the index value of the video frame, I_Gk&Lk is the fused importance of each frame, i denotes the i-th segmentation point, and next_i is the next segmentation point;
the step 8 selects an optimized video segment subset by equation (13):
where N refers to the total number of video segments, and {1,0} is a decision function determining whether a video segment is selected as part of the video abstract: its value is 1 if the segment is selected, and 0 otherwise.
2. The method of claim 1, wherein in step 2 a superframe segmentation method segments the input video into a plurality of small video segments by calculating the foreground, background and motion information of the video, so as to obtain the segmentation points and the number of video segments.
3. The method according to claim 1, wherein the step 3 for extracting the central block of the video frame comprises: the video frame is divided into 5x5 blocks on average, and then the center block of 3x3 of the center portion is extracted.
4. The method of claim 1, wherein the features calculated in step 4 comprise visual saliency f_1, exposure f_2, chroma f_3, saturation f_4, the three Rule of Thirds feature values f_5, f_6, f_7, contrast f_8, and directionality f_9; the image quality calculated in step 4 comprises the image quality q_Gk of the video frame and the image quality q_Lk of the video frame center block; wherein
wherein A_S is the static saliency, A_T is the temporal saliency, and γ is a non-negative empirical parameter;
wherein X, Y are the length and width of the HSV image converted from the extracted video frame, x_v, y_v index the pixel position in channel V, and I_V(x_v, y_v) is the V channel of the HSV image;
wherein x_s, y_s index the pixel position in channel S, and I_S(x_s, y_s) is the S channel of the HSV image;
wherein x_h, y_h index the pixel position in channel H, and I_H(x_h, y_h) is the H channel of the HSV image;
calculating contrast and direction degree by adopting Tamura texture characteristics;
obtaining image quality q of video frame by image quality evaluation method without reference imageGkAnd the image quality q of the central block of the video frameLk。
5. The method according to claim 1, wherein the fusion importance is obtained by the formula (10) in step 6:
I_Gk&Lk = I_Gk + I_Lk (10)
where k is the index value of the video frame, I_Gk&Lk is the fused importance, and I_Gk and I_Lk are respectively the global importance and the local importance of the video frame.
6. The method of claim 1, wherein in step 9 the selected video segments are combined in the order in which each segment of the subset appears in the original video.
7. The method of claim 1, wherein, if the video contains audio information, the audio information is included when the video abstract is synthesized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710486660.9A CN107222795B (en) | 2017-06-23 | 2017-06-23 | Multi-feature fusion video abstract generation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710486660.9A CN107222795B (en) | 2017-06-23 | 2017-06-23 | Multi-feature fusion video abstract generation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107222795A CN107222795A (en) | 2017-09-29 |
CN107222795B true CN107222795B (en) | 2020-07-31 |
Family
ID=59950929
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710486660.9A Active CN107222795B (en) | 2017-06-23 | 2017-06-23 | Multi-feature fusion video abstract generation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107222795B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108804578B (en) * | 2018-05-24 | 2022-06-07 | 南京理工大学 | Unsupervised video abstraction method based on consistency segment generation |
CN110868630A (en) * | 2018-08-27 | 2020-03-06 | 北京优酷科技有限公司 | Method and device for generating forecast report |
CN109413510B (en) * | 2018-10-19 | 2021-05-18 | 深圳市商汤科技有限公司 | Video abstract generation method and device, electronic equipment and computer storage medium |
CN111246246A (en) * | 2018-11-28 | 2020-06-05 | 华为技术有限公司 | Video playing method and device |
CN111401100B (en) * | 2018-12-28 | 2021-02-09 | 广州市百果园信息技术有限公司 | Video quality evaluation method, device, equipment and storage medium |
CN109819338B (en) | 2019-02-22 | 2021-09-14 | 影石创新科技股份有限公司 | Automatic video editing method and device and portable terminal |
CN111062284B (en) * | 2019-12-06 | 2023-09-29 | 浙江工业大学 | Visual understanding and diagnosis method for interactive video abstract model |
CN111641868A (en) * | 2020-05-27 | 2020-09-08 | 维沃移动通信有限公司 | Preview video generation method and device and electronic equipment |
CN112052841B (en) * | 2020-10-12 | 2021-06-29 | 腾讯科技(深圳)有限公司 | Video abstract generation method and related device |
CN112734733B (en) * | 2021-01-12 | 2022-11-01 | 天津大学 | Non-reference image quality monitoring method based on channel recombination and feature fusion |
CN113052149B (en) * | 2021-05-20 | 2021-08-13 | 平安科技(深圳)有限公司 | Video abstract generation method and device, computer equipment and medium |
CN114140461B (en) * | 2021-12-09 | 2023-02-14 | 成都智元汇信息技术股份有限公司 | Picture cutting method based on edge picture recognition box, electronic equipment and medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9076043B2 (en) * | 2012-08-03 | 2015-07-07 | Kodak Alaris Inc. | Video summarization using group sparsity analysis |
CN102930061B (en) * | 2012-11-28 | 2016-01-06 | 安徽水天信息科技有限公司 | A kind of video summarization method based on moving object detection |
US10095786B2 (en) * | 2015-04-09 | 2018-10-09 | Oath Inc. | Topical based media content summarization system and method |
CN105228033B (en) * | 2015-08-27 | 2018-11-09 | 联想(北京)有限公司 | A kind of method for processing video frequency and electronic equipment |
US20170148488A1 (en) * | 2015-11-20 | 2017-05-25 | Mediatek Inc. | Video data processing system and associated method for analyzing and summarizing recorded video data |
CN106713964A (en) * | 2016-12-05 | 2017-05-24 | 乐视控股(北京)有限公司 | Method of generating video abstract viewpoint graph and apparatus thereof |
-
2017
- 2017-06-23 CN CN201710486660.9A patent/CN107222795B/en active Active
Also Published As
Publication number | Publication date |
---|---|
CN107222795A (en) | 2017-09-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107222795B (en) | Multi-feature fusion video abstract generation method | |
US10735494B2 (en) | Media information presentation method, client, and server | |
US20210160556A1 (en) | Method for enhancing resolution of streaming file | |
US9892324B1 (en) | Actor/person centric auto thumbnail | |
CN109803180B (en) | Video preview generation method and device, computer equipment and storage medium | |
CN109218629B (en) | Video generation method, storage medium and device | |
CN104994426B (en) | Program video identification method and system | |
US20170285916A1 (en) | Camera effects for photo story generation | |
EP2568429A1 (en) | Method and system for pushing individual advertisement based on user interest learning | |
CN116916080A (en) | Video data processing method, device, computer equipment and readable storage medium | |
CN111930994A (en) | Video editing processing method and device, electronic equipment and storage medium | |
CN113870133B (en) | Multimedia display and matching method, device, equipment and medium | |
US20150278605A1 (en) | Apparatus and method for managing representative video images | |
US20150161094A1 (en) | Apparatus and method for automatically generating visual annotation based on visual language | |
CN114331820A (en) | Image processing method, image processing device, electronic equipment and storage medium | |
JP2016035607A (en) | Apparatus, method and program for generating digest | |
CN103984778A (en) | Video retrieval method and video retrieval system | |
CN113784171A (en) | Video data processing method, device, computer system and readable storage medium | |
CN111340101A (en) | Stability evaluation method and device, electronic equipment and computer readable storage medium | |
CN112383824A (en) | Video advertisement filtering method, device and storage medium | |
CN109618111B (en) | Cloud-shear multi-channel distribution system | |
Dev et al. | Localizing adverts in outdoor scenes | |
Husa et al. | HOST-ATS: automatic thumbnail selection with dashboard-controlled ML pipeline and dynamic user survey | |
JP2018206292A (en) | Video summary creation device and program | |
Hu et al. | Video summarization via exploring the global and local importance |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |