CN114463680A - Video key frame extraction method based on MCP sparse representation - Google Patents

Video key frame extraction method based on MCP sparse representation

Info

Publication number
CN114463680A
Authority
CN
China
Prior art keywords
sparse
video
mcp
matrix
key frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210122460.6A
Other languages
Chinese (zh)
Inventor
李玉洁
谭本英
丁数学
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202210122460.6A
Publication of CN114463680A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71Indexing; Data structures therefor; Storage structures
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a video key frame extraction method based on MCP sparse representation, which comprises the following steps: splitting a video into image frames and constructing a video signal matrix from the image frames; constructing a sparse representation model with an MCP sparse constraint; inputting the video signal matrix into the sparse representation model, optimizing the model by DC programming, calculating a sparse coefficient matrix, and obtaining key frame indexes from the sparse coefficient matrix; and extracting the key frames of the video according to the key frame indexes. The invention improves the computation speed of the key frame extraction algorithm, reduces the number of extracted key frames, and lowers the key frame compression rate.

Description

Video key frame extraction method based on MCP sparse representation
Technical Field
The invention relates to the technical field of computer vision key frame extraction, in particular to a video key frame extraction method based on MCP sparse representation.
Background
Video summarization has become a central research topic for video services in smart cities and smart spaces, where video information collected by Internet-of-Things devices or sensors must be processed efficiently. For example, a video summary allows a user to quickly grasp the gist of a video and to decide whether the entire video is worth watching. Key frame extraction is an important problem within video summarization. Because video data are huge and contain much noise, effectively extracting the key frame information of a video is essential. A video signal consists of consecutive frames with particularly large temporal redundancy, i.e., strong correlation between adjacent frames, which makes it well suited to key frame extraction. Key frame extraction selects a small number of the most informative frames to approximately represent the original video, which relieves the pressure of high-dimensional video signal processing and improves the efficiency of video understanding.
Key frame extraction based on sparse models has attracted great attention for its outstanding advantages: simplicity and a mature mathematical foundation. Sparse representations of signals have well-established mathematical formulations. By describing signals through a dictionary and a sparse coefficient matrix, a more concise representation of complex signals is obtained, and signal processing performance improves accordingly. Notably, the effectiveness of sparse-representation-based key frame extraction depends largely on the sparsity constraint, which is typically the L1 norm. The Sparse Modeling Representative Selection (SMRS) method, used for example for video classification and summarization, computes the sparse matrix corresponding to the key frames with an L1-norm constraint. However, existing sparse constraints, which are essentially based on the conventional L1 norm, cannot extract sufficiently valid key frames and do not consider the structural information of video frames in the sparse coefficient matrix.
Therefore, it is necessary to develop more efficient methods to obtain better video key frames.
Disclosure of Invention
The invention aims to provide a video key frame extraction method based on MCP sparse representation so as to solve the problems in the prior art, improve the computation speed of the key frame extraction algorithm, reduce the number of extracted key frames, and lower the compression rate.
In order to achieve the above purpose, the invention provides the following scheme: a video key frame extraction method based on MCP sparse representation, which comprises the following steps:
splitting a video to obtain image frames, and constructing a video signal matrix based on the image frames;
constructing a sparse representation model by using MCP sparse constraint;
inputting the video signal matrix into the sparse representation model, optimizing the sparse representation model by DC programming, calculating a sparse coefficient matrix, and obtaining a key frame index based on the sparse coefficient matrix;
and extracting key frames in the video based on the key frame indexes.
Optionally, constructing a video signal matrix based on the image frames comprises: extracting image information of the image frames and using the image information of each frame as a column to construct the video signal matrix.
Optionally, the sparse representation model is:
min_X (1/2)||S - SX||_F² + λ·J_MCP(X)

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, and λ is a weight coefficient.
Optionally, the sparse coefficient matrix is:
X = argmin_X (1/2)||S - SX||_F² + λ·J_MCP(X)

where X is the sparse coefficient matrix, S is the video signal matrix, J_MCP(X) is the MCP sparse constraint, and λ is a weight coefficient.
Optionally, optimizing the sparse representation model by DC programming and calculating the sparse coefficient matrix comprises:
decomposing the sparse representation objective into the difference of two convex functions by a DC decomposition method, as shown in the following formula:
X = argmin_X { [(1/2)||S - SX||_F² + λ||X||_1] - [λ||X||_1 - λ·J_MCP(X)] }

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient, ||·||_F denotes the Frobenius norm, and ||·||_1 denotes the elementwise L1 norm;
the sparse coefficient matrix is calculated using a DC algorithm.
Optionally, the method for calculating the sparse coefficient matrix by using a DC algorithm is as follows:
X^(k+1) = argmin_X (1/2)||S - SX||_F² + λ||X||_1 - ⟨∂(λ||X||_1 - λ·J_MCP(X))|_{X = X^(k)}, X⟩

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient, ⟨·,·⟩ denotes the inner product, ∂ denotes the partial derivative (subgradient) operator, and ||·|| denotes a norm.
Optionally, obtaining the key frame index based on the sparse coefficient matrix comprises: extracting the non-zero rows of the sparse coefficient matrix, whose indexes are the key frame indexes.
Optionally, the extracting method further comprises verifying the effect of extracting the key frame by using the key frame compression rate.
The invention discloses the following technical effects:
the video key frame extraction method based on MCP sparse representation provided by the invention can efficiently extract a few key frames in a video, so that the number of the extracted key frames is reduced, namely, fewer key frames are used for representing the original video, and the compression rate of key frame extraction is reduced; the non-convex MCP regularization key frame extraction problem is optimized by applying DC coding, so that a group of sub-problem optimization solving methods are provided, accurate key frames are obtained, and meanwhile, the calculation speed of a key frame extraction algorithm is improved; the method is simple to operate, video signals are input, and the video key frames can be obtained through the video key frame extraction method based on MCP sparse representation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a video key frame extraction model based on MCP sparse representation according to the present invention;
fig. 2 is a schematic diagram of a video key frame extraction method based on MCP sparse representation in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The invention provides a video key frame extraction method based on MCP sparse representation, as shown in FIGS. 1-2. The method specifically comprises the following steps:
the video is split into image frames, each frame being represented as each column of the matrix, constituting a video signal matrix S. The original video can be represented in a matrix form capable of sparse representation by constructing a video signal matrix.
The data set consists of video signals. Each video is composed of a plurality of video frames, adjacent video frames contain redundant information, and each video frame is an image carrying that information.
The pixel points of each video frame are extracted and arranged sequentially, left to right and top to bottom, into a feature vector of the corresponding video frame; the feature vectors of all video frames are then combined to form the video signal matrix S, namely
S = [s_1, s_2, …, s_i, …, s_N], where s_i (1 ≤ i ≤ N) is the feature vector of the i-th video frame.
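As an illustration only, the following minimal Python sketch builds such a matrix from a video file; reading with OpenCV, converting to grayscale, and downsampling each frame to 32×32 pixels are assumptions made for the example and are not prescribed by this embodiment.

```python
import cv2
import numpy as np

def video_to_signal_matrix(path, size=(32, 32)):
    """Stack one feature vector per frame as a column of the video signal matrix S."""
    cap = cv2.VideoCapture(path)
    cols = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # assumed grayscale pixel features
        gray = cv2.resize(gray, size)                   # assumed downsampling for tractability
        cols.append(gray.astype(np.float64).ravel())    # left-to-right, top-to-bottom ordering
    cap.release()
    return np.stack(cols, axis=1)                       # S = [s_1, ..., s_N], one column per frame
```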
A sparse representation model based on Minimax Concave Penalty (MCP) sparse regularization is built. This model yields a sparser coefficient matrix and therefore a lower compression rate in the subsequent key frame extraction.
Consider two matrices Y and D, where Y is the original signal matrix and D is the dictionary matrix. The sparse coefficient matrix X is calculated by substituting Y and D into equation (1):
f(X) = (1/2)||Y - DX||_F² + λ·J_MCP(X)    (1)

where f(X) is the sparse representation objective, Y is the original signal matrix, X is the sparse coefficient matrix, D is the dictionary matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient, and ||·||_F denotes the Frobenius norm.
The original signal matrix can be represented by a linear combination of a few column vectors of the dictionary; for example, if only the first and third rows of the sparse matrix contain non-zero elements, the original signal matrix can be expressed as the product of the first and third columns of the dictionary matrix with the first and third rows of the sparse matrix. In order to select only a few (sparse) key frames, the model introduces the MCP sparse regularization constraint, which imposes strong sparsity on the generated sparse coefficient matrix while still allowing the problem to be solved through a sequence of convex subproblems.
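For reference, the elementwise minimax concave penalty (Zhang's MCP) underlying J_MCP can be sketched in Python as follows; the parameter values lam and gamma are illustrative assumptions, since the embodiment does not fix a particular parameterization.

```python
import numpy as np

def mcp_penalty(t, lam=1.0, gamma=3.0):
    """Elementwise MCP: rho(t) = lam*|t| - t^2/(2*gamma) if |t| <= gamma*lam,
    and gamma*lam^2/2 otherwise (constant beyond the threshold)."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a ** 2 / (2.0 * gamma),
                    gamma * lam ** 2 / 2.0)

def j_mcp(X, lam=1.0, gamma=3.0):
    # J_MCP(X): sum of the elementwise MCP values over the coefficient matrix.
    return float(mcp_penalty(X, lam, gamma).sum())
```

Unlike the L1 norm, this penalty stops growing once a coefficient exceeds gamma*lam, so large coefficients are not over-penalized and the resulting coefficient matrix tends to be sparser.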
Replacing the original signal matrix Y and the dictionary matrix D in equation (1) with the video signal matrix S gives the sparse representation model shown in equation (2):
min_X (1/2)||S - SX||_F² + λ·J_MCP(X)    (2)

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, and λ is a weight coefficient, generally taking a value greater than 0 and smaller than 1. By substituting the video signal matrix, the constructed sparse representation model is applied to key frame extraction.
The sparse representation model is optimized by DC programming and the sparse coefficient matrix is calculated; as shown in FIG. 2, the non-zero rows of the sparse coefficient matrix give the indexes of the key frames. DC programming solves the non-convex MCP-regularized key frame extraction problem, yielding accurate key frames while improving the computation speed of the key frame extraction algorithm.
Because the product of the dictionary (here, the video signal matrix itself) and the sparse coefficient matrix approximates the video signal matrix, the key frame problem can be posed as a sparse representation optimization, i.e., solving for the sparse coefficient matrix that satisfies the MCP sparse constraint. In this embodiment, the sparse coefficient matrix X is calculated with the DC-programming-optimized sparse representation model, as shown in equation (3):
X = argmin_X (1/2)||S - SX||_F² + λ·J_MCP(X)    (3)

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient generally taking a value greater than 0 and smaller than 1, and ||·||_F denotes the Frobenius norm.
Equation (3) is subjected to DC decomposition and split into the difference of two convex functions, as shown in equation (4):
X = argmin_X { [(1/2)||S - SX||_F² + λ||X||_1] - [λ||X||_1 - λ·J_MCP(X)] }    (4)

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient generally taking a value greater than 0 and smaller than 1, ||·||_F denotes the Frobenius norm, and ||·||_1 denotes the elementwise L1 norm.
The value of X is then computed with the DC algorithm (DCA); each iteration solves the convex subproblem shown in equation (5):
X^(k+1) = argmin_X (1/2)||S - SX||_F² + λ||X||_1 - ⟨∂(λ||X||_1 - λ·J_MCP(X))|_{X = X^(k)}, X⟩    (5)

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient generally taking a value greater than 0 and smaller than 1, ⟨·,·⟩ denotes the inner product, ∂ denotes the partial derivative (subgradient) operator, and ||·|| denotes a norm.
The computed X is a sparse coefficient matrix most of whose elements are zero, and its non-zero rows give the key frame indexes of the video. For example, if the i-th and j-th rows of X are non-zero, the key frame index set of the video is {i, j}, and the corresponding key frames are the i-th and j-th video frame images. Associating key frame indexes with key frames in this way allows the key frames to be located quickly and accurately, improving both the accuracy and the efficiency of key frame extraction.
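The row-selection step can be sketched as follows; the numerical tolerance used to decide which rows count as non-zero is an assumption for the example.

```python
import numpy as np

def keyframe_indices(X, tol=1e-6):
    # A row of X is treated as non-zero when its L2 norm exceeds the tolerance;
    # the indices of such rows are returned as the key frame indexes.
    row_energy = np.linalg.norm(X, axis=1)
    return np.flatnonzero(row_energy > tol)
```

With the earlier sketches, keyframe_indices(dca_mcp_keyframes(S)) would return the frame numbers whose images are kept as key frames.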
To test the effect of the extracted key frames, a test set is used for verification, with the video key frame compression rate and the F-measure as evaluation indexes. The key frame compression rate is an objective criterion for evaluating video processing and is commonly used to measure key frame extraction; it is the ratio of the number of extracted key frames to the total number of frames in the video, and the smaller the value, the stronger the compression. Given a video, its key frame compression rate is defined in equation (6):
compression rate = N_select / N_whole    (6)

where N_select is the number of extracted key frames and N_whole is the total number of frames of the video.
Meanwhile, this embodiment adopts the F-measure to evaluate the accuracy of the extracted key frames. The F-measure is defined in equation (7):
F-measure = 2·P·R / (P + R)    (7)
where P and R are the precision and the recall, respectively. The higher the F-measure, the better the extracted key frames reflect the content of the original video.
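The two evaluation indexes are straightforward to compute; a short sketch:

```python
def compression_rate(n_selected, n_whole):
    # Equation (6): ratio of extracted key frames to all frames; smaller means stronger compression.
    return n_selected / n_whole

def f_measure(precision, recall):
    # Equation (7): harmonic mean of precision P and recall R.
    return 2.0 * precision * recall / (precision + recall)
```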
In order to fully exploit the sparsity of the sparse coefficient matrix, MCP is introduced as a strong sparse constraint. The MCP regularization acts on the sparse coefficient matrix, and because sparse representation captures the structure of the video signal, the generated sparse coefficient matrix exhibits a structured non-zero pattern. The invention therefore represents structured video data better. In addition, only the original video needs to be input, and the accurate video key frames are obtained through the above steps without any additional operation, so the method is convenient and simple.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can still modify the technical solutions described in the foregoing embodiments, or easily conceive of changes, or make equivalent substitutions for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions that do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention are all intended to be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (8)

1. A video key frame extraction method based on MCP sparse representation, characterized by comprising the following steps:
splitting a video to obtain image frames, and constructing a video signal matrix based on the image frames;
constructing a sparse representation model by using MCP sparse constraint;
inputting the video signal matrix into the sparse representation model, optimizing the sparse representation model by DC programming, calculating a sparse coefficient matrix, and obtaining a key frame index based on the sparse coefficient matrix;
and extracting key frames in the video based on the key frame indexes.
2. The MCP sparse representation-based video keyframe extraction method of claim 1, wherein constructing a video signal matrix based on the image frames comprises: and extracting image information of the image frames, and taking the image information of each frame as columns to construct the video signal matrix.
3. The video key-frame extraction method based on MCP sparse representation as claimed in claim 1, wherein the sparse representation model is:
min_X (1/2)||S - SX||_F² + λ·J_MCP(X)

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, and λ is a weight coefficient.
4. The video key-frame extraction method based on MCP sparse representation according to claim 1, wherein the sparse coefficient matrix is:
X = argmin_X (1/2)||S - SX||_F² + λ·J_MCP(X)

where X is the sparse coefficient matrix, S is the video signal matrix, J_MCP(X) is the MCP sparse constraint, and λ is a weight coefficient.
5. The method of claim 4, wherein optimizing the sparse representation model by DC programming and computing the sparse coefficient matrix comprises:
decomposing the sparse representation objective into the difference of two convex functions by a DC decomposition method, as shown in the following formula:
X = argmin_X { [(1/2)||S - SX||_F² + λ||X||_1] - [λ||X||_1 - λ·J_MCP(X)] }

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient, ||·||_F denotes the Frobenius norm, and ||·||_1 denotes the elementwise L1 norm;
the sparse coefficient matrix is calculated using a DC algorithm.
6. The video key-frame extraction method based on MCP sparse representation according to claim 5, wherein the method for calculating the sparse coefficient matrix by using DC algorithm is as follows:
X^(k+1) = argmin_X (1/2)||S - SX||_F² + λ||X||_1 - ⟨∂(λ||X||_1 - λ·J_MCP(X))|_{X = X^(k)}, X⟩

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient, ⟨·,·⟩ denotes the inner product, ∂ denotes the partial derivative (subgradient) operator, and ||·|| denotes a norm operation function.
7. The method of claim 1, wherein obtaining the key frame index based on the sparse coefficient matrix comprises: extracting the non-zero rows of the sparse coefficient matrix, whose indexes are the key frame indexes.
8. The method of claim 1, further comprising verifying the key frame extraction effect by the key frame compression rate.
CN202210122460.6A 2022-02-09 2022-02-09 Video key frame extraction method based on MCP sparse representation Pending CN114463680A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210122460.6A CN114463680A (en) 2022-02-09 2022-02-09 Video key frame extraction method based on MCP sparse representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210122460.6A CN114463680A (en) 2022-02-09 2022-02-09 Video key frame extraction method based on MCP sparse representation

Publications (1)

Publication Number Publication Date
CN114463680A true CN114463680A (en) 2022-05-10

Family

ID=81414176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210122460.6A Pending CN114463680A (en) 2022-02-09 2022-02-09 Video key frame extraction method based on MCP sparse representation

Country Status (1)

Country Link
CN (1) CN114463680A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120148149A1 (en) * 2010-12-10 2012-06-14 Mrityunjay Kumar Video key frame extraction using sparse representation
CN106034264A (en) * 2015-03-11 2016-10-19 中国科学院西安光学精密机械研究所 Coordination-model-based method for obtaining video abstract
CN107886054A (en) * 2017-10-27 2018-04-06 天津大学 A kind of video frequency abstract system of selection based on sparse core dictionary

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHENNI LI et al.: "Direct-Optimization-Based DC Dictionary Learning With the MCP Regularizer" *

Similar Documents

Publication Publication Date Title
CN112633419B (en) Small sample learning method and device, electronic equipment and storage medium
CN110377740B (en) Emotion polarity analysis method and device, electronic equipment and storage medium
CN110674850A (en) Image description generation method based on attention mechanism
CN110390052B (en) Search recommendation method, training method, device and equipment of CTR (China train redundancy report) estimation model
CN110110610B (en) Event detection method for short video
CN111523546A (en) Image semantic segmentation method, system and computer storage medium
CN106034264B (en) Method for acquiring video abstract based on collaborative model
CN112883227B (en) Video abstract generation method and device based on multi-scale time sequence characteristics
CN112734881A (en) Text synthesis image method and system based on significance scene graph analysis
Jia et al. Adaptive neighborhood propagation by joint L2, 1-norm regularized sparse coding for representation and classification
CN113177141A (en) Multi-label video hash retrieval method and device based on semantic embedded soft similarity
CN112464100B (en) Information recommendation model training method, information recommendation method, device and equipment
CN111639230B (en) Similar video screening method, device, equipment and storage medium
CN113761359A (en) Data packet recommendation method and device, electronic equipment and storage medium
CN115658934A (en) Image-text cross-modal retrieval method based on multi-class attention mechanism
CN104008204A (en) Dynamic multi-dimensional context awareness film recommending system and achieving method thereof
Zhao et al. Multi-view clustering with orthogonal mapping and binary graph
CN114819777A (en) Enterprise sales business analysis and management system based on digital twin technology
Chen et al. Efficient and differentiable low-rank matrix completion with back propagation
CN114913466A (en) Video key frame extraction method based on double-flow information and sparse representation
CN112330442A (en) Modeling method and device based on ultra-long behavior sequence, terminal and storage medium
CN115759036B (en) Method for constructing event detection model based on recommendation and method for carrying out event detection by using model
CN114463680A (en) Video key frame extraction method based on MCP sparse representation
CN116629258A (en) Structured analysis method and system for judicial document based on complex information item data
CN113688258A (en) Information recommendation method and system based on flexible multidimensional clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination