CN114463680A - Video key frame extraction method based on MCP sparse representation - Google Patents

Video key frame extraction method based on MCP sparse representation

Info

Publication number
CN114463680A
Authority
CN
China
Prior art keywords
sparse
video
mcp
matrix
key frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210122460.6A
Other languages
Chinese (zh)
Inventor
李玉洁
谭本英
丁数学
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN202210122460.6A
Publication of CN114463680A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71Indexing; Data structures therefor; Storage structures
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The invention discloses a video key frame extraction method based on MCP sparse representation, which comprises the following steps: splitting a video into image frames and constructing a video signal matrix from the image frames; constructing a sparse representation model with an MCP sparse constraint; inputting the video signal matrix into the sparse representation model, optimizing the model by DC programming, calculating a sparse coefficient matrix, and obtaining key frame indexes from the sparse coefficient matrix; and extracting the key frames of the video according to the key frame indexes. The invention improves the computation speed of the key frame extraction algorithm, reduces the number of extracted key frames, and lowers the key frame compression rate.

Description

Video key frame extraction method based on MCP sparse representation
Technical Field
The invention relates to the technical field of computer vision key frame extraction, in particular to a video key frame extraction method based on MCP sparse representation.
Background
Video summarization has become a central research topic for video services in smart cities and smart spaces, where video information collected by Internet-of-Things devices or sensors must be processed efficiently. For example, a video summary allows a user to quickly grasp the gist of a video and to decide whether the entire video is worth watching. Key frame extraction is an important problem within video summarization. Because video data are huge and contain much noise, effectively extracting the key frame information of a video is essential. A video signal consists of consecutive frames with particularly large temporal redundancy, i.e., strong correlation between adjacent frames, which makes it well suited to key frame extraction. Key frame extraction selects a small number of the most informative frames to approximately represent the original video, which relieves the pressure of high-dimensional video signal processing and improves the efficiency of video understanding.
Key frame extraction based on sparse models has attracted great attention for its outstanding advantages: simplicity and a mature mathematical foundation. Sparse representations of signals have well-established mathematical formulations. By describing signals through a dictionary and a sparse coefficient matrix, a more concise representation of complex signals is obtained, and signal processing performance improves accordingly. Notably, the effectiveness of sparse-representation-based key frame extraction depends largely on the sparsity constraint, which is typically the L1 norm. The Sparse Modeling Representative Selection (SMRS) method, used for example for video classification and summarization, computes the sparse matrix corresponding to the key frames with an L1-norm constraint. However, existing sparse constraints, which are essentially based on the conventional L1 norm, cannot extract sufficiently valid key frames and do not consider the structural information of video frames in the sparse coefficient matrix.
Therefore, it is necessary to develop more efficient methods to obtain better video key frames.
Disclosure of Invention
The invention aims to provide a video key frame extraction method based on MCP sparse representation so as to solve the problems in the prior art, improve the computation speed of the key frame extraction algorithm, reduce the number of extracted key frames, and lower the compression rate.
In order to achieve the above purpose, the invention provides the following scheme: a video key frame extraction method based on MCP sparse representation, which comprises the following steps:
splitting a video to obtain image frames, and constructing a video signal matrix based on the image frames;
constructing a sparse representation model by using MCP sparse constraint;
inputting the video signal matrix into the sparse representation model, optimizing the sparse representation model by DC programming, calculating a sparse coefficient matrix, and obtaining a key frame index based on the sparse coefficient matrix;
and extracting key frames in the video based on the key frame indexes.
Optionally, constructing a video signal matrix based on the image frames comprises: extracting image information of the image frames and using the image information of each frame as a column to construct the video signal matrix.
Optionally, the sparse representation model is:
min_X (1/2)||S - SX||_F² + λ·J_MCP(X)

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, and λ is a weight coefficient.
Optionally, the sparse coefficient matrix is:
X = argmin_X (1/2)||S - SX||_F² + λ·J_MCP(X)

where X is the sparse coefficient matrix, S is the video signal matrix, J_MCP(X) is the MCP sparse constraint, and λ is a weight coefficient.
Optionally, optimizing the sparse representation model by DC programming and calculating the sparse coefficient matrix comprises:
decomposing the sparse representation objective into the difference of two convex functions by a DC decomposition method, as shown in the following formula:
X = argmin_X { [(1/2)||S - SX||_F² + λ||X||_1] - [λ||X||_1 - λ·J_MCP(X)] }

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient, ||·||_F denotes the Frobenius norm, and ||·||_1 denotes the elementwise L1 norm;
the sparse coefficient matrix is calculated using a DC algorithm.
Optionally, the method for calculating the sparse coefficient matrix by using a DC algorithm is as follows:
X^(k+1) = argmin_X (1/2)||S - SX||_F² + λ||X||_1 - ⟨∂(λ||X||_1 - λ·J_MCP(X))|_{X = X^(k)}, X⟩

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient, ⟨·,·⟩ denotes the inner product, ∂ denotes the partial derivative (subgradient) operator, and ||·|| denotes a norm.
Optionally, obtaining the key frame index based on the sparse coefficient matrix comprises: extracting the non-zero rows of the sparse coefficient matrix, whose indexes are the key frame indexes.
Optionally, the extracting method further comprises verifying the effect of extracting the key frame by using the key frame compression rate.
The invention discloses the following technical effects:
the video key frame extraction method based on MCP sparse representation provided by the invention can efficiently extract a few key frames in a video, so that the number of the extracted key frames is reduced, namely, fewer key frames are used for representing the original video, and the compression rate of key frame extraction is reduced; the non-convex MCP regularization key frame extraction problem is optimized by applying DC coding, so that a group of sub-problem optimization solving methods are provided, accurate key frames are obtained, and meanwhile, the calculation speed of a key frame extraction algorithm is improved; the method is simple to operate, video signals are input, and the video key frames can be obtained through the video key frame extraction method based on MCP sparse representation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. Obviously, the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained from them by those skilled in the art without inventive effort.
FIG. 1 is a schematic diagram of a video key frame extraction model based on MCP sparse representation according to the present invention;
fig. 2 is a schematic diagram of a video key frame extraction method based on MCP sparse representation in the embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
The invention provides a video key frame extraction method based on MCP sparse representation, as shown in FIGS. 1-2. The method specifically comprises the following steps:
the video is split into image frames, each frame being represented as each column of the matrix, constituting a video signal matrix S. The original video can be represented in a matrix form capable of sparse representation by constructing a video signal matrix.
The data set consists of video signals. Each video is composed of a plurality of video frames, adjacent video frames contain redundant information, and each video frame is an image carrying that information.
The pixel points of each video frame are extracted and arranged sequentially, left to right and top to bottom, into a feature vector of the corresponding video frame; the feature vectors of all video frames are then combined to form the video signal matrix S, namely
S = [s_1, s_2, …, s_i, …, s_N], where s_i (1 ≤ i ≤ N) is the feature vector of the i-th video frame.
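As an illustration only, the following minimal Python sketch builds such a matrix from a video file; reading with OpenCV, converting to grayscale, and downsampling each frame to 32×32 pixels are assumptions made for the example and are not prescribed by this embodiment.

```python
import cv2
import numpy as np

def video_to_signal_matrix(path, size=(32, 32)):
    """Stack one feature vector per frame as a column of the video signal matrix S."""
    cap = cv2.VideoCapture(path)
    cols = []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # assumed grayscale pixel features
        gray = cv2.resize(gray, size)                   # assumed downsampling for tractability
        cols.append(gray.astype(np.float64).ravel())    # left-to-right, top-to-bottom ordering
    cap.release()
    return np.stack(cols, axis=1)                       # S = [s_1, ..., s_N], one column per frame
```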
A sparse representation model based on Minimax Concave Penalty (MCP) sparse regularization is built. This model yields a sparser coefficient matrix and therefore a lower compression rate in the subsequent key frame extraction.
Consider two matrices Y and D, where Y is the original signal matrix and D is the dictionary matrix. The sparse coefficient matrix X is calculated by substituting Y and D into equation (1):
f(X) = (1/2)||Y - DX||_F² + λ·J_MCP(X)    (1)

where f(X) is the sparse representation objective, Y is the original signal matrix, X is the sparse coefficient matrix, D is the dictionary matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient, and ||·||_F denotes the Frobenius norm.
The original signal matrix can be represented by a linear combination of a few column vectors of the dictionary; for example, if only the first and third rows of the sparse matrix contain non-zero elements, the original signal matrix can be expressed as the product of the first and third columns of the dictionary matrix with the first and third rows of the sparse matrix. In order to select only a few (sparse) key frames, the model introduces the MCP sparse regularization constraint, which imposes strong sparsity on the generated sparse coefficient matrix while still allowing the problem to be solved through a sequence of convex subproblems.
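For reference, the elementwise minimax concave penalty (Zhang's MCP) underlying J_MCP can be sketched in Python as follows; the parameter values lam and gamma are illustrative assumptions, since the embodiment does not fix a particular parameterization.

```python
import numpy as np

def mcp_penalty(t, lam=1.0, gamma=3.0):
    """Elementwise MCP: rho(t) = lam*|t| - t^2/(2*gamma) if |t| <= gamma*lam,
    and gamma*lam^2/2 otherwise (constant beyond the threshold)."""
    a = np.abs(t)
    return np.where(a <= gamma * lam,
                    lam * a - a ** 2 / (2.0 * gamma),
                    gamma * lam ** 2 / 2.0)

def j_mcp(X, lam=1.0, gamma=3.0):
    # J_MCP(X): sum of the elementwise MCP values over the coefficient matrix.
    return float(mcp_penalty(X, lam, gamma).sum())
```

Unlike the L1 norm, this penalty stops growing once a coefficient exceeds gamma*lam, so large coefficients are not over-penalized and the resulting coefficient matrix tends to be sparser.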
Replacing the original signal matrix Y and the dictionary matrix D in equation (1) with the video signal matrix S gives the sparse representation model shown in equation (2):
min_X (1/2)||S - SX||_F² + λ·J_MCP(X)    (2)

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, and λ is a weight coefficient, generally taking a value greater than 0 and smaller than 1. By substituting the video signal matrix, the constructed sparse representation model is applied to key frame extraction.
The sparse representation model is optimized by DC programming and the sparse coefficient matrix is calculated; as shown in FIG. 2, the non-zero rows of the sparse coefficient matrix give the indexes of the key frames. DC programming solves the non-convex MCP-regularized key frame extraction problem, yielding accurate key frames while improving the computation speed of the key frame extraction algorithm.
Because the product of the dictionary (here, the video signal matrix itself) and the sparse coefficient matrix approximates the video signal matrix, the key frame problem can be posed as a sparse representation optimization, i.e., solving for the sparse coefficient matrix that satisfies the MCP sparse constraint. In this embodiment, the sparse coefficient matrix X is calculated with the DC-programming-optimized sparse representation model, as shown in equation (3):
X = argmin_X (1/2)||S - SX||_F² + λ·J_MCP(X)    (3)

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient generally taking a value greater than 0 and smaller than 1, and ||·||_F denotes the Frobenius norm.
Equation (3) is subjected to DC decomposition and split into the difference of two convex functions, as shown in equation (4):
X = argmin_X { [(1/2)||S - SX||_F² + λ||X||_1] - [λ||X||_1 - λ·J_MCP(X)] }    (4)

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient generally taking a value greater than 0 and smaller than 1, ||·||_F denotes the Frobenius norm, and ||·||_1 denotes the elementwise L1 norm.
The value of X is then computed with the DC algorithm (DCA); each iteration solves the convex subproblem shown in equation (5):
X^(k+1) = argmin_X (1/2)||S - SX||_F² + λ||X||_1 - ⟨∂(λ||X||_1 - λ·J_MCP(X))|_{X = X^(k)}, X⟩    (5)

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient generally taking a value greater than 0 and smaller than 1, ⟨·,·⟩ denotes the inner product, ∂ denotes the partial derivative (subgradient) operator, and ||·|| denotes a norm.
The computed X is a sparse coefficient matrix most of whose elements are zero, and its non-zero rows give the key frame indexes of the video. For example, if the i-th and j-th rows of X are non-zero, the key frame index set of the video is {i, j}, and the corresponding key frames are the i-th and j-th video frame images. Associating key frame indexes with key frames in this way allows the key frames to be located quickly and accurately, improving both the accuracy and the efficiency of key frame extraction.
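The row-selection step can be sketched as follows; the numerical tolerance used to decide which rows count as non-zero is an assumption for the example.

```python
import numpy as np

def keyframe_indices(X, tol=1e-6):
    # A row of X is treated as non-zero when its L2 norm exceeds the tolerance;
    # the indices of such rows are returned as the key frame indexes.
    row_energy = np.linalg.norm(X, axis=1)
    return np.flatnonzero(row_energy > tol)
```

With the earlier sketches, keyframe_indices(dca_mcp_keyframes(S)) would return the frame numbers whose images are kept as key frames.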
To test the effect of the extracted key frames, a test set is used for verification, with the video key frame compression rate and the F-measure as evaluation indexes. The key frame compression rate is an objective criterion for evaluating video processing and is commonly used to measure key frame extraction; it is the ratio of the number of extracted key frames to the total number of frames in the video, and the smaller the value, the stronger the compression. Given a video, its key frame compression rate is defined in equation (6):
compression rate = N_select / N_whole    (6)

where N_select is the number of extracted key frames and N_whole is the total number of frames of the video.
Meanwhile, this embodiment adopts the F-measure to evaluate the accuracy of the extracted key frames. The F-measure is defined in equation (7):
F-measure = 2·P·R / (P + R)    (7)
where P and R are the precision and the recall, respectively. The higher the F-measure, the better the extracted key frames reflect the content of the original video.
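The two evaluation indexes are straightforward to compute; a short sketch:

```python
def compression_rate(n_selected, n_whole):
    # Equation (6): ratio of extracted key frames to all frames; smaller means stronger compression.
    return n_selected / n_whole

def f_measure(precision, recall):
    # Equation (7): harmonic mean of precision P and recall R.
    return 2.0 * precision * recall / (precision + recall)
```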
In order to fully exploit the sparsity of the sparse coefficient matrix, MCP is introduced as a strong sparse constraint. The MCP regularization acts on the sparse coefficient matrix, and because sparse representation captures the structure of the video signal, the generated sparse coefficient matrix exhibits a structured non-zero pattern. The invention therefore represents structured video data better. In addition, only the original video needs to be input, and the accurate video key frames are obtained through the above steps without any additional operation, so the method is convenient and simple.
Finally, it should be noted that the above-mentioned embodiments are only specific embodiments of the present invention, used to illustrate the technical solutions of the present invention rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that any person skilled in the art can still modify the technical solutions described in the foregoing embodiments, or easily conceive of changes, or make equivalent substitutions for some technical features within the technical scope of the present disclosure; such modifications, changes or substitutions that do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention are all intended to be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (8)

1. A video key frame extraction method based on MCP sparse representation, characterized by comprising the following steps:
splitting a video to obtain image frames, and constructing a video signal matrix based on the image frames;
constructing a sparse representation model by using MCP sparse constraint;
inputting the video signal matrix into the sparse representation model, optimizing the sparse representation model by DC programming, calculating a sparse coefficient matrix, and obtaining a key frame index based on the sparse coefficient matrix;
and extracting key frames in the video based on the key frame indexes.
2. The MCP sparse representation-based video keyframe extraction method of claim 1, wherein constructing a video signal matrix based on the image frames comprises: and extracting image information of the image frames, and taking the image information of each frame as columns to construct the video signal matrix.
3. The video key-frame extraction method based on MCP sparse representation as claimed in claim 1, wherein the sparse representation model is:
min_X (1/2)||S - SX||_F² + λ·J_MCP(X)

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, and λ is a weight coefficient.
4. The video key-frame extraction method based on MCP sparse representation according to claim 1, wherein the sparse coefficient matrix is:
X = argmin_X (1/2)||S - SX||_F² + λ·J_MCP(X)

where X is the sparse coefficient matrix, S is the video signal matrix, J_MCP(X) is the MCP sparse constraint, and λ is a weight coefficient.
5. The method of claim 4, wherein optimizing the sparse representation model by DC programming and computing the sparse coefficient matrix comprises:
decomposing the sparse representation objective into the difference of two convex functions by a DC decomposition method, as shown in the following formula:
X = argmin_X { [(1/2)||S - SX||_F² + λ||X||_1] - [λ||X||_1 - λ·J_MCP(X)] }

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient, ||·||_F denotes the Frobenius norm, and ||·||_1 denotes the elementwise L1 norm;
the sparse coefficient matrix is calculated using a DC algorithm.
6. The video key-frame extraction method based on MCP sparse representation according to claim 5, wherein the method for calculating the sparse coefficient matrix by using DC algorithm is as follows:
X^(k+1) = argmin_X (1/2)||S - SX||_F² + λ||X||_1 - ⟨∂(λ||X||_1 - λ·J_MCP(X))|_{X = X^(k)}, X⟩

where S is the video signal matrix, X is the sparse coefficient matrix, J_MCP(X) is the MCP sparse constraint, λ is a weight coefficient, ⟨·,·⟩ denotes the inner product, ∂ denotes the partial derivative (subgradient) operator, and ||·|| denotes a norm operation function.
7. The method of claim 1, wherein obtaining the key frame index based on the sparse coefficient matrix comprises: extracting the non-zero rows of the sparse coefficient matrix, whose indexes are the key frame indexes.
8. The method of claim 1, further comprising verifying the key frame extraction effect by the key frame compression rate.
CN202210122460.6A 2022-02-09 2022-02-09 Video key frame extraction method based on MCP sparse representation Pending CN114463680A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210122460.6A CN114463680A (en) 2022-02-09 2022-02-09 Video key frame extraction method based on MCP sparse representation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210122460.6A CN114463680A (en) 2022-02-09 2022-02-09 Video key frame extraction method based on MCP sparse representation

Publications (1)

Publication Number Publication Date
CN114463680A true CN114463680A (en) 2022-05-10

Family

ID=81414176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210122460.6A Pending CN114463680A (en) 2022-02-09 2022-02-09 Video key frame extraction method based on MCP sparse representation

Country Status (1)

Country Link
CN (1) CN114463680A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120148149A1 (en) * 2010-12-10 2012-06-14 Mrityunjay Kumar Video key frame extraction using sparse representation
CN106034264A (en) * 2015-03-11 2016-10-19 中国科学院西安光学精密机械研究所 Coordination-model-based method for obtaining video abstract
CN107886054A (en) * 2017-10-27 2018-04-06 天津大学 A kind of video frequency abstract system of selection based on sparse core dictionary

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHENNI LI et al.: "Direct-Optimization-Based DC Dictionary Learning With the MCP Regularizer" *

Similar Documents

Publication Publication Date Title
CN112633419B (en) Small sample learning method and device, electronic equipment and storage medium
CN110377740B (en) Emotion polarity analysis method and device, electronic equipment and storage medium
CN110674850A (en) Image description generation method based on attention mechanism
CN110390052B (en) Search recommendation method, training method, device and equipment of CTR (China train redundancy report) estimation model
CN110110610B (en) Event detection method for short video
CN111523546A (en) Image semantic segmentation method, system and computer storage medium
CN106034264B (en) Method for acquiring video abstract based on collaborative model
CN112883227B (en) Video abstract generation method and device based on multi-scale time sequence characteristics
CN112734881A (en) Text synthesis image method and system based on significance scene graph analysis
Jia et al. Adaptive neighborhood propagation by joint L2, 1-norm regularized sparse coding for representation and classification
CN113177141A (en) Multi-label video hash retrieval method and device based on semantic embedded soft similarity
CN112464100B (en) Information recommendation model training method, information recommendation method, device and equipment
CN111639230B (en) Similar video screening method, device, equipment and storage medium
CN113761359A (en) Data packet recommendation method and device, electronic equipment and storage medium
CN115658934A (en) Image-text cross-modal retrieval method based on multi-class attention mechanism
CN104008204A (en) Dynamic multi-dimensional context awareness film recommending system and achieving method thereof
Zhao et al. Multi-view clustering with orthogonal mapping and binary graph
CN114819777A (en) Enterprise sales business analysis and management system based on digital twin technology
Chen et al. Efficient and differentiable low-rank matrix completion with back propagation
CN114913466A (en) Video key frame extraction method based on double-flow information and sparse representation
CN112330442A (en) Modeling method and device based on ultra-long behavior sequence, terminal and storage medium
CN115759036B (en) Method for constructing event detection model based on recommendation and method for carrying out event detection by using model
CN114463680A (en) Video key frame extraction method based on MCP sparse representation
CN116629258A (en) Structured analysis method and system for judicial document based on complex information item data
CN113688258A (en) Information recommendation method and system based on flexible multidimensional clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination