CN114913466A - Video key frame extraction method based on double-flow information and sparse representation - Google Patents

Video key frame extraction method based on double-flow information and sparse representation

Info

Publication number
CN114913466A
Authority
CN
China
Prior art keywords
matrix
video
sparse
double
key frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210616931.9A
Other languages
Chinese (zh)
Inventor
李玉洁
郭富林
王旭
甘亚奇
丁数学
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology filed Critical Guilin University of Electronic Technology
Priority to CN202210616931.9A
Publication of CN114913466A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/513 Sparse representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a video key frame extraction method based on dual-stream information and sparse representation, which comprises the following steps: splitting a video file to be processed into image frames, and constructing a video spatial stream matrix and a video temporal stream matrix from the image frames; obtaining a dual-stream information matrix from the video spatial stream matrix and the video temporal stream matrix, and performing feature extraction on the dual-stream information matrix to obtain a dual-stream feature matrix; inputting the dual-stream feature matrix into a sparse representation model, calculating a sparse coefficient matrix, and obtaining key frame indexes based on the sparse coefficient matrix; and extracting the key frames from the video file via the key frame indexes. The method can efficiently extract a small number of key frames from a video, reducing both the number of extracted key frames and the key frame compression ratio while improving the computation speed of the key frame extraction algorithm.

Description

Video key frame extraction method based on double-flow information and sparse representation
Technical Field
The invention relates to the technical field of key frame extraction in computer vision, and in particular to a video key frame extraction method based on dual-stream information and sparse representation.
Background
With the rapid development of information technology and the wide application of the internet and multimedia, large volumes of video are generated in people's daily life and work. Because video data are complex and varied in structure, video summarization has become a major research topic in video understanding: it effectively improves video retrieval efficiency and reduces the difficulty of storing video data. Key frame extraction is a central problem in video summarization. Since video data are huge and contain much redundant information, effectively extracting video key frame information is very important. A video signal is a continuous image sequence with rich content, strong expressiveness, and a large amount of information, but adjacent frames are strongly correlated and highly redundant, which makes the key frame extraction method critical. Key frame extraction selects a small number of the most informative frames to approximately represent the original video, relieving the pressure of high-dimensional video signal processing and improving video understanding efficiency.
Existing image feature extraction methods usually either pass the raw image information through a neural network and fold the result at the end, or fold the frames as a whole and feed the result to a convolutional neural network, in order to learn spatio-temporal structure. Such traditional methods learn local information well, but learn the motion information across consecutive video frames poorly. Dual-stream information represents both the spatial information and the motion information of the original video frames well; applying it to key frame extraction effectively improves how well the model learns from video frames, strengthens the model's understanding of the video, and raises key frame extraction accuracy.
Key frame extraction based on sparse models has attracted great attention for its outstanding advantages: simplicity and mature mathematical analysis. Sparse representations of signals have well-established mathematical formulations: describing a signal through a dictionary and a sparse coefficient matrix yields a more compact representation of a complex signal and thus better signal processing performance. Notably, the effectiveness of sparse-representation-based key frame extraction depends largely on the sparsity constraint, which is typically based on the L1 norm. Sparse Modeling Representative Selection (SMRS) is a method that uses the L1 norm to compute the sparse matrix corresponding to the key frames. Although SMRS can obtain key frames, existing sparse-representation methods cannot extract fully effective key frames: the extracted information comes only from the spatial information of the video, and the motion information of objects in the video is not adequately considered.
Therefore, it is necessary to develop more efficient methods to obtain better video key frames.
Disclosure of Invention
The invention aims to provide a video key frame extraction method based on dual-stream information and sparse representation that solves the problems in the prior art, improves the computational accuracy of the key frame extraction algorithm, reduces the number of extracted key frames, and lowers the compression ratio.
In order to achieve the purpose, the invention provides the following scheme:
a video key frame extraction method based on dual-stream information and sparse representation, comprising the following steps:
splitting a video file to be processed into image frames, and constructing a video spatial stream matrix and a video temporal stream matrix from the image frames;
obtaining a dual-stream information matrix from the video spatial stream matrix and the video temporal stream matrix, and performing feature extraction on the dual-stream information matrix to obtain a dual-stream feature matrix;
inputting the dual-stream feature matrix into a sparse representation model, calculating a sparse coefficient matrix, and obtaining key frame indexes based on the sparse coefficient matrix;
and extracting the key frames from the video file via the key frame indexes.
Preferably, constructing the video spatial stream matrix comprises:
extracting the pixel points of each image frame and arranging them in order to obtain the spatial stream feature vector of that frame, then combining the spatial stream feature vectors of all image frames to form the video spatial stream matrix.
Preferably, constructing the video temporal stream matrix comprises:
extracting the pixel points of each image frame, obtaining the optical flow between image frames with the Farneback method from the movement of the pixel points, arranging and combining the optical flow of each frame in order to obtain its temporal stream feature vector, then combining the temporal stream feature vectors of all image frames to form the video temporal stream matrix.
Preferably, obtaining the dual-stream information matrix comprises:
splicing the video spatial stream matrix and the video temporal stream matrix column-wise to obtain the dual-stream information matrix.
Preferably, obtaining the dual-stream feature matrix comprises: passing the dual-stream information matrix through a VGG16 network to obtain the dual-stream feature matrix.
Preferably, the sparse representation model is:

$$f(C)=\min_{C}\ \frac{1}{2}\|Y-YC\|_{F}^{2}\quad \text{s.t.}\quad \|C\|_{1,2}\le\tau$$

wherein f(C) is the sparse representation model function, Y is the dual-stream feature matrix, C is the sparse coefficient matrix, τ is the constraint parameter of the sparse matrix, ‖·‖ denotes a norm operation (here ‖C‖₁,₂, the sum of the ℓ2 norms of the rows of C), ‖·‖_F denotes the Frobenius norm, and s.t. introduces the constraint condition.
Preferably, the method for calculating the sparse coefficient matrix based on the sparse representation model is:

$$\hat{C}=\arg\min_{C}\ \frac{1}{2}\|Y-YC\|_{F}^{2}+\tau\|C\|_{1,2}$$

wherein C is the sparse coefficient matrix, Y is the dual-stream feature matrix, τ is the constraint parameter of the sparse matrix, and ‖·‖ denotes a norm operation.
Preferably, the sparse coefficient matrix solution takes the form:

$$C=\Gamma\begin{bmatrix} I_{k}-\Delta & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\Gamma^{\top}$$

wherein Γ denotes a permutation matrix, I_k is the k-dimensional identity matrix, and the elements of Δ lie in [0, 1).
Preferably, obtaining the key frame indexes based on the sparse coefficient matrix comprises: extracting the non-zero rows of the sparse coefficient matrix; the indexes of these non-zero rows are the key frame indexes.
Preferably, the method further comprises evaluating the accuracy of the extracted key frames, using the video key frame compression ratio and the F-measure as evaluation indexes, wherein the F-measure measures the accuracy of the extracted key frames;
the video key frame compression ratio is calculated as:

$$SummaryLength=\frac{N_{select}}{N_{whole}}\times 100\%$$

wherein SummaryLength is the video key frame compression ratio, N_select is the number of extracted key frames, and N_whole is the total number of frames in the video.
The invention has the following beneficial effects:
The video key frame extraction method based on dual-stream information and sparse representation can efficiently extract a small number of key frames from a video, reducing the number of extracted key frames and the key frame compression ratio while still representing the information of the original video well. By using the dual-stream information of the video, spatial information and motion information in the video frames are extracted together, and accurate key frames are obtained through sparse modeling representative learning, improving the computation speed of the key frame extraction algorithm. The method is also simple to operate: given an input video signal, the video key frames are obtained by following the steps of the method in order.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments are briefly described below. The drawings in the following description are obviously only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a video key frame extraction method based on dual-stream information and sparse representation according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a video key frame extraction model based on dual-stream information and sparse representation according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. All other embodiments obtained by those skilled in the art without creative effort based on the embodiments of the present invention fall within the protection scope of the present invention.
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
As shown in figs. 1-2, the present invention provides a video key frame extraction method based on dual-stream information and sparse representation, which specifically comprises:
the video is divided into image frames, each frame is expressed into each row of a matrix to form a video spatial stream matrix, optical streams of the video images are obtained through the video image frames by a Farneback method, and the optical streams of each frame are taken as the rows to form the video temporal stream matrix. And then splicing the spatial stream matrix and the time stream matrix according to columns to obtain a double-stream matrix of the video, and extracting the matrix through VGG16 to obtain a double-stream characteristic matrix Y of the video. The original video can be represented in a matrix form capable of sparse representation by constructing a feature matrix of the video.
In this embodiment, the data set consists of video signals. Each video is composed of a number of video frames; adjacent video frames carry redundant information, and each video frame is an image containing information.
The pixel points of each video frame are extracted, and the optical flow of the video is obtained with the Farneback method from the movement of the pixel points between frames. For each frame, the pixel points are arranged in order from left to right and top to bottom into a spatial stream feature vector, and the spatial stream feature vectors of all frames are combined to form the video spatial stream matrix. Likewise, the optical flow of each frame is arranged in order from left to right and top to bottom into a temporal stream feature vector, and the temporal stream vectors of all video frames are combined to form the video temporal stream matrix. The spatial stream matrix and temporal stream matrix are spliced column-wise to obtain the dual-stream matrix of the video, which is passed through VGG16 to obtain the video feature matrix Y, namely
$Y=[y_1, y_2, \ldots, y_i, \ldots, y_N]$, where $y_i$ ($1 \le i \le N$) is the feature vector of the $i$-th video frame.
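A minimal sketch of this dual-stream feature construction follows. The patent does not name an implementation, so the sketch assumes OpenCV for frame splitting and Farneback optical flow and a pretrained torchvision VGG16 as the feature extractor; the helper name `dual_stream_feature_matrix`, the 224×224 resizing, and the rendering of the two-channel flow as a three-channel image are illustrative choices, not part of the invention.

```python
import cv2
import numpy as np
import torch
from torchvision.models import vgg16

def dual_stream_feature_matrix(video_path: str) -> np.ndarray:
    """Sketch: one column per frame, [VGG16(frame); VGG16(flow image)]."""
    net = vgg16(weights="IMAGENET1K_V1").features.eval()

    def vgg_feat(img: np.ndarray) -> np.ndarray:
        x = cv2.resize(img.astype(np.float32), (224, 224))
        x = torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)  # HWC -> NCHW
        with torch.no_grad():
            return net(x).flatten().numpy()

    cap = cv2.VideoCapture(video_path)
    cols, prev_gray = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if prev_gray is None:
            flow = np.zeros((*gray.shape, 2), np.float32)  # no flow for frame 0
        else:
            # Dense Farneback optical flow between consecutive frames.
            flow = cv2.calcOpticalFlowFarneback(
                prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        prev_gray = gray
        # Render the 2-channel flow as 3 channels so VGG16 accepts it
        # (flow values are used unnormalized in this sketch).
        flow_img = np.dstack([flow[..., 0], flow[..., 1],
                              np.linalg.norm(flow, axis=2)])
        # Spatial stream features and temporal stream features, spliced.
        cols.append(np.concatenate([vgg_feat(frame / 255.0),
                                    vgg_feat(flow_img)]))
    cap.release()
    return np.stack(cols, axis=1)  # Y = [y_1, ..., y_N]
```

In this sketch the column-wise splicing of spatial and temporal information happens at the feature level; splicing the raw pixel and flow matrices first, as the description states, and then applying VGG16 is equally possible but would require reshaping each row back into image form.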
A sparse representation model is then constructed; the model yields a sparse coefficient matrix, from which the key frames of the video are efficiently extracted.
Initialize the matrices Y and D, where Y is the video feature matrix and D is the dictionary matrix. The sparse coefficient matrix C is calculated by substituting Y and D into the following formula (1):

$$f(C)=\min_{C}\ \frac{1}{2}\|Y-DC\|_{F}^{2}\quad \text{s.t.}\quad \|C\|_{1,2}\le\tau \qquad (1)$$

wherein f(C) is the sparse representation model function, Y is the video feature matrix, C is the sparse coefficient matrix, D is the dictionary matrix, τ is the constraint parameter of the sparse matrix, generally taken greater than 0, and ‖·‖ denotes a norm operation.
The original signal matrix can then be represented by a linear combination of a few of its column vectors: if, for example, only the second and fourth rows of the sparse matrix contain non-zero elements, the original signal matrix can be represented by the product of its second and fourth columns with the second and fourth rows of the sparse matrix.
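The following toy numpy check illustrates this row-sparsity argument; the matrices are arbitrary, and only the second and fourth rows of C are non-zero, as in the example above.

```python
import numpy as np

rng = np.random.default_rng(0)
Y = rng.standard_normal((5, 6))     # toy signal matrix with 6 "frames"
C = np.zeros((6, 6))
C[1, :] = rng.standard_normal(6)    # 2nd row non-zero
C[3, :] = rng.standard_normal(6)    # 4th row non-zero
# Because C is row-sparse, Y @ C only ever uses the 2nd and 4th columns of Y:
assert np.allclose(Y @ C, Y[:, [1, 3]] @ C[[1, 3], :])
```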
Replacing the dictionary matrix D in formula (1) with the video dual-stream signal matrix Y (so that the video represents itself) gives the sparse representation model shown in formula (2):

$$f(C)=\min_{C}\ \frac{1}{2}\|Y-YC\|_{F}^{2}\quad \text{s.t.}\quad \|C\|_{1,2}\le\tau \qquad (2)$$

wherein Y is the video matrix signal, C is the sparse coefficient matrix, and τ is the constraint parameter of the sparse matrix, generally taken greater than 0. By substituting in the video feature matrix, the constructed sparse representation model is applied to key frame extraction.
A sparse coefficient matrix is calculated as shown in fig. 2, where the non-zero rows of the sparse coefficient matrix represent the indices of the key frames.
Since the product of the dictionary (i.e., the video matrix) and the sparse coefficient matrix reconstructs the video signal matrix, the key frame problem can be cast as a sparse representation optimization problem: solving for the sparse coefficient matrix that satisfies the sparsity constraint. In this embodiment, the solving formula for the sparse coefficient matrix C is shown in formula (3):

$$\hat{C}=\arg\min_{C}\ \frac{1}{2}\|Y-YC\|_{F}^{2}+\tau\|C\|_{1,2} \qquad (3)$$

wherein Y is the video feature matrix, C is the sparse coefficient matrix, τ is the sparsity constraint coefficient, generally taken greater than 0, and ‖·‖ denotes a norm function.
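Formula (3) is a standard row-sparse (group-lasso-type) problem. Before turning to the closed form below, a minimal proximal-gradient sketch of a generic solver is given; this is an assumed illustration, not the patent's prescribed algorithm, it omits any affine constraint used by full SMRS, and `solve_sparse_coefficients` and its defaults are hypothetical.

```python
import numpy as np

def solve_sparse_coefficients(Y: np.ndarray, tau: float = 1.0,
                              n_iter: int = 500) -> np.ndarray:
    """Proximal gradient for min_C 0.5*||Y - Y C||_F^2 + tau*||C||_{1,2}."""
    n = Y.shape[1]
    G = Y.T @ Y
    step = 1.0 / np.linalg.norm(G, 2)      # 1 / Lipschitz constant of the gradient
    C = np.zeros((n, n))
    for _ in range(n_iter):
        C = C - step * (G @ C - G)         # gradient step on 0.5*||Y - YC||_F^2
        row_norms = np.linalg.norm(C, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - tau * step / np.maximum(row_norms, 1e-12))
        C = shrink * C                     # row-wise group soft-thresholding
    return C
```

A larger τ drives more rows of C to zero, i.e. fewer selected key frames.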
Solving formula (3) yields the result C in the form shown in formula (4):

$$C=\Gamma\begin{bmatrix} I_{k}-\Delta & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\Gamma^{\top} \qquad (4)$$

wherein Γ denotes a permutation matrix, I_k is the k-dimensional identity matrix, and the elements of Δ lie in [0, 1).
The calculated C is a sparse coefficient matrix in which most elements are zero; its non-zero rows give the key frame indexes of the video. If the i-th and j-th rows of C are non-zero, the key frame index set of the video is {i, j}, and the corresponding key frames are selected accordingly: the key frames of the video are the i-th and j-th video frame images. Associating the key frame index with the key frame binds the two quickly and accurately, improving the accuracy and efficiency of key frame extraction.
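Reading off the key frame indexes then reduces to finding the non-zero rows of C, for example as in the small numpy sketch below (the tolerance value is an assumption):

```python
import numpy as np

def key_frame_indices(C: np.ndarray, tol: float = 1e-6) -> np.ndarray:
    """Indices of rows of C whose l2 norm exceeds a small tolerance."""
    return np.flatnonzero(np.linalg.norm(C, axis=1) > tol)

# If rows i and j are the only non-zero rows, this returns array([i, j]),
# and the key frames are the i-th and j-th video frame images.
```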
To test the effect of the extracted key frames, they are verified on a test set, using the video key frame compression ratio and the F-measure as evaluation indexes. Given a video, its key frame compression ratio (summary length) is defined in formula (5):

$$SummaryLength=\frac{N_{select}}{N_{whole}}\times 100\% \qquad (5)$$

wherein N_select is the number of extracted key frames and N_whole is the total number of frames in the video. The key frame compression ratio is the proportion of extracted key frames among all frames of the video, expressed in percent; a smaller value indicates a stronger level of video compression.
Meanwhile, this embodiment adopts the F-measure to assess the accuracy of the extracted key frames. The F-measure is defined in formula (6):

$$F\text{-}measure=\frac{2PR}{P+R} \qquad (6)$$

wherein P and R are the precision and the recall, respectively. A higher F-measure means the extracted key frames are better and reflect the content of the original video more accurately.
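Both evaluation indexes translate directly into code; the sketch below assumes the precision and recall have already been computed against ground-truth key frames.

```python
def summary_length(n_select: int, n_whole: int) -> float:
    """Formula (5): compression ratio in percent; smaller = stronger compression."""
    return 100.0 * n_select / n_whole

def f_measure(precision: float, recall: float) -> float:
    """Formula (6): harmonic mean of precision P and recall R."""
    return 2.0 * precision * recall / (precision + recall)
```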
In order to fully extract the motion information hidden in video frames, the invention introduces optical flow features. These features are extracted from the original images of the video frames, and the sparsifying property of sparse representation applied to the video signal generates a few (sparse) video key frames that express the motion features. The invention therefore summarizes video data containing a large amount of motion information particularly well. In addition, given an input original video, the accurate video key frames are obtained directly through the steps above, with no additional operations required, making the method convenient and simple.
The above-described embodiments merely illustrate preferred implementations of the present invention and do not limit its scope. Without departing from the spirit of the invention, those skilled in the art can make various modifications and improvements to its technical solutions, and all such modifications and improvements fall within the protection scope of the present invention as defined by the claims.

Claims (10)

1. A video key frame extraction method based on dual-stream information and sparse representation, characterized by comprising the following steps:
splitting a video file to be processed into image frames, and constructing a video spatial stream matrix and a video temporal stream matrix from the image frames;
obtaining a dual-stream information matrix from the video spatial stream matrix and the video temporal stream matrix, and performing feature extraction on the dual-stream information matrix to obtain a dual-stream feature matrix;
inputting the dual-stream feature matrix into a sparse representation model, calculating a sparse coefficient matrix, and obtaining key frame indexes based on the sparse coefficient matrix;
and extracting the key frames from the video file via the key frame indexes.
2. The video key frame extraction method based on dual-stream information and sparse representation according to claim 1, wherein constructing the video spatial stream matrix comprises:
extracting the pixel points of each image frame and arranging them in order to obtain the spatial stream feature vector of that frame, then combining the spatial stream feature vectors of all image frames to form the video spatial stream matrix.
3. The video key frame extraction method based on dual-stream information and sparse representation according to claim 2, wherein constructing the video temporal stream matrix comprises:
extracting the pixel points of each image frame, obtaining the optical flow between image frames with the Farneback method from the movement of the pixel points, arranging and combining the optical flow of each frame in order to obtain its temporal stream feature vector, then combining the temporal stream feature vectors of all image frames to form the video temporal stream matrix.
4. The video key frame extraction method based on dual-stream information and sparse representation according to claim 1, wherein obtaining the dual-stream information matrix comprises:
splicing the video spatial stream matrix and the video temporal stream matrix column-wise to obtain the dual-stream information matrix.
5. The video key frame extraction method based on dual-stream information and sparse representation according to claim 1, wherein obtaining the dual-stream feature matrix comprises: passing the dual-stream information matrix through a VGG16 network to obtain the dual-stream feature matrix.
6. The video key frame extraction method based on dual-stream information and sparse representation according to claim 1, wherein the sparse representation model is:

$$f(C)=\min_{C}\ \frac{1}{2}\|Y-YC\|_{F}^{2}\quad \text{s.t.}\quad \|C\|_{1,2}\le\tau$$

wherein f(C) is the sparse representation model function, Y is the dual-stream feature matrix, C is the sparse coefficient matrix, τ is the constraint parameter of the sparse matrix, ‖·‖ denotes a norm operation, ‖·‖_F denotes the Frobenius norm, and s.t. introduces the constraint condition.
7. The video key frame extraction method based on dual-stream information and sparse representation according to claim 6, wherein the method for calculating the sparse coefficient matrix based on the sparse representation model is:

$$\hat{C}=\arg\min_{C}\ \frac{1}{2}\|Y-YC\|_{F}^{2}+\tau\|C\|_{1,2}$$

wherein C is the sparse coefficient matrix, Y is the dual-stream feature matrix, τ is the constraint parameter of the sparse matrix, and ‖·‖ denotes a norm operation.
8. The video key frame extraction method based on dual-stream information and sparse representation according to claim 7, wherein the sparse coefficient matrix solution takes the form:

$$C=\Gamma\begin{bmatrix} I_{k}-\Delta & \mathbf{0}\\ \mathbf{0} & \mathbf{0}\end{bmatrix}\Gamma^{\top}$$

wherein Γ denotes a permutation matrix, I_k is the k-dimensional identity matrix, and the elements of Δ lie in [0, 1).
9. The video key frame extraction method based on dual-stream information and sparse representation according to claim 1, wherein obtaining the key frame indexes based on the sparse coefficient matrix comprises: extracting the non-zero rows of the sparse coefficient matrix; the indexes of these non-zero rows are the key frame indexes.
10. The video key frame extraction method based on dual-stream information and sparse representation according to claim 1, further comprising evaluating the accuracy of the extracted key frames, using the video key frame compression ratio and the F-measure as evaluation indexes, wherein the F-measure measures the accuracy of the extracted key frames;
the video key frame compression ratio is calculated as:

$$SummaryLength=\frac{N_{select}}{N_{whole}}\times 100\%$$

wherein SummaryLength is the video key frame compression ratio, N_select is the number of extracted key frames, and N_whole is the total number of frames in the video.
CN202210616931.9A 2022-06-01 2022-06-01 Video key frame extraction method based on double-flow information and sparse representation Pending CN114913466A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210616931.9A CN114913466A (en) 2022-06-01 2022-06-01 Video key frame extraction method based on double-flow information and sparse representation


Publications (1)

Publication Number Publication Date
CN114913466A (en) 2022-08-16

Family

ID=82770118

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210616931.9A Pending CN114913466A (en) 2022-06-01 2022-06-01 Video key frame extraction method based on double-flow information and sparse representation

Country Status (1)

Country Link
CN (1) CN114913466A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116758494A (en) * 2023-08-23 2023-09-15 深圳市科灵通科技有限公司 Intelligent monitoring method and system for vehicle-mounted video of internet-connected vehicle
CN116758494B (en) * 2023-08-23 2023-12-22 深圳市科灵通科技有限公司 Intelligent monitoring method and system for vehicle-mounted video of internet-connected vehicle

Similar Documents

Publication Publication Date Title
CN109947912B (en) Model method based on intra-paragraph reasoning and joint question answer matching
CN109241424B (en) A kind of recommended method
Zhao et al. TTH-RNN: Tensor-train hierarchical recurrent neural network for video summarization
CN113177141B (en) Multi-label video hash retrieval method and device based on semantic embedded soft similarity
CN112417306B (en) Method for optimizing performance of recommendation algorithm based on knowledge graph
CN112328900A (en) Deep learning recommendation method integrating scoring matrix and comment text
CN111523546A (en) Image semantic segmentation method, system and computer storage medium
CN112487949B (en) Learner behavior recognition method based on multi-mode data fusion
CN113297370B (en) End-to-end multi-modal question-answering method and system based on multi-interaction attention
Jia et al. Adaptive neighborhood propagation by joint L2, 1-norm regularized sparse coding for representation and classification
CN110619121A (en) Entity relation extraction method based on improved depth residual error network and attention mechanism
CN112016406A (en) Video key frame extraction method based on full convolution network
CN113515951A (en) Story description generation method based on knowledge enhanced attention network and group-level semantics
CN114913466A (en) Video key frame extraction method based on double-flow information and sparse representation
Zeng et al. Dual Swin-transformer based mutual interactive network for RGB-D salient object detection
CN115203529A (en) Deep neural network recommendation model and method based on multi-head self-attention mechanism
Zhu et al. Deep learning for video-text retrieval: a review
CN111242068A (en) Behavior recognition method and device based on video, electronic equipment and storage medium
CN111079011A (en) Deep learning-based information recommendation method
CN113743188B (en) Feature fusion-based internet video low-custom behavior detection method
CN113378546B (en) Non-autoregressive sentence sequencing method
CN112015760B (en) Automatic question-answering method and device based on candidate answer set reordering and storage medium
CN112288142B (en) Short video memory prediction method and device
CN114463680A (en) Video key frame extraction method based on MCP sparse representation
CN113688258A (en) Information recommendation method and system based on flexible multidimensional clustering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination