CN115082828A - Video key frame extraction method and device based on dominating set - Google Patents

Video key frame extraction method and device based on dominating set

Info

Publication number: CN115082828A
Application number: CN202210740444.3A
Authority: CN (China)
Prior art keywords: video, graph, weight graph, undirected, frames
Priority date / Filing date: 2022-06-28
Publication date: 2022-09-20
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Other languages: Chinese (zh)
Inventors: 马震, 马蕾, 秦湘清
Current assignee / Original assignee: Industrial and Commercial Bank of China Ltd (ICBC)
Application filed by: Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202210740444.3A

Classifications

    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/47: Detecting features for summarising video content
    • G06V10/462: Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/761: Proximity, similarity or dissimilarity measures
    • G06V20/48: Matching video sequences

Abstract

The invention provides a method and a device for extracting video key frames based on a dominating set, relating to the technical field of video processing and applicable to the financial field. The method comprises the following steps: performing graph modeling on a video to obtain an undirected weight graph corresponding to the video, wherein the vertices of the undirected weight graph correspond one-to-one to the video frames of the video, and the edge weight of an edge between two vertices reflects the degree of similarity between the two corresponding video frames; obtaining a minimum dominating set of the undirected weight graph by integer programming; and extracting the video frames corresponding to the vertices in the minimum dominating set as video key frames. The method and the device focus on the representativeness of video frame content rather than on temporal order, and key frame extraction is independent of video shot division. The extracted key frames are representative and distinguishable, with high fidelity and a high compression rate, and their extraction is unaffected by temporal order, which makes the method very effective for constructing a static summary of a video.

Description

Video key frame extraction method and device based on dominating set
Technical Field
The invention relates to the technical field of video processing, can be used in the financial field, and particularly relates to a method and a device for extracting video key frames based on a dominating set.
Background
Key frame extraction is one of the important steps in video processing and is widely used in video content analysis. Conventional video key frame extraction methods fall roughly into two categories. The first is sampling-based key frame extraction, which obtains key frames by random or uniform sampling; although simple and fast, it may leave some important video clips without key frames while extracting repeated key frames from other clips. The second is key frame extraction based on shot segmentation, which divides a video into a number of shots and then selects the first or last frame of each shot as a key frame.
In recent years, key frame extraction methods based on video content and semantic features have also been proposed, such as selecting key frames with thresholding and clustering, extracting key frames by weighted fusion, selecting key frames through motion energy, key frame extraction based on temporal maxima, and key frame extraction combined with visual attention models of the human visual system. However, most existing methods are affected by temporal order during extraction; that is, identical or similar video frames at different time points may all be selected as key frames. For a static video summary, the user's goal is to quickly grasp the basic content of the video and the requirement on timing is not strict, so the key frames obtained by these methods are redundant for quickly browsing video content. In addition, most existing key frame extraction algorithms rely on shot segmentation or clustering combined with other tools or features; shot segmentation is computationally complex and susceptible to noise and lighting changes, and clustering methods are sensitive to threshold choices.
Disclosure of Invention
In view of the above, the present invention provides a method and an apparatus for extracting video key frames based on a dominating set, so as to solve at least one of the above-mentioned problems.
In order to achieve this purpose, the invention adopts the following scheme:
According to a first aspect of the present invention, there is provided a method for extracting video key frames based on a dominating set, the method comprising: performing graph modeling on a video to obtain an undirected weight graph corresponding to the video, wherein the vertices of the undirected weight graph correspond one-to-one to the video frames of the video, and the edge weight of an edge between two vertices reflects the degree of similarity between the two corresponding video frames; obtaining a minimum dominating set of the undirected weight graph by integer programming; and extracting the video frames corresponding to the vertices in the minimum dominating set as video key frames.
According to a second aspect of the present invention, there is provided a dominating-set-based video key frame extraction apparatus, the apparatus comprising: a graph modeling unit, configured to perform graph modeling on a video to obtain an undirected weight graph corresponding to the video, wherein the vertices of the undirected weight graph correspond one-to-one to the video frames of the video, and the edge weight of an edge between two vertices reflects the degree of similarity between the two corresponding video frames; an integer programming unit, configured to obtain a minimum dominating set of the undirected weight graph through integer programming; and a key frame extraction unit, configured to extract the video frames corresponding to the vertices in the minimum dominating set as video key frames.
According to a third aspect of the present invention, there is provided an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
According to a fourth aspect of the invention, a computer-readable storage medium is provided, on which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned method.
According to a fifth aspect of the invention, there is provided a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the above method.
According to the above technical scheme, the key frame extraction method focuses on the representativeness of video frame content rather than on temporal order; key frame extraction is independent of video shot division; the extracted key frames are representative and distinguishable, with high fidelity and compression rate; and extraction is unaffected by temporal order, which makes the method very effective for constructing a static summary of a video.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or of the prior art are briefly described below. It is obvious that the drawings in the following description show only some embodiments of the present invention, and that those skilled in the art can derive other drawings from them without creative effort. In the drawings:
Fig. 1 is a schematic flowchart of a method for extracting video key frames based on a dominating set according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a method for obtaining an undirected weight graph according to another embodiment of the present application;
Fig. 3 is a flowchart illustrating a method for extracting video key frames based on a dominating set according to another embodiment of the present application;
Fig. 4 is a flowchart illustrating a method for extracting video key frames based on a dominating set according to another embodiment of the present application;
Fig. 5 is a schematic structural diagram of an apparatus for extracting video key frames based on a dominating set according to another embodiment of the present application;
Fig. 6 is a schematic structural diagram of a graph modeling unit according to another embodiment of the present application;
Fig. 7 is a schematic block diagram of a system configuration of an electronic device according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention are further described in detail below with reference to the accompanying drawings. The exemplary embodiments and descriptions of the present invention are provided to explain the present invention, but not to limit the present invention.
The following first presents a brief introduction to the terminology involved in the present application:
the term "undirected weight graph": also called "weighted undirected graph", is a graph model in which each edge is associated with a weight value or cost.
The terms "dominating set", "minimum dominating set": for an undirected weight graph G = (V, E), where V is the vertex set and E is the edge set, a non-empty subset M of V is called a dominating set of G if every vertex x in V \ M is connected by an edge to at least one vertex in M. M is called a minimal dominating set of G if no proper subset of M is a dominating set, and a minimum dominating set of G if no other dominating set M' exists such that |M'| < |M|.
The term "integer programming": a mathematical program in which all or some of the variables are restricted to integer values.
The term "speeded-up robust features (SURF)": a robust image detection and description algorithm for computer vision tasks, consisting mainly of two parts: image extreme point detection and feature description.
The term "Hausdorff distance": is a distance defined between any two sets in metric space.
Fig. 1 is a schematic flowchart of a method for extracting video key frames based on a dominating set according to an embodiment of the present application, where the method includes the following steps:
Step S101: performing graph modeling on the video to obtain an undirected weight graph corresponding to the video, wherein the vertices of the undirected weight graph correspond one-to-one to the video frames of the video, and the edge weight of an edge between two vertices reflects the degree of similarity between the two corresponding video frames.
In this embodiment, each frame of the video is abstracted into a vertex, and the connecting lines between vertices form edges, so that the video can be regarded as an undirected weight graph G = (V, E, W) in the plane, where V = {v_i}, E = {e_ij} and W = {w_ij} denote the vertex set, the edge set and the edge weight set of the undirected weight graph, respectively. An edge weight of the undirected weight graph reflects the degree of similarity between two vertices, i.e., the content similarity between two video frames; the greater the weight, the higher the content similarity between the two video frames.
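For illustration only (not part of the original disclosure), the following minimal Python sketch builds such a graph with networkx, assuming the frames have already been decoded and that a pairwise similarity-based weight function is available; the names `frames` and `edge_weight` are placeholders, and a possible weight function is sketched further below.

```python
# Sketch: model a video as an undirected weighted graph G = (V, E, W), one vertex
# per frame. `edge_weight(i, j)` is assumed to return a similarity-based weight w_ij.
import itertools
import networkx as nx  # assumes the networkx package is installed

def build_frame_graph(frames, edge_weight):
    G = nx.Graph()
    G.add_nodes_from(range(len(frames)))            # V = {v_i}, one vertex per frame
    for i, j in itertools.combinations(range(len(frames)), 2):
        G.add_edge(i, j, weight=edge_weight(i, j))  # w_ij reflects content similarity
    return G
```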
Step S102: obtaining a minimum dominating set of the undirected weight graph by integer programming.
Since the vertices in the undirected weight graph correspond one-to-one to the video frames, selecting key frames is equivalent to selecting vertices in the undirected weight graph. Key frames are a small set of frames that can represent the content of the video, and the selected key frame set should cover the content of all video frames; that is, the selected vertices should be able to represent all vertices in the graph. Reflected on the undirected weight graph, this means that every unselected vertex is connected by an edge to at least one selected vertex. Clearly, the frame set corresponding to the minimum dominating set is an ideal key frame set; however, the minimum dominating set problem is NP-hard, so the present application selects the minimum dominating set of the video graph by integer programming, thereby obtaining the key frame set of the video.
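To make the domination condition concrete, here is a minimal sketch (a hypothetical helper, not part of the patent) that checks whether a chosen vertex set dominates a graph given as an adjacency list.

```python
# Sketch: check whether a vertex set M dominates an undirected graph.
# The graph is an adjacency list {vertex: set_of_neighbors}; illustrative only.

def is_dominating_set(adjacency, M):
    """Return True if every vertex outside M has at least one neighbor in M."""
    M = set(M)
    for v, neighbors in adjacency.items():
        if v not in M and not (neighbors & M):
            return False
    return True

# Example: a path 1-2-3-4-5; {2, 4} dominates it, {1, 5} does not (vertex 3 uncovered).
graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(is_dominating_set(graph, {2, 4}))  # True
print(is_dominating_set(graph, {1, 5}))  # False
```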
Step S103: and extracting the video frame corresponding to the vertex in the minimum domination set as a video key frame.
After the minimum dominating set of the undirected weight graph is obtained in step S102, in this step, video frames corresponding to vertices in the minimum dominating set can be extracted, and the video frames can be used as key frames of the video for video content analysis.
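Once the vertex indices of the minimum dominating set are known, extracting the corresponding frames is a simple seek-and-read. A sketch with OpenCV follows; the file name and index set are placeholders, not values from the patent.

```python
# Sketch: read the frames whose indices belong to the minimum dominating set.
import cv2  # assumes opencv-python is installed

def extract_key_frames(video_path, key_indices):
    cap = cv2.VideoCapture(video_path)
    key_frames = []
    for idx in sorted(key_indices):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)  # seek to the idx-th frame
        ok, frame = cap.read()
        if ok:
            key_frames.append((idx, frame))
    cap.release()
    return key_frames

# key_frames = extract_key_frames("video.mp4", {0, 37, 112})  # indices are illustrative
```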
Preferably, as shown in Fig. 2, the graph modeling of the video in step S101 to obtain the undirected weight graph corresponding to the video may include the following steps:
Step S1011: taking the video frames in the video as the vertices of the undirected weight graph.
Step S1012: acquiring the SURF feature point set of each vertex using speeded-up robust features (SURF).
The SURF algorithm mainly comprises two parts: image extreme point detection and feature description. First, the value of the Hessian matrix determinant is used to judge whether a corner point is an extreme point; then, to describe the local extreme points, the SURF algorithm calculates the Haar wavelet responses of the sample points on 4 × 4 blocks and forms a four-dimensional vector from the wavelet response values in the horizontal and vertical directions and their accumulated absolute values.
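A sketch of the feature extraction step is given below for illustration. SURF is provided by the opencv-contrib build (`cv2.xfeatures2d`) and may be unavailable in some builds; the fallback to ORB is an assumption for illustration and is not part of the patent.

```python
# Sketch: extract a feature point set (descriptors) for one video frame.
import cv2

def frame_feature_points(frame, hessian_threshold=400):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    try:
        detector = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian_threshold)
    except (AttributeError, cv2.error):
        detector = cv2.ORB_create()  # fallback if SURF is not available (assumption)
    keypoints, descriptors = detector.detectAndCompute(gray, None)
    return descriptors  # one row per feature point
```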
Step S1013: and calculating the distance between two vertexes as the edge weight between the two points by using a Hausdorff distance function Hausdorff.
The edge weight of the undirected weight graph represents the similarity degree of two video frames, and the greater the edge weight is, the greater the similarity is. Each frame can be regarded as a SURF feature point set thereof, so that the similarity of video frames, i.e., the similarity of feature point sets can be described by the distance between the two sets.
Let A = {a_1, a_2, ..., a_m} and B = {b_1, b_2, ..., b_n} be the finite feature point sets of two vertices. The Hausdorff distance between A and B is:
H(A, B) = max{h(A, B), h(B, A)};
where h(A, B) = max_{a in A} min_{b in B} ||a - b|| and h(B, A) = max_{b in B} min_{a in A} ||b - a||.
In order to ensure that the edge weight of the undirected weight graph is directly proportional to the similarity of the vertices, the present application uses the kernel function
w = exp(-||x_i - x_j|| / σ)
to generate the weights, where x_i and x_j are data points, ||x_i - x_j|| is expressed by the Hausdorff distance, and σ is a parameter set to 1 in this application. The edge weight w_ij between vertices i and j of the video graph is then:
w_ij = e^{-H(i, j)}.
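A sketch of this edge-weight computation, using SciPy's directed Hausdorff distance on the two descriptor sets, with σ = 1 as in the text; the function names are illustrative.

```python
# Sketch: Hausdorff distance between two feature point sets A and B, and the
# resulting edge weight w_ij = exp(-H(i, j)) with sigma = 1.
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def hausdorff_distance(A, B):
    # H(A, B) = max{h(A, B), h(B, A)}; directed_hausdorff returns (distance, idx_a, idx_b)
    return max(directed_hausdorff(A, B)[0], directed_hausdorff(B, A)[0])

def edge_weight(A, B, sigma=1.0):
    return float(np.exp(-hausdorff_distance(A, B) / sigma))  # larger weight = more similar
```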
preferably, as an embodiment of the present application, the obtaining the minimum dominating set of the undirected weight graph by integer programming in step S102 may include the following method steps:
defining the integer programming model as:
min Σ_{i=1}^{N} t_i, s.t. A·t ≥ B;
in the above formula, t_i is the label of vertex v_i:
t_i = 1 if v_i ∈ U, and t_i = 0 otherwise;
U is the minimum dominating set of the undirected weight graph, t = (t_1, t_2, ..., t_N)^T is the vector composed of all vertex labels, N is the number of vertices of the undirected weight graph, B is a column vector of all 1s, and A is defined as follows:
A = (a_ij)_{N×N},
where a_ij = 1 if i = j or e_ij ∈ E, and a_ij = 0 otherwise; E = {e_ij} is the edge set of the undirected weight graph;
and solving according to an integer programming model to obtain a minimum dominating set of the undirected weight graph.
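For illustration, the 0-1 integer program above can be handed to any ILP solver. The sketch below uses the PuLP modelling library (an assumption, not named in the patent); the matrix A encodes closed neighborhoods, i.e. a_ij = 1 when i = j or e_ij ∈ E, as defined above.

```python
# Sketch: solve min sum(t_i) s.t. A t >= 1, t_i in {0, 1}, i.e. the minimum
# dominating set ILP, with PuLP (assumes the pulp package is installed).
import pulp

def minimum_dominating_set(A):
    N = len(A)
    prob = pulp.LpProblem("min_dominating_set", pulp.LpMinimize)
    t = [pulp.LpVariable(f"t_{i}", cat="Binary") for i in range(N)]
    prob += pulp.lpSum(t)                                    # objective: sum of vertex labels
    for i in range(N):                                       # A t >= B, with B = all-ones
        prob += pulp.lpSum(A[i][j] * t[j] for j in range(N)) >= 1
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return {i for i in range(N) if t[i].value() == 1}        # vertices of the dominating set
```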
According to the above technical scheme, the key frame extraction method focuses on the representativeness of video frame content rather than on temporal order; key frame extraction is independent of video shot division and depends only on the content similarity of video frames; the extracted key frames are representative and distinguishable, with high fidelity and compression rate; and extraction is unaffected by temporal order, which makes the method very effective for constructing a static summary of a video.
Fig. 3 is a schematic flowchart of a method for extracting video key frames based on a dominating set according to another embodiment of the present application, where the method includes the following steps:
S301: performing graph modeling on the video to obtain an undirected weight graph corresponding to the video, wherein the vertices of the undirected weight graph correspond one-to-one to the video frames of the video, and the edge weight of an edge between two vertices reflects the degree of similarity between the two corresponding video frames.
S302: judging whether an edge weight of the undirected weight graph is smaller than a preset threshold α × mean(W), where α ∈ (0, 1) and mean(W) denotes the average of all edge weights; if so, removing the edge corresponding to that edge weight from the undirected weight graph.
In this embodiment, in order to enable the undirected weight graph obtained by the graph modeling in step S301 to reflect the relationship between video frames more directly, the undirected weight graph may be simplified in this step. The parameter α is a global parameter set through experiments and controls how many edges are removed: the larger α is, the more edge weights fall below the threshold and the sparser the generated graph becomes; the smaller α is, the more edges are retained. By setting a reasonable value of α, the undirected weight graph can be simplified so that it reflects the relationship between video frames more directly.
S303: obtaining a minimum dominating set of the undirected weight graph by integer programming.
S304: extracting the video frames corresponding to the vertices in the minimum dominating set as video key frames.
For steps S301, S303 and S304, reference may be made to the corresponding descriptions in the embodiment of Fig. 1, which are not repeated here.
According to the above technical scheme, the key frame extraction method focuses on the representativeness of video frame content rather than on temporal order; key frame extraction is independent of video shot division and depends only on the content similarity of video frames; the extracted key frames are representative and distinguishable, with high fidelity and compression rate; and extraction is unaffected by temporal order, which makes the method very effective for constructing a static summary of a video. In addition, simplifying the undirected weight graph not only simplifies the subsequent calculation steps but also allows the graph to reflect the relationship between video frames more directly.
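A sketch of this simplification step on a networkx graph (continuing the earlier modeling sketch): the parameter name `alpha` comes from the text, while the helper name and the default value 0.5 are illustrative assumptions.

```python
# Sketch: remove edges whose weight is below alpha * mean of all edge weights.
import numpy as np
import networkx as nx

def prune_weak_edges(G, alpha=0.5):
    weights = [w for _, _, w in G.edges(data="weight")]
    if not weights:
        return G
    threshold = alpha * float(np.mean(weights))               # alpha in (0, 1)
    weak = [(u, v) for u, v, w in G.edges(data="weight") if w < threshold]
    G.remove_edges_from(weak)
    return G
```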
Fig. 4 is a schematic flowchart of a method for extracting video key frames based on a dominating set according to another embodiment of the present application, where the method includes the following steps:
S401: performing graph modeling on the video to obtain an undirected weight graph corresponding to the video, wherein the vertices of the undirected weight graph correspond one-to-one to the video frames of the video, and the edge weight of an edge between two vertices reflects the degree of similarity between the two corresponding video frames.
S402: judging whether an edge weight of the undirected weight graph is smaller than a preset threshold α × mean(W), where α ∈ (0, 1) and mean(W) denotes the average of all edge weights; if so, removing the edge corresponding to that edge weight from the undirected weight graph.
S403: obtaining a minimum dominating set of the undirected weight graph by integer programming.
S404: acquiring an upper limit on the number of vertices in the dominating set of the undirected weight graph; adding x vertices to the minimum dominating set, where x ≥ 1, such that after the x vertices are added, the number of vertices in the minimum dominating set does not exceed the upper limit.
Preferably, this embodiment may obtain the upper limit on the number of vertices in the dominating set of the undirected weight graph from the following result in domination theory: an undirected graph with n vertices and minimum degree k has a dominating set whose size is bounded above by a function of n and k. In particular, when the minimum vertex degree of the undirected weight graph is greater than 2, the number of vertices in the minimum dominating set is not greater than 0.4n; for a video, because of the continuity of the content, the degree of a vertex in most video graphs is greater than 2, so the upper limit is set to 0.4n.
The number of key frames obtained from dominating set theory gives the method good video compressibility while preserving the representativeness of the video content. In some videos, however, the content of individual frames may differ greatly from that of all other frames, and the corresponding vertices of the undirected weight graph may be isolated points. To ensure the accuracy of the algorithm, the present application finally adds the frames corresponding to such vertices to the minimum dominating set, provided that the upper limit specified in this step is not exceeded. The key frames finally obtained can then reflect the real video content.
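A sketch of this adjustment: cap the dominating set at 0.4·n vertices and, within that cap, add the vertices that ended up isolated after edge pruning; the helper name and the iteration order are illustrative assumptions.

```python
# Sketch: enforce the 0.4 * n upper limit and add isolated vertices (frames whose
# content differs strongly from all others) to the dominating set while room remains.
import math
import networkx as nx

def adjust_dominating_set(G, dom_set):
    upper_limit = math.floor(0.4 * G.number_of_nodes())
    adjusted = set(dom_set)
    for v in nx.isolates(G):                 # vertices with no remaining edges
        if len(adjusted) >= upper_limit:
            break
        adjusted.add(v)
    return adjusted
```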
S405: and extracting the video frame corresponding to the vertex in the minimum domination set as a video key frame.
For steps S401, S402, S403 and S405, reference may be made to the corresponding descriptions in the embodiment of Fig. 3, which are not repeated here.
According to the above technical scheme, the key frame extraction method focuses on the representativeness of video frame content rather than on temporal order; key frame extraction is independent of video shot division and depends only on the content similarity of video frames; the extracted key frames are representative and distinguishable, with high fidelity and compression rate; and extraction is unaffected by temporal order, which makes the method very effective for constructing a static summary of a video. In addition, simplifying the undirected weight graph not only simplifies the subsequent calculation steps but also allows the graph to reflect the relationship between video frames more directly.
Fig. 5 is a schematic structural diagram of an apparatus for extracting video key frames based on a dominating set according to an embodiment of the present application, where the apparatus includes: a graph modeling unit 510, an integer programming unit 520 and a key frame extraction unit 530, the integer programming unit 520 being connected to the graph modeling unit 510 and the key frame extraction unit 530, respectively.
The graph modeling unit 510 is configured to perform graph modeling on a video to obtain an undirected weight graph corresponding to the video, where the vertices of the undirected weight graph correspond one-to-one to the video frames of the video, and the edge weight of an edge between two vertices reflects the degree of similarity between the two corresponding video frames.
The integer programming unit 520 is configured to obtain a minimum dominating set of the undirected weight graph by integer programming.
The key frame extraction unit 530 is configured to extract the video frames corresponding to the vertices in the minimum dominating set as video key frames.
Preferably, as shown in Fig. 6, the graph modeling unit 510 may include a vertex setting module 511, a feature point obtaining module 512 and a distance calculating module 513, the feature point obtaining module 512 being connected to the vertex setting module 511 and the distance calculating module 513, respectively.
The vertex setting module 511 is configured to take the video frames in the video as the vertices of the undirected weight graph.
The feature point obtaining module 512 is configured to obtain the SURF feature point set of each vertex using speeded-up robust features (SURF).
The distance calculating module 513 is configured to calculate the distance between two vertices as the edge weight between them using the Hausdorff distance.
Preferably, the apparatus of this embodiment may further include a simplifying unit configured to judge whether an edge weight of the undirected weight graph is smaller than a preset threshold α × mean(W), where α ∈ (0, 1) and mean(W) denotes the average of all edge weights, and if so, to remove the edge corresponding to that edge weight from the undirected weight graph.
Preferably, the obtaining of the minimum dominating set of the undirected weight graph by the integer programming unit 520 may further include:
defining the integer programming model as:
min Σ_{i=1}^{N} t_i, s.t. A·t ≥ B;
in the above formula, t_i is the label of vertex v_i:
t_i = 1 if v_i ∈ U, and t_i = 0 otherwise;
U is the minimum dominating set of the undirected weight graph, t = (t_1, t_2, ..., t_N)^T is the vector composed of all vertex labels, N is the number of vertices of the undirected weight graph, B is a column vector of all 1s, and A is defined as follows:
A = (a_ij)_{N×N},
where a_ij = 1 if i = j or e_ij ∈ E, and a_ij = 0 otherwise; E = {e_ij} is the edge set of the undirected weight graph;
and solving the integer programming model to obtain the minimum dominating set of the undirected weight graph.
Preferably, this embodiment may further include an upper limit setting unit and a minimum dominating set adjusting unit, wherein: the upper limit setting unit is configured to acquire an upper limit on the number of vertices in the dominating set of the undirected weight graph; and the minimum dominating set adjusting unit is configured to add x vertices to the minimum dominating set, where x ≥ 1, such that after the x vertices are added, the number of vertices in the minimum dominating set does not exceed the upper limit.
Preferably, the upper limit setting unit obtains the upper limit on the number of vertices in the dominating set of the undirected weight graph from the following result: an undirected graph with n vertices and minimum degree k has a dominating set whose size is bounded above by a function of n and k;
wherein the degree refers to the number of edges connected to a vertex.
For the detailed description of each unit, reference may be made to the description in the corresponding method embodiment, which is not described herein again.
According to the above technical scheme, the key frame extraction apparatus focuses on the representativeness of video frame content rather than on temporal order; key frame extraction is independent of video shot division and depends only on the content similarity of video frames; the extracted key frames are representative and distinguishable, with high fidelity and compression rate; and extraction is unaffected by temporal order, which makes the apparatus very effective for constructing a static summary of a video. In addition, simplifying the undirected weight graph not only simplifies the subsequent calculation steps but also allows the graph to reflect the relationship between video frames more directly.
From a hardware perspective, the present application provides an embodiment of an electronic device for implementing all or part of the dominating-set-based video key frame extraction method. The electronic device specifically includes the following:
a processor, a memory, a communication interface and a bus, wherein the processor, the memory and the communication interface communicate with one another through the bus; the communication interface is used for information transmission between the dominating-set-based video key frame extraction system and related devices such as a core service system, user terminals and related databases. The logic controller may be a desktop computer, a tablet computer, a mobile terminal or the like, but this embodiment is not limited thereto. In this embodiment, the logic controller may be implemented with reference to the embodiments of the dominating-set-based video key frame extraction method and of the dominating-set-based video key frame extraction apparatus, the contents of which are incorporated herein and are not repeated.
It is understood that the user terminal may include a smart phone, a tablet electronic device, a network set-top box, a portable computer, a desktop computer, a personal digital assistant (PDA), an in-vehicle device, a smart wearable device and the like. The smart wearable device may include smart glasses, a smart watch, a smart bracelet and the like.
In practical applications, part of the dominating set-based video key frame extraction method may be performed on the electronic device side as described in the above, or all operations may be performed in the client device or the server device. The selection may be specifically performed according to the processing capability of the client device or the server device, the limitation of the user usage scenario, and the like. This is not a limitation of the present application. The client device or the server device may further include a processor if all operations are performed in the client device or the server device.
The client device or the server device may have a communication module (i.e., a communication unit), and may be communicatively connected to a remote server or client to implement data transmission. The server may include a server on the task scheduling center side, and in other implementation scenarios, the server may also include a server on an intermediate platform, for example, a server on a third-party server platform that is communicatively linked to the task scheduling center server. The server may include a single computer device, or may include a server cluster formed by a plurality of servers, or a server structure of a distributed apparatus.
Fig. 7 is a schematic block diagram of a system configuration of an electronic device 1100 according to another embodiment of the present invention. As shown in fig. 7, the electronic device 1100 may include a central processor 1110 and a memory 1120; the memory 1120 is coupled to the central processor 1110. Notably, this fig. 7 is exemplary; other types of structures may also be used in addition to or in place of the structure to implement telecommunications or other functions.
In one embodiment, the functionality of the dominating-set-based video key frame extraction method may be integrated into the central processor 1110. The central processor 1110 may be configured to perform control as follows:
Step S101: performing graph modeling on the video to obtain an undirected weight graph corresponding to the video, wherein the vertices of the undirected weight graph correspond one-to-one to the video frames of the video, and the edge weight of an edge between two vertices reflects the degree of similarity between the two corresponding video frames.
Step S102: obtaining a minimum dominating set of the undirected weight graph by integer programming.
Step S103: extracting the video frames corresponding to the vertices in the minimum dominating set as video key frames.
From the above description it can be seen that the electronic device provided by the present application focuses on the representativeness of video frame content rather than on temporal order; key frame extraction is independent of video shot division and depends only on the content similarity of video frames; the extracted key frames are representative and distinguishable, with high fidelity and compression rate; and extraction is unaffected by temporal order, which makes the approach very effective for constructing a static summary of a video. In addition, simplifying the undirected weight graph not only simplifies the subsequent calculation steps but also allows the graph to reflect the relationship between video frames more directly.
In another embodiment, the dominating set-based video key frame extracting apparatus may be configured separately from the central processor 1110, for example, the dominating set-based video key frame extracting apparatus may be configured as a chip connected to the central processor 1110, and the dominating set-based video key frame extracting method function is realized by the control of the central processor.
As shown in fig. 7, the electronic device 1100 may further include: a communication module 1130, an input unit 1140, an audio processor 1150, a display 1160, and a power supply 1170. It is worthy to note that electronic device 1100 also does not necessarily include all of the components shown in FIG. 7; furthermore, the electronic device 1100 may also comprise components not shown in fig. 7, as reference may be made to the prior art.
As shown in fig. 7, the central processor 1110, also sometimes referred to as a controller or operational control, may include a microprocessor or other processor device and/or logic device, the central processor 1110 receiving input and controlling operation of the various components of the electronic device 1100.
The memory 1120 can be, for example, one or more of a buffer, a flash memory, a hard drive, a removable media, a volatile memory, a non-volatile memory, or other suitable device. The information about the above-mentioned dominating set-based video key frame extraction method can be stored, and a program for executing the information can also be stored. And the central processor 1110 may execute the program stored in the memory 1120 to realize information storage or processing, etc.
The input unit 1140 provides input to the central processor 1110. The input unit 1140 is, for example, a key or a touch input device. The power supply 1170 is used to provide power to the electronic device 1100. The display 1160 is used for displaying display objects such as images and characters. The display may be, for example, an LCD display, but is not limited thereto.
The memory 1120 may be a solid-state memory, such as a read-only memory (ROM), a random access memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered off, that can be selectively erased and supplied with additional data, an example of which is sometimes referred to as an EPROM. The memory 1120 may also be some other type of device. The memory 1120 includes a buffer memory 1121 (sometimes referred to as a buffer). The memory 1120 may include an application/function storage 1122, which is used to store application programs and function programs or the flows executed by the central processor 1110 to operate the electronic device 1100.
The memory 1120 may also include a data store 1123, the data store 1123 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage 1124 of the memory 1120 may include various drivers for the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging application, directory application, etc.).
The communication module 1130 is a transmitter/receiver 1130 that transmits and receives signals via an antenna 1131. The communication module (transmitter/receiver) 1130 is coupled to the central processor 1110 to provide an input signal and receive an output signal, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 1130, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, may be provided in the same electronic device. The communication module (transmitter/receiver) 1130 is also coupled to a speaker 1151 and a microphone 1152 via an audio processor 1150 to provide audio output via the speaker 1151 and receive audio input from the microphone 1152, thereby performing typical telecommunications functions. Audio processor 1150 may include any suitable buffers, decoders, amplifiers and so forth. Additionally, an audio processor 1150 is also coupled to the central processor 1110, enabling recording of sounds locally through a microphone 1152, and enabling playing of locally stored sounds through a speaker 1151.
An embodiment of the present application further provides a computer-readable storage medium capable of implementing all steps of the dominating-set-based video key frame extraction method whose execution subject in the above embodiments is a server or a client. The computer-readable storage medium stores a computer program which, when executed by a processor, implements all steps of that method; for example, the processor implements the following steps when executing the computer program:
Step S101: performing graph modeling on the video to obtain an undirected weight graph corresponding to the video, wherein the vertices of the undirected weight graph correspond one-to-one to the video frames of the video, and the edge weight of an edge between two vertices reflects the degree of similarity between the two corresponding video frames.
Step S102: obtaining a minimum dominating set of the undirected weight graph by integer programming.
Step S103: extracting the video frames corresponding to the vertices in the minimum dominating set as video key frames.
As can be seen from the above description, the computer-readable storage medium provided in the embodiment of the present application focuses on the representativeness of video frame content rather than on temporal order; key frame extraction is independent of video shot division and depends only on the content similarity of video frames; the extracted key frames are representative and distinguishable, with higher fidelity and compression rate. In addition, simplifying the undirected weight graph not only simplifies the subsequent calculation steps but also allows the graph to reflect the relationship between video frames more directly.
Embodiments of the present application further provide a computer program product capable of implementing all steps of the dominating-set-based video key frame extraction method whose execution subject in the above embodiments is a server or a client. The computer program/instructions, when executed by a processor, implement the steps of that method, for example the following steps:
Step S101: performing graph modeling on the video to obtain an undirected weight graph corresponding to the video, wherein the vertices of the undirected weight graph correspond one-to-one to the video frames of the video, and the edge weight of an edge between two vertices reflects the degree of similarity between the two corresponding video frames.
Step S102: obtaining a minimum dominating set of the undirected weight graph by integer programming.
Step S103: extracting the video frames corresponding to the vertices in the minimum dominating set as video key frames.
As can be seen from the above description, the computer program product provided in the embodiment of the present application focuses on the representativeness of video frame content rather than on temporal order; key frame extraction is independent of video shot division and depends only on the content similarity of video frames; the extracted key frames are representative and distinguishable, with higher fidelity and compression rate. In addition, simplifying the undirected weight graph not only simplifies the subsequent calculation steps but also allows the graph to reflect the relationship between video frames more directly.
In the description herein, reference to the description of the terms "one embodiment," "a particular embodiment," "some embodiments," "for example," "an example," "a particular example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. The sequence of steps involved in the various embodiments is provided to schematically illustrate the practice of the invention, and the sequence of steps is not limited and can be suitably adjusted as desired.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A method for extracting video key frames based on a dominating set, characterized by comprising the following steps:
performing graph modeling on a video to obtain an undirected weight graph corresponding to the video, wherein the vertices of the undirected weight graph correspond one-to-one to the video frames of the video, and the edge weight of an edge between two vertices reflects the degree of similarity between the two corresponding video frames;
obtaining a minimum dominating set of the undirected weight graph by integer programming;
and extracting the video frames corresponding to the vertices in the minimum dominating set as video key frames.
2. The method of claim 1, wherein the graph modeling of the video to obtain the undirected weight graph corresponding to the video comprises:
taking the video frames in the video as the vertices of the undirected weight graph;
acquiring the SURF feature point set of each vertex using speeded-up robust features (SURF);
and calculating the distance between two vertices as the edge weight between them using the Hausdorff distance.
3. The method for extracting video key frames of claim 1, wherein after the graph modeling is performed on the video to obtain the undirected weight graph corresponding to the video, the method further comprises:
judging whether an edge weight of the undirected weight graph is smaller than a preset threshold α × mean(W), where α ∈ (0, 1) and mean(W) denotes the average of all edge weights, and if so, removing the edge corresponding to that edge weight from the undirected weight graph.
4. The dominating-set-based video key frame extraction method of claim 1, wherein said obtaining the minimum dominating set of the undirected weight graph by integer programming comprises:
defining an integer programming model as:
min Σ_{i=1}^{N} t_i,
s.t. A·t ≥ B;
in the above formula, t_i is the label of vertex v_i:
t_i = 1 if v_i ∈ U, and t_i = 0 otherwise;
U is the minimum dominating set of the undirected weight graph, t = (t_1, t_2, ..., t_N)^T is the vector composed of all vertex labels, N is the number of vertices of the undirected weight graph, B is a column vector of all 1s, and A is defined as follows:
A = (a_ij)_{N×N},
where a_ij = 1 if i = j or e_ij ∈ E, and a_ij = 0 otherwise; E = {e_ij} is the edge set of the undirected weight graph;
and solving the integer programming model to obtain the minimum dominating set of the undirected weight graph.
5. The dominating-set-based video key frame extraction method of claim 1, wherein said obtaining the minimum dominating set of the undirected weight graph by integer programming further comprises:
acquiring an upper limit on the number of vertices in the dominating set of the undirected weight graph;
adding x vertices to the minimum dominating set, where x ≥ 1, such that after the x vertices are added, the number of vertices in the minimum dominating set does not exceed the upper limit.
6. The dominating-set-based video key frame extraction method of claim 5, wherein said acquiring an upper limit on the number of vertices in the dominating set of the undirected weight graph comprises:
obtaining the upper limit on the number of vertices in the dominating set of the undirected weight graph from the following result: an undirected graph with n vertices and minimum degree k has a dominating set whose size is bounded above by a function of n and k;
wherein the degree refers to the number of edges connected to a vertex.
7. An apparatus for extracting video key frames based on a dominating set, characterized in that the apparatus comprises:
a graph modeling unit, configured to perform graph modeling on a video to obtain an undirected weight graph corresponding to the video, wherein the vertices of the undirected weight graph correspond one-to-one to the video frames of the video, and the edge weight of an edge between two vertices reflects the degree of similarity between the two corresponding video frames;
an integer programming unit, configured to obtain a minimum dominating set of the undirected weight graph by integer programming;
and a key frame extraction unit, configured to extract the video frames corresponding to the vertices in the minimum dominating set as video key frames.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the dominating-set-based video key frame extraction method of any one of claims 1 to 6 when executing the computer program.
9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the dominating-set-based video key frame extraction method of any one of claims 1 to 6.
10. A computer program product comprising a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the steps of the dominating-set-based video key frame extraction method of any one of claims 1 to 6.