CN111625683B - Automatic video abstract generation method and system based on graph structure difference analysis


Info

Publication number
CN111625683B
Authority
CN
China
Prior art keywords
frame
image
video
shot
image block
Prior art date
Legal status
Active
Application number
CN202010376813.6A
Other languages
Chinese (zh)
Other versions
CN111625683A (en)
Inventor
吕晨
柴春蕾
马彩霞
马艳玲
吕蕾
刘弘
Current Assignee
Beijing Senbo Mingde Marketing Technology Co.,Ltd.
Original Assignee
Shandong Normal University
Priority date
Filing date
Publication date
Application filed by Shandong Normal University
Priority to CN202010376813.6A
Publication of CN111625683A
Application granted
Publication of CN111625683B
Legal status: Active


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/73Querying
    • G06F16/738Presentation of query results
    • G06F16/739Presentation of query results in form of a video summary, e.g. the video summary being a video sequence, a composite still image or having synthesized frames

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method and system for automatically generating video abstracts based on graph structure difference analysis, comprising the following steps: preprocessing a given video stream, and dividing each frame of image in the preprocessed video stream into a plurality of image blocks of equal size; extracting features from each image block to obtain the feature vector of each image block of each frame of image; establishing an undirected weighted graph of each frame of image according to the feature vectors of its image blocks; detecting video shot boundaries through hypothesis testing based on graph structure difference analysis; and extracting the key frames in each video shot based on the median graph of each video shot. The present disclosure addresses the problem that raw features may fail to fully capture the detailed structural information in a frame, making the method more robust in detecting various types of shot transitions.

Description

Automatic video abstract generation method and system based on graph structure difference analysis
Technical Field
The disclosure relates to the technical field of automatic generation of static video abstracts (video key frame extraction), and in particular to an automatic generation method and system for video abstracts based on graph structure difference analysis.
Background
The statements in this section merely mention background art related to the present disclosure and do not necessarily constitute prior art.
In recent years, a large number of new photographing devices and video applications have emerged, and the amount of video on the internet has grown enormously. There is an increasing need to quickly view and review large amounts of video data in a limited time, to facilitate video browsing and video retrieval. This is also a general concern in fields where large amounts of video data must be stored, archived, analyzed, or visualized. Automatic video summarization techniques address these problems by generating a reduced version of a video stream that retains only its most informative and representative content.
Automatic video summarization techniques can be divided into two categories: static video summarization (static key frames) and dynamic video summarization (dynamic video browsing). Currently, in the field of video key frame extraction, key frame extraction via shot boundary detection is widely used.
In general, shot boundary detection may be achieved by analyzing the differences between successive frames, where a significant difference indicates a possible boundary at the currently detected position. Many similarity measures based on different video features exist, such as pixel differences, color histogram differences, compressed-domain techniques, motion vectors, object tracking, and event analysis. These methods have proven effective mainly at detecting abrupt shot changes (e.g., hard cuts).
However, the inventors have found that a major limitation affecting the performance of existing methods is their inability to detect subtle changes. For transitions such as dissolves, wipes, and fade-ins/fade-outs, the inter-frame changes of a gradual shot are relatively fine and difficult to detect using only the low-level features adopted by traditional methods. The reason is that low-level features such as pixels, pixel blocks, and histograms do not express the underlying detailed structure of each frame, which plays a crucial role in distinguishing the nuances between successive frames in a gradual shot.
Disclosure of Invention
In order to overcome the shortcomings of the prior art, the present disclosure provides a method and a system for automatically generating video abstracts based on graph structure difference analysis. Taking the structural information of the video into consideration, the video frames are modeled with undirected weighted graphs, and shot boundaries are detected through structure difference analysis between the graphs. A median graph is then calculated within each shot, and the corresponding key frames are extracted. The present disclosure addresses the problem that raw features may fail to fully capture the detailed structural information in a frame, making the method more robust in detecting various types of shot transitions.
In a first aspect, the present disclosure provides a method for automatically generating a video summary based on graph structure difference analysis;
the automatic video abstract generation method based on graph structure difference analysis comprises the following steps:
preprocessing a given video stream, and dividing each frame of image in the preprocessed video stream into a plurality of image blocks with equal size;
extracting the characteristics of each image block to obtain the characteristic vector of each image block of each frame of image;
establishing an undirected weighted graph of each frame of image according to the characteristic vector of each image block of each frame of image; detecting video shot boundaries based on hypothesis testing of graph structure difference analysis;
the key frames in each video shot are extracted based on the median graph of each video shot.
In a second aspect, the present disclosure further provides a video summary automatic generation system based on graph structure difference analysis;
an automatic video abstract generating system based on graph structure difference analysis comprises:
a preprocessing module configured to: preprocessing a given video stream, and dividing each frame of image in the preprocessed video stream into a plurality of image blocks with equal size;
a feature extraction module configured to: extracting the characteristics of each image block to obtain the characteristic vector of each image block of each frame of image;
a shot boundary detection module configured to: establishing an undirected weighted graph of each frame of image according to the characteristic vector of each image block of each frame of image; detecting video shot boundaries based on hypothesis testing of graph structure difference analysis;
a key frame extraction module configured to: the key frames in each video shot are extracted based on the median graph of each video shot.
In a third aspect, the present disclosure also provides an electronic device comprising a memory and a processor, and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the steps of the method of the first aspect.
In a fourth aspect, the present disclosure also provides a computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of the first aspect.
In a fifth aspect, the present disclosure also provides a computer program (product) comprising a computer program for implementing the method of any one of the preceding aspects when run on one or more processors.
Compared with the prior art, the beneficial effects of the present disclosure are:
(1) The present disclosure proposes a new graph-modeling-based video representation, in which strong connectivity between graphs becomes a key factor in determining the structural features of video frames, bridging the gap between the actual semantics of video frames and their raw features.
(2) The present disclosure proposes a graph-based dissimilarity measure for inter-frame differences that reflects the potential differences between successive frames, enhancing the robustness and accuracy of detecting various shot transitions.
(3) The present disclosure extracts the frames corresponding to the median graphs as the key frames of each shot, which reflects the overall trend of the video more comprehensively and accurately.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application.
FIG. 1 is an overall flow overview of an algorithm according to a first embodiment of the disclosure;
FIG. 2 is a schematic diagram of video representation based on graph modeling in accordance with an embodiment of the present disclosure;
FIG. 3 is a schematic diagram illustrating detecting video shot boundaries based on graph structure difference analysis according to a first embodiment of the disclosure;
fig. 4 is a diagram illustrating the median graph calculated within a shot according to an embodiment of the present disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the present application. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments in accordance with the present application. As used herein, the singular is also intended to include the plural unless the context clearly indicates otherwise, and furthermore, it is to be understood that the terms "comprises" and/or "comprising" when used in this specification are taken to specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
Example 1
The embodiment provides a video abstract automatic generation method based on graph structure difference analysis;
as shown in fig. 1, the automatic video abstract generating method based on graph structure difference analysis includes:
s101: preprocessing a given video stream, and dividing each frame of image in the preprocessed video stream into a plurality of image blocks with equal size;
s102: extracting the characteristics of each image block to obtain the characteristic vector of each image block of each frame of image;
s103: establishing an undirected weighted graph of each frame of image according to the characteristic vector of each image block of each frame of image; detecting video shot boundaries based on hypothesis testing of graph structure difference analysis;
s104: the key frames in each video shot are extracted based on the median graph of each video shot.
As one or more embodiments, in S101, a given video stream is preprocessed, and each frame of image in the preprocessed video stream is divided into a plurality of image blocks with equal size; the method comprises the following specific steps:
sampling a given video stream to obtain a video frame set; the video frame set comprises video frames sampled from a given video stream;
and carrying out consistent size adjustment on each frame of image in the video frame set, and dividing each frame of image after adjustment into a plurality of image blocks with equal size.
The image blocks herein are also referred to as patches.
Exemplary, in S101, a given video stream is preprocessed, and each frame of image in the preprocessed video stream is divided into a plurality of image blocks with equal size; the method comprises the following specific steps:
First, for a given video stream, a set of video frames F = {f_1, f_2, f_3, ..., f_n} containing n frames is extracted at a predefined sampling rate. Frame f_i denotes the i-th frame of the video frame set.
A predefined sampling rate r is used:
[equation defining the sampling rate r in terms of a specified constant C; the formula image is not recoverable]
The specified constant C is usually 1 to 3, and is taken as 3 here.
Secondly, considering the influence of noise such as local illumination in video frames, each frame in the video frame set F is resized to 256 × 192.
Each frame is then equally divided into k patches, so that each frame is denoted as f_i = {f_i^1, f_i^2, f_i^3, ..., f_i^k}, where f_i^k represents the k-th patch of frame f_i. As shown in fig. 2, k = 4 is set.
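For illustration, a minimal Python sketch of this preprocessing step follows; the use of OpenCV, the function names, and the fixed sampling stride (standing in for the unrecoverable sampling-rate formula) are assumptions, not part of the disclosed method:

```python
# Sketch of S101: sample frames, resize to 256x192, split into k equal patches.
import cv2

def sample_and_patch(video_path, stride=3, size=(256, 192), grid=(2, 2)):
    """Sample every `stride`-th frame, resize it, and split it into patches."""
    cap = cv2.VideoCapture(video_path)
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % stride == 0:
            frames.append(cv2.resize(frame, size))  # size = (width, height)
        idx += 1
    cap.release()

    rows, cols = grid
    h, w = size[1] // rows, size[0] // cols
    # Each frame f_i becomes k = rows*cols equal-size patches f_i^1..f_i^k
    # (k = 4 for the 2x2 grid shown in fig. 2).
    return [[f[r*h:(r+1)*h, c*w:(c+1)*w]
             for r in range(rows) for c in range(cols)] for f in frames]
```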
As one or more embodiments, in S102, feature extraction is performed on each image block to obtain a feature vector of each image block of each frame of image; the method comprises the following specific steps:
extracting HSV color histograms from each image block of each frame of preprocessed image;
extracting an HOG direction gradient histogram from each image block of each preprocessed frame image;
and connecting the HSV color histogram and the HOG direction gradient histogram of each image block of each frame of image to obtain the characteristic vector of each image block of each frame of image.
It will be appreciated that feature extraction plays a critical role in video representation as the first step in key frame extraction, with a critical impact on the subsequent extraction process.
It should be appreciated that color histograms are the most expressive features in video representations. In order to use the color histogram, an appropriate color space needs to be selected in advance. HSV is chosen as the color space because it is more robust to noise. HSV can effectively separate RGB into intensity (brightness) and color information, much like the way humans perceive color.
Illustratively, extracting an HSV color histogram for each image block of each frame of the image after preprocessing; the method comprises the following specific steps:
A color quantization step is employed for each of hue (H), saturation (S), and value (V): specifically, 16 hue components and 4 components each for saturation and value. Thus, each image block of each frame is represented as a 256-dimensional (16 × 4 × 4) HSV histogram:
f_i^p,HSV = {h_1, h_2, ..., h_256} ∈ R^256
it should be appreciated that HOG counts the gradient direction information of the local region to describe shape edge information of the image. It perfectly describes geometrical and optical deformations and is therefore very robust against environmental changes. Currently, HOG is widely used in the fields of scene analysis, target detection, recognition systems, and the like.
Illustratively, each patch of a video frame is partitioned into cell units of size 16 × 16, and the 8-bin histogram of each cell unit is calculated, yielding a 384-dimensional HOG histogram for each patch, expressed as:
f_i^p,HOG = {g_1, g_2, ..., g_384} ∈ R^384
Illustratively, the HSV color histogram and the HOG histogram of each image block of each frame image are concatenated into a single feature vector:
f_i^p = [f_i^p,HSV, f_i^p,HOG] ∈ R^640
known as the HSV-HOG histogram.
Thus, the video frame set F is represented as a set of 640 × k × n-dimensional histogram features:
F = {f_1, f_2, ..., f_i, ..., f_n} ∈ R^(640×k×n)    (3)
where f_i = {f_i^1, f_i^2, ..., f_i^p, ..., f_i^k} ∈ R^(640×k).
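As an illustrative sketch of this feature extraction step (the OpenCV calls and the unsigned-gradient convention for HOG are assumptions), each patch can be mapped to its 640-dimensional HSV-HOG vector as follows:

```python
# Sketch of S102: 256-dim HSV histogram + 384-dim HOG histogram per patch.
import cv2
import numpy as np

def hsv_histogram(patch_bgr):
    hsv = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2HSV)
    # 16 hue x 4 saturation x 4 value bins -> 256 dimensions.
    hist = cv2.calcHist([hsv], [0, 1, 2], None, [16, 4, 4],
                        [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, None).flatten()

def hog_histogram(patch_bgr, cell=16, bins=8):
    gray = cv2.cvtColor(patch_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag, ang = cv2.cartToPolar(gx, gy)   # angle in radians, [0, 2*pi)
    ang = ang % np.pi                    # unsigned orientation (assumption)
    feats = []
    h, w = gray.shape
    for r in range(0, h - cell + 1, cell):      # 16x16 cell units
        for c in range(0, w - cell + 1, cell):
            m = mag[r:r+cell, c:c+cell].ravel()
            a = ang[r:r+cell, c:c+cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    v = np.concatenate(feats)            # 48 cells x 8 bins = 384 dimensions
    return v / (np.linalg.norm(v) + 1e-8)

def hsv_hog(patch_bgr):
    # Concatenation into the 640-dimensional HSV-HOG histogram f_i^p.
    return np.concatenate([hsv_histogram(patch_bgr), hog_histogram(patch_bgr)])
```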
As one or more embodiments, in S103, an undirected weighted graph of each frame image is established according to the feature vector of each image block of each frame image; the method comprises the following specific steps:
s1031: taking the frequency components of the characteristic vector of each image block of each frame image as the nodes of the undirected weighted graph;
s1032: taking the connections between pairs of nodes as the edges of the undirected weighted graph;
s1033: taking the distance between the frequency amplitudes of two nodes as the weight of the corresponding edge;
s1034: expressing the undirected weighted graph as an adjacency matrix, and regularizing the adjacency matrix;
s1035: processing all frame images in the video frame set in the same way as in S1031 to S1034 to obtain the corresponding set of adjacency matrices.
It is well known that shot detection and key frame selection face two challenges. The first is over-segmentation caused by changes in local content, illumination conditions, shooting angle, and shooting distance; the second is missed key frames in gradual shots. In the over-segmentation problem, the content on the two sides of a falsely detected shot boundary shows no overall structural change. In the missed key frame problem, the background of the missed key frame is similar to that of adjacent key frames, yet expresses completely different structural and spatial information. HSV-HOG histograms can only express one-dimensional statistics of video frames. Therefore, a suitable model is needed to fully represent the structural and spatial information of video frames and to effectively reflect structural changes in the video stream. The present disclosure designs a graph model to express the content of a video frame.
Illustratively, in the step S103, an undirected weighted graph of each frame image is established according to the feature vector of each image block of each frame image; the method comprises the following specific steps:
modeling each image block of each frame as an Undirected Weighted Graph (UWG) G_p = {V, E} based on the features of the HSV-HOG histogram, constructed as follows:
1) each v(i) in V (1 ≤ i ≤ X, X = 640) represents the i-th frequency component of the HSV-HOG histogram, i.e., one node of the graph;
2) every two nodes v(i) and v(j) are connected by an edge e_ij, and the Manhattan distance between the frequency amplitudes of the two nodes is calculated as the edge weight d_ij;
3) the graph G_p is represented as an adjacency matrix A_p, i.e., A_p = {d_ij}, and the matrix is regularized.
The video frame set F is thus modeled as a series of undirected weighted graphs G = {G_1, G_2, ..., G_i, ..., G_n}, and strong connectivity between the graphs becomes a key factor in determining the structural features of the video frames.
Finally, the graph sequence G is represented as an adjacency matrix sequence A, i.e., A = {A_1, A_2, ..., A_i, ..., A_n}, where G_i = {G_i^1, G_i^2, ..., G_i^k} denotes the graph corresponding to frame f_i, comprising k subgraphs G_p (one per patch), and A_i = {A_i^1, A_i^2, ..., A_i^k} denotes the adjacency matrix corresponding to graph G_i, comprising k sub-matrices A_p.
As shown in fig. 3, in S103, a video shot boundary is detected based on a hypothesis test of the graph structure difference analysis; the method comprises the following specific steps:
s103a1: obtaining the degree of difference between two adjacent frame images based on the difference of their corresponding adjacency matrices;
s103a2: predicting the size of the current shot, and analyzing the degree of difference between frames of the current shot using a sliding window with a dynamically adjusted step size;
s103a3: judging the significance of the change in the degree of difference between the current consecutive frames through hypothesis testing, thereby detecting the video shot boundaries.
Further, the step size of the sliding window is adjusted dynamically as follows: the degree of difference between the starting frame image of a new shot and the frame image a specified step away is compared with a set threshold, so that the current shot size is predicted and the sliding-window step size for the current shot is adjusted accordingly.
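By way of illustration only, the sketch below shows one plausible reading of this adjustment rule; since the concrete update formula (equation (7)) is not recoverable from the source, the probe-doubling strategy, the threshold theta, and the function name are assumptions:

```python
# Hypothetical sketch of the sliding-window step prediction (cf. equation (7)).
def predict_window(graphs, start, dissim, w0=2, theta=0.5):
    """Grow the probe step while the frame `w` steps ahead still resembles
    the starting frame of the new shot; `dissim` compares two adjacency
    matrices (see the dissimilarity sketch below)."""
    w = w0
    while start + 2 * w < len(graphs):
        k = len(graphs[start])
        score = sum(dissim(a, b)
                    for a, b in zip(graphs[start], graphs[start + w])) / k
        if score > theta:   # content changed within w frames: stop growing
            break
        w *= 2              # assumed incremental step
    return w
```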
In an exemplary embodiment, in S103a1, the degree of difference between two adjacent frame images is obtained based on the difference of their corresponding adjacency matrices; the specific steps are as follows:
the dissimilarity score between graphs G and G' is measured by the sum of edge weight differences, expressed as:
d(G, G') = Σ_(i,j) |d_ij − d'_ij|    (4)
To eliminate the negative effects of singular sample data, the score is normalized to:
[equation (5): normalized dissimilarity score; the formula image is not recoverable]
where Δ_ij is calculated as:
[equation (6): definition of the per-edge term Δ_ij; the formula image is not recoverable]
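A minimal sketch of this dissimilarity score follows; because the exact normalization of equations (5)-(6) is not recoverable, division by the number of matrix entries stands in for it here:

```python
# Sketch of equation (4): sum of edge-weight differences, scaled to [0, 1]
# by the matrix size (a stand-in for the unrecoverable (5)-(6)).
import numpy as np

def graph_dissimilarity(A, B):
    n = A.shape[0]
    return np.abs(A - B).sum() / (n * n)   # |d_ij - d'_ij| summed over edges
```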
in S103a2, the current shot size is predicted, and a sliding window with a dynamically adjusted step size is adopted to analyze the degree of difference between frames of the current shot; the specific steps are as follows:
[equation (7): update rule for the sliding-window step size; the formula image is not recoverable]
where w_0 is the step size of the initial sliding window.
It should be appreciated that most methods employ a predefined fixed-size sliding window for inter-frame difference analysis. However, an important characteristic of real video is the high temporal variability of similar content: an event/shot in a video always lasts for several frames or more. It is therefore inappropriate to detect boundaries with a fixed-size sliding window across shots of different lengths and types. To further improve detection accuracy, a prediction strategy is adopted to automatically match a sliding window of suitable size.
Illustratively, in S103a3, the significance of the change in dissimilarity between the current consecutive frames is determined through hypothesis testing, so as to obtain the video shot boundary; the specific steps are as follows:
based on equations (8) and (9), hypothesis testing determines whether the video content of the current consecutive frames changes, i.e., whether a shot boundary is present:
H_0: z < k − 1 (no shot boundary)    (8)
H_1: z ≥ k − 1 (shot boundary)    (9)
where z = #{ p | D(G_m^p, G_n^p) > λ, 1 ≤ p ≤ k }, and D(G_m^p, G_n^p) denotes the dissimilarity measure score between subgraph G_m^p of frame f_m and subgraph G_n^p of frame f_n. #{·} is a counting function, so z represents the number of dissimilarity metric values that exceed the predefined threshold; k is the number of patches in a video frame; λ is the shot boundary detection threshold.
If at least k−1 of the k dissimilarity scores between the two frames are greater than the predefined threshold, a shot boundary is detected between the consecutive frames: the shot boundary is marked (the ending frame index of the previous shot and the starting frame index of the next shot), the current sliding window w is predicted through equation (7), and the detection process of a new shot begins; otherwise, detection continues.
The video frame is split into k patches, and hypothesis H_1 requires z to be at least k−1, for two reasons. On the one hand, local variations may cause false positives in shot detection; suppressing the influence of local variations in each pair of frame patches reduces the error rate of shot detection. On the other hand, dividing the video frame into k patches effectively improves the computational efficiency of the matrix operations.
It should be appreciated that in video frame set F, video frames of the same shot express similar content and follow the same data distribution. Shot boundary detection may be achieved by analyzing differences between successive frames, where significant differences indicate that there may be a boundary at the currently detected location.
Shot division is completed to obtain the shot boundary frame index set Index = {In_0, In_1, In_2, ..., In_M}, where In_0 is the initial first frame, i.e., In_0 = 1, and In_M is the last frame, i.e., In_M = n.
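For illustration, a sketch of this boundary decision follows, reusing graph_dissimilarity from the sketch above; the fixed threshold value and the direct frame-pair walk (in place of the unrecoverable window update of equation (7)) are assumptions:

```python
# Sketch of S103a3: declare a boundary when z >= k-1 per-patch dissimilarity
# scores exceed the threshold lambda (hypothesis H1). Indices are 0-based,
# unlike the 1-based In_0 = 1 convention in the text.
def detect_boundaries(graph_seq, lam=0.1):
    """graph_seq: one entry per frame, each a list of k adjacency matrices."""
    index_set = [0]                                  # In_0: first frame
    for t in range(1, len(graph_seq)):
        k = len(graph_seq[t])
        z = sum(graph_dissimilarity(a, b) > lam
                for a, b in zip(graph_seq[t - 1], graph_seq[t]))
        if z >= k - 1:                               # accept H1: shot boundary
            index_set.append(t)
    index_set.append(len(graph_seq) - 1)             # In_M: last frame
    return index_set
```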
As shown in fig. 4, in S104, a key frame in each video shot is extracted based on the median map of each video shot; the method comprises the following specific steps:
for the undirected weighted graphs corresponding to all the image frames in each video shot, calculating a graph with the smallest sum of the distances between the undirected weighted graphs and all other graphs in the shot, namely, a median graph; and selecting the frame corresponding to the median map as a key frame.
Illustratively, in S104, the key frames in each video shot are extracted based on the median graph calculation; the specific steps are as follows:
given the graph set S = {G_1, G_2, G_3, ..., G_N} of each shot, the median graph Ĝ is obtained by solving the minimization optimization problem:
Ĝ = argmin_(G ∈ S) Σ_(G_i ∈ S) d(G, G_i)    (10)
where d(G, G_i) is calculated over the k subgraphs as:
d(G, G_i) = Σ_(p=1..k) D(G^p, G_i^p)    (11)
It is apparent that, in equation (10), Ĝ is the graph with the smallest sum of distances relative to the other remaining graphs. Finally, the frame corresponding to the median graph is selected as the key frame;
and acquiring key frames of all shots to obtain a key frame set.
It should be noted that equation (10) ensures maximum similarity of the key frame to the remaining frames, i.e., the minimal sum of distances to the other frames. In addition, equation (11) helps overcome the sensitivity of frame difference analysis to local noise. The summary composed of the key frames thus extracted can comprehensively reflect the overall trend of a given video.
It will be appreciated that after the shots of a video are detected, the next step is to select the most informative and representative frame of each shot as the key frame. The basic idea is that the extracted key frame should be the frame most similar to the remaining frames in the shot. To this end, the concept of the median graph is introduced for the key frame selection task. In graph theory, the median graph is an effective tool for representing a set of graphs.
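A minimal sketch of this median-graph selection follows, again reusing graph_dissimilarity; summing the per-patch scores as the inter-graph distance matches the reconstruction of equation (11) above:

```python
# Sketch of S104: the key frame of a shot is the frame whose graph has the
# smallest summed distance to all other graphs in the shot (equation (10)).
import numpy as np

def median_frame_index(shot_graphs):
    """shot_graphs: one entry per frame in the shot, each a list of k
    adjacency matrices; returns the in-shot index of the key frame."""
    n = len(shot_graphs)
    totals = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                totals[i] += sum(graph_dissimilarity(a, b)    # equation (11)
                                 for a, b in zip(shot_graphs[i],
                                                 shot_graphs[j]))
    return int(np.argmin(totals))
```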
Example two
The embodiment also provides an automatic video abstract generating system based on graph structure difference analysis;
an automatic video abstract generating system based on graph structure difference analysis comprises:
a preprocessing module configured to: preprocessing a given video stream, and dividing each frame of image in the preprocessed video stream into a plurality of image blocks with equal size;
a feature extraction module configured to: extracting the characteristics of each image block to obtain the characteristic vector of each image block of each frame of image;
a shot boundary detection module configured to: establishing an undirected weighted graph of each frame of image according to the characteristic vector of each image block of each frame of image; detecting video shot boundaries based on hypothesis testing of graph structure difference analysis;
a key frame extraction module configured to: the key frames in each video shot are extracted based on the median graph of each video shot.
It should be noted that the preprocessing module, the feature extraction module, the shot boundary detection module, and the key frame extraction module correspond to steps S101 to S104 in the first embodiment, and the foregoing modules are the same as examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure in the first embodiment. It should be noted that the modules described above may be implemented as part of a system in a computer system, such as a set of computer-executable instructions.
The foregoing embodiments are directed to various embodiments, and details of one embodiment may be found in the related description of another embodiment.
The proposed system may be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into modules is merely a logical functional division, and other divisions are possible in actual implementations; for example, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed.
Example III
The embodiment also provides an electronic device, which includes a memory, a processor, and computer instructions stored in the memory and running on the processor, where each operation in the method is completed when the computer instructions are run by the processor, and for brevity, details are not repeated here.
The electronic device may be a mobile or non-mobile terminal. Non-mobile terminals include desktop computers; mobile terminals include smart phones (such as Android phones and iOS phones), smart glasses, smart watches, smart bracelets, tablet computers, notebook computers, personal digital assistants, and other mobile internet devices capable of wireless communication.
It should be understood that in this disclosure, the processor may be a central processing unit (CPU), another general purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory may include read only memory and random access memory and provide instructions and data to the processor, and a portion of the memory may also include non-volatile random access memory. For example, the memory may also store information of the device type.
In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The steps of a method disclosed in connection with the present disclosure may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in a processor for execution. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method. To avoid repetition, a detailed description is not provided herein. Those of ordinary skill in the art will appreciate that the elements of the various examples described in connection with the embodiments disclosed herein, i.e., the algorithm steps, can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the units is merely a division of one logic function, and there may be additional divisions when actually implemented, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling or direct coupling or communication connection shown or discussed with each other may be through some interface, device or unit indirect coupling or communication connection, which may be in electrical, mechanical or other form.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
Example IV
The present embodiment also provides a computer-readable storage medium storing computer instructions that, when executed by a processor, perform the method of embodiment one.
The foregoing description is only of the preferred embodiments of the present application and is not intended to limit the same, but rather, various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.

Claims (8)

1. The automatic video abstract generation method based on graph structure difference analysis is characterized by comprising the following steps of:
preprocessing a given video stream, and dividing each frame of image in the preprocessed video stream into a plurality of image blocks with equal size;
extracting the characteristics of each image block to obtain the characteristic vector of each image block of each frame of image;
establishing an undirected weighted graph of each frame of image according to the characteristic vector of each image block of each frame of image; detecting video shot boundaries based on hypothesis testing of graph structure difference analysis;
extracting key frames in each video shot based on the median graph of each video shot;
establishing an undirected weighted graph of each frame of image according to the characteristic vector of each image block of each frame of image; the method comprises the following specific steps:
taking the frequency components of the characteristic vector of each image block of each frame image as the nodes of the undirected weighted graph;
taking the connections between pairs of nodes as the edges of the undirected weighted graph;
taking the distance between the frequency amplitudes of two nodes as the weight of the corresponding edge;
expressing the undirected weighted graph as an adjacency matrix, and regularizing the adjacency matrix;
subjecting all frame images in the video frame set to the same processing to obtain the corresponding set of adjacency matrices;
detecting video shot boundaries based on hypothesis testing of graph structure difference analysis; the method comprises the following specific steps:
obtaining the degree of difference between two adjacent frame images based on the difference of their corresponding adjacency matrices;
predicting the size of the current shot, and analyzing the degree of difference between frames of the current shot using a sliding window with a dynamically adjusted step size;
and judging the significance of the change in the degree of difference between the current consecutive frames through hypothesis testing, so as to obtain the video shot boundaries.
2. The method of claim 1, wherein a given video stream is preprocessed, and each frame of image in the preprocessed video stream is divided into a plurality of equally sized image blocks; the method comprises the following specific steps:
sampling a given video stream to obtain a video frame set; the video frame set comprises video frames sampled from a given video stream;
and performing size scaling adjustment on each frame of image in the video frame set, and dividing each frame of image after adjustment into a plurality of image blocks with equal size.
3. The method of claim 1, wherein feature extraction is performed on each image block to obtain a feature vector for each image block of each frame of image; the method comprises the following specific steps:
extracting HSV color histograms from each image block of each frame of preprocessed image;
extracting an HOG direction gradient histogram from each image block of each preprocessed frame image;
and connecting the HSV color histogram and the HOG direction gradient histogram of each image block of each frame of image to obtain the characteristic vector of each image block of each frame of image.
4. The method of claim 1, wherein the step size of the sliding window is dynamically adjusted by comparing the degree of difference between the starting frame image of a new shot and the frame image a specified step away with a set threshold value, so as to predict the current shot size, thereby realizing the dynamic adjustment of the sliding-window step size for the current shot.
5. The method of claim 1, wherein the key frames in each video shot are extracted based on the median graph of each video shot; the method comprises the following specific steps:
for the undirected weighted graphs corresponding to all the image frames in each video shot, calculating the graph with the smallest sum of distances to all the other graphs in the shot, namely the median graph; and selecting the frame corresponding to the median graph as the key frame.
6. The automatic video abstract generating system based on graph structure difference analysis, which executes the automatic video abstract generating method based on graph structure difference analysis according to any one of claims 1 to 5, is characterized by comprising the following steps:
a preprocessing module configured to: preprocessing a given video stream, and dividing each frame of image in the preprocessed video stream into a plurality of image blocks with equal size;
a feature extraction module configured to: extracting the characteristics of each image block to obtain the characteristic vector of each image block of each frame of image;
a shot boundary detection module configured to: establishing an undirected weighted graph of each frame of image according to the characteristic vector of each image block of each frame of image; detecting video shot boundaries based on hypothesis testing of graph structure difference analysis;
a key frame extraction module configured to: the key frames in each video shot are extracted based on the median graph of each video shot.
7. An electronic device comprising a memory and a processor and computer instructions stored on the memory and running on the processor, which when executed by the processor, perform the steps of the method of any of claims 1-5.
8. A computer readable storage medium storing computer instructions which, when executed by a processor, perform the steps of the method of any of claims 1-5.
CN202010376813.6A 2020-05-07 2020-05-07 Automatic video abstract generation method and system based on graph structure difference analysis Active CN111625683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010376813.6A CN111625683B (en) 2020-05-07 2020-05-07 Automatic video abstract generation method and system based on graph structure difference analysis


Publications (2)

Publication Number Publication Date
CN111625683A CN111625683A (en) 2020-09-04
CN111625683B true CN111625683B (en) 2023-05-23

Family

ID=72272797

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010376813.6A Active CN111625683B (en) 2020-05-07 2020-05-07 Automatic video abstract generation method and system based on graph structure difference analysis

Country Status (1)

Country Link
CN (1) CN111625683B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380962A (en) * 2020-11-11 2021-02-19 成都摘果子科技有限公司 Animal image identification method and system based on deep learning
CN114079824B (en) * 2021-11-02 2024-03-08 深圳市洲明科技股份有限公司 Transmission card, control method thereof, display device, computer device, and storage medium
CN114090168A (en) * 2022-01-24 2022-02-25 麒麟软件有限公司 Self-adaptive adjusting method for image output window of QEMU (QEMU virtual machine)
CN117177004B (en) * 2023-04-23 2024-05-31 青岛尘元科技信息有限公司 Content frame extraction method, device, equipment and storage medium

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN101216886B (en) * 2008-01-11 2010-06-09 北京航空航天大学 A shot clustering method based on spectral segmentation theory
CN101425088A (en) * 2008-10-24 2009-05-06 清华大学 Key frame extracting method and system based on chart partition
CN101951511B (en) * 2010-08-19 2012-11-28 深圳市亮信科技有限公司 Method for layering video scenes by analyzing depth
CN110913243B (en) * 2018-09-14 2021-09-14 华为技术有限公司 Video auditing method, device and equipment
CN111078943B (en) * 2018-10-18 2023-07-04 山西医学期刊社 Video text abstract generation method and device
CN110163239B (en) * 2019-01-25 2022-08-09 太原理工大学 Weak supervision image semantic segmentation method based on super-pixel and conditional random field

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN101833569A (en) * 2010-04-08 2010-09-15 中国科学院自动化研究所 Method for automatically identifying film human face image
CN102184242A (en) * 2011-05-16 2011-09-14 天津大学 Cross-camera video abstract extracting method
CN108600865A (en) * 2018-05-14 2018-09-28 西安理工大学 A kind of video abstraction generating method based on super-pixel segmentation
CN110210379A (en) * 2019-05-30 2019-09-06 北京工业大学 A kind of lens boundary detection method of combination critical movements feature and color characteristic

Non-Patent Citations (1)

Title
Video key frame extraction method based on dominating set ("基于支配集的视频关键帧提取方法"); 聂秀山; 柴彦娥; 滕聪; Journal of Computer Research and Development (计算机研究与发展) (12); 225-233 *

Also Published As

Publication number Publication date
CN111625683A (en) 2020-09-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20231123

Address after: Room 955, 8th Floor, Building B, 1st to 14th Floor, Building 1, No. 59 Huahua Road, Chaoyang District, Beijing, 100020

Patentee after: Beijing Senbo Mingde Marketing Technology Co.,Ltd.

Address before: 250014 No. 88, Wenhua East Road, Lixia District, Shandong, Ji'nan

Patentee before: SHANDONG NORMAL University