CN113591587A - Method for extracting content key frame of motion video - Google Patents

Method for extracting content key frame of motion video

Info

Publication number
CN113591587A
Authority
CN
China
Prior art keywords
video
slice
frame
image
clustering
Prior art date
Legal status
Pending
Application number
CN202110749819.8A
Other languages
Chinese (zh)
Inventor
冯子亮
刘恒宇
韩震博
何旭东
窦芙蓉
唐玄霜
张欣
贺思睿
何思迪
张炬
Current Assignee
Sichuan University
Original Assignee
Sichuan University
Priority date: 2021-07-02
Filing date: 2021-07-02
Publication date: 2021-11-02
2021-07-02: Application filed by Sichuan University
2021-07-02: Priority to CN202110749819.8A
2021-11-02: Publication of CN113591587A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques
    • G06F18/2321Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a method for extracting content key frames from a motion video. A dynamic spatio-temporal slice position selection method determines the slice position from the activity intensity map of the motion video, which improves the extraction of key information and lets the extracted slice express the key motion information in the video more effectively. In the distance calculation of the clustering algorithm, both the similarity and the temporal attribute of the slice images are considered, which improves the accuracy of key frame identification. Together, these measures improve the extraction of content key information from motion videos.

Description

Method for extracting content key frame of motion video
Technical Field
The invention relates to the technical field of computer vision, and in particular to a method for extracting content key frames from a motion video.
Background
Digital video has become an important channel for disseminating information over networks. As the number of videos on the network grows rapidly, quickly and effectively finding a required clip among a large number of videos has become a focus of attention; this is the video content retrieval problem.
A video consists of continuously changing image frames; the frames that can effectively represent the main content of the video are called video content key frames. Video content key frame extraction is an important means of addressing video content retrieval and plays an important role in video similarity analysis, video content summarization, and related tasks.
Motion video analysis often involves motion type analysis, motion pose estimation, human behavior recognition, and so on; processing only the video key frames instead of all video frames effectively reduces the amount of computation and improves the efficiency of motion video analysis.
Spatio-temporal slicing is a technique for summarizing video content along the temporal and spatial dimensions. Specifically, the video is unfolded into a sequence of images along the time dimension; each image is sliced along the spatial dimension by extracting one row or one column of its pixels, and the slices together form a video spatio-temporal slice image that serves as a summary image of the video; processing this summary image yields information such as the content key frames of the video.
Conventional spatio-temporal slicing typically slices at a fixed position, which may fail to capture the key information in the video, and it ignores the temporal continuity of the slices when processing the video slice image, so the extracted content key frames are not accurate enough.
To address these problems, the invention provides a method for extracting video content key frames from motion videos based on dynamic spatio-temporal slice clustering: dynamically computing the spatio-temporal slice position improves the extraction of key motion information, and taking the temporal attribute of the slice images into account in the distance calculation of the clustering algorithm improves the accuracy of key frame identification, which together improve the extraction of key information from motion videos.
Disclosure of Invention
A method for extracting content key frames of a motion video comprises the following steps.
Step 1, calculating an activity intensity map of the motion video:
step 1.1, converting each frame of the motion video to grayscale;
step 1.2, starting from the 2nd frame, computing the per-pixel absolute gray-level difference from the previous frame to obtain a difference image;
step 1.3, thresholding the difference image, setting to 0 any pixel whose gray value is below the threshold;
step 1.4, accumulating the difference images pixel by pixel and normalizing;
the normalized accumulated difference image is called the activity intensity map of the motion video.
A motion video here means a video whose background changes relatively slowly while the foreground changes substantially; its activity intensity map reflects the overall effect of foreground target change.
Step 2, taking the row with the maximum gray value in the activity intensity map of the motion video as the spatio-temporal slice position row.
The gray values of the main area of the activity intensity map are accumulated row by row;
the row with the maximum accumulated gray value is taken as the spatio-temporal slice position row.
The main area of the activity intensity map is the middle region of the video image excluding the upper and lower borders; it is where the foreground objects in the video are mainly active. To prevent foreground changes outside this region from affecting content key frame extraction, changes outside it are ignored.
Optionally, the column with the maximum gray value in the activity intensity map is taken as the spatio-temporal slice position column, and vertical slicing is subsequently performed column by column.
Step 3, performing a horizontal slicing operation on each frame of the motion video at the spatio-temporal slice position to obtain the spatio-temporal slice images, or slices.
The horizontal slicing operation takes one or more rows from an image; the resulting image is called a spatio-temporal slice image, or slice.
Step 4, stitching the spatio-temporal slice images to form the transverse video slice image.
During stitching, the slices are stacked top to bottom in the frame order in which the video is unfolded, forming a transverse video slice image that runs from top to bottom along the time direction.
Step 5, clustering the transverse video slice image along the time direction using a K-means clustering algorithm, with each slice image as the basic clustering unit.
The K-means clustering algorithm of step 5 comprises the following steps:
step 5.1, initializing the clusters: according to the set number of cluster centers, the slices of the transverse video slice image are evenly assigned to the clusters (categories) along the time direction;
step 5.2, according to the clustering result, recomputing and updating the cluster center of each category;
step 5.3, for the slice images lying between two adjacent cluster centers along the time direction, recomputing their distances to those centers and adjusting the boundary between the two adjacent categories accordingly;
step 5.4, repeating steps 5.2 and 5.3 following the iterative scheme of K-means clustering to obtain the clustering result and the key frame candidates.
The cluster center is the center of a category; it is a slice image along the time direction and can be represented by its video frame number.
The distance between a slice image and a cluster center is computed as the distance between the two slice images, represented by the product of the Euclidean distance between the slice images and their temporal distance (i.e., the inter-frame distance) along the time direction; the value is small when the two slice images are similar, and it also decreases as the inter-frame distance decreases.
The cluster center of each category is recomputed by a trial method: each slice of the category is tried as the cluster center in turn, the distances from the remaining slices of the category to that candidate are computed and accumulated, and the candidate with the minimum accumulated distance is taken as the cluster center of the category.
When adjusting the boundary between two adjacent categories, note that clustering proceeds along the time direction, so the slices above the boundary belong to the upper cluster center and the slices below it belong to the lower cluster center; the clustering result is therefore a clean boundary between the two categories, and no interleaving can occur.
The clustering boundary is adjusted as follows: starting from the upper and lower cluster centers, search toward each other to obtain the first slice image below the upper center and the first slice image above the lower center, and compute and accumulate the distance of each to its own cluster center; then, on the side with the smaller accumulated distance, take the next slice image in the same direction, compute its distance to that side's cluster center, and add it to that side's accumulation; repeat until all slices between the two cluster centers have been searched, which yields the new boundary between the two categories.
The clustering result consists of the boundaries and the cluster center of each category along the time direction after clustering finishes; the frame corresponding to each cluster-center slice image is a candidate video content key frame.
Step 6, merging categories with too few frames and readjusting the category boundaries.
If the number of frames in a category of the clustering result is less than the specified minimum-frame-count threshold, the category is removed and merged with the preceding and following categories; its frames are assigned to the preceding or following category according to the boundary-adjustment method of step 5.3.
In the final clustering result, the sequence formed by the cluster-center frames is the final sequence of video content key frames.
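For orientation, a compact end-to-end sketch of steps 1 to 6 is given below in Python (OpenCV/NumPy). All helper names (compute_activity_map, select_slice_row, build_slice_image, cluster_slices, merge_small_clusters) are hypothetical labels for the corresponding steps rather than terminology from the patent; minimal sketches of these helpers accompany the detailed description below.

    import cv2

    def extract_key_frames(video_path, k=4, diff_threshold=3, border=5, min_frames=4):
        # Hypothetical driver for steps 1-6; the helpers are illustrative names only,
        # sketched alongside the corresponding steps of the detailed description.
        cap = cv2.VideoCapture(video_path)
        frames = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))   # step 1.1: grayscale
        cap.release()

        activity = compute_activity_map(frames, diff_threshold)      # step 1: activity intensity map
        row = select_slice_row(activity, border)                     # step 2: slice position row
        slice_image = build_slice_image(frames, row)                 # steps 3-4: transverse slice image
        segments, centers = cluster_slices(slice_image, k)           # step 5: clustering along time
        segments, centers = merge_small_clusters(                    # step 6: merge small categories
            slice_image, segments, centers, min_frames)
        return centers                                               # frame numbers of the key frames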
Compared with the prior art, the invention has the following advantages: the dynamic spatio-temporal slice position improves the extraction of key video information and lets the extracted slice express the key motion information in the motion video more effectively; taking the temporal attribute of the slice images into account in the distance calculation of the clustering algorithm improves the accuracy of key frame identification; together, these measures improve the extraction of key information from motion videos. The method is also easy to understand, simple to compute, and robust.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention.
Fig. 2 is a schematic diagram of the calculation flow of the motion video activity intensity map in the present invention.
Fig. 3 is a schematic diagram of a transverse video slice image in the present invention.
Detailed Description
The technical solutions in the embodiments of the invention are described below clearly and completely with reference to the accompanying drawings; the described embodiments are only some, not all, of the embodiments of the invention.
A method for extracting a content key frame of a motion video, as shown in fig. 1, includes the following steps.
Step 1, calculating the activity intensity map of the motion video, as shown in fig. 2.
Step 1.1, converting each frame of the motion video to grayscale;
the graying process may use a conventional RGB to grayscale model.
Step 1.2, calculating a difference image frame by frame;
starting from the 2nd frame, the gray-value difference from the previous frame is computed pixel by pixel, and its absolute value is taken to obtain the difference image.
Step 1.3, carrying out threshold processing on the difference image;
the purpose of setting the threshold value is to reduce the influence of video noise in the future, and the threshold value can be set to be 1-5.
In this example, the threshold is set to 3, that is, the difference between the gray levels of two adjacent frames is less than 3, and the difference is cleared to 0.
Step 1.4, accumulating and normalizing all difference images of the video;
the gray values of the difference images are accumulated pixel by pixel, and the maximum value Gmax and minimum value Gmin of the accumulated gray values are obtained; [Gmin, Gmax] is then mapped linearly onto [0, 255] to obtain the normalized image.
The normalized image is referred to as the activity intensity map of the motion video.
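A minimal NumPy sketch of steps 1.2 to 1.4, assuming the frames are already grayscale uint8 arrays (the function name and signature are illustrative, not taken from the patent):

    import numpy as np

    def compute_activity_map(gray_frames, diff_threshold=3):
        """Steps 1.2-1.4: thresholded frame differencing, pixel-wise accumulation, normalization."""
        acc = np.zeros(gray_frames[0].shape, dtype=np.float64)
        for prev, cur in zip(gray_frames[:-1], gray_frames[1:]):
            diff = np.abs(cur.astype(np.int16) - prev.astype(np.int16))  # step 1.2: |frame_t - frame_(t-1)|
            diff[diff < diff_threshold] = 0                               # step 1.3: suppress small (noise) differences
            acc += diff                                                   # step 1.4: accumulate
        gmin, gmax = acc.min(), acc.max()
        # linear mapping of [Gmin, Gmax] onto [0, 255]
        return ((acc - gmin) / max(gmax - gmin, 1e-9) * 255.0).astype(np.uint8)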
In this example, let the width of the video image be X = 640, the height be Y = 480, and the length of the video be L = 80 frames, as shown in fig. 3 (1).
Step 2, taking the row with the maximum gray value in the activity intensity map of the motion video as the spatio-temporal slice position row.
In this example, the upper and lower borders are set to 5 pixels, so the rows of the main area of the video image span [5, 474], with row numbering starting from 0.
The gray values of this area of the activity intensity map are accumulated row by row;
the row with the maximum accumulated gray value is taken as the spatio-temporal slice position row.
In this example, assume the resulting spatio-temporal slice position row is p = 280, as indicated by the dashed line p in fig. 3 (1).
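A corresponding sketch of step 2, with the same assumptions (the 5-pixel border matches this example; the names are illustrative):

    import numpy as np

    def select_slice_row(activity_map, border=5):
        """Step 2: pick the row of the main area with the largest accumulated gray value."""
        main = activity_map[border:activity_map.shape[0] - border, :]   # main area, e.g. rows [5, 474] for height 480
        row_sums = main.sum(axis=1)                                      # accumulate gray values row by row
        return int(np.argmax(row_sums)) + border                         # convert back to a full-image row index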
Step 3, performing the horizontal slicing operation on each frame of the motion video.
In this example, the slice width is 1 pixel, so row 280 of each frame is extracted as the spatio-temporal slice image; the size of each slice image is X × 1, i.e., 640 × 1.
Step 4, stitching the spatio-temporal slice images to form the transverse video slice image.
During stitching, the slices are stacked top to bottom in the frame order in which the video is unfolded, forming a transverse video slice image ordered from top to bottom in time.
In this example, the size of the resulting transverse video slice image is X × L, i.e., 640 pixels wide by 80 frames tall, as shown in fig. 3 (3).
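Steps 3 and 4 then amount to taking the selected row from every frame and stacking the rows in frame order; a one-function sketch (slice width 1, as in this example; names are illustrative):

    import numpy as np

    def build_slice_image(gray_frames, row):
        """Steps 3-4: extract row `row` from each frame and stack the slices top to bottom.
        The result has one row per frame (L rows of width X)."""
        return np.stack([frame[row, :] for frame in gray_frames], axis=0)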
Step 5, clustering the transverse video slice image along the time direction using a K-means clustering algorithm, with each slice image as the basic clustering unit.
The purpose of step 5 is to find the boundaries and the center of each category from the slices.
Step 5.1, initializing the clustering by evenly distributing the slices of the transverse video slice image among the categories.
In this example, the number of cluster centers is set to 4, and the video frame numbers run from 0 to 79.
After the even distribution, the frame numbers of the four categories are [0 to 19], [20 to 39], [40 to 59], and [60 to 79].
Step 5.2, updating the cluster centers according to the clustering result.
The distance between two slice images i and j is calculated as:
dis(i, j) = abs(i - j) * d(p(i), p(j));
where i and j are the frame numbers of the two slice images, abs() denotes the absolute value, and d(p(i), p(j)) is the Euclidean distance between slice image p(i) and slice image p(j).
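As a hedged sketch, assuming the transverse slice image is stored as a NumPy array with one row per frame (L rows of width X) and that the function name is illustrative, the distance can be written as:

    import numpy as np

    def slice_distance(slice_image, i, j):
        """dis(i, j) = |i - j| * Euclidean distance between slice rows i and j."""
        diff = slice_image[i].astype(np.float64) - slice_image[j].astype(np.float64)
        return abs(i - j) * float(np.linalg.norm(diff))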
The cluster centers are updated by a trial method: each slice image in a category is tried as the cluster center in turn, the distances from the remaining slices of the category to that candidate are computed and accumulated, and the candidate with the minimum accumulated distance becomes the updated cluster center.
In this example, the frame numbers of the updated cluster centers are 10, 35, 65, and 75.
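A minimal sketch of this trial-based update (assumed names; members is the list of frame numbers currently assigned to the category, and slice_distance is the sketch given above):

    def update_center(slice_image, members):
        """Step 5.2: try every member as the center; keep the one with the minimum accumulated distance."""
        best_center, best_cost = members[0], float("inf")
        for candidate in members:
            cost = sum(slice_distance(slice_image, candidate, m) for m in members if m != candidate)
            if cost < best_cost:
                best_center, best_cost = candidate, cost
        return best_center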
Step 5.3, adjusting the boundary between two adjacent categories according to the new cluster centers.
For the slice images lying between two adjacent cluster centers along the time direction, the distances to the cluster centers are recomputed, and the boundary between the two categories is adjusted accordingly.
In this example, the centers of the first and second classes are initially frame 10 and frame 35, and the boundary between them lies between frame 19 (end of the first class) and frame 20 (start of the second class); all frames from 11 to 34 are searched from both ends toward the middle, i.e., downward from frame 11 on the first-class side and upward from frame 34 on the second-class side, until the two searches meet.
First, the distance between frame 11 (upper side) and the first-class center, frame 10, is computed; assume it is 5, so the upper accumulated distance is 5.
Then the distance between frame 34 (lower side) and the second-class center, frame 35, is computed; assume it is 10, so the lower accumulated distance is 10.
Since the upper accumulated distance (5) is smaller than the lower accumulated distance (10), frame 12 is taken next on the upper side.
The distance between frame 12 and the first-class center, frame 10, is computed; assume it is 2, so the upper accumulated distance becomes 5 + 2 = 7.
Since the upper accumulated distance (7) is still smaller than the lower accumulated distance (10), frame 13 is taken next on the upper side, and so on.
The new boundary between the first and second classes is finally obtained; assume it falls between frame 23 (first class) and frame 24 (second class).
The boundaries between the second and third classes and between the third and fourth classes are computed in the same way.
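The opposite-direction search can be sketched as follows (assumed names; upper_c and lower_c are the frame numbers of two adjacent cluster centers, and the return value is the last frame kept by the upper category):

    def adjust_boundary(slice_image, upper_c, lower_c):
        """Step 5.3: walk inward from both centers, always extending the side whose accumulated
        distance is smaller, until every slice between the centers has been assigned.
        Assumes at least two slices lie between the two centers, as in the example above."""
        up, down = upper_c + 1, lower_c - 1                       # first slice below / above each center
        up_acc = slice_distance(slice_image, upper_c, up)         # upper accumulated distance
        down_acc = slice_distance(slice_image, lower_c, down)     # lower accumulated distance
        while down - up > 1:                                      # slices strictly between `up` and `down` remain
            if up_acc <= down_acc:
                up += 1
                up_acc += slice_distance(slice_image, upper_c, up)
            else:
                down -= 1
                down_acc += slice_distance(slice_image, lower_c, down)
        return up              # frames <= up join the upper category; frames >= down join the lower category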
Step 5.4, repeating steps 5.2 and 5.3 following the iterative scheme of K-means clustering to obtain the clustering result and the key frame candidates.
In this example, the frame ranges of the four resulting classes are [0 to 25], [26 to 28], [29 to 47], and [48 to 79].
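Tying the pieces together, a hedged sketch of the whole of step 5, with even initialization followed by alternating center updates and boundary adjustments (the names and the fixed iteration cap are assumptions):

    def cluster_slices(slice_image, k, max_iter=20):
        """Steps 5.1-5.4: even split, then alternate the 5.2 center update and the 5.3 boundary adjustment."""
        n = slice_image.shape[0]
        edges = [round(n * t / k) for t in range(k + 1)]               # step 5.1: even split of frames 0..n-1
        segments = [(edges[t], edges[t + 1] - 1) for t in range(k)]
        centers = [0] * k
        for _ in range(max_iter):
            # step 5.2: recompute the center of each category with the trial method
            centers = [update_center(slice_image, list(range(s, e + 1))) for s, e in segments]
            # step 5.3: adjust every boundary between adjacent cluster centers
            new_segments = list(segments)
            for t in range(k - 1):
                b = adjust_boundary(slice_image, centers[t], centers[t + 1])
                new_segments[t] = (new_segments[t][0], b)
                new_segments[t + 1] = (b + 1, new_segments[t + 1][1])
            if new_segments == segments:                               # step 5.4: stop when the boundaries stabilize
                break
            segments = new_segments
        return segments, centers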
Step 6, merging classes with too few frames and readjusting the class boundaries to obtain the final video content key frames.
In this example, the minimum-frame-count threshold is set to 4; since the second class contains only 3 frames, which is below the threshold, it must be merged.
The boundary between the neighbouring classes is then adjusted according to step 5.3, giving the final frame ranges of the three classes: [0 to 27], [28 to 47], and [48 to 79].
The frame numbers of the three cluster centers, updated according to step 5.2, are 8, 37, and 73.
This is the final video content key frame sequence.
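Finally, a hedged sketch of step 6 (assumed names; segments holds each category's first and last frame numbers and centers the center frame numbers; how a small first or last category is handled is not spelled out in the patent, so it is simply folded into its single neighbour here):

    def merge_small_clusters(slice_image, segments, centers, min_frames=4):
        """Step 6: remove categories with fewer than `min_frames` frames, re-split their
        frames between the neighbouring categories, then refresh the remaining centers."""
        segments, centers = list(segments), list(centers)
        i = 0
        while i < len(segments):
            start, end = segments[i]
            if end - start + 1 >= min_frames or len(segments) == 1:
                i += 1
                continue
            if i == 0:                                   # fold a small first category into the next one
                segments[1] = (start, segments[1][1])
            elif i == len(segments) - 1:                 # fold a small last category into the previous one
                segments[i - 1] = (segments[i - 1][0], end)
            else:                                        # middle: re-split between both neighbours (step 5.3)
                b = adjust_boundary(slice_image, centers[i - 1], centers[i + 1])
                segments[i - 1] = (segments[i - 1][0], b)
                segments[i + 1] = (b + 1, segments[i + 1][1])
            del segments[i], centers[i]
        # refresh each remaining center with the trial method of step 5.2
        centers = [update_center(slice_image, list(range(s, e + 1))) for s, e in segments]
        return segments, centers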
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, some or all of the technical features may be replaced by equivalents, or the order of the steps may be changed, and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions. The values of the various thresholds and ranges may vary depending on the particular situation.

Claims (5)

1. A method for extracting a content key frame of a motion video is characterized by comprising the following steps:
step 1, calculating an activity intensity map of the motion video;
step 2, taking a row with the maximum gray value in the motion video activity intensity map as a space-time slice position row;
step 3, performing horizontal slicing operation on each frame of video image at the position line of the space-time slice to obtain a space-time slice image;
step 4, splicing the time-space slice images to form transverse video slice images;
step 5, clustering the transverse video slice images along the time direction by using a K-means clustering algorithm;
and step 6, merging classes with too few frames and readjusting the boundaries to obtain the final video content key frames.
2. The method of claim 1, wherein said step 1 comprises:
step 1.1, converting each frame of the motion video to grayscale;
step 1.2, starting from the 2nd frame, computing the per-pixel absolute gray-level difference from the previous frame to obtain a difference image;
step 1.3, thresholding the difference image, setting to 0 any pixel whose gray value is below the threshold;
and step 1.4, accumulating the difference images pixel by pixel and normalizing to obtain the activity intensity map of the motion video.
3. The method of claim 1, wherein said step 2 comprises:
accumulating the gray values of the main area of the activity intensity map row by row, and taking the row with the maximum accumulated gray value as the spatio-temporal slice position row;
optionally, taking the column with the maximum gray value in the activity intensity map as the spatio-temporal slice position column, with vertical slicing subsequently performed column by column.
4. The method of claim 1, wherein said step 4 comprises:
for horizontal spatio-temporal slice images, stitching them top to bottom in the frame order in which the video is unfolded, forming a transverse video slice image that runs from top to bottom along the time direction;
optionally, for vertical spatio-temporal slice images, stitching them left to right in the frame order in which the video is unfolded, forming a vertical video slice image that runs from left to right along the time direction.
5. The method of claim 1, wherein said step 5 comprises:
step 5.1, evenly distributing the slices of the transverse video slice image among the categories according to the set number of cluster centers;
step 5.2, according to the clustering result, recomputing and updating the cluster center of each category;
step 5.3, recomputing the distances between the slice images lying between adjacent cluster centers and those centers, and adjusting the corresponding boundaries;
step 5.4, repeating steps 5.2 and 5.3 following the K-means clustering scheme until the clustering result and the key frame candidates are obtained;
wherein the distance between a slice image and a cluster center is the distance between the two slice images, represented by the product of the Euclidean distance between the slice images and their inter-frame distance along the time direction.
CN202110749819.8A 2021-07-02 2021-07-02 Method for extracting content key frame of motion video Pending CN113591587A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110749819.8A CN113591587A (en) 2021-07-02 2021-07-02 Method for extracting content key frame of motion video

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110749819.8A CN113591587A (en) 2021-07-02 2021-07-02 Method for extracting content key frame of motion video

Publications (1)

Publication Number Publication Date
CN113591587A 2021-11-02

Family

ID=78245573

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110749819.8A Pending CN113591587A (en) 2021-07-02 2021-07-02 Method for extracting content key frame of motion video

Country Status (1)

Country Link
CN (1) CN113591587A (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014022254A2 (en) * 2012-08-03 2014-02-06 Eastman Kodak Company Identifying key frames using group sparsity analysis
CN105139421A (en) * 2015-08-14 2015-12-09 西安西拓电气股份有限公司 Video key frame extracting method of electric power system based on amount of mutual information
CN109858406A (en) * 2019-01-17 2019-06-07 西北大学 A kind of extraction method of key frame based on artis information

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
SATYABRATA MAITY et al.: "A Novel Approach for Human Action Recognition from Silhouette Images" *
向东 et al.: "Dynamic video key frame extraction model based on improved K-Means" *
孙淑敏 et al.: "Key frame extraction based on an improved K-means algorithm" *
韩震博: "Key frame extraction algorithm based on bidirectional adaptive spatio-temporal slicing" *

Similar Documents

Publication Publication Date Title
US6904159B2 (en) Identifying moving objects in a video using volume growing and change detection masks
JP4976608B2 (en) How to automatically classify images into events
JP4907938B2 (en) Method of representing at least one image and group of images, representation of image or group of images, method of comparing images and / or groups of images, method of encoding images or group of images, method of decoding images or sequence of images, code Use of structured data, apparatus for representing an image or group of images, apparatus for comparing images and / or group of images, computer program, system, and computer-readable storage medium
US20130004081A1 (en) Image recognition device, image recognizing method, storage medium that stores computer program for image recognition
CN106937114B (en) Method and device for detecting video scene switching
CN106446015A (en) Video content access prediction and recommendation method based on user behavior preference
CN108629783B (en) Image segmentation method, system and medium based on image feature density peak search
CN110866896B (en) Image saliency target detection method based on k-means and level set super-pixel segmentation
JP2003016448A (en) Event clustering of images using foreground/background segmentation
JP5097280B2 (en) Method and apparatus for representing, comparing and retrieving images and image groups, program, and computer-readable storage medium
US8320664B2 (en) Methods of representing and analysing images
WO2009143279A1 (en) Automatic tracking of people and bodies in video
CN109446967B (en) Face detection method and system based on compressed information
CN113112519B (en) Key frame screening method based on interested target distribution
CN111797707B (en) Clustering-based shot key frame extraction method
CN111583279A (en) Super-pixel image segmentation method based on PCBA
WO2017088479A1 (en) Method of identifying digital on-screen graphic and device
Shi et al. Adaptive graph cut based binarization of video text images
CN108710883B (en) Complete salient object detection method adopting contour detection
CN110188625B (en) Video fine structuring method based on multi-feature fusion
CN111222508A (en) ROI-based house type graph scale identification method and device and computer equipment
Cózar et al. Logotype detection to support semantic-based video annotation
US20030179824A1 (en) Hierarchical video object segmentation based on MPEG standard
KR20030027953A (en) Automatic natural content detection in video information
EP2325801A2 (en) Methods of representing and analysing images

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
WD01: Invention patent application deemed withdrawn after publication (application publication date: 2021-11-02)