CN104156423A - Multiscale video key frame extraction method based on integer programming - Google Patents

Multiscale video key frame extraction method based on integer programming

Info

Publication number
CN104156423A
Authority
CN
China
Prior art keywords
video
key frame
integer programming
frame
summit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410384972.5A
Other languages
Chinese (zh)
Other versions
CN104156423B (en)
Inventor
聂秀山
柴彦娥
马林元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University of Finance and Economics
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Individual filed Critical Individual
Priority to CN201410384972.5A priority Critical patent/CN104156423B/en
Publication of CN104156423A publication Critical patent/CN104156423A/en
Application granted granted Critical
Publication of CN104156423B publication Critical patent/CN104156423B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70 Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/783 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a multiscale video key frame extraction method based on integer programming. The method comprises the following steps: (1) video graph modeling, in which the video is modeled as an undirected weighted graph; (2) video content partitioning, in which a scale factor is set and the video frames are divided into several parts by normalized graph cut; (3) selection of the key frame set by integer programming according to the scale factor. Compared with the prior art, the method starts from the essence of key frame extraction and selects key frames using normalized graph cut theory and integer programming, so that the video content is represented to the greatest extent. Because a scale factor is set, the user can interactively determine the approximate number of key frames by choosing different scales, which meets different user needs.

Description

Multiscale video key frame extraction method based on integer programming
Technical field
The present invention relates to a video key frame extraction method, and in particular to a multiscale video key frame extraction method based on integer programming. It belongs to the technical field of video and multimedia signal processing.
Background art
With the rapid development of computer and information technology, and of multimedia technology in particular, video content has become increasingly rich. As a media format that carries a large amount of information and is highly expressive, video has long been an important medium for exchanging information. In addition, with the rapid development of software, hardware, and network technology, the number of video resources has increased sharply, and more and more people choose to watch video on computers or on mobile devices such as phones. This large volume of video data urgently needs efficient content management so that users get a better multimedia experience. Representing a video segment by key frames is a common way to manage video: a user only needs to browse a small number of key frames to understand the content of the video. Key frame extraction has therefore been studied continuously. On the other hand, because video data is growing geometrically, video retrieval is becoming more and more important in multimedia processing. Traditional video retrieval relies mainly on text annotation, which involves a heavy workload, is inefficient, and is highly subjective. Content-based video retrieval, which is automatic, objective, and comprehensive, has therefore been a research focus in recent years. An important step in content-based video retrieval is to extract key frames from the video sequence and then retrieve the original content using the key frames as an index. Key frame extraction therefore plays an important role in content-based video retrieval.
Current video key frame extraction methods fall roughly into two classes. The first class is sampling-based: key frames are obtained by random or uniform sampling. These methods are simple and fast, but some important video segments may receive no key frame while other segments receive redundant ones. The second class is based on shot segmentation: the video is divided into several shots, and the first or last frame of each shot is taken as a key frame. These methods are limited by the accuracy of shot segmentation, and the key frames they produce cannot fully reflect the content of each shot.
The number of key frames is also an important issue. Selecting key frames is essentially selecting frames that can represent the video content. If there are too many key frames, the content is represented more fully, but the computational cost of video retrieval increases and the purpose of key frames, which is to represent the video concisely, is partly lost. If there are too few, the content cannot be fully represented. Moreover, the number of key frames chosen by most existing techniques is relatively fixed. For sampling-based methods, uniform sampling generally takes one frame per fixed time interval, and random sampling generally presets the total number of key frames. For shot-segmentation-based methods, the number of key frames is fixed once the shot segmentation is determined. In other words, with existing methods the number of key frames chosen for a given video is relatively fixed.
Summary of the invention
To address the shortcomings of existing video key frame extraction techniques, the present invention provides a key frame selection method that represents the video content to the greatest extent while letting the user set the number of key frames interactively. Compared with the prior art, the invention starts from the essence of key frame extraction and selects key frames using normalized graph cut theory and integer programming. It represents the video content as fully as possible, and it introduces a scale factor: by choosing different scales, the user interactively determines the approximate number of key frames. The invention is therefore referred to as a multiscale video key frame extraction method based on integer programming.
The technical solution adopted by the present invention is as follows.
A multiscale video key frame extraction method based on integer programming, characterized in that the method comprises the following steps:
(1) Video graph modeling: the video is modeled as an undirected weighted graph;
(2) Video content partitioning: a scale factor s is set, the scale factor being set by the user as needed to determine the number of key frames, and the video sequence is partitioned by content into s parts using normalized graph cut theory;
(3) Integer programming modeling: integer programming modeling is performed on the video graph of the partitioned video sequence, and key frames are selected.
Preferably, step (1) is implemented as follows:
1. Each video frame is abstracted as a vertex in a high-dimensional space, and the connections between vertices serve as edges, forming a graph in the high-dimensional space;
2. SURF (Speeded-Up Robust Features) features are extracted from the video frames, and a distance function of the feature points of different frames is used as the edge weight, which turns the graph abstracted from the video into a weighted graph.
Preferably, step (3) is implemented as follows:
For the video graph obtained in step (1), first define a label for each vertex: the label is 1 if the video frame corresponding to the vertex is selected as a key frame, and 0 otherwise. The objective function of the integer program maximizes the sum of the labels of all vertices. There are two constraints: the first ensures that the vertices of the video graph corresponding to selected key frames are not connected to each other, and the second ensures that every part of the video graph contains at least one vertex with label 1. The solution of the integer program is an optimal label set, and the set of vertices with label 1 is the key frame set.
Preferably, the distance function used in step (1) is a function under which the weight is inversely related to the distance.
The method first models the video as a graph and constructs the weights from a distance function of SURF features, then uses normalized graph cut theory to partition the video into several parts, and finally performs integer programming modeling on the video graph and selects graph vertices as video key frames.
The present invention extracts key frames that represent the video content and also allows the number of key frames to be adjusted interactively. Compared with the prior art, it takes full account of both the distinctiveness and the representativeness of video content: key frames are chosen from video segments with different content, which ensures that the content is represented while avoiding repeated key frame content. In addition, the number of key frames can be adjusted through the scale factor. When the user only needs a rough idea of the video content, a smaller scale factor yields fewer key frames; when more detailed content is needed, a larger scale factor yields more key frames. Traditional key frame techniques do not offer this capability.
Brief description of the drawings
Fig. 1 is a schematic diagram of the steps of the present invention.
Fig. 2 is a schematic diagram of the SURF features of a video frame.
Fig. 3 is a schematic diagram of integer programming modeling of the video graph.
Fig. 4 is a key frame extraction example: (a) original video frames; (b) key frames at different scales.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings.
The method follows the flow shown in Fig. 1 and comprises the following steps:
(1) Video graph modeling
1. Graph construction. The video is represented by an undirected graph G=(V, E), where V and E denote the vertex set and the edge set of the graph, respectively. Each video frame corresponds to a vertex of the graph, and the connections between vertices form the edge set.
2. Edge weight definition. The edge weights of the graph represent the relationship between different video frames. The present invention defines the weight as a function of the Hausdorff distance between the Speeded-Up Robust Features (SURF) of different frames. SURF features are interest points in an image, generally points that attract human visual attention, such as corners and blobs. They are repeatable and reliable, resist interference such as rotation, translation, illumination changes, and noise, and are therefore robust; SURF detection is also fast and efficient. Fig. 2 is a schematic diagram of the SURF features of a video frame. The specific procedure is as follows: for each video frame, the determinant of the Hessian matrix at every point x=(x, y) is computed to decide whether the point is a feature point. The Hessian matrix is defined as follows:
$$H(\mathbf{x}, \sigma) = \begin{bmatrix} L_{xx}(\mathbf{x}, \sigma) & L_{xy}(\mathbf{x}, \sigma) \\ L_{xy}(\mathbf{x}, \sigma) & L_{yy}(\mathbf{x}, \sigma) \end{bmatrix} \qquad (1)$$
where $L_{xx}(\mathbf{x}, \sigma)$, $L_{xy}(\mathbf{x}, \sigma)$, and $L_{yy}(\mathbf{x}, \sigma)$ are the convolutions of the second-order partial derivatives of the Gaussian function with the image at the point x=(x, y), and σ denotes the scale at x=(x, y). The determinant of the Hessian matrix is:
$$\det(H) = L_{xx} L_{yy} - L_{xy}^{2} \qquad (2)$$
If the determinant of the Hessian matrix at a point is positive, the point is a local extremum. Non-maximum suppression is then used to search for feature points across scales. Finally, Haar wavelet responses are computed and accumulated over a sector region to determine the orientation of each feature point, and the feature vector is constructed.
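As a rough illustration of the determinant test in equation (2), the sketch below approximates the second derivatives with central finite differences on an image patch that is assumed to be Gaussian-smoothed already. This is a simplification made for brevity: practical SURF implementations approximate the Gaussian second derivatives with box filters, and the function name `hessian_det` is hypothetical.

```python
def hessian_det(img, x, y):
    """Approximate det(H) = Lxx*Lyy - Lxy^2 at pixel (x, y).

    img is a 2D list indexed as img[y][x] and is assumed to be
    smoothed already; (x, y) must not lie on the image border.
    """
    lxx = img[y][x + 1] - 2 * img[y][x] + img[y][x - 1]
    lyy = img[y + 1][x] - 2 * img[y][x] + img[y - 1][x]
    lxy = (img[y + 1][x + 1] - img[y + 1][x - 1]
           - img[y - 1][x + 1] + img[y - 1][x - 1]) / 4.0
    return lxx * lyy - lxy ** 2


# A blob-like patch (peak at the centre) and a saddle-shaped patch.
blob = [[-((x - 2) ** 2) - (y - 2) ** 2 for x in range(5)] for y in range(5)]
saddle = [[(x - 2) ** 2 - (y - 2) ** 2 for x in range(5)] for y in range(5)]

print(hessian_det(blob, 2, 2))    # positive: candidate extremum
print(hessian_det(saddle, 2, 2))  # negative: saddle point, rejected
```

The blob centre passes the positive-determinant test, while the saddle centre does not, which is the behaviour the feature point test relies on.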
For ease of computation, the present invention takes the same number of feature points from every frame. The edge weight $w_{ij}$ between graph vertices i and j is defined as follows:
$$w_{ij} = e^{-H(i,j)} \qquad (3)$$
where H(i, j) is the Hausdorff distance between the feature point sets of the two frames. This functional form is adopted to suit video content partitioning: for two different frames, the larger the edge weight, the smaller the distance between the corresponding graph vertices, and the more similar the content of the two frames. To further improve efficiency, edges whose weight is below a set threshold are removed. Other functions in which the weight is inversely related to the distance may also be used to define the weight.
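The weight definition in equation (3) can be sketched as follows. This is a minimal illustration under stated assumptions: each frame is represented by a small list of feature vectors (real SURF descriptors would come from a feature extractor, which is not shown), the Euclidean metric is used inside the Hausdorff distance, and the names `hausdorff`, `weight_matrix`, and `min_weight` are hypothetical.

```python
import math


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        # Largest distance from a point of p to its nearest point in q.
        return max(min(math.dist(u, v) for v in q) for u in p)
    return max(directed(a, b), directed(b, a))


def weight_matrix(features, min_weight=0.0):
    """Build w_ij = exp(-H(i, j)); edges lighter than min_weight are dropped."""
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            wij = math.exp(-hausdorff(features[i], features[j]))
            if wij >= min_weight:
                w[i][j] = w[j][i] = wij
    return w


# Three toy "frames": the first two are identical, the third differs.
frames = [[(0, 0), (1, 0)], [(0, 0), (1, 0)], [(0, 0), (1, 3)]]
W = weight_matrix(frames, min_weight=0.1)
print(hausdorff(frames[0], frames[2]))  # distance between frames 0 and 2
print(W[0][1], W[0][2])                 # similar pair kept, dissimilar edge removed
```

Identical frames get the maximum weight of 1, and the weight decays exponentially as the feature sets move apart, so thresholding removes edges between dissimilar frames.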
(2) Video content partitioning
Video content partitioning divides the video sequence, in a reasonable way according to its content, into two or M parts. From the viewpoint of the video graph model this is a graph partitioning problem: given a graph G=(V, E), how should its vertex set be divided into disjoint subsets so that the partition is best? The simplest approach is to require that, after the vertices are divided into two or M disjoint sets, the sum of the weights of the edges between the sets is minimal. This is the MinCut (minimum cut) problem. However, MinCut may separate a single vertex that lies far from the majority from all other vertices, producing two classes, which is clearly undesirable for classification. In fact, we want not only the total weight of the cut edges to be minimal but also the M vertex sets to be of similar size, which matches the human visual sense of clustering. Normalized graph cut achieves exactly this, giving a good partition of the graph vertices and hence a good partition of the video content.
Normalized cut is a recursive bisection process. The vertex set V is decomposed into disjoint sets A and B, with A ∪ B = V and A ∩ B = ∅; this is the first level. Bisection then continues on A and B until the required number of regions is reached. The number of regions represents how finely the video content is subdivided, and it is the scale factor s of this scheme. When the user needs fewer key frames, a smaller scale factor can be set; when more key frames are needed, a larger one can be set. The scale factor determines the minimum number of key frames at that scale, that is, the minimum content precision at that scale. Let the number of key frames at scale s be $M_s$; then the following holds:
$$M_s \geq s \qquad (4)$$
Partitioning the video sequence by normalized cut depends entirely on the weight matrix of the mapped graph, and in the present invention the weight matrix depends entirely on local video features, with no constraint on temporal order. Similar frames that are not in the same time period can therefore still fall into the same video segment, which effectively avoids redundancy in key frame selection.
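The recursive bisection can be sketched on a toy graph as follows. The text does not spell out the cut criterion, so the sketch assumes the standard normalized cut value Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V). It finds each bisection by exhaustive search, which is only feasible for a handful of vertices; practical implementations use a spectral relaxation instead. The rule for which part to split next (the one with the lowest Ncut score) and all function names are assumptions.

```python
from itertools import combinations


def ncut_value(W, part_a, part_b):
    """Ncut(A, B) = cut/assoc(A, V) + cut/assoc(B, V), with V = A ∪ B."""
    nodes = part_a + part_b
    cut = sum(W[i][j] for i in part_a for j in part_b)
    assoc_a = sum(W[i][j] for i in part_a for j in nodes)
    assoc_b = sum(W[i][j] for i in part_b for j in nodes)
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")
    return cut / assoc_a + cut / assoc_b


def best_bisection(W, nodes):
    """Try every split of nodes into two non-empty sets; keep the lowest Ncut."""
    first, rest = nodes[0], nodes[1:]
    best = (float("inf"), None, None)
    for r in range(len(rest)):  # the second set always keeps at least one vertex
        for extra in combinations(rest, r):
            a = [first, *extra]
            b = [v for v in rest if v not in extra]
            score = ncut_value(W, a, b)
            if best[1] is None or score < best[0]:
                best = (score, a, b)
    return best


def partition(W, s):
    """Recursively bisect the vertex set until s parts are obtained."""
    parts = [list(range(len(W)))]
    while len(parts) < s:
        candidates = [(best_bisection(W, p), idx)
                      for idx, p in enumerate(parts) if len(p) > 1]
        if not candidates:
            break  # only single-vertex parts remain
        (_, a, b), idx = min(candidates, key=lambda c: c[0][0])
        parts[idx:idx + 1] = [a, b]
    return parts


# Frames 0 and 3 are similar, as are frames 1 and 2, despite the time order.
W = [[0.0, 0.1, 0.1, 0.9],
     [0.1, 0.0, 0.9, 0.1],
     [0.1, 0.9, 0.0, 0.1],
     [0.9, 0.1, 0.1, 0.0]]
print(partition(W, 2))
```

In this toy graph the non-adjacent frames 0 and 3 land in the same part, which illustrates why the partition is independent of temporal order.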
(3) Integer programming modeling
Selecting video key frames is approximately equivalent to selecting an independent set of the video graph. An independent set U of a graph is a subset of the vertex set V such that no two vertices i ∈ U, j ∈ U are connected by an edge. In the video graph, every pair of vertices is connected by an edge, and only the weights differ. The independent set of the video graph is therefore defined here as follows: an independent set U is a subset of the vertex set V such that, for any vertices i ∈ U, j ∈ U, the weight $w_{ij}$ is less than a threshold θ. The threshold is set to the mean of all edge weights. The independent set selection problem is a classical NP-hard problem in graph theory, and the present invention solves it approximately using integer programming.
The general idea of the integer program is as follows. For the video graph obtained above, define a label for each vertex: the label is 1 if the video frame corresponding to the vertex is selected as a key frame, and 0 otherwise. The objective function of the integer program maximizes the sum of the labels of all vertices. There are two constraints: the first ensures that the vertices of the video graph corresponding to selected key frames are not connected to each other, and the second ensures that every part of the video graph contains at least one vertex with label 1. The solution of the integer program is an optimal label set, and the set of vertices with label 1 is the key frame set.
Let U be a maximal independent set of the video graph and N the number of vertices of the video graph. At scale s the video sequence is divided into s parts, the k-th of which is denoted $M_k$. The variables $A_i$ and $d_{ij}$ are defined as follows:
$$A_i = \begin{cases} 1, & v_i \in U \\ 0, & v_i \notin U \end{cases}, \qquad d_{ij} = \begin{cases} 1, & v_i \in M_{k_1},\ v_j \in M_{k_2},\ k_1 \neq k_2 \\ 0, & \text{otherwise} \end{cases} \qquad (5)$$
The integer programming model is defined as follows:
$$\max \sum_{i} A_i \quad \text{s.t.} \quad \begin{cases} A_i + A_j \leq 1, & \text{if } w_{ij} > \theta \\ \sum_{i,j}^{C_N^2} d_{ij} \geq C_s^2 \end{cases} \qquad (6)$$
where $C_N^2$ and $C_s^2$ are numbers of combinations. Constraint 1 states that two highly similar vertices (with a large edge weight between them) cannot both be selected into the independent set; the invention calls this the distinctiveness constraint. Constraint 2 states that, at scale s, at least one vertex in every part of the video graph is selected into the independent set; the invention calls this the representativeness constraint. Fig. 3 illustrates constraint 2 for s=4. Suppose a video graph has 5 vertices in total, i.e., N=5. The vertices taken in pairs produce $C_5^2 = 10$ values of $d_{ij}$. As shown in Fig. 3(a), except for $d_{12}=0$ (vertices $v_1$ and $v_2$ are in the same part), all others are 1. To ensure that at least one frame in each part is selected as a key frame, as shown in Fig. 3(b), at least $C_4^2 = 6$ of the $d_{ij}$ values should be 1.
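The counting in the Fig. 3 example can be checked with a few lines of code. The sketch below is only an illustration: the assignment of vertices v3, v4, and v5 to the remaining three parts is an assumption (the text only states that v1 and v2 share a part), and the variable names are hypothetical.

```python
from itertools import combinations
from math import comb

# Part index of each vertex v1..v5; v1 and v2 share a part (s = 4 parts).
part_of = [0, 0, 1, 2, 3]
s = 4

# d_ij = 1 when the two vertices lie in different parts, per equation (5).
d = {(i + 1, j + 1): int(part_of[i] != part_of[j])
     for i, j in combinations(range(len(part_of)), 2)}

print(len(d))           # number of vertex pairs, C(5, 2)
print(d[(1, 2)])        # v1 and v2 are in the same part
print(sum(d.values()))  # number of pairs that straddle two parts
print(comb(s, 2))       # right-hand side of constraint 2, C(4, 2)
```

Of the ten vertex pairs, only the pair (v1, v2) has a zero entry, and the right-hand side of constraint 2 for s=4 is C(4, 2).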
To simplify the model, $e_{ij}$ is next defined as follows:
$$e_{ij} = \begin{cases} 1, & w_{ij} \geq \theta \\ 0, & w_{ij} < \theta \end{cases} \qquad (7)$$
Model (6) can then be converted to the standard integer programming form (8):
$$\min\ -\sum_{i} A_i \quad \text{s.t.} \quad \begin{cases} A_i + A_j \leq 2 - e_{ij} \\ -\sum_{i,j}^{C_N^2} d_{ij} \leq -C_s^2 \end{cases} \qquad (8)$$
The solution of this integer programming model is a maximal independent set of the graph onto which the video is mapped; the key frame set of the video is then obtained through the mapping between vertices and frames.
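For a very small graph, the selection step can be sketched by enumerating all 0/1 label vectors. This is a hedged illustration rather than the solver used in the invention: exhaustive search stands in for an integer programming solver, the representativeness constraint is written in the per-part form described in the text (every part contains at least one selected vertex) instead of the aggregate $d_{ij}$ form, and the function name `select_key_frames` is hypothetical.

```python
from itertools import product


def select_key_frames(W, parts):
    """Maximize the number of selected vertices subject to both constraints."""
    n = len(W)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)]
    theta = sum(W[i][j] for i, j in edges) / len(edges)  # mean edge weight
    # Pairs with e_ij = 1 (equation (7)): they may not both be selected.
    conflicts = [(i, j) for i, j in edges if W[i][j] >= theta]

    best = None
    for labels in product((0, 1), repeat=n):
        # Distinctiveness: A_i + A_j <= 2 - e_ij for every pair.
        if any(labels[i] + labels[j] > 1 for i, j in conflicts):
            continue
        # Representativeness: each part holds at least one selected vertex.
        if any(sum(labels[v] for v in part) < 1 for part in parts):
            continue
        if best is None or sum(labels) > sum(best):
            best = labels
    return None if best is None else [i for i, a in enumerate(best) if a]


# Toy graph: frames 0 and 3 are similar, frames 1 and 2 are similar.
W = [[0.0, 0.1, 0.1, 0.9],
     [0.1, 0.0, 0.9, 0.1],
     [0.1, 0.9, 0.0, 0.1],
     [0.9, 0.1, 0.1, 0.0]]
parts = [[0, 3], [1, 2]]
print(select_key_frames(W, parts))
```

With two parts (s=2), the sketch selects one frame from each group of similar frames, so the number of key frames meets the bound $M_s \geq s$. The function returns `None` when no label vector satisfies both constraints.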
Fig. 4 shows a simulation experiment with the method. Fig. 4(a) is part of the frame sequence of the short video "indi009.mpg", and Fig. 4(b) shows the key frames extracted by the method at different scales s. The key frames reflect the main content of the video.

Claims (4)

1. A multiscale video key frame extraction method based on integer programming, characterized in that the method comprises the following steps:
(1) Video graph modeling: the video is modeled as an undirected weighted graph;
(2) Video content partitioning: a scale factor s is set, the scale factor being set by the user as needed to determine the number of key frames, and the video sequence is partitioned by content into s parts using normalized graph cut theory;
(3) Integer programming modeling: integer programming modeling is performed on the video graph of the partitioned video sequence, and key frames are selected.
2. The multiscale video key frame extraction method based on integer programming according to claim 1, characterized in that step (1) is implemented as follows:
1. Each video frame is abstracted as a vertex in a high-dimensional space, and the connections between vertices serve as edges, forming a graph in the high-dimensional space;
2. SURF (Speeded-Up Robust Features) features are extracted from the video frames, and a distance function of the feature points of different frames is used as the edge weight, which turns the graph abstracted from the video into a weighted graph.
3. The multiscale video key frame extraction method based on integer programming according to claim 1, characterized in that step (3) is implemented as follows:
For the video graph obtained in step (1), first define a label for each vertex: the label is 1 if the video frame corresponding to the vertex is selected as a key frame, and 0 otherwise. The objective function of the integer program maximizes the sum of the labels of all vertices. There are two constraints: the first ensures that the vertices of the video graph corresponding to selected key frames are not connected to each other, and the second ensures that every part of the video graph contains at least one vertex with label 1. The solution of the integer program is an optimal label set, and the set of vertices with label 1 is the key frame set.
4. The multiscale video key frame extraction method based on integer programming according to claim 2, characterized in that the distance function used in step (1) is a function under which the weight is inversely related to the distance.
CN201410384972.5A 2014-08-06 2014-08-06 Multiscale video key frame extraction method based on integer programming Expired - Fee Related CN104156423B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410384972.5A CN104156423B (en) 2014-08-06 2014-08-06 Multiscale video key frame extraction method based on integer programming

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410384972.5A CN104156423B (en) 2014-08-06 2014-08-06 Multiscale video key frame extraction method based on integer programming

Publications (2)

Publication Number Publication Date
CN104156423A (en) 2014-11-19
CN104156423B (en) 2017-09-29

Family

ID=51881922

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410384972.5A Expired - Fee Related CN104156423B (en) 2014-08-06 2014-08-06 Multiscale video key frame extraction method based on integer programming

Country Status (1)

Country Link
CN (1) CN104156423B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1094408A2 (en) * 1999-10-19 2001-04-25 Lg Electronics Inc. Multimedia description scheme having weight information and method for displaying multimedia
CN102184242A (en) * 2011-05-16 2011-09-14 天津大学 Cross-camera video abstract extracting method
CN102307301A (en) * 2011-05-30 2012-01-04 电子科技大学 Audio-video fingerprint generation method based on key frames
CN102682298A (en) * 2012-04-28 2012-09-19 聂秀山 Video fingerprint method based on graph modeling

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
聂秀山 等: ""基于二叉树和随机领域嵌入的视频指纹算法"", 《电子学报》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776896A (en) * 2016-11-30 2017-05-31 董强 A kind of quick figure fused images search method
CN107844779A (en) * 2017-11-21 2018-03-27 重庆邮电大学 A kind of video key frame extracting method
CN107844779B (en) * 2017-11-21 2021-03-23 重庆邮电大学 Video key frame extraction method

Also Published As

Publication number Publication date
CN104156423B (en) 2017-09-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right
TA01 Transfer of patent application right

Effective date of registration: 20170830

Address after: 250014 No. 7366 East Second Ring Road, Lixia District, Shandong, Ji'nan

Applicant after: SHANDONG University OF FINANCE AND ECONOMICS

Address before: 250014 School of computer science and technology, No. 7366 East Fourth Ring Road, Lixia District, Ji'nan, Shandong

Applicant before: Nie Xiushan

GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170929