CN103824284A - Key frame extraction method based on visual attention model and system - Google Patents

Key frame extraction method based on visual attention model and system

Info

Publication number
CN103824284A
Authority
CN
China
Prior art keywords
key
saliency
shot
frame
time domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410039072.7A
Other languages
Chinese (zh)
Other versions
CN103824284B (en)
Inventor
纪庆革
赵杰
刘勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Zhongda Nansha Technology Innovation Industrial Park Co Ltd
National Sun Yat Sen University
Original Assignee
Guangzhou Zhongda Nansha Technology Innovation Industrial Park Co Ltd
National Sun Yat Sen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Zhongda Nansha Technology Innovation Industrial Park Co Ltd, National Sun Yat Sen University filed Critical Guangzhou Zhongda Nansha Technology Innovation Industrial Park Co Ltd
Priority to CN201410039072.7A priority Critical patent/CN103824284B/en
Publication of CN103824284A publication Critical patent/CN103824284A/en
Application granted granted Critical
Publication of CN103824284B publication Critical patent/CN103824284B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a key frame extraction method and system based on a visual attention model. In the spatial domain, the method performs saliency detection by filtering the global contrast with binomial coefficients and extracts the target region with an adaptive threshold; this preserves the boundary of the salient target region well and yields uniform saliency inside the region. In the temporal domain, motion saliency is defined: target motion is estimated with homography matrices, key points are used in place of whole targets for saliency detection, the spatial-domain saliency data are fused in, and a boundary-extension method based on an energy function is proposed to obtain a bounding box as the temporal salient target region. Finally, the salient target region is used to reduce the richness of the video content, and a shot-adaptive method combined with on-line clustering is used to extract key frames.

Description

Key frame extraction method and system based on a visual attention model
Technical field
The present invention relates to the field of video analysis technology, and in particular to a key frame extraction method and system based on a visual attention model.
Background technology
With the rapid development of Internet technology we have entered an era of information explosion, and network applications and multimedia technology are advancing quickly and are widely used. Video, as a common carrier of network information, is vivid and intuitive and has strong appeal and expressive power, so it is used in every field and the volume of video data grows massively. Taking the well-known video site YouTube as an example, about 60 hours of video are uploaded by users every minute (figure taken on January 23, 2012), and the trend is still upward. How to store, manage and access massive video resources quickly and effectively has become a major issue for current video applications. Because video is correlated in time, under the traditional approach a user must browse a whole segment from beginning to end to grasp its content; irrelevant videos consume a great deal of the user's time and also waste a large amount of network bandwidth. We therefore need to attach auxiliary information to videos to help users filter them better. Mature systems currently rely on traditional textual labels: videos are classified manually and annotated with titles, descriptions and other textual semantics. Faced with massive video collections, this is not only labor-intensive, but different people understand a video differently, and others cannot judge from the author's labels whether a video matches their own interests.
Therefore, there is an urgent need for an automated way to summarize video effectively.
Summary of the invention
In order to overcome the deficiencies of the prior art, the present invention first provides a video key frame extraction method based on a visual attention model; with this method, key frames that represent a video shot well can be obtained effectively.
Another object of the present invention is to provide a video key frame extraction system based on a visual attention model.
To achieve these goals, the technical scheme of the present invention is:
A video key frame extraction method based on a visual attention model, comprising:
in the spatial domain, performing saliency detection by filtering the global contrast with binomial coefficients, and extracting the target region with an adaptive threshold; this preserves the boundary of the salient target region well and keeps the saliency inside the region relatively uniform;
in the temporal domain, defining motion saliency, estimating target motion with homography matrices, using key points in place of targets for saliency detection, fusing the spatial-domain saliency data, and proposing a boundary-extension method based on an energy function to obtain a bounding box as the temporal salient target region;
using the salient target region to reduce the richness of the video content, and extracting key frames with a shot-adaptive method combined with on-line clustering.
A video key frame extraction system based on a visual attention model, the system comprising a salient region extraction module and a key frame extraction module.
Specifically, the salient region extraction module comprises:
a spatial-domain salient region extraction module, for extracting the salient region in the spatial domain;
a temporal-domain key point saliency acquisition module, for extracting the saliency values of key points in the temporal domain;
a fusion module, for fusing the spatial-domain salient region with the temporal-domain key points and finally obtaining the salient region.
The key frame extraction module comprises:
a static-shot key frame extraction module, for key frame extraction from static shots;
a dynamic-shot key frame extraction module, for key frame extraction from dynamic shots;
a shot adaptation module, for switching control between the static-shot and dynamic-shot key frame extraction modules.
Compared with the prior art, the beneficial effect of the present invention is that it summarizes video automatically and effectively obtains key frames that represent the video shot well.
Accompanying drawing explanation
Fig. 1 is the key frame extraction flowchart for static shots of the present invention.
Fig. 2 is the key frame extraction flowchart for dynamic shots of the present invention.
Fig. 3 is the shot-adaptive key frame extraction flowchart of the present invention.
Embodiment
The present invention is further described in detail below with reference to the accompanying drawings.
An embodiment of the video key frame extraction method based on a visual attention model disclosed by the invention is as follows:
First, in the spatial domain, saliency detection is performed by filtering the global contrast with binomial coefficients, and the target region is extracted with an adaptive threshold. The specific method is as follows:
(11) The binomial coefficients are constructed from Pascal's triangle, and the normalization factor of the N-th layer is 2^N. The 4th layer is selected, so the filter kernel is B_4 = (1/16)[1 4 6 4 1];
(12) Let I be the original stimulus intensity, $\bar{I}$ the mean of the surrounding stimulus intensities, and $I_{B_4}$ the convolution of I with $B_4$. The stimulus at each pixel is represented as a vector in the CIELAB color space, and the contrast between two stimuli is the Euclidean distance between their CIELAB vectors, so the saliency detected for pixel (x, y) is
$$S(x, y) = \lVert I_{B_4}(x, y) - \bar{I} \rVert \qquad (1)$$
(13) After the saliency measurement set $S_s = (s_{11}, s_{12}, \ldots, s_{NM})$ is obtained, the target region is extracted with an adaptive threshold, where $s_{ij}$ (0 ≤ i ≤ N, 0 ≤ j ≤ M) is the saliency of pixel (i, j) and M and N are the width and height of the image, respectively.
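A minimal sketch of the spatial saliency of formula (1), assuming OpenCV and NumPy; taking $\bar{I}$ as the global mean Lab vector is an assumption made here, since the text only specifies "the mean of the surrounding stimulus intensities".

```python
import cv2
import numpy as np

def spatial_saliency(bgr):
    """Sketch of formula (1): saliency as the CIELAB distance between the
    binomially smoothed image and the mean stimulus intensity."""
    # 4th-layer binomial kernel from Pascal's triangle, normalized by 2^4
    b4 = np.array([1, 4, 6, 4, 1], dtype=np.float32) / 16.0

    lab = cv2.cvtColor(bgr, cv2.COLOR_BGR2LAB).astype(np.float32)

    # I_B4: convolve each channel with the separable binomial filter
    i_b4 = cv2.sepFilter2D(lab, ddepth=-1, kernelX=b4, kernelY=b4)

    # Ī: taken here as the global mean Lab vector (assumption, see above)
    i_bar = lab.reshape(-1, 3).mean(axis=0)

    # S(x, y) = || I_B4(x, y) - Ī ||  (Euclidean distance in CIELAB)
    return np.linalg.norm(i_b4 - i_bar, axis=2)
```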
Specifically, the adaptive-threshold extraction of the target region is realized as follows:
(21) Define the global saliency detection formula for pixel (x, y):
$$S_g(x, y) = \frac{1}{A} \sum_{i=0}^{N} \sum_{j=0}^{M} \lVert I_{B_4}(x, y) - I(i, j) \rVert \qquad (2)$$
where A is the detected area, $I_{B_4}(x, y)$ is the stimulus intensity of pixel (x, y) after the original image is filtered with $B_4$, I(i, j) is the original stimulus intensity of pixel (i, j), and M and N are the width and height of the image, respectively;
(22) Histograms are used to accelerate the computation. The original stimulus intensity I is mapped into the stimulus space $I_{B_4}(I)$, and the saliency of the stimulus $I_{B_4}(I)$ finally perceived by the user is
$$S(I_{B_4}(I)) = \frac{1}{(m-1)\, D(I_{B_4}(I))} \sum_{i=1}^{m} \Big( D(I_{B_4}(I)) - \lVert I_{B_4}(I) - I_{B_4}(I_i) \rVert \Big)\, S_g(I_{B_4}(I)) \qquad (3)$$
where $D(I_{B_4}(I)) = \sum_{i=1}^{m} \lVert I_{B_4}(I) - I_{B_4}(I_i) \rVert$ is the total distance between the stimulus $I_{B_4}(I)$ and its m nearest stimuli, and m is a manually set parameter; in this embodiment m = 8;
(23) Foreground and background regions are designated by varying the threshold $T_s$, and the threshold that minimizes the energy function is taken as the optimal threshold. The energy function with threshold $T_s$ is defined as
$$E(I, T_s, \lambda, \sigma) = \lambda \sum_{n=1}^{N} f(T_s, S_n)\, S_n + V(I, T_s, \sigma) \qquad (4)$$
where $S_n$ is obtained from formula (2), λ is the weight of the salient-target energy (λ = 1.0 in this embodiment), N is the total number of pixels of the image, $f(T_s, S_n) = \max(0, \operatorname{sign}(S_n - T_s))$, and $V(I, T_s, \sigma)$ measures the similarity to the surrounding stimuli: the point pairs Pair are formed by each salient point under the current $T_s$ and the pixels of its 8-neighborhood, and
$$V(I, T_s, \sigma) = \sum_{\{p, q\} \in Pair} \frac{1}{dist(p, q)}\, e^{-\lVert I_p - I_q \rVert / 2\sigma^2}$$
where dist(p, q) is the spatial distance between the two points and σ is a manually set parameter; in this embodiment σ = 10.0.
Therefore, given an image and its saliency map, $T_s$ is estimated by minimizing the energy function; a pixel is labeled 1 when it belongs to the salient target and 0 otherwise. The parameters λ and σ must be set manually in advance.
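Under the definitions above, the threshold search of step (23) might look like the sketch below. The coarse grid of candidate thresholds, the 4-neighborhood approximation of the 8-neighborhood pairs, and the unit pixel distance dist(p, q) = 1 are assumptions made here for brevity.

```python
import numpy as np

def energy(sal, lab, t_s, lam=1.0, sigma=10.0):
    """Sketch of formula (4) for one candidate threshold t_s.
    sal: saliency map S_g, lab: float CIELAB image."""
    fg = sal > t_s                      # f(T_s, S_n) = 1 where S_n > T_s
    e_target = lam * sal[fg].sum()      # first term of formula (4)

    # V(I, T_s, sigma): similarity between each salient pixel and its
    # neighbours; only the 4-neighbourhood is used here (assumption)
    v = 0.0
    h, w = sal.shape
    ys, xs = np.nonzero(fg)
    for y, x in zip(ys, xs):
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                diff = np.linalg.norm(lab[y, x] - lab[ny, nx])
                v += np.exp(-diff / (2 * sigma ** 2))  # dist(p, q) = 1 here
    return e_target + v

def adaptive_threshold(sal, lab, candidates=None):
    """Pick the threshold minimizing the energy over a coarse grid of
    candidate values (the grid itself is an assumption)."""
    if candidates is None:
        candidates = np.linspace(sal.min(), sal.max(), 16)
    return min(candidates, key=lambda t: energy(sal, lab, t))
```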
Then, in the temporal domain, motion saliency is defined: target motion is estimated with homography matrices, key points are used in place of targets for saliency detection, the spatial-domain saliency data are then fused in, and a boundary-extension method based on an energy function is proposed to obtain a bounding box as the temporal salient target region. The specific method is as follows:
(31) Given an image, the key points of the image are obtained with the FAST (Features from Accelerated Segment Test) feature point detection algorithm, which has good real-time performance;
(32) given two adjacent frames, FLANN (Fast Library for Approximate Nearest Neighbors) is used for fast key point matching;
(33) the motion of the key points is described with homography matrices H. Since a single H describes only one form of motion and a video segment contains various forms of motion, multiple H are needed to describe the different motions. In this embodiment the RANSAC algorithm is applied iteratively to obtain a series of homography estimates H = {H_1, H_2, ..., H_n};
(34) the temporal saliency of a key point is defined as
$$S_t(p_m) = \frac{A_m}{W \times H} \sum_{i=1}^{n} A_i\, D(p_m, H_i) \qquad (5)$$
where $A_m$ is the distribution area of all key points in motion state $H_m$, and W and H are the width and height of the video image;
(35) the spatial-domain saliency values and the obtained temporal saliency values of the key points are fused;
(36) a boundary-extension method based on the energy function is used to obtain a bounding box as the temporal salient target region.
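Steps (31)-(34) can be sketched roughly as below. ORB descriptors are computed on FAST corners so that FLANN (with an LSH index) has something to match; assigning a point's motion state via RANSAC inlier sets, measuring the area $A_i$ by the bounding rectangle of a state's key points, and taking $D(p_m, H_i)$ as the reprojection error under $H_i$ are all assumptions, since the patent does not spell them out.

```python
import cv2
import numpy as np

def match_keypoints(img1, img2):
    """FAST-based keypoints + ORB descriptors, matched with FLANN (LSH index)."""
    orb = cv2.ORB_create(nfeatures=1000)   # ORB detects FAST corners internally
    kp1, des1 = orb.detectAndCompute(img1, None)
    kp2, des2 = orb.detectAndCompute(img2, None)
    flann = cv2.FlannBasedMatcher(
        dict(algorithm=6, table_number=6, key_size=12, multi_probe_level=1),  # FLANN_INDEX_LSH
        dict(checks=50))
    matches = [m[0] for m in flann.knnMatch(des1, des2, k=2)
               if len(m) == 2 and m[0].distance < 0.8 * m[1].distance]
    src = np.float32([kp1[m.queryIdx].pt for m in matches])
    dst = np.float32([kp2[m.trainIdx].pt for m in matches])
    return src, dst

def estimate_motions(src, dst, max_models=5, min_inliers=8):
    """Iteratively fit homographies with RANSAC, removing inliers each round,
    to obtain the set H = {H_1, ..., H_n} of motion states."""
    models, groups = [], []
    while len(src) >= min_inliers and len(models) < max_models:
        H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)
        if H is None or mask.sum() < min_inliers:
            break
        inl = mask.ravel().astype(bool)
        models.append(H)
        groups.append((src[inl], dst[inl]))
        src, dst = src[~inl], dst[~inl]   # fit the next motion on the remainder
    return models, groups

def temporal_saliency(models, groups, frame_shape):
    """Sketch of formula (5). A_i is the bounding-rectangle area of a motion
    state's keypoints; D(p_m, H_i) is the reprojection error of p_m under H_i
    (both assumed forms)."""
    h, w = frame_shape[:2]
    areas = [cv2.boundingRect(g[0])[2] * cv2.boundingRect(g[0])[3] for g in groups]
    saliency = {}
    for m, (pts, dsts) in enumerate(groups):
        for p, d in zip(pts, dsts):
            acc = 0.0
            for i, H in enumerate(models):
                proj = H @ np.array([p[0], p[1], 1.0])
                proj = proj[:2] / proj[2]
                acc += areas[i] * np.linalg.norm(proj - d)
            saliency[tuple(p)] = areas[m] / (w * h) * acc
    return saliency
```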
Specifically, the fusion of the spatial-domain saliency values with the obtained temporal saliency values of the key points is realized as follows:
(41) a motion-saliency contrast is defined from the key-point temporal saliency value $S_t$ obtained by formula (5) and the mean of the key-point temporal saliency values;
(42) the motion saliency should target objects that also have strong discriminability in the spatial domain, so the range over which the temporal saliency $S_t$ is accumulated must be restricted: letting $p_i$ be the i-th key point of $S_t$, $p_i$ must satisfy a condition relating its spatial saliency to the mean spatial saliency value;
(43) a temporal-domain weight and a spatial-domain weight are defined, and the temporal and spatial saliency values of the key points satisfying (42) are added with these weights.
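Because the weight expressions of steps (41)-(43) appear only in figures not reproduced in this text, the sketch below leaves them as free parameters w_t and w_s; approximating the condition of step (42) by requiring a key point's spatial saliency to exceed the spatial mean is likewise an assumption.

```python
import numpy as np

def fuse_saliency(spatial_sal, keypoint_temporal_sal, w_t=0.5, w_s=0.5):
    """Weighted fusion of spatial and temporal saliency at the key points.
    spatial_sal: 2-D spatial saliency map S_s.
    keypoint_temporal_sal: dict {(x, y): S_t} from formula (5).
    w_t, w_s: temporal/spatial weights (placeholders; the patent defines
    them in figures that are not reproduced here)."""
    s_mean = spatial_sal.mean()          # mean spatial saliency value
    fused = {}
    for (x, y), s_t in keypoint_temporal_sal.items():
        s_s = spatial_sal[int(y), int(x)]
        # step (42): keep only key points that are also salient in the
        # spatial domain (above the spatial mean here, an assumed form)
        if s_s >= s_mean:
            fused[(x, y)] = w_t * s_t + w_s * s_s
    return fused
```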
Specifically, the temporal salient target region is extracted as follows:
The spatially salient key point p is taken as the seed point, and the seed region uses a rectangular bounding box B. Let $b_i$ be the four sides of the bounding box B, with i ∈ {1, 2, 3, 4} numbering the top, bottom, left and right sides. The boundary-extension algorithm is as follows:
Initialization: all four corners of the bounding box B are set to the position of the key point p, so that p is an interior point of B.
Step 1: in increasing order from i = 1, compute the saliency energy $E_{outer}(i)$ on the outer boundary of $b_i$ and the saliency energy $E_{inner}(i)$ on its inner boundary, where the energy is computed as in formula (4); then compute the weight w(i) by which the boundary may be extended outward, where $l_i$ is the length of the i-th side of the current bounding box B.
Step 2: if w(i) ≥ ε, the i-th side is extended outward by one pixel unit. ε is the threshold for the extension decision and must be set in advance; in the experiments here it is set to $0.8\,T_s'$, where $T_s'$ is the mean spatial saliency inside the bounding box.
Step 3: if no new side was extended in Step 2, terminate the algorithm and output the bounding box B; otherwise, repeat Step 1 and Step 2.
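A sketch of the boundary-extension loop follows. The outward-extension weight w(i) is taken here as the energy difference per unit side length, (E_outer(i) − E_inner(i)) / l_i, which is an assumed form since the original expression is given only in a figure; strip_energy is assumed to evaluate the energy of formula (4) on a one-pixel-wide strip.

```python
def extend_bounding_box(seed, strip_energy, img_w, img_h, eps):
    """Sketch of the boundary extension around a seed key point.
    strip_energy(x0, y0, x1, y1): assumed callable evaluating the energy of
    formula (4) on the one-pixel strip [x0, x1) x [y0, y1)."""
    x, y = seed
    left, top, right, bottom = x, y, x + 1, y + 1   # box initialized at p

    def side_strips(i):
        # (outer strip, inner strip, side length) for side i = 0..3
        # ordered top, bottom, left, right (sides 1..4 in the text)
        if i == 0:
            return (left, top - 1, right, top), (left, top, right, top + 1), right - left
        if i == 1:
            return (left, bottom, right, bottom + 1), (left, bottom - 1, right, bottom), right - left
        if i == 2:
            return (left - 1, top, left, bottom), (left, top, left + 1, bottom), bottom - top
        return (right, top, right + 1, bottom), (right - 1, top, right, bottom), bottom - top

    grown = True
    while grown:
        grown = False
        for i in range(4):
            # skip sides that already touch the image border
            if (i == 0 and top == 0) or (i == 1 and bottom == img_h) \
               or (i == 2 and left == 0) or (i == 3 and right == img_w):
                continue
            outer, inner, length = side_strips(i)
            # assumed form of the extension weight (original is in a figure)
            w = (strip_energy(*outer) - strip_energy(*inner)) / length
            if w >= eps:                 # eps plays the role of 0.8 * T_s'
                grown = True
                if i == 0:
                    top -= 1
                elif i == 1:
                    bottom += 1
                elif i == 2:
                    left -= 1
                else:
                    right += 1
    return left, top, right, bottom
```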
Finally, the salient target region is used to reduce the richness of the video content, and key frames are extracted with a shot-adaptive method combined with on-line clustering. The specific method is as follows:
(51) The RGB color space of the salient region is converted to the HSV color space, and the H (hue) and S (saturation) components are used to compute a hue-saturation histogram. Let $H_p(i)$ be the i-th bin value of the hue-saturation histogram of the salient target region of frame p. This embodiment uses the Bhattacharyya distance to measure the visual distance between two frames:
$$D_{sal}(p, q) = \sqrt{1 - \frac{\sum_i \sqrt{H_p(i)\, H_q(i)}}{\sqrt{\sum_i H_p(i) \sum_i H_q(i)}}};$$
(52) key frames are extracted with a shot-adaptive method combined with on-line clustering, with the clustering mode for static shots as the primary mode and the clustering mode for dynamic shots as the auxiliary mode. For a static shot, on-line clustering is performed on the basis of the hue-saturation histogram of the salient region, and any frame in a cluster is chosen as the key frame. For a dynamic shot, the salient moving target is first tracked, the tracking of the salient moving target then serves as the basis for on-line clustering, and the position information of the salient target serves as the basis for extracting the key frame from a cluster.
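A minimal sketch of the frame-distance computation of step (51), assuming OpenCV's H and S channel ranges and a 30×32 bin layout (the bin counts are not specified in the patent).

```python
import cv2
import numpy as np

def hue_sat_histogram(bgr_region, h_bins=30, s_bins=32):
    """Hue-saturation histogram of a salient target region (bin counts assumed)."""
    hsv = cv2.cvtColor(bgr_region, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1], None, [h_bins, s_bins], [0, 180, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def visual_distance(h_p, h_q):
    """Bhattacharyya distance D_sal(p, q) between two hue-saturation histograms."""
    bc = np.sum(np.sqrt(h_p * h_q)) / np.sqrt(np.sum(h_p) * np.sum(h_q))
    return np.sqrt(max(0.0, 1.0 - bc))
```

For histograms prepared this way, OpenCV's cv2.compareHist(h_p, h_q, cv2.HISTCMP_BHATTACHARYYA) computes essentially the same distance.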
Specifically, as shown in Fig. 1, the on-line clustering of a static shot is realized by the following steps:
Initialization: compute the hue-saturation histogram $f_1$ of the first frame of the static shot, set the initial cluster count N = 1, and take $f_1$ as the centroid vector $C_1$ of cluster $Cell_1$, i.e. $C_1 = f_1$.
S11: if the current frame p belongs to a static shot, compute the hue-saturation histogram $H_p$ of the current frame.
S12: compute the visual distance between p and each cluster centroid and find the minimum, $m = \arg\min_n \{ D_{sal}(p, C_n) \mid 1 \le n \le N \}$, where m is the index of the cluster.
S13: compare $D_{sal}(p, C_m)$ with the threshold $\varepsilon_c$. When $D_{sal}(p, C_m) \le \varepsilon_c$, p is assigned to cluster $Cell_m$ and $H_p$ replaces the centroid of $Cell_m$; otherwise a new cluster $Cell_{N+1}$ is created with $H_p$ as its centroid vector $C_{N+1}$, and finally the cluster count is updated to N = N + 1.
S14: repeat S11, S12 and S13 for all static-shot frames.
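The on-line clustering of Fig. 1 reduces to a few lines; hue_sat_histogram and visual_distance are the sketches given above, and the threshold ε_c is left as a parameter.

```python
import numpy as np

def cluster_static_shot(frames, eps_c):
    """On-line clustering of static-shot frames (S11-S14). Returns the list of
    clusters as frame-index lists; any frame of a cluster may be the key frame."""
    centroids, clusters = [], []
    for idx, frame in enumerate(frames):
        h_p = hue_sat_histogram(frame)     # sketch from step (51) above
        if not centroids:
            centroids.append(h_p)          # C_1 = f_1
            clusters.append([idx])
            continue
        dists = [visual_distance(h_p, c) for c in centroids]
        m = int(np.argmin(dists))          # nearest cluster centroid
        if dists[m] <= eps_c:
            clusters[m].append(idx)
            centroids[m] = h_p             # newest member replaces the centroid
        else:
            centroids.append(h_p)          # new cluster Cell_{N+1}
            clusters.append([idx])
    return clusters
```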
Specifically, as shown in Fig. 2, the key frame extraction for a dynamic shot is realized by the following steps:
Initialization: obtain the first frame of the dynamic shot.
S21: obtain the tracking target region, initialize or resample the particles, and fetch the next video frame; if the frame is empty, terminate.
S22: obtain the FAST feature vectors and match them with the FLANN algorithm, updating the feature vector weights; if there are not enough feature vectors, terminate.
S23: update the weight of each particle, compute the key frame weight and the target region, and jump back to S21.
The key frame extraction system based on a visual attention model disclosed by the invention comprises a salient region extraction module and a key frame extraction module.
The salient region extraction module comprises:
a spatial-domain salient region extraction module, for extracting the salient region in the spatial domain;
a temporal-domain key point saliency acquisition module, for extracting the saliency values of key points in the temporal domain;
a fusion module, for fusing the spatial-domain salient region with the temporal-domain key points and finally obtaining the salient region.
The key frame extraction module comprises:
a static-shot key frame extraction module, for key frame extraction from static shots;
a dynamic-shot key frame extraction module, for key frame extraction from dynamic shots;
a shot adaptation module, for switching control between the static-shot and dynamic-shot key frame extraction modules.
The above is only a preferred embodiment of the present invention and is not intended to limit the scope of protection of the present invention. It should be understood that the present invention is not limited to the implementations described herein; these implementations are described to help those skilled in the art practice the invention. Any person skilled in the art can easily make further improvements and refinements without departing from the spirit and scope of the present invention; therefore any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the scope of protection of the claims of the present invention.

Claims (10)

1. A key frame extraction method based on a visual attention model, for extracting key frames from a video, characterized by comprising:
in the spatial domain, performing saliency detection by filtering the global contrast with binomial coefficients, and extracting the target region with an adaptive threshold;
in the temporal domain, defining motion saliency, estimating target motion with homography matrices, using key points in place of targets for saliency detection, fusing the spatial-domain saliency data, and proposing a boundary-extension method based on an energy function to obtain a bounding box as the temporal salient target region;
using the salient target region to reduce the richness of the video content, and extracting key frames with a shot-adaptive method combined with on-line clustering.
2. The method according to claim 1, characterized in that, in the spatial domain, saliency detection is performed by filtering the global contrast with binomial coefficients and the target region is extracted with an adaptive threshold, the specific method being as follows:
(11) the binomial coefficients are constructed from Pascal's triangle, and the normalization factor of the N-th layer is 2^N; the 4th layer is selected, giving the filter kernel B_4 = (1/16)[1 4 6 4 1];
(12) let I be the original stimulus intensity, $\bar{I}$ the mean of the surrounding stimulus intensities, and $I_{B_4}$ the convolution of I with $B_4$; the stimulus at each pixel is represented as a vector in the CIELAB color space, and the contrast between two stimuli is the Euclidean distance between their CIELAB vectors, so the saliency detected for pixel (x, y) is
$$S(x, y) = \lVert I_{B_4}(x, y) - \bar{I} \rVert \qquad (1)$$
(13) after the saliency measurement set $S_s = (s_{11}, s_{12}, \ldots, s_{NM})$ is obtained, the target region is extracted with an adaptive threshold, where $s_{ij}$ is the saliency of pixel (i, j), 0 ≤ i ≤ N, 0 ≤ j ≤ M, and M and N are the width and height of the image, respectively.
3. The method according to claim 2, characterized in that the adaptive-threshold extraction of the target region is realized as follows:
(21) define the global saliency detection formula for pixel (x, y):
$$S_g(x, y) = \frac{1}{A} \sum_{i=0}^{N} \sum_{j=0}^{M} \lVert I_{B_4}(x, y) - I(i, j) \rVert \qquad (2)$$
where A is the detected area, $I_{B_4}(x, y)$ is the stimulus intensity of pixel (x, y) after the original image is filtered with $B_4$, I(i, j) is the original stimulus intensity of pixel (i, j), and M and N are the width and height of the image, respectively;
(22) histograms are used to accelerate the computation; the original stimulus intensity I is mapped into the stimulus space $I_{B_4}(I)$, and the saliency of the stimulus $I_{B_4}(I)$ finally perceived by the user is
$$S(I_{B_4}(I)) = \frac{1}{(m-1)\, D(I_{B_4}(I))} \sum_{i=1}^{m} \Big( D(I_{B_4}(I)) - \lVert I_{B_4}(I) - I_{B_4}(I_i) \rVert \Big)\, S_g(I_{B_4}(I)) \qquad (3)$$
where $D(I_{B_4}(I)) = \sum_{i=1}^{m} \lVert I_{B_4}(I) - I_{B_4}(I_i) \rVert$ is the total distance between the stimulus $I_{B_4}(I)$ and its m nearest stimuli;
(23) foreground and background regions are designated by varying the threshold $T_s$, and the threshold that minimizes the energy function is taken as the optimal threshold; the energy function with threshold $T_s$ is defined as
$$E(I, T_s, \lambda, \sigma) = \lambda \sum_{n=1}^{N} f(T_s, S_n)\, S_n + V(I, T_s, \sigma) \qquad (4)$$
where $S_n$ is obtained from formula (2), λ is the weight of the salient-target energy, N is the total number of pixels of the image, $f(T_s, S_n) = \max(0, \operatorname{sign}(S_n - T_s))$, and $V(I, T_s, \sigma)$ measures the similarity to the surrounding stimuli: the point pairs Pair are formed by each salient point under the current $T_s$ and the pixels of its 8-neighborhood,
$$V(I, T_s, \sigma) = \sum_{\{p, q\} \in Pair} \frac{1}{dist(p, q)}\, e^{-\lVert I_p - I_q \rVert / 2\sigma^2}$$
dist(p, q) is the spatial distance between the two points, and σ is a control parameter;
given an image and its saliency map, $T_s$ is estimated by minimizing the energy function; a pixel is labeled 1 when it belongs to the salient target and 0 otherwise.
4. The method according to claim 1, characterized in that, in the temporal domain, motion saliency is defined, target motion is estimated with homography matrices, key points are used in place of targets for saliency detection, the spatial-domain saliency data are then fused in, and a boundary-extension method based on an energy function is proposed to obtain a bounding box as the temporal salient target region, the specific method being as follows:
(31) given an image, the key points of the image are obtained with the FAST feature point detection algorithm, which has good real-time performance;
(32) given two adjacent frames, FLANN is used for fast key point matching;
(33) the motion of the key points is described with multiple homography matrices H; the RANSAC algorithm is applied iteratively to obtain a series of homography estimates H = {H_1, H_2, ..., H_n};
(34) the temporal saliency of a key point is defined as
$$S_t(p_m) = \frac{A_m}{W \times H} \sum_{i=1}^{n} A_i\, D(p_m, H_i) \qquad (5)$$
where $A_m$ is the distribution area of all key points in motion state $H_m$, and W and H are the width and height of the video image;
(35) the spatial-domain saliency values and the obtained temporal saliency values of the key points are fused;
(36) a boundary-extension method based on the energy function is used to obtain a bounding box as the temporal salient target region.
5. The method according to claim 4, characterized in that the fusion of the spatial-domain saliency values with the obtained temporal saliency values of the key points is realized as follows:
(41) a motion-saliency contrast is defined from the key-point temporal saliency value $S_t$ obtained by formula (5) and the mean of the key-point temporal saliency values;
(42) letting $p_i$ be the i-th key point of $S_t$, $p_i$ must satisfy a condition relating its spatial saliency to the mean spatial saliency value;
(43) a temporal-domain weight and a spatial-domain weight are defined, and the temporal and spatial saliency values of the key points satisfying step (42) are added with these weights.
6. The method according to claim 4, characterized in that the temporal salient target region is extracted as follows:
the spatially salient key point p is taken as the seed point, and the seed region uses a rectangular bounding box B; let $b_i$ be the four sides of the bounding box B, with i ∈ {1, 2, 3, 4} numbering the top, bottom, left and right sides; the boundary-extension algorithm is as follows:
Initialization: all four corners of the bounding box B are set to the position of the key point p, so that p is an interior point of B;
Step 1: in increasing order from i = 1, compute the saliency energy $E_{outer}(i)$ on the outer boundary of $b_i$ and the saliency energy $E_{inner}(i)$ on its inner boundary, where the energy is computed as in formula (4); then compute the weight w(i) by which the boundary is extended outward, where $l_i$ is the length of the i-th side of the current bounding box B;
Step 2: if w(i) ≥ ε, the i-th side is extended outward by one pixel unit; ε is a preset threshold for the extension decision, set to $0.8\,T_s'$, where $T_s'$ is the mean spatial saliency inside the bounding box;
Step 3: if no new side was extended in Step 2, terminate the algorithm and output the bounding box B; otherwise, repeat Step 1 and Step 2.
7. The method according to claim 1, characterized in that the salient target region is used to reduce the richness of the video content and key frames are extracted with a shot-adaptive method combined with on-line clustering, the specific method being as follows:
(51) the RGB color space of the salient region is converted to the HSV color space, and the H and S components are used to compute a hue-saturation histogram; let $H_p(i)$ be the i-th bin value of the hue-saturation histogram of the salient target region of frame p; the Bhattacharyya distance is used to measure the visual distance between two frames p and q:
$$D_{sal}(p, q) = \sqrt{1 - \frac{\sum_i \sqrt{H_p(i)\, H_q(i)}}{\sqrt{\sum_i H_p(i) \sum_i H_q(i)}}};$$
(52) key frames are extracted with a shot-adaptive method combined with on-line clustering, with the clustering mode for static shots as the primary mode and the clustering mode for dynamic shots as the auxiliary mode;
for a static shot, on-line clustering is performed on the basis of the hue-saturation histogram of the salient region, and any frame in a cluster is chosen as the key frame;
for a dynamic shot, the salient moving target is first tracked, the tracking of the salient moving target then serves as the basis for on-line clustering, and the position information of the salient target serves as the basis for extracting the key frame from a cluster.
8. The method according to claim 7, characterized in that the on-line clustering of a static shot is realized by the following steps:
Initialization: compute the hue-saturation histogram $f_1$ of the first frame of the static shot, set the initial cluster count N = 1, and take $f_1$ as the centroid vector $C_1$ of cluster $Cell_1$, i.e. $C_1 = f_1$;
S11: if the current frame p belongs to a static shot, compute the hue-saturation histogram $H_p$ of the current frame;
S12: compute the visual distance between p and each cluster centroid and find the minimum, $m = \arg\min_n \{ D_{sal}(p, C_n) \mid 1 \le n \le N \}$, where m is the index of the cluster;
S13: compare $D_{sal}(p, C_m)$ with the threshold $\varepsilon_c$; when $D_{sal}(p, C_m) \le \varepsilon_c$, p is assigned to cluster $Cell_m$ and $H_p$ replaces the centroid of $Cell_m$; otherwise a new cluster $Cell_{N+1}$ is created with $H_p$ as its centroid vector $C_{N+1}$, and finally the cluster count is updated to N = N + 1;
S14: repeat S11, S12 and S13 for all static-shot frames.
9. The method according to claim 7, characterized in that the key frame extraction for a dynamic shot is realized by the following steps:
Initialization: obtain the first frame of the dynamic shot;
S21: obtain the tracking target region, initialize or resample the particles, and fetch the next video frame; if the frame is empty, terminate;
S22: obtain the FAST feature vectors and match them with the FLANN algorithm, updating the feature vector weights; if there are not enough feature vectors, terminate;
S23: update the weight of each particle, compute the key frame weight and the target region, and jump back to S21.
10. A key frame extraction system based on a visual attention model, characterized by comprising a salient region extraction module and a key frame extraction module;
the salient region extraction module comprises:
a spatial-domain salient region extraction module, for extracting the salient region in the spatial domain;
a temporal-domain key point saliency acquisition module, for extracting the saliency values of key points in the temporal domain;
a fusion module, for fusing the spatial-domain salient region with the temporal-domain key points and finally obtaining the salient region;
the key frame extraction module comprises:
a static-shot key frame extraction module, for key frame extraction from static shots;
a dynamic-shot key frame extraction module, for key frame extraction from dynamic shots;
a shot adaptation module, for switching control between the static-shot and dynamic-shot key frame extraction modules.
CN201410039072.7A 2014-01-26 2014-01-26 Key frame extraction method based on visual attention model and system Expired - Fee Related CN103824284B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410039072.7A CN103824284B (en) 2014-01-26 2014-01-26 Key frame extraction method based on visual attention model and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410039072.7A CN103824284B (en) 2014-01-26 2014-01-26 Key frame extraction method based on visual attention model and system

Publications (2)

Publication Number Publication Date
CN103824284A true CN103824284A (en) 2014-05-28
CN103824284B CN103824284B (en) 2017-05-10

Family

ID=50759326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410039072.7A Expired - Fee Related CN103824284B (en) 2014-01-26 2014-01-26 Key frame extraction method based on visual attention model and system

Country Status (1)

Country Link
CN (1) CN103824284B (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598908A (en) * 2014-09-26 2015-05-06 浙江理工大学 Method for recognizing diseases of crop leaves
CN104778721A (en) * 2015-05-08 2015-07-15 哈尔滨工业大学 Distance measuring method of significant target in binocular image
CN105472380A (en) * 2015-11-19 2016-04-06 国家新闻出版广电总局广播科学研究院 Compression domain significance detection algorithm based on ant colony algorithm
CN106210444A (en) * 2016-07-04 2016-12-07 石家庄铁道大学 Kinestate self adaptation key frame extracting method
CN107967476A (en) * 2017-12-05 2018-04-27 北京工业大学 A kind of method that image turns sound
CN110197107A (en) * 2018-08-17 2019-09-03 平安科技(深圳)有限公司 Micro- expression recognition method, device, computer equipment and storage medium
CN110322474A (en) * 2019-07-11 2019-10-11 史彩成 A kind of image motive target real-time detection method based on unmanned aerial vehicle platform
CN110399847A (en) * 2019-07-30 2019-11-01 北京字节跳动网络技术有限公司 Extraction method of key frame, device and electronic equipment
CN111191650A (en) * 2019-12-30 2020-05-22 北京市新技术应用研究所 Object positioning method and system based on RGB-D image visual saliency
CN111493935A (en) * 2020-04-29 2020-08-07 中国人民解放军总医院 Artificial intelligence-based automatic prediction and identification method and system for echocardiogram
CN112418012A (en) * 2020-11-09 2021-02-26 武汉大学 Video abstract generation method based on space-time attention model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030184579A1 (en) * 2002-03-29 2003-10-02 Hong-Jiang Zhang System and method for producing a video skim
EP2207111A1 (en) * 2009-01-08 2010-07-14 Thomson Licensing SA Method and apparatus for generating and displaying a video abstract
CN102088597A (en) * 2009-12-04 2011-06-08 成都信息工程学院 Method for estimating video visual salience through dynamic and static combination
CN102695056A (en) * 2012-05-23 2012-09-26 中山大学 Method for extracting compressed video key frames

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030184579A1 (en) * 2002-03-29 2003-10-02 Hong-Jiang Zhang System and method for producing a video skim
EP2207111A1 (en) * 2009-01-08 2010-07-14 Thomson Licensing SA Method and apparatus for generating and displaying a video abstract
CN102088597A (en) * 2009-12-04 2011-06-08 成都信息工程学院 Method for estimating video visual salience through dynamic and static combination
CN102695056A (en) * 2012-05-23 2012-09-26 中山大学 Method for extracting compressed video key frames

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
NAVEED EJAZ ET AL.: "Efficient visual attention based framework for extracting key frames from videos", SIGNAL PROCESSING: IMAGE COMMUNICATION, 17 October 2012 (2012-10-17), pages 34 - 44 *
YUN ZHAI ET AL.: "Visual attention detection in video sequences using spatiotemporal cues", 《PROCEEDINGS OF THE 14TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA》, 31 October 2006 (2006-10-31) *
蒋鹏 et al.: "Adaptive video key frame extraction based on a visual attention model" (基于视觉注意模型的自适应视频关键帧提取), Journal of Image and Graphics (中国图象图形学报), vol. 14, no. 8, 31 August 2009 (2009-08-31) *
贾云得: "Machine Vision" (机器视觉), 30 April 2000, section "Gaussian filter design" (高斯滤波器设计), page 76 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104598908B (en) * 2014-09-26 2017-11-28 浙江理工大学 A kind of crops leaf diseases recognition methods
CN104598908A (en) * 2014-09-26 2015-05-06 浙江理工大学 Method for recognizing diseases of crop leaves
CN104778721A (en) * 2015-05-08 2015-07-15 哈尔滨工业大学 Distance measuring method of significant target in binocular image
CN104778721B (en) * 2015-05-08 2017-08-11 广州小鹏汽车科技有限公司 The distance measurement method of conspicuousness target in a kind of binocular image
CN105472380A (en) * 2015-11-19 2016-04-06 国家新闻出版广电总局广播科学研究院 Compression domain significance detection algorithm based on ant colony algorithm
CN106210444A (en) * 2016-07-04 2016-12-07 石家庄铁道大学 Kinestate self adaptation key frame extracting method
CN106210444B (en) * 2016-07-04 2018-10-30 石家庄铁道大学 Motion state self adaptation key frame extracting method
CN107967476B (en) * 2017-12-05 2021-09-10 北京工业大学 Method for converting image into sound
CN107967476A (en) * 2017-12-05 2018-04-27 北京工业大学 A kind of method that image turns sound
CN110197107A (en) * 2018-08-17 2019-09-03 平安科技(深圳)有限公司 Micro- expression recognition method, device, computer equipment and storage medium
CN110322474A (en) * 2019-07-11 2019-10-11 史彩成 A kind of image motive target real-time detection method based on unmanned aerial vehicle platform
CN110399847A (en) * 2019-07-30 2019-11-01 北京字节跳动网络技术有限公司 Extraction method of key frame, device and electronic equipment
CN110399847B (en) * 2019-07-30 2021-11-09 北京字节跳动网络技术有限公司 Key frame extraction method and device and electronic equipment
CN111191650A (en) * 2019-12-30 2020-05-22 北京市新技术应用研究所 Object positioning method and system based on RGB-D image visual saliency
CN111191650B (en) * 2019-12-30 2023-07-21 北京市新技术应用研究所 Article positioning method and system based on RGB-D image visual saliency
CN111493935A (en) * 2020-04-29 2020-08-07 中国人民解放军总医院 Artificial intelligence-based automatic prediction and identification method and system for echocardiogram
CN112418012A (en) * 2020-11-09 2021-02-26 武汉大学 Video abstract generation method based on space-time attention model
CN112418012B (en) * 2020-11-09 2022-06-07 武汉大学 Video abstract generation method based on space-time attention model

Also Published As

Publication number Publication date
CN103824284B (en) 2017-05-10

Similar Documents

Publication Publication Date Title
CN103824284A (en) Key frame extraction method based on visual attention model and system
CN110111335B (en) Urban traffic scene semantic segmentation method and system for adaptive countermeasure learning
CN103578119B (en) Target detection method in Codebook dynamic scene based on superpixels
CN104094279B (en) Large-range-first cross-camera visual target re-identification method
CN106997597B (en) It is a kind of based on have supervision conspicuousness detection method for tracking target
CN102567731B (en) Extraction method for region of interest
CN101315663B (en) Nature scene image classification method based on area dormant semantic characteristic
Lo et al. Assessment of photo aesthetics with efficiency
CN103020985B (en) A kind of video image conspicuousness detection method based on field-quantity analysis
CN101477633B (en) Method for automatically estimating visual significance of image and video
CN101470809B (en) Moving object detection method based on expansion mixed gauss model
CN103020992B (en) A kind of video image conspicuousness detection method based on motion color-associations
CN103208115B (en) Based on the saliency method for detecting area of geodesic line distance
CN104103082A (en) Image saliency detection method based on region description and priori knowledge
CN102088597B (en) Method for estimating video visual salience through dynamic and static combination
CN103632153B (en) Region-based image saliency map extracting method
CN102142147A (en) Device and method for analyzing site content as well as device and method for detecting and tracking target
CN103226824B (en) Maintain the video Redirectional system of vision significance
CN109544561A (en) Cell mask method, system and device
Yi et al. Realistic action recognition with salient foreground trajectories
CN103020614A (en) Human movement identification method based on spatio-temporal interest point detection
CN108829711A (en) A kind of image search method based on multi-feature fusion
CN103578107A (en) Method for interactive image segmentation
CN103077383B (en) Based on the human motion identification method of the Divisional of spatio-temporal gradient feature
CN103218829B (en) A kind of foreground extracting method being adapted to dynamic background

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170510