CN108600865A - A video summary generation method based on superpixel segmentation - Google Patents

A video summary generation method based on superpixel segmentation

Info

Publication number
CN108600865A
Authority
CN
China
Prior art keywords
superpixel
pixel
image
video
region merging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810456341.8A
Other languages
Chinese (zh)
Other versions
CN108600865B (en)
Inventor
金海燕
李喻蒙
肖照林
李秀秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shaanxi Zhisheng Desen Elevator Co.,Ltd.
Original Assignee
Xi'an University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an University of Technology
Priority to CN201810456341.8A
Publication of CN108600865A
Application granted
Publication of CN108600865B
Status: Active
Anticipated expiration


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85Assembly of content; Generation of multimedia applications
    • H04N21/854Content authoring
    • H04N21/8549Creating video summaries, e.g. movie trailer

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a video summary generation method based on superpixel segmentation. Video boundary segmentation is completed in a coarse-to-fine manner: segmentation boundaries converge to local minima of motion, each boundary is aligned to a position suitable for cutting, and the segmentation boundary frames are extracted as key frames. These key frames capture the key activity between action sequences and represent the effective information of the video, so the computation and complexity of video processing can be greatly reduced, which significantly improves the real-time performance of video analysis. A superpixel region-merging method groups adjacent pixels with similar texture, color, and brightness into image blocks according to the similarity of features between pixels, exposing the redundant information of the image and reducing the complexity of subsequent image-processing tasks. Image similarity is computed from the spatial organization of the pixels to eliminate redundant key frames and generate the video summary, and the summarization effect is good.

Description

A video summary generation method based on superpixel segmentation
Technical field
The invention belongs to the technical field of image processing, and in particular relates to a video summary generation method based on superpixel segmentation.
Background technology
Research on video summarization began with the Informedia project at Carnegie Mellon University in 1994. Since then, more and more researchers have studied and explored video summarization technology. In general, according to the form of the output summary, the common video summary generation algorithms at present are data clustering methods, curve planning methods, and machine learning methods.
In the prior art, IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, 2014: 1260-1264 discloses a method that decomposes the original video into an image sequence, extracts the color moment features of each frame, divides the video into several shots, and then clusters the shots with a spectral clustering algorithm based on rough set theory. Such algorithms have two defects. First, most clustering algorithms require the number of clusters to be set in advance, and obtaining the optimal number of clusters generally requires repeated experiments; moreover, the optimal number of clusters changes dynamically with the length and type of the input video, so the algorithm must be executed with manual intervention. Second, whether the extracted features are representative strongly affects the clustering result: only the color features of the images are extracted, the shape and texture features are ignored, and a single feature cannot comprehensively express the visual information of an image.
In Proceedings of IEEE International Conference on Multimedia and Expo, Washington DC, USA: IEEE, 2005: 670-673, after the original video is decomposed into an image sequence, the pixel difference between frames is computed and fitted to a curve; to simplify the curve, the polyline is divided into a set of segments, and points with low correlation are deleted from the set. Curve planning methods intuitively display the video content as a curve in a coordinate system and can simplify video processing; however, such methods only reflect changes in video content and cannot fully express semantic information.
In Nature and Biologically Inspired Computing, IEEE, 2011: 7-11, an SVM is used to train on and learn the replay logos in football video and to identify replay shots. A goal event is preceded, before the replay logo appears, by scenes such as a stoppage, player close-ups, crowd shots, the goal area, and audio excitement, and may be followed after the replay logo by scenes such as a long shot and the scoreboard. Football video events are detected by finding replay shots and judging whether the scenes before and after them match these facts. In the vast majority of cases, machine learning methods require manual participation to assist modeling, and such methods are sensitive to feature selection: if suitable features are chosen, machine learning can obtain high-level semantic information consistent with human cognition; conversely, unsuitable feature selection leads to unsatisfactory learning results.
Invention content
The purpose of the present invention is to provide a video summary generation method based on superpixel segmentation, which can greatly reduce the computation and complexity of video processing and significantly improve the real-time performance of video analysis.
The technical solution adopted by the present invention is a video summary generation method based on superpixel segmentation, implemented according to the following steps:
Step 1: obtain a video, take the video as input data, and extract the video image frame sequence;
Step 2: perform an initial equal-length segmentation of the video frame sequence, dividing it into multiple equal-length frame sequences;
meanwhile, compute the optical flow of the video frame sequence, and use the optical flow displacement between frames to estimate the motion amplitude as a function of time;
Step 3: for each equal-length frame sequence, use the motion amplitude as a function of time to find, near the initial equal-length segmentation boundary frame, the frame with the local minimum gradient value, and take it as a key frame;
Step 4: perform local clustering on the pixels of each key frame image, and apply superpixel segmentation to the key frame image according to the similarity of features between pixels, obtaining multiple superpixels;
Step 5: perform region merging on the multiple superpixels to obtain superpixel region-merged images;
Step 6: perform a similarity measurement on each pair of adjacent superpixel region-merged images, and judge whether the adjacent superpixel region-merged images are similar;
Step 7: delete one of each pair of adjacent similar superpixel region-merged images, and combine the remaining superpixel region-merged images to generate the video summary.
The present invention is further characterized in that:
The detailed process in step 2 of using the optical flow displacement between successive frames to estimate the motion amplitude as a function of time is: the motion amplitude of the video frame sequence is computed by aggregating, over all pixels of each video frame, the optical flow in the horizontal and vertical directions, with the following calculation formula:
where OFx(i, j, t) is the x-component of the optical flow of pixel (i, j) between frames t and t-1, and OFy(i, j, t) is the y-component of the optical flow of pixel (i, j) between frames t and t-1; the optical flow tracks all points over time, and the summation is an estimate of the amount of motion between frames.
The detailed process of step 4 is:
Step 4.1: convert each key frame image into 5-dimensional feature vectors consisting of the CIELAB color space values and the XY coordinates, and then construct a distance metric on the 5-dimensional feature vectors:
assume the image has N pixels and is pre-segmented into K superpixels of identical size, so that the size of each superpixel is N/K; the superpixel center of each pre-segmented region is selected as a seed point, and the distance between every two seed points is approximately S = √(N/K);
Step 4.2: within the 3 × 3 window centered on each seed point, move the seed point to the position with the minimum gradient value, and assign each seed an individual label;
Step 4.3: for each pixel, separately compute the degree of similarity to each of the nearest seed points, and assign the label of the most similar seed point to the pixel; pixels with the same label form a superpixel.
The formula with which step 4.3 computes the degree of similarity to each nearest seed point is as follows:
where dlab is the color difference between pixels, dxy is the spatial distance between pixels, and Di is the similarity of the two pixels; S is the seed-point spacing, and m is a balance parameter that weighs the proportion of the color value versus the spatial information in the similarity measure; a larger value of Di indicates that the two pixels are more similar.
The detailed process of step 5 is:
Step 5.1: preset a region-merging threshold; assuming there are K superpixels, record the superpixel adjacency relations with an undirected graph G = (V, E);
where V = {S1, S2, ..., SK} is the set of all K vertices and E is the set of all edges; in the graph G = (V, E), each region is represented as a node of the graph;
Step 5.2: choose any two regions (Si, Sj) ∈ V; if Si and Sj are adjacent, the corresponding nodes are connected by an edge, each edge is assigned a weight, and the weight corresponds to the cost value of merging the two regions;
Step 5.3: select the edge of minimum cost value between adjacent regions and merge them, continually updating the weights of the related edges, and judge whether the number of regions after merging equals the preset region-merging threshold; when it equals the preset region-merging threshold, merging terminates and multiple superpixel blocks are obtained;
Step 5.4: the image formed by the multiple superpixel blocks is the superpixel region-merged image.
The weight in step 5.2 is calculated as:
where N denotes the area of region S, μ denotes the spectral mean of a region, l is the length of the common boundary of regions i and j, and λ is a shape parameter.
The detailed process in step 6 of performing a similarity measurement on two adjacent superpixel region-merged images is:
preset a difference threshold;
compute the difference value between corresponding pixels of the two adjacent superpixel region-merged images:
where i denotes the image number, i+1 denotes the image adjacent to image i, and a denotes a pixel; for any pixel a in the image with coordinates Ia(x, y), the merged region containing it after superpixel segmentation is Ωa = sp(Ia(x, y)), and M denotes the number of pixels in that region;
compute the difference values of the superpixels containing all corresponding pixel points of the two images; the difference value of the two adjacent superpixel region-merged images is then calculated as follows:
where threshold denotes the dividing value of the pixel difference values between each two adjacent superpixel region-merged images.
The process in step 6 of judging whether two adjacent superpixel region-merged images are similar is: judge whether the difference value of the two adjacent superpixel region-merged images is less than the threshold:
if the difference value is less than the threshold, the two superpixel region-merged images are similar;
if the difference value is not less than the threshold, the two superpixel region-merged images are dissimilar.
The beneficial effects of the video summary generation method based on superpixel segmentation of the present invention are:
(1) In the present invention, video boundary segmentation is completed in a coarse-to-fine manner: segmentation boundaries converge to local minima of motion, each boundary is aligned to a position suitable for cutting, and the segmentation boundary frames of the video are extracted as key frames, so that the key activity between action sequences is captured and the effective information of the video is represented; the computation and complexity of video processing can be greatly reduced, which significantly improves the real-time performance of video analysis;
(2) In the present invention, the superpixel region-merging method groups adjacent pixels with similar texture, color, and brightness into pixel blocks according to the similarity of features between pixels, exposing the redundant information of the image and greatly reducing the complexity of subsequent image-processing tasks; image similarity is computed from the spatial organization of the pixels to eliminate redundant key frames and generate the video summary, and the summarization effect is good.
Description of the drawings
Fig. 1 is a flowchart of the video summary generation method based on superpixel segmentation of the present invention;
Fig. 2 illustrates key frame extraction in the present invention;
Fig. 3 is a schematic diagram of the superpixel region-merging process of the present invention.
Specific implementation mode
The following describes the present invention in detail with reference to the accompanying drawings and specific embodiments.
The video summary generation method based on superpixel segmentation of the present invention, as shown in Fig. 1, is implemented according to the following steps:
Step 1: obtain a video, take the video as input data, and extract the video image frame sequence;
Step 2: perform an initial equal-length segmentation of the video frame sequence, dividing it into multiple equal-length frame sequences;
meanwhile, compute the optical flow of the video frame sequence, and use the optical flow displacement between frames to estimate the motion amplitude as a function of time;
The detailed process of using the optical flow displacement between successive frames to estimate the motion amplitude as a function of time is: the motion amplitude of the video frame sequence is computed by aggregating, over all pixels of each video frame, the optical flow in the horizontal and vertical directions, with the following calculation formula:
where OFx(i, j, t) is the x-component of the optical flow of pixel (i, j) between frames t and t-1, and OFy(i, j, t) is the y-component of the optical flow of pixel (i, j) between frames t and t-1; the optical flow tracks all points over time, and the summation is an estimate of the amount of motion between frames.
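For illustration only, the following Python sketch computes such a per-frame motion amplitude. OpenCV's Farneback dense optical flow is used as a stand-in, since the embodiment does not prescribe a particular optical-flow algorithm, and summing |OFx| + |OFy| over all pixels is an assumed concrete form of the formula above, whose image is not reproduced here:

import cv2
import numpy as np

def motion_amplitude(frames):
    # Motion amplitude as a function of time: aggregate the horizontal
    # and vertical optical-flow components over all pixels of each frame.
    amplitudes = [0.0]  # no flow is defined for the first frame
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for t in range(1, len(frames)):
        curr = cv2.cvtColor(frames[t], cv2.COLOR_BGR2GRAY)
        # Dense optical flow between frames t-1 and t.
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        ofx, ofy = flow[..., 0], flow[..., 1]  # OFx(i, j, t), OFy(i, j, t)
        amplitudes.append(float(np.sum(np.abs(ofx) + np.abs(ofy))))
        prev = curr
    return np.array(amplitudes)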
Step 3: for each equal-length frame sequence, use the motion amplitude as a function of time to find, near the initial equal-length segmentation boundary frame, the frame with the local minimum gradient value, and take it as a key frame, as shown in Fig. 2;
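Continuing the sketch, step 3 can be realized as follows; the search radius around each initial equal-length boundary is an illustrative parameter, not taken from the embodiment:

import numpy as np

def select_keyframes(amplitudes, segment_length, radius=10):
    # For each initial equal-length boundary, refine the cut to the
    # frame whose motion-amplitude gradient is locally minimal, i.e. a
    # position with stable motion that is suitable for cutting.
    grad = np.abs(np.gradient(amplitudes))
    keyframes = []
    for boundary in range(segment_length, len(amplitudes), segment_length):
        lo = max(0, boundary - radius)
        hi = min(len(amplitudes), boundary + radius + 1)
        keyframes.append(lo + int(np.argmin(grad[lo:hi])))
    return keyframes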
Step 4: perform local clustering on the pixels of each key frame image, and apply superpixel segmentation to the key frame image according to the similarity of features between pixels, obtaining multiple superpixels;
The detailed process is:
Step 4.1: convert each key frame image into 5-dimensional feature vectors consisting of the CIELAB color space values and the XY coordinates, and then construct a distance metric on the 5-dimensional feature vectors:
assume the image has N pixels and is pre-segmented into K superpixels of identical size, so that the size of each superpixel is N/K; the superpixel center of each pre-segmented region is selected as a seed point, and the distance between every two seed points is approximately S = √(N/K);
Step 4.2: within the 3 × 3 window centered on each seed point, move the seed point to the position with the minimum gradient value, and assign each seed an individual label;
Step 4.3: as shown in Fig. 3, for each pixel, separately compute the degree of similarity to each of the nearest seed points, and assign the label of the most similar seed point to the pixel; pixels with the same label form a superpixel;
The formula for computing the degree of similarity to each nearest seed point is as follows:
where dlab is the color difference between pixels, dxy is the spatial distance between pixels, and Di is the similarity of the two pixels; S is the seed-point spacing, and m is a balance parameter that weighs the proportion of the color value versus the spatial information in the similarity measure; a larger value of Di indicates that the two pixels are more similar.
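For illustration, the sketch below expresses this measure in Python. Because the formula itself is rendered as an image in the original, the standard SLIC combination of dlab and dxy with spacing S and balance parameter m is assumed, written as a distance where smaller values mean more similar (the inverse of the Di convention above); the final comment shows an equivalent library call:

import numpy as np
from skimage.segmentation import slic

def seed_distance(pixel_lab, pixel_xy, seed_lab, seed_xy, S, m=10.0):
    # Standard SLIC measure between a pixel and a nearby seed point:
    # dlab is the CIELAB color difference, dxy the spatial distance.
    # (Assumed form; smaller values mean more similar.)
    d_lab = np.linalg.norm(np.asarray(pixel_lab) - np.asarray(seed_lab))
    d_xy = np.linalg.norm(np.asarray(pixel_xy) - np.asarray(seed_xy))
    return np.sqrt(d_lab ** 2 + (m * d_xy / S) ** 2)

# Step 4 as a whole (seed grid with spacing S = sqrt(N/K), gradient-based
# seed perturbation in a 3x3 window, and iterative pixel-to-seed
# assignment) corresponds to what scikit-image's SLIC implementation
# performs internally:
# labels = slic(keyframe_image, n_segments=K, compactness=10)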
Step 5: perform region merging on the multiple superpixels to obtain superpixel region-merged images;
The detailed process is:
Step 5.1: preset a region-merging threshold; assuming there are K superpixels, record the superpixel adjacency relations with an undirected graph G = (V, E);
where V = {S1, S2, ..., SK} is the set of all K vertices and E is the set of all edges; in the graph G = (V, E), each region is represented as a node of the graph;
Step 5.2: choose any two regions (Si, Sj) ∈ V; if Si and Sj are adjacent, the corresponding nodes are connected by an edge, each edge is assigned a weight, and the weight corresponds to the cost value of merging the two regions;
The calculation formula of the weight, corresponding to the cost value of merging two regions, is:
where N denotes the area of region S, μ denotes the spectral mean of a region, l is the length of the common boundary of regions i and j, and λ is a shape parameter;
Step 5.3: select the edge of minimum cost value between adjacent regions and merge them, continually updating the weights of the related edges, and judge whether the number of regions after merging equals the preset region-merging threshold; when it equals the preset region-merging threshold, merging terminates and multiple superpixel blocks are obtained;
Step 5.4: the image formed by the multiple superpixel blocks is the superpixel region-merged image.
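A minimal Python sketch of steps 5.1-5.4 is given below. It records superpixel adjacency in an undirected graph and greedily merges the cheapest adjacent pair until the preset number of regions is reached; since the weight formula is rendered as an image, the cost (Ni·Nj/(Ni+Nj))·||μi − μj||²/(λ·l) used here is an assumed area-weighted spectral-difference form built only from the quantities named above (region areas, spectral means μ, common boundary length l, shape parameter λ):

import numpy as np

def merge_cost(n_i, mu_i, n_j, mu_j, l_ij, lam=1.0):
    # Assumed merging cost: area-weighted squared difference of the
    # spectral means, divided by the shared boundary length scaled by
    # the shape parameter lambda.
    return (n_i * n_j / (n_i + n_j)) * float(np.sum((mu_i - mu_j) ** 2)) / (lam * l_ij)

def region_merge(regions, edges, target_regions, lam=1.0):
    # regions: {id: (area, mean_vector)}; edges: {(i, j): boundary_length}
    # with i < j. Merge until the preset region-merging threshold
    # (a target number of regions) is reached.
    regions = {k: (a, np.asarray(m, dtype=float)) for k, (a, m) in regions.items()}
    edges = dict(edges)
    while len(regions) > target_regions and edges:
        # Edge of minimum merging cost among adjacent regions.
        i, j = min(edges, key=lambda e: merge_cost(
            regions[e[0]][0], regions[e[0]][1],
            regions[e[1]][0], regions[e[1]][1], edges[e], lam))
        (n_i, mu_i), (n_j, mu_j) = regions[i], regions[j]
        # Merge j into i: pooled area and area-weighted spectral mean.
        regions[i] = (n_i + n_j, (n_i * mu_i + n_j * mu_j) / (n_i + n_j))
        del regions[j]
        # Re-route j's edges to i, updating the relevant edge weights.
        for (a, b), l in list(edges.items()):
            if j in (a, b):
                del edges[(a, b)]
                other = b if a == j else a
                if other != i:
                    key = (min(i, other), max(i, other))
                    edges[key] = edges.get(key, 0.0) + l
    return regions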
Step 6: perform a similarity measurement on each pair of adjacent superpixel region-merged images; the detailed process is:
preset a difference threshold;
compute the difference value between corresponding pixels of the two adjacent superpixel region-merged images:
where i denotes the image number, i+1 denotes the image adjacent to image i, and a denotes a pixel; for any pixel a in the image with coordinates Ia(x, y), the merged region containing it after superpixel segmentation is Ωa = sp(Ia(x, y)), and M denotes the number of pixels in that region;
compute the difference values of the superpixels containing all corresponding pixel points of the two images; the difference value of the two adjacent superpixel region-merged images is then calculated as follows:
where threshold denotes the dividing value of the pixel difference values between each two adjacent superpixel region-merged images.
The process of judging whether two adjacent superpixel region-merged images are similar is: judge whether the difference value of the two adjacent superpixel region-merged images is less than the threshold:
if the difference value is less than the threshold, the two superpixel region-merged images are similar;
if the difference value is not less than the threshold, the two superpixel region-merged images are dissimilar.
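A Python sketch of this comparison, under the assumption that each pixel is represented by the mean value of the merged superpixel region containing it (the difference formulas themselves are images in the original); the label maps would come from the region merging of step 5:

import numpy as np

def region_mean_image(img, labels):
    # Replace every pixel by the mean value of the merged superpixel
    # region (label) that contains it.
    out = np.zeros_like(img, dtype=float)
    for lab in np.unique(labels):
        mask = labels == lab
        out[mask] = img[mask].mean(axis=0)
    return out

def frames_similar(img_i, labels_i, img_j, labels_j, threshold):
    # Difference value between two adjacent superpixel region-merged
    # images: mean absolute difference of their per-region mean images
    # (assumed concrete form). Similar if below the preset threshold.
    diff = np.abs(region_mean_image(img_i, labels_i) -
                  region_mean_image(img_j, labels_j)).mean()
    return diff < threshold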
Step 7: delete one of each pair of adjacent similar superpixel region-merged images, and combine the remaining superpixel region-merged images to generate the video summary.
Embodiment
The feasibility of the inventive scheme is verified below with reference to specific calculations:
The experiments use the YouTube database to verify the validity of the proposed algorithm. The YouTube database contains 50 videos collected from video websites (such as YouTube); the videos differ in type (such as cartoons, news, sports, advertisements, TV programs, and home videos), and their durations range from 1 minute to 10 minutes.
The present invention is evaluated with the mainstream objective evaluation criteria, namely precision (Precision), recall (Recall), and F-score. Precision, recall, and F-score are calculated as follows:
where Nmatched denotes the matched length of the automatic summary and the user summary, i.e., the number of identical key frames in the automatic summary and the user summary; two key frames are defined as matched when the Manhattan distance of their color histograms is less than a specified threshold (set to 0.5 in this experiment); NAS denotes the length of the automatically generated summary; NUS denotes the length of the user summary. Precision reflects the ability of the automatic summary to select matching key frames, recall reflects the ability of the matching key frames to hit the user summary, and the F-score balances precision and recall, giving an overall evaluation of video summary quality.
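The formula images are not reproduced above; in LaTeX notation, the standard definitions consistent with the quantities just described are:

\mathrm{Precision} = \frac{N_{\mathrm{matched}}}{N_{AS}}, \qquad
\mathrm{Recall} = \frac{N_{\mathrm{matched}}}{N_{US}}, \qquad
F = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}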
For each individual video, when obtaining the automatic summarization result, the present invention first computes an F-score by comparing the detected frames with the annotated frames of each user, and then averages all the F-scores obtained by comparison with each user's annotation. This average F-score is used as the final assessment result of each summary. The average F-scores of the different algorithms are shown in Table 1.
Table 1: Objective evaluation comparison of the method of the present invention with other video summarization algorithms
As can be seen from the table, the F-score of the method of the present invention is the highest, averaging 0.54, clearly the best performance. The proposed algorithm selects more key frames, so its precision is relatively low; however, since its recall is higher, the F-score of the proposed method is not much reduced, and the video summarization method based on superpixel segmentation proposed by the present invention remains better than all comparison algorithms.
By the above means, in the video summary generation method based on superpixel segmentation of the present invention, video boundary segmentation is completed in a coarse-to-fine manner: segmentation boundaries converge to local minima of motion, each boundary is aligned to a position suitable for cutting, and the segmentation boundary frames of the video are extracted as key frames, so that the key activity between action sequences is captured and the effective information of the video is represented; the computation and complexity of video processing can be greatly reduced, which significantly improves the real-time performance of video analysis. The superpixel region-merging method groups adjacent pixels with similar texture, color, and brightness into image blocks according to the similarity of features between pixels, exposing the redundant information of the image and greatly reducing the complexity of subsequent image-processing tasks; image similarity is computed from the spatial organization of the pixels to eliminate redundant key frames and generate the video summary, and the summarization effect is good.

Claims (8)

1. A video summary generation method based on superpixel segmentation, characterized in that it is implemented according to the following steps:
Step 1: obtain a video, take the video as input data, and extract the video image frame sequence;
Step 2: perform an initial equal-length segmentation of the video frame sequence, dividing it into multiple equal-length frame sequences;
meanwhile, compute the optical flow of the video frame sequence, and use the optical flow displacement between frames to estimate the motion amplitude as a function of time;
Step 3: for each equal-length frame sequence, use the motion amplitude as a function of time to find, near the initial equal-length segmentation boundary frame, the frame with the local minimum gradient value as a key frame;
Step 4: perform local clustering on the pixels of each key frame image, and apply superpixel segmentation to the key frame image according to the similarity of features between pixels, obtaining multiple superpixels;
Step 5: perform region merging on the multiple superpixels to obtain superpixel region-merged images;
Step 6: perform a similarity measurement on each pair of adjacent superpixel region-merged images, and judge whether the adjacent superpixel region-merged images are similar;
Step 7: delete one of each pair of adjacent similar superpixel region-merged images, and combine the remaining superpixel region-merged images to generate the video summary.
2. The video summary generation method based on superpixel segmentation according to claim 1, characterized in that the process in step 2 of using the optical flow displacement between successive frames to estimate the motion amplitude as a function of time is: the motion amplitude of the video frame sequence is computed by aggregating, over all pixels of each video frame, the optical flow in the horizontal and vertical directions, with the following calculation formula:
where OFx(i, j, t) is the x-component of the optical flow of pixel (i, j) between frames t and t-1, and OFy(i, j, t) is the y-component of the optical flow of pixel (i, j) between frames t and t-1; the optical flow tracks all points over time, and the summation is an estimate of the amount of motion between frames.
3. The video summary generation method based on superpixel segmentation according to claim 1, characterized in that the detailed process of step 4 is:
Step 4.1: convert each key frame image into 5-dimensional feature vectors consisting of the CIELAB color space values and the XY coordinates, and then construct a distance metric on the 5-dimensional feature vectors:
assume the image has N pixels and is pre-segmented into K superpixels of identical size, so that the size of each superpixel is N/K; the superpixel center of each pre-segmented region is selected as a seed point, and the distance between every two seed points is approximately S = √(N/K);
Step 4.2: within the 3 × 3 window centered on each seed point, move the seed point to the position with the minimum gradient value, and assign each seed an individual label;
Step 4.3: for each pixel, separately compute the degree of similarity to each of the nearest seed points, and assign the label of the most similar seed point to the pixel; pixels with the same label form a superpixel.
4. The video summary generation method based on superpixel segmentation according to claim 3, characterized in that the formula with which step 4.3 computes the degree of similarity to each nearest seed point is as follows:
where dlab is the color difference between pixels, dxy is the spatial distance between pixels, and Di is the similarity of the two pixels; S is the seed-point spacing, and m is a balance parameter that weighs the proportion of the color value versus the spatial information in the similarity measure; a larger value of Di indicates that the two pixels are more similar.
5. The video summary generation method based on superpixel segmentation according to claim 1, characterized in that the detailed process of step 5 is:
Step 5.1: preset a region-merging threshold; assuming there are K superpixels, record the superpixel adjacency relations with an undirected graph G = (V, E);
where V = {S1, S2, ..., SK} is the set of all K vertices and E is the set of all edges; in the graph G = (V, E), each region is represented as a node of the graph;
Step 5.2: choose any two regions (Si, Sj) ∈ V; if Si and Sj are adjacent, the corresponding nodes are connected by an edge, each edge is assigned a weight, and the weight corresponds to the cost value of merging the two regions;
Step 5.3: select the edge of minimum cost value between adjacent regions and merge them, continually updating the weights of the related edges, and judge whether the number of regions after merging equals the preset region-merging threshold; when it equals the preset region-merging threshold, merging terminates and multiple superpixel blocks are obtained;
Step 5.4: the image formed by the multiple superpixel blocks is the superpixel region-merged image.
6. The video summary generation method based on superpixel segmentation according to claim 5, characterized in that the calculation formula in step 5.2 of the weight corresponding to the cost value of merging two regions is:
where N denotes the area of region S, μ denotes the spectral mean of a region, l is the length of the common boundary of regions i and j, and λ is a shape parameter.
7. The video summary generation method based on superpixel segmentation according to claim 1, characterized in that the detailed process in step 6 of performing a similarity measurement on two adjacent superpixel region-merged images is:
preset a difference threshold;
compute the difference value between corresponding pixels of the two adjacent superpixel region-merged images:
where i denotes the image number, i+1 denotes the image adjacent to image i, and a denotes a pixel; for any pixel a in the image with coordinates Ia(x, y), the merged region containing it after superpixel segmentation is Ωa = sp(Ia(x, y)), and M denotes the number of pixels in that region;
the difference values of the superpixels containing all corresponding pixel points of the two superpixel region-merged images are calculated as follows:
where threshold denotes the dividing value of the pixel difference values between each two adjacent superpixel region-merged images.
8. The video summary generation method based on superpixel segmentation according to claim 7, characterized in that the process in step 6 of judging whether two adjacent superpixel region-merged images are similar is: judge whether the difference value of the two adjacent superpixel region-merged images is less than the threshold:
if the difference value is less than the threshold, the two superpixel region-merged images are similar;
if the difference value is not less than the threshold, the two superpixel region-merged images are dissimilar.
CN201810456341.8A 2018-05-14 2018-05-14 A video summary generation method based on superpixel segmentation Active CN108600865B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810456341.8A CN108600865B (en) 2018-05-14 2018-05-14 A video summary generation method based on superpixel segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810456341.8A CN108600865B (en) 2018-05-14 2018-05-14 A video summary generation method based on superpixel segmentation

Publications (2)

Publication Number Publication Date
CN108600865A (en) 2018-09-28
CN108600865B (en) 2019-07-23

Family

ID=63637370

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810456341.8A Active CN108600865B (en) 2018-05-14 2018-05-14 A video summary generation method based on superpixel segmentation

Country Status (1)

Country Link
CN (1) CN108600865B (en)

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726765A * 2019-01-02 2019-05-07 京东方科技集团股份有限公司 Sample extraction method and device for a visual classification problem
CN110347870A * 2019-06-19 2019-10-18 西安理工大学 Video summary generation method based on visual saliency detection and hierarchical clustering
CN110399847A * 2019-07-30 2019-11-01 北京字节跳动网络技术有限公司 Key frame extraction method, device and electronic equipment
CN111050210A * 2018-10-12 2020-04-21 奥多比公司 Video inpainting via confidence-weighted motion estimation
CN111625683A (en) * 2020-05-07 2020-09-04 山东师范大学 Video abstract automatic generation method and system based on graph structure difference analysis
CN111931811A (en) * 2020-06-29 2020-11-13 南京巨鲨显示科技有限公司 Calculation method based on super-pixel image similarity
CN112463385A (en) * 2020-12-02 2021-03-09 深圳市互盟科技股份有限公司 AI data scheduling and distributing method for cloud computing and related products
CN112583900A (en) * 2020-12-02 2021-03-30 深圳市互盟科技股份有限公司 Data processing method for cloud computing and related product
WO2023056835A1 (en) * 2021-10-09 2023-04-13 北京字节跳动网络技术有限公司 Video cover generation method and apparatus, and electronic device and readable medium
CN116805316A (en) * 2023-08-25 2023-09-26 深圳市鹏顺兴包装制品有限公司 Degradable plastic processing quality detection method based on image enhancement

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289847A (en) * 2011-08-02 2011-12-21 浙江大学 Interaction method for quickly extracting video object
CN102637253A (en) * 2011-12-30 2012-08-15 清华大学 Video foreground object extracting method based on visual saliency and superpixel division
CN103065153A (en) * 2012-12-17 2013-04-24 西南科技大学 Video key frame extraction method based on color quantization and clusters
US8605795B2 (en) * 2008-09-17 2013-12-10 Intel Corporation Video editing methods and systems
CN104185089A (en) * 2013-05-23 2014-12-03 三星电子(中国)研发中心 Video summary generation method, server and client-terminal
CN106851437A * 2017-01-17 2017-06-13 南通同洲电子有限责任公司 Method for extracting a video summary
US20170358090A1 (en) * 2016-06-09 2017-12-14 The Penn State Research Foundation Systems and methods for detection of significant and attractive components in digital images
CN107844779A * 2017-11-21 2018-03-27 重庆邮电大学 Video key frame extraction method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8605795B2 (en) * 2008-09-17 2013-12-10 Intel Corporation Video editing methods and systems
CN102289847A (en) * 2011-08-02 2011-12-21 浙江大学 Interaction method for quickly extracting video object
CN102637253A (en) * 2011-12-30 2012-08-15 清华大学 Video foreground object extracting method based on visual saliency and superpixel division
CN103065153A (en) * 2012-12-17 2013-04-24 西南科技大学 Video key frame extraction method based on color quantization and clusters
CN104185089A (en) * 2013-05-23 2014-12-03 三星电子(中国)研发中心 Video summary generation method, server and client-terminal
US20170358090A1 (en) * 2016-06-09 2017-12-14 The Penn State Research Foundation Systems and methods for detection of significant and attractive components in digital images
CN106851437A * 2017-01-17 2017-06-13 南通同洲电子有限责任公司 Method for extracting a video summary
CN107844779A * 2017-11-21 2018-03-27 重庆邮电大学 Video key frame extraction method

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Hana Gharbi, et al.: "Key frames extraction using graph modularity clustering for efficient video summarization", 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) *
冀中, 樊帅飞: "Video summarization based on a hypergraph ranking algorithm" (基于超图排序算法的视频摘要), 《电子学报》 (Acta Electronica Sinica) *
尚佳敏: "Research on video summary generation technology based on feature clustering" (基于特征聚类的视频摘要生成技术研究), China Master's Theses Database (《中国硕士研究生优秀毕业论文库》) *
张玉培: "Video summarization based on hierarchical structure" (基于层次结构的视频摘要), China Master's Theses Database (《中国硕士研究生优秀毕业论文库》) *
贺宏遵: "Research on video summarization technology" (视频摘要技术的研究), China Master's Theses Database (《中国硕士研究生优秀毕业论文库》) *

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111050210A * 2018-10-12 2020-04-21 奥多比公司 Video inpainting via confidence-weighted motion estimation
CN111050210B (en) * 2018-10-12 2023-01-17 奥多比公司 Method of performing operations, video processing system, and non-transitory computer readable medium
CN109726765A * 2019-01-02 2019-05-07 京东方科技集团股份有限公司 Sample extraction method and device for a visual classification problem
CN110347870A * 2019-06-19 2019-10-18 西安理工大学 Video summary generation method based on visual saliency detection and hierarchical clustering
CN110399847B (en) * 2019-07-30 2021-11-09 北京字节跳动网络技术有限公司 Key frame extraction method and device and electronic equipment
CN110399847A * 2019-07-30 2019-11-01 北京字节跳动网络技术有限公司 Key frame extraction method, device and electronic equipment
CN111625683A (en) * 2020-05-07 2020-09-04 山东师范大学 Video abstract automatic generation method and system based on graph structure difference analysis
CN111625683B (en) * 2020-05-07 2023-05-23 山东师范大学 Automatic video abstract generation method and system based on graph structure difference analysis
WO2022001571A1 (en) * 2020-06-29 2022-01-06 南京巨鲨显示科技有限公司 Computing method based on super-pixel image similarity
CN111931811A (en) * 2020-06-29 2020-11-13 南京巨鲨显示科技有限公司 Calculation method based on super-pixel image similarity
CN111931811B (en) * 2020-06-29 2024-03-29 南京巨鲨显示科技有限公司 Calculation method based on super-pixel image similarity
CN112583900A (en) * 2020-12-02 2021-03-30 深圳市互盟科技股份有限公司 Data processing method for cloud computing and related product
CN112463385A (en) * 2020-12-02 2021-03-09 深圳市互盟科技股份有限公司 AI data scheduling and distributing method for cloud computing and related products
WO2023056835A1 (en) * 2021-10-09 2023-04-13 北京字节跳动网络技术有限公司 Video cover generation method and apparatus, and electronic device and readable medium
CN116805316A (en) * 2023-08-25 2023-09-26 深圳市鹏顺兴包装制品有限公司 Degradable plastic processing quality detection method based on image enhancement
CN116805316B (en) * 2023-08-25 2023-11-28 深圳市鹏顺兴包装制品有限公司 Degradable plastic processing quality detection method based on image enhancement

Also Published As

Publication number Publication date
CN108600865B (en) 2019-07-23

Similar Documents

Publication Publication Date Title
CN108600865B (en) A video summary generation method based on superpixel segmentation
CN104063883B (en) A surveillance video summary generation method combining objects and key frames
US11861848B2 (en) System and method for generating trackable video frames from broadcast video
JP5686800B2 (en) Method and apparatus for processing video
CN106446015A (en) Video content access prediction and recommendation method based on user behavior preference
WO2012071696A1 (en) Method and system for pushing individual advertisement based on user interest learning
CN109064484A (en) Crowd movement activity recognition method based on subgroup division and momentum feature fusion
CN107247919A (en) A method and system for acquiring video emotional content
CN107358141B (en) Data identification method and device
Omidyeganeh et al. Video keyframe analysis using a segment-based statistical metric in a visually sensitive parametric space
CN104794446B (en) Human motion recognition method and system based on synthesis description
Yuan et al. Key frame extraction based on global motion statistics for team-sport videos
Martin et al. Optimal choice of motion estimation methods for fine-grained action classification with 3d convolutional networks
Zhu et al. Action recognition in broadcast tennis video using optical flow and support vector machine
CN110188625B (en) Video fine structuring method based on multi-feature fusion
Li et al. An efficient spatiotemporal attention model and its application to shot matching
Sandhu et al. Summarizing Videos by Key frame extraction using SSIM and other Visual Features
CN103971100A (en) Video-based camouflage and peeping behavior detection method for automated teller machine
Mademlis et al. Exploiting stereoscopic disparity for augmenting human activity recognition performance
CN108765384B (en) Saliency detection method combining manifold ranking and an improved convex hull
Turchini et al. Understanding sport activities from correspondences of clustered trajectories
Wang et al. Group Activity Recognition based on Temporal Semantic Sub-Graph Network
Dimou et al. A user-centric approach for event-driven summarization of surveillance videos
Park et al. Extraction of visual information in basketball broadcasting video for event segmentation system
CN111191524A (en) Sports people counting method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20211231

Address after: 710000 No. 29, Xiangyang North Road, Hongqing street, Baqiao District, Xi'an City, Shaanxi Province

Patentee after: Shaanxi Zhisheng Desen Elevator Co.,Ltd.

Address before: 710048 No. 5 Jinhua South Road, Shaanxi, Xi'an

Patentee before: Xi'an University of Technology