CN104410867A - Improved video shot detection method

Info

Publication number: CN104410867A
Application number: CN201410652175.0A
Authority: CN
Inventors: 高腾飞
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2014-11-17
Filing date: 2014-11-17
Publication date: 2015-03-11

Abstract

The invention discloses a video shot detection method. The method comprises a histogram method processing procedure and a frame difference method processing procedure. The histogram method processing procedure comprises: extracting two adjacent frames from the video; computing the histogram intersection in HSV space according to the weight ratio of the three color components; and judging whether a shot change occurs. If a shot change is judged to occur, the process enters the frame difference method processing procedure, which further comprises: extracting two adjacent frames from the shot and partitioning them into non-uniform blocks; computing the pixel difference of each block; comparing the pixel difference with a preset block frame difference threshold in order to flag the block; computing a weighted sum of the flag variables of the blocks; and comparing the sum with a set block weighting threshold, a shot change being deemed to occur if the sum is greater than the block weighting threshold.

Description

Improved shot detection method

Technical field

The present invention relates to video shot segmentation techniques and, more specifically, to an improved shot detection method.

Background art

As video data grows ever more abundant, there is an increasingly urgent need to organize, manage, retrieve, and index the massive amount of video information on the network effectively, and to find the video segments a user is interested in. Video shot segmentation is one of the key problems of content-based video retrieval and enables effective management of the massive, unordered video data on the Internet. Only when this technique and key frame extraction are used to structure a video, build an index and summary for it, and form a linear structure describing the video content can fast browsing and retrieval of video data be realized effectively.

At present, the common approach to shot detection and segmentation is to compute an inter-frame difference value of low-level visual features or motion features between consecutive frames of the video and compare it with a preset or adaptive threshold. If the difference value is greater than the threshold, a shot boundary is declared at that position; otherwise, the consecutive frames are considered to belong to the same shot. It follows that the metric used for the frame difference value, the setting of the threshold, and the best combination of the two are the key points of shot detection and segmentation. Within a single shot, video features change mainly for two reasons: motion of objects or the camera, and changes in lighting. Object or camera motion causes new objects to keep appearing in the shot while old objects keep disappearing; if handled improperly, this is easily confused with a gradual shot transition and causes false shot detections. Lighting changes also occur frequently within a shot: if a frame suddenly becomes brighter, a brightness-based frame difference jumps, and if handled improperly this is detected as a shot cut, which likewise causes a false detection.

According to the characteristics of shot transitions, and based on different visual features of image frames and camera motion features, existing shot segmentation solutions mainly fall into the following classes of methods: pixel-based algorithms, histogram-based algorithms, and so on.

(1) Pixel-based algorithms

A pixel-based algorithm sums the absolute values of the gray-level or luminance differences between corresponding pixels of two adjacent frames to obtain a total frame difference, and compares it with a predetermined threshold to measure how different the frames are. It is the simplest and most basic algorithm for computing a frame difference value.
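The pixel-based comparison described above can be sketched in a few lines of Python. This is only an illustration: the frame representation (a list of rows of integer gray levels), the function names, and the threshold used in the example are assumptions, not details given in the source.

```python
def frame_difference(frame_a, frame_b):
    """Sum of absolute gray-level differences between two frames.

    Frames are assumed to be lists of rows of integer gray levels
    with identical dimensions (a representation chosen for this sketch).
    """
    if len(frame_a) != len(frame_b) or any(
        len(row_a) != len(row_b) for row_a, row_b in zip(frame_a, frame_b)
    ):
        raise ValueError("frames must have the same dimensions")
    return sum(
        abs(a - b)
        for row_a, row_b in zip(frame_a, frame_b)
        for a, b in zip(row_a, row_b)
    )


def is_boundary(frame_a, frame_b, threshold):
    """Declare a shot boundary when the total frame difference exceeds the threshold."""
    return frame_difference(frame_a, frame_b) > threshold


previous = [[0, 10], [20, 30]]
current = [[0, 12], [25, 30]]
total = frame_difference(previous, current)  # |10 - 12| + |20 - 25| = 7
```

Because every pixel contributes to the sum, any motion inside the shot raises the total, which is why this measure is sensitive to object and camera movement.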

(2) Histogram-based algorithms

A histogram intuitively reflects the overall distribution of gray levels (gray-level histogram) or colors (color histogram) in an image. Because of this pronounced global property, histograms are widely used in image processing, and several metrics exist. The basic method computes the histogram difference between adjacent video frames, although the result depends on the kind of histogram used. The basic method has been extended by introducing weight coefficients to compute a weighted histogram distance between two images; other variants compute the histogram intersection of two images or use other distance metrics.

Histogram-based algorithms are the most widely used shot detection and segmentation methods. They are simple to apply and have low computational complexity, and for most videos they generally achieve fairly good results as long as the threshold is set properly.
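Two of the histogram metrics mentioned above, the bin-wise difference and the intersection, can be illustrated with the following Python sketch. The 16-bin gray-level histogram, the normalization to relative frequencies, and all function names are assumptions made for the example.

```python
def gray_histogram(frame, bins=16, levels=256):
    """Normalized gray-level histogram of a frame (list of rows of ints in [0, levels))."""
    counts = [0] * bins
    for row in frame:
        for value in row:
            counts[value * bins // levels] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("frame must contain at least one pixel")
    return [count / total for count in counts]


def histogram_difference(h1, h2):
    """Sum of absolute bin-wise differences: 0 for identical histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima: 1 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


left_bright = [[255, 0], [255, 0]]
right_bright = [[0, 255], [0, 255]]  # same gray levels at different positions
all_dark = [[0, 0], [0, 0]]

same = histogram_intersection(gray_histogram(left_bright), gray_histogram(right_bright))
partial = histogram_intersection(gray_histogram(left_bright), gray_histogram(all_dark))
```

In this example `same` is 1.0 even though the two frames differ at every pixel, because a histogram records how many pixels have each gray level but not where they are.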

Existing shot segmentation algorithms each use different low-level features of the video (visual and motion features such as color, edges, and texture). Although all achieve some effect, each has problems to a greater or lesser extent because of the limitations of the algorithm itself. Detecting video shots with a single method or a single low-level visual feature of the video rarely yields good results.

Pixel-based methods are simple and easy to implement, but they are very sensitive to object and camera motion within a shot: such motion changes the gray level or luminance of most pixels in the image frame and thus leads to false detection of shot boundaries. Histogram-based methods cannot capture the positional information and visual content of an image: two images with completely unrelated content may have the same overall gray-level or color distribution, and two images with the same overall color distribution may contain the same objects and background at different positions. Either case can lead to false detection of shot boundaries. In addition, violent illumination changes (such as flashes) can greatly disturb the shot detection of both kinds of methods.

In view of these shortcomings of the prior art, we improve on existing shot detection algorithms and combine methods that are complementary in theory, proposing an improved shot detection method. The present invention significantly improves the detection rate and accuracy of shot detection without significantly increasing time or computational complexity, and meets the real-time requirements of practical applications.

Summary of the invention

In order to overcome the problems of the prior art, the technical solution of the present invention enables content-based video retrieval to effectively analyze and extract features from images or video input by the user for matching and retrieval, thereby improving the detection rate and accuracy of shot segmentation detection and bringing convenience to the user.

According to one embodiment of the present invention, a shot detection method is provided, comprising: a histogram method processing procedure, which comprises the steps of: extracting two adjacent frames from a video; computing the histogram intersection in HSV space according to the weight ratio of the three color components; and judging whether a shot change occurs; and a frame difference method processing procedure, which is entered if a shot change is judged to occur, and which further comprises the steps of: extracting two adjacent frames from the shot and partitioning them into non-uniform blocks; computing the pixel difference of each of the blocks; comparing the pixel difference with a preset block frame difference threshold in order to flag the block; computing a weighted sum of the flag variables of the blocks; comparing the sum with a set block weighting threshold; and, if the sum is greater than the block weighting threshold, determining that a shot change occurs.

Preferably, judging whether a shot change occurs further comprises: comparing the histogram intersection with a set shot similarity threshold; and, if the histogram intersection is greater than the shot similarity threshold, determining that a shot change occurs.

Preferably, in the histogram method processing procedure, if the last frame of the video has not been reached, the histogram method processing procedure is repeated for the next frame.

Preferably, the flagging further comprises: flagging a block whose pixel difference is greater than the block frame difference threshold as 1, and flagging a block whose pixel difference is not greater than the block frame difference threshold as 0.

Preferably, in the frame difference method processing procedure, if the last frame of the shot has not been reached, the frame difference method processing procedure is repeated for the next frame.

Preferably, a shot with fewer than 20 frames is merged into the previous shot.

Other objects, features, and advantages will be apparent to those skilled in the art from the following detailed description and the accompanying drawings.

Brief description of the drawings

The accompanying drawings illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention. In the drawings:

Fig. 1 shows a flowchart of the improved histogram method processing according to an embodiment of the invention.

Fig. 2 shows a flowchart of the improved frame difference method processing according to an embodiment of the invention.

Detailed description of the embodiments

An improved shot detection method is disclosed according to embodiments of the invention. In the following description, numerous specific details are set forth for purposes of explanation in order to provide a thorough understanding of the embodiments. It will be evident to those skilled in the art, however, that embodiments of the invention can be practiced without these specific details.

The structure of a video can generally be divided into four levels: video sequence, scene, shot, and frame. Specifically, a video sequence is a single video file or a single video clip. A video sequence consists of several scenes. Each scene comprises one or more shots, which may be contiguous or separated. Each shot contains a number of consecutive image frames. A scene is a video segment that expresses certain semantic content and is made up of several semantically related shots, which may be contiguous or separated.

As used herein, the term "video shot" refers to a series of interrelated frames captured continuously by the same camera. It represents a continuous action, is regarded as the basic unit of video retrieval, and usually has a background that is constant or changes slowly. Only after a video sequence has been decomposed into shots can tasks such as key frame extraction, video summarization, and video sequence identification be carried out effectively. As used herein, the term "key frame" refers to the frame or frames that describe the main content of a shot. Shot detection and key frame extraction are important components of video structuring and are key technologies of content-based video retrieval.

A frame is the smallest unit of a video and is a single still image. When a video file is played, the picture frozen at any instant is one frame. Changes in the image content within a shot are generally caused by camera motion, object motion, changes in light source brightness, and the like.

Shot segmentation, also called shot transition detection, is the basis of hierarchical video structuring. It is required to avoid the influence of external factors on shot detection, and it divides the video signal into multiple shots, each composed of a group of uninterrupted frames with the same content. In content-based video retrieval, "content-based" means that the method analyzes the actual content of the video data, such as color, edges, texture, or other information that can be obtained from the video itself, rather than metadata such as keywords, tags, or video descriptions.

Transitions between shots are of two kinds: cuts and gradual transitions. A cut means that one shot changes directly to another without any editing effect; a gradual transition means that one shot changes slowly into another through some editing effect, such as a fade-out, fade-in, or dissolve.

Shot segmentation is the basis of hierarchical video structuring and a basic step in analyzing video sequences and in effectively retrieving and browsing large-scale video databases. It should avoid the influence of external factors on shot detection as far as possible, because how accurately the video is divided into a set of shots directly affects key frame extraction and subsequent processing.

According to the technical solution of the present invention, the improved histogram method is used to process the video data first, and the improved frame difference method is then applied to that result, which effectively improves the detection rate and accuracy of video shot detection. The specific implementation steps of the method are as follows:

a. In HSV color space, the H, S, and V color components are each quantized non-uniformly and each assigned a different weight; the histogram intersection of two adjacent frames is computed and compared with a set threshold to judge whether a shot change occurs.

b. Starting from the result of (a), the inter-frame gray-level or color difference is used for a second pass of shot boundary detection. Using non-uniform block weighting, the pixel difference of every block is computed and compared with a preset block frame difference threshold to flag the block; the flag variables of the blocks are then summed with weights, and the sum is compared with a set block weighting threshold to judge whether a shot change occurs.

c. To account for strong illumination changes, flashes in particular, shots with fewer than 20 frames are merged into the previous shot.

Fig. 1 shows a flowchart of the improved histogram method processing according to an embodiment of the invention. As shown in Fig. 1, in step 11, two adjacent frames are extracted from the video. Then, in step 12, the histogram intersection is computed in HSV space according to the weight ratio of the three color components, in order to judge whether a shot change occurs. The weight ratio of the three color components comes, for example, from the non-uniform quantization of HSV space commonly used in the art, and may be, for example, 9:3:1. Specifically, in step 13, it is judged whether the histogram intersection is greater than the shot similarity threshold. If so, it is judged in step 16 that no shot change occurs. If not, it is judged in step 14 that a shot change occurs. This process is repeated until the last frame of the video is reached.
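As a rough illustration of steps 11 to 14 and 16, the following Python sketch computes a histogram intersection in HSV space and applies the decision of Fig. 1, where an intersection above the similarity threshold is treated as no shot change. The source gives only the example ratio 9:3:1. The bin edges (8 hue levels, 3 saturation levels, 3 value levels), the combined bin index 9H + 3S + V, the frame layout (rows of RGB tuples), the threshold 0.7, and all function names are assumptions made for this sketch, not details of the described method.

```python
import colorsys

# Assumed non-uniform bin edges (not specified in the source).
H_EDGES = (20, 40, 75, 155, 190, 270, 295, 316)  # degrees; hue >= 316 wraps to bin 0
SV_EDGES = (0.2, 0.7)                             # three bins for S and for V


def quantize_hue(h_deg):
    """Map a hue in degrees to one of 8 unequal bins."""
    for index, edge in enumerate(H_EDGES):
        if h_deg < edge:
            return index
    return 0  # [316, 360) belongs to the same reddish bin as [0, 20)


def quantize_sv(x):
    """Map saturation or value in [0, 1] to bin 0, 1, or 2."""
    return sum(x >= edge for edge in SV_EDGES)


def hsv_histogram(frame, weights=(9, 3, 1)):
    """Normalized histogram over the combined index wh*H + ws*S + wv*V (72 bins for 9:3:1)."""
    wh, ws, wv = weights
    counts = [0] * (wh * 7 + ws * 2 + wv * 2 + 1)
    pixels = [pixel for row in frame for pixel in row]
    if not pixels:
        raise ValueError("frame must contain at least one pixel")
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        counts[wh * quantize_hue(h * 360) + ws * quantize_sv(s) + wv * quantize_sv(v)] += 1
    return [count / len(pixels) for count in counts]


def histogram_intersection(h1, h2):
    """Sum of bin-wise minima; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def histogram_shot_change(frame_a, frame_b, similarity_threshold=0.7):
    """Fig. 1 decision: similarity above the threshold means no shot change."""
    similarity = histogram_intersection(hsv_histogram(frame_a), hsv_histogram(frame_b))
    return not similarity > similarity_threshold


red = [[(255, 0, 0)] * 2 for _ in range(2)]
blue = [[(0, 0, 255)] * 2 for _ in range(2)]
changed = histogram_shot_change(red, blue)    # disjoint histograms
unchanged = histogram_shot_change(red, red)   # identical histograms
```

Weighting hue most heavily in the combined index means that two colors differing in hue never share a bin, while small differences in brightness often do, which is one plausible reading of why such a ratio is used.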

After the processing of Fig. 1 is complete, a preliminary shot sequence has been formed. The shot detection method according to the technical solution of the present invention then enters the improved frame difference method processing flow shown in Fig. 2. The processing steps of Fig. 2 reprocess the frames inside each shot of this shot sequence.
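One possible reading of this two-pass structure is sketched below in Python: the first pass turns detected boundaries into a preliminary shot sequence, and the second pass re-examines adjacent frames inside each preliminary shot. The helper names, the (start, end) shot representation with an exclusive end index, and the stand-in comparison functions in the example are all hypothetical.

```python
def detect_boundaries(frames, is_change):
    """Indices i where a change is detected between frames[i - 1] and frames[i]."""
    return [i for i in range(1, len(frames)) if is_change(frames[i - 1], frames[i])]


def split_into_shots(num_frames, boundaries):
    """Turn sorted boundary indices into (start, end) shots, end exclusive."""
    starts = [0] + list(boundaries)
    ends = list(boundaries) + [num_frames]
    return list(zip(starts, ends))


def refine_shots(frames, shots, is_change):
    """Second pass: look for further boundaries inside each preliminary shot."""
    refined = []
    for start, end in shots:
        inner = [i for i in range(start + 1, end) if is_change(frames[i - 1], frames[i])]
        refined.extend(zip([start] + inner, inner + [end]))
    return refined


# Toy "frames" are single numbers; the two lambdas stand in for the
# histogram pass and the block frame difference pass respectively.
frames = [0, 0, 0, 5, 5, 9, 9]
coarse = lambda a, b: abs(a - b) >= 5
fine = lambda a, b: a != b

preliminary = split_into_shots(len(frames), detect_boundaries(frames, coarse))
final = refine_shots(frames, preliminary, fine)
```

Here `preliminary` is `[(0, 3), (3, 7)]` and `final` is `[(0, 3), (3, 5), (5, 7)]`: the second pass finds a boundary that the coarser first pass missed.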

Fig. 2 shows a flowchart of the improved frame difference method processing according to an embodiment of the invention. As shown in Fig. 2, in step 21, two adjacent frames are extracted from the shot and partitioned into non-uniform blocks. Then, in step 22, the pixel difference of the corresponding blocks is computed. In step 23, it is judged whether the pixel difference is greater than the block frame difference threshold. If so, the block is flagged as 1 in step 24. If not, the block is flagged as 0 in step 25. As those skilled in the art will understand, the flag serves only to distinguish the two cases, and other schemes may be used. In step 26, it is judged whether the current block is the last block of the frame. If not, the process moves to the next block and returns to step 22. If it is the last block, a weighted sum of the flag variables is computed in step 28. Then, in step 29, it is judged whether the sum is greater than the block weighting threshold. If so, it is judged in step 30 that a shot change occurs. If the sum is not greater than the block weighting threshold in step 29, it is judged in step 33 that no shot change occurs. Further, in step 31, it is judged whether the last frame of the shot has been reached. If not, the process moves to the next frame and returns to step 21. If so, the process ends.
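The per-frame part of this flow (steps 21 to 30 and 33) might look like the following Python sketch. The source does not specify the block layout, the weights, or the thresholds, so everything concrete here is an assumption: a 3 x 3 grid with edges at 25% and 75% of each dimension, integer weights that favor the center block, the mean absolute gray-level difference as the per-block pixel difference, and the two threshold values.

```python
# Assumed layout and weights (not specified in the source).
EDGES = (0.0, 0.25, 0.75, 1.0)        # narrow border blocks, large center block
WEIGHTS = ((1, 2, 1),
           (2, 8, 2),
           (1, 2, 1))                 # center counts most; weights sum to 20


def block_bounds(size, edges=EDGES):
    """Split range(size) into consecutive (start, end) spans at the given fractions."""
    return [(int(size * lo), int(size * hi)) for lo, hi in zip(edges, edges[1:])]


def block_flags(frame_a, frame_b, block_threshold):
    """Flag each block 1 if its mean absolute pixel difference exceeds the threshold, else 0."""
    flags = []
    for r0, r1 in block_bounds(len(frame_a)):
        flag_row = []
        for c0, c1 in block_bounds(len(frame_a[0])):
            area = (r1 - r0) * (c1 - c0)
            diff = sum(
                abs(frame_a[r][c] - frame_b[r][c])
                for r in range(r0, r1)
                for c in range(c0, c1)
            )
            flag_row.append(1 if area and diff / area > block_threshold else 0)
        flags.append(flag_row)
    return flags


def weighted_flag_sum(flags, weights=WEIGHTS):
    """Weighted sum of the block flags."""
    return sum(w * f for w_row, f_row in zip(weights, flags) for w, f in zip(w_row, f_row))


def block_shot_change(frame_a, frame_b, block_threshold=20, weighting_threshold=6):
    """A shot change is declared when the weighted flag sum exceeds the weighting threshold."""
    return weighted_flag_sum(block_flags(frame_a, frame_b, block_threshold)) > weighting_threshold


blank = [[0] * 8 for _ in range(8)]
center = [[100 if 2 <= r < 6 and 2 <= c < 6 else 0 for c in range(8)] for r in range(8)]
caption = [[100 if r >= 6 else 0 for c in range(8)] for r in range(8)]

center_changed = block_shot_change(blank, center)    # weighted sum 8
caption_changed = block_shot_change(blank, caption)  # weighted sum 4
```

With these assumed weights, a change confined to the bottom band flags three blocks but contributes only 4 to the sum, whereas a change in the center block alone contributes 8, so only the latter crosses the threshold of 6.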

For the case where a flash (a strong illumination change) occurs within a shot, shots with fewer than 20 frames are merged into the previous shot, which better matches the human visual system.
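The merging rule can be sketched as follows. The 20-frame limit comes from the text; the shot representation as (start, end) frame indices with an exclusive end, the function name, and the choice to leave a short first shot untouched (it has no previous shot) are assumptions of this sketch.

```python
def merge_short_shots(shots, min_frames=20):
    """Merge every shot shorter than min_frames into the shot before it.

    shots: ordered list of (start, end) frame indices, end exclusive.
    """
    merged = []
    for start, end in shots:
        if merged and end - start < min_frames:
            # Extend the previous shot to cover the short one.
            merged[-1] = (merged[-1][0], end)
        else:
            merged.append((start, end))
    return merged


# A 5-frame "shot" caused by a flash is absorbed by its predecessor.
shots = [(0, 100), (100, 105), (105, 300)]
cleaned = merge_short_shots(shots)
```

In this example `cleaned` is `[(0, 105), (105, 300)]`.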

The histogram method used for the preliminary processing of the video quantizes the H, S, and V color components non-uniformly in HSV color space, assigns each color component a different weight, and computes the degree of histogram difference between two frames. This better reflects the differences perceived by human vision and offers a degree of perceptual uniformity.

The pixel frame difference method used for the subsequent processing of the video applies weights over non-uniform blocks. This effectively suppresses interference with shot detection from advertisements or captions at the top or bottom of the video and takes full account of the positional information of each pixel of the image frame, so it complements the improved histogram method well.

The present invention effectively improves the detection rate and accuracy of video shot detection, can be used effectively for content-based video retrieval, and gives users the convenience of quickly browsing a video by its detected shots.

The embodiments described above are only preferred embodiments of the present invention and are not intended to limit it. It will be apparent to those skilled in the art that various modifications and changes can be made to the embodiments without departing from the spirit and scope of the invention. The invention is therefore intended to cover all modifications or variations that fall within the scope of the invention as defined by the appended claims.

Claims

1. A shot detection method, comprising:

A histogram method processing procedure, the histogram method processing procedure comprising the steps of:

Extracting two adjacent frames from a video;

Computing the histogram intersection in HSV space according to the weight ratio of the three color components; and

Judging whether a shot change occurs;

A frame difference method processing procedure, which is entered if a shot change is judged to occur, the frame difference method processing procedure further comprising the steps of:

Extracting two adjacent frames from the shot and partitioning them into non-uniform blocks;

Computing the pixel difference of each of the blocks;

Comparing the pixel difference with a preset block frame difference threshold in order to flag the block;

Computing a weighted sum of the flag variables of the blocks;

Comparing the sum with a set block weighting threshold; and

If the sum is greater than the block weighting threshold, determining that a shot change occurs.

2. The method according to claim 1, wherein judging whether a shot change occurs further comprises:

Comparing the histogram intersection with a set shot similarity threshold; and

If the histogram intersection is greater than the shot similarity threshold, determining that a shot change occurs.

3. The method according to claim 1, further comprising:

In the histogram method processing procedure, if the last frame of the video has not been reached, repeating the histogram method processing procedure for the next frame.

4. The method according to claim 1 or 2, wherein the flagging further comprises:

Flagging a block whose pixel difference is greater than the block frame difference threshold as 1, and flagging a block whose pixel difference is not greater than the block frame difference threshold as 0.

5. The method according to claim 1, further comprising:

In the frame difference method processing procedure, if the last frame of the shot has not been reached, repeating the frame difference method processing procedure for the next frame.

6. The method according to claim 1, wherein a shot with fewer than 20 frames is merged into the previous shot.