CN102694966A - Construction method of full-automatic video cataloging system - Google Patents


Info

Publication number: CN102694966A
Authority: CN (China)
Prior art keywords: frame, video, key frame, metadata, shot
Legal status: Granted
Application number: CN2012100548125A
Other languages: Chinese (zh)
Other versions: CN102694966B (en)
Inventor: 蔡靖
Current Assignee: Tianjin University of Technology
Original Assignee: Tianjin University of Technology
Application filed by Tianjin University of Technology
Priority to CN201210054812.5A
Publication of CN102694966A; application granted and published as CN102694966B
Current legal status: Expired - Fee Related

Abstract

The invention discloses a fully automatic video cataloging system and method. The system automatically builds a metadata database for large volumes of unstructured media material, supports manual modification, addition, and refinement of the metadata on that basis, and ultimately enables effective management of massive media assets through the completed metadata database. The invention covers the functions and architecture of the fully automatic video cataloging system, a robust automatic extraction algorithm for video key frames, and a robust automatic extraction algorithm for shot-boundary frames. Software implemented with the proposed method achieves real-time processing on current mainstream computers. The proposed robust key frame and shot-boundary extraction algorithms resist the interference of camera flashes and of special effects that frequently appear in video, including wipes, fade-in/fade-out, and dissolves, so that these disturbances do not cause excessive missed or false detections.

Description

Construction method of a fully automatic video cataloging system
Technical field
The present invention relates to multimedia technology, and in particular to the creation and effective management of metadata for large volumes of media material.
Background art
Modern society produces enormous amounts of information of every kind each day, and that information grows at high speed. In the film and television industry in particular, large quantities of multimedia material are produced and accumulated. Multimedia data, however, is inherently unstructured, and managing it effectively is a pressing problem facing the media industry.
Media Asset Management (MAM) is an end-to-end solution for the comprehensive lifetime management of media of all kinds and of their content (video/audio data, text, charts, and so on), covering the whole process of collection, cataloging, management, transmission, and transcoding of digital media. It fully satisfies the media asset owner's need to collect, preserve, search, edit, and publish all kinds of information, provides media asset users with online content and convenient access, and preserves and exploits media assets safely, completely, efficiently, and at low cost.
The objects of media asset management, multimedia data, are carriers of information of every style, for example text, graphics, images, and sound. Their salient characteristics are: (1) multimedia data (mostly unstructured) is extremely varied, originates from different media, and comes in completely different forms and formats; (2) the data volume is huge; (3) multimedia data has temporal characteristics and version semantics and, unlike traditional numeric and character data, is unstructured information.
Media asset metadata is the information that describes a media asset. The quality, quantity, uniqueness, accessibility, and intelligibility of the metadata determine the success or failure of a media asset management system, so a sound metadata collection and production system is crucial.
The main function of a fully automatic media material cataloging system is precisely to collect and produce multimedia metadata, thereby providing effective information support to a media asset management system. Traditionally, media asset metadata is produced manually: operators watch the video content, annotate it by hand, and then generate the metadata. This is time-consuming, laborious, and inefficient, so the industry urgently needs an automatic metadata production method to replace manual work and raise efficiency. At the same time, the complexity of media asset content and the growing use of certain effects (fade-in/fade-out and wipe transitions, the influence of camera flashes in the footage, and so on) make automatic video segmentation very difficult. How to extract video key frames and shot-boundary frames efficiently and accurately remains an unsolved technical problem.
Automatic shot segmentation: a shot corresponds to one start-stop recording operation of a camera and represents an action that is continuous in time and space within a scene. Many types of transition can occur between shots; the most common is the hard cut, but there are also complex transitions such as fade-in/fade-out, dissolve, and wipe. The shot is the basic unit of material editing, and indexing by shot makes information retrieval convenient.
Automatic key frame extraction: in practice, the start and end of a shot are sometimes far apart, and the content and scene inside the shot can change considerably. Segmenting by shot alone is therefore not enough to capture all the important information in the material. To remedy this inherent shortcoming of shot fragments, the notion of a key frame segment is defined. Key frame segment analysis is based on correlation analysis of the video content (rather than on the physical start-stop action of the camera) and extracts representative key frames according to the complexity of the content, effectively serving as a video summary. A key frame segment is the smallest meaningful unit of a video fragment; the frames inside it have very similar content and hence high information redundancy, so a key frame can represent the information of all frames in its segment.
Summary of the invention
The purpose of the present invention is to address the time-consuming and laborious manual cataloging of large volumes of media material by proposing, with the help of advanced computer video analysis, a construction method for a fully automatic video cataloging system. Such a system saves a great deal of manpower and improves both the quality and the efficiency of video cataloging.
The invention discloses a flexible and efficient fully automatic video cataloging system together with the associated intelligent algorithms. The system automatically analyzes video content, extracts both shot-boundary frames, which arise from the physical start-stop of the camera, and key frames, which represent the content of a video segment, and generates a metadata database of the media material on the basis of the extracted key frames and shot-boundary frames. The proposed robust key frame and shot-boundary extraction algorithms resist the wipes, fade-in/fade-out and dissolve effects, and camera-flash disturbances that frequently occur in video, so these do not cause excessive missed or false detections.
First, the invention provides a functional framework for building the video cataloging system and defines the functional modules of each subsystem.
Second, the invention provides a fully automatic video key frame extraction algorithm that suppresses wipes, fade-in/fade-out and dissolve effects, and camera-flash influence well.
Third, the invention provides a fully automatic shot-boundary frame extraction algorithm with the same suppression properties.
To this end, the construction method of the fully automatic video cataloging system provided by the invention comprises:
1st, the construction of the fully automatic video cataloging system; the system comprises:
1.1st, a media acquisition module, which samples video stream data through a video capture card;
1.2nd, a media analysis module, which applies computer vision techniques to the video stream collected in step 1.1 to perform intelligent analysis, extracts key frames or shot-boundary frames, and displays them on the storyboard for further editing and processing by the user;
1.3rd, a metadata creation, editing, and management module, which, on the basis of the key frames and shot-boundary frames extracted by the media analysis module in step 1.2, forms media material metadata organized by video segment and accompanied by descriptive information; this metadata is the final output of the cataloging system and the main basis for the effective management of unstructured media assets; metadata editing and management comprise creation and modification of the metadata model file, metadata display, metadata editing, metadata cache management, and metadata saving;
1.3.1, the metadata model file, which defines the content and organization of the metadata produced when the cataloging system structurally analyzes the media material, including segment description, content classification, video author, and cataloger; the model file is stored as XML, the system provides a default file, and the user can modify it manually as needed;
1.3.2, metadata editing, comprising:
◆ key frame and shot-boundary frame editing: the user adds or deletes key frames or shot-boundary frames shown on the storyboard as the situation requires;
◆ video segment editing: starting from the key frames or shot-boundary frames extracted by the system, the user merges the content of two or more of them into video segments that are independent and meaningful in content, and adds the corresponding information according to the format defined by the metadata model file;
1.3.3, metadata display, comprising:
◆ key frame or shot-boundary frame display: the system shows the automatically extracted key frames or shot-boundary frames on the storyboard;
◆ segment content display: the system shows the edited segment information on the user interface;
1.3.4, metadata cache management, comprising:
◆ during automatic video and audio analysis, frequent hard disk access is avoided and results are written to disk only when the analysis completes;
◆ system RAM is used as the cache, with enough memory to support several hours of continuous metadata analysis;
◆ when a long continuous analysis exhausts memory, the system page file is used as a shared mapped file;
1.3.5, metadata saving, comprising:
◆ automatically produced or manually revised metadata is saved to the storage medium as an XML document;
◆ key frames and shot-boundary frames are saved to a designated folder;
1.4th, a configuration management module: the system stores configuration information in a configuration file in XML form, which the user can edit by hand or through the user interface; the file is read at system initialization to configure the software modules; the configuration information comprises:
● input device configuration: video capture card or media file on disk;
● algorithm selection: key frame or shot-boundary frame extraction;
● key frame extraction parameters, comprising:
■ key frame extraction sensitivity;
■ maximum or minimum frame gap: the interval between adjacent key frames is constrained to be no less than, or no more than, a specified interval.
2nd, robust automatic extraction of video key frames from the media material, comprising:
2.1st, preprocessing, with two parts: the input video is first spatially downsampled within each frame to reduce algorithmic complexity; inter-frame correlation is then used to obtain binarized frame-difference images, which are used to filter out a set of candidate key frames;
2.2nd, information extraction: the candidate key frame set from step 2.1 is processed further; histogram features are extracted, and divergence measures computed after a Fourier transform of the histograms, both between adjacent frames and between the frame under test and the previous key frame, are used to judge and extract key frames; within this step, fade-in/fade-out effects, wipe effects, and camera-flash influence are separately detected and suppressed;
2.2.1, fade-in/fade-out segments are detected by estimating the rate of change of the inter-frame luminance signal, which varies linearly within such segments;
2.2.2, wipe segments are detected by exploiting the spatial regularity of the inter-frame wipe region: both the wipe region between frames and the spatial evolution of the wiped region over the whole segment are examined;
2.2.3, camera-flash segments are detected by exploiting the short duration of a flash: the luminance difference between adjacent frames is large while the difference across the flash, between the frames on either side, is small;
2.3rd, information analysis: based on the feature information and special-scene detection results from step 2.2, a final comprehensive analysis selects the key frames that best represent the segment content;
2.3.1, for a fade-in/fade-out frame sequence, the last frame, where the effect completes, is output as the key frame;
2.3.2, for a wipe frame sequence, the last frame, where the effect completes, is output as the key frame;
2.3.3, frames affected by camera flash are filtered directly out of the candidate sequence;
2.3.4, for the remaining ordinary candidate key frame sequences, within each segment formed by a run of consecutive candidate frames, the frame whose histogram-Fourier divergence from the previous key frame is largest is output as the key frame.
3rd, robust automatic extraction of shot-boundary frames from the media material, comprising:
3.1st, preprocessing, with two parts: the input video is first spatially downsampled within each frame to reduce algorithmic complexity; inter-frame correlation is then used to obtain binarized frame-difference images, which are used to filter out a set of candidate shot-boundary frames;
3.2nd, information extraction: the candidate shot-boundary frame set from step 3.1 is processed further: histogram features are extracted, and a decayed-average computation produces the decayed histogram mean and the statistical variance from the start frame of the current shot up to the current frame; the inter-frame χ² histogram difference between the current frame and the decayed histogram mean is then computed and the statistical variance updated; a dynamic decision threshold derived from this continuously updated variance is used to judge shot-boundary frames; within this step, fade-in/fade-out effects, wipe effects, and camera-flash influence are separately detected and suppressed: fades are detected by estimating the rate of the linear inter-frame luminance change within such segments; wipes are detected from the spatial regularity of the inter-frame wipe region and its evolution over the whole segment; flashes are detected from the large adjacent-frame luminance difference combined with the small difference between the frames on either side of the flash;
3.3rd, information analysis: based on the feature information and special-scene detection results from step 3.2, a final comprehensive analysis judges the shot-boundary frames, including the special cases of fade-in/fade-out, wipe, and camera flash; for a detected fade sequence, the last frame is output; for a detected wipe sequence, the last frame is output; frames affected by camera flash are filtered directly out of the candidate sequence; for the remaining ordinary candidates, a frame is judged a shot boundary only if it satisfies both of the following conditions:
● mutation condition: the χ² difference between the current frame and the decayed histogram mean exceeds a mutation threshold determined by the statistical variance;
● smoothness condition: the χ² difference between the current frame and the decayed histogram mean exceeds a steadiness threshold determined by the χ² difference between the current frame and the following frame.
Advantages and positive effects of the present invention:
The invention uses advanced computer video analysis to analyze video content automatically and in real time and, according to the user's needs, extracts key frames and shot-boundary frames. On this basis it supports manual editing and refinement, building a content-based metadata database of the media material and supplying the downstream media asset management system with rich metadata. The disclosed robust key frame and shot-boundary extraction algorithms suppress the interference of special scenes such as fade-in/fade-out, dissolve, and wipe effects and camera flashes. Because raw video content is unstructured, conventional manual segmentation and cataloging waste time and effort; adopting the solution of the present invention saves substantial cost and social resources.
Description of drawings
Fig. 1 is the functional block diagram of the system architecture of the present invention.
Fig. 2 is the functional block diagram of a specific embodiment of the present invention.
Fig. 3 is an example of a fade-in/fade-out effect.
Fig. 4 is an example of a wipe effect.
Fig. 5 is an example of camera-flash influence.
Fig. 6 is an example of the metadata model file.
Fig. 7 is an example of the metadata configuration file.
Fig. 8 is the system flow of the robust key frame extraction algorithm.
Fig. 9 is the flow of the preprocessing submodule for key frame or shot-boundary frame extraction.
Fig. 10 is the flow of the fade-in/fade-out detection submodule.
Fig. 11 is the flow of the wipe detection submodule.
Fig. 12 is the flow of the camera-flash detection submodule.
Fig. 13 is the system flow of the robust shot-boundary frame extraction algorithm.
Detailed description of the embodiments
One, the fully automatic video cataloging system framework, shown in Fig. 1, comprises:
1. Media acquisition module
This subsystem samples video stream data into computer memory through a video capture card and its driver and passes the data to the media analysis module.
2. Media analysis module
This module selects, according to the configured options, the appropriate processing algorithms and their control parameters for the collected video stream, performs intelligent analysis and key frame or shot-boundary frame extraction, and displays the extracted frames together with their timecode information in the storyboard area of the user interface for further editing and processing by the user.
3. Media metadata creation, editing, and management module
Media metadata management comprises creation and modification of the metadata model file, metadata display, metadata editing, metadata cache management, and metadata saving.
The metadata model file defines the content and organization of the metadata produced when the cataloging system structurally analyzes the media material (segment description, content classification, video author, cataloger, and so on). It is stored as XML; the system provides a default file, which the user can modify by hand as needed. Fig. 6 shows an example of a metadata model file.
Metadata editing comprises:
◆ key frame and shot-boundary frame editing: the system displays each key frame or shot-boundary frame on a dialog control, and operations on this control modify or delete key frames or shot-boundary frames;
◆ video segment editing: through the dialog controls that display key frames or shot-boundary frames, the system lets the user compose video segments from them and provides metadata editing for each segment; the metadata is shown in the metadata list control.
Metadata display comprises:
◆ key frame or shot-boundary frame display: the system shows each key frame or shot-boundary frame on its own dialog control, and all these controls are attached to a storyboard view;
◆ segment content display: the system shows the edited segment information (segment description, content classification, video author, cataloger, and so on) in the metadata list control, whose content is stored on the storage medium as an XML metadata file.
Metadata cache management comprises:
◆ during automatic audio-visual analysis, frequent hard disk access is avoided and results are written to disk only when the analysis completes;
◆ system RAM is used as the cache, with enough memory to support several hours of continuous metadata analysis;
◆ when a long continuous analysis exhausts memory, the system page file is used as a shared mapped file.
Metadata saving comprises:
◆ automatically produced or manually revised metadata is saved to the storage medium as an XML document;
◆ key frames and shot-boundary frames are saved to a designated folder.
4. Configuration management module
The fully automatic video cataloging system stores configuration information in a configuration file in XML form. The system provides a user interface that displays and edits the configuration as dialog boxes, and the user may also modify the configuration file directly by hand. The file is read at system initialization to configure the software modules. Fig. 7 shows an example configuration file. The configuration information comprises:
● input device configuration: video capture card or media file on disk;
● algorithm selection: key frame or shot-boundary frame extraction;
● key frame extraction parameters, comprising:
■ key frame extraction sensitivity;
■ maximum or minimum frame gap: the interval between adjacent key frames is constrained to be no less than, or no more than, a specified interval.
Two, the robust video key frame extraction method provided by the invention comprises:
1. Processing flow design
The key frame extraction method is illustrated in Fig. 8. The overall method divides into three parts: a preprocessing submodule, an information extraction submodule, and an information analysis submodule.
◆ The preprocessing submodule first applies simple, fast preprocessing to each input video frame, including spatial downsampling and temporal pre-analysis, and filters out a rough set of candidate key frames.
◆ The information extraction submodule has two functions:
● First, it processes the candidate key frame set from the previous step further, extracts feature information, filters the candidate frame sequence again using the intra-frame and inter-frame features of each frame, and records each run of consecutive candidate frames together with its features as a single candidate segment for later analysis and key frame extraction.
● Second, it performs dedicated processing and detection for special scenes, comprising:
■ camera-flash effect detection;
■ fade-in/fade-out and dissolve effect detection;
■ inter-frame wipe effect detection.
◆ The information analysis submodule, building on the detection and processing above, applies a different strategy to each case and extracts the final video key frames.
1.1. Preprocessing submodule: obtaining candidate segment start frames
The preprocessing subsystem, illustrated in Fig. 9, comprises two steps:
1) Spatial downsampling to improve processing efficiency.
To improve efficiency without harming the analysis, the input video stream is first downsampled. In the implementation, an input stream of resolution 720 × 576 is downsampled 8 × 8, yielding a 90 × 72 video stream.
2) Pre-analysis. Key frames identify and describe the appearance of new scenes or new content in the video, so the many frames whose content is highly similar to the last key frame must be filtered out in this step. To this end:
● First define a binary template M(i, j) whose size equals the downsampled frame size, i.e. 90 × 72, with every point initialized to 0.
● For each frame, consider its luminance signal I_t and compute the two binarized frame-difference images below, using a morphological open-close operator OC(·) for noise reduction and detail removal. The threshold is T = 10.
■ Adjacent-frame binarized difference image:
d_{t,t-1}(i,j) = 0 if |I_t(i,j) - I_{t-1}(i,j)| < T, and 1 otherwise
d'_{t,t-1}(i,j) = OC(d_{t,t-1}(i,j))
■ Binarized difference image between the current frame I_t and the previous key frame I_k (if no key frame has been detected yet, an all-black frame stands in for the previous key frame):
d_{t,k}(i,j) = 0 if |I_t(i,j) - I_k(i,j)| < T or |I_{t-1}(i,j) - I_k(i,j)| < T, and 1 otherwise
d'_{t,k}(i,j) = OC(d_{t,k}(i,j))
● Update the template M(i, j):
M_new(i,j) = M_old(i,j) ∪ d'_{t,k}(i,j)
● Count the 1-pixels in the template M and in d_{t,t-1} and d_{t,k}:
N(M) = Σ_{i,j} M(i,j)
N(d_{t,t-1}) = Σ_{i,j} d_{t,t-1}(i,j)
N(d_{t,k}) = Σ_{i,j} d_{t,k}(i,j)
When either of the following conditions holds, a new content segment is considered to have begun, and all frames from the current frame on are marked as candidate frames until a new key frame is selected (a sketch of this step follows below):
1) N(M) > size/5 and N(d_{t,t-1}) - N(d_{t,k}) > size/10
2) N(d_{t,t-1}) > size/20 and N(d_{t,t-1}) - N(d_{t,k}) > size/10
Here size is the number of pixels in a video frame.
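For concreteness, this pre-analysis step can be written in C++ roughly as follows. This is a minimal sketch, assuming 8-bit luma planes stored row-major in std::vector<uint8_t>; the morphological open-close cleanup is omitted, and the function names (binaryDiff, candidateStart) are illustrative, not taken from the patent.

    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    using Mask = std::vector<uint8_t>;   // binary image, one byte per pixel

    // Binarized frame difference: 1 where |a - b| >= T, else 0 (T = 10 as in the text).
    Mask binaryDiff(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b, int T = 10) {
        Mask d(a.size());
        for (size_t i = 0; i < a.size(); ++i)
            d[i] = std::abs(int(a[i]) - int(b[i])) < T ? 0 : 1;
        return d;                        // an open-close operator would clean this up
    }

    size_t countOnes(const Mask& m) {
        size_t n = 0;
        for (uint8_t v : m) n += v;
        return n;
    }

    // Decide whether the current frame starts a new candidate segment.
    // M is the accumulated template; prev/key are the previous frame and last key frame.
    bool candidateStart(Mask& M, const std::vector<uint8_t>& cur,
                        const std::vector<uint8_t>& prev, const std::vector<uint8_t>& key) {
        const size_t size = cur.size();  // pixels per (downsampled) frame
        Mask dtt1 = binaryDiff(cur, prev);
        Mask dtk  = binaryDiff(cur, key);
        for (size_t i = 0; i < size; ++i)   // M_new = M_old OR d'_{t,k}
            M[i] |= dtk[i];
        size_t nM = countOnes(M), n1 = countOnes(dtt1), nk = countOnes(dtk);
        bool c1 = nM > size / 5  && n1 > nk + size / 10;   // condition 1)
        bool c2 = n1 > size / 20 && n1 > nk + size / 10;   // condition 2)
        return c1 || c2;
    }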
1.2. Information extraction submodule
On the basis of the filtered candidate video segments, the information extraction subsystem extracts further feature information and, at the same time, detects and suppresses the special scenes and effects that disturb key frame extraction: camera-flash influence, fade-in/fade-out and dissolve effects, and wipe effects.
1) Basic information extraction and candidate key frame sequence recording
● Basic information extraction
The raw basic information is based on the input YUV video signal.
First, a histogram analysis of the Y, U, and V components of every frame yields the 256-bin color histograms h_y, h_u, h_v.
Extended to a frame sequence, this gives a histogram map h(t, i), where t is the frame number and i ∈ [0, 255] is the luminance or chrominance component value. With this, define the following.
■ For a frame sequence of length N, the histogram average is defined as:
Ave = (1/N) Σ_{t=0}^{N-1} h(t, i)
where h_y(i), h_u(i), h_v(i) are the Y, U, and V component histograms respectively.
■ The Fourier transforms of the color histogram components are defined as:
h_y(i) ↔ H_y(jω)
h_u(i) ↔ H_u(jω)
h_v(i) ↔ H_v(jω)
■ The histogram difference measure between frame m and frame n is defined as:
d_{m,n} = Σ_{ω=0}^{π/4} ( |H_y^m(jω) - H_y^n(jω)|² + |H_u^m(jω) - H_u^n(jω)|² + |H_v^m(jω) - H_v^n(jω)|² )
As the formula shows, only the low-frequency components of the frequency domain enter the difference measure. This serves two purposes (a sketch of the computation follows below):
◆ the modulus of a frequency-domain signal is insensitive to shifts along the axis, which here means insensitivity to global brightness shifts of the YUV signal, so camera flashes are already partly suppressed;
◆ analyzing only the low-frequency components of the frequency-domain information is equivalent to smoothing the original signal, which suppresses noise.
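A minimal C++ sketch of this histogram-Fourier difference measure. Following the shift-insensitivity rationale above, it compares the magnitudes of the low-frequency spectra; the naive DFT, the kMax = 32 cutoff (ω ≤ π/4 over 256 bins), and all names are illustrative assumptions, not the patent's own code.

    #include <cmath>
    #include <complex>
    #include <cstdint>
    #include <vector>

    const double kPi = 3.14159265358979323846;

    // 256-bin histogram of one component plane (Y, U, or V).
    std::vector<double> histogram(const std::vector<uint8_t>& plane) {
        std::vector<double> h(256, 0.0);
        for (uint8_t v : plane) h[v] += 1.0;
        return h;
    }

    // Naive DFT of a histogram, keeping only the low-frequency bins
    // (omega in [0, pi/4] corresponds to k in [0, 32] for 256 samples).
    std::vector<std::complex<double>> lowFreqDFT(const std::vector<double>& h, int kMax = 32) {
        std::vector<std::complex<double>> H(kMax + 1);
        for (int k = 0; k <= kMax; ++k)
            for (int i = 0; i < 256; ++i)
                H[k] += h[i] * std::polar(1.0, -2.0 * kPi * k * i / 256.0);
        return H;
    }

    // Difference measure between frames m and n: summed squared differences of the
    // low-frequency spectral magnitudes of their Y, U, and V histograms.
    double histFourierDiff(const std::vector<std::vector<double>>& hm,   // {hy, hu, hv} of m
                           const std::vector<std::vector<double>>& hn) { // {hy, hu, hv} of n
        double d = 0.0;
        for (int c = 0; c < 3; ++c) {
            auto Hm = lowFreqDFT(hm[c]), Hn = lowFreqDFT(hn[c]);
            for (size_t k = 0; k < Hm.size(); ++k) {
                double diff = std::abs(Hm[k]) - std::abs(Hn[k]);  // moduli: shift-insensitive
                d += diff * diff;
            }
        }
        return d;
    }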
● Choosing the candidate key frame sequence
The histogram difference measure is used to choose the key frame candidate frames.
To make the detected key frames more representative, the detector finds a continuous key frame sequence and then chooses the most representative frame within that sequence as the key frame. The detection therefore has two steps:
● detection of the key frame segment start frame;
● detection of the key frame segment end frame.
Detection of the key frame segment start frame:
1. The first frame of the video is defined to be a key frame.
2. The histogram average Ave of the frames from the previous key frame up to the previous frame is maintained as the reference histogram.
3. The difference measures of the current frame against the previous frame and against the previous key frame, d_{t,t-1} and d_{t,k}, are computed. If d_{t,k} > T, the scene has changed substantially, and marking of key frame candidate frames begins: a new content segment has started.
Detection of the key frame segment end frame:
Detection of the first key frame candidate frame means a new video segment has begun. It remains to determine where the sequence ends, i.e. the key frame segment end frame. All frames between the segment start frame and end frame are marked as key frame candidates, and the final key frame is chosen from this sequence. The end frame is judged as follows (see the sketch after this list):
1. After the segment start frame, the reference histogram is first updated by the averaging operation, and each frame's difference measures against the previous frame and the reference histogram, d_{t,t-1} and d_{t,k}, are computed.
2. If d_{t,k} > T, another key frame segment has begun: the old segment is analyzed, the best key frame is output, and the new segment start frame is recorded.
3. If d_{t,t-1} < T, the video has become steady, and this frame is the segment end frame.
4. Otherwise, if this frame is more than 30 frames past the key frame, the segment is ended by force, the segment is analyzed, and the key frame is produced.
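The start/end logic condenses into a small decision function, sketched below under the assumption that the difference measures d_{t,t-1}, d_{t,k} and the threshold T are computed elsewhere (e.g. by histFourierDiff above); the Action enum and all names are illustrative.

    // One step of the key-frame segment state machine described above.
    // dPrev = d(t, t-1), dKey = d(t, last key frame); T is the detection
    // threshold; framesSinceKey counts frames since the last key frame.
    enum class Action { None, StartSegment, EndSegment, ForceEnd };

    Action keyFrameStep(double dPrev, double dKey, double T,
                        bool inSegment, int framesSinceKey) {
        if (!inSegment) {
            if (dKey > T) return Action::StartSegment;  // scene changed: new candidate segment
            return Action::None;
        }
        if (dKey > T)  return Action::StartSegment;     // next segment already starting:
                                                        // analyze the old one first
        if (dPrev < T) return Action::EndSegment;       // video settled: close the segment
        if (framesSinceKey > 30) return Action::ForceEnd; // cap the segment at 30 frames
        return Action::None;
    }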
2) Special scene detection
Some special scenes in a video stream strongly affect key frame extraction. In these scenes the inter-frame change is large while the content change is small, so conventional detectors tend to over-detect. The robust key frame detection proposed here treats these special cases explicitly with dedicated processing and achieves much better suppression.
● Fade-in/fade-out and dissolve detection:
Fade-in/fade-out and dissolve are common effects in video streams; a typical fragment is illustrated in Fig. 3. The two effects are formed on similar principles. A fade-in means the frame content emerges gradually over time out of an all-black frame; a fade-out is the opposite, with the content receding gradually until it disappears into an all-black frame. A dissolve superimposes two video segments of different content with time-varying weights: one segment recedes gradually until it disappears while the other emerges gradually until it is fully visible.
In essence the effect is a superposition of two frames in varying proportion, with the ratio k changing linearly in time. The detection process is illustrated in Fig. 10.
For fade and dissolve detection, consider the summed luminance signal of a video frame, defined as
Y_t = Σ_{i,j} y_t(i,j)
Within a fade effect frame sequence, adjacent frames satisfy
y_t(i,j) = k · y_{t-1}(i,j) + (1 - k) · y_{t+1}(i,j),  k ∈ [0, 1]
Using the frame luminance sums, the proportionality coefficient can be estimated as
k = (Y_{t+1} - Y_t) / (Y_{t+1} - Y_{t-1})
With the estimated coefficient k, define the frame difference measure and decision thresholds:
frame difference measure: d = Σ_{i,j} | k · y_{t-1}(i,j) - y_t(i,j) + (1 - k) · y_{t+1}(i,j) |
decision thresholds: t_1 = Σ_{i,j} | y_{t-1}(i,j) - y_t(i,j) |
t_2 = Σ_{i,j} | y_{t+1}(i,j) - y_t(i,j) |
When d < k·t_1 and d < k·t_2, the frame is marked as a fade frame (a sketch follows below).
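A sketch of this fade test in C++, assuming three consecutive 8-bit luma planes of equal size; the guard against a zero denominator and the rejection of k outside [0, 1] are added assumptions not spelled out in the text.

    #include <cmath>
    #include <cstdint>
    #include <numeric>
    #include <vector>

    double lumaSum(const std::vector<uint8_t>& y) {          // Y_t = sum of all luma values
        return std::accumulate(y.begin(), y.end(), 0.0);
    }

    // Fade/dissolve test for the middle frame of the triple (t-1, t, t+1):
    // estimate the mixing ratio k from the luma sums, then check that the
    // frame really is a k-weighted blend of its two neighbours.
    bool isFadeFrame(const std::vector<uint8_t>& prev,
                     const std::vector<uint8_t>& cur,
                     const std::vector<uint8_t>& next) {
        double Yp = lumaSum(prev), Yc = lumaSum(cur), Yn = lumaSum(next);
        if (Yn == Yp) return false;                          // no overall ramp: not a fade
        double k = (Yn - Yc) / (Yn - Yp);
        if (k < 0.0 || k > 1.0) return false;
        double d = 0.0, t1 = 0.0, t2 = 0.0;
        for (size_t i = 0; i < cur.size(); ++i) {
            double blend = k * prev[i] + (1.0 - k) * next[i];
            d  += std::abs(blend - cur[i]);                  // residual of the blend model
            t1 += std::abs(double(prev[i]) - cur[i]);
            t2 += std::abs(double(next[i]) - cur[i]);
        }
        return d < k * t1 && d < k * t2;                     // decision rule from the text
    }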
● Camera-flash detection and suppression:
1. Flash frame detection.
A flash fragment example is illustrated in Fig. 5. A flash strongly affects the luminance and chrominance of the image. During key frame or shot-boundary detection, flash-affected frames must therefore be detected so that they are not declared key frames. The detection process is illustrated in Fig. 12. To this end,
the inter-frame binarized difference image is defined, with T = 10:
D_{m,n}(i,j) = 0 if |y_m(i,j) - y_n(i,j)| < T, and 1 otherwise
A morphological open-close operator then removes noise and detail:
D'(i,j) = OC(D(i,j))
The inter-frame difference image measure is defined as
N(D'(i,j)) = Σ_{i,j} D'(i,j)
i.e. the number of 1-pixels in the difference image.
When frame t satisfies either of the following conditions, it is marked as a flash-affected frame (a sketch follows below):
a) N( D'_{t-1,t}(i,j) ∩ ¬D'_{t-1,t+1}(i,j) ) > min( size/5, N(D'_{t-1,t+1}(i,j)) )
b) N( D'_{t,t+1}(i,j) ∩ ¬D'_{t-1,t+1}(i,j) ) > min( size/5, N(D'_{t-1,t+1}(i,j)) )
Here size is the number of pixels in a frame, and ¬ denotes the binary complement.
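A sketch of the flash test in C++. The helpers repeat those of the preprocessing sketch (open-close cleanup again omitted), the intersection with the complement is evaluated pixel by pixel, and all names are illustrative.

    #include <algorithm>
    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    using Mask = std::vector<uint8_t>;

    Mask binaryDiff(const std::vector<uint8_t>& a, const std::vector<uint8_t>& b, int T = 10) {
        Mask d(a.size());
        for (size_t i = 0; i < a.size(); ++i)
            d[i] = std::abs(int(a[i]) - int(b[i])) < T ? 0 : 1;
        return d;
    }

    size_t countOnes(const Mask& m) {
        size_t n = 0;
        for (uint8_t v : m) n += v;
        return n;
    }

    // Flash test for frame t: a large change against both neighbours that is
    // absent between the two neighbours themselves (the change "reverts").
    bool isFlashFrame(const std::vector<uint8_t>& prev,    // frame t-1
                      const std::vector<uint8_t>& cur,     // frame t
                      const std::vector<uint8_t>& next) {  // frame t+1
        const size_t size = cur.size();
        Mask dPC = binaryDiff(prev, cur);    // D'_{t-1,t}
        Mask dCN = binaryDiff(cur, next);    // D'_{t,t+1}
        Mask dPN = binaryDiff(prev, next);   // D'_{t-1,t+1}
        size_t a = 0, b = 0;
        for (size_t i = 0; i < size; ++i) {
            if (dPC[i] && !dPN[i]) ++a;      // condition a): changed at t, reverted by t+1
            if (dCN[i] && !dPN[i]) ++b;      // condition b)
        }
        size_t bound = std::min(size / 5, countOnes(dPN));
        return a > bound || b > bound;       // either condition marks a flash frame
    }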
2. Flash suppression
Besides explicit flash detection, the analysis suppresses other global brightness jumps by the following means:
a. the shift-insensitivity of the Fourier-transform modulus suppresses global amplitude changes;
b. an adaptive threshold suppresses flash effects: when a stretch of video changes too strongly, the decision threshold is raised adaptively to damp its influence.
● Wipe frame detection:
A wipe fragment example is illustrated in Fig. 4. A wipe typically spans 5 to 10 consecutive frames; each frame differs strongly from its predecessor in one part of the image and little elsewhere, and the strongly differing part moves regularly, for example left to right or top to bottom. The regularity of this difference region's movement is what makes the wipe interval detectable.
The wipe detection process, illustrated in Fig. 11, consists of the following four steps:
Step 1: Define a wipe template Mask that records the wiped region across the whole wipe fragment. The template has the same size as a video frame and is initialized to 0:
Mask(i,j) = 0, where i, j are the row and column indices.
Step 2: Use the inter-frame binarized difference image to detect the inter-frame wipe region:
2.1 Compute the inter-frame binarized difference image and clean it with the morphological open-close operator, obtaining the binary image D'(i,j).
2.2 Choose the largest connected region C(i,j) of the binary image D'(i,j) as the candidate wipe region.
2.3 When the template and this largest connected region satisfy both of the following conditions, judge the frame to be a wipe frame:
◆ the inter-frame wipe candidate region is large enough: N(C(i,j)) > size/15;
◆ the newly wiped region is large enough: N( ¬Mask(i,j) ∩ C(i,j) ) > size/20.
If no wipe candidate region is detected, go to step 4.
Step 3: If step 2 detected a wipe region, update the template and return to step 2 to process the next frame:
Mask_new(i,j) = Mask_old(i,j) ∪ C(i,j)
Otherwise, jump to step 4.
Step 4: If several consecutive wipe frames have been detected and both of the following conditions hold, judge the frame sequence to be a wipe fragment and designate its last frame as the key frame (a sketch of steps 1 to 4 follows below):
◆ N(Mask(i,j)) > size/3;
◆ the number of consecutive wipe frames is greater than 3.
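Steps 1 to 4 can be sketched as follows, assuming the binarized, cleaned difference image D' is supplied per frame as a Mask (one byte per pixel, row-major, as in the earlier sketches); the 4-connected BFS for the largest region and all names are illustrative.

    #include <cstdint>
    #include <queue>
    #include <vector>

    using Mask = std::vector<uint8_t>;

    // Largest 4-connected component of a binary image (w x h, row-major).
    Mask largestComponent(const Mask& d, int w, int h) {
        Mask best(d.size(), 0), seen(d.size(), 0);
        size_t bestSize = 0;
        for (int s = 0; s < w * h; ++s) {
            if (!d[s] || seen[s]) continue;
            Mask comp(d.size(), 0);
            size_t n = 0;
            std::queue<int> q;
            q.push(s); seen[s] = 1;
            while (!q.empty()) {
                int p = q.front(); q.pop();
                comp[p] = 1; ++n;
                int x = p % w, y = p / w;
                const int  nb[4] = { p - 1, p + 1, p - w, p + w };
                const bool ok[4] = { x > 0, x < w - 1, y > 0, y < h - 1 };
                for (int k = 0; k < 4; ++k)
                    if (ok[k] && d[nb[k]] && !seen[nb[k]]) { seen[nb[k]] = 1; q.push(nb[k]); }
            }
            if (n > bestSize) { bestSize = n; best = comp; }
        }
        return best;
    }

    // One wipe-detection step: returns true if the difference image d of the
    // current frame looks like a wipe region, updating the template (step 3).
    bool wipeStep(Mask& mask, const Mask& d, int w, int h) {
        const size_t size = size_t(w) * h;
        Mask C = largestComponent(d, w, h);          // candidate wipe region (step 2.2)
        size_t nC = 0, nNew = 0;
        for (size_t i = 0; i < size; ++i) {
            nC += C[i];
            if (C[i] && !mask[i]) ++nNew;            // region not wiped before this frame
        }
        if (nC > size / 15 && nNew > size / 20) {    // conditions of step 2.3
            for (size_t i = 0; i < size; ++i) mask[i] |= C[i];  // Mask_new = Mask_old OR C
            return true;
        }
        return false;
    }
    // Step 4: after more than 3 consecutive wipe frames with
    // countOnes(mask) > size/3, the run is a wipe fragment and its
    // last frame becomes the key frame.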
1.3. Information analysis submodule
The information analysis subsystem analyzes the recorded key frame fragments and produces the final, best key frames. Candidate frames of different natures, as detected above, are handled with different strategies:
4.1 For a detected fade frame sequence, the last frame is output as the key frame.
4.2 For a detected wipe frame sequence, the last frame is output as the key frame.
4.3 Frames affected by camera flash are filtered directly out of the candidate sequence.
4.4 For the remaining ordinary candidate key frame sequences, within each run of consecutive frames the frame with the largest histogram difference from the previous key frame is chosen, so the output key frame I_k satisfies:
d_{k,last} = max_{t∈D} ( d_{t,last} )
where k is the key frame index, D is the range of the consecutive candidate key frame sequence, and t is the index variable.
Three, the robust shot-boundary frame detection method provided by the invention comprises:
1. Processing flow design
The shot-boundary frame extraction system is illustrated in Fig. 13. The overall method divides into three parts: a preprocessing submodule, an information extraction submodule, and an information analysis submodule.
◆ The preprocessing submodule first applies simple, fast preprocessing to each input video frame, including spatial downsampling and temporal pre-analysis, and filters out a rough set of candidate frames.
◆ The information extraction submodule comprises two functions:
● First, it processes the candidate set from the previous step further, extracts feature information, filters the candidate frame sequence again using the intra-frame and inter-frame features of each frame, and records each run of consecutive candidate frames together with its features as a single candidate segment for later analysis.
● Second, it performs dedicated processing and detection for special scenes, comprising:
■ camera-flash effect detection;
■ fade-in/fade-out and dissolve effect detection;
■ inter-frame wipe effect detection.
◆ The information analysis submodule, building on the detection and processing above, applies a different strategy to each case and extracts the final shot-boundary frames.
1.1. Preprocessing submodule: obtaining candidate segment start frames
The function and method of the preprocessing subsystem are identical to those of key frame extraction and are not repeated here; see the description above.
Through the preprocessing subsystem, the video resolution is reduced and the candidate shot-boundary frame sequence is filtered out.
1.2. Information extraction submodule
On the basis of the filtered candidate video segments, the information extraction subsystem extracts further feature information and, at the same time, detects and suppresses the special scenes and effects that disturb the extraction: camera-flash influence, fade-in/fade-out and dissolve effects, and wipe effects.
1) Basic information extraction
● Basic information extraction
Two types of frame difference information are considered, together with a dynamic decision threshold:
a. Inter-frame pixel gray-level difference, DOP (Difference of Pixels)
The inter-frame binarized difference image is defined as:
d_{t,t-1}(i,j) = 0 if |I_t(i,j) - I_{t-1}(i,j)| < T, and 1 otherwise
with T = 10. A morphological open-close operator then removes noise and detail:
d'(i,j) = OC(d(i,j))
The inter-frame pixel gray-level difference measure is defined as
N(d'(i,j)) = Σ_{i,j} d'(i,j)
When N(d') < size/5, the current frame is similar to the previous frame and therefore cannot be a shot-boundary frame; the remaining processing is skipped.
When N(d') > size/5, the current frame and the previous frame differ substantially, and the subsequent processing is performed.
b. Inter-frame pixel histogram difference, DOH (Difference of Histogram)
The inter-frame DOH computation uses three techniques: the histogram decayed average, the histogram χ² statistical difference, and adaptive threshold judgment.
● Histogram χ² statistical difference.
Given the histograms H_m(i) and H_n(i) of frames m and n, the inter-frame histogram χ² statistical difference is defined as:
d(H_m, H_n) = (1/N²) Σ_i (H_m(i) - H_n(i))² / max(H_m(i), H_n(i)), taken over bins with H_m(i) ≠ H_n(i)
For YUV color histograms, the inter-frame color histogram divergence measure is defined as (a sketch follows below):
D_{m,n} = d_y(H_m, H_n) + d_u(H_m, H_n) + d_v(H_m, H_n)
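A minimal sketch of the χ² difference in C++, assuming 256-bin histograms stored as std::vector<double> and N the pixel count per frame; skipping bins where both histograms are empty is one way to read the H_m(i) ≠ H_n(i) restriction.

    #include <algorithm>
    #include <vector>

    // Chi-square-style difference between two 256-bin histograms, normalized by N^2.
    double chiSquareDiff(const std::vector<double>& Hm,
                         const std::vector<double>& Hn, double N) {
        double d = 0.0;
        for (int i = 0; i < 256; ++i) {
            double hi = std::max(Hm[i], Hn[i]);
            if (hi > 0.0)                         // skip bins with Hm(i) = Hn(i) = 0
                d += (Hm[i] - Hn[i]) * (Hm[i] - Hn[i]) / hi;
        }
        return d / (N * N);
    }

    // YUV colour-histogram divergence between frames m and n.
    double colorHistDiff(const std::vector<std::vector<double>>& m,   // {Hy, Hu, Hv}
                         const std::vector<std::vector<double>>& n, double N) {
        return chiSquareDiff(m[0], n[0], N) + chiSquareDiff(m[1], n[1], N)
             + chiSquareDiff(m[2], n[2], N);
    }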
● Histogram decayed-average computation
When computing the inter-frame χ² statistical difference, the current frame is compared not with the previous frame but with a weighted average of all preceding frames in the same shot. Frames near the frame under comparison receive larger weights, distant frames smaller ones. The decayed-average operation is defined as follows.
Given a sequence {H_t}, with t the frame number and attenuation coefficient α < 1, the decayed average is defined as:
H̄ = (H_t + α H_{t-1} + α² H_{t-2} + …) / (1 + α + α² + …)
The attenuation coefficient makes the weights of frames near the current frame larger and those of distant frames smaller, with the weights of frames too far back negligibly small. The resulting decayed average captures well both the slow variation of content within a shot and the correlation between frames.
In the implementation, the current decayed-average frame is updated iteratively as follows, without recording the information of all previous frames:
H̄_{t+1} = ( α (1 - α^t) H̄_t + (1 - α) H_{t+1} ) / (1 - α^{t+1})
During video analysis, the shot-boundary detector computes the histogram difference between the current frame's histogram and the decayed average of all preceding frames of the same shot (a sketch follows below):
D̄_{t,t-1} = d_y(H_t, H̄_{t-1}) + d_u(H_t, H̄_{t-1}) + d_v(H_t, H̄_{t-1})
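The iterative update reconstructed above can be carried by a small accumulator, sketched below; the struct and member names are illustrative, and α = 0.9 is an assumed value. At t = 1 the update reduces to H̄ = H_1, and at t = 2 to (H_2 + αH_1)/(1 + α), matching the definition.

    #include <vector>

    // Running decayed-average histogram over the current shot. The iterative
    // update keeps the weighted mean exact without storing past frames:
    //   avg_{t+1} = (alpha*(1 - alpha^t)*avg_t + (1 - alpha)*H_{t+1}) / (1 - alpha^{t+1})
    struct DecayHistogram {
        double alpha = 0.9;              // attenuation coefficient, alpha < 1 (assumed value)
        double alphaPowT = 1.0;          // alpha^t, where t = frames seen so far
        std::vector<double> avg = std::vector<double>(256, 0.0);

        void update(const std::vector<double>& h) {
            double alphaPowT1 = alphaPowT * alpha;          // alpha^{t+1}
            for (int i = 0; i < 256; ++i)
                avg[i] = (alpha * (1.0 - alphaPowT) * avg[i] + (1.0 - alpha) * h[i])
                         / (1.0 - alphaPowT1);
            alphaPowT = alphaPowT1;
        }
    };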
● Adaptive decision threshold
To account for noise and for the inter-frame correlation within a shot, the final decision uses an adaptive threshold, obtained in two steps:
■ the local variance σ within the shot is updated dynamically:
◆ σ_0 = 0
◆ σ_t = ( 0.7 (1 - 0.7^{t-1}) σ_{t-1} + 0.3 D̄_{t,t-1} ) / (1 - 0.7^t)
■ the adaptive threshold, accounting for noise and intra-shot correlation, is defined as:
T = 3σ + 10
where σ is the mean square deviation of the inter-frame difference within the current shot fragment. By the Gaussian distribution, more than 99.9% of samples lie within 3σ of the mean, so the histogram differences of almost all non-boundary frames fall below this threshold (a sketch follows below).
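The same recurrence drives the spread estimate, sketched below with the fixed coefficients 0.7 and 0.3 from the text; struct and member names are illustrative.

    // Running spread estimate of the inter-frame histogram differences inside
    // the current shot, using the same decayed-mean trick (alpha = 0.7,
    // new-sample weight 0.3). The adaptive threshold is T = 3*sigma + 10.
    struct AdaptiveThreshold {
        double sigma = 0.0;              // sigma_0 = 0
        double pow07 = 1.0;              // 0.7^{t-1} before the t-th update

        void update(double D) {          // D = histogram difference of the new frame
            double pow07Next = pow07 * 0.7;                  // 0.7^t
            sigma = (0.7 * (1.0 - pow07) * sigma + 0.3 * D) / (1.0 - pow07Next);
            pow07 = pow07Next;
        }
        double threshold() const { return 3.0 * sigma + 10.0; }
    };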
2) Special scene detection
Certain special scenes and effects in the video stream strongly affect the correctness of shot-boundary frame extraction and, if not treated specially, cause many false detections. These are the scenes and effects described above: fade-in/fade-out and dissolve, wipe effects, and camera-flash influence. Their detection is identical to that used in key frame extraction and is not repeated here; see the description above.
1.3. Information analysis submodule
The information analysis subsystem performs the final analysis on the information extracted and detected above and judges the shot-boundary frames. Candidate frames of different natures are handled with different strategies:
a. For a detected fade frame sequence, the last frame is output.
b. For a detected wipe frame sequence, the last frame is output.
c. Frames affected by camera flash are filtered directly out of the candidate sequence.
d. The remaining ordinary candidate frames are judged as follows (see the sketch after this list). If frame t simultaneously satisfies:
■ the mutation condition: D̄_{t,t-1} > 3σ_t + 10
■ the smoothness condition: D̄_{t,t-1} > 2 · D_{t,t+1}
then it is decided to be a shot-boundary frame.
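Combining the two conditions gives a one-line boundary test, sketched here assuming D̄_{t,t-1}, D_{t,t+1}, and the adaptive threshold are produced by the earlier sketches; names are illustrative.

    // Final shot-boundary test for an ordinary candidate frame t, where
    // dAvg  = difference between frame t and the decayed-average histogram,
    // dNext = difference between frame t and frame t+1, and
    // thr   = the adaptive threshold 3*sigma + 10.
    bool isShotBoundary(double dAvg, double dNext, double thr) {
        bool mutation   = dAvg > thr;           // abrupt change against the shot model
        bool smoothness = dAvg > 2.0 * dNext;   // and the video is steady right after
        return mutation && smoothness;
    }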
Four, an exemplary embodiment is introduced below.
The present embodiment is software running on the Windows platform, developed in Visual C++.
Fig. 1 shows the architecture of the embodiment. As the figure shows:
● the embodiment consists mainly of five modules:
■ application graphical user interface: implemented with Visual C++ MFC as a multiple-document application with three parts:
◆ video display: based on an MFC view; the video data being captured is displayed in this view frame in real time;
◆ storyboard: based on an MFC view; the automatically extracted key frames are displayed in the storyboard view frame as dialog controls;
◆ metadata: based on an MFC view; the edited metadata information is displayed in the metadata view frame as a list-box control;
■ cataloging system engine module (main program module): this module calls the lower-level modules (data acquisition, media analysis, metadata management, and so on) and provides video cataloging services to the user interface above; its main functions include application initialization, memory allocation, dynamic loading, and configuration of the submodules below;
■ data acquisition submodule: packaged as a dynamic library, dynamically loaded by the engine module, and run in its own thread; its main function is to drive the underlying data capture card and deliver the captured data to the input shared memory;
■ media analysis submodule: packaged as a dynamic library, dynamically loaded by the engine module, and run in its own thread; its main function is to run the intelligent key frame or shot-boundary frame extraction algorithm and output the results;
■ metadata management submodule;
■ configuration management submodule.
Fig. 2 shows the main classes of the embodiment and their call relations, where:
◆ the upper-level application is a multiple-document application framework based on MFC;
◆ the video display view is a class derived from CFormView that displays the captured real-time video and its timecode information;
◆ the storyboard view is a class derived from CFormView that displays key frames or shot-boundary frames; each extracted frame is shown on a dialog control;
◆ the metadata view is a class derived from CFormView that displays the metadata; the metadata content is shown in a list control;
◆ the fully automatic video cataloging system engine integrates all functions of the cataloging system, including data acquisition, media analysis, and metadata management, and provides interfaces to the upper layer;
◆ the data acquisition class captures video data in real time from the capture card into memory;
◆ the media analysis class invokes the key frame or shot-boundary frame processing algorithms on the data in memory and caches the results in the key frame data cache;
◆ the metadata management class encapsulates the metadata management functions; based on the key frame or shot-boundary frame information produced by the media analysis module, and responding to the user's manual annotations, it creates and manages the metadata.

Claims (4)

1. A construction method of a fully automatic video cataloging system, characterized in that the method comprises:
1st, the construction of the fully automatic video cataloging system;
2nd, robust automatic extraction of video key frames from the media material;
3rd, robust automatic extraction of shot-boundary frames from the media material.
2. method according to claim 1 is characterized in that described full-automatic video cataloging syytem of the 1st step comprises:
1.1st, acquisition of media module, through the video acquisition integrated circuit board, the sampling video flow data;
1.2nd, Media Analysis module; Go on foot the video stream data that the acquisition of media module collects to the 1.1st; Utilize advanced computer vision treatment technology to carry out intellectual analysis; Extract key frame or camera lens and cut apart frame, and the key frame that extracts or camera lens are cut apart frame be shown on the Storyboard, supply the user further to edit, handle;
1.3rd, metadata foundation, editor and administration module, the key frame that extracts with the 1.2nd step Media Analysis module is cut apart frame with camera lens and is the basis, and formation is unit with the video segment, and is aided with the media materials metadata of descriptor; Described metadata is the final result of cataloging syytem output, is the main foundation that the destructuring media asset is effectively managed; Metadata editor and management comprise the foundation of metadata model file and modification, metadata demonstration, metadata editor, metadata cache management and metadata preservation;
1.3.1, metadata model file; The content and the organizational form of the metadata information that cataloging syytem carries out media materials drawing after the structured analysis have been defined in this document; Comprise fragment description, classifying content, video author and lister; The metadata model file is with the XML representation of file, and system provides default file, and the user can revise the metadata model file through manual mode as required;
1.3.2, metadata editing, comprising:
◆ key frame and shot-segmentation frame editing: the user adds or deletes key frames or shot-segmentation frames displayed on the storyboard as the situation requires;
◆ video segment editing: based on the key frames or shot-segmentation frames extracted by the system, the user merges the content of two or more key frames or shot-segmentation frames through editing operations into video segments that are independent and meaningful in content, and adds the corresponding information according to the format defined by the metadata model file;
1.3.3, metadata display, comprising:
◆ key frame or shot-segmentation frame display: the system displays the automatically extracted key frames or shot-segmentation frames on the storyboard;
◆ segment content display: the system displays the edited segment information on the user interface;
1.3.4, metadata cache management, comprising:
◆ during automatic video and audio analysis, to avoid frequent hard-disk access, analysis results are written to the hard disk only when the analysis completes;
◆ system RAM serves as the cache; the memory size must support several hours of continuous metadata analysis;
◆ when a long continuous analysis exhausts memory, the system page file is used as a shared mapped file;
1.3.5, metadata saving, comprising:
◆ automatically analyzed or manually modified metadata is saved to the storage medium as an XML document;
◆ key frames and shot-segmentation frames are saved to a designated folder;
1.4, a configuration management module; the system stores configuration information in an XML configuration file, which the user can edit manually or through the user interface; the configuration file is read at system initialization to configure the software modules (a hypothetical example follows this claim); the configuration information comprises:
● input device configuration: video acquisition board, or a media file on disk;
● algorithm selection: key-frame extraction or shot-segmentation frame extraction;
● key-frame extraction algorithm parameters, comprising:
■ key-frame extraction sensitivity setting;
■ maximum or minimum frame gap setting: the interval between adjacent key frames is constrained to be no less than, or no more than, a specified time interval.
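As an illustration only, a configuration file covering these items might look as follows; the patent fixes XML as the format but discloses no schema, so every element and attribute name below is invented:

    <!-- Hypothetical configuration file; element names are assumptions. -->
    <config>
      <input device="captureBoard"/>      <!-- or: <input device="mediaFile" path="D:\clips\a.mpg"/> -->
      <algorithm>keyframe</algorithm>     <!-- keyframe | shotSegmentation -->
      <keyframe>
        <sensitivity>0.6</sensitivity>    <!-- extraction sensitivity, 0..1 -->
        <minGapFrames>25</minGapFrames>   <!-- adjacent key frames >= 1 s apart at 25 fps -->
        <maxGapFrames>7500</maxGapFrames> <!-- at most 5 min between key frames -->
      </keyframe>
    </config>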
3. The method according to claim 1, characterized in that the robust automatic extraction of video key frames from media material of step 2 comprises:
2.1, video low-level preprocessing, comprising two parts: first, the pixels within each frame are down-sampled to reduce algorithm complexity; second, inter-frame correlation is used to obtain a binarized frame-difference image, which is used to filter out the candidate key-frame set;
2.2, information extraction: the candidate key-frame set obtained in step 2.1 is further processed; histogram features are extracted, and key frames are judged and extracted using a divergence measure, computed after Fourier-transforming the histograms, between consecutive frames and between the frame under test and the previous key frame; during processing, this step independently detects and suppresses fade-in/fade-out, wipe effects, and camera-flash interference;
2.2.1, for fade-in/fade-out segments: since the inter-frame luminance signal within such a segment changes linearly, they are detected by estimating this rate of change;
2.2.2, for wipe-effect segments: since the inter-frame wipe region is spatially regular, they are detected by examining the wipe region between frames and the spatial variation of the wipe region within the whole segment;
2.2.3, for camera-flash segments: since a flash lasts only a short time, they are detected by exploiting the fact that the luminance difference between adjacent frames is large while that between frames separated by the flash is small;
2.3, information analysis: based on the feature information and special-scene detection results obtained by the information extraction of step 2.2, a final comprehensive analysis is performed, and the key frames that best represent the segment content are selected (a sketch of steps 2.1–2.3 follows this claim);
2.3.1, for a fade-in/fade-out frame sequence, the last frame, where the fade effect completes, is output as the key frame;
2.3.2, for a wipe sequence, the last frame, where the wipe effect completes, is output as the key frame;
2.3.3, frames affected by a detected camera flash are filtered directly out of the candidate sequence;
2.3.4, for the remaining ordinary candidate key-frame sequences, within each segment formed by consecutive candidate frames, the frame whose histogram Fourier transform differs most from that of the previous key frame is selected and output as the key frame.
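A minimal C++ sketch of steps 2.1–2.3 follows. The bin count, both thresholds, and the L2 distance over complex DFT coefficients are assumptions: the claim names a divergence measure on histogram Fourier transforms without fixing it, and the fade, wipe, and flash detectors of steps 2.2.1–2.2.3 are omitted here.

    #include <cmath>
    #include <complex>
    #include <cstddef>
    #include <cstdio>
    #include <cstdlib>
    #include <vector>

    using Pixels = std::vector<unsigned char>;   // down-sampled luminance plane
    const double kPi = 3.14159265358979323846;

    // Step 2.1: binarized frame difference on the down-sampled pixels; a frame
    // becomes a candidate key frame when enough pixels changed. Both
    // thresholds are invented for illustration.
    bool isCandidate(const Pixels& prev, const Pixels& cur,
                     int pixThresh = 24, double ratioThresh = 0.15) {
        std::size_t changed = 0;
        for (std::size_t i = 0; i < cur.size(); ++i)
            changed += std::abs(int(cur[i]) - int(prev[i])) > pixThresh;
        return changed > ratioThresh * cur.size();
    }

    // 64-bin normalized grayscale histogram of a frame.
    std::vector<double> histogram(const Pixels& p) {
        std::vector<double> h(64, 0.0);
        for (unsigned char v : p) h[v >> 2] += 1.0;
        if (!p.empty()) for (double& b : h) b /= p.size();
        return h;
    }

    // Naive DFT of the histogram (N = 64, so O(N^2) is cheap).
    std::vector<std::complex<double>> dft(const std::vector<double>& h) {
        const std::size_t n = h.size();
        std::vector<std::complex<double>> out(n);
        for (std::size_t k = 0; k < n; ++k) {
            std::complex<double> s(0.0, 0.0);
            for (std::size_t t = 0; t < n; ++t) {
                double a = 2.0 * kPi * double(k) * double(t) / double(n);
                s += h[t] * std::complex<double>(std::cos(a), -std::sin(a));
            }
            out[k] = s;
        }
        return out;
    }

    // Steps 2.2/2.3: divergence between two frames in the Fourier domain of
    // their histograms; an L2 distance over the DFT coefficients stands in
    // for the patent's unspecified divergence measure.
    double divergence(const Pixels& a, const Pixels& b) {
        auto fa = dft(histogram(a)), fb = dft(histogram(b));
        double d = 0.0;
        for (std::size_t k = 0; k < fa.size(); ++k) d += std::norm(fa[k] - fb[k]);
        return std::sqrt(d);
    }

    int main() {
        Pixels dark(4096, 10), bright(4096, 200);   // two synthetic frames
        if (isCandidate(dark, bright))              // large change: candidate
            std::printf("divergence = %.4f\n", divergence(dark, bright));
    }

In the full pipeline, isCandidate would be evaluated against the previous frame, the special-effect detectors would prune the candidates, and per step 2.3.4 a candidate would be promoted to a key frame when its divergence from the previous key frame is maximal within its run of consecutive candidates.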
4. The method according to claim 1, characterized in that the robust automatic extraction of shot-segmentation frames from media material of step 3 comprises:
3.1, preprocessing, comprising two parts: first, the pixels within each frame are down-sampled to reduce algorithm complexity; second, inter-frame correlation is used to obtain a binarized frame-difference image, which is used to filter out the candidate shot-segmentation frame set;
3.2, information extraction: the candidate shot-segmentation frame set obtained in step 3.1 is further processed: histogram features are extracted, and a decayed histogram averaging method is used to compute the decayed histogram mean and the statistical variance from the start frame of the shot to the current frame; the inter-frame χ² statistical difference between the current frame's histogram and the decayed histogram mean is then computed, and the statistical variance is updated; a dynamic decision threshold is derived from the continuously updated statistical variance and used to judge shot-segmentation frames; during processing, this step independently detects and suppresses fade-in/fade-out, wipe effects, and camera-flash interference: fade segments are detected, according to the linear change of the inter-frame luminance signal within such segments, by estimating this rate of change; wipe segments are detected, according to the spatial regularity of the inter-frame wipe region, by examining the wipe region between frames and the spatial variation of the wipe region across the whole segment; flash segments are detected, according to the short duration of a flash, by exploiting the fact that the luminance difference between adjacent frames is large while that between frames separated by the flash is small;
3.3, information analysis: based on the feature information and special-scene detection results obtained from the information extraction of step 3.2, a final comprehensive analysis is performed to judge shot-segmentation frames, including detection of the special cases of fade-in/fade-out, wipe effects, and camera flash; for a detected fade sequence, the last frame is output as the key frame; for a detected wipe sequence, the last frame is output as the key frame; frames affected by camera flash are filtered directly out of the candidate sequence; for the remaining ordinary candidate frames, a frame is judged a shot-segmentation frame only if it satisfies both of the following conditions (a sketch of this decision rule follows this claim):
● sudden-change condition: the χ² statistical difference between the current frame and the decayed histogram mean exceeds a sudden-change threshold determined by the statistical variance;
● smoothness condition: the χ² statistical difference between the current frame and the decayed histogram mean exceeds a smoothness threshold determined by the inter-frame χ² statistical difference between the current frame and the following frame.
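The decision rule of steps 3.2–3.3 can be sketched in C++ as follows, again as an assumption-laden illustration: the decay factor, warm-up length, and threshold multipliers are invented, Welford's method stands in for the unspecified variance update, and the fade/wipe/flash handling is omitted.

    #include <cmath>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    using Hist = std::vector<double>;   // normalized frame histogram

    // Chi-square statistical difference between two histograms.
    double chiSquare(const Hist& a, const Hist& b) {
        double d = 0.0;
        for (std::size_t i = 0; i < a.size(); ++i) {
            double s = a[i] + b[i];
            if (s > 0.0) d += (a[i] - b[i]) * (a[i] - b[i]) / s;
        }
        return d;
    }

    // Per-shot state: the decayed histogram mean plus running statistics
    // (Welford's update) of the chi-square differences observed in the shot.
    class ShotCutDetector {
    public:
        explicit ShotCutDetector(double decay = 0.9, double kSigma = 5.0,
                                 double kSmooth = 2.0)
            : decay_(decay), kSigma_(kSigma), kSmooth_(kSmooth) {}

        // True when `cur` is judged a shot-segmentation frame; `next` is the
        // frame after `cur`, needed for the smoothness condition.
        bool isCut(const Hist& cur, const Hist& next) {
            if (mean_.empty()) { reset(cur); return false; }   // shot start
            double d = chiSquare(cur, mean_);

            // Dynamic threshold from the statistics accumulated in this shot;
            // the 1e-3 floor avoids a zero threshold in degenerate shots.
            double sigma = (n_ > 0) ? std::sqrt(m2_ / n_) : 0.0;
            bool sudden = n_ >= 3 && d > avg_ + kSigma_ * sigma + 1e-3;
            bool smooth = d > kSmooth_ * chiSquare(cur, next);
            if (sudden && smooth) { reset(cur); return true; } // new shot

            // Otherwise fold `cur` into the decayed mean and the statistics.
            for (std::size_t i = 0; i < mean_.size(); ++i)
                mean_[i] = decay_ * mean_[i] + (1.0 - decay_) * cur[i];
            ++n_;
            double delta = d - avg_;
            avg_ += delta / n_;
            m2_ += delta * (d - avg_);
            return false;
        }

    private:
        void reset(const Hist& h) { mean_ = h; n_ = 0; avg_ = 0.0; m2_ = 0.0; }
        Hist mean_;
        std::size_t n_ = 0;
        double avg_ = 0.0, m2_ = 0.0;
        double decay_, kSigma_, kSmooth_;
    };

    int main() {
        ShotCutDetector det;
        Hist uniform(64, 1.0 / 64), spike(64, 0.0);
        spike[0] = 1.0;                             // very different histogram
        for (int i = 0; i < 10; ++i) det.isCut(uniform, uniform); // settle shot 1
        std::printf("cut detected: %d\n", det.isCut(spike, spike));
    }

Resetting the statistics at each detected cut makes the threshold adapt to the activity level of each shot, which is what gives the dynamic decision threshold its robustness against busy footage.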
CN201210054812.5A 2012-03-05 2012-03-05 Construction method of full-automatic video cataloging system Expired - Fee Related CN102694966B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210054812.5A CN102694966B (en) 2012-03-05 2012-03-05 Construction method of full-automatic video cataloging system

Publications (2)

Publication Number Publication Date
CN102694966A 2012-09-26
CN102694966B CN102694966B (en) 2014-05-21

Family

ID=46860234

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210054812.5A Expired - Fee Related CN102694966B (en) 2012-03-05 2012-03-05 Construction method of full-automatic video cataloging system

Country Status (1)

Country Link
CN (1) CN102694966B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2002035377A2 (en) * 2000-10-23 2002-05-02 Binham Communications Corporation Method and system for providing rich media content over a computer network
US20080162450A1 (en) * 2006-12-29 2008-07-03 Mcintyre Dale F Metadata generation for image files
CN101872346A (en) * 2009-04-22 2010-10-27 中国科学院自动化研究所 Method for generating video navigation system automatically
CN101604325A (en) * 2009-07-17 2009-12-16 北京邮电大学 Method for classifying sports video based on key frame of main scene lens

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103065511B (en) * 2012-12-29 2015-04-01 福州新锐同创电子科技有限公司 Implementation method of teaching plan editor
CN103065511A (en) * 2012-12-29 2013-04-24 福州新锐同创电子科技有限公司 Implementation method of teaching plan editor
CN104219491A (en) * 2013-06-04 2014-12-17 费珂 Image analysis function based video monitoring system and storage method thereof
CN104519401A (en) * 2013-09-30 2015-04-15 华为技术有限公司 Video division point acquiring method and equipment
CN104519401B (en) * 2013-09-30 2018-04-17 贺锦伟 Video segmentation point preparation method and equipment
CN103870598B (en) * 2014-04-02 2017-02-08 北京航空航天大学 Unmanned aerial vehicle surveillance video information extracting and layered cataloguing method
CN103870598A (en) * 2014-04-02 2014-06-18 北京航空航天大学 Unmanned aerial vehicle surveillance video information extracting and layered cataloguing method
CN104184960A (en) * 2014-08-19 2014-12-03 厦门美图之家科技有限公司 Method for carrying out special effect processing on video file
CN104822087A (en) * 2015-04-30 2015-08-05 无锡天脉聚源传媒科技有限公司 Processing method and apparatus of video segment
CN104822087B (en) * 2015-04-30 2017-11-28 无锡天脉聚源传媒科技有限公司 A kind of processing method and processing device of video-frequency band
CN105227862A (en) * 2015-09-16 2016-01-06 上海工程技术大学 Can the video recombination system of auto Segmentation camera lens and video recombination method thereof
CN110019025A (en) * 2017-07-20 2019-07-16 中国移动通信集团公司 A kind of stream data processing method and device
CN108111537A (en) * 2018-01-17 2018-06-01 杭州当虹科技有限公司 A kind of method of the online video contents of streaming media of rapid preview MP4 forms
CN108111537B (en) * 2018-01-17 2021-03-23 杭州当虹科技股份有限公司 Method for quickly previewing online streaming media video content in MP4 format
CN109005368A (en) * 2018-10-15 2018-12-14 Oppo广东移动通信有限公司 A kind of generation method of high dynamic range images, mobile terminal and storage medium
CN109800035A (en) * 2019-01-24 2019-05-24 博云视觉科技(青岛)有限公司 A kind of algorithm integration service framework system
CN109800035B (en) * 2019-01-24 2022-11-15 博云视觉科技(青岛)有限公司 Algorithm integrated service framework system
CN110147469A (en) * 2019-05-14 2019-08-20 腾讯音乐娱乐科技(深圳)有限公司 A kind of data processing method, equipment and storage medium
CN110147469B (en) * 2019-05-14 2023-08-08 腾讯音乐娱乐科技(深圳)有限公司 Data processing method, device and storage medium
CN111641869A (en) * 2020-06-04 2020-09-08 虎博网络技术(上海)有限公司 Video split mirror method, video split mirror device, electronic equipment and computer readable storage medium
CN111641869B (en) * 2020-06-04 2022-01-04 虎博网络技术(上海)有限公司 Video split mirror method, video split mirror device, electronic equipment and computer readable storage medium
CN113014957A (en) * 2021-02-25 2021-06-22 北京市商汤科技开发有限公司 Video shot segmentation method and device, medium and computer equipment
CN113221943A (en) * 2021-04-01 2021-08-06 中国科学技术大学先进技术研究院 Diesel vehicle black smoke image identification method, system and storage medium
CN113221943B (en) * 2021-04-01 2022-09-23 中国科学技术大学先进技术研究院 Diesel vehicle black smoke image identification method, system and storage medium
CN113612923A (en) * 2021-07-30 2021-11-05 重庆电子工程职业学院 Dynamic visual effect enhancement system and control method
CN113612923B (en) * 2021-07-30 2023-02-03 重庆电子工程职业学院 Dynamic visual effect enhancement system and control method
CN113473182A (en) * 2021-09-06 2021-10-01 腾讯科技(深圳)有限公司 Video generation method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN102694966B (en) 2014-05-21

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee (granted publication date: 20140521; termination date: 20150305)
EXPY Termination of patent right or utility model