CN109101920A

CN109101920A - Video temporal unit segmentation method

Info

Publication number: CN109101920A
Application number: CN201810889588.9A
Authority: CN
Inventors: 张云佐; 朴春慧; 张莎莎; 李汶轩; 姚慧松
Original assignee: Shijiazhuang Tiedao University
Current assignee: Shijiazhuang Tiedao University
Priority date: 2018-08-07
Filing date: 2018-08-07
Publication date: 2018-12-28
Anticipated expiration: 2038-08-07
Also published as: CN109101920B

Abstract

The invention discloses a video temporal unit segmentation method and relates to the technical field of video processing. The method comprises the following steps: extracting a horizontal spatiotemporal slice at the center line of the video captions; computing the minimal spatiotemporal semantic information (MSSI) of each video frame from the extracted slice; detecting abrupt changes in the MSSI of the video; and segmenting the video into temporal units, using the abrupt changes in MSSI as boundaries. The method achieves high accuracy in temporal unit segmentation. It is also more robust to interference than the baseline method because it examines only a single row of pixels in the video. Its computational complexity and computational load are low, which gives it a clear advantage in processing time.

Description

Video temporal unit segmentation method

Technical field

The present invention relates to the technical field of video processing, and in particular to a video temporal unit segmentation method.

Background art

Video is an unstructured data stream. In essence it is a sequence of image frames that are continuous in time, and these frames are related only by their temporal order, with no structural information. Video temporal segmentation detects hierarchical structural units in a digital video sequence according to the video content or specific markers in the video. It then builds index information for the structural units at each level, so that video data can be stored, managed, analyzed, and processed according to its content. The segmentation partitions the video data stream hierarchically along the time axis, converting the original unstructured video stream into structured video entities. The result is a set of meaningful, easily managed video temporal units that form a hierarchical structure of the video content.

Video temporal segmentation is the foundation and a key step of video synopsis, retrieval, and browsing. The shot is the structural unit most widely used in video analysis. Existing temporal segmentation methods based on shot boundary detection usually take the degree of change in video features as the criterion for dividing shots, where the features include color, shape, edges, and motion vectors. Pixel-domain algorithms mainly use color histogram features for shot segmentation, whereas compressed-domain algorithms generally use motion vector features. Both kinds of algorithm follow essentially the same approach: they compare the feature difference between adjacent frames with a given threshold to determine shot boundaries. If the feature difference exceeds the threshold, the current position is taken as a shot boundary; otherwise it is not. The accuracy of shot boundary detection therefore depends on how the feature difference is defined and how the threshold is set. However, the shot is not the smallest temporal semantic unit, and sub-shots describe changes in video semantics more precisely. Sub-shot detection must be performed on top of shot segmentation, which makes it cumbersome and computationally expensive.

Summary of the invention

The technical problem addressed by the invention is to provide a video segmentation method that is robust to interference, has low computational complexity and a small computational load, and segments video accurately.

To solve the above technical problem, the invention adopts the following technical solution: a video temporal unit segmentation method, characterized by comprising the following steps:

extracting a horizontal spatiotemporal slice at the center line of the video captions;

computing the minimal spatiotemporal semantic information (MSSI) of each video frame from the extracted horizontal spatiotemporal slice;

detecting abrupt changes in the MSSI of the video;

segmenting the video into temporal units, using the abrupt changes in MSSI as boundaries.

As a further technical solution, for a video V(x, y, t), the horizontal spatiotemporal slice S at the caption center line can be expressed as:

$$S = \begin{bmatrix} p_1^1 & p_2^1 & \cdots & p_L^1 \\ p_1^2 & p_2^2 & \cdots & p_L^2 \\ \vdots & \vdots & \ddots & \vdots \\ p_1^W & p_2^W & \cdots & p_L^W \end{bmatrix} \qquad (1)$$

where p_i^j denotes the pixel of V at x = j and t = i, with y fixed at the caption's center height; j ∈ [1, W] and i ∈ [1, L], where W is the frame width and L is the length of the video.

As a further technical solution, the method also includes a step of preprocessing the horizontal spatiotemporal slice S, as follows:

Preprocessing uses an adaptive Gaussian mixture background model. Each column of the horizontal spatiotemporal slice S is fed to a Gaussian model as one input, and the model parameters are updated column by column. The Gaussian mean μ and variance δ² are updated as follows:

In the above formula, the input term is the brightness of column t+1 of the slice S, and α is the adaptation rate, defined as:

where M_n is the number of matches;

Each pixel of the slice S is tested to determine whether it follows the distribution N(μ, δ). The foreground caption is then obtained by the following formula:

According to formula (4), the captions on the horizontal spatiotemporal slice S are separated from the background as foreground. The MSSI of the i-th frame of video V(x, y, t) can then be computed by the following formula:

In the formula, τ measures the MSSI of a single pixel; pixels whose MSSI is below τ are regarded as interference and removed.

As a further technical solution, an abrupt change in MSSI occurs at each video temporal unit boundary. Denoting the magnitude of this change by Δ, Δ can be computed from formula (5) as follows:

Formula (7) shows that Δ covers two cases, a sudden increase and a sudden decrease in MSSI, and both correspond to boundaries of video temporal units. The boundary function B of the video temporal units is therefore defined as:

where w₀ is the significance threshold for the change in MSSI between the current caption frame and the preceding caption frame;

The B function curve is computed according to formula (8). Its peaks correspond to video temporal unit boundaries, so the video can be segmented into temporal units from this curve.

The beneficial effect of the above technical solution is as follows. The method defines a video caption as a sub-shot unit with minimal semantics and, correspondingly, maps abrupt changes in the minimal spatiotemporal semantic information (MSSI) to the boundaries of video temporal units. The method achieves high accuracy in temporal unit segmentation. It is also more robust to interference than the baseline method because it examines only a single row of pixels in the video. Its computational complexity and computational load are low, which gives it a clear advantage in processing time.

Brief description of the drawings

The present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a schematic diagram of video spatiotemporal slices in different directions according to an embodiment of the present invention;

Fig. 2 is an example of a horizontal spatiotemporal slice taken at the center line of the video captions according to an embodiment of the present invention;

Fig. 3 is a flow chart of the method according to an embodiment of the present invention.

Specific embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.

Numerous specific details are set forth in the following description to facilitate a full understanding of the present invention. The invention can also be implemented in ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit. The invention is therefore not limited by the specific embodiments disclosed below.

As shown in Fig. 3, the invention discloses a video temporal unit segmentation method that, overall, comprises the following steps:

First, a horizontal spatiotemporal slice is extracted at the center line of the video captions;

Second, the MSSI of each video frame is computed from the extracted horizontal spatiotemporal slice;

Third, abrupt changes in the MSSI of the video are detected;

Finally, the video is segmented into temporal units, using the abrupt changes in MSSI as boundaries.

The method is described below with reference to specific technical means.

Video spatiotemporal slicing is an efficient video analysis technique with a low computational cost and strong robustness. It extracts only selected rows or columns of the image space while preserving the complete temporal information of the video. The resulting loss of spatial information can be compensated by fusing information from multiple slices. Because the video is analyzed with the support of rich historical temporal information, interference can be effectively avoided. Spatiotemporal slices are usually taken in three directions, horizontal, vertical, and diagonal, as shown in Fig. 1. Slices in different directions reflect different information about the target objects and the video scene. This method analyzes video captions, which usually appear at the bottom of the frame and run horizontally, so it uses a horizontal spatiotemporal slice. An example of a horizontal spatiotemporal slice at the caption center line is shown in Fig. 2.

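For a video V(x, y, t), the horizontal spatiotemporal slice S at the caption center line can be expressed as:

$$S = \begin{bmatrix} p_1^1 & p_2^1 & \cdots & p_L^1 \\ p_1^2 & p_2^2 & \cdots & p_L^2 \\ \vdots & \vdots & \ddots & \vdots \\ p_1^W & p_2^W & \cdots & p_L^W \end{bmatrix} \qquad (1)$$

where p_i^j denotes the pixel of V at x = j and t = i, with y fixed at the caption's center height; j ∈ [1, W] and i ∈ [1, L].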

Formula (1) shows that the horizontal spatiotemporal slice extracts only one row of pixels from the caption image space while preserving the complete temporal information of the video. Its spatial information still reflects semantic information such as the structure of the captions and whether a caption is present. Spatiotemporal slices can therefore be used to analyze the minimal semantic information of a video, and doing so greatly reduces the amount of data to be processed.
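Formula (1) can be sketched in Python as follows. The sketch stores grayscale frames as nested lists indexed frame[y][x] and assumes the caption center row y_c is already known; the text does not say how y_c is located. The function name extract_caption_slice and the parameter y_c are hypothetical. S is laid out with rows indexed by x and columns indexed by time, because the text processes S column by column. Indices are 0-based, so S[j][i] corresponds to p_{i+1}^{j+1}.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame, indexed as frame[y][x]


def extract_caption_slice(video: List[Frame], y_c: int) -> List[List[float]]:
    """Return the horizontal spatiotemporal slice S at caption row y_c.

    S has W rows and L columns: S[j][i] is the pixel at x = j in frame i,
    so each column of S is one frame's caption row (0-based indices).
    """
    if not video:
        return []
    width = len(video[0][y_c])
    # Row y_c of frame i becomes column i of S (a transpose over time).
    return [[frame[y_c][j] for frame in video] for j in range(width)]


# Tiny synthetic video: 3 frames, 2 rows, 4 columns.
video = [[[t * 10 + x for x in range(4)], [100 + t for _ in range(4)]]
         for t in range(3)]
S = extract_caption_slice(video, y_c=0)
```

Only W × L values are kept out of the full W × H × L video, which is why the text says the data to be processed is greatly reduced.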

Video captions carry rich semantic information, and the video content that corresponds to a single caption is relatively complete and stays essentially unchanged. Based on this observation, the method defines a video caption as a sub-shot unit with minimal semantics and, correspondingly, maps abrupt changes in MSSI to temporal unit boundaries. Existing sub-shot detection methods run on top of shot segmentation, which is cumbersome and computationally expensive, and they struggle to meet the practical need to process massive amounts of video data efficiently. The MSSI of a video can be analyzed and characterized through the caption spatiotemporal slice, so the method uses this slice to detect abrupt changes in MSSI.

As shown in Fig. 3, spatiotemporal slice extraction is performed first on the input video sequence: the horizontal spatiotemporal slice S at the caption center line is extracted according to formula (1). The caption information in slice S characterizes the MSSI. To obtain an accurate MSSI, the slice S is first preprocessed. Preprocessing uses an adaptive Gaussian mixture background model. Each column of S is fed to a Gaussian model as one input, and the model parameters are updated column by column. The Gaussian mean μ and variance δ² are updated as follows:

where M_n is the number of matches.
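The update and α formulas are not given explicitly in this text, so the Python sketch below fills them in with assumptions, each of which is labeled:

* It keeps a single Gaussian per slice row instead of a full mixture.
* It uses the common running-average update μ ← (1 - α)μ + αX and δ² ← (1 - α)δ² + α(X - μ)², with α = 1/M_n.
* It applies a hypothetical match test |X - μ| ≤ kδ with k = 2.5. Pixels that fail this test are also treated as foreground, standing in for formula (4).

The function names init_background and update_background and the initial variance are illustrative only.

```python
from typing import List, Tuple


def init_background(first_column: List[float], init_var: float = 100.0
                    ) -> Tuple[List[float], List[float], List[int]]:
    """Initialize one Gaussian per slice row from the first column of S (assumption)."""
    n = len(first_column)
    return list(first_column), [init_var] * n, [1] * n


def update_background(mu: List[float], var: List[float], matches: List[int],
                      column: List[float], k: float = 2.5) -> List[bool]:
    """Feed one column of S to the per-row Gaussians, updating them in place.

    Returns a foreground mask: True where a pixel does not match its
    background Gaussian (assumed caption pixel), False otherwise.
    """
    foreground = []
    for j, x in enumerate(column):
        sigma = var[j] ** 0.5
        if abs(x - mu[j]) <= k * sigma:
            # Matched: increment M_n and use alpha = 1 / M_n (assumed form).
            matches[j] += 1
            alpha = 1.0 / matches[j]
            mu[j] = (1 - alpha) * mu[j] + alpha * x
            var[j] = (1 - alpha) * var[j] + alpha * (x - mu[j]) ** 2
            foreground.append(False)
        else:
            # Unmatched: mark as foreground and leave the background model unchanged.
            foreground.append(True)
    return foreground


mu, var, matches = init_background([50.0, 50.0, 50.0])
mask = update_background(mu, var, matches, [52.0, 200.0, 48.0])
```

With α = 1/M_n, μ becomes the plain mean of the matched samples. On a long static background the variance can shrink toward zero, so a lower bound on var may be worth adding in practice.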

According to formula (4), the captions on slice S are separated from the background as foreground. The MSSI of the i-th frame of video V(x, y, t) can then be computed by the following formula:

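In the formula, τ measures the MSSI of a single pixel; pixels whose MSSI is below τ are regarded as interference and removed.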
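The explicit forms of formulas (5) and (6) are not stated in this text, so the sketch below adopts one plausible reading, which is an assumption. Each pixel contributes its deviation |p - μ| from the background mean. Foreground pixels whose deviation falls below τ are discarded as interference, and the MSSI of frame i is the number of caption pixels that remain in column i of S. The function name frame_mssi is hypothetical. The default τ = 10 matches the test setting reported in the experiments.

```python
from typing import List


def frame_mssi(column: List[float], background_mean: List[float],
               foreground: List[bool], tau: float = 10.0) -> int:
    """Hypothetical MSSI of one frame, computed from its column of slice S.

    Counts foreground (caption) pixels whose deviation from the background
    mean is at least tau; weaker foreground pixels are dropped as interference.
    """
    count = 0
    for p, mu, is_fg in zip(column, background_mean, foreground):
        if is_fg and abs(p - mu) >= tau:
            count += 1
    return count


column = [60.0, 55.0, 200.0, 205.0]
mssi = frame_mssi(column, [50.0] * 4, [True, True, True, False])
```

In this example the second pixel is flagged as foreground but deviates by only 5 from the background, so it is dropped, and the result counts two caption pixels.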

To segment the video into temporal units, the boundaries of those units must first be detected. An abrupt change in MSSI occurs at each video temporal unit boundary. Denoting the magnitude of this change by Δ, Δ can be computed from formula (5) as follows:

Formula (7) shows that Δ covers two cases, a sudden increase and a sudden decrease in MSSI, and both correspond to video temporal unit boundaries. For simplicity, the boundary function B of the video temporal units is defined as:

where w₀ is the significance threshold for the change in MSSI between the current caption frame and the preceding caption frame.
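Formulas (7) and (8) are not given explicitly here either, so the following sketch makes three assumptions:

* Δ_i = MSSI_i - MSSI_{i-1}.
* B(i) = |Δ_i| when |Δ_i| > w₀, and 0 otherwise.
* Each nonzero local peak of B is a temporal unit boundary.

The helper names boundary_function, find_boundaries, and split_units are hypothetical. The default w₀ = 20 matches the reported test setting. Taking |Δ| lets both a sudden increase and a sudden decrease in MSSI produce a peak, as the text requires.

```python
from typing import List, Tuple


def boundary_function(mssi: List[int], w0: float = 20.0) -> List[float]:
    """Assumed B function: |delta| where it exceeds w0, else 0 (B[0] = 0)."""
    b = [0.0]
    for i in range(1, len(mssi)):
        delta = mssi[i] - mssi[i - 1]  # assumed form of the change Δ
        b.append(float(abs(delta)) if abs(delta) > w0 else 0.0)
    return b


def find_boundaries(b: List[float]) -> List[int]:
    """Indices of nonzero local peaks of B, taken as temporal-unit boundaries."""
    peaks = []
    for i, v in enumerate(b):
        left = b[i - 1] if i > 0 else 0.0
        right = b[i + 1] if i + 1 < len(b) else 0.0
        if v > 0 and v > left and v >= right:
            peaks.append(i)
    return peaks


def split_units(n_frames: int, boundaries: List[int]) -> List[Tuple[int, int]]:
    """Turn boundary frames into half-open [start, end) temporal units."""
    starts = [0] + [b for b in boundaries if b > 0]
    ends = starts[1:] + [n_frames]
    return list(zip(starts, ends))


mssi_seq = [0, 0, 40, 40, 42, 0, 0, 35]
b_curve = boundary_function(mssi_seq)
units = split_units(len(mssi_seq), find_boundaries(b_curve))
```

The small change from 40 to 42 stays below w₀ and is ignored. The caption appearing, disappearing, and reappearing produces three boundaries and four units.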

To verify the effectiveness of the method, it was compared with an existing mainstream method (Petersohn C. Sub-Shots: Basic Units of Video [C] // International Workshop on Systems, Signals and Image Processing, 2007, and EURASIP Conference Focused on Speech and Image Processing, Multimedia Communications and Services. IEEE, 2007: 323-326). The comparative experiments were conducted on five captioned videos of different types, listed in Table 1:

Table 1 Information on the test videos

Video 1 is an open course from Renmin University of China. Its captions are Chinese text clearly separated from the background, and its shot changes are abrupt. Video 2 is a TEDxSuzhou talk. Its captions are mixed Chinese and English text clearly separated from the background, and its shot changes are abrupt. Video 3 is an open course from Zhejiang University. Its captions are Chinese text clearly separated from the background, and its shot changes combine abrupt cuts and gradual transitions. Video 4 is a TED talk. Its captions are English text overlaid on the background and strongly affected by it, and its shot changes are abrupt. Video 5 is an open course from the University of Oxford. Its captions are mixed Chinese and English text that partly overlaps the background, and its shot changes are varied, combining abrupt cuts and gradual transitions. The test parameters were τ = 10 and w₀ = 20. The experiments were run on an ordinary personal computer with an Intel(R) Core(TM) i3 M 380 @ 2.53 GHz CPU and 8 GB of RAM.

The methods are compared in three respects: processing time, recall, and accuracy. Recall R_r is defined as:

$$R_r = \frac{FC_Z}{FC_s}$$

Accuracy R_a is defined as:

$$R_a = \frac{FC_Z}{FC_t}$$

where FC_Z is the number of correctly extracted video temporal units, FC_s is the number of video temporal units actually present, and FC_t is the total number of extracted video temporal units.
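Both metrics are simple ratios of the three counts, as the short Python sketch below shows. The function name is hypothetical, and the counts in the usage line are illustrative only. They are not results from Tables 2 to 6.

```python
from typing import Tuple


def recall_and_accuracy(fc_z: int, fc_s: int, fc_t: int) -> Tuple[float, float]:
    """Return (R_r, R_a) = (FC_Z / FC_s, FC_Z / FC_t).

    fc_z: correctly extracted units; fc_s: units actually present;
    fc_t: total units extracted.
    """
    if fc_s <= 0 or fc_t <= 0:
        raise ValueError("FC_s and FC_t must be positive")
    if not 0 <= fc_z <= min(fc_s, fc_t):
        raise ValueError("FC_Z must lie between 0 and min(FC_s, FC_t)")
    return fc_z / fc_s, fc_z / fc_t


r_r, r_a = recall_and_accuracy(18, 20, 24)  # illustrative counts only
```

As defined here, R_a divides by the number of extracted units, so it corresponds to what is commonly called precision.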

The comparison results are shown in Tables 2 to 6:

Table 2 Comparison of the methods on Video 1

Table 3 Comparison of the methods on Video 2

Table 4 Comparison of the methods on Video 3

Table 5 Comparison of the methods on Video 4

Table 6 Comparison of the methods on Video 5

The experimental results above show that the method achieves high accuracy in temporal unit segmentation. It is also more robust to interference than the baseline method because it examines only a single row of pixels in the video. Its computational complexity and computational load are low, which gives it a clear advantage in processing time.

Claims

1. A video temporal unit segmentation method, characterized by comprising the following steps:

extracting a horizontal spatiotemporal slice at the center line of the video captions;

2. The video temporal unit segmentation method according to claim 1, characterized in that:

3. The video temporal unit segmentation method according to claim 2, characterized in that the method further comprises a step of preprocessing the horizontal spatiotemporal slice S, as follows:

Preprocessing uses an adaptive Gaussian mixture background model. Each column of the horizontal spatiotemporal slice S is fed to a Gaussian model as one input, and the model parameters are updated column by column. The Gaussian mean μ and variance δ² are updated as follows:

where M_n is the number of matches;

Each pixel of the slice S is tested to determine whether it follows the distribution N(μ, δ). The foreground caption is then obtained by the following formula:

According to formula (4), the captions on the horizontal spatiotemporal slice S are separated from the background as foreground. The MSSI of the i-th frame of video V(x, y, t) can then be computed by the following formula:

In the formula, τ measures the MSSI of a single pixel; pixels whose MSSI is below τ are regarded as interference and removed.

4. The video temporal unit segmentation method according to claim 3, characterized in that:

An abrupt change in MSSI occurs at each video temporal unit boundary. Denoting the magnitude of this change by Δ, Δ can be computed from formula (5) as follows:

Formula (7) shows that Δ covers two cases, a sudden increase and a sudden decrease in MSSI, and both correspond to boundaries of video temporal units. The boundary function B of the video temporal units is therefore defined as:

The B function curve is computed according to formula (8). Its peaks correspond to video temporal unit boundaries, so the video can be segmented into temporal units from this curve.