Summary of the invention
The technical problem to be solved by the invention is to provide a video segmentation method that is robust to interference, has low computational complexity and a small computational load, and segments video accurately.
To solve the above technical problem, the technical solution adopted by the invention is a method for segmenting video time-domain units, characterized by comprising the following steps:
extracting a horizontal spatio-temporal slice at the center of the video subtitle region;
calculating the minimal spatiotemporal semantic information (MSSI) of each video frame from the extracted horizontal spatio-temporal slice;
detecting abrupt changes in the MSSI of the video;
segmenting the video into time-domain units, using the abrupt changes in the MSSI as boundaries.
In a further technical solution, for a video V(x, y, t), the horizontal spatio-temporal slice S at the center of the subtitle region may be expressed as:
In the formula, each element of S is the pixel of video V at x = j and t = i, with y taken at the height of the subtitle center, where j ∈ [1, W], i ∈ [1, L], W is the width of the video frame, and L is the length of the video.
In a further technical solution, the method also includes a step of preprocessing the horizontal spatio-temporal slice S, as follows:
The preprocessing uses an adaptive Gaussian mixture background model. Each column of the horizontal spatio-temporal slice S is taken as one input to the Gaussian model, and the model parameters are updated column by column. The update formulas for the Gaussian mean μ and variance δ² are:
In the above formula, the input term is the luminance of column t+1 of the spatio-temporal slice S, and α is the adjustment rate, defined as:
In the above formula, M_n is the number of matches;
Each pixel of the spatio-temporal slice S is tested for whether it obeys the N(μ, δ) distribution, and the foreground subtitle is then calculated by the following formula:
According to formula (4), the subtitle on the horizontal spatio-temporal slice S is separated from the background as foreground. The MSSI of the i-th frame of video V(x, y, t) can be calculated by the following formula:
In the formula:
τ measures the magnitude of the MSSI of a single pixel; pixels whose MSSI is below τ are regarded as interference and discarded.
In a further technical solution, an abrupt change in the MSSI occurs at the boundary of a video time-domain unit. Denoting the amount of change by Δ, Δ can be calculated from formula (5) by the following formula:
It follows from formula (7) that Δ covers two cases, a sudden increase and a sudden decrease of the MSSI, both of which correspond to a boundary of a video time-domain unit. The boundary function B of the video time-domain unit is defined as:
In the formula, w0 is the significance threshold for the MSSI change between the current subtitle frame and its preceding subtitle frame;
The B function curve is calculated according to formula (8). The peaks of the curve correspond to the boundaries of the video time-domain units, and the segmentation of the video time-domain units can be completed according to the B function curve.
The beneficial effects of adopting the above technical solution are as follows. The method defines a video subtitle as a sub-shot unit carrying minimal semantics and, correspondingly, maps abrupt changes in the minimal spatiotemporal semantic information (Minimal Spatiotemporal Semantic Information, MSSI) to the boundaries of video time-domain units. The method achieves high accuracy in time-domain unit segmentation. Compared with the control method, because it processes only one row of pixels of the video, it is more robust to interference, and its low computational complexity and small computational load give it a clear advantage in computation time.
Specific embodiment
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of the invention.
Numerous specific details are set forth in the following description to facilitate a full understanding of the invention. The invention can, however, be implemented in ways other than those described here, and those skilled in the art can make similar extensions without departing from the spirit of the invention. The invention is therefore not limited by the specific embodiments disclosed below.
Overall, as shown in Fig. 3, the invention discloses a method for segmenting video time-domain units, comprising the following steps:
First, extracting a horizontal spatio-temporal slice at the center of the video subtitle region;
Second, calculating the minimal spatiotemporal semantic information (MSSI) of each video frame from the extracted horizontal spatio-temporal slice;
Third, detecting abrupt changes in the MSSI of the video;
Finally, segmenting the video into time-domain units, using the abrupt changes in the MSSI as boundaries.
The method is described below with reference to specific technical means:
The video spatio-temporal slice is an efficient video analysis technique with the advantages of low computational cost and strong robustness. It extracts only some rows or columns of the image space while retaining the complete temporal information of the video, and the missing spatial information can be compensated for by fusing information from multiple slices. Analyzing the video with the aid of rich historical temporal information effectively avoids interference. Spatio-temporal slices are usually taken along three directions, horizontal, vertical, and diagonal, as shown in Fig. 1. Slices taken along different directions reflect different target-object information and video scene information. The object analyzed by this method is the video subtitle. Subtitles are mostly located in the lower part of the video frame and are arranged horizontally, so a horizontal spatio-temporal slice is selected. An example of a horizontal spatio-temporal slice at the center of the subtitle region is shown in Fig. 2.
For a video V(x, y, t), the horizontal spatio-temporal slice S at the center of the subtitle region may be expressed as:
In the formula, each element of S is the pixel of video V at x = j and t = i, with y taken at the height of the subtitle center, where j ∈ [1, W], i ∈ [1, L], W is the width of the video frame, and L is the length of the video.
As formula (1) shows, the horizontal spatio-temporal slice extracts only one row of pixels from the subtitle image space, yet it retains the complete temporal information of the video, and its spatial information reflects semantic information such as the structure of the subtitle and whether a subtitle is present. It is therefore feasible to analyze the minimal semantic information of the video using the spatio-temporal slice, and the amount of data to be processed is greatly reduced.
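The slice extraction described above can be illustrated with a minimal sketch. It assumes the video is already decoded into grayscale frames stored as nested lists indexed `video[t][y][x]`. The function name `extract_horizontal_slice` and the parameter `y_center` are hypothetical names chosen for illustration, and indices are 0-based here whereas the text uses 1-based ranges.

```python
from typing import List

Frame = List[List[int]]  # frame[y][x], grayscale luminance (assumed layout)


def extract_horizontal_slice(video: List[Frame], y_center: int) -> List[List[int]]:
    """Return S with S[i][j] = pixel of frame i at column j, row y_center.

    y_center is assumed to be the row at the height of the subtitle center;
    how that row is located is outside the scope of this sketch.
    """
    if not video:
        return []
    height = len(video[0])
    if not 0 <= y_center < height:
        raise ValueError("y_center lies outside the frame")
    # One row per frame: L rows (time) by W columns (horizontal position).
    return [list(frame[y_center]) for frame in video]


# Toy video: 3 frames, 4 rows, 5 columns; each pixel value encodes (t, y, x).
video = [[[100 * t + 10 * y + x for x in range(5)] for y in range(4)]
         for t in range(3)]
S = extract_horizontal_slice(video, y_center=2)
```

Only L × W values are kept out of L × H × W, which is the data reduction the paragraph above refers to.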
Video subtitles carry rich semantic information about the video. The video content corresponding to a single subtitle is semantically relatively complete and remains essentially unchanged. Based on this observation, the method defines a video subtitle as a sub-shot unit carrying minimal semantics and, correspondingly, maps abrupt changes in the minimal spatiotemporal semantic information (MSSI) to the boundaries of time-domain units. Existing sub-shot detection methods operate on top of shot segmentation; they involve cumbersome steps and heavy computation, and struggle to meet the practical demand for efficient processing of massive video data. The MSSI of a video can be analyzed and characterized through the subtitle spatio-temporal slice, so the method uses the subtitle spatio-temporal slice to detect abrupt changes in the MSSI.
As shown in Fig. 3, spatio-temporal slice extraction is performed first on the input video sequence: the horizontal spatio-temporal slice S at the center of the subtitle region is extracted from the input video sequence according to formula (1). The subtitle information in the spatio-temporal slice S characterizes the minimal spatiotemporal semantic information (MSSI). To obtain an accurate MSSI, the horizontal spatio-temporal slice S is first preprocessed. The preprocessing uses an adaptive Gaussian mixture background model. Each column of the horizontal spatio-temporal slice S is taken as one input to the Gaussian model, and the model parameters are updated column by column. The update formulas for the Gaussian mean μ and variance δ² are:
In the above formula, the input term is the luminance of column t+1 of the spatio-temporal slice S, and α is the adjustment rate, defined as:
In the above formula, M_n is the number of matches.
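The column-by-column parameter update can be sketched as follows. Formulas (2) and (3) are not reproduced in this text, so the sketch substitutes the common running-Gaussian update and assumes α = 1/M_n; both are assumptions, not the patent's confirmed formulas. It also simplifies the mixture to a single Gaussian per pixel position. The function name `update_background`, the match factor `k = 2.5`, and the floor `min_var` are hypothetical.

```python
from typing import List, Tuple


def update_background(
    mu: List[float],
    var: List[float],
    counts: List[int],
    column: List[float],
    k: float = 2.5,        # assumed match threshold in standard deviations
    min_var: float = 1.0,  # assumed floor that keeps the variance positive
) -> Tuple[List[float], List[float], List[int]]:
    """Update one Gaussian per pixel position with the next slice column."""
    new_mu, new_var, new_counts = [], [], []
    for m, v, n, p in zip(mu, var, counts, column):
        if abs(p - m) <= k * v ** 0.5:   # pixel matches the background model
            n += 1                       # M_n: number of matches so far
            alpha = 1.0 / n              # assumed form of the adjustment rate
            m = (1.0 - alpha) * m + alpha * p
            v = max((1.0 - alpha) * v + alpha * (p - m) ** 2, min_var)
        # Non-matching pixels leave the model unchanged in this sketch.
        new_mu.append(m)
        new_var.append(v)
        new_counts.append(n)
    return new_mu, new_var, new_counts


# Two pixel positions: the first matches the model, the second does not.
mu, var, counts = update_background([100.0, 100.0], [25.0, 25.0], [1, 1], [104, 200])
```

With α = 1/M_n the adjustment rate shrinks as matches accumulate, so a stable background is learned quickly at first and then changes slowly.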
Each pixel of the spatio-temporal slice S is tested for whether it obeys the N(μ, δ) distribution, and the foreground subtitle is then calculated by the following formula:
According to formula (4), the subtitle on the spatio-temporal slice S is separated from the background as foreground. The MSSI of the i-th frame of video V(x, y, t) can be calculated by the following formula:
In the formula:
τ measures the magnitude of the MSSI of a single pixel; pixels whose MSSI is below τ are regarded as interference and discarded.
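The foreground separation and the per-frame MSSI can be sketched as below. Formulas (4) to (6) are not reproduced in this text, so the sketch assumes that a pixel "obeys" N(μ, δ) when it lies within k standard deviations of the mean, and that the MSSI of a frame is the number of foreground pixels whose value is at least τ. Both readings are assumptions; the names `foreground_column` and `frame_mssi` and the factor `k = 2.5` are hypothetical. The default τ = 10 follows the experimental setting given later in the text.

```python
from typing import List


def foreground_column(column: List[float], mu: List[float], var: List[float],
                      k: float = 2.5) -> List[float]:
    """Keep pixels that do not fit the background Gaussian; zero the rest."""
    return [p if abs(p - m) > k * v ** 0.5 else 0
            for p, m, v in zip(column, mu, var)]


def frame_mssi(foreground: List[float], tau: float = 10) -> int:
    """Count foreground pixels of at least tau; weaker ones count as interference."""
    return sum(1 for p in foreground if p >= tau)


# One slice column (one frame) against a flat background model.
column = [100, 200, 5, 180]
mu = [100.0, 100.0, 100.0, 100.0]
var = [25.0, 25.0, 25.0, 25.0]
fg = foreground_column(column, mu, var)
mssi = frame_mssi(fg, tau=10)
```

In this toy column the pixel with value 5 is detected as foreground but falls below τ, so it is discarded as interference and does not contribute to the MSSI.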
To segment the video into time-domain units, the boundaries of the time-domain units must first be detected. An abrupt change in the MSSI occurs at the boundary of a video time-domain unit. Denoting the amount of change by Δ, Δ can be calculated from formula (5) by the following formula:
Formula (7) shows that Δ covers two cases, a sudden increase and a sudden decrease of the MSSI, both of which correspond to a boundary of a video time-domain unit. For simplicity, the boundary function B of the video time-domain unit is defined as:
In the formula, w0 is the significance threshold for the MSSI change between the current subtitle frame and its preceding subtitle frame.
The B function curve is calculated according to formula (8). The peaks of the curve correspond to the boundaries of the video time-domain units, and the segmentation of the video time-domain units can be completed according to the B function curve.
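The boundary detection step can be sketched as follows. Formulas (7) and (8) are not reproduced in this text, so the sketch assumes that Δ is the difference between the MSSI of a frame and that of the preceding frame, and that B keeps |Δ| where it exceeds w0 and is zero elsewhere. These forms, and the names `mssi_change`, `boundary_function`, and `split_units`, are assumptions for illustration. The default w0 = 20 follows the experimental setting given later in the text.

```python
from typing import List, Tuple


def mssi_change(mssi: List[int]) -> List[int]:
    """Signed change of the MSSI relative to the previous frame (0 for frame 0)."""
    return [0] + [mssi[i] - mssi[i - 1] for i in range(1, len(mssi))]


def boundary_function(delta: List[int], w0: int = 20) -> List[int]:
    """Keep |delta| where it exceeds w0, covering both increases and decreases."""
    return [abs(d) if abs(d) > w0 else 0 for d in delta]


def split_units(b: List[int]) -> List[Tuple[int, int]]:
    """Cut the frame range at every nonzero value of B; units are half-open."""
    boundaries = [i for i, value in enumerate(b) if value > 0]
    starts = [0] + boundaries
    ends = boundaries + [len(b)]
    return [(s, e) for s, e in zip(starts, ends) if e > s]


# Subtitle present, then absent, then a new subtitle appears.
mssi = [50, 52, 51, 5, 4, 60, 61]
delta = mssi_change(mssi)
b = boundary_function(delta)
units = split_units(b)
```

Taking the absolute value is what lets a single threshold w0 handle both the appearance and the disappearance of a subtitle. A gradual subtitle transition could produce several adjacent nonzero values of B, which this sketch would report as separate short units.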
To verify the validity of the method, it was compared with an existing mainstream method (Petersohn C. Sub-Shots - Basic Units of Video[C]//International Workshop on Systems, Signals and Image Processing, 2007 and, Eurasip Conference Focused on Speech and Image Processing, Multimedia Communications and Services. IEEE, 2007: 323-326.). The comparative experiments were carried out on five lecture videos of different types, as shown in Table 1:
Table 1. Test video information
Video 1 is an open course from Renmin University of China; the subtitle text is Chinese, the subtitles are clearly separated from the background, and the shot transitions are abrupt cuts. Video 2 is a TEDxSuzhou talk; the subtitle text is mixed Chinese and English, the subtitles are clearly separated from the background, and the shot transitions are abrupt cuts. Video 3 is an open course from Zhejiang University; the subtitle text is Chinese, the subtitles are clearly separated from the background, and the shot transitions combine abrupt cuts and gradual transitions. Video 4 is a TED talk; the subtitle text is English, the subtitles lie within the background and are strongly affected by it, and the shot transitions are abrupt cuts. Video 5 is an open course from the University of Oxford; the subtitle text is mixed Chinese and English, the subtitles overlap the background, and the shot transitions combine abrupt cuts and gradual transitions in more varied forms. The test parameters were set to τ = 10 and w0 = 20. The experiments were run on a general-purpose personal computer whose basic configuration was an Intel(R) Core(TM) i3 M 380 @ 2.53 GHz CPU and 8 GB of memory.
The comparison covers three aspects: processing time, recall rate, and accuracy rate. The recall rate R_r is defined as:
R_r = FC_Z / FC_s
The accuracy rate R_a is defined as:
R_a = FC_Z / FC_t
In the formulas, FC_Z is the number of correctly extracted video time-domain units, FC_s is the number of video time-domain units actually present, and FC_t is the total number of extracted video time-domain units.
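The two measures can be computed as in the sketch below, which reads the definitions above in the usual way: recall is the share of actual units that were correctly extracted, and accuracy is the share of extracted units that are correct. The function names are hypothetical, and the counts in the example are illustrative values, not results from the experiments.

```python
def recall_rate(fc_z: int, fc_s: int) -> float:
    """Correctly extracted units divided by the units actually present."""
    if fc_s <= 0:
        raise ValueError("fc_s must be positive")
    return fc_z / fc_s


def accuracy_rate(fc_z: int, fc_t: int) -> float:
    """Correctly extracted units divided by all extracted units."""
    if fc_t <= 0:
        raise ValueError("fc_t must be positive")
    return fc_z / fc_t


# Illustrative counts only: 45 correct units, 50 actual units, 60 extracted units.
r_r = recall_rate(45, 50)
r_a = accuracy_rate(45, 60)
```

A method that over-segments raises FC_t without raising FC_Z, which lowers the accuracy rate while leaving the recall rate unchanged.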
The comparison results are shown in Tables 2 to 6, respectively:
Table 2. Comparison of the methods on video 1
Table 3. Comparison of the methods on video 2
Table 4. Comparison of the methods on video 3
Table 5. Comparison of the methods on video 4
Table 6. Comparison of the methods on video 5
The experimental results above show that the method achieves high accuracy in time-domain unit segmentation. Compared with the control method, because the method processes only one row of pixels of the video, it is more robust to interference, and its low computational complexity and small computational load give it a clear advantage in computation time.