CN105744356A - Content-based video segmentation method - Google Patents
- Publication number
- CN105744356A CN105744356A CN201610066554.0A CN201610066554A CN105744356A CN 105744356 A CN105744356 A CN 105744356A CN 201610066554 A CN201610066554 A CN 201610066554A CN 105744356 A CN105744356 A CN 105744356A
- Authority
- CN
- China
- Prior art keywords
- segmentation
- video
- word
- section
- dialogue
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Links
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
- H04N21/44008—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/80—Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
- H04N21/83—Generation or processing of protective or descriptive data associated with content; Content structuring
- H04N21/845—Structuring of content, e.g. decomposing content into time segments
- H04N21/8456—Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
Abstract
The invention discloses a content-based video segmentation method comprising the following steps: first, the content and time points of each caption paragraph in a video are obtained from the subtitle file, and paragraphs separated by short time intervals are merged into larger paragraphs; second, word segmentation is performed on each large paragraph, sentence similarity is computed from word similarity, highly similar sentences are merged into sections, and initial video segmentation positions are obtained from the corresponding time information; finally, image-based shot extraction is performed on the video and combined with the preliminary segmentation positions to locate the final, accurate segmentation positions.
Description
Technical field
The present invention relates to the technical fields of video processing and natural language processing, and in particular to a video segmentation method.
Background technology
Science and education video is a common video type. With the arrival of the Internet era, the medium through which users watch such video has gradually shifted from television to computers and networks. When watching a video, viewers often jump ahead to skip the parts they do not wish to watch and go directly to the content that interests them.
During such jumps, it is difficult for users to land precisely on the position they want to watch; reaching the desired point usually takes several adjustments, which severely degrades the viewing experience. If a video is segmented in advance, a user who wants to skip the current section can use the segment information provided with the video to jump quickly and accurately to the start of the next section, without slow manual adjustment. For video websites this is undoubtedly a significant competitive advantage.
Most current automatic video segmentation schemes are based on scene detection: video within the same scene is grouped into one segment, and a scene-change frame is taken as the starting point of a segment. However, a video often uses a large number of scenes to tell the same story, so scene-based segmentation can produce too many segments, in extreme cases several segments within a few seconds, which is not a reasonable segmentation.
For a video that has a standard subtitle file, however, the subtitle time axis and its textual content can be analyzed from a natural-language perspective to measure the similarity between passages of dialogue; this similarity is used to produce a segmentation, which is then combined with the relatively clean scene transitions typical of science and education video to obtain accurate segment boundaries.
Summary of the invention
The purpose of the invention is to solve the automatic segmentation problem of science and education video by providing a content-based automatic segmentation method, characterized by comprising the following steps:
Extract from the subtitle file the dialogue stream S = {s1, s2, s3, ..., sn} of the video, the start time B = {b1, b2, b3, ..., bn} of each dialogue, and the end time E = {e1, e2, e3, ..., en} of each dialogue.
For all adjacent dialogues si, si-1, set a threshold λ; when bi - ei-1 < λ, classify si and si-1 into the same section. The dialogue stream S is thereby divided into m sections, where the i-th section starts from the k-th dialogue and consists of l consecutive dialogues, i.e. Si = {sk, sk+1, sk+2, ..., sk+l-1}.
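The gap-based merging above can be sketched as follows. This is an illustration only, not the patented implementation; the function name `merge_dialogues`, the seconds-based timestamps, and the toy values are assumptions:

```python
# Illustrative sketch: merge adjacent subtitle dialogues into sections
# when the gap before a dialogue is below a threshold lambda (lam).
# starts/ends correspond to the B and E lists in the text.

def merge_dialogues(starts, ends, lam):
    """Group dialogue indices: dialogue i joins the current section
    when starts[i] - ends[i-1] < lam, else it starts a new section."""
    sections = [[0]]
    for i in range(1, len(starts)):
        if starts[i] - ends[i - 1] < lam:
            sections[-1].append(i)
        else:
            sections.append([i])
    return sections

# Toy example: the first two dialogues are 1 s apart, the third 10 s apart.
B = [0.0, 5.0, 20.0]
E = [4.0, 10.0, 25.0]
print(merge_dialogues(B, E, lam=3.0))  # [[0, 1], [2]]
```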
Apply the ICTCLAS word segmentation tool to each sentence sk; after removing non-content words, obtain the word list Ck = {ck1, ck2, ck3, ..., ckh} of sk.
Compute the similarity between any two sentences sx and sy by the following formula:
where f(cxi) is the term vector of word cxi and f(cyi) is the term vector of word cyi. The similarity of two words cxi and cyi is obtained as the dot product f(cxi)·f(cyi) of their term vectors.
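The word-level part of this step is the dot product of term vectors; the patent's exact sentence-level aggregation formula is not reproduced in the text, so the sketch below assumes the mean of all pairwise word similarities, which is one common choice. The toy 2-d "term vectors" are purely hypothetical:

```python
import numpy as np

# Word similarity = dot product of the two term vectors (as in the text).
def word_sim(u, v):
    return float(np.dot(u, v))

# Sentence similarity: ASSUMED here to be the mean over all word pairs,
# since the exact aggregation formula is not reproduced in the text.
def sentence_sim(vecs_x, vecs_y):
    sims = [word_sim(u, v) for u in vecs_x for v in vecs_y]
    return sum(sims) / len(sims)

# Hypothetical 2-d term vectors for two short sentences.
cx = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
cy = [np.array([1.0, 0.0])]
print(sentence_sim(cx, cy))  # 0.5
```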
For every Si, compute, using the similarity defined above, the similarity matrix Mi between all sentences in Si. To make the structure of the matrix more apparent, Mi is transformed as follows:
First, apply the following transformation to every value x in Mi:
where g(x) is:
Then, for every value mx,y in the transformed Mi, take a statistic over its 1×11 window: count how many of the values {mx,y-5, mx,y-4, ..., mx,y+4, mx,y+5} are smaller than mx,y, and replace mx,y with that count.
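The 1×11 window statistic can be sketched as a rank filter along each row. Border handling (clipping the window at the matrix edge) is an assumption, since the text does not specify it:

```python
import numpy as np

# Sketch of the window statistic above: each value m[x, y] is replaced by
# the count of values in {m[x, y-5], ..., m[x, y+5]} smaller than m[x, y].
# ASSUMPTION: at the borders the window is clipped to the matrix.

def rank_window_filter(M, half=5):
    M = np.asarray(M, dtype=float)
    out = np.zeros(M.shape, dtype=int)
    rows, cols = M.shape
    for x in range(rows):
        for y in range(cols):
            lo, hi = max(0, y - half), min(cols, y + half + 1)
            window = M[x, lo:hi]
            out[x, y] = int(np.sum(window < M[x, y]))
    return out

M = np.array([[0.1, 0.9, 0.5]])
print(rank_window_filter(M))  # [[0 2 1]]
```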
The replaced matrix then needs to be split; the splitting method is as follows:
First, define the accuracy of a splitting scheme: let ax,y = (y - x + 1)², and suppose splitting Si yields the splitting scheme BF = {bf1, bf2, ..., bfp}; the accuracy D of scheme BF can then be computed from BF:
Splitting is an iterative process, and the i-th step proceeds as follows: starting from the previous split BFi-1, perform one further split at every position where a split can be made, forming the candidate schemes BF'i = {BF'i1, BF'i2, BF'i3, ..., BF'iw}; compute the accuracy D of each candidate, and take the candidate BF'iu with the maximum D as the next splitting scheme BFi.
The previous step yields a series of segmentation steps BF1, BF2, BF3, ..., BFg with corresponding accuracies D1, D2, D3, ..., Dg. Take the segmentation gain δDi = Di+1 - Di; after each split, compute the mean μ and variance ν of the set of δDi values, and stop when the stopping condition on δDi, expressed in terms of μ and ν, is satisfied.
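The greedy, iterative splitting can be sketched as below. Two caveats: the patent's exact accuracy formula is not reproduced in the text, so this sketch assumes D is the sum over diagonal blocks of (block sum) / a with a = (y - x + 1)², i.e. the mean similarity inside each block; and the μ/ν stopping rule is simplified to a fixed number of steps:

```python
import numpy as np

# ASSUMED accuracy: sum over diagonal blocks of block-sum / (size**2).
def block_density(M, x, y):
    a = (y - x + 1) ** 2
    return M[x:y + 1, x:y + 1].sum() / a

def accuracy(M, bounds):
    """bounds: sorted interior boundary indices; boundary b starts a block."""
    edges = [0] + list(bounds) + [M.shape[0]]
    return sum(block_density(M, edges[i], edges[i + 1] - 1)
               for i in range(len(edges) - 1))

def greedy_split(M, steps):
    """Each iteration adds the single boundary that maximizes accuracy.
    (The mu/nu stopping rule of the text is replaced by a step count.)"""
    bounds = []
    for _ in range(steps):
        candidates = [sorted(bounds + [b])
                      for b in range(1, M.shape[0]) if b not in bounds]
        if not candidates:
            break
        bounds = max(candidates, key=lambda bf: accuracy(M, bf))
    return bounds

# Two obvious blocks: sentences {0,1} similar to each other, and {2,3}.
M = np.array([[1.0, 1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0, 1.0]])
print(greedy_split(M, steps=1))  # [2]
```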
The whole video is then segmented using its image features; the RGB histogram difference of two adjacent frames is computed by the following formula:
where H(f1, R, i) is the histogram value of the i-th red level of frame f1, H(f1, G, i) is the histogram value of the i-th green level of frame f1, and H(f1, B, i) is the histogram value of the i-th blue level of frame f1.
Then the mean μ and variance ν of the histogram differences of all adjacent frame pairs in the video are computed; when the difference between two frames exceeds the threshold μ + aν, the second frame is taken as an edge shot, where a is a tuning parameter.
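The shot-boundary rule above can be sketched as follows. Summing absolute bin differences over the R, G and B channels is an assumption, since the exact difference formula is not reproduced in the text; the bin count and toy frames are likewise illustrative:

```python
import numpy as np

def hist_diff(f1, f2, bins=16):
    """ASSUMED difference: sum of absolute per-bin RGB histogram
    differences between two HxWx3 frames with values in [0, 255]."""
    d = 0.0
    for c in range(3):  # R, G, B channels
        h1, _ = np.histogram(f1[..., c], bins=bins, range=(0, 256))
        h2, _ = np.histogram(f2[..., c], bins=bins, range=(0, 256))
        d += np.abs(h1 - h2).sum()
    return d

def edge_shots(frames, a=1.0):
    """Flag frame i+1 as an edge shot when the difference between frames
    i and i+1 exceeds mu + a * nu (mu = mean, nu = variance of all diffs)."""
    diffs = [hist_diff(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]
    mu, nu = np.mean(diffs), np.var(diffs)
    return [i + 1 for i, d in enumerate(diffs) if d > mu + a * nu]

# Toy sequence: nine dark frames followed by one bright frame.
frames = [np.zeros((2, 2, 3))] * 9 + [np.full((2, 2, 3), 255.0)]
print(edge_shots(frames, a=0.1))  # [9]
```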
A suitable shot is then sought near each segmentation position found from the text. If the text places a segmentation position within a time interval, the shot with the largest change within that interval is taken as the cut; if no edge shot exists within that interval, the shot nearest to that interval is chosen as the cut; if the text places a segmentation position at a single frame, the shot nearest to that frame is chosen as the cut.
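The three alignment cases above can be sketched in one function. The representation of a position as either an interval tuple or a single time, and of each shot as a (time, change magnitude) pair, is an assumption for illustration:

```python
# Sketch of the alignment rule: snap each text-derived segmentation
# position to a shot boundary. `position` is either an interval (t0, t1)
# or a single time; `shots` is a list of (time, change) pairs for the
# detected edge shots. Tie-breaking details are assumptions.

def align_to_shot(position, shots):
    if isinstance(position, tuple):  # interval case
        t0, t1 = position
        inside = [s for s in shots if t0 <= s[0] <= t1]
        if inside:
            # shot with the largest change inside the interval
            return max(inside, key=lambda s: s[1])[0]
        # no edge shot inside: nearest shot to the interval
        return min(shots, key=lambda s: min(abs(s[0] - t0), abs(s[0] - t1)))[0]
    # single-frame case: nearest shot to the given time
    return min(shots, key=lambda s: abs(s[0] - position))[0]

shots = [(3.0, 0.2), (7.0, 0.9), (15.0, 0.5)]
print(align_to_shot((5.0, 10.0), shots))  # 7.0
print(align_to_shot(14.0, shots))         # 15.0
```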
Accompanying drawing explanation
Fig. 1 is an example of the original similarity matrix;
Fig. 2 is an example of the optimized similarity matrix;
Fig. 3 is an example of selecting the splitting scheme.
Detailed description of the invention
A content-based video segmentation method comprises the following steps:
Extract from the subtitle file the dialogue stream S = {s1, s2, s3, ..., sn} of the video, the start time B = {b1, b2, b3, ..., bn} of each dialogue, and the end time E = {e1, e2, e3, ..., en} of each dialogue.
For all adjacent dialogues si, si-1, set a threshold λ; when bi - ei-1 < λ, classify si and si-1 into the same section. The dialogue stream S is thereby divided into m sections, where the i-th section starts from the k-th dialogue and consists of l consecutive dialogues, i.e. Si = {sk, sk+1, sk+2, ..., sk+l-1}.
Apply the ICTCLAS word segmentation tool to each sentence sk; after removing non-content words, obtain the word list Ck = {ck1, ck2, ck3, ..., ckh} of sk.
Compute the similarity between any two sentences sx and sy by the following formula:
where f(cxi) is the term vector of word cxi and f(cyi) is the term vector of word cyi. The similarity of two words cxi and cyi is obtained as the dot product f(cxi)·f(cyi) of their term vectors.
For every Si, compute, using the similarity defined above, the similarity matrix Mi between all sentences in Si, as shown in Fig. 1. To make the structure of the matrix more apparent, Mi is transformed as follows:
First, apply the following transformation to every value x in Mi:
where g(x) is:
Then, for every value mx,y in the transformed Mi, take a statistic over its 1×11 window: count how many of the values {mx,y-5, mx,y-4, ..., mx,y+4, mx,y+5} are smaller than mx,y, and replace mx,y with that count, as shown in Fig. 2.
As shown in Fig. 3, the replaced matrix then needs to be split; the splitting method is as follows:
First, define the accuracy of a splitting scheme: let ax,y = (y - x + 1)², and suppose splitting Si yields the splitting scheme BF = {bf1, bf2, ..., bfp}; the accuracy D of scheme BF can then be computed from BF:
Splitting is an iterative process, and the i-th step proceeds as follows: starting from the previous split BFi-1, perform one further split at every position where a split can be made, forming the candidate schemes BF'i = {BF'i1, BF'i2, BF'i3, ..., BF'iw}; compute the accuracy D of each candidate, and take the candidate BF'iu with the maximum D as the next splitting scheme BFi.
The previous step yields a series of segmentation steps BF1, BF2, BF3, ..., BFg with corresponding accuracies D1, D2, D3, ..., Dg. Take the segmentation gain δDi = Di+1 - Di; after each split, compute the mean μ and variance ν of the set of δDi values, and stop when the stopping condition on δDi, expressed in terms of μ and ν, is satisfied.
The whole video is then segmented using its image features; the RGB histogram difference of two adjacent frames is computed by the following formula:
where H(f1, R, i) is the histogram value of the i-th red level of frame f1, H(f1, G, i) is the histogram value of the i-th green level of frame f1, and H(f1, B, i) is the histogram value of the i-th blue level of frame f1.
Then the mean μ and variance ν of the histogram differences of all adjacent frame pairs in the video are computed; when the difference between two frames exceeds the threshold μ + aν, the second frame is taken as an edge shot, where a is a tuning parameter.
A suitable shot is then sought near each segmentation position found from the text. If the text places a segmentation position within a time interval, the shot with the largest change within that interval is taken as the cut; if no edge shot exists within that interval, the shot nearest to that interval is chosen as the cut; if the text places a segmentation position at a single frame, the shot nearest to that frame is chosen as the cut.
Claims (2)
1. A content-based video segmentation method, characterized by comprising the following steps:
S01: extract from the subtitle file the dialogue stream S={s1,s2,s3,...,sn} of the video, the start time B={b1,b2,b3,...,bn} of each dialogue, and the end time E={e1,e2,e3,...,en} of each dialogue;
S02: for all adjacent dialogues si,si-1, set a threshold λ; when bi-ei-1 < λ, classify si and si-1 into the same section, so that the dialogue stream S is divided into m sections, where the i-th section starts from the k-th dialogue and consists of l consecutive dialogues, i.e. Si={sk,sk+1,sk+2,...,sk+l-1};
S03: apply a word segmentation tool to each sentence sk; after removing non-content words, obtain the word list Ck={ck1,ck2,ck3,...,ckh} of sk;
S04: compute the similarity between any two sentences sx and sy by the following formula:
where f(cxi) is the term vector of word cxi and f(cyi) is the term vector of word cyi; the similarity of two words cxi,cyi is obtained as the dot product f(cxi)·f(cyi) of their term vectors;
S05: for every Si, use step S04 to compute the similarity matrix Mi between all sentences in Si, split Mi, and obtain the splitting scheme of the corresponding dialogue;
S06: extract edge shots from the whole video using its image features;
S07: find the optimal cut near each segmentation position found from the text: if the text places a segmentation position within a time interval, take the shot with the largest change within that interval as the cut; if no edge shot exists within that interval, choose the shot nearest to that interval as the cut; if the text places a segmentation position at a single frame, choose the shot nearest to that frame as the cut.
2. The content-based video segmentation method according to claim 1, characterized in that the method for splitting Mi in step S05 is:
1) transform Mi: first, apply the following transformation to every value x in Mi:
where g(x) is:
then, for every value mx,y in the transformed Mi, take a statistic over its 1×11 window: count how many of the values {mx,y-5,mx,y-4,...,mx,y+4,mx,y+5} are smaller than mx,y, and replace mx,y with that count;
2) split the replaced Mi as follows:
first, define the accuracy of a splitting scheme: let ax,y=(y-x+1)², and suppose splitting Si yields the splitting scheme BF={bf1,bf2,...,bfp}; the accuracy D of scheme BF can then be computed from BF:
splitting is an iterative process, and the k-th step proceeds as follows: starting from the previous split BFk-1, perform one further split at every position where a split can be made, forming the candidate schemes BF'k={BF'k1,BF'k2,BF'k3,...BF'kw}; compute the accuracy D of each candidate, and take the candidate BF'ku with the maximum D as the next splitting scheme BFk;
3) step 2) yields a series of segmentation steps BF1,BF2,BF3,...,BFg with corresponding accuracies D1,D2,D3,...,Dg; take the segmentation gain δDi=Di+1-Di; after each split, compute the mean μ and variance ν of the set of δDi values, and stop when the stopping condition on δDi, expressed in terms of μ and ν, is satisfied.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610066554.0A CN105744356B (en) | 2016-01-29 | 2016-01-29 | A kind of video segmentation method based on content |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610066554.0A CN105744356B (en) | 2016-01-29 | 2016-01-29 | A kind of video segmentation method based on content |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105744356A true CN105744356A (en) | 2016-07-06 |
CN105744356B CN105744356B (en) | 2019-03-12 |
Family
ID=56247165
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610066554.0A Active CN105744356B (en) | 2016-01-29 | 2016-01-29 | A kind of video segmentation method based on content |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105744356B (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107438204A (en) * | 2017-07-26 | 2017-12-05 | 维沃移动通信有限公司 | A kind of method and mobile terminal of media file loop play |
CN108235141A (en) * | 2018-03-01 | 2018-06-29 | 北京网博视界科技股份有限公司 | Live video turns method, apparatus, server and the storage medium of fragmentation program request |
CN108419123A (en) * | 2018-03-28 | 2018-08-17 | 广州市创新互联网教育研究院 | A kind of virtual sliced sheet method of instructional video |
WO2020119464A1 (en) * | 2018-12-12 | 2020-06-18 | 华为技术有限公司 | Video splitting method and electronic device |
CN112291634A (en) * | 2019-07-25 | 2021-01-29 | 腾讯科技(深圳)有限公司 | Video processing method and device |
WO2021047532A1 (en) * | 2019-09-10 | 2021-03-18 | Huawei Technologies Co., Ltd. | Method and system for video segmentation |
CN113160273A (en) * | 2021-03-25 | 2021-07-23 | 常州工学院 | Intelligent monitoring video segmentation method based on multi-target tracking |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030161396A1 (en) * | 2002-02-28 | 2003-08-28 | Foote Jonathan T. | Method for automatically producing optimal summaries of linear media |
CN101047795A (en) * | 2006-03-30 | 2007-10-03 | 株式会社东芝 | Moving image division apparatus, caption extraction apparatus, method and program |
CN101719144A (en) * | 2009-11-04 | 2010-06-02 | 中国科学院声学研究所 | Method for segmenting and indexing scenes by combining captions and video image information |
US20110069939A1 (en) * | 2009-09-23 | 2011-03-24 | Samsung Electronics Co., Ltd. | Apparatus and method for scene segmentation |
CN102833638A (en) * | 2012-07-26 | 2012-12-19 | 北京数视宇通技术有限公司 | Automatic video segmentation and annotation method and system based on caption information |
-
2016
- 2016-01-29 CN CN201610066554.0A patent/CN105744356B/en active Active
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030161396A1 (en) * | 2002-02-28 | 2003-08-28 | Foote Jonathan T. | Method for automatically producing optimal summaries of linear media |
CN101047795A (en) * | 2006-03-30 | 2007-10-03 | 株式会社东芝 | Moving image division apparatus, caption extraction apparatus, method and program |
US20110069939A1 (en) * | 2009-09-23 | 2011-03-24 | Samsung Electronics Co., Ltd. | Apparatus and method for scene segmentation |
CN101719144A (en) * | 2009-11-04 | 2010-06-02 | 中国科学院声学研究所 | Method for segmenting and indexing scenes by combining captions and video image information |
CN102833638A (en) * | 2012-07-26 | 2012-12-19 | 北京数视宇通技术有限公司 | Automatic video segmentation and annotation method and system based on caption information |
Non-Patent Citations (4)
Title |
---|
M XU, LT CHIA, H YI, D RAJAN: "Affective content detection in sitcom using subtitle and audio", 《Multi-Media Modelling Conference Proceedings, 2006 12th International》 *
ZHU Yingying, ZHOU Dongru: "Video segmentation based on video, audio and text" (in Chinese), 《Computer Engineering and Applications》 *
LI Songbin, WANG Lingfang, WANG Jinlin: "Video segmentation method based on script and subtitle information" (in Chinese), 《Computer Engineering》 *
TIAN Bo: "Research on content-based video segmentation technology" (in Chinese), 《China Masters' Theses Full-text Database》 *
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107438204B (en) * | 2017-07-26 | 2019-12-17 | 维沃移动通信有限公司 | Method for circularly playing media file and mobile terminal |
CN107438204A (en) * | 2017-07-26 | 2017-12-05 | 维沃移动通信有限公司 | A kind of method and mobile terminal of media file loop play |
CN108235141B (en) * | 2018-03-01 | 2020-11-20 | 北京网博视界科技股份有限公司 | Method, device, server and storage medium for converting live video into fragmented video on demand |
CN108235141A (en) * | 2018-03-01 | 2018-06-29 | 北京网博视界科技股份有限公司 | Live video turns method, apparatus, server and the storage medium of fragmentation program request |
CN108419123A (en) * | 2018-03-28 | 2018-08-17 | 广州市创新互联网教育研究院 | A kind of virtual sliced sheet method of instructional video |
CN108419123B (en) * | 2018-03-28 | 2020-09-04 | 广州市创新互联网教育研究院 | Virtual slicing method for teaching video |
WO2020119464A1 (en) * | 2018-12-12 | 2020-06-18 | 华为技术有限公司 | Video splitting method and electronic device |
US11902636B2 (en) | 2018-12-12 | 2024-02-13 | Petal Cloud Technology Co., Ltd. | Video splitting method and electronic device |
CN112291634A (en) * | 2019-07-25 | 2021-01-29 | 腾讯科技(深圳)有限公司 | Video processing method and device |
WO2021047532A1 (en) * | 2019-09-10 | 2021-03-18 | Huawei Technologies Co., Ltd. | Method and system for video segmentation |
US10963702B1 (en) | 2019-09-10 | 2021-03-30 | Huawei Technologies Co., Ltd. | Method and system for video segmentation |
CN114342353A (en) * | 2019-09-10 | 2022-04-12 | 华为技术有限公司 | Method and system for video segmentation |
CN113160273A (en) * | 2021-03-25 | 2021-07-23 | 常州工学院 | Intelligent monitoring video segmentation method based on multi-target tracking |
Also Published As
Publication number | Publication date |
---|---|
CN105744356B (en) | 2019-03-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105744356A (en) | Content-based video segmentation method | |
CN110147788B (en) | Feature enhancement CRNN-based metal plate strip product label character recognition method | |
CN103559196B (en) | Video retrieval method based on multi-core canonical correlation analysis | |
CN104376105B (en) | The Fusion Features system and method for image low-level visual feature and text description information in a kind of Social Media | |
CN108804578B (en) | Unsupervised video abstraction method based on consistency segment generation | |
CN103559237B (en) | Semi-automatic image annotation sample generating method based on target tracking | |
US8203554B2 (en) | Method and apparatus for identifying visual content foregrounds | |
CN102799669B (en) | Automatic grading method for commodity image vision quality | |
US10497166B2 (en) | Home filling method using estimated spatio-temporal background information, and recording medium and apparatus for performing the same | |
KR102024867B1 (en) | Feature extracting method of input image based on example pyramid and apparatus of face recognition | |
EP2568429A1 (en) | Method and system for pushing individual advertisement based on user interest learning | |
Varga et al. | Fully automatic image colorization based on Convolutional Neural Network | |
CN106792005B (en) | Content detection method based on audio and video combination | |
Sun et al. | Specific comic character detection using local feature matching | |
CN103942794A (en) | Image collaborative cutout method based on confidence level | |
US20210319230A1 (en) | Keyframe Extractor | |
CN104636761A (en) | Image semantic annotation method based on hierarchical segmentation | |
CN111210402A (en) | Face image quality scoring method and device, computer equipment and storage medium | |
CN104978565A (en) | Universal on-image text extraction method | |
CN101216943A (en) | A method for video moving object subdivision | |
Nugraha et al. | Video recognition of American sign language using two-stream convolution neural networks | |
CN102930292A (en) | Object identification method based on p-SIFT (Scale Invariant Feature Transform) characteristic | |
CN109213974B (en) | Electronic document conversion method and device | |
CN102831161A (en) | Semi-supervision sequencing study method for image searching based on manifold regularization | |
WO2020022329A1 (en) | Object detection/recognition device, method, and program |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |