CN100515048C - Method and system for fast detecting static stacking letters in online video stream - Google Patents

Method and system for fast detecting static stacking letters in online video stream

Info

Publication number
CN100515048C
CN100515048C · CNB2007101761264A · CN200710176126A
Authority
CN
China
Prior art keywords
frame
video stream
online video
text
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CNB2007101761264A
Other languages
Chinese (zh)
Other versions
CN101137017A (en)
Inventor
李甲
田永鸿
黄铁军
高文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University filed Critical Peking University
Priority to CNB2007101761264A priority Critical patent/CN100515048C/en
Publication of CN101137017A publication Critical patent/CN101137017A/en
Application granted granted Critical
Publication of CN100515048C publication Critical patent/CN100515048C/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The method detects statically superimposed text by combining inter-frame correlation with wavelet-domain modeling, which effectively removes moving text and background while preserving the static text regions, so that the positions of statically superimposed text in an online video stream can be detected quickly. The invention further builds an online video stream retrieval system on this detection method. Each user terminal runs the detector with a different parameter set, converts the detected text regions into text streams through OCR, and uploads the text to a centralized retrieval server, which integrates the results and provides multi-time-granularity retrieval of every channel as well as fast browsing of each channel's content. Because only the frames currently being watched are analyzed, the invention can concurrently analyze, index, retrieve and browse many online video streams of varying quality without infringing copyright and without setting up dedicated servers for each kind of video stream.

Description

Method and system for fast detection of statically superimposed text in an online video stream
Technical field
The present invention relates to image and video processing methods and systems, and in particular to a method and system for fast detection of statically superimposed text in online video streams.
Background art
Over the past decade, the rapid rise of network media technology has greatly expanded the ways in which people receive information, and among these technologies the development of online video streaming has attracted the most attention. In a sense, an online video stream can be regarded as a broadcast channel on the network; well-known systems such as PPLive broadcast hundreds of channels at any given time. To find channels of interest among thousands of such network broadcast channels, users urgently need an effective way to quickly browse and retrieve the content of the programs currently being played. However, retrieval based on static description information, such as searching channel descriptions or channel titles, cannot describe the latest content played on each network channel and therefore cannot satisfy this retrieval requirement.
At the same time, the rapid rise of digital technology has broadened the scope for automation. In the field of image and video processing, automatic object detection and recognition has attracted increasing attention, and text detection in images and video is a particularly significant research direction. To quickly find videos of interest among millions of video files, techniques have been developed for automatically detecting and recognizing the statically superimposed text in video files. By applying optical character recognition (OCR) to the detected static text, such as captions and titles, the content of the current video can be effectively inferred, so that video files can be retrieved and quickly browsed through their superimposed text. However, applying this approach to online video streams runs into the following problems:
1. Copyright. For typical network broadcasts such as online movie channels and Internet news, the streaming media publisher holds the copyright to the broadcast content. Storing and analyzing the content in a "download-then-analyze" fashion would infringe that copyright.
2. Real-time requirements. Unlike traditional video file retrieval, retrieval of online streaming media has strict real-time requirements: for example, a user may want to learn, by retrieval or browsing, what some online video streams have shown in the last one or five minutes. Existing text detection algorithms usually require a large amount of computation, so their detection speed is far below the normal playback speed of the video stream. Applying them to online video streams would cause many text regions to be missed, so a complete description of the program content could not be provided.
3. Number of channels. As noted above, to obtain a complete description of the program content, text must be detected and recognized quickly on every channel. The analysis of online video streams therefore cannot be performed serially, as in traditional video file analysis; instead, thousands or even tens of thousands of online video streams must be analyzed concurrently. Because the number of online video streams is huge and highly variable, setting up a dedicated analysis server for each stream is not feasible.
4. Video quality. Owing to network bandwidth limits and different coding formats, online video streams come in many resolutions and compression qualities. Existing text detection algorithms that use fixed thresholds, or adaptive thresholds based on prior rules, cannot handle videos of such varied quality and therefore cannot achieve the best detection results on all of them.
A survey of the domestic and foreign literature and patents shows that current detection methods for statically superimposed text are mainly based on single frames. The paper "Fast and robust text detection in images and video frames" applies a wavelet transform to a single image and detects text regions with a support vector machine (SVM) using features extracted in the wavelet domain. The paper "A comprehensive method for multilingual video text detection, localization, and extraction" detects approximate text regions in a single image with the Sobel edge operator and then locates them precisely by local thresholding and horizontal and vertical projection. The patent "Method for locating and extracting video captions using a support vector machine" detects text regions in a single image using an SVM and grayscale image features. The patent "Video caption content analysis system" locates text regions on individual video frames using grayscale edge information and predefined rules. None of these methods exploits inter-frame information to accelerate the computation. Although some papers and patents track text regions across frames using edge strength directly, such naive tracking cannot reflect how the edges actually change and therefore causes many missed and false detections. Moreover, in caption-based video retrieval, existing papers and patents remain at the stage of extracting captions from video files regardless of computational cost in order to build a text index for retrieval. Real-time indexing of online video streams, however, requires a text detection algorithm that can find static text regions quickly, so the computationally expensive analysis methods above cannot be applied directly to text detection in online video streams and cannot provide a complete textual description of them.
The above analysis shows that, because of the special nature of online video stream analysis, existing analysis and retrieval techniques designed for video files are difficult to apply to it directly. A retrieval system for online video streams that solves the problems above is therefore needed.
Summary of the invention
The object of the present invention is to provide a method and system for fast detection of statically superimposed text in online video streams. The detected text regions are segmented, recognized and converted to text, and the text analysis results of many online video streams are then integrated to build an online video stream retrieval system that offers multi-time-granularity search and content-based fast browsing.
To achieve the above object, the present invention adopts the following technical scheme:
A method for fast detection of statically superimposed text in an online video stream, whose overall workflow is as follows: first, the statically superimposed text regions in the video stream are quickly detected and recognized on each user terminal; the recognized text produced by each user terminal is then uploaded to a centralized retrieval server, which extracts keywords from these texts and integrates them in chronological order for user retrieval and browsing.
To detect statically superimposed text regions in the video stream quickly, each video frame is first scaled as necessary to reduce the amount of data to be processed. The subsequent detection process comprises the following steps: 1) define inter-frame correlation in the wavelet domain to describe the stability of edges, and use it to remove moving background regions and non-static superimposed text regions; 2) model the distribution of the wavelet sub-band coefficients with a generalized Gaussian model, approximating the distribution of the wavelet coefficients and deriving a corresponding threshold for extracting strong edge regions; 3) apply morphological operations to the regions remaining after the two steps above and divide them into candidate text lines; 4) track the candidate text lines across frames using the inter-frame correlation computed in step 1, and remove candidate text lines that did not first appear in the current frame; 5) for the remaining candidate text lines, extract features in the wavelet domain and classify them with a support vector machine to obtain the true text lines, i.e., the statically superimposed text newly appearing in this frame.
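Purely for illustration, the morphological grouping of step 3 can be sketched as follows; the use of SciPy's morphology operators, the structuring-element sizes and the aspect-ratio rule are assumptions of this sketch, not values fixed by the invention.

```python
import numpy as np
from scipy import ndimage

def candidate_text_lines(mask, min_width=20, min_height=6):
    """Group a binary edge mask into candidate text-line bounding boxes,
    a rough stand-in for the morphological grouping of step 3."""
    # Close small horizontal gaps between characters, then dilate slightly.
    closed = ndimage.binary_closing(mask, structure=np.ones((1, 9)))
    closed = ndimage.binary_dilation(closed, structure=np.ones((3, 3)))
    labels, _ = ndimage.label(closed)
    boxes = []
    for sl in ndimage.find_objects(labels):
        height = sl[0].stop - sl[0].start
        width = sl[1].stop - sl[1].start
        # Keep wide, not-too-thin components: a crude text-line shape rule.
        if width >= min_width and height >= min_height and width > 1.5 * height:
            boxes.append((sl[0].start, sl[1].start, height, width))
    return boxes

# Toy usage: a dense horizontal band in a random mask stands in for a caption.
mask = np.zeros((90, 160), dtype=bool)
mask[40:48, 30:120] = np.random.rand(8, 90) > 0.3
print(candidate_text_lines(mask))
```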
In addition, to enable fast browsing and retrieval, each user terminal must, without affecting the viewing experience, detect statically superimposed text regions in the video streams using one of many possible parameter sets. To run the text detection method above on each client, the following modules are added besides the text detection module itself: 1) a video frame grabbing module, which grabs the currently playing video frame together with its preceding and following frames from the online video stream at a certain frame sampling rate, for extracting newly appearing static text regions; 2) a random parameter generation module, which on each user terminal randomly draws the parameter set used in detection from preset ranges and applies it to all text detection tasks on that terminal; 3) an analysis speed estimation module, which, based on the text detection speed over the last several frames, decides the frame sampling rate of the frame grabbing module so that playback remains smooth.
By running this static text detection system on every user terminal, we obtain a method for retrieving and browsing online video streams based on statically superimposed text, comprising the following steps: 1) use the idle computing resources of each user terminal to detect statically superimposed text in the currently watched video stream without affecting the viewing experience; 2) use existing text segmentation and optical character recognition methods to recognize the detected static text regions, and send the resulting text to the centralized retrieval server in a fixed format; 3) on the centralized retrieval server, extract keywords from the received text using existing methods and integrate them per channel to obtain a text index of the online video streams; 4) on the centralized retrieval server, use the extracted keyword index of each channel to provide multi-time-granularity search over the channels; 5) on the centralized retrieval server, use the extracted keyword index of each channel to provide content-based fast browsing of the channels.
Based on this method, we have developed a system for retrieving and browsing online video streams. In this system each user terminal performs text detection on the video stream it is watching, recognizes the detected regions as text, and transmits the text to the centralized retrieval server. The centralized retrieval server integrates the analysis results from all clients into a text index of the online video streams. The server then relies on these texts to offer users multi-time-granularity search and content-based fast browsing of every channel.
Compared with existing methods, the main innovations of the present invention adopting the above technical scheme are: 1) during detection, only the few frames that the user is currently watching are used, which avoids the copyright infringement caused by downloading; 2) a robust description of inter-frame correlation is proposed to remove non-static superimposed text regions quickly, and several measures keep static text detection fast, which solves the real-time problem; 3) the idle resources of the user terminals are used for text detection, realizing concurrent detection on many online video streams and solving the channel quantity problem; 4) exploiting the distributed nature of the user terminals, different parameter values are used on different terminals in the static text detection flow to adapt to online video streams of various resolutions and compression qualities, ensuring that correct text regions are detected and solving the video quality problem; 5) by integrating and publishing the results computed by the end users, multi-time-granularity search and content-based fast browsing of online video streams are offered to users; 6) given a network online video stream, the static text detection method of the present invention can run on any browsing terminal with a processor, such as a PC or a smartphone; 7) the present invention is equally applicable to television broadcasts obtained through capture cards or TV tuner cards. By collecting and integrating the analysis results of all terminals, a real-time textual description of any video presented as streaming media can be obtained, making multi-time-granularity retrieval and content-based fast browsing of arbitrary online video streams possible for the user.
Description of drawings
Fig. 1 is a schematic diagram of the modules of the fast static superimposed text detection system of the present invention.
Fig. 2 is a schematic flow chart of the fast static superimposed text detection method of the present invention.
Fig. 3 is a schematic diagram of the architecture of the video stream analysis and retrieval system of the present invention.
Fig. 4 is a schematic diagram of the configuration of the system that retrieves online video streams based on the fast static superimposed text detection method of the present invention.
Embodiments
The present invention is described in detail below through embodiments and with reference to the accompanying drawings.
As shown in Fig. 1, the static superimposed text detection system of the present invention mainly comprises the following basic modules:
1) a video frame grabbing module 1, which grabs the currently playing video frame and its preceding and following frames from the online video stream at a certain frame sampling rate;
2) a wavelet decomposition module 2, which decomposes the current frame and its preceding and following frames into the corresponding wavelet sub-bands;
3) a temporal analysis module 4, which removes moving background and non-static superimposed text regions according to inter-frame correlation;
4) a spatial analysis module 5, which removes simple background regions according to wavelet-domain parameters;
5) a post-processing module 6, which combines the results of the temporal and spatial analysis modules, divides text lines, tracks them across frames using the inter-frame correlation computed with the preceding frame, and verifies text lines with a support vector machine.
To guarantee that frames can be sampled on the user terminal without disturbing the user's viewing of the video stream, two further modules are added when this text detection system is implemented on a user terminal:
1) a random parameter generation module 3, which on each user terminal randomly draws the parameter set used in detection from preset ranges and applies it to all text detection tasks on that terminal;
2) an analysis speed estimation module 7, which, based on the text detection speed over the last several frames, decides the frame sampling rate of the frame grabbing module so that playback remains smooth.
The data flow between modules is as follows: the video stream data passes in turn through the video frame grabbing module 1, the wavelet decomposition module 2, the temporal analysis module 4, the spatial analysis module 5 and the post-processing module 6. The output of the post-processing module 6 contains text region information and control information. The control information, such as the time spent on analysis, is sent to the analysis speed estimation module 7 to decide the sampling frequency used for the next analysis. Meanwhile, the random parameter generation module 3 produces different random parameters and passes them to the temporal analysis module 4 and the spatial analysis module 5 to control the text detection process.
Fig. 2 shows the text detection flow of the present invention. To avoid copyright infringement, when detecting the statically superimposed text regions of frame i, only the two frames adjacent to it are selected for extracting inter-frame correlation. In general, a text region can be regarded as a set of high-frequency edges, and in contrast to "moving" text such as rolling news tickers and natural scene text, statically superimposed text, especially artificially superimposed titles and captions, contains relatively static edges. The purpose of the inter-frame correlation analysis is therefore to compare the edges of the current frame with those of the preceding and following frames, determine how the edges move, and thereby obtain the regions rich in static edges.
Because statically superimposed text regions usually contain complicated static edges, detecting static edges by directly comparing the edge strengths of two frames at the same position is inadvisable. First, such a method is very sensitive to noise. Second, background changes behind a static text region alter the edge strength of the text, which may lead to wrong judgements about whether an edge is stable. Finally, judging edge stability with a hard threshold on edge-strength changes cannot accommodate online video streams of different resolutions and compression qualities. To solve these problems, the present invention proposes a robust definition of inter-frame correlation that quantifies the stability of edges.
The video frame grabbing module 1 grabs three consecutive frames from the video stream, which are first decomposed by the wavelet decomposition module 2 into the four wavelet sub-bands LL, HL, LH and HH. The present invention avoids the problems of direct edge-strength comparison as follows:
1. Noise handling. Among the four wavelet sub-bands, LH and HL represent horizontal and vertical edges respectively, while the HH sub-band contains diagonal edges. The HH sub-band is generally considered to contain many spurious edges caused by isolated noise points, and the usual remedy is to filter it. To reduce the amount of data and achieve real-time processing, the present invention uses only the edges in the HL and LH sub-bands, thereby removing the influence of this noise.
2. Edge-strength changes caused by background changes. In video, when the same text is superimposed on different backgrounds, its edges also change. For example, the edge strength of the same text differs greatly depending on whether it lies on a white background or on a gray background, and comparing edge strengths directly would easily lead to wrong judgements. An intuitive remedy is to compare not the absolute edge strengths at corresponding positions of the two frames but their relative edge strengths. To this end, to measure the edge stability at point (x, y) of a wavelet sub-band WS (WS ∈ {LH, HL}) of frames i-1 and i, the present invention computes the variance of the edge strength in the neighbourhood centred at that point:
\bar{\sigma}_i^2(x, y, WS) = \frac{1}{(2M+1)^2} \sum_{a=x-M}^{x+M} \sum_{b=y-M}^{y+M} WS_i(a, b)^2 \qquad (1)
In the above formula, WS_i(a, b) is the wavelet coefficient at position (a, b) of the wavelet sub-band WS of frame i, i.e., the edge strength in that direction. Averaging the wavelet coefficients over the (2M+1)×(2M+1) neighbourhood effectively suppresses the influence of blocking artifacts and noise on the edge strength. Next, the covariance of frames i-1 and i at point (x, y) of the wavelet sub-band WS is computed as follows:
\bar{\sigma}_{[i-1,i]}(x, y, WS) = \frac{1}{(2M+1)^2} \sum_{a=x-M}^{x+M} \sum_{b=y-M}^{y+M} WS_{i-1}(a, b)\, WS_i(a, b) \qquad (2)
In the above formula, the covariance expresses how well the wavelet coefficients of the corresponding sub-bands of the two frames match in the small local region around (a, b). To quantify the relative change of the edges, the inter-frame correlation at a point of sub-band WS is defined as follows:
ISCC(x, y, i-1, i, WS) = \begin{cases} -1, & \bar{\sigma}_{i-1}(x, y, WS)\,\bar{\sigma}_i(x, y, WS) < \varepsilon \\ \max\!\left(\min\!\left(1, \dfrac{\bar{\sigma}_{[i-1,i]}(x, y, WS)}{\bar{\sigma}_{i-1}(x, y, WS)\,\bar{\sigma}_i(x, y, WS)}\right), -1\right), & \text{otherwise} \end{cases} \qquad (3)
As can be seen from this definition, the inter-frame correlation not only considers the change of the average edge strength in a local region, which reduces the influence of noise and blocking artifacts, but also reflects the motion of the edge points by computing the matching of the local regions of the two frames. Taking the ratio of the covariance to the product of the average edge strengths amounts to using relative edge strength to express how the edge positions change in the local region around point (x, y). To quantify the degree of edge stability, the value of the inter-frame correlation in formula (3) is limited to between -1 and 1. ε is a small predefined value; for regions without edges the correlation is set to -1, which reduces the amount of computation.
With this definition of inter-frame correlation, the edge stability at a given position of a given sub-band of two frames can be computed effectively, discarding noise and blocking artifacts even when the background changes. By computing the correlation of every point of the LH and HL sub-bands between frames i-1 and i and between frames i and i+1, the stability of any edge in frame i can be quantified along two spatial directions and two temporal directions. The present invention defines the stability of an edge point as the maximum of these four stability values, which yields the temporal stability map of frame i; the value at any position (x, y) of this map represents the greatest stability of that point over the directions and frame pairs considered.
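A minimal numerical sketch of formulas (1)-(3) and of the temporal stability map is given below. It assumes PyWavelets for a one-level Haar decomposition and SciPy's separable uniform filter for the (2M+1)×(2M+1) local averages; the choices of wavelet, M = 2, ε = 10⁻³ and the binarization threshold of 0.6 are illustrative assumptions, not values prescribed by the invention.

```python
import numpy as np
import pywt
from scipy.ndimage import uniform_filter

def local_stats(ws_a, ws_b, M=2):
    """Local edge energies and cross-covariance, formulas (1) and (2)."""
    win = 2 * M + 1
    var_a = uniform_filter(ws_a * ws_a, size=win)   # sigma_bar^2 of frame a
    var_b = uniform_filter(ws_b * ws_b, size=win)   # sigma_bar^2 of frame b
    cov   = uniform_filter(ws_a * ws_b, size=win)   # sigma_bar_[a,b]
    return var_a, var_b, cov

def iscc(ws_a, ws_b, M=2, eps=1e-3):
    """Inter-frame correlation of formula (3), clamped to [-1, 1]."""
    var_a, var_b, cov = local_stats(ws_a, ws_b, M)
    denom = np.sqrt(var_a) * np.sqrt(var_b)
    ratio = np.clip(cov / np.maximum(denom, eps), -1.0, 1.0)
    return np.where(denom < eps, -1.0, ratio)       # edge-free regions get -1

def temporal_stability(prev, cur, nxt, wavelet="haar", M=2):
    """Maximum of the four ISCC maps (two sub-bands x two frame pairs)."""
    def detail_subbands(frame):
        _, (cH, cV, _) = pywt.dwt2(frame, wavelet)  # keep the two detail bands, drop HH
        return cH, cV
    s_prev, s_cur, s_nxt = map(detail_subbands, (prev, cur, nxt))
    maps = [iscc(a, b, M) for a, b in [(s_prev[0], s_cur[0]), (s_prev[1], s_cur[1]),
                                       (s_cur[0], s_nxt[0]), (s_cur[1], s_nxt[1])]]
    return np.max(np.stack(maps), axis=0)

# Toy usage with three random "frames"; real input would be sampled video frames.
frames = [np.random.rand(180, 320).astype(np.float64) for _ in range(3)]
stability = temporal_stability(*frames)
stable_mask = stability > 0.6        # binarization threshold drawn from [0.4, 1.0]
```

Because the local averages are obtained with a separable mean filter, the cost per frame grows only linearly with the number of sub-band coefficients, which is the point of speed-up measure 3 listed later.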
3. Threshold selection. According to experiments, the temporal stability of statically superimposed text regions lies between 0.4 and 1.0, whereas that of non-static regions is generally near 0 or negative. To obtain the stable edge regions, a threshold is needed to binarize the temporal stability map and separate stable edges from unstable ones. For the algorithm to suit online video streams of different resolutions and compression qualities, the choice of this threshold is extremely important. The present invention uses an engineering approach to this end, whose details are described below in the system architecture for video stream retrieval.
Computing and binarizing the inter-frame correlation yields the regions that are relatively stable along the time axis. To simplify the computation, the present invention simultaneously uses the inter-frame correlation to track text regions, removing statically superimposed text regions that already appeared in earlier frames. Detecting and processing only the text regions newly appearing in the current frame greatly accelerates detection and recognition.
Compared with ordinary textured backgrounds and simple edge regions, static text regions have stronger and denser edges, which appear in the wavelet domain as dense, strong wavelet coefficients. To capture this characteristic, the present invention uses a generalized Gaussian model to model the wavelet domain quickly; its purpose is to describe the edge distribution histogram of a wavelet sub-band with a simple mean, variance and shape parameter. The modelling and derivation algorithm can be found in the reference "Estimation of shape parameter for generalized Gaussian distribution in subband decompositions of video"; here only a brief outline of the parameter estimation is given.
For a wavelet sub-band WS, its parameters are estimated as follows:
1) compute the mean μ and variance σ² of the wavelet coefficients in sub-band WS;
2) compute the expectation of the absolute deviation of the wavelet coefficients in sub-band WS from the mean μ:
E(|WS|) = \frac{1}{MN} \sum_{m=1}^{M} \sum_{n=1}^{N} |WS(m, n) - \mu|
3) compute the ratio of the variance to this expectation: ρ = σ² / E²(|WS|);
4) solve the equation f(γ) = Γ(1/γ)Γ(3/γ)/Γ²(2/γ) = ρ by a lookup table, where Γ(·) is the Gamma function, \Gamma(x) = \int_0^{\infty} t^{x-1} e^{-t}\, dt \; (x > 0), and γ is the shape parameter.
From these simple parameters, the present invention determines a simple threshold for sub-band WS:
threshold_{WS} = C \times \gamma_{WS} \times \sigma_{WS}
This threshold is used to remove most of the background regions in the sub-band while keeping the regions with complex edges. In the formula, γ_WS is the shape parameter of the generalized Gaussian model of sub-band WS. For the same variance, a smaller γ_WS means that sub-band WS contains more low coefficients, i.e., a relatively "clean" background, so the corresponding threshold can also be smaller. C is a weighting constant; experiments show that C can generally be taken between 2.5 and 5.5. For the algorithm to suit online video streams of different resolutions and compression qualities, the present invention uses an engineering approach to determine the value of C, described in the system architecture for video stream retrieval.
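As an illustration of the moment-matching estimation above, the following sketch uses SciPy's Gamma function and a Brent root search over the shape parameter in place of the lookup table mentioned in the text; the search interval [0.1, 3.0], the constant C = 3.5 and the synthetic Laplacian-like sub-band are assumptions of the sketch.

```python
import numpy as np
from scipy.special import gamma as Gamma
from scipy.optimize import brentq

def ggd_threshold(ws, C=3.5):
    """Estimate the generalized-Gaussian shape of sub-band `ws` and derive
    threshold_WS = C * gamma_WS * sigma_WS as in the text."""
    mu = ws.mean()
    sigma2 = ws.var()
    e_abs = np.abs(ws - mu).mean()                  # E(|WS - mu|)
    rho = sigma2 / (e_abs ** 2)                     # rho = sigma^2 / E^2(|WS|)

    # f(gamma) = Gamma(1/g) * Gamma(3/g) / Gamma(2/g)^2; solve f(gamma) = rho.
    f = lambda g: Gamma(1.0 / g) * Gamma(3.0 / g) / Gamma(2.0 / g) ** 2
    shape = brentq(lambda g: f(g) - rho, 0.1, 3.0)  # assumed search interval

    return C * shape * np.sqrt(sigma2), shape

# Toy usage on a synthetic Laplacian-like sub-band (true shape parameter near 1).
ws = np.random.laplace(scale=2.0, size=(90, 160))
thr, shape = ggd_threshold(ws, C=3.5)
strong_edges = np.abs(ws) > thr
```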
The two sub-bands HL and LH of frame i are each binarized with the above threshold, keeping the points whose wavelet coefficient magnitude exceeds the threshold, which removes the simple background from the two sub-bands. Combining the results of the two sub-bands with an OR operation gives the complex edge regions in all spatial directions.
Combining the complex edge regions in all spatial directions with the regions that are relatively stable along the time axis yields the regions that are spatially complex and temporally stable, i.e., the possible statically superimposed text regions.
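A few lines suffice to illustrate the combination; the random masks below stand in for the binarized HL/LH sub-bands and the binarized temporal stability map, and interpreting the space-time combination as a logical AND (complex in space and stable in time) is an assumption of this sketch.

```python
import numpy as np

# Toy stand-ins for the outputs of the two preceding steps.
strong_hl = np.random.rand(90, 160) > 0.85   # binarized HL sub-band of frame i
strong_lh = np.random.rand(90, 160) > 0.85   # binarized LH sub-band of frame i
stable    = np.random.rand(90, 160) > 0.50   # binarized temporal stability map

spatial_mask   = strong_hl | strong_lh   # complex edges in either direction (OR)
candidate_mask = spatial_mask & stable   # spatially complex AND temporally stable
```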
To detect the real statically superimposed text regions among these candidate regions and present them as text lines, the post-processing module 6 applies simple post-processing and a rule-based text-line division method. For each possible text line, the moments of each order and the histogram features of the wavelet coefficients in that region are extracted, and a pre-trained support vector machine decides whether it is a real text line.
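The invention does not fix the exact features or classifier configuration; the sketch below assumes simple moment and histogram features of the wavelet coefficients inside each candidate box and an RBF-kernel support vector machine from scikit-learn, trained here on a tiny synthetic example only so that the snippet is self-contained.

```python
import numpy as np
from sklearn.svm import SVC

def line_features(ws, box, bins=8):
    """Moments and a coefficient histogram of sub-band `ws` inside a
    candidate text-line box (row, col, height, width)."""
    r, c, h, w = box
    patch = np.abs(ws[r:r + h, c:c + w]).ravel()
    moments = [patch.mean(), patch.std(), ((patch - patch.mean()) ** 3).mean()]
    hist, _ = np.histogram(patch, bins=bins, range=(0, patch.max() + 1e-6), density=True)
    return np.array(moments + hist.tolist())

# Synthetic training set: the "text" box has dense strong coefficients,
# the "background" boxes sparse weak ones (labels are illustrative only).
rng = np.random.default_rng(0)
ws = rng.laplace(scale=1.0, size=(90, 160))
ws[40:48, 30:120] += rng.laplace(scale=6.0, size=(8, 90))     # a fake text line

boxes  = [(40, 30, 8, 90), (5, 10, 8, 90), (60, 20, 8, 90), (75, 50, 8, 90)]
labels = np.array([1, 0, 0, 0])
X = np.array([line_features(ws, b) for b in boxes])

clf = SVC(kernel="rbf", gamma="scale").fit(X, labels)
print(clf.predict(X))        # the text-line box should come out as class 1
```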
To keep the text detection algorithm fast, the following measures are adopted in the detection method described above:
1) for high-resolution online video streams, the amount of computation is reduced by reducing the resolution;
2) after the wavelet decomposition, only the LH and HL sub-bands are used; the noise-rich HH sub-band is not used;
3) when computing the inter-frame correlation, a two-dimensional separable mean filter is used to compute the local variance and inter-frame covariance, which greatly reduces the amount of computation;
4) in the wavelet-domain modelling, a fast algorithm is used to estimate the parameters and derive the threshold;
5) when the support vector machine judges text lines, simple and effective features are used.
These measures and fast algorithms effectively reduce the time required for text detection; experiments show that, at a reasonable frame sampling rate, the algorithm fully guarantees real-time detection. The text lines detected by this method generally include titles, captions and some other statically superimposed text. Existing text segmentation and optical character recognition methods then yield the text representation of these regions. To speed up processing, the present invention segments and recognizes only the text lines newly appearing in each frame; for text regions already present in earlier frames, the existing recognition results are reused, which further accelerates detection.
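A minimal sketch of reusing earlier recognition results is shown below; keying text lines by a quantized bounding-box position and the `ocr` callable are assumptions introduced for illustration.

```python
import numpy as np

class LineTextCache:
    """Reuse OCR results for text lines that persist across frames."""

    def __init__(self, grid=8):
        self.grid = grid          # quantization step for matching line positions
        self.cache = {}           # position key -> recognized text

    def _key(self, box):
        r, c, h, w = box
        g = self.grid
        return (round(r / g), round(c / g), round(h / g), round(w / g))

    def text_for(self, box, frame, ocr):
        """Return cached text for a persisting line, or run `ocr` once for a new one."""
        key = self._key(box)
        if key not in self.cache:
            r, c, h, w = box
            self.cache[key] = ocr(frame[r:r + h, c:c + w])
        return self.cache[key]

# Toy usage with a dummy OCR callable; a real system would call an OCR engine here.
cache = LineTextCache()
dummy_frame = np.zeros((90, 160))
print(cache.text_for((40, 30, 8, 90), dummy_frame, ocr=lambda patch: "NEWS HEADLINE"))
```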
To verify the effectiveness of the algorithm of the present invention, 15 video segments totalling 6 hours and 49 minutes were used for testing. The video resolution was uniformly 400×320. The algorithm of the present invention was also compared with the algorithms of the following two papers:
1) Lyu, M.R., Jiqiang Song, and Min Cai. A comprehensive method for multilingual video text detection, localization, and extraction. IEEE Transactions on Circuits and Systems for Video Technology, Vol. 15, Issue 2, Feb. 2005, pp. 243-255.
2) Qixiang Ye, Qingming Huang, Wen Gao, and Debin Zhao. Fast and robust text detection in images and video frames. Image and Vision Computing, Vol. 23, No. 6, pp. 565-576, Mar. 2005.
To ensure a fair comparison, all the videos were sampled at a uniform rate of 2 frames per second. The three algorithms were compared in terms of recall, false detection rate and detection speed for statically superimposed text; the results are shown in Table 1:
Table 1: Comparison of the text detection algorithms

  Algorithm                  Recall (%)    False detection rate (%)    Detection speed (frames/s)
  Proposed algorithm         90.66         28.98                       9.09
  Algorithm of paper (1)     82.11         38.17                       4.46
  Algorithm of paper (2)     88.68         37.49                       1.18
The comparison shows that, by effectively exploiting inter-frame information, the algorithm of the present invention reaches a much higher detection speed while also achieving higher recall and a lower false detection rate.
Because the amount of spare resources differs between user terminals, in order to detect text as fast as possible without disturbing the user's viewing, the present invention adds the following two modules when the text detection algorithm is implemented on each user terminal (a sketch of both modules follows this list):
1) the analysis speed estimation module 7, which, based on the text detection speed over the last several frames, decides and controls the frame sampling rate of the video frame grabbing module for the online video stream;
2) the random parameter generation module 3. In general, network video streams of different resolutions and compression qualities need different parameters to reach the best text detection results. Since a channel is watched by many users, different parameter settings can be used for text detection on different user terminals, and some of the parameter values will be close to the optimal ones, yielding detection results close to the optimum. In the system of the present invention, different terminals may therefore binarize the temporal stability map produced by the temporal analysis module 4 with different thresholds in the range [0.4, 1.0], and when the spatial analysis module 5 derives the wavelet-domain binarization threshold from the parameters of the generalized Gaussian model, the value of the constant C may vary freely within the range [2.5, 5.5]. On a given user terminal, the parameter set used in detection is drawn randomly from these preset ranges and applied to all text detection tasks on that terminal.
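The two terminal-side modules can be sketched as follows, assuming uniform random draws over the ranges stated above and a simple proportional rule that caps the sampling rate by the fraction of processor time the terminal can spare; the target rate of 2 frames per second, the 50% load cap and the exponential smoothing are illustrative assumptions.

```python
import random

def draw_detection_parameters():
    """Random parameter generation module: one parameter set per terminal."""
    return {
        "stability_threshold": random.uniform(0.4, 1.0),  # temporal binarization
        "ggd_weight_C":        random.uniform(2.5, 5.5),  # wavelet-domain threshold
    }

class SamplingRateEstimator:
    """Analysis speed estimation module: adapt the frame sampling rate so that
    analysis never disturbs playback."""

    def __init__(self, target_fps=2.0, max_load=0.5):
        self.target_fps = target_fps    # desired sampling rate (frames/second)
        self.max_load = max_load        # fraction of wall time allowed for analysis
        self.avg_cost = None            # smoothed seconds per analysed frame

    def update(self, seconds_per_frame):
        self.avg_cost = (seconds_per_frame if self.avg_cost is None
                         else 0.8 * self.avg_cost + 0.2 * seconds_per_frame)
        affordable = self.max_load / self.avg_cost   # frames/s the terminal can spare
        return min(self.target_fps, affordable)

params = draw_detection_parameters()
estimator = SamplingRateEstimator()
print(params, estimator.update(seconds_per_frame=0.11))   # about 2 fps if analysis is cheap
```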
Fig. 3 shows a prototype system for analysing statically superimposed text in video after the above modules have been added. To demonstrate the feasibility of this static text detection system, the detected text was also recognized. As the figure shows, the detected positions match the positions of the actual statically superimposed text, so keywords can basically be recognized and extracted correctly from the detected text regions. This proves that the idea of indexing video streams by their statically superimposed text is entirely feasible.
As shown in Fig. 4, to integrate the text detection, segmentation and recognition results of all user terminals and to provide users with multi-time-granularity retrieval and content-based fast browsing, the present invention proposes an architecture for online video stream retrieval. This video stream analysis and retrieval system consists of three parts: the video providers, the end users, and the centralized retrieval server. A video provider supplies the video stream service and can be regarded as a network broadcast channel with a unique network address, i.e., the video stream data module.
The end user is the recipient of the video stream service. The user terminal part mainly comprises the static superimposed text detection module and the optical character recognition module. Without affecting normal viewing of the video stream on the user terminal, it applies the fast static text detection algorithm and OCR described above to the video stream currently being watched, detects the statically superimposed text regions of the current frame and converts them to text. Because the random parameter generation part of the static text detection module on each user terminal produces its thresholds randomly within the allowed ranges for the channel currently being watched, when enough users are watching it is guaranteed that the best detected text regions approach the true regions.
Different terminals thus detect text with different, randomly generated parameter sets, and after the detected regions are segmented and recognized the corresponding text is obtained. The text is then transmitted to the centralized retrieval server for further integration and publishing. The uploaded text has the following format (a sketch of the record layout follows Table 2):
Table 2: Format of the uploaded text

  Field 1      Field 2      Field 3          Field 4    ...    Field N          Field N+1
  Channel ID   Time stamp   Text position    Text       ...    Text position    Text
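One possible serialization of a Table 2 record is sketched below; the tab-separated layout, the bounding-box encoding of the text position and the ISO time stamp are assumptions, since only the field order (channel identification, time mark, then position/text pairs) is specified above.

```python
from datetime import datetime, timezone

def build_upload_record(channel_id, regions):
    """Serialize one detection result in the field order of Table 2:
    channel ID, time stamp, then (text position, text) pairs."""
    fields = [channel_id, datetime.now(timezone.utc).isoformat(timespec="seconds")]
    for (row, col, height, width), text in regions:
        fields.append(f"{row},{col},{height},{width}")   # text position
        fields.append(text)                              # recognized text
    return "\t".join(fields)

def parse_upload_record(record):
    fields = record.split("\t")
    channel_id, timestamp, rest = fields[0], fields[1], fields[2:]
    regions = [(tuple(int(v) for v in rest[i].split(",")), rest[i + 1])
               for i in range(0, len(rest), 2)]
    return channel_id, timestamp, regions

record = build_upload_record("CCTV-1", [((40, 30, 8, 90), "NEWS HEADLINE")])
print(parse_upload_record(record))
```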
The centralized retrieval server mainly comprises a keyword extraction module, a keyword integration module, a multi-time-granularity retrieval module and a content-based fast browsing module, whose functions are as follows:
1. Keyword extraction module and keyword integration module: the text transmitted by the user terminals is integrated in chronological order. First, according to the positions of the newly appearing text and a voting principle, text regions that were detected with wrong parameters or at the wrong time are removed. In addition, because user queries are usually keyword based, and because wrongly chosen parameters lead to poor detection and hence to recognition results containing many isolated characters, the keyword extraction module first marks keywords in the text submitted by all user terminals watching the same channel, and the keyword integration module then integrates these keywords to annotate the current content of the video stream.
2. Multi-time-granularity retrieval module: mainly provides the user search function. Based on the text obtained from the analyses performed by the existing end users of a network channel, it returns retrieval results at different time granularities. For example, when the user searches with the keyword "legal", the channels whose text keywords match "legal" within the last minute, the last five minutes, the last fifteen minutes, the last hour, and so on are returned for each time granularity, and the query results are ranked by the frequency of the keyword and by its time distance from the query time (see the ranking sketch after this list).
3. Content-based fast browsing module: mainly provides the fast browsing function. Besides search, a fast browsing function should also be offered. When the user selects fast browsing, the current frame of each channel is captured and presented as an image, serving as a visual identification of the channel, and the most recently extracted keywords are presented as text so that the user can quickly grasp a summary of each channel's recent programming.
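The ranking rule of module 2 can be sketched as follows, assuming each channel's index is a list of (time stamp, keywords) entries and that channels are scored by keyword frequency within the chosen time granularity, with ties broken by how recently the keyword appeared; the concrete scoring is an illustrative choice.

```python
from datetime import datetime, timedelta

def search_channels(index, keyword, query_time, granularity_minutes=5):
    """Rank channels for `keyword` within one time granularity.

    index: {channel_id: [(timestamp, [keywords...]), ...]}
    Channels are sorted by keyword frequency in the window and, among equal
    frequencies, by smaller time distance to the query time.
    """
    window = timedelta(minutes=granularity_minutes)
    results = []
    for channel, entries in index.items():
        hits = [t for t, kws in entries
                if query_time - window <= t <= query_time and keyword in kws]
        if hits:
            distance = (query_time - max(hits)).total_seconds()
            results.append((channel, len(hits), distance))
    return sorted(results, key=lambda r: (-r[1], r[2]))

# Toy index for two channels; real entries come from the keyword integration module.
now = datetime(2007, 10, 19, 20, 0, 0)
index = {
    "news-1":  [(now - timedelta(minutes=1), ["legal", "court"]),
                (now - timedelta(minutes=3), ["legal"])],
    "movie-2": [(now - timedelta(minutes=2), ["legal"])],
}
print(search_channels(index, "legal", query_time=now))   # news-1 ranks first
```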
In this way, by effectively integrating the computing resources of the user terminals to detect and recognize statically superimposed text quickly, a content-based index of the online videos is obtained, and users can apply existing text retrieval techniques to perform multi-time-granularity retrieval and content-based fast browsing over many online video streams.

Claims (10)

1. A method for fast detection of statically superimposed text in an online video stream, comprising the following steps:
1) defining inter-frame correlation in the wavelet domain to describe the stability of edges, and using it to remove moving background regions and non-static superimposed text regions;
2) modelling the distribution of the wavelet sub-band coefficients with a generalized Gaussian model, approximating the distribution of the wavelet coefficients and deriving a corresponding threshold for extracting strong edge regions;
3) applying morphological operations to the regions remaining after the above two steps and dividing them into candidate text lines;
4) tracking the candidate text lines across frames using the inter-frame correlation computed in step 1), and removing candidate text lines that did not first appear in the current frame;
5) for the remaining candidate text lines, extracting corresponding features in the wavelet domain and classifying them with a support vector machine to obtain the true text lines, i.e., the statically superimposed text newly appearing in this frame.
2. The method for fast detection of statically superimposed text in an online video stream according to claim 1, characterized in that: the definition of inter-frame correlation in the wavelet domain in step 1) uses the wavelet features of the current frame and of the preceding and following frames, computes local variance and local covariance in the wavelet domain, and thereby defines a robust description of edge stability, namely the inter-frame correlation coefficient.
3. The method for fast detection of statically superimposed text in an online video stream according to claim 1, characterized in that: the generalized Gaussian model used in step 2) models the wavelet coefficient distribution of each wavelet sub-band of the current frame; after the model parameters are estimated by a fast algorithm, these parameters are used to estimate a global threshold for distinguishing strong edge regions from simple background regions.
4. A system for fast detection of statically superimposed text in an online video stream, characterized by comprising: a video frame grabbing module that grabs the currently playing video frame and its preceding and following frames from the online video stream at a certain frame sampling rate; a wavelet decomposition module that decomposes the current frame and its preceding and following frames into the corresponding wavelet sub-bands; a temporal analysis module that removes moving background and non-static superimposed text regions according to inter-frame correlation; a spatial analysis module that removes simple background regions according to wavelet-domain parameters; and a post-processing module that combines the results of the temporal and spatial analysis modules, divides text lines, tracks them across frames using the inter-frame correlation computed with the preceding frame, and verifies text lines with a support vector machine; the video stream data passes in turn through the video frame grabbing module, the wavelet decomposition module, the temporal analysis module, the spatial analysis module and the post-processing module, which outputs text region information and control information.
5. The system for fast detection of statically superimposed text in an online video stream according to claim 4, characterized by further comprising: a random parameter generation module that, on each user terminal, randomly draws the parameter set used in detection from preset ranges and applies it to all text detection tasks on that terminal; and an analysis speed estimation module that, based on the text detection speed over the last several frames, decides the frame sampling rate of the frame grabbing module so that playback remains smooth; the control information, such as the analysis time, is transmitted to the analysis speed estimation module to decide the sampling frequency of the next analysis, while the random parameter generation module generates random parameters and passes them to the temporal analysis module and the spatial analysis module to control the text detection process.
6. A method for retrieving online video streams using the method for fast detection of statically superimposed text in an online video stream according to claim 1, comprising the following steps:
1) using the idle computing resources of each user terminal to detect statically superimposed text in the currently watched video stream without affecting the viewing experience;
2) using existing text segmentation and optical character recognition methods to recognize the detected static text regions, and sending the resulting text to a centralized retrieval server in a fixed format;
3) on the centralized retrieval server, extracting keywords from the received text using existing methods and integrating them per channel to obtain a text index of the online video streams;
4) on the centralized retrieval server, using the extracted keyword index of each channel to provide multi-time-granularity search over the channels;
5) on the centralized retrieval server, using the extracted keyword index of each channel to provide content-based fast browsing of the channels.
7. The method for retrieving online video streams according to claim 6, characterized in that step 3) further comprises integrating the text results from different user terminals according to keyword frequency, position and time marks, so as to form a textual description, with weights, of a given online video stream at any time.
8. The method for retrieving online video streams according to claim 6, characterized in that step 4) further comprises, for the time granularity set by the user at query time, ranking all online video streams using the keywords and their weights within that time granularity, and returning the result to the user.
9. The method for retrieving online video streams according to claim 6, wherein step 5) further comprises, at the moment the user browses, grabbing the current frame of each online video stream together with the textual description of the most recent short period, for content-based fast browsing by the user.
10. A system for retrieving online video streams built on the system for fast detection of statically superimposed text in an online video stream according to claim 4, characterized in that: the idle computing resources of each user terminal are used to detect statically superimposed text in the currently watched video stream without affecting the viewing experience; a recognition module uses existing text segmentation and optical character recognition methods to recognize the detected static text regions on each user terminal and sends the resulting text to a centralized retrieval server in a fixed format; a keyword extraction module on the centralized retrieval server extracts keywords from each received text using existing methods and integrates them per channel to obtain a text index of the online video streams; and the extracted keyword index of each channel is used to provide multi-time-granularity search over the channels and content-based fast browsing of the channels.
CNB2007101761264A 2007-10-19 2007-10-19 Method and system for fast detecting static stacking letters in online video stream Expired - Fee Related CN100515048C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNB2007101761264A CN100515048C (en) 2007-10-19 2007-10-19 Method and system for fast detecting static stacking letters in online video stream

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNB2007101761264A CN100515048C (en) 2007-10-19 2007-10-19 Method and system for fast detecting static stacking letters in online video stream

Publications (2)

Publication Number Publication Date
CN101137017A CN101137017A (en) 2008-03-05
CN100515048C true CN100515048C (en) 2009-07-15

Family

ID=39160815

Family Applications (1)

Application Number Title Priority Date Filing Date
CNB2007101761264A Expired - Fee Related CN100515048C (en) 2007-10-19 2007-10-19 Method and system for fast detecting static stacking letters in online video stream

Country Status (1)

Country Link
CN (1) CN100515048C (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101533474B (en) * 2008-03-12 2014-06-04 三星电子株式会社 Character and image recognition system based on video image and method thereof
CN102054271B (en) * 2009-11-02 2013-11-20 富士通株式会社 Text line detection method and device
CN103699895B (en) * 2013-12-12 2018-02-09 天津大学 A kind of detection of text in video and extracting method
CN104883517B (en) * 2014-02-27 2018-03-02 龙羽 A kind of system and method that three tunnel high-definition video streams are overlapped
CN108345886A (en) * 2017-01-23 2018-07-31 北京搜狗科技发展有限公司 A kind of video flowing text recognition method and device

Also Published As

Publication number Publication date
CN101137017A (en) 2008-03-05

Similar Documents

Publication Publication Date Title
CN102342124B (en) Method and apparatus for providing information related to broadcast programs
CN100409236C (en) Streaming video bookmarks
US6744922B1 (en) Signal processing method and video/voice processing device
CN104754413B (en) Method and apparatus for identifying television signals and recommending information based on image search
JP5390506B2 (en) Video detection system and video detection method
CN107087211B (en) Method and device for detecting lens of host
US20190179960A1 (en) Apparatus and method for recognizing person
Saba et al. Analysis of vision based systems to detect real time goal events in soccer videos
CN102511048B (en) A kind of method and system comprising the video area of text for pre-service
WO2012141655A1 (en) In-video product annotation with web information mining
CN104520875A (en) A method and an apparatus for the extraction of descriptors from video content, preferably for search and retrieval purpose
CN102165464A (en) Method and system for automated annotation of persons in video content
CN103297851A (en) Method and device for quickly counting and automatically examining and verifying target contents in long video
CN101425135B (en) Real time new elevent detecting device and method for vedio stream
CN100515048C (en) Method and system for fast detecting static stacking letters in online video stream
KR101550886B1 (en) Apparatus and method for generating additional information of moving picture contents
CN101510260B (en) Caption staying time determining apparatus and method
CN101577824B (en) Method for extracting compressed domain key frame based on similarity of adjacent I frame DC image
CN103631932A (en) Method for detecting repeated video
CN110121109A (en) Towards the real-time source tracing method of monitoring system digital video, city video monitoring system
CN110378190B (en) Video content detection system and detection method based on topic identification
CN103294696A (en) Audio and video content retrieval method and system
CN102214219A (en) Audio/video content retrieval system and method
CN108804492B (en) Method and device for recommending multimedia objects
JP2013092941A (en) Image retrieval device, method and program

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20090715

Termination date: 20201019