CN106407969A - Robust complex background video text positioning and extracting method - Google Patents
- Publication number
- CN106407969A CN106407969A CN201610778073.2A CN201610778073A CN106407969A CN 106407969 A CN106407969 A CN 106407969A CN 201610778073 A CN201610778073 A CN 201610778073A CN 106407969 A CN106407969 A CN 106407969A
- Authority
- CN
- China
- Prior art keywords
- text
- angle point
- area
- positioning
- region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
- G06V10/225—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition based on a marking or identifier characterising the area
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/74—Image or video pattern matching; Proximity measures in feature spaces
- G06V10/75—Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
- G06V10/752—Contour matching
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Computing Systems (AREA)
- Artificial Intelligence (AREA)
- Health & Medical Sciences (AREA)
- Databases & Information Systems (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Character Input (AREA)
Abstract
The invention discloses a robust method for locating and extracting text in video with complex backgrounds, mainly addressing the insufficient robustness of existing video-text locating and extraction methods under such backgrounds. In the text-locating phase, corner points are chosen as the basic feature of characters, and text lines are located by combining coarse and fine positioning. Coarse positioning obtains candidate text regions using four attributes of a binary text-region distribution map: Area, Saturation, Ratio and Position. Fine positioning then segments the candidate regions into text lines by horizontal projection of the corners and corner-density fusion, locating the text lines accurately and eliminating pseudo text lines. The text-extraction phase combines a polarity judgement based on binary images with an improved local Otsu method, effectively solving the problems of optimal-threshold selection and loss of stroke detail under complex backgrounds. The method achieves a high recall rate when locating and extracting video text across various kinds of programmes.
Description
Technical field
The invention belongs to field of video image processing is and in particular to a kind of healthy and strong complex background Video Text Location and taking out
Take method.
Background technology
With the fast development of modern science and technology, a lot of information in life are all to be transmitted by multimedia form.Wherein,
Word in video is one of most useful information type, and these texts provide much valuable information, and such as program is situated between
Continue, scene location, especially bulletin, the title of speaker, contest scores, date and time, Real Estate Trend, media event and regard
Frequency content etc..Text identification has had a lot of real world applications, such as visual classification, document analysis, the video inspection based on video content
Rope, help blind person, automatic marking, Car license recognition etc..So extracting to the text message of video, to the deep layer understanding video
Semantic information is significant.
Many video text locating and extraction algorithms already exist at home and abroad. They can be broadly classified into connected-component-based, texture-based, edge-based and learning-based methods. Connected-component methods locate text quickly but are easily disturbed by changes in image contrast; texture-based and edge-based methods locate text more stably but suffer from high time complexity; and the quality of learning-based methods depends entirely on the training samples.
Summary of the invention
Aimed at the prior art's lack of robustness in locating video text against complex backgrounds, the present invention proposes a robust method for locating and extracting complex-background text in video.
The purpose of the present invention is achieved through the following technical solution: a robust complex-background video text locating and extraction method, comprising the following steps:
Preprocessing: input a video frame, convert it to a grayscale image and perform corner detection on it to obtain a binary corner distribution map;
Coarse text positioning: first merge regions of the binary corner distribution map with a sliding window to form a binary text-region distribution map, then filter out the corners in non-text regions using the four attributes of the map, achieving coarse positioning of the text regions;
Fine text positioning: locate the text lines precisely by horizontal projection of the corners combined with corner-density fusion;
Text extraction: perform a binary-image-based polarity judgement on each located text-line image, compute a threshold in each block with an improved local Otsu algorithm, adjust it to the optimal threshold according to the polarity result, and finally binarize the text line.
Further, the binary text-region distribution map is obtained by region fusion: an n*n rectangle (5 ≤ n ≤ 15) is centred on each corner and the whole rectangle is set to the colour of the corner; after all corners have been traversed, the binary text-region distribution map is obtained.
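The region-fusion step above can be sketched as follows (an illustrative NumPy sketch; the choice n = 9 is one value from the stated 5 ≤ n ≤ 15 range, not a value fixed by the patent):

```python
import numpy as np

def corner_region_map(corner_map: np.ndarray, n: int = 9) -> np.ndarray:
    """Merge a binary corner map (255 at corners, 0 elsewhere) into a binary
    text-region map by painting an n*n square centred on each corner."""
    h, w = corner_map.shape
    region = np.zeros_like(corner_map)
    half = n // 2  # assumes odd n so the square is centred on the corner
    ys, xs = np.nonzero(corner_map)
    for y, x in zip(ys, xs):
        region[max(0, y - half):min(h, y + half + 1),
               max(0, x - half):min(w, x + half + 1)] = 255
    return region
```

Squares painted around nearby corners overlap, so dense clusters of corners merge into the connected regions that the later attribute filters operate on.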
Further, the binary text-region distribution map has multiple connected regions.
Further, the attributes of the binary text-region distribution map include: Area, Saturation, Ratio and Position.
Further, filtering out the corners in non-text regions means filtering progressively by the four attribute features Area, Saturation, Ratio and Position, specifically:
First, the Area filter sorts the connected regions of the current binary text-region distribution map by area and removes the relatively small ones. Next, the Saturation filter computes, for each connected region Area, the ratio of its area A(Area) to the area A(Rect) of its bounding rectangle Rect, Saturation = A(Area)/A(Rect), Saturation ∈ (0,1), and removes the regions whose Saturation is small. Then, the Ratio filter computes the height-to-width ratio of each region's bounding rectangle and removes the regions whose Ratio exceeds 1:2.5. Finally, the Position filter computes the position of each connected region and removes the regions lying in the upper 2/3 of the video frame.
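The four-attribute filter can be sketched as follows (a NumPy/SciPy sketch; the concrete thresholds min_area and min_saturation, the reading of "Ratio exceeds 1:2.5" as height/width > 0.4, and the reading of the Position rule as "remove regions whose bounding box starts above the 2/3 line" are assumptions, since the patent does not fix these values):

```python
import numpy as np
from scipy import ndimage

def filter_text_regions(region_map, min_area=80, min_saturation=0.4,
                        max_hw_ratio=1 / 2.5):
    """Keep connected regions that pass the Area, Saturation, Ratio and
    Position filters; all numeric thresholds are illustrative."""
    frame_h = region_map.shape[0]
    labels, _ = ndimage.label(region_map > 0)
    keep = np.zeros_like(region_map)
    for lab, slc in enumerate(ndimage.find_objects(labels), start=1):
        if slc is None:
            continue
        mask = labels[slc] == lab
        area = int(mask.sum())
        h = slc[0].stop - slc[0].start
        w = slc[1].stop - slc[1].start
        if area < min_area:                 # Area: drop small regions
            continue
        if area / float(h * w) < min_saturation:  # Saturation = A(Area)/A(Rect)
            continue
        if h / float(w) > max_hw_ratio:     # Ratio: drop tall/narrow regions
            continue
        if slc[0].start < frame_h * 2 // 3: # Position: keep the lower third
            continue
        keep[slc][mask] = 255
    return keep
```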
Further, the method for described angle point floor projection and angle point density fusion refers to:
First, the angle point rectangular histogram often gone by statistics, is divided into literary composition using histogrammic Wave crest and wave trough by text filed
One's own profession, its trough basis for estimation is the angle point number of continuous q row to be less than 1/4 or the 1/3 of angle point meansigma methodss number be considered as trough, 3≤q
≤6.Then the method utilizing angle point density fusion removes the background area of line of text or pseudo- line of text.
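The trough-based splitting can be sketched as follows (illustrative; q = 4 and the 1/4 factor are single choices inside the ranges stated above):

```python
import numpy as np

def split_into_lines(corner_map, q=4):
    """Split a candidate region into text lines at troughs of the per-row
    corner histogram.  A trough is q consecutive rows whose corner count is
    below 1/4 of the mean row count.  Returns (start, end) row intervals."""
    rows = (corner_map > 0).sum(axis=1)
    thresh = rows.mean() / 4.0
    lines, start, low_run = [], None, 0
    for y, c in enumerate(rows):
        if c > thresh:
            if start is None:
                start = y
            low_run = 0
        elif start is not None:
            low_run += 1
            if low_run >= q:              # a genuine trough ends the line
                lines.append((start, y - low_run + 1))
                start, low_run = None, 0
    if start is not None:
        lines.append((start, len(rows) - low_run))
    return lines
```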
Further, the method for described angle point density fusion refers to filter out the background of line of text remaining or removes pseudo-text
One's own profession, its filtering rule utilizes H*1/2H (H is the height of line of text) horizontal sliding window mouth to carry out level slip, removes angle point close
Less than the region of threshold value C, C is angle point number to degree, and the rectangle frame finally again rectangle frame being smaller than H is fused into new text
OK.
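The sliding-window density filter might look like the following sketch (the value of the threshold C, and the decision to step the window by its own width rather than pixel by pixel, are assumptions not fixed by the patent):

```python
import numpy as np

def density_filter(corner_map, line, c_thresh=3):
    """Slide an H x H/2 window along a located line (given as a (top, bottom)
    row interval) and keep only column spans whose corner count reaches C.
    Returns a boolean mask over the columns of the line."""
    top, bottom = line
    H = bottom - top
    win = max(1, H // 2)                    # window is H tall, H/2 wide
    strip = (corner_map[top:bottom] > 0)
    keep = np.zeros(strip.shape[1], bool)
    for x in range(0, strip.shape[1], win):
        if strip[:, x:x + win].sum() >= c_thresh:
            keep[x:x + win] = True
    return keep
```

Spans marked True that lie closer together than H would then be fused into a single text line, per the rule above.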
Further, the binary-image-based polarity judgement is carried out on the local-Otsu binary image. First the text line is binarized with the local Otsu method; then the pixels on the four borders of the resulting binary image are taken as seed pixels and a four-connected seed fill is performed with fill value p, 0 < p < 255. Finally the proportions of the remaining black and white pixels are computed, and the dominant colour is the polarity of the text.
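The border seed fill and polarity vote can be sketched with a plain BFS (a stdlib/NumPy sketch; resolving a tie to white is an assumption the patent does not address):

```python
import numpy as np
from collections import deque

def text_polarity(binary, fill=128):
    """Four-connected seed fill from every border pixel of a 0/255 binary
    image, filling each seed's own-colour region with `fill`; the colour
    with more surviving pixels is taken as the text polarity."""
    img = binary.copy()
    h, w = img.shape
    dq = deque((y, x) for y in range(h) for x in range(w)
               if y in (0, h - 1) or x in (0, w - 1))
    while dq:
        y, x = dq.popleft()
        v = img[y, x]
        if v == fill:                 # already filled, skip
            continue
        img[y, x] = fill
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w and img[ny, nx] == v:
                dq.append((ny, nx))
    blacks = int((img == 0).sum())
    whites = int((img == 255).sum())
    return 'black' if blacks > whites else 'white'
```

Because strokes rarely touch the line's border while the background always does, the background is flooded away and the surviving colour is the text colour.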
Further, adjusting to the optimal threshold according to the polarity result means computing the threshold T of each block with the local Otsu algorithm without yet binarizing; the current threshold is then changed to the optimal threshold using the text-polarity result, and the modified optimal threshold is finally used to perform the binarization.
Further, changing the current threshold to the optimal threshold specifically means: if the polarity judgement finds the text colour to be black, the new threshold is T1 = T - T*0.1; conversely, for white text, the new threshold is T1 = T + T*0.1.
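The improved local Otsu binarization can be sketched as follows (Otsu's method is re-implemented in plain NumPy; tiling the line into H-wide blocks, with a narrower final block when the width is not a multiple of H, is an assumption):

```python
import numpy as np

def otsu_threshold(block):
    """Plain Otsu: maximize between-class variance over a 0..255 block."""
    hist = np.bincount(block.ravel(), minlength=256).astype(float)
    total = hist.sum()
    cum = np.cumsum(hist)
    cum_mean = np.cumsum(hist * np.arange(256))
    best_t, best_var = 0, -1.0
    for t in range(1, 256):
        w0 = cum[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        m0 = cum_mean[t - 1] / w0
        m1 = (cum_mean[255] - cum_mean[t - 1]) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize_line(gray_line, polarity):
    """Split the line into H x H blocks (H = line height), compute T per
    block, shift it by 10 % according to the text polarity, then threshold."""
    H = gray_line.shape[0]
    out = np.zeros_like(gray_line)
    for x0 in range(0, gray_line.shape[1], H):
        block = gray_line[:, x0:x0 + H]
        t = otsu_threshold(block)
        t1 = t - 0.1 * t if polarity == 'black' else t + 0.1 * t
        out[:, x0:x0 + H] = np.where(block > t1, 255, 0)
    return out
```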
The beneficial effects of the invention are as follows. In the text-locating stage, the invention uses robust corner points as the basic character feature, completes candidate text-region positioning by coarse positioning so as to retain as many text regions as possible, and then uses fine positioning to segment and verify the text lines within the candidate regions. In the text-extraction stage, combining binary-image-based polarity judgement with local Otsu solves the difficult problem of choosing the optimal threshold under a complex background. Extensive experiments show that the locating and extraction algorithm of the invention is robust to complex-background video.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Specific embodiment
The invention is further described below with reference to the accompanying drawing.
As shown in Fig. 1, the robust complex-background video text locating and extraction method provided by the invention comprises the following steps:
Preprocessing: input a video frame, convert it to a grayscale image and perform corner detection on it to obtain a binary corner distribution map;
Coarse text positioning: first merge regions of the binary corner distribution map with a sliding window to form a binary text-region distribution map, then filter out the corners in non-text regions using the four attributes of the map, achieving coarse positioning of the text regions;
Fine text positioning: locate the text lines precisely by horizontal projection of the corners combined with corner-density fusion;
Text extraction: perform a binary-image-based polarity judgement on each located text-line image, compute a threshold in each block with an improved local Otsu algorithm, adjust it to the optimal threshold according to the polarity result, and finally binarize the text line.
Embodiment
The present embodiment comprises the following steps:
1. Input a video frame and preprocess it, e.g. convert it to a grayscale image; perform corner detection on the frame with the Harris algorithm to obtain a binary corner distribution map whose background is black and whose corners are white;
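The Harris preprocessing step can be sketched without OpenCV as follows (a NumPy/SciPy approximation of the Harris detector; k = 0.04, the 3-pixel window and the response threshold are conventional defaults, not values taken from the patent):

```python
import numpy as np
from scipy import ndimage

def harris_corner_map(gray, k=0.04, win=3, thresh_ratio=0.01):
    """Minimal Harris detector returning the white-on-black binary corner
    distribution map used by the rest of the pipeline."""
    gray = gray.astype(float)
    iy, ix = np.gradient(gray)                 # image gradients
    ixx = ndimage.uniform_filter(ix * ix, win)  # windowed second moments
    iyy = ndimage.uniform_filter(iy * iy, win)
    ixy = ndimage.uniform_filter(ix * iy, win)
    det = ixx * iyy - ixy * ixy
    tr = ixx + iyy
    r = det - k * tr * tr                      # Harris response
    out = np.zeros(gray.shape, np.uint8)
    out[r > thresh_ratio * r.max()] = 255      # assumes some positive response
    return out
```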
2. Using a 6*6 rectangle centred on each corner, set each corner's rectangle to the colour of the corner, obtaining the binary text-region distribution map;
3. Locate the text with a coarse-to-fine positioning method;
Coarse positioning: remove pseudo text regions using the four attributes of the binary text-region distribution map to obtain the candidate text regions. The attributes are area, saturation, height-to-width ratio and position, denoted Area, Saturation, Ratio and Position.
Area: find each connected region, sort the regions by size and remove the relatively small ones. A small-area region is unlikely to be the main content of the frame, so it is easily filtered out.
Saturation: the saturation feature of the corners. Compute the bounding rectangle Rect of each connected region and remove the regions whose Saturation is small:
Saturation = A(Area)/A(Rect), Saturation ∈ (0,1);
where A(Area) is the area of the connected region and A(Rect) is the area of its bounding rectangle. Because overlay text in video is horizontal, the Saturation of a text region is close to 1, while the Saturation of a pseudo text region is close to 0.
Ratio: the height-to-width ratio of the bounding rectangle of a connected region. In accordance with the characteristics of Chinese characters, the invention removes connected regions whose Ratio exceeds 1:2.5.
Position: the position of the connected region. Regions lying in the upper 2/3 of the frame are removed, because overlay text is usually located in the lower part of the video frame.
Fine positioning: since each candidate connected region may contain multiple text lines or pseudo text, horizontal projection of the corners is used to divide each candidate region into text lines and locate them accurately. Finally, corner-density fusion further filters the background of the text lines and the pseudo text lines with a filter window of size H*1/2H (H being the height of the text line); rectangles whose spacing is smaller than H are then fused into new text lines, and text-line positioning is complete;
4. Text extraction. The invention extracts the text lines with a method that combines binary-image-based polarity judgement with improved local Otsu binarization.
Local Otsu binarization: divide the text-line image into regions of equal size H*H (H being the height of the text line) and apply local Otsu binarization in each region;
Polarity judgement based on the binary image: take the pixels on the four borders of the above binary image as seed pixels and perform a four-connected seed fill with fill value 128; then compute the proportions of the remaining black and white pixels, and the dominant colour is the polarity of the text;
Improved local Otsu binarization: first process the text-line image as in the local Otsu algorithm, dividing the line into regions of size H*H, and run the Otsu algorithm to compute the threshold T in each block, but do not binarize yet, because this T is not the optimal threshold for text segmentation. The threshold is instead modified according to the polarity result (if the text colour is black, the new threshold is T1 = T - T*0.1; conversely, for white text, T1 = T + T*0.1), and the modified optimal threshold is finally used to perform the binarization.
Claims (10)
1. A robust complex-background video text locating and extraction method, characterized by comprising the following steps:
Preprocessing: input a video frame, convert it to a grayscale image and perform corner detection on it to obtain a binary corner distribution map;
Coarse text positioning: first merge regions of the binary corner distribution map with a sliding window to form a binary text-region distribution map, then filter out the corners in non-text regions using the four attributes of the binary text-region distribution map, achieving coarse positioning of the text regions;
Fine text positioning: locate the text lines precisely by horizontal projection of the corners combined with corner-density fusion;
Text extraction: perform a binary-image-based polarity judgement on each located text-line image, compute the threshold of each block with an improved local Otsu algorithm, adjust it to the optimal threshold according to the polarity result, and finally binarize the text line.
2. The method according to claim 1, characterized in that the binary text-region distribution map is obtained by region fusion: an n*n rectangle, 5 ≤ n ≤ 15, is centred on each corner and set to the colour of the corner; after all corners have been traversed, the binary text-region distribution map is obtained.
3. The method according to claim 2, characterized in that the binary text-region distribution map has multiple connected regions.
4. The method according to claim 1, characterized in that the attributes of the binary text-region distribution map comprise: Area, Saturation, Ratio and Position.
5. The method according to claim 1, characterized in that filtering out the corners in non-text regions means filtering progressively by the four attribute features Area, Saturation, Ratio and Position, specifically: first, the Area filter sorts the connected regions of the current binary text-region distribution map by area and removes the relatively small ones; then, the Saturation filter computes, for each connected region Area, the ratio of its area A(Area) to the area A(Rect) of its bounding rectangle Rect, Saturation = A(Area)/A(Rect), Saturation ∈ (0,1), and removes the regions whose Saturation is small; then, the Ratio filter computes the height-to-width ratio of each region's bounding rectangle and removes the regions whose Ratio exceeds 1:2.5; finally, the Position filter computes the position of each connected region and removes the regions lying in the upper 2/3 of the video frame.
6. The method according to claim 1, characterized in that the method of corner horizontal projection and corner-density fusion means: first, the corner histogram of each row is computed and the text region is divided into text lines at the peaks and troughs of the histogram, a trough being declared when the corner count of q consecutive rows, 3 ≤ q ≤ 6, falls below 1/4 to 1/3 of the mean corner count per row; then corner-density fusion removes the background areas of the text lines and the pseudo text lines.
7. The method according to claim 6, characterized in that corner-density fusion filters out the residual background of a text line or removes pseudo text lines; its filtering rule slides a horizontal window of size H*1/2H, H being the height of the text line, along the line and removes the regions whose corner density is below a threshold C, C being a corner count; finally, rectangles whose spacing is smaller than H are fused into a new text line.
8. The method according to claim 1, characterized in that the binary-image-based polarity judgement is carried out on the local-Otsu binary image: first, the text line is binarized with the local Otsu method; then the pixels on the four borders of the binary image are taken as seed pixels and a four-connected seed fill is performed with fill value p, 0 < p < 255; finally the proportions of the remaining black and white pixels are computed, and the dominant colour is the polarity of the text.
9. The method according to claim 1, characterized in that adjusting to the optimal threshold according to the polarity result means computing the threshold T of each block with the local Otsu algorithm without binarizing; the current threshold is then changed to the optimal threshold using the text-polarity result, and the modified optimal threshold is finally used to perform the binarization.
10. The method according to claim 9, characterized in that changing the current threshold to the optimal threshold specifically means: if the polarity judgement finds the text colour to be black, the new threshold is T1 = T - T*0.1; conversely, for white text, the new threshold is T1 = T + T*0.1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610778073.2A CN106407969A (en) | 2016-08-30 | 2016-08-30 | Robust complex background video text positioning and extracting method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610778073.2A CN106407969A (en) | 2016-08-30 | 2016-08-30 | Robust complex background video text positioning and extracting method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106407969A true CN106407969A (en) | 2017-02-15 |
Family
ID=58003924
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610778073.2A Pending CN106407969A (en) | 2016-08-30 | 2016-08-30 | Robust complex background video text positioning and extracting method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106407969A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107302718A (en) * | 2017-08-17 | 2017-10-27 | 河南科技大学 | A kind of video caption area positioning method based on Corner Detection |
CN107688788A (en) * | 2017-08-31 | 2018-02-13 | 平安科技(深圳)有限公司 | Document charts abstracting method, electronic equipment and computer-readable recording medium |
CN109993165A (en) * | 2019-03-28 | 2019-07-09 | 永康市几米电子科技有限公司 | The identification of tablet plate medicine name and tablet plate information acquisition method, device and system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101593277A (en) * | 2008-05-30 | 2009-12-02 | 电子科技大学 | A kind of complicated color image Chinese version zone automatic positioning method and device |
CN103268481A (en) * | 2013-05-29 | 2013-08-28 | 焦点科技股份有限公司 | Method for extracting text in complex background image |
CN104182750A (en) * | 2014-07-14 | 2014-12-03 | 上海交通大学 | Extremum connected domain based Chinese character detection method in natural scene image |
- 2016-08-30: CN CN201610778073.2A patent application filed; published as CN106407969A, status Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101593277A (en) * | 2008-05-30 | 2009-12-02 | 电子科技大学 | A kind of complicated color image Chinese version zone automatic positioning method and device |
CN103268481A (en) * | 2013-05-29 | 2013-08-28 | 焦点科技股份有限公司 | Method for extracting text in complex background image |
CN104182750A (en) * | 2014-07-14 | 2014-12-03 | 上海交通大学 | Extremum connected domain based Chinese character detection method in natural scene image |
Non-Patent Citations (4)
Title |
---|
Zhang Yang: "Research on methods for extracting caption text from television video", China Master's Theses Full-text Database, Information Science and Technology series *
Wang Gang: "Automatic extraction and recognition of news video captions", China Master's Theses Full-text Database, Information Science and Technology series *
Hu Qian et al.: "Text localization in natural scenes", Computer Knowledge and Technology *
Huang Xiaodong: "Research on video text acquisition based on feature fusion", China Doctoral Dissertations Full-text Database, Information Science and Technology series *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107302718A (en) * | 2017-08-17 | 2017-10-27 | 河南科技大学 | A kind of video caption area positioning method based on Corner Detection |
CN107302718B (en) * | 2017-08-17 | 2019-12-10 | 河南科技大学 | video subtitle area positioning method based on angular point detection |
CN107688788A (en) * | 2017-08-31 | 2018-02-13 | 平安科技(深圳)有限公司 | Document charts abstracting method, electronic equipment and computer-readable recording medium |
CN107688788B (en) * | 2017-08-31 | 2021-01-08 | 平安科技(深圳)有限公司 | Document chart extraction method, electronic device and computer readable storage medium |
CN109993165A (en) * | 2019-03-28 | 2019-07-09 | 永康市几米电子科技有限公司 | The identification of tablet plate medicine name and tablet plate information acquisition method, device and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104751142B (en) | A kind of natural scene Method for text detection based on stroke feature | |
CN107679502A (en) | A kind of Population size estimation method based on the segmentation of deep learning image, semantic | |
CN101183425B (en) | Guangdong and Hong Kong license plate locating method | |
CN105205488B (en) | Word area detection method based on Harris angle points and stroke width | |
CN109255350B (en) | New energy license plate detection method based on video monitoring | |
CN102542268B (en) | Method for detecting and positioning text area in video | |
Yang et al. | Lecture video indexing and analysis using video ocr technology | |
CN104244073B (en) | Automatic detecting and recognizing method of scroll captions in videos | |
CN1312625C (en) | Character extracting method from complecate background color image based on run-length adjacent map | |
CN104077577A (en) | Trademark detection method based on convolutional neural network | |
CN101515325A (en) | Character extracting method in digital video based on character segmentation and color cluster | |
CN101593277A (en) | A kind of complicated color image Chinese version zone automatic positioning method and device | |
CN104598907B (en) | Lteral data extracting method in a kind of image based on stroke width figure | |
CN102208023A (en) | Method for recognizing and designing video captions based on edge information and distribution entropy | |
CN104050684B (en) | A kind of video frequency motion target sorting technique based on on-line training and system | |
CN103336961A (en) | Interactive natural scene text detection method | |
CN107977645B (en) | Method and device for generating video news poster graph | |
CN104766076A (en) | Detection method and device for video images and texts | |
CN104299009A (en) | Plate number character recognition method based on multi-feature fusion | |
CN101106716A (en) | A shed image division processing method | |
KR20120019425A (en) | Image processing device, method, and program | |
CN106407969A (en) | Robust complex background video text positioning and extracting method | |
CN113191358B (en) | Metal part surface text detection method and system | |
CN102073872A (en) | Image-based method for identifying shape of parasite egg | |
CN106295627A (en) | For identifying the method and device of word psoriasis picture |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20170215 |
RJ01 | Rejection of invention patent application after publication |