CN103440487B - Natural scene text localization method based on local hue difference - Google Patents

Natural scene text localization method based on local hue difference

Info

Publication number
CN103440487B
CN103440487B (application CN201310377443.8A)
Authority
CN
China
Prior art keywords
hue
box
candidate frame
difference
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310377443.8A
Other languages
Chinese (zh)
Other versions
CN103440487A (en)
Inventor
李宏亮
黄自力
姚源
许静
孟凡满
吴庆波
黄超
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201310377443.8A priority Critical patent/CN103440487B/en
Publication of CN103440487A publication Critical patent/CN103440487A/en
Application granted granted Critical
Publication of CN103440487B publication Critical patent/CN103440487B/en


Abstract

The present invention provides a natural scene text localization method based on local hue difference. The method exploits not only the textural features of text but also the fact that the hue of a text region differs from that of its surroundings, so that text in a scene is located effectively. By taking the average hue difference near edge pixels and comparing it with a threshold to judge whether a region contains text, local colour information about the text in the region is brought into the decision, exploiting the colour consistency of text and its difference from the background. The threshold is obtained by an adaptive method, as the mean of the dominant-hue differences between all candidate boxes and the regions around them, so that the colour information of the whole image supports the local colour decision; the resulting threshold characterizes the hue difference between the text regions and the background of the scene image. The present invention locates text in natural scenes quickly and accurately.

Description

Natural scene text localization method based on local hue difference
Technical field
The invention belongs to the fields of image processing and computer vision, and in particular relates to a natural scene text localization method.
Background technology
Automatically detecting, segmenting, and recognizing text in scene pictures would greatly help people acquire information, and is also of great significance for the automatic understanding and retrieval of the semantic information of images. In an on-board navigation system, automatically locating and recognizing the road signs, shop names, and traffic signs ahead would provide safety guarantees for travel, reminding the driver to slow down or to correct the route. With the rapid development of multimedia and computing, pictures have become an important medium of communication thanks to their vivid, concrete imagery. Keyword-based retrieval can no longer satisfy demand, and content-based image retrieval has become the trend of development; within such retrieval, the localization and recognition of text is a key technology that attracts the attention of more and more researchers. Text localization can also assist reading for the blind.
Existing scene-text localization methods can be broadly divided into two kinds: 1) texture-based methods; 2) region-based methods. Texture-based methods use textural features to distinguish text from non-text, clustering pixels or blocks judged to be text together; they are robust, but the resulting algorithms have high complexity. Region-based methods distinguish text from non-text by requiring the pixels of a region to satisfy some similarity, for example using colour consistency within a region as the feature to separate text from background; such methods are simple, but a single feature rarely suffices for all cases, so their robustness is limited and they perform poorly on scene pictures with complex backgrounds.
Summary of the invention
The technical problem to be solved by the present invention is to provide a text localization method that can effectively locate text in natural scenes while being fast and practical.
The technical scheme adopted by the present invention to solve the above technical problem is a natural scene text localization method based on local hue difference, comprising the following steps:
1) Scan the scene picture with a classifier to obtain the candidate boxes corresponding to candidate text regions;
2) Convert the scene picture to the HSI colour model, extract the hue (H) component, and compute the mean dominant-hue difference hue_aver between all candidate boxes box(i) and their adjacent regions:
hue_aver = (1/N) · Σ_{i=1..N} |box_domihue(i) - box_neighbour_domihue(i)|;
where box_domihue(i) is the dominant hue of the i-th candidate box box(i), box_neighbour_domihue(i) is the dominant hue of the region adjacent to candidate box box(i), and N is the total number of candidate boxes in the current scene picture;
3) Take the edge pixels of the scene picture and, within each candidate box, compute the average hue difference local_hue(i) between all edge pixels and their neighbouring pixels;
4) Compare the average hue difference local_hue(i) of each candidate box with the dominant-hue difference hue_aver: when local_hue(i) of a candidate box is greater than hue_aver, treat the current candidate box as a region containing text; otherwise discard it. When all candidate boxes have been judged, the final scene text localization is complete.
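The decision logic of steps 2) and 4) can be sketched as follows (a minimal illustration; the function and variable names are ours, not the patent's):

```python
import numpy as np

def hue_aver_threshold(box_hues, neighbour_hues):
    """Step 2): mean absolute dominant-hue difference between each
    candidate box and its adjacent region -- the adaptive threshold."""
    box_hues = np.asarray(box_hues, dtype=float)
    neighbour_hues = np.asarray(neighbour_hues, dtype=float)
    return float(np.mean(np.abs(box_hues - neighbour_hues)))

def filter_boxes(boxes, local_hues, threshold):
    """Step 4): keep a candidate box only if its local edge-pixel
    hue difference exceeds the adaptive threshold."""
    return [box for box, lh in zip(boxes, local_hues) if lh > threshold]
```

For example, with dominant hues [10, 20] and neighbourhood dominant hues [30, 25], the threshold is (20 + 5) / 2 = 12.5; a box whose local hue difference is 15 is kept, one with 5 is discarded.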
The present invention exploits not only the textural features of text but also the fact that the hue of a text region differs from that of its surroundings, and thus locates text in the scene effectively. By taking the average hue difference near edge pixels and comparing it with a threshold to judge whether a region contains text, local colour information about the text in the region is brought into the decision, exploiting the colour consistency of text and its difference from the background. The threshold is obtained by an adaptive method, as the mean of the dominant-hue differences between all candidate boxes and the regions above, below, left, and right of them, so that the colour information of the whole image supports the local colour decision; the resulting threshold characterizes the hue difference between the text regions and the background of the scene image.
The beneficial effect of the invention is that text in natural scenes can be located quickly and accurately.
Brief description of the drawings
Fig. 1: schematic flow of candidate-box processing in the embodiment;
Fig. 2: the input natural scene picture;
Fig. 3: scene text localization result after feature extraction and classification;
Fig. 4: scene text localization result after local hue-difference processing.
Detailed description of the invention
The text localization of the embodiment comprises the following specific steps:
Step 1: training of the text features and design of the classifier.
1) Build a sample library containing 3000 positive samples (containing text), labelled 1, and 7000 negative samples (containing no text), labelled -1; normalize all samples to 48*96 pixels.
2) For every sample in the library, extract the histogram of oriented gradients (HOG) and the rotation-invariant uniform-pattern LBP feature, and concatenate them into a single feature vector.
3) Feed the feature vectors of all 10000 positive and negative samples into a classifier for training, obtaining a trained classifier.
The training method of the text classifier is not the focus of the present invention; those skilled in the art can design the classifier according to existing disclosures combined with actual needs.
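A minimal numpy-only sketch of this feature pipeline: a single whole-patch gradient-orientation histogram stands in for blockwise HOG, and a basic 8-neighbour LBP histogram stands in for the rotation-invariant uniform-pattern LBP. In practice one would use full implementations (e.g. scikit-image's `hog` and `local_binary_pattern`) and train an SVM or similar classifier; all names here are illustrative, not from the patent:

```python
import numpy as np

def grad_orientation_hist(gray, bins=9):
    """Histogram of unsigned gradient orientations, magnitude-weighted."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    s = hist.sum()
    return hist / s if s > 0 else hist

def lbp_hist(gray, bins=256):
    """Histogram of basic 8-neighbour LBP codes over interior pixels."""
    g = gray.astype(int)
    c = g[1:-1, 1:-1]
    codes = np.zeros_like(c)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for k, (dy, dx) in enumerate(shifts):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes |= ((nb >= c) << k)                  # set bit k where nb >= centre
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / codes.size

def feature_vector(gray):
    """Concatenated gradient-orientation + LBP feature vector."""
    return np.concatenate([grad_orientation_hist(gray), lbp_hist(gray)])
```

Each normalized 48*96 sample patch yields one such vector; 10000 of them, labelled 1 or -1, form the training set for the classifier.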
Step 2: obtain the scene picture, as shown in Fig. 2, and scan the natural scene picture with the trained classifier to obtain candidate text regions.
1) Convert the scene picture to a grey-scale image ImGray, scale the scene image to several sizes, and scan the scene picture with a 48*96 sliding window;
2) Judge each window obtained by the sliding scan with the classifier obtained in Step 1: if the result is 1, keep the window, otherwise discard it; expand each kept window by the scaling ratio of the scene image to obtain a candidate box;
3) Examine all candidate boxes: if the ratio of the intersection area of two windows to their combined (union) area exceeds 0.5, merge them into one candidate box; this yields the final candidate boxes box, as shown in Fig. 3.
This embodiment thus provides a preferred process for obtaining candidate boxes, merging overlapping candidates to simplify subsequent processing.
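The sliding-window scan and the overlap-based merging above can be sketched as follows (the greedy merge strategy and all names are our illustration; the patent only specifies the intersection-over-union > 0.5 criterion):

```python
def sliding_windows(shape, win=(96, 48), step=16):
    """Yield (x1, y1, x2, y2) windows of size win (height, width)."""
    H, W = shape
    for y in range(0, H - win[0] + 1, step):
        for x in range(0, W - win[1] + 1, step):
            yield (x, y, x + win[1], y + win[0])

def iou(a, b):
    """Intersection area over union area for (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def merge_boxes(boxes, thresh=0.5):
    """Greedily fuse any pair of boxes whose IoU exceeds thresh."""
    boxes = [tuple(b) for b in boxes]
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if iou(boxes[i], boxes[j]) > thresh:
                    a, b = boxes[i], boxes[j]
                    boxes[j] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[i]
                    merged = True
                    break
            if merged:
                break
    return boxes
```

Two heavily overlapping windows such as (0,0,10,10) and (1,1,10,10) have IoU 0.81 and are fused into their bounding box, while a distant window survives unchanged.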
Step 3: judge each candidate box by local hue difference to obtain the final text localization windows, as shown in Fig. 1.
1) Convert the RGB model of the scene image to the HSI (Hue-Saturation-Intensity) model and extract the hue (H) component of the scene image. In the H component, extract one frame of surrounding region on each side of every candidate box box(i) obtained in Step 2; if a side of candidate box box(i) lies on the image border, no frame is taken in that direction. Combine the frames above, below, left, and right into a single region, which serves as the adjacent region box_neighbour(i) of the candidate box.
2) For each box_neighbour(i) and box(i), compute the histogram of the hue (H) component. This embodiment takes the hue with the largest histogram value as the dominant hue of the box; those skilled in the art may determine the dominant hue in other ways according to actual needs. Subtract the dominant hue of each box(i) from that of box_neighbour(i) and take the absolute value as the dominant-hue difference of box(i); the mean dominant-hue difference hue_aver over all candidate boxes is then used as the threshold of the next step:
hue_aver = (1/N) · Σ_{i=1..N} |box_domihue(i) - box_neighbour_domihue(i)|
where i denotes the i-th candidate box, N the number of candidate boxes, box_domihue(i) the dominant hue of the i-th box, and box_neighbour_domihue(i) the dominant hue of the neighbourhood of the i-th box.
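A sketch of the hue extraction and dominant-hue computation, assuming the standard arccos formula for HSI hue; the histogram bin width is our choice, since the patent does not specify the number of bins:

```python
import numpy as np

def hsi_hue(rgb):
    """Hue channel of the HSI model, in degrees [0, 360)."""
    r, g, b = (rgb[..., k].astype(float) for k in range(3))
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-12
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    # achromatic pixels (r == g == b) get an arbitrary but defined value
    return np.where(b > g, 360.0 - theta, theta)

def dominant_hue(hue, bins=36):
    """Centre of the most populated hue-histogram bin."""
    hist, edges = np.histogram(hue, bins=bins, range=(0, 360))
    k = int(np.argmax(hist))
    return 0.5 * (edges[k] + edges[k + 1])
```

The threshold hue_aver then follows as the mean of |dominant_hue(box) - dominant_hue(neighbourhood)| over all candidate boxes.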
3) Apply the Canny operator to the grey-scale image ImGray of the scene to obtain the edge map ImCanny. Within each candidate box obtained in Step 2, find the pixels at which ImCanny is non-zero, i.e. the edge pixels. For each edge pixel, compute the difference of the hue component H between the pixels above and below it, and between the pixels to its left and right, and take the mean of the vertical and horizontal hue differences. Average this over all non-zero (edge) pixels of ImCanny within candidate box box(i) to obtain the average hue difference local_hue(i):
local_hue(i) = (1/M) · Σ_x [ |pixel_up(x) - pixel_down(x)| + |pixel_left(x) - pixel_right(x)| ] / 2
Compare local_hue(i) with the threshold hue_aver: if it is greater than the threshold, keep the box, otherwise discard it. This yields the final scene text localization, as shown in Fig. 4.
where x denotes the position of an edge pixel in the candidate box; pixel_up(x), pixel_down(x), pixel_left(x), and pixel_right(x) denote the hue values of the pixels above, below, to the left of, and to the right of pixel x respectively; and M denotes the total number of edge pixels in candidate box box(i).
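A sketch of this local measure, assuming (per the prose description, since the equation image for local_hue(i) is not reproduced here) that each edge pixel contributes the mean of its vertical (|up - down|) and horizontal (|left - right|) neighbour hue differences. In practice edge_mask would be the non-zero pixels of the Canny edge map ImCanny restricted to the candidate box:

```python
import numpy as np

def local_hue_diff(hue, edge_mask):
    """Average, over all edge pixels, of the mean of the vertical
    (|up - down|) and horizontal (|left - right|) hue differences."""
    ys, xs = np.nonzero(edge_mask[1:-1, 1:-1])   # interior edge pixels only
    ys, xs = ys + 1, xs + 1                      # shift back to full coords
    if ys.size == 0:
        return 0.0
    vert = np.abs(hue[ys - 1, xs] - hue[ys + 1, xs])
    horiz = np.abs(hue[ys, xs - 1] - hue[ys, xs + 1])
    return float(((vert + horiz) / 2.0).mean())
```

A box is then kept when local_hue_diff(...) exceeds the adaptive threshold hue_aver.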

Claims (5)

1. A natural scene text localization method based on local hue difference, characterised in that it comprises the following steps:
1) scanning the scene picture with a classifier to obtain the candidate boxes corresponding to candidate text regions;
2) converting the scene picture to the HSI colour model, extracting the hue (H) component, and computing the mean dominant-hue difference hue_aver between all candidate boxes box(i) and their adjacent regions:
hue_aver = (1/N) · Σ_{i=1..N} |box_domihue(i) - box_neighbour_domihue(i)|;
wherein box_domihue(i) is the dominant hue of the i-th candidate box box(i), box_neighbour_domihue(i) is the dominant hue of the region adjacent to candidate box box(i), and N is the total number of candidate boxes in the current scene picture;
3) taking the edge pixels of the scene picture and computing, within each candidate box, the average hue difference local_hue(i) between all edge pixels and their neighbouring pixels;
4) comparing the average hue difference local_hue(i) of each candidate box with the dominant-hue difference hue_aver: when local_hue(i) of a candidate box is greater than hue_aver, treating the current candidate box as a region containing text, and otherwise discarding it; when all candidate boxes have been judged, the final scene text localization is complete.
2. The natural scene text localization method based on local hue difference as claimed in claim 1, characterised in that the adjacent region of a candidate box is extracted as follows:
when no side of candidate box box(i) lies on the border of the scene picture, one frame is extracted on each side of box(i); when a side of box(i) lies on the border of the scene picture, no frame is taken in that direction; after the adjacent frames of box(i) have been extracted, these frames are combined into one region as the adjacent region box_neighbour(i) of the candidate box.
3. The natural scene text localization method based on local hue difference as claimed in claim 1, characterised in that the average hue difference local_hue(i) between all edge pixels and their neighbouring pixels within each candidate box is computed as:
local_hue(i) = (1/M) · Σ_x [ |pixel_up(x) - pixel_down(x)| + |pixel_left(x) - pixel_right(x)| ] / 2
where x denotes the position of an edge pixel in the i-th candidate box; pixel_up(x), pixel_down(x), pixel_left(x), and pixel_right(x) denote the hue values of the pixels above, below, to the left of, and to the right of pixel x respectively; and M denotes the total number of edge pixels in the i-th candidate box.
4. The natural scene text localization method based on local hue difference as claimed in claim 1, characterised in that in step 3) the edge pixels are obtained from the edge map computed with the Canny operator.
5. The natural scene text localization method based on local hue difference as claimed in claim 1, characterised in that the dominant hue is the hue value with the maximum count in the histogram of the H component.
CN201310377443.8A 2013-08-27 2013-08-27 Natural scene text localization method based on local hue difference Expired - Fee Related CN103440487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310377443.8A CN103440487B (en) 2013-08-27 2013-08-27 Natural scene text localization method based on local hue difference


Publications (2)

Publication Number Publication Date
CN103440487A CN103440487A (en) 2013-12-11
CN103440487B true CN103440487B (en) 2016-11-02

Family

ID=49694180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310377443.8A Expired - Fee Related CN103440487B (en) 2013-08-27 2013-08-27 Natural scene text localization method based on local hue difference

Country Status (1)

Country Link
CN (1) CN103440487B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017059576A1 (en) * 2015-10-09 2017-04-13 Beijing Sensetime Technology Development Co., Ltd Apparatus and method for pedestrian detection
CN108564084A * 2018-05-08 2018-09-21 Beijing SenseTime Technology Development Co., Ltd. Character detection method, device, terminal and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163284A (en) * 2011-04-11 2011-08-24 西安电子科技大学 Chinese environment-oriented complex scene text positioning method
US8331684B2 (en) * 2010-03-12 2012-12-11 Sony Corporation Color and intensity based meaningful object of interest detection

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7418141B2 (en) * 2003-03-31 2008-08-26 American Megatrends, Inc. Method, apparatus, and computer-readable medium for identifying character coordinates


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fast and robust text detection in images and video frames; Qixiang Ye et al.; Image and Vision Computing; 31 Dec. 2005; pp. 565-576 *
Natural scene text localization based on colour dispersion analysis (基于颜色散布分析的自然场景文本定位); 周慧灿 et al.; Computer Engineering (计算机工程); Apr. 2010; vol. 36, no. 8; pp. 197-200 *

Also Published As

Publication number Publication date
CN103440487A (en) 2013-12-11

Similar Documents

Publication Publication Date Title
CN104050471B (en) Natural scene character detection method and system
CN107346420B (en) Character detection and positioning method in natural scene based on deep learning
CN102968637B (en) Complicated background image and character division method
CN105373794B (en) A kind of licence plate recognition method
CN105046196B (en) Front truck information of vehicles structuring output method based on concatenated convolutional neutral net
CN103198315B (en) Based on the Character Segmentation of License Plate of character outline and template matches
EP2575077A2 (en) Road sign detecting method and road sign detecting apparatus
US9092696B2 (en) Image sign classifier
CN108805018A (en) Road signs detection recognition method, electronic equipment, storage medium and system
CN103824081B (en) Method for detecting rapid robustness traffic signs on outdoor bad illumination condition
CN104408449B (en) Intelligent mobile terminal scene literal processing method
CN105160691A (en) Color histogram based vehicle body color identification method
CN104751142A (en) Natural scene text detection algorithm based on stroke features
CN103336961B (en) A kind of interactively natural scene Method for text detection
CN104850850A (en) Binocular stereoscopic vision image feature extraction method combining shape and color
CN108765443A (en) A kind of mark enhancing processing method of adaptive color Threshold segmentation
CN105005766A (en) Vehicle body color identification method
Duan et al. A WBC segmentation methord based on HSI color space
CN103106409A (en) Composite character extraction method aiming at head shoulder detection
CN106529432A (en) Hand area segmentation method deeply integrating significance detection and prior knowledge
CN109753962B (en) Method for processing text region in natural scene image based on hybrid network
CN108664969A (en) Landmark identification method based on condition random field
CN104598907A (en) Stroke width figure based method for extracting Chinese character data from image
CN109145746B (en) Signal lamp detection method based on image processing
CN106874848A (en) A kind of pedestrian detection method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20161102

Termination date: 20190827