CN103440487B - Natural scene text localization method based on local hue difference - Google Patents

Natural scene text localization method based on local hue difference

Info

Publication number
CN103440487B
CN103440487B (application CN201310377443.8A)
Authority
CN
China
Prior art keywords
hue
box
candidate frame
difference
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310377443.8A
Other languages
Chinese (zh)
Other versions
CN103440487A (en)
Inventor
李宏亮
黄自力
姚源
许静
孟凡满
吴庆波
黄超
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201310377443.8A priority Critical patent/CN103440487B/en
Publication of CN103440487A publication Critical patent/CN103440487A/en
Application granted granted Critical
Publication of CN103440487B publication Critical patent/CN103440487B/en


Abstract

The present invention provides a natural scene text localization method based on local hue difference. The method exploits not only the textural features of text but also the fact that the hue of a text region differs from that of its surroundings, so that text in a scene is located effectively. By taking the average hue difference near edge pixels and comparing it with a threshold to judge whether a region contains text, local colour information about the text in the region is brought into the decision, exploiting the colour consistency of text and its difference from the background. The threshold is obtained by an adaptive method, as the mean of the dominant-hue differences between all candidate boxes and the regions around them, so that the colour information of the whole image supports the local colour decision; the resulting threshold characterizes the hue difference between the text regions and the background of the scene image. The present invention locates text in natural scenes quickly and accurately.

Description

Natural scene text localization method based on local hue difference
Technical field
The invention belongs to the fields of image processing and computer vision, and in particular relates to a natural scene text localization method.
Background technology
Automatically detecting, segmenting, and recognizing text in scene pictures would greatly help people acquire information, and is also of great significance for the automatic understanding and retrieval of the semantic information of images. In an on-board navigation system, automatically locating and recognizing the road signs, shop names, and traffic signs ahead would provide safety guarantees for travel, reminding the driver to slow down or to correct the route. With the rapid development of multimedia and computing, pictures have become an important medium of communication thanks to their vivid, concrete imagery. Keyword-based retrieval can no longer satisfy demand, and content-based image retrieval has become the trend of development; within such retrieval, the localization and recognition of text is a key technology that attracts the attention of more and more researchers. Text localization can also assist reading for the blind.
Existing scene-text localization methods can be broadly divided into two kinds: 1) texture-based methods; 2) region-based methods. Texture-based methods use textural features to distinguish text from non-text, clustering pixels or blocks judged to be text together; they are robust, but the resulting algorithms have high complexity. Region-based methods distinguish text from non-text by requiring the pixels of a region to satisfy some similarity, for example using colour consistency within a region as the feature to separate text from background; such methods are simple, but a single feature rarely suffices for all cases, so their robustness is limited and they perform poorly on scene pictures with complex backgrounds.
Summary of the invention
The technical problem to be solved by the present invention is to provide a text localization method that can effectively locate text in natural scenes while being fast and practical.
The technical scheme adopted by the present invention to solve the above technical problem is a natural scene text localization method based on local hue difference, comprising the following steps:
1) Scan the scene picture with a classifier to obtain the candidate boxes corresponding to candidate text regions;
2) Convert the scene picture to the HSI colour model, extract the hue (H) component, and compute the mean dominant-hue difference hue_aver between all candidate boxes box(i) and their adjacent regions:
hue_aver = (1/N) · Σ_{i=1..N} |box_domihue(i) - box_neighbour_domihue(i)|;
where box_domihue(i) is the dominant hue of the i-th candidate box box(i), box_neighbour_domihue(i) is the dominant hue of the region adjacent to candidate box box(i), and N is the total number of candidate boxes in the current scene picture;
3) Take the edge pixels of the scene picture and, within each candidate box, compute the average hue difference local_hue(i) between all edge pixels and their neighbouring pixels;
4) Compare the average hue difference local_hue(i) of each candidate box with the dominant-hue difference hue_aver: when local_hue(i) of a candidate box is greater than hue_aver, treat the current candidate box as a region containing text; otherwise discard it. When all candidate boxes have been judged, the final scene text localization is complete.
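The decision logic of steps 2) and 4) can be sketched as follows (a minimal illustration; the function and variable names are ours, not the patent's):

```python
import numpy as np

def hue_aver_threshold(box_hues, neighbour_hues):
    """Step 2): mean absolute dominant-hue difference between each
    candidate box and its adjacent region -- the adaptive threshold."""
    box_hues = np.asarray(box_hues, dtype=float)
    neighbour_hues = np.asarray(neighbour_hues, dtype=float)
    return float(np.mean(np.abs(box_hues - neighbour_hues)))

def filter_boxes(boxes, local_hues, threshold):
    """Step 4): keep a candidate box only if its local edge-pixel
    hue difference exceeds the adaptive threshold."""
    return [box for box, lh in zip(boxes, local_hues) if lh > threshold]
```

For example, with dominant hues [10, 20] and neighbourhood dominant hues [30, 25], the threshold is (20 + 5) / 2 = 12.5; a box whose local hue difference is 15 is kept, one with 5 is discarded.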
The present invention exploits not only the textural features of text but also the fact that the hue of a text region differs from that of its surroundings, and thus locates text in the scene effectively. By taking the average hue difference near edge pixels and comparing it with a threshold to judge whether a region contains text, local colour information about the text in the region is brought into the decision, exploiting the colour consistency of text and its difference from the background. The threshold is obtained by an adaptive method, as the mean of the dominant-hue differences between all candidate boxes and the regions above, below, left, and right of them, so that the colour information of the whole image supports the local colour decision; the resulting threshold characterizes the hue difference between the text regions and the background of the scene image.
The beneficial effect of the invention is that text in natural scenes can be located quickly and accurately.
Brief description of the drawings
Fig. 1: schematic flow of candidate-box processing in the embodiment;
Fig. 2: the input natural scene picture;
Fig. 3: scene text localization result after feature extraction and classification;
Fig. 4: scene text localization result after local hue-difference processing.
Detailed description of the invention
The text localization of the embodiment comprises the following specific steps:
Step 1: training of the text features and design of the classifier.
1) Build a sample library containing 3000 positive samples (containing text), labelled 1, and 7000 negative samples (containing no text), labelled -1; normalize all samples to 48*96 pixels.
2) For every sample in the library, extract the histogram of oriented gradients (HOG) and the rotation-invariant uniform-pattern LBP feature, and concatenate them into a single feature vector.
3) Feed the feature vectors of all 10000 positive and negative samples into a classifier for training, obtaining a trained classifier.
The training method of the text classifier is not the focus of the present invention; those skilled in the art can design the classifier according to existing disclosures combined with actual needs.
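A minimal numpy-only sketch of this feature pipeline: a single whole-patch gradient-orientation histogram stands in for blockwise HOG, and a basic 8-neighbour LBP histogram stands in for the rotation-invariant uniform-pattern LBP. In practice one would use full implementations (e.g. scikit-image's `hog` and `local_binary_pattern`) and train an SVM or similar classifier; all names here are illustrative, not from the patent:

```python
import numpy as np

def grad_orientation_hist(gray, bins=9):
    """Histogram of unsigned gradient orientations, magnitude-weighted."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)        # unsigned orientation
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    s = hist.sum()
    return hist / s if s > 0 else hist

def lbp_hist(gray, bins=256):
    """Histogram of basic 8-neighbour LBP codes over interior pixels."""
    g = gray.astype(int)
    c = g[1:-1, 1:-1]
    codes = np.zeros_like(c)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for k, (dy, dx) in enumerate(shifts):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes |= ((nb >= c) << k)                  # set bit k where nb >= centre
    hist, _ = np.histogram(codes, bins=bins, range=(0, bins))
    return hist / codes.size

def feature_vector(gray):
    """Concatenated gradient-orientation + LBP feature vector."""
    return np.concatenate([grad_orientation_hist(gray), lbp_hist(gray)])
```

Each normalized 48*96 sample patch yields one such vector; 10000 of them, labelled 1 or -1, form the training set for the classifier.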
Step 2: obtain the scene picture, as shown in Fig. 2, and scan the natural scene picture with the trained classifier to obtain candidate text regions.
1) Convert the scene picture to a grey-scale image ImGray, scale the scene image to several sizes, and scan the scene picture with a 48*96 sliding window;
2) Judge each window obtained by the sliding scan with the classifier obtained in Step 1: if the result is 1, keep the window, otherwise discard it; expand each kept window by the scaling ratio of the scene image to obtain a candidate box;
3) Examine all candidate boxes: if the ratio of the intersection area of two windows to their combined (union) area exceeds 0.5, merge them into one candidate box; this yields the final candidate boxes box, as shown in Fig. 3.
This embodiment thus provides a preferred process for obtaining candidate boxes, merging overlapping candidates to simplify subsequent processing.
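The sliding-window scan and the overlap-based merging above can be sketched as follows (the greedy merge strategy and all names are our illustration; the patent only specifies the intersection-over-union > 0.5 criterion):

```python
def sliding_windows(shape, win=(96, 48), step=16):
    """Yield (x1, y1, x2, y2) windows of size win (height, width)."""
    H, W = shape
    for y in range(0, H - win[0] + 1, step):
        for x in range(0, W - win[1] + 1, step):
            yield (x, y, x + win[1], y + win[0])

def iou(a, b):
    """Intersection area over union area for (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def merge_boxes(boxes, thresh=0.5):
    """Greedily fuse any pair of boxes whose IoU exceeds thresh."""
    boxes = [tuple(b) for b in boxes]
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if iou(boxes[i], boxes[j]) > thresh:
                    a, b = boxes[i], boxes[j]
                    boxes[j] = (min(a[0], b[0]), min(a[1], b[1]),
                                max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[i]
                    merged = True
                    break
            if merged:
                break
    return boxes
```

Two heavily overlapping windows such as (0,0,10,10) and (1,1,10,10) have IoU 0.81 and are fused into their bounding box, while a distant window survives unchanged.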
Step 3: judge each candidate box by local hue difference to obtain the final text localization windows, as shown in Fig. 1.
1) Convert the RGB model of the scene image to the HSI (Hue-Saturation-Intensity) model and extract the hue (H) component of the scene image. In the H component, extract one frame of surrounding region on each side of every candidate box box(i) obtained in Step 2; if a side of candidate box box(i) lies on the image border, no frame is taken in that direction. Combine the frames above, below, left, and right into a single region, which serves as the adjacent region box_neighbour(i) of the candidate box.
2) For each box_neighbour(i) and box(i), compute the histogram of the hue (H) component. This embodiment takes the hue with the largest histogram value as the dominant hue of the box; those skilled in the art may determine the dominant hue in other ways according to actual needs. Subtract the dominant hue of each box(i) from that of box_neighbour(i) and take the absolute value as the dominant-hue difference of box(i); the mean dominant-hue difference hue_aver over all candidate boxes is then used as the threshold of the next step:
hue_aver = (1/N) · Σ_{i=1..N} |box_domihue(i) - box_neighbour_domihue(i)|
where i denotes the i-th candidate box, N the number of candidate boxes, box_domihue(i) the dominant hue of the i-th box, and box_neighbour_domihue(i) the dominant hue of the neighbourhood of the i-th box.
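A sketch of the hue extraction and dominant-hue computation, assuming the standard arccos formula for HSI hue; the histogram bin width is our choice, since the patent does not specify the number of bins:

```python
import numpy as np

def hsi_hue(rgb):
    """Hue channel of the HSI model, in degrees [0, 360)."""
    r, g, b = (rgb[..., k].astype(float) for k in range(3))
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt((r - g) ** 2 + (r - b) * (g - b)) + 1e-12
    theta = np.degrees(np.arccos(np.clip(num / den, -1.0, 1.0)))
    # achromatic pixels (r == g == b) get an arbitrary but defined value
    return np.where(b > g, 360.0 - theta, theta)

def dominant_hue(hue, bins=36):
    """Centre of the most populated hue-histogram bin."""
    hist, edges = np.histogram(hue, bins=bins, range=(0, 360))
    k = int(np.argmax(hist))
    return 0.5 * (edges[k] + edges[k + 1])
```

The threshold hue_aver then follows as the mean of |dominant_hue(box) - dominant_hue(neighbourhood)| over all candidate boxes.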
3) Apply the Canny operator to the grey-scale image ImGray of the scene to obtain the edge map ImCanny. Within each candidate box obtained in Step 2, find the pixels at which ImCanny is non-zero, i.e. the edge pixels. For each edge pixel, compute the difference of the hue component H between the pixels above and below it, and between the pixels to its left and right, and take the mean of the vertical and horizontal hue differences. Average this over all non-zero (edge) pixels of ImCanny within candidate box box(i) to obtain the average hue difference local_hue(i):
local_hue(i) = (1/M) · Σ_x [ |pixel_up(x) - pixel_down(x)| + |pixel_left(x) - pixel_right(x)| ] / 2
Compare local_hue(i) with the threshold hue_aver: if it is greater than the threshold, keep the box, otherwise discard it. This yields the final scene text localization, as shown in Fig. 4.
where x denotes the position of an edge pixel in the candidate box; pixel_up(x), pixel_down(x), pixel_left(x), and pixel_right(x) denote the hue values of the pixels above, below, to the left of, and to the right of pixel x respectively; and M denotes the total number of edge pixels in candidate box box(i).
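A sketch of this local measure, assuming (per the prose description, since the equation image for local_hue(i) is not reproduced here) that each edge pixel contributes the mean of its vertical (|up - down|) and horizontal (|left - right|) neighbour hue differences. In practice edge_mask would be the non-zero pixels of the Canny edge map ImCanny restricted to the candidate box:

```python
import numpy as np

def local_hue_diff(hue, edge_mask):
    """Average, over all edge pixels, of the mean of the vertical
    (|up - down|) and horizontal (|left - right|) hue differences."""
    ys, xs = np.nonzero(edge_mask[1:-1, 1:-1])   # interior edge pixels only
    ys, xs = ys + 1, xs + 1                      # shift back to full coords
    if ys.size == 0:
        return 0.0
    vert = np.abs(hue[ys - 1, xs] - hue[ys + 1, xs])
    horiz = np.abs(hue[ys, xs - 1] - hue[ys, xs + 1])
    return float(((vert + horiz) / 2.0).mean())
```

A box is then kept when local_hue_diff(...) exceeds the adaptive threshold hue_aver.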

Claims (5)

1. A natural scene text localization method based on local hue difference, characterised in that it comprises the following steps:
1) scanning the scene picture with a classifier to obtain the candidate boxes corresponding to candidate text regions;
2) converting the scene picture to the HSI colour model, extracting the hue (H) component, and computing the mean dominant-hue difference hue_aver between all candidate boxes box(i) and their adjacent regions:
hue_aver = (1/N) · Σ_{i=1..N} |box_domihue(i) - box_neighbour_domihue(i)|;
wherein box_domihue(i) is the dominant hue of the i-th candidate box box(i), box_neighbour_domihue(i) is the dominant hue of the region adjacent to candidate box box(i), and N is the total number of candidate boxes in the current scene picture;
3) taking the edge pixels of the scene picture and computing, within each candidate box, the average hue difference local_hue(i) between all edge pixels and their neighbouring pixels;
4) comparing the average hue difference local_hue(i) of each candidate box with the dominant-hue difference hue_aver: when local_hue(i) of a candidate box is greater than hue_aver, treating the current candidate box as a region containing text, and otherwise discarding it; when all candidate boxes have been judged, the final scene text localization is complete.
2. The natural scene text localization method based on local hue difference as claimed in claim 1, characterised in that the adjacent region of a candidate box is extracted as follows:
when no side of candidate box box(i) lies on the border of the scene picture, one frame is extracted on each side of box(i); when a side of box(i) lies on the border of the scene picture, no frame is taken in that direction; after the adjacent frames of box(i) have been extracted, these frames are combined into one region as the adjacent region box_neighbour(i) of the candidate box.
3. The natural scene text localization method based on local hue difference as claimed in claim 1, characterised in that the average hue difference local_hue(i) between all edge pixels and their neighbouring pixels within each candidate box is computed as:
local_hue(i) = (1/M) · Σ_x [ |pixel_up(x) - pixel_down(x)| + |pixel_left(x) - pixel_right(x)| ] / 2
where x denotes the position of an edge pixel in the i-th candidate box; pixel_up(x), pixel_down(x), pixel_left(x), and pixel_right(x) denote the hue values of the pixels above, below, to the left of, and to the right of pixel x respectively; and M denotes the total number of edge pixels in the i-th candidate box.
4. The natural scene text localization method based on local hue difference as claimed in claim 1, characterised in that in step 3) the edge pixels are obtained from the edge map computed with the Canny operator.
5. The natural scene text localization method based on local hue difference as claimed in claim 1, characterised in that the dominant hue is the hue value with the maximum count in the histogram of the H component.
CN201310377443.8A 2013-08-27 2013-08-27 Natural scene text localization method based on local hue difference Expired - Fee Related CN103440487B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310377443.8A CN103440487B (en) 2013-08-27 2013-08-27 Natural scene text localization method based on local hue difference


Publications (2)

Publication Number Publication Date
CN103440487A CN103440487A (en) 2013-12-11
CN103440487B true CN103440487B (en) 2016-11-02

Family

ID=49694180

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310377443.8A Expired - Fee Related CN103440487B (en) 2013-08-27 2013-08-27 Natural scene text localization method based on local hue difference

Country Status (1)

Country Link
CN (1) CN103440487B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017059576A1 (en) * 2015-10-09 2017-04-13 Beijing Sensetime Technology Development Co., Ltd Apparatus and method for pedestrian detection
CN108564084A * 2018-05-08 2018-09-21 Beijing SenseTime Technology Development Co., Ltd. Character detection method, device, terminal and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102163284A (en) * 2011-04-11 2011-08-24 西安电子科技大学 Chinese environment-oriented complex scene text positioning method
US8331684B2 (en) * 2010-03-12 2012-12-11 Sony Corporation Color and intensity based meaningful object of interest detection

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7418141B2 (en) * 2003-03-31 2008-08-26 American Megatrends, Inc. Method, apparatus, and computer-readable medium for identifying character coordinates


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Fast and robust text detection in images and video frames; Qixiang Ye et al.; Image and Vision Computing; 31 Dec. 2005; pp. 565-576 *
Natural scene text localization based on colour dispersion analysis (基于颜色散布分析的自然场景文本定位); 周慧灿 et al.; Computer Engineering (计算机工程); Apr. 2010; vol. 36, no. 8; pp. 197-200 *

Also Published As

Publication number Publication date
CN103440487A (en) 2013-12-11

Similar Documents

Publication Publication Date Title
CN104050471B (en) Natural scene character detection method and system
CN107346420B (en) Character detection and positioning method in natural scene based on deep learning
CN102968637B (en) Complicated background image and character division method
CN105373794B (en) A kind of licence plate recognition method
CN105046196B (en) Front truck information of vehicles structuring output method based on concatenated convolutional neutral net
CN103198315B (en) Based on the Character Segmentation of License Plate of character outline and template matches
EP2575077A2 (en) Road sign detecting method and road sign detecting apparatus
US9092696B2 (en) Image sign classifier
CN108805018A (en) Road signs detection recognition method, electronic equipment, storage medium and system
CN103824081B (en) Method for detecting rapid robustness traffic signs on outdoor bad illumination condition
CN104408449B (en) Intelligent mobile terminal scene literal processing method
CN105160691A (en) Color histogram based vehicle body color identification method
CN104751142A (en) Natural scene text detection algorithm based on stroke features
CN103336961B (en) A kind of interactively natural scene Method for text detection
CN104850850A (en) Binocular stereoscopic vision image feature extraction method combining shape and color
CN108765443A (en) A kind of mark enhancing processing method of adaptive color Threshold segmentation
CN105005766A (en) Vehicle body color identification method
Duan et al. A WBC segmentation methord based on HSI color space
CN103106409A (en) Composite character extraction method aiming at head shoulder detection
CN106529432A (en) Hand area segmentation method deeply integrating significance detection and prior knowledge
CN109753962B (en) Method for processing text region in natural scene image based on hybrid network
CN108664969A (en) Landmark identification method based on condition random field
CN104598907A (en) Stroke width figure based method for extracting Chinese character data from image
CN109145746B (en) Signal lamp detection method based on image processing
CN106874848A (en) A kind of pedestrian detection method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20161102

Termination date: 20190827