CN103413132B - A progressive hierarchical cognitive scene image text detection method
- Publication number: CN103413132B
- Application number: CN201310253437.1A
- Authority: CN (China)
- Prior art keywords: connected component, feature, text, level, scene image
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Landscapes
- Image Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A progressive hierarchical cognitive scene image text detection method. Starting from connected components that have already been extracted, the method first uses the spatial adjacency and arrangement relations of the connected components to form different connected component sets: single connected components, connected component pairs, and connected component rows. Different features are then designed for each kind of set, and the text confidence of each set is used as a feature of the subsequent set. Using the consistency cognition assumption on connected component sets together with a conditional random field model, the classifier parameters of each level are learned under supervision and the text confidence of the connected components is computed level by level. Finally, the text lines are located. The invention integrates appearance features, low-order relations, and high-order relations, computes parameters and classes directly with classifier algorithms, and can effectively improve the recall and precision of scene image text detection results.
Description
Technical field
The present invention relates to the technical field of scene image text detection, and specifically to a progressive hierarchical cognitive scene image text detection method.
Background art
Text detection uses the visual appearance features of characters to locate text regions in an image, providing strong support for subsequent text recognition. As a key technology in text information extraction, text detection has long been an active research problem in computer vision. Text, however, is a special visual target: its size, font, color, and language are all uncertain, and the large amount of complex background in natural scene images is easily confused with text. These factors make text regions in scene images difficult to detect. In their final step, existing connected-component-based text detection methods separate text connected components from non-text connected components by the differences between them. However, text connected components vary in appearance and non-text connected components can look similar to text, which makes this separation difficult.
Therefore, one class of technical strategies combines the appearance features of connected components with context in order to tell them apart. Pan's method uses the context of neighboring binary relations and also takes appearance features into account (Pan's method: Pan YF, Hou XW, Liu CL. A Hybrid Approach to Detect and Localize Texts in Natural Scene Images[J]. IEEE Transactions on Image Processing, 2011, 20(3): 800-813). Yi's method and Yao's method analyze text line features through the high-order relations that text connected components form in space (Yi's method: Chucai Y, YingLi T. Text string detection from natural scenes by structure-based partition and grouping[J]. IEEE Transactions on Image Processing, 2011, 20(9): 2594-2605. Yao's method: Cong Y, Xiang B, Wenyu L, et al. Detecting texts of arbitrary orientations in natural images[C], 2012: 1083-1090). However, a theoretical model that integrates appearance features, low-order relations, and high-order relations is still lacking. This makes both feature design and parameter learning difficult and limits the generality of the models.
Content of the invention
In order to solve the problem that above-mentioned prior art exists, it is an object of the invention to provide a kind of progressive level
Cognitive scene image Method for text detection, is used for the vision such as vehicle-mounted vision guided navigation and scene image semantic analysis
Intelligence system, all effectively improves than existing methodical precision ratio and recall ratio in terms of Connected component analysis.
For reaching object above, the present invention adopts the following technical scheme that
A kind of progressive level cognitive scene image text detection method, uses for reference the level feature of human cognitive,
On the basis of obtaining scene image Connected component, the space first with Connected component is adjacent and Rankine-Hugoniot relations forms not
With Connected component set: single Connected component, Connected component to and Connected component row;Then for difference connection
Component collections separately designs different features, using the text confidence level of different Connected component set as follow-up company
A kind of feature of logical component collections;Assumed and condition random field mould by the uniformity of Connected component set is cognitive
The classifier parameters of type each level of supervised learning, and calculate the text confidence level of Connected component successively;Finally
Localization of text row;Specifically include following steps:
Step 1: in the first-layer analysis, extract the appearance features of each single connected component, and use a classifier with supervised learning to estimate the text confidence of the single connected component;
Step 2: before the second-layer analysis, cluster the candidate single connected components two by two according to their spatial positions to form connected component pairs;
Step 3: in the second-layer analysis, extract the similarity feature and the mean connected-component energy feature of each connected component pair, and use a classifier with supervised learning to estimate the text confidence of the connected component pair;
Step 4: before the third-layer analysis, link the candidate connected component pairs according to their association and arrangement relations to form connected component rows;
Step 5: in the third-layer analysis, extract the appearance difference feature and the gradient histogram feature of each connected component row, together with the mean energy feature of all its single connected components and the mean energy feature of its connected component pairs, and use a classifier with supervised learning to locate the text lines.
For a single connected component, the designed features are appearance features, including geometric features, stroke width features, and texture features.
For a connected component pair, the designed features are the similarity feature and the mean connected-component energy feature.
For a connected component row, the designed features are the appearance difference feature, the gradient histogram feature, and the mean energy features of all single connected components and of the connected component pairs.
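The five steps above can be sketched as a three-level cascade. This is only an illustrative skeleton, not the patent's implementation: every callable (`make_pairs`, `make_rows`, `score_single`, `score_pair`, `score_row`) is a hypothetical placeholder supplied by the caller, and filtering candidates with a fixed `threshold` of 0.5 is an assumption, since the text does not state how candidates are selected between levels.

```python
def detect_text_lines(components, make_pairs, make_rows,
                      score_single, score_pair, score_row, threshold=0.5):
    """Three-level cascade; all callables are hypothetical and caller-supplied.

    Each score_* callable returns a text confidence in [0, 1]. Confidences
    from earlier levels are handed to later levels so they can act as features.
    """
    # Level 1: score every single connected component.
    conf1 = {i: score_single(c) for i, c in enumerate(components)}
    kept = [i for i, f in conf1.items() if f >= threshold]

    # Level 2: build pairs from the surviving components and score them.
    pairs = make_pairs(components, kept)
    conf2 = {p: score_pair(p, conf1) for p in pairs}
    kept_pairs = [p for p, f in conf2.items() if f >= threshold]

    # Level 3: build rows from the surviving pairs and keep the text-like ones.
    rows = make_rows(components, kept_pairs)
    return [r for r in rows if score_row(r, conf1, conf2) >= threshold]
```

Passing the scoring functions in as arguments keeps the skeleton independent of any particular classifier, which the text leaves open.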
The differences (innovations) of the invention compared with the prior art are as follows:
1) The invention uses the hierarchical nature of human cognition: features are designed for each of three levels of objects, analysis results are propagated between levels, and non-text connected components are filtered out progressively. The invention introduces the classifier output as the text confidence of a connected component set and propagates it between levels, which can effectively improve the recall and precision of scene image text detection results;
2) For model parameter estimation and class inference, the invention takes appearance features, low-order relations, and high-order relations into account and can compute parameters and classes directly with classifier algorithms, whereas existing methods have difficulty estimating parameters and inferring classes under high-order relations.
Brief description of the drawings
Fig. 1 shows the parameter learning and cognitive inference process of the hierarchical cognitive model.
Fig. 2 shows comparison curves for the energy feature analysis of the hierarchical cognitive model: Fig. 2A compares the classification results of three different feature sets in the second layer; Fig. 2B compares the classification results of three different feature sets in the third layer.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and specific embodiments.
The generation of the connected component sets is described first.
To obtain the connected component sets of each layer, the connected components must be clustered. Clustering proceeds in two steps: before the second-layer analysis, candidate connected component pairs are clustered; then, before the third-layer analysis, the connected component pairs are linked into candidate connected component rows. The two clustering steps are explained below.
Two adjacent connected components X_i and X_j that appear roughly in parallel are marked as a candidate connected component pair if they satisfy the following two conditions:
dist(X_i, X_j) < 2·max(max(w_i, h_i), max(w_j, h_j))    (1)
dist_y(X_i, X_j) < 0.5·max(h_i, h_j)    (2)
In formulas (1) and (2), dist(X_i, X_j) is the Euclidean distance between the centroids of X_i and X_j, dist_y(X_i, X_j) is the vertical distance between the two centroids, and (w_i, h_i) and (w_j, h_j) are the width and height of the bounding boxes of the two connected components.
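Conditions (1) and (2) translate directly into a pairwise test. The following is a minimal sketch; the `Component` container and the function names are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical container: centroid (cx, cy) and bounding-box size (w, h)."""
    cx: float
    cy: float
    w: float
    h: float


def is_candidate_pair(a: Component, b: Component) -> bool:
    """Return True if a and b satisfy conditions (1) and (2)."""
    dist = math.hypot(a.cx - b.cx, a.cy - b.cy)   # centroid distance
    dist_y = abs(a.cy - b.cy)                     # vertical centroid distance
    cond1 = dist < 2 * max(max(a.w, a.h), max(b.w, b.h))
    cond2 = dist_y < 0.5 * max(a.h, b.h)
    return cond1 and cond2


def candidate_pairs(components):
    """Enumerate all index pairs (i, j), i < j, that pass both conditions."""
    return [(i, j)
            for i in range(len(components))
            for j in range(i + 1, len(components))
            if is_candidate_pair(components[i], components[j])]
```

The exhaustive double loop is quadratic in the number of components; it is used here only because it mirrors the formulas most directly.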
The inclination angle θ_ij of a text connected component pair (X_i, X_j) is defined as the inclination angle of the line through the centroids of X_i and X_j. The inclination angles of two text connected component pairs (X_i, X_j) and (X_j, X_k) may differ by no more than π/12, i.e. they satisfy:
|θ_ij - θ_jk| ≤ π/12    (3)
Linking connected component pairs two by two in this way joins all connected components that lie on one straight line into a connected component row.
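One way to realize this linking step is a union-find over pairs, sketched below. Several choices here are assumptions rather than details from the text: angles are treated as undirected (folded into [0, π)), any two pairs that share a component are compared (not only pairs of the form (X_i, X_j), (X_j, X_k)), and the function names are hypothetical.

```python
import math


def tilt_angle(c1, c2):
    """Inclination of the line through two centroids, folded into [0, pi)."""
    return math.atan2(c2[1] - c1[1], c2[0] - c1[0]) % math.pi


def angle_diff(t1, t2):
    """Smallest difference between two undirected line angles."""
    d = abs(t1 - t2)
    return min(d, math.pi - d)


def link_rows(centroids, pairs, max_diff=math.pi / 12):
    """Merge pairs that share a component and satisfy condition (3)."""
    parent = list(range(len(pairs)))

    def find(p):
        # Union-find root lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    angles = [tilt_angle(centroids[i], centroids[j]) for i, j in pairs]
    for a in range(len(pairs)):
        for b in range(a + 1, len(pairs)):
            shares = set(pairs[a]) & set(pairs[b])
            if shares and angle_diff(angles[a], angles[b]) <= max_diff:
                parent[find(a)] = find(b)

    # Collect the component indices of every merged group of pairs.
    rows = {}
    for p, (i, j) in enumerate(pairs):
        rows.setdefault(find(p), set()).update((i, j))
    return sorted(sorted(r) for r in rows.values())
```

With this grouping a component may belong to more than one row, for example where a horizontal and a vertical candidate row cross; later classification is then left to decide which rows are text.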
Next, the computation of the text confidence of connected component sets is analyzed.
Assume that n connected components have been grouped into a connected component row by the prior clustering, which constitutes a graph model G = (v, ε), where v denotes all nodes and ε denotes the edges between the nodes. These nodes form the random observation sequence X = [x_1, x_2, ..., x_n], and the corresponding label sequence is Y = [y_1, y_2, ..., y_n]. The nodes satisfy the Markov property (the clustering considers spatial neighbor relations). Then, following the definition of a conditional random field in the literature, the conditional probability that the sequence is labeled Y = Y* given the observation X is:
P(Y = Y* | X) ∝ exp(-E(X, Y*, C, Λ))    (4)
where E(X, Y*, C, Λ) is the energy function of the whole graph model, C denotes the cliques in the graph model, and Λ denotes the parameters of the energy function. The hierarchical cognitive model has three kinds of cliques: single connected components, connected component pairs, and connected component rows. The sum of the energies of all cliques constitutes the whole energy function:
E(X, Y*, C, Λ) = Σ_{c∈C} V_c(X)    (5)
where V_c(X) is the total energy of one kind of clique. It remains to solve for the parameters Λ of the whole model and to infer the final labeling.
In general, the parameters of a conditional random field are estimated by maximizing the conditional log-likelihood, i.e. the conditional probability. Such methods seek to maximize the probability (that is, to minimize the energy function) in order to optimize the parameters. If the cliques C include multi-variable cliques, parameter optimization becomes an NP-hard problem and the parameters are difficult to solve. For text detection, the invention makes two assumptions about this problem. First, a set of text connected components generally forms a local association in the image and has no relation to the non-text connected components in the image. Second, since only the case in which the connected component set is text (Y* = 1) is of interest, all other configurations that may occur in the random field are regarded as non-text sequences (Y* = 0). Therefore, only the text confidence P(Y* = 1 | X) for the case Y* = 1 needs to be estimated; this is called consistency cognition. The usual approach of obtaining the labeling by minimizing the energy of the random field is thereby turned into computing the text confidence under one particular labeling, so that labeling a connected component set becomes a binary classification problem. Positive and negative samples can then be constructed in a supervised learning scheme to train the classifier parameters, and the classifier output is used directly to fit the overall energy function of the random field, as shown in Fig. 1. The energy of the cliques at every level can be obtained from the classifier output. From the classifier's point of view, the outputs of the earlier layers serve as strongly discriminative features in the classifiers of the later layers. The mature algorithms available for various classifiers also help ensure the validity of the parameter learning results.
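The reduction to binary classification can be illustrated with any probabilistic classifier. The sketch below uses a small logistic regression trained by gradient descent purely as a stand-in; the text does not name a specific classifier, so this choice and all function names are assumptions. The classifier output plays the role of the text confidence F, and the clique energy under the label "text" is taken as 1 - F.

```python
import math


def sigmoid(z):
    """Logistic function, with z clipped to avoid overflow in exp."""
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def confidence(x, w):
    """Text confidence F(x, w) in [0, 1]; w holds the weights, bias last."""
    return sigmoid(sum(wk * xk for wk, xk in zip(w, x)) + w[-1])


def energy(x, w):
    """Clique energy under the label 'text': E = 1 - F."""
    return 1.0 - confidence(x, w)


def train_logistic(samples, labels, epochs=500, lr=0.5):
    """Fit the weights by plain gradient descent on the log-loss.

    samples: feature vectors of positive (text) and negative (non-text) sets.
    labels:  1 for text, 0 for non-text.
    """
    dim = len(samples[0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in zip(samples, labels):
            err = confidence(x, w) - y
            for k in range(dim):
                grad[k] += err * x[k]
            grad[dim] += err
        w = [wk - lr * gk / len(samples) for wk, gk in zip(w, grad)]
    return w
```

To propagate results between levels, the confidence (or energy) returned for a lower-level set would simply be appended to the feature vector of the higher-level set before that level's classifier is trained.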
With the clustering rules and the model parameter learning and cognitive inference in place, the features of each level are designed and the corresponding text confidence is computed.
1) Single connected component level: three types of features are used, namely the geometric feature f_g, the stroke width feature f_sw, and the texture feature f_t. The geometric feature f_g includes the aspect ratio, axis length ratio, occupation ratio, and compactness of each connected component. The stroke width feature f_sw consists of the stroke width ratio and stroke width variance, designed on the basis of the computed stroke width of the connected component. The texture feature f_t is the foreground color consistency and background color consistency computed over the local region of the connected component. From these three features, a classifier learns the model parameters λ_u under supervision and estimates the text confidence F_u(·), which gives the energy E_u(·) of the connected component:
E_u(X, y_i = 1, λ_u) = 1 - F_u([f_g(X), f_sw(X), f_t(X)], λ_u)    (6)
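As an illustration of the geometric part f_g only, the sketch below computes four quantities from the pixel set of one connected component. The text names the features but does not give formulas, so the definitions used here are assumptions: occupation ratio as area over bounding-box area, compactness as area over squared boundary length, and axis length ratio from the principal axes of the pixel coordinates. The function name is hypothetical.

```python
import math


def geometric_features(pixels):
    """pixels: set of (x, y) coordinates belonging to one connected component."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    w = max(xs) - min(xs) + 1
    h = max(ys) - min(ys) + 1
    area = len(pixels)

    # Boundary pixels: at least one 4-neighbour lies outside the component.
    boundary = sum(
        1 for (x, y) in pixels
        if any((x + dx, y + dy) not in pixels
               for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)))
    )

    # Principal axis variances from the 2x2 covariance of pixel coordinates.
    mx, my = sum(xs) / area, sum(ys) / area
    sxx = sum((x - mx) ** 2 for x in xs) / area
    syy = sum((y - my) ** 2 for y in ys) / area
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / area
    root = math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)
    major = (sxx + syy + root) / 2
    minor = max((sxx + syy - root) / 2, 0.0)

    return {
        "aspect_ratio": w / h,
        "axis_ratio": math.sqrt(minor / major) if major > 0 else 1.0,
        "occupation_ratio": area / (w * h),
        "compactness": area / boundary ** 2,
    }
```

The resulting values would be concatenated with the stroke width and texture features to form the input of the first-level classifier in formula (6).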
2) Connected component pair level: two types of features are used, namely the mean connected-component energy feature f_up and the similarity feature f_sa. The feature f_up is the mean of the energies of the single connected components obtained at the previous level. The similarity feature f_sa consists of the aspect ratio, the stroke width ratio, and the foreground and background color differences of the two connected components. From these two features, a classifier likewise learns the parameters λ_b under supervision and estimates the text confidence F_b(·) of the connected component pair, which gives the energy E_b(·) of the pair:
E_b(X, (y_i, y_j) = 1, λ_b) = 1 - F_b([f_up(X), f_sa(X)], λ_b)    (7)
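The pair-level feature vector can be assembled as follows. This is a sketch under stated assumptions: each component is a hypothetical dictionary with keys `w`, `h`, `stroke_width`, `fg`, and `bg`; ratios are made symmetric by dividing the smaller value by the larger; and color differences are Euclidean distances between color tuples. None of these conventions is specified in the text.

```python
def ratio(a, b):
    """Symmetric ratio in (0, 1]; 1 means the two values are identical."""
    return min(a, b) / max(a, b)


def color_distance(c1, c2):
    """Euclidean distance between two colour tuples."""
    return sum((p - q) ** 2 for p, q in zip(c1, c2)) ** 0.5


def pair_features(comp_i, comp_j, energy_i, energy_j):
    """Build [f_up, f_sa...] for one pair.

    energy_i and energy_j are the first-level energies E_u of the two
    components; their mean is the propagated feature f_up.
    """
    f_up = (energy_i + energy_j) / 2.0
    f_sa = [
        ratio(comp_i["w"] / comp_i["h"], comp_j["w"] / comp_j["h"]),
        ratio(comp_i["stroke_width"], comp_j["stroke_width"]),
        color_distance(comp_i["fg"], comp_j["fg"]),
        color_distance(comp_i["bg"], comp_j["bg"]),
    ]
    return [f_up] + f_sa
```

The first entry is what carries the first-level result into the second-level classifier of formula (7).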
3) Connected component row level: the features are the connected component row energy feature f_str, the appearance difference feature f_v, and the gradient histogram feature f_hog. The row energy feature f_str includes the mean energy of all single connected components and the mean energy of the connected component pairs. The appearance difference feature f_v includes the height variance, stroke width variance, and foreground color variance of the connected components. The gradient histogram feature is computed over 4 gradient directions and 6 local regions, giving 24 dimensions in total, and describes the local texture distribution of a text line. A classifier learns the parameters λ_s of this level under supervision and estimates the text confidence F_s(·) of the connected component row, which gives the energy E_s(·) of the row:
E_s(X, Y* = 1, λ_s) = 1 - F_s([f_str(X), f_v(X), f_hog(X)], λ_s)    (8)
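The row-level features can be sketched as below. The text fixes only the counts (4 gradient directions, 6 local regions, 24 dimensions); the 2 x 3 grid layout, the central-difference gradients, the magnitude weighting, the normalization, and the use of gray-level foreground intensities for the color variance are all assumptions made for illustration, as are the function names.

```python
import math
from statistics import mean, pvariance


def gradient_histogram(gray, grid_rows=2, grid_cols=3, bins=4):
    """Magnitude-weighted orientation histogram of a gray image (list of rows).

    Returns grid_rows * grid_cols * bins values (24 with the defaults).
    """
    height, width = len(gray), len(gray[0])
    hist = [0.0] * (grid_rows * grid_cols * bins)
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            theta = math.atan2(gy, gx) % math.pi          # undirected gradient
            b = min(int(theta / (math.pi / bins)), bins - 1)
            cell = (y * grid_rows // height) * grid_cols + (x * grid_cols // width)
            hist[cell * bins + b] += mag
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist


def row_features(heights, stroke_widths, fg_grays,
                 unary_energies, pair_energies, gray):
    """Concatenate f_str (2 values), f_v (3 values), and f_hog (24 values)."""
    f_str = [mean(unary_energies), mean(pair_energies)]
    f_v = [pvariance(heights), pvariance(stroke_widths), pvariance(fg_grays)]
    return f_str + f_v + gradient_histogram(gray)
```

Here `gray` would be the image patch covered by the candidate row, and the two energy lists are the outputs of the first and second levels for the components and pairs in that row.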
Text lines are finally detected using F_s(·). Experiments show that the hierarchical cognitive model, by filtering non-text connected components layer by layer, effectively improves the precision and recall of scene text detection. Localization results on the standard test set ICDAR2005 are compared in Table 1. The energy features designed to pass between levels have also proved very effective: as Fig. 2 shows, the connected component pair energy feature and the connected component row energy feature effectively improve the classification ability of the hierarchical cognitive model.
The foregoing merely explains the principles of the invention.
Table 1. Comparison of text localization results on ICDAR2005
Claims (4)
1. A progressive hierarchical cognitive scene image text detection method, characterized in that: drawing on the hierarchical nature of human cognition, and starting from the connected components extracted from a scene image, the method first uses the spatial adjacency and arrangement relations of the connected components to form different connected component sets: single connected components, connected component pairs, and connected component rows; different features are then designed for each kind of set, and the text confidence of each set is used as a feature of the subsequent set; using the consistency cognition assumption on connected component sets together with a conditional random field model, the classifier parameters of each level are learned under supervision and the text confidence of the connected components is computed level by level; finally, the text lines are located; the method specifically comprises the following steps:
Step 1: in the first-layer analysis, extract the appearance features of each single connected component, and use a classifier with supervised learning to estimate the text confidence of the single connected component;
Step 2: before the second-layer analysis, consider any two adjacent connected components that appear roughly in parallel, and cluster them two by two to form connected component pairs;
Step 3: in the second-layer analysis, extract the similarity feature and the mean connected-component energy feature of each connected component pair, and use a classifier with supervised learning to estimate the text confidence of the connected component pair;
Step 4: before the third-layer analysis, link all connected component pairs that are arranged approximately along a straight line to form connected component rows;
Step 5: in the third-layer analysis, extract the appearance difference feature and the gradient histogram feature of each connected component row, together with the mean energy feature of all its single connected components and the mean energy feature of its connected component pairs, and use a classifier with supervised learning to finally determine the text lines.
2. The progressive hierarchical cognitive scene image text detection method according to claim 1, characterized in that: for a single connected component, the designed features are appearance features, including geometric features, stroke width features, and texture features.
3. The progressive hierarchical cognitive scene image text detection method according to claim 1, characterized in that: for a connected component pair, the designed features are the similarity feature and the mean connected-component energy feature.
4. The progressive hierarchical cognitive scene image text detection method according to claim 1, characterized in that: for a connected component row, the designed features are the appearance difference feature, the gradient histogram feature, and the mean energy features of all single connected components and of the connected component pairs.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310253437.1A CN103413132B (en) | 2013-06-24 | 2013-06-24 | A kind of progressive level cognitive scene image text detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310253437.1A CN103413132B (en) | 2013-06-24 | 2013-06-24 | A kind of progressive level cognitive scene image text detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103413132A CN103413132A (en) | 2013-11-27 |
CN103413132B true CN103413132B (en) | 2016-11-09 |
Family
ID=49606139
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310253437.1A Active CN103413132B (en) | 2013-06-24 | 2013-06-24 | A kind of progressive level cognitive scene image text detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103413132B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108229349B (en) * | 2017-12-21 | 2020-09-01 | 中国科学院自动化研究所 | Reticulate pattern human face image recognition device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103093228A (en) * | 2013-01-17 | 2013-05-08 | 上海交通大学 | Chinese detection method in natural scene image based on connected domain |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8649600B2 (en) * | 2009-07-10 | 2014-02-11 | Palo Alto Research Center Incorporated | System and method for segmenting text lines in documents |
Also Published As
Publication number | Publication date |
---|---|
CN103413132A (en) | 2013-11-27 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |