CN103413132B

CN103413132B - Progressive hierarchical cognition method for scene image text detection

Info

Publication number: CN103413132B
Application number: CN201310253437.1A
Authority: CN
Inventors: 刘跃虎; 周刚; 苏远歧; 翟少卓
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2013-06-24
Filing date: 2013-06-24
Publication date: 2016-11-09
Anticipated expiration: 2033-06-24
Also published as: CN103413132A

Abstract

A progressive hierarchical cognition method for detecting text in scene images. Starting from connected components that have already been extracted, the method first uses the spatial adjacency and arrangement relations among connected components to build three kinds of connected component sets: single connected components, connected component pairs, and connected component rows. Different features are then designed for each kind of set, and the text confidence of each set is used as a feature of the sets at the following level. Using a consistency cognition assumption on the connected component sets together with a conditional random field model, the classifier parameters of each level are learned under supervision, and the text confidence of the connected components is computed level by level. Finally, text lines are localized. The invention integrates appearance features, low-order relations, and high-order relations, computes parameters and classes directly with classifier algorithms, and can effectively improve the recall and precision of scene image text detection.

Description

Progressive hierarchical cognition method for scene image text detection

Technical field

The present invention relates to the technical field of scene image text detection, and in particular to a progressive hierarchical cognition method for scene image text detection.

Background art

Text detection uses the visual appearance features of characters to locate text regions in an image, providing strong support for subsequent text recognition. As a key technology in text information extraction, text detection has long been an active research problem in computer vision. Text is a special kind of visual target, however: its size, font, color, and language are uncertain, and the many complex backgrounds in natural scene images are easily confused with text. These factors make text regions in scene images hard to detect. The final step of existing connected-component-based text detection methods distinguishes text connected components from non-text connected components by their differences. Because text connected components vary widely in appearance while some non-text connected components look similar to text, this distinction is difficult.

One class of technical strategies therefore combines the appearance features of connected components with context to make the distinction. The method of Pan uses the context of neighboring binary relations together with appearance features (see Pan YF, Hou XW, Liu CL. A Hybrid Approach to Detect and Localize Texts in Natural Scene Images [J]. IEEE Transactions on Image Processing, 2011, 20(3): 800-813). The methods of Yi and Yao analyze text line features through the high-order relations that text connected components form in space (for the method of Yi, see Chucai Y, YingLi T. Text string detection from natural scenes by structure-based partition and grouping [J]. IEEE Transactions on Image Processing, 2011, 20(9): 2594-2605; for the method of Yao, see Cong Y, Xiang B, Wenyu L, et al. Detecting texts of arbitrary orientations in natural images [C], 2012: 1083-1090). However, a theoretical model that integrates appearance features, low-order relations, and high-order relations is still lacking. This makes both feature design and parameter learning difficult, and limits the generality of the models.

Summary of the invention

To solve the above problems of the prior art, the object of the present invention is to provide a progressive hierarchical cognition method for scene image text detection, intended for visual intelligence systems such as vehicle-mounted visual navigation and scene image semantic analysis. In connected component analysis, it effectively improves both precision and recall over existing methods.

To achieve the above object, the present invention adopts the following technical solution:

A progressive hierarchical cognition method for scene image text detection draws on the hierarchical nature of human cognition. Starting from the connected components obtained from a scene image, the method first uses the spatial adjacency and arrangement relations among connected components to build different connected component sets: single connected components, connected component pairs, and connected component rows. Different features are then designed for each kind of set, and the text confidence of each set is used as a feature of the sets at the following level. Using a consistency cognition assumption on the connected component sets together with a conditional random field model, the classifier parameters of each level are learned under supervision, and the text confidence of the connected components is computed level by level. Finally, text lines are localized. The method comprises the following steps:

Step 1: in the first-level analysis, extract the appearance features of each single connected component, and use a classifier with supervised learning to estimate the text confidence of the single connected component;

Step 2: before the second-level analysis, cluster candidate single connected components two by two according to their spatial position relation to form connected component pairs;

Step 3: in the second-level analysis, extract the similarity features of each connected component pair and the average connected component energy feature, and use a classifier with supervised learning to estimate the text confidence of the connected component pair;

Step 4: before the third-level analysis, link candidate connected component pairs according to their association and arrangement relations to form connected component rows;

Step 5: in the third-level analysis, extract the appearance difference features and gradient histogram features of each connected component row, together with the mean energy feature of all its single connected components and the mean energy feature of its connected component pairs, and use a classifier with supervised learning to localize text lines.

For single connected components, the designed features are appearance features, including geometric features, stroke width features, and texture features.

For connected component pairs, the designed features are similarity features and the average connected component energy feature.

For connected component rows, the designed features are appearance difference features, gradient histogram features, the mean energy feature of all single connected components, and the mean energy feature of the connected component pairs.

The differences (innovations) of the present invention compared with the prior art are as follows:

1) The invention draws on the hierarchical nature of human cognition: features are designed one by one for objects at three levels, analysis results are propagated between levels, and non-text connected components are filtered out progressively. The invention propagates the classifier output between levels as the text confidence of connected component sets, which can effectively improve the recall and precision of scene image text detection;

2) For model parameter estimation and class inference, the invention takes appearance features, low-order relations, and high-order relations into account, and can compute parameters and classes directly with classifier algorithms. Existing methods, by contrast, have difficulty estimating parameters and inferring classes under high-order relations.

Brief description of the drawings

Fig. 1 shows the parameter learning and cognitive inference process of the hierarchical cognition model.

Fig. 2 shows comparison curves for the energy feature analysis of the hierarchical cognition model. Fig. 2A compares the classification results of three different feature sets at the second level; Fig. 2B compares the classification results of three different feature sets at the third level.

Detailed description of the invention

The present invention is described in further detail below with reference to the drawings and specific embodiments.

First, the generation of connected component sets in the invention is described.

To obtain the connected component sets of each level, the connected components must be clustered. Clustering proceeds in two steps. Before the second-level analysis, candidate connected component pairs are clustered. Then, before the third-level analysis, the connected component pairs are grouped into candidate connected component rows. The two clustering steps are described below.

Two connected components X_i and X_j that are adjacent and appear roughly side by side are marked as a candidate connected component pair if they satisfy the following two conditions:

dist(X_i,X_j)<2·max(max(w_i,h_i),max(w_j,h_j)) (1)

dist_y(X_i,X_j)<0.5·max(h_i,h_j) (2)

In formulas (1) and (2), dist(X_i,X_j) is the Euclidean distance between the centroids of the two connected components X_i and X_j, dist_y(X_i,X_j) is the vertical distance between their centroid coordinates, and (w_i,h_i) and (w_j,h_j) are the width and height of the bounding boxes of the two connected components.
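The two pairing conditions can be sketched in a few lines of Python. This is a minimal illustration, not the patent's implementation: the `Component` class, its field names, and the function name `is_candidate_pair` are assumptions introduced here; only the two inequalities come from formulas (1) and (2).

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical container for one connected component."""
    cx: float  # centroid x
    cy: float  # centroid y
    w: float   # bounding-box width
    h: float   # bounding-box height


def is_candidate_pair(a: Component, b: Component) -> bool:
    """Return True if a and b satisfy formulas (1) and (2)."""
    dist = math.hypot(a.cx - b.cx, a.cy - b.cy)   # Euclidean centroid distance
    dist_y = abs(a.cy - b.cy)                     # vertical centroid distance
    cond1 = dist < 2 * max(max(a.w, a.h), max(b.w, b.h))  # formula (1)
    cond2 = dist_y < 0.5 * max(a.h, b.h)                  # formula (2)
    return cond1 and cond2
```

For example, two 10x20 boxes whose centroids are 15 pixels apart horizontally and 2 pixels apart vertically pass both tests, while the same boxes offset vertically by 12 pixels fail formula (2).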

The tilt angle θ_ij of a text connected component pair (X_i,X_j) is defined as the inclination of the line through the centroids of X_i and X_j. The difference between the tilt angles of two text connected component pairs (X_i,X_j) and (X_j,X_k) must not exceed π/12, that is:

|θ_ij-θ_jk|≤π/12 (3)

By linking connected component pairs in this way, all connected components that lie on one straight line are joined together to form a connected component row.
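The row-forming step can be sketched as follows. The sketch makes several assumptions that the text does not state: centroids are ordered left to right before the angle is computed, the angle difference is folded so that near-vertical lines compare correctly, and pairs are merged with a small union-find. The function names are hypothetical; only the π/12 threshold of formula (3) and the requirement that two pairs share a component come from the description.

```python
import math
from itertools import combinations


def tilt_angle(p, q):
    """Inclination of the line through centroids p and q, in (-pi/2, pi/2]."""
    (x1, y1), (x2, y2) = sorted([p, q])  # order left to right so dx >= 0
    return math.atan2(y2 - y1, x2 - x1)


def group_pairs_into_rows(centroids, pairs, max_diff=math.pi / 12):
    """Merge pairs that share a component and satisfy formula (3).

    centroids: list of (x, y) tuples; pairs: list of (i, j) index tuples.
    Returns a sorted list of rows, each a sorted list of component indices.
    """
    parent = list(range(len(pairs)))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    for a, b in combinations(range(len(pairs)), 2):
        if not set(pairs[a]) & set(pairs[b]):
            continue  # the two pairs must share a component X_j
        ta = tilt_angle(centroids[pairs[a][0]], centroids[pairs[a][1]])
        tb = tilt_angle(centroids[pairs[b][0]], centroids[pairs[b][1]])
        diff = abs(ta - tb)
        diff = min(diff, math.pi - diff)  # assumption: lines are undirected
        if diff <= max_diff:              # formula (3)
            parent[find(a)] = find(b)

    rows = {}
    for idx, pair in enumerate(pairs):
        rows.setdefault(find(idx), set()).update(pair)
    return sorted(sorted(r) for r in rows.values())
```

With three nearly collinear centroids and a fourth directly above the third, the first two pairs merge into one row and the vertical pair stays separate.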

Next, the computation of text confidence for connected component sets is analyzed.

Assume that n connected components have been clustered a priori into a connected component row, forming a graph model G=(v,ε), where v denotes all nodes and ε denotes the edges between the nodes. The nodes form the random observation sequence X=[x₁,x₂,...,x_n], with the corresponding label sequence Y=[y₁,y₂,...,y_n]. The nodes satisfy the Markov property, because the clustering considers spatial neighbor relations. According to the definition of a conditional random field in the literature, when the sequence is labeled Y=Y^*, the conditional probability given the observation X is:

P(Y=Y^*|X)∝exp(-E(X,Y^*,C,Λ)) (4)

Here E(X,Y^*,C,Λ) is the energy function of the whole graph model, C denotes the cliques in the graph model, and Λ denotes the parameters of the energy function. The hierarchical cognition model has three kinds of cliques: single connected components, connected component pairs, and connected component rows. The sum of the energies of all cliques constitutes the overall energy function:

E(X,C)=Σ_{c∈C} V_c(X) (5)

Here V_c(X) is the energy of the cliques of a given kind. It remains to solve for the parameters Λ of the whole model and to infer the final labeling.

In general, the parameters of a conditional random field are estimated by maximizing the conditional log-likelihood, that is, by maximizing the conditional probability. Such methods typically optimize the parameters by maximizing the probability (equivalently, minimizing the energy function). If the clique set C contains multi-variable cliques, however, parameter optimization becomes an NP-hard problem and the parameters are difficult to solve. For text detection, the invention makes two assumptions about this problem. First, a set of text connected components is assumed to form a local association in the image and to have no relation with the non-text connected components in the image. Second, since only the case in which the connected component set is text matters (Y^*=1), every other labeling that may occur in the random field is treated as a non-text sequence (Y^*=0). It is therefore sufficient to estimate the text confidence P(Y^*=1|X) for the case Y^*=1, which is called consistency cognition. The usual approach of obtaining a labeling by minimizing the energy of the random field is thus replaced by computing the text confidence under one particular labeling, so that deciding the label of a connected component set becomes a binary classification problem. Positive and negative samples can then be built in a supervised learning framework to train the classifier parameters, and the classifier output is used directly to fit the overall energy function of the random field, as shown in Fig. 1. The energy of the cliques at each level can be obtained from the classifier output. From the classifier's point of view, the outputs of the earlier levels serve as strongly discriminative features for the classifiers at the later levels. In addition, the mature algorithms available for various classifiers help ensure the validity of the parameter learning results.
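The level-to-level propagation described above can be sketched as a small cascade. This is an illustration under stated assumptions, not the patented implementation: the classifiers `clf_u`, `clf_b`, and `clf_s` are hypothetical callables that map a feature list to a confidence in [0, 1], the energy is taken as one minus that confidence, and only the propagated energy features are shown (the other per-level features are left out for brevity).

```python
from statistics import mean


def energy(confidence):
    """Energy of a set labelled as text: 1 minus the classifier confidence."""
    return 1.0 - confidence


def cascade(component_feats, pairs, rows, clf_u, clf_b, clf_s):
    """Propagate energies through the three levels.

    component_feats: one feature list per single connected component.
    pairs: list of (i, j) component index tuples.
    rows: list of (component_indices, pair_indices) tuples.
    Returns (component energies, pair energies, row confidences).
    """
    # Level 1: single connected components.
    e_u = [energy(clf_u(f)) for f in component_feats]
    # Level 2: each pair receives the mean energy of its two members.
    e_b = [energy(clf_b([mean([e_u[i], e_u[j]])])) for i, j in pairs]
    # Level 3: each row receives the mean component and mean pair energies.
    row_conf = []
    for comp_idx, pair_idx in rows:
        f_str = [mean(e_u[i] for i in comp_idx), mean(e_b[k] for k in pair_idx)]
        row_conf.append(clf_s(f_str))
    return e_u, e_b, row_conf
```

In practice each callable would be a trained classifier whose probability output plays the role of the text confidence; any classifier that outputs a score in [0, 1] fits this interface.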

With the clustering rules and the parameter learning and cognitive inference of the model in place, features are designed for each level and the corresponding text confidence is computed.

1) Single connected component level: three types of features are used, namely geometric features f_g, stroke width features f_sw, and texture features f_t. The geometric features f_g include the aspect ratio, axis length ratio, occupancy ratio, and compactness of each connected component. The stroke width features f_sw are a stroke width ratio and a stroke width variance, designed on the basis of the computed stroke width of the connected component. The texture features f_t are the foreground color consistency and background color consistency computed over the local region of the connected component. A classifier is trained under supervision on these three kinds of features to learn the model parameters λ_u and estimate the text confidence F_u(·), which gives the energy E_u(·) of the connected component:

E_u(X,y_i=1,λ_u)=1-F_u([f_g(X),f_sw(X),f_t(X)],λ_u) (6)

2) Connected component pair level: two types of features are used, namely the average connected component energy feature f_up and the similarity features f_sa. The average connected component energy feature f_up is the mean of the energies of the single connected components obtained at the previous level. The similarity features f_sa are the height ratio, the stroke width ratio, and the foreground and background color differences of the two connected components. As before, a classifier is trained under supervision on these two kinds of features to learn the parameters λ_b and estimate the text confidence F_b(·) of the connected component pair, which gives the energy E_b(·) of the pair:

E_b(X,(y_i,y_j)=1,λ_b)=1-F_b([f_up(X),f_sa(X)],λ_b) (7)

3) Connected component row level: the features are the connected component row energy features f_str, the appearance difference features f_v, and the gradient histogram feature f_hog. The row energy features f_str comprise the mean energy of all single connected components and the mean energy of the connected component pairs. The appearance difference features f_v comprise the height variance, stroke width variance, and foreground color variance of the connected components. The gradient histogram feature has 24 dimensions in total, computed over 4 gradient directions and six local regions, and describes the local texture distribution of a text line. A classifier is trained under supervision to learn the parameters λ_s of this level and estimate the text confidence F_s(·) of the connected component row, which gives the energy E_s(·) of the row:

E_s(X,Y^*=1,λ_s)=1-F_s([f_str(X),f_v(X),f_hog(X)],λ_s) (8)
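The pair-level and row-level feature vectors described above can be assembled as in the sketch below. The exact feature definitions are not given in the text, so everything concrete here is an assumption: components are plain dictionaries with hypothetical keys `height`, `stroke_width`, `fg_gray`, and `bg_gray`, ratios are taken as min/max, color differences are absolute differences of gray levels, and variances are population variances. The 24-dimensional gradient histogram is omitted because its region layout is not specified.

```python
from statistics import mean, pvariance


def ratio(a, b):
    """Symmetric ratio in (0, 1]; 1 means the two values are equal."""
    return min(a, b) / max(a, b)


def pair_features(ci, cj, e_i, e_j):
    """Pair-level vector: [f_up] followed by the similarity features f_sa."""
    f_up = (e_i + e_j) / 2.0  # mean energy of the two single components
    f_sa = [
        ratio(ci["height"], cj["height"]),              # height ratio
        ratio(ci["stroke_width"], cj["stroke_width"]),  # stroke width ratio
        abs(ci["fg_gray"] - cj["fg_gray"]),             # foreground difference
        abs(ci["bg_gray"] - cj["bg_gray"]),             # background difference
    ]
    return [f_up] + f_sa


def row_features(members, e_components, e_pairs):
    """Row-level vector: f_str followed by f_v (gradient histogram omitted)."""
    f_str = [mean(e_components), mean(e_pairs)]
    f_v = [
        pvariance([m["height"] for m in members]),        # height variance
        pvariance([m["stroke_width"] for m in members]),  # stroke width variance
        pvariance([m["fg_gray"] for m in members]),       # foreground variance
    ]
    return f_str + f_v
```

The resulting vectors would be passed to the level's classifier, and one minus its output gives the energies in formulas (7) and (8).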

Text lines are finally detected through F_s(·). Experiments show that the hierarchical cognition model effectively improves the precision and recall of scene text detection by filtering out non-text connected components level by level. Localization results on the standard test set ICDAR2005 are compared in Table 1. The energy features passed between levels have also been shown to work very well: as Fig. 2 shows, the connected component pair energy feature and the connected component row energy feature effectively improve the classification ability of the hierarchical cognition model.

The foregoing merely illustrates the principles of the present invention.

Table 1. Comparison of text localization results on ICDAR2005

Claims

1. A progressive hierarchical cognition method for scene image text detection, characterized in that: drawing on the hierarchical nature of human cognition, and starting from the connected components obtained from a scene image, the method first uses the spatial adjacency and arrangement relations among connected components to build different connected component sets: single connected components, connected component pairs, and connected component rows; different features are then designed for each kind of connected component set, and the text confidence of each set is used as a feature of the sets at the following level; using a consistency cognition assumption on the connected component sets together with a conditional random field model, the classifier parameters of each level are learned under supervision, and the text confidence of the connected components is computed level by level; finally, text lines are localized; the method comprises the following steps:

Step 2: before the second-level analysis, consider any two connected components that are adjacent and appear roughly side by side, and cluster them two by two to form connected component pairs;

Step 4: before the third-level analysis, link all connected component pairs arranged along an approximately straight line to form connected component rows;

Step 5: in the third-level analysis, extract the appearance difference features and gradient histogram features of each connected component row, together with the mean energy feature of all single connected components and the mean energy feature of the connected component pairs, and use a classifier with supervised learning to finally determine the text lines.

2. The progressive hierarchical cognition method for scene image text detection according to claim 1, characterized in that: for single connected components, the designed features are appearance features, including geometric features, stroke width features, and texture features.

3. The progressive hierarchical cognition method for scene image text detection according to claim 1, characterized in that: for connected component pairs, the designed features are similarity features and the average connected component energy feature.

4. The progressive hierarchical cognition method for scene image text detection according to claim 1, characterized in that: for connected component rows, the designed features are appearance difference features, gradient histogram features, the mean energy feature of all single connected components, and the mean energy feature of the connected component pairs.