CN103413132B - A progressive hierarchical cognitive scene image text detection method - Google Patents

A progressive hierarchical cognitive scene image text detection method

Info

Publication number
CN103413132B
Authority
CN
China
Prior art keywords
connected component
feature
text
level
scene image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201310253437.1A
Other languages
Chinese (zh)
Other versions
CN103413132A (en)
Inventor
刘跃虎
周刚
苏远歧
翟少卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN201310253437.1A
Publication of CN103413132A
Application granted
Publication of CN103413132B
Legal status: Active
Anticipated expiration

Landscapes

  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A progressive hierarchical cognitive method for detecting text in scene images. Starting from connected components that have already been extracted, the method first uses the spatial adjacency and arrangement relations of the connected components to form different connected component sets: single connected components, connected component pairs, and connected component rows. Different features are then designed for each kind of set, and the text confidence of each set is used as a feature of the sets at the following level. Under a consistency-cognition assumption on the connected component sets and a conditional random field model, the classifier parameters of each level are learned by supervised learning, and the text confidence of the connected component sets is computed level by level. Finally, the text lines are localized. The invention integrates appearance features, low-order relations, and high-order relations, computes parameters and classes directly with classifier algorithms, and can effectively improve the recall and precision of scene image text detection.

Description

A progressive hierarchical cognitive scene image text detection method
Technical field
The present invention relates to the technical field of scene image text detection, and in particular to a progressive hierarchical cognitive scene image text detection method.
Background
Text detection uses the visual appearance features of characters to locate text regions in an image, providing strong support for subsequent text recognition. As a key technology in text information extraction, text detection has long been an active research problem in computer vision. Text, however, is a special kind of visual target: its size, font, color, and language are uncertain, and the large amount of complex background in natural scene images is easily confused with text. Together these make text regions in scene images hard to detect. The final step of existing connected-component-based text detection methods separates text connected components from non-text connected components according to their differences. But text connected components vary in appearance, and non-text connected components can look similar to text, which makes this separation difficult.
Combining the appearance features of connected components with context is therefore one class of technical strategy. Pan's method uses nearest-neighbor binary relations as context together with appearance features (Pan YF, Hou XW, Liu CL. A Hybrid Approach to Detect and Localize Texts in Natural Scene Images [J]. IEEE Transactions on Image Processing, 2011, 20(3): 800-813). Yi's method and Yao's method analyze text line features through the high-order relations that text connected components form in space (Yi's method: Chucai Y, YingLi T. Text string detection from natural scenes by structure-based partition and grouping [J]. IEEE Transactions on Image Processing, 2011, 20(9): 2594-2605; Yao's method: Cong Y, Xiang B, Wenyu L, et al. Detecting texts of arbitrary orientations in natural images [C], 2012: 1083-1090). However, there is still no corresponding theoretical model that integrates appearance features, low-order relations, and high-order relations. This makes both feature design and parameter learning difficult and limits the generality of the models.
Summary of the invention
To solve the problems of the prior art described above, the object of the present invention is to provide a progressive hierarchical cognitive scene image text detection method for visual intelligence systems such as vehicle-mounted visual navigation and scene image semantic analysis. In connected component analysis, it effectively improves both precision and recall compared with existing methods.
To achieve the above object, the present invention adopts the following technical solution:
A progressive hierarchical cognitive scene image text detection method draws on the hierarchical character of human cognition. Starting from the connected components obtained from a scene image, it first uses the spatial adjacency and arrangement relations of the connected components to form different connected component sets: single connected components, connected component pairs, and connected component rows. Different features are then designed for each kind of set, and the text confidence of each set is used as a feature of the sets at the following level. Under a consistency-cognition assumption on the connected component sets and a conditional random field model, the classifier parameters of each level are learned by supervised learning, and the text confidence of the connected component sets is computed level by level. Finally, the text lines are localized. The method specifically comprises the following steps:
Step 1: In the first-level analysis, extract the appearance features of each single connected component, and use a classifier trained by supervised learning to estimate the text confidence of the single connected component;
Step 2: Before the second-level analysis, cluster the candidate single connected components pairwise, according to their spatial position relations, to form connected component pairs;
Step 3: In the second-level analysis, extract the similarity features and the average connected component energy feature of each connected component pair, and use a classifier trained by supervised learning to estimate the text confidence of the pair;
Step 4: Before the third-level analysis, form connected component rows from the candidate connected component pairs according to their association and arrangement relations;
Step 5: In the third-level analysis, extract for each connected component row the appearance difference features, the gradient histogram features, the mean energy feature of all its single connected components, and the mean energy feature of its connected component pairs, and use a classifier trained by supervised learning to localize the text lines.
For single connected components, the designed features are appearance features, comprising geometric features, stroke width features, and texture features.
For connected component pairs, the designed features are the similarity features and the average connected component energy feature.
For connected component rows, the designed features are the appearance difference features, the gradient histogram features, the mean energy feature of all single connected components, and the mean energy feature of the connected component pairs.
The present invention differs from the prior art (its points of innovation) as follows:
1) The invention draws on the hierarchical character of human cognition: features are designed for each of three levels of objects, analysis results are propagated between levels, and non-text connected components are filtered out progressively. The invention takes the classifier output as the text confidence of a connected component set and propagates it between levels, which effectively improves the recall and precision of scene image text detection;
2) Regarding model parameter estimation and class inference, the invention takes appearance features, low-order relations, and high-order relations into account and computes parameters and classes directly with classifier algorithms, whereas existing methods have difficulty estimating parameters and inferring classes under high-order relations.
Brief description of the drawings
Fig. 1 shows the parameter learning and cognitive inference process of the hierarchical cognitive model.
Fig. 2 shows the energy feature analysis and comparison curves for the hierarchical cognitive model: Fig. 2A compares the classification results of three different feature sets at the second level; Fig. 2B compares the classification results of three different feature sets at the third level.
Detailed description of the invention
The present invention is described in further detail below with reference to the drawings and specific embodiments.
First, the generation of the connected component sets is described.
To obtain the connected component sets of each level, the connected components are clustered. Clustering is done in two steps. Before the second-level analysis, candidate connected component pairs are clustered. Then, before the third-level analysis, the connected component pairs are formed into candidate connected component rows. The two clustering steps are described below.
Two connected components $X_i$ and $X_j$ that are adjacent and appear approximately in parallel are marked as a candidate connected component pair if they satisfy the following two conditions:
$$\mathrm{dist}(X_i, X_j) < 2 \cdot \max(\max(w_i, h_i), \max(w_j, h_j)) \tag{1}$$
$$\mathrm{dist}_y(X_i, X_j) < 0.5 \cdot \max(h_i, h_j) \tag{2}$$
In formulas (1) and (2), $\mathrm{dist}(X_i, X_j)$ is the Euclidean distance between the centroids of the two connected components $X_i$ and $X_j$, $\mathrm{dist}_y(X_i, X_j)$ is the vertical distance between their centroids, and $(w_i, h_i)$ and $(w_j, h_j)$ are the widths and heights of the bounding boxes of the two connected components.
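Conditions (1) and (2) can be checked with a few lines of code. The following is a minimal sketch, not the patent's implementation; the `Component` class and its field names are hypothetical, introduced only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical container for one connected component."""
    cx: float  # centroid x
    cy: float  # centroid y
    w: float   # bounding-box width
    h: float   # bounding-box height


def is_candidate_pair(a: Component, b: Component) -> bool:
    """Return True if a and b satisfy conditions (1) and (2)."""
    dist = math.hypot(a.cx - b.cx, a.cy - b.cy)  # Euclidean centroid distance
    dist_y = abs(a.cy - b.cy)                    # vertical centroid distance
    cond1 = dist < 2 * max(max(a.w, a.h), max(b.w, b.h))
    cond2 = dist_y < 0.5 * max(a.h, b.h)
    return cond1 and cond2
```

For example, two 10x20 components whose centroids are 15 pixels apart horizontally and 2 pixels apart vertically pass both tests, while the same components offset vertically by 15 pixels fail condition (2).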
The inclination angle $\theta_{ij}$ of a text connected component pair $(X_i, X_j)$ is defined as the inclination angle of the line through the centroids of $X_i$ and $X_j$. The difference between the inclination angles of two text connected component pairs $(X_i, X_j)$ and $(X_j, X_k)$ must not exceed $\pi/12$, that is:
$$|\theta_{ij} - \theta_{jk}| \le \pi/12 \tag{3}$$
By linking such pairs two by two, all connected components that lie on one straight line are joined together to form a connected component row.
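The linking step can be sketched as follows. This is an illustrative sketch under stated assumptions: pairs that share one component and satisfy condition (3) are merged transitively with a union-find structure, which the patent does not prescribe, and the function names are hypothetical. Angles are treated as unsigned line orientations in $[0, \pi)$.

```python
import math


def inclination(p, q):
    """Inclination of the line through centroids p and q, in [0, pi)."""
    return math.atan2(q[1] - p[1], q[0] - p[0]) % math.pi


def angle_diff(t1, t2):
    """Smallest difference between two line orientations."""
    d = abs(t1 - t2)
    return min(d, math.pi - d)


def link_pairs_into_rows(centroids, pairs, max_diff=math.pi / 12):
    """Group pairs (i, j) into rows; returns sorted lists of component indices."""
    parent = list(range(len(pairs)))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    thetas = [inclination(centroids[i], centroids[j]) for i, j in pairs]
    for a in range(len(pairs)):
        for b in range(a + 1, len(pairs)):
            shares_one = len(set(pairs[a]) & set(pairs[b])) == 1
            if shares_one and angle_diff(thetas[a], thetas[b]) <= max_diff:
                parent[find(a)] = find(b)  # condition (3) holds: same row

    rows = {}
    for idx, (i, j) in enumerate(pairs):
        rows.setdefault(find(idx), set()).update((i, j))
    return sorted(sorted(r) for r in rows.values())
```

With centroids `[(0, 0), (10, 1), (20, 0), (20, 30)]` and pairs `[(0, 1), (1, 2), (2, 3)]`, the first two pairs are nearly collinear and merge into one row, while the near-vertical pair `(2, 3)` stays separate.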
Next, the computation of the text confidence of connected component sets is analyzed.
Assume that $n$ connected components form a connected component row through the prior clustering, thereby constituting a graph model $G = (v, \varepsilon)$, where $\varepsilon$ denotes the edges between the nodes and $v$ denotes all the nodes. These nodes constitute the whole random sequence observation $X = [x_1, x_2, \ldots, x_n]$, and the corresponding labeling of the random sequence is $Y = [y_1, y_2, \ldots, y_n]$. The nodes satisfy the Markov property (the clustering considers spatial neighbor relations). Then, following the definition of the conditional random field in the literature, when the sequence is labeled $Y = Y^*$, the conditional probability given the observation $X$ is:
$$P(Y = Y^* \mid X) \propto \exp(-E(X, Y^*, C, \Lambda)) \tag{4}$$
where $E(X, Y^*, C, \Lambda)$ is the energy function of the whole graph model, $C$ denotes the cliques in the graph model, and $\Lambda$ denotes the parameters of the energy function. The hierarchical cognitive model has three kinds of cliques: single connected components, connected component pairs, and connected component rows. The sum of the energies of all the cliques constitutes the whole energy function:
$$E(X, C) = \sum_{c \in C} V_c(X) \tag{5}$$
where $V_c(X)$ denotes the energy of a given clique. It remains to solve for the parameters $\Lambda$ of the whole model and to infer the final labeling.
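As a small numerical illustration of equations (4) and (5), the sketch below sums clique energies and converts the total to an unnormalized probability. The function names and the example energies are hypothetical; the normalizing constant of the conditional random field is deliberately left out.

```python
import math


def total_energy(clique_energies):
    """Equation (5): sum the energies V_c(X) over all cliques c in C."""
    return sum(clique_energies)


def unnormalized_probability(energy):
    """Equation (4): P(Y = Y* | X) is proportional to exp(-E)."""
    return math.exp(-energy)


# Hypothetical clique energies: two single components, one pair, one row.
energies = [0.1, 0.2, 0.15, 0.15]
score = unnormalized_probability(total_energy(energies))
```

A labeling with lower total energy receives a higher score, which is why maximizing the probability and minimizing the energy are equivalent.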
In general, the parameters of a conditional random field are estimated by maximizing the conditional log-likelihood, that is, by maximizing the conditional probability (equivalently, minimizing the energy function). If the clique set $C$ contains multi-variable cliques, the parameter optimization becomes an NP-hard problem and the parameters are hard to solve. For text detection, the present invention makes two assumptions about this problem. First, a text connected component set is considered to form a local association in the image as a whole and to have no relation with the non-text connected components in the image. Second, since only the case in which the connected component set is text (i.e. $Y^* = 1$) is of interest, all other cases that may occur in the random field are considered non-text sequences (i.e. $Y^* = 0$). Therefore only the text confidence $P(Y^* = 1 \mid X)$ for the case $Y^* = 1$ needs to be estimated; this is called consistency cognition. The usual approach of obtaining a labeling by minimizing the energy of the random field is thus replaced by solving for the text confidence under one particular labeling, so that deciding the labeling of a connected component set becomes a binary classification problem. Positive and negative samples can then be constructed under a supervised learning mechanism to train the classifier parameters, and the classifier output is used directly to fit the overall energy function of the random field, as shown in Fig. 1. The energy of the cliques at every level can be obtained from the classifier output. From the classifier's point of view, the outputs of the earlier levels serve as features with strong discriminative power for the classifiers at the later levels. The mature algorithms available for various classifiers also help ensure the validity of the parameter learning results.
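The level-by-level propagation described above can be sketched as a small cascade. This is a sketch under assumptions, not the patented implementation: the classifiers are passed in as plain callables returning a confidence in [0, 1], the rejection threshold of 0.5 is hypothetical, and the pairs and rows are assumed to be clustered in advance. Energies follow the form "one minus confidence" used in equations (6) to (8).

```python
from statistics import mean


def energy(confidence):
    """Clique energy fitted by the classifier output: E = 1 - F."""
    return 1.0 - confidence


def hierarchical_confidence(unit_feats, pair_feats, rows,
                            clf_u, clf_b, clf_s, threshold=0.5):
    """unit_feats: {i: features}; pair_feats: {(i, j): features};
    rows: list of (member_ids, features). Returns [(member_ids, confidence)]."""
    # Level 1: confidence of every single connected component.
    conf_u = {i: clf_u(f) for i, f in unit_feats.items()}
    keep_u = {i for i, c in conf_u.items() if c >= threshold}

    # Level 2: only pairs whose members both survived level 1.
    conf_b = {}
    for (i, j), f in pair_feats.items():
        if i in keep_u and j in keep_u:
            f_up = mean([energy(conf_u[i]), energy(conf_u[j])])
            conf_b[(i, j)] = clf_b([f_up] + f)
    keep_b = {p for p, c in conf_b.items() if c >= threshold}

    # Level 3: rows that still contain at least one surviving pair.
    results = []
    for members, f in rows:
        row_pairs = [p for p in keep_b if p[0] in members and p[1] in members]
        if not row_pairs:
            continue
        f_str = [mean(energy(conf_u[i]) for i in members),
                 mean(energy(conf_b[p]) for p in row_pairs)]
        results.append((members, clf_s(f_str + f)))
    return results
```

Any trained binary classifier that outputs a probability could stand in for `clf_u`, `clf_b`, and `clf_s`; the point of the sketch is only that each level consumes the energies produced by the level before it.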
Given the clustering rules and the model's parameter learning and cognitive inference, features are designed for each level and the corresponding text confidences are computed.
1) Single connected component level: three types of features are used, namely geometric features $f_g$, stroke width features $f_{sw}$, and texture features $f_t$. The geometric features $f_g$ include the aspect ratio, axis length ratio, occupancy ratio, and compactness of each connected component. The stroke width features $f_{sw}$ are a stroke width ratio and a stroke width variance, designed on the basis of the computed stroke width of the connected component. The texture features $f_t$ are the foreground color consistency and the background color consistency computed over the local region of the connected component. With these three kinds of features, a classifier learns the model parameters $\lambda_u$ by supervised learning and estimates the text confidence $F_u(\cdot)$, which gives the energy $E_u(\cdot)$ of the connected component:
$$E_u(X, y_i = 1, \lambda_u) = 1 - F_u([f_g(X), f_{sw}(X), f_t(X)], \lambda_u) \tag{6}$$
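Some of the geometric features can be computed directly from the pixel set of a component. The sketch below is illustrative only: the patent does not give formulas, so the definitions used here (occupancy as area over bounding-box area, compactness as area over squared boundary-pixel count) are assumptions, and the axis length ratio is omitted.

```python
def geometric_features(pixels):
    """pixels: set of (x, y) integer coordinates of one connected component."""
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    w = max(xs) - min(xs) + 1  # bounding-box width
    h = max(ys) - min(ys) + 1  # bounding-box height
    area = len(pixels)
    # Boundary pixels: at least one 4-neighbour lies outside the component.
    neighbours = ((1, 0), (-1, 0), (0, 1), (0, -1))
    perimeter = sum(
        1 for (x, y) in pixels
        if any((x + dx, y + dy) not in pixels for dx, dy in neighbours)
    )
    return {
        "aspect_ratio": w / h,
        "occupancy": area / (w * h),           # assumed definition
        "compactness": area / perimeter ** 2,  # assumed definition
    }
```

For a solid 4x2 rectangle, every pixel is a boundary pixel, so the aspect ratio is 2.0, the occupancy is 1.0, and the compactness is 8 / 64 = 0.125.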
2) Connected component pair level: two types of features are used, namely the average connected component energy feature $f_{up}$ and the similarity features $f_{sa}$. The average connected component energy feature $f_{up}$ is the mean of the energies of the single connected components obtained at the previous level. The similarity features $f_{sa}$ are the height ratio, the stroke width ratio, and the foreground and background color differences of the two connected components. With these two kinds of features, a classifier likewise learns the parameters $\lambda_b$ by supervised learning and estimates the text confidence $F_b(\cdot)$ of the connected component pair, which gives the energy $E_b(\cdot)$ of the pair:
$$E_b(X, (y_i, y_j) = 1, \lambda_b) = 1 - F_b([f_{up}(X), f_{sa}(X)], \lambda_b) \tag{7}$$
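The pair-level feature vector can be assembled as in the following sketch. It assumes each component is a dictionary with hypothetical keys (`height`, `stroke_width`, `fg_color`, `bg_color`, `energy`); expressing ratios as min/max and color differences as Euclidean distances are illustrative choices that the patent does not specify.

```python
import math
from statistics import mean


def ratio(a, b):
    """Symmetric ratio in (0, 1]; 1 means identical values."""
    return min(a, b) / max(a, b)


def pair_features(c1, c2):
    """Return [f_up, height ratio, stroke width ratio, fg diff, bg diff]."""
    f_up = mean([c1["energy"], c2["energy"]])  # mean level-1 energy
    f_sa = [
        ratio(c1["height"], c2["height"]),
        ratio(c1["stroke_width"], c2["stroke_width"]),
        math.dist(c1["fg_color"], c2["fg_color"]),  # foreground color difference
        math.dist(c1["bg_color"], c2["bg_color"]),  # background color difference
    ]
    return [f_up] + f_sa
```

The resulting list is what a pair-level classifier would receive as input when estimating $F_b$.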
3) Connected component row level: the features are the connected component row energy feature $f_{str}$, the appearance difference features $f_v$, and the gradient histogram features $f_{hog}$. The row energy feature $f_{str}$ comprises the mean energy of all the single connected components and the mean energy of the connected component pairs. The appearance difference features $f_v$ comprise the height variance, the stroke width variance, and the foreground color variance of the connected components. The gradient histogram features are computed over 4 gradient directions and 6 local regions, giving 24 dimensions in total, and describe the local texture distribution of a text line. A classifier learns the parameters $\lambda_s$ of this level by supervised learning and estimates the text confidence $F_s(\cdot)$ of the connected component row, which gives the energy $E_s(\cdot)$ of the row:
$$E_s(X, Y^* = 1, \lambda_s) = 1 - F_s([f_{str}(X), f_v(X), f_{hog}(X)], \lambda_s) \tag{8}$$
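The row-level features can be sketched as below. This is a sketch under assumptions: the six local regions are taken to be equal-width vertical strips of the row image, gradients are central differences, and orientations are unsigned; the patent states only the 4 directions and 6 regions. All function and parameter names are hypothetical.

```python
import math
from statistics import mean, pvariance


def row_scalar_features(unit_energies, pair_energies,
                        heights, stroke_widths, fg_gray):
    """Return f_str (two means) followed by f_v (three variances)."""
    f_str = [mean(unit_energies), mean(pair_energies)]
    f_v = [pvariance(heights), pvariance(stroke_widths), pvariance(fg_gray)]
    return f_str + f_v


def gradient_histogram(img, n_regions=6, n_bins=4):
    """img: 2D list of gray values. Returns n_regions * n_bins values."""
    h, w = len(img), len(img[0])
    hist = [[0.0] * n_bins for _ in range(n_regions)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]
            gy = img[y + 1][x] - img[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi  # unsigned orientation
            b = min(int(angle / math.pi * n_bins), n_bins - 1)
            r = min(x * n_regions // w, n_regions - 1)  # vertical strip index
            hist[r][b] += mag
    flat = [v for region in hist for v in region]
    total = sum(flat)
    return [v / total for v in flat] if total > 0 else flat
```

Concatenating the outputs of the two functions gives a 29-dimensional input (2 + 3 + 24) for a row-level classifier under these assumptions.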
Text lines are finally detected through $F_s(\cdot)$. Experiments show that the hierarchical cognitive model, by filtering out non-text connected components level by level, effectively improves the precision and recall of scene text detection. The localization results on the standard test set ICDAR2005 are compared in Table 1. The energy features passed between the levels are also shown to be very effective: as Fig. 2 shows, the connected component pair energy feature and the connected component row energy feature effectively improve the classification ability of the hierarchical cognitive model.
The foregoing merely illustrates the principles of the present invention.
Table 1. Comparison of text localization results on ICDAR2005

Claims (4)

1. A progressive hierarchical cognitive scene image text detection method, characterized in that: drawing on the hierarchical character of human cognition and starting from the connected components obtained from a scene image, the method first uses the spatial adjacency and arrangement relations of the connected components to form different connected component sets: single connected components, connected component pairs, and connected component rows; different features are then designed for each kind of set, and the text confidence of each set is used as a feature of the sets at the following level; under a consistency-cognition assumption on the connected component sets and a conditional random field model, the classifier parameters of each level are learned by supervised learning, and the text confidence of the connected component sets is computed level by level; finally, the text lines are localized; the method specifically comprises the following steps:
Step 1: in the first-level analysis, extract the appearance features of each single connected component, and use a classifier trained by supervised learning to estimate the text confidence of the single connected component;
Step 2: before the second-level analysis, consider any two connected components that are adjacent and appear approximately in parallel, and cluster them pairwise to form connected component pairs;
Step 3: in the second-level analysis, extract the similarity features and the average connected component energy feature of each connected component pair, and use a classifier trained by supervised learning to estimate the text confidence of the pair;
Step 4: before the third-level analysis, connect all connected component pairs arranged approximately along a straight line to form connected component rows;
Step 5: in the third-level analysis, extract the appearance difference features and gradient histogram features of each connected component row, together with the mean energy of all its single connected components and the mean energy of its connected component pairs, and use a classifier trained by supervised learning to finally determine the text lines.
2. The progressive hierarchical cognitive scene image text detection method according to claim 1, characterized in that: for single connected components, the designed features are appearance features, comprising geometric features, stroke width features, and texture features.
3. The progressive hierarchical cognitive scene image text detection method according to claim 1, characterized in that: for connected component pairs, the designed features are the similarity features and the average connected component energy feature.
4. The progressive hierarchical cognitive scene image text detection method according to claim 1, characterized in that: for connected component rows, the designed features are the appearance difference features, the gradient histogram features, the mean energy of all single connected components, and the mean energy of the connected component pairs.
CN201310253437.1A 2013-06-24 2013-06-24 A progressive hierarchical cognitive scene image text detection method Active CN103413132B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310253437.1A CN103413132B (en) 2013-06-24 2013-06-24 A progressive hierarchical cognitive scene image text detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310253437.1A CN103413132B (en) 2013-06-24 2013-06-24 A progressive hierarchical cognitive scene image text detection method

Publications (2)

Publication Number Publication Date
CN103413132A CN103413132A (en) 2013-11-27
CN103413132B true CN103413132B (en) 2016-11-09

Family

ID=49606139

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310253437.1A Active CN103413132B (en) 2013-06-24 2013-06-24 A progressive hierarchical cognitive scene image text detection method

Country Status (1)

Country Link
CN (1) CN103413132B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229349B (en) * 2017-12-21 2020-09-01 中国科学院自动化研究所 Reticulate pattern human face image recognition device

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093228A (en) * 2013-01-17 2013-05-08 上海交通大学 Chinese detection method in natural scene image based on connected domain

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8649600B2 (en) * 2009-07-10 2014-02-11 Palo Alto Research Center Incorporated System and method for segmenting text lines in documents

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103093228A (en) * 2013-01-17 2013-05-08 上海交通大学 Chinese detection method in natural scene image based on connected domain

Also Published As

Publication number Publication date
CN103413132A (en) 2013-11-27

Similar Documents

Publication Publication Date Title
Nurunnabi et al. Outlier detection and robust normal-curvature estimation in mobile laser scanning 3D point cloud data
CN111612008B (en) Image segmentation method based on convolution network
CN110175576A (en) A kind of driving vehicle visible detection method of combination laser point cloud data
CN107025440A (en) A kind of remote sensing images method for extracting roads based on new convolutional neural networks
CN109871875B (en) Building change detection method based on deep learning
CN107633226B (en) Human body motion tracking feature processing method
CN102073841B (en) Poor video detection method and device
CN109934224B (en) Small target detection method based on Markov random field and visual contrast mechanism
CN103632146B (en) A kind of based on head and shoulder away from human body detecting method
JP2016018538A (en) Image recognition device and method and program
CN105894047A (en) Human face classification system based on three-dimensional data
CN107563349A (en) A kind of Population size estimation method based on VGGNet
CN102663723B (en) Image segmentation method based on color sample and electric field model
CN106023257A (en) Target tracking method based on rotor UAV platform
Mei et al. Scene-adaptive off-road detection using a monocular camera
CN107369158A (en) The estimation of indoor scene layout and target area extracting method based on RGB D images
CN106570490A (en) Pedestrian real-time tracking method based on fast clustering
Lee et al. Integrating multiple character proposals for robust scene text extraction
Yin et al. Spherical coordinates based methods of ground extraction and objects segmentation using 3-D LiDAR sensor
CN104951793A (en) STDF (standard test data format) feature based human behavior recognition algorithm
CN105809678B (en) A kind of line segment feature global registration method between two views under short base line condition
Zhang et al. Adaptive dense pyramid network for object detection in UAV imagery
Li et al. An aerial image segmentation approach based on enhanced multi-scale convolutional neural network
CN103413132B (en) A progressive hierarchical cognitive scene image text detection method
Yen et al. Ninepins: Nuclei instance segmentation with point annotations

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant