CN103413132A - Progressive level cognitive scene image text detection method - Google Patents

Info

Publication number
CN103413132A
Authority
CN
China
Prior art keywords
composition
communicated
feature
text
scene image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013102534371A
Other languages
Chinese (zh)
Other versions
CN103413132B (en)
Inventor
刘跃虎
周刚
苏远歧
翟少卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University filed Critical Xian Jiaotong University
Priority to CN201310253437.1A priority Critical patent/CN103413132B/en
Publication of CN103413132A publication Critical patent/CN103413132A/en
Application granted granted Critical
Publication of CN103413132B publication Critical patent/CN103413132B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a progressive hierarchical cognitive method for detecting text in scene images. Starting from the connected components extracted from a scene image, the method first groups them, according to their spatial adjacency and arrangement, into three kinds of connected component sets: single connected components, connected component pairs, and connected component rows. Different features are designed for each kind of set, and the text confidence of each set is used as a feature of the subsequent sets. The classifier parameters of each level are learned under supervision through a consistency cognition hypothesis on the connected component sets and a conditional random field model, the text confidences of the connected components are calculated level by level, and the text lines are finally located. The method integrates appearance features, low-order relations, and high-order relations, directly calculates parameters and categories through a classifier algorithm, and can effectively improve the recall and precision of scene image text detection.

Description

A progressive hierarchical cognitive method for scene image text detection
Technical field
The present invention relates to the technical field of scene image text detection, and specifically to a progressive hierarchical cognitive method for scene image text detection.
Background art
Text detection uses the visual appearance features of characters to locate text regions in an image, providing strong support for subsequent text recognition. As a key technology in text information extraction, text detection has long been a hot research topic in computer vision. Text is, however, a special kind of visual target: its size, font, color, and language are uncertain, and the many complex backgrounds in natural scene images are easily confused with text, which makes text regions in scene images hard to detect. The main step of existing connected-component-based text detection methods is to tell text connected components from non-text connected components by their differences. But text connected components vary in appearance and can resemble non-text connected components, which makes this discrimination difficult.
Combining the appearance features of connected components with context is therefore a new class of technical approach. The Pan method uses the context of neighboring binary relations together with appearance features (Pan YF, Hou XW, Liu CL. A Hybrid Approach to Detect and Localize Texts in Natural Scene Images [J]. IEEE Transactions on Image Processing, 2011, 20(3): 800-813). The Yi method and the Yao method analyze text line features from the high-order relations that text connected components form in space (Yi method: Chucai Y, YingLi T. Text string detection from natural scenes by structure-based partition and grouping [J]. IEEE Transactions on Image Processing, 2011, 20(9): 2594-2605; Yao method: Cong Y, Xiang B, Wenyu L, et al. Detecting texts of arbitrary orientations in natural images [C], 2012: 1083-1090). However, there is still no theoretical model that integrates appearance features, low-order relations, and high-order relations, so feature design and parameter learning both remain difficult and the models lack generality.
Summary of the invention
To solve the problems of the prior art described above, the object of the present invention is to provide a progressive hierarchical cognitive method for scene image text detection, for use in visual intelligence systems such as in-vehicle visual navigation and scene image semantic analysis. Compared with existing methods, it effectively improves both precision and recall in connected component analysis.
To achieve the above object, the present invention adopts the following technical solution:
A progressive hierarchical cognitive method for scene image text detection draws on the hierarchical nature of human cognition. Starting from the connected components obtained from a scene image, the method first uses the spatial adjacency and arrangement of the connected components to form different connected component sets: single connected components, connected component pairs, and connected component rows. Different features are then designed for the different sets, and the text confidence of each set is used as a feature of the subsequent sets. The classifier parameters of each level are learned under supervision through the consistency cognition hypothesis on the connected component sets and a conditional random field model, and the text confidences of the connected components are calculated level by level. Finally, the text lines are located. The method comprises the following steps:
Step 1: In the first-level analysis, extract the appearance features of each single connected component, and use a classifier trained by supervised learning to estimate the text confidence of the single connected component;
Step 2: Before the second-level analysis, cluster the candidate single connected components two by two into connected component pairs according to their spatial relations;
Step 3: In the second-level analysis, extract the similarity feature and the mean connected component energy feature of each connected component pair, and use a classifier trained by supervised learning to estimate the text confidence of the pair;
Step 4: Before the third-level analysis, link the candidate connected component pairs into connected component rows according to their connection and arrangement relations;
Step 5: In the third-level analysis, extract the appearance difference feature and the gradient histogram feature of each connected component row, together with the mean energy of all its single connected components and the mean energy of its connected component pairs, and use a classifier trained by supervised learning to locate the text lines.
For single connected components, the designed features are appearance features, comprising geometric features, stroke width features, and texture features.
For connected component pairs, the designed features are the similarity feature and the mean connected component energy feature.
For connected component rows, the designed features are the appearance difference feature, the gradient histogram feature, the mean energy of all single connected components, and the mean energy of the connected component pairs.
The present invention differs from the prior art (its innovations) as follows:
1) The present invention follows the hierarchical nature of human cognition. It designs corresponding features for objects at three levels, one level at a time, propagates the analysis results between levels, and progressively filters out non-text connected components. Classifier outputs are propagated between levels as the text confidence of the connected component sets, which effectively improves the recall and precision of scene image text detection;
2) For model parameter estimation and category inference, the present invention takes appearance features, low-order relations, and high-order relations into account and can calculate parameters and categories directly through a classifier algorithm, whereas existing methods have difficulty estimating parameters and inferring categories when high-order relations are present.
Description of the drawings
Fig. 1 shows the parameter learning and cognitive inference process of the hierarchical cognitive model.
Fig. 2 shows the analysis and comparison curves for the energy features of the hierarchical cognitive model: Fig. 2A compares the classification results of three different feature sets at the second level; Fig. 2B compares the classification results of three different feature sets at the third level.
Detailed description
The present invention is described in further detail below with reference to the drawings and specific embodiments.
The generation of the connected component sets is described first.
To obtain the connected component sets of each level, the connected components must be clustered. Clustering takes place in two steps: before the second-level analysis, candidate connected component pairs are formed; before the third-level analysis, the connected component pairs are linked into candidate connected component rows. The two clustering steps are described below.
Two connected components X_i and X_j that are adjacent and roughly parallel are marked as a candidate connected component pair if they satisfy the following two conditions:
\mathrm{dist}(X_i, X_j) < 2 \cdot \max(\max(w_i, h_i), \max(w_j, h_j)) \quad (1)
\mathrm{dist}_y(X_i, X_j) < 0.5 \cdot \max(h_i, h_j) \quad (2)
In formulas (1) and (2), dist(X_i, X_j) is the Euclidean distance between the centroids of X_i and X_j, dist_y(X_i, X_j) is the distance between the two centroids in the vertical direction, and (w_i, h_i) and (w_j, h_j) are the widths and heights of the bounding boxes of the two connected components.
The inclination angle θ_ij of a text connected component pair (X_i, X_j) is defined as the inclination of the line through the centroids of X_i and X_j. The inclination angles of two pairs (X_i, X_j) and (X_j, X_k) may differ by no more than π/12, that is:
|\theta_{ij} - \theta_{jk}| \le \pi/12 \quad (3)
Linking the pairs two by two in this way joins all connected components that lie on one straight line into a connected component row.
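The two clustering steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the `Component` record, the function names, and the union-find grouping of pairs are all hypothetical choices made here; only conditions (1) to (3) come from the text.

```python
import math
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Component:
    """Hypothetical record: centroid and bounding-box size of one connected component."""
    cx: float
    cy: float
    w: float
    h: float


def is_candidate_pair(a: Component, b: Component) -> bool:
    """Conditions (1) and (2): centroids close together and roughly level."""
    dist = math.hypot(a.cx - b.cx, a.cy - b.cy)
    dist_y = abs(a.cy - b.cy)
    return (dist < 2 * max(max(a.w, a.h), max(b.w, b.h))
            and dist_y < 0.5 * max(a.h, b.h))


def pair_angle(a: Component, b: Component) -> float:
    """Inclination of the line through both centroids, in [-pi/2, pi/2]."""
    left, right = (a, b) if a.cx <= b.cx else (b, a)
    return math.atan2(right.cy - left.cy, right.cx - left.cx)


def group_rows(components):
    """Link pairs that share a component and satisfy condition (3) into rows."""
    pairs = [(i, j) for i, j in combinations(range(len(components)), 2)
             if is_candidate_pair(components[i], components[j])]
    angles = [pair_angle(components[i], components[j]) for i, j in pairs]
    parent = list(range(len(pairs)))  # union-find over pairs (an assumption)

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p, q in combinations(range(len(pairs)), 2):
        shares_component = bool(set(pairs[p]) & set(pairs[q]))
        if shares_component and abs(angles[p] - angles[q]) <= math.pi / 12:
            parent[find(p)] = find(q)

    rows = {}
    for p, (i, j) in enumerate(pairs):
        rows.setdefault(find(p), set()).update((i, j))
    return sorted(sorted(r) for r in rows.values())
```

Components that belong to no candidate pair never enter a row in this sketch, so they are dropped before the third level.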
The calculation of the text confidence of the connected component sets is analyzed next.
Suppose n connected components have been clustered into a connected component row using the prior rules above. They form a graph model G = (v, ε), where v denotes all the nodes and ε denotes the edges between them. The nodes make up the observed random sequence X = [x_1, x_2, ..., x_n], with corresponding labeling Y = [y_1, y_2, ..., y_n]. The nodes satisfy the Markov property (the clustering takes spatial neighborhood relations into account). According to the definition of the conditional random field in the literature, when the sequence is labeled Y = Y^*, the conditional probability given the observation X is:
P(Y = Y^* \mid X) \propto \exp(-E(X, Y^*, C, \Lambda)) \quad (4)
where E(X, Y^*, C, Λ) is the energy function of the whole graph model, C denotes the cliques of the graph model, and Λ denotes the parameters of the energy function. The hierarchical cognitive model has three kinds of cliques: single connected components, connected component pairs, and connected component rows. The sum of the energies of all cliques forms the overall energy function:
E(X, C) = \sum_{c \in C} V_c(X) \quad (5)
where V_c(X) denotes the energy contributed by clique c. The parameters Λ of the whole model must then be solved for, and the final labeling inferred.
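Read together with the three kinds of cliques named above, the sum in equation (5) can be split by clique type. The expansion below is an illustrative reading rather than a formula stated in the original; the index sets C_u, C_b, and C_s (single components, pairs, and rows) are notation introduced here.

```latex
E(X, C) = \sum_{c \in C} V_c(X)
        = \sum_{c \in C_u} V_c(X) + \sum_{c \in C_b} V_c(X) + \sum_{c \in C_s} V_c(X),
\qquad C = C_u \cup C_b \cup C_s
```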
In general, the parameters of a conditional random field are estimated by maximizing the conditional log-likelihood, that is, the conditional probability. Such methods optimize the parameters by seeking the maximum of the probability (equivalently, the minimum of the energy function). If the cliques C include multi-variable (high-order) cliques, parameter optimization becomes an NP-hard problem and the parameters are difficult to solve. For text detection, the present invention makes two assumptions about this problem. First, a set of text connected components is considered to form a local association in the image and to have no relation to the non-text connected components in the image. Second, only the case in which the connected component set is text (Y^* = 1) is of interest; every other configuration that may occur in the random field is treated as a non-text sequence (Y^* = 0). It is therefore sufficient to estimate the text confidence P(Y^* = 1 | X) for the case Y^* = 1, which is referred to as consistency cognition. The usual approach of obtaining a labeling by minimizing the energy of the random field thus becomes one of solving for the text confidence under a given labeling, and labeling a connected component set becomes a binary classification problem. Positive and negative samples can then be used to train the classifier parameters by supervised learning, and the classifier output is used to fit the energy function of the whole random field directly, as shown in Fig. 1. The energy values of the cliques at every level are all obtained from classifier outputs. From the classifier's point of view, the outputs of the earlier levels serve as highly discriminative features for the classifiers of the later levels. The maturity of the various classifier algorithms also ensures that the parameter learning results are valid.
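The way classifier outputs propagate between levels can be sketched as below. This is a toy illustration only: the three `f_*` functions are hypothetical stand-ins with arbitrary hand-picked weights, where a real system would use classifiers trained on positive and negative samples. The only parts taken from the text are the pattern "energy = 1 - text confidence" and the reuse of earlier energies as later features.

```python
import math
from statistics import mean


def energy(confidence: float) -> float:
    """Clique energy from a classifier's text confidence: E = 1 - F."""
    return 1.0 - confidence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical stand-in classifiers; the weights are arbitrary, not learned.
def f_unary(features):
    return sigmoid(sum(features))


def f_pair(features):
    # features[0] is the mean single-component energy from level 1.
    return sigmoid(2.0 - 4.0 * features[0] + sum(features[1:]))


def f_row(features):
    # features[0:2] are the mean single-component and mean pair energies.
    return sigmoid(2.0 - 4.0 * features[0] - 4.0 * features[1] + sum(features[2:]))


def cascade(unary_feats, pair_index, pair_feats, row_feats):
    """Return the row-level text confidence for one candidate row."""
    # Level 1: energy of every single connected component.
    e_u = [energy(f_unary(f)) for f in unary_feats]
    # Level 2: the mean energy of the two members becomes a pair feature.
    e_b = []
    for (i, j), feats in zip(pair_index, pair_feats):
        f_up = mean([e_u[i], e_u[j]])
        e_b.append(energy(f_pair([f_up] + list(feats))))
    # Level 3: mean energies of both earlier levels become row features.
    f_str = [mean(e_u), mean(e_b)]
    return f_row(f_str + list(row_feats))
```

With these stand-ins, a row whose components score well at the first level ends with a higher row confidence than an otherwise identical row whose components score badly, which is the propagation effect the text describes.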
On the basis of the clustering rules and of the model parameter learning and cognitive inference described above, the features of each level are designed and the corresponding text confidences are calculated.
1) Single connected component level: three types of features are used, namely the geometric feature f_g, the stroke width feature f_sw, and the texture feature f_t. The geometric feature f_g comprises the aspect ratio, axis length ratio, occupancy ratio, and compactness of each connected component. The stroke width feature f_sw comprises the stroke width ratio and the stroke width variance, derived from the computed stroke width of the connected component. The texture feature f_t measures the foreground color consistency and the background color consistency in the local region of the connected component. From these three features, a classifier learns the model parameters λ_u under supervision and estimates the text confidence F_u(·), which gives the energy value E_u(·) of the connected component:
E_u(X, y_i = 1, \lambda_u) = 1 - F_u([f_g(X), f_{sw}(X), f_t(X)], \lambda_u) \quad (6)
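As an illustration of the geometric feature f_g, the sketch below computes three of the four listed quantities from a binary mask. The text gives no formulas, so the definitions used here (bounding-box aspect ratio, axis ratio from second-order moments, and pixel area over bounding-box area) are assumptions, as is the function name; compactness is left out because the text does not define it.

```python
import math


def geometric_features(mask):
    """Geometric features of one component given as a 2-D list of 0/1 values."""
    pts = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not pts:
        raise ValueError("mask contains no foreground pixels")
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    w = max(xs) - min(xs) + 1          # bounding-box width
    h = max(ys) - min(ys) + 1          # bounding-box height
    area = len(pts)

    # Second-order central moments of the pixel coordinates.
    mx, my = sum(xs) / area, sum(ys) / area
    sxx = sum((x - mx) ** 2 for x in xs) / area
    syy = sum((y - my) ** 2 for y in ys) / area
    sxy = sum((x - mx) * (y - my) for x, y in pts) / area

    # Eigenvalues of the 2x2 covariance matrix give the principal axes.
    trace = sxx + syy
    disc = math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)
    lam_max, lam_min = (trace + disc) / 2, (trace - disc) / 2
    axis_ratio = math.sqrt(lam_max / lam_min) if lam_min > 1e-12 else math.inf

    return {
        "aspect_ratio": w / h,
        "axis_ratio": axis_ratio,
        "occupancy": area / (w * h),
    }
```

A degenerate component such as a one-pixel-wide line has no minor axis, so the sketch reports an infinite axis ratio for it rather than dividing by zero.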
2) Connected component pair level: two types of features are used, namely the mean connected component energy feature f_up and the similarity feature f_sa. The feature f_up is the mean of the single connected component energies obtained at the previous level. The similarity feature f_sa comprises the aspect ratio, the stroke width ratio, and the foreground and background color differences of the two connected components. From these two features, a classifier likewise learns the parameters λ_b under supervision and estimates the text confidence F_b(·) of the connected component pair, which gives the energy value E_b(·) of the pair:
E_b(X, (y_i, y_j) = 1, \lambda_b) = 1 - F_b([f_{up}(X), f_{sa}(X)], \lambda_b) \quad (7)
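The pair-level feature vector can be assembled along the following lines. The `CompStats` record, the symmetric min/max ratio, and the use of Euclidean distance between mean colors are all assumptions made for this sketch; the text names the quantities but does not say how they are computed.

```python
import math
from dataclasses import dataclass


@dataclass
class CompStats:
    """Hypothetical per-component summary carried over from the first level."""
    width: float
    height: float
    stroke_width: float
    fg_color: tuple   # mean foreground color, e.g. (R, G, B)
    bg_color: tuple   # mean background color
    energy: float     # E_u obtained at the first level


def sym_ratio(a: float, b: float) -> float:
    """Order-independent ratio in (0, 1]; 1 means the two values are equal."""
    return min(a, b) / max(a, b)


def pair_features(p: CompStats, q: CompStats):
    """Return [f_up, f_sa...] for one candidate connected component pair."""
    f_up = (p.energy + q.energy) / 2.0
    f_sa = [
        sym_ratio(p.width / p.height, q.width / q.height),  # aspect ratio
        sym_ratio(p.stroke_width, q.stroke_width),          # stroke width ratio
        math.dist(p.fg_color, q.fg_color),                  # foreground difference
        math.dist(p.bg_color, q.bg_color),                  # background difference
    ]
    return [f_up] + f_sa
```

Using a symmetric ratio keeps the features identical whichever component of the pair is listed first, which suits an unordered pair.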
3) Connected component row level: the features are the row energy feature f_str, the appearance difference feature f_v, and the gradient histogram feature f_hog. The row energy feature f_str comprises the mean energy of all single connected components and the mean energy of all connected component pairs in the row. The appearance difference feature f_v comprises the height variance, stroke width variance, and foreground color variance of the connected components. The gradient histogram feature is computed over 4 gradient directions and 6 local regions, giving 24 dimensions in total, and describes the local texture distribution of a text line. A classifier learns the parameters λ_s of this level under supervision and estimates the text confidence F_s(·) of the connected component row, which gives the energy value E_s(·) of the row:
E_s(X, Y^* = 1, \lambda_s) = 1 - F_s([f_{str}(X), f_v(X), f_{hog}(X)], \lambda_s) \quad (8)
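A 24-dimensional gradient histogram of the kind described (4 directions times 6 regions) might be computed as sketched below. The text fixes only the two counts, so the 2 x 3 grid layout, the unsigned orientation bins over [0, π), the central-difference gradients, and the magnitude weighting are assumptions of this sketch.

```python
import math


def hog24(gray, grid=(2, 3), bins=4):
    """Gradient histogram over grid[0] x grid[1] regions and `bins` directions.

    `gray` is a 2-D list of intensities covering one candidate text row.
    """
    h, w = len(gray), len(gray[0])
    rows, cols = grid
    hist = [0.0] * (rows * cols * bins)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Central-difference gradient at the interior pixel (x, y).
            gx = gray[y][x + 1] - gray[y][x - 1]
            gy = gray[y + 1][x] - gray[y - 1][x]
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            # Unsigned orientation folded into [0, pi).
            theta = math.atan2(gy, gx) % math.pi
            b = min(int(theta / (math.pi / bins)), bins - 1)
            # Region index of the pixel within the rows x cols grid.
            cell = (y * rows // h) * cols + (x * cols // w)
            hist[cell * bins + b] += mag
    total = sum(hist)
    return [v / total for v in hist] if total > 0 else hist
```

The result is normalized to sum to 1 so that rows of different sizes give comparable vectors; a completely flat image yields an all-zero vector.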
Text lines are finally detected through F_s(·). Experiments show that, by filtering out non-text connected components level by level, the hierarchical cognitive model effectively improves the precision and recall of scene text detection. Localization results on the standard ICDAR 2005 test set are compared in Table 1. The energy features passed between levels also prove very effective: as Fig. 2 shows, both the pair energy feature and the row energy feature improve the classification ability of the hierarchical cognitive model.
The foregoing merely illustrates the principles of the present invention.
Table 1. Comparison of text localization results on ICDAR 2005
Figure BDA00003394547800071

Claims (4)

1. A progressive hierarchical cognitive method for scene image text detection, characterized in that: drawing on the hierarchical nature of human cognition and starting from the connected components obtained from a scene image, the method first uses the spatial adjacency and arrangement of the connected components to form different connected component sets: single connected components, connected component pairs, and connected component rows; different features are then designed for the different connected component sets, and the text confidence of each set is used as a feature of the subsequent sets; the classifier parameters of each level are learned under supervision through the consistency cognition hypothesis on the connected component sets and a conditional random field model, and the text confidences of the connected components are calculated level by level; finally, the text lines are located; the method specifically comprises the following steps:
Step 1: in the first-level analysis, extracting the appearance features of each single connected component, and using a classifier trained by supervised learning to estimate the text confidence of the single connected component;
Step 2: before the second-level analysis, clustering the candidate single connected components two by two into connected component pairs according to their spatial relations;
Step 3: in the second-level analysis, extracting the similarity feature and the mean connected component energy feature of each connected component pair, and using a classifier trained by supervised learning to estimate the text confidence of the pair;
Step 4: before the third-level analysis, linking the candidate connected component pairs into connected component rows according to their connection and arrangement relations;
Step 5: in the third-level analysis, extracting the appearance difference feature and the gradient histogram feature of each connected component row, together with the mean energy of all its single connected components and the mean energy of its connected component pairs, and using a classifier trained by supervised learning to make the final decision on the text lines.
2. The progressive hierarchical cognitive method for scene image text detection according to claim 1, characterized in that: for single connected components, the designed features are appearance features, comprising geometric features, stroke width features, and texture features.
3. The progressive hierarchical cognitive method for scene image text detection according to claim 1, characterized in that: for connected component pairs, the designed features are the similarity feature and the mean connected component energy feature.
4. The progressive hierarchical cognitive method for scene image text detection according to claim 1, characterized in that: for connected component rows, the designed features are the appearance difference feature, the gradient histogram feature, the mean energy of all single connected components, and the mean energy of the connected component pairs.
CN201310253437.1A 2013-06-24 2013-06-24 A kind of progressive level cognitive scene image text detection method Expired - Fee Related CN103413132B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310253437.1A CN103413132B (en) 2013-06-24 2013-06-24 A kind of progressive level cognitive scene image text detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310253437.1A CN103413132B (en) 2013-06-24 2013-06-24 A kind of progressive level cognitive scene image text detection method

Publications (2)

Publication Number Publication Date
CN103413132A true CN103413132A (en) 2013-11-27
CN103413132B CN103413132B (en) 2016-11-09

Family

ID=49606139

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310253437.1A Expired - Fee Related CN103413132B (en) 2013-06-24 2013-06-24 A kind of progressive level cognitive scene image text detection method

Country Status (1)

Country Link
CN (1) CN103413132B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229349A (en) * 2017-12-21 2018-06-29 中国科学院自动化研究所 Reticulate pattern facial image identification device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110007970A1 (en) * 2009-07-10 2011-01-13 Palo Alto Research Center Incorporated System and method for segmenting text lines in documents
CN103093228A (en) * 2013-01-17 2013-05-08 上海交通大学 Chinese detection method in natural scene image based on connected domain

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229349A (en) * 2017-12-21 2018-06-29 中国科学院自动化研究所 Reticulate pattern facial image identification device
CN108229349B (en) * 2017-12-21 2020-09-01 中国科学院自动化研究所 Reticulate pattern human face image recognition device

Also Published As

Publication number Publication date
CN103413132B (en) 2016-11-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20161109
