CN103136523A - Arbitrary direction text line detection method in natural image - Google Patents
- Publication number: CN103136523A
- Authority: CN (China)
- Prior art keywords: text, line, theta, candidate, connected region
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a method for detecting text lines of arbitrary direction in natural images, comprising the following steps: (1) candidate text regions are detected with a constrained maximally stable extremal region (MSER) detection method, and a composite similarity between region pairs is computed by combining region size, absolute distance, relative distance, a geometric similarity defined from contextual information, and color similarity; (2) a similarity-based candidate text line recognition method first finds three regions to serve as the seed regions of a candidate text line and then expands them to all regions on that line; and (3) non-text lines are removed by a filter based on morphological skeleton features; the filter uses a sparse classifier whose feature vectors are taken from the morphological skeleton features of all regions on the candidate text line. The method can detect text of arbitrary direction in natural images; moreover, because the classifier is built on internal region features, it achieves better recognition accuracy.
Description
Technical field
The present invention relates to a method for detecting text lines of arbitrary direction in natural images. The method detects scene text of any direction in a natural image for subsequent OCR recognition, and belongs to the field of computer image processing.
Background technology
With the development of the multimedia and electronics industries, more and more image information is produced, and organizing and retrieving it effectively has become a difficult problem. Many image documents contain textual information, such as book covers, road signs, and buildings (which carry name plates); this text is closely related to the image content. If such text can be detected and recognized effectively, it can be used to organize and retrieve image documents, which has strong practical value.
Text detection methods can be divided into three categories: gradient-based methods, color-clustering-based methods, and texture-based methods. Gradient-based methods assume that text has stronger edges than the background, so pixels with larger gradient magnitudes are more likely to be text. A method published in IEEE Transactions on Image Processing (vol. 20, no. 9, 2011) detects text strokes by searching, along the image edges, for stroke paths between pairs of edge points with approximately opposite gradient directions, and then uses clustering together with other heuristic rules to group the strokes into text lines. The weakness of gradient-based methods is that they become unreliable when the background also contains many edges. Texture-based methods extract texture features with Gabor filters, wavelet transforms, or the fast Fourier transform (FFT), and then detect text regions with machine-learning methods such as neural networks or SVM classifiers. A method published in the Proceedings of the IEEE International Conference on Communication Technology (ICCT 2008, pp. 722-725) locates large-font text with the Haar wavelet transform by merging four small blocks of wavelet coefficients into one large block, and then refines the result with morphological dilation and a neural network. Texture-based methods cannot detect text regions of arbitrary direction. A method published in the Proceedings of the ACM International Multimedia Conference (MM 2007, pp. 847-850) uses color clustering to remove noise, adaptively selecting the best color plane for binarization according to the text contrast on each extracted color plane. Color-clustering methods assume that the text color within a video frame is uniform, but this assumption is invalid in most cases, so their applicability is limited. Since detection based on a single feature is unsatisfactory, many methods combine several of the above features.
These text detection methods all represent good attempts, but because natural scene text may have weak contrast against the background, arbitrary direction, and no fixed position, they perform poorly on arbitrary-direction text in natural images.
Summary of the invention
The objective of the invention is to overcome the deficiencies of the prior art and provide a method for detecting text lines of arbitrary direction in natural images.
The method for detecting text lines of arbitrary direction in natural images comprises the following steps:
(1) Detect candidate text regions with a constrained maximally stable extremal region (MSER) detection method; then, combining region size, absolute distance, relative distance, a geometric similarity defined from contextual information, and color similarity, compute the composite similarity between region pairs;
(2) Using a similarity-based candidate text line recognition method, first find three regions to serve as the seed regions of a candidate text line, then expand them to all regions on that line;
(3) Remove non-text lines with a filter based on morphological skeleton features; the filter uses a sparse classifier whose feature vectors are taken from the morphological skeleton features of all regions on the candidate text line.
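As a reading aid, the three steps above can be sketched as a pipeline. Every function body below is a placeholder (an assumption of this sketch, not the patent's implementation), since each stage is specified in detail later; only the control flow is meant to be illustrative:

```python
# Illustrative skeleton of the three-stage pipeline described above.
# All stage bodies are trivial placeholders so the flow stays runnable.

def detect_candidate_regions(image):
    # Stage 1a: constrained MSER detection (placeholder).
    return [{"id": i} for i in range(len(image))]

def pairwise_composite_similarity(regions):
    # Stage 1b: composite geometric + color similarity (placeholder).
    return {(a["id"], b["id"]): 1.0
            for a in regions for b in regions if a["id"] < b["id"]}

def grow_candidate_lines(regions, simi):
    # Stage 2: seed-triple search and line growing (placeholder: one line).
    return [regions]

def is_text_line(line):
    # Stage 3: sparse-classifier filter on skeleton features (placeholder).
    return len(line) >= 3

def detect_text_lines(image):
    regions = detect_candidate_regions(image)
    simi = pairwise_composite_similarity(regions)
    lines = grow_candidate_lines(regions, simi)
    return [ln for ln in lines if is_text_line(ln)]
```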
The constrained MSER detection step, which detects candidate text regions and then computes the composite similarity between region pairs from region size, absolute distance, relative distance, the geometric similarity defined from contextual information, and color similarity, proceeds as follows. First, all maximally stable extremal regions, as proposed in "Robust Wide Baseline Stereo from Maximally Stable Extremal Regions" (Proceedings of the British Machine Vision Conference, 2002), are detected as candidate text regions. Then the edge map of the image is extracted with the Canny operator, and these edge lines serve as constraint lines for the MSERs when collecting connected regions: during collection, a pixel may only be connected to the pixels in the four directions above, below, left, and right of it, which prevents pixels on the two sides of an edge from being joined. After all connected regions are collected, the geometric similarity between any pair of regions is defined in the following five steps:
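The constrained collection described above — 4-connected gathering that never crosses an edge pixel — can be sketched as follows. This is a minimal sketch: the MSER and Canny stages are assumed to have already produced the binary mask and edge map, and the data layout is an assumption of this sketch:

```python
# 4-connected component collection with edge pixels as barriers, so
# components on the two sides of a Canny edge are never joined.

from collections import deque

def collect_regions(mask, edges):
    """mask/edges: 2-D lists of 0/1. Returns a list of pixel lists."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx] and not edges[sy][sx]:
                comp, q = [], deque([(sy, sx)])
                seen[sy][sx] = True
                while q:
                    y, x = q.popleft()
                    comp.append((y, x))
                    # 4-connectivity only: up, down, left, right
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w and mask[ny][nx]
                                and not seen[ny][nx] and not edges[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                regions.append(comp)
    return regions
```

With a vertical edge column through a solid 3x3 mask, the left and right columns come back as two separate regions rather than one.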
Step 1: Let CC_i and CC_j be two connected regions; their normalized absolute distance is defined as follows:
where x_i, y_i and x_j, y_j denote the horizontal and vertical coordinates of the central points of CC_i and CC_j respectively; h_i, h_j, w_i, w_j denote the heights and widths of CC_i and CC_j; h_im, w_im denote the height and width of the current image; and k_1 is a constant controlling the relative contributions of the horizontal and vertical distances, set to 2. The resulting value lies between 0 and 1.
Step 2: To enlarge the distance differences between different CC pairs, the distance metric of formula 1 is further modified into the following expression:
Step 3: Formula 2 is further revised as:
where P_k denotes a path from CC_i to CC_j whose length lies between 0 and n-2, and the shortest path between CC_i and CC_j can be obtained with the Floyd algorithm or any algorithm of equivalent function:
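The shortest-path computation mentioned in step 3 can be sketched with the standard Floyd–Warshall algorithm (a textbook implementation, not code from the patent; the distance matrix is assumed to come from formula 2):

```python
def floyd_shortest(dist):
    """Floyd–Warshall all-pairs shortest distances.

    dist: n x n matrix of pairwise distances (formula 2 in the text).
    Returns a new matrix of shortest-path distances.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # copy so the input is untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```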
Step 4: The shape distance between two regions is defined as:
where h_i, h_j, w_i, w_j denote the heights and widths of CC_i and CC_j respectively.
Step 5: The geometric similarity of connected regions CC_i and CC_j is:
The color similarity of connected regions CC_i and CC_j is computed as follows. First the image is converted from the RGB color space to the HSV color space, and the H, S, V components are quantized into 8, 3, and 3 levels respectively, so the color histogram has 72 dimensions. Let the color feature vectors of CC_i and CC_j be C_i = [C_i,1, C_i,2, ..., C_i,t, ..., C_i,n] and C_j = [C_j,1, C_j,2, ..., C_j,t, ..., C_j,n]; the color similarity is:
with n = 72.
Finally, the composite similarity combining the geometric similarity and the color similarity of the two connected regions is:
simi(i, j) = (simi_geometry(i, j) + simi_color(i, j)) / 2    (7)
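The 72-bin color descriptor above can be sketched as follows. The quantization (H into 8 levels, S and V into 3 levels each) follows the text; using histogram intersection as the similarity is an assumption of this sketch, since the patent's own formula image is not reproduced here:

```python
# 72-dimensional HSV color histogram (8 x 3 x 3 bins) plus a
# histogram-intersection similarity (the intersection choice is an
# assumption, not the patent's stated formula).

def hsv_histogram(pixels):
    """pixels: iterable of (h, s, v) with h in [0, 360), s, v in [0, 1)."""
    hist = [0.0] * 72
    for h, s, v in pixels:
        hb = min(int(h / 360.0 * 8), 7)
        sb = min(int(s * 3), 2)
        vb = min(int(v * 3), 2)
        hist[hb * 9 + sb * 3 + vb] += 1
    total = sum(hist) or 1.0
    return [x / total for x in hist]  # normalized, sums to 1

def color_similarity(ci, cj):
    # Histogram intersection: 1.0 for identical normalized histograms.
    return sum(min(a, b) for a, b in zip(ci, cj))
```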
The similarity-based candidate text line recognition step, which first finds three regions as the seed regions of a candidate text line and then expands them to all regions on the line, generates candidate text lines on the basis of sibling judgments between pairs of connected regions;
(1) Sibling judgment:
The sibling judgment decides whether two regions are sufficiently similar and adjacent. If two regions are not siblings, they cannot be merged into the same text line. The following three constraints decide whether two connected regions are siblings:
a) the height ratio and the width ratio of the two adjacent regions should lie between two thresholds T_1 and T_2;
b) the distance between the two connected regions should not be greater than T_3 times the height or width of the larger region;
c) two adjacent characters should have similar color, so their color similarity should be greater than a threshold T_4. Formalized:
S_ij indicates whether connected regions CC_i and CC_j are similar regions: if its value is 1, they are similar and may belong to the same text line; otherwise they cannot belong to the same text line. The three terms represent the three constraints above, and T_1, T_2, T_3, T_4 are set to 2, 4, 3, and 0.4 respectively.
The refinement of condition 1 is judged by the formulas:
h_r = max(h_i, h_j) / min(h_i, h_j)
w_r = max(w_i, w_j) / min(w_i, w_j)    (9)
In formula 10, θ denotes the angle between the line connecting the central points of connected regions CC_i and CC_j and the positive direction of the X axis;
The refinement of condition 2 is judged as follows:
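The three sibling constraints a)-c) can be sketched as one predicate. The precise reading of condition a) — how T_1 and T_2 bound the height and width ratios — is ambiguous in the text, so this sketch assumes h_r ≤ T_1 and w_r ≤ T_2; the region dictionary format is likewise illustrative:

```python
# Hedged sketch of the sibling test with the thresholds stated in the
# text (T_1..T_4 = 2, 4, 3, 0.4). Condition a)'s exact form is an
# assumption; see the lead-in.

T1, T2, T3, T4 = 2.0, 4.0, 3.0, 0.4

def is_sibling(ri, rj, color_sim, dist):
    """ri, rj: dicts with 'h' and 'w'; dist: center distance."""
    hr = max(ri["h"], rj["h"]) / min(ri["h"], rj["h"])
    wr = max(ri["w"], rj["w"]) / min(ri["w"], rj["w"])
    cond_a = hr <= T1 and wr <= T2          # similar size
    big = max(ri["h"], ri["w"], rj["h"], rj["w"])
    cond_b = dist <= T3 * big               # close enough
    cond_c = color_sim > T4                 # similar color
    return cond_a and cond_b and cond_c
```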
(2) Candidate text line generation:
To produce a candidate text line, three seed connected regions are found first and then expanded to include more connected regions, as follows:
Step 1: Let UL_cc denote the set of all connected regions not yet assigned to a text line. Initialize UL_cc to the set of all connected regions, give each region a flag bit, and initialize the flags to 0. For each connected region in UL_cc, compute the similarity simi(i, *) between it and every other connected region, take the two largest similarities, and record their sum as partSimi(CC_i); then sort all partSimi values in descending order;
Step 2: For any three connected regions CC_i ∈ UL_cc, CC_j ∈ UL_cc, and CC_k ∈ UL_cc satisfying S_ij = 1 ∧ S_jk = 1 ∧ partSimi(CC_k) ≤ partSimi(CC_i) ∧ partSimi(CC_j) ≤ partSimi(CC_i), compute the angle difference Δθ_ijk with the following formula:
where v(c_i c_j) and v(c_j c_k) denote the vectors c_i c_j and c_j c_k respectively.
Compute Δθ_jik and Δθ_ikj in the same way. If the condition holds, produce a new text line L_t, record its elements S_cc(L_t) = {CC_i, CC_j, CC_k}, take the average angle of c_i c_j and c_j c_k as the inclination angle of the current text line, and remove CC_i, CC_j, and CC_k from the set UL_cc. These three connected regions serve as the seed elements of the current text line;
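The seed-triple angle test can be sketched as follows: three mutually sibling regions are accepted as a line seed when the directions c_i→c_j and c_j→c_k are nearly collinear. The 10-degree collinearity threshold here is an assumption of this sketch, since the patent's threshold formula image is not reproduced:

```python
import math

def delta_theta(ci, cj, ck):
    """Angle difference in degrees between vectors ci->cj and cj->ck."""
    a1 = math.atan2(cj[1] - ci[1], cj[0] - ci[0])
    a2 = math.atan2(ck[1] - cj[1], ck[0] - cj[0])
    d = abs(math.degrees(a1 - a2)) % 360
    return min(d, 360 - d)  # wrap to [0, 180]

def is_seed_triple(ci, cj, ck, thresh_deg=10.0):
    # thresh_deg is an assumed value, not the patent's.
    return delta_theta(ci, cj, ck) < thresh_deg
```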
Step 3: For any connected region CC_m remaining in UL_cc, compute its similarity to the current line L_t with the following formula:
Sort all simi values in descending order and take connected regions CC_t from UL_cc in that order. If the following three conditions are satisfied, CC_t is added to S_cc(L_t):
a) Among the K nearest neighbors of CC_t there is at least one CC_k ∈ S_cc(L_t) that is a sibling of CC_t, and the angle difference between the line connecting the central points of CC_k and CC_t and the average inclination angle of the current text line L_t is less than a threshold T_5.
The angle difference between any line segment c_i c_j and the line at the average inclination angle is computed as follows:
and the value of T_5 is determined by the following formula:
where D_ij denotes the distance between the connected-region central points c_i and c_j, and the other quantity denotes the mean distance between adjacent central points after the centers of all connected regions on line l are arranged from left to right or from top to bottom;
b) CC_t also lies within the set formed by the K nearest-neighbor connected regions of CC_k;
c) the distance between the central point of CC_t and the current line L_t is less than a threshold T_6.
K is set to 3, and T_6 is determined by the following formula:
where h_t and w_t denote the height and width of CC_t respectively, θ is the angle between the positive direction of the X axis and the line connecting the central points of CC_t and CC_k, and k' = 1/3.
If the current connected region is added to S_cc(L_t), update the set UL_cc and the average angle of the current line, and repeat this process until every element of UL_cc has been processed; then repeat steps 1 to 3 to search for another group of candidate text line seeds, until no text line seed remains.
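The seed ranking of step 1 above — summing each region's two largest similarities and sorting in descending order — can be sketched as follows (matrix indexing is illustrative):

```python
# partSimi ranking: for each region, sum the two largest similarities
# to other regions, then rank regions by that sum, descending.

def part_simi(simi_row):
    """simi_row: similarities from one region to every other region."""
    return sum(sorted(simi_row, reverse=True)[:2])

def rank_seed_candidates(simi):
    """simi: square matrix; simi[i][j] for i != j (diagonal ignored)."""
    scores = {}
    for i, row in enumerate(simi):
        others = [v for j, v in enumerate(row) if j != i]
        scores[i] = part_simi(others)
    return sorted(scores, key=scores.get, reverse=True)
```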
The filter based on morphological skeleton features removes non-text lines as follows; the filter uses a sparse classifier whose feature vectors are taken from the morphological skeleton features of all regions on the candidate text line:
Step 1: Prepare training samples. Taking English as an example, prepare binary maps of the 26 letters and the 10 digits 0-9 in several fonts, one copy each in regular and italic styles; rotate each binary map by 90, 180, and 270 degrees and use the rotated binary maps as positive training samples as well; in addition, prepare the same number of non-text images as negative training samples;
Step 2: For each binary map, resize its minimum enclosing rectangle to S_rh × S_rw with max(S_rh, S_rw) = S_rg, where S_rg = 32; that is, the longer side of the connected region is scaled to S_rg while the aspect ratio of height to width is kept constant. Extract the skeleton of the connected region and likewise enlarge it to S_rh × S_rw; then extract the skeleton of the enlarged skeleton again and align its center with the center of a square block. Finally, convert the square block into a 32 × 32 = 1024-dimensional vector to serve as the input vector of the sparse filter;
Step 3: Train with the Fisher classifier proposed in the Proceedings of the IEEE International Conference on Computer Vision 2011, pp. 543-550, obtaining a trained text-region classifier Classifier;
Step 4: For a candidate text line L_t of arbitrary inclination, first rotate it by an angle θ_r so as to bring it to the horizontal or vertical direction; θ_r is defined as follows:
Step 5: For a candidate text line and its constituent connected regions, let each element's feature vector be assigned a label index by the classifier. The label of the whole text line is then defined with the threshold:
C_T = k_2 · n    (19)
where k_2 is a control parameter set to 0.7 and n is the number of connected regions on the current text line L_t. A line labeled 0 is a text line and is kept; otherwise it is discarded.
Compared with the prior art, the present invention has the following beneficial effects:
1) The text detection algorithm of the present invention is more robust to text size and works well on large text, on which conventional methods perform poorly.
2) The text detection algorithm of the present invention overcomes the limitation of common detection algorithms that can detect only horizontal or vertical text; it can detect scene text of any direction.
3) The text extraction algorithm of the present invention overcomes the low precision of common detection algorithms; because a Fisher sparse classifier learns the internal features of text, its detection precision is greatly improved.
Description of drawings
Fig. 1 is the flow diagram of the method for detecting text lines of arbitrary direction in natural images;
Fig. 2 (a) is an original natural image to be detected;
Fig. 2 (b) shows all binarized MSER regions detected by the present invention;
Fig. 2 (c) shows the candidate text line seeds detected by the present invention;
Fig. 2 (d) shows the candidate text lines detected by the present invention;
Fig. 3 (a) is a candidate text line detected by the present invention;
Fig. 3 (b) is the candidate text line after transformation;
Fig. 3 (c) is the skeleton of the candidate text line after transformation;
Fig. 4 (a) is a binarized MSER region detected by the present invention;
Fig. 4 (b) is the result of the first enlargement of the region;
Fig. 4 (c) is the result of the first skeleton extraction;
Fig. 4 (d) is the result of the second skeleton enlargement;
Fig. 4 (e) is the result of the second skeleton extraction, which provides the input vector to the Fisher classifier;
Fig. 5 shows examples of arbitrary-direction text line detection in natural images by the present invention.
Embodiment
For a better understanding of the technical solution of the present invention, the invention is further described below with reference to Fig. 1, which shows the natural-image text recognition framework of the present invention.
Example
As shown in Figs. 2, 3, and 4, the recognition process is illustrated for a natural image containing text. The concrete steps of this example, following the method of the present invention, are as follows.
For a natural image, as shown in Fig. 2 (a):
(1) All candidate text regions are obtained with the constrained MSER detection method of claim 2; the result is shown in Fig. 2 (b);
(2) Combining the similarity definition of claim 2 with the sibling judgment of claim 3, seed connected regions are detected as in claim 3; the resulting candidate text line seed regions are shown in Fig. 2 (c);
(3) With the candidate text line expansion method of claim 3, all candidate text lines are obtained, as shown in Fig. 2 (d);
(4) Each candidate text line obtained in the previous step is transformed into a horizontal or vertical candidate text line with the rotational transform of claim 4; the result is shown in Fig. 3 (b), and the corresponding skeleton structure in Fig. 3 (c);
(5) For each connected region of each horizontal or vertical candidate text line obtained in the previous step, as shown in Fig. 4 (a), features are extracted with the feature extraction method of claim 4; the intermediate results are shown in Figs. 4 (b), (c), (d), and (e). The sparse classifier of claim 4 then classifies the lines, and candidate lines whose classification result is not a text line are discarded.
Detection results for arbitrary-direction text lines in several natural images are shown in Fig. 5, where detected text regions are marked with red or blue blocks. As the figures show, the method detects arbitrary-direction text regions in natural images well, and the detection results achieve good precision.
Claims (4)
1. A method for detecting text lines of arbitrary direction in a natural image, characterized by comprising the following steps:
(1) detecting candidate text regions with a constrained maximally stable extremal region (MSER) detection method, then defining the geometric similarity between region pairs from region size, absolute distance, relative distance and contextual information, and combining it with color similarity to obtain the composite similarity between region pairs;
(2) using a similarity-based candidate text-line recognition method that first finds three regions as the seed regions of a candidate text line and then expands them to all regions of that line;
(3) removing non-text lines with a filter based on morphological skeleton features; this filter uses a sparse classifier, whose required feature vectors are taken from the morphological skeleton features of all regions on the candidate text line.
2. The method for detecting text lines of arbitrary direction in a natural image according to claim 1, characterized in that the step of detecting candidate text regions with the constrained maximally stable extremal region detection method, then defining the geometric similarity between region pairs from region size, absolute distance, relative distance and contextual information, and combining it with color similarity to obtain the composite similarity between region pairs is as follows: first, all MSERs (maximally stable extremal regions), computed by the detection method proposed in "Robust Wide Baseline Stereo from Maximally Stable Extremal Regions" in the proceedings of the British Machine Vision Conference 2002, are taken as candidate text regions; then the edge information of the image is extracted with the Canny operator, and these edge lines serve as constraint lines for the MSERs when collecting connected regions. During collection, a pixel may only be connected to the pixels in the four directions above, below, left and right of it, which prevents pixels on the two sides of an edge from being connected together. After all connected regions have been collected, the geometric similarity between any region pair is defined by the following five steps:
Step 1: Let CC_i and CC_j be connected regions; their normalized absolute distance is defined as follows (formula 1; the equation image is not reproduced in this text), where the first symbols denote the horizontal and vertical coordinates of the center points of CC_i and CC_j respectively; h_i, h_j, w_i, w_j denote the heights and widths of CC_i and CC_j respectively; h_im, w_im denote the height and width of the current image; and k_1 is a constant controlling the contribution ratio of horizontal distance to vertical distance, whose value is set to 2. The distance takes values from 0 to 1;
Step 2: To enlarge the distance differences between different CC pairs, the distance measure of formula 1 is further modified to a new expression, formula 2 (the equation image is not reproduced in this text);
Step 3: Formula 2 is further revised (the equation image is not reproduced in this text), where P_k denotes a path from CC_i to CC_j whose length is between 0 and n−2, and the shortest path between CC_i and CC_j can be obtained with the Floyd algorithm or any algorithm of identical function.
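The shortest-path computation named in step 3 is the standard Floyd algorithm. A minimal sketch over an illustrative region-distance matrix (the function name and the example matrix are illustrative; the patent's formulas 2 and 3 are not reproduced in this text):

```python
def floyd_shortest_paths(dist):
    """Floyd-Warshall all-pairs shortest paths over a pairwise
    region-distance matrix: dist[i][j] starts as the (modified)
    direct distance between CC_i and CC_j and ends as the length
    of the shortest path between them. The matrix is updated
    in place."""
    n = len(dist)
    for k in range(n):          # allow region k as an intermediate hop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```

Any all-pairs shortest-path routine of identical function could be substituted, as the claim notes.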
Step 4: The shape distance between two regions is defined by a further formula (the equation image is not reproduced in this text), where h_i, h_j, w_i, w_j denote the heights and widths of CC_i and CC_j respectively.
Step 5: The geometric similarity of connected regions CC_i and CC_j, and their color similarity, are each defined by a formula (the equation images are not reproduced in this text).

For the color similarity, the image is first converted from the RGB color space to the HSV color space, and the H, S and V components are quantized into 8, 3 and 3 levels respectively, so that the color histogram has 72 dimensions. Suppose the color feature vectors of CC_i and CC_j are C_i = [C_{i,1}, C_{i,2}, ..., C_{i,t}, ..., C_{i,n}] and C_j = [C_{j,1}, C_{j,2}, ..., C_{j,t}, ..., C_{j,n}]; the color similarity is computed over these vectors with n = 72.

Finally, the composite similarity of the two connected regions combines their geometric similarity and color similarity:

simi(i, j) = (simi_geometry(i, j) + simi_color(i, j)) / 2    (7)
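A rough sketch of the color part of this step follows. The patent's exact color-similarity formula is not reproduced in the text above, so histogram intersection is assumed here, and all function names are illustrative:

```python
import numpy as np

def hsv_histogram(pixels_hsv):
    """72-bin color histogram: H, S, V quantized into 8, 3, 3 levels.
    pixels_hsv is an (N, 3) array with H in [0, 360), S and V in [0, 1]."""
    h = np.minimum((pixels_hsv[:, 0] / 360.0 * 8).astype(int), 7)
    s = np.minimum((pixels_hsv[:, 1] * 3).astype(int), 2)
    v = np.minimum((pixels_hsv[:, 2] * 3).astype(int), 2)
    idx = h * 9 + s * 3 + v                  # 8 * 3 * 3 = 72 bins
    hist = np.bincount(idx, minlength=72).astype(float)
    return hist / max(hist.sum(), 1.0)       # normalize to sum 1

def color_similarity(c_i, c_j):
    # Histogram intersection over the 72-dim vectors (an assumption;
    # the patent's color-similarity formula image is not reproduced).
    return float(np.minimum(c_i, c_j).sum())

def composite_similarity(simi_geometry, simi_color):
    # Formula (7): the mean of geometric and color similarity.
    return (simi_geometry + simi_color) / 2.0
```

Identical regions then score a color similarity of 1, and the composite score stays in [0, 1] whenever both inputs do.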
3. The method for detecting text lines of arbitrary direction in a natural image according to claim 1, characterized in that the similarity-based candidate text-line recognition method, which first finds three regions as the seed regions of a candidate text line and then expands them to all regions of that line, proceeds as follows: candidate text lines are generated on the basis of a sibling judgment between connected-region pairs.

(1) Sibling judgment:

The sibling judgment decides whether two regions are sufficiently similar and sufficiently close to each other. If two regions are not siblings, they cannot be merged into the same text line. The following three constraints judge whether two connected regions are siblings:

a) the height ratio and the width ratio of two adjacent regions should lie between the two thresholds T_1 and T_2;

b) the distance between the two connected regions should be no greater than T_3 times the height or width of the larger region;

c) two adjacent characters should have similar color features, so their color similarity should be greater than a threshold T_4.

Formalized, s_ij indicates whether connected regions CC_i and CC_j are similar regions: if its value is 1 they are similar and may belong to the same text line; otherwise they cannot belong to the same text line. Three indicator terms (whose symbol images are not reproduced in this text) represent the three constraints above, and T_1, T_2, T_3, T_4 are set to 2, 4, 3 and 0.4 respectively.
The refined judgment for condition 1 is as follows:

h_r = max(h_i, h_j) / min(h_i, h_j)
w_r = max(w_i, w_j) / min(w_i, w_j)    (9)

In formula 10 (the equation image is not reproduced in this text), θ denotes the angle between the line joining the center points of connected regions CC_i and CC_j and the positive direction of the X axis.

The refined judgment for condition 2 is given by a further formula (the equation image is not reproduced in this text).
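A minimal sketch of the three sibling constraints, assuming regions are simple (cx, cy, h, w) tuples and the color similarity is precomputed. The angle-dependent refinements of formulas 9 and 10 are not fully recoverable from this text, so condition 1 is simplified to plain ratio bounds (height ratio by T_1, width ratio by T_2):

```python
import math

def is_sibling(ri, rj, color_sim, T1=2.0, T2=4.0, T3=3.0, T4=0.4):
    """Sketch of the three sibling conditions of claim 3.
    ri and rj are hypothetical (cx, cy, h, w) region records;
    color_sim is the precomputed color similarity of the pair."""
    cx_i, cy_i, h_i, w_i = ri
    cx_j, cy_j, h_j, w_j = rj
    # Condition 1 (simplified): bounded height and width ratios.
    h_ratio = max(h_i, h_j) / min(h_i, h_j)
    w_ratio = max(w_i, w_j) / min(w_i, w_j)
    cond1 = h_ratio <= T1 and w_ratio <= T2
    # Condition 2: centers no farther apart than T3 times the
    # larger region's height or width.
    dist = math.hypot(cx_i - cx_j, cy_i - cy_j)
    cond2 = dist <= T3 * max(h_i, h_j, w_i, w_j)
    # Condition 3: similar color.
    cond3 = color_sim >= T4
    return cond1 and cond2 and cond3
```

The thresholds default to the claim's values T_1..T_4 = 2, 4, 3, 0.4.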
(2) Candidate text-line generation:

To produce candidate text lines, three seed connected regions are first found and then expanded to include more connected regions, as follows:

Step 1: Let UL_cc denote the set of all connected regions that have not yet been assigned to a text line; it is initialized to the set of all connected regions, and a flag bit, initialized to 0, is set for each region. For each connected region in UL_cc, the similarities simi(i, *) between it and all other connected regions are computed; the two largest similarities are then taken out and summed, the sum being recorded as partSimi(CC_i). All partSimi values are then sorted in descending order;
Step 2: For any three connected regions CC_i ∈ UL_cc, CC_j ∈ UL_cc and CC_k ∈ UL_cc satisfying the conditions s_ij = 1 ∧ s_ik = 1, partSimi(CC_k) ≤ partSimi(CC_i) and partSimi(CC_j) ≤ partSimi(CC_i), the angle difference Δθ_ijk is computed with a formula (the equation image is not reproduced in this text) in which v(c_i c_j) and v(c_j c_k) denote the vectors c_i c_j and c_j c_k respectively.

Δθ_jik and Δθ_ikj are computed in the same way. If these angle differences satisfy the condition given by a further formula (equation image not reproduced in this text), a new text line L_t is produced, its elements are recorded as S_cc(L_t) = {CC_i, CC_j, CC_k}, the average angle of c_i c_j and c_j c_k is computed as the inclination angle of the current text line, and CC_i, CC_j and CC_k are removed from the set UL_cc. These three connected regions serve as the seed elements of the current text line;
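The angle difference Δθ_ijk used in this seed test is plain 2-D geometry: the angle between the vectors c_i→c_j and c_j→c_k, which is near zero when the three region centers are roughly collinear. A sketch (function names are illustrative):

```python
import math

def angle_between(v1, v2):
    """Angle, in degrees, between two 2-D vectors."""
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    cosang = max(-1.0, min(1.0, dot / (n1 * n2)))  # clamp rounding error
    return math.degrees(math.acos(cosang))

def delta_theta(ci, cj, ck):
    """Δθ_ijk: angle between vectors c_i->c_j and c_j->c_k, given
    the three region center points as (x, y) pairs."""
    v_ij = (cj[0] - ci[0], cj[1] - ci[1])
    v_jk = (ck[0] - cj[0], ck[1] - cj[1])
    return angle_between(v_ij, v_jk)
```

Three collinear centers give Δθ = 0; a right-angle bend gives Δθ = 90.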
Step 3: For any remaining connected region CC_m in UL_cc, its similarity to the current line L_t is computed with a formula (the equation image is not reproduced in this text). All simi values are sorted in descending order, and connected regions CC_t are taken from UL_cc in that order. If the following three conditions are satisfied, CC_t is added to S_cc(L_t):

a) among the K nearest neighbors of CC_t there is at least one CC_k ∈ S_cc(L_t) that is not only a sibling of CC_t but also such that the angle difference between the line joining its center point to that of CC_t and the average inclination angle of the current text line L_t is less than a threshold T_5.

The angle difference between any line segment c_i c_j and the line at the average inclination angle is calculated with a formula (equation image not reproduced in this text), and the value of T_5 is determined by a further formula in which D_ij denotes the distance between the connected-region center points c_i and c_j, and a mean term denotes the average distance between adjacent center points after the centers of all connected regions on line l have been arranged from left to right or from top to bottom;

b) CC_t is also in the set formed by the K nearest-neighbor connected regions of CC_k;

c) the distance between the center point of CC_t and the current line L_t is less than a threshold T_6.

K is set to 3, and T_6 is determined by a formula (equation image not reproduced in this text) in which h_t and w_t denote the height and width of CC_t respectively, θ is the angle between the positive direction of the X axis and the line joining the center points of CC_t and CC_k, and k′ = 1/3.

If the current connected region is added to S_cc(L_t), the set UL_cc and the average angle of the current line are updated, and this process is repeated until all elements in UL_cc have been processed; steps 1 to 3 are then repeated to search for another group of candidate text-line seeds until no text-line seed remains.
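The expansion of steps 1 to 3 can be condensed into a greedy loop. In the sketch below, `similarity` and `can_join` are illustrative stand-ins for the patent's partSimi/simi measures and for the sibling, angle-difference and distance tests of conditions a) to c):

```python
def expand_text_line(seed, regions, similarity, can_join):
    """Greedy line expansion sketch: starting from a seed triple,
    repeatedly consider unassigned regions in descending order of
    similarity to the current line and add the first one that passes
    the join test, until no region can be added."""
    line = list(seed)
    pool = [r for r in regions if r not in line]
    changed = True
    while changed:
        changed = False
        pool.sort(key=lambda r: similarity(line, r), reverse=True)
        for r in pool:
            if can_join(line, r):
                line.append(r)
                pool.remove(r)
                changed = True   # line changed; re-rank and retry
                break
    return line
```

With 1-D "regions" and a nearness test this grows a line along contiguous neighbors while rejecting outliers, mirroring how the patent's conditions keep a line coherent.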
4. The method for detecting text lines in a natural image according to claim 1, characterized in that the removal of non-text lines with the filter based on morphological skeleton features, where the filter uses a sparse classifier whose required feature vectors are taken from the morphological skeleton features of all regions on the candidate text line, comprises the following steps:
Step 1: Prepare training samples. Taking English as an example, prepare binary maps of the 26 letters and the 10 digits 0-9 in different fonts, one copy each in regular and italic form; rotate these binary maps by 90, 180 and 270 degrees and use the rotated binary maps as positive training samples as well; in addition, prepare the same number of non-text images as negative training samples;
Step 2: For each binary map, convert the size of its minimum enclosing rectangle to S_rh × S_rw with max(S_rh, S_rw) = S_rg, where S_rg = 32; that is, the longer side of the connected region is scaled to S_rg while the ratio of height to width is kept constant. Extract the skeleton of the connected region and likewise enlarge it to S_rh × S_rw; then extract the skeleton of the enlarged skeleton again and align its center with the center of a square block. Finally, the square block is converted into a vector of 32 × 32 = 1024 dimensions as the input vector of the sparse filter;
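A partial sketch of this step's size normalization in plain NumPy. The repeated skeleton extraction itself is omitted here; it would be delegated to a thinning routine such as skimage.morphology.skeletonize. Names are illustrative:

```python
import numpy as np

def skeleton_feature(binmap, S=32):
    """Scale a binary glyph so its longer side equals S while keeping
    the height/width ratio, center it in an S x S block, and flatten
    to an S*S-dimensional input vector (1024-dim for S = 32)."""
    h, w = binmap.shape
    scale = S / max(h, w)
    nh = max(1, round(h * scale))
    nw = max(1, round(w * scale))
    # Nearest-neighbour resize via integer index maps.
    rows = (np.arange(nh) * h / nh).astype(int)
    cols = (np.arange(nw) * w / nw).astype(int)
    resized = binmap[rows][:, cols]
    # Center the resized glyph in the S x S square block.
    block = np.zeros((S, S), dtype=binmap.dtype)
    r0, c0 = (S - nh) // 2, (S - nw) // 2
    block[r0:r0 + nh, c0:c0 + nw] = resized
    return block.reshape(-1)
```

A 16 × 8 glyph, for example, is scaled to 32 × 16, centered horizontally, and emitted as a 1024-dimensional vector.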
Step 3: Train with the Fisher classifier proposed on pages 543-550 of the proceedings of the IEEE conference on computer vision 2011, obtaining the trained text-region classifier Classifier;
Step 4: For a candidate text line L_t of any inclination direction, first rotate it by an angle θ_r, the purpose being to rotate it to the horizontal or vertical direction; θ_r is defined by a formula whose equation image is not reproduced in this text.
Step 5: For a candidate text line and the connected regions composing it, suppose each element's feature vector has a classification label (the symbol images are not reproduced in this text). The label of the whole text line is then defined by

C_T = k_2 · n    (19)

where k_2 is a control parameter, n is the number of connected regions of the current text line L_t, and k_2 is taken as 0.7. A label of 0 indicates that the current line is a text line and it is kept; otherwise it is discarded.
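Step 5's line-level decision can be read as a vote over the per-region labels. The sketch below assumes that formula (19)'s threshold C_T = k_2 · n counts regions classified as text (label 0, as in the claim), which is one plausible reading; the function name is illustrative:

```python
def keep_text_line(region_labels, k2=0.7):
    """Keep the line as text when at least C_T = k2 * n of its n
    connected regions carry the text label 0 (an assumed reading
    of the claim's formula 19)."""
    n = len(region_labels)
    c_t = k2 * n
    num_text = sum(1 for lbl in region_labels if lbl == 0)
    return num_text >= c_t
```

With k_2 = 0.7, a four-region line is kept when at least three of its regions are classified as text.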
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210506072.4A CN103136523B (en) | 2012-11-29 | 2012-11-29 | Any direction text line detection method in a kind of natural image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103136523A true CN103136523A (en) | 2013-06-05 |
CN103136523B CN103136523B (en) | 2016-06-29 |
Family
ID=48496331
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210506072.4A Expired - Fee Related CN103136523B (en) | 2012-11-29 | 2012-11-29 | Any direction text line detection method in a kind of natural image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103136523B (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104778470A (en) * | 2015-03-12 | 2015-07-15 | 浙江大学 | Character detection and recognition method based on component tree and Hough forest |
CN105005764A (en) * | 2015-06-29 | 2015-10-28 | 东南大学 | Multi-direction text detection method of natural scene |
CN105678207A (en) * | 2014-11-19 | 2016-06-15 | 富士通株式会社 | Device and method for identifying content of target nameplate image from given image |
CN105825216A (en) * | 2016-03-17 | 2016-08-03 | 中国科学院信息工程研究所 | Method of locating text in complex background image |
CN106503732A (en) * | 2016-10-13 | 2017-03-15 | 北京云江科技有限公司 | Text image and the sorting technique and categorizing system of non-textual image |
CN106796647A (en) * | 2014-09-05 | 2017-05-31 | 北京市商汤科技开发有限公司 | Scene text detecting system and method |
CN107368830A (en) * | 2016-05-13 | 2017-11-21 | 佳能株式会社 | Method for text detection and device and text recognition system |
CN107368826A (en) * | 2016-05-13 | 2017-11-21 | 佳能株式会社 | Method and apparatus for text detection |
CN107688807A (en) * | 2016-08-05 | 2018-02-13 | 腾讯科技(深圳)有限公司 | Image processing method and image processing apparatus |
CN107784316A (en) * | 2016-08-26 | 2018-03-09 | 阿里巴巴集团控股有限公司 | A kind of image-recognizing method, device, system and computing device |
CN108288061A (en) * | 2018-03-02 | 2018-07-17 | 哈尔滨理工大学 | A method of based on the quick positioning tilt texts in natural scene of MSER |
CN108399419A (en) * | 2018-01-25 | 2018-08-14 | 华南理工大学 | Chinese text recognition methods in natural scene image based on two-dimentional Recursive Networks |
CN108875744A (en) * | 2018-03-05 | 2018-11-23 | 南京理工大学 | Multi-oriented text lines detection method based on rectangle frame coordinate transform |
CN109284751A (en) * | 2018-10-31 | 2019-01-29 | 河南科技大学 | The non-textual filtering method of text location based on spectrum analysis and SVM |
CN109934229A (en) * | 2019-03-28 | 2019-06-25 | 网易有道信息技术(北京)有限公司 | Image processing method, device, medium and calculating equipment |
CN110059600A (en) * | 2019-04-09 | 2019-07-26 | 杭州视氪科技有限公司 | A kind of single line text recognition methods based on direction gesture |
CN110211048A (en) * | 2019-05-28 | 2019-09-06 | 湖北华中电力科技开发有限责任公司 | A kind of complicated archival image Slant Rectify method based on convolutional neural networks |
CN111325210A (en) * | 2018-12-14 | 2020-06-23 | 北京京东尚科信息技术有限公司 | Method and apparatus for outputting information |
CN112560599A (en) * | 2020-12-02 | 2021-03-26 | 上海眼控科技股份有限公司 | Text recognition method and device, computer equipment and storage medium |
CN112883974A (en) * | 2021-05-06 | 2021-06-01 | 江西省江咨金发数据科技发展有限公司 | Electronic letter identification system based on image verification |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1512439A (en) * | 2002-12-26 | 2004-07-14 | Fujitsu Ltd. | Video frequency text processor |
US20060062460A1 (en) * | 2004-08-10 | 2006-03-23 | Fujitsu Limited | Character recognition apparatus and method for recognizing characters in an image |
CN102208023A (en) * | 2011-01-23 | 2011-10-05 | 浙江大学 | Method for recognizing and designing video captions based on edge information and distribution entropy |
CN102542268A (en) * | 2011-12-29 | 2012-07-04 | 中国科学院自动化研究所 | Method for detecting and positioning text area in video |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1512439A (en) * | 2002-12-26 | 2004-07-14 | Fujitsu Ltd. | Video frequency text processor |
US20060062460A1 (en) * | 2004-08-10 | 2006-03-23 | Fujitsu Limited | Character recognition apparatus and method for recognizing characters in an image |
CN102208023A (en) * | 2011-01-23 | 2011-10-05 | 浙江大学 | Method for recognizing and designing video captions based on edge information and distribution entropy |
CN102542268A (en) * | 2011-12-29 | 2012-07-04 | 中国科学院自动化研究所 | Method for detecting and positioning text area in video |
Non-Patent Citations (2)
Title |
---|
JIE YUAN ET AL.: "A New Video Text Detection Method", 《JCDL`11 PROCEEDINGS OF THE 11TH ANNUAL INTERNATIONAL ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES》 *
ZHANG YIN ET AL.: "A New Text Extraction Method for Color Images and Video", 《JOURNAL OF COMPUTER-AIDED DESIGN & COMPUTER GRAPHICS》 *
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106796647B (en) * | 2014-09-05 | 2018-09-14 | 北京市商汤科技开发有限公司 | Scene text detecting system and method |
CN106796647A (en) * | 2014-09-05 | 2017-05-31 | 北京市商汤科技开发有限公司 | Scene text detecting system and method |
CN105678207A (en) * | 2014-11-19 | 2016-06-15 | 富士通株式会社 | Device and method for identifying content of target nameplate image from given image |
CN104778470A (en) * | 2015-03-12 | 2015-07-15 | 浙江大学 | Character detection and recognition method based on component tree and Hough forest |
CN105005764B (en) * | 2015-06-29 | 2018-02-13 | 东南大学 | The multi-direction Method for text detection of natural scene |
CN105005764A (en) * | 2015-06-29 | 2015-10-28 | 东南大学 | Multi-direction text detection method of natural scene |
CN105825216A (en) * | 2016-03-17 | 2016-08-03 | 中国科学院信息工程研究所 | Method of locating text in complex background image |
CN107368826A (en) * | 2016-05-13 | 2017-11-21 | 佳能株式会社 | Method and apparatus for text detection |
CN107368830A (en) * | 2016-05-13 | 2017-11-21 | 佳能株式会社 | Method for text detection and device and text recognition system |
CN107368830B (en) * | 2016-05-13 | 2021-11-09 | 佳能株式会社 | Text detection method and device and text recognition system |
CN107688807A (en) * | 2016-08-05 | 2018-02-13 | 腾讯科技(深圳)有限公司 | Image processing method and image processing apparatus |
CN107784316A (en) * | 2016-08-26 | 2018-03-09 | 阿里巴巴集团控股有限公司 | A kind of image-recognizing method, device, system and computing device |
CN106503732A (en) * | 2016-10-13 | 2017-03-15 | 北京云江科技有限公司 | Text image and the sorting technique and categorizing system of non-textual image |
CN106503732B (en) * | 2016-10-13 | 2019-07-19 | 北京云江科技有限公司 | The classification method and categorizing system of text image and non-textual image |
CN108399419A (en) * | 2018-01-25 | 2018-08-14 | 华南理工大学 | Chinese text recognition methods in natural scene image based on two-dimentional Recursive Networks |
CN108399419B (en) * | 2018-01-25 | 2021-02-19 | 华南理工大学 | Method for recognizing Chinese text in natural scene image based on two-dimensional recursive network |
CN108288061A (en) * | 2018-03-02 | 2018-07-17 | 哈尔滨理工大学 | A method of based on the quick positioning tilt texts in natural scene of MSER |
CN108875744A (en) * | 2018-03-05 | 2018-11-23 | 南京理工大学 | Multi-oriented text lines detection method based on rectangle frame coordinate transform |
CN109284751A (en) * | 2018-10-31 | 2019-01-29 | 河南科技大学 | The non-textual filtering method of text location based on spectrum analysis and SVM |
CN111325210A (en) * | 2018-12-14 | 2020-06-23 | 北京京东尚科信息技术有限公司 | Method and apparatus for outputting information |
CN109934229A (en) * | 2019-03-28 | 2019-06-25 | 网易有道信息技术(北京)有限公司 | Image processing method, device, medium and calculating equipment |
CN110059600A (en) * | 2019-04-09 | 2019-07-26 | 杭州视氪科技有限公司 | A kind of single line text recognition methods based on direction gesture |
CN110059600B (en) * | 2019-04-09 | 2021-07-06 | 杭州视氪科技有限公司 | Single-line character recognition method based on pointing gesture |
CN110211048A (en) * | 2019-05-28 | 2019-09-06 | 湖北华中电力科技开发有限责任公司 | A kind of complicated archival image Slant Rectify method based on convolutional neural networks |
CN112560599A (en) * | 2020-12-02 | 2021-03-26 | 上海眼控科技股份有限公司 | Text recognition method and device, computer equipment and storage medium |
CN112883974A (en) * | 2021-05-06 | 2021-06-01 | 江西省江咨金发数据科技发展有限公司 | Electronic letter identification system based on image verification |
Also Published As
Publication number | Publication date |
---|---|
CN103136523B (en) | 2016-06-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103136523B (en) | Any direction text line detection method in a kind of natural image | |
Nikolaou et al. | Segmentation of historical machine-printed documents using adaptive run length smoothing and skeleton segmentation paths | |
Antonacopoulos et al. | ICDAR2015 competition on recognition of documents with complex layouts-RDCL2015 | |
JP5492205B2 (en) | Segment print pages into articles | |
CN102081731B (en) | Method and device for extracting text from image | |
US8462394B2 (en) | Document type classification for scanned bitmaps | |
Alberti et al. | Labeling, cutting, grouping: an efficient text line segmentation method for medieval manuscripts | |
CN101122952A (en) | Picture words detecting method | |
CN103530600A (en) | License plate recognition method and system under complicated illumination | |
CN108154151B (en) | Rapid multi-direction text line detection method | |
Rigaud et al. | Automatic text localisation in scanned comic books | |
Shivakumara et al. | Gradient-angular-features for word-wise video script identification | |
Forczmański et al. | Stamps detection and classification using simple features ensemble | |
Unar et al. | Artificial Urdu text detection and localization from individual video frames | |
Mullick et al. | An efficient line segmentation approach for handwritten Bangla document image | |
Kunishige et al. | Scenery character detection with environmental context | |
CN107368826B (en) | Method and apparatus for text detection | |
Melinda et al. | Parameter-free table detection method | |
Zhan et al. | A robust split-and-merge text segmentation approach for images | |
Shelke et al. | A novel multistage classification and wavelet based kernel generation for handwritten marathi compound character recognition | |
Ziaratban et al. | An adaptive script-independent block-based text line extraction | |
Lue et al. | A novel character segmentation method for text images captured by cameras | |
Phan et al. | Text detection in natural scenes using gradient vector flow-guided symmetry | |
Ahmed et al. | Enhancing the character segmentation accuracy of bangla ocr using bpnn | |
Nguyen et al. | An effective method for text line segmentation in historical document images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20160629 Termination date: 20191129 |