CN103136523A - Arbitrary direction text line detection method in natural image - Google Patents
- Publication number: CN103136523A
- Authority: CN (China)
- Prior art keywords: text, line, theta, candidate, connected region
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses a method for detecting text lines of arbitrary direction in natural images, comprising the following steps: (1) candidate text regions are detected with a constrained maximally stable extremal region (MSER) detection method, and a composite similarity between region pairs is computed by combining region size, absolute distance, relative distance, a geometric similarity defined from contextual information, and color similarity; (2) a similarity-based candidate text line recognition method first finds three regions to serve as the seed regions of a candidate text line and then expands them to all regions on that line; and (3) non-text lines are removed by a filter based on morphological skeleton features; the filter uses a sparse classifier whose feature vectors are taken from the morphological skeleton features of all regions on the candidate text line. The method can detect text of arbitrary direction in natural images; moreover, because the classifier is built on internal region features, it achieves better recognition accuracy.
Description
Technical field
The present invention relates to a method for detecting text lines of arbitrary direction in natural images. The method detects scene text of any direction in a natural image for subsequent OCR recognition, and belongs to the field of computer image processing.
Background technology
With the development of the multimedia and electronics industries, more and more image information is produced, and organizing and retrieving it effectively has become a difficult problem. Many image documents contain textual information, such as book covers, road signs, and buildings (which carry name plates); this text is closely related to the image content. If such text can be detected and recognized effectively, it can be used to organize and retrieve image documents, which has strong practical value.
Text detection methods can be divided into three categories: gradient-based methods, color-clustering-based methods, and texture-based methods. Gradient-based methods assume that text has stronger edges than the background, so pixels with larger gradient magnitudes are more likely to be text. A method published in IEEE Transactions on Image Processing (vol. 20, no. 9, 2011) detects text strokes by searching, along the image edges, for stroke paths between pairs of edge points with approximately opposite gradient directions, and then uses clustering together with other heuristic rules to group the strokes into text lines. The weakness of gradient-based methods is that they become unreliable when the background also contains many edges. Texture-based methods extract texture features with Gabor filters, wavelet transforms, or the fast Fourier transform (FFT), and then detect text regions with machine-learning methods such as neural networks or SVM classifiers. A method published in the Proceedings of the IEEE International Conference on Communication Technology (ICCT 2008, pp. 722-725) locates large-font text with the Haar wavelet transform by merging four small blocks of wavelet coefficients into one large block, and then refines the result with morphological dilation and a neural network. Texture-based methods cannot detect text regions of arbitrary direction. A method published in the Proceedings of the ACM International Multimedia Conference (MM 2007, pp. 847-850) uses color clustering to remove noise, adaptively selecting the best color plane for binarization according to the text contrast on each extracted color plane. Color-clustering methods assume that the text color within a video frame is uniform, but this assumption is invalid in most cases, so their applicability is limited. Since detection based on a single feature is unsatisfactory, many methods combine several of the above features.
These text detection methods all represent good attempts, but because natural scene text may have weak contrast against the background, arbitrary direction, and no fixed position, they perform poorly on arbitrary-direction text in natural images.
Summary of the invention
The objective of the invention is to overcome the deficiencies of the prior art and provide a method for detecting text lines of arbitrary direction in natural images.
The method for detecting text lines of arbitrary direction in natural images comprises the following steps:
(1) Detect candidate text regions with a constrained maximally stable extremal region (MSER) detection method; then, combining region size, absolute distance, relative distance, a geometric similarity defined from contextual information, and color similarity, compute the composite similarity between region pairs;
(2) Using a similarity-based candidate text line recognition method, first find three regions to serve as the seed regions of a candidate text line, then expand them to all regions on that line;
(3) Remove non-text lines with a filter based on morphological skeleton features; the filter uses a sparse classifier whose feature vectors are taken from the morphological skeleton features of all regions on the candidate text line.
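As a reading aid, the three steps above can be sketched as a pipeline. Every function body below is a placeholder (an assumption of this sketch, not the patent's implementation), since each stage is specified in detail later; only the control flow is meant to be illustrative:

```python
# Illustrative skeleton of the three-stage pipeline described above.
# All stage bodies are trivial placeholders so the flow stays runnable.

def detect_candidate_regions(image):
    # Stage 1a: constrained MSER detection (placeholder).
    return [{"id": i} for i in range(len(image))]

def pairwise_composite_similarity(regions):
    # Stage 1b: composite geometric + color similarity (placeholder).
    return {(a["id"], b["id"]): 1.0
            for a in regions for b in regions if a["id"] < b["id"]}

def grow_candidate_lines(regions, simi):
    # Stage 2: seed-triple search and line growing (placeholder: one line).
    return [regions]

def is_text_line(line):
    # Stage 3: sparse-classifier filter on skeleton features (placeholder).
    return len(line) >= 3

def detect_text_lines(image):
    regions = detect_candidate_regions(image)
    simi = pairwise_composite_similarity(regions)
    lines = grow_candidate_lines(regions, simi)
    return [ln for ln in lines if is_text_line(ln)]
```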
The constrained MSER detection step, which detects candidate text regions and then computes the composite similarity between region pairs from region size, absolute distance, relative distance, the geometric similarity defined from contextual information, and color similarity, proceeds as follows. First, all maximally stable extremal regions, as proposed in "Robust Wide Baseline Stereo from Maximally Stable Extremal Regions" (Proceedings of the British Machine Vision Conference, 2002), are detected as candidate text regions. Then the edge map of the image is extracted with the Canny operator, and these edge lines serve as constraint lines for the MSERs when collecting connected regions: during collection, a pixel may only be connected to the pixels in the four directions above, below, left, and right of it, which prevents pixels on the two sides of an edge from being joined. After all connected regions are collected, the geometric similarity between any pair of regions is defined in the following five steps:
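The constrained collection described above — 4-connected gathering that never crosses an edge pixel — can be sketched as follows. This is a minimal sketch: the MSER and Canny stages are assumed to have already produced the binary mask and edge map, and the data layout is an assumption of this sketch:

```python
# 4-connected component collection with edge pixels as barriers, so
# components on the two sides of a Canny edge are never joined.

from collections import deque

def collect_regions(mask, edges):
    """mask/edges: 2-D lists of 0/1. Returns a list of pixel lists."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not seen[sy][sx] and not edges[sy][sx]:
                comp, q = [], deque([(sy, sx)])
                seen[sy][sx] = True
                while q:
                    y, x = q.popleft()
                    comp.append((y, x))
                    # 4-connectivity only: up, down, left, right
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w and mask[ny][nx]
                                and not seen[ny][nx] and not edges[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                regions.append(comp)
    return regions
```

With a vertical edge column through a solid 3x3 mask, the left and right columns come back as two separate regions rather than one.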
Step 1: Let CC_i and CC_j be two connected regions; their normalized absolute distance is defined as follows:
where x_i, y_i and x_j, y_j denote the horizontal and vertical coordinates of the central points of CC_i and CC_j respectively; h_i, h_j, w_i, w_j denote the heights and widths of CC_i and CC_j; h_im, w_im denote the height and width of the current image; and k_1 is a constant controlling the relative contributions of the horizontal and vertical distances, set to 2. The resulting value lies between 0 and 1.
Step 2: To enlarge the distance differences between different CC pairs, the distance metric of formula 1 is further modified into the following expression:
Step 3: Formula 2 is further revised as:
where P_k denotes a path from CC_i to CC_j whose length lies between 0 and n-2, and the shortest path between CC_i and CC_j can be obtained with the Floyd algorithm or any algorithm of equivalent function:
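The shortest-path computation mentioned in step 3 can be sketched with the standard Floyd–Warshall algorithm (a textbook implementation, not code from the patent; the distance matrix is assumed to come from formula 2):

```python
def floyd_shortest(dist):
    """Floyd–Warshall all-pairs shortest distances.

    dist: n x n matrix of pairwise distances (formula 2 in the text).
    Returns a new matrix of shortest-path distances.
    """
    n = len(dist)
    d = [row[:] for row in dist]  # copy so the input is untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d
```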
Step 4: The shape distance between two regions is defined as:
where h_i, h_j, w_i, w_j denote the heights and widths of CC_i and CC_j respectively.
Step 5: The geometric similarity of connected regions CC_i and CC_j is:
The color similarity of connected regions CC_i and CC_j is computed as follows. First the image is converted from the RGB color space to the HSV color space, and the H, S, V components are quantized into 8, 3, and 3 levels respectively, so the color histogram has 72 dimensions. Let the color feature vectors of CC_i and CC_j be C_i = [C_i,1, C_i,2, ..., C_i,t, ..., C_i,n] and C_j = [C_j,1, C_j,2, ..., C_j,t, ..., C_j,n]; the color similarity is:
with n = 72.
Finally, the composite similarity combining the geometric similarity and the color similarity of the two connected regions is:
simi(i, j) = (simi_geometry(i, j) + simi_color(i, j)) / 2    (7)
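The 72-bin color descriptor above can be sketched as follows. The quantization (H into 8 levels, S and V into 3 levels each) follows the text; using histogram intersection as the similarity is an assumption of this sketch, since the patent's own formula image is not reproduced here:

```python
# 72-dimensional HSV color histogram (8 x 3 x 3 bins) plus a
# histogram-intersection similarity (the intersection choice is an
# assumption, not the patent's stated formula).

def hsv_histogram(pixels):
    """pixels: iterable of (h, s, v) with h in [0, 360), s, v in [0, 1)."""
    hist = [0.0] * 72
    for h, s, v in pixels:
        hb = min(int(h / 360.0 * 8), 7)
        sb = min(int(s * 3), 2)
        vb = min(int(v * 3), 2)
        hist[hb * 9 + sb * 3 + vb] += 1
    total = sum(hist) or 1.0
    return [x / total for x in hist]  # normalized, sums to 1

def color_similarity(ci, cj):
    # Histogram intersection: 1.0 for identical normalized histograms.
    return sum(min(a, b) for a, b in zip(ci, cj))
```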
The similarity-based candidate text line recognition step, which first finds three regions as the seed regions of a candidate text line and then expands them to all regions on the line, generates candidate text lines on the basis of sibling judgments between pairs of connected regions;
(1) Sibling judgment:
The sibling judgment decides whether two regions are sufficiently similar and adjacent. If two regions are not siblings, they cannot be merged into the same text line. The following three constraints decide whether two connected regions are siblings:
a) the height ratio and the width ratio of the two adjacent regions should lie between two thresholds T_1 and T_2;
b) the distance between the two connected regions should not be greater than T_3 times the height or width of the larger region;
c) two adjacent characters should have similar color, so their color similarity should be greater than a threshold T_4. Formalized:
S_ij indicates whether connected regions CC_i and CC_j are similar regions: if its value is 1, they are similar and may belong to the same text line; otherwise they cannot belong to the same text line. The three terms represent the three constraints above, and T_1, T_2, T_3, T_4 are set to 2, 4, 3, and 0.4 respectively.
The refinement of condition 1 is judged by the formulas:
h_r = max(h_i, h_j) / min(h_i, h_j)
w_r = max(w_i, w_j) / min(w_i, w_j)    (9)
In formula 10, θ denotes the angle between the line connecting the central points of connected regions CC_i and CC_j and the positive direction of the X axis;
The refinement of condition 2 is judged as follows:
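The three sibling constraints a)-c) can be sketched as one predicate. The precise reading of condition a) — how T_1 and T_2 bound the height and width ratios — is ambiguous in the text, so this sketch assumes h_r ≤ T_1 and w_r ≤ T_2; the region dictionary format is likewise illustrative:

```python
# Hedged sketch of the sibling test with the thresholds stated in the
# text (T_1..T_4 = 2, 4, 3, 0.4). Condition a)'s exact form is an
# assumption; see the lead-in.

T1, T2, T3, T4 = 2.0, 4.0, 3.0, 0.4

def is_sibling(ri, rj, color_sim, dist):
    """ri, rj: dicts with 'h' and 'w'; dist: center distance."""
    hr = max(ri["h"], rj["h"]) / min(ri["h"], rj["h"])
    wr = max(ri["w"], rj["w"]) / min(ri["w"], rj["w"])
    cond_a = hr <= T1 and wr <= T2          # similar size
    big = max(ri["h"], ri["w"], rj["h"], rj["w"])
    cond_b = dist <= T3 * big               # close enough
    cond_c = color_sim > T4                 # similar color
    return cond_a and cond_b and cond_c
```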
(2) Candidate text line generation:
To produce a candidate text line, three seed connected regions are found first and then expanded to include more connected regions, as follows:
Step 1: Let UL_cc denote the set of all connected regions not yet assigned to a text line. Initialize UL_cc to the set of all connected regions, give each region a flag bit, and initialize the flags to 0. For each connected region in UL_cc, compute the similarity simi(i, *) between it and every other connected region, take the two largest similarities, and record their sum as partSimi(CC_i); then sort all partSimi values in descending order;
Step 2: For any three connected regions CC_i ∈ UL_cc, CC_j ∈ UL_cc, and CC_k ∈ UL_cc satisfying S_ij = 1 ∧ S_jk = 1 ∧ partSimi(CC_k) ≤ partSimi(CC_i) ∧ partSimi(CC_j) ≤ partSimi(CC_i), compute the angle difference Δθ_ijk with the following formula:
where v(c_i c_j) and v(c_j c_k) denote the vectors c_i c_j and c_j c_k respectively.
Compute Δθ_jik and Δθ_ikj in the same way. If the condition holds, produce a new text line L_t, record its elements S_cc(L_t) = {CC_i, CC_j, CC_k}, take the average angle of c_i c_j and c_j c_k as the inclination angle of the current text line, and remove CC_i, CC_j, and CC_k from the set UL_cc. These three connected regions serve as the seed elements of the current text line;
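The seed-triple angle test can be sketched as follows: three mutually sibling regions are accepted as a line seed when the directions c_i→c_j and c_j→c_k are nearly collinear. The 10-degree collinearity threshold here is an assumption of this sketch, since the patent's threshold formula image is not reproduced:

```python
import math

def delta_theta(ci, cj, ck):
    """Angle difference in degrees between vectors ci->cj and cj->ck."""
    a1 = math.atan2(cj[1] - ci[1], cj[0] - ci[0])
    a2 = math.atan2(ck[1] - cj[1], ck[0] - cj[0])
    d = abs(math.degrees(a1 - a2)) % 360
    return min(d, 360 - d)  # wrap to [0, 180]

def is_seed_triple(ci, cj, ck, thresh_deg=10.0):
    # thresh_deg is an assumed value, not the patent's.
    return delta_theta(ci, cj, ck) < thresh_deg
```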
Step 3: For any connected region CC_m remaining in UL_cc, compute its similarity to the current line L_t with the following formula:
Sort all simi values in descending order and take connected regions CC_t from UL_cc in that order. If the following three conditions are satisfied, CC_t is added to S_cc(L_t):
a) Among the K nearest neighbors of CC_t there is at least one CC_k ∈ S_cc(L_t) that is a sibling of CC_t, and the angle difference between the line connecting the central points of CC_k and CC_t and the average inclination angle of the current text line L_t is less than a threshold T_5.
The angle difference between any line segment c_i c_j and the line at the average inclination angle is computed as follows:
and the value of T_5 is determined by the following formula:
where D_ij denotes the distance between the connected-region central points c_i and c_j, and the other quantity denotes the mean distance between adjacent central points after the centers of all connected regions on line l are arranged from left to right or from top to bottom;
b) CC_t also lies within the set formed by the K nearest-neighbor connected regions of CC_k;
c) the distance between the central point of CC_t and the current line L_t is less than a threshold T_6.
K is set to 3, and T_6 is determined by the following formula:
where h_t and w_t denote the height and width of CC_t respectively, θ is the angle between the positive direction of the X axis and the line connecting the central points of CC_t and CC_k, and k' = 1/3.
If the current connected region is added to S_cc(L_t), update the set UL_cc and the average angle of the current line, and repeat this process until every element of UL_cc has been processed; then repeat steps 1 to 3 to search for another group of candidate text line seeds, until no text line seed remains.
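The seed ranking of step 1 above — summing each region's two largest similarities and sorting in descending order — can be sketched as follows (matrix indexing is illustrative):

```python
# partSimi ranking: for each region, sum the two largest similarities
# to other regions, then rank regions by that sum, descending.

def part_simi(simi_row):
    """simi_row: similarities from one region to every other region."""
    return sum(sorted(simi_row, reverse=True)[:2])

def rank_seed_candidates(simi):
    """simi: square matrix; simi[i][j] for i != j (diagonal ignored)."""
    scores = {}
    for i, row in enumerate(simi):
        others = [v for j, v in enumerate(row) if j != i]
        scores[i] = part_simi(others)
    return sorted(scores, key=scores.get, reverse=True)
```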
The filter based on morphological skeleton features removes non-text lines as follows; the filter uses a sparse classifier whose feature vectors are taken from the morphological skeleton features of all regions on the candidate text line:
Step 1: Prepare training samples. Taking English as an example, prepare binary maps of the 26 letters and the 10 digits 0-9 in several fonts, one copy each in regular and italic styles; rotate each binary map by 90, 180, and 270 degrees and use the rotated binary maps as positive training samples as well; in addition, prepare the same number of non-text images as negative training samples;
Step 2: For each binary map, resize its minimum enclosing rectangle to S_rh × S_rw with max(S_rh, S_rw) = S_rg, where S_rg = 32; that is, the longer side of the connected region is scaled to S_rg while the aspect ratio of height to width is kept constant. Extract the skeleton of the connected region and likewise enlarge it to S_rh × S_rw; then extract the skeleton of the enlarged skeleton again and align its center with the center of a square block. Finally, convert the square block into a 32 × 32 = 1024-dimensional vector to serve as the input vector of the sparse filter;
Step 3: Train with the Fisher classifier proposed in the Proceedings of the IEEE International Conference on Computer Vision 2011, pp. 543-550, obtaining a trained text-region classifier Classifier;
Step 4: For a candidate text line L_t of arbitrary inclination, first rotate it by an angle θ_r so as to bring it to the horizontal or vertical direction; θ_r is defined as follows:
Step 5: For a candidate text line and its constituent connected regions, let each element's feature vector be assigned a label index by the classifier. The label of the whole text line is then defined with the threshold:
C_T = k_2 · n    (19)
where k_2 is a control parameter set to 0.7 and n is the number of connected regions on the current text line L_t. A line labeled 0 is a text line and is kept; otherwise it is discarded.
Compared with the prior art, the present invention has the following beneficial effects:
1) The text detection algorithm of the present invention is more robust to text size and works well on large text, on which conventional methods perform poorly.
2) The text detection algorithm of the present invention overcomes the limitation of common detection algorithms that can detect only horizontal or vertical text; it can detect scene text of any direction.
3) The text extraction algorithm of the present invention overcomes the low precision of common detection algorithms; because a Fisher sparse classifier learns the internal features of text, its detection precision is greatly improved.
Description of drawings
Fig. 1 is the flow diagram of the method for detecting text lines of arbitrary direction in natural images;
Fig. 2 (a) is an original natural image to be detected;
Fig. 2 (b) shows all binarized MSER regions detected by the present invention;
Fig. 2 (c) shows the candidate text line seeds detected by the present invention;
Fig. 2 (d) shows the candidate text lines detected by the present invention;
Fig. 3 (a) is a candidate text line detected by the present invention;
Fig. 3 (b) is the candidate text line after transformation;
Fig. 3 (c) is the skeleton of the candidate text line after transformation;
Fig. 4 (a) is a binarized MSER region detected by the present invention;
Fig. 4 (b) is the result of the first enlargement of the region;
Fig. 4 (c) is the result of the first skeleton extraction;
Fig. 4 (d) is the result of the second skeleton enlargement;
Fig. 4 (e) is the result of the second skeleton extraction, which provides the input vector to the Fisher classifier;
Fig. 5 shows examples of arbitrary-direction text line detection in natural images by the present invention.
Embodiment
For a better understanding of the technical solution of the present invention, the invention is further described below with reference to Fig. 1, which shows the natural-image text recognition framework of the present invention.
Example
As shown in Figs. 2, 3, and 4, the recognition process is illustrated for a natural image containing text. The concrete steps of this example, following the method of the present invention, are as follows.
For a natural image, as shown in Fig. 2 (a):
(1) All candidate text regions are obtained with the constrained MSER detection method of claim 2; the result is shown in Fig. 2 (b);
(2) Combining the similarity definition of claim 2 with the sibling judgment of claim 3, seed connected regions are detected as in claim 3; the resulting candidate text line seed regions are shown in Fig. 2 (c);
(3) With the candidate text line expansion method of claim 3, all candidate text lines are obtained, as shown in Fig. 2 (d);
(4) Each candidate text line obtained in the previous step is transformed into a horizontal or vertical candidate text line with the rotational transform of claim 4; the result is shown in Fig. 3 (b), and the corresponding skeleton structure in Fig. 3 (c);
(5) For each connected region of each horizontal or vertical candidate text line obtained in the previous step, as shown in Fig. 4 (a), features are extracted with the feature extraction method of claim 4; the intermediate results are shown in Figs. 4 (b), (c), (d), and (e). The sparse classifier of claim 4 then classifies the lines, and candidate lines whose classification result is not a text line are discarded.
Detection results for arbitrary-direction text lines in several natural images are shown in Fig. 5, where detected text regions are marked with red or blue blocks. As the figures show, the method detects arbitrary-direction text regions in natural images well, and the detection results achieve good precision.
Claims (4)
1. A method for detecting text lines of arbitrary direction in a natural image, characterized by comprising the following steps:
(1) detecting candidate text regions with a constrained maximally stable extremal region (MSER) detection method, then defining the geometric similarity between region pairs from region size, absolute distance, relative distance and contextual information, and combining it with color similarity to obtain the composite similarity between region pairs;
(2) using a similarity-based candidate text-line recognition method that first finds three regions as the seed regions of a candidate text line and then expands them to all regions of that line;
(3) removing non-text lines with a filter based on morphological skeleton features; this filter uses a sparse classifier, whose required feature vectors are taken from the morphological skeleton features of all regions on the candidate text line.
2. The method for detecting text lines of arbitrary direction in a natural image according to claim 1, characterized in that the step of detecting candidate text regions with the constrained maximally stable extremal region detection method, then defining the geometric similarity between region pairs from region size, absolute distance, relative distance and contextual information, and combining it with color similarity to obtain the composite similarity between region pairs is as follows: first, all MSERs (maximally stable extremal regions), computed by the detection method proposed in "Robust Wide Baseline Stereo from Maximally Stable Extremal Regions" in the proceedings of the British Machine Vision Conference 2002, are taken as candidate text regions; then the edge information of the image is extracted with the Canny operator, and these edge lines serve as constraint lines for the MSERs when collecting connected regions. During collection, a pixel may only be connected to the pixels in the four directions above, below, left and right of it, which prevents pixels on the two sides of an edge from being connected together. After all connected regions have been collected, the geometric similarity between any region pair is defined by the following five steps:
Step 1: Let CC_i and CC_j be connected regions; their normalized absolute distance is defined as follows (formula 1; the equation image is not reproduced in this text), where the first symbols denote the horizontal and vertical coordinates of the center points of CC_i and CC_j respectively; h_i, h_j, w_i, w_j denote the heights and widths of CC_i and CC_j respectively; h_im, w_im denote the height and width of the current image; and k_1 is a constant controlling the contribution ratio of horizontal distance to vertical distance, whose value is set to 2. The distance takes values from 0 to 1;
Step 2: To enlarge the distance differences between different CC pairs, the distance measure of formula 1 is further modified to a new expression, formula 2 (the equation image is not reproduced in this text);
Step 3: Formula 2 is further revised (the equation image is not reproduced in this text), where P_k denotes a path from CC_i to CC_j whose length is between 0 and n−2, and the shortest path between CC_i and CC_j can be obtained with the Floyd algorithm or any algorithm of identical function.
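The shortest-path computation named in step 3 is the standard Floyd algorithm. A minimal sketch over an illustrative region-distance matrix (the function name and the example matrix are illustrative; the patent's formulas 2 and 3 are not reproduced in this text):

```python
def floyd_shortest_paths(dist):
    """Floyd-Warshall all-pairs shortest paths over a pairwise
    region-distance matrix: dist[i][j] starts as the (modified)
    direct distance between CC_i and CC_j and ends as the length
    of the shortest path between them. The matrix is updated
    in place."""
    n = len(dist)
    for k in range(n):          # allow region k as an intermediate hop
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist
```

Any all-pairs shortest-path routine of identical function could be substituted, as the claim notes.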
Step 4: The shape distance between two regions is defined by a further formula (the equation image is not reproduced in this text), where h_i, h_j, w_i, w_j denote the heights and widths of CC_i and CC_j respectively.
Step 5: The geometric similarity of connected regions CC_i and CC_j, and their color similarity, are each defined by a formula (the equation images are not reproduced in this text).

For the color similarity, the image is first converted from the RGB color space to the HSV color space, and the H, S and V components are quantized into 8, 3 and 3 levels respectively, so that the color histogram has 72 dimensions. Suppose the color feature vectors of CC_i and CC_j are C_i = [C_{i,1}, C_{i,2}, ..., C_{i,t}, ..., C_{i,n}] and C_j = [C_{j,1}, C_{j,2}, ..., C_{j,t}, ..., C_{j,n}]; the color similarity is computed over these vectors with n = 72.

Finally, the composite similarity of the two connected regions combines their geometric similarity and color similarity:

simi(i, j) = (simi_geometry(i, j) + simi_color(i, j)) / 2    (7)
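A rough sketch of the color part of this step follows. The patent's exact color-similarity formula is not reproduced in the text above, so histogram intersection is assumed here, and all function names are illustrative:

```python
import numpy as np

def hsv_histogram(pixels_hsv):
    """72-bin color histogram: H, S, V quantized into 8, 3, 3 levels.
    pixels_hsv is an (N, 3) array with H in [0, 360), S and V in [0, 1]."""
    h = np.minimum((pixels_hsv[:, 0] / 360.0 * 8).astype(int), 7)
    s = np.minimum((pixels_hsv[:, 1] * 3).astype(int), 2)
    v = np.minimum((pixels_hsv[:, 2] * 3).astype(int), 2)
    idx = h * 9 + s * 3 + v                  # 8 * 3 * 3 = 72 bins
    hist = np.bincount(idx, minlength=72).astype(float)
    return hist / max(hist.sum(), 1.0)       # normalize to sum 1

def color_similarity(c_i, c_j):
    # Histogram intersection over the 72-dim vectors (an assumption;
    # the patent's color-similarity formula image is not reproduced).
    return float(np.minimum(c_i, c_j).sum())

def composite_similarity(simi_geometry, simi_color):
    # Formula (7): the mean of geometric and color similarity.
    return (simi_geometry + simi_color) / 2.0
```

Identical regions then score a color similarity of 1, and the composite score stays in [0, 1] whenever both inputs do.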
3. The method for detecting text lines of arbitrary direction in a natural image according to claim 1, characterized in that the similarity-based candidate text-line recognition method, which first finds three regions as the seed regions of a candidate text line and then expands them to all regions of that line, proceeds as follows: candidate text lines are generated on the basis of a sibling judgment between connected-region pairs.

(1) Sibling judgment:

The sibling judgment decides whether two regions are sufficiently similar and sufficiently close to each other. If two regions are not siblings, they cannot be merged into the same text line. The following three constraints judge whether two connected regions are siblings:

a) the height ratio and the width ratio of two adjacent regions should lie between the two thresholds T_1 and T_2;

b) the distance between the two connected regions should be no greater than T_3 times the height or width of the larger region;

c) two adjacent characters should have similar color features, so their color similarity should be greater than a threshold T_4.

Formalized, s_ij indicates whether connected regions CC_i and CC_j are similar regions: if its value is 1 they are similar and may belong to the same text line; otherwise they cannot belong to the same text line. Three indicator terms (whose symbol images are not reproduced in this text) represent the three constraints above, and T_1, T_2, T_3, T_4 are set to 2, 4, 3 and 0.4 respectively.
The refined judgment for condition 1 is as follows:

h_r = max(h_i, h_j) / min(h_i, h_j)
w_r = max(w_i, w_j) / min(w_i, w_j)    (9)

In formula 10 (the equation image is not reproduced in this text), θ denotes the angle between the line joining the center points of connected regions CC_i and CC_j and the positive direction of the X axis.

The refined judgment for condition 2 is given by a further formula (the equation image is not reproduced in this text).
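A minimal sketch of the three sibling constraints, assuming regions are simple (cx, cy, h, w) tuples and the color similarity is precomputed. The angle-dependent refinements of formulas 9 and 10 are not fully recoverable from this text, so condition 1 is simplified to plain ratio bounds (height ratio by T_1, width ratio by T_2):

```python
import math

def is_sibling(ri, rj, color_sim, T1=2.0, T2=4.0, T3=3.0, T4=0.4):
    """Sketch of the three sibling conditions of claim 3.
    ri and rj are hypothetical (cx, cy, h, w) region records;
    color_sim is the precomputed color similarity of the pair."""
    cx_i, cy_i, h_i, w_i = ri
    cx_j, cy_j, h_j, w_j = rj
    # Condition 1 (simplified): bounded height and width ratios.
    h_ratio = max(h_i, h_j) / min(h_i, h_j)
    w_ratio = max(w_i, w_j) / min(w_i, w_j)
    cond1 = h_ratio <= T1 and w_ratio <= T2
    # Condition 2: centers no farther apart than T3 times the
    # larger region's height or width.
    dist = math.hypot(cx_i - cx_j, cy_i - cy_j)
    cond2 = dist <= T3 * max(h_i, h_j, w_i, w_j)
    # Condition 3: similar color.
    cond3 = color_sim >= T4
    return cond1 and cond2 and cond3
```

The thresholds default to the claim's values T_1..T_4 = 2, 4, 3, 0.4.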
(2) Candidate text-line generation:

To produce candidate text lines, three seed connected regions are first found and then expanded to include more connected regions, as follows:

Step 1: Let UL_cc denote the set of all connected regions that have not yet been assigned to a text line; it is initialized to the set of all connected regions, and a flag bit, initialized to 0, is set for each region. For each connected region in UL_cc, the similarities simi(i, *) between it and all other connected regions are computed; the two largest similarities are then taken out and summed, the sum being recorded as partSimi(CC_i). All partSimi values are then sorted in descending order;
Step 2: For any three connected regions CC_i ∈ UL_cc, CC_j ∈ UL_cc and CC_k ∈ UL_cc satisfying the conditions s_ij = 1 ∧ s_ik = 1, partSimi(CC_k) ≤ partSimi(CC_i) and partSimi(CC_j) ≤ partSimi(CC_i), the angle difference Δθ_ijk is computed with a formula (the equation image is not reproduced in this text) in which v(c_i c_j) and v(c_j c_k) denote the vectors c_i c_j and c_j c_k respectively.

Δθ_jik and Δθ_ikj are computed in the same way. If these angle differences satisfy the condition given by a further formula (equation image not reproduced in this text), a new text line L_t is produced, its elements are recorded as S_cc(L_t) = {CC_i, CC_j, CC_k}, the average angle of c_i c_j and c_j c_k is computed as the inclination angle of the current text line, and CC_i, CC_j and CC_k are removed from the set UL_cc. These three connected regions serve as the seed elements of the current text line;
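The angle difference Δθ_ijk used in this seed test is plain 2-D geometry: the angle between the vectors c_i→c_j and c_j→c_k, which is near zero when the three region centers are roughly collinear. A sketch (function names are illustrative):

```python
import math

def angle_between(v1, v2):
    """Angle, in degrees, between two 2-D vectors."""
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    cosang = max(-1.0, min(1.0, dot / (n1 * n2)))  # clamp rounding error
    return math.degrees(math.acos(cosang))

def delta_theta(ci, cj, ck):
    """Δθ_ijk: angle between vectors c_i->c_j and c_j->c_k, given
    the three region center points as (x, y) pairs."""
    v_ij = (cj[0] - ci[0], cj[1] - ci[1])
    v_jk = (ck[0] - cj[0], ck[1] - cj[1])
    return angle_between(v_ij, v_jk)
```

Three collinear centers give Δθ = 0; a right-angle bend gives Δθ = 90.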
Step 3: For any remaining connected region CC_m in UL_cc, its similarity to the current line L_t is computed with a formula (the equation image is not reproduced in this text). All simi values are sorted in descending order, and connected regions CC_t are taken from UL_cc in that order. If the following three conditions are satisfied, CC_t is added to S_cc(L_t):

a) among the K nearest neighbors of CC_t there is at least one CC_k ∈ S_cc(L_t) that is not only a sibling of CC_t but also such that the angle difference between the line joining its center point to that of CC_t and the average inclination angle of the current text line L_t is less than a threshold T_5.

The angle difference between any line segment c_i c_j and the line at the average inclination angle is calculated with a formula (equation image not reproduced in this text), and the value of T_5 is determined by a further formula in which D_ij denotes the distance between the connected-region center points c_i and c_j, and a mean term denotes the average distance between adjacent center points after the centers of all connected regions on line l have been arranged from left to right or from top to bottom;

b) CC_t is also in the set formed by the K nearest-neighbor connected regions of CC_k;

c) the distance between the center point of CC_t and the current line L_t is less than a threshold T_6.

K is set to 3, and T_6 is determined by a formula (equation image not reproduced in this text) in which h_t and w_t denote the height and width of CC_t respectively, θ is the angle between the positive direction of the X axis and the line joining the center points of CC_t and CC_k, and k′ = 1/3.

If the current connected region is added to S_cc(L_t), the set UL_cc and the average angle of the current line are updated, and this process is repeated until all elements in UL_cc have been processed; steps 1 to 3 are then repeated to search for another group of candidate text-line seeds until no text-line seed remains.
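The expansion of steps 1 to 3 can be condensed into a greedy loop. In the sketch below, `similarity` and `can_join` are illustrative stand-ins for the patent's partSimi/simi measures and for the sibling, angle-difference and distance tests of conditions a) to c):

```python
def expand_text_line(seed, regions, similarity, can_join):
    """Greedy line expansion sketch: starting from a seed triple,
    repeatedly consider unassigned regions in descending order of
    similarity to the current line and add the first one that passes
    the join test, until no region can be added."""
    line = list(seed)
    pool = [r for r in regions if r not in line]
    changed = True
    while changed:
        changed = False
        pool.sort(key=lambda r: similarity(line, r), reverse=True)
        for r in pool:
            if can_join(line, r):
                line.append(r)
                pool.remove(r)
                changed = True   # line changed; re-rank and retry
                break
    return line
```

With 1-D "regions" and a nearness test this grows a line along contiguous neighbors while rejecting outliers, mirroring how the patent's conditions keep a line coherent.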
4. The method for detecting text lines in a natural image according to claim 1, characterized in that the removal of non-text lines with the filter based on morphological skeleton features, where the filter uses a sparse classifier whose required feature vectors are taken from the morphological skeleton features of all regions on the candidate text line, comprises the following steps:
Step 1: Prepare training samples. Taking English as an example, prepare binary maps of the 26 letters and the 10 digits 0-9 in different fonts, one copy each in regular and italic form; rotate these binary maps by 90, 180 and 270 degrees and use the rotated binary maps as positive training samples as well; in addition, prepare the same number of non-text images as negative training samples;
Step 2: For each binary map, convert the size of its minimum enclosing rectangle to S_rh × S_rw with max(S_rh, S_rw) = S_rg, where S_rg = 32; that is, the longer side of the connected region is scaled to S_rg while the ratio of height to width is kept constant. Extract the skeleton of the connected region and likewise enlarge it to S_rh × S_rw; then extract the skeleton of the enlarged skeleton again and align its center with the center of a square block. Finally, the square block is converted into a vector of 32 × 32 = 1024 dimensions as the input vector of the sparse filter;
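A partial sketch of this step's size normalization in plain NumPy. The repeated skeleton extraction itself is omitted here; it would be delegated to a thinning routine such as skimage.morphology.skeletonize. Names are illustrative:

```python
import numpy as np

def skeleton_feature(binmap, S=32):
    """Scale a binary glyph so its longer side equals S while keeping
    the height/width ratio, center it in an S x S block, and flatten
    to an S*S-dimensional input vector (1024-dim for S = 32)."""
    h, w = binmap.shape
    scale = S / max(h, w)
    nh = max(1, round(h * scale))
    nw = max(1, round(w * scale))
    # Nearest-neighbour resize via integer index maps.
    rows = (np.arange(nh) * h / nh).astype(int)
    cols = (np.arange(nw) * w / nw).astype(int)
    resized = binmap[rows][:, cols]
    # Center the resized glyph in the S x S square block.
    block = np.zeros((S, S), dtype=binmap.dtype)
    r0, c0 = (S - nh) // 2, (S - nw) // 2
    block[r0:r0 + nh, c0:c0 + nw] = resized
    return block.reshape(-1)
```

A 16 × 8 glyph, for example, is scaled to 32 × 16, centered horizontally, and emitted as a 1024-dimensional vector.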
Step 3: Train with the Fisher classifier proposed on pages 543-550 of the proceedings of the IEEE conference on computer vision 2011, obtaining the trained text-region classifier Classifier;
Step 4: For a candidate text line L_t of any inclination direction, first rotate it by an angle θ_r, the purpose being to rotate it to the horizontal or vertical direction; θ_r is defined by a formula whose equation image is not reproduced in this text.
Step 5: For a candidate text line and the connected regions composing it, suppose each element's feature vector has a classification label (the symbol images are not reproduced in this text). The label of the whole text line is then defined by

C_T = k_2 · n    (19)

where k_2 is a control parameter, n is the number of connected regions of the current text line L_t, and k_2 is taken as 0.7. A label of 0 indicates that the current line is a text line and it is kept; otherwise it is discarded.
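Step 5's line-level decision can be read as a vote over the per-region labels. The sketch below assumes that formula (19)'s threshold C_T = k_2 · n counts regions classified as text (label 0, as in the claim), which is one plausible reading; the function name is illustrative:

```python
def keep_text_line(region_labels, k2=0.7):
    """Keep the line as text when at least C_T = k2 * n of its n
    connected regions carry the text label 0 (an assumed reading
    of the claim's formula 19)."""
    n = len(region_labels)
    c_t = k2 * n
    num_text = sum(1 for lbl in region_labels if lbl == 0)
    return num_text >= c_t
```

With k_2 = 0.7, a four-region line is kept when at least three of its regions are classified as text.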
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210506072.4A CN103136523B (en) | 2012-11-29 | 2012-11-29 | Any direction text line detection method in a kind of natural image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103136523A true CN103136523A (en) | 2013-06-05 |
CN103136523B CN103136523B (en) | 2016-06-29 |
Family
ID=48496331
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210506072.4A Expired - Fee Related CN103136523B (en) | 2012-11-29 | 2012-11-29 | Any direction text line detection method in a kind of natural image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103136523B (en) |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104778470A (en) * | 2015-03-12 | 2015-07-15 | 浙江大学 | Character detection and recognition method based on component tree and Hough forest |
CN105005764A (en) * | 2015-06-29 | 2015-10-28 | 东南大学 | Multi-direction text detection method of natural scene |
CN105678207A (en) * | 2014-11-19 | 2016-06-15 | 富士通株式会社 | Device and method for identifying content of target nameplate image from given image |
CN105825216A (en) * | 2016-03-17 | 2016-08-03 | 中国科学院信息工程研究所 | Method of locating text in complex background image |
CN106503732A (en) * | 2016-10-13 | 2017-03-15 | 北京云江科技有限公司 | Text image and the sorting technique and categorizing system of non-textual image |
CN106796647A (en) * | 2014-09-05 | 2017-05-31 | 北京市商汤科技开发有限公司 | Scene text detecting system and method |
CN107368830A (en) * | 2016-05-13 | 2017-11-21 | 佳能株式会社 | Method for text detection and device and text recognition system |
CN107368826A (en) * | 2016-05-13 | 2017-11-21 | 佳能株式会社 | Method and apparatus for text detection |
CN107688807A (en) * | 2016-08-05 | 2018-02-13 | 腾讯科技(深圳)有限公司 | Image processing method and image processing apparatus |
CN107784316A (en) * | 2016-08-26 | 2018-03-09 | 阿里巴巴集团控股有限公司 | A kind of image-recognizing method, device, system and computing device |
CN108288061A (en) * | 2018-03-02 | 2018-07-17 | 哈尔滨理工大学 | A method of based on the quick positioning tilt texts in natural scene of MSER |
CN108399419A (en) * | 2018-01-25 | 2018-08-14 | 华南理工大学 | Chinese text recognition methods in natural scene image based on two-dimentional Recursive Networks |
CN108875744A (en) * | 2018-03-05 | 2018-11-23 | 南京理工大学 | Multi-oriented text lines detection method based on rectangle frame coordinate transform |
CN109284751A (en) * | 2018-10-31 | 2019-01-29 | 河南科技大学 | The non-textual filtering method of text location based on spectrum analysis and SVM |
CN109934229A (en) * | 2019-03-28 | 2019-06-25 | 网易有道信息技术(北京)有限公司 | Image processing method, device, medium and calculating equipment |
CN110059600A (en) * | 2019-04-09 | 2019-07-26 | 杭州视氪科技有限公司 | A kind of single line text recognition methods based on direction gesture |
CN110211048A (en) * | 2019-05-28 | 2019-09-06 | 湖北华中电力科技开发有限责任公司 | A kind of complicated archival image Slant Rectify method based on convolutional neural networks |
CN111325210A (en) * | 2018-12-14 | 2020-06-23 | 北京京东尚科信息技术有限公司 | Method and apparatus for outputting information |
CN112560599A (en) * | 2020-12-02 | 2021-03-26 | 上海眼控科技股份有限公司 | Text recognition method and device, computer equipment and storage medium |
CN112883974A (en) * | 2021-05-06 | 2021-06-01 | 江西省江咨金发数据科技发展有限公司 | Electronic letter identification system based on image verification |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1512439A (en) * | 2002-12-26 | 2004-07-14 | Fujitsu Ltd. | Video frequency text processor |
US20060062460A1 (en) * | 2004-08-10 | 2006-03-23 | Fujitsu Limited | Character recognition apparatus and method for recognizing characters in an image |
CN102208023A (en) * | 2011-01-23 | 2011-10-05 | 浙江大学 | Method for recognizing and designing video captions based on edge information and distribution entropy |
CN102542268A (en) * | 2011-12-29 | 2012-07-04 | 中国科学院自动化研究所 | Method for detecting and positioning text area in video |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1512439A (en) * | 2002-12-26 | 2004-07-14 | Fujitsu Ltd. | Video frequency text processor |
US20060062460A1 (en) * | 2004-08-10 | 2006-03-23 | Fujitsu Limited | Character recognition apparatus and method for recognizing characters in an image |
CN102208023A (en) * | 2011-01-23 | 2011-10-05 | 浙江大学 | Method for recognizing and designing video captions based on edge information and distribution entropy |
CN102542268A (en) * | 2011-12-29 | 2012-07-04 | 中国科学院自动化研究所 | Method for detecting and positioning text area in video |
Non-Patent Citations (2)
Title |
---|
JIE YUAN ET AL.: "A New Video Text Detection Method", 《JCDL`11 PROCEEDINGS OF THE 11TH ANNUAL INTERNATIONAL ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES》 *
ZHANG YIN ET AL.: "A New Text Extraction Method for Color Images and Video", 《JOURNAL OF COMPUTER-AIDED DESIGN & COMPUTER GRAPHICS》 *
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106796647B (en) * | 2014-09-05 | 2018-09-14 | 北京市商汤科技开发有限公司 | Scene text detecting system and method |
CN106796647A (en) * | 2014-09-05 | 2017-05-31 | 北京市商汤科技开发有限公司 | Scene text detecting system and method |
CN105678207A (en) * | 2014-11-19 | 2016-06-15 | 富士通株式会社 | Device and method for identifying content of target nameplate image from given image |
CN104778470A (en) * | 2015-03-12 | 2015-07-15 | 浙江大学 | Character detection and recognition method based on component tree and Hough forest |
CN105005764B (en) * | 2015-06-29 | 2018-02-13 | 东南大学 | The multi-direction Method for text detection of natural scene |
CN105005764A (en) * | 2015-06-29 | 2015-10-28 | 东南大学 | Multi-direction text detection method of natural scene |
CN105825216A (en) * | 2016-03-17 | 2016-08-03 | 中国科学院信息工程研究所 | Method of locating text in complex background image |
CN107368826A (en) * | 2016-05-13 | 2017-11-21 | 佳能株式会社 | Method and apparatus for text detection |
CN107368830A (en) * | 2016-05-13 | 2017-11-21 | 佳能株式会社 | Method for text detection and device and text recognition system |
CN107368830B (en) * | 2016-05-13 | 2021-11-09 | 佳能株式会社 | Text detection method and device and text recognition system |
CN107688807A (en) * | 2016-08-05 | 2018-02-13 | 腾讯科技(深圳)有限公司 | Image processing method and image processing apparatus |
CN107784316A (en) * | 2016-08-26 | 2018-03-09 | 阿里巴巴集团控股有限公司 | A kind of image-recognizing method, device, system and computing device |
CN106503732A (en) * | 2016-10-13 | 2017-03-15 | 北京云江科技有限公司 | Text image and the sorting technique and categorizing system of non-textual image |
CN106503732B (en) * | 2016-10-13 | 2019-07-19 | 北京云江科技有限公司 | The classification method and categorizing system of text image and non-textual image |
CN108399419A (en) * | 2018-01-25 | 2018-08-14 | 华南理工大学 | Chinese text recognition methods in natural scene image based on two-dimentional Recursive Networks |
CN108399419B (en) * | 2018-01-25 | 2021-02-19 | 华南理工大学 | Method for recognizing Chinese text in natural scene image based on two-dimensional recursive network |
CN108288061A (en) * | 2018-03-02 | 2018-07-17 | 哈尔滨理工大学 | A method of based on the quick positioning tilt texts in natural scene of MSER |
CN108875744A (en) * | 2018-03-05 | 2018-11-23 | 南京理工大学 | Multi-oriented text lines detection method based on rectangle frame coordinate transform |
CN109284751A (en) * | 2018-10-31 | 2019-01-29 | 河南科技大学 | The non-textual filtering method of text location based on spectrum analysis and SVM |
CN111325210A (en) * | 2018-12-14 | 2020-06-23 | 北京京东尚科信息技术有限公司 | Method and apparatus for outputting information |
CN109934229A (en) * | 2019-03-28 | 2019-06-25 | 网易有道信息技术(北京)有限公司 | Image processing method, device, medium and calculating equipment |
CN110059600A (en) * | 2019-04-09 | 2019-07-26 | 杭州视氪科技有限公司 | A kind of single line text recognition methods based on direction gesture |
CN110059600B (en) * | 2019-04-09 | 2021-07-06 | 杭州视氪科技有限公司 | Single-line character recognition method based on pointing gesture |
CN110211048A (en) * | 2019-05-28 | 2019-09-06 | 湖北华中电力科技开发有限责任公司 | A kind of complicated archival image Slant Rectify method based on convolutional neural networks |
CN112560599A (en) * | 2020-12-02 | 2021-03-26 | 上海眼控科技股份有限公司 | Text recognition method and device, computer equipment and storage medium |
CN112883974A (en) * | 2021-05-06 | 2021-06-01 | 江西省江咨金发数据科技发展有限公司 | Electronic letter identification system based on image verification |
Also Published As
Publication number | Publication date |
---|---|
CN103136523B (en) | 2016-06-29 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103136523B (en) | Any direction text line detection method in a kind of natural image | |
Nikolaou et al. | Segmentation of historical machine-printed documents using adaptive run length smoothing and skeleton segmentation paths | |
Antonacopoulos et al. | ICDAR2015 competition on recognition of documents with complex layouts-RDCL2015 | |
JP5492205B2 (en) | Segment print pages into articles | |
CN102081731B (en) | Method and device for extracting text from image | |
US8462394B2 (en) | Document type classification for scanned bitmaps | |
Alberti et al. | Labeling, cutting, grouping: an efficient text line segmentation method for medieval manuscripts | |
CN101122952A (en) | Picture words detecting method | |
CN103530600A (en) | License plate recognition method and system under complicated illumination | |
CN108154151B (en) | Rapid multi-direction text line detection method | |
Rigaud et al. | Automatic text localisation in scanned comic books | |
Shivakumara et al. | Gradient-angular-features for word-wise video script identification | |
Forczmański et al. | Stamps detection and classification using simple features ensemble | |
Unar et al. | Artificial Urdu text detection and localization from individual video frames | |
Mullick et al. | An efficient line segmentation approach for handwritten Bangla document image | |
Kunishige et al. | Scenery character detection with environmental context | |
CN107368826B (en) | Method and apparatus for text detection | |
Melinda et al. | Parameter-free table detection method | |
Zhan et al. | A robust split-and-merge text segmentation approach for images | |
Shelke et al. | A novel multistage classification and wavelet based kernel generation for handwritten marathi compound character recognition | |
Ziaratban et al. | An adaptive script-independent block-based text line extraction | |
Lue et al. | A novel character segmentation method for text images captured by cameras | |
Phan et al. | Text detection in natural scenes using gradient vector flow-guided symmetry | |
Ahmed et al. | Enhancing the character segmentation accuracy of bangla ocr using bpnn | |
Nguyen et al. | An effective method for text line segmentation in historical document images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 20160629 Termination date: 20191129 |