CN108875744A

CN108875744A - Multi-oriented text line detection method based on rectangular box coordinate transformation

Info

Publication number: CN108875744A
Application number: CN201810179236.4A
Authority: CN
Inventors: 项欣光; 张丽飞
Original assignee: Nanjing University of Science and Technology
Current assignee: Nanjing University of Science and Technology
Priority date: 2018-03-05
Filing date: 2018-03-05
Publication date: 2018-11-23
Anticipated expiration: 2038-03-05
Also published as: CN108875744B

Abstract

The present invention provides a multi-oriented text line detection method based on rectangular box coordinate transformation, comprising: inputting an image to be detected, splitting it into Y, R, G, and B channels, and obtaining the corresponding inverted channels; connecting candidate character regions into valid character pairs using a connection algorithm based on distance and similarity; extracting candidate characters from each channel image by fusing an ER algorithm based on feature filtering with an MSER algorithm based on an aspect-ratio constraint; applying a coordinate transformation to candidate characters that remain unconnected and then performing character pair connection again; connecting text lines using a method based on a text-line linearity constraint; counting the characters in each text line that have undergone the coordinate transformation and, if they exceed half of the characters in the line, applying the inverse coordinate transformation to the text line rectangular box; and filtering text lines using a method based on template matching and statistical features of the characters in the text line, to obtain the final text line detection result.

Description

Multi-oriented text line detection method based on rectangular box coordinate transformation

Technical field

The present invention relates to scene text detection technology in the field of computer vision, and in particular to a multi-oriented text line detection method based on rectangular box coordinate transformation.

Background art

With the spread of smartphones and mobile networks, acquiring and transmitting pictures has become easier, and pictures, as carriers of information, appear ever more widely in daily life. Pictures carry rich information; the text in a picture aids understanding of its content, and the text itself may also be the focus of the user's attention. Text detection in natural scene pictures has therefore gradually become a popular research direction in computer vision, with broad application scenarios: it can be used for multimedia content understanding and retrieval; it can serve as a novel input and archiving method; it can enable more intelligent applications, such as photo translation; and it can assist industrial automation and autonomous driving.

The detection precision and recall for text in scene pictures have improved year by year and detection time has shortened, but they still cannot meet the demands of practical applications. The main challenges are the complexity of the backgrounds against which text appears in natural scene pictures; the diversity of text layout and appearance; and the image quality problems caused by uncertain shooting environments and shooting techniques. Moreover, existing scene text detection methods mostly focus on horizontal text lines. The present invention proposes a multi-oriented text line detection method based on rectangular box rotation that achieves good detection results on horizontal, vertical, and inclined text lines.

Summary of the invention

The purpose of the present invention is to provide a multi-oriented text line detection method based on rectangular box coordinate transformation, comprising the following steps:

Step 1, input an image to be detected, split it into Y, R, G, and B channels, and obtain the corresponding inverted channels;

Step 2, connect candidate character regions into valid character pairs using a connection algorithm based on distance and similarity;

Step 3, for each channel image, extract candidate characters by fusing the ER algorithm based on feature filtering with the MSER algorithm based on an aspect-ratio constraint;

Step 4, apply a coordinate transformation to the candidate characters that remain unconnected, then perform character pair connection again;

Step 5, connect text lines using a method based on a text-line linearity constraint;

Step 6, count the characters in each text line that have undergone the coordinate transformation; if they exceed half of the characters in the line, apply the inverse coordinate transformation to the text line rectangular box;

Step 7, filter text lines using a method based on template matching and statistical features of the characters in the text line, to obtain the final text line detection result.

When detecting multi-oriented text lines, this method rotates the character rectangular boxes and text line rectangular boxes, so candidate characters need to be extracted only once, whereas rotating the picture would require two extractions. Meanwhile, the text line connection stage still relies on the height-direction overlap feature, which, compared with existing multi-oriented text line algorithms, narrows the search range for text line connection, improves detection efficiency, and shortens detection time. Experiments also show that this method achieves better detection results than some connected-component-based algorithms.

The invention is further described below with reference to the accompanying drawings.

Brief description of the drawings

Fig. 1 is a flow chart of the invention.

Fig. 2 is a schematic diagram of the features used to judge valid character pairs.

Fig. 3 shows detection results of the multi-oriented text line detection method based on rectangular box rotation.

Detailed description of the embodiments

With reference to Fig. 1, a multi-oriented text line detection method based on rectangular box coordinate transformation comprises the following steps:

In step 1, to detect both light text on a dark background and dark text on a light background, the inverse of each channel is also processed, i.e. 255 - x, where x is the pixel value in the original image channel.
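The channel preparation in step 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the source does not say how the Y channel is computed, so the BT.601 luma weights used here are an assumption, and the names `luma` and `split_channels` are hypothetical.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each in 0..255
Channel = List[List[int]]      # rows of 8-bit values


def luma(r: int, g: int, b: int) -> int:
    # Assumption: Y is BT.601 luma; the source does not define it.
    return min(255, round(0.299 * r + 0.587 * g + 0.114 * b))


def split_channels(image: List[List[Pixel]]) -> Dict[str, Channel]:
    """Return the Y, R, G, B channels and their inverses (255 - x)."""
    channels: Dict[str, Channel] = {
        "Y": [[luma(*p) for p in row] for row in image],
        "R": [[p[0] for p in row] for row in image],
        "G": [[p[1] for p in row] for row in image],
        "B": [[p[2] for p in row] for row in image],
    }
    # The inverted channel lets the same extractor find light-on-dark text.
    for name in ("Y", "R", "G", "B"):
        channels[name + "_inv"] = [[255 - x for x in row] for row in channels[name]]
    return channels
```

Running candidate extraction on all eight channels means a single extractor tuned for one polarity covers both dark-on-light and light-on-dark text.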

Step 2 specifically comprises the following steps:

Step 2.1, compute the following features between each candidate character pair:

(1) the relative distance d'_ij between the candidate character pair;

(2) the overlap f1 of the candidate character pair in the height direction;

(3) the height ratio f2 of the candidate character rectangular boxes;

(4) the ratio f3 of the mean stroke widths;

(5) the pixel mean differences f4 and f5 over the RGB and Lab channels;

Step 2.2, according to heuristic rules, coarsely filter out pairs that cannot be connected into a valid character pair, i.e. pairs that meet any one of the following conditions:

(1) one candidate character region contains the other, i.e. i ∪ j == i || i ∪ j == j;

(2) the horizontal starting points of the top-left vertices of the two candidate characters are identical, i.e. i.rect.x == j.rect.x;

(3) the overlap in the height direction is too small, indicating that the two are unlikely to belong to the same text line, i.e. f1 < 0.2;

(4) the relative distance is too large, i.e. d'_ij ≥ 2.5;

Step 2.3, the measuring similarity using the feature other than relative distance as character, training character is to connection AdaBoost classifier；Wherein, the positive example of training set is the character pair after heuristic rule filters in one text row, and counter-example is The connection pair of character and noise region composition after heuristic rule filtering；

Step 2.4, character is obtained to the confidence value of connection using trained classifier, and is set according to the distance of distance Dual threshold is determined, wherein the first similarity threshold of the character pair being closer can set smaller, apart from the of farther away character pair Two similarity thresholds are higher；It is otherwise idle character pair if distance and similarity meet following formula character to for significant character pair

μ is the confidence value for obtaining character to connection using trained classifier, μ in formula (1)₁And μ₂Respectively first Similarity threshold and the second similarity threshold can obtain μ according to experiment₁=0.5, μ₂=0.8.
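The coarse filter of step 2.2 and the dual-threshold decision of step 2.4 can be sketched as below. Several points are assumptions rather than statements of the source: containment is tested on bounding rectangles instead of pixel regions, the distance `d_split` that separates "close" from "far" pairs is a hypothetical parameter the caller must supply (this text does not reproduce it), and the comparison `mu >= threshold` is one plausible reading of formula (1). The features `f1` and `d_rel` (d'_ij) are taken as already computed.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float  # top-left x
    y: float  # top-left y
    w: float  # width
    h: float  # height


def contains(a: Rect, b: Rect) -> bool:
    """True if rectangle a fully contains rectangle b."""
    return (a.x <= b.x and a.y <= b.y
            and a.x + a.w >= b.x + b.w and a.y + a.h >= b.y + b.h)


def passes_prefilter(ri: Rect, rj: Rect, f1: float, d_rel: float) -> bool:
    """Step 2.2: reject a pair that meets any of the four heuristic conditions."""
    if contains(ri, rj) or contains(rj, ri):  # (1) one region contains the other
        return False
    if ri.x == rj.x:                          # (2) identical top-left x
        return False
    if f1 < 0.2:                              # (3) too little height overlap
        return False
    if d_rel >= 2.5:                          # (4) relative distance too large
        return False
    return True


def is_valid_pair(mu: float, d_rel: float, d_split: float,
                  mu1: float = 0.5, mu2: float = 0.8) -> bool:
    """Step 2.4: lower threshold for close pairs, higher for distant ones.

    d_split is hypothetical; mu1 and mu2 are the values given in the text.
    """
    threshold = mu1 if d_rel < d_split else mu2
    return mu >= threshold
```

Filtering before classification keeps obviously impossible pairs away from the AdaBoost classifier, so the classifier only has to separate plausible pairs from noise.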

With reference to Fig. 2, the features in step 2.1 are as follows:

(1) the relative distance d'_ij between a character pair, where i and j denote the two candidate character regions and w_i, w_j denote the widths of the character rectangular boxes

d'_ij = |d_ij| / max(w_i, w_j) (2)

(2) the overlap f1 in the height direction, where rect is the rectangular box of a candidate character, rect.y is the y-coordinate of its top-left corner, br() is the bottom-right corner of the rectangular box, and h_i, h_j are the heights of the character rectangular boxes; if there is no overlap in the height direction the value is negative, and its maximum value is 1

(3) the height ratio f2 of the candidate character rectangular boxes

(4) the ratio f3 of the mean stroke widths, where s_i, s_j denote the mean stroke widths of the candidate characters

(5) the channel pixel mean differences: the difference f4 over the RGB channels and the difference f5 over the Lab channels

where R, G, and B are the RGB channels, and a and b are the a and b channels of the Lab color space.

Step 3 specifically comprises the following steps:

Step 3.1, extract candidate characters on each channel using the ER algorithm based on feature filtering;

Step 3.2, extract candidate characters on each channel using the MSER algorithm based on an aspect-ratio constraint;

Step 3.3, taking the candidate character regions obtained in step 3.1 as seed nodes, find among the regions of step 3.2, following the procedure described in step 2, those that can form a valid character pair with a seed node, and add them to the candidate characters of step 3.1 as the final candidate character extraction result.
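The fusion in step 3.3 amounts to growing the ER result with those MSER regions that pair with an ER seed. A minimal sketch follows, assuming the pairing test of step 2 is available as a callable; `merge_candidates` and `is_valid_pair` are hypothetical names, and the region type is left generic.

```python
from typing import Callable, List, TypeVar

R = TypeVar("R")  # any region representation that supports ==


def merge_candidates(seeds: List[R],
                     mser_regions: List[R],
                     is_valid_pair: Callable[[R, R], bool]) -> List[R]:
    """Keep every ER seed; add MSER regions that pair with at least one seed."""
    result = list(seeds)
    for region in mser_regions:
        if region in result:
            continue  # already present, avoid duplicates
        if any(is_valid_pair(seed, region) for seed in seeds):
            result.append(region)
    return result
```

Only the original ER seeds are used for the pairing test here, so an MSER region admitted in this pass does not itself pull in further MSER regions; whether the source intends that is not stated.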

In step 4, because overlap in the height direction is not rotation invariant, obliquely and vertically arranged text lines are detected by applying a 90-degree clockwise coordinate transformation to the characters that have not formed a valid character pair and then connecting valid character pairs according to the method in step 2. The coordinate transformation formula is:

rect₁.height = rect₂.width

rect₁.width = rect₂.height

rect₁.x₁ = rect₂.y₁

rect₁.y₁ = img₂.width - rect₂.x₁ - rect₂.width (8)

where img₁ is the original image, rect₁ is the text line rectangular box in the original image, img₂ is the picture after a 90-degree clockwise rotation, rect₂ is the rectangular box in it, (x₁, y₁) is the coordinate of the top-left vertex of a rectangular box, and width and height are its width and height.
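The box rotation can be sketched in both directions. `rotate_back` follows the four equations given for step 4.1 in the claims (rect₁ expressed in terms of rect₂); `rotate_cw` is the forward mapping obtained by inverting those equations, using the fact that img₂.width equals img₁.height after a 90-degree rotation. The function and class names are hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: int       # top-left x
    y: int       # top-left y
    width: int
    height: int


def rotate_cw(rect1: Rect, img1_height: int) -> Rect:
    """Map a box in the original image into the image rotated 90 degrees clockwise."""
    return Rect(x=img1_height - rect1.y - rect1.height,
                y=rect1.x,
                width=rect1.height,
                height=rect1.width)


def rotate_back(rect2: Rect, img2_width: int) -> Rect:
    """Inverse transform: map a box in the rotated image back to the original."""
    return Rect(x=rect2.y,
                y=img2_width - rect2.x - rect2.width,
                width=rect2.height,
                height=rect2.width)
```

For a 100 x 60 original image, `rotate_cw(Rect(10, 5, 30, 8), img1_height=60)` gives `Rect(47, 10, 8, 30)`, and `rotate_back` of that box with `img2_width=60` returns the original box. Only box coordinates are transformed; the pixels are never rotated, which is why a single candidate extraction pass suffices.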

The detailed process of step 5 is:

Step 5.1, connect the valid character pairs obtained in steps 3 and 4 three characters at a time to form valid triples; the connection criteria are:

(1) the two character pairs share a character, and the remaining characters lie on either side of the shared character;

(2) fit a straight line to the center points of the three character rectangular boxes by the least squares method, and compute the fitting error and the bounding rectangle of the three characters; if the fitting error is greater than 1/6 of the row height of that rectangle, the triple is invalid; otherwise it is retained;

Step 5.2, estimate the distance between the fitted straight lines of the valid triples obtained in step 5.1. First, treat each triple as a character sequence. Compute the center point of the leftmost character rectangular box and the center point of the rightmost character rectangular box of the merged sequence, substitute the coordinates of these two points into the fitted lines of the sequences before merging, find the distance between the two lines at the leftmost point and at the rightmost point, and take the larger of the two distances as the fitting error of the two lines; the fitting error is calculated as follows:

where l₁ and l₂ denote the center-point fitted lines of the two original sequences, x and x' denote the x-axis coordinates of the leftmost and rightmost center points of the merged region, x₁ and x₂ denote x-axis coordinates of the character rectangular boxes of the original character sequences, and h denotes the row height of the sequence after fitting;

Step 5.3, estimate the ratio of the horizontal spacing between character sequences to the height, as the distance between the character sequences;

If the distance between the fitted lines and the horizontal spacing are both less than given thresholds, the character sequences are merged; otherwise they are not merged;

Step 5.4, merge the character sequences successively to obtain text lines;

Step 5.5, count the rotated characters in each text line obtained in step 5.4; if the rotated characters account for more than half of all characters in the text line, the text line is vertical or has a large inclination angle, and its rectangular box is restored to its position in the original image by the inverse coordinate transformation according to formula (8).

In step 6, count the rotated characters in each text line; if the rotated characters account for more than half of all characters in the text line, the text line is vertical or has a large inclination angle, and its rectangular box is restored to its position in the original image by the inverse coordinate transformation according to the formula in step 4.1.
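The triple test of step 5.1, criterion (2), can be sketched as a least-squares fit through three box centers followed by a comparison against one sixth of the row height. The source does not define the fitting error, so the largest vertical residual is used here as an assumption; treating a degenerate fit (all centers at the same x) as invalid is likewise an assumption, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, w, h), top-left origin


def fit_line(points: List[Tuple[float, float]]) -> Optional[Tuple[float, float]]:
    """Least-squares fit y = k*x + b; returns None if all x are equal."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    if sxx == 0:
        return None
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    k = sxy / sxx
    return k, mean_y - k * mean_x


def is_valid_triple(boxes: List[Box]) -> bool:
    """Keep the triple if the fitting error is at most 1/6 of the row height."""
    centers = [(x + w / 2, y + h / 2) for x, y, w, h in boxes]
    line = fit_line(centers)
    if line is None:
        return False  # assumption: a vertical stack is not a valid triple here
    k, b = line
    # Assumption: fitting error is the largest vertical residual.
    error = max(abs(cy - (k * cx + b)) for cx, cy in centers)
    top = min(y for _, y, _, _ in boxes)
    bottom = max(y + h for _, y, _, h in boxes)
    row_height = bottom - top  # height of the rectangle enclosing the triple
    return error <= row_height / 6
```

Because vertical and steeply inclined lines have already been rotated in step 4, the fit can stay in the y = kx + b form without special handling for near-vertical lines.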

Step 7 specifically comprises the following steps:

Step 7.1, use template matching on the candidate character connected-component picture and on the original picture to measure the similarity of the candidate character regions within a text line; the matching criterion is the normalized correlation coefficient method TM_CCOEFF_NORMED, and a matching value greater than 0.8 is defined as identical; count how many regions in the text line each character rectangular box region is identical to, and record the counts in an array; take the median of the array; if the median is greater than half the number of characters in the text line, at least half of the candidate characters in the text line are highly similar to one another, so the text line is marked as one to be filtered out;

Step 7.2, because of the aspect-ratio constraint, the candidate characters extracted by the MSER channel have low precision; count the candidate characters in the text line that come from this channel; if the count is greater than 60% of the total number of characters in the text line, the confidence of the text line is low and it is filtered out;

Step 7.3, use the AdaBoost classifier to compute a confidence for each character in the text line; a region whose confidence is less than 0.5 is marked as noise; count the noise regions in the text line, and if the count exceeds half the number of characters in the text line, filter out the text line;

Step 7.4, the text lines retained after filtering are the final detection result.
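The three filters of steps 7.1 to 7.3 can be sketched on precomputed inputs. The sketch assumes the pairwise template-matching scores have already been obtained elsewhere (for example with OpenCV's `cv2.matchTemplate` and `TM_CCOEFF_NORMED`) and are passed in as a square matrix; excluding a character's match with itself from the count is an assumption, and the function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def is_repeated_pattern(similarity: Sequence[Sequence[float]],
                        match_thr: float = 0.8) -> bool:
    """Step 7.1: similarity[i][j] is the matching score of characters i and j."""
    n = len(similarity)
    # For each character, count the other characters it is "identical" to.
    counts = [sum(1 for j in range(n) if j != i and similarity[i][j] > match_thr)
              for i in range(n)]
    return median(counts) > n / 2


def keep_text_line(similarity: Sequence[Sequence[float]],
                   from_mser: List[bool],
                   confidences: List[float]) -> bool:
    """Return True if the text line survives all three filters."""
    n = len(confidences)
    if is_repeated_pattern(similarity):        # step 7.1: repetitive texture
        return False
    if sum(from_mser) > 0.6 * n:               # step 7.2: mostly MSER candidates
        return False
    noise = sum(1 for c in confidences if c < 0.5)
    if noise > n / 2:                          # step 7.3: mostly noise regions
        return False
    return True
```

The first filter targets repetitive structures such as fences or window grids, where nearly every "character" looks like every other one, which is rare in genuine text.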

Claims

1. A multi-oriented text line detection method based on rectangular box coordinate transformation, characterized by comprising the following steps:

Step 3, for each channel image, extract candidate characters by fusing the ER algorithm based on feature filtering with the MSER algorithm based on an aspect-ratio constraint;

Step 6, count the characters in each text line that have undergone the coordinate transformation; if they exceed half of the characters in the line, apply the inverse coordinate transformation to the text line rectangular box;

Step 7, filter text lines using a method based on template matching and statistical features of the characters in the text line, to obtain the final text line detection result.

2. The method according to claim 1, characterized in that step 2 specifically comprises the following steps:

(1) the relative distance d'_ij between the candidate character pair;

(2) the overlap f1 of the candidate character pair in the height direction;

(3) the height ratio f2 of the candidate character rectangular boxes;

(4) the ratio f3 of the mean stroke widths;

(5) the pixel mean differences f4 and f5 over the RGB and Lab channels;

Step 2.2, according to heuristic rules, coarsely filter out pairs that cannot be connected into a valid character pair, i.e. pairs that meet any one of the following conditions:

(1) one candidate character region contains the other;

(2) the horizontal starting points of the top-left vertices of the two candidate characters are identical;

(3) the overlap in the height direction is too small;

(4) the relative distance is too large;

Step 2.4, use the trained classifier to obtain the confidence value of a character pair connection, and set dual thresholds according to distance: the first similarity threshold, for character pairs that are close together, may be set lower, and the second similarity threshold, for character pairs that are farther apart, is higher; if the distance and similarity satisfy the following formula, the character pair is a valid character pair; otherwise it is an invalid character pair

3. The method according to claim 2, characterized in that the features in step 2.1 are as follows:

(1) the relative distance d'_ij between a character pair, where i and j denote the two candidate character regions and w_i, w_j denote the widths of the character rectangular boxes

d'_ij = |d_ij| / max(w_i, w_j)

(2) the overlap f1 in the height direction, where rect is the rectangular box of a candidate character, rect.y is the y-coordinate of its top-left corner, br() is the bottom-right corner of the rectangular box, and h_i, h_j are the heights of the character rectangular boxes; if there is no overlap in the height direction the value is negative, and its maximum value is 1

(3) the height ratio f2 of the candidate character rectangular boxes

4. The method according to claim 2, characterized in that step 3 specifically comprises the following steps:

Step 3.3, taking the candidate character regions obtained in step 3.1 as seed nodes, find among the regions of step 3.2 those that can form a valid character pair with a seed node, and add them to the candidate characters of step 3.1 as the final candidate character extraction result.

5. The method according to claim 4, characterized in that step 4 specifically comprises the following steps:

Step 4.1, because overlap in the height direction is not rotation invariant, to detect obliquely and vertically arranged text lines, apply a 90-degree clockwise coordinate transformation to the characters that have not formed a valid character pair; the coordinate transformation formula is:

rect₁.height = rect₂.width

rect₁.width = rect₂.height

rect₁.x₁ = rect₂.y₁

rect₁.y₁ = img₂.width - rect₂.x₁ - rect₂.width

where img₁ is the original image, rect₁ is the text line rectangular box in the original image, img₂ is the picture after a 90-degree clockwise rotation, rect₂ is the rectangular box in it, (x₁, y₁) is the coordinate of the top-left vertex of a rectangular box, and width and height are its width and height;

Step 4.2, perform valid character pair connection using the method of step 2.

6. The method according to claim 5, characterized in that in step 6 the rotated characters in each text line are counted; if the rotated characters account for more than half of all characters in the text line, the text line is vertical or has a large inclination angle, and its rectangular box is restored to its position in the original image by the inverse coordinate transformation according to the formula in step 4.1.

7. The method according to claim 6, characterized in that step 7 specifically comprises the following steps:

Step 7.1, use template matching on the candidate character connected-component picture and on the original picture to measure the similarity of the candidate character regions within a text line; the matching criterion is the normalized correlation coefficient method TM_CCOEFF_NORMED, and a matching value greater than 0.8 is defined as identical; count how many regions in the text line each character rectangular box region is identical to, and record the counts in an array; take the median of the array; if the median is greater than half the number of characters in the text line, at least half of the candidate characters in the text line are highly similar to one another, so the text line is marked as one to be filtered out;

Step 7.2, because of the aspect-ratio constraint, the candidate characters extracted by the MSER channel have low precision; count the candidate characters in the text line that come from this channel; if the count is greater than 60% of the total number of characters in the text line, the confidence of the text line is low and it is filtered out;

Step 7.3, use the AdaBoost classifier to compute a confidence for each character in the text line; a region whose confidence is less than 0.5 is marked as noise; count the noise regions in the text line, and if the count exceeds half the number of characters in the text line, filter out the text line;