CN108647681B - A kind of English text detection method with text orientation correction - Google Patents

A kind of English text detection method with text orientation correction Download PDF

Info

Publication number
CN108647681B
CN108647681B (application CN201810429149.XA / CN201810429149A)
Authority
CN
China
Prior art keywords
text
region
text region
preliminary
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810429149.XA
Other languages
Chinese (zh)
Other versions
CN108647681A (en)
Inventor
代劲
王族
尹航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN201810429149.XA priority Critical patent/CN108647681B/en
Publication of CN108647681A publication Critical patent/CN108647681A/en
Application granted granted Critical
Publication of CN108647681B publication Critical patent/CN108647681B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63Scene text, e.g. street names
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Abstract

The invention belongs to the technical field of image processing, and in particular relates to an English text detection method with text orientation correction. The method comprises: performing maximally stable extremal region (MSER) detection on each channel of an English text image to obtain candidate text regions; building a classifier based on a convolutional neural network model and filtering out false candidate text regions to obtain preliminary text regions; grouping the preliminary text regions with a two-layer text grouping algorithm; and performing direction correction on the grouped preliminary text regions to obtain the corrected text. The invention uses an enhanced multi-channel MSER model to obtain finer text regions, and introduces a parallel SPP-CNN classifier to better distinguish text regions from non-text regions; the classifier can handle images of arbitrary size and extract pooled features at multiple scales, so that more features can be understood through the multi-layer spatial information of the source image. The invention can handle slightly inclined scene text.

Description

A kind of English text detection method with text orientation correction
Technical field
The invention belongs to the technical field of image processing, and in particular relates to an English text detection method with text orientation correction.
Background technique
Text in natural scene images carries accurate and rich information, which is of great significance for image analysis, image-based translation, image search and so on. Over the past 20 years, researchers have proposed a number of methods for detecting text in natural scene images. Many content-based multimedia understanding applications rely on such text, including automatic visual classification, image retrieval, assisted navigation, multilingual translation, object recognition and other consumer-oriented applications.
The key difficulties faced by scene text detection are: (1) text in document images has regular fonts, similar colors, uniform size and even distribution, whereas text in natural scenes may have different fonts, colors, scales and directions even within the same scene; (2) the background of a natural scene image may be extremely complex: signs, fences, bricks and grass are difficult to distinguish from real text and therefore easily cause confusion and errors; (3) other interference factors exist in scene text images, such as non-uniform illumination, blur and translucency effects.
Researchers have proposed many methods to detect text in natural scene images, which fall into two main categories.
Texture-based methods regard text as a specific type of texture and use texture properties such as local intensity, filter responses and wavelet coefficients to distinguish text regions from non-text regions of an image. These methods are usually computationally expensive, because all positions and scales have to be scanned. In addition, they mainly handle horizontal text and are very sensitive to rotation and scaling.
Component-based methods regard text as connected components: text is first extracted by various methods (such as color clustering or extremal region extraction), and then non-text components are filtered out using manually designed rules or automatically trained classifiers. Component-based methods are generally more efficient, because the number of components to be processed is relatively small, and they are insensitive to rotation, scaling and font. A conventional method for detecting candidate text regions (Candidate Text Region, CTR) is maximally stable extremal regions (Maximally Stable Extremal Regions, MSER), which is highly robust to affine variations of the image and can efficiently extract the text regions in an image; the MSER extraction algorithm has since been improved so that its time complexity is linear.
These methods separate text regions from non-text regions according to rules or features that distinguish them. Although they can detect text, they lack a correction step for English text and perform poorly on inclined text; the recognized text may be severely split because the words are tilted.
Summary of the invention
In view of this, the invention proposes an English text detection method with text orientation correction, which can effectively identify text and correct the identified inclined text. It specifically includes the following steps:
S1. Maximally stable extremal region detection is performed on each channel of the sharpened English text image, and the MSERs are extracted from the image as text candidates to obtain candidate text regions.
S2. A classifier based on a convolutional neural network model is built to extract the features of the candidate text regions; according to these features, the softmax function divides the candidate text regions into text-class regions and non-text-class regions; the non-text-class regions are filtered out to obtain the preliminary text regions, i.e., the detected English text.
S3. The preliminary text regions are grouped by a two-layer text grouping algorithm.
S4. Direction correction is performed on the grouped preliminary text regions to realize the correction of the English text.
Further, the channels include: the red channel, green channel, blue channel, hue channel, saturation channel, value (lightness) channel and gray channel.
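As an illustration of step S1, the following sketch (an assumption added for clarity, not part of the patent text) extracts MSERs from the seven channels named above using OpenCV; the function name `extract_candidate_regions` and the default MSER parameters are hypothetical choices, not values specified by the patent.

```python
import cv2
import numpy as np

def extract_candidate_regions(bgr_image):
    """Minimal sketch: run MSER detection on the R, G, B, H, S, V and gray channels."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    b, g, r = cv2.split(bgr_image)
    h, s, v = cv2.split(hsv)
    channels = [r, g, b, h, s, v, gray]

    mser = cv2.MSER_create()          # default parameters; the patent does not specify them
    candidates = []
    for channel in channels:
        regions, boxes = mser.detectRegions(channel)
        candidates.extend(zip(regions, boxes))   # each region is a list of pixel coordinates
    return candidates
```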
Further, building the classifier based on the convolutional neural network model and extracting the features of the candidate text regions includes: obtaining the first features of the candidate text regions through a five-layer architecture in the classifier, and obtaining the second features of the candidate text regions through a cross-layer, wherein the five-layer architecture comprises, connected in sequence, a first convolutional layer, a max-pooling layer, a second convolutional layer, a pyramid pooling layer and a fully connected layer; the cross-layer connects the first convolutional layer to the fully connected layer.
Further, the candidate text regions are filtered for the first time by the first convolution kernel in the first layer; the candidate text regions after the first filtering are max-pooled in the second layer; the candidate text regions after max-pooling are filtered for the second time by the second convolution kernel in the third layer; the candidate text regions after the second filtering are pyramid-pooled in the fourth layer; and the candidate text regions after pyramid pooling are fully connected in the fifth layer, so as to obtain the first features of the candidate text regions.
Further, using the manually added features, the candidate text regions are filtered for the first time by the first convolution kernel, and the filtered candidate text regions are fully connected according to the manually added features, so as to obtain the second features of the candidate text regions.
Further, the manually added features include: aspect (height-to-width) ratio, compactness, stroke-width-to-area ratio, local contrast and boundary key points.
Further, the local contrast is calculated by the following formula:
where lc denotes the local contrast; R_i, G_i and B_i denote the i-th pixel of the red, green and blue channels respectively; N denotes the total number of pixels in the MSER region; and k denotes the number of boundary key points.
Further, the boundary key points are obtained as follows:
A binary image is constructed; all pixels of the binary image are iterated; the contour points are computed; and the contour points are compressed using the Douglas-Peucker algorithm to obtain the boundary key points. Specifically:
The gray value of the pixels belonging to the maximally stable extremal region is set to 255; the gray value of the pixels outside the maximally stable extremal region but inside its minimum bounding rectangle is set to 0. If the pixel value of pixel (x, y) is p(x, y) = 255, and at least one of p(x+1, y), p(x-1, y), p(x, y+1), p(x, y-1) is 0, then pixel (x, y) is a contour point. The contour points are compressed using the Douglas-Peucker algorithm, and the remaining contour points after compression are the boundary key points.
Further, grouping the preliminary text regions by the two-layer text grouping algorithm includes: grouping the preliminary text regions vertically and horizontally respectively.
The vertical grouping specifically includes the following:
Obtain the minimum Y coordinate b_n of the pixels with value 255 in the n-th preliminary text region; obtain the maximum Y coordinate t_{n+1} of the pixels with value 255 in the (n+1)-th preliminary text region; obtain the height h_{n+1} of the (n+1)-th preliminary text region.
Compute the height difference d_{n,n+1}. If d_{n,n+1} is greater than the height threshold, the two preliminary text regions are assigned to the same class, i.e., they belong to the same text line; if d_{n,n+1} is less than or equal to the height threshold, the two preliminary text regions do not belong to the same class, the (n+1)-th preliminary text region is regarded as a new class, and a new text line is split off in the Y direction.
The horizontal grouping specifically includes the following:
Obtain the distance Δd along the X axis between two adjacent preliminary text regions in the same text line; the distance Δd may be either d_1, the distance between letters within the same word, or d_2, the distance between words.
A coefficient, the mean width of all letters in the text line, is used, and words are separated according to a width threshold.
Obtain the ratio d_h of the spacing to the mean letter width. If d_h is less than the width threshold, the two adjacent preliminary text regions belong to the same class, i.e., the same word; if d_h is greater than or equal to the width threshold, the two adjacent preliminary text regions do not belong to the same class, i.e., the two regions do not belong to the same word, and the latter preliminary text region is taken as the beginning of a new word.
Further, performing direction correction on the grouped preliminary text regions includes:
S401. Using the coordinate rotation formula, rotate the grouped preliminary text regions clockwise by α degrees; set the initial values i = 1 and α = -30°.
S402. Filter out the erroneously introduced group boxes through the model matching process, and obtain the i-th candidate corrected text region.
S403. When i < 6, set i = i + 1 and α = α + 10°, and return to step S401; when i = 6, superimpose the 1st to 6th candidate corrected texts to obtain the final corrected text.
Further, the coordinate rotation formula is:
x' = x cos θ + y sin θ
y' = y cos θ - x sin θ
where x denotes the abscissa of a pixel; y denotes the ordinate of the pixel; θ denotes the rotation angle threshold; x' denotes the abscissa of the pixel after rotation; and y' denotes the ordinate of the pixel after rotation.
The group boxes include inclined group boxes and long-interval group boxes; an inclined group box contains a single letter, and the letters contained in a long-interval group box are located at its two ends.
Beneficial effects of the present invention: the invention uses an enhanced multi-channel MSER model that detects MSERs in the R, G, B, H, S, V and gray channels, so as to obtain finer candidate text regions. A parallel SPP-CNN (Spatial Pyramid Pooling - Convolutional Neural Network) classifier is introduced to better distinguish text regions from non-text regions; the classifier can handle images of arbitrary size and extract pooled features at multiple scales, so that more features can be understood through the multi-layer spatial information of the source image (the English text image). The erroneously introduced group boxes are filtered out through the model matching process. The invention can handle inclined scene text and realizes the correction of English text.
Detailed description of the invention
Fig. 1 is the flow chart of the invention;
Fig. 2 is the cross-layer SPP-CNN architecture used by the invention;
Fig. 3 is a schematic diagram of how SPP works in the prior art;
Fig. 4 is the architecture diagram of the text grouping of the invention;
Fig. 5 is a schematic diagram of the text line constraints of the invention;
Fig. 6 shows the preliminary text regions obtained by the invention without direction correction;
Fig. 7 shows the final corrected text regions of the invention after direction correction;
Fig. 8 is the direction rotation model of the invention;
Fig. 9 is the matching model of the group boxes of the invention;
Fig. 10 shows the detection results of the invention under different rotations;
Fig. 11 shows example cases of the detection results of the invention.
Specific embodiment
In order to make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them.
The present invention provides an English text detection method with text orientation correction which, as shown in Fig. 1, includes the following steps:
S1. Maximally stable extremal region detection is performed on each channel of the sharpened English text image, and the MSERs are extracted from the image as text candidates to obtain candidate text regions.
S2. A classifier based on a convolutional neural network (Convolutional Neural Network, CNN) model is built to extract the features of the candidate text regions; according to these features, the softmax function divides the candidate text regions into text-class regions and non-text-class regions; the non-text-class regions are filtered out to obtain the preliminary text regions, i.e., the detected English text.
S3. The preliminary text regions are grouped by a two-layer text grouping algorithm.
S4. Direction correction is performed on the grouped preliminary text regions to realize the correction of the English text, i.e., to obtain the corrected text, which is the corrected English text.
Preferably, the five-layer architecture of the CNN model used by the present invention is shown in Fig. 2:
The first layer uses a first convolution kernel of size 7 × 7 × 5, i.e., a convolution kernel of length 7, depth 7 and width 5.
The second layer uses 5 × 5 × 5 max pooling, i.e., max pooling of length 5, width 5 and depth 5.
The third layer applies a second convolution kernel of size 5 × 3 × 5, i.e., a convolution kernel of length 3, depth 5 and width 5.
The fourth layer uses SPP pooling. Fig. 3 is a schematic diagram of how SPP works: the same image is pooled with a 3 × 3 grid (i.e., a pooling grid of length 3 and width 3), dividing it into 9 blocks, with a 2 × 2 grid, dividing it into 4 blocks, and with a 1 × 1 grid, giving 1 block; the maximum value of each block is computed to obtain the output neurons, so that an image of arbitrary size is converted into a fixed-size 14-dimensional feature. It is understood that the present invention can arbitrarily design pyramid levels of different sizes, i.e., increase the number of pyramid levels or change the size of the grid.
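The following sketch illustrates the 3 × 3 / 2 × 2 / 1 × 1 max-pooling pyramid described above on a single-channel feature map; it is a minimal illustration under the assumption of max pooling over equal grid cells, and the function name `spp_pool` is hypothetical.

```python
import numpy as np

def spp_pool(feature_map, levels=(3, 2, 1)):
    """Minimal SPP sketch: pool an H x W map into 9 + 4 + 1 = 14 values."""
    h, w = feature_map.shape
    pooled = []
    for n in levels:
        # split the map into an n x n grid and take the maximum of each cell
        row_edges = np.linspace(0, h, n + 1, dtype=int)
        col_edges = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[row_edges[i]:row_edges[i + 1],
                                   col_edges[j]:col_edges[j + 1]]
                pooled.append(cell.max())
    return np.array(pooled)

# any input size yields a fixed 14-dimensional output
print(spp_pool(np.random.rand(37, 61)).shape)   # (14,)
```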
The fifth layer uses a fully connected layer. Specifically:
The candidate text regions are filtered for the first time by the first convolution kernel in the first layer; the candidate text regions after the first filtering are max-pooled in the second layer; the candidate text regions after max-pooling are filtered for the second time by the second convolution kernel in the third layer; the candidate text regions after the second filtering are pyramid-pooled in the fourth layer; and the candidate text regions after pyramid pooling are fully connected in the fifth layer, so as to extract the first features of the candidate text regions.
Using the manually added features, the candidate text regions are filtered for the first time by the first convolution kernel, and the filtered candidate text regions are fully connected according to the manually added features, so as to extract the second features of the candidate text regions.
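The sketch below is one possible reading of this five-layer SPP-CNN with the cross-layer: convolution, max pooling, a second convolution, SPP, and a fully connected layer that also receives the five manually added features described below. The channel counts, strides and padding are assumptions (the patent only gives kernel extents), so this is a minimal illustration rather than the patented network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPPCNNClassifier(nn.Module):
    """Sketch of the five-layer architecture plus cross-layer; channel counts are assumptions."""
    def __init__(self, n_manual_features=5, spp_levels=(3, 2, 1)):
        super().__init__()
        self.spp_levels = spp_levels
        self.conv1 = nn.Conv2d(3, 32, kernel_size=7, padding=3)    # first convolutional layer
        self.pool = nn.MaxPool2d(kernel_size=5, stride=2)          # max-pooling layer
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)   # second convolutional layer
        spp_dim = 64 * sum(n * n for n in spp_levels)               # 64 channels x 14 pooled values
        # fully connected layer fed by the SPP features and the manually added features (cross-layer)
        self.fc = nn.Linear(spp_dim + n_manual_features, 2)         # text class / non-text class

    def spp(self, x):
        feats = [F.adaptive_max_pool2d(x, n).flatten(1) for n in self.spp_levels]
        return torch.cat(feats, dim=1)                               # fixed size for any input size

    def forward(self, image, manual_features):
        x = F.relu(self.conv1(image))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.spp(x)
        x = torch.cat([x, manual_features], dim=1)                   # cross-layer: hand-crafted features join here
        return F.softmax(self.fc(x), dim=1)                          # softmax over text / non-text
```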
Preferably, the manually designed features are embedded into the whole CNN, i.e., the cross-layer. The cross-layer only works in the first layer and the fifth layer; the features used by the cross-layer, namely the manually added features, include:
the aspect (height-to-width) ratio, the compactness, the stroke-width-to-area ratio, the local contrast lc and the boundary key points k,
where w and h respectively represent the width and height (in pixels) of the minimum bounding rectangle of the maximally stable extremal region; a denotes the area of the minimum bounding rectangle of the maximally stable extremal region (the number of pixels in the region); and p denotes the number of boundary points of the minimum bounding rectangle of the maximally stable extremal region, which in the present invention is represented by the number of boundary key points k.
The local contrast can be obtained by the following equation:
where lc denotes the local contrast; R_i, G_i and B_i denote the i-th pixel of the red, green and blue channels respectively; N denotes the total number of pixels in the MSER region; and k denotes the number of boundary key points.
By connecting the boundary key points in order, the original region can be approximately recovered, i.e., the preliminary text region is obtained.
The calculation of k:
A binary image is constructed; all pixels of the binary image are iterated; the contour points are computed; the contour points are compressed using the Douglas-Peucker algorithm, and the contour points remaining after compression are the boundary key points. Specifically:
The gray value of the pixels belonging to the maximally stable extremal region is set to 255; the gray value of the pixels outside the maximally stable extremal region but inside its minimum bounding rectangle is set to 0. If the pixel value of pixel (x, y) is p(x, y) = 255, and at least one of p(x+1, y), p(x-1, y), p(x, y+1), p(x, y-1) is 0, then pixel (x, y) is a contour point. The contour points are compressed using the Douglas-Peucker algorithm, and the remaining contour points after compression are the boundary key points.
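A minimal sketch of this boundary key point computation, assuming OpenCV 4: the ordered contour of the binary region is obtained with `findContours` (an implementation choice, since `approxPolyDP` needs an ordered curve; its output is exactly the set of 255-pixels with a 0 neighbour described above) and then compressed with `approxPolyDP` as the Douglas-Peucker step. The function name `boundary_key_points` and the tolerance `epsilon` are illustrative, not values given in the patent.

```python
import cv2
import numpy as np

def boundary_key_points(region_pixels, bbox, epsilon=2.0):
    """region_pixels: (x, y) coordinates of one MSER; bbox: (x, y, w, h) of its minimum bounding rectangle."""
    x0, y0, w, h = bbox
    binary = np.zeros((h, w), dtype=np.uint8)      # pixels outside the MSER stay 0
    for (x, y) in region_pixels:
        binary[y - y0, x - x0] = 255               # pixels inside the MSER are set to 255

    # ordered boundary of the 255 region (the contour points)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return np.empty((0, 2), dtype=np.int32)
    contour = max(contours, key=cv2.contourArea)

    # Douglas-Peucker compression; the remaining points are the boundary key points
    key_points = cv2.approxPolyDP(contour, epsilon, True)
    return key_points.reshape(-1, 2)
```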
Finally, the classification of the combined features is obtained using the softmax classification function.
After the preliminary text regions are grouped by the two-layer text grouping algorithm, a low-inclination direction correction is performed, as shown in Fig. 4. The process is divided into three parts: vertical grouping, horizontal grouping and direction correction.
The main steps of the vertical grouping are as follows:
Obtain the minimum Y coordinate b_n of the pixels with value 255 in the n-th preliminary text region; obtain the maximum Y coordinate t_{n+1} of the pixels with value 255 in the (n+1)-th preliminary text region; obtain the height h_{n+1} of the (n+1)-th preliminary text region, as shown in Fig. 5.
Compute the height difference d_{n,n+1}. If d_{n,n+1} is greater than the height threshold, the two preliminary text regions are assigned to the same class, i.e., they belong to the same text line; if d_{n,n+1} is less than or equal to the height threshold, the two preliminary text regions are not in the same class, the (n+1)-th preliminary text region is regarded as a new text line, and the new text line is split off in the Y direction.
The height threshold of the invention is 0.62.
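A minimal sketch of the vertical grouping step above, under the assumption (the exact formula does not survive in this text) that d_{n,n+1} is the vertical overlap (t_{n+1} - b_n) normalized by h_{n+1}; the function name `group_vertically` and the region representation are hypothetical.

```python
def group_vertically(regions, height_threshold=0.62):
    """regions: list of dicts with min_y (b_n), max_y (t_{n+1}), height (h_{n+1}) of each preliminary text region."""
    lines = [[regions[0]]] if regions else []
    for prev, cur in zip(regions, regions[1:]):
        # assumed form of the height difference: vertical overlap normalized by the height
        d = (cur["max_y"] - prev["min_y"]) / cur["height"]
        if d > height_threshold:
            lines[-1].append(cur)      # same class: same text line
        else:
            lines.append([cur])        # new class: a new text line is split off
    return lines
```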
The main steps of the horizontal grouping are as follows:
Obtain the distance Δd along the X axis between two adjacent preliminary text regions in the same text line; the distance Δd may be either d_1, the distance between letters within the same word, or d_2, the distance between words.
A coefficient, the mean width of all letters in the text line, is used, and words are separated according to a width threshold.
Obtain the ratio d_h of the spacing to the mean letter width. If d_h is less than the width threshold, the two adjacent preliminary text regions belong to the same class, i.e., the same word; if d_h is greater than or equal to the width threshold, the two adjacent preliminary text regions do not belong to the same class, i.e., they do not belong to the same word, and the latter preliminary text region is taken as the beginning of a new word.
The coefficient was obtained experimentally using the ICDAR 2013 training set, which contains 229 images and 1226 words.
The width threshold of the invention is 2.33.
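A minimal sketch of this horizontal grouping, under the assumption that d_h is the gap between adjacent regions divided by the mean letter width of the line; the function name `split_into_words` and the region representation are hypothetical.

```python
def split_into_words(line_regions, width_threshold=2.33):
    """line_regions: regions of one text line, sorted by x; each has x_min and x_max (pixel columns)."""
    if not line_regions:
        return []
    mean_width = sum(r["x_max"] - r["x_min"] for r in line_regions) / len(line_regions)
    words = [[line_regions[0]]]
    for prev, cur in zip(line_regions, line_regions[1:]):
        gap = cur["x_min"] - prev["x_max"]        # distance between adjacent regions
        d_h = gap / mean_width                    # assumed ratio of spacing to mean letter width
        if d_h < width_threshold:
            words[-1].append(cur)                 # same word
        else:
            words.append([cur])                   # the latter region starts a new word
    return words
```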
The steps of the low-inclination direction correction are as follows:
Fig. 6 shows the preliminary text regions; it can be seen that words are severely split because of the inclination, and "ne1Wor" is recognized as a word of the same line. Experiments show that words with a slight inclination of up to 10 degrees can be grouped correctly, so a coordinate-axis rotation strategy is adopted, which yields the final corrected text shown in Fig. 7.
After the rotation of the coordinate axes, the group box "wordline1" is correctly grouped, but the erroneously introduced group box "wordline2" is not correctly corrected, so the algorithm is improved with a rotation convergence strategy:
The direction correction of the grouped text regions includes:
S401. Using the coordinate rotation formula, rotate the grouped preliminary text regions clockwise by α degrees; set the initial values i = 1 and α = -30°.
S402. Filter out the erroneously introduced group boxes through the model matching process, and obtain the i-th candidate corrected text region.
S403. When i < 6, set i = i + 1 and α = α + 10°, and return to step S401; when i = 6, superimpose the 1st to 6th candidate corrected texts to obtain the final corrected text.
Specifically:
Using the coordinate rotation formula, the grouped preliminary text regions are rotated clockwise or counterclockwise by tens of degrees; the initial value i = 1 is set.
In the present invention, the rotations are 10, 20 and 30 degrees, both clockwise and counterclockwise, as shown in Fig. 8.
Through the model matching process, the erroneously introduced group boxes are filtered out, and the i-th candidate corrected text region is obtained.
The i-th candidate corrected text is rotated clockwise or counterclockwise by tens of degrees, the erroneously introduced group boxes are filtered out through model matching, and the (i+1)-th candidate corrected text is obtained.
When i < 6, i = i + 1 and step S401 is repeated; when i = 6, the 1st to 6th candidate corrected texts are superimposed to obtain the final corrected text, as shown in Fig. 9.
In another implementation, i in step S403 need not be limited to 6; it may also be any one of 5, 7 and 8.
It can be understood that the model matching process matches the rotated preliminary text regions against the English text of the training-set images: if the rotated image overlaps a training-set image, the overlapping part is retained; then another rotation is applied to the preliminary text regions, and if the rotated image overlaps another training-set image, the overlapping part is again retained. Finally all the overlapping parts are superimposed to obtain the final corrected text.
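A minimal sketch of steps S401-S403 above: rotation by the stated formula x' = x cos θ + y sin θ, y' = y cos θ - x sin θ, six angles from -30° to +20° in 10° steps, filtering, then superposition. The callable `filter_group_boxes` stands in for the model matching process, whose internals this sketch does not implement, and the final superposition is simplified to collecting the surviving regions.

```python
import math

def rotate_points(points, theta_deg):
    """Rotate pixel coordinates by the patent's formula: x' = x cos t + y sin t, y' = y cos t - x sin t."""
    t = math.radians(theta_deg)
    return [(x * math.cos(t) + y * math.sin(t),
             y * math.cos(t) - x * math.sin(t)) for (x, y) in points]

def correct_direction(grouped_regions, filter_group_boxes):
    """grouped_regions: list of point lists; filter_group_boxes: callable removing erroneous group boxes."""
    candidates = []
    alpha = -30                                          # initial values: i = 1, alpha = -30 degrees
    for i in range(1, 7):                                # i = 1 .. 6
        rotated = [rotate_points(region, alpha) for region in grouped_regions]
        candidates.append(filter_group_boxes(rotated))   # i-th candidate corrected text
        alpha += 10                                      # alpha = alpha + 10 degrees
    # simplified superposition: the patent retains the overlapping parts of the six candidates
    final = [region for candidate in candidates for region in candidate]
    return final
```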
Fig. 9(a) shows the model called the "inclined group box", in which each box contains only one letter. This erroneously introduced group box mainly appears when the text is tilted in a single direction.
Fig. 9(b) shows the "long-interval group box", an erroneously introduced group box in which the letters are located at the two ends of each box with a very large interval between them. This case occurs when texts arranged in different directions are too close to each other.
The rotation increment and the number of rotations are important factors in the detection result. To balance quality and time complexity, the maximum rotation angle is set to 30 degrees. As shown in Fig. 10, the rotation increment ranges from 1 to 15 degrees; the smaller the increment, the more rotations are needed to reach the maximum angle. Experimental results show that when the increment reaches 10 degrees, the three indicators, namely precision, recall and f-measure, reach their peak; this is taken as the final rotation angle threshold of the proposed method.
In the present invention, to verify the correctness and effectiveness of the proposed algorithm, comparative experiments were carried out on the ICDAR 2011 and ICDAR 2013 datasets. The ICDAR 2011 test set contains 255 images and the ICDAR 2013 test set contains 233 images. Each image has a corresponding txt document that records the exact coordinates of the text to be detected.
The evaluation of the detection result mainly computes the overlap between the detected corrected text regions and the ground-truth text regions. For each rectangle to be evaluated, the maximum matching value is used. The formula is as follows:
m(r; R) = max{ m(r, r') | r' ∈ R }
where r denotes a corrected text region, r' denotes a ground-truth text region, a(r) denotes the rectangular area of the corrected text region r, and R denotes the set of matched regions. After the maximum area match is obtained, the precision, recall and f-measure are computed. The formulas are as follows:
E denotes the set of detected corrected text regions and T denotes the set of rectangles to be evaluated. The f-measure is a combination of precision and recall; the relative weight of precision and recall is controlled by the parameter α, which is usually set to 0.5 so that precision and recall have equal weight.
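The precision, recall and f-measure formulas themselves do not survive in this text; the sketch below uses standard ICDAR-style definitions (precision = Σ_{r∈E} m(r, T) / |E|, recall = Σ_{r∈T} m(r, E) / |T|, f = 1 / (α/p + (1-α)/r) with α = 0.5) as an assumption consistent with the surrounding description, with rectangle overlap as the match measure m.

```python
def match_score(rect, rect_set):
    """m(r; R) = max over r' in R of the overlap between rectangle r and r' (IoU used as an assumption)."""
    def overlap(a, b):
        ax0, ay0, ax1, ay1 = a
        bx0, by0, bx1, by1 = b
        iw = max(0, min(ax1, bx1) - max(ax0, bx0))
        ih = max(0, min(ay1, by1) - max(ay0, by0))
        inter = iw * ih
        union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
        return inter / union if union > 0 else 0.0
    return max((overlap(rect, other) for other in rect_set), default=0.0)

def evaluate(detections, ground_truth, alpha=0.5):
    """Assumed ICDAR-style precision, recall and f-measure over axis-aligned rectangles (x0, y0, x1, y1)."""
    precision = (sum(match_score(r, ground_truth) for r in detections) / len(detections)) if detections else 0.0
    recall = (sum(match_score(r, detections) for r in ground_truth) / len(ground_truth)) if ground_truth else 0.0
    if precision == 0.0 or recall == 0.0:
        return precision, recall, 0.0
    f_measure = 1.0 / (alpha / precision + (1 - alpha) / recall)
    return precision, recall, f_measure
```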
In the present invention, several comparative experiments demonstrate that the proposed method can extract more text regions.
Table 1. Extraction results of different MSER methods
According to Table 1 (only letter-level performance is considered, not the final word-level result), after the Laplacian and multi-channel preprocessing, more text regions can be extracted (the recall increases), but more non-text regions are also extracted (the precision decreases).
To illustrate the effectiveness of the method used in the invention, this method is compared quantitatively with existing text detection methods. The training set was generated from ICDAR 2011 and ICDAR 2013 using the multi-channel MSER; it contains 44908 English text images and 56139 non-English-text images. 25% of the training set was taken as the validation set, and through the training process the accuracy reached 96%. The SPP-CNN was trained with cross-validation and stochastic gradient descent (SGD). Five methods are compared on ICDAR 2011 and ICDAR 2013; Fig. 11 shows English text images identified by the present invention, from which it can be seen that the invention can effectively identify English text and realize the correction.
As can be seen from Table 2 and Table 3, the method of the present invention is superior to the prior art in recall and f-measure.
Table 2. Scene text detection results on ICDAR 2011
Table 3. Scene text detection results on ICDAR 2013
The references cited in Table 2 and Table 3 are as follows:
[1] Liu Z, Li Y, Qi X, et al. Method for unconstrained text detection in natural scene image[J]. IET Computer Vision, 2017, 11(7): 596-604.
[2] Wu H, Zou B, Zhao Y Q, et al. Natural scene text detection by multi-scale adaptive color clustering and non-text filtering[J]. Neurocomputing, 2016, 214: 1011-1025.
[3] Yu C, Song Y, Zhang Y. Scene text localization using edge analysis and feature pool[J]. Neurocomputing, 2015, 175: 652-661.
[4] Yao Li, Wenjing Jia, Chunhua Shen, et al. Characterness: An indicator of text in the wild[J]. IEEE Transactions on Image Processing, 2014, 23(4): 1666-1677.
[5] Tian C, Xia Y, Zhang X, et al. Natural scene text detection with MC-MR candidate extraction and coarse-to-fine filtering[J]. Neurocomputing, 2017.
[6] Zhu A, Gao R, Uchida S. Could scene context be beneficial for scene text detection?[J]. Pattern Recognition, 2016, 58: 204-215.
[7] Neumann L, Matas J. Efficient scene text localization and recognition with local character refinement[C]//International Conference on Document Analysis and Recognition. IEEE, 2015: 746-750.
[8] Gomez L, Karatzas D. A fast hierarchical method for multi-script and arbitrary oriented scene text extraction[J]. 2014, 19(4): 1-15.
The present invention can detect slightly inclined text and English text with different fonts or sizes, as illustrated by the successfully detected cases in Fig. 9.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be completed by instructing the relevant hardware through a program; the program can be stored in a computer-readable storage medium, and the storage medium may include ROM, RAM, a magnetic disk, an optical disc, etc.
The embodiments provided above describe the objectives, technical solutions and advantages of the present invention in further detail. It should be understood that the embodiments provided above are only preferred embodiments of the present invention and are not intended to limit the invention; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (9)

1. An English text detection method with text orientation correction, characterized by comprising the following steps:
S1. performing maximally stable extremal region detection on each channel of the sharpened English text image, and extracting the maximally stable extremal regions from the image to obtain candidate text regions;
S2. building a classifier based on a convolutional neural network model and extracting the features of the candidate text regions; dividing the candidate text regions into text-class regions and non-text-class regions with the softmax function according to the features of the candidate text regions; and filtering out the non-text-class regions to obtain the preliminary text regions, i.e., the detected English text;
S3. grouping the preliminary text regions by a two-layer text grouping algorithm;
S4. performing direction correction on the grouped preliminary text regions to realize the correction of the English text:
S401. using the coordinate rotation formula, rotating the grouped preliminary text regions clockwise by α degrees, with the initial values i = 1 and α = -30°;
S402. filtering out the erroneously introduced group boxes through the model matching process, and obtaining the i-th candidate corrected text region;
S403. when i < 6, setting i = i + 1 and α = α + 10° and returning to step S401; when i = 6, superimposing the 1st to 6th candidate corrected texts to obtain the final corrected text.
2. The English text detection method with text orientation correction according to claim 1, characterized in that the channels include: the red channel, green channel, blue channel, hue channel, saturation channel, value (lightness) channel and gray channel.
3. The English text detection method with text orientation correction according to claim 1, characterized in that building the classifier based on the convolutional neural network model and extracting the features of the candidate text regions includes: obtaining the first features of the candidate text regions through a five-layer architecture in the classifier, and obtaining the second features of the candidate text regions through a cross-layer, wherein the five-layer architecture comprises, connected in sequence, a first convolutional layer, a max-pooling layer, a second convolutional layer, a pyramid pooling layer and a fully connected layer, and the cross-layer connects the first convolutional layer to the fully connected layer.
4. The English text detection method with text orientation correction according to claim 3, characterized in that the first features are obtained as follows: the candidate text regions are filtered for the first time by the first convolution kernel in the first layer; the candidate text regions after the first filtering are max-pooled in the second layer; the candidate text regions after max-pooling are filtered for the second time by the second convolution kernel in the third layer; the candidate text regions after the second filtering are pyramid-pooled in the fourth layer; and the candidate text regions after pyramid pooling are fully connected in the fifth layer, so as to extract the first features of the candidate text regions.
5. The English text detection method with text orientation correction according to claim 3, characterized in that the second features are obtained as follows: using the manually added features, the candidate text regions are filtered for the first time by the first convolution kernel, and the filtered candidate text regions are fully connected according to the manually added features, so as to extract the second features of the candidate text regions.
6. The English text detection method with text orientation correction according to claim 5, characterized in that the manually added features include: aspect (height-to-width) ratio, compactness, stroke-width-to-area ratio, local contrast and boundary key points.
7. The English text detection method with text orientation correction according to claim 1, characterized in that grouping the preliminary text regions by the two-layer text grouping algorithm includes grouping the preliminary text regions vertically, which specifically includes:
obtaining the minimum Y coordinate b_n of the pixels with value 255 in the n-th preliminary text region; obtaining the maximum Y coordinate t_{n+1} of the pixels with value 255 in the (n+1)-th preliminary text region; obtaining the height h_{n+1} of the (n+1)-th preliminary text region;
computing the height difference d_{n,n+1}; if d_{n,n+1} is greater than the height threshold, assigning the two preliminary text regions to the same class, i.e., the same text line; if d_{n,n+1} is less than or equal to the height threshold, the two preliminary text regions are not in the same class, the (n+1)-th preliminary text region is regarded as a new text line, and the new text line is split off in the Y direction.
8. The English text detection method with text orientation correction according to claim 7, characterized in that grouping the preliminary text regions by the two-layer text grouping algorithm further includes grouping the preliminary text regions horizontally, which specifically includes:
obtaining the distance Δd along the X axis between two adjacent preliminary text regions in the same text line, the distance Δd being either d_1, the distance between letters within the same word, or d_2, the distance between words;
using a coefficient, the mean width of all letters in the text line, and separating words according to a width threshold;
obtaining the ratio d_h of the spacing to the mean letter width; if d_h is less than the width threshold, the two adjacent preliminary text regions belong to the same class, i.e., the same word; if d_h is greater than or equal to the width threshold, the two adjacent preliminary text regions do not belong to the same class, i.e., they do not belong to the same word, and the latter preliminary text region is taken as the beginning of a new word.
9. The English text detection method with text orientation correction according to claim 1, characterized in that the group boxes include: inclined group boxes and long-interval group boxes; an inclined group box contains one letter, and the letters contained in a long-interval group box are located at its two ends.
CN201810429149.XA 2018-05-08 2018-05-08 A kind of English text detection method with text orientation correction Active CN108647681B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810429149.XA CN108647681B (en) 2018-05-08 2018-05-08 A kind of English text detection method with text orientation correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810429149.XA CN108647681B (en) 2018-05-08 2018-05-08 A kind of English text detection method with text orientation correction

Publications (2)

Publication Number Publication Date
CN108647681A CN108647681A (en) 2018-10-12
CN108647681B true CN108647681B (en) 2019-06-14

Family

ID=63749675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810429149.XA Active CN108647681B (en) 2018-05-08 2018-05-08 A kind of English text detection method with text orientation correction

Country Status (1)

Country Link
CN (1) CN108647681B (en)

Families Citing this family (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10417037B2 (en) 2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant
JP2016508007A (en) 2013-02-07 2016-03-10 アップル インコーポレイテッド Voice trigger for digital assistant
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US10748529B1 (en) 2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10200824B2 (en) 2015-05-27 2019-02-05 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device
US20160378747A1 (en) 2015-06-29 2016-12-29 Apple Inc. Virtual assistant for media playback
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10740384B2 (en) 2015-09-08 2020-08-11 Apple Inc. Intelligent automated assistant for media search and playback
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10331312B2 (en) 2015-09-08 2019-06-25 Apple Inc. Intelligent automated assistant in a media environment
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10956666B2 (en) 2015-11-09 2021-03-23 Apple Inc. Unconventional virtual assistant interactions
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
DK180048B1 (en) 2017-05-11 2020-02-04 Apple Inc. MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
DK201770429A1 (en) 2017-05-12 2018-12-14 Apple Inc. Low-latency intelligent automated assistant
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US20180336892A1 (en) 2017-05-16 2018-11-22 Apple Inc. Detecting a trigger of a digital assistant
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
DK180639B1 (en) 2018-06-01 2021-11-04 Apple Inc DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT
DK179822B1 (en) 2018-06-01 2019-07-12 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
CN109800735A (en) * 2019-01-31 2019-05-24 中国人民解放军国防科技大学 Accurate detection and segmentation method for ship target
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
CN109934229B (en) * 2019-03-28 2021-08-03 网易有道信息技术(北京)有限公司 Image processing method, device, medium and computing equipment
DK201970509A1 (en) 2019-05-06 2021-01-15 Apple Inc Spoken notifications
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
DK180129B1 (en) 2019-05-31 2020-06-02 Apple Inc. User activity shortcut suggestions
DK201970511A1 (en) 2019-05-31 2021-02-15 Apple Inc Voice identification in digital assistant systems
US11468890B2 (en) 2019-06-01 2022-10-11 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
CN110298343A (en) * 2019-07-02 2019-10-01 哈尔滨理工大学 A kind of hand-written blackboard writing on the blackboard recognition methods
US11488406B2 (en) 2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators
CN110674815A (en) * 2019-09-29 2020-01-10 四川长虹电器股份有限公司 Invoice image distortion correction method based on deep learning key point detection
CN112825141B (en) * 2019-11-21 2023-02-17 上海高德威智能交通系统有限公司 Method and device for recognizing text, recognition equipment and storage medium
CN111353493B (en) * 2020-03-31 2023-04-28 中国工商银行股份有限公司 Text image direction correction method and device
WO2021196013A1 (en) * 2020-03-31 2021-10-07 京东方科技集团股份有限公司 Word recognition method and device, and storage medium
US11183193B1 (en) 2020-05-11 2021-11-23 Apple Inc. Digital assistant hardware abstraction
US11061543B1 (en) 2020-05-11 2021-07-13 Apple Inc. Providing relevant data items based on context
US11755276B2 (en) 2020-05-12 2023-09-12 Apple Inc. Reducing description length based on confidence
US11490204B2 (en) 2020-07-20 2022-11-01 Apple Inc. Multi-device audio adjustment coordination
US11438683B2 (en) 2020-07-21 2022-09-06 Apple Inc. User identification using headphones
CN113298079B (en) * 2021-06-28 2023-10-27 北京奇艺世纪科技有限公司 Image processing method and device, electronic equipment and storage medium
CN113837169B (en) * 2021-09-29 2023-12-19 平安科技(深圳)有限公司 Text data processing method, device, computer equipment and storage medium
CN114283431B (en) * 2022-03-04 2022-06-28 南京安元科技有限公司 Text detection method based on differentiable binarization

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103325099A (en) * 2013-07-11 2013-09-25 北京智诺英特科技有限公司 Image correcting method and device

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868758A (en) * 2015-01-21 2016-08-17 阿里巴巴集团控股有限公司 Method and device for detecting text area in image and electronic device
CN105279149A (en) * 2015-10-21 2016-01-27 上海应用技术学院 Chinese text automatic correction method
CN105426887A (en) * 2015-10-30 2016-03-23 北京奇艺世纪科技有限公司 Method and device for text image correction
CN105740774A (en) * 2016-01-25 2016-07-06 浪潮软件股份有限公司 Text region positioning method and apparatus for image
CN107992869A (en) * 2016-10-26 2018-05-04 深圳超多维科技有限公司 For tilting the method, apparatus and electronic equipment of word correction
CN106650725A (en) * 2016-11-29 2017-05-10 华南理工大学 Full convolutional neural network-based candidate text box generation and text detection method
CN106778757A (en) * 2016-12-12 2017-05-31 哈尔滨工业大学 Scene text detection method based on text conspicuousness
CN106997470A (en) * 2017-02-28 2017-08-01 信雅达系统工程股份有限公司 Tilt bearing calibration and the system of text image
CN107066972A (en) * 2017-04-17 2017-08-18 武汉理工大学 Natural scene Method for text detection based on multichannel extremal region

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Detecting Oriented Text in Natural Images by Linking Segments;Baoguang Shi等;《2017 IEEE Conference on Computer Vision and Pattern Recognition》;20170726;1-9
Scene Text Detection Based on Enhanced Multi-channels MSER and a Fast Text Grouping Process;Jin Dai等;《2018 the 3rd IEEE International Conference on Cloud Computing and Big Data Analysis》;20180422;正文第351-354页
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition;Kaiming He等;《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》;20150930;第37卷(第9期);1904-1916
Text detection based on convolutional neural networks with spatial pyramid pooling;Rui Zhu等;《2016 IEEE International Conference on Image Processing》;20160928;正文第2.2节,图3
Text-Attentional Convolutional Neural Network for Scene Text Detection;Tong He等;《IEEE TRANSACTIONS ON IMAGE PROCESSING》;20160630;第25卷(第6期);2529-2541
Research and application of text image orientation based on character structural features; 朱其猛; China Masters' Theses Full-text Database, Information Science and Technology; 2014-09-15 (No. 09); pp. 33-35
Research on visual tracking algorithms based on deep networks; 李玉冰; China Masters' Theses Full-text Database, Information Science and Technology; 2018-01-15 (No. 01); pp. 29-30, Figs. 3-5

Also Published As

Publication number Publication date
CN108647681A (en) 2018-10-12

Similar Documents

Publication Publication Date Title
CN108647681B (en) A kind of English text detection method with text orientation correction
CN111325203B (en) American license plate recognition method and system based on image correction
Goodfellow et al. Multi-digit number recognition from street view imagery using deep convolutional neural networks
CN107491730A (en) A kind of laboratory test report recognition methods based on image procossing
CN101976258B (en) Video semantic extraction method by combining object segmentation and feature weighing
Tian et al. Natural scene text detection with MC–MR candidate extraction and coarse-to-fine filtering
CN111539409B (en) Ancient tomb question and character recognition method based on hyperspectral remote sensing technology
CN108647695A (en) Soft image conspicuousness detection method based on covariance convolutional neural networks
CN114067444A (en) Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature
CN110738216A (en) Medicine identification method based on improved SURF algorithm
CN110472652A (en) A small amount of sample classification method based on semanteme guidance
CN116071763A (en) Teaching book intelligent correction system based on character recognition
CN106203448A (en) A kind of scene classification method based on Nonlinear Scale Space Theory
CN111340032A (en) Character recognition method based on application scene in financial field
CN115311746A (en) Off-line signature authenticity detection method based on multi-feature fusion
CN104899551B (en) A kind of form image sorting technique
CN109741351A (en) A kind of classification responsive type edge detection method based on deep learning
CN110222660B (en) Signature authentication method and system based on dynamic and static feature fusion
CN110728214B (en) Weak and small figure target detection method based on scale matching
CN116259062A (en) CNN handwriting identification method based on multichannel and attention mechanism
CN109858353A (en) Facial image feature extracting method based on mark transformation and LBP
Su et al. Skew detection for Chinese handwriting by horizontal stroke histogram
CN110555792B (en) Image tampering blind detection method based on normalized histogram comprehensive feature vector
CN111612045A (en) Universal method for acquiring target detection data set
CN109740618A (en) Network paper score method for automatically counting and device based on FHOG feature

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant