CN108647681B - A kind of English text detection method with text orientation correction - Google Patents

A kind of English text detection method with text orientation correction Download PDF

Info

Publication number
CN108647681B
CN108647681B (application CN201810429149.XA / CN201810429149A)
Authority
CN
China
Prior art keywords
text
region
text region
preliminary
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810429149.XA
Other languages
Chinese (zh)
Other versions
CN108647681A (en)
Inventor
代劲
王族
尹航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN201810429149.XA priority Critical patent/CN108647681B/en
Publication of CN108647681A publication Critical patent/CN108647681A/en
Application granted granted Critical
Publication of CN108647681B publication Critical patent/CN108647681B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • G06V20/63Scene text, e.g. street names
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition

Abstract

The invention belongs to the technical field of image processing, and in particular relates to an English text detection method with text orientation correction. The method comprises: performing maximally stable extremal region (MSER) detection on each channel of an English text image to obtain candidate text regions; building a classifier based on a convolutional neural network model and filtering out false candidate text regions to obtain preliminary text regions; grouping the preliminary text regions with a two-layer text grouping algorithm; and performing direction correction on the grouped preliminary text regions to obtain the corrected text. The invention uses an enhanced multi-channel MSER model to obtain finer text regions, and introduces a parallel SPP-CNN classifier to better distinguish text regions from non-text regions; the classifier can handle images of arbitrary size and extract pooled features at multiple scales, so that more features can be understood through the multi-layer spatial information of the source image. The invention can handle slightly inclined scene text.

Description

A kind of English text detection method with text orientation correction
Technical field
The invention belongs to the technical field of image processing, and in particular relates to an English text detection method with text orientation correction.
Background technique
Text in natural scene images carries accurate and rich information, which is of great significance for image analysis, image-based translation, image search and so on. Over the past 20 years, researchers have proposed a number of methods for detecting text in natural scene images. Many content-based multimedia understanding applications rely on such text, including automatic visual classification, image retrieval, assisted navigation, multilingual translation, object recognition and other consumer-oriented applications.
The key difficulties faced by scene text detection are: (1) text in document images has regular fonts, similar colors, uniform size and even distribution, whereas text in natural scenes may have different fonts, colors, scales and directions even within the same scene; (2) the background of a natural scene image may be extremely complex: signs, fences, bricks and grass are difficult to distinguish from real text and therefore easily cause confusion and errors; (3) other interference factors exist in scene text images, such as non-uniform illumination, blur and translucency effects.
Researchers have proposed many methods to detect text in natural scene images, which fall into two main categories.
Texture-based methods regard text as a specific type of texture and use texture properties such as local intensity, filter responses and wavelet coefficients to distinguish text regions from non-text regions of an image. These methods are usually computationally expensive, because all positions and scales have to be scanned. In addition, they mainly handle horizontal text and are very sensitive to rotation and scaling.
Component-based methods regard text as connected components: text is first extracted by various methods (such as color clustering or extremal region extraction), and then non-text components are filtered out using manually designed rules or automatically trained classifiers. Component-based methods are generally more efficient, because the number of components to be processed is relatively small, and they are insensitive to rotation, scaling and font. A conventional method for detecting candidate text regions (Candidate Text Region, CTR) is maximally stable extremal regions (Maximally Stable Extremal Regions, MSER), which is highly robust to affine variations of the image and can efficiently extract the text regions in an image; the MSER extraction algorithm has since been improved so that its time complexity is linear.
These methods separate text regions from non-text regions according to rules or features that distinguish them. Although they can detect text, they lack a correction step for English text and perform poorly on inclined text; the recognized text may be severely split because the words are tilted.
Summary of the invention
In view of this, the invention proposes an English text detection method with text orientation correction, which can effectively identify text and correct the identified inclined text. It specifically includes the following steps:
S1. Maximally stable extremal region detection is performed on each channel of the sharpened English text image, and the MSERs are extracted from the image as text candidates to obtain candidate text regions.
S2. A classifier based on a convolutional neural network model is built to extract the features of the candidate text regions; according to these features, the softmax function divides the candidate text regions into text-class regions and non-text-class regions; the non-text-class regions are filtered out to obtain the preliminary text regions, i.e., the detected English text.
S3. The preliminary text regions are grouped by a two-layer text grouping algorithm.
S4. Direction correction is performed on the grouped preliminary text regions to realize the correction of the English text.
Further, the channels include: the red channel, green channel, blue channel, hue channel, saturation channel, value (lightness) channel and gray channel.
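As an illustration of step S1, the following sketch (an assumption added for clarity, not part of the patent text) extracts MSERs from the seven channels named above using OpenCV; the function name `extract_candidate_regions` and the default MSER parameters are hypothetical choices, not values specified by the patent.

```python
import cv2
import numpy as np

def extract_candidate_regions(bgr_image):
    """Minimal sketch: run MSER detection on the R, G, B, H, S, V and gray channels."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    gray = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2GRAY)
    b, g, r = cv2.split(bgr_image)
    h, s, v = cv2.split(hsv)
    channels = [r, g, b, h, s, v, gray]

    mser = cv2.MSER_create()          # default parameters; the patent does not specify them
    candidates = []
    for channel in channels:
        regions, boxes = mser.detectRegions(channel)
        candidates.extend(zip(regions, boxes))   # each region is a list of pixel coordinates
    return candidates
```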
Further, building the classifier based on the convolutional neural network model and extracting the features of the candidate text regions includes: obtaining the first features of the candidate text regions through a five-layer architecture in the classifier, and obtaining the second features of the candidate text regions through a cross-layer, wherein the five-layer architecture comprises, connected in sequence, a first convolutional layer, a max-pooling layer, a second convolutional layer, a pyramid pooling layer and a fully connected layer; the cross-layer connects the first convolutional layer to the fully connected layer.
Further, the candidate text regions are filtered for the first time by the first convolution kernel in the first layer; the candidate text regions after the first filtering are max-pooled in the second layer; the candidate text regions after max-pooling are filtered for the second time by the second convolution kernel in the third layer; the candidate text regions after the second filtering are pyramid-pooled in the fourth layer; and the candidate text regions after pyramid pooling are fully connected in the fifth layer, so as to obtain the first features of the candidate text regions.
Further, using the manually added features, the candidate text regions are filtered for the first time by the first convolution kernel, and the filtered candidate text regions are fully connected according to the manually added features, so as to obtain the second features of the candidate text regions.
Further, the manually added features include: aspect (height-to-width) ratio, compactness, stroke-width-to-area ratio, local contrast and boundary key points.
Further, the local contrast is calculated by the following formula:
where lc denotes the local contrast; R_i, G_i and B_i denote the i-th pixel of the red, green and blue channels respectively; N denotes the total number of pixels in the MSER region; and k denotes the number of boundary key points.
Further, the boundary key points are obtained as follows:
A binary image is constructed; all pixels of the binary image are iterated; the contour points are computed; and the contour points are compressed using the Douglas-Peucker algorithm to obtain the boundary key points. Specifically:
The gray value of the pixels belonging to the maximally stable extremal region is set to 255; the gray value of the pixels outside the maximally stable extremal region but inside its minimum bounding rectangle is set to 0. If the pixel value of pixel (x, y) is p(x, y) = 255, and at least one of p(x+1, y), p(x-1, y), p(x, y+1), p(x, y-1) is 0, then pixel (x, y) is a contour point. The contour points are compressed using the Douglas-Peucker algorithm, and the remaining contour points after compression are the boundary key points.
Further, grouping the preliminary text regions by the two-layer text grouping algorithm includes: grouping the preliminary text regions vertically and horizontally respectively.
The vertical grouping specifically includes the following:
Obtain the minimum Y coordinate b_n of the pixels with value 255 in the n-th preliminary text region; obtain the maximum Y coordinate t_{n+1} of the pixels with value 255 in the (n+1)-th preliminary text region; obtain the height h_{n+1} of the (n+1)-th preliminary text region.
Compute the height difference d_{n,n+1}. If d_{n,n+1} is greater than the height threshold, the two preliminary text regions are assigned to the same class, i.e., they belong to the same text line; if d_{n,n+1} is less than or equal to the height threshold, the two preliminary text regions do not belong to the same class, the (n+1)-th preliminary text region is regarded as a new class, and a new text line is split off in the Y direction.
The horizontal grouping specifically includes the following:
Obtain the distance Δd along the X axis between two adjacent preliminary text regions in the same text line; the distance Δd may be either d_1, the distance between letters within the same word, or d_2, the distance between words.
A coefficient, the mean width of all letters in the text line, is used, and words are separated according to a width threshold.
Obtain the ratio d_h of the spacing to the mean letter width. If d_h is less than the width threshold, the two adjacent preliminary text regions belong to the same class, i.e., the same word; if d_h is greater than or equal to the width threshold, the two adjacent preliminary text regions do not belong to the same class, i.e., the two regions do not belong to the same word, and the latter preliminary text region is taken as the beginning of a new word.
Further, performing direction correction on the grouped preliminary text regions includes:
S401. Using the coordinate rotation formula, rotate the grouped preliminary text regions clockwise by α degrees; set the initial values i = 1 and α = -30°.
S402. Filter out the erroneously introduced group boxes through the model matching process, and obtain the i-th candidate corrected text region.
S403. When i < 6, set i = i + 1 and α = α + 10°, and return to step S401; when i = 6, superimpose the 1st to 6th candidate corrected texts to obtain the final corrected text.
Further, the coordinate rotation formula is:
x' = x cos θ + y sin θ
y' = y cos θ - x sin θ
where x denotes the abscissa of a pixel; y denotes the ordinate of the pixel; θ denotes the rotation angle threshold; x' denotes the abscissa of the pixel after rotation; and y' denotes the ordinate of the pixel after rotation.
The group boxes include inclined group boxes and long-interval group boxes; an inclined group box contains a single letter, and the letters contained in a long-interval group box are located at its two ends.
Beneficial effects of the present invention: the invention uses an enhanced multi-channel MSER model that detects MSERs in the R, G, B, H, S, V and gray channels, so as to obtain finer candidate text regions. A parallel SPP-CNN (Spatial Pyramid Pooling - Convolutional Neural Network) classifier is introduced to better distinguish text regions from non-text regions; the classifier can handle images of arbitrary size and extract pooled features at multiple scales, so that more features can be understood through the multi-layer spatial information of the source image (the English text image). The erroneously introduced group boxes are filtered out through the model matching process. The invention can handle inclined scene text and realizes the correction of English text.
Detailed description of the invention
Fig. 1 is the flow chart of the invention;
Fig. 2 is the cross-layer SPP-CNN architecture used by the invention;
Fig. 3 is a schematic diagram of how SPP works in the prior art;
Fig. 4 is the architecture diagram of the text grouping of the invention;
Fig. 5 is a schematic diagram of the text line constraints of the invention;
Fig. 6 shows the preliminary text regions obtained by the invention without direction correction;
Fig. 7 shows the final corrected text regions of the invention after direction correction;
Fig. 8 is the direction rotation model of the invention;
Fig. 9 is the matching model of the group boxes of the invention;
Fig. 10 shows the detection results of the invention under different rotations;
Fig. 11 shows example cases of the detection results of the invention.
Specific embodiment
In order to make the objectives, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them.
The present invention provides an English text detection method with text orientation correction which, as shown in Fig. 1, includes the following steps:
S1. Maximally stable extremal region detection is performed on each channel of the sharpened English text image, and the MSERs are extracted from the image as text candidates to obtain candidate text regions.
S2. A classifier based on a convolutional neural network (Convolutional Neural Network, CNN) model is built to extract the features of the candidate text regions; according to these features, the softmax function divides the candidate text regions into text-class regions and non-text-class regions; the non-text-class regions are filtered out to obtain the preliminary text regions, i.e., the detected English text.
S3. The preliminary text regions are grouped by a two-layer text grouping algorithm.
S4. Direction correction is performed on the grouped preliminary text regions to realize the correction of the English text, i.e., to obtain the corrected text, which is the corrected English text.
Preferably, the five-layer architecture of the CNN model used by the present invention is shown in Fig. 2:
The first layer uses a first convolution kernel of size 7 × 7 × 5, i.e., a convolution kernel of length 7, depth 7 and width 5.
The second layer uses 5 × 5 × 5 max pooling, i.e., max pooling of length 5, width 5 and depth 5.
The third layer applies a second convolution kernel of size 5 × 3 × 5, i.e., a convolution kernel of length 3, depth 5 and width 5.
The fourth layer uses SPP pooling. Fig. 3 is a schematic diagram of how SPP works: the same image is pooled with a 3 × 3 grid (i.e., a pooling grid of length 3 and width 3), dividing it into 9 blocks, with a 2 × 2 grid, dividing it into 4 blocks, and with a 1 × 1 grid, giving 1 block; the maximum value of each block is computed to obtain the output neurons, so that an image of arbitrary size is converted into a fixed-size 14-dimensional feature. It is understood that the present invention can arbitrarily design pyramid levels of different sizes, i.e., increase the number of pyramid levels or change the size of the grid.
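The following sketch illustrates the 3 × 3 / 2 × 2 / 1 × 1 max-pooling pyramid described above on a single-channel feature map; it is a minimal illustration under the assumption of max pooling over equal grid cells, and the function name `spp_pool` is hypothetical.

```python
import numpy as np

def spp_pool(feature_map, levels=(3, 2, 1)):
    """Minimal SPP sketch: pool an H x W map into 9 + 4 + 1 = 14 values."""
    h, w = feature_map.shape
    pooled = []
    for n in levels:
        # split the map into an n x n grid and take the maximum of each cell
        row_edges = np.linspace(0, h, n + 1, dtype=int)
        col_edges = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                cell = feature_map[row_edges[i]:row_edges[i + 1],
                                   col_edges[j]:col_edges[j + 1]]
                pooled.append(cell.max())
    return np.array(pooled)

# any input size yields a fixed 14-dimensional output
print(spp_pool(np.random.rand(37, 61)).shape)   # (14,)
```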
The fifth layer uses a fully connected layer. Specifically:
The candidate text regions are filtered for the first time by the first convolution kernel in the first layer; the candidate text regions after the first filtering are max-pooled in the second layer; the candidate text regions after max-pooling are filtered for the second time by the second convolution kernel in the third layer; the candidate text regions after the second filtering are pyramid-pooled in the fourth layer; and the candidate text regions after pyramid pooling are fully connected in the fifth layer, so as to extract the first features of the candidate text regions.
Using the manually added features, the candidate text regions are filtered for the first time by the first convolution kernel, and the filtered candidate text regions are fully connected according to the manually added features, so as to extract the second features of the candidate text regions.
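The sketch below is one possible reading of this five-layer SPP-CNN with the cross-layer: convolution, max pooling, a second convolution, SPP, and a fully connected layer that also receives the five manually added features described below. The channel counts, strides and padding are assumptions (the patent only gives kernel extents), so this is a minimal illustration rather than the patented network.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SPPCNNClassifier(nn.Module):
    """Sketch of the five-layer architecture plus cross-layer; channel counts are assumptions."""
    def __init__(self, n_manual_features=5, spp_levels=(3, 2, 1)):
        super().__init__()
        self.spp_levels = spp_levels
        self.conv1 = nn.Conv2d(3, 32, kernel_size=7, padding=3)    # first convolutional layer
        self.pool = nn.MaxPool2d(kernel_size=5, stride=2)          # max-pooling layer
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)   # second convolutional layer
        spp_dim = 64 * sum(n * n for n in spp_levels)               # 64 channels x 14 pooled values
        # fully connected layer fed by the SPP features and the manually added features (cross-layer)
        self.fc = nn.Linear(spp_dim + n_manual_features, 2)         # text class / non-text class

    def spp(self, x):
        feats = [F.adaptive_max_pool2d(x, n).flatten(1) for n in self.spp_levels]
        return torch.cat(feats, dim=1)                               # fixed size for any input size

    def forward(self, image, manual_features):
        x = F.relu(self.conv1(image))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.spp(x)
        x = torch.cat([x, manual_features], dim=1)                   # cross-layer: hand-crafted features join here
        return F.softmax(self.fc(x), dim=1)                          # softmax over text / non-text
```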
Preferably, the manually designed features are embedded into the whole CNN, i.e., the cross-layer. The cross-layer only works in the first layer and the fifth layer; the features used by the cross-layer, namely the manually added features, include:
the aspect (height-to-width) ratio, the compactness, the stroke-width-to-area ratio, the local contrast lc and the boundary key points k,
where w and h respectively represent the width and height (in pixels) of the minimum bounding rectangle of the maximally stable extremal region; a denotes the area of the minimum bounding rectangle of the maximally stable extremal region (the number of pixels in the region); and p denotes the number of boundary points of the minimum bounding rectangle of the maximally stable extremal region, which in the present invention is represented by the number of boundary key points k.
The local contrast can be obtained by the following equation:
where lc denotes the local contrast; R_i, G_i and B_i denote the i-th pixel of the red, green and blue channels respectively; N denotes the total number of pixels in the MSER region; and k denotes the number of boundary key points.
By connecting the boundary key points in order, the original region can be approximately recovered, i.e., the preliminary text region is obtained.
The calculation of k:
A binary image is constructed; all pixels of the binary image are iterated; the contour points are computed; the contour points are compressed using the Douglas-Peucker algorithm, and the contour points remaining after compression are the boundary key points. Specifically:
The gray value of the pixels belonging to the maximally stable extremal region is set to 255; the gray value of the pixels outside the maximally stable extremal region but inside its minimum bounding rectangle is set to 0. If the pixel value of pixel (x, y) is p(x, y) = 255, and at least one of p(x+1, y), p(x-1, y), p(x, y+1), p(x, y-1) is 0, then pixel (x, y) is a contour point. The contour points are compressed using the Douglas-Peucker algorithm, and the remaining contour points after compression are the boundary key points.
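A minimal sketch of this boundary key point computation, assuming OpenCV 4: the ordered contour of the binary region is obtained with `findContours` (an implementation choice, since `approxPolyDP` needs an ordered curve; its output is exactly the set of 255-pixels with a 0 neighbour described above) and then compressed with `approxPolyDP` as the Douglas-Peucker step. The function name `boundary_key_points` and the tolerance `epsilon` are illustrative, not values given in the patent.

```python
import cv2
import numpy as np

def boundary_key_points(region_pixels, bbox, epsilon=2.0):
    """region_pixels: (x, y) coordinates of one MSER; bbox: (x, y, w, h) of its minimum bounding rectangle."""
    x0, y0, w, h = bbox
    binary = np.zeros((h, w), dtype=np.uint8)      # pixels outside the MSER stay 0
    for (x, y) in region_pixels:
        binary[y - y0, x - x0] = 255               # pixels inside the MSER are set to 255

    # ordered boundary of the 255 region (the contour points)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return np.empty((0, 2), dtype=np.int32)
    contour = max(contours, key=cv2.contourArea)

    # Douglas-Peucker compression; the remaining points are the boundary key points
    key_points = cv2.approxPolyDP(contour, epsilon, True)
    return key_points.reshape(-1, 2)
```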
Finally, the classification of the combined features is obtained using the softmax classification function.
After the preliminary text regions are grouped by the two-layer text grouping algorithm, a low-inclination direction correction is performed, as shown in Fig. 4. The process is divided into three parts: vertical grouping, horizontal grouping and direction correction.
The main steps of the vertical grouping are as follows:
Obtain the minimum Y coordinate b_n of the pixels with value 255 in the n-th preliminary text region; obtain the maximum Y coordinate t_{n+1} of the pixels with value 255 in the (n+1)-th preliminary text region; obtain the height h_{n+1} of the (n+1)-th preliminary text region, as shown in Fig. 5.
Compute the height difference d_{n,n+1}. If d_{n,n+1} is greater than the height threshold, the two preliminary text regions are assigned to the same class, i.e., they belong to the same text line; if d_{n,n+1} is less than or equal to the height threshold, the two preliminary text regions are not in the same class, the (n+1)-th preliminary text region is regarded as a new text line, and the new text line is split off in the Y direction.
The height threshold of the invention is 0.62.
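A minimal sketch of the vertical grouping step above, under the assumption (the exact formula does not survive in this text) that d_{n,n+1} is the vertical overlap (t_{n+1} - b_n) normalized by h_{n+1}; the function name `group_vertically` and the region representation are hypothetical.

```python
def group_vertically(regions, height_threshold=0.62):
    """regions: list of dicts with min_y (b_n), max_y (t_{n+1}), height (h_{n+1}) of each preliminary text region."""
    lines = [[regions[0]]] if regions else []
    for prev, cur in zip(regions, regions[1:]):
        # assumed form of the height difference: vertical overlap normalized by the height
        d = (cur["max_y"] - prev["min_y"]) / cur["height"]
        if d > height_threshold:
            lines[-1].append(cur)      # same class: same text line
        else:
            lines.append([cur])        # new class: a new text line is split off
    return lines
```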
The main steps of the horizontal grouping are as follows:
Obtain the distance Δd along the X axis between two adjacent preliminary text regions in the same text line; the distance Δd may be either d_1, the distance between letters within the same word, or d_2, the distance between words.
A coefficient, the mean width of all letters in the text line, is used, and words are separated according to a width threshold.
Obtain the ratio d_h of the spacing to the mean letter width. If d_h is less than the width threshold, the two adjacent preliminary text regions belong to the same class, i.e., the same word; if d_h is greater than or equal to the width threshold, the two adjacent preliminary text regions do not belong to the same class, i.e., they do not belong to the same word, and the latter preliminary text region is taken as the beginning of a new word.
The coefficient was obtained experimentally using the ICDAR 2013 training set, which contains 229 images and 1226 words.
The width threshold of the invention is 2.33.
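A minimal sketch of this horizontal grouping, under the assumption that d_h is the gap between adjacent regions divided by the mean letter width of the line; the function name `split_into_words` and the region representation are hypothetical.

```python
def split_into_words(line_regions, width_threshold=2.33):
    """line_regions: regions of one text line, sorted by x; each has x_min and x_max (pixel columns)."""
    if not line_regions:
        return []
    mean_width = sum(r["x_max"] - r["x_min"] for r in line_regions) / len(line_regions)
    words = [[line_regions[0]]]
    for prev, cur in zip(line_regions, line_regions[1:]):
        gap = cur["x_min"] - prev["x_max"]        # distance between adjacent regions
        d_h = gap / mean_width                    # assumed ratio of spacing to mean letter width
        if d_h < width_threshold:
            words[-1].append(cur)                 # same word
        else:
            words.append([cur])                   # the latter region starts a new word
    return words
```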
The steps of the low-inclination direction correction are as follows:
Fig. 6 shows the preliminary text regions; it can be seen that words are severely split because of the inclination, and "ne1Wor" is recognized as a word of the same line. Experiments show that words with a slight inclination of up to 10 degrees can be grouped correctly, so a coordinate-axis rotation strategy is adopted, which yields the final corrected text shown in Fig. 7.
After the rotation of the coordinate axes, the group box "wordline1" is correctly grouped, but the erroneously introduced group box "wordline2" is not correctly corrected, so the algorithm is improved with a rotation convergence strategy:
The direction correction of the grouped text regions includes:
S401. Using the coordinate rotation formula, rotate the grouped preliminary text regions clockwise by α degrees; set the initial values i = 1 and α = -30°.
S402. Filter out the erroneously introduced group boxes through the model matching process, and obtain the i-th candidate corrected text region.
S403. When i < 6, set i = i + 1 and α = α + 10°, and return to step S401; when i = 6, superimpose the 1st to 6th candidate corrected texts to obtain the final corrected text.
Specifically:
Using the coordinate rotation formula, the grouped preliminary text regions are rotated clockwise or counterclockwise by tens of degrees; the initial value i = 1 is set.
In the present invention, the rotations are 10, 20 and 30 degrees, both clockwise and counterclockwise, as shown in Fig. 8.
Through the model matching process, the erroneously introduced group boxes are filtered out, and the i-th candidate corrected text region is obtained.
The i-th candidate corrected text is rotated clockwise or counterclockwise by tens of degrees, the erroneously introduced group boxes are filtered out through model matching, and the (i+1)-th candidate corrected text is obtained.
When i < 6, i = i + 1 and step S401 is repeated; when i = 6, the 1st to 6th candidate corrected texts are superimposed to obtain the final corrected text, as shown in Fig. 9.
In another implementation, i in step S403 need not be limited to 6; it may also be any one of 5, 7 and 8.
It can be understood that the model matching process matches the rotated preliminary text regions against the English text of the training-set images: if the rotated image overlaps a training-set image, the overlapping part is retained; then another rotation is applied to the preliminary text regions, and if the rotated image overlaps another training-set image, the overlapping part is again retained. Finally all the overlapping parts are superimposed to obtain the final corrected text.
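A minimal sketch of steps S401-S403 above: rotation by the stated formula x' = x cos θ + y sin θ, y' = y cos θ - x sin θ, six angles from -30° to +20° in 10° steps, filtering, then superposition. The callable `filter_group_boxes` stands in for the model matching process, whose internals this sketch does not implement, and the final superposition is simplified to collecting the surviving regions.

```python
import math

def rotate_points(points, theta_deg):
    """Rotate pixel coordinates by the patent's formula: x' = x cos t + y sin t, y' = y cos t - x sin t."""
    t = math.radians(theta_deg)
    return [(x * math.cos(t) + y * math.sin(t),
             y * math.cos(t) - x * math.sin(t)) for (x, y) in points]

def correct_direction(grouped_regions, filter_group_boxes):
    """grouped_regions: list of point lists; filter_group_boxes: callable removing erroneous group boxes."""
    candidates = []
    alpha = -30                                          # initial values: i = 1, alpha = -30 degrees
    for i in range(1, 7):                                # i = 1 .. 6
        rotated = [rotate_points(region, alpha) for region in grouped_regions]
        candidates.append(filter_group_boxes(rotated))   # i-th candidate corrected text
        alpha += 10                                      # alpha = alpha + 10 degrees
    # simplified superposition: the patent retains the overlapping parts of the six candidates
    final = [region for candidate in candidates for region in candidate]
    return final
```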
Fig. 9(a) shows the model called the "inclined group box", in which each box contains only one letter. This erroneously introduced group box mainly appears when the text is tilted in a single direction.
Fig. 9(b) shows the "long-interval group box", an erroneously introduced group box in which the letters are located at the two ends of each box with a very large interval between them. This case occurs when texts arranged in different directions are too close to each other.
The rotation increment and the number of rotations are important factors in the detection result. To balance quality and time complexity, the maximum rotation angle is set to 30 degrees. As shown in Fig. 10, the rotation increment ranges from 1 to 15 degrees; the smaller the increment, the more rotations are needed to reach the maximum angle. Experimental results show that when the increment reaches 10 degrees, the three indicators, namely precision, recall and f-measure, reach their peak; this is taken as the final rotation angle threshold of the proposed method.
In the present invention, to verify the correctness and effectiveness of the proposed algorithm, comparative experiments were carried out on the ICDAR 2011 and ICDAR 2013 datasets. The ICDAR 2011 test set contains 255 images and the ICDAR 2013 test set contains 233 images. Each image has a corresponding txt document that records the exact coordinates of the text to be detected.
The evaluation of the detection result mainly computes the overlap between the detected corrected text regions and the ground-truth text regions. For each rectangle to be evaluated, the maximum matching value is used. The formula is as follows:
m(r; R) = max{ m(r, r') | r' ∈ R }
where r denotes a corrected text region, r' denotes a ground-truth text region, a(r) denotes the rectangular area of the corrected text region r, and R denotes the set of matched regions. After the maximum area match is obtained, the precision, recall and f-measure are computed. The formulas are as follows:
E denotes the set of detected corrected text regions and T denotes the set of rectangles to be evaluated. The f-measure is a combination of precision and recall; the relative weight of precision and recall is controlled by the parameter α, which is usually set to 0.5 so that precision and recall have equal weight.
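The precision, recall and f-measure formulas themselves do not survive in this text; the sketch below uses standard ICDAR-style definitions (precision = Σ_{r∈E} m(r, T) / |E|, recall = Σ_{r∈T} m(r, E) / |T|, f = 1 / (α/p + (1-α)/r) with α = 0.5) as an assumption consistent with the surrounding description, with rectangle overlap as the match measure m.

```python
def match_score(rect, rect_set):
    """m(r; R) = max over r' in R of the overlap between rectangle r and r' (IoU used as an assumption)."""
    def overlap(a, b):
        ax0, ay0, ax1, ay1 = a
        bx0, by0, bx1, by1 = b
        iw = max(0, min(ax1, bx1) - max(ax0, bx0))
        ih = max(0, min(ay1, by1) - max(ay0, by0))
        inter = iw * ih
        union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
        return inter / union if union > 0 else 0.0
    return max((overlap(rect, other) for other in rect_set), default=0.0)

def evaluate(detections, ground_truth, alpha=0.5):
    """Assumed ICDAR-style precision, recall and f-measure over axis-aligned rectangles (x0, y0, x1, y1)."""
    precision = (sum(match_score(r, ground_truth) for r in detections) / len(detections)) if detections else 0.0
    recall = (sum(match_score(r, detections) for r in ground_truth) / len(ground_truth)) if ground_truth else 0.0
    if precision == 0.0 or recall == 0.0:
        return precision, recall, 0.0
    f_measure = 1.0 / (alpha / precision + (1 - alpha) / recall)
    return precision, recall, f_measure
```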
In the present invention, several comparative experiments demonstrate that the proposed method can extract more text regions.
Table 1. Extraction results of different MSER methods
According to Table 1 (only letter-level performance is considered, not the final word-level result), after the Laplacian and multi-channel preprocessing, more text regions can be extracted (the recall increases), but more non-text regions are also extracted (the precision decreases).
To illustrate the effectiveness of the method used in the invention, this method is compared quantitatively with existing text detection methods. The training set was generated from ICDAR 2011 and ICDAR 2013 using the multi-channel MSER; it contains 44908 English text images and 56139 non-English-text images. 25% of the training set was taken as the validation set, and through the training process the accuracy reached 96%. The SPP-CNN was trained with cross-validation and stochastic gradient descent (SGD). Five methods are compared on ICDAR 2011 and ICDAR 2013; Fig. 11 shows English text images identified by the present invention, from which it can be seen that the invention can effectively identify English text and realize the correction.
As can be seen from Table 2 and Table 3, the method of the present invention is superior to the prior art in recall and f-measure.
Table 2. Scene text detection results on ICDAR 2011
Table 3. Scene text detection results on ICDAR 2013
The references cited in Table 2 and Table 3 are as follows:
[1] Liu Z, Li Y, Qi X, et al. Method for unconstrained text detection in natural scene image[J]. IET Computer Vision, 2017, 11(7): 596-604.
[2] Wu H, Zou B, Zhao Y Q, et al. Natural scene text detection by multi-scale adaptive color clustering and non-text filtering[J]. Neurocomputing, 2016, 214: 1011-1025.
[3] Yu C, Song Y, Zhang Y. Scene text localization using edge analysis and feature pool[J]. Neurocomputing, 2015, 175: 652-661.
[4] Yao Li, Wenjing Jia, Chunhua Shen, et al. Characterness: An indicator of text in the wild[J]. IEEE Transactions on Image Processing, 2014, 23(4): 1666-1677.
[5] Tian C, Xia Y, Zhang X, et al. Natural scene text detection with MC-MR candidate extraction and coarse-to-fine filtering[J]. Neurocomputing, 2017.
[6] Zhu A, Gao R, Uchida S. Could scene context be beneficial for scene text detection?[J]. Pattern Recognition, 2016, 58: 204-215.
[7] Neumann L, Matas J. Efficient scene text localization and recognition with local character refinement[C]//International Conference on Document Analysis and Recognition. IEEE, 2015: 746-750.
[8] Gomez L, Karatzas D. A fast hierarchical method for multi-script and arbitrary oriented scene text extraction[J]. 2014, 19(4): 1-15.
The present invention can detect slightly inclined text and English text with different fonts or sizes, as illustrated by the successfully detected cases in Fig. 9.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be completed by instructing the relevant hardware through a program; the program can be stored in a computer-readable storage medium, and the storage medium may include ROM, RAM, a magnetic disk, an optical disc, etc.
The embodiments provided above describe the objectives, technical solutions and advantages of the present invention in further detail. It should be understood that the embodiments provided above are only preferred embodiments of the present invention and are not intended to limit the invention; any modification, equivalent substitution, improvement, etc. made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (9)

1. An English text detection method with text orientation correction, characterized by comprising the following steps:
S1. performing maximally stable extremal region detection on each channel of the sharpened English text image, and extracting the maximally stable extremal regions from the image to obtain candidate text regions;
S2. building a classifier based on a convolutional neural network model and extracting the features of the candidate text regions; dividing the candidate text regions into text-class regions and non-text-class regions with the softmax function according to the features of the candidate text regions; and filtering out the non-text-class regions to obtain the preliminary text regions, i.e., the detected English text;
S3. grouping the preliminary text regions by a two-layer text grouping algorithm;
S4. performing direction correction on the grouped preliminary text regions to realize the correction of the English text:
S401. using the coordinate rotation formula, rotating the grouped preliminary text regions clockwise by α degrees, with the initial values i = 1 and α = -30°;
S402. filtering out the erroneously introduced group boxes through the model matching process, and obtaining the i-th candidate corrected text region;
S403. when i < 6, setting i = i + 1 and α = α + 10° and returning to step S401; when i = 6, superimposing the 1st to 6th candidate corrected texts to obtain the final corrected text.
2. The English text detection method with text orientation correction according to claim 1, characterized in that the channels include: the red channel, green channel, blue channel, hue channel, saturation channel, value (lightness) channel and gray channel.
3. The English text detection method with text orientation correction according to claim 1, characterized in that building the classifier based on the convolutional neural network model and extracting the features of the candidate text regions includes: obtaining the first features of the candidate text regions through a five-layer architecture in the classifier, and obtaining the second features of the candidate text regions through a cross-layer, wherein the five-layer architecture comprises, connected in sequence, a first convolutional layer, a max-pooling layer, a second convolutional layer, a pyramid pooling layer and a fully connected layer, and the cross-layer connects the first convolutional layer to the fully connected layer.
4. The English text detection method with text orientation correction according to claim 3, characterized in that the first features are obtained as follows: the candidate text regions are filtered for the first time by the first convolution kernel in the first layer; the candidate text regions after the first filtering are max-pooled in the second layer; the candidate text regions after max-pooling are filtered for the second time by the second convolution kernel in the third layer; the candidate text regions after the second filtering are pyramid-pooled in the fourth layer; and the candidate text regions after pyramid pooling are fully connected in the fifth layer, so as to extract the first features of the candidate text regions.
5. The English text detection method with text orientation correction according to claim 3, characterized in that the second features are obtained as follows: using the manually added features, the candidate text regions are filtered for the first time by the first convolution kernel, and the filtered candidate text regions are fully connected according to the manually added features, so as to extract the second features of the candidate text regions.
6. The English text detection method with text orientation correction according to claim 5, characterized in that the manually added features include: aspect (height-to-width) ratio, compactness, stroke-width-to-area ratio, local contrast and boundary key points.
7. The English text detection method with text orientation correction according to claim 1, characterized in that grouping the preliminary text regions by the two-layer text grouping algorithm includes grouping the preliminary text regions vertically, which specifically includes:
obtaining the minimum Y coordinate b_n of the pixels with value 255 in the n-th preliminary text region; obtaining the maximum Y coordinate t_{n+1} of the pixels with value 255 in the (n+1)-th preliminary text region; obtaining the height h_{n+1} of the (n+1)-th preliminary text region;
computing the height difference d_{n,n+1}; if d_{n,n+1} is greater than the height threshold, assigning the two preliminary text regions to the same class, i.e., the same text line; if d_{n,n+1} is less than or equal to the height threshold, the two preliminary text regions are not in the same class, the (n+1)-th preliminary text region is regarded as a new text line, and the new text line is split off in the Y direction.
8. The English text detection method with text orientation correction according to claim 7, characterized in that grouping the preliminary text regions by the two-layer text grouping algorithm further includes grouping the preliminary text regions horizontally, which specifically includes:
obtaining the distance Δd along the X axis between two adjacent preliminary text regions in the same text line, the distance Δd being either d_1, the distance between letters within the same word, or d_2, the distance between words;
using a coefficient, the mean width of all letters in the text line, and separating words according to a width threshold;
obtaining the ratio d_h of the spacing to the mean letter width; if d_h is less than the width threshold, the two adjacent preliminary text regions belong to the same class, i.e., the same word; if d_h is greater than or equal to the width threshold, the two adjacent preliminary text regions do not belong to the same class, i.e., they do not belong to the same word, and the latter preliminary text region is taken as the beginning of a new word.
9. The English text detection method with text orientation correction according to claim 1, characterized in that the group boxes include: inclined group boxes and long-interval group boxes; an inclined group box contains one letter, and the letters contained in a long-interval group box are located at its two ends.
CN201810429149.XA 2018-05-08 2018-05-08 A kind of English text detection method with text orientation correction Active CN108647681B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810429149.XA CN108647681B (en) 2018-05-08 2018-05-08 A kind of English text detection method with text orientation correction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810429149.XA CN108647681B (en) 2018-05-08 2018-05-08 A kind of English text detection method with text orientation correction

Publications (2)

Publication Number Publication Date
CN108647681A CN108647681A (en) 2018-10-12
CN108647681B true CN108647681B (en) 2019-06-14

Family

ID=63749675

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810429149.XA Active CN108647681B (en) 2018-05-08 2018-05-08 A kind of English text detection method with text orientation correction

Country Status (1)

Country Link
CN (1) CN108647681B (en)

Families Citing this family (63)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9318108B2 (en) 2010-01-18 2016-04-19 Apple Inc. Intelligent automated assistant
US8977255B2 (en) 2007-04-03 2015-03-10 Apple Inc. Method and system for operating a multi-function portable electronic device using voice-activation
US8676904B2 (en) 2008-10-02 2014-03-18 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US10706373B2 (en) 2011-06-03 2020-07-07 Apple Inc. Performing actions associated with task items that represent tasks to perform
US10417037B2 (en) 2012-05-15 2019-09-17 Apple Inc. Systems and methods for integrating third party services with a digital assistant
JP2016508007A (en) 2013-02-07 2016-03-10 アップル インコーポレイテッド Voice trigger for digital assistant
US10652394B2 (en) 2013-03-14 2020-05-12 Apple Inc. System and method for processing voicemail
US10748529B1 (en) 2013-03-15 2020-08-18 Apple Inc. Voice activated device for use with a voice-based digital assistant
US10176167B2 (en) 2013-06-09 2019-01-08 Apple Inc. System and method for inferring user intent from speech inputs
US9715875B2 (en) 2014-05-30 2017-07-25 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US9966065B2 (en) 2014-05-30 2018-05-08 Apple Inc. Multi-command single utterance input method
US10170123B2 (en) 2014-05-30 2019-01-01 Apple Inc. Intelligent assistant for home automation
US9338493B2 (en) 2014-06-30 2016-05-10 Apple Inc. Intelligent automated assistant for TV user interactions
US9886953B2 (en) 2015-03-08 2018-02-06 Apple Inc. Virtual assistant activation
US10200824B2 (en) 2015-05-27 2019-02-05 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on a touch-sensitive device
US20160378747A1 (en) 2015-06-29 2016-12-29 Apple Inc. Virtual assistant for media playback
US10747498B2 (en) 2015-09-08 2020-08-18 Apple Inc. Zero latency digital assistant
US10740384B2 (en) 2015-09-08 2020-08-11 Apple Inc. Intelligent automated assistant for media search and playback
US10671428B2 (en) 2015-09-08 2020-06-02 Apple Inc. Distributed personal assistant
US10331312B2 (en) 2015-09-08 2019-06-25 Apple Inc. Intelligent automated assistant in a media environment
US10691473B2 (en) 2015-11-06 2020-06-23 Apple Inc. Intelligent automated assistant in a messaging environment
US10956666B2 (en) 2015-11-09 2021-03-23 Apple Inc. Unconventional virtual assistant interactions
US10223066B2 (en) 2015-12-23 2019-03-05 Apple Inc. Proactive assistance based on dialog communication between devices
US10586535B2 (en) 2016-06-10 2020-03-10 Apple Inc. Intelligent digital assistant in a multi-tasking environment
DK179415B1 (en) 2016-06-11 2018-06-14 Apple Inc Intelligent device arbitration and control
DK201670540A1 (en) 2016-06-11 2018-01-08 Apple Inc Application integration with a digital assistant
US10726832B2 (en) 2017-05-11 2020-07-28 Apple Inc. Maintaining privacy of personal information
DK180048B1 (en) 2017-05-11 2020-02-04 Apple Inc. MAINTAINING THE DATA PROTECTION OF PERSONAL INFORMATION
DK179745B1 (en) 2017-05-12 2019-05-01 Apple Inc. SYNCHRONIZATION AND TASK DELEGATION OF A DIGITAL ASSISTANT
DK201770429A1 (en) 2017-05-12 2018-12-14 Apple Inc. Low-latency intelligent automated assistant
DK179496B1 (en) 2017-05-12 2019-01-15 Apple Inc. USER-SPECIFIC Acoustic Models
US10303715B2 (en) 2017-05-16 2019-05-28 Apple Inc. Intelligent automated assistant for media exploration
US20180336892A1 (en) 2017-05-16 2018-11-22 Apple Inc. Detecting a trigger of a digital assistant
US10818288B2 (en) 2018-03-26 2020-10-27 Apple Inc. Natural assistant interaction
US11145294B2 (en) 2018-05-07 2021-10-12 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US10928918B2 (en) 2018-05-07 2021-02-23 Apple Inc. Raise to speak
US10892996B2 (en) 2018-06-01 2021-01-12 Apple Inc. Variable latency device coordination
DK180639B1 (en) 2018-06-01 2021-11-04 Apple Inc DISABILITY OF ATTENTION-ATTENTIVE VIRTUAL ASSISTANT
DK179822B1 (en) 2018-06-01 2019-07-12 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11462215B2 (en) 2018-09-28 2022-10-04 Apple Inc. Multi-modal inputs for voice commands
CN109800735A (en) * 2019-01-31 2019-05-24 中国人民解放军国防科技大学 Accurate detection and segmentation method for ship target
US11348573B2 (en) 2019-03-18 2022-05-31 Apple Inc. Multimodality in digital assistant systems
CN109934229B (en) * 2019-03-28 2021-08-03 网易有道信息技术(北京)有限公司 Image processing method, device, medium and computing equipment
DK201970509A1 (en) 2019-05-06 2021-01-15 Apple Inc Spoken notifications
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
DK180129B1 (en) 2019-05-31 2020-06-02 Apple Inc. User activity shortcut suggestions
DK201970511A1 (en) 2019-05-31 2021-02-15 Apple Inc Voice identification in digital assistant systems
US11468890B2 (en) 2019-06-01 2022-10-11 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
CN110298343A (en) * 2019-07-02 2019-10-01 哈尔滨理工大学 A kind of hand-written blackboard writing on the blackboard recognition methods
US11488406B2 (en) 2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators
CN110674815A (en) * 2019-09-29 2020-01-10 四川长虹电器股份有限公司 Invoice image distortion correction method based on deep learning key point detection
CN112825141B (en) * 2019-11-21 2023-02-17 上海高德威智能交通系统有限公司 Method and device for recognizing text, recognition equipment and storage medium
CN111353493B (en) * 2020-03-31 2023-04-28 中国工商银行股份有限公司 Text image direction correction method and device
WO2021196013A1 (en) * 2020-03-31 2021-10-07 京东方科技集团股份有限公司 Word recognition method and device, and storage medium
US11183193B1 (en) 2020-05-11 2021-11-23 Apple Inc. Digital assistant hardware abstraction
US11061543B1 (en) 2020-05-11 2021-07-13 Apple Inc. Providing relevant data items based on context
US11755276B2 (en) 2020-05-12 2023-09-12 Apple Inc. Reducing description length based on confidence
US11490204B2 (en) 2020-07-20 2022-11-01 Apple Inc. Multi-device audio adjustment coordination
US11438683B2 (en) 2020-07-21 2022-09-06 Apple Inc. User identification using headphones
CN113298079B (en) * 2021-06-28 2023-10-27 北京奇艺世纪科技有限公司 Image processing method and device, electronic equipment and storage medium
CN113837169B (en) * 2021-09-29 2023-12-19 平安科技(深圳)有限公司 Text data processing method, device, computer equipment and storage medium
CN114283431B (en) * 2022-03-04 2022-06-28 南京安元科技有限公司 Text detection method based on differentiable binarization

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103325099A (en) * 2013-07-11 2013-09-25 北京智诺英特科技有限公司 Image correcting method and device

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105868758A (en) * 2015-01-21 2016-08-17 阿里巴巴集团控股有限公司 Method and device for detecting text area in image and electronic device
CN105279149A (en) * 2015-10-21 2016-01-27 上海应用技术学院 Chinese text automatic correction method
CN105426887A (en) * 2015-10-30 2016-03-23 北京奇艺世纪科技有限公司 Method and device for text image correction
CN105740774A (en) * 2016-01-25 2016-07-06 浪潮软件股份有限公司 Text region positioning method and apparatus for image
CN107992869A (en) * 2016-10-26 2018-05-04 深圳超多维科技有限公司 For tilting the method, apparatus and electronic equipment of word correction
CN106650725A (en) * 2016-11-29 2017-05-10 华南理工大学 Full convolutional neural network-based candidate text box generation and text detection method
CN106778757A (en) * 2016-12-12 2017-05-31 哈尔滨工业大学 Scene text detection method based on text conspicuousness
CN106997470A (en) * 2017-02-28 2017-08-01 信雅达系统工程股份有限公司 Tilt bearing calibration and the system of text image
CN107066972A (en) * 2017-04-17 2017-08-18 武汉理工大学 Natural scene Method for text detection based on multichannel extremal region

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
Detecting Oriented Text in Natural Images by Linking Segments;Baoguang Shi等;《2017 IEEE Conference on Computer Vision and Pattern Recognition》;20170726;1-9
Scene Text Detection Based on Enhanced Multi-channels MSER and a Fast Text Grouping Process;Jin Dai等;《2018 the 3rd IEEE International Conference on Cloud Computing and Big Data Analysis》;20180422;正文第351-354页
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition;Kaiming He等;《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》;20150930;第37卷(第9期);1904-1916
Text detection based on convolutional neural networks with spatial pyramid pooling;Rui Zhu等;《2016 IEEE International Conference on Image Processing》;20160928;正文第2.2节,图3
Text-Attentional Convolutional Neural Network for Scene Text Detection;Tong He等;《IEEE TRANSACTIONS ON IMAGE PROCESSING》;20160630;第25卷(第6期);2529-2541
Research and application of text image orientation based on character structural features; 朱其猛; China Masters' Theses Full-text Database, Information Science and Technology; 2014-09-15 (No. 09); pp. 33-35
Research on visual tracking algorithms based on deep networks; 李玉冰; China Masters' Theses Full-text Database, Information Science and Technology; 2018-01-15 (No. 01); pp. 29-30, Figs. 3-5

Also Published As

Publication number Publication date
CN108647681A (en) 2018-10-12

Similar Documents

Publication Publication Date Title
CN108647681B (en) A kind of English text detection method with text orientation correction
CN111325203B (en) American license plate recognition method and system based on image correction
Goodfellow et al. Multi-digit number recognition from street view imagery using deep convolutional neural networks
CN107491730A (en) A kind of laboratory test report recognition methods based on image procossing
CN101976258B (en) Video semantic extraction method by combining object segmentation and feature weighing
Tian et al. Natural scene text detection with MC–MR candidate extraction and coarse-to-fine filtering
CN111539409B (en) Ancient tomb question and character recognition method based on hyperspectral remote sensing technology
CN108647695A (en) Soft image conspicuousness detection method based on covariance convolutional neural networks
CN114067444A (en) Face spoofing detection method and system based on meta-pseudo label and illumination invariant feature
CN110738216A (en) Medicine identification method based on improved SURF algorithm
CN110472652A (en) A small amount of sample classification method based on semanteme guidance
CN116071763A (en) Teaching book intelligent correction system based on character recognition
CN106203448A (en) A kind of scene classification method based on Nonlinear Scale Space Theory
CN111340032A (en) Character recognition method based on application scene in financial field
CN115311746A (en) Off-line signature authenticity detection method based on multi-feature fusion
CN104899551B (en) A kind of form image sorting technique
CN109741351A (en) A kind of classification responsive type edge detection method based on deep learning
CN110222660B (en) Signature authentication method and system based on dynamic and static feature fusion
CN110728214B (en) Weak and small figure target detection method based on scale matching
CN116259062A (en) CNN handwriting identification method based on multichannel and attention mechanism
CN109858353A (en) Facial image feature extracting method based on mark transformation and LBP
Su et al. Skew detection for Chinese handwriting by horizontal stroke histogram
CN110555792B (en) Image tampering blind detection method based on normalized histogram comprehensive feature vector
CN111612045A (en) Universal method for acquiring target detection data set
CN109740618A (en) Network paper score method for automatically counting and device based on FHOG feature

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant