CN108647681A - An English text detection method with text orientation correction - Google Patents
An English text detection method with text orientation correction
- Publication number: CN108647681A (application CN201810429149.XA)
- Authority
- CN
- China
- Prior art keywords
- text
- region
- preliminary
- text region
- candidate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/62—Text, e.g. of license plates, overlay texts or captions on TV images
- G06V20/63—Scene text, e.g. street names
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention belongs to the technical field of image processing, and specifically provides an English text detection method with text orientation correction. The method comprises: performing maximally stable extremal region (MSER) detection on each channel of an English text image to obtain candidate text regions; building a classifier based on a convolutional neural network model to filter out false candidate text regions and obtain preliminary text regions; grouping the preliminary text regions with a two-layer text grouping algorithm; and performing orientation correction on the grouped preliminary text regions to obtain the corrected text. The invention uses an enhanced multi-channel MSER model to obtain finer text regions, and introduces a parallel SPP-CNN classifier to better distinguish text regions from non-text regions; the classifier can handle images of arbitrary size and extract pooled features at multiple scales, so that more features can be learned from the multi-layer spatial information of the source image. The invention can handle slightly tilted scene text.
Description
Technical field
The invention belongs to the technical field of image processing, and specifically relates to an English text detection method with text orientation correction.
Background technology
Text in natural scene images carries precise and rich information, which is of great significance for image analysis, image-based translation, image search and the like. Over the past twenty years, researchers have proposed a number of methods for detecting text in natural scene images, and there are many content-based multimedia understanding applications, such as automatic visual classification, image retrieval, assisted navigation, multilingual translation, object recognition and consumer-oriented applications.
Scene text detection faces several key problems. (1) Text in document images has regular fonts, similar colors, and uniform size and distribution, whereas text in natural scenes may have different fonts, colors, scales and orientations even within the same scene. (2) The background of a natural scene image may be extremely complex: signs, fences, bricks and grass are hard to distinguish from real text and easily cause confusion and errors. (3) Other interfering factors exist in scene text images, such as non-uniform illumination, blur and translucency effects.
Researchers have proposed many methods to detect text in natural scene images; there are two main families.
Texture-based methods treat text as a specific type of texture and use texture properties, such as local intensity, filter responses and wavelet coefficients, to distinguish text regions from non-text regions in an image. These methods are usually computationally expensive, because all positions and scales must be scanned; moreover, they mainly handle horizontal text and are very sensitive to rotation and scaling.
Component-based methods treat text as connected components: text is first extracted by various means (such as color clustering or extremal region extraction), and non-text components are then filtered out using hand-designed rules or automatically trained classifiers. Component-based methods are generally more efficient, because the number of components to process is relatively small; in addition, they are insensitive to rotation, scaling and font. A conventional method for detecting candidate text regions (Candidate Text Region, CTR) is Maximally Stable Extremal Regions (MSER), which is highly robust to affine image changes and can efficiently extract the text regions in an image; later work improved the MSER extraction algorithm so that its time complexity is linear.
These methods separate text regions from non-text regions according to distinguishing rules or features. Although they can detect text, they lack any correction of the English text and perform poorly on tilted text: the recognized text can be severely fragmented because of the tilt of the words.
Summary of the invention
In view of this, the present invention proposes an English text detection method with text orientation correction, which can effectively recognize text and correct the tilted text it recognizes. The method specifically comprises the following steps:
S1. Perform maximally stable extremal region detection on each channel of the sharpened English text image, extracting MSERs from the image as text candidates to obtain candidate text regions.
S2. Build a classifier based on a convolutional neural network model and extract the features of the candidate text regions; using a softmax function, divide the candidate text regions into text-class regions and non-text-class regions according to those features; filter out the non-text-class regions to obtain the preliminary text regions, i.e. the detected English text.
S3. Group the preliminary text regions using a two-layer text grouping algorithm.
S4. Perform orientation correction on the grouped preliminary text regions to realize the correction of the English text.
Further, the channels include: the red channel, green channel, blue channel, hue channel, saturation channel, value channel and grey channel.
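The seven-channel decomposition above can be sketched in plain Python with the standard `colorsys` module. This is a minimal illustration, not the patent's implementation; the BT.601 grey conversion is an assumption, since the patent does not specify which grey formula it uses.

```python
import colorsys

def decompose_channels(rgb_image):
    """Split an RGB image (rows of (r, g, b) tuples, values 0-255) into
    the seven channels the method runs MSER on: R, G, B, H, S, V, grey."""
    names = ("R", "G", "B", "H", "S", "V", "grey")
    channels = {name: [] for name in names}
    for row in rgb_image:
        new_rows = {name: [] for name in names}
        for r, g, b in row:
            h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
            new_rows["R"].append(r)
            new_rows["G"].append(g)
            new_rows["B"].append(b)
            new_rows["H"].append(int(h * 255))
            new_rows["S"].append(int(s * 255))
            new_rows["V"].append(int(v * 255))
            # ITU-R BT.601 luma weights (an assumption; the patent does
            # not state its grey conversion).
            new_rows["grey"].append(int(0.299 * r + 0.587 * g + 0.114 * b))
        for name in names:
            channels[name].append(new_rows[name])
    return channels
```

MSER detection would then be run independently on each of the seven single-channel images, and the candidate regions pooled.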
Further, building the classifier based on the convolutional neural network model and extracting the features of the candidate text regions includes: obtaining the first features of the candidate text regions through the five-layer architecture of the classifier, and obtaining their second features through a cross layer. The five-layer architecture consists of, in sequence, a first convolutional layer, a max-pooling layer, a second convolutional layer, a spatial pyramid pooling layer and a fully connected layer; the cross layer connects the first convolutional layer directly to the fully connected layer.
Further: the candidate text regions are filtered a first time by the first convolution kernel in the first layer; the filtered candidates are max-pooled in the second layer; the second convolution kernel in the third layer filters the pooled candidates a second time; the twice-filtered candidates pass through spatial pyramid pooling in the fourth layer; and the pooled candidates are fully connected in the fifth layer, obtaining the first features of the candidate text regions.
Further: using the manually added features, the candidate text regions filtered once by the first convolution kernel are fully connected according to those features, obtaining the second features of the candidate text regions.
Further, the manually added features include: aspect ratio, compactness, stroke-width area ratio, local contrast and boundary key points.
Further, the local contrast is computed from the channel values of the region, where lc denotes the local contrast; R_i, G_i and B_i denote the i-th pixel of the red, green and blue channels respectively; n denotes the total number of pixels in the MSER region; and k denotes the number of boundary key points.
Further, the boundary key points are obtained as follows: build a binary image; iterate over all pixels of the binary image; compute the contour points; and compress the contour points using the Douglas-Peucker algorithm to obtain the boundary key points. Specifically:
The grey value of every pixel belonging to the maximally stable extremal region is set to 255; the grey value of every pixel outside the region but inside its minimum enclosing rectangle is set to 0. If a pixel (x, y) has value p(x, y) = 255 and one of p(x+1, y), p(x-1, y), p(x, y+1), p(x, y-1) has value 0, then pixel (x, y) is a contour point. The contour points are compressed with the Douglas-Peucker algorithm, and the remaining contour points after compression are the boundary key points.
Further, grouping the preliminary text regions with the two-layer text grouping algorithm includes grouping the preliminary text regions vertically and horizontally.
The vertical grouping specifically comprises:
obtain b_n, the minimum Y coordinate of the pixels with value 255 in the n-th preliminary text region; obtain t_{n+1}, the maximum Y coordinate of the pixels with value 255 in the (n+1)-th preliminary text region; and obtain h_{n+1}, the height of the (n+1)-th preliminary text region.
Compute the height difference d_{n,n+1}. If d_{n,n+1} is greater than the height threshold, the two preliminary text regions are assigned to the same class, i.e. they belong to the same text line. If d_{n,n+1} is less than or equal to the height threshold, the two preliminary text regions are not in the same class: the (n+1)-th preliminary text region is regarded as a new class, and a new text line is split off along the Y axis.
The horizontal grouping specifically comprises:
obtain Δd, the distance between two adjacent preliminary text regions of the same text line along the X axis; Δd covers both d_1, the distance between letters within a word, and d_2, the distance between words.
Using a coefficient that represents the mean width of all letters in the text line, words are separated by a width threshold: obtain d_h, the ratio of the letter spacing to the interval. If d_h is less than the width threshold, the two adjacent preliminary text regions belong to the same class, i.e. the same word; if d_h is greater than or equal to the width threshold, the two adjacent preliminary text regions do not belong to the same class, i.e. the two regions do not belong to the same word, and the latter preliminary text region is taken as the beginning of a new word.
Further, performing orientation correction on the grouped preliminary text regions includes:
S401. Using the coordinate rotation formula, rotate the grouped preliminary text regions clockwise by α degrees, with initial values i = 1 and α = -30°.
S402. Filter out the erroneously introduced grouping boxes through a model matching process, obtaining the i-th candidate corrected text region.
S403. While i < 6, set i = i + 1 and α = α + 10° and return to step S401. When i = 6, superimpose the 1st through 6th candidate corrected texts to obtain the final corrected text.
Further, the coordinate rotation formula is:
x' = x cos θ + y sin θ
y' = y cos θ - x sin θ
where x and y denote the abscissa and ordinate of a pixel, θ denotes the rotation angle threshold, and x' and y' denote the abscissa and ordinate of the pixel after rotation.
The grouping boxes include the tilted grouping box, which contains a single letter, and the long-interval grouping box, whose letters are located at its two ends.
Beneficial effects of the present invention: an enhanced multi-channel MSER model is used, detecting MSERs in the R, G, B, H, S, V and grey channels to obtain finer candidate text regions. A parallel SPP-CNN (Spatial Pyramid Pooling Convolutional Neural Network) classifier is introduced to better distinguish text regions from non-text regions; it can handle images of arbitrary size and extract pooled features at multiple scales, so that more features can be learned through the multi-layer spatial information of the source image (the English text image). The erroneously introduced grouping boxes are filtered through a model matching process; slightly tilted scene text can be handled, realizing the correction of English text.
Description of the drawings
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the architecture of the cross-layer SPP-CNN algorithm used by the present invention;
Fig. 3 is a schematic diagram of how SPP works in the prior art;
Fig. 4 is the architecture of the text grouping system of the present invention;
Fig. 5 is a schematic diagram of the text line constraints of the present invention;
Fig. 6 shows the preliminary text regions obtained without orientation correction;
Fig. 7 shows the final corrected text regions after orientation correction;
Fig. 8 is the orientation rotation model of the present invention;
Fig. 9 is the matching model of the grouping boxes of the present invention;
Fig. 10 shows the detection results under different rotations;
Fig. 11 shows example detection results.
Detailed description of the embodiments
To make the purpose, technical solution and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings; obviously, the described embodiments are only part of the embodiments of the present invention, not all of them.
The present invention provides an English text detection method with text orientation correction which, as shown in Fig. 1, comprises the following steps:
S1. Perform maximally stable extremal region detection on each channel of the sharpened English text image, extracting MSERs from the image as text candidates to obtain candidate text regions.
S2. Build a classifier based on a convolutional neural network (CNN) model, extract the features of the candidate text regions, and use a softmax function to divide the candidate text regions into text-class regions and non-text-class regions according to those features; filter out the non-text-class regions to obtain the preliminary text regions, i.e. the detected English text.
S3. Group the preliminary text regions using a two-layer text grouping algorithm.
S4. Perform orientation correction on the grouped preliminary text regions to realize the correction of the English text and obtain the corrected text, which is the corrected English text.
Preferably, the five-layer architecture of the CNN model used by the present invention is shown in Fig. 2:
The first layer uses a first convolution kernel of size 7 × 7 × 5, i.e. a kernel of length 7, width 7 and depth 5.
The second layer uses 5 × 5 × 5 max pooling, i.e. pooling of length 5, width 5 and depth 5.
The third layer uses a second convolution kernel of size 5 × 3 × 5, i.e. a kernel of length 5, width 3 and depth 5.
The fourth layer uses SPP pooling. Fig. 3 illustrates how SPP works: the same image is pooled with a 3 × 3 grid (i.e. a pooling of length 3 and width 3) dividing it into 9 blocks, a 2 × 2 grid dividing it into 4 blocks, and a 1 × 1 grid giving 1 block; the maximum of each block is computed to obtain the output neurons, so that an image of arbitrary size is converted into a 14-dimensional feature of fixed size. It can be understood that pyramids of other dimensions can be designed by increasing the number of levels or changing the size of the grid.
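The SPP step above can be sketched as follows: pooling one 2-D map at 3 × 3, 2 × 2 and 1 × 1 grids yields 9 + 4 + 1 = 14 values regardless of the input size. The function name is our own, and the depth dimension of the patent's layers is omitted here for clarity.

```python
def spp_pool(feature_map, levels=(3, 2, 1)):
    """Spatial pyramid pooling over a 2-D feature map (list of lists).
    For each level n the map is split into an n x n grid and the max of
    each cell is taken, giving 14 values for the default levels."""
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for n in levels:
        for i in range(n):
            for j in range(n):
                # Cell bounds; the max() guard keeps each cell non-empty
                # even when the map is smaller than the grid.
                r0, r1 = i * h // n, max((i + 1) * h // n, i * h // n + 1)
                c0, c1 = j * w // n, max((j + 1) * w // n, j * w // n + 1)
                out.append(max(feature_map[r][c]
                               for r in range(r0, r1)
                               for c in range(c0, c1)))
    return out
```

Because the output length depends only on the grid levels, the fully connected layer that follows always receives a fixed-size vector, which is what lets the classifier accept images of arbitrary size.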
The fifth layer is a fully connected layer. Specifically:
the candidate text regions are filtered a first time by the first convolution kernel in the first layer; the filtered candidates are max-pooled in the second layer; the second convolution kernel in the third layer filters the pooled candidates a second time; the twice-filtered candidates pass through spatial pyramid pooling in the fourth layer; and the pooled candidates are fully connected in the fifth layer, extracting the first features of the candidate text regions.
Using the manually added features, the candidate text regions filtered once by the first convolution kernel are fully connected according to those features, extracting the second features of the candidate text regions.
Preferably, the manually designed features are embedded into the whole CNN as the cross layer. The cross layer operates only in the first and fifth layers; the features it uses, namely the manually added features, include:
the aspect ratio, the compactness, the stroke-width area ratio, the local contrast lc and the boundary key points k,
where w and h are the width and height (in pixels) of the minimum enclosing rectangle of the maximally stable extremal region; a is the area of the minimum enclosing rectangle (the number of pixels in the region); and p is the number of boundary points of the minimum enclosing rectangle, denoted in the present invention by the boundary key points k.
The local contrast can be obtained from the following quantities: lc denotes the local contrast; R_i, G_i and B_i denote the i-th pixel of the red, green and blue channels respectively; n denotes the total number of pixels in the MSER region; and k denotes the number of boundary key points.
By connecting the boundary key points in sequence, the original region can be approximately restored, that is, the preliminary text region is obtained.
The calculation process of k is: build a binary image; iterate over all pixels of the binary image; compute the contour points; and compress the contour points with the Douglas-Peucker algorithm, the compressed contour points being the boundary key points. Specifically:
the grey value of every pixel belonging to the maximally stable extremal region is set to 255; the grey value of every pixel outside the region but inside its minimum enclosing rectangle is set to 0. If a pixel (x, y) has value p(x, y) = 255 and one of p(x+1, y), p(x-1, y), p(x, y+1), p(x, y-1) has value 0, then pixel (x, y) is a contour point. The contour points are compressed using the Douglas-Peucker algorithm, and the remaining contour points after compression are the boundary key points.
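The contour-compression step can be sketched with a small recursive Douglas-Peucker implementation in plain Python. The tolerance `epsilon` is a free parameter that the patent does not fix.

```python
def douglas_peucker(points, epsilon):
    """Douglas-Peucker polyline simplification: contour points whose
    perpendicular distance to the chord between the endpoints is below
    epsilon are dropped; the survivors are the boundary key points."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0  # guard degenerate chord
    # Perpendicular distance of each interior point to the chord.
    dists = [abs(dy * (x - x1) - dx * (y - y1)) / norm
             for x, y in points[1:-1]]
    idx = max(range(len(dists)), key=dists.__getitem__) + 1
    if dists[idx - 1] > epsilon:
        # Keep the farthest point and recurse on both halves.
        left = douglas_peucker(points[:idx + 1], epsilon)
        right = douglas_peucker(points[idx:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]
```

Connecting the surviving key points in order approximately restores the region outline, which is the property the description relies on.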
Alternatively, the class of the final features is obtained with the softmax classification function.
After the preliminary text regions are grouped with the two-layer text grouping algorithm, slight-tilt orientation correction is performed, as shown in Fig. 4. The procedure is divided into three parts: vertical grouping, horizontal grouping and orientation correction.
The main steps of the vertical grouping are as follows:
obtain b_n, the minimum Y coordinate of the pixels with value 255 in the n-th preliminary text region; obtain t_{n+1}, the maximum Y coordinate of the pixels with value 255 in the (n+1)-th preliminary text region; and obtain h_{n+1}, the height of the (n+1)-th preliminary text region, as shown in Fig. 5.
Compute the height difference d_{n,n+1}. If d_{n,n+1} is greater than the height threshold, the two preliminary text regions are assigned to the same class, i.e. the same text line; if d_{n,n+1} is less than or equal to the height threshold, the two preliminary text regions are not in the same class: the (n+1)-th preliminary text region is regarded as a new text line, which is split off along the Y axis.
In the present invention the height threshold is 0.62.
The main steps of the horizontal grouping are as follows:
obtain Δd, the distance between two adjacent preliminary text regions of the same text line along the X axis; Δd covers both d_1, the distance between letters within a word, and d_2, the distance between words.
Using a coefficient that represents the mean width of all letters in the text line, words are separated by a width threshold: obtain d_h, the ratio of the letter spacing to the interval. If d_h is less than the width threshold, the two adjacent preliminary text regions belong to the same class, i.e. the same word; if d_h is greater than or equal to the width threshold, the two adjacent preliminary text regions do not belong to the same class, i.e. they do not belong to the same word, and the latter preliminary text region is taken as the beginning of a new word.
The mean-width coefficient was obtained experimentally from the ICDAR 2013 training set, which contains 229 pictures and 1226 words.
The width threshold of the present invention is 2.33.
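The horizontal (word-level) grouping can be sketched as follows. The patent's exact spacing formula is given only as an image, so the ratio used here (gap between adjacent letter boxes divided by the mean letter width, compared against the 2.33 threshold) is an assumed reading of the description, not the patented formula.

```python
def group_words(letter_boxes, width_threshold=2.33):
    """Split one text line into words. Each box is (x_left, x_right).
    Gaps whose ratio to the mean letter width reaches width_threshold
    start a new word (an assumed reading of the spacing rule)."""
    boxes = sorted(letter_boxes)
    mean_w = sum(r - l for l, r in boxes) / len(boxes)
    words, current = [], [boxes[0]]
    for prev, box in zip(boxes, boxes[1:]):
        gap = box[0] - prev[1]
        if gap / mean_w >= width_threshold:
            words.append(current)  # large gap: close the current word
            current = []
        current.append(box)
    words.append(current)
    return words
```

With the 2.33 threshold, intra-word letter spacing (a fraction of a letter width) stays below the cut while inter-word gaps of two or more letter widths start a new word.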
The steps of the slight-tilt orientation correction are as follows:
Fig. 6 illustrates the preliminary text regions; it can be seen that the words are severely fragmented by the tilt, and "ne1Wor" is taken to be a word of the same line. Experiments show that words tilted within 10 degrees can be grouped correctly, so a coordinate-axis rotation strategy is used, yielding the final corrected text shown in Fig. 7.
Due to the rotation of the coordinate axes, the grouping box "wordline1" is grouped correctly, but the erroneously introduced grouping box "wordline2" is not corrected correctly, so the algorithm is improved with a rotation convergence strategy.
Performing orientation correction on the grouped text regions includes:
S401. Using the coordinate rotation formula, rotate the grouped preliminary text regions clockwise by α degrees, with initial values i = 1 and α = -30°.
S402. Filter out the erroneously introduced grouping boxes through a model matching process, obtaining the i-th candidate corrected text region.
S403. While i < 6, set i = i + 1 and α = α + 10° and return to step S401. When i = 6, superimpose the 1st through 6th candidate corrected texts to obtain the final corrected text.
Specifically: using the coordinate rotation formula, the grouped preliminary text regions are rotated clockwise or counterclockwise by tens of degrees, with initial value i = 1. In the present invention the rotations are 10, 20 and 30 degrees both clockwise and counterclockwise, as shown in Fig. 8.
The erroneously introduced grouping boxes are filtered through the model matching process to obtain the i-th candidate corrected text region; the i-th candidate corrected text is then rotated by tens of degrees clockwise or counterclockwise, the erroneously introduced grouping boxes are again filtered by model matching, and the (i+1)-th candidate corrected text is obtained.
While i < 6, set i = i + 1 and return to step S401; when i = 6, superimpose the 1st through 6th candidate corrected texts to obtain the final corrected text, as shown in Fig. 9.
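The S401-S403 loop can be sketched as follows. `rotate_points` applies the patent's clockwise rotation formula; the model-matching filter is abstracted as a caller-supplied function, since its details depend on the training set, and the "superposition" of per-angle results is simplified to collecting the surviving regions.

```python
import math

def rotate_points(points, degrees):
    """Clockwise coordinate rotation from the patent:
    x' = x cos(t) + y sin(t),  y' = y cos(t) - x sin(t)."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(x * c + y * s, y * c - x * s) for x, y in points]

def correct_orientation(regions, match_filter):
    """Try the six angles -30, -20, ..., +20 degrees (i = 1..6).
    match_filter(rotated_regions, angle) stands in for the model
    matching that removes erroneously introduced grouping boxes;
    the per-angle survivors are collected into the final result."""
    corrected = []
    for i in range(6):
        angle = -30 + 10 * i
        rotated = [rotate_points(r, angle) for r in regions]
        corrected.extend(match_filter(rotated, angle))
    return corrected
```

The 10-degree increment matches the experimentally chosen rotation step, and the ±30-degree range matches the stated maximum rotation angle.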
As another implementation, in step S403 i need not be limited to 6; it may also be any of 5, 7 or 8.
It can be understood that the model matching process matches the rotated preliminary text region image against the corresponding English text in the training set: if the rotated image overlaps a training set image, the overlapping part is retained; otherwise another rotation is applied to the preliminary text regions, and if the rotated image overlaps another training set image the overlapping part is again retained. Finally all the overlapping parts are superimposed to obtain the final corrected text.
Fig. 9(a) shows a model called the "tilted grouping box", in which each box contains only one letter. This erroneously introduced grouping box mainly occurs when the text tilts in a single direction.
Fig. 9(b) shows the "long-interval grouping box"; this erroneously introduced grouping box has its letters located at the two ends of the box with a very large interval between them. This case occurs when text arranged in different directions is too close together.
The rotation increment and the number of rotations are important factors in the detection result. To balance quality and time complexity, the maximum rotation angle is set to 30 degrees. As shown in Fig. 10, rotation increments from 1 to 15 degrees were tried; the smaller the increment, the more rotations are needed to reach the maximum angle. The experimental results show that when the increment reaches 10 degrees, the three indices (precision, recall and f-measure) reach their peak, so this is the final rotation angle threshold of the proposed method.
In the present invention, to verify the correctness and validity of the proposed algorithm, comparative experiments were carried out on the ICDAR 2011 and ICDAR 2013 datasets. The ICDAR 2011 test set contains 255 images and the ICDAR 2013 test set contains 233 images. Each image corresponds to a txt file recording the exact coordinates of the text to be detected.
The evaluation of the detection result mainly computes the overlap between the detected corrected text regions and the ground-truth text regions. For each rectangle to be evaluated, the maximum matching value is used:
m(r; R) = max{ m(r, r') | r' ∈ R }
where r denotes a corrected text region, r' denotes a ground-truth text region, A(r) denotes the rectangular area of the corrected text region r, and R denotes the set of matched regions. After the maximum area matching is obtained, the precision, recall and f-measure are computed, where E denotes the set of corrected text regions to be detected and T the set of rectangles to be evaluated. The f-measure is a combination of precision and recall; their relative weight is controlled by a parameter α, usually set to 0.5 so that precision and recall have equal weight.
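The weighted combination of precision and recall described above is, for α = 0.5, the ordinary harmonic mean. Since the patent's own precision/recall formulas are reproduced only as images, the definition below is the standard ICDAR-style one and should be read as an assumption:

```python
def f_measure(precision, recall, alpha=0.5):
    """Weighted harmonic combination of precision and recall.
    With alpha = 0.5 this is the ordinary F1 score; alpha shifts the
    relative weight between the two, as the description states."""
    return 1.0 / (alpha / precision + (1.0 - alpha) / recall)
```

For example, with precision 0.5 and recall 1.0 and the default α, the score is 2/3, which penalizes the weaker of the two indices as a harmonic mean should.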
In the present invention, comparative experiments demonstrate that the proposed method can extract more text regions.
Table 1. Extraction results of different MSER methods
According to Table 1 (considering only letter-level performance, not the final word-level result), after the Laplacian and multi-channel preprocessing, more text regions can be extracted (the recall increases), but more non-text regions are also extracted (the precision decreases).
To illustrate the validity of the method used by the present invention, it is compared quantitatively with existing text detection methods. The training set was generated from ICDAR 2011 and ICDAR 2013 using multi-channel MSER and manual labelling; it contains 44908 English text images and 56139 non-English-text images. 25% of the training set was used for validation, and the training process reached an accuracy of 96%. The SPP-CNN was trained with cross validation and stochastic gradient descent (SGD). Five methods were compared on ICDAR 2011 and ICDAR 2013; Fig. 11 shows English text images recognized by the present invention, from which it can be seen that the invention can effectively recognize English text and realize its correction.
As can be seen from Tables 2 and 3, the method of the present invention outperforms the prior art in recall and f-measure.
Table 2. Results on ICDAR 2011 scene text detection
Table 3. Results on ICDAR 2013 scene text detection
The references in Tables 2 and 3 correspond respectively to:
[1] Liu Z, Li Y, Qi X, et al. Method for unconstrained text detection in natural scene image[J]. IET Computer Vision, 2017, 11(7): 596-604.
[2] Wu H, Zou B, Zhao Y Q, et al. Natural scene text detection by multi-scale adaptive color clustering and non-text filtering[J]. Neurocomputing, 2016, 214: 1011-1025.
[3] Yu C, Song Y, Zhang Y. Scene text localization using edge analysis and feature pool[J]. Neurocomputing, 2015, 175: 652-661.
[4] Yao Li, Wenjing Jia, Chunhua Shen, et al. Characterness: An Indicator of Text in the Wild[J]. IEEE Transactions on Image Processing, 2014, 23(4): 1666-1677.
[5] Tian C, Xia Y, Zhang X, et al. Natural Scene Text Detection with MC-MR Candidate Extraction and Coarse-to-Fine Filtering[J]. Neurocomputing, 2017.
[6] Zhu A, Gao R, Uchida S. Could scene context be beneficial for scene text detection[J]. Pattern Recognition, 2016, 58: 204-215.
[7] Neumann L, Matas J. Efficient scene text localization and recognition with local character refinement[C]// International Conference on Document Analysis and Recognition. IEEE, 2015: 746-750.
[8] Gomez L, Karatzas D. A fast hierarchical method for multi-script and arbitrary oriented scene text extraction[J]. 2014, 19(4): 1-15.
The present invention can detect slightly tilted text and English text with different fonts or sizes; Fig. 9 shows successful detection cases.
One of ordinary skill in the art will appreciate that all or part of the steps of the methods of the above embodiments may be completed by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, which may include ROM, RAM, magnetic disks, optical discs and the like.
The embodiments provided above describe the purpose, technical solution and advantages of the present invention in further detail. It should be understood that the embodiments above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall be included within its scope of protection.
Claims (10)
1. An English text detection method with text orientation correction, characterized by comprising the following steps:
S1. performing maximally stable extremal region (MSER) detection on each channel of a sharpened version of the English text image, and extracting the maximally stable extremal regions from the image to obtain candidate text regions;
S2. building a classifier based on a convolutional neural network model and extracting features of the candidate text regions; dividing the candidate text regions into text-class regions and non-text-class regions according to those features using a softmax function; and filtering out the non-text-class regions to obtain preliminary text regions, i.e. the detected English text;
S3. grouping the preliminary text regions with a two-layer text grouping algorithm;
S4. performing orientation correction on the grouped preliminary text regions, so as to realize the correction of the English text.
2. The English text detection method with text orientation correction according to claim 1, characterized in that the channels include: the red channel, green channel, blue channel, hue channel, saturation channel, value (lightness) channel and gray channel.
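As an illustration only (not part of the claims), the seven channel maps named in claim 2 can be derived from an RGB image as follows. The patent does not give explicit formulas, so this sketch assumes the standard HSV conversion and ITU-R BT.601 gray weights:

```python
import numpy as np

def seven_channels(rgb):
    """Split an RGB image (H x W x 3, floats in [0, 1]) into the seven
    single-channel maps of claim 2: R, G, B, H, S, V and gray."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx, mn = rgb.max(axis=-1), rgb.min(axis=-1)
    delta = mx - mn
    v = mx                                            # value = max channel
    s = np.where(mx > 0, delta / np.maximum(mx, 1e-12), 0.0)
    # Hue: standard piecewise definition, scaled to [0, 1).
    h = np.zeros_like(mx)
    nz = delta > 0
    rmax = nz & (mx == r)
    gmax = nz & (mx == g) & ~rmax
    bmax = nz & ~rmax & ~gmax
    h[rmax] = ((g - b)[rmax] / delta[rmax]) % 6
    h[gmax] = (b - r)[gmax] / delta[gmax] + 2
    h[bmax] = (r - g)[bmax] / delta[bmax] + 4
    h = h / 6.0
    # Luminance-weighted gray channel (BT.601 weights, an assumed choice).
    gray = 0.299 * r + 0.587 * g + 0.114 * b
    return {'R': r, 'G': g, 'B': b, 'H': h, 'S': s, 'V': v, 'gray': gray}
```

MSER detection (step S1) would then be run independently on each of the seven returned maps.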
3. The English text detection method with text orientation correction according to claim 1, characterized in that building the classifier based on a convolutional neural network model and extracting the features of the candidate text regions comprises: obtaining a first feature of each candidate text region through the five-layer architecture of the classifier, and obtaining a second feature of each candidate text region through a cross layer; wherein the five-layer architecture consists of, connected in order, a first convolutional layer, a max-pooling layer, a second convolutional layer, a spatial pyramid pooling layer and a fully connected layer, and the cross layer connects the first convolutional layer directly to the fully connected layer.
4. The English text detection method with text orientation correction according to claim 3, characterized in that the first feature is obtained as follows: filtering the candidate text region a first time with the first convolution kernel in the first layer; max-pooling the first-time-filtered candidate text region in the second layer; filtering the max-pooled candidate text region a second time with the second convolution kernel in the third layer; applying spatial pyramid pooling to the second-time-filtered candidate text region in the fourth layer; and fully connecting the pyramid-pooled candidate text region in the fifth layer, so as to extract the first feature of the candidate text region.
5. The English text detection method with text orientation correction according to claim 3, characterized in that the second feature is obtained as follows: filtering the candidate text region a first time with the first convolution kernel; and fully connecting the filtered candidate text region according to the manually added features, so as to extract the second feature of the candidate text region.
6. The English text detection method with text orientation correction according to claim 5, characterized in that the manually added features include: aspect ratio, compactness, stroke-width-to-area ratio, local contrast and boundary key points.
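Two of the handcrafted features of claim 6 can be sketched from a binary component mask. The patent does not define them precisely, so these follow common conventions (aspect ratio as height over width, compactness as fill ratio of the bounding box), which may differ from the patent's own definitions:

```python
import numpy as np

def region_features(mask):
    """Aspect ratio and compactness of one candidate region, given a
    binary mask (1 = text pixel), under assumed common definitions."""
    ys, xs = np.nonzero(mask)
    h = ys.max() - ys.min() + 1          # bounding-box height
    w = xs.max() - xs.min() + 1          # bounding-box width
    area = int(mask.sum())               # number of text pixels
    return {'aspect_ratio': h / w,
            'compactness': area / (h * w)}
```

Stroke-width-to-area ratio, local contrast and boundary key points would be computed analogously per region and concatenated into the manually added feature vector.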
7. The English text detection method with text orientation correction according to claim 1, characterized in that grouping the preliminary text regions with the two-layer text grouping algorithm comprises vertically grouping the preliminary text regions, specifically:
obtaining the minimum Y-axis coordinate b_n of the pixels with value 255 in the n-th preliminary text region; obtaining the maximum Y-axis coordinate t_{n+1} of the pixels with value 255 in the (n+1)-th preliminary text region; obtaining the height h_{n+1} of the (n+1)-th preliminary text region;
computing the height difference d_{n,n+1} from b_n, t_{n+1} and h_{n+1}; if the height difference d_{n,n+1} is greater than the height threshold, assigning the two preliminary text regions to the same class, i.e. they belong to the same text line; if d_{n,n+1} is less than or equal to the height threshold, the two preliminary text regions are not of the same class, the (n+1)-th preliminary text region is regarded as a new text line, and the new text line is split off in the Y-axis direction.
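The vertical grouping of claim 7 can be sketched as follows (illustrative only). The machine translation garbles the exact height-difference formula and leaves the threshold direction ambiguous, so this sketch assumes a common convention: regions stay on the same text line when their bottom (baseline) coordinates differ by less than a fraction of the next region's height:

```python
def group_vertically(regions, height_threshold=0.5):
    """Group candidate regions into text lines. Each region is a
    (top_y, bottom_y) pair; regions are assumed pre-sorted. The
    normalised baseline difference used here is an assumption, not
    the patent's (garbled) formula."""
    lines, current = [], [regions[0]]
    for prev, cur in zip(regions, regions[1:]):
        h_cur = cur[1] - cur[0]
        d = abs(prev[1] - cur[1]) / h_cur   # normalised baseline gap
        if d <= height_threshold:
            current.append(cur)             # same text line
        else:
            lines.append(current)           # baseline jump: new line
            current = [cur]
    lines.append(current)
    return lines
```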
8. The English text detection method with text orientation correction according to claim 7, characterized in that grouping the preliminary text regions with the two-layer text grouping algorithm further comprises horizontally grouping the preliminary text regions, specifically:
obtaining the distance difference Δd along the X-axis between two adjacent preliminary text regions in the same text line, where Δd covers both the distance d_1 between letters within a word and the distance d_2 between words;
representing the mean width of all letters in the text line by a coefficient, and separating words according to a width threshold;
obtaining the ratio d_h of the letter spacing to the interval; if d_h is less than the width threshold, the two adjacent preliminary text regions belong to the same class, i.e. the same word; if d_h is greater than or equal to the width threshold, the two adjacent preliminary text regions do not belong to the same class, i.e. they do not belong to the same word, and the latter preliminary text region is taken as the beginning of a new word.
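A sketch of claim 8's horizontal grouping (illustrative only). The exact ratio d_h is garbled in the translation, so this stand-in uses the gap between adjacent letter boxes divided by the mean letter width, splitting a word where the ratio reaches the threshold:

```python
def split_into_words(boxes, gap_ratio_threshold=1.0):
    """Split one text line into words. Boxes are (x_left, x_right)
    pairs for letters, sorted by x. The gap-to-mean-width ratio is an
    assumed stand-in for the patent's (garbled) d_h formula."""
    mean_width = sum(r - l for l, r in boxes) / len(boxes)
    words, current = [], [boxes[0]]
    for prev, cur in zip(boxes, boxes[1:]):
        gap = cur[0] - prev[1]                 # X-axis distance
        if gap / mean_width < gap_ratio_threshold:
            current.append(cur)                # same word
        else:
            words.append(current)              # wide gap: new word begins
            current = [cur]
    words.append(current)
    return words
```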
9. The English text detection method with text orientation correction according to claim 1, characterized in that performing orientation correction on the grouped preliminary text regions to realize the correction of the English text comprises:
S401. rotating each grouped preliminary text region clockwise by α degrees using the coordinate rotation formula, with initial values i = 1 and α = -30°;
S402. filtering the erroneously introduced group boxes through a model matching process, to obtain the i-th candidate corrected text region;
S403. when i < 6, setting i = i + 1 and α = α + 10° and returning to step S401; when i = 6, overlaying the 1st through 6th candidate corrected texts to obtain the final corrected text.
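The sweep of steps S401-S403 tries six orientations, α = -30°, -20°, ..., +20°. A sketch of the coordinate rotation alone (the model-matching filter of S402 and the final overlay are omitted, as the patent does not specify them beyond the claim text):

```python
import math

def rotation_sweep(points):
    """Rotate a point set clockwise by each of the six sweep angles of
    claim 9 and return the candidates; filtering and overlaying of the
    candidates are left out of this sketch."""
    candidates = []
    for i in range(6):
        deg = -30 + 10 * i                     # alpha: -30, -20, ..., +20
        a = math.radians(deg)
        # Standard clockwise 2-D rotation about the origin.
        rotated = [(x * math.cos(a) + y * math.sin(a),
                    -x * math.sin(a) + y * math.cos(a))
                   for x, y in points]
        candidates.append((deg, rotated))
    return candidates
```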
10. The English text detection method with text orientation correction according to claim 9, characterized in that the group boxes include: inclined group boxes and long-interval group boxes; an inclined group box contains a single letter, and the letters contained in a long-interval group box are located at the two ends.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810429149.XA CN108647681B (en) | 2018-05-08 | 2018-05-08 | A kind of English text detection method with text orientation correction |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810429149.XA CN108647681B (en) | 2018-05-08 | 2018-05-08 | A kind of English text detection method with text orientation correction |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108647681A true CN108647681A (en) | 2018-10-12 |
CN108647681B CN108647681B (en) | 2019-06-14 |
Family
ID=63749675
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810429149.XA Active CN108647681B (en) | 2018-05-08 | 2018-05-08 | A kind of English text detection method with text orientation correction |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108647681B (en) |
Cited By (63)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109800735A (en) * | 2019-01-31 | 2019-05-24 | 中国人民解放军国防科技大学 | Accurate detection and segmentation method for ship target |
CN109934229A (en) * | 2019-03-28 | 2019-06-25 | 网易有道信息技术(北京)有限公司 | Image processing method, device, medium and calculating equipment |
CN110298343A (en) * | 2019-07-02 | 2019-10-01 | 哈尔滨理工大学 | A kind of hand-written blackboard writing on the blackboard recognition methods |
CN110674815A (en) * | 2019-09-29 | 2020-01-10 | 四川长虹电器股份有限公司 | Invoice image distortion correction method based on deep learning key point detection |
CN111353493A (en) * | 2020-03-31 | 2020-06-30 | 中国工商银行股份有限公司 | Text image direction correction method and device |
WO2021056255A1 (en) * | 2019-09-25 | 2021-04-01 | Apple Inc. | Text detection using global geometry estimators |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US10984798B2 (en) | 2018-06-01 | 2021-04-20 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
CN112825141A (en) * | 2019-11-21 | 2021-05-21 | 上海高德威智能交通系统有限公司 | Method and device for recognizing text, recognition equipment and storage medium |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
CN113298079A (en) * | 2021-06-28 | 2021-08-24 | 北京奇艺世纪科技有限公司 | Image processing method and device, electronic equipment and storage medium |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
WO2021196013A1 (en) * | 2020-03-31 | 2021-10-07 | 京东方科技集团股份有限公司 | Word recognition method and device, and storage medium |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
CN113837169A (en) * | 2021-09-29 | 2021-12-24 | 平安科技(深圳)有限公司 | Text data processing method and device, computer equipment and storage medium |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
CN114283431A (en) * | 2022-03-04 | 2022-04-05 | 南京安元科技有限公司 | Text detection method based on differentiable binarization |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103325099A (en) * | 2013-07-11 | 2013-09-25 | 北京智诺英特科技有限公司 | Image correcting method and device |
CN105279149A (en) * | 2015-10-21 | 2016-01-27 | 上海应用技术学院 | Chinese text automatic correction method |
CN105426887A (en) * | 2015-10-30 | 2016-03-23 | 北京奇艺世纪科技有限公司 | Method and device for text image correction |
CN105740774A (en) * | 2016-01-25 | 2016-07-06 | 浪潮软件股份有限公司 | Text region positioning method and apparatus for image |
CN105868758A (en) * | 2015-01-21 | 2016-08-17 | 阿里巴巴集团控股有限公司 | Method and device for detecting text area in image and electronic device |
CN106650725A (en) * | 2016-11-29 | 2017-05-10 | 华南理工大学 | Full convolutional neural network-based candidate text box generation and text detection method |
CN106778757A (en) * | 2016-12-12 | 2017-05-31 | 哈尔滨工业大学 | Scene text detection method based on text conspicuousness |
CN106997470A (en) * | 2017-02-28 | 2017-08-01 | 信雅达系统工程股份有限公司 | Tilt bearing calibration and the system of text image |
CN107066972A (en) * | 2017-04-17 | 2017-08-18 | 武汉理工大学 | Natural scene Method for text detection based on multichannel extremal region |
CN107992869A (en) * | 2016-10-26 | 2018-05-04 | 深圳超多维科技有限公司 | For tilting the method, apparatus and electronic equipment of word correction |
- 2018-05-08: application CN201810429149.XA filed in CN, patent/CN108647681B/en, status: active Active
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103325099A (en) * | 2013-07-11 | 2013-09-25 | 北京智诺英特科技有限公司 | Image correcting method and device |
CN105868758A (en) * | 2015-01-21 | 2016-08-17 | 阿里巴巴集团控股有限公司 | Method and device for detecting text area in image and electronic device |
CN105279149A (en) * | 2015-10-21 | 2016-01-27 | 上海应用技术学院 | Chinese text automatic correction method |
CN105426887A (en) * | 2015-10-30 | 2016-03-23 | 北京奇艺世纪科技有限公司 | Method and device for text image correction |
CN105740774A (en) * | 2016-01-25 | 2016-07-06 | 浪潮软件股份有限公司 | Text region positioning method and apparatus for image |
CN107992869A (en) * | 2016-10-26 | 2018-05-04 | 深圳超多维科技有限公司 | For tilting the method, apparatus and electronic equipment of word correction |
CN106650725A (en) * | 2016-11-29 | 2017-05-10 | 华南理工大学 | Full convolutional neural network-based candidate text box generation and text detection method |
CN106778757A (en) * | 2016-12-12 | 2017-05-31 | 哈尔滨工业大学 | Scene text detection method based on text conspicuousness |
CN106997470A (en) * | 2017-02-28 | 2017-08-01 | 信雅达系统工程股份有限公司 | Tilt bearing calibration and the system of text image |
CN107066972A (en) * | 2017-04-17 | 2017-08-18 | 武汉理工大学 | Natural scene Method for text detection based on multichannel extremal region |
Non-Patent Citations (7)
Title |
---|
BAOGUANG SHI et al.: "Detecting Oriented Text in Natural Images by Linking Segments", 《2017 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 *
JIN DAI et al.: "Scene Text Detection Based on Enhanced Multi-channels MSER and a Fast Text Grouping Process", 《2018 THE 3RD IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS》 *
KAIMING HE et al.: "Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition", 《IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE》 *
RUI ZHU et al.: "Text detection based on convolutional neural networks with spatial pyramid pooling", 《2016 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING》 *
TONG HE et al.: "Text-Attentional Convolutional Neural Network for Scene Text Detection", 《IEEE TRANSACTIONS ON IMAGE PROCESSING》 *
ZHU QIMENG: "Research and Application of Text Image Orientation Based on Character Structure Features", 《China Masters' Theses Full-text Database, Information Science and Technology》 *
LI YUBING: "Research on Visual Tracking Algorithms Based on Deep Networks", 《China Masters' Theses Full-text Database, Information Science and Technology》 *
Cited By (94)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11900936B2 (en) | 2008-10-02 | 2024-02-13 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US11862186B2 (en) | 2013-02-07 | 2024-01-02 | Apple Inc. | Voice trigger for a digital assistant |
US11636869B2 (en) | 2013-02-07 | 2023-04-25 | Apple Inc. | Voice trigger for a digital assistant |
US11557310B2 (en) | 2013-02-07 | 2023-01-17 | Apple Inc. | Voice trigger for a digital assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US11810562B2 (en) | 2014-05-30 | 2023-11-07 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11699448B2 (en) | 2014-05-30 | 2023-07-11 | Apple Inc. | Intelligent assistant for home automation |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11838579B2 (en) | 2014-06-30 | 2023-12-05 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US11842734B2 (en) | 2015-03-08 | 2023-12-12 | Apple Inc. | Virtual assistant activation |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11550542B2 (en) | 2015-09-08 | 2023-01-10 | Apple Inc. | Zero latency digital assistant |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11954405B2 (en) | 2015-09-08 | 2024-04-09 | Apple Inc. | Zero latency digital assistant |
US11809886B2 (en) | 2015-11-06 | 2023-11-07 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11657820B2 (en) | 2016-06-10 | 2023-05-23 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11749275B2 (en) | 2016-06-11 | 2023-09-05 | Apple Inc. | Application integration with a digital assistant |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11837237B2 (en) | 2017-05-12 | 2023-12-05 | Apple Inc. | User-specific acoustic models |
US11538469B2 (en) | 2017-05-12 | 2022-12-27 | Apple Inc. | Low-latency intelligent automated assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11862151B2 (en) | 2017-05-12 | 2024-01-02 | Apple Inc. | Low-latency intelligent automated assistant |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US11487364B2 (en) | 2018-05-07 | 2022-11-01 | Apple Inc. | Raise to speak |
US11907436B2 (en) | 2018-05-07 | 2024-02-20 | Apple Inc. | Raise to speak |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11900923B2 (en) | 2018-05-07 | 2024-02-13 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US11360577B2 (en) | 2018-06-01 | 2022-06-14 | Apple Inc. | Attention aware virtual assistant dismissal |
US10984798B2 (en) | 2018-06-01 | 2021-04-20 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11630525B2 (en) | 2018-06-01 | 2023-04-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
CN109800735A (en) * | 2019-01-31 | 2019-05-24 | 中国人民解放军国防科技大学 | Accurate detection and segmentation method for ship target |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
CN109934229B (en) * | 2019-03-28 | 2021-08-03 | 网易有道信息技术(北京)有限公司 | Image processing method, device, medium and computing equipment |
CN109934229A (en) * | 2019-03-28 | 2019-06-25 | 网易有道信息技术(北京)有限公司 | Image processing method, device, medium and calculating equipment |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
CN110298343A (en) * | 2019-07-02 | 2019-10-01 | 哈尔滨理工大学 | A kind of hand-written blackboard writing on the blackboard recognition methods |
WO2021056255A1 (en) * | 2019-09-25 | 2021-04-01 | Apple Inc. | Text detection using global geometry estimators |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
CN110674815A (en) * | 2019-09-29 | 2020-01-10 | 四川长虹电器股份有限公司 | Invoice image distortion correction method based on deep learning key point detection |
US11928872B2 (en) | 2019-11-21 | 2024-03-12 | Shanghai Goldway Intelligent Transportation System Co., Ltd. | Methods and apparatuses for recognizing text, recognition devices and storage media |
CN112825141B (en) * | 2019-11-21 | 2023-02-17 | 上海高德威智能交通系统有限公司 | Method and device for recognizing text, recognition equipment and storage medium |
CN112825141A (en) * | 2019-11-21 | 2021-05-21 | 上海高德威智能交通系统有限公司 | Method and device for recognizing text, recognition equipment and storage medium |
WO2021196013A1 (en) * | 2020-03-31 | 2021-10-07 | 京东方科技集团股份有限公司 | Word recognition method and device, and storage medium |
CN111353493B (en) * | 2020-03-31 | 2023-04-28 | 中国工商银行股份有限公司 | Text image direction correction method and device |
US11651604B2 (en) | 2020-03-31 | 2023-05-16 | Boe Technology Group Co., Ltd. | Word recognition method, apparatus and storage medium |
CN111353493A (en) * | 2020-03-31 | 2020-06-30 | 中国工商银行股份有限公司 | Text image direction correction method and device |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11924254B2 (en) | 2020-05-11 | 2024-03-05 | Apple Inc. | Digital assistant hardware abstraction |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
US11750962B2 (en) | 2020-07-21 | 2023-09-05 | Apple Inc. | User identification using headphones |
CN113298079A (en) * | 2021-06-28 | 2021-08-24 | 北京奇艺世纪科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN113298079B (en) * | 2021-06-28 | 2023-10-27 | 北京奇艺世纪科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN113837169B (en) * | 2021-09-29 | 2023-12-19 | 平安科技(深圳)有限公司 | Text data processing method, device, computer equipment and storage medium |
CN113837169A (en) * | 2021-09-29 | 2021-12-24 | 平安科技(深圳)有限公司 | Text data processing method and device, computer equipment and storage medium |
CN114283431A (en) * | 2022-03-04 | 2022-04-05 | 南京安元科技有限公司 | Text detection method based on differentiable binarization |
Also Published As
Publication number | Publication date |
---|---|
CN108647681B (en) | 2019-06-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108647681B (en) | A kind of English text detection method with text orientation correction | |
CN111325203B (en) | American license plate recognition method and system based on image correction | |
Qureshi et al. | A bibliography of pixel-based blind image forgery detection techniques | |
CN107491730A (en) | A kind of laboratory test report recognition methods based on image processing | |
Yin et al. | Hot region selection based on selective search and modified fuzzy C-means in remote sensing images | |
CN104408449B (en) | Intelligent mobile terminal scene literal processing method | |
CN110232387B (en) | Different-source image matching method based on KAZE-HOG algorithm | |
CN111539409B (en) | Ancient tomb question and character recognition method based on hyperspectral remote sensing technology | |
CN106529532A (en) | License plate identification system based on integral feature channels and gray projection | |
CN116071763B (en) | Teaching book intelligent correction system based on character recognition | |
CN110738216A (en) | Medicine identification method based on improved SURF algorithm | |
Hallale et al. | Twelve directional feature extraction for handwritten English character recognition | |
CN110689003A (en) | Low-illumination imaging license plate recognition method and system, computer equipment and storage medium | |
CN115311746A (en) | Off-line signature authenticity detection method based on multi-feature fusion | |
CN104899551B (en) | A kind of form image sorting technique | |
CN110084229A (en) | A kind of seal detection method, device, equipment and readable storage medium storing program for executing | |
CN109741351A (en) | A kind of classification responsive type edge detection method based on deep learning | |
CN110222660B (en) | Signature authentication method and system based on dynamic and static feature fusion | |
CN101727579A (en) | Method for detecting deformed character, method and device for determining water marking information in deformed character | |
Su et al. | Skew detection for Chinese handwriting by horizontal stroke histogram | |
CN111612045B (en) | Universal method for acquiring target detection data set | |
CN114862883A (en) | Target edge extraction method, image segmentation method and system | |
CN110555792B (en) | Image tampering blind detection method based on normalized histogram comprehensive feature vector | |
Sathisha | Bank automation system for Indian currency-a novel approach | |
Sushma et al. | Text detection in color images |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |