CN108960229A

CN108960229A - Multi-directional text detection method and device

Info

Publication number: CN108960229A
Application number: CN201810366383.2A
Authority: CN
Inventors: 王蕊; 伍蹈; 操晓春
Original assignee: Institute of Information Engineering of CAS
Current assignee: Institute of Information Engineering of CAS
Priority date: 2018-04-23
Filing date: 2018-04-23
Publication date: 2018-12-07
Anticipated expiration: 2038-04-23
Also published as: CN108960229B

Abstract

The present invention relates to a multi-directional text detection method and device. For training, without changing the network structure, each quadrilateral ground-truth box is cut into multiple rectangular strip ground-truth boxes that satisfy the input requirements of CTPN; the ratio of positive to negative samples in each training mini-batch is controlled to keep the samples balanced, and the samples are then fed into the CTPN network for training. For testing, the original image and a copy rotated by 90 degrees are both fed into the test network; the rectangular strips predicted by the network are fitted into quadrilateral candidate boxes, and the detections on the rotated test image are rotated 90 degrees counterclockwise to restore them to the coordinates of the original image. Finally, the detections from the two images are combined and filtered, for example by non-maximum suppression, to achieve accurate multi-directional text localization. The invention handles text in multiple directions, including horizontal, inclined and vertical text, with high precision.

Description

Multi-directional text detection method and device

Technical field

The invention belongs to the technical field of computer vision, and in particular relates to a text detection method and device that adapt to multiple directions and can accurately locate horizontal, inclined and vertical text in natural scenes.

Background art

Text is ubiquitous in natural scenes: traffic signs, shop billboards, posters and so on. Wherever there are traces of human activity, there is usually text. Recognizing text in natural scenes is an important part of the development of artificial intelligence. Text recognition in images (text spotting) is generally divided into two steps: text detection first locates the text in the image, and a recognition technique is then applied to the located text to obtain its content. Text detection, which locates accurate text regions against the image background, plays an important role in the whole text recognition pipeline.

Text in natural scenes appears under very complex conditions. First, the background is complicated. Unlike the plain background of a document image, a natural scene image is full of interference, such as electric wires, windows and other man-made structures, which makes text hard to separate from the background. Second, the font, color and layout of text in natural scenes vary greatly, which makes localization harder. In addition, the shooting angle often leaves the text in the image at a tilt; this differs from ordinary object detection and further raises the difficulty. Text detection in natural scenes is therefore a highly challenging task.

With the development of deep neural networks, most current natural scene text detection methods are based on deep learning. Overall, natural scene text detection methods after 2006 can be grouped into three classes. The first class is text detection based on pixel segmentation. Zhang et al. (Zhang, Zheng, et al. "Multi-oriented Text Detection with Fully Convolutional Networks." Computer Vision and Pattern Recognition, IEEE, 2016: 4159-4167.) first use an FCN (Fully Convolutional Network) to segment the text regions from the image, then use maximally stable extremal regions (MSER) to extract candidate character regions, use those regions to estimate the direction of the whole text line, and finally construct the text line. The second class is based on candidate boxes (object detection). Typical examples are TextBoxes (Liao, Minghui, et al. "TextBoxes: A Fast Text Detector with a Single Deep Neural Network." (2016)) and CTPN (Tian, Zhi, et al. "Detecting Text in Natural Image with Connectionist Text Proposal Network." European Conference on Computer Vision, Springer, Cham, 2016: 56-72.). They follow the general object detection methods SSD (Liu, Wei, et al. "SSD: Single Shot MultiBox Detector." (2015): 21-37.) and Faster R-CNN (Ren, S., et al. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks." IEEE Transactions on Pattern Analysis & Machine Intelligence 39.6 (2017): 1137-1149.) respectively, and adapt the candidate-box mechanism to the characteristics of text in order to detect it. The third class consists of hybrid methods. EAST (Zhou, Xinyu, et al. "EAST: An Efficient and Accurate Scene Text Detector." (2017)) uses multi-task learning: it segments the text regions in the image and, at the same time, predicts the geometry of the text.

Since object detection has no notion of angle, the detected candidate boxes are all rectangles. Among the methods above, the second class (based on candidate boxes, i.e., object detection) is therefore mainly used for horizontal text. However, text in natural scene images is distributed quite arbitrarily; in particular, the shooting angle produces a great deal of inclined text. In addition, in Chinese-language environments much of the Chinese text in natural scenes is laid out vertically, while existing work targets horizontal or inclined text and does not detect vertical text well. Improving existing candidate-box methods so that they handle multi-directional text detection is therefore meaningful work.

Summary of the invention

In view of the above problems, the purpose of the present invention is to extend a text detection method designed for the horizontal direction and to propose a multi-directional text detection method and device.

The invention is based on the text detection network CTPN (Connectionist Text Proposal Network), which mainly detects horizontal text lines and is widely used, especially for Chinese text. To adapt it to multi-directional text detection in natural scenes, the invention proceeds as follows. For training, without changing the network structure, each (multi-directional) quadrilateral ground-truth box is cut into multiple rectangular strip ground-truth boxes that satisfy the input requirements of CTPN. The ratio of positive to negative samples in each training mini-batch is controlled to keep the samples balanced, and the samples are then fed into the CTPN network for training. For testing, extending the method to multiple directions including the vertical direction requires turning vertical text into horizontal or inclined text before it is fed to the network. The original image and a copy rotated by 90 degrees are therefore both fed into the test network. The rectangular strips predicted by the network are fitted into quadrilateral candidate boxes, and the detections on the rotated test image are rotated 90 degrees counterclockwise to restore them to the coordinates of the original image. Finally, the detections from the two images are combined and filtered, for example by non-maximum suppression (NMS), to achieve accurate multi-directional text localization.

To achieve the above object, the technical solution adopted by the present invention is as follows:

A multi-directional text detection method, comprising the following steps:

1) Text detection network training:

1-1) Cut the quadrilateral ground-truth boxes of the training samples according to their angle information to form rectangular strip ground-truth boxes;

1-2) Control the ratio of positive to negative samples, perform random sampling, and feed the samples into the CTPN network for training to obtain the text detection network;

2) Text localization detection:

2-1) Input the image to be detected and a copy of it rotated by 90 degrees into the text detection network. The network outputs strip boxes together with a prediction score for the likelihood that each contains text. Apply non-maximum suppression to the strip boxes and select those whose prediction score is greater than a set threshold;

2-2) Merge the strip boxes selected in the images of different angles and fit them to construct quadrilateral text boxes.
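The fitting in step 2-2) can be illustrated with a short sketch. It assumes each strip box is a tuple `(x_min, y_min, x_max, y_max, score)` and uses an ordinary least-squares line fit for the top and bottom edges; the function names `fit_line` and `fit_text_line` are hypothetical, and the source does not say which fitting function is used. The box score is taken as the mean of the strip scores, as the detailed description states.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept (pure Python)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # single strip: fall back to a horizontal line
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_text_line(strips):
    """Build a quadrilateral (clockwise from top-left) from the strips of one text line."""
    centers = [(s[0] + s[2]) / 2.0 for s in strips]
    c, d = fit_line(centers, [s[1] for s in strips])  # top edge:    Y_min = c*x + d
    a, b = fit_line(centers, [s[3] for s in strips])  # bottom edge: Y_max = a*x + b
    x0 = min(s[0] for s in strips)  # leftmost x of the text line
    x1 = max(s[2] for s in strips)  # rightmost x of the text line
    quad = [(x0, c * x0 + d), (x1, c * x1 + d),
            (x1, a * x1 + b), (x0, a * x0 + b)]
    score = sum(s[4] for s in strips) / len(strips)  # mean strip score
    return quad, score
```

Evaluating the fitted lines at the smallest and largest x of the group gives the four vertices directly, so no separate rotation parameter is needed to represent an inclined text line.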

Further, step 1-1) computes the angles of the upper and lower sides of the quadrilateral ground-truth box to obtain the line equations of those two sides. At intervals of one anchor width along the x-axis (an anchor is the reference box CTPN uses to predict text positions), the y-coordinates are determined from the line equations, giving (x_min, y_min, x_max, y_max) for each strip ground-truth box, i.e., the positions of its top-left and bottom-right corners.

Further, in the training images of step 1-1), the horizontal start and end of each strip ground-truth box (i.e., x_min and x_max) always lie on multiples of 16. The horizontal ends of the quadrilateral ground-truth box that do not fall on multiples of 16 before cutting therefore have to be kept or discarded.

Further, step 1-1) applies special handling to images that contain ground-truth boxes of vertical text. If a training image consists mainly of vertical text, the training image and its quadrilateral ground-truth boxes are rotated by 90 degrees, the ground-truth boxes that are vertical in the rotated image are labeled as the "vertical" part, and cutting is then performed. If a training image consists mainly of horizontal and inclined text, the ground-truth boxes of vertical text are labeled as the "vertical" part, and cutting is then performed. This ensures the validity of the training samples.

Further, step 1-2) computes the IOU (Intersection over Union, the overlap ratio) between the strip ground-truth boxes and the anchors. Anchors whose IOU is greater than a certain threshold are selected as positive samples, and anchors whose IOU is less than a certain threshold are selected as negative samples. The positive-to-negative ratio is controlled at 1:1, and anchors corresponding to ground-truth boxes of the "vertical" part are not selected as training samples.

Further, in step 2-2), when the selected strip boxes are merged, strip boxes that pairwise satisfy the pairing conditions are merged, and the coordinates of the quadrilateral text box are obtained by fitting.
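As an illustration of the pairing conditions, the sketch below checks the three rules listed in the detailed description: horizontal distance within 32 pixels, vertical IoU in [0.7, 1], and height similarity (smaller height over larger height) in [0.7, 1]. Strips are assumed to be tuples `(x_min, y_min, x_max, y_max)`, and measuring the horizontal distance between left edges is an assumption; the names `vertical_iou`, `height_similarity` and `can_pair` are hypothetical.

```python
def vertical_iou(a, b):
    """IoU of the vertical extents [y_min, y_max] of two strips."""
    inter = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    union = (a[3] - a[1]) + (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def height_similarity(a, b):
    """Ratio of the smaller strip height to the larger strip height."""
    ha, hb = a[3] - a[1], b[3] - b[1]
    return min(ha, hb) / max(ha, hb) if max(ha, hb) > 0 else 0.0


def can_pair(a, b, max_gap=32, min_iou=0.7, min_sim=0.7):
    """True if two strips may belong to the same text line."""
    gap = abs(a[0] - b[0])  # assumption: distance between left edges
    return (gap <= max_gap
            and vertical_iou(a, b) >= min_iou
            and height_similarity(a, b) >= min_sim)
```

Strips that pass `can_pair` would then be chained into one set per text line before fitting; the chaining strategy itself is not specified in the source.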

Further, in step 2-2), the original image to be detected and the copy rotated by 90 degrees are both fed into the network, and the strip boxes predicted on each are merged into quadrilateral text boxes. The rotated image and the text boxes predicted on it are rotated 90 degrees counterclockwise to restore the orientation of the original image. The quadrilateral text boxes predicted on the two images are then combined and passed through quadrilateral non-maximum suppression (polygon-NMS) to obtain the final text localization result.

A multi-directional text detection device, comprising:

a text detection network training module, responsible for cutting the quadrilateral ground-truth boxes of the training samples according to their angle information to form rectangular strip ground-truth boxes, then controlling the ratio of positive to negative samples, performing random sampling, and feeding the samples into the CTPN network for training to obtain the text detection network;

a text localization detection module, responsible for inputting the image to be detected and a copy of it rotated by 90 degrees into the text detection network, which outputs strip boxes together with a prediction score for the likelihood that each contains text; applying non-maximum suppression to the strip boxes and selecting those whose prediction score is greater than a set threshold; and then merging the strip boxes selected in the images of different angles and fitting them to construct quadrilateral text boxes.

In summary, the present invention provides a multi-directional text detection method. Compared with the prior art, its advantage is that it handles text in multiple directions, including horizontal, inclined and vertical text, with higher precision.

Brief description of the drawings

Fig. 1 is the architecture diagram of the text detection network CTPN in one embodiment of the invention.

Fig. 2 shows the cutting of ground-truth boxes during preprocessing of the text detection training data in one embodiment of the invention.

Fig. 3 compares text detection results in one embodiment of the invention.

Specific embodiment

The invention is described clearly and completely below with reference to embodiments and the accompanying drawings.

The text detection network used in the proposed multi-directional text detection method is obtained mainly by improving CTPN. The method comprises two stages: the network training stage and the text localization detection stage.

The CTPN network is based on the RPN stage of Faster R-CNN and treats a text line as a sequence of fixed-width strip regions. Its structure is shown in Fig. 1, where VGG16 is the deep neural network model that extracts features from the image; Conv5 is the last convolutional layer of the VGG16 network; H and W denote the height and width of the feature map output by Conv5; FC denotes a fully connected layer; "3x3xC → 256D" means that 3x3xC windows of the Conv5 feature map are fed to the recurrent network Bi-LSTM (bidirectional LSTM) and converted to 256 dimensions, i.e., each time-step vector in the BLSTM is 256-dimensional; "a" denotes 2k vertical coordinates, i.e., the network outputs the vertical coordinate offsets of k anchors; "b" denotes 2k scores, i.e., the network's scores for the presence of text in the k anchors; "c" denotes k side-refinement values, i.e., the offsets between the k anchors and the text boundary.

The anchors in this network are strips of fixed width (equal to the network stride, 16 pixels) and varying height. Only the vertical direction of the anchor boxes is regressed; the smooth L1 regression loss in the network takes the two y-axis coordinates as its parameters. An RNN layer (BLSTM) is added to the CTPN network to analyze the bidirectional context of the image along the horizontal direction. CTPN has three loss functions in total: classification, vertical regression, and boundary regression. Boundary regression regresses the horizontal boundaries of the annotated ground-truth box along the x-axis to determine the left and right sides of the predicted box. Note that the specific embodiment of the invention does not use the boundary regression loss.

I. Network training stage:

1. Preprocessing of training samples

CTPN mainly targets horizontal text. The sample coordinates it processes in training are the four coordinates (x_min, y_min, x_max, y_max) of a rectangular ground-truth box, which is then divided into strip ground-truth boxes of the same width as an anchor box. For multi-directional text, the coordinates of a training sample are those of a quadrilateral ground-truth box, (x1, y1, x2, y2, x3, y3, x4, y4), listed clockwise starting from the top-left corner, and the text may be horizontal, inclined or vertical.

The terms horizontal, inclined and vertical are defined here as follows. Horizontal: the text ground-truth box lies along the horizontal direction. Inclined: the text ground-truth box makes an angle in [-π/4, π/4] with the x-axis. Vertical: the text ground-truth box makes an angle in [-π/2, -π/4) or (π/4, π/2] with the x-axis, or its height-to-width ratio is greater than 1.2.

Simply taking the minimum rectangle that covers the quadrilateral ground-truth box would include many non-text regions in the training samples, lowering sample quality and harming network training. The invention therefore cuts the ground-truth box of multi-directional text into strip ground-truth boxes of the same width as an anchor, as shown in panels (a), (b), (c) and (d) of Fig. 2, so that the ground-truth boxes fit the text closely and the samples are reliable.

Cutting the quadrilateral ground-truth box includes special handling of vertical ground-truth boxes and boundary handling. The specific steps are as follows:

1) Label vertical ground-truth boxes

a) From the quadrilateral ground-truth box coordinates (x1, y1, x2, y2, x3, y3, x4, y4), compute the minimum rectangle covering the quadrilateral to obtain x_min, y_min, x_max, y_max, and then compute width = x_max - x_min and height = y_max - y_min. If the ratio of height to width is greater than 1.2, the box is considered a vertical ground-truth box and is labeled vertical=1; the default is vertical=0.

b) From the quadrilateral ground-truth box coordinates (x1, y1, x2, y2, x3, y3, x4, y4), the two points [(x1, y1), (x2, y2)] determine the line equation line1, and the two points [(x4, y4), (x3, y3)] determine the line equation line3. line1 and line3 are the 1st and 3rd sides of the quadrilateral ground-truth box, counted clockwise from the top-left corner. The equations are:

line1 = k₁×x+b₁

line3 = k₃×x+b₃

If the slopes of line1 and line3 are both less than -1 or greater than 1, i.e., |k₁| > 1 and |k₃| > 1, the box is considered a vertical ground-truth box and is labeled vertical=1.
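The two labeling rules can be sketched as follows. This is only an illustration: the function name `is_vertical` is hypothetical, the quadrilateral is assumed to be a list of four `(x, y)` points in clockwise order from the top-left corner, and rule b) is read as "both the top and bottom edges are steeper than 45 degrees", which matches the angle-based definition of vertical text given earlier. Comparing |dy| with |dx| avoids dividing by zero for edges that are exactly vertical.

```python
def is_vertical(quad, ratio_threshold=1.2):
    """Return True if a quadrilateral ground-truth box should get vertical=1."""
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    # Rule a): the covering rectangle is more than 1.2 times taller than wide.
    if width == 0 or height / width > ratio_threshold:
        return True

    def steep(p, q):
        # |slope| > 1 written without division.
        return abs(q[1] - p[1]) > abs(q[0] - p[0])

    p1, p2, p3, p4 = quad
    # Rule b): line1 (p1 -> p2) and line3 (p4 -> p3) are both steep.
    return steep(p1, p2) and steep(p4, p3)
```

A box that satisfies either rule is labeled vertical, so a narrow upright box and a strongly rotated box are both excluded from strip cutting in their original orientation.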

2) Handle images that contain vertical ground-truth boxes

If the proportion of ground-truth boxes labeled vertical in a training image is greater than a threshold γ, the training image and its ground-truth boxes are rotated by 90 degrees. All horizontal and inclined ground-truth boxes then become vertical ones, and vice versa, so the vertical label of every ground-truth box is negated and the line equations of line1 and line3 are recomputed. The threshold γ is set to 50%.

3) Find the cutting start position of the ground-truth box

To keep the x-axis coordinates of the strip ground-truth boxes consistent with the anchor boxes, the cutting boundaries of the ground-truth box are also placed on multiples of the anchor width (width_a). The boundaries of the ground-truth box along the x-axis are set as follows.

Here x1, x2, x3, x4 are the x-coordinates of the four vertices of the quadrilateral ground-truth box, and width_a is the anchor width. x_left is the start position of the cut on the left: the nearest multiple of width_a to the right of the midpoint, along the x-axis, of the top-left and bottom-left corners of the ground-truth box. Correspondingly, x_right is the end position of the cut on the right: the nearest multiple of width_a to the left of the midpoint, along the x-axis, of the top-right and bottom-right corners of the ground-truth box.
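The boundary rule described in words above can be sketched in a few lines. Reading "nearest multiple to the right" as rounding up and "nearest multiple to the left" as rounding down is an interpretation, and the function name `cut_bounds` is hypothetical; the quadrilateral is assumed to be four `(x, y)` points in clockwise order from the top-left corner.

```python
import math


def cut_bounds(quad, anchor_width=16):
    """Return (x_left, x_right), both multiples of anchor_width."""
    (x1, _), (x2, _), (x3, _), (x4, _) = quad
    left_mid = (x1 + x4) / 2.0   # midpoint of top-left and bottom-left x
    right_mid = (x2 + x3) / 2.0  # midpoint of top-right and bottom-right x
    x_left = math.ceil(left_mid / anchor_width) * anchor_width
    x_right = math.floor(right_mid / anchor_width) * anchor_width
    return x_left, x_right
```

For example, a box whose left midpoint is at x = 11 and right midpoint at x = 101 would be cut between x = 16 and x = 96 with a 16-pixel anchor width.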

4) Cut the strip ground-truth boxes

The coordinates (gt_xmin, gt_ymin, gt_xmax, gt_ymax) of each strip ground-truth box are determined from the line equations of line1 and line3 and the coordinates on the x-axis. The distance between gt_xmax and gt_xmin is the anchor width width_a, computed as gt_xmax = gt_xmin + width_a - 1, with gt_xmax ∈ [x_left, x_right). gt_xmin takes values starting at the cutting start position x_left, at intervals of width_a, up to the cutting end position x_right. gt_ymin and gt_ymax are obtained from the equations of line1 and line3, evaluated at the x-coordinate of the center of the strip ground-truth box.
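A minimal sketch of this cutting step is given below. It assumes the quadrilateral is four `(x, y)` points in clockwise order from the top-left corner, that its top and bottom edges are not exactly vertical, and that `x_left` and `x_right` are integer multiples of the anchor width; the names `line_through` and `cut_strips` are hypothetical.

```python
def line_through(p, q):
    """Slope k and intercept b of the line y = k*x + b through p and q."""
    (xa, ya), (xb, yb) = p, q
    k = (yb - ya) / (xb - xa)  # assumes the edge is not vertical
    return k, ya - k * xa


def cut_strips(quad, x_left, x_right, anchor_width=16):
    """Cut a quadrilateral into anchor-width strips (gt_xmin, gt_ymin, gt_xmax, gt_ymax)."""
    p1, p2, p3, p4 = quad
    k1, b1 = line_through(p1, p2)  # line1: top edge
    k3, b3 = line_through(p4, p3)  # line3: bottom edge
    strips = []
    for gt_xmin in range(x_left, x_right, anchor_width):
        gt_xmax = gt_xmin + anchor_width - 1
        xc = (gt_xmin + gt_xmax) / 2.0  # evaluate both lines at the strip center
        strips.append((gt_xmin, k1 * xc + b1, gt_xmax, k3 * xc + b3))
    return strips
```

Because each strip takes its top and bottom from the edge lines at its own center, the strips follow an inclined text line in a staircase pattern instead of covering the whole bounding rectangle.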

5) Boundary handling

In the horizontal direction, a residual part φ may remain between the boundary of the quadrilateral ground-truth box and the cutting start position, because the horizontal start coordinate of a quadrilateral ground-truth box is not always a multiple of the anchor width. To keep sample quality high, the invention either keeps or discards φ. The rule is: if φ > width_a/2, the residual part is kept as a strip ground-truth box; otherwise it is discarded.

2. Processing of training samples

The positive-to-negative ratio of the training samples is controlled at 1:1, and a training mini-batch contains at most 256 samples. When there are fewer than 128 positive samples, the number of negative samples is reduced so that the ratio stays at 1:1.

Anchor boxes corresponding to vertical ground-truth boxes, i.e., ground-truth boxes with vertical=1, are not used as training samples.
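The sampling rule can be sketched as below. The sketch assumes the positive and negative anchors are given as lists of anchor ids from which anchors matched to vertical=1 boxes have already been removed; the name `sample_minibatch` is hypothetical, and also capping the positives when negatives are scarce is an added assumption to keep the ratio at 1:1.

```python
import random


def sample_minibatch(positive_ids, negative_ids, batch_size=256, seed=None):
    """Randomly draw an equal number of positive and negative anchors."""
    rng = random.Random(seed)
    half = batch_size // 2  # at most 128 positives and 128 negatives
    # Fewer than 128 positives shrinks the negatives too, keeping 1:1.
    n = min(len(positive_ids), len(negative_ids), half)
    return rng.sample(positive_ids, n), rng.sample(negative_ids, n)
```

With 50 positive anchors available, for instance, the mini-batch would hold 50 positives and 50 negatives rather than being padded with extra negatives.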

II. Network test stage:

In the test stage, the predicted strip candidate boxes have to be screened and assembled into text boxes. Multi-directional text involves several cases, but the extended CTPN of the invention essentially handles horizontal and inclined text. To address this, images at two angles are input at test time: the original image and a copy rotated by 90 degrees. The original image may contain horizontal, inclined and vertical text. After the 90-degree rotation, text that was vertical becomes horizontal or inclined, and text that was horizontal or inclined becomes vertical. The invention detects the horizontal and inclined text in the original image and the horizontal and inclined text in the rotated image, so that the detection results cover multi-directional text in the original image, including horizontal, inclined and vertical text.
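Restoring detections from the rotated image to the original coordinates is a simple index mapping. The sketch below assumes the test image is rotated 90 degrees clockwise (so that restoring requires a counterclockwise rotation), that coordinates are integer pixel indices with the origin at the top-left corner, and that `orig_height` is the height of the original image; the function names are hypothetical.

```python
def rotate_point_cw(x, y, orig_height):
    """Map a pixel of the original image into the clockwise-rotated image."""
    return orig_height - 1 - y, x


def restore_point(x_rot, y_rot, orig_height):
    """Inverse mapping: rotated-image pixel back to the original image."""
    return y_rot, orig_height - 1 - x_rot


def restore_quad(quad_rot, orig_height):
    """Restore every vertex of a quadrilateral detected on the rotated image."""
    return [restore_point(x, y, orig_height) for x, y in quad_rot]
```

After restoration the first vertex of a quadrilateral is no longer its top-left corner, so any later step that relies on vertex order would need to reorder the points.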

The specific steps are:

1. Feed the original test image and the copy rotated by 90 degrees into the network together, and predict the strip candidate boxes for each.

2. Screen the strip candidate boxes predicted on the original image and on the rotated image separately. A box whose output score is greater than a threshold, typically 0.7, is a text candidate box; NMS is then applied.

3. Put the text candidate boxes selected for the same text line into the set S_k, where k denotes the k-th text line. The rules are: 1) the two strip text candidate boxes are within 32 pixels of each other horizontally; 2) the vertical IOU of the two strip candidate boxes lies in [0.7, 1]; 3) the similarity of the two strip candidate boxes, i.e., the ratio of the smaller box height to the larger box height, lies in [0.7, 1].

4. Use a fitting function on the values yi_min and yi_max, i ∈ S_k, of the text candidate boxes in the set S_k of the same text line to fit the equations of the 1st and 3rd sides, counted clockwise from the top-left corner: Y_min = cx+d and Y_max = ax+b. Substitute the minimum and maximum x-values of the candidate boxes in S_k into Y_min and Y_max to determine the four vertex coordinates of the quadrilateral text line.

5. The score of a quadrilateral text prediction box is the mean score of the strip boxes that compose it.

6. After strip rectangles have been detected on the original test image and on the rotated copy, apply quadrilateral non-maximum suppression (polygon-NMS) to the constructed quadrilateral text prediction boxes to obtain the final text detection result. Polygon-NMS differs slightly from the earlier NMS: the IoU computed here is that of the quadrilateral areas.
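One way to compute the IoU of two quadrilaterals without external libraries is to clip one polygon against the other. The sketch below uses Sutherland-Hodgman clipping, which assumes both quadrilaterals are convex; this is a stand-in technique, since the source does not say how the quadrilateral IoU is computed. The function names and the 0.5 suppression threshold are hypothetical.

```python
def _signed_area(poly):
    """Shoelace formula; the sign encodes the vertex orientation."""
    s = 0.0
    for i in range(len(poly)):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % len(poly)]
        s += x0 * y1 - x1 * y0
    return s / 2.0


def _oriented(poly):
    """Return the polygon with positive signed area."""
    return list(poly) if _signed_area(poly) >= 0 else list(reversed(poly))


def _clip(subject, clipper):
    """Sutherland-Hodgman clipping of `subject` by a convex `clipper`."""
    output = list(subject)
    for i in range(len(clipper)):
        if not output:
            break
        ax, ay = clipper[i]
        bx, by = clipper[(i + 1) % len(clipper)]

        def side(p):  # >= 0 means p is on the inner side of edge a -> b
            return (bx - ax) * (p[1] - ay) - (by - ay) * (p[0] - ax)

        inp, output = output, []
        for j in range(len(inp)):
            cur, prev = inp[j], inp[j - 1]
            s_cur, s_prev = side(cur), side(prev)
            if (s_cur >= 0) != (s_prev >= 0):  # edge prev -> cur crosses a -> b
                t = s_prev / (s_prev - s_cur)
                output.append((prev[0] + t * (cur[0] - prev[0]),
                               prev[1] + t * (cur[1] - prev[1])))
            if s_cur >= 0:
                output.append(cur)
    return output


def polygon_iou(p, q):
    """IoU of two convex polygons given as lists of (x, y) vertices."""
    p, q = _oriented(p), _oriented(q)
    inter_poly = _clip(p, q)
    inter = abs(_signed_area(inter_poly)) if len(inter_poly) >= 3 else 0.0
    union = abs(_signed_area(p)) + abs(_signed_area(q)) - inter
    return inter / union if union > 0 else 0.0


def polygon_nms(quads, scores, iou_threshold=0.5):
    """Greedy NMS: keep boxes in score order, drop those overlapping a kept box."""
    order = sorted(range(len(quads)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(polygon_iou(quads[i], quads[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

In this setting `quads` would hold the quadrilaterals from the original image together with the restored quadrilaterals from the rotated image, so that duplicate detections of the same text line are merged into one.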

The method of the invention adapts to multi-directional text, including horizontal, inclined and vertical text. To verify its actual technical effect, the natural scene text detection method described in the preceding embodiment was implemented in an experiment. The test environment and experimental results are as follows:

(1) Test environment:

System environment: Ubuntu 14.04;

Hardware environment: memory: 64 GB, GPU: K40, hard disk: 1 TB;

(2) Experimental data:

Training data:

The 800,000 SynthText training images (synthetic text images) were used for 1 epoch of pre-training.

RCTW2017 training set (8034)

Test data: RCTW2017 test set (4229)

Evaluation method: a detection with IOU > 0.5 counts as a successful match

(3) Experimental results:

Fig. 3 compares text detection results. The upper row, panels (a), (b) and (c), shows the detection results of CTPN; the lower row, panels (d), (e) and (f), shows the detection results of the invention. Compared with the original CTPN network, the method of the invention clearly locates inclined and vertical text more accurately. Table 1 gives the evaluation results on the RCTW2017 test set. Although the dataset is fairly difficult, the method achieves high precision.

Table 1. Evaluation results on the RCTW2017 test set

Dataset	Precision	Recall	F-measure
RCTW2017	0.791453	0.441691	0.56697
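The F-measure in Table 1 can be checked from the reported precision and recall, assuming it is the usual harmonic mean of the two (the source does not define it explicitly). The function name `f_measure` is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Values reported in Table 1 for the RCTW2017 test set.
value = f_measure(0.791453, 0.441691)
print(round(value, 5))
```

Under that assumption the computed value agrees with the reported F-measure to about five decimal places.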

The above embodiments merely illustrate the technical solution of the present invention and do not limit it. A person of ordinary skill in the art may modify the technical solution or replace it with equivalents without departing from the spirit and scope of the invention. The scope of protection of the invention is defined by the claims.

Claims

1. A multi-directional text detection method, characterized by comprising the following steps:

1) cutting the quadrilateral ground-truth boxes of training samples according to their angle information to form rectangular strip ground-truth boxes;

2) controlling the ratio of positive to negative samples, performing random sampling, and feeding the samples into a CTPN network for training to obtain a text detection network;

3) inputting an image to be detected and a copy of the image rotated by 90 degrees into the text detection network, the text detection network outputting strip boxes together with a prediction score for the likelihood that each contains text; applying non-maximum suppression to the strip boxes and selecting those whose prediction score is greater than a set threshold;

4) merging the strip boxes selected in the images of different angles and fitting them to construct quadrilateral text boxes.

2. The method of claim 1, characterized in that step 1) computes the angles of the upper and lower sides of the quadrilateral ground-truth box to obtain the line equations of those two sides and, at intervals of one anchor width along the x-axis, determines the y-coordinates from the line equations, giving (x_min, y_min, x_max, y_max) for each strip ground-truth box, i.e., the positions of its top-left and bottom-right corners.

3. The method of claim 1, characterized in that, in the training images of step 1), the horizontal start and end of each strip ground-truth box always lie on multiples of 16, and the horizontal ends of the quadrilateral ground-truth box that do not fall on multiples of 16 before cutting are kept or discarded.

4. The method of claim 1, characterized in that step 1) applies special handling to images that contain ground-truth boxes of vertical text: if a training image consists mainly of vertical text, the training image and its quadrilateral ground-truth boxes are rotated by 90 degrees, the ground-truth boxes that are vertical in the rotated image are labeled as the "vertical" part, and cutting is then performed; if a training image consists mainly of horizontal and inclined text, the ground-truth boxes of vertical text are labeled as the "vertical" part, and cutting is then performed, thereby ensuring the validity of the training samples.

5. The method of claim 1, characterized in that step 2) computes the IOU between the strip ground-truth boxes and the anchors; anchors whose IOU is greater than a certain threshold are selected as positive samples and anchors whose IOU is less than a certain threshold are selected as negative samples; the positive-to-negative ratio is controlled at 1:1; and anchors corresponding to ground-truth boxes of the "vertical" part are not selected as training samples.

6. The method of claim 1, characterized in that, when the selected strip boxes are merged in step 4), strip boxes that pairwise satisfy the pairing conditions are merged, and the coordinates of the quadrilateral text box are obtained by fitting.

7. The method of claim 1, characterized in that step 4) feeds the original image to be detected and the copy rotated by 90 degrees into the network together and merges the strip boxes predicted on each into quadrilateral text boxes; the rotated image and the text boxes predicted on it are rotated 90 degrees counterclockwise to restore the orientation of the original image; and the quadrilateral text boxes predicted on the two images are combined and passed through quadrilateral non-maximum suppression to obtain the final text localization result.

8. A multi-directional text detection device, characterized by comprising:

a text detection network training module, responsible for cutting the quadrilateral ground-truth boxes of training samples according to their angle information to form rectangular strip ground-truth boxes, then controlling the ratio of positive to negative samples, performing random sampling, and feeding the samples into a CTPN network for training to obtain a text detection network;

a text localization detection module, responsible for inputting an image to be detected and a copy of the image rotated by 90 degrees into the text detection network, the text detection network outputting strip boxes together with a prediction score for the likelihood that each contains text; applying non-maximum suppression to the strip boxes and selecting those whose prediction score is greater than a set threshold; and then merging the strip boxes selected in the images of different angles and fitting them to construct quadrilateral text boxes.

9. The device of claim 8, characterized in that the text detection network training module applies special handling to images that contain ground-truth boxes of vertical text: if a training image consists mainly of vertical text, the training image and its quadrilateral ground-truth boxes are rotated by 90 degrees, the ground-truth boxes that are vertical in the rotated image are labeled as the "vertical" part, and cutting is then performed; if a training image consists mainly of horizontal and inclined text, the ground-truth boxes of vertical text are labeled as the "vertical" part, and cutting is then performed, thereby ensuring the validity of the training samples.

10. The device of claim 8, characterized in that the text localization detection module feeds the original image to be detected and the copy rotated by 90 degrees into the network together and merges the strip boxes predicted on each into quadrilateral text boxes; the rotated image and the text boxes predicted on it are rotated 90 degrees counterclockwise to restore the orientation of the original image; and the quadrilateral text boxes predicted on the two images are combined and passed through quadrilateral non-maximum suppression to obtain the final text localization result.