CN108960229A - Multi-directional text detection method and device - Google Patents

Multi-directional text detection method and device

Info

Publication number
CN108960229A
Authority
CN
China
Prior art keywords
text
picture
frame
true value
quadrangle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810366383.2A
Other languages
Chinese (zh)
Other versions
CN108960229B (en)
Inventor
王蕊
伍蹈
操晓春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201810366383.2A
Publication of CN108960229A
Application granted
Publication of CN108960229B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/24: Aligning, centring, orientation detection or correction of the image
    • G06V10/243: Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Character Input (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a multi-directional text detection method and device. For training, without changing the network structure, each quadrilateral ground-truth box is cut into multiple rectangular strip ground-truth boxes that satisfy the input requirements of CTPN; the ratio of positive to negative samples in each training mini-batch is controlled to keep them balanced, and the samples are then fed into the CTPN network for training. For testing, the original image and a copy rotated by 90 degrees are both fed into the test network; the rectangular strip regions predicted by the network are fitted to form quadrilateral candidate boxes, and the detection results for the rotated image are rotated 90 degrees counterclockwise to restore them to the coordinates of the original image. Finally, the detection results of the two images are combined and filtered, for example by non-maximum suppression, to achieve accurate multi-directional text localization. The invention adapts to text in multiple directions, including horizontal, inclined and vertical text, and achieves high precision.

Description

Multi-directional text detection method and device
Technical field
The invention belongs to the technical field of computer vision, and specifically relates to a text detection method and device that adapt to multiple directions and can accurately locate horizontal, inclined and vertical text in natural scenes.
Background
Text is ubiquitous in natural scenes, for example on traffic signs, shop billboards and posters; wherever there are traces of human activity, text is usually present. Recognizing text in natural scenes is an important part of the development of artificial intelligence. Text recognition in images (text spotting) is generally divided into two steps: text detection first locates the text in the image, and recognition techniques are then applied to the located text to obtain its content. Text detection, which locates accurate text regions against the image background, plays an important role in the whole text recognition pipeline.
Text in natural scenes appears under very complex conditions. First, the background is highly complex: unlike the plain background of a document image, images of natural scenes are full of interference, such as wires, windows and other man-made objects, which makes text hard to extract from the background. Second, the font, color and layout of text in natural scenes vary greatly, which makes localization more difficult. In addition, because of the shooting angle, text in natural-scene images is often tilted; this differs from general object detection and further increases the difficulty. Text detection in natural scenes is therefore a highly challenging task.
With the development of deep learning with neural networks, most current natural-scene text detection methods are implemented with deep learning. Overall, natural-scene text detection methods after 2006 can be grouped into three classes. The first class is based on pixel segmentation. Zhang et al. (Zhang, Zheng, et al. "Multi-oriented Text Detection with Fully Convolutional Networks." Computer Vision and Pattern Recognition, IEEE, 2016: 4159-4167.) first use an FCN (Fully Convolutional Network) to segment the text regions from the image, then extract candidate character regions with the maximally stable extremal regions method (MSER), estimate the direction of the whole text line from the candidate character regions, and finally construct the text line. The second class is based on candidate boxes (object detection). Typical examples are TextBoxes (Liao, Minghui, et al. "TextBoxes: A Fast Text Detector with a Single Deep Neural Network." (2016)) and CTPN (Tian, Zhi, et al. "Detecting Text in Natural Image with Connectionist Text Proposal Network." European Conference on Computer Vision, Springer, Cham, 2016: 56-72.), which follow the general object detection methods SSD (Liu, Wei, et al. "SSD: Single Shot MultiBox Detector." (2015): 21-37.) and Faster R-CNN (Ren, S., et al. "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks." IEEE Transactions on Pattern Analysis & Machine Intelligence 39.6 (2017): 1137-1149.) respectively and adapt the candidate-box approach to the characteristics of text in order to detect it. The third class consists of hybrid methods. EAST (Zhou, Xinyu, et al. "EAST: An Efficient and Accurate Scene Text Detector." (2017)) is a multi-task learning method that segments the text regions in the image and also predicts the geometry of the text.
Because object detection has no notion of angle, the detected candidate boxes are all rectangles. Among the natural-scene text detection methods above, the second class, based on candidate boxes (object detection), is therefore used mainly for horizontal text. However, the distribution of text in natural-scene images is quite arbitrary, and much of it is tilted because of the shooting angle. Moreover, in Chinese-language environments many Chinese texts in natural scenes are laid out vertically, whereas existing research mostly targets horizontal or inclined text and does not detect vertical text well. Improving an existing candidate-box (object detection) method so that it adapts to multi-directional text detection is therefore worthwhile work.
Summary of the invention
In view of the above problems, the purpose of the present invention is to extend a text detection method designed for horizontal text and to propose a multi-directional text detection method and device.
The invention is based on the text detection network CTPN (Connectionist Text Proposal Network), which mainly detects horizontal text lines and is widely used, especially for detecting Chinese text. On this basis, the invention adapts it to multi-directional text detection in natural scenes. For training, without changing the network structure, the (multi-directional) quadrilateral ground-truth boxes are cut into multiple rectangular strip ground-truth boxes that satisfy the input requirements of CTPN. The ratio of positive to negative samples in each training mini-batch is controlled to keep them balanced, and the samples are fed into the CTPN network for training. For testing, extending the method to multiple directions including the vertical direction requires that vertical text be turned into horizontal or inclined text before it is fed to the network. The original image and a copy rotated by 90 degrees are therefore both fed into the test network. The rectangular strip regions predicted by the network are fitted to form quadrilateral candidate boxes, and the detection results for the rotated test image are rotated 90 degrees counterclockwise to restore them to the coordinates of the original image. Finally, the detection results of the two images are combined and filtered, for example by non-maximum suppression (NMS), to achieve accurate multi-directional text localization.
To achieve the above purpose, the invention adopts the following technical solution:
A multi-directional text detection method, comprising the following steps:
1) Text detection network training:
1-1) Cut the quadrilateral ground-truth boxes of the training samples according to their angle information to form rectangular strip ground-truth boxes;
1-2) Control the ratio of positive to negative samples, perform random sampling, and feed the samples into the CTPN network for training to obtain the text detection network;
2) Text localization and detection:
2-1) Input the image to be detected and a copy of it rotated by 90 degrees into the text detection network; the network outputs strip boxes together with prediction scores indicating how likely each is to contain text; apply non-maximum suppression to the resulting strip boxes and select those whose prediction score is greater than a set threshold;
2-2) Merge the strip boxes selected in the images of the different angles, and fit them to construct quadrilateral text boxes.
Further, step 1-1) finds the angles of the upper and lower sides of the quadrilateral ground-truth box and obtains the line equations of these two sides; at intervals of one anchor width along the x-axis (an anchor is the reference box that CTPN uses to predict text positions), the line equations are used to determine the y-coordinates, giving (xmin, ymin, xmax, ymax) for each strip ground-truth box, that is, the positions of its upper-left and lower-right corners.
Further, in the training images of step 1-1), the start and end of each strip ground-truth box in the horizontal direction (xmin and xmax) always lie at multiples of 16. The left and right ends of the quadrilateral ground-truth box that do not fall on multiples of 16 before cutting therefore have to be kept or discarded.
Further, step 1-1) applies special handling to images containing ground-truth boxes of vertical text. If a training image consists mainly of vertical text, the training image and its quadrilateral ground-truth boxes are rotated by 90 degrees, the ground-truth boxes that are vertical in the rotated image are marked as "vertical", and cutting is then performed. If a training image consists mainly of horizontal and inclined text, the ground-truth boxes of vertical text are marked as "vertical" and cutting is then performed. This ensures the validity of the training samples.
Further, step 1-2) computes the IoU (Intersection over Union, the overlap ratio) between the strip ground-truth boxes and the anchors. Anchors whose IoU is greater than a certain threshold are selected as positive samples and those below a certain threshold as negative samples; the ratio of positive to negative samples is kept at 1:1, and anchors corresponding to ground-truth boxes marked "vertical" are not selected as training samples.
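The anchor labelling rule can be sketched as follows. This is a minimal illustration, not the patented implementation: the text only says "a certain threshold", so the defaults `pos_thresh=0.7` and `neg_thresh=0.5` are assumed values, the function names are hypothetical, and treating an anchor as "corresponding to" a vertical box when that box is its best match above the positive threshold is also an assumption.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes (xmin, ymin, xmax, ymax), inclusive pixel coordinates."""
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    iw = min(ax1, bx1) - max(ax0, bx0) + 1
    ih = min(ay1, by1) - max(ay0, by0) + 1
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (ax1 - ax0 + 1) * (ay1 - ay0 + 1)
    area_b = (bx1 - bx0 + 1) * (by1 - by0 + 1)
    return inter / (area_a + area_b - inter)


def label_anchor(anchor, strips, pos_thresh=0.7, neg_thresh=0.5):
    """Return 1 (positive), 0 (negative) or -1 (left out of training) for one anchor.

    strips is a list of (box, is_vertical) pairs; the thresholds are assumed values.
    """
    best_iou, best_is_vertical = 0.0, False
    for box, is_vertical in strips:
        value = iou(anchor, box)
        if value > best_iou:
            best_iou, best_is_vertical = value, is_vertical
    if best_iou > pos_thresh:
        # Anchors matched to a "vertical" box are not used as samples.
        return -1 if best_is_vertical else 1
    if best_iou < neg_thresh:
        return 0
    return -1  # between the two thresholds: neither positive nor negative
```

The 1:1 ratio is then enforced when the mini-batch is sampled, not at labelling time.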
Further, in step 2-2), when the selected strip boxes are merged, strip boxes that pairwise satisfy the pairing conditions are merged, and the coordinates of the quadrilateral text box are obtained by fitting.
Further, in step 2-2), for an image to be detected, the original image and the copy rotated by 90 degrees are both fed into the network, and the strip boxes predicted for each are merged into quadrilateral text boxes. The rotated image and the text boxes predicted on it are rotated 90 degrees counterclockwise to restore the original orientation. Quadrilateral non-maximum suppression (polygon-NMS) is then applied jointly to the quadrilateral text boxes predicted on the two images to obtain the final text localization result.
A multi-directional text detection device, comprising:
A text detection network training module, responsible for cutting the quadrilateral ground-truth boxes of the training samples according to their angle information to form rectangular strip ground-truth boxes, then controlling the ratio of positive to negative samples, performing random sampling, and feeding the samples into the CTPN network for training to obtain the text detection network;
A text localization and detection module, responsible for inputting the image to be detected and a copy of it rotated by 90 degrees into the text detection network, which outputs strip boxes together with prediction scores indicating how likely each is to contain text; applying non-maximum suppression to the resulting strip boxes and selecting those whose prediction score is greater than a set threshold; and then merging the strip boxes selected in the images of the different angles and fitting them to construct quadrilateral text boxes.
In summary, the invention provides a multi-directional text detection method. Compared with the prior art, its advantage is that it adapts to text in multiple directions, including horizontal, inclined and vertical text, and achieves higher precision.
Brief description of the drawings
Fig. 1 is the architecture diagram of the text detection network CTPN in one embodiment of the invention.
Fig. 2 illustrates the cutting performed when preprocessing the text detection training data in one embodiment of the invention.
Fig. 3 compares text detection results in one embodiment of the invention.
Detailed description of the embodiments
The invention is described clearly and completely below with reference to the embodiments and the drawings.
The text detection network in the multi-directional text detection method proposed by the invention is mainly an improvement of CTPN. The method comprises two stages: the network training stage and the text localization and detection stage.
CTPN is based on the RPN stage of Faster R-CNN and treats a text line as a sequence of fixed-width strip regions; the network structure is shown in Fig. 1. VGG16 is a deep neural network model that extracts features from the image; Conv5 is the last convolutional layer of the VGG16 network; H and W denote the height and width of the feature map output by Conv5; FC denotes the fully connected layer; "3x3xC → 256D" indicates that each 3x3xC window of the Conv5 feature map is fed to the recurrent neural network Bi-LSTM (bidirectional LSTM, BLSTM) and converted to 256 dimensions, that is, each time-step vector in the BLSTM is 256-dimensional; "a" denotes 2k vertical coordinates, that is, the network outputs the vertical offset coordinates of k anchors; "b" denotes 2k scores, that is, the network's scores for the presence of text in the k anchors; "c" denotes k side-refinement values, that is, the offsets between the k anchors and the text boundary.
The anchors in the network have a fixed width (equal to the network stride, 16 pixels) and varying heights, giving strip shapes. Only the vertical direction of the anchor boxes is regressed: the smooth L1 regression loss in the network takes the two coordinates in the y-axis direction as its parameters. An RNN layer (BLSTM) is added to CTPN to analyze the context of the image in both horizontal directions. CTPN has three loss functions in total: text/non-text classification, vertical regression and side (boundary) regression. Side regression regresses the horizontal boundaries of the annotated ground-truth box along the x-axis to determine the left and right sides of the predicted box. Note that the specific embodiment of the invention does not use the side regression loss.
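For reference, the two y-axis quantities that the vertical regression predicts can be written in the parameterization used in the CTPN paper (Tian et al., 2016, cited in the background section). The symbols below come from that paper and are not defined in this description: $c_y$ and $h$ are the center y-coordinate and height of the predicted box, $c_y^a$ and $h^a$ are those of the anchor, and starred quantities refer to the ground truth.

```latex
v_c = \frac{c_y - c_y^a}{h^a}, \qquad v_h = \log\frac{h}{h^a}, \qquad
v_c^* = \frac{c_y^* - c_y^a}{h^a}, \qquad v_h^* = \log\frac{h^*}{h^a}
```

Because the anchor width is fixed at 16 pixels, no horizontal coordinate appears in these targets.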
I. Network training stage:
1. Preprocessing of training samples
CTPN mainly targets horizontal text. Its training samples are annotated with rectangular ground-truth boxes given by four coordinates (xmin, ymin, xmax, ymax), which are then split into strip ground-truth boxes of the same width as the anchor boxes. For multi-directional text, each training sample is annotated with a quadrilateral ground-truth box (x1, y1, x2, y2, x3, y3, x4, y4), whose vertices are listed clockwise starting from the upper-left corner, and the text may be horizontal, inclined or vertical.
The terms horizontal, inclined and vertical are defined here as follows. Horizontal: the text ground-truth box lies along the horizontal direction. Inclined: the text ground-truth box makes an angle with the x-axis within [-π/4, π/4]. Vertical: the text ground-truth box makes an angle with the x-axis within [-π/2, -π/4) or (π/4, π/2], or its height-to-width ratio is greater than 1.2.
Simply taking the minimum rectangle that covers the quadrilateral ground-truth box would include many non-text regions in the training samples, lowering their quality and affecting network training. The invention therefore cuts the ground-truth boxes of multi-directional text into strip ground-truth boxes of the same width as the anchors, as shown in panels (a), (b), (c) and (d) of Fig. 2, so that the ground-truth boxes fit the text closely and the samples are reliable.
Cutting the quadrilateral ground-truth boxes includes special handling of vertical ground-truth boxes and boundary handling. The specific steps are as follows:
1) Mark vertical ground-truth boxes
a) From the quadrilateral ground-truth box coordinates (x1, y1, x2, y2, x3, y3, x4, y4), compute the minimum rectangle covering the quadrilateral to obtain xmin, ymin, xmax, ymax, then compute width = xmax - xmin and height = ymax - ymin. If the ratio height : width is greater than 1.2, the box is considered a vertical ground-truth box and is marked vertical = 1; the default is vertical = 0.
b) From the quadrilateral ground-truth box coordinates (x1, y1, x2, y2, x3, y3, x4, y4), the two points [(x1, y1), (x2, y2)] determine the line equation of line1, and the two points [(x4, y4), (x3, y3)] determine the line equation of line3. line1 and line3 are the 1st and 3rd sides of the quadrilateral ground-truth box, counted clockwise from the upper-left corner. Their equations are:
line1: $y = k_1 x + b_1$
line3: $y = k_3 x + b_3$
If the slopes of line1 and line3 are both less than -1 or greater than 1, that is, $|k_1| > 1$ and $|k_3| > 1$, the box is considered a vertical ground-truth box and is marked vertical = 1.
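The two marking rules can be sketched as a single predicate. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the slope rule is read as "both slopes have absolute value greater than 1" (consistent with the π/4 angle in the definition of vertical text), and a zero-width box is treated as vertical.

```python
def slope(p, q):
    """Slope of the line through points p and q; infinite for a vertical segment."""
    (x0, y0), (x1, y1) = p, q
    if x1 == x0:
        return float("inf")
    return (y1 - y0) / (x1 - x0)


def is_vertical(quad, ratio_thresh=1.2):
    """quad = (x1, y1, x2, y2, x3, y3, x4, y4), clockwise from the upper-left vertex."""
    x1, y1, x2, y2, x3, y3, x4, y4 = quad
    xs, ys = quad[0::2], quad[1::2]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    # Rule a): height : width of the covering rectangle greater than 1.2.
    if width == 0 or height / width > ratio_thresh:
        return True
    # Rule b): line1 (top side) and line3 (bottom side) both steeper than 45 degrees.
    k1 = slope((x1, y1), (x2, y2))
    k3 = slope((x4, y4), (x3, y3))
    return abs(k1) > 1 and abs(k3) > 1
```

Rule b) matters for long boxes tilted just past 45 degrees, whose covering rectangle is still close to square and therefore passes rule a).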
2) Handle images containing vertical ground-truth boxes
If the proportion of ground-truth boxes marked vertical in a training image is greater than a threshold γ, the training image and its ground-truth boxes are rotated by 90 degrees. All horizontal and inclined ground-truth boxes then become vertical ones and vice versa, so the vertical flags of all ground-truth boxes are inverted and the line equations of line1 and line3 are recomputed. The threshold γ is set to 50%.
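The coordinate bookkeeping for this rotation can be sketched as below. The description does not state the rotation direction used in training; clockwise is assumed here because the test stage restores results with a counterclockwise rotation. All function names are hypothetical, and `height` is the pixel height of the original image.

```python
def rotate_point_cw(x, y, height):
    """Map a pixel of the original image into the image rotated 90 degrees clockwise."""
    return height - 1 - y, x


def rotate_point_ccw_back(xr, yr, height):
    """Inverse mapping: from the rotated image back to the original image."""
    return yr, height - 1 - xr


def rotate_quad_cw(quad, height):
    """Rotate (x1, y1, ..., x4, y4) and keep the 'clockwise from upper-left' order.

    After a clockwise rotation the former lower-left vertex becomes the new
    upper-left vertex, so the vertex list is shifted by one position.
    """
    pts = [rotate_point_cw(quad[i], quad[i + 1], height) for i in range(0, 8, 2)]
    pts = [pts[3], pts[0], pts[1], pts[2]]
    return tuple(c for p in pts for c in p)


def rotate_if_mostly_vertical(quads, vertical_flags, height, gamma=0.5):
    """Rotate all boxes and invert their flags when the vertical share exceeds gamma."""
    if not quads or sum(vertical_flags) / len(vertical_flags) <= gamma:
        return False, list(quads), list(vertical_flags)
    rotated = [rotate_quad_cw(q, height) for q in quads]
    return True, rotated, [not flag for flag in vertical_flags]
```

The image pixels themselves would be rotated with whatever image library is in use; only the box coordinates are handled here.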
3) Find the cutting start position of the ground-truth box
To keep the x-coordinates of the strip ground-truth boxes consistent with the anchor boxes, the cutting boundaries of the ground-truth box are also placed at multiples of the anchor width (width_a).
The boundaries x_left and x_right of the ground-truth box in the x-axis direction are set from x1, x2, x3 and x4, the x-coordinates of the four vertices of the quadrilateral ground-truth box, and width_a, the anchor width. x_left is the left starting position for cutting the ground-truth box: the nearest multiple of width_a to the right of the midpoint, in the x-axis direction, of the upper-left and lower-left corners. Correspondingly, x_right is the right end position for cutting: the nearest multiple of width_a to the left of the midpoint, in the x-axis direction, of the upper-right and lower-right corners.
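Read literally, the definitions of the two boundaries amount to rounding the left midpoint up and the right midpoint down to a multiple of the anchor width. The sketch below follows that reading; the function name is hypothetical, and treating a midpoint that already lies on a multiple as its own "nearest multiple" is an assumption.

```python
import math


def cut_boundaries(quad, anchor_width=16):
    """Return (x_left, x_right) for quad = (x1, y1, x2, y2, x3, y3, x4, y4)."""
    x1, _, x2, _, x3, _, x4, _ = quad
    left_mid = (x1 + x4) / 2    # midpoint of the upper-left and lower-left corners
    right_mid = (x2 + x3) / 2   # midpoint of the upper-right and lower-right corners
    x_left = math.ceil(left_mid / anchor_width) * anchor_width
    x_right = math.floor(right_mid / anchor_width) * anchor_width
    return x_left, x_right
```

For a box spanning x = 10 to x = 110 this gives the boundaries 16 and 96, so cutting starts and ends on anchor columns.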
4) Cut the strip ground-truth boxes
The coordinates (gt_xmin, gt_ymin, gt_xmax, gt_ymax) of each strip ground-truth box are determined from the line equations of line1 and line3 and the coordinates on the x-axis. The distance between gt_xmax and gt_xmin is the anchor width width_a, with gt_xmax = gt_xmin + width_a - 1 and gt_xmax ∈ [x_left, x_right); gt_xmin starts at the cutting start position x_left and advances in steps of width_a up to the cutting end position x_right. gt_ymin and gt_ymax are obtained from the equations of line1 and line3, evaluated at the x-coordinate of the center of the strip ground-truth box.
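The cutting step can be sketched as follows, assuming the cutting boundaries `x_left` and `x_right` are already known multiples of the anchor width and that the box is not marked vertical (so the top and bottom sides are not vertical segments). The function names are hypothetical.

```python
def line_through(p, q):
    """Return (k, b) of the line y = k * x + b through points p and q."""
    (x0, y0), (x1, y1) = p, q
    k = (y1 - y0) / (x1 - x0)
    return k, y0 - k * x0


def cut_strips(quad, x_left, x_right, anchor_width=16):
    """Cut a quadrilateral into strips (gt_xmin, gt_ymin, gt_xmax, gt_ymax)."""
    x1, y1, x2, y2, x3, y3, x4, y4 = quad
    k1, b1 = line_through((x1, y1), (x2, y2))  # line1: top side
    k3, b3 = line_through((x4, y4), (x3, y3))  # line3: bottom side
    strips = []
    for gt_xmin in range(x_left, x_right, anchor_width):
        gt_xmax = gt_xmin + anchor_width - 1
        x_center = (gt_xmin + gt_xmax) / 2     # y values are taken at the strip center
        strips.append((gt_xmin, k1 * x_center + b1, gt_xmax, k3 * x_center + b3))
    return strips
```

For an inclined box the strips step up or down from column to column, which is what lets axis-aligned strips follow tilted text.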
5) Boundary handling
In the horizontal direction, a leftover part φ may remain between the boundary of the quadrilateral ground-truth box and the cutting start position, because the horizontal start coordinate of a quadrilateral ground-truth box is not always a multiple of the anchor width. To keep sample quality high, the invention keeps or discards φ by the following rule: if φ > width_a/2, the leftover part is kept as a strip ground-truth box; otherwise it is discarded.
2. Handling of training samples
The ratio of positive to negative training samples is kept at 1:1, and a training mini-batch contains at most 256 samples. When there are fewer than 128 positive samples, the number of negative samples is reduced so that the ratio stays at 1:1.
Anchor boxes corresponding to vertical ground-truth boxes, that is, ground-truth boxes with vertical = 1, are not put into the training samples.
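A balanced mini-batch along these lines can be sketched as follows. The label convention (1 positive, 0 negative, -1 excluded, for example anchors of vertical boxes) and the function name are assumptions for illustration; the sketch also reduces the positives if negatives are scarcer, a case the description does not cover.

```python
import random


def sample_minibatch(labels, batch_size=256, rng=None):
    """Pick equally many positive and negative anchor indices, at most batch_size in total.

    labels: list of 1 (positive), 0 (negative) or -1 (excluded from training).
    """
    rng = rng or random.Random()
    positives = [i for i, label in enumerate(labels) if label == 1]
    negatives = [i for i, label in enumerate(labels) if label == 0]
    # At most half the batch per class, and never more than the scarcer class offers.
    n = min(len(positives), len(negatives), batch_size // 2)
    return rng.sample(positives, n), rng.sample(negatives, n)
```

With 10 positives and 500 negatives this yields 10 of each, matching the rule that negatives are reduced when positives fall below 128.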
II. Network testing stage:
In the testing stage, the predicted strip candidate boxes must be filtered and text boxes constructed from them. Multi-directional text involves several cases, but the extended CTPN of the invention essentially handles horizontal and inclined text. To address this, images at two angles are input during testing: the original image and a copy rotated by 90 degrees. The original image may contain horizontal, inclined and vertical text; in the rotated image, the originally vertical text becomes horizontal or inclined, and the originally horizontal or inclined text becomes vertical. By detecting the horizontal and inclined text in the original image and the horizontal and inclined text in the rotated image, the invention obtains detection results that cover all the multi-directional text in the original image, whether horizontal, inclined or vertical.
The specific steps are:
1. The original test image and the copy rotated by 90 degrees are both fed into the network, and the corresponding strip candidate boxes are predicted for each.
2. The strip candidate boxes predicted for the original image and for the rotated image are filtered separately: boxes whose output score is greater than a threshold (0.7 is generally used) are taken as text candidate boxes and are then processed with NMS.
3. The text candidate boxes selected for the same text line are put into the set S_k, where k indexes the k-th text line. The rules are: 1) the two strip text candidate boxes are within 32 pixels of each other in the horizontal direction; 2) the IoU of the two strip candidate boxes in the vertical direction lies in [0.7, 1]; 3) the similarity of the two strip candidate boxes, that is, the ratio of the height of the smaller box to that of the larger box, lies in [0.7, 1].
4. A fitting function is applied to y_i,min and y_i,max, i ∈ S_k, of the text candidate boxes in the set S_k of one text line, to fit the equations of the 1st and 3rd sides counted clockwise from the upper-left corner: Y_min = cx + d and Y_max = ax + b. The minimum and maximum x-coordinates of the candidate boxes in S_k are substituted into Y_min and Y_max to determine the four vertex coordinates of the quadrilateral text line.
5. The score of a quadrilateral text prediction box is the mean score of the strip boxes that make it up.
6. After the strip rectangles have been detected for the original test image and the rotated image and the quadrilateral text prediction boxes have been constructed, quadrilateral non-maximum suppression (polygon-NMS) is applied to obtain the final text detection result. Polygon-NMS differs slightly from the earlier NMS in that the IoU is computed over the areas of quadrilateral boxes.
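Steps 3 to 5 above can be sketched in plain Python. This is an illustration under stated assumptions: the function names are hypothetical, the horizontal distance in rule 1 is measured between left edges, an ordinary least-squares line stands in for the unspecified fitting function, and the fit uses strip centers as x values. Polygon-NMS is not shown.

```python
def same_line(a, b, max_gap=32, min_overlap=0.7, min_similarity=0.7):
    """Pairing test for two strip boxes (xmin, ymin, xmax, ymax)."""
    if abs(a[0] - b[0]) > max_gap:                  # rule 1: horizontal distance
        return False
    inter = min(a[3], b[3]) - max(a[1], b[1])
    union = max(a[3], b[3]) - min(a[1], b[1])
    if union <= 0 or inter / union < min_overlap:   # rule 2: vertical IoU
        return False
    ha, hb = a[3] - a[1], b[3] - b[1]
    return min(ha, hb) / max(ha, hb) >= min_similarity  # rule 3: height ratio


def fit_line(xs, ys):
    """Least-squares fit of y = k * x + b; returns (k, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, my
    k = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return k, my - k * mx


def build_quad(strips, scores):
    """Fit the top and bottom sides of one text line and return (quad, mean score)."""
    centers = [(s[0] + s[2]) / 2 for s in strips]
    c, d = fit_line(centers, [s[1] for s in strips])   # Y_min = c * x + d
    a, b = fit_line(centers, [s[3] for s in strips])   # Y_max = a * x + b
    x0 = min(s[0] for s in strips)
    x1 = max(s[2] for s in strips)
    quad = (x0, c * x0 + d, x1, c * x1 + d, x1, a * x1 + b, x0, a * x0 + b)
    return quad, sum(scores) / len(scores)
```

The returned vertices are ordered clockwise from the upper-left corner, matching the annotation format of the ground-truth boxes.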
The method of the invention adapts to multi-directional text, including horizontal, inclined and vertical text. To verify its practical effect, the natural-scene text detection method described in the preceding embodiment was implemented in an experiment. The experimental environment and results are as follows:
(1) Experimental environment:
System environment: Ubuntu 14.04;
Hardware environment: memory: 64 GB; GPU: K40; hard disk: 1 TB;
(2) Experimental data:
Training data:
SynthText (synthetic text images): 800,000 training images, used for 1 epoch of pre-training.
RCTW2017 training set (8034)
Test data: RCTW2017 test set (4229)
Evaluation method: a detection with IoU > 0.5 counts as a successful match
(3) Experimental results:
Fig. 3 compares text detection results: the upper row, panels (a), (b) and (c), shows the detection results of CTPN, and the lower row, panels (d), (e) and (f), shows the detection results of the invention. It can be seen clearly that, compared with the original CTPN network, the method of the invention locates inclined and vertical text more accurately. Table 1 gives the evaluation results on the RCTW2017 test set; although the data set is difficult, the method achieves relatively high precision.
Table 1. Evaluation results on the RCTW2017 test set
Data set | Precision | Recall | F-measure
RCTW2017 | 0.791453 | 0.441691 | 0.56697
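As a consistency check on Table 1, the F-measure can be recomputed from the reported precision and recall, assuming the usual harmonic-mean definition (the description does not spell out the formula).

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


reported = 0.56697
recomputed = f_measure(0.791453, 0.441691)
print(round(recomputed, 5), abs(recomputed - reported) < 1e-5)
```

The recomputed value agrees with the reported F-measure to five decimal places.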
The above embodiments merely illustrate the technical solution of the invention and do not limit it. A person of ordinary skill in the art may modify the technical solution of the invention or replace it with equivalents without departing from its spirit and scope; the scope of protection of the invention is defined by the claims.

Claims (10)

1. one kind is towards multidirectional character detecting method, which comprises the following steps:
1) cutting is carried out according to angle information to the quadrangle true value frame of training sample, forms the strip true value frame of rectangular area;
2) positive and negative sample proportion is controlled, and carries out stochastical sampling, CTPN network is put samples into and is trained, obtain text detection Network;
3) picture to be detected and 90 degree of picture rotation of the picture to be detected are inputted into text detection network, the text detection network It exports bar-shaped frame and it there is a possibility that the prediction score value of text, non-maxima suppression is done to obtained bar-shaped frame, and therefrom Select the bar-shaped frame that prediction score value is greater than given threshold;
4) bar-shaped frame picked out in different angle picture is merged, fitting constructs quadrangle textbox.
2. the method as described in claim 1, which is characterized in that step 1) finds out the angle on two sides of quadrangle true value frame or more Degree, obtains the straight line formula on upper and lower both sides, every anchor wide, according to x-axis coordinate, determines y-axis coordinate using straight line formula, obtain To (the x of strip true value framemin, ymin, xmax, ymax), respectively indicate the position in the true value frame upper left corner and the lower right corner.
3. such as claim 1 method, which is characterized in that in the training picture of step 1), in the horizontal direction of strip true value frame Beginning and end always on the position of 16 multiples;To the not horizontal direction on 16 multiples of quadrangle true value frame before cutting Both sides are accepted or rejected.
4. the method as described in claim 1, which is characterized in that step 1) for there are the picture of the true value frame of vertical text into Row specially treated, if in training picture based on vertical text, after training picture and quadrangle true value frame are rotated horizontally 90 degree, True value frame vertical in picture is the part of " vertical " after label rotation, then carries out cutting processing;If training picture in Based on horizontal, inclination text, marking the true value frame of vertical text is the part " vertical ", carries out cutting processing again later, from And guarantee the validity of training sample.
5. The method of claim 1, characterized in that step 2) computes the IOU between the strip ground-truth boxes and the anchors; anchors whose IOU is greater than a certain threshold are selected as positive samples and anchors whose IOU is less than a certain threshold are selected as negative samples, the ratio of positive to negative samples is controlled to be 1:1, and the anchor boxes corresponding to ground-truth boxes of "vertical" parts are not selected as training samples.
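A minimal sketch of the sample selection in claim 5 is given below. The threshold values, the random seed, and the rule used to decide that an anchor "corresponds" to a "vertical" ground-truth box (its best-overlapping ground-truth box is vertical) are assumptions made for illustration; `sample_anchors` is a hypothetical name.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def sample_anchors(anchors, gt_strips, gt_is_vertical,
                   pos_thresh=0.7, neg_thresh=0.5, seed=0):
    """Label anchors by their best IoU, then sample positives and negatives 1:1.

    Returns (positive_indices, negative_indices). Anchors whose best-matching
    ground-truth strip is marked "vertical" are skipped entirely (assumption).
    """
    positives, negatives = [], []
    for idx, anchor in enumerate(anchors):
        ious = [iou(anchor, gt) for gt in gt_strips]
        best = max(range(len(ious)), key=ious.__getitem__) if ious else None
        best_iou = ious[best] if best is not None else 0.0
        if best is not None and gt_is_vertical[best] and best_iou > 0:
            continue  # excluded from the training samples
        if best_iou > pos_thresh:
            positives.append(idx)
        elif best_iou < neg_thresh:
            negatives.append(idx)
    # Random sampling down to the smaller class gives the 1:1 ratio.
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    return sorted(rng.sample(positives, n)), sorted(rng.sample(negatives, n))
```

Anchors whose IOU falls between the two thresholds are treated as neither positive nor negative in this sketch.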
6. The method of claim 1, characterized in that, when the selected strip-shaped boxes are merged in step 4), strip-shaped boxes that pairwise satisfy the pairing condition are merged, and the coordinates of the quadrilateral text box are obtained by fitting.
7. The method of claim 1, characterized in that step 4) feeds the original picture to be detected and the picture rotated by 90 degrees into the network together; after strip-shaped boxes are predicted for each picture, they are merged into quadrilateral text boxes; the picture rotated by 90 degrees and the text boxes predicted on it are rotated 90 degrees counterclockwise to restore the orientation of the original picture; the quadrilateral text boxes predicted on the pictures at the two angles are then combined and quadrilateral non-maximum suppression is applied, yielding the final text localization result.
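The coordinate restoration in claim 7 can be sketched as below. It assumes the rotated input was produced by rotating the original picture 90 degrees clockwise and that points are continuous (x, y) image coordinates with y growing downward; the function names are hypothetical. Only the rotation back to the original orientation is shown, not the quadrilateral non-maximum suppression.

```python
def rotate_point_cw(point, orig_height):
    """Map a point from the original picture into the picture rotated 90 degrees clockwise."""
    x, y = point
    return (orig_height - y, x)


def rotate_point_ccw(point, rotated_width):
    """Inverse mapping: rotate a point 90 degrees counterclockwise.

    rotated_width is the width of the rotated picture, which equals the
    height of the original picture.
    """
    x, y = point
    return (y, rotated_width - x)


def restore_quads(quads, rotated_width):
    """Bring quadrilaterals predicted on the rotated picture back to the original orientation."""
    return [[rotate_point_ccw(p, rotated_width) for p in quad] for quad in quads]
```

Vertex order is preserved by the mapping, so a vertex that was listed first on the rotated picture is still listed first afterwards, even though it is no longer the top-left corner.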
8. A multidirectional text detection device, characterized by comprising:
a text detection network training module, responsible for slicing the quadrilateral ground-truth boxes of the training samples according to angle information to form strip ground-truth boxes of rectangular regions, then controlling the ratio of positive to negative samples, performing random sampling, and feeding the samples into a CTPN network for training to obtain a text detection network;
a text localization detection module, responsible for inputting the picture to be detected, together with a copy of that picture rotated by 90 degrees, into the text detection network, which outputs strip-shaped boxes and, for each box, a prediction score indicating the likelihood that it contains text; applying non-maximum suppression to the resulting strip-shaped boxes and selecting from them the strip-shaped boxes whose prediction score is greater than a set threshold; and then merging the strip-shaped boxes selected in the pictures at the different angles and fitting them to construct quadrilateral text boxes.
9. The device of claim 8, characterized in that the text detection network training module applies special handling to pictures whose ground-truth boxes contain vertical text: if the training picture consists mainly of vertical text, the training picture and its quadrilateral ground-truth boxes are rotated by 90 degrees, the ground-truth boxes that are vertical in the rotated picture are labelled as "vertical" parts, and slicing is then performed; if the training picture consists mainly of horizontal or inclined text, the ground-truth boxes of vertical text are labelled as "vertical" parts and slicing is then performed, thereby ensuring the validity of the training samples.
10. The device of claim 8, characterized in that the text localization detection module feeds the original picture to be detected and the picture rotated by 90 degrees into the network together; after strip-shaped boxes are predicted for each picture, they are merged into quadrilateral text boxes; the picture rotated by 90 degrees and the text boxes predicted on it are rotated 90 degrees counterclockwise to restore the orientation of the original picture; the quadrilateral text boxes predicted on the pictures at the two angles are then combined and quadrilateral non-maximum suppression is applied, yielding the final text localization result.
CN201810366383.2A 2018-04-23 2018-04-23 Multidirectional character detection method and device Active CN108960229B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810366383.2A CN108960229B (en) 2018-04-23 2018-04-23 Multidirectional character detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810366383.2A CN108960229B (en) 2018-04-23 2018-04-23 Multidirectional character detection method and device

Publications (2)

Publication Number Publication Date
CN108960229A true CN108960229A (en) 2018-12-07
CN108960229B CN108960229B (en) 2022-04-01

Family

ID=64498736

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810366383.2A Active CN108960229B (en) 2018-04-23 2018-04-23 Multidirectional character detection method and device

Country Status (1)

Country Link
CN (1) CN108960229B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670495A (en) * 2018-12-13 2019-04-23 深源恒际科技有限公司 A kind of method and system of the length text detection based on deep neural network
CN109961064A (en) * 2019-03-20 2019-07-02 深圳市华付信息技术有限公司 Identity card text positioning method, device, computer equipment and storage medium
CN109977945A (en) * 2019-02-26 2019-07-05 博众精工科技股份有限公司 Localization method and system based on deep learning
CN110689010A (en) * 2019-09-27 2020-01-14 支付宝(杭州)信息技术有限公司 Certificate identification method and device
CN111027554A (en) * 2019-12-27 2020-04-17 创新奇智(重庆)科技有限公司 System and method for accurately detecting and positioning commodity price tag characters
CN111046866A (en) * 2019-12-13 2020-04-21 哈尔滨工程大学 Method for detecting RMB crown word number region by combining CTPN and SVM
CN111368831A (en) * 2020-03-03 2020-07-03 开放智能机器(上海)有限公司 System and method for positioning vertically arranged characters
CN111797827A (en) * 2020-05-18 2020-10-20 冠群信息技术(南京)有限公司 Automatic OCR recognition method for character direction mixed arrangement
CN112580624A (en) * 2020-11-18 2021-03-30 中国科学院信息工程研究所 Method and device for detecting multidirectional text area based on boundary prediction
CN112613561A (en) * 2020-12-24 2021-04-06 哈尔滨理工大学 EAST algorithm optimization method
CN113011423A (en) * 2019-12-21 2021-06-22 北京师范大学珠海分校 Text line structure optimization calculation method based on CTPN system and application thereof
CN113139539A (en) * 2021-03-16 2021-07-20 中国科学院信息工程研究所 Method and device for detecting characters of arbitrary-shaped scene with asymptotic regression boundary
CN115497106A (en) * 2022-11-14 2022-12-20 合肥中科类脑智能技术有限公司 Battery laser code spraying identification method based on data enhancement and multitask model

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8718365B1 (en) * 2009-10-29 2014-05-06 Google Inc. Text recognition for textually sparse images
CN106407976A (en) * 2016-08-30 2017-02-15 百度在线网络技术(北京)有限公司 Image character identification model generation and vertical column character image identification method and device
CN107346420A (en) * 2017-06-19 2017-11-14 中国科学院信息工程研究所 Text detection localization method under a kind of natural scene based on deep learning
CN107403130A (en) * 2017-04-19 2017-11-28 北京粉笔未来科技有限公司 A kind of character identifying method and character recognition device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8718365B1 (en) * 2009-10-29 2014-05-06 Google Inc. Text recognition for textually sparse images
CN106407976A (en) * 2016-08-30 2017-02-15 百度在线网络技术(北京)有限公司 Image character identification model generation and vertical column character image identification method and device
CN107403130A (en) * 2017-04-19 2017-11-28 北京粉笔未来科技有限公司 A kind of character identifying method and character recognition device
CN107346420A (en) * 2017-06-19 2017-11-14 中国科学院信息工程研究所 Text detection localization method under a kind of natural scene based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHI TIAN: "《Detecting Text in Natural Image》", 《EUROPEAN CONFERENCE ON COMPUTER VISION》 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670495A (en) * 2018-12-13 2019-04-23 深源恒际科技有限公司 A kind of method and system of the length text detection based on deep neural network
CN109977945A (en) * 2019-02-26 2019-07-05 博众精工科技股份有限公司 Localization method and system based on deep learning
CN109961064A (en) * 2019-03-20 2019-07-02 深圳市华付信息技术有限公司 Identity card text positioning method, device, computer equipment and storage medium
CN110689010A (en) * 2019-09-27 2020-01-14 支付宝(杭州)信息技术有限公司 Certificate identification method and device
CN110689010B (en) * 2019-09-27 2021-05-11 支付宝(杭州)信息技术有限公司 Certificate identification method and device
CN111046866B (en) * 2019-12-13 2023-04-18 哈尔滨工程大学 Method for detecting RMB crown word number region by combining CTPN and SVM
CN111046866A (en) * 2019-12-13 2020-04-21 哈尔滨工程大学 Method for detecting RMB crown word number region by combining CTPN and SVM
CN113011423A (en) * 2019-12-21 2021-06-22 北京师范大学珠海分校 Text line structure optimization calculation method based on CTPN system and application thereof
CN111027554A (en) * 2019-12-27 2020-04-17 创新奇智(重庆)科技有限公司 System and method for accurately detecting and positioning commodity price tag characters
CN111368831A (en) * 2020-03-03 2020-07-03 开放智能机器(上海)有限公司 System and method for positioning vertically arranged characters
CN111368831B (en) * 2020-03-03 2023-05-23 开放智能机器(上海)有限公司 Positioning system and method for vertical text
CN111797827A (en) * 2020-05-18 2020-10-20 冠群信息技术(南京)有限公司 Automatic OCR recognition method for character direction mixed arrangement
CN112580624A (en) * 2020-11-18 2021-03-30 中国科学院信息工程研究所 Method and device for detecting multidirectional text area based on boundary prediction
CN112613561B (en) * 2020-12-24 2022-06-03 哈尔滨理工大学 EAST algorithm optimization method
CN112613561A (en) * 2020-12-24 2021-04-06 哈尔滨理工大学 EAST algorithm optimization method
CN113139539A (en) * 2021-03-16 2021-07-20 中国科学院信息工程研究所 Method and device for detecting characters of arbitrary-shaped scene with asymptotic regression boundary
CN115497106A (en) * 2022-11-14 2022-12-20 合肥中科类脑智能技术有限公司 Battery laser code spraying identification method based on data enhancement and multitask model
CN115497106B (en) * 2022-11-14 2023-01-24 合肥中科类脑智能技术有限公司 Battery laser code-spraying identification method based on data enhancement and multitask model

Also Published As

Publication number Publication date
CN108960229B (en) 2022-04-01

Similar Documents

Publication Publication Date Title
CN108960229A (en) One kind is towards multidirectional character detecting method and device
WO2022148192A1 (en) Image processing method, image processing apparatus, and non-transitory storage medium
CN109711288A (en) Remote sensing ship detecting method based on feature pyramid and distance restraint FCN
CN110232713B (en) Image target positioning correction method and related equipment
CN110287960A (en) The detection recognition method of curve text in natural scene image
Zhou et al. Complete residential urban area reconstruction from dense aerial LiDAR point clouds
CN109614985A (en) A kind of object detection method based on intensive connection features pyramid network
CN108961235A (en) A kind of disordered insulator recognition methods based on YOLOv3 network and particle filter algorithm
CN108009509A (en) Vehicle target detection method
CN105260749B (en) Real-time target detection method based on direction gradient binary pattern and soft cascade SVM
CN109117836A (en) Text detection localization method and device under a kind of natural scene based on focal loss function
CN105046206B (en) Based on the pedestrian detection method and device for moving prior information in video
CN112766184B (en) Remote sensing target detection method based on multi-level feature selection convolutional neural network
CN109785298A (en) A kind of multi-angle object detecting method and system
CN106096542A (en) Image/video scene recognition method based on range prediction information
CN108960135A (en) Intensive Ship Target accurate detecting method based on High spatial resolution remote sensing
CN109711416A (en) Target identification method, device, computer equipment and storage medium
CN106056084B (en) Remote sensing image port ship detection method based on multi-resolution hierarchical screening
CN112990086A (en) Remote sensing image building detection method and device and computer readable storage medium
CN110334594A (en) A kind of object detection method based on batch again YOLO algorithm of standardization processing
Neuhausen et al. Automatic window detection in facade images
CN110223310A (en) A kind of line-structured light center line and cabinet edge detection method based on deep learning
CN114677594A (en) River water level intelligent identification algorithm based on deep learning
US11906441B2 (en) Inspection apparatus, control method, and program
JP3471578B2 (en) Line direction determining device, image tilt detecting device, and image tilt correcting device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant