CN109670495A

CN109670495A - Method and system for long and short text detection based on deep neural networks

Info

Publication number: CN109670495A
Application number: CN201811528135.XA
Authority: CN
Inventors: Xia Luyao (夏路遥); Huang Xianjun (黄贤俊)
Original assignee: Shenzhen Yuan Heng Technology Co Ltd
Current assignee: Shenzhen Yuan Heng Technology Co Ltd
Priority date: 2018-12-13
Filing date: 2018-12-13
Publication date: 2019-04-23

Abstract

The invention discloses a method and system for long and short text detection based on deep neural networks. The method comprises: extracting a feature map from an original image; using Faster RCNN to enumerate first-type rectangular regions with several preset aspect ratios over the feature map, predicting whether each first-type region is foreground or background and predicting its actual region, to obtain all first-type text boxes; using CTPN to enumerate second-type rectangular regions with several preset widths and variable heights over the feature map and, if a second-type region is a text region, feeding it into an RNN (recurrent neural network) to obtain second-type text boxes; and merging the first-type and second-type text boxes by non-maximum suppression. The invention combines the strengths of Faster RCNN and CTPN and merges their detections according to certain rules using non-maximum suppression logic, so that both the recall and the precision of text detection are improved.

Description

Method and system for long and short text detection based on deep neural networks

Technical field

The present invention relates to the field of text detection, and in particular to a method and system for long and short text detection based on deep neural networks.

Background art

There is currently large market demand for digitizing user-supplied electronic images that contain text: the text in each image must be detected and then recognized as digital text. This work used to require a great deal of manual labor; today, OCR technology is used to convert images into electronic data. Current OCR systems are divided into two modules, detection and recognition. This invention mainly aims to improve the performance of the text detection module, in particular the accuracy of text-boundary regression.

Current text detection techniques are mainly built on well-established deep convolutional neural networks, including the Faster RCNN framework and the CTPN detection framework, where:

The basic Faster RCNN framework achieves relatively high accuracy on large objects. Its pipeline is:
I. Extract features from the image;
II. Enumerate a large number of rectangles (anchors) from which to regress the corresponding objects;
III. Divide the enumerated rectangles into two classes: positive samples, which contain a target and have a large intersection with it, and all other rectangles as negative samples;
IV. Crop the positive samples from the feature map and regress the object boundary from the cropped features.
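Step II above can be illustrated with a minimal sketch of anchor enumeration. The patent gives no concrete values, so the sketch assumes the defaults from the original Faster R-CNN paper: three anchor areas (128², 256², 512² px), three height-to-width ratios (0.5, 1, 2) and a feature stride of 16 px. The 38 × 50 feature-map size is a hypothetical example (roughly a 600 × 800 input).

```python
import math

def base_anchor_shapes(scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (width, height) for every scale/ratio pair; ratio = height / width."""
    shapes = []
    for s in scales:
        for r in ratios:
            # Keep the anchor area at s * s while changing its aspect ratio.
            shapes.append((s / math.sqrt(r), s * math.sqrt(r)))
    return shapes

def enumerate_anchors(feat_h, feat_w, stride=16):
    """Centre every base shape on every feature-map cell.

    Returns (x1, y1, x2, y2) boxes in input-image pixel coordinates.
    """
    shapes = base_anchor_shapes()
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Assumed convention: anchors sit at the centre of each stride cell.
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for w, h in shapes:
                anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

anchors = enumerate_anchors(feat_h=38, feat_w=50)
print(len(anchors))  # 38 * 50 cells * 9 shapes = 17100
```

Because the shapes are fixed in advance, very long or very short text lines may match none of them closely, which is the weakness described below.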

The CTPN detection framework is optimized for text, which is usually horizontal and of variable length. Its pipeline is:
I. Extract features from the image;
II. Enumerate a large number of small rectangles. Unlike in Faster RCNN, these small rectangles have a fixed width, while their heights take different scales, for example 10 scales from 11 pixels to 273 pixels, each obtained from the previous one by multiplying by a fixed ratio;
III. Use an RNN (recurrent neural network) to connect the detected small-scale text segments into text lines;
IV. Train the CNN+RNN end to end, with multi-scale support.
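The fixed-width, multi-height anchors of step II can be sketched as follows. The 16 px width and the growth factor (divide by 0.7 at each step) are assumptions taken from the original CTPN paper, not from this patent; with those values the ten heights run from 11 px to 273 px, matching the range quoted above.

```python
def ctpn_anchor_heights(min_height=11, count=10, factor=0.7):
    """Anchor heights growing geometrically: each is the previous one / 0.7."""
    return [round(min_height / factor ** k) for k in range(count)]

def ctpn_anchors_at(cx, cy, width=16, heights=None):
    """All fixed-width anchors centred at (cx, cy), as (x1, y1, x2, y2)."""
    if heights is None:
        heights = ctpn_anchor_heights()
    return [(cx - width / 2, cy - h / 2, cx + width / 2, cy + h / 2) for h in heights]

heights = ctpn_anchor_heights()
print(heights[0], heights[-1])  # 11 273
```

Only the height varies, so a long line is covered by many narrow slices rather than by one anchor with an extreme aspect ratio.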

The two detection models above have the following shortcomings:

Faster RCNN: because text segments vary greatly in length, it is difficult to choose the heights and widths of the enumerated anchors, and boundary regression is poor for text whose aspect ratio differs greatly from those of the anchors.

CTPN: boundary regression is good, but where text overlaps, some text boxes can be lost, making it difficult to recover all of the text.
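The Faster RCNN shortcoming can be made concrete by measuring how well a standard anchor set fits a long, thin text line. This sketch assumes the default Faster R-CNN anchors (areas 128², 256², 512²; height-to-width ratios 0.5, 1, 2) and a hypothetical 800 × 32 px text line; neither value comes from this patent. Under the usual Faster R-CNN rule, an anchor is positive when its IoU with a ground-truth box is at least 0.7, and here none comes close.

```python
import math

def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def anchors_at(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    boxes = []
    for s in scales:
        for r in ratios:  # r = height / width, area kept at s * s
            w, h = s / math.sqrt(r), s * math.sqrt(r)
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes

# Hypothetical 800 x 32 px text line centred at (400, 300).
text_line = (0.0, 284.0, 800.0, 316.0)
best = max(iou(text_line, a) for a in anchors_at(400.0, 300.0))
print(f"best IoU = {best:.2f}")  # about 0.16, far below the 0.7 threshold
```

Such a line would get a positive anchor only through the fallback "best match" rule, and that anchor is a poor starting point for regressing the line's boundary.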

Summary of the invention

To address the above shortcomings, the present invention provides a method and system for long and short text detection based on deep neural networks.

The invention discloses a method for long and short text detection based on deep neural networks, comprising:

Selecting an original image and extracting a feature map from it;

Using Faster RCNN to enumerate, over the feature map, first-type rectangular regions with several preset aspect ratios, predicting whether each first-type rectangular region is foreground or background, and predicting its actual region, to obtain all first-type text boxes;

Using CTPN to enumerate, over the feature map, second-type rectangular regions with several preset widths and variable heights, and judging whether each second-type rectangular region is a text or non-text region; if a second-type rectangular region is a text region, feeding it into an RNN to obtain second-type text boxes;

Merging the first-type text boxes and the second-type text boxes by non-maximum suppression.

As a further improvement of the present invention, enumerating first-type rectangular regions with several preset aspect ratios over the feature map based on Faster RCNN comprises:

Extracting abstract features of the original image with a deep convolutional network;

Proposing candidate regions of the original image with a region proposal network;

Regressing the precise text region from the candidate regions.
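Regressing a precise region from a candidate region is usually done by predicting offsets relative to the candidate box. The sketch below assumes the centre/size parameterization from the Faster R-CNN paper (centre shift as a fraction of the box size, log-scaled width and height); the patent does not state which parameterization it uses.

```python
import math

def encode(anchor, box):
    """Regression target (dx, dy, dw, dh) that maps an anchor onto a box."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    ax, ay = anchor[0] + aw / 2, anchor[1] + ah / 2
    bw, bh = box[2] - box[0], box[3] - box[1]
    bx, by = box[0] + bw / 2, box[1] + bh / 2
    return ((bx - ax) / aw, (by - ay) / ah, math.log(bw / aw), math.log(bh / ah))

def decode(anchor, deltas):
    """Apply predicted deltas to an anchor, giving the refined (x1, y1, x2, y2)."""
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    ax, ay = anchor[0] + aw / 2, anchor[1] + ah / 2
    dx, dy, dw, dh = deltas
    cx, cy = ax + dx * aw, ay + dy * ah
    w, h = aw * math.exp(dw), ah * math.exp(dh)
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

anchor = (0.0, 0.0, 100.0, 50.0)
# Shift right by 10 % of the anchor width and double the width.
refined = decode(anchor, (0.1, 0.0, math.log(2.0), 0.0))
print(refined)  # approximately (-40.0, 0.0, 160.0, 50.0)
```

`encode` produces the training targets and `decode` is applied at inference; the log scale keeps width and height positive.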

As a further improvement of the present invention, enumerating second-type rectangular regions with several preset widths and variable heights over the feature map based on CTPN comprises:

Generating a feature map with a network;

Generating the enumerated candidate second-type rectangular regions with a fixed width and variable heights;

Predicting text or non-text for each second-type rectangular region.
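Because each CTPN region has a fixed width, only its vertical extent needs refining. The sketch below assumes the vertical parameterization from the CTPN paper (centre offset relative to anchor height, log-scaled height) and a 0.7 text-score threshold, also taken from that paper. The input lists stand in for network outputs and are hypothetical.

```python
import math

def decode_ctpn(anchor_cy, anchor_h, vc, vh):
    """Recover (y1, y2) from CTPN-style vertical offsets:
    vc = (cy - cy_a) / h_a,  vh = log(h / h_a)."""
    cy = anchor_cy + vc * anchor_h
    h = anchor_h * math.exp(vh)
    return cy - h / 2, cy + h / 2

def text_proposals(x_left, anchors, predictions, width=16, threshold=0.7):
    """Keep fixed-width slices whose text score passes the threshold.

    anchors: list of (cy_a, h_a); predictions: list of (score, vc, vh).
    Returns (x1, y1, x2, y2, score) tuples.
    """
    kept = []
    for (cy_a, h_a), (score, vc, vh) in zip(anchors, predictions):
        if score >= threshold:  # text region; lower scores are non-text
            y1, y2 = decode_ctpn(cy_a, h_a, vc, vh)
            kept.append((x_left, y1, x_left + width, y2, score))
    return kept

print(decode_ctpn(100.0, 20.0, 0.5, 0.0))  # (100.0, 120.0)
```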

As a further improvement of the present invention, merging the first-type text boxes and the second-type text boxes by non-maximum suppression comprises:

Sorting all boxes by score, selecting the highest score and its corresponding box, and marking that box as valid;

Traversing the remaining boxes and deleting any box whose overlap with the current highest-scoring box exceeds a threshold;

Selecting the highest-scoring box among the unprocessed boxes and repeating the above process.
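The three steps above amount to greedy non-maximum suppression applied to the union of both detectors' outputs. The sketch below measures overlap as IoU, as the worked example in the embodiment does. The 0.5 threshold and the helper names are illustrative assumptions, not values from the patent.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, threshold=0.5):
    """Greedy non-maximum suppression; returns indices of the valid boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)  # highest-scoring unprocessed box becomes valid
        keep.append(best)
        # Delete remaining boxes that overlap the valid box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep

def merge_detections(frcnn, ctpn, threshold=0.5):
    """Fuse (box, score) lists from the two detectors with one NMS pass."""
    combined = frcnn + ctpn
    keep = nms([b for b, _ in combined], [s for _, s in combined], threshold)
    return [combined[i] for i in keep]

frcnn = [((10, 10, 110, 40), 0.9)]
ctpn = [((12, 11, 112, 41), 0.95), ((200, 10, 260, 40), 0.8)]
merged = merge_detections(frcnn, ctpn)
print(merged)  # the two CTPN boxes survive; the overlapping Faster RCNN box is dropped
```

Pooling both lists before suppression lets each detector fill the other's gaps. However, it treats the two detectors' scores as directly comparable, which may require calibration in practice.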

The present invention also provides a system for long and short text detection based on deep neural networks, comprising:

An extraction module, configured to select an original image and extract a feature map from it;

A Faster RCNN processing module, configured to use Faster RCNN to enumerate first-type rectangular regions with several preset aspect ratios over the feature map, predict whether each first-type rectangular region is foreground or background, and predict its actual region, to obtain all first-type text boxes;

A CTPN processing module, configured to use CTPN to enumerate second-type rectangular regions with several preset widths and variable heights over the feature map and judge whether each second-type rectangular region is a text or non-text region; if a second-type rectangular region is a text region, it is fed into an RNN to obtain second-type text boxes;

A merging module, configured to merge the first-type text boxes and the second-type text boxes by non-maximum suppression.

Compared with the prior art, the present invention has the following beneficial effects:

The Faster RCNN and CTPN used in the present invention are both deep learning algorithms that perform text detection on the feature map of the original image, so they can be used in a wide range of complex environments;

The present invention combines the advantage of Faster RCNN, which maintains a high recall rate for overlapping targets, with the advantages of CTPN, which maintains good boundary precision for long text and handles text at different scales well. The detections of the two are merged according to certain rules using non-maximum suppression logic, so that both the recall and the precision of text detection are improved.

Description of the drawings

Fig. 1 is a flowchart of the method for long and short text detection based on deep neural networks disclosed in an embodiment of the present invention;

Fig. 2 is a block diagram of the system for long and short text detection based on deep neural networks disclosed in an embodiment of the present invention;

Fig. 3 is a schematic diagram of first-type text boxes disclosed in an embodiment of the present invention;

Fig. 4 is a schematic diagram of the RNN obtaining second-type text boxes, as disclosed in an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.

The present invention is described in further detail below with reference to the accompanying drawings:

As shown in Fig. 1, the invention discloses a method for long and short text detection based on deep neural networks, comprising:

Step 1: select an original image and perform feature extraction on it to obtain a feature map; the original image may be, for example, an image of a bill;

Step 2: use Faster RCNN to enumerate first-type rectangular regions with several preset aspect ratios over the feature map, predict whether each first-type rectangular region is foreground or background, and predict its actual region, to obtain all first-type text boxes, as shown in Fig. 3, where:

The principle of Faster RCNN is as follows:

Regress the precise text region from the candidate regions;
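Before the region proposal network can learn to predict foreground or background, the enumerated anchors are labeled against ground-truth text boxes. The background section describes this as step III of the Faster RCNN pipeline. The sketch assumes the thresholds from the original Faster R-CNN paper: IoU ≥ 0.7 is positive, IoU < 0.3 is negative, anything in between is ignored, and each ground-truth box also claims its single best-matching anchor. The patent does not specify these values.

```python
def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def label_anchors(anchors, gt_boxes, pos_thr=0.7, neg_thr=0.3):
    """1 = foreground (text), 0 = background, -1 = ignored in training."""
    labels = []
    for a in anchors:
        best = max((iou(a, g) for g in gt_boxes), default=0.0)
        if best >= pos_thr:
            labels.append(1)
        elif best < neg_thr:
            labels.append(0)
        else:
            labels.append(-1)
    # Each ground-truth box also claims its best-matching anchor, so a box
    # that no anchor fits well (e.g. a very long text line) still gets one.
    if anchors:
        for g in gt_boxes:
            j = max(range(len(anchors)), key=lambda k: iou(anchors[k], g))
            labels[j] = 1
    return labels

anchors = [(0, 0, 10, 10), (0, 0, 10, 12), (50, 50, 60, 60), (0, 0, 20, 10)]
print(label_anchors(anchors, [(0, 0, 10, 10)]))  # [1, 1, 0, -1]
```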

Step 3: use CTPN to enumerate second-type rectangular regions with several preset widths and variable heights over the feature map, and judge whether each second-type rectangular region is a text or non-text region; if a second-type rectangular region is a text region, feed it into an RNN to obtain second-type text boxes, as shown in Fig. 4, where:

The principle of CTPN is as follows:

Generate a feature map with a network;
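The patent attributes the joining of fine-scale text slices into lines to the RNN. In the original CTPN paper, the BLSTM supplies sequential context for scoring each slice. The slices are then chained into lines by a geometric rule: neighboring proposals are joined when they are close horizontally (under 50 px) and overlap strongly vertically (above 0.7). The sketch below is a simplified, hypothetical version of that chaining rule, not the patent's own procedure. Its vertical-overlap definition (shared height divided by the smaller height) is also an assumption.

```python
def vertical_overlap(a, b):
    """Shared height divided by the smaller of the two heights."""
    inter = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    return inter / min(a[3] - a[1], b[3] - b[1])

def build_text_lines(proposals, max_gap=50, min_v_overlap=0.7):
    """Chain fixed-width (x1, y1, x2, y2) proposals into text-line boxes."""
    lines = []
    for p in sorted(proposals, key=lambda b: b[0]):  # scan left to right
        for line in lines:
            last = line[-1]
            if 0 <= p[0] - last[0] <= max_gap and vertical_overlap(last, p) >= min_v_overlap:
                line.append(p)  # p continues an existing line
                break
        else:
            lines.append([p])  # no line accepts p: start a new one
    # Each text line is the bounding box of its chained slices.
    return [(min(b[0] for b in l), min(b[1] for b in l),
             max(b[2] for b in l), max(b[3] for b in l)) for l in lines]

slices = [(0, 10, 16, 30), (16, 11, 32, 31), (32, 10, 48, 30),
          (200, 10, 216, 30), (16, 100, 32, 120)]
print(build_text_lines(slices))
```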

Step 4: merge the first-type text boxes and the second-type text boxes by non-maximum suppression (NMS), where:

Merging the first-type text boxes and the second-type text boxes by non-maximum suppression comprises:

For example:

Assume that 4 boxes are detected for an object, each with a classification score. Sorted by score in ascending order they are (B1, S1), (B2, S2), (B3, S3), (B4, S4), with S4 > S3 > S2 > S1;

Step 1. Following the score ordering, start from box B4, which has the highest score;

Step 2. Compute the overlap (IoU) of each of B1, B2, and B3 with B4 and check whether it exceeds a preset threshold; if it does, discard that box. Mark the box that is kept: assuming the IoU of B3 with B4 exceeds the threshold, B3 is discarded and B4 is marked as a box to be retained;

Step 3. From the remaining boxes B1 and B2, choose B2, which has the higher score, and compute the IoU of B2 with the remaining box B1; if it exceeds the threshold, discard B1 likewise. Mark B2 as retained;

Repeat the above procedure until all retained boxes have been found.
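The four-box walk-through can be reproduced with a small trace. The box coordinates, scores, and the 0.5 threshold are hypothetical values chosen so that B3 overlaps B4 and B1 overlaps B2, as in the example. The patent gives no coordinates.

```python
def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

# Hypothetical boxes (x1, y1, x2, y2) with scores S1 < S2 < S3 < S4.
boxes = {
    "B1": ((410, 105, 590, 135), 0.60),
    "B2": ((400, 100, 600, 140), 0.70),
    "B3": ((105, 102, 305, 142), 0.80),
    "B4": ((100, 100, 300, 140), 0.95),
}

def nms_trace(boxes, threshold=0.5):
    remaining = sorted(boxes, key=lambda k: boxes[k][1])  # ascending: B1 .. B4
    kept = []
    while remaining:
        best = remaining.pop()  # highest score is at the end
        kept.append(best)
        survivors = []
        for name in remaining:
            overlap = iou(boxes[best][0], boxes[name][0])
            if overlap > threshold:
                print(f"{name} dropped: IoU with {best} = {overlap:.2f}")
            else:
                survivors.append(name)
        remaining = survivors
    return kept

print(nms_trace(boxes))  # ['B4', 'B2']
```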

As shown in Fig. 2, the present invention provides a system for long and short text detection based on deep neural networks, comprising:

An extraction module, configured to select an original image and perform feature extraction on it to obtain a feature map; the original image may be, for example, an image of a bill;

A Faster RCNN processing module, configured to use Faster RCNN to enumerate first-type rectangular regions with several preset aspect ratios over the feature map, predict whether each first-type rectangular region is foreground or background, and predict its actual region, to obtain all first-type text boxes, as shown in Fig. 3, where:

The principle of Faster RCNN is as follows:

Regress the precise text region from the candidate regions;

A CTPN processing module, configured to use CTPN to enumerate second-type rectangular regions with several preset widths and variable heights over the feature map and judge whether each second-type rectangular region is a text or non-text region; if a second-type rectangular region is a text region, it is fed into an RNN to obtain second-type text boxes, as shown in Fig. 4, where:

The principle of CTPN is as follows:

Generate a feature map with a network;

Predict text or non-text for each second-type rectangular region;

A merging module, configured to merge the first-type text boxes and the second-type text boxes by non-maximum suppression (NMS), where:

For example:

Step 1. Following the score ordering, start from box B4;

Repeat the above procedure until all retained boxes have been found.

Further, the text region boxes can be merged in several ways: for example, by NMS (non-maximum suppression), or by merging regions that are completely covered by other regions.
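The patent names "complete coverage" merging as an alternative to NMS but does not define it. One plausible reading, sketched here as an assumption, is to drop every box that lies entirely inside another box. Near-identical boxes keep only their first copy. The 2 px tolerance and the function names are hypothetical.

```python
def contains(outer, inner, tol=0.0):
    """True if `inner` lies inside `outer`, allowing `tol` pixels of slack."""
    return (outer[0] - tol <= inner[0] and outer[1] - tol <= inner[1]
            and inner[2] <= outer[2] + tol and inner[3] <= outer[3] + tol)

def merge_by_containment(boxes, tol=2.0):
    """Drop every (x1, y1, x2, y2) box completely covered by another box."""
    kept = []
    for i, b in enumerate(boxes):
        covered = any(
            # A mutually-containing (near-duplicate) pair keeps the earlier box.
            contains(o, b, tol) and (j < i or not contains(b, o, tol))
            for j, o in enumerate(boxes) if j != i
        )
        if not covered:
            kept.append(b)
    return kept

print(merge_by_containment([(0, 0, 100, 30), (10, 5, 50, 25),
                            (200, 0, 260, 30), (0, 0, 100, 30)]))
```

Unlike NMS, this rule ignores scores. It removes a short box found inside a long line but never suppresses two partially overlapping lines.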

The beneficial effects of the present invention are as follows:

The Faster RCNN and CTPN used in the present invention are both deep learning algorithms that perform text detection on the feature map of the original image, so they can be used in a wide range of complex environments. The present invention combines the advantage of Faster RCNN, which maintains a high recall rate for overlapping targets, with the advantages of CTPN, which maintains good boundary precision for long text and handles text at different scales well. The detections of the two are merged according to certain rules using non-maximum suppression logic, so that both the recall and the precision of text detection are improved.

The above are only preferred embodiments of the present invention and are not intended to limit it; for those skilled in the art, the invention may be modified and varied in many ways. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present invention shall fall within its protection scope.

Claims

1. A method for long and short text detection based on deep neural networks, characterized by comprising:

Using Faster RCNN to enumerate first-type rectangular regions with several preset aspect ratios over the feature map, predicting whether each first-type rectangular region is foreground or background, and predicting its actual region, to obtain all first-type text boxes;

Using CTPN to enumerate second-type rectangular regions with several preset widths and variable heights over the feature map, and judging whether each second-type rectangular region is a text or non-text region; if a second-type rectangular region is a text region, feeding it into an RNN to obtain second-type text boxes;

2. The method for long and short text detection based on deep neural networks according to claim 1, characterized in that enumerating first-type rectangular regions with several preset aspect ratios over the feature map based on Faster RCNN comprises:

Regressing the precise text region from the candidate regions.

3. The method for long and short text detection based on deep neural networks according to claim 1, characterized in that enumerating second-type rectangular regions with several preset widths and variable heights over the feature map based on CTPN comprises:

Generating a feature map with a network;

4. The method for long and short text detection based on deep neural networks according to claim 1, characterized in that merging the first-type text boxes and the second-type text boxes by non-maximum suppression comprises:

5. A system for long and short text detection based on deep neural networks, characterized by comprising:

A Faster RCNN processing module, configured to use Faster RCNN to enumerate first-type rectangular regions with several preset aspect ratios over the feature map, predict whether each first-type rectangular region is foreground or background, and predict its actual region, to obtain all first-type text boxes;

A CTPN processing module, configured to use CTPN to enumerate second-type rectangular regions with several preset widths and variable heights over the feature map and judge whether each second-type rectangular region is a text or non-text region; if a second-type rectangular region is a text region, it is fed into an RNN to obtain second-type text boxes;

6. The system for long and short text detection based on deep neural networks according to claim 5, characterized in that enumerating first-type rectangular regions with several preset aspect ratios over the feature map based on Faster RCNN comprises:

Regressing the precise text region from the candidate regions.

7. The system for long and short text detection based on deep neural networks according to claim 5, characterized in that enumerating second-type rectangular regions with several preset widths and variable heights over the feature map based on CTPN comprises:

Generating a feature map with a network;

8. The system for long and short text detection based on deep neural networks according to claim 5, characterized in that merging the first-type text boxes and the second-type text boxes by non-maximum suppression comprises: