CN108694393A

CN108694393A - A certificate image text region extraction method based on deep convolution

Info

Publication number: CN108694393A
Application number: CN201810536528.9A
Authority: CN
Inventors: 屈鸿; 石鑫; 黄鹂; 汪文; 汪一文
Original assignee: Shenzhen Di Di Information Technology Ltd By Share Ltd
Current assignee: Shenzhen Di Di Information Technology Ltd By Share Ltd
Priority date: 2018-05-30
Filing date: 2018-05-30
Publication date: 2018-10-23

Abstract

The invention discloses a certificate image text region extraction method based on deep convolution. It relates to the field of image processing, and in particular to a certificate image text region extraction method based on deep convolution, and addresses the problem that prior-art image recognition techniques cannot locate text regions across multiple certificate types or across multiple certificate images with irregular layouts. The method comprises: preprocessing the target image and applying horizontal correction; using a deep convolutional neural network to accurately locate horizontal text line images in the corrected image; and extracting and outputting the located text line images. With this method, text regions can be accurately located in multiple certificate types, in certificates with irregular layouts, and in multiple certificate images whose layouts are inconsistent, with the advantages of accurate localization and good real-time performance.

Description

A certificate image text region extraction method based on deep convolution

Technical field

The present invention relates to the field of image processing, and in particular to a certificate image text region extraction method based on deep convolution.

Background art

As one of the final steps of certificate recognition, text region extraction from a certificate image extracts the text in the image so that characters can subsequently be extracted and recognized, allowing the text information on the certificate image to be obtained more accurately.

Text region extraction from a certificate image means accurately locating the text regions in an image that contains a certificate and accurately extracting the text based on that localization. Text region extraction generally takes one of two approaches: locating and analyzing the text in the image directly, or first correcting the image and then locating text regions in the corrected image.

Traditional optical character recognition (OCR) is generally divided into four modules: image preprocessing, text region detection, character segmentation, and character recognition. Image preprocessing mainly performs image enhancement, noise reduction, and correction. As the first step of image processing, it can significantly improve the accuracy of text region extraction and recognition. Text region detection takes two forms. The first is layout analysis: the layout of a specific certificate type is analyzed so that text regions can be located directly on the certificate image. Layout analysis requires the layout of the certificate image to be regular; the certificate target image is extracted first, and text is then located by relative position alone, so accuracy depends on how well the layout is aligned and how correctly the target image was extracted. The second is neural network learning, which selects text regions directly from the image. This approach is not affected by layout, does not require the target image to be extracted, and locates and extracts text regions directly from the image.

Deep convolutional neural networks are applied to OCR in several forms: text regions are extracted by some other method and the character images are then recognized with a deep convolutional neural network; text regions are extracted with a deep convolutional neural network and then recognized; or an end-to-end deep learning system is designed that recognizes the text regions in the image directly and returns the recognition result. The end-to-end system is the best option in principle, but it is also the hardest to implement, and reaching the desired accuracy is difficult. In recent years, image recognition research within OCR has concentrated on locating text regions in images. Methods such as Faster R-CNN and YOLO can quickly locate and recognize targets in an image, and many methods for extracting and recognizing text regions in natural scenes have been built on and improved from them, with good results. Since Faster R-CNN and YOLO were proposed, text region detection in natural scenes has also attracted attention, and a succession of methods based on deep convolutional neural networks have reached a detection rate of about 85% for text in natural scenes.

Traditional layout analysis locates text regions with high accuracy, but it depends on the type of certificate image: layout analysis must be carried out separately for each type of image, the image must be regular, and layout analysis cannot be performed at all on some irregular certificates. Text region localization based on deep convolutional neural networks can locate text intelligently, but the localization precision learned by the network is insufficient and, most importantly, models based on convolutional neural networks require a large amount of computation, so recognition and localization take a long time and real-time performance is difficult to achieve.

Summary of the invention

The object of the invention is as follows: to solve the problem that prior-art image recognition techniques cannot locate text regions across multiple certificate types or across multiple certificate images with irregular layouts, the present invention provides a certificate image text region extraction method based on deep convolution. A deep convolutional neural network is used to accurately locate horizontal text line images in the preprocessed target image, so that the text in the image is located accurately and with good real-time performance.

The technical solution adopted by the present invention is as follows:

A certificate image text region extraction method based on deep convolution, comprising the following steps:

S1: Preprocess the target image, and apply horizontal correction to the preprocessed target image to obtain a horizontal image;

S2: Feed the horizontal image obtained in S1 into a network model based on a convolutional neural network for localization, and obtain output images in units of text lines.

Further, S1 specifically comprises:

S101: Preprocess the target image to obtain image contours, connect the contours using a morphological closing operation, and extract them with a contour-finding algorithm;

S102: Screen the contours extracted in S101 by geometric proportion and select the target certificate contour;

S103: Based on the target certificate contour selected in S102, rotate the image according to the tilt and position of the contour;

S104: Detect the corner points of the image rotated in S103 using the Hough transform, and correct the image with a perspective transform to obtain the horizontal image.
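
As a non-limiting illustration of the rotation in S103, the following minimal sketch estimates the tilt of a contour and levels it. It assumes the target contour has already been reduced to four corner points, which is a hypothetical representation that the description does not specify; the function names `tilt_angle` and `rotate_points` are likewise illustrative.

```python
import math

def tilt_angle(top_left, top_right):
    """Angle in degrees of the top edge relative to the horizontal axis."""
    dx = top_right[0] - top_left[0]
    dy = top_right[1] - top_left[1]
    return math.degrees(math.atan2(dy, dx))

def rotate_points(points, angle_deg):
    """Rotate points by -angle_deg about their centroid to level the edge."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    theta = math.radians(-angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    rotated = []
    for x, y in points:
        # Standard 2D rotation of the offset from the centroid.
        rx = cx + (x - cx) * cos_t - (y - cy) * sin_t
        ry = cy + (x - cx) * sin_t + (y - cy) * cos_t
        rotated.append((rx, ry))
    return rotated

# Hypothetical corners of a rectangle tilted by 45 degrees.
corners = [(0, 0), (4, 4), (2, 6), (-2, 2)]
angle = tilt_angle(corners[0], corners[1])
leveled = rotate_points(corners, angle)
```

After the rotation, the first two points share the same vertical coordinate, which is the condition the later perspective correction in S104 starts from. A full implementation would rotate the image pixels rather than only the corner points.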

Further, the preprocessing uses one of the following two methods:

Method 1: First apply Gaussian blur to the target image for noise reduction, then convert it to grayscale, and apply Sobel edge detection to the grayscale image to obtain the image contours;

Method 2: First sharpen the target image to enhance its details, then apply Canny edge detection to obtain the image contours.
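
The edge-detection step of Method 1 can be sketched as follows. This is a minimal pure-Python illustration that assumes a 3 × 3 Sobel kernel and an input that is already blurred and converted to grayscale; the kernel size and the function name `sobel_magnitude` are assumptions made for the example, not values taken from this section.

```python
import math

def sobel_magnitude(gray):
    """Return the 3x3 Sobel gradient magnitude of a 2D list of intensities.

    Border pixels are left at 0.0 because the kernel does not fit there.
    """
    kx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # horizontal gradient kernel
    ky = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # vertical gradient kernel
    rows, cols = len(gray), len(gray[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for i in range(3):
                for j in range(3):
                    v = gray[r + i - 1][c + j - 1]
                    gx += kx[i][j] * v
                    gy += ky[i][j] * v
            out[r][c] = math.hypot(gx, gy)
    return out

# Hypothetical 5x5 image with a vertical edge between columns 1 and 2.
image = [[0, 0, 255, 255, 255] for _ in range(5)]
edges = sobel_magnitude(image)
```

The response is large next to the intensity step and zero in the flat region, which is what allows the contours to be picked out in the subsequent closing and contour-finding steps.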

Further, the screening criterion is to select the target certificate contour according to the features of the specific certificate and the standard aspect ratio of the specific certificate.

Further, S2 specifically comprises:

S201: Perform convolutional feature extraction on the horizontal image obtained in S1, using the VGG16 deep convolutional model as the base network. The features of the 3rd sublayer of the 5th convolutional layer of VGG are taken as the feature map, of size W × H × C, where W and H are respectively the width and height of the input image and C is the number of convolution kernels; only the data generated by the fifth layer of VGG16 is used here;

S202: Convolve the feature map obtained in S201 with 512 sliding windows of size 3 × 3; the convolution at each point yields a feature vector of size 512;

S203: For each window center, generate 10 text boxes with heights from 13 to 273, and for each text box use the feature vector extracted in S202 to compute the probability that the region is text;

S204: Use the feature vectors obtained in S202 as the input of a bidirectional LSTM whose output is specified to have length W × 256, and connect a fully connected layer of size 512 after the bidirectional LSTM to produce the output result;

S205: Integrate the output results with a text line construction algorithm: text boxes that are sequential or adjacent to one another are merged, redundant text boxes are filtered out by non-maximum suppression, and the text lines obtained after merging are the text lines of the target image.
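
The merging of adjacent text boxes in S205 can be illustrated with the following sketch. The description does not give the text line construction algorithm in detail, so the greedy left-to-right chaining, the horizontal gap limit `max_gap`, and the vertical overlap threshold `min_overlap` are all assumptions chosen for the example.

```python
def vertical_overlap(a, b):
    """Overlap ratio of the vertical extents of two (x1, y1, x2, y2) boxes."""
    inter = min(a[3], b[3]) - max(a[1], b[1])
    if inter <= 0:
        return 0.0
    return inter / min(a[3] - a[1], b[3] - b[1])

def build_text_lines(boxes, max_gap=50, min_overlap=0.7):
    """Greedily chain boxes left to right and return one rectangle per chain."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[0]):
        for line in lines:
            last = line[-1]
            close = box[0] - last[2] <= max_gap
            if close and vertical_overlap(last, box) >= min_overlap:
                line.append(box)
                break
        else:
            # No existing chain accepts this box, so it starts a new one.
            lines.append([box])
    return [
        (min(b[0] for b in line), min(b[1] for b in line),
         max(b[2] for b in line), max(b[3] for b in line))
        for line in lines
    ]

# Hypothetical text boxes: three on one row, one on a lower row, one far right.
boxes = [(0, 10, 16, 30), (16, 11, 32, 31), (32, 10, 48, 30),
         (0, 100, 16, 120), (200, 10, 216, 30)]
text_lines = build_text_lines(boxes)
```

Boxes on the same row that touch or nearly touch are merged into one rectangle, while the box on a different row and the distant box each form their own text line.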

Further, the output result comprises the text box position, the feature judgment of the text box, and prediction information for adjusting the endpoint positions of the text line; each text box is represented by two values, its center position and the height of its rectangle.
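
The two-value representation of a text box can be made concrete with a short sketch. It assumes that the 10 heights between 13 and 273 are spaced geometrically and that every box has a fixed width of 16 pixels; neither assumption is stated in the description, and the helper names are hypothetical.

```python
def anchor_heights(min_h=13.0, max_h=273.0, count=10):
    """Return `count` heights spaced geometrically from min_h to max_h."""
    ratio = (max_h / min_h) ** (1.0 / (count - 1))
    return [min_h * ratio ** i for i in range(count)]

def box_from_center(x_left, center_y, height, width=16):
    """Convert (center_y, height) into (x1, y1, x2, y2) for a fixed-width box."""
    half = height / 2.0
    return (x_left, center_y - half, x_left + width, center_y + half)

heights = anchor_heights()
box = box_from_center(32, 100, 20)
```

Because the width and horizontal position are fixed by the window location, only the vertical center and the height need to be predicted for each box.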

Further, the feature judgment of a text box consists of two values: one is the probability that the box is judged to be text, and the other is the probability that it is judged to be non-text.
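
One common way to obtain such a pair of probabilities from two raw network scores is a softmax. The description does not state how the two values are computed, so the following is only an assumed illustration, and `text_probabilities` is a hypothetical name.

```python
import math

def text_probabilities(text_score, non_text_score):
    """Softmax over two raw scores; returns (p_text, p_non_text)."""
    m = max(text_score, non_text_score)  # subtract the max for numerical stability
    e_text = math.exp(text_score - m)
    e_non = math.exp(non_text_score - m)
    total = e_text + e_non
    return e_text / total, e_non / total

p_text, p_non_text = text_probabilities(2.0, 0.0)
```

The two outputs always sum to one, so a box can be kept as text whenever the first value exceeds a chosen threshold.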

In summary, by adopting the above technical solution, the beneficial effects of the invention are:

1. The present invention uses a certificate image text region extraction method based on deep convolution, which can accurately locate text regions in multiple certificate types, in certificates with irregular layouts, and in multiple certificate images whose layouts are inconsistent.

2. When computing on a GPU, the present invention can locate and extract text within 1 s; recognition is fast and real-time performance is good.

3. The present invention overcomes the weakness of traditional layout analysis: text regions can also be located in certificate images for which layout analysis is not possible and in certificate images of multiple types.

4. By applying image preprocessing and horizontal correction, the present invention filters out interference caused by the photographing environment, such as illumination, viewing-angle changes, and rotation, so that text regions can be located accurately.

Brief description of the drawings

The present invention is described by way of example with reference to the accompanying drawings, in which:

Fig. 1 is an overall flow diagram of the present invention;

Fig. 2 is a diagram of the deep-convolution-based neural network model of the present invention;

Fig. 3 shows the results of the preprocessing of the present invention; the left side is the result of Sobel operator processing and the right side is the result of Canny operator processing;

Fig. 4 shows the effect of the image rotation correction of the present invention;

Fig. 5 shows the localization result on an identity card image produced by the text region localization model of the present invention based on a deep convolutional neural network.

Detailed description of the embodiments

To help those skilled in the art better understand the invention, the present invention is described in detail below with reference to the accompanying drawings and embodiments.

As a preferred embodiment, S1 specifically comprises:

S101: Preprocess the target image shown on the left of Fig. 4 to obtain image contours, connect the contours using a morphological closing operation, and extract them with a contour-finding algorithm;

S102: Screen the contours extracted in S101 by geometric proportion and select the target certificate contour;

S103: Based on the target certificate contour selected in S102, rotate the image according to the tilt and position of the contour;

S104: Detect the corner points of the image rotated in S103 using the Hough transform, and correct the image with a perspective transform to obtain the horizontal image shown on the right of Fig. 4.

As a preferred embodiment, the preprocessing uses one of the following two methods:

Method 1: First apply Gaussian blur to the target image for noise reduction, then convert it to grayscale, and apply Sobel edge detection to the grayscale image to obtain the image contours. The contours obtained with the Sobel operator are shown on the left of Fig. 3; the Sobel operator size used is 5;

Method 2: First sharpen the target image to enhance its details, then apply Canny edge detection to obtain the image contours. The contours obtained with the Canny operator are shown on the right of Fig. 3; the Canny operator size used is 3, and the upper and lower thresholds are 89 and 40 respectively.
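
The role of the upper and lower thresholds in Method 2 can be illustrated with a simplified sketch of the double-threshold (hysteresis) stage of Canny edge detection. It operates on a precomputed gradient magnitude grid and omits smoothing and non-maximum suppression; the function name `hysteresis` and the use of inclusive comparisons are assumptions of this example rather than details of any particular library.

```python
from collections import deque

def hysteresis(magnitude, low=40, high=89):
    """Keep strong pixels (>= high) and weak pixels (>= low) that are
    8-connected to a strong pixel; returns a 2D list of 0/1 values."""
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[0] * cols for _ in range(rows)]
    queue = deque()
    # Seed the search with every strong pixel.
    for r in range(rows):
        for c in range(cols):
            if magnitude[r][c] >= high:
                edges[r][c] = 1
                queue.append((r, c))
    # Grow edges into neighbouring weak pixels.
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and not edges[nr][nc] and magnitude[nr][nc] >= low):
                    edges[nr][nc] = 1
                    queue.append((nr, nc))
    return edges

# Hypothetical gradient magnitudes: one strong pixel, two weak, one faint.
grid = [[100, 50, 10, 50],
        [0, 0, 0, 0]]
result = hysteresis(grid)
```

The weak pixel next to the strong one is kept, while the isolated weak pixel is discarded, which is how the lower threshold extends contours without admitting noise.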

As a preferred embodiment, the screening in S102 is performed as follows: the extracted contours are screened against the aspect ratio of the identity card itself, keeping those closest to 108:66, together with the features of the identity card, to select the identity card contour. The aspect ratio provides a coarse screening, after which an SVM classifier trained to classify whether an image is an identity card image makes the final selection.
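
The coarse aspect-ratio screening can be sketched as follows. The 108:66 target comes from the embodiment above; the relative tolerance of 0.15, the bounding-box representation `(x, y, w, h)`, and the function names are assumptions made for illustration, and the subsequent SVM stage is not shown.

```python
def passes_aspect_ratio(width, height, target=108 / 66, tolerance=0.15):
    """Coarse check: is the long/short side ratio close to the target?"""
    if width <= 0 or height <= 0:
        return False
    ratio = max(width, height) / min(width, height)
    return abs(ratio - target) / target <= tolerance

def coarse_screen(boxes, **kwargs):
    """Keep bounding boxes (x, y, w, h) whose aspect ratio passes the check."""
    return [b for b in boxes if passes_aspect_ratio(b[2], b[3], **kwargs)]

# Hypothetical candidate contours given as bounding boxes.
candidates = [(0, 0, 540, 330), (0, 0, 100, 100),
              (0, 0, 330, 540), (0, 0, 400, 50)]
kept = coarse_screen(candidates)
```

Using the ratio of the longer side to the shorter side lets the check accept a card photographed in either orientation, leaving the classifier to decide among the few candidates that remain.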

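As a preferred embodiment, in S2 a text region localization model based on a deep convolutional neural network is built as shown in Fig. 2 and used to locate and analyze the text of the target image, specifically as follows:

S201: Perform convolutional feature extraction on the horizontal image obtained in S1, using the VGG16 deep convolutional model as the base network. The features of the 3rd sublayer of the 5th convolutional layer of VGG are taken as the feature map, of size W × H × C, where W and H are respectively the width and height of the input image and C is the number of convolution kernels; only the data generated by the fifth layer of VGG16 is used here;

S202: Convolve the feature map obtained in S201 with 512 sliding windows of size 3 × 3; the convolution at each point yields a feature vector of size 512;

S203: For each window center, generate 10 text boxes with heights from 13 to 273, and for each text box use the feature vector extracted in S202 to compute the probability that the region is text;

S204: Use the feature vectors obtained in S202 as the input of a bidirectional LSTM whose output is specified to have length W × 256, and connect a fully connected layer of size 512 after the bidirectional LSTM to produce the output result;

S205: Integrate the output results with a text line construction algorithm: text boxes that are sequential or adjacent to one another are merged, and redundant text boxes are filtered out by non-maximum suppression. The text lines finally obtained, shown in Fig. 5, are the text lines of the target image.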
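
The non-maximum suppression used to filter redundant text boxes can be illustrated with a minimal sketch. The IoU threshold of 0.3 and the `(x1, y1, x2, y2)` box format are assumptions for the example; the description does not give these values.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.3):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap any already-kept box too much.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep

# Hypothetical detections: two heavily overlapping boxes and one separate box.
detections = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
confidences = [0.9, 0.8, 0.7]
kept_indices = non_max_suppression(detections, confidences)
```

Of the two overlapping boxes only the higher-scoring one survives, while the separate box is unaffected.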

As a preferred embodiment, the output result comprises the text box position, the feature judgment of the text box, and prediction information for adjusting the endpoint positions of the text line; each text box is represented by two values, its center position and the height of its rectangle.

As a preferred embodiment, the feature judgment of a text box consists of two values: one is the probability that the box is judged to be text, and the other is the probability that it is judged to be non-text.

The above is only a preferred embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several modifications and improvements without departing from the inventive concept, and these all fall within the scope of protection of the invention.

Claims

1. A certificate image text region extraction method based on deep convolution, characterized by comprising the following steps:

S1: Preprocess the target image, and apply horizontal correction to the preprocessed target image to obtain a horizontal image;

S2: Feed the horizontal image obtained in S1 into a network model based on a convolutional neural network for localization, and obtain output images in units of text lines.

2. The certificate image text region extraction method based on deep convolution according to claim 1, characterized in that S1 specifically comprises:

S101: Preprocess the target image to obtain image contours, connect the contours using a morphological closing operation, and extract them with a contour-finding algorithm;

S102: Screen the contours extracted in S101 by geometric proportion and select the target certificate contour;

S103: Based on the target certificate contour selected in S102, rotate the image according to the tilt and position of the contour;

S104: Detect the corner points of the image rotated in S103 using the Hough transform, and correct the image with a perspective transform to obtain the horizontal image.

3. The certificate image text region extraction method based on deep convolution according to claim 2, characterized in that the preprocessing uses one of the following two methods:

Method 1: First apply Gaussian blur to the target image for noise reduction, then convert it to grayscale, and apply Sobel edge detection to the grayscale image to obtain the image contours;

Method 2: First sharpen the target image to enhance its details, then apply Canny edge detection to obtain the image contours.

4. The certificate image text region extraction method based on deep convolution according to claim 2, characterized in that the screening criterion is to select the target certificate contour according to the features of the specific certificate and the standard aspect ratio of the specific certificate.

5. The certificate image text region extraction method based on deep convolution according to claim 1, characterized in that S2 specifically comprises:

S201: Perform convolutional feature extraction on the horizontal image obtained in S1, using the VGG16 deep convolutional model as the base network; the features of the 3rd sublayer of the 5th convolutional layer of VGG are taken as the feature map, of size W × H × C, where W and H are respectively the width and height of the input image and C is the number of convolution kernels;

S202: Convolve the feature map obtained in S201 with 512 sliding windows of size 3 × 3; the convolution at each point yields a feature vector of size 512;

S203: For each window center, generate 10 text boxes with heights from 13 to 273, and for each text box use the feature vector extracted in S202 to compute the probability that the region is text;

S204: Use the feature vectors obtained in S202 as the input of a bidirectional LSTM whose output is specified to have length W × 256, and connect a fully connected layer of size 512 after the bidirectional LSTM to produce the output result;

S205: Integrate the output results with a text line construction algorithm: text boxes that are sequential or adjacent to one another are merged, redundant text boxes are filtered out by non-maximum suppression, and the text lines obtained after merging are the text lines of the target image.

6. The certificate image text region extraction method based on deep convolution according to claim 5, characterized in that the output result comprises the text box position, the feature judgment of the text box, and prediction information for adjusting the endpoint positions of the text line, each text box being represented by two values, its center position and the height of its rectangle.

7. The certificate image text region extraction method based on deep convolution according to claim 6, characterized in that the feature judgment of a text box consists of two values: one is the probability that the box is judged to be text, and the other is the probability that it is judged to be non-text.