CN112528954A - Certificate image character extraction method - Google Patents

Certificate image character extraction method

Info

Publication number
CN112528954A
Authority
CN
China
Prior art keywords
character
certificate
image
text
marking
Prior art date
Legal status
Pending
Application number
CN202011564026.0A
Other languages
Chinese (zh)
Inventor
吴志雄
白丹
周兴杰
冯智辉
Current Assignee
Shenzhen Taiji Yun Soft Technology Co ltd
Original Assignee
Shenzhen Taiji Yun Soft Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Taiji Yun Soft Technology Co ltd filed Critical Shenzhen Taiji Yun Soft Technology Co ltd
Priority to CN202011564026.0A priority Critical patent/CN112528954A/en
Publication of CN112528954A publication Critical patent/CN112528954A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/40 Document-oriented image-based pattern recognition
    • G06V 30/41 Analysis of document content
    • G06V 30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/24 Aligning, centring, orientation detection or correction of the image
    • G06V 10/243 Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Character Discrimination (AREA)

Abstract

The invention discloses a certificate image character extraction method, which comprises the following steps: S1, inputting a certificate image; S2, detecting the character positions in the certificate image through a character detection model, and marking each character position with a marking frame; S3, counting the distribution of the marking-frame positions in the certificate image, judging the image direction and adjusting it; S4, establishing plane coordinates, sorting and merging the marking frames located in the same row according to their Y-axis coordinates, and acquiring an information frame for each row of characters; S5, aligning a standard template with the information frames, outputting the intersection of the information frames and the standard template, and cropping and outputting character pictures; and S6, recognizing the character pictures with a character recognition model and extracting the character content.

Description

Certificate image character extraction method
Technical Field
The invention relates to the field of image processing and character recognition, in particular to a certificate image character extraction method.
Background
Recognizing the characters in a certificate image is very common and important in many scenes. In financial scenes such as remote account opening, online lending and payment verification, information such as the name, address and identity card number on a user's identity card needs to be recognized to check that the person and the certificate match. Law enforcement by the administration for industry and commerce usually needs to recognize the enterprise name, legal representative and unified social credit code on a business license and to check whether this important information is consistent with the records in the administration's database, so as to ensure the legitimacy of the enterprise. In traffic enforcement, vehicle management and similar scenes, information such as the licence number, validity period and vehicle identification code in a driving licence or vehicle licence needs to be recognized.
In the prior art, the certificate character recognition method and device of CN108549881A cannot extract structured data: not all character information on a certificate is useful, yet the method does not distinguish between the items of information on the certificate, which increases the computation required for character recognition and leaves the extracted character information disordered.
Recognizing the key character information on a certificate therefore becomes a problem to be solved urgently.
Disclosure of Invention
The invention aims to provide a certificate image character extraction method which has strong anti-interference capability and can screen key character information in a targeted manner.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A certificate image character extraction method comprises the following steps:
S1, inputting a certificate image;
S2, detecting the character positions in the certificate image through the character detection model, and marking each character position with a marking frame;
S3, counting the distribution of the marking-frame positions in the certificate image, judging the image direction and adjusting it;
S4, establishing plane coordinates, sorting and merging the marking frames located in the same row according to their Y-axis coordinates, and acquiring an information frame for each row of characters;
S5, aligning the standard template with the information frames, outputting the intersection of the information frames and the standard template, and cropping and outputting character pictures;
and S6, recognizing the character pictures with the character recognition model and extracting the character content.
Further, the step S1 is preceded by the steps of:
s0a, establishing a character detection model and a character recognition model through a convolutional neural network;
s0b, manufacturing a standard template of the certificate character distribution structure;
wherein, the steps S0a and S0b are not in sequence.
Further, the step S0a specifically includes:
collecting a large amount of certificate image materials, marking character positions and character contents, and synthesizing a large amount of certificate images according to certificate styles;
mixing actual data and synthetic data to form a training set, and forming a test set by the actual data;
establishing a character detection model and a character recognition model by using a convolutional neural network;
and training the model through a deep learning network and adjusting and optimizing parameters to fit the data.
Further, the convolutional neural network used by the character detection model is a PSENet network.
Further, the convolutional neural network used by the character recognition model is a CRNN network.
Further, the step S3 specifically includes the following sub-steps:
s3a, counting the position distribution of the text boxes, and comparing the distribution of the text boxes with a standard template;
and S3b, if the distribution of the character frames is the same as the distribution of the marking template, judging the character frames to be in a positive direction, if the distribution of the character frames is opposite to the distribution of the marking template, judging the character frames to be in an inverted direction, and rotating the image by 180 degrees.
Further, the types of the certificates are identity cards and social security cards.
Further, the method also includes step S7: checking the character content and outputting it.
Further, the step S7 specifically includes the following sub-steps:
S7a, establishing an information base of ethnic-group information and identity card number codes;
S7b, checking the ethnic-group information in the character content;
S7c, verifying the identity card number information in the character content;
and S7d, outputting the character content.
Further, the certificate image character extraction method is used in a high-speed photographing instrument (document camera), a window government-affairs input instrument or a government-affairs all-in-one machine.
A terminal device comprises a processor and a memory, wherein the memory is used for storing program code and the processor is used for executing the certificate image character extraction method.
By applying the technical scheme of the invention, a large number of training and test samples are produced: the training set consists of actual certificate image materials, manually marked with character positions and character contents, mixed with certificate images synthesized according to the certificate styles, and is used to build the convolutional neural network models; the test set consists of actual certificate images and is used to test and optimize the deep-learning models. A preliminary character detection model and character recognition model are built with a PSENet network and a CRNN network from the training set, and the models are then trained and their parameters tuned to fit the data, improving the accuracy and anti-interference capability of character detection and character recognition.
Because the character positions on a certificate follow a standard layout, standard templates of the character distribution structure are built for the different certificate types. The characters in the certificate image are first detected with the character detection model, the certificate image is aligned with the standard template, the character recognition range is screened directionally through the template, redundant characters are removed, and key character information is recognized in a targeted manner; the character information inside the template is confirmed through the standard template, and the character content on the certificate is classified, providing a basis for subsequent verification.
Finally, databases are established for the different certificate contents; the characters on the certificate image are classified through the standard template, the different types of character content are verified, the character recognition results are calibrated, and the accuracy of detecting, recognizing and outputting the characters on the certificate image is improved.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The present invention will be described in detail below with reference to the accompanying drawings so that the above advantages of the present invention will be more apparent.
FIG. 1 is a flow chart of the certificate image character extraction method of the present invention;
FIG. 2 is a flow chart of the models of the certificate image character extraction method of the present invention;
FIG. 3 is a schematic diagram of the identity card standard template of the certificate image character extraction method of the present invention;
FIG. 4 is a schematic diagram of an information frame of the certificate image character extraction method of the present invention;
FIG. 5 is a schematic diagram of standard template alignment of the certificate image character extraction method of the present invention;
FIG. 6 is a schematic diagram of a cropped character picture of the certificate image character extraction method of the present invention;
FIG. 7 is a schematic diagram of social security card character annotation of the certificate image character extraction method of the present invention;
FIG. 8 is a schematic diagram of social security card character recognition of the certificate image character extraction method of the present invention.
Detailed Description
In order to make the objects, features and advantages of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the embodiments described below are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIGS. 1-6, a certificate image character extraction method comprises the following steps:
S1, inputting a certificate image and adjusting its resolution while keeping the length-width ratio unchanged; the resolution is reduced to shorten the computation time of the models, while remaining high enough that recognition accuracy is not affected, so the pixel count is lowered as far as possible to reduce the computation load and increase speed. For example, if the input image has a resolution of 1920 × 1080 and the long edge is scaled to 640 pixels, the scaling coefficient is k = 1920 / 640 = 3 and the short edge becomes 1080 / 3 = 360 pixels (a scaling sketch is given after step S6 below);
S2, detecting the character positions in the certificate image through the character detection model, and marking each character position with a marking frame;
S3, counting the distribution of the marking-frame positions in the certificate image, judging the image direction and adjusting it;
S4, establishing plane coordinates, sorting and merging the marking frames located in the same row according to their Y-axis coordinates, and acquiring an information frame for each row of characters;
S5, aligning the standard template with the information frames and outputting the intersection of the information frames and the standard template; a threshold, for example 0.5, is set by counting the intersection-over-union of most of the information frames to be detected with the template, and the character pictures are cropped and output according to this threshold;
and S6, adjusting the size of the character pictures, recognizing them with the character recognition model, and extracting the character content.
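The following is a minimal illustrative sketch of the step-S1 scaling, assuming OpenCV is available and a target long edge of 640 pixels as in the example above; it is not claimed as the exact implementation of the invention.

```python
import cv2

def resize_long_edge(image, target_long=640):
    """Scale the certificate image so that its long edge equals target_long
    pixels while keeping the length-width ratio unchanged, as in step S1
    (e.g. 1920x1080 with k = 1920/640 = 3 becomes 640x360)."""
    h, w = image.shape[:2]
    k = max(h, w) / float(target_long)            # scaling coefficient k
    new_w, new_h = int(round(w / k)), int(round(h / k))
    return cv2.resize(image, (new_w, new_h), interpolation=cv2.INTER_AREA)
```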
The method is intended for certificate images whose redundant edges have already been trimmed, i.e. the input certificate image has no border line or only a narrow one.
After the preliminary preparation is finished, the size of the input image is finely adjusted so that it matches the size of the standard template. Character detection is then carried out: all characters on the certificate image are located with the trained character detection model and labelled with marking frames. Next, whether the image direction needs to be adjusted is judged from the position distribution of the marking frames. After the direction and size of the image have been adjusted, the marking frames of each line on the certificate are merged and connected so that the character content of a line stays coherent, forming an information frame (a merging sketch is given below). The standard template is then aligned with the certificate image, the content of the information frames is screened through the template, and the key content at specific positions is extracted and cropped. Character recognition is then performed on the cropped character pictures. In this way the character recognition range is screened directionally through the template, redundant characters are removed, key character information is recognized in a targeted manner, and the character information inside the template can be confirmed through the standard template.
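A minimal sketch of the row merging of step S4, assuming marking frames are given as (x1, y1, x2, y2) tuples and that a fixed pixel tolerance on the Y-axis centre decides whether two frames belong to the same row; the tolerance value is an assumption, not a figure from the patent.

```python
def merge_into_lines(boxes, y_tolerance=10):
    """Group marking frames whose Y-axis centres are close, keep each group
    sorted left-to-right, and merge it into one line information frame."""
    lines = []
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2.0):
        cy = (box[1] + box[3]) / 2.0
        for line in lines:
            line_cy = (line["y1"] + line["y2"]) / 2.0
            if abs(cy - line_cy) <= y_tolerance:          # same row
                line["boxes"].append(box)
                line["x1"] = min(line["x1"], box[0]); line["y1"] = min(line["y1"], box[1])
                line["x2"] = max(line["x2"], box[2]); line["y2"] = max(line["y2"], box[3])
                break
        else:                                             # start a new row
            lines.append({"boxes": [box], "x1": box[0], "y1": box[1],
                          "x2": box[2], "y2": box[3]})
    for line in lines:
        line["boxes"].sort(key=lambda b: b[0])            # left-to-right order
    return lines
```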
In this embodiment, before the step S1, the method further includes the steps of:
s0a, establishing a character detection model and a character recognition model through a convolutional neural network;
s0b, manufacturing a standard template of the certificate character distribution structure;
wherein steps S0a and S0b may be performed in either order. Two parts are prepared in the preliminary stage. The first part is to establish the character detection model and the character recognition model from a large number of training and test samples, combining a convolutional neural network with deep-learning training, which improves the accuracy and anti-interference capability of the models; the character detection model is used to detect the positions of characters, and the character recognition model is used to recognize them.
The other part is to exploit the standardized nature of certificates and build a standard template of the character distribution for each certificate type. The template is used to screen the character content, reducing the amount of redundant text recognized, lowering the workload and realizing targeted character recognition (a template-screening sketch is given below).
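A minimal sketch of the template screening and cropping of step S5, assuming the standard template is represented as a mapping from field names to field boxes and using the 0.5 intersection-over-union threshold mentioned in step S5; the field names and data layout are hypothetical.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    if inter == 0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def screen_with_template(image, info_frames, template_fields, threshold=0.5):
    """Keep only the information frames that sufficiently overlap a field box
    of the standard template, and crop those regions as character pictures.
    template_fields maps a field name (e.g. "name", "id_number") to its box."""
    crops = {}
    for field, tbox in template_fields.items():
        for frame in info_frames:
            if iou(frame, tbox) >= threshold:
                x1, y1, x2, y2 = [int(v) for v in frame]
                crops[field] = image[y1:y2, x1:x2]   # cropped character picture
                break
    return crops
```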
In this embodiment, the step S0a specifically includes:
collecting a large amount of certificate image materials, marking character positions and character contents, and synthesizing a large amount of certificate images according to certificate styles;
mixing actual data and synthetic data to form a training set, and forming a test set by the actual data;
establishing a character detection model and a character recognition model by using a convolutional neural network;
and training the models through a deep-learning procedure and tuning the parameters to fit the data. A large number of training and test samples are produced: the training set consists of actual and synthetic certificate image materials with the character positions and character contents marked manually, and is used to build the convolutional neural network models; the test set consists of actual certificate images and is used to test and optimize the models. The models are then trained and their parameters tuned to fit the data, which improves the accuracy and anti-interference capability of character detection and character recognition.
In this embodiment, the convolutional neural network used by the character detection model is a PSENet network, an instance segmentation network with two advantages. First, as a segmentation-based method, PSENet can locate text of arbitrary shape. Second, it uses a progressive scale expansion algorithm that can successfully separate adjacent text instances. The backbone of PSENet is a ResNet and the network framework is similar to an FPN (feature pyramid network); the overall structure is roughly as follows:
1. a ResNet backbone extracts features;
2. FPN-style operations generate the feature maps p2-p5;
3. the subsequent processing is:
(a) the p3-p5 feature maps are all upsampled to the p2 scale;
(b) the feature maps of the different scales are fused by concatenation;
(c) a 1×1 convolution outputs kernel_num prediction maps.
PSENet is further adapted with MobileNet so that it better suits the operating environment of a mobile terminal, which greatly reduces the computation time (a minimal sketch of this head is given below).
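A minimal PyTorch sketch of the fusion-and-prediction head just described; the channel count (256), the module name PSEHead and the use of bilinear upsampling are illustrative assumptions rather than the exact configuration of the invention, and the ResNet/FPN backbone producing p2-p5 is assumed to exist separately.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PSEHead(nn.Module):
    """Fuse the FPN feature maps p2-p5 at the p2 scale and predict kernel_num
    shrunk text-kernel maps, as in the PSENet-style head described above."""
    def __init__(self, in_channels=256, kernel_num=7):
        super().__init__()
        # 3x3 conv to fuse the concatenated multi-scale features
        self.fuse = nn.Conv2d(in_channels * 4, in_channels, kernel_size=3, padding=1)
        # 1x1 conv producing kernel_num prediction maps
        self.out = nn.Conv2d(in_channels, kernel_num, kernel_size=1)

    def forward(self, p2, p3, p4, p5):
        size = p2.shape[-2:]
        # upsample p3-p5 to the p2 scale
        p3 = F.interpolate(p3, size=size, mode="bilinear", align_corners=False)
        p4 = F.interpolate(p4, size=size, mode="bilinear", align_corners=False)
        p5 = F.interpolate(p5, size=size, mode="bilinear", align_corners=False)
        feat = torch.cat([p2, p3, p4, p5], dim=1)   # concatenate the scales
        feat = F.relu(self.fuse(feat))
        return self.out(feat)                       # kernel_num segmentation maps
```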
In this embodiment, the convolutional neural network used by the character recognition model is a CRNN network. To recognize character sequences of indefinite length, a more capable model is needed, one with a certain memory that can process information of any length in temporal order; such a model is the Recurrent Neural Network (RNN).
LSTM (Long Short-Term Memory) is an RNN with a special structure that addresses the long-term dependence problem of ordinary RNNs: as the time interval over which information is fed into the network grows, an ordinary RNN suffers from vanishing or exploding gradients, and introducing LSTM solves this. An LSTM unit consists of an input gate, a forget gate and an output gate. The CRNN (Convolutional Recurrent Neural Network) does not need to segment the sample data into individual characters, can recognize a text sequence of any length, and is fast with good performance. Its network structure mainly comprises three parts: a convolutional layer, a recurrent layer and a transcription layer (a minimal sketch follows).
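A minimal PyTorch sketch following the three-part structure described above (convolutional layer, recurrent layer, transcription layer); the specific layer sizes and the two-layer bidirectional LSTM are illustrative assumptions, and CTC decoding of the per-timestep logits is omitted.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    """Sketch of a CRNN: conv feature extractor + bidirectional LSTM +
    per-timestep classifier, trained with CTC loss (nn.CTCLoss)."""
    def __init__(self, num_classes, img_height=32):
        super().__init__()
        self.cnn = nn.Sequential(                     # convolutional layer
            nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2, 2),
            nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),             # keep width resolution
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d((2, 1), (2, 1)),
        )
        feat_h = img_height // 16                     # 32 -> 2
        self.rnn = nn.LSTM(256 * feat_h, 256, num_layers=2,
                           bidirectional=True, batch_first=True)  # recurrent layer
        self.fc = nn.Linear(512, num_classes)         # transcription (classifier)

    def forward(self, x):                             # x: (B, 1, 32, W)
        f = self.cnn(x)                               # (B, C, H', W')
        b, c, h, w = f.shape
        f = f.permute(0, 3, 1, 2).reshape(b, w, c * h)  # sequence over width
        out, _ = self.rnn(f)
        return self.fc(out)                           # (B, W', num_classes) logits
```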
In this embodiment, the step S3 further includes the following sub-steps:
s3a, counting the position distribution of the text boxes, and comparing the distribution of the text boxes with a standard template;
and S3b, if the distribution of the character frames is the same as that of the marking template, the image is judged to be upright; if the distribution is opposite to that of the template, the image is judged to be inverted and is rotated by 180 degrees.
For example, certificates such as the identity card, social security card and driving licence carry a portrait photo, so the character content is clearly biased toward one side; for each type of certificate, whether the image is upright or inverted can therefore be judged from the positional bias of the character frames (a sketch of this orientation check is given below).
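A hedged sketch of the orientation decision in steps S3a-S3b, assuming the distribution is summarized simply as whether most marking frames lie in the upper half of the image; the actual comparison with the standard template may use a richer statistic (e.g. a left/right bias), so this is illustrative only.

```python
import cv2

def correct_orientation(image, boxes, template_top_heavy=True):
    """Decide upright vs. inverted by comparing the vertical bias of the
    detected marking frames with the bias expected by the standard template,
    and rotate 180 degrees if they disagree. boxes are (x1, y1, x2, y2)."""
    h, w = image.shape[:2]
    centers = [(y1 + y2) / 2.0 for (_, y1, _, y2) in boxes]
    top_heavy = sum(c < h / 2.0 for c in centers) >= len(centers) / 2.0
    if top_heavy == template_top_heavy:
        return image, boxes                       # same bias as template: upright
    rotated = cv2.rotate(image, cv2.ROTATE_180)   # opposite bias: inverted
    boxes = [(w - x2, h - y2, w - x1, h - y1) for (x1, y1, x2, y2) in boxes]
    return rotated, boxes
```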
In this embodiment, the type of the certificate is an identity card or a social security card.
In this embodiment, the method further includes step S7: checking the character content and outputting it. Classifying the character content on the certificate provides the basis for this subsequent verification. Databases are established for the different certificate contents; the characters on the certificate image are classified through the standard template, the different types of character content are verified, the character recognition results are calibrated, and the accuracy of detecting, recognizing and outputting the characters on the certificate image is improved.
In this embodiment, the step S7 includes the following sub-steps:
S7a, establishing an information base of ethnic-group information and identity card number codes;
S7b, checking the ethnic-group information in the character content;
S7c, verifying the identity card number information in the character content;
and S7d, outputting the character content.
The information base contains the 56 ethnic groups; if the verified character content cannot be matched exactly, the matching entry is searched for among similar recognized strings and the error is corrected (a matching sketch is given below).
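An illustrative sketch of the ethnic-group check using Python's difflib for approximate matching; the short ETHNIC_GROUPS list and the 0.5 cutoff are placeholders, not the invention's actual information base or matching rule.

```python
import difflib

# The 56 officially recognized ethnic groups would populate this list;
# only a few entries are shown for illustration.
ETHNIC_GROUPS = ["汉族", "壮族", "回族", "满族", "维吾尔族", "苗族", "彝族", "藏族"]

def correct_ethnicity(recognized: str) -> str:
    """If the recognized string is not an exact match, pick the closest
    entry in the information base (error correction)."""
    if recognized in ETHNIC_GROUPS:
        return recognized
    candidates = difflib.get_close_matches(recognized, ETHNIC_GROUPS, n=1, cutoff=0.5)
    return candidates[0] if candidates else recognized
```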
The identity card number is the unique and unchangeable legal number assigned to each citizen of China when citizens are coded. The 18 digits of the ID card number have the following meanings, from left to right (a check-code computation sketch is given after this list):
digits 1-2: code of the province;
digits 3-4: code of the city;
digits 5-6: code of the district or county;
digits 7-14: date of birth (year, month and day);
digits 15-16: sequence code of the local registration office of the place of household registration;
digit 17: sex (odd for male, even for female);
digit 18: check code.
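The check code of an 18-digit identity card number can be verified with the public GB 11643-1999 algorithm; the sketch below illustrates the kind of verification performed in step S7c and is not claimed as the patent's exact implementation.

```python
# GB 11643-1999 check-code computation for the 18-digit ID card number.
WEIGHTS = [7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2]
CHECK_CODES = "10X98765432"

def id_number_valid(id_number: str) -> bool:
    """Return True if the 18th character matches the check code computed
    from the first 17 digits."""
    if len(id_number) != 18 or not id_number[:17].isdigit():
        return False
    total = sum(int(d) * w for d, w in zip(id_number[:17], WEIGHTS))
    expected = CHECK_CODES[total % 11]
    return id_number[17].upper() == expected
```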
In this embodiment, the character pictures are adjusted to a uniform height of 32 pixels. 32 pixels is the input height of the CRNN character recognition model, set during model training; after training, a height of 32 pixels was found to be the optimal size, balancing computation time against accuracy.
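A minimal sketch of this size adjustment, assuming OpenCV; scaling the width proportionally so it varies with the text length is a common convention for CRNN inputs rather than a detail stated in the patent.

```python
import cv2

def resize_for_recognition(char_img, target_height=32):
    """Scale a cropped character picture to a uniform height of 32 pixels,
    keeping the aspect ratio so the width follows the text length."""
    h, w = char_img.shape[:2]
    new_w = max(1, round(w * target_height / h))
    return cv2.resize(char_img, (new_w, target_height),
                      interpolation=cv2.INTER_LINEAR)
```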
As shown in FIGS. 7-8, in another embodiment the card face of a social security card carries more character information. The character positions in the certificate image are detected with the character detection model and marked with marking frames, the result is screened with the preset standard template for the social security card, the relevant regions are cropped in a targeted manner, and finally the character content of the cropped pictures is recognized. In this way the important information is obtained and the unimportant information is filtered out, redundant computation is avoided, the computation load of character recognition is reduced and the recognition speed is increased. In this embodiment, the certificate image character extraction method is used in a high-speed photographing instrument, which photographs certificates frequently in daily use; its working logic is to trim and crop the certificate picture first and then perform character recognition on the cropped picture. The method can be used in a high-speed photographing instrument or in products with a high-speed photographing function, such as a window government-affairs input instrument or a government-affairs all-in-one machine.
A terminal device comprises a processor and a memory, wherein the memory is used for storing program code and the processor is used for executing the certificate image character extraction method. The terminal device is not limited to a high-speed photographing instrument or a smartphone.
Finally, it should be noted that: although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that changes may be made in the embodiments and/or equivalents thereof without departing from the spirit and scope of the invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A certificate image character extraction method, characterized by comprising the following steps:
S1, inputting a certificate image;
S2, detecting the character positions in the certificate image through the character detection model, and marking each character position with a marking frame;
S3, counting the distribution of the marking-frame positions in the certificate image, judging the image direction and adjusting it;
S4, establishing plane coordinates, sorting and merging the marking frames located in the same row according to their Y-axis coordinates, and acquiring an information frame for each row of characters;
S5, aligning the standard template with the information frames, outputting the intersection of the information frames and the standard template, and cropping and outputting character pictures;
and S6, recognizing the character pictures with the character recognition model and extracting the character content.
2. The certificate image character extraction method as claimed in claim 1, wherein said step S1 is preceded by the steps of:
s0a, establishing a character detection model and a character recognition model through a convolutional neural network;
s0b, manufacturing a standard template of the certificate character distribution structure;
wherein, the steps S0a and S0b are not in sequence.
3. The certificate image character extraction method as claimed in claim 2, wherein the step S0a is specifically:
collecting a large amount of certificate image materials, marking character positions and character contents, and synthesizing a large amount of certificate images according to certificate styles;
mixing actual data and synthetic data to form a training set, and forming a test set by the actual data;
establishing a character detection model and a character recognition model by using a convolutional neural network;
and training the model through a deep learning network and adjusting and optimizing parameters to fit the data.
4. The method for extracting text from certificate images as claimed in claim 3, wherein the convolutional neural network used by the text detection model is a PSENet network;
the convolutional neural network used by the character recognition model is a CRNN network.
5. The certificate image character extraction method as claimed in claim 1, wherein the step S3 comprises the following sub-steps:
s3a, counting the position distribution of the text boxes, and comparing the distribution of the text boxes with a standard template;
and S3b, if the distribution of the character frames is the same as the distribution of the marking template, judging the character frames to be in a positive direction, if the distribution of the character frames is opposite to the distribution of the marking template, judging the character frames to be in an inverted direction, and rotating the image by 180 degrees.
6. The certificate image character extraction method as claimed in claim 1, wherein the certificate is of the identity card or social security card type.
7. The certificate image character extraction method according to claim 1, further comprising step S7: checking the character content and outputting it.
8. The certificate image character extraction method as claimed in claim 7, wherein the step S7 comprises the following sub-steps:
S7a, establishing an information base of ethnic-group information and identity card number codes;
S7b, checking the ethnic-group information in the character content;
S7c, verifying the identity card number information in the character content;
and S7d, outputting the character content.
9. The certificate image character extraction method according to any one of claims 1 to 8, wherein the method is used in a high-speed photographing instrument, a window government-affairs input instrument or a government-affairs all-in-one machine.
10. A terminal device comprising a processor and a storage for storing program code, the processor being configured to perform the method according to any of claims 1-8.
CN202011564026.0A 2020-12-25 2020-12-25 Certificate image character extraction method Pending CN112528954A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011564026.0A CN112528954A (en) 2020-12-25 2020-12-25 Certificate image character extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011564026.0A CN112528954A (en) 2020-12-25 2020-12-25 Certificate image character extraction method

Publications (1)

Publication Number Publication Date
CN112528954A (en) 2021-03-19

Family

ID=74976478

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011564026.0A Pending CN112528954A (en) 2020-12-25 2020-12-25 Certificate image character extraction method

Country Status (1)

Country Link
CN (1) CN112528954A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378232A (en) * 2021-08-11 2021-09-10 成方金融科技有限公司 Information acquisition method and device, computer equipment and storage medium
CN113408360A (en) * 2021-05-25 2021-09-17 常熟市百创网络科技有限公司 AI information identification system
CN115641594A (en) * 2022-12-23 2023-01-24 广州佰锐网络科技有限公司 OCR technology-based identification card recognition method, storage medium and device


Similar Documents

Publication Publication Date Title
CN112528954A (en) Certificate image character extraction method
CN111680688B (en) Character recognition method and device, electronic equipment and storage medium
US9373030B2 (en) Automated document recognition, identification, and data extraction
US11657631B2 (en) Scalable, flexible and robust template-based data extraction pipeline
CN112686812B (en) Bank card inclination correction detection method and device, readable storage medium and terminal
CN107944452A (en) A kind of circular stamp character recognition method
WO2021042747A1 (en) Invoice picture recognition and verification method and system, device, and readable storage medium
WO2021212873A1 (en) Defect detection method and apparatus for four corners of certificate, and device and storage medium
CN114038004A (en) Certificate information extraction method, device, equipment and storage medium
CN111275102A (en) Multi-certificate type synchronous detection method and device, computer equipment and storage medium
CN114359553B (en) Signature positioning method and system based on Internet of things and storage medium
CN110866457A (en) Electronic insurance policy obtaining method and device, computer equipment and storage medium
CN111340035A (en) Train ticket identification method, system, equipment and medium
CN114092938B (en) Image recognition processing method and device, electronic equipment and storage medium
CN113673500A (en) Certificate image recognition method and device, electronic equipment and storage medium
CN113191348A (en) Template-based text structured extraction method and tool
CN111858977B (en) Bill information acquisition method, device, computer equipment and storage medium
CN113111880A (en) Certificate image correction method and device, electronic equipment and storage medium
CN112232336A (en) Certificate identification method, device, equipment and storage medium
CN112215225B (en) KYC certificate verification method based on computer vision technology
CN114708186A (en) Electronic signature positioning method and device
WO2019140641A1 (en) Information processing method and system, cloud processing device and computer program product
CN112396060A (en) Identity card identification method based on identity card segmentation model and related equipment thereof
CN110826551A (en) Intelligent discrimination method for VIN code rubbing die image of motor vehicle
CN116311299A (en) Method, device and system for identifying structured data of table

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 518000 1402, building 3, Shenzhen new generation industrial park, No. 136, Zhongkang Road, Meidu community, Meilin street, Futian District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Taiji Shuzhi Technology Co.,Ltd.

Address before: 518000 Two 26G Baihua Apartments on Baihua Erlu, Futian District, Shenzhen City, Guangdong Province

Applicant before: Shenzhen Taiji Yun Soft Technology Co.,Ltd.