CN112926469B

CN112926469B - Certificate identification method based on deep learning OCR and layout structure

Info

Publication number: CN112926469B
Application number: CN202110238213.8A
Authority: CN
Inventors: 谭智峰; 周庆勇; 李明明
Original assignee: Inspur Cloud Information Technology Co Ltd
Current assignee: Inspur Cloud Information Technology Co Ltd
Priority date: 2021-03-04
Filing date: 2021-03-04
Publication date: 2022-12-27
Anticipated expiration: 2041-03-04
Also published as: CN112926469A

Abstract

The invention discloses a certificate identification method based on deep learning OCR and a layout structure. It belongs to the technical field of image recognition and addresses the technical problem of how to provide an identity card identification method that is low in cost, highly robust, and reliable in its recognition results. The method comprises the following steps: rotating the certificate image so that the rotated certificate image matches the human viewing angle; performing certificate detection on the rotated certificate image with a trained certificate detection model and removing the background image; performing text detection by an OCR text detection method to obtain a plurality of initial text boxes; removing spurious boxes that do not correspond to content elements, merging the remaining initial text boxes, and stretching the merged text boxes proportionally; calculating the coordinate information of the text box corresponding to each content element; and cropping the text box corresponding to each content element to obtain a text box image, then performing text detection by an OCR text detection method to obtain the text information corresponding to the content element.

Description

Certificate identification method based on deep learning OCR and layout structure

Technical Field

The invention relates to the technical field of image recognition, in particular to a certificate recognition method based on deep learning OCR and a layout structure.

Background

The identity card certifies the identity of its holder and plays an important role in people's daily life and work. In processes such as registration, access procedures, certificate handling, admission and employment, and financial credit, the identity card must be submitted for examination as the unique proof of identity.

Currently, identity card recognition is mainly accomplished in three ways. First, a hardware card reader completes recognition by reading the magnetic stripe in a second-generation identity card, but card readers are expensive. Second, traditional image processing techniques are used to recognize the identity card information, but such methods are known to be poorly robust to uneven illumination, background interference, occlusion and the like, so the recognition rate and accuracy cannot be guaranteed. Third, the identity card is recognized within a fixed shooting area; this places high demands on the shooting process, which needs sufficient light and requires the edges of the identity card to sit close to the edges of a given frame, and therefore demands more skill from the photographer.

Based on the above, how to provide an identification card recognition method with low cost, high robustness and guaranteed recognition result is a technical problem to be solved.

Disclosure of Invention

In view of the above deficiencies, the invention provides a certificate recognition method based on deep learning OCR and a layout structure, so as to solve the technical problem of how to provide an identity card recognition method that is low in cost, highly robust, and reliable in its recognition results.

In a first aspect, the present invention provides a method for identifying a document based on deep learning OCR and layout structure, comprising the following steps:

for an input certificate image, angle detection in four directions is carried out based on character directions, the certificate image is rotated, and the rotated certificate image conforms to the visual angle of a person;

training a certificate detection model based on transfer learning, carrying out certificate identification on the rotated certificate image through the trained certificate detection model, and removing a background image to obtain a target certificate image;

performing text detection on the target certificate image by an OCR text detection method to obtain a plurality of initial text boxes;

removing, from the initial text boxes, spurious boxes that do not correspond to content elements, merging the remaining initial text boxes based on the center-point coordinates, length, width and angle of the text boxes to obtain a plurality of merged text boxes, and stretching the merged text boxes proportionally to avoid missing any content element;

calculating the coordinate information of the text box corresponding to the certificate number based on the coordinate information of the certificate enclosure box and the length information of the certificate number, and calculating the coordinate information of the text box corresponding to each content element by taking the coordinate information of the text box corresponding to the certificate number as a reference;

and cutting the text box corresponding to each content element to obtain a text box image, and performing text detection on each text box image by an OCR text detection method to obtain text information corresponding to the content element.

Preferably, after the text information corresponding to the content element is obtained, the text information is outputted in a normalized manner through a regular expression.

Preferably, the angle detection is performed in four directions of 0 degree, 90 degree, 180 degree, and 270 degree based on the character direction.

Preferably, the certificate detection model is an SSD-MobileNet V1 model.

Preferably, removing the spurious boxes that do not correspond to content elements from the initial text boxes comprises the following steps:

removing text boxes with low confidence;

removing all boxes that lie outside the bounding box produced by certificate detection or that intersect it;

removing ghost text boxes that are caused by reflection and appear under the portrait;

removing vertical boxes whose length-to-width ratio is smaller than a preset value;

removing text boxes in languages other than Chinese.

In a second aspect, the present invention provides a document recognition system based on deep learning OCR and layout structure, which performs text recognition on a document by the method for recognizing a document based on deep learning OCR and layout structure as described in any one of the first aspect, the system comprising:

the image rotation module is used for carrying out angle detection in four directions on the acquired certificate image based on the character direction and carrying out rotation operation on the certificate image, and the rotated certificate image conforms to the visual angle of a person;

the certificate extraction module is used for training a certificate detection model based on transfer learning, carrying out certificate identification on the rotated certificate image through the trained certificate detection model, and removing a background image to obtain a target certificate image;

the text extraction module is used for carrying out text detection on the target certificate image by an OCR text detection method to obtain a plurality of initial text boxes;

the text processing module is used for removing, from the initial text boxes, spurious boxes that do not correspond to content elements, merging the remaining initial text boxes based on the center-point coordinates, length, width and angle of the text boxes to obtain a plurality of merged text boxes, and stretching the merged text boxes proportionally to avoid missing any content element;

the content element and layout coordinate matching module is used for calculating the coordinate information of the text box corresponding to the certificate number based on the coordinate information of the certificate enclosure box and the length information of the certificate number, and calculating the coordinate information of the text box corresponding to each content element by taking the coordinate information of the text box corresponding to the certificate number as a reference;

the content element structured extraction module is used for cutting the text box corresponding to each content element to obtain a text box image, and performing text detection on each text box image by an OCR text detection method to obtain text information corresponding to the content element;

and the character specification module is used for carrying out normalized output on the text information corresponding to the content elements through a regular expression.

Preferably, the image rotation module is configured to perform angle detection in four directions of 0 degree, 90 degree, 180 degree, and 270 degree based on the character direction.

Preferably, the certificate detection model is an SSD-MobileNet V1 model.

Preferably, the text processing module is configured to remove the spurious boxes that do not correspond to content elements from the initial text boxes by the following steps:

removing text boxes with low confidence;

removing all boxes that lie outside the bounding box produced by certificate detection or that intersect it;

removing vertical boxes whose length-to-width ratio is smaller than the preset value;

removing text boxes in languages other than Chinese.

In a third aspect, the present invention provides a computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method of any of the first aspects.

The certificate identification method based on deep learning OCR and layout structure has the following advantages:

1. the method detects the angle of the input certificate image and corrects it, obtains the target certificate image through a certificate detection model trained by transfer learning, and removes the background image; it extracts text from the target certificate image by an OCR text detection method to obtain initial text boxes, removes spurious boxes from the initial text boxes, and then merges and proportionally stretches the remaining boxes; it calculates the coordinate information of the text box for each content element by taking the coordinate information of the text box corresponding to the certificate number as a reference, crops the text box image corresponding to each content element, and performs text detection by the OCR text detection method to obtain the text information; the method is low in cost and improves the accuracy of certificate recognition;

2. the method corrects the angle of the certificate image, obtains the target certificate image through the certificate detection model trained by transfer learning, and removes the background image; it therefore places low demands on how the certificate image is shot, requiring neither sufficient light nor that the edges of the certificate sit close to the edges of a given frame, and is convenient and quick.

Drawings

In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

The invention is further described below with reference to the accompanying drawings.

FIG. 1 is a flowchart of a document recognition method based on deep learning OCR and layout structure in embodiment 1;

FIG. 2 is a schematic diagram of direction determination in the document recognition method based on deep learning OCR and layout structure in embodiment 1;

FIG. 3 is a schematic diagram of an identity card detection result in the document identification method based on deep learning OCR and layout structure in embodiment 1;

FIG. 4 is a schematic diagram illustrating the detection of element content in the document identification method based on deep learning OCR and layout structure in embodiment 1;

FIG. 5 is a schematic view of a processing flow of a detection frame in the certificate recognition method based on deep learning OCR and layout structure in embodiment 1;

FIG. 6 is a schematic diagram of the result after detection frame processing in the certificate recognition method based on deep learning OCR and layout structure in embodiment 1;

FIG. 7 is a schematic diagram illustrating matching of element content and coordinate information in the certificate recognition method based on deep learning OCR and layout structure in embodiment 1.

Detailed Description

The present invention is further described below with reference to the accompanying drawings and specific embodiments so that those skilled in the art can better understand the present invention and can implement the present invention, but the embodiments are not intended to limit the present invention, and the embodiments and technical features of the embodiments can be combined with each other without conflict.

The embodiment of the invention provides a document identification method based on deep learning OCR and a layout structure, which is used for solving the technical problem of how to provide an identity card identification method which is low in cost, high in robustness and guaranteed in identification result.

Example 1:

The invention discloses a certificate identification method based on deep learning OCR and a layout structure, which comprises the following steps:

S100, carrying out angle detection in four directions on the input certificate image based on the character direction, and carrying out rotation operation on the certificate image, wherein the rotated certificate image conforms to the visual angle of a person;

S200, training a certificate detection model based on transfer learning, carrying out certificate identification on the rotated certificate image through the trained certificate detection model, and removing a background image to obtain a target certificate image;

S300, carrying out text detection on the target certificate image by an OCR text detection method to obtain a plurality of initial text boxes;

S400, removing, from the initial text boxes, spurious boxes that do not correspond to content elements, merging the remaining initial text boxes based on the center-point coordinates, length, width and angle of the text boxes to obtain a plurality of merged text boxes, and stretching the merged text boxes proportionally to avoid missing any content element;

S500, calculating coordinate information of a text box corresponding to the certificate number based on the coordinate information of the certificate bounding box and the length information of the certificate number, and calculating the coordinate information of the text box corresponding to each content element by taking the coordinate information of the text box corresponding to the certificate number as a reference;

S600, cutting the text box corresponding to each content element to obtain a text box image, and performing text detection on each text box image by an OCR text detection method to obtain text information corresponding to the content element.

In step S100, the direction is determined. Because of how smart devices capture and store photos, an uploaded certificate image may be rotated to any of four angles, which hampers the subsequent detection of the identity card's position. To solve this problem, the character direction of the input certificate image is detected among 0, 90, 180 and 270 degrees, and the certificate image is rotated by the detected angle, yielding a target certificate image that matches the human viewing angle.
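The rotation part of step S100 can be sketched as follows. This is a minimal illustration, assuming the image is a NumPy array and that an upstream direction classifier (not shown, and not specified here) reports the detected angle as a clockwise rotation; the function name `rotate_upright` is hypothetical.

```python
import numpy as np


def rotate_upright(image: np.ndarray, detected_angle: int) -> np.ndarray:
    """Rotate an H x W (x C) image so that its text reads upright.

    detected_angle is assumed to be the clockwise rotation (0, 90, 180 or
    270 degrees) reported by a hypothetical upstream direction classifier.
    """
    if detected_angle not in (0, 90, 180, 270):
        raise ValueError(f"unsupported angle: {detected_angle}")
    # np.rot90 rotates counter-clockwise by k quarter turns, which undoes
    # a clockwise rotation of k * 90 degrees.
    return np.rot90(image, k=detected_angle // 90)
```

Restricting the correction to quarter turns keeps the operation lossless, since no pixel interpolation is involved.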

In step S200, certificate detection is performed. Because of factors such as background interference, stains on the identity card and uneven light, earlier approaches that determine the position of the identity card with image processing methods such as edge detection and image segmentation are increasingly inadequate, and a more robust replacement is urgently needed. As deep learning is applied ever more widely in the image field, locating target objects by object detection is becoming increasingly common. A certificate data set is annotated with the bounding box of each certificate, a certificate detection model is trained through transfer learning, and the model identifies a relatively accurate position of the certificate in a natural scene. The certificate detection model in this embodiment is an SSD-MobileNet V1 model.

In step S300, the certificate content elements are detected. Images captured by smart devices are affected by the environment, such as uneven illumination, low contrast and background interference; by geometric deformation, blurring and incompleteness introduced during imaging, and by overly small images caused by distance; and by texture and logo interference on the identity card and the mixing of multiple national languages. As a result, a single image processing approach or a character segmentation and detection method can hardly achieve a satisfactory result. Detecting text lines through deep learning and recognizing text lines using the context information of characters is now a mature technique. Text line detection is trained on some public OCR text line detection corpora, and its parameters are then tuned on a portion of annotated identity card data so that detection is as accurate as possible. In this embodiment, the element content to be extracted from the certificate is detected by an OCR text detection method to obtain a plurality of initial text boxes.
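Once the detection model has produced a bounding box, removing the background amounts to cropping the image to that box. The sketch below assumes a box given in pixel coordinates as `(x_min, y_min, x_max, y_max)`; that box format and the function name `crop_to_certificate` are illustrative assumptions, and the detector itself is not shown.

```python
import numpy as np


def crop_to_certificate(image: np.ndarray, box) -> np.ndarray:
    """Crop an H x W (x C) image to a detected certificate bounding box.

    box is assumed to be (x_min, y_min, x_max, y_max) in pixels, as
    returned by a hypothetical detector wrapper.
    """
    height, width = image.shape[:2]
    x0, y0, x1, y1 = box
    # Clamp to the image so a slightly oversized detection does not fail.
    x0, y0 = max(0, int(x0)), max(0, int(y0))
    x1, y1 = min(width, int(x1)), min(height, int(y1))
    if x1 <= x0 or y1 <= y0:
        raise ValueError("empty bounding box")
    return image[y0:y1, x0:x1]
```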

In step S400, the detection boxes are processed. The complexity of the background and the design of the identity card itself may cause spurious boxes that do not correspond to content elements to be detected. To avoid affecting the subsequent steps, these spurious boxes are handled as shown in FIG. 5:

(1) Removing text boxes with low confidence;

(2) Removing all boxes that lie outside the bounding box produced by certificate detection or that intersect it;

(3) Removing ghost text boxes that are caused by reflection and appear below the portrait;

(4) Removing vertical boxes whose length-to-width ratio is smaller than a preset value;

(5) Removing text boxes in languages other than Chinese.
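The five filtering rules above can be sketched as a single pass over the detected boxes. Everything concrete here is an assumption made for illustration: the `TextBox` structure, the `is_other_language` flag (taken to come from some upstream language check), the `ghost_zone` rectangle standing in for the area below the portrait, and the two threshold values.

```python
from dataclasses import dataclass
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class TextBox:
    rect: Rect
    score: float                     # detection confidence in [0, 1]
    is_other_language: bool = False  # hypothetical upstream language flag


def inside(inner: Rect, outer: Rect) -> bool:
    """True if inner lies entirely within outer."""
    return (inner[0] >= outer[0] and inner[1] >= outer[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def overlaps(a: Rect, b: Rect) -> bool:
    """True if the two rectangles share any interior area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def filter_boxes(boxes: List[TextBox], card: Rect, ghost_zone: Rect,
                 min_score: float = 0.5,
                 min_aspect: float = 0.5) -> List[TextBox]:
    kept = []
    for box in boxes:
        x0, y0, x1, y1 = box.rect
        width, height = x1 - x0, y1 - y0
        if box.score < min_score:               # (1) low confidence
            continue
        if not inside(box.rect, card):          # (2) outside or crossing the card box
            continue
        if overlaps(box.rect, ghost_zone):      # (3) reflection ghost under the portrait
            continue
        if height <= 0 or width / height < min_aspect:  # (4) vertical box
            continue
        if box.is_other_language:               # (5) language other than Chinese
            continue
        kept.append(box)
    return kept
```

The thresholds would in practice be tuned on annotated samples; the values shown are placeholders.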

After this processing, a text box may still be split into fragments because of the card's layout design. Such fragments are merged according to the position parameters, the center-point coordinates of the text boxes, and their length, width and angle information. Because detection is imperfect, a merged text box may still miss part of a content element, so the text box is finally scaled proportionally.
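A simplified sketch of the merge-and-stretch step follows. It assumes axis-aligned boxes and ignores the angle information, so it is a reduced illustration rather than the full procedure; the tolerances `y_tol` and `gap_tol`, the stretch `ratio`, and all function names are hypothetical.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def same_line(a: Rect, b: Rect, y_tol: float = 0.5,
              gap_tol: float = 1.0) -> bool:
    """True if two boxes sit on the same text line and are close together."""
    h = min(a[3] - a[1], b[3] - b[1])
    cy_a, cy_b = (a[1] + a[3]) / 2, (b[1] + b[3]) / 2
    # Horizontal gap between the boxes; negative when they overlap.
    gap = max(a[0], b[0]) - min(a[2], b[2])
    return abs(cy_a - cy_b) <= y_tol * h and gap <= gap_tol * h


def merge_boxes(boxes: List[Rect]) -> List[Rect]:
    """Greedily merge fragments of the same line, scanning left to right."""
    merged: List[Rect] = []
    for box in sorted(boxes, key=lambda r: r[0]):
        for i, m in enumerate(merged):
            if same_line(m, box):
                merged[i] = (min(m[0], box[0]), min(m[1], box[1]),
                             max(m[2], box[2]), max(m[3], box[3]))
                break
        else:
            merged.append(box)
    return merged


def stretch(box: Rect, ratio: float = 1.1) -> Rect:
    """Scale a box about its center so clipped characters are recovered."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w = (box[2] - box[0]) * ratio / 2
    half_h = (box[3] - box[1]) * ratio / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)
```

For example, two fragments of an address line separated by a small gap would be joined into one box, while a box on the next line would be left alone because its center is too far away vertically.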

In step S500, the content elements are matched with the layout coordinates. The text box containing the certificate number can readily be calculated from the coordinate information of the bounding box and the length information of the identity card number. Taking this text box as the reference, the regions containing the address, birth, gender, nationality and name content elements can be roughly determined from the proportional relations of the certificate design, combined with the parameters and the bounding box information. The position information of each text line is then matched against these regions, which finally yields the content element represented by each text box.
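The matching idea in step S500 can be sketched as below. The proportions in `TOY_LAYOUT` and the `expected_width_ratio` are placeholders invented for illustration, not measurements of any real card; the selection of the number box by relative width is likewise only one assumed way to use the "length information" of the certificate number.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)

# Placeholder layout: each region is given as offsets from the top-left
# corner of the number box, x in units of its width, y in units of its height.
TOY_LAYOUT: Dict[str, Rect] = {
    "name": (-0.4, -8.0, 0.4, -6.5),
    "address": (-0.4, -4.0, 0.6, -1.0),
}


def find_number_box(boxes: List[Rect], card: Rect,
                    expected_width_ratio: float = 0.55) -> Rect:
    """Pick the box whose width relative to the card best fits the number line."""
    card_w = card[2] - card[0]
    return min(boxes,
               key=lambda b: abs((b[2] - b[0]) / card_w - expected_width_ratio))


def regions_from_anchor(anchor: Rect,
                        layout: Dict[str, Rect]) -> Dict[str, Rect]:
    """Turn relative layout offsets into absolute regions around the anchor."""
    ax, ay = anchor[0], anchor[1]
    w, h = anchor[2] - anchor[0], anchor[3] - anchor[1]
    return {name: (ax + fx0 * w, ay + fy0 * h, ax + fx1 * w, ay + fy1 * h)
            for name, (fx0, fy0, fx1, fy1) in layout.items()}


def assign_fields(boxes: List[Rect],
                  regions: Dict[str, Rect]) -> Dict[str, List[Rect]]:
    """Assign each text box to the region that contains its center point."""
    assigned: Dict[str, List[Rect]] = {}
    for box in boxes:
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        for name, (x0, y0, x1, y1) in regions.items():
            if x0 <= cx <= x1 and y0 <= cy <= y1:
                assigned.setdefault(name, []).append(box)
                break
    return assigned
```

Expressing the regions relative to the number box, rather than in absolute pixels, is what lets the same layout table apply to cards photographed at different scales.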

In step S600, the content elements are extracted in structured form. For the text box corresponding to each content element, an image is cropped using the box coordinate information and then input into the character recognition model to obtain the text information corresponding to that content element. Deep learning is currently used for text recognition; text line recognition can make full use of the context information of characters for modeling, so it performs better than segmenting individual characters and recognizing them one by one. Text line recognition is trained using some public OCR text recognition methods, and its parameters are then tuned on a portion of annotated identity card data so that recognition is as accurate as possible. Depending on the accuracy of the recognition model, the recognized characters may be irregular, so the result is finally output in normalized form through a regular expression.

In this embodiment, the certificate may be an identity card, a driving license, a bank card, or the like. For the identity card, the text box containing the identity card number can be calculated from the coordinate information of the bounding box and the length information of the identity card number. Taking the text box corresponding to the identity card number as the reference, the regions containing the address, birth, gender, nationality and name content elements can be roughly determined from the proportional relations of the identity card design, combined with the parameters and the bounding box information. The position information of each text line is then matched against these regions, which finally yields the content element represented by each text box. For other types of certificates, the coordinate information of the text box corresponding to each content element is obtained in the same way.
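The final normalization through regular expressions might look like the sketch below. The two patterns are assumptions chosen for illustration: an 18-character identity card number made of 17 digits followed by a digit or `X`, and a birth date written as year, month and day separated by non-digit characters. The function names are hypothetical.

```python
import re
from typing import Optional

ID_PATTERN = re.compile(r"\d{17}[\dX]")
DATE_PATTERN = re.compile(r"(\d{4})\D+(\d{1,2})\D+(\d{1,2})")


def normalize_id_number(raw: str) -> Optional[str]:
    """Strip stray characters from a recognized number and validate its shape."""
    cleaned = re.sub(r"[^0-9Xx]", "", raw).upper()
    return cleaned if ID_PATTERN.fullmatch(cleaned) else None


def normalize_birth_date(raw: str) -> Optional[str]:
    """Convert a recognized date such as '1990年1月2日' to 'YYYY-MM-DD'."""
    match = DATE_PATTERN.search(raw)
    if match is None:
        return None
    year, month, day = match.groups()
    return f"{year}-{int(month):02d}-{int(day):02d}"
```

Returning `None` for strings that do not fit the pattern lets a caller flag the field for review instead of silently emitting a malformed value.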

Example 2:

The invention relates to a certificate recognition system based on deep learning OCR and a layout structure, which performs text recognition on a certificate by the certificate recognition method based on deep learning OCR and the layout structure disclosed in embodiment 1. The system comprises an image rotation module, a certificate extraction module, a text extraction module, a text processing module, a content element and layout coordinate matching module, a content element structured extraction module and a character specification module. The image rotation module is used for carrying out angle detection in four directions on an acquired certificate image based on character directions and carrying out a rotation operation on the certificate image, so that the rotated certificate image matches the human viewing angle; the certificate extraction module is used for training a certificate detection model based on transfer learning, carrying out certificate identification on the rotated certificate image through the trained certificate detection model, and removing a background image to obtain a target certificate image; the text extraction module is used for carrying out text detection on the target certificate image by an OCR text detection method to obtain a plurality of initial text boxes; the text processing module is used for removing, from the initial text boxes, spurious boxes that do not correspond to content elements, merging the remaining initial text boxes based on the center-point coordinates, length, width and angle of the text boxes to obtain a plurality of merged text boxes, and stretching the merged text boxes proportionally to avoid missing any content element; the content element and layout coordinate matching module is used for calculating the coordinate information of the text box corresponding to the certificate number based on the coordinate information of the certificate bounding box and the certificate number length information, and calculating the coordinate information of the text box corresponding to each content element by taking the coordinate information of the text box corresponding to the certificate number as a reference; the content element structured extraction module is used for cutting the text box corresponding to each content element to obtain a text box image, and for each text box image, performing text detection by an OCR text detection method to obtain text information corresponding to the content element; and the character specification module is used for carrying out normalized output on the text information corresponding to the content elements through a regular expression.

The image rotation module is used for detecting angles in four directions of 0 degree, 90 degrees, 180 degrees and 270 degrees based on the character direction.

The certificate detection model is an SSD-MobileNet V1 model.

The text processing module is used for removing the spurious boxes that do not correspond to content elements from the initial text boxes through the following steps:

(1) Removing text boxes with low confidence;

(2) Removing all boxes that lie outside the bounding box produced by certificate detection or that intersect it;

(3) Removing ghost text boxes that are caused by reflection and appear under the portrait;

(4) Removing vertical boxes whose length-to-width ratio is smaller than the preset value;

(5) Removing text boxes in languages other than Chinese.

The system can recognize the text information in a certificate. For the identity card, the workflow is as follows: first, identity card image data is acquired through a smart device; angle detection in four directions is performed on the image to be processed, and the image is rotated according to the detected angle to obtain an image that matches the human viewing angle; the approximate region containing the identity card is then located in the image by a deep object detection model; the regions of text lines in the image are detected by a deep OCR text line detection model; text boxes that are not identity card elements are deleted, suitable text boxes are merged according to the parameters, and the merged text boxes are stretched; the positions of the identity card elements are matched and located according to the preset parameters, with the position of the identity card number as the reference; the extracted information of each element is normalized through regular expressions; and finally the identity card information is formatted and output. The system places low demands on shooting and is convenient to use.

Example 3:

an embodiment of the present invention further provides a computer-readable medium, where computer instructions are stored on the computer-readable medium, and when the computer instructions are executed by a processor, the processor is caused to execute the method disclosed in embodiment 1. Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.

In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.

Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.

Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.

Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then a CPU or the like mounted on the expansion board or the expansion unit is caused to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the embodiments described above.

It should be noted that not all steps and modules in the above flows and system structure diagrams are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The system structures described in the above embodiments may be physical structures or logical structures, that is, some modules may be implemented by the same physical entity, or some modules may be implemented by a plurality of physical entities separately, or some components may be implemented together in a plurality of independent devices.

While the invention has been shown and described in detail in the drawings and in the preferred embodiments, it is not intended to limit the invention to the embodiments disclosed, and it will be apparent to those skilled in the art that various combinations of the technical means in the various embodiments described above may be used to obtain further embodiments of the invention, which are also within the scope of the invention.

Claims

1. The certificate recognition method based on deep learning OCR and layout structure is characterized by comprising the following steps:

for an input certificate image, angle detection is carried out in four directions of 0 degree, 90 degrees, 180 degrees and 270 degrees based on the character direction, the certificate image is rotated, and the rotated certificate image accords with the visual angle of a person;

training a certificate detection model based on transfer learning, wherein the certificate detection model is an SSD-MobileNet V1 model, and performing certificate identification on the rotated certificate image through the trained certificate detection model to remove a background image to obtain a target certificate image;

cutting the text box corresponding to each content element to obtain a text box image, and performing text detection on each text box image by an OCR text detection method to obtain text information corresponding to the content element;

removing the text box with low confidence coefficient;

removing ghost text boxes appearing below the portrait due to the reflection;

removing text boxes in languages other than Chinese.

2. The method for recognizing the document based on the deep learning OCR and the layout structure as claimed in claim 1, wherein the text information corresponding to the content element is obtained and then outputted in a normalized manner through a regular expression.

3. A system for document recognition based on deep learning OCR and layout structure, characterized in that text recognition is performed on the document by the method for document recognition based on deep learning OCR and layout structure as claimed in any one of claims 1-2, the system comprising:

the image rotation module is used for carrying out angle detection on the acquired certificate image in four directions of 0 degree, 90 degrees, 180 degrees and 270 degrees based on the character direction, carrying out rotation operation on the certificate image, and enabling the rotated certificate image to accord with the visual angle of a person;

the certificate extraction module is used for training a certificate detection model based on transfer learning, the certificate detection model is an SSD-MobileNet V1 model, certificate recognition is carried out on the rotated certificate image through the trained certificate detection model, and a background image is removed to obtain a target certificate image;

the character specification module is used for carrying out normalized output on the text information corresponding to the content elements through a regular expression;

removing the text box with low confidence coefficient;

removing ghost text boxes appearing below the portrait due to the reflection;

removing text boxes in languages other than Chinese.

4. A computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1 to 2.