CN112926469B - Certificate identification method based on deep learning OCR and layout structure

Publication number
CN112926469B
Authority
CN
China
Prior art keywords
text
certificate
image
box
detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110238213.8A
Other languages
Chinese (zh)
Other versions
CN112926469A (en)
Inventor
谭智峰
周庆勇
李明明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202110238213.8A
Publication of CN112926469A
Application granted
Publication of CN112926469B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/414 Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/242 Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • Character Input (AREA)

Abstract

The invention discloses a certificate identification method based on deep learning OCR and a layout structure. It belongs to the technical field of image recognition and addresses the technical problem of providing an identity card recognition method that is low in cost, highly robust, and reliable in its recognition results. The method comprises the following steps: rotating the certificate image so that the rotated image is upright from a human viewing angle; detecting the certificate in the rotated image with a trained certificate detection model and removing the background; performing text detection by an OCR text detection method to obtain a plurality of initial text boxes; removing stray boxes that do not belong to content elements, merging the remaining initial text boxes, and stretching the merged text boxes in proportion; calculating the coordinate information of the text box corresponding to each content element; and cropping the text box corresponding to each content element to obtain a text box image, then performing text detection on it by the OCR text detection method to obtain the text information of that content element.

Description

Certificate identification method based on deep learning OCR and layout structure
Technical Field
The invention relates to the technical field of image recognition, in particular to a certificate recognition method based on deep learning OCR and a layout structure.
Background
The identity card serves as proof of its holder's identity and plays an important role in daily life and work. In processes such as registration, entry procedures, certificate handling, employment onboarding, and financial credit, the identity card must be submitted for examination as the sole proof of identity.
Currently, identity card recognition is mainly accomplished in three ways. First, a hardware card reader reads the magnetic stripe in a second-generation identity card; however, card readers are expensive, so the cost is high. Second, traditional image processing techniques are used to recognize the identity card information; this approach is known to have poor robustness to uneven illumination, background interference, occlusion and the like, so the recognition rate and accuracy cannot be guaranteed. Third, the identity card is photographed within a fixed capture area; this places high demands on the shooting process, which needs sufficient light and needs the edges of the card to lie close to the given frame, and therefore places high technical demands on the photographer.
In view of the above, how to provide an identity card recognition method that is low in cost, highly robust, and reliable in its recognition results is the technical problem to be solved.
Disclosure of Invention
In view of the above deficiencies, the invention provides a certificate identification method based on deep learning OCR and a layout structure, to solve the technical problem of providing an identity card recognition method that is low in cost, highly robust, and reliable in its recognition results.
In a first aspect, the present invention provides a certificate identification method based on deep learning OCR and layout structure, comprising the following steps:
for an input certificate image, performing angle detection in four directions based on the character direction and rotating the certificate image so that the rotated certificate image is upright from a human viewing angle;
training a certificate detection model based on transfer learning, detecting the certificate in the rotated certificate image with the trained certificate detection model, and removing the background to obtain a target certificate image;
performing text detection on the target certificate image by an OCR text detection method to obtain a plurality of initial text boxes;
removing from the initial text boxes the stray boxes that do not belong to content elements, merging the remaining initial text boxes based on the center point coordinates, length, width, and angle of each text box to obtain a plurality of merged text boxes, and stretching the merged text boxes in proportion to avoid missing content elements;
calculating the coordinate information of the text box corresponding to the certificate number based on the coordinate information of the certificate bounding box and the length information of the certificate number, and calculating the coordinate information of the text box corresponding to each content element using the coordinate information of the certificate number text box as a reference;
and cropping the text box corresponding to each content element to obtain a text box image, and performing text detection on each text box image by the OCR text detection method to obtain the text information corresponding to the content element.
Preferably, after the text information corresponding to the content element is obtained, the text information is normalized through a regular expression before being output.
Preferably, the angle detection based on the character direction covers the four directions of 0, 90, 180, and 270 degrees.
Preferably, the certificate detection model is an SSD-MobileNet V1 model.
Preferably, removing from the initial text boxes the stray boxes that do not belong to content elements comprises the following steps:
removing text boxes with low confidence;
removing all boxes that lie outside the bounding box produced by certificate detection or that intersect it;
removing ghost text boxes that appear below the portrait because of reflection;
removing vertical boxes whose length-to-width ratio is smaller than a preset value;
removing text boxes in languages other than Chinese.
In a second aspect, the present invention provides a certificate recognition system based on deep learning OCR and layout structure, which performs text recognition on a certificate by the certificate identification method based on deep learning OCR and layout structure according to any one of the first aspect, the system comprising:
an image rotation module, configured to perform angle detection in four directions on the acquired certificate image based on the character direction and to rotate the certificate image so that the rotated certificate image is upright from a human viewing angle;
a certificate extraction module, configured to train a certificate detection model based on transfer learning, detect the certificate in the rotated certificate image with the trained certificate detection model, and remove the background to obtain a target certificate image;
a text extraction module, configured to perform text detection on the target certificate image by an OCR text detection method to obtain a plurality of initial text boxes;
a text processing module, configured to remove from the initial text boxes the stray boxes that do not belong to content elements, merge the remaining initial text boxes based on the center point coordinates, length, width, and angle of each text box to obtain a plurality of merged text boxes, and stretch the merged text boxes in proportion to avoid missing content elements;
a content element and layout coordinate matching module, configured to calculate the coordinate information of the text box corresponding to the certificate number based on the coordinate information of the certificate bounding box and the length information of the certificate number, and to calculate the coordinate information of the text box corresponding to each content element using the coordinate information of the certificate number text box as a reference;
a content element structured extraction module, configured to crop the text box corresponding to each content element to obtain a text box image, and to perform text detection on each text box image by the OCR text detection method to obtain the text information corresponding to the content element;
and a character normalization module, configured to normalize the text information corresponding to the content elements through a regular expression before output.
Preferably, the image rotation module is configured to perform angle detection based on the character direction in the four directions of 0, 90, 180, and 270 degrees.
Preferably, the certificate detection model is an SSD-MobileNet V1 model.
Preferably, the text processing module is configured to remove from the initial text boxes the stray boxes that do not belong to content elements through the following steps:
removing text boxes with low confidence;
removing all boxes that lie outside the bounding box produced by certificate detection or that intersect it;
removing ghost text boxes that appear below the portrait because of reflection;
removing vertical boxes whose length-to-width ratio is smaller than a preset value;
removing text boxes in languages other than Chinese.
In a third aspect, the present invention provides a computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method of any of the first aspects.
The certificate identification method based on deep learning OCR and layout structure has the following advantages:
1. The method detects and corrects the angle of the input certificate image, obtains the target certificate image through a certificate detection model trained by transfer learning, and removes the background. It extracts text from the target certificate image by an OCR text detection method to obtain initial text boxes, removes the stray boxes from them, and then merges and proportionally stretches the remaining boxes. It calculates the coordinate information of the text box for each content element using the coordinate information of the certificate number text box as a reference, crops the text box image of each content element, and performs text detection by the OCR text detection method to obtain the text information. The method is low in cost and improves the accuracy of certificate recognition;
2. Because the method corrects the angle of the certificate image, obtains the target certificate image through the certificate detection model trained by transfer learning, and removes the background, it places low requirements on how the certificate image is shot: it requires neither sufficient light nor that the edges of the certificate lie close to a given frame, which makes it convenient and quick.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed for the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on the drawings without creative efforts.
The invention is further described below with reference to the accompanying drawings.
FIG. 1 is a flowchart of the certificate identification method based on deep learning OCR and layout structure in embodiment 1;
FIG. 2 is a schematic diagram of direction determination in the certificate identification method based on deep learning OCR and layout structure in embodiment 1;
FIG. 3 is a schematic diagram of an identity card detection result in the certificate identification method based on deep learning OCR and layout structure in embodiment 1;
FIG. 4 is a schematic diagram of content element detection in the certificate identification method based on deep learning OCR and layout structure in embodiment 1;
FIG. 5 is a schematic diagram of the detection box processing flow in the certificate identification method based on deep learning OCR and layout structure in embodiment 1;
FIG. 6 is a schematic diagram of the result after detection box processing in the certificate identification method based on deep learning OCR and layout structure in embodiment 1;
FIG. 7 is a schematic diagram of matching content elements with coordinate information in the certificate identification method based on deep learning OCR and layout structure in embodiment 1.
Detailed Description
The present invention is further described below with reference to the accompanying drawings and specific embodiments so that those skilled in the art can better understand the present invention and can implement the present invention, but the embodiments are not intended to limit the present invention, and the embodiments and technical features of the embodiments can be combined with each other without conflict.
The embodiment of the invention provides a certificate identification method based on deep learning OCR and a layout structure, which solves the technical problem of providing an identity card recognition method that is low in cost, highly robust, and reliable in its recognition results.
Example 1:
The certificate identification method based on deep learning OCR and a layout structure of the invention comprises the following steps:
S100, performing angle detection in four directions on the input certificate image based on the character direction, and rotating the certificate image so that the rotated certificate image is upright from a human viewing angle;
S200, training a certificate detection model based on transfer learning, detecting the certificate in the rotated certificate image with the trained certificate detection model, and removing the background to obtain a target certificate image;
S300, performing text detection on the target certificate image by an OCR text detection method to obtain a plurality of initial text boxes;
S400, removing from the initial text boxes the stray boxes that do not belong to content elements, merging the remaining initial text boxes based on the center point coordinates, length, width, and angle of each text box to obtain a plurality of merged text boxes, and stretching the merged text boxes in proportion to avoid missing content elements;
S500, calculating the coordinate information of the text box corresponding to the certificate number based on the coordinate information of the certificate bounding box and the length information of the certificate number, and calculating the coordinate information of the text box corresponding to each content element using the coordinate information of the certificate number text box as a reference;
S600, cropping the text box corresponding to each content element to obtain a text box image, and performing text detection on each text box image by the OCR text detection method to obtain the text information corresponding to the content element.
Step S100 performs direction determination. Because of how smart devices capture and store photos, an uploaded certificate image may be rotated to any of four angles, which hampers the subsequent detection of the identity card's position. To solve this problem, the character direction of the input certificate image is detected among 0, 90, 180, and 270 degrees, and the certificate image is rotated by the detected angle to obtain a certificate image that is upright from a human viewing angle.
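The rotation part of step S100 can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the orientation detector (not shown) has already returned one of the four angles, it assumes that angle describes how far the text is rotated clockwise from upright, and it uses a nested list as a stand-in for an image array. The function names are hypothetical.

```python
def rotate_cw_90(image):
    """Rotate a row-major image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def rotate_upright(image, detected_angle):
    """Undo a detected clockwise rotation of 0, 90, 180 or 270 degrees.

    Assumption: detected_angle says how far the text is rotated clockwise
    from upright, so the correction is the remaining clockwise turn to 360.
    """
    if detected_angle not in (0, 90, 180, 270):
        raise ValueError("angle must be one of 0, 90, 180, 270")
    turns = ((360 - detected_angle) % 360) // 90
    result = [list(row) for row in image]
    for _ in range(turns):
        result = rotate_cw_90(result)
    return result


upright = [[1, 2],
           [3, 4]]
rotated_90 = rotate_cw_90(upright)         # simulate a photo stored rotated by 90 degrees
restored = rotate_upright(rotated_90, 90)  # the correction brings it back upright
```

Restricting the angle to four values keeps the correction lossless, since each quarter turn only rearranges pixels and needs no interpolation.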
Step S200 performs certificate detection. Because of background interference, stains on the identity card, uneven lighting and similar factors, earlier approaches that locate the identity card with image processing methods such as edge detection and image segmentation are increasingly unsuitable, and a more robust method is needed to replace them. With the growing application of deep learning in the image field, locating target objects by object detection is becoming more and more common. A certificate data set is annotated with the bounding box of each certificate, a certificate detection model is trained through transfer learning, and the model then identifies a relatively accurate position of the certificate in a natural scene. The certificate detection model in this embodiment is an SSD-MobileNet V1 model.
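Once the detection model has returned a bounding box, removing the background amounts to cropping the image to that box. The sketch below assumes the box is given as `(x_min, y_min, x_max, y_max)` in pixel coordinates with exclusive upper bounds, and again uses a nested list in place of an image array. The detector itself is not shown, and `crop_to_box` is a hypothetical helper name.

```python
def crop_to_box(image, box):
    """Crop a row-major image to box = (x_min, y_min, x_max, y_max).

    Coordinates are clamped to the image so a slightly oversized
    detection does not raise an error. x_max and y_max are exclusive.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    x_min, y_min, x_max, y_max = box
    x_min, x_max = max(0, x_min), min(width, x_max)
    y_min, y_max = max(0, y_min), min(height, y_max)
    if x_min >= x_max or y_min >= y_max:
        raise ValueError("box does not overlap the image")
    return [row[x_min:x_max] for row in image[y_min:y_max]]


photo = [[0, 0, 0, 0],
         [0, 7, 8, 0],
         [0, 9, 6, 0],
         [0, 0, 0, 0]]
card = crop_to_box(photo, (1, 1, 3, 3))  # keeps only the inner 2x2 region
```

The same helper could be reused later to cut out the text box of each content element.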
Step S300 performs detection of the certificate's content elements. Images captured by smart devices are affected by the environment, such as uneven illumination, low contrast, and background interference; by geometric deformation, blurring, and incompleteness introduced by imaging, and by images that are too small because of the shooting distance; and by the texture and logo of the identity card itself and the mixing of multiple languages. As a result, a single image processing approach or a character segmentation and detection method rarely gives satisfactory results. Detecting text lines with deep learning and recognizing text lines using the context information of the characters is now a mature technology. Text line detection is trained on public OCR text line detection corpora and then fine-tuned on a partially annotated identity card corpus so that detection is as accurate as possible. In this embodiment, the content elements to be extracted from the certificate are detected by an OCR text detection method to obtain a plurality of initial text boxes.
Step S400 performs detection box processing. The complexity of the background and the design of the identity card itself may cause stray boxes that do not belong to content elements to be detected. To avoid affecting the subsequent steps, these stray boxes are processed as shown in FIG. 5:
(1) Removing text boxes with low confidence;
(2) Removing all boxes that lie outside the bounding box produced by certificate detection or that intersect it;
(3) Removing ghost text boxes that appear below the portrait because of reflection;
(4) Removing vertical boxes whose length-to-width ratio is smaller than a preset value;
(5) Removing text boxes in languages other than Chinese.
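The five removal rules listed above can be sketched as a single filter. Everything specific here is an assumption made for illustration: the `TextBox` fields, the threshold values, the `language` tag (which would come from some language classifier), and the `ghost_region` rectangle standing in for the area below the portrait. Rule (2) is read as "keep only boxes that lie entirely inside the certificate bounding box", and the length-to-width ratio in rule (4) is taken as width divided by height.

```python
from dataclasses import dataclass


@dataclass
class TextBox:
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    confidence: float
    language: str = "zh"  # hypothetical tag from a language classifier


def inside(box, region):
    """True when box lies entirely within region = (x_min, y_min, x_max, y_max)."""
    rx0, ry0, rx1, ry1 = region
    return (box.x_min >= rx0 and box.y_min >= ry0
            and box.x_max <= rx1 and box.y_max <= ry1)


def center_in(box, region):
    """True when the centre of box falls inside region."""
    cx = (box.x_min + box.x_max) / 2
    cy = (box.y_min + box.y_max) / 2
    rx0, ry0, rx1, ry1 = region
    return rx0 <= cx <= rx1 and ry0 <= cy <= ry1


def filter_boxes(boxes, card_region, ghost_region=None,
                 min_confidence=0.5, min_aspect=1.0):
    """Apply the five removal rules; thresholds are illustrative assumptions."""
    kept = []
    for box in boxes:
        width = box.x_max - box.x_min
        height = box.y_max - box.y_min
        if box.confidence < min_confidence:                # rule 1: low confidence
            continue
        if not inside(box, card_region):                   # rule 2: outside or crossing the card box
            continue
        if ghost_region and center_in(box, ghost_region):  # rule 3: reflection below the portrait
            continue
        if height <= 0 or width / height < min_aspect:     # rule 4: vertical boxes
            continue
        if box.language != "zh":                           # rule 5: non-Chinese text
            continue
        kept.append(box)
    return kept


good = TextBox(10, 10, 60, 18, 0.9)
candidates = [
    good,
    TextBox(10, 20, 60, 28, 0.2),          # low confidence
    TextBox(90, 30, 110, 38, 0.9),         # crosses the card boundary
    TextBox(65, 50, 95, 56, 0.9),          # ghost below the portrait
    TextBox(70, 10, 74, 40, 0.9),          # tall, narrow box
    TextBox(10, 40, 60, 48, 0.9, "en"),    # non-Chinese text
]
kept = filter_boxes(candidates, card_region=(0, 0, 100, 60),
                    ghost_region=(60, 45, 100, 60))
```

Running the cheap geometric checks before any language test keeps the filter inexpensive, although the source does not prescribe an order for the rules.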
Because of the card's layout design, the text of one content element may be split across several text boxes after this processing. Such boxes are merged according to their position parameters, namely the center point coordinates, length, width, and angle of each text box. Because detection is imperfect, a merged text box may still miss part of a content element, so the text box is finally stretched in proportion.
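A minimal sketch of the merge and stretch operations follows. It assumes the image has already been rotated upright, so boxes are axis-aligned tuples `(x_min, y_min, x_max, y_max)` and the angle term is ignored. The same-line rule, its thresholds, and the stretch ratios are illustrative assumptions, not values from the source.

```python
def same_line(a, b, max_center_offset=0.5, max_gap=1.0):
    """Decide whether two boxes (x_min, y_min, x_max, y_max) belong to one text line.

    Assumed rule: the vertical centres differ by less than max_center_offset
    times the smaller height, and the horizontal gap is below max_gap times
    that height.
    """
    height = min(a[3] - a[1], b[3] - b[1])
    center_offset = abs((a[1] + a[3]) / 2 - (b[1] + b[3]) / 2)
    gap = max(a[0], b[0]) - min(a[2], b[2])
    return center_offset < max_center_offset * height and gap < max_gap * height


def merge_boxes(boxes):
    """Greedily merge boxes on the same line, scanning left to right."""
    merged = []
    for box in sorted(boxes, key=lambda b: (b[0], b[1])):
        for i, current in enumerate(merged):
            if same_line(current, box):
                merged[i] = (min(current[0], box[0]), min(current[1], box[1]),
                             max(current[2], box[2]), max(current[3], box[3]))
                break
        else:
            merged.append(box)
    return merged


def stretch(box, ratio_x=1.05, ratio_y=1.2):
    """Scale a box about its centre so clipped strokes are not lost."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    half_w = (box[2] - box[0]) / 2 * ratio_x
    half_h = (box[3] - box[1]) / 2 * ratio_y
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


fragments = [(10, 10, 40, 20), (45, 11, 90, 21), (10, 40, 50, 50)]
lines = merge_boxes(fragments)   # the first two fragments share a line
padded = [stretch(box) for box in lines]
```

Stretching about the centre enlarges the box on all sides, which matches the stated goal of not losing content at the box edges.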
Step S500 performs matching between content elements and layout coordinates. The text box containing the certificate number is readily calculated from the coordinate information of the bounding box and the length information of the identity card number. Taking this text box as the reference, the regions containing the address, date of birth, gender, nationality, and name content elements can be roughly determined from the proportional relations of the certificate design, combined with parameters and the bounding box information. The position information of each text line is then matched against these regions, which finally yields the content element that each text box represents.
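The matching idea can be sketched as below. The source does not give the proportional parameters or the exact rule for locating the number box, so both are assumptions here: the number box is taken to be the widest text box in the lower third of the card, and the layout table holds placeholder offsets expressed in units of the number box height. All names and numbers are hypothetical.

```python
def find_number_box(text_boxes, card_box):
    """Assumed heuristic: the widest text box in the lower third of the card."""
    card_top, card_bottom = card_box[1], card_box[3]
    threshold = card_bottom - (card_bottom - card_top) / 3
    candidates = [b for b in text_boxes if (b[1] + b[3]) / 2 >= threshold]
    if not candidates:
        raise ValueError("no text box found in the lower third of the card")
    return max(candidates, key=lambda b: b[2] - b[0])


def field_regions(number_box, layout):
    """Turn offsets relative to the number box into absolute regions.

    layout maps a field name to (dx, dy, width, height), all measured in
    units of the number box height.
    """
    x0, y0, _, y1 = number_box
    unit = y1 - y0
    regions = {}
    for field, (dx, dy, w, h) in layout.items():
        left, top = x0 + dx * unit, y0 + dy * unit
        regions[field] = (left, top, left + w * unit, top + h * unit)
    return regions


def assign_fields(text_boxes, regions):
    """Label each text box with the field whose region contains its centre."""
    labelled = {}
    for box in text_boxes:
        cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
        for field, (rx0, ry0, rx1, ry1) in regions.items():
            if rx0 <= cx <= rx1 and ry0 <= cy <= ry1:
                labelled.setdefault(field, []).append(box)
                break
    return labelled


# Placeholder layout, not measured from any real certificate.
LAYOUT = {"name": (-3.0, -9.0, 6.0, 1.6), "address": (-3.0, -4.5, 9.0, 2.5)}

card_box = (0, 0, 100, 60)
name_box, address_box, number_box = (20, 6, 40, 12), (20, 30, 60, 36), (35, 50, 85, 55)
boxes = [name_box, address_box, number_box]

reference = find_number_box(boxes, card_box)
labelled = assign_fields(boxes, field_regions(reference, LAYOUT))
```

Expressing the offsets in units of the reference box makes the layout table independent of image resolution, which is one way to read "proportional relation of the certificate design".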
Step S600 performs structured extraction of content elements. For the text box corresponding to each content element, an image is cropped using the box coordinate information and then input into the character recognition model to obtain the text information of that content element. Text recognition is currently done with deep learning; text line recognition can make full use of the context information of the characters for modeling, so it performs better than segmenting individual characters and recognizing them one by one. Text line recognition is trained using public OCR text recognition methods and then fine-tuned on a partially annotated identity card corpus so that recognition is as accurate as possible. Because the accuracy of the recognition model is limited, the recognized characters may be irregular, so the result is finally normalized through a regular expression before output.
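The final regular-expression normalization might look like the sketch below for two identity card fields. It assumes an 18-character identity number (17 digits followed by a digit or `X`) and a date written with year, month, and day separated by non-digit characters. The confusion table is a small assumed sample of OCR letter and digit mix-ups, the checksum of the number is not verified, and the sample number is made up.

```python
import re

ID_PATTERN = re.compile(r"\d{17}[\dX]")
DATE_PATTERN = re.compile(r"(\d{4})\D+(\d{1,2})\D+(\d{1,2})")

# Common OCR confusions between letters and digits (an assumed, partial table).
CONFUSIONS = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "x": "X"})


def normalize_id_number(raw):
    """Return a cleaned 18-character number, or None when it does not match."""
    cleaned = re.sub(r"\s+", "", raw).translate(CONFUSIONS)
    return cleaned if ID_PATTERN.fullmatch(cleaned) else None


def normalize_birth_date(raw):
    """Convert text such as '1990年3月7日' to 'YYYY-MM-DD', or None."""
    match = DATE_PATTERN.search(raw)
    if not match:
        return None
    year, month, day = (int(part) for part in match.groups())
    return f"{year:04d}-{month:02d}-{day:02d}"


number = normalize_id_number("11010l 19900307 123x")  # made-up sample with OCR noise
birth = normalize_birth_date("1990年3月7日")
```

Returning `None` on a mismatch lets the caller flag the field for manual review instead of emitting a malformed value.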
In this embodiment, the certificate in the image may be an identity card, a driving license, a bank card, or the like. For an identity card, the text box containing the identity card number can be calculated from the coordinate information of the bounding box and the length information of the identity card number. Taking the text box of the identity card number as the reference, the regions containing the address, date of birth, gender, nationality, and name content elements can be roughly determined from the proportional relations of the identity card design, combined with parameters and the bounding box information. The position information of each text line is then matched against these regions, which finally yields the content element that each text box represents. For other types of certificates, the coordinate information of the text boxes corresponding to the content elements is obtained in the same way.
Example 2:
The certificate recognition system based on deep learning OCR and a layout structure of the invention performs text recognition on a certificate by the certificate identification method based on deep learning OCR and layout structure disclosed in embodiment 1. The system comprises an image rotation module, a certificate extraction module, a text extraction module, a text processing module, a content element and layout coordinate matching module, a content element structured extraction module, and a character normalization module. The image rotation module is configured to perform angle detection in four directions on the acquired certificate image based on the character direction and to rotate the certificate image so that the rotated certificate image is upright from a human viewing angle; the certificate extraction module is configured to train a certificate detection model based on transfer learning, detect the certificate in the rotated certificate image with the trained certificate detection model, and remove the background to obtain a target certificate image; the text extraction module is configured to perform text detection on the target certificate image by an OCR text detection method to obtain a plurality of initial text boxes; the text processing module is configured to remove from the initial text boxes the stray boxes that do not belong to content elements, merge the remaining initial text boxes based on the center point coordinates, length, width, and angle of each text box to obtain a plurality of merged text boxes, and stretch the merged text boxes in proportion to avoid missing content elements; the content element and layout coordinate matching module is configured to calculate the coordinate information of the text box corresponding to the certificate number based on the coordinate information of the certificate bounding box and the length information of the certificate number, and to calculate the coordinate information of the text box corresponding to each content element using the coordinate information of the certificate number text box as a reference; the content element structured extraction module is configured to crop the text box corresponding to each content element to obtain a text box image, and to perform text detection on each text box image by the OCR text detection method to obtain the text information corresponding to the content element; and the character normalization module is configured to normalize the text information corresponding to the content elements through a regular expression before output.
The image rotation module is configured to perform angle detection based on the character direction in the four directions of 0, 90, 180, and 270 degrees.
The certificate detection model is an SSD-MobileNet V1 model.
The text processing module is configured to remove from the initial text boxes the stray boxes that do not belong to content elements through the following steps:
(1) Removing text boxes with low confidence;
(2) Removing all boxes that lie outside the bounding box produced by certificate detection or that intersect it;
(3) Removing ghost text boxes that appear below the portrait because of reflection;
(4) Removing vertical boxes whose length-to-width ratio is smaller than a preset value;
(5) Removing text boxes in languages other than Chinese.
The system can recognize the text information in a certificate. For an identity card, the workflow is as follows: first, identity card image data is acquired through a smart device; angle detection in four directions is performed on the image to be processed, and the image is rotated by the detected angle to obtain an image that is upright from a human viewing angle; the approximate region containing the identity card is then located in the image by a deep object detection model; the regions of text lines in the image are detected by a deep OCR text line detection model; text boxes that are not identity card elements are deleted, suitable text boxes are merged according to the parameters, and the merged text boxes are stretched; the positions of the identity card elements are matched and located according to preset parameters with reference to the position of the identity card number; the extracted information of each element is normalized by regular expressions; and finally the identity card information is formatted and output. The system places low requirements on shooting and is convenient to use.
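The workflow above can be summarized as a composition of stages. The sketch below only fixes the order in which the stages are called; every stage is passed in as a callable, and all parameter names and signatures are hypothetical rather than taken from the source. The usage example wires in trivial stubs so the data flow can be followed without any model.

```python
from typing import Callable, Dict


def recognize_certificate(
    image,
    detect_angle: Callable,   # image -> 0 | 90 | 180 | 270
    rotate: Callable,         # (image, angle) -> image
    detect_card: Callable,    # image -> bounding box
    crop: Callable,           # (image, box) -> image
    detect_text: Callable,    # image -> list of boxes
    clean_boxes: Callable,    # list of boxes -> list of boxes (filter, merge, stretch)
    match_layout: Callable,   # list of boxes -> {field name: box}
    read_text: Callable,      # image -> str
    normalize: Callable,      # (field name, str) -> str
) -> Dict[str, str]:
    """Run the stages in the order described in the workflow."""
    upright = rotate(image, detect_angle(image))
    card = crop(upright, detect_card(upright))
    boxes = clean_boxes(detect_text(card))
    fields = match_layout(boxes)
    return {
        field: normalize(field, read_text(crop(card, box)))
        for field, box in fields.items()
    }


# Stub stages standing in for the real models, for illustration only.
result = recognize_certificate(
    "raw-image",
    detect_angle=lambda img: 0,
    rotate=lambda img, angle: img,
    detect_card=lambda img: (0, 0, 1, 1),
    crop=lambda img, box: (img, box),
    detect_text=lambda img: [(0, 0, 1, 1)],
    clean_boxes=lambda boxes: boxes,
    match_layout=lambda boxes: {"name": boxes[0]},
    read_text=lambda img: "  Zhang San ",
    normalize=lambda field, text: text.strip(),
)
```

Passing the stages as parameters mirrors the modular system structure of this embodiment, since each module can be swapped or tested on its own.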
Example 3:
an embodiment of the present invention further provides a computer-readable medium, where computer instructions are stored on the computer-readable medium, and when the computer instructions are executed by a processor, the processor is caused to execute the method disclosed in embodiment 1. Specifically, a system or an apparatus equipped with a storage medium on which software program codes that realize the functions of any of the above-described embodiments are stored may be provided, and a computer (or a CPU or MPU) of the system or the apparatus is caused to read out and execute the program codes stored in the storage medium.
In this case, the program code itself read from the storage medium can realize the functions of any of the above-described embodiments, and thus the program code and the storage medium storing the program code constitute a part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
Further, it should be clear that the functions of any one of the above-described embodiments may be implemented not only by executing the program code read out by the computer, but also by causing an operating system or the like operating on the computer to perform a part or all of the actual operations based on instructions of the program code.
Further, it is to be understood that the program code read out from the storage medium is written to a memory provided in an expansion board inserted into the computer or to a memory provided in an expansion unit connected to the computer, and then a CPU or the like mounted on the expansion board or the expansion unit is caused to perform part or all of the actual operations based on instructions of the program code, thereby realizing the functions of any of the embodiments described above.
It should be noted that not all steps and modules in the above flows and system structure diagrams are necessary, and some steps or modules may be omitted according to actual needs. The execution order of the steps is not fixed and can be adjusted as required. The system structures described in the above embodiments may be physical structures or logical structures, that is, some modules may be implemented by the same physical entity, or some modules may be implemented by a plurality of physical entities separately, or some components may be implemented together in a plurality of independent devices.
While the invention has been shown and described in detail in the drawings and in the preferred embodiments, it is not intended to limit the invention to the embodiments disclosed. It will be apparent to those skilled in the art that the technical means of the various embodiments described above may be combined to obtain further embodiments of the invention, which are also within the scope of the invention.

Claims (4)

1. The certificate recognition method based on deep learning OCR and layout structure is characterized by comprising the following steps:
for an input certificate image, angle detection is carried out in four directions of 0 degree, 90 degrees, 180 degrees and 270 degrees based on the character direction, the certificate image is rotated, and the rotated certificate image accords with the visual angle of a person;
training a certificate detection model based on transfer learning, wherein the certificate detection model is an SSD-MobileNet V1 model, and performing certificate identification on the rotated certificate image through the trained certificate detection model to remove a background image to obtain a target certificate image;
performing text detection on the target certificate image by an OCR text detection method to obtain a plurality of initial text boxes;
removing from the initial text boxes the miscellaneous boxes other than content elements, merging the remaining initial text boxes based on the center-point coordinates, length, width and angle of each text box to obtain a plurality of merged text boxes, and stretching the merged text boxes proportionally to avoid omitting content elements;
calculating the coordinate information of the text box corresponding to the certificate number based on the coordinate information of the certificate bounding box and the length information of the certificate number, and calculating the coordinate information of the text box corresponding to each content element by taking the coordinate information of the text box corresponding to the certificate number as a reference;
cutting the text box corresponding to each content element to obtain a text box image, and performing text detection on each text box image by an OCR text detection method to obtain text information corresponding to the content element;
wherein removing from the initial text boxes the miscellaneous boxes other than content elements comprises the following steps:
removing text boxes with low confidence;
removing all boxes that lie outside the bounding box obtained by certificate detection or intersect it;
removing ghost text boxes that appear below the portrait due to reflection;
removing vertical boxes whose length-to-width ratio is smaller than a preset value;
removing text boxes in languages other than Chinese.
2. The method for recognizing the document based on the deep learning OCR and the layout structure as claimed in claim 1, wherein the text information corresponding to the content element is obtained and then outputted in a normalized manner through a regular expression.
3. A system for certificate recognition based on deep learning OCR and layout structure, characterized in that text recognition is performed on the certificate by the method for certificate recognition based on deep learning OCR and layout structure as claimed in any one of claims 1-2, the system comprising:
the image rotation module is used for carrying out angle detection on the acquired certificate image in four directions of 0 degree, 90 degrees, 180 degrees and 270 degrees based on the character direction, carrying out rotation operation on the certificate image, and enabling the rotated certificate image to accord with the visual angle of a person;
the certificate extraction module is used for training a certificate detection model based on transfer learning, the certificate detection model is an SSD-MobileNet V1 model, certificate recognition is carried out on the rotated certificate image through the trained certificate detection model, and a background image is removed to obtain a target certificate image;
the text extraction module is used for carrying out text detection on the target certificate image by an OCR text detection method to obtain a plurality of initial text boxes;
the text processing module is used for removing from the initial text boxes the miscellaneous boxes other than content elements, merging the remaining initial text boxes based on the center-point coordinates, length, width and angle of each text box to obtain a plurality of merged text boxes, and stretching the merged text boxes proportionally to avoid omitting content elements;
the content element and layout coordinate matching module is used for calculating the coordinate information of the text box corresponding to the certificate number based on the coordinate information of the certificate bounding box and the length information of the certificate number, and calculating the coordinate information of the text box corresponding to each content element by taking the coordinate information of the text box corresponding to the certificate number as a reference;
the content element structured extraction module is used for cutting the text box corresponding to each content element to obtain a text box image, and performing text detection on each text box image by an OCR text detection method to obtain text information corresponding to the content element;
the character specification module is used for carrying out normalized output on the text information corresponding to the content elements through a regular expression;
wherein the text processing module removes from the initial text boxes the miscellaneous boxes other than content elements through the following steps:
removing text boxes with low confidence;
removing all boxes that lie outside the bounding box obtained by certificate detection or intersect it;
removing ghost text boxes that appear below the portrait due to reflection;
removing vertical boxes whose length-to-width ratio is smaller than a preset value;
removing text boxes in languages other than Chinese.
4. A computer readable medium having stored thereon computer instructions which, when executed by a processor, cause the processor to perform the method of any of claims 1 to 2.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110238213.8A CN112926469B (en) 2021-03-04 2021-03-04 Certificate identification method based on deep learning OCR and layout structure

Publications (2)

Publication Number Publication Date
CN112926469A CN112926469A (en) 2021-06-08
CN112926469B CN112926469B (en) 2022-12-27

Family

ID=76173252

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110238213.8A Active CN112926469B (en) 2021-03-04 2021-03-04 Certificate identification method based on deep learning OCR and layout structure

Country Status (1)

Country Link
CN (1) CN112926469B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113591657B (en) * 2021-07-23 2024-04-09 京东科技控股股份有限公司 OCR layout recognition method and device, electronic equipment and medium
CN113435449B (en) * 2021-08-03 2023-08-22 全知科技(杭州)有限责任公司 OCR image character recognition and paragraph output method based on deep learning
CN114332865B (en) * 2022-03-11 2022-06-03 北京锐融天下科技股份有限公司 Certificate OCR recognition method and system
CN114708603A (en) * 2022-05-25 2022-07-05 杭州咏柳科技有限公司 Method, system, device and medium for identifying key information in medical bill
CN115131806B (en) * 2022-06-07 2023-10-31 福建极推科技有限公司 Method and system for identifying OCR (optical character recognition) image information of various certificates based on deep learning

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898601A (en) * 2020-07-14 2020-11-06 浙江大华技术股份有限公司 Identity card element extraction method and device
CN112016547A (en) * 2020-08-20 2020-12-01 上海天壤智能科技有限公司 Image character recognition method, system and medium based on deep learning

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2320390A1 (en) * 2009-11-10 2011-05-11 Icar Vision Systems, SL Method and system for reading and validation of identity documents
CN109492643B (en) * 2018-10-11 2023-12-19 平安科技(深圳)有限公司 Certificate identification method and device based on OCR, computer equipment and storage medium
CN109934181A (en) * 2019-03-18 2019-06-25 北京海益同展信息科技有限公司 Text recognition method, device, equipment and computer-readable medium
CN109961064B (en) * 2019-03-20 2023-04-07 深圳华付技术股份有限公司 Identity card text positioning method and device, computer equipment and storage medium
CN110363199A (en) * 2019-07-16 2019-10-22 济南浪潮高新科技投资发展有限公司 Certificate image text recognition method and system based on deep learning
CN111639648B (en) * 2020-05-26 2023-09-19 浙江大华技术股份有限公司 Certificate identification method, device, computing equipment and storage medium
CN111783757A (en) * 2020-06-01 2020-10-16 成都科大极智科技有限公司 OCR technology-based identification card recognition method in complex scene
CN111783761A (en) * 2020-06-30 2020-10-16 苏州科达科技股份有限公司 Certificate text detection method and device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant