CN111414905B - Text detection method, text detection device, electronic equipment and storage medium - Google Patents


Publication number
CN111414905B
Authority
CN
China
Prior art keywords
text
detection
region
seal
text detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010117641.0A
Other languages
Chinese (zh)
Other versions
CN111414905A (en
Inventor
张博熠
马文伟
刘设伟
王亚领
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd
Original Assignee
Taikang Insurance Group Co Ltd
Taikang Online Property Insurance Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/62 Text, e.g. of license plates, overlay texts or captions on TV images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06K GRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K7/00 Methods or arrangements for sensing record carriers, e.g. for reading patterns
    • G06K7/10 Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation
    • G06K7/14 Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation using light without selection of wavelength, e.g. sensing reflected white light
    • G06K7/1404 Methods for optical code recognition
    • G06K7/1408 Methods for optical code recognition the method being specifically adapted for the type of code
    • G06K7/1417 2D bar codes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/242 Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The application provides a text detection method, a text detection device, electronic equipment and a storage medium, wherein a picture to be detected is firstly obtained; then detecting a target text area in the picture to be detected, wherein the target text area comprises at least one of a header text area, a seal text area and a layout text area; determining a target text detection model corresponding to the target text region according to a preset corresponding relation between the text region and the text detection model; and detecting the text in the target text region by using the target text detection model to obtain a target text detection box. According to the technical scheme, the target text region in the picture to be detected is detected, the text in the region is detected by adopting the text detection model matched with the target text region in a targeted manner by combining the advantages of different text detection models, the integrity of text detection is improved, and powerful support is provided for subsequent text recognition.

Description

Text detection method, text detection device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image recognition technologies, and in particular, to a text detection method, a text detection device, an electronic device, and a storage medium.
Background
Nowadays, with economic development and rising living standards, more and more people choose to purchase medical, commercial, financial, and other insurance. Some insurance companies have gradually begun to offer self-service claim settlement. For example, in the medical claim settlement process, users only need to photograph their outpatient or inpatient bills and upload them to the insurance company's system, and insurance company operators then enter the information on the uploaded bill pictures into the claim settlement system.
Optical character recognition (OCR) can improve the efficiency of bill entry to a certain extent. However, pictures uploaded by users are affected by the shooting angle, so curved and inclined characters appear; in addition, because bill layouts are complex, they contain long header text, characters inside seals, and the like, all of which greatly increase the difficulty of character detection and recognition. For complex layouts, related text detection techniques either fail to detect text or detect it incompletely, and the quality of detection directly affects the final text recognition result. Solving text detection in complex layouts is therefore particularly important in optical character recognition.
Disclosure of Invention
The application provides a text detection method, a text detection device, electronic equipment and a storage medium, which are used for improving the integrity of text detection.
In order to solve the problems, the application discloses a text detection method, which comprises the following steps:
acquiring a picture to be detected;
detecting a target text region in the picture to be detected, wherein the target text region comprises at least one of a header text region, a seal text region and a layout text region;
determining a target text detection model corresponding to the target text region according to a preset corresponding relation between the text region and the text detection model, wherein the corresponding relation between the text region and the text detection model comprises at least one of the following: performing text detection on the seal text region by adopting a first Psenet model, performing text detection on the header text region by adopting a second Psenet model, and performing text detection on the layout text region by adopting an EAST model;
and detecting the text in the target text region by adopting the target text detection model to obtain a target text detection box.
In an optional implementation manner, the step of acquiring the picture to be detected includes:
receiving an original bill picture;
determining the rotation angle of the original bill picture according to the included angle between the edge straight line and the horizontal axis of the original bill picture;
and rotating the original bill picture to the horizontal direction according to the rotation angle to obtain the picture to be detected.
In an optional implementation manner, before the step of determining the rotation angle of the original bill picture according to the included angle between the edge straight line and the horizontal axis of the original bill picture, the method further includes:
performing edge detection on the original bill picture using Hough line detection to obtain the edge straight line of the original bill picture.
In an optional implementation manner, the step of detecting the target text region in the picture to be detected includes:
performing seal detection on the picture to be detected to obtain a seal detection frame;
determining a preprinted seal area and a non-preprinted seal area according to the length-width ratio of the seal detection frame;
and determining the pre-printing seal area and/or the non-pre-printing seal area as the seal text area.
In an alternative implementation manner, the step of detecting the text in the target text area by using the target text detection model to obtain a target text detection box includes:
detecting the text in the seal text area by adopting a first Psenet model obtained by pre-training to obtain an initial seal text detection box;
and carrying out text correction on the initial seal text detection box to obtain a horizontal seal text detection box.
In an optional implementation manner, the step of detecting the target text region in the picture to be detected includes:
detecting the picture to be detected by adopting a second Psenet model obtained through pre-training to obtain a plurality of text detection boxes;
determining a header text region according to the length-width ratio of the text detection boxes;
the step of detecting the text in the target text region by using the target text detection model to obtain a target text detection box comprises the following steps:
and determining a text detection box corresponding to the header text region in the text detection boxes as a header text detection box.
In an optional implementation manner, the step of detecting the target text region in the picture to be detected includes:
determining the area except the header text area and the seal text area in the picture to be detected as the layout text area;
the step of detecting the text in the target text region by using the target text detection model to obtain a target text detection box comprises the following steps:
and detecting the text in the layout text area by adopting an EAST model obtained through pre-training to obtain a layout text detection box.
In order to solve the above problems, the present application also discloses a text detection device, which includes:
the acquisition module is configured to acquire a picture to be detected;
the region detection module is configured to detect a target text region in the picture to be detected, wherein the target text region comprises at least one of a header text region, a seal text region and a layout text region;
the model determining module is configured to determine a target text detection model corresponding to the target text region according to a preset corresponding relation between the text region and the text detection model, wherein the corresponding relation between the text region and the text detection model comprises at least one of the following: performing text detection on the seal text region by adopting a first Psenet model, performing text detection on the header text region by adopting a second Psenet model, and performing text detection on the layout text region by adopting an EAST model;
and the text detection module is configured to detect the text in the target text area by adopting the target text detection model to obtain a target text detection box.
In an alternative implementation, the acquisition module is specifically configured to:
receiving an original bill picture;
determining the rotation angle of the original bill picture according to the included angle between the edge straight line and the horizontal axis of the original bill picture;
and rotating the original bill picture to the horizontal direction according to the rotation angle to obtain the picture to be detected.
In an alternative implementation, the acquisition module is further configured to:
performing edge detection on the original bill picture using Hough line detection to obtain the edge straight line of the original bill picture.
In an alternative implementation, the area detection module is specifically configured to:
performing seal detection on the picture to be detected to obtain a seal detection frame;
determining a preprinted seal area and a non-preprinted seal area according to the length-width ratio of the seal detection frame;
and determining the pre-printing seal area and/or the non-pre-printing seal area as the seal text area.
In an alternative implementation, the text detection module is specifically configured to:
detecting the text in the seal text area by adopting a first Psenet model obtained by pre-training to obtain an initial seal text detection box;
and carrying out text correction on the initial seal text detection box to obtain a horizontal seal text detection box.
In an alternative implementation, the area detection module is specifically configured to:
detecting the picture to be detected by adopting a second Psenet model obtained through pre-training to obtain a plurality of text detection boxes;
determining a header text region according to the length-width ratio of the text detection boxes;
the text detection module is specifically configured to:
and determining a text detection box corresponding to the header text region in the text detection boxes as a header text detection box.
In an alternative implementation, the area detection module is specifically configured to:
determining the area except the header text area and the seal text area in the picture to be detected as the layout text area;
the text detection module is specifically configured to:
and detecting the text in the layout text area by adopting an EAST model obtained through pre-training to obtain a layout text detection box.
In order to solve the above problems, the present application also discloses an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the text detection method of any of the embodiments.
In order to solve the above problems, the present application also discloses a storage medium; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to execute the text detection method according to any embodiment.
Compared with the prior art, the application has the following advantages:
the technical scheme of the application provides a text detection method, a text detection device, electronic equipment and a storage medium, wherein a picture to be detected is firstly obtained; then detecting a target text area in the picture to be detected, wherein the target text area comprises at least one of a header text area, a seal text area and a layout text area; determining a target text detection model corresponding to the target text region according to a preset corresponding relation between the text region and the text detection model, wherein the corresponding relation between the text region and the text detection model comprises at least one of the following: performing text detection on the seal text region by adopting a first Psenet model, performing text detection on the header text region by adopting a second Psenet model, and performing text detection on the layout text region by adopting an EAST model; and detecting the text in the target text region by using the target text detection model to obtain a target text detection box. According to the technical scheme, the target text region in the picture to be detected is detected, the text in the region is detected by adopting the text detection model matched with the target text region in a targeted manner by combining the advantages of different text detection models, the integrity of text detection is improved, and powerful support is provided for subsequent text recognition.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart showing steps of a text detection method according to an embodiment of the present application;
FIG. 2 is a flowchart showing steps for detecting seal text according to an embodiment of the present application;
FIG. 3 is a flowchart showing steps for detecting header text according to one embodiment of the present application;
FIG. 4 is a flowchart showing steps for detecting layout text according to an embodiment of the present application;
FIG. 5 is a flowchart showing steps of a specific implementation manner of a text detection method according to an embodiment of the present application;
FIG. 6 is a diagram showing the effect of text detection on a manual seal using a first Psenet model according to an embodiment of the present application;
FIG. 7 is a diagram showing the effect of text detection on a pre-printed seal using a first Psenet model according to an embodiment of the present application;
FIG. 8 is a diagram showing the effect of text detection on header text using a second Psenet model according to an embodiment of the present application;
FIG. 9 shows an effect diagram of text detection of header text using the EAST model;
FIG. 10 is a diagram showing the effect of text detection on layout text using EAST model according to one embodiment of the present application;
FIG. 11 is a block diagram showing a structure of a text detection device according to an embodiment of the present application.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present disclosure, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the foregoing figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the disclosure described herein may be capable of operation in sequences other than those illustrated or described herein. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Text detection is an important and indispensable link in optical character recognition, and what matters most for it is completeness. Bill layouts are complex: they contain long header text, seal text, handwritten text, and so on, as well as curved and inclined text caused by the shooting angle. Through analysis, the inventors found that the root cause of the prior art's difficulty in completely identifying the text in a complex layout is that only one model is used to detect all the text types in that layout.
To solve the above problems, an exemplary embodiment of the present application shows a flowchart of a text detection method, as shown in fig. 1, which may include the steps of:
in step S11, a picture to be detected is acquired.
In a specific implementation, an original bill picture can be received first, and then correction processing is carried out on the original bill picture to obtain a picture to be detected.
For example, this step may further include: receiving an original bill picture; determining the rotation angle of the original bill picture according to the included angle between the edge straight line and the horizontal axis of the original bill picture; and rotating the original bill picture to the horizontal direction according to the rotation angle to obtain a picture to be detected.
The edge straight line of the original bill picture can be obtained by performing edge detection on the original bill picture with Hough line detection. Hough line detection finds the horizontal and vertical edge straight lines in the original bill picture; the picture is then rotated to the horizontal direction according to the included angle between those edge lines and the horizontal axis, so that the characters in the bill picture can subsequently be detected and recognized.
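The angle computation described above can be sketched as follows. This is a minimal illustration, assuming the edge straight lines have already been detected and are given as endpoint tuples (x1, y1, x2, y2); the function names, the folding of angles into (-45, 45], and the use of the median are assumptions made for this sketch and are not specified by the application.

```python
import math
from statistics import median

def line_angle_deg(x1, y1, x2, y2):
    """Angle between a line and the horizontal axis, folded into (-45, 45].

    Folding lets near-horizontal and near-vertical edges of the same bill
    report the same skew. The sign convention depends on whether the y axis
    points up or down in the image coordinate system.
    """
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    angle = angle % 90.0          # now in [0, 90)
    if angle > 45.0:
        angle -= 90.0             # now in (-45, 45]
    return angle

def estimate_rotation_deg(edge_lines):
    """Estimate the skew of the bill from its edge lines (hypothetical helper)."""
    if not edge_lines:
        return 0.0                # no edges found: assume the picture is level
    # The median is an illustrative choice that tolerates a few stray lines.
    return median(line_angle_deg(*line) for line in edge_lines)
```

The picture would then be rotated by the opposite of the estimated angle with whatever image library is in use; that step is omitted here because it depends on the library's own rotation convention.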
In step S12, a target text region in the picture to be detected is detected, the target text region including at least one of a header text region, a seal text region, and a layout text region.
In a specific implementation, seal detection can be performed on the picture to be detected to obtain a seal text region in the picture to be detected; a text detection algorithm (such as a Psenet algorithm) can be adopted to carry out text detection on the picture to be detected, and a header text region is determined according to the length-width ratio of the text detection box; the non-seal text area and the non-header text area in the picture to be detected can be determined as layout text areas.
The target text region may also include a handwritten text region, a tabular text region, a capped text region, and the like.
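As a rough illustration of how the layout text region could be derived as "everything that is not seal or header", the sketch below builds a boolean mask over the picture. The box format (x0, y0, x1, y1) with exclusive right and bottom edges, and the function name, are assumptions for this example only.

```python
def layout_mask(width, height, excluded_boxes):
    """Return a height x width grid; True marks pixels in the layout text region.

    excluded_boxes holds the header and seal text regions as (x0, y0, x1, y1)
    rectangles (a hypothetical format); boxes are clipped to the picture.
    """
    mask = [[True] * width for _ in range(height)]
    for x0, y0, x1, y1 in excluded_boxes:
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = False
    return mask
```

In practice an array library would do this far more efficiently; nested lists are used here only to keep the sketch free of dependencies.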
In step S13, a target text detection model corresponding to the target text region is determined according to a preset correspondence between the text region and the text detection model, wherein the correspondence between the text region and the text detection model includes at least one of the following: performing text detection on the seal text region by adopting a first Psenet model, performing text detection on the header text region by adopting a second Psenet model, and performing text detection on the layout text region by adopting an EAST model.
In practical applications there are many text detection models, such as Fast RCNN, SSD, YOLO, Psenet, EAST, RRCNN, TextBoxes, and CTPN, and each has its own strengths and weaknesses. For example, the deep learning model EAST detects horizontal or inclined quadrilateral block text well, but it cannot completely detect long header text (once a text line is too long, it is detected in segments) and cannot effectively detect text inside a seal or two-dimensional codes. Another deep learning model, Psenet, detects curved text (such as text inside a seal), long block text (such as header text), and two-dimensional codes well, but it is prone to detection errors on text lines that are close together (in either vertical or horizontal spacing).
Thus, each text detection model is adapted to detect a different text type (text region), i.e. there is a correspondence between the text region and the matching text detection model. Specifically, the text detection can be performed on the seal text area by adopting the first Psenet model, the text detection can be performed on the header text area by adopting the second Psenet model, and the text detection can be performed on the layout text area by adopting the EAST model. By combining the advantages of different text detection models and adopting a proper text detection model to detect the corresponding text types, the complete and effective detection of various text types in the complex layout can be realized.
In step S14, the text in the target text region is detected using the target text detection model, and a target text detection box is obtained.
In a specific implementation, the text in the target text region is detected by adopting the target text detection model determined in the step S13, so as to obtain a target text detection box.
For example, a first Psenet model may be used to detect the seal text region, so as to obtain a seal text detection box; detecting the header text region by adopting a second Psenet model to obtain a header text detection box; and detecting the layout text area by adopting an EAST model to obtain a layout text detection box.
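The preset correspondence of steps S13 and S14 can be pictured as a lookup table from region type to detector. The sketch below is a minimal illustration under stated assumptions: the keys, the model names, and the detector call signature are all hypothetical, and the detectors passed in stand for whatever trained Psenet and EAST models an implementation actually provides.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), a hypothetical box format

# Preset correspondence between text region type and text detection model.
# The string keys and model names are illustrative placeholders.
REGION_TO_MODEL: Dict[str, str] = {
    "seal": "psenet_1",    # first Psenet model
    "header": "psenet_2",  # second Psenet model
    "layout": "east",      # EAST model
}

def select_model(region_type: str) -> str:
    """Step S13: look up the model that corresponds to a region type."""
    try:
        return REGION_TO_MODEL[region_type]
    except KeyError:
        raise ValueError(f"no text detection model for region type {region_type!r}")

def detect_all(regions: List[Tuple[str, Box]],
               detectors: Dict[str, Callable[[Box], list]]) -> Dict[str, list]:
    """Step S14: run the matching detector on every target text region."""
    results: Dict[str, list] = {}
    for region_type, box in regions:
        detector = detectors[select_model(region_type)]
        results.setdefault(region_type, []).extend(detector(box))
    return results
```

Keeping the correspondence in a table means another region type (for example a handwritten text region) could be supported by adding one entry and one detector, without touching the dispatch logic.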
In practical application, the detected target text detection box can be input into a deep learning recognition engine to output a text recognition result.
This embodiment can be applied to OCR emergency bill entry (for example, medical bill claims in the insurance industry), where it handles character detection for the OCR emergency bill pictures and can address pain points in links such as checking and claim settlement in the insurance industry. It can completely and effectively detect not only quadrilateral block text but also long header text, handwritten text, table text, capped text, seal text (curved text), and two-dimensional codes, so it can also be applied to text detection for other text types such as long text, curved text, or quadrilateral block text.
In the text detection method provided by the embodiment, the target text region in the picture to be detected is detected first, and the text in the region is detected by adopting the text detection model suitable for the target text region in a targeted manner by combining the advantages or detection characteristics of different text detection models, so that the integrity of text detection is improved, powerful support is provided for subsequent text recognition, and the problem that various text types in complex layouts cannot be completely detected by adopting a single model in the prior art is solved.
In an alternative implementation, step S12 may further include performing seal detection on the picture to be detected to obtain a seal text region. Further, referring to FIG. 2, this step may include:
in step S21, seal detection is performed on the picture to be detected, and a seal detection frame is obtained.
In step S22, a pre-printed seal area and a non-pre-printed seal area are determined according to the aspect ratio of the seal detection frame.
In step S23, the pre-printed seal area and/or the non-pre-printed seal area is determined as the seal text area.
In a specific implementation, a technique such as LCSELLIPSE can be used to perform seal detection on the picture to be detected, yielding the seal detection frame (that is, the circumscribed rectangle of the seal), whose aspect ratio is then calculated. Because the aspect ratio of a pre-printed seal is fixed (the preset aspect ratio), a seal detection frame whose aspect ratio satisfies the preset aspect ratio is determined to be a pre-printed seal area (for example, the ratio is considered satisfied when the absolute value of the difference between the two is smaller than a preset threshold), and a seal detection frame whose aspect ratio does not satisfy it (for example, when the absolute value of the difference is greater than or equal to the preset threshold) is determined to be a non-pre-printed seal area (for example, a manual seal area).
Since a non-pre-printed seal generally contains information such as a region, the non-pre-printed seal area can be determined as the seal text area in this case. In practical applications, the seal text area may be the pre-printed seal area, the non-pre-printed seal area, or both, according to actual requirements.
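The aspect-ratio test of step S22 can be sketched as below. The box format (x0, y0, x1, y1), the reading of "aspect ratio" as width divided by height, and the function names are assumptions for illustration; the application does not give concrete values for the preset aspect ratio or the threshold, so callers must supply their own.

```python
def aspect_ratio(box):
    """Width divided by height of a detection frame (assumes positive height)."""
    x0, y0, x1, y1 = box
    return (x1 - x0) / (y1 - y0)

def split_seals(seal_boxes, preset_ratio, threshold):
    """Split seal detection frames into pre-printed and non-pre-printed areas.

    A frame counts as pre-printed when |aspect ratio - preset_ratio| < threshold,
    and as non-pre-printed (e.g. a manual seal) otherwise.
    """
    preprinted, non_preprinted = [], []
    for box in seal_boxes:
        if abs(aspect_ratio(box) - preset_ratio) < threshold:
            preprinted.append(box)
        else:
            non_preprinted.append(box)
    return preprinted, non_preprinted
```

Either list, or both, can then be treated as the seal text area, matching the "and/or" of step S23.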
In a specific implementation, step S14 may further include:
in step S24, the text in the seal text region is detected by using the first Psenet model obtained by training in advance, and an initial seal text detection box is obtained.
In step S25, the initial seal text detection box is subjected to text correction, and a horizontal seal text detection box is obtained.
In a specific implementation, the seal text region can be cropped out of the picture to be detected, and the first Psenet model is then used to perform text detection on it to obtain the initial seal text detection box. FIG. 6 shows the effect of text detection on a manual seal using the first Psenet model, and FIG. 7 shows the effect of text detection on a pre-printed seal using the first Psenet model.
Because the initial seal text detection boxes obtained through detection are mostly curved or inclined, in order to facilitate subsequent text recognition, a text correction network can be adopted to carry out text correction on the initial seal text detection boxes, so that horizontal seal text detection boxes are obtained, and then the horizontal seal text detection boxes are sent to a deep learning recognition engine for text recognition.
The first Psenet model is obtained in advance by training the progressive scale expansion network Psenet with curved text as training samples. Psenet is an instance segmentation network that can locate text of arbitrary shape; its progressive scale expansion algorithm can successfully distinguish adjacent text instances.
This implementation uses the Psenet model to detect the text in the seal. Because the Psenet model can effectively detect long block text, the completeness of text detection can be improved.
In an alternative implementation, referring to fig. 3, in step S12, it may further include:
in step S31, a second Psenet model obtained by training in advance is adopted to detect the picture to be detected, so as to obtain a plurality of text detection boxes.
In step S32, a header text region is determined based on the aspect ratios of the plurality of text detection boxes.
Accordingly, in step S14, it may further include:
in step S33, a text detection box corresponding to the header text region in the plurality of text detection boxes is determined as a header text detection box.
In a specific implementation, the second Psenet model can be used to perform text detection on the whole picture to be detected, yielding a plurality of text detection boxes. The text detection boxes are then screened by their features (such as aspect ratio): the area of the text detection box with the largest aspect ratio is taken as the header text region, and that box is the header text detection box, which completes header text detection. Fig. 8 shows the effect of text detection on header text using the second Psenet model, and fig. 9 shows the effect of text detection on header text using the EAST model; it can be seen that the second Psenet model detects long header text more completely.
The second Psenet model is obtained in advance by training the progressive scale expansion network Psenet with long quadrilateral text (horizontal or inclined) as training samples.
This implementation uses the Psenet model to detect the long header text. Because the Psenet model can effectively detect long block text, the completeness of text detection can be further improved.
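The screening of text detection boxes by aspect ratio can be sketched as follows. This is only an illustration under assumptions: each box is represented as four corner points ordered top-left, top-right, bottom-right, bottom-left, and the helper names are hypothetical. The aspect ratio is computed from edge lengths so that inclined quadrilaterals are handled as well as horizontal ones.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
# Four corners ordered top-left, top-right, bottom-right, bottom-left (assumed).
Quad = Sequence[Point]


def quad_aspect_ratio(quad: Quad) -> float:
    """Length-to-width ratio of a horizontal or inclined quadrilateral."""
    tl, tr, br, bl = quad
    length = (math.dist(tl, tr) + math.dist(bl, br)) / 2  # mean of top and bottom edges
    width = (math.dist(tl, bl) + math.dist(tr, br)) / 2   # mean of left and right edges
    if width == 0:
        raise ValueError("degenerate quadrilateral")
    return length / width


def pick_header(quads: List[Quad]) -> Quad:
    """Return the text detection box with the largest aspect ratio."""
    return max(quads, key=quad_aspect_ratio)


# Two hypothetical detections: a long header line and an ordinary text block.
quads = [
    [(0, 0), (400, 0), (400, 20), (0, 20)],
    [(0, 50), (100, 50), (100, 80), (0, 80)],
]
header = pick_header(quads)
```

With these sample boxes the first quadrilateral (aspect ratio 20) is selected as the header text detection box.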
In an alternative implementation, referring to fig. 4, in step S12, it may further include:
in step S41, the region of the picture to be detected excluding the header text region and the seal text region is determined as the layout text region.
Accordingly, in step S14, it may further include:
in step S42, text in the layout text area is detected by using the EAST model obtained by training in advance, and a layout text detection box is obtained.
In a specific implementation, the regions of the picture to be detected that are neither header text regions nor seal text regions can be determined as the layout text region; the EAST model is then used to perform text detection on the layout text region to obtain a layout text detection box. Fig. 10 shows the effect of text detection on layout text using the EAST model.
The EAST model is obtained in advance by training the EAST network with quadrilateral text (horizontal or inclined) as training samples. The EAST network contains three parts: a feature extractor stem (feature extraction branch), a feature-merge branch, and an output layer.
This implementation uses the EAST model to detect the layout text. Because the EAST model performs well on horizontal or inclined quadrilateral text blocks, the implementation can obtain more complete and finer detection results for quadrilateral text blocks.
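One simple way to represent "the region excluding the header text region and the seal text region" is a boolean mask over the picture. The sketch below assumes axis-aligned, end-exclusive rectangles and uses plain nested lists for clarity; the function name and the sample rectangles are hypothetical, and a real system would more likely operate on image arrays.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]  # (x0, y0, x1, y1), end-exclusive (assumed layout)


def layout_mask(width: int, height: int, excluded: List[Rect]) -> List[List[bool]]:
    """Return a height x width mask where True marks the layout text region."""
    mask = [[True] * width for _ in range(height)]
    for x0, y0, x1, y1 in excluded:
        # Clip each excluded rectangle to the picture before clearing it.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = False
    return mask


# A tiny 8 x 4 picture with a header strip and a seal rectangle excluded.
header_rect = (0, 0, 8, 1)
seal_rect = (5, 2, 8, 4)
mask = layout_mask(8, 4, [header_rect, seal_rect])
layout_pixels = sum(row.count(True) for row in mask)
```

Pixels left as True belong to the layout text region that is passed to the EAST model; in this toy example 18 of the 32 pixels remain.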
Referring to fig. 5, a flow chart of a specific implementation of the multi-mechanism joint text detection method provided in this embodiment is shown. The implementation is mainly divided into 4 steps:
Step 1: Perform rotation correction on the input bill image to obtain a horizontal bill image (the picture to be detected).
Step 2: Perform seal detection on the picture to be detected. The position of the pre-printed seal is detected using the fixed aspect ratio of pre-printed seals, which in turn determines the coordinate positions of the other seals. The other seals (that is, non-pre-printed seals) are then cropped out, text detection is performed on them using the Psenet algorithm, the detection results are corrected by the text correction network, and the corrected results are finally sent to the deep learning recognition engine for character recognition. This step addresses the detection of text inside seals.
Step 3: Perform Psenet text detection on the picture to be detected, apply feature screening to the detected quadrilaterals (text detection boxes), and take the quadrilateral with the largest aspect ratio as the header text detection box. This step addresses header text detection.
Step 4: Perform text detection on the bill layout using the EAST engine. This step aims to obtain more complete and finer detection results for quadrilateral text blocks.
In this implementation, the Psenet model and seal detection are first used to detect the long header text and the seals in the picture to be detected. Because the long header text has the largest aspect ratio and a pre-printed bill seal has a fixed format and aspect ratio, the long header text area and the pre-printed seal area can be detected easily. The other seals obtained by seal detection are regarded as manually stamped official seals (that is, non-pre-printed seals); their areas are cropped out, and the Psenet model is then used to detect the text in them. Finally, EAST is used to perform text detection on the whole layout.
This technical solution combines a bill rotation correction algorithm, a seal detection algorithm, the EAST deep learning text detection engine, and the Psenet deep learning engine for curved text detection to detect text in complex medical bill layouts, which improves the completeness of text detection. The embodiment combines the advantages of different models to solve the problem of detecting layout text under complex conditions.
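The four-step flow can be summarized as a small orchestration function. This is a sketch only: the function and parameter names are hypothetical, and each stage is passed in as a callable so that the rotation correction, seal text detection (Psenet), header detection (Psenet), and layout detection (EAST) can be supplied by whatever implementation is available. The lambdas below are stubs standing in for the real models.

```python
from typing import Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1)
Image = object                   # placeholder type; a real system would use an image array


def detect_bill_text(
    image: Image,
    correct_rotation: Callable[[Image], Image],
    detect_seal_text: Callable[[Image], List[Box]],
    detect_header_text: Callable[[Image], List[Box]],
    detect_layout_text: Callable[[Image], List[Box]],
) -> Dict[str, List[Box]]:
    """Run the four steps in order and collect boxes per text type."""
    horizontal = correct_rotation(image)               # step 1: rotation correction
    return {
        "seal": detect_seal_text(horizontal),          # step 2: seal text
        "header": detect_header_text(horizontal),      # step 3: header text
        "layout": detect_layout_text(horizontal),      # step 4: layout text
    }


# Stub stages for illustration only; they return fixed boxes.
result = detect_bill_text(
    image="raw-bill",
    correct_rotation=lambda img: f"horizontal({img})",
    detect_seal_text=lambda img: [(300, 40, 420, 160)],
    detect_header_text=lambda img: [(0, 0, 400, 20)],
    detect_layout_text=lambda img: [(0, 50, 100, 80), (0, 90, 180, 120)],
)
```

Keeping the stages as injected callables makes it straightforward to swap one detector for another without changing the overall flow.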
Fig. 11 is a block diagram of a text detection device according to an exemplary embodiment. Referring to fig. 11, the apparatus may include:
an acquisition module 111 configured to acquire a picture to be detected;
a region detection module 112 configured to detect a target text region in the picture to be detected, the target text region including at least one of a header text region, a seal text region, and a layout text region;
a model determining module 113 configured to determine a target text detection model corresponding to the target text region according to a preset correspondence between the text region and the text detection model, wherein the correspondence between the text region and the text detection model includes at least one of: performing text detection on the seal text region by adopting a first Psenet model, performing text detection on the header text region by adopting a second Psenet model, and performing text detection on the layout text region by adopting an EAST model;
the text detection module 114 is configured to detect the text in the target text region by using the target text detection model, so as to obtain a target text detection box.
In a specific implementation, the acquisition module 111 may first receive an original bill picture and then correct it to obtain the picture to be detected.
Further, the acquisition module 111 is specifically configured to: receive an original bill picture; determine the rotation angle of the original bill picture according to the included angle between an edge straight line of the original bill picture and the horizontal axis; and rotate the original bill picture to the horizontal direction according to the rotation angle to obtain the picture to be detected.
The edge straight line of the original bill picture can be obtained by performing edge detection on the original bill picture using Hough line detection. Hough line detection can find the horizontal and vertical edge straight lines in the original bill picture; the picture is then rotated to the horizontal direction according to the included angles between these edge straight lines and the horizontal axis, so that the characters in the bill picture can subsequently be detected and recognized.
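As an illustration of deriving a rotation angle from edge straight lines, the following sketch assumes the lines have already been detected and are given as segment endpoints. Folding every angle into the range (-45, 45] degrees lets near-vertical edges contribute their deviation from the vertical, and taking the median is an added assumption that makes the estimate less sensitive to stray lines. The function names are hypothetical, and the sign of the rotation actually applied depends on the coordinate convention of the image library used.

```python
import math
from statistics import median
from typing import List, Tuple

Line = Tuple[float, float, float, float]  # (x1, y1, x2, y2) segment endpoints


def line_angle(line: Line) -> float:
    """Angle between a segment and the horizontal axis, folded into (-45, 45] degrees."""
    x1, y1, x2, y2 = line
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    # Fold by multiples of 90 degrees so vertical edges report their
    # deviation from vertical rather than an angle near 90.
    while angle > 45:
        angle -= 90
    while angle <= -45:
        angle += 90
    return angle


def rotation_angle(lines: List[Line]) -> float:
    """Median skew angle of the edge lines; 0.0 when no line was found."""
    if not lines:
        return 0.0
    return median(line_angle(line) for line in lines)


# Two near-horizontal edges and one near-vertical edge with the same skew.
lines = [(0, 0, 100, 5), (0, 50, 100, 55), (10, 0, 5, 100)]
angle = rotation_angle(lines)
```

For these sample lines the estimated skew is about 2.86 degrees; rotating the picture by the opposite amount would bring the edges back to the horizontal and vertical directions.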
The region detection module 112 can perform seal detection on the picture to be detected to obtain a seal text region in the picture to be detected; the region detection module 112 may perform text detection on the picture to be detected by using a text detection algorithm (such as Psenet algorithm), and determine a header text region according to an aspect ratio of the text detection box; the region detection module 112 may determine the non-seal text region and the non-header text region in the picture to be detected as layout text regions.
The target text region may also include a handwritten text region, a tabular text region, a capped text region, and the like.
In practical applications, there are various text detection models, such as fast RCNN, SSD, YOLO, Psenet, EAST, RRCNN, TextBoxes, and CTPN; however, each text detection model has its own strengths and weaknesses. For example, the deep learning model EAST performs well on horizontal or inclined quadrilateral block text, but it cannot completely detect long header text (overly long text tends to be detected in fragments) and cannot effectively detect text in seals or two-dimensional codes. The other deep learning model, Psenet, performs well on curved text (such as text in a seal), long block text (such as header text), and two-dimensional codes, but it is prone to detection errors on text that is closely spaced (in either vertical or horizontal spacing).
Thus, each text detection model is adapted to detect a different text type (text region), i.e. there is a correspondence between the text region and the matching text detection model. Specifically, the text detection can be performed on the seal text area by adopting the first Psenet model, the text detection can be performed on the header text area by adopting the second Psenet model, and the text detection can be performed on the layout text area by adopting the EAST model. By combining the advantages of different text detection models and adopting a proper text detection model to detect the corresponding text types, the complete and effective detection of various text types in the complex layout can be realized.
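The preset correspondence between text regions and text detection models can be pictured as a simple lookup table. The sketch below uses hypothetical string keys and model identifiers; in a real system the values would be the loaded model objects rather than names.

```python
# Preset correspondence between text region types and text detection models.
# The keys and identifiers are illustrative names, not part of any library.
REGION_TO_MODEL = {
    "seal": "first_psenet",     # curved text inside seals
    "header": "second_psenet",  # long header text
    "layout": "east",           # horizontal or inclined quadrilateral blocks
}


def select_model(region_type: str) -> str:
    """Return the identifier of the model registered for a region type."""
    try:
        return REGION_TO_MODEL[region_type]
    except KeyError:
        raise ValueError(
            f"no text detection model registered for region {region_type!r}"
        ) from None


chosen = [select_model(region) for region in ("seal", "header", "layout")]
```

Supporting a further region type, such as a handwritten text region, would then amount to registering one more entry in the table.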
In a specific implementation, the text detection module 114 detects the text in the target text area by using the target text detection model determined by the model determination module 113, so as to obtain a target text detection box. For example, the text detection module 114 may detect the seal text region by using the first Psenet model to obtain a seal text detection box; detecting the header text region by adopting a second Psenet model to obtain a header text detection box; and detecting the layout text area by adopting an EAST model to obtain a layout text detection box.
In practical application, the detected target text detection box can be input into a deep learning recognition engine to output a text recognition result.
This embodiment can be applied to OCR entry of emergency bills (for example, medical bills and claims in the insurance industry), where it is responsible for text detection on OCR emergency bill pictures, and it can address pain points in links such as checking and claims in the insurance industry. The embodiment can completely and effectively detect not only quadrilateral block text but also long header text, handwritten text, form text, capped text, seal text (curved text), and two-dimensional codes, so it can also be applied to text detection for other text types such as long text, curved text, or quadrilateral block text.
The text detection device provided by this embodiment first detects the target text region in the picture to be detected and then, drawing on the advantages or detection characteristics of different text detection models, uses the text detection model suited to the target text region to detect the text in that region in a targeted manner. This improves the completeness of text detection, provides strong support for subsequent text recognition, and solves the prior-art problem that a single model cannot completely detect the various text types in a complex layout.
In an alternative implementation, the region detection module 112 is specifically configured to:
performing seal detection on the picture to be detected to obtain a seal detection frame;
determining a preprinted seal area and a non-preprinted seal area according to the length-width ratio of the seal detection frame;
and determining the pre-printing seal area and/or the non-pre-printing seal area as the seal text area.
In an alternative implementation, the text detection module 114 is specifically configured to:
detecting the text in the seal text area by adopting a first Psenet model obtained by pre-training to obtain an initial seal text detection box;
and carrying out text correction on the initial seal text detection box to obtain a horizontal seal text detection box.
In an alternative implementation, the region detection module 112 is specifically configured to:
detecting the picture to be detected by adopting a second Psenet model obtained through pre-training to obtain a plurality of text detection boxes;
determining a header text region according to the length-width ratio of the text detection boxes;
the text detection module 114 is specifically configured to:
and determining a text detection box corresponding to the header text region in the text detection boxes as a header text detection box.
In an alternative implementation, the region detection module 112 is specifically configured to:
determining the area except the header text area and the seal text area in the picture to be detected as the layout text area;
the text detection module 114 is specifically configured to:
and detecting the text in the layout text area by adopting an EAST model obtained through pre-training to obtain a layout text detection box.
The specific manner in which each module of the apparatus in the above embodiments performs its operations has been described in detail in the method embodiments and will not be repeated here.
Another embodiment of the present application also provides an electronic device, including:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the text detection method of any of the embodiments.
Another embodiment of the present application also provides a storage medium; when instructions in the storage medium are executed by a processor of an electronic device, the electronic device is enabled to perform the text detection method of any of the embodiments.
In this specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be referred to one another.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or apparatus that comprises the element.
The text detection method, text detection device, electronic equipment and storage medium provided by the present application have been described in detail above. Specific examples are used herein to illustrate the principle and implementation of the application, and the description of the above embodiments is only intended to help in understanding the method and core idea of the application. Meanwhile, those skilled in the art may, in accordance with the ideas of the application, make changes to the specific embodiments and scope of application. In view of the above, the content of this description should not be construed as limiting the application.

Claims (9)

1. A method of text detection, the method comprising:
acquiring a picture to be detected;
detecting a target text region in the picture to be detected, wherein the target text region comprises a header text region, a seal text region and a layout text region;
determining a target text detection model corresponding to the target text region according to a preset corresponding relation between the text region and the text detection model, wherein the corresponding relation between the text region and the text detection model comprises the following steps: performing text detection on the seal text region by adopting a first Psenet model, performing text detection on the header text region by adopting a second Psenet model, and performing text detection on the layout text region by adopting an EAST model;
detecting the text in the target text region by adopting the target text detection model to obtain a target text detection box; the text detection of the seal text region by adopting the first Psenet model comprises the following steps:
performing seal detection on the picture to be detected to obtain a seal detection frame; determining a pre-printing seal area and a non-pre-printing seal area according to the length-width ratio of the seal detection frame, and determining the pre-printing seal area and/or the non-pre-printing seal area as the seal text area;
the text detection of the header text region by adopting the second Psenet model comprises the following steps:
performing Psenet character detection on the picture to be detected to obtain a text detection box;
taking the text detection box with the largest length-width ratio of the text detection box as a header text detection box;
the text detection of the layout text area by adopting the EAST model comprises the following steps:
and performing text detection on the bill layout by using an EAST engine.
2. The text detection method according to claim 1, wherein the step of acquiring the picture to be detected includes:
receiving an original bill picture;
determining the rotation angle of the original bill picture according to the included angle between the edge straight line and the horizontal axis of the original bill picture;
and rotating the original bill picture to the horizontal direction according to the rotation angle to obtain the picture to be detected.
3. The text detection method of claim 2, further comprising, before the step of determining the rotation angle of the original bill picture according to the included angle between the edge straight line and the horizontal axis of the original bill picture:
and carrying out edge detection on the original bill picture by adopting Hough straight line detection to obtain an edge straight line of the original bill picture.
4. The text detection method according to claim 1, wherein the step of detecting text in the target text region using the target text detection model to obtain a target text detection box includes:
detecting the text in the seal text area by adopting a first Psenet model obtained by pre-training to obtain an initial seal text detection box;
and carrying out text correction on the initial seal text detection box to obtain a horizontal seal text detection box.
5. The text detection method according to claim 1, wherein the step of detecting a target text region in the picture to be detected includes:
detecting the picture to be detected by adopting a second Psenet model obtained through pre-training to obtain a plurality of text detection boxes;
determining a header text region according to the length-width ratio of the text detection boxes;
the step of detecting the text in the target text region by using the target text detection model to obtain a target text detection box comprises the following steps:
and determining a text detection box corresponding to the header text region in the text detection boxes as a header text detection box.
6. The text detection method according to claim 1, wherein the step of detecting a target text region in the picture to be detected includes:
determining the area except the header text area and the seal text area in the picture to be detected as the layout text area;
the step of detecting the text in the target text region by using the target text detection model to obtain a target text detection box comprises the following steps:
and detecting the text in the layout text area by adopting an EAST model obtained through pre-training to obtain a layout text detection box.
7. A text detection device, the device comprising:
the acquisition module is configured to acquire a picture to be detected;
the region detection module is configured to detect a target text region in the picture to be detected, wherein the target text region comprises a header text region, a seal text region and a layout text region;
the model determining module is configured to determine a target text detection model corresponding to the target text region according to a preset corresponding relation between the text region and the text detection model, wherein the corresponding relation between the text region and the text detection model comprises: performing text detection on the seal text region by adopting a first Psenet model, performing text detection on the header text region by adopting a second Psenet model, and performing text detection on the layout text region by adopting an EAST model;
the text detection module is configured to detect the text in the target text area by adopting the target text detection model to obtain a target text detection box;
the region detection module is specifically configured to:
performing seal detection on the picture to be detected to obtain a seal detection frame;
determining a preprinted seal area and a non-preprinted seal area according to the length-width ratio of the seal detection frame;
determining the pre-printed seal area and/or the non-pre-printed seal area as the seal text area;
the region detection module is specifically configured to:
performing Psenet character detection on the picture to be detected to obtain a text detection box;
taking the text detection box with the largest length-width ratio of the text detection box as a header text detection box;
the region detection module is specifically configured to:
and performing text detection on the bill layout by using an EAST engine.
8. An electronic device, the electronic device comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the text detection method of any of claims 1 to 6.
9. A storage medium, which when executed by a processor of an electronic device, causes the electronic device to perform the text detection method of any of claims 1 to 6.
CN202010117641.0A 2020-02-25 2020-02-25 Text detection method, text detection device, electronic equipment and storage medium Active CN111414905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010117641.0A CN111414905B (en) 2020-02-25 2020-02-25 Text detection method, text detection device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010117641.0A CN111414905B (en) 2020-02-25 2020-02-25 Text detection method, text detection device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111414905A CN111414905A (en) 2020-07-14
CN111414905B true CN111414905B (en) 2023-08-18

Family

ID=71492933

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010117641.0A Active CN111414905B (en) 2020-02-25 2020-02-25 Text detection method, text detection device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111414905B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111683285B (en) * 2020-08-11 2021-01-26 腾讯科技(深圳)有限公司 File content identification method and device, computer equipment and storage medium
CN112597999B (en) * 2021-03-03 2021-06-29 北京易真学思教育科技有限公司 Question identification method and device, electronic equipment and computer storage medium
CN112861794A (en) * 2021-03-11 2021-05-28 浙江康旭科技有限公司 Universal detection algorithm for optical printing texts and scene texts

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108446621A (en) * 2018-03-14 2018-08-24 平安科技(深圳)有限公司 Bank slip recognition method, server and computer readable storage medium
CN108921166A (en) * 2018-06-22 2018-11-30 深源恒际科技有限公司 Medical bill class text detection recognition method and system based on deep neural network
CN109522900A (en) * 2018-10-30 2019-03-26 北京陌上花科技有限公司 Natural scene character recognition method and device
CN109670494A (en) * 2018-12-13 2019-04-23 深源恒际科技有限公司 A kind of Method for text detection and system of subsidiary recognition confidence
CN110008950A (en) * 2019-03-13 2019-07-12 南京大学 The method of text detection in the natural scene of a kind of pair of shape robust
CN110443250A (en) * 2019-07-31 2019-11-12 天津车之家数据信息技术有限公司 A kind of classification recognition methods of contract seal, device and calculate equipment
CN110647829A (en) * 2019-09-12 2020-01-03 全球能源互联网研究院有限公司 Bill text recognition method and system
CN110796082A (en) * 2019-10-29 2020-02-14 上海眼控科技股份有限公司 Nameplate text detection method and device, computer equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295629B (en) * 2016-07-15 2018-06-15 北京市商汤科技开发有限公司 structured text detection method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
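Jiang Dianzhuan (姜典转). Research on bill text localization and recognition based on deep learning. China Master's Theses Full-text Database, Information Science and Technology. 2020, (No. 01), pp. I138-1480. *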

Also Published As

Publication number Publication date
CN111414905A (en) 2020-07-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant