CN114581928A - Form identification method and system - Google Patents


Info

Publication number
CN114581928A
CN114581928A (application CN202111632890.4A)
Authority
CN
China
Prior art keywords: image, enhanced, cells, form image, line segment
Prior art date
Legal status: Pending
Application number
CN202111632890.4A
Other languages
Chinese (zh)
Inventor
陈君麟
林子越
Current Assignee: One Chain Alliance Ecological Technology Co ltd
Original Assignee: One Chain Alliance Ecological Technology Co ltd
Priority date
Filing date
Publication date
Application filed by One Chain Alliance Ecological Technology Co ltd filed Critical One Chain Alliance Ecological Technology Co ltd
Priority to CN202111632890.4A priority Critical patent/CN114581928A/en
Publication of CN114581928A publication Critical patent/CN114581928A/en

Classifications

    • G06F18/214: Pattern recognition; design or setup of recognition systems; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F40/177: Handling natural language data; text processing; editing of tables; using ruled lines
    • G06F40/18: Handling natural language data; editing of tables; using ruled lines of spreadsheets
    • G06F40/183: Handling natural language data; tabulation, i.e. one-dimensional positioning
    • G06T5/30: Image enhancement or restoration by the use of local operators; erosion or dilatation, e.g. thinning
    • G06T5/40: Image enhancement or restoration by the use of histogram techniques
    • G06T7/11: Image analysis; segmentation; region-based segmentation

Abstract

The invention relates to the technical field of digital image processing, and aims to provide a form identification method and a form identification system. The form identification method comprises the following steps: acquiring a document image, and extracting the image area containing only a form from the document image to obtain a form image; preprocessing the form image to obtain a preprocessed form image; performing signal enhancement on the table lines in the preprocessed form image to obtain an enhanced form image; extracting a point set for each intersection point in the table in the enhanced form image, and obtaining the table structure of the enhanced form image from the point sets; performing text detection on the enhanced form image to obtain a text detection result; performing character recognition on the text detection result to obtain a character recognition result; and generating an electronic form according to the table structure and the character recognition result. The invention can detect, identify and restore tables in documents with complex layouts, such as insurance policies, and facilitates automatic table entry and archival management of complex documents.

Description

Form identification method and system
Technical Field
The invention relates to the technical field of digital image processing, in particular to a form identification method and a form identification system.
Background
Office automation is an indispensable part of the modern enterprise. In the financial industry in particular, large numbers of documents must be sorted, entered, archived and otherwise processed; doing this manually consumes a great deal of labor and time.
In the field of image processing, research on document recognition has long been ongoing; in particular, with the development of deep learning in computer vision, document recognition technology has made great progress.
However, documents such as insurance policies have very complex formats, usually contain multiple forms, and differ greatly in layout across insurance companies and insurance product types. When identifying such documents with the prior art, the inventors found at least the following problems, which keep automatic form entry and archiving from meeting user requirements: 1) documents with complex layouts cannot be parsed; 2) forms with complex structures are difficult to parse; 3) methods based on template matching generalize extremely poorly.
Disclosure of Invention
The present invention aims to solve the above technical problems at least to some extent, and to this end provides a form identification method and system.
The technical scheme adopted by the invention is as follows:
in a first aspect, the present invention provides a table identification method, including:
acquiring a document image, and extracting an image area only containing a form from the document image to obtain a form image;
preprocessing the form image to obtain a preprocessed form image;
performing signal enhancement on the table lines in the preprocessed table image to obtain an enhanced table image;
extracting a point set of each intersection point in the table in the enhanced table image, and obtaining a table structure of the enhanced table image according to the point set;
performing text detection on the enhanced form image to obtain a text detection result;
performing character recognition on the text detection result to obtain a character recognition result;
and generating an electronic form according to the form structure and the character recognition result.
The invention can detect, identify and restore forms in documents with complex layouts, such as insurance policies; it facilitates automatic form entry and archival management of such complex documents, solves the difficulty of form identification caused by the variety of form types and complex layouts in documents such as insurance policies, and improves the accuracy of form structure and content recognition in such documents.
In one possible design, extracting an image region containing only a form from the document image to obtain a form image, including:
constructing a target detection model;
acquiring a sample set, introducing the sample set into the target detection model, and performing multi-scale training on the target detection model to obtain a primarily trained target detection model; wherein the samples in the sample set comprise document images containing tables;
simplifying the target detection model after the primary training by using a pruning technology to obtain a simplified target detection model;
labeling the sample set based on transfer learning to obtain labeling information;
inputting the labeling information into the simplified target detection model, and performing retraining on the simplified target detection model to obtain a final trained target detection model;
and inputting the document image into a finally trained target detection model to obtain a form image.
In one possible design, the target detection model employs an FCOS model.
In one possible design, the preprocessing the form image to obtain a preprocessed form image includes:
converting the form image into a gray image, calculating a histogram of the gray image, and performing statistical analysis on the histogram to obtain a gray average value and a gray range of background pixels in the gray image;
filtering background pixel points of the histogram to obtain a filtered histogram, traversing remaining target pixel points and gray values thereof in the histogram to obtain a gray value corresponding to the maximum inter-class variance among the target pixel points;
and taking the gray value corresponding to the maximum inter-class variance as a threshold value, and carrying out binarization processing on the table image to obtain a preprocessed table image.
In a possible design, after filtering background pixel points of the histogram to obtain a filtered histogram, the form identification method further includes:
and importing the filtered histogram into a trained ESRGAN model, and performing super-resolution conversion on the gray level image.
In one possible design, after performing binarization processing on the form image, the form identification method further includes:
obtaining a first line segment set in the table image after binarization processing based on Hough transform;
calculating a tilt angle average of the first set of segments;
filtering a second line segment deviating from the average value of the inclination angles in the first line segment set to obtain a second line segment set;
acquiring an inclination angle average value of a second line segment set to obtain an inclination angle of the table image after binarization processing;
and correcting the table image after the binarization processing based on planar affine transformation according to the inclination angle of the table image after the binarization processing.
In one possible design, extracting a point set of each intersection point in a table in the enhanced table image, and obtaining a table structure of the enhanced table image according to the point set, includes:
extracting a point set of each intersection point in the table in the enhanced table image by a matrix intersection method;
determining the structures of all cells in the enhanced form image according to the point set and the enhanced form image;
and obtaining the table structure of the enhanced table image according to the structures of all the cells.
In one possible design, determining the structure of all cells in the enhanced form image according to the set of points and the enhanced form image includes:
constructing a cell class, wherein the attributes of the cell class comprise the upper left corner coordinate of the cell, the lower right corner coordinate of the cell and an attribution intersection point;
screening point sets of all intersection points in the table in the enhanced table image, selecting a unique intersection point from each point set, and then sorting horizontal and vertical coordinates of all the intersection points respectively to obtain sorted intersection points;
respectively clustering the lengths and the widths of all possible cells according to the horizontal and vertical coordinates of the sequenced intersection points;
constructing a complete table according to the enhanced table image, wherein no merging cells exist in the complete table;
for the line connecting the upper-left corner coordinates of any two adjacent cells in the complete table, constructing, pixel by pixel between the two intersection points, a third line segment perpendicular to the line connecting the two intersection points and of preset length, to obtain a third line segment set;
taking the enhanced form image as a reference, judging whether the sum of pixel values of at least one third line segment in the third line segment set is 0; if so, judging that the two cells corresponding to that third line segment in the binarized form image are merged cells, and modifying the class attribute of the latter of the two cells so that it and the former cell are attributed to the same intersection point; if not, judging that the two cells corresponding to that third line segment in the binarized form image are independent cells;
and traversing all the cells in the enhanced form image until obtaining the structures of all the cells in the enhanced form image.
In one possible design, performing text detection on the enhanced form image to obtain a text detection result, including:
and performing text detection on the enhanced form image based on the trained DB model to obtain a text detection result.
In a second aspect, the present invention provides a form recognition system for implementing the form recognition method according to any one of the above items; the form recognition system includes:
the form image extraction module is used for acquiring a document image and extracting an image area only containing a form from the document image to obtain a form image;
the form image preprocessing module is in communication connection with the form image extraction module and is used for preprocessing the form image to obtain a preprocessed form image;
the table image enhancement module is in communication connection with the form image preprocessing module and is used for performing signal enhancement on the table lines in the preprocessed table image to obtain an enhanced table image;
the table structure identification module is in communication connection with the table image enhancement module and is used for extracting a point set of each intersection point in the table in the enhanced table image and obtaining the table structure of the enhanced table image according to the point set;
the text detection module is in communication connection with the form image enhancement module and is used for performing text detection on the enhanced form image to obtain a text detection result;
the character recognition module is in communication connection with the text detection module and is used for performing character recognition on the text detection result to obtain a character recognition result;
and the electronic form generation module is respectively in communication connection with the form structure identification module and the character identification module and is used for generating an electronic form according to the form structure and the character identification result.
Drawings
FIG. 1 is a schematic diagram of a table identification method according to the present invention;
FIG. 2 is a schematic diagram of the structure of the target detection model during label matching according to the present invention;
FIG. 3 is an exemplary diagram of a document image input in the present invention (sensitive information processed);
FIG. 4 is an exemplary diagram of a form image to be pre-processed in the present invention (sensitive information processed);
FIG. 5 is an exemplary illustration of a converted grayscale image in the present invention;
FIG. 6 is an exemplary diagram of a table image after preprocessing in the present invention;
FIG. 7 is an exemplary diagram of an enhanced form image in the present invention;
FIG. 8 is an exemplary diagram of a spreadsheet generated in the present invention;
FIG. 9 is a block diagram of a table identification system of the present invention.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another.
It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or the figures may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
Example 1:
The first aspect of this embodiment provides a form recognition method that can be executed by, but is not limited to, a computer device or a virtual machine with certain computing resources, for example a personal computer (PC, that is, a multipurpose computer whose size, price and performance make it suitable for personal use; desktop computers, notebook computers, mini-notebooks, tablet computers, ultrabooks and similar electronic devices all belong to this category), a smart phone, a personal digital assistant (PDA) or a wearable device, or a virtual machine hypervisor, so as to perform automatic form entry and filing management on complex documents such as insurance documents.
As shown in fig. 1, the form recognition method may include, but is not limited to, the following steps:
S1, acquiring a document image, and extracting an image area only containing a form from the document image to obtain a form image; in this embodiment, the document image may be, but is not limited to, a PDF, a photographed image, a scanned image, and the like;
in step S1, extracting an image area including only a form from the document image to obtain a form image, including:
s101, constructing a target detection model; it should be noted that the target detection model may be divided into a feature extraction module, a feature fusion module, and a classification detection module, which are respectively used for performing feature extraction, feature fusion, and classification detection operations on received data.
In the prior art, a document with a complex layout contains a large amount of redundant information and noise, and the table positions obtained with conventional image processing techniques are very unstable, which hinders obtaining a table image. To solve this technical problem, this embodiment makes the following improvement: the target detection model adopts an FCOS (Fully Convolutional One-Stage Object Detection) model. It should be noted that adopting the FCOS model and its detection method greatly improves the accuracy of target-position detection; moreover, because FCOS is an anchor-free detection method, training time and development difficulty are greatly reduced, training parameters are fewer, inference is faster, the number of regression objects to be trained is much smaller, and generalization is stronger.
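By way of illustration only, this detection stage can be sketched with torchvision's off-the-shelf FCOS implementation standing in for the model described here; the checkpoint name ("table_fcos.pth"), the two-class setup and the score threshold are assumptions, not part of the disclosure:

    # Sketch: anchor-free table detection with torchvision's FCOS.
    # Assumes torchvision >= 0.12 and a fine-tuned two-class checkpoint
    # (background + table); all file names and thresholds are illustrative.
    import torch
    import torchvision
    from torchvision.io import read_image
    from torchvision.transforms.functional import convert_image_dtype

    model = torchvision.models.detection.fcos_resnet50_fpn(weights=None, num_classes=2)
    model.load_state_dict(torch.load("table_fcos.pth", map_location="cpu"))
    model.eval()

    image = convert_image_dtype(read_image("document.png"), torch.float)  # CxHxW, [0, 1]
    with torch.no_grad():
        pred = model([image])[0]  # dict with "boxes", "labels", "scores"

    # Keep confident detections and crop each table region out of the page.
    keep = pred["scores"] > 0.5
    boxes = pred["boxes"][keep].round().int().tolist()
    table_images = [image[:, y0:y1, x0:x1] for x0, y0, x1, y1 in boxes]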
S102, obtaining a sample set, introducing the sample set into the target detection model, and performing multi-scale training on the target detection model to obtain a primarily trained target detection model, wherein the samples in the sample set comprise document images containing tables. In this embodiment, the sample set adopts the open-source TableBank dataset, an image-based table detection and recognition dataset built from Word and LaTeX documents on the Internet using a novel weak supervision mechanism, totaling 417,234 high-quality labeled tables and their original documents across various domains. It should be understood that after multi-scale training of the target detection model, only one scale of feature maps is retained.
In this embodiment, performing multi-scale training on the target detection model includes: obtaining a plurality of feature maps of different scales, and matching the correspondence between regression objects and real labels on the feature maps of different scales, so as to realize feature extraction and integration and obtain the final feature vector. In this embodiment, five scales of feature maps are used, obtained by inputting the document image into the feature fusion module of the target detection model. When matching the correspondence between regression objects and real labels on feature maps of different scales, FIG. 2 shows the structure during label matching: the middle diagram is an 8 × 8 feature map, and each pixel on the feature map corresponds proportionally to a region of the original image; each small region of the original image serves as a regression object. The position of each regression object is compared with the real label boxes: a regression object contained in a real label box is retained, and otherwise it is removed. As shown in the left diagram of fig. 2, the small region A in the fourth column of the second row is contained in a real label, while the small region B in the second column of the third row is contained in no real label, so A is retained as a valid regression object and B is removed. More specifically, in this embodiment, a matrix of corner coordinates of the candidate regression objects and a matrix of corner coordinates of the real label boxes are constructed, and matrix subtraction is used to quickly screen out the valid regression objects.
In this embodiment, an empirically based proportional size may be set for each scale feature map; for a feature map with a single scale, if the proportion of the real label frame in the original image does not match the corresponding proportion size of the feature map, the regression object on the feature map does not need to pay attention to the real label frame.
After the correspondence between regression objects and real labels has been matched, in any single-scale feature map one regression object may still match several real label boxes; such a regression object is called a fuzzy sample. In this embodiment, the real label box with the smallest area is selected as the regression target of the fuzzy sample, which filters out most candidate regression objects, reduces training difficulty and speeds up inference.
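The matrix-based screening and the fuzzy-sample rule above can be sketched as follows; the array layouts (regression-object centers as an (N, 2) matrix, real label boxes as an (M, 4) matrix of corner coordinates) are assumed conventions:

    # Sketch: vectorized matching of regression objects to real label boxes.
    import numpy as np

    def match_regression_objects(centers: np.ndarray, gt_boxes: np.ndarray):
        # Matrix subtraction: signed distances from every center to every box
        # edge, each of shape (N, M); a center is inside a box iff all are > 0.
        l = centers[:, None, 0] - gt_boxes[None, :, 0]   # center_x - x0
        t = centers[:, None, 1] - gt_boxes[None, :, 1]   # center_y - y0
        r = gt_boxes[None, :, 2] - centers[:, None, 0]   # x1 - center_x
        b = gt_boxes[None, :, 3] - centers[:, None, 1]   # y1 - center_y
        inside = np.stack([l, t, r, b], axis=-1).min(axis=-1) > 0  # (N, M)

        areas = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
        # Fuzzy samples (inside several boxes) regress to the smallest-area box.
        costs = np.where(inside, areas[None, :], np.inf)
        target = costs.argmin(axis=1)   # matched label box per regression object
        valid = inside.any(axis=1)      # objects inside no box are removed
        return target, valid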
S103, simplifying the primarily trained target detection model by using a pruning technique to obtain a simplified target detection model. It should be noted that in this embodiment pruning simplifies the model and suits practical scenes in which input documents such as policies vary little in overall size.
S104, labeling the sample set based on transfer learning to obtain labeling information; it should be noted that, in this embodiment, the sample set is labeled based on the transfer learning, so that the difficulty of the manual labeling task in the early stage of the model training can be reduced. It should be understood that the labeling information of the sample set is obtained by manually labeling the input documents such as the policy.
S105, inputting the labeling information into the simplified target detection model, and performing retraining on the simplified target detection model to obtain a final trained target detection model;
and S106, inputting the document image into a finally trained target detection model to obtain a form image. As shown in fig. 3, the document images are input into the final trained target detection model.
S2, preprocessing the form image to obtain a preprocessed form image. It should be noted that the preprocessing removes noise and interference such as background watermarks and red seals in the form image, and performs processing such as tilt correction on the form image, so as to obtain a clear form image;
When the document background is complex (for example, the background color varies and the watermark comes in several colors), the prior art usually distinguishes the foreground from the background watermark by image binarization during preprocessing. However, as shown in fig. 4 and 5, when the background watermark differs greatly from the background, directly binarizing the table image works very poorly: the original algorithm classifies the background watermark as foreground, so foreground, background and background watermark cannot be distinguished at the same time. To solve these problems of the prior art, this embodiment makes the following further improvement. Preprocessing the form image to obtain a preprocessed form image comprises the following steps:
S201, converting the form image into a gray image and calculating the histogram of the gray image; since the background is the largest-proportion part of the image, statistical analysis of the histogram yields the gray average and gray range of the background pixels in the gray image. In this embodiment, the form image to be preprocessed is shown in fig. 4, and the converted gray image is shown in fig. 5;
s202, filtering background pixel points of the histogram to obtain a filtered histogram;
In the prior art, filtering the background and background watermark inevitably causes slight information loss in the gray image; if the resolution of the input original image is too low, the definition of the filtered result is too low, and the confidence and accuracy of character recognition in subsequent steps are generally poor. To solve this technical problem, in this embodiment, after filtering the background pixel points of the histogram and obtaining the filtered histogram, the table identification method further includes:
S203, importing the filtered histogram into a trained ESRGAN (Enhanced Super-Resolution Generative Adversarial Network) model, and performing super-resolution conversion on the low-resolution gray image. In this way, blurred regions of the table lines and table contents in the table image can be repaired to a certain degree, improving the accuracy and stability of subsequent table identification and character recognition.
In this embodiment, the acquisition process of the trained ESRGAN model is as follows:
constructing an ESRGAN model;
acquiring a batch of form images, wherein each form image comprises a high-resolution picture and a low-resolution picture, and the low-resolution pictures are obtained by downsampling the high-resolution pictures;
and inputting the batch of form images into the ESRGAN model for training to obtain the trained ESRGAN model.
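A brief sketch of building one such training pair by downsampling; the scale factor and interpolation mode are assumptions:

    # Sketch: derive the low-resolution half of an ESRGAN training pair.
    import cv2

    def make_pair(path: str, scale: int = 4):
        hr = cv2.imread(path)                  # high-resolution form image
        h, w = hr.shape[:2]
        lr = cv2.resize(hr, (w // scale, h // scale),
                        interpolation=cv2.INTER_CUBIC)  # degraded input
        return lr, hr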
S204, traversing the remaining target pixel points in the histogram and their gray values to obtain the gray value corresponding to the maximum inter-class variance among the target pixel points; in this embodiment, the remaining target pixel points and their gray values are those left in the histogram after the background pixels have been filtered out;
s205, using the gray value corresponding to the maximum inter-class variance as a threshold value, and performing binarization processing on the table image;
It should be noted that during binarization, morphological kernels of a scale matched to the image size can be applied to the table's horizontal lines, and the erosion and dilation operations on the binary image filter out the table's vertical lines so that only the horizontal lines are retained. In this embodiment, after the binarization processing of the table image, the table identification method further includes:
s206, obtaining a first line segment set in the table image after binarization processing based on Hough transform;
s207, calculating an inclination angle average value of the first line segment set;
s208, filtering a second line segment deviating from the average value of the inclination angles in the first line segment set to obtain a second line segment set;
s209, obtaining the average value of the inclination angles of the second line segment set to obtain the inclination angle of the table image after the binarization processing;
s210, correcting the table image after the binarization processing based on planar affine transformation according to the inclination angle of the table image after the binarization processing;
the above flow can realize the inclination correction of the table image after the binarization processing, so that the spatial domain of the table image can be adjusted.
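Steps S206 to S210 can be sketched with OpenCV as follows; the Hough parameters and the tolerance used to filter out deviating second line segments are assumptions:

    # Sketch: estimate the tilt angle of the binarized table image from Hough
    # segments and correct it with a planar affine (rotation) transform.
    import cv2
    import numpy as np

    def deskew(binary: np.ndarray, tol_deg: float = 2.0) -> np.ndarray:
        segs = cv2.HoughLinesP(binary, 1, np.pi / 180, threshold=100,
                               minLineLength=binary.shape[1] // 4, maxLineGap=5)
        angles = np.array([np.degrees(np.arctan2(y2 - y1, x2 - x1))
                           for x1, y1, x2, y2 in segs[:, 0]])    # first set
        kept = angles[np.abs(angles - angles.mean()) < tol_deg]  # second set
        tilt = kept.mean()                     # tilt angle of the table image
        h, w = binary.shape[:2]
        M = cv2.getRotationMatrix2D((w / 2, h / 2), tilt, 1.0)
        return cv2.warpAffine(binary, M, (w, h))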
S211, obtaining the preprocessed form image, wherein the preprocessed form image is shown in FIG. 6 in the embodiment. It should be noted that the preprocessed form image can effectively distinguish the foreground image from other images (other images include background, background watermark and other noise data).
In the prior art, the Otsu algorithm usually finds the threshold that minimizes the intra-class variance and maximizes the inter-class variance by exhaustive search. In this embodiment, by improving the Otsu threshold method, the threshold can be computed adaptively; this adapts to documents such as policies of different companies or insurance product types, avoids preset templates, stably filters out the background and background watermarks (including red seals, etc.), and makes subsequent table-line extraction more stable.
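A sketch of this modified Otsu computation: the histogram bins inside the statistically estimated background gray range are zeroed before the search for the maximum inter-class variance; how the background range is estimated and passed in is an assumption:

    # Sketch: Otsu threshold computed only over non-background histogram bins.
    import numpy as np

    def filtered_otsu(gray: np.ndarray, bg_lo: int, bg_hi: int) -> int:
        hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
        hist[bg_lo:bg_hi + 1] = 0          # filter out background pixel bins
        levels = np.arange(256, dtype=np.float64)
        best_t, best_var = 0, -1.0
        for t in range(1, 256):            # traverse remaining gray values
            w0, w1 = hist[:t].sum(), hist[t:].sum()
            if w0 == 0 or w1 == 0:
                continue
            m0 = (hist[:t] * levels[:t]).sum() / w0
            m1 = (hist[t:] * levels[t:]).sum() / w1
            var = w0 * w1 * (m0 - m1) ** 2  # inter-class variance
            if var > best_var:
                best_t, best_var = t, var
        return best_t                       # threshold for binarization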
S3, performing signal enhancement on the table lines in the preprocessed table image to obtain an enhanced table image. It should be noted that conventional digital image processing techniques may be used for the signal enhancement of the table lines, including but not limited to morphological image processing and Hough transform techniques, which are not limited herein; in this embodiment, the enhanced form image is shown in fig. 7;
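One conventional morphological realization of this enhancement, sketched with OpenCV; the kernel lengths relative to the image size are assumptions:

    # Sketch: enhance table lines by extracting horizontal and vertical strokes
    # with directional morphological opening (erosion followed by dilation).
    import cv2
    import numpy as np

    def enhance_lines(binary: np.ndarray):
        h, w = binary.shape[:2]
        horiz_k = cv2.getStructuringElement(cv2.MORPH_RECT, (max(w // 30, 3), 1))
        vert_k = cv2.getStructuringElement(cv2.MORPH_RECT, (1, max(h // 30, 3)))
        horiz = cv2.dilate(cv2.erode(binary, horiz_k), horiz_k)  # horizontals only
        vert = cv2.dilate(cv2.erode(binary, vert_k), vert_k)     # verticals only
        return horiz, vert, cv2.bitwise_or(horiz, vert)          # enhanced lines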
s4, extracting a point set of each intersection point in a table in the enhanced table image, and obtaining a table structure of the enhanced table image according to the point set;
in this embodiment, extracting a point set of each intersection point in a table in the enhanced table image, and obtaining a table structure of the enhanced table image according to the point set includes:
S401, extracting the point set of each intersection point in the table in the enhanced table image by the matrix intersection method. Compared with the traditional approach of first fitting straight lines and then computing the intersections of horizontal and vertical lines by mathematical formula, the matrix intersection method in this embodiment greatly reduces the complexity of point-set calculation and extraction;
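One way to realize the matrix intersection method is an element-wise AND of the horizontal-line and vertical-line masks (such as those produced by the enhancement sketch above), after which each connected component directly yields the point set of one intersection; a sketch under those assumptions:

    # Sketch: matrix intersection method. ANDing the two line masks leaves
    # non-zero pixels only where lines cross; each connected component is the
    # point set of one intersection.
    import cv2
    import numpy as np

    def intersection_point_sets(horiz: np.ndarray, vert: np.ndarray):
        cross = cv2.bitwise_and(horiz, vert)
        n, labels = cv2.connectedComponents(cross)
        # One point set (array of (row, col) pixels) per intersection region.
        return [np.argwhere(labels == i) for i in range(1, n)]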
s402, determining the structures of all cells in the enhanced form image according to the point set and the enhanced form image;
in this embodiment, determining the structures of all cells in the enhanced form image according to the point set and the enhanced form image includes:
S4021, constructing a cell class, wherein the attributes of the cell class comprise the upper-left corner coordinate of the cell, the lower-right corner coordinate of the cell and an attributed intersection point (cells attributed to the same intersection point form the same cell);
s4022, screening point sets of all intersection points in the table in the enhanced table image, selecting a unique intersection point from each point set, and sequencing horizontal and vertical coordinates of all the intersection points respectively to obtain sequenced intersection points;
s4023, respectively clustering the lengths and widths of all possible cells according to the horizontal and vertical coordinates of the sequenced intersection points;
S4024, constructing a complete table according to the enhanced table image, wherein the complete table contains no merged cells; in this embodiment the constructed complete table is 17 rows by 11 columns;
S4025, for the line connecting the upper-left corner coordinates of any two adjacent cells in the complete table, constructing, pixel by pixel between the two intersection points, a third line segment perpendicular to the line connecting the two intersection points and of preset length (10 pixels in this embodiment), to obtain a third line segment set;
S4026, taking the enhanced form image as a reference, judging whether the sum of pixel values of at least one third line segment in the third line segment set is 0; if so, judging that the two cells corresponding to that third line segment in the binarized form image are merged cells, and modifying the class attribute of the latter of the two cells so that it and the former cell are attributed to the same intersection point; if not, judging that the two cells are independent cells;
all the cells in the enhanced form image are then traversed from left to right and top to bottom, repeating the third-line-segment test (steps S4025 to S4026) until the structures of all the cells in the enhanced form image are obtained.
In this embodiment, in the step of obtaining the table structure of the enhanced table image, a set of line segments is constructed between the vertices of any two adjacent cells and each pixel of those segments is analyzed statistically; the cell structures are thereby determined and the table structure of the enhanced table image is finally obtained, with high fault tolerance of the result. A sketch of this probe test follows.
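    # Sketch of the probe test in steps S4025 to S4026 for one pair of adjacent
    # intersection points. The 10-pixel probe length follows this embodiment;
    # the image convention (white lines on a black background) is an assumption.
    import numpy as np

    def cells_are_merged(enhanced: np.ndarray, p: tuple, q: tuple,
                         probe_len: int = 10) -> bool:
        (x0, y0), (x1, y1) = p, q            # two adjacent intersection points
        n = max(abs(x1 - x0), abs(y1 - y0))
        if n == 0:
            return False                     # degenerate pair, nothing to test
        for i in range(n + 1):               # pixel by pixel between p and q
            x = x0 + (x1 - x0) * i // n
            y = y0 + (y1 - y0) * i // n
            if y0 == y1:                     # horizontal border, vertical probe
                probe = enhanced[max(y - probe_len // 2, 0):y + probe_len // 2, x]
            else:                            # vertical border, horizontal probe
                probe = enhanced[y, max(x - probe_len // 2, 0):x + probe_len // 2]
            if probe.sum() == 0:             # gap: no line pixel on this probe
                return True                  # the two cells are merged
        return False                         # an unbroken line separates them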
And S403, obtaining the table structure of the enhanced table image according to the structures of all the cells.
S5, performing text detection on the enhanced form image to obtain a text detection result;
In the prior art, text detection methods can be divided into regression-based and segmentation-based methods. Regression-based methods, developed earlier, subdivide into object-box-based methods and pixel-value-regression-based methods. Object-box regression detects regular-shaped text well but irregular-shaped text poorly; pixel-value regression handles irregular text but has very poor real-time performance. Segmentation-based methods also detect text at the pixel level, but the segmentation network only needs to classify each pixel as foreground or background, so training and inference are much faster than with pixel-regression methods. Based on this, in this embodiment, performing text detection on the enhanced form image to obtain a text detection result includes:
performing text detection on the enhanced form image based on a trained DB (Differentiable Binarization) model to obtain a text detection result. In this embodiment, an approximate binarization formula turns the non-differentiable image binarization process into a differentiable one that becomes part of network training, so the threshold can be trained adaptively, improving the performance of text-block segmentation and speeding up training and text detection.
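The approximate binarization at the heart of DB replaces the hard threshold step with a steep sigmoid, B ≈ 1 / (1 + exp(-k(P - T))), so that the threshold map T receives gradients during training; a sketch (the amplifying factor k = 50 follows the DB paper):

    # Sketch: DB's differentiable approximation of image binarization.
    import torch

    def approximate_binarization(prob_map: torch.Tensor,
                                 thresh_map: torch.Tensor,
                                 k: float = 50.0) -> torch.Tensor:
        # Steep sigmoid of (P - T): near-binary output that stays differentiable,
        # so the threshold map is learned jointly with the probability map.
        return torch.sigmoid(k * (prob_map - thresh_map))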
S6, performing character recognition on the text detection result to obtain a character recognition result. It should be noted that in this embodiment the character recognition may be performed using, but not limited to, a CRNN neural network model with a CTC loss function, which is not limited herein;
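A sketch of wiring a CTC loss over CRNN-style sequence logits with PyTorch's built-in implementation; the sequence length, batch size and alphabet size are illustrative, and the CRNN backbone itself is omitted:

    # Sketch: CTC loss over per-time-step character logits, as used to train
    # a CRNN recognizer without character-level alignment.
    import torch
    import torch.nn as nn

    T, N, C = 40, 8, 6625            # time steps, batch size, alphabet + blank
    log_probs = torch.randn(T, N, C).log_softmax(2)   # stand-in CRNN output
    targets = torch.randint(1, C, (N, 12))            # encoded label indices
    input_lengths = torch.full((N,), T, dtype=torch.long)
    target_lengths = torch.full((N,), 12, dtype=torch.long)

    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    loss = ctc(log_probs, targets, input_lengths, target_lengths)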
and S7, generating an electronic table according to the table structure and the character recognition result. In this embodiment, the spreadsheet may be, but is not limited to, in JSON format or Excel format, and is not limited herein. In the present embodiment, the generated spreadsheet is shown in fig. 8.
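For the Excel case, a sketch of emitting the recovered structure with openpyxl; the cell-record fields are assumed outputs of steps S4 and S6, not a format defined by this disclosure:

    # Sketch: write recognized cells (position, span and text) to an .xlsx file.
    from openpyxl import Workbook

    def to_spreadsheet(cells, path="table.xlsx"):
        # cells: iterable of dicts such as {"row": 1, "col": 1, "row_span": 2,
        # "col_span": 1, "text": "..."}, assumed outputs of the previous steps.
        wb = Workbook()
        ws = wb.active
        for c in cells:
            ws.cell(row=c["row"], column=c["col"], value=c["text"])
            if c["row_span"] > 1 or c["col_span"] > 1:  # restore merged cells
                ws.merge_cells(start_row=c["row"], start_column=c["col"],
                               end_row=c["row"] + c["row_span"] - 1,
                               end_column=c["col"] + c["col_span"] - 1)
        wb.save(path)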
This embodiment can detect, identify and restore forms in documents with complex layouts such as insurance policies; it facilitates automatic form entry and filing management of such complex documents, solves the difficulty of form identification caused by the variety of form types and complex layouts in documents such as insurance policies, and improves the accuracy of form structure and content recognition in such documents.
Example 2:
the present embodiment provides a form identification system, which is used to implement the form identification method in embodiment 1; as shown in fig. 9, the form recognition system includes:
the table image extraction module is used for acquiring a document image and extracting an image area only containing a table from the document image to obtain a table image;
the form image preprocessing module is in communication connection with the form image extraction module and is used for preprocessing the form image to obtain a preprocessed form image;
the table image enhancement module is in communication connection with the form image preprocessing module and is used for performing signal enhancement on the table lines in the preprocessed table image to obtain an enhanced table image;
the table structure identification module is in communication connection with the table image enhancement module and is used for extracting a point set of each intersection point in the table in the enhanced table image and obtaining the table structure of the enhanced table image according to the point set;
the text detection module is in communication connection with the form image enhancement module and is used for performing text detection on the enhanced form image to obtain a text detection result;
the character recognition module is in communication connection with the text detection module and is used for performing character recognition on the text detection result to obtain a character recognition result;
and the electronic form generation module is respectively in communication connection with the form structure identification module and the character identification module and is used for generating an electronic form according to the form structure and the character identification result.
Example 3:
on the basis of embodiment 1 or 2, this embodiment discloses an electronic device, which may be a smart phone, a tablet computer, a notebook computer, or a desktop computer, etc. The electronic device may be referred to as a terminal, a portable terminal, a desktop terminal, or the like, and includes:
a memory for storing computer program instructions; and the number of the first and second groups,
a processor for executing the computer program instructions to perform the operations of the form identification method as in any of embodiment 1.
Example 4:
on the basis of any embodiment of embodiments 1 to 3, the present embodiment discloses a computer-readable storage medium for storing computer-readable computer program instructions configured to, when executed, perform the operations of the table identification method according to embodiment 1.
It should be noted that the functions described herein, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-transitory computer-readable storage medium. Based on such understanding, the technical solution of the present invention, or the part thereof that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: modifications of the technical solutions described in the embodiments or equivalent replacements of some technical features may still be made. And such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Finally, it should be noted that the present invention is not limited to the above alternative embodiments, and that any person can obtain other products in various forms in the light of the present invention. The above detailed description should not be taken as limiting the scope of the invention, which is defined in the claims, and which the description is intended to be interpreted accordingly.

Claims (10)

1. A form identification method, characterized in that the method comprises the following steps:
acquiring a document image, and extracting an image area only containing a form from the document image to obtain a form image;
preprocessing the form image to obtain a preprocessed form image;
performing signal enhancement on the table lines in the preprocessed table image to obtain an enhanced table image;
extracting a point set of each intersection point in the table in the enhanced table image, and obtaining a table structure of the enhanced table image according to the point set;
performing text detection on the enhanced form image to obtain a text detection result;
performing character recognition on the text detection result to obtain a character recognition result;
and generating an electronic form according to the form structure and the character recognition result.
2. A form recognition method according to claim 1, wherein: extracting an image area only containing a table from the document image to obtain a table image, wherein the table image comprises:
constructing a target detection model;
acquiring a sample set, introducing the sample set into the target detection model, and performing multi-scale training on the target detection model to obtain a primarily trained target detection model; wherein the samples in the sample set comprise document images containing tables;
simplifying the target detection model after the primary training by using a pruning technology to obtain a simplified target detection model;
labeling the sample set based on transfer learning to obtain labeling information;
inputting the labeling information into the simplified target detection model, and performing retraining on the simplified target detection model to obtain a final trained target detection model;
and inputting the document image into a finally trained target detection model to obtain a form image.
3. A form recognition method as claimed in claim 2, wherein: the target detection model adopts an FCOS model.
4. A form recognition method as claimed in claim 1, wherein: preprocessing the form image to obtain a preprocessed form image, wherein the preprocessing comprises the following steps:
converting the form image into a gray image, calculating a histogram of the gray image, and performing statistical analysis on the histogram to obtain a gray average value and a gray range of background pixels in the gray image;
filtering background pixel points of the histogram to obtain a filtered histogram, traversing remaining target pixel points and gray values thereof in the histogram to obtain a gray value corresponding to the maximum inter-class variance among the target pixel points;
and performing binarization processing on the table image by using the gray value corresponding to the maximum inter-class variance as a threshold value to obtain a preprocessed table image.
5. A form recognition method as claimed in claim 4, wherein: after filtering the background pixel points of the histogram and obtaining the filtered histogram, the table identification method further includes:
and importing the filtered histogram into a trained ESRGAN model, and performing super-resolution conversion on the gray level image.
6. A form recognition method as claimed in claim 4, wherein: after the binarization processing is performed on the table image, the table identification method further comprises the following steps:
obtaining a first line segment set in the table image after binarization processing based on Hough transform;
calculating a mean value of the tilt angles of the first set of segments;
filtering a second line segment deviating from the average value of the inclination angles in the first line segment set to obtain a second line segment set;
acquiring an inclination angle average value of a second line segment set to obtain an inclination angle of the table image after binarization processing;
and correcting the table image after the binarization processing based on planar affine transformation according to the inclination angle of the table image after the binarization processing.
7. A form recognition method according to claim 6, wherein: extracting a point set of each intersection point in the table in the enhanced table image, and obtaining a table structure of the enhanced table image according to the point set, wherein the table structure comprises the following steps:
extracting a point set of each intersection point in the table in the enhanced table image by a matrix intersection method;
determining the structures of all cells in the enhanced form image according to the point set and the enhanced form image;
and obtaining the table structure of the enhanced table image according to the structures of all the cells.
8. A form recognition method as claimed in claim 7, wherein: determining the structures of all cells in the enhanced form image according to the point set and the enhanced form image, wherein the determining comprises the following steps:
constructing a cell class, wherein the attributes of the cell class comprise the upper left corner coordinate of the cell, the lower right corner coordinate of the cell and an attribution intersection point;
screening point sets of all intersection points in the table in the enhanced table image, selecting a unique intersection point from each point set, and then sorting horizontal and vertical coordinates of all the intersection points respectively to obtain sorted intersection points;
respectively clustering the lengths and the widths of all possible cells according to the horizontal and vertical coordinates of the sequenced intersection points;
constructing a complete table according to the enhanced table image, wherein no merging cells exist in the complete table;
for the line connecting the upper-left corner coordinates of any two adjacent cells in the complete table, constructing, pixel by pixel between the two intersection points, a third line segment perpendicular to the line connecting the two intersection points and of preset length, to obtain a third line segment set;
taking the enhanced form image as a reference, judging whether the sum of pixel values of at least one third line segment in the third line segment set is 0; if so, judging that the two cells corresponding to that third line segment in the binarized form image are merged cells, and modifying the class attribute of the latter of the two cells so that it and the former cell are attributed to the same intersection point; if not, judging that the two cells corresponding to that third line segment in the binarized form image are independent cells;
and traversing all the cells in the enhanced form image until the structures of all the cells in the enhanced form image are obtained.
9. A form recognition method as claimed in claim 1, wherein: performing text detection on the enhanced form image to obtain a text detection result, including:
and performing text detection on the enhanced form image based on the trained DB model to obtain a text detection result.
10. A form recognition system, characterized by being configured to implement the form recognition method according to any one of claims 1 to 9; the form recognition system includes:
the form image extraction module is used for acquiring a document image and extracting an image area only containing a form from the document image to obtain a form image;
the form image preprocessing module is in communication connection with the form image extraction module and is used for preprocessing the form image to obtain a preprocessed form image;
the table image enhancement module is in communication connection with the form image preprocessing module and is used for performing signal enhancement on the table lines in the preprocessed table image to obtain an enhanced table image;
the table structure identification module is in communication connection with the table image enhancement module and is used for extracting a point set of each intersection point in the table in the enhanced table image and obtaining the table structure of the enhanced table image according to the point set;
the text detection module is in communication connection with the form image enhancement module and is used for performing text detection on the enhanced form image to obtain a text detection result;
the character recognition module is in communication connection with the text detection module and is used for performing character recognition on the text detection result to obtain a character recognition result;
and the electronic form generation module is respectively in communication connection with the form structure identification module and the character identification module and is used for generating an electronic form according to the form structure and the character identification result.
CN202111632890.4A 2021-12-29 2021-12-29 Form identification method and system Pending CN114581928A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111632890.4A CN114581928A (en) 2021-12-29 2021-12-29 Form identification method and system


Publications (1)

Publication Number Publication Date
CN114581928A true CN114581928A (en) 2022-06-03

Family

ID=81771942

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111632890.4A Pending CN114581928A (en) 2021-12-29 2021-12-29 Form identification method and system

Country Status (1)

Country Link
CN (1) CN114581928A (en)


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115294588A (en) * 2022-08-17 2022-11-04 湖北鑫英泰系统技术股份有限公司 Data processing method and system based on RPA process robot
CN115294588B (en) * 2022-08-17 2024-04-19 湖北鑫英泰系统技术股份有限公司 Data processing method and system based on RPA flow robot
CN116824611A (en) * 2023-08-28 2023-09-29 星汉智能科技股份有限公司 Table structure identification method, electronic device, and computer-readable storage medium
CN116824611B (en) * 2023-08-28 2024-04-05 星汉智能科技股份有限公司 Table structure identification method, electronic device, and computer-readable storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination