CN114419647A - Table information extraction method and system

Info

Publication number
CN114419647A
CN114419647A
Authority
CN
China
Prior art keywords
line
cell
lines
boundary
information
Prior art date
Legal status
Pending
Application number
CN202111665466.XA
Other languages
Chinese (zh)
Inventor
饶顶锋 (Rao Dingfeng)
陶坚坚 (Tao Jianjian)
刘伟 (Liu Wei)
Current Assignee
Beijing Yitu Zhixun Technology Co ltd
Original Assignee
Beijing Yitu Zhixun Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Yitu Zhixun Technology Co., Ltd.
Priority to CN202111665466.XA
Publication of CN114419647A
Legal status: Pending

Classifications

    • G06F18/214: Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/253: Pattern recognition; analysing; fusion techniques of extracted features
    • G06N3/045: Computing arrangements based on biological models; neural networks; architecture; combinations of networks
    • G06N3/08: Computing arrangements based on biological models; neural networks; learning methods

Abstract

The invention belongs to the technical field of image processing and pattern recognition, and discloses a table information extraction method and system. The method comprises the following steps: acquiring a table image; detecting text line positions to obtain the coordinate information of each text block; recognizing text line content to obtain the text content and direction information of each text block; correcting the image direction by computing the image tilt angle from the position angles and direction information of the text blocks and applying tilt correction; predicting table lines by feeding the image into a deep learning model and extracting table line feature maps; fusing the table line feature maps into a binary map; analyzing table cell information by clustering background pixels according to their four nearest boundary lines and computing the table's row and column information; fusing the cell information; and producing formatted output. The method accurately extracts and restores table images, makes full use of the feature extraction capability of deep neural networks and the high performance of conventional image processing algorithms, improves the robustness and generality of the scheme, and achieves excellent table extraction speed and quality.

Description

Table information extraction method and system
Technical Field
The invention belongs to the technical field of image processing and pattern recognition, and particularly relates to a table information extraction method and system.
Background
With the rapid development of mobile internet, big data, and 5G technologies, large volumes of pictures and documents are produced in daily work and life and need to be analyzed and processed. For example, workflows in banking, insurance, and government affairs generate large amounts of picture and document data that must be archived, analyzed, and modeled. Under broad pressure to reduce costs and improve efficiency, many enterprises turn to technical means to help process this data efficiently, improving productivity and lowering operating costs. In everyday office documents, tables are a very common way to store data, and people like to present data in tabular form because tables are well suited to expressing structured information. Although tables carry very rich information, they are often stored in PDF documents or displayed as pictures, which is inconvenient for extracting table information and processing the data. Technologies for automatically analyzing and processing table data are therefore very important in daily work.
Tables come in many types and styles, and different acquisition methods produce very different inputs: document types include modern electronic documents, historical handwritten documents, documents or pictures obtained from a scanner, and pictures taken with a camera, and these differ greatly in layout style, illumination, and background. Table recognition and analysis is therefore an important and difficult problem in the field of document and image recognition. Research on table recognition and analysis has been going on for many years and has produced a very large number of methods, from the earliest rule-based approaches, to later deep-learning-based methods, to the graph-convolution-based methods that have become popular in recent years.
Conventional table detection and analysis techniques: traditional image processing methods generally binarize the image, perform edge detection, find connected regions with erosion and dilation, detect line segments and straight lines, and then analyze cells from the horizontal and vertical lines in the image. These methods are fast, place little demand on hardware, and are easy to implement, but they depend heavily on the quality of binarization or edge detection. For photographed or blurred images the line detection is very unstable and the algorithms adapt poorly, and their ability to handle wireless (borderless) tables is very limited.
Table detection and analysis techniques based on deep learning: deep learning solutions for table detection and structure analysis rely mainly on object detection and image segmentation, and generally proceed in two steps: first an object detection model locates table regions and classifies the tables, then an image segmentation model segments the table region and the positions, rows, and columns of the cells are analyzed from the table lines. The detection accuracy of this scheme far exceeds that of traditional methods, but the recognition and extraction pipeline requires at least two models run in series. This raises hardware requirements in practice and increases training and maintenance costs for sample annotation, so the performance of this approach still has room for improvement.
Graph-neural-network-based schemes: text blocks are treated as the vertices of a graph and the relations between text blocks as its edges; vertices and edges are embedded (discrete variables converted into continuous vectors), a GCN (graph convolutional network) classifies each node to obtain the probability that a given relation holds between vertices, an adjacency matrix representing the relations between text blocks is derived, and the table structure is restored from that adjacency matrix. The steps of this pipeline are too tightly coupled; once the result is unsatisfactory it cannot be quickly intervened on and optimized, so the algorithm is hard to deploy in practice.
Disclosure of Invention
The invention aims to overcome the above defects in the prior art, and provides a table information extraction method and system that can extract and structurally restore table content from an image or document: it accurately extracts the content and structure information of table cells, completely restores the table structure, and produces formatted output. It makes full use of the feature extraction capability of deep neural networks and the high performance of conventional image processing algorithms, effectively improves the robustness and generality of the scheme, achieves excellent table extraction speed and quality, and has broad application prospects.
The technical scheme of the invention combines the advantages of conventional image processing with those of deep learning: a deep learning instance segmentation model accurately predicts the position of each table and, at the same time, the table line information within the table region; on this basis, conventional image processing and analysis accurately computes the table structure information, achieving efficient table information extraction and table structure restoration.
Specifically, to achieve the above technical objectives, the present invention adopts the following technical solutions:
a table information extraction method comprises the following steps:
step S1: acquiring a table image;
step S2: detecting text line positions: inputting the table image from step S1 into a text line detection model, which predicts the coordinate information (x, y, w, h, angle) of each text block on the table image, where (x, y) is the position of the text block's center point, (w, h) its width and height, and angle its rotation angle;
step S3: recognizing text line content: inputting the text blocks detected in step S2 into a text line recognition model, which returns the text content of each text block and its direction information, where the direction information indicates whether the text block is oriented at 0° or 180°;
step S4: correcting the image direction: calculating the inclination angle of the image and performing inclination correction according to the position angle information and the direction information of each text block acquired in the steps S2 and S3;
step S5: table line analysis and prediction: inputting the image output in step S4 into a deep learning model based mainly on image instance segmentation, analyzing it, and extracting the table line feature maps;
step S6: analyzing and fusing table lines: fusing the table line feature maps generated in step S5 to generate a binary map;
step S7: analyzing table cell information: performing cell analysis on the table line binary map generated in step S6; based on the principle that all background pixels within the same cell share the same upper, lower, left, and right boundary lines, finding for each background point within the table the four nearest boundary lines (L, T, R, B), where L, T, R, B are the IDs of the nearest left, upper, right, and lower lines (an ID being a unique number that distinguishes each line); then performing cluster analysis on the boundary attributes of the background points, i.e. two pixels with identical boundary attributes (the same left, upper, right, and lower lines) belong to the same class, and each class represents one cell; finally, computing the table's row and column information from the coordinate information and connection attributes of the cells;
step S8: fusing cell information: combining the output results of steps S2, S3, and S7 to integrate the cell information and obtain the table structure information;
step S9: formatted output: outputting the result of step S8 in a chosen data format or file format. A minimal end-to-end skeleton of steps S1 to S9 is sketched below.
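The following Python skeleton is a hedged illustration of how steps S1 to S9 chain together; every function name below is a placeholder invented for this sketch, not an API defined by the patent, and the bodies are deliberately left as stubs.

```python
from typing import Any, Dict, List, Tuple

Image = Any          # e.g. a NumPy array holding the acquired table image
TextBlock = Dict[str, Any]
Cell = Dict[str, Any]

# Placeholder step functions; names and signatures are illustrative only.
def detect_text_lines(image: Image) -> List[TextBlock]: ...                             # S2: (x, y, w, h, angle)
def recognize_text_lines(image: Image, blocks: List[TextBlock]) -> List[TextBlock]: ... # S3: content + 0/180 deg
def correct_skew(image: Image, blocks: List[TextBlock]) -> Tuple[Image, List[TextBlock]]: ...  # S4
def predict_table_lines(image: Image) -> Dict[str, Any]: ...                            # S5: 5-channel feature maps
def fuse_table_lines(feature_maps: Dict[str, Any]) -> Image: ...                        # S6: binary line map
def analyze_cells(line_map: Image) -> List[Cell]: ...                                   # S7: cells + rows/cols
def merge_cell_info(cells: List[Cell], blocks: List[TextBlock]) -> List[Cell]: ...      # S8
def format_output(cells: List[Cell], fmt: str = "xlsx") -> bytes: ...                   # S9

def extract_table(image: Image) -> bytes:
    """Run the whole pipeline of steps S1-S9 on one acquired table image."""
    blocks = detect_text_lines(image)                  # S2
    blocks = recognize_text_lines(image, blocks)       # S3
    image, blocks = correct_skew(image, blocks)        # S4
    feature_maps = predict_table_lines(image)          # S5
    line_map = fuse_table_lines(feature_maps)          # S6
    cells = analyze_cells(line_map)                    # S7
    cells = merge_cell_info(cells, blocks)             # S8
    return format_output(cells)                        # S9
```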
Further, the step S4 specifically includes the following steps:
step S41: computing the image angle: from all text lines, selecting those whose detection and recognition confidence exceed a set threshold T1 to compute the overall image angle; clustering the angles, with each cluster centered on a text block's angle and covering a deviation of ±5°; finally selecting the cluster containing the most text blocks, computing the image tilt angle from it, and applying tilt correction to the image;
step S42: mapping the text block coordinates onto the corrected image: rotating and mapping the text block information acquired in steps S2 and S3 according to the image tilt angle obtained in step S41, i.e. mapping the text coordinate information onto the tilt-corrected image. A sketch of this correction is given below.
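As a hedged sketch of steps S41 and S42, assuming OpenCV and NumPy, a confidence field named "score" on each text block, and fixed 10°-wide bins as a simplification of the ±5° clustering, the correction could look like this:

```python
import cv2
import numpy as np

def estimate_and_correct_skew(image, text_blocks, conf_thr=0.9):
    """Sketch of steps S41-S42: keep confident text blocks, cluster their angles,
    take the dominant cluster as the page skew, rotate the image, and map the
    text block centres onto the corrected image. conf_thr, the 'score' field and
    the fixed 10-degree-wide bins are assumptions made for this sketch."""
    angles = [b["angle"] for b in text_blocks if b.get("score", 1.0) > conf_thr]
    if not angles:
        return image, text_blocks
    bins = np.round(np.array(angles) / 10.0) * 10.0        # coarse stand-in for +/-5 deg clustering
    values, counts = np.unique(bins, return_counts=True)
    dominant = values[np.argmax(counts)]
    skew = float(np.mean([a for a in angles if abs(a - dominant) <= 5.0]))

    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), skew, 1.0)   # sign depends on the detector's angle convention
    corrected = cv2.warpAffine(image, M, (w, h), borderValue=(255, 255, 255))
    for b in text_blocks:                                        # S42: map block centres onto corrected image
        x, y = M @ np.array([b["x"], b["y"], 1.0])
        b["x"], b["y"] = float(x), float(y)
    return corrected, text_blocks
```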
Further, in step S5, the deep learning model includes the following algorithm steps:
step S51: performing feature extraction with a CNN + FPN network structure, where the CNN is a convolutional neural network such as ResNet, VGG, or MobileNet, and FPN is a feature pyramid network, a general-purpose structure; the CNN + FPN combination lets the model learn the spatial information of the low-level feature maps and the semantic information of the high-level feature maps of the image at the same time;
step S52: generating table regions with a Region Proposal Network (RPN): using an RPN to extract table ROIs (regions of interest) from the feature maps output in step S51, i.e. extracting a number of table candidate regions;
step S53: table classification branch and table boundary regression branch: applying ROI Pooling to each ROI output in step S52 to produce feature maps of uniform size, applying several convolutions after pooling, and then feeding the result into a table classification branch and a table boundary regression branch; the table classification branch is a fully connected layer with 2 classes (1 × 1 × 2), and the table boundary regression branch is a fully connected layer of dimension 2 × 4 (1 × 1 × 2 × 4);
step S54: table line segmentation prediction branch: applying RoIAlign (a pooling method) to each ROI output in step S52, then applying several convolution and deconvolution operations to obtain a mask feature map (512 × 512 × 5), i.e. a feature map with 5 channels representing a background map bg, a horizontal solid line segmentation map h1, a vertical solid line segmentation map v1, a horizontal virtual line segmentation map h2, and a vertical virtual line segmentation map v2; bg is the background point feature map used to decide whether a pixel in the current table is background or table line, i.e. if the response value of the current pixel exceeds a set threshold T2 the pixel is background; h1 is the horizontal solid line segmentation map, i.e. if a pixel's response value exceeds a set threshold T3 it is a point on a horizontal solid line of a wired table; v1 is the vertical solid line segmentation map, i.e. if a pixel's response value exceeds a set threshold T4 it is a point on a vertical solid line of a wired table; h2 is the horizontal virtual line segmentation map, i.e. if a pixel's response value exceeds a set threshold T5 it is a point on a horizontal virtual line of a wireless table; v2 is the vertical virtual line segmentation map, i.e. if a pixel's response value exceeds a set threshold T6 it is a point on a vertical virtual line of a wireless table; through this branch, all lines of all wired tables and all lines of all wireless tables in the current image can be obtained at once.
Furthermore, in step S51, the CNN uses ResNet18 as the feature extraction network rather than a more complex structure such as ResNet50, ResNet101, or DenseNet, for two main reasons: on one hand prediction speed, since an algorithm that is too slow cannot be deployed in practical applications; on the other hand, tables have obvious characteristics, the targets are generally large, and the row/column layout is regular, so a lightweight network meets the required prediction accuracy, striking a balance between speed and accuracy.
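For concreteness, a hedged PyTorch sketch of the table line segmentation branch of step S54 might look as follows; the channel counts, the number of convolution and upsampling stages, and the layer layout are assumptions made for this sketch, since the patent only fixes the 5-channel 512 × 512 output:

```python
import torch.nn as nn

class TableLineMaskHead(nn.Module):
    """Hypothetical sketch of the table line segmentation branch (step S54):
    several convolutions followed by deconvolutions that upsample the RoIAlign
    feature into a 5-channel map (bg, h1, v1, h2, v2)."""
    def __init__(self, in_channels=256):
        super().__init__()
        self.convs = nn.Sequential(
            *[nn.Sequential(nn.Conv2d(in_channels, in_channels, 3, padding=1),
                            nn.ReLU(inplace=True)) for _ in range(4)]
        )
        # Two stride-2 deconvolutions upsample the RoI feature 4x; the exact number
        # of stages needed to reach 512 x 512 depends on the RoIAlign output size.
        self.up = nn.Sequential(
            nn.ConvTranspose2d(in_channels, in_channels, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(in_channels, in_channels, 2, stride=2), nn.ReLU(inplace=True),
        )
        self.logits = nn.Conv2d(in_channels, 5, 1)   # bg, h1, v1, h2, v2

    def forward(self, roi_feat):                      # roi_feat: (N, C, h, w) from RoIAlign
        x = self.up(self.convs(roi_feat))
        return self.logits(x)                         # (N, 5, 4h, 4w)
```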
Further, the step S6 specifically includes the following steps:
step S61: fusing horizontal lines: in the h1 feature map, setting points whose response value exceeds the set threshold T3 to 255 and all other points to 0 to form a binary map, obtaining all foreground points of each line with connectivity analysis, and fitting all horizontal solid lines of the wired tables by least squares; similarly, in the h2 feature map, setting points whose response value exceeds the set threshold T5 to 255 and all other points to 0, and fitting all horizontal virtual lines of the wireless tables;
step S62: merging and filtering horizontal lines: taking all horizontal lines obtained in step S61, deleting lines shorter than a set threshold d1, merging pairs of horizontal lines whose vertical distance is below a set threshold d2, and merging pairs of horizontal lines whose end-to-end distance is below a set threshold d3;
step S63: fusing vertical lines: in the v1 feature map, setting points whose response value exceeds the set threshold T4 to 255 and all other points to 0 to form a binary map, obtaining all foreground points of each line with connectivity analysis, and fitting all vertical solid lines of the wired tables by least squares; similarly, in the v2 feature map, setting points whose response value exceeds the set threshold T6 to 255 and all other points to 0, and fitting all vertical virtual lines of the wireless tables;
step S64: merging and filtering vertical lines: taking all vertical lines obtained in step S63, deleting lines shorter than a set threshold D1, merging pairs of lines whose horizontal distance is below a set threshold D2, and merging pairs of lines whose end-to-end distance is below a set threshold D3; the thresholds D1-D3 may be set equal to or different from d1-d3 as required;
step S65: merging the horizontal and vertical table lines obtained in steps S62 and S64 into a single binary map. A line-fitting sketch for steps S61-S62 is given below.
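As a hedged sketch of steps S61 and S62, assuming OpenCV connected components for the connectivity analysis and NumPy's polyfit for the least-squares fit (the thresholds are illustrative only and the d2/d3 merging is left as a comment):

```python
import cv2
import numpy as np

def fit_horizontal_lines(h_map, response_thr, min_len):
    """Sketch of steps S61-S62 for one channel (h1 or h2): binarize the heat map,
    treat each connected component as one candidate line, drop fragments shorter
    than min_len (threshold d1), and fit each remaining component with a
    least-squares line y = a*x + b."""
    binary = np.where(h_map > response_thr, 255, 0).astype(np.uint8)
    num_labels, labels = cv2.connectedComponents(binary)
    lines = []
    for k in range(1, num_labels):                    # label 0 is the background
        ys, xs = np.where(labels == k)
        if xs.max() - xs.min() < min_len:
            continue
        a, b = np.polyfit(xs, ys, 1)                  # least-squares fit
        lines.append({"x0": int(xs.min()), "x1": int(xs.max()),
                      "slope": float(a), "intercept": float(b)})
    # Merging of nearly collinear / nearly touching lines (thresholds d2, d3)
    # would follow here; it is omitted in this sketch.
    return lines
```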
Further, the step S7 specifically includes the following steps:
step S71: preliminary cell analysis: finding, within the table range, the four boundary lines (L, T, R, B) nearest to each background pixel, where L is the ID of the nearest line on the left of the current point, T the ID of the nearest line above it, R the ID of the nearest line on its right, and B the ID of the nearest line below it; the candidate boundary lines are all horizontal and vertical lines analyzed in step S65;
if one of the boundary lines of the current point cannot be found, the corresponding ID is set to -1; boundary attribute analysis is performed in this way on every background point in the table to obtain each point's boundary attribute, and preliminary clustering is then performed on those boundary attributes;
the preliminary clustering rule is: when the boundary attributes of two points are identical, the two points are assigned to the same class and given the same class label; a minimum-area rectangle is then fitted to the pixels of each class, and the four corner coordinates of that rectangle (x1, y1, x2, y2, x3, y3, x4, y4) are the coordinate information of the class's cell, where (x1, y1) is the upper-left corner, (x2, y2) the upper-right corner, (x3, y3) the lower-right corner, and (x4, y4) the lower-left corner; a clustering sketch is given after this list of steps;
step S72: cell filtering: deleting cells generated by noise;
step S73: cell merging: when the four boundary attributes of two cells are identical, merging the two cells;
step S74: cell repair: when table lines are broken or incomplete, overlapping cells are produced; the overlapping cells are split and then merged, restoring the original table layout;
step S75: cell filling: when part of a table's boundary lines is missing, the cell position information is inaccurate, the cell boundaries cannot be aligned, and the table layout cannot be restored correctly, so those cells are repaired to make the boundary cells alignable; for an irregular table missing several lines at once, the positions of the boundary cells are adjusted so that the table is completed into a full table;
step S76: computing the table's row and column information: computing each cell's row and column attributes, i.e. which rows and columns of the table the current cell belongs to, from the coordinate information and connection attributes of the cells.
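A hedged sketch of the boundary-attribute clustering in step S71 follows; the nearest-line lookup is abstracted into a caller-supplied function, and an axis-aligned bounding box stands in for the minimum-area rectangle described in the patent:

```python
import numpy as np

def cluster_cells(background_mask, nearest_lines):
    """Sketch of step S71: each background pixel gets a boundary attribute
    (L, T, R, B), the IDs of its nearest left/top/right/bottom lines (-1 when a
    line is missing); pixels with identical attributes form one cell, and the
    bounding rectangle of each group gives that cell's coordinates.
    nearest_lines(y, x) is a placeholder for the nearest-line lookup built
    from the fused lines of step S65."""
    groups = {}
    ys, xs = np.where(background_mask)
    for y, x in zip(ys, xs):
        key = nearest_lines(y, x)                  # (L, T, R, B) tuple of line IDs
        groups.setdefault(key, []).append((x, y))
    cells = []
    for key, pts in groups.items():
        pts = np.asarray(pts)
        x1, y1 = pts.min(axis=0)
        x2, y2 = pts.max(axis=0)
        cells.append({"boundary": key, "box": (int(x1), int(y1), int(x2), int(y2))})
    return cells
```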
Further, the step S75 specifically includes the following steps:
step S751: computing the table boundary lines: finding the table's left, top, right, and bottom lines; if the boundary line in some direction is missing, its position is computed from the information of the existing lines and cells;
step S752: finding the boundary cells: finding all boundary cells of the table by relating the four boundary attributes of each cell to the four boundary lines of the table;
step S753: cell alignment: comparing each boundary cell with the four table boundary lines; if it is a top boundary cell, its upper boundary is set to the current table's top boundary line, and boundary cells in the other directions are handled analogously, until all boundary cells are filled and aligned with the boundary lines, providing a complete data basis for the subsequent row/column computation and table restoration. An alignment sketch is given below.
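A hedged sketch of the snapping in step S753; the is_top/is_left/is_right/is_bottom flags are hypothetical markers assumed to have been set in step S752, and table_box is the output of step S751:

```python
def align_boundary_cells(cells, table_box):
    """Sketch of step S753: snap the outer edge of every boundary cell to the
    corresponding table boundary line so that rows and columns line up once
    missing lines have been filled in."""
    left, top, right, bottom = table_box
    for cell in cells:
        x1, y1, x2, y2 = cell["box"]
        if cell.get("is_left"):
            x1 = left
        if cell.get("is_top"):
            y1 = top
        if cell.get("is_right"):
            x2 = right
        if cell.get("is_bottom"):
            y2 = bottom
        cell["box"] = (x1, y1, x2, y2)
    return cells
```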
Further, the step S8 specifically includes the following steps:
step S81: estimating the font size: estimating the size of a font according to the pixel height of the text block;
step S82: predicting the font property: on the basis of the step S2, inputting the text block into a font classifier for prediction, and acquiring the font attribute of the current text block;
step S83: integrating cell text content: using the text block positions from step S2 and the cell positions from step S7, the text block content is split or merged with the cells as reference; text located inside a cell becomes that cell's text content, and the text alignment mode can be estimated from the cell position and the positions of its text blocks; the integration finally yields the current cell's text content, position information, font size, and text alignment mode. A sketch of this assignment is given below.
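A hedged sketch of steps S81 and S83 follows; centre-point containment is a simplification of the split-or-merge rule, and the pixel-to-point factor used for the font size estimate is an assumption of this sketch:

```python
def assign_text_to_cells(cells, text_blocks, px_to_pt=0.75):
    """Sketch of steps S81/S83: a text block whose centre (x, y) falls inside a
    cell box contributes its content to that cell, blocks are concatenated in
    reading order, and the font size is estimated from the block pixel height."""
    for cell in cells:
        x1, y1, x2, y2 = cell["box"]
        inside = [b for b in text_blocks
                  if x1 <= b["x"] <= x2 and y1 <= b["y"] <= y2]
        inside.sort(key=lambda b: (b["y"], b["x"]))          # rough reading order
        cell["text"] = " ".join(b["text"] for b in inside)
        cell["font_size"] = round(max((b["h"] for b in inside), default=0) * px_to_pt)
    return cells
```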
The present invention also provides a table information extraction system for implementing any of the above table information extraction methods, the system comprising: a table image acquisition module, a text line analysis module, an image direction correction module, a table line analysis and prediction module, a table line analysis and fusion module, a table cell information analysis module, a cell information fusion module, and a formatted output module;
the table image acquisition module is used for acquiring table images;
the text line analysis module comprises a text line detection model and a text line recognition model; the text line detection model acquires the coordinate information (x, y, w, h, angle) of each text block on the table image, and the text line recognition model acquires the text content and direction information of each text block detected by the text line detection model, where the direction information indicates whether the text block is oriented at 0° or 180°;
the image direction correction module is used for calculating the inclination angle of the image according to the position angle information and the direction information of each text block acquired by the text line analysis module;
the table line analysis and prediction module is used for inputting the image output by the image direction correction module into a deep learning model mainly based on image example segmentation for analysis and extracting a table line characteristic diagram;
the table line analysis and fusion module is used for fusing the table line characteristic graph generated by the table line analysis and prediction module to generate a binary graph;
the table cell information analysis module performs cell analysis on the table line binary map generated by the table line analysis and fusion module: based on the principle that background pixels within the same cell share the same upper, lower, left, and right boundary lines, it finds the four nearest boundary lines of each background point within the table, then clusters the points on the principle that pixels in the same cell have the same boundary attribute, with each class representing one cell; it then computes the table's row and column information from the coordinate information and connection attributes of the cells;
the cell information fusion module is used for integrating cell information by combining output results of the text line analysis module and the table cell information analysis module, and each cell has cell position information, row and column information, cell text content and font attribute information of the text content;
and the formatting output module is used for outputting the result output by the cell information fusion module according to a data format or a file format.
Further, the image direction correction module comprises an image angle calculation module and a text block coordinate mapping module; the image angle calculation module selects, from all text lines, those whose detection and recognition confidence exceed a set threshold T1 to compute the overall image angle, clusters the angles with each cluster centered on a text block's angle and covering a deviation of ±5°, and finally selects the cluster containing the most text blocks to compute the image tilt angle and correct the image; the text block coordinate mapping module rotates and maps the text block information acquired by the text line analysis module onto the tilt-corrected image according to the tilt angle obtained by the image angle calculation module;
the table line analysis and prediction module contains a deep learning model that extracts features with a CNN + FPN network structure, generates a number of table candidate regions with an RPN, and performs table classification, table boundary regression, and table line segmentation prediction on each candidate region, finally obtaining all lines of all wired tables and all lines of all wireless tables in the current image;
the table line analysis and fusion module comprises a horizontal line analysis and fusion module and a vertical line analysis and fusion module; the horizontal line analysis and fusion module fits all horizontal solid and virtual lines by least squares, deletes horizontal lines shorter than a set threshold, merges pairs of horizontal lines whose vertical distance is below a set threshold, and merges pairs of horizontal lines whose end-to-end distance is below a set threshold; likewise, the vertical line analysis and fusion module fits all vertical solid and virtual lines by least squares, deletes vertical lines shorter than a set threshold, merges pairs of lines whose horizontal distance is below a set threshold, and merges pairs of lines whose end-to-end distance is below a set threshold; finally, the table line analysis and fusion module fuses the horizontal and vertical table lines obtained by the two sub-modules into a single binary map;
the table cell information analysis module also comprises a table boundary line calculation module, a boundary cell search module and a cell alignment module;
the table boundary line calculation module finds the table's left, top, right, and bottom lines, and if the boundary line in some direction is missing, computes its position from the information of the existing lines and cells;
the boundary cell lookup module finds all boundary cells of the table by relating the four boundary attributes of each cell to the four boundary lines of the table;
the cell alignment module compares each boundary cell with the four table boundary lines; if it is a top boundary cell, its upper boundary is set to the current table's top boundary line, and boundary cells in the other directions are handled analogously, until all boundary cells are filled and aligned with the boundary lines, providing a complete data basis for the subsequent row/column computation and table restoration.
Compared with the prior art, the invention has the following beneficial effects:
(1) the invention uses deep learning image processing in place of conventional image processing for table detection and table line detection, which improves their accuracy as well as the robustness and adaptability of the algorithm; for example, conventional methods adapt poorly to photographed or blurred table images, whereas the deep learning scheme, with image augmentation added during training (simulating table images in complex scenes), lets the model automatically learn the image characteristics of complex scenes, effectively improving the robustness and generality of the scheme while keeping excellent table extraction speed and quality;
(2) the invention provides an end-to-end deep learning scheme for table detection and table line detection in which a single model simultaneously predicts table regions, table types, and table line feature maps; compared with a two-stage scheme (one model for table region detection and table type prediction, another for table line segmentation), the end-to-end scheme fully exploits parameter sharing and reduces the prediction time of the whole pipeline: the two-stage scheme takes 240 ms on an RTX 2070 GPU, while the end-to-end detection of the invention takes 140 ms, a clear performance improvement;
(3) the cell analysis method of the invention clusters background pixels on the principle that the background pixels of a cell share its four boundary lines, thereby obtaining the cell positions; it directly handles cells spanning multiple rows or columns, restores cells very well for irregular tables and for broken, incomplete, or missing table lines, and its processing speed is clearly better than conventional rule-based image processing and existing deep learning schemes; moreover, since the final cell result depends on the table line detection of the previous stage, applying deep learning at that key stage gives the technical scheme of the invention a clear speed advantage without affecting the final cell analysis result;
(4) the table information extraction method and system accurately restore the table structure and formatting information, including the text's font size, font, and alignment mode, with an outstanding restoration effect;
(5) during cell analysis there is no need to distinguish wired tables from wireless tables; the two types share the same processing flow, which effectively reduces the complexity of the pipeline.
Drawings
FIG. 1 is a flowchart of a table information extraction algorithm according to an embodiment of the present invention;
FIG. 2 is an example of table text line detection results according to an embodiment of the present invention;
FIG. 3 is an example of table text recognition results according to an embodiment of the present invention;
FIG. 4 is an exemplary diagram of table image tilt correction according to an embodiment of the present invention;
FIG. 5 is a block diagram of an end-to-end table detection and table line segmentation algorithm according to an embodiment of the present invention;
FIG. 6 is an example of end-to-end table detection and table line segmentation results according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating an example process of table line fusion according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating an example process for analyzing a table cell in accordance with an embodiment of the present invention;
FIG. 9 is a diagram illustrating an example of a result of extracting the wireless form information according to an embodiment of the present invention;
FIG. 10 is a diagram illustrating an example of extraction results of cable form information according to an embodiment of the present invention;
FIG. 11 is a diagram illustrating an example of cell completion according to an embodiment of the present invention;
FIG. 12 is a diagram illustrating a cell preliminary cluster analysis result according to an embodiment of the present invention;
FIG. 13 is a schematic diagram of the cell repair process for a wired table with broken lines according to an embodiment of the present invention, where a is the original wired table image with a broken line, b is the preliminary cell analysis result in which cells 1/2/3 are produced at the break point, c details the position information of cells 1/2/3 from b, and d is the final result after cell repair, i.e. the three cells are repaired into two cells consistent with the table structure information;
FIG. 14 is a schematic diagram of the boundary cell filling process for a table with missing boundary lines according to an embodiment of the present invention, where A is the original table image with missing table lines; B is the table line analysis result; C is the cell analysis result, with the cell position marked as 1; D is the analysis result of the four table boundary lines, marked 1, 2, 3, and 4; E is the boundary cell analysis result, marked 1; and F shows the result after cell filling, where the position marked 1 is filled so that the completed table is fully aligned in rows and columns.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a table information extraction method, including the following steps:
step S1: acquiring a table image, where the table image may be provided in file formats such as JPG, BMP, TIFF, PNG, WORD, EXCEL, PDF, GIF, and HEIC, and supported sources include electronic documents, files obtained from a scanner, and files obtained from a camera;
step S2: detecting text line positions: inputting the table image from step S1 into a text line detection model, which predicts the coordinate information (x, y, w, h, angle) of each text block on the table image, where (x, y) is the position of the text block's center point, (w, h) its width and height, and angle its rotation angle; the text line detection model supports text line detection in any direction over 360°, as shown in fig. 2, where the left image is the original image to be detected and the right image is the text line detection result;
step S3: recognizing text line content: inputting the text blocks detected in step S2 into the text line recognition model, which returns the text content of each text block and its direction information, where the direction information indicates whether the text block is oriented at 0° or 180°, as shown in fig. 3;
step S4: correcting the image direction: computing the image tilt angle from the position angle and direction information of each text block acquired in steps S2 and S3 and applying tilt correction, as shown in fig. 4, where the left image is the original tilted image to be corrected and the right image is the image after tilt correction.
The step S4 specifically includes the following steps:
step S41: computing the image angle: from all text lines, selecting those whose detection and recognition confidence exceed a set threshold T1 to compute the overall image angle, while text lines whose confidence is at or below T1 are excluded from the angle clustering; the selected text blocks are clustered by angle, with each cluster centered on a text block's angle and covering a deviation of ±5°; the cluster containing the most text blocks is finally selected to compute the image tilt angle, and the image is tilt-corrected; through this step, the image is rotated to an angle close to 0° regardless of whether the input was at 0°, 90°, 180°, or any other angle;
step S42: mapping the text block coordinates onto the corrected image: after step S41 the image has been tilt-corrected and its tilt angle is known; according to this angle, the text block information acquired in steps S2 and S3 is rotated and mapped, i.e. the text coordinate information is mapped onto the tilt-corrected image;
step S5: table line analysis and prediction: the image output by step S4 is input into a deep learning model based mainly on image instance segmentation and analyzed to obtain the table line feature maps; the main structure of the deep learning model is shown in fig. 5, and its algorithm steps are as follows:
step S51: feature extraction with a CNN + FPN network structure: the CNN is a convolutional neural network and may in practice be any of ResNet, VGG, MobileNet, and the like; FPN is a feature pyramid network, a general-purpose structure; the CNN + FPN combination lets the model learn the spatial information of the low-level feature maps and the semantic information of the high-level feature maps at the same time;
as a preferred scheme, step S51 uses ResNet18 + FPN as the feature extraction network rather than a more complex CNN structure such as ResNet50, ResNet101, or DenseNet, for two main reasons: on one hand prediction speed, since an algorithm that is too slow cannot be deployed in practical applications; on the other hand, tables have obvious characteristics, the targets are generally large, and the row/column layout is regular, so a lightweight network meets the required prediction accuracy, striking a balance between speed and accuracy;
step S52: generating table regions with a Region Proposal Network (RPN): using an RPN to extract table ROIs (regions of interest) from the feature maps output in step S51, i.e. extracting a number of table candidate regions;
step S53: table classification branch and table boundary regression branch: applying ROI Pooling to each ROI output in step S52 to produce feature maps of uniform size, applying several convolutions after pooling, and then feeding the result into a table classification branch and a table boundary regression branch; the table classification branch is a fully connected layer with 2 classes (1 × 1 × 2) and decides whether the table is a wired or wireless table; the table boundary regression branch is a fully connected layer of dimension 2 × 4 (1 × 1 × 2 × 4);
step S54: table line segmentation prediction branch: applying RoIAlign pooling to each ROI output in step S52; RoIAlign is used to improve mask boundary precision, since it avoids the misalignment between the feature map and the original image introduced by ROI Pooling and thereby improves the accuracy of mask boundary prediction to a certain extent; several convolution and deconvolution operations are then applied to the RoIAlign output to obtain a mask feature map (512 × 512 × 5); the mask generated by this branch has a relatively high resolution, because if the resolution were too low, densely packed table lines could not be separated on the feature map and several lines would merge together, so the higher resolution adapts to table lines of various densities;
the table line segmentation prediction branch generates a feature map of 5 channels, each channel representing a background map bg, a horizontal solid line segmentation map h1, a vertical solid line segmentation map v1, a horizontal virtual line segmentation map h2, and a vertical virtual line segmentation map v2, as shown in fig. 6. bg is the background point feature map used to decide whether a pixel in the current table is background or table line, i.e. if the response value of the current pixel exceeds a set threshold T2 the pixel is background; h1 is the horizontal solid line segmentation map, i.e. if a pixel's response value exceeds a set threshold T3 it is a point on a horizontal solid line of a wired table; v1 is the vertical solid line segmentation map, i.e. if a pixel's response value exceeds a set threshold T4 it is a point on a vertical solid line of a wired table; h2 is the horizontal virtual line segmentation map, i.e. if a pixel's response value exceeds a set threshold T5 it is a point on a horizontal virtual line of a wireless table; v2 is the vertical virtual line segmentation map, i.e. if a pixel's response value exceeds a set threshold T6 it is a point on a vertical virtual line of a wireless table; all lines of all wired tables and all lines of all wireless tables in the current image can be obtained at once through this branch; the thresholds T2 to T6 may be set to the same value or to different values as required.
Step S6: analyzing the fused table line, fusing the table line feature maps generated in step S5 to generate a binary map, wherein step S6 may specifically include:
step S61: fusing horizontal lines: in the h1 feature map, setting points whose response value exceeds the set threshold T3 to 255 and all other points to 0 to form a binary map, obtaining all foreground points of each line with connectivity analysis, and fitting all horizontal solid lines of the wired tables by least squares; similarly, in the h2 feature map, setting points whose response value exceeds the set threshold T5 to 255 and all other points to 0, and fitting all horizontal virtual lines of the wireless tables;
step S62: merging and filtering horizontal lines: taking all horizontal lines obtained in step S61, deleting lines shorter than a set threshold d1, merging pairs of horizontal lines whose vertical distance is below a set threshold d2, and merging pairs of horizontal lines whose end-to-end distance is below a set threshold d3;
step S63: fusing vertical lines: in the v1 feature map, setting points whose response value exceeds the set threshold T4 to 255 and all other points to 0 to form a binary map, obtaining all foreground points of each line with connectivity analysis, and fitting all vertical solid lines of the wired tables by least squares; similarly, in the v2 feature map, setting points whose response value exceeds the set threshold T6 to 255 and all other points to 0, and fitting all vertical virtual lines of the wireless tables;
step S64: merging and filtering vertical lines: taking all vertical lines obtained in step S63, deleting lines shorter than a set threshold D1, merging pairs of lines whose horizontal distance is below a set threshold D2, and merging pairs of lines whose end-to-end distance is below a set threshold D3; the thresholds D1-D3 may be set equal to or different from d1-d3 as required;
step S65: the horizontal and vertical lines of the table acquired in step S62 and step S64 are merged to form a binary map, as shown in fig. 7.
Step S7: analyzing table cell information: performing cell analysis on the table line binary map generated in step S6; based on the principle that all background pixels within the same cell share the same upper, lower, left, and right boundary lines, finding for each background point within the table the four nearest boundary lines (L, T, R, B), where L, T, R, B are the IDs of the nearest left, upper, right, and lower lines (an ID being a unique number that distinguishes each line); then performing cluster analysis on the boundary attributes of the background points, i.e. two pixels with identical boundary attributes (the same left, upper, right, and lower lines) belong to the same class, and each class represents one cell; finally, computing the table's row and column information from the coordinate information and connection attributes of the cells. This cell analysis method handles cells that span multiple rows or columns, and also handles cell extraction and restoration for broken table lines, incomplete table lines, and irregular tables; an example is shown in fig. 8, where the left image is the table line fusion result, the middle image is the background pixel clustering result, and the right image is the final cell analysis result.
The step S7 may specifically include:
step S71: preliminary cell analysis: in principle, all background points within the same cell share the same four boundary lines, i.e. the same left, right, upper, and lower boundary lines; on this basis, the four boundary lines (L, T, R, B) nearest to each background pixel are found within the table range, where L is the ID of the nearest line on the left of the current point, T the ID of the nearest line above it, R the ID of the nearest line on its right, and B the ID of the nearest line below it, the candidate boundary lines being all horizontal and vertical lines analyzed in step S65;
if one of the boundary lines of the current point cannot be found, the corresponding ID is set to -1; boundary attribute analysis is performed in this way on every background point in the table to obtain each point's boundary attribute, and preliminary clustering is then performed on those boundary attributes;
the preliminary clustering rule is: when the boundary attributes of two points are identical, the two points are assigned to the same class and given the same class label; a minimum-area rectangle is then fitted to the pixels of each class, and the four corner coordinates of that rectangle (x1, y1, x2, y2, x3, y3, x4, y4) are the coordinate information of the class's cell, where (x1, y1) is the upper-left corner, (x2, y2) the upper-right corner, (x3, y3) the lower-right corner, and (x4, y4) the lower-left corner; a schematic diagram of the preliminary clustering result is shown in fig. 12;
step S72: cell filtering, namely filtering some cells generated by noise, for example, deleting cells smaller than a certain size which are considered as noise;
step S73: cell merging: when the four boundary attributes of the two cells are completely consistent, merging the two cells;
step S74: cell repair: when table lines are broken or incomplete, some overlapping cells are produced; the overlapping cells need to be split and then merged so that the original table layout is restored. The principle is: when two cells have three boundary attributes in common and their overlap ratio exceeds a threshold T, the larger cell is split first. The process is shown in fig. 13, where a is the original wired table image with a broken line, b is the preliminary cell analysis result in which cells 1/2/3 are produced at the break point, c details the position information of cells 1-3 from b, and d is the final result after repair, i.e. the three cells are repaired into two cells consistent with the table structure information. As seen in fig. 13, cell 2 and cell 1 satisfy the condition, so cell 2 is split in the horizontal direction into two new cells, 4 and 5; cell 4 then coincides completely with cell 1, so the two are merged into cell 1, and likewise cell 5 and cell 3 are merged into cell 3. In this way accurate cell position information is recovered when lines are broken or incomplete;
step S75: cell filling: some tables are missing part of a boundary line, as in fig. 11, where the left boundary line of the left image is incomplete; this makes the cell position information inaccurate, so the cell boundaries cannot be aligned and the table layout cannot be restored correctly, and those cells must be repaired so that the boundary cells can be aligned, giving the result shown in the right image of fig. 11. Other tables are missing several lines at once or are irregular, as in fig. 14, where the positions of the boundary cells must be adjusted so that the table is completed into a full table: A is the original table image with missing table lines; B is the table line analysis result; C is the cell analysis result, with the cell position marked as 1; D is the analysis result of the four table boundary lines, marked 1, 2, 3, and 4; E is the boundary cell analysis result, marked 1; and F shows the result after cell filling, where the position marked 1 is filled so that the completed table is fully aligned in rows and columns. The specific sub-steps of step S75 are as follows:
step S751: calculating the table boundary lines: finding the left, upper, right and lower edge lines of the table; if the boundary line in a certain direction is missing, the position of that boundary line is calculated (i.e., a virtual line is constructed) from the information of the existing lines and the existing cells;
step S752: searching for boundary cells: finding all boundary cells in the table by examining the relationship between the four boundary attributes of each cell and the four boundary lines of the table;
step S753: cell alignment: analyzing the boundary cells against the four edge lines of the table; if a boundary cell is an upper boundary cell, its upper boundary line is set to the upper boundary line of the current table, and boundary cells in the other directions are handled in the same way. Through this step, all boundary cells are completely filled and aligned with the boundary lines, providing a complete data basis for the subsequent row-column calculation (determining which row and column a cell belongs to) and table restoration;
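Before moving to step S76, the following is a simplified, assumption-laden sketch of steps S751-S753 for axis-aligned cells: the table's outer edges are derived from the outermost cells, a cell with no neighbour beyond it in a direction is treated as a boundary cell in that direction (a crude stand-in for the boundary-attribute test of step S752), and its outer side is snapped to the table edge.

```python
def fill_boundary_cells(cells):
    """cells: list of axis-aligned (x1, y1, x2, y2) boxes; returns aligned boxes."""
    # S751: table boundary lines, taken from the outermost existing cells
    left = min(c[0] for c in cells)
    top = min(c[1] for c in cells)
    right = max(c[2] for c in cells)
    bottom = max(c[3] for c in cells)

    def h_overlap(a, b):
        return min(a[2], b[2]) - max(a[0], b[0]) > 0

    def v_overlap(a, b):
        return min(a[3], b[3]) - max(a[1], b[1]) > 0

    filled = []
    for i, (x1, y1, x2, y2) in enumerate(cells):
        others = [o for j, o in enumerate(cells) if j != i]
        c = (x1, y1, x2, y2)
        # S752/S753: no neighbour above/below/left/right means this is a boundary
        # cell in that direction, so its outer side is set to the table edge
        if not any(h_overlap(o, c) and o[3] <= y1 for o in others):
            y1 = top
        if not any(h_overlap(o, c) and o[1] >= y2 for o in others):
            y2 = bottom
        if not any(v_overlap(o, c) and o[2] <= x1 for o in others):
            x1 = left
        if not any(v_overlap(o, c) and o[0] >= x2 for o in others):
            x2 = right
        filled.append((x1, y1, x2, y2))
    return filled
```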
step S76: calculating the table row and column information: calculating the row and column attributes of each cell, that is, which row and column of the table the current cell belongs to, according to the coordinate information and connection attributes of the cells.
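One plausible reading of this row-column calculation is sketched below: the distinct horizontal and vertical grid coordinates of the cell corners are clustered, and each cell is indexed against those grids, so a merged cell naturally spans several row or column indices. The patent only states that the row/column attributes follow from the coordinates and connection attributes, so the details here are assumptions.

```python
def compute_rows_cols(cells, tol=5):
    """cells: axis-aligned (x1, y1, x2, y2) boxes; returns row/col index and span per cell."""
    def cluster(values):
        # group nearly-equal coordinates into grid lines
        values = sorted(values)
        groups = [[values[0]]]
        for v in values[1:]:
            if v - groups[-1][-1] <= tol:
                groups[-1].append(v)
            else:
                groups.append([v])
        return [sum(g) / len(g) for g in groups]

    xs = cluster([c[0] for c in cells] + [c[2] for c in cells])  # vertical grid lines
    ys = cluster([c[1] for c in cells] + [c[3] for c in cells])  # horizontal grid lines

    def nearest(grid, v):
        return min(range(len(grid)), key=lambda i: abs(grid[i] - v))

    layout = []
    for x1, y1, x2, y2 in cells:
        row, col = nearest(ys, y1), nearest(xs, x1)
        layout.append({
            "row": row,
            "col": col,
            "row_span": max(1, nearest(ys, y2) - row),
            "col_span": max(1, nearest(xs, x2) - col),
        })
    return layout
```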
Step S8: integrating the cell information: combining the output results of step S2, step S3 and step S7 to integrate the cell information and obtain the table structure information, with the following specific steps:
step S81: estimating the font size: estimating the size of the font according to the pixel height of the text block;
step S82: predicting the font attribute: on the basis of step S2, inputting the text block into a font classifier for prediction and acquiring the font attribute of the current text block;
step S83: integrating the cell text content: according to the text block position information obtained in step S2 and the cell position information obtained in step S7, the text block content is segmented or merged with the cell as the reference, and the text information located inside a cell is the text content of the current cell; the text alignment mode can be estimated from the position of the cell and the positional relationship of the text blocks belonging to it. Through step S83, the text content, position information, font, font size and text alignment mode of the current cell are finally obtained by integration;
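A minimal sketch of this fusion is given below, under the assumption that text blocks and cells are axis-aligned boxes: each text block is assigned to the cell containing its centre, the font size is approximated from the block's pixel height, and the alignment is guessed from the block's horizontal offset inside the cell. The field names and the pixel-to-point factor are illustrative only.

```python
def fuse_text_into_cells(cells, text_blocks, px_per_pt=1.33):
    """cells: list of (x1, y1, x2, y2); text_blocks: list of {"box": (x1, y1, x2, y2), "text": str}."""
    fused = [{"text": [], "font_pt": None, "align": None} for _ in cells]
    for tb in text_blocks:
        bx1, by1, bx2, by2 = tb["box"]
        cx, cy = (bx1 + bx2) / 2, (by1 + by2) / 2
        for i, (x1, y1, x2, y2) in enumerate(cells):
            if x1 <= cx <= x2 and y1 <= cy <= y2:          # block centre lies inside the cell
                fused[i]["text"].append(tb["text"])
                fused[i]["font_pt"] = round((by2 - by1) / px_per_pt)   # S81: size from pixel height
                left_gap, right_gap = bx1 - x1, x2 - bx2               # S83: alignment estimate
                if abs(left_gap - right_gap) < 0.2 * (x2 - x1):
                    fused[i]["align"] = "center"
                else:
                    fused[i]["align"] = "left" if left_gap < right_gap else "right"
                break
    return fused
```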
step S9: formatted output: on the basis of step S8, formatted output is performed; for example, formatted data such as xml and json can be output, and formatted files such as word, excel, csv and txt can also be exported.
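For illustration only, the sketch below exports a restored table to two of the formats mentioned here (json and csv) using the Python standard library; the record layout is an assumption carried over from the earlier sketches.

```python
import csv
import json

def export_table(cells, json_path, csv_path):
    """cells: list of {"row": int, "col": int, "text": str} records."""
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(cells, f, ensure_ascii=False, indent=2)

    n_rows = max(c["row"] for c in cells) + 1
    n_cols = max(c["col"] for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for c in cells:
        grid[c["row"]][c["col"]] = c["text"]

    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(grid)
```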
Fig. 9 is an exemplary diagram of a wireless form extracted and structurally restored by the form information extraction method of the present invention, wherein a left diagram in fig. 9 is an original drawing of the wireless form to be extracted and restored, and a right diagram is a drawing of a wireless form extraction result; fig. 10 is an exemplary diagram of a wired table extracted and structurally restored by the table information extraction method of the present invention, wherein the left diagram of fig. 10 is an original drawing of the wired table to be extracted and restored, and the right diagram is a wired table extraction result diagram.
The invention further provides a table information extraction system for implementing the above table information extraction method, the system comprising: a table image acquisition module, a text line analysis module, an image direction correction module, a table line analysis and prediction module, a table line analysis and fusion module, a table cell information analysis module and a cell information fusion module.
Specifically, the form image acquisition module is used for acquiring a form image; the form image format supports file formats such as JPG, BMP, TIFF, PNG, WORD, EXCEL, PDF, GIF and HEIC, and the supported image sources include electronic documents, files obtained by scanning with a scanner, and files obtained by photographing with a photographing device.
The text line analysis module comprises a text line detection model and a text line recognition model. The text line detection model is used for acquiring the coordinate information (x, y, w, h, angle) of each text block on the form image, where (x, y) represents the position coordinates of the center point of the text block, (w, h) represents the width and height of the text block, and angle represents the angle of the text block; the text line detection model supports text line detection in any direction over 360 degrees. The text line recognition model acquires the text content and the direction information of each text block detected by the text line detection model, where the direction information of the text block includes judging whether the text block is at 0 degrees or 180 degrees.
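As a small illustration of this (x, y, w, h, angle) representation, the helper below converts a detected text block to its four corner points, assuming OpenCV's rotated-rectangle convention; it is not part of the patent.

```python
import cv2

def text_block_corners(x, y, w, h, angle):
    # ((centre), (size), angle) is OpenCV's RotatedRect layout
    return cv2.boxPoints(((x, y), (w, h), angle))   # 4 x 2 array of corner points
```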
The image direction correction module calculates the inclination angle of the image according to the position angle information and direction information of each text block acquired by the text line analysis module. The image direction correction module comprises an image angle calculation module and a text block coordinate mapping module: the image angle calculation module selects, from all text lines, the text lines whose detection precision and recognition precision are greater than a set threshold T1 to calculate the overall image angle, performs angle clustering within a deviation of plus or minus 5 degrees around the angle of the current text block, and finally selects the category containing the largest number of text blocks to calculate the inclination angle of the image and perform inclination correction; the text block coordinate mapping module rotationally maps the text block information acquired by the text line analysis module onto the inclination-corrected image according to the image inclination angle obtained by the image angle calculation module.
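The skew estimation and correction can be sketched as follows: keep only confidently detected and recognised text lines, cluster their angles with a plus/minus 5 degree window, take the mean angle of the largest cluster as the image inclination, and rotate the image by that angle. The field names and the confidence threshold value are assumptions.

```python
import numpy as np
import cv2

def estimate_skew(text_blocks, conf_thresh=0.9):
    """text_blocks: list of {"angle": float, "conf": float}; returns the estimated skew angle."""
    angles = [tb["angle"] for tb in text_blocks if tb["conf"] > conf_thresh]
    clusters = []
    for a in angles:
        for c in clusters:
            if abs(a - np.mean(c)) <= 5:        # +/- 5 degree clustering window
                c.append(a)
                break
        else:
            clusters.append([a])
    if not clusters:
        return 0.0
    return float(np.mean(max(clusters, key=len)))

def deskew(image, angle):
    h, w = image.shape[:2]
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_LINEAR)
```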
The table line analysis and prediction module is used for inputting the image output by the image direction correction module into a deep learning model based mainly on image instance segmentation for analysis, and extracting a table line feature map. A deep learning model is built into the table line analysis and prediction module; the deep learning model adopts a CNN + FPN network structure for feature extraction, then adopts an RPN network structure to generate a plurality of table candidate regions, and performs table classification, table boundary regression and table line segmentation prediction on each table candidate region, finally obtaining all lines of all wired tables and all wireless tables on the current image.
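Only the table-line segmentation head is sketched below, as a plain PyTorch module that maps pooled ROI features to five line-probability channels (background, horizontal solid, vertical solid, horizontal virtual, vertical virtual); the backbone, FPN and RPN stages are standard detection components and are omitted, and the channel sizes and layer counts here are assumptions rather than the patent's exact configuration.

```python
import torch
import torch.nn as nn

class TableLineMaskHead(nn.Module):
    """Predicts 5 segmentation maps (bg, h1, v1, h2, v2) for each pooled table ROI."""
    def __init__(self, in_channels=256, num_maps=5):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_channels, 256, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(256, 256, 3, padding=1), nn.ReLU(inplace=True),
        )
        # deconvolutions upsample the pooled ROI towards the mask resolution
        self.deconvs = nn.Sequential(
            nn.ConvTranspose2d(256, 256, 2, stride=2), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 256, 2, stride=2), nn.ReLU(inplace=True),
        )
        self.out = nn.Conv2d(256, num_maps, 1)

    def forward(self, roi_feats):            # roi_feats: (N, in_channels, H, W) from RoIAlign
        x = self.convs(roi_feats)
        x = self.deconvs(x)
        return self.out(x)                   # (N, 5, 4H, 4W) logits, thresholded downstream

# example: TableLineMaskHead()(torch.zeros(2, 256, 128, 128)).shape -> (2, 5, 512, 512)
```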
The table line analysis and fusion module is used for fusing the table line feature maps generated by the table line analysis and prediction module to generate a binary map. It comprises a horizontal-direction line analysis and fusion module and a vertical-direction line analysis and fusion module. The horizontal-direction line analysis and fusion module fits all solid lines and virtual lines in the horizontal direction using the least squares method, deletes horizontal lines whose length is smaller than a set threshold, merges two horizontal lines whose longitudinal distance is smaller than a set threshold, and merges two horizontal lines whose head-to-tail distance is smaller than a set threshold. Similarly, the vertical-direction line analysis and fusion module fits all solid lines and virtual lines in the vertical direction using the least squares method, deletes vertical lines whose length is smaller than a set threshold, merges two lines whose horizontal distance is smaller than a set threshold, and merges two lines whose head-to-tail distance is smaller than a set threshold. Finally, the table line analysis and fusion module fuses the horizontal and vertical table lines acquired by the two sub-modules to form a binary map.
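A sketch of the horizontal-direction fusion is shown below: the line probability map is thresholded, connected components are fitted with a least-squares line, short components are discarded, and lines with small vertical or head-to-tail gaps are merged. The threshold values are placeholders, not the patent's parameters; the vertical direction is handled symmetrically.

```python
import numpy as np
import cv2

def fuse_horizontal_lines(prob_map, resp_thresh=0.5, min_len=10, y_gap=5, x_gap=15):
    """prob_map: 2-D float array (e.g. an h1 or h2 feature map); returns fitted line segments."""
    binary = (prob_map > resp_thresh).astype(np.uint8) * 255
    n, labels = cv2.connectedComponents(binary)
    lines = []
    for k in range(1, n):
        ys, xs = np.where(labels == k)
        if xs.max() - xs.min() < min_len:
            continue                                  # delete lines shorter than the threshold
        slope, intercept = np.polyfit(xs, ys, 1)      # least-squares fit y = slope * x + intercept
        lines.append((xs.min(), xs.max(), slope, intercept))

    # merge lines whose vertical distance and head-to-tail gap are both small
    lines.sort(key=lambda l: l[3])
    merged = []
    for l in lines:
        if merged:
            x0, x1, s, b = merged[-1]
            if abs(l[3] - b) < y_gap and l[0] - x1 < x_gap:
                merged[-1] = (min(x0, l[0]), max(x1, l[1]), (s + l[2]) / 2, (b + l[3]) / 2)
                continue
        merged.append(l)
    return merged
```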
The table cell information analysis module is used for performing table cell analysis on the table line binary map generated by the table line analysis and fusion module. Based on the theory that background pixel points in the same cell all share the same upper, lower, left and right boundary lines, it finds the four boundary lines closest to each background point within the table range and then clusters the background points according to their boundary attributes, each class representing one cell; the table row and column information is then calculated according to the coordinate information and connection attributes of the cells. The table cell information analysis module further comprises a table boundary line calculation module, a boundary cell search module and a cell alignment module:
the table boundary line calculation module finds the left, upper, right and lower edge lines of the table, and if the boundary line in a certain direction is missing, the position of that boundary line is calculated from the information of the existing lines and the existing cells;
the boundary cell search module: finding all boundary cells in the table by examining the relationship between the four boundary attributes of each cell and the four boundary lines of the table;
the cell alignment module: and analyzing the boundary cell and the four edge lines of the table, if the boundary cell is an upper boundary cell, setting the upper boundary line of the cell as the upper boundary line of the current table, and setting the boundary cells in other directions in the same way until all the boundary cells are completely filled and aligned with the boundary line, thereby providing a complete data basis for the information calculation of the row and column of the subsequent cell and the table restoration.
The cell information fusion module is used for integrating cell information by combining output results of the text line analysis module and the table cell information analysis module, and each cell has cell position information, row and column information, cell text content and font attribute information of the text content;
the formatting output module is used for outputting the result output by the cell information fusion module according to a data format or a file format, for example, formatted data such as xml and json can be output, and formatted files such as word, excel, csv and txt can also be exported.
The above description is only an example of the present application and is not intended to limit the present invention. Any modification, equivalent replacement, and improvement made within the scope of the application of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A form information extraction method is characterized by comprising the following steps:
step S1: acquiring a form image;
step S2: detecting the position of a text line: inputting the table image in the step S1 into a text line detection model for prediction, and obtaining coordinate information (x, y, w, h, angle) of each text block on the table image, where (x, y) represents the position coordinates of the center point of the text block, (w, h) represents the width and height of the text block, and angle represents the angle of the text block;
step S3: identifying the content of the text line: inputting the text block detected in the step S2 into a text line recognition model, and acquiring the text content of each text block and the direction information of the text block, where the direction information of the text block includes judging whether the text block is 0 ° or 180 °;
step S4: correcting the image direction: calculating the inclination angle of the image and performing inclination correction according to the position angle information and the direction information of each text block acquired in the steps S2 and S3;
step S5: table line analysis predicts: inputting the image output in the step S4 into a deep learning model mainly based on image instance segmentation, analyzing, and extracting a table line feature map;
step S6: analyzing the fusion table line: fusing the table line feature maps generated in the step S5 to generate a binary map;
step S7: analyzing the table cell information: performing table cell analysis on the table line binary image generated in the step S6, and finding four boundary lines (L, T, R, B) closest to each background point in the range of the table based on the theory that background pixels in the same cell all share the same upper, lower, left and right boundary lines, wherein L, T, R, B represents the IDs of the left edge line, the upper edge line, the right edge line and the lower edge line closest to the current point respectively; then, carrying out cluster analysis according to the boundary attribute of the background point, namely when two pixel points have the same boundary attribute, recognizing that the two pixel points belong to the same class, and each class represents a cell; finally, calculating the row and column information of the table according to the coordinate information and the connection attribute of the cells;
step S8: fusing cell information: combining the output results of the step S2, the step S3 and the step S7, integrating the cell information to obtain table structure information;
step S9: formatted output: outputting the result output in the step S8 according to a data format or a file format.
2. The form information extraction method according to claim 1, wherein the step S4 specifically includes the steps of:
step S41: calculating the angle of the image: selecting a text line with detection precision and recognition precision larger than a set threshold T1 from all text lines to calculate the overall angle of the image, carrying out angle clustering by taking the angle of the current text block as the center and carrying out positive and negative deviation of 5 degrees, and finally selecting the category with the maximum number of the text blocks to calculate the inclination angle of the image and carry out inclination correction on the image;
step S42: mapping the text block coordinates onto the corrected image: the text block information acquired in steps S2 and S3 is rotation-mapped, i.e., the text coordinate information is mapped onto the tilt-corrected image, according to the image tilt angle acquired in step S41.
3. The method for extracting form information according to claim 2, wherein in the step S5, the deep learning model includes the following algorithm steps:
step S51: performing feature extraction by using a CNN + FPN network structure, wherein CNN represents a convolutional neural network, including but not limited to one of the resnet, vgg and mobilenet convolutional neural networks; FPN represents a feature pyramid network, which is a general-purpose network structure; through the CNN + FPN network structure, the spatial information of the low-level feature maps and the semantic information of the high-level feature maps of the image can be learned at the same time;
step S52: generating table regions by using an RPN (region proposal network): performing table region ROI extraction on the feature map output in step S51 by using the RPN, namely extracting a plurality of table candidate regions;
step S53: table classification branch and table boundary regression branch: performing ROI pooling on each ROI output in step S52 to output feature maps of consistent size, performing multiple convolutions after pooling, and then sending the feature maps into a table classification branch and a table boundary regression branch respectively; wherein the table classification branch performs one fully connected layer (1 × 1 × 2) with 2 classes, and the table boundary regression branch performs one fully connected layer (1 × 1 × 2 × 4) with 2 × 4 dimensions;
step S54: table line segmentation prediction branch: performing RoIAlign pooling on each ROI output in step S52, and performing multiple convolution and deconvolution operations on the RoIAlign output to obtain a mask feature map (512 × 512 × 5), generating a feature map of 5 channels, the channels respectively representing a background map bg, a horizontal solid line segmentation map h1, a vertical solid line segmentation map v1, a horizontal virtual line segmentation map h2, and a vertical virtual line segmentation map v2; wherein bg represents a background point feature map used for judging whether a certain pixel in the current table is background or a table line, that is, if the response value of the current pixel is greater than a set threshold T2, the pixel is background; h1 represents the solid line segmentation map in the horizontal direction, that is, if the response value of a certain pixel point in the segmentation map is greater than a set threshold T3, the current pixel point is a point on a horizontal solid line in a wired table; v1 represents the solid line segmentation map in the vertical direction, that is, if the response value of a certain pixel point in the segmentation map is greater than a set threshold T4, the pixel point is a point on a vertical solid line in a wired table; h2 represents the virtual line segmentation map in the horizontal direction, that is, if the response value of a certain pixel point in the segmentation map is greater than a set threshold T5, the current pixel point is a point on a horizontal virtual line in a wireless table; v2 represents the virtual line segmentation map in the vertical direction, that is, if the response value of a certain pixel point in the segmentation map is greater than a set threshold T6, the current pixel point is a point on a vertical virtual line in a wireless table; through this branch, all lines of all wired tables and all lines of all wireless tables on the current image can be acquired simultaneously.
4. The form information extraction method of claim 3, wherein in step S51, the CNN uses resnet18 as a feature extraction network.
5. The form information extraction method according to claim 3 or 4, wherein the step S6 specifically includes the steps of:
step S61: fusion of horizontal direction lines: setting the point with the response value larger than a set threshold value T3 in the h1 characteristic diagram as 255 and setting other points as 0 to form a binary diagram, acquiring all foreground points of each line by using connectivity analysis, and fitting all solid lines in the horizontal direction in the wired table by using a least square method; similarly, in the h2 feature map, a point with a response value larger than the set threshold T5 is set to be 255, and other points are set to be 0, and all virtual lines in the horizontal direction in the wireless table are fitted;
step S62: merging and filtering horizontal direction lines: acquiring all the horizontal lines through the step S61, deleting the lines with the length smaller than the set threshold d1, merging the lines with the longitudinal distance smaller than the set threshold d2 of the two horizontal lines, and merging the lines with the head-to-tail distance smaller than the set threshold d3 of the two horizontal lines;
step S63: fusion of vertical direction lines: setting the point with the response value larger than a set threshold value T4 in the v1 characteristic diagram as 255 and setting other points as 0 to form a binary diagram, acquiring all foreground points of each line by using connectivity analysis, and fitting all solid lines in the vertical direction in the wired table by using a least square method; similarly, in the v2 feature map, a point with a response value larger than the set threshold T6 is set to 255, and other points are set to 0, and all virtual lines in the vertical direction in the wireless table are fitted;
step S64: merging and filtering of vertical direction lines: acquiring all lines in the vertical direction through step S63, deleting lines whose length is smaller than the set threshold D1, merging lines whose distance in the horizontal direction is smaller than the set threshold D2, and merging lines whose distance from the head to the tail is smaller than the set threshold D3;
step S65: the horizontal and vertical lines of the table acquired in step S62 and step S64 are merged to form a binary map.
6. The form information extraction method according to claim 5, wherein the step S7 specifically includes the steps of:
step S71: preliminary analysis of cells: finding four boundary lines (L, T, R, B) closest to each background point pixel in the table range, wherein L represents the ID of the left boundary line of the current point, T represents the ID of the upper boundary line of the current point, R represents the ID of the right boundary line of the current point, and B represents the ID of the lower boundary line of the current point, the boundary lines being all horizontal lines and vertical lines analyzed in step S65;
if a certain boundary line of the current point is not found, the ID of the corresponding boundary line is set to -1; in this way, boundary attribute analysis is performed on all background points in the table to obtain the boundary attribute of each point, and preliminary clustering is then performed according to the boundary attributes of the points;
the preliminary clustering rule is as follows: when the boundary attributes of two points are completely consistent, the two points are classified into the same class and given the same class label; minimum-area-rectangle analysis is performed on the pixels of each class to obtain the four corner points of the corresponding rectangle, and the coordinates (x1, y1, x2, y2, x3, y3, x4, y4) of the four corner points are the coordinate information of the cell of the current class, wherein (x1, y1) represents the coordinates of the upper left corner of the cell, (x2, y2) the upper right corner, (x3, y3) the lower right corner, and (x4, y4) the lower left corner;
step S72: cell filtering: deleting cells generated by noise;
step S73: cell merging: when the four boundary attributes of the two cells are completely consistent, merging the two cells;
step S74: cell repair: when the table lines are broken or incomplete, some overlapped cells are generated, the overlapped cells are cut and then merged, and the original table style is restored;
step S75: cell filling: for the table with the missing boundary line part, repairing the cell of the part to ensure that the boundary cells can be aligned; for an irregular table missing a plurality of lines at the same time, adjusting the positions of the boundary cells to make the irregular table complete according to the complete table;
step S76: calculating the table row and column information: calculating the row and column attributes of the cells, that is, which row and column of the table the current cell belongs to, according to the coordinate information and connection attributes of the cells.
7. The form information extraction method according to claim 6, wherein the step S75 specifically includes the steps of:
step S751: calculating the boundary line of the table: finding out the left line, the upper line, the right line and the lower line of the table, if the boundary line in a certain direction is missing, calculating the position of the boundary line in the direction according to the information of the existing line and the existing cells;
step S752: and searching for boundary cells: finding out all boundary cells in the table by searching the relationship between the four boundary attributes of the cells and the four boundary lines of the table;
step S753: cell alignment: and analyzing the boundary cell and the four edge lines of the table, if the boundary cell is an upper boundary cell, setting the upper boundary line of the cell as the upper boundary line of the current table, and setting the boundary cells in other directions in the same way until all the boundary cells are completely filled and aligned with the boundary line, thereby providing a complete data basis for the information calculation of the row and column of the subsequent cell and the table restoration.
8. The form information extraction method according to claim 7, wherein the step S8 specifically includes the steps of:
step S81: estimating the font size: estimating the size of a font according to the pixel height of the text block;
step S82: predicting the font property: on the basis of the step S2, inputting the text block into a font classifier for prediction, and acquiring the font attribute of the current text block;
step S83: integrating cell text content: according to the text block position information obtained in the step S2 and the cell position information obtained in the step S7, the text block content is segmented or merged with the cell as a reference, the text information located inside the cell is the text content of the current cell, the text alignment mode can be estimated according to the position of the cell and the position relationship of the text block to which the cell belongs, and finally the text content, the position information, the font size and the alignment mode of the text of the current cell are obtained by integration.
9. A form information extraction system for implementing a form information extraction method according to any one of claims 1 to 8, characterized by comprising: the system comprises a table image acquisition module, a text line analysis module, an image direction correction module, a table line analysis and prediction module, a table line analysis and fusion module, a table cell information analysis module and a cell information fusion module;
the form image acquisition module is used for acquiring form images;
the text line analysis module comprises a text line detection model and a text line identification model, wherein the text line detection model is used for acquiring coordinate information (x, y, w, h, angle) of each text block on the form image; the text line identification model acquires the text content of the text block and the direction information of the text block detected by the text line detection model, wherein the direction information of the text block comprises the judgment of 0 degree or 180 degrees of the text block;
the image direction correction module is used for calculating the inclination angle of the image according to the position angle information and the direction information of each text block acquired by the text line analysis module;
the table line analysis and prediction module is used for inputting the image output by the image direction correction module into a deep learning model mainly based on image example segmentation to predict and extracting a table line characteristic diagram;
the table line analysis and fusion module is used for fusing the table line characteristic graph generated by the table line analysis and prediction module to generate a binary graph;
the table cell information analysis module is used for carrying out table cell analysis on the table line binary image generated by the table line analysis and fusion module, and analyzing the table cell based on the theory that background pixel points in the same cell all share the same upper, lower, left and right boundary lines, finding four boundary lines which are closest to each background point in the table range, and then carrying out cluster analysis according to the theory that the pixel points in the same cell have the same boundary attribute, wherein each type represents one cell; calculating the row and column information of the table according to the coordinate information and the connection attribute of the cells;
the cell information fusion module is used for integrating cell information by combining output results of the text line analysis module and the table cell information analysis module, and each cell has cell position information, row and column information, cell text content and font attribute information of the text content;
and the formatting output module is used for outputting the result output by the cell information fusion module according to a data format or a file format.
10. The form information extraction system of claim 9, wherein:
the image direction correction module comprises an image angle calculation module and a text block coordinate mapping module, the image angle calculation module selects a text line with detection precision and recognition precision larger than a set threshold value T1 from all text lines to calculate the whole image angle, carries out angle clustering by taking the angle of the current text block as the center and carrying out positive and negative deviation of 5 degrees, and finally selects the category with the largest number of the text blocks to calculate and obtain the inclination angle of the image and carry out inclination correction on the image; the text block coordinate mapping module rotationally maps the text block information acquired by the text line analysis module to an image subjected to tilt correction according to the image tilt angle acquired by the image angle calculation module;
the table line analysis prediction module is internally provided with a deep learning model, the deep learning model adopts a CNN + FPN network structure to perform feature extraction, then adopts an RPN network structure to generate a plurality of table candidate areas, and performs table classification, table boundary regression and table line segmentation prediction on each table candidate area to finally obtain all lines of all wired tables and all lines of all wireless tables on the current image;
the table line analysis and fusion module comprises a horizontal direction line analysis and fusion module and a vertical direction line analysis and fusion module, the horizontal direction line analysis and fusion module fits all solid lines and virtual lines in the horizontal direction by using a least square method, deletes the lines of which the length of the horizontal lines is smaller than a set threshold value, merges the lines of which the longitudinal distance between the two horizontal lines is smaller than the set threshold value, and merges the lines of which the head-to-tail distance between the two horizontal lines is smaller than the set threshold value; similarly, the vertical direction line analysis and fusion module fits all solid lines and virtual lines in the vertical direction by using a least square method, deletes lines of which the length of the vertical line is smaller than a set threshold value, merges lines of which the distance in the horizontal direction of the two lines is smaller than the set threshold value, and merges lines of which the head-to-tail distance of the two lines is smaller than the set threshold value; finally, the table line analysis and fusion module fuses the horizontal lines and the vertical lines of the table acquired by the horizontal direction line analysis and fusion module and the vertical direction line analysis and fusion module to form a binary image;
the table cell information analysis module also comprises a table boundary line calculation module, a boundary cell search module and a cell alignment module;
the table boundary line calculation module finds the left, upper, right and lower edge lines of the table, and if the boundary line in a certain direction is missing, the position of that boundary line is calculated from the information of the existing lines and the existing cells;
the boundary cell search module: finding all boundary cells in the table by examining the relationship between the four boundary attributes of each cell and the four boundary lines of the table;
the cell alignment module: and analyzing the boundary cell and the four edge lines of the table, if the boundary cell is an upper boundary cell, setting the upper boundary line of the cell as the upper boundary line of the current table, and setting the boundary cells in other directions in the same way until all the boundary cells are completely filled and aligned with the boundary line, thereby providing a complete data basis for the information calculation of the row and column of the subsequent cell and the table restoration.
Priority Applications (1)
Application Number: CN202111665466.XA; Priority Date: 2021-12-31; Filing Date: 2021-12-31; Title: Table information extraction method and system

Applications Claiming Priority (1)
Application Number: CN202111665466.XA; Priority Date: 2021-12-31; Filing Date: 2021-12-31; Title: Table information extraction method and system

Publications (1)
Publication Number: CN114419647A; Publication Date: 2022-04-29

Family ID: 81270817

Family Applications (1)
Application Number: CN202111665466.XA; Title: Table information extraction method and system; Priority Date: 2021-12-31; Filing Date: 2021-12-31; Status: Pending; Publication: CN114419647A

Country Status (1)
CN: CN114419647A (en)


Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115661847A (en) * 2022-09-14 2023-01-31 北京百度网讯科技有限公司 Table structure recognition and model training method, device, equipment and storage medium
CN115661847B (en) * 2022-09-14 2023-11-21 北京百度网讯科技有限公司 Table structure recognition and model training method, device, equipment and storage medium
CN116052193A (en) * 2023-04-03 2023-05-02 杭州实在智能科技有限公司 RPA interface dynamic form picking and matching method and system
CN116311310A (en) * 2023-05-19 2023-06-23 之江实验室 Universal form identification method and device combining semantic segmentation and sequence prediction
CN116343247A (en) * 2023-05-24 2023-06-27 荣耀终端有限公司 Form image correction method, device and equipment
CN116343247B (en) * 2023-05-24 2023-10-20 荣耀终端有限公司 Form image correction method, device and equipment
CN116824611A (en) * 2023-08-28 2023-09-29 星汉智能科技股份有限公司 Table structure identification method, electronic device, and computer-readable storage medium
CN116824611B (en) * 2023-08-28 2024-04-05 星汉智能科技股份有限公司 Table structure identification method, electronic device, and computer-readable storage medium


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination