CN115620322A - Method for identifying table structure of whole-line table based on key point detection - Google Patents
Method for identifying table structure of whole-line table based on key point detection
- Publication number
- CN115620322A (application CN202211637591.4A)
- Authority
- CN
- China
- Prior art keywords
- line
- upper left
- points
- left corner
- point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/412—Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/151—Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
- G06T7/66—Analysis of geometric attributes of image moments or centre of gravity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/28—Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/32—Normalisation of the pattern dimensions
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/34—Smoothing or thinning of the pattern; Morphological operations; Skeletonisation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
Abstract
The invention discloses a method for identifying the table structure of a full-line table based on key point detection, comprising the following steps: detecting the key points in the table image with a key point detection network to obtain a Gaussian heat map containing the position information of all key points; scaling the Gaussian heat map to the size of the input table image and obtaining the coordinate positions of all key points with a contour central-moment algorithm; analyzing the structural position relationships of the key points in the table with a scan-line method; detecting whether a connection relationship exists between adjacent key points with a connected-component method; and reconstructing all cells in the table from the structural position relationships and connection relationships among the key points, then converting the cells into the required markup-language description. By adopting a deep-learning key point detection method, the invention can robustly find all table-line intersections in the table image and obtain the accurate positions of all cells from these key points, thereby completing table structure recognition with high quality.
Description
Technical Field
The invention belongs to the technical field of image processing and artificial intelligence, and particularly relates to a method for identifying a table structure of a full-line table based on key point detection.
Background
A table is a simple and compact way of organizing content, has a good visualization effect, and is widely used in data visualization, computer software, bill records, and other applications. With the advent of the information age, ever more data is generated in people's daily life, business exchanges, experimental research, and so on, and the number of tables has grown exponentially; such table data is very valuable for systems such as big-data mining and knowledge graphs. However, structured data such as tables is usually not easy for an algorithm to identify, and merged cells further increase the difficulty of recognizing the table structure. To extract the data in tables efficiently, identifying the table structure is one of the key tasks, so developing an algorithm that can effectively identify table structures and assist in extracting table data is significant.
Traditional table structure recognition algorithms usually detect the table lines with manually designed operators, obtain the row and column information of the table lines, and then determine the cells in the table from the intersections of the table lines to complete table structure recognition. However, these methods have certain drawbacks: on the one hand, line detection depends heavily on the performance of the manually designed operators, and the detected table lines are prone to problems such as broken short segments; on the other hand, these methods are often only applicable to the regular, clean tables exported from PDF files, and their performance drops sharply when faced with the irregular and highly varied tables obtained by scanning, photographing, and so on.
Therefore, with the rise of deep learning, table structure recognition algorithms based on deep learning have emerged. At present they can be divided into three categories:
The first category is based on instance segmentation: the idea is to segment all cells in a table with a deep-learning instance segmentation algorithm and then complete table structure recognition with manually designed post-processing. However, such algorithms can only locate the cells and cannot identify the structural position of each cell in the table, so their performance depends on the hand-crafted post-processing. Moreover, a segmentation algorithm only provides the area occupied by a cell, and the approximate shape and position of the cell must be recovered with algorithms such as the minimum bounding rectangle, which is not accurate enough.
The second category is based on image-to-sequence models: the core idea is to encode the table structure as a sequence such as LaTeX or HTML, so that table structure recognition can be completed directly by a model that converts an image into a sequence. However, such methods do not explicitly use the position and structure information of the cells, which limits model performance.
The third category is based on graph neural networks: the key idea is to model the cell relationships in a table with a graph neural network, taking the cells as nodes and the structural relationships between cells as edges, thereby converting table structure recognition into an edge classification problem. However, these methods must first obtain the cell positions from a cell detection network and cannot be trained end-to-end with it, so their performance is limited by the performance of the cell detection network.
In summary, traditional methods lack robustness and cannot cope with today's more complicated tables, while deep-learning methods are limited by the difficulty of accurately acquiring cell positions. Therefore, a robust method that can accurately acquire cell positions is needed for table structure recognition.
Disclosure of Invention
In view of the above, there is a need for a method for identifying the table structure of a whole-line table based on key point detection, which takes the intersections of table lines as key points: first all key points in a table image are detected with a key point detection model, then the structural position relationship (the row and column each key point lies on) and the connection relationship of these key points in the table are determined, and finally the cells are reconstructed from the relationships between the key points, thereby completing table structure recognition.
The invention discloses a method for identifying the table structure of a full-line table based on key point detection, comprising the following steps:
Step 1, detect the key points in the table image with a key point detection network to obtain a Gaussian heat map containing the position information of all key points;
Step 2, scale the Gaussian heat map to the size of the input table image and obtain the coordinate positions of all key points with the contour central-moment algorithm;
Step 3, analyze the structural position relationships of the key points in the table with the scan-line method;
Step 4, detect whether a connection relationship exists between adjacent key points with the connected-component method;
Step 5, reconstruct all cells in the table from the structural position relationships and connection relationships between the key points, and convert the cells into the required markup-language description.
Optionally, the key point detection network is HRNet.
Optionally, the key point detection network is DeeperCut.
The key point detection network uses the self-constructed table key point data set for pre-training to obtain the capability of detecting the key points of the table.
Specifically, the table key point data set contains 500 table images, with the coordinates of all table-line key points in each table image given as labels of the key point coordinate positions. The table images are sourced from work-order images and are all full-line tables.
Specifically, the contour central-moment algorithm comprises the following steps:
applying Gaussian blur to the Gaussian heat map to reduce the noise points appearing on the contour edges in the heat map;
performing a binarization operation on the Gaussian heat map to obtain a binary map;
performing contour detection on the binary map to obtain the contours of all key points;
calculating the centroid of each key point's contour from its spatial moments; the centroid is the coordinate position of the key point.
Optionally, contour detection is implemented with the findContours() method in OpenCV.
Specifically, the centroid is calculated from the spatial moments of the contour:

$$m_{pq} = \sum_{(x, y) \in C} x^p y^q, \qquad \bar{x} = \frac{m_{10}}{m_{00}}, \qquad \bar{y} = \frac{m_{01}}{m_{00}}$$

where $C$ is the set of all points on the contour of a key point; $x$ and $y$ are the abscissa and ordinate of a contour point; $x^p$ and $y^q$ denote the $p$-th power of $x$ and the $q$-th power of $y$; $m_{pq}$ is the sum over all contour points of the product $x^p y^q$; $m_{00}$ is the zero-order spatial moment of the contour; and $m_{10}$ and $m_{01}$ are the first-order spatial moments. The centroid $(\bar{x}, \bar{y})$ is the coordinate position of the key point.
Specifically, the structural position of a key point in the table means which row table line and which column table line the key point belongs to.
Furthermore, the scan-line method comprises the following steps:
scale the table image to a fixed height and width;
set an initial straight line as the row scan line and move it vertically through the table image from top to bottom, a fixed distance d at a time, counting after each move the number n of key points above the row scan line;
when the row scan line has moved a preset number N of consecutive distances d without the number n of key points increasing, place a row separator at the current position;
separate all key points by the row separators, thereby obtaining which row table line each key point belongs to;
set an initial straight line as the column scan line and move it horizontally through the table image from left to right, a fixed distance d at a time;
after each move of distance d, count the number n of key points to the left of the column scan line;
when the column scan line has moved a preset number N of consecutive distances d without the number n of key points increasing, place a column separator at the current position;
separate all key points by the column separators, thereby obtaining which column table line each key point belongs to.
Preferably, if the table in the table image is tilted, the corresponding row scan line or column scan line needs to be corrected;
the correction process for the row scan line is as follows: first set the initial row scan line as a horizontal straight line and run the scan-line method once to obtain the points located on the first row table line, then compute an approximate slope from the coordinates of these points by the least-squares method, and apply this slope to the initial row scan line to obtain the corrected row scan line;
the correction process for the column scan line is as follows: first set the initial column scan line as a vertical straight line and run the scan-line method once to obtain the points located on the first column table line, then compute an approximate slope from the coordinates of these points by the least-squares method, and apply this slope to the initial column scan line to obtain the corrected column scan line;
the approximate slope is obtained by the least-squares method as follows:

$$k = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where $k$ is the approximate slope of the corrected row or column scan line, $(x_i, y_i)$ are the coordinates of the points on the first row table line or first column table line, and $n$ is the number of such points;

the means $\bar{x}$ and $\bar{y}$ are calculated as follows:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
specifically, the adjacent keypoints refer to two keypoints belonging to the same row grid line and belonging to adjacent list grid lines, or two keypoints belonging to the same list grid line and belonging to adjacent row grid lines.
Specifically, the connected-component method comprises the following steps:
traverse all pairs of adjacent key points; taking each key point position as a center, first draw a circle of radius r, then take the minimum bounding rectangle containing the two circles centered on the adjacent key points as the cropping area, and crop the local region map containing the adjacent key points out of the table image;
after converting the local region map to a grayscale image, smooth it with a Gaussian low-pass filter, fill the gaps in the connected components with a closing operation, convert it to a binary map with an adaptive thresholding algorithm, and then find the second-largest connected component (the largest connected component is the background, not the foreground);
taking the maximum value of the height and the width of the local area map asThe maximum of the height and width of the second largest connected domain isCalculating the ratio ofThe calculation is as follows:
when the temperature is higher than the set temperatureAnd if the threshold value is larger than the preset threshold value, the connection relation exists between the adjacent key points, otherwise, the connection relation does not exist.
Furthermore, an existing connection relationship is either a vertical connection relationship or a horizontal connection relationship: if the height of the local region map is greater than its width, the relationship is vertical; if the height is less than the width, it is horizontal;
likewise, an absent connection relationship is either an absent vertical connection relationship or an absent horizontal connection relationship: if the height of the local region map is greater than its width, the vertical connection relationship is absent; if the height is less than the width, the horizontal connection relationship is absent.
Furthermore, the steps of reconstructing all cells in the table from the structural position relationships and connection relationships between the key points are as follows: according to the reconstruction rules, first find all upper-left corner points among the key points, then find the lower-right corner point corresponding to each upper-left corner point, thereby reconstructing all cells;
the reconstruction rules are:
Rule 1: each key point can be the upper-left corner point of at most one cell, and likewise the lower-right corner point of at most one cell;
Rule 2: when adjacent key points have no connection relationship: if the vertical connection relationship is absent, the upper key point is not an upper-left corner point and the lower key point is not a lower-right corner point; if the horizontal connection relationship is absent, the left key point is not an upper-left corner point and the right key point is not a lower-right corner point;
Rule 3: the column table line of the lower-right corner point corresponding to a cell's upper-left corner point lies on, or to the left of, the column table line of the upper-left corner point of the cell right-adjacent to that cell; if there is no right-adjacent cell, it lies on, or to the left of, the last column table line;
Rule 4: the column table line of the lower-right corner point corresponding to a cell's upper-left corner point lies to the right of the column table line of that upper-left corner point;
Rule 5: among the points satisfying Rules 3 and 4, the lower-right corner point corresponding to a cell's upper-left corner point is the one closest to that upper-left corner point.
Specifically, the steps of reconstructing all cells are as follows:
take the set of all key points as the candidate upper-left corner point set, and likewise take the set of all key points as the candidate lower-right corner point set;
according to Rule 2 and the connection relationships between all adjacent key points: when adjacent key points have no vertical connection relationship, remove the upper key point from the candidate upper-left corner set and the lower key point from the candidate lower-right corner set; when adjacent key points have no horizontal connection relationship, remove the left key point from the candidate upper-left corner set and the right key point from the candidate lower-right corner set; this yields the formal upper-left corner set and the formal lower-right corner set;
according to Rule 1, each point can be the upper-left corner point of at most one cell and the lower-right corner point of at most one cell, so the points in the upper-left corner set correspond one-to-one with the points in the lower-right corner set; all cells can therefore be reconstructed by finding the corresponding lower-right corner point for every point in the upper-left corner set;
according to Rule 3, take the upper-left corner points out of the set in turn, each called upper-left corner point A; find the upper-left corner points lying on the same row table line as A; among them, the one to the right of A and closest to A is the upper-left corner point of the cell right-adjacent to the cell of A, called upper-left corner point B; then perform a first screening of the lower-right corner set, removing the lower-right corner points to the right of the column table line of B; if A lies on the last column table line, no first screening is needed;
according to Rule 4, perform a second screening of the lower-right corner set, removing all lower-right corner points whose column table line is not to the right of the column table line of A;
according to Rule 5, among all lower-right corner points remaining after the two screenings, find the one closest to A; this is the lower-right corner point corresponding to A, and the paired upper-left and lower-right corner points form a cell;
finding the corresponding lower-right corner point for each upper-left corner point in the set in turn reconstructs all the cells.
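As a much-simplified sketch of Rules 1, 4, and 5, assuming each key point is already given as a (row index, column index) pair and that the candidate corner sets have already been pruned with Rule 2, the pairing of upper-left and lower-right corner points might look as follows (the function and variable names are hypothetical, and the patent's Rule 3 screening by the right-adjacent cell is approximated by a nearest-candidate search):

```python
def reconstruct_cells(top_lefts, bottom_rights):
    """Pair each upper-left corner with a lower-right corner (simplified).

    Points are (row_index, col_index) tuples; candidate sets are assumed
    to have been pruned with the connection relations (Rule 2) already.
    """
    cells = []
    used = set()  # Rule 1: each point closes at most one cell
    for tl in sorted(top_lefts):
        # Rule 4: the lower-right corner must lie strictly below and to the right.
        candidates = [br for br in bottom_rights
                      if br not in used and br[0] > tl[0] and br[1] > tl[1]]
        if candidates:
            # Rule 5: among the candidates, take the one closest to the
            # upper-left corner (Manhattan distance on grid indices).
            br = min(candidates, key=lambda p: (p[0] - tl[0]) + (p[1] - tl[1]))
            used.add(br)
            cells.append((tl, br))
    return cells
```

On a plain 3x3 grid of key points this recovers the four 1x1 cells; merged cells are handled because a pruned candidate set simply forces the pairing to skip to the next valid corner.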
Compared with the prior art, by adopting a deep-learning key point detection method the invention can robustly find all table-line intersections in a table image and obtain the accurate positions of all cells from these key points, thereby completing table structure recognition with high quality.
Drawings
FIG. 1 shows a schematic flow diagram of a method embodying the present invention;
fig. 2 shows a schematic structural diagram of the table of the present embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
For reference and clarity, the technical terms and abbreviations used hereinafter are explained as follows:
Gaussian heat map: a heat map is a matrix that displays data as color variation; a Gaussian heat map is a heat map in which the color variation follows a Gaussian distribution.
PDF (Portable Document Format): portable document format, a file format that presents documents in a manner that is independent of the application, operating system, hardware.
HRNet (High Resolution Net): high resolution network model.
DeeperCut: a key point detection network.
OpenCV (Open Source Computer Vision Library): a cross-platform computer vision library encapsulates many commonly used functions and classes of image processing.
Row table line: a horizontal table line in the table image.
Column table line: a vertical table line in the table image.
HTML (Hyper Text Markup Language): hypertext markup language.
LaTeX: a typesetting system.
Upper-left corner point: the top-left vertex among the four vertices of a cell in the table.
Lower-right corner point: the bottom-right vertex among the four vertices of a cell in the table.
Fig. 1 shows a schematic flow diagram of an embodiment of the invention. A method for identifying the table structure of a full-line table based on key point detection comprises the following steps:
Step 1, detect the key points in the table image with a key point detection network to obtain a Gaussian heat map containing the position information of all key points;
Step 2, scale the Gaussian heat map to the size of the input table image and obtain the coordinate positions of all key points with the contour central-moment algorithm;
Step 3, analyze the structural position relationships of the key points in the table with the scan-line method;
Step 4, detect whether a connection relationship exists between adjacent key points with the connected-component method;
Step 5, reconstruct all cells in the table from the structural position relationships and connection relationships between the key points, and convert the cells into the required markup-language description.
The following describes the whole line table structure identification process based on the keypoint detection in detail with reference to the examples.
Step 1 is executed: detect the key points in the table image with the key point detection network to obtain a Gaussian heat map containing the position information of all key points.
Specifically, the key point detection network is HRNet.
The key point detection network uses the self-constructed table key point data set for pre-training to obtain the capability of detecting the key points of the table.
Specifically, the table key point data set contains 500 table images, with the coordinates of all table-line key points in each table image given as labels of the key point coordinate positions.
The table images are sourced from work-order images and are all full-line tables.
Step 2 is then executed: scale the Gaussian heat map to the size of the input table image and obtain the coordinate positions of all key points with the contour central-moment algorithm.
Specifically, scaling is implemented with the resize() method in OpenCV, using bilinear interpolation.
Specifically, the contour central-moment algorithm comprises the following steps:
applying Gaussian blur to the Gaussian heat map to reduce the noise points appearing at the contour edges in the heat map;
performing a binarization operation on the Gaussian heat map to obtain a binary map;
performing contour detection on the binary map to obtain the contours of all key points;
specifically, contour detection is implemented with the findContours() method in OpenCV;
calculating the centroid of each key point's contour; the centroid is the coordinate position of the key point.
specifically, the method calculates key points by using a contour central moment algorithmIn the position ofThe calculation is as follows:
wherein, the first and the second end of the pipe are connected with each other,is calculated as follows:
wherein the content of the first and second substances,is a key pointThe contour of (1), the set of all points on the contour,Representing the abscissa of the contour point in the figure,representing the ordinate of the contour point in the figure,to representIsTo the power of the wave,to representIs/are as followsTo the power of the wave,represents the abscissa of all points in the contourAnd ordinateAfter the operation, the sum is obtained,is the zero-order spatial moment of the profile,andis the first spatial moment of the contour.
Step 3 is then executed: analyze the structural position relationships of the key points in the table with the scan-line method.
Specifically, the structural position of a key point in the table means which row table line and which column table line the key point belongs to.
Specifically, the scan-line method comprises the following steps:
scale the table image to a fixed height and width;
set an initial straight line as the row scan line and move it vertically through the table image from top to bottom, a fixed distance d at a time, counting after each move the number n of key points above the row scan line;
when the row scan line has moved a manually preset number N of consecutive distances d without the number n of key points increasing, place a row separator at the current position;
separate all key points by the row separators, thereby obtaining which row table line each key point belongs to;
set an initial straight line as the column scan line and move it horizontally through the table image from left to right, a fixed distance d at a time, counting after each move the number n of key points to the left of the column scan line;
when the column scan line has moved a manually preset number N of consecutive distances d without the number n of key points increasing, place a column separator at the current position;
separate all key points by the column separators, thereby obtaining which column table line each key point belongs to.
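The row scan-line procedure can be sketched in plain Python for an untilted table; the function name and the step and patience parameters (standing in for the fixed distance d and the preset count N) are illustrative assumptions:

```python
def assign_rows(points, step=2, patience=5):
    """Scan-line grouping of key points into row table lines.

    A horizontal scan line moves down by `step` at a time; whenever the
    count of points above it stops growing for `patience` consecutive
    moves, a row separator is emitted at the current position. Each point
    then gets the index of the row table line it belongs to.
    """
    ys = sorted(p[1] for p in points)
    separators = []
    prev_count, stall = 0, 0
    y = step
    while y <= ys[-1] + step * (patience + 1):
        count = sum(1 for v in ys if v < y)  # key points above the scan line
        if count == prev_count:
            stall += 1
            # Emit a separator exactly once per stalled region, and never
            # before the first row or after the last one.
            if stall == patience and 0 < count < len(ys):
                separators.append(y)
        else:
            stall = 0
        prev_count = count
        y += step
    # Row index of a point = number of separators above it.
    return [sum(1 for s in separators if p[1] > s) for p in points]
```

The column version is identical with x-coordinates and left-of-line counts.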
Preferably, for a tilted table, the slope of the initial row scan line needs to be corrected: first set the initial row scan line as a horizontal straight line and run the scan-line method once to obtain the points located on the first row table line, then compute an approximate slope from these points by the least-squares method and apply it to the initial row scan line as the corrected initial row scan line.
Likewise, for a tilted table, the slope of the initial column scan line needs to be corrected: first set the initial column scan line as a vertical straight line and run the scan-line method once to obtain the points located on the first column table line, then compute an approximate slope from the coordinates of these points by the least-squares method and apply it to the initial column scan line as the corrected initial column scan line.
The least-squares method calculates the approximate slope as follows:

$$k = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

where $k$ is the approximate slope of the corrected row or column scan line, $(x_i, y_i)$ are the coordinates of the points on the first row table line or first column table line, and $n$ is the number of such points;

the means $\bar{x}$ and $\bar{y}$ are calculated as follows:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
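The least-squares slope formula transcribes directly into code (the function name is hypothetical):

```python
def approx_slope(points):
    """Least-squares slope of the points on the first row (or column) table
    line, used to tilt the initial scan line for a skewed table."""
    n = len(points)
    x_mean = sum(x for x, _ in points) / n
    y_mean = sum(y for _, y in points) / n
    # Numerator and denominator of the least-squares slope formula.
    num = sum((x - x_mean) * (y - y_mean) for x, y in points)
    den = sum((x - x_mean) ** 2 for x, _ in points)
    return num / den
```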
and 4, executing a step of detecting whether the adjacent key points have a connection relation or not by using a connected domain method.
Specifically, adjacent key points are two key points that belong to the same row grid line and to adjacent column grid lines, or two key points that belong to the same column grid line and to adjacent row grid lines.
Specifically, the connected domain method comprises the following steps:
traversing all pairs of adjacent key points; taking each key point's position as a center, a circle of radius r is first drawn; the minimum bounding rectangle containing the two circles centered on the two adjacent key points is then taken as the cropping area, and the local region image in which the adjacent key points lie is cropped out of the table image;
after the local region image is converted to grayscale, it is smoothed with a Gaussian low-pass filter, gaps in the connected domain are filled with a morphological closing operation, the result is converted to a binary image with an adaptive threshold algorithm, and the second-largest connected domain is then found (the largest connected domain is the background, not the foreground).
Let $L_1$ be the maximum of the height and width of the local region image and $L_2$ the maximum of the height and width of the second-largest connected domain; the ratio $p$ is calculated as

$$p=\frac{L_2}{L_1}$$

Finally, a threshold $t$ is set; when $p$ is greater than $t$, a connection relationship exists between the adjacent key points, otherwise it does not;
furthermore, an existing connection relationship is classified as longitudinal or transverse: if the height of the local region image is greater than its width, the connection is longitudinal; if the height is less than the width, the connection is transverse;
likewise, an absent connection relationship is classified as an absent longitudinal or absent transverse connection: if the height of the local region image is greater than its width, the longitudinal connection is absent; if the height is less than the width, the transverse connection is absent.
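The connected-domain check can be sketched as follows. The preprocessing described above (Gaussian smoothing, closing, adaptive thresholding, e.g. with OpenCV) is assumed to have already produced the binary crop; the flood-fill labeling, the 0.8 threshold, and the function name are illustrative assumptions, not from the patent:

```python
import numpy as np

def has_connection(crop_bin, thresh=0.8):
    """Decide whether two adjacent keypoints are joined by a table
    line, given a binarized crop of the region between them
    (1 = foreground ink, 0 = background).
    """
    h, w = crop_bin.shape
    labels = np.zeros((h, w), dtype=int)
    cur = 0
    # simple flood fill labeling 4-connected foreground components
    for i in range(h):
        for j in range(w):
            if crop_bin[i, j] and not labels[i, j]:
                cur += 1
                stack = [(i, j)]
                labels[i, j] = cur
                while stack:
                    a, b = stack.pop()
                    for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        na, nb = a + da, b + db
                        if 0 <= na < h and 0 <= nb < w \
                           and crop_bin[na, nb] and not labels[na, nb]:
                            labels[na, nb] = cur
                            stack.append((na, nb))
    if cur == 0:
        return False, None
    # the largest foreground component here plays the role of the
    # patent's "second-largest connected domain" (its largest is the
    # background, which our labeling skips)
    best = max(range(1, cur + 1), key=lambda c: (labels == c).sum())
    ys, xs = np.nonzero(labels == best)
    comp_len = max(ys.max() - ys.min() + 1, xs.max() - xs.min() + 1)
    ratio = comp_len / max(h, w)
    direction = 'vertical' if h > w else 'horizontal'
    return ratio > thresh, direction
```

A line segment spanning most of the crop yields a ratio near 1 and is judged connected; a short ink blob does not.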
And step 5 is executed: reconstructing all cells in the table according to the structural position relationships and the connection relationships among the key points, and converting the result into the required markup language description.
Specifically, the steps of reconstructing all cells in the table according to the structural position relationship and the connection relationship between the key points are as follows:
according to five rules, all the upper left corner points in the key points are found first, and then the lower right corner point corresponding to each upper left corner point is found, so that all the cells are reconstructed.
Specifically, the five rules are as follows:
Rule one: each key point can be the upper left corner of at most one cell, and likewise the lower right corner of at most one cell.
Rule two: when adjacent key points have no connection relationship: if the missing connection is longitudinal, the upper key point is not an upper left corner and the lower key point is not a lower right corner; if the missing connection is transverse, the left key point is not an upper left corner and the right key point is not a lower right corner.
Rule three: the column grid line of the lower right corner corresponding to a cell's upper left corner lies in the same column as, or to the left of, the column grid line of the upper left corner of the cell adjacent on the right; if no right-adjacent cell exists, it lies in the same column as, or to the left of, the last column grid line.
Rule four: the column grid line of the lower right corner corresponding to a cell's upper left corner lies to the right of the column grid line of that upper left corner.
Rule five: among the points satisfying rules three and four, the lower right corner corresponding to a cell's upper left corner is the one closest to that upper left corner.
According to rule one, as shown in fig. 2, each of the key points 1 to 22 can be the upper left corner of at most one cell and the lower right corner of at most one cell; no point can be the shared upper left corner or shared lower right corner of multiple cells;
according to rule two, as shown in fig. 2, key points 7 and 13 have no longitudinal connection relationship, so the upper key point 7 is not the upper left corner of a cell and the lower key point 13 is not the lower right corner of a cell; key points 8 and 9 have no transverse connection relationship, so the left key point 8 is not an upper left corner and the right key point 9 is not a lower right corner;
according to rule three, as shown in fig. 2, the column grid line of the lower right corner (key point 12) corresponding to the upper left corner (key point 1) of a cell (the cell enclosed by key points 1, 2, 12, 11) lies in the same column as, or to the left of, the column grid line of the upper left corner (key point 2) of the right-adjacent cell (the cell enclosed by key points 2, 3, 8, 6). If a cell (the cell enclosed by key points 4, 5, 10, 9) has no right-adjacent cell, the column of the lower right corner (key point 10) corresponding to its upper left corner (key point 4) lies in the same column as, or to the left of, the last column grid line;
according to rule four, as shown in fig. 2, the column grid line of the lower right corner (key point 12) corresponding to the upper left corner (key point 1) of a cell (the cell enclosed by key points 1, 2, 12, 11) lies to the right of the column grid line of that upper left corner;
according to rule five, as shown in fig. 2, the lower right corner corresponding to the upper left corner (key point 1) of a cell (the cell enclosed by key points 1, 2, 12, 11) is key point 12, the nearest of the points satisfying rules three and four (key points 6, 12, 18).
Specifically, the steps of reconstructing all cells are as follows:
taking the set formed by all key points as the candidate upper left corner set {1, 2, …, 22}, and simultaneously taking the set formed by all key points as the candidate lower right corner set {1, 2, …, 22};
according to rule two and the connection relationships between all adjacent key points: when adjacent key points have no longitudinal connection relationship, the upper key point is removed from the candidate upper left corner set and the lower key point from the candidate lower right corner set; when adjacent key points have no transverse connection relationship, the left key point is removed from the candidate upper left corner set and the right key point from the candidate lower right corner set; this finally yields the set of all upper left corners {1,2,3,4,6,9,11,12,13,14,15} and the set of all lower right corners {8,10,12,14,15,16,18,19,20,21,22};
according to rule one, each point can be the upper left corner of at most one cell and the lower right corner of at most one cell, so the upper left corners in the upper left corner set and the lower right corners in the lower right corner set correspond one to one; all cells can therefore be reconstructed once a corresponding lower right corner is found for every upper left corner in the set;
according to rule three, the upper left corners are taken from the set in turn; take upper left corner 3. The upper left corners {1, 2, 4} lie on the same row table line as point 3; of those to the right of point 3, the closest, point 4, is the upper left corner of the cell right-adjacent to the cell of point 3. A first screening of the set of all lower right corners then removes every lower right corner to the right of the column grid line of point 4, leaving {8, 12, 14, 15, 18, 19, 20, 21}. This first screening is needed here because upper left corner 3 is not on the last column grid line of the table;
according to rule four, a second screening of the remaining lower right corners {8, 12, 14, 15, 18, 19, 20, 21} removes all those whose column grid line is not to the right of the column grid line of upper left corner 3, leaving {15, 21};
according to rule five, of the lower right corners {15, 21} left after the two screenings, the one closest to upper left corner 3 is found, namely lower right corner 15, and upper left corner 3 and lower right corner 15 form a cell;
and sequentially finding the lower right corner points corresponding to the upper left corner points in the set of all upper left corner points, so as to reconstruct all the cells.
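The reconstruction walk above can be sketched as follows. This is a simplified illustration, not the patent's exact procedure: the point names, the dict of (row, column) grid-line indices (from step 3), and the set of connected adjacent pairs (from step 4) are assumed inputs; rule one's one-to-one constraint is not enforced, and rule five's "closest" is approximated by the smallest grid-index distance:

```python
def reconstruct_cells(points, connected):
    """points   : dict name -> (row, col) grid-line indices
    connected: set of frozenset pairs of adjacent point names that
               are joined by a table line.
    Returns cells as (top_left, bottom_right) name pairs."""
    by_pos = {v: k for k, v in points.items()}
    cols = max(c for _, c in points.values()) + 1

    def joined(a, b):
        return frozenset((a, b)) in connected

    # Rule two: prune candidates using missing connections
    tl = set(points)   # candidate upper-left corners
    br = set(points)   # candidate lower-right corners
    for p, (r, c) in points.items():
        below = by_pos.get((r + 1, c))
        right = by_pos.get((r, c + 1))
        if below and not joined(p, below):
            tl.discard(p); br.discard(below)
        if right and not joined(p, right):
            tl.discard(p); br.discard(right)

    cells = []
    for p in sorted(tl, key=lambda q: points[q]):
        r, c = points[p]
        # Rule three: the column bound is the upper-left corner of the
        # right-adjacent cell (or the last grid line if none exists)
        right_tls = [points[q][1] for q in tl
                     if points[q][0] == r and points[q][1] > c]
        col_bound = min(right_tls) if right_tls else cols - 1
        # Rules three + four: candidate lower-rights lie strictly right
        # of column c, at most at col_bound, and strictly below row r
        cands = [q for q in br
                 if c < points[q][1] <= col_bound and points[q][0] > r]
        if cands:
            # Rule five (approximated): take the nearest candidate
            q = min(cands, key=lambda t: (points[t][0], points[t][1]))
            cells.append((p, q))
    return cells
```

On a fully connected 2-by-3 grid of intersections this yields the two 1x1 cells; deleting the middle vertical segment merges them into one spanning cell.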
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
All possible combinations of the technical features in the above embodiments may not be described for the sake of brevity, but should be considered as being within the scope of the present disclosure as long as there is no contradiction between the combinations of the technical features.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.
Claims (10)
1. A method for identifying a table structure of a full-line table based on key point detection is characterized by comprising the following steps:
step 1, detecting key points in a table image by adopting a key point detection network to obtain a Gaussian heat map containing information of all key point positions;
step 2, scaling the Gaussian heat map to be consistent with the size of the table image, and obtaining coordinate positions of all key points through a contour center distance algorithm;
step 3, analyzing the structural position relation of the key points in the table by using a scanning line method;
step 4, detecting whether the adjacent key points have a connection relation by using a connected domain method;
step 5, reconstructing all cells in the table according to the structural position relationships and the connection relationships among the key points, and converting the result into the required markup language description.
2. The method for identifying a full-line table structure based on keypoint detection as claimed in claim 1, wherein said contour center distance algorithm comprises the following steps:
applying a Gaussian blur algorithm to the Gaussian heat map to reduce the noise points appearing on the contour edges in the heat map;
carrying out binarization operation on the Gaussian heatmap to obtain a binarization map;
carrying out contour detection on the binary image to obtain the contours of all key points;
and calculating the center distance of the outline of each key point, wherein the center distance is the coordinate position of the key point.
3. The method for identifying the table structure of the whole line table based on the key point detection as claimed in claim 2, wherein the calculation formula for the contour center distance is as follows:

$$m_{pq}=\sum_{(x,y)\in C_i} x^{p}\,y^{q}$$

$$\bar{x}_i=\frac{m_{10}}{m_{00}},\qquad \bar{y}_i=\frac{m_{01}}{m_{00}}$$

wherein $C_i$ is the contour of key point $i$, i.e. the set of all points on the contour; $x$ represents the abscissa of a contour point in the figure and $y$ its ordinate; $x^p$ denotes the $p$-th power of $x$ and $y^q$ the $q$-th power of $y$, and the products are summed over all points in the contour; $m_{00}$ is the zero-order spatial moment of the contour, and $m_{10}$ and $m_{01}$ are the first-order spatial moments of the contour; the center distance $(\bar{x}_i, \bar{y}_i)$ is the coordinate position of the key point.
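The centroid-from-moments computation can be sketched in a few lines. This point-sum version follows the formula as stated in the claim; in practice the contour would come from `cv2.findContours` and `cv2.moments` (which evaluates the moments of the enclosed polygon via Green's theorem rather than summing contour points). The function name is illustrative:

```python
import numpy as np

def contour_centroid(contour):
    """Center of a keypoint contour via spatial moments.

    contour: sequence of (x, y) points on the contour.
    m00 is the zero-order moment (x^0 * y^0 summed = point count),
    m10 and m01 the first-order moments; the centroid
    (m10/m00, m01/m00) is taken as the keypoint position.
    """
    pts = np.asarray(contour, dtype=float)
    m00 = len(pts)           # sum of x^0 * y^0 over the contour
    m10 = pts[:, 0].sum()    # sum of x^1 * y^0
    m01 = pts[:, 1].sum()    # sum of x^0 * y^1
    return m10 / m00, m01 / m00
```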
4. A method for identifying a full-line table structure based on keypoint detection as claimed in claim 3, characterized in that said scan-line method comprises the following steps:
scaling the form image to the same height and width;
setting an initial straight line as a row scan line and moving it vertically in the table image from top to bottom, a fixed distance each time;
after each move, counting the number of key points above the row scan line; when this number does not increase over a preset number of consecutive moves, placing a row separation line at the current position;
separating all key points according to the row separation lines, thereby determining which row table line each key point belongs to;
setting an initial straight line as a column scan line and moving it horizontally in the table image from left to right, a fixed distance each time;
after each move, counting the number of key points to the left of the column scan line;
when this number does not increase over a preset number of consecutive moves, placing a column separation line at the current position;
separating all key points according to the column separation lines, thereby determining which column grid line each key point belongs to.
5. The method as claimed in claim 4, wherein if the table in the table image is tilted, the corresponding row scan line or column scan line needs to be corrected;
the correction process for the row scan line is as follows: first, the initial row scan line is set as a horizontal straight line and the scan-line method is run once to obtain the points located on the first row table line; an approximate slope is then calculated from the coordinates of these points by the least squares method and applied to the initial row scan line as the corrected row scan line;
the correction process for the column scan line is as follows: first, the initial column scan line is set as a vertical straight line and the scan-line method is run once to obtain the points located on the first column grid line; an approximate slope is then calculated from the coordinates of these points by the least squares method and applied to the initial column scan line as the corrected column scan line;
the least squares method solves for the approximate slope with the following formula:

$$k=\frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^{2}}$$

wherein $k$ represents the approximate slope of the corrected row or column scan line, $(x_i, y_i)$ are the coordinates of the points on the first row table line or first column grid line, and $n$ is the number of such points;
wherein $\bar{x}$ and $\bar{y}$ are calculated with the following formulas:

$$\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i,\qquad \bar{y}=\frac{1}{n}\sum_{i=1}^{n}y_i$$
6. the method for identifying the table structure of the whole line table based on the key point detection as claimed in claim 1 or 4, wherein the connected domain method comprises the following steps:
traversing all pairs of adjacent key points; taking each key point's position as a center, a circle of radius r is first drawn; the minimum bounding rectangle containing the two circles centered on the two adjacent key points is then taken as the cropping area, and the local region image in which the adjacent key points lie is cropped out of the table image;
after the local region image is converted to grayscale, it is smoothed with a Gaussian low-pass filter, gaps in the connected domain are filled with a morphological closing operation, the result is converted to a binary image with an adaptive threshold algorithm, and the second-largest connected domain is then found;
let $L_1$ be the maximum of the height and width of the local region image and $L_2$ the maximum of the height and width of the second-largest connected domain; the ratio $p$ is calculated as

$$p=\frac{L_2}{L_1}$$
7. The method for identifying a full-line table structure based on keypoint detection as defined in claim 6, wherein the existence of connection relationships includes the existence of a vertical connection relationship and the existence of a horizontal connection relationship, and if the height of a local region graph is greater than the width, the existence of a vertical connection relationship is determined; if the height of the local area graph is smaller than the width, the horizontal connection relationship exists;
the non-existence of the connection relation comprises non-existence of a longitudinal connection relation and non-existence of a transverse connection relation, and if the height of the local area graph is greater than the width, the non-existence of the longitudinal connection relation is determined; if the height of the local area map is smaller than the width, the transverse connection relation does not exist.
8. The method for identifying the table structure of the whole line table based on the key point detection as claimed in claim 7, wherein the step of reconstructing all the cells in the table according to the structure position relationship and the connection relationship between the key points comprises the following steps: according to the reconstruction rule, all upper left corner points in the key points are found, and then the lower right corner point corresponding to each upper left corner point is found, so that all cells are reconstructed;
the reconfiguration gauge includes: rule I, each key point can only be the upper left corner of one cell at most, and can only be the lower right corner of one cell at most; according to a second rule, when adjacent key points do not have a connection relation, if the adjacent key points do not have a longitudinal connection relation, the key point positioned above is not the upper left corner point and the key point positioned below is not the lower right corner point; if the transverse connection relation does not exist, the key point positioned on the left side is not the upper left corner point and the key point positioned on the right side is not the lower right corner point; rule three, the list grid line where the lower right corner point corresponding to the upper left corner point of one cell is located in the same column or left of the list grid line where the upper left corner point of the cell adjacent to the right of the cell is located, if there is no cell adjacent to the right, the column where the lower right corner point corresponding to the upper left corner point of the cell is located in the same column or left of the last list grid line; a list grid line where a lower right corner point corresponding to the upper left corner point of one cell is located on the right of the list grid line where the upper left corner point of the cell is located; and a fifth rule, wherein the lower right corner point corresponding to the upper left corner point of one cell is closest to the upper left corner point among the points according with the third and fourth rules.
9. The method for identifying the table structure of the whole line table based on the key point detection as claimed in claim 8, wherein the step of reconstructing all the cells is as follows:
taking a set formed by all key points as a candidate upper left corner set, and simultaneously taking a set formed by all key points as a candidate lower right corner set;
according to rule two and the connection relationships between all adjacent key points: when adjacent key points have no longitudinal connection relationship, the upper key point is removed from the candidate upper left corner set and the lower key point from the candidate lower right corner set; when adjacent key points have no transverse connection relationship, the left key point is removed from the candidate upper left corner set and the right key point from the candidate lower right corner set, yielding the final set of all upper left corners and the final set of all lower right corners;
according to the first rule, each point can only be the upper left corner point of one cell at most and can also be the lower right corner point of one cell at most, so that the upper left corner point in the upper left corner point set and the lower right corner point in the lower right corner point set are in one-to-one correspondence, and then all cells can be reconstructed as long as corresponding lower right corner points are found for the upper left corner points in all the upper left corner point sets;
according to rule three, one upper left corner, called upper left corner A, is taken from the set of all upper left corners in turn; among the upper left corners lying on the same row table line as A, the one to the right of A and closest to A is the upper left corner of the cell right-adjacent to the cell of A, called upper left corner B; a first screening of the set of all lower right corners then removes every lower right corner to the right of the column grid line of B; if upper left corner A lies on the last column grid line of the table, the first screening is not needed;
according to rule four, a second screening of the set of all lower right corners removes every lower right corner whose column grid line is not to the right of the column grid line of upper left corner A;
according to rule five, the lower right corner closest to upper left corner A is found among all lower right corners left after the two screenings; it is the lower right corner corresponding to A, and the paired upper left corner and lower right corner form a cell;
and sequentially finding the lower right corner points corresponding to the upper left corner points in the set of all upper left corner points, so as to reconstruct all the cells.
10. The method as claimed in claim 1, wherein the key point detection network is HRNet or DeeperCut, pre-trained on a constructed table key point data set so as to obtain the capability of detecting the key points of a table.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211637591.4A CN115620322B (en) | 2022-12-20 | 2022-12-20 | Method for identifying table structure of whole-line table based on key point detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115620322A true CN115620322A (en) | 2023-01-17 |
CN115620322B CN115620322B (en) | 2023-04-07 |
Family
ID=84879584
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211637591.4A Active CN115620322B (en) | 2022-12-20 | 2022-12-20 | Method for identifying table structure of whole-line table based on key point detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115620322B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105786957A (en) * | 2016-01-08 | 2016-07-20 | 云南大学 | Table sorting method based on cell adjacency relation and depth-first traversal |
US20170091943A1 (en) * | 2015-09-25 | 2017-03-30 | Qualcomm Incorporated | Optimized object detection |
US20180357776A1 (en) * | 2017-06-08 | 2018-12-13 | Microsoft Technology Licensing, Llc | Vector graphics handling processes for user applications |
CN112733855A (en) * | 2020-12-30 | 2021-04-30 | 科大讯飞股份有限公司 | Table structuring method, table recovery equipment and device with storage function |
CN113268982A (en) * | 2021-06-03 | 2021-08-17 | 湖南四方天箭信息科技有限公司 | Network table structure identification method and device, computer device and computer readable storage medium |
US20210256680A1 (en) * | 2020-02-14 | 2021-08-19 | Huawei Technologies Co., Ltd. | Target Detection Method, Training Method, Electronic Device, and Computer-Readable Medium |
CN113705395A (en) * | 2021-08-16 | 2021-11-26 | 南京英诺森软件科技有限公司 | Method for converting paper form into word document based on deep learning model |
CN113723328A (en) * | 2021-09-06 | 2021-11-30 | 华南理工大学 | Method for analyzing and understanding chart document panel |
CN113723330A (en) * | 2021-09-06 | 2021-11-30 | 华南理工大学 | Method and system for understanding chart document information |
CN114359939A (en) * | 2021-12-16 | 2022-04-15 | 华南理工大学 | Table structure identification method, system and equipment based on cell detection |
Non-Patent Citations (1)
Title |
---|
Gao Liangcai et al.: "Research Progress on Table Recognition Technology" *
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116259064A (en) * | 2023-03-09 | 2023-06-13 | 北京百度网讯科技有限公司 | Table structure identification method, training method and training device for table structure identification model |
CN116259064B (en) * | 2023-03-09 | 2024-05-17 | 北京百度网讯科技有限公司 | Table structure identification method, training method and training device for table structure identification model |
CN117576699A (en) * | 2023-11-06 | 2024-02-20 | 华南理工大学 | Locomotive work order information intelligent recognition method and system based on deep learning |
Also Published As
Publication number | Publication date |
---|---|
CN115620322B (en) | 2023-04-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gao et al. | ICDAR 2019 competition on table detection and recognition (cTDaR) | |
CN115620322B (en) | Method for identifying table structure of whole-line table based on key point detection | |
Shi et al. | Text extraction from gray scale historical document images using adaptive local connectivity map | |
Shi et al. | A steerable directional local profile technique for extraction of handwritten arabic text lines | |
Lee et al. | Parameter-free geometric document layout analysis | |
JP4065460B2 (en) | Image processing method and apparatus | |
CN106096592B (en) | A kind of printed page analysis method of digital book | |
Tran et al. | Page segmentation using minimum homogeneity algorithm and adaptive mathematical morphology | |
CN105574524B (en) | Based on dialogue and divide the mirror cartoon image template recognition method and system that joint identifies | |
CN114529925B (en) | Method for identifying table structure of whole line table | |
CN113158808A (en) | Method, medium and equipment for Chinese ancient book character recognition, paragraph grouping and layout reconstruction | |
CN113435240A (en) | End-to-end table detection and structure identification method and system | |
CN109389050B (en) | Method for identifying connection relation of flow chart | |
Roy et al. | Text line extraction in graphical documents using background and foreground information | |
Al Abodi et al. | An effective approach to offline Arabic handwriting recognition | |
CN115661848A (en) | Form extraction and identification method and system based on deep learning | |
CN116824608A (en) | Answer sheet layout analysis method based on target detection technology | |
JPH07220090A (en) | Object recognition method | |
Stewart et al. | Document image page segmentation and character recognition as semantic segmentation | |
Maddouri et al. | Text lines and PAWs segmentation of handwritten Arabic document by two hybrid methods | |
Park et al. | A method for automatically translating print books into electronic Braille books | |
Ablameyko et al. | Recognition of engineering drawing entities: review of approaches | |
Li et al. | Detection of overlapped quadrangles in plane geometric figures | |
JP3720892B2 (en) | Image processing method and image processing apparatus | |
CN116259062A (en) | CNN handwriting identification method based on multichannel and attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||