CN115620322A - Method for identifying table structure of whole-line table based on key point detection - Google Patents


Info

Publication number: CN115620322A
Application number: CN202211637591.4A
Authority: CN (China)
Prior art keywords: line, upper left, points, left corner, point
Legal status: Granted; Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN115620322B (en)
Inventors: 黄双萍, 刘宗昊, 黄森, 彭文杰
Current assignee: South China University of Technology SCUT
Original assignee: South China University of Technology SCUT
Application filed by South China University of Technology SCUT; priority to CN202211637591.4A; application granted and published as CN115620322B

Classifications

    • G06V30/412: Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G06F40/151: Handling natural language data; text processing; use of codes for handling textual entities; transformation
    • G06T7/66: Image analysis; analysis of geometric attributes of image moments or centre of gravity
    • G06T7/70: Image analysis; determining position or orientation of objects or cameras
    • G06V10/28: Image preprocessing; quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06V10/32: Image preprocessing; normalisation of the pattern dimensions
    • G06V10/34: Image preprocessing; smoothing or thinning of the pattern; morphological operations; skeletonisation
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections
    • G06V10/764: Image or video recognition using pattern recognition or machine learning, using classification
    • G06T2207/10004: Indexing scheme for image analysis; image acquisition modality; still image; photographic image


Abstract

The invention discloses a method for identifying the table structure of a full-line table based on key point detection, which comprises the following steps: detecting the key points in the table image with a key point detection network to obtain a Gaussian heat map containing the position information of all key points; scaling the Gaussian heat map to the size of the input table image and obtaining the coordinate positions of all key points through a contour center distance algorithm; analyzing the structural position relations of the key points in the table with a scanning line method; detecting whether adjacent key points are connected with a connected domain method; and reconstructing all cells in the table from the structural position relations and connection relations of the key points, and converting them into the required markup language description. By adopting a key point detection method based on deep learning, the method can robustly find all table line intersection points in the table image and obtain the accurate positions of all cells from the key points, thereby completing table structure identification with high quality.

Description

Method for identifying table structure of whole-line table based on key point detection
Technical Field
The invention belongs to the technical field of image processing and artificial intelligence, and particularly relates to a method for identifying a table structure of a full-line table based on key point detection.
Background
A table is a simple and compact way of organizing content. It visualizes well, has wide application value, and is widely used in data visualization, computer software, bill records and other areas. With the arrival of the information age, ever more data is produced in daily life, business exchange and experimental research, the number of tables grows exponentially, and table data is very valuable for systems such as big data mining and knowledge graphs. However, structured data such as tables is usually not easy for an algorithm to identify, and merged cells in tables further increase the difficulty of identifying the table structure. To extract the data in tables efficiently, identifying the table structure is one of the key tasks, so developing an algorithm that can effectively identify table structures and assist in extracting table data is significant.
Traditional table structure recognition algorithms usually detect the table lines with manually designed operators, obtain the row and column information of the table lines, and then determine the cells in the table from the intersection points of the table lines to complete table structure recognition. However, these methods have certain drawbacks: on the one hand, line detection depends heavily on the performance of the manually designed operators, and the detected table lines are prone to problems such as broken short lines; on the other hand, these methods are often only applicable to regular, clean tables exported from PDF files, and their performance drops sharply when faced with the irregular and highly varied tables obtained by scanning, photographing and the like.
Therefore, with the rise of deep learning, table structure recognition algorithms based on deep learning have emerged. At present, they can be divided into three categories:
The first category is methods based on instance segmentation. The idea is to segment all cells in the table with a deep learning instance segmentation algorithm and then complete table structure recognition with manually preset post-processing. However, such algorithms can only locate the cells and cannot identify their structural positions in the table, so their performance depends on the manually designed post-processing. Moreover, a segmentation algorithm only provides the area occupied by a cell, and the approximate shape and position of the cell must be recovered with algorithms such as the minimum bounding rectangle, which is not accurate enough.
The second category is image-to-sequence methods. Their core is to encode the table structure as a sequence such as LaTeX or HTML, so that table structure recognition can be completed directly by an image-to-sequence model. However, these methods do not explicitly use the position and structure information of the cells, which limits the performance of the model.
The third category is methods based on graph neural networks. The key is to model the cell relations in the table with a graph neural network, taking cells as nodes and the structural relations between cells as edges, thereby converting table structure recognition into an edge classification problem. However, these methods must first obtain cell positions through a cell detection network and cannot be trained end to end with it, so their performance is bounded by the performance of the cell detection network.
In summary, traditional methods lack robustness and cannot cope with today's more complicated tables, while deep learning methods are limited by how accurately cell positions can be acquired. A robust method that can accurately acquire cell positions is therefore needed for table structure recognition.
Disclosure of Invention
In view of the above, there is a need for a method of identifying the table structure of a full-line table based on key point detection. The method takes the intersection points of table lines as key points: it first detects all key points in the table image with a key point detection model, then determines the structural position relation (the row and column each key point lies on) and the connection relation of these key points in the table, and finally reconstructs the cells from the relations between the key points, thereby completing table structure identification.
The invention discloses a method for identifying a table structure of a full-line table based on key point detection, which comprises the following steps of:
step 1, detecting key points in a form image by adopting a key point detection network to obtain a Gaussian heat map containing all key point position information;
step 2, scaling the Gaussian heat map to be consistent with the size of the table image, and obtaining coordinate positions of all key points through a contour center distance algorithm;
step 3, analyzing the structural position relation of the key points in the table by using a scanning line method;
step 4, detecting whether the adjacent key points have a connection relation by using a connected domain method;
and 5, reconstructing all cells in the table according to the structural position relationship and the connection relationship among the key points, and converting the cells into the required markup language description.
Optionally, the key point detection network is HRNet.
Optionally, the key point detection network is DeeperCut.
The key point detection network uses the self-constructed table key point data set for pre-training to obtain the capability of detecting the key points of the table.
Specifically, the table key point data set contains 500 table images, and the coordinates of all table line intersection points in each table image are given as labels of the key point coordinate positions. The table images come from work order images and are all full-line tables.
Specifically, the contour center distance algorithm comprises the following steps:
adopting a Gaussian fuzzy algorithm to the Gaussian heat map to reduce noise points appearing on the outline edge in the Gaussian heat map;
carrying out binarization operation on the Gaussian heatmap to obtain a binarization map;
carrying out contour detection on the binary image to obtain the contours of all key points;
and calculating the center distance of the outline of each key point, wherein the center distance is the coordinate position of the key point.
Optionally, the contour detection algorithm is implemented using the findContours() function in OpenCV.
Specifically, the center distance of the contour is calculated as follows: for a key point p whose coordinate position is (x̄, ȳ),

x̄ = M10 / M00,  ȳ = M01 / M00,

where the spatial moments Mij are calculated as

Mij = Σ_{(x, y) ∈ C(p)} x^i · y^j,

where C(p) is the contour of key point p, i.e. the set of all points on the contour; x denotes the abscissa of a contour point in the figure and y its ordinate; x^i denotes the i-th power of x and y^j denotes the j-th power of y; the sum is taken over all points of the contour after raising the abscissa x and ordinate y to the given powers. M00 is the zero-order spatial moment of the contour, and M10 and M01 are the first spatial moments of the contour.
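The center-of-contour computation above can be sketched in a few lines of plain Python; OpenCV users would typically read the same quantities from cv2.moments() (keys m00, m10, m01). The function name and the toy contour below are illustrative, not from the patent.

```python
def contour_centroid(contour):
    """Key point position from the spatial moments of its contour:
    M_ij is the sum of x**i * y**j over the contour points, and the
    centroid is (M10 / M00, M01 / M00)."""
    m00 = len(contour)                 # zero-order moment: x**0 * y**0 = 1 per point
    m10 = sum(x for x, _ in contour)   # first spatial moment in x
    m01 = sum(y for _, y in contour)   # first spatial moment in y
    return (m10 / m00, m01 / m00)

# A small square contour centered on (5.0, 7.0):
print(contour_centroid([(4, 6), (6, 6), (6, 8), (4, 8)]))  # (5.0, 7.0)
```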
Specifically, the structural position of a key point in the table means which row table line and which column table line the key point belongs to.
Furthermore, the scanning line method comprises the following steps:
scaling the table image to the same height and width;
setting an initial straight line as the row scanning line and moving it vertically from top to bottom in the table image, a fixed distance d at a time; after each move of distance d, counting the number n of key points above the row scanning line; when n has not increased after T consecutive moves of distance d, placing a row separation line at the current position;
separating all key points by the row separation lines, thereby obtaining which row table line each key point belongs to;
setting an initial straight line as the column scanning line and moving it horizontally from left to right in the table image, a fixed distance d at a time; after each move of distance d, counting the number n of key points to the left of the column scanning line; when n has not increased after T consecutive moves of distance d, placing a column separation line at the current position;
separating all key points by the column separation lines, thereby obtaining which column table line each key point belongs to.
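A minimal pure-Python sketch of the row scanning line (the column case is symmetric): `step` stands for the fixed per-move distance, `patience` for the number of consecutive unproductive moves after which a separation line is placed. All names and the sample coordinates are illustrative, not from the patent.

```python
def row_separators(points, step=1, patience=5, height=100):
    """Sweep a horizontal scan line downward; whenever the number of key
    points above the line stops growing for `patience` consecutive moves
    of `step`, place a row separation line at the current position."""
    seps, prev, stall, armed = [], 0, 0, False
    for y in range(0, height + 1, step):
        count = sum(1 for _, py in points if py < y)   # key points above the line
        if count > prev:
            stall, armed = 0, True     # passed a new row: arm separator placement
        elif armed:
            stall += 1
            if stall >= patience:
                seps.append(y)         # separation line below the row just passed
                armed = False          # wait for the next row before placing again
        prev = count
    return seps

def row_of(point, seps):
    """Row table line index of a key point = number of separators above it."""
    return sum(1 for s in seps if point[1] >= s)

pts = [(0, 10), (5, 10), (0, 50), (5, 50)]   # key points on two row table lines
seps = row_separators(pts, step=1, patience=5, height=60)
print(seps)                                  # [16, 56]
print([row_of(p, seps) for p in pts])        # [0, 0, 1, 1]
```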
Preferably, if the table in the table image is tilted, the corresponding row scanning line or column scanning line needs to be corrected.
The row scanning line is corrected as follows: first set the initial row scanning line as a horizontal straight line and run the scanning line method once to obtain the points on the first row table line; then compute an approximate slope from the coordinates of these points by the least squares method, and apply this slope to the initial row scanning line as the corrected row scanning line.
The column scanning line is corrected as follows: first set the initial column scanning line as a vertical straight line and run the scanning line method once to obtain the points on the first column table line; then compute an approximate slope from the coordinates of these points by the least squares method, and apply this slope to the initial column scanning line as the corrected column scanning line.
The approximate slope is solved by the least squares method with the following formula:

k = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)²,

where k is the approximate slope of the corrected row or column scanning line, (x_i, y_i) are the coordinates of the points on the first row table line or column table line, and n is the number of such points; the means are calculated as

x̄ = (1/n) Σ_{i=1}^{n} x_i,  ȳ = (1/n) Σ_{i=1}^{n} y_i.
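The least-squares slope used for the tilt correction is a standard closed form and can be sketched directly; the function name and sample points are illustrative.

```python
def approx_slope(points):
    """Least-squares slope k of the best-fit line through (x_i, y_i):
    k = sum((x_i - xbar) * (y_i - ybar)) / sum((x_i - xbar) ** 2)."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    num = sum((x - xbar) * (y - ybar) for x, y in points)
    den = sum((x - xbar) ** 2 for x, _ in points)
    return num / den

# Key points of a slightly tilted first row table line:
print(approx_slope([(0, 100), (200, 104), (400, 108)]))  # 0.02
```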
specifically, the adjacent keypoints refer to two keypoints belonging to the same row grid line and belonging to adjacent list grid lines, or two keypoints belonging to the same list grid line and belonging to adjacent row grid lines.
Specifically, the connected domain method comprises the following steps:
traversing all pairs of adjacent key points; taking each key point position as a center, first drawing a circle of radius r, then taking the minimum bounding rectangle containing the two circles centered on the adjacent key points as the cropping area, and cropping the local area image containing the adjacent key points from the table image;
converting the local area image to grayscale, smoothing it with a Gaussian low-pass filter, filling gaps in the connected domains with a closing operation, converting it into a binary image with an adaptive threshold algorithm, and then finding the second largest connected domain; the largest connected domain is the background, not the foreground;
letting L be the maximum of the height and width of the local area image and l the maximum of the height and width of the second largest connected domain, calculating the ratio

q = l / L;

if q is greater than a preset threshold, a connection relation exists between the adjacent key points; otherwise no connection relation exists.
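A pure-Python stand-in for the connected domain check above, operating on an already binarised crop (in the patent the crop is produced by grayscale conversion, Gaussian smoothing, closing and adaptive thresholding, e.g. with OpenCV). Because only foreground pixels are labelled here, the largest foreground component plays the role of the "second largest connected domain" (the largest being the background); the function names and the 0.8 threshold are illustrative, not from the patent.

```python
def largest_line_component(patch):
    """Largest 4-connected foreground component of a binarised crop,
    found by iterative flood fill."""
    h, w = len(patch), len(patch[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for i in range(h):
        for j in range(w):
            if patch[i][j] and not seen[i][j]:
                comp, stack = [], [(i, j)]
                seen[i][j] = True
                while stack:
                    r, c = stack.pop()
                    comp.append((r, c))
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < h and 0 <= nc < w and patch[nr][nc] and not seen[nr][nc]:
                            seen[nr][nc] = True
                            stack.append((nr, nc))
                if len(comp) > len(best):
                    best = comp
    return best

def has_connection(patch, threshold=0.8):
    """Adjacent key points are connected when the candidate line segment
    spans at least `threshold` of the crop's longest side (ratio q = l / L)."""
    comp = largest_line_component(patch)
    if not comp:
        return False
    L = max(len(patch), len(patch[0]))            # longest side of the crop
    rows = [r for r, _ in comp]
    cols = [c for _, c in comp]
    l = max(max(rows) - min(rows) + 1, max(cols) - min(cols) + 1)
    return l / L >= threshold

line_patch = [[0] * 8, [1] * 8, [0] * 8]                    # table line spanning the crop
blob_patch = [[0] * 8, [1, 1, 0, 0, 0, 0, 0, 0], [0] * 8]   # small noise blob only
print(has_connection(line_patch), has_connection(blob_patch))  # True False
```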
Furthermore, an existing connection relation is either a vertical connection relation or a horizontal connection relation: if the height of the local area image is greater than its width, the relation is vertical; if the height is less than the width, the relation is horizontal.
Likewise, an absent connection relation is either an absent vertical connection relation or an absent horizontal connection relation: if the height of the local area image is greater than its width, the vertical connection relation is absent; if the height is less than the width, the horizontal connection relation is absent.
Furthermore, all cells in the table are reconstructed from the structural position relations and connection relations of the key points as follows: according to the reconstruction rules, first find all upper left corner points among the key points, then find the lower right corner point corresponding to each upper left corner point, thereby reconstructing all cells.
The reconstruction rules are:
Rule 1: each key point can be the upper left corner point of at most one cell, and likewise the lower right corner point of at most one cell.
Rule 2: when adjacent key points are not connected: if the vertical connection relation is absent, the upper key point is not an upper left corner point and the lower key point is not a lower right corner point; if the horizontal connection relation is absent, the left key point is not an upper left corner point and the right key point is not a lower right corner point.
Rule 3: the column table line of the lower right corner point corresponding to a cell's upper left corner point lies on, or to the left of, the column table line of the upper left corner point of the cell adjacent on the right; if there is no cell adjacent on the right, it lies on, or to the left of, the last column table line.
Rule 4: the column table line of the lower right corner point corresponding to a cell's upper left corner point lies to the right of the column table line of that upper left corner point.
Rule 5: among the points satisfying rules 3 and 4, the lower right corner point corresponding to a cell's upper left corner point is the one closest to the upper left corner point.
Specifically, the steps for reconstructing all cells are as follows:
taking the set of all key points as the candidate upper left corner set, and likewise the set of all key points as the candidate lower right corner set;
by rule 2, using the connection relations between all adjacent key points: when adjacent key points lack a vertical connection, removing the upper key point from the candidate upper left corner set and the lower key point from the candidate lower right corner set; when adjacent key points lack a horizontal connection, removing the left key point from the candidate upper left corner set and the right key point from the candidate lower right corner set; this yields the formal upper left corner set and the formal lower right corner set;
by rule 1, each point can be the upper left corner point of at most one cell and the lower right corner point of at most one cell, so the upper left corner points and the lower right corner points correspond one to one; all cells can therefore be reconstructed by finding the corresponding lower right corner point for every point in the upper left corner set;
by rule 3, taking the upper left corner points out of the set in turn, calling the current one upper left corner point A: among the upper left corner points on the same row table line as A, the one to the right of A and closest to A is the upper left corner point of the cell adjacent on the right, called upper left corner point B; then screening the lower right corner set a first time, removing the lower right corner points to the right of the column table line of B; if no such upper left corner point B exists (i.e. there is no cell adjacent on the right), the first screening is unnecessary;
by rule 4, screening the lower right corner set a second time, removing all lower right corner points whose column table line does not lie to the right of the column table line of upper left corner point A;
by rule 5, finding, among the lower right corner points remaining after the two screenings, the one closest to upper left corner point A; it is the lower right corner point corresponding to A, and the paired upper left and lower right corner points define a cell;
finding the corresponding lower right corner point for each upper left corner point in the set in turn, thereby reconstructing all cells.
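The corner-pairing steps above can be sketched on grid coordinates. Key points are given as (row, column) indices of the row and column table lines they belong to, and `no_vert` / `no_horiz` hold the adjacent pairs that the connected domain check found not connected (rule 2). All names, the Manhattan "closest" metric, and the sample grids are my own simplifications, not the patent's.

```python
def reconstruct_cells(points, no_vert, no_horiz):
    """Pair upper left corner points with lower right corner points
    following rules 1-5 from the text."""
    tl = set(points)                    # candidate upper left corner set
    br = set(points)                    # candidate lower right corner set
    for upper, lower in no_vert:        # rule 2: missing vertical link
        tl.discard(upper)
        br.discard(lower)
    for left, right in no_horiz:        # rule 2: missing horizontal link
        tl.discard(left)
        br.discard(right)

    max_col = max(c for _, c in points)
    cells = []
    for a in sorted(tl):                # visit upper left corners row-major
        ar, ac = a
        # rule 3: column bound from the next upper left corner on the same row
        right_tls = [c for r, c in tl if r == ar and c > ac]
        bound = min(right_tls) if right_tls else max_col
        # rules 3 + 4: candidates strictly below A, in columns (ac, bound]
        cand = [b for b in br if b[0] > ar and ac < b[1] <= bound]
        if not cand:
            continue                    # A closes no cell
        b = min(cand, key=lambda p: (p[0] - ar) + (p[1] - ac))  # rule 5: closest
        br.discard(b)                   # rule 1: one-to-one pairing
        cells.append((a, b))
    return cells

grid = [(r, c) for r in range(2) for c in range(3)]  # 2 row x 3 column table lines
print(reconstruct_cells(grid, set(), set()))
# [((0, 0), (1, 1)), ((0, 1), (1, 2))]
```

With a missing horizontal link inside a single-column, three-row-line grid, the two stacked cells merge into one, as rule 2 intends.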
Compared with the prior art, the method adopts a key point detection method based on deep learning to robustly find all table line intersection points in the table image, and obtains the accurate positions of all cells from the key points, thereby completing table structure identification with high quality.
Drawings
FIG. 1 shows a schematic flow diagram of a method embodying the present invention;
FIG. 2 shows a schematic structural diagram of the table of the present embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
For the sake of reference and clarity, the technical terms, abbreviations or abbreviations used hereinafter are to be interpreted in summary as follows:
Gaussian heat map: a heat map is a matrix that displays data as color changes; a Gaussian heat map is a heat map in which the color changes follow a Gaussian distribution.
PDF (Portable Document Format): portable document format, a file format that presents documents in a manner that is independent of the application, operating system, hardware.
HRNet (High Resolution Net): high resolution network model.
DeeperCut: a key point detection network.
OpenCV (Open Source Computer Vision Library): a cross-platform computer vision library encapsulates many commonly used functions and classes of image processing.
Row table line: a horizontal table line in the table image.
Column table line: a vertical table line in the table image.
HTML (Hyper Text Markup Language): hypertext markup language.
LaTeX: a typesetting system.
Upper left corner point: the vertex at the upper left among the four vertices of a cell in the table.
Lower right corner point: the vertex at the lower right among the four vertices of a cell in the table.
Fig. 1 shows a schematic flow diagram of an embodiment of the invention. A method for identifying a table structure of a full-line table based on key point detection comprises the following steps:
step 1, detecting key points in a table image by adopting a key point detection network to obtain a Gaussian heat map containing information of all the key point positions;
step 2, scaling the Gaussian heat map to be consistent with the size of the input form image, and obtaining coordinate positions of all key points through a contour center distance algorithm;
step 3, analyzing the structural position relation of the key points in the table by using a scanning line method;
step 4, detecting whether the adjacent key points have a connection relation by using a connected domain method;
and 5, reconstructing all cells in the table according to the structural position relationship and the connection relationship among the key points, and converting the cells into the required markup language description.
The following describes the full-line table structure identification process based on key point detection in detail with reference to the examples.
And step 1, detecting the key points in the table image by adopting a key point detection network to obtain a Gaussian heat map containing the position information of all the key points.
Specifically, the key point detection network is HRNet.
The key point detection network uses the self-constructed table key point data set for pre-training to obtain the capability of detecting the key points of the table.
Specifically, the table key point data set contains 500 table images, and the coordinates of all table line intersection points in each table image are given as labels of the key point coordinate positions.
The form image data source is a work order image and is a full-line form.
Step 2 is then executed: scale the Gaussian heat map to the size of the input table image, and obtain the coordinate positions of all key points through the contour center distance algorithm.
Specifically, the scaling is implemented with the resize() function in OpenCV, using bilinear interpolation.
Specifically, the contour center distance algorithm comprises the following steps:
applying a Gaussian blur to the Gaussian heat map to reduce noise points appearing at the contour edges in the heat map;
carrying out binarization operation on the Gaussian heat map to obtain a binarization map;
carrying out contour detection on the binary image to obtain the contours of all key points;
specifically, contour detection is implemented with the findContours() function in OpenCV;
calculating the center (centroid) of each key point's contour; this centroid is the coordinate position of the key point;
Specifically, the contour central moment algorithm calculates the coordinate position (x̄, ȳ) of a key point k as follows:

x̄ = M_10 / M_00,  ȳ = M_01 / M_00

wherein the spatial moments M_pq are calculated as follows:

M_pq = Σ_{(x, y) ∈ C} x^p · y^q

wherein C is the contour of key point k, i.e. the set of all points on the contour; x represents the abscissa of a contour point in the figure, and y represents the ordinate of a contour point in the figure; x^p represents x raised to the power p, and y^q represents y raised to the power q; the sum is taken over all points of the contour after the power operations on the abscissas x and ordinates y; M_00 is the zero-order spatial moment of the contour, and M_10 and M_01 are the first-order spatial moments of the contour.
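A minimal sketch of this moment computation, operating on a contour given as a list of (x, y) points; the function name and the synthetic square contour are assumptions of this sketch, and OpenCV's cv2.moments() provides spatial moments for real contours.

```python
def contour_centroid(contour):
    """Centroid of a contour from its spatial moments.

    M_pq = sum over contour points of x**p * y**q;
    the centroid is (M10 / M00, M01 / M00).
    """
    m00 = len(contour)                 # zero-order moment: x**0 * y**0 summed
    m10 = sum(x for x, _ in contour)   # first-order moment in x
    m01 = sum(y for _, y in contour)   # first-order moment in y
    return m10 / m00, m01 / m00

# Points on the boundary of a 5x5 axis-aligned square: the centroid is its center.
square = [(x, 0) for x in range(5)] + [(x, 4) for x in range(5)] \
       + [(0, y) for y in range(1, 4)] + [(4, y) for y in range(1, 4)]
```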
Step 3 is then executed: the structural position relationship of the key points in the table is analyzed by the scanning line method.
Specifically, the structural position of a key point in the table means which row table line and which column table line the key point belongs to.
Specifically, the scanning line method comprises the following steps:
The table image is scaled so that its height and width are equal.
An initial straight line is set as a row scan line and moved vertically in the table image from top to bottom, a fixed distance d at a time. After each move of distance d, the number n of key points above the line is counted. When, after a manually preset number m of consecutive moves (a total movement distance of m·d), the count n has not increased, a row separation line is placed at the current position.
All key points are then separated according to the row separation lines, thereby determining which row table line each key point belongs to.
Similarly, an initial straight line is set as a column scan line and moved horizontally in the table image from left to right, a fixed distance d at a time. After each move of distance d, the number n of key points to the left of the line is counted. When, after a manually preset number m of consecutive moves (a total movement distance of m·d), the count n has not increased, a column separation line is placed at the current position.
All key points are then separated according to the column separation lines, thereby determining which column table line each key point belongs to.
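The row-scan procedure can be sketched as a one-dimensional simulation over the key points' y-coordinates; the function name, the bookkeeping of points seen since the last separator, and the default parameters are assumptions of this sketch.

```python
def row_separators(ys, height, d=1, m=5):
    """Move a horizontal scan line downward in steps of d; whenever the
    number of key points above the line has not increased for m consecutive
    moves (and new points were seen since the last separator), place a
    row separation line at the current position."""
    seps = []
    prev = 0    # key point count above the line after the previous move
    stall = 0   # consecutive moves without an increase in the count
    fresh = 0   # key points seen since the last separator
    pos = 0
    while pos <= height:
        pos += d
        count = sum(1 for y in ys if y < pos)
        if count > prev:
            stall, fresh = 0, fresh + (count - prev)
        else:
            stall += 1
        prev = count
        if stall >= m and fresh > 0:
            seps.append(pos)
            stall, fresh = 0, 0
    return seps
```

Each separator closes one row, so assigning a key point to a row is a matter of counting the separators above it; the column scan is symmetric, using x-coordinates and left-of-line counts.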
Preferably, for a tilted table, the slope of the initial row scan line needs to be corrected: the initial row scan line is first set as a horizontal straight line, a first pass of the scan-line method is performed to obtain the points located on the first row table line, an approximate slope is computed from the coordinates of these points by the least squares method, and this slope is applied to the initial row scan line as the corrected initial row scan line.
Preferably, for a tilted table, the slope of the initial column scan line likewise needs to be corrected: the initial column scan line is first set as a vertical straight line, a first pass of the scan-line method is performed to obtain the points located on the first column table line, an approximate slope is computed from the coordinates of these points by the least squares method, and this slope is applied to the initial column scan line as the corrected initial column scan line.
The least squares method computes the approximate slope as follows:

k = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)²

wherein k represents the approximate slope of the corrected row/column scan line, (x_i, y_i) are the coordinates of the points on the first row/column table line, and n is the number of such points;
wherein the means x̄ and ȳ are calculated as follows:

x̄ = (1/n) Σ_{i=1}^{n} x_i,  ȳ = (1/n) Σ_{i=1}^{n} y_i
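A minimal sketch of the least-squares slope estimate (the function name is an assumption; numpy.polyfit(x, y, 1)[0] would give the same slope):

```python
def approx_slope(points):
    """Least-squares slope of a set of (x, y) points:
    k = sum((x_i - xbar) * (y_i - ybar)) / sum((x_i - xbar) ** 2)."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    num = sum((x - xbar) * (y - ybar) for x, y in points)
    den = sum((x - xbar) ** 2 for x, _ in points)
    return num / den
```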
and 4, executing a step of detecting whether the adjacent key points have a connection relation or not by using a connected domain method.
Specifically, the adjacent keypoints refer to two keypoints belonging to the same row grid line and belonging to adjacent list grid lines, or two keypoints belonging to the same list grid line and belonging to adjacent row grid lines.
Specifically, the connected domain method comprises the following steps:
traversing all adjacent key points, taking the positions of the key points as the circle centers, firstly making a circle with the radius of r, then obtaining a minimum circumscribed rectangle containing the circles with the two adjacent key points as the circle centers as a cutting area, and then cutting out a local area graph where the adjacent key points are located from the form image;
after the local area image is converted into a gray level image, a Gaussian low-pass filter is used for smoothing operation, closed operation is used for filling gaps in a connected domain, then an adaptive threshold algorithm is used for converting the connected domain into a binary image, and then a second large connected domain is found (the largest connected domain is the background and not the foreground).
The maximum of the height and width of the local area map is taken as l_p, and the maximum of the height and width of the second-largest connected domain is taken as l_c. The ratio s is calculated as follows:

s = l_c / l_p

Finally, a threshold is set; when s is greater than the threshold, a connection relationship exists between the adjacent key points, otherwise it does not exist.
Furthermore, an existing connection relationship is either a longitudinal connection relationship or a transverse connection relationship: if the height of the local area map is greater than its width, the connection is longitudinal; if the height is smaller than the width, the connection is transverse.
Likewise, an absent connection relationship is either an absent longitudinal connection or an absent transverse connection: if the height of the local area map is greater than its width, the longitudinal connection is absent; if the height is smaller than the width, the transverse connection is absent.
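The decision criterion can be sketched on a toy binary crop using a pure-Python flood fill in place of OpenCV. Here the foreground is given directly, so the largest foreground component plays the role of the patent's second-largest connected domain (which counts the background as the largest); the function names and the 0.8 threshold are assumptions of this sketch.

```python
def components(grid):
    """4-connected components of foreground (value 1) cells."""
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for i in range(h):
        for j in range(w):
            if grid[i][j] and not seen[i][j]:
                stack, comp = [(i, j)], []
                seen[i][j] = True
                while stack:
                    r, c = stack.pop()
                    comp.append((r, c))
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < h and 0 <= nc < w \
                                and grid[nr][nc] and not seen[nr][nc]:
                            seen[nr][nc] = True
                            stack.append((nr, nc))
                comps.append(comp)
    return comps

def points_connected(crop, thresh=0.8):
    """True if the dominant foreground component nearly spans the crop,
    i.e. the ratio s = l_c / l_p exceeds the threshold."""
    comps = components(crop)
    if not comps:
        return False
    big = max(comps, key=len)
    rows = [r for r, _ in big]
    cols = [c for _, c in big]
    l_c = max(max(rows) - min(rows) + 1, max(cols) - min(cols) + 1)
    l_p = max(len(crop), len(crop[0]))
    return l_c / l_p > thresh
```

An unbroken table-line segment between the two key points yields a component spanning the crop (s near 1); a broken line yields a short component and a small s.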
Step 5 is then executed: all cells in the table are reconstructed according to the structural position relationship and the connection relationship among the key points, and converted into the required markup language description.
Specifically, the steps of reconstructing all cells in the table according to the structural position relationship and the connection relationship between the key points are as follows:
according to five rules, all the upper left corner points in the key points are found first, and then the lower right corner point corresponding to each upper left corner point is found, so that all the cells are reconstructed.
Specifically, the five rules are as follows:
Rule 1: each key point can be the upper left corner point of at most one cell, and the lower right corner point of at most one cell.
Rule 2: when adjacent key points have no connection relationship: if no longitudinal connection exists, the upper key point is not an upper left corner point and the lower key point is not a lower right corner point; if no transverse connection exists, the left key point is not an upper left corner point and the right key point is not a lower right corner point.
Rule 3: the column table line of the lower right corner point corresponding to a cell's upper left corner point lies on, or to the left of, the column table line of the upper left corner point of the right-adjacent cell; if no right-adjacent cell exists, it lies on, or to the left of, the last column table line.
Rule 4: the column table line of the lower right corner point corresponding to a cell's upper left corner point lies to the right of the column table line of that upper left corner point.
Rule 5: among the points satisfying Rules 3 and 4, the lower right corner point corresponding to a cell's upper left corner point is the one closest to that upper left corner point.
For Rule 1, as shown in fig. 2, each of the key points 1 to 22 can be the upper left corner point of at most one cell and the lower right corner point of at most one cell; no point can serve as the shared upper left corner point or shared lower right corner point of multiple cells.
For Rule 2, as shown in fig. 2, key points 7 and 13 have no longitudinal connection relationship, so key point 7 (above) is not an upper left corner point and key point 13 (below) is not a lower right corner point; key points 8 and 9 have no transverse connection relationship, so key point 8 (on the left) is not an upper left corner point and key point 9 (on the right) is not a lower right corner point.
For Rule 3, as shown in fig. 2, the column table line of the lower right corner point (key point 12) corresponding to the upper left corner point (key point 1) of one cell (the cell enclosed by key points 1, 2, 12, 11) lies on, or to the left of, the column table line of the upper left corner point (key point 2) of its right-adjacent cell (the cell enclosed by key points 2, 3, 8, 6). If a cell (the cell enclosed by key points 4, 5, 10, 9) has no right-adjacent cell, the column of the lower right corner point (key point 10) corresponding to its upper left corner point (key point 4) lies on, or to the left of, the last column table line.
For Rule 4, as shown in fig. 2, the column table line of the lower right corner point (key point 12) corresponding to the upper left corner point (key point 1) of one cell (the cell enclosed by key points 1, 2, 12, 11) lies to the right of the column table line of that upper left corner point (key point 1).
For Rule 5, as shown in fig. 2, the lower right corner point corresponding to the upper left corner point (key point 1) of one cell (the cell enclosed by key points 1, 2, 12, 11) is key point 12, the nearest of the points (key points 6, 12, 18) satisfying Rules 3 and 4.
Specifically, the steps of reconstructing all cells are as follows:
The set of all key points is taken as the candidate upper left corner point set {1, 2, …, 22}, and simultaneously as the candidate lower right corner point set {1, 2, …, 22}.
According to Rule 2 and the connection relationships between all adjacent key points: when adjacent key points have no longitudinal connection relationship, the upper key point is removed from the candidate upper left corner point set and the lower key point is removed from the candidate lower right corner point set; when adjacent key points have no transverse connection relationship, the left key point is removed from the candidate upper left corner point set and the right key point is removed from the candidate lower right corner point set. This finally yields the set of all formal upper left corner points {1, 2, 3, 4, 6, 9, 11, 12, 13, 14, 15} and the set of all formal lower right corner points {8, 10, 12, 14, 15, 16, 18, 19, 20, 21, 22}.
According to Rule 1, each point can be the upper left corner point of at most one cell and the lower right corner point of at most one cell, so the upper left corner points and the lower right corner points correspond one to one; all cells can therefore be reconstructed by finding the corresponding lower right corner point for every upper left corner point.
According to Rule 3, the upper left corner points are taken out of the set in turn. Take upper left corner point 3: the upper left corner points located on the same row table line as point 3 are {1, 2, 4}; the one located to the right of point 3 and closest to it, namely point 4, is the upper left corner point of the right-adjacent cell of the cell containing point 3. A first screening is then performed on the set of all lower right corner points, removing those located to the right of the column table line of point 4; the remaining lower right corner points are {8, 12, 14, 15, 18, 19, 20, 21}. Because point 3 is not located on the last column table line of the table, this first screening is required.
According to Rule 4, a second screening is performed on the remaining lower right corner points {8, 12, 14, 15, 18, 19, 20, 21}, removing all lower right corner points whose column table line is not to the right of the column table line of point 3; the remaining lower right corner points are {15, 21}.
According to Rule 5, the point closest to upper left corner point 3 is found among the remaining lower right corner points {15, 21}, namely lower right corner point 15; upper left corner point 3 and lower right corner point 15 form a cell.
and sequentially finding the lower right corner points corresponding to the upper left corner points in the set of all upper left corner points, so as to reconstruct all the cells.
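The five-rule reconstruction can be sketched on an idealized grid, indexing each key point by the (row, col) of the table lines it lies on and specifying the connection relationships by the pairs that are missing; the function name, the grid-index simplification, and the squared-distance metric for Rule 5 are assumptions of this sketch.

```python
def reconstruct_cells(points, h_missing=(), v_missing=()):
    """points: (row, col) table-line indices of the key points.
    v_missing: pairs (upper, lower) lacking a longitudinal connection;
    h_missing: pairs (left, right) lacking a transverse connection."""
    points = set(points)
    tl, br = set(points), set(points)  # candidate corner sets
    for upper, lower in v_missing:     # Rule 2, longitudinal
        tl.discard(upper)
        br.discard(lower)
    for left, right in h_missing:      # Rule 2, transverse
        tl.discard(left)
        br.discard(right)
    last_col = max(c for _, c in points)
    cells = []
    for r, c in sorted(tl):
        # Rule 3: bound = column of the right-adjacent cell's upper left
        # corner on the same row table line, else the last column line.
        right_cols = [c2 for r2, c2 in tl if r2 == r and c2 > c]
        bound = min(right_cols) if right_cols else last_col
        # Rules 3 + 4: candidates strictly right of c, not right of bound,
        # and strictly below r.
        cands = [(r2, c2) for r2, c2 in br if c < c2 <= bound and r2 > r]
        if not cands:
            continue                   # bottom-row / rightmost points start no cell
        # Rule 5: the nearest candidate is the matching lower right corner.
        best = min(cands, key=lambda p: (p[0] - r) ** 2 + (p[1] - c) ** 2)
        cells.append(((r, c), best))
    return cells
```

Rule 1 holds implicitly on this grid: each lower right corner point is consumed by at most one upper left corner point. For the 3x3 key-point grid of a full 2x2 table the sketch yields four cells; removing the transverse connection between (1, 0) and (1, 1) merges the left column into one cell spanning two rows.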
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
For brevity, not all possible combinations of the technical features in the above embodiments are described; nevertheless, any combination of these technical features that involves no contradiction should be considered within the scope of the present disclosure.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (10)

1. A method for identifying a table structure of a full-line table based on key point detection is characterized by comprising the following steps:
step 1, detecting key points in a table image by adopting a key point detection network to obtain a Gaussian heat map containing information of all key point positions;
step 2, scaling the Gaussian heat map to be consistent with the size of the table image, and obtaining coordinate positions of all key points through a contour center distance algorithm;
step 3, analyzing the structural position relation of the key points in the table by using a scanning line method;
step 4, detecting whether the adjacent key points have a connection relation by using a connected domain method;
step 5, reconstructing all cells in the table according to the structural position relationship and the connection relationship among the key points, and converting the cells into the required markup language description.
2. The method for identifying a full-line table structure based on keypoint detection as claimed in claim 1, wherein said contour center distance algorithm comprises the following steps:
applying a Gaussian blur to the Gaussian heat map to reduce noise points appearing at the contour edges in the Gaussian heat map;
carrying out binarization operation on the Gaussian heatmap to obtain a binarization map;
carrying out contour detection on the binary image to obtain the contours of all key points;
and calculating the center distance of the outline of each key point, wherein the center distance is the coordinate position of the key point.
3. The method for identifying the table structure of the whole-line table based on the key point detection as claimed in claim 2, wherein the contour center distance is calculated as follows:
for a key point k whose coordinate position is (x̄, ȳ):

x̄ = M_10 / M_00,  ȳ = M_01 / M_00

wherein the spatial moments M_pq are calculated as follows:

M_pq = Σ_{(x, y) ∈ C} x^p · y^q

wherein C is the contour of key point k, i.e. the set of all points on the contour; x represents the abscissa of a contour point in the figure, and y represents the ordinate of a contour point in the figure; x^p represents x raised to the power p, and y^q represents y raised to the power q; the sum is taken over all points of the contour after the power operations on the abscissas and ordinates; M_00 is the zero-order spatial moment of the contour, and M_10 and M_01 are the first-order spatial moments of the contour.
4. A method for identifying a full-line table structure based on key point detection as claimed in claim 3, characterized in that said scan-line method comprises the following steps:
scaling the table image so that its height and width are equal;
setting an initial straight line as a row scan line, moving it vertically in the table image from top to bottom, a fixed distance d at a time, and after each move of distance d counting the number n of key points above the row scan line; when, after a preset number m of consecutive moves (a total movement distance of m·d), the count n has not increased, placing a row separation line at the current position;
separating all key points according to the row separation lines, thereby obtaining which row table line each key point belongs to;
setting an initial straight line as a column scan line, moving it horizontally in the table image from left to right, a fixed distance d at a time, and after each move of distance d counting the number n of key points to the left of the column scan line; when, after a preset number m of consecutive moves (a total movement distance of m·d), the count n has not increased, placing a column separation line at the current position;
separating all key points according to the column separation lines, thereby obtaining which column table line each key point belongs to.
5. The method as claimed in claim 4, wherein if the table in the table image is tilted, the corresponding row scan line or column scan line needs to be corrected;
the correction process of the row scan line is as follows: first setting the initial row scan line as a horizontal straight line, performing a first pass of the scan-line method to obtain the points located on the first row table line, then computing an approximate slope from the coordinates of these points by the least squares method, and applying the approximate slope to the initial row scan line as the corrected row scan line;
the correction process of the column scan line is as follows: first setting the initial column scan line as a vertical straight line, performing a first pass of the scan-line method to obtain the points located on the first column table line, then computing an approximate slope from the coordinates of these points by the least squares method, and applying the approximate slope to the initial column scan line as the corrected column scan line;
the least squares method computes the approximate slope as follows:

k = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)²

wherein k represents the approximate slope of the corrected row or column scan line, (x_i, y_i) are the coordinates of the points on the first row table line or first column table line, and n is the number of such points;
wherein the means x̄ and ȳ are calculated as follows:

x̄ = (1/n) Σ_{i=1}^{n} x_i,  ȳ = (1/n) Σ_{i=1}^{n} y_i
6. The method for identifying the table structure of the whole-line table based on the key point detection as claimed in claim 1 or 4, wherein the connected domain method comprises the following steps:
traversing all pairs of adjacent key points; taking each key point position as a circle center, first drawing a circle of radius r, then taking the minimum bounding rectangle containing the two circles centered on the adjacent key points as the cropping area, and cropping the local area map containing the adjacent key points out of the table image;
converting the local area map into a gray image, smoothing it with a Gaussian low-pass filter, filling gaps in the connected domain with a closing operation, converting it into a binary image with an adaptive threshold algorithm, and then finding the second-largest connected domain;
taking the maximum of the height and width of the local area map as l_p and the maximum of the height and width of the second-largest connected domain as l_c, and calculating the ratio s as follows:

s = l_c / l_p

when s is greater than a preset threshold, a connection relationship exists between the adjacent key points; otherwise, it does not exist.
7. The method for identifying a full-line table structure based on key point detection as claimed in claim 6, wherein an existing connection relationship is either a longitudinal connection relationship or a transverse connection relationship: if the height of the local area map is greater than its width, the longitudinal connection relationship exists; if the height of the local area map is smaller than its width, the transverse connection relationship exists;
likewise, an absent connection relationship is either an absent longitudinal connection relationship or an absent transverse connection relationship: if the height of the local area map is greater than its width, the longitudinal connection relationship is absent; if the height of the local area map is smaller than its width, the transverse connection relationship is absent.
8. The method for identifying the table structure of the whole-line table based on the key point detection as claimed in claim 7, wherein the step of reconstructing all cells in the table according to the structural position relationship and the connection relationship between the key points is: according to the reconstruction rules, all upper left corner points among the key points are found first, and then the lower right corner point corresponding to each upper left corner point is found, thereby reconstructing all cells;
the reconstruction rules include: Rule 1, each key point can be the upper left corner point of at most one cell and the lower right corner point of at most one cell; Rule 2, when adjacent key points have no connection relationship: if no longitudinal connection exists, the upper key point is not an upper left corner point and the lower key point is not a lower right corner point, and if no transverse connection exists, the left key point is not an upper left corner point and the right key point is not a lower right corner point; Rule 3, the column table line of the lower right corner point corresponding to a cell's upper left corner point lies on, or to the left of, the column table line of the upper left corner point of the right-adjacent cell, and if no right-adjacent cell exists, it lies on, or to the left of, the last column table line; Rule 4, the column table line of the lower right corner point corresponding to a cell's upper left corner point lies to the right of the column table line of that upper left corner point; Rule 5, among the points satisfying Rules 3 and 4, the lower right corner point corresponding to a cell's upper left corner point is the one closest to that upper left corner point.
9. The method for identifying the table structure of the whole line table based on the key point detection as claimed in claim 8, wherein the step of reconstructing all the cells is as follows:
taking a set formed by all key points as a candidate upper left corner set, and simultaneously taking a set formed by all key points as a candidate lower right corner set;
according to a second rule, according to the connection relation between all adjacent key points, when the adjacent key points do not have the longitudinal connection relation, removing the key point positioned above from the candidate upper left corner set, simultaneously removing the key point positioned below from the candidate lower right corner set, when the adjacent key points do not have the transverse connection relation, removing the key point positioned on the left from the candidate upper left corner set, simultaneously removing the key point positioned on the right from the candidate lower right corner set, and obtaining all formal upper left corner sets and all formal lower right corner sets;
according to the first rule, each point can only be the upper left corner point of one cell at most and can also be the lower right corner point of one cell at most, so that the upper left corner point in the upper left corner point set and the lower right corner point in the lower right corner point set are in one-to-one correspondence, and then all cells can be reconstructed as long as corresponding lower right corner points are found for the upper left corner points in all the upper left corner point sets;
according to a third rule, sequentially taking out one upper left corner point in all the upper left corner point sets, and calling the upper left corner point as an upper left corner point A, finding a plurality of upper left corner points which are positioned on the same row of table lines with the upper left corner point A, wherein one of the plurality of upper left corner points, which is positioned at the right of the upper left corner point A and is closest to the upper left corner point A, is the upper left corner point of a cell adjacent to the right of the cell in which the upper left corner point A is positioned, and calling the upper left corner point as an upper left corner point B; then, screening all the lower right corner point sets for the first time, removing the lower right corner point which is positioned at the right side of the list grid line where the upper left corner point B is positioned, and if the upper left corner point A is positioned on the last column of the list grid line in the list, not needing to screen for the first time;
according to the fourth rule, performing a second screening of the lower-right-corner set, removing every lower-right corner point that does not lie on a column line to the right of the column line through corner point A;
according to the fifth rule, finding, among the lower-right corner points remaining after the two screenings, the one closest to corner point A; this point is the lower-right corner corresponding to corner point A, and the paired upper-left and lower-right corner points define a cell;
and sequentially finding the corresponding lower-right corner point for every point in the upper-left-corner set, thereby reconstructing all the cells.
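The corner-pairing procedure described in the claim above can be sketched as follows. This is a simplified illustration, not the patented method itself: it assumes key points are given as (row-line, column-line) grid indices, and it applies only the "below and to the right, then nearest" matching of rules three to five, omitting the line-connectivity screenings of rule two.

```python
def reconstruct_cells(top_lefts, bottom_rights):
    """Pair each upper-left corner with its nearest valid lower-right
    corner. A valid match lies strictly below and strictly to the right;
    among the valid candidates the closest one (by row, then column
    distance) is taken, approximating the fifth rule."""
    remaining = list(bottom_rights)
    cells = []
    for tl in sorted(top_lefts):
        candidates = [br for br in remaining
                      if br[0] > tl[0] and br[1] > tl[1]]
        if not candidates:
            continue
        br = min(candidates, key=lambda p: (p[0] - tl[0], p[1] - tl[1]))
        cells.append((tl, br))
        remaining.remove(br)  # first rule: one-to-one correspondence
    return cells

# A 2x2 grid of unit cells: four upper-left and four lower-right corners.
tls = [(0, 0), (0, 1), (1, 0), (1, 1)]
brs = [(1, 1), (1, 2), (2, 1), (2, 2)]
print(reconstruct_cells(tls, brs))
```

For spanning cells (merged rows or columns) the omitted screenings of rules two to four become essential, since the nearest corner below-right is then not always the correct match.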
10. The method as claimed in claim 1, wherein the keypoint detection network is HRNet or DeeperCut, pre-trained on the constructed table keypoint data set so that it acquires the ability to detect table key points.
CN202211637591.4A 2022-12-20 2022-12-20 Method for identifying table structure of whole-line table based on key point detection Active CN115620322B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211637591.4A CN115620322B (en) 2022-12-20 2022-12-20 Method for identifying table structure of whole-line table based on key point detection

Publications (2)

Publication Number Publication Date
CN115620322A true CN115620322A (en) 2023-01-17
CN115620322B CN115620322B (en) 2023-04-07

Family

ID=84879584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211637591.4A Active CN115620322B (en) 2022-12-20 2022-12-20 Method for identifying table structure of whole-line table based on key point detection

Country Status (1)

Country Link
CN (1) CN115620322B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105786957A (en) * 2016-01-08 2016-07-20 云南大学 Table sorting method based on cell adjacency relation and depth-first traversal
US20170091943A1 (en) * 2015-09-25 2017-03-30 Qualcomm Incorporated Optimized object detection
US20180357776A1 (en) * 2017-06-08 2018-12-13 Microsoft Technology Licensing, Llc Vector graphics handling processes for user applications
CN112733855A (en) * 2020-12-30 2021-04-30 科大讯飞股份有限公司 Table structuring method, table recovery equipment and device with storage function
CN113268982A (en) * 2021-06-03 2021-08-17 湖南四方天箭信息科技有限公司 Network table structure identification method and device, computer device and computer readable storage medium
US20210256680A1 (en) * 2020-02-14 2021-08-19 Huawei Technologies Co., Ltd. Target Detection Method, Training Method, Electronic Device, and Computer-Readable Medium
CN113705395A (en) * 2021-08-16 2021-11-26 南京英诺森软件科技有限公司 Method for converting paper form into word document based on deep learning model
CN113723328A (en) * 2021-09-06 2021-11-30 华南理工大学 Method for analyzing and understanding chart document panel
CN113723330A (en) * 2021-09-06 2021-11-30 华南理工大学 Method and system for understanding chart document information
CN114359939A (en) * 2021-12-16 2022-04-15 华南理工大学 Table structure identification method, system and equipment based on cell detection

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Gao Liangcai et al.: "Research Progress on Table Recognition Technology" *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116259064A (en) * 2023-03-09 2023-06-13 北京百度网讯科技有限公司 Table structure identification method, training method and training device for table structure identification model
CN116259064B (en) * 2023-03-09 2024-05-17 北京百度网讯科技有限公司 Table structure identification method, training method and training device for table structure identification model
CN117576699A (en) * 2023-11-06 2024-02-20 华南理工大学 Locomotive work order information intelligent recognition method and system based on deep learning

Also Published As

Publication number Publication date
CN115620322B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
Gao et al. ICDAR 2019 competition on table detection and recognition (cTDaR)
CN115620322B (en) Method for identifying table structure of whole-line table based on key point detection
Shi et al. Text extraction from gray scale historical document images using adaptive local connectivity map
Shi et al. A steerable directional local profile technique for extraction of handwritten arabic text lines
Lee et al. Parameter-free geometric document layout analysis
JP4065460B2 (en) Image processing method and apparatus
CN106096592B (en) A kind of printed page analysis method of digital book
Tran et al. Page segmentation using minimum homogeneity algorithm and adaptive mathematical morphology
CN105574524B (en) Based on dialogue and divide the mirror cartoon image template recognition method and system that joint identifies
CN114529925B (en) Method for identifying table structure of whole line table
CN113158808A (en) Method, medium and equipment for Chinese ancient book character recognition, paragraph grouping and layout reconstruction
CN113435240A (en) End-to-end table detection and structure identification method and system
CN109389050B (en) Method for identifying connection relation of flow chart
Roy et al. Text line extraction in graphical documents using background and foreground information
Al Abodi et al. An effective approach to offline Arabic handwriting recognition
CN115661848A (en) Form extraction and identification method and system based on deep learning
CN116824608A (en) Answer sheet layout analysis method based on target detection technology
JPH07220090A (en) Object recognition method
Stewart et al. Document image page segmentation and character recognition as semantic segmentation
Maddouri et al. Text lines and PAWs segmentation of handwritten Arabic document by two hybrid methods
Park et al. A method for automatically translating print books into electronic Braille books
Ablameyko et al. Recognition of engineering drawing entities: review of approaches
Li et al. Detection of overlapped quadrangles in plane geometric figures
JP3720892B2 (en) Image processing method and image processing apparatus
CN116259062A (en) CNN handwriting identification method based on multichannel and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant