CN116259064B - Table structure identification method, training method and training device for table structure identification model - Google Patents


Publication number
CN116259064B
CN116259064B CN202310259267.1A CN202310259267A CN116259064B CN 116259064 B CN116259064 B CN 116259064B CN 202310259267 A CN202310259267 A CN 202310259267A CN 116259064 B CN116259064 B CN 116259064B
Authority
CN
China
Prior art keywords
grid lines
target
information
column
table structure
Prior art date
Legal status
Active
Application number
CN202310259267.1A
Other languages
Chinese (zh)
Other versions
CN116259064A (en)
Inventor
吕鹏原
马伟洪
章成全
姚锟
王井东
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310259267.1A
Publication of CN116259064A
Application granted
Publication of CN116259064B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19147 Obtaining sets of training patterns; Bootstrap methods, e.g. bagging or boosting

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a table structure recognition method and a training method and training device for a table structure recognition model, and relates to the technical field of artificial intelligence, in particular to computer vision and deep learning. The table structure recognition method is implemented as follows: extracting image features of a table image to be recognized; based on an attention mechanism, obtaining grid line features from predetermined grid line information and the image features, wherein the grid line features comprise relative positional relationship features among a plurality of grid lines corresponding to the table structure of the table image to be recognized and structural features of the plurality of grid lines; detecting the grid line features to obtain the relative positional relationship among the plurality of grid lines and the structural information of the plurality of grid lines; and obtaining the table structure based on the relative positional relationship and the structural information.

Description

Table structure identification method, training method and training device for table structure identification model
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to computer vision and deep learning, and more specifically to a table structure identification method and to a training method and training device for a table structure identification model.
Background
Computer vision refers to technology in which a computer, in place of human eyes, performs machine vision tasks such as recognizing, tracking and measuring a target, and further performs image processing.
With the continuous development of artificial intelligence technology, it has been applied to the field of computer vision and can be used to recognize scenes with complex structures, such as table structure recognition.
Disclosure of Invention
The disclosure provides a table structure identification method, a table structure identification device, electronic equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a table structure identification method including: extracting image features of a table image to be identified; based on an attention mechanism, obtaining grid line features from predetermined grid line information and the image features, wherein the grid line features comprise relative positional relationship features among a plurality of grid lines corresponding to the table structure of the table image to be identified and structural features of the plurality of grid lines; detecting the grid line features to obtain the relative positional relationship among the plurality of grid lines and the structural information of the plurality of grid lines; and obtaining the table structure based on the relative positional relationship and the structural information.
According to another aspect of the present disclosure, there is provided a training method of a table structure recognition model, including performing the following training operations using an initial table structure recognition model: extracting image features of a sample table image; based on an attention mechanism, obtaining sample grid line features from grid line information and the image features of the sample table image, wherein the sample grid line features comprise relative positional relationship features among a plurality of grid lines corresponding to the table structure of the sample table image and structural features of the plurality of grid lines; detecting the sample grid line features to obtain the relative positional relationship among the plurality of sample grid lines and the structural information of the plurality of sample grid lines; obtaining a table structure recognition result of the sample table image based on the relative positional relationship and the structural information; and adjusting model parameters of the initial table structure recognition model based on the table structure recognition result and a table structure label of the sample table image, to obtain a trained table structure recognition model.
According to another aspect of the present disclosure, there is provided a table structure identification apparatus including a first feature extraction module, a first attention module, a first structure detection module and a first structure identification module. The first feature extraction module is used for extracting image features of a table image to be identified. The first attention module is used for obtaining, based on an attention mechanism, grid line features from predetermined grid line information and the image features, wherein the grid line features comprise relative positional relationship features among a plurality of grid lines corresponding to the table structure of the table image to be identified and structural features of the plurality of grid lines. The first structure detection module is used for detecting the grid line features to obtain the relative positional relationship among the plurality of grid lines and the structural information of the plurality of grid lines. The first structure identification module is used for obtaining the table structure based on the relative positional relationship and the structural information.
According to another aspect of the present disclosure, there is provided a training apparatus of a table structure recognition model, including a second feature extraction module, a second attention module, a second structure detection module, a second structure identification module and a training module. The second feature extraction module is used for extracting image features of a sample table image. The second attention module is used for obtaining, based on an attention mechanism, sample grid line features from grid line information and the image features of the sample table image, wherein the sample grid line features comprise relative positional relationship features among a plurality of grid lines corresponding to the table structure of the sample table image and structural features of the plurality of grid lines. The second structure detection module is used for detecting the sample grid line features to obtain the relative positional relationship among the plurality of sample grid lines and the structural information of the plurality of sample grid lines. The second structure identification module is used for obtaining a table structure recognition result of the sample table image based on the relative positional relationship and the structural information. The training module is used for adjusting model parameters of an initial table structure recognition model based on the table structure recognition result and a table structure label of the sample table image, to obtain the trained table structure recognition model.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method according to an embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform a method according to an embodiment of the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to embodiments of the present disclosure.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 schematically illustrates an exemplary system architecture to which a table structure recognition method or a training method and apparatus of a table structure recognition model may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a method of table structure identification according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a schematic diagram of a grid representation of a table structure according to an embodiment of the disclosure;
FIG. 4 schematically illustrates a schematic diagram of detecting row and column features in accordance with an embodiment of the present disclosure;
FIG. 5 schematically illustrates a schematic diagram of a structure for obtaining target grid lines in accordance with an embodiment of the present disclosure;
FIG. 6 schematically illustrates a flow chart of a training method of a table structure recognition model in accordance with an embodiment of the disclosure;
FIG. 7 schematically illustrates a block diagram of a table structure identification apparatus according to an embodiment of the disclosure;
FIG. 8 schematically illustrates a block diagram of a training apparatus of a table structure recognition model in accordance with an embodiment of the disclosure; and
FIG. 9 schematically illustrates a block diagram of an electronic device adapted to implement a content processing method according to an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Because tables take many forms and have complex structures, recognizing table structures has long been a technical difficulty in intelligent document recognition.
A cell is the smallest unit of a table and corresponds to the intersection of a row and a column. Cells may be split or merged, and splitting and merging cells can produce table structures of varied form and complex structure, for example a wired table, a wireless table, or a mixed table. A cell is generally a quadrilateral: the four vertices of the quadrilateral are the vertices of the cell, and its sides are the border lines of the cell. A wired table is a table in which the border lines of every cell are displayed. A wireless table is a table in which the border lines of the cells are not displayed. A mixed table is a table in which the border lines of some cells are not displayed.
A table may include a plurality of cells, and the text content of some cells may be blank. A table whose cells have blank text content and whose cell border lines are not displayed may be referred to as a wireless empty table. In the related art, table structures are generally identified with one of the following methods:
Method 1: using image segmentation, the table image is divided into a plurality of regions by predicting over-segmented rows and columns, and a merging network then decides whether adjacent regions need to be merged. Because this split-then-merge method comprises two stages, image segmentation and region merging, and the deep learning models of the two stages are not trained end to end, the resulting model is suboptimal.
Method 2: using OCR (optical character recognition), text boxes in the table image are detected, and it is determined whether the text boxes belong to the same row, the same column, or the same cell. This method depends heavily on the OCR detection result; it can only handle table images that contain text and cannot be applied to empty table images without text.
Method 3: cells are detected directly, and other information, for example the semantic content of the text in the cells, is then used to determine whether cells belong to the same row or the same column. This method works well for wired tables but poorly for wireless empty tables.
Method 4: cells are predicted as a structured language, for example HTML or LaTeX. This method generally predicts serially, so recognition is slow, and recognition accuracy is poor for tables with complex structures.
In view of this, an embodiment of the present disclosure provides a table structure identification method including: extracting image features of a table image to be identified; based on an attention mechanism, obtaining grid line features from predetermined grid line information and the image features, wherein the grid line features comprise relative positional relationship features among a plurality of grid lines corresponding to the table structure of the table image to be identified and structural features of the plurality of grid lines; detecting the grid line features to obtain the relative positional relationship among the plurality of grid lines and the structural information of the plurality of grid lines; and obtaining the table structure based on the relative positional relationship and the structural information.
Fig. 1 schematically illustrates an exemplary system architecture to which a table structure recognition method or a training method and apparatus of a table structure recognition model may be applied according to an embodiment of the present disclosure.
It should be noted that FIG. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments, or scenarios. For example, in another embodiment, the exemplary system architecture to which the table structure identification method or the training method and apparatus of the table structure identification model may be applied may include a terminal device, and the terminal device may implement the methods and apparatuses provided by the embodiments of the present disclosure without interacting with a server.
As shown in fig. 1, a system architecture 100 according to this embodiment may include a feature extraction module 101, an attention module 102, a structure detection module 103, and a structure identification module 104.
According to an embodiment of the present disclosure, the feature extraction module 101 may be a convolutional neural network (CNN). The feature extraction module 101 extracts image features from the table image to be identified.
According to embodiments of the present disclosure, the attention module 102 may be a Transformer encoder. Based on the attention mechanism, the attention module 102 obtains grid line features from the predetermined grid line information and the image features.
According to an embodiment of the present disclosure, the structure detection module 103 may include a row feature decoder 103_1, a column feature decoder 103_2, a row-column category detection head 103_3, a position detection head 103_4, and an edge category detection head 103_5.
According to an embodiment of the present disclosure, in the row feature decoder 103_1, the row features of each row of the table image may be decoded sequentially from top to bottom, and the row features may be detected by the row-column category detection head 103_3 to obtain the row classification result [1, 0]. The row classification result indicates that the grid lines of the 1st row to the 4th row contain corner points connected to at least two target edges. A target edge is an edge corresponding to a cell border in the table structure.
According to embodiments of the present disclosure, in the column feature decoder 103_2, the column features of each column of the table image may be decoded sequentially from left to right and detected by the row-column category detection head 103_3 to obtain the column classification result [1,1,1,0,0,0]. The column classification result indicates that the grid lines of the 1st column to the 3rd column contain corner points connected to at least two target edges.
According to the embodiment of the present disclosure, the position detection head 103_4 detects the corner positions of the plurality of grid lines in the same row to obtain the corner position information of that row. Similarly, it detects the corner positions of the plurality of grid lines in the same column to obtain the corner position information of that column.
According to the embodiment of the present disclosure, the edge category detection head 103_5 detects the edge categories of the plurality of grid lines in the same row to obtain the edge attribute information of that row. Similarly, the edge category detection head 103_5 detects the edge categories of the plurality of grid lines in the same column to obtain the edge attribute information of that column.
According to embodiments of the present disclosure, the structure identification module 104 may first determine the target grid lines from the row classification result and the column classification result. Next, the target edges are determined from the edge attribute information of the target grid lines. Then, the target corner points are determined from the connection relationship between the target edges and the corner points, and the structural information of the target grid lines is obtained from the positions of the target corner points, the connection relationship between the target edges and the target corner points, and the target edges. Finally, the recognition result of the table image, namely the table structure in the table image, is obtained from the structural information of the target grid lines and the relative positional relationship information among the target grid lines.
Fig. 2 schematically illustrates a flow chart of a table structure identification method according to an embodiment of the present disclosure.
As shown in fig. 2, the method includes operations S210 to S240.
In operation S210, image features of a form image to be recognized are extracted.
In operation S220, based on the attention mechanism, grid line characteristics including a relative positional relationship characteristic between a plurality of grid lines corresponding to a table structure of the table image to be recognized and a structural characteristic of the plurality of grid lines are obtained from the predetermined grid line information and the image characteristics.
In operation S230, the grid line characteristics are detected, and the relative positional relationship between the plurality of grid lines and the structural information of the plurality of grid lines are obtained.
In operation S240, a table structure is obtained based on the relative positional relationship and the structural information.
According to an embodiment of the present disclosure, the predetermined grid line information may describe grid lines that form a crossing grid, and the density of the grid lines may be uniform or non-uniform. In embodiments of the present disclosure, the density of the grid lines is generally not constrained by the size of the cells in the table image. However, the table structure in the table image to be identified may be tilted or distorted owing to the shooting environment, in which case the density of the grid lines may affect the accuracy of table structure identification to some extent. For example, the greater the density of the grid lines, the higher the accuracy of table structure identification.
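As a minimal sketch of what predetermined grid line information could look like, the following builds a uniform crossing grid and returns the coordinates of its corner points. The function name `make_uniform_grid` and the use of coordinates normalised to [0, 1] are assumptions made for illustration; the disclosure does not prescribe either, and a non-uniform grid would simply use unevenly spaced values.

```python
def make_uniform_grid(num_rows, num_cols):
    """Return corner coordinates of a uniform grid of num_rows x num_cols cells.

    Coordinates are normalised to [0, 1] (an assumption of this sketch).
    Corner [i][j] lies on horizontal grid line i and vertical grid line j.
    """
    if num_rows < 1 or num_cols < 1:
        raise ValueError("grid needs at least one row and one column")
    ys = [i / num_rows for i in range(num_rows + 1)]
    xs = [j / num_cols for j in range(num_cols + 1)]
    return [[(x, y) for x in xs] for y in ys]


# Hypothetical 4-row, 6-column grid laid over a table image.
grid = make_uniform_grid(4, 6)
```

Increasing `num_rows` and `num_cols` raises the grid density, which is the quantity the paragraph above relates to identification accuracy for tilted or distorted tables.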
According to embodiments of the present disclosure, the attention mechanism may be a self-attention mechanism or a cross-attention mechanism. The grid line features obtained based on the attention mechanism can be used for representing the features obtained by carrying out grid representation on the table structure of the table image to be identified. The grid representation means that the corner points and the edges in the table structure are represented by the corner points and the edges in the grid, and thus, the grid line features may include the relative positional relationship features between the plurality of grid lines and the structural features of the plurality of grid lines corresponding to the table structure of the table image to be identified.
According to embodiments of the present disclosure, the relative positional relationship features between the plurality of grid lines may characterize corner position features of the plurality of grid lines. The structural features of the plurality of grid lines may include connection relationship features of corner points and edges of the plurality of grid lines and attribute features of edges.
According to the embodiment of the disclosure, detecting the grid line features yields the corner position information, the corner-edge connection relationship information and the edge attribute information of the plurality of grid lines. The corner position information may represent the coordinates of the corner points. The attribute information of an edge may indicate whether the edge belongs to the border of a cell in the table structure.
According to an embodiment of the present disclosure, in the table structure, each vertex of a cell connects at least two edges of the cell. For a wireless table, the vertices and border lines of the cells may be represented by the grid.
It should be noted that, in the embodiment of the present disclosure, the corner points refer to intersection points obtained by intersecting the grid lines of the same row with the grid lines of the same column. A grid line may comprise two corner points and one edge. Wherein two corner points are located at both ends of the edge.
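The corner and edge notions above can be sketched as a small data structure. The class names `Corner` and `GridLine`, the integer row and column indices, and the helper `is_target_corner` are hypothetical names introduced only for illustration; the rule that a target corner touches at least two target edges follows the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Corner:
    row: int  # index of the horizontal grid line the corner lies on
    col: int  # index of the vertical grid line the corner lies on


@dataclass(frozen=True)
class GridLine:
    start: Corner          # one end of the edge
    end: Corner            # the other end of the edge
    is_target_edge: bool   # True if the edge is a cell border of the table


def is_target_corner(corner, grid_lines):
    """A corner is a target corner if at least two target edges touch it."""
    touching = sum(
        1
        for line in grid_lines
        if line.is_target_edge and corner in (line.start, line.end)
    )
    return touching >= 2


# Hypothetical example: corner a joins two cell borders, corner b only one.
a, b, c = Corner(0, 0), Corner(0, 1), Corner(1, 0)
lines = [
    GridLine(a, b, True),
    GridLine(a, c, True),
    GridLine(b, Corner(1, 1), False),
]
```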
Fig. 3 schematically illustrates a schematic diagram of a grid representation of a table structure according to an embodiment of the disclosure.
As shown in FIG. 3, in embodiment 300, the table structure of table 310 may be represented by a grid 320 of M rows and N columns. The grids of row 1, columns 1 to 3 may represent cell T1 of table 310. The grid of row 2, column 1 may represent cell T2 of table 310. The grid of row 3, column 1 may represent cell T3 of table 310. The grid of row 4, column 1 may represent cell T4 of table 310. The grid of row 5, column 1 may represent cell T5 of table 310. The grid of row 2, column 2 may represent cell T6 of table 310. The grid of row 2, column 3 may represent cell T7 of table 310. The grids of column 2, rows 3 to 5 and of column 3, rows 3 to 5 may represent the merged cell T8 of table 310.
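The grid representation of FIG. 3 can be illustrated with a short sketch that labels each grid square with the cell it belongs to and then marks as target edges the edges lying on cell borders. It assumes M = 5 and N = 3, which matches the cells listed above but is not stated explicitly in the description; the function name `target_edges` and the edge indexing are likewise illustrative assumptions.

```python
# Cell label of each grid square, following the FIG. 3 description
# (assumed grid size: 5 rows x 3 columns).
labels = [
    ["T1", "T1", "T1"],
    ["T2", "T6", "T7"],
    ["T3", "T8", "T8"],
    ["T4", "T8", "T8"],
    ["T5", "T8", "T8"],
]


def target_edges(labels):
    """Return (horizontal, vertical) sets of edges that are cell borders.

    Horizontal edge (i, j) separates rows i-1 and i in column j.
    Vertical edge (i, j) separates columns j-1 and j in row i.
    An edge is a target edge when the squares on its two sides carry
    different labels (the outside of the table counts as None).
    """
    m, n = len(labels), len(labels[0])
    horizontal, vertical = set(), set()
    for i in range(m + 1):
        for j in range(n):
            above = labels[i - 1][j] if i > 0 else None
            below = labels[i][j] if i < m else None
            if above != below:
                horizontal.add((i, j))
    for i in range(m):
        for j in range(n + 1):
            left = labels[i][j - 1] if j > 0 else None
            right = labels[i][j] if j < n else None
            if left != right:
                vertical.add((i, j))
    return horizontal, vertical


horizontal, vertical = target_edges(labels)
```

Edges inside the merged cells T1 and T8 are not target edges, which is how the same grid can describe both ordinary and merged cells.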
Therefore, the table structure identification method provided by the embodiment of the disclosure can be suitable for identification scenes of relatively complex table structures such as wired tables, wireless tables, mixed structures of wired and wireless tables and the like.
According to an embodiment of the present disclosure, the relative positional relationship may represent corner point positional information of the plurality of grid lines, and the structural information may represent side attribute information of the plurality of grid lines and connection relationship information of corner points and sides. Based on the relative positional relationship and the structural information, a table structure can be obtained.
According to an embodiment of the present disclosure, grid line features are obtained from the predetermined grid line information and the image features based on an attention mechanism. The grid line features are detected to obtain the relative positional relationship among the plurality of grid lines and the structural information of the plurality of grid lines, and the table structure is obtained based on the relative positional relationship and the structural information. Because grid lines can accurately represent the structural features of a table structure of any form, the table structure can be obtained accurately by detecting the grid line features, which improves both the accuracy and the range of application of table structure identification.
According to an embodiment of the present disclosure, the above operation S220 may include the following operations:
Based on the attention mechanism, the row features of the table image to be identified are obtained from the grid lines of each row in the grid line information and the image features. Based on the attention mechanism, the column features of the table image to be identified are obtained from the grid lines of each column in the grid line information and the image features. The grid line features are obtained from the row features and the column features.
For example, the row features of the 1st row of grid lines can be obtained, based on the attention mechanism, from the 1st row of grid lines in the grid line information and the image features of the 1st row in the table image. The column features of the 1st column of grid lines can be obtained from the 1st column of grid lines in the grid line information and the image features of the 1st column in the table image.
According to embodiments of the present disclosure, because the grid lines of each row and each column are processed in order based on the attention mechanism, a row feature sequence and a column feature sequence may be obtained, which improves processing efficiency when the detection results are processed later.
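The following sketch illustrates one way row and column features could be produced with attention: each row or column grid line supplies a query vector, and the image features supply the keys and values. It is a single-head scaled dot-product cross-attention without learned projections, which is a simplifying assumption of this sketch rather than the model described in the disclosure; all vectors below are made-up toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: one output vector per query."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [
                sum(w * v[c] for w, v in zip(weights, values))
                for c in range(len(values[0]))
            ]
        )
    return outputs


# Toy image features (flattened positions x channels) and grid line queries.
image_features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
row_queries = [[1.0, 0.0], [0.0, 1.0]]   # one query per row of grid lines
col_queries = [[0.5, 0.5]]               # one query per column of grid lines

row_features = cross_attention(row_queries, image_features, image_features)
col_features = cross_attention(col_queries, image_features, image_features)

# Grid line features combine the ordered row and column feature sequences.
grid_line_features = {"rows": row_features, "cols": col_features}
```

Because the queries are processed in row order and column order, the outputs are already ordered sequences, which is the property the paragraph above relies on.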
According to an embodiment of the present disclosure, the above operation S230 may include the following operations:
The grid line features are detected to obtain classification results of the plurality of grid lines. Target grid lines are obtained from the plurality of grid lines based on the classification results, wherein a target grid line comprises at least one target corner point, a target corner point is a corner point connected to at least two target edges, and a target edge is an edge corresponding to a cell border in the table structure. The grid line features of the target grid lines are then detected to obtain the relative positional relationship among the target grid lines and the structural information of the target grid lines.
According to an embodiment of the present disclosure, the gridline features may include row features and column features. The row features characterize the structural features of the plurality of grid lines located in the same row and the column features characterize the structural features of the plurality of grid lines located in the same column.
According to an embodiment of the present disclosure, detecting the grid line features to obtain classification results of the plurality of grid lines may include the following operations:
The row features are detected to obtain a row classification result. The column features are detected to obtain a column classification result. The classification results of the plurality of grid lines are obtained from the row classification result and the column classification result.
Fig. 4 schematically illustrates a schematic diagram of detecting row and column features according to an embodiment of the disclosure.
As shown in fig. 4, in embodiment 400, a row classification result 432 is obtained by detecting the row features of each row in grid 431, and a column classification result 433 is obtained by detecting the column features of each column. For example, in the row classification result, detection proceeds row by row through the grid 431: the 1st row of grid lines has target corner points, the 2nd row of grid lines has target corner points, and so on up to the m-th row of grid lines. No target corner point exists in the grid lines from the (m+1)-th row to the M-th row.
Similarly, in the column classification result, detection proceeds column by column through the grid 431: the 1st column of grid lines has target corner points, the 2nd column of grid lines has target corner points, and so on up to the n-th column of grid lines. No target corner point exists in the grid lines from the (n+1)-th column to the N-th column.
According to the embodiment of the disclosure, the row classification result and the column classification result are obtained by respectively detecting the row features and the column features among the grid line features. The grid line features can thus be detected in order to obtain ordered classification results, which makes it convenient to screen the grid lines and reduces the amount of data processed during table structure identification.
According to an embodiment of the present disclosure, detecting a row feature to obtain a row classification result may include the following operations:
The row feature is detected to obtain a first probability that a target corner point exists in the plurality of grid lines of the same row. A row classification result is obtained based on the first probability.
According to an embodiment of the present disclosure, a threshold may be set for the first probability. In a case where the first probability that a target corner point exists in the plurality of grid lines of the same row is greater than or equal to the threshold, the classification result of the row may be 1, indicating that the grid lines of the row contain target edges corresponding to cell borders in the table structure to be identified and target corner points connected to at least two target edges.
According to an embodiment of the present disclosure, in a case where the first probability that a target corner point exists in the plurality of grid lines of the same row is less than the threshold, the classification result of the row may be 0, indicating that the grid lines of the row contain neither target edges corresponding to cell borders in the table structure to be identified nor target corner points connected to at least two target edges.
For example, the threshold of the first probability may be 0.7. For row 3 of the grid 431 shown in fig. 4, when the plurality of grid lines of row 3 are detected, the first probability obtained may be 0.9, and the classification result of the row may be 1.
Thus, row classification result 432 in FIG. 4 may be represented as [1,1,1,1,1,1,0,0,0,0,0].
According to an embodiment of the present disclosure, detecting a column feature to obtain a column classification result may include the following operations:
The column feature is detected to obtain a second probability that a target corner point exists in the plurality of grid lines of the same column. A column classification result is obtained based on the second probability.
According to an embodiment of the present disclosure, a threshold may be set for the second probability. In a case where the second probability that a target corner point exists in the plurality of grid lines of the same column is greater than or equal to the threshold, the classification result of the column may be 1, indicating that the grid lines of the column contain target edges corresponding to cell borders in the table structure to be identified and target corner points connected to at least two target edges.
According to an embodiment of the present disclosure, in a case where the second probability that a target corner point exists in the plurality of grid lines of the same column is less than the threshold, the classification result of the column may be 0, indicating that the grid lines of the column contain neither target edges corresponding to cell borders in the table structure to be identified nor target corner points connected to at least two target edges.
For example, the threshold of the second probability may be 0.7. For column 2 of the grid 431 shown in fig. 4, when the plurality of grid lines of column 2 are detected, the second probability obtained may be 0.8, and the classification result of the column may be 1.
Thus, column classification result 433 in FIG. 4 may be denoted as [1,1,1,1,0,0,0,0,0,0,0].
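The thresholding of the first and second probabilities described above can be sketched as follows. The helper name `classify_lines` and the probability values are hypothetical. Only the 0.7 threshold, the 0.9 value for row 3, the 0.8 value for column 2, and the two resulting vectors come from the example in the text.

```python
def classify_lines(probabilities, threshold=0.7):
    """Return 1 where the probability is >= threshold, otherwise 0."""
    return [1 if p >= threshold else 0 for p in probabilities]

# Hypothetical first probabilities for the 11 rows of grid 431 (row 3 is 0.9).
row_probs = [0.95, 0.92, 0.90, 0.88, 0.81, 0.70, 0.40, 0.20, 0.10, 0.05, 0.01]
# Hypothetical second probabilities for the 11 columns (column 2 is 0.8).
col_probs = [0.93, 0.80, 0.75, 0.71, 0.30, 0.20, 0.10, 0.08, 0.05, 0.02, 0.01]

row_result = classify_lines(row_probs)
col_result = classify_lines(col_probs)
```

Note that a probability exactly equal to the threshold, such as the 0.70 in row 6, is classified as 1, matching the "greater than or equal to" condition.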
According to an embodiment of the present disclosure, obtaining a target grid line from a plurality of grid lines based on classification results of the plurality of grid lines may include the operations of:
Based on the row classification result, a first grid line is obtained from a plurality of grid lines of the same row. Based on the column classification result, a second grid line is obtained from the plurality of grid lines of the same column. And obtaining the target grid line according to the first grid line and the second grid line.
For example: row classification result 432 may be [1,1,1,1,1,1,0,0,0,0,0] and column classification result 433 may be [1,1,1,1,0,0,0,0,0,0,0]. Thus, the resulting target grid lines may be the 1 st to 6 th row grid lines and the 1 st to 4 th column grid lines.
According to an embodiment of the present disclosure, the target grid lines are obtained from the plurality of grid lines based on the row classification result and the column classification result, which reduces the amount of data to be processed when the structure of the grid lines is subsequently identified to obtain the table structure.
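As a minimal sketch of this screening step, the 1-based indices of the rows and columns whose classification result is 1 can be collected as shown here. The function name `select_target_lines` is hypothetical.

```python
def select_target_lines(row_result, col_result):
    """Return the 1-based indices of rows and columns classified as 1."""
    rows = [i + 1 for i, flag in enumerate(row_result) if flag == 1]
    cols = [j + 1 for j, flag in enumerate(col_result) if flag == 1]
    return rows, cols

row_result = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
col_result = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
target_rows, target_cols = select_target_lines(row_result, col_result)
```

With the classification results from the example, only 6 of 11 rows and 4 of 11 columns of grid lines are passed on to the later detection steps.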
According to an embodiment of the present disclosure, detecting grid line characteristics of target grid lines to obtain a relative positional relationship between the target grid lines and structural information of the target grid lines may include the following operations:
The relative positional relationship features of the target grid lines are detected to obtain the relative positional relationship information among the target grid lines. The structural features of the target grid lines are detected to obtain the structural information of the target grid lines.
According to an embodiment of the present disclosure, the relative positional relationship information of the target grid lines may include position information of the corner points on the grid lines, for example, coordinate information of the corner points.
According to an embodiment of the present disclosure, the structural information of the target grid lines may include connection relationship information of the corner points and edges constituting the grid lines, and attribute information of the edges. The connection relationship information of the corner points and the edges can represent how many edges are connected to each corner point. For example, the number of edges connected to a corner point may be 2, 3, or more.
The attribute information of an edge may characterize whether the edge belongs to a border of a cell in the table structure. For example: in the case where the side of the grid line belongs to part or all of the sides of the frame of the cell, the attribute information of the side of the grid line may be represented as 1. In the case where the side of the grid line does not belong to part or all of the sides of the frame of the cell, the attribute information of the side of the grid line may be represented as 0.
According to an embodiment of the present disclosure, detecting structural features of a target grid line to obtain structural information of the target grid line may include the following operations:
The structural features of the target grid lines are detected to obtain the edge attribute information of the target grid lines and the connection relationship information of the edges and the corner points. The structural information of the target grid lines is obtained according to the edge attribute information of the target grid lines and the connection relationship information of the edges and the corner points.
According to an embodiment of the present disclosure, obtaining structural information of a target grid line according to side attribute information and connection relationship information of sides and corner points of the target grid line may include the following operations:
The target edges are determined according to the edge attribute information. The target corner points are obtained according to the target edges and the connection relationship information of the edges and the corner points. The structural information of the target grid lines is obtained according to the target corner points, the target edges, and the connection relationship information of the edges and the corner points.
FIG. 5 schematically illustrates a schematic diagram of a structure for obtaining target grid lines according to an embodiment of the present disclosure.
As shown in fig. 5, in embodiment 500, for corner A of the target grid lines, the grid lines connected to corner A are grid line La, grid line Lb, and grid line Lc. From the edge attribute information, it can be determined that the grid lines La, Lb, and Lc are target edges. Corner A is therefore connected to 3 target edges and can be determined to be a target corner point.
According to an embodiment of the present disclosure, for another corner B of the grid line Lb, among the 4 grid lines connected to corner B, 3 grid lines are target edges (grid line Lb, grid line Le, and grid line Lf) and 1 grid line is a non-target edge (grid line Ld). Corner B is therefore connected to 3 target edges and can be determined to be a target corner point. The structural information of the grid line Lb may be that the grid line Lb is composed of two target corner points and one target edge.
According to the embodiment of the disclosure, by determining the edge attribute and determining the corner point in combination with the connection information of the edge and the corner point, detection of grid lines corresponding to the table structure is realized, a relatively accurate grid structure recognition result can be obtained, and the method is not limited by text contents in grids.
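The corner determination in the fig. 5 example can be sketched as follows. The dictionary layout and the function name `find_target_corners` are assumptions made for illustration. The edge names, their attributes, and the connections of corners A and B follow the example in the text.

```python
def find_target_corners(edge_is_target, corner_edges, min_target_edges=2):
    """Map each target corner to the number of target edges connected to it.

    A corner counts as a target corner when it is connected to at least
    `min_target_edges` target edges.
    """
    target_corners = {}
    for corner, edges in corner_edges.items():
        count = sum(1 for edge in edges if edge_is_target[edge] == 1)
        if count >= min_target_edges:
            target_corners[corner] = count
    return target_corners

# Edge attribute information: 1 = belongs to a cell border, 0 = does not.
edge_is_target = {"La": 1, "Lb": 1, "Lc": 1, "Ld": 0, "Le": 1, "Lf": 1}
# Connection relationship information between corners and edges.
corner_edges = {"A": ["La", "Lb", "Lc"], "B": ["Lb", "Ld", "Le", "Lf"]}

target_corners = find_target_corners(edge_is_target, corner_edges)
```

Both corners are connected to 3 target edges, so grid line Lb is bounded by two target corner points, as stated above.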
According to an embodiment of the present disclosure, detecting the relative positional relationship feature of the target grid lines to obtain the relative positional relationship information between the target grid lines may include the following operations:
The relative positional relationship features of the target grid lines are detected to obtain the position information of the corner points.
According to embodiments of the present disclosure, the position information of a corner point may be represented as coordinate information of the corner point. In fig. 5, the corner points constituting the grid line Lb may include corner point A and corner point B. The corner points constituting the grid line La may include corner point A and corner point C. The coordinates of corner A may be (x_a, y_a), the coordinates of corner B may be (x_b, y_b), and the coordinates of corner C may be (x_c, y_c). From the coordinates of the corner points of the grid lines, the relative positional relationship of the grid line La and the grid line Lb can be determined.
For example, the coordinates of corner A may be (1, 0), the coordinates of corner B may be (2, 0), and the coordinates of corner C may be (1, 1). It can then be determined that the grid lines La and Lb are mutually perpendicular grid lines sharing the common corner A.
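One way to check this relationship, sketched under the assumption that each grid line is stored as a pair of corner names, is to compare the endpoints for a shared corner and test whether the dot product of the two direction vectors is zero. The function name `line_relation` is hypothetical.

```python
def line_relation(line1, line2, coords):
    """Return (shared corners, perpendicular flag) for two grid lines.

    Each line is a pair of corner names; coords maps a name to (x, y).
    """
    shared = set(line1) & set(line2)

    def direction(line):
        (x1, y1), (x2, y2) = coords[line[0]], coords[line[1]]
        return (x2 - x1, y2 - y1)

    d1, d2 = direction(line1), direction(line2)
    # Two direction vectors are perpendicular when their dot product is zero.
    perpendicular = d1[0] * d2[0] + d1[1] * d2[1] == 0
    return shared, perpendicular

coords = {"A": (1, 0), "B": (2, 0), "C": (1, 1)}
La = ("A", "C")
Lb = ("A", "B")
shared, perpendicular = line_relation(La, Lb, coords)
```

Here La has direction (0, 1) and Lb has direction (1, 0), so the dot product is 0 and the lines are perpendicular with common corner A.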
As shown in fig. 5, in embodiment 500, by detecting the grid characteristics of target grid lines 541, edge attribute detection results 542, edge-corner connection relationship detection results 543, and corner position detection results 544 can be obtained. A target edge 545 may be determined from the edge attribute detection result 542. According to the edge attribute detection result 542 and the edge-to-corner connection relationship detection result 543, the target corner 547 and the target corner-to-target edge connection relationship 546 can be obtained. The position information 548 of the target corner can be obtained from the corner position detection result 544 according to the target corner 547. Finally, a table structure is obtained according to the target edge 545, the connection relation 546 between the target corner points and the target edge, and the position information 548 of the target corner points.
According to the embodiment of the disclosure, the relative position relationship of the grid lines is determined through the position information of the corner points, so that a table structure is obtained by combining the connection relationship of the corner points and the edges and the attributes of the edges.
According to an embodiment of the present disclosure, obtaining a table structure based on the relative positional relationship and the structural information may include the following operations:
The grid lines are traversed, and cell structure information and cell position information for representing the table structure are obtained according to the relative positional relationship and the structural information based on a breadth-first algorithm. The table structure is obtained according to the cell structure information and the cell position information.
According to the embodiment of the disclosure, by traversing the grid lines, the cell structure information can be obtained according to the target edges and the connection relationship between the target corner points and the target edges. The cell position information is obtained according to the position information of the target corner points.
For example: for an independent cell, the grid lines corresponding to the independent cell may be 4 grid lines with the same structure, that is, each grid line is composed of two target corner points and one target edge.
For example: for a merged cell formed by merging 2 independent cells, the grid lines corresponding to the merged cell may include 7 grid lines, including 6 grid lines having the same structure as the independent cells, and 1 grid line formed by two target corner points and one non-target side.
According to the embodiment of the disclosure, the structure information and the position information of each cell are obtained based on the breadth-first algorithm. According to the position information of the corner points of the cells, the cells can be combined to obtain the table structure, so that a table image with an arbitrarily complex structure can be identified accurately and quickly.
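The disclosure names a breadth-first algorithm but does not give its details. The following sketch shows one possible reading, assuming the target grid lines partition the table into unit cells and that two neighbouring unit cells belong to the same (merged) cell whenever the edge between them is a non-target edge. The function name `extract_cells` and the `h_target`/`v_target` layout are hypothetical.

```python
from collections import deque

def extract_cells(n_rows, n_cols, h_target, v_target):
    """Group unit cells into table cells with a breadth-first search.

    h_target[r][c] is 1 if the edge between unit cells (r, c) and (r + 1, c)
    is a target edge; v_target[r][c] is 1 if the edge between (r, c) and
    (r, c + 1) is a target edge. Returns one bounding box
    (row_start, col_start, row_end, col_end) per cell.
    """
    seen = [[False] * n_cols for _ in range(n_rows)]
    cells = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            members = []
            while queue:
                r, c = queue.popleft()
                members.append((r, c))
                neighbours = []
                # Cross an edge only if it is a non-target edge.
                if r + 1 < n_rows and not h_target[r][c]:
                    neighbours.append((r + 1, c))
                if r > 0 and not h_target[r - 1][c]:
                    neighbours.append((r - 1, c))
                if c + 1 < n_cols and not v_target[r][c]:
                    neighbours.append((r, c + 1))
                if c > 0 and not v_target[r][c - 1]:
                    neighbours.append((r, c - 1))
                for nr, nc in neighbours:
                    if not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            rows = [m[0] for m in members]
            cols = [m[1] for m in members]
            cells.append((min(rows), min(cols), max(rows), max(cols)))
    return cells

# 2 x 2 grid: the top two unit cells are merged, the bottom two are independent.
h_target = [[1, 1]]      # edges between row 0 and row 1
v_target = [[0], [1]]    # edge between column 0 and column 1, per row
cells = extract_cells(2, 2, h_target, v_target)
```

In this example the single non-target edge in the top row corresponds to the one grid line with a non-target side in the merged cell described above.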
According to an embodiment of the present disclosure, the above table structure identification method further includes: and generating a target table corresponding to the table image to be identified according to the table structure.
For example: the table generation tool may be utilized to generate an editable target table based on the recognition result of the table structure.
Since the structure and the relative position relationship of the grid lines are used as the result of table identification in the embodiment of the present disclosure, any tool for reconstructing a table based on the position information of the corner points, the connection relationship between the edges and the corner points, and the edges is suitable for the embodiment of the present disclosure, and the embodiment of the present disclosure does not specifically limit the table generating tool.
According to the embodiment of the disclosure, the table structure of the table image is identified, and the table is reconstructed while the intelligent identification of the table structure is realized, so that the editable target table is obtained, the method is applicable to reconstructing an empty table with a complicated linear structure, and the application range of the method for reconstructing the table on the basis of the identification of the table structure is enlarged.
Fig. 6 schematically illustrates a training method of a table structure recognition model according to an embodiment of the disclosure.
As shown in fig. 6, in embodiment 600, the training method includes performing the following operations S610-S650 using the initial table structure recognition model.
In operation S610, image features of a sample table image are extracted.
In operation S620, based on the attention mechanism, sample grid line features including a relative positional relationship feature between a plurality of grid lines corresponding to a table structure of the sample table image and a structural feature of the plurality of grid lines are obtained from the grid line information and the image features of the sample table image.
In operation S630, the characteristics of the sample grid lines are detected, and the relative positional relationship among the plurality of sample grid lines and the structural information of the plurality of sample grid lines are obtained.
In operation S640, a table structure recognition result of the sample table image is obtained based on the relative positional relationship and the structural information.
In operation S650, model parameters of the initial table structure recognition model are adjusted based on the table structure recognition result and the table structure label of the sample table image, resulting in a trained table structure recognition model.
According to the embodiment of the present disclosure, the definition ranges of the characteristics of the sample grid lines, the relative positional relationships among the plurality of sample grid lines, and the structural information of the plurality of sample grid lines are the same as the definition ranges of the characteristics of the grid lines, the relative positional relationships among the plurality of grid lines, and the structural information of the plurality of grid lines in the method for identifying a table structure in the embodiment of the present disclosure, and are not described herein.
According to the embodiment of the disclosure, in training the table structure recognition model, the table structure recognition model may be caused to perform 3 detection tasks: row/column category detection, edge attribute detection, and corner position detection. The table structure labels may include labels corresponding to the 3 detection tasks, namely row/column category labels, edge attribute labels, and corner position labels.
For example, a row in which a target corner point exists among the plurality of grid lines of the same row may be regarded as a positive sample row, and a row in which no target corner point exists among the plurality of grid lines of the same row may be regarded as a negative sample row. A column in which a target corner point exists among the plurality of grid lines of the same column may be regarded as a positive sample column, and a column in which no target corner point exists among the plurality of grid lines of the same column may be regarded as a negative sample column.
For example: edges that belong to cell borders in the table structure may be positive sample edges, and edges that do not belong to cell borders in the table structure may be negative sample edges.
For example: the corner connected to at least two positive sample sides may be referred to as a positive sample corner and the corner connected to less than two positive sample sides may be referred to as a negative sample corner.
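As a sketch of how the row/column category labels could be derived, assume the positive and negative sample corners are stored as a 0/1 matrix indexed by row and column of the grid. This layout and the function name `row_col_labels` are hypothetical and not specified in the disclosure.

```python
def row_col_labels(corner_is_positive):
    """Label a row or column 1 if it contains at least one positive sample corner."""
    row_labels = [1 if any(row) else 0 for row in corner_is_positive]
    col_labels = [1 if any(col) else 0 for col in zip(*corner_is_positive)]
    return row_labels, col_labels

# Hypothetical 3 x 4 grid of corners: 1 = positive sample corner.
corner_is_positive = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
]
row_labels, col_labels = row_col_labels(corner_is_positive)
```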
According to the embodiment of the disclosure, the grid line characteristics of the table structure in the table image can be reflected based on the attention mechanism, and the table structure is obtained through detecting the grid line characteristics. Compared with the training method which relies on image segmentation and then combination in the related art, the method can realize end-to-end training and has no sub-optimization problem.
According to an embodiment of the present disclosure, adjusting model parameters of an initial table structure recognition model based on a table structure recognition result and a table structure label of a sample table image to obtain a trained table structure recognition model may include the operations of:
A loss value is obtained, based on the target loss function, according to the table structure recognition result and the table structure label of the sample table image. Based on the loss value, the initial table structure recognition model is iteratively trained by using a back propagation algorithm, and its parameters are adjusted until the loss value converges, to obtain the trained table structure recognition model.
According to an embodiment of the present disclosure, the target loss function may be a cross entropy loss function. And adjusting parameters of each module in the table structure identification model based on the loss value of the cross entropy until the loss value converges to obtain a trained table structure identification model.
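For the binary detection tasks (for example row/column category detection and edge attribute detection), a cross entropy loss can be sketched in plain Python as shown here. This is an illustrative assumption: the disclosure does not state how the loss is computed per task or how the task losses are combined, and the clamping constant `eps` is added only for numerical safety.

```python
import math

def binary_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)

# Hypothetical row classification probabilities and row category labels.
labels = [1, 1, 0, 0]
loss_good = binary_cross_entropy([0.9, 0.8, 0.2, 0.1], labels)
loss_bad = binary_cross_entropy([0.6, 0.5, 0.5, 0.4], labels)
```

Predictions closer to the labels give a smaller loss, which is the signal that back propagation would use to adjust the model parameters.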
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 7 schematically illustrates a block diagram of a table structure identifying apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the apparatus 700 may include a first feature extraction module 710, a first attention module 720, a first structure detection module 730, and a first structure identification module 740.
The first feature extraction module 710 is configured to extract image features of the table image to be identified. In some embodiments, the first feature extraction module 710 may be used to implement operation S210 described previously.
The first attention module 720 is configured to obtain, based on an attention mechanism, grid line features according to predetermined grid line information and image features, where the grid line features include a relative positional relationship feature between a plurality of grid lines corresponding to a table structure of a table image to be identified and a structural feature of the plurality of grid lines. In some embodiments, the first attention module 720 may be used to implement operation S220 described previously.
The first structure detection module 730 is configured to detect the grid line characteristics, and obtain the relative positional relationship among the grid lines and the structural information of the grid lines. In some embodiments, the first structure detection module 730 may be configured to implement operation S230 described above.
The first structure identification module 740 is configured to obtain a table structure based on the relative positional relationship and the structural information. In some embodiments, the first structure identification module 740 may be used to implement operation S240 described previously.
According to an embodiment of the present disclosure, the first structure detection module may include: the device comprises a first detection sub-module, a first acquisition sub-module and a second detection sub-module. The first detection submodule is used for detecting the grid line characteristics to obtain classification results of a plurality of grid lines. The first obtaining submodule is used for obtaining target grid lines from the grid lines based on classification results of the grid lines, the target grid lines comprise at least one target corner point, the target corner points represent corner points connected with at least two target edges, and the target edges represent edges corresponding to cell boundaries in a table structure. And the second detection submodule is used for detecting grid line characteristics of the target grid lines to obtain the relative position relation among the target grid lines and the structural information of the target grid lines.
According to an embodiment of the present disclosure, the grid line features include row features that characterize structural features of a plurality of grid lines located in the same row and column features that characterize structural features of a plurality of grid lines located in the same column. The first detection sub-module may include: a first detection unit, a second detection unit and a first obtaining unit. The first detection unit is used for detecting the row characteristics to obtain a row classification result. And the second detection unit is used for detecting the column characteristics to obtain column classification results. And the first obtaining unit is used for obtaining the classification results of the grid lines according to the row classification results and the column classification results.
According to an embodiment of the present disclosure, the first detection unit may include: a first detection subunit and a first obtaining subunit. The first detection subunit is configured to detect the row feature to obtain a first probability that a target corner point exists in the plurality of grid lines of the same row. The first obtaining subunit is configured to obtain a row classification result based on the first probability.
According to an embodiment of the present disclosure, the second detection unit may include: a second detection subunit and a second acquisition subunit. And the second detection subunit is used for detecting the column characteristics to obtain a second probability of existence of the target corner points in the grid lines of the same column. And a second obtaining subunit, configured to obtain a column classification result based on the second probability.
According to an embodiment of the present disclosure, the first obtaining sub-module may include: a second obtaining unit, a third obtaining unit, and a fourth obtaining unit. The second obtaining unit is used for obtaining the first grid line from the grid lines of the same row based on the row classification result. And a third obtaining unit configured to obtain a second grid line from the plurality of grid lines in the same column based on the column classification result. And a fourth obtaining unit, configured to obtain the target grid line according to the first grid line and the second grid line.
According to an embodiment of the present disclosure, the second detection sub-module may include a third detection unit and a fourth detection unit. The third detection unit is used for detecting the relative position relation characteristics of the target grid lines and obtaining the relative position relation information among the target grid lines. And the fourth detection unit is used for detecting the structural characteristics of the target grid lines to obtain the structural information of the target grid lines.
According to an embodiment of the present disclosure, the structural features include edge attribute features and connection relationship features of edges and corner points, and the fourth detection unit may include: a third detection subunit and a third acquisition subunit. And the third detection subunit is used for detecting the structural characteristics of the target grid line to obtain the side attribute information and the connection relation information of the side and the corner point of the target grid line. And the third obtaining subunit is used for obtaining the structural information of the target grid line according to the side attribute information and the connection relation information of the side and the corner point of the target grid line.
According to an embodiment of the present disclosure, the third obtaining subunit is configured to: and determining the target edge according to the edge attribute information. And obtaining the target corner point according to the connection relation information of the edge and the corner point and the target edge. And obtaining the structural information of the target grid lines according to the target corner points, the target edges and the connection relation information of the edges and the corner points.
According to an embodiment of the present disclosure, the relative position relationship feature of the target grid line includes a position feature of the corner point, and the third detection unit includes a fourth detection subunit, configured to detect the relative position relationship feature of the target grid line, to obtain position information of the corner point.
According to an embodiment of the present disclosure, the first structure identification module may include: the second obtaining sub-module and the third obtaining sub-module. The second obtaining sub-module is used for traversing the grid lines, and obtaining cell structure information and cell position information used for representing the table structure according to the relative position relation and the structure information based on a breadth-first algorithm. And the third obtaining sub-module is used for obtaining a table structure according to the cell structure information and the cell position information.
According to an embodiment of the present disclosure, the above table structure identifying device further includes a generating module, configured to generate, according to the table structure, a target table corresponding to the table image to be identified.
According to an embodiment of the present disclosure, the first attention module may include: a first attention sub-module, a second attention sub-module, and a fourth obtaining sub-module. The first attention sub-module is configured to obtain, based on an attention mechanism, the row features of the table image to be identified according to the grid lines of each row in the grid line information and the image features. The second attention sub-module is configured to obtain, based on an attention mechanism, the column features of the table image to be identified according to the grid lines of each column in the grid line information and the image features. The fourth obtaining sub-module is configured to obtain the grid line features according to the row features and the column features.
Fig. 8 schematically illustrates a training apparatus of a table structure recognition model according to an embodiment of the present disclosure.
As shown in fig. 8, the training device 800 may include: a second feature extraction module 810, a second attention module 820, a second structure detection module 830, a second structure identification module 840, and a training module 850.
A second feature extraction module 810 is configured to extract image features of the sample table image.
The second attention module 820 is configured to obtain, based on the attention mechanism, a sample grid line feature according to the grid line information and the image feature of the sample table image, where the sample grid line feature includes a relative positional relationship feature between a plurality of grid lines corresponding to a table structure of the sample table image and a structural feature of the plurality of grid lines.
The second structure detection module 830 is configured to detect the sample grid line features to obtain a relative positional relationship between the plurality of sample grid lines and structural information of the plurality of sample grid lines.
The second structure identification module 840 is configured to obtain a table structure identification result of the sample table image based on the relative positional relationship and the structural information.
The training module 850 is configured to adjust model parameters of the initial table structure identification model based on the table structure identification result and the table structure label of the sample table image, to obtain a trained table structure identification model.
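For illustration only, the following Python sketch shows one possible way a structure identification step could turn corner point positions and edge attributes into cells by breadth-first traversal. The disclosure states only that a breadth-first algorithm is used; the data layout (`ys`, `xs`, `h_border`, `v_border`), the function name `extract_cells`, and the flood-fill merging of grid units across non-border edges are assumptions made for this example.

```python
from collections import deque


def extract_cells(ys, xs, h_border, v_border):
    """Group grid units into table cells by breadth-first traversal.

    ys, xs    : positions of the horizontal / vertical grid lines.
    h_border  : h_border[r][c] is True if the edge on horizontal line r,
                spanning column c, is a cell border.
    v_border  : v_border[r][c] is True if the edge on vertical line c,
                spanning row r, is a cell border.
    Returns one dict per cell with its start row/column, spans, and box.
    """
    n_rows, n_cols = len(ys) - 1, len(xs) - 1
    seen = [[False] * n_cols for _ in range(n_rows)]
    cells = []
    for r0 in range(n_rows):
        for c0 in range(n_cols):
            if seen[r0][c0]:
                continue
            seen[r0][c0] = True
            queue = deque([(r0, c0)])
            members = []
            while queue:
                r, c = queue.popleft()
                members.append((r, c))
                neighbours = []  # (row, col, blocked_by_border)
                if r > 0:
                    neighbours.append((r - 1, c, h_border[r][c]))
                if r < n_rows - 1:
                    neighbours.append((r + 1, c, h_border[r + 1][c]))
                if c > 0:
                    neighbours.append((r, c - 1, v_border[r][c]))
                if c < n_cols - 1:
                    neighbours.append((r, c + 1, v_border[r][c + 1]))
                for nr, nc, blocked in neighbours:
                    if not blocked and not seen[nr][nc]:
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            top = min(r for r, _ in members)
            bottom = max(r for r, _ in members)
            left = min(c for _, c in members)
            right = max(c for _, c in members)
            cells.append({
                "row": top,
                "col": left,
                "row_span": bottom - top + 1,
                "col_span": right - left + 1,
                "box": (xs[left], ys[top], xs[right + 1], ys[bottom + 1]),
            })
    return cells
```

Under these assumptions, the spans play the role of cell structure information and the bounding box plays the role of cell position information; a merged cell appears as a single entry whose span exceeds one.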
According to an embodiment of the present disclosure, the training module may include a loss calculation sub-module and an adjustment sub-module. The loss calculation sub-module is configured to obtain a loss value based on a target loss function according to the table structure identification result and the table structure label of the sample table image. The adjustment sub-module is configured to iteratively train the initial table structure identification model using a back propagation algorithm based on the loss value, adjusting the model parameters until the loss value converges, to obtain a trained table structure identification model.
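For illustration only, the following Python sketch shows the general pattern of computing a loss value and iterating gradient updates until the loss converges. A single-layer logistic classifier stands in for the table structure identification model, and binary cross-entropy stands in for the target loss function; neither choice is specified by the disclosure, and the names `bce_loss` and `train`, the learning rate, and the convergence tolerance are assumptions made for this example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Forward pass of the toy single-layer model."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def bce_loss(weights, bias, samples):
    """Mean binary cross-entropy over (features, label) pairs."""
    total = 0.0
    for features, label in samples:
        p = min(max(predict(weights, bias, features), 1e-12), 1.0 - 1e-12)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    return total / len(samples)


def train(samples, lr=0.5, tol=1e-6, max_steps=10000):
    """Iterate gradient updates until the loss change falls below tol."""
    n = len(samples)
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    prev_loss = bce_loss(weights, bias, samples)
    loss = prev_loss
    for _ in range(max_steps):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in samples:
            # Gradient of the loss with respect to the pre-activation value.
            err = predict(weights, bias, features) - label
            for i, x in enumerate(features):
                grad_w[i] += err * x / n
            grad_b += err / n
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
        loss = bce_loss(weights, bias, samples)
        if abs(prev_loss - loss) < tol:  # treat as converged
            break
        prev_loss = loss
    return weights, bias, loss
```

In a multi-layer model the gradient step inside the loop would be computed by back propagation through all layers; the stopping rule on the loss value is unchanged.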
According to an embodiment of the present disclosure, an electronic device includes: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method as described above.
According to an embodiment of the present disclosure, a non-transitory computer-readable storage medium stores computer instructions for causing a computer to perform the method as described above.
According to an embodiment of the present disclosure, a computer program product comprises a computer program which, when executed by a processor, implements the method as described above.
Fig. 9 shows a schematic block diagram of an example electronic device 900 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 9, the device 900 includes a computing unit 901 that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) 902 or a computer program loaded from a storage unit 908 into a random access memory (RAM) 903. The RAM 903 can also store various programs and data required for the operation of the device 900. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other by a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
Various components in device 900 are connected to I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, or the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, an optical disk, or the like; and a communication unit 909 such as a network card, modem, wireless communication transceiver, or the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunications networks.
The computing unit 901 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 901 performs the methods and processes described above, for example, the table structure recognition method or the training method of the table structure recognition model. For example, in some embodiments, the table structure recognition method or the training method of the table structure recognition model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the table structure recognition method or the training method of the table structure recognition model described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured to perform the table structure recognition method or the training method of the table structure recognition model in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be realized in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (29)

1. A method of table structure identification, comprising:
extracting image features of a table image to be identified;
obtaining, based on an attention mechanism, row/column features of grid lines according to the grid lines of each row/column in preset grid line information and the image features, wherein the row/column features of the grid lines represent features obtained by performing grid representation on a table structure of the table image to be identified; the row/column features of the grid lines comprise relative positional relationship features among a plurality of grid lines corresponding to the table structure of the table image to be identified and structural features of the plurality of grid lines;
detecting the row/column features of the grid lines to obtain classification results of the plurality of grid lines, wherein the classification results represent whether the grid lines in the same row/column belong to a row/column in which a target corner point exists;
obtaining a target grid line from the plurality of grid lines based on the classification results of the plurality of grid lines, wherein the target grid line comprises at least one target corner point, the target corner point represents a corner point connected with at least two target edges, and the target edge represents an edge corresponding to a cell boundary in the table structure;
detecting the row/column features of the target grid lines to obtain a relative positional relationship among the target grid lines and structural information of the target grid lines; the relative positional relationship includes position information of the corner points on the grid lines; the structural information includes connection relationship information between the corner points and the edges that form the grid lines, and attribute information of the edges, wherein the attribute information of an edge represents whether the edge belongs to a border of a cell in the table structure; and
obtaining the table structure based on the relative positional relationship among the target grid lines and the structural information of the target grid lines.
2. The method of claim 1, wherein the row features characterize structural features of a plurality of grid lines located in a same row, the column features characterize structural features of a plurality of grid lines located in a same column, and the detecting the row/column features of the grid lines to obtain classification results of the plurality of grid lines comprises:
detecting the row features to obtain a row classification result;
detecting the column features to obtain a column classification result; and
obtaining the classification results of the plurality of grid lines according to the row classification result and the column classification result.
3. The method of claim 2, wherein the detecting the row features to obtain a row classification result comprises:
detecting the row features to obtain a first probability that the target corner point exists in the plurality of grid lines of the same row; and
obtaining the row classification result based on the first probability.
4. The method of claim 2, wherein the detecting the column features to obtain a column classification result comprises:
detecting the column features to obtain a second probability that the target corner point exists in the plurality of grid lines of the same column; and
obtaining the column classification result based on the second probability.
5. The method of claim 1, wherein the obtaining a target grid line from the plurality of grid lines based on the classification results of the plurality of grid lines comprises:
obtaining a first grid line from the plurality of grid lines in the same row based on the row classification result;
obtaining a second grid line from the plurality of grid lines in the same column based on the column classification result; and
obtaining the target grid line according to the first grid line and the second grid line.
6. The method of claim 1, wherein the detecting the row/column features of the target grid lines to obtain the relative positional relationship among the target grid lines and the structural information of the target grid lines comprises:
detecting the relative positional relationship features of the target grid lines to obtain relative positional relationship information among the target grid lines; and
detecting the structural features of the target grid lines to obtain the structural information of the target grid lines.
7. The method of claim 6, wherein the structural features include edge attribute features and connection relationship features between edges and corner points, and the detecting the structural features of the target grid lines to obtain the structural information of the target grid lines comprises:
detecting the structural features of the target grid lines to obtain edge attribute information of the target grid lines and connection relationship information between the edges and the corner points; and
obtaining the structural information of the target grid lines according to the edge attribute information of the target grid lines and the connection relationship information between the edges and the corner points.
8. The method of claim 7, wherein the obtaining the structural information of the target grid lines according to the edge attribute information of the target grid lines and the connection relationship information between the edges and the corner points comprises:
determining a target edge according to the edge attribute information;
obtaining a target corner point according to the target edge and the connection relationship information between the edges and the corner points; and
obtaining the structural information of the target grid lines according to the target corner point, the target edge, and the connection relationship information between the edges and the corner points.
9. The method of claim 6, wherein the relative positional relationship features of the target grid lines include position features of corner points, and the detecting the relative positional relationship features of the target grid lines to obtain the relative positional relationship information among the target grid lines comprises:
detecting the relative positional relationship features of the target grid lines to obtain position information of the corner points.
10. The method of claim 1, wherein the obtaining the table structure based on the relative positional relationship and the structural information comprises:
traversing the grid lines and obtaining, based on a breadth-first algorithm, cell structure information and cell position information representing the table structure according to the relative positional relationship and the structural information; and
obtaining the table structure according to the cell structure information and the cell position information.
11. The method of claim 1 or 10, further comprising:
generating a target table corresponding to the table image to be identified according to the table structure.
12. A training method of a table structure recognition model, the training method comprising:
performing the following training operations using an initial table structure recognition model:
extracting image features of a sample table image;
obtaining, based on an attention mechanism, row/column features of sample grid lines according to the grid lines of each row/column in grid line information of the sample table image and the image features, wherein the row/column features of the sample grid lines represent features obtained by performing grid representation on a table structure of the sample table image; the row/column features of the sample grid lines include relative positional relationship features among a plurality of grid lines corresponding to the table structure of the sample table image and structural features of the plurality of grid lines;
detecting the row/column features of the sample grid lines to obtain classification results of the plurality of sample grid lines, wherein the classification results represent whether the grid lines in the same row/column belong to a row/column in which a target sample corner point exists;
obtaining a target sample grid line from the plurality of sample grid lines based on the classification results of the plurality of sample grid lines, wherein the target sample grid line comprises at least one target sample corner point, the target sample corner point represents a corner point connected with at least two target sample edges, and the target sample edge represents an edge corresponding to a cell boundary in the table structure;
detecting the row/column features of the target sample grid lines to obtain a relative positional relationship among the target sample grid lines and structural information of the target sample grid lines; the relative positional relationship includes position information of the corner points on the grid lines; the structural information includes connection relationship information between the corner points and the edges that form the grid lines, and attribute information of the edges; the attribute information of an edge represents whether the edge belongs to a border of a cell in the sample table structure;
obtaining a table structure recognition result of the sample table image based on the relative positional relationship among the target sample grid lines and the structural information of the target sample grid lines; and
adjusting model parameters of the initial table structure recognition model based on the table structure recognition result and a table structure label of the sample table image to obtain a trained table structure recognition model.
13. The method of claim 12, wherein the adjusting model parameters of the initial table structure recognition model based on the table structure recognition result and the table structure label of the sample table image to obtain the trained table structure recognition model comprises:
obtaining a loss value based on a target loss function according to the table structure recognition result and the table structure label of the sample table image; and
iteratively training the initial table structure recognition model using a back propagation algorithm based on the loss value, adjusting the parameters of the initial table structure recognition model until the loss value converges, to obtain the trained table structure recognition model.
14. A table structure identification device, comprising:
a first feature extraction module configured to extract image features of a table image to be identified;
a first attention module configured to obtain, based on an attention mechanism, row/column features of grid lines according to the grid lines of each row/column in preset grid line information and the image features, wherein the row/column features of the grid lines represent features obtained by performing grid representation on a table structure of the table image to be identified; the row/column features of the grid lines comprise relative positional relationship features among a plurality of grid lines corresponding to the table structure of the table image to be identified and structural features of the plurality of grid lines;
a first structure detection module comprising:
a first detection sub-module configured to detect the row/column features of the grid lines to obtain classification results of the plurality of grid lines, wherein the classification results represent whether the grid lines in the same row/column belong to a row/column in which a target corner point exists;
a first obtaining sub-module configured to obtain a target grid line from the plurality of grid lines based on the classification results of the plurality of grid lines, wherein the target grid line comprises at least one target corner point, the target corner point represents a corner point connected with at least two target edges, and the target edge represents an edge corresponding to a cell boundary in the table structure; and
a second detection sub-module configured to detect the row/column features of the target grid lines to obtain a relative positional relationship among the target grid lines and structural information of the target grid lines; the relative positional relationship includes position information of the corner points on the grid lines; the structural information includes connection relationship information between the corner points and the edges that form the grid lines, and attribute information of the edges, wherein the attribute information of an edge represents whether the edge belongs to a border of a cell in the table structure; and
a first structure identification module configured to obtain the table structure based on the relative positional relationship among the target grid lines and the structural information of the target grid lines.
15. The apparatus of claim 14, wherein the grid line features comprise row features that characterize structural features of a plurality of grid lines located in a same row and column features that characterize structural features of a plurality of grid lines located in a same column, and the first detection sub-module comprises:
a first detection unit configured to detect the row features to obtain a row classification result;
a second detection unit configured to detect the column features to obtain a column classification result; and
a first obtaining unit configured to obtain the classification results of the plurality of grid lines according to the row classification result and the column classification result.
16. The apparatus of claim 15, wherein the first detection unit comprises:
a first detection subunit configured to detect the row features to obtain a first probability that the target corner point exists in the plurality of grid lines of the same row; and
a first obtaining subunit configured to obtain the row classification result based on the first probability.
17. The apparatus of claim 15, wherein the second detection unit comprises:
a second detection subunit configured to detect the column features to obtain a second probability that the target corner point exists in the plurality of grid lines of the same column; and
a second obtaining subunit configured to obtain the column classification result based on the second probability.
18. The apparatus of claim 14, wherein the first obtaining sub-module comprises:
a second obtaining unit configured to obtain a first grid line from the plurality of grid lines in the same row based on the row classification result;
a third obtaining unit configured to obtain a second grid line from the plurality of grid lines in the same column based on the column classification result; and
a fourth obtaining unit configured to obtain the target grid line according to the first grid line and the second grid line.
19. The apparatus of claim 14, wherein the second detection sub-module comprises:
a third detection unit configured to detect the relative positional relationship features of the target grid lines to obtain relative positional relationship information among the target grid lines; and
a fourth detection unit configured to detect the structural features of the target grid lines to obtain the structural information of the target grid lines.
20. The apparatus of claim 19, wherein the structural features include edge attribute features and connection relationship features between edges and corner points, and the fourth detection unit comprises:
a third detection subunit configured to detect the structural features of the target grid lines to obtain edge attribute information of the target grid lines and connection relationship information between the edges and the corner points; and
a third obtaining subunit configured to obtain the structural information of the target grid lines according to the edge attribute information of the target grid lines and the connection relationship information between the edges and the corner points.
21. The apparatus of claim 20, wherein the third obtaining subunit is configured to:
determine a target edge according to the edge attribute information;
obtain a target corner point according to the target edge and the connection relationship information between the edges and the corner points; and
obtain the structural information of the target grid lines according to the target corner point, the target edge, and the connection relationship information between the edges and the corner points.
22. The apparatus of claim 19, wherein the relative positional relationship features of the target grid lines comprise position features of corner points, and the third detection unit comprises:
a fourth detection sub-module configured to detect the relative positional relationship features of the target grid lines to obtain position information of the corner points.
23. The apparatus of claim 14, wherein the first structure identification module comprises:
a second obtaining sub-module configured to traverse the grid lines and obtain, based on a breadth-first algorithm, cell structure information and cell position information representing the table structure according to the relative positional relationship and the structural information; and
a third obtaining sub-module configured to obtain the table structure according to the cell structure information and the cell position information.
24. The apparatus of claim 14 or 23, further comprising:
a generating module configured to generate a target table corresponding to the table image to be identified according to the table structure.
25. A training device for a table structure recognition model, comprising:
a second feature extraction module configured to extract image features of a sample table image;
a second attention module configured to obtain, based on an attention mechanism, row/column features of sample grid lines according to the grid lines of each row/column in grid line information of the sample table image and the image features, wherein the row/column features of the sample grid lines represent features obtained by performing grid representation on a table structure of the sample table image; the row/column features of the sample grid lines include relative positional relationship features among a plurality of grid lines corresponding to the table structure of the sample table image and structural features of the plurality of grid lines;
a second structure detection module comprising:
a fourth detection sub-module configured to detect the row/column features of the sample grid lines to obtain classification results of the plurality of sample grid lines, wherein the classification results represent whether the grid lines in the same row/column belong to a row/column in which a target sample corner point exists;
a fifth obtaining sub-module configured to obtain a target sample grid line from the plurality of sample grid lines based on the classification results of the plurality of sample grid lines, wherein the target sample grid line comprises at least one target sample corner point, the target sample corner point represents a corner point connected with at least two target sample edges, and the target sample edge represents an edge corresponding to a cell boundary in the table structure;
a fifth detection sub-module configured to detect the row/column features of the target sample grid lines to obtain a relative positional relationship among the target sample grid lines and structural information of the target sample grid lines; the relative positional relationship includes position information of the corner points on the grid lines; the structural information includes connection relationship information between the corner points and the edges that form the grid lines, and attribute information of the edges; the attribute information of an edge represents whether the edge belongs to a border of a cell in the sample table structure;
a second structure identification module configured to obtain a table structure recognition result of the sample table image based on the relative positional relationship among the target sample grid lines and the structural information of the target sample grid lines; and
a training module configured to adjust model parameters of an initial table structure recognition model based on the table structure recognition result and a table structure label of the sample table image to obtain a trained table structure recognition model.
26. The apparatus of claim 25, wherein the training module comprises:
a loss calculation sub-module configured to obtain a loss value based on a target loss function according to the table structure recognition result and the table structure label of the sample table image; and
an adjustment sub-module configured to iteratively train the initial table structure recognition model using a back propagation algorithm based on the loss value, adjusting the parameters of the initial table structure recognition model until the loss value converges, to obtain the trained table structure recognition model.
27. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor, wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method of any one of claims 1-13.
28. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-13.
29. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-13.
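
The training procedure recited in claim 26 (compute a loss value from the identification result and the label, then adjust parameters iteratively until the loss value converges) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the model is a single-feature logistic classifier that predicts whether an edge belongs to a cell border, the feature `line_strength` and the data in `SAMPLES` are hypothetical, binary cross-entropy stands in for the unspecified target loss function, and "converges" is read as the loss changing by less than `tol` between iterations. The gradient is obtained by the chain rule, which is what back propagation reduces to for a single-layer model.

```python
import math

# Hypothetical training data: (line_strength, is_cell_border) pairs.
# Both the feature and the values are illustrative assumptions.
SAMPLES = [
    (0.9, 1), (0.8, 1), (0.7, 1), (0.4, 1),
    (0.6, 0), (0.3, 0), (0.2, 0), (0.1, 0),
]


def sigmoid(z: float) -> float:
    """Map a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def bce_loss(w: float, b: float, samples) -> float:
    """Mean binary cross-entropy between predictions and labels."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in samples:
        p = sigmoid(w * x + b)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps)
    return total / len(samples)


def train(samples, lr: float = 0.5, tol: float = 1e-6, max_steps: int = 10_000):
    """Gradient descent until the loss changes by less than `tol`.

    Returns the learned parameters and the loss after every step.
    """
    w, b = 0.0, 0.0  # initial model parameters
    history = [bce_loss(w, b, samples)]
    n = len(samples)
    for _ in range(max_steps):
        grad_w = grad_b = 0.0
        for x, y in samples:
            # Chain rule: d(loss)/d(score) = prediction - label.
            err = sigmoid(w * x + b) - y
            grad_w += err * x / n
            grad_b += err / n
        # Parameter adjustment step.
        w -= lr * grad_w
        b -= lr * grad_b
        history.append(bce_loss(w, b, samples))
        # Convergence check (an assumed criterion).
        if abs(history[-2] - history[-1]) < tol:
            break
    return w, b, history


w, b, history = train(SAMPLES)
print(f"steps={len(history) - 1} "
      f"first_loss={history[0]:.4f} last_loss={history[-1]:.4f}")
```

In a real table structure identification model the parameters would belong to a neural network and the gradients would come from an automatic differentiation framework; the loop structure (loss calculation, gradient step, convergence check) is the part this sketch is meant to show.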

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310259267.1A CN116259064B (en) 2023-03-09 2023-03-09 Table structure identification method, training method and training device for table structure identification model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310259267.1A CN116259064B (en) 2023-03-09 2023-03-09 Table structure identification method, training method and training device for table structure identification model

Publications (2)

Publication Number Publication Date
CN116259064A (en) 2023-06-13
CN116259064B (en) 2024-05-17

Family

ID=86682464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310259267.1A Active CN116259064B (en) 2023-03-09 2023-03-09 Table structure identification method, training method and training device for table structure identification model

Country Status (1)

Country Link
CN (1) CN116259064B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116452707B (en) * 2023-06-20 2023-09-12 城云科技(中国)有限公司 Text generation method and device based on table and application of text generation method and device

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163198A (en) * 2018-09-27 2019-08-23 腾讯科技(深圳)有限公司 A kind of Table recognition method for reconstructing, device and storage medium
CN110472208A (en) * 2019-06-26 2019-11-19 上海恒生聚源数据服务有限公司 The method, system of form analysis, storage medium and electronic equipment in PDF document
CN110796031A (en) * 2019-10-11 2020-02-14 腾讯科技(深圳)有限公司 Table identification method and device based on artificial intelligence and electronic equipment
CN112183038A (en) * 2020-09-23 2021-01-05 国信智能系统(广东)有限公司 Form identification and typing method, computer equipment and computer readable storage medium
CN113505762A (en) * 2021-09-09 2021-10-15 冠传网络科技(南京)有限公司 Table identification method and device, terminal and storage medium
CN114332893A (en) * 2021-09-01 2022-04-12 腾讯科技(深圳)有限公司 Table structure identification method and device, computer equipment and storage medium
CN114529925A (en) * 2022-04-22 2022-05-24 华南理工大学 Method for identifying table structure of whole line table
CN115273112A (en) * 2022-07-29 2022-11-01 北京金山数字娱乐科技有限公司 Table identification method and device, electronic equipment and readable storage medium
CN115620322A (en) * 2022-12-20 2023-01-17 华南理工大学 Method for identifying table structure of whole-line table based on key point detection
CN115620325A (en) * 2022-10-18 2023-01-17 北京百度网讯科技有限公司 Table structure restoration method and device, electronic equipment and storage medium
CN115620321A (en) * 2022-10-20 2023-01-17 北京百度网讯科技有限公司 Table identification method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220335240A1 (en) * 2021-04-15 2022-10-20 Microsoft Technology Licensing, Llc Inferring Structure Information from Table Images

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
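S. A. Siddiqui et al. Rethinking Semantic Segmentation for Table Structure Recognition in Documents. ICDAR. 2020, 1397-1402. *
Bu Feiyu et al. Discrimination of tables and graphics in layout analysis. Computer Engineering and Applications. 2004, (No. 12), 83-87. *
Zhang Yunzuo. Theory and Application of Video Motion Segment Segmentation in the Spatio-Temporal Domain. 2020, 15. *
Liang Tiankai et al. A survey of intelligent table recognition technology. Computer Engineering and Applications. 2023, 1-15. *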

Also Published As

Publication number Publication date
CN116259064A (en) 2023-06-13

Similar Documents

Publication Publication Date Title
CN113221743B (en) Table analysis method, apparatus, electronic device and storage medium
CN114550177B (en) Image processing method, text recognition method and device
US20210357710A1 (en) Text recognition method and device, and electronic device
EP3852008A2 (en) Image detection method and apparatus, device, storage medium and computer program product
CN113313083B (en) Text detection method and device
US20210350541A1 (en) Portrait extracting method and apparatus, and storage medium
CN113657274B (en) Table generation method and device, electronic equipment and storage medium
CN115880536B (en) Data processing method, training method, target object detection method and device
CN113204615A (en) Entity extraction method, device, equipment and storage medium
EP3910590A2 (en) Method and apparatus of processing image, electronic device, and storage medium
CN113627439A (en) Text structuring method, processing device, electronic device and storage medium
CN116259064B (en) Table structure identification method, training method and training device for table structure identification model
CN113205041A (en) Structured information extraction method, device, equipment and storage medium
CN114359932B (en) Text detection method, text recognition method and device
CN116844177A (en) Table identification method, apparatus, device and storage medium
CN113657396B (en) Training method, translation display method, device, electronic equipment and storage medium
CN113538450A (en) Method and device for generating image
CN113553428B (en) Document classification method and device and electronic equipment
CN113255501B (en) Method, apparatus, medium and program product for generating form recognition model
CN113837194A (en) Image processing method, image processing apparatus, electronic device, and storage medium
CN113378857A (en) Target detection method and device, electronic equipment and storage medium
CN114511862B (en) Form identification method and device and electronic equipment
CN115082298A (en) Image generation method, image generation device, electronic device, and storage medium
CN113435257B (en) Method, device, equipment and storage medium for identifying form image
CN115359502A (en) Image processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant