CN111597943A - Table structure identification method based on graph neural network - Google Patents
- Publication number: CN111597943A (application CN202010390152.2A)
- Authority
- CN
- China
- Prior art keywords
- branch
- blob
- node
- graph
- blobs
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/413—Classification of content, e.g. text, photographs or tables
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2413—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
- G06F18/24133—Distances to prototypes
- G06F18/24137—Distances to cluster centroïds
- G06F18/2414—Smoothing the distance, e.g. radial basis function networks [RBFN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
Abstract
The invention discloses a table structure identification method based on a graph neural network. Each page of a PDF document is converted into an image; for each input image the position of the table is identified and the table region is cropped; text blobs are identified in the table region; a neighbor set is found for each blob, establishing a blob graph structure; a dual graph is built for the blob graph, converting the graph connection (edge) prediction problem into a graph node classification problem; a graph node classification model is trained; and the relations between blobs are consolidated to obtain the cell structure of the table. The method applies a graph neural network to table structure recognition, models the recognition as graph node classification, and adds a feedback-adjustment network and a conditional random field that correct the node classification results based on the plausibility of the overall table structure, improving recognition accuracy.
Description
Technical Field
The invention relates to image recognition technology, in particular to a table structure recognition method based on a graph neural network.
Background
In big-data and artificial-intelligence applications, large amounts of information are collected, processed, and analyzed; the data are structured, and patterns discovered in the data guide production. This information exists in diverse, unstructured forms, and a large share of it is stored in tables, which may appear in PDF files, web pages, or images.
For tables in PDFs, existing parsing approaches fall into three groups: reading the PDF's embedded structure information directly (e.g., the xpdf tool); converting the PDF to another format such as XML, HTML, or Word and then parsing it (e.g., a pdf-to-docx tool); and converting the PDF to an image and recognizing the structure visually. The first two approaches cannot parse tables accurately because of information loss in the PDF file itself; the third mainly depends on image recognition algorithms, and existing methods cannot accurately recognize complex tables.
Disclosure of Invention
The invention aims to provide a table structure identification method based on a graph neural network that recovers the cell arrangement of a table, such as the specific content of the cell at row i, column j, and the column-spanning (colspan) and row-spanning (rowspan) information of complex tables. The invention applies a graph neural network to table structure recognition, models the recognition as graph node classification, adds a feedback-adjustment network and a conditional random field (CRF), and corrects the node classification results based on the plausibility of the overall table structure.
The purpose of the invention is realized by the following technical scheme: a table structure identification method based on a graph neural network is characterized by comprising the following steps:
(1) Recognize the position of the table in the input image and crop the table region.
(2) Identify the text blobs in the table region.
(3) Find a set of adjacent blobs for each blob, thereby establishing a blob graph structure: sort the blobs in the table region by image y coordinate into several rows, and sort the blobs within each row by x coordinate; after sorting, for each blob in a row, take the next blob in the same row and the blobs in the next row that overlap with it on the x axis as its neighbor set.
(4) Build a dual graph for the blob graph, converting the graph connection (edge) prediction problem into a graph node classification problem: each connected blob pair of the original blob graph corresponds to a node in the dual graph, and if two connected blob pairs in the original graph share a blob, there is an edge between their corresponding dual-graph nodes.
(5) Train the graph node classification model; the training process of the column classification model is as follows:
training data: for each node of the table's dual graph, the ground truth of the node is 1 if its two original-graph blobs are in the same column and 0 otherwise; the input features of the node are the features of its two original-graph blobs;
model training: a classifier is built with a gnn model to classify each node of the dual graph, i.e., to decide whether the two original-graph blobs represented by the node are in the same column;
model prediction: the table's dual graph and each node's features are built and fed into the model, which outputs the classification of each node, i.e., the same-column relation between blob pairs in the original graph.
(6) Consolidate the relations between blobs to obtain the cell structure of the table:
compute the column set and the row set of the table respectively;
cells of the table: sort the table's row set by image y coordinate and the column set by image x coordinate, then intersect each row with each column to obtain the table cells;
arranging the blobs within cells: arrange the blobs in each cell by line, merge the blobs on each line into one large blob, extend the large blob's horizontal extent to the cell boundary, and run character recognition on the large blob to obtain the cell's text content.
Further, in step (1), a table detector is built with an RCNN-based neural network to identify the table position; in step (2), the text blobs in the table region are identified with tools such as ctpn, craft, or tesseract.
Further, in the training-data preparation of step (5), for each node of the table's dual graph, the node's input features are the features of its two original-graph blobs, including the relative image coordinates of the two blobs, the Euclidean distance between them, and their x-axis and y-axis overlap rates; both the absolute and relative values of these quantities are taken as features.
Further, in the training-data preparation of step (5), the positional relation between the two blobs and adjacent table grid lines is used as a feature; specifically, the distance and overlap rate between the table line that may exist in each of the 4 directions around the 1st blob and the 2nd blob are taken as features.
Further, in the training-data preparation of step (5), table lines are processed specially: each table line is treated as a blob and used as a neighbor point when computing node features, but the classification of the dual nodes generated by table lines is not computed; that is, when computing the model loss, the classification errors produced by table-line nodes are not added to the total error.
Further, a conditional random field (CRF) layer is added to the model training in step (5). Specifically, a CRF model is applied to the graph classification results for regularization, i.e., the overall connection structure of the table is taken into account. The node classification result serves as the unary term of the CRF, and a binary term is added: the loss function encourages blob pairs in similar configurations to receive the same classification and penalizes differing classifications for similarly configured pairs. A dense-CRF model is built, where "similar configuration" means two blob pairs that are roughly aligned in the y direction. The CRF loss for a single node is constructed as L(i) = L1(i) + r·L2(i), where L1 is the unary term, i.e., the loss function of the graph node classification model, representing the deviation of the individual node's classification from the ground truth, and L2 is the binary term.
Further, in the CRF loss of a single node, the binary term L2 is given by:
L2 = Σ_{i,j} Q(i, j),  i ∈ [0, N], j ∈ {0, 1},
Q(i, j) = Σ_m u(j, m)·P(i, m),  m ∈ {0, 1},
P(i, m) = Σ_k sim(k, i)·N(k, m),  k ∈ [0, N], k ≠ i,
u(j, m) ∈ R, with four entries forming a 2×2 table,
sim(k, i) = overlap_x(B_k1, B_i1)·overlap_x(B_k2, B_i2),
where Q(i, j) is the binary loss when node i is predicted as class j, N is the number of nodes in the dual graph, and j = 0 means the classification result of node i is 0; N(k, m) is the predicted probability that node k belongs to class m, and P(i, m) is the weighted sum of these predictions over the set of nodes k similar to node i, the weight of each node k being its similarity sim(k, i) to i; u holds the penalty coefficients between the prediction of node i and the predictions of its similar nodes, with four entries u(0,0), u(0,1), u(1,0), and u(1,1), where u(0,1) is the penalty applied when node i is predicted 0 while a similar node is predicted 1; sim(k, i) is the similarity measure between dual nodes k and i, defined as the product of the x-direction overlap rates of the blob pairs (B_k1, B_k2) and (B_i1, B_i2) represented by the two nodes, with overlap_x(B_k1, B_i1) = (min(B_k1.right, B_i1.right) − max(B_k1.left, B_i1.left)) / min(B_k1.right − B_k1.left, B_i1.right − B_i1.left), where the subscripts 1 and 2 denote the left and right blobs of a blob pair.
Further, a result feedback-adjustment model is added to the model training in step (5). Specifically: the model's input is the graph node classification result, i.e., the class probability of each node of the graph; a graph node classification network is built whose output is again a classification for each node, which is compared with the ground truth to produce the training error; the classification result is then fed back in as the input sample of the feedback-adjustment model, and prediction loops until the difference between the obtained class probabilities and those of the previous round is below a threshold or a preset maximum number of iterations is reached.
Further, in step (6), the table columns are computed as follows: for each blob in each row, find the set blob_sameline_neighbor of same-row blobs within the blob's neighbor set, find from it the blobs that have a same-column relation with the blob, and form with them a branch_i = {blob, ...}, i = 0, 1, ..., n_branch, where n_branch is the number of blobs in a same-column relation with the blob; merge the branches sharing a common blob to obtain the row's branch set branch_this = {branch_j}, j = 0, 1, ..., n_branch_this, where n_branch_this is the number of branches in the row; merge with the set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last, obtained from the previous row, where n_branch_last is the number of branches from the previous row: a branch_j in branch_this and a branch_k in branch_last that share a blob are unioned, yielding the updated set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last_update; if branch_j finds no branch sharing a blob in branch_last, branch_j is appended to branch_last. In the final branch_last set, each branch is one column of the table;
the table rows are computed analogously: for each blob in each row, find the set blob_sameline_neighbor of same-row blobs within the blob's neighbor set, find from it the blobs that have a same-row relation with the blob, and form with them a branch_i, i = 0, 1, ..., n_branch, where n_branch is the number of blobs in a same-row relation with the blob; merge the branches sharing a common blob to obtain the row's branch set branch_this = {branch_j}, j = 0, 1, ..., n_branch_this; merge with the set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last, obtained from the previous line: a branch_j in branch_this and a branch_k in branch_last that share a blob are unioned, yielding the updated set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last_update; if branch_j finds no branch sharing a blob in branch_last, branch_j is appended to branch_last. In the final branch_last set, each branch is one row of the table.
The beneficial effects of the invention are: the method uses a dual representation to convert graph connection prediction into a graph node classification problem, which combines more conveniently with the CRF and feedback-adjustment models; table lines are treated specially according to the characteristics of tables; and the classification result is optimized with the CRF and the feedback-adjustment model, combining per-node classification with the consistency and smoothness of the table's overall connectivity.
Drawings
FIG. 1 is a flow chart of a table structure identification method based on a graph neural network according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description numerous specific details are set forth to provide a thorough understanding of the present invention; the invention may, however, be practiced in ways other than those described here, as will be readily apparent to those of ordinary skill in the art without departing from its spirit, and the invention is therefore not limited to the specific embodiments disclosed below.
As shown in fig. 1, the table structure identification method based on a graph neural network provided by the present invention converts documents of other formats into images; for each input image it identifies the position of the table and crops the table region, identifies the text blobs in the region, finds the neighbors of each blob, predicts the relation between each blob and each of its neighbors (same row or same column), and finally derives the table structure from these relations. The concrete steps are as follows:
1. Each page of the PDF document is converted into an image.
2. For each input image, the position of the table is identified and the table region is cropped.
In an embodiment of the present application, a table detector may be built with an RCNN-based neural network to identify the table position.
3. Text blobs are identified in the table region.
In an embodiment of the present application, the identification may be based on tools such as ctpn, craft, or tesseract.
4. A neighbor set is found for each blob, establishing the blob graph structure, i.e., the connection relations between blobs.
In an embodiment of the present application, this is done as follows: sort the blobs in the table region by image y coordinate into several text lines, and sort the blobs within each line by x coordinate; after sorting, for each blob in a line, take the next blob in the same line and the blobs in the next line that overlap with it on the x axis as its neighbor set.
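The neighbor-set construction just described can be sketched in Python. This is a minimal illustration, not the patent's implementation: the blob representation as (left, top, right, bottom) tuples and the row-grouping tolerance `y_tol` are assumptions introduced here.

```python
# Hypothetical sketch of step 4: sort text blobs into rows by y, then by x,
# and collect each blob's neighbor set (the next blob in the same row plus
# the x-overlapping blobs in the next row). A blob is (left, top, right, bottom).

def x_overlap(a, b):
    """True if two blobs overlap on the x axis."""
    return min(a[2], b[2]) > max(a[0], b[0])

def group_rows(blobs, y_tol=5):
    """Greedily group blobs whose top coordinates differ by <= y_tol into rows."""
    rows = []
    for blob in sorted(blobs, key=lambda b: b[1]):
        if rows and abs(blob[1] - rows[-1][0][1]) <= y_tol:
            rows[-1].append(blob)
        else:
            rows.append([blob])
    return [sorted(r, key=lambda b: b[0]) for r in rows]

def neighbor_sets(blobs, y_tol=5):
    """Map each blob to its neighbor list as described in the text."""
    rows = group_rows(blobs, y_tol)
    neighbors = {}
    for i, row in enumerate(rows):
        nxt = rows[i + 1] if i + 1 < len(rows) else []
        for j, blob in enumerate(row):
            ns = []
            if j + 1 < len(row):  # next adjacent blob in the same row
                ns.append(row[j + 1])
            ns.extend(b for b in nxt if x_overlap(blob, b))  # x-overlap in next row
            neighbors[blob] = ns
    return neighbors
```

The edges of the blob graph are then the pairs (blob, neighbor) over all neighbor sets.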
5. A dual graph is built for the blob graph, converting the graph connection (edge) prediction problem into a graph node classification problem.
In an embodiment of the present application: each connected blob pair of the original blob graph corresponds to a node in the dual graph, and if two connected blob pairs in the original graph share a blob, there is an edge between their corresponding dual-graph nodes.
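The dual-graph construction can be sketched as follows. The data representation (blob pairs as tuples, dual edges as index pairs) is an illustrative assumption, not specified by the patent.

```python
# Hypothetical sketch of step 5: each connected blob pair of the original
# graph becomes a dual node; two dual nodes are connected when their blob
# pairs share a blob.
from itertools import combinations

def build_dual(edges):
    """edges: list of (blob_a, blob_b) pairs from the blob graph.
    Returns (nodes, dual_edges): nodes are the blob pairs, dual_edges are
    index pairs (i, j) of dual nodes whose pairs share a blob."""
    nodes = list(edges)
    dual_edges = [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(nodes), 2)
        if set(p) & set(q)
    ]
    return nodes, dual_edges
```

For a chain a-b, b-c, c-d, the dual graph has three nodes and two edges, since (a,b) and (b,c) share b, and (b,c) and (c,d) share c.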
6. The graph node classification model is trained; the column classification model is taken as an example, and the row classification model is similar:
6.1 training data:
6.1.1 for each node on the dual graph of the table:
if the two original-graph blobs of the node are in the same column, the ground truth of the node is 1, otherwise 0;
the input features of the node are the features of its two original-graph blobs, including the relative image coordinates (each blob has 4 coordinate values: left, right, top, and bottom), the Euclidean distance between the two blobs, and their x-axis and y-axis overlap rates; both the absolute and relative values of these quantities are taken as features;
in addition, the positional relation between the two blobs and adjacent table grid lines is taken as a feature; specifically, the distance and overlap rate between the table line that may exist in each of the 4 directions around blob1 (the 1st blob) and blob2 (the 2nd blob) are taken as features. For a horizontal line the overlap rate is measured in the x direction, i.e., (min(blob2.right, line.right) − max(blob2.left, line.left)) / min(blob2.right − blob2.left, line.right − line.left), and the distance is, e.g., blob2.top − line.bottom for a line above blob2 or blob2.left − line.right for a line to its left; the overlap rates and distances in the other directions are defined analogously;
6.1.2 Table lines are processed by treating each table line as a blob: table lines are used as neighbor points when computing node features, but the classification of the dual nodes generated by table lines is not computed; that is, when computing the model loss, the classification errors produced by table-line nodes are not added to the total error. Since table lines are important table features, this takes their significant influence on the blob classification results into account; the nodes generated by table lines are placed at the end to facilitate subsequent processing;
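The pairwise features of 6.1.1 can be sketched like this. The box layout (left, top, right, bottom) and the choice of the "line above" direction are illustrative assumptions; a full feature vector would cover all 4 directions and both absolute and relative values.

```python
# Hedged sketch of the overlap-rate and distance features described above.
# Boxes are (left, top, right, bottom); names are illustrative.

def overlap_x(a, b):
    """x-direction overlap rate of two boxes, normalized by the narrower box."""
    inter = min(a[2], b[2]) - max(a[0], b[0])
    return max(inter, 0) / min(a[2] - a[0], b[2] - b[0])

def line_above_features(blob, line):
    """(distance, x-overlap rate) between a blob and a horizontal line above it."""
    return (blob[1] - line[3], overlap_x(blob, line))
```

For example, a blob fully covered on x by a nearby line gets overlap rate 1.0 and a small positive distance.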
6.2 training model:
6.2.1 Graph node classification model: a classifier is built with a gnn model (e.g., gcn, gat), which classifies each node of the dual graph as 1 or 0, i.e., whether the two original-graph blobs represented by the node are in the same column: 1 for the same column, 0 for different columns. During training, the loss produced by nodes whose blob pair contains a table line is excluded, i.e., only the loss of the first n_blob graph nodes is kept;
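As a rough illustration of the message passing such a gnn classifier performs on the dual graph, here is a toy single layer in pure Python. The mean aggregation and scalar per-node features are simplifications introduced here; a real model would stack layers with learned weight matrices (e.g., via a gnn library) and end in a 2-way classification head.

```python
# Hypothetical sketch of one gcn-style layer: each node's feature becomes
# the mean of itself and its neighbors, followed by a toy linear + relu.

def gcn_layer(feats, adj, w=1.0, b=0.0):
    """feats: list of scalar node features; adj: dict node -> neighbor indices."""
    out = []
    for i, f in enumerate(feats):
        neigh = [feats[j] for j in adj[i]] + [f]
        agg = sum(neigh) / len(neigh)        # mean aggregation over neighborhood
        out.append(max(w * agg + b, 0.0))    # toy linear transform + relu
    return out
```

Stacking several such layers lets each dual node's decision depend on the configuration of nearby blob pairs, which is what the patent relies on.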
6.2.2 Conditional random field (CRF) layer:
preferably, a CRF model is added on top of the graph classification results to regularize them, i.e., to take the overall connection structure of the table into account. The node classification result is taken as the unary term of the CRF, and a binary term is added: the loss function encourages blob pairs in similar configurations to receive the same classification and penalizes differing classifications for similarly configured pairs, which removes part of the noise in the result and yields a more consistent, smoother outcome. A dense-CRF model is built, where a similar configuration is defined as two blob pairs roughly aligned in the y direction. For example, the CRF loss of a single node can be constructed as L(i) = L1(i) + r·L2(i), where L1 is the unary term, i.e., the loss function of the graph node classification model, representing the deviation of the individual node's classification from the ground truth, and L2 is the binary term,
L2 = Σ_{i,j} Q(i, j),  i ∈ [0, N], j ∈ {0, 1},
Q(i, j) = Σ_m u(j, m)·P(i, m),  m ∈ {0, 1},
P(i, m) = Σ_k sim(k, i)·N(k, m),  k ∈ [0, N], k ≠ i,
u(j, m) ∈ R, with four entries forming a 2×2 table,
sim(k, i) = overlap_x(B_k1, B_i1)·overlap_x(B_k2, B_i2),
where Q(i, j) is the binary loss when node i is predicted as class j, N is the number of nodes in the dual graph, and j = 0 means the classification result of node i is 0; N(k, m) is the predicted probability that node k belongs to class m, and P(i, m) is the weighted sum of these predictions over the set of nodes k similar to node i, the weight of each node k being its similarity sim(k, i) to i; u holds the penalty coefficients between the prediction of node i and the predictions of its similar nodes, with four entries u(0,0), u(0,1), u(1,0), and u(1,1); for example, u(0,1) is the penalty applied when node i is predicted 0 while a similar node is predicted 1; sim(k, i) is the similarity measure between dual nodes k and i, defined as the product of the x-direction overlap rates of the blob pairs (B_k1, B_k2) and (B_i1, B_i2) represented by the two nodes, with overlap_x(B_k1, B_i1) = (min(B_k1.right, B_i1.right) − max(B_k1.left, B_i1.left)) / min(B_k1.right − B_k1.left, B_i1.right − B_i1.left), where the subscripts 1 and 2 denote the left and right blobs of a blob pair. The parameters adjusted by training are u and r; r may also be set to a constant, e.g., r = 0.5. The definition of sim here can vary as long as it expresses the intended penalty. In addition, several (sim, u) groups may be used and summed at the end, representing different penalty coefficients under different similarity configurations.
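The similarity term sim can be sketched as follows. Reducing blobs to x-intervals (left, right) is an illustrative simplification introduced here; the patent's blobs carry full bounding boxes.

```python
# Toy sketch of the similarity between two dual nodes: the product of the
# x-overlap rates of their left blobs and of their right blobs.

def overlap_x(a, b):
    """x-overlap rate of two intervals (left, right), normalized by the narrower."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    return max(inter, 0) / min(a[1] - a[0], b[1] - b[0])

def sim(pair_k, pair_i):
    """Similarity of two dual nodes, each a (left_blob, right_blob) pair."""
    return overlap_x(pair_k[0], pair_i[0]) * overlap_x(pair_k[1], pair_i[1])
```

Two blob pairs occupying the same x-ranges get similarity 1.0; pairs with no horizontal overlap get 0.0, so they contribute nothing to the binary term.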
6.2.3 result feedback adjustment model:
preferably, a feedback-adjustment model is added on top of the graph node classification result (which may include the CRF layer) to optimize the classification. The model's input is the node classification result, i.e., the class probability of each node on the graph; for example, a single node's probabilities might be [0.9, 0.1], making it class 0. With this as input, a graph node classification network is built (the network may adopt a gnn model); its output is again a classification for each node, which is compared with the ground truth to produce the training error. The classification result is then fed back in as the input sample of the feedback-adjustment model, and prediction loops until the difference between the obtained class probabilities and those of the previous round is below a threshold (e.g., T = 0.001) or a preset maximum number of iterations (e.g., N = 3) is reached; the values of T and N can be tuned against actual results.
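The prediction loop of the feedback adjustment can be sketched like this. `refine` stands in for the trained feedback network, which is not specified here; probabilities are a flat list for brevity.

```python
# Hypothetical sketch of 6.2.3: re-feed per-node probabilities into a
# refinement model until they change by less than T or N rounds have run.

def feedback_adjust(probs, refine, T=0.001, N=3):
    """probs: list of per-node probabilities; refine: callable returning
    refined probabilities of the same shape."""
    for _ in range(N):
        new_probs = refine(probs)
        # stop as soon as the largest per-node change falls below the threshold
        if max(abs(a - b) for a, b in zip(new_probs, probs)) < T:
            return new_probs
        probs = new_probs
    return probs
```

With an identity `refine` the loop stops after one round; otherwise it runs at most N rounds.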
6.3 Model prediction: the table's dual graph and each node's features are built exactly as in the training-data preparation and fed into the model, which outputs the classification of each node, i.e., the same-column relation between blob pairs in the original graph. Only the relations between non-table-line blobs matter in the prediction result. Whether to apply the feedback-adjustment model at prediction time is decided according to actual results.
7. The relations between blobs are consolidated to obtain the table cell structure, i.e., the table columns (which blobs are in the same column), the table rows (which blobs are in the same row), and the table cells (which blobs belong to the same cell).
For the calculation of the table columns: for each blob in each row (text line), find the set blob_sameline_neighbor of same-row blobs within the blob's neighbor set, find from it the blobs that have a same-column relation with the blob, and form with them a branch_i = {blob, ...}, i = 0, 1, ..., n_branch, where n_branch is the number of blobs in a same-column relation with the blob; merge the branches sharing a common blob to obtain the row's branch set branch_this = {branch_j}, j = 0, 1, ..., n_branch_this, where n_branch_this is the number of branches in the row; merge with the set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last, obtained from the previous line (text line), where n_branch_last is the number of branches from the previous line: a branch_j in branch_this and a branch_k in branch_last that share a blob are unioned, yielding the updated set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last_update; if branch_j finds no branch sharing a blob in branch_last, branch_j is appended to branch_last. In the final branch_last set, each branch is one column of the table.
The calculation for the table rows is similar: for each blob in each row, find the set blob_sameline_neighbor of same-row blobs within the blob's neighbor set, find from it the blobs that have a same-row relation with the blob, and form with them a branch_i, i = 0, 1, ..., n_branch, where n_branch is the number of blobs in a same-row relation with the blob; merge the branches sharing a common blob to obtain the row's branch set branch_this = {branch_j}, j = 0, 1, ..., n_branch_this; merge with the set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last, obtained from the previous line: a branch_j in branch_this and a branch_k in branch_last that share a blob are unioned, yielding the updated set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last_update; if branch_j finds no branch sharing a blob in branch_last, branch_j is appended to branch_last. In the final branch_last set, each branch is one row of the table.
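The column computation above (the row computation is symmetric) can be sketched as follows. Representing branches as Python sets of blob identifiers is an illustrative assumption.

```python
# Hypothetical sketch of step 7: per text line, same-column blobs are already
# grouped into branches (sets); row after row, branches that share a blob
# with an accumulated column are unioned into it, otherwise they start a
# new column.

def merge_row(branch_last, branch_this):
    """Merge the current row's branches into the running column set."""
    for branch in branch_this:
        for col in branch_last:
            if col & branch:      # shared blob: same column, union in place
                col |= branch
                break
        else:                     # no shared blob: a new column begins
            branch_last.append(set(branch))
    return branch_last

def columns(rows_of_branches):
    """rows_of_branches: per text line, a list of same-column branch sets."""
    cols = []
    for branch_this in rows_of_branches:
        cols = merge_row(cols, branch_this)
    return cols
```

For two text lines where blob a1 (line 1) and a2 (line 2) were classified same-column, the two a-branches are unioned into one column while b1 and b2 stay in separate columns.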
Cells of the table: the table's row set is sorted by image y coordinate and the column set by image x coordinate, and each row is then intersected with each column to obtain the table cells; the cell at row i, column j is formed by the blobs present in both the i-th row set and the j-th column set.
Arranging the blobs within cells: the blobs in each cell are arranged by line, the blobs on each line are merged into one large blob, the large blob's horizontal extent is extended to the cell boundary, and character recognition (OCR) is run on the large blob to obtain the cell's text content. Character recognition may be based on a crnn model or a tool such as tesseract-ocr.
The foregoing is only a preferred embodiment of the present invention; although the invention has been disclosed through preferred embodiments, they are not intended to limit it. Those skilled in the art may, without departing from the scope of the technical solution of the invention, use the methods and contents disclosed above to make many possible variations and modifications, or amend them into equivalent embodiments. Therefore, any simple modification, equivalent change, or refinement of the above embodiments that stays true to the technical essence of the invention remains within the scope of protection of the technical solution of the invention.
Claims (10)
1. A table structure identification method based on a graph neural network is characterized by comprising the following steps:
(1) Recognizing the position of the table in the input image, and cropping the table region.
(2) Identifying the text blobs of the table region.
(3) Finding the set of adjacent blobs for each blob, thereby establishing the blob graph structure: sort the blobs in the table region by the image y coordinate into several lines of blob sets, and sort the blobs within each line by the x coordinate; after sorting, for each blob in a line, take the next adjacent blob in the same line and the blobs in the next line that coincide with the blob on the x axis as its adjacent set.
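Step (3) can be sketched as follows (an illustrative Python sketch, assuming blobs are (left, top, right, bottom) boxes already grouped into lines and sorted by x; the function names are hypothetical):

```python
def x_overlaps(a, b):
    """True if two boxes (left, top, right, bottom) overlap on the x axis."""
    return min(a[2], b[2]) > max(a[0], b[0])

def build_neighbors(rows):
    """rows: list of lines, each a list of blob boxes sorted by x.
    Returns {(row, col): [neighbor indices]} linking each blob to the next
    blob in its line and to the x-overlapping blobs in the next line."""
    nbrs = {}
    for r, line in enumerate(rows):
        for c, blob in enumerate(line):
            adj = []
            if c + 1 < len(line):              # next blob in the same line
                adj.append((r, c + 1))
            if r + 1 < len(rows):              # x-overlapping blobs in next line
                adj += [(r + 1, k) for k, b in enumerate(rows[r + 1])
                        if x_overlaps(blob, b)]
            nbrs[(r, c)] = adj
    return nbrs
```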
(4) Establishing a dual graph structure for the blob graph, converting the graph connection prediction problem into a graph node classification problem: each pair of connected blobs in the original blob graph corresponds to a node in the dual graph, and if two connected blob pairs in the original blob graph share a common blob, there is an edge between the corresponding nodes in the dual graph.
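The dual-graph construction of step (4) can be sketched directly (illustrative only; edges are given as pairs of blob ids):

```python
from itertools import combinations

def build_dual(edges):
    """edges: list of blob-id pairs (the connections of the original blob graph).
    Returns (dual_nodes, dual_edges): each original edge becomes a dual node,
    and two dual nodes are connected iff their blob pairs share a blob."""
    dual_nodes = [frozenset(e) for e in edges]
    dual_edges = [(i, j) for i, j in combinations(range(len(dual_nodes)), 2)
                  if dual_nodes[i] & dual_nodes[j]]   # shared blob => dual edge
    return dual_nodes, dual_edges
```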
(5) Training a graph node classification model, wherein the training process of the same-column classification model is as follows:
training data: for each node on the dual graph of the table, if the two original-graph blobs of the node are in the same column, the ground truth of the node is 1, otherwise it is 0; the input features of the node are the features of its two original-graph blobs;
training the model: establishing a classifier using a GNN model, the classifier classifying the category of each node in the dual graph, namely whether the two original-graph blobs represented by each node are in the same column;
model prediction: given the dual graph of the table and the features of each node, the model outputs the classification result of each node, namely the same-column relationship between each pair of blobs in the original graph.
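As a toy illustration of step (5), the sketch below uses one round of mean-neighbor message passing followed by a logistic score in place of the full GNN classifier; W_self, W_nbr and b would be learned in practice, and nothing here is the patented model itself:

```python
import numpy as np

def gnn_predict(feats, adj, W_self, W_nbr, b):
    """One round of mean-neighbor aggregation plus a logistic score per
    dual-graph node (a toy stand-in for the trained GNN classifier).
    feats: (n, d) node features; adj: n x n adjacency (0/1)."""
    n = len(feats)
    agg = np.zeros_like(feats)
    for i in range(n):
        nbrs = [j for j in range(n) if adj[i][j]]
        if nbrs:
            agg[i] = feats[nbrs].mean(axis=0)   # mean of neighbor features
    h = feats @ W_self + agg @ W_nbr + b        # combine self and neighborhood
    return 1.0 / (1.0 + np.exp(-h))             # probability of "same column"
```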
(6) Sorting out the relationships among the blobs to obtain the cell structure of the table:
respectively calculating a column set and a row set of the table;
cells of the table: sorting the table row set according to the image y coordinate, sorting the column set according to the image x coordinate, and then intersecting each row with each column to obtain the table cells;
arranging the blobs in the cells: arranging the blobs in each cell by row, merging the blobs of each row into one large blob, expanding the abscissa of the large blob to the cell boundary of the table, and performing character recognition on the large blob to obtain the character content of the cell.
2. The method according to claim 1, wherein in step (1), the RCNN-based neural network is used to build a table detector to identify the table location.
3. The table structure recognition method based on graph neural network as claimed in claim 1, wherein in step (2), the text blobs of the table region are recognized based on tools such as ctpn, craft and tesseract.
4. The table structure recognition method based on graph neural network as claimed in claim 1, wherein in the training data preparation of step (5), for each node on the dual graph of the table, the input features of the node are the features of its two original-graph blobs, including the relative values of the image coordinates of the two blobs, the euclidean distance between the two blobs, the x-axis coincidence rate, and the y-axis coincidence rate, taking both the absolute and relative values of these quantities as features.
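The feature set of claim 4 can be sketched as follows (illustrative only; the exact feature list and normalization are not fully specified, and the box format (left, top, right, bottom) is an assumption):

```python
import math

def pair_features(b1, b2):
    """Features for a dual-graph node from its two blobs, each a
    (left, top, right, bottom) box: relative center coordinates,
    euclidean distance between centers, and x/y coincidence rates."""
    cx1, cy1 = (b1[0] + b1[2]) / 2, (b1[1] + b1[3]) / 2
    cx2, cy2 = (b2[0] + b2[2]) / 2, (b2[1] + b2[3]) / 2
    dx, dy = cx2 - cx1, cy2 - cy1                  # relative coordinates
    dist = math.hypot(dx, dy)                      # euclidean distance
    def overlap(lo1, hi1, lo2, hi2):               # 1-D coincidence rate
        inter = min(hi1, hi2) - max(lo1, lo2)
        return max(inter, 0.0) / min(hi1 - lo1, hi2 - lo2)
    ox = overlap(b1[0], b1[2], b2[0], b2[2])       # x-axis coincidence rate
    oy = overlap(b1[1], b1[3], b2[1], b2[3])       # y-axis coincidence rate
    return [dx, dy, dist, ox, oy]
```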
5. The table structure recognition method based on the graph neural network according to claim 1, wherein in the training data preparation of step (5), the positional relationship between the two blobs and the adjacent table lines is used as a feature; specifically, the distance and coincidence rate between the table lines that may exist in the 4 directions around the 1st blob and the 2nd blob are used as features.
6. The table structure identification method based on the graph neural network as claimed in claim 1, wherein in the training data preparation of step (5), the table lines are processed by treating each table line as a blob: table lines are used as adjacent points when computing node features, but the classification of the dual nodes generated by table lines is not computed, that is, the classification error generated by table lines is not added to the total error when computing the model loss.
7. The method according to claim 1, wherein a conditional random field (CRF) layer is added in the model training process of step (5), specifically: a CRF model is added on top of the graph classification result to regularize it, that is, to take the overall connection structure of the table into account; the classification result of the graph nodes is taken as the unary term of the CRF, and a binary term is added which, in the loss function, favors identical classification results for blob pairs under similar configurations and penalizes different classification results for blob pairs under similar configurations; a DenseCRF model is established, where two blob pairs are defined as similarly configured when they are approximately aligned in the y direction; the CRF loss function of a single node is constructed as L(i) = L1(i) + r * L2(i), where L1 is the unary term, namely the loss function of the graph node classification model, representing the error of the individual node classification result relative to the ground truth, and L2 is the binary term.
8. The method of claim 7, wherein in the CRF loss function of the single node, the binary term L2 is given by:
L2 = Σ_i Q(i, j), i ∈ [0, N], j ∈ {0, 1},
Q(i, j) = Σ_m u(m, j) P(i, j), m ∈ {0, 1},
P(i, j) = Σ_k sim(k, j) N(k, j), k ∈ [0, N], k ≠ i,
u(m, j) ∈ R^2,
sim(k, j) = overlap_x(B_k1, B_j1) * overlap_x(B_k2, B_j2),
wherein Q(i, j) is the binary loss when the prediction result of node i is j; N is the number of nodes in the dual graph; j = 0 means the classification result of graph node i is 0; P(i, j) is the weighted sum of the prediction results of the set of nodes k similar to node i, the weight of each node k being the similarity sim(k, i) between k and i; u is the penalty coefficient between node i and the prediction results of its similar nodes, taking four values u(0,0), u(0,1), u(1,0) and u(1,1), where u(0,1) is the penalty coefficient when the prediction result of node i is 0 and that of the similar node is 1; sim(k, j) is the similarity measure between two dual-graph nodes, defined here as the product of the x-direction overlap rates of the blob pairs represented by the two nodes, (B_k1, B_k2) and (B_j1, B_j2), with overlap_x(B_k1, B_j1) = (min(B_k1.right, B_j1.right) - max(B_k1.left, B_j1.left)) / min(B_k1.right - B_k1.left, B_j1.right - B_j1.left), where B_*1 and B_*2 are the left and right blobs of a blob pair.
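The overlap_x and sim functions of claim 8 can be sketched directly from the formulas (illustrative only; blobs are assumed to be dicts with left/right keys, and each dual node a (left blob, right blob) pair):

```python
def overlap_x(a, b):
    """x-direction overlap rate between two blobs:
    (min of rights - max of lefts) / min of widths; negative when disjoint."""
    return ((min(a['right'], b['right']) - max(a['left'], b['left']))
            / min(a['right'] - a['left'], b['right'] - b['left']))

def sim(pair_k, pair_j):
    """Similarity between two dual-graph nodes: the product of the x-overlap
    of their left blobs and the x-overlap of their right blobs."""
    return overlap_x(pair_k[0], pair_j[0]) * overlap_x(pair_k[1], pair_j[1])
```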
9. The method for identifying a table structure based on a graph neural network according to claim 1, wherein a result feedback adjustment model is added in the model training process of step (5), specifically: the input of the model is the graph node classification result, namely the classification probability of the category of each node on the graph; a graph node classification network is constructed whose output is again a classification result for each node, which is compared with the ground truth to generate the error used to train the model; the classification result is then fed back as an input sample of the feedback adjustment model, and prediction is repeated cyclically until the difference between the obtained result and the classification probability of the previous prediction is less than a threshold, or a preset maximum number of cycles is reached.
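The feedback loop of claim 9 can be sketched as follows, with refine standing in as a hypothetical placeholder for the trained feedback adjustment network:

```python
def iterative_refine(probs, refine, threshold=1e-3, max_iters=10):
    """Feed the node classification probabilities back through a refinement
    model until the largest per-node change drops below a threshold or
    max_iters is reached (refine is a stand-in for the feedback network)."""
    for _ in range(max_iters):
        new_probs = refine(probs)
        delta = max(abs(a - b) for a, b in zip(new_probs, probs))
        probs = new_probs
        if delta < threshold:
            break                     # converged: results stopped changing
    return probs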
10. The table structure identification method based on the graph neural network as claimed in claim 1, wherein in step (6), the calculation process for the table columns is as follows: for each blob in a row, find the set blob_sameline_neighbor of same-row blobs from the blob's adjacent set, find within it the blobs that have a same-column relationship with the blob, and form a branch branch_i = {blob}, i = 0, 1, ... n_branch, where n_branch is the number of blobs having a same-column relationship with the blob; merge the branches sharing a common blob to obtain the branch set of the row, branch_this = {branch_j}, j = 0, 1, ... n_branch_this, where n_branch_this is the number of branches of the row; merge with the set branch_last obtained from the previous row, branch_last = {branch_k}, k = 0, 1, ... n_branch_last, where n_branch_last is the number of branches of the previous row: each branch_j in branch_this that shares a blob with some branch_k in branch_last is merged into it, giving the updated set branch_last = {branch_k}, k = 0, 1, ... n_branch_last_update; if branch_j shares no blob with any branch in branch_last, branch_j is added to branch_last; in the final branch_last set, each branch is a column of the table;
the calculation procedure for the table rows is as follows: for each row of blobs, finding out a blob set blob _ sameline _ neighbor in the same row from the adjacent set of blobs, finding out blobs which have a same row relationship with the blob from the blob set blob _ sameline _ neighbor, and forming a branch (blob) together with the blobs which have the same row relationship with the blob to form a branch, i is 0, 1.. n _ branch, and n _ branch is the number of blobs which have the same row relationship with the blob; merging the branches with the common blob to obtain a branch set branch _ this of the row as { branch j }, j as 0, 1.. n _ branch _ this, where n _ branch _ this is the branch number of the row; merging with the branch _ last obtained in the previous column, wherein k is 0,1,. n _ branch _ last is the branch number in the previous column, that is, a certain branch j and branch k with the same blob as in the branch _ last are merged in the branch _ this set, so as to obtain an updated set branch _ last { branch }, k is 0,1,. n _ branch _ last _ update, and if branch j cannot find a branch with the same blob in the branch _ last, adding branch to the branch _ last set; the resulting branch _ last set, each branch in the set is a row of the table.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010390152.2A CN111597943B (en) | 2020-05-08 | 2020-05-08 | Table structure identification method based on graph neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111597943A true CN111597943A (en) | 2020-08-28 |
CN111597943B CN111597943B (en) | 2021-09-03 |
Family
ID=72181924
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010390152.2A Active CN111597943B (en) | 2020-05-08 | 2020-05-08 | Table structure identification method based on graph neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111597943B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP0859333A1 (en) * | 1997-02-12 | 1998-08-19 | STMicroelectronics S.r.l. | Method of coding characters for word recognition and word recognition device using that coding |
CN109154934A (en) * | 2016-03-18 | 2019-01-04 | 甲骨文国际公司 | Run length encoding perception direct memory access filter engine for the multi-core processor that buffer enables |
CN109447007A (en) * | 2018-12-19 | 2019-03-08 | 天津瑟威兰斯科技有限公司 | A kind of tableau format completion algorithm based on table node identification |
CN109934261A (en) * | 2019-01-31 | 2019-06-25 | 中山大学 | A kind of Knowledge driving parameter transformation model and its few sample learning method |
CN109993112A (en) * | 2019-03-29 | 2019-07-09 | 杭州睿琪软件有限公司 | The recognition methods of table and device in a kind of picture |
CN110188714A (en) * | 2019-06-04 | 2019-08-30 | 言图科技有限公司 | A kind of method, system and storage medium for realizing financial management under chat scenario |
CN110390269A (en) * | 2019-06-26 | 2019-10-29 | 平安科技(深圳)有限公司 | PDF document table extracting method, device, equipment and computer readable storage medium |
CN110751038A (en) * | 2019-09-17 | 2020-02-04 | 北京理工大学 | PDF table structure identification method based on graph attention machine mechanism |
Non-Patent Citations (3)
Title |
---|
Federico Monti et al.: "Dual-Primal Graph Convolutional Networks", arXiv *
Shah Rukh Qasim et al.: "Rethinking Table Recognition using Graph Neural Networks", 2019 International Conference on Document Analysis and Recognition (ICDAR) *
Bai Bo et al.: "Graph Neural Networks", Scientia Sinica *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112115884A (en) * | 2020-09-22 | 2020-12-22 | 北京一览群智数据科技有限责任公司 | Form recognition method and system |
CN112949476A (en) * | 2021-03-01 | 2021-06-11 | 苏州美能华智能科技有限公司 | Text relation detection method and device based on graph convolution neural network and storage medium |
CN112949476B (en) * | 2021-03-01 | 2023-09-29 | 苏州美能华智能科技有限公司 | Text relation detection method, device and storage medium based on graph convolution neural network |
CN112861821A (en) * | 2021-04-06 | 2021-05-28 | 刘羽 | Map data reduction method based on PDF file analysis |
CN112861821B (en) * | 2021-04-06 | 2024-04-19 | 刘羽 | Map data reduction method based on PDF file analysis |
Also Published As
Publication number | Publication date |
---|---|
CN111597943B (en) | 2021-09-03 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CP03 | Change of name, title or address | ||
Address after: 310053 room 1310, Huarong times building, No. 3880 Jiangnan Avenue, Binjiang District, Hangzhou City, Zhejiang Province Patentee after: Hangzhou Huiyidao Technology Co.,Ltd. Country or region after: China Address before: 310053 room 1310, Huarong times building, No. 3880 Jiangnan Avenue, Binjiang District, Hangzhou City, Zhejiang Province Patentee before: Hangzhou Firestone Technology Co.,Ltd. Country or region before: China |