CN111597943B - Table structure identification method based on graph neural network - Google Patents


Info

Publication number
CN111597943B
CN111597943B (application CN202010390152A)
Authority
CN
China
Prior art keywords
node
graph
blob
blobs
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010390152.2A
Other languages
Chinese (zh)
Other versions
CN111597943A (en)
Inventor
杨红飞
金霞
韩瑞峰
Current Assignee
Hangzhou Huiyidao Technology Co.,Ltd.
Original Assignee
Hangzhou Firestone Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Firestone Technology Co ltd filed Critical Hangzhou Firestone Technology Co ltd
Priority to CN202010390152.2A priority Critical patent/CN111597943B/en
Publication of CN111597943A publication Critical patent/CN111597943A/en
Application granted granted Critical
Publication of CN111597943B publication Critical patent/CN111597943B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/413 Classification of content, e.g. text, photographs or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a table structure identification method based on a graph neural network. The method converts each page of a pdf document into an image, identifies the position of the table in each input image, and crops the table region; identifies text blobs in the table region; finds the set of adjacent blobs for each blob, thereby establishing a blob graph structure; builds the dual graph of the blob graph, converting the graph-edge prediction problem into a graph node classification problem; trains a graph node classification model; and sorts out the relationships between the blobs to obtain the cell structure of the table. The graph neural network is applied to table structure identification by modeling it as graph node classification; a feedback-adjustment network and a conditional random field are added to correct the node classification results based on the plausibility of the overall table structure, improving identification accuracy.

Description

Table structure identification method based on graph neural network
Technical Field
The invention relates to an image recognition technology, in particular to a table structure recognition method based on a graph neural network.
Background
In big data and artificial intelligence applications, large amounts of information are collected, processed and analyzed; the data are structured, and the patterns discovered in them guide production. This information exists in diverse, unstructured forms, and a large share of it is held in tables, which may appear in pdf files, web pages, or images.
For tables in pdf documents, existing parsing methods generally fall into three categories: reading the xml information of the pdf directly (e.g., the xpdf tool); converting the pdf into another format such as xml, html or word and then parsing it (e.g., the pdf-docx tool); and converting the pdf into an image and performing structure recognition. The first two methods cannot parse tables accurately because of information loss in the pdf file itself; the third depends mainly on image recognition algorithms, and existing methods cannot accurately recognize complex tables.
Disclosure of Invention
The invention aims to provide a table structure identification method based on a graph neural network, which can obtain the cell arrangement information of a table, such as the specific content of the cell at row i, column j, and the cross-column (colspan) and cross-row (rowspan) information of a complex table. The invention applies the graph neural network to table structure identification, models it as graph node classification, adds a feedback-adjustment network and a conditional random field (CRF), and corrects the graph node classification results based on the plausibility of the overall table structure.
The purpose of the invention is realized by the following technical scheme: a table structure identification method based on a graph neural network is characterized by comprising the following steps:
(1) Recognize the position of the table in the input image and crop the table region.
(2) Identify text blobs in the table region.
(3) Find the set of adjacent blobs for each blob, thereby establishing a blob graph structure: sort the blobs in the table region by the image y coordinate and arrange them into several text lines; sort the blobs within each line by the x coordinate; after sorting, for each blob in a row, take the next adjacent blob in the same row and the blobs in the next row that overlap it on the x axis as its adjacent set.
(4) Establish a dual graph structure for the blob graph, converting the graph-edge prediction problem into a graph node classification problem: each connected blob pair of the original blob graph corresponds to a graph node in the dual graph, and if two connected blob pairs in the original blob graph share one common blob, there is an edge between the graph nodes corresponding to the two blob pairs in the dual graph.
(5) Train a graph node classification model; the training process of the column classification model is as follows:
training data: for each node on the dual graph of the table, if the two original-graph blobs of the node are in the same column, the ground truth of the node is 1, otherwise it is 0; the input features of the node are the features of its two original-graph blobs;
training the model: a classifier is built with a gnn model and classifies the category of each node in the dual graph, i.e., whether the two original-graph blobs represented by the node are in the same column;
model prediction: the dual graph of the table and the features of each node are obtained and input to the model, which outputs the classification result of each node, i.e., the same-column relationship between pairs of blobs in the original graph.
(6) Sort out the relationships among the blobs to obtain the cell structure of the table:
compute the column set and the row set of the table respectively;
cells of the table: sort the set of table rows by the image y coordinate and the set of columns by the x coordinate, then intersect each row with each column to obtain the table cells;
arranging the blobs within cells: the blobs in each cell are arranged by line, the blobs of each line are merged into one large blob, the abscissa of the large blob is expanded to the table cell boundary, and character recognition is performed on the large blob to obtain the text content of the cell.
Further, in step (1), a table detector is built with an RCNN-based neural network to identify the table position; in step (2), the text blobs in the table region are identified with tools such as ctpn, craft, or tesseract.
Further, in the training-data preparation of step (5), for each node on the dual graph of the table, the input features of the node are the features of its two original-graph blobs, including the relative values of the image coordinates of the two blobs, the Euclidean distance between them, and their x-axis and y-axis overlap rates; both the absolute and relative values of these quantities are taken as features.
Further, in the training-data preparation of step (5), the positional relationship between the two blobs and adjacent table grid lines is used as a feature; specifically, the distances and overlap rates between the 2nd blob and the table grid lines that may exist in the 4 directions around the 1st blob are used as features.
Further, in the training-data preparation of step (5), the table grid lines are specially processed: each table line is treated as a blob and used as a neighbor when computing node features, but the classification of dual nodes generated by table lines is not computed; i.e., when computing the model loss, classification errors produced by table lines are not added to the total error.
Further, a conditional random field (CRF) layer is added in the model-training process of step (5), specifically: a CRF model is added on top of the graph classification results to regularize them, i.e., to take the table's overall connection structure into account; the graph node classification result is taken as the unary term of the CRF, and a binary term is added that, in the loss function, rewards identical classification results for blob pairs in similar configurations and penalizes differing results; a densecrf model is built, where a similar configuration is defined as two blob pairs being roughly aligned in the y direction; the crf loss function of a single node is constructed as L(i) = L1(i) + r * L2(i), where L1 is the unary term, i.e., the loss function of the graph node classification model, representing the error of the individual node's classification result against the ground truth, and L2 is the binary term.
Further, in the crf loss function of a single node, the binary term L2 is given by:
L2 = Σ_{i,j} Q(i,j), i ∈ [0, N], j ∈ {0, 1},
Q(i,j) = Σ_m u(m,j) · P(i,m), m ∈ {0, 1},
P(i,m) = Σ_k sim(k,i) · N(k,m), k ∈ [0, N], k ≠ i,
u ∈ R^{2×2},
sim(k,i) = overlap_x(Bk1, Bi1) * overlap_x(Bk2, Bi2),
wherein Q(i,j) is the binary loss when the prediction result of node i is j; N is the number of nodes in the dual graph; j = 0 means the classification result of graph node i is 0; P(i,m) is the weighted sum of the predictions N(k,m) of the set of nodes k similar to node i, the weight of each node k being the similarity sim(k,i) between k and i; u is the penalty coefficient between the prediction of node i and the predictions of its similar nodes, with four values u(0,0), u(0,1), u(1,0) and u(1,1), where u(0,1) is the penalty when node i predicts 0 and a similar node predicts 1; sim(k,i) measures the similarity between dual nodes k and i and is defined through the blob pairs (Bk1, Bk2) and (Bi1, Bi2) they represent, as the product of the x-direction overlap rates overlap_x(Bk1, Bi1) = (min(Bk1.right, Bi1.right) - max(Bk1.left, Bi1.left)) / min(Bk1.right - Bk1.left, Bi1.right - Bi1.left), where subscripts 1 and 2 denote the left and right blobs of a blob pair.
Further, a result feedback-adjustment model is added in the model-training process of step (5), specifically: the input of the model is the graph node classification result, i.e., the classification probability of each node's category; a graph node classification network is built whose output is again a classification result for each node, which is compared with the ground truth to produce an error for training; the classification result is then fed back as the input sample of the feedback-adjustment model, and prediction is repeated until the difference between the obtained classification probabilities and those of the previous prediction is below a threshold, or a preset maximum number of iterations is reached.
Further, in step (6), the table columns are computed as follows: for each blob in each row, find the set blob_sameline_neighbor of same-row blobs from the blob's adjacent set, find the blobs in blob_sameline_neighbor that have a same-column relationship with it, and together they form branches branch_i = {blob}, i = 0, 1, ..., n_branch, where n_branch is the number of blobs having a same-column relationship with it; merge the branches sharing a common blob to obtain the branch set of this row, branch_this = {branch_j}, j = 0, 1, ..., n_branch_this, where n_branch_this is the number of branches of this row; merge it with the set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last obtained from the previous row, where n_branch_last is the number of branches of the previous row: a branch_j in branch_this and a branch_k in branch_last that share the same blob are merged, yielding the updated set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last_update; if branch_j finds no branch sharing a blob in branch_last, branch_j is added to the branch_last set; in the final branch_last set, each branch is one column of the table;
the table rows are computed as follows: for each blob in each row, find the set blob_sameline_neighbor of same-row blobs from the blob's adjacent set, find the blobs in blob_sameline_neighbor that have a same-row relationship with it, and together they form branches branch_i = {blob}, i = 0, 1, ..., n_branch, where n_branch is the number of blobs having a same-row relationship with it; merge the branches sharing a common blob to obtain the branch set of this row, branch_this = {branch_j}, j = 0, 1, ..., n_branch_this, where n_branch_this is the number of branches of this row; merge it with the set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last obtained from the previous column, where n_branch_last is the number of branches of the previous column: a branch_j in branch_this and a branch_k in branch_last sharing the same blob are merged, yielding the updated set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last_update; if branch_j finds no branch sharing a blob in branch_last, branch_j is added to the branch_last set; in the final branch_last set, each branch is one row of the table.
The invention has the following beneficial effects: the dual representation converts graph-edge prediction into a graph node classification problem, which combines more conveniently with the CRF and feedback-adjustment models; the table grid lines are specially processed according to the characteristics of tables; and the classification results are optimized with the CRF and the feedback-adjustment model, combining the classification of individual nodes with the consistency and smoothness of the table's overall connections.
Drawings
FIG. 1 is a flow chart of a table structure identification method based on a graph neural network according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.
In the following description, numerous specific details are set forth to provide a thorough understanding of the present invention. However, the invention may be practiced in ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit; the invention is therefore not limited to the specific embodiments disclosed below.
As shown in fig. 1, the method for identifying a table structure based on a graph neural network provided by the present invention converts documents of other formats into images; for each input image it identifies the position of the table and crops the table region, identifies text blobs in the region, finds the neighbors of each blob, predicts the relationship between each blob and each of its neighbors (same row or same column), and finally derives the structure of the table from these relationships. The concrete steps are as follows:
1. Each page of the pdf document is converted into an image.
2. For each input image, the position of the table is identified and the table region is cropped.
In an embodiment of the present application, a table detector may be built using an RCNN-based neural network to identify the table location.
3. A text blob block is identified for the table region.
In the embodiment of the present application, the identification may be performed with tools such as ctpn, craft, or tesseract.
4. Find the set of neighboring blobs for each blob, thereby establishing the blob graph structure, i.e., the connection relations between the blobs.
The embodiment of the application is specifically as follows: sort the blobs in the table region by the image y coordinate and arrange them into several text lines; sort the blobs within each line by the x coordinate; after sorting, for each blob in a row, take the next adjacent blob in the same row and the blobs in the next row that overlap it on the x axis as its adjacent set.
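As a rough illustration, the adjacency construction above (group blobs into text lines by y, sort each line by x, then link each blob to its right neighbour and to the x-overlapping blobs of the next line) might be sketched as follows; the box representation, the `y_tol` tolerance and the index-based output are assumptions for the sketch, not the patent's implementation:

```python
def build_blob_graph(blobs, y_tol=5):
    """blobs: list of (left, top, right, bottom) boxes.
    Returns the adjacency as a set of (i, j) blob-index pairs, i < j."""
    order = sorted(range(len(blobs)), key=lambda i: blobs[i][1])      # sort by top (y)
    lines, current = [], []
    for i in order:
        if current and abs(blobs[i][1] - blobs[current[-1]][1]) > y_tol:
            lines.append(sorted(current, key=lambda i: blobs[i][0]))  # sort line by x
            current = []
        current.append(i)
    if current:
        lines.append(sorted(current, key=lambda i: blobs[i][0]))
    edges = set()
    for li, line in enumerate(lines):
        for pos, i in enumerate(line):
            if pos + 1 < len(line):                       # next blob in the same line
                edges.add(tuple(sorted((i, line[pos + 1]))))
            if li + 1 < len(lines):                       # x-overlapping blobs in the next line
                for j in lines[li + 1]:
                    if min(blobs[i][2], blobs[j][2]) > max(blobs[i][0], blobs[j][0]):
                        edges.add(tuple(sorted((i, j))))
    return edges
```

On a 2x2 grid of blobs this links horizontal neighbours within a line and vertically overlapping blobs across lines, but not the diagonal pairs.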
5. Establish the dual graph structure for the blob graph, converting the graph-edge prediction problem into a graph node classification problem.
The embodiment of the application is specifically as follows: each connected blob pair of the original blob graph corresponds to a graph node in the dual graph, and if two connected blob pairs in the original blob graph share one common blob, there is an edge between the graph nodes corresponding to the two blob pairs in the dual graph.
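The dual construction can be sketched directly from that definition; the pair-of-indices representation is an assumption for the sketch:

```python
from itertools import combinations

def build_dual_graph(blob_edges):
    """blob_edges: iterable of (i, j) blob-index pairs from the blob graph.
    Dual node k represents the blob pair dual_nodes[k]; two dual nodes
    are connected when their blob pairs share a blob."""
    dual_nodes = sorted(blob_edges)
    dual_edges = set()
    for a, b in combinations(range(len(dual_nodes)), 2):
        if set(dual_nodes[a]) & set(dual_nodes[b]):   # pairs share a blob
            dual_edges.add((a, b))
    return dual_nodes, dual_edges
```

Predicting a label for each dual node then answers, for each original edge, whether its two blobs belong to the same column (or row).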
6. Train a graph node classification model; the column classification model is taken as an example, and the row classification model is similar:
6.1 training data:
6.1.1 For each node on the dual graph of the table:
if the two original-graph blobs of the node are in the same column, the ground truth of the node is 1, otherwise it is 0;
the input features of the node are the features of its two original-graph blobs, including the relative values of the image coordinates (each blob has 4 coordinate values: left, right, top and bottom), the Euclidean distance between the two blobs, and their x-axis and y-axis overlap rates; both the absolute and relative values of these quantities are taken as features;
in addition, the positional relationship between the two blobs and adjacent table grid lines is taken as a feature; specifically, the distances and overlap rates between blob2 (the 2nd blob) and the table grid lines that may exist in the 4 directions around blob1 (the 1st blob) are used: the overlap rate in the x direction is (min(blob2.right, line.right) - max(blob2.left, line.left)) / min(blob2.right - blob2.left, line.right - line.left), the distance in the left-right direction is, e.g., blob2.left - line.right, and the corresponding distance in the up-down direction (e.g., blob2.top - line.bottom) and the overlap rate in the y direction are likewise taken as features;
6.1.2 The table grid lines are processed: each table line is treated as a blob and used as a neighbor when computing node features, but the classification of dual nodes generated by table lines is not computed, i.e., when computing the model loss, the classification errors produced by table lines are not added to the total error; table lines are an important characteristic of a table, and this accounts for their significant influence on the blob classification results; the nodes generated by table lines are placed at the end to facilitate subsequent processing;
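A minimal sketch of the geometric pair features of 6.1.1 (coordinate gaps, Euclidean distance, x/y overlap rates, in absolute and relative form); the exact feature list and the normalisation by image size are assumptions:

```python
import math

def overlap_rate(a0, a1, b0, b1):
    """1-D overlap of intervals [a0, a1] and [b0, b1], normalised by the shorter one."""
    inter = min(a1, b1) - max(a0, b0)
    return max(inter, 0.0) / max(min(a1 - a0, b1 - b0), 1e-6)

def pair_features(b1, b2, width, height):
    """b1, b2: (left, top, right, bottom) blob boxes; width/height: image size
    used to derive the relative values alongside the absolute ones."""
    dx = b2[0] - b1[2]                 # horizontal gap between the blobs
    dy = b2[1] - b1[3]                 # vertical gap between the blobs
    dist = math.hypot(dx, dy)          # Euclidean distance
    return [
        dx, dy, dist,                  # absolute values
        dx / width, dy / height, dist / math.hypot(width, height),  # relative values
        overlap_rate(b1[0], b1[2], b2[0], b2[2]),   # x-axis overlap rate
        overlap_rate(b1[1], b1[3], b2[1], b2[3]),   # y-axis overlap rate
    ]
```

Two vertically stacked blobs of equal width, for instance, get an x-overlap rate of 1 and a y-overlap rate of 0, a strong same-column signal.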
6.2 training model:
6.2.1 Graph node classification model: a classifier is built with a gnn model (such as gcn, gat, etc.) and classifies the category (1 or 0) of each node in the dual graph, i.e., whether the two original-graph blobs represented by the node are in the same column (1 for the same column, 0 for different columns); in training, the loss produced by nodes whose blob pair contains a table line is excluded, i.e., only the loss of the first n_blob graph nodes is taken;
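For intuition, a bare-bones two-layer GCN forward pass in numpy is shown below (symmetric normalisation as in Kipf and Welling's GCN); the patent would use a trained gcn/gat, e.g. via a graph library, so weights, dimensions and the ReLU choice here are assumptions:

```python
import numpy as np

def gcn_layer(A, X, W):
    """A: (n, n) adjacency with self-loops; X: (n, d) node features; W: (d, k) weights.
    Applies D^-1/2 A D^-1/2 X W followed by ReLU."""
    d = A.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.maximum(D_inv_sqrt @ A @ D_inv_sqrt @ X @ W, 0.0)

def classify_nodes(A, X, W1, W2):
    """Two-layer GCN; returns per-node class scores of shape (n, 2)."""
    H = gcn_layer(A, X, W1)
    return H @ W2
```

Each dual node's score thus mixes its own features with those of dual nodes sharing a blob, which is exactly the neighbourhood the dual graph encodes.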
6.2.2 layers of Conditional Random Fields (CRF):
Preferably, a CRF model is added on top of the graph classification results to regularize them, i.e., to take the table's overall connection structure into account. The graph node classification result is used as the unary term of the CRF, and a binary term is added that, in the loss function, encourages blob pairs in similar configurations to receive the same classification result and penalizes differing results; this removes part of the noise in the results and yields a more consistent, smoother outcome. A densecrf model is built, where a similar configuration is defined as two blob pairs being roughly aligned in the y direction. For example, the crf loss function of a single node can be constructed as L(i) = L1(i) + r * L2(i), where L1 is the unary term, i.e., the loss function of the graph node classification model, representing the error of the individual node's classification result against the ground truth, and L2 is the binary term:
L2 = Σ_{i,j} Q(i,j), i ∈ [0, N], j ∈ {0, 1},
Q(i,j) = Σ_m u(m,j) · P(i,m), m ∈ {0, 1},
P(i,m) = Σ_k sim(k,i) · N(k,m), k ∈ [0, N], k ≠ i,
u ∈ R^{2×2},
sim(k,i) = overlap_x(Bk1, Bi1) * overlap_x(Bk2, Bi2),
where Q(i,j) is the binary loss when the prediction result of node i is j; N is the number of nodes in the dual graph; j = 0 means the classification result of graph node i is 0; P(i,m) is the weighted sum of the predictions N(k,m) of the set of nodes k similar to node i, the weight of each node k being the similarity sim(k,i) between k and i; u is the penalty coefficient between the prediction of node i and the predictions of its similar nodes, with four values u(0,0), u(0,1), u(1,0) and u(1,1), where for example u(0,1) is the penalty when node i predicts 0 and a similar node predicts 1; sim(k,i) measures the similarity between dual nodes k and i and is defined through the blob pairs (Bk1, Bk2) and (Bi1, Bi2) they represent, as the product of the x-direction overlap rates overlap_x(Bk1, Bi1) = (min(Bk1.right, Bi1.right) - max(Bk1.left, Bi1.left)) / min(Bk1.right - Bk1.left, Bi1.right - Bi1.left), where subscripts 1 and 2 denote the left and right blobs of a blob pair. The parameters to be learned are u and r; r can also be fixed to a constant, e.g., r = 0.5. The definition of sim here can be varied, as long as it expresses the intended penalty. In addition, there can be multiple groups of sim and u, summed at the end, to represent different penalty coefficients under different similarity configurations.
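One plausible reading of the binary term, sketched in plain Python: N(k, m) is interpreted as node k's predicted probability of class m, sim as the product of the x-overlaps of the two blob pairs, and each Q(i, j) is weighted by node i's own predicted probability of class j. All three interpretations are assumptions about the patent's intent:

```python
def overlap_x(a, b):
    """x-overlap of intervals a = (left, right) and b = (left, right),
    normalised by the shorter interval."""
    inter = min(a[1], b[1]) - max(a[0], b[0])
    return max(inter, 0.0) / max(min(a[1] - a[0], b[1] - b[0]), 1e-6)

def binary_loss(pairs, probs, u):
    """pairs[i]: ((l1, r1), (l2, r2)) x-extents of the two blobs of dual node i;
    probs[i][j]: predicted probability that node i has class j;
    u[m][j]: penalty when a similar node predicts m while node i predicts j."""
    n = len(pairs)
    total = 0.0
    for i in range(n):
        for j in (0, 1):
            # P(i, m): similarity-weighted predictions of the other nodes
            p = [sum(overlap_x(pairs[k][0], pairs[i][0]) *
                     overlap_x(pairs[k][1], pairs[i][1]) * probs[k][m]
                     for k in range(n) if k != i)
                 for m in (0, 1)]
            q = sum(u[m][j] * p[m] for m in (0, 1))   # Q(i, j)
            total += probs[i][j] * q                   # weight by node i's prediction
    return total
```

With u penalising only disagreement, two identically configured blob pairs incur zero loss when they agree and a positive loss when they disagree, which is the regularising behaviour the text describes.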
6.2.3 result feedback adjustment model:
Preferably, a feedback-adjustment model is added after the graph node classification result (which may include the CRF layer) to optimize it. The input of the model is the graph node classification result, i.e., the classification probability of each node's category; for example, the classification probabilities of a single node may be [0.9, 0.1], in which case the node is class 0. With this as input, a graph node classification network is built (the network can use a gnn model); its output is again a classification result for each node, which is compared with the ground truth to produce the training error. The classification result is then fed back as the input sample of the feedback-adjustment model, and prediction is repeated until the difference between the classification probabilities of the obtained result and those of the previous prediction is below a threshold (e.g., T = 0.001) or a preset maximum number of iterations (e.g., N = 3) is reached; the values of T and N can be adjusted according to actual results.
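The iteration in 6.2.3 could be sketched as follows, with `refine` standing in for the trained feedback network (an assumption for the sketch) and T/N matching the example values in the text:

```python
def feedback_adjust(probs, refine, T=0.001, N=3):
    """probs: list of per-node class-probability lists; refine: callable probs -> probs.
    Re-feeds the refined probabilities until the largest per-entry change
    drops below T or N iterations are reached."""
    for _ in range(N):
        new_probs = refine(probs)
        delta = max(abs(a - b) for row, new in zip(probs, new_probs)
                    for a, b in zip(row, new))
        probs = new_probs
        if delta < T:
            break
    return probs
```

A refinement network that is already at a fixed point stops after one pass; otherwise the loop is capped at N iterations.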
6.3 Model prediction: the dual graph of the table and the features of each node are obtained as in the training-data preparation and input to the model, which gives the classification result of each node, i.e., the same-column relationship between pairs of blobs in the original graph. Only the relationships between non-table-line blobs are of interest in the prediction result. Whether to apply the feedback-adjustment model at prediction time is decided according to actual results.
7. Sort out the relations among the blobs to obtain the table cell structure, namely the table columns (which blobs are in the same column), the table rows (which blobs are in the same row), and the table cells (which blobs belong to the same cell).
Computation of the table columns: for each blob in each row (text line), find the set blob_sameline_neighbor of same-row blobs from the blob's adjacent set, find the blobs in blob_sameline_neighbor that have a same-column relationship with it, and together they form branches branch_i = {blob}, i = 0, 1, ..., n_branch, where n_branch is the number of blobs having a same-column relationship with it. Merge the branches sharing a common blob to obtain the branch set of this row, branch_this = {branch_j}, j = 0, 1, ..., n_branch_this, where n_branch_this is the number of branches of this row. Merge it with the set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last obtained from the previous row (text line), where n_branch_last is the number of branches of the previous row: a branch_j in branch_this and a branch_k in branch_last that share the same blob are merged, yielding the updated set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last_update; if branch_j finds no branch sharing a blob in branch_last, branch_j is added to the branch_last set. In the final branch_last set, each branch is one column of the table.
The computation of the table rows is similar: for each blob in each row, find the set blob_sameline_neighbor of same-row blobs from the blob's adjacent set, find the blobs in blob_sameline_neighbor that have a same-row relationship with it, and together they form branches branch_i = {blob}, i = 0, 1, ..., n_branch, where n_branch is the number of blobs having a same-row relationship with it. Merge the branches sharing a common blob to obtain the branch set branch_this = {branch_j}, j = 0, 1, ..., n_branch_this. Merge it with the set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last obtained from the previous column: a branch_j in branch_this and a branch_k in branch_last sharing the same blob are merged, yielding the updated set branch_last = {branch_k}, k = 0, 1, ..., n_branch_last_update; if branch_j finds no branch sharing a blob in branch_last, branch_j is added to the branch_last set. In the final branch_last set, each branch is one row of the table.
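A simplified sketch of the branch-merging idea above (not the patent's exact procedure): per-blob branches are built from the predicted same-column pairs and merged whenever they share a blob, first within a text line and then into the accumulated set:

```python
def _merge_into(branches, br):
    """Merge br into the first branch sharing a blob, else append it."""
    for m in branches:
        if m & br:
            m.update(br)
            return
    branches.append(br)

def compute_columns(lines, same_col_pairs):
    """lines: list of lists of blob ids (text lines, top to bottom);
    same_col_pairs: iterable of (a, b) blob pairs predicted to share a column.
    Returns a list of blob-id sets, one per table column."""
    same_col = {frozenset(p) for p in same_col_pairs}
    branch_last = []
    for line in lines:
        branch_this = []
        for b in line:
            branch = {b} | {o for pair in same_col if b in pair for o in pair}
            _merge_into(branch_this, branch)       # merge within this line
        for br in branch_this:
            _merge_into(branch_last, br)           # merge with accumulated branches
    return branch_last
```

The row computation would be the same with same-row pairs and the transposed traversal order.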
Cells of the table: sort the set of table rows by the image y coordinate and the set of columns by the x coordinate, then intersect each row with each column to obtain the cells of the table; the cell at row i, column j consists of the blobs present in both the blob set of row i and the blob set of column j.
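That intersection step is a one-liner once rows and columns are blob-id sets; the set representation is an assumption carried over from the sketches above:

```python
def table_cells(rows, cols):
    """rows, cols: lists of blob-id sets, already sorted by y and x respectively.
    The cell at row i, column j holds the blobs present in both sets."""
    return [[row & col for col in cols] for row in rows]
```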
Arranging the blobs within cells: the blobs in each cell are arranged by line, the blobs of each line are merged into one large blob, the abscissa of the large blob is expanded to the table cell boundary, and character recognition (ocr) is performed on the large blob to obtain the text content of the cell. Character recognition may be based on a crnn model or a tool such as tesseract-ocr.
The foregoing is only a preferred embodiment of the present invention; although the invention has been disclosed through preferred embodiments, they are not intended to limit it. Those skilled in the art can, using the methods and technical content disclosed above and without departing from the scope of the technical solution of the invention, make many possible variations and modifications, or modify it into equivalent embodiments. Therefore, any simple amendment, equivalent change or modification made to the above embodiments according to the technical essence of the invention, without departing from the content of the technical solution of the invention, still falls within the protection scope of the technical solution of the invention.

Claims (9)

1. A table structure identification method based on a graph neural network is characterized by comprising the following steps:
(1) recognizing the position of a table in an input image, and intercepting a table area;
(2) identifying a text blob block for the table region;
(3) finding a set of adjacent blobs for each blob, thereby establishing a blob graph structure: sorting the blobs in the table area according to the image y coordinate and arranging them into several lines of blob sets, then sorting the blobs within each line according to the x coordinate; after sorting, for each blob in each line, taking the next adjacent blob in the same line and the blobs in the next line that overlap with it on the x axis as its adjacent set;
(4) establishing a dual graph structure for the blob graph, converting the graph link prediction problem into a graph node classification problem: each pair of connected blobs in the original blob graph corresponds to a node in the dual graph, and if two connected blob pairs in the original blob graph share one blob, an edge exists between the corresponding nodes in the dual graph;
(5) training a graph node classification model, wherein the training process of the column classification model is as follows:
training data: for each node on the dual graph of the table, if the two original-graph blobs of the node are in the same column, the ground truth of the node is 1, otherwise 0; the input features of the node are the features of its two original-graph blobs;
training a model: establishing a classifier with a GNN model, the classifier being used to classify the category of each node in the dual graph, namely whether the two original-graph blobs represented by each node are in the same column;
model prediction: obtaining the dual graph of the table and the features of each node, and inputting them into the model to obtain the classification result of each node, namely the same-column relationship between pairs of blobs in the original graph;
(6) sorting out the relationships among the blobs to obtain the cell structure of the table:
respectively calculating a column set and a row set of the table;
cells of the table: sorting the table row set according to the image y coordinate and the column set according to the image x coordinate, then intersecting each row with each column to obtain the table cells;
arranging the blobs in the cells: arranging the blobs in each cell by line, merging the blobs on each line into one large blob, expanding the abscissa of the large blob to the cell boundary of the table, and performing character recognition on the large blob to obtain the text content of the cell.
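The dual-graph construction of step (4) can be sketched as follows; the edge list and the helper `build_dual` are illustrative assumptions, not the claimed implementation:

```python
# Each edge (blob pair) of the blob graph becomes a node of the dual graph;
# two dual nodes are connected when their blob pairs share a blob.
from itertools import combinations

def build_dual(edges):
    """edges: list of (blob_a, blob_b) pairs from the blob graph."""
    dual_nodes = list(edges)
    dual_edges = [
        (i, j)
        for i, j in combinations(range(len(dual_nodes)), 2)
        if set(dual_nodes[i]) & set(dual_nodes[j])  # pairs share a blob
    ]
    return dual_nodes, dual_edges

nodes, links = build_dual([("b0", "b1"), ("b1", "b2"), ("b3", "b4")])
# ("b0","b1") and ("b1","b2") share b1, so dual nodes 0 and 1 are linked.
```

A GNN node classifier would then run on `nodes` (with per-pair features) over `links`.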
2. The method according to claim 1, wherein in step (1), the RCNN-based neural network is used to build a table detector to identify the table location.
3. The table structure recognition method based on graph neural network as claimed in claim 1, wherein in the step (2), the recognition of the text blob of the table region is performed based on ctpn, craft and tesseract tools.
4. The table structure recognition method based on a graph neural network according to claim 1, wherein in the training data preparation of step (5), for each node on the dual graph of the table, the input features of the node are the features of its two original-graph blobs, including the relative values of the image coordinates of the two blobs, the euclidean distance between the two blobs, the x-axis overlap ratio, and the y-axis overlap ratio, with both the absolute and relative values of these quantities taken as features.
5. The table structure recognition method based on a graph neural network according to claim 1, wherein in the training data preparation of step (5), the positional relationship between the two blobs and the adjacent table lines is used as a feature; specifically, the distances and overlap ratios between the 2nd blob and the table lines that may exist in the 4 directions around the 1st blob are used as features.
6. The table structure identification method based on a graph neural network according to claim 1, wherein in the training data preparation of step (5), table lines are processed as blobs: table lines are used as adjacent points when calculating node features, but the classification of the dual nodes generated by table lines is not computed, that is, the classification errors generated by table lines are not added to the total error when computing the model loss.
7. The method according to claim 1, wherein a conditional random field (CRF) layer is added in the model training process of step (5), specifically: a CRF model is added on top of the graph classification result to regularize and adjust the result, i.e. to take the overall connection structure of the table into account; the classification result of the graph nodes is taken as the unary term of the CRF, and a binary term is added so that, in the loss function, blob pairs under similar configurations are encouraged to receive the same classification result, and different classification results for blob pairs under similar configurations are penalized; a densecrf model is established, where two blob pairs are defined as similarly configured when they are approximately aligned in the y direction; the CRF loss function of a single node is constructed as L(i) = L1(i) + r*L2(i), where L1 is the unary term, i.e. the loss function of the graph node classification model, representing the error between the classification result of an individual graph node and the ground truth, and L2 is the binary term.
8. The method according to claim 7, wherein in the CRF loss function of a single node, the binary term L2 is given by:
L2 = Σ_{i,j} Q(i,j), i ∈ [0,N], j ∈ {0,1},
Q(i,j) = Σ_m u(m,j)·P(i,j), m ∈ {0,1},
P(i,j) = Σ_k sim(k,j)·N(k,j), k ∈ [0,N], k ≠ i,
u(m,j) ∈ R²,
sim(k,j) = overlap_x(B_k1, B_j1) · overlap_x(B_k2, B_j2),
wherein Q(i,j) is the binary loss when the prediction result of node i is j, N is the number of nodes in the dual graph, and j = 0 means the classification result of graph node i is 0; P(i,j) is the weighted sum of the prediction results of the set of nodes k similar to node i, the weight of each node k being the similarity sim(k,i) between k and i; u represents the penalty coefficient between the prediction result of node i and that of its similar nodes, with four values u(0,0), u(0,1), u(1,0) and u(1,1), where u(0,1) is the penalty coefficient when the prediction result of node i is 0 and that of the similar node is 1; sim(k,j) is the similarity measure between two graph nodes, defined here as the product of the x-direction overlap ratios of the blobs represented by the two nodes, with overlap_x(B_k1, B_j1) = (min(B_k1.right, B_j1.right) - max(B_k1.left, B_j1.left)) / min(B_k1.right - B_k1.left, B_j1.right - B_j1.left), where B_k1 and B_k2 are the left and right blobs of the blob pair represented by node k, and .left and .right denote a blob's left and right x coordinates.
9. The table structure identification method based on a graph neural network according to claim 1, wherein a result-feedback adjustment model is added in the model training process of step (5), specifically: the input of the model is the graph node classification result, i.e. the classification probability of each node's category on the graph; a graph node classification network is constructed whose output is again a classification result for each node, which is compared with the ground truth to generate an error for training the model; the classification result is then fed back as an input sample to the feedback adjustment model, and prediction is repeated cyclically until the difference between the obtained result and the classification probability of the previous prediction is less than a threshold, or a preset maximum number of cycles is reached.
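The feedback loop of claim 9 can be sketched abstractly as follows; `refine` stands in for the graph node classification network and the toy probability update is an assumption for illustration:

```python
# Iterative feedback refinement: re-feed class probabilities into a refinement
# model until the change drops below a threshold or a cycle limit is reached.

def iterate_refinement(probs, refine, threshold=1e-3, max_cycles=10):
    for _ in range(max_cycles):
        new_probs = refine(probs)
        diff = max(abs(a - b) for a, b in zip(new_probs, probs))
        probs = new_probs
        if diff < threshold:  # predictions have stabilized
            break
    return probs

# Toy refine step: pushes each probability toward 0 or 1.
result = iterate_refinement(
    [0.6, 0.4, 0.9],
    lambda p: [x ** 2 / (x ** 2 + (1 - x) ** 2) for x in p],
)
```

With this toy update the probabilities converge to near-binary decisions within a few cycles.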
CN202010390152.2A 2020-05-08 2020-05-08 Table structure identification method based on graph neural network Active CN111597943B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010390152.2A CN111597943B (en) 2020-05-08 2020-05-08 Table structure identification method based on graph neural network


Publications (2)

Publication Number Publication Date
CN111597943A CN111597943A (en) 2020-08-28
CN111597943B true CN111597943B (en) 2021-09-03



Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109447007A (en) * 2018-12-19 2019-03-08 天津瑟威兰斯科技有限公司 A kind of tableau format completion algorithm based on table node identification





Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 310053 room 1310, Huarong times building, No. 3880 Jiangnan Avenue, Binjiang District, Hangzhou City, Zhejiang Province

Patentee after: Hangzhou Huiyidao Technology Co.,Ltd.

Country or region after: China

Address before: 310053 room 1310, Huarong times building, No. 3880 Jiangnan Avenue, Binjiang District, Hangzhou City, Zhejiang Province

Patentee before: Hangzhou Firestone Technology Co.,Ltd.

Country or region before: China
