CN115527223A - Complex diagram extraction method and system based on computer vision and graph convolution network - Google Patents

Complex diagram extraction method and system based on computer vision and graph convolution network

Info

Publication number
CN115527223A
Authority
CN
China
Prior art keywords
convolution
image
picture
computer vision
graph
Prior art date
Legal status
Pending
Application number
CN202210667214.9A
Other languages
Chinese (zh)
Inventor
江秀
伍惠英
翁晓锋
曹凯
谢登峰
方声财
林晋瑶
陈榕城
Current Assignee
Fujian Yili Information Technology Co ltd
Original Assignee
Fujian Yili Information Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Fujian Yili Information Technology Co ltd filed Critical Fujian Yili Information Technology Co ltd
Priority to CN202210667214.9A
Publication of CN115527223A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40: Document-oriented image-based pattern recognition
    • G06V30/41: Analysis of document content
    • G06V30/412: Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/10: Text processing
    • G06F40/166: Editing, e.g. inserting or deleting
    • G06F40/174: Form filling; Merging
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10: Character recognition
    • G06V30/16: Image preprocessing
    • G06V30/1607: Correcting image deformation, e.g. trapezoidal deformation caused by perspective

Abstract

The invention relates to a complex diagram extraction method based on computer vision and a graph convolution network, comprising the following steps: step S1, rendering a document into an image and performing layout segmentation with computer vision and deep learning techniques; step S2, preprocessing the segmented image; step S3, analyzing the topological structure of the preprocessed image with a model based on the graph convolution network, and detecting and extracting the tables. The invention realizes end-to-end table detection and effectively improves detection efficiency and accuracy.

Description

Complex diagram extraction method and system based on computer vision and graph convolution network
Technical Field
The invention relates to the technical field of chart data analysis and extraction, in particular to a complex diagram extraction method and system based on computer vision and a graph convolution network.
Background
As applications deepen and data volumes grow, core data is scattered across the text, tables, and charts of company annual reports, financial reports, audit reports, and IPO reports, often in unstructured formats such as scanned documents. Locating and extracting this data by manual reading is time-consuming, and merely finding the core chart data takes considerable effort. Copying data from an original report and processing it until it finally enters an analysis model involves many steps; manual operation is error-prone, the algorithmic threshold is high, sample diversity is complex, and an enterprise IT department can rarely handle the task alone. The purpose of table recognition is to locate tables in images and access their data; it is an important branch of the field of document analysis and recognition. How to use this technology effectively, and how to find table regions in documents or images efficiently by intelligent means so as to realize intelligent analysis and intelligent extraction of data, is the pain point and challenge currently faced.
Disclosure of Invention
In view of this, the present invention provides a complex diagram extraction method based on computer vision and a graph convolution network, which realizes end-to-end table detection and effectively improves detection efficiency and accuracy.
In order to achieve the purpose, the invention adopts the following technical scheme:
a complex diagram extraction method based on computer vision and graph convolution network comprises the following steps:
step S1, rendering a document into an image, and performing layout segmentation by adopting a computer vision and deep learning technology;
step S2, preprocessing the segmented image;
and step S3, analyzing the topological structure of the preprocessed image with a model based on the graph convolution network, and detecting and extracting the tables.
Further, in step S1 a fully convolutional neural network is adopted to identify each independent region in the document page, including the title, paragraph, table, illustration, and data-chart layouts.
Further, the fully convolutional neural network performs image semantic segmentation through convolution, deconvolution, and a skip-level structure, specifically as follows:
the image is input into the convolutional neural network, and a series of feature maps are obtained through repeated convolution and pooling;
the resolution is then raised through upsampling; once the picture resolution matches the original picture, the regions with high weight are the regions where the targets lie;
finally, the image is restored by combining the upsampled data with the feature maps from the earlier convolution-pooling layers.
Furthermore, the fully convolutional neural network adopts a skip-level connection method: the feature maps extracted by the first few convolution layers are connected to the corresponding later upsampling layers and added to them, and upsampling then continues.
Further, the preprocessing comprises:
(1) Red seal occlusion
Where a red seal occludes the content, a seal-removal operation is performed on the document, and character recognition is performed afterwards;
(2) Wrinkles
For a scanned document or picture with wrinkles, the wrinkle condition is classified as resolvable, partially resolvable, or unresolvable. Unresolvable content is not parsed; an alarm is raised on the parsing result and it is handled by manual intervention. For resolvable content, the degree of wrinkling is assessed first: lightly wrinkled content remains clear and parsable, while the parsing accuracy of heavily wrinkled content falls below the average level. Clear, lightly wrinkled content is first corrected for tilt and inversion; the table purpose is then identified and compared against the labeled sample data;
(3) Tilted image
Where the scanned document or picture is tilted, the image is deskewed before analysis and then parsed by the scan and picture processing algorithm;
(4) Side-standing image
Where the scanned document or picture stands on its side, the image is rotated upright before analysis and then parsed by the scan and picture processing algorithm;
(5) Inverted image
Where the scanned document or picture is upside down, the image is rotated upright before analysis and then parsed by the scan and picture processing algorithm;
(6) Cross-page table merging
Where a table in the scanned document or picture is split across pages, the headers are compared first if headers exist, and the tables are merged according to the header content; if no header exists, the tables are merged according to table length and the number of segments;
(7) Table restoration
Cases of incomplete tables in the scanned document or picture include: both the beginning and the end are present, only the beginning is present, only the end is present, or no table is present at all. The table purpose is identified from the title of the text; after identification, matching and analysis are performed against the labeled sample results and the table is restored. Without a title, sample data matching is performed according to the labeling results, and the table is restored after matching. Without sample data, an early warning is raised for the table and manual intervention in the algorithm is performed.
Further, step S3 specifically comprises:
first, the information of the table structure is abstracted into row-column relationships between nodes: the character strings in the same column of the table constitute nodes with a 'same column' relationship, the character strings in the same row constitute nodes with a 'same row' relationship, and the digitally structured table is finally restored through the row-column relationships between the nodes;
next, a spatial relationship graph is constructed as an ε-neighbor graph: given the text information, position information, and picture information in a sample table data set, the ε nearest neighbor samples of each node x_i are found by Euclidean distance, and x_i is connected to each of them to form ε directed edges; every node in the space is processed in this way;
finally, a diffusion convolutional neural network is constructed: for the two text boxes indicated by each edge of the spatial relationship graph, text features, position features, and image features are acquired and modeled jointly, and a structural-position prediction is given for the two text boxes, from which the table is identified and extracted.
Further, the diffusion convolutional neural network regards graph convolution as a diffusion process: information is transferred from a node to its adjacent nodes with a certain transition probability, so that the information distribution reaches equilibrium after several rounds. The convolution operation of each layer is then expressed as:
H^(k) = f(W^(k) ⊙ P^k X)
where k denotes the layer index, P = D^(-1)A is the transition matrix, D is the node degree matrix, and A is the adjacency matrix; P^k determines the neighbor range observed by the convolution: k = 1 convolves over neighbor nodes at distance 1, and k = 2 over neighbor nodes at distance 2.
A system for the complex diagram extraction method based on computer vision and a graph convolution network comprises a labeling module, a training module, and an extraction module;
the labeling module realizes multi-event labeling of the same document through user-defined templates: first an index and a labeling template are created; then a labeling set is created and the files to be labeled are uploaded, with PDF and plain-text files supported; finally visual labeling is performed;
the training module dynamically updates and displays training progress and loss in a visual chart, and logs generated during training are fed back to a Web page in real time, making it convenient for algorithm staff to analyze and locate problems; after training, the precision, recall, and F1-Score of the overall and sub-indicators are output, and an indicator confusion matrix is generated;
the extraction module can immediately publish a successfully trained model as a directly callable HTTP model service, and the model service can be quickly verified by inputting text.
Compared with the prior art, the invention has the following beneficial effects:
1. the invention realizes end-to-end table detection, and the recall of its table detection is far higher than that of traditional table detection algorithms;
2. the method can generate a base model from a small number of annotations and provide algorithm-pre-labeled results; subsequent labeling tasks can select the trained model, and once a certain number of samples have accumulated they are fed back into algorithm training, further improving the generalization ability of the model; finally the document is rendered into an image, layout segmentation is performed visually, and the electronic document tables, table labels, and table extraction are displayed visually.
Drawings
FIG. 1 is a diagram of a full convolution neural network in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of an image with red seal occlusion according to an embodiment of the invention;
FIG. 3 illustrates the relationship between nodes after table abstraction according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a Diffusion Convolutional Neural Network (DCNN) according to an embodiment of the present invention;
FIG. 5 is a schematic illustration of visualization tagging in an embodiment of the present invention;
FIG. 6 is a graphical illustration of the dynamically updated visualization of training progress and loss in an embodiment of the invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
The invention provides a complex diagram extraction method based on computer vision and a diagram convolution network, which comprises the following steps:
step S1, rendering a document into an image, and performing layout segmentation by adopting a computer vision and deep learning technology;
referring to fig. 1, in the present embodiment, a full convolution neural network is used to identify each independent region in a document page, including a title, a paragraph, a table, an illustration, and a data diagram layout, and the full convolution neural network is trained using a pixel-level cross entropy loss function.
The fully convolutional network performs image semantic segmentation through convolution, deconvolution, and a skip-level structure, thereby realizing the layout segmentation. The main steps are as follows:
First, the RGB image is input into the convolutional neural network, and a series of feature maps are obtained through repeated convolution and pooling.
Next, the resolution is raised through upsampling; once the picture resolution matches the original picture, the regions with high weight are the regions where the targets lie.
Finally, the image is restored by combining the upsampled data with the feature maps from the earlier convolution-pooling layers (the skip-level structure).
While the feature map is still large during the early convolutions, the extracted image information is very rich, and the information loss in subsequent layers becomes more and more obvious. The last-layer feature map needs 32x upsampling to reach the original size, but upsampling the last layer alone still yields imprecise results, with some details inaccurate.
Therefore a skip-level connection method is adopted: the feature maps extracted by the first few convolution layers are connected to the corresponding later upsampling layers, added to them, and upsampling continues; after several rounds of upsampling a feature map of the same size as the original image is obtained.
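The skip-level decoding just described can be sketched in a few lines of PyTorch. The block below is a minimal illustration, not the patented network itself: the layer widths, the depth, and the five-class output (title, paragraph, table, illustration, data chart) are assumptions chosen to match the embodiment's description, and the pixel-level cross-entropy loss from this embodiment is shown at the end.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FCNSkip(nn.Module):
    """FCN-style layout segmenter: conv/pool encoder plus an upsampling
    decoder with skip connections from earlier feature maps (sizes illustrative)."""
    def __init__(self, num_classes=5):  # title/paragraph/table/illustration/data chart
        super().__init__()
        self.block1 = self._block(3, 64)     # 1/2 resolution
        self.block2 = self._block(64, 128)   # 1/4 resolution
        self.block3 = self._block(128, 256)  # 1/8 resolution
        self.score3 = nn.Conv2d(256, num_classes, 1)
        self.score2 = nn.Conv2d(128, num_classes, 1)
        self.score1 = nn.Conv2d(64, num_classes, 1)

    def _block(self, cin, cout):
        return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1),
                             nn.ReLU(inplace=True), nn.MaxPool2d(2))

    def forward(self, x):
        h, w = x.shape[2:]
        f1, f2, f3 = self.block1(x), None, None
        f2 = self.block2(f1)
        f3 = self.block3(f2)
        s = self.score3(f3)
        # upsample, then add the skip from the previous stage, twice
        s = F.interpolate(s, size=f2.shape[2:], mode='bilinear',
                          align_corners=False) + self.score2(f2)
        s = F.interpolate(s, size=f1.shape[2:], mode='bilinear',
                          align_corners=False) + self.score1(f1)
        # final upsampling back to the input resolution
        return F.interpolate(s, size=(h, w), mode='bilinear', align_corners=False)

# training uses a pixel-level cross-entropy loss, as the embodiment states
model = FCNSkip()
logits = model(torch.randn(1, 3, 256, 256))   # (1, 5, 256, 256)
target = torch.randint(0, 5, (1, 256, 256))   # per-pixel class labels
loss = nn.CrossEntropyLoss()(logits, target)
```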
Step S2: the segmented image is preprocessed. The preprocessing handles the following cases; a minimal OpenCV sketch follows the list:
(1) Red seal occlusion
Where a red seal occludes the content, a seal-removal operation is performed on the document, and character recognition is performed afterwards;
(2) Wrinkles
For a scanned document or picture with wrinkles, the wrinkle condition is classified as resolvable, partially resolvable, or unresolvable. Unresolvable content is not parsed; an alarm is raised on the parsing result and it is handled by manual intervention. For resolvable content, the degree of wrinkling is assessed first: lightly wrinkled content remains clear and parsable, while the parsing accuracy of heavily wrinkled content falls below the average level. Clear, lightly wrinkled content is first corrected for tilt and inversion; the table purpose is then identified and compared against the labeled sample data;
(3) Tilted image
Where the scanned document or picture is tilted, the image is deskewed before analysis and then parsed by the scan and picture processing algorithm;
(4) Side-standing image
Where the scanned document or picture stands on its side, the image is rotated upright before analysis and then parsed by the scan and picture processing algorithm;
(5) Inverted image
Where the scanned document or picture is upside down, the image is rotated upright before analysis and then parsed by the scan and picture processing algorithm;
(6) Cross-page table merging
Where a table in the scanned document or picture is split across pages, the headers are compared first if headers exist, and the tables are merged according to the header content; if no header exists, the tables are merged according to table length and the number of segments;
(7) Table restoration
Cases of incomplete tables in the scanned document or picture include: both the beginning and the end are present, only the beginning is present, only the end is present, or no table is present at all. The table purpose is identified from the title of the text; after identification, matching and analysis are performed against the labeled sample results and the table is restored. Without a title, sample data matching is performed according to the labeling results, and the table is restored after matching. Without sample data, an early warning is raised for the table and manual intervention in the algorithm is performed.
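The sketch below covers cases (1) and (3) through (6) with OpenCV. It is an illustration under assumptions, not the patent's algorithm: the HSV thresholds for the red seal, the ink-contour skew estimate, and the header-equality merge rule are stand-ins for whatever the embodiment actually uses.

```python
import cv2
import numpy as np

def remove_red_seal(bgr):
    """Case (1): mask red hues in HSV and inpaint them away before OCR.
    Two hue bands cover red's wrap-around; thresholds are illustrative."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (0, 80, 80), (10, 255, 255)) | \
           cv2.inRange(hsv, (170, 80, 80), (180, 255, 255))
    return cv2.inpaint(bgr, mask, 3, cv2.INPAINT_TELEA)

def deskew(gray):
    """Case (3): estimate the dominant skew from a minimum-area rectangle
    around the ink pixels and rotate the page upright."""
    thresh = cv2.threshold(gray, 0, 255,
                           cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU)[1]
    coords = np.column_stack(np.where(thresh > 0)).astype(np.float32)
    angle = cv2.minAreaRect(coords)[-1]
    if angle > 45:  # OpenCV >= 4.5 reports the angle in (0, 90]
        angle -= 90
    h, w = gray.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(gray, M, (w, h), flags=cv2.INTER_CUBIC,
                          borderMode=cv2.BORDER_REPLICATE)

# Cases (4) and (5): a side-standing or inverted page is squared up with
# cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE) or cv2.rotate(img, cv2.ROTATE_180).

def merge_cross_page(part1, part2):
    """Case (6), header branch: if the second fragment (a list of rows)
    repeats the header row of the first, drop it and append the rest."""
    if part1 and part2 and part2[0] == part1[0]:
        part2 = part2[1:]
    return part1 + part2
```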
Step S3: the topological structure of the preprocessed image is analyzed with a model based on the graph convolution network, and the tables are detected and extracted.
In this embodiment, the information of the table structure is abstracted into row-column relationships between nodes: the character strings in the same column of the table constitute nodes with a 'same column' relationship, and the character strings in the same row constitute nodes with a 'same row' relationship. The digitally structured table can finally be restored through the row-column relationships between the nodes, as shown in Table 1 and FIG. 3.
Next, a spatial relationship graph is constructed as an ε-neighbor graph: given the text information, position information, and picture information in a sample table data set, the ε nearest neighbor samples of each node x_i are found by Euclidean distance, and x_i is connected to each of them to form ε directed edges; every node in the space is processed in this way.
Finally, a diffusion convolutional neural network is constructed: for the two text boxes indicated by each edge of the spatial relationship graph, text features, position features, and image features are acquired and modeled jointly, and a structural-position prediction is given for the two text boxes, from which the table is identified and extracted.
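The ε-neighbor graph construction above can be sketched with a k-d tree. This assumes each text box has already been fused into a fixed-length vector of text, position, and picture features (the fusion itself is not shown), and uses eps = 8 purely as an example:

```python
import numpy as np
from scipy.spatial import cKDTree

def build_eps_neighbor_graph(node_feats, eps=8):
    """node_feats: (N, d) array, one fused text/position/picture vector per
    text box. Returns directed edges (i, j): each node points to its `eps`
    nearest neighbors by Euclidean distance."""
    tree = cKDTree(node_feats)
    # query k = eps + 1 because each node's nearest hit is itself
    _, idx = tree.query(node_feats, k=eps + 1)
    return [(i, int(j)) for i, row in enumerate(idx) for j in row[1:]]

feats = np.random.rand(40, 16)            # 40 text boxes, 16-d fused features
edges = build_eps_neighbor_graph(feats)   # 40 * 8 directed edges
```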
The diffusion convolutional neural network regards graph convolution as a diffusion process: information is transferred from a node to its adjacent nodes with a certain transition probability, so that the information distribution reaches equilibrium after several rounds. The convolution operation of each layer is then expressed as:
H^(k) = f(W^(k) ⊙ P^k X)
where k denotes the layer index, P = D^(-1)A is the transition matrix, D is the node degree matrix, and A is the adjacency matrix; P^k determines the neighbor range observed by the convolution: k = 1 convolves over neighbor nodes at distance 1, and k = 2 over neighbor nodes at distance 2. When computing the node features of each layer, the node features of every hop are concatenated into a matrix, and the features of each node at the different hops are then transformed through linear transformations, finally yielding the feature matrix of the whole graph or of each node, as shown in FIG. 4.
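A numpy sketch of the forward pass of the layer H^(k) = f(W^(k) ⊙ P^k X) defined above, with ReLU standing in for f and random vectors standing in for the learned weights W^(k); following the diffusion-convolution formulation, the per-hop features are concatenated at the end, as the text describes.

```python
import numpy as np

def diffusion_conv_forward(X, A, K, seed=0):
    """X: (N, F) node features, A: (N, N) adjacency matrix, K: number of
    diffusion hops. Hop k applies an elementwise weight to P^k X, where
    P = D^{-1} A is the transition matrix from the equation above."""
    rng = np.random.default_rng(seed)
    deg = np.maximum(A.sum(axis=1), 1e-12)   # node degrees (avoid div by 0)
    P = A / deg[:, None]                     # P = D^{-1} A
    PkX, hops = X, []
    for k in range(1, K + 1):
        PkX = P @ PkX                          # P^k X: k-hop diffused features
        W_k = rng.standard_normal(X.shape[1])  # learnable per-hop weights (random here)
        hops.append(np.maximum(W_k * PkX, 0.0))  # f = ReLU, ⊙ = elementwise product
    # concatenate the per-hop node features into one matrix, as in the text
    return np.concatenate(hops, axis=1)        # (N, K * F)

A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # 3-node path graph
H = diffusion_conv_forward(np.eye(3), A, K=2)                 # (3, 6)
```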
In this embodiment, referring to FIG. 5 and FIG. 6, a system for the complex diagram extraction method based on computer vision and a graph convolution network is further provided, comprising a labeling module, a training module, and an extraction module;
the labeling module realizes multi-event labeling of the same document through user-defined templates: first an index and a labeling template are created; then a labeling set is created and the files to be labeled are uploaded, with PDF and plain-text files supported; finally visual labeling is performed;
the training module dynamically updates and displays training progress and loss in a visual chart, and logs generated during training are fed back to a Web page in real time, making it convenient for algorithm staff to analyze and locate problems; after training, the precision, recall, and F1-Score of the overall and sub-indicators are output, and an indicator confusion matrix is generated;
the extraction module can immediately publish a successfully trained model as a directly callable HTTP model service, and the model service can be quickly verified by inputting text.
The above description is only a preferred embodiment of the present invention; all equivalent changes and modifications made within the scope of the claims of the present invention shall be covered by the present invention.

Claims (8)

1. A complex diagram extraction method based on computer vision and graph convolution network is characterized by comprising the following steps:
step S1, rendering a document into an image, and performing layout segmentation by adopting a computer vision and deep learning technology;
step S2, preprocessing the segmented image;
and step S3, analyzing the topological structure of the preprocessed image with a model based on the graph convolution network, and detecting and extracting the tables.
2. The complex diagram extraction method based on computer vision and a graph convolution network according to claim 1, wherein in step S1 a fully convolutional neural network is adopted to identify each independent region in the document page, including the title, paragraph, table, illustration, and data-chart layouts.
3. The complex diagram extraction method based on computer vision and a graph convolution network according to claim 2, wherein the fully convolutional neural network performs image semantic segmentation through convolution, deconvolution, and a skip-level structure, specifically as follows:
the image is input into the convolutional neural network, and a series of feature maps are obtained through repeated convolution and pooling;
the resolution is then raised through upsampling; once the picture resolution matches the original picture, the regions with high weight are the regions where the targets lie;
finally, the image is restored by combining the upsampled data with the feature maps from the earlier convolution-pooling layers.
4. The complex diagram extraction method based on computer vision and a graph convolution network according to claim 3, wherein the fully convolutional neural network adopts a skip-level connection method: the feature maps extracted by the first few convolution layers are connected to the corresponding later upsampling layers and added to them, and upsampling then continues.
5. The complex diagram extraction method based on computer vision and a graph convolution network according to claim 1, wherein the preprocessing comprises:
(1) Red seal occlusion
Where a red seal occludes the content, a seal-removal operation is performed on the document, and character recognition is performed afterwards;
(2) Wrinkles
For a scanned document or picture with wrinkles, the wrinkle condition is classified as resolvable, partially resolvable, or unresolvable. Unresolvable content is not parsed; an alarm is raised on the parsing result and it is handled by manual intervention. For resolvable content, the degree of wrinkling is assessed first: lightly wrinkled content remains clear and parsable, while the parsing accuracy of heavily wrinkled content falls below the average level. Clear, lightly wrinkled content is first corrected for tilt and inversion; the table purpose is then identified and compared against the labeled sample data;
(3) Tilted image
Where the scanned document or picture is tilted, the image is deskewed before analysis and then parsed by the scan and picture processing algorithm;
(4) Side-standing image
Where the scanned document or picture stands on its side, the image is rotated upright before analysis and then parsed by the scan and picture processing algorithm;
(5) Inverted image
Where the scanned document or picture is upside down, the image is rotated upright before analysis and then parsed by the scan and picture processing algorithm;
(6) Cross-page table merging
Where a table in the scanned document or picture is split across pages, the headers are compared first if headers exist, and the tables are merged according to the header content; if no header exists, the tables are merged according to table length and the number of segments;
(7) Table restoration
Cases of incomplete tables in the scanned document or picture include: both the beginning and the end are present, only the beginning is present, only the end is present, or no table is present at all. The table purpose is identified from the title of the text; after identification, matching and analysis are performed against the labeled sample results and the table is restored. Without a title, sample data matching is performed according to the labeling results, and the table is restored after matching. Without sample data, an early warning is raised for the table and manual intervention in the algorithm is performed.
6. The complex diagram extraction method based on computer vision and a graph convolution network according to claim 1, wherein step S3 specifically comprises:
first, abstracting the information of the table structure into row-column relationships between nodes: the character strings in the same column of the table constitute nodes with a 'same column' relationship, the character strings in the same row constitute nodes with a 'same row' relationship, and the digitally structured table is finally restored through the row-column relationships between the nodes;
next, constructing a spatial relationship graph as an ε-neighbor graph: given the text information, position information, and picture information in a sample table data set, the ε nearest neighbor samples of each node x_i are found by Euclidean distance, and x_i is connected to each of them to form ε directed edges; every node in the space is processed in this way;
finally, constructing a diffusion convolutional neural network: for the two text boxes indicated by each edge of the spatial relationship graph, text features, position features, and image features are acquired and modeled jointly, and a structural-position prediction is given for the two text boxes, from which the table is identified and extracted.
7. The complex diagram extraction method based on computer vision and a graph convolution network according to claim 6, wherein the diffusion convolutional neural network regards graph convolution as a diffusion process: information is transferred from a node to its adjacent nodes with a certain transition probability, so that the information distribution reaches equilibrium after several rounds, and the convolution operation of each layer is expressed as:
H^(k) = f(W^(k) ⊙ P^k X)
where k denotes the layer index, P = D^(-1)A is the transition matrix, D is the node degree matrix, and A is the adjacency matrix; P^k determines the neighbor range observed by the convolution: k = 1 convolves over neighbor nodes at distance 1, and k = 2 over neighbor nodes at distance 2.
8. A system for the complex diagram extraction method based on computer vision and a graph convolution network, characterized by comprising a labeling module, a training module, and an extraction module;
the labeling module realizes multi-event labeling of the same document through user-defined templates: first an index and a labeling template are created; then a labeling set is created and the files to be labeled are uploaded, with PDF and plain-text files supported; finally visual labeling is performed;
the training module dynamically updates and displays training progress and loss in a visual chart, and logs generated during training are fed back to a Web page in real time, making it convenient for algorithm staff to analyze and locate problems; after training, the precision, recall, and F1-Score of the overall and sub-indicators are output, and an indicator confusion matrix is generated;
the extraction module can immediately publish a successfully trained model as a directly callable HTTP model service, and the model service can be quickly verified by inputting text.
CN202210667214.9A (filed 2022-06-14, priority 2022-06-14) | CN115527223A (pending) | Complex diagram extraction method and system based on computer vision and graph convolution network

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210667214.9A | 2022-06-14 | 2022-06-14 | Complex diagram extraction method and system based on computer vision and graph convolution network

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210667214.9A | 2022-06-14 | 2022-06-14 | Complex diagram extraction method and system based on computer vision and graph convolution network

Publications (1)

Publication Number | Publication Date
CN115527223A | 2022-12-27

Family

ID=84696314

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210667214.9A | Complex diagram extraction method and system based on computer vision and graph convolution network (pending) | 2022-06-14 | 2022-06-14

Country Status (1)

Country Link
CN (1) CN115527223A (en)

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination