CN112381082A - Table structure reconstruction method based on deep learning
- Publication number
- CN112381082A
- Authority
- CN
- China
- Prior art keywords
- image
- reconstructed
- training
- deep learning
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Abstract
The invention discloses a table structure reconstruction method based on deep learning, comprising the following steps: acquiring a training image on which a table is displayed; preprocessing the training image; extracting a feature map from the preprocessed image; performing learning and parameter updating with the feature map to obtain a table line classification and positioning model; acquiring an image to be reconstructed, on which the table to be reconstructed is displayed; obtaining the structure information of the table to be reconstructed according to the table line classification and positioning model; performing character recognition and image object detection on the image to be reconstructed to obtain the table content information; and matching the structure information of the table to be reconstructed with the table content information to reconstruct the table. The invention can train the network on relatively little data while still learning stable and accurate feature information, which greatly improves the accuracy of table line extraction when data samples are scarce; the algorithm also generalizes well and is robust.
Description
Technical Field
The invention relates to the field of image processing and pattern recognition, and in particular to a table structure reconstruction method based on image processing and deep learning.
Background
As a common document format, tables appear frequently in everyday life, for example in resumes, registration forms, and financial statements. Table styles are highly variable, each with its own layout characteristics; however, when people need to use a table they often have to build a new table style themselves, which is time-consuming.
At present there is no mature automation scheme that can help or assist users to quickly copy or edit a table. Early table restoration and table recognition methods were mostly traditional image-processing schemes built mainly on line regression or the Hough transform, and they struggle to handle the variety of table styles and scenes.
Existing table reconstruction schemes therefore generalize poorly and suit only a narrow range of environments.
Disclosure of Invention
The invention aims to provide a table structure reconstruction method based on deep learning that requires no user interaction throughout the reconstruction process, is applicable to tables in a wide variety of styles and scenes, and supports cross-platform use.
To achieve this technical purpose, the invention adopts the following technical scheme:
A table structure reconstruction method based on deep learning, comprising:
Acquiring a training image, wherein a table is displayed on the training image;
preprocessing a training image;
extracting a feature map of the preprocessed image;
performing learning and parameter updating with the feature map to obtain a table line classification and positioning model;
acquiring an image to be reconstructed for table reconstruction, wherein the image to be reconstructed displays a table to be reconstructed;
obtaining structural information of a table to be reconstructed according to the image to be reconstructed;
performing character recognition and image object detection on the image to be reconstructed to obtain table content information;
and matching the structure information of the table to be reconstructed with the table content information to reconstruct the table.
Preferably, preprocessing the training image includes:
performing data augmentation on the training image to generate augmented training data;
mixing the augmented training data with real table data, and then performing normalization;
and acquiring a label of each training image, wherein the label comprises the relative position relation of each pixel point relative to the table structure.
Preferably, the extracting the feature map of the preprocessed image includes:
inputting the preprocessed image data and the labels into a segmentation network based on deep learning to train and extract a feature map;
the segmentation network employs an improved U-Net, which is trained with a weighted cross-entropy loss function.
Preferably, the obtaining of the structural information of the table to be reconstructed according to the image to be reconstructed includes:
sending the image to be reconstructed into the segmentation network to obtain a pixel probability matrix of table line classification;
generating a table line binary image according to the obtained pixel probability matrix;
extracting cell intersections according to the table line binary image;
and combining the rows and the columns of the cells according to the cell intersection points to obtain the structural information of the table to be reconstructed.
Preferably, generating the table line binary image from the pixel probability matrix includes mapping the matrix, specifically:
setting a probability threshold; if the probability that a pixel belongs to a horizontal or vertical table line exceeds the threshold, the corresponding pixel is mapped to 255, otherwise to 0, thereby segmenting the table lines.
The deep-learning-based table structure reconstruction algorithm converts a table image, or a table region that cannot be edited, into an editable table by combining deep learning with image processing, so that a user can quickly copy or modify the structure or content of a table without redrawing it, and so that subsequent extraction of the table's key content is prepared for.
Compared with the prior art, the method can train the network with relatively little data while still learning stable and accurate feature information, which greatly improves the accuracy of table line extraction when data samples are scarce. The algorithm generalizes well: it still recovers blurred, perspective-distorted, and tilted tables to a considerable degree, and it is robust. Tested on about 1000 images, the method reconstructs cells with an accuracy above 95%.
Drawings
FIG. 1 is a flow chart of a table structure reconstruction method based on deep learning;
FIG. 2 is a flow diagram of training image preprocessing;
FIG. 3 is a flowchart for acquiring the structure information of the table to be reconstructed.
Detailed Description
To help those skilled in the art understand the invention, it is further described below with reference to embodiments and drawings, which are not intended to limit it.
As shown in FIG. 1, the table structure reconstruction method based on deep learning includes a training process and an inference process.
Training process:
Training collects samples and establishes the corresponding reconstruction model, in the following steps:
S1: Collect training images. A training image may come from a real form image on the web or a screenshot of a PDF or Word table; it may be shot with a camera or captured on a computer, but it must contain a table.
S2: Preprocess the collected training images to facilitate sample training. The preprocessing proceeds as follows:
S21: Perform data augmentation on the training images to generate augmented training data.
Data augmentation is a common technique in deep learning. It is used mainly to enlarge the training data set and make it as diverse as possible, so that the trained model generalizes more strongly. By enriching the relevant variation in the data set, augmentation keeps the network from learning irrelevant features and lets it learn more of the properties the data actually carries, noticeably improving overall performance. In practice, not every augmentation method suits the data being trained on; the appropriate methods must be chosen according to the characteristics of the current training set.
This embodiment augments the data by flipping, noise injection, color jitter, blurring, and similar transforms, as sketched below.
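For illustration only, the following is a minimal sketch of such an augmentation step using NumPy and OpenCV; the probabilities, noise level, and jitter range are assumed values for demonstration and are not specified by the patent.

```python
import cv2
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly apply flip / noise / color jitter / blur to a BGR image."""
    if rng.random() < 0.5:  # horizontal flip
        img = cv2.flip(img, 1)
    if rng.random() < 0.5:  # additive Gaussian noise
        noise = rng.normal(0.0, 8.0, img.shape)
        img = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    if rng.random() < 0.5:  # simple color jitter: rescale each channel
        scale = rng.uniform(0.9, 1.1, size=3)
        img = np.clip(img.astype(np.float32) * scale, 0, 255).astype(np.uint8)
    if rng.random() < 0.5:  # mild Gaussian blur
        img = cv2.GaussianBlur(img, (3, 3), 0)
    return img
```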
S22: Mix the augmented training data from step S21 with the real table data to form the training image set, then normalize the set.
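A minimal normalization sketch follows; the per-image mean/std statistics are an assumption, since the patent does not state which normalization scheme is used.

```python
def normalize(img: np.ndarray) -> np.ndarray:
    """Scale a uint8 image to [0, 1] and standardize it (per-image statistics assumed)."""
    x = img.astype(np.float32) / 255.0
    return (x - x.mean()) / (x.std() + 1e-8)  # epsilon guards against flat images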
S23: Obtain a label for each training image; the label records the position of each pixel relative to the table structure.
S3: Extract the feature map of the preprocessed images, specifically as follows:
S31: Feed the preprocessed image data and the labels from step S23 into a deep-learning segmentation network, which is trained to extract the feature map.
A common segmentation network is the U-Net model: an encoder extracts features from the image data, the extracted feature maps are up-sampled and fused, and the classification probabilities and pixel coordinates are output.
This application improves on the U-Net segmentation network by replacing its original loss with a weighted cross-entropy loss function.
S4: Learn and update parameters with the feature map to obtain the table line classification and positioning model.
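As a hedged sketch of the training objective in steps S31 and S4, the PyTorch fragment below applies a pixel-wise weighted cross-entropy loss and one parameter update. The three-class scheme (background, horizontal line, vertical line), the class weights, and the `model` stand-in for the modified U-Net are assumptions; the patent fixes only the weighted cross-entropy choice.

```python
import torch
import torch.nn as nn

# Class weights up-weight the sparse table-line pixels (values are assumed).
criterion = nn.CrossEntropyLoss(weight=torch.tensor([0.1, 1.0, 1.0]))

def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               images: torch.Tensor, labels: torch.Tensor) -> float:
    """One update: images (N,3,H,W) float, labels (N,H,W) long in {0,1,2}."""
    optimizer.zero_grad()
    logits = model(images)            # (N, 3, H, W) per-pixel class scores
    loss = criterion(logits, labels)  # weighted cross-entropy over all pixels
    loss.backward()
    optimizer.step()
    return loss.item()
```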
Inference process:
The inference process recognizes the table in the image to be reconstructed using the trained reconstruction model, in the following steps:
S5: Acquire the image to be reconstructed, which may be shot with a camera or captured on a computer; it shows the table to be reconstructed.
S6: Obtain the structure information of the table to be reconstructed from the image to be reconstructed, specifically as follows:
S61: Feed the image to be reconstructed into the segmentation network used in S31 to obtain the pixel probability matrix for table line classification.
S62: Generate the table line binary image from the pixel probability matrix, i.e., map the matrix, specifically:
Set a probability threshold, for example 0.5. If a pixel's table line probability exceeds 0.5, map the corresponding pixel to 255; otherwise map it to 0. Mapping every pixel yields a binary image of the same size as the original.
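In code, this mapping is a single threshold over the probability matrix; the sketch below assumes a single-channel probability map.

```python
import numpy as np

def to_binary(prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """prob: (H, W) probability that each pixel lies on a table line."""
    return np.where(prob > threshold, 255, 0).astype(np.uint8)
```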
S63: Extract the positions of the cell intersections, such as the coordinates of the four vertices of each cell, from the table line binary image.
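The patent does not prescribe how the intersections are extracted; one plausible sketch, assuming the segmentation yields separate horizontal-line and vertical-line binary masks (as the horizontal/vertical classification above suggests), intersects the two masks and takes the centroid of each resulting blob.

```python
import cv2
import numpy as np

def intersections(h_mask: np.ndarray, v_mask: np.ndarray) -> list[tuple[int, int]]:
    """Return (x, y) points where horizontal and vertical table lines cross."""
    cross = cv2.bitwise_and(h_mask, v_mask)  # pixels on both a row line and a column line
    n, _, _, centroids = cv2.connectedComponentsWithStats(cross)
    return [(int(x), int(y)) for x, y in centroids[1:]]  # component 0 is the background
```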
S64: Combine the rows and columns of the cells according to the cell intersections to obtain the structure information of the table to be reconstructed.
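A sketch of this row/column combination follows: nearby intersection coordinates are clustered into shared grid lines. The snapping tolerance is an assumed parameter.

```python
def grid_positions(points: list[tuple[int, int]], tol: int = 5) -> tuple[list[int], list[int]]:
    """Cluster intersection coordinates into column (x) and row (y) boundaries."""
    def cluster(values: list[int]) -> list[int]:
        if not values:
            return []
        values = sorted(values)
        groups, current = [], [values[0]]
        for v in values[1:]:
            if v - current[-1] <= tol:
                current.append(v)  # same grid line, within tolerance
            else:
                groups.append(sum(current) // len(current))
                current = [v]
        groups.append(sum(current) // len(current))
        return groups
    return cluster([p[0] for p in points]), cluster([p[1] for p in points])
```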
S7: Perform character recognition and image object detection on the image to be reconstructed to obtain the table content information.
The character recognition can use any common OCR algorithm to recognize the character information and image information in the image to be reconstructed.
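As a sketch of the text half of this step (image object detection is omitted), the pytesseract wrapper around Tesseract OCR can read one cell at a time; this particular library choice is an assumption, since the patent only calls for "a common OCR algorithm".

```python
import pytesseract
from PIL import Image

def cell_text(page: Image.Image, box: tuple[int, int, int, int]) -> str:
    """box = (left, top, right, bottom) pixel coordinates of one cell."""
    return pytesseract.image_to_string(page.crop(box)).strip()
```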
S8: Match the structure information of the table to be reconstructed with the table content information and reconstruct the table, specifically:
Describe the table's structure information and content information in a general table description language, such as an XML file that can be opened and edited in Word, Excel, or WPS Office. The basic building block of a table is the individual cell: the cell vertex coordinates obtained above locate the row and column of each extracted cell, and the character and image information obtained by recognition is then filled into the corresponding cells.
It should be noted that in this embodiment both the training images and the image to be reconstructed are RGB images.
The table structure reconstruction method based on deep learning provided by this application has been described in detail above. The description of the specific embodiments is intended only to aid understanding of the method and its core concepts. It should also be noted that those skilled in the art can make improvements and modifications without departing from the principle of this application, and such improvements and modifications likewise fall within the protective scope of its claims.
Claims (5)
1. A table structure reconstruction method based on deep learning, characterized by comprising:
Acquiring a training image, wherein a table is displayed on the training image;
preprocessing a training image;
extracting a feature map of the preprocessed image;
performing learning and parameter updating with the feature map to obtain a table line classification and positioning model;
acquiring an image to be reconstructed for table reconstruction, wherein the image to be reconstructed displays a table to be reconstructed;
obtaining structural information of a table to be reconstructed according to the table line classification and positioning model;
performing character recognition and image object detection on the image to be reconstructed to obtain table content information;
and matching the structure information of the table to be reconstructed with the table content information to reconstruct the table.
2. The table structure reconstruction method based on deep learning of claim 1, wherein preprocessing the training image comprises:
performing data augmentation on the training image to generate augmented training data;
mixing the augmented training data with real table data, and then performing normalization;
and acquiring a label of each training image, wherein the label comprises the relative position relation of each pixel point relative to the table structure.
3. The table structure reconstruction method based on deep learning of claim 2, wherein the extracting the feature map of the preprocessed image comprises:
inputting the preprocessed image data and the labels into a segmentation network based on deep learning to train and extract a feature map;
the segmentation network employs an improved U-Net that is trained with a weighted cross-entropy loss function.
4. The table structure reconstruction method based on deep learning of claim 3, wherein obtaining the structure information of the table to be reconstructed according to the image to be reconstructed comprises:
sending the image to be reconstructed into the segmentation network to obtain a pixel probability matrix of table line classification;
generating a table line binary image according to the obtained pixel probability matrix;
extracting cell intersections according to the table line binary image;
and combining the rows and the columns of the cells according to the cell intersection points to obtain the structural information of the table to be reconstructed.
5. The table structure reconstruction method based on deep learning of claim 4, wherein generating the table line binary image from the pixel probability matrix includes mapping the matrix, specifically:
setting a probability threshold; if the probability that a pixel belongs to a horizontal or vertical table line exceeds the threshold, the corresponding pixel is mapped to 255, otherwise to 0, thereby segmenting the horizontal and vertical table lines.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011280981.1A CN112381082A (en) | 2020-11-16 | 2020-11-16 | Table structure reconstruction method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112381082A true CN112381082A (en) | 2021-02-19 |
Family
ID=74584795
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011280981.1A Pending CN112381082A (en) | 2020-11-16 | 2020-11-16 | Table structure reconstruction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112381082A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN114463766A | 2021-07-16 | 2022-05-10 | 荣耀终端有限公司 | Form processing method and electronic equipment
CN113610043A | 2021-08-19 | 2021-11-05 | 海默潘多拉数据科技(深圳)有限公司 | Industrial drawing table structured recognition method and system
JP7268115B1 | 2021-11-09 | 2023-05-02 | 西松建設株式会社 | Rebar arrangement list reader, list reader, bar arrangement list reading method and program
JP2023070290A | 2021-11-09 | 2023-05-19 | 西松建設株式会社 | Rebar arrangement list reader, list reader, rebar arrangement list reading method and program
CN113850249A | 2021-12-01 | 2021-12-28 | 深圳市迪博企业风险管理技术有限公司 | Method for formatting and extracting chart information
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210219