CN112381082A - Table structure reconstruction method based on deep learning
- Publication number
- CN112381082A
- Authority
- CN
- China
- Prior art keywords
- image
- reconstructed
- training
- deep learning
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/22—Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Abstract
The invention discloses a table structure reconstruction method based on deep learning, comprising the following steps: acquiring a training image on which a table is displayed; preprocessing the training image; extracting a feature map from the preprocessed image; performing learning and parameter updating with the feature map to obtain a table line classification and positioning model; acquiring an image to be reconstructed, on which the table to be reconstructed is displayed; obtaining the structure information of the table to be reconstructed according to the table line classification and positioning model; performing character recognition and image object detection on the image to be reconstructed to obtain the table content information; and matching the structure information of the table to be reconstructed with the table content information to reconstruct the table. The invention can train the network on relatively little data while still learning stable and accurate feature information, which greatly improves the accuracy of table line extraction when data samples are scarce; the algorithm also generalizes well and is robust.
Description
Technical Field
The invention relates to the field of image processing and pattern recognition, and in particular to a table structure reconstruction method based on image processing and deep learning.
Background
As a common document format, tables appear frequently in everyday life, for example in resumes, registration forms, and financial statements. Table styles are highly variable, each with its own layout characteristics; however, when people need to use a table they often have to build a new table style themselves, which is time-consuming.
At present there is no mature automation scheme that can help or assist users to quickly copy or edit a table. Early table restoration and table recognition methods were mostly traditional image-processing schemes built mainly on line regression or the Hough transform, and they struggle to handle the variety of table styles and scenes.
Existing table reconstruction schemes therefore generalize poorly and suit only a narrow range of environments.
Disclosure of Invention
The invention aims to provide a table structure reconstruction method based on deep learning that requires no user interaction throughout the reconstruction process, is applicable to tables in a wide variety of styles and scenes, and supports cross-platform use.
To achieve this technical purpose, the invention adopts the following technical scheme:
A table structure reconstruction method based on deep learning, comprising:
Acquiring a training image, wherein a table is displayed on the training image;
preprocessing a training image;
extracting a feature map of the preprocessed image;
performing learning and parameter updating with the feature map to obtain a table line classification and positioning model;
acquiring an image to be reconstructed for table reconstruction, wherein the image to be reconstructed displays a table to be reconstructed;
obtaining structural information of a table to be reconstructed according to the image to be reconstructed;
performing character recognition and image object detection on the image to be reconstructed to obtain table content information;
and matching the structure information of the table to be reconstructed with the table content information to reconstruct the table.
Preferably, preprocessing the training image includes:
performing data augmentation on the training image to generate augmented training data;
mixing the augmented training data with real table data, and then performing normalization;
and acquiring a label of each training image, wherein the label comprises the relative position relation of each pixel point relative to the table structure.
Preferably, the extracting the feature map of the preprocessed image includes:
inputting the preprocessed image data and the labels into a segmentation network based on deep learning to train and extract a feature map;
the segmentation network employs an improved U-Net, which is trained with a weighted cross-entropy loss function.
Preferably, the obtaining of the structural information of the table to be reconstructed according to the image to be reconstructed includes:
sending the image to be reconstructed into the segmentation network to obtain a pixel probability matrix of table line classification;
generating a table line binary image according to the obtained pixel probability matrix;
extracting cell intersections according to the table line binary image;
and combining the rows and the columns of the cells according to the cell intersection points to obtain the structural information of the table to be reconstructed.
Preferably, generating the table line binary image from the pixel probability matrix includes mapping the matrix, specifically:
setting a probability threshold; if the probability that a pixel belongs to a horizontal or vertical table line exceeds the threshold, the corresponding pixel is mapped to 255, otherwise to 0, thereby segmenting the table lines.
The deep-learning-based table structure reconstruction algorithm converts a table image, or a table region that cannot be edited, into an editable table by combining deep learning with image processing, so that a user can quickly copy or modify the structure or content of a table without redrawing it, and so that subsequent extraction of the table's key content is prepared for.
Compared with the prior art, the method can train the network with relatively little data while still learning stable and accurate feature information, which greatly improves the accuracy of table line extraction when data samples are scarce. The algorithm generalizes well: it still recovers blurred, perspective-distorted, and tilted tables to a considerable degree, and it is robust. Tested on about 1000 images, the method reconstructs cells with an accuracy above 95%.
Drawings
FIG. 1 is a flow chart of a table structure reconstruction method based on deep learning;
FIG. 2 is a flow diagram of training image preprocessing;
FIG. 3 is a flowchart for acquiring the structure information of the table to be reconstructed.
Detailed Description
To help those skilled in the art understand the invention, it is further described below with reference to embodiments and drawings, which are not intended to limit it.
As shown in FIG. 1, the table structure reconstruction method based on deep learning includes a training process and an inference process.
Training process:
Training collects samples and establishes the corresponding reconstruction model, in the following steps:
S1: Collect training images. A training image may come from a real form image on the web or a screenshot of a PDF or Word table; it may be shot with a camera or captured on a computer, but it must contain a table.
S2: Preprocess the collected training images to facilitate sample training. The preprocessing proceeds as follows:
S21: Perform data augmentation on the training images to generate augmented training data.
Data augmentation is a common technique in deep learning. It is used mainly to enlarge the training data set and make it as diverse as possible, so that the trained model generalizes more strongly. By enriching the relevant variation in the data set, augmentation keeps the network from learning irrelevant features and lets it learn more of the properties the data actually carries, noticeably improving overall performance. In practice, not every augmentation method suits the data being trained on; the appropriate methods must be chosen according to the characteristics of the current training set.
This embodiment augments the data by flipping, noise injection, color jitter, blurring, and similar transforms, as sketched below.
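For illustration only, the following is a minimal sketch of such an augmentation step using NumPy and OpenCV; the probabilities, noise level, and jitter range are assumed values for demonstration and are not specified by the patent.

```python
import cv2
import numpy as np

def augment(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Randomly apply flip / noise / color jitter / blur to a BGR image."""
    if rng.random() < 0.5:  # horizontal flip
        img = cv2.flip(img, 1)
    if rng.random() < 0.5:  # additive Gaussian noise
        noise = rng.normal(0.0, 8.0, img.shape)
        img = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    if rng.random() < 0.5:  # simple color jitter: rescale each channel
        scale = rng.uniform(0.9, 1.1, size=3)
        img = np.clip(img.astype(np.float32) * scale, 0, 255).astype(np.uint8)
    if rng.random() < 0.5:  # mild Gaussian blur
        img = cv2.GaussianBlur(img, (3, 3), 0)
    return img
```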
S22: Mix the augmented training data from step S21 with the real table data to form the training image set, then normalize the set.
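A minimal normalization sketch follows; the per-image mean/std statistics are an assumption, since the patent does not state which normalization scheme is used.

```python
def normalize(img: np.ndarray) -> np.ndarray:
    """Scale a uint8 image to [0, 1] and standardize it (per-image statistics assumed)."""
    x = img.astype(np.float32) / 255.0
    return (x - x.mean()) / (x.std() + 1e-8)  # epsilon guards against flat images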
S23: Obtain a label for each training image; the label records the position of each pixel relative to the table structure.
S3: Extract the feature map of the preprocessed images, specifically as follows:
S31: Feed the preprocessed image data and the labels from step S23 into a deep-learning segmentation network, which is trained to extract the feature map.
A common segmentation network is the U-Net model: an encoder extracts features from the image data, the extracted feature maps are up-sampled and fused, and the classification probabilities and pixel coordinates are output.
This application improves on the U-Net segmentation network by replacing its original loss with a weighted cross-entropy loss function.
S4: Learn and update parameters with the feature map to obtain the table line classification and positioning model.
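As a hedged sketch of the training objective in steps S31 and S4, the PyTorch fragment below applies a pixel-wise weighted cross-entropy loss and one parameter update. The three-class scheme (background, horizontal line, vertical line), the class weights, and the `model` stand-in for the modified U-Net are assumptions; the patent fixes only the weighted cross-entropy choice.

```python
import torch
import torch.nn as nn

# Class weights up-weight the sparse table-line pixels (values are assumed).
criterion = nn.CrossEntropyLoss(weight=torch.tensor([0.1, 1.0, 1.0]))

def train_step(model: nn.Module, optimizer: torch.optim.Optimizer,
               images: torch.Tensor, labels: torch.Tensor) -> float:
    """One update: images (N,3,H,W) float, labels (N,H,W) long in {0,1,2}."""
    optimizer.zero_grad()
    logits = model(images)            # (N, 3, H, W) per-pixel class scores
    loss = criterion(logits, labels)  # weighted cross-entropy over all pixels
    loss.backward()
    optimizer.step()
    return loss.item()
```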
Inference process:
The inference process recognizes the table in the image to be reconstructed using the trained reconstruction model, in the following steps:
S5: Acquire the image to be reconstructed, which may be shot with a camera or captured on a computer; it shows the table to be reconstructed.
S6: Obtain the structure information of the table to be reconstructed from the image to be reconstructed, specifically as follows:
S61: Feed the image to be reconstructed into the segmentation network used in S31 to obtain the pixel probability matrix for table line classification.
S62: Generate the table line binary image from the pixel probability matrix, i.e., map the matrix, specifically:
Set a probability threshold, for example 0.5. If a pixel's table line probability exceeds 0.5, map the corresponding pixel to 255; otherwise map it to 0. Mapping every pixel yields a binary image of the same size as the original.
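In code, this mapping is a single threshold over the probability matrix; the sketch below assumes a single-channel probability map.

```python
import numpy as np

def to_binary(prob: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """prob: (H, W) probability that each pixel lies on a table line."""
    return np.where(prob > threshold, 255, 0).astype(np.uint8)
```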
S63: Extract the positions of the cell intersections, such as the coordinates of the four vertices of each cell, from the table line binary image.
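The patent does not prescribe how the intersections are extracted; one plausible sketch, assuming the segmentation yields separate horizontal-line and vertical-line binary masks (as the horizontal/vertical classification above suggests), intersects the two masks and takes the centroid of each resulting blob.

```python
import cv2
import numpy as np

def intersections(h_mask: np.ndarray, v_mask: np.ndarray) -> list[tuple[int, int]]:
    """Return (x, y) points where horizontal and vertical table lines cross."""
    cross = cv2.bitwise_and(h_mask, v_mask)  # pixels on both a row line and a column line
    n, _, _, centroids = cv2.connectedComponentsWithStats(cross)
    return [(int(x), int(y)) for x, y in centroids[1:]]  # component 0 is the background
```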
S64: Combine the rows and columns of the cells according to the cell intersections to obtain the structure information of the table to be reconstructed.
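A sketch of this row/column combination follows: nearby intersection coordinates are clustered into shared grid lines. The snapping tolerance is an assumed parameter.

```python
def grid_positions(points: list[tuple[int, int]], tol: int = 5) -> tuple[list[int], list[int]]:
    """Cluster intersection coordinates into column (x) and row (y) boundaries."""
    def cluster(values: list[int]) -> list[int]:
        if not values:
            return []
        values = sorted(values)
        groups, current = [], [values[0]]
        for v in values[1:]:
            if v - current[-1] <= tol:
                current.append(v)  # same grid line, within tolerance
            else:
                groups.append(sum(current) // len(current))
                current = [v]
        groups.append(sum(current) // len(current))
        return groups
    return cluster([p[0] for p in points]), cluster([p[1] for p in points])
```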
S7: Perform character recognition and image object detection on the image to be reconstructed to obtain the table content information.
The character recognition can use any common OCR algorithm to recognize the character information and image information in the image to be reconstructed.
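As a sketch of the text half of this step (image object detection is omitted), the pytesseract wrapper around Tesseract OCR can read one cell at a time; this particular library choice is an assumption, since the patent only calls for "a common OCR algorithm".

```python
import pytesseract
from PIL import Image

def cell_text(page: Image.Image, box: tuple[int, int, int, int]) -> str:
    """box = (left, top, right, bottom) pixel coordinates of one cell."""
    return pytesseract.image_to_string(page.crop(box)).strip()
```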
S8: Match the structure information of the table to be reconstructed with the table content information and reconstruct the table, specifically:
Describe the table's structure information and content information in a general table description language, such as an XML file that can be opened and edited in Word, Excel, or WPS Office. The basic building block of a table is the individual cell: the cell vertex coordinates obtained above locate the row and column of each extracted cell, and the character and image information obtained by recognition is then filled into the corresponding cells.
It should be noted that in this embodiment both the training images and the image to be reconstructed are RGB images.
The table structure reconstruction method based on deep learning provided by this application has been described in detail above. The description of the specific embodiments is intended only to aid understanding of the method and its core concepts. It should also be noted that those skilled in the art can make improvements and modifications without departing from the principle of this application, and such improvements and modifications likewise fall within the protective scope of its claims.
Claims (5)
1. A table structure reconstruction method based on deep learning, characterized by comprising:
Acquiring a training image, wherein a table is displayed on the training image;
preprocessing a training image;
extracting a feature map of the preprocessed image;
performing learning and parameter updating with the feature map to obtain a table line classification and positioning model;
acquiring an image to be reconstructed for table reconstruction, wherein the image to be reconstructed displays a table to be reconstructed;
obtaining structural information of a table to be reconstructed according to the table line classification and positioning model;
performing character recognition and image object detection on the image to be reconstructed to obtain table content information;
and matching the structure information of the table to be reconstructed with the table content information to reconstruct the table.
2. The table structure reconstruction method based on deep learning of claim 1, wherein preprocessing the training image comprises:
performing data augmentation on the training image to generate augmented training data;
mixing the augmented training data with real table data, and then performing normalization;
and acquiring a label of each training image, wherein the label comprises the relative position relation of each pixel point relative to the table structure.
3. The table structure reconstruction method based on deep learning of claim 2, wherein the extracting the feature map of the preprocessed image comprises:
inputting the preprocessed image data and the labels into a segmentation network based on deep learning to train and extract a feature map;
the segmentation network employs an improved U-Net that is trained with a weighted cross-entropy loss function.
4. The table structure reconstruction method based on deep learning of claim 3, wherein obtaining the structure information of the table to be reconstructed according to the image to be reconstructed comprises:
sending the image to be reconstructed into the segmentation network to obtain a pixel probability matrix of table line classification;
generating a table line binary image according to the obtained pixel probability matrix;
extracting cell intersections according to the table line binary image;
and combining the rows and the columns of the cells according to the cell intersection points to obtain the structural information of the table to be reconstructed.
5. The table structure reconstruction method based on deep learning of claim 4, wherein generating the table line binary image from the pixel probability matrix includes mapping the matrix, specifically:
setting a probability threshold; if the probability that a pixel belongs to a horizontal or vertical table line exceeds the threshold, the corresponding pixel is mapped to 255, otherwise to 0, thereby segmenting the horizontal and vertical table lines.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011280981.1A CN112381082A (en) | 2020-11-16 | 2020-11-16 | Table structure reconstruction method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112381082A true CN112381082A (en) | 2021-02-19 |
Family
ID=74584795
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011280981.1A Pending CN112381082A (en) | 2020-11-16 | 2020-11-16 | Table structure reconstruction method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112381082A (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN114463766A | 2021-07-16 | 2022-05-10 | 荣耀终端有限公司 | Form processing method and electronic equipment
CN113610043A | 2021-08-19 | 2021-11-05 | 海默潘多拉数据科技(深圳)有限公司 | Industrial drawing table structured recognition method and system
JP7268115B1 | 2021-11-09 | 2023-05-02 | 西松建設株式会社 | Rebar arrangement list reader, list reader, bar arrangement list reading method and program
JP2023070290A | 2021-11-09 | 2023-05-19 | 西松建設株式会社 | Rebar arrangement list reader, list reader, rebar arrangement list reading method and program
CN113850249A | 2021-12-01 | 2021-12-28 | 深圳市迪博企业风险管理技术有限公司 | Method for formatting and extracting chart information
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication |
 | SE01 | Entry into force of request for substantive examination |
 | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20210219