CN112381082A - Table structure reconstruction method based on deep learning - Google Patents

Publication number
CN112381082A
Authority
CN
China
Prior art keywords
image
reconstructed
training
deep learning
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011280981.1A
Other languages
Chinese (zh)
Inventor
蔡雨欣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhi Instant Communication Equipment Co ltd
Original Assignee
Changzhi Instant Communication Equipment Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2020-11-16
Filing date
2020-11-16
Publication date
2021-02-19
Application filed by Changzhi Instant Communication Equipment Co ltd
Priority to CN202011280981.1A
Publication of CN112381082A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a table structure reconstruction method based on deep learning, comprising the following steps: acquiring a training image on which a table is displayed; preprocessing the training image; extracting a feature map of the preprocessed image; performing learning and parameter updating using the feature map to obtain a table line classification and positioning model; acquiring an image to be reconstructed, which displays the table to be reconstructed; obtaining structure information of the table to be reconstructed according to the table line classification and positioning model; performing character recognition and image target detection on the image to be reconstructed to obtain table content information; and matching the structure information of the table to be reconstructed with the table content information to reconstruct the table. The invention can train the network with a small amount of data while still letting it learn stable and accurate feature information, which greatly improves the accuracy of table line extraction when few data samples are available; the algorithm also has good generalization capability and robustness.

Description

Table structure reconstruction method based on deep learning
Technical Field
The invention relates to the field of image processing and pattern recognition, and in particular to a table structure reconstruction method based on image processing and deep learning.
Background
As a common document format, tables appear frequently in daily life, for example in resumes, entry forms, and financial statements. Table styles vary widely and each has its own layout characteristics; yet when people need to reuse a table, they often have to recreate its style by hand, which is time-consuming.
Automation schemes that are not yet mature can help users copy or edit tables quickly. Early table restoration and table recognition methods are mostly traditional image processing schemes built mainly on line regression or the Hough transform, and they have difficulty coping with varied table styles and scenes.
It can be seen that current table reconstruction schemes suffer from poor generalization capability and a narrow range of applicable environments.
Disclosure of Invention
The invention aims to provide a table structure reconstruction method based on deep learning that requires no user interaction during the whole table reconstruction process, is applicable to a variety of table styles and application scenes, and supports cross-platform use.
In order to achieve the technical purpose, the technical scheme adopted by the invention is as follows:
A table structure reconstruction method based on deep learning comprises:
acquiring a training image, wherein a table is displayed on the training image;
preprocessing the training image;
extracting a feature map of the preprocessed image;
performing learning and parameter updating using the feature map to obtain a table line classification and positioning model;
acquiring an image to be reconstructed for table reconstruction, wherein the image to be reconstructed displays a table to be reconstructed;
obtaining structural information of a table to be reconstructed according to the image to be reconstructed;
performing character recognition and image target detection on an image to be reconstructed to obtain table content information;
and matching the structure information of the table to be reconstructed with the table content information to reconstruct the table.
Preferably, the preprocessing the training target image includes:
performing data augmentation on the training image to generate augmented training data;
mixing the augmented training data with the real form data, and then carrying out normalization processing;
and acquiring a label of each training image, wherein the label comprises the relative position relation of each pixel point relative to the table structure.
Preferably, the extracting the feature map of the preprocessed image includes:
inputting the preprocessed image data and the labels into a segmentation network based on deep learning to train and extract a feature map;
the segmentation network is an improved U-Net segmentation network trained with a weighted cross-entropy loss function.
Preferably, the obtaining of the structural information of the table to be reconstructed according to the image to be reconstructed includes:
sending the image to be reconstructed into the segmentation network to obtain a pixel probability matrix of table line classification;
generating a table line binary image according to the obtained pixel probability matrix;
extracting cell intersections according to the table line binary image;
and combining the rows and the columns of the cells according to the cell intersection points to obtain the structural information of the table to be reconstructed.
Preferably, the generating a table line binary image according to the obtained pixel probability matrix includes mapping the pixel probability matrix, and specifically includes:
and setting a probability threshold, if the pixel probability of the horizontal and vertical lines of the table is greater than the probability threshold, mapping the corresponding pixel point to be 255, otherwise, mapping to be 0, and thus segmenting the table lines.
The table structure reconstruction algorithm based on deep learning aims to convert a table image, or a non-editable table region, into an editable table by means of deep learning and image processing, so that a user can quickly copy or modify the structure or content of a table without redrawing it, and to lay the groundwork for extracting the key content of the table.
Compared with the prior art, the method can train the network with a small amount of data while still letting it learn stable and accurate feature information, which greatly improves the accuracy of table line extraction when few data samples are available. The algorithm generalizes well: it still restores tables effectively when they are, to a certain degree, blurred, perspective-distorted, or tilted, and it is robust. The method was tested on about 1000 images, and the accuracy of cell reconstruction was over 95%.
Drawings
FIG. 1 is a flow chart of a table structure reconstruction method based on deep learning;
FIG. 2 is a flow diagram of training image preprocessing;
FIG. 3 is a flow chart of acquiring the structure information of the table to be reconstructed.
Detailed Description
In order to facilitate understanding of those skilled in the art, the present invention will be further described with reference to the following examples and drawings, which are not intended to limit the present invention.
As shown in FIG. 1, a table structure reconstruction method based on deep learning includes a training process and an inference process.
Training process:
Training consists of collecting samples and building the corresponding reconstruction model, and specifically comprises the following steps:
S1: Collect training images. The training images may be real table images from the internet or screenshots of tables in PDF or Word documents; they may be photographed with a camera or captured as screenshots on a computer, but every training image must contain a table.
S2: Preprocess the collected training images to facilitate sample training. The specific preprocessing process is as follows:
S21: Perform data augmentation on the training images to generate augmented training data.
Data augmentation is a common technique in deep learning. It is mainly used to enlarge the training data set and make it as diverse as possible, so that the trained model generalizes better. Augmentation increases the amount of relevant data in the data set, which keeps the network from learning irrelevant features and lets it learn more of the features that matter, noticeably improving overall performance. In practice, not every augmentation method suits the current training data; which methods to use must be decided from the characteristics of the data set being trained on.
In this embodiment, data augmentation is performed by flipping (inversion), adding noise, color jitter, blurring, and similar methods.
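The four augmentations named above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the noise level, the jitter range, and the 3x3 blur kernel are all assumptions, and reading "inversion" as a horizontal flip is likewise an assumption. Images are plain nested lists (height x width x 3, values 0 to 255) to keep the sketch free of third-party dependencies.

```python
import random

def flip_horizontal(img):
    """Mirror each row left to right."""
    return [row[::-1] for row in img]

def add_noise(img, sigma, rng):
    """Add Gaussian noise to every channel and clamp to [0, 255]."""
    return [[[min(255, max(0, round(c + rng.gauss(0, sigma)))) for c in px]
             for px in row] for row in img]

def color_jitter(img, max_shift, rng):
    """Shift each channel by one random offset shared by all pixels."""
    shifts = [rng.randint(-max_shift, max_shift) for _ in range(3)]
    return [[[min(255, max(0, c + s)) for c, s in zip(px, shifts)]
             for px in row] for row in img]

def box_blur(img):
    """3x3 mean blur; border pixels average over the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            neigh = [img[j][i]
                     for j in range(max(0, y - 1), min(h, y + 2))
                     for i in range(max(0, x - 1), min(w, x + 2))]
            row.append([sum(p[k] for p in neigh) // len(neigh) for k in range(3)])
        out.append(row)
    return out

def augment(img, seed=0):
    """Return one augmented copy per method (hypothetical parameter values)."""
    rng = random.Random(seed)
    return [flip_horizontal(img), add_noise(img, 8.0, rng),
            color_jitter(img, 20, rng), box_blur(img)]
```

Geometric augmentations such as flipping move the table lines, so the per-pixel label of the image has to be transformed in the same way; noise, color jitter, and blur leave the label unchanged.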
S22: Mix the augmented training data from step S21 with the real table data to form a training image set, and then normalize the training image set.
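A small sketch of step S22 follows. The patent does not state which normalization is used; scaling 8-bit channel values to the range [0, 1] is an assumption made here for illustration, as are the function names and the shuffling of the mixed set.

```python
import random

def normalize(img):
    """Scale 8-bit channel values to floats in [0, 1] (assumed normalization)."""
    return [[[c / 255.0 for c in px] for px in row] for row in img]

def build_training_set(augmented, real, seed=0):
    """Mix augmented and real samples, shuffle them, then normalize each image."""
    mixed = list(augmented) + list(real)
    random.Random(seed).shuffle(mixed)
    return [normalize(img) for img in mixed]
```

In a real training loop each image would be kept paired with its label while shuffling; the labels are omitted here to keep the example short.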
S23: Acquire a label for each training image, wherein the label describes the position of each pixel relative to the table structure.
S3: Extract a feature map of the preprocessed image, which specifically comprises the following step:
S31: Feed the preprocessed image data and the labels from step S23 into a deep-learning-based segmentation network, which is trained to extract a feature map.
A common segmentation network is the U-Net model, in which an encoder extracts features from the image data, a decoder upsamples the extracted feature maps and fuses them with encoder features, and the network outputs class probabilities and pixel coordinates.
The present application improves on the U-Net segmentation network by replacing the original loss function with a weighted cross-entropy loss function (Cross Entropy Error Function).
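To make the loss concrete, the sketch below computes a per-pixel weighted cross-entropy, that is, the mean over all pixels of -w[y] * log(p[y]), where y is the true class of the pixel and p its predicted probabilities. The patent does not say how the class weights are chosen; the inverse-frequency weighting shown here is one common choice and is an assumption, as are the function names.

```python
import math

def weighted_cross_entropy(probs, labels, class_weights, eps=1e-7):
    """Mean weighted cross-entropy over all pixels.

    probs:         H x W x C predicted class probabilities
    labels:        H x W integer class indices
    class_weights: one weight per class
    """
    total, count = 0.0, 0
    for prob_row, label_row in zip(probs, labels):
        for p, y in zip(prob_row, label_row):
            # Clamp the probability so log() never receives zero.
            total += -class_weights[y] * math.log(max(p[y], eps))
            count += 1
    return total / count

def inverse_frequency_weights(labels, num_classes):
    """Weight each class by total_pixels / (num_classes * class_pixels)."""
    counts = [0] * num_classes
    for row in labels:
        for y in row:
            counts[y] += 1
    total = sum(counts)
    return [total / (num_classes * c) if c else 0.0 for c in counts]
```

Table lines usually cover only a small fraction of the pixels, so giving the line class a larger weight keeps the loss from being dominated by the background.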
S4: Perform learning and parameter updating using the feature map to obtain the table line classification and positioning model.
Inference process:
The inference process identifies the table in the image to be reconstructed using the reconstruction model, and specifically comprises the following steps:
S5: Acquire an image to be reconstructed, which may be photographed with a camera or captured as a screenshot on a computer, and which displays the table to be reconstructed.
S6: Obtain the structure information of the table to be reconstructed from the image to be reconstructed, which specifically comprises the following steps:
S61: Feed the image to be reconstructed into the segmentation network used in S31 to obtain a pixel probability matrix for table line classification.
S62: Generate a table line binary image from the pixel probability matrix by mapping the matrix as follows:
Set a probability threshold, for example 0.5. If the table line probability of a pixel is greater than the threshold, map that pixel to 255; otherwise map it to 0. Mapping all pixels in this way yields a binary image of the same size as the original image.
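The mapping in S62 is a plain threshold. A minimal sketch, with the function name `binarize` chosen here for illustration and the probability matrix held as nested lists:

```python
def binarize(prob_matrix, threshold=0.5):
    """Map probabilities above the threshold to 255 and all others to 0."""
    return [[255 if p > threshold else 0 for p in row] for row in prob_matrix]

probs = [[0.1, 0.5, 0.51],
         [0.99, 0.0, 0.7]]
print(binarize(probs))  # [[0, 0, 255], [255, 0, 255]]
```

The comparison is strict, matching the wording "greater than", so a pixel whose probability equals the threshold is mapped to 0.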
S63: Extract the positions of the cell intersections, such as the coordinates of the four vertices of each cell, from the table line binary image.
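The patent does not describe how the intersections are located. One possible approach, shown here purely as an assumption, is to treat a foreground pixel as an intersection candidate when it lies on both a long horizontal run and a long vertical run of line pixels, and then to merge touching candidates into a single point. The names `run_lengths`, `find_intersections`, and `min_len` are hypothetical.

```python
def run_lengths(binary, horizontal):
    """For each foreground pixel, the length of the run it belongs to along one axis."""
    h, w = len(binary), len(binary[0])
    out = [[0] * w for _ in range(h)]
    outer, inner = (h, w) if horizontal else (w, h)
    for a in range(outer):
        start = 0
        for b in range(inner + 1):
            y, x = (a, b) if horizontal else (b, a)
            if b == inner or binary[y][x] == 0:
                # A run ended just before b: write its length to all its pixels.
                for k in range(start, b):
                    yy, xx = (a, k) if horizontal else (k, a)
                    out[yy][xx] = b - start
                start = b + 1
    return out

def find_intersections(binary, min_len=5):
    """Return (x, y) points where a horizontal and a vertical line cross."""
    hor = run_lengths(binary, horizontal=True)
    ver = run_lengths(binary, horizontal=False)
    h, w = len(binary), len(binary[0])
    cand = {(y, x) for y in range(h) for x in range(w)
            if hor[y][x] >= min_len and ver[y][x] >= min_len}
    points = []
    while cand:
        # Flood-fill one blob of touching candidates and keep its centroid.
        stack, blob = [cand.pop()], []
        while stack:
            y, x = stack.pop()
            blob.append((y, x))
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    n = (y + dy, x + dx)
                    if n in cand:
                        cand.remove(n)
                        stack.append(n)
        cy = round(sum(p[0] for p in blob) / len(blob))
        cx = round(sum(p[1] for p in blob) / len(blob))
        points.append((cx, cy))
    return sorted(points, key=lambda p: (p[1], p[0]))
```

The sketch assumes lines thinner than `min_len` pixels; thicker lines would need a larger `min_len` or a thinning step first.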
S64: Combine the rows and columns of the cells according to the cell intersections to obtain the structure information of the table to be reconstructed.
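As an illustration of how intersection points might be turned into rows and columns, the sketch below clusters the x and y coordinates into grid lines and emits a cell wherever all four corners are present. This is an assumed procedure, not the one claimed in the patent; it handles a regular grid and does not detect merged cells. The names `cluster`, `build_cells`, and `tol` are hypothetical.

```python
def cluster(values, tol=3):
    """Group sorted coordinates lying within tol of their predecessor; return group means."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= tol:
            groups[-1].append(v)
        else:
            groups.append([v])
    return [round(sum(g) / len(g)) for g in groups]

def build_cells(points, tol=3):
    """Return cells as dicts with row, col and pixel bounds (x0, y0, x1, y1)."""
    xs = cluster([p[0] for p in points], tol)
    ys = cluster([p[1] for p in points], tol)

    def has_corner(x, y):
        return any(abs(px - x) <= tol and abs(py - y) <= tol for px, py in points)

    cells = []
    for r in range(len(ys) - 1):
        for c in range(len(xs) - 1):
            corners = [(xs[c], ys[r]), (xs[c + 1], ys[r]),
                       (xs[c], ys[r + 1]), (xs[c + 1], ys[r + 1])]
            if all(has_corner(x, y) for x, y in corners):
                cells.append({"row": r, "col": c,
                              "box": (xs[c], ys[r], xs[c + 1], ys[r + 1])})
    return cells
```

The tolerance absorbs small pixel offsets between intersections that belong to the same table line, which are common in photographed or slightly tilted tables.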
S7: Perform character recognition and image target detection on the image to be reconstructed to obtain the table content information.
Character recognition here may use a common OCR algorithm to recognize the character information and image information in the image to be reconstructed.
S8: Match the structure information of the table to be reconstructed with the table content information and reconstruct the table, specifically as follows:
The structure information and content information of the table are described in a general table description language, for example as an XML file, which can then be opened and edited in Word, Excel, or WPS Office. The basic units of a table are its individual cells. The cell vertex coordinates obtained by the above method are used to locate the row and column of each extracted cell, and the recognized character information and image information are then placed in the corresponding cells.
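The matching step can be sketched as follows, under stated assumptions: each cell is a dict with `row`, `col`, and a pixel `box` (x0, y0, x1, y1), each OCR result is a `(text, box)` pair, and a text belongs to the cell that contains the center of its box. The XML element names are made up for illustration and are not the Office Open XML schema that Word or Excel actually read.

```python
import xml.etree.ElementTree as ET

def assign_text(cells, text_boxes):
    """Put each recognized text into the cell that contains the center of its box."""
    contents = {(c["row"], c["col"]): [] for c in cells}
    for text, (x0, y0, x1, y1) in text_boxes:
        cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
        for c in cells:
            bx0, by0, bx1, by1 = c["box"]
            if bx0 <= cx < bx1 and by0 <= cy < by1:
                contents[(c["row"], c["col"])].append(text)
                break
    return contents

def table_to_xml(cells, contents):
    """Serialize the cells row by row into a small, hypothetical XML layout."""
    table = ET.Element("table")
    rows = {}
    for c in sorted(cells, key=lambda c: (c["row"], c["col"])):
        if c["row"] not in rows:
            rows[c["row"]] = ET.SubElement(table, "row", index=str(c["row"]))
        cell = ET.SubElement(rows[c["row"]], "cell", index=str(c["col"]))
        cell.text = " ".join(contents.get((c["row"], c["col"]), []))
    return ET.tostring(table, encoding="unicode")

cells = [{"row": 0, "col": 0, "box": (0, 0, 10, 10)},
         {"row": 0, "col": 1, "box": (10, 0, 20, 10)}]
ocr = [("Name", (2, 3, 8, 7)), ("Age", (12, 3, 18, 7))]
print(table_to_xml(cells, assign_text(cells, ocr)))
```

Text whose center falls outside every cell is dropped in this sketch; a fuller implementation might attach it to the nearest cell or report it separately.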
It should be noted that, in this embodiment, both the training images and the image to be reconstructed are RGB images.
The table structure reconstruction method based on deep learning provided by the application is described in detail above. The description of the specific embodiments is only intended to facilitate an understanding of the methods of the present application and their core concepts. It should be noted that, for those skilled in the art, it is possible to make several improvements and modifications to the present application without departing from the principle of the present application, and such improvements and modifications also fall within the scope of the claims of the present application.

Claims (5)

1. A table structure reconstruction method based on deep learning, characterized by comprising:
acquiring a training image, wherein a table is displayed on the training image;
preprocessing the training image;
extracting a feature map of the preprocessed image;
performing learning and parameter updating using the feature map to obtain a table line classification and positioning model;
acquiring an image to be reconstructed for table reconstruction, wherein the image to be reconstructed displays a table to be reconstructed;
obtaining structural information of a table to be reconstructed according to the table line classification and positioning model;
performing character recognition and image target detection on an image to be reconstructed to obtain table content information;
and matching the structure information of the table to be reconstructed with the table content information to reconstruct the table.
2. The table structure reconstruction method based on deep learning of claim 1, wherein the preprocessing of the training target image comprises:
performing data augmentation on the training image to generate augmented training data;
mixing the augmented training data with the real form data, and then carrying out normalization processing;
and acquiring a label of each training image, wherein the label comprises the relative position relation of each pixel point relative to the table structure.
3. The table structure reconstruction method based on deep learning of claim 2, wherein the extracting the feature map of the preprocessed image comprises:
inputting the preprocessed image data and the labels into a segmentation network based on deep learning to train and extract a feature map;
the segmentation network is an improved U-Net segmentation network trained with a weighted cross-entropy loss function.
4. The table structure reconstruction method based on deep learning of claim 3, wherein obtaining the structure information of the table to be reconstructed according to the image to be reconstructed comprises:
sending the image to be reconstructed into the segmentation network to obtain a pixel probability matrix of table line classification;
generating a table line binary image according to the obtained pixel probability matrix;
extracting cell intersections according to the table line binary image;
and combining the rows and the columns of the cells according to the cell intersection points to obtain the structural information of the table to be reconstructed.
5. The table structure reconstructing method based on deep learning according to claim 4, wherein the generating of the table line binary image according to the obtained pixel probability matrix includes mapping the pixel probability matrix, and specifically includes:
and setting a probability threshold, if the pixel probability of the table horizontal and vertical lines is greater than the probability threshold, mapping the corresponding pixel point to be 255, otherwise, mapping to be 0, and thus segmenting the table horizontal and vertical lines.
CN202011280981.1A 2020-11-16 2020-11-16 Table structure reconstruction method based on deep learning Pending CN112381082A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011280981.1A CN112381082A (en) 2020-11-16 2020-11-16 Table structure reconstruction method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011280981.1A CN112381082A (en) 2020-11-16 2020-11-16 Table structure reconstruction method based on deep learning

Publications (1)

Publication Number Publication Date
CN112381082A (en) 2021-02-19

Family

ID=74584795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011280981.1A Pending CN112381082A (en) 2020-11-16 2020-11-16 Table structure reconstruction method based on deep learning

Country Status (1)

Country Link
CN (1) CN112381082A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610043A (en) * 2021-08-19 2021-11-05 海默潘多拉数据科技(深圳)有限公司 Industrial drawing table structured recognition method and system
CN113850249A (en) * 2021-12-01 2021-12-28 深圳市迪博企业风险管理技术有限公司 Method for formatting and extracting chart information
CN114463766A (en) * 2021-07-16 2022-05-10 荣耀终端有限公司 Form processing method and electronic equipment
JP7268115B1 (en) 2021-11-09 2023-05-02 西松建設株式会社 Rebar arrangement list reader, list reader, bar arrangement list reading method and program

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114463766A (en) * 2021-07-16 2022-05-10 荣耀终端有限公司 Form processing method and electronic equipment
CN113610043A (en) * 2021-08-19 2021-11-05 海默潘多拉数据科技(深圳)有限公司 Industrial drawing table structured recognition method and system
JP7268115B1 (en) 2021-11-09 2023-05-02 西松建設株式会社 Rebar arrangement list reader, list reader, bar arrangement list reading method and program
JP2023070290A (en) * 2021-11-09 2023-05-19 西松建設株式会社 Rebar arrangement list reader, list reader, rebar arrangement list reading method and program
CN113850249A (en) * 2021-12-01 2021-12-28 深圳市迪博企业风险管理技术有限公司 Method for formatting and extracting chart information

Similar Documents

Publication Publication Date Title
CN108829826B (en) Image retrieval method based on deep learning and semantic segmentation
CN112381082A (en) Table structure reconstruction method based on deep learning
CN107239801B (en) Video attribute representation learning method and video character description automatic generation method
CN100373399C (en) Method and apparatus for establishing degradation dictionary
CN112819686B (en) Image style processing method and device based on artificial intelligence and electronic equipment
CN112132197B (en) Model training, image processing method, device, computer equipment and storage medium
CN110570481A (en) calligraphy word stock automatic repairing method and system based on style migration
CN112884758B (en) Defect insulator sample generation method and system based on style migration method
CN113762269B (en) Chinese character OCR recognition method, system and medium based on neural network
CN110969681A (en) Method for generating handwriting characters based on GAN network
CN109299303B (en) Hand-drawn sketch retrieval method based on deformable convolution and depth network
CN113283336A (en) Text recognition method and system
CN110674777A (en) Optical character recognition method in patent text scene
CN114170608A (en) Super-resolution text image recognition method, device, equipment and storage medium
CN111523622A (en) Method for simulating handwriting by mechanical arm based on characteristic image self-learning
Saraf et al. Devnagari script character recognition using genetic algorithm for get better efficiency
CN114241495B (en) Data enhancement method for off-line handwritten text recognition
CN114742014B (en) Few-sample text style migration method based on associated attention
Zhong et al. Least-squares method and deep learning in the identification and analysis of name-plates of power equipment
CN114359917A (en) Handwritten Chinese character detection and recognition and font evaluation method
Pang et al. PTRSegNet: A Patch-to-Region Bottom-Up Pyramid Framework for the Semantic Segmentation of Large-Format Remote Sensing Images
CN113096133A (en) Method for constructing semantic segmentation network based on attention mechanism
CN116361502B (en) Image retrieval method, device, computer equipment and storage medium
Wu et al. Sketchscene: Scene sketch to image generation with diffusion models
CN116311281A (en) Handwriting font correcting system based on generation countermeasure network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20210219