CN117973337A - Table reconstruction method, apparatus, electronic device and storage medium - Google Patents


Info

Publication number
CN117973337A
Authority
CN
China
Prior art keywords
image
cell
feature
candidate
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410102694.3A
Other languages
Chinese (zh)
Other versions
CN117973337B (en)
Inventor
张亚萍
庞刘成
赵阳
周玉
宗成庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science filed Critical Institute of Automation of Chinese Academy of Science
Priority to CN202410102694.3A
Publication of CN117973337A
Application granted
Publication of CN117973337B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • G06F40/177Editing, e.g. inserting or deleting of tables; using ruled lines
    • G06F40/18Editing, e.g. inserting or deleting of tables; using ruled lines of spreadsheets
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/413Classification of content, e.g. text, photographs or tables
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/414Extracting the geometrical structure, e.g. layout tree; Block segmentation, e.g. bounding boxes for graphics or text

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Geometry (AREA)
  • Computer Graphics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a table reconstruction method, a table reconstruction device, electronic equipment and a storage medium, which are applied to the technical field of image processing. The method comprises the following steps: acquiring a table image; extracting image features of the table image, and determining the cell categories, cell coordinates and cell pixel masks of the table image according to the image features; reconstructing grid lines according to the cell coordinates and the cell pixel masks to obtain a first table, and merging cells of the first table according to the cell categories to obtain a second table. The cell categories comprise blank cells, basic cells and merging cells.

Description

Table reconstruction method, apparatus, electronic device and storage medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to a table reconstruction method, apparatus, electronic device, and storage medium.
Background
Table recognition technology uses a computer system to automatically parse the table regions contained in an image into structured tables and store them. It can quickly and effectively help people identify and understand the table content in an image, and can rapidly parse the tables in an image into a computer-readable format, so as to facilitate electronic storage and subsequent analysis of the table content.
In the prior art, the structure recognition branch and the content recognition branch of table image recognition are separated, and the structural analysis of the table content is then achieved through a merging process.
However, because the structure recognition branch and the content recognition branch differ in modality information, the content recognition branch lacks the interdependence between structural elements, and the recognition performance is therefore poor.
Disclosure of Invention
The invention provides a table reconstruction method, a table reconstruction device, electronic equipment and a storage medium, which are used to solve the prior-art problem that the content recognition branch of table recognition technology lacks structural information and therefore has poor recognition performance.
The invention provides a table reconstruction method, which comprises the following steps: acquiring a form image; extracting image features of the table image, and determining cell categories, cell coordinates and cell pixel masks of the table image according to the image features; carrying out grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and carrying out cell merging on the first table according to the cell category to obtain a second table; the cell categories comprise blank cells, basic cells and merging cells.
According to the present invention, there is provided a table reconstruction method, wherein the determining a cell category, a cell coordinate and a cell pixel mask of the table image according to the image feature includes: determining a first candidate frame and a first candidate feature; predicting a target object feature from the image feature, the first candidate box, and the first candidate feature; and obtaining the cell category, the cell coordinates and the cell pixel mask by decoding the target object feature.
According to the present invention, there is provided a table reconstruction method, wherein the determining a target object feature according to the image feature, the first candidate frame and the first candidate feature includes: performing multi-head self-attention transformation on the first candidate feature to obtain a second candidate feature; performing frame region interest alignment on the first candidate frame and the image feature to obtain a first frame feature; performing dynamic convolution module enhancement on the second candidate features and the first frame features to obtain second frame features, determining a second candidate frame based on the second frame features, and updating the first candidate frame to be the second candidate frame; performing mask region interest alignment on the second frame feature and the image feature to obtain a first pixel mask; and carrying out dynamic convolution module enhancement on the second candidate feature and the first pixel mask to obtain the target object feature.
According to the present invention, there is provided a table reconstruction method, wherein the acquiring a table image includes: performing size transformation processing on an image to be processed according to a preset size; and identifying a table position in the image to be processed, and dividing the table image from the image to be processed according to the table position.
According to the present invention, there is provided a table reconstruction method, wherein the extracting the image features of the table image includes: and determining the characteristic expression of the table image through a convolutional neural network, and determining the multi-scale characteristic expression of the characteristic expression through a characteristic pyramid network to obtain the image characteristic.
According to the present invention, before the table image is acquired, the method further includes: acquiring training data, wherein the training data comprises a training image and an image label, and the image label comprises a category label, a coordinate label and a pixel label; inputting the training image into a table recognition model to obtain a first cell category, a first cell coordinate and a first cell pixel mask; determining a first loss according to the first cell category and the category label, determining a second loss according to the first cell coordinates and the coordinate label, and determining a third loss according to the first cell pixel mask and the pixel label; weighting and summing the first loss, the second loss and the third loss based on a predefined weight coefficient to obtain a target loss; and updating the model parameters of the table identification model according to the target loss.
According to the present invention, there is provided a table reconstruction method, wherein the extracting image features of the table image and determining cell categories, cell coordinates and cell pixel masks of the table image according to the image features includes: and extracting image features of the table image through the table identification model, and predicting cell categories, cell coordinates and cell pixel masks of the table image according to the image features.
The invention also provides a table reconstruction device, which comprises: the device comprises an acquisition module and a processing module; the acquisition module is used for acquiring a form image; the processing module is used for extracting the image characteristics of the table image and determining the cell category, the cell coordinates and the cell pixel mask of the table image according to the image characteristics; carrying out grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and carrying out cell merging on the first table according to the cell category to obtain a second table; the cell categories comprise blank cells, basic cells and merging cells.
According to the present invention, there is provided a table reconstruction device, wherein the processing module is configured to: determining a first candidate frame and a first candidate feature; predicting a target object feature from the image feature, the first candidate box, and the first candidate feature; and obtaining the cell category, the cell coordinates and the cell pixel mask by decoding the target object feature.
According to the present invention, there is provided a table reconstruction device, wherein the processing module is configured to: performing multi-head self-attention transformation on the first candidate feature to obtain a second candidate feature; performing frame region interest alignment on the first candidate frame and the image feature to obtain a first frame feature; performing dynamic convolution module enhancement on the second candidate features and the first frame features to obtain second frame features, determining a second candidate frame based on the second frame features, and updating the first candidate frame to be the second candidate frame; performing mask region interest alignment on the second frame feature and the image feature to obtain a first pixel mask; and carrying out dynamic convolution module enhancement on the second candidate feature and the first pixel mask to obtain the target object feature.
According to the invention, there is provided a table reconstruction device, wherein the acquisition module is used for: performing size transformation processing on the image to be processed according to a preset size; and identifying a table position in the image to be processed, and dividing the table image from the image to be processed according to the table position.
According to the present invention, there is provided a table reconstruction device, wherein the processing module is configured to: and determining the characteristic expression of the table image through a convolutional neural network, and determining the multi-scale characteristic expression of the characteristic expression through a characteristic pyramid network to obtain the image characteristic.
According to the invention, there is provided a table reconstruction device, wherein the acquisition module is used for: acquiring training data, wherein the training data comprises a training image and an image label, and the image label comprises a category label, a coordinate label and a pixel label; the processing module is used for: inputting the training image into a table recognition model to obtain a first cell category, a first cell coordinate and a first cell pixel mask; determining a first loss according to the first cell category and the category label, determining a second loss according to the first cell coordinates and the coordinate label, and determining a third loss according to the first cell pixel mask and the pixel label; weighting and summing the first loss, the second loss and the third loss based on a predefined weight coefficient to obtain a target loss; and updating the model parameters of the table identification model according to the target loss.
According to the present invention, there is provided a table reconstruction device, wherein the processing module is configured to: and extracting image features of the table image through the table identification model, and predicting cell categories, cell coordinates and cell pixel masks of the table image according to the image features.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the table reconstruction method as described in any one of the above when executing the program.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of a table reconstruction method as described in any of the above.
The table reconstruction method, the table reconstruction device, the electronic equipment and the storage medium can acquire a table image; extracting image features of the table image, and determining cell categories, cell coordinates and cell pixel masks of the table image according to the image features; carrying out grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and carrying out cell merging on the first table according to the cell category to obtain a second table; the cell categories comprise blank cells, basic cells and merging cells. According to the scheme, the cell type, the cell coordinate and the cell pixel mask of the table image can be determined according to the image characteristics, and the second table is reconstructed according to the cell type, the cell coordinate and the cell pixel mask.
Drawings
In order to more clearly illustrate the invention or the technical solutions of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a table reconstruction method according to the present invention;
FIG. 2 is a second schematic flow chart of the table reconstruction method according to the present invention;
FIG. 3 is a third schematic flow chart of the table reconstruction method according to the present invention;
FIG. 4 is a schematic diagram of a table reconstruction device according to the present invention;
Fig. 5 is a schematic structural diagram of an electronic device provided by the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, in the embodiments of the present invention, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present invention is not limited to performing the functions in the order shown or discussed, but may also include performing the functions in a substantially simultaneous manner or in the reverse order depending on the functions involved; for example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. Additionally, features described with reference to certain examples may be combined in other examples.
In order to clearly describe the technical solution of the embodiment of the present invention, in the embodiment of the present invention, the words "first", "second", etc. are used to distinguish identical items or similar items having substantially the same function and effect, and those skilled in the art will understand that the words "first", "second", etc. are not limited in number and execution order.
Some exemplary embodiments of the invention have been described above for illustrative purposes; it should be understood that the invention may be practiced otherwise than as specifically shown in the accompanying drawings.
The foregoing implementations are described in detail below with reference to specific embodiments and accompanying drawings.
As shown in fig. 1, an embodiment of the present invention provides a table reconstruction method, which can be applied to a table reconstruction device. The table reconstruction method may include S101-S103:
S101: the table reconstruction device acquires a table image.
Wherein, the form image is an image containing form elements.
Optionally, the table reconstruction device may perform size transformation processing on the image to be processed according to a preset size; and identifying a table position in the image to be processed, and dividing the table image from the image to be processed according to the table position.
Specifically, the table reconstruction device may perform a preprocessing operation on the input image. The preprocessing operation includes first adjusting the size of the input image to a preset size (imgW × imgH) by using an interpolation algorithm, where imgW represents the image width and imgH represents the image height; then detecting the table position in the input image by using a table image detection algorithm; and finally dividing the image containing the table according to the table position to obtain a table image, where the table image may be represented as a matrix (imgW × imgH × imgC), with imgC representing the number of image channels.
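The preprocessing described above can be sketched as follows. `detect_table` is a hypothetical stand-in for the table image detection algorithm, which the patent does not specify, and the nearest-neighbour resize is a simplification of the interpolation step:

```python
def resize_nearest(img, out_w, out_h):
    """Nearest-neighbour resize of an image stored as rows of pixels."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def crop(img, box):
    """Crop a (left, top, right, bottom) box from a row-major image."""
    left, top, right, bottom = box
    return [row[left:right] for row in img[top:bottom]]

def preprocess(img, img_w, img_h, detect_table):
    """Resize to the preset size, detect the table, and cut out the table image."""
    resized = resize_nearest(img, img_w, img_h)
    return crop(resized, detect_table(resized))
```

Any detector returning a box in pixel coordinates can be plugged in as `detect_table`.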
S102: the table reconstruction device extracts image features of the table image and determines the cell categories, cell coordinates and cell pixel masks of the table image according to the image features.
The cell categories include blank cells, basic cells and merging cells.
Optionally, the table reconstruction device may determine a feature expression of the table image through a convolutional neural network, and determine a multi-scale feature representation of the feature expression through a feature pyramid network, so as to obtain the image feature.
Specifically, the table reconstruction device may perform stacked operations such as convolution, pooling, residual connection and activation through the convolutional neural network to obtain a feature expression of the table image, then obtain a multi-scale feature expression of the table image through the feature pyramid network, and finally obtain the image features of the table image, where the image features may be expressed as a matrix Wimg × Himg × Cimg.
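As a rough illustration of the multi-scale feature expression, the sketch below builds a pooling pyramid from a single feature map. A real feature pyramid network also applies lateral 1×1 convolutions and top-down upsampling, which are omitted here; only the per-level Wimg × Himg × Cimg shapes are shown:

```python
import numpy as np

def avg_pool2(feat):
    """2x2 average pooling with stride 2 (assumes even spatial dimensions)."""
    h, w, c = feat.shape
    return feat.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def feature_pyramid(feat, levels=3):
    """Build a coarse multi-scale pyramid from one feature map.

    Each level halves the spatial resolution while keeping the channel
    dimension, mimicking the shapes a feature pyramid network would emit.
    """
    pyramid = [feat]
    for _ in range(levels - 1):
        pyramid.append(avg_pool2(pyramid[-1]))
    return pyramid
```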
Optionally, the table reconstruction means may determine a first candidate box and a first candidate feature; predicting a target object feature from the image feature, the first candidate box, and the first candidate feature; and obtaining the cell category, the cell coordinates and the cell pixel mask by decoding the target object feature.
Specifically, the table reconstruction device may perform feature encoding on the input image features by using an encoder group, and then decode according to different targets by using a decoder group. First, the table reconstruction device may randomly initialize the learnable first candidate frames and first candidate features, where the number of first candidate frames and first candidate features may be set to 300, the dimension of each first candidate frame may be set to 4, and the dimension of each first candidate feature may be set to 256. Then, the candidate features and the candidate frames constrain each other through a dynamic convolution module, and the final object features are obtained by encoding. The method specifically includes the following steps: the table reconstruction device may perform a multi-head self-attention transformation on the first candidate feature to obtain a second candidate feature, and perform frame region interest alignment between the first candidate frame and the image features using a bilinear interpolation algorithm to obtain a first frame feature, so that the frame features can be obtained more accurately and sensitivity to small regions is improved. Then, dynamic convolution module enhancement is performed on the second candidate feature and the first frame feature to obtain a second frame feature, a second candidate frame is determined based on the second frame feature, and the first candidate frame is updated to the second candidate frame. Next, mask region interest alignment is performed between the second frame feature and the image features to obtain a first pixel mask, and dynamic convolution module enhancement is performed on the second candidate feature and the first pixel mask to obtain the target object feature.
After the target object feature is obtained, the table reconstruction device may obtain the cell category by a classification decoder, the cell coordinates by a bounding box decoder, and the cell pixel mask by a pixel mask decoder.
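The dynamic convolution interaction between a candidate feature and an RoI-aligned frame feature can be sketched as below. The kernel-generator weight `w_gen` is a random stand-in for learned parameters, and the feature dimension is reduced from the 256 mentioned above for readability; this is an illustration of the idea, not the patent's exact module:

```python
import numpy as np

rng = np.random.default_rng(0)

def dynamic_conv(candidate_feat, roi_feat, w_gen):
    """Dynamic-convolution-style interaction: the candidate feature generates
    its own kernel, which is applied to the RoI-aligned frame feature, so the
    two mutually constrain each other."""
    d = candidate_feat.shape[0]
    kernel = (w_gen @ candidate_feat).reshape(d, d)  # per-candidate kernel
    enhanced = roi_feat @ kernel                     # (pixels, d) enhanced feature
    return enhanced.mean(axis=0)                     # pooled object feature

d = 8                                   # 256 in the text; small here
cand = rng.standard_normal(d)           # one first candidate feature
roi = rng.standard_normal((49, d))      # 7x7 RoI-aligned frame feature
w_gen = rng.standard_normal((d * d, d)) # stand-in for the learned generator
obj = dynamic_conv(cand, roi, w_gen)
```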
Optionally, after obtaining the second frame feature, a candidate frame decoder may classify and regress the second frame feature to obtain a second candidate frame, and replace the first candidate frame with the second candidate frame, so as to implement updating of the first candidate frame.
Alternatively, the classification decoder may include 5 linear layers, the bounding box decoder may include 3 linear layers, and the pixel mask decoder may include four consecutive convolution layers for capturing hierarchical features of the input, one deconvolution layer for upsampling the spatial resolution of the image features, and one 1×1 convolution layer for reducing the number of channels of the image features, yielding the mask prediction of the current stage.
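The linear-layer decoders can be sketched as plain multi-layer perceptrons. The layer counts follow the text (5 for classification, 3 for bounding boxes); the random weights and the small feature dimension are illustrative stand-ins for trained parameters:

```python
import numpy as np

def mlp(x, weights):
    """Stack of linear layers with ReLU between them (sketch of the
    classification and bounding-box decoders)."""
    for i, w in enumerate(weights):
        x = x @ w
        if i < len(weights) - 1:
            x = np.maximum(x, 0.0)
    return x

rng = np.random.default_rng(1)
d = 8                                # feature dimension (256 in the text)
obj = rng.standard_normal(d)         # one target object feature
# classification decoder: 5 linear layers, 3 output classes
cls_w = [rng.standard_normal((d, d)) for _ in range(4)] + [rng.standard_normal((d, 3))]
# bounding box decoder: 3 linear layers, 4 output coordinates
box_w = [rng.standard_normal((d, d)) for _ in range(2)] + [rng.standard_normal((d, 4))]
logits = mlp(obj, cls_w)  # scores for blank / basic / merging cell
box = mlp(obj, box_w)     # cell coordinates
```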
Optionally, before acquiring the table image, the table reconstruction device may acquire training data, where the training data includes a training image and an image label, and the image label includes a category label, a coordinate label and a pixel label; input the training image into a table recognition model to obtain a first cell category, a first cell coordinate and a first cell pixel mask; determine a first loss according to the first cell category and the category label, a second loss according to the first cell coordinates and the coordinate label, and a third loss according to the first cell pixel mask and the pixel label; weight and sum the first loss, the second loss and the third loss based on predefined weight coefficients to obtain a target loss; and update the model parameters of the table recognition model according to the target loss using the AdamW optimization algorithm.
Specifically, as shown in fig. 2, after the training image is input into the table recognition model, the table recognition model may firstly perform image feature extraction on the training image, then perform feature encoding through the encoder set, then perform feature decoding through the decoder set, then calculate the target loss based on the decoding result and the image tag, and finally implement gradient update of the table recognition model through the target loss.
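The target loss described above is a weighted sum of the three branch losses. The weight values below are hypothetical, since the patent only states that the coefficients are predefined:

```python
def target_loss(loss_cls, loss_box, loss_mask, weights=(1.0, 1.0, 1.0)):
    """Combine the category, coordinate and pixel-mask losses into the
    target loss via a predefined weighted sum."""
    w1, w2, w3 = weights
    return w1 * loss_cls + w2 * loss_box + w3 * loss_mask
```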
Alternatively, as shown in fig. 3, the table reconstructing apparatus may extract image features of the table image through the above-mentioned table recognition model, and predict cell categories, cell coordinates, and cell pixel masks of the table image according to the image features.
S103: the table reconstruction device reconstructs grid lines according to the cell coordinates and the cell pixel mask to obtain a first table, and merges cells of the first table according to the cell categories to obtain a second table.
Specifically, with continued reference to fig. 3, for the output set of cell coordinates, the table reconstruction device may filter out cell coordinates whose confidence is lower than a preset threshold through a non-maximum suppression algorithm; then, based on the filtered cell coordinate set and the cell pixel mask regions, reconstruct virtual grid lines using the set horizontal-line threshold and vertical-line threshold to obtain a first table; merge the cells of the first table according to the output cell categories; generate the logical structure of the table according to the merged cell set; and backfill the cell contents to obtain the reconstructed second table.
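The virtual grid-line reconstruction can be sketched by snapping nearby cell-box edges to shared lines under the horizontal/vertical thresholds. The clustering rule below is one plausible reading of this step, not the patent's exact algorithm:

```python
def snap(values, threshold):
    """Cluster nearly-equal coordinates into shared grid lines: any value
    within `threshold` of an already-emitted line reuses that line."""
    lines = []
    for v in sorted(values):
        if lines and v - lines[-1] <= threshold:
            continue
        lines.append(v)
    return lines

def reconstruct_grid(cells, threshold=5):
    """Turn filtered cell boxes (x1, y1, x2, y2) into virtual vertical and
    horizontal grid lines."""
    xs = snap([c[0] for c in cells] + [c[2] for c in cells], threshold)
    ys = snap([c[1] for c in cells] + [c[3] for c in cells], threshold)
    return xs, ys
```

The resulting `xs` and `ys` define the first table's grid; merging-cell predictions then decide which grid cells are fused.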
The embodiment of the invention can fuse the interdependence among cells during training and can exploit structure and content information simultaneously; information and part of the model parameters are shared among different cells, which can improve the training effect of the model. During testing and inference, only the regression boxes fusing the cell layout and content information need to be inferred, so the required model storage space is small; in the testing stage, the model converts the image containing the table directly into a structured sequence, which greatly reduces the required model decoding time. The recognition performance of the table image recognition architecture can thus be effectively improved in terms of both quality and efficiency.
In the embodiment of the invention, the cell type, the cell coordinate and the cell pixel mask of the table image can be determined according to the image characteristics, and the second table is reconstructed according to the cell type, the cell coordinate and the cell pixel mask.
The foregoing description of the solution provided by the embodiments of the present invention has been mainly presented in terms of a method. To achieve the above functions, it includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the table reconstruction method provided by the embodiment of the invention, the execution body may be a table reconstruction device or a control module for table reconstruction in the table reconstruction device. In the embodiment of the present invention, a table reconstruction method performed by a table reconstruction device is taken as an example, and the table reconstruction device provided by the embodiment of the present invention is described.
It should be noted that, in the embodiment of the present invention, the table reconstruction device may be divided into functional modules according to the above method example, for example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated modules may be implemented in hardware or in software functional modules. Optionally, the division of the modules in the embodiment of the present invention is schematic, which is merely a logic function division, and other division manners may be implemented in practice.
As shown in fig. 4, an embodiment of the present invention provides a table reconstruction apparatus 400. The table reconstruction device 400 includes: an acquisition module 401 and a processing module 402. The acquiring module 401 is configured to acquire a table image; the processing module 402 is configured to extract image features of the table image, and determine a cell category, a cell coordinate, and a cell pixel mask of the table image according to the image features; carrying out grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and carrying out cell merging on the first table according to the cell category to obtain a second table; the cell categories comprise blank cells, basic cells and merging cells.
Optionally, the processing module 402 is configured to: determine a first candidate box and a first candidate feature; predict a target object feature from the image feature, the first candidate box and the first candidate feature; and obtain the cell category, the cell coordinates and the cell pixel mask by decoding the target object feature.
Optionally, the processing module 402 is configured to: perform multi-head self-attention transformation on the first candidate feature to obtain a second candidate feature; perform box region-of-interest alignment on the first candidate box and the image feature to obtain a first box feature; perform dynamic convolution module enhancement on the second candidate feature and the first box feature to obtain a second box feature, determine a second candidate box based on the second box feature, and update the first candidate box to be the second candidate box; perform mask region-of-interest alignment on the second box feature and the image feature to obtain a first pixel mask; and perform dynamic convolution module enhancement on the second candidate feature and the first pixel mask to obtain the target object feature.
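A hedged numpy sketch of the "dynamic convolution module enhancement" step: the candidate feature generates the parameters of two 1×1 convolutions that are applied to the RoI-aligned box feature, in the spirit of dynamic instance interaction (as in Sparse R-CNN-style detectors). All dimensions and the parameter-generator layout are illustrative assumptions, not values from the disclosure.

```python
import numpy as np

rng = np.random.default_rng(0)
d, hidden, s = 256, 64, 7                # feature dim, hidden dim, RoI size

cand = rng.standard_normal(d)            # candidate feature (after self-attention)
roi = rng.standard_normal((s * s, d))    # box feature from RoI alignment, flattened

# Parameter generator: one linear layer emitting two 1x1-conv weight sets
W_gen = rng.standard_normal((d, d * hidden + hidden * d)) * 0.01
params = cand @ W_gen
W1 = params[: d * hidden].reshape(d, hidden)
W2 = params[d * hidden:].reshape(hidden, d)

# Apply the generated filters at every RoI position, then pool to one vector;
# this instance-conditions the box feature on the candidate feature.
enhanced = np.maximum(roi @ W1, 0) @ W2      # (49, 256)
box_feature = enhanced.mean(axis=0)          # enhanced box feature, (256,)
print(box_feature.shape)                     # → (256,)
```

The same interaction would be repeated with the first pixel mask in place of the box feature to produce the target object feature.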
Optionally, the acquisition module 401 is configured to: perform size transformation processing on an image to be processed according to a preset size; and identify a table position in the image to be processed, and segment the table image from the image to be processed according to the table position.
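The acquisition step can be sketched as follows. The disclosure only says "size transformation according to a preset size"; scaling the longer side to the preset size is one common choice and an assumption here, as is taking the table position from an external detector.

```python
def resize_to_preset(w, h, preset=1024):
    """Scale an image so its longer side equals the preset size
    (one plausible reading of 'size transformation'); returns the
    new width, new height and the scale factor."""
    scale = preset / max(w, h)
    return round(w * scale), round(h * scale), scale

def crop_table(image_size, table_box):
    """Clip a detected table position (x1, y1, x2, y2) to the bounds of the
    processed image before segmenting the table image out of it."""
    x1, y1, x2, y2 = table_box
    W, H = image_size
    return max(0, x1), max(0, y1), min(W, x2), min(H, y2)

print(resize_to_preset(2000, 1000))                    # → (1024, 512, 0.512)
print(crop_table((1024, 512), (-5, 10, 1100, 500)))    # → (0, 10, 1024, 500)
```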
Optionally, the processing module 402 is configured to: determine a feature representation of the table image through a convolutional neural network, and determine a multi-scale feature representation of that feature representation through a feature pyramid network to obtain the image features.
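The feature-pyramid step can be sketched in numpy as lateral 1×1 projections of the backbone stages plus a top-down pathway with nearest-neighbour upsampling. Channel counts and stage names follow common ResNet/FPN conventions and are assumptions; a full FPN would also smooth each output with a 3×3 convolution, omitted here.

```python
import numpy as np

rng = np.random.default_rng(1)
c3 = rng.standard_normal((512, 32, 32))      # backbone (CNN) stage outputs
c4 = rng.standard_normal((1024, 16, 16))
c5 = rng.standard_normal((2048, 8, 8))

def lateral(x, out_ch=256):
    """1x1 convolution implemented as a channel-mixing contraction."""
    w = rng.standard_normal((out_ch, x.shape[0])) * 0.01
    return np.einsum('oc,chw->ohw', w, x)

def upsample2(x):
    """Nearest-neighbour 2x spatial upsampling."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

# Top-down pathway: each coarser level is upsampled and fused with the
# lateral projection of the finer backbone stage.
p5 = lateral(c5)
p4 = lateral(c4) + upsample2(p5)
p3 = lateral(c3) + upsample2(p4)
print(p3.shape, p4.shape, p5.shape)   # → (256, 32, 32) (256, 16, 16) (256, 8, 8)
```

The resulting p3–p5 maps are the multi-scale image features consumed by the downstream prediction steps.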
Optionally, the acquisition module 401 is configured to: acquire training data, wherein the training data comprises a training image and an image label, and the image label comprises a category label, a coordinate label and a pixel label. The processing module 402 is configured to: input the training image into a table recognition model to obtain a first cell category, first cell coordinates and a first cell pixel mask; determine a first loss according to the first cell category and the category label, determine a second loss according to the first cell coordinates and the coordinate label, and determine a third loss according to the first cell pixel mask and the pixel label; perform a weighted summation of the first loss, the second loss and the third loss based on predefined weight coefficients to obtain a target loss; and update the model parameters of the table recognition model according to the target loss.
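The target-loss combination is a plain weighted sum. In practice the three terms would typically be a classification loss (category), a box regression loss (coordinates) and a pixel-wise mask loss; the placeholder values and weight coefficients below are illustrative, not values from the disclosure.

```python
def target_loss(l_cls, l_box, l_mask, w=(1.0, 2.0, 1.0)):
    """Weighted summation of the category, coordinate and mask losses;
    the weights w are the predefined coefficients (assumed values)."""
    return w[0] * l_cls + w[1] * l_box + w[2] * l_mask

# Example: 1.0*0.5 + 2.0*0.25 + 1.0*0.1
print(target_loss(0.5, 0.25, 0.1))   # → 1.1
```

The model parameters are then updated by backpropagating this single scalar, so the weights control the trade-off between the three prediction heads.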
Optionally, the processing module 402 is configured to: extract the image features of the table image through the table recognition model, and predict the cell category, the cell coordinates and the cell pixel mask of the table image according to the image features.
In the embodiment of the invention, the cell category, the cell coordinates and the cell pixel mask of the table image can be determined according to the image features, and the second table can be reconstructed according to the cell category, the cell coordinates and the cell pixel mask.
Fig. 5 illustrates a physical schematic diagram of an electronic device. As shown in fig. 5, the electronic device may include: a processor 510, a communications interface 520, a memory 530 and a communication bus 540, wherein the processor 510, the communications interface 520 and the memory 530 communicate with each other through the communication bus 540. The processor 510 may invoke logic instructions in the memory 530 to perform a table reconstruction method comprising: acquiring a table image; extracting image features of the table image, and determining a cell category, cell coordinates and a cell pixel mask of the table image according to the image features; performing grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and performing cell merging on the first table according to the cell category to obtain a second table; the cell categories comprising blank cells, basic cells and merged cells.
Further, the logic instructions in the memory 530 described above may be implemented in the form of software functional units, and may be stored in a computer-readable storage medium when sold or used as a stand-alone product. Based on this understanding, the part of the technical solution of the present invention that in essence contributes to the prior art may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
In another aspect, the present invention also provides a computer program product, comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the table reconstruction method provided by the above methods, the method comprising: acquiring a table image; extracting image features of the table image, and determining a cell category, cell coordinates and a cell pixel mask of the table image according to the image features; performing grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and performing cell merging on the first table according to the cell category to obtain a second table; the cell categories comprising blank cells, basic cells and merged cells.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the table reconstruction method provided above, the method comprising: acquiring a table image; extracting image features of the table image, and determining a cell category, cell coordinates and a cell pixel mask of the table image according to the image features; performing grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and performing cell merging on the first table according to the cell category to obtain a second table; the cell categories comprising blank cells, basic cells and merged cells.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place, or may be distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus a necessary general hardware platform, or, of course, by means of hardware. Based on this understanding, the part of the foregoing technical solution that in essence contributes to the prior art may be embodied in the form of a software product, which may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disk, and which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method described in the respective embodiments or in some parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method of table reconstruction, comprising:
acquiring a table image;
extracting image features of the table image, and determining cell categories, cell coordinates and cell pixel masks of the table image according to the image features;
performing grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and performing cell merging on the first table according to the cell category to obtain a second table;
wherein the cell categories comprise blank cells, basic cells and merged cells.
2. The method of claim 1, wherein the determining the cell category, the cell coordinates, and the cell pixel mask of the table image from the image features comprises:
determining a first candidate box and a first candidate feature;
predicting a target object feature from the image feature, the first candidate box, and the first candidate feature;
and obtaining the cell category, the cell coordinates and the cell pixel mask by decoding the target object feature.
3. The table reconstruction method according to claim 2, wherein the predicting a target object feature from the image feature, the first candidate box, and the first candidate feature comprises:
performing multi-head self-attention transformation on the first candidate feature to obtain a second candidate feature;
performing box region-of-interest alignment on the first candidate box and the image feature to obtain a first box feature;
performing dynamic convolution module enhancement on the second candidate feature and the first box feature to obtain a second box feature, determining a second candidate box based on the second box feature, and updating the first candidate box to be the second candidate box;
performing mask region-of-interest alignment on the second box feature and the image feature to obtain a first pixel mask;
and performing dynamic convolution module enhancement on the second candidate feature and the first pixel mask to obtain the target object feature.
4. The table reconstruction method according to claim 1, wherein the acquiring the table image comprises:
performing size transformation processing on an image to be processed according to a preset size;
and identifying a table position in the image to be processed, and segmenting the table image from the image to be processed according to the table position.
5. The method of claim 1, wherein the extracting image features of the table image comprises:
determining a feature representation of the table image through a convolutional neural network, and determining a multi-scale feature representation of the feature representation through a feature pyramid network to obtain the image features.
6. The method of any one of claims 1 to 5, wherein, prior to the acquiring the table image, the method further comprises:
acquiring training data, wherein the training data comprises a training image and an image label, and the image label comprises a category label, a coordinate label and a pixel label;
inputting the training image into a table recognition model to obtain a first cell category, first cell coordinates and a first cell pixel mask;
determining a first loss according to the first cell category and the category label, determining a second loss according to the first cell coordinates and the coordinate label, and determining a third loss according to the first cell pixel mask and the pixel label;
performing a weighted summation of the first loss, the second loss and the third loss based on predefined weight coefficients to obtain a target loss;
and updating the model parameters of the table recognition model according to the target loss.
7. The method of claim 6, wherein the extracting the image features of the table image and determining the cell category, the cell coordinates, and the cell pixel mask of the table image based on the image features comprises:
extracting the image features of the table image through the table recognition model, and predicting the cell category, the cell coordinates and the cell pixel mask of the table image according to the image features.
8. A table reconstruction apparatus, comprising: an acquisition module and a processing module;
the acquisition module is configured to acquire a table image;
the processing module is configured to extract image features of the table image, and determine a cell category, cell coordinates and a cell pixel mask of the table image according to the image features; and to perform grid line reconstruction according to the cell coordinates and the cell pixel mask to obtain a first table, and perform cell merging on the first table according to the cell category to obtain a second table;
wherein the cell categories comprise blank cells, basic cells and merged cells.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the table reconstruction method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the table reconstruction method according to any one of claims 1 to 7.
CN202410102694.3A 2024-01-24 2024-01-24 Table reconstruction method, apparatus, electronic device and storage medium Active CN117973337B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410102694.3A CN117973337B (en) 2024-01-24 2024-01-24 Table reconstruction method, apparatus, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN117973337A true CN117973337A (en) 2024-05-03
CN117973337B CN117973337B (en) 2024-10-11

Family

ID=90856737

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410102694.3A Active CN117973337B (en) 2024-01-24 2024-01-24 Table reconstruction method, apparatus, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN117973337B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190266394A1 (en) * 2018-02-26 2019-08-29 Abc Fintech Co., Ltd. Method and device for parsing table in document image
CN111932545A (en) * 2020-07-14 2020-11-13 浙江大华技术股份有限公司 Image processing method, target counting method and related device thereof
CN113297975A (en) * 2021-05-25 2021-08-24 新东方教育科技集团有限公司 Method and device for identifying table structure, storage medium and electronic equipment
CN114529925A (en) * 2022-04-22 2022-05-24 华南理工大学 Method for identifying table structure of whole line table
CN115331245A (en) * 2022-10-12 2022-11-11 中南民族大学 Table structure identification method based on image instance segmentation
CN115761773A (en) * 2022-11-17 2023-03-07 上海交通大学 Deep learning-based in-image table identification method and system
WO2023134447A1 (en) * 2022-01-12 2023-07-20 华为技术有限公司 Data processing method and related device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant